Every Program There Is, Part Three

[This is part of a series on generating every string in a language. The previous part is here. The next part is here.]

Suppose we want to write a grammar for a simplified C# class declaration. Let’s say that there are one-character identifiers, a class can be public or internal, and that there are no nested or generic types or other members. How about this?

DECL: MODIFIER class ID : ID { }
MODIFIER: public | internal
ID: a | b | c | d

so public class d : b { } is legal. But we forgot that the modifier should be optional:

DECL: WITHMODIFIER | WITHOUTMODIFIER
WITHMODIFIER: MODIFIER WITHOUTMODIFIER
WITHOUTMODIFER: class ID : ID { }
MODIFIER: public | internal
ID: a | b | c | d

And we forgot that the base class is optional…

DECL: WITHMODIFIER | WITHOUTMODIFIER
WITHMODIFIER: MODIFIER WITHOUTMODIFIER
WITHOUTMODIFIER: MAYBEBASE { }
MAYBEBASE : WITHOUTBASE : ID | WITHOUTBASE
WITHOUTBASE: class ID
MODIFIER: public | internal
ID: a | b | c | d

It’s doable, but as you can see, this is getting to be a bit of a mess. It is a lot easier if we can say that “nothing at all” is a legal substitution. Since that’s hard to write, I’m going to make a new rule that NIL is a special terminal that is a nice way of writing the empty string. Now our grammar can be much easier to read:

DECL: MODIFIER class ID BASE { }
MODIFIER: NIL | public | internal
BASE: NIL | : ID
ID: a | b | c | d

This addition has interesting consequences. Previously we were in the situation where every application of a rule made the "current string" no shorter in terms of the number of terminals plus the number of nonterminals. Now we have a situation where a substitution can make the resulting string have fewer nonterminals without adding any terminals. But I'm getting ahead of myself somewhat.

Here I've characterized "NIL" as a terminal which is a special way of writing the empty string. Another way of characterizing NIL is as a nonterminal; NIL as a nonterminal is a special nonterminal such that the only substitution is the empty string. For the rest of this series I'll treat it as a convenient way of writing the empty string terminal.

Next time: techniques for generating all members of a language

[This is part of a series on generating every string in a language. The previous part is here. The next part is here.]

Comments

  • Anonymous
    May 03, 2010
    Ya BNF! any chance of getting the C# AST available for developers to use? it would make the plugins for VS2010 so much more powerful! We get that feedback a lot. We hope to "open up" the compiler and expose it as a "code analysis service" someday. However, this is still in the design stages and you should not expect anything anytime soon. -- Eric

  • Anonymous
    May 04, 2010
    This has been a pretty fascinating series. I have to admit this has piqued my interest about compiler theory in general. I had an idea recently for doing some JavaScript based intellisense for an open source project, but I got bogged down in the details. Do you have any good resources you could point me to just to get my feet wet? I realize that intellisense and compiler theory are different animals, but it looks as though they have a lot in common. Glad you're enjoying it. Yes, they have lots in common; IntelliSense requires us to build a compiler that can correctly analyze incorrect source code faster than you can type; that's a much more difficult problem than writing a "batch" compiler that just reads files off disks. JScript is a particularly tricky language to write an IntelliSense engine for because of its highly dynamic nature and lack of compile-time hints. When I last wrote a simple JScript IntelliSense engine it was pretty weak. I looked at assignments to local variables and worked out the types of the expressions, and I identified usages of the HTML DOM objects like "window" that I had type information for. There are more sophisticated things you can do with JScript, like building yourself a no-external-side-effects version of the JScript interpreter and then using it to actually run the program you are editing as you are editing it; that way you can do live inspection of the state of the dynamic objects. I don't know of any resources specifically about building IntelliSense-style analyzers. -- Eric

  • Anonymous
    May 05, 2010
    Interesting...I'm learning more in this series of posts than I've learned from some compiler books.

  • Anonymous
    May 05, 2010
    The comment has been removed