Regular Expressions From Scratch, Part Five: The Regular Expression Language
Now things start to get really weird.
Definition 9: Take any alphabet S. The regular expression alphabet of S is S plus a bunch of extra symbols; it's S ∪ {(, ), *, ∪, Ø} (I assume that none of those symbols are already in S.)
I'm doing something that I said earlier that I would try not to do. I'm using symbols in an alphabet that I also use in expressions that talk about strings in that alphabet. (Of course, I also said that this would all fall apart when we got to regular expressions, and sure enough, it did. Foreshadowing: the sign of a quality blog.)
Again, keep a careful eye on when I'm using fixed-width blue, because those are the "meaningless" symbols, not expression notation.
Definition 10: Take any alphabet S. The regular expression language R of an alphabet S is a language formed from strings of the regular expression language of S, and is defined as follows:
- Ø is in R.
- Every member of S is in R
- If u and w are strings in R then (uw) is in R.
- Similarly, (u∪w) is in R.
- Similarly, u* is in R.
- Nothing else is in R
An example might help. Suppose that S = {a, b}. The regular expression alphabet of S is {a, b, (, ), *, ∪, Ø}. The regular expression language of S is R = {Ø, a, b, (ØØ), (Øa), (Øb), (aØ), (aa), (ab), (bØ), (ba), (bb), (Ø∪Ø), (Ø∪a), (Ø∪b), (a∪Ø), (a∪a), (a∪b), (b∪Ø), (b∪a), (b∪b), Ø*, a*, b*, …}
We've defined an alphabet, we've defined a language -- a language which looks suspiciously like the expressions we've been using to talk about languages. Next time we'll do something insanely clever to bridge the gap between strings in the regular expression language and the languages which these strings would define if they were interpreted as expressions.
Comments
- Anonymous
December 01, 2005
Just wanted to let you know how much I am enjoying this series! - Anonymous
December 01, 2005
The comment has been removed - Anonymous
December 01, 2005
The empty set character doesn't render properly in Lucida Console. - Anonymous
December 02, 2005
I'm really enjoying this series too. Probably my favorite series yet on your blog. - Anonymous
January 08, 2006
In definition 10, I think the phrase "the regular expression language of S" should be "the regular expression alphabet of S". - Anonymous
January 08, 2006
Again about definition 10, should we add one more entry to the list as follows:
If u is a string in R then (u) is in R. - Anonymous
January 24, 2006
That extra definition would be redundant, and could be removed. (u) === u so (u)* === u* and so on for all of the operation symbols. - Anonymous
January 24, 2006
In response to your other post, Shawn, definition 10 is correct in calling it the "regular expression language" since definition 4 says that a language is a subset of S* (which matches the notation used) and alphabets must be finite according to definition 3 (i think, maybe definition 2) while R is not. - Anonymous
February 21, 2006
The comment has been removed - Anonymous
March 03, 2006
When it comes to validating input regular expression becomes a very important part of your security plan. ...