fslex Sample
You may have heard of two tools that come with the current F# distribution, fslex and fsyacc. Both of those are based famous tools which are helpful in writing compilers.
In this post I’ll focus on creating a simple program which will take advantage of fslex, derived from Lex which stands for Lexical Analyser. Lex is a tool that produces a scanner or tokenizer. A scanner is something which takes some input text and breaks it down into tokens, such as code file being broken down into keywords, operators, constants, and identifiers.
To keep things simple, we will create a simple app called MegaCalc which will parse basic mathematical expressions like:
“1.0 / (sin pi + cos pi)”
“e ^ (-1)”
We expect fslex to produce a scanner which can take the string “sin(pi) + (31.0 * 2)” and convert it to its underlying tokens, which are [SIN; LPAREN; PI; RPAREN; PLUS; LPAREN; FLOAT(31.0); TIMES; INT(2); RPAREN]. For a compiler writer not having to worry about breaking a source file into its raw tokens makes parsing it and doing semantic analysis much easier. (But we will leave semantic analysis to fsyacc in a later post.)
In order to create MegaCalc we will follow three simple steps: defining our token types, creating an fslex file, and finally running fslex to generate our scanner. I’m going to try to do as much hand waving as possible for the sake of clarity, but if you want a more complex example on parsing there is another sample in the F# distribution under ‘samples\parsing’ which showcases both fslex and fsyacc. In addition, there are many resources on how to use lex and yacc which are more or less compatible with fslex and fsyacc.
Step One – Define our Token Type
We will start by defining all of our tokens using a discriminated union. Notice how this feature of F# allows some tokens to carry along a value with them, this leads to much cleaner code.
Tokens.fs
type Tokens =
// Numeric types
| INT of int | FLOAT of float
// Constants
| PI | E
// Trig functions
| SIN | COS | TAN
// Operators
| PLUS | DASH | ASTERISK
| SLASH | CARET
// Misc
| LPAREN | RPAREN | EOF
Step Two – Define the Lex Specification
The lexical specifically file is what Lex uses to produce a scanner. The header of the file (everything between the first { and } ) is spit directly into the output file. The rest defines the regular expression and parse method for our mini language. Whenever Lex matches a token based on a regular expression it ‘evaluates’ what is in the curly braces. Our parser will return a single token for ‘legal’ characters, simply chew through whitespace, and any unrecognizable character will cause an exception to be thrown.
Lexer.fsl
{
(***
All code between the two curly braces will be spit directly into
the generated code file.
***)
open System
// Open our module which defines the token types
open Tokens
// These two modules define goo needed to use fslex
open Lexing
let inc_lnum bol pos =
let lnum = pos.pos_lnum in
{pos with pos_lnum = lnum+1; pos_bol = bol }
let newline lexbuf =
lexbuf_set_curr_p lexbuf
( inc_lnum (lexeme_end lexbuf) (lexeme_end_p lexbuf))
}
// Base regular expressions
let digit = ['0'-'9']
let whitespace = [' ' '\t' ]
let newline = ('\n' | '\r' '\n')
rule parsetokens = parse
// ----------------------------
| whitespace { parsetokens lexbuf }
| newline { newline lexbuf; parsetokens lexbuf }
// ----------------------------
| ['-']?digit+ { INT (Int32.Parse(lexeme lexbuf)) }
| ['-']?digit+('.'digit+)?(['e''E']digit+)? { FLOAT (Double.Parse(lexeme lexbuf)) }
// ----------------------------
| "pi" { PI }
| "e" { E }
// ----------------------------
| "sin" { SIN }
| "cos" { COS }
| "tan" { TAN }
// ----------------------------
| "+" { PLUS }
| "-" { DASH }
| "*" { ASTERISK }
| "/" { SLASH }
| "^" { CARET }
| "(" { LPAREN }
| ")" { RPAREN }
// ----------------------------
| eof { EOF }
Step Three – Run fslex.exe
Although you can add an F# Lex Specification file from the Visual Studio Add-In, to run fslex you will need to break to the command line. Fslex.exe is located in your F# distribution’s ‘bin’ directory.
Output from fslex:
C:\MegaCalc\>fslex lexer.fsl
compiling to dfas (can take a while...)
154 NFA nodes
32 states
writing output
Putting it All Together
Now that we have a generated code module, Lexer.fs, we can use our scanner/tokenizer to parse strings. Notice the use of the ‘Lexing’ namespace, which is a set of library routines to compliment fslex.
Main.fs
#light
open System
// Opens the module with our generated scanner code
open Lexer
// Given a lex buffer chew through the buffer returning a token list.
let parse lexbuf =
let mutable keepParsing = true
let mutable tokenList = []
while keepParsing = true do
// 'parsetokens' is the method generated by fslex
let parsedToken = parsetokens lexbuf
// Add the token to our list
tokenList <- tokenList @ [parsedToken]
if parsedToken = Tokens.EOF then
keepParsing <- false
tokenList
// Attempts to parse a block of text and prints stats to STDOUT
let tryParse text =
let lexbuf = Lexing.from_string text
try
let tokens = parse lexbuf
printfn "Success. %d tokens." (List.length tokens)
printfn "Tokens : %A" tokens
with e ->
let pos = lexbuf.EndPos
printfn "Exception Message: %s\nLex error near line %d, character %d"
e.Message pos.Line pos.Column
// Code which actually executes
let welcomeMessage = @"
MegaCalc (fslex sample)
-----------------------
Example Usage:
""cos pi * 42.0""
""e ^ (-1 + cos (0.0))""
""quit""
:"
Console.Write(welcomeMessage)
let mutable inputString = Console.ReadLine()
while inputString <> "quit" do
// Try to parse the string
tryParse inputString
// Get the next line
Console.Write("\n:")
inputString <- Console.ReadLine()
Console.WriteLine("(press any key)")
Console.ReadKey(true)
Sample Output
MegaCalc (fslex sample)
-----------------------
Example Usage:
"cos pi * 42.0"
"e ^ (-1 + cos (0.0))"
"quit"
:cos pi * 42.0
Success. 5 tokens.
Tokens : [COS; PI; ASTERISK; FLOAT 42.0; EOF]
:e ^ (-1 - 1)
Success. 8 tokens.
Tokens : [E; CARET; LPAREN; INT -1; DASH; INT 1; RPAREN; EOF]
:quit
(press any key)
Conclusion
As you can see, we have constructed a simple with a minimum of work. Now imagine other types of situations where a generated parser could come in handy, such as how to break apart log files.
In a future post we will cover fsyacc which will take that token stream and convert it into a syntax tree, which will then make it easier to evaluate the actual equations.
Disclaimer
Despite my best efforts to get the facts straight I reserve the right to be wrong. Also, the code posted is offered “as is” and offers no warranty of any kind, expressed or implied.
Comments
Anonymous
January 18, 2008
In this post I’ll focus on creating a simple program which will take advantage of fslex, and show howAnonymous
January 28, 2008
The first release is a simple four function calculator, which builds off of last week's blog post and integrates fsyacc for parsing. The net result is the ability to write simple simple expressions like "10 * 2 - 15 / 3" and get it evaluated using theAnonymous
May 30, 2008
Last Tuesday I gave a talk to the .NET Developers Association entitled Language Oriented ProgrammingAnonymous
September 09, 2008
While I am thrilled about all the new features we've put into the F# CTP, perhaps the thing I'm mostAnonymous
October 18, 2008
It'd be great if you could repost the source along with the project for the latest CTP... including the MSBuild tasks for Lex and Yacc. The 2008 CTP doesn't syntax highlight the .fsl file :( Thanks for posting this. It's a great help.