Getting to the Crown Jewels

The crown jewels of the Database Edition product are the SQL parsers and script generator, these two pieces form the foundation of what the database project system does internally.

parser

The parser, parses T-SQL code and turns it in to a script fragment, where the script generator takes a script fragment and turns it in to T-SQL code. This way we can roundtrip code between the two components. The script generator has the ability to format code on the way out, however there is no mode to preserve formatting or whitespace by the script generator at this point in time.

Per CTP16 of the GDR release the parser and script generator are public classes, so if you have any need to parse some SQL code, here is a solution.

The parser and script generator live in two assemblies. The Microsoft.Data.Schema.ScriptDom contains provider agnostic classes and the Microsoft.Data.Schema.ScriptDom.Sql assembly contain classes for the parser and script generator that are SQL Server specific.

Let's look at a very simple example on how to use the two components. The following screenshot shows the end result. A simple WinForms application that takes in a piece of T-SQL in this case the create statement of the authors table from the pubs database. The second text box shows the formatted result coming out of the script generator.

image

Lets digest what was needed to achieve this. I will not go in to the detail of the WinForms code since it is trivial and not that interesting, you can look at the sample code if you need to for that. I am going to focus on the usage of the parser and the script generator code.

 

The first step is to add the references to the required assemblies:

image

We first need to add two assembly references two:

  • Microsoft.Data.Schema.ScriptDom.dll
  • Microsoft.Data.Schema.ScriptDom.Sql.dll

Then we add two variables to the WinForms class to hold an instance of the parser and the script generator.

    1: private TSql100Parser _parser;
    2: private Sql100ScriptGenerator _scriptGen;

As you can see the parser and script generator are version specific implementations, there exists a TSql80Parser, TSql90Parser and TSql100Parser, the same hold true for the Sql<>ScriptGenerator classes. Normally you would want to instantiate the classes through a factory pattern, to abstract the version away, but in this simple example we will instantiate the versioned instances of the classes directly.  This happens in the form Load method.

    1: private void Form1_Load(object sender, EventArgs e)
    2: {
    3:     bool fQuotedIdenfifiers = false;
    4:     _parser = new TSql100Parser(fQuotedIdenfifiers);
    5:  
    6:     SqlScriptGeneratorOptions options = new SqlScriptGeneratorOptions();
    7:     options.SqlVersion = SqlVersion.Sql100;
    8:     options.KeywordCasing = KeywordCasing.UpperCase;
    9:  
   10:     _scriptGen = new Sql100ScriptGenerator(options);
   11: }

The only thing the parser needs to know is how to interpret string inside double quotes, are they strings or object identifiers? This is indicated when the parser class is constructed.

The script generator has many more options and takes and SqlScriptGeneratorOptions class instance as the constructor argument. The options class lets you define things like the version of the language and how to case keywords for examples. You will also find a set of indention properties that you can use to change the formatting results.

Now we are at the point we can do the hard work. When the user clicks the button the Click method will parse the SQL and script it back out to the second text box if there are no errors parsing the input code.

Lets step through the Click method:

    1: private void button1_Click(object sender, EventArgs e)
    2: {
    3:     textBox2.Text = String.Empty;
    4:     textBox3.Text = String.Empty;
    5:  
    6:     string inputScript = textBox1.Text;
    7:     IScriptFragment fragment;
    8:     IList<ParseError> errors;
    9:     using (StringReader sr = new StringReader(inputScript))
   10:     {
   11:         fragment = _parser.Parse(sr, out errors);
   12:     }
   13:  
   14:     if (errors != null && errors.Count > 0)
   15:     {
   16:         StringBuilder sb = new StringBuilder();
   17:         foreach (var error in errors)
   18:         {
   19:             sb.AppendLine(error.Message);
   20:             sb.AppendLine("offset " + error.Offset.ToString());
   21:         }
   22:         textBox3.Text = sb.ToString();
   23:     }
   24:     else
   25:     {
   26:         String script;
   27:         _scriptGen.GenerateScript(fragment, out script);
   28:         textBox2.Text = script;
   29:     }
   30: }

We first define to variables to hold the output of the parse operation, the IScriptFragment will hold the script fragment, this is the most important output of the parser, and it will also output an IList of ParseError objects when the parse operations ran in to problems.

Second we need to make the input to the parse available as a StreamReader, so we are converting the input string in to a StringReader class. Line 11 shows the actual call that parses the input stream and outputs the fragments and the errors.

Then we check if any errors occurred during the parse of the input. If so we iterate through the list of error objects and print out the message and offset information.

When there are no errors, we take the resulting script fragment and feed it back to the script generator. Line 27 shows the actual call. The output will be a formatted script.

 

You can download the sample code from the DataDude SkyDrive: ParsingAndScriptGen.zip 

 

Neat he? I hope this is useful!

-GertD

Comments

  • Anonymous
    August 21, 2008
    PingBack from http://housesfunnywallpaper.cn/?p=1876

  • Anonymous
    August 21, 2008
    The classes may be public, but what is the redist limitations on the assemblies?  I would assume that they're not redistributable and that any app that uses them would only run on a machine with Team Database on it... I would like to be wrong though :)

  • Anonymous
    August 21, 2008
    Data Dude - if you know about these 2 words then you know what I'm talking about. Visual Studio Team

  • Anonymous
    August 22, 2008
    Which regards to the redist question please see my next post. http://blogs.msdn.com/gertd/archive/2008/08/22/redist.aspx

  • Anonymous
    August 22, 2008
    Thanks, this is really cool and useful stuff.  I would like to request that the type, TSqlFragmentVisitor, be made public too as that will make it much easier for us to write our own visitors on top of the sql expression tree.   Also, While most of the sql expression tree is serializable, it appears that one of the member variables, which is a list of SqlParserTokens is not (the SQL Parser Token isn't marked as serializable).  Could this be fixed? The ability to serialize a SQL Expression tree would be very useful. One thing that I can see a use for is a set of factory methods built on top of the expression tree such that statements could be more easily constructed in code (instead of parsed from a string).  Very much like SqlOM (http://sourceforge.net/projects/sqlom/).  We'd like to replace SqlOM with these libraries once RTM happens...   In our case, we prefer SqlOM as a way of generating "safe" queries from a client machine and then sending the object graph on the wire for our servers to parse and extract a safe select query (to prevent injection, etc). Another interesting use would be a way to translate the sql expressions (or at least the DML part) into a standard .NET Expression Tree (and vise-versa).   Very cool stuff...

  • Anonymous
    August 25, 2008
    Kirstin Juhl on Everything old is new again Martin Hinshelwood on Heat ITSM GertD on Getting to the...

  • Anonymous
    November 25, 2008
    Finally the moment is there, the final version of the Visual Studio Team System 2008 Database Edition

  • Anonymous
    December 01, 2008
    Ca n'est pas encore Visual Studio 2010, mais pour la partie base de données, on s'en approche à grand

  • Anonymous
    February 08, 2009
    [ Nacsa Sándor , 2009. február 8.] Ez a Team System változat az adatbázis változások kezeléséhez és teszteléséhez

  • Anonymous
    March 05, 2009
    Thanks for sharing. Based on your example  I created Powershell cmdlet implementations http://chadwickmiller.spaces.live.com/blog/cns!EA42395138308430!296.entry

  • Anonymous
    July 22, 2009
    Is there a any way to preserve comments

  • Anonymous
    August 02, 2010
    Hi all, I use Oracle. Is it possible use it ?? Thanks.

  • Anonymous
    August 04, 2010
    The comment has been removed