A LINQ provider for RDF files - part 2

For simple RDF queries like

IQueryable<string> q = from x in rdf
                       from y in rdf
                       where rdf.A(germany, hasAdminDiv, x)
                          && rdf.A(x, isOfType, germanState)
                          && rdf.A(x, hasName, y)
                       select y.Val + " [" + x.Val + "]";

 

which we are going to support here, there is a “normal form” given by

- a set of variables, which denote resources or values in an RDF document – in the example above this is {x,y}.

- a set of constraint triples (subj, pred, obj) where subj, pred, obj are either variables or constants. This is the query condition – in the above example it is
{(germany, hasAdminDiv, x), (x, isOfType, germanState), (x, hasName, y)}

- a “projection function” using these variables which denotes the value which we associate with each “row” – in the above example this is
(x,y) => y.Val + " [" + x.Val + "]"
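
As a rough sketch, the three parts of this normal form could be held in a small data structure like the one below. All type names here (Term, Variable, Constant, Constraint, NormalForm) are hypothetical and not taken from the attached solution, and a "row" is modelled simply as a dictionary from variable to string value to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; the attached solution may differ.
public abstract class Term { }

// A variable that the solver has to assign a value to.
public sealed class Variable : Term
{
    public string Name { get; }
    public Variable(string name) { Name = name; }
}

// A fixed resource or literal, e.g. "germany" or "hasAdminDiv".
public sealed class Constant : Term
{
    public string Val { get; }
    public Constant(string val) { Val = val; }
}

// One constraint triple (subj, pred, obj).
public sealed class Constraint
{
    public Term Subj { get; }
    public Term Pred { get; }
    public Term Obj { get; }
    public Constraint(Term subj, Term pred, Term obj)
    {
        Subj = subj; Pred = pred; Obj = obj;
    }
}

// Variables + constraints + projection function.
public sealed class NormalForm<TResult>
{
    public List<Variable> Variables { get; } = new List<Variable>();
    public List<Constraint> Constraints { get; } = new List<Constraint>();
    public Func<IReadOnlyDictionary<Variable, string>, TResult> Projection { get; set; }
}

public static class NormalFormExample
{
    // Builds the normal form of the example query from the text.
    public static NormalForm<string> Build()
    {
        var x = new Variable("x");
        var y = new Variable("y");
        var nf = new NormalForm<string>();
        nf.Variables.Add(x);
        nf.Variables.Add(y);
        nf.Constraints.Add(new Constraint(new Constant("germany"), new Constant("hasAdminDiv"), x));
        nf.Constraints.Add(new Constraint(x, new Constant("isOfType"), new Constant("germanState")));
        nf.Constraints.Add(new Constraint(x, new Constant("hasName"), y));
        nf.Projection = row => row[y] + " [" + row[x] + "]";
        return nf;
    }
}
```

The projection takes a whole row rather than two separate arguments so that the same type works for any number of variables.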

 

To execute such a query means finding all possible assignments of resources / values to the variables such that all resulting triples are in the axioms of the RDF file, and then applying the projection function to get a set of objects of a certain type (the return type of the projection function – in the above example this is string).
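
To make this definition concrete, here is a deliberately naive solver that tries every axiom against every constraint. It is only a sketch of the semantics, not the algorithm used by the SimpleSolver project. The convention that a term starting with '?' is a variable, the NaiveSolver name, and the toy axioms in Run are all assumptions made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NaiveSolver
{
    // Convention for this sketch only: a term starting with '?' is a variable.
    static bool IsVar(string term) => term.Length > 0 && term[0] == '?';

    // Matches one term of a constraint against one value of an axiom,
    // binding the variable if it has no value yet.
    static bool Match(string term, string value, Dictionary<string, string> binding)
    {
        if (!IsVar(term)) return term == value;
        if (binding.TryGetValue(term, out var bound)) return bound == value;
        binding[term] = value;
        return true;
    }

    // Returns every variable assignment under which all constraints are axioms.
    public static IEnumerable<Dictionary<string, string>> Solve(
        IReadOnlyList<(string S, string P, string O)> axioms,
        IReadOnlyList<(string S, string P, string O)> constraints)
    {
        IEnumerable<Dictionary<string, string>> partial =
            new[] { new Dictionary<string, string>() };

        foreach (var c in constraints)
        {
            // Extend each partial assignment with every axiom that fits c.
            partial = partial.SelectMany(b => axioms.Select(a =>
            {
                var next = new Dictionary<string, string>(b);
                bool ok = Match(c.S, a.S, next)
                       && Match(c.P, a.P, next)
                       && Match(c.O, a.O, next);
                return ok ? next : null;
            }).Where(n => n != null)).ToList();
        }
        return partial;
    }

    // Toy data, invented for this example.
    public static List<string> Run()
    {
        var axioms = new List<(string S, string P, string O)>
        {
            ("germany", "hasAdminDiv", "bavaria"),
            ("bavaria", "isOfType", "germanState"),
            ("bavaria", "hasName", "Bayern"),
            ("france", "hasAdminDiv", "alsace"),
            ("alsace", "hasName", "Alsace"),
        };
        var constraints = new List<(string S, string P, string O)>
        {
            ("germany", "hasAdminDiv", "?x"),
            ("?x", "isOfType", "germanState"),
            ("?x", "hasName", "?y"),
        };
        // Apply the projection function to each solution row.
        return Solve(axioms, constraints)
            .Select(row => row["?y"] + " [" + row["?x"] + "]")
            .ToList();
    }
}
```

With the toy axioms above, Run returns the single string "Bayern [bavaria]". The cost grows with the number of axioms for each constraint, which is why a real solver would use the "local information" about completing triples instead.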

 

The compiler will treat the above query expression as syntactic sugar for an expression like:

rdf.SelectMany(x => rdf.Where(y => Cond(x,y))
                       .Select(y => f(x,y)))

where Cond(x,y) is the condition involving rdf.A and f(x,y) is the function that assigns a string to each pair (x,y) of values in the Rdf document.
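
To see the two forms side by side, the sketch below runs the query syntax and the nested method syntax over a plain in-memory sequence (LINQ to Objects) instead of an Rdf object. Cond and F are arbitrary stand-ins chosen for this example. Note that released C# compilers translate multiple from clauses a little differently (through a SelectMany overload with a result selector), but the results are the same.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DesugarDemo
{
    // Arbitrary stand-ins for Cond(x, y) and f(x, y).
    static bool Cond(int x, int y) => x < y;
    static string F(int x, int y) => x + "-" + y;

    // The query expression form.
    public static List<string> QuerySyntax(IEnumerable<int> source)
    {
        return (from x in source
                from y in source
                where Cond(x, y)
                select F(x, y)).ToList();
    }

    // The nested SelectMany / Where / Select form described in the text.
    public static List<string> NestedMethodSyntax(IEnumerable<int> source)
    {
        return source.SelectMany(x => source.Where(y => Cond(x, y))
                                            .Select(y => F(x, y)))
                     .ToList();
    }
}
```

For the input { 1, 2, 3 } both methods return "1-2", "1-3", "2-3" in that order.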

 

The same query could be written in different forms. For example, replacing the expression

  rdf.Where(y => Cond1(x,y) && Cond2(x,y))

by

  rdf.Where(y => Cond1(x,y)).Where(z => Cond2(x,z))

should lead to the same normal form.

 

So how do we get LINQ to translate these expressions to the above normal form?

 

To get LINQ started, our Rdf type has to implement the IQueryable<T> interface, as System.Data.DLinq.Table<T> does. When we query a database table without conditions, we get the set of all rows in the table. The analogous notion for an RDF file (or several RDF files, or any set of RDF triples) is the set of all “Values” in the RDF document, so we implement the interface IQueryable<Value> on Rdf.

 

“Value” is the common base type of Literal (meaning a string occurring in an object position in an axiom) and Resource (given by a URI occurring in any position in any axiom).

Since we usually do not want to retrieve all values occurring in a document, it does not matter much what exactly we get when we foreach over a document (all values, or only the resources?). What matters more is the IQueryable part, because it means that the query operators Where, Select and SelectMany are now defined for Rdf.
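
A minimal sketch of such a type hierarchy, and of what "all values in a document" could mean, might look like this. Only the names Value, Literal, Resource and the Val property come from the text; the constructors, the ValueEnumeration helper and the tuple representation of triples are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Common base type of everything a query variable can stand for.
public abstract class Value
{
    public string Val { get; }
    protected Value(string val) { Val = val; }
}

// A string occurring in an object position.
public sealed class Literal : Value
{
    public Literal(string text) : base(text) { }
}

// Something identified by a URI, occurring in any position.
public sealed class Resource : Value
{
    public Resource(string uri) : base(uri) { }
}

public static class ValueEnumeration
{
    // Lists each distinct value occurring in a set of triples once,
    // i.e. one possible answer to "what does foreach over a document yield?".
    public static List<Value> AllValues(
        IEnumerable<(Resource S, Resource P, Value O)> triples)
    {
        var seen = new HashSet<string>();
        var result = new List<Value>();
        foreach (var t in triples)
        {
            foreach (var v in new Value[] { t.S, t.P, t.O })
            {
                // Key on type and text so a literal "x" differs from a resource "x".
                if (seen.Add(v.GetType().Name + ":" + v.Val))
                {
                    result.Add(v);
                }
            }
        }
        return result;
    }
}
```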

 

The basic observation is that we can now give the normal form of the query corresponding to a plain Rdf object (variables: {x}, constraints: {}, projection: x => x), and that we can recursively determine the normal form of any query built from such objects with the operators Where, Select and SelectMany.
There is some fine print:

1) Variables and variable names:
In rdf.Where(y => Cond1(x,y)).Where(z => Cond2(x,z)) the names y and z correspond to the same variable (which runs over the rdf at the beginning of this expression). We have to be careful to distinguish between variables (that the solver will assign to values) and named references to these variables (like “y” and “z” above).

2) Variables can be defined outside of a (sub)expression:
In rdf.Where(y => Cond1(x,y)) the variable x is defined in an enclosing scope. When we translate a (sub)expression, we always have to give the list of variables in the enclosing scope as a parameter.

3) Some restrictions apply:
- We only deal with Where, Select and SelectMany when they are applied to an Rdf query with the identity projection function, i.e. one whose output is given by a variable and is a sequence of objects of type Value (not, for example, a sequence of strings).
- The conditions in the Where clause are only of the form Rdf.A(?,?,?), the predicate is always given as a constant, and at least one of the entries is a variable.

With these caveats, here is what this recursive algorithm does:

- Where:
Source.Where(v => Cond(v)):
Translate the query expression Source. Assume the output of Source is a variable. Make the name v point to the same variable, translate the condition and add the result to the list of constraints.
The output variable of the new query expression is the same as for Source.

- SelectMany:
Source.SelectMany(v => Seq(v)):
Translate the query expression Source. Assume the output of Source is given by a variable. Make the name v point to the same variable. Add the variables and constraints of Source and Seq together. The projection function of the result is the projection function of Seq.

- Select:
Source.Select(v=>f(v)):
Translate the query expression Source. Assume the output of Source is given by a variable. Make the name v point to the same variable. Determine all parameters occurring in f, build a lambda expression (v1, v2, …, vn) => f(v1, v2, …, vn) and compile it. This is the projection function of the result. The variables and constraints of the result are the same as those of Source.
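
The Where rule, together with the distinction between variables and their names, can be sketched with today's System.Linq.Expressions as shown below. This is not the code from the attached solution: it handles only chains of Where calls, it gets its expression trees from AsQueryable() on an empty array instead of a real Rdf class, it uses a static Rdf.A marker method, and it renders constraints as strings with "?0" standing for the single variable. All of these are assumptions made to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public sealed class Value
{
    public string Val { get; }
    public Value(string val) { Val = val; }
}

public static class Rdf
{
    // Marker method: only inspected inside expression trees, never executed.
    public static bool A(Value subj, Value pred, Value obj)
    {
        throw new NotSupportedException("Rdf.A may only appear in query expressions.");
    }
}

public static class WhereTranslator
{
    // Collects the constraint triples of a chain of Where calls.
    public static List<string> Translate(Expression query)
    {
        var constraints = new List<string>();
        Visit(query, constraints);
        return constraints;
    }

    static void Visit(Expression e, List<string> constraints)
    {
        var call = e as MethodCallExpression;
        if (call == null) return;                    // reached the root source
        if (call.Method.DeclaringType != typeof(Queryable) || call.Method.Name != "Where")
            throw new NotSupportedException(call.Method.Name);

        Visit(call.Arguments[0], constraints);       // translate Source first
        var lambda = (LambdaExpression)((UnaryExpression)call.Arguments[1]).Operand;
        AddCondition(lambda.Body, lambda.Parameters[0], constraints);
    }

    static void AddCondition(Expression body, ParameterExpression v, List<string> constraints)
    {
        if (body.NodeType == ExpressionType.AndAlso)
        {
            var and = (BinaryExpression)body;
            AddCondition(and.Left, v, constraints);
            AddCondition(and.Right, v, constraints);
            return;
        }
        var call = body as MethodCallExpression;
        if (call == null || call.Method.DeclaringType != typeof(Rdf) || call.Method.Name != "A")
            throw new NotSupportedException(body.ToString());
        constraints.Add("(" + string.Join(", ", call.Arguments.Select(a => Term(a, v))) + ")");
    }

    static string Term(Expression arg, ParameterExpression v)
    {
        // Whatever the lambda calls its parameter, it names Source's output variable.
        if (arg == v) return "?0";
        // Anything else is treated as a constant and evaluated eagerly.
        var value = (Value)Expression.Lambda(arg).Compile().DynamicInvoke();
        return value.Val;
    }
}

public static class WhereTranslatorDemo
{
    public static (List<string> Combined, List<string> Chained) Run()
    {
        var germany = new Value("germany");
        var hasAdminDiv = new Value("hasAdminDiv");
        var isOfType = new Value("isOfType");
        var germanState = new Value("germanState");
        IQueryable<Value> rdf = new Value[0].AsQueryable();

        var combined = rdf.Where(y => Rdf.A(germany, hasAdminDiv, y)
                                   && Rdf.A(y, isOfType, germanState));
        var chained = rdf.Where(y => Rdf.A(germany, hasAdminDiv, y))
                         .Where(z => Rdf.A(z, isOfType, germanState));

        return (WhereTranslator.Translate(combined.Expression),
                WhereTranslator.Translate(chained.Expression));
    }
}
```

Both queries in Run produce the same two constraints, "(germany, hasAdminDiv, ?0)" and "(?0, isOfType, germanState)", even though the second one uses the names y and z for the same variable. A fuller translator would also carry the list of variables from the enclosing scope and handle SelectMany and Select.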

 

I attach a VS2005 solution which implements this algorithm. It assumes the May LINQ CTP is installed.

It contains four projects:

- LinqToRdf is the main project which implements this algorithm
It uses an ITripleProvider object, which enumerates triples, and an ISolver object, which computes all solutions of a given query (given as a set of query triples). The solver works from “local information”: the possible ways to complete a triple when the predicate and maybe one of subject and object are given.

- RdfXmlReader is an implementation of ITripleProvider which reads in an RdfXml file. It uses Drive (see the last blog entry); you have to modify the reference to Drive.dll in this project to point to your copy of Drive.dll.

- SimpleSolver implements a simple algorithm to solve an Rdf query in the above normal form.

- Demo uses these assemblies to read in the RDF files containing information about Germany and France and list all “administrative divisions” of Germany and France.

As always, this sample code is the product of Weekend Evening Rapid Prototyping; it is provided as-is and does not come with any warranty.
You can copy, modify, and use the code for commercial and non-commercial purposes.

To build the RdfXmlReader project, you need to download Drive.dll from https://www.driverdf.org/; see that site for legal restrictions which may apply to this DLL.

RdfReader.zip
