A LINQ provider for Web queries

To start a series of "LINQ provider" posts, I am uploading a sample provider today that, in a sense, treats the Internet as a database. For a SQL Server database, you make tables accessible to LINQ by writing classes whose attributes define how objects of these classes are retrieved from table rows; LINQ then uses these classes to issue queries against the database. Similarly, this provider lets you add attributes to classes to specify how their objects are retrieved from Web pages, and you can then issue LINQ queries against them.

The project "WebLinq" in the attached solution contains this provider. It is not very sophisticated and consists of just three files:
- WebLinqAttributes.cs contains the attributes that the provider recognizes
- WebContext.cs contains the class your WebLinq-enabled classes inherit from
- Utils.cs contains helper functions to send GET / POST requests to a Web site and to find substrings in a text.
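
To give an idea of what "finding substrings in a text" can mean here, the following is a minimal sketch of a helper that returns the text between two markers. It is not the code from Utils.cs: the names TextUtil and Between are hypothetical, and the sample string is made up for illustration.

```csharp
using System;

// Hypothetical helper; the real Utils.cs may be organized differently.
public static class TextUtil
{
    // Returns the text between the first occurrence of `start` (searched from
    // index `from`) and the next occurrence of `end` after it, or null if
    // either marker is missing.
    public static string Between(string text, string start, string end, int from = 0)
    {
        int s = text.IndexOf(start, from, StringComparison.Ordinal);
        if (s < 0) return null;
        s += start.Length;                       // skip past the start marker
        int e = text.IndexOf(end, s, StringComparison.Ordinal);
        if (e < 0) return null;
        return text.Substring(s, e - s);
    }
}

public static class TextUtilDemo
{
    public static void Main()
    {
        string page = "<b>Price:</b> 42.50 USD";  // made-up page fragment
        Console.WriteLine(TextUtil.Between(page, "</b> ", " USD"));  // prints 42.50
    }
}
```

Returning null for a missing marker, rather than throwing, keeps the caller free to skip pages that do not have the expected layout.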

The project "WebSources" defines some classes for:
- Searching for articles on the CiteSeer Web sites (see below)
- Searching for articles on the MSDN Web sites
- Translating words / sentences
- Integrating functions of one variable
- Looking up the current values of stocks from the company symbol

The project "SimpleDemos" uses these two DLLs to demonstrate the last three classes.

The project "TestWebLinq" demonstrates access to the CiteSeer Web sites.

CiteSeer is a database of computer science articles. You can search for articles by keyword, obtain information about them, and often even retrieve them directly from the Web site.
To use the CiteSeer demo, enter for example "Support Vector Machines" in the text box labeled "Search terms" and click the "Retrieve" button. It takes a while to visit the Web pages that list the available articles, visit the Web page for each article, retrieve the information from it, and access another Web page for details. You should then see a list of paragraphs, each containing:
- Author's name(s)
- Title and year
- Some three lines of introduction
- URL for this article
- URL for downloading the article as a PDF file
- Information about the rights for this article

If you are only interested in new articles, enter 2002 in the "Publication year >=" text field and click "Retrieve" again (currently I get 3 results back).

Here is how the corresponding query looks in the code:

var doc = new GoogleCiteSeer(searchTerms, 0);
var query = from art in doc.Articles
            where art.details.Document != null
               && art.details.Document.bibtex != null
               && art.details.Document.bibtex.year >= minYear
            select art.details;
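
The null checks in the where clause matter because not every article page yields a details document or a BibTeX part. The sketch below runs the same query shape against in-memory objects instead of the Web, so it can be tried in isolation. The classes Article, Details, Document, and BibTex are hypothetical stand-ins that only mirror the member names used in the query above; they are not the types from WebSources.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins that only mirror the member names used in the query.
public class BibTex { public string author; public string title; public int year; }
public class Document { public BibTex bibtex; }
public class Details { public Document Document; }
public class Article { public Details details; }

public static class QueryDemo
{
    // Same filter as in the post, applied to an in-memory sequence.
    public static List<Details> RecentDetails(IEnumerable<Article> articles, int minYear)
    {
        var query = from art in articles
                    where art.details.Document != null
                       && art.details.Document.bibtex != null
                       && art.details.Document.bibtex.year >= minYear
                    select art.details;
        return query.ToList();
    }

    public static void Main()
    {
        var articles = new List<Article>
        {
            new Article { details = new Details { Document = new Document { bibtex = new BibTex { title = "A", year = 1998 } } } },
            new Article { details = new Details { Document = new Document { bibtex = new BibTex { title = "B", year = 2003 } } } },
            new Article { details = new Details { Document = new Document() } },  // no BibTeX part
            new Article { details = new Details() },                            // no document at all
        };

        foreach (var d in RecentDetails(articles, 2002))
            Console.WriteLine(d.Document.bibtex.title);   // prints B
    }
}
```

Because && short-circuits, the year comparison is only evaluated for articles whose document and BibTeX part were actually found.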

Here is an example of a class that defines how to read the "BibTeX" part of the Web page with details for an article:

public class CsBibTex {
    [StartPart("author = \"")] [EndPart("\"")] public string author;
    [StartPart("title = \"")] [EndPart("\"")] public string title;
    [StartPart("year = ")] [EndPart(",")] public int year;
}
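
To illustrate how such attributes can drive the extraction, here is a minimal sketch that uses reflection to fill the fields of an annotated class from a page text. It is an assumption about the general technique, not the implementation in WebContext.cs: the attribute classes are re-declared here as simplified stand-ins, PartReader and its Read method are hypothetical names, and the sample BibTeX text is made up. It assumes a reasonably recent C# compiler (C# 6 or later).

```csharp
using System;
using System.Globalization;
using System.Reflection;

// Simplified stand-ins; the attributes in WebLinqAttributes.cs may differ.
[AttributeUsage(AttributeTargets.Field)]
public class StartPartAttribute : Attribute
{
    public string Text { get; }
    public StartPartAttribute(string text) { Text = text; }
}

[AttributeUsage(AttributeTargets.Field)]
public class EndPartAttribute : Attribute
{
    public string Text { get; }
    public EndPartAttribute(string text) { Text = text; }
}

public class CsBibTex
{
    [StartPart("author = \"")] [EndPart("\"")] public string author;
    [StartPart("title = \"")] [EndPart("\"")] public string title;
    [StartPart("year = ")] [EndPart(",")] public int year;
}

// Hypothetical reader: fills each annotated field with the text found
// between its start and end markers, converted to the field's type.
public static class PartReader
{
    public static T Read<T>(string page) where T : new()
    {
        object result = new T();   // boxed so SetValue also works if T is a struct
        foreach (FieldInfo field in typeof(T).GetFields())
        {
            var start = field.GetCustomAttribute<StartPartAttribute>();
            var end = field.GetCustomAttribute<EndPartAttribute>();
            if (start == null || end == null) continue;   // not annotated

            int s = page.IndexOf(start.Text, StringComparison.Ordinal);
            if (s < 0) continue;                          // start marker not found
            s += start.Text.Length;
            int e = page.IndexOf(end.Text, s, StringComparison.Ordinal);
            if (e < 0) continue;                          // end marker not found

            string raw = page.Substring(s, e - s).Trim();
            field.SetValue(result,
                Convert.ChangeType(raw, field.FieldType, CultureInfo.InvariantCulture));
        }
        return (T)result;
    }
}

public static class PartReaderDemo
{
    public static void Main()
    {
        // Made-up text in the shape of a BibTeX entry.
        string page = "@article{x, author = \"A. Author\", title = \"Some Title\", year = 2003, }";
        CsBibTex bib = PartReader.Read<CsBibTex>(page);
        Console.WriteLine($"{bib.author}; {bib.title}; {bib.year}");
        // prints: A. Author; Some Title; 2003
    }
}
```

Fields whose markers are not found keep their default values in this sketch, which is why the query shown earlier checks for null before reading the year.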

This sample code is provided as-is and does not come with any warranty.
You can modify and use the code for commercial and non-commercial purposes.

WebLinq.zip

Comments

  • Anonymous
    February 28, 2008
    Here are some useful links to LINQ information. Use the comments or write me if you want to add to this

  • Anonymous
    February 28, 2008
    I've recently updated the list of LINQ Providers found on my Links to LINQ page, accessible from the

  • Anonymous
    March 18, 2008
    I mentioned in a post a little while ago about the various LINQ To projects I had seen, but Charlie Calvert

  • Anonymous
    March 22, 2008
    LINQ Providers LINQ to Amazon LINQ to Active Directory LINQ over C# project LINQ to CRM LINQ To Geo

  • Anonymous
    November 11, 2008
    Official: LINQ to SQL (DLINQ) LINQ to XML (XLINQ) LINQ to XSD LINQ to Entities BLINQ PLINQ Unofficial

  • Anonymous
    April 26, 2009
    This weekend I’ve built a small application, which queries the “Simpsons” seasons guide data and updates
