XQuery Contains vs Fulltext Contains

The XQuery/XPath 2.0 functions and operators specification defines a function called “contains.” It performs a substring match on the two input strings and returns true if the value of the 2nd string is contained in the value of the 1st string. An example:

 select convert(xml, '').query('contains("foo", "oo")')

This query will return the value “true” because “oo” is a substring of “foo”. There is one gotcha here, remember that we perform comparisons in our “XML Collation” for all XQuery operations, which means that comparisons are essentially code-point comparisons and are thus case sensitive. So the following query will return false:

 select convert(xml, '').query('contains("xquery is an xml query language", "XQuery")')

This would all be well and good, except T-SQL also defines a CONTAINS function which uses SQL Server’s fulltext search capabilities. As such, people from the T-SQL world who are just starting to play with XQuery are somewhat confused when they realize that T-SQL CONTAINS and XQuery contains are really two completely different things. Now, T-SQL CONTAINS will of course work with the new XML Datatype, and can be used quite effectively to augment our XQuery features for finding data within XML instances. However, it is important to point out that the two functions are different by design, and the results that each get are not meant to overlap (although they occasionally do) and cannot be used as replacements for each other. 

In short: XQuery contains implements simple substring matching.  T-SQL contains implements fulltext matching and thus does all of the cool fulltext things such as noise word removal and ranking.

Update: I am copying this out of the comments. Kent talked about this very same issue about a month ago.  He also gives some examples of using fulltext search over XML data.  I recommend checking it out.

Comments

  • Anonymous
    September 21, 2005
    Wow, this sounds like something I did a while back: http://www.sqljunkies.com/WebLog/ktegels/archive/2005/08/12/16395.aspx
  • Anonymous
    September 21, 2005
    Hey Kent, I suspect it was the same newsgroup post that got us both thinking :). We here in the product group have had a couple of people confused by this, so we figured it would be a good idea to have something to point to which would indicate the difference.
  • Anonymous
    September 21, 2005
    I bet so! :)