Fulltext Search Improvements in SQL Server code-named 'Denali' CTP1

The Fulltext Search codebase has been significantly revamped to improve both query performance and throughput at large scale (millions of documents) with concurrent updates. With SQL Server 2008 we moved all index storage into the database file and most of the population logic into the core engine, making fulltext search an integral and fully manageable engine component. However, there was still work to be done to make it perform and scale against the best fulltext engines in the industry. With Denali CTP1, we are pleased to deliver this improvement.

We looked at the entire codebase: from how queries block while waiting for an ongoing index update to release a shared schema lock, to how much memory is allocated during index fragment population, to how we could reorganize the query code as a streaming table-valued function to optimize TOP N search queries, to how we could maintain key distribution histograms to execute searches on parallel threads, all the way to how we could take better advantage of processor instructions (when computing rank scores, for example). The end result is that we were able to significantly boost performance (10X in many cases involving concurrent index updates under large query workloads) and scale, without changing any storage structures or the existing API surface. All customers moving from SQL Server 2008 / R2 to Denali will benefit from these improvements.
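To illustrate why a streaming query shape helps TOP N searches, here is a minimal Python sketch. It is not SQL Server's implementation. It keeps only the N best-ranked matches in a bounded min-heap while (key, rank) pairs stream in, instead of materializing and sorting the whole result set. The function name `top_n_matches` and the (key, rank) tuple format are hypothetical, chosen for illustration.

```python
import heapq
from typing import Iterable, List, Tuple


def top_n_matches(matches: Iterable[Tuple[int, float]], n: int) -> List[Tuple[int, float]]:
    """Return the n highest-ranked (key, rank) pairs, best first.

    The input is consumed as a stream and at most n items are held at once,
    so memory stays O(n) no matter how many documents match.
    """
    if n <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # min-heap of (rank, key); root = weakest kept match
    for key, rank in matches:
        if len(heap) < n:
            heapq.heappush(heap, (rank, key))
        elif rank > heap[0][0]:
            # The new match beats the weakest of the current top n: replace it.
            heapq.heapreplace(heap, (rank, key))
    # Order the survivors from highest to lowest rank.
    return [(key, rank) for rank, key in sorted(heap, reverse=True)]


if __name__ == "__main__":
    stream = iter([(101, 0.2), (102, 0.9), (103, 0.5), (104, 0.7)])
    print(top_n_matches(stream, 2))  # [(102, 0.9), (104, 0.7)]
```

Because the consumer only needs N rows, a producer that yields matches incrementally lets the query stop holding (and sorting) everything else, which is the general benefit a streaming shape offers for TOP N workloads.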

Besides the performance and scale improvements, we also added support for property-scoped searches over documents stored in a fulltext-enabled table, using the file system properties of those documents. One can now issue a CONTAINS query that looks for all documents containing a particular term and authored by a particular author, without having to maintain a separate Author column in the database.
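Conceptually, property scoping means the index remembers which property each indexed term came from, so a query can ask for a term within one property only. The toy Python sketch below models that idea with a hypothetical `PropertyIndex` class whose postings are keyed by (property, term), with the document body stored under a hypothetical pseudo-property named "body". It mirrors only the lookup semantics described above and makes no claim about how SQL Server actually stores its index.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

BODY = "body"  # hypothetical pseudo-property name for the document text itself


class PropertyIndex:
    """Toy inverted index that records which property each term came from."""

    def __init__(self) -> None:
        # (property name, lowercase term) -> set of document keys
        self._postings: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    def add_document(self, key: int, body: str, properties: Dict[str, str]) -> None:
        # Index the document text under the BODY pseudo-property.
        for term in body.lower().split():
            self._postings[(BODY, term)].add(key)
        # Index each extracted property value under its own property name.
        for name, value in properties.items():
            for term in value.lower().split():
                self._postings[(name, term)].add(key)

    def search(self, term: str, prop: str = BODY) -> Set[int]:
        """Keys of documents containing `term` within property `prop`."""
        # .get() avoids inserting empty entries into the defaultdict.
        return set(self._postings.get((prop, term.lower()), set()))


if __name__ == "__main__":
    idx = PropertyIndex()
    idx.add_document(1, "quarterly sales report", {"Author": "Jane Smith"})
    idx.add_document(2, "notes from Smith meeting", {"Author": "Raj Patel"})
    # Documents mentioning "sales" AND authored by Smith, with no Author column.
    print(idx.search("sales") & idx.search("smith", "Author"))  # {1}
    # An unscoped body search for "smith" finds a different document.
    print(idx.search("smith"))  # {2}
```

The second lookup shows why scoping matters: document 2 mentions "Smith" in its text but was not authored by Smith, so only a property-aware index can tell the two cases apart.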

We also improved the NEAR operator in the CONTAINS predicate so that users can specify the maximum distance between two terms and whether the order of the terms matters. It is important to note that the distance between two words in a single sentence is much smaller than the distance between the same two words across two sentences (even if the words sit next to each other with a period in between), across paragraphs, across bullet points, or across spreadsheet columns or worksheets.
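The following minimal Python sketch illustrates both ideas: a proximity test with a maximum distance and an optional order requirement, and word positions that grow by an extra gap at sentence and paragraph boundaries, so words separated by a period end up farther apart than adjacent words. The gap sizes (`SENTENCE_GAP`, `PARAGRAPH_GAP`), the function names, and the definition of distance as a plain difference in word positions are all assumptions for illustration; SQL Server's own distance semantics and boundary handling may differ.

```python
import re
from typing import Dict, List

# Hypothetical gap sizes: extra positions inserted at structural boundaries.
SENTENCE_GAP = 8
PARAGRAPH_GAP = 64


def word_positions(text: str) -> Dict[str, List[int]]:
    """Map each lowercase word to its positions, adding gaps at boundaries."""
    positions: Dict[str, List[int]] = {}
    pos = 0
    for p_index, paragraph in enumerate(text.split("\n\n")):
        if p_index > 0:
            pos += PARAGRAPH_GAP  # paragraphs are pushed far apart
        for s_index, sentence in enumerate(re.split(r"[.!?]+", paragraph)):
            words = re.findall(r"\w+", sentence.lower())
            if not words:
                continue
            if s_index > 0:
                pos += SENTENCE_GAP  # a sentence break adds distance
            for word in words:
                positions.setdefault(word, []).append(pos)
                pos += 1
    return positions


def is_near(positions: Dict[str, List[int]], a: str, b: str,
            max_distance: int, in_order: bool = False) -> bool:
    """True if some occurrences of a and b are within max_distance positions.

    With in_order=True, the occurrence of `a` must come before that of `b`.
    """
    for pa in positions.get(a.lower(), []):
        for pb in positions.get(b.lower(), []):
            if in_order and pb <= pa:
                continue
            if abs(pb - pa) <= max_distance:
                return True
    return False


if __name__ == "__main__":
    pos = word_positions("Fulltext search is fast. Search engines scale.\n\nIndexes matter.")
    print(is_near(pos, "fulltext", "search", 1))                 # True
    print(is_near(pos, "search", "fulltext", 1, in_order=True))  # False
    print(is_near(pos, "fast", "engines", 5))                    # False: sentence gap
```

Without the sentence gap, "fast" and "engines" would be only two positions apart and the last query would match; the gap is what makes words across a sentence boundary count as far apart, as described above.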

We invite you to give SQL Server code-named 'Denali' a try and let us know what you think of the Fulltext Search improvements. Also enclosed is a presentation that summarizes these feature improvements in the CTP1 release.

Deep Dive on new Search Features in Denali CTP1.pdf

Comments

  • Anonymous
    November 02, 2011
    Still no word on hit highlighting?