Custom proximity search with NEAR in SQL Server 2012
When you perform a proximity search with NEAR in versions of SQL Server prior to SQL Server 2012 "Denali," we decide what's NEAR for you. In SQL Server 2012, however, you're in control! With the new customizable NEAR operator, your full-text query can now specify 2 optional conditions that you couldn't specify before:
- How near the search terms must be to one another.
- Whether Term A must come before or after Term B.
Here's a quick example of the new NEAR operator:
… CONTAINSTABLE (Documents, Content, 'NEAR((John, Smith), 4, TRUE)')
This is a new operator, not an enhancement to the existing NEAR operator. The generic NEAR operator is deprecated in SQL Server 2012. It continues to work, but should not be used in new development.
1. Syntax of the custom proximity operator
1.1 Search terms
You can specify 2 or more search terms. As with generic NEAR, all search terms must be found in the target text to qualify as a hit.
- Separate multiple terms with commas.
- Terms can be single words or phrases.
- Terms can be simple terms ('A') or prefix terms ('A*').
- The query can use certain other query operators with NEAR. (See later in this post.)
- A query for a single term returns an error.
- The maximum number of terms is 64. A larger number returns an error.
- Use double quotes to delimit phrases.
What about noise words among the query terms?
With the default setting – Transform noise words = 0 – the query raises a warning and does not return results if one or more of the search terms is a noise word.
With the setting Transform noise words = 1, the noise word among the query terms is ignored. If only 1 of the search terms is not a noise word, then the query effectively becomes a search for that term. For example:
CONTAINS(NEAR(('the', 'dog'), 3)) => CONTAINS('dog')
What about overlapping query terms? We only return non-overlapping occurrences. For example, if you performed the following query against the text 'AA' -
NEAR((A, AA), 2, TRUE)
- we will not return a hit, because the occurrences overlap. In fact, a query for overlapping terms can fail due to its complexity.
1.2 Distance between terms – optional max_gap argument
For detailed information about the optional max_gap argument, see my blog post, How near is NEAR in SQL Server 2012 Full-Text Search?
1.3 Order of terms – optional order argument
The optional Boolean order argument enforces order among the search terms. The default value is FALSE. If you want to specify order, then you also have to specify max_gap.
1.4 Right-to-left (RTL) languages are respected
In Arabic, Hebrew, and other RTL languages, the NEAR semantics apply in reverse. For example, NEAR((John, Smith), 3, TRUE) will match documents that contains "Smith xx xx John."
2. Using other operators in a NEAR query
You cannot use ISABOUT or FORMSOF with the custom proximity operator.
You cannot combine the new NEAR operator in a query with the generic ("old") NEAR operator.
However, you can use AND and OR, as shown in the following examples:
CONTAINS('NEAR((A, B), 5) AND C')
CONTAINS('NEAR((A, B), 5) OR C')
CONTAINS('NEAR((A, B), 5) AND/OR NEAR((C, D), 3)')
I hope this helps you to use the new NEAR operator in SQL Server 2012 successfully! Subscribe to our blog feed for ongoing updates about search capabilities in SQL Server.
-Doug