Row Goals in Action
Today, we'll talk about row goals. The optimizer in SQL Server has a feature that can bias plan choices to retrieve a certain number of rows quickly instead of the whole results. This shows up in a few places, but the primary areas are in TOP N queries and in subqueries that need to check for the existance of "a" row - WHERE EXISTS, for example. The query hint "option (fast N)" also translates into the same feature.
One of the problems that can arise with row goals is that the query plan can change when you add them into a query request. Especially with the option(fast N) hint, you can find cases where the first row may come back quickly but the whole results come back more slowly. So, if you send option(fast N) but retrieve the whole results, your system won't perform as well.
Effectively, we bias the optimizer to favor plans that can return a few rows quickly compared to the minimum cost to return all rows. In practice, this often means that joins will choose nested loops joins for row-goal limited plans and hash joins otherwise.
The following example demonstrates the issue:
use tempdb
create table A(col1 int, col2 binary(100), col3 int)
declare @i int
set @i = 0
while @i < 5000
begin
insert into A(col1, col2, col3) values (@i, 0x3333, rand()*1000)
set @i = @i + 1
end
create clustered index i1 on A(col1)
set statistics time on
-- should pick a series of hash joins
select A1.* from
A as A1 inner join A as A2 on A1.col1 = A2.col1
inner join A as A3 on A1.col1 = A3.col1
-- if there is a row goal, we’ll pick the loop join plan that returns one (or a few) row(s) quickly
set statistics time on
select A1.* from
A as A1 inner join A as A2 on A1.col1 = A2.col1
inner join A as A3 on A1.col1 = A3.col1
option (fast 1)
You can run these on your installation to see the time difference for retrieving all rows with the hash join plan vs. the loop join plan (just keep adding rows until you see the difference).
This pattern is caused by the row goal logic in the optimizer. When we have a very low row goal, the nested loops join is preferred because its initial cost (the cost for the first row) is comparatively low – it just involves a single seek for this example. The hash join plan has a higher initial cost because it has to build a hash table before any rows can be returned. Once built, however, the hash join plan is generally cheaper and would be picked if the estimated number of rows gets large.
Comments
Anonymous
March 30, 2006
Very cool ....Anonymous
April 21, 2006
There is very little documentation about FAST and how it could be used in the context of an application. Could you point to some more information - Books Online has very little on FAST.Anonymous
September 25, 2006
Since we’re starting a new blog for the whole query processing team, we think it’s appropriate to start...Anonymous
November 20, 2007
PingBack from http://hyiplive.org/row-goals-in-actionAnonymous
May 15, 2008
PingBack from http://scarydba.wordpress.com/2008/05/15/fast-n-query-hint/Anonymous
June 16, 2009
PingBack from http://fixmycrediteasily.info/story.php?id=15924