Row Goals in Action

Today, we'll talk about row goals. The optimizer in SQL Server has a feature that can bias plan choices to retrieve a certain number of rows quickly instead of the whole results. This shows up in a few places, but the primary areas are in TOP N queries and in subqueries that need to check for the existance of "a" row - WHERE EXISTS, for example. The query hint "option (fast N)" also translates into the same feature. 

 

One of the problems that can arise with row goals is that the query plan can change when you add them into a query request. Especially with the option(fast N) hint, you can find cases where the first row may come back quickly but the whole results come back more slowly. So, if you send option(fast N) but retrieve the whole results, your system won't perform as well.

 

Effectively, we bias the optimizer to favor plans that can return a few rows quickly compared to the minimum cost to return all rows. In practice, this often means that joins will choose nested loops joins for row-goal limited plans and hash joins otherwise.

The following example demonstrates the issue:

use tempdb

create table A(col1 int, col2 binary(100), col3 int)

declare @i int

set @i = 0

while @i < 5000

begin

insert into A(col1, col2, col3) values (@i, 0x3333, rand()*1000)

set @i = @i + 1

end

create clustered index i1 on A(col1)

set statistics time on

-- should pick a series of hash joins

select A1.* from

A as A1 inner join A as A2 on A1.col1 = A2.col1

inner join A as A3 on A1.col1 = A3.col1

-- if there is a row goal, we’ll pick the loop join plan that returns one (or a few) row(s) quickly

set statistics time on

select A1.* from

A as A1 inner join A as A2 on A1.col1 = A2.col1

inner join A as A3 on A1.col1 = A3.col1

option (fast 1)

You can run these on your installation to see the time difference for retrieving all rows with the hash join plan vs. the loop join plan (just keep adding rows until you see the difference).

This pattern is caused by the row goal logic in the optimizer. When we have a very low row goal, the nested loops join is preferred because its initial cost (the cost for the first row) is comparatively low – it just involves a single seek for this example. The hash join plan has a higher initial cost because it has to build a hash table before any rows can be returned. Once built, however, the hash join plan is generally cheaper and would be picked if the estimated number of rows gets large.

Comments