Introduction to Statistics in SQL Server


This article discuses about statistics in  SQL Server. You would have heard the term "Statistics" many times if you have idea about how execution plans are derived internally by SQL Server optimizer. So what are we waiting for! Let's get started!

What are Statistics

Statistics represent the distribution of data within a column or columns that are used by the SQL optimizer to come up with a optimum plan. The optimizer makes use of these stats to estimate the number of rows that will be returned by the query when it is actually executed. In simple terms,the estimated execution plan is generated by making use of statistics. When statistics are stale(out of date) or no statistics exists for a column/columns, then there exists a high probability of  data mismatch between actual no of rows and estimated no of rows resulting in poor execution plan generation and performance degradation.

When and How Statistics are created

Statistics are automatically created,

1. When index is created on a table.

2. When the database setting AUTO_CREATE_STATISTICS is set to ON. When setting this option to on, SQL Server will automatically create statistics for non-indexed columns that are used in your queries.

Syntax: ALTER DATABASE <dbname> SET AUTO_CREATE_STATISTICS ON

OR Right click <dbname> in Object Explorer-->Properties-->Options--->Auto Create Statistics--> True.

The name of the auto created statistics includes the column name and object id in Hexadecimal format Prefixed with _WA_SYS_

Eg : _WA_SYS_<Columnname>_<XXXX>

When and How Statistics are updated

Statistics are checked before query compilation or before executing a cached query plan. Statistics are considered to be stale when,

1. There was a data change on an empty table.

2. The number of rows in the table was 500 or less at the time of statistics creation and the column modification counter of the leading column of the statistics object has changed by more than 500 since then.

3. The table had more than 500 rows when the statistics were gathered, and the column modification counter of the leading column of the statistics object has changed by more than 500 + 20% of the number of rows in the table when the statistics were gathered.

4. A table in TempDB with less than 6 rows had at least 6 row modifications.

In such cases, the statistics are automatically updated by SQL server if the database setting  AUTO_UPDATE_STATISTICS is set to ON.

Syntax: ALTER DATABASE <dbname> SET AUTO_UPDATE_STATISTICS  ON

OR Right click <dbname> in Object Explorer-->Properties-->Options--->Auto Update Statistics--> True.

Statistics updation can also be done asynchronously when AUTO_UPDATE_STATISTICS_ASYNC is set to ON. In this case, the query optimizer will run the query first and update the statistics afterwards.

Syntax: ALTER DATABASE <dbname> SET AUTO_UPDATE_STATISTICS_ASYNC  ON

How to find stale Statistics

Statistics that are out of date can be found using the below query

SELECT OBJECT_NAME(id)as 'Objectame',name,STATS_DATE(id, indid) as  'StatisticsLastUpdatedDate',rowmodctr
FROM sys.sysindexes
WHERE STATS_DATE(id, indid)<=DATEADD(DAY,-1,GETDATE()) 
AND rowmodctr>0 
AND id IN  (SELECT  object_id FROM  sys.tables)order  by OBJECT_NAME(id)
GO

The rowmodctr field of sys.sysindexes shows how many rows were inserted or deleted or updated since the last statistics updation.

How to read Statistics

We can read statistics of a column using DBCC SHOW_STATISTICS. The output is in three parts namely,

1. Header : which contains meta-data about the set of statistics.

2. Density: Which shows the density values for the column or columns that define the set of statistics.

3. Histogram: The table that defines the histogram laid out above.

Syntax : DBCC SHOW_STATISTICS('<object name>','<Statname>')

Example : 

DBCC SHOW_STATISTICS('employeeheap', 'employeeheapPK')

The Header,Density and Histogram information will be explained in detail in the next wiki article :)

Note : DBCC SHOW_STATISTICS is an undocumented command. Hence it can be either deprecated or enhanced in future versions of SQL Server

Hope it is useful :)


Reference

http://msdn.microsoft.com/en-gb/library/ms190397.aspx
http://msdn.microsoft.com/en-gb/library/ms188038.aspx

See Also