TempDB Monitoring and Troubleshooting: Allocation Bottleneck

This blog continues the discussion on the common issues in TempDB that you may need to troubleshoot. In the blog https://blogs.msdn.com/sqlserverstorageengine/archive/2009/01/05/tempdb-monitoring-and-troubleshooting-io-bottleneck.aspx, we discussed how to identify and troubleshoot IO bottleneck in TempDB. In this blog, I will describe how to indentify allocation bottleneck and to troubleshoot it.

As you may recall, the allocation bottleneck is caused when allocation structures are accessed by concurrent threads in conflicting modes. Please refer to https://blogs.msdn.com/sqlserverstorageengine/archive/2009/01/04/what-is-allocation-bottleneck.aspx for details. In this case, the concurrent thread(s) will need to wait to acquire the pagelatch thereby slowing the operation. The good thing is that the pages containing allocation structures (GAM, SGAM, PFS) are well known and have fixed page numbers in each file. For example, in file-id 1, the allocation pages IDs are

· PFS – 1

· GAM – 2

· SGAM - 3

A PFS page will appear every 8088 pages in a file. The GAM and SGAM will appear every 511232 pages and similarly in other files.

Diagnosing:

You can run the following DMV query to find any latch waits that occur in allocation pages

select   session_id, wait_duration_ms,   resource_description

      from sys.dm_os_waiting_tasks

      where wait_type like 'PAGE%LATCH_%' and

            resource_description like '2:%'

Since the database id of TempDB is 2, the search argument ‘2.%’ represents any page in TempDB across any file. If this page happens to be GAM, SGAM or PFS, it will represent allocation bottleneck. Note, in a concurrent application, some blocking is expected so you will need to baseline the allocation waits when your application is performing normally. Only when the waits exceed the baseline significantly, it signals that you are incurring allocation bottleneck.

Troubleshooting:

SQL Server recommends the following

· Create atleast as many files of equal size as there are COREs/CPUs for SQL Server process. The rationale is that at a given time, the number of concurrent threads is <= number of COREs/CPUs. Don’t confuse this with the number of active sessions/batches.

· Enable TF-1118 if you are encountering bottleneck in SGAM structures.

· If you are still encountering allocation bottleneck, you will need to look at your application and see which query plans are creating/dropping objects in TempDB and if these objects are being cached and take corrective action when possible. Most IT shops have very limited choice here as they don’t own the application code.

Here I want to point out one seemingly harmless step to solve allocation bottleneck that can in fact make it worse. Say you are encountering allocation bottleneck and you decide to add one more file to the mix hoping that allocation will spread further. Well, if you recall proportional fill methodology; all new allocations will favor the newer file because it has the most free space. So suddenly, the allocation problem becomes even worse. It is a good practice to restart SQL Server when you add a new file, clearly it needs to be of the same size as other files, to TempDB.

Thanks

Sunil Agarwal

Comments

  • Anonymous
    January 10, 2009
    PingBack from http://blog.a-foton.ru/index.php/2009/01/11/tempdb-monitoring-and-troubleshooting-allocation-bottleneck/

  • Anonymous
    February 02, 2009
    The comment has been removed

  • Anonymous
    February 02, 2009
    No, I have not. Does your monitoring showing contention in tempdb allocaton structures? Suggest you compare your numbers with baseline thanks Sunil

  • Anonymous
    February 12, 2009
    How many times in a day you observe the TEMPDB in your SQL Server enviornment? How many times in a week

  • Anonymous
    August 30, 2010
    You say "Create atleast as many files of equal size as there are COREs/CPUs for SQL Server process. " but we now have servers with 16, 24, 36, and soon 64 processors.  In technet.microsoft.com/.../cc966545.aspx it says "Having too many files increases the cost of file switching, requires more IAM pages, and increases the manageability overhead." At what point do we hit too many files and performance degrades?

  • Anonymous
    August 30, 2010
    This is a great question. I will dig into this and post it here or in a separate blog. Some of it will depend on your workload as well. I will not worry about the overhead of IAM pages as the number of IAM pages is very low compared to regular pages but I agree that having more files makes it harder to manage.

    • Anonymous
      December 16, 2016
      8 should be the maximum number of files we should create
  • Anonymous
    August 30, 2010
    Some comments on the question 'how many files for TempDB in > 64 processor/core configuration' Mike Ruthruff (SQLCAT team and storage expert) >>  I would tend to think that the benefit of scaling out the structures would outweigh any performance overhead (assuming you have a heavy rate of allocations). Pravin Mittal (Performance owner): We have tried 128 files in 100% CPU environment with no performance degradation.