SQL Server 2005 SP2 will introduce new messages to the error log related to timing activities.

The following article has been publiched about this subject: https://support.microsoft.com/kb/931279  

RDTSC Drift vs Not Synchronized

To clarify - SQL Server 2005 SP2 contains both Drift and a Non Synchronized warnings.

The Non Synchronized message is only recorded on 1 hour intervals to avoid flooding the log. At most the server will report 1 entry/per scheduler/per hour.

To detect the Non Synchronized condition SQL Server uses the current RDTSC of the base scheduler and compares the RDTSC of subsequent schedulers.

Variance > ~3ms – Consider Non Syncronized Counter Warning

Variance > ~10% of mean average - Print Drift Counter Warning

The system should avoid either warning but the Drift is more concerning.

The recent PASS (PSS Labs) actually encountered these messages so I felt it prudent to publish some basic information about them. The following messages may appear in the SQL Server error log.

The time stamp counter of CPU on scheduler id 2 is not synchronized with other CPUs.

- or -

CPU time stamp frequency has changed from 191469 to 1794177 ticks per millisecond. The new frequency will be used.

I am in the process of publishing a detailed KB article - portions of that article provided here.

SUMMARY

The Windows operating system exposes timing mechanisms using millisecond precision values. The precision is typically 10 to 15ms but can be as large as 55ms. SQL Server queries frequently complete within single digit millisecond or microsecond time spans, requiring a higher resolution timer. Therefore, prior SQL Server versions would report the duration of some queries as 0ms. This makes it difficult to track and tune SQL Server performance.

SQL Server 2005 improves accuracy by using the high-resolution CPU counter to provide micro-second timing capabilities. Occasionally, when SpeedStep, PowerNow, Cool-n-Quiet, various Power Schemes, or similar technologies are being used the reported timing values can be incorrect. The following is a not a comprehensive list of affected objects and features.

· Trace

o Command Level Events (SQL:BatchCompleted, RPC:Completed, …)

o Statistics Generation Events (Auto Stats, …)

o Attention

o Transaction Events

· set statistics time

· sys.dm_exec_query_stats

· sysprocesses

· sys.dm_exec_requests

· sys.dm_exec_sessions

· sys. dm_io_pending_io_requests

· sys.dm_os_ring_buffers

· sys.dm_os_sys_info

· sys.dm_io_virtual_file_stats

· sys.dm_os_wait_stats

WORK AROUNDS

1. Disable Frequency Changes

Use the Control Panel | Power Options program. Setting the computer to the ‘Always On’ or ‘Max Performance’ power scheme generally forces the CPUs to remain at maximum frequency.

Note: It may be insufficient to disable CPU frequency variation features at the BIOS level. There are various third party utilities that can alter CPU frequencies. Some implementations enable frequency adjustments even when under maximum power scheme settings.

2. Enable SQL Server Processor Affinity

Drift-related problems can be reduced by preventing SQL worker threads from moving between CPUs. In SQL Server 2005 it is possible to dynamically change affinity without restarting the SQL Server service.  

2. Synchronize Clock Frequencies and Counters (Drift)

Some systems require utilities and drivers from the hardware manufacturer to help keep frequency changes and CPU clock counters in lock step.

Contact your hardware manufacturer for complete details about clock frequency and counter optimizations.

Bob Dorr
Senior SQL Server Escalation Engineer

 

May 9, 2007 Update 

Microsoft has recently received numerous reports of DRIFT warnings. We continue to investigate these warnings but at this time it appears the warning may be logged prematurely by SQL Server 2005 SP2.

What appears to be happening is that the message is based on the RDTSC counter of each scheduler and the latest value is retrieved from the sys.dm_os_ring_buffers (SCHEDULER_RING_BUFFER) record. If a scheduler is in an idle state (no immediate work to do) the calculation does not account for this situation and uses the last ring buffer entry for the scheduler as the current RDTSC. Since this entry can be > 3MS the message may be inadvertently logged.

Unless you have strong reason to believe the system is experiencing scheduling or performance problems the DRIFT message may be getting logged inadvertently on your system and can often be safely ignored until other performance behaviors are coorolated.

Microsoft Support uses a utility called RDTSCTest to help track down possible RDTSC variations. Even with this utility you have to be careful because there is no call on the system to capture the current state of every CPUs RDTSC at the same time. The capture of each CPUs RDTSC is done in a very tight loop but even that can show variations. We could start a thread on each CPU and have them wait on an event then signal it but we can’t control the operating system dispatch logic so even with all our attempts we can’t get exact RDTSC timing on all CPUs at the same time.

Repeated use of the utility is good to monitor RDTSC trending or exposing very large gaps but even that is compromised. On a dual core system if you use one of the cores the other one can throttle down and the RDTSCs will drift. When the system gets enough load the second code will pick up the load and often the RDTSC comes back in line. So attempting to spin on a set of CPUs to determine RDSTC is not full proof.

We are also working to remove SQL Server reliance on the RDTSC counter for future releases so these issues are no longer of consequence.

Bob Dorr
Senior SQL Server Escalation Engineer

May 10, 2007 Update

I have repeated this several times so I felt it worth posting for all.

Prior to SQL Server 2005 Microsoft SQL Server support encountered several issues that the duration was not granular enough. Older versions used GetTickCount() which is limited to a granularity around 12ms. It was possible to provide a system and queries running 1000s of times would report 0 duration and be ignored as a primary tuning opportunity.

To combat this SQL Server 2005 moved to the RDTSC counter to provide micro-second timing. The trace file format actually contains the micro-second values and the Profiler GUI converts these into milli-second values. The change was made early in the SQL Server 2005 development cycle (year 2000 time-frame) to locations that provide performance outputs such as Trace durations, statistics profile and others as outlined in the KB article 931279. The messages of DRIFT and OUT OF SYNC were added to SQL Server 2005 SP2 to provide warnings to customers that the performance data they are looking at might be compromised. Support has been burnt a few times looking at the duration column in profiler and it is not correct because the NT thread was moved between physical CPUs and the values are not near the same area as Stop Time – Start Time from the same event. These messages are first and foremost warnings to be cautions when looking at the performance output.

· The RDTSC counter (instruction) is a lightweight, 64 bit counter of the current CPU ticks. It is reported by the local CPU and ticks at the frequency of the CPU. For example a 1.5Ghz CPU ticks at 1.5Ghz * 1000 or 1500000 ticks per milli-second.

· GetTickCount is limited in its granularity.

· QueryPerformanceCounter is a more stable counter but very costly when called lots of times like SQL Server would do. It can also fall back on the RDTSC and encounter similar problems as the current implementation.

One problem support is seeing is multiple CPUs on the same machine don’t have a sync’ed RDTSC. Some of these are known manufacture issues but many times it is expected behavior. The biggest problem with determing this is that it is difficult to obtain the RDTSC values from all CPUs as the same time. You can create threads and affinitize on each CPU and then signal them all to run at one time but you are limited by the scheduling of the operating system dispatcher. There is no easy way to achieve the RDTSC in a synchronized way across all CPUs or CPUs in a group such as a NUMA node. Because of this utilities looking for drift often report some level of drift that would be considered noise.

SQL Server uses the SchedulerMonitor thread to monitor the health of the schedulers. There is a SchedulerMonitor per NUMA Node so it is possible that the RDTSC is significantly different on different nodes but SQL Server won’t report this (take a cable connected NUMA system as an example). SQL Server only reports DRIFT states within the same node.

Comments

  • Anonymous
    June 20, 2007
    I am facing the exact same problem in my environment (SQL Server 2005 SP2, IA64 montecito, Windows 2003 SP1, and SAP application) where I got the message 'The time stamp counter of CPU on scheduler id 4 is not synchronized with other CPUs' in SQL Server 2005's ERRORLOG. I changed the power scheme, enable SQL Server Processor Affinity and the error message did not appear for 2 days. However, I checked the ERRORLOG again today and found out that the error exists. Is it possible that processor instability is caused by other applications instead of SQL (since the processor affinity has been set)?
  • Anonymous
    August 19, 2007
    I posted some updates to my blog entry about the RDTSC and drift warnings in May. See the following link
  • Anonymous
    February 01, 2008
    Author: Robert Dorr, Keith Elmore, Lindsey Allen Introduction Have you ever asked the question: “If
  • Anonymous
    November 07, 2008
    For the above issue,how do i configure affinity mask and affinity I/O mask if i have 8 processors.
  • Anonymous
    December 16, 2008
    Many of you have encountered the RDTSC timing variances that I outlined in an earlier blog post:  
  • Anonymous
    December 17, 2008
    I recently posted about the availability of SQL Server 2005 SP3. I have a bit more information about
  • Anonymous
    January 23, 2009
    FYI... we have encountered this on 2 different servers when we attempted to create a linked server via the GUI in SQL Server Management Studio to different Oracle databases. Both servers' power schemes were set to "On Always" and that didn't make a difference. When we created the linked servers using SQL scripts, there was no issues.