Why Do I see more threads per SPID in sysprocesses than MAXDOP?
Most of the Customers using sql server 2000,2005 and 2008 have this doubt in their mind when using parallelism in sql server.
Why Do I see more threads per SPID in sysprocesses than MAXDOP?
The above question is being addressed in the following blog by Bob Dorr from SQL Server Escalation Services,
But we will try to provide some more clarification on this issue, in this blog by explaining the behaviour of query processor when dealing with parallelism with the help of following example.
We have used the following script from the book "Inside Sql server 2005 Query Tuning and Optimization" by Kalen Delaney, to induce parallelism in the execution plan,
create table [HugeTable1]
(
[Key] int,
[Data] int,
[Pad] char(200),
Constraint [PK1] PRIMARY KEY ([Key])
)
SET NOCOUNT ON
DECLARE @i int
BEGIN TRAN
set @i = 0
WHILE @i < 250000
BEGIN
INSERT [HugeTable1] Values (@i,@i,NULL)
SET @i = @i + 1
if @i % 1000 = 0
BEGIN
COMMIT TRAN
BEGIN TRAN
END
END
COMMIT TRAN
SELECT [KEY],[DATA],[PAD] INTO [HugeTable2] FROM HugeTable1
ALTER TABLE [HugeTable2] ADD CONSTRAINT [PK2] PRIMARY KEY ([Key])
We use the following query which displays parallelism in it’s execution plan
set statistics profile on
select T1.[Key],T1.[Data],T2.[Data] From HugeTable1 T1 Join [HugeTable2] T2 ON T1.[Key] =T2.[Key] where T1.Data < 100 OPTION (MAXDOP 2)
Execution Plan
|--Parallelism(Gather Streams)
|--Nested Loops(Inner Join, OUTER REFERENCES:([T1].[Key]))
|--Parallelism(Repartition Streams, RoundRobin Partitioning)
| |--Clustered Index Scan(OBJECT:([AdventureWorks2008].[dbo].[HugeTable1].[PK1] AS [T1])
WHERE:([AdventureWorks2008].[dbo].[HugeTable1].[Data] as [T1].[Data]<(100)))
|--Clustered Index Seek(OBJECT:([AdventureWorks2008].[dbo].[HugeTable2].[PK2] AS [T2]),
SEEK:([T2].[KEY]=[AdventureWorks2008].[dbo].[HugeTable1].[Key] as [T1].[Key]) ORDERED FORWARD)
We ran the above query 100 times in a loop,identified the SPID while running the above query and queried the sysprocesses for that spid from another query window at the same time
select * from sys.sysprocesses where spid=56
spid kpid blocked waittype waittime lastwaittype cpu physical_io ecid status
56 5640 0 0x00BB 3 CXPACKET 66653 20605 0 suspended
56 5936 0 0x00BB 3 CXPACKET 2147483647 0 1 suspended
56 1252 0 0x00BB 1 CXPACKET 2147483647 0 2 suspended
56 3508 56 0x0024 0 LATCH_EX 2147483647 0 3 suspended
56 3580 0 0x0000 0 LATCH_EX 2147483647 0 4 runnable
As we can see here we have 5 threads which are spawned to execute the query, however while executing the query we limited the DOP to 2 by specifying the MAXDOP hint to 2.
The reason is “MAXDOP is always specified per operator in the execution plan and not per execution plan”.
So in the above execution plan we have 3 operators in the execution plan viz Clustered Index Scan, Clustered Index Seek and Nested Loops.
So we have
- 2 threads (as per the MAXDOP hint) which are used to perform Clustered Index Scan,
- 2 threads (as per the MAXDOP hint) which are used for Nested Loop Join which also performs Clustered Index Seek to join the rows from outer Clustered Index Scan. In other words we do not have separate threads allocated for Clustered Index Seek opearor,
- 1 thread is used for parallel gather streams operator to collect and converge the parallel thread output from the Nested Loop join operator. This thread is basically a synchronizing thread (also referred as Thread 0 IN xml plan) which is used when there is parallelism in the execution plan of the query.
To justify the above explanation, we can observe the XML execution plan which also show the runtime behaviour of the plan
2 threads used to perform Clustered Index Scan
<RelOp NodeId="3" PhysicalOp="Clustered Index Scan" LogicalOp="Clustered Index Scan"….>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="2" ActualRows="100" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="1" ActualRows="0" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />
</RunTimeInformation>
2 threads used to perform Nested Loop as well the clustered index seek
<RelOp NodeId="1" PhysicalOp="Nested Loops" LogicalOp="Inner Join" ….>
<RunTimeInformation>
<RunTimeCountersPerThread Thread = " 2 " ActualRows = " 50 " ActualEndOfScans = " 1 " ActualExecutions="1" />
<RunTimeCountersPerThread Thread="1" ActualRows="50" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />
</RunTimeInformation>
Same threads used to perform Clustered Index Seek on inner query
<RelOp NodeId="4" PhysicalOp="Clustered Index Seek" LogicalOp="Clustered Index Seek"…>
<RunTimeInformation>
<RunTimeCountersPerThread Thread = " 2 " ActualRows = " 50 " ActualEndOfScans = " 0 " ActualExecutions="50" />
<RunTimeCountersPerThread Thread="1" ActualRows="50" ActualEndOfScans="0" ActualExecutions="50" />
<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />
</RunTimeInformation>
Finally Thread 0 which is shown in all the operators is used to gather streams from all the parallel threads and to provide the final output to the client
<RelOp NodeId="0" PhysicalOp="Parallelism" LogicalOp="Gather Streams"….>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="100" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
To conclude the MAXDOP setting is used to limit the number of threads per operation in the execution plan and does not limit the number of threads used to execute the query. So it is normal to see threads per SPID in sysprocesses greater than MAXDOP setting.
However if you see lot of waits for CX Packets in sys.dm_os_wait_stats and High CPU being consumed by sql server, you can consider reducing the MAXDOP setting or identify the missing indexes which can expedite the processing of the query.
Parikshit Savjani,
SE, Microsoft SQL Server,
Comments
Anonymous
January 04, 2009
PingBack from http://www.codedstyle.com/why-do-i-see-more-threads-per-spid-in-sysprocesses-than-maxdop/Anonymous
January 05, 2009
Excellent blog, this explanation can answer also why you can see more threads than CPUs. I'm allways happy to see explanations that answer the reasoning of the behaviour. Thanks!!Anonymous
January 05, 2009
Excellent blog, this explanation can answer also why you can see more threads than CPUs. I'm allways happy to see explanations that answer the reasoning of the behaviour. Thanks!!Anonymous
November 18, 2012
Here's an excerpt from msdn.microsoft.com/.../ms178065(v=sql.100).aspx 'Degree of parallelism determines the maximum number of CPUs that are being used; it does NOT (emphasis added) mean the number of threads that are being used' This clearly contradicts what's been said in this article. Please do clarify, as this is a major issue. Thanks lots!Anonymous
November 19, 2012
The comment has been removedAnonymous
April 06, 2015
Very informative - Thanks much !!Anonymous
March 24, 2016
Parikshit,This is really use full and thanks for sharing your knowledge with us.I have few questions1)For example if we set Maxdop to 3 and we have few update statements and insert statements in the query, So in this case SQL Server won't allocate more than 3 threads to run one statement? either it could be update or insert ? It is true ?Could you please explain little more on this?Anonymous
March 24, 2016
Parikshit,Thanks for sharing your knowledge with us.I have few questions1)For example if we set Maxdop to 3 and we have few update statements and insert statements in the query, So in this case SQL Server won't allocate more than 3 threads to run one statement? either it could be update or insert ? It is true ?Could you please explain little more on this?