Orphaned Service Broker Queues in the SCOM Database
We recently ran into an issue where our Operations Manager database was full. But the on-call engineer couldn't figure out why. He ran the "Disk Usage By Top Tables" report in SQL Server Management Studio. But the top tables didn't add up to the used space in the database. The missing free space was in SQL Service Broker Queues, and the problem is that a SCOM management server creates a fresh service broker queue every time the SDK service starts. This leads to abandoned service broker queues in the Operations Manager database that build up over time.
NOTE: System Center Service Manager (SCSM) does the same thing.
The first four numbers are the IP address of the management server that created the queue. And the next number is the process ID of the SDK process on that management server. At any given time, only one service broker queue is in use by each management server. So if you have more than one management server, there will be more than one queue in use at a time.
To see if we have a database space problem caused by service broker queues, run the following TSQL query against the Operations Manager database:
SELECT [a].used_pages, [s].name AS [SchemaName], [o].name AS [ObjectName], [o].type_desc FROM sys.allocation_units [a] JOIN sys.partitions [p] ON [a].container_id = [p].partition_id JOIN sys.objects [o] ON [o].object_id = [p].object_id JOIN sys.schemas [s] ON [s].schema_id = [o].schema_id ORDER BY used_pages DESC
Check if you see any of the following SYSTEM_TABLE values near the top.
- sys.sysdercv
- sys.sysdesend
- sys.sysconvgroup
You can read more about those here: https://msdn.microsoft.com/en-us/library/ms179503.aspx
Those indicate that service broker queues are some of the top users of space in the database. There may also be individual queues that are very large. Those will show up as INTERNAL_TABLE type and will be named something like queue_messages_448173538.
NOTE: The cleanup may take hours or even days. I had to end about 20 million conversations and had to leave the cleanup running overnight.
STEP 1 - Clean the Service Queues
Run this query to see which queues have the most conversations in them. If one of the largest is active, meaning the IP address and Process ID match the SDK service on an active management server, you can simple reboot the management server. The SDK service will abandon the old service broker queue and start a new one.
SELECT SCHEMA_NAME(q.schema_id) AS SchemaName ,q.name AS QueueName ,p.rows AS QueueRows ,q.modify_date FROM sys.service_queues AS q JOIN sys.objects AS o ON o.object_id = q.object_id JOIN sys.objects AS i ON i.parent_object_id = q.object_id JOIN sys.partitions p ON p.object_id = i.object_id AND p.index_id IN(0,1) ORDER BY p.rows DESC
To run a cleanup, substitute the name of the queue you want to clean into the SQL query below. Also update the while(@a line to however many conversations you want ended per pass. You can make it as large as the number of rows in the queue from the above query if you want to do it all at once.
declare @c uniqueidentifier declare @a int SET @a = 1 while(@a < 10000) begin SET @c = (select top 1 conversation_handle from dbo.Queue_mid10_223_23_219_pid1700_adid2_r630771674) end conversation @c with cleanup SET @a = @a + 1 end
STEP 2 - Clean the Conversation Endpoints
Use the following command to see which services have the most conversation endpoints. If one of the largest is active, meaning the IP address and Process ID match the SDK service on an active management server, you can simple reboot the management server. The SDK service will abandon the old service broker queue and start a new one.
select far_service, COUNT(*) from sys.conversation_endpoints group by far_service ORDER BY COUNT(*) DESC
To run a cleanup, substitute the name of the service you want to clean into the SQL query below. Also update the while(@a line to however many conversations you want ended per pass. You can make it as large as the number of rows in the queue from the above query if you want to do it all at once.
declare @c uniqueidentifier declare @a int SET @a = 1 while(@a < 10000) begin SET @c = (select top 1 conversation_handle from sys.conversation_endpoints WHERE far_service = 'Service_mid10_223_23_219_pid5644_adid2_r633159110') end conversation @c with cleanup SET @a = @a + 1 end
STEP 3 - Clean the Services and Queues
This is the only part you can do with the SQL Server Management Studio user interface. You can also use the TSQL commands
DROP SERVICE
/en-us/sql/t-sql/statements/drop-service-transact-sql
and
DROP QUEUE
/en-us/sql/t-sql/statements/drop-queue-transact-sql
NOTE: You cannot drop a queue until you drop the service first. As with everything else, they are named with the IP address and process ID's of the SDK services on the management servers.
Comments
- Anonymous
June 16, 2018
Hello,This was a great post and helped us find an issue that has been plaguing us for a long time but as far as efficiency is concerned wouldn't this be a lot faster...declare @MyCursor CURSOR;Declare @Conversation uniqueidentifierBEGIN SET @MyCursor = CURSOR FOR Select conversation_handle from dbo.Queue_mid200_0_27_47_pid2196_adid2_r1442616624 OPEN @MyCursor FETCH NEXT FROM @MyCursor INTO @Conversation WHILE @@FETCH_STATUS = 0 BEGIN end conversation @Conversation with cleanup FETCH NEXT FROM @MYCursor INTO @Conversation END; CLOSE @MyCursor; SET @MyCursor = CURSOR FOR Select conversation_handle from sys.conversation_endpoints where far_service = 'Service_mid200_0_27_47_pid2196_adid2_r1442616624' OPEN @MyCursor FETCH NEXT FROM @MyCursor INTO @Conversation WHILE @@FETCH_STATUS = 0 BEGIN end conversation @Conversation with cleanup FETCH NEXT FROM @MYCursor INTO @Conversation END; CLOSE @MyCursor; DEALLOCATE @MyCursor;END;Running the select statement every time seems unnecessary.- Anonymous
August 06, 2018
That probably would be faster.After this happened to us twice, I tried to get my notes in a way that would be halfway readable to the public and share what we had. One of the guys on my team was also looking at using an ORDER BY in the SELECT, which helps a lot early in the grooming, but bogs down as fragmentation builds up in the middle part of the grooming.Thanks for sharing! If we run into this again, we can do some side-by side tests and see how both methods perform.
- Anonymous