Search Index fails for all users in a particular database in Exchange 2007
In Exchange 2007 there are lot of improvements in the way search works. Search feature helps end users to search in the mailbox quickly and efficiently. To know more about Exchange Search, read the following article.
https://technet.microsoft.com/en-us/library/bb232132.aspx
https://msexchangeteam.com/archive/2006/10/11/429163.aspx
In this blog I would like to discuss Search failure for all the users in a particular database in Exchange 2007. There are multiple reasons for failure, but I would concentrate on the failure due to stuck batch.
Symptoms:
You notice that all users in a particular database are not able to do search from outlook in online mode properly. Users get partial result or no result while searching. You may also observe that search result does not return items newer than a particular date. You might be able to search older items.
Any database on this server can go into this failure state at any time.
Test-ExchangeSearch “user” command against any user in that database will show the following result.
[PS] >test-exchangesearch user |fl
ResultFound : False
SearchTime : -1
You might notice this issue after a power outage or unexpected reboot of the server or some unknown cause.
Solution:
This is fixed in Exchange 2010.
On Exchange 2007, follow one of the methods below:
1) Disable the Catalog and rebuild the full text index Catalog.
Open Exchange Management Shell
Move to the following directory where the ResetSearchindex.ps1 script is located.
<drive> :\Program Files\Microsoft\Exchange Server\Scripts
Run the following command.
ResetSearchIndex.ps1 -force database name
This command will remove the index directories that are associated with the specified mailbox database.
2) Restart the Search Service or reboot the server.
How can I confirm that my issue is due to a stuck batch?
There are couple of methods to prove this:
1) Run Perfmon with the following counters for 10 to 15 min. Add all instances for all the counters mentioned below. Instances represent each database. You could compare the counters between the problem database and good database.
MSExchange Search Indices
Age of Last Notification Indexed
Age of Last Notification Processed
Average Latency of RPCs during Crawling
Average Latency of RPCs to get Notifications
Average Latency of RPCs used to Obtain Content
Number of Create Notifications
Number of Content Conversion Done
Number of Documents Successfully Indexed
Number of Documents that failed During Indexing
Number of Failed Retries
Number of Indexed Attachments
Number of HTML Message Bodies
Number of RTF Message Bodies
Number of Outstanding Batches
Number of Outstanding Documents
Number of Plain text message Bodies
Number of Update Notifications
MSFTESQL-Exchange:Catalogs
Batches Completed Success
Batches Completed w/errors
Batches Completed w/warnings
Batches done
Batches in Progress
Batches Received
Trans in Progress
Let us take a look at each counter.
Following counters show a value of 1 for all the problem databases throughout the trace. This is the stuck batch which is causing the problem here. Value of this counter varies for a good database.
MSFTESQL-Exchange:Catalogs\Batches in Progress
MSFTESQL-Exchange:Catalogs\Trans in Progress
MSExchange Search Indices\Number of outstanding Batches
MSExchange Search Indices\Number of outstanding documents shows a small value (3 or 5 or something else) constant for the problem databases throughout the trace, but it shows high value and it varies for good databases, also it touches zero at times. This means that these 3 or 5 documents are part of the batch which is stuck on the problem databases and it is not accepting any more notifications.
Following counters are going to show a constant value throughout the trace for the problem databases, but for the other databases it keep increasing. This clearly indicates that the indexing stopped at some stage (before this trace was started) for the problem databases.
MSExchange Search Indices\Number of documents successfully indexed
MSExchange Search Indices\Number of documents that failed during Indexing
MSExchange Search Indices\Number of failed retries
Following activities will not be happening on the problem databases, so it will show a constant value throughout the trace, but for the other database, we can see that this counter keeps increasing.
MSExchange Search Indices\Number of Create Notifications
MSExchange Search Indices\Number of content conversions done
MSExchange Search Indices\Number of Delete Notifications
MSExchange Search Indices\Number of Documents Successfully Indexed
MSExchange Search Indices\Number of Documents That failed During Indexing
MSExchange Search Indices\Number of Failed Retries
MSExchange Search Indices\Number of indexed attachments
MSExchange Search Indices\Number of RTF message bodies
MSExchange Search Indices\Number of HTML message bodies
MSExchange Search Indices\Number of plain text message bodies
MSExchange Search Indices\Number of Update Notifications
MSFTESQL-Exchange:Catalogs\Batches Completed Success
MSFTESQL-Exchange:Catalogs\Batches Completed w/errors
MSFTESQL-Exchange:Catalogs\Batches Completed w/warnings
MSFTESQL-Exchange:Catalogs\Batches done
MSFTESQL-Exchange:Catalogs\Batches Received
This also means that the thread for that particular database within the Notification engine is unable to poll for the notifications because of the stuck batch.
Following counters are going to show a value of zero for the problem databases. Good database shows some latency.
MSExchange Search Indices\Average Latency of RPCs During Crawling
MSExchange Search Indices\Average Latency of RPCs to get Notifications
MSExchange Search Indices\Average Latency of RPCs used to Obtain Content
This indicates that the problem is not during processing. These problem databases are not processing anything.
Age of the last notification processed and Age of the last notification Indexed are going to show a constant value for the problem databases. For example, processed time shows 1 sec and indexed time shows 11 sec constant. This is the age (in seconds) of the last notification we have successfully processed and indexed for the given MDB.
There will be one thread that polls for notifications for the servers and MDBs we are handling in this content indexing instance. That thread/task will use MAPI.NET to grab notifications using the current high watermark event identifier. It will then call into the notification queue to add the events for processing. To minimize startup costs, each notification queue needs a maintenance procedure to go in and take the in-memory information about the high watermark for what events have been read and about what the highest event identifier that has actually been committed to Content Index.
2) Obtain 64bit version of Dip.exe tool from Microsoft.
You need to find the catalog directory for each database.
The search catalog has its own directory in the same place where database files are located.
For example, the catalog directory has the following name:
CatalogData-9e6c1672-7fe2-48b8-8845-2335fa2ed019-7b201267-3ed8-4d69-9d48-6fa753588cee
The first GUID (in red) is the database GUID and the second one (in blue) is the server GUID.
On the Exchange 2007 server, copy dip.exe to the Program Files\Microsoft\Exchange Server\Bin directory
Create a sub-directory under the Bin directory called Log
Register the msftepxy.dll file using the following: regsvr32 "c:\Program Files\Microsoft\Exchange Server\bin\msftepxy.dll"
For the example above, type in Dip.exe /p ExSearch- 9e6c1672-7fe2-48b8-8845-2335fa2ed019 -7b201267-3ed8-4d69-9d48-6fa753588cee >databasename-p.txt
Databasename-p.txt is the output file. Take this output for few problem databases and good databases.
Also take Dip.exe /c output using the following command.
Dip.exe /c >Dip-c.txt
Dip-c.txt is the output file.
Following is a sample output of Dip.exe /p from a problem database.
Init. CocreateInstance DataSink
admin cocreated
got datasink for project exsearch-f5cf7c68-b7c6-41ef-9b4b-67be1e474fef-cc64dd2d-2428-4f12-bba2-79d6d34c4d27
Heartbeat- state 1 Available 0x00000000
Batches Rcv 91810 InProg 1 Complete 86588
Trans Done 362880 Err 1579
Trans Hld 0 Dly 0 RdQ 0
FD 1 STFD 0 HDFD 0
Pip 0 Pnd 0 Cmp 1
Wait 0 Red 0 Rmv 0
Batches InProg and FD will show 1 always. This is the stuck batch.
Following is a sample output of Dip.exe /p from a good database
Init. CocreateInstance DataSink
admin cocreated
got datasink for project exsearch-15a1b31c-2e23-4d48-aef2-487d630617ae-cc64dd2d-2428-4f12-bba2-79d6d34c4d27
Heartbeat- state 1 Available 0x00000000
Batches Rcv 602879 InProg 3 Complete 574927
Trans Done 2348724 Err 8365
Trans Hld 0 Dly 0 RdQ 0
FD 2 STFD 0 HDFD 0
Pip 0 Pnd 6 Cmp 0
Wait 0 Red 0 Rmv 0
Batches InProg shows more than 1 most of the time if this is a busy server. The number in FD shows how many batches are being processed in Multi-thread filter daemon, which is equivalent to a snapshot. So if the engine is constantly indexing docs, the number will always be positive, meaning there are batches inside. You may see FD showing as zero if there are no batches inside.
Dip.exe /c output will show the following symptom:
Free FDId 0 MSFTESQL_CBExchanger85fj CBID 00000000026F92F0
Cat 21 NoLongerValid docs 1 keys 209
batch 3419385 docid 4417252 total 62760 secs updated 62760 secs
Highlighted ‘secs’ means the age of the chunkbuffer processed. This value is way larger than the normal one, so it indicates the symptoms of stuck batches.
As I mentioned earlier, this can be due to a power outage or an unexpected reboot of the server. You can see 6008 and 1076 events in the system log of the Exchange 2007 server during this unexpected reboot. You might notice that the search is successful for all items added to the mailbox before this event.
Power outage or unexpected reboot are not the only cause of this issue. We have seen that after following the steps mentioned in the solution section, search fails for a particular database again after few weeks or months.
This has been fixed in Exchange 2010. We are trying to back port this fix to Exchange 2007.
Thanks & Regards
Roopesh Pattan