BizTalk Troubleshooting: Analysis of Scenario Which Results in TDDS Crash

 

Problem

This article describes a scenario in which a BAM continuation used across a two-level-deep call orchestration structure (i.e., Orchestration A calls Orchestration B, and Orchestration B calls Orchestration C) crashes the tracking host instance, toppling the whole BAM and message tracking mechanism. Tracked messages then accumulate in the tracking tables of the MessageBox database. Over time this produces an oversized MessageBox database and degrades the message-processing performance BizTalk is meant to deliver. We contacted the Microsoft team about the issue, and I now want to share our findings. I have recreated the scenario using a sample of my own.

Sample Scenario

The sample is quite simple: it accepts the EmployeeDetails input and decides whether or not to promote the employee. The input and output schemas are as follows:

Input: EmployeEvaluationInput.xsd

Output: EmployeEvaluationInput.xsd

There are three orchestrations in the design. (I have passed the BAM buffered event stream variable from the first orchestration down to the last as an in parameter of each called orchestration.)

  • ParentOrch.odx: It accepts the input from the user and creates a msgIn message of the input type. It tracks details such as ProcessStartTime, EmployeeName, and EmployeeId. It then enables the continuation and calls the CallSendOrch.odx orchestration, passing msgIn, ContinuationId, and the BAM BufferedEventStream variable. The tracking snippet is as follows:

 

ActivityId = System.Convert.ToString(System.Guid.NewGuid());
ContinuationId = msgIn(BTS.InterchangeID);
varBAM = new Microsoft.BizTalk.Bam.EventObservation.BufferedEventStream("Integrated Security=SSPI;Data Source=.;Initial Catalog=BizTalkMsgBoxDb", 1);
varBAM.BeginActivity("BAMContinuationTester", ActivityId);
varBAM.UpdateActivity("BAMContinuationTester", ActivityId, "StartProcess", System.DateTime.Now, "EmployeeName", msgIn.Name, "EmployeeId", msgIn.Id);
varBAM.EnableContinuation("BAMContinuationTester", ActivityId, ContinuationId);

 

  • CallSendOrch.odx: This orchestration tracks the isEligibleForPromotion value and enables the continuation. If isEligibleForPromotion is true, it calls the OrchWriteToFile.odx orchestration, passing msgIn, continuationId, and the BAM BufferedEventStream variable. The tracking snippet is as follows:

isEligibleForPromotion = System.Convert.ToString(msgIn.isEligibleForPromotion);
continuationId = msgIn(BTS.InterchangeID);
varBAM.UpdateActivity("BAMContinuationTester", ContinuationId, "isEligibleForPromotion", isEligibleForPromotion, "CurrentPosition", msgIn.CurrentPosition);
varBAM.EnableContinuation("BAMContinuationTester", ContinuationId, continuationId);

  • OrchWriteToFile.odx: This orchestration computes whether the employee should be promoted and tracks data such as the NewPosition and isPromoted fields in the output. The tracking snippet is as follows:

varBAM.UpdateActivity("BAMContinuationTester",ContinuationId,"NewPosition",strNewPoS,"isPromoted",vartemp);
varBAM.EndActivity("BAMContinuationTester",ContinuationId);

 

Note that in each EnableContinuation call I set the continuation ID from the InterchangeID of msgIn.

Runtime Observations

I invoked the parent orchestration by submitting a message through the File adapter and then checked the event log, where the following error appeared:

Faulting application name: BTSNTSvc64.exe, version: 3.9.469.0, time stamp: 0x4c548eb4

Faulting module name: clr.dll, version: 4.6.1055.0, time stamp: 0x563c12de

Exception code: 0xc00000fd

Fault offset: 0x000000000001a3a5

Faulting process id: 0x9cc

Faulting application start time: 0x01d1e7fd298a2c52

Faulting application path: C:\Program Files (x86)\Microsoft BizTalk Server 2010\BTSNTSvc64.exe

Faulting module path: C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll

Report Id: 452fc95b-53f7-11e6-827d-d85de2912cce

Faulting package full name:

Faulting package-relative application ID: 

Strangely, my process completed as expected, but I could not find the details for this transaction in the BAM portal. Submitting one more message gave the same result. As a basic check, I looked at whether the tracking host instance was running and found that it had crashed; whenever I started it, it stopped again. I also checked the tracking tables in the MessageBox database and found that the tracking data was still sitting there. So not only BAM tracking but also group-level tracking was down, and data was accumulating in the MessageBox database.

Indeed a strange issue!!!

Root Cause Analysis

I purged the tracking data from the tracking tables in the MessageBox database and, in the orchestration design, changed the buffered event stream to a direct event stream, which inserts the data directly into the BAM Primary Import database. I then submitted a batch of messages and instantly found the real flaw in the design: this time the parent orchestration threw the exception below.

Exception type: BAMTraceException

Source: Microsoft.BizTalk.Bam.EventObservation

Target Site: Void StoreSingleEvent(Microsoft.BizTalk.Bam.EventObservation.IPersistQueryable)

Violation of PRIMARY KEY constraint 'PK__bam_BAMC__45F4A7F1278EDA44'. Cannot insert duplicate key in object 'dbo.bam_BAMContinuationTester_Continuations'.

The statement has been terminated.      

After checking the design, I found that in the call

varBAM.EnableContinuation("BAMContinuationTester",ContinuationId,continuationId);

I had accidentally passed the same value in both variables, ContinuationId and continuationId, because both read the InterchangeID from the orchestration's incoming message.
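
To make the failure concrete, here is a toy Python model of the behavior; it is not BizTalk code. It assumes, based only on the error message above, that the continuations table accepts each continuation token at most once. The class and function names are hypothetical.

```python
class DuplicateKeyError(Exception):
    """Stands in for the PRIMARY KEY violation reported by SQL Server."""


class ContinuationTable:
    """Toy stand-in for dbo.bam_BAMContinuationTester_Continuations.

    Assumption: a continuation token may be registered only once.
    """

    def __init__(self):
        self._rows = {}  # continuation token -> ID it continues from

    def enable_continuation(self, activity_id, token):
        if token in self._rows:
            raise DuplicateKeyError(f"duplicate continuation token: {token}")
        self._rows[token] = activity_id


def run_sample(table, activity_id, interchange_id):
    """Replays the two EnableContinuation calls made by the sample."""
    # ParentOrch.odx: new GUID as activity ID, InterchangeID as token.
    table.enable_continuation(activity_id, interchange_id)
    # CallSendOrch.odx: ContinuationId and continuationId both hold the
    # InterchangeID, so the same token is registered a second time.
    table.enable_continuation(interchange_id, interchange_id)


if __name__ == "__main__":
    try:
        run_sample(ContinuationTable(), "activity-guid", "interchange-guid")
    except DuplicateKeyError as err:
        print("rejected:", err)
```

In this model the first call succeeds and the second is rejected, which mirrors the order in which the sample orchestrations call EnableContinuation.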

So, with the buffered event stream, TDDS crashed when it tried to move the data from the tracking tables to the corresponding activity tables, and we could not find the exact error explaining why the service crashed. When the host instance was restarted, TDDS again tried to move the faulty data from the previous submission and crashed again. The real problem in the whole scenario is that the exception shown above should be logged in the event log, but with asynchronous event streams it is not.
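
The restart loop can be pictured with a small Python model. This is an illustration only: it assumes the tracking service always retries the oldest pending batch first, which matches what I observed but is not taken from TDDS internals, and all names are hypothetical.

```python
from collections import deque


def drain(pending, move_batch):
    """Moves batches in order and stops at the first one that fails.

    Returns the number of batches moved. A failing batch stays at the
    head of the queue, so everything behind it stays queued as well.
    """
    moved = 0
    while pending:
        try:
            move_batch(pending[0])
        except ValueError:
            break  # models the host instance crashing
        pending.popleft()
        moved += 1
    return moved


def move_batch(batch):
    """Hypothetical import step that rejects the faulty batch."""
    if batch == "bad":
        raise ValueError("duplicate key")


if __name__ == "__main__":
    pending = deque(["bad", "good-1", "good-2"])
    for restart in range(3):
        moved = drain(pending, move_batch)
        print(f"restart {restart}: moved {moved}, pending {len(pending)}")
```

In the model, nothing moves until the faulty batch is removed from the head of the queue, which corresponds to purging the tracking data as described earlier.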

Conclusion

The real lesson from this incident is that whenever continuations are used together with asynchronous event streams, care must always be taken that the activity ID and the continuation ID passed to EnableContinuation are distinct.
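
One way to keep the IDs distinct is to derive a different continuation token for each hop from the same InterchangeID. The Python sketch below only illustrates the idea: the stage prefixes and function names are hypothetical, and in an orchestration the equivalent would be a string concatenation in the expression shape.

```python
def continuation_token(stage, interchange_id):
    """Builds a per-stage token, e.g. 'CallSendOrch_<InterchangeID>'."""
    if not stage:
        raise ValueError("stage prefix must not be empty")
    return f"{stage}_{interchange_id}"


def check_distinct(activity_id, token, used_tokens):
    """Raises if the token equals the current ID or was used already."""
    if token == activity_id:
        raise ValueError("continuation token equals the activity ID")
    if token in used_tokens:
        raise ValueError("continuation token was already registered")
    used_tokens.add(token)


if __name__ == "__main__":
    interchange_id = "interchange-guid"  # placeholder value
    used = set()
    first = continuation_token("ParentOrch", interchange_id)
    check_distinct("activity-guid", first, used)
    second = continuation_token("CallSendOrch", interchange_id)
    # The second hop continues from the first token, not the raw InterchangeID.
    check_distinct(first, second, used)
    print(first, second)
```

With this scheme each orchestration continues from the token it received and registers a new one for the next hop, so no token is registered twice.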