The case of the non-responsive MVC Web Application
A colleague (and friend) from Azure team asked me to help narrow down a strange case of a MVC Entity Framework application (hosted in Virtual machine) which occasionally stopped responding, requiring a IIS reset to work again. The products I support do not include IIS, MVC nor Entity Framework. I was invited to help as consultant for NetExt extension.
This case study uses WinDbg with NetExt extension. See this to get started with WinDbg and NetExt:
https://blogs.msdn.com/b/rodneyviana/archive/2015/03/10/getting-started-with-netext.aspx
Verifying how many web application are running under the worker process
Listing the .NET Application Domains can gives the number of web applications under the application pool. Just get the number and subtract 3. This technique was discussed here:: https://blogs.msdn.com/b/rodneyviana/archive/2015/03/12/the-strange-case-of-the-application-pool-recycling-causing-high-cpu.aspx
0:000> !wdomain
Address Domain Name Modules Base Path & Config
747e0c08 System 43
747e0f60 Shared
01804078 DefaultDomain 10 Base Path: D:\Windows\SysWOW64\inetsrv\ Config: w3wp.exe.config
0184a7b0 /LM/W3SVC/2033431354/ROOT-1-130705968419350748 75 Base Path: D:\home\site\wwwroot\ Config: web.config
Having only one web application to the application pool ruled out the problem of another web site locking this one. Ruling out is as imported as ruling in. Notice that default domain points to SysWOW64 which means that the w3wp process is running as 32-bits. To debug this dump file I had to use WinDbg 32-bits and the 32-bits version of NetExt. You should never run a Web Application as 32-bits unless it is necessary (i.e. if the application depends on a native 32-bits dll). IIS application running as 32-bits will always run out of memory because of memory fragmentation. This is a frequent problem in 32-bit processes that run for long time. Memory is allocated and de-allocated all the time leaving “holes” in memory. In order to allocate, let’s say 10MB of memory, it is necessary at least one 10MB block of “contiguous” memory available. If there is 100 x 1MB free memory segments but not a single segment with 10MB free, your application will not be able to allocate 10MB of contiguous memory and will yield a “Out of Memory” exception. The maximum memory available for a 32-bits process is 4GB (because the OS is 64-bits, otherwise it would be only 2GB). However, memory fragmentation was not the problem in this case.
Verifying how many requests were active
To run commands requiring heap look up, you need to index the heap first.
0:000> !windex
Starting indexing at 20:29:08 PM
1000000 objects...
Indexing finished at 20:29:10 PM
97,208,569 Bytes in 1,490,134 Objects
Index took 00:00:02
0:000> !wruntime
Runtime Settings per Application Pool
=========================================================================
Address : 066C5404
First Request : 3/24/2015 1:14:35 PM
App Pool User : IIS APPPOOL\contosofarm
Trust Level : Full
App Domnain Id : /LM/W3SVC/1264340133/ROOT-1-130716764614361954
Debug Enabled : False
Active Requests : 0n548
Path : D:\home\site\wwwroot\ (local disk)
Temp Folder : D:\local\Temporary ASP.NET Files
Compiling Folder: D:\local\Temporary ASP.NET Files\root\7135b59b\46bce28c
Shutdown Reason : Not shutting down
We have 548 active connections. This is just another information we will need to find the problem, but the fact of having that many requests is not a problem per se.
Checking the running requests pattern
0:000> !whttp -running
HttpContext Thread Time Out Running Status Verb Url
02f1b334 -- 00:01:00 00:20:04 200 GET https://contosoapp.com:443/OrderSystem/Order
03095104 -- 00:01:00 00:21:22 200 GET https://contosoapp.com:443/OrderSystem/Order
030968ec -- 00:01:00 00:21:03 200 GET https://contosoapp.com:443/OrderSystem/Order
0309873c -- 00:01:00 00:21:11 200 GET https://contosoapp.com:443/OrderSystem/Order
030990b4 -- 00:01:00 00:21:11 200 GET https://contosoapp.com:443/OrderSystem/Order
0309f154 -- 00:01:00 00:21:40 200 GET https://contosoapp.com:443/OrderSystem/Order
030a2a70 -- 00:01:00 00:21:30 200 GET https://contosoapp.com:443/OrderSystem/Order
030a3404 -- 00:01:00 00:21:39 200 GET https://contosoapp.com:443/OrderSystem/Order
030a6d20 -- 00:01:00 00:21:39 200 GET https://contosoapp.com:443/OrderSystem/Order
030aa63c -- 00:01:00 00:21:39 200 GET https://contosoapp.com:443/OrderSystem/Order
030adf58 -- 00:01:00 00:21:39 200 GET https://contosoapp.com:443/OrderSystem/Order
030b1874 -- 00:01:00 00:21:38 200 GET https://contosoapp.com:443/OrderSystem/Order
030b5190 -- 00:01:00 00:21:38 200 GET https://contosoapp.com:443/OrderSystem/Order
030b8aac -- 00:01:00 00:21:38 200 GET https://contosoapp.com:443/OrderSystem/Order
030bc3c8 -- 00:01:00 00:21:38 200 GET https://contosoapp.com:443/OrderSystem/Order
030bfce4 -- 00:01:00 00:21:38 200 GET https://contosoapp.com:443/OrderSystem/Order
030c3600 -- 00:01:00 00:21:38 200 GET https://contosoapp.com:443/OrderSystem/Order
030c6f1c -- 00:01:00 00:21:38 200 GET https://contosoapp.com:443/OrderSystem/Order
030dbc34 -- 00:01:00 00:20:16 200 GET https://contosoapp.com:443/OrderSystem/Order
030dcaa0 -- 00:01:00 00:20:18 200 GET https://contosoapp.com:443/OrderSystem/Order
030df8a8 -- 00:01:00 00:21:37 200 GET https://contosoapp.com:443/OrderSystem/Order
(... lines removed for brevity ...)
0f8461b0 -- 00:01:00 00:20:06 200 GET https://contosoapp.com:443/OrderSystem/Order
0f8471f4 -- 00:01:00 00:20:06 200 GET https://contosoapp.com:443/OrderSystem/Order
0f8503b0 -- 00:01:00 00:20:05 200 GET https://contosoapp.com:443/OrderSystem/Order
0f857058 -- 00:01:00 00:20:05 200 GET https://contosoapp.com:443/OrderSystem/Order
0f85e0e4 -- 00:01:00 00:20:05 200 GET https://contosoapp.com:443/OrderSystem/Order
0f8678f4 -- 00:01:00 00:20:05 200 GET https://contosoapp.com:443/OrderSystem/Order
0f871104 -- 00:01:00 00:20:04 200 GET https://contosoapp.com:443/OrderSystem/Order
0f874a40 -- 00:01:00 00:20:04 200 GET https://contosoapp.com:443/OrderSystem/Order
0f87b2c8 51 00:01:00 00:20:04 200 GET https://contosoapp.com:443/OrderSystem/Order
548 HttpContext object(s) found matching criteria
13 HttpContext object(s) skipped by filter
Note that only one of the requests has an associated thread. The others have “---“ as thread which means that either IIS is waiting for a thread to be created or the thread was aborted and IIS was not informed yet. Normally it is because the request is queued. As you can see, the requests are running for over 20 minutes even though the time out is set to 1 minute. HttpContext the object wrapping an IIS request includes a field to tell the timeout state. You can verify most of .NET framework source code here: https://referencesource.microsoft.com/
HttpContext.cs
private int _timeoutState; // 0=non-cancelable, 1=cancelable, -1=canceled
Running HttpContext are in a non-cancelable state (_timeoutState = 0)
0:000> !wselect * from 0f87b2c8
[System.Web.HttpContext]
(int64)System.Int64 _timeoutStartTimeUtcTicks = 8d2347382c7e3c2 (0n635628168132682690)
(int64)System.Int64 _timeoutTicks = 4190ab00 (0n1100000000)
System.Web.IHttpAsyncHandler _asyncAppHandler = 00000000
System.Web.HttpApplication _appInstance = 0F878910
System.Web.IHttpHandler _handler = 0F22B824
System.Web.HttpRequest _request = 0F87B3C4
System.Web.HttpResponse _response = 0F87B48C
System.Web.HttpServerUtility _server = 0717B2D8
System.Collections.Stack _traceContextStack = 00000000
System.Web.TraceContext _topTraceContext = 00000000
System.Collections.Hashtable _items = 0F165538
System.Collections.ArrayList _errors = 00000000
System.Exception _tempError = 00000000
System.Web.IPrincipalContainer _principalContainer = 0F87B5D4
System.Web.Profile.ProfileBase _Profile = 00000000
System.Web.HttpWorkerRequest _wr = 0F87B050
System.Web.VirtualPath _configurationPath = 00000000
System.Globalization.CultureInfo _dynamicCulture = 026D8EF8
System.Globalization.CultureInfo _dynamicUICulture = 026D8EF8
System.Collections.Stack _handlerStack = 00000000
System.Web.Instrumentation.PageInstrumentationService _pageInstrumentationService = 00000000
System.Collections.ObjectModel.ReadOnlyCollection _webSocketRequestedProtocols = 00000000
System.Web.Util.CancellationTokenHelper _timeoutCancellationTokenHelper = 038239C4
System.Web.Util.DoubleLink _timeoutLink = 0F1653E0
System.Threading.Thread _thread = 02FCE050
System.Web.CachedPathData _configurationPathData = 00000000
System.Web.CachedPathData _filePathData = 036B34C8
(string)System.String _sqlDependencyCookie = NULL
System.Web.SessionState.SessionStateModule _sessionStateModule = 0F878AE0
System.Web.UI.TemplateControl _templateControl = 00000000
System.Web.NotificationContext _notificationContext = 038D7118
System.Web.ThreadContext IndicateCompletionContext = 038D6C90
System.Threading.Thread ThreadInsideIndicateCompletion = 02FCE050
System.Object ThreadContextId = 0f87b3b8
System.Web.AspNetSynchronizationContextBase _syncContext = 0F87B940
System.Threading.Thread _threadWhichStartedWebSocketTransition = 00000000
(string)System.String _webSocketNegotiatedProtocol = NULL
System.Web.IHttpHandler _remapHandler = 0F22B824
System.Web.IHttpHandler _currentHandler = 0F22B824
System.Web.RootedObjects _rootedObjects = 0F87B5D4
System.Web.Security.CookielessHelperClass _CookielessHelper = 0F87B55C
System.Security.Principal.IPrincipal k__BackingField = 00000000
(int32)System.Web.Configuration.AsyncPreloadModeFlags _asyncPreloadModeFlags = 0 (0n0) None
(int32)System.Int32 _serverExecuteDepth = 0 (0n0)
(int32)System.Int32 _timeoutState = 0 (0n0)
(native int)System.IntPtr _rootedPtr = 0 (0n0)
(int32)System.Web.SessionState.SessionStateBehavior <SessionStateBehavior>k__BackingField = 0 (0n0) Default
(bool)System.Boolean _asyncPreloadModeFlagsSet = 1 (True)
(bool)System.Boolean _errorCleared = 0 (False)
(bool)System.Boolean _skipAuthorization = 0 (False)
(bool)System.Boolean _preventPostback = 0 (False)
(bool)System.Boolean _runtimeErrorReported = 0 (False)
(bool)System.Boolean _threadAbortOnTimeout = 1 (True)
(bool)System.Boolean _delayedSessionState = 0 (False)
(bool)System.Boolean _isAppInitialized = 1 (True)
(bool)System.Boolean _isIntegratedPipeline = 1 (True)
(bool)System.Boolean _finishPipelineRequestCalled = 0 (False)
(bool)System.Boolean _impersonationEnabled = 0 (False)
(bool)System.Boolean HideRequestResponse = 0 (False)
(bool)System.Boolean InIndicateCompletion = 1 (True)
(uint8)System.Web.WebSocketTransitionState _webSocketTransitionState = 0x0 (0n0)
(bool)System.Boolean _requiresSessionStateFromHandler = 1 (True)
(bool)System.Boolean _readOnlySessionStateFromHandler = 0 (False)
(bool)System.Boolean InAspCompatMode = 0 (False)
(bool)System.Boolean _ProfileDelayLoad = 1 (True)
(bool)System.Boolean <FirstRequest>k__BackingField = 0 (False)
(bool)System.Boolean <DisableCustomHttpEncoder>k__BackingField = 0 (False)
System.DateTime _utcTimestamp = -mt 725719B8 00000000 3/24/2015 5:40:12 PM
System.Web.Util.SubscriptionQueue> _requestCompletedQueue = -mt 706898A8 00000000
System.Web.Util.SubscriptionQueue _pipelineCompletedQueue = -mt 70689990 00000000
static System.Reflection.Assembly SystemWebAssembly = 066C1B34
static (bool)System.Boolean s_eurlSet = 0 (False)
static System.String s_eurl = 00000000 00000000
static System.Action CS$9__CachedAnonymousMethodDelegate3 = 00000000
static System.Func CS$9__CachedAnonymousMethodDelegate6 = 00000000
static System.Func CS$9__CachedAnonymousMethodDelegate8 = 0A92BE14
static System.Func CS$9__CachedAnonymousMethodDelegatea = 0F0449D4
So, this is why the request does not times out. The command below will show all requests that cannot be cancelled (i.e. it will not time out):
0:000> !wfrom -type System.Web.HttpContext where (_timeoutState==0) $addr()
calculated: 02AE9C60
calculated: 02F1B334
calculated: 03095104
calculated: 030968EC
calculated: 0309873C
calculated: 030990B4
calculated: 0309F154
(...)
calculated: 0FAF7AE4
calculated: 0FAFE02C
561 Object(s) listed
So, that is it. Something is preventing the requests to go away, Looking at the stack trace of thread 51 (the only one running a http request) we found that the thread was waiting for a SQL request to finish. We also saw that this was happening in the custom Global.asax handler method Application_BeginRequest. In the HttpContext fields we can also see that this application is using session (_requiresSessionStateFromHandler = 1 (True)).
Only one request per session can run at a time
Now comes the theory. The request has timeout, is in pipeline compatibility mode and is relying on ASP.NET session. The session cookie is ASP.NET_SessionId. NetExt includes a command to check and summarize cookies. Let’s summarize cookies by the key and value pair.
0:000> !wcookie -summary
Action Total Finished Cookie=Value
======================================================================================
(list) 561 13 ARRAffinity=7c89acbf92e4f1bf14a78494f66466c71fcb071249a85f5e303bf5149086b5bc
(list) 548 0 ASP.NET_SessionId=0odyr1kc0cdqoaaomed1v4pe
(list) 9 9 ASP.NET_SessionId=21oqjz1amz4bfkfgru5bheth
(list) 1 1 ASP.NET_SessionId=ew1qbqsugeiglq5uisf0csw1
(list) 1 1 ASP.NET_SessionId=hwxatvsprf4c2nutxlanzcxl
(list) 1 1 ASP.NET_SessionId=xqmy5iteqgosfmagesegpk1h
(list) 559 11 SBFtest_cookie=test_cookie_value
(list) 9 9 __RequestVerificationToken=IyKw3PREthJanOgCZLqTKzYAtXMyFgYFduNaXmT6xW42djDGVlH7HFGnB1ttFlnlUhz0eXHNOfJpWoBnG6g6Rp4PSGly0T6OFuXCNM0CgGMs_jxgyW-zSjXDVfMnr3vJn2aUEbAeEK8-RlTFgY2CGZmCC-F9FDRKgZHRIfE-tT01
(list) 1 1 __RequestVerificationToken=ToT-glFP2dW7U7AYtPjcjAT-7hWBe3iIiUYjRg4tMinWUhBN4Be8Qo36FyC-qL8_mOfY7MWSgET2M8kIpuX0MleYmLJn3-A6_E9FveN8j_36rcTjmuIV2pkZ9892luiRlHK2HgXepUPFDGqcVJmmdn-HCzC9YEbHEPH1OyDSWs41
(list) 548 0 __RequestVerificationToken=gbM2YmFRGi7yYZ0ic-ipw0umX3RtSz4Kn7y9yTsXbMRExnv-wX40MKKbgygNH01aPflh7rybsyYpIu9v1g-s1Hs-IIrpqPjw5ZndQNaUNfD_2Yx3XaaKUzl808dusffa7bpjVQa1HZ8VAz55L5w_7kUkYUQSfg2BmJaWM7aDTxU1
(list) 1 1 __RequestVerificationToken=mec5uGE2UTVP-iCM6mEMbYVsElHdeUZD7ay9XBkYfSzYYHkwqzeG3pbboAQQ7hiU2rZBh-wDuoLifMuGG6HYnr5Byo0A1EKaD6bwfIvk710pHGpq4A9QZrjWb2S7tPbpOHYqO3VB5ItaetJPrEHnlanToXD6wisV8s-d2Czo7-o1
(list) 1 1 __RequestVerificationToken=z6UahlNkt3UlEiCW5TnGmoORlues-U-O1cs7FihQQPlG52L7qLmWG3vkxQY6uQQRL6F4xzuet8zmZ-8I4aD9cMTYxfa64pf5sktHofl7p9ezJJ4cK2pDJqWrImK5gBTUkGrd1M3d-OAE6lwDx0gWuIeHCSOtDIEvVX3v2LeU_lU1
(list) 1 1 __ar_v4=NPVOPR4AB5FCPB2G4EOFPH%3A20150212%3A2%7CMXNG24JOKNCSXAEC45BD3K%3A20150212%3A2%7CNBPDN5WPCBGR3ATBF2EEHC%3A20150212%3A2
(list) 9 9 __ar_v4=UMR4R6WNO5A5XDYLDDQJ5W%3A20150008%3A2%7CNPVOPR4AB5FCPB2G4EYREW%3A20141117%3A6%7CMXNG24JOKNCSXAEC45BD3K%3A20141117%3A12%7CNBPDN5WPCBGR3ATBF2EEHC%3A20141117%3A12%7CHR63AHCPWBHH5LZ72UPYIS%3A20141117%3A3
(list) 1 1 __utma=35909538.1267946845.1427218502.1427218502.1427218502.1
(list) 1 1 __utma=35909538.154579644.1427202773.1427202773.1427202773.1
(list) 1 1 __utma=35909538.156555049.1423582295.1423582295.1423586012.2
(list) 9 9 __utma=35909538.360373432.1406151229.1425847161.1427219991.14
(list) 1 1 __utma=35909538.432412459.1427202844.1427202844.1427202844.1
(list) 548 0 __utma=35909538.596493886.1426607592.1426607592.1427218695.2
Bingo! There is a single session associated with 548 requests, the exact number of running requests. And unlike the other sessions, no request has finished. Clicking on (list) we see the very same 548 contexts we see in !whttp -running.
0:000> !wcookie -name ASP.NET_SessionId -value 0odyr1kc0cdqoaaomed1v4pe
02f1b334 https://contosoapp.com:443/OrderSystem/Order (200 NULL) Running (00:20:04)
======================================================================================
ASP.NET_SessionId=0odyr1kc0cdqoaaomed1v4pe
1 printed
======================================================================================
03095104 https://contosoapp.com:443/OrderSystem/Order (200 NULL) Running (00:21:22)
======================================================================================
ASP.NET_SessionId=0odyr1kc0cdqoaaomed1v4pe
1 printed
======================================================================================
030968ec https://contosoapp.com:443/OrderSystem/Order (200 NULL) Running (00:21:03)
======================================================================================
ASP.NET_SessionId=0odyr1kc0cdqoaaomed1v4pe
1 printed
======================================================================================
0309873c https://contosoapp.com:443/OrderSystem/Order (200 NULL) Running (00:21:11)
======================================================================================
ASP.NET_SessionId=0odyr1kc0cdqoaaomed1v4pe
1 printed
======================================================================================
030990b4 https://contosoapp.com:443/OrderSystem/Order (200 NULL) Running (00:21:11)
======================================================================================
ASP.NET_SessionId=0odyr1kc0cdqoaaomed1v4pe
1 printed
======================================================================================
0309f154 https://contosoapp.com:443/OrderSystem/Order (200 NULL) Running (00:21:40)
======================================================================================
ASP.NET_SessionId=0odyr1kc0cdqoaaomed1v4pe
1 printed
======================================================================================
030a2a70 https://contosoapp.com:443/OrderSystem/Order (200 NULL) Running (00:21:30)
======================================================================================
ASP.NET_SessionId=0odyr1kc0cdqoaaomed1v4pe
1 printed
======================================================================================
030a3404 https://contosoapp.com:443/OrderSystem/Order (200 NULL) Running (00:21:39)
======================================================================================
ASP.NET_SessionId=0odyr1kc0cdqoaaomed1v4pe
1 printed
(...)
All the calls were coming from the same IP address. So, there are only two possibilities: this site is under a DoS (Denial of Service) attack or a cat is walking all over the keyboard. Let me know your thoughts in the comments.
Comments
- Anonymous
March 29, 2015
A few years ago we had something like this, one worker process would randomly go out of control. A recycle would correct it and it would eventually come back up. The incident would last about 1 hour or so. Turns out we had a few users that would flip their keyboards up to eat lunch. This action would have the benefit of mashing a few keys; namely the F5 key. We would have to track the user down by IP address (we're not using Windows auth) and ask them to move their keyboard. I can only imagine the "big-brother"/NSA feeling that person must have had. FYI, when we started doing the analysis on the logs we noticed about 30 - 33 requests per second from the IP address. The default per second repeat rate is 31. [HKCUControl PanelKeyboardKeyboardSpeed]