Using Microsoft Network Monitor to track down networking problems

Microsoft and other companies provide many software tools that make a support engineer's job much easier. Without them, it is extremely difficult to track down software problems. Mark Russinovich is famous for the Windows tools he has written, and they are widely used by Microsoft support teams to help debug customer issues. In the IIS support group, we deal with HTTP failures that are a result of networking problems. Once we isolate such a problem, the next thing to do is get simultaneous network traffic captures from the client and the server. That way we can see how the traffic flows between client and server and draw good conclusions about where the problem might be. Two popular tools do this job well, and we use both: Microsoft Network Monitor and Wireshark. You can download Microsoft Network Monitor from the Microsoft Download Center and Wireshark from Wireshark.org.

For a quick start on reading network traces using Network Monitor, please see this Microsoft article: Basics of Reading TCP/IP traces

Recently we used Network Monitor to resolve a problem: an ASP.NET web page streams files to clients using the Response.WriteFile method. All other pages work just fine; however, when the page sends the file, the clients end up getting “Page cannot be displayed”. I had earlier written this blog post about this error, but that was for a different reason. In that case, no web pages would work, but in this case only the file download page failed.

One of the first things we do is “isolate the problem area”. Isolating the problem helps you focus on a specific part rather than looking at too many variables and possibilities. In this case we wanted to determine whether this was an application-related problem or just networking. We could never reproduce the problem by browsing the page from the web server console; it always succeeded there. By browsing the page from the server console, we take the network out of the equation. Because the page always worked locally, we concluded that the problem was caused by something going on in the networking layer.

The next thing to do is get data about how the underlying traffic looks during the failure. Remember that you always need good data to draw useful conclusions. Without good data, we are just shooting in the dark. In this case we captured simultaneous network traces. Please refer to this post for steps to take simultaneous network traces. Here’s how we went about analyzing the traces.

Opened the trace in Microsoft Network Monitor

The next thing to do is filter the traffic we are interested in. Take a moment to look at the user interface items of Network Monitor that I highlighted in red circles.

NetmonUI

The Display Filter tab allows you to specify keywords or expressions that will help you filter traffic. For example, if you want to see only HTTP traffic, you can type http in the Display Filter text area and click the Apply button.

In this particular case I wanted to see not only the HTTP traffic but also the TCP frames between the web server and the client, and therefore I used a different filter, which was:

(tcp.SrcPort == 3117 && tcp.DstPort == 80)
||
(tcp.SrcPort == 80 && tcp.DstPort == 3117)
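If you need this kind of filter often, it can be generated from the two port numbers. The following is a minimal sketch; the helper name `conversation_filter` is made up for illustration, and it only produces text to paste into the Display Filter area.

```python
def conversation_filter(client_port: int, server_port: int = 80) -> str:
    """Return a display filter matching both directions of one TCP
    conversation, identified only by its two port numbers."""
    for port in (client_port, server_port):
        if not 0 < port < 65536:
            raise ValueError(f"not a valid TCP port: {port}")
    # One clause per direction, joined with a logical OR.
    client_to_server = f"(tcp.SrcPort == {client_port} && tcp.DstPort == {server_port})"
    server_to_client = f"(tcp.SrcPort == {server_port} && tcp.DstPort == {client_port})"
    return f"{client_to_server} || {server_to_client}"


print(conversation_filter(3117))
```

Matching on ports alone works here because the trace contained a single conversation on client port 3117. On a busy server you would probably also want to restrict the filter by IP address.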

So how did I figure out the port numbers? The HTTP port number on a web server is almost always 80, unless the URL in the browser contains a port number, like https://localhost:8080. That is how I got the DstPort value. Next, I wanted to get the SrcPort. I filtered using http and looked for a frame with the URL that we used in reproducing the problem. Then I selected that frame and looked at the Frame Details pane to get the SrcPort and DstPort values.
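The rule for the server port can be sketched in a few lines of Python using the standard `urllib.parse` module. The function name `server_port` and the host name in the second call are placeholders for illustration; the default of 443 for https is the standard default and is not something the trace in this post exercises.

```python
from urllib.parse import urlsplit

# Standard default ports, used when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def server_port(url: str) -> int:
    """Return the TCP port a browser would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port          # explicit port in the URL wins
    return DEFAULT_PORTS[parts.scheme]


print(server_port("https://localhost:8080"))                       # 8080
print(server_port("http://server/Server/AppFolder/SendFile.aspx"))  # 80
```

The client's source port cannot be derived this way, because the operating system picks it for each connection. It has to be read from the capture, as described above.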

Compare the traces from the client & server captures using Frame Summary Window.

This is where it gets a bit tricky for people who are not familiar with reading traces. For most, it's just a lot of data and numbers, but let me help you read these traces. Pay special attention to the coloring, as it is important.

Client Capture

Frame Time Src IP Dst IP Protocol Description
182 11:38:16.449 CLIENT SERVER TCP TCP:Flags=......S., SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=1608257832
183 11:38:16.465 SERVER CLIENT TCP TCP:Flags=...A..S., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3131352675, Ack=1608257833
184 11:38:16.465 CLIENT SERVER TCP TCP:Flags=...A...., SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=1608257833, Ack=3131352676
185 11:38:16.465 CLIENT SERVER HTTP HTTP:Request, POST /Server/AppFolder/SendFile.aspx
186 11:38:16.465 CLIENT SERVER TCP TCP:[Continuation to #185]Flags=...AP..., SrcPort=3117, DstPort=HTTP(80), PayloadLen=1095, Seq=1608259213 - 1608260308
188 11:38:16.496 SERVER CLIENT TCP TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3131352676, Ack=1608259213
189 11:38:16.496 SERVER CLIENT TCP TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3131352676, Ack=1608260308
1751 11:39:46.509 SERVER CLIENT TCP TCP:Flags=...A...F, SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3131352676, Ack=1608260308
1752 11:39:46.509 CLIENT SERVER TCP TCP:Flags=...A...., SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=1608260308, Ack=3131352677
1753 11:39:46.509 CLIENT SERVER TCP TCP:Flags=...A...F, SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=1608260308, Ack=3131352677
1754 11:39:46.525 SERVER CLIENT TCP TCP:[Segment Lost]Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3131352677, Ack=1608260309
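A note on reading the Flags field in these rows: each dot is a flag that is not set, and each letter is a flag that is set. The sketch below decodes the field, assuming the eight positions follow the TCP header order CWR, ECE, URG, ACK, PSH, RST, SYN, FIN. That order is an assumption on my part, although it is consistent with where the A, P, S and F letters appear in the frames above.

```python
# Assumed left-to-right meaning of the eight Flags positions.
FLAG_NAMES = ("CWR", "ECE", "URG", "ACK", "PSH", "RST", "SYN", "FIN")


def decode_flags(field: str) -> list[str]:
    """Return the names of the flags that are set in a Flags field
    such as '...A..S.' (a dot means the flag is not set)."""
    if len(field) != len(FLAG_NAMES):
        raise ValueError(f"expected {len(FLAG_NAMES)} characters, got {field!r}")
    return [name for name, ch in zip(FLAG_NAMES, field) if ch != "."]


print(decode_flags("......S."))  # frame 182: ['SYN']
print(decode_flags("...A..S."))  # frame 183: ['ACK', 'SYN']
print(decode_flags("...A...F"))  # frame 1751: ['ACK', 'FIN']
```

Frames 182 to 184 are therefore the usual three-way handshake (SYN, SYN+ACK, ACK), and frame 1751 is the first frame in this capture with FIN set.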

Server Capture

Frame Time Src IP Dst IP Protocol Description
2789 11:28:18.442 CLIENT SERVER TCP TCP:Flags=......S., SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=2256040559, Ack=0
2790 11:28:18.442 SERVER CLIENT TCP TCP:Flags=...A..S., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3177618710, Ack=2256040560
2791 11:28:18.442 CLIENT SERVER TCP TCP:Flags=...A...., SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=2256040560, Ack=3177618711
2792 11:28:18.442 CLIENT SERVER HTTP HTTP:Request, POST /Server/AppFolder/SendFile.aspx
2793 11:28:18.442 CLIENT SERVER TCP TCP:[Continuation to #2792]Flags=...AP..., SrcPort=3117, DstPort=HTTP(80), PayloadLen=12, Seq=2256041928 - 2256041940, Ack=3177618711
2794 11:28:18.442 SERVER CLIENT TCP TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3177618711, Ack=2256041940
2795 11:28:18.442 CLIENT SERVER HTTP HTTP:HTTP Payload, URL: /Server/AppFolder/SendFile.aspx
2859 11:28:18.563 SERVER CLIENT TCP TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3177618711, Ack=2256043035
6447 11:29:48.425 CLIENT SERVER TCP TCP:Flags=...A...F, SrcPort=3117, DstPort=HTTP(80), PayloadLen=0, Seq=2256043035, Ack=3177618711
6448 11:29:48.425 SERVER CLIENT TCP TCP:Flags=...A...., SrcPort=HTTP(80), DstPort=3117, PayloadLen=0, Seq=3177618711, Ack=2256043036
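When the captures are long, it can help to pull the numbers out of the Description column programmatically instead of comparing them by eye. The following is a rough sketch that assumes the summary text looks exactly like the rows above; the function name `parse_tcp_description` is hypothetical, and other Network Monitor versions or parsers may format this column differently.

```python
import re

# key=value pairs as they appear in the Description column.
FIELD = re.compile(r"(Flags|SrcPort|DstPort|PayloadLen|Seq|Ack)=([^,\s]+)")


def parse_tcp_description(description: str) -> dict:
    """Extract the TCP fields from one Description cell.

    Numeric fields are returned as integers. A port shown as 'HTTP(80)'
    becomes 80, and a range like 'Seq=100 - 112' keeps the first value.
    """
    raw = dict(FIELD.findall(description))
    parsed = {"Flags": raw.get("Flags", "")}
    for key in ("SrcPort", "DstPort", "PayloadLen", "Seq", "Ack"):
        number = re.search(r"\d+", raw.get(key, ""))
        if number:
            parsed[key] = int(number.group())
    return parsed


row = ("TCP:[Continuation to #2792]Flags=...AP..., SrcPort=3117, "
       "DstPort=HTTP(80), PayloadLen=12, Seq=2256041928 - 2256041940, "
       "Ack=3177618711")
print(parse_tcp_description(row))
```

Running this over the matching rows of both captures gives two lists of dictionaries that can be compared field by field.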

Observation

  1. If you compare the sequence (Seq) numbers (highlighted in red) for each frame in both captures, they are different. So what does this mean? It means that something in between the client and server changed the sequence numbers. In a normal capture, the sequence numbers will be the same in both captures for each corresponding frame.
  2. The values for PayloadLen (highlighted in yellow) are different in the server and client captures. What does this mean? It indicates that the packets were split by some device or program in between.
  3. The client capture indicates that the server closed the connection by sending a “FIN” (see frame 1751 in the client capture). However, you do not see this in the server capture, where the only FIN comes from the client (frame 6447). The server never set the FIN TCP flag!
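Observations 1 and 2 can be made concrete by turning the absolute sequence numbers into byte counts. The sketch below uses values read off the two captures: the client's SYN sequence number, followed by the sequence boundaries visible in frames 186, 188 and 189 (client capture) and frames 2793, 2794 and 2859 (server capture). The helper name `segment_sizes` is made up for illustration.

```python
def segment_sizes(syn_seq: int, boundaries: list[int]) -> list[int]:
    """Payload sizes implied by consecutive sequence-number boundaries.

    The first data byte is numbered syn_seq + 1, because the SYN itself
    consumes one sequence number. The modulo handles 32-bit wraparound.
    """
    edges = [syn_seq + 1, *boundaries]
    return [(end - start) % 2**32 for start, end in zip(edges, edges[1:])]


# Client-to-server data as seen at the client and at the server.
client_sizes = segment_sizes(1608257832, [1608259213, 1608260308])
server_sizes = segment_sizes(2256040559, [2256041928, 2256041940, 2256043035])

print(client_sizes, sum(client_sizes))  # [1380, 1095] 2475
print(server_sizes, sum(server_sizes))  # [1368, 12, 1095] 2475
```

Both sides account for the same 2475 bytes of request data, but the 1380-byte segment that left the client shows up at the server as 1368 bytes plus 12 bytes, and under entirely different sequence numbers. That is consistent with something in the path terminating or rewriting the TCP connection rather than simply forwarding packets.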

Summary & Conclusion

With this data, it is clear that there is a “middle man”, perhaps a device or software in between the client and server that isn’t handling the data flow correctly. The next step is to look at the networking infrastructure or get a network administrator to look at the devices that are in between the clients and IIS web server and isolate the offending device.

Comments

  • Anonymous
    March 14, 2010
    Sudeep, thank you for this posting. I am attempting to use NetMon to troubleshoot a similar but not identical scenario. The issue is accessing a particular web site page from a client in two different ways. The first is by using a standard internet connection. The second is by using our corporate proxy connection. When using the standard internet connection, the page loads within 2-3 seconds. When using our corporate proxy, the page takes approx. 2 minutes. As a side note, this does not happen with all internet pages. I am specifically troubleshooting a website that many of our users connect to and that shows significant delays on specific pages, and I am troubleshooting just one of those pages. I apologize ahead of time as I do not have all the specifics of our proxy server. Our security guys are very particular about that information. I have performed a NetMon capture from a client using the standard internet connection and a second capture while working from behind the proxy. I am seeing exactly what you describe in your article regarding the different sequence numbers, different payload lengths etc. I am by no means an expert at this. I am truly a beginner. I am just wondering if you have any ideas or could point me in the right direction. I realize it is hard without having the captures. I can provide the captures if you would like. Thank you in advance for any assistance you can provide.