Driver Hangs: Detection and Prevention

Posted September 20, 2004

Chat Date: September 15, 2004

Please note: Portions of this transcript have been edited for clarity

Introduction

Moderator: Eric_S (Microsoft)
Welcome to today's Chat. Our topic is Driver Hangs: Detection and Prevention. Questions, comments, and suggestions are welcome.

Moderator: Eric_S (Microsoft)
We will make an effort to answer as many questions as we can. There may be questions that we cannot answer immediately or cannot get to. We encourage you to post any of these questions in the device driver development newsgroup at news://msnews.microsoft.com/microsoft.public.development.device.drivers.

Moderator: Eric_S (Microsoft)
Let's introduce our hosts for today!

Host: geraldm (Microsoft)
Hi, my name is Gerald Maffeo. I'm a Lead Program Manager in the Windows Reliability Team. I'm driving no-hang initiatives for both apps and drivers. I have also been promoting the ability of users to cancel pending I/O on demand.

Host: Nar (Microsoft)
Hi. My name is Nar Ganapathy. I am an architect working on driver models in the Windows Device Experience Group.

Moderator: Eric_S (Microsoft)
Welcome everyone, let's get started!

Start of Chat

Moderator: Eric_S (Microsoft)
Any questions about driver hangs or hang prevention? Please feel free to ask! Our expert hosts are standing by waiting for your questions.

Host: geraldm (Microsoft)
Q: So why are driver hangs a problem, anyhow?
A: They impact the entire system and can cause mysterious, random behavior that appears to be application-related. They're horrible to diagnose and cause a huge amount of customer pain.

Host: Nar (Microsoft)
Q: I read about PREfast on the WHDC site. Where and how can I get the tool, and when will it be available?
A: PREfast for Drivers is available in the DDK. Install the latest DDK and you can use it. The DDK documentation tells you how to use it.

Host: geraldm (Microsoft)
Q: What are the preventive measures that one needs to take when writing device drivers so as not to cause a hang?
A: We have developed I/O completion and cancellation guidelines, along with a whitepaper that describes them.

Host: Nar (Microsoft)
Q: So, what does a "typical" driver hang look like? A forgotten IRP that never gets completed? A request that takes an arbitrarily long amount of time and does not support cancellation? Or a driver that just stops handling requests, period?
A: From an application point of view, a driver-caused hang results in an unkillable hung application. This almost always happens because a driver fails to complete an IRP in time. A driver could also block the thread for a very long time by calling KeWaitForSingleObject in its dispatch routine. This can cause a hang as well. The WinHEC talk at https://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/DW04011_WINHEC2004.ppt describes these issues in great detail.

Host: geraldm (Microsoft)
Q: Is there a commercially available PCI card that can generate NMI when the user presses a button?
A: There are such cards, but none of us knows which vendors make these. Eric will provide some newsgroups you can ask.

Moderator: Eric_S (Microsoft)
A: You might want to ask in the driver developer newsgroup for some recommendations as well: news://msnews.microsoft.com/microsoft.public.development.device.drivers

Host: Nar (Microsoft)
Q: If a driver waits on an IRP to be completed after passing it down the stack, is there a way to find out which IRP is being waited upon?
A: I am assuming that you want to do this in the debugger. Typically you find this IRP stashed away in a driver-specific data structure. If the IRP was issued by the I/O manager, it will also be in the per-thread list. Running !thread in the debugger on the hung thread's pointer will give you the list of IRPs queued to that thread. One of them should be the IRP that's causing the hang.

Host: geraldm (Microsoft)
Q: If a driver waits on an IRP to be completed after passing it down the stack, is there a way to find out which IRP is being waited upon?
A: We have added a new feature to Longhorn that allows users to report hangs that involve drivers. These reports include partial kernel minidumps containing the IRPs that are still pending when the application has not gone away after 10 seconds.

Host: geraldm (Microsoft)
Partners who subscribe to Winqual will be able to obtain these kernel dumps. Details are still to be worked out, though.
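
The "still pending after 10 seconds" criterion can be pictured with a small C sketch. This is only an illustrative user-mode model, not how Longhorn collects its reports: TrackedRequest, find_stale_requests, and HANG_THRESHOLD_MS are hypothetical names, and the timestamps are assumed to come from some monotonic millisecond clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 10 seconds, the figure mentioned in the chat. */
#define HANG_THRESHOLD_MS 10000u

/* Hypothetical bookkeeping record for one outstanding request. */
typedef struct {
    uint32_t id;         /* caller-chosen identifier */
    uint64_t issued_ms;  /* time the request was pended */
    int      completed;  /* nonzero once the request has been completed */
} TrackedRequest;

/* Copies the ids of requests that are still pending and older than the
 * threshold into stale_ids (at most cap entries). Returns how many were
 * written. */
size_t find_stale_requests(const TrackedRequest *reqs, size_t count,
                           uint64_t now_ms, uint32_t *stale_ids, size_t cap)
{
    size_t found = 0;

    assert(reqs != NULL || count == 0);
    assert(stale_ids != NULL || cap == 0);

    for (size_t i = 0; i < count && found < cap; i++) {
        if (reqs[i].completed)
            continue;                    /* finished requests are never stale */
        if (now_ms < reqs[i].issued_ms)
            continue;                    /* ignore records from the "future" */
        if (now_ms - reqs[i].issued_ms > HANG_THRESHOLD_MS)
            stale_ids[found++] = reqs[i].id;
    }
    return found;
}
```

A driver could keep similar per-request timestamps in its own debug builds to spot requests that exceed its completion budget before customers do.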

Host: Nar (Microsoft)
Q: Once saw some power management misbehavior that smelled like a driver hang sort of issue. Rather than a PM request being vetoed, the system might take 10 or 20 minutes before finally responding. Was unable to get into the system for debugging but the party ultimately responsible was a misconfigured 802.11g driver.
A: This could also be tracked down using !poaction, which lists the set of power IRPs that are still pending.

Host: Nar (Microsoft)
Q: Could you perhaps elaborate on the consequences of not supplying a cancel routine in situations where the driver holds the IRP for a long time? (e.g. shutdown hangs, etc)
A: If you don't supply a cancel routine, the system will never be able to notify you when the application is terminating or wants to cancel its request. So not supplying a cancel routine results in poor overall system behavior. In Longhorn (LH) we are expecting more applications to cancel I/O requests in order to be more responsive. We will also enforce timely completion of I/O requests using the driver verifier. So it's very important to supply a cancel routine if the IRP may take more than a couple of seconds to complete.
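
The effect of a missing cancel routine can be illustrated with a small user-mode model in C. This is a sketch under stated assumptions, not WDK code: Request, Queue, pend_request, cancel_request, complete_request, and cancel_queued are hypothetical names invented here, and the model is single-threaded, so it omits the locking and race handling a real driver needs around cancellation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an IRP and a driver queue (not WDK types). */
typedef enum { REQ_PENDING, REQ_SUCCESS, REQ_CANCELLED } ReqStatus;

struct Queue;
struct Request;
typedef void (*CancelFn)(struct Request *);

typedef struct Request {
    ReqStatus       status;
    CancelFn        cancel_routine;  /* NULL: driver supplied no cancel routine */
    struct Queue   *owner;           /* queue currently holding the request */
    struct Request *next;
} Request;

typedef struct Queue { Request *head; } Queue;

/* Removes r from its owner queue if it is still linked there. */
void unlink_request(Request *r)
{
    Request **pp = &r->owner->head;
    while (*pp != NULL && *pp != r)
        pp = &(*pp)->next;
    if (*pp == r) {
        *pp = r->next;
        r->next = NULL;
    }
}

/* Example cancel routine: dequeue the request and finish it as cancelled. */
void cancel_queued(Request *r)
{
    unlink_request(r);
    r->status = REQ_CANCELLED;
}

/* Driver side: hold the request for later, optionally with a cancel routine. */
void pend_request(Queue *q, Request *r, CancelFn cancel)
{
    assert(q != NULL && r != NULL);
    r->status = REQ_PENDING;
    r->cancel_routine = cancel;
    r->owner = q;
    r->next = q->head;
    q->head = r;
}

/* System side: ask the driver to cancel. Without a cancel routine there is
 * nobody to notify, so the request simply stays pending. */
bool cancel_request(Request *r)
{
    CancelFn fn = r->cancel_routine;
    if (r->status != REQ_PENDING || fn == NULL)
        return false;
    r->cancel_routine = NULL;  /* the routine runs at most once */
    fn(r);
    return true;
}

/* Driver side: normal completion. Fails if the request was already finished,
 * for example by the cancel routine. */
bool complete_request(Request *r)
{
    if (r->status != REQ_PENDING)
        return false;
    r->cancel_routine = NULL;
    unlink_request(r);
    r->status = REQ_SUCCESS;
    return true;
}
```

In this model, a request pended with cancel_queued finishes as soon as cancel_request is called, while a request pended with a NULL cancel routine stays REQ_PENDING until the driver completes it on its own. The second case is the situation described above, where the application cannot go away until the driver finishes the I/O.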

Moderator: Eric_S (Microsoft)
Q: Does this cancel routine guidance also apply to a Language Monitor added to a printer driver?
A: Unfortunately we don't have any printer driver experts with us today, but we highly recommend you attend the upcoming printer driver chat scheduled for Sept 21. For more info, see https://www.microsoft.com/whdc/resources/newgrps.mspx

Moderator: Eric_S (Microsoft)
Thanks for joining us today and thanks for the questions. It's time for us to go now. If we couldn't get to your question, please post it in the news://msnews.microsoft.com/microsoft.public.development.device.drivers newsgroup.

Moderator: Eric_S (Microsoft)
Please see the chats schedule for upcoming topics at https://msdn.microsoft.com/chats/.

Host: Nar (Microsoft)
Thanks for posting the questions. Hopefully you had a useful session. Good bye!

Host: geraldm (Microsoft)
Hopefully this has been helpful. I appreciate your taking the time to join us today. Thanks!

Website: https://www.microsoft.com/whdc
