Larry and the "Ping of Death"

Also known as "Larry mounts a DDOS attack against every single machine running Windows NT"

Or: No stupid mistake goes unremembered.

 

I was recently in the office of a very senior person at Microsoft debugging a problem on his machine.  He introduced himself, and commented "We've never met, but I've heard of you.  Something about a ping of death?"

Oh. My. Word.  People still remember the "ping of death"?  Wow.  I thought I was long past the ping of death (after all, it's been 15 years), but apparently not.  I'm not surprised when people who were involved in the PoD incident remember it (it was pretty spectacular), but to have a very senior person who wasn't even working at the company at the time remember it is not a good thing :).

So, for the record, here's the story of Larry and the Ping of Death.

First I need to describe my development environment at the time (actually, it's pretty much the same as my dev environment today).  My primary development machine ran a version of NT and a kernel debugger that was connected to my test machine over a serial cable.  When my test machine crashed, I would use the kernel debugger on my dev machine to debug it.  There was nothing debugging my dev machine, because NT was pretty darned reliable at that point and I didn't need a kernel debugger 99% of the time.  In addition, the corporate network wasn't a switched network, so each machine received datagram traffic from every other machine on the network.

 

Back in the day, I was working on the NT 3.1 browser (I've written about the browser here and here before).  While working on some diagnostic tools for the browser, I wrote a tool to manually generate some of the packets used by the browser service.

One day, as I was adding some functionality to the tool, my dev machine crashed, and my test machine locked up.

*CRUD*.  I can't debug the problem to see what happened because I lost my kernel debugger.  Ok, I'll reboot my machines, and hopefully whatever happened will hit again.

The failure didn't hit, so I went back to working on the tool.

And once again, my machine crashed.

At this point, everyone in the offices around me started to get noisy - there was a great deal of cursing going on.  What I'd not realized was that every machine had crashed at the same time as my dev machine had crashed.  And I do mean EVERY machine.  Every single machine in the corporation running Windows NT had crashed.  Twice (with just enough time between crashes for people to start getting back to work).

 

I quickly realized that my test application was the cause of the crash, so I isolated my machines from the network and started digging in.  It didn't take long to root-cause the problem: the broadcast sent by my test application was malformed, and it exposed a bug in the bowser.sys driver.  When the bowser received this packet, it crashed.

I quickly fixed the problem on my machine and added the change to the checkin queue so that it would be in the next day's build.

 

I then walked around the entire building and personally apologized to every single person on the NT team for causing them to lose hours of work.  And 15 years later, I'm still apologizing for that one moment of utter stupidity.

Comments

  • Anonymous
    October 16, 2007
    Ah, but you did uncover the bug, and probably saved billions from losses due to maliciously malformed packets. Though it does bring up the idea of isolated networks for stuff like this.

  • Anonymous
    October 16, 2007
    > I quickly root caused the problem - the broadcast that was sent by my test application was malformed and it exposed a bug in the bowser.sys driver.  When the bowser received this packet, it crashed.
    Bowser.sys? There's a whole driver dedicated to dogfooding?

  • Anonymous
    October 16, 2007
    I thought I'd done the story of the name of the bowser before.  It's because the driver is "such a dog" :).  My boss at the time had a colorful way with names.

  • Anonymous
    October 16, 2007
    Karellen: I wrote bowser.sys too.   Actually a single failure would have been excused.  Stuff does happen, and we all know that. The reason this became a legend was that I did it a second time. And that was inexcusable.

  • Anonymous
    October 16, 2007
    Doesn't a story like this belong in Us Magazine though, in the "They're Just Like Us" section?  I want to see a picture of Larry with a big caption saying, "THEY BRING DOWN ENTIRE CORPORATE NETWORKS!"

  • Anonymous
    October 16, 2007
    Technically, wouldn't this be a plain old DOS attack rather than a DDOS attack?  From what you wrote, the PoD packets were from a single source (your machine) so they weren't really "distributed".

  • Anonymous
    October 16, 2007
    Chris: I was wondering if someone would think of that.  I figured it was "distributed" because one packet sent from my dev machine was distributed to several thousand other machines and crashed them all.

  • Anonymous
    October 16, 2007
    I guess it would be a reverse DDoS attack, given that a normal DDoS is a bunch of machines bringing down one.

  • Anonymous
    October 16, 2007
    Not quite the same thing, but when I was testing Winsock, I used JamesG's harness api tester, on what I mistakenly believed to be my office isolated network.  Hey, I was curious about how the competitor's TCP/IP stacks would handle it.   Buildings 1-4 had problems keeping up with the "very large" broadcast packet.  I told my test manager and PM about it, and they both agreed that the incident should be forgotten asap and never brought up again. Shame on me, and I quickly removed all of my office test machines in the lab.

  • Anonymous
    October 16, 2007
    > The reason this became a legend was that I did it a second time.
    > And that was inexcusable.
    But that is excusable, and enormously important.  The first time you did it, you didn't know.  The second time you did it, again you didn't know at first, but when you knew about it, you released a fix.  Your fix eventually reached millions of customers, right?  The only surprising part of this is that Microsoft didn't fire you for making a fix that eventually reached millions of customers.  Outside of Microsoft, you'd be a hero.
    Compare that to the Excel bug, where the typically Microsoftian decision was to not release a hotfix.  Someone must have got a big bonus for deciding not to release that hotfix.
    The way to get memories of that event to be forgotten would be to store them on hard drives partitioned by Windows.  That'll get all those memories wiped out.  Still.  Thank you for bucking Microsoft's system and getting your fix out the door.

  • Anonymous
    October 16, 2007
    Norman: Huh?  The Excel guys issued a hotfix ASAP.  And this was way early in the development process (years before we shipped).

  • Anonymous
    October 16, 2007
    > The Excel guys issued a hotfix ASAP.
    Last I saw, Microsoft wasn't distributing the hotfix but was considering including it in a service pack.

  • Anonymous
    October 16, 2007
    Sorry, I see it is published, just not automatically updated by automated tools.  Sorry. http://support.microsoft.com/default.aspx/kb/943075/

  • Anonymous
    October 16, 2007
    Larry, I have to be honest, I'm glad that Windows Vista shipped with WDS, it seems to be completely stable, quick, and the UI is asynchronous (even when enumerating old NT Browser systems). The instability and synchronous enumeration of the old browser list caused lots of application freezes on old versions of Windows (e.g. a Save File dialog in an MS Office application when the user wanted to store the file on a server). Some people blamed the network, others blamed their "slow" computer... ;)

  • Anonymous
    October 16, 2007
    I recently saw an oddity on a colleague's PC running Windows XP: network name lookup (i.e. Start > Run > \\servername) had completely stopped working. When we looked at netdiag /test:winsock /v, it showed that there were a HUGE number of registered NetBT bindings, over 200.
    This is because he uses the laptop for commissioning Windows Mobile 5.0 devices, i.e. installing software on them and then shipping them to the customer. ActiveSync in WM 5.0 is implemented using RNDIS - the device emulates a USB-connected network adapter. Each different device has its own serial number, so USB sees it as a different device. Guess what happens after you've plugged 100 different devices into the computer? You have 100 network adapters, bound to both TCP and UDP. Windows doesn't clean them up because they might eventually come back.
    The workaround was to set the DEVMGR_SHOW_NONPRESENT_DEVICES environment variable, launch Device Manager, select View/Show Hidden Devices and delete every one of the 'Windows Mobile-based Device #nnn' devices under Network Adapters. Having done this, file sharing suddenly started working again.
    I'd better do this soon on my PC, I'm up to Device #48. Anyone know of an automated way to delete these devices? (Sorry, Larry, I know it's a bit tangential, is bowser involved in any way?)

  • Anonymous
    October 17, 2007
    Mike: Not to my knowledge.  The browser is disabled by default on XP as far as I know.

  • Anonymous
    October 17, 2007
    I read bowser.sys and thought, "King Koopa has now invaded my OS kernel! All hope is lost!"

  • Anonymous
    October 17, 2007
    @Mike: One thing to try is to add a registry value under:
    HKLM\System\CurrentControlSet\Control\UsbFlags
    Value name: IgnoreHWSerNumVVVVPPPP
    Value (DWORD): 0x1
    where VVVV = USB Vendor ID in hex and PPPP = USB Product ID in hex.
    This key prevents the USB layer from creating individual per-serial-number nodes under HKLM\System\CCS\Enum\USB. You will have to reboot after this change. Note that the Found New HW Wizard will no longer prompt you for the driver for each newly found device after this change.
    I'm not sure about the exact scenario that you're describing, but if the mechanism relies on the USB serial number (as opposed to the MAC address in the USB network adapter) it might help. (Our HW has a USB serial number, and in production testing, the registry quickly fills up with the Enum\USB nodes for each device connected if you do not use this key...)
    Larry: Sorry for the totally-off-topic.

  • Anonymous
    October 17, 2007
    Matt, Glad I wasn't the only one thinking Mario Bros.

  • Anonymous
    October 17, 2007
    Sounds like some kind of epic adventure inside Microsoft: Deep in the bowels of Microsoft is a lone programmer, sparring with a particularly merciless code fault. Long ago the daylight had forsaken him; the cold night was without stars and moon; he slowly began to sink into the dreary gloom of despair.  His mood worsened towards the brink of failure. As the night wore on, a minstrel came forward and proclaimed, "I will sing to you of Larry of the Third NT, and the Ping of Death." And when he heard that he laughed aloud for sheer delight, and he stood up and cried "O great glory and splendour! And all my wishes have come true!" and then he wept.

  • Anonymous
    October 18, 2007
    LOL!

  • Anonymous
    October 20, 2007
    I still remember testing an early version (a beta) of Operations Manager (which later became MOM and is now OpsMgr, but was still Mission Critical Software's product at that time) that had a bug: instead of notifying the network Administrator with a NET SEND, it would notify EVERY SINGLE USER in the domain. So, testing it on the production environment flooded everybody in the company with Alert popups.... OK, it did not actually crash anything, but still... the CEO of the company I was working at did not quite like that either...

  • Anonymous
    October 22, 2007
    heh! i brought down my corporate network one day, crashing every Win 3.1 machine... probably 30 or so people. We had BNC cabling (was that 10-baseT? I forget) configured as a ring with every machine on it... I was playing with a screwdriver in my machine (putting an 8 port serial card in) and accidentally shorted the network... Immediate swearing including the ferociously bad tempered and intimidating CEO (at the time, I was 21) who stormed out of his office swearing "Who the @#$% did that! What the $^%^ caused that". He then saw me with screwdriver in hand... "Was that you? Do you know how much %^&ing work I've lost?" Fortunately another guy I worked with, who i hadn't liked very much until that point said, "Nope, wasn't him, must've been the Novell server crashing. It does that sometimes." Ah, fond memories!

  • Anonymous
    October 23, 2007
    How long afterwards did it take for MSFT IT to call Cisco for some switches? Seems like it should have been the IT department apologizing.
