IE8 Lookahead Downloader Fixed

Background
Last year, I wrote about two bugs in IE8’s Lookahead Downloader that would cause IE8 to make spurious download requests for non-existent URLs. These spurious requests generally went unnoticed by users, because the main parser would eventually retrieve the correct resource when it was needed. However, on a small number of sites where requesting a non-existent URL has side effects, the spurious requests caused significant user-experience problems. For instance, on some sites, ASP.NET ViewState Corruption exceptions caused the site to defensively log the user out.

In October, we fixed one of the two bugs, correcting the URLs requested by the Lookahead Downloader when the markup contained a BASE tag.
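As a rough illustration of why the BASE tag matters, the sketch below resolves the same relative SCRIPT SRC against two different bases using Python's standard `urllib.parse.urljoin` (RFC 3986 resolution). The URLs and the `resolve` helper are hypothetical. The post does not say exactly how the Lookahead Downloader computed its bad URLs; resolving against the document URL instead of the BASE href is assumed here only to show how a wrong base yields a request for a URL that does not exist.

```python
from typing import Optional
from urllib.parse import urljoin

# Hypothetical example values, not taken from any real site.
DOCUMENT_URL = "http://example.com/app/pages/view.aspx"
BASE_HREF = "http://example.com/static/"   # value of <base href="...">
SCRIPT_SRC = "js/site.js"                  # value of <script src="...">


def resolve(src: str, document_url: str, base_href: Optional[str] = None) -> str:
    """Resolve a relative reference against the BASE href if present,
    otherwise against the document URL."""
    # The BASE href may itself be relative, so resolve it against the document first.
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, src)


correct = resolve(SCRIPT_SRC, DOCUMENT_URL, BASE_HREF)
# Assumed failure mode: a scanner that overlooks the BASE tag.
wrong = resolve(SCRIPT_SRC, DOCUMENT_URL)

print(correct)  # http://example.com/static/js/site.js
print(wrong)    # http://example.com/app/pages/js/site.js
```

The second URL points into a directory that, on this made-up site, holds no scripts at all, which is exactly the kind of spurious request described above.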

After that fix, one more type of bug remained: a timing-related problem (sometimes called "the 4kb bug") whereby the Lookahead Downloader would sometimes request a malformed URL consisting of the part of a URL preceding the 4096th byte[1] of the markup, combined with whatever text follows the 8192nd byte[1], up to the next quotation mark. Our investigation found two scenarios that could lead to the 4kb bug:

  1. Parser Restart (occurs when the CHARSET is specified in a META tag)
  2. Parser Suspension (occurs for multiple reasons; a common one is when the document contains an XML Namespace declaration)
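The shape of the malformed URL can be modeled in a few lines. This is a hypothetical sketch, not IE's code: it assumes the boundaries fall at exactly 4096 and 8192 bytes (footnote [1] notes the real offsets varied), builds a made-up page whose SCRIPT SRC straddles the first boundary, and then splices the URL's prefix onto the text after the second boundary, up to the next quotation mark.

```python
CHUNK = 4096  # assumed boundary size; footnote [1] says real values varied


def malformed_url(markup: bytes, url_start: int,
                  cut: int = CHUNK, resume: int = 2 * CHUNK) -> str:
    """Model of the 4kb bug: the URL bytes before offset `cut`, joined with
    the bytes from offset `resume` up to the next quotation mark."""
    head = markup[url_start:cut]
    end = markup.find(b'"', resume)
    tail = markup[resume:] if end == -1 else markup[resume:end]
    return (head + tail).decode("ascii")


def build_demo_markup() -> tuple:
    """Hypothetical page whose SCRIPT SRC straddles the 4096-byte boundary.
    Returns (markup bytes, offset where the URL starts)."""
    tag = '<script src="'
    url_start = CHUNK - 5  # URL begins 5 bytes before the boundary
    page = " " * (url_start - len(tag)) + tag + "/scripts/app.js" + '"></script>'
    # Pad so that offset 8192 lands at the start of an unrelated attribute value.
    page += " " * (2 * CHUNK - len(page))
    page += 'images/logo.gif" alt="logo">'
    return page.encode("ascii"), url_start


markup, start = build_demo_markup()
print(malformed_url(markup, start))  # /scriimages/logo.gif
```

Neither `/scripts/app.js` nor `images/logo.gif` is requested; instead the model produces `/scriimages/logo.gif`, a URL that exists nowhere on the page.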

Web developers could easily avoid Scenario #1 by specifying the CHARSET in the HTTP Content-Type response header, but, critically, Scenario #2 had no easy, comprehensive workaround.
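To check which side of Scenario #1 a given page falls on, one simple heuristic is to look at where its charset is declared. The `charset_source` helper below is hypothetical and deliberately naive (substring and regex checks, not a real HTML encoding sniffer). Following the post's description, it treats a charset in the Content-Type header, or a Unicode BOM as in footnote [2], as avoiding the restart, and a charset found only in a META tag as the restart case.

```python
import re
from typing import Optional

# Byte-order marks for UTF-8, UTF-16LE and UTF-16BE.
BOMS = (b"\xef\xbb\xbf", b"\xff\xfe", b"\xfe\xff")
# Naive pattern for <meta charset=...> or <meta http-equiv=... content="...; charset=...">.
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=", re.IGNORECASE)


def charset_source(content_type: Optional[str], body: bytes) -> str:
    """Hypothetical heuristic: report where a page declares its charset."""
    if content_type and "charset=" in content_type.lower():
        return "http-header"   # charset known before parsing starts: no restart expected
    if body.startswith(BOMS):
        return "bom"           # footnote [2]: a BOM also prevents the restart
    if META_CHARSET.search(body):
        return "meta"          # Scenario #1 per the post: META charset triggers a restart
    return "none"
```

For example, `charset_source("text/html", b'<meta charset="utf-8">')` returns `"meta"`, flagging a page that would benefit from moving the charset into the header.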

Yesterday’s Fix
Yesterday’s IE8 Cumulative Update (KB980182) resolves the timing problems so that IE8’s Lookahead Downloader will no longer issue spurious requests. The Update resolves the problem in Scenario #2 outright: parser suspensions will no longer lead to problematic behavior. In Scenario #1, however, the Update kills the bug by disabling the Lookahead Downloader when a restart is encountered. Hence, we continue to strongly recommend that web developers specify the CHARSET in the HTTP Content-Type response header, as this ensures that the performance benefit of the Lookahead Downloader is realized. Even if a future version of IE addresses Scenario #1 more elegantly, specifying the CHARSET using the HTTP header[2] has other performance and security benefits for pages targeting any browser.
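Following that recommendation is usually a small server-side change. As a minimal sketch, assuming a Python WSGI application (the `app` function and `PAGE` markup are hypothetical), the charset goes in the Content-Type response header rather than in a META tag:

```python
# Hypothetical page content for the demo.
PAGE = "<!DOCTYPE html><html><head><title>Demo</title></head><body>Hello</body></html>"


def app(environ, start_response):
    """Minimal WSGI app that declares the charset in the HTTP header."""
    body = PAGE.encode("utf-8")
    start_response("200 OK", [
        # The charset travels with the response, so no META tag is needed to declare it.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("127.0.0.1", 8000, app)` can serve it; other servers and frameworks set the same header through their own configuration.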

I’ve built a Meddler Script which demonstrates the Restart-related timing issue, but keep in mind that it shouldn’t do anything interesting in IE8 after the 3/30/2010 IE Cumulative Update is applied.

-Eric

[1] Actual values varied, but were typically a multiple of 4kb.
[2] Technically, using a Unicode BOM at the top of the document would also prevent the restart, but it doesn't confer the same security benefit.

Comments

  • Anonymous
    April 01, 2010
    Thanks for tracking and fixing these issues, and for giving insight into how browsers work. Is there any kind of "spec" around speculative downloading? Do you know whether the other major browsers do this, and if so, how their implementations differ? Is there benefit in getting more common behavior in this area? It would be beneficial if there were more sharing in this area; we'd end up with a better algorithm and well-understood behavior.

  • Anonymous
    April 01, 2010
    @Steve: I haven't seen any documentation of Lookahead Downloader implementations, but I'm pretty sure that every major browser does it. I'm hoping to write a comprehensive explanation of the behavior at some point in the future, because trying to "reverse engineer" how it works by looking at the wire traffic can be very misleading. IE8's "Lookahead Downloader" begins scanning ahead of the preparser (confusing name) when the end of a script block has been found, on the theory that the blocking behavior of script (especially when the script is from an external SRC) leads to wasted time. The Lookahead Downloader looks only for SCRIPT SRCs to download, so that the next time we hit a SCRIPT block, we're less likely to have to wait on the network. What makes things a bit confusing is that the normal preparser also performs speculative downloads, for a wider variety of content types (e.g. images, CSS, script). The behavior of the preparser's speculative downloader is one of the things that your "browser download tester" tests.

  • Anonymous
    April 07, 2010
    Eric - KB980182 appears to be for IE6 SP1? (http://www.microsoft.com/downloads/details.aspx?FamilyID=daf199c4-da56-4a7f-80e6-3936ce5c267b&displaylang=en) Does this apply to IE8 as well?

  • Anonymous
    April 07, 2010
    @Kenza: The same KB number is used for the entire cumulative update, regardless of which IE version / OS it applies to.  See http://www.microsoft.com/downloads/en/results.aspx?pocId=&freetext=KB980182&DisplayLang=en, or better yet, just let Windows Update install it for you.

  • Anonymous
    September 16, 2010
    Hi Eric. This problem seems to be happening again. We're seeing spurious GETs of objects with spurious HTML content appended to the URL, pretty much what was described in stackoverflow.com/.../964403. Example user-agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0). Can you confirm whether this bug re-appeared around a month ago? FWIW, our application already provides this header: Content-Type:text/html; charset=utf-8. Thank you, Yves

  • Anonymous
    September 16, 2010
    @Yves: No, there are no known regressions here. How many hits are you seeing? It's possible that the user in question simply hasn't installed current updates from Windows Update. Do you have a URL I might have a look at?

  • Anonymous
    November 11, 2010
    I concur that this bug seems to have re-appeared. We're getting copious amounts of these errors each day.

  • Anonymous
    November 11, 2010
    To reiterate, there are no known regressions here, and without any additional information, I expect your errors are related to something else entirely.

  • Anonymous
    January 27, 2011
    It looks like IE8 can't get the base if the href has a relative path ("..").


<base href="http://localhost/site/jsp/zzz.jsp"> -----....---- <a href="../zzz.action?zzzId=198&action=View" class="nodec"><img src="../images/view.gif" border="0" alt="View">

Result: http://localhost/zzz.action and http://localhost/images/view.gif, which means the action returns a 404 and the GIF is not displayed.

  • Anonymous
    January 27, 2011
    @Dex: That is incorrect. Here's a test case: www.debugtheweb.com/.../base

  • Anonymous
    January 27, 2011
    @Eric: Thanks for the reply. The page was generated by Struts 1 and works in IE6 (with the correct URL, verified by mouse hover), but somehow IE8 loses or can't get the base. To make it work in IE8, I have to get rid of the relative path.

  • Anonymous
    June 10, 2014
    The comment has been removed

  • Anonymous
    June 10, 2014
    @Eddie: I haven't heard any complaints. Do you have access to the client? If so, the version # from Help > About and a Netmon or Wireshark capture would be super useful. If not, the URL might help too, if it's public.
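The April 1 reply above notes that IE8's Lookahead Downloader looks only for SCRIPT SRCs, while the preparser's speculative downloader also fetches images and CSS. As a rough illustration of that narrower scan, the following sketch uses Python's standard `html.parser` to pull only SCRIPT SRC values out of a page. The `ScriptSrcScanner` class and `lookahead_script_srcs` function are hypothetical names, and this is not how IE8 tokenizes; a real lookahead scanner presumably works on partial markup that is still arriving from the network.

```python
from html.parser import HTMLParser


class ScriptSrcScanner(HTMLParser):
    """Toy scan that collects only SCRIPT SRC values, ignoring images and CSS."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def lookahead_script_srcs(markup: str) -> list:
    """Return the SRC of every external script in `markup`, in document order."""
    scanner = ScriptSrcScanner()
    scanner.feed(markup)
    scanner.close()
    return scanner.srcs
```

Given a page containing an IMG, a stylesheet LINK, two external scripts, and one inline script, only the two script URLs come back; the other resources are left for the preparser.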