Content-Encoding != Content-Type

RFC 2616 for HTTP 1.1 specifies how web servers must indicate encoding transformations using the Content-Encoding header. Although Content-Encoding (e.g., gzip, deflate, compress) and Content-Type (e.g., application/x-gzip) sound similar on the surface, they are two distinct pieces of information. Servers use Content-Type to specify the data type of the entity body, which is useful for client applications that want to open the content with the appropriate application. Content-Encoding is used solely to specify any additional encoding that the server applied before transmitting the content to the client. Although the HTTP RFC outlines these rules pretty clearly, some web sites respond with "gzip" as the Content-Encoding even though the server has not gzipped the content.

Our testing has shown this problem to be limited to some sites that serve Unix/Linux style "tarball" files. Tarballs are gzip-compressed archive files. By setting the Content-Encoding header to "gzip" on a tarball, the server is specifying that it has additionally gzipped the already gzipped file. This, of course, is unlikely, though it is neither impossible nor non-compliant.

Therein lies the problem. A server responding with a Content-Encoding, such as "gzip," is specifying the mechanism that the client must use to decode the content. If the server did not actually encode the content as specified, then the client's decompression would fail or produce something other than the intended file.
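To make that concrete, here is a minimal Python sketch of what a header-trusting client does. The `decode_entity` helper and the sample byte strings are made up for illustration; this is not how WinINet is implemented.

```python
import gzip


def decode_entity(content_encoding: str, body: bytes) -> bytes:
    """Hypothetical helper: undo the encoding the server claims to have applied."""
    if content_encoding.lower() in ("gzip", "x-gzip"):
        # Raises OSError (gzip.BadGzipFile on newer Pythons) if body is not gzip data.
        return gzip.decompress(body)
    return body


# Stand-in for a tarball on disk: the file itself is already gzip-compressed.
tarball = gzip.compress(b"pretend this is tar archive data")

# Honest server: it really gzipped the tarball a second time before sending.
honest = decode_entity("gzip", gzip.compress(tarball))
assert honest == tarball  # the client ends up with the original .tar.gz bytes

# Mislabeling server: the header says gzip, but the body is the file as stored.
mislabeled = decode_entity("gzip", tarball)
assert mislabeled != tarball  # the client has stripped the file's own compression

# Mislabeled content that was never gzipped at all cannot be decoded.
try:
    decode_entity("gzip", b"plain text, never compressed")
    failed = False
except OSError:
    failed = True
```

In the mislabeled tarball case the decode succeeds, but what the client is left with is the bare archive rather than the .tar.gz file the user asked for.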

Here is a potentially over-simplified example:

  1. Windows Vista Networking Rocks!
  2. Jvaqbjf Ivfgn Argjbexvat Ebpxf!

If I mistakenly claim that string 1 has been encoded using the simple ROT-13 obfuscation scheme when in actuality it has not, then the "decoded" message, string 2, will be very different from the intended message.
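The same mix-up can be reproduced in a few lines of Python using the standard library's `rot_13` codec. The variable names are only for illustration.

```python
import codecs

original = "Windows Vista Networking Rocks!"

# The sender never encoded the string, but labels it as ROT-13 anyway.
# A receiver that trusts the label "decodes" it and gets gibberish.
wrongly_decoded = codecs.decode(original, "rot_13")
print(wrongly_decoded)  # prints: Jvaqbjf Ivfgn Argjbexvat Ebpxf!

# Decoding only recovers the message when the label is true.
truly_encoded = codecs.encode(original, "rot_13")
assert codecs.decode(truly_encoded, "rot_13") == original
```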

Since the AI engine for WinINet isn't yet ready for production (joke), we try to work around these non-compliant server responses, but that isn't the right long-term approach. The fix, and the ask, is for web server, extension, and application authors to test their servers to see if they exhibit this behavior and, if so, to fix their implementations before we remove our client-side hacks.

To test your server for compliance, issue a simple HTTP 1.1 request for a .gz file, including the "Accept-Encoding: gzip" header, and inspect the response headers. If you see Content-Encoding: x-gzip or gzip, then the server is either gzip-encoding the already gzipped file or misstating that it encoded the content before transmission. In the latter case, it is forcing client HTTP stacks, such as WinINet, to keep absorbing and hiding this bad server behavior.
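One way to automate that check is sketched below in Python. The `fetch` and `diagnose` helpers are hypothetical names, the host and path in the usage note are placeholders, and the magic-byte test is only a heuristic: it assumes the requested file is an ordinary .gz file whose uncompressed contents do not themselves begin with the gzip signature.

```python
import gzip
import http.client
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def fetch(host: str, path: str):
    """Request a file with Accept-Encoding: gzip; return (Content-Encoding, raw body)."""
    conn = http.client.HTTPConnection(host, timeout=10)  # use HTTPSConnection for TLS
    try:
        conn.request("GET", path, headers={"Accept-Encoding": "gzip"})
        resp = conn.getresponse()
        # http.client does not decode the body, so these are the bytes as sent.
        return resp.getheader("Content-Encoding"), resp.read()
    finally:
        conn.close()


def diagnose(content_encoding, body: bytes) -> str:
    """Heuristic verdict for a response to a request for a .gz file."""
    if (content_encoding or "").strip().lower() not in ("gzip", "x-gzip"):
        return "ok: no gzip Content-Encoding claimed"
    if not body.startswith(GZIP_MAGIC):
        return "mislabeled: body is not gzip data"
    try:
        inner = gzip.decompress(body)
    except (OSError, EOFError, zlib.error):
        return "mislabeled: body does not decode as gzip"
    if inner.startswith(GZIP_MAGIC):
        return "ok: server gzipped the .gz file again"
    return "mislabeled: body is the .gz file as stored"
```

For example, `print(diagnose(*fetch("www.example.com", "/downloads/sample.tar.gz")))` would report on one file, where both the host and the path stand in for your own server.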

-Billy Anders

Comments

  • Anonymous
    September 09, 2006
    PingBack from http://john.se/blog/2006/09/10/why-internet-explorer-sometimes-saves-targz-files-like-tartar-files/

  • Anonymous
    January 28, 2008
    Doing some archaeology here, but: we just came across an issue, and after a case was opened with MS, the official answer is: "The Internet Explorer not decompresses only files received with Content-Encoding=GZIP AND with Content-Type like "application/x-tar" "x-world/x-vrml" "application/zip", "application/x-gzip" "application/x-zip-compressed" "application/x-compress" "application/x-compressed" "application/x-spoon" to not break the behavior to IE 6. This was the reason to hold this behavior. We try for consistency with HTTP 1.1 standard, but we can´t give any guaranty that this meet the HTTP 1.1 RFC. So it´s better to use a modified Content-type-header to bring the IE to decompress these files." Very nice, and not completely in line with the "philosophy" of the above post...

  • Anonymous
    August 25, 2010
    This broken "feature" is still present in IE8. If you download a Zip file (Content-Type: application/zip) from Apache and it is also gzip encoded (Content-Encoding: gzip) then the file IE8 saves to disk will still be gzip-encoded, and Windows, Winzip, etc., will not be able to open it.