Privacy Beyond Blocking Cookies: Bringing Awareness to Third-Party Content

Previous posts have covered trustworthy principles in general and some product specifics as well. Privacy is an important part of trustworthy computing. This post discusses one aspect of privacy on the web: third-party content.

When most people browse the web, they think what they see in the address bar and the site they are visiting are the same thing. However, web sites today typically incorporate content from many different web sites. For the sake of clear terminology, the site the user browses to directly (seen in the address bar) is the first-party site; the other sites that the first-party site incorporates in its site experience (but that the user hasn’t navigated to directly) are third-party sites.

When you browse to a first-party site, you know that it can collect information about how you use the site.  What many users don’t realize is that technically, third-party sites can collect information about users as well. Users aren’t typically well-informed about which third-party sites are collecting what information, how the sites use this information today, or how the sites could use the information in the future.

Identifying Third-party Sites

Most websites today are actually mosaics, or mash-ups, of several different sites. To see this, you can bring up the Privacy Report in Internet Explorer (from IE7’s Page menu or IE6’s View menu, choose the Web Page Privacy Policy menu item) for any site you visit. Here’s part of the report for a news site, and another from a credit card site:

[Images: Example Privacy Report (news site); Example Privacy Report (credit card site)]

While the address bar shows the address of the current, first-party, site, this dialog shows the addresses of all the different web sites (including third-party sites) that the current web page includes content from. The browser visits every one of these sites in order to show the current web page’s content. 
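
As a rough illustration of what this dialog lists, the sketch below scans a page's HTML for embedded resources and reports the hosts that differ from the page's own host. It is not how Internet Explorer builds the Privacy Report. The class and function names, the short list of tags, and any hosts you feed it are assumptions made for this sketch, and comparing exact host names is a simplification (a site's own image server on a different subdomain would be counted as third-party).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collects the URLs a page asks the browser to fetch."""

    # Tag -> attribute holding the resource URL. Illustrative, not exhaustive.
    URL_ATTRS = {"img": "src", "script": "src", "iframe": "src", "link": "href"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative URLs against the page's own address.
                self.urls.append(urljoin(self.page_url, value))


def third_party_hosts(page_url, html):
    """Return the sorted hosts of embedded content that differ from the page's host."""
    first_party = urlparse(page_url).hostname
    collector = ResourceCollector(page_url)
    collector.feed(html)
    hosts = {urlparse(url).hostname for url in collector.urls}
    return sorted(host for host in hosts if host and host != first_party)
```

Calling third_party_hosts with a page's address and its HTML source returns the other hosts the browser would contact to render that page, which is roughly the information the dialog brings to the user's attention.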

The way that sites can pull in content from other sites is useful, powerful, and typical on the web today. It’s part of the underlying design and structure of the web, and enables functionality (like an interactive map in the middle of a restaurant’s website, or a “share this” link in the middle of a news article) that people value.

Third-Party Sites and Privacy

At the same time, bringing information together from different websites has privacy implications. A good example of this issue that most people have experienced involves email. Many email systems treat email messages that come from unknown senders in a special way, blocking images in them and displaying a warning like this one:

[Image: Blocked Images Warning Message]

The message body typically has some missing images (“red X’s”) with text nearby, like “Right-click here to download pictures. To help protect your privacy, Outlook prevented automatic download of this picture from the Internet.”

Why do email systems block these external images? The sender may have embedded information in the external image’s address that is unique to the recipient; for example, the image’s file name or location might include the recipient’s email address. When the sender sees that a particular image was downloaded, the sender knows which email message arrived in a valid account and was opened. By not downloading the content, the email recipient prevents his email system from disclosing information and protects his privacy from the unknown sender. Potentially, the recipient also protects himself from more unsolicited email.
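
To make the mechanism concrete, here is a minimal sketch of how an image address can carry a per-recipient identifier and how the sender's server could read it back when the image is requested. The host sender.example, the file name pixel.gif, the parameter name r, and both function names are hypothetical; real senders may encode the identifier in other ways.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def tracking_image_url(base_url, recipient):
    """Sender side: build an image URL that is unique to one recipient."""
    return base_url + "?" + urlencode({"r": recipient})


def recipient_from_request(requested_url):
    """Sender's server side: recover the recipient from the URL that was fetched."""
    query = parse_qs(urlparse(requested_url).query)
    return query.get("r", [None])[0]


# Hypothetical sender host and recipient address, for illustration only.
url = tracking_image_url("http://sender.example/pixel.gif", "alice@example.com")
```

If the mail client never downloads the image, the request carrying the identifier is never made, which is why blocking the download protects the recipient.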

In general, every piece of web content that a computer requests from a website discloses information to that website. This basic technique enables a third-party site to track visitors across different first-party websites that include content from the same third-party. When several websites show content (like a syndicated photo or article) from the same third-party website, that third-party site can determine which of the websites a particular visitor has browsed to.

For example, say two totally unrelated sites, Site1.com and Site2.com, both include images from MySyndicatedPhotos.com. The user browses to both Site1.com and Site2.com, and the user’s browser calls MySyndicatedPhotos.com in order to get the images these sites include. MySyndicatedPhotos.com can figure out (by various means) that the same machine visited these two different sites.

As the user visits additional sites that show content from this same third-party site, this third-party site is in position to build a profile of the user’s activity across the different sites that include its content.
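
One of the "various means" can be sketched from the third-party site's point of view. Browsers commonly send a Referer header naming the page that embedded the content, and every request arrives with an IP address and a user-agent string. The sketch below groups referring sites by that crude visitor key, with no cookies involved. The log format, the field names, and the function name are assumptions for illustration, not a description of what any particular site does.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_profiles(request_log):
    """Group the first-party sites seen in Referer headers by apparent visitor.

    Each log entry is a dict with 'ip', 'user_agent', and 'referer' keys
    (a hypothetical log format chosen for this sketch).
    """
    profiles = defaultdict(set)
    for entry in request_log:
        # Crude visitor key built only from data sent with every request.
        visitor = (entry["ip"], entry["user_agent"])
        site = urlparse(entry["referer"]).hostname
        if site:
            profiles[visitor].add(site)
    return dict(profiles)
```

Given log entries for the same machine with referring pages on Site1.com and Site2.com, the result maps that one visitor key to both sites, which is the beginning of a cross-site profile.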

While cookies can definitely contribute here, and there’s been long-standing concern and confusion about “tracking cookies,” the fact is that any content coming from a third-party site can function like a tracking cookie. The intent of the content (a photo, article, logo, or site-specific analytics; image, text, or script) is technologically irrelevant to its potential use as a tracking mechanism. Note that even if the user had blocked all cookies, other content on third-party websites could still be used to build a profile. Third-party content isn’t inherently good or bad; it’s just technically possible to use it this way.

Actually Happening or Just Technically Possible, and Other Questions

To be clear, this post is about what a website can do when several other websites use content from it. It’s not what all third-party sites actually do when other sites refer to content on them. What is actually done with the available information is up to the third-party site, and in some ways very hard for anyone else to figure out. The third-party site could have a clear, well-written, and prominently posted privacy policy that guides its operations. It might not. The site could have an employee who loses a laptop with the data collected, or has malware on his machine and discloses collected information against policy. The site could have business arrangements with other sites that involve pooling data.

Also, this blog post isn’t meant as a technical deep-dive on the techniques sites can use to track users, or the different counter-measures technically-savvy users might take to avoid being tracked. The common technical theme here (as described above in the email case and here) involves ways that first-party sites enable information that can uniquely identify site visitors to flow to third-party sites. For example, many of the web addresses you’ll find in the Web Page Privacy Policy dialog are quite long and contain unique identifiers. There are better discussions of this topic elsewhere. For example, a recent IRC discussion about developing new standards for rich websites covered aspects of this topic. While it’s quite long, some parts are very relevant, like this one (that people “are being tracked whether they send cookies or not”) and this one (“anyone who wants to track people across the web can trivially do so at this point, even without cookies…. you can pretty easily ‘fingerprint’ people through things like their user-agent string, ip address, screen size, other js- and http- accessible prefs, etc and then with a simple set of analysis scripts you can easily work out who is who just look at the ‘anonymised’ search query string data aol released”).
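
The "fingerprint" idea in that quote can be sketched in a few lines: combine attributes that a site can observe without cookies into one stable identifier. The choice of attributes, the separator, the hash, and the function name are all assumptions made for this sketch; it is a toy, not a description of any real tracking system.

```python
import hashlib


def fingerprint(user_agent, ip_address, screen_size):
    """Combine a few observable attributes into a short, stable identifier."""
    raw = "|".join([user_agent, ip_address, screen_size])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

The same browser on the same connection yields the same value on every site that computes it, while a change in any one attribute yields a different value. That stability is what lets the identifier stand in for a cookie.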

Web browsing isn’t anonymous or perfectly private even without third-party sites. For example, the provider of Internet access (to a person’s home, hotel room, café table, or desk at school or work) can observe where the computer goes on the Internet. These providers typically provide terms of use, so users have clear notice and can choose to accept or decline connectivity under the stated terms. Any software running on the user’s machine can determine the websites the machine has visited; this is the basis of features like History, or toolbars that copy a user’s browser history up to the web so users can get at it from different machines. Again, terms of use and privacy policies are important tools here for users. The websites a user visits can determine information about the user (for example, the user’s likely location). Also, users give the sites they visit information directly in terms of what they click on and choose to do.

Third-Party Sites and Trust Issues

Given that web browsing isn’t anonymous and in some ways this is “how things work” on the web, what exactly is the trust issue? For many people, trust begins with security. The security risk here is plain: visiting one website exposes the user to potentially malicious content from other websites. The user visits one site and sees content on it that seems trustworthy (it’s on the site!) but actually comes from a different source. Finding examples of this problem on the web isn’t hard; it’s happened to visitors of several top-tier websites.

Trust includes privacy as well. The privacy concern involves users having a choice, and being able to exercise control over what information they share. Today, users are not in control of which websites can get information about their browsing activities. As a result, websites that users aren’t aware they’ve visited, and with which they don’t have a well-defined relationship, are in a position to build a profile of the users’ browsing patterns.

A guiding principle for Internet Explorer (and Microsoft overall, as part of Trustworthy Computing) is that the user should be in control. Consumers have come to expect security protections from their browsers, and are starting to have higher expectations about privacy protections as well. Control here means that users have clear notice and can tell what sites they may be disclosing information to and under what terms. Control also means that users can exercise choice about what information they disclose to whom. Preventing information disclosure means blocking content; blocking content creates a possible impact to the appearance and functionality of the page.

Beyond these issues, accountability is a question here as well. When a user visits one site after another, and each one includes some third-party content, who is accountable and who takes responsibility for the information collected about the user? On today’s web, that’s not at all clear.

The privacy and trust issues around third-party content are complex and important. As discussed in this blog before, trustworthy browsing involves many industry challenges, and, like many other efforts (e.g. interoperability), requires cooperation and trade-offs. Web privacy involves more than just blocking cookies. Enabling users to be in control starts with making users aware of the issues. In another post, we’ll cover IE8 functionality that helps users stay in control of their information.

Dean Hachamovitch
General Manager

Comments

  • Anonymous
    August 25, 2008
    As others have written here before, users should be in control of their information. That’s at the core

  • Anonymous
    August 25, 2008
    I'm not sure if this article is introducing a new policy, or just a new feature that lets people see what third party sites are present in the current page. Are MS going to block third party cookies by default? It'll harm your own MSN web properties if you do. If advertisers aren't able to track their ad performance (which is the only reason advertisers do tracking, they're not interested in individual users) then they'll become less effective places to advertise.

  • Anonymous
    August 25, 2008
    IE8 will not block third-party cookies by default :), but it will give users the option to do so; more is covered in: http://blogs.msdn.com/ie/archive/2008/08/25/ie8-and-privacy.aspx

  • Anonymous
    August 25, 2008
    I currently have IE6-System IE7-Standalone installed for clarification (been needing to do some backwards compatibility testing of late)... Looking at the options in IE7's slider bar I'm a bit confused, what is a compact privacy policy? I'm not sure what IE8 has off hand, will take a second look when B2 comes out. I think IE8B1 (when I had it installed) was not able to open my P3P's URL. By chance has this issue been addressed in B2? I can wait and will try to remember to test it out when B2 comes out and if it hasn't been resolved I'll file a bug report. Thanks for posting Dean.

  • Anonymous
    August 25, 2008
    > as part of Trustworthy Computing) is that the user should be in control Trustworthy Computing, user in control ?? OMG ! This is the joke of the year ! ahahah

  • Anonymous
    August 25, 2008
      IE 8 Beta 2: Privacy is about more than cookies As others have written here before, users should

  • Anonymous
    August 25, 2008
    what i dont get is why they didnt change office 07's mail app to Office Mail to go along with vista

  • Anonymous
    August 25, 2008
    Previous posts on the IE Blog have covered trustworthy principles in general and some product specifics

  • Anonymous
    August 25, 2008
    Trustworthiness is key. Happy to see a focus on the issue.

  • Anonymous
    August 25, 2008
    Yes it is good! And Download Manager with resume capability!!?? Tony Chor!!

  • Anonymous
    August 26, 2008
    "the site the user browses to directly is the first-party site; the other sites that the first-party site incorporates in its site experience are third-party sites" Okay, so what are second-party sites? 8=]

  • Anonymous
    August 26, 2008
    This will be the major "selling" point for IE8. I think that many consumers are anxious about being more "under cover" these days.

  • Anonymous
    August 26, 2008
    So Dean, this is great!   Can you improve the UX?  In the first picture, it says "some cookies" were blocked, but that's the only way to distinguish it from the second.  How about an icon or indicator that I was protected? Also, it would be great to be able to see which cookies were blocked, and which were let through.  It would be awesome to be able to control those cookies right there, maybe with a right-click or a checkbox. Adam

  • Anonymous
    August 26, 2008
    I must admit I always wondered : what is a compact privacy policy? A cookie defined only for a domain?

  • Anonymous
    August 26, 2008
    Hi, this is only a foretaste of what there should be, but as you can see it is already a better solution.

  • Anonymous
    September 03, 2008
    Is there any running subscription list available to use today? I think many people really want to enable this feature pronto but are unable to do so without a list to subscribe to.

  • Anonymous
    March 06, 2009
    In this installment of our Privacy Solutions Series, we'll be taking a look at the privacy-related features in the most popular browser in use today, Microsoft's Internet Explorer. Specifically, we'll be examining the most recent version of the browser,

  • Anonymous
    March 26, 2009
    Previous posts covered general guidelines for trustworthiness (in English) and product specifics (the XSS Filter and trustworthiness (in English)). Privacy