Canonical Formats and Query Strings - IIS SEO Toolkit

Today somebody ran the Site Analysis feature of the IIS SEO Toolkit and it flagged a lot of violations saying "The page contains multiple canonical formats." The reason, apparently, is that he uses Query String parameters to pass contextual and other information between pages. This of course raises the question: does that mean Query Strings are, in general, bad news SEO-wise?

Well, the answer is not necessarily.

I will start by clarifying that this violation in Site Analysis means our algorithm detected two URLs that look like the same content. Note that we make no assumptions based on the URL itself (including Query String parameters). This kind of situation is bad for a couple of reasons:

  1. Because they look like the same page, Search Engines will probably choose one of them, index it as the real content, and discard the other. The problem is that you are leaving this decision to Search Engines, which means some might choose the wrong version and end up using the one with Query String parameters instead of the clean one (not likely, though). Or, even worse, they might end up indexing both of them as if they were different pages.
  2. When other Web sites look at your content and add links to it, some of them might end up using the URL with Query String parameters and some of them the URL without. This means the organic linking will not give you the benefit it otherwise would. Remember, Search Engines give you "extra" points when somebody external references your page, but now you'll be splitting those earnings between "two pages" instead of collecting them on a single canonical form.
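To make the "same content, different URL" idea concrete, here is a minimal sketch in Python. It is not the algorithm Site Analysis uses (that is not described here); it simply groups URLs whose response bodies hash to the same value after whitespace and case are normalized. The `pages` dictionary and its URLs are made-up sample data.

```python
import hashlib
from collections import defaultdict


def fingerprint(html: str) -> str:
    """Hash the page body after collapsing whitespace and lowercasing."""
    normalized = " ".join(html.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_groups(pages: dict[str, str]) -> list[list[str]]:
    """Return groups of URLs whose bodies share the same fingerprint."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    # Only groups with more than one URL are potential canonical conflicts.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl results: URL -> response body.
pages = {
    "/products": "<h1>Products</h1>",
    "/products?ref=home": "<h1>Products</h1>",
    "/about": "<h1>About</h1>",
}
duplicates = find_duplicate_groups(pages)
print(duplicates)
```

An exact hash only catches pages that are identical after normalization; a real tool would likely need a fuzzier comparison to catch near-duplicates.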

Query Strings by themselves do not pose a terrible threat to SEO; most modern Search Engines deal OK with them. However, it's the organic linking and the potential abuse of Query Strings that could give you headaches.

Remember, Search Engines should make no assumptions based on the fact that a single "page" serves tons of content through a single Absolute Path and the use of Query Strings. This is typical in many cases, such as sites built around index.php, where pretty much every page on the site is served by the same resource using only variations of Query Strings or path information.

 

So what should I do?

Well, there are several things you could do, but probably one of the easiest is to tell Search Engines (more specifically, their crawlers or bots) not to index pages with Query String variations that are meant only for the application to pass state, not to specify different content. This can be done with the Robots Exclusion Protocol, using wildcard matching to tell crawlers not to follow any URLs that contain a '?'. Make sure you are not blocking URLs that are actually supposed to be indexed. To check, run the Site Analysis feature again; it will flag an informational message for each URL that is not visited due to the robots exclusion file.

User-agent: *
Disallow: /*?
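Before deploying a rule like this, it can help to preview which URLs it would block. The following Python sketch is a simplified matcher written for illustration, not the behavior of any particular crawler: it treats `*` as "any sequence of characters" and a trailing `$` as an end anchor, and it ignores `Allow` rules, empty `Disallow` values, and per-agent sections. The function names are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern with '*' and '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything, then turn the escaped '*' back into "any characters".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(url_path: str, disallow_patterns: list[str]) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(
        robots_pattern_to_regex(p).match(url_path) is not None
        for p in disallow_patterns
    )


rules = ["/*?"]
for path in ["/product.php", "/product.php?id=1", "/"]:
    print(path, "blocked" if is_disallowed(path, rules) else "allowed")
```

Note that `/*?` blocks every URL with a Query String, including ones like `/product.php?id=1` where the parameter selects different content, which is why re-running Site Analysis afterwards is worthwhile.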

 

In summary, try to keep canonical formats yourself; don't leave any guessing to Search Engines, because some of them might get it wrong. There is also a new way of specifying the canonical form in your markup, the rel="canonical" link, but it is very recent (introduced in 2009) and some Search Engines do not support it (I believe the top three do, though):

<link rel="canonical" href="https://www.my-site.com/my-canonical-url" />
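If you want to verify that your pages actually emit this tag, a short script can extract it. This is a rough sketch using only Python's standard `html.parser`; the `CanonicalFinder` class and `find_canonical` function are names invented for this example, and the sketch returns the first canonical link it sees without checking that it sits inside `<head>`.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> encountered."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values and a missing value is None.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the markup, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


sample = (
    '<html><head>'
    '<link rel="canonical" href="https://www.my-site.com/my-canonical-url" />'
    '</head><body></body></html>'
)
print(find_canonical(sample))
```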

In the Beta 2 version of IIS SEO Toolkit we will support this tag and have better detection of these canonical issues. So stay tuned.

Another way to solve this is to use URL Rewrite, so that you can easily redirect or rewrite your URLs to get rid of the Query Strings and use more SEO-friendly URLs.
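The mapping such a redirect rule implements can be sketched in a few lines. The Python below is not URL Rewrite configuration; it only illustrates the idea of computing a canonical URL by dropping parameters that carry state rather than content. The parameter names in `STATE_ONLY_PARAMS` are assumptions chosen for the example; which parameters are safe to drop depends on your application.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only pass state and never change the content.
STATE_ONLY_PARAMS = {"display", "ref"}


def canonical_url(url: str) -> str:
    """Return the URL with state-only Query String parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in STATE_ONLY_PARAMS
    ]
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def redirect_target(url: str):
    """Return the canonical URL to redirect to, or None if already canonical."""
    target = canonical_url(url)
    return target if target != url else None


print(redirect_target("https://www.my-site.com/product.php?id=1&display=2"))
```

A permanent (301) redirect to the returned target is the usual choice, so that links pointing at the variant URLs are credited to the canonical one.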

Comments

  • Anonymous
    June 09, 2009
    Thanks for posting the quick helpful fix.  I made the change you recommended to my robots.txt file and hope for the best next time a search engine crawls my site.
  • Anonymous
    June 09, 2009
    Make sure to run Site Analysis afterwards to clearly see which URLs will be blocked, to ensure you are not removing content that you did not intend to. http://www.iis.net/extensions/SEOToolkit
  • Anonymous
    June 09, 2009
    Stellar Post Carlos... Even though I am an Apache and PHP type personally, you have piqued my interest to dig into the new SEO toolkit. MSN is on the move on multiple fronts, first the Bing / Kumo / Live makeover and now this. Thanks for the great tips to elect the preferred parameter based on exclusion using robots.txt.
  • Anonymous
    October 08, 2010
    Just for information: Disallow: /*? will block all of the URLs of your website that use a query string. You can block only some of them. Example:
    A product page could be /product.php?id=1
    Let's say you use a query string to manage the layout, like this:
    /product.php?id=1&display=1
    /product.php?id=1&display=2
    These two have the same content.
    Disallow: /product.php?id=*&display=
    In that case,
    /product.php?id=1 IS NOT blocked
    /product.php?id=1&display=1 IS blocked