Understanding Canonicalisation

Put simply, canonicalisation means… well, actually, it’s not that simple.

Let’s start with some definitions of “canon” – going way back to the days of Pachelbel’s compositions and Shakespeare’s scribblings.

Pachelbel’s Canon – much like the London’s Burning nursery rhyme – uses a set melody which is then repeated verbatim over the top of itself. This repetition begins as the previous line ends, leading to a cacophonic harmony of voices singing out of sync, but somehow in time, with each other.

Similarly, ‘canon’ can also denote the body of work accepted as authentically a writer’s own, as in the Shakespeare canon.

In SEO, canonicalisation relates to both authenticity and duplication – and comes into play when content on one URL is available via a range of similar URLs.

Search spiders crawling web pages treat each URL as a separate page; they won’t automatically recognise which one is the authentic “canon” for duplicate content.

So, spiders will look at http://thispage.com/, http://www.thispage.com/ and http://www.thispage.com/index.html as three different pages, even though each has the same content.
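
To make the duplication concrete, here is a minimal sketch in Python that collapses those three variants into one preferred URL. It assumes, purely for illustration, that the www version is the one you want treated as the canon and that index.html is just another name for the site root; the names PREFERRED_HOST, BARE_HOST and canonical_url are hypothetical, not part of any SEO tool.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: the www host is the preferred one,
# and index.html is simply another name for the directory it lives in.
PREFERRED_HOST = "www.thispage.com"
BARE_HOST = "thispage.com"
DEFAULT_DOCUMENTS = ("index.html",)


def canonical_url(url: str) -> str:
    """Collapse common duplicate forms of a URL into one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www domain as the same site.
    if host == BARE_HOST:
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Strip a trailing default document such as /index.html.
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "http://thispage.com/",
    "http://www.thispage.com/",
    "http://www.thispage.com/index.html",
]

# Every variant reduces to the same single URL.
unique = {canonical_url(u) for u in variants}
print(unique)
```

A spider has no such rule to hand, which is why it needs a hint from you about which of the three is the real one.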

So how does the search engine know which page to list?

Essentially, if it’s left to guess, it’ll pick one page and return that in the results. The choice is usually influenced by signals such as PageRank. But it may mean your traffic is being sent to the wrong URL.

You can help the spiders know which page you want treated as the ‘canon’, to ensure that URL is the one returned in the results.

Though there are some relatively straightforward 'fixes' for the problem of canonicalisation, many webmasters choose to ignore it, and sacrifice a better ranking in the SERPs by doing nothing. Bad strategy.

No doubt there are some who are unsure because they do not really understand the technical side of their sites, whilst others have the mistaken belief that being able to access their site in different ways will enable them to be found by more visitors.

The first step towards resolving canonicalisation problems is to find out if there are any in the first place – we always find identifying a problem before trying to fix it is the best strategy. To check your canonicalisation situation, go to Google and enter the URL of the web page into the search box. If the results display multiple versions of the URL (for example, with and without “www”, or with and without a trailing forward slash), there could be an issue.

Canonicalisation – there’s that word again – simply means choosing one when there are many from which to choose.

In this context it means choosing the preferred URL and either redirecting the others to it or telling search engines which one you prefer.

This can be done in one of two ways.

  • First, a <link> tag with a rel="canonical" attribute can be added to each web page.
  • Second, a 301 redirect can be set up.

With the first option, the canonical link tag should always be placed in the head section of the HTML code to ensure that Googlebot does not ignore it. Googlebot will then read the tag and know which URL you have chosen as the canonical, and therefore preferred, version of the page’s content. Not everyone is confident editing their HTML code, so Google has published some very helpful guidelines for their benefit.
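
As a rough sketch of what that tag looks like and where it goes, the Python below builds a canonical link tag and slots it in just before the closing head tag of a page. The function names and the sample page are hypothetical, and http://www.thispage.com/ is assumed to be the preferred URL; in practice your CMS or template would normally handle this for you.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Build the <link> tag that names the preferred URL for a page."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


def add_canonical_tag(page_html: str, preferred_url: str) -> str:
    """Insert the canonical tag immediately before </head>."""
    tag = canonical_link_tag(preferred_url)
    head_end = page_html.lower().find("</head>")
    if head_end == -1:
        # The tag belongs in the head section, so refuse to guess elsewhere.
        raise ValueError("page has no </head>; cannot place the canonical tag")
    return page_html[:head_end] + tag + "\n" + page_html[head_end:]


# Hypothetical page used only to demonstrate the placement.
page = "<html><head><title>Home</title></head><body>Welcome</body></html>"
tagged_page = add_canonical_tag(page, "http://www.thispage.com/")
print(tagged_page)
```

The same tag, pointing at the same preferred URL, would go on every duplicate version of the page.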

The second option, using a 301 redirect, tells Googlebot – not to mention human web users – that a web page has moved permanently to another URL. Once implemented, users will be automatically redirected to the new location and search engines will re-index the content in due course.
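
Most sites set up 301 redirects in their web server or CMS settings, but the logic is easy to see in a small Python sketch using the standard library’s http.server. It assumes www.thispage.com is the preferred host and that /index.html should redirect to the bare directory; redirect_target, CanonicalRedirectHandler and serve are hypothetical names, and the sketch ignores details such as ports in the Host header and HTTPS.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption for illustration: this is the host you want in the results.
CANONICAL_HOST = "www.thispage.com"


def redirect_target(host: str, path: str):
    """Return the preferred URL to redirect to, or None if already canonical."""
    if path.endswith("/index.html"):
        canonical_path = path[: -len("index.html")]
    else:
        canonical_path = path
    if host.lower() == CANONICAL_HOST and canonical_path == path:
        return None
    return f"http://{CANONICAL_HOST}{canonical_path}"


class CanonicalRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = redirect_target(self.headers.get("Host", ""), self.path)
        if target is not None:
            # 301 means "Moved Permanently": browsers follow the Location
            # header and search engines update the URL they list.
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        body = b"This is the canonical page."
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch locally; call serve() by hand to try it."""
    HTTPServer(("localhost", port), CanonicalRedirectHandler).serve_forever()
```

Here a request for the bare domain or for /index.html is answered with a 301 pointing at the preferred URL, while a request that is already canonical gets the page itself.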

Ignoring canonical issues can eventually cause numerous problems for a website because they relate to its URL structure. They can reflect badly on its integrity and, most notably, on its search engine ranking.

Failing to address the issue could mean the difference between your search campaigns singing in harmony, or creating a discordant, dissonant mess.
