rel=canonical • What it is and how (not) to use it
In February 2009, six years to the day from when this is published, Google, Bing and Yahoo! introduced the rel=canonical link element (Matt’s post is probably the easiest reading). While the idea is simple, the specifics of how to use it turn out to be complex. The basic premise is: if you have several similar versions of the same content, you pick one “canonical” version and point the search engines at that. This solves a duplicate content problem where search engines don’t know which version of the content to show. This article takes you through the use cases and the anti-use cases.
Easiest correct example of using rel=canonical
Let’s assume you have two versions of the same page. Exactly, 100% the same content. They differ only in that they’re in separate sections of your site, and because of that the background color and the active menu item differ. That’s it. Both versions have been linked from other sites; the content itself is clearly valuable. Which version should a search engine show? Nobody knows.
For example’s sake, these are their URLs:
This is what rel=canonical was invented for. Especially in a lot of e-commerce systems, this (unfortunately) happens fairly often: a product has several different URLs depending on how you got there. You would apply rel=canonical as follows: pick one of the two URLs as the canonical version, and add this link element to the <head> section of the other page:
<link rel="canonical" href="http://example.com/wordpress/seo-plugin/">
That’s it. Nothing more, nothing less.
What this does technically is “merge” the two pages into one from a search engine’s perspective. It’s basically a sort of “soft redirect”, without redirecting the user. Links to both URLs now count for the single canonical version of the URL.
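To see this from a crawler’s side, here is a minimal Python sketch (standard library only) that pulls the rel=canonical href out of a page’s HTML. The class and function names are hypothetical, and this is not how any search engine actually parses pages; it only illustrates that the non-canonical page declares which URL it should be merged into.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names arrive lowercased
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html):
    """Return all canonical hrefs declared in an HTML document, in order."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


# The non-canonical page from the example, with the link element in its <head>.
other_page = """<html><head>
<link rel="canonical" href="http://example.com/wordpress/seo-plugin/">
</head><body>...</body></html>"""

print(find_canonicals(other_page))  # ['http://example.com/wordpress/seo-plugin/']
```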
Should a page have a self referencing canonical?
In the example above, we make the non-canonical page link to the canonical version. But should a page also set a rel=canonical pointing to itself? This question comes up every once in a while. I have a strong preference for having a canonical link element on every page. The reason is that most CMSes will accept URL parameters without changing the content, so these would all show the same content:
etc. You see my point. If you don’t have a self-referencing canonical on the page that points to the cleanest version of the URL, you risk having these parameterized variants treated as duplicate content. Even if you don’t create such URLs yourself, someone else could link to them and cause a duplicate content issue. So adding a self-referencing canonical to URLs across your site is a good “defensive” SEO move. Luckily for you, our WordPress SEO plugin does this for you.
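A minimal sketch of the idea, assuming a hypothetical CMS where each piece of content stores its own clean permalink (the `PERMALINKS` and `ROUTES` tables and the `canonical_tag` function are made up for illustration). Whatever query string or host variant the request arrives with, the canonical href comes from the stored permalink, never from the request URL itself.

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical content store: each piece of content knows its own clean URL.
PERMALINKS = {
    "seo-plugin": "http://example.com/wordpress/seo-plugin/",
}

# Hypothetical router: maps a request path to a content ID.
ROUTES = {
    "/wordpress/seo-plugin/": "seo-plugin",
}


def canonical_tag(request_url):
    """Return the self-referencing canonical for the content this request resolves to."""
    path = urlsplit(request_url).path  # query string, fragment and host are ignored
    content_id = ROUTES[path]
    href = PERMALINKS[content_id]      # taken from the content, not from the request
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


variants = (
    "http://example.com/wordpress/seo-plugin/",
    "http://example.com/wordpress/seo-plugin/?utm_source=newsletter",
    "http://www.example.com/wordpress/seo-plugin/?foo=bar",
)
for url in variants:
    print(canonical_tag(url))  # the same tag for every variant
```

Because every variant emits the same tag, parameterized or www/non-www copies all point search engines at the one clean URL.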
Cross domain canonical
Now, you might have the same piece of content on several domains. For instance, SearchEngineJournal regularly republishes articles from Yoast.com (with explicit permission). Look at any one of those articles and you’ll see a rel=canonical link pointing right back at our original article. This means all the links pointing at their version of the article count towards the ranking of our canonical version. They get to use our content to please their audience; we get a clear benefit from it too. Everybody wins.
The risk of faulty canonicals: common errors
There are many cases out there showing that a wrong rel=canonical implementation can lead to huge issues. I know of several sites that had the canonical on their homepage point to an article, and saw their homepage disappear from the search results completely. There are more things you shouldn’t do with rel=canonical. Let me list the most important ones:
- Don’t canonicalize a paginated archive to page 1. If you add a rel=canonical pointing to page 1 on page 2 and further, search engines will actually stop indexing the links on those deeper archive pages…
- Make them 100% specific. For various reasons, a lot of sites use protocol-relative links, meaning they leave the http: / https: part off their URLs. Don’t do this for your canonicals. You have a preference. Show it.
- Don’t base your canonical on the request URL. If you use variables like the domain or request URI used to access the current page while generating your canonical, you’re doing it wrong. Your content should be aware of its own URLs. Otherwise, you could still end up with the same piece of content on, for instance, both example.com and www.example.com, each canonicalizing to itself.
- Don’t output multiple rel=canonical links on a page; they cause havoc. Sometimes a developer of a plugin or extension thinks that he’s God’s greatest gift to mankind and that he knows best how to add a canonical to the page. Sometimes, that developer is right. But since they can’t all be me, they’re inevitably wrong sometimes too. When we encounter this in WordPress plugins we try to reach out to the developer doing it and teach them not to, but it still happens. And when it does, the results are wholly unpredictable.
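The errors above lend themselves to a quick automated check. Below is a rough Python sketch of such an audit; `audit_canonical` is a hypothetical helper, not part of any plugin. It assumes WordPress-style pagination URLs ending in `/page/N/` and uses simple heuristics, so treat it as a starting point rather than a full validator.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _CanonicalCollector(HTMLParser):
    """Gathers the href of every rel=canonical link element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and "canonical" in (attributes.get("rel") or "").lower().split():
            self.hrefs.append(attributes.get("href") or "")


# Assumption: paginated archives look like /blog/page/2/.
PAGED = re.compile(r"/page/(\d+)/?$")


def audit_canonical(page_url, html):
    """Return human-readable problems with a page's canonical setup."""
    collector = _CanonicalCollector()
    collector.feed(html)
    collector.close()
    hrefs = collector.hrefs
    problems = []

    # More than one canonical: results are unpredictable.
    if not hrefs:
        problems.append("no canonical link element")
    elif len(hrefs) > 1:
        problems.append(f"{len(hrefs)} canonical link elements; results are unpredictable")

    # Protocol-relative ("//host/...") or relative hrefs have no scheme.
    for href in hrefs:
        parts = urlsplit(href)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"canonical {href!r} is not an absolute http(s) URL")

    # Page 2+ of an archive canonicalized to a non-paginated URL (likely page 1).
    paged = PAGED.search(urlsplit(page_url).path)
    if paged and int(paged.group(1)) > 1:
        for href in hrefs:
            if not PAGED.search(urlsplit(href).path):
                problems.append(f"paginated page canonicalizes to {href!r}, which looks like page 1")

    return problems


page = ('<head><link rel="canonical" href="//example.com/blog/">'
        '<link rel="canonical" href="http://example.com/blog/"></head>')
for problem in audit_canonical("http://example.com/blog/page/2/", page):
    print(problem)
```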
Good to know: rel=canonical and social networks
Facebook and Twitter honor rel=canonical too. This might lead to weird situations. If you share a URL on Facebook that has a canonical pointing elsewhere, Facebook will share the details from the canonical URL. In fact, if you add a like button on a page that has a canonical pointing elsewhere, it will show the like count for the canonical URL, not for the current URL. Twitter works in the same way.
Setting the canonical in WordPress SEO
If you use WordPress SEO, you can change the canonical of several page types using the plugin. You only need to do this if you want the canonical to point somewhere other than the current page’s URL. WordPress SEO already renders the correct canonical URL for almost any page type in a WordPress install.
For posts, pages and custom post types, you can edit the canonical in the advanced tab of the WordPress SEO metabox.
For categories, tags and other taxonomy terms, you can change it on the term’s edit screen.
If you have other advanced use cases, you can always use the wpseo_canonical filter to change the WordPress SEO output.
Advanced uses of rel=canonical
Site migrations (use with care)
Sometimes, when you’re moving a site from one domain to another, you might want to “soft-launch” the new site. This could for instance be the case when you’re combining the migration with a rebrand and redesign and you want to let people get used to the new brand for a while first before you finally flip the switch.
When you do something like this, you could set up a more complex rel=canonical scheme: at first, you canonicalize the new site to the old one, and then, after a month or so, you flip the direction of the canonicals to point at the new site. This would keep the new site out of the search results during the first month and then slowly start the migration process in the second month. Don’t leave this in place forever, though: at some point, 301 redirect the old domain to the new one. A 301 redirect is still a far more reliable and more widely trusted method of moving content.
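The phased scheme above can be written down as a small decision function. This is a hypothetical Python sketch: the domains, the dates and `migration_action` are placeholders, and in practice the logic would live in your CMS or server configuration.

```python
from datetime import date

# Hypothetical domains and dates for illustration only.
OLD_SITE = "https://old-brand.example"
NEW_SITE = "https://new-brand.example"
FLIP_CANONICALS = date(2015, 3, 1)   # phase 2: canonicals switch to the new site
START_REDIRECTS = date(2015, 4, 1)   # phase 3: old site 301-redirects to the new one


def migration_action(site, path, today):
    """Return (action, target_url) for a request to `path` on `site` ("old" or "new")."""
    if today >= START_REDIRECTS and site == "old":
        # Final state: stop relying on canonicals and move the content with a 301.
        return ("301", NEW_SITE + path)
    # Before the flip, both sites canonicalize to the old domain, which keeps the
    # new site out of the results; after the flip, both point at the new domain.
    target = OLD_SITE if today < FLIP_CANONICALS else NEW_SITE
    return ("canonical", target + path)


for day in (date(2015, 2, 15), FLIP_CANONICALS, START_REDIRECTS):
    print(day, migration_action("old", "/about/", day), migration_action("new", "/about/", day))
```

Keeping the schedule in one place makes it harder to forget the last step: switching to 301 redirects.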
Canonical link HTTP header
Google also supports a canonical link HTTP header. While these headers can be very useful if you’re a savvy server admin, they also tend to get abused by hackers a lot. They’re hard to spot if you’re not a pro, and all the link juice for your page might be pointing at someone else without you ever noticing, until the page drops out of the search results…
They can be very useful though, for instance for canonicalizing PDFs, so it’s good to know that the option exists.
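As a sketch of both sides of this, here is a minimal Python WSGI app that serves a PDF with a canonical `Link` header in the `<URL>; rel="canonical"` form Google describes, plus a simplified audit helper that flags canonical headers pointing off your own host. The PDF URL, the placeholder bytes and the function names are hypothetical, and the header parsing does not cover every form allowed by the Link header specification.

```python
import re
from urllib.parse import urlsplit

# Hypothetical canonical location of the PDF.
CANONICAL_PDF = "http://example.com/downloads/white-paper.pdf"


def canonical_link_header(url):
    """Build a (name, value) header pair carrying a rel="canonical" hint."""
    return ("Link", f'<{url}>; rel="canonical"')


def pdf_app(environ, start_response):
    """Minimal WSGI app serving a PDF with a canonical Link header."""
    body = b"%PDF-1.4\n"  # placeholder bytes standing in for a real file
    start_response("200 OK", [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        canonical_link_header(CANONICAL_PDF),
    ])
    return [body]


# Audit side: simplified parsing of "<url>; params" entries in a Link header value.
_ENTRY = re.compile(r"<([^>]*)>([^<]*)")
_REL = re.compile(r';\s*rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def foreign_canonicals(link_value, own_host):
    """Return canonical targets in a Link header value that point off your own host."""
    suspicious = []
    for url, params in _ENTRY.findall(link_value):
        rel = _REL.search(params)
        if rel and "canonical" in rel.group(1).lower().split():
            if urlsplit(url).netloc != own_host:
                suspicious.append(url)
    return suspicious
```

Running `foreign_canonicals` over the `Link` headers your server actually sends is one way to notice an injected canonical before the page drops out of the results.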
Using rel=canonical on not so similar pages
While I wouldn’t recommend this, you can definitely use rel=canonical very aggressively. Google honors it to an almost ridiculous extent: you can canonicalize a very different piece of content to another piece of content. If Google catches you doing this, though, it might stop trusting your site’s canonicals and thus cause you more harm…
Conclusion: rel=canonical is a power tool
Rel=canonical has, in the 6 years of its existence, turned into a powerful tool in an SEO’s toolbox, but like any power tool, you should use it wisely as it’s easy to cut yourself. We’re curious as to what the next 6 years of canonical will bring.
This post first appeared on Yoast. Whoopity Doo!