Since the Panda updates from Google earlier this year, duplicate content has become an issue that no website owner can afford to overlook. While the updates were designed specifically to target low-value sites, content farms and scraped content, their primary goal was to reduce the amount of duplicate content that was filling search results with spam. As a direct result of the updates to the Google search algorithm, many thousands of both legitimate and nefarious sites were penalized with a significant drop in rankings and traffic.
Duplicate content can include the textual content of a website, scraped content from other sites, or similar content on multiple domains. It also arises from dynamically generated product pages that display the same content under different sorting options. Google sees each of these pages as duplicate content.
Of the tactics available, the 301 redirect and the more recent canonical tag are the primary weapons in a web developer's arsenal for combating the problems associated with duplicate content. Unfortunately, many aspiring webmasters do not have a clear understanding of what they are, or how and when each method should be employed.
What is a 301 Redirect?
In most cases a 301 redirect is used when you move your site to a new domain or url. The redirect tells search engines that your site has permanently moved but still allows you to preserve your rankings. The other common usage of the 301 is to specify the preferred url of your domain. Typically you can go to either http://www.exampledomain.com or http://exampledomain.com. They lead to the same site, but the search engine treats them as different urls. The 301 redirect allows you to specify the “proper” domain and retain the strength of the site's ranking so that it is not split between the two.
The drawback of 301s is that they were typically applied at the domain level and did not address the duplicate content issues that arise from having multiple dynamically driven pages. 301s also require that you have access to the web server hosting your site in order to implement them, as well as an understanding of the syntax used to describe the parameters.
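The www versus non-www case can be illustrated with a short Python sketch. It is not tied to any particular web server; it only shows the decision a server-side redirect rule would make. The function name `permanent_redirect` and the choice of the www host as the preferred one are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www version is the preferred ("proper") host.
PREFERRED_HOST = "www.exampledomain.com"


def _bare(host):
    """Strip a leading 'www.' so both variants of a host compare equal."""
    return host[4:] if host.startswith("www.") else host


def permanent_redirect(url, preferred_host=PREFERRED_HOST):
    """Return (301, location) if url is on the non-preferred variant
    of the host, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Already on the preferred host, or on an unrelated domain: do nothing.
    if host == preferred_host or _bare(host) != _bare(preferred_host):
        return None
    location = urlunsplit(
        (parts.scheme, preferred_host, parts.path or "/", parts.query, parts.fragment)
    )
    return 301, location


print(permanent_redirect("http://exampledomain.com/page?x=1"))
```

A real server would send the returned status code together with a `Location` header carrying the new url. Keeping the path and query string intact means every old url maps to its direct equivalent on the preferred host.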
Introducing the Canonical Tag
Prior to the introduction of the canonical tag, duplicate content was simply ignored and people used link building practices to game the SERPs in order to determine which would be the first to be listed. However, this had the negative systemic effect of inundating the SERPs with webspam which made it increasingly difficult to get quality, relevant results when performing web searches. As a result, Google introduced the canonical tag in early 2009 as a way to resolve some of the major duplicate content issues faced by the search engines.
The canonical tag was designed as a page-level element that you add to the “head” of the HTML document, setting its href to the preferred url. The canonical tag is a very simple one-line code string that is treated in much the same way as a permanent 301 redirect. It ensures that the PageRank, backlinks and link juice flow to the “proper url” and are not split between duplicate urls. It is fully supported by Google, Bing, Yahoo and other search engines.
Another scenario in which you may want to use a canonical tag is when you have web pages that produce “ugly” urls (http://www.example.com/product.php?item=bluewidgets&trackingid=1234&sessionid=5678) due to advanced sorting features, tracking options and other dynamically driven user-defined options. You can specify that the clean url, the “proper” or “canonical” version of the url, is at “location B.” Search engines will then index the url that you have specified and regard it as the correct url.
For example, a canonical tag on the Blue Widgets page that points to the www url tells the search engine that the “correct” version of the page is located at the www version and not the non-www version.
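As a rough sketch of how a site might generate the tag for an “ugly” url, the Python below strips the parameters that do not change the page content and builds the one-line link element. The helper names and the list of ignored parameters are assumptions for illustration; which parameters are safe to drop depends on the site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only track the visitor and do not
# change what the page displays.
IGNORED_PARAMS = {"trackingid", "sessionid"}


def canonical_url(url):
    """Return url with tracking and session parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Build the link element that belongs in the page's head section."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)


ugly = "http://www.example.com/product.php?item=bluewidgets&trackingid=1234&sessionid=5678"
print(canonical_tag(ugly))
# prints: <link rel="canonical" href="http://www.example.com/product.php?item=bluewidgets" />
```

Every variation of the product url then carries the same tag, so search engines are pointed at one clean url no matter how the visitor arrived.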
The main difference between a 301 redirect and the canonical tag is that the latter, as originally introduced, only works within a single domain or subdomain; that is, you cannot go from domain A to domain B. This has the added benefit of alleviating problems associated with 301 hijacks and similar attacks.
Introduction of the Cross-Domain Canonical Tag
In December of 2009, Google announced a cross-domain rel="canonical" link element (http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html) that would also work across domains, thereby allowing webmasters of multiple sites with similar content to define specific content as fundamentally sourced from a different domain.
A simple scenario in which the cross-domain tag would be used is if you have three related sites on three separate domains, all featuring the same article (or product descriptions, etc.). You can use the cross-domain tag to specify the page that is the authority (or preferred page). As a result, the specified page will collect all the associated benefits of PageRank and link juice, and you will not be penalized for duplicate content.
In essence the new tag performs the same function as the 301 redirect but allows for a much more user-friendly method of implementation.
During the release and subsequent promotion of the canonical tag, Matt Cutts stated that “anywhere from 10%-36% of webhosts might be duplicated content.” According to Cutts, there are several effective strategies to combat the problem of duplicate content including:
Using 301 redirects
Setting your preference for the www or non-www version in Google’s Webmaster Tools (http://www.google.com/webmasters/)
Ensuring that your CMS only generates the correct urls
Submitting a sitemap to Google. They will try to use only the urls in the sitemap in an effort to pick the “best url”
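The sitemap strategy can be sketched in a few lines of Python. This is a minimal illustration that assumes you already have a list of your preferred urls; the function name `build_sitemap` is hypothetical, and a real sitemap may carry optional fields that are left out here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML listing each preferred url exactly once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Deduplicate so the sitemap never lists the same url twice.
    for url in sorted(set(urls)):
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/product.php?item=bluewidgets",
    "http://www.example.com/",
]))
```

Listing only the clean version of each url gives the search engine a consistent signal about which urls you consider the “best” ones.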
301s Versus rel=canonical?
Some people have concerns over how much link juice they will lose if they use a 301 instead of a canonical tag. There is very little difference in the relative amount of PageRank that gets passed between the two methods.
Matt Cutts from Google addressed the problem by stating:
“You do lose some (page rank) but the amount is pretty insignificant. This is used to try and stop people from using 301s exclusively for everything within their own site instead of hyperlinks.”
Watch the full video where Matt discusses the issue:
The canonical tag is most appropriately used when you cannot get to the server’s headers to implement the 301 directly, since a web technician is typically required to implement the 301 for you.
In the video above Matt addresses the question of relative strength loss between using a 301 Redirect and a rel=canonical tag. In a recent blog post (http://searchenginewatch.com/article/2072455/Hacked-Canonical-Tags-Coming-Soon-To-A-Website-Near-You), Beanstalk SEO's CEO, Dave Davies, discusses a possible exploit of this “relative strength loss.”
Matt Cutts sent out a Tweet on May 13th stating, “A recent spam trend is hacking websites to insert rel=canonical pointing to hacker's site. If you suspect hacking, check for it.”
The conclusion is that there is a viable exploit of the rel=canonical tag and that inserting the tag into a page can be a very effective strategy: on par with 301ing the page itself, but even “better” in that it likely won't be detected by the site owner.
Davies continues: “The next question we need to ask ourselves is, ‘Is this an issue now or just a warning?’” The implication is that Google is certainly aware of the hack and will be analyzing ways to detect and penalize those who attempt it.
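One way to follow the advice to “check for it” is to scan your own pages for canonical tags that point to another domain. The Python below is only a sketch using the standard library; the class and function names are hypothetical, and it assumes you have already fetched the page's HTML.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def bare_host(url):
    """Hostname of url with any leading 'www.' removed."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def suspicious_canonicals(page_url, html):
    """Return canonical urls that point away from the page's own domain."""
    finder = CanonicalFinder()
    finder.feed(html)
    # Resolve relative hrefs against the page url before comparing hosts.
    resolved = [urljoin(page_url, href) for href in finder.hrefs]
    return [url for url in resolved if bare_host(url) != bare_host(page_url)]


page = '<html><head><link rel="canonical" href="http://hacker.example/page" /></head></html>'
print(suspicious_canonicals("http://www.example.com/product.php", page))
```

Note that a legitimate cross-domain canonical tag would also be flagged by this check, so the result is a list of tags to review by hand rather than proof of a hack.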
Article Takeaways:
The Panda updates have made the issue of duplicate content a priority for site owners to address.
Always use 301s whenever possible. They are more widely supported, since every search engine can follow a 301 redirect. This also means that any new search engine that comes onto the market will have to support them as well.
301s are typically applied at the domain level (i.e., pointing domainexample.com to www.domainexample.com).
301s also require that you have access to the web server hosting your site in order to implement them.
The rel=canonical tag is a more user-friendly method to accomplish the same task as a 301.
The cross-domain canonical tag works almost identically to a 301 redirect.
The canonical tag is a user-friendly alternative designed to work within the site’s HTML head section.