Duplicate Content Issues: How to Find and Fix Them

Sad to say, duplicate content is widespread, and more often than not it occurs within the same domain of the same website. This article covers what duplicate content is, what canonicalization means (that is, indicating your preferred URL to Google), how to find duplicates with free tools, and how to fix them. To start with, it is worth defining what duplicate content actually means.

Duplicate Content vis-à-vis Canonicalization

As per Google’s own definition, “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.” As for canonicalization, Google describes it this way: “Many sites make the same HTML content or files available via different URLs. [...] To gain more control over how your URLs appear in search results [...] we recommend that you pick a canonical (preferred) URL as the preferred version of the page. You can indicate your preference to Google in a number of ways. We recommend them all, though none of them are required (if you do not indicate a canonical URL, we’ll identify what we think is the best version).”
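
To check which canonical URL a page currently declares, the tag can be read straight from its HTML. What follows is a minimal Python sketch, assuming the page states its preference with a link element carrying rel="canonical", which is only one of the ways Google accepts. The names CanonicalFinder and find_canonical and the example.com URLs are hypothetical and serve only as illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated tokens, so split before checking.
        rel_tokens = attr_map.get("rel", "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonical(html):
    """Return the first declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


# Hypothetical page, imagined as served at https://example.com/shoes?sort=price
SAMPLE_PAGE = """
<html><head>
<title>Shoes</title>
<link rel="canonical" href="https://example.com/shoes">
</head><body>...</body></html>
"""

print(find_canonical(SAMPLE_PAGE))
```

A page that returns None here leaves the choice of preferred URL entirely to the search engine, which is the situation the Google quote above describes.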

Categories of Duplicate Content

There are three broad categories of duplicates: (a) True Duplicates, (b) Near Duplicates and (c) Cross-domain Duplicates. A True Duplicate is a page whose content is 100% identical to that of another page, differing only by the URL. A Near Duplicate differs only slightly from another page, perhaps in a block of text, the order of the content or an image. A Cross-domain Duplicate comes about when two websites share the same content. These can give rise to issues even for legitimate, syndicated content.
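
The difference between a true and a near duplicate can be illustrated with a rough text comparison. The Python sketch below rests on several assumptions: it works on text that has already been extracted from the pages, it ignores case and whitespace before comparing, and it treats a similarity of 0.9 or more as "near". That threshold and the function names are hypothetical choices for illustration, not a rule any search engine publishes.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so pure formatting differences are ignored."""
    return " ".join(text.lower().split())


def classify_pair(text_a, text_b, near_threshold=0.9):
    """Label two page texts as 'true duplicate', 'near duplicate' or 'distinct'.

    near_threshold is an assumed cut-off chosen for this sketch.
    """
    a, b = normalize(text_a), normalize(text_b)
    if a == b:
        return "true duplicate"
    # ratio() returns a similarity score between 0.0 and 1.0.
    similarity = SequenceMatcher(None, a, b, autojunk=False).ratio()
    if similarity >= near_threshold:
        return "near duplicate"
    return "distinct"


# Hypothetical product descriptions that differ by a single word.
PAGE_A = "Red running shoes with a cushioned sole and breathable mesh upper"
PAGE_B = "Blue running shoes with a cushioned sole and breathable mesh upper"

print(classify_pair(PAGE_A, PAGE_A))
print(classify_pair(PAGE_A, PAGE_B))
```

The same comparison applies to cross-domain duplicates; the only difference is that the two texts come from different websites.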

Ways of Finding Duplicate Content

One way of finding duplicate content is to use PlagSpotter or Copyscape, both of which are online duplicate content checking and monitoring tools. Enter your URL to obtain a list of sources or sites that duplicate your content. To check your whole site, log in and use the tool’s Batch Search feature, either providing a sitemap or copying and pasting the URLs that you wish to have checked.
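
The tools above look for copies across the web. For duplicates within your own site, a rough check can also be scripted. The Python sketch below is not how PlagSpotter or Copyscape work; it simply assumes you already hold the extracted text of each page in a dictionary keyed by URL, and it only catches pages whose text is identical after case and whitespace are ignored. The function names and URLs are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(text):
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages):
    """Return lists of URLs whose page text is identical.

    pages maps each URL to its already-extracted body text.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Keep only fingerprints shared by more than one URL.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl result: two URLs serve the same text.
PAGES = {
    "https://example.com/shoes": "Red running shoes",
    "https://example.com/shoes?ref=home": "Red  running shoes",
    "https://example.com/contact": "Contact us",
}

print(group_duplicates(PAGES))
```

Each group that comes back is a candidate for one of the fixes described in the next section.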

Useful Tips for Fixing Duplicate Content

  • Of course, the simplest method for dealing with duplicate content boils down to removing it and returning a 404 error. If the content is of very little value to visitors or search engines and has no major inbound links or traffic, total removal is the most justified option.
  • Another legitimate way of removing duplicate content is via a 301 redirect. Unlike a 404, it informs visitors (both human beings and bots) that the page has permanently moved to a different location. From an SEO point of view, the inbound link authority is also passed on to the new page.
  • You may also play it strategically by leaving the duplicate content available to human visitors while blocking it for search crawlers with a robots.txt file. This has both advantages and disadvantages. It can block entire folders, including URLs with parameters, but it can prove to be an unreliable method: it is effective for blocking content that has not yet been crawled, but not so effective for removing content that is already in the index.
  • Setting the preferred version of the website’s domain often proves to be the most effective way to tackle duplicate content on the website. In other words, you tell the search engines which version of the domain, www or non-www, you want indexed.
  • Using Google Webmaster Tools to set the preferred version of the website also helps substantially in tackling duplicate content.
  • Using canonical tags in a website’s metadata also enables the search engines to identify which URLs to index or pay attention to. The basic advantage of canonicalization is that it is easy to implement, and there are multiple ways of doing it on a number of CMS platforms, including WordPress, Joomla and CMS Made Simple™.
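
Two of the tips above, the 301 redirect to a preferred domain and the robots.txt block, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a server configuration: the hosts, the /print/ folder, the robots.txt rules and the function names are all hypothetical, and a real site would normally set up redirects in its web server or CMS. The robots.txt check uses the standard library's urllib.robotparser, which understands simple folder rules such as the one shown.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

PREFERRED_HOST = "www.example.com"  # hypothetical preferred (www) version
ALTERNATE_HOSTS = {"example.com"}   # hypothetical non-www duplicate of the same site

# Hypothetical robots.txt that keeps crawlers out of a folder of printer-friendly copies.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def redirect_for(url):
    """Return (301, target) if url is on a non-preferred host, else None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in ALTERNATE_HOSTS:
        # Keep scheme, path and query; swap only the host.
        target = urlunsplit(
            (parts.scheme, PREFERRED_HOST, parts.path or "/", parts.query, "")
        )
        return 301, target
    return None


def crawl_allowed(url, user_agent="*"):
    """Report whether the robots.txt rules above let a crawler fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(redirect_for("http://example.com/shoes?color=red"))
print(crawl_allowed("http://www.example.com/print/shoes"))
print(crawl_allowed("http://www.example.com/shoes"))
```

Note that the robots.txt rule only asks crawlers to stay away from the folder; as the tip above points out, it does not by itself remove pages that are already in the index.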