Should I still be worried about a duplicate content penalty?
Avoiding duplicate content is perhaps one of the most basic and best-known rules of SEO. However, it’s also well-known that search engines, and Google in particular, are constantly adapting and improving their algorithms and that key SEO best practices are regularly changing. In summer this year, in a Google live stream with Google’s Andrey Lipattsev and John Mueller, it was made clear Google doesn’t currently have a duplicate content penalty. Does this mean that worrying about duplicate content is a thing of the past, or should it still be a concern for SEOs?
The short answer is yes: SEOs should still be avoiding duplicate content. Although your site may not incur a direct penalty for having duplicate content, there are a number of indirect consequences which can seriously harm your search rankings.
Why is duplicate content harmful?
The first consequence is that the authority of a page with duplicate content is diluted amongst the other pages carrying the same content. Links and other signals that could have strengthened one page are split between several, so much of the effort you've put into optimising the page goes to waste. The second issue is that Google doesn't like to show multiple, near-identical pages in its results, because this is unhelpful to the user. Instead of listing several duplicated pages, it filters some out and shows only the one it deems most valuable. The problem is that it may not pick the page you would choose, so your original page may not appear in the SERPs at all.
Another important point, raised by the SEO experts in the live stream mentioned above, is that duplicate content often goes hand in hand with poor quality content. Look at it this way: a site would not be filled with repetitive chunks of content if it had unique content to show off instead. This is often the issue at play with sites that have duplicate content and rank poorly. It's not so much the dupe content that's sending negative signals to Google, but rather that there is little unique, valuable content on the site to send positive signals.
What is duplicate content?
In order to avoid duplicate content, it's essential to understand what Google considers to be duplicate content. In its support pages, Google defines duplicate content as "substantive blocks of content within or across domains that either completely match other content or are appreciably similar." [https://support.google.com/webmasters/answer/66359?hl=en] That could be entire pages which are duplicated, or just key phrases, paragraphs or sections within a page which also appear on other pages, either on the same domain or a completely different one.
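Google doesn't publish how it measures "appreciably similar", so what follows is only a rough self-audit aid, not Google's method. A common technique for spotting near-duplicates on your own site is to break each page's text into overlapping runs of words ("shingles") and compare the two sets with Jaccard similarity. This minimal Python sketch assumes you have already extracted each page's main text; the function names and the default shingle length are illustrative choices.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 5) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page_a = "Blue cotton t-shirt with a crew neck, machine washable at 30 degrees."
    page_b = "Red cotton t-shirt with a crew neck, machine washable at 30 degrees."
    print(f"{jaccard_similarity(page_a, page_b, k=3):.2f}")  # prints 0.83
```

The two product descriptions differ by a single word and score about 0.83. Any threshold for flagging a pair as "too similar" (say, 0.8) is a judgement call for your own audit, not a figure from Google.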
Most of the time, duplicate content occurs accidentally. You may have multiple pages across your website with largely identical content, for example if you're an eCommerce business selling several similar products with near-identical specifications. Alternatively, you may have printer-friendly versions of web pages, which are essentially stripped-down copies of existing content. Some of your pages may be reachable via multiple distinct URLs, and Google may treat each of them as a separate page with duplicate content. Some businesses also create duplicate content by building stripped-down, mobile-optimised pages with content identical to that of their regular site.
It is perhaps because most duplicate content occurs accidentally that Google doesn’t dish out site-wide penalties for duplicate content. As a general rule, duplicate content won’t hurt your entire domain, but it will impact the search ranking of the specific pages on which the duplicate content is posted. It is only if Google sees significant evidence of deliberate duplicating across multiple domains that it will deem it an attempt to manipulate search results and hit an entire domain with a serious penalty. That doesn’t mean to say you can simply ignore the instances of duplicate content on your site; you’ll still want to avoid and fix them to ensure all your site’s pages fulfil their full SEO potential.
How to avoid duplicate content
Before we talk about fixing existing duplication issues, let's look at how to keep duplicate content from holding back your individual pages, with these key best practices:
1. "Noindex, follow" duplicated pages
If you genuinely need a page with duplicate content, such as a printer-friendly version, keep it out of Google's index. Doing so is simple: add a robots meta tag with the value "noindex, follow" to the page's <head>, or send the equivalent X-Robots-Tag HTTP header. This directive belongs on the page itself, not in your robots.txt file; Google does not support noindex rules in robots.txt, and if robots.txt blocks the URL, Google can't crawl the page to see the tag at all. "Noindex" tells Google that the page in question shouldn't be included in its index, so it won't display in search results, nor will it take any authority away from the original page. Adding "follow" lets Google know that its bots may still crawl the links on the page, which is particularly handy for pagination crawl issues.
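The directive lives on the page itself, in a robots meta tag or an X-Robots-Tag header. This minimal Python sketch shows how a site might emit either one, assuming (hypothetically) that printer-friendly copies live under /print/. The meta tag goes in the page's <head>; the header does the same job for responses without an HTML head, such as PDFs. The function names and path prefix are illustrative.

```python
# Hypothetical: printer-friendly copies live under /print/ on this site.
PRINT_PREFIX = "/print/"
NOINDEX_FOLLOW = "noindex, follow"


def robots_meta_tag(path: str) -> str:
    """Return the <meta name="robots"> tag for the page's <head>, or '' if none."""
    if path.startswith(PRINT_PREFIX):
        return f'<meta name="robots" content="{NOINDEX_FOLLOW}">'
    return ""


def robots_headers(path: str) -> list[tuple[str, str]]:
    """Return the equivalent X-Robots-Tag response header, or no headers."""
    if path.startswith(PRINT_PREFIX):
        return [("X-Robots-Tag", NOINDEX_FOLLOW)]
    return []


if __name__ == "__main__":
    print(robots_meta_tag("/print/blue-tshirt"))
    print(robots_headers("/print/blue-tshirt"))
```

Keeping the rule in one place means the printer-friendly template and any non-HTML downloads stay consistent.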
2. Set your preferred domain and parameter handling
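Setting a preferred domain, e.g. http://website.com rather than http://www.website.com, can help to prevent Google from treating the different versions as separate pages. Telling Google how to handle URL parameters likewise encourages it to crawl the preferred versions of your URLs and prevents duplicate content issues. Both used to be configurable in Google Search Console; Google has since retired those settings, so today the preferred domain is best enforced with a site-wide 301 redirect and rel=canonical tags.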
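However you signal your preferences to Google, it also helps to keep your own internal links, sitemaps and canonical tags pointing at one form of each URL. Here is a minimal Python sketch of that normalisation, assuming (for illustration only) that website.com is the preferred host and that the listed tracking and session parameters never change page content; both are site-specific choices you would need to make yourself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative assumptions: the preferred host and the parameters that never
# change a page's content are choices you would make for your own site.
PREFERRED_HOST = "website.com"
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalise_url(url: str) -> str:
    """Map the many variants of a URL onto one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == "www." + PREFERRED_HOST:
        host = PREFERRED_HOST  # www and non-www collapse to the preferred domain
    # Drop parameters that don't change the content, and sort the rest so that
    # ?a=1&b=2 and ?b=2&a=1 produce the same URL.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    path = parts.path or "/"
    # The fragment (#...) is never sent to the server, so it is discarded.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))
```

For example, `normalise_url("http://www.website.com/shirts?utm_source=news&colour=blue")` returns `"http://website.com/shirts?colour=blue"`.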
3. Focus on creating unique content
One of the very best ways to avoid duplicate content is to focus your content marketing efforts on creating unique, valuable content. As mentioned earlier, it is often a lack of high quality, original content that harms a site's SEO efforts. By adding value to your website with fantastic content, you will not only give your visitors an excellent experience, but also send positive signals to Google that your website is authoritative and high quality, which can help boost your search rankings. If you have multiple pages on your site which are very similar but not identical, consider consolidating them into one page, or adding new content to each so that every page is unique and offers genuinely valuable information.
How to fix duplicate content issues
Now that you know the basics, you can avoid duplicate content issues in future. But what if you already have duplicate content? Luckily, it's relatively easy to fix. There are two key methods for fixing duplicate content so that your optimised pages can perform well in the search results.
1. Use 301 redirects to point to the original page
If you have multiple URLs, all with identical content, you need to prevent Google from diluting the authority of the URLs by redirecting the duplicates to the original page. 301 redirects are essential here because they tell Google that the redirect is permanent. The pages will no longer compete against each other, and you will actually give the original page stronger popularity signals overall, particularly if the content is well optimised.
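How you configure a 301 depends on your web server or CMS. As a framework-neutral illustration, here is a minimal Python WSGI app that answers requests for known duplicate URLs with "301 Moved Permanently" and a Location header pointing at the original page. The REDIRECTS mapping and all paths are hypothetical.

```python
# Hypothetical mapping from duplicate paths to the original page's path.
REDIRECTS = {
    "/products/blue-tshirt-2016": "/shirts/blue-tshirt",
    "/blue-tshirt": "/shirts/blue-tshirt",
}


def app(environ, start_response):
    """Minimal WSGI app: 301-redirect known duplicates, serve everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells crawlers the move is permanent, so signals should be
        # consolidated on the target URL.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    # Placeholder response standing in for the real page.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Original page</h1>"]
```

To try it locally you could serve `app` with `wsgiref.simple_server.make_server("", 8000, app)`; in production you would normally configure the same rule in your web server or CMS instead.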
2. Use rel=canonical for multiple URLs
The rel=canonical tag is a slightly quicker and easier alternative to a 301 redirect, and is often used when multiple URLs lead to the same content. For example, an eCommerce store may allow a product to be accessed via several different URLs depending on how the user navigated the site's product categories. The result is multiple different URLs, each with identical content. By adding a rel=canonical link to a page's <head>, you tell Google that the signals it gathers for that page should be credited to the URL given in the tag. Moz has an excellent article on how to use the rel=canonical tag successfully.
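For the eCommerce case above, the tag is a `<link rel="canonical" href="...">` element in the page's <head>. This minimal Python sketch builds it, assuming (hypothetically) that every product's preferred URL has the form https://website.com/product/<slug> and that the slug is the last segment of whatever category path the visitor used.

```python
from html import escape

# Hypothetical URL scheme: one preferred URL per product, regardless of the
# category path used to reach it.
CANONICAL_BASE = "https://website.com/product/"


def canonical_link(request_path: str) -> str:
    """Build the rel=canonical <link> for a product page reached via any path."""
    # /mens/t-shirts/blue-tshirt and /sale/blue-tshirt/ both yield "blue-tshirt".
    slug = request_path.rstrip("/").rsplit("/", 1)[-1]
    href = escape(CANONICAL_BASE + slug, quote=True)  # safe inside an attribute
    return f'<link rel="canonical" href="{href}">'
```

Every category route then renders the same tag, so Google receives one consistent preferred URL for the product.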
How to remove duplicate content from Google's index
So, you've used one of the methods above to let Google know which page of duplicated content is the main one, and used the 'noindex' tag to prevent Google from indexing the duplicate pages in future. But how do you remove duplicate content that has already been indexed?
Well, the good news is that once Google re-crawls the page and finds the noindex directive, it will automatically drop the page from its index. For this to work, the directive must be in the page's robots meta tag or X-Robots-Tag header, and the URL must not be blocked in robots.txt, otherwise Google can't re-crawl the page to see it. Google re-crawls websites periodically, but how often it does so differs from site to site. To speed this process up, you can ask Google to re-crawl specific URLs, for example with the URL Inspection tool in Google Search Console, although the re-crawl is not guaranteed to happen immediately.
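Before requesting a re-crawl, it's worth checking that Google will actually be able to see the noindex. This minimal Python sketch uses only the standard library: it takes a page you have already fetched (its URL, HTML and response headers) plus your robots.txt content, and reports whether the URL is crawlable and carries a noindex directive. The function name is hypothetical, and the check is deliberately simple; for example, it ignores crawler-specific meta tags such as name="googlebot".

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append((attrs.get("content") or "").lower())


def can_be_deindexed(url: str, html: str, headers: dict[str, str],
                     robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """True if the crawler may fetch the URL and will find a noindex on it."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, url):
        return False  # blocked by robots.txt: the noindex is never seen
    meta = _RobotsMetaParser()
    meta.feed(html)
    # Header names are case-insensitive, so match X-Robots-Tag loosely.
    header = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    return any("noindex" in d for d in meta.directives + [header.lower()])
```

A page that carries a noindex but sits under a Disallow rule returns False, which is exactly the situation that keeps a duplicate stuck in the index.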
Be unique and know your tags
Let's sum up: yes, you should still be concerned about duplicate content, though less because of a specific penalty than because of the indirect consequences of poor practices. Duplicate content should be avoided as much as possible, but noindex tags, rel=canonical tags and 301 redirects offer relatively easy fixes for known problems. Above all else, be unique and make your content valuable to keep dupe content across your site to a minimum.