It doesn’t seem to matter which forums you hang out in or what blogs you read: duplicate content comes up time and time again. And there appears to be some confusion about whether, and how, you are penalised for it.
So, here goes an attempt to explain duplicate content.
There are two types of duplicate content:
1) The first type comes from scraping sites to gather content to fill your own. We touched on scraping and duplicate content when talking about the importance of unique content back in November 2008. It is obvious from our comment spam that far too many websites run autoscrapers that seek out relevant content and add it to their own site with no commentary, opinions, additional content creation, or added value. This is frowned upon by the search engines for several reasons.
a) It is not unique content that has been created by that site owner. It has been pilfered, robbed, borrowed, plagiarised, nicked by website owners who are too lazy to write their own content, and who have generally only set up the website to make money from Google AdSense or similar.
b) The content has already been attributed to its original source by the search engines, so scraping it just flags it up as not being original. This applies to affiliate content as well as scraped content.
c) If no attempt is made to add value to that content, the search engines just see it as cluttering up their rankings with the same old, same old.
2) The second type of duplicate content comes from syndicating or from publishing similar information under different URLs. If you syndicate your content out, e.g. with an RSS feed or by publishing articles on multiple sites, it is not a problem unless you are trying to deceive the search engines. In most instances, the purpose of syndicating the content is not to deceive the search engines but to get the content in front of the widest possible audience. The search engines can see that, and simply choose the most appropriate version of the content to index and link to.
A variation on this arises when different URLs point to the same product, which is most obvious on dynamic, database-driven e-commerce sites where the URL is auto-generated from product searches. Google are quite clear on how to deal with this in this article on duplicate content.
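To make the "different URLs, same product" problem a little more concrete, here is a minimal Python sketch of one common way to handle it: pick a single canonical URL for each product and declare it with a rel="canonical" link element. The parameter names in IGNORED_PARAMS and the example.com URLs are hypothetical, chosen purely for illustration; which parameters are safe to ignore depends entirely on your own site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list: query parameters that change how a page was reached
# or how it is displayed, but not which product it shows.
IGNORED_PARAMS = {"sort", "sessionid", "ref", "search"}


def canonical_url(url: str) -> str:
    """Reduce the many URL variants of one product page to a single form."""
    parts = urlsplit(url)
    # Keep only the parameters that identify the content, in a stable order.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(kept),
        "",  # drop any #fragment
    ))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_link_tag(
        "http://Example.com/product?id=42&sort=price&sessionid=abc"
    ))
```

With this in place, every variant of a product URL produced by searching, sorting, or session tracking would announce the same preferred address, leaving the search engines in no doubt about which version to index.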
So, when you start to have sleepless nights about being penalised for duplicate content, all you need do is ask yourself whether your duplicate content is honourable (Type 2 above) and hence white hat, or dishonourable (Type 1 above) and hence black hat. And as long as your answer is white hat, you can sleep soundly knowing you are unlikely to be penalised for anything!