Take a look at your website. How much of your content might be considered duplicate by a search engine algorithm? Even if you never copy anyone, you can't answer 'none', because someone may be copying you. Duplicate content is one of the biggest issues both for search engines trying to keep their results relevant and for webmasters trying to avoid search engine penalties.

Penalties for having duplicate content can be really harmful. This is not just a downgrade in rankings but a move to the supplemental results, which are hardly visible to most web users. Normally one would expect Google to select one URL over the others to display in the SERPs, while the duplicates end up in the supplemental results. Unfortunately this is not always so. In the thread "Duplicate content observation" in the forum you can read about a case in which an original, high-quality and authoritative page was removed from Google's index together with its duplicates. Considering that this can happen even to the most honest webmaster, one can imagine the amount of attention this issue gets on any SEO forum.

Types of Duplicate Content

Duplicate content has a wider definition than 'copy-paste' plagiarism; it is not just content scraped from a competitor's site, a SERP or an RSS feed. Apart from this, there are a few more cases that are generally referred to as duplicate content.

Circular Navigation

Jake Baille from TrueLocal loosely defines circular navigation as having multiple paths across a website. This can be understood as the same content being accessible via different URLs. An example of circular navigation could be an article that is retrieved by links like
- example.com/articles/1/ ,
- mysite.com/article1/
- mysite.com/articles.php?id=1

Another legitimate use of multiple URLs is forum threads. Each thread can be accessible by a link like myforum.com/index.php/topic.1201.html , and each message within the thread has a URL like myforum.com/index.php/topic.1201.msg.01.html . In the eyes of a search engine all the links lead to different pages with identical content. Solution? Think of a consistent way of linking, or apply robots.txt exclusion rules.

This can also be the case when other people link to you using different-looking URLs. Since these external links are out of your control, you should create a 301 redirect to the canonical URL you choose to be displayed.

Printer-Friendly Versions

Making a printer-friendly version is a common practice, and it adds value for visitors. But a printer-friendly version is also a prominent example of duplicate content! Fortunately, a simple solution like adding a 'noindex' meta tag to your print pages solves the issue.

Product-Only Pages

Similar-looking product pages are common among online stores. Typically they are created using a single template. Often two different product pages share a description that varies in just a few words or numbers, which causes them to be filtered out as duplicate content. This issue has no easy solution. Either you rewrite robots.txt to allow only one product description to be crawled and lose search engine traffic to the rest of them, or you roll up your sleeves and add something different to each product page, like testimonials, which is time-consuming or nearly impossible depending on the number of product types in your stock.

How Do Duplicate Content Filters Work?

There are several algorithms in data mining that aim to detect similar text passages. The one claimed to be used by search engines is w-shingling. Each document is characterized by its set of shingles, the contiguous subsequences of w tokens (blocks of text), which serves as its fingerprint. The ratio of the size of the intersection to the size of the union of two documents' shingle sets can be used to determine their resemblance. Another algorithm that can be used for duplicate detection is Levenshtein distance.

It is natural to expect a duplicate content filter to be able to discover the origin and rank it higher. The simplest way to detect the origin would be to compare dates of indexing, on the assumption that the original source is uploaded and crawled earlier than its copies. But with the advent of RSS feeds, new content can be distributed instantaneously, and this approach is no longer valid.

As for the origin's right to be ranked higher, this is not always implemented. J.S.Cassidy, in her article 'Duplicate Content Penalties Problems with Googles Filter', describes an article distribution experiment. An article was syndicated twice, scoring as many as 19000 copies. After some time Google, Yahoo and MSN purged their indices, leaving just a few of the duplicates. MSN's filter managed not only to discover the origin but also to put it at the top of the search results. Yahoo also discovered the origin, but on the results page for the title of the article the origin's position fluctuated, apparently in response to the way Yahoo counts relevancy and authority.

To the tester's amusement, Google's refined index did not include the original at all! Evidently Google featured only those pages with copies of the article that it considered relevant and authoritative, with no regard to the original source of the content! I've already mentioned a thread where a similar problem is discussed. Both stories took place in 2005 and early 2006, and so far I have found no evidence that this issue has been resolved.
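
To make the two similarity measures named in "How Do Duplicate Content Filters Work?" more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a description of how any search engine actually implements its filter: the function names (shingles, resemblance, levenshtein), the whitespace tokenization, the lowercasing and the default shingle width w=4 are all choices made for this example.

```python
def shingles(text: str, w: int = 4) -> set:
    """Return the set of w-token shingles (contiguous token runs) in text.

    Assumption for this sketch: tokens are lowercase whitespace-separated words.
    """
    tokens = text.lower().split()
    if len(tokens) < w:
        # A text shorter than one shingle is treated as a single shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: str, b: str, w: int = 4) -> float:
    """Shingle-set resemblance: size of intersection divided by size of union."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty documents are trivially identical
    return len(sa & sb) / len(sa | sb)


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions
    needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]
```

A resemblance close to 1.0 means the two texts share nearly all of their shingles, which is the situation described above for product pages whose descriptions differ in just a few words. For the Levenshtein function, levenshtein("kitten", "sitting") returns 3. Where a real filter would draw the duplicate threshold is not stated in the sources discussed here, so any cut-off value would be a guess.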