Take a look at your website. How much of your content might be considered duplicate by a search engine algorithm? Even if you never copy anyone, you can't answer 'none', because someone may be copying you. Duplicate content is one of the biggest issues both for search engines trying to keep their results relevant and for webmasters trying to avoid search engine penalties.

Penalties for having duplicate content can be really harmful. This is not just a downgrade in rankings but a move to the supplemental results, which are hardly visible to most web users. Normally Google is expected to select one URL over another to display in the SERPs, while the duplicates can be found in the supplemental results. Unfortunately this is not always so. In the thread "Duplicate content observation" in the forum you can read about a case in which an original, high-quality and authoritative page was removed from Google's index together with its duplicates. Considering that this can happen even to the most honest webmaster, one can imagine the amount of attention this issue gets on any SEO forum.

Types of Duplicate Content

Duplicate content has a wider definition than 'copy-paste' plagiarism; it is not just content scraped from a competitor's site, a SERP or an RSS feed. Apart from this, there are a few more cases that are generally referred to as duplicate content.

Circular Navigation

Jake Baille from TrueLocal loosely defines circular navigation as having multiple paths across a website. This can be understood as the same content being accessible via different URLs. An example of circular navigation is an article that can be retrieved by links like

- example.com/articles/1/
- mysite.com/article1/
- mysite.com/articles.php?id=1

Another legitimate use of multiple URLs is forum threads. Each thread can be accessible by a link like myforum.com/index.php topic.1201.html , and each message within the thread has a URL like myforum.com/index.php topic.1201.msg.01.html . In the eyes of a search engine all these links lead to different pages with identical content. Solution? Think of a consistent way of linking, or apply robots.txt exclusion rules.

The same problem arises when other people link to you using different-looking URLs. Since these external links are out of your control, you should create a 301 redirect to the canonical URL you want displayed.

Printer-Friendly Versions

Making a printer-friendly version is a common practice, and it adds value for visitors. But a printer-friendly version is also a prominent example of duplicate content! Fortunately, a simple solution like adding a 'noindex' meta tag to your print pages solves the issue.

Product-Only Pages

Similar-looking product pages are common among online stores. Typically they are created from a single template. Often two different product pages share a description that varies in just a few words or numbers, which causes them to be filtered out as duplicate content. This issue has no easy solution. Either you rewrite robots.txt to allow only one product description to be crawled and lose search engine traffic to the rest of them, or you roll up your sleeves and add something different to each product page, like testimonials, which is time-consuming or nearly impossible depending on the number of product types in your stock.

How Do Duplicate Content Filters Work?

There are several algorithms in data mining that aim to detect similar text passages. The one claimed to be used by search engines is w-shingling. Each document is represented by its set of shingles (its fingerprint): the contiguous subsequences of w tokens (blocks of text) that it contains. The ratio of the size of the intersection of two documents' shingle sets to the size of their union can be used to determine their resemblance. Another algorithm that can be used for duplicate detection is Levenshtein distance.

It is natural to expect a duplicate content filter to be able to discover the origin and rank it higher. The simplest way to detect the origin would be to compare indexing dates, on the assumption that the original source is uploaded and crawled earlier than its copies. But with the advent of RSS feeds, new content can be distributed instantaneously, so this approach is no longer valid.

As for the origin's right to be ranked higher, this is not always implemented. J.S.Cassidy, in her article 'Duplicate Content Penalties Problems with Googles Filter', describes an experiment in article distribution. An article was syndicated twice, scoring as many as 19000 copies. After some time Google, Yahoo and MSN purged their indices, leaving just a few of the duplicates. MSN's filter managed not only to discover the origin but also to put it at the top of the search results. Yahoo also discovered the origin, but in the results pages for searches on the title of the article the origin's position fluctuated, apparently responding to the way Yahoo counts relevancy and authority.

To the tester's amusement, Google's refined index did not include the original at all! Evidently Google featured only those pages with copies of the same article which it considered relevant and authoritative, with no regard to the original source of the content! I've already mentioned a thread where a similar problem is discussed. Both stories took place in 2005 and early 2006, and so far I have found no evidence that this issue has been resolved.
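
To make the two similarity measures named under "How Do Duplicate Content Filters Work?" more concrete, here is a minimal Python sketch of w-shingling resemblance and Levenshtein distance. It is only an illustration, not a description of how any search engine actually implements its filter: the function names `shingles`, `resemblance` and `levenshtein`, the whitespace tokenization, and the default shingle width `w=4` are all assumptions made for this example.

```python
def shingles(text, w=4):
    """Return the set of contiguous w-token sequences (shingles) in text.

    Tokens are lower-cased, whitespace-separated words (an assumption;
    a real filter would likely normalize text more carefully).
    """
    tokens = text.lower().split()
    if len(tokens) < w:
        # A document shorter than w tokens yields a single short shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a, b, w=4):
    """Size of the intersection of the shingle sets divided by size of the union."""
    sa, sb = shingles(a, w), shingles(b, w)
    union = sa | sb
    if not union:
        return 1.0  # two empty documents are treated as identical
    return len(sa & sb) / len(union)


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn s into t (row-by-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Two hypothetical product descriptions that differ in a single number.
    page_a = "blue widget with a steel frame and 2 wheels"
    page_b = "blue widget with a steel frame and 4 wheels"
    print(resemblance(page_a, page_b, w=2))  # 0.6
    print(levenshtein("kitten", "sitting"))  # 3
```

The two made-up product descriptions share six of their ten distinct two-word shingles, which is the situation described under "Product-Only Pages": a one-number difference still leaves the pages looking largely alike. Where a filter would draw the line between "similar" and "duplicate" is not known from the article, so no threshold is assumed here.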