Duplicate Content: What You Ought to Know About

Take a look at your website. How much of your content might be considered duplicate by a search engine algorithm? Even if you never copy anyone, you can't answer 'none', because someone may be copying you. Duplicate content is one of the biggest issues both for search engines trying to keep their results relevant and for webmasters trying to avoid search engine penalties.

Penalties for having duplicate content can be really harmful. The result is not just a downgrade in rankings but a move to the supplemental results, which are hardly visible to most web users. Normally one would expect Google to select one URL over another to display in the SERPs, while the duplicates end up in the supplemental results. Unfortunately this is not always so. In the thread "Duplicate content observation" in the forum you can read about a case in which an original, high-quality and authoritative page was removed from Google's index together with its duplicates. Considering that this can happen even to the most honest webmaster, one can imagine how much attention this issue gets on any SEO forum.

Types of Duplicate Content

Duplicate content has a wider definition than 'copy-paste' plagiarism; it is not just content scraped from a competitor's site, a SERP or an RSS feed. Apart from this, there are a few more things that are generally referred to as duplicate content.

Circular Navigation

Jake Baille from TrueLocal vaguely defines circular navigation as having multiple paths across a website. This can be understood as the same content being accessible via different URLs. An example of circular navigation could be an article that is retrieved by links like
- example.com/articles/1/
- mysite.com/article1/
- mysite.com/articles.php?id=1

Another legitimate use of multiple URLs is forum threads. Each thread can be accessible by a link like myforum.com/index.php/topic.1201.html, and each message within the thread has a URL like myforum.com/index.php/topic.1201.msg.01.html. In the eyes of a search engine all these links lead to different pages with identical content. The solution? Think of a consistent way of linking, or apply robots.txt exclusion rules.

The same thing can happen when other people link to you using different-looking URLs. Since these external links are out of your control, you should create a 301 redirect to the canonical URL you want to be displayed.

Printer-Friendly Versions

Making a printer-friendly version is a common practice, and it adds value for visitors. But a printer-friendly version is also a prominent example of duplicate content! Fortunately, a simple solution like adding a 'noindex' meta tag to your print pages solves the issue.

Product-Only Pages

Similar-looking product pages are common among online stores. Typically they are created from a single template. Often two different product pages share a description that varies in just a few words or numbers, which causes them to be filtered out as duplicate content. This issue has no easy solution. Either you rewrite robots.txt to allow only one product description to be crawled and lose search engine traffic to the rest of them, or you roll up your sleeves and add something different to each product page, like testimonials, which is time-consuming or nearly impossible depending on the number of product types in your stock.

How Do Duplicate Content Filters Work?

There are several algorithms in data mining that aim to detect similar text passages. The one claimed to be used by search engines is w-shingling. Each document is represented by a fingerprint: its set of shingles, which are the contiguous subsequences of w tokens (blocks of text) it contains. The ratio of the size of the intersection of two documents' shingle sets to the size of their union can be used to determine their resemblance. Another algorithm that can be used for duplicate detection is the Levenshtein distance.

It is natural to expect a duplicate content filter to be able to discover the origin and rank it higher. The simplest way to detect the origin would be to compare indexing dates, on the assumption that the original source is uploaded and crawled earlier than its copies. But with the advent of RSS feeds new content can be distributed instantaneously, and this approach is no longer valid.

As for the origin's right to be ranked higher, this is not always implemented. J.S. Cassidy, in her article 'Duplicate Content Penalties Problems with Googles Filter', describes an experiment in article distribution. An article was syndicated twice, reaching as many as 19000 copies. After some time Google, Yahoo and MSN purged their indices, leaving just a few of the duplicates. MSN's filter managed not only to discover the origin but also to put it at the top of the search results. Yahoo also discovered the origin, but in the results for a search on the article's title the origin's position fluctuated, evidently reflecting the way Yahoo weighs relevancy and authority.

To the tester's amusement, Google's refined index did not include the original at all! Evidently Google featured only those copies of the article that it considered relevant and authoritative, with no regard for the original source of the content! I've already mentioned a thread where a similar problem is discussed. Both stories took place in 2005 and early 2006, and so far I have found no evidence that this issue has been resolved.
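
To make the w-shingling idea from "How Do Duplicate Content Filters Work?" more concrete, here is a minimal Python sketch. It is only an illustration and not how any search engine actually implements its filter. The function names shingles and resemblance, the whitespace tokenization and the default w = 4 are assumptions made for this example.

```python
def shingles(text: str, w: int = 4) -> set:
    """Return the set of w-token shingles of a text.

    Tokens are lowercased, whitespace-separated words (an assumption
    made for this sketch; a real filter would tokenize more carefully).
    """
    tokens = text.lower().split()
    if not tokens:
        return set()
    if len(tokens) < w:
        # A text shorter than w tokens yields a single short shingle.
        return {tuple(tokens)}
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: str, b: str, w: int = 4) -> float:
    """Size of the intersection of the two shingle sets over the size of their union."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        # Two empty documents are treated as identical.
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    page_a = "blue cotton shirt size M with short sleeves"
    page_b = "blue cotton shirt size L with short sleeves"
    print(resemblance(page_a, page_b, w=2))
```

A value close to 1 means that the two pages share most of their shingles, as two product descriptions differing in a word or two would. Where a filter draws the line between 'similar' and 'duplicate' is not stated anywhere in the sources mentioned above, so any threshold would be a guess.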
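
The Levenshtein distance mentioned in "How Do Duplicate Content Filters Work?" can be sketched in the same spirit. The version below is the textbook dynamic-programming formulation over characters; the function name levenshtein is chosen for this example, and nothing here is claimed to match what a search engine actually runs.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        # Keep the shorter string in the inner loop to save memory.
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

A small distance relative to the length of the texts suggests near-duplicates. The running time grows with the product of the two lengths, which makes this measure more practical for short passages such as titles or product descriptions than for whole pages.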