How to Prevent Search Engines From Indexing Secure Pages (HTTPS Pages)

Most websites have areas that use SSL. SSL is an excellent security technology; however, if your server is not configured correctly, you may run into problems such as canonicalization issues and duplicate content penalties. There are ways to avoid these problems. In this article I will describe a solution for sites running on an Apache server.

SSL (Secure Sockets Layer) is a protocol that uses encryption to protect sensitive information sent over the Internet between a web browser and a web server. It ensures that credit card transactions, for example, are sent securely. URLs of pages secured with SSL begin with https instead of http, and that is precisely how you can run into problems. Your site will effectively have two versions of the same pages: one version with URLs that begin with https and another with URLs that begin with http.

One significant issue that may arise when search engine robots have indexed the secure areas of your site in addition to your standard site is called canonicalization. Canonicalization is the process by which a search engine chooses the best URL for a page when there are several to choose from. In this instance, the search engine has to choose between the https and http versions of the website.

Another issue that may arise is a duplicate content penalty. Search engines apply a duplicate content penalty when content found on your site is duplicated verbatim on other sites. If someone plagiarizes your content and places it on their site, you may run into this problem. Under such a penalty, search engines punish one of the sites by dropping its ranking. As a result, the culprit's site may outrank you for your own keywords. Imagine that. Now, if your site has both https and http versions, you may experience duplicate content penalties across your entire site as well.

Your PageRank may suffer as well. PageRank, used by Google, is a measurement of the relative importance of a page. When page A links to page B, it essentially adds a vote for page B. The more votes a page or a website has, the more important that page is. PageRank is one of many factors that Google uses to rank pages. If your site has both https and http versions, your PageRank may get distributed across both versions.

On to the solution. First you need to create a second robots.txt file. Name it robots_ssl.txt or anything else that you prefer. A robots.txt file, which implements the Robots Exclusion Protocol, is a simple text file that resides in the root level of your domain. It asks search engine robots not to crawl, and therefore not to index, specified files or directories of your site which are otherwise publicly viewable. The following is an example of a robots.txt file which will prevent all search engine robots from indexing the entire site:

    User-agent: *
    Disallow: /

Upload the new robots_ssl.txt file to the root level of your domain. Make sure that mod_rewrite is enabled on your server. mod_rewrite is used for rewriting a URL at the server level. It essentially maps one URL to another without the visitor noticing. It's especially handy if you want to modify a URL's appearance.

Next, open the .htaccess file in the root level of your domain. An .htaccess file, or distributed configuration file, is a simple ASCII file used to make configuration changes on a per-directory basis. In the .htaccess file you can implement custom error pages, password protect directories, and much more. Add the following to your .htaccess file:

    Options +FollowSymLinks
    RewriteEngine on
    RewriteCond %{SERVER_PORT} ^443$
    RewriteRule ^robots\.txt$ robots_ssl.txt [L]

The above rules instruct the server to direct any request for robots.txt made on port 443 to the second robots.txt file, which we named robots_ssl.txt. Port 443 is the default port for SSL connections. Thus, search engine robots that request robots.txt over https are served the robots_ssl.txt file, which instructs them not to index the site. To test that it's working, enter http://www.nameofyoursite.com/robots.txt (the original robots.txt file) and then https://www.nameofyoursite.com/robots.txt (the robots_ssl.txt file) into your browser. You should see two different robots.txt files.
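
If you would rather not compare the two files by eye, the browser test can also be scripted. The following is only a rough sketch in Python using the standard library: the function names allows and fetch_robots are my own, and www.nameofyoursite.com is the placeholder host from this article, so substitute your own domain. It assumes both the http and https versions of your site are reachable from the machine running the script.

```python
import urllib.request
from urllib.robotparser import RobotFileParser

# The blocking rules from the article, as they should be served over https.
ROBOTS_SSL = "User-agent: *\nDisallow: /\n"


def allows(robots_body: str, path: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` crawl `path`."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, path)


def fetch_robots(base_url: str) -> str:
    """Download robots.txt from a site root such as https://www.example.com."""
    url = base_url.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    host = "www.nameofyoursite.com"  # placeholder; replace with your domain
    for scheme in ("http", "https"):
        body = fetch_robots(f"{scheme}://{host}")
        print(scheme, "allows crawling of /:", allows(body, "/"))
```

If the rewrite rule is working, the https check should report that crawling is not allowed, while the result for http depends on what your original robots.txt permits.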