Most websites have areas that are served over SSL. SSL is an excellent security technology; however, if your server is not configured correctly, you may run into problems such as canonicalization and duplicate content penalties. There are ways to avoid these problems. In this article I will describe a solution for sites running on an Apache server.

SSL, or Secure Sockets Layer, is a protocol that uses encryption to protect sensitive information passed over the Internet between a web browser and a web server. It ensures, for example, that credit card transactions are sent securely. URLs of pages that are SSL secured begin with https instead of http, and that is precisely how you can run into problems. Your site will effectively have two versions of the same pages: one with URLs beginning with https and the other with URLs beginning with http.

One significant issue that may arise if search engine robots have indexed the secure areas of your site in addition to your standard site is canonicalization. Canonicalization is the process by which a search engine chooses the best URL for a page when there are several to choose from. In this instance, the search engine has to choose between the https and http versions of the website.

Another issue that may arise is duplicate content penalties. A duplicate content penalty is applied by search engines when content found on your site is duplicated verbatim elsewhere. If someone plagiarizes your content and places it on their site, you may run into this problem. Under such a penalty, search engines punish one of the sites by dropping its ranking. As a result, the culprit's site may outrank you for your own keywords. Imagine that. Now, if your site has both https and http versions, you may experience duplicate content penalties across your entire site as well.

Your PageRank may suffer as well. PageRank, used by Google, is a measurement of the relative importance of a page. When page A links to page B, it essentially adds a vote for page B. The more votes a page has, the more important that page is. PageRank is one of many factors that Google uses to rank pages. If your site has both https and http versions, your PageRank may get split between the two versions.

On to the solution. First you need to create a second robots.txt file. Name it robots_ssl.txt or anything else that you prefer. A robots.txt file, which implements the Robots Exclusion Protocol, is a simple text file that resides in the root level of your domain. It asks search engine robots not to crawl certain specified files or directories of your site which are otherwise publicly viewable. The following is an example of a robots.txt file which tells all search engine robots to stay out of the entire site:

    User-agent: *
    Disallow: /

Upload the new robots_ssl.txt file to the root level of your domain. Make sure that mod_rewrite is enabled on your server. mod_rewrite is used for rewriting URLs at the server level. It essentially maps one URL to another without the user noticing. It's especially handy if you want to modify the appearance of your URLs.

Next, open the .htaccess file in the root level of your domain. An .htaccess file, or distributed configuration file, is a simple ASCII file used to make configuration changes on a per-directory basis. In the .htaccess file you can implement custom error pages, password protect directories, and much more. Add the following to your .htaccess file:

    RewriteEngine on
    Options +FollowSymLinks
    RewriteCond %{SERVER_PORT} ^443$
    RewriteRule ^robots\.txt$ robots_ssl.txt [L]

The rules above instruct the server to answer any request for robots.txt made on port 443 with the second file, which we named robots_ssl.txt. Port 443 is the default port used for SSL connections. Search engine robots that arrive over https therefore receive robots_ssl.txt, which instructs them not to index the site. To test that it's working, enter http://www.nameofyoursite.com/robots.txt (the original robots.txt file) and then https://www.nameofyoursite.com/robots.txt (the robots_ssl.txt file) into your browser. You should see two different robots.txt files.
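
The browser test described above can also be scripted. The following is a minimal sketch in Python, not part of the original setup: the function names (blocks_everything, fetch_robots, check_site) are hypothetical, and www.nameofyoursite.com is the same placeholder host used in the article. It assumes the site answers on both http and https, and it uses only the standard library.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def blocks_everything(robots_body: str, user_agent: str = "*") -> bool:
    """Return True if this robots.txt text disallows the site root for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return not parser.can_fetch(user_agent, "/")


def fetch_robots(url: str) -> str:
    """Download a robots.txt file and return its text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def check_site(host: str) -> bool:
    """Return True if the https robots.txt differs from the http one
    and tells all robots to stay out of the whole site."""
    http_body = fetch_robots(f"http://{host}/robots.txt")
    https_body = fetch_robots(f"https://{host}/robots.txt")
    return http_body != https_body and blocks_everything(https_body)


if __name__ == "__main__":
    # Placeholder host from the article; replace it with your own domain.
    print(check_site("www.nameofyoursite.com"))
```

Running the script against your own domain should print True once the rewrite rule is in place. A result of False suggests that both protocols are still serving the same robots.txt, or that robots_ssl.txt does not disallow the whole site.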