There’s very little information out there about the challenges of optimizing sites that serve both HTTP and HTTPS. Based on my observations and technical knowledge, here are the top things to watch out for when optimizing HTTP/HTTPS, along with resolutions for each.
1. Duplicate Content and Canonicalization
Because the protocols (HTTP and HTTPS) are different, search engines treat the two versions as separate sites, so there’s a real risk of a duplicate content penalty. When a search engine discovers two identical pages, it generally keeps the page it crawled first and ignores the others.
Solutions:
1. Be smart about the site structure: to keep the engines from crawling and indexing HTTPS pages, structure the website so that HTTPS pages are reachable only through a form submission (log-in, sign-up, or payment pages). A common mistake is making these pages available via a standard link, which usually happens when you aren’t aware that the secure version of the site is being crawled and indexed.
2. Use a robots.txt file to control which pages will be crawled and indexed.
3. Use an .htaccess file to serve a separate robots file for the secure site. Here’s how to do this (a sample robots_ssl.txt follows this list):
4. Create a file named robots_ssl.txt in your root.
5. Add the following code to your .htaccess:
RewriteCond %{SERVER_PORT} 443 [NC]
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
6. Remove yourdomain.com:443 from Webmaster Tools if the pages have already been crawled.
7. For dynamic pages such as PHP, try emitting a noindex meta tag when the page is served over SSL:
<?php
if ($_SERVER["SERVER_PORT"] == 443) {
    echo '<meta name="robots" content="noindex,nofollow">';
}
?>
8. Dramatic solution (may not always be possible): 301 redirect the HTTPS pages to the HTTP pages, in the hope that the link juice will transfer over (a sample rewrite rule also follows this list).
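For step 4, a minimal robots_ssl.txt might simply block all crawlers. Since the rewrite rule above serves it only on port 443, the regular robots.txt for the HTTP site is untouched:

User-agent: *
Disallow: /

For the 301 redirect in step 8, here is a rough .htaccess sketch, assuming Apache with mod_rewrite and using yourdomain.com as a placeholder. It sends every HTTPS request to its HTTP counterpart:

RewriteEngine On
RewriteCond %{SERVER_PORT} 443 [NC]
RewriteRule ^(.*)$ http://yourdomain.com/$1 [R=301,L]

Be careful not to redirect pages that genuinely need to stay secure, such as log-in or payment forms.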
Additional ideas:
1. Have portions of the site configured to use SSL to allow data transfer between the server and the browser over an encrypted (secure) connection only where it’s needed. Note: the URLs of these pages still begin with https rather than http to indicate the secure protocol.
2. If you already have HTTPS pages in the index, remove them with Webmaster Tools.
2. Linking
In certain instances, I’ve seen Google index the HTTPS version of a website (for example, PayPal.com), but since everyone tends to link to the HTTP version of a page, the HTTPS version can be left out in the cold in terms of PageRank (though this is not the case with PayPal). Now, if Google indexed only the HTTPS pages, you may be in trouble, because you get essentially no link juice from the HTTP pages. Of course, this may not be a problem if you were aiming for HTTPS from the beginning.
Solutions:
1. The best practice is to get links to the HTTP versions of your pages, and to do this you will need to make sure that important pages are available over HTTP (typically this is not a problem, as most HTTPS pages are not content-rich and have less value for the search engines).
2. Keep a separate log file for the HTTPS domain and write a bit of code to email you the referring links every day or week (see the sketch after this list). Then contact the webmasters with a sweet mail asking them to change the links to HTTP (this works for me most of the time).
3. Keep normal sections under HTTP only, to reduce the likelihood of people linking to HTTPS.
4. Last resort: wait for Google to start counting HTTPS links toward HTTP pages (good luck with that!).
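Here is a rough sketch of the log-scanning idea from step 2. It assumes a combined-format Apache access log for the HTTPS site; the log path and email address are placeholders you would adjust for your own setup. Run it from cron every day or week:

<?php
// Scan the HTTPS access log for external referrers and mail the list.
// Assumes the Apache combined log format, where each line ends with
// "referer" "user-agent".
$log = '/var/log/apache2/ssl_access.log';  // placeholder path

$referrers = array();
foreach (file($log) as $line) {
    if (preg_match('/"([^"]*)" "[^"]*"$/', $line, $m)) {
        $ref = $m[1];
        // Skip empty referrers and links from your own domain
        if ($ref !== '-' && strpos($ref, 'yourdomain.com') === false) {
            $referrers[$ref] = true;
        }
    }
}

if (!empty($referrers)) {
    mail('you@yourdomain.com', 'Sites linking to the HTTPS version',
         implode("\n", array_keys($referrers)));
}
?>

Once you know who is linking to the HTTPS version, a short, friendly note asking for an HTTP link usually does the trick.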