How To Use Your .htaccess File To Keep Spammers Out
Spammers have a knack for finding workarounds to even the most secure aspects of a system, including those that are not readily recognized as potential targets. The .htaccess file can be used to keep e-mail harvesters away. This is effective because many harvesters identify themselves through the User-Agent string they send with every request, which gives .htaccess something to match and block.
Spam Countered by .htaccess
Bad bots, such as e-mail harvesters, are spiders that do a site far more harm than good. Site rippers are offline browsing programs that a visitor may unleash on a site to crawl and download every one of its pages for offline viewing. Both drive up a site’s bandwidth and resource usage, sometimes to the point of crashing the server. Because bad bots typically ignore the instructions in a site’s robots.txt file, they have to be banned in .htaccess instead, by identifying them from the User-Agent they send.
A block of rules can be added to the .htaccess file to block many of the currently known bad bots and site rippers. Matching bots receive a 403 Forbidden error when they try to view the protected site, which usually produces a significant saving in bandwidth and server resources.
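A minimal sketch of such a block, assuming Apache with mod_rewrite enabled and an .htaccess file that is allowed to use it. The user-agent names are only a few well-known harvester and ripper examples, not a maintained list; a real deployment would use a longer, regularly updated one.

```apache
# Block requests whose User-Agent contains a known harvester or site-ripper name.
# The names below are illustrative examples only; extend the list as needed.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (EmailCollector|EmailSiphon|HTTrack|WebZIP|WebCopier) [NC]
# "-" leaves the URL unchanged; [F] returns 403 Forbidden and stops further rewriting.
RewriteRule ^ - [F]
```

The [NC] flag makes the match case-insensitive, and because the pattern is not anchored, any User-Agent that merely contains one of the names is caught. A bot that disguises itself with an ordinary browser User-Agent will not be stopped by this rule.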
Bandwidth stealing, commonly called hotlinking in the web community, means linking directly to non-HTML objects hosted on someone else’s server, such as images and CSS files. The victim’s server is robbed of bandwidth and money while the perpetrator displays the content without paying for its delivery.
Hotlinking to one’s own server can be disallowed with .htaccess. Anyone who tries to link to an image or CSS file on a protected site is either blocked or served different content. Being blocked usually shows up as a failed request, such as a broken image; different content might be an image of an angry man, presumably to send a clear message to the violator. This feature requires the Apache module mod_rewrite to be enabled on the server.
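A sketch of the blocking variant, assuming mod_rewrite is enabled and using example.com as a placeholder for the protected site’s own domain. It forbids image and CSS requests whose Referer header names another site, while still allowing requests with no Referer at all, since direct visits and some proxies and privacy tools omit the header.

```apache
RewriteEngine On
# Let through requests that send no Referer header.
RewriteCond %{HTTP_REFERER} !^$
# Let through requests referred from the site itself (example.com is a placeholder).
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com(/|$) [NC]
# Any other request for these file types gets 403 Forbidden.
RewriteRule \.(gif|jpe?g|png|css)$ - [F,NC]
```

To serve a substitute image instead, the last rule can redirect to that image, but the substitute must then be excluded with an extra RewriteCond on %{REQUEST_URI}; otherwise the browser’s request for the substitute matches the same rule and the redirect loops.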
To disable hotlinking of certain file types, add the rules to an .htaccess file and upload it to the root directory, or to a particular subdirectory to limit the effect to just one section of the site. Many servers are configured to prevent directory listing. If yours is not, add the appropriate directive to the .htaccess file in the image directory so that none of the files in that directory can be listed.
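Turning off directory listing takes a single directive. A sketch, assuming the server’s AllowOverride setting permits Options in .htaccess files:

```apache
# Place in the image directory's .htaccess file.
# With no index file present, Apache now answers 403 Forbidden instead of listing the files.
Options -Indexes
```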
The .htaccess file can also reliably password-protect directories on a website. Other options exist, but .htaccess protection is enforced by the web server itself: anyone wishing to get into the directory must know a valid username and password, and no “back doors” are provided. Password protection with .htaccess requires adding the appropriate directives to the .htaccess file in the directory being protected.
Password-protecting a directory takes a little more work than the other .htaccess functions, because a separate file containing the usernames and passwords allowed to access the directory also has to be created. This file can be placed anywhere the server can read it, but it is advisable to store it outside the web root so that it cannot be downloaded from the web.
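A minimal sketch using HTTP Basic authentication, assuming the server allows AuthConfig overrides in .htaccess. The password file path and the realm name are placeholders. The file itself can be created with Apache’s htpasswd utility, for example `htpasswd -c /home/username/.htpasswd alice`, where `-c` creates a new file and should be left off when adding further users.

```apache
# .htaccess in the directory to protect
AuthType Basic
AuthName "Restricted Area"
# Placeholder path: keep this file outside the web root.
AuthUserFile /home/username/.htpasswd
# Any user listed in the password file may enter.
Require valid-user
```

Basic authentication sends the password merely base64-encoded, so the protected directory should be served over HTTPS.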
Recommended Practices to Deter Spam
Not publishing referrers is one way to discourage spammers: sending spoofed requests to a blog is pointless if the referrer list never appears publicly. Unfortunately, many bloggers consider a link such as “sites referring to me” a neat feature and have not weighed its harmful effect on the blogosphere as a whole.
If publishing referrers is a must, the blogging software should have built-in support for a referral spam blacklist, and the referrer page should be excluded in robots.txt. This tells Googlebot and other well-behaved crawlers not to crawl the referrer page, so spammers do not get the PageRank they are after. This only works, however, when referrers are published on a page separate from the rest of the site’s content.
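A related option, swapped in here rather than taken from the robots.txt approach, is to have the server send an X-Robots-Tag header for the referrer page from .htaccess. This sketch assumes mod_headers is enabled and that the page is a file named referrers.html (a hypothetical name). Use it instead of, not together with, a robots.txt exclusion for the same page: a crawler that is blocked from fetching the page never sees the header.

```apache
# Applies only to the (hypothetical) referrer page in this directory.
<Files "referrers.html">
    # Ask search engines not to index the page or follow the links on it.
    Header set X-Robots-Tag "noindex, nofollow"
</Files>
```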
Adding rel="nofollow" to links likewise denies spammers the PageRank they want, at the link level rather than at the page level as robots.txt does. Every link in a site’s referrer section that points to an external website should carry this attribute, without exception, for maximum protection.
The current Master Blacklist file can be a powerful and efficient weapon against spam. A log file analysis program that filters referrers against this list can help root out referral spam. The Master Blacklist is a simple text file that can be downloaded from its website or mirrored. It is far from perfect, however: checking the referrers that still got through against the file shows that few or none of them were listed.
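Log filtering only hides referral spam after the fact. A complementary measure, not part of the log-analysis approach described above, is to refuse requests whose Referer matches entries copied from the blacklist. The sketch assumes mod_rewrite is enabled; the two domains are hypothetical placeholders for real entries taken from the list.

```apache
RewriteEngine On
# Hypothetical placeholder entries; copy real ones from the blacklist.
RewriteCond %{HTTP_REFERER} (casino-spam\.example|pills-spam\.example) [NC]
# Refuse matching requests with 403 Forbidden.
RewriteRule ^ - [F]
```

Refused requests are still written to the access log together with their referrer, so log filtering remains necessary; the rule mainly saves the bandwidth of serving those requests.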
The idea of combating comment spam with DNS-based blackhole lists (DNSBLs) could also be applied to other forms of spam, such as referral spam. The proposal is simple: for each request that carries a referrer, query the client’s IP address against a blacklist. If the IP is blacklisted, or scores highly across several blacklists, the referring URL should not be listed in any section of the site’s web stats. Once a host name has been identified as a referral spam host, later requests whose referrer carries the same host name can be treated as spam without querying the blacklist again, for efficiency.
Various forms of spam have grown rapidly along with the popularity of blogs, probably because there are very few restrictions on who can post a comment. Spammers intent on putting their goods in front of readers exploit this easily, using automated tools that constantly look out for blogs that are easy to spam. Spam in all its forms carries heavy consequences for everyone trying to use the Internet and the World Wide Web productively.