Squashing Spam Spiders
Wednesday, August 13, 2003
Posted by: Info-Tech Research Group
(Center for Democracy and Technology Report, CNETAsia Article, TRUSTe Article)
You've deployed spam filters on your e-mail server and you constantly update your IP address blacklists, yet messages soliciting all manner of products, services, and vulgarities still find their way into your corporate inbox. How? Spammers use harvesting software to find your e-mail address. You can take proactive steps to stamp out these "spider-spams."
But First, the Latest Numbers: We've all seen the cost of spam expressed in various figures, but they bear repeating to underscore just how massive a problem spam really is. For example, InternetNews and CNETAsia assert that:
· Spam costs $49 per year per corporate inbox.
· Nearly 7 billion spam messages go through corporate e-mail every day.
· Unsolicited e-mail now accounts for 45 percent of all e-mail messages.
· Spam will cost U.S. businesses $10 billion in lost productivity and IT overhead.
What Is Spider-Spamming? Spammers usually broadcast their messages by purchasing large blocks of e-mail addresses or by spoofing legitimate addresses. Spider-spamming, on the other hand, involves the use of software that harvests e-mail addresses posted on personal and corporate Web sites, and in newsgroups. This software uses robots and spidering techniques - like those used in search engines - to find and collect addresses.
An Authority Is on the Case: Last year, the Center for Democracy and Technology (CDT) conducted a study to find the sources of spam. E-mail addresses were set up in public areas to see how they would attract spam. Another CDT report, this time to the Federal Trade Commission, delves deeper into unsolicited commercial e-mail problems.
Action Plan: There's no method that will stop 100 percent of spam from reaching your inbox. However, the following action plan summarizes the CDT's recommendations for preventing your e-mail address from getting captured by spidering software.
Speak with Your Webmaster: Find out how many employee e-mail addresses are posted on your Web site(s), affiliate sites, extranets, newsgroups, and any other public area of the Internet. Make a list: you will need it for the next four steps.
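To build that list, it can help to scan your own pages the way a simple harvester would. The following is a minimal sketch, not a description of any particular spidering tool: the pattern, the function name find_exposed_addresses, and the sample page are all illustrative assumptions, and real harvesters may use broader or smarter matching.

```python
import re

# Assumption: a deliberately simple pattern for plain-text addresses.
# Real address syntax (and real harvesting software) is more varied.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_exposed_addresses(html_text):
    """Return unique plain-text addresses in a page, in order of first appearance."""
    seen = []
    for match in EMAIL_PATTERN.findall(html_text):
        address = match.lower()  # treat Joe@... and joe@... as the same entry
        if address not in seen:
            seen.append(address)
    return seen


# Hypothetical page content used only for illustration.
sample_page = """
<p>Contact <a href="mailto:joe@yourcompany.com">joe@yourcompany.com</a>
or the sales desk at Sales@yourcompany.com.</p>
"""

exposed = find_exposed_addresses(sample_page)
print(exposed)
```

Running a check like this against each public page gives you a starting inventory. Any address it finds is one that a comparably simple spider could find too.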
Use Hex Code: You can replace non-alphabetical characters in an e-mail address with their Hex equivalents when creating HTML pages. For example, using Hex code turns "joe@yourcompany.com" into "joe%40yourcompany%2ecom" (40 and 2e are the hexadecimal codes for "@" and "."). So when the Web page goes up, viewers will see a normal-looking address, but spiders that scan for plain-text addresses won't recognize the underlying Hex code.
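As a rough illustration of the Hex technique, the sketch below encodes a sample address, joe@yourcompany.com, in two ways: percent-encoding (where "@" becomes %40 and "." becomes %2e) for use inside a mailto: link, and HTML hexadecimal character references for the text that visitors see. The function names are hypothetical, the first function leaves digits as well as letters untouched, and how a given browser or mail client handles an encoded mailto: link is an assumption you should test before relying on it.

```python
def percent_encode_non_alnum(address):
    """Percent-encode every byte that is not an ASCII letter or digit."""
    out = []
    for byte in address.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append("%{:02x}".format(byte))
    return "".join(out)


def html_hex_entities(address):
    """Write every character as an HTML hexadecimal character reference."""
    return "".join("&#x{:x};".format(ord(ch)) for ch in address)


def obfuscated_mailto_link(address):
    """Build a link whose source contains no plain-text address."""
    href = "mailto:" + percent_encode_non_alnum(address)
    return '<a href="{}">{}</a>'.format(href, html_hex_entities(address))


encoded = percent_encode_non_alnum("joe@yourcompany.com")
print(encoded)
print(obfuscated_mailto_link("joe@yourcompany.com"))
```

A browser renders the character references as ordinary text, so visitors still read the address normally, while the page source no longer contains the literal string a plain-text scanner looks for.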
Re-Format E-Mail Lists: If your site posts e-mail lists, such as the addresses of a sales group, consider re-posting them in a graphic format, like a .gif file. People will be able to read the list without a problem, but again, the spidering software won't find any text addresses in the page to harvest.
Consider Web Forms: You could use Web forms on your site for visitors who wish to contact your organization. With a Web form submission, there are no e-mail addresses posted at all. This is the least user-friendly of the avoidance methods, but it may be an option in certain situations.
Disguise E-Mail Addresses: Much like the Hex code method, one way of disguising addresses is to replace characters in an e-mail address with readable equivalents that spidering software won't be able to recognize. For example, instead of posting joe@yourcompany.com, try writing it as "joe at your company dot-com." This isn't exactly user-friendly, but people will still understand it and the robots won't.
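If you have many addresses to disguise, the rewriting can be automated. The sketch below is one possible convention, assumed here for illustration: the function name disguise_address is hypothetical, and it spells out "@" as " at " and each "." as " dot ", which differs slightly from the hand-written example above.

```python
def disguise_address(address):
    """Rewrite 'user@host.tld' as 'user at host dot tld'."""
    # Split on the last "@" so the domain is always the final part.
    local_part, separator, domain = address.rpartition("@")
    if not separator or not local_part or not domain:
        raise ValueError("expected an address of the form user@domain")
    return "{} at {}".format(
        local_part.replace(".", " dot "),
        domain.replace(".", " dot "),
    )


disguised = disguise_address("joe@yourcompany.com")
print(disguised)  # joe at yourcompany dot com
```

The output contains neither "@" nor a dotted domain, which are the features a simple pattern-matching spider relies on. A spider written to expect this convention could still reverse it, so treat it as one layer of protection rather than a guarantee.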
Bottom Line: The war against spam rages on, but you can add another weapon to your arsenal by defending against spidering techniques.
Want to Know More?
· "Research Shows How to Cut Down on Spam," at TRUSTe.
· "Ad-Hoc Working Group's Report on Unsolicited Commercial Email," from the Center for Democracy and Technology.
· "Spam: No quick victory," CNETAsia.
· "Spam Threatens Revenue, Kids," InternetNews.
· "Sinking in a sea of spam," SC Magazine.
© Copyright The Info-Tech Research Group