Email Obfuscator: Anti Spam Tool for Websites

Description of Tool

This is a stand-alone tool written in Java that processes single HTML pages, or archives of HTML pages (.zip, .tar, and .tar.gz are currently supported), by hiding, or obfuscating, the email addresses that appear within those pages. Various obfuscators are available, as described below, and it is very simple to add new obfuscators to the framework. Following the general guidelines and using the obfuscators provided will make it much harder for spammers to harvest email addresses from your web pages and flood your inbox with spam.

Download, Installation, and Use

This project is currently hosted by SourceForge. The release can be downloaded from the main SourceForge project page or directly from the download page. Once unpacked, there are two ways to run the program.

  1. There is an ant build file called build.xml in the home directory. Calling >ant run will run the program.
  2. There is an executable .jar file in the home directory that can be run using the command >java -jar Obfuscator.jar from the command line (or, on some systems, by double-clicking its icon).

A walkthrough on how to use the tool is also available. Documentation and other relevant libraries are available in the other folders. This system has been tested on a 32-bit Windows XP machine and a Mac PowerBook G4. I cannot guarantee that it will run smoothly on other systems, because it relies on many file operations that can behave differently across systems. If you find any bugs, or wish to send comments, please see the contact section.

General Advice

It is generally advised to publish as few email addresses as possible. When email addresses do need to be published, you will get the best use out of this program by keeping them out of SCRIPT tags and comments. Most of the obfuscators are not designed to obfuscate emails present in these places, for various reasons discussed below. Using cascading stylesheets (CSS) is encouraged when using the image obfuscator. Limitations and advice regarding this can also be read below.

Email Validation

This program uses a regular expression to find the email addresses appearing within the document. The regular expression is based very closely on the RFC 2822 standard for Internet Message Formats. This program will NOT verify that an address can actually receive messages. The regular expression validates based solely on the standard, which accepts certain email addresses that might not seem valid at first glance but do adhere to the standard. An example of this would be name@domain. This email will be accepted, as it adheres to the standard. Because of this, it is necessary to make sure that the emails are spelt correctly.

The regular expression used has two levels of complexity, which can be chosen when you select whether you want to process an archive or a single file. The only difference between the two is that the complex version additionally handles email groups and named addresses. An example of a named address would be "Joe Sample" <joe@sample.com>. Similarly, an example of an email group would be "Group Name": bob@domain.com, "Jim" <jim@domain.com>;. If you are not sure which version to use, the simple address validator is usually all that is needed. The simple version handles the simplest addresses, such as name@domain.com, as well as mailto links such as mailto:name@domain.com. If it is in any way possible, it will ALWAYS be better to use the simple version. The complex version of the regular expression is very lengthy, and can take a very long time to process certain pages.
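
As a rough illustration of what the simple version does, the sketch below finds plain addresses and mailto: links with a regular expression. The pattern is a deliberately simplified stand-in and is an assumption, not the RFC 2822 based expression used by this tool; the class and method names are hypothetical as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SimpleEmailFinder {
    // Simplified stand-in pattern (an assumption, NOT the tool's RFC 2822 expression):
    // an optional "mailto:" prefix, a local part, "@", and a domain whose
    // dot-separated suffixes are optional, so name@domain is accepted too.
    private static final Pattern SIMPLE = Pattern.compile(
            "(?:mailto:)?([A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*)");

    // Returns every address found in the text, without any "mailto:" prefix.
    static List<String> findAddresses(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = SIMPLE.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<a href=\"mailto:name@domain.com\">Mail</a> or bob@domain";
        System.out.println(findAddresses(html));
    }
}
```

A short pattern like this stays fast on large pages, which is the same reason the simple version is recommended above.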

Implemented Obfuscators

Entities Obfuscator

This obfuscator replaces any occurrences of email addresses within a document with their equivalent HTML entities.
This obfuscator will obfuscate emails within:

  1. SCRIPT tags
  2. Attributes (including mailto: links)
  3. Regular HTML text

This obfuscator will not obfuscate emails within:

  1. Comments

The reason for this is that a string of HTML entities is unreadable by a human, making the email address within the comment useless to people reading the code. For this reason, it is advisable to remove such emails entirely if you want the software to have an effect. Spam bots will still be able to read commented emails!

Positive: This method has the advantage that it is interpreted by all browsers. Additionally, any CSS styles that are applied to the relevant piece of HTML code will still be satisfied by the obfuscated emails. There is no way to tell without looking at the page source that the email has been obfuscated. Further, the user can copy and paste the email address just as they could as before, without anything being different.

Negative: This method is relatively simple. The emails will be obfuscated in a very predictable way, since entity values do not change from machine to machine.

Example: The email name@domain.com would be encoded into the following string of HTML:
&#110;&#97;&#109;&#101;&#64;&#100;&#111;&#109;&#97;&#105;&#110;&#46;&#99;&#111;&#109;.
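
The encoding in this example can be sketched in a few lines of Java. This is a minimal illustration rather than the tool's own code, and the class name EntityEncoder is hypothetical: each character is replaced by its decimal character reference.

```java
final class EntityEncoder {
    // Replaces every character with its decimal HTML character reference.
    static String encode(String email) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < email.length(); i++) {
            out.append("&#").append((int) email.charAt(i)).append(';');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("name@domain.com"));
    }
}
```

Because the mapping from character to entity is fixed, the output is the same on every run, which is the predictability noted under Negative.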

JavaScript Obfuscator

This obfuscator replaces any occurrences of email addresses within a document with a scrambled and randomized piece of JavaScript code that will print the email to the screen.
This obfuscator will obfuscate emails within:

  1. Attributes (including mailto: links)
  2. Regular HTML text

When obfuscating attributes that contain email addresses, the entire start tag is replaced. This is because SCRIPT tags cannot be embedded within start tags. Because of this, it is possible to obfuscate mailto: links as well.
This obfuscator will not obfuscate emails within:

  1. SCRIPT tags
  2. Comments

SCRIPT tags are skipped because inserted JavaScript tags may interfere with other scripting code already present. Comments are not obfuscated for the same reason as with the Entities Obfuscator: since the inserted JavaScript is scrambled and randomized, no human will easily be able to read what it prints to the screen.

Positive: Provided the user has JavaScript enabled, the text will appear just as it would if it had been written out directly, including support for existing CSS style declarations. There is no visible change to the user browsing your site. As with the entities, the user can copy and paste the email just as they normally could. A further advantage is that it is relatively tedious to decode the JavaScript by just looking at it. The number of variables, their length, their names, and their positions within the script declaration are all randomized, making it very difficult to read. Due to this randomization, there is no specific pattern that can be searched for in order to discover the email address.

Negative: Although the JavaScript is difficult to read, it can easily be run by any JavaScript-enabled browser. This leads to the second disadvantage: JavaScript can be disabled in browsers, which means a small number of users may not be able to read the obfuscated email. There are purposely no <noscript> tags with a notice message, in order to avoid producing a searchable pattern.

Example: The email name@domain.com would be encoded into HTML code similar to the following:
<SCRIPT type="text/javascript">var w4B$o8B30="n";var f9W2t_Fn4="&#64d";var t0Wn61tEsz="om";var ih9BPJw7W2="m";var uE_$1nqvR7="ain";var g0gH5ab4n=".co";var seN91="ame"; document.write(w4B$o8B30+seN91+f9W2t_Fn4+t0Wn61tEsz+uE_$1nqvR7+g0gH5ab4n+ih9BPJw7W2);</SCRIPT>
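
The general technique behind output like this can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual implementation: the class JavaScriptSketch, its method names, the chunk sizes, and the naming scheme are all hypothetical. The text is cut into small random chunks, each chunk is stored in a randomly named variable, the declarations are shuffled, and a final document.write call joins the variables in the right order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class JavaScriptSketch {
    private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";
    private static final String DIGITS = "0123456789";
    private static final String NAME_CHARS = LETTERS + LETTERS.toUpperCase() + DIGITS + "_$";

    // Cuts the text into consecutive chunks of 1 to 4 characters.
    static List<String> split(String text, Random rnd) {
        List<String> chunks = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int len = Math.min(1 + rnd.nextInt(4), text.length() - pos);
            chunks.add(text.substring(pos, pos + len));
            pos += len;
        }
        return chunks;
    }

    // A letter, then a digit (so the name can never be a JavaScript keyword),
    // then 3 to 8 further random characters.
    static String randomName(Random rnd) {
        StringBuilder name = new StringBuilder();
        name.append(LETTERS.charAt(rnd.nextInt(LETTERS.length())));
        name.append(DIGITS.charAt(rnd.nextInt(DIGITS.length())));
        int extra = 3 + rnd.nextInt(6);
        for (int i = 0; i < extra; i++) {
            name.append(NAME_CHARS.charAt(rnd.nextInt(NAME_CHARS.length())));
        }
        return name.toString();
    }

    static String obfuscate(String text, Random rnd) {
        List<String> names = new ArrayList<>();        // kept in original chunk order
        List<String> declarations = new ArrayList<>(); // shuffled before output
        for (String chunk : split(text, rnd)) {
            String name = randomName(rnd);
            while (names.contains(name)) {
                name = randomName(rnd);
            }
            names.add(name);
            // Escape for a JavaScript string literal and hide the '@' as an entity.
            String literal = chunk.replace("\\", "\\\\").replace("\"", "\\\"")
                                  .replace("@", "&#64;");
            declarations.add("var " + name + "=\"" + literal + "\";");
        }
        Collections.shuffle(declarations, rnd);
        return "<SCRIPT type=\"text/javascript\">" + String.join("", declarations)
                + "document.write(" + String.join("+", names) + ");</SCRIPT>";
    }

    public static void main(String[] args) {
        System.out.println(obfuscate("name@domain.com", new Random()));
    }
}
```

Every run produces different variable names, chunk boundaries, and declaration order, so two obfuscated copies of the same address share no fixed substring beyond the SCRIPT wrapper.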

Image Obfuscator

This obfuscator replaces any occurrences of email addresses within a document with an image resembling the email address as closely as possible. The image file names are completely randomized strings.
This obfuscator will obfuscate emails within:

  1. Regular HTML text

It is important to note that there are limitations as to how closely the image will correspond to the text that was being displayed. Currently, there is support for CSS style declarations coming from stylesheets linked using the <link rel="stylesheet" href="local-path-to-stylesheet"> syntax, as well as CSS style information declared directly in the document head using the <style type="text/css"> style-goes-here </style> syntax. There is currently no support for the "p.subclass {}" syntax. Only direct declarations for existing tags such as "h1 {}" or for custom tags such as "customTag {}" are supported. Additionally, since this program is stand-alone, it cannot access the default fonts set in the browser, so I use various default font sizes and types which correspond to common internet browser defaults.
This obfuscator will not obfuscate emails within:

  1. Attributes
  2. Comments
  3. SCRIPT tags

Attributes cannot be obfuscated using images because HTML tags cannot be inserted within start tags themselves. SCRIPT tags aren't obfuscated because inserting an image in the middle of them can interfere with other scripting functions. Finally, comments are skipped too because a link to an image file in a comment will not be of any use to someone reading the comments.

Positive: This is probably the most effective method to obfuscate your email address. It is not believed that at the moment there are many spam bots that can use optical character recognition to identify the email present in the image file. Additionally, the fact that the names of the image files are random makes it more difficult to detect when an image contains an email address.

Negative: There are two main downsides to this obfuscation method. First, the images may not look exactly like the surrounding text, breaking the reading flow of the web page. Even though the CSS interpreter used in this program can be extended relatively easily, there are limits on how much CSS style information is currently supported and understood. Second, images do not allow the user to copy and paste the email into their email client to actually send an email. This needs to be taken into consideration when trying to make your web page as user-friendly as possible. A solution to this problem is to use mailto: links with the email within the visible portion of the tag (for example, <a href="mailto:name@domain.com">name@domain.com</a>). Then, by using the Image and JavaScript obfuscator, the image will first be created for the email outside the tags, and then the JavaScript will obfuscate the entire start tag, making the mailto: link clickable again. Please see the description of that obfuscator below.

Example: When obfuscated, an email gets replaced with an <IMG...> tag with the file name of the image. For example, it could look like this: <IMG src="ja4176I921.png">.
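
The core of this approach can be sketched with the standard java.awt and javax.imageio classes. This is an assumption-laden illustration, not the tool's code: the class ImageSketch, the 10-character file name length, and the fixed 16-point serif font are hypothetical choices, and no CSS is interpreted here.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Random;
import javax.imageio.ImageIO;

final class ImageSketch {
    private static final String NAME_CHARS =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Ten random alphanumeric characters plus the .png extension.
    static String randomFileName(Random rnd) {
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            name.append(NAME_CHARS.charAt(rnd.nextInt(NAME_CHARS.length())));
        }
        return name.append(".png").toString();
    }

    // Draws the text in black on a transparent image sized to fit it.
    static BufferedImage render(String text, Font font) {
        // A 1x1 scratch image is only used to measure the text.
        BufferedImage probe = new BufferedImage(1, 1, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = probe.createGraphics();
        g.setFont(font);
        FontMetrics metrics = g.getFontMetrics();
        int width = Math.max(1, metrics.stringWidth(text));
        int height = Math.max(1, metrics.getHeight());
        int baseline = metrics.getAscent();
        g.dispose();

        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        g = image.createGraphics();
        g.setFont(font);
        g.setColor(Color.BLACK);
        g.drawString(text, 0, baseline);
        g.dispose();
        return image;
    }

    static String imgTag(String fileName) {
        return "<IMG src=\"" + fileName + "\">";
    }

    public static void main(String[] args) throws IOException {
        String fileName = randomFileName(new Random());
        Font font = new Font(Font.SERIF, Font.PLAIN, 16); // assumed browser-like default
        ImageIO.write(render("name@domain.com", font), "png", new File(fileName));
        System.out.println(imgTag(fileName));
    }
}
```

Matching the surrounding text more closely would mean choosing the font, size, and color from the page's CSS, which is where the limitations described above come in.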

Image and JavaScript Obfuscator

This obfuscator first runs the image obfuscator, followed by the JavaScript obfuscator. As a result, any emails occurring within HTML text are first replaced with images, and any emails left over in attributes are then obfuscated using the JavaScript technique described above. This is particularly useful if the website in question uses both emails in HTML text and mailto: links.
This obfuscator will obfuscate emails within:

  1. HTML Text
  2. Attributes (including mailto links)

This obfuscator will not obfuscate emails within:

  1. Comments
  2. SCRIPT tags

Positive: This obfuscator naturally combines the positive aspects of both the image and the JavaScript obfuscators. Images can be made clickable to make up for not being able to copy and paste the address. Many browsers allow right-clicking on an image (which would be linked with the mailto: link) and copying the link address. This would put the whole mailto:name@domain.com string into the clipboard, allowing for relatively easy pasting into an email client.

Negative: The only negative aspect that remains is that the images might not look exactly like the surrounding text because of the relatively simple CSS parser/interpreter that is used.

Example: If the string "<a href="mailto:name@domain.com">name@domain.com</a>" was obfuscated using this obfuscator, the resulting string could look something like:
<SCRIPT type="text/javascript">var y3i92fJv=".com\">";var kV825_7="o:name@d";var n3_92="<a h";var jEKQMK="ailt";var dy9R_vY$9s="omain";var gTyZ$1$6C9="ref=\"m";document.write(n3_92+gTyZ$1$6C9+jEKQMK+kV825_7+dy9R_vY$9s+y3i92fJv);</SCRIPT><IMG style="vertical-align: bottom;" src="6Oa60cAi47.png"></a>
Looking carefully, you can see the IMG tag that supplies the image of the link's visible text.

Spaces Obfuscator

This obfuscator replaces occurrences of emails within a document with an equivalent string that has spaces around the @ and the . characters.
This obfuscator will obfuscate emails within:

  1. HTML Text
  2. Comments

Note that this obfuscator does include comments, since a spaced-out version of an email address is still easily readable by a human.
This obfuscator will not obfuscate emails within:

  1. Attributes
  2. SCRIPT tags

Attributes are not obfuscated because spaces within strings inside attributes have an effect on their meanings. In other words, a mailto: tag with an href attribute value of name @ domain . com will not have the same effect as the email without the spaces. SCRIPT tags aren't obfuscated for the same reasons. Spaces within the strings will cause the email to become invalid.

Positive: This obfuscator will hide the emails to a certain extent while keeping it relatively easy to copy and paste the email, and any existing CSS style information still applies in full. Additionally, it allows for obfuscation of comments, which can be useful in certain situations.

Negative: This obfuscation method is quite simple and not very difficult to reverse, although many recent studies have shown that even such simple methods can greatly reduce the amount of spam you receive.

Example: The email name@domain.com would be obfuscated to the string: name @ domain . com.

Writing Out Obfuscator

This obfuscator replaces occurrences of emails within a document with an equivalent string in which the @ and . characters are replaced by the words [AT] and [DOT] respectively.
This obfuscator will obfuscate emails within:

  1. HTML Text
  2. Comments

As with the spaces obfuscator, this one also obfuscates emails within comments, in such a way that they are still easy for humans to read.
This obfuscator will not obfuscate emails within:

  1. Attributes
  2. SCRIPT tags

Positive: This obfuscator will hide the emails to a certain extent while keeping it relatively easy to copy and paste the email, and any existing CSS style information still applies in full. Additionally, it allows for obfuscation of comments, which can be useful in certain situations.

Negative: This obfuscation method is quite simple and not very difficult to reverse, although many recent studies have shown that even such simple methods can greatly reduce the amount of spam you receive.

Example: The email name@domain.com would be obfuscated to the string: name[AT]domain[DOT]com.
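
Both this obfuscator and the spaces obfuscator amount to plain string replacement on the matched address. A minimal sketch follows; the class and method names are hypothetical and this is not the tool's own code.

```java
final class TextReplacementSketch {
    // Spaces obfuscator: pad the '@' and '.' characters with spaces.
    static String withSpaces(String email) {
        return email.replace("@", " @ ").replace(".", " . ");
    }

    // Writing out obfuscator: spell the '@' and '.' characters out as words.
    static String writtenOut(String email) {
        return email.replace("@", "[AT]").replace(".", "[DOT]");
    }

    public static void main(String[] args) {
        System.out.println(withSpaces("name@domain.com"));
        System.out.println(writtenOut("name@domain.com"));
    }
}
```

The replacement is fixed and trivially invertible, which is why both methods are described as simple to reverse.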

Table Obfuscator

This obfuscator replaces occurrences of email addresses within a document with a one-row, three-column table with no cell spacing or padding.
This obfuscator will obfuscate emails within:

  1. HTML Text

This obfuscator will not obfuscate emails within:

  1. Attributes
  2. Comments
  3. SCRIPT tags

This obfuscator cannot replace emails within SCRIPT tags because doing so could break functions being executed by the script, and it cannot replace emails within attributes because tags are not permitted inside the attribute values of other tags.

Positive: This obfuscation method is a slightly more complicated method to obfuscate the email addresses. It still makes it possible for a user to copy and paste the email address, only with the slight inconvenience that spaces or a tab could form where the cell border was. This can obviously be undone by just hitting backspace to erase the spaces.

Negative: This obfuscation method does not allow for the obfuscation of mailto: links. Furthermore, the data in the table will not get its style information from the CSS data for the surrounding enclosing tag. If you wish the table to have the same style, it is necessary to add a table property to the CSS declaration with the style you wish to have. Finally, this implementation doesn't have any kind of randomization built in (the email is always split around the same characters).

Example: The email name@domain.com would be converted to the following code:
<TABLE style="display: inline-table; vertical-align: text-bottom;" border="0" cellpadding="0" cellspacing="0"><TR><TD>name</TD><TD>&#64</TD><TD>domain.com</TD></TR></TABLE>
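
A minimal sketch of how such a table could be produced is shown below. The class name TableSketch is hypothetical and this is not the tool's own code; the markup, including the &#64 entity without a trailing semicolon, simply mirrors the example above.

```java
final class TableSketch {
    // Splits the address at the '@' and places the three parts in adjacent cells.
    static String obfuscate(String email) {
        int at = email.indexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("not an email address: " + email);
        }
        return "<TABLE style=\"display: inline-table; vertical-align: text-bottom;\""
                + " border=\"0\" cellpadding=\"0\" cellspacing=\"0\">"
                + "<TR><TD>" + email.substring(0, at) + "</TD>"
                + "<TD>&#64</TD>"
                + "<TD>" + email.substring(at + 1) + "</TD></TR></TABLE>";
    }

    public static void main(String[] args) {
        System.out.println(obfuscate("name@domain.com"));
    }
}
```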

Information for Developers

This tool was originally written in Java version 1.5, using generics and various other 1.5 features, which means earlier versions will not work. To add a new custom obfuscator, a few minor things need to be done.

  1. A new obfuscator class needs to be created, implementing the ObfuscatorInterface interface found in the obfuscator package. The method in this interface should be implemented destructively. This is safe because the GUIs are set up so that the files have already been copied from their original location by the time the obfuscate method is called on them.
  2. Within the Obfuscators class, a new line needs to be added to the static initializer block, to map a description to an instance of the new obfuscator class.

Once this is complete, the new obfuscator should appear within the GUIs and be available for use. If any obfuscator classes are added, please also add an appropriate testing class using the JUnit framework, as exists for the other classes, and add the testing class name to the testfiles.AllTests class for easy execution of all tests together.
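
The two steps can be pictured with the sketch below. Everything in it is hypothetical: the signature of the real ObfuscatorInterface is not shown in this document, so a stand-in interface named SketchObfuscator is used that works on strings rather than on files, and SketchObfuscators only imitates the idea of mapping a description to an instance in a static initializer block.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in; NOT the real ObfuscatorInterface, whose method
// signature is not given in this document.
interface SketchObfuscator {
    String obfuscate(String html);
}

// Step 1: a new obfuscator class implementing the interface.
final class ParenAtObfuscator implements SketchObfuscator {
    public String obfuscate(String html) {
        // Toy behaviour for illustration: replaces every '@', not only those in emails.
        return html.replace("@", " (at) ");
    }
}

// Step 2: register a description for the new class in a static initializer block.
final class SketchObfuscators {
    static final Map<String, SketchObfuscator> REGISTRY =
            new LinkedHashMap<String, SketchObfuscator>();

    static {
        REGISTRY.put("Parenthesised (at) Obfuscator", new ParenAtObfuscator());
    }

    public static void main(String[] args) {
        for (Map.Entry<String, SketchObfuscator> entry : REGISTRY.entrySet()) {
            System.out.println(entry.getKey() + ": "
                    + entry.getValue().obfuscate("name@domain.com"));
        }
    }
}
```

A GUI that lists the map's keys would pick up the new entry without any further changes, which matches the behaviour described above.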

This software is distributed under the GNU General Public License (GPL). For terms and conditions of this license, see this page.

I have included an ant build file with the following options:

The default option is run.

Contact Details

To contact the original designer with problems, questions, requests, or comments, email Sebastian Vermehren or visit the project's SourceForge page.

Disclaimer

This software is provided as is, with absolutely NO warranty. You assume all responsibility for any and all damages that may occur when using this piece of software.
