This is the project blog for the Dario La email obfuscation project at the University of Edinburgh.

Friday, October 14, 2005

Waiting Time and User Tolerance Limit

Must identify a usability limit: an upper bound on how long a user is willing to wait before getting an email address, i.e. how "frustrating the expectation is".

In a discussion with Jon Oberlander today after lectures he suggested:

1. What is the aim of the user when browsing the site?
  • Users with the specific aim of getting an email address and general browsers have different goals. Their expectations of the user experience will adjust accordingly.
  • E.g. when viewing an academic's homepage, most users are interested in finding out what the research areas are, what the papers are etc., not in trying to communicate with the page owner. As such, a general browser will probably tolerate, or not be bothered by, any delays. However, a user with the specific intention of getting the email address (or contacting the site owner) will want the information fast. In this respect they behave similarly to a "spambot".
  • Explaining why and how long the user must wait may increase their tolerance for waiting. Incorporating a message such as "this is an anti-spam measure ... it will take XXX seconds for the email address to appear" into the design may increase usability.
2. Empirical research: how long is a user willing to wait before their expectations are frustrated?
  • Check Nelson's usit column. Although it may not have published figures, it may hint at sources of primary research that are uncited!
  • Google keywords: latencies, user tolerance
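
The "explain the wait" idea from point 1 can be sketched roughly as follows. This is only an illustration: the function names, the message wording and the one-second tick are assumptions, and the delay by itself does nothing to hide the address inside the script, so it would have to be combined with an actual obfuscation technique.

```javascript
// Hypothetical helper: builds the notice shown while the address is withheld.
function waitNotice(secondsLeft) {
  if (secondsLeft <= 0) return "";
  const unit = secondsLeft === 1 ? "second" : "seconds";
  return "This is an anti-spam measure: the email address will appear in " +
    secondsLeft + " " + unit + ".";
}

// Shows a countdown notice once per second, then the address itself.
// `render` is a caller-supplied callback (for example one that sets an
// element's textContent). `schedule` defaults to setTimeout and can be
// replaced, e.g. to experiment with different tick lengths.
function revealAfterDelay(address, delaySeconds, render, schedule = setTimeout) {
  let remaining = delaySeconds;
  function tick() {
    if (remaining <= 0) {
      render(address);
      return;
    }
    render(waitNotice(remaining));
    remaining -= 1;
    schedule(tick, 1000);
  }
  tick();
}
```

The delay is a single parameter, so different waiting times could be tried against real users once a tolerance figure has been found.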

Thursday, October 13, 2005

Lecture Notes 13 Oct

University Email Anti-Spam Fact Finding

Had a meeting with both the informatics support team and the EUCS Science and Engineering computing support team. They referred me to resources providing more information on measures to deal with Spam within the university.

It seems the university has a hierarchy of measures in place to deal with spam. Furthermore, there may be security risks and computing-regulation issues with using university email resources for the spambot honeypot.

Discussion is left offline for obvious security reasons.

Project Musings: 13 Oct 2005

Honeypot

Started reading Honeypots by Lance Spitzner (Addison-Wesley, 2003) to gather ideas on how to improve on the existing spam honeypot.

Establishing Upper Resource Bounds on Spammer’s machine

NP Complete Problem for Spammer
  • The Minesweeper consistency problem is NP-complete (Wikipedia)

  • Problem must not be easy to solve

  • Must deter spammers from trying to use the site

  • Can exhaust a limited resource on the attacker's machine: CPU, memory or bandwidth

  • Must identify a way to set/measure reasonable bounds on these

  • Whilst we may not be able to know these bounds, as we do not know the attacker's modus operandi, estimated bounds may be useful in identifying a suitable NP-complete problem
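
To make the CPU bound concrete, here is a minimal sketch in which the address can only be unmasked after a chosen number of sequential steps. Note that this swaps in a much simpler technique than an NP-complete problem: an iterated mixing function whose final state keys an XOR mask. All names here (mix, deriveKey, encode, decode) are hypothetical, and the mixing step is a placeholder that is not cryptographically strong.

```javascript
// Placeholder mixing step (NOT cryptographically strong): two
// multiply-and-xorshift rounds on a 32-bit state.
function mix(state) {
  state = Math.imul(state ^ (state >>> 16), 0x45d9f3b);
  state = Math.imul(state ^ (state >>> 16), 0x45d9f3b);
  return (state ^ (state >>> 16)) >>> 0;
}

// Each step depends on the previous one, so the loop has to be run in order.
// `iterations` is the tunable cost, i.e. the CPU bound being estimated.
function deriveKey(seed, iterations) {
  let state = seed >>> 0;
  for (let i = 0; i < iterations; i++) {
    state = mix((state + i) >>> 0);
  }
  return state;
}

// XOR every character code with one of the four bytes of the key.
function xorWithKey(codes, key) {
  return codes.map((c, i) => c ^ ((key >>> (8 * (i % 4))) & 0xff));
}

// Done once when the page is generated: only the masked codes are published.
function encode(address, seed, iterations) {
  const codes = Array.from(address, (ch) => ch.charCodeAt(0));
  return xorWithKey(codes, deriveKey(seed, iterations));
}

// Done on the client: the cost of this call is what the harvester must pay.
function decode(masked, seed, iterations) {
  return String.fromCharCode(...xorWithKey(masked, deriveKey(seed, iterations)));
}
```

Timing decode for increasing values of iterations on a few machines would give a first estimate of where the bound sits relative to what a human visitor will tolerate.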

JavaScript must have these characteristics to deter the attacker
  • Problem must be NP-complete: there must be no shortcuts to solving the problem, i.e. the attacker must complete execution before they can derive the result

  • The source code function must be such that

## JavaScript Idea (Email address stored in an incomplete private-key) ##
  1. Encrypt the email address using a trapdoor function/public-key algorithm

  2. To recover the email address, the private key is needed

  3. Use certain bits of the private key to generate a minesweeper grid

  4. The human user must play the minesweeper game and discover where all the mines are to recover the full private key and decrypt the email address

# With this method, we may not need code obfuscation, although code obfuscation may make it harder for an attacker to understand the underlying algorithm

# Obfuscation may be important to prevent automated processing of the script. If the attacker does not know how the JavaScript runs, they cannot run it automatically; a human user is required to run the code. It becomes a new form of CAPTCHA (reverse Turing test)
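
Steps 3 and 4 could be sketched along these lines. The layout is an assumption made for illustration: bit i of the withheld part of the key decides whether cell i (row by row) holds a mine, and all function names are hypothetical. The encryption in steps 1 and 2 is left out.

```javascript
// Assumed layout: bit i of the withheld key bits marks cell i as a mine,
// filling the grid row by row. Missing bits in the last row count as "no mine".
function gridFromBits(bits, width) {
  const height = Math.ceil(bits.length / width);
  const mines = [];
  for (let r = 0; r < height; r++) {
    const row = [];
    for (let c = 0; c < width; c++) {
      row.push(bits[r * width + c] === 1);
    }
    mines.push(row);
  }
  return mines;
}

// Number of mines in the (up to) eight cells surrounding each cell.
// In a real game only the counts of uncovered safe cells would be shown.
function neighbourCounts(mines) {
  return mines.map((row, r) => row.map((_, c) => {
    let n = 0;
    for (let dr = -1; dr <= 1; dr++) {
      for (let dc = -1; dc <= 1; dc++) {
        if (dr === 0 && dc === 0) continue;
        const neighbourRow = mines[r + dr];
        if (neighbourRow && neighbourRow[c + dc]) n++;
      }
    }
    return n;
  }));
}

// Once the player has flagged every mine, the flags read row by row
// give back the withheld key bits.
function bitsFromFlags(flags, bitCount) {
  return flags.flat().slice(0, bitCount).map((f) => (f ? 1 : 0));
}
```

For example, the bits 1,0,0,0,1,0 on a grid three cells wide put mines in the top-left and centre-bottom cells. Whether a given grid can be solved without guessing is not addressed by this sketch and would need checking when the grid is generated.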

Cons of this approach:
  • User must interact (and win minesweeper) to gain the private key.

  • Client-side code cannot be relied on to set up the minesweeper grid

  • Minesweeper grids must be random, therefore server-side code is required

  • May not appeal to disabled users

  • May not be suitable for micro browsers

But it may be a fun way to demonstrate how it works! Hiding your email address in games. It may be simple for you to solve the puzzle, but not for a computer.

The disability factor can be addressed by offering a guest-book style input form as an alternative.

Tuesday, October 11, 2005

Group Project Meeting

Project Description:

Unsolicited email, spam, is a well-known issue facing internet users. Currently popular methods are typically based on identifying and filtering out spam. Such reactive solutions, which can only deal with spam after an unwanted message has been sent out onto the network, are at best sub-optimal. They do not prevent spam from consuming network bandwidth.

Before a spam message can be sent out onto the network, a spammer must first gain hold of a valid destination email address. Spammers are known to use web crawlers, spambots, which search through public web pages looking for valid email targets.

This project aims to investigate and identify techniques that can be used to obfuscate email addresses and prevent spambots from harvesting email addresses from public websites.

Project Goals:
  • Analyze spambot email harvesting techniques
  • Identify anti-email harvesting techniques
  • Develop email obfuscation tools
  • Promote awareness of email harvesting and email obfuscation

Plan of Attack:

Stage I: Basic Milestones


1. Email obfuscation toolkit for static web pages

a. Simple client-side JavaScript obfuscators
b. Image translation tool

Emphasis is on building an easy-to-use toolkit based on known existing techniques.

2. Email Harvesting Honeypot

Deploy a decoy honeypot web site with email addresses present in different formats to attract spambots and track which techniques are most vulnerable to email harvesting and spam.
  • Control case 1 – mailto tag in clear
  • Control case 2 – email address in clear
  • Simple JavaScript Obfuscation
  • Email address in GIF image
  • Email address embedded in a PDF document
  • Simple key word substitution and separation – AT DOT DOT technique
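
As a rough sketch of how the text-based honeypot cases might be generated, each format below gets its own address, so that any spam received identifies the format it was harvested from. The "trap-" naming scheme, the function names and the character-code form of JavaScript obfuscation are assumptions for illustration. The GIF and PDF cases need external tools and are left out.

```javascript
// Hypothetical naming scheme: one fresh address per presentation format.
function taggedAddress(tag, domain) {
  return "trap-" + tag + "@" + domain;
}

// Control case 1: mailto tag in clear.
function mailtoClear(addr) {
  return '<a href="mailto:' + addr + '">' + addr + "</a>";
}

// Key word substitution and separation.
function atDot(addr) {
  return addr.replace("@", " AT ").replace(/\./g, " DOT ");
}

// One simple form of JavaScript obfuscation: the page contains only
// character codes, and the address is assembled when the script runs.
function jsObfuscated(addr) {
  const codes = Array.from(addr, (ch) => ch.charCodeAt(0)).join(",");
  return "<script>document.write(String.fromCharCode(" + codes + "));</script>";
}

// Builds the text-based honeypot entries for a given domain.
function buildVariants(domain) {
  return {
    mailto: mailtoClear(taggedAddress("mailto", domain)),
    clear: taggedAddress("clear", domain),
    javascript: jsObfuscated(taggedAddress("js", domain)),
    atdot: atDot(taggedAddress("atdot", domain)),
  };
}
```

Calling buildVariants with the honeypot's mail domain gives the snippets to paste into the decoy pages. The addresses are assumed to contain no characters that need HTML escaping.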

Stage II: Intermediate Milestones

3. Awareness & promotion website (social engineering)
4. AJAX/Captcha based JavaScript obfuscator

Stage III: Advanced Milestones

5. Website Threat Assessment Diagnostic Tool (greyhat web crawler)
6. PDF email obfuscation
7. Applying Code Obfuscation Techniques to JavaScript
8. Server-based alternatives to client-side scripting (JavaScript)

Action Items
  • Setup a website visitor counter to monitor the number of visits to the honeypot
  • Use tables to divide up the email address
  • Investigate the CSS display: none property, which can be used to prevent the display of nonsense HTML tags
  • RSS feed to notify users of new email obfuscation techniques published at the site
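
The table and CSS action items could be prototyped roughly as below. The function names, the chunk size and the decoy text are assumptions. stripTags stands in for a naive harvester that simply removes markup, which makes it easy to see what each technique does and does not hide.

```javascript
// Split the address over adjacent table cells so it never appears
// as one contiguous string in the page source.
function tableSplit(addr, chunkSize) {
  const cells = [];
  for (let i = 0; i < addr.length; i += chunkSize) {
    cells.push("<td>" + addr.slice(i, i + chunkSize) + "</td>");
  }
  return '<table cellspacing="0" cellpadding="0"><tr>' + cells.join("") + "</tr></table>";
}

// Follow every visible character with a decoy wrapped in a span that
// browsers do not display.
function withHiddenDecoys(addr, decoy) {
  return Array.from(addr, (ch) =>
    ch + '<span style="display:none">' + decoy + "</span>").join("");
}

// Stand-in for a naive harvester: drop every tag and keep the text.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}
```

Under these assumptions, stripTags recovers the full address from the table version, whereas the decoy version comes out polluted with the hidden text. That suggests the two are worth testing as separate honeypot cases.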

Monday, October 03, 2005

Plan of Attack

Plan of attack for Honours Project.

TRY TO DO SIMPLE VERSION OF 1 OR 2 IN TIME FOR PROJECT MEETING

1. Build a simple obfuscation tool based on user choice of techniques.
Suitable for users that only have static pages.
a. simple Javascript - executing Javascript on client computes mailto tag
b. translate e-mail name into an image
Emphasis here is on stitching existing tools together for easy use

[DO THIS FIRST]

2. Experiment
Generate fresh e-mail names via different techniques and see which
generate spam.
a. Control case - name in clear
b. simple Javascript (as with 1a)
c. image (as with 1b)
d. in clear in pdf document
e. current Informatics technique (name @ inf.ed.ac.uk) (purpose of this
is to check how effective current technique is)

[DO THIS SECOND -- SO AS TO MAXIMIZE TIME TO ACCUMULATE DATA]

3. PDF e-mail obfuscation tool

[TECHNOLOGICALLY STRAIGHTFORWARD, BUT PERSONALLY, PHIL COULD MAKE USE OF
THIS]

4. Diagnostic tool -- at user's request, crawl their website and report
vulnerabilities [DO NOT RELEASE AS OPEN SOURCE]

[TECHNOLOGICALLY STRAIGHTFORWARD, PERSONALLY PHIL IS LESS INTERESTED IN
THIS]

5. Study obfuscated code techniques and apply them to generate a more
sophisticated Javascript obfuscator

[THIS HAS MOST ACADEMIC CONTENT]

6. Consider alternative to Javascript (e.g., challenge-response running
on server) for clients that do not have Javascript -- this probably
requires that user have CGI capability.

[TECHNOLOGICALLY STRAIGHTFORWARD]

7. User-engineer a site distributing these tools in order to make it
popular. Count downloads to measure success.

[RELEVANT TO INFORMATICS, BUT USES DIFFERENT MUSCLES]

8. Apply AJAX techniques, possibly using Captcha and/or using
self-modifying code.

[PERSONALLY INTERESTING TO PHIL, MAY BE PRETTY CHALLENGING]

9. Build well-engineered tool

10. Study which techniques are effective -- what sort of things will
spambots easily do (e.g., perhaps, execute Javascript) and not easily do
(e.g., if Javascript is expensive when will they stop)?

Overall plan:

Start with something simple, so you have a definite result under your
belt: 1, 2, 9, start 7.

Then spend bulk of time on something intellectually challenging, such as
5 or perhaps 8.