Random Drunken E-mail to Ex–Significant Other Generator

This is a work in progress. It is not expected to work properly. Future enhancements include:

More than anything else, what I need right now to improve this is real data to work from. If you've ever written a drunken e-mail to your ex-girl/boyfriend, or better yet, have received one, send it my way. I'll only use it for good, I promise...mostly.

How it works

Much less work went into this one (which is somewhat depressing because people seem to think this one works much better). The first pass of this algorithm (and the most important, in my opinion) is a stochastic context-free grammar. A stochastic context-free grammar is a context-free grammar with a probability attached to each rule. For example:

Time  0.4   →  last week.
      0.35  |  a couple weeks after I saw you at the Place
      0.25  |  a month or two ago.

That is an actual rule from the grammar I use (the non-terminals here are Time and Place). Whenever I want a time, I just use that non-terminal. Note that all the probabilities add up to 1.0 (as they should). Note also that the second alternative references another non-terminal, Place (as context-free grammars often do). So I just use these probabilities and a random number generator to generate an e-mail from the grammar. Hooray.
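To make the sampling step concrete, here is a minimal sketch in Python of expanding a non-terminal by its rule probabilities. It is not the generator's actual code. The GRAMMAR table layout and the expand function are assumptions made for illustration, and only the Time rule comes from the text above; the Place rule is invented so the example runs.

```python
import random

# Hypothetical grammar table: non-terminal -> list of (probability, expansion).
# An expansion is a list of parts; a part that is a key of GRAMMAR is a
# non-terminal, anything else is literal text.
GRAMMAR = {
    "Time": [
        (0.40, ["last week."]),
        (0.35, ["a couple weeks after I saw you at the ", "Place"]),
        (0.25, ["a month or two ago."]),
    ],
    # Invented rule, only here so that Place can be expanded.
    "Place": [
        (0.5, ["bar"]),
        (0.5, ["party"]),
    ],
}


def expand(symbol, rng):
    """Recursively expand a symbol, picking alternatives by probability."""
    if symbol not in GRAMMAR:
        return symbol  # literal text, emitted unchanged
    alternatives = GRAMMAR[symbol]
    weights = [p for p, _ in alternatives]
    bodies = [body for _, body in alternatives]
    body = rng.choices(bodies, weights=weights, k=1)[0]
    return "".join(expand(part, rng) for part in body)


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(expand("Time", rng))
```

One shortcut worth knowing about: a literal that happens to be spelled exactly like a non-terminal name would be expanded too, so a real grammar would want to tag non-terminals explicitly.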

From here there are four somewhat less interesting "drunk" filters that get applied:

  1. adding or removing letters. For example, dessert may become desert or dessssert;
  2. transforming uppercase letters into lowercase letters (I used to also do the reverse, but that didn't give pleasing results);
  3. mistyping. I have a map of the standard US QWERTY keyboard, and probabilities attached to missing a letter by one row or column (e.g., an s may become a d or a w or something). Note that this filter is applied to contiguous runs of the same letter as a unit (so if s gets mistyped as d, dessert would become deddert; it would never become desdert). As a Dvorak user myself, I understand it may seem prejudicial to use a QWERTY keyboard...which it is: Dvorak users are far too intelligent to be writing drunken e-mails in the first place;
  4. transposing. For example, dessert may become dessetr.
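
The four filters above can be sketched roughly as follows in Python. This is an illustration under loud assumptions, not the code behind the generator: the function names, the single shared probability p, the partial NEIGHBORS map, and the uniform choice among neighboring keys are all made up here (the real mistyping filter uses separate probabilities for row and column misses).

```python
import random

# Hypothetical, partial QWERTY neighbor map (lowercase letters only).
NEIGHBORS = {
    "s": "awedxz",
    "e": "wrsd",
    "d": "serfcx",
    "t": "ryfg",
    "r": "etdf",
}


def stutter(text, rng, p):
    """Filter 1: with probability p per letter, drop it or double it."""
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < p:
            if rng.random() < 0.5:
                continue          # remove the letter
            out.append(ch * 2)    # add an extra copy
        else:
            out.append(ch)
    return "".join(out)


def lowercase(text, rng, p):
    """Filter 2: turn each uppercase letter into lowercase with probability p."""
    return "".join(
        ch.lower() if ch.isupper() and rng.random() < p else ch for ch in text
    )


def mistype(text, rng, p):
    """Filter 3: with probability p, replace a whole run of one letter
    by a neighboring key, so 'ss' becomes 'dd' and never 'sd'."""
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1                # find the end of the run
        ch = text[i]
        if ch in NEIGHBORS and rng.random() < p:
            ch = rng.choice(NEIGHBORS[ch])
        out.append(ch * (j - i))
        i = j
    return "".join(out)


def transpose(text, rng, p):
    """Filter 4: swap a pair of adjacent letters with probability p."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < p:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2                # do not swap the same letter twice
        else:
            i += 1
    return "".join(chars)


def drunkify(text, rng, p=0.02):
    """Apply the four filters in the order listed above."""
    for drunk_filter in (stutter, lowercase, mistype, transpose):
        text = drunk_filter(text, rng, p)
    return text
```

Calling drunkify("I miss you, Dessert.", random.Random()) returns a lightly mangled copy of the input; raising p makes the result degrade quickly, which is one way to see how sensitive the output is to these numbers.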

Tuning the probabilities for each of these four filters was the hardest part and took a long time. Being off by even half a percentage point can drastically change how the e-mail comes out.