# Baby naming

Fill in the following form to the best of your abilities to have your baby automatically named:

• Baby's sex:
• Baby's hair colour:
• Baby's eye colour:
• Baby's build:
• The names of all of baby's parents, separated by spaces (note, any number of parents is acceptable. Also note, if you use commas instead of spaces, I will stab you in the face):
• How many names would you like to generate (some human sanity-checking is required, so generating at least 5 is recommended)?

## How it works

The primary data in this case are male and female names from the US census. The first thing we do with these lists is break each name up into contiguous runs of vowels and consonants, such that Florence becomes Fl, o, r, e, nc, e. We do this for each name in the list and build up a histogram of sorts, counting how often each run appears at each position. For example, let's say we had two names in our list, Florence and Geraldine. The consonants in the first position are [(Fl, 1), (G, 1)]; the consonants in the third position are [(r, 2)] (because r appears third in the decomposition in both cases). The names of the "parents" supplied in the form are also added to these distributions (weighted to make them more important).
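
The decomposition and histogram described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `decompose` and `build_histogram`, the choice to treat `y` as a consonant, and the `weight` parameter are all assumptions made for the example. Note that the code counts positions from 0, so the "third position" in the text is index 2.

```python
from collections import Counter, defaultdict
from itertools import groupby

VOWELS = set("aeiou")  # assumption: 'y' counts as a consonant


def decompose(name):
    """Split a name into contiguous runs of vowels and consonants."""
    runs = groupby(name, key=lambda ch: ch.lower() in VOWELS)
    return ["".join(run) for _, run in runs]


def build_histogram(names, weight=1):
    """Count how often each run appears at each (0-based) position."""
    hist = defaultdict(Counter)
    for name in names:
        for pos, chunk in enumerate(decompose(name)):
            hist[pos][chunk] += weight
    return hist


hist = build_histogram(["Florence", "Geraldine"])
print(decompose("Florence"))  # ['Fl', 'o', 'r', 'e', 'nc', 'e']
print(dict(hist[0]))          # {'Fl': 1, 'G': 1}
print(dict(hist[2]))          # {'r': 2}
```

Parent names could be folded in the same way with a larger `weight` (say 5, a made-up value) so that their runs are picked more often.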

From here we have a genetic algorithm. Given our histogram of consonant and vowel sequences, we randomly generate (say) 20 names. The generated names are scored for similarity to the "connotations" of the attributes supplied in the form, such as hair colour ("connotations" here are just words I arbitrarily made up to correspond with certain attributes). This score is the fitness function. The (say) 5 least fit names are weeded out of the population. The remaining 15 names are then decomposed to augment the histogram, and the next generation of names is bred. This continues for a few generations (4 generations for this script, I believe).
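
The loop above might look something like the sketch below. Several pieces are assumptions rather than the script's real behaviour: the fitness function uses `difflib.SequenceMatcher` as a stand-in similarity measure, the length of each generated name is borrowed from a randomly chosen seed name, and the names `evolve`, `fitness`, `generate_name` and the example connotation words are invented for illustration.

```python
import random
from collections import Counter, defaultdict
from difflib import SequenceMatcher
from itertools import groupby

VOWELS = set("aeiou")  # assumption: 'y' counts as a consonant


def decompose(name):
    """Split a name into contiguous runs of vowels and consonants."""
    runs = groupby(name, key=lambda ch: ch.lower() in VOWELS)
    return ["".join(run) for _, run in runs]


def add_to_histogram(hist, names, weight=1):
    """Add each name's runs to the per-position histogram."""
    for name in names:
        for pos, chunk in enumerate(decompose(name)):
            hist[pos][chunk] += weight


def generate_name(hist, rng, length):
    """Draw one run per position, weighted by the histogram counts."""
    chunks = []
    for pos in range(length):
        if not hist[pos]:  # nothing recorded this far along a name
            break
        options = list(hist[pos])
        weights = list(hist[pos].values())
        chunks.append(rng.choices(options, weights=weights)[0])
    return "".join(chunks).capitalize()


def fitness(name, connotations):
    """Stand-in score: best string similarity to any connotation word."""
    return max(SequenceMatcher(None, name.lower(), word).ratio()
               for word in connotations)


def evolve(seed_names, connotations, pop_size=20, cull=5,
           generations=4, seed=0):
    rng = random.Random(seed)
    hist = defaultdict(Counter)
    add_to_histogram(hist, seed_names)
    lengths = [len(decompose(n)) for n in seed_names]
    survivors = []
    for _ in range(generations):
        population = [generate_name(hist, rng, rng.choice(lengths))
                      for _ in range(pop_size)]
        # Keep the fittest names and drop the `cull` least fit.
        population.sort(key=lambda n: fitness(n, connotations), reverse=True)
        survivors = population[:pop_size - cull]
        # Survivors feed back into the histogram for the next generation.
        add_to_histogram(hist, survivors)
    return survivors


names = evolve(["Florence", "Geraldine", "Marianne", "Rosalind"],
               connotations=["golden", "sunny"])
print(names[:5])
```

Because survivors are added back into the histogram, runs that score well become more likely in later generations, which is the "breeding" step in this simplified reading.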

After that, we go back to our US census. The Needleman-Wunsch algorithm is used to measure the similarity between each generated name and every "real" name from the US census. The "real" name with the highest similarity is said to be the "closest real name".
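
For reference, here is a small sketch of Needleman-Wunsch global alignment scoring applied to that lookup. The scoring values (match +1, mismatch -1, gap -1), the case-insensitive comparison, and the helper name `closest_real_name` are assumptions for the example; the script may well use different parameters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score between two strings."""
    a, b = a.lower(), b.lower()
    # prev[j] is the best score for aligning the processed prefix of a
    # with b[:j]; the first row aligns the empty prefix against gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def closest_real_name(generated, real_names):
    """Pick the real name with the highest alignment score."""
    return max(real_names, key=lambda real: needleman_wunsch(generated, real))


print(needleman_wunsch("Florance", "Florence"))  # 6
print(closest_real_name("Florance", ["Geraldine", "Laurence", "Florence"]))  # Florence
```

Only the score is needed to rank candidates, so this version keeps two rows of the table instead of the full matrix and skips the traceback.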