Saturday, August 28, 2010

A Lookup for Nicknames and Diminutive Names

I created a simple lookup file that contains United States given names (first names) and their associated nicknames or diminutive names. For example, "gregory" -> "greg" or "geoffrey" -> "geoff". You can download the file and contribute to it here: http://code.google.com/p/nickname-and-diminutive-names-lookup/.

This lookup was started from http://www.tngenweb.org/franklin/frannick.htm, a list used for genealogy research. It was a good starting point, but because of its genealogical purpose it contains some pretty old names. Making it machine readable also took significant effort: names had to be separated with commas, and human-readable conventions such as "rickie(y)" had to be removed by splitting each one into separate names, in this case "rickie" and "ricky".
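The "rickie(y)" cleanup described above can be sketched in Python. This is a minimal sketch, not the script actually used to build the file; the expansion rules below are assumptions for illustration: if the base name ends in "ie" and the parenthesized alternative is "y", the "ie" is swapped for "y"; otherwise the alternative is simply appended. The function name `expand_variants` is hypothetical.

```python
import re

# Hypothetical pattern for a genealogy-style entry such as "rickie(y)":
# a base name followed by an alternative ending in parentheses.
_ALT_ENDING = re.compile(r"^([a-z]+)\(([a-z]+)\)$")


def expand_variants(entry: str) -> list:
    """Split one human-readable entry into separate machine-readable names.

    Assumed rules (not taken from the original source list):
    - "rickie(y)" -> ["rickie", "ricky"]: a trailing "ie" is swapped for "y".
    - "ann(e)"    -> ["ann", "anne"]: otherwise the alternative is appended.
    - Entries without parentheses are returned unchanged.
    """
    entry = entry.strip().lower()
    match = _ALT_ENDING.match(entry)
    if match is None:
        return [entry]
    base, alt = match.groups()
    if base.endswith("ie") and alt == "y":
        variant = base[:-2] + alt
    else:
        variant = base + alt
    return [base, variant]


print(expand_variants("rickie(y)"))  # ['rickie', 'ricky']
print(expand_variants("ann(e)"))     # ['ann', 'anne']
print(expand_variants("greg"))       # ['greg']
```

Real genealogy lists use more notations than this one, so any entry the pattern does not recognize is passed through untouched for a human to review.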

This is a large list with about 700 entries. Any help cleaning it up and adding to it is greatly appreciated. Think of it as a wiki: contribute or change it as needed. CSV was the easiest format to use. Maybe I'll release it in XML or another format later, or maybe a kind soul who uses this lookup wants to contribute a format they converted it into?
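Because the file is plain CSV, loading it into a two-way lookup takes only the standard library. The sketch below assumes, hypothetically, that each row is a given name followed by its nicknames (e.g. `gregory,greg`); check the actual file layout before relying on it. The function names `load_lookup` and `related_names` are illustrative, and the two sample rows are built from the examples at the top of this post.

```python
import csv
import io
from collections import defaultdict


def load_lookup(lines):
    """Build a name -> set-of-related-names mapping from CSV rows.

    Assumes each row is a given name followed by its nicknames.
    Links are stored in both directions, so "greg" finds "gregory"
    and "gregory" finds "greg".
    """
    related = defaultdict(set)
    for row in csv.reader(lines):
        names = [n.strip().lower() for n in row if n.strip()]
        if not names:  # skip blank lines
            continue
        given, nicknames = names[0], names[1:]
        for nick in nicknames:
            related[given].add(nick)
            related[nick].add(given)
    return dict(related)


def related_names(lookup, name):
    """Return the names linked to `name`, sorted; empty list if unknown."""
    return sorted(lookup.get(name.strip().lower(), set()))


sample = "gregory,greg\ngeoffrey,geoff\n"
lookup = load_lookup(io.StringIO(sample))
print(related_names(lookup, "Greg"))      # ['gregory']
print(related_names(lookup, "geoffrey"))  # ['geoff']
```

To use a downloaded copy, pass an open file object (e.g. `open("names.csv", newline="")`, where the filename is an assumption) instead of the `StringIO` sample.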

I was rather surprised that I couldn't find anything like this on the web. The best I could find was the pdNickname database and it costs $500. So, I created my own and released it as open source so that others could benefit from my work.