Monthly archives "December 2017"

Yes, Java helped me name Lucas

posted by Roberto Cortez

Having a child is one of the most memorable moments you can experience. Parents have a huge responsibility for this new life, and it starts right away when they have to choose the name. Our name is something that we keep for life (most of the time), so it needs some careful consideration. With literally thousands of names to choose from, how about using some Java technology to help us?

Yes, I wrote a small Java application to help me choose my baby's name!

Requirements

First of all, we have to define some rules!

Basic Rules

  • A short name (but not too short)
  • In the first half of the alphabet
  • Without special characters

A short name, so it is easy to call him. In the first half of the alphabet, because in Portugal we have this stupid rule where kids are seated in the classroom in alphabetical order, so names starting with letters at the end of the alphabet get to sit at the back of the room! And finally, a name without special characters (we have a few in Portugal), so it is easier for foreign people to use.

Advanced Rules

  • Exists in at least three languages (Portuguese, Spanish and English)
  • Sounds and is written the same in all three languages

These cover the fact that I was born in a Spanish-speaking country, that we live in Portugal, and that English is the language of globalization.

Constraints

There are also some constraints. In Portugal, you cannot use just any name you want. You need to pick the name from an approved list of names. Of course, this is a comprehensive list that covers all common names. It is mostly used to avoid giving a stupid name to your kid. If you're curious about it, check the list here.

Implementation

Basic Rules are Easy

After grabbing the data from the file, we can apply the basic rules with some Java 8 Streams:

(Yes, I’m lazy with regular expressions)
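
As a minimal sketch of what such a filter could look like, assuming the names are already loaded into a list: the class and method names are hypothetical, and the concrete thresholds (4 to 6 letters, initials from A to M, unaccented ASCII letters only) are my own reading of the basic rules, not values stated in the post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the three basic rules as stream filters.
// All thresholds below are assumptions made for illustration.
class BasicRulesSketch {

    // "Short, but not too short": assumed to mean 4 to 6 letters.
    static boolean hasGoodLength(String name) {
        return name.length() >= 4 && name.length() <= 6;
    }

    // "First half of the alphabet": assumed to mean an initial from A to M.
    static boolean startsInFirstHalf(String name) {
        char first = Character.toUpperCase(name.charAt(0));
        return first >= 'A' && first <= 'M';
    }

    // "No special characters": only unaccented ASCII letters, checked
    // character by character instead of with a regular expression.
    static boolean hasOnlyPlainLetters(String name) {
        return name.chars()
                .allMatch(c -> (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));
    }

    static List<String> applyBasicRules(List<String> names) {
        return names.stream()
                .map(String::trim)
                .filter(name -> !name.isEmpty()) // guards charAt(0) above
                .filter(BasicRulesSketch::hasGoodLength)
                .filter(BasicRulesSketch::startsInFirstHalf)
                .filter(BasicRulesSketch::hasOnlyPlainLetters)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The third entry contains an accented "a", written as a Unicode escape.
        List<String> sample = Arrays.asList(
                "Lucas", "David", "Jo\u00e3o", "Zacarias", "Ana", "Tiago");
        System.out.println(applyBasicRules(sample));
    }
}
```

With this sample list, only Lucas and David survive: the accented name fails the plain-letter check, Zacarias and Ana fail the length check, and Tiago starts too late in the alphabet.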

Advanced Rules are more Interesting

Beider-Morse Phonetic Matching

When searching for a way to determine if a word or a name exists in another language and sounds the same, I came across Beider-Morse Phonetic Matching.

The main objective of Beider-Morse Phonetic Matching is to recognize that two words written in different ways can actually be phonetically equivalent, that is, they can sound alike. But unlike soundex methods, the “sounds-alike” test is based not only on the spelling but also on linguistic properties of various languages.

How?
It tries to guess the language of the word following specific language rules and then calculates a phonetic value for that word. For instance:

  • tsch, final mann and witz are specifically German
  • final and initial cs and zs are necessarily Hungarian
  • cz, cy, initial rz and wl, final cki, letters ś, ł and ż can be only Polish
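
The spelling hints above can be sketched as a handful of string checks. This is only a toy illustration of the idea, not the actual Beider-Morse rule set: the class name, the method name, and the decision to return a single language are assumptions made for the example, and only the three bullet points above are encoded.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;

// Toy language guesser built only from the three hints listed above.
// It is not the Beider-Morse implementation, just an illustration.
class LanguageHintSketch {

    static Optional<String> guessLanguage(String word) {
        String w = word.toLowerCase(Locale.ROOT);

        // "tsch", final "mann" and final "witz" point to German.
        if (w.contains("tsch") || w.endsWith("mann") || w.endsWith("witz")) {
            return Optional.of("German");
        }
        // Initial or final "cs" and "zs" point to Hungarian.
        if (w.startsWith("cs") || w.endsWith("cs")
                || w.startsWith("zs") || w.endsWith("zs")) {
            return Optional.of("Hungarian");
        }
        // Polish letters from the list, written as Unicode escapes:
        // s with acute (015b), l with stroke (0142), z with dot above (017c).
        boolean hasPolishLetter = w.indexOf('\u015b') >= 0
                || w.indexOf('\u0142') >= 0
                || w.indexOf('\u017c') >= 0;
        // "cz", "cy", initial "rz" and "wl", final "cki" point to Polish.
        if (hasPolishLetter || w.contains("cz") || w.contains("cy")
                || w.startsWith("rz") || w.startsWith("wl") || w.endsWith("cki")) {
            return Optional.of("Polish");
        }
        // No hint matched: the language stays undetermined.
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String name : Arrays.asList("Hoffmann", "Lukacs", "Kowalczyk", "Lucas")) {
            System.out.println(name + " -> " + guessLanguage(name).orElse("unknown"));
        }
    }
}
```

In this sketch a name such as Lucas matches none of the hints, so it stays undetermined rather than being assigned to a language.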

And then a phonetic value is calculated:

Original   Phonetic Value   Example
tz         ts               Fitzgerald
c          s                circle
cc         ks               success, accent
gh         (g|f|w)          burgh|tough|bough
kn         n                knight
mc         mak              McDonald

The Apache Commons Codec project has an implementation of the Beider-Morse Phonetic Matching algorithm. Try it out and play with it.

Rosette API

The Rosette API is a text analysis toolkit that provides multiple services to perform text analysis. It also has a Name Translation service with a REST endpoint: you feed in a name and the desired language, and it returns the translation with a confidence score. The API is useful to double-check the results obtained with Beider-Morse Phonetic Matching.

They have fantastic support, providing libraries to integrate with their APIs in multiple languages and also a lot of samples you can use. Check their GitHub repo here.

behindthename.com

The Behind the Name website provides the etymology and history of first names, plus a comprehensive list of names and the languages in which they exist. On top of that, they also provide an API to check that information, so you can use it to triple-check the results from Beider-Morse Phonetic Matching and the Rosette API.

Lucas Behind the Name

Adding it all Together

Let's just add this to our previous Java 8 Streams filter:
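
One way this combination could look, as a rough sketch: the `NameCheck` interface is a hypothetical abstraction standing in for whichever service does the cross-language check (Beider-Morse, the Rosette API, or Behind the Name), and the stub in `main` is a hard-coded placeholder rather than a real call to any of them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of chaining the basic rules with a cross-language check.
class CombinedRulesSketch {

    // Hypothetical abstraction: answers whether a name exists and is
    // written and pronounced the same way in all the given languages.
    interface NameCheck {
        boolean sameInAllLanguages(String name, List<String> languages);
    }

    static List<String> filter(List<String> names,
                               Predicate<String> basicRules,
                               NameCheck advancedRules,
                               List<String> languages) {
        return names.stream()
                .filter(basicRules) // cheap local checks first
                .filter(name -> advancedRules.sameInAllLanguages(name, languages))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the basic rules; the length bounds are an assumption.
        Predicate<String> basicRules = n -> n.length() >= 4 && n.length() <= 6;

        // Placeholder stub: a real version would query a phonetic encoder
        // or a remote API instead of a fixed set.
        Set<String> knownGood = new HashSet<>(Arrays.asList("Lucas", "David"));
        NameCheck stub = (name, langs) -> knownGood.contains(name);

        List<String> languages = Arrays.asList("portuguese", "spanish", "english");
        List<String> candidates = Arrays.asList("Lucas", "David", "Bruno", "Zacarias");
        System.out.println(filter(candidates, basicRules, stub, languages));
    }
}
```

Running the cheap local filters before the cross-language check keeps the number of expensive lookups small, which matters if the check ends up being a remote API call.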

Final Results

After all the rules were applied to the initial list, only two names remained. One is the obvious Lucas and the other is David. So both of these names exist, and are written and pronounced the same way, in Portuguese, English, and Spanish.

Proof it works?

Well, now I just go to any random Starbucks, order something under the name Lucas, and confirm that they get it right. So far so good!

Lucas Proof

If you find this interesting, I even published the code in a GitHub repo. Check it out.

Note for Lucas: Lucas, if you read this when you're older, please forgive me for being such a geeky father.