Ahead of the release of Star Wars: The Force Awakens, scientists at The Data Lab in Edinburgh have analysed several hundred characters from the Star Wars films and associated series to determine from which language each name is most likely to have come.
They compiled a list of over 500 names and applied an n-gram model, a technique from artificial intelligence, to each one.
The n-gram model, from the field known as natural language processing, first splits the name into a sequence of single, double, and triple character strings. For example, the name “Luke” decomposes into the strings “l”, “u”, “k”, “e”, “lu”, “uk”, “ke”, “luk”, and “uke”.
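The decomposition described above can be sketched in a few lines of Python. This is only an illustration of the splitting step as the article describes it, not the code the analysts ran; the function name char_ngrams and the choice to lower-case the name first are assumptions made for the example.

```python
def char_ngrams(name, max_n=3):
    """Split a name into all character strings of length 1 to max_n."""
    text = name.lower()  # assumption: case is ignored, as in the "Luke" example
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of width n along the name, one character at a time.
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


print(char_ngrams("Luke"))
# ['l', 'u', 'k', 'e', 'lu', 'uk', 'ke', 'luk', 'uke']
```

For a four-letter name this gives four single, three double and two triple character strings, matching the nine strings listed for "Luke".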
A piece of software called textcat then compares the frequencies of the resulting strings with those found in corpora for dozens of languages.
From this the software is able to calculate probabilities of a given name coming from each of the languages. The most likely language is noted for each character name.
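To give a feel for how such a comparison might work, here is a rough Python sketch. The article does not say which measure was used, so the example assumes one well-known approach, the rank-based "out-of-place" distance of Cavnar and Trenkle, in which a smaller distance means a closer match. The function names, the top_k cut-off and the two tiny sample sentences are all hypothetical stand-ins; real language profiles are built from far larger corpora.

```python
from collections import Counter


def char_ngrams(text, max_n=3):
    """All character strings of length 1 to max_n (spaces are kept as characters)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(text) - n + 1)]


def ranked_profile(text, top_k=300):
    """Map each of the top_k most frequent n-grams to its rank (0 = most frequent)."""
    counts = Counter(char_ngrams(text))
    # Ties are broken alphabetically so the ranking is deterministic.
    ordered = sorted(counts, key=lambda g: (-counts[g], g))
    return {gram: rank for rank, gram in enumerate(ordered[:top_k])}


def out_of_place(name_profile, lang_profile, max_penalty=300):
    """Sum of rank differences; n-grams missing from the language get max_penalty."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in name_profile.items()
    )


def most_likely_language(name, corpora):
    """Return the closest language and the distance to every language."""
    name_profile = ranked_profile(name)
    distances = {
        lang: out_of_place(name_profile, ranked_profile(sample))
        for lang, sample in corpora.items()
    }
    return min(distances, key=distances.get), distances


# Hypothetical toy samples, far too small for a meaningful result.
toy_corpora = {
    "english": "the quick brown fox jumps over the lazy dog",
    "german": "der schnelle braune fuchs springt über den faulen hund",
}
best, distances = most_likely_language("Darth Vader", toy_corpora)
print(best, distances)
```

With samples this small the answer means very little. The sketch also produces distances rather than probabilities, so it shows only the general idea of matching a name's n-grams against language profiles.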
The technique is normally applied to larger bodies of text and is typically used to categorise written works by similarity, author or subject matter. In this instance the language analysis has been done as a bit of fun and is not intended to be taken too seriously.
Several of the best-known Star Wars characters are given in this article. For the full list of 500 characters visit the blog on The Data Lab’s website.
STAR WARS CHARACTER/LANGUAGE
Admiral Gial Ackbar: Scottish Gaelic
Padmé Amidala: Tagalog
Wedge Antilles: Danish
Jar Jar Binks: Middle Frisian
Chewbacca: Scottish Gaelic
Salacious B. Crumb: Catalan
Count Dooku: Slovakian
Jango Fett: Swedish
Boba Fett: Hungarian
Bib Fortuna: Basque
General Grievous: Breton
Jabba the Hutt: Scottish
Qui-Gon Jinn: Scottish Gaelic
Obi-Wan Kenobi: Slovenian
Owen Lars: German
Darth Maul: Welsh
Princess Leia Organa: Romansh
Emperor Sheev Palpatine: Slovenian
Darth Sidious: Irish
Anakin Skywalker: Tagalog
Luke Skywalker: Middle Frisian
Han Solo: Norwegian
Grand Moff Wilhuff Tarkin: German
Darth Vader: German
Mace Windu: French
The names span a huge number of different languages, from the readily familiar to the rather more obscure. Middle Frisian, for example, was spoken around the Netherlands, Germany and southern Denmark in the 17th and 18th centuries, whilst Tagalog is a modern-day language from the Philippines.
In addition to those in the previous list, the following is a selection of characters whose names most likely derive from Scottish or Scottish Gaelic:
STAR WARS CHARACTER/LANGUAGE
Queen Apailana: Scottish
Cad Bane: Scottish Gaelic
Cin Drallig: Scottish Gaelic
Mama the Hutt: Scottish
Mon Mothma: Scottish
Sy Snootles: Scottish
Captain Grear Typho: Scottish Gaelic
There appears to be a connection between the names of the Hutt characters and Scottish. In addition to Jabba the Hutt, each of Borvo the Hutt, Gardulla the Hutt, Mama the Hutt, Rotta the Hutt, Ziro the Hutt, and Zorba the Hutt maps to Scottish, as does Sy Snootles, the lead vocalist in Jabba’s house band in Episode VI - Return of the Jedi.
Dr Richard Carter and Dr Roman Popat work for The Data Lab as data scientists. The Data Lab is a Scottish Innovation Centre focused on helping Scotland generate significant economic, social and scientific value from data through collaboration, education and community building.