Thu, 14 Feb 2013 13:03 UTC
A new machine-learning algorithm can use rules about sound change to suss out the most likely phonetic changes in a shifting language. All words shift over time and place, but certain vowels and pronunciations shift more than others: you say tomato, I say tomahto, Canadians say "aboot," and so on. Alexandre Bouchard-Côté and colleagues at the University of British Columbia in Vancouver developed a system that can suggest how words may have sounded in the past and which sounds were the most likely to shift. They then compared the results with analyses by human experts and found that 85 percent of the computer's suggestions were within a single character of the experts' reconstructions.
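"Within a single character" is a statement about edit distance. The sketch below assumes that the comparison is plain Levenshtein distance, where each insertion, deletion or substitution of a character counts as one edit. The study may have scored matches differently, and the function names here are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (classic dynamic programming)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


def within_one_character(guess: str, reference: str) -> bool:
    """Hypothetical scoring rule: a guess counts if it is at most one edit away."""
    return levenshtein(guess, reference) <= 1


def share_within_one(pairs) -> float:
    """Fraction of (guess, reference) pairs that are within one edit."""
    hits = sum(within_one_character(g, r) for g, r in pairs)
    return hits / len(pairs)
```

For example, `levenshtein("biten", "bituquen")` is 3 (insert u, q and u). A hypothetical guess of "bituqen" would count as a hit against "bituquen" because a single deletion separates them.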
They looked at 637 distinct Austronesian languages, which span the Pacific from the Philippines to Hawaii. Take, for example, the word for "star." In Fijian, the word is kalokalo. In Pazeh, a Taiwanese aboriginal language, it's mintol. Speakers of Melanau, a Bornean tongue, call it biten, and speakers of the Filipino dialect Inabaknon know it as bitu'on. In the ancestral language from which all of these languages evolved, the root word was bituquen, and the computer deduced it correctly.
The catch is that there's a lot of front-end work before the computer can do its analysis. Linguists have to input a list of words in each language, plus their meanings, along with a sort of "tree of life" for the languages: a phylogenetic map showing how each language is related to the others. (It resembles, in both form and function, the phylogenetic trees biologists use to show how living things are related.) But once it gets to work, the algorithm is efficient. It can recognize cognates, which are words in different languages that share the same root, and then figure out the probable root.
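The researchers' system is probabilistic and works over the whole language tree. The sketch below deliberately swaps in a much simpler technique to illustrate the final step, going from a set of cognates to a probable root. It picks, from a list of candidate roots, the one with the smallest total edit distance to the attested forms (a "median string" restricted to a candidate list). The forms "pata", "bata" and "pada" are hypothetical, as are the function names.

```python
from typing import List, Optional, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def total_distance(candidate: str, cognates: Sequence[str]) -> int:
    """Sum of edit distances from one candidate root to every attested cognate."""
    return sum(levenshtein(candidate, w) for w in cognates)


def best_root(cognates: Sequence[str],
              candidates: Optional[Sequence[str]] = None) -> Tuple[str, int]:
    """Return the candidate with the smallest total distance, and that total.

    If no candidates are given, the attested cognates themselves are tried.
    min() returns the first of several equally good candidates.
    """
    pool: List[str] = list(cognates) if candidates is None else list(candidates)
    root = min(pool, key=lambda c: total_distance(c, cognates))
    return root, total_distance(root, cognates)
```

With the hypothetical cognates `["pata", "bata", "pada"]`, "pata" wins with a total of 2 because it is one substitution away from each of the other two. Unlike this toy, the system described above also learns which sounds are most likely to shift, so not every substitution need cost the same.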
The researchers acknowledge there's still more advanced work to be done, but they hope the algorithm will be a boon to historical linguists in the way genetic information has changed biology. Comparing genes has proved much simpler than tracking morphological change, that is, looking at an organism and seeing how it changes or compares with others. This algorithm can work in a similar fashion, computationally studying the roots of words and languages rather than relying on a specially trained ear. The paper appears this week in the Proceedings of the National Academy of Sciences.