Researchers from the Pontificia Universidad Católica del Perú (PUCP) and the Max Planck Institute for the Science of Human History have examined how well machine learning algorithms can identify lexical borrowings, the German News Agency reported.
Lexical borrowing, the direct transfer of words from one language to another, helps researchers trace how modern languages evolved and reveals cultural contact between distinct linguistic groups. Detecting borrowings is difficult, however, because tracing a word's origin requires comparing multiple languages.
"The automated detection of lexical borrowings is still one of the most difficult tasks we face in computational historical linguistics," the Phys.org website quoted lead author Johann-Mattis as saying.
In the study, the researchers trained language models that mimic how linguists identify borrowings: relying on sound to detect words that are pronounced the same way in different languages. Such similarity indicates that a word was transferred from one language to another at some stage in the languages' evolution.
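The article does not describe the researchers' models in technical detail. As a rough illustration of the intuition only, and not the study's method, the Python sketch below scores how alike two words sound using a normalized Levenshtein (edit) distance, assuming each word is already written as a string of sound symbols. The function names, the example words and the 0.75 cut-off are hypothetical choices made for this sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions
    and substitutions needed to turn transcription a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def sound_similarity(a: str, b: str) -> float:
    """1.0 for identical transcriptions, 0.0 when no symbols line up."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def flag_possible_borrowing(a: str, b: str, threshold: float = 0.75) -> bool:
    """Hypothetical heuristic: flag a word pair as a borrowing candidate
    when its sound similarity reaches the (assumed) threshold."""
    return sound_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Ordinary spellings stand in for phonetic transcriptions here.
    print(flag_possible_borrowing("tomate", "tomato"))
```

With these definitions, `flag_possible_borrowing("tomate", "tomato")` returns `True`, since the two forms differ in one of six symbols (similarity of about 0.83). A high score alone cannot prove borrowing, because related languages also share similar-sounding words inherited from a common ancestor, so a measure like this can only flag candidates for closer study.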
The team applied the models to a modified version of the World Loanword Database, a catalog of borrowing data for a sample of 40 languages from language families around the world, to test how accurately the models could identify words borrowed from other languages.
In many cases the results were unsatisfactory, suggesting that loanword detection is too difficult for the most commonly used machine learning methods.
"After these first experiments with monolingual lexical borrowings, we can proceed to stake out other aspects of the problem," said researcher John Miller of PUCP.
Other researchers, including co-author Tiago Tresoldi, believe that "our computer-assisted approach will shed a new light on the importance of computer-assisted methods for language comparison and historical linguistics."