Google Translate has made its biggest leap yet, adding support for 110 new languages at once. The new languages are powered by Google's PaLM 2 large language model, which does the background work of learning closely related languages and translating text from one language to another.
The new languages include those spoken by small Indigenous communities as well as widely spoken ones like Cantonese. There are also actively revived languages like Manx, which almost went extinct when its last native speaker died in 1974.
Google explained in a blog post that Cantonese took so long to add because of its similarities with Mandarin, which make it difficult to find data and train models.
The latest addition is part of Google's 1,000 Languages initiative, which aims to build AI models that support the 1,000 most spoken languages across the globe. Google previously used a technique called Zero-Shot Machine Translation to add 24 new languages in 2022.
The 110 new languages represent more than 614 million speakers globally, extending translation to about 8% of the world's population. You can check out the list of all new languages on this support page.
The search giant explained that languages can have a lot of diversity in the form of regional varieties, dialects, and different spelling standards. Picking a language variety is a task in itself, and Google chooses the one that is most commonly used.
"For example, Romani is a language that has many dialects all throughout Europe. Our models produce text that is closest to Southern Vlax Romani, a commonly used variety online. But it also mixes in elements from others, like Northern Vlax and Balkan Romani," it said.
These new languages will start showing up over the next few days on the Google Translate website and its apps for Android and iOS.