Meta AI has introduced NLLB-200, a single AI model designed to translate to and from 200 languages in real time. The system translates directly between language pairs, without routing through English as an intermediate language.
Six months ago, Meta AI launched the No Language Left Behind (NLLB) project, which aims to train AI models to translate between languages often ignored by tech giants. NLLB-200 is part of that project.
The NLLB-200 model has over 50 billion parameters and was trained on Meta’s Research SuperCluster, one of the fastest AI supercomputers in the world. According to Mark Zuckerberg, the project aims to provide at least 25 billion translations per day across Meta's apps.
Meta claims its new AI model can translate 55 African languages with “high-quality results”. For some African and Indian languages, the company says, the improvement over existing translation systems reaches 70%.
Like many large language models, NLLB-200 was trained on a huge amount of data collected from the Internet. But first, the developers compiled the NLLB-Seed dataset: sentences from the most important Wikipedia pages, translated from English into 39 low-resource languages by professional translators. In total, the dataset included about six thousand sentences. Together with existing datasets for other languages, it made it possible to begin training the model.
Meta said the model would improve translation quality on Facebook and Instagram, as well as on third-party services such as Wikipedia.
In addition to NLLB-200, Meta also introduced Flores-200, an evaluation dataset for measuring the performance of translation models in each language. Meta has announced that it will open-source both NLLB-200 and Flores-200 so that developers can integrate them and researchers can extend them to even more languages. The developers have also created a site that demonstrates the new model at work.
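To give a rough idea of what a per-language evaluation of this kind involves, here is a minimal Python sketch. It is an illustration only, not Meta's evaluation code: the character n-gram F-score below is a simplified stand-in for the metrics used in practice, and the function names, the sample data and the language codes are assumptions made for the example.

```python
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams of length n, with whitespace collapsed."""
    s = " ".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def ngram_f_score(hypothesis, reference, max_n=3, beta=2.0):
    """Average character n-gram F-beta score, between 0 and 1.

    Simplified illustration; not the metric reported for NLLB-200.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
        scores.append(f)
    return sum(scores) / len(scores) if scores else 0.0

def score_by_language(samples):
    """samples: iterable of (language_code, model_output, reference)."""
    per_lang = {}
    for lang, hyp, ref in samples:
        per_lang.setdefault(lang, []).append(ngram_f_score(hyp, ref))
    return {lang: sum(v) / len(v) for lang, v in per_lang.items()}

# Hypothetical model outputs paired with reference translations.
samples = [
    ("zul_Latn", "abc", "abc"),
    ("zul_Latn", "abc", "xyz"),
    ("hin_Deva", "a", "a"),
]
print(score_by_language(samples))
```

Averaging the scores per language, rather than over the whole test set, is what makes it possible to see which languages a model handles well and which it does not.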
Meta also announced a $200,000 grant program open to non-profit organizations that want to explore ways to apply NLLB-200 in the real world.
In April, the company introduced the open-source language model OPT-175B.