Meta AI has announced the open-source release of its latest project called Massively Multilingual Speech (MMS). The project aims to address the challenges associated with developing accurate and reliable speech recognition models. By open-sourcing MMS, Meta hopes to encourage language diversity preservation and foster collaboration within the research community.
Speech recognition models hold immense potential, particularly for individuals who heavily rely on voice-based information retrieval. However, training high-quality models usually demands vast amounts of data, including thousands of hours of audio. This poses a significant obstacle for many languages, especially those with limited available data.
To overcome this data scarcity, Meta’s researchers turned to the Bible. Because the Bible has been translated into thousands of languages, its translations are widely used in text-based translation research, and many of them are accompanied by audio recordings. The Meta team compiled a dataset of New Testament readings in over 1,100 languages, providing an average of 32 hours of data per language.
While 32 hours of data is not enough to train a conventional supervised speech recognition model, Meta used a self-supervised learning algorithm called wav2vec 2.0. The algorithm lets a model learn speech representations from unlabelled audio, so that a speech recognition model can then be trained with far less labelled data.
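To give a rough sense of how learning without labels can work, the sketch below shows a contrastive objective of the kind wav2vec 2.0 uses: for a masked stretch of audio, the model must pick the true target out of a set of distractors. This is a toy illustration, not Meta's code. The two-dimensional vectors are made up, the function names are hypothetical, and the real model produces these vectors with neural networks and adds further loss terms.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(context, target, distractors, temperature=0.1):
    """Negative log-probability of choosing the true target among all candidates.

    context:     the model's prediction for a masked time step (toy vector)
    target:      the true representation of that time step (toy vector)
    distractors: representations taken from other time steps (toy vectors)
    """
    candidates = [target] + list(distractors)
    logits = [cosine(context, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


# Made-up vectors: no transcripts are involved, only the audio-derived vectors.
target = [0.9, 0.1]
distractors = [[0.0, 1.0], [-1.0, 0.0]]

good_loss = contrastive_loss([1.0, 0.0], target, distractors)  # prediction near target
bad_loss = contrastive_loss([0.0, 1.0], target, distractors)   # prediction near a distractor
print(f"good prediction: {good_loss:.4f}, bad prediction: {bad_loss:.4f}")
```

The loss is low when the prediction sits close to the true target and high when it sits closer to a distractor, which is the signal the model trains on in place of human-written transcripts.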
In the MMS project, multiple self-supervised models were trained using approximately 500,000 hours of speech data across more than 1,400 languages. These models were then fine-tuned for specific speech tasks, such as multilingual speech recognition and language identification.
According to Meta, the resulting models performed well on standard benchmarks and outperformed other speech recognition models, achieving approximately half the word error rate of OpenAI LP’s Whisper model.
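Word error rate is the number of word substitutions, insertions and deletions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation. The sentences and the two "systems" are invented for the example and are not outputs of MMS or Whisper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented transcripts for illustration only.
reference = "the quick brown fox jumps over the lazy dog"
system_a = "the quick brown fox jumped over a lazy dog"  # two wrong words
system_b = "the quick brown fox jumps over a lazy dog"   # one wrong word

wer_a = word_error_rate(reference, system_a)
wer_b = word_error_rate(reference, system_b)
print(f"system A: {wer_a:.3f}, system B: {wer_b:.3f}")
```

In this made-up case system B gets one word wrong out of nine where system A gets two wrong, so B has half the word error rate of A.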
Meta intends to share the MMS dataset and the tools used for model refinement and training with the broader AI research community. The company aims to expand the coverage of MMS to support more languages and enhance dialect handling, which remains a significant challenge in existing speech technologies.
By open-sourcing MMS and encouraging collaboration, Meta seeks to drive advances in speech recognition technology, support the preservation of language diversity, and improve accessibility for people who rely on voice to access information. The release reflects Meta’s commitment to fostering innovation and progress in AI research.