Meta AI has released a new multimodal translation model called SeamlessM4T, which can translate and transcribe both text and speech across nearly 100 languages.
Meta asserts that SeamlessM4T, coupled with the newly introduced translation dataset SeamlessAlign, represents a notable advancement in AI-driven speech-to-speech and speech-to-text translation.
SeamlessM4T is a successor to Meta’s No Language Left Behind (a text-to-text machine translation model) and Universal Speech Translator (one of the few direct speech-to-speech translation systems supporting Hokkien). It also builds on Massively Multilingual Speech, Meta’s framework that provides speech recognition, language identification and speech synthesis technology in over 1,100 languages.
Meta isn’t the only company investing resources in developing advanced AI translation and transcription tools. In addition to the wealth of commercial services and open-source models already offered by Amazon, Microsoft, OpenAI, and numerous startups, Google is building models that can understand the 1,000 most-spoken languages worldwide.
Not all content creators favour using public data to train commercially available models. Some are suing companies that build AI tools on top of publicly available data, arguing that vendors should be compelled to provide credit, if not compensation, along with clear ways to opt out. However, Meta says the data it collected was not copyrighted and came primarily from open source or licensed sources.
In any case, Meta used text and audio scraped from the web to create a training dataset for SeamlessM4T called SeamlessAlign. Researchers aligned 443,000 hours of speech with text and created 29,000 hours of “speech-to-speech” alignments, which taught SeamlessM4T to transcribe speech to text, translate text, generate speech from text, and translate speech in one language into speech in another.
Meta claims that in internal benchmarks, SeamlessM4T performed better against background noise and “speaker variation” in speech-to-text tasks compared to current state-of-the-art speech transcription models.
According to Meta, the model possesses the ability to identify gender bias in language and even quantify gender bias within translations. Meta highlights that SeamlessM4T tracks instances where a translation introduces gendered terms to a concept that was originally gender-neutral in the source language.
Meta’s launch of SeamlessM4T, a multilingual speech and text translation model covering nearly 100 languages, stands as a pivotal achievement in AI language technology, offering a pathway to cross-cultural communication and global connectivity.