Speech2Face: AI That Predicts A Person's Face Just By Listening To Their Voice


Speech2Face, a new artificial intelligence model, can predict what a person’s face looks like just by listening to their voice.

A group of researchers from the Massachusetts Institute of Technology (MIT) is behind the project, which aims to create an algorithm capable of generating a person’s most characteristic physical features from their speech alone.

Speech2Face is based on a neural network that can recognize attributes such as a person’s race, age or gender from their voice. It was trained on YouTube videos of thousands of people to learn the correlations between voice and face, which gave it enough reference points to create a face without ever seeing an image of the speaker.
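To make the idea of "learning a correlation, then predicting without an image" concrete, here is a deliberately tiny sketch. It is not the researchers' model, which is a neural network trained on real video. It fits a straight line between one made-up voice feature and one made-up face feature. The data is synthetic, and the names `fit_line`, `predict`, `voice_feature` and `face_feature` are hypothetical, chosen only for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Estimate the face feature for a new voice feature value."""
    slope, intercept = model
    return slope * x + intercept

# Synthetic "training pairs" (an assumption for illustration, not real data):
# each voice value comes with a face value that is correlated with it, plus noise.
random.seed(0)
TRUE_SLOPE, TRUE_INTERCEPT = 2.0, -1.0
voice_feature = [random.uniform(0.0, 1.0) for _ in range(200)]
face_feature = [TRUE_SLOPE * v + TRUE_INTERCEPT + random.gauss(0.0, 0.05)
                for v in voice_feature]

# "Training": learn the correlation from the paired examples.
model = fit_line(voice_feature, face_feature)

# "Inference": only a voice value is supplied; the face value is estimated.
estimate = predict(model, 0.5)
print(model, estimate)
```

The real system works with far richer inputs and outputs, but the shape of the process is the same: paired examples go in during training, and afterwards only the voice is needed to produce an estimate.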

The most striking thing about this AI is that it can create virtual faces that closely resemble the actual speaker. However, the results are not as accurate as those obtained with artificial intelligence that compares synthetic faces with photographs of real faces.

In fact, as the MIT researchers explain in their article, the objective is not to create an image that replicates a person’s face but to generate one that recovers “the characteristic physical features that are correlated” with speech.

This new machine learning model could bring interesting benefits in the future. One of the main ones is building profiles of criminals from nothing more than an audio recording. However, there are also drawbacks: the ease of generating a face this way could well be exploited to impersonate someone. Even so, the emergence of this trained AI is a significant technological advance.

Recently OpenAI presented an AI system, DALL-E 2, that creates and edits images from a description in natural language.

