OpenAI Announces GPT-4o: Its Most Powerful AI to Date Will Be Available for Free on ChatGPT



Key Takeaways:
  • GPT-4o, OpenAI’s most advanced model, can process audio and images and is available to all ChatGPT users for free.
  • GPT-4o handles multiple languages and offers real-time translation and quick voice interactions.
  • OpenAI is gradually releasing GPT-4o to ensure safety, starting with text and image inputs and later adding audio responses with selected voices.

On Monday, OpenAI introduced GPT-4o, its most advanced language model to date. The new model matches the intelligence level of GPT-4 while adding substantially stronger capabilities for interpreting audio and images.

During the GPT-4o presentation, Mira Murati, the Chief Technology Officer of OpenAI, explained that the new model will be accessible to all ChatGPT users. This means the advanced technology will be implemented not only in the paid versions but also in the free variant of the chatbot.

One of the major advantages of GPT-4o is its native audio and image processing. These capabilities allowed OpenAI to run several live demonstrations, including some built from prompts submitted by followers on X (Twitter).

The standout feature of GPT-4o is its ability to facilitate more conversational interactions with ChatGPT using voice. Users can now interrupt the language model mid-response to request changes to the original prompt, without experiencing delays or latency issues.

OpenAI engineers also showcased GPT-4o’s real-time interpretation skills using a smartphone camera. This capability allowed ChatGPT to help solve a linear equation and detect a person’s emotions by analyzing their facial expressions.

OpenAI has announced that the deployment of GPT-4o will be carried out iteratively. For ChatGPT users, the text and image capabilities of this new AI are available starting today. As mentioned earlier, GPT-4o will be accessible to both free and paid users, including those with ChatGPT Plus subscriptions. Notably, subscribers will benefit from message limits five times higher than free users.

Regarding GPT-4o’s Voice Mode, it will be released in an alpha version to ChatGPT Plus subscribers in the coming weeks. Additionally, access to this advanced technology won’t be restricted to OpenAI’s chatbot; the language model will also be available through the company’s API. Sam Altman confirmed this in a post on X.
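For developers, a request to the model through the API might look roughly like the following sketch. This is an illustration, not code from the announcement: it assumes the official `openai` Python package (v1 or later) and uses `gpt-4o` as the model identifier, consistent with the model name announced.

```python
import os

# Request parameters for a chat completion against the newly announced model.
# The "gpt-4o" identifier follows the announcement; the prompt is illustrative.
params = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Translate 'buongiorno' into English."}
    ],
}

# Only attempt the network call when an API key is actually configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # assumes the `openai` package, v1+

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(**params)
    print(response.choices[0].message.content)
```

Text and image inputs work through this same chat endpoint; audio in and out, as described above, is rolling out separately.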

Why GPT-4o?

OpenAI explains that the “o” in GPT-4o stands for “omni.” During the presentation, the model was described as an “omnimodel” because of its ability to understand, combine, and respond across text, audio, and video.

The developers highlight that GPT-4o can respond to voice requests with a latency of just 232 milliseconds, although the average latency is 320 milliseconds. This is a significant improvement compared to GPT-4, where the average response time for ChatGPT’s Voice Mode was around 5.4 seconds.

“With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.” — notes OpenAI

GPT-4o matches the performance of GPT-4 Turbo on code and English text, while introducing “significant improvements” in handling other languages. During the presentation, the AI demonstrated this by translating a conversation between English and Italian in real time.

In terms of security, OpenAI acknowledges that the new capabilities of GPT-4o may present new risks. Consequently, its release to the general public will be gradual. Starting today, ChatGPT users can begin using this AI with text and image inputs, but only text outputs. Audio responses will be introduced later and will initially be limited to a catalog of preselected voices.

OpenAI, led by Sam Altman, has also released a video showcasing the limitations of GPT-4o’s Voice Mode. The primary issues include difficulties understanding certain prompts in languages other than English and mixing languages in its responses.
