ChatGPT Adds Voice and Image Search Features With New Update

Until now, ChatGPT was primarily a text-based service. You typed in your query, and the chatbot provided an answer. OpenAI is now going a step further: the new version of ChatGPT lets users interact with the chatbot through voice commands or by uploading images. These features will initially be available to ChatGPT Plus and Enterprise subscribers and will be rolled out to the general public soon after.

ChatGPT Voice Interaction:

The voice chat feature is designed to be intuitive. Users can tap a button, speak their question, and ChatGPT will convert the speech to text. The chatbot then processes the query, generates an answer, and converts it back to speech. The company’s Whisper model handles the speech-to-text conversion, while a new text-to-speech model generates human-like audio.
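
The three stages described above (speech to text, answer generation, text to speech) can be pictured as a simple chain. The following Python sketch is illustrative only: it is not OpenAI's implementation, and every name in it (`run_voice_turn`, `VoiceTurn`, the `fake_*` stand-ins, the `voice-1` label) is hypothetical. The stages are passed in as plain functions, so the sketch runs without any model or network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    transcript: str     # what the speech-to-text stage heard
    reply_text: str     # what the chatbot answered
    reply_audio: bytes  # the answer rendered as audio


def run_voice_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
    voice: str = "voice-1",  # hypothetical label for one of the selectable voices
) -> VoiceTurn:
    """Run one spoken question through the three stages in order."""
    transcript = transcribe(audio_in)            # stage 1: speech -> text
    reply_text = generate_reply(transcript)      # stage 2: text -> answer
    reply_audio = synthesize(reply_text, voice)  # stage 3: answer -> speech
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-in stages (assumptions for illustration, not real models).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(prompt: str) -> str:
    return f"You asked: {prompt}"


def fake_synthesize(text: str, voice: str) -> bytes:
    return f"[{voice}] {text}".encode("utf-8")


turn = run_voice_turn(
    b"What is the tallest mountain?",
    fake_transcribe,
    fake_reply,
    fake_synthesize,
)
print(turn.reply_text)
```

Keeping each stage behind its own function mirrors the article's description: the speech-to-text and text-to-speech models are separate components wrapped around the chatbot, so either one could be swapped without touching the other.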

Users can choose from five distinct voices, which makes the interaction more personal. Imagine having a conversation with an AI that not only understands you but also speaks in the voice you find most comfortable.

OpenAI is also collaborating with Spotify to translate podcasts into other languages while retaining the original podcaster’s voice. The partnership suggests that the technology behind ChatGPT’s new voice feature has uses well beyond chat.

OpenAI believes that the improved underlying technology will make ChatGPT’s responses superior to those of existing digital assistants like Alexa or Google Assistant.

ChatGPT Image Search:

The image search functionality is similar to Google Lens. You capture an image of what you’re interested in and upload it to ChatGPT. The model then analyzes the image and generates a contextual response. For instance, if you upload a photo of a historical monument, ChatGPT won’t just tell you its name; it could also provide background information, explain its significance, and even suggest nearby attractions.

What sets ChatGPT’s image search apart is its ability to engage in a back-and-forth dialogue with the user. If the initial answer doesn’t fully address your query, you can easily refine it. You can even use the app’s drawing tool to highlight specific parts of the image or add text-based questions to clarify your intent. This multimodal approach to search offers a more interactive and satisfying user experience.

Ethical Considerations:

While these features offer exciting possibilities, they also present new risks. The ability to generate a synthetic voice from just a few seconds of audio could be misused to impersonate public figures or commit fraud. OpenAI has acknowledged these concerns and stated that the voice model will be tightly controlled and limited to specific use cases and partnerships.