In a major development, ChatGPT, OpenAI's popular AI-powered language model, has officially become multimodal. Beyond processing text, ChatGPT can now speak, listen, and understand images, making it considerably more versatile and easier to use.
Speak with ChatGPT: Voice Interaction Made Seamless
One of the most notable features of this update is voice interaction. Users can now hold fluid, back-and-forth spoken conversations with ChatGPT. A new, highly realistic text-to-speech model lets them choose from five distinct voices.
On mobile devices, enabling voice is straightforward: in the mobile app, open the ‘Settings’ menu, go to ‘New Features’, and opt in to the voice feature.
Chat with Images: A Leap in Language Reasoning Skills
ChatGPT can now apply its language reasoning skills to images, including photographs, screenshots, and documents that contain both text and images. Users can show ChatGPT one or more images and discuss them. A new drawing tool lets users highlight the specific part of an image they want the assistant to focus on.
New Text-to-Speech Model: A Symphony of Voices
ChatGPT's voice capability is powered by a new text-to-speech model. OpenAI worked with professional voice actors to create the five voices, each with its own character. The goal is a more natural and engaging conversational experience.
Spotify Collaboration: Voice Translation Comes Alive
The new text-to-speech technology has already found a practical application beyond ChatGPT. Spotify is using it to power a pilot of its Voice Translation feature, which will bring AI-translated podcasts to the platform. This opens new avenues for content localization and makes podcasts accessible to listeners in more languages.
Over the next two weeks, the voice and image features will roll out to ChatGPT Plus and Enterprise users. Voice will be available on iOS and Android, while image input will be available on all supported platforms.
This update marks a significant step in the evolution of conversational AI. Now that ChatGPT can speak, listen, and understand images, it opens the door to new applications across a wide range of domains. It also points toward more natural ways for people to interact with computers.