Massive Breakthrough in the World of AI

ChatGPT, OpenAI's popular AI-powered language model, has taken a major step forward by becoming officially multimodal. In addition to processing text, it can now speak, hear, see, and interact with images, making it considerably more versatile and user-friendly.

Speak with ChatGPT: Voice Interaction Made Seamless

One of the most remarkable features of this update is the introduction of voice interaction with ChatGPT. Users can now engage in fluid, back-and-forth conversations with their virtual assistant using voice commands. The incorporation of a hyper-realistic text-to-speech model allows users to select from five distinct voices, enhancing the conversational experience.

On mobile devices, enabling voice interaction is straightforward: opt in to the voice feature under ‘New Features’ in the ‘Settings’ menu of the mobile app.

Chat with Images: A Leap in Language Reasoning Skills

ChatGPT can now apply its language reasoning skills to images, including photographs, screenshots, and documents containing text. Users can discuss one or more images in a single conversation, and a new drawing tool lets them direct the assistant to a specific part of an image.

New Text-to-Speech Model: A Symphony of Voices

The enhanced voice capability of ChatGPT is powered by a brand-new text-to-speech model. OpenAI has collaborated with professional voice actors to craft five distinct voices, each characterized by its unique qualities and nuances. This innovation promises to deliver a more natural and engaging conversational experience.

Spotify Collaboration: Voice Translation Comes Alive

ChatGPT's new text-to-speech technology has already found a practical application. Spotify is using it in the pilot of its Voice Translation feature, which will soon bring AI-translated podcasts to the platform and open up new avenues for content localization and global accessibility.

Over the next two weeks, the voice and image features will be rolled out to Plus and Enterprise users. Voice interaction will be available on iOS and Android, while image interaction will be accessible on all supported devices.

The update marks a pivotal moment in the evolution of conversational AI. With ChatGPT now able to speak, hear, see, and comprehend images, it opens the door to new applications across many domains. As AI continues to evolve, the future of human-computer interaction looks promising.


Maria Irene is a multi-faceted journalist with a focus on various domains including Cryptocurrency, NFTs, Real Estate, Energy, and Macroeconomics. With over a year of experience, she has produced an array of video content, news stories, and in-depth analyses. Her journalistic endeavours also involve a detailed exploration of the Australia-India partnership, pinpointing avenues for mutual collaboration. In addition to her work in journalism, Maria crafts easily digestible financial content for a specialised platform, demystifying complex economic theories for the layperson. She holds a strong belief that journalism should go beyond mere reporting; it should instigate meaningful discussions and effect change by spotlighting vital global issues. Committed to enriching public discourse, Maria aims to keep her audience not just well-informed, but also actively engaged across various platforms, encouraging them to partake in crucial global conversations.


