OpenAI CEO and co-founder Sam Altman revealed on X (formerly Twitter) Thursday that the company's Advanced Voice feature will begin rolling out "next week," though only to a select few ChatGPT Plus subscribers.
The company plans to “start the alpha with a small group of users to gather feedback and expand based on what we learn.”
"alpha rollout starts to plus subscribers next week!" — Sam Altman (@sama), July 25, 2024
Advanced Voice, which does away with the text prompt and enables users to converse directly with the AI as they would with another human, was initially announced in May alongside the release of GPT-4o during the company's Spring Update event. Unlike existing digital assistants like Siri and Google Assistant, which provide only canned answers to user queries, ChatGPT's Advanced Voice delivers human-like responses, nearly latency-free, in multiple languages.
The GPT-4o model can respond to audio inputs in 320 milliseconds on average, on par with human response times in normal conversation. As the demo video below shows, the model can converse with multiple users simultaneously, improvise talking points and questions in both English and Portuguese, and convey them with human-ish emotions, including "laughter."

Video: Learning a new language with ChatGPT Advanced Voice Mode
There's no word yet on how the company will choose participants for the alpha trial, aside from the requirement that they be $20-per-month ChatGPT Plus subscribers. The alpha release was originally scheduled for June, but that date was pushed back "to reach our bar to launch": to improve the model's ability to detect and reject prohibited forms of content, and to buttress the company's IT infrastructure against the anticipated increase in user load.
As the company announced in June, the feature’s full rollout won’t happen until at least this fall, and its exact timing will, again, depend on it “meeting our high safety and reliability bar.”
Giving ChatGPT the ability to converse naturally with its users is a huge advancement. Eliminating the need to type prompts into a text window reduces users' hardware requirements and expands the potential integrations and use cases for AI (such as improving access for users with mobility or dexterity limitations).
It can also help speed public adoption of the technology by lowering the barrier to entry for less tech-savvy users who are comfortable interacting with their computers via "hey Siri" but blanch at the prospect of prompt engineering.