- OpenAI has launched three new artificial intelligence (AI) models
- They handle real-time speech tasks: reasoning, translation, and transcription.
- Each is designed to integrate into developers’ AI applications.
If you’re a regular ChatGPT user, you may know that you don’t have to interact with the artificial intelligence (AI) chatbot solely via text: it can talk to you and handle your voice requests. Now, OpenAI, maker of ChatGPT, has announced three new voice models that it believes will “unlock a new class of voice applications for developers.”
Each AI speech model is designed for a different purpose, including in-depth reasoning, translation, and transcription. If you’re looking for a voice model along those lines, it might be worth a try.
According to OpenAI, the new models include the following:
- “GPT‑Realtime‑2, our first voice model with GPT‑5 class reasoning that can handle more difficult requests and move the conversation forward naturally.
- “GPT‑Realtime‑Translate, a new live translation model that translates speech from more than 70 input languages to 13 output languages while keeping pace with the speaker.
- “GPT-Realtime-Whisper, a new speech-to-text model that transcribes speech live as the speaker speaks.”
OpenAI’s news post explains that the company has seen developers use AI voice models in three main ways: asking the AI to perform a task; having the AI explain a situation (such as a travel delay) to a user; and holding conversations in the user’s local language.
It’s those use cases that OpenAI is trying to address with its new voice models. Each is designed for developers to use in their own applications, and all three are available through the OpenAI Realtime API. GPT-Realtime-2 will cost $32 per one million input tokens and $64 per one million output tokens. GPT-Realtime-Translate is priced at $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute.
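To put those prices in concrete terms, here is a minimal cost sketch. The rates come from the figures above; the token counts and minutes are made-up example values, not real usage data:

```python
# Rough cost estimates using the prices quoted in the article.
# The usage numbers in the example call are illustrative only.

GPT_REALTIME_2_INPUT = 32.0 / 1_000_000   # dollars per input token
GPT_REALTIME_2_OUTPUT = 64.0 / 1_000_000  # dollars per output token
TRANSLATE_PER_MIN = 0.034                 # dollars per minute
WHISPER_PER_MIN = 0.017                   # dollars per minute

def realtime2_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a GPT-Realtime-2 request, in dollars."""
    return (input_tokens * GPT_REALTIME_2_INPUT
            + output_tokens * GPT_REALTIME_2_OUTPUT)

def per_minute_cost(minutes: float, rate: float) -> float:
    """Cost of a per-minute model (translation or transcription), in dollars."""
    return minutes * rate

# Example: 500k input + 250k output tokens, plus a 30-minute translation session.
print(round(realtime2_cost(500_000, 250_000), 2))        # 32.0
print(round(per_minute_cost(30, TRANSLATE_PER_MIN), 2))  # 1.02
```

At these rates, an hour of live transcription with GPT-Realtime-Whisper comes to just over a dollar, while reasoning-heavy GPT-Realtime-2 sessions scale with token volume instead of wall-clock time.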
If you’re looking for an AI model that’s capable of deep reasoning and adapting to conversational flows, OpenAI says the new GPT-Realtime-2 option is for you. Developers can use it to check multiple sources at once, adjust tone depending on user input, draw on more advanced reasoning, and handle specialized terms (such as proper nouns and expressions used in healthcare and manufacturing).
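Since the models are delivered over the Realtime API, which is accessed via WebSocket, a session starts by building a connection URL and auth headers. A minimal sketch follows; note that the article gives only the marketing name "GPT-Realtime-2", so the `gpt-realtime-2` model string used here is an assumption, not a confirmed API identifier:

```python
# Sketch of preparing a Realtime API WebSocket connection.
# ASSUMPTION: "gpt-realtime-2" is used as the model identifier; the article
# names the model but not its API string, so treat this as illustrative.
REALTIME_URL = "wss://api.openai.com/v1/realtime"

def build_connection(api_key: str,
                     model: str = "gpt-realtime-2") -> tuple[str, dict]:
    """Return the WebSocket URL and auth headers for a Realtime session."""
    url = f"{REALTIME_URL}?model={model}"
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers
```

A WebSocket client library would then open `url` with `headers` and stream audio events over the resulting connection.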
Translation apps, on the other hand, can use GPT-Realtime-Translate to convert speech in real time. Users will be able to speak their own language and have it translated and transcribed without delay. This model works with more than 70 input languages and 13 output languages.
And if you want audio transcribed quickly and accurately, there’s GPT-Realtime-Whisper. This model is useful for creating subtitles, meeting notes, and summaries as conversations are ongoing, OpenAI says, meaning “live products can appear faster, more responsive, and more natural.”
If you want to try any of the new models, they are available on OpenAI’s Playground site. And if you’re using Codex, OpenAI has created a prompt that will add GPT-Realtime-2 directly to the agentic coding platform.