I tried the most realistic AI voice companion ever created: if ChatGPT or Gemini ever get this good, reality is in trouble


I have spent a lot of time talking to AI. I have tried every voice assistant, every chatbot, and every "next-generation" conversational AI that technology companies love to promote. But I have never encountered anything like Sesame. This AI companion is not just good; it is unnervingly accurate at imitating how people speak, precisely because it mimics their imperfections.

Let's start with what Sesame actually is. Unlike the AI voices we know from ChatGPT and Gemini, or, going back further, the early days of Siri and Alexa, Sesame is designed to sound like a human, flaws and all, not like a perfect customer service agent. The AI's speech is fluid, expressive, and unpredictably human. It laughs briefly when it says something mildly funny, hesitates before answering a question, and even seems to change its "mind" mid-sentence, stopping and starting a new one. Not only does it let me interrupt it, it can also interrupt me, and it will even apologize for doing so.

(Image credit: Sesame)

The secret sauce is Sesame's Conversational Speech Model (CSM), which combines text and audio in a single process, meaning it doesn't just generate a sentence and then "read" it aloud. Instead, it creates speech in a way that reflects how humans really talk, with pauses, ums, tonal shifts, and all. The voice options in ChatGPT and Gemini, while impressive, still work in a structured way, generating text and then converting it to speech. Sesame, on the other hand, speaks as if it were thinking, which makes its responses feel incredibly natural.
