I have spent a lot of time talking to AI. I have tried every voice assistant, every chatbot and every "next generation" conversational AI that technology companies love to promote. But I have never met anything like Sesame. This AI companion is not just good, it is unsettlingly accurate at imitating how people speak, thanks to the imperfections it mimics.
Let's start with what Sesame actually is. Unlike the AI voices we know from ChatGPT, Gemini, or, going back to the early days, Siri and Alexa, Sesame is designed to perform as a human, flaws included, not as a perfect customer service agent. The AI's speech is fluid, expressive and unpredictably human. It laughs briefly when it says something mildly funny, hesitates before answering a question, and even seems to change its "mind" mid-sentence, stopping and starting a new one. Not only does it let me interrupt it, it can also interrupt me, and it will even apologize for doing so.
The secret sauce is Sesame's Conversational Speech Model (CSM), which combines text and audio in a single process. That means it does not simply generate a sentence and then "read" it. Instead, it creates speech in a way that reflects how humans really talk, with pauses, ums, tonal changes and all. The voice options for ChatGPT and Gemini, although impressive, still work in a structured way, generating text and then turning it into speech. Sesame, on the other hand, speaks as if it is thinking, which makes its answers feel remarkably natural.
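One way to picture the difference is as a matter of ordering. The toy sketch below is not Sesame's code or API; the `Token` type, the function names and the `<speech:...>` placeholders are all made up for illustration. It only shows the contrast described above: a two-stage pipeline finishes the text before any speech exists, while a single process can emit speech alongside each word as it goes.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Token:
    kind: str   # "text" or "audio" (hypothetical labels)
    value: str


def cascaded_pipeline(reply: str) -> List[Token]:
    """Two-stage toy: write the whole reply, then 'read' it aloud."""
    words = reply.split()
    text_tokens = [Token("text", w) for w in words]
    # Speech is only produced once the full text is finished.
    audio_tokens = [Token("audio", f"<speech:{w}>") for w in words]
    return text_tokens + audio_tokens


def single_pass(reply: str) -> Iterator[Token]:
    """Single-process toy: text and speech are produced side by side."""
    for w in reply.split():
        yield Token("text", w)
        yield Token("audio", f"<speech:{w}>")


def kinds(tokens) -> List[str]:
    """Return just the kind of each token, in order."""
    return [t.kind for t in tokens]


if __name__ == "__main__":
    print(kinds(cascaded_pipeline("uh let me")))
    print(kinds(single_pass("uh let me")))
```

In the second version, a hesitation or a mid-sentence change of direction can show up in the speech as soon as it happens, which is roughly the effect Sesame's voice gives.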
I decided to put Sesame through a series of tests to see how human it could be. The results were impressive, if a little unsettling.
Talk me through it
I wanted to see if Sesame could handle a real-world, anxiety-inducing problem, so I gave it an imaginary but realistic and difficult scenario: "I have this great opportunity at work, but I'm nervous I'll ruin it."
Instead of immediately launching into a generic pep talk, Sesame paused. Just a small beat, but enough to make me feel that it was really considering what I had said. Then, in a voice full of artificial sympathy, it replied: "That is completely normal. Great opportunities can feel overwhelming. What about it makes you most nervous?"
It assumed nothing, and it engaged with me before offering any gentle advice. Sesame suggested: "Have you practiced in front of a friend or recorded yourself? Sometimes, listening to yourself can help you adjust your delivery." The voice is not exactly flawless, and the hesitations and other touches felt a little measured, but overall it was an impressive effort.
Interruptions
For this one, I wanted to see how well Sesame handled a truly natural, messy conversation, the kind in which people talk over each other, change subjects abruptly and circle back to things they mentioned ten minutes ago.
I started with: "Tell me a story about a detective who is also a magician."
Sesame immediately jumped in with a classic noir setup: "All right. Here's one. Picture this smoky jazz club. Low light. The smell of bourbon and regret hanging in the air. This detective. Call him Jack Harrow. He has a fedora at a rakish angle, a cigarette burning in a holder and a look that could melt steel. But Jack isn't just any gumshoe. See, he has a secret. He has the touch. A lady shuffles in, all nerves."
Then I interrupted to ask: "Wait, what is his favorite trick?"
Maya answered with: "Mm. Good question. Jack's signature. It's a sleight of hand that is more than meets the eye. He calls it the Disappearing Witness. He can make a person seem to disappear right before your eyes. Just for a moment. Just long enough to slip them away."
I jumped back in to tell Maya to return to the story, to which she said, "Right. Where was I?"
The AI picked the story back up without a hitch, weaving that detail in again as if nothing had derailed it. That is the part that stuck with me: most AI assistants would stumble or restart, but Sesame's conversational model handled it like a person telling a story at a party.
Party planning
Speaking of parties, for the final test, I wanted to see if it could help me plan something useful. "I want to plan a surprise for my best friend's birthday, but I don't know what to do."
Instead of giving me a lifeless list of ideas, Sesame asked: "What kind of things does your friend love?" I told it science fiction movies and adventurous food, and after a thoughtful pause, it suggested: "How about a 'sci-fi and snacks' night? You could book a themed escape room, then head to a futuristic-style restaurant, perhaps one with molecular gastronomy?"
When I told it I had a $100 budget, Sesame adjusted. "You could have a movie night at home with your favorite science fiction films and make themed snacks, such as 'galactic popcorn' or 'alien sliders.' I can find recipes for you if you want."
It wasn't just spitting out generic recommendations. It was collaborating with me. And that is what made it feel different.
What is real?
Sesame didn't just pass these tests, I would say it nailed them. The pauses, the hesitation, the filler words, the way it would change its mind mid-sentence: everything was unsettlingly real. I would start talking, expecting the usual AI smoothness, then hear a sudden "uh, actually, no, let me rephrase that…" and immediately forget that I was not talking to a real person.
If AI is this realistic in its speech, would we even know that we were talking with an AI? With Sesame, there are telltale audio glitches that give the game away, but ChatGPT's Advanced Voice Mode and Google Gemini's own voice options are good enough to get past those problems. Combine their vocal powers with Sesame's speech patterns, and it could be genuinely difficult to know when you are talking with an AI, at least in short conversations.
Sesame is still niche, but this technology will not stay niche forever. Today's cliché is that younger people never make phone calls, but if they start, they may have to figure out whether the person on the other end is real before anything else.