- AI-generated voices now imitate humans so convincingly that detection is almost impossible
- Creating a convincing voice clone now takes only minutes and minimal expertise
- Some synthetic voices were even rated as more trustworthy than real human recordings
For years, many people assumed that AI-generated speech could always be identified by its slightly artificial quality.
New research from Queen Mary University of London challenges this assumption, showing that current AI voice technology has reached a point where "voice clones" and deepfakes are almost indistinguishable from real recordings.
In the study, participants compared human voices with two forms of synthetic audio: cloned voices designed to imitate specific speakers, and voices generated by an LLM-based system with no specific human counterpart.
Beyond realism and dominance
Listeners often struggled to tell them apart, suggesting that the technology has entered a phase in which human-level realism is no longer an aspiration but a reality.
The research team investigated not only whether participants could distinguish between synthetic and real voices, but also how they perceived them.
Surprisingly, both types of AI-generated voices were rated as more dominant than human voices, and in some cases they were judged more trustworthy.
Dr. Nadine Lavan, a psychology researcher at Queen Mary University of London, emphasized how easily and cheaply her team created these voice clones.
"AI-generated voices are all around us now. It was only a matter of time before AI technology began to produce naturalistic, human-sounding speech. The process required minimal expertise, only a few minutes of voice recordings and almost no money," she said.
She added that this ease of use shows just how far the technology has advanced in a short time.
This low barrier to entry creates opportunities in fields such as education, communication and accessibility, where custom synthetic voices could improve engagement and reach.
Just as AI writing tools raise questions about originality, copyright and misuse, voice generation is prompting debates about ownership of identity and consent.
If realistic audio can be created from a brief sample, the risks of unauthorized cloning become difficult to ignore.
As AI tools continue to grow in capability and accessibility, the challenge will be to ensure that the benefits are realized without opening new avenues for deception.
Understanding how people respond to these voices is only the first step toward addressing the ethical, legal and social implications of a technology that is no longer futuristic, but firmly present.