Of the many recent advances in the field of artificial intelligence, perhaps the most unnerving is AI voice technology. Modern tools have made it possible to quickly and cheaply replicate a specific person’s voice, and although these AI-generated versions are not yet perfect likenesses, it is important to be aware of the emerging threat they pose.
How do AI voice tools work and what are they used for?
Like other AI tools, voice cloning technology requires a comprehensive dataset to produce an accurate output. However, the scale of the data required for voice technology is relatively small, as each language uses a finite number of sounds – there are just 44 unique sounds, or phonemes, in the English language, all of which can be covered in just a couple of sentences of speech.
Once an AI system has this information, it can reconstruct any given word on demand by accessing the recordings and combining the relevant sounds, creating a digital voice model. This process is quick and easy, and is even offered for free by some companies. There are currently limits to the accuracy of AI voice models, however: they struggle to appropriately change intonation, which often results in a noticeable level of monotony, and they fail to replicate the verbal tics or particular sound effects an individual may be prone to using.
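The process described above can be illustrated with a toy sketch. This is a simplified illustration under stated assumptions, not how any particular commercial tool is implemented: the lexicon, the phoneme symbols, the function names and the short lists of numbers standing in for recorded audio are all hypothetical.

```python
# Toy sketch of the "combine recorded sounds" idea described above.
# Everything here is hypothetical: a three-word lexicon and short lists
# of numbers standing in for recorded audio samples.

# Hypothetical pronunciation lexicon: word -> phoneme symbols.
LEXICON = {
    "cat": ["K", "AE", "T"],
    "tack": ["T", "AE", "K"],
    "act": ["AE", "K", "T"],
}


def phonemes_for(text):
    """Flatten a phrase into its phoneme sequence using the toy lexicon."""
    return [p for word in text.lower().split() for p in LEXICON[word]]


def build_voice_model(recordings):
    """Keep the first clip seen for each phoneme from (phoneme, samples) pairs."""
    model = {}
    for phoneme, samples in recordings:
        model.setdefault(phoneme, samples)
    return model


def missing_phonemes(model, text):
    """List the phonemes the phrase needs that the model has no clip for."""
    return sorted(set(phonemes_for(text)) - set(model))


def synthesize(model, text):
    """Concatenate the stored clips in the order the phonemes occur."""
    audio = []
    for phoneme in phonemes_for(text):
        audio.extend(model[phoneme])
    return audio


# Stand-in "recordings" taken from a speaker saying the word "cat".
recordings = [("K", [0.1, 0.2]), ("AE", [0.3, 0.4, 0.5]), ("T", [0.6])]
model = build_voice_model(recordings)

# The same three clips can be rearranged into a word the speaker never said.
new_word_audio = synthesize(model, "tack")
```

Because each clip is replayed exactly as recorded, a sketch like this has no way to vary pitch or emphasis, which mirrors the monotony noted above.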
Voice technology has a limited range of applications compared to other AI systems. At present, AI voice models are most commonly used in media production, for example in the dubbing of foreign-language content, where they often provide a cheaper solution than hiring a voice actor. Even if an actor is used, their voice can be replicated virtually for reshoots and post-production changes, so they don’t have to physically return to set.
AI voice models are also seeing increasing use in remote customer service, where they provide a cost-effective way for companies to project a friendlier and more helpful image than traditional automated answering systems. They are also being used to give a voice back to those who have lost theirs: for throat cancer patients, for example, an AI voice built from old recordings of the person can enable them to “speak” again.
What are the risks?
While there are undeniable benefits to using AI voice models, some concerns surround the technology, and there have been examples of misuse. With AI in such a nascent stage, the legal ramifications of developing and deploying AI voice models remain unclear. The synthesizing of actors’ voices for use in post-production introduces issues of accountability and consent, as the models could potentially be programmed to say anything. This deployment of AI voice models is one of the key issues at the heart of the current impasse between AMPTP and SAG-AFTRA, with the actors demanding stricter controls over how the models are built and used.
The darker side of AI voice models
Perhaps the most concerning aspect of AI voice models is their use in criminal activity. Fraudsters can now use AI voice clones of people who are trusted by their target, convincing the target that they are talking to someone they know and making scams far more likely to succeed. This new technique has become so successful that the Federal Trade Commission in the US has issued a warning to the public on the matter.
Companies that offer AI voice model services typically state that their services are not to be used for malicious purposes or without consent, but this is difficult to enforce. A notable example comes from the company Descript, whose Overdub AI voice tool is among the most effective available. Its voice print system was easily defeated by a group who used modified podcast audio from their friend, without his consent, to deceive his coworkers. They published a video on YouTube documenting the process, highlighting how easily accessible and effective the technology is, to encourage Descript to implement greater security.
The risk to those impersonated via AI voice tools can be reputational as well as financial. A video clip circulating on social media recently appeared to show the respected financial analyst Martin Lewis recommending an investment scheme. Lewis issued a statement confirming that the clip was fake (it had been created using deepfake and AI voice technology), but not before some people had fallen prey to the scam. The model of Lewis’s voice on the video was frighteningly accurate, in large part due to the abundance of audio content of his voice that is available online – highlighting the particular reputational risk that AI voice tools pose to high-profile individuals.
Minimising the risks posed by AI voice models
Although such scams warrant concern, measures can be taken to minimise the risk of being tricked by them. In the case of phone scams, the most important action to take is to end the call and contact the person the caller claimed to be yourself, using a number you already know to be theirs, as you would with traditional phone fraud. Another option is to agree on code words and phrases that only your family knows, to help verify each other’s identities remotely. And for those who are at risk of being impersonated, it is wise to keep a record and transcript of all publicly available audio content that includes your voice.
It is also worthwhile to follow the research on voiceprint authentication. This technology aims to create a unique and unimpeachable biometric record of an individual’s voice that cannot be replicated by AI, to be used for authentication purposes. Voiceprints are already in use by organisations including the Australian Taxation Office and some financial institutions.
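The article does not describe how voiceprint systems compare voices, so the following is only a minimal sketch of one widely used idea in speaker verification: reduce each recording to a numeric vector and accept the speaker if the vectors are similar enough. The three-number vectors, the function names and the 0.8 threshold are all assumptions chosen for illustration; real systems derive much longer vectors from audio and tune their thresholds carefully.

```python
import math

# Hypothetical sketch: compare an enrolled voiceprint with a new attempt.
# The vectors below are made-up stand-ins for features extracted from audio.


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_same_speaker(enrolled, attempt, threshold=0.8):
    """Accept the attempt if it is close enough to the enrolled voiceprint.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return cosine_similarity(enrolled, attempt) >= threshold


enrolled = [0.9, 0.1, 0.4]     # stored when the customer registered
genuine = [0.85, 0.15, 0.42]   # same speaker on a later call
impostor = [0.1, 0.9, 0.2]     # a different speaker

accepted = is_same_speaker(enrolled, genuine)
rejected = not is_same_speaker(enrolled, impostor)
```

A check of this kind only measures how close two vectors are, so a sufficiently accurate clone could in principle also pass it, which is why the research on harder-to-replicate voiceprints matters.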
While AI voice models can have beneficial uses across multiple industries, the current lack of regulation and understanding of the technology, combined with its ease of access for those wishing to misinform or cause harm, means that they are not without risk. Until a sufficient regulatory framework is established, it is important to remain vigilant when engaging with AI voice models, and to be aware of the risks they can pose.