Of the many recent advances in the field of artificial intelligence, perhaps the most unnerving is that of AI voice technology. Modern tools have now made it possible to quickly and cheaply replicate a specific person’s voice, and although these AI-generated versions are not quite perfect likenesses yet, it is important to be aware of the emerging threat they pose.
How do AI voice tools work and what are they used for?
Like other AI tools, voice cloning technology requires a comprehensive dataset to produce an accurate output. However, the scale of the data required for voice technology is relatively small, because each language uses a finite number of sounds: English has just 44 unique sounds, or phonemes, all of which can be covered in a couple of sentences of speech.
Once an AI system has this information, it can reconstruct any given word on demand by accessing the recordings and combining the relevant sounds, creating a digital voice model. This process is quick and easy, and is even offered for free by some companies. There are currently limits to the accuracy of AI voice models, however: they struggle to appropriately change intonation, which often results in a noticeable level of monotony, and they fail to replicate the verbal tics or particular sound effects an individual may be prone to using.
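The concatenative approach described above can be sketched in a few lines of code. This toy Python example is purely illustrative: the phoneme “recordings” are placeholder number lists rather than real audio, and the pronunciation dictionary is a made-up fragment, but it shows the core idea of rebuilding words by stitching a speaker’s recorded sounds together.

```python
# Toy sketch of concatenative voice synthesis (illustrative only).
# A real system would store sampled audio per phoneme and smooth the joins;
# here each "recording" is just a short list of numbers standing in for samples.

# Hypothetical phoneme bank captured from a speaker's recordings.
phoneme_bank = {
    "HH": [0.1, 0.2],   # placeholder "audio" for the /h/ sound
    "EH": [0.3, 0.4],
    "L":  [0.5, 0.6],
    "OW": [0.7, 0.8],
}

# Hypothetical pronunciation dictionary mapping words to phoneme sequences.
pronunciations = {"hello": ["HH", "EH", "L", "OW"]}

def synthesize(word):
    """Rebuild a word by concatenating the speaker's recorded phoneme clips."""
    samples = []
    for phoneme in pronunciations[word]:
        samples.extend(phoneme_bank[phoneme])
    return samples

print(synthesize("hello"))  # one clip per phoneme, joined in order
```

Once the bank covers all 44 English phonemes, any word in the pronunciation dictionary can be assembled on demand; the monotony the article mentions arises because simple concatenation like this carries no intonation across the joins.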
Voice technology has a limited range of applications compared to other AI systems. At present, AI voice models are most commonly used in media production, for example in the dubbing of foreign language content, where they often provide a cheaper solution than hiring a voice actor. Even if an actor is used, their voice can be replicated virtually for reshoots and post-production changes, meaning they do not have to physically return to set.
AI voice models are also seeing increasing use in remote customer service, where they offer companies a cost-effective way to project a friendlier, more helpful image than traditional automated answering systems. They are also giving a voice back to people who have lost theirs: for throat cancer patients, for example, an AI voice built from old recordings of the person can restore their ability to “speak”.
What are the risks?
While there are undeniable benefits to using AI voice models, some concerns surround the technology, and there have been examples of misuse. With AI in such a nascent stage, the legal ramifications of developing and deploying AI voice models remain unclear. The synthesizing of actors’ voices for use in post-production introduces issues of accountability and consent, as the models could potentially be programmed to say anything. This deployment of AI voice models is one of the key issues at the heart of the current impasse between the AMPTP and SAG-AFTRA, with the actors demanding stricter controls over how the models are built and used.
The darker side of AI voice models
Perhaps the most concerning aspect of AI voice models is their use in criminal activity. Fraudsters can now deploy AI voice clones of people their targets trust, convincing victims that they are talking to someone they know and making scams far more likely to succeed. This technique has become so successful that the Federal Trade Commission in the US has issued a public warning on the matter.
Companies that offer AI voice model services invariably state that their tools must not be used for malicious purposes or without consent, but this is difficult to enforce. A notable example comes from the company Descript, whose Overdub AI voice tool is among the most effective available. Its voiceprint consent system was easily defeated by a group who used modified podcast audio from their friend, without his consent, to deceive his coworkers. They published a video on YouTube documenting the process, highlighting how accessible and effective the technology is, in order to encourage Descript to implement greater security.
The risk to those impersonated via AI voice tools can be reputational as well as financial. A video clip circulating on social media recently appeared to show the respected financial journalist Martin Lewis recommending an investment scheme. Lewis issued a statement confirming that the clip was fake (it had been created using deepfake video and AI voice technology), but not before some people had fallen prey to the scam. The model of Lewis’s voice in the video was frighteningly accurate, largely because of the abundance of audio of his voice available online – highlighting the particular reputational risk that AI voice tools pose to high-profile individuals.
Minimising the risks posed by AI voice models
Although such scams warrant concern, measures can be taken to minimise the risk of being tricked by them. In the case of phone scams, the most important step is to end the call and contact the supposed caller yourself on a number you know to be theirs, just as you would with traditional phone fraud. Another option is to agree code words and phrases that only your family knows, to help verify each other’s identities remotely. And for those at risk of being impersonated, it is wise to keep a record and transcript of all publicly available audio content that includes your voice.
It is also worthwhile to follow the research on voiceprint authentication. This technology aims to create a unique biometric record of an individual’s voice that cannot be replicated by AI, to be used for authentication purposes. Voiceprints are already in use by organisations including the Australian Taxation Office and some financial institutions.
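To illustrate the principle (not any real provider’s implementation), a voiceprint check can be thought of as comparing a stored embedding of a person’s voice against an embedding of the incoming audio. The vectors and threshold below are made-up demonstration values; real systems derive embeddings from spectral features and use far larger vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def verify_speaker(enrolled, sample, threshold=0.95):
    """Accept a caller only if their voice embedding closely matches the enrolled voiceprint."""
    return cosine_similarity(enrolled, sample) >= threshold

# Hypothetical three-dimensional embeddings (real ones have hundreds of dimensions).
enrolled_print = [0.9, 0.1, 0.4]
genuine_sample = [0.88, 0.12, 0.41]   # same speaker, slight natural variation
imposter_sample = [0.1, 0.9, 0.2]     # different speaker

print(verify_speaker(enrolled_print, genuine_sample))   # True
print(verify_speaker(enrolled_print, imposter_sample))  # False
```

The open question the research addresses is whether such embeddings can be made robust against AI-generated audio that is deliberately optimised to pass the threshold.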
While AI voice models can have beneficial uses across multiple industries, the current lack of regulation and understanding of the technology, combined with the ease of access for those wishing to misinform or cause harm, means that they are not without risk. Until a sufficient regulatory framework is established, it is important to remain vigilant when engaging with AI voice models, and to be aware of the risks they can pose.