In recent years, the development of deepfake technology has created a new frontier in digital manipulation. Deepfakes are generated through artificial intelligence algorithms that can create realistic audio and video clips of people saying or doing things that they never actually did. While this technology has the potential to be used for creative and entertainment purposes, it also poses significant risks and concerns related to disinformation, identity theft, and privacy violations.
One of the most pressing concerns regarding deepfakes is their potential to deceive people into believing that they are real. For example, deepfake audio clips could be used to spread false information or incite political unrest by creating fake statements from public figures or political leaders.
However, researchers are developing new techniques and tools to help detect deepfakes and distinguish them from real audio recordings. One promising approach is to use biometric authentication techniques, such as x-vectors and cosine similarity search, to compare the characteristics of real and deepfake voices.
X-vectors are a type of feature embedding extracted using deep neural networks, representing speaker characteristics from audio recordings. These characteristics include features such as the pitch, tone, and rhythm of the voice. By analysing these features, researchers can create a unique “voiceprint” that can be used to identify a specific individual’s voice.
Cosine similarity search, on the other hand, measures how similar the speakers in different audio clips are based on their acoustic features. This technique can be used to determine whether two audio recordings were produced by the same person or by different people.
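To make the idea concrete, here is a minimal sketch of the cosine similarity calculation, assuming the embedding vectors (e.g. x-vectors) have already been extracted from the audio. The 4-dimensional vectors below are illustrative toys; real x-vectors are typically several hundred dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "voiceprints" (real x-vectors are high-dimensional).
enrolled = np.array([0.90, 0.10, 0.40, 0.30])   # enrolled speaker's voiceprint
same     = np.array([0.88, 0.12, 0.41, 0.28])   # another clip of the same speaker
other    = np.array([0.10, 0.90, 0.20, 0.70])   # a different speaker

print(cosine_similarity(enrolled, same))   # close to 1.0
print(cosine_similarity(enrolled, other))  # noticeably lower
```

The score is bounded between −1 and 1, so it converts naturally into the percentage scores quoted later in this article.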
To test the effectiveness of these biometric authentication techniques, researchers at Intelligent Voice recently conducted a study that compared real and deepfake audio recordings of former US President Barack Obama, as well as comparing real recordings to those made by voice actors. In the study, the researchers enrolled Obama’s “real” voice into a biometric database, alongside deepfake voices generated using artificial intelligence and the actor voices. They then compared the real and “fake” voices against five real clips and one fake clip of Obama speaking.
The results of the study were both surprising and reassuring. The system correctly identified the real clips as real above the 80% threshold required for a match, whereas every “fake” audio file was rejected.
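The enrol-then-verify flow described above can be sketched as follows. The voiceprints, names, and resulting scores here are randomly generated placeholders, not the study’s actual data; only the 80% threshold comes from the article.

```python
import numpy as np

MATCH_THRESHOLD = 0.80  # as in the study, only scores above 80% count as a match

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical enrolled voiceprints (stand-ins for real x-vectors).
rng = np.random.default_rng(0)
obama_real = rng.normal(size=512)
enrolled = {
    "Obama_Real": obama_real,
    "Deepfake_1": rng.normal(size=512),
    "Actor_1":    rng.normal(size=512),
}

# A test clip of the genuine speaker: close in direction to the enrolled voiceprint.
test_clip = obama_real + 0.1 * rng.normal(size=512)

for name, voiceprint in enrolled.items():
    score = cosine_similarity(voiceprint, test_clip)
    verdict = "MATCH" if score > MATCH_THRESHOLD else "rejected"
    print(f"{name}: {score:.2%} -> {verdict}")
```

Because unrelated high-dimensional vectors are nearly orthogonal, the impostor voiceprints score near zero while the genuine clip scores close to 100%, mirroring the accept/reject behaviour the study reports.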
The team started with only one “real” audio file to simulate real-life matching conditions, which they took from https://www.youtube.com/watch?v=iaxqTVNHFB0 (labelled “Obama_Real”).
Can’t believe your eyes?
One of the most famous deepfake Barack Obama videos was put together by actor and director Jordan Peele (it can be found at https://www.youtube.com/watch?v=cQ54GDm1eL0).
The similarity between Obama_Real and the Jordan Peele “fake” audio was only 67.19%. For a match to be considered accurate, the score must be above 80%. Even though the video and audio are extremely believable, the voiceover is actually performed by Jordan Peele himself. So already we can see that the system is able to detect a fake even when the voice is a very believable, human-generated one.
To test against actual deepfake audio, four clips were obtained from https://this-voice-does-not-exist.com/barack_obama_deepfake: synthetic voices that sounded like the real Obama to the human ear. Again, only scores above 80% are regarded as a match by the biometric system.
The match between the enrolled real Obama Voice and the AI generated voice was: