Audio cloning can take over a phone call in real time without the speakers knowing

Joel R. McConvey
Biometricupdate.com
Tue, 06 Feb 2024 19:41 UTC

Generative AI could be listening to your phone calls and hijacking them with fake biometric audio for fraud or manipulation purposes, according to new research published by Security Intelligence. In the wake of a Hong Kong fraud case that saw an employee transfer US$25 million in funds to five bank accounts after a virtual meeting with what turned out to be audio-video deepfakes of senior management, the biometrics and digital identity world is on high alert, and the threats are growing more sophisticated by the day.

A blog post by Chenta Lee, chief architect of threat intelligence at IBM Security, breaks down how researchers from IBM X-Force successfully intercepted and covertly hijacked a live conversation by using a large language model (LLM) to understand the conversation and manipulate it for malicious purposes - without the speakers knowing it was happening.

"Alarmingly," writes Lee, "it was fairly easy to construct this highly intrusive capability, creating a significant concern about its use by an attacker driven by monetary incentives and limited to no lawful boundary."

Hack used a mix of AI technologies and a focus on keywords

By combining large language models (LLMs), speech-to-text, text-to-speech and voice cloning tactics, X-Force was able to dynamically modify the context and content of a live phone conversation. The method eschewed the use of generative AI to create a whole fake voice and focused instead on replacing keywords in context - for example, masking a spoken real bank account number with an AI-generated one. The tactics can be deployed through a number of vectors, such as malware or compromised VoIP services. A three-second audio sample is enough to create a convincing voice clone, and the LLM takes care of parsing and semantics.

"It is akin to transforming the people in the conversation into dummy puppets," writes Lee. "And due to the preservation of the original context, it is difficult to detect." With advanced social engineering added to the mix, the size of the attack surface only grows. Outside of fraud, Lee also points to the potential for a new kind of real-time censorship, which could have dire implications for political discourse, journalism and the general fabric of reality.

In light of the ease with which they were able to create a successful proof of concept for dynamic voice hijacking, Lee says it is crucial to recognize that "trusted and secure AI is not confined to the AI models themselves. The broader infrastructure must be a defensive mechanism for our AI models and AI-driven attacks."

Pindrop says software identifies deepfakes more effectively than humans

According to Pindrop, a further complication is that humans are not very good at detecting fake speech. Writing on the firm's blog, Head of Brand and Digital Experience Laura Fitzgerald cites new research from UCL showing that humans could detect artificially generated speech only 73 percent of the time.

"Using generative AI technology, bad actors can inject voice into real-time streams, leading to significant fraud loss, the spread of misinformation, and damaged brand reputation," writes Fitzgerald. The firm says its biometric voice engine, Pindrop Pulse, can outperform humans at deepfake detection.

"In our lab testing with 11 million sample test data sets, Pindrop Pulse can detect a deepfake 99 percent of the time," says Fitzgerald. Pindrop's technology processes a call's metadata to generate predictions and risk scores. The Passport software provides additional risk analysis based on multiple inputs. Risk APIs display liveness scores in real time, and policies can be calibrated to filter deepfake calls.

The capabilities of AI and LLMs are increasing at speed. "AI performance on benchmark charts can show that it's surpassed humans at several tasks," writes Fitzgerald. "And the rate at which humans are being surpassed at new tasks is increasing." Defenses must be nimble and adaptable, as the curve trends upward into unknown territory.
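
As a rough illustration of what calibrating a policy against a real-time liveness score might look like, here is a minimal Python sketch. It is not Pindrop's API: the names `CallPolicy` and `route_call`, the 0-to-1 score range and the threshold values are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallPolicy:
    """Hypothetical thresholds; real deployments would tune these on their own data."""
    block_below: float = 0.2   # scores under this are treated as likely deepfakes
    review_below: float = 0.6  # scores under this are sent for extra verification


def route_call(liveness_score: float, policy: CallPolicy = CallPolicy()) -> str:
    """Map an assumed 0-to-1 liveness score to an action: block, review or allow."""
    if not 0.0 <= liveness_score <= 1.0:
        raise ValueError("liveness_score must be between 0 and 1")
    if liveness_score < policy.block_below:
        return "block"
    if liveness_score < policy.review_below:
        return "review"
    return "allow"


if __name__ == "__main__":
    for score in (0.05, 0.45, 0.93):
        print(score, route_call(score))
```

Raising `block_below` filters more suspicious calls at the cost of more false alarms, which is the kind of trade-off a calibrated policy has to settle.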

Reader Comments

hobnob · 2024-02-17T10:01:36Z

A three second audio sample is enough to create a convincing voice clone

One wonders why some nuke hasn't been set off yet. The logical conclusion is that the fake-human deep-throated ones are not suicidal.

isc6822 · 2024-02-17T17:29:23Z

Just like the 'real-time editing' of the scrimmage line in football, or strike zone in baseball, or the addition of 'airplanes striking some buildings' and 'random cameras' that are focused on that Exact location...

Gator · 2024-02-18T18:47:58Z

isc6822 "Just like the 'real-time editing' of the scrimmage line in football, or strike zone in baseball, or the addition of 'airplanes striking some buildings' and 'random cameras' that are focused on that Exact location..."

Maybe we live in a sim created by deepfake AI already. I recall Elon Musk on Joe Rogan's show, with Elon asking, "Joe, how would you know if we were in a sim right now?" It went right over Joe's head, because he replied, "With AI we will eventually get there, right?" Elon said no, we may be in it now!

isc6822 · 2024-02-18T19:07:09Z

Gator You can call it a 'simulation', but I recognize that this world is an illusion - albeit, a persistent one OF OUR OWN MAKING.

I've posted these items before, but your suggestion encouraged me to post them again...

Each of us is an Infinite, Multi-dimensional CREATOR Spirit, Experiencing a brief Focus within a particular holographic space-time continuum of our Own Collective Creation in a Consensus-based physical Reality of Probabilities... where ALL Probabilities are Actualized.

We are the painter and the painting.

It is up to us, individually and collectively, to expand our consciousness to the point that we understand this concept in its totality.

Earth is a 'kindergarten' to learn the 'basics' of physical life. It's plainly obvious to me that MANY have Yet to learn even these basics - that Everyone is but a Mirror aspect of yourself. That when you hurt, maim or kill another, you are Actually doing it to Yourself.

I remain hopeful in that there are 'entities' who Watch Us, and try their best to guide us in the right direction, and will do so at the appropriate time - not that they haven't already been of significant influence already, but humans can be quite stubborn, as All children are.

It's time for humanity's Childhood to END.

Gator · 2024-02-18T19:13:07Z

isc6822 Thanks for your reply and insights. "Reality is an illusion, albeit a persistent one" is one of my favourite quotes, and one which I try to live by. I also agree that there are likely entities watching us. The C's, which you may be aware of, is one such entity and is apparently us in the future communicating back in time to provide us a little guidance. Here is a link to the recent transcripts. The next one should be out very soon, as they tend to be released every 4-6 weeks.

Cheers.

lilies · 2024-02-28T19:49:28Z

Sounds like the group 'Security Intelligence' can't wait to receive millions of dollaros sent to them by AI-voice-duped people.
