There have long been suspicions that microphones listen in on us, with reports back in 2013 alleging that the FBI itself was using this technique to spy on people. A couple of weeks ago, the University of Wisconsin-Madison published its findings on how muted microphones were still listening in during video conferencing sessions. Perhaps surprisingly, even headphones can be used as mics to spy on people.
All of this is a privacy concern, and there finally seems to be a breakthrough: a new algorithm developed at Columbia University that claims to address some of these concerns.
Essentially, the new algorithm's attack consists of two things. First, it blurs a person's speech using sound kept close to the quietness of a whisper, so that an automatic speech recognition (ASR) AI finds it hard to decipher that speech. Second, it predicts what is going to be said next, so that it always remains one step ahead of the ASR. Hence, the new approach is being referred to as "Predictive Attacks".
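To make the second point concrete, here is a minimal Python sketch of the timing idea only. It is not the Columbia team's code: forecast_mask, run_stream, the chunk values, and the lookahead and amplitude parameters are all hypothetical, and the "predictor" is a trivial stand-in for the learned model. It only shows that the masking sound played over any chunk of speech was computed from audio heard earlier.

```python
def forecast_mask(history, amplitude=0.05):
    """Hypothetical stand-in for the learned predictor (assumption).

    Returns a quiet masking value derived from the most recent chunks,
    clipped so that it never exceeds the allowed amplitude.
    """
    recent = history[-3:]
    trend = sum(recent) / len(recent)
    return max(-amplitude, min(amplitude, -trend))


def run_stream(chunks, lookahead=2, amplitude=0.05):
    """Simulate a live stream, one chunk of speech per step.

    At step t the mask for step t + lookahead is computed from the audio
    heard so far, so it is ready before that future chunk is spoken.
    Returns a list of (step, mask_played, step_when_mask_was_computed).
    """
    scheduled = {}  # playback step -> (mask, step at which it was computed)
    history = []
    played = []
    for t, chunk in enumerate(chunks):
        history.append(chunk)
        scheduled[t + lookahead] = (forecast_mask(history, amplitude), t)
        if t in scheduled:
            mask, computed_at = scheduled.pop(t)
            played.append((t, mask, computed_at))
        else:
            # Nothing was predicted early enough for the first few chunks.
            played.append((t, 0.0, None))
    return played


if __name__ == "__main__":
    for step, mask, computed_at in run_stream([0.2, -0.4, 0.6, 0.1, -0.3]):
        print(step, mask, computed_at)
```

In this toy setup the first lookahead chunks go unmasked, because no prediction could have been made for them yet. Every later mask was computed exactly lookahead steps before it is played.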
Carl Vondrick, assistant professor of computer science at the university, has briefly explained how the technology works:
"Our algorithm, which manages to block a rogue microphone from correctly hearing your words 80% of the time, is the fastest and the most accurate on our testbed. It works even when we don’t know anything about the rogue microphone, such as the location of it, or even the computer software running on it. It basically camouflages a person’s voice over-the-air, hiding it from these listening systems, and without inconveniencing the conversation between people in the room."
Mia Chiquier, lead author of the study and a PhD student under Vondrick, further adds:
"Our algorithm is able to keep up by predicting the characteristics of what a person will say next, giving it enough time to generate the right whisper to make.
"So far our method works for the majority of the English language vocabulary, and we plan to apply the algorithm on more languages, as well as eventually make the whisper sound completely imperceptible."
It was empirically found that the new algorithm does best when predicting 0.5 seconds into the future. This was determined by comparing it against other methods of attacking speech samples, including uniform noise, offline Projected Gradient Descent (PGD), and online (real-time) PGD.
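For readers unfamiliar with PGD, the following is a toy Python sketch of the general technique, assuming a made-up linear "recognizer" with a sigmoid confidence score. None of the names or numbers here come from the study, and a real ASR model is far more complex. The sketch only illustrates the loop PGD relies on: nudge the noise in the direction that hurts the recognizer, then clip it back so it stays quiet.

```python
import math


def confidence(weights, bias, signal):
    """Confidence of a hypothetical linear recognizer (assumption)."""
    z = bias + sum(w * s for w, s in zip(weights, signal))
    return 1.0 / (1.0 + math.exp(-z))


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def pgd_perturbation(weights, bias, signal, eps=0.05, step=0.02, iters=10):
    """Projected gradient descent under a per-sample loudness bound eps.

    Each iteration takes a signed gradient step that increases the loss
    -log(confidence), then projects the noise back into [-eps, eps].
    """
    delta = [0.0] * len(signal)
    for _ in range(iters):
        noisy = [s + d for s, d in zip(signal, delta)]
        p = confidence(weights, bias, noisy)
        # Gradient of -log(p) with respect to each input sample.
        grad = [-(1.0 - p) * w for w in weights]
        # Step in the direction that increases the loss.
        delta = [d + step * sign(g) for d, g in zip(delta, grad)]
        # Projection: keep every noise sample within the quiet budget.
        delta = [max(-eps, min(eps, d)) for d in delta]
    return delta


if __name__ == "__main__":
    weights, bias = [1.0, -2.0, 0.5], 0.0
    speech = [0.5, -0.5, 1.0]
    noise = pgd_perturbation(weights, bias, speech)
    attacked = [s + n for s, n in zip(speech, noise)]
    print(confidence(weights, bias, speech), confidence(weights, bias, attacked))
```

An offline attack of this kind gets to see the whole speech sample before computing its noise, which is exactly what a live conversation does not allow.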
The algorithm was also tested against standard ASR systems and their robust counterparts. The underlying study is titled "Real-Time Neural Voice Camouflage" and is available as a PDF, alongside an official blog post.
While the technology sounds great and helpful, it is still in the study phase. This means it could be a while before we are actually able to use it, if ever.