Google’s DeepMind artificial intelligence system has learned another useful skill. The soon-to-become-Skynet AI can now read lips more accurately than humans, after watching thousands of hours of political talk shows.
In a collaboration with the University of Oxford, researchers at Google’s DeepMind trained a machine-learning algorithm to recognize words and phrases simply by watching TV. After going through more than 5,000 hours of TV shows, DeepMind’s “Watch, Listen, Attend and Spell” program read human speech with 46.8 percent accuracy. A professional human lip-reader scored just 12.4 percent.
These numbers may seem small next to LipNet, a different lip-reading project created at Oxford, which scored above 90 percent in accuracy tests. However, LipNet was tested on a limited set of predetermined phrases, while DeepMind’s system was tested on something much closer to real-world speech. This means Google’s system may be a lot more robust and closer to being ready for public use.
Of course, given the impressive speed at which such systems seem to be developing, there’s one aspect that may have you worried: privacy. Could machines spy on us and our conversations in the near future? The answer is obviously yes, though the researchers working on this project explain that there’s still a big difference between lip-reading well-lit, HD footage and taking the system out into the real world. Still, the possibility is very much there.