In the past few years, it has become increasingly clear that a facial recognition system is only as good as the data used to train it. Critics and industry insiders alike have tried to raise awareness about racial bias, noting that algorithms trained primarily on white faces will struggle when asked to identify Black ones.
Unfortunately, there is not as much consensus about how to address the problem. Watchdogs like Fight for the Future have argued that the technology is inherently flawed, and have asked lawmakers to ban the public use of facial recognition. Technology providers like Onfido, on the other hand, have argued that facial recognition is worth pursuing, and that any bias concerns can be addressed with more representative datasets.
That debate gained a sharper focus following the release of a new paper from a team of researchers at Tel Aviv University. Ron Shmelkin, Tomer Friedlander, and Lior Wolf are members of the University’s Blavatnik School of Computer Science and its School of Electrical Engineering, and claim that they were able to create nine “master faces” that can impersonate more than 40 percent of the general population during a facial recognition scan.
Is Facial Recognition Safe?
At a glance, the report would seem to be damning for facial recognition advocates. If true, it would essentially confirm the worst fears of the technology’s critics. Any face-based identification system would be extremely vulnerable to spoofing, and hackers would not even need a real image of their target’s face in order to execute an attack. They could simply generate a fake master face, and use that to compromise millions of accounts.
So how does the researchers’ system work? The master faces were created with Nvidia’s StyleGAN system, which produces fake (read: computer generated) faces that look reasonably realistic. The faces are not based on any real-world individuals, but they may bear a passing resemblance to someone you might meet on the street.
The researchers set out to exploit that tendency, comparing their StyleGAN faces to real photos in the University of Massachusetts’ Labeled Faces in the Wild (LFW) dataset. They then used a classifier algorithm to determine whether the fake faces matched the real ones, and kept the fake images if there was a strong resemblance. Those results were used to train a separate evolutionary algorithm that could produce better fakes when the process was repeated, in the hope that they would capture a higher percentage of the population with each iteration.
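That generate-compare-refine loop can be sketched in miniature. In the toy code below, random vectors stand in for StyleGAN latent codes, cosine similarity stands in for the face-matching classifier, and a simple mutate-and-keep-the-best evolutionary step stands in for the paper’s optimizer; every name, dimension, and threshold here is illustrative, not taken from the paper.

```python
import math
import random

random.seed(42)

DIM, N_PEOPLE = 16, 300

# Toy stand-ins: each "person" in the dataset and each candidate face
# is just a latent vector. (Illustrative only -- the actual work uses
# StyleGAN latents and a real face-recognition matcher.)
population = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PEOPLE)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def coverage(candidate, threshold=0.4):
    """How many 'identities' this candidate face would falsely match."""
    return sum(cosine(candidate, p) >= threshold for p in population)

def mutate(candidate, sigma=0.3):
    return [x + random.gauss(0, sigma) for x in candidate]

# A (1+1) evolutionary loop with elitism: keep the best candidate so far,
# propose a mutated offspring, and adopt it only if it covers more people.
best = [random.gauss(0, 1) for _ in range(DIM)]
initial = coverage(best)
for _ in range(200):
    child = mutate(best)
    if coverage(child) > coverage(best):
        best = child

print("coverage before:", initial, "after:", coverage(best))
```

Because the loop only ever replaces the incumbent with a strictly better child, coverage can only go up across iterations, which mirrors the researchers’ goal of capturing a higher percentage of the population with each round.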
That process eventually culminated with the nine master faces detailed in the report. The researchers described them as master keys that could unlock the three facial recognition systems that were used to test the theory. In that regard, they challenged the Dlib, FaceNet, and SphereFace systems, and their nine master faces were able to impersonate more than 40 percent of the 5,749 people in the LFW set.
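To make the headline figure concrete: nine faces covering 40 percent of 5,749 identities means roughly 2,300 false matches in total. A toy greedy sketch shows how a small set of faces can stack up that kind of cumulative coverage; the match sets below are randomly generated for illustration and are not the paper’s data.

```python
import random

random.seed(1)

N_IDENTITIES = 5749   # size of the LFW identity set cited in the report
N_CANDIDATES = 100    # hypothetical pool of generated candidate faces

# Hypothetical: each candidate face falsely matches a random subset
# of identities (in the paper, these sets come from a real matcher).
matched = [
    set(random.sample(range(N_IDENTITIES), random.randint(200, 700)))
    for _ in range(N_CANDIDATES)
]

# Greedily pick nine "master faces": at each step, take the candidate
# that adds the most not-yet-covered identities.
covered, chosen = set(), []
for _ in range(9):
    best = max(matched, key=lambda s: len(s - covered))
    chosen.append(best)
    covered |= best

print(f"{len(covered) / N_IDENTITIES:.1%} of identities covered by 9 faces")
```

The point of the sketch is only that coverage compounds: each added face needs to catch fresh identities, not repeat the ones already covered, which is why the demographics of the candidate pool matter so much.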
While those numbers are obviously concerning, there is good reason to question the researchers’ conclusion. Only two of the nine master faces depict women, and most depict white men over the age of 60. In plain terms, that means the master faces are not representative of the global public, and they are not nearly as effective when applied to anyone who falls outside one particular demographic.
That discrepancy can largely be attributed to the limitations of the LFW dataset. Women make up only 22 percent of the dataset, and the numbers are even lower for children, the elderly (those over the age of 80), and for many ethnic groups.
It is possible that another team could use the same process to produce more master keys with more representative data. The Tel Aviv University researchers make that case themselves, arguing that their technique can scale and therefore exposes a major facial recognition flaw.
Even so, the claim that their nine keys cover 40 percent of the public is exaggerated at best. It also becomes more dubious when accounting for the facial recognition systems that produced that number. Dlib, FaceNet, and SphereFace are not commercial facial recognition systems. Rather, they are simply the most accurate systems tested using the LFW dataset, which means they are likely to exhibit many of the same biases as the set itself, and to lack the robustness that would be expected from a more rigorous facial recognition system.
Caution, Consideration, and Good Data
Given the nature of the data, it’s unclear if the report has any bearing on the real world. There is a chance that master faces could evolve into a significant threat. There’s also a chance that countermeasures like liveness detection make the more advanced commercial systems resistant to such forms of spoofing.
Either way, the report clarifies the ideological stakes in the facial recognition debate. For many, the mere existence of the technology threatens privacy and civil liberties. Even if the numbers are overblown, the damage of a security breach cannot be undone, and that possibility (heightened with a master face) may be enough for some to swear off the technology entirely.
The counterargument is that better development and testing practices can make facial recognition safe enough to use in practical settings. The question is whether or not governments and private companies can be trusted to live up to those standards. The new report underscores the importance of quality data, and developers will need to keep that in mind if they want to alleviate those fears.
August 11, 2021 – by Eric Weiss