EXPERTS at recognising faces often play a crucial role in criminal cases. A photo from a security camera can mean prison or freedom for a defendant, and testimony from highly trained forensic face examiners informs the jury whether that image depicts the accused. But just how good are facial recognition experts? Would artificial intelligence improve their accuracy?
A study in the Proceedings of the National Academy of Sciences has brought answers. In work that combines forensic science with psychology and computer vision research, a team of scientists from the National Institute of Standards and Technology (NIST) and three universities, including psychologist David White from Australia’s University of New South Wales, has tested the accuracy of professional face identifiers, providing at least one revelation that surprised even the researchers: Trained human beings perform best with a computer as a partner, not another person.
“This is the first study to measure face identification accuracy for professional forensic facial examiners, working under circumstances that apply in real-world casework,” said NIST electronic engineer P. Jonathon Phillips. “Our deeper goal was to find better ways to increase the accuracy of forensic facial comparisons.”
The NIST study is the most comprehensive examination to date of face identification performance across a large, varied group of people. The study also examines the best technology, comparing the accuracy of state-of-the-art face recognition algorithms to that of human experts. The results from this classic confrontation of human versus machine? Neither gets the best results alone. Maximum accuracy was achieved with a collaboration between the two.
“Societies rely on the expertise and training of professional forensic facial examiners, because their judgments are thought to be best,” said co-author Alice O’Toole, a professor of cognitive science at the University of Texas at Dallas. “However, we learned that to get the most highly accurate face identification, we should combine the strengths of humans and machines.”
The results arrive at a timely moment in the development of facial recognition technology, which has been advancing for decades, but has only very recently attained competence approaching that of top-performing humans.
“If we had done this study 3 years ago, the best computer algorithm’s performance would have been comparable to an average untrained student,” Phillips said. “Nowadays, state-of-the-art algorithms perform as well as a highly trained professional.”
The study itself involved a total of 184 participants, a large number for an experiment of this type. Of these, 87 were trained professional facial examiners, while 13 were super recognizers, a term implying exceptional natural ability. The remaining 84 — the control groups — included 53 fingerprint examiners and 31 undergraduate students, none with training in facial comparisons.
For the test, the participants received 20 pairs of face images and rated the likelihood of each pair being the same person on a 7-point scale. The research team intentionally selected extremely challenging pairs, using images taken with limited control of illumination, expression and appearance. They then tested four of the latest computerized facial recognition algorithms, all developed between 2015 and 2017, using the same image pairs.
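To make the scoring concrete, here is a minimal sketch of how ratings like these could be turned into a single accuracy figure. It uses the area under the ROC curve (AUC), which is the chance that a randomly chosen same-person pair receives a higher rating than a randomly chosen different-person pair. The article does not say which metric the team used, so the choice of AUC, the coding of the 7-point scale as -3 to +3, and the ratings themselves are all illustrative assumptions.

```python
def auc(same_ratings, diff_ratings):
    """Fraction of (same, different) pair combinations ranked correctly.

    A tie between the two ratings counts as half a correct ranking.
    """
    wins = 0.0
    for s in same_ratings:
        for d in diff_ratings:
            if s > d:
                wins += 1.0
            elif s == d:
                wins += 0.5
    return wins / (len(same_ratings) * len(diff_ratings))


# Hypothetical ratings from one participant (assumed scale: -3 = sure the
# faces are different people, +3 = sure they are the same person).
same_person_pairs = [3, 2, 1, -1]
different_person_pairs = [-3, -2, 0, 1]

print(auc(same_person_pairs, different_person_pairs))  # 0.84375
```

A score of 1.0 would mean every same-person pair was rated above every different-person pair, and 0.5 would be no better than guessing.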
Three of the algorithms were developed by Rama Chellappa, a professor of electrical and computer engineering at the University of Maryland, and his team, who contributed to the study. The algorithms were trained to work in general face recognition situations and were applied without modification to the image sets.
One of the findings was unsurprising but significant to the justice system: The trained professionals did significantly better than the untrained control groups. This result established the superior ability of the trained examiners, thus providing for the first time a scientific basis for their testimony in court.
The algorithms also acquitted themselves well, as might be expected from the steady improvement in algorithm performance over the past few years. What raised the team’s collective eyebrows was the performance of multiple examiners. The team discovered that combining the opinions of multiple forensic face examiners did not bring the most accurate results.
“Our data show that the best results come from a single facial examiner working with a single top-performing algorithm,” Phillips said. “While combining two human examiners does improve accuracy, it’s not as good as combining one examiner and the best algorithm.”
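The idea of pairing one examiner with one algorithm can be sketched in a few lines of code. The example below fuses the two sources by simply averaging their scores for each image pair, then compares accuracy before and after. The article does not describe the fusion procedure the researchers used, so the averaging rule, the AUC accuracy measure, the assumption that algorithm scores have already been rescaled to the examiner's -3 to +3 range, and all of the numbers are hypothetical.

```python
def auc(scores, labels):
    """Chance that a same-person pair (label 1) outscores a different-person
    pair (label 0); ties count as half."""
    same = [s for s, y in zip(scores, labels) if y == 1]
    diff = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if s > d else 0.5 if s == d else 0.0
               for s in same for d in diff)
    return wins / (len(same) * len(diff))


def fuse(examiner_scores, algorithm_scores):
    """Average the two sources pair by pair (an assumed, deliberately simple rule)."""
    return [(h + a) / 2 for h, a in zip(examiner_scores, algorithm_scores)]


# Hypothetical data: 1 = same person, 0 = different people.
labels = [1, 1, 1, 0, 0, 0]
examiner = [3, 2, -1, -3, -2, 1]               # errs on the 3rd and 6th pairs
algorithm = [1.0, -1.0, 2.5, -2.0, 0.5, -2.5]  # errs on the 2nd and 5th pairs

fused = fuse(examiner, algorithm)
print(round(auc(examiner, labels), 3))   # 0.889
print(round(auc(algorithm, labels), 3))  # 0.889
print(auc(fused, labels))                # 1.0
```

In this made-up case the examiner and the algorithm stumble on different pairs, so each one's confident answers outweigh the other's mistakes once the scores are averaged. Two sources that made the same mistakes would gain nothing from being combined.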
The combination of examiners and AI is not currently used in real-world forensic casework. While this study did not explicitly test this fusion of examiners and AI in such an operational forensic environment, the results provide a roadmap for improving the accuracy of face identification in future systems.
While the 3-year project has revealed that humans and algorithms use different approaches to compare faces, it poses a tantalizing question to other scientists: Just what is the underlying distinction between the human and the algorithmic approach?
“If combining decisions from 2 sources increases accuracy, then this method demonstrates the existence of different strategies,” Phillips said. “But it does not explain how the strategies are different.”