Automated Assumptions: The Failures of Facial Recognition

 

How technology has increased racial discrimination

 
Illustration by Eddie Stok for Are We Europe

 

Following his arrest for petty theft in 2013, Robert Cannon, a 22-year-old African American man with no prior record, was rated a “medium-to-high risk of reoffending,” with a score of 6 out of 10. Months later, James Rivelli, a 54-year-old Caucasian man also charged with petty theft, received a low-risk score of 3 out of 10. Unlike Cannon, Rivelli had several prior felonies, including domestic violence, aggravated assault, drug trafficking and two counts of grand theft, for which he had served a total of five years in a state penitentiary.


This disparity, like so much else that has been written about the American criminal justice system, reads like yet another case of racial injustice and inequality. One might imagine these decisions were handed down in an old courthouse in an isolated, rural town lost somewhere within the “Bible Belt.” But they weren’t passed by a judge’s gavel; they were generated automatically by software.

COMPAS, the risk-assessment algorithm responsible for the recidivism risk ratings given to Cannon and Rivelli, is illustrative of a dangerous trend known as “overfitting,” which has accompanied the expansion of machine learning into ever-more sectors of society. A ProPublica investigation into COMPAS found that black defendants consistently received higher risk evaluation scores than were borne out in reality, and that white defendants tended to “outperform” their risk evaluations, reoffending at a higher rate than predicted.

COMPAS was designed to assess risk based on recidivism rates identified by decades of research by criminologists and law enforcement. These trends are reflected in the data points fed into the COMPAS algorithm, which include the prevalence of “crime,” “gangs,” “drugs” and “weapons” in the defendant’s “neighborhood,” and whether they have experienced family separation through “divorce,” “estrangement” or “family criminality.” Yet the problem with abstracting data points from their socio-cultural context, particularly in complex cases such as recidivism, is that we run the risk of overfitting data generated by historic trends and end up reproducing history through deterministic predictions. For the record, James Rivelli went on to commit a further count of grand theft, whilst Robert Cannon, who was rated twice as likely to reoffend, has had no subsequent charges.
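
To make the mechanism concrete, here is a minimal sketch in Python (a deliberately toy simulation with invented groups and detection rates, not COMPAS’s actual model): if one group’s offenses are recorded more often than another’s, a scorer fitted to those records will “learn” that the more heavily policed group is riskier, even when the underlying behavior is identical.

import random

random.seed(0)

def simulate_history(group, n=10_000):
    """Synthetic records: both groups reoffend at the same true rate,
    but group B's offenses are detected and recorded twice as often."""
    detection_rate = {"A": 0.3, "B": 0.6}[group]
    records = []
    for _ in range(n):
        reoffended = random.random() < 0.25        # identical true behavior
        recorded = reoffended and random.random() < detection_rate
        records.append(recorded)
    return records

history = {group: simulate_history(group) for group in ("A", "B")}

# "Training" here is just estimating recorded recidivism per group,
# exactly the kind of historical pattern a risk model absorbs.
learned_risk = {group: sum(records) / len(records)
                for group, records in history.items()}

for group, risk in learned_risk.items():
    print(f"Group {group}: learned 'risk of reoffending' = {risk:.2%}")

# Group B is scored roughly twice as risky as group A, even though the
# underlying behavior was identical: history reproduced as prediction.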

(Im)Partial and (a)Political

In the case of COMPAS, it is relatively easy to reverse engineer and rationalize the decision making process that led to Cannon and Rivelli’s risk ratings. This algorithmic process is observable because the data points are interpretable; we can understand why the presence of drugs and local gangs or family criminality may be correlated to recidivism, even if we disagree with the idea that these factors should be determinants in a defendant's eligibility for parole or release. But what happens when the data such decisions are based on becomes unintelligible, or the decision-making process itself is opaque? 


Amazon’s Rekognition tool—facial recognition software that was sold to law enforcement agencies across the U.S.—misidentified 28 members of the U.S. Congress as convicted criminals in 2018. Members of Congress of color were disproportionately likely to “fit the description,” accounting for 39% of the false matches despite making up only around 20% of Congress. Amazon could not offer an explanation for this outcome, arguing instead that the confidence threshold—the degree of certainty the software requires before reporting a match—had been set too low during the experiment.
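
The role of that threshold is easy to illustrate with a rough sketch (the similarity scores below are synthetic; this is not Rekognition’s actual output or API): when one probe face is compared against a large gallery of strangers, lowering the bar for what counts as a “match” floods the system with false positives.

import random

random.seed(1)

def impostor_similarity():
    """Toy similarity score between two different people's faces;
    real systems output a comparable number between 0 and 1."""
    return min(1.0, max(0.0, random.gauss(0.70, 0.08)))

# One probe face compared against a large gallery of strangers,
# e.g. a mugshot database that none of them should match.
gallery_scores = [impostor_similarity() for _ in range(25_000)]

for threshold in (0.80, 0.95):
    false_matches = sum(score >= threshold for score in gallery_scores)
    print(f"threshold {threshold:.0%}: {false_matches} false matches")

# At the lower threshold the system "identifies" thousands of innocent
# people; raising it sharply reduces false matches, at the cost of
# missing more genuine ones.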

Welcome to the Black Box. The first mental image you associate with “black box” is probably the flight data recorder, first invented by the Australian researcher David Warren in 1953 as a way of recording (and, hopefully, recovering) cockpit conversations and instrument data in the event of a plane crash. But in Silicon Valley speak, the black box is now a metaphor to describe machine learning algorithms: internally complicated systems that obscure their inner workings from users, and often even their creators, through sheer complexity.

The black box metaphor is a convenient one for the tech and big data industries. It allows companies to dress algorithms up as neutral analytic tools under the guise of complex mathematics and empiricism. The opacity of this narrative prevents users and regulators alike from criticizing or scrutinizing these systems, if for no other reason than the trust we place in mathematics, and our fear of engaging with equations. Arthur C. Clarke, author of 2001: A Space Odyssey and so-called “prophet of the space age,” famously stated that “any sufficiently advanced technology is indistinguishable from magic.” If the public testimonies of Facebook CEO Mark Zuckerberg and Sundar Pichai of Google revealed anything about the inner workings of the algorithms that govern our lives, it’s that we don’t understand them, and that we can’t sufficiently distinguish them from the occult.

In her TED talk, Cathy O’Neil, author of Weapons of Math Destruction, argues that “the whole ecosystem of artificial intelligence is optimized for a lack of accountability.” Indeed, although there have been several interventions into the big data industry, they often fall short of expectations. One year on from the introduction of the General Data Protection Regulation (GDPR), a review of “the new digital world order” makes for relatively grim reading. If perception were all that mattered, the GDPR could be considered a success; in the weeks after its introduction, users’ inboxes were plagued with swathes of terms and conditions agreements that they did not read, and data protection agencies were swamped by more complaints than they were equipped to deal with. But there have been few tangible consequences for GDPR offenders, with behemoths like Google and Facebook being granted a relative stay of execution, receiving negligible fines of $57 million and just $645,000 respectively. Going forward, we can expect these tech giants to continue to circumvent data privacy laws, issuing public apologies and paying fines as and when they are called out.

But the GDPR, though flawed, does provide a foundation for further regulation, enshrining the legal “right to an explanation” of decisions made by algorithms. The means by which this right is delivered and enforced is—literally—as complex as the algorithms to which it applies. Calls for transparency, including those from Cathy O’Neil, often reference the need to peer into the black box by analyzing the input data and source code. But transparency alone may not be enough to deliver a satisfactory explanation of an algorithm’s inner workings: the very value of machine learning is that it derives its own rules, identifying hidden structures in data that are invisible to the human eye and written nowhere in the source code.
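
Some back-of-envelope arithmetic suggests why. Assuming generic layer sizes for illustration (not any vendor’s actual architecture), even a very small face-recognition network contains tens of millions of learned parameters; disclosing them would be transparency of a sort, but hardly an explanation.

# Assumed, generic layer sizes for illustration only.
input_pixels = 224 * 224 * 3       # one modest RGB input image, flattened
hidden_units = 512                 # a single fully connected layer
embedding_size = 128               # a typical face-embedding dimension

weights = input_pixels * hidden_units + hidden_units * embedding_size
biases = hidden_units + embedding_size

print(f"{weights + biases:,} learned parameters in just two dense layers")
# Roughly 77 million opaque numbers: publishing them is disclosure,
# but no single one of them reads as a reason for a decision.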

Joy Buolamwini, founder of the Algorithmic Justice League and renowned “Poet of Code,” has published research exploring the effect of using skewed or biased training data to develop facial recognition models. Buolamwini reported that industry-standard datasets, such as “Labelled Faces in the Wild,” which Facebook used to benchmark its own facial recognition software, repeatedly overrepresented white and male faces, at 80% and 75% respectively. In her MIT thesis, “Gender Shades,” Buolamwini tested three industry leaders in facial recognition software—IBM, Microsoft and Face++—against a new, more representative dataset she had developed. Her evaluation found that their error rates were up to 34.4 percentage points higher on darker-skinned female faces than on lighter-skinned male faces. The “Gender Shades” study was able to identify significant flaws in the software tested, but could not diagnose the root of those flaws within the IBM, Microsoft or Face++ algorithms. That’s because the complexity of machine learning systems makes it nearly impossible to isolate or scrutinize the individual processes that occur within a single system.
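
The kind of disaggregated audit “Gender Shades” popularized can be sketched in a few lines of Python (the figures below are synthetic and chosen only to illustrate the method, not Buolamwini’s published results): slice the test results by skin type and gender, and a respectable headline accuracy can conceal a wide gap between subgroups.

from collections import defaultdict

# Synthetic audit results: (skin type, gender, classified correctly?)
results = (
    [("lighter", "male", True)] * 980 + [("lighter", "male", False)] * 20 +
    [("lighter", "female", True)] * 930 + [("lighter", "female", False)] * 70 +
    [("darker", "male", True)] * 880 + [("darker", "male", False)] * 120 +
    [("darker", "female", True)] * 660 + [("darker", "female", False)] * 340
)

totals, correct = defaultdict(int), defaultdict(int)
for skin, gender, ok in results:
    totals[(skin, gender)] += 1
    correct[(skin, gender)] += ok

print(f"overall accuracy: {sum(correct.values()) / len(results):.1%}")
for subgroup in sorted(totals):
    print(f"{subgroup}: accuracy {correct[subgroup] / totals[subgroup]:.1%}")

# An overall accuracy above 85% coexists with a 32-point gap between
# lighter-skinned men and darker-skinned women.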

Facial recognition software, like all other significant developments in artificial intelligence, is rapidly expanding into other industries, perhaps most notably surveillance and crime prevention. Though this may sound like the dystopian future imagined in Minority Report, at least in Spielberg’s world the technology deployed to identify potential criminals in a crowd actually worked. In a trial by South Wales Police at the 2017 Champions League Final in Cardiff, facial recognition software falsely identified 2,297 members of the 65,000-strong crowd as criminals in just 90 minutes. As these technologies gradually encroach on crucial sectors of public life, we risk automating our own implicit biases, transposing them into systems and algorithms we cannot fully interpret or understand.

Racial and ethnic profiling is a well-established and recognized problem in European policing. According to the Council of Europe, young African and Arab men in France are 20 times more likely to be stopped and searched than any other group, and in England, black people are 9.5 times more likely to be stopped than white people. Machine-learning-based facial recognition tools may serve to replicate, or even exacerbate, these policing trends. As is the case with COMPAS, predictive models are rooted in the datasets from which they are constructed. Black and brown faces are more likely to be subjected to police surveillance, and hence more likely to be misidentified as potential offenders when the software produces false positives.
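
The arithmetic is straightforward (the figures below are illustrative assumptions, not official statistics): with an identical false-positive rate per scan, the group that is stopped and scanned more often accumulates proportionally more wrongful matches.

# Illustrative assumptions: the same per-scan false-positive rate
# applied to very different levels of surveillance.
false_positive_rate = 0.001        # one wrongful match per 1,000 scans
scans_per_year = {
    "group stopped 20x more often": 20 * 5_000,
    "baseline group": 1 * 5_000,
}

for group, scans in scans_per_year.items():
    flagged = scans * false_positive_rate
    print(f"{group}: about {flagged:.0f} people wrongly flagged per year")

# The algorithm treats every face "equally," yet unequal exposure to the
# camera turns the same error rate into unequal harm.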

Datasets such as “Labelled Faces in the Wild” or the recidivism records that fueled COMPAS are not misrepresentative because of limited access to data, nor, one would hope, are they explicitly prejudiced or designed to discriminate. The overrepresentation of certain socio-cultural and (gendered) ethnic groups reflects historic records and practices embedded in processes that researchers are trying to automate.

In practice, algorithms are founded on a series of assumptions and observations about how a process should take place and what conclusions should be reached. O’Neil refers to algorithms as “opinions embedded in code.” The ways in which these assumptions are translated into code vary. In some algorithms, such as COMPAS, assumptions are overtly stated: developers chose to include “family criminality” as a key determinant of an individual’s recidivism score. That assumption was likely drawn from aggregate studies showing that family criminality is an indicator of recidivism. Yet it is unjust to retroactively apply aggregated findings to individual cases, particularly when these factors are obscured from view or criticism. In machine-learning-based algorithms, such as those which enable facial recognition, assumptions are implied by training datasets such as “Labelled Faces in the Wild.” These systems can only be as accurate and effective as the data used in their construction.
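
O’Neil’s phrase can be taken almost literally. The toy scorer below uses hypothetical weights and feature names (not COMPAS’s actual formula) to show how each opinion, which factors to include and how heavily to weight them, ends up as a line of code.

# Hypothetical weights and feature names, not COMPAS's actual formula.
RISK_WEIGHTS = {
    "prior_offenses": 2.0,        # arguably about the individual
    "family_criminality": 1.5,    # an aggregate trend applied to one person
    "neighborhood_crime": 1.0,    # largely a proxy for where someone lives
}

def risk_score(defendant):
    """Weighted sum of the chosen factors, capped at 10."""
    score = sum(weight * defendant.get(factor, 0)
                for factor, weight in RISK_WEIGHTS.items())
    return min(10.0, score)

# No prior offenses, but a relative with a record and a heavily policed
# neighborhood...
print(risk_score({"family_criminality": 1, "neighborhood_crime": 2}))  # 3.5
# ...versus three prior offenses and none of the contextual factors.
print(risk_score({"prior_offenses": 3}))                               # 6.0

# Change one weight, or delete one line, and the two defendants can trade
# places: the "objective" output is downstream of subjective choices.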

It is clear that, for the foreseeable future, artificial intelligence will be confined to “narrow” intelligence—the high-level performance of a specific, specialized task. As the technical capacity of artificial intelligence has increased, so too has the invasiveness of the tasks that algorithms perform within our society. Vidushi Marda, a lawyer at the human rights organization Article 19, argues that the GDPR’s focus on “the right to explanation falsely assumes that an explanation of how an algorithm works acts to legitimize its use.” Indeed, the fixation on how an algorithm calculates a recidivism score or recognizes a face distracts from a wider ethical discussion about which decisions should be made algorithmically.

Several cities across the U.S. have banned the use of facial recognition software for policing in defense of civil liberties and the right to privacy. Elsewhere, in China, facial recognition software provides the foundation for 社会信用体系 (shehui xinyong tixi), a social credit and reputation system seemingly inspired by, or perhaps the inspiration for, Black Mirror’s “Nosedive” episode. Going forward, Europe must define its own societal relationship with artificial intelligence—one which does not sacrifice individual liberties in the name of innovation.

Although AI will inevitably play an increasingly significant role in our lives, algorithms themselves should not be the sole focus of our regulatory efforts. The EU cannot hope to restrict or ban processes that have not yet been developed, but it can shape and influence how they are developed. By demanding transparent decision-making by design, challenging the assumptions that act as building blocks for this technology, and questioning the use of AI in subjective or contextual settings, we can create inclusive and accessible technology that benefits all members of society—not just the statistical average.


 

This article appears in Are We Europe #5: Code of Conscience

