What if algorithms helped judges decide what words mean?


The precision and promise of a data-driven society have stumbled these past years, serving up some disturbing—even damning—results: facial recognition software that can’t recognize Black faces, human resources software that rejects women’s job applications, talking computers that spit racist vitriol. “Those who cannot remember the past are condemned to repeat it,” George Santayana said. But most artificial intelligence applications and data-driven tools learn history aplenty—they just don’t avoid its pitfalls. Instead, though touted as a step toward the future, these systems generally learn the past in order to replicate it in the present, repeating historical failures with ruthless, and mindless, efficiency. As Joy Buolamwini says, when it comes to algorithmic decision-making, “data is destiny.”

Now, in a corner of legal academia, scholars are debating how best to develop language-usage databases to answer legal questions. Specifically, some urge that judges use a developing field called corpus linguistics to help them answer questions about the meaning of statutory words and phrases. The idea has taken root, and the effort now has dedicated symposia in academic circles, a growing academic literature, and even 30-plus judicial opinions where judges have tried to wield the tool to answer legal questions.

Corpus linguistics is a method by which a large number of natural language utterances—whether in publications or elsewhere in the public arena—are pulled into a database and then plumbed to answer questions about how certain words are generally used. Ever wonder if a bicycle counts as a vehicle or if keeping something in your glove box counts as carrying it? Corpus linguistics would crowdsource the answer. But before we copy and paste that answer into judicial opinions about “no vehicles” signs or criminal sentences for “carrying” weapons, perhaps it’s worth recognizing that sometimes we don’t get the best advice from crowds.
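To make the method concrete, here is a minimal sketch of the kind of query a corpus tool runs under the hood: a concordance (keyword-in-context) search plus collocate counts. The five “corpus” sentences below are invented for illustration; a real corpus such as COHA contains millions of words and tags each text by genre and date.

```python
from collections import Counter
import re

# A toy "corpus" of invented sentences; in practice this would be
# millions of documents drawn from publications, speech, and the web.
corpus = [
    "He parked the vehicle outside the courthouse.",
    "The vehicle struck a pedestrian at the crossing.",
    "She rode her bicycle through the park.",
    "No vehicle may enter the park after dark.",
    "The truck is a commercial vehicle under state law.",
]

def concordance(term, texts, window=3):
    """Return each occurrence of `term` with `window` words of context."""
    hits = []
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        for i, w in enumerate(words):
            if w == term:
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                hits.append((left, term, right))
    return hits

def collocates(term, texts, window=3):
    """Count the words that most often appear near `term`."""
    counts = Counter()
    for left, _, right in concordance(term, texts, window):
        counts.update(left.split())
        counts.update(right.split())
    return counts

print(len(concordance("vehicle", corpus)))        # 4 occurrences
print(collocates("vehicle", corpus).most_common(5))
```

A judge asking whether a bicycle is a “vehicle” would, on this approach, study what words keep company with *vehicle* across thousands of such hits—which is exactly why everything turns on which texts were put in the corpus in the first place.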


The primrose path here is not without its appeal. Proponents of bringing corpus linguistics to the bench point out that, when judges sit down to consider what a statutory word means, they often start with the question “What’s the word’s ordinary meaning?” Sometimes the idea is that the “ordinary meaning” must be what the drafters intended. Sometimes it’s that this “ordinary meaning” would reflect the fair expectations of the public, which we presume learns of a law’s proscriptions from its text. But whether they are concerned with the intent of the drafters or with fair notice to the public, judges don’t have a single, settled approach to answering that question. Some use dictionaries. Some use Google searches. Some just recite their own impressions without ever backing them up with a citation or source. This status quo, the corpus enthusiasts observe, is messy and disorganized. And it’s probably unjust—you can’t have some parties subjected to Webster’s justice while others’ fates are in the hands of the Oxford English Dictionary or the ad hoc Googling of a law clerk. So, proponents say, let’s make the process a little more rigorous and a little more regimented.

But, without care, bringing corpus linguistics to legal opinions stands to create as many problems as it solves.

To begin with, there’s the problem of how to use such a tool. Proponents claim corpus linguistics would inform judges’ “linguistic intuitions” while leaving their “professional judgment” intact. While that seems reasonable enough, it ignores the larger context of the corpus effort. The judiciary’s move toward textualism—a judicial philosophy that emphasizes the text of a law above other considerations in resolving disputes—has not left a judge’s “professional judgment” intact. Rather, strict textualism has put up yellow tape around huge swaths of judicial judgment: DO NOT ENTER. Many jurists now not only start with the question of “ordinary meaning”—they often end there as well, insisting that doing anything more than reporting and applying the “ordinary meaning” of a word would offend the purity of the original legal proclamation. The broad adoption of corpus linguistics on the bench would inevitably lead to its use as a one-stop shop for judicial decrees. Indeed, early cases testing the tool have often struggled to account for the context of the legal phrase or succumbed to fallacies such as assuming that rare uses are less ordinary or linguistically accepted.

For example, Justice Clarence Thomas brought corpus linguistics to the opinions of the Supreme Court (its only appearance there so far) with his 2018 dissent in Carpenter v. United States, a case about warrantless search and seizure of cellphone records. In that opinion, Thomas admits that the court has described an unreasonable search under the Fourth Amendment as one that violates someone’s “reasonable expectation of privacy,” a phrase drawn from a 1967 case. But, he argued, “at the founding, ‘search’ did not mean a violation of someone’s reasonable expectation of privacy. … The phrase ‘expectation(s) of privacy’ does not appear [in various databases and] collections of early American English texts.” For this proposition, he cites three corpuses.

Thomas’ argument exposes the hazards of the corpus linguistics tool in practice. It’s styled as an inquiry into the definition of search, but the constitutional phrase is “unreasonable search,” where it is conceptualized as a government action performed against someone’s will. The justice’s simple term-search approach arguably misses the important context of the word. More importantly, though, Thomas’ evidence (or cited lack thereof) does not necessarily support his conclusion. What if, instead of arguing about what counted as an unreasonable search, the legal dispute involved what counted as a sandwich? The phrase peanut butter and jelly wasn’t around back when the good earl, looking for something he could eat without getting up from his desk, first put some meat between slices of bread. Does that mean a peanut butter and jelly doesn’t count as a sandwich? Even textualists generally recognize that, as technological innovation marches forward, language will be forced to adapt—and laws will cover devices (airplanes, radios, cellphones) undreamed of at the time they were written. Surely later turns of phrase can capture earlier sentiments, just like later innovations can be subjected to earlier legal frameworks. An empty return on the “expectation of privacy” query cannot answer whether the founders’ conception of “unreasonable search” could nonetheless be described in those terms, just as a null result for “peanut butter and jelly” could not resolve whether the early concept of a sandwich was flexible enough to consider the later innovation. Corpus linguistics—with its quantitative results and the sheer scale of its datasets—threatens to make available answers look like relevant evidence.
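The null-result fallacy can be shown in a few lines: an exact-phrase query returns zero hits even when the underlying concept is plainly present in the texts. This is a toy sketch with invented sentences, not a real corpus query.

```python
# Two invented "period" sentences: the concept of a sandwich is
# present, but a later phrase naming a particular sandwich is not.
texts = [
    "The earl requested meat between two slices of bread.",
    "A sandwich may be eaten without leaving one's desk.",
]

def phrase_hits(phrase, texts):
    """Count exact (case-insensitive) occurrences of a phrase."""
    return sum(t.lower().count(phrase.lower()) for t in texts)

print(phrase_hits("peanut butter and jelly", texts))  # 0 hits ...
print(phrase_hits("sandwich", texts))                 # ... but the concept is there
```

The empty return tells us only that a particular string was not yet in circulation, not that the concept it later named fell outside the older category.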

But even if judges got better at using corpus linguistics, those crowdsourced answers are still just that: They reflect the biases, blind spots, and social norms of the community from which they were pulled. Language usage can have racial and gendered dimensions that play out across community borders and may not be evident in the entries of a dictionary. As Kevin Tobia pointed out recently, the pronoun he is often used with—and would be captured by a corpus as ordinarily referring to—the masculine gender. But in legal uses, it is typically meant—and typically understood by the public—as gender-inclusive. Of course, the bigger concern here is the bias unnoticed or unaccounted for. Tobia reports that only three of the 30-odd cases to employ corpus linguistics to date have even mentioned the matter of having a representative or balanced corpus. And even without the problem of illicit bias, targeting the corpus to the right community is a difficult problem: Cast the corpus net too broadly and the “ordinary meaning” inquiry no longer looks like it has much to do with the case or controversy before the court—dock meaning something different to IT specialists than it does to sailors. Cast it too narrowly, and you run the risk of asking a regulated community to define regulated conduct.
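The community-targeting problem can also be made concrete. In this toy sketch (invented sentences and hypothetical cue words, not real corpus data), the “ordinary” sense of dock in a pooled corpus simply tracks whichever community happened to contribute more text:

```python
# Invented samples from two speech communities. The tech community
# dominates the pooled corpus 8-to-2, so its sense of "dock" wins.
it_texts = ["plug the laptop into the dock"] * 8
sea_texts = ["the ship tied up at the dock"] * 2

def sense_share(texts, cue_words):
    """Fraction of 'dock' sentences containing any cue word for a sense."""
    hits = [t for t in texts if "dock" in t]
    return sum(any(c in t for c in cue_words) for t in hits) / len(hits)

pooled = it_texts + sea_texts
print(sense_share(pooled, {"laptop", "charger"}))  # 0.8: tech sense looks "ordinary"
print(sense_share(sea_texts, {"ship", "cargo"}))   # 1.0 within the maritime community
```

Nothing in the query itself flags that the 80 percent figure is an artifact of sampling; the “ordinary meaning” a court would read off these numbers was decided when the corpus was assembled.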

What’s more, incorporating corpus linguistics into judicial reasoning may well obscure—rather than elucidate—the biases and judgments that invariably play a role in the ultimate decision. First, the corpus linguistics approach threatens to put the decision about meaning in a black box. Take, for example, the case where a bank’s employees engage in racist lending practices: You can understand the impulse to take the decisions out of human hands and let a computer do the credit assessments. But if you train the computer on past loan applications, you’ve re-created the racist problem, and you’ve made it a lot harder to see. Similarly, while corpus enthusiasts aim to cure the judge of overreliance on flawed intuitions about meanings, the method may well hide its troubling judgment calls—potentially even from the judge—rather than addressing them.

Second, the contents of that black box, once made, will be hard to examine. Where data is destiny, examining the contents and application of a corpus for a particular case would be an important way of assessing whether a query’s returns are to be credited when resolving statutory ambiguity. That’s hard, but not impossible. In other data- or modeling-heavy disputes, litigants have (expensive) tools to do that sort of thing: They hire experts to consider the issue and test the software, and then those experts’ opinions are tested in depositions or through cross-examination. But here, there’s a catch: Expert testimony is for factual questions, not legal ones. While experts often opine on the facts that brought the parties before the court, they’re not allowed to tell the judge what the law means. In fact, testimony to that effect is generally considered improper. Once adopted, the judiciary’s use of and reliance on corpus linguistics would likely be a matter left to the judges’ discretion or the judicial branch’s self-regulation.

This is not the first time the judiciary has tiptoed into algorithmic justice and faced the difficulties associated with dataset bias. As ProPublica detailed in 2016, several state court systems have adopted proprietary software that purports to put a probability on recidivism for sentencing purposes. But the software appeared to have a racial bias—and the company that coded it wouldn’t open the hood for inspection, sparking debate (and litigation) over proprietary software’s role in the justice system and its relation to constitutional rights like due process. So far, the corpus linguistics effort has led an academic and open-source life, much of it focused on the Corpus of Historical American English, but there’s nothing to stop a company from offering a proprietary solution to the problem of “ordinary meaning” down the road.

A corpus and query tool could help advocates marshal evidence of word usage across certain communities in certain time frames. Developed deliberately and considered as but one line of argument among many, that effort could reward us with insights. But in the end, the corpus approach cannot turn the question of meaning into a purely empirical one. Answering even the most straightforward-looking empirical questions requires the development or selection of measurement tools, the consideration of some data sources to the exclusion of others. Here, as with so many data-driven efforts before, the contents of the corpus and the coded criteria for returning answers will determine the results. Worryingly, these decisions will be hidden in the corpus user’s manual or buried deep within its code.

Like algorithms, people look to the past to learn. But we do justice by looking forward, hopefully having gained some wisdom from that hindsight turn. So: We can certainly ask a corpus about the most frequent or “ordinary” uses of a word in the past. But maybe we should decide just how we will use and review those answers before we beta-test the program on the rights and obligations of the people who must live under the courts’ final judgments.

Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.
