Introduction

We are, again, asking what the First Amendment means, where its protections live, and who should enforce them. As legal scholars have argued, the First Amendment is not only a legal doctrine or theory of jurisprudence but also a broader, often invisible, cultural force with looser constitutional groundings than many legal experts would prefer. The First Amendment lives not only in courts and case law but also in “First Amendment institutions”—e.g., journalism, universities, libraries, churches, schools—that, collectively, pragmatically, culturally, define free speech, its protections, and the interests it privileges or harms. Understanding how these institutions see the First Amendment means searching for often hidden ways of thinking about speech that they share and debate. In this essay I suggest that probability is one of these largely invisible logics that has serious consequences for how we think about free speech and the First Amendment.

To understand when and why speech is created, chilled, censored, or celebrated, we can look not only to unconstitutional regulations on speech but also to the social, cultural, and technological forces that are “central to public discourse and its infrastructure.” In arguing for a more socially relevant “institutional First Amendment,” Schauer asks courts to think about the power and particularities of institutions: “What distinguishes categories like viewpoint discrimination, content regulation, public forum, and prior restraint from categories like universities, libraries, elections, and the press is that the former exist in the First Amendment but the latter exist in the world.” Such institutions are not just “legal subjects that depend on the courts to tell them what the law is,” they are places that “develop their own visions of what the First Amendment means, even if that vision is different from the one courts would choose themselves.”

Today, the meaning and force of the First Amendment play out in the new and often unstable technological infrastructures and institutional spaces of social media platforms. Variously seen as media organizations, technology companies, public spheres, private realms, and commercial spaces, these platforms solicit, moderate, circulate, interpret, and rank speech of all kinds. From Facebook and Google to Instagram and Twitter, they “host and organize user content for public circulation, without having produced or commissioned it. They don’t make the content, but they make important choices about that content: what they will distribute and to whom, how they will connect users and broker their interactions, and what they will refuse.”

Although scholars and practitioners alike hotly debate how and why to regulate such platform speech and regularly call for new types of accountability, platforms are still largely governed through self-initiated and self-designed policies, norms, and technological infrastructures. Platforms largely govern themselves. They encourage users to flag offensive or harmful speech that breaks the community standards they write. They privately convene and manage content moderators to monitor speech and apply company policies. And they use proprietary machine-learning algorithmic systems to prevent some speech from ever circulating in the first place. Increasingly, though, such self-regulation is proving insufficient. Scholars, activists, and technologists alike are trying to understand why flagging, moderation, and machine learning only sometimes work; they are showing the risks and limitations of relying exclusively on platform self-regulation and looking to social justice and human rights movements for inspiration and models; they are lobbying publics and lawmakers to force conversations about what government regulation or antitrust breakups might look like; and they are questioning the very existence of such platforms, suggesting that they are simply too fast and too big to be effectively governed. With increasing urgency, people from myriad institutional contexts are seeing platform speech regulation as inadequate and fundamentally at odds with the idea of collective self-governance through free expression.

If we accept Horwitz’s and Schauer’s broad, institutional views of the First Amendment and agree that platforms are key institutions where the First Amendment’s meaning and power play out, then we need to understand how platforms understand free speech. More specifically, what logics drive their regulation of speech? What assumptions, judgments, standards, definitions, values, and practices underpin their understanding of speech, and how closely do these logics align with normative ideals of what free speech could or should be? I argue here that probability is one of these logics.

Before examining probability and speech platforms more closely, though, I want to connect two bodies of work from outside the law to show how to see probability as a logic underpinning an institutional—not just legal—vision of the First Amendment. The first is neo-institutional sociology and its core insight that organizational power is explained not only by looking for logics within companies or professions but by examining the “loosely coupled arrays of standardized elements” that cut across institutions and tie them together. Some forces work across organizational settings to make institutions seem stable, predictable, teachable, and worthy of investment. For example, to understand medicine, we should look at how medical schools, hospitals, insurance companies, and state regulators make and remake the languages, routines, commodities, and expertise that make and remake definitions of illness, disease, and prevention. Understanding “medicine” means understanding the myriad “situated knowledges” that make some types of prevention taken for granted, some types of treatment controversial, and some types of expertise debatable. Similarly, to understand museums, we should trace how ideas about curation, preservation, funding, art, and professionalism emerge across the groups of artists, philanthropists, and audiences that define art, debate its stakes, train its practitioners, sustain its economics, and explain its value. Scholars are similarly beginning to use these theories to trace neo-institutional definitions of journalism, algorithms, and press freedom—looking for logics and forces that both contest and stabilize fields that live in no one place.

The second body of relevant scholarship, Science and Technology Studies (STS), asks how affiliations, knowledge, expertise, and risk play out in technological cultures. One of its key (and sometimes controversial) claims is that social life is understood not by measuring impacts of technology on society—artificially separating people and technology into causes and effects misses a fundamental point about how societies work—but, instead, by seeing social life as the inextricable intertwining of materials, practices, relationships, and values in infrastructures. The Facebook News Feed, for example, is not just a tool that ranks content. It is a technology that combines Facebook’s business model, advertisers’ interests, algorithmic signals, user actions, and myriad subcultures of speech, privacy, and social association. To think that the News Feed has an “effect” on its users is to mistakenly hold both the News Feed and people static, missing the fact that they simultaneously reflect and shape each other. STS scholars are concerned with what emerges from relationships between and among humans and nonhumans. They see things like knowledge, expertise, society, and risk emerging from the assumptions, outcomes, categories, stabilizations, aspirations, resistances, and breakdowns that result when people and machines are tightly, inextricably bound.

With these images of institutions and technologies we can return to the First Amendment and ask a more precise question: What loosely coupled arrays of institutionally situated sociotechnical elements govern online speech? Put differently, which intersections of people and machines define the conditions under which people express themselves, circulate speech, encounter ideas, and suffer abuse online? When and how do particular sociotechnical intersections matter to free speech? Scholars of an institutional First Amendment could look across the often invisible institutional relationships and sociotechnical logics governing free speech and ask: Are these relationships and conditions creating the First Amendment that publics need? If not, should we change the sociotechnical logics, or should we revise our understanding of the First Amendment?

For the remainder of this essay, I want to show how these approaches might be used to understand online speech platforms by focusing on one sociotechnical logic: probability. Probabilistic ideas—about chance, likelihood, normalcy, deviance, confidence, thresholds—underpin many of the sociotechnical infrastructures and institutions that regulate online speech platforms. They are often “baked in” to platforms, residing in complex and opaque systems that are hard to see or understand. Further, probability offers a kind of false stability couched in mathematical certainty that is beyond the comprehension of most platform users and regulators (and some makers) but that is routinely offered to provide an illusion of normalcy and predictability. Probability is a type of power that platform makers have a vested interest in obfuscating, mystifying, and controlling.

Just as we understand medicine better by seeing disease as a contestable concept across hospitals, schools, and patients, and just as we appreciate how art emerges from the intertwined power of patronage, curation, and craft, we may better know what free speech means online by critically interrogating probability—as an institutional and sociotechnical phenomenon. If we can see the contingencies and contestations of platform probability, we may not only discover it to be a largely invisible form of platform governance, but we may also be able to make better interventions into platform cultures, saying more precisely why certain forms of probability succeed or fail in their treatment of online speech.

Defining Probability

To know how probability fails or succeeds as an institutional technology, we need to understand a bit about its history and varied meanings. The story of probability is the story of how mathematics gains social power, how counting technologies become political instruments, and how technologists of every era distribute risk and claim confidence at increasingly large and complex social scales.

In their history of chance, Gigerenzer et al. argue that the idea of probability came from the need to formalize expectations, reduce risk, and make visible, verifiable, and defensible judgments about the future. As populations grew, commercial networks expanded and human relationships strayed from face-to-face interactions into increasingly mediated settings. People realized how complicated it was to build trust over distances, to decide which strangers to trade with, and to know which risks were worth pursuing. As the usual sources of authority (religion, philosophy, local community) were called into question or became increasingly complex, people searched for other types of certainty. Influenced by emerging legal doctrines about what human behaviors could reasonably be expected, mathematicians developed a science of “probable knowledge” based on three sources of authority: “physical symmetry” (e.g., coins had two sides and thus offered 50/50 odds), “observed frequencies of events” (e.g., weather incidents could be counted and aggregated into predictable climate patterns), and “degrees of subjective certainty or belief” (e.g., some types of evidence were more or less reliable when trying a legal case or petitioning a judge). Gamblers, farmers, insurers, and lawyers, eager for systems of certainty, were all drawn to the new science of prediction that stabilized environments, making it easier to know how to act within them. Betting, planting, underwriting, and arguing could all move away from superstition and luck. Instead, they could outsource their fears and concerns to probabilistic technologies—dice, coins, frequency tables—that were backed by mathematical and bureaucratic authority, regularly audited, and cared for by trustworthy statisticians.

These systems of probability, though, were limited by their reliance on observed frequencies. They were fundamentally descriptive and only worked within specific stable scenarios. They had little to say about how to make the world be more predictable or how to understand what happened when different conditions intertwined and outcomes were not always easily observable. These limitations laid the groundwork for two innovations in probability: control and conditionality. Though a discussion of these forces may seem to take us away from a focus on free speech, they are arguably at the heart of how platforms understand and regulate mass-scale online expression.

Control

To enable control through probability, new and powerful mass-scale counting practices, technologies, and values began to emerge. Surveys, censuses, standardized observations, philosophies of positivism, and behavioral categories all showed how probability could do more than simply standardize expectations about the world. Probability could show which parts of the world were unpredictable and needed to be controlled. The discovery of standard distributions of behaviors meant that people were “normal if they conform[ed] to the central tendency of such laws, while those at the extremes [were] pathological.” Techniques of probability could say with certainty exactly how common or rare an extreme was and how its variance appeared across a population. If those extremes could be tied to people and behaviors, then they could justifiably be surveilled for deviance, punished for straying too far from expectations, and held up as examples of how deviances harm collectives. Individuals—who were too unpredictable and idiosyncratic for statistical certainty—were reshaped into groups that behaved predictably and rationally so that a new “science of social physics” could produce l’homme moyen—the average man—an entity that voted, purchased, moved, lived, and died in ways that could be seen, aggregated, and shaped. These observations, categories, and databases made societies not just observable but governable. Probability became a technology for “making up people.”

But hints of resistance emerged to claim that people are not the same everywhere. In their history of nineteenth-century probability, Gigerenzer et al. describe how systems of probability in England, France, and Germany represented different human values. English statisticians defended the idea of free will and cautioned against ascribing patterns to individuals: “probability statements should be interpreted as predictions of long-run frequencies rather than as quantification of uncertainty in a particular case.” French statisticians and officials were similarly cautious about probability becoming a tool for controlling individuals but invested heavily in state statistical bureaucracies focused on the idea that understanding the probable behavior of l’homme moyen was the key to making good public policy. People may have free will, but probability could still be an efficient and responsible way to see patterns of autonomy and allocate public resources. German statisticians were the most skeptical of probability’s damage to free will. To the Germans, the “idea that society could be typified by an average man simply reflected an impoverished conception of the human community.” The job of the German statistician was to “break a population up into its various parts . . . in order to learn something about causes” and their varied sources.

Similarly, in his study of how “socialist statistics” in the USSR, India, and China emerged as academic fields, bureaucratic practices, and a symbol of self-knowledge and “hallmark of a modern nation-state,” Ghosh finds that the very idea of a “social fact” depends on the type of society you think probability serves. In contrast to Western traditions that saw statistics as a mathematical science for reducing societal randomness through random sampling, regressions, and probability theory, Marxist (Soviet-Chinese) statistics saw itself as a social science that rejected the idea of randomness or chance. Instead, it stressed the importance of completely counting everything that could be counted, preparing regular and exhaustive reports of populations, and creating sampling demographics and contexts that could be defended as “typical.” If Western statisticians tried to construct the average person and predict her behavior, Marxist mathematicians aimed to capture entire societies and explain their movements.

Conditionality

Motivated by a desire to study not just patterns but also their variance, some statisticians recast the field as a way of understanding the conditions that influence outcomes. Broadly grouped under the umbrella of Bayesian statistics, the key insight is that models of probability need ways to incorporate new evidence—to learn about the world and change in response to it. If “frequentists” are concerned with the chance of X happening, “Bayesians” try to understand the chance of X happening if Y or Z has also happened—or might happen. It means that Bayesians are not content simply to predict the world as it is; they are more speculative and experimental, eager to understand the set of possible outcomes if we had evidence about other outcomes. Rather than asking “is someone 20 versus 40 years old more likely to commit a crime?”—a frequentist would count instances and arrive at an answer—a Bayesian reframes the question as “under what conditions is someone 20 versus 40 years old more likely to commit a crime?” This puts conditions into play and raises questions about which conditions might matter and why: Does a previous conviction matter? Does gender matter? Does the day of the week matter, or the weather?

Bayesians are building models of the world by observing how the world behaves under conditions, but such models should make us ask where these conditions come from, which conditions are okay to ask about, and which conditions are unacceptable bases for governing social systems. Those conditions can come from existing datasets that describe the world, experimenters’ folk theories, and simulations that imagine complex realities that do not actually exist—if Y were to happen with frequency f, how confidently could we predict the likelihood of Z? With enough data, no hypothesis is untestable, no string of contingencies unexaminable.

This Bayesian approach maps well to the “big data” made possible by omnipresent surveillance technologies and cheap information storage as well as the computational experimentation enabled by machine learning techniques and multivariate analysis. Instead of crafting particular experiments—motivated by theory to create or sample conditions under which beliefs may be true with some degree of probability—today’s Bayesians can use massive data sets and computational power to ask which conditional beliefs meet acceptable thresholds. Bayesian models can probably find enough evidence to support any set of conditions and beliefs. Machine learning statisticians can set belief thresholds and ask: Under which conditions are these belief thresholds met? Instead of asking how likely is X if Y or Z happen, they can ask: Since we have enough data on X, Y, or Z happening (with new data arriving constantly), what is the complete set of beliefs that meet a particular threshold in any given moment? Given a dataset of instances, what are all the things I can believe, now, if I want a confidence level of 95 percent? 80 percent? 70 percent? And what could I believe in the future, knowing that new data are constantly arriving?
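A toy sketch may make this shift concrete: rather than testing one conditional question, it enumerates every set of conditions whose observed rate clears a chosen threshold. Everything here is an assumption made for illustration. The records, the field names (`has_link`, `new_account`, `night`, `flagged`), and the helper functions are invented, and the sketch uses plain conditional frequencies rather than a full Bayesian model with priors.

```python
from itertools import combinations

FIELDS = ("has_link", "new_account", "night", "flagged")

# Hypothetical toy data: one row per post. Every value is invented.
ROWS = [
    (True, True, True, True),
    (True, True, False, True),
    (True, False, True, True),
    (True, False, False, True),
    (True, False, False, False),
    (True, True, True, True),
    (False, True, True, False),
    (False, True, False, False),
    (False, False, False, True),
    (False, False, False, False),
    (False, True, False, True),
    (False, False, False, False),
]
RECORDS = [dict(zip(FIELDS, row)) for row in ROWS]


def conditional_rate(records, conditions, outcome="flagged"):
    """Return (rate, support): share of `outcome` among records matching `conditions`."""
    matching = [r for r in records if all(r[k] == v for k, v in conditions.items())]
    if not matching:
        return None, 0
    return sum(r[outcome] for r in matching) / len(matching), len(matching)


def beliefs_meeting(records, threshold,
                    features=("has_link", "new_account", "night"), min_support=2):
    """List every condition set (one or two features) whose observed rate clears the threshold."""
    found = []
    for size in (1, 2):
        for combo in combinations(features, size):
            rate, support = conditional_rate(records, {k: True for k in combo})
            if rate is not None and support >= min_support and rate >= threshold:
                found.append((combo, rate, support))
    return found


# Ask the same data what can be "believed" at each confidence level.
for threshold in (0.95, 0.80, 0.70):
    beliefs = beliefs_meeting(RECORDS, threshold)
    print(f"{threshold:.0%}: {len(beliefs)} condition sets qualify")
    for combo, rate, support in beliefs:
        print(f"  flagged given {' and '.join(combo)}: {rate:.2f} (n={support})")
```

On this invented data, two condition sets qualify at 95 percent, three at 80 percent, and four at 70 percent. The two that clear the highest threshold each rest on only three posts, which is one way that a confident-looking number can conceal how little evidence stands behind it.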

* * *

While a complete history of the idea of probability is beyond the scope of this essay, what begins to emerge is a complex landscape of what probability could mean and an image of how these different meanings represent political choices and technological power.

Does probability live in natural systems that have inherent likelihoods—like coins, dice, or the weather? Is probability about observing frequencies of events and building models of as many outcomes as can be imagined and seen? Is probability a control technology, a way of knowing what is likely, normal, deviant, or expected in order to decide how to invest governance and which categories to reward? Is probability actually about sustaining beliefs under particular conditions? To hold a particular belief, what else do we need to know, and how confidently do we need to know it? Or, in an age of large-scale data and near-instantaneous computation, is the very idea of probability an anachronism—a holdover from a time when we could not count completely, were forced to find ways to “tame chance,” and could not yet implement Marxist ideals of complete enumeration for control of entire populations?

Platform Probabilities and Speech

It may seem as if I have strayed significantly from my focus on First Amendment institutions, free speech, and platform governance. My claim, though, is that if we can see platforms as probabilistic constructions, we might better recognize and redirect the probabilistic logics they use to govern expression and control populations. We never step into the same Google Search or Facebook News Feed twice because the companies that make them are constantly changing their ranking and recommendation algorithms, continually A/B testing new interfaces and options on population samples, and updating the behavioral models they use to convince advertisers that they can reliably reach particular markets. Most fundamentally, platforms are probabilities because there is only a chance of them existing in any particular form at any given moment.

Platform content moderation is also probabilistic. It is a confluence of likelihoods: did an algorithmic filter trigger a computational threshold to block offensive content, did enough users within a particular period of time flag a sufficient amount of content to cause an account to be suspended, and did third-party content moderators evenly apply platforms’ content standards? Many users simply do not know how their content is being moderated, much less the shifting statistical ground on which such judgments stand. As Facebook’s Monika Bickert acknowledges, “A company that reviews a hundred thousand pieces of content per day and maintains a 99 percent accuracy rate may still have up to a thousand errors.”

Even the news headlines, ledes, and advertisements we read are probabilistic—the result of constant, invisible A/B testing as publishers learn which words and images drive traffic and attract demographics. Depending on how confidently platforms can detect them and how interested they are in banishing them, some platform speech—e.g., approximately two thirds of tweeted links—comes from automated bots and fake accounts designed to flood online spaces and create the impression that a viewpoint might be true because it is expressed so frequently.

Similarly, mis- and disinformation are probabilistic phenomena, as makers and detectors of “deep fake” media play continual cat-and-mouse games to create and catch fabricated images, audio, and videos. When Facebook partnered with news and fact-checking organizations to remove such content from its platform, it did not delete such content from its site; rather, it celebrated its statistical ability to “rank those stories significantly lower” and “cut future views by more than 80%.” After criticism that it spreads conspiracy videos, YouTube announced that it would “begin reducing recommendations of borderline content and content that could misinform users in harmful ways.” The move only impacts “recommendations of what videos to watch, not whether a video is available on YouTube.” To be clear, Facebook and YouTube do not say they are unsure whether content is false or conspiratorial; they use private partnerships and proprietary algorithms to categorize speech, label such categories with certainty, and then use probability to strike a balance they have defined as “maintaining a platform for free speech and living up to our responsibility to users.”

Financial markets, too, reflect probabilistic speech systems. Thomson Reuters and Dow Jones have created systems to parse economic news stories, decide how confidently they understand the story’s meaning, and then drive nearly instantaneous algorithmic trading decisions based on which confidence measures have been met. Even the creators of computational systems that autonomously produce speech have publicly warned that automatically generated news stories pose significant risks to public discourse because they find that readers are too quick to trust algorithmically produced speech; they underestimate the likelihood of encountering algorithmically generated text and overestimate their ability to distinguish it from human-written stories.

We only ever have a chance of encountering speech as others do right now, or as we did days or even minutes ago. Our experiences with online speech are the product of loosely coupled arrays of sociotechnical systems: people, algorithms, commodifications, thresholds, confidences, intentions. At any given moment, this array drives the probabilities that speech appears, circulates, is believed, has monetary value, and drives action.

To unpack this further, consider three domains in which probability governs speech: the chance that content is banned, platforms’ relationships to the risk profiles of professions, and the extent to which technology companies deflect responsibility for probabilistic outcomes to technical infrastructures and third-party designers.

Probably Banned

First, the idea of a speech “ban” makes little sense in probabilistic online environments. Platforms only ever make it more or less likely that speech circulates; they never guarantee the distribution or disappearance of speech. This likelihood depends on a mix of factors, including whether content is created by a blocked, muted, or hidden user; flagged by users; caught by moderators; understood by translators; classified by humans interpreting community standards; sensed by machine learning algorithms; judged similar to previously blocked content; and highlighted by mainstream media outlets.

In practice, whole languages and regions are never susceptible to such bans because they are not being algorithmically monitored. Though Facebook offers its interface in 111 languages, its algorithms can detect hate speech in just 30 languages and “terrorist propaganda” in 19 languages. Its director of public policy for Africa said that “a lot of people don’t even know that there are community standards”—and thus fail to flag speech. No platform has translated its standards and algorithmic tools into all of the languages its interface supports. Such bans are probabilistic—never binary—and only applicable in languages and regions that are monitored.

For example, after reports that the 2017 London Bridge attackers were radicalized, in part, through extremist YouTube videos, parent company Google said that videos it defined as offensive but not in breach of its community guidelines would carry warnings, not be recommended, and be ineligible to earn advertising revenue. The goal, Google General Counsel Kent Walker said, was to make these videos “have less engagement and be harder to find.” Additionally, although he said that Google’s “Trusted Flagger reports are accurate over 90 per cent of the time,” the program would be expanded to include an additional 50 organizations to be trusted flaggers. The additional flaggers would presumably increase the 90 percent figure to an even more acceptable number. Here, calculation represents trustworthiness: mistakes had been happening 10 percent of the time, but the attacks made that old number unacceptable. It had to decrease by some amount.

In internal discussions about whether to ban Alex Jones from its Instagram platform, Facebook used similarly actuarial thinking to defend its initial inaction. The company’s “risk-and-response team” found that Jones’s account failed to meet its violation threshold: “an IG [Instagram] account has to have at least 30% of content violating at a given point in time as per our regular guidelines.” Note the snapshot nature of this threshold: at least 30% violating content at a given point in time, not 30% over the account’s lifetime. The harm threshold is calculated instantaneously, not cumulatively, and stays the same regardless of context. A U.S.-based Facebook executive went on to say that even the comments on Jones’s account were under an unspoken threshold of acceptability: “The 560 comments have been reviewed. Only 23 [4%] are violating and therefore the object does not meet the threshold for deletion.”
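The difference between a snapshot threshold and a cumulative one can be shown with a small sketch. The 30% threshold and the 23-of-560 comment count come from the statements quoted above. The account history is invented, and the two check functions are hypothetical illustrations, not a description of Facebook’s actual procedure.

```python
VIOLATION_THRESHOLD = 0.30  # the "at least 30% of content violating" figure quoted above


def share_violating(posts):
    """Share of posts marked as violating; `posts` is a list of booleans."""
    return sum(posts) / len(posts) if posts else 0.0


def snapshot_check(live_posts, threshold=VIOLATION_THRESHOLD):
    """Judge only the content visible on the account at this moment."""
    return share_violating(live_posts) >= threshold


def cumulative_check(all_posts_ever, threshold=VIOLATION_THRESHOLD):
    """Judge everything the account has ever posted, including removed content."""
    return share_violating(all_posts_ever) >= threshold


# Hypothetical account: 40 posts over its lifetime, 16 of them violating (40%).
# Assume 10 violating posts were already removed, leaving 30 live posts, 6 violating (20%).
history = [True] * 16 + [False] * 24
live = [True] * 6 + [False] * 24

print(snapshot_check(live))       # False: 20% is under the threshold right now
print(cumulative_check(history))  # True: 40% over the account's lifetime
print(round(23 / 560, 3))         # 0.041, the roughly 4% of comments cited above
```

Under these invented numbers the same account passes the snapshot test and fails the cumulative one, so the choice of time window, not just the choice of percentage, decides the outcome.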

Whereas YouTube’s 90 percent figure was meant to reassure users that its mistakes were rare across its platform—90 percent was good and not as bad as 80 percent, but still not good enough to stave off calls for reform—Facebook’s numbers (30% of violating content, 4% of comments) were seemingly commonsense evidence that only a small part of a single account was offensive, not yet offensive enough to warrant action.

Percentages, anthropologist Jane Guyer argues, “are better seen as performatives, aimed to call forth a judgmental response about ‘the way things are going’: it’s getting too much; it’s not fair anymore; it’s reaching a danger zone.” These judgmental responses are, of course, contingent (when is too much too much?) and contextual (who is 90% certainty or 30% violations good enough for?). In online contexts where bans are impossible—Facebook’s head of artificial intelligence admits “it’s never going to go to zero”—it seems past time to debate normative questions about who has the power to define bans as probably “good enough” and for whom.

Professional Alignment

Probability also can be a way to earn legitimacy by aligning with professions that are already socially accepted as imperfect. Such partnerships can be a way of sharing risk, communicating a system’s imperfection, and managing users’ expectations.

For example, in 2017 Facebook began deploying “suicide prevention tools that use artificial intelligence to identify posts with language expressing suicidal thoughts.” It developed “FBLearner,” a proprietary “machine learning engine, to train a classifier to recognize posts that include keywords or phrases indicating thoughts of self-harm.” Putting aside legal concerns about such systems operating outside of “HIPAA privacy regulations, principles of medical ethics, or rules governing research on human subjects,” the system’s success rests upon several types of probability.

Relying on medical practitioners’ statistical models of how words and phrases correlate with suicidal thoughts and actions, the system translates those probabilities into an algorithmic detector. Probabilities from one domain (clinical evidence) are imported into another (social media posts). Posts are parsed, classified, and correlated with suicide indicators; if certain thresholds are triggered, the system alerts emergency responders, who perform “wellness checks” on the possibly suicidal people.

To be successful, the system has to reliably transfer and translate probabilities from one context to another; social media environments have to look enough like clinical settings that each other’s watchwords can be compared confidently enough to trigger action. Designers and clinicians also have to consider the potential effects of such wellness check systems and their popularity: Will knowing that social media environments are being monitored for suicidal language make at-risk individuals more or less likely to express themselves in such places? If human observers know that such a system is in place, will it make them more or less likely to intervene when they see potentially suicidal language, or will they trust that an algorithm driven by clinically reliable patterns is making a better judgment than they would? And consider the system’s tolerance of false positives and false negatives. If the system errs and sees a suicide risk where none exists, or where the risk does not need a standardized wellness check, will suicide-related speech travel elsewhere and stand a reduced chance of being detected when a wellness check might be appropriate? Or, if Facebook claims to be on guard for suicide-related language and then fails to see genuinely suicide-related risks, has it created false comfort and unreliable expectations for suicide victims and their potential support networks?

To be sure, no system—human or computational—for recognizing suicide is error-free, but this system rests on new types of probabilistic assumptions: that suicidal language in social media platforms means the same things as in other settings; that algorithms can confidently parse subtle language, humor, sarcasm, code switching, and complex social contexts; that people share expectations about what the system is, what it is capable of, and how it compares to human judgment; and that the system’s failures and risks of false positives and negatives have been fully anticipated and understood. As with the previous example in which the Facebook AI head acknowledged the impossibility of completely banning content, Facebook similarly issues caveats regarding the riskiness of this system by saying that “we’re not doctors, and we’re not trying to make a mental health diagnosis.”

Infrastructured Probability

Finally, probability can be a way to deflect responsibility for the configuration of technical infrastructures. In its study of Amazon’s face recognition tool Rekognition, the American Civil Liberties Union found that the technology falsely—but confidently—matched the images of 28 members of the United States Congress with people who had previously been arrested for committing crimes. In its defense of the technology, Amazon did not claim that the tool had erred; rather, it said that the ACLU had failed to apply the proper confidence threshold:

“While 80% confidence is an acceptable threshold for photos of hot dogs, chairs, animals, or other social media use cases, it wouldn’t be appropriate for identifying individuals with a reasonable level of certainty,” the [Amazon] spokesperson said. “When using facial recognition for law enforcement activities, we guide customers to set a higher threshold of at least 95% or higher.”

In response, the ACLU noted that the tool’s default “similarity threshold parameter” is set to recognize faces that are considered 80 percent similar, a default that Amazon’s technical documentation says can be changed. In this example, probability is cast as an institution’s technical responsibility. The ACLU critiques Amazon for creating a tool with an unacceptably low default confidence threshold and not contractually requiring that different thresholds be used in particular contexts. In making it possible for law enforcement officials to license the system and use any standards of certainty they wish, Amazon adopts a standard “buyer beware” defense, suggesting that its technology is just a neutral tool and that it is up to clients to set the appropriate confidence threshold. To the ACLU, probability is the system’s linchpin of certainty where the power of prediction plays out; to Amazon, probability is just another option for its customers to set as they like. Responsibility and accountability play out in the interpretation of probability.
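What is at stake in the choice of a default can be made concrete with a minimal sketch. It assumes a hypothetical set of similarity scores for a single probe photo; the scores, record names, and the helper matches_at are invented for illustration and are not output from, or part of, Amazon’s tool. Only the two thresholds (80 and 95) come from the dispute described above.

```python
# Hypothetical similarity scores (0-100) that a face-matching tool might
# return when comparing one probe photo against a set of arrest records.
# These numbers are invented for illustration; they are not Rekognition output.
candidate_scores = {
    "record_a": 97.2,
    "record_b": 88.5,
    "record_c": 81.0,
    "record_d": 79.9,
    "record_e": 62.3,
}


def matches_at(scores, threshold):
    """Return the record names whose score meets or exceeds the threshold."""
    return sorted(name for name, score in scores.items() if score >= threshold)


# 80 is the default the ACLU describes; 95 is the level Amazon says it
# recommends for law enforcement use.
default_matches = matches_at(candidate_scores, 80)
stricter_matches = matches_at(candidate_scores, 95)

print("threshold 80:", default_matches)
print("threshold 95:", stricter_matches)
```

Under these assumed scores, the same photo yields three “matches” at the default threshold and one at the stricter setting. Nothing about the underlying comparison changes; only the cutoff does, which is why the question of who sets it, and who is obliged to, carries so much weight.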

Further examples abound, but what we can see across all of these examples is how probability is simultaneously an institutional achievement (multiple actors convene to debate what probability means and what thresholds of risk, error, or confidence are acceptable), a sociotechnical construct (probabilities live in algorithmic sensors, large scale datasets, approaches to machine learning), and a normative defense (mistakes will be made, technologists are not clinicians, clients know best). Probability is, at once, a seemingly neutral technique, evidence of power, and a rationalization of risk.

As both its histories and contemporary applications show, probability does not mean any one thing. It can be evidence of expectations about the world and observations of seemingly natural frequencies. It can reveal attempts to control the world through categories used to define normality and punish deviance. And it can mean experimenting on the world, discovering the conditions under which certain people—and machines—can confidently hold expectations about patterns and outcomes.

These differences emerge as statisticians debate their techniques and grapple with how the field should see its social and humanistic underpinnings. They also play out in lawmakers’ images of how to govern people, what a “reasonable” person can expect in likely scenarios, and what it means to administer justice when cases sit between the patterned and the particular. And probability appears in the tacit knowledge and work practices of technological cultures that make, test, and deploy statistical systems. As technologists anticipate, simulate, and normalize error, they calculate systems to be “good enough” to deploy into the wild. Deploying a technology also means releasing benefit and risk into the world—enrolling people in failures you know will probably happen, forcing them to live your probabilistic calculations, and relying upon them to report or in some way reflect your errors. Such an intertwining of risk and probability is not new. As Dryer shows in her insightful history of probability’s role in the politics and design of algorithmic systems of the 1920s onward, probability has always encoded—in practices, artifacts, values—human desires for certainty, control, and objectively defensible risk conclusions. Probabilities are not just mathematical constructs or even social constructions; they are diagnostics of the errors that people and societies are willing, able, or forced to endure. Probability lets us ask precise questions about how consequences are imagined, hoped for, endured, and resisted. It is about making public life.

Probability and Free Speech

Probability matters to free speech and free speech platforms precisely because the probabilities governing communication environments shape our collective ability to see and understand unavoidably shared collective outcomes—to discover ourselves as publics and know our chances of self-governance.

If we take seriously the idea that the First Amendment as currently conceived runs the risk of becoming irrelevant and accept that we need a more expansive vision of the First Amendment as an institutional phenomenon, then we need to know all of the ways that speech is elicited, chilled, and celebrated. Probability is one of those ways. Through their designs, business models, and moderation policies, platforms are constantly making probabilistic, actuarial calculations with the power to shape collective self-governance.

Scholars and practitioners alike are only just beginning to understand how platforms’ sociotechnical dynamics reveal dramatically uneven distributions of probability. Although probabilistic errors may be modeled and represented as collective, shared consequences, they are experienced by particular populations and individual people. How are probabilities distributed? When Facebook says that its fact-checking infrastructure catches 80 percent of falsities on the platform, who endures the harms of the other 20 percent? Or when it says that Alex Jones’s Instagram account isn’t offensive enough to sanction, who endures the unsanctioned hate? When Google says that its flaggers are over 90 percent accurate, who lives with the 10 percent error? When Facebook’s suicide detection engineers acknowledge that “there will be mistakes,” how do we translate this defense into a more active voice—it is not that “mistakes will be made,” it is Facebook making mistakes—and hold the company accountable? Indeed, what is the right unit to hold accountable? An individual mental health practitioner who repeatedly made mistakes would be sanctioned differently from a clinic that erred frequently, and differently again from an entire profession that failed predictably. Where exactly are probability and error—and the accountability thereof—in speech systems that intertwine users, designers, algorithms, regulators, and venture capitalists?
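The gap between an aggregate error rate and its lived distribution can be illustrated with simple arithmetic. The sketch below assumes two hypothetical populations of false posts (they might stand for languages, regions, or communities) and invented counts chosen so that the overall catch rate is 80 percent; none of these figures is real platform data.

```python
# Hypothetical counts of false posts, split across two invented groups.
# The numbers are assumptions chosen so the aggregate catch rate is 80 percent.
false_posts = {
    "group_a": {"total": 800, "caught": 720},
    "group_b": {"total": 200, "caught": 80},
}

overall_total = sum(g["total"] for g in false_posts.values())
overall_caught = sum(g["caught"] for g in false_posts.values())
overall_missed = overall_total - overall_caught

# The headline figure a platform might report.
overall_rate = overall_caught / overall_total

# The same data, broken out by who actually experiences the misses.
per_group_rate = {
    name: g["caught"] / g["total"] for name, g in false_posts.items()
}
share_of_misses = {
    name: (g["total"] - g["caught"]) / overall_missed
    for name, g in false_posts.items()
}

print(f"overall catch rate: {overall_rate:.0%}")
for name in false_posts:
    print(
        f"{name}: catch rate {per_group_rate[name]:.0%}, "
        f"share of all misses {share_of_misses[name]:.0%}"
    )
```

In this assumed case the platform could truthfully report an 80 percent catch rate while the smaller group, which accounts for a fifth of the false posts, absorbs three-fifths of the uncaught ones. The headline number is accurate and still says nothing about who endures the remainder.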

We might also consider how different geographic and cultural meanings of probability play out in speech systems. As discussed, different statistical ideologies and practices have appeared, in various ways, in France, England, Germany, India, China, the United States, and the USSR. To what extent do these variations reflect different cultural understandings of statistics, of the types of surveillance and data gathering required to observe and predict behavior, and of the types of error and confidence statistical systems are culturally thought to have? If statistics is not only a mathematical technique but also a social fact, should the machine learning systems designed to monitor and govern platform speech be designed differently before being deployed in particular regions, cultures, or languages? Platforms already fail to provide machine translation tools for all the languages they support, but should the underlying logics and values of those tools differ?

It may seem esoteric, but the quest for certainty may be harming the planet. Curious about the environmental costs of artificial intelligence, computer scientist Emma Strubell and her colleagues studied the carbon footprint of natural language processing (NLP) technologies, like those that have been used to write compelling “fake news” articles. To be reliable enough for peer-reviewed publications and enterprise-level technologies, such technologies require vast amounts of algorithmic training and computing power, quickly becoming energy-intensive processes with significant environmental consequences. The researchers found that training such a model “can emit more than 626,000 pounds of carbon dioxide equivalent—nearly five times the lifetime emissions of the average American car (and that includes manufacture of the car itself).” Models with lower degrees of confidence had smaller carbon footprints. If we want to create and detect manufactured speech with the scale, speed, and certainty that platforms demand, we will harm the Earth. How do we want to allocate speech certainty, given its ecological impact?

Finally, probability matters to free speech because it goes to the heart of what it means to realize and govern ourselves. If, as Hacking argues, “statistics has consequences for the ways in which we conceive of others and think of our own possibilities and potentialities,” then probability is a key logic of humanity. If the chance that our words spread or that we hear others depends on probabilistic systems, then we have a vested interest in seeing probability as a political technology that either helps or hinders our abilities to think, associate, deviate, adapt, resist, or act. And when we limit probability to one type of concept, one particular operationalization or set of values, we limit our ability to imagine new social arrangements. If a group is inchoate, under attack, or actively marginalized—“enclave publics” who need special protections—should those groups be treated differently by probabilistic systems? Perhaps they should not have to endure any of Facebook’s or Google’s errors. Should people with a large number of followers or subscribers automatically have to statistically demonstrate that they are less likely to disseminate harmful speech, because they are closer to traditional broadcasters? (How many followers, what would the statistical test be, and who would make these decisions?) Should data journalists use different levels of statistical certainty depending on the public importance of a story and the potential impacts of their findings on particular policies and populations? Different theories of the public demand different understandings of probability.

Conclusion

Online speech systems exist at a scale that makes human oversight practically impossible. Their scale forces them into “operating actuarially” and using probability to manage that scale. Currently, probability is most often the instrument of those who have vested interests in maintaining large-scale surveillance economies. Scale makes money, and probability enables scale.

But we should not fear probability, reject it outright as a tool of oppression or control, or relegate it to the narrow domain of statisticians and technologists. Rather, we should interrogate speech systems as probabilities that raise new questions. Are these the meanings and applications of probability we need? Who has enough knowledge to confidently predict these systems’ behaviors, and how can the private and proprietary nature of probabilistic knowledge be challenged? Who suffers from false positives and false negatives? Which interests and beliefs are embedded in the Bayesian classifiers that categorize online speech? Which types of errors are known and tolerated, how is risk distributed, and who has the institutional standing or technological power to challenge thresholds, reject error rates, and renegotiate categories? Should platforms be held accountable not just for errors but also for error rates and confidence thresholds? Should we levy “probability taxes” on those who distribute risk in socially unacceptable ways?

To think anew about probability and the role it plays in speech governance, we need to think anew about the “permanently beta” culture that seems to require failure as a marker of success, encouraging innovators to set error thresholds too low, to move too fast and break too many things.

But not all errors are alike, and they can compound with disastrous consequences. In her study of the Challenger explosion, sociologist Diane Vaughan describes how NASA made “routine decisions” that “normalized technical deviation” so successfully and invisibly that it was blind to the catastrophic failure that its ritualized acceptance of increasingly compounded risks caused. Where in platforms—in their construction, use, and governance—are there understandings of normalized deviance, compound risk, and catastrophic failure? Are platform errors always simply actuarial inevitabilities, or are there some that are so morally unacceptable they might be existential challenges to platforms, their technological designs, and their business models?

We are only beginning to understand how and why to regulate speech platforms. Probability should be a central part of this conversation.