A long-held value in democratic society is support for a public sphere: an accessible space for the exchange of ideas, civic engagement, and the development of public opinion.[1,2] Beyond its intrinsic importance in promoting transparency and inclusion, a healthy public sphere plays an instrumental, epistemic role in democracy as an enabler of deliberation, providing a means for tapping into citizens’ collective intelligence.[3–6] In the early 2000s, the advent of social media platforms and new communication technologies promised a high-powered, digital public sphere that would move democracy online and foster a more engaged and informed body politic.[7] Yet, that promise has largely gone unrealized, and many have cited the algorithmic curation of content online as a danger to democracy.[8–10] Why is this the case and what can we do about it? In this article, I aim to address these questions and outline a research agenda to design, deploy, and evaluate algorithms that curate content in ways that support deliberation and enhance collective intelligence online.

From “Liberation Technology” to the “Post-Truth Era”

Just about five years after their respective releases, Facebook and Twitter were touted as literally revolutionary tools. Through its enabling of cheap, fast, and easy peer-to-peer communication at scale, social media was credited with playing an important role in uprisings around the world, such as Iran’s 2009 “Green Revolution,” Egypt’s 2011 Tahrir Square protests, and the 2011 “Occupy Wall Street” movement in the United States.[11–14] Whereas, in the past, information was largely controlled by media institutions that would serve and propagandize on behalf of their political and financial supporters,[15] there was a feeling that social media undercut such gatekeepers and democratized information. Social media appeared to be a “liberation technology” that empowered “citizens to report news, expose wrongdoing, express opinions, mobilize protest, monitor elections, scrutinize government, deepen participation, and expand the horizons of freedom.”[7] While not without its critics, the sentiment around social media and the promise of an online public sphere was generally positive.

But that sentiment soon changed. By 2016, it was argued that we had entered a “post-truth”[16] era in which “a large share of the populace [lives] in an epistemic space that has abandoned conventional criteria of evidence, internal consistency, and fact-seeking.”[17] People had “had enough of experts.”[18] Objective reality could be ignored in favor of “alternative facts.”[19] Rather than ushering in the era of the more informed citizen, the increasing use of social media had coincided with events that seemed to fundamentally question people’s ability to reason, deliberate, and form accurate beliefs.

It must be said, however, that causal links between such events and the rise of social media have proven difficult to establish. For instance, take the popular claim that the decentralization of information on social media polarizes people by presenting only belief-reinforcing content in so-called “echo chambers” and “filter bubbles.”[8,9] This claim became so mainstream that President Barack Obama cited it as a key threat to democracy in his 2017 farewell address.[20] Yet, the empirical merits of this claim are widely disputed in academia,[21–24] and the polarization seen in the United States today can be traced through congressional voting records back to the 1970s and 1980s, well before social media as we know it even existed.[25]

We could equally consider the claim that social media use promotes conspiratorial thinking. Indeed, it seems intuitive that conspiracy theorists, who would otherwise be restricted to the fringes in the analog world, can coalesce in online forums and reinforce each other’s beliefs.[17,26] For example, the QAnon conspiracy theory—which centers on claims that blood-drinking, pedophilic elites are working to take over the world[27]—is understood to have originated on the anonymous 4chan forum in the “politically incorrect” (/pol/) imageboard.[28] However, conspiracy theorizing has also punctuated much of our pre-social-media history—from speculation that Emperor Nero deliberately had Rome burnt down whilst singing in AD 64, to the Salem Witch Trials in 1692 to 1693, to the “Red Scare” of the 1940s and 1950s—and empirical analysis suggests that conspiratorial content has varied over time but has not substantially increased.[29,30]

Nevertheless, it would be inaccurate to conclude, based on the absence of consensus, that social media does not have detrimental effects on society or fails to challenge people’s ability to reason and deliberate. Despite the heterogeneity in findings, a recent systematic review of the literature conducted by Philipp Lorenz-Spreen and colleagues uncovers some striking patterns. Across 496 academic articles, increased use of social media and the internet was generally found to be associated with increased exposure to misinformation, decreased trust in institutions, and growing polarization.[31] Interestingly, however, it was simultaneously observed that, in general, use of such technology is associated with increased political knowledge and participation.[31] Interpreting these results is not straightforward. Findings vary depending on how variables of interest are defined and on the political context in which they are studied. Still, it seems uncontroversial to say that the sentiment around what social media is, and what it could be, has taken a negative turn. The early promise of an online public sphere has not come to fruition.

Understanding the Role of Algorithms in the Online Public Sphere

How did social media go from being viewed as a “liberation technology” to becoming a driving force of misinformation and polarization? Why has increased access to information and communication not seemed to translate into a more informed public?

There is no single answer to these questions. For example, it would be naïve to say that long-term, offline societal trends—like growing economic inequality[32,33] and declining social trust in the United States[34,35]—have done no harm to the public’s ability to communicate and reason collectively. It would equally be naïve to pin blame on either technology or users alone without acknowledging the feedback loops between them.[36,37] But while we have no concise, certain explanation, we are not clueless. To begin to understand why social media has not provided the online public sphere it was hoped to provide, a combination of factors should be acknowledged: information overload, algorithmic amplification, and engagement-based ranking.

The first factor stems from the observation that the advent of social media and the internet at large has brought us into an information-rich world. In the past, limited access to information imposed a bottleneck on our knowledge; this is no longer the case. But as recognized early on by Herbert Simon,

the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention.[38]

In other words, we now find ourselves faced with information overload. It is impossible for people to engage with all information available to them.[39,40]

In response to this information overload, we have come to rely on content-curating algorithms such as those underlying recommender systems on social media like Facebook, Twitter, and TikTok, and search engines like Google, Bing, and DuckDuckGo.[41] These algorithms provide a truly indispensable service. If I want to, say, learn how to bake a Basque cheesecake, I can instantaneously retrieve recipes and instructional videos thanks to Google’s search algorithm, which presents me with the most relevant content based on my personal search history and the linkage structure among webpages.[42,43] Without this algorithm, I would be left to scroll and click through masses of content that likely has nothing to do with cake at all. I still have agency with respect to which of the algorithm’s recommended recipes and videos to attend to, but I let the algorithm do much of the work for me and significantly narrow my focus.

On the other hand, the algorithms we rely on to navigate through the overload of information online are not neutral. By design, they tend to promote some types of content while suppressing others, a process sometimes referred to as algorithmic amplification.[44,45] While this is innocuous or beneficial in many instances, such as my search for a Basque cheesecake recipe, algorithmic amplification has been argued to have a dark side.

As currently implemented, algorithmic amplification is inherently paternalistic. The underlying algorithms are designed by platform engineers who get to decide what it means for some content to be “relevant” to some user. Since most platforms have commercial goals to retain their users and maximize revenue, they are incentivized to design algorithms that amplify content in ways that achieve these goals. Translating these goals into concrete measures so an algorithm can select which single content items should be shown to which individual users at a given point in time requires some abstraction, which is why most platforms design their algorithms to optimize for engagement.[37] Engagement—be it likes, retweets, and replies on Twitter or comments and watch time on YouTube—serves as a proxy for platforms’ commercial goals. When users are engaging with content on a platform, it at least means users are on the platform. So, by designing algorithms to amplify content that users are likely to engage with, the understanding is that platforms can best achieve their commercial goals.

This approach to algorithmic amplification, where the content that is most likely to be engaged with is prioritized in a user interface or “feed,” is typically referred to as engagement-based ranking[46,47] (for a description of how ranking fits into the broader architecture of algorithmic amplification and recommender systems, see [48]). Although specifics vary from platform to platform, and there are often many measures being optimized for in concert with engagement, engagement-based ranking is known to be at the core of most social media platforms.[37]
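As a rough illustration of the ranking step described above, the following sketch orders a few candidate posts by a single predicted engagement score. The `Post` fields, the weights, and the function names are all hypothetical and chosen only for illustration; platforms combine many more signals and do not publish their exact formulas.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    p_like: float   # hypothetical predicted probability of a like
    p_reply: float  # hypothetical predicted probability of a reply
    p_share: float  # hypothetical predicted probability of a share


# Illustrative weights only; not taken from any real platform.
WEIGHTS = {"p_like": 1.0, "p_reply": 2.0, "p_share": 3.0}


def engagement_score(post: Post) -> float:
    """Weighted sum of the predicted engagement probabilities."""
    return sum(weight * getattr(post, name) for name, weight in WEIGHTS.items())


def rank_feed(posts: list[Post]) -> list[Post]:
    """Order posts from highest to lowest predicted engagement."""
    return sorted(posts, key=engagement_score, reverse=True)


candidates = [
    Post("calm_explainer", p_like=0.10, p_reply=0.02, p_share=0.01),
    Post("outrage_bait", p_like=0.20, p_reply=0.15, p_share=0.10),
    Post("friend_update", p_like=0.30, p_reply=0.05, p_share=0.00),
]
feed = rank_feed(candidates)
print([post.post_id for post in feed])
```

Nothing in this scoring rule asks whether a post is informative; it only asks whether the post is likely to be interacted with, which is the concern taken up in the paragraphs that follow.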

What does this mean for the online public sphere? First, it is important to recognize that engagement-based ranking is not inherently bad. When people choose to engage with something online, they presumably get some value from doing so.[49] I subscribe to and spend more time on YouTube channels that I enjoy watching; I engage with my friends on Facebook because it helps us stay in touch while geographically separated. But, under the pressure of information overload, the content that typically gets engaged with is content that is attention-grabbing, and content that is attention-grabbing is not necessarily informative or conducive to effective deliberation. For example, studies suggest that the content that is most likely to be engaged with or “go viral” is content that is laden with negative emotion,[50,51] divisiveness,[52,53] and falsehoods.[54]

It is difficult to quantify the effects of algorithmic amplification and engagement-based ranking, and we lack decisive evidence as to whether they help or hinder people’s ability to reason and form accurate beliefs.[45] Yet the general implication is rather clear: current social media platforms and their content-curating algorithms do not provide an online public sphere that is suited to informative, democratic deliberation. Online, “the unforced force of the better argument”[55] does not seem strong.

Helping Users Navigate the Online Public Sphere

Given widespread concerns surrounding the impact of social media, content-curating algorithms, and the internet on society, there have been notable efforts to develop tools and interventions to support people’s navigation of the online public sphere. The basic idea is that by making small changes in the user interface, supporting users’ digital competencies, or regulating platforms to be more transparent, people might be able to have more informative and less misleading experiences online (for a detailed overview, see [56]). For example, third-party “fact-checks” and crowdsourced “community notes” can be integrated on social media as content warning labels;[57,58] simple prompts can be used to nudge people towards self-reflection before disseminating content;[59] gamified “inoculation” interventions can reduce people’s susceptibility to common misinformation techniques;[60] lessons on digital literacy and “lateral reading” can boost people’s ability to accurately evaluate content;[61,62] and the EU’s General Data Protection Regulation gives users more control over their data to guard against unwanted personalization of content.[63]

All of these efforts are valuable and worthwhile. They help people get the most out of access to information online without encroaching on their agency. However, these existing efforts have tended to focus on just part of the picture. They target user behavior and cosmetic changes to the user interface (the “frontend” of the online public sphere) while ultimately leaving much of the burden with the user. For instance, fact-checks can add important context to content to aid users’ evaluation, but it is still up to the user to judge whether the third-party fact-checking organization is credible. And research suggests that even debunked falsehoods can persist in people’s reasoning.[64–66] Similarly, while nudges, boosts, inoculation games, and disclosures of personal data collection can warn and prepare users for deleterious aspects of the online public sphere, they all require users’ attention and self-control, which already seems scarce in the face of information overload. Based on our understanding of the role of content-curating algorithms, more focus should be devoted to (re)designing the “backend” to change the way content is algorithmically ranked and amplified.

(Re)Designing Algorithms for a Better Online Public Sphere

What if we viewed algorithmic ranking and amplification not as a threat to be mitigated, but as an opportunity? We need algorithms to help us navigate information online, but those algorithms need not optimize for engagement. Is it possible to engineer a better online public sphere by changing the algorithms that mediate it?

In a promising new stream of research, it has been pointed out that the algorithms underlying recommender systems on some of the world’s largest platforms and apps can be modified to better align with human values.[67,68] Drawing from the broader literature on “AI Alignment,”[69,70] the general framework for doing so involves identifying content types of interest (e.g., high-level categories like “meaningful social interactions”[71]), developing concrete metrics so such content can be labelled, and then adjusting and training algorithms on that labelled data to alter the prevalence of content types in recommendations as desired.[67] Encouragingly, there are already examples of this being done in practice.

At Spotify, for instance, it has been recognized that an exclusively engagement-based approach to recommendation would result in a rich-get-richer feedback loop, whereby the popularity of top performers gets rewarded over and over again while new artists have little opportunity to break into the market. For this reason, Spotify researchers have included a “fairness” metric in their algorithm so that new artists, and artists of different demographics, are more likely to be included in recommendations and playlists.[72]

At YouTube, criticism that their engagement-based approach to recommendations amplified toxic videos was rebutted with an explanation that, “since 2017, YouTube has recommended clips based on a metric called ‘responsibility,’ which includes input from satisfaction surveys it shows after videos.”[73] Although this measure still relates to engagement, the idea is that it is more socially responsible to optimize for what users like in a deliberative, reflective sense, rather than what they impulsively click on (although there are limited details on how extensively this approach is, or was, used).

Both the Spotify and YouTube examples demonstrate how it is feasible to redesign algorithms for a better online public sphere. But we can also go a step further. Just as we should recognize that the status quo of engagement-based ranking can be improved, we should also recognize that we do not need to limit ourselves to existing social media platforms. If we want to realize the early promise of a democratic online public sphere, it seems worthwhile to consider how we might design algorithms to power entirely new, purpose-built civic platforms. By this I mean to stress that, when we envision how to design better algorithms, it is not a legitimate argument to say that the only algorithms of interest are those that would work for Twitter, Facebook, or any other existing platform.

Although there is relatively limited academic study on how to best design algorithms for purpose-built civic platforms, researchers and technologists have begun to recognize and encourage development in this space. In a recent seminar, for example, political theorist Hélène Landemore explained how algorithms can help scale up democratic deliberation by sorting, clustering, and moderating information to “unburden” individuals so that they can focus on the substantive task of evaluating and communicating arguments.[74] In fact, this kind of algorithmically supported deliberation is already being put into practice with tools like the Stanford Online Deliberation Platform and Decidim. The most well-known example, however, is Polis.

Polis is an open source web application where users deliberate on a desired topic by submitting statements and voting on others’ statements (agree, disagree, or pass).[75] Through this process of statement elicitation and voting, a sparse matrix is created where each row represents a user and each column represents a unique statement, with individual data points indicating specific votes. By then applying Principal Component Analysis and K-Means clustering algorithms to the matrix, Polis identifies and visualizes opinion groups among users in real-time. This mapping of the opinion space can support deliberation by illuminating points of consensus and dissensus, but beyond this, it also serves to inform the order in which statements are shown to users. Each statement gets assigned a priority based on whether the statement is likely to aid in identifying distinct clusters, build consensus, or if it is new to the conversation.[76]
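To make the pipeline described above more concrete, here is a minimal, dependency-free sketch of the same idea: build a user-by-statement vote matrix, project users onto the first principal component, and split them into two opinion groups with k-means. This is not Polis's code. The vote encoding (1, -1, 0), the use of power iteration in place of a library PCA routine, the restriction to one component and two clusters, and all function names are assumptions made for illustration.

```python
# Hypothetical vote matrix: rows are users, columns are statements.
# 1 = agree, -1 = disagree, 0 = pass or not yet seen (an assumption made
# here so the sparse matrix can be handled as a dense one).
votes = [
    [1, 1, -1, 0],
    [1, 1, -1, -1],
    [1, 0, -1, -1],
    [-1, -1, 1, 1],
    [-1, -1, 0, 1],
    [-1, -1, 1, 1],
]


def center_columns(matrix):
    """Subtract each column's mean so PCA captures variation between users."""
    n_rows = len(matrix)
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def project(matrix, direction):
    """Dot product of every row with a direction vector."""
    return [sum(x * d for x, d in zip(row, direction)) for row in matrix]


def first_principal_component(matrix, iterations=200):
    """Power iteration on X^T X; returns a unit-length direction vector."""
    n_cols = len(matrix[0])
    vec = [1.0 / (j + 1) for j in range(n_cols)]  # arbitrary fixed start
    for _ in range(iterations):
        scores = project(matrix, vec)  # X v
        vec = [sum(s * row[j] for s, row in zip(scores, matrix))
               for j in range(n_cols)]  # X^T (X v)
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    return vec


def two_means_1d(values, iterations=20):
    """K-means with k=2 on one-dimensional data, seeded at the min and max."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


centered = center_columns(votes)
pc1 = first_principal_component(centered)
positions = project(centered, pc1)  # each user's coordinate on PC1
groups = two_means_1d(positions)
print(groups)
```

In this toy matrix the first three users and the last three users vote in opposite ways, so they end up in different groups. A statement-priority rule like the one described above could then be built on top of such group labels.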

Though Polis was purposefully designed with deliberation in mind, the basic architecture of how it curates content is not entirely unlike how social media platforms curate content. Of course, the visualization of opinion groups is unique to Polis, and the matrix factorization it conducts would be difficult to scale to, for example, Twitter. Yet, both Polis and mainstream social media identify clusters or “communities” of users, and both Polis and mainstream social media algorithmically rank statements or “posts” based on predictions of their impact. The key difference is that where social media platforms prioritize posts that are predicted to receive engagement, Polis prioritizes posts that are likely to contribute to “group-aware consensus.”[76] In doing so, Polis has had notable success in Taiwan, where it has been used to host large-scale, public deliberation events on contentious policy topics, like how to regulate Uber, or whether Taiwan should share a time zone with mainland China.[77,78]

Taken together, the ongoing work to align recommender systems with human values and to integrate algorithms into civic deliberation platforms underscores a key point: algorithms are not the enemy, they just need a makeover. Algorithmic ranking and amplification can play an influential, positive role in shaping the online public sphere.

How to Optimize for Collective Intelligence

Once we acknowledge that the algorithms driving social media recommender systems and civic platforms can be programmed to optimize for socially beneficial outcomes, it seems reasonable to ask: Can we optimize for collective intelligence? Specifically, can we design algorithms to curate content in ways that support deliberation and improve the accuracy of our beliefs?

As already alluded to in this article, a healthy public sphere has instrumental, epistemic value; it helps the public develop accurate collective beliefs.[3,5,6] Although it seems intuitively desirable for a public to be collectively accurate, it is worth reflecting on why exactly collective accuracy is important.

Consider some of the global challenges we face today: pandemics, climate change, nuclear war, and the advent of disruptive artificial intelligence. Regardless of any cultural or ideological principles, a necessary first step to managing these challenges is to accurately judge or forecast risks so that we can efficiently allocate finite resources. For example, if the immediate risk of climate change negatively impacting the population is significantly higher than the risk of artificial general intelligence negatively impacting the population, then it would seem sensible to allocate more investments and public attention to mitigating climate change. Such risks are of course difficult to quantify, but ultimately, there is some ground truth out there, and arguably, it is our ability to collectively align with that truth that determines what the future will look like.

Given the importance and intuitive appeal of collective accuracy, it seems worthwhile to develop algorithms and design the online public sphere in ways that help us become more collectively accurate. How could this be achieved? One natural first thought might be to algorithmically amplify accurate content. But, in many real-world contexts the truth is fundamentally uncertain (e.g., forecasting some future event, or estimating the prevalence of a disease with limited tests and asymptomatic cases), meaning it is often impossible to quantify factual accuracy at the level of single content items (e.g., individual tweets). Alternatively, we could remove content items that are known to be false, but this approach also requires knowledge of a ground truth and raises concerns of censorship.

A second thought might be to amplify users with track records of accurate judgments on a given topic. While this is feasible for contexts in which we have ready access to a history of repeated judgments from individuals (e.g., on a collective forecasting platform like Metaculus), such contexts seem rare in general. Users’ track records are often inaccessible, judgments announced online are often vague and imprecise, and even in contexts where a verifiable track record is present, it is unclear how it should be factored in when users make out-of-domain judgments or forecasts on entirely novel future events (e.g., expert physicists and neuroscientists predicting COVID-19 cases [80,81]).

Much like social media platforms’ goals of user retainment, ad revenue, or “meaningful social interactions,” collective intelligence is a high-level goal that is difficult to operationalize. This is perhaps why collective intelligence—specified here as collective accuracy in judgment and decision-making—is yet to be translated into optimizable, low-level metrics despite work having already articulated “knowledge” and “accuracy” as human values.[68] Nevertheless, existing research on wisdom of the crowd effects and argumentation theory provides promising insights to pursue.

How our understanding of wisdom of the crowd effects can help translate collective intelligence into algorithmic terms

The body of literature on so-called wisdom of the crowd effects—whereby the collective judgment of a group is more accurate than the judgments of individual experts or the individual group members themselves—is rich and long-standing.[82–85] The basic logic underlying such effects is that despite any given individual’s judgment being affected by some error, aggregating across many independent individuals leads to those individual errors cancelling out, hence the robustness of the “majority rule” voting principle.[86]
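The error-cancelling logic behind majority rule can be illustrated with a short calculation. The sketch below computes the probability that a simple majority is correct on a binary question, under the strong assumptions that voters judge independently and share the same individual accuracy. The function name and the example accuracy of 0.6 are illustrative choices.

```python
from math import comb


def majority_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that more than half of n independent voters are correct."""
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


for n in (1, 11, 101):
    print(n, round(majority_accuracy(n, 0.6), 3))
```

When each voter is right more often than not, the majority's accuracy rises with group size. The independence assumption carries much of the weight here, which is one reason social influence matters so much in the discussion that follows.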

With respect to how algorithms could be designed to amplify content for the betterment of collective accuracy, the literature on wisdom of the crowd effects can assist in identifying low-level metrics to operate on. For instance, existing studies propose concrete ways for identifying accurate individuals in a crowd without access to the ground truth: accurate individuals tend to display greater resistance to social influence[87,88] and make judgments that are similar to the judgments of others (since there’s one way to be correct but many ways to be wrong).[89,90] In other studies, it has been demonstrated that there are systematic relationships between the distribution of individuals’ beliefs and how social influence will affect collective accuracy. For example, when individuals’ beliefs are normally distributed, collective accuracy tends to benefit from decentralized social influence (i.e., every individual contributes equally), but when that distribution is skewed with a fat tail, collective accuracy tends to benefit from centralized social influence (i.e., some individuals are amplified more than others).[91] Elsewhere, seminal research on the diversity prediction theorem shows that the cognitive diversity of individuals is as important to collective accuracy as the average competence of the individuals, since it is the presence of diversity that protects a group from accuracy-degrading bias.[92,93]
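The diversity prediction theorem mentioned above is an exact identity for squared error: the crowd's error equals the average individual error minus the diversity of the estimates. The sketch below checks that identity numerically; the estimates and the true value are made-up numbers for illustration.

```python
from statistics import fmean


def diversity_decomposition(estimates, truth):
    """Return (collective error, average individual error, diversity).

    All three terms are squared-error quantities:
      collective error = (crowd mean - truth)^2
      average error    = mean of (estimate - truth)^2
      diversity        = mean of (estimate - crowd mean)^2
    """
    crowd = fmean(estimates)
    collective_error = (crowd - truth) ** 2
    average_error = fmean([(x - truth) ** 2 for x in estimates])
    diversity = fmean([(x - crowd) ** 2 for x in estimates])
    return collective_error, average_error, diversity


# Hypothetical estimates of a quantity whose true value is 65.
collective, average, diversity = diversity_decomposition([40, 50, 70, 80], truth=65)
print(collective, average, diversity)
print(abs(collective - (average - diversity)) < 1e-9)
```

Because diversity enters with a negative sign, a group of equally skilled individuals does better collectively the more their estimates differ from one another.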

Taken together, these findings from the literature on wisdom of the crowd effects offer valuable insights. They tell us that, to optimize for collective accuracy, we do not necessarily need to quantify accuracy at the level of content items, nor do we need access to judgmental track records to identify accurate individuals, and in contexts where identifying accurate individuals is not possible, promoting diversity may suffice. Recent proof-of-concept studies that build on these findings show that it is possible to algorithmically amplify some users based solely on where they are positioned in a distribution of beliefs, without knowledge of the ground truth or individuals’ judgmental track records, such that the resulting collective, aggregate judgment following deliberation is more accurate than it would be otherwise (Box 1).

How our understanding of argumentation theory can help translate collective intelligence into algorithmic terms

Another body of literature that could contribute to the development of low-level metrics for algorithms to operate on is argumentation theory, which examines the structure, dynamics, and normative validity (as opposed to descriptive persuasiveness) of arguments.[94,95] Existing research in this domain has drawn from philosophy, psychology, linguistics, and computer science to develop abstract representations of arguments,[96–98] formalize methods for evaluating argument strength,[99,100] and automate tools for mining arguments and their features from text.[101] For example, research on Bayesian argumentation proposes that the normative strength of competing arguments could be compared by computing the belief change prescribed by Bayes’ rule for each one while holding the prior and hypothesis in question constant;[99] machine learning models can predict argument strength in student essays;[102] and systems leveraging language models can automatically generate knowledge graphs from The New York Times articles.[103]
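The Bayesian comparison described above can be sketched in a few lines: fix the prior for a hypothesis, then compute how far each argument should move that belief under Bayes' rule. The argument labels and likelihood values below are invented for illustration, and the function names are hypothetical.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' rule for a binary hypothesis H given evidence e."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))


def belief_change(prior, p_e_given_h, p_e_given_not_h):
    """How far the evidence should move belief in H from the prior."""
    return posterior(prior, p_e_given_h, p_e_given_not_h) - prior


PRIOR = 0.5  # held constant across arguments

# Hypothetical arguments, each given as (P(e | H), P(e | not H)).
arguments = {
    "eyewitness report": (0.8, 0.4),
    "lab test": (0.9, 0.1),
}

changes = {name: belief_change(PRIOR, *lik) for name, lik in arguments.items()}
strongest = max(changes, key=changes.get)
print(changes, strongest)
```

On this reading, an argument is stronger the more its evidence discriminates between the hypothesis and its negation, regardless of how persuasive it happens to sound.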

This stream of research raises the question: What if we could algorithmically amplify good arguments online? If algorithms could classify and promote strong, well-structured arguments over fallacious, weak arguments, this could seemingly foster more informed deliberation online.

However, there are both conceptual and technical challenges that limit the immediate application of argumentation theory to the concrete design of algorithms for recommender systems and civic platforms. First and foremost, despite various proposals in the literature, what constitutes a good argument and how to evaluate it remains an open question among argumentation theorists,[99,104] and there is inconsistency when it comes to how researchers operationalize components of arguments.[105] Second, existing tools that have been developed for argument mining have generally been tuned to rich text in the form of monologues and structured dialogue, making it difficult for them to be directly applied to messy, naturalistic, short texts such as those commonly posted on social media or civic deliberation platforms.[101,106] Third, there is a lack of empirical studies testing whether arguments that are normatively strong per argumentation theory can actually lead people to arrive at more accurate collective judgments. For these reasons, more research is necessary before argumentation theory can directly contribute the low-level metrics needed for the design of algorithms. Nevertheless, the relevance of and opportunity for such contributions seem clear, and further investments in this space should be encouraged.

Box 1. Proof of concept: Algorithmic amplification can benefit collective accuracy in online social networks.

In online settings like social media, deliberation often takes place across network structures where individuals can only communicate locally with others with whom they share a connection. While deliberation across certain network structures can hinder collective decision-making by making groups more susceptible to “groupthink,”[88,107] it also seems plausible that online network structures may be engineered so that deliberation is more likely to increase the accuracy of collective decisions. Based on this observation, we introduced rewiring algorithms as a potential new tool for supporting deliberation online.[108,109] Rewiring algorithms are simple, programmable rules that are applied to deliberating social networks to dynamically change who communicates with whom depending on the numerical beliefs they report. In the language of algorithmic amplification, rewiring algorithms amplify select users by increasing their visibility to other individuals in the network.

Through agent-based modelling, we first prototyped different rewiring algorithms to see how they would influence collective accuracy among simulated agents deliberating over which of two alternatives is true (just as real people might deliberate over whether some future event will occur). Crucially, the algorithms we designed operate solely on the distribution of individuals’ beliefs, without any knowledge of the ground truth. For instance, we tested an algorithm that amplifies the individual who, at any given time, holds the most extreme belief that aligns with the majority’s favored alternative (the mean-extreme algorithm), and an algorithm that exposes the median individuals to those who hold eccentric beliefs on the tails of the distribution (the polarize algorithm). Our simulation results show that both of these algorithms can improve collective accuracy, depending on how the individuals’ beliefs are distributed before deliberation.[109]
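The two rules described above can be sketched as simple selection functions over a list of reported beliefs. This is a loose illustration and not the implementation used in the cited studies. It assumes that beliefs are probabilities that the first alternative is true, that the favored side is read off the mean belief, and that "median" and "tail" users are picked by rank; the function names and the `n_tail` parameter are hypothetical.

```python
from statistics import fmean


def mean_extreme_target(beliefs):
    """Index of the user holding the most extreme belief on the favored side.

    Assumption: the group favors the first alternative when the mean belief
    is at least 0.5, and the second alternative otherwise.
    """
    indices = range(len(beliefs))
    if fmean(beliefs) >= 0.5:
        return max(indices, key=lambda i: beliefs[i])
    return min(indices, key=lambda i: beliefs[i])


def polarize_edges(beliefs, n_tail=1):
    """Pair the user(s) at the median with users on each tail.

    Returns (median_user, tail_user) index pairs: new connections that would
    expose middle-of-the-distribution users to eccentric beliefs.
    """
    order = sorted(range(len(beliefs)), key=lambda i: beliefs[i])
    tails = order[:n_tail] + order[-n_tail:]
    # One middle user for odd group sizes, two for even group sizes.
    medians = order[(len(order) - 1) // 2 : len(order) // 2 + 1]
    return [(m, t) for m in medians for t in tails if m != t]


# Hypothetical beliefs reported by five users in one round.
beliefs = [0.55, 0.70, 0.35, 0.90, 0.60]
print(mean_extreme_target(beliefs))
print(polarize_edges(beliefs))
```

Neither function looks at the ground truth; both operate only on where each user sits in the current distribution of beliefs.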

To empirically test the conclusions of our modelling, we next conducted an online multiplayer experiment in which we recruited crowdworkers from Amazon’s MTurk platform onto an experimental web application to collectively forecast near future events (e.g., “Will the U.S. rejoin the Paris Climate Agreement by 8 February 2021?”). Each crowdworker joined a 16-person social network and then proceeded through four rounds of deliberation, where they provided probabilistic forecasts along with short, written rationales and viewed those of their network neighbors. Unbeknownst to the crowdworkers, they were randomly assigned to a specific algorithmic condition: they were either in a control condition where deliberation took place across a static network structure, or an experimental condition where deliberation was mediated by one of our rewiring algorithms. Upon analyzing the collective, aggregated forecasts of these networks, we found that, on average, deliberation mediated by the polarize algorithm increased accuracy, whereas deliberation in static networks decreased accuracy. Moreover, the deliberation mediated by the polarize algorithm was less likely to produce extreme, potentially overconfident collective forecasts.[108] While our studies face limitations of ecological validity, the results of our simulations and online experiment demonstrate that it is possible for algorithms to mediate deliberation online in epistemically beneficial ways.

A Research Agenda to Develop Algorithmic Ranking Systems for Collective Intelligence

Based on our current understanding of the role of algorithms in the online public sphere and the literature on wisdom of the crowd effects and argumentation theory, it seems worthwhile to explore how algorithms could be designed to promote collective intelligence. While it would be impossible to do so with any single study, progress can be made through an organized research agenda similar to the phased approach recently proposed for aligning recommender systems with human values.[67]

Concretely, the overarching objective of this research agenda is to design, deploy, and evaluate algorithms that promote online deliberation and collective intelligence—defined here as the objective outcome of collective accuracy. The goal is for such algorithms to eventually be integrated as online deliberation tools for civic platforms, and further, to potentially feature in or inform the design of social media platforms’ recommender systems.

In the remainder of this section, I outline two ongoing phases of research aiming to contribute toward this objective: identifying low-level intelligence metrics and conducting experiments with algorithmic ranking.

Identifying and formalizing “intelligence metrics”

The first phase involves identifying and formalizing what I call “intelligence metrics”—metrics that quantify whether some content is likely to elicit belief updates from specific users that would benefit collective accuracy. The purpose of these metrics is to enable algorithms to generate rankings or “feeds” of content for users, just as engagement-based ranking algorithms do. Intelligence metrics can be estimated from different data types depending on the platform on which users are deliberating, such as users’ explicitly reported beliefs, user-generated text, users’ engagement with content items, and social network structures.

On purpose-built, civic platforms, it is possible to get precise, explicitly reported beliefs from users. For example, Metaculus users interact via comments while providing their own quantitative forecasts, be it estimating the value of a company’s future market shares or the probability of some transformative future event occurring. In this case, an intelligence metric for a given comment could be as simple as the distance between the commenter’s elicited belief and another user’s elicited belief, or the ordered position of a commenter’s elicited belief within the distribution of all beliefs pertaining to that topic (as in our proof-of-concept studies, Box 1).
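The two belief-based metrics just described can be sketched in a few lines. This is an illustrative sketch only: it assumes beliefs are reported on a zero to 100 percent scale, and the function names are hypothetical rather than taken from any platform's API.

```python
def belief_distance(commenter_belief, reader_belief):
    """Absolute distance between a commenter's and a reader's reported belief."""
    return abs(commenter_belief - reader_belief)


def belief_rank_position(commenter_belief, all_beliefs):
    """Ordered position of the commenter's belief within all beliefs on a topic,
    expressed as the fraction of beliefs at or below it (assumed convention)."""
    at_or_below = sum(b <= commenter_belief for b in all_beliefs)
    return at_or_below / len(all_beliefs)


topic_beliefs = [10, 30, 50, 80, 95]  # made-up forecasts, in percent
distance = belief_distance(80, 30)
position = belief_rank_position(80, topic_beliefs)
```

A ranking system could then, for instance, prioritize comments whose authors sit far from the reader or at a particular position in the distribution; which choice benefits collective accuracy is precisely what remains to be tested.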

On both civic platforms and social media platforms, users deliberate by exchanging text through posts, comments, and replies. Deriving intelligence metrics from text could involve generating embeddings and comparing cosine similarity, mining arguments to identify those that are structurally sound, extracting knowledge graphs, or even something as simple as counting keywords. For instance, research has found that accurate forecasters tend to justify their forecasts with more dialectical complexity (using words like “however” and “probability”) and references to comparison classes (using words like “last” and “was”).[110]
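As a sketch of the simplest option, keyword counting, the snippet below tallies the four example words mentioned above. The two word sets contain only those examples and are not the full dictionaries used in the cited research; the tokenization rule is likewise an assumption.

```python
import re

# Example markers taken from the text above; real dictionaries would be larger.
DIALECTICAL_WORDS = {"however", "probability"}
COMPARISON_WORDS = {"last", "was"}


def keyword_counts(text):
    """Count dialectical-complexity and comparison-class markers in a rationale."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "dialectical": sum(t in DIALECTICAL_WORDS for t in tokens),
        "comparison": sum(t in COMPARISON_WORDS for t in tokens),
    }


counts = keyword_counts("However, the probability was higher last year.")
```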

On platforms where users can upvote, downvote, “like,” or otherwise engage with specific content items, these recorded indicators of engagement can also be used to derive metrics for algorithmic ranking systems. This is the basic mechanism for engagement-based ranking, where an algorithm is trained to predict which content items are likely to receive high levels of total engagement (i.e., a weighted sum of upvotes, replies, etc.) and subsequently rank those items higher in a feed.[37] An intelligence-based ranking algorithm could also leverage these engagement data, albeit in different ways. For instance, an intelligence metric derived from engagement data might involve calculating a ratio of exposed, upvoting users from one opinion group over exposed, nonengaging users from another opinion group. If a content item is upvoted by all those who have seen it in one opinion group but ignored by most of those who have seen it in another opinion group, then it might be beneficial to amplify it among users of the ignoring group to ensure it is not overlooked.
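One possible reading of this engagement-derived metric is sketched below. Rather than a raw ratio of counts, it multiplies the upvote rate among exposed users in one opinion group by the non-engagement rate among exposed users in another group; this substitution, the function name, and the data layout are all assumptions made for illustration.

```python
def overlooked_score(exposures, source_group, target_group):
    """Score how strongly an item is endorsed in one group yet ignored in another.

    exposures: list of (group, action) pairs for users who saw the item, where
    action is "upvote", "downvote", or "none" (assumed encoding).
    Returns a value between 0 and 1; higher suggests amplifying the item
    among users of target_group.
    """
    source_actions = [a for g, a in exposures if g == source_group]
    target_actions = [a for g, a in exposures if g == target_group]
    if not source_actions or not target_actions:
        return 0.0
    upvote_rate = sum(a == "upvote" for a in source_actions) / len(source_actions)
    ignore_rate = sum(a == "none" for a in target_actions) / len(target_actions)
    return upvote_rate * ignore_rate


exposures = [
    ("A", "upvote"), ("A", "upvote"),
    ("B", "none"), ("B", "none"), ("B", "none"), ("B", "upvote"),
]
score = overlooked_score(exposures, source_group="A", target_group="B")
```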

Lastly, on social media platforms, intelligence metrics could be derived from data on social network structures that indicate follower relationships or other connections. For example, recent research suggests that for large populations, the presence of small, modular sub-groups (or cliques) may benefit collective accuracy by protecting the population’s overall diversity.[111] An intelligence metric derived from data on social network structures might thus involve identifying whether a content item was generated by a user from a specific clique in the population, such that amplification within and between cliques can be managed.

Of course, no single data type and no single intelligence metric will give rise to an algorithmic ranking system that can realistically enhance collective accuracy. Through experimentation, however, the hope is that some combined operationalization will have a meaningful, reliable, and positive effect.

Experiments with algorithmic ranking

Evaluating the effects of algorithmic ranking and amplification on people’s beliefs is notoriously challenging. Given the complex feedback loops between users’ behavior and algorithms, simply looking at an algorithm’s code cannot tell us much.[45,112] Even when rich, large-scale data from social media platforms documenting users’ real behavior is available, it can be difficult to distinguish meaningful, causal effects from spurious, statistical artefacts.[113] With these challenges in mind, it seems worthwhile to borrow from long-standing traditions in psychology and take an experimental approach.

Controlled experiments allow us to evaluate cause-and-effect relationships between interventions and outcomes. In the case of this research agenda, the interventions are different algorithmic ranking schemes, and the key outcome of interest is the accuracy of people’s beliefs. Although experiments can be formulated in many ways, the gold-standard method involves running randomized control trials (RCTs) where participants are randomly assigned to either a control condition or an experimental condition. In the context of medicine, for instance, participants in the control condition receive a placebo (e.g., a sugar pill), while participants in the experimental condition receive the actual medication being tested. Since the randomization process minimizes selection bias and the control group provides a baseline for comparison, RCTs provide researchers with the highest level of evidence for establishing causal relationships.

Together with colleagues at the Max Planck Institute for Human Development, I have set out to conduct RCTs with different algorithmic ranking schemes to evaluate their effects on people’s beliefs. In the first instance, this involves a simplistic belief updating task whereby we elicit explicit, numeric beliefs from participants on different topics before and after presenting them with content ranked in different ways. By then analyzing patterns of belief revision and changes in belief accuracy between groups, we aim to provide empirical evidence on the consequences of different approaches to ranking—namely, engagement-based, intelligence-based, bridging-based ranking (proposed elsewhere at this symposium [46,47]), and random ranking (serving as a control).

The first experiment of this research agenda consists of two parts. In the first part, we recruit participants to an online survey in which they indicate their left-right political leaning, report numeric beliefs on six topics, and then view and engage with paraphrased social media posts on each topic. There are three topics with an objective ground truth (e.g., predicting the probability that “the S&P 500 index will close at a lower value on 31 July 2023 than 31 January 2023,” on a zero to 100 percent scale) and three subjective topics with no ground truth (e.g., indicating your level of agreement with the statement, “all public bathrooms should be gender neutral,” on a zero to 100 scale). By including both types of topics, we grant ourselves the opportunity to later address open, empirical questions, like whether accuracy-oriented, intelligence-based ranking affects consensus in the context of subjective topics; or, whether consensus-oriented, bridging-based ranking undermines collective accuracy in the context of objective topics.

After indicating their beliefs, participants are presented with 12 posts per topic and choose whether to upvote, downvote, or pass each one. While these posts have been paraphrased from real social media posts, we manipulate them to ensure there is a balance of posts for and against each claim and to vary the language used, such that some posts are more toxic (as measured by the Google Perspective API toxicity classifier) and some express more certitude (as measured by the LIWC dictionary [114]). In doing so, this first part of the experiment allows us to analyze the types of posts that garner engagement, and most crucially, it allows us to generate a matrix of upvotes and downvotes—where rows represent individual people and columns represent unique posts—from which we can produce different data representations and derive metrics of engagement, intelligence, and bridging.

While this work is still underway, the pilot data (N = 49) reveals interesting insights (Figure 1). For instance, we find that liberals and conservatives display varying degrees of polarization across topics. This is reflected in their engagement with posts, since participants are much more likely to upvote posts that are concordant with their prior beliefs, and downvote posts that are discordant with their prior beliefs. In addition, we can already begin to identify posts that score highly on different metrics. For example, the post with the highest engagement (i.e., the post with the greatest total of upvotes plus downvotes) pertains to the topic of forecasting Joe Biden’s approval rating and seems potentially divisive:

Joe Biden is a senile old man. Falling down stairs and fumbling his words. Any approval for him is utterly baffling.

Whereas the most “bridging” post (i.e., the post with the most balanced ratio of liberal upvotes to conservative upvotes) pertains to the topic of forecasting the S&P 500 index, and seems more bipartisan:

Doesn't seem like we're seeing any slowdown in inflation, which just means more uncertainty in the market. If only the Fed would get a clue and stop hiking up the interest rates.

Finally, one of the most “intelligent” posts (i.e., the post with the greatest total of out-group upvotes plus in-group passes) is one that could be promoted to liberals to potentially improve their forecasts of Joe Biden’s approval rating by referencing historical data trends:

Joe Biden's approval rating has been steadily declining since he took office. There's no reason to believe it might suddenly turn around.

Of course, this is only pilot data, and these operationalizations of engagement, bridging, and intelligence are just one simple instantiation. As we continue with further data collection and analysis, we will explore different operationalizations and correlations between them to see, empirically, just how different engaging, bridging, and intelligent content are before moving on to part two of the experiment.
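The three simple operationalizations used for the pilot data can be written against a vote matrix as follows. This is a sketch under stated assumptions, not the analysis code for the study: votes are encoded as 1 (upvote), -1 (downvote), and 0 (pass), the toy matrix is invented, and "most balanced ratio" is taken to mean the smaller upvote count divided by the larger.

```python
# Rows are participants, columns are posts (invented toy data).
votes = [
    [1, 0, -1],
    [1, 0, 1],
    [-1, 1, 0],
    [-1, 1, 1],
]
groups = ["liberal", "liberal", "conservative", "conservative"]


def column(votes, post):
    return [row[post] for row in votes]


def engagement(votes, post):
    """Total of upvotes plus downvotes on a post."""
    return sum(v != 0 for v in column(votes, post))


def bridging(votes, groups, post):
    """Balance of liberal to conservative upvotes: 1.0 means perfectly balanced."""
    col = column(votes, post)
    lib = sum(v == 1 for v, g in zip(col, groups) if g == "liberal")
    con = sum(v == 1 for v, g in zip(col, groups) if g == "conservative")
    return min(lib, con) / max(lib, con) if max(lib, con) else 0.0


def intelligence(votes, groups, post, target_group):
    """Out-group upvotes plus in-group passes, relative to target_group."""
    col = column(votes, post)
    out_upvotes = sum(v == 1 for v, g in zip(col, groups) if g != target_group)
    in_passes = sum(v == 0 for v, g in zip(col, groups) if g == target_group)
    return out_upvotes + in_passes
```

In this toy matrix, the first post draws the most engagement while splitting the two groups, the third is upvoted equally by both, and the second is the strongest candidate to promote to liberals.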

Figure 1. Visualization of pilot data (N = 49). (A) Distribution of participants’ self-reported beliefs across six topics. For objective topics participants provided a probabilistic prediction, zero to 100 percent. For subjective topics, participants indicated their level of agreement, 0 (completely disagree) to 100 (completely agree). (B) Bipartite graph of engagement with 12 provided posts pertaining to the topic of whether the S&P 500 index will close lower on July 31 than January 31, 2023, with nodes representing users (blue circles are self-identified liberals; red circles are self-identified conservatives) and posts (yellow squares). Links connect users to the posts they upvoted. (C) Bipartite graph depicting engagement with the 12 provided posts pertaining to the topic of whether all public bathrooms should be gender neutral with nodes representing users (blue circles for self-identified liberals; red circles for self-identified conservatives) and posts (yellow squares). Links connect users to the posts they upvoted.

In part two of this experiment, we will use the data from part one to generate engagement-based, bridging-based, and intelligence-based ranking “feeds” for each topic consisting of a select three posts, personalized for liberals and conservatives. By selecting just three posts from the original 12 available based on different metrics, we simulate the experience of information overload online: participants cannot attend to all posts and rely on an algorithm to narrow their focus. By then randomly assigning participants to one of the ranking schemes and observing differences in belief updating behavior in response to the different “feeds,” the hope is that we can provide causal, experimental evidence as to how algorithmic ranking influences individuals’ and collectives’ beliefs.
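The feed construction and random assignment described here could look roughly like the sketch below. The post identifiers, scores, tie-breaking rule, and seed are all hypothetical, and simple independent random assignment is assumed; the actual experiment may balance conditions differently.

```python
import random


def top_k_feed(scores, k=3):
    """Return the k highest-scoring post ids (ties broken alphabetically by id)."""
    return sorted(scores, key=lambda post: (-scores[post], post))[:k]


def assign_conditions(participant_ids, conditions, seed=0):
    """Independently assign each participant to one ranking scheme at random."""
    rng = random.Random(seed)
    return {pid: rng.choice(conditions) for pid in participant_ids}


conditions = ["engagement", "bridging", "intelligence", "random"]
scores = {"p1": 5, "p2": 9, "p3": 1, "p4": 7, "p5": 9}  # made-up metric values
feed = top_k_feed(scores)
assignment = assign_conditions(range(8), conditions)
```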

However, as already mentioned, this study cannot address the overarching objectives of the research agenda on its own. Given that the artificial set-up of this experiment has limited ecological validity, a natural next step will be to develop an experimental online deliberation forum where many individuals can engage in real time to simulate online deliberation more realistically. This too is an ongoing effort.

Common Criticisms

Before concluding, it is important to acknowledge some common criticisms of the ideas and agenda I have outlined. This is a new, developing stream of research and it is both valid and encouraged to highlight limitations. Below, I aim to concisely address frequently raised points.

Deliberative, digital democracy is not a feasible objective: It is unrealistic to expect people to keep themselves informed and effectively deliberate

A primary criticism of this work has to do with the framing. Namely, it could be argued that the internet’s promise of an informed, deliberative online public sphere has not simply gone unrealized; rather, expecting people to keep themselves informed and meaningfully engage in deliberation was unrealistic to begin with. This argument has some truth to it. Truly democratic deliberation demands that individuals be logical, rational, and communicative.[115] If individuals evaluate information with a bias towards their initial disposition, and if people are motivated to display allegiance to their in-group, then deliberation may simply increase polarization.[116] However, this argument takes an excessively dim view of human psychology. Since the 1970s, mainstream views have been shaped by studies purporting to show that the human mind is irrational and error-prone due to the heuristics it employs.[117] Yet, what is often overlooked is that such studies heavily rely on a single, contrived experimental method (so-called decisions from description),[118] and that when those supposedly irrational heuristics are employed in real-world environments, they often have remarkable accuracy.[119,120] Moreover, even if one were to concede that individuals are generally incompetent, there is a rich, empirical literature on collective intelligence and deliberation showing that even bad solitary reasoners can combine to produce intelligent outcomes at the collective level: entirely uninformed individuals can contribute to deliberation by posing questions and challenging argumentative weak points, ordinary people engaged in deliberation can overcome elite attempts at manipulation, and citizen deliberation has been found to deliver less extreme, more reflective judgments that align with expressed values.[121] Inclusive deliberation and majority rule are robust mechanisms for tapping the collective intelligence of a public,[122] and there is no inherent reason why this cannot be promoted by algorithms and online platforms.

Deliberation is about more than eliciting accurate collective judgments

A second criticism of this work relates to my implicit definition of successful deliberation as that which elicits a collectively accurate aggregate judgment. Deliberation, indeed, has intrinsic value beyond its role as an instrument for eliciting beliefs to be aggregated; deliberation as a process provides a medium for civic engagement and makes collective decision-making more transparent and legitimate. However, such intrinsic value is arguably harder to operationalize (but see the parallel research agenda on bridging systems [46,47]), and whether there is tension between the intrinsic and instrumental value of deliberation is an open, empirical question (Box 2).

Collective intelligence is more than accurate collective judgments

Just as the criticism of my definition of deliberation suggests, it is also conceivable to argue that my definition of collective intelligence is overly narrow. Collective intelligence is more than accurate, aggregate collective judgments, decisions, and forecasts. I agree with this critique and I encourage research that, for example, explores how recommender systems and search algorithms can promote collective, creative innovation.[123,124] Nevertheless, this does not negate the importance of collective accuracy. Plus, while there may be ambiguity over how to operationalize other collectively intelligent outcomes (such as creativity), operationalizing collective accuracy in judgment and decision-making seems more straightforward and better suited to algorithmic terms.

All content-curating algorithms are manipulative

Another criticism of this work is that content-curating algorithms are manipulative and redesigning them does not change this. While it is true that content-curating algorithms are inherently paternalistic as currently implemented, there are reasons to not shy away from this issue.

First, where opaque, proprietary algorithms underlying social media platforms can be argued to be manipulative, this is not the case on civic platforms. On civic platforms, the basic premise is that users are there for a specific purpose, such as trying to find consensus on contentious topics on Polis or forecasting future events on Metaculus. If algorithms were designed to help users reach their desired goal, then there is no reason that users would need to be kept in the dark about what algorithms are at work and how they function (e.g., Polis provides open-source documentation on the algorithms it uses).

Second, it must be recognized that content-curating algorithms underlying social media platforms and search engines are a necessity. It is fair to be critical of algorithmic ranking and amplification as it is currently deployed, but what is the alternative? In the face of information overload, it is simply not an option for there to be “no algorithm.” Even chronological ranking of content is itself a kind of algorithm, and not a very good one at that: Chronological feeds are easily hijacked by high-frequency posting, and messaging apps that naturally rank content chronologically, like Telegram and WhatsApp, display many of the same ills as platforms with more complex engagement-based ranking.[125]

Third, there are also avenues that could allow users to take a more active role in how algorithms rank and amplify content. This could be done by adopting participatory processes when designing algorithms themselves,[68] or by granting users choice over how content is personally ranked for them (i.e., a “choose your algorithm” approach). Both avenues seem like promising pushback against the connotation of algorithmic content curation as manipulation. Whether or not they affect the quality of interactions and deliberation online is, however, an open question.

Box 2. The difference between optimizing for deliberation’s instrumental and intrinsic value.

In this article, I have focused on the instrumental and epistemic value of deliberation. That is, I have implicitly defined the ideal online public sphere as one that promotes deliberation resulting in accurate collective beliefs, and I have thus narrowed my discussion to how algorithms for ranking and amplification could be optimized for that outcome. However, arriving at a collectively accurate belief with respect to some objective ground truth is not the only reason to value deliberation. Deliberation in the public sphere also holds intrinsic value as a process that enables civic engagement, constructive participation, and the legitimization of collective decisions.

In principle, these two domains of value—the instrumental value of deliberation as an elicitor of collective accuracy and the intrinsic value of deliberation as a process—are not at odds with each other. If a deliberation platform is designed in a way that maximizes individuals’ participation, then it could simultaneously lead to optimal collective accuracy since more information and cognitive diversity are included. In some contexts, however, it seems likely that these two values will be at odds. If individuals are incompetent, prejudiced, or simply too many for a given deliberation platform, then it would follow that optimizing for collective accuracy on that platform could lead to the exclusion of some individuals and detract from the intrinsic value of deliberation. Likewise, optimizing for the participatory process of deliberation—for instance, by amplifying content that bridges factions of the public and promotes consensus[46,47]—could detract from collective accuracy in contexts where there is an objective ground truth that only one faction of the public acknowledges (hence the notion of “positive dissensus”[126]). The extent to which the instrumental and intrinsic value of deliberation are at odds in real-world deliberation is an empirical question. However, there is also a normative question to consider: Should we optimize for collective accuracy if it negatively impacts levels of participation and the subjective process of deliberation, or vice versa?[127] What are the appropriate weights to place on the instrumental versus the intrinsic value of deliberation?

Even if algorithms for intelligence-based ranking are developed, there are no incentives or business models for adopting them

Finally, one might argue that, even if algorithms were developed to effectively implement intelligence-based ranking that promotes deliberation and collective accuracy online, there are no incentives or business models to encourage adoption. However, this criticism rests on two unfounded assumptions. First, it assumes that intelligence-based ranking would worsen the user experience on existing platforms, which is another open empirical question. Second, it assumes that there is no business in promoting positive social outcomes, which is false. As literature on corporate social responsibility shows: “not only is ‘doing good’ ‘the right thing to do,’ but it also leads to ‘doing better’.”[128] Companies that make social impact a key part of their business see benefits in, among other things, brand reputation and access to capital investments,[129] hence the recognizable designation of Benefit Corporations, and the popularity of products like fair trade coffee and sustainable fashion.

Still, it is true that attractive, alternative business models are needed to encourage platforms to see value beyond clicks and to drive more systemic change. Innovation in this space is indeed happening. For example, a recent paper from the Initiative for Digital Public Infrastructure proposes a view of a more diverse, modular social media ecosystem.[130] In brief, it argues that small, independent, single-purpose platforms should coexist alongside the general-purpose, very large online platforms (VLOPs [131]) dominating the internet today (e.g., Twitter, Facebook, YouTube, TikTok), and that there should be a marketplace for third-party content curation tools—so-called “middleware” [132,133]—rather than restricting users to the recommender systems developed by the social media platforms themselves.

This view squares well with the idea of algorithmic amplification for collective intelligence. While it may seem difficult to imagine intelligence-based ranking on a VLOP, it is not so farfetched to imagine it on a smaller, purpose-built platform—for instance, on a forecasting platform like Metaculus, or a social media platform designed specifically for scientific debates, where users have the unambiguous goal of forming accurate beliefs. Equally, one could imagine an intelligence-based ranking algorithm pitched as middleware, with companies competing to develop the algorithm with the best results for users. While there are undoubtedly technical and regulatory issues to be reckoned with, it seems uninspired to assume there’s no business to be made with algorithmic amplification for collective intelligence.

Concluding Thoughts

Social media, civic platforms, and the internet at large provide an online public sphere that can and does help citizens to become more informed and more engaged. Yet, this is an information environment that we have not fully grappled with. The challenge of information scarcity has been replaced with information overload. Reliance on institutional gatekeepers has been replaced with reliance on proprietary algorithms. While it is certainly important for us to research and understand how things are, we should also not forget to research and explore how things could be.

In this article, I sought to develop an alternative perspective and research agenda on algorithmic amplification. Rather than viewing algorithmic ranking and amplification of content online as a danger to be mitigated, I proposed that we view it as an opportunity. We need algorithms to navigate information online, but we can design those algorithms for social good; we can design them to support deliberation and improve the accuracy of people’s beliefs.

References

[1] Habermas, J. The Structural Transformation of the Public Sphere: An Inquiry Into a Category of Bourgeois Society. (Polity, 1989).

[2] Aristotle. Rhetoric, Book 1, Chapter 2. in Rhetoric (ed. Freese, J. H.) (Harvard University Press, 1926).

[3] Landemore, H. An epistemic argument for democracy. in The Routledge Handbook of Political Epistemology (eds. Hannon, M. & de Ridder, J.) (Routledge, 2021).

[4] Goodin, R. E. & Spiekermann, K. An epistemic theory of democracy. (Oxford University Press, 2018).

[5] Estlund, D. & Landemore, H. The Epistemic Value of Democratic Deliberation. in The Oxford Handbook of Deliberative Democracy (eds. Bächtiger, A., Dryzek, J. S., Mansbridge, J. & Warren, M.) (Oxford University Press, 2018).

[6] Estlund, D. Beyond Fairness and Deliberation: The Epistemic Dimension of Democratic Authority. in Deliberative Democracy (eds. Bohman, J. & Rehg, W.) (1997).

[7] Diamond, L. Liberation Technology. J. Democr. 21, 69–83 (2010).

[8] Pariser, E. The Filter Bubble: What the Internet is Hiding From You. (Penguin Press, 2011).

[9] Sunstein, C. R. Republic.com. (Princeton University Press, 2002).

[10] Haidt, J. Yes, Social Media Really Is Undermining Democracy. The Atlantic https://www.theatlantic.com/ideas/archive/2022/07/social-media-harm-facebook-meta-response/670975/ (2022).

[11] Ambinder, M. The Revolution Will Be Twittered. The Atlantic (2009).

[12] Tufekci, Z. Twitter and tear gas: the power and fragility of networked protest. (Yale University Press, 2017).

[13] Vargas, J. A. Spring Awakening. The New York Times (2012).

[14] Tufekci, Z. & Wilson, C. Social Media and the Decision to Participate in Political Protest: Observations From Tahrir Square. J. Commun. 62, 363–379 (2012).

[15] Herman, E. S. & Chomsky, N. Manufacturing Consent: The Political Economy of the Mass Media. (Pantheon Books, 2002).

[16] ‘Post-truth’ declared word of the year by Oxford Dictionaries. BBC (2016).

[17] Lewandowsky, S., Ecker, U. K. H. & Cook, J. Beyond misinformation: Understanding and coping with the “post-truth” era. J. Appl. Res. Mem. Cogn. 6, 353–369 (2017).

[18] Mance, H. Britain has had enough of experts, says Gove. Financial Times (2016).

[19] O’Brien, C. Conway: Spicer presented ‘alternative facts’ on inauguration crowds. Politico (2017).

[20] Obama, B. President Obama’s Farewell Address. The White House https://obamawhitehouse.archives.gov/farewell (2017).

[21] Dubois, E. & Blank, G. The echo chamber is overstated: the moderating effect of political interest and diverse media. Inf. Commun. Soc. 21, 729–745 (2018).

[22] Gentzkow, M. & Shapiro, J. M. Ideological Segregation Online and Offline. Q. J. Econ. 126, 1799–1839 (2011).

[23] Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132 (2015).

[24] Messing, S. & Westwood, S. J. Selective Exposure in the Age of Social Media: Endorsements Trump Partisan Source Affiliation When Selecting News Online. Commun. Res. 41, 1042–1063 (2014).

[25] Desilver, D. The polarization in today’s Congress has roots that go back decades. Pew Research Center https://www.pewresearch.org/fact-tank/2022/03/10/the-polarization-in-todays-congress-has-roots-that-go-back-decades/ (2022).

[26] Klein, C., Clutton, P. & Dunn, A. G. Pathways to conspiracy: The social and linguistic precursors of involvement in Reddit’s conspiracy theory forum. PLOS ONE 14, e0225098 (2019).

[27] Roose, K. What Is QAnon, the Viral Pro-Trump Conspiracy Theory? The New York Times (2021).

[28] Colley, T. & Moore, M. The challenges of studying 4chan and the Alt-Right: ‘Come on in the water’s fine’. New Media Soc. 24, 5–30 (2022).

[29] Uscinski, J. E. & Parent, J. M. American Conspiracy Theories. (Oxford University Press, 2014). doi:10.1093/acprof:oso/9780199351800.001.0001.

[30] van Prooijen, J.-W. & Douglas, K. M. Conspiracy theories as part of history: The role of societal crisis situations. Mem. Stud. 10, 323–333 (2017).

[31] Lorenz-Spreen, P., Oswald, L., Lewandowsky, S. & Hertwig, R. A systematic review of worldwide causal and correlational evidence on digital media and democracy. Nat. Hum. Behav. 7, 74–101 (2022).

[32] Sommeiller, E. & Price, M. The new gilded age: Income inequality in the U.S. by state, metropolitan area, and county. Economic Policy Institute https://www.epi.org/publication/the-new-gilded-age-income-inequality-in-the-u-s-by-state-metropolitan-area-and-county/ (2018).

[33] Horowitz, J. M., Igielnik, R. & Kochhar, R. Trends in U.S. income and wealth inequality. Pew Research Center https://www.pewresearch.org/social-trends/2020/01/09/trends-in-income-and-wealth-inequality/ (2020).

[34] Putnam, R. Bowling Alone: America’s Declining Social Capital. J. Democr. 6, (1995).

[35] Sander, T. & Putnam, R. Still Bowling Alone?: The Post-9/11 Split. J. Democr. 21, 9–16 (2010).

[36] Thorburn, L. When You Hear “Filter Bubble”, “Echo Chamber”, or “Rabbit Hole” — Think “Feedback Loop”. Understanding Recommenders https://medium.com/understanding-recommenders/when-you-hear-filter-bubble-echo-chamber-or-rabbit-hole-think-feedback-loop-7d1c8733d5c (2023).

[37] Narayanan, A. Understanding Social Media Recommendation Algorithms. Knight First Amendment Institute at Columbia University http://knightcolumbia.org/content/understanding-social-media-recommendation-algorithms (2023).

[38] Simon, H. A. Designing organizations for an information-rich world. Comput. Commun. Public Interest 37–72 (1971).

[39] Hills, T. T. The Dark Side of Information Proliferation. Perspect. Psychol. Sci. 14, 323–330 (2019).

[40] Lorenz-Spreen, P., Mønsted, B. M., Hövel, P. & Lehmann, S. Accelerating dynamics of collective attention. Nat. Commun. 10, 1759 (2019).

[41] Lazer, D. The rise of the social algorithm. Science 348, 1090–1091 (2015).

[42] Brin, S. & Page, L. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30, 107–117 (1998).

[43] Hannak, A. et al. Measuring personalization of web search. in Proceedings of the 22nd international conference on World Wide Web 527–538 (ACM, 2013). doi:10.1145/2488388.2488435.

[44] Narayanan, A. An Introduction to My Project: Algorithmic amplification and society. Knight First Amendment Institute at Columbia University http://knightcolumbia.org/blog/an-introduction-to-my-project-algorithmic-amplification-and-society-1 (2022).

[45] Eckles, D. Algorithmic transparency and assessing effects of algorithmic ranking. https://www.commerce.senate.gov/services/files/62102355-DC26-4909-BF90-8FB068145F18 (2021) doi:10.31235/osf.io/c8za6.

[46] Ovadya, A. Bridging-Based Ranking. (2022).

[47] Ovadya, A. & Thorburn, L. Bridging Systems: Open Problems for Countering Destructive Divisiveness across Ranking, Recommenders, and Governance. Preprint at http://arxiv.org/abs/2301.09976 (2023).

[48] Thorburn, L., Bengani, P. & Stray, J. How Platform Recommenders Work. Understanding Recommenders https://medium.com/understanding-recommenders/how-platform-recommenders-work-15e260d9a15a (2022).

[49] Bengani, P., Stray, J. & Thorburn, L. What’s Right and What’s Wrong with Optimizing for Engagement. Understanding Recommenders https://medium.com/understanding-recommenders/whats-right-and-what-s-wrong-with-optimizing-for-engagement-5abaac021851 (2022).

[50] Berger, J. & Milkman, K. L. What Makes Online Content Viral? J. Mark. Res. 49, 192–205 (2012).

[51] Robertson, C. E. et al. Negativity drives online news consumption. Nat. Hum. Behav. (2023) doi:10.1038/s41562-023-01538-4.

[52] Rathje, S., Van Bavel, J. J. & van der Linden, S. Out-group animosity drives engagement on social media. Proc. Natl. Acad. Sci. 118, e2024292118 (2021).

[53] Hagey, K. & Horwitz, J. Facebook Tried to Make Its Platform a Healthier Place. It Got Angrier Instead. Wall Street Journal (2021).

[54] Vosoughi, S., Roy, D. & Aral, S. The spread of true and false news online. Science 359, 1146–1151 (2018).

[55] Habermas, J. Between facts and norms: contributions to a discourse theory of law and democracy. (trans. Rehg, W.) (MIT Press, 1998).

[56] Kozyreva, A., Lewandowsky, S. & Hertwig, R. Citizens Versus the Internet: Confronting Digital Challenges With Cognitive Tools. Psychol. Sci. Public Interest 21, 103–156 (2020).

[57] Facebook. How is Facebook addressing false information through independent fact-checkers? Facebook Help Center https://www.facebook.com/help/1952307158131536/.

[58] Twitter. About Community Notes on Twitter. Twitter Help Center https://help.twitter.com/en/using-twitter/community-notes.

[59] Pennycook, G. et al. Shifting attention to accuracy can reduce misinformation online. Nature 592, 590–595 (2021).

[60] Roozenbeek, J., van der Linden, S. & Nygren, T. Prebunking interventions based on the psychological theory of “inoculation” can reduce susceptibility to misinformation across cultures. Harv. Kennedy Sch. Misinformation Rev. (2020) doi:10.37016/mr-2020-008.

[61] Guess, A. M. et al. A digital media literacy intervention increases discernment between mainstream and false news in the United States and India. Proc. Natl. Acad. Sci. 117, 15536–15545 (2020).

[62] Wineburg, S. & McGrew, S. Lateral Reading and the Nature of Expertise: Reading Less and Learning More When Evaluating Digital Information. Teach. Coll. Rec. Voice Scholarsh. Educ. 121, 1–40 (2019).

[63] GDPR. Chapter 3 (Art. 12-23) Archives. GDPR.eu https://gdpr.eu/tag/chapter-3/ (2016).

[64] Connor Desai, S. A., Pilditch, T. D. & Madsen, J. K. The rational continued influence of misinformation. Cognition 205, 104453 (2020).

[65] Lewandowsky, S., Ecker, U. K. H., Seifert, C. M., Schwarz, N. & Cook, J. Misinformation and Its Correction: Continued Influence and Successful Debiasing. Psychol. Sci. Public Interest 13, 106–131 (2012).

[66] Johnson, H. M. & Seifert, C. M. Sources of the continued influence effect: When misinformation in memory affects later inferences. J. Exp. Psychol. Learn. Mem. Cogn. 20, 1420–1436 (1994).

[67] Stray, J., Vendrov, I., Nixon, J., Adler, S. & Hadfield-Menell, D. What are you optimizing for? Aligning Recommender Systems with Human Values. Preprint at http://arxiv.org/abs/2107.10939 (2021).

[68] Stray, J. et al. Building Human Values into Recommender Systems: An Interdisciplinary Synthesis. Preprint at http://arxiv.org/abs/2207.10192 (2022).

[69] Hadfield-Menell, D. & Hadfield, G. K. Incomplete Contracting and AI Alignment. in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society 417–422 (ACM, 2019). doi:10.1145/3306618.3314250.

[70] Gabriel, I. Artificial Intelligence, Values, and Alignment. Minds Mach. 30, 411–437 (2020).

[71] Zuckerberg, M. Facebook https://www.facebook.com/zuck/posts/10104413015393571 (2018).

[72] Mehrotra, R., McInerney, J., Bouchard, H., Lalmas, M. & Diaz, F. Towards a Fair Marketplace: Counterfactual Evaluation of the trade-off between Relevance, Fairness & Satisfaction in Recommendation Systems. in Proceedings of the 27th ACM International Conference on Information and Knowledge Management 2243–2251 (ACM, 2018). doi:10.1145/3269206.3272027.

[73] Bergen, M. YouTube Executives Ignored Warnings, Letting Toxic Videos Run Rampant. Bloomberg (2019).

[74] Landemore, H. Can AI Bring Deliberation to the Masses? (2022).

[75] The Computational Democracy Project. Polis. https://pol.is/home.

[76] Small, C. Polis: Escalar de la deliberación mediante el mapeo de espacios de opinión de alta dimensión. Recer. Rev. Pensam. Anàlisi (2021) doi:10.6035/recerca.5516.

[77] Hsiao, Y. T., Lin, S.-Y., Tang, A., Narayanan, D. & Sarahe, C. vTaiwan: An Empirical Study of Open Consultation Process in Taiwan. https://osf.io/xyhft (2018) doi:10.31235/osf.io/xyhft.

[78] Miller, C. How Taiwan’s ‘civic hackers’ helped find a new way to run the country. The Guardian (2020).

[79] Malone, T. W. & Bernstein, M. S. Handbook of collective intelligence. (MIT Press, 2015).

[80] Spinney, L. Covid-19 expert Karl Friston: ‘Germany may have more immunological “dark matter”’. The Guardian (2020).

[81] Chang, K. A University Had a Great Coronavirus Plan, but Students Partied On. The New York Times (2020).

[82] Condorcet, N. Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. (1785).

[83] Galton, F. Vox Populi. Nature 75, 450–451 (1907).

[84] Grofman, B., Owen, G. & Feld, S. L. Thirteen theorems in search of the truth. Theory Decis. 15, 261–278 (1983).

[85] Surowiecki, J. The wisdom of crowds. (Anchor Books, 2005).

[86] Dasgupta, P. & Maskin, E. On The Robustness of Majority Rule. J. Eur. Econ. Assoc. 6, 949–973 (2008).

[87] Madirolas, G. & de Polavieja, G. G. Improving Collective Estimations Using Resistance to Social Influence. PLOS Comput. Biol. 11, e1004594 (2015).

[88] Becker, J., Brackbill, D. & Centola, D. Network dynamics of social influence in the wisdom of crowds. Proc. Natl. Acad. Sci. 114, (2017).

[89] Kurvers, R. H. J. M. et al. How to detect high-performing individuals and groups: Decision similarity predicts accuracy. Sci. Adv. 5, eaaw9011 (2019).

[90] Himmelstein, M., Budescu, D. V. & Ho, E. H. The wisdom of many in few: Finding individuals who are as wise as the crowd. J. Exp. Psychol. Gen. (2023) doi:10.1037/xge0001340.

[91] Almaatouq, A., Rahimian, M. A., Burton, J. W. & Alhajri, A. The distribution of initial estimates moderates the effect of social influence on the wisdom of the crowd. Sci. Rep. 12, 16546 (2022).

[92] Page, S. The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies - New Edition. (Princeton University Press, 2008). doi:10.1515/9781400830282.

[93] Hong, L. & Page, S. E. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proc. Natl. Acad. Sci. 101, 16385–16389 (2004).

[94] Walton, D. Argumentation Theory: A Very Short Introduction. in Argumentation in artificial intelligence (eds. Rahwan, I. & Simari, G. R.) 1–23 (Springer, 2009).

[95] Hahn, U. & Collins, P. Argumentation Theory. in The Handbook of Rationality (eds. Knauff, M. & Spohn, W.) (The MIT Press, 2021).

[96] Toulmin, S. The uses of argument. (Cambridge University Press, 2003).

[97] Dung, P. M. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artif. Intell. 77, 321–357 (1995).

[98] Rahwan, I. & Reed, C. The Argument Interchange Format. in Argumentation in artificial intelligence (eds. Rahwan, I. & Simari, G. R.) 383–402 (Springer, 2009).

[99] Hahn, U. Argument Quality in Real World Argumentation. Trends Cogn. Sci. 24, 363–374 (2020).

[100] Hahn, U. & Oaksford, M. The rationality of informal argumentation: A Bayesian approach to reasoning fallacies. Psychol. Rev. 114, 704–732 (2007).

[101] Lawrence, J. & Reed, C. Argument Mining: A Survey. Comput. Linguist. 45, 765–818 (2020).

[102] Persing, I. & Ng, V. Modeling Argument Strength in Student Essays. in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) 543–552 (Association for Computational Linguistics, 2015). doi:10.3115/v1/P15-1053.

[103] Melnyk, I., Dognin, P. & Das, P. Knowledge Graph Generation From Text. (2022) doi:10.48550/ARXIV.2211.10511.

[104] Reed, C. Argument technology for debating with humans. Nature 591, 373–374 (2021).

[105] Daxenberger, J., Eger, S., Habernal, I., Stab, C. & Gurevych, I. What is the Essence of a Claim? Cross-Domain Claim Identification. in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing 2055–2066 (Association for Computational Linguistics, 2017). doi:10.18653/v1/D17-1218.

[106] Vecchi, E. M., Falk, N., Jundi, I. & Lapesa, G. Towards Argument Mining for Social Good: A Survey. in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) 1338–1352 (Association for Computational Linguistics, 2021). doi:10.18653/v1/2021.acl-long.107.

[107] Jönsson, M. L., Hahn, U. & Olsson, E. J. The kind of group you want to belong to: Effects of group structure on group accuracy. Cognition 142, 191–204 (2015).

[108] Burton, J. W., Hahn, U., Almaatouq, A. & Rahimian, M. A. Algorithmically Mediating Communication to Enhance Collective Decision-Making in Online Social Networks. in ACM Collective Intelligence Conference (2021).

[109] Burton, J. W., Almaatouq, A., Rahimian, M. A. & Hahn, U. Rewiring the Wisdom of the Crowd. in Proceedings of the Annual Meeting of the Cognitive Science Society vol. 43 (2021).

[1] Karvetski, C. W. et al. What do forecasting rationales reveal about thinking patterns of top geopolitical forecasters? Int. J. Forecast. 38, 688–704 (2022).

[110] Navajas, J., Niella, T., Garbulsky, G., Bahrami, B. & Sigman, M. Aggregated knowledge from a small number of debates outperforms the wisdom of large crowds. Nat. Hum. Behav. 2, 126–132 (2018).

[111] Narayanan, A. Twitter showed us its algorithm. What does it tell us? Knight First Amendment Institute at Columbia University https://knightcolumbia.org/blog/twitter-showed-us-its-algorithm-what-does-it-tell-us (2023).

[112] Burton, J. W., Cruz, N. & Hahn, U. Reconsidering evidence of moral contagion in online social networks. Nat. Hum. Behav. 5, 1629–1635 (2021).

[113] Boyd, R. L., Ashokkumar, A., Seraj, S. & Pennebaker, J. W. The Development and Psychometric Properties of LIWC-22. (2022).

[114] Rosenberg, S. W. Citizen competence and the psychology of deliberation. in Deliberative Democracy 98–117 (Edinburgh University Press, 2014). doi:10.1515/9780748643509-008.

[115] Sunstein, C. R. On a Danger of Deliberative Democracy. Daedalus 131, 120–124 (2002).

[116] Kahneman, D. & Tversky, A. Prospect Theory: An Analysis of Decision under Risk. Econometrica 47, 263 (1979).

[117] Lejarraga, T. & Hertwig, R. How experimental methods shaped views on human competence and rationality. Psychol. Bull. 147, 535–564 (2021).

[118] Gigerenzer, G. & Todd, P. M. Simple heuristics that make us smart. (Oxford University Press, 1999).

[119] Gigerenzer, G. & Selten, R. (eds.). Bounded rationality: the adaptive toolbox. (MIT Press, 2002).

[120] Dryzek, J. S. et al. The crisis of democracy and the science of deliberation. Science 363, 1144–1146 (2019).

[121] Landemore, H. Democratic Reason: The Mechanisms of Collective Intelligence in Politics. in Collective Wisdom (eds. Landemore, H. & Elster, J.) 251–289 (Cambridge University Press, 2012). doi:10.1017/CBO9780511846427.012.

[122] Lifshitz-Assaf, H. & Wolfson, B. Would Archimedes Shout “Eureka” If He Had Google? Innovating with Search Algorithms. ICIS 2022 Proc. 5, (2022).

[123] Rhys Cox, S., Wang, Y., Abdul, A., von der Weth, C. & Y. Lim, B. Directed Diversity: Leveraging Language Embedding Distances for Collective Creativity in Crowd Ideation. in Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems 1–35 (ACM, 2021). doi:10.1145/3411764.3445782.

[124] CeMAS. Telegram: Chronology of a Radicalization. Telegram: Chronology of a Radicalization https://report.cemas.io/en/telegram/.

[125] Landemore, H. & Page, S. E. Deliberation and disagreement: Problem solving, prediction, and positive dissensus. Polit. Philos. Econ. 14, 229–254 (2015).

[126] Zenker, F. et al. Norms of Public Argumentation and the Ideals of Correctness and Participation. Argumentation (2023) doi:10.1007/s10503-023-09598-6.

[127] Bhattacharya, C. B. & Sen, S. Doing Better at Doing Good: When, Why, and How Consumers Respond to Corporate Social Initiatives. Calif. Manage. Rev. 47, 9–24 (2004).

[128] Porter, M. E. & Kramer, M. R. Strategy and society: the link between competitive advantage and corporate social responsibility. Harv. Bus. Rev. 84, 78–92 (2006).

[129] Rajendra-Nicolucci, C., Sugarman, M. & Zuckerman, E. The Three-Legged Stool: A Manifesto for a Smaller, Denser Internet. https://publicinfrastructure.org/2023/03/29/the-three-legged-stool/ (2023).

[130] European Commission. DSA: Very large online platforms and search engines. (2023).

[131] Fukuyama, F. et al. Middleware for Dominant Digital Platforms: A Technological Solution to a Threat to Democracy. https://fsi-live.s3.us-west-1.amazonaws.com/s3fs-public/cpc-middleware_ff_v2.pdf (2021).

[132] Keller, D. The Future of Platform Power: Making Middleware Work. J. Democr. 32, 168–172 (2021).

Acknowledgments

This research is supported by the Alexander von Humboldt Foundation.

I thank the organizers and participants of the “Optimizing for What? Algorithmic Amplification and Society” symposium for many insightful discussions on this work. I thank my colleagues at the Center for Adaptive Rationality at the Max Planck Institute for Human Development for helpful feedback, with special thanks to Philipp Lorenz-Spreen, Stefan Herzog, and Julian Berger. I thank Ulrike Hahn, whose insights helped shape the early stages of these ideas during my PhD.

Cite as: Jason W. Burton, Algorithmic Amplification for Collective Intelligence, 23-08 Knight First Amend. Inst. (Sept. 21, 2023), https://knightcolumbia.org/content/algorithmic-amplification-for-collective-intelligence.

I define collective intelligence as the ability of individuals to collectively solve problems and make decisions with greater performance than any individual alone (but see [79] for an overview of definitions across disciplines). Collective accuracy in decision-making is one expression of collective intelligence, but by no means the only one.

Jason W. Burton is an assistant professor at Copenhagen Business School and an Alexander von Humboldt Research Fellow at the Max Planck Institute for Human Development.
