Ten days ago, Meta announced a large language model called LLaMA. The company said the goal was to foster research. It didn’t release the model publicly but allowed researchers to request access through a form. A week later, however, someone leaked the model. This is significant because it makes LLaMA the most capable LLM publicly available. It is reportedly competitive with LaMDA, the model underlying Google’s Bard. Many experts worry that we are about to see an explosion of misuse, such as scams and disinformation.

But let’s back up. Models that are only slightly less capable have been available for years, such as GPT-J. And we’ve been told for a while now that a wave of malicious use is coming. Yet, there don’t seem to be any documented cases of such misuse. Not one, except for a research study.

One possibility is that malicious use has been happening all along, but hasn’t been publicly reported. But that seems unlikely, because we know of numerous examples of non-malicious misuses: students cheating on homework, a Q&A site and a sci-fi mag being overrun by bot-generated submissions, CNET publishing error-filled investment advice, a university offending staff and students by sending a bot-generated condolence message after a shooting, and, of course, search engine spam.

Malicious use is intended to harm the recipient in some way, while non-malicious misuse is when someone is just trying to save time or make a buck. Sometimes the categorization may be unclear, but all the examples above seem obviously non-malicious. The difference matters because non-malicious misuse does not rely on open-source models: hosted tools serve it just as well. OpenAI can (and does) try to train ChatGPT to refuse to generate misinformation, but it can’t possibly prevent students from using the tool to generate essays.

Seth Lazar suggests that the risk of LLM-based disinformation is overblown because the cost of producing lies is not the limiting factor in influence operations. We agree. Spam might be similar. The challenge for spammers is likely not the cost of generating spam emails, but locating the tiny fraction of people who will potentially fall for whatever the scam is. There’s a classic paper that argues that for precisely this reason, spammers go out of their way to make their messages less persuasive. That way, receiving a response is a stronger signal of the recipient’s vulnerability.
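To make the selection argument concrete, here is a minimal sketch with made-up numbers. It assumes a scammer pays a fixed follow-up cost for every reply and earns a fixed payoff only from vulnerable repliers. The `Campaign` class, the `evaluate` function, and every rate, cost, and payoff below are hypothetical illustrations, not figures from the paper mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    """A scam message, described by who bothers to reply to it."""
    name: str
    reply_rate_vulnerable: float  # chance a vulnerable recipient replies
    reply_rate_others: float      # chance anyone else replies


def evaluate(campaign, vulnerable=100, others=999_900,
             payoff_per_victim=2000.0, cost_per_reply=20.0):
    """Return (precision, profit) for one campaign.

    All defaults are invented for illustration. Simplifying assumption:
    every vulnerable replier eventually pays, and nobody else does.
    """
    true_replies = vulnerable * campaign.reply_rate_vulnerable
    false_replies = others * campaign.reply_rate_others
    total_replies = true_replies + false_replies
    # Precision: share of replies that come from vulnerable recipients.
    precision = true_replies / total_replies if total_replies else 0.0
    # The scammer pays to follow up on every reply, not just the good ones.
    profit = true_replies * payoff_per_victim - total_replies * cost_per_reply
    return precision, profit


# A polished message draws replies from many people who will never pay.
persuasive = Campaign("persuasive", reply_rate_vulnerable=0.9,
                      reply_rate_others=0.05)
# An implausible message loses some victims but screens out almost everyone else.
implausible = Campaign("implausible", reply_rate_vulnerable=0.6,
                       reply_rate_others=0.0005)

for campaign in (persuasive, implausible):
    precision, profit = evaluate(campaign)
    print(f"{campaign.name}: precision={precision:.4f}, profit={profit:,.0f}")
```

Under these assumed numbers, the less persuasive message reaches fewer victims but comes out ahead, because far less effort is wasted on replies that lead nowhere. Cheaper text generation changes none of the terms in this calculation, which is the sense in which generation cost may not be the bottleneck.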

We could be wrong about this. People like Gary Marcus who’ve thought carefully about the topic will no doubt have important counterarguments. But ultimately we should defer to the evidence. Now that LLaMA is out, if reports of malicious misuse continue to be conspicuously absent in the next few months, that should make us rethink the risk. Besides, attempting to control the supply of misinformation or spam seems like a brittle approach compared to giving people the know-how and the technical tools to defend themselves.

If we are correct about the risk of malicious misuse being lower than widely assumed, that’s an argument in favor of open-sourcing LLMs. To be clear, we don’t have an opinion on whether they should be open-sourced; that question is too complex to tackle here. The risk of misuse is only one factor, albeit an important one. Our goal here is to steer the debate away from false premises. One should be especially cautious about arguments for keeping models proprietary based on evidence-free claims of misuse, considering that the powerful companies that build these models have an obvious vested interest in pushing this view.

Finally, we should demand that companies be much more transparent: Those that host LLMs should release audits of how the tools have been used and abused. Social media platforms should study and report the prevalence of LLM-generated misinformation.

Further reading

  • The paper Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations has a good overview of hypothetical malicious uses of LLMs—emphasis on hypothetical. The authors are at OpenAI, Georgetown University, and Stanford University.
  • A paper by Daniel Kang and others (U of Illinois, Stanford, Berkeley) shows how to bypass LLM content filters to generate misinformation, which suggests that even models behind an API aren’t obviously safe from malicious use. Both this paper and the previous one emphasize that a personalized conversation can be much more persuasive than a one-off, untargeted message. We think this ability, rather than lowering the cost of message generation, is probably the biggest threat from LLMs with respect to both disinformation and fraud.
  • A report from TrendLabs discusses the market for production and dissemination of disinformation.

Cross-posted at the AI Snake Oil blog.