This blog post is by the above authors. It is based on a paper by 23 authors that is available here.

Today, we are releasing an open letter encouraging AI companies to provide legal and technical protections for good faith research on their AI models. The letter focuses on the importance of independent evaluations of proprietary generative AI models, particularly those with millions of users. In an accompanying paper, we discuss existing challenges to independent research, and how a more equitable, transparent, and accountable researcher ecosystem could be developed.

The letter has been signed by experienced researchers, practitioners, and advocates across disciplines, and is open for signatures.

Read and sign the letter here. Read the paper here. Read The Washington Post’s coverage here.

Independent Evaluation of AI Is Crucial for Uncovering Vulnerabilities

AI companies, academic researchers, and civil society agree that generative AI models pose acute risks. Independent risk assessment is an essential mechanism for providing accountability. Nevertheless, barriers exist that inhibit the independent evaluation of many AI models.

Independent researchers often evaluate and “red team” AI models to measure a variety of risks. In this work, we focus on post-release evaluation of models (or APIs) by external researchers who are independent of the model developer. This is also referred to as third-party algorithmic auditing. Some companies also conduct red teaming before their models are released, both internally and with experts they select.

While many types of testing are critical, independent evaluation of AI models that are already deployed is widely regarded as essential for ensuring safety, security, and trust. Independent red-teaming research of AI models has uncovered vulnerabilities related to low-resource languages, bypassing safety measures, and a wide range of jailbreaks. These evaluations investigate a broad set of often unanticipated model flaws related to misuse, bias, copyright, and other issues.

Terms of Service Can Discourage Community-Led Evaluations

Despite the need for independent evaluation, conducting research related to these vulnerabilities is often legally prohibited by the terms of service for popular AI models, including those of OpenAI, Google, Anthropic, Inflection, Meta, and Midjourney.

While these terms are intended as a deterrent against malicious actors, they also inadvertently restrict AI safety and trustworthiness research; companies forbid the research and may enforce their policies with account suspensions (as an example, see Anthropic’s acceptable use policy). While companies enforce these restrictions to varying degrees, the terms can disincentivize good-faith research by granting developers the right to terminate researchers’ accounts or even take legal action against them. Often, there is limited transparency into the enforcement policy, and no formal mechanism for justification or appeal of account suspensions. Even aside from the legal deterrent, the risk of losing account access by itself may dissuade researchers who depend on these accounts for other critical types of AI research.

Evaluating the risks of models that are already deployed and have millions of users is essential as the models pose immediate risks. However, only a relatively small group of researchers selected by companies have legal authorization to do so.

Existing Safe Harbors Protect Security Research but Not Safety and Trustworthiness Research

AI developers have engaged to differing degrees with external red teamers and evaluators. For example, OpenAI, Google, and Meta have bug bounties (which provide monetary rewards to people who report security vulnerabilities) and even legal protections for security research. Still, companies like Meta and Anthropic currently “reserve final and sole discretion for whether you are acting in good faith and in accordance with this Policy,” which could deter good-faith security research. These legal protections extend only to traditional security issues like unauthorized account access and do not include broader safety and trustworthiness research.

Cohere and OpenAI are exceptions, though some ambiguity remains as to the scope of protected activities; Cohere allows “intentional stress testing of the API and adversarial attacks” provided appropriate vulnerability disclosure (without explicit legal promises), and OpenAI expanded its safe harbor to include “model vulnerability research” and “academic model safety research” in response to an early draft of our proposal.

In the table below, we document gaps in the policies of leading AI companies. These gaps force well-intentioned researchers to either wait for approval from unresponsive researcher access programs, or risk violating company policy and losing access to their accounts.

This figure represents the extent to which companies provide: access to their flagship models; safe harbor for external security, safety, and trustworthiness research; and transparency and fairness in the enforcement of their policies. A transparent circle signifies that the company does not offer access to its model or safe harbor in that way, or that it is not transparent about how it enforces its policies. A half-filled circle indicates partial access, safe harbor, or transparency, and two filled circles indicate substantial access, safe harbor, and transparency. See the paper for full details.

Our Proposal: A Legal and Technical Safe Harbor

We believe that a pair of voluntary commitments could significantly improve participation, access, and incentives for public interest research into AI safety. The two commitments are: a legal safe harbor, protecting good-faith, public interest evaluation research provided it is conducted in accordance with well-established security vulnerability disclosure practices; and a technical safe harbor, protecting this evaluation research from account termination. Both safe harbors should be scoped to include research activities that uncover any system flaws, including all undesirable generations currently prohibited by a company’s terms of service.

As others have argued, this would not inhibit existing enforcement against malicious misuse, as protections are entirely contingent on abiding by the law and strict vulnerability disclosure policies, as determined ex post. The legal safe harbor, similar to a proposal by the Knight First Amendment Institute for a safe harbor for research on social media platforms, would safeguard certain research from some amount of legal liability, mitigating the deterrent effect of strict terms of service. The technical safe harbor would limit the practical barriers to safety research that arise from companies’ enforcement of their terms by clarifying that researchers will not be penalized.

This figure summarizes the proposed voluntary commitments from companies, and the corresponding responsibilities for researchers, required to enjoy the safe harbor protections. The company commitments are designed to protect good-faith independent research on proprietary models, even when it exposes a company to criticism. The researcher commitments preserve privacy and prevent harm to users or disruption of business, among other concerns. Together, these joint rules enable crowdsourced ethical hacking to improve public safety and awareness of problems.

A Legal Safe Harbor for Red Teaming Reduces Barriers to Essential AI Research

A legal safe harbor could provide assurances that AI companies will not sue researchers if their actions were taken for research purposes. In the U.S. legal regime, this would impact companies’ use of the Computer Fraud and Abuse Act (CFAA) and Section 1201 of the Digital Millennium Copyright Act (DMCA). These risks are not theoretical; security researchers have been targeted under the CFAA, and DMCA Section 1201 hampered security research to the extent that researchers requested and won an exemption from the law for this purpose. Already, in the context of generative AI, OpenAI has attempted to have The New York Times v. OpenAI lawsuit dismissed, alleging that The New York Times’s research into the model constituted hacking.

These protections apply only to researchers who abide by companies’ vulnerability disclosure policies, to the extent researchers can subsequently justify their actions in court. Research that is already illegal or does not take reasonable steps for responsible disclosure would not succeed in claiming those protections in an ex-post investigation.

A Technical Safe Harbor for AI Safety and Trustworthiness Research Removes Practical Deterrence

Legal safe harbors alone do not prevent account suspensions or other technical enforcement actions, such as rate limiting. These technical obstacles can also impede independent safety evaluations. We refer to the protection of research against these technical enforcement measures as a technical safe harbor. Without sufficient technical protections for public interest research, an asymmetry can develop between malicious and non-malicious actors, since non-malicious actors are discouraged from investigating vulnerabilities exploited by malicious actors.

We propose that companies offer some path to eliminate these technical barriers for good-faith research, even when it can be critical of companies’ models. This would include more equitable opportunities for researcher access and would guarantee that those opportunities will not be foreclosed for researchers who adhere to companies’ vulnerability disclosure policies. One way to do this is to scale up researcher access programs and provide impartial review of applications for these programs. The challenge with implementing a technical safe harbor is distinguishing between legitimate research and malicious actors. An exemption to strict enforcement of companies’ policies may need to be reviewed in advance, or at least when an unfair account suspension occurs. However, we believe this problem is tractable with participation from independent third parties.

Conclusion

The need for independent AI evaluation has garnered significant support from academics, journalists, and civil society. We identify legal and technical safe harbors as minimum fundamental protections for ensuring independent safety and trustworthiness research. These protections would significantly improve ecosystem norms and drive more inclusive and unimpeded community efforts to tackle the risks of generative AI.

A Safe Harbor for AI Evaluation and Red Teaming

An argument for legal and technical safe harbors for AI safety and trustworthiness research
