On the Importance of Distinguishing Risk Tolerance and KRI Thresholds
January 16, 2025
Understanding Risk Tolerance vs. Capabilities Thresholds in AI Safety: A key distinction in AI governance where risk tolerance represents a fixed acceptable risk level accounting for mitigations, while capabilities thresholds serve as practical checkpoints for implementing standardized safety measures.
Anthropic’s Responsible Scaling Policy Update Makes a Step Backwards
October 23, 2024
Anthropic's recent update to their Responsible Scaling Policy resulted in a downgrade of their safety rating from 2.2 to 1.9, placing them in the "weak" category alongside OpenAI and DeepMind. The downgrade is due to the new policy's shift away from precise, quantitative benchmarks and mitigation measures toward more qualitative descriptions, which reduces transparency and accountability.
The First AI Risk Management Ratings Expose a Two-Speed Industry with Common Shortcomings
October 2, 2024
We produced the first ratings of risk management practices in artificial intelligence. Our analysis documents significant shortcomings across the industry.
6 Expert-Backed Claims on AI Risk Management
July 10, 2024
Following a workshop focused on risk management frameworks and risk thresholds for frontier AI, which brought together top experts in AI risk management and key policymakers, we present six expert-driven claims that emerged from the discussions.
An Opinionated Literature Review to Inform the EU Codes of Practice on GPAI with Systemic Risks
June 14, 2024
To succeed, the EU Code of Practice will have to rely on a fast-evolving scientific literature. That's why we drafted "An Opinionated Literature Review to Inform the EU Codes of Practice on GPAI with Systemic Risks": to provide, in a single document, pointers to what we judge to be the most helpful references for each of the items that the GPAI Code of Practice will have to cover.
A short overview of AI-related biorisks
April 21, 2024
In late 2023, SaferAI ran a workshop on AI misuse to foster consensus amongst key experts in the field. As part of this workshop, we drafted a collaborative document providing a brief literature review on specific facets of AI risk, which we thought would be useful for the broader public.
Distinguishing Audits, Evals, and Red-Teaming
February 1, 2024
In conversations about AI safety, you often hear people mention audits, red-teaming, and evaluation. While related, there are important differences. In short, evaluations (evals) and red-teaming can be a part of an auditing process. A full audit looks at both the AI model and the organization. Here's how we think about the relationships between the three concepts.
Is OpenAI's Preparedness Framework better than its competitors' "Responsible Scaling Policies"? A Comparative Analysis
January 19, 2024
OpenAI has just released its Preparedness Framework (PF), which fulfills the same role as the ill-named Responsible Scaling Policies (RSPs) developed by its rival Anthropic. How do the two compare?
RSPs Are Risk Management Done Wrong
October 25, 2023
We compare Anthropic's "Responsible Scaling Policy" with the risk management standard ISO 31000, identify gaps and weaknesses, and propose some pragmatic improvements to the RSP.
SaferAI OECD Post: Basic Safety Requirements for AI Risk Management
July 5, 2023
There are three basic criteria I think will make AI risks manageable. For good risk management, models need to be interpretable, boundable and corrigible.
Slowing Down AI: Rationales, Proposals, and Difficulties
May 31, 2023
Our world is one where AI advances at breakneck speed, leaving society scrambling to catch up. This has sparked discussions about slowing AI development. We explore this idea, delving into the reasons why society might want to have a slowdown in its policy toolbox. These include preventing a race to the bottom, giving society a moment to adapt, and mitigating some of the more worrisome risks that AI poses.
Probabilistic Risk Assessment - Promises, Benefits and Challenges
Probabilistic risk assessment (PRA) is a systematic methodology for evaluating the risks associated with a complex engineered system such as an airliner or a nuclear power plant. This methodology, used by organizations such as the Nuclear Regulatory Commission and NASA, quantifies risks according to their likelihood and severity. Here's how to apply it to the evaluation of AI risks.