Blog

Reflecting on the Paris AI Action Summit: A Three Month Look Back at our Contributions

May 13, 2025

One of the main highlights of the French Paris AI Action Summit was the hundreds of side events to the main summit, focused on a variety of AI topics. Three months on from the Paris AI Action Summit, we're reflecting on our contributions. This post revisits our actions at the summit and at a number of side events that we were able to enable and contribute to.

Our feedback to the first EU general-purpose AI Code of Practice

April 18, 2025

SaferAI participated in the drafting of the first EU General-Purpose AI Code of Practice, which is intended to serve as a voluntary compliance tool for providers of general-purpose AI models under the EU AI Act. We gave detailed feedback on all three drafts of the Code. Our two most important recommendations are 1) a strengthened risk identification process as well as 2) a methodology to define risk tiers based on harm, risk scenarios and risk sources.

Informing the Code of Practice: A Hierarchical Methodology of Defining Risk Tiers

April 11, 2025

To inform the drafting of the Code, we proposed a hierarchy of approaches to define risk tiers that helps organize harm levels, risk scenario levels, and capability levels for general-purpose AI models in a structured, ordered way.

We Need to Avert an AI Safety Winter - SaferAI’s commentary published by RUSI

March 11, 2025

Following the Paris AI Action Summit, the UK based think-tank RUSI (Royal United Services Institute) asked SaferAI’s executive director Siméon Campos and Head of Policy Chloé Touzet for their reading of the state of AI safety post-Paris. Pointing in particular at the potential for a UK-EU “AI risk management bloc”, Campos and Touzet argued that while winter winds blew through AI Safety at the Paris Summit, new Springs may be around the corner. Neither the EU nor UK can solve international AI governance alone, but their potential collaboration – grounded in scientific risk assessment – brightens future prospects.

On the Importance of Distinguishing Risk Tolerance and KRI Thresholds

January 16, 2025

Understanding Risk Tolerance vs. Capabilities Thresholds in AI Safety: A key distinction in AI governance where risk tolerance represents a fixed acceptable risk level accounting for mitigations, while capabilities thresholds serve as practical checkpoints for implementing standardized safety measures.

Anthropic’s Responsible Scaling Policy Update Makes a Step Backwards

October 23, 2024

Anthropic's recent update to their Responsible Scaling Policy resulted in a downgrade of their safety rating from 2.2 to 1.9, placing them in the "weak" category alongside OpenAI and DeepMind. The downgrade is due to the new policy's shift away from precise, quantitative benchmarks and mitigation measures toward more qualitative descriptions, which reduces transparency and accountability.

The First AI Risk Management Ratings Expose a Two-Speed Industry with Common Shortcomings

October 2, 2024

We produced the first risk management rating on artificial intelligence risk management practices. Our analysis documents significant shortcomings across the industry.

6 Expert-Backed Claims on AI Risk Management

July 10, 2024

Following a workshop focused on risk management frameworks and risk thresholds for frontier AI, which brought together top experts in AI risk management and key policymakers, we present six expert-driven claims that emerged from the discussions.

An Opinionated Literature Review to Inform the EU Codes of Practice on GPAI with Systemic Risks

June 14, 2024

To succeed, the EU Code of Practice will have to rely on a fast evolving scientific literature. That's why we drafted "An Opinionated Literature Review to Inform the EU Codes of Practice on GPAI with Systemic Risks": to provide, in one single document, pointers to what we judge to be the most helpful references for each of the items that the GPAI Code of Practice will have to cover.

A short overview of AI-related biorisks

April 21, 2024

In late 2023, SaferAI ran a workshop on AI misuse to foster consensus amongst key experts in the field. As part of this workshop, we drafted a collaborative document providing a brief literature review on specific facets of AI risk, which we thought would be useful for the broader public.

Distinguishing Audits, Evals, and Red-Teaming

February 1, 2024

In conversations about AI safety, you often hear people mention audits, red-teaming, and evaluation. While related, there are important differences. In short, evaluations (evals) and red-teaming can be a part of an auditing process. A full audit looks at both the AI model and the organization. Here's how we think about the relationships between the three concepts.

Is OpenAI's Preparedness Framework better than its competitors' "Responsible Scaling Policies"? A Comparative Analysis

January 19, 2024

Reminiscent of the release of the ill-named Responsible Scaling Policies (RSPs) developed by their rival Anthropic, OpenAI has just released their Preparedness Framework (PF) which fulfills the same role. How do the two compare?

RSPs Are Risk Management Done Wrong

October 25, 2023

We compare Anthropic's "Responsible Scaling Policy" with the risk management standard ISO/IEC 31000, identify gaps and weaknesses, and propose some pragmatic improvements to the RSP.

SaferAI OECD Post: Basic Safety Requirements for AI Risk Management

July 5, 2023

There are three basic criteria I think will make AI risks manageable. For good risk management, models need to be interpretable, boundable and corrigible.

Slowing Down AI: Rationales, Proposals, and Difficulties

May 31, 2023

Our world is one where AI advances at breakneck speed, leaving society scrambling to catch up. This has sparked discussions about slowing AI development. We explore this idea, delving into the reasons why society might want to have slowdown in its policy toolbox. This includes preventing a race to the bottom, giving society a moment to adapt, and mitigating some of the more worrisome risks that AI poses.

Probabilistic Risk Assessment - Promises, Benefits and Challenges

Probabilistic risk assessment (PRA) is a systematic methodology to evaluate risks associated with a complex engineered system such as an airline or nuclear power plant. This methodology, used by organizations such as the Nuclear Regulatory Commission and NASA, quantifies risk according to their magnitude and severity. Here's how to apply it to evaluation of AI risks.