A Framework to Rate AI Developers’ Risk Management Maturity
ABSTRACT
Leading frontier AI companies have started publishing their risk management frameworks. The field of risk management is well-established, with practices that have proven effective across multiple high-risk industries. To ensure that AI risk management benefits from the insights of this mature field, this paper proposes a framework to assess the implementation of adequate risk management practices in the context of AI development and deployment. The framework consists of three dimensions: (1) Risk identification, which assesses the extent to which developers cover risks systematically, both from the existing literature and through red teaming; (2) Risk tolerance & analysis, which evaluates whether developers have precisely defined acceptable levels of risk, operationalized these into specific capability thresholds and mitigation objectives, and implemented robust evaluation procedures to determine if the model exceeds these capability thresholds; (3) Risk mitigation, which assesses the AI developers' precision in defining mitigation measures, evaluates the evidence of their implementation and examines the rationale provided to justify that these measures effectively achieve the defined mitigation objectives.
PUBLICATION DATE
September 2024
AUTHORS
Siméon Campos, Henry Papadatos, Fabien Roger, Chloé Touzet, Malcolm Murray

1 Introduction

The literature on risk management is mature, having been refined across a range of industries over decades. However, as of today, few of these principles have been applied to advanced general-purpose AI systems such as large language models (LLMs), despite claims by a range of actors that such systems could cause risks with severe consequences, ranging from reinforcing harmful biases (Bommasani et al., 2022) and enabling malicious actors to perform cyberattacks (Fang et al., 2024) or create CBRN weapons (Pannu et al., 2024), up to human extinction (Statement on AI Risk, CAIS).

To move the industry towards more tried and tested practices, our paper proposes a comprehensive AI risk management framework. This framework draws from both established risk management practices and existing AI risk management approaches, adapting them into a rating system with quantitative and precisely defined criteria to assess AI developers' implementation of adequate AI risk management.

We articulate our framework around three dimensions:

  1. Risk identification: Assessing how thoroughly developers cover risks, both from existing literature and through red teaming exercises.
  2. Risk tolerance & analysis: Evaluating whether developers have precisely defined acceptable levels of risk, operationalized these into specific capability thresholds and mitigation objectives, and implemented robust evaluation procedures.
  3. Risk mitigation: Examining the clarity and effectiveness of developers' mitigation measures, including deployment and containment measures, as well as the pursuit of assurance properties – model properties that can provide sufficient assurance of the absence of risk, once evaluations can no longer play that role.

We begin with a background section that reviews current AI industry practices and existing risk management literature. We then detail our methodology for developing the framework, followed by an in-depth description of each dimension. We conclude with a discussion of the framework's limitations and potential application.

2 Background and motivation

Blind spots in existing safety policies from AI companies 

AI developers have started to propose methods to manage advanced AI risks, with a particular focus on catastrophic risks. The most prominent examples include OpenAI's Preparedness Framework (OpenAI, 2023), Google Deepmind's Frontier Safety Framework (Deepmind, 2024), and Anthropic's Responsible Scaling Policy (Anthropic, 2023).

Notably, these initiatives do not sufficiently build upon established risk management practices, nor do they reference the risk management literature. Research analyzing these policies (see e.g. SaferAI, 2024; IAPS, 2024) has revealed that they deviate significantly from risk management norms without justification. Several critical deficiencies stand out: the absence of a defined risk tolerance, the lack of semi-quantitative or quantitative risk assessment, and the omission of systematic risk identification. The absence of a comprehensive risk identification process is especially concerning, as it may leave critical blind spots from which substantial risks can originate.

These deficiencies underscore the importance of integrating existing risk management practices into AI development. This paper, drawing upon established risk management literature, aims to take a first step in that direction. 

Our proposed approach: applying tried-and-tested risk management techniques to frontier AI

The field of risk management comprises a rich set of techniques developed in industries such as nuclear power (IAEA, 2010) and aviation (Shyur, 2008). Yet their application to frontier AI remains limited. Raz & Hillson's (2005) comprehensive review of existing risk management practices reveals five steps shared by most processes:

  1. Planning. This step consists of establishing the context of the risk, allocating resources, setting acceptable risk thresholds, defining governance structure, developing risk management policy, and assigning roles and responsibilities.
  2. Identification. In this step, risk managers identify potential risks and risk sources. 
  3. Analysis. This step focuses on estimating the probability and consequences of identified risks and evaluating and prioritizing them.
  4. Treatment. At this stage, risk managers define and implement appropriate risk treatment for the prioritized risks. 
  5. Control & Monitoring. Once risk treatments have been implemented, this step consists of reviewing the effectiveness of the risk management process, monitoring the evolving status of identified risks, identifying new risks, and assessing the performance of treatment actions. The process is revised as necessary depending on the results. 
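To make the iterative nature of these five steps concrete, the sketch below models one pass through the cycle as a simple loop. It is purely illustrative: every name in it (e.g. `Risk`, `RiskRegister`, the injected `identify` and `treat` callables) is a hypothetical placeholder rather than part of any established risk management tooling or of the framework described in this paper.

```python
from dataclasses import dataclass, field

# Illustrative sketch of the five-step cycle reviewed by Raz & Hillson (2005).
# All names are hypothetical placeholders, not an established API.

@dataclass
class Risk:
    name: str
    probability: float      # estimated likelihood (Analysis step)
    severity: float         # estimated consequence (Analysis step)
    treated: bool = False   # set once a treatment is applied (Treatment step)

@dataclass
class RiskRegister:
    tolerance: float        # acceptable risk level set during Planning
    risks: list[Risk] = field(default_factory=list)

def risk_management_cycle(register: RiskRegister, identify, treat) -> None:
    """One pass through Identification -> Analysis -> Treatment -> Monitoring."""
    # 2. Identification: add newly identified risks to the register
    register.risks.extend(identify())
    # 3. Analysis: prioritize risks whose expected impact exceeds the tolerance
    prioritized = [r for r in register.risks
                   if r.probability * r.severity > register.tolerance]
    # 4. Treatment: apply a treatment to each prioritized risk
    for risk in prioritized:
        treat(risk)
        risk.treated = True
    # 5. Control & Monitoring: flag untreated risks so the next pass revisits them
    residual = [r for r in register.risks if not r.treated]
    print(f"{len(residual)} risks remain untreated; revise the process as needed")
```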

While the concrete application of these steps to the particular context of AI has not yet been studied in detail, some preliminary literature exists. Koessler & Schuett (2023) conduct a literature review of risk assessment techniques and propose adaptations for advanced AI systems. Barrett et al. (2023) provide the most detailed and comprehensive LLM risk management profile to date, although it is not yet fully ready for use by system developers. “Emerging Processes for Frontier AI Safety” (UK DSIT, 2023), published by the UK Department for Science, Innovation and Technology, also lists a wide range of practices to manage AI risks, including some drawn from risk management in other industries.

Beyond this nascent literature, existing standards and guidelines that harmonize risk management processes across industries could be used to transfer insights from other sectors to frontier AI. For instance, standards such as ISO/IEC 42001 and ISO/IEC 23894, as well as frameworks like the OECD Due Diligence Guidance for Responsible Business Conduct, could be drawn upon.

This paper presents a comprehensive framework that applies established risk management principles to the AI industry, integrating current practices of the AI sector. We aim to encourage the AI industry to embrace risk management practices that have demonstrated their effectiveness across diverse fields.

3 A new framework to assess the maturity of frontier AI developers’ risk management practices 

Our rating framework for risk management of advanced general-purpose AI systems is centered around three dimensions: 

  1. Risk identification: This dimension captures the extent to which the developer has addressed known risks in the literature and engaged in open-ended red teaming to uncover potential new threats. It also examines the developer's implementation of comprehensive risk identification and threat modeling processes to thoroughly understand the potential threats posed by their AI systems.
  2. Risk tolerance and analysis: This dimension evaluates whether AI developers have established a well-defined risk tolerance, in the form of risk thresholds, which precisely characterizes acceptable risk levels. Once the risk tolerance is established, it must be operationalized by setting the corresponding (i) AI capability thresholds and (ii) mitigation objectives necessary to maintain risks below acceptable levels. This operationalization should be grounded in extensive threat modeling justifying why, for a model at the capability thresholds, the mitigation objectives are sufficient to keep risk within the risk tolerance. Additionally, this dimension assesses the robustness of evaluation protocols that detail procedures for measuring model capabilities and ensuring that capability thresholds are not exceeded without detection (a minimal illustrative sketch of such a threshold check follows Figure 1).
  3. Risk mitigation: This dimension evaluates the clarity and precision of AI developers' mitigation plans (i.e. the operationalization of mitigation objectives into concrete mitigation measures), which should encompass deployment measures, containment measures, and assurance properties. Developers must provide evidence for why these mitigations are sufficient to achieve the objectives defined in the risk tolerance stage.

Figure 1: Our complete risk management framework, comprising three main axes: risk identification, risk tolerance & analysis, and risk mitigation. We provide an illustrative example for each component.
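As a purely illustrative reading of the risk tolerance & analysis dimension, the sketch below shows how operationalized capability thresholds might be compared against evaluation results. The domain names, scores, and mitigation objectives are invented for the example and are not drawn from any developer's actual policy or from the scales in our framework.

```python
from dataclasses import dataclass

# Hypothetical illustration of operationalizing a risk tolerance into
# capability thresholds and checking evaluation results against them.
# Domains, scores, and objectives below are invented for the example.

@dataclass
class CapabilityThreshold:
    domain: str                  # e.g. "cyberoffense", "bio_uplift"
    max_eval_score: float        # score above which the threshold is crossed
    mitigation_objective: str    # objective that must hold once it is crossed

def crossed_thresholds(eval_results: dict[str, float],
                       thresholds: list[CapabilityThreshold]) -> list[CapabilityThreshold]:
    """Return the thresholds whose measured evaluation score exceeds the allowed maximum."""
    return [t for t in thresholds
            if eval_results.get(t.domain, 0.0) > t.max_eval_score]

# Example usage with made-up numbers:
thresholds = [
    CapabilityThreshold("cyberoffense", 0.4, "containment + deployment restrictions"),
    CapabilityThreshold("bio_uplift", 0.2, "refusal training verified by red teaming"),
]
eval_results = {"cyberoffense": 0.55, "bio_uplift": 0.1}

for t in crossed_thresholds(eval_results, thresholds):
    print(f"{t.domain}: capability threshold exceeded; "
          f"mitigation objective required -> {t.mitigation_objective}")
```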

In practice, applying this framework to assess the maturity and relevance of AI developers' risk management systems relies exclusively on publicly available information. Specifically, we used a range of publicly released materials from AI companies, including safety policies, model cards, research papers, and blog posts. This approach differs from internal risk management frameworks, in which certain dimensions might be less emphasized. For example, while transparency regarding the assumptions underlying assurance properties is central to our framework, it is less prominent in internal risk management frameworks.

In the PDF version, we detail each dimension and sub-dimension of our framework. For each component, we provide illustrative examples of key elements of a robust risk management framework. These examples offer insight into what constitutes the highest grade on our assessment scale.

Additionally, we present the detailed scales we use to rate AI developers' risk management maturity in Annex A, and we detail in Annex B when each step should occur in the general-purpose AI (GPAI) lifecycle.
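For readers who want a concrete picture of how dimension-level grades could be combined, the following sketch aggregates scores for the three dimensions into a single maturity rating. The 1-to-5 scale, the weights, and the example scores are invented for illustration only and do not reproduce the scales defined in Annex A.

```python
# Illustrative aggregation of dimension scores into an overall maturity rating.
# The 1-5 scale, weights, and example scores are hypothetical; the actual
# rating scales are defined in Annex A of the paper.

DIMENSIONS = {
    "risk_identification": 0.3,
    "risk_tolerance_and_analysis": 0.4,
    "risk_mitigation": 0.3,
}

def maturity_rating(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each assumed to lie in [1, 5])."""
    for name, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {name} must be between 1 and 5")
    return sum(DIMENSIONS[name] * scores[name] for name in DIMENSIONS)

# Example with made-up scores for a hypothetical developer:
print(maturity_rating({
    "risk_identification": 3.0,
    "risk_tolerance_and_analysis": 2.0,
    "risk_mitigation": 2.5,
}))  # -> 2.45
```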