The recent development of powerful AI systems has highlighted the need for robust risk management frameworks in the AI industry. Although companies have begun to implement safety frameworks, current approaches often lack the systematic rigor found in other high-risk industries. This paper presents a comprehensive risk management framework for the development of frontier AI that bridges this gap by integrating established risk management principles with emerging AI-specific practices. The framework consists of four key components: (1) risk identification (through literature review, open-ended red-teaming, and risk modeling), (2) risk analysis and evaluation using quantitative metrics and clearly defined thresholds, (3) risk treatment through mitigation measures such as containment, deployment controls, and assurance processes, and (4) risk governance establishing clear organizational structures and accountability. Drawing on best practices from mature industries such as aviation and nuclear power, while accounting for AI’s unique challenges, the framework provides AI developers with actionable guidelines for implementing robust risk management. The paper details how each component should be implemented throughout the life-cycle of the AI system, from planning through deployment, and emphasizes the importance and feasibility of conducting risk management work before the final training run to minimize the burden it places on developers.
Frontier AI poses increasing risks to public safety and security. Managing these risks requires implementing sound risk management practices. The development of such practices has been the focus of several initiatives, including the Frontier Safety Commitments adopted at the May 2024 Seoul AI Safety Summit and the G7 Hiroshima Code of Conduct. This paper complements emerging practices from AI developers with risk management practices from other industries and offers suggestions for how to adopt them in the field of advanced AI.
The risk management framework introduced in this paper enables its users to implement four key risk management functions: identifying risks (risk identification), defining acceptable risk levels and analyzing identified risks (risk analysis & evaluation), mitigating risks to maintain acceptable levels (risk treatment), and ensuring that organizations have the appropriate corporate structure to execute this workflow consistently and rigorously (risk governance).
The core goal of this framework is to ensure that risks remain below unacceptable levels at all times, through the following process:
1. Define a risk tolerance—a well-characterized acceptable level of risk that should not be exceeded.
2. Through risk modeling, operationalize this risk tolerance into pairs of empirically measurable thresholds:
Key Risk Indicators (KRIs): measurable signals that serve as proxies for risks (e.g., model performance on specific tasks)
Key Control Indicators (KCIs): measurable signals that serve as proxies for the effectiveness of mitigations (e.g., success rate of containment measures)
These thresholds follow a three-way relationship: for any given risk tolerance level and KRI threshold, there exists a minimum required KCI threshold that must be met to keep risk below the tolerance (illustrated in the sketch below).
3. Implement mitigations to achieve the required KCI thresholds whenever KRI thresholds are reached.
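To make the three-way relationship in step 2 concrete, the following Python sketch pairs a KRI threshold with the minimum KCI level it requires at a given risk tolerance, and checks whether measured values keep risk within that tolerance. The class, function, proxy-task name, and numeric thresholds are hypothetical placeholders for illustration, not values prescribed by the framework.

```python
# Illustrative sketch of the KRI/KCI workflow described above.
# All names and numbers are hypothetical placeholders.
from dataclasses import dataclass


@dataclass
class ThresholdPair:
    """A KRI threshold and the minimum KCI it requires at a given risk tolerance."""
    kri_name: str          # measurable proxy for the risk (e.g., score on a proxy task)
    kri_threshold: float   # level at which mitigations become required
    min_kci: float         # minimum mitigation effectiveness once the KRI threshold is reached


def mitigations_sufficient(kri_value: float, kci_value: float, pair: ThresholdPair) -> bool:
    """Return True if risk stays below the tolerance for this KRI/KCI pair.

    Below the KRI threshold, risk is within tolerance regardless of mitigations;
    at or above it, the measured KCI must meet the minimum required level.
    """
    if kri_value < pair.kri_threshold:
        return True
    return kci_value >= pair.min_kci


# Hypothetical usage: a proxy benchmark paired with a containment-effectiveness KCI.
pair = ThresholdPair(kri_name="proxy_task_score", kri_threshold=0.6, min_kci=0.99)
print(mitigations_sufficient(kri_value=0.7, kci_value=0.995, pair=pair))  # True
print(mitigations_sufficient(kri_value=0.7, kci_value=0.90, pair=pair))   # False: strengthen mitigations
```

In practice the minimum KCI would be derived from risk modeling rather than set by hand; the sketch only shows the shape of the check.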
The risk tolerance is distinct from capability thresholds and can be defined in two ways:
Quantitatively, using probability and severity: The preferred approach expresses the risk tolerance as the product of a quantitative probability and severity per unit of time (i.e., an expected level of harm per unit of time).
Using scenarios with quantitative probabilities: For risks where severity is difficult to quantify, risk tolerance can be expressed as a quantitative probability bound on a qualitatively defined harmful scenario.
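As a toy illustration of both forms, the sketch below first checks an expected-harm figure (probability times severity per year) against a quantitative tolerance, then checks a bare probability bound on a qualitatively defined scenario. Every number is invented for illustration and carries no recommendation about what an acceptable level actually is.

```python
# Toy check of a quantitative risk tolerance expressed as probability x severity
# per unit of time. All figures below are hypothetical.

def within_tolerance(annual_probability: float, severity: float, tolerance: float) -> bool:
    """Compare expected harm per year (probability x severity) against the tolerance."""
    expected_annual_harm = annual_probability * severity
    return expected_annual_harm <= tolerance


# Quantitative form: probability and severity are both quantified.
print(within_tolerance(annual_probability=1e-4, severity=5_000, tolerance=1.0))  # True (0.5 <= 1.0)

# Scenario form: only the probability of a qualitatively defined harmful scenario is bounded,
# e.g. "the probability of scenario X occurring in a given year must stay below 1e-6".
scenario_probability_estimate = 1e-7
print(scenario_probability_estimate <= 1e-6)  # True
```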
The framework takes advantage of the life-cycle of an AI system to minimize the burden on AI developers once the final training run begins:
To avoid delays during the training phase, all preparatory work that doesn’t require the fully trained model can be done ahead of time: risk modeling, defining the risk tolerance, identifying KRI and KCI thresholds, and predicting required mitigations using scaling laws (a toy extrapolation is sketched below).
This leaves only KRI measurement and open-ended red-teaming (to surface risk factors missed in the initial risk modeling) for the training and pre-deployment phases.
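One way to picture the "predicting required mitigations using scaling laws" step is to fit a simple power law to KRI measurements from smaller training runs and extrapolate to the planned compute budget, as in the sketch below. The functional form, the data points, and the threshold are assumptions made for illustration, not the framework's prescribed method, and real capability trends need not follow such a clean curve.

```python
# Sketch of predicting a KRI ahead of the final training run: fit a power law
# KRI(C) ~ a * C^b to measurements from smaller runs, then extrapolate to the
# planned compute budget. All numbers are hypothetical.
import numpy as np

# Hypothetical (training compute, KRI score) pairs from smaller runs.
compute = np.array([1e21, 1e22, 1e23])
kri_scores = np.array([0.18, 0.30, 0.45])

# Fit log(KRI) = log(a) + b * log(C) by least squares.
b, log_a = np.polyfit(np.log(compute), np.log(kri_scores), 1)

planned_compute = 1e24
predicted_kri = np.exp(log_a) * planned_compute ** b
print(f"Predicted KRI at planned compute: {predicted_kri:.2f}")

# If the extrapolation already crosses the KRI threshold, the corresponding
# mitigations (and their KCI targets) can be prepared before training starts.
KRI_THRESHOLD = 0.6  # hypothetical threshold
print("Prepare mitigations in advance:", predicted_kri >= KRI_THRESHOLD)
```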
On risk governance, the framework describes a corporate structure designed to ensure that risks are given proportionate weight in decision-making. It includes:
Risk owner. The risk owner is the individual personally accountable for the management of a particular risk.
Oversight. The oversight function provides board-level supervision of senior management’s risk-related decision-making.
Audit. The audit function is independent and insulated from peer-pressure dynamics, so that it can credibly challenge decision-making.
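As a loose illustration (not a structure the framework prescribes), these roles could be reflected in a simple risk register that names a risk owner for each risk and links it to the KRI and KCI being tracked, with oversight and audit operating as organization-wide functions rather than per-risk fields. The field names and the example entry below are invented.

```python
# Hypothetical risk-register entry linking a risk to its owner and indicators.
from dataclasses import dataclass


@dataclass
class RiskRegisterEntry:
    risk: str        # the risk being managed
    risk_owner: str  # individual personally accountable for managing this risk
    kri: str         # key risk indicator tracked for this risk
    kci: str         # key control indicator for the associated mitigations


# Example entry; oversight (board-level) and audit (independent challenge) apply
# across the whole register rather than to individual entries.
register = [
    RiskRegisterEntry(
        risk="model weight exfiltration",
        risk_owner="Head of Security",
        kri="red-team exfiltration success rate",
        kci="containment measure effectiveness",
    ),
]
```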