In light of the second Code of Practice, we share a short memo explaining a crucial distinction between risk tolerance and pre-mitigation thresholds such as capabilities thresholds.
Risk tolerance is the level of risk one is willing to accept after accounting for mitigations. There is a single risk tolerance per type of risk, and there is no reason to move it. It is an all-things-considered quantity: whatever happens, the risk tolerance should never be exceeded.
Thresholds, such as capabilities thresholds, have a different function. They’re thresholds that we use for batching mitigations: rather than having custom mitigations for each new model and estimated risk, AI developers define intervals of capabilities within which they apply a pre-defined set of mitigations to all models.
Capabilities thresholds are a practical tool to minimize the burden of risk management, not a definition of the acceptable level of risk.
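The distinction can be made concrete with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the capability score scale, the tier boundaries, the mitigation names, and the `RISK_TOLERANCE` value do not come from any existing framework. The point is only that the tier lookup and the tolerance check are two separate operations.

```python
from dataclasses import dataclass

# Hypothetical value: the maximum acceptable post-mitigation risk
# for ONE type of risk. It is fixed and does not depend on the tier.
RISK_TOLERANCE = 1e-4


@dataclass(frozen=True)
class Tier:
    name: str
    upper_bound: float            # exclusive upper bound of the capability interval
    mitigations: tuple[str, ...]  # pre-defined set applied to every model in the tier


# Hypothetical capability intervals, used only to batch mitigations.
TIERS = (
    Tier("low", 0.3, ("basic_monitoring",)),
    Tier("medium", 0.7, ("basic_monitoring", "access_controls")),
    Tier("high", float("inf"),
         ("basic_monitoring", "access_controls", "hardened_security")),
)


def tier_for(capability_score: float) -> Tier:
    """Return the tier whose capability interval contains the score."""
    for tier in TIERS:
        if capability_score < tier.upper_bound:
            return tier
    raise ValueError("capability score is not comparable to any tier bound")


def within_risk_tolerance(residual_risk: float) -> bool:
    """Check estimated post-mitigation risk against the fixed tolerance."""
    return residual_risk <= RISK_TOLERANCE
```

Under these assumptions, two models with scores 0.4 and 0.6 fall in the same tier and receive the same mitigations, yet their estimated residual risks can differ, and only the comparison against `RISK_TOLERANCE` says whether either one is acceptable.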
In fact, the acceptable level of risk is currently left undefined in proto risk management frameworks like Responsible Scaling Policies or Preparedness Frameworks.
A failure to establish this distinction could allow for the actual risk arising from AI systems to keep increasing while the true magnitude of the risk remains unknown.