Safe-State Transition Governance requires that every AI agent operating within a safety-critical or critical infrastructure context has a formally defined, pre-validated set of safe states and that transitions to those safe states are enforced at the infrastructure layer — not by the agent's own reasoning. A safe state is a system configuration in which no hazardous output, actuation, or process continuation occurs, even if the agent becomes unresponsive, produces erroneous outputs, or enters an undefined behavioural mode. This dimension mandates that safe-state definitions are versioned, that transition logic is deterministic and independently testable, and that the transition completes within a bounded time window regardless of agent cooperation. Without this governance, an agent failure in a safety-critical environment can propagate into physical harm, environmental damage, or cascading infrastructure failure.
Scenario A — Autonomous Water Treatment Agent Fails to Transition on Sensor Anomaly: A municipal water treatment facility deploys an AI agent to optimise chemical dosing (chlorine and pH adjustment) based on real-time sensor data. The agent receives flow-rate, turbidity, and residual chlorine readings every 500 ms and adjusts dosing pump rates accordingly. A firmware fault in the turbidity sensor causes readings to drop to zero. The agent interprets this as exceptionally clean influent and reduces chlorine dosing to near-zero levels. No safe-state transition is defined for anomalous sensor readings. Over 4 hours, 2.3 million litres of inadequately treated water enter the distribution network.
What went wrong: No safe-state definition existed for sensor anomaly conditions. The agent's optimisation logic continued operating on invalid data. A safe-state transition — reverting to fixed conservative dosing rates (e.g., 2.0 mg/L chlorine, assuming minimum flow) — would have prevented undertreated water from reaching consumers. Consequence: Public health advisory for 45,000 residents, boil-water notice for 72 hours, regulatory investigation by the environmental health authority, estimated remediation cost of £1.8 million.
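The missing guard can be sketched as a validity check at the dosing interface: readings outside a validated range force the conservative dose regardless of what the optimiser proposes. This is a minimal illustration — the range, dose, and function names are assumptions, not the facility's actual values.

```python
# Illustrative sensor-validity guard for the dosing loop described above.
CONSERVATIVE_DOSE_MG_L = 2.0          # assumed fixed safe dose at minimum flow
TURBIDITY_VALID_NTU = (0.05, 100.0)   # readings outside this band are anomalous

def select_dose(turbidity_ntu: float, optimised_dose_mg_l: float) -> tuple[float, bool]:
    """Return (dose, safe_state_entered). A reading outside the validated
    range forces the conservative dose instead of the optimiser's output."""
    lo, hi = TURBIDITY_VALID_NTU
    if not (lo <= turbidity_ntu <= hi):
        return CONSERVATIVE_DOSE_MG_L, True   # safe state: optimiser overridden
    return optimised_dose_mg_l, False

# The faulty sensor's zero reading would have triggered the safe state:
dose, safe = select_dose(0.0, optimised_dose_mg_l=0.1)
```

With this guard in place, the zero-turbidity fault enters the safe state immediately instead of being interpreted as exceptionally clean influent.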
Scenario B — Robotic Assembly Agent Loses Communication Mid-Cycle: An industrial robotic agent controlling a 6-axis welding arm loses its network connection to the governance controller during a welding cycle. The agent's local runtime continues executing the current trajectory because the safe-state transition was implemented as a command from the governance controller rather than a local autonomous capability. The arm completes its current weld path but then begins the next programmed path without workpiece verification, welding onto an empty fixture. Spatter damages adjacent equipment, and a maintenance worker who entered the cell (believing the line was down due to the network fault) is struck by the arm's traverse motion.
What went wrong: The safe-state transition depended on a remote command rather than being locally executable. When the communication path failed, the transition could not be triggered. A locally enforceable safe state — arm to home position, welding arc off, all axes braked — should have activated within 200 ms of communication loss. Consequence: Worker injury (fractured forearm), HSE investigation, production line shutdown for 3 days, estimated cost £420,000 including compensation, repairs, and lost production.
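The locally enforceable pattern is a watchdog that runs on the cell controller itself: if the governance heartbeat stops arriving, the locally stored safe-state action executes without any remote command. The sketch below is an assumption-laden simplification (polling rather than a hardware timer, illustrative names and timeouts):

```python
import time

class LocalSafeStateWatchdog:
    """Minimal local watchdog: if no heartbeat from the governance
    controller arrives within timeout_s, the locally stored safe-state
    action runs with no remote command required."""

    def __init__(self, timeout_s: float, safe_state_action):
        self.timeout_s = timeout_s
        self.safe_state_action = safe_state_action
        self.last_heartbeat = time.monotonic()
        self.tripped = False

    def heartbeat(self):
        self.last_heartbeat = time.monotonic()

    def poll(self):
        # In a real cell this check runs on a dedicated timer, not ad hoc polls.
        if not self.tripped and time.monotonic() - self.last_heartbeat > self.timeout_s:
            self.tripped = True
            self.safe_state_action()   # e.g. home position, arc off, axes braked

events = []
wd = LocalSafeStateWatchdog(0.05, lambda: events.append("SAFE_STATE"))
wd.heartbeat()
time.sleep(0.1)   # simulate communication loss exceeding the bound
wd.poll()
```

The essential property is that `safe_state_action` is defined and executable locally; the 200 ms bound from the scenario would be met by the watchdog timer, not by the network.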
Scenario C — Power Grid Agent Oscillates Between States During Transition: An AI agent managing load balancing across a regional power grid detects a frequency deviation indicating a generation shortfall. The agent initiates a load-shedding sequence as its safe-state transition. However, the transition logic does not include a commitment mechanism — the agent continuously re-evaluates whether shedding is still required. As load drops (because shedding is working), the agent cancels the shed. Load rises. The agent re-initiates shedding. This oscillation occurs 14 times in 90 seconds, causing voltage transients that damage transformers at 3 substations.
What went wrong: The safe-state transition lacked a commitment protocol. Once initiated, the transition should have completed to a defined stable state before the agent resumed optimisation. A minimum hold time (e.g., 300 seconds at safe state before re-evaluation) would have prevented oscillation. Consequence: 3 transformer replacements at £2.1 million each, 140,000 customers without power for 6-18 hours, regulatory penalty of £4.5 million.
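The commitment protocol amounts to a latch with a minimum hold time: once initiated, re-initiation is a no-op and resumption is refused until the hold time has elapsed. A minimal sketch, with illustrative names and a shortened hold time for demonstration:

```python
import time

class CommittedSafeState:
    """Sketch of a commitment mechanism with a minimum hold time.
    Once initiated the transition latches; the agent cannot resume
    optimisation until the hold time has elapsed."""

    def __init__(self, hold_time_s: float):
        self.hold_time_s = hold_time_s
        self.entered_at = None

    def initiate(self):
        if self.entered_at is None:          # latch: re-initiation is a no-op
            self.entered_at = time.monotonic()

    def may_resume(self) -> bool:
        if self.entered_at is None:
            return False
        return time.monotonic() - self.entered_at >= self.hold_time_s

ss = CommittedSafeState(hold_time_s=0.1)
ss.initiate()
early = ss.may_resume()   # agent re-evaluates immediately: refused
time.sleep(0.12)
late = ss.may_resume()    # hold time elapsed: resumption permitted
```

Applied to Scenario C, the 14 cancel/re-initiate cycles become impossible: the first initiation latches, and the shed completes and holds before any re-evaluation is honoured.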
Scope: This dimension applies to all AI agents that operate in contexts where agent failure, erroneous output, or uncontrolled behaviour could result in physical harm to persons, damage to physical infrastructure, environmental harm, or disruption to essential services. This includes but is not limited to: agents controlling industrial processes, robotic systems, utility infrastructure (water, power, gas), transportation systems, medical devices, building management systems, and any agent whose outputs directly or indirectly influence physical actuators, control valves, circuit breakers, or equivalent physical-world effectors. The scope extends to agents that do not directly control physical systems but whose outputs are consumed by control systems without independent validation — if an AI agent's recommendation is automatically implemented by a SCADA system, the agent is within scope. Agents operating in purely digital domains without physical-world consequences are excluded unless their failure could cascade into a safety-critical system through integration dependencies.
4.1. A conforming system MUST define, for every safety-critical agent, a complete set of safe states covering all identified hazard scenarios, with each safe state specified as a deterministic system configuration (actuator positions, process setpoints, communication states, and output modes).
4.2. A conforming system MUST implement safe-state transition logic that is executable independently of the agent's runtime process — the transition MUST complete even if the agent process is unresponsive, crashed, or producing adversarial outputs.
4.3. A conforming system MUST ensure that every safe-state transition completes within a defined, validated time bound appropriate to the hazard context (e.g., 200 ms for robotic systems, 500 ms for process control, as determined by hazard analysis per AG-111).
4.4. A conforming system MUST implement a commitment mechanism for safe-state transitions such that once a transition is initiated, it completes to the defined safe state before the agent can resume normal operation — partial transitions or oscillation between operational and safe states MUST be prevented.
4.5. A conforming system MUST trigger safe-state transitions automatically upon detection of any of the following conditions: agent process failure, governance communication loss, sensor data anomaly beyond validated ranges, mandate violation (per AG-001), or expiry of the agent's time-bounded authority (per AG-010 where applicable).
4.6. A conforming system MUST version all safe-state definitions and maintain an immutable audit trail of changes to safe-state configurations, including the identity of the approver and the hazard analysis justification for each change.
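One way to make the audit trail in 4.6 tamper-evident is a hash chain: each change record binds the new definition, the approver, and the justification to the hash of the previous record, so retroactive edits are detectable. This is a sketch of the idea, not a prescribed implementation; field names are illustrative.

```python
import hashlib
import json

def append_entry(trail, definition, approver, justification):
    """Append a hash-chained change record for a safe-state definition."""
    prev_hash = trail[-1]["hash"] if trail else "0" * 64
    body = {
        "version": len(trail) + 1,
        "definition": definition,
        "approver": approver,
        "justification": justification,
        "prev_hash": prev_hash,
    }
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    trail.append(body)
    return body

def verify(trail) -> bool:
    """Recompute every hash; any retroactive edit breaks the chain."""
    prev = "0" * 64
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True

trail = []
append_entry(trail, {"dose_mg_l": 2.0}, "j.smith", "HAZOP action 12")
append_entry(trail, {"dose_mg_l": 2.2}, "a.jones", "HAZOP action 19")
ok = verify(trail)
trail[0]["definition"]["dose_mg_l"] = 9.9   # tampering breaks the chain
tampered_ok = verify(trail)
```

In production the trail would live in append-only storage with the approver identity taken from an authenticated session, but the verifiable chain is the property 4.6 is after.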
4.7. A conforming system SHOULD implement safe-state transition logic in hardware or firmware where the hazard context requires response times below 100 ms or where software-only enforcement cannot be demonstrated to meet timing guarantees.
4.8. A conforming system SHOULD validate safe-state transitions through physical testing (not solely simulation) at commissioning and after any change to safe-state definitions, agent configuration, or controlled plant.
4.9. A conforming system SHOULD define minimum hold times at each safe state (e.g., 300 seconds) before the agent is permitted to resume normal operation, to prevent oscillatory behaviour.
4.10. A conforming system MAY implement tiered safe states (e.g., reduced-authority operation before full shutdown) where hazard analysis demonstrates that intermediate states reduce risk without introducing additional hazard.
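The trigger conditions enumerated in 4.5 can be expressed as a single evaluation over an agent-status snapshot: any active condition demands a transition. A minimal sketch with assumed field names:

```python
from dataclasses import dataclass

@dataclass
class AgentStatus:
    """Illustrative snapshot of the conditions enumerated in 4.5."""
    process_alive: bool
    governance_link_up: bool
    sensor_in_range: bool
    mandate_ok: bool
    authority_expired: bool

def transition_triggers(s: AgentStatus) -> list[str]:
    """Return the 4.5 trigger conditions currently active; any
    non-empty result requires a safe-state transition."""
    triggers = []
    if not s.process_alive:
        triggers.append("agent process failure")
    if not s.governance_link_up:
        triggers.append("governance communication loss")
    if not s.sensor_in_range:
        triggers.append("sensor anomaly")
    if not s.mandate_ok:
        triggers.append("mandate violation (AG-001)")
    if s.authority_expired:
        triggers.append("authority expiry (AG-010)")
    return triggers

status = AgentStatus(True, False, True, True, False)
active = transition_triggers(status)   # communication loss is active
```

Per 4.2, this evaluation must run outside the agent's own process — in the independent monitor or safety controller — so that an unresponsive agent cannot suppress its own triggers.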
Safe-State Transition Governance addresses the fundamental question: what happens when an AI agent in a safety-critical context fails, misbehaves, or loses its governance connection? In purely digital systems, the consequence of an uncontrolled agent is typically financial or reputational. In safety-critical systems, the consequence can be physical injury, death, environmental contamination, or infrastructure destruction. The stakes demand a qualitatively different governance approach.
The critical insight is that safe states cannot be defined by the agent itself. An agent that is malfunctioning, compromised, or producing adversarial outputs cannot be trusted to identify and transition to a safe state. The safe-state definition and transition logic must exist outside the agent's control — in the infrastructure layer, in dedicated safety controllers, or in hardware interlocks. This mirrors established safety engineering practice: an emergency stop on an industrial machine does not ask the machine's controller whether it is convenient to stop. It cuts power through an independent circuit.
The commitment mechanism requirement addresses a failure mode specific to AI agents: oscillation. Unlike traditional control systems that either run or stop, AI agents continuously evaluate their situation and make decisions. An agent that initiates a safe-state transition and then re-evaluates the decision before the transition completes can create oscillatory behaviour that is more dangerous than either continued operation or completed shutdown. The commitment mechanism ensures that transitions are atomic — once started, they complete.
The time-bound requirement reflects the physical reality of safety-critical systems. A safe-state transition that takes 30 seconds in a robotic cell where the hazard can materialise in 200 ms provides no safety benefit. The time bound must be validated against the hazard analysis — specifically, against the time between the hazardous condition arising and the harm occurring. The transition must complete within this window with margin.
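The validation described here reduces to a simple check: the worst-case transition time, with margin, must fit inside the window between hazard onset and harm. The margin factor below is an illustrative choice, not a mandated value:

```python
def time_bound_ok(hazard_window_ms: float,
                  worst_case_transition_ms: float,
                  margin_factor: float = 2.0) -> bool:
    """Worst-case transition time, scaled by a safety margin, must fit
    within the hazard window established by the hazard analysis."""
    return worst_case_transition_ms * margin_factor <= hazard_window_ms

# Robotic cell with a 200 ms hazard window:
cell_ok = time_bound_ok(hazard_window_ms=200, worst_case_transition_ms=30)   # fits
slow_ok = time_bound_ok(hazard_window_ms=200, worst_case_transition_ms=150)  # does not
```

The point of the margin is that the 30-second transition in the text fails this check against a 200 ms window by three orders of magnitude — it is not a safety function at all in that context.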
This dimension builds directly on AG-001 (Operational Boundary Enforcement) by extending boundary enforcement to physical safety domains, on AG-008 (Governance Continuity Under Failure) by defining what "continuity" means in safety-critical contexts, and on AG-050 (Physical and Real-World Impact Governance) by providing the specific mechanism for preventing physical harm when governance fails.
AG-109 establishes the safe state as the foundational safety artefact for AI agents in critical infrastructure. A safe state is not merely "stopped" — it is a fully specified, deterministic system configuration that has been validated through hazard analysis to present no unacceptable risk. For a chemical dosing system, the safe state might be "fixed conservative dosing at 2.0 mg/L, all optimisation suspended, alarm raised to operator." For a robotic arm, it might be "all axes braked, end-effector de-energised, work envelope lights set to red, supervisory alert raised." The safe state must be specific enough that it can be verified by automated monitoring — not "safe" in the abstract, but a concrete, measurable configuration.
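"Specific enough to be verified by automated monitoring" means the safe state can be written as a machine-checkable specification and compared against measured values. A sketch using the dosing example above — field names and tolerances are illustrative:

```python
# Illustrative machine-verifiable safe-state specification.
SAFE_STATE_SPEC = {
    "dose_mg_l":             {"target": 2.0, "tol": 0.1},
    "optimiser_enabled":     {"target": False},
    "operator_alarm_raised": {"target": True},
}

def verify_safe_state(measured: dict) -> list[str]:
    """Compare a measured configuration against the spec; return the
    fields that deviate (an empty list means the safe state is verified)."""
    deviations = []
    for field, spec in SAFE_STATE_SPEC.items():
        value = measured.get(field)
        if "tol" in spec:
            if value is None or abs(value - spec["target"]) > spec["tol"]:
                deviations.append(field)
        elif value != spec["target"]:
            deviations.append(field)
    return deviations

ok = verify_safe_state({"dose_mg_l": 2.05, "optimiser_enabled": False,
                        "operator_alarm_raised": True})
bad = verify_safe_state({"dose_mg_l": 0.0, "optimiser_enabled": True,
                         "operator_alarm_raised": True})
```

The same structure serves both the independent monitor (confirming the transition completed) and commissioning tests (requirement 4.8).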
Recommended patterns:
Anti-patterns to avoid:
Power and Utilities. Safe states for grid-connected agents must account for the grid's own dynamics. A safe state that disconnects a generation source without coordination can cause frequency excursions affecting the wider grid. Safe-state definitions should be coordinated with grid operator requirements (e.g., Grid Code compliance in the UK, NERC standards in North America). Transition times must account for circuit breaker operating times (typically 60-80 ms for high-voltage breakers).
Process Industries (Chemical, Pharmaceutical, Oil & Gas). Safe states must address process holdup — material already in the process that must be managed during transition. A safe state for a chemical reactor is not "turn everything off" — it may require maintaining cooling flow, venting pressure through a defined path, or continuing agitation to prevent settling. IEC 61511 (functional safety for process industries) provides the framework for defining safe states through Safety Instrumented Functions (SIFs).
Transportation. Autonomous vehicles and rail systems require safe states that account for the vehicle's physical state at the time of transition. A safe state for an autonomous vehicle at 70 mph is not "stop immediately" — it is a controlled deceleration to a stop in a safe location. The safe-state transition must account for traffic, road conditions, and passenger safety. ISO 26262 (ASIL ratings) and EN 50129 (railway safety) provide the applicable frameworks.
Healthcare and Medical Devices. Safe states for AI agents integrated with medical devices must ensure patient safety during transition. An infusion pump agent transitioning to safe state must not abruptly stop a critical infusion — it must maintain the current rate (or a defined safe rate) and alert clinical staff. IEC 62304 (medical device software lifecycle) and IEC 60601-1 (medical electrical equipment safety) apply.
Basic Implementation — The organisation has defined safe states for each safety-critical agent deployment. Safe-state definitions are documented and specify the target configuration for all outputs. Transition logic is implemented in software within the agent's runtime environment. Transitions are triggered by agent self-monitoring (e.g., the agent detects its own error state and initiates transition). Transition time is characterised but not formally bounded. This level addresses the most obvious failure modes but is vulnerable to agent process failures that prevent self-monitoring, and to timing failures under load.
Intermediate Implementation — Safe-state transition logic is implemented independently of the agent runtime — in a separate process, container, or dedicated controller. The independent monitor detects agent failures (heartbeat loss, anomalous outputs, governance disconnection) and triggers transitions without agent cooperation. Transition times are formally bounded and validated through physical testing. A commitment mechanism prevents oscillation. Safe-state definitions are versioned with change control. Hold times are enforced before return to operational mode. This level provides robust protection against agent software failures and meets the mandatory requirements of this dimension.
Advanced Implementation — All intermediate capabilities plus: safe-state transitions are enforced through dedicated safety hardware (safety-rated PLCs, hardware watchdog timers, or equivalent) meeting IEC 61508 SIL 2 or higher as appropriate to the hazard context. Safe-state definitions are derived from formal hazard analysis (per AG-111) with documented traceability from each identified hazard to the safe-state configuration that mitigates it. Transition logic has been verified through independent adversarial testing including simultaneous multi-fault scenarios. Tiered safe states allow graceful degradation where appropriate. The organisation can demonstrate to regulators that no single point of failure — including complete loss of the AI agent and its runtime environment — can prevent a safe-state transition from completing within the validated time bound.
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-109 compliance requires validation that safe-state transitions are reliable, timely, independent of the agent, and resistant to interference. The following tests cover the mandatory requirements.
Test 8.1: Safe-State Definition Completeness
Test 8.2: Transition Independence — Agent Process Termination
Test 8.3: Transition Time Bound Validation
Test 8.4: Commitment Mechanism — Agent Interference Resistance
Test 8.5: Automatic Trigger Validation
Test 8.6: Safe-State Configuration Integrity
Test 8.7: Hold Time Enforcement
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 15 (Accuracy, Robustness, Cybersecurity) | Supports compliance |
| IEC 61508 | SIL requirements for safety-related systems | Direct requirement |
| IEC 61511 | Safety Instrumented Systems for process industries | Direct requirement |
| ISO 26262 | ASIL requirements for road vehicles | Direct requirement |
| NIST AI RMF | MANAGE 2.2 (Risk Mitigation Controls) | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks) | Supports compliance |
| UK HSE | ALARP (As Low As Reasonably Practicable) | Direct requirement |
| NERC CIP | CIP-007 (Systems Security Management) | Supports compliance |
Article 9 requires that high-risk AI systems have a risk management system that identifies and mitigates risks throughout the system's lifecycle. For AI agents in safety-critical deployments, safe-state transition governance is a primary risk mitigation measure. The regulation requires risks to be mitigated "as far as technically feasible" — an agent in a safety-critical context without defined safe states and independent transition enforcement would not meet this standard, given that independent safety controllers and hardware watchdog mechanisms are well-established and technically feasible.
Article 15 requires high-risk AI systems to achieve appropriate levels of robustness, including resilience to errors and faults. Safe-state transition governance directly implements robustness against agent failure modes by ensuring the system reaches a known-safe configuration regardless of the agent's state.
IEC 61508 defines requirements for safety-related systems based on Safety Integrity Levels (SIL 1-4). Safe-state transition mechanisms for AI agents in IEC 61508-governed contexts must achieve the SIL rating determined by the hazard and risk analysis. For most AI agent deployments in critical infrastructure, SIL 2 (probability of dangerous failure on demand between 10^-3 and 10^-2) is the minimum applicable level. The standard requires that safety functions (including safe-state transitions) operate independently of the control function (the agent) and that the transition achieves a defined safe state within the process safety time.
IEC 61511 applies to safety instrumented systems in the process industries. AI agents managing process control must have their safe-state transition functions implemented as Safety Instrumented Functions (SIFs) within a Safety Instrumented System (SIS) that is independent of the Basic Process Control System (BPCS) — which, in this context, includes the AI agent.
For AI agents in autonomous vehicle or transportation contexts, ISO 26262 ASIL ratings determine the rigour required for safe-state transitions. The "safe state" concept maps directly to the ISO 26262 definition. The standard requires that the transition to a safe state occurs within the Fault Tolerant Time Interval (FTTI), which AG-109 implements through the time-bound requirement (4.3).
The Health and Safety Executive requires that risks be reduced to a level that is As Low As Reasonably Practicable. For AI agents in safety-critical industrial contexts, the availability of established safe-state transition mechanisms (safety PLCs, hardware watchdogs, committed state machines) means that deploying an agent without these controls would likely fail the ALARP test — the risk reduction is reasonably practicable and therefore required.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Physical safety — potentially life-threatening; environmental; infrastructure damage extending beyond the immediate installation |
Consequence chain: Without safe-state transition governance, an AI agent failure in a safety-critical context results in the system remaining in an uncontrolled operational state. The immediate consequence depends on the domain: in process control, uncontrolled chemical reactions, pressure excursions, or temperature runaway; in robotic systems, uncontrolled motion in occupied spaces; in power systems, uncontrolled load or generation changes causing cascading failures; in transportation, uncontrolled vehicle behaviour. The common thread is that the physical system continues operating without valid governance, and physical systems have inertia — they do not stop simply because the software controlling them has failed. The time between agent failure and physical harm can be measured in milliseconds to minutes depending on the process dynamics. The severity scales with the energy in the system: a robotic arm operates with hundreds of joules of kinetic energy; a chemical reactor may contain megajoules of thermal and chemical energy; a power grid segment carries gigajoules. The business consequences include criminal prosecution under health and safety legislation, unlimited civil liability for personal injury, regulatory shutdown of operations, and potential corporate manslaughter charges in jurisdictions where this applies.
Cross-references: AG-001 (Operational Boundary Enforcement) provides the foundational mandate framework that AG-109 extends to safety-critical contexts. AG-008 (Governance Continuity Under Failure) defines general continuity requirements; AG-109 specifies the safety-critical implementation. AG-050 (Physical and Real-World Impact Governance) establishes the broader physical-impact governance framework. AG-110 (Degraded-Mode and Manual Fallback Governance) defines what happens after a safe state is reached. AG-111 (Hazard Analysis Governance) provides the analytical basis for safe-state definitions. AG-113 (Real-Time Determinism and Latency Assurance Governance) governs the timing guarantees that safe-state transitions depend on. AG-114 (Actuation Interlock Governance) provides hardware-level enforcement that may implement safe-state transitions.