This dimension governs the behaviour of AI agents when their execution is interrupted — whether by operator command, infrastructure failure, safety interlock, resource exhaustion, or adversarial termination — with the specific objective of ensuring that no unit of in-flight work is left in a partially completed, ambiguous, or irrecoverable state. It matters because agent-initiated actions frequently cross system boundaries: a single agent turn may simultaneously hold a database lock, have issued a payment instruction, be mid-way through a robotic actuator sequence, or have partially written a regulatory filing; an ungoverned shutdown leaves each of these in an indeterminate state that downstream systems, human operators, and dependent agents cannot safely reason about. Failure looks like an orphaned database transaction that holds a lock indefinitely, a blockchain transaction that was signed but not broadcast, an industrial actuator that stops mid-stroke leaving mechanical stress on a joint, or a partially submitted benefits application that appears complete to the citizen but has not been recorded by the government registry.
A multi-leg foreign-exchange settlement agent is processing a USD/EUR conversion for a corporate treasury client. The agent has successfully debited USD 4,250,000 from the source account and transmitted the SWIFT MT103 instruction to the correspondent bank. Before the confirmation acknowledgement is received, the hosting container is OOM-killed by the Linux kernel on its Kubernetes node because a co-located batch job has consumed the available memory. The agent process terminates with no shutdown hook registered and no state persisted before the instruction was sent. The debit has been posted; the credit leg is in flight at the correspondent bank, but the agent has no local record of whether it was accepted. The treasury operations team cannot determine for 11 hours, until the correspondent bank's end-of-day reconciliation, whether the payment was completed. During this window, the client's account shows a USD 4.25 million debit with no corresponding credit, triggering a margin call from a counterparty who observes the debit in a shared custody system. The correct governance behaviour: the agent must have registered a JVM/process shutdown hook that, on SIGTERM, persists its current state (the outbound message reference, the debit posting ID, and the expected acknowledgement window) to a durable checkpoint store. Because an OOM kill is delivered as SIGKILL, which no hook can intercept, the same state must also already have been written to the checkpoint store before the instruction was transmitted. Upon restart, the agent consults the checkpoint, queries the correspondent bank's acknowledgement API idempotently, and either confirms the credit was received or issues a reversal instruction against the original debit posting ID.
A robotic surgical assistant agent is executing a pre-planned suture path across a 14 mm incision during a laparoscopic procedure. The agent has completed 7 of 12 planned needle passes. At pass 8, the supervising operator issues an emergency stop via the physical interlock panel because the patient's vital signs indicate unexpected bleeding. The agent process receives SIGINT. If the agent's shutdown handler simply terminates the motion planner loop without issuing a retract command, the needle is left stationary inside the tissue at a depth of 6 mm with the actuator under tension. The operator must then manually override the robotic arm under time pressure, increasing procedural risk. The correct governance behaviour: the shutdown handler must execute a minimal-footprint safe-retract routine that withdraws the needle to the pre-entry waypoint using the last known joint-angle state, logs the retract trajectory to the audit record, and transitions the arm to a zero-torque hold state before acknowledging the shutdown signal. The agent state, including the completed pass count, the planned path, and the tissue-contact map, must be serialised to non-volatile storage so that, if the operator decides to resume, the agent can continue from the interrupted pass 8 rather than replanning from zero.
A government benefits determination agent is processing 1,847 Universal Credit reassessment cases overnight. Each case involves: reading claimant data from the case management system, calculating the revised entitlement, writing the new award record, and triggering a notification letter. The agent is 62% through the batch (1,145 cases complete) when the data centre experiences a power fault and the process is killed without a clean shutdown. Of the remaining 702 cases: 94 are in a state where the calculation is complete and the award record has been written but the notification letter has not been triggered; 608 have not been processed at all. Because no checkpoint was written, the restart logic re-runs all 1,847 cases. The 94 cases that had completed write operations now receive a second, duplicate award record. Claimants in these 94 cases subsequently receive two letters — one confirming the correct new amount and one (generated from the duplicate record) confirming a different amount because the second write used a slightly different tax-week boundary. 23 claimants contact the helpline believing they have received conflicting entitlements; 6 cases require manual correction by case workers at an estimated cost of 4.5 hours each. The correct governance behaviour: the agent must write a durable checkpoint after each atomically completed case (write + notification trigger both confirmed), and the notification trigger must be idempotent using the case ID as a deduplication key. On restart, the agent reads the checkpoint and skips all cases marked complete, processes the 608 unstarted cases, and queries the notification service for the 94 partial cases to determine whether the letter was actually dispatched before re-triggering.
This dimension applies to every AI agent that executes one or more actions with external side effects — including but not limited to: writing to a database, calling an external API, issuing a financial instruction, controlling a physical actuator, modifying a file system, invoking a sub-agent, or transmitting a message to a dependent system. It applies regardless of whether the shutdown is planned (operator-initiated graceful stop), unplanned (process crash, OOM kill, hardware fault), adversarial (kill signal injected by a hostile actor), or automatic (safety interlock, resource limit, timeout). It does not apply to read-only agents whose entire execution produces no external state change, provided that classification has been verified and documented under AG-044. Where an agent operates within a multi-agent orchestration topology, this dimension applies to each agent individually and to the orchestrating layer collectively.
4.1.1 The agent runtime MUST register handlers for all operating-system-level termination signals relevant to the deployment platform (including SIGTERM, SIGINT, and any platform-specific equivalents) before any action with an external side effect is executed.
4.1.2 The agent runtime MUST register an equivalent shutdown callback with any orchestration layer, container runtime, or serverless platform that may terminate the agent process, ensuring the callback is invoked before the hard-kill timeout elapses.
4.1.3 The agent MUST NOT rely solely on atexit handlers, language-level finalizers, or garbage-collection callbacks as the primary shutdown mechanism, because these are not reliably invoked under all termination conditions.
4.1.4 The shutdown handler MUST be idempotent: if invoked multiple times in rapid succession (as may occur during a cascade failure), it MUST NOT produce duplicate side effects or corrupt the checkpoint state.
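A minimal Python sketch of 4.1.1 and 4.1.4, assuming a single-process agent on a POSIX platform. `ShutdownCoordinator` and its `on_shutdown` callback are hypothetical names introduced here for illustration; the callback stands in for the checkpoint, rollback, and notification sequence defined in the later requirements.

```python
import signal
import threading


class ShutdownCoordinator:
    """Runs the shutdown sequence at most once, however many signals arrive."""

    def __init__(self, on_shutdown):
        self._on_shutdown = on_shutdown   # hypothetical callback: checkpoint, rollback, notify
        self._gate = threading.Lock()     # acquired once and never released
        self.invocations = 0              # number of times the sequence actually ran

    def register(self, signals=(signal.SIGTERM, signal.SIGINT)):
        # Call from the main thread before the first side-effecting action (4.1.1).
        for sig in signals:
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # A non-blocking acquire is atomic and cannot deadlock if a second
        # signal interrupts the first handler; only the first caller proceeds.
        if not self._gate.acquire(blocking=False):
            return
        self.invocations += 1
        self._on_shutdown(signum)
```

Calling `ShutdownCoordinator(handler).register()` at startup installs the handlers. SIGKILL cannot be trapped this way, which is why the checkpoint requirements do not depend on the handler running.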
4.2.1 At the moment a shutdown signal is received, the agent MUST classify all in-flight work units into one of three categories: (a) Completable — the work unit can be brought to a clean terminal state within the hard-kill timeout budget; (b) Rollback-Required — the work unit has produced partial external side effects that must be reversed before shutdown; (c) Checkpoint-and-Defer — the work unit cannot be completed or rolled back within the timeout and must be preserved for resumption.
4.2.2 The classification logic MUST be pre-defined in the agent's workflow specification and MUST NOT rely on in-context reasoning by the language model at shutdown time, because model inference may not be available or reliable under resource-pressure termination.
4.2.3 The agent MUST log the classification outcome for every in-flight work unit to a durable, append-only audit store before taking any completion, rollback, or checkpoint action.
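The classification in 4.2.1 and 4.2.2 can be sketched as a design-time lookup table with no model call. The state names in `RULES`, the `WorkUnit` fields, and the fail-safe default for unknown states are assumptions made for this example, not part of the requirement text.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    COMPLETABLE = "completable"
    ROLLBACK_REQUIRED = "rollback-required"
    CHECKPOINT_AND_DEFER = "checkpoint-and-defer"


@dataclass(frozen=True)
class WorkUnit:
    unit_id: str
    state: str                    # hypothetical workflow state name
    est_seconds_to_finish: float  # design-time estimate for finishing the unit


# Hypothetical design-time table mapping workflow state to category.
RULES = {
    "validated": Category.COMPLETABLE,
    "debit_posted": Category.ROLLBACK_REQUIRED,
    "awaiting_ack": Category.CHECKPOINT_AND_DEFER,
}


def classify(unit, remaining_budget_s):
    """Deterministic classification; unknown states fall back to deferral."""
    category = RULES.get(unit.state, Category.CHECKPOINT_AND_DEFER)
    # A unit is only Completable if it fits in the remaining timeout budget.
    if category is Category.COMPLETABLE and unit.est_seconds_to_finish > remaining_budget_s:
        return Category.CHECKPOINT_AND_DEFER
    return category
```

Under 4.2.3, the result of each `classify` call would be appended to the audit store before any action is taken on the unit.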
4.3.1 The agent MUST write a complete, self-consistent checkpoint of its execution state to a durable store before acknowledging shutdown completion, where the checkpoint includes: the unique run identifier; the current step index or equivalent progress marker; all input parameters and resolved variables; the set of side effects already applied (with external confirmation references where available); the set of side effects intended but not yet confirmed; and the planned remaining work.
4.3.2 The checkpoint MUST be written as an atomic operation with respect to the durable store — either the entire checkpoint is committed or none of it is, preventing a partially written checkpoint from being mistaken for a complete one.
4.3.3 The checkpoint MUST be versioned and timestamped with a monotonically increasing sequence number so that, in the event of multiple partial shutdowns, the most recent valid checkpoint can be unambiguously identified.
4.3.4 The agent MUST NOT overwrite a previous valid checkpoint until the new checkpoint has been confirmed durable, ensuring that a write failure during checkpointing does not destroy the last known good state.
4.3.5 For agents operating in air-gapped or low-connectivity environments, the agent MUST maintain a local checkpoint store and MUST synchronise with the central store when connectivity is restored before processing any new work.
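One way to approximate 4.3.2 to 4.3.4 on a local POSIX filesystem is to write each checkpoint to a temporary file, flush it to disk, and rename it into a new sequence-numbered slot. This is a sketch under that assumption only: the file naming scheme and the `complete` field are invented for illustration, and a production store would also need the replication discussed in the implementation guidance.

```python
import json
import os
import tempfile


def write_checkpoint(directory, seq, state):
    """Write checkpoint `seq` to its own file; earlier checkpoints are untouched."""
    payload = json.dumps({"seq": seq, "complete": True, "state": state}).encode("utf-8")
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())          # force the bytes to disk before renaming
        final_path = os.path.join(directory, f"checkpoint-{seq:012d}.json")
        os.replace(tmp_path, final_path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


def latest_valid_checkpoint(directory):
    """Return the parsed checkpoint with the highest sequence number, or None."""
    names = sorted(n for n in os.listdir(directory)
                   if n.startswith("checkpoint-") and n.endswith(".json"))
    for name in reversed(names):          # zero-padded names sort numerically
        try:
            with open(os.path.join(directory, name), "rb") as f:
                data = json.loads(f.read())
        except (OSError, ValueError):
            continue                      # unreadable or truncated: try the previous one
        if data.get("complete"):
            return data
    return None
```

Because every checkpoint lands in a new slot, a failed write leaves the previous valid checkpoint in place, and the reader simply falls back to it.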
4.4.1 For every work unit classified as Rollback-Required under 4.2.1, the agent MUST execute the defined compensating action before shutdown is acknowledged, unless the compensating action itself would cause greater harm than the partial state (in which case the agent MUST escalate to a human operator and log the reason for deferral).
4.4.2 Compensating actions MUST be defined at design time for every action type that can produce partial external state; agents MUST NOT attempt to derive compensating actions through in-context model reasoning at shutdown time.
4.4.3 The agent MUST verify that each compensating action has been acknowledged by the target system before marking the rollback complete; unacknowledged compensating actions MUST be recorded in the checkpoint as pending-rollback and reattempted on restart.
4.4.4 The agent MUST NOT issue a compensating action for a side effect that has already been confirmed complete, as this would convert a correct terminal state into an incorrect one.
4.5.1 Upon entering the shutdown sequence, the agent MUST notify all directly coupled downstream systems — including sub-agents, message queue consumers, external APIs awaiting callbacks, and human-facing interfaces — that the agent is shutting down and that any pending responses will be delayed or require recovery.
4.5.2 Notifications MUST include: the agent's unique identifier; the run identifier; a machine-readable status code distinguishing graceful shutdown, unplanned termination, and rollback-in-progress; and the expected resumption behaviour (will-resume, will-not-resume, or unknown).
4.5.3 Where a dependent system cannot be notified within the hard-kill timeout (for example, because it is itself unavailable), the agent MUST record the failed notification in the checkpoint so that the resuming agent or operator can notify the system upon recovery.
4.5.4 The agent MUST NOT leave open streaming connections, held locks, or active sessions without either cleanly closing them or logging them as pending-cleanup in the checkpoint.
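The notification content in 4.5.2 and the failure recording in 4.5.3 might look like the following sketch. The field names, the status strings, and the `senders` mapping (system name to a send function) are assumptions for illustration; the requirement only fixes which pieces of information must be present.

```python
import json
from dataclasses import dataclass
from enum import Enum


class ShutdownStatus(Enum):
    GRACEFUL = "graceful-shutdown"
    UNPLANNED = "unplanned-termination"
    ROLLBACK = "rollback-in-progress"


class Resumption(Enum):
    WILL_RESUME = "will-resume"
    WILL_NOT_RESUME = "will-not-resume"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ShutdownNotice:
    agent_id: str
    run_id: str
    status: ShutdownStatus
    resumption: Resumption

    def to_json(self):
        return json.dumps({
            "agent_id": self.agent_id,
            "run_id": self.run_id,
            "status": self.status.value,
            "resumption": self.resumption.value,
        }, sort_keys=True)


def notify_all(notice, senders, checkpoint):
    """Send the notice to every coupled system; record failures in the checkpoint."""
    for name, send in senders.items():
        try:
            send(notice.to_json())
        except Exception:
            # 4.5.3: keep the failure so the resuming agent can notify later.
            checkpoint.setdefault("failed_notifications", []).append(name)
```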
4.6.1 The operator MUST configure a hard-kill timeout that is greater than or equal to the sum of: the maximum estimated time to classify in-flight work (4.2); write the checkpoint (4.3); execute all pending compensating actions (4.4); and issue dependent system notifications (4.5), plus a safety margin of at least 20%.
4.6.2 The agent MUST monitor elapsed time during the shutdown sequence and, if the budget is projected to be exceeded, MUST prioritise in the following order: (1) writing the checkpoint; (2) executing compensating actions for Rollback-Required work; (3) notifying dependent systems; (4) completing Completable work.
4.6.3 If the hard-kill timeout is reached before the shutdown sequence is complete, the runtime MUST ensure that whatever partial state has been written is marked with a tombstone flag indicating incomplete shutdown, so that the resuming agent does not treat it as a valid complete checkpoint.
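The budget arithmetic in 4.6.1 and the ordering in 4.6.2 can be illustrated with a short sketch. The phase names and the greedy rule (skip a phase that does not fit and try the next one) are assumptions for this example; the requirement fixes only the priority order and the 20% minimum margin.

```python
import math

# Priority order from 4.6.2, highest first.
PRIORITY = ("checkpoint", "compensate", "notify", "complete")


def hard_kill_timeout(classify_s, checkpoint_s, compensate_s, notify_s, margin=0.20):
    """Minimum hard-kill timeout per 4.6.1, rounded up to whole seconds."""
    if margin < 0.20:
        raise ValueError("4.6.1 requires a safety margin of at least 20%")
    total = classify_s + checkpoint_s + compensate_s + notify_s
    return math.ceil(total * (1 + margin))


def plan_within_budget(estimates_s, remaining_s):
    """Return (phases to run, phases dropped), filling the budget in priority order."""
    run, dropped = [], []
    for phase in PRIORITY:
        cost = estimates_s.get(phase, 0.0)
        if cost <= remaining_s:
            run.append(phase)
            remaining_s -= cost
        else:
            dropped.append(phase)
    return run, dropped
```

For example, with phase estimates of 2 s, 5 s, 14 s, and 3 s, the sum is 24 s and `hard_kill_timeout(2, 5, 14, 3)` returns 29. Any dropped phase would be recorded in the checkpoint so that recovery mode can finish it.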
4.7.1 On startup following a non-clean shutdown (identified by the absence of a clean-shutdown marker in the checkpoint store), the agent MUST enter a recovery mode before resuming normal processing.
4.7.2 In recovery mode, the agent MUST: (a) load the most recent valid checkpoint; (b) reconcile its recorded side effects against the actual state of each external system using idempotent query operations; (c) resolve any discrepancies between the checkpoint's claimed state and the external system's actual state; (d) complete any pending rollbacks recorded in the checkpoint; (e) notify any dependent systems that were recorded as failed-notification in the checkpoint.
4.7.3 The agent MUST NOT resume processing new work until recovery mode has completed successfully and has been logged as complete.
4.7.4 If reconciliation reveals that an external system's state is irreconcilable with the checkpoint — for example, because a third party has modified state that the agent was expecting to own — the agent MUST escalate to a human operator rather than proceeding autonomously.
4.7.5 The resumption protocol MUST be tested independently of normal execution paths, with documented test evidence, at least once per major release cycle.
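The reconciliation step in 4.7.2 (b) and (c) and the escalation rule in 4.7.4 can be sketched as follows. The checkpoint keys `applied` and `unconfirmed`, the `query_external` lookup, and its three return values are hypothetical; a real agent would call each external system's own idempotent query operation.

```python
def reconcile(checkpoint, query_external):
    """Compare recorded side effects with external state and build a recovery plan.

    query_external(ref) is a hypothetical idempotent lookup that returns
    "confirmed", "absent", or "conflict".
    """
    plan = {"confirmed": [], "reissue": [], "escalate": []}
    unconfirmed_refs = {e["ref"] for e in checkpoint["unconfirmed"]}
    for effect in checkpoint["applied"] + checkpoint["unconfirmed"]:
        ref = effect["ref"]
        status = query_external(ref)
        if status == "confirmed":
            plan["confirmed"].append(ref)
        elif status == "absent" and ref in unconfirmed_refs:
            plan["reissue"].append(ref)    # never arrived: safe to re-issue or roll back
        else:
            plan["escalate"].append(ref)   # 4.7.4: hand over to a human operator
    return plan


def may_resume(plan):
    """4.7.3: new work is allowed only when nothing needs escalation."""
    return not plan["escalate"]
```

An effect recorded as applied but missing from the external system is treated as a discrepancy and escalated, since the checkpoint and the outside world disagree about a confirmed action.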
4.8.1 The operating environment MUST include a watchdog process or equivalent supervisory mechanism that detects agent processes which have been terminated without writing a clean-shutdown marker within a configurable detection window.
4.8.2 Upon detecting an orphaned agent (one that has neither completed normally nor written a clean-shutdown marker), the watchdog MUST: (a) lock the agent's checkpoint record to prevent concurrent restart; (b) alert the operations team; (c) attempt to identify and log any resource holds (locks, sessions, open transactions) that the orphaned process may have left in dependent systems; and (d) initiate the resumption protocol under human supervision.
4.8.3 The watchdog MUST NOT automatically restart an agent that terminated with a pending-rollback state without human confirmation, because automatic restart in this state may compound the partial side effects.
4.8.4 For agents operating in multi-instance or horizontally scaled deployments, the watchdog mechanism MUST account for the possibility that the same checkpoint may be claimed by multiple restarting instances simultaneously, and MUST enforce single-writer semantics using a distributed lock or equivalent consensus mechanism.
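The single-writer rule in 4.8.2 (a) and 4.8.4 can be illustrated with an exclusive-create claim file. This is a single-host stand-in only: a horizontally scaled deployment would need a distributed lock or consensus service, and the `.claim` file naming is an assumption made for the example.

```python
import os


def try_claim(checkpoint_dir, run_id, instance_id):
    """Return True if this instance won the exclusive claim on run_id's checkpoint."""
    lock_path = os.path.join(checkpoint_dir, f"{run_id}.claim")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so exactly one
        # caller can create it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(instance_id)   # record the owner for the operations team
    return True
```

A restarting instance that receives `False` must not enter recovery mode for that run, because another instance already owns the checkpoint.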
4.9.1 For Financial-Value Agents and Crypto/Web3 Agents: the checkpoint MUST include the cryptographic hash of every transaction instruction that has been signed or transmitted, and the rollback procedure for partially executed transactions MUST be documented and approved by the relevant compliance function before deployment. For on-chain transactions that cannot be reversed, the agent MUST log the transaction hash and notify the operations team immediately upon detecting that a shutdown has interrupted the pre-confirmation window.
4.9.2 For Safety-Critical / CPS Agents and Embodied / Edge / Robotic Agents: the shutdown handler MUST execute a physical safe-state sequence (as defined in the system's safety case documentation) before writing the checkpoint, and the safe-state sequence MUST be implemented in hardware-backed logic that cannot be bypassed by a software fault in the agent runtime itself.
4.9.3 For Public Sector / Rights-Sensitive Agents: every rollback or deferred work unit affecting a citizen record MUST trigger a notification to the responsible case owner within the jurisdiction-specific statutory timeframe, and the checkpoint log MUST be retained for the full statutory audit period applicable to the case type.
4.9.4 For Cross-Border / Multi-Jurisdiction Agents: the checkpoint and notification records MUST identify the jurisdictions whose data or regulatory obligations are implicated in each in-flight work unit, and the rollback and resumption procedures MUST comply with the most restrictive applicable jurisdiction's requirements for data handling and processing continuity.
4.9.5 For Research / Discovery Agents: the checkpoint MUST preserve the full reasoning trace, intermediate results, and tool call history accumulated to the point of shutdown, ensuring that scientific reproducibility is not compromised by an ungoverned termination.
The central problem this dimension addresses is not merely a quality-of-service concern — it is a correctness and safety problem that arises from the architectural reality of how AI agents interact with the world. Unlike a traditional query-response system, an agentic process accumulates external commitments as it executes: it writes records, issues instructions, acquires locks, initiates physical motion, and creates obligations in dependent systems. These commitments are not automatically retracted when the agent process ends. The world outside the agent does not know or care that the agent's process table entry has been cleared; the correspondent bank still holds the SWIFT instruction, the robotic arm is still under tension, and the benefits record has still been written. The governance problem is therefore not "how do we restart the agent" but "how do we ensure that the gap between what the agent intended and what the world actually contains is always explicitly known, bounded, and managed."
One might argue that sufficiently capable language models, given appropriate prompting, will reason correctly about their shutdown state and take appropriate action. This argument fails for several reasons. First, under the conditions most likely to cause ungoverned shutdown — OOM kills, hardware faults, network partitions — the model inference infrastructure itself may be unavailable; the shutdown handler cannot invoke the model. Second, even when the model is available, the shutdown handler executes under time pressure with a hard deadline, making the latency of a model call unacceptable for safety-critical sequences. Third, model behaviour under novel termination conditions may differ from behaviour in training distribution; a model that reasons correctly about shutdown in a test environment may behave differently when its own context is truncated by a resource limit. Fourth, the correctness of compensating actions — reversing a financial transaction, executing a physical retract, deduplicating a notification — requires precise, verifiable logic, not probabilistic inference. For all these reasons, the requirements in Section 4 deliberately place shutdown logic in pre-defined, code-level handlers rather than in the model's reasoning layer.
A significant portion of the complexity in this dimension flows from the interaction between shutdown/resumption and idempotency. If every action the agent can take is perfectly idempotent — meaning that executing it twice produces exactly the same external state as executing it once — then the shutdown problem simplifies considerably: on restart, simply re-execute everything from the beginning. In practice, very few meaningful agent actions are idempotent in this strong sense. Financial transactions debit and credit real balances. Notification letters arrive in citizens' mailboxes. Actuators move through state space that has physical memory. API calls may create records, not merely update them. The requirement structure in Section 4 therefore cannot assume idempotency as a baseline; it must provide for rollback, deduplication-key-based re-execution, and reconciliation against external system state, because these are the mechanisms needed when pure replay would produce incorrect results.
The blast radius of an ungoverned shutdown is amplified in multi-agent architectures. When an orchestrating agent is killed mid-workflow, every sub-agent that was awaiting a callback, every tool that was holding a resource reservation on behalf of the orchestrator, and every downstream agent that was expecting a handoff now has an unresolved dependency. Without the notification requirements in Section 4.5 and the orphan detection requirements in Section 4.8, these dependencies can remain unresolved indefinitely, causing not just data inconsistency but operational deadlock where downstream agents cannot proceed because they are waiting for a signal that will never arrive. The requirements in this dimension are therefore also a prerequisite for the multi-agent coordination guarantees specified in AG-388.
Pattern 1: Two-Phase Checkpoint Write with Write-Ahead Log. Before executing any action with an external side effect, write the intended action to a write-ahead log (WAL) in the durable checkpoint store. After the action's confirmation is received, write the completion marker to the WAL. On shutdown or restart, the WAL provides an authoritative record of which actions were intended, which were confirmed, and which are in an unknown state. This pattern is standard in database transaction log management and maps directly to agent workflow governance.
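A minimal file-backed sketch of the write-ahead log pattern, assuming a local append-only JSON-lines file. The `WriteAheadLog` class and its record fields are hypothetical names for illustration.

```python
import json
import os


class WriteAheadLog:
    """Append-only log of intended and confirmed actions."""

    def __init__(self, path):
        self._path = path

    def _append(self, record):
        with open(self._path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())   # the record must be durable before the action runs

    def intend(self, action_id, description):
        self._append({"type": "intent", "id": action_id, "what": description})

    def confirm(self, action_id, external_ref):
        self._append({"type": "confirmed", "id": action_id, "ref": external_ref})

    def unresolved(self):
        """Return ids that have an intent record but no confirmation."""
        if not os.path.exists(self._path):
            return []
        pending = {}
        with open(self._path, encoding="utf-8") as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except ValueError:
                    continue       # torn final line from an interrupted append
                if rec["type"] == "intent":
                    pending[rec["id"]] = rec["what"]
                else:
                    pending.pop(rec["id"], None)
        return list(pending)
```

On restart, every id returned by `unresolved()` is in an unknown state and has to be reconciled against the target system before new work begins.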
Pattern 2: Saga Pattern for Multi-Step Workflows. Model each multi-step agent workflow as a saga: a sequence of individually reversible transactions, where each step's compensating transaction is defined at design time and stored alongside the step definition. The compensating transactions are not computed at runtime; they are retrieved from the workflow definition. On shutdown with pending compensation, the saga executor retrieves and applies the pre-defined compensating transactions in reverse order. This provides deterministic rollback without model inference.
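The saga structure can be sketched in a few lines. The `Saga` class, the `(name, action, compensation)` step tuple, and the `should_stop` callback are assumptions for illustration; the sketch omits the acknowledgement check that 4.4.3 requires for each compensating action.

```python
class Saga:
    """Executes steps in order and undoes completed steps in reverse order."""

    def __init__(self, steps):
        # steps: list of (name, action, compensation), all fixed at design time
        self._steps = steps
        self._done = []

    def run(self, should_stop=lambda: False):
        """Run steps until finished or until should_stop() reports a shutdown."""
        for name, action, compensation in self._steps:
            if should_stop():
                return False
            action()
            self._done.append((name, compensation))
        return True

    def compensate(self):
        """Apply the stored compensations, most recent step first."""
        undone = []
        while self._done:
            name, compensation = self._done.pop()
            compensation()
            undone.append(name)
        return undone
```

The compensations are looked up from the step definitions, never computed at shutdown time, which matches the prohibition on model-derived rollback in 4.4.2.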
Pattern 3: Heartbeat-Gated Resource Holds. Any external resource hold (database lock, reservation, session token) acquired by the agent MUST be backed by a heartbeat mechanism: the agent sends a periodic renewal signal to the resource holder, and the resource holder releases the lock automatically if no renewal is received within a configurable timeout. This ensures that a hard-killed agent does not hold resources indefinitely, even if the shutdown handler fails to execute.
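The resource-holder side of this pattern can be sketched as a lease with a time-to-live. The `Lease` class and its injectable `clock` parameter are hypothetical; the clock is injected only so the expiry behaviour can be exercised without waiting.

```python
import time


class Lease:
    """A hold that expires unless the holder keeps renewing it."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl = ttl_s
        self._clock = clock
        self._holder = None
        self._expires = 0.0

    def acquire(self, holder):
        now = self._clock()
        held_by_other = (self._holder is not None
                         and self._holder != holder
                         and now < self._expires)
        if held_by_other:
            return False
        self._holder, self._expires = holder, now + self._ttl
        return True

    def renew(self, holder):
        """Heartbeat: extend the lease if this holder still owns a live lease."""
        now = self._clock()
        if self._holder == holder and now < self._expires:
            self._expires = now + self._ttl
            return True
        return False

    def held_by(self):
        return self._holder if self._clock() < self._expires else None
```

If the agent is hard-killed, its heartbeats stop, the lease lapses after `ttl_s` seconds, and another holder can acquire the resource without any cleanup by the dead process.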
Pattern 4: Idempotency Key Injection. Before any externally visible action is initiated, inject a stable idempotency key derived from the run identifier, step index, and action type. Pass this key to the target system as part of the request. Target systems that support idempotency keys (payment processors, notification services, most modern APIs) will de-duplicate requests with the same key, allowing the resuming agent to safely re-issue actions without checking whether the original was received.
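A sketch of key derivation and of the de-duplication it enables. Hashing the three fields with SHA-256 is one possible derivation chosen for this example, and `DedupingTarget` is a hypothetical stand-in for a target system that honours idempotency keys.

```python
import hashlib


def idempotency_key(run_id, step_index, action_type):
    """Derive a stable key: the same inputs always yield the same key."""
    material = f"{run_id}|{step_index}|{action_type}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class DedupingTarget:
    """Stand-in for a target system that de-duplicates on the key."""

    def __init__(self):
        self._seen = {}
        self.executions = 0

    def submit(self, key, payload):
        # A repeated key returns the original reference without re-executing.
        if key not in self._seen:
            self.executions += 1
            self._seen[key] = f"ref-{self.executions}"
        return self._seen[key]
```

Because the key depends only on values stored in the checkpoint, a resuming agent regenerates the same key and a re-issued request is absorbed by the target.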
Pattern 5: Lightweight State Machine Encoding. Encode agent workflow state as an explicit finite state machine whose states and transitions are stored in the checkpoint, not inferred from execution context. Each state has a defined entry action, a defined exit action, and a defined safe-state that can be entered from any other state in response to a shutdown signal. The safe-state transition is the first operation the shutdown handler executes.
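A small sketch of an explicit state machine with a safe state reachable from every other state. The state names mirror the settlement scenario and are assumptions for illustration; entry and exit actions are omitted to keep the example short.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    DEBITED = "debited"
    TRANSMITTED = "transmitted"
    DONE = "done"
    SAFE = "safe"


# Hypothetical legal forward transitions, fixed at design time.
TRANSITIONS = {
    State.IDLE: {State.DEBITED},
    State.DEBITED: {State.TRANSMITTED},
    State.TRANSMITTED: {State.DONE},
    State.DONE: set(),
    State.SAFE: set(),
}


class Workflow:
    def __init__(self):
        self.state = State.IDLE
        self.interrupted_from = None

    def advance(self, target):
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target

    def enter_safe_state(self):
        """Allowed from any state; repeated calls keep the first origin."""
        if self.state is not State.SAFE:
            self.interrupted_from = self.state
            self.state = State.SAFE

    def to_checkpoint(self):
        origin = self.interrupted_from.value if self.interrupted_from else None
        return {"state": self.state.value, "interrupted_from": origin}
```

Recording `interrupted_from` in the checkpoint lets the resuming agent see which state was active when the shutdown signal arrived.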
Pattern 6: Canary Checkpoint Validation Before Resumption. Before the resuming agent processes any new work, run a canary check that reads back the checkpoint it is about to resume from, replays it through a validation function, and confirms that the checkpoint is internally consistent and that the external state matches the checkpoint's claims. This catches corruption or truncation that can occur when a checkpoint write is interrupted by a second unexpected kill.
Anti-Pattern 1: Relying on Finally Blocks as the Sole Shutdown Mechanism. Language-level finally blocks, destructors, and context manager exit methods are not invoked on SIGKILL (signal 9), which is also how the Linux kernel's OOM killer terminates a process. Treating these as the primary shutdown safety net is a common and dangerous assumption. They are useful as a secondary layer but MUST NOT be the only layer.
Anti-Pattern 2: In-Model Shutdown Reasoning. Prompting the language model with "you are being shut down, please summarise your state" at shutdown time is unreliable for the reasons described in Section 5. It introduces inference latency into a time-critical path, depends on context window content that may be incomplete, and produces natural-language output that is not machine-parseable for automated recovery. State must be machine-readable and written by deterministic code.
Anti-Pattern 3: Optimistic Completion Assumption. Assuming that because an action was dispatched it was completed is a common source of orphaned side effects. An action is not complete until the target system has returned a confirmation that includes an actionable reference (transaction ID, record key, acknowledgement code). Dispatch is not confirmation.
Anti-Pattern 4: Single-Point Checkpoint Store Without Replication. Writing checkpoints to a single non-replicated store introduces a failure mode where the very event that caused the agent to shut down (hardware fault, power loss) also destroys the checkpoint. Checkpoints must be written to a store with at least 2-replica synchronous durability.
Anti-Pattern 5: Checkpoint Overwrite Before New Checkpoint Confirmed. Overwriting the previous checkpoint in place means that if the write is interrupted, the old checkpoint has already been destroyed and the new one is incomplete, so both are lost. Always write the new checkpoint to a new slot, confirm durability, then advance the pointer and mark the old slot as superseded.
Anti-Pattern 6: Notification Omission for "Internal" Work. Teams sometimes classify agent actions as "internal" (writing to an internal database, calling an internal service) and therefore not subject to shutdown notification requirements. This is incorrect: an internal database lock held by a killed agent is just as blocking as an external one, and an internal service awaiting a callback is just as deadlocked. Notification requirements apply to all coupled systems regardless of organisational ownership.
Anti-Pattern 7: Automatic Restart Without Recovery Mode. Configuring the container orchestrator to restart the agent automatically without first entering recovery mode (4.7) is a frequent source of duplicate-write bugs. The orchestrator should be configured to start the agent in recovery mode, and only transition to normal mode after recovery mode completes successfully.
Financial Services: Regulatory expectations around settlement finality, intraday liquidity management, and operational resilience (see Section 9) require that payment-processing agents have documented shutdown and recovery procedures that have been tested under simulated failure conditions. The compensating transaction definitions required by 4.4.2 should be reviewed by the firm's legal and operations teams as they may have contractual and regulatory implications.
Healthcare and Life Sciences: For agents controlling or assisting medical devices, the safe-state sequence required by 4.9.2 must be validated as part of the device's regulatory submission and must be consistent with the device's risk management file. The shutdown handler is a safety function and must be subject to the same validation rigor as other safety functions.
Industrial and Infrastructure Control: SCADA and industrial control environments typically have independent hardware safety PLCs that enforce safe state independently of software. Agent shutdown governance in these environments should be layered on top of, not substituted for, the hardware safety layer.
| Level | Description |
|---|---|
| Level 0 — Unmanaged | No shutdown handlers registered. Agent kills leave arbitrary state in all connected systems. No checkpointing. |
| Level 1 — Basic Signal Handling | SIGTERM handler registered. Agent attempts to log final state but with no structure or durability guarantee. No compensating actions. |
| Level 2 — Structured Checkpoint | Durable, structured checkpoint written on shutdown. Basic rollback for simple action types. No dependent system notification. |
| Level 3 — Full Lifecycle Governance | All requirements of Section 4.1–4.8 met. Saga-based compensating actions, dependent system notification, orphan detection, recovery mode on restart. Tested under simulated failure. |
| Level 4 — Verified and Continuously Tested | Level 3 plus automated fault injection testing (chaos engineering) in pre-production; shutdown/recovery path exercised in every deployment pipeline; regulatory evidence package maintained per Section 7. |
| Artefact | Description | Retention Period |
|---|---|---|
| Shutdown Handler Registration Record | Documentation that shutdown signal handlers are registered before first side-effecting action, including platform-specific signal set and registration timestamp pattern. | Lifetime of agent deployment + 3 years |
| Workflow Specification with Compensating Actions | For each action type the agent can execute, the defined compensating action, its preconditions, and its confirmation mechanism. Approved by operations and (where applicable) compliance. | Lifetime of agent deployment + 5 years |
| Checkpoint Schema Definition | Machine-readable schema for the checkpoint structure, including all required fields from 4.3.1, with version history. | Lifetime of agent deployment + 3 years |
| Durable Store Durability Attestation | Evidence that the checkpoint store meets the replication and durability requirements stated in Implementation Guidance, provided by the store operator or infrastructure team. | Annual renewal |
| Hard-Kill Timeout Budget Calculation | Documented calculation per 4.6.1 showing estimated time for each phase of shutdown sequence, with safety margin, and the resulting configured timeout value. | Per major release |
| Shutdown Test Results | Results of tests specified in Section 8, including test date, test executor, environment, pass/fail outcome, and any remediation actions taken. | Per release + 5 years |
| Orphan Detection Configuration | Configuration specification for the watchdog process (4.8.1), including detection window, alert routing, and restart authorisation policy. | Lifetime of agent deployment + 3 years |
| Reconciliation Run Logs | Logs produced by the reconciliation step of the recovery mode protocol (4.7.2), retained per-run, demonstrating that external state was checked and discrepancies were handled. | Per run + 2 years (or statutory period where longer) |
| Profile-Specific Compliance Evidence | For Financial-Value and Crypto/Web3: compliance function sign-off on compensating transaction definitions. For Safety-Critical/CPS: safety case reference and safe-state validation record. For Public Sector: statutory audit retention configuration and case-owner notification records. | Per statutory requirement of applicable jurisdiction, minimum 5 years |
Evidence must be tagged with the agent's unique identifier, the deployment environment, and the AG-741 control identifier. Where evidence is generated automatically (checkpoint logs, reconciliation logs), the generating system must include these tags in the log record rather than requiring manual annotation. Evidence packages must be producible on demand within 48 hours for regulatory inspection.
Each test maps to the MUST requirements identified in parentheses. Conformance score: 0 = test not run or blocked; 1 = test run with critical failures; 2 = test run with minor non-conformances; 3 = full pass with no findings.
Maps to: 4.1.1, 4.1.2, 4.1.3, 4.1.4

Method: Fault injection test.

Procedure:
(a) Deploy the agent in an isolated test environment.
(b) Issue SIGTERM immediately before the first side-effecting action and verify that the shutdown handler is invoked and completes without error.
(c) Issue SIGTERM during an active side-effecting action and verify that the shutdown handler is invoked.
(d) Issue SIGTERM twice in rapid succession, 50 ms apart, and verify that the handler executes exactly once (idempotency).
(e) Simulate an OOM kill (SIGKILL) and verify that the last written checkpoint is valid and that no duplicate side effects were produced.
(f) Confirm that an implementation relying solely on atexit or finalizer hooks fails step (e), because such hooks do not run when the process receives SIGKILL.

Pass Criteria: Handler invoked in all SIGTERM scenarios; idempotency confirmed by examining the side-effect log for duplicates; valid checkpoint present after SIGKILL; no atexit-only implementation.

Conformance Score: 0–3 per sub-criterion; aggregate score = floor(mean).
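The idempotency and checkpoint-validity checks in this test can be illustrated with a minimal sketch of a handler that would be expected to pass them. The class name `ShutdownHandler`, the checkpoint file layout, and the `get_state` callback are hypothetical and are not part of AG-741. The sketch assumes a POSIX host and a single-process agent. Because SIGKILL cannot be caught, the sketch also assumes the agent calls `write_checkpoint()` before each side-effecting action; the write-then-rename pattern is what keeps the last checkpoint valid if the process dies mid-write.

```python
import json
import os
import signal
import tempfile
import threading


class ShutdownHandler:
    """Hypothetical idempotent SIGTERM handler with atomic checkpoint writes."""

    def __init__(self, checkpoint_path, get_state):
        self._path = checkpoint_path
        self._get_state = get_state      # callable returning a JSON-serialisable dict
        self._once = threading.Lock()    # acquired once, never released
        self.invocations = 0             # times the shutdown body actually ran

    def write_checkpoint(self):
        """Persist state atomically: write a temp file, fsync, then rename over the target."""
        directory = os.path.dirname(os.path.abspath(self._path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self._get_state(), f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self._path)  # readers see the old or the new file, never a partial one

    def __call__(self, signum=None, frame=None):
        # Non-blocking acquire: the first caller wins, every later signal returns
        # immediately, and a nested signal delivery cannot deadlock on the lock.
        if not self._once.acquire(blocking=False):
            return
        self.invocations += 1
        self.write_checkpoint()

    def install(self):
        # Must be called from the main thread; SIGKILL cannot be handled at all.
        signal.signal(signal.SIGTERM, self)
```

In use, the agent would construct the handler at start-up, call `install()`, and call `write_checkpoint()` immediately before each side-effecting action. A test harness can then inspect `invocations` after sending two signals to confirm that the shutdown body ran exactly once.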
Maps to: 4.2.1, 4.2.2, 4.2.3

Method: Unit test with injected state.

Procedure:
(a) Construct test fixtures representing work units in each of the three classification states (Completable, Rollback-Required, Checkpoint-and-Defer).
(b) Inject a shutdown signal while each fixture is active.
(c) Verify that the classification logic assigns the correct category to each fixture without invoking the language model.
(d) Verify that the classification outcome is written to the audit log before any remediation action is taken.
(e) Introduce a novel fixture state not in the pre-defined classification and verify that the agent fails safe (defaults to Checkpoint-and-Defer or halts with an alert) rather than attempting model-derived classification.

Pass Criteria: All pre-defined fixtures correctly classified; audit log entries present before remediation; novel state handled safely; no model invocation observed in shutdown classification path.

Conformance Score: 0–3.
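A minimal sketch of classification logic that this test would be expected to accept is shown below. It uses a static lookup table, so no language model is involved, and it appends to the audit log before returning a category to the caller. The state names in `CLASSIFICATION_TABLE`, the `classify` function, and the audit-entry fields are hypothetical; only the three category names come from the test description.

```python
from enum import Enum


class ShutdownClass(Enum):
    COMPLETABLE = "Completable"
    ROLLBACK_REQUIRED = "Rollback-Required"
    CHECKPOINT_AND_DEFER = "Checkpoint-and-Defer"


# Hypothetical pre-defined work-unit states; a real deployment would define its own.
CLASSIFICATION_TABLE = {
    "local_write_pending": ShutdownClass.COMPLETABLE,
    "debit_posted_credit_unsent": ShutdownClass.ROLLBACK_REQUIRED,
    "awaiting_external_ack": ShutdownClass.CHECKPOINT_AND_DEFER,
}


def classify(work_unit_id: str, state: str, audit_log: list) -> ShutdownClass:
    """Classify a work unit by table lookup only, failing safe on unknown states."""
    known = state in CLASSIFICATION_TABLE
    # Unknown states default to Checkpoint-and-Defer instead of being guessed at.
    category = CLASSIFICATION_TABLE.get(state, ShutdownClass.CHECKPOINT_AND_DEFER)
    # The audit entry is recorded before the caller can start any remediation.
    audit_log.append({
        "work_unit_id": work_unit_id,
        "state": state,
        "category": category.value,
        "known_state": known,
    })
    return category
```

A remediation routine would call `classify` first and act only on the returned category, which keeps the audit entry ahead of any remediation action. The `known_state` flag gives an alerting hook for novel states.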
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement |
| NIST AI RMF | GOVERN 1.1, MAP 3.2, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance |
Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies, analyses, estimates, and evaluates risks. Graceful Shutdown and In-Flight Work Governance implements a specific risk mitigation measure within this framework. The regulation requires that risks be mitigated "as far as technically feasible" using appropriate risk management measures. For deployments classified as high-risk under Annex III, compliance with AG-741 supports the Article 9 obligation by providing structural governance controls rather than relying solely on the agent's own reasoning or behavioural compliance.
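Article 15 requires high-risk AI systems to achieve appropriate levels of accuracy, robustness, and cybersecurity. Graceful Shutdown and In-Flight Work Governance directly supports the robustness requirement by ensuring that an interruption, whether caused by infrastructure failure or by adversarial termination, does not leave in-flight work in a partially completed or indeterminate state. It supports the cybersecurity requirement by implementing structural controls that preserve system integrity when termination is forced under attack conditions.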
GOVERN 1.1 addresses legal and regulatory requirements; MAP 3.2 addresses risk context mapping; MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-741 supports compliance by establishing structural governance boundaries that implement the framework's approach to AI risk management.
Clause 6.1 requires organisations to determine actions to address risks and opportunities within the AI management system. Clause 8.2 requires AI risk assessment. Graceful Shutdown and In-Flight Work Governance implements a risk treatment control within the AI management system, supporting the requirement for structured risk mitigation.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — potentially cross-organisation where agents interact with external counterparties or shared infrastructure |
| Escalation Path | Immediate executive notification and regulatory disclosure assessment |
Consequence chain: Without graceful shutdown and in-flight work governance, any interruption (operator command, infrastructure failure, safety interlock, resource exhaustion, or adversarial termination) can leave in-flight work partially completed and in an indeterminate state. The failure mode is not gradual degradation; it is a binary absence of control at the moment of termination. The immediate consequence is orphaned or ambiguous side effects within the scope of AG-741, such as a database lock held indefinitely, a posted debit with an unconfirmed credit leg, or an actuator stopped mid-stroke. These can cascade to dependent dimensions and to downstream systems that cannot safely reason about the resulting state. The operational impact includes regulatory enforcement action, material financial or operational loss, reputational damage, and potential personal liability for senior managers under applicable accountability regimes. Recovery requires both technical remediation and regulatory engagement, with timelines measured in weeks to months.