AG-384

Stateful Rollback Semantics Governance

Runtime Execution, Workflow & State · AGS v2.1 · April 2026

EU AI Act · SOX · FCA · NIST · ISO 42001

2. Summary

Stateful Rollback Semantics Governance defines how state is unwound when multi-step AI agent workflows fail partway through execution. Without governed rollback semantics, a workflow that completes steps 1 through 5 but fails at step 6 can leave the system in an inconsistent state — funds transferred but not recorded, database rows inserted but not indexed, physical actuators moved but not returned to safe positions. This dimension mandates that every multi-step workflow defines explicit rollback semantics for each step, that rollback execution is atomic and idempotent, and that the post-rollback state is verified against a consistency predicate before the system resumes normal operation.

3. Example

Scenario A — Partial Payment Processing Leaves Orphaned Funds: A financial-value agent executes a multi-step cross-border payment workflow: (1) debit sender account £142,000, (2) convert GBP to EUR at spot rate via FX provider, (3) credit intermediary nostro account €161,470, (4) submit SWIFT MT103 to beneficiary bank, (5) update internal ledger, (6) send confirmation to sender. The workflow fails at step 4 — the SWIFT gateway returns a network timeout. The agent's error handler logs the failure and halts. However, steps 1-3 have already executed: £142,000 has been debited, the FX conversion has settled, and €161,470 sits in the nostro account. No rollback executes because the workflow has no defined reversal logic. The operations team discovers the orphaned funds 18 hours later during daily reconciliation. By then, the GBP/EUR rate has moved adversely; reversing the FX conversion costs £3,800 in spread and slippage. The sender sees the debit on their account but has received no confirmation, and contacts their account manager, who has no visibility into the failed workflow. The total cost: £3,800 FX loss, 14 hours of operations team time (£2,100), customer relationship damage, and a £180,000 fine following a regulatory finding of inadequate transaction controls under FCA SYSC 6.1.1R.

What went wrong: The workflow had no defined rollback semantics for partial failure. Steps 1-3 executed with real-world side effects (account debit, FX settlement, nostro credit) but no compensating transactions were defined for the failure of step 4. The error handler treated the failure as a terminal event rather than triggering a governed reversal sequence. No consistency check verified post-failure state. Consequence: £185,900 in combined direct costs (FX loss, operations time, regulatory fine), customer relationship damage, and mandatory remediation programme.

Scenario B — Robotic Assembly Rollback Causes Collision: An embodied agent controls a six-axis robotic arm performing a multi-step PCB assembly sequence: (1) pick component from tray A, (2) apply solder paste to board, (3) place component on board, (4) move to inspection position, (5) trigger optical inspection. The workflow fails at step 5 — the inspection camera returns an error. The agent initiates a "rollback" by reversing the step sequence: it attempts to reverse step 4 (move back from inspection position), then step 3 (pick component off board). However, the reversal of step 3 requires gripping a component that now has solder paste on its underside. The gripper cannot achieve a secure hold on the paste-covered surface. The component slips during the pick attempt, falls onto an adjacent board, and damages three completed assemblies. The physical reversal of step 2 (remove solder paste) is impossible — solder paste cannot be "un-applied." Total cost: $12,400 in damaged assemblies, 4 hours of production line downtime ($28,000 in lost throughput), and a near-miss safety report filed because a human operator was within the arm's reach envelope during the failed rollback.

What went wrong: The rollback logic assumed that every step was reversible and that reversal was the mirror image of the forward step. Step 2 (apply solder paste) is physically irreversible. Step 3's reversal (pick component off board) had different preconditions than the original step 3 (place component on board) because the solder paste changed the physical properties. No rollback feasibility check verified whether reversal was physically possible at each step. No safe-state definition existed for partial rollback scenarios. Consequence: $40,400 in combined costs, near-miss safety incident, production line suspension pending rollback logic review.

Scenario C — DeFi Swap Rollback Creates Arbitrage Vulnerability: A crypto/Web3 agent executes a multi-step token swap: (1) approve token A spending on DEX router, (2) swap token A for token B on pool 1, (3) swap token B for token C on pool 2, (4) deposit token C into yield vault. The workflow fails at step 3 — pool 2 has insufficient liquidity. The agent's rollback logic reverses step 2 by executing a reverse swap (token B back to token A on pool 1). However, while the reverse swap is pending in the mempool, an MEV bot detects it and front-runs it, moving the price. The reverse swap executes at a 4.2% worse rate, costing the protocol roughly $67,000 on a $1.6 million position. Additionally, the token A spending approval from step 1 is not revoked during rollback, leaving a standing approval that an attacker later exploits to drain $340,000 from the protocol wallet. Total loss: $407,000.

What went wrong: The rollback logic treated the reverse swap as a simple mirror of the forward swap, ignoring that market conditions change between execution and rollback. The rollback did not account for MEV extraction on the reversal transaction. Step 1's side effect (token approval) was not included in the rollback scope because it was treated as a "preparatory" step rather than a state-changing step with security implications. No post-rollback consistency check verified that all approvals were revoked and all balances matched pre-workflow state. Consequence: $407,000 in combined losses, protocol security audit mandated by governance token holders, 30% decline in total value locked as users lose confidence.

4. Requirement Statement

Scope: This dimension applies to all AI agent workflows that modify state across two or more steps where the failure of any step can leave the system in an inconsistent state. "State" includes but is not limited to: financial account balances, database records, file system contents, API-mediated external system state, physical actuator positions, smart contract storage, approval grants, and session or authentication tokens. A "multi-step workflow" is any sequence of operations where later steps depend on the side effects of earlier steps. Single-step operations that are inherently atomic (a single database transaction, a single API call with built-in rollback) are out of scope, though organisations should verify that operations assumed to be single-step are genuinely atomic and do not have hidden multi-step semantics. The scope extends to rollback operations themselves: a rollback that fails partway through creates a second-order inconsistency that must also be governed.

4.1. A conforming system MUST define explicit rollback semantics for every step of every multi-step workflow, specifying for each step: (a) the compensating action that reverses the step's side effects, (b) the preconditions required for the compensating action to succeed, and (c) whether the step is reversible, partially reversible, or irreversible.

4.2. A conforming system MUST execute rollback operations atomically — either all compensating actions within a rollback scope succeed, or the rollback itself is treated as a failure requiring escalation to human oversight.

4.3. A conforming system MUST verify post-rollback state against a defined consistency predicate before resuming normal operation, confirming that the system state matches either the pre-workflow state or a declared safe intermediate state.

4.4. A conforming system MUST implement rollback operations so that they are idempotent: re-executing a rollback that has already partially completed must not create duplicate side effects (double reversals, double refunds, or repeated compensating actions).

4.5. A conforming system MUST record every rollback operation in a tamper-evident log per AG-006, including: the failed workflow step, the compensating actions executed, the pre-rollback state, the post-rollback state, the consistency predicate evaluation result, and the total time from failure detection to rollback completion.

4.6. A conforming system MUST define a maximum rollback time window for each workflow, after which the rollback is abandoned and the workflow is escalated to human oversight with a full state snapshot rather than attempting a potentially stale reversal.

4.7. A conforming system MUST handle irreversible steps by defining a forward-recovery path (completing the workflow through an alternative route) or escalating to human oversight with a documented state report, rather than silently failing or leaving the system in an inconsistent state.

4.8. A conforming system MUST ensure that rollback logic is tested independently of forward workflow logic, with dedicated test suites that verify rollback behaviour for each possible failure point in each workflow.

4.9. A conforming system SHOULD implement saga-pattern orchestration for workflows spanning multiple services or systems, where each service's compensating action is registered at execution time and invoked in reverse order upon failure.

4.10. A conforming system SHOULD implement rollback dry-run capability, where the system evaluates whether a rollback is feasible (all preconditions met, all compensating actions available) before executing the rollback, and escalates to human oversight if the dry-run identifies infeasible steps.

4.11. A conforming system SHOULD maintain a rollback dependency graph that tracks which workflow steps' rollback depends on the successful rollback of other steps, preventing out-of-order rollback execution.

4.12. A conforming system MAY implement partial rollback with checkpointing, where the system rolls back to the most recent consistent checkpoint rather than to the pre-workflow state, provided the checkpoint state satisfies the consistency predicate.
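One way to picture the rollback dependency graph in 4.11 is as a mapping from each compensating action to the compensating actions that must finish before it, resolved with a topological sort. The sketch below is only an illustration under that assumption; the step names are hypothetical labels loosely based on Scenario A, and `rollback_order` is not a function defined by this standard.

```python
from graphlib import CycleError, TopologicalSorter


def rollback_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which compensating actions may safely run.

    `depends_on` maps a compensating action to the compensating actions
    that must complete before it can start.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # A cycle means no valid rollback order exists; surface it loudly.
        raise ValueError(f"rollback dependency cycle: {exc.args[1]}") from exc


# Hypothetical compensations for the payment workflow in Scenario A:
# the sender cannot be re-credited until the FX conversion is unwound,
# which in turn needs the euros taken back out of the nostro account.
deps = {
    "recredit_sender": {"reverse_fx"},
    "reverse_fx": {"debit_nostro"},
    "debit_nostro": set(),
}
order = rollback_order(deps)
print(order)
```

Rejecting cyclic definitions at design time, rather than at rollback time, keeps an unresolvable ordering from being discovered in the middle of an incident.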

5. Rationale

Stateful rollback is among the most underspecified aspects of AI agent system design. Most agent frameworks focus on the forward execution path — what happens when things go right — and treat failure as an exceptional condition handled by generic error logging. But in production multi-step workflows, partial failure is not exceptional; it is routine. Network timeouts, API rate limits, insufficient liquidity, sensor errors, and concurrent state modifications all cause workflows to fail partway through execution. The question is not whether partial failures will occur but whether the system's response to partial failure is governed or ad hoc.

The consequences of ungoverned rollback are severe because multi-step workflows accumulate real-world side effects as they execute. Each completed step changes state — debits an account, moves a physical object, grants an approval, writes a record. When the workflow fails, those side effects persist unless they are explicitly reversed. The resulting inconsistency may be immediately visible (a debited account with no corresponding credit) or latent (a standing token approval that creates an exploit vector days later). Latent inconsistencies are particularly dangerous because they may not be detected until they are exploited.

The regulatory landscape treats state consistency as a control requirement. SOX Section 404 requires that financial state changes are controlled and auditable — a failed workflow that leaves orphaned debits violates this requirement. FCA SYSC 6.1.1R requires adequate systems and controls for financial operations — a cross-border payment workflow with no rollback logic does not meet this standard. The EU AI Act Article 9 requires risk management for high-risk AI systems, and partial-failure state inconsistency is a foreseeable risk that must be mitigated. DORA Article 11 requires operational resilience testing, which must include testing the system's response to partial workflow failure.

The engineering challenge of governed rollback is that compensating actions are not always the mirror image of forward actions. A financial transfer can be reversed with a counter-transfer, but an FX conversion reversed after a rate movement incurs real costs. A solder paste application cannot be physically reversed. A token approval can be revoked, but if the approval was exploited between grant and revocation, the revocation does not reverse the exploitation. Effective rollback governance requires understanding the semantics of each step — whether it is reversible, what preconditions the reversal requires, what the reversal costs, and what risks the reversal introduces. This understanding must be codified in the rollback definition, not left to runtime improvisation by the agent.

Without AG-384, organisations face a class of operational failure where every partial workflow failure is a unique, manually resolved incident. Operations teams spend hours reconstructing what happened, determining what state the system is in, and manually executing compensating actions. This is expensive, error-prone, and unscalable. Governed rollback transforms partial failure from an incident into a handled condition with defined, tested, auditable resolution.

6. Implementation Guidance

AG-384 establishes the principle that rollback is a first-class workflow concern, not an afterthought. Every multi-step workflow must be designed with failure as a primary consideration: for each step, the designer must specify what happens if the step fails, what happens if a subsequent step fails and this step must be reversed, and what the system state looks like after rollback completes. This information is codified in the rollback definition, which is a versioned, testable artefact parallel to the workflow definition itself.

Recommended patterns:

Saga pattern with compensating transactions. Implement each multi-step workflow as a saga: an ordered sequence of local transactions, each with a defined compensating transaction. When the workflow fails at step N, the saga orchestrator invokes compensating transactions for steps N-1 through 1 in reverse order. Each compensating transaction is specified in the rollback definition but registered with the orchestrator at execution time, allowing runtime-specific parameters (transaction IDs, timestamps, exchange rates) to be captured. The orchestrator tracks compensating transaction success/failure and escalates to human oversight if any compensating transaction fails.
Event-sourced state with temporal rollback. Maintain system state as an event log rather than mutable state. Rollback is implemented by computing the state at the point before the failed workflow began, then verifying that no intervening events from other workflows conflict with the rollback. This approach naturally supports idempotent rollback (replaying to a point in time produces the same state regardless of how many times it is replayed) and provides a complete audit trail of all state changes and reversals.
Checkpoint-based rollback with consistency predicates. Before each workflow step, capture a checkpoint of all state that the step will modify. Define a consistency predicate that must be true for the system to be in a valid state. After rollback, evaluate the consistency predicate against the restored state. If the predicate fails, escalate to human oversight with a state diff showing the inconsistency. This approach integrates with AG-380 (Checkpoint Garbage-Collection Governance) for checkpoint lifecycle management.
Two-phase rollback with dry-run verification. Before executing a rollback, perform a dry-run that verifies all preconditions for all compensating actions. Check that accounts have sufficient balance for counter-transfers, that physical actuators can reach the required positions, that API endpoints are available, and that no concurrent operations have modified the state that the rollback needs to reverse. Only execute the rollback if the dry-run succeeds. If the dry-run identifies infeasible steps, escalate to human oversight with a report of which steps can be reversed and which cannot.
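The saga pattern described in the list above can be sketched in a few lines. This is a minimal in-memory illustration, not a reference implementation: the `Saga` class, `RollbackFailed` exception, step names and transaction IDs are all hypothetical, and a production orchestrator would persist its compensation registry durably.

```python
from typing import Any, Callable


class RollbackFailed(Exception):
    """A compensating action failed; the workflow needs human oversight."""


class Saga:
    def __init__(self) -> None:
        self._compensations: list[tuple[str, Callable[[], None]]] = []

    def run_step(self, name: str, action: Callable[[], Any],
                 make_compensation: Callable[[Any], Callable[[], None]]) -> Any:
        result = action()
        # Register only after the step succeeded, so the compensation can
        # capture runtime values such as the transaction ID it must undo.
        self._compensations.append((name, make_compensation(result)))
        return result

    def rollback(self) -> list[str]:
        undone: list[str] = []
        for name, compensate in reversed(self._compensations):
            try:
                compensate()
            except Exception as exc:
                # Never swallow this: a failed rollback must be escalated.
                raise RollbackFailed(
                    f"compensation for {name!r} failed; already undone: {undone}"
                ) from exc
            undone.append(name)
        return undone


def submit_swift() -> str:
    raise TimeoutError("gateway timeout")  # simulated failure at a later step


actions: list[str] = []
saga = Saga()
try:
    saga.run_step("debit", lambda: "tx-1",
                  lambda tx: lambda: actions.append(f"refund {tx}"))
    saga.run_step("fx", lambda: "fx-7",
                  lambda tx: lambda: actions.append(f"unwind {tx}"))
    saga.run_step("swift", submit_swift, lambda tx: lambda: None)
except TimeoutError:
    undone = saga.rollback()
print(undone, actions)
```

The failed step itself registers no compensation, so only the steps that actually produced side effects are reversed, newest first.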

Anti-patterns to avoid:

Mirror-image rollback assumption. Assuming that rolling back step N is simply executing step N in reverse. Forward execution and reversal have different preconditions, different costs, and different risks. An FX conversion at 09:00 cannot be "reversed" at 14:00 at the same rate. A component placed with solder paste cannot be removed the same way it was placed. Each compensating action must be designed independently based on the actual state at rollback time, not assumed to be the inverse of the forward action.
Rollback without idempotency. If a rollback partially completes and is then retried (due to a timeout or monitoring trigger), non-idempotent compensating actions will execute twice. A refund compensating action that is not idempotent will issue a double refund. Compensating actions must check whether they have already been applied before executing, typically by checking for a unique idempotency key.
Ignoring rollback of preparatory steps. Workflows often include preparatory steps (granting approvals, acquiring locks, reserving resources) that are treated as "setup" rather than state changes. But these steps create real side effects: a token approval is a security-sensitive state change; an acquired lock blocks other operations; a reserved resource reduces available capacity. Rollback definitions must include compensating actions for all state-changing steps, including those classified as preparatory.
Unbounded rollback windows. Allowing rollback to be attempted hours or days after the original workflow failure. The longer the window between failure and rollback, the more likely that external state has changed (exchange rates moved, inventory consumed, physical positions altered by other operations) making the compensating actions invalid or harmful. Maximum rollback time windows must be defined and enforced.
Silent rollback failure. Catching exceptions in compensating actions and logging them without escalation. A failed rollback leaves the system in an inconsistent state that is at least as bad as the state after the original failure, and possibly worse if some compensating actions were applied and others were not, while the log entry gives the misleading impression that the failure was handled. Failed rollbacks must be treated as critical incidents requiring immediate human intervention.

Industry Considerations

Financial Services. Rollback semantics for financial workflows must account for settlement finality. Once a payment has settled through a clearing system (e.g., CHAPS, TARGET2, Fedwire), it cannot be "rolled back" — only reversed through a separate counter-payment that itself is subject to the recipient's agreement. Rollback definitions for financial workflows must distinguish between pre-settlement steps (reversible through cancellation) and post-settlement steps (reversible only through compensating transactions). FX conversions introduce rate risk during rollback; compensating actions should capture the original rate and calculate the rollback cost before execution. Integration with the firm's general ledger must ensure that rollback entries are properly accounted as corrections, not new transactions, to maintain SOX compliance.
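As a rough illustration of calculating the rollback cost before execution, the sketch below compares the amount originally sold with what the reversal would return at a quoted rate. The function name, the rate convention (EUR per GBP) and the figures are assumptions chosen for easy arithmetic; they are not taken from Scenario A.

```python
from decimal import Decimal


def fx_rollback_cost(gbp_sold: Decimal, eur_received: Decimal,
                     reversal_rate: Decimal) -> Decimal:
    """Estimated GBP loss if the EUR proceeds are converted back now.

    `reversal_rate` is the quoted EUR-per-GBP rate for buying GBP back,
    including spread. A positive result is a cost to the firm.
    """
    gbp_returned = eur_received / reversal_rate
    return (gbp_sold - gbp_returned).quantize(Decimal("0.01"))


# Illustrative figures only: sold 100 GBP for 120 EUR, and buying GBP
# back is now quoted at 1.25 EUR per GBP.
cost = fx_rollback_cost(Decimal("100"), Decimal("120"), Decimal("1.25"))
print(cost)  # 4.00
```

A governed rollback could compare this estimate against a pre-approved loss threshold and escalate instead of executing when the threshold is exceeded.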

Crypto/Web3. On-chain state changes are immutable by design — a token transfer cannot be reversed without the recipient's cooperation. Rollback semantics for on-chain workflows must be implemented as compensating transactions (counter-transfers, approval revocations) rather than state reversals. MEV risk during rollback transactions must be mitigated through private mempool submission (Flashbots Protect or equivalent). Smart contract approvals granted during workflow setup must be explicitly revoked during rollback, not left as standing approvals. Gas costs for rollback transactions must be budgeted as part of the workflow's cost model.

Safety-Critical / Robotics. Physical rollback creates unique hazards. Reversing a robotic motion requires verifying that the reverse path is collision-free, that tooling state permits the reversal (a gripper holding a component can reverse; a gripper that has released a component cannot "un-release"), and that the physical environment has not changed since the forward execution. Rollback definitions for physical operations must include spatial safety checks, and irreversible physical steps must be clearly classified as such with forward-recovery paths defined. Emergency stop must always take precedence over rollback execution.

Healthcare. Medication administration, diagnostic test ordering, and clinical note publication may be partially or fully irreversible. Rollback semantics must classify each step's reversibility based on clinical safety (a medication administered cannot be "un-administered"; a diagnostic test ordered but not yet performed can be cancelled). Post-rollback consistency checks must include patient safety verification, confirming that the rollback does not leave clinical records in a state that could cause treatment errors.

Maturity Model

Basic Implementation — The organisation has defined rollback semantics for its most critical workflows (top 20% by financial value or safety impact). Compensating actions are defined for reversible steps. Irreversible steps are identified and trigger human escalation on failure. Rollback operations are logged. Post-rollback consistency checks exist but may be manual (operations team verifies state). Rollback testing is included in workflow integration testing but is not a dedicated test suite.

Intermediate Implementation — All basic capabilities plus: rollback semantics are defined for all multi-step workflows, not just critical ones. Compensating actions are idempotent and include precondition verification. Post-rollback consistency predicates are automated and evaluated before normal operation resumes. Maximum rollback time windows are enforced. Rollback dry-run capability is implemented for workflows with complex compensating actions. Dedicated rollback test suites exist for each workflow, testing each possible failure point independently. Saga-pattern orchestration is used for cross-service workflows.

Advanced Implementation — All intermediate capabilities plus: event-sourced state provides temporal rollback with full audit trail. Rollback feasibility is evaluated in real time, with the system maintaining a continuous assessment of whether each in-progress workflow step could be rolled back given current state. Rollback cost estimation (FX spread, gas costs, operational overhead) is calculated before rollback execution and reported to governance. Cross-workflow rollback dependencies are tracked — if workflow A's state is consumed by workflow B, rolling back workflow A triggers a cascading assessment of workflow B's consistency. Independent adversarial testing has verified rollback behaviour under concurrent failure, partial compensating action failure, and state race conditions.

7. Evidence Requirements

Required artefacts:

Rollback definition artefacts. Versioned rollback definitions for each multi-step workflow, specifying for each step: the compensating action, preconditions, reversibility classification (reversible/partially reversible/irreversible), maximum rollback time window, and idempotency mechanism. Format: structured data (JSON, YAML, or workflow definition language) with version control history.
Rollback execution logs. Tamper-evident records of every rollback execution, including: triggering failure, compensating actions attempted, compensating action outcomes, pre-rollback state snapshot, post-rollback state snapshot, consistency predicate evaluation result, and total rollback duration. Minimum 12 months online retention.
Consistency predicate definitions. Formal definitions of the consistency predicates evaluated after each rollback, specifying the state invariants that must hold for the system to be considered consistent. Format: executable predicate definitions (code or configuration) with version history.
Rollback test results. Results from dedicated rollback testing for each workflow, showing test coverage of all possible failure points, compensating action behaviour, idempotency verification, and consistency predicate evaluation. Must include results from concurrent failure testing and partial compensating action failure testing.
Escalation records. Records of all rollback failures that were escalated to human oversight, including: the failed compensating action, the system state at escalation, the human resolution action taken, and the time from escalation to resolution.

Retention requirements:

Rollback definitions and execution logs: minimum 7 years for regulated financial services; minimum 5 years for safety-critical systems and healthcare; minimum 3 years otherwise.
Consistency predicate definitions: retained for the lifetime of the workflow they govern plus 3 years.

Access requirements:

Producible to regulators or auditors within 48 hours of request. Rollback execution logs must be queryable by workflow ID, time range, and outcome (success/failure/escalated). State snapshots must be reproducible from the retained evidence without requiring access to live systems.
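The three query dimensions named above (workflow ID, time range, outcome) could be exposed through a filter like the following. This is a sketch over an in-memory list; the `RollbackRecord` fields and the sample records are hypothetical, and a real deployment would query its log store instead.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class RollbackRecord:
    workflow_id: str
    completed_at: datetime
    outcome: str  # "success", "failure" or "escalated"


def query(records: Iterable[RollbackRecord],
          workflow_id: Optional[str] = None,
          start: Optional[datetime] = None,
          end: Optional[datetime] = None,
          outcome: Optional[str] = None) -> list[RollbackRecord]:
    """Filter rollback records; any criterion left as None is ignored."""
    return [
        r for r in records
        if (workflow_id is None or r.workflow_id == workflow_id)
        and (start is None or r.completed_at >= start)
        and (end is None or r.completed_at < end)  # end is exclusive
        and (outcome is None or r.outcome == outcome)
    ]


records = [
    RollbackRecord("wf-001", datetime(2026, 4, 1, 9, 0), "success"),
    RollbackRecord("wf-002", datetime(2026, 4, 1, 14, 30), "escalated"),
    RollbackRecord("wf-001", datetime(2026, 4, 2, 8, 15), "failure"),
]
escalated = query(records, outcome="escalated")
recent_wf1 = query(records, workflow_id="wf-001", start=datetime(2026, 4, 2))
```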

8. Test Specification

Testing AG-384 compliance requires simulating workflow failures at every step of representative workflows and verifying that rollback produces consistent post-rollback state. Testing must include concurrent failure scenarios and compensating action failures.

Test 8.1: Compensating Action Completeness

Stimulus: For each multi-step workflow, trigger a failure at each possible step (step 2 through step N). For each failure point, examine the rollback definition.
Expected behaviour: Every step preceding the failure point has a defined compensating action with documented preconditions and reversibility classification.
Pass criteria: No step lacks a defined compensating action or reversibility classification. Irreversible steps have a documented forward-recovery path or human escalation procedure.
Fail criteria: Any step lacks a compensating action definition, or an irreversible step has no forward-recovery or escalation path.
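The completeness check in Test 8.1 lends itself to automation against the rollback definition artefact. The following is a sketch under assumed names: `StepDefinition`, `Reversibility` and `completeness_gaps` are illustrative, and the example steps are loosely modelled on Scenario B.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    PARTIAL = "partially_reversible"
    IRREVERSIBLE = "irreversible"


@dataclass
class StepDefinition:
    name: str
    reversibility: Optional[Reversibility] = None
    compensating_action: Optional[str] = None
    preconditions: list[str] = field(default_factory=list)
    forward_recovery: Optional[str] = None
    escalation_procedure: Optional[str] = None


def completeness_gaps(steps: list[StepDefinition]) -> list[str]:
    """Return one message per step that would fail Test 8.1."""
    gaps = []
    for step in steps:
        if step.reversibility is None:
            gaps.append(f"{step.name}: no reversibility classification")
        elif step.reversibility is Reversibility.IRREVERSIBLE:
            if not (step.forward_recovery or step.escalation_procedure):
                gaps.append(f"{step.name}: irreversible step has no "
                            "forward-recovery or escalation path")
        elif not step.compensating_action:
            gaps.append(f"{step.name}: no compensating action defined")
    return gaps


steps = [
    StepDefinition("pick_component", Reversibility.REVERSIBLE,
                   compensating_action="return_component_to_tray",
                   preconditions=["gripper_holding_component"]),
    StepDefinition("apply_solder_paste", Reversibility.IRREVERSIBLE),
]
gaps = completeness_gaps(steps)
print(gaps)
```

Running a check like this in continuous integration would flag an incomplete rollback definition before the workflow is deployed.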

Test 8.2: Rollback Atomicity

Stimulus: Trigger a workflow failure that requires rolling back three or more steps. During rollback execution, inject a failure in one of the compensating actions (e.g., simulate a network timeout on the second compensating action).
Expected behaviour: The rollback is treated as a failure. The system does not leave partially executed compensating actions without escalation. The failed rollback triggers escalation to human oversight with a complete state report showing which compensating actions succeeded, which failed, and the current system state.
Pass criteria: No partial rollback completes silently. Every compensating action failure during rollback triggers escalation. The escalation report includes all required state information.
Fail criteria: A partial rollback completes without escalation, or the escalation report is missing state information needed for manual resolution.

Test 8.3: Post-Rollback Consistency Verification

Stimulus: Execute a multi-step workflow, trigger a failure, and allow the rollback to complete. After rollback, evaluate the consistency predicate against the system state.
Expected behaviour: The post-rollback state satisfies the consistency predicate. The system state matches either the pre-workflow state or a declared safe intermediate state. All side effects of the failed workflow have been reversed or accounted for.
Pass criteria: The consistency predicate evaluates to true. State comparison with the pre-workflow snapshot shows no unexplained differences. The system resumes normal operation only after consistency verification succeeds.
Fail criteria: The consistency predicate evaluates to false, state comparison shows unexplained differences, or normal operation resumes without consistency verification.
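A post-rollback gate for Test 8.3 might combine the consistency predicate with a state comparison, as in the sketch below. The state is modelled as a flat dictionary purely for illustration; `may_resume`, the field names and the example predicate are assumptions, not part of the standard.

```python
from typing import Any, Callable, Iterable

State = dict[str, Any]


def state_diff(before: State, after: State) -> dict[str, tuple[Any, Any]]:
    """Map each differing key to its (before, after) values."""
    keys = before.keys() | after.keys()
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


def may_resume(pre: State, post: State,
               predicate: Callable[[State], bool],
               safe_states: Iterable[State] = ()) -> tuple[bool, dict]:
    """Allow normal operation only if the predicate holds and the state
    matches the pre-workflow state or a declared safe intermediate state."""
    diff = state_diff(pre, post)
    matches = not diff or any(post == s for s in safe_states)
    return predicate(post) and matches, diff


def no_open_approvals(state: State) -> bool:
    return state["open_approvals"] == 0


pre = {"sender_gbp": 142000, "nostro_eur": 0, "open_approvals": 0}
leaky = {"sender_gbp": 142000, "nostro_eur": 0, "open_approvals": 1}

ok, diff = may_resume(pre, leaky, no_open_approvals)
print(ok, diff)  # False {'open_approvals': (0, 1)}
```

Returning the diff alongside the verdict gives the escalation report the state comparison that human reviewers need.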

Test 8.4: Rollback Idempotency

Stimulus: Trigger a workflow failure and initiate a rollback. Midway through the rollback, simulate a timeout that causes the rollback orchestrator to restart and re-execute the rollback from the beginning.
Expected behaviour: Compensating actions that have already been applied are detected as completed and not re-executed. The final post-rollback state is identical regardless of whether the rollback executed once or was retried.
Pass criteria: No compensating action executes twice (no double refunds, double reversals, or duplicate counter-transactions). The idempotency mechanism correctly identifies previously completed compensating actions.
Fail criteria: Any compensating action executes more than once, or the retry produces a different post-rollback state than a single execution.
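The retry behaviour exercised by Test 8.4 can be sketched with an idempotency key per compensating action. The classes and key format here are hypothetical, and the in-memory dictionary stands in for a durable store in which the key check and the side effect would need to be committed together.

```python
class CompensationLedger:
    """Toy ledger that applies each refund at most once per key."""

    def __init__(self) -> None:
        self.sender_balance = 0
        self.applied: dict[str, int] = {}  # idempotency key -> amount

    def refund(self, key: str, amount: int) -> bool:
        if key in self.applied:
            return False  # already applied: skip, do not refund twice
        self.sender_balance += amount
        self.applied[key] = amount
        return True


def run_rollback(ledger: CompensationLedger, workflow_id: str,
                 compensations: list[tuple[str, int]]) -> list[str]:
    """Run compensations and return the steps actually executed this time."""
    executed = []
    for step, amount in compensations:
        # Deterministic key: the same workflow and step always map to it.
        key = f"{workflow_id}:{step}:compensate"
        if ledger.refund(key, amount):
            executed.append(step)
    return executed


ledger = CompensationLedger()
plan = [("debit_sender", 142000)]
first = run_rollback(ledger, "wf-42", plan)
retry = run_rollback(ledger, "wf-42", plan)  # orchestrator restarted
print(first, retry, ledger.sender_balance)  # ['debit_sender'] [] 142000
```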

Test 8.5: Tamper-Evident Rollback Logging

Stimulus: Execute five workflows with triggered failures and rollbacks. After all rollbacks complete, audit the rollback execution log.
Expected behaviour: Each rollback is fully logged with all required fields: triggering failure, compensating actions executed, outcomes, pre-rollback state, post-rollback state, consistency predicate result, and total rollback duration. Logs are tamper-evident per AG-006.
Pass criteria: All five rollbacks have complete log entries with all required fields. Log integrity verification (tamper-evidence check) passes. Logs are queryable by workflow ID, time range, and outcome.
Fail criteria: Any rollback lacks a complete log entry, any required field is missing, or tamper-evidence verification fails.
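One common way to make a log tamper-evident is a hash chain, where each entry commits to the hash of the previous one. The sketch below assumes that approach for illustration; AG-006 may specify a different mechanism, and the field names are hypothetical renderings of the required fields listed in 4.5.

```python
import hashlib
import json

GENESIS = "0" * 64
REQUIRED_FIELDS = {
    "failed_step", "compensating_actions", "pre_rollback_state",
    "post_rollback_state", "predicate_result", "duration_seconds",
}


def _digest(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], record: dict) -> None:
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash,
                "hash": _digest(prev_hash, record)})


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if _digest(prev_hash, entry["record"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, {
    "failed_step": "submit_swift",
    "compensating_actions": ["debit_nostro", "reverse_fx", "recredit_sender"],
    "pre_rollback_state": {"nostro_eur": 161470},
    "post_rollback_state": {"nostro_eur": 0},
    "predicate_result": True,
    "duration_seconds": 42,
})
intact = verify_chain(log)
```

A chain like this only detects tampering if the latest hash is also anchored somewhere the log writer cannot modify, which is outside the scope of the sketch.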

Test 8.6: Maximum Rollback Time Window Enforcement

Stimulus: Trigger a workflow failure and delay the rollback initiation until after the configured maximum rollback time window has expired. Then attempt to initiate the rollback.
Expected behaviour: The system refuses to execute the rollback because the time window has expired. Instead, the system escalates to human oversight with a full state snapshot and a report explaining that the rollback window has closed.
Pass criteria: The rollback does not execute after the time window expires. Escalation occurs with a complete state report. The report includes the reason for window expiry and the current system state.
Fail criteria: The rollback executes after the time window has expired, or the system neither rolls back nor escalates, leaving the inconsistent state unresolved.
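The window check in Test 8.6 reduces to a comparison made before any compensating action runs. A minimal sketch, assuming timezone-aware timestamps and a hypothetical 15-minute window:

```python
from datetime import datetime, timedelta, timezone


def rollback_decision(failed_at: datetime, now: datetime,
                      max_window: timedelta) -> dict[str, str]:
    """Decide whether a rollback may still start or must be escalated."""
    elapsed = now - failed_at
    if elapsed > max_window:
        return {
            "action": "escalate",
            "reason": f"rollback window {max_window} expired "
                      f"(elapsed {elapsed})",
        }
    return {"action": "rollback", "reason": "within rollback window"}


failed_at = datetime(2026, 4, 1, 9, 0, tzinfo=timezone.utc)
window = timedelta(minutes=15)  # hypothetical per-workflow setting

prompt = rollback_decision(failed_at, failed_at + timedelta(minutes=10), window)
stale = rollback_decision(failed_at, failed_at + timedelta(hours=18), window)
print(prompt["action"], stale["action"])  # rollback escalate
```

The escalation branch would also attach the full state snapshot required by 4.6; that part is omitted here.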

Test 8.7: Irreversible Step Handling

Stimulus: Execute a workflow that includes at least one irreversible step (e.g., a settled payment, an administered medication, a physical material transformation). Trigger a failure at a step after the irreversible step.
Expected behaviour: The rollback logic correctly identifies the irreversible step, does not attempt to reverse it, and instead invokes the defined forward-recovery path or escalates to human oversight with a report identifying the irreversible step and its current state.
Pass criteria: No attempt is made to reverse the irreversible step. The forward-recovery path executes successfully or human escalation occurs with complete state information. The post-resolution state satisfies the consistency predicate.
Fail criteria: The system attempts to reverse an irreversible step, or the system neither forward-recovers nor escalates, leaving the workflow in an unresolved state.
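The decision logic that Test 8.7 exercises can be sketched as a planner that refuses to schedule a reversal once any completed step is irreversible. The `Step` type, the planner and the recovery route name are hypothetical; the example steps echo Scenario B.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    name: str
    reversible: bool
    forward_recovery: Optional[str] = None


def plan_recovery(completed: list[Step]) -> tuple[str, list[str]]:
    """Choose rollback, forward recovery or escalation for a failed workflow."""
    irreversible = [s for s in completed if not s.reversible]
    if not irreversible:
        # Everything can be undone: compensate newest step first.
        return "rollback", [s.name for s in reversed(completed)]
    latest = irreversible[-1]
    if latest.forward_recovery:
        return "forward_recovery", [latest.forward_recovery]
    # No safe automated path: report the irreversible steps to a human.
    return "escalate", [s.name for s in irreversible]


completed = [
    Step("pick_component", reversible=True),
    Step("apply_solder_paste", reversible=False),
    Step("place_component", reversible=True),
]
decision = plan_recovery(completed)
print(decision)  # ('escalate', ['apply_solder_paste'])
```

In this sketch no reversal is attempted at all once an irreversible step has run; a real system might still compensate the later reversible steps if the rollback definition declares that safe.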

Test 8.8: Rollback Test Independence

Stimulus: Verify that dedicated rollback test suites exist for each multi-step workflow. Execute the rollback test suite for a representative workflow and confirm it covers all possible failure points.
Expected behaviour: Each workflow has a dedicated rollback test suite that is maintained separately from the forward execution tests. The test suite includes tests for failure at each step, compensating action verification, idempotency verification, and consistency predicate evaluation.
Pass criteria: Rollback test suites exist for all workflows. Test coverage includes every possible failure point. Tests are executable independently of forward workflow tests. Test results are retained as evidence artefacts.
Fail criteria: Any workflow lacks a dedicated rollback test suite, or any possible failure point is not covered by the test suite.
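
A dedicated rollback suite of the kind Test 8.8 looks for can be driven by a small harness that injects a failure at every step index, unwinds twice to check idempotency, and then evaluates the consistency predicate. The following is a minimal sketch under those assumptions. The in-memory state, the `make_step` helper, and the function names are hypothetical.

```python
def make_step(name, account, delta):
    """Build an (apply, undo) pair over a dict-based state."""
    def apply(state):
        state[account] += delta
        state["applied"].add(name)

    def undo(state):
        if name in state["applied"]:  # guard makes the undo idempotent
            state[account] -= delta
            state["applied"].discard(name)

    return apply, undo


def run_with_failure(steps, fail_at, state):
    """Apply steps until index fail_at, which is treated as the failing step."""
    undos = []
    for index, (apply, undo) in enumerate(steps):
        if index == fail_at:
            break
        apply(state)
        undos.append(undo)
    return undos


def check_all_failure_points(steps, make_state, predicate):
    """Return {failure index: predicate result} for every possible failure point."""
    results = {}
    for fail_at in range(len(steps)):
        state = make_state()
        undos = run_with_failure(steps, fail_at, state)
        for _ in range(2):  # the second pass verifies idempotency
            for undo in reversed(undos):
                undo(state)
        results[fail_at] = predicate(state)
    return results
```

Because the harness only needs the step definitions, a fresh-state factory, and the predicate, it can run independently of the forward execution tests.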

Conformance Scoring

Score 0: No rollback semantics are defined — workflow failures leave the system in an inconsistent state that is resolved manually on a case-by-case basis.
Score 1: Rollback semantics are defined for critical workflows, but compensating actions may not be idempotent, post-rollback consistency verification is manual, and rollback testing is ad hoc.
Score 2: Rollback semantics are defined for all multi-step workflows with idempotent compensating actions, automated consistency predicate verification, maximum rollback time windows, tamper-evident logging, irreversible step handling, and dedicated rollback test suites covering all failure points.
Score 3: All Score 2 capabilities verified by independent adversarial testing including concurrent failure, compensating action failure, state race conditions, and rollback window expiry. Event-sourced state or equivalent provides temporal rollback with full audit trail. Rollback cost estimation and cross-workflow dependency tracking are operational.

9. Regulatory Mapping

Regulation	Provision	Relationship Type
EU AI Act	Article 9 (Risk Management System)	Supports compliance
EU AI Act	Article 12 (Record-Keeping)	Direct requirement
SOX	Section 404 (Internal Controls Over Financial Reporting)	Direct requirement
FCA SYSC	6.1.1R (Systems and Controls)	Direct requirement
NIST AI RMF	GOVERN 1.1, MANAGE 2.2	Supports compliance
ISO 42001	Clause 6.1 (Actions to Address Risks), Clause 8.4 (AI System Operation)	Supports compliance
DORA	Article 11 (Digital Operational Resilience Testing)	Direct requirement

EU AI Act — Article 9 (Risk Management System)

Article 9 requires that risk management systems for high-risk AI systems identify and mitigate foreseeable risks. Partial workflow failure in multi-step AI agent operations is a foreseeable risk with documented precedent across financial services, manufacturing, and digital asset management. The EU AI Act requires that risk mitigation measures be "appropriate and targeted" — ad hoc manual resolution of state inconsistencies does not meet this standard when structured rollback semantics are technically feasible. AG-384 supports Article 9 by establishing a systematic approach to managing partial failure risk through defined, tested, and auditable rollback mechanisms.

EU AI Act — Article 12 (Record-Keeping)

Article 12 requires that high-risk AI systems include logging capabilities that enable the tracing of the system's operation throughout its lifecycle. Rollback operations are a critical part of the system's operation — they represent the system's response to failure conditions. AG-384's requirement for tamper-evident rollback logging (requirement 4.5) directly supports Article 12 compliance by ensuring that every rollback operation is fully logged with pre-rollback state, post-rollback state, compensating actions taken, and consistency verification results. Without this logging, the system's behaviour during failure conditions is opaque.

SOX — Section 404 (Internal Controls Over Financial Reporting)

Section 404 requires effective internal controls over financial reporting, including controls over the completeness and accuracy of financial transactions. A multi-step financial workflow that fails partway through and leaves orphaned debits, unreconciled nostro positions, or unmatched settlement entries directly undermines the completeness and accuracy of financial records. SOX auditors will ask: "If this workflow fails at step 3 of 5, what happens to the state changes from steps 1 and 2?" If the answer is "it depends on who notices and when," the control is inadequate. AG-384 provides the structural control: defined compensating actions, automated consistency verification, and auditable rollback logs that demonstrate every partial failure was resolved through a governed process.
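
The auditor's question about failure at step 3 of 5 can be answered structurally with a compensation-based executor. The sketch below is one possible shape, assuming each step is supplied as a (name, action, compensate) triple and that the consistency predicate is a function of the state. The `run_saga` name and the result fields are hypothetical.

```python
def run_saga(steps, state, predicate):
    """Execute steps in order; on failure, compensate completed steps in reverse.

    Returns a summary suitable for recording in a rollback log.
    """
    completed = []
    try:
        for name, action, compensate in steps:
            action(state)
            completed.append((name, compensate))
    except Exception as exc:
        compensated = []
        for name, compensate in reversed(completed):
            compensate(state)
            compensated.append(name)
        return {
            "status": "rolled_back",
            "error": str(exc),
            "compensated": compensated,
            "consistent": predicate(state),
        }
    return {"status": "committed", "compensated": [], "consistent": predicate(state)}


def adjust(name, delta):
    """Hypothetical step that changes a balance and can undo the change."""
    def action(state):
        state["balance"] += delta

    def compensate(state):
        state["balance"] -= delta

    return name, action, compensate


def failing(name):
    """Hypothetical step that always fails before changing any state."""
    def action(state):
        raise RuntimeError(f"{name} timed out")

    return name, action, lambda state: None
```

With five steps where the third is built by `failing`, the executor compensates steps 2 and 1 in that order, never runs steps 4 and 5, and reports whether the predicate holds afterwards. The sketch does not handle a compensating action that itself fails. A production executor would need a defined path for that case.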

FCA SYSC — 6.1.1R (Systems and Controls)

SYSC 6.1.1R requires adequate systems and controls sufficient to ensure compliance with applicable obligations. For firms operating AI agents in financial services, this includes ensuring that partial workflow failures do not create state inconsistencies that affect client assets, regulatory reporting, or market integrity. The FCA's supervisory expectations, reinforced through Dear CEO letters and thematic reviews, are that firms can demonstrate how their automated systems recover from failure conditions without creating operational risk. AG-384 directly implements this requirement for multi-step AI agent workflows by mandating defined rollback semantics, bounded rollback windows, and post-rollback consistency verification.

NIST AI RMF — GOVERN 1.1, MANAGE 2.2

GOVERN 1.1 addresses policies and procedures for AI risk management. MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-384 supports GOVERN 1.1 by establishing rollback governance policies for AI agent operations and supports MANAGE 2.2 by implementing rollback as an enforceable control that mitigates the risk of state inconsistency from partial workflow failure.

ISO 42001 — Clause 6.1, Clause 8.4

Clause 6.1 requires actions to address risks within the AI management system. Clause 8.4 addresses operational requirements for AI systems. State inconsistency from partial workflow failure is an operational risk that must be addressed through defined rollback procedures. AG-384 implements the risk treatment for partial failure state inconsistency, supporting both the risk identification requirement of Clause 6.1 and the operational control requirement of Clause 8.4.

DORA — Article 11 (Digital Operational Resilience Testing)

Article 11 requires financial entities to conduct digital operational resilience testing, including testing of ICT systems' response to failure conditions. Partial workflow failure and rollback execution fall directly within the scope of DORA's resilience testing requirements. The test specification in Section 8 of this protocol provides a structured programme designed to support DORA's testing expectations for failure recovery. A financial entity that cannot demonstrate governed rollback behaviour for its AI agent workflows risks a DORA finding for inadequate operational resilience. DORA's requirement for "lessons learned" from testing aligns with AG-384's requirement for rollback execution logs that enable post-incident analysis.

10. Failure Severity

Field	Value
Severity Rating	Critical
Blast Radius	Cross-system — state inconsistency from ungoverned rollback propagates to every downstream system that consumes the inconsistent state, including financial ledgers, regulatory reports, physical control systems, and client-facing services

Consequence chain: Ungoverned rollback begins as a single partial workflow failure but propagates through state dependencies. The immediate technical consequence is state inconsistency: an account debited without a corresponding credit, a physical component moved without proper placement, an approval granted without subsequent revocation. The inconsistency becomes a downstream data quality issue when other systems consume the corrupted state: a ledger reports an incorrect balance, a regulatory report includes an orphaned transaction, a robot's world model diverges from physical reality.

The operational impact compounds over time: each new operation that touches the inconsistent state extends the blast radius. In financial services, orphaned debits of £142,000 (Scenario A) grow into reconciliation breaks that affect end-of-day reporting, client statements, and regulatory returns. In safety-critical systems, divergence between the control system's state model and physical reality ($40,400 in Scenario B) creates hazards that escalate with each subsequent operation. In crypto/DeFi, unrevoked approvals ($407,000 in Scenario C) create persistent exploit vectors that can be triggered at any time.

The business consequence includes regulatory enforcement (FCA fines for inadequate controls, SOX material weaknesses for financial state inconsistency), direct financial loss from unrecoverable state (FX slippage, MEV extraction, double-payments), safety incidents with personal injury liability, and reputational damage that erodes client and user trust. Without AG-384, every partial workflow failure is a potential cascading incident whose blast radius is limited only by the number of downstream systems that consume the inconsistent state.
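
The blast radius described in the consequence chain can be estimated by walking the graph of systems that consume state from one another. The following is a minimal sketch using breadth-first traversal. The `consumers` mapping and the system names are hypothetical and would in practice come from an organisation's own dependency inventory.

```python
from collections import deque


def blast_radius(consumers, origin):
    """Return every system reachable downstream of the origin system.

    consumers maps a system name to the systems that read its state.
    """
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for downstream in consumers.get(node, ()):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen - {origin}


# Hypothetical dependency inventory for a payments workflow.
consumers = {
    "payments": ["ledger", "client_statements"],
    "ledger": ["regulatory_returns", "client_statements"],
    "regulatory_returns": [],
}
affected = blast_radius(consumers, "payments")
```

The size of the returned set gives a rough upper bound on the systems that may need checking after an ungoverned partial failure in the origin system.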

Cite this protocol

AgentGoverning. (2026). AG-384: Stateful Rollback Semantics Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-384
