AG-384

Stateful Rollback Semantics Governance

Runtime Execution, Workflow & State · AGS v2.1 · April 2026
EU AI Act · SOX · FCA · NIST · ISO 42001

2. Summary

Stateful Rollback Semantics Governance defines how state is unwound when multi-step AI agent workflows fail partway through execution. Without governed rollback semantics, a workflow that completes steps 1 through 5 but fails at step 6 can leave the system in an inconsistent state — funds transferred but not recorded, database rows inserted but not indexed, physical actuators moved but not returned to safe positions. This dimension mandates that every multi-step workflow defines explicit rollback semantics for each step, that rollback execution is atomic and idempotent, and that the post-rollback state is verified against a consistency predicate before the system resumes normal operation.

3. Example

Scenario A — Partial Payment Processing Leaves Orphaned Funds: A financial-value agent executes a multi-step cross-border payment workflow: (1) debit sender account £142,000, (2) convert GBP to EUR at spot rate via FX provider, (3) credit intermediary nostro account €161,470, (4) submit SWIFT MT103 to beneficiary bank, (5) update internal ledger, (6) send confirmation to sender. The workflow fails at step 4 — the SWIFT gateway returns a network timeout. The agent's error handler logs the failure and halts. However, steps 1-3 have already executed: £142,000 has been debited, the FX conversion has settled, and €161,470 sits in the nostro account. No rollback executes because the workflow has no defined reversal logic. The operations team discovers the orphaned funds 18 hours later during daily reconciliation. By then, the GBP/EUR rate has moved adversely; reversing the FX conversion costs £3,800 in spread and slippage. The sender's account shows the debit but has received no confirmation. They contact their account manager, who has no visibility into the failed workflow. The total cost: £3,800 FX loss, 14 hours of operations team time (£2,100), customer relationship damage, and a regulatory finding for inadequate transaction controls under FCA SYSC 6.1.1R resulting in a £180,000 fine.

What went wrong: The workflow had no defined rollback semantics for partial failure. Steps 1-3 executed with real-world side effects (account debit, FX settlement, nostro credit) but no compensating transactions were defined for the failure of step 4. The error handler treated the failure as a terminal event rather than triggering a governed reversal sequence. No consistency check verified post-failure state. Consequence: £185,900 in combined direct costs (FX loss, operations time, regulatory fine), customer relationship damage, and mandatory remediation programme.

Scenario B — Robotic Assembly Rollback Causes Collision: An embodied agent controls a six-axis robotic arm performing a multi-step PCB assembly sequence: (1) pick component from tray A, (2) apply solder paste to board, (3) place component on board, (4) move to inspection position, (5) trigger optical inspection. The workflow fails at step 5 — the inspection camera returns an error. The agent initiates a "rollback" by reversing the step sequence: it attempts to reverse step 4 (move back from inspection position), then step 3 (pick component off board). However, the reversal of step 3 requires gripping a component that now has solder paste on its underside. The gripper cannot achieve a secure hold on the paste-covered surface. The component slips during the pick attempt, falls onto an adjacent board, and damages three completed assemblies. The physical reversal of step 2 (remove solder paste) is impossible — solder paste cannot be "un-applied." Total cost: $12,400 in damaged assemblies, 4 hours of production line downtime ($28,000 in lost throughput), and a near-miss safety report filed because a human operator was within the arm's reach envelope during the failed rollback.

What went wrong: The rollback logic assumed that every step was reversible and that reversal was the mirror image of the forward step. Step 2 (apply solder paste) is physically irreversible. Step 3's reversal (pick component off board) had different preconditions than the original step 3 (place component on board) because the solder paste changed the physical properties. No rollback feasibility check verified whether reversal was physically possible at each step. No safe-state definition existed for partial rollback scenarios. Consequence: $40,400 in combined costs, near-miss safety incident, production line suspension pending rollback logic review.

Scenario C — DeFi Swap Rollback Creates Arbitrage Vulnerability: A crypto/Web3 agent executes a multi-step token swap: (1) approve token A spending on DEX router, (2) swap token A for token B on pool 1, (3) swap token B for token C on pool 2, (4) deposit token C into yield vault. The workflow fails at step 3 — pool 2 has insufficient liquidity. The agent's rollback logic reverses step 2 by executing a reverse swap (token B back to token A on pool 1). However, between the original swap and the reversal, an MEV bot detects the pending reverse swap in the mempool and front-runs it, moving the price. The reverse swap executes at a 4.2% worse rate, costing the protocol $67,000 on a $1.6 million position. Additionally, the token A spending approval from step 1 is not revoked during rollback, leaving a standing approval that an attacker later exploits to drain $340,000 from the protocol wallet. Total loss: $407,000.

What went wrong: The rollback logic treated the reverse swap as a simple mirror of the forward swap, ignoring that market conditions change between execution and rollback. The rollback did not account for MEV extraction on the reversal transaction. Step 1's side effect (token approval) was not included in the rollback scope because it was treated as a "preparatory" step rather than a state-changing step with security implications. No post-rollback consistency check verified that all approvals were revoked and all balances matched pre-workflow state. Consequence: $407,000 in combined losses, protocol security audit mandated by governance token holders, 30% decline in total value locked as users lose confidence.

4. Requirement Statement

Scope: This dimension applies to all AI agent workflows that modify state across two or more steps where the failure of any step can leave the system in an inconsistent state. "State" includes but is not limited to: financial account balances, database records, file system contents, API-mediated external system state, physical actuator positions, smart contract storage, approval grants, and session or authentication tokens. A "multi-step workflow" is any sequence of operations where later steps depend on the side effects of earlier steps. Single-step operations that are inherently atomic (a single database transaction, a single API call with built-in rollback) are out of scope, though organisations should verify that operations assumed to be single-step are genuinely atomic and do not have hidden multi-step semantics. The scope extends to rollback operations themselves: a rollback that fails partway through creates a second-order inconsistency that must also be governed.

4.1. A conforming system MUST define explicit rollback semantics for every step of every multi-step workflow, specifying for each step: (a) the compensating action that reverses the step's side effects, (b) the preconditions required for the compensating action to succeed, and (c) whether the step is reversible, partially reversible, or irreversible.

4.2. A conforming system MUST execute rollback operations atomically — either all compensating actions within a rollback scope succeed, or the rollback itself is treated as a failure requiring escalation to human oversight.

4.3. A conforming system MUST verify post-rollback state against a defined consistency predicate before resuming normal operation, confirming that the system state matches either the pre-workflow state or a declared safe intermediate state.

4.4. A conforming system MUST implement rollback operations idempotently, so that re-executing a rollback that has already partially completed does not create duplicate side effects (double reversals, double refunds, or repeated compensating actions).

4.5. A conforming system MUST record every rollback operation in a tamper-evident log per AG-006, including: the failed workflow step, the compensating actions executed, the pre-rollback state, the post-rollback state, the consistency predicate evaluation result, and the total time from failure detection to rollback completion.

4.6. A conforming system MUST define a maximum rollback time window for each workflow, after which the rollback is abandoned and the workflow is escalated to human oversight with a full state snapshot rather than attempting a potentially stale reversal.

4.7. A conforming system MUST handle irreversible steps by defining a forward-recovery path (completing the workflow through an alternative route) or escalating to human oversight with a documented state report, rather than silently failing or leaving the system in an inconsistent state.

4.8. A conforming system MUST ensure that rollback logic is tested independently of forward workflow logic, with dedicated test suites that verify rollback behaviour for each possible failure point in each workflow.

4.9. A conforming system SHOULD implement saga-pattern orchestration for workflows spanning multiple services or systems, where each service's compensating action is registered at execution time and invoked in reverse order upon failure.

4.10. A conforming system SHOULD implement rollback dry-run capability, where the system evaluates whether a rollback is feasible (all preconditions met, all compensating actions available) before executing the rollback, and escalates to human oversight if the dry-run identifies infeasible steps.

4.11. A conforming system SHOULD maintain a rollback dependency graph that records, for each workflow step, which other steps must be rolled back successfully before that step's own rollback can run, so that rollback never executes out of order.

4.12. A conforming system MAY implement partial rollback with checkpointing, where the system rolls back to the most recent consistent checkpoint rather than to the pre-workflow state, provided the checkpoint state satisfies the consistency predicate.
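
Requirements 4.1, 4.2, 4.7, and 4.9 above can be illustrated with a minimal saga-style runner. This is a sketch under stated assumptions, not a reference implementation: `Step`, `run_workflow`, and `EscalationRequired` are hypothetical names, state is a plain in-memory dictionary, and "escalation to human oversight" is modelled simply as raising an exception.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class EscalationRequired(Exception):
    """Rollback could not complete; hand over to human oversight."""


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]
    # None marks a step with no compensating action (treated as irreversible).
    compensate: Optional[Callable[[dict], None]] = None


def run_workflow(steps: list, state: dict) -> str:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action(state)
        except Exception as failure:
            # Refuse to start a rollback that cannot finish (4.7).
            irreversible = [s.name for s in completed if s.compensate is None]
            if irreversible:
                raise EscalationRequired(f"irreversible: {irreversible}") from failure
            for done in reversed(completed):
                try:
                    done.compensate(state)
                except Exception as comp_failure:
                    # A failed compensation fails the whole rollback (4.2).
                    raise EscalationRequired(
                        f"compensation for {done.name} failed") from comp_failure
            return "rolled_back"
        completed.append(step)
    return "completed"


# Hypothetical three-step payment; the third step times out.
def debit(s): s["balance"] -= 100
def refund(s): s["balance"] += 100
def hold(s): s["held"] = 100
def release(s): s["held"] = 0
def submit(s): raise TimeoutError("gateway timeout")

state = {"balance": 500, "held": 0}
result = run_workflow(
    [Step("debit", debit, refund), Step("hold", hold, release), Step("submit", submit)],
    state,
)
print(result, state)
```

In this sketch the compensations are attached to each step up front and invoked newest-first. A production orchestrator would also need durable storage of the completed-step list so that a crash during rollback can be resumed.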

5. Rationale

Stateful rollback is among the most underspecified aspects of AI agent system design. Most agent frameworks focus on the forward execution path — what happens when things go right — and treat failure as an exceptional condition handled by generic error logging. But in production multi-step workflows, partial failure is not exceptional; it is routine. Network timeouts, API rate limits, insufficient liquidity, sensor errors, and concurrent state modifications all cause workflows to fail partway through execution. The question is not whether partial failures will occur but whether the system's response to partial failure is governed or ad hoc.

The consequences of ungoverned rollback are severe because multi-step workflows accumulate real-world side effects as they execute. Each completed step changes state — debits an account, moves a physical object, grants an approval, writes a record. When the workflow fails, those side effects persist unless they are explicitly reversed. The resulting inconsistency may be immediately visible (a debited account with no corresponding credit) or latent (a standing token approval that creates an exploit vector days later). Latent inconsistencies are particularly dangerous because they may not be detected until they are exploited.

The regulatory landscape treats state consistency as a control requirement. SOX Section 404 requires that financial state changes are controlled and auditable — a failed workflow that leaves orphaned debits violates this requirement. FCA SYSC 6.1.1R requires adequate systems and controls for financial operations — a cross-border payment workflow with no rollback logic does not meet this standard. The EU AI Act Article 9 requires risk management for high-risk AI systems, and partial-failure state inconsistency is a foreseeable risk that must be mitigated. DORA Article 11 requires operational resilience testing, which must include testing the system's response to partial workflow failure.

The engineering challenge of governed rollback is that compensating actions are not always the mirror image of forward actions. A financial transfer can be reversed with a counter-transfer, but an FX conversion reversed after a rate movement incurs real costs. A solder paste application cannot be physically reversed. A token approval can be revoked, but if the approval was exploited between grant and revocation, the revocation does not reverse the exploitation. Effective rollback governance requires understanding the semantics of each step — whether it is reversible, what preconditions the reversal requires, what the reversal costs, and what risks the reversal introduces. This understanding must be codified in the rollback definition, not left to runtime improvisation by the agent.
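
One way to codify these per-step semantics is as data that can be validated before a workflow is ever deployed. The sketch below is illustrative only: `RollbackSpec`, `Reversibility`, `validate`, and the step names (borrowed loosely from Scenario B) are assumptions, not part of any existing framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    PARTIALLY_REVERSIBLE = "partially_reversible"
    IRREVERSIBLE = "irreversible"


@dataclass(frozen=True)
class RollbackSpec:
    step: str
    reversibility: Reversibility
    compensating_action: Optional[str] = None   # name of the reversal routine
    preconditions: Tuple[str, ...] = ()         # must hold before reversing
    on_irreversible: Optional[str] = None       # "forward_recovery" or "escalate"


def validate(workflow_steps, specs):
    """Return a list of problems; an empty list means the definition is complete."""
    by_step = {spec.step: spec for spec in specs}
    problems = []
    for name in workflow_steps:
        spec = by_step.get(name)
        if spec is None:
            problems.append(f"{name}: no rollback spec")
        elif spec.reversibility is Reversibility.IRREVERSIBLE:
            if spec.on_irreversible not in ("forward_recovery", "escalate"):
                problems.append(f"{name}: irreversible without recovery path")
        elif spec.compensating_action is None:
            problems.append(f"{name}: no compensating action")
    return problems


steps = ["pick_component", "apply_solder_paste", "place_component"]
specs = [
    RollbackSpec("pick_component", Reversibility.REVERSIBLE,
                 compensating_action="return_to_tray",
                 preconditions=("gripper_holding_component",)),
    RollbackSpec("apply_solder_paste", Reversibility.IRREVERSIBLE),
]
problems = validate(steps, specs)
print(problems)
```

Here the validator flags the solder paste step (irreversible, but with no recovery path declared) and the placement step (no spec at all), which is the kind of gap that went unnoticed in Scenario B.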

Without AG-384, organisations face a class of operational failure where every partial workflow failure is a unique, manually resolved incident. Operations teams spend hours reconstructing what happened, determining what state the system is in, and manually executing compensating actions. This is expensive, error-prone, and unscalable. Governed rollback transforms partial failure from an incident into a handled condition with defined, tested, auditable resolution.

6. Implementation Guidance

AG-384 establishes the principle that rollback is a first-class workflow concern, not an afterthought. Every multi-step workflow must be designed with failure as a primary consideration: for each step, the designer must specify what happens if the step fails, what happens if a subsequent step fails and this step must be reversed, and what the system state looks like after rollback completes. This information is codified in the rollback definition, which is a versioned, testable artefact parallel to the workflow definition itself.
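
The question of "what the system state looks like after rollback completes" can be made testable with a consistency predicate. The following is a minimal sketch, assuming state fits in a dictionary and that exact equality is the right comparison; real predicates would often need tolerances (for fees or timestamps). The function names and the sample state are hypothetical.

```python
import copy


def take_snapshot(state: dict) -> dict:
    """Capture the pre-workflow state before the first step runs."""
    return copy.deepcopy(state)


def consistency_predicate(current, pre_workflow, safe_states=()):
    """True if state matches the pre-workflow state or a declared safe state."""
    return current == pre_workflow or any(current == s for s in safe_states)


def after_rollback(current, pre_workflow, safe_states=()):
    """Decide whether normal operation may resume."""
    if consistency_predicate(current, pre_workflow, safe_states):
        return "resume"
    return "escalate"


pre = take_snapshot({"token_a": 1000, "approvals": []})
clean = {"token_a": 1000, "approvals": []}
leaky = {"token_a": 1000, "approvals": ["dex_router"]}   # approval never revoked
print(after_rollback(clean, pre), after_rollback(leaky, pre))
```

The second case mirrors Scenario C: balances look restored, but a standing approval means the state does not match the snapshot, so the system escalates instead of resuming.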

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. Rollback semantics for financial workflows must account for settlement finality. Once a payment has settled through a clearing system (e.g., CHAPS, TARGET2, Fedwire), it cannot be "rolled back" — only reversed through a separate counter-payment that itself is subject to the recipient's agreement. Rollback definitions for financial workflows must distinguish between pre-settlement steps (reversible through cancellation) and post-settlement steps (reversible only through compensating transactions). FX conversions introduce rate risk during rollback; compensating actions should capture the original rate and calculate the rollback cost before execution. Integration with the firm's general ledger must ensure that rollback entries are properly accounted as corrections, not new transactions, to maintain SOX compliance.
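
As a rough illustration of calculating the rollback cost before execution, the sketch below uses the Scenario A amounts with an assumed current rate of 1.15 EUR per GBP. The rate, the budget limit, and the function names are hypothetical, and spread and fees are ignored.

```python
from decimal import Decimal, ROUND_HALF_UP

PENNY = Decimal("0.01")


def fx_rollback_cost(original_gbp, eur_held, current_eur_per_gbp):
    """GBP lost by converting eur_held back to GBP at the current rate."""
    gbp_recovered = (eur_held / current_eur_per_gbp).quantize(
        PENNY, rounding=ROUND_HALF_UP)
    return original_gbp - gbp_recovered


def within_budget(cost, max_cost):
    """True if the rollback may proceed without human approval."""
    return cost <= max_cost


# Scenario A amounts; the current rate is an assumption for illustration.
cost = fx_rollback_cost(Decimal("142000"), Decimal("161470"), Decimal("1.15"))
print(cost)                                    # GBP lost at the assumed rate
print(within_budget(cost, Decimal("1000")))    # hypothetical approval limit
```

At the assumed rate the reversal would recover £140,408.70, a cost of £1,591.30, which exceeds the hypothetical £1,000 limit and would therefore be routed for approval rather than executed automatically.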

Crypto/Web3. On-chain state changes are immutable by design — a token transfer cannot be reversed without the recipient's cooperation. Rollback semantics for on-chain workflows must be implemented as compensating transactions (counter-transfers, approval revocations) rather than state reversals. MEV risk during rollback transactions must be mitigated through private mempool submission (Flashbots Protect or equivalent). Smart contract approvals granted during workflow setup must be explicitly revoked during rollback, not left as standing approvals. Gas costs for rollback transactions must be budgeted as part of the workflow's cost model.
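
A sketch of how compensating transactions might be derived from the list of executed on-chain operations follows. It builds a plan only and submits nothing; the operation dictionaries, the `private` flag, and `build_rollback_plan` are hypothetical. Revocation is modelled as setting the allowance to zero, which is the usual ERC-20 convention.

```python
def build_rollback_plan(executed):
    """Map executed on-chain operations to compensating transactions, newest first."""
    plan = []
    for op in reversed(executed):
        if op["kind"] == "approve":
            # Revoke by setting the allowance back to zero.
            plan.append({"kind": "approve", "token": op["token"],
                         "spender": op["spender"], "amount": 0})
        elif op["kind"] == "swap":
            # Counter-swap, flagged for private submission to limit MEV exposure.
            plan.append({"kind": "swap", "sell": op["buy"], "buy": op["sell"],
                         "pool": op["pool"], "private": True})
        else:
            raise ValueError(f"no compensating transaction defined for {op['kind']}")
    return plan


# Hypothetical record of the two steps that completed in a Scenario C-style swap.
executed = [
    {"kind": "approve", "token": "A", "spender": "dex_router", "amount": 1_600_000},
    {"kind": "swap", "sell": "A", "buy": "B", "pool": "pool_1"},
]
plan = build_rollback_plan(executed)
for tx in plan:
    print(tx)
```

Because the approval is recorded as an executed operation like any other, it cannot be skipped as a "preparatory" step: the plan always ends with its revocation.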

Safety-Critical / Robotics. Physical rollback creates unique hazards. Reversing a robotic motion requires verifying that the reverse path is collision-free, that tooling state permits the reversal (a gripper holding a component can reverse; a gripper that has released a component cannot "un-release"), and that the physical environment has not changed since the forward execution. Rollback definitions for physical operations must include spatial safety checks, and irreversible physical steps must be clearly classified as such with forward-recovery paths defined. Emergency stop must always take precedence over rollback execution.
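
The precondition checks described here resemble a rollback dry run. The sketch below is a simplified illustration: the world model is a dictionary of booleans, and `dry_run`, `decide`, and the precondition names are assumptions rather than any real robot controller API.

```python
def dry_run(compensations, world):
    """Return names of compensations whose preconditions fail in `world`."""
    infeasible = []
    for name, preconditions in compensations:
        if not all(check(world) for check in preconditions):
            infeasible.append(name)
    return infeasible


def decide(compensations, world):
    """Emergency stop always wins; otherwise roll back only if fully feasible."""
    if world.get("emergency_stop"):
        return "halt"
    if dry_run(compensations, world):
        return "escalate"
    return "execute_rollback"


# Hypothetical world state after the inspection failure in Scenario B.
world = {"emergency_stop": False, "reverse_path_clear": True, "gripper_can_hold": False}
compensations = [
    ("move_back_from_inspection", [lambda w: w["reverse_path_clear"]]),
    ("pick_component_off_board", [lambda w: w["gripper_can_hold"]]),
]
print(dry_run(compensations, world), decide(compensations, world))
```

With the paste-covered component flagged as ungrippable, the dry run reports the pick step as infeasible and the decision is to escalate before any motion is attempted.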

Healthcare. Medication administration, diagnostic test ordering, and clinical note publication may be partially or fully irreversible. Rollback semantics must classify each step's reversibility based on clinical safety (a medication administered cannot be "un-administered"; a diagnostic test ordered but not yet performed can be cancelled). Post-rollback consistency checks must include patient safety verification, confirming that the rollback does not leave clinical records in a state that could cause treatment errors.

Maturity Model

Basic Implementation — The organisation has defined rollback semantics for its most critical workflows (top 20% by financial value or safety impact). Compensating actions are defined for reversible steps. Irreversible steps are identified and trigger human escalation on failure. Rollback operations are logged. Post-rollback consistency checks exist but may be manual (operations team verifies state). Rollback testing is included in workflow integration testing but is not a dedicated test suite.

Intermediate Implementation — All basic capabilities plus: rollback semantics are defined for all multi-step workflows, not just critical ones. Compensating actions are idempotent and include precondition verification. Post-rollback consistency predicates are automated and evaluated before normal operation resumes. Maximum rollback time windows are enforced. Rollback dry-run capability is implemented for workflows with complex compensating actions. Dedicated rollback test suites exist for each workflow, testing each possible failure point independently. Saga-pattern orchestration is used for cross-service workflows.
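
Idempotent compensating actions and an enforced rollback time window could look roughly like the following. This is a sketch under assumptions: `CompensationLedger` keeps its keys in memory (a real system would need durable, transactional storage), and the key format and 30-second window are hypothetical.

```python
import time


class CompensationLedger:
    """Remembers which compensations already ran (in memory, for illustration)."""

    def __init__(self):
        self._done = set()

    def run_once(self, key, action):
        """Run `action` unless `key` was already recorded; return True if it ran."""
        if key in self._done:
            return False
        action()
        self._done.add(key)
        return True


def rollback(actions, ledger, window_seconds, clock=time.monotonic):
    """Apply compensations until done or until the time window expires."""
    deadline = clock() + window_seconds
    for key, action in actions:
        if clock() > deadline:
            return "abandoned"   # escalate with a state snapshot instead
        ledger.run_once(key, action)
    return "completed"


refunds = []
actions = [("wf-42:step-1:refund", lambda: refunds.append(100))]
ledger = CompensationLedger()
first = rollback(actions, ledger, window_seconds=30)
second = rollback(actions, ledger, window_seconds=30)   # simulated retry
print(first, second, refunds)
```

Running the rollback twice issues the refund only once. Note that this sketch records the key after the action runs, so a crash between the two would still allow a duplicate; closing that gap requires recording the key and performing the action atomically, or relying on an idempotency key honoured by the downstream system.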

Advanced Implementation — All intermediate capabilities plus: event-sourced state provides temporal rollback with full audit trail. Rollback feasibility is evaluated in real time, with the system maintaining a continuous assessment of whether each in-progress workflow step could be rolled back given current state. Rollback cost estimation (FX spread, gas costs, operational overhead) is calculated before rollback execution and reported to governance. Cross-workflow rollback dependencies are tracked — if workflow A's state is consumed by workflow B, rolling back workflow A triggers a cascading assessment of workflow B's consistency. Independent adversarial testing has verified rollback behaviour under concurrent failure, partial compensating action failure, and state race conditions.
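
The cascading assessment for cross-workflow dependencies amounts to a graph traversal. A minimal sketch, assuming the dependencies are available as a mapping from each workflow to the workflows that consume its state (the `consumers` mapping and workflow names are hypothetical):

```python
from collections import deque


def affected_workflows(consumers, rolled_back):
    """Return every workflow that directly or transitively consumes rolled-back state."""
    seen = set()
    queue = deque([rolled_back])
    while queue:
        current = queue.popleft()
        for downstream in consumers.get(current, ()):
            if downstream not in seen and downstream != rolled_back:
                seen.add(downstream)
                queue.append(downstream)
    return sorted(seen)


# B consumes A's state; C and D consume B's; E feeds A and is therefore unaffected.
consumers = {"A": ["B"], "B": ["C", "D"], "E": ["A"]}
print(affected_workflows(consumers, "A"))
```

Each workflow in the returned list would then have its own consistency predicate re-evaluated; the traversal only identifies candidates and does not itself decide whether they must be rolled back.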

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Testing AG-384 compliance requires simulating workflow failures at every step of representative workflows and verifying that rollback produces consistent post-rollback state. Testing must include concurrent failure scenarios and compensating action failures.
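
Simulating a failure at every step can be automated with a small fault-injection harness. The sketch below is illustrative: workflows are lists of hypothetical `(forward, compensate)` function pairs over a dictionary state, and one compensation is deliberately broken to show what the harness reports.

```python
import copy


def run_with_failure_at(steps, state, fail_index):
    """Run forward steps until fail_index (injected failure), then compensate in reverse."""
    done = []
    for i, (forward, _) in enumerate(steps):
        if i == fail_index:
            break                      # injected failure: this step never takes effect
        forward(state)
        done.append(i)
    for i in reversed(done):
        steps[i][1](state)
    return state


def check_all_failure_points(steps, initial):
    """Return the failure points after which rollback did not restore `initial`."""
    failures = []
    for k in range(len(steps)):
        final = run_with_failure_at(steps, copy.deepcopy(initial), k)
        if final != initial:
            failures.append(k)
    return failures


def debit(s): s["balance"] -= 100
def refund(s): s["balance"] += 100
def reserve(s): s["reserved"] += 1
def buggy_release(s): pass             # deliberately forgets to release
def notify(s): s["notified"] = True
def unnotify(s): s["notified"] = False

initial = {"balance": 500, "reserved": 0, "notified": False}
steps = [(debit, refund), (reserve, buggy_release), (notify, unnotify)]
bad_points = check_all_failure_points(steps, initial)
print(bad_points)
```

The harness reports only the failure point at which the broken compensation is exercised. It does not cover concurrent failures or failures of the compensating actions themselves, which would need separate scenarios.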

Test 8.1: Compensating Action Completeness

Test 8.2: Rollback Atomicity

Test 8.3: Post-Rollback Consistency Verification

Test 8.4: Rollback Idempotency

Test 8.5: Tamper-Evident Rollback Logging

Test 8.6: Maximum Rollback Time Window Enforcement

Test 8.7: Irreversible Step Handling

Test 8.8: Rollback Test Independence

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Supports compliance
EU AI Act | Article 12 (Record-Keeping) | Direct requirement
SOX | Section 404 (Internal Controls Over Financial Reporting) | Direct requirement
FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement
NIST AI RMF | GOVERN 1.1, MANAGE 2.2 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.4 (AI System Operation) | Supports compliance
DORA | Article 11 (Digital Operational Resilience Testing) | Direct requirement

EU AI Act — Article 9 (Risk Management System)

Article 9 requires that risk management systems for high-risk AI systems identify and mitigate foreseeable risks. Partial workflow failure in multi-step AI agent operations is a foreseeable risk with documented precedent across financial services, manufacturing, and digital asset management. The EU AI Act requires that risk mitigation measures be "appropriate and targeted" — ad hoc manual resolution of state inconsistencies does not meet this standard when structured rollback semantics are technically feasible. AG-384 supports Article 9 by establishing a systematic approach to managing partial failure risk through defined, tested, and auditable rollback mechanisms.

EU AI Act — Article 12 (Record-Keeping)

Article 12 requires that high-risk AI systems include logging capabilities that enable the tracing of the system's operation throughout its lifecycle. Rollback operations are a critical part of the system's operation — they represent the system's response to failure conditions. AG-384's requirement for tamper-evident rollback logging (requirement 4.5) directly supports Article 12 compliance by ensuring that every rollback operation is fully logged with pre-rollback state, post-rollback state, compensating actions taken, and consistency verification results. Without this logging, the system's behaviour during failure conditions is opaque.
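
One common way to make such a log tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below is an assumption about how this could be done, not a statement of what AG-006 prescribes; the record fields follow requirement 4.5 loosely and the helper names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64


def append_entry(log, record):
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
    log.append({"record": record, "prev": prev_hash, "hash": digest})


def verify(log):
    """Recompute the chain; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        body = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True


log = []
append_entry(log, {"failed_step": 4, "compensations": ["reverse_fx", "credit_sender"],
                   "predicate_passed": True, "seconds_to_complete": 12})
append_entry(log, {"failed_step": 3, "compensations": ["revoke_approval"],
                   "predicate_passed": False, "seconds_to_complete": 40})
print(verify(log))
```

A chain like this detects modification only if the latest hash is also anchored somewhere the writer cannot alter, such as a separate system or a periodically published digest.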

SOX — Section 404 (Internal Controls Over Financial Reporting)

Section 404 requires effective internal controls over financial reporting, including controls over the completeness and accuracy of financial transactions. A multi-step financial workflow that fails partway through and leaves orphaned debits, unreconciled nostro positions, or unmatched settlement entries directly undermines the completeness and accuracy of financial records. SOX auditors will ask: "If this workflow fails at step 3 of 5, what happens to the state changes from steps 1 and 2?" If the answer is "it depends on who notices and when," the control is inadequate. AG-384 provides the structural control: defined compensating actions, automated consistency verification, and auditable rollback logs that demonstrate every partial failure was resolved through a governed process.

FCA SYSC — 6.1.1R (Systems and Controls)

SYSC 6.1.1R requires adequate systems and controls sufficient to ensure compliance with applicable obligations. For firms operating AI agents in financial services, this includes ensuring that partial workflow failures do not create state inconsistencies that affect client assets, regulatory reporting, or market integrity. The FCA's supervisory expectations, reinforced through Dear CEO letters and thematic reviews, are that firms can demonstrate how their automated systems recover from failure conditions without creating operational risk. AG-384 directly implements this requirement for multi-step AI agent workflows by mandating defined rollback semantics, bounded rollback windows, and post-rollback consistency verification.

NIST AI RMF — GOVERN 1.1, MANAGE 2.2

GOVERN 1.1 addresses policies and procedures for AI risk management. MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-384 supports GOVERN 1.1 by establishing rollback governance policies for AI agent operations and supports MANAGE 2.2 by implementing rollback as an enforceable control that mitigates the risk of state inconsistency from partial workflow failure.

ISO 42001 — Clause 6.1, Clause 8.4

Clause 6.1 requires actions to address risks within the AI management system. Clause 8.4 addresses operational requirements for AI systems. State inconsistency from partial workflow failure is an operational risk that must be addressed through defined rollback procedures. AG-384 implements the risk treatment for partial failure state inconsistency, satisfying both the risk identification requirement of Clause 6.1 and the operational control requirement of Clause 8.4.

DORA — Article 11 (Digital Operational Resilience Testing)

Article 11 requires financial entities to conduct digital operational resilience testing, including testing of ICT systems' response to failure conditions. Partial workflow failure and rollback execution are directly within scope of DORA's resilience testing requirements. The test specification in Section 8 of this protocol provides a structured programme that satisfies DORA's testing expectations for failure recovery. A financial entity that cannot demonstrate governed rollback behaviour for its AI agent workflows would face a DORA finding for inadequate operational resilience. DORA's requirement for "lessons learned" from testing aligns with AG-384's requirement for rollback execution logs that enable post-incident analysis.

10. Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | Cross-system: state inconsistency from ungoverned rollback propagates to every downstream system that consumes the inconsistent state, including financial ledgers, regulatory reports, physical control systems, and client-facing services

Consequence chain: Ungoverned rollback begins as a single partial workflow failure and then propagates through state dependencies. The immediate technical consequence is state inconsistency: an account debited without a corresponding credit, a physical component moved without proper placement, an approval granted without subsequent revocation. The inconsistency becomes a downstream data quality issue when other systems consume the corrupted state: a ledger reports an incorrect balance, a regulatory report includes an orphaned transaction, a robot's world model diverges from physical reality. The operational impact compounds over time, because each new operation that touches the inconsistent state extends the blast radius.

In financial services, orphaned debits such as the £142,000 in Scenario A grow into reconciliation breaks that affect end-of-day reporting, client statements, and regulatory returns. In safety-critical systems, divergence between the control system's state model and physical reality (as in Scenario B, with $40,400 in combined costs) creates hazards that escalate with each subsequent operation. In crypto/DeFi, unrevoked approvals (as in Scenario C, with $407,000 in total losses) create persistent exploit vectors that can be triggered at any time.

The business consequence includes regulatory enforcement (FCA fines for inadequate controls, SOX material weaknesses for financial state inconsistency), direct financial loss from unrecoverable state (FX slippage, MEV extraction, double-payments), safety incidents with personal injury liability, and reputational damage that erodes client and user trust. Without AG-384, every partial workflow failure is a potential cascading incident whose blast radius is limited only by the number of downstream systems that consume the inconsistent state.

Cite this protocol
AgentGoverning. (2026). AG-384: Stateful Rollback Semantics Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-384