AG-347

Model Rollback Readiness Governance

Model Provenance, Training & Adaptation · AGS v2.1 · April 2026
EU AI Act · NIST · ISO 42001

2. Summary

Model Rollback Readiness Governance requires that organisations maintain tested, operationally ready rollback paths for every deployed AI model, enabling rapid reversion to a known-good prior version when the current version exhibits safety regressions, accuracy degradation, behavioural anomalies, or other issues that warrant immediate withdrawal. A rollback path is not merely having the prior model version available in storage — it is the demonstrated ability to substitute the prior version into the production environment within a defined time window, with verified functionality, compatible infrastructure, and minimal service disruption. Without rollback readiness, organisations facing a model failure have no rapid remediation option: they must either tolerate the failing model or take the service offline entirely while a fix is developed.

3. Example

Scenario A — No Rollback Path After Safety Regression: A customer-facing agent receives a model update that improves response quality metrics by 7%. Three days after deployment, monitoring detects that the updated model's refusal rate for harmful requests has dropped from 96.8% to 89.2% — a safety regression not caught in pre-deployment evaluation. The organisation decides to roll back to the prior version. However, the prior model version was deleted from the model registry during a storage cleanup two weeks earlier. The model weights exist only in an unindexed backup on a cold storage tier with a 24-hour retrieval SLA. The adapter configuration for the prior version was not preserved — only the current adapter versions are in the deployment manifest. The organisation faces a choice: continue operating the safety-degraded model for 24-48 hours while the prior version is restored and reconfigured, or take the service offline. They choose to take the service offline, affecting 340,000 daily users for 31 hours.

What went wrong: No rollback readiness governance ensured that the prior model version was available for rapid restoration. The prior weights were deleted during routine cleanup because no retention policy protected them. The prior adapter configuration was overwritten by the new deployment. No rollback procedure existed, so restoration was an improvised engineering effort. Consequence: 31 hours of service outage, £280,000 in lost revenue, customer trust damage, and emergency engineering costs of £45,000.

Scenario B — Rollback Incompatible with Current Infrastructure: A financial-value agent's model is updated from Version A to Version B. Version B uses a new tokeniser that is 15% more efficient. The downstream systems are updated to parse Version B's output format. When Version B exhibits a critical calculation error requiring immediate rollback, the team discovers that Version A's tokeniser is incompatible with the updated downstream parsing systems. Rolling back the model also requires rolling back three downstream services, two of which have processed 48 hours of transactions using the new format. A full rollback would require reprocessing those transactions. The team spends 72 hours developing a compatibility shim instead, during which the calculation error continues to affect transactions.

What went wrong: The rollback plan was not tested end-to-end. The dependency between the model version and the downstream infrastructure was not documented. The rollback plan addressed model weights but not the integration layer. A tested rollback would have revealed the incompatibility before the Version B deployment, allowing the team to maintain backward compatibility or implement the compatibility shim in advance. Consequence: 72 hours of degraded financial calculations, affecting an estimated £4.2 million in transactions, plus £95,000 in engineering costs for the emergency compatibility work.

Scenario C — Edge Deployment Without Rollback Capability: A safety-critical agent is deployed to 8,500 edge devices (industrial sensors with embedded inference). The model is updated over-the-air. The update introduces a subtle classification error that misidentifies a specific sensor pattern as normal when it indicates equipment stress. After 6 days, an equipment failure at one site is traced to the classification error. The organisation needs to roll back to the prior model version, but the edge deployment system does not support remote model rollback — it can only push forward updates. The prior model version must be repackaged as a "new" update and pushed through the standard OTA pipeline, which takes 3-5 days for full fleet coverage due to device connectivity patterns. During this period, 8,500 devices continue operating with the faulty classification.

What went wrong: The edge deployment architecture supported forward-only updates, not rollback. No governance requirement mandated that deployment infrastructure support rollback. The OTA pipeline was designed for updates, not emergency reversions. Consequence: 6-11 days of faulty classification across 8,500 devices, one confirmed equipment failure (£310,000), potential additional failures during the rollback period, and fundamental redesign of the edge deployment pipeline required (£420,000).

4. Requirement Statement

Scope: This dimension applies to every AI model deployed in a production or pre-production environment that serves real users, processes real data, or affects real systems. It covers all deployment architectures: cloud-hosted inference services, on-premise deployments, edge deployments, and hybrid architectures. The scope extends to all model components: base weights, adapter compositions (per AG-342), quantised variants (per AG-344), system prompts, and any other component whose rollback is necessary to restore prior behaviour. For models accessed through third-party APIs, the scope covers the organisation's ability to revert to a prior API version.

4.1. A conforming system MUST maintain at least one prior approved model version in a state ready for rapid deployment at all times for every production model.

4.2. A conforming system MUST define and enforce a maximum rollback time — the time from rollback decision to the prior version serving production traffic — appropriate to the deployment context (e.g., 1 hour for safety-critical, 4 hours for general production, 24 hours for edge deployments).

4.3. A conforming system MUST test the rollback procedure at least annually for each production model, verifying that the prior version can be deployed within the defined maximum rollback time and that it functions correctly with the current infrastructure.

4.4. A conforming system MUST preserve the complete deployment configuration of the prior version — including model weights, adapter composition manifest, system prompt, output formatting templates, and any other component needed to reproduce the prior behaviour.

4.5. A conforming system MUST assess and document rollback compatibility before any model update, verifying that the prior version remains compatible with the current state of downstream systems, APIs, and integration layers.

4.6. A conforming system SHOULD implement automated rollback capability that can be triggered through a single operational action (e.g., a deployment command, a configuration change, or an API version revert).

4.7. A conforming system SHOULD maintain rollback paths for at least two prior versions (N-1 and N-2), in case the most recent prior version also has issues.

4.8. A conforming system SHOULD implement canary deployment for model updates, enabling rapid rollback of the canary before full deployment if issues are detected.

4.9. A conforming system MAY implement automatic rollback triggers based on monitoring metrics (e.g., if safety refusal rate drops below a threshold, automatically revert to the prior version).
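The automatic trigger described in 4.9 can be sketched as a simple threshold rule. This is a minimal illustration, not a prescribed mechanism: the `RollbackTrigger` class, the `should_roll_back` function, the 0.95 floor, and the minimum sample count are all assumptions introduced here. Only the 96.8% and 89.2% refusal rates come from Scenario A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackTrigger:
    """Hypothetical rule: roll back when a metric falls below a floor."""
    metric: str
    floor: float             # minimum acceptable value (illustrative)
    min_samples: int = 500   # ignore windows too small to be meaningful


def should_roll_back(trigger: RollbackTrigger, value: float, samples: int) -> bool:
    """Return True when the metric breaches the floor on enough data."""
    if samples < trigger.min_samples:
        return False
    return value < trigger.floor


# Illustrative floor; the observed rates below are taken from Scenario A.
refusal_trigger = RollbackTrigger(metric="harmful_request_refusal_rate", floor=0.95)
print(should_roll_back(refusal_trigger, value=0.968, samples=2000))  # False
print(should_roll_back(refusal_trigger, value=0.892, samples=2000))  # True
```

The sample-count guard is one possible way to avoid reverting on a noisy measurement window; a real deployment would choose its own thresholds and smoothing.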

5. Rationale

Model rollback is the primary rapid remediation mechanism for AI model failures. When a deployed model exhibits safety regressions, accuracy degradation, unexpected behaviour, or any other issue that warrants withdrawal from production, the organisation needs an alternative model version that can be deployed quickly. Without a ready rollback path, the only options are to tolerate the failing model (accepting ongoing risk) or to take the service offline (accepting availability impact). Neither option is acceptable for production services, particularly safety-critical or regulated ones.

The rollback challenge is more complex for AI models than for traditional software. Software rollback typically means deploying a prior code version, which is well-understood and supported by standard deployment tooling. Model rollback involves: reverting model weights (which may be tens of gigabytes), reverting adapter configurations, reverting system prompts, and potentially reverting downstream integrations that depend on the model's output format. If any component in this chain is missing, incompatible, or untested, the rollback fails.

Edge deployments present an extreme version of this challenge. Edge devices may have limited storage (preventing retention of prior model versions), limited connectivity (preventing rapid remote deployment), and limited compute (preventing rapid model loading). Despite these constraints, edge-deployed models in safety-critical contexts require rollback capability — the risk of operating a faulty model on thousands of unsupervised devices demands it.

The testing requirement is critical because a rollback procedure that has never been exercised is an unverified assumption, and it is most likely to fail at the moment it is needed. Infrastructure changes, configuration drift, and dependency updates can silently break rollback compatibility. The only way to know that rollback works is to test it, both on a regular schedule and after any infrastructure change that might affect compatibility.

6. Implementation Guidance

Rollback architecture. Design the model serving infrastructure to support rapid model version switching. Common patterns include: blue-green deployment (two production environments, one serving traffic and one holding the prior version, with traffic switched via load balancer), model version routing (the serving infrastructure can load any registered model version and route traffic to it), and immutable deployment artefacts (each model version is a self-contained deployment artefact including weights, adapters, configuration, and runtime dependencies).
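The model version routing pattern can be illustrated with a toy in-memory router. This is a sketch under stated assumptions: `VersionRouter` and its methods are hypothetical names, models are represented by placeholder objects, and a real serving layer would also handle traffic draining and health checks.

```python
class VersionRouter:
    """Toy router: keeps versions loaded and points traffic at one of them."""

    def __init__(self) -> None:
        self._loaded: dict[str, object] = {}
        self._history: list[str] = []  # the active version is the last entry

    def load(self, version: str, model: object) -> None:
        """Keep a version resident so that switching to it is fast."""
        self._loaded[version] = model

    def promote(self, version: str) -> None:
        """Route traffic to a version that is already loaded."""
        if version not in self._loaded:
            raise KeyError(f"version {version!r} is not loaded")
        self._history.append(version)

    @property
    def active(self) -> str:
        if not self._history:
            raise RuntimeError("no version has been promoted")
        return self._history[-1]

    def roll_back(self) -> str:
        """Switch traffic to the previously active version and return its name."""
        if len(self._history) < 2:
            raise RuntimeError("no prior version to roll back to")
        self._history.pop()
        return self.active


router = VersionRouter()
router.load("v1", "placeholder-model-v1")
router.load("v2", "placeholder-model-v2")
router.promote("v1")
router.promote("v2")
print(router.roll_back())  # v1
```

Because the prior version stays loaded, the rollback is a single routing change rather than a retrieval and load, which is the property the patterns above aim for.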

Rollback package. For each production deployment, maintain a rollback package that includes: model weights (exact version, verified by hash), adapter composition manifest (per AG-342), system prompt version, output formatting templates, inference configuration (batch size, temperature, top-p, etc.), and compatibility notes (documenting known dependencies on downstream systems). The rollback package should be stored in a location with a retrieval SLA that meets the maximum rollback time requirement — not in cold storage or an archive tier.
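One way to make the rollback package checkable is to represent it as a manifest with a completeness check, a weight hash check, and a storage SLA check. The sketch below assumes a hypothetical `RollbackPackage` class; the field names, file names, and the 4-hour limit are illustrative and not defined by this protocol.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest used to pin the exact weights artefact."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class RollbackPackage:
    """Hypothetical manifest; field names are illustrative."""
    model_version: str
    weights_sha256: str
    adapter_manifest: str            # reference to the AG-342 composition manifest
    system_prompt_version: str
    output_templates: list[str]
    inference_config: dict[str, float] = field(default_factory=dict)
    compatibility_notes: str = ""
    retrieval_sla_minutes: int = 0   # SLA of the storage tier holding the package

    def missing_components(self) -> list[str]:
        """Names of required components that are empty."""
        required = ("model_version", "weights_sha256", "adapter_manifest",
                    "system_prompt_version", "output_templates", "inference_config")
        return [name for name in required if not getattr(self, name)]

    def verify_weights(self, weights: bytes) -> bool:
        """Check retrieved weights against the recorded hash."""
        return sha256_hex(weights) == self.weights_sha256

    def storage_meets(self, max_rollback_minutes: int) -> bool:
        """Retrieval alone must fit inside the maximum rollback time."""
        return self.retrieval_sla_minutes <= max_rollback_minutes


weights = b"stand-in for real weight bytes"
package = RollbackPackage(
    model_version="v1",
    weights_sha256=sha256_hex(weights),
    adapter_manifest="adapters-v1.json",
    system_prompt_version="prompt-v1",
    output_templates=["reply-v1"],
    inference_config={"temperature": 0.2, "top_p": 0.9},
    retrieval_sla_minutes=24 * 60,  # a cold tier like the one in Scenario A
)
print(package.missing_components())                        # []
print(package.verify_weights(weights))                     # True
print(package.storage_meets(max_rollback_minutes=4 * 60))  # False
```

The final check fails because a 24-hour retrieval SLA cannot satisfy a 4-hour maximum rollback time, which mirrors the storage problem in Scenario A.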

Rollback testing. Test rollback procedures on a defined schedule. The test should verify: the prior model version can be retrieved within the defined time, the prior version loads and serves inference correctly, the prior version produces outputs compatible with current downstream systems, and the switchover completes within the maximum rollback time. Document test results and any issues encountered. For safety-critical deployments, conduct rollback tests quarterly; for general production, test at least annually, which is the minimum set by requirement 4.3.
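The four verification points and the test cadence can be recorded in a small structure. This is a sketch only: `RollbackTestResult`, `is_test_overdue`, and the tier names are hypothetical, and approximating "quarterly" as 92 days and "annually" as 366 days is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 92 days and "annually" as 366 days.
MAX_TEST_AGE = {
    "safety_critical": timedelta(days=92),
    "general": timedelta(days=366),
}


@dataclass(frozen=True)
class RollbackTestResult:
    """Outcome of one rollback drill, one flag per verification point."""
    retrieved_in_time: bool
    serves_inference: bool
    downstream_compatible: bool
    switchover_minutes: float

    def passed(self, max_rollback_minutes: float) -> bool:
        """All checks must hold and the switchover must fit the time limit."""
        return (self.retrieved_in_time
                and self.serves_inference
                and self.downstream_compatible
                and self.switchover_minutes <= max_rollback_minutes)


def is_test_overdue(last_test: date, tier: str, today: date) -> bool:
    """True when the most recent drill is older than the tier allows."""
    return today - last_test > MAX_TEST_AGE[tier]


result = RollbackTestResult(True, True, True, switchover_minutes=42.0)
print(result.passed(max_rollback_minutes=60))                                   # True
print(is_test_overdue(date(2025, 1, 15), "safety_critical", date(2025, 6, 1)))  # True
print(is_test_overdue(date(2025, 1, 15), "general", date(2025, 6, 1)))          # False
```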

Pre-update compatibility assessment. Before deploying a model update, assess whether the prior version will remain compatible with the updated infrastructure. If the update changes the tokeniser, output format, API schema, or any other interface, document the compatibility impact and either maintain backward compatibility or update the rollback plan to include the necessary downstream reversions.
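The assessment can be approximated by comparing interface descriptors of the prior and candidate versions and flagging any change that has neither a backward-compatibility guarantee nor a planned downstream reversion. The names `ModelInterface`, `changed_interfaces`, and `unresolved_changes`, and the descriptor values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelInterface:
    """Hypothetical descriptor of the interfaces downstream systems depend on."""
    tokeniser: str
    output_format: str
    api_schema: str


def changed_interfaces(prior: ModelInterface, candidate: ModelInterface) -> list[str]:
    """Names of interface fields that differ between the two versions."""
    return [f.name for f in fields(ModelInterface)
            if getattr(prior, f.name) != getattr(candidate, f.name)]


def unresolved_changes(changes: list[str],
                       backward_compatible: set[str],
                       planned_reversions: set[str]) -> list[str]:
    """Changes covered by neither backward compatibility nor the rollback plan."""
    return [c for c in changes
            if c not in backward_compatible and c not in planned_reversions]


version_a = ModelInterface("tok-a", "format-a", "schema-1")
version_b = ModelInterface("tok-b", "format-b", "schema-1")
changes = changed_interfaces(version_a, version_b)
print(changes)  # ['tokeniser', 'output_format']
print(unresolved_changes(changes, backward_compatible={"tokeniser"},
                         planned_reversions=set()))  # ['output_format']
```

A non-empty result from `unresolved_changes` would be a reason to hold the update until the rollback plan covers the gap, which is the situation Scenario B describes.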

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. PRA SS1/23 expects firms to have contingency plans for model failures. Rollback readiness is the primary contingency for model-related incidents. Maximum rollback times for financial models should align with the firm's incident response SLAs — typically 1-4 hours for customer-affecting services.

Healthcare. Clinical AI model rollback must be achievable without disrupting patient care. Rollback procedures should be designed so that clinical workflows can continue with the prior model version or with manual fallback procedures during the transition. Maximum rollback times should align with clinical safety requirements.

Safety-Critical Systems. Edge-deployed models in safety-critical contexts (industrial control, autonomous vehicles) must have local rollback capability that does not depend on network connectivity. The rollback must be achievable within the safety response time for the application.

Maturity Model

Basic Implementation — Prior model versions are retained in the model registry. Rollback is possible but requires manual engineering effort — loading weights, reconfiguring adapters, updating deployment manifests. Maximum rollback time is undefined and depends on engineer availability. Rollback has not been tested. This level provides raw capability but not operational readiness.

Intermediate Implementation — A documented rollback procedure exists for each production model. Rollback packages include all necessary components (weights, adapters, configuration). Maximum rollback times are defined and achievable through the documented procedure. Rollback is tested annually. Pre-update compatibility assessment is performed. The organisation can roll back any production model within the defined time window.

Advanced Implementation — All intermediate capabilities plus: automated rollback capability (single-action trigger). Blue-green or equivalent rapid-switching architecture. N-2 rollback depth. Rollback tested quarterly for high-risk deployments. Canary deployment with automated rollback on metric degradation. Edge deployments have local rollback partitions. The organisation can demonstrate sub-hour rollback capability for critical deployments and has tested rollback within the last quarter.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Prior Version Availability

Test 8.2: Rollback Time Compliance

Test 8.3: Rollback Functional Correctness

Test 8.4: Downstream Compatibility After Rollback

Test 8.5: Rollback Procedure Documentation

Test 8.6: Automated Rollback Trigger

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Supports compliance
EU AI Act | Article 72 (Post-Market Monitoring) | Direct requirement
DORA | Article 11 (ICT Response and Recovery) | Direct requirement
PRA SS1/23 | Model Risk Management, Contingency Planning | Direct requirement
NIST AI RMF | MANAGE 2.4, MANAGE 4.1 | Supports compliance
ISO 42001 | Clause 8.2 (AI Risk Assessment), Clause 8.4 (AI System Operation) | Supports compliance
IEC 62443 | Security Level Requirements, Recovery Capabilities | Supports compliance

EU AI Act — Article 72 (Post-Market Monitoring)

Article 72 requires providers of high-risk AI systems to establish and document a post-market monitoring system. This system must enable the provider to "take any necessary corrective or preventive action" when issues are identified. Model rollback is the primary corrective action for a model that is found to be non-compliant, unsafe, or underperforming after deployment. Without rollback readiness, the provider cannot take the "necessary corrective action" that Article 72 requires — the corrective action would be to withdraw the model, and without a ready replacement, withdrawal means service shutdown.

DORA — Article 11 (ICT Response and Recovery)

Article 11 requires financial entities to establish ICT business continuity and disaster recovery policies. For AI-dependent financial services, model rollback is an ICT recovery capability. DORA's requirement for "recovery time objectives" maps directly to AG-347's maximum rollback time definitions. A financial entity that cannot demonstrate model rollback capability within defined time objectives has a gap in its ICT recovery posture.

PRA SS1/23 — Contingency Planning

PRA SS1/23 expects firms to have contingency plans for model-related incidents, including the ability to revert to prior model versions or fallback manual processes. AG-347 directly implements this expectation. Firms unable to demonstrate model rollback capability face supervisory challenge on their contingency planning adequacy.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Service-wide: all users of the affected model lose access or receive degraded service during the rollback gap

Consequence chain: Rollback readiness failure converts every model issue into a crisis. When a model exhibits a safety regression, accuracy degradation, or behavioural anomaly, the ideal response is rapid rollback — within minutes to hours, the prior version is serving traffic and the issue is contained. Without rollback readiness, the same issue becomes an extended outage or an extended period of degraded operation. Scenario A's 31-hour outage affected 340,000 daily users and cost £280,000 in revenue. Scenario B's 72-hour degradation affected £4.2 million in financial transactions. Scenario C's 6-11 day exposure across 8,500 edge devices caused £310,000 in equipment damage. In each case, the original model issue was manageable; the inability to roll back transformed it into a major incident. The compounding effect is significant: the longer a faulty model operates, the more decisions it makes, the more harm accumulates, and the more expensive remediation becomes. Rollback readiness is not about preventing model failures — it is about limiting the blast radius and duration of failures that inevitably occur.

Cross-references: AG-339 (Model Weight Custody Governance) ensures that prior model weights are retained and accessible — a prerequisite for rollback. AG-048 (AI Model Provenance and Integrity) provides the version tracking needed to identify which version to roll back to. AG-342 (Adapter Composition Approval Governance) covers the adapter configuration that must be included in the rollback package. AG-345 (Model Family Substitution Governance) addresses the governance of more fundamental model replacements that go beyond version rollback. AG-339 through AG-348 form the sibling landscape for Model Provenance, Training & Adaptation.

Cite this protocol
AgentGoverning. (2026). AG-347: Model Rollback Readiness Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-347