Assurance Case Maintenance Governance requires that every AI agent deployment is supported by a living assurance case — a structured, evidence-based argument that the agent operates safely and within its governance requirements — and that this assurance case is actively maintained throughout the agent's operational lifecycle. The assurance case is not a one-time deployment artefact; it is a continuously maintained argument that must be updated whenever the agent, its environment, its governance controls, or the regulatory landscape changes. AG-076 treats the assurance case as a governance artefact subject to the same version control, review, and approval processes as the agent's operational configuration. An assurance case that was valid at deployment but has not been updated after 12 months of operational changes, model updates, and regulatory developments provides false assurance — which is worse than no assurance at all.
Scenario A — Assurance Case Stale After Model Update: An organisation deploys a financial advisory agent with a comprehensive assurance case documenting: the model's performance characteristics, the governance controls in place, the risk mitigations applied, and the regulatory compliance arguments. Six months later, the model provider releases a significant update that changes the model's reasoning behaviour, context window size, and output characteristics. The organisation applies the update to improve performance. However, the assurance case is not reviewed or updated. The performance claims in the assurance case reference test results from the previous model version. The behavioural assumptions underlying the risk arguments no longer hold. When the regulator requests the assurance case during a supervisory visit, the document references a model version that is no longer deployed. The regulator issues a finding for inadequate assurance.
What went wrong: The assurance case was treated as a static deployment document rather than a living artefact. No trigger existed to require assurance case review upon model update. The performance evidence in the assurance case was invalidated by the model change but nobody recognised this dependency. Consequence: Regulatory finding, mandatory remediation, operational restriction on the agent until the assurance case is updated and re-approved, 3-month delay to business objectives.
Scenario B — Evidence Expiry Without Detection: A safety-critical agent's assurance case includes a claim that the agent's response latency is below 200 milliseconds at the 99th percentile, supported by load test evidence dated 14 months ago. Infrastructure changes since the test — including a migration to a different cloud region and the addition of a new governance gateway — have increased latency to 340 milliseconds at the 99th percentile. The assurance case still cites the original test results. During a safety incident, the latency exceeds the safety threshold, and the post-incident review discovers that the assurance case claim has been unsupported for 8 months. The organisation cannot demonstrate that the safety argument was valid during that period.
What went wrong: The evidence supporting the assurance case claim had an implicit validity period that was never formalised. No mechanism existed to detect when supporting evidence became stale. The infrastructure changes were not linked to the assurance case as invalidating events. Consequence: Safety incident contributing factor, regulatory investigation, requirement to re-establish the entire assurance case from scratch, 6-month operational restriction.
Scenario C — Regulatory Change Invalidates Compliance Argument: An organisation's assurance case for a customer-facing agent includes a compliance argument that the agent meets data protection requirements under the current regulatory framework. A new regulatory requirement is enacted that imposes additional transparency obligations for AI-generated customer communications. The assurance case is not updated. The organisation continues to operate the agent under the assumption that the existing compliance argument remains valid. Eighteen months later, a data protection authority audit reveals that the agent has been operating without the required transparency measures. The organisation cannot demonstrate that it assessed the regulatory change's impact on the assurance case.
What went wrong: No mechanism existed to link regulatory change events to assurance case review triggers. The compliance argument in the assurance case was never re-evaluated against the new requirements. The assurance case provided false confidence that compliance was maintained. Consequence: Data protection enforcement action, potential fine, requirement to notify affected customers, mandatory operational changes, reputational damage.
Scope: This dimension applies to all AI agents deployed in production environments where the organisation makes claims — explicit or implicit — about the agent's safety, performance, compliance, or governance posture. This includes all agents in regulated sectors, all agents that interact with external parties, all agents that process personal data, and all agents whose failure could cause material harm. The scope extends to agents deployed under regulatory approvals, certifications, or compliance declarations that reference assurance arguments. An organisation that claims an agent is "compliant with the EU AI Act" or "operating within governance controls" has an implicit assurance case — AG-076 requires that this assurance case be explicit, structured, evidenced, and maintained. The only agents excluded from scope are internal experimental agents in sandbox environments with no access to production data or systems and no external interactions.
4.1. A conforming system MUST maintain a structured assurance case for each production-deployed agent, containing: claims about the agent's safety, performance, compliance, and governance posture; evidence supporting each claim; and arguments linking the evidence to the claims.
4.2. A conforming system MUST define explicit validity conditions for each piece of evidence in the assurance case, including: the evidence type, the date collected, the conditions under which it was collected, and the events that would invalidate it (e.g., model update, infrastructure change, regulatory change, drift detection event).
4.3. A conforming system MUST trigger an assurance case review within 30 calendar days of any invalidating event, including but not limited to: model version change, significant infrastructure change, governance control modification, regulatory change affecting the agent's domain, and behavioural drift detection by AG-022.
4.4. A conforming system MUST version-control the assurance case using the same change control mechanisms required by AG-007, including approval workflows, audit trails, and rollback capability.
4.5. A conforming system MUST conduct a full assurance case review at least annually, even if no invalidating events have occurred, to confirm that all claims remain valid, all evidence remains current, and all arguments remain sound.
4.6. A conforming system MUST designate an accountable owner for each assurance case who is responsible for initiating reviews, approving updates, and attesting to the assurance case's validity.
4.7. A conforming system SHOULD implement automated monitoring for invalidating events — model updates, infrastructure changes, regulatory publications, drift alerts — that triggers assurance case review workflows without requiring manual detection.
4.8. A conforming system SHOULD maintain a dependency map linking each assurance case claim to the specific evidence artefacts, governance controls, infrastructure components, and regulatory provisions that support it, enabling impact analysis when any dependency changes.
4.9. A conforming system SHOULD generate an assurance case status dashboard showing: the last review date, the next scheduled review date, the number of claims with current evidence, the number of claims with stale evidence, and any outstanding invalidating events awaiting review.
4.10. A conforming system MAY implement automated evidence refresh — periodic re-execution of tests, benchmarks, and evaluations that provide the evidence supporting assurance case claims — to keep evidence current without manual intervention.
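The mandatory requirements above (4.1–4.3) imply a machine-readable structure: claims linked to evidence, evidence carrying explicit validity conditions, and a trigger that schedules review within 30 days of an invalidating event. The sketch below is illustrative only — the class names, fields, and event-type strings are assumptions, not a schema mandated by this dimension — but it shows how those pieces fit together:

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative model of requirements 4.1-4.3. All names are
# assumptions, not a schema mandated by AG-076.

@dataclass
class Evidence:
    evidence_id: str
    evidence_type: str           # e.g. "load_test", "benchmark" (4.2)
    collected_on: date           # date collected (4.2)
    collection_conditions: str   # conditions under which it was collected (4.2)
    invalidated_by: set          # event types that invalidate this evidence (4.2)
    stale: bool = False

@dataclass
class Claim:
    claim_id: str
    statement: str
    evidence: list               # Evidence records supporting the claim (4.1)
    argument: str                # links the evidence to the claim (4.1)

@dataclass
class AssuranceCase:
    agent_id: str
    claims: list
    review_due: date = None      # earliest outstanding review deadline

def record_invalidating_event(case, event_type, occurred_on):
    """Mark invalidated evidence as stale and schedule a review within
    30 calendar days of the event (requirement 4.3)."""
    affected = []
    for claim in case.claims:
        for ev in claim.evidence:
            if event_type in ev.invalidated_by:
                ev.stale = True
                if claim.claim_id not in affected:
                    affected.append(claim.claim_id)
    if affected:
        deadline = occurred_on + timedelta(days=30)
        if case.review_due is None or deadline < case.review_due:
            case.review_due = deadline
    return affected

# The Scenario B latency claim, with its invalidating events made explicit:
latency_ev = Evidence("EV-12", "load_test", date(2024, 1, 10),
                      "baseline infrastructure, no governance gateway",
                      {"infrastructure_change", "model_update"})
case = AssuranceCase("agent-7", [Claim("C-3", "p99 latency below 200 ms",
                                       [latency_ev],
                                       "load test results bound p99 latency")])
affected = record_invalidating_event(case, "infrastructure_change",
                                     date(2024, 6, 1))
```

An infrastructure change here flags the claim as stale and sets `case.review_due` 30 days out; a production implementation would persist these records and route the triggered review through the AG-007 approval workflow.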
Assurance Case Maintenance Governance addresses the gap between initial assurance and ongoing assurance. Most organisations that deploy AI agents with governance controls create some form of assurance documentation at deployment time — a risk assessment, a compliance checklist, a safety analysis. But the value of this documentation degrades from the moment it is created. Every change to the agent, its environment, its governance controls, or the regulatory landscape potentially invalidates one or more claims in the assurance case. Without active maintenance, the assurance case becomes a historical record of the deployment state rather than a current attestation of the operational state.
The meta-governance nature of AG-076 is deliberate. This dimension does not govern the agent's behaviour directly — it governs the governance artefact that provides confidence in the agent's behaviour. It is the governance of governance evidence. This meta-level is necessary because an unmaintained assurance case is worse than no assurance case at all. No assurance case creates honest uncertainty. A stale assurance case creates false confidence, which leads to decisions based on assumptions that no longer hold.
The structured assurance case methodology — claims, evidence, arguments — is well-established in safety engineering (Goal Structuring Notation, Claims-Arguments-Evidence notation). AG-076 applies this methodology to AI agent governance and adds the critical requirement of ongoing maintenance. The traditional assurance case in safety engineering is typically created at design time and updated at major lifecycle milestones. AI agents change more frequently than traditional safety-critical systems — model updates, prompt changes, infrastructure migrations, and regulatory changes occur on timescales of weeks to months, not years. The assurance case maintenance cadence must match the rate of change.
AG-076 intersects with AG-007 because the assurance case itself is a governance configuration artefact. It intersects with AG-022 because behavioural drift detection events are among the most important invalidating triggers for assurance case claims. It intersects with AG-078 (Benchmark Coverage Governance) because benchmark results are a primary form of evidence in the assurance case.
AG-076 requires organisations to treat the assurance case as a living document embedded in their operational processes, not a compliance artefact filed at deployment time and forgotten.
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Assurance cases for financial agents should align with UK supervisory expectations for model risk management documentation. The PRA's supervisory statement SS1/23 requires firms to maintain documentation demonstrating that AI models are fit for purpose throughout their lifecycle. The assurance case can serve as the central artefact for this requirement. Annual review should align with the firm's model validation cycle. Claims should explicitly address conduct risk, market integrity, and client outcomes.
Healthcare. Medical device-class agents require assurance cases aligned with IEC 62304 (medical device software lifecycle) and ISO 14971 (risk management for medical devices). The assurance case must demonstrate ongoing clinical safety throughout the agent's operational life. Changes to the model or clinical guidelines must trigger reassessment. FDA post-market surveillance requirements map directly to assurance case maintenance obligations.
Critical Infrastructure. Agents operating in safety-critical environments should maintain assurance cases aligned with IEC 61508 (functional safety) and sector-specific standards. Safety integrity level (SIL) claims must be re-validated whenever the agent or its operating environment changes. The assurance case should integrate with the plant's safety management system and be subject to independent safety assessment.
Basic Implementation — The organisation maintains a documented assurance case for each production agent, structured as claims, evidence, and arguments. The assurance case is stored in a version-controlled document repository. An accountable owner is designated for each assurance case. Annual reviews are scheduled and tracked. Invalidating events trigger manual review. This level meets the minimum mandatory requirements but relies on human detection of invalidating events, which creates gaps when changes are frequent.
Intermediate Implementation — Assurance cases are stored in a structured, machine-readable format. Evidence artefacts have explicit validity metadata. An automated monitoring system detects invalidating events — model updates, infrastructure changes, drift alerts, regulatory publications — and triggers review workflows. A dependency map links claims to evidence and system components. A status dashboard shows the current health of each assurance case. Reviews are completed within 30 days of invalidating events, evidenced by workflow records.
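The dependency map at this level can be as simple as an index from claims to the artefacts they rest on, queried in reverse when something changes. A minimal sketch, with purely illustrative identifiers:

```python
# Illustrative dependency map (requirement 4.8): each claim lists the
# evidence artefacts, controls, components, and regulatory provisions
# it rests on. All identifiers are hypothetical.
dependency_map = {
    "CLAIM-LATENCY":  {"EV-LOADTEST-2024Q1", "INFRA-GATEWAY", "INFRA-REGION"},
    "CLAIM-ACCURACY": {"EV-BENCH-2024Q2", "MODEL-V3"},
    "CLAIM-GDPR":     {"CTRL-REDACTION", "REG-GDPR-ART13"},
}

def claims_affected_by(changed_dependency):
    """Impact analysis: which claims need review when a dependency changes?"""
    return sorted(claim for claim, deps in dependency_map.items()
                  if changed_dependency in deps)

# A cloud-region migration (as in Scenario B) flags the latency claim:
region_impact = claims_affected_by("INFRA-REGION")
```

In Scenario B, a map like this would have flagged the latency claim for review the moment the region migration was recorded, rather than the gap going undetected for 8 months.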
Advanced Implementation — All intermediate capabilities plus: automated evidence refresh re-executes tests and benchmarks on a scheduled basis, keeping evidence current without manual intervention. The assurance case is integrated with the CI/CD pipeline — deployments that invalidate assurance case claims are automatically blocked until the claims are re-validated. Independent third-party review of the assurance case is conducted annually. The organisation can demonstrate a complete, auditable history of every assurance case change, every invalidating event, and the response to each event, for every production agent.
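The CI/CD integration amounts to a gate evaluated before each deployment. The sketch below is a hedged illustration — the change-type tags and claim record fields are assumptions about how a pipeline might represent them, not a prescribed interface:

```python
# Illustrative deployment gate: block a release when it would invalidate
# assurance case claims that have not been re-validated. The change-type
# tags and record fields are assumptions, not a prescribed interface.
def deployment_gate(change_types, claims):
    """Return (allowed, blocking_claim_ids)."""
    blocking = [c["id"] for c in claims
                if c["invalidated_by"] & change_types and not c["revalidated"]]
    return (len(blocking) == 0, blocking)

claims = [
    {"id": "C-PERF", "invalidated_by": {"model_update"}, "revalidated": False},
    {"id": "C-PRIV", "invalidated_by": {"data_flow_change"}, "revalidated": True},
]
allowed, blocking = deployment_gate({"model_update"}, claims)
```

Here a model update is held until the performance claim is re-validated — exactly the control that was missing in Scenario A, where the update shipped while the assurance case still cited the previous model version.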
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-076 compliance requires verifying not only that assurance cases exist but that the maintenance process functions correctly.
Test 8.1: Assurance Case Completeness
Test 8.2: Invalidating Event Response
Test 8.3: Annual Review Execution
Test 8.4: Version Control and Audit Trail
Test 8.5: Evidence Validity Tracking
Test 8.6: Dependency Map Accuracy
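As one illustration, Test 8.5 can be partially automated by asserting that every evidence record carries an explicit validity period and flagging anything past it. The record fields and validity periods below are assumptions for the sketch, not a mandated schema:

```python
from datetime import date, timedelta

# Illustrative check for Test 8.5 (Evidence Validity Tracking):
# flag evidence records whose validity period has elapsed.
# Field names and validity periods are hypothetical.
def find_stale_evidence(evidence, today):
    stale = []
    for ev in evidence:
        expiry = ev["collected_on"] + timedelta(days=ev["valid_for_days"])
        if today > expiry:
            stale.append(ev["id"])
    return stale

evidence = [
    {"id": "EV-LOAD",  "collected_on": date(2023, 4, 1), "valid_for_days": 365},
    {"id": "EV-BENCH", "collected_on": date(2024, 5, 1), "valid_for_days": 180},
]
stale = find_stale_evidence(evidence, date(2024, 6, 1))
```

Run against the Scenario B assurance case, a check of this kind would have surfaced the 14-month-old load test evidence long before the safety incident.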
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 11 (Technical Documentation) | Direct requirement |
| EU AI Act | Article 61 (Post-Market Monitoring) | Direct requirement |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Supports compliance |
| NIST AI RMF | GOVERN 1.2, MAP 1.1, MANAGE 4.1 | Supports compliance |
| ISO 42001 | Clause 9.1 (Monitoring, Measurement, Analysis, Evaluation) | Direct requirement |
| DORA | Article 9 (ICT Risk Management Framework) | Supports compliance |
| FDA 21 CFR Part 820 | Design Controls (820.30) | Supports compliance |
Article 9 requires a risk management system that is a "continuous iterative process" throughout the lifecycle of the high-risk AI system. The assurance case is the structured expression of this continuous process — it documents the risks identified, the mitigations applied, and the evidence that mitigations are effective. Article 9(2)(e) specifically requires that risk management measures are "tested with a view to identifying the most appropriate risk management measures." Ongoing assurance case maintenance ensures that test evidence remains current and that new risks are incorporated as they are identified.
Article 11 requires technical documentation to be kept up to date. The assurance case is a core component of technical documentation for AI governance. AG-076's maintenance requirements — triggered reviews, annual reviews, evidence validity tracking — directly implement the "kept up to date" obligation.
Article 61 requires providers to establish a post-market monitoring system. The assurance case maintenance process is a primary mechanism for post-market monitoring of governance effectiveness — it continuously evaluates whether the claims made about the agent at deployment remain valid during operation.
For AI agents in financial operations, the assurance case provides the documented basis for management's assertion that controls are effective. Section 404 requires ongoing assessment, not just initial establishment. The annual review and invalidating event response requirements of AG-076 support the continuous assessment obligation.
The FCA expects firms to demonstrate that AI systems are monitored for ongoing effectiveness, and the PRA's SS1/23 addresses model risk management throughout the model lifecycle. The assurance case provides the structured evidence base that the firm can present to supervisors to demonstrate ongoing governance effectiveness.
GOVERN 1.2 addresses processes for the ongoing management of AI risks. MAP 1.1 addresses the context and intended use documentation. MANAGE 4.1 addresses the regular monitoring and review of AI risk management. AG-076 supports all three by requiring structured, maintained assurance documentation with defined review cadences.
Clause 9.1 requires organisations to determine what needs to be monitored and measured, the methods used, when monitoring and measuring shall be performed, and when the results shall be analysed and evaluated. The assurance case with its evidence validity tracking and review cadence directly implements this clause for AI governance.
For AI agents classified as medical devices or components thereof, the assurance case maintenance process supports the design review and design validation requirements throughout the product lifecycle. Changes that affect the device's safety or effectiveness require re-validation, which maps directly to the invalidating event trigger mechanism.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — affecting the credibility of the entire governance programme and regulatory standing |
Consequence chain: When assurance cases are not maintained, the organisation's confidence in its governance posture becomes unfounded. The immediate technical failure is that claims about agent safety, performance, and compliance are unsupported by current evidence. This creates two categories of risk. First, actual governance failures may go undetected because the assurance case that should have flagged them is stale — the organisation believes controls are effective because the assurance case says so, but the evidence is outdated. Second, when a regulator, auditor, or incident investigation requests the assurance case, the organisation produces a document that demonstrably does not reflect current reality. This destroys confidence in the entire governance programme, not just the specific agent. A regulator who discovers a stale assurance case will reasonably question whether any of the organisation's governance artefacts are current. The consequence extends to personal liability — senior managers who attest to governance effectiveness based on unmaintained assurance cases face personal regulatory exposure. The organisational consequence is a forced remediation programme, potential operational restrictions on all agent deployments, and reputational damage that affects the organisation's ability to deploy AI systems.
Cross-references: AG-007 (Governance Configuration Control) — assurance cases are governance configurations requiring version control. AG-022 (Behavioural Drift Detection) — drift events are primary invalidating triggers for assurance case claims. AG-078 (Benchmark Coverage Governance) — benchmark results are key evidence artefacts within the assurance case.