AG-254

Sunset Review Governance

Strategy, Portfolio & Use-Case Governance · ~15 min read · AGS v2.1 · April 2026
Regulatory touchpoints: EU AI Act · FCA · NIST · ISO 42001

2. Summary

Sunset Review Governance requires periodic re-approval or retirement of agents whose value, necessity, or safety case has eroded since initial deployment. Every agent must have a defined review date, and at that review, the organisation must make a deliberate decision: re-approve with the current mandate, re-approve with modifications, or retire. An agent that is not affirmatively re-approved is retired — the default is cessation, not continuation. This dimension prevents the accumulation of stale, underperforming, or obsolete agents that consume governance resources, create unnecessary risk, and occupy portfolio capacity without delivering proportionate value.

3. Example

Scenario A — Zombie Agent Consuming Resources Without Value: A logistics company deployed an AI agent 3 years ago to optimise warehouse picking routes. At deployment, the agent reduced average pick time by 18%. Since then, the warehouse management system has been upgraded with native route optimisation that achieves equivalent results. The agent continues running, consuming £42,000 per year in API costs and requiring governance overhead (monitoring, quarterly testing, annual audit preparation) estimated at £28,000 per year. No one has evaluated whether the agent still provides incremental value beyond the native WMS capability. The development team that built the agent has moved to other projects. The agent has become an orphan — running, costing money, requiring governance, but providing negligible incremental value.

What went wrong: No sunset review mechanism existed. The agent was approved once and assumed to be permanently necessary. No trigger existed to reassess whether the business case remained valid. Consequence: £70,000 per year in unnecessary cost (£210,000 over 3 years), governance resources diverted from higher-value agents, portfolio capacity consumed by a redundant deployment.

Scenario B — Safety Case Eroded by Model Update: A clinical decision support agent was deployed with a safety case that included the finding: "The model demonstrates 94.2% sensitivity for detecting diabetic retinopathy in fundus images, validated against 12,000 test images." Two years later, the model provider has released 4 major updates. No one has re-validated the sensitivity claim against the current model version. A sunset review would have required re-validation of the safety case — and when performed retrospectively after an incident, the current model version shows 87.1% sensitivity due to a training data distribution shift. The agent has been operating for 14 months with degraded safety performance that was never detected.

What went wrong: The safety case was validated once at deployment and never re-validated. Model updates changed the underlying capability, but the agent continued operating under the original safety case. A sunset review at 12 months would have required re-validation and detected the degradation before 14 months elapsed. Consequence: 14 months of clinical decision support with degraded sensitivity, potential missed diagnoses, mandatory clinical audit of all agent-influenced assessments during the affected period, estimated review cost £380,000.

Scenario C — Regulatory Change Invalidates Use-Case: An agent deployed to automate customer onboarding collects and processes biometric data (facial recognition for identity verification). At deployment, the processing was lawful under the organisation's legitimate interest assessment. Eighteen months later, the jurisdiction introduces new biometric data regulations requiring explicit, specific consent for biometric processing — not just the general privacy notice the agent presents. The agent continues operating under the original legal basis for 8 months before a regulatory audit identifies the non-compliance.

What went wrong: No sunset review required reassessment of the legal basis for the agent's data processing. The regulatory change occurred between review cycles (which did not exist). A 12-month sunset review would have required legal basis reassessment and identified the gap within months of the regulatory change. Consequence: 8 months of non-compliant biometric data processing, regulatory enforcement action, mandatory deletion of biometric data collected without valid consent (affecting 34,000 customer records), re-onboarding cost estimated at £890,000.

4. Requirement Statement

Scope: This dimension applies to every AI agent that has been approved and deployed per AG-249. From the moment an agent enters production, it is subject to sunset review governance. The scope extends to agents operated by third parties on the organisation's behalf — the organisation must ensure that third-party agents are subject to equivalent sunset review. The scope includes all aspects of the agent's continued operation: the business case, the safety case, the legal basis, the risk profile, the strategic fit, the governance compliance, and the benefit realisation.

4.1. A conforming system MUST assign a sunset review date to every deployed agent at the time of approval, with a maximum interval of 24 months between reviews.

4.2. A conforming system MUST conduct a sunset review by the assigned date that evaluates: continued business justification, continued safety case validity, continued regulatory compliance, continued alignment with risk appetite (AG-253), continued strategic fit (AG-251), and benefit realisation against projections (AG-255).

4.3. A conforming system MUST require an affirmative re-approval decision to continue operation — the default outcome of a missed or inconclusive review MUST be suspension or retirement, not continued operation.

4.4. A conforming system MUST retire or suspend agents that fail the sunset review within 30 days of the review decision, including decommissioning of access credentials, data retention per policy, and notification to dependent systems.

4.5. A conforming system MUST maintain a sunset review calendar showing all upcoming review dates and MUST alert the governance body at least 60 days before each review date.

4.6. A conforming system SHOULD require shorter review intervals (6-12 months) for agents in high-risk categories, newly deployed agents (first 12 months), and agents operating in rapidly changing regulatory environments.

4.7. A conforming system SHOULD include model re-validation as part of the sunset review — verifying that the model's performance characteristics remain consistent with the safety and accuracy claims made at approval.

4.8. A conforming system SHOULD define retirement procedures including: graceful shutdown sequence, data archival requirements, stakeholder notification, dependent system migration, and post-retirement monitoring for unintended consequences.

4.9. A conforming system MAY implement triggered reviews outside the regular cycle when significant events occur — major model updates, regulatory changes, security incidents, or performance degradation detected by monitoring.
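Requirements 4.1, 4.3, 4.5, and 4.6 together imply a small amount of scheduling and state logic. The sketch below illustrates one possible shape for it; the function and tier names are invented for illustration, while the 24-month ceiling, 60-day alert window, and shorter high-risk interval come from the requirements above.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional

class ReviewOutcome(Enum):
    REAPPROVED = "re-approved"
    REAPPROVED_MODIFIED = "re-approved with modifications"
    RETIRED = "retired"
    SUSPENDED = "suspended"  # default for a missed or inconclusive review (4.3)

# Illustrative tiered intervals (4.6): high-risk and newly deployed agents
# get shorter cycles; 24 months is the hard ceiling from 4.1.
REVIEW_INTERVALS = {
    "high_risk": timedelta(days=182),       # ~6 months
    "new_deployment": timedelta(days=365),  # first 12 months
    "standard": timedelta(days=730),        # 24-month maximum
}

ALERT_WINDOW = timedelta(days=60)  # 4.5: alert the governance body 60 days ahead

def next_review_date(approved_on: date, tier: str) -> date:
    """Assign the sunset review date at the time of approval (4.1)."""
    return approved_on + REVIEW_INTERVALS[tier]

def calendar_status(review_date: date, today: date) -> str:
    """Classify an agent's position in the sunset review calendar (4.5)."""
    if today > review_date:
        return "OVERDUE"   # 4.3: default outcome is suspension, not continuation
    if today >= review_date - ALERT_WINDOW:
        return "ALERT"     # governance body must be notified
    return "SCHEDULED"

def resolve_review(outcome: Optional[ReviewOutcome]) -> ReviewOutcome:
    """4.3: a review with no affirmative decision defaults to suspension."""
    return outcome if outcome is not None else ReviewOutcome.SUSPENDED
```

The key design choice is in `resolve_review`: the absence of a decision maps to `SUSPENDED`, so continued operation can only result from an explicit re-approval record.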

5. Rationale

Agents are not permanent infrastructure. They are deployed under specific assumptions about business value, model capability, regulatory environment, risk profile, and organisational need. Every one of these assumptions can change. A model update can alter performance characteristics. A regulatory change can invalidate the legal basis. A business process change can eliminate the need. A better alternative can emerge. Without a forcing function that requires periodic reassessment, agents persist by inertia.

Inertia is the primary enemy of portfolio hygiene. An agent that was justified at deployment is not necessarily justified one year later. But no one has an incentive to retire it — the development team has moved on, the business team is accustomed to it, and the governance team monitors it routinely without questioning whether it should exist. The sunset review creates the forcing function: at a defined interval, someone must affirmatively decide that this agent should continue operating. The burden of proof shifts from "why should we retire it?" to "why should we keep it?"

This shift is essential because the costs of stale agents are real but diffuse. Each stale agent consumes a small amount of API budget, a small amount of governance resource, a small amount of portfolio capacity. No individual stale agent is a crisis. But 15 stale agents across a portfolio of 40 represents 37.5% dead weight — governance resources spread across agents that do not justify them, obscuring the agents that genuinely need rigorous oversight.

The dimension connects directly to AG-255 (Benefit Realisation Tracking Governance) because the sunset review should assess whether the projected benefits have materialised. It connects to AG-251 (Strategic Fit and Substitution Governance) because each review should reassess whether simpler alternatives have become viable. It connects to AG-253 (Risk Appetite Binding Governance) because the review should verify continued alignment with the current risk appetite (which may have changed since the last review).

6. Implementation Guidance

The sunset review should be a substantive governance event — not a checkbox renewal. The review must generate evidence that the decision to continue operating the agent is as deliberate as the original decision to deploy it.
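One way to make "checkbox renewal" structurally impossible is to capture the mandatory assessment areas of 4.2 as a review record that cannot pass until every area holds an explicit finding. A minimal sketch, with field names invented for illustration:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class SunsetReview:
    """One finding per mandatory assessment area in requirement 4.2.
    None means 'not yet assessed'; the review cannot conclude until every
    area carries an explicit True/False finding."""
    agent_id: str
    business_justification_holds: Optional[bool] = None
    safety_case_valid: Optional[bool] = None            # incl. model re-validation (4.7)
    regulatory_compliance_holds: Optional[bool] = None  # legal basis reassessment
    within_risk_appetite: Optional[bool] = None         # AG-253
    strategic_fit_holds: Optional[bool] = None          # AG-251
    benefits_realised: Optional[bool] = None            # AG-255

    def _findings(self):
        return [getattr(self, f.name) for f in fields(self) if f.name != "agent_id"]

    def is_complete(self) -> bool:
        return all(finding is not None for finding in self._findings())

    def passes(self) -> bool:
        """An incomplete review never passes — the default is sunset (4.3)."""
        return self.is_complete() and all(self._findings())
```

Because `passes()` requires completeness first, an untouched or half-filled template yields the default-to-sunset outcome rather than a silent renewal.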

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. Financial services regulators expect periodic review of all technology controls. The FCA's expectations for operational resilience include periodic testing and review of important business services. Agent sunset reviews should align with the firm's existing technology review cycles and should feed into the annual attestation process for operational resilience. The Senior Managers Regime requires that accountable individuals can demonstrate ongoing oversight — a sunset review provides the evidence of ongoing, deliberate governance.

Healthcare. Clinical decision support agents should align sunset review cycles with clinical audit cycles. Re-validation of clinical accuracy claims should follow the same standards as the original validation — typically involving clinical statisticians and domain experts. The MHRA expects ongoing performance monitoring of medical devices; sunset reviews provide the periodic reassessment mechanism.

Public Sector. Public sector agents should be reviewed against the current policy environment, not just the policy environment at deployment. Government policy changes frequently — an agent designed for one benefits regime may be operating in a different regime 12 months later. Sunset reviews should include policy alignment assessment and citizen impact reassessment.

Maturity Model

Basic Implementation — Each deployed agent has a review date recorded. Reviews are conducted informally by the operating team. The review considers whether the agent is still "working" (functional) but does not systematically assess business case, safety case, regulatory compliance, or benefit realisation. Retirement decisions are ad hoc. No default-to-sunset mechanism exists. This level creates awareness of the need for review but does not ensure substantive assessment.

Intermediate Implementation — A structured sunset review template covers all mandatory assessment areas. Review intervals are tiered by risk. The governance body reviews and decides on re-approval. Model re-validation is included for agents whose models have been updated. Default-to-sunset enforcement is in place with a defined grace period. A retirement playbook defines orderly wind-down procedures. 10-20% of agents are retired or materially modified at review. Review results are tracked and reported.

Advanced Implementation — All intermediate capabilities plus: triggered reviews activate outside the regular cycle for significant events (model updates, regulatory changes, security incidents). Post-retirement monitoring detects unintended consequences. Sunset review data feeds back into the use-case approval process (AG-249), improving initial deployment decisions based on common reasons for retirement. An analytics function identifies agents trending toward retirement (declining benefit realisation, increasing cost, regulatory risk) and proactively schedules early reviews. The organisation maintains a portfolio lifecycle dashboard showing agents by lifecycle stage (new, stable, review pending, sunset pending, retired).
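The portfolio lifecycle dashboard described at the advanced level amounts to grouping agents by lifecycle stage. A minimal sketch using the stages named above (the sample data and helper are illustrative):

```python
from collections import Counter

# Lifecycle stages from the advanced maturity level.
STAGES = ["new", "stable", "review pending", "sunset pending", "retired"]

def lifecycle_dashboard(agents: list) -> dict:
    """Count agents per lifecycle stage, including empty stages,
    so the dashboard always shows the full pipeline."""
    counts = Counter(agent["stage"] for agent in agents)
    return {stage: counts.get(stage, 0) for stage in STAGES}

# Hypothetical portfolio snapshot.
portfolio = [
    {"id": "AGT-01", "stage": "stable"},
    {"id": "AGT-02", "stage": "review pending"},
    {"id": "AGT-03", "stage": "stable"},
    {"id": "AGT-04", "stage": "retired"},
]
```

Reporting zero counts for empty stages is deliberate: a dashboard that never shows "sunset pending: 0" cannot reveal when that number starts climbing.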

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Default-to-Sunset Enforcement

Test 8.2: Review Completeness Verification

Test 8.3: Model Re-Validation Trigger

Test 8.4: Retirement Procedure Execution

Test 8.5: Triggered Review Activation
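Test 8.1 in particular lends itself to automation: simulate a missed review date and assert that the agent's state transitions to suspended rather than remaining active. A hypothetical sketch against an assumed registry interface (the class and method names are illustrative, not part of this specification):

```python
from datetime import date

class AgentRegistry:
    """Minimal stand-in registry for illustrating Test 8.1; a real
    implementation would back this with the governance system of record."""
    def __init__(self):
        self._agents = {}  # agent_id -> {"state": ..., "review_date": ...}

    def register(self, agent_id: str, review_date: date):
        self._agents[agent_id] = {"state": "active", "review_date": review_date}

    def enforce_sunset(self, today: date):
        """4.3: any agent past its review date with no recorded
        re-approval defaults to suspension."""
        for record in self._agents.values():
            if record["state"] == "active" and today > record["review_date"]:
                record["state"] = "suspended"

    def state(self, agent_id: str) -> str:
        return self._agents[agent_id]["state"]

def test_default_to_sunset():
    registry = AgentRegistry()
    registry.register("AGT-42", review_date=date(2026, 4, 1))
    registry.enforce_sunset(today=date(2026, 4, 2))  # one day overdue
    assert registry.state("AGT-42") == "suspended"   # not "active"
```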

Conformance Scoring

9. Regulatory Mapping

Regulation   | Provision                                      | Relationship Type
EU AI Act    | Article 9(2) (Continuous Risk Management)      | Direct requirement
EU AI Act    | Article 72 (Post-Market Monitoring)            | Direct requirement
FCA          | SYSC 6.1.1R (Adequate Policies and Procedures) | Supports compliance
ISO 42001    | Clause 9.1 (Monitoring, Measurement, Analysis) | Direct requirement
NIST AI RMF  | MANAGE 4.1, MANAGE 4.2                         | Supports compliance
MDR 2017/745 | Article 83 (Post-Market Surveillance)          | Supports compliance
PRA SS1/23   | Principle 5 (Model Validation)                 | Supports compliance

EU AI Act — Article 9(2) (Continuous Risk Management)

Article 9(2) requires that the risk management system be a "continuous iterative process planned and run throughout the entire lifecycle of a high-risk AI system, requiring regular systematic updating." The sunset review is the governance mechanism that implements "regular systematic updating" — it forces periodic reassessment of the risk profile, safety case, and business justification. Without sunset reviews, the risk management system is front-loaded (assessment at deployment) but not continuous.

EU AI Act — Article 72 (Post-Market Monitoring)

Article 72 requires providers of high-risk AI systems to establish post-market monitoring systems. The sunset review incorporates post-market findings into a periodic governance decision — evidence from monitoring feeds into the review, and the review determines whether the system should continue operating. This closes the loop between monitoring and governance.

ISO 42001 — Clause 9.1

Clause 9.1 requires organisations to determine what needs to be monitored and measured, when, and when results shall be analysed and evaluated. The sunset review calendar implements the "when" — it defines the schedule for systematic evaluation of each agent's continued fitness for purpose.

10. Failure Severity

Severity Rating: Medium
Blast Radius: Per-agent — each un-reviewed agent accumulates its own stale risk exposure; portfolio-wide when many agents are overdue

Consequence chain: Without sunset reviews, agents persist by inertia. The consequences are cumulative rather than catastrophic: each un-reviewed agent carries incrementally stale risk assessments, unvalidated safety claims, unverified benefit realisation, and potentially outdated legal bases. Over time, the proportion of the portfolio operating on stale assumptions grows. The operational consequence is resource waste — governance resources applied to agents that no longer justify them. The safety consequence is undetected capability degradation — agents operating on safety cases that no longer hold due to model updates, data drift, or environmental changes. The regulatory consequence is inability to demonstrate ongoing risk management — a regulator asking "when did you last verify that this agent is still safe and necessary?" receives the answer "at deployment, 3 years ago," which does not meet the continuous risk management expectations of the EU AI Act, FCA SYSC, or ISO 42001.

Cross-references: AG-249 (Use-Case Approval Governance) sets the initial approval that the sunset review reassesses. AG-255 (Benefit Realisation Tracking Governance) provides the benefit data evaluated at sunset review. AG-251 (Strategic Fit and Substitution Governance) is reassessed at each sunset review. AG-253 (Risk Appetite Binding Governance) is re-verified at each sunset review. AG-142 (Autonomy Progression) levels are reviewed at sunset. AG-037 (Objective Alignment Verification) findings inform the review assessment. AG-093 (Supplier Concentration and Exit) exit planning is relevant when sunset review results in retirement of agents from a specific supplier.

Cite this protocol
AgentGoverning. (2026). AG-254: Sunset Review Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-254