Benefit Realisation Tracking Governance requires organisations to measure whether the efficiency, quality, control, or cost gains promised at the time of agent approval are actually achieved in operation. Every agent was approved based on a projected benefit — faster processing, reduced error rates, lower costs, improved customer experience, better compliance coverage. This dimension requires that those projections are tracked against actuals, that shortfalls are investigated and acted upon, and that the benefit data feeds back into the approval and sunset review processes to improve future decision-making. An agent that was approved based on projected savings of £200,000 per year but delivers only £40,000 per year is consuming governance resources, risk capacity, and portfolio space without proportionate justification.
Scenario A — Projected Savings Never Materialised: A mid-size insurer deploys an AI agent to automate claims triage, projecting that the agent will reduce average triage time from 45 minutes to 8 minutes and enable redeployment of 6 FTEs (annual saving: £312,000). Twelve months later, the agent has reduced average triage time to 22 minutes — a meaningful improvement but less than half the projected benefit. The 6 FTEs have not been redeployed; they now spend their time reviewing agent triage decisions rather than performing triage themselves. The actual saving is approximately £40,000 per year (reduced overtime). No one has compared projected versus actual benefits because no tracking mechanism exists. The agent continues to be cited in board presentations as delivering £312,000 in annual savings.
What went wrong: The projected benefit was used to justify approval but was never tracked against actuals. The gap between projection and reality was not detected, investigated, or reported. The board believes the agent is delivering nearly eight times the value it actually delivers. Consequence: £272,000 annual shortfall against projections, misallocated resources (6 FTEs reviewing rather than redeployed), distorted board understanding of agent portfolio value, and future agent approvals based on similarly optimistic projections that will not be validated.
Scenario B — Benefit Eroded Over Time Without Detection: A legal department deploys an AI agent to draft standard contract clauses, projecting 70% time saving on first drafts. In the first quarter, the agent achieves 65% time saving — close to projection. Over the next 12 months, model updates change the agent's drafting style, legal requirements evolve, and the organisation's clause library expands. By month 18, the time saving has declined to 25% because lawyers spend more time correcting the agent's drafts to match current requirements. No benefit tracking mechanism detects the decline.
What went wrong: Benefit was measured at deployment (or shortly after) but not continuously. The gradual erosion of benefit was invisible because no tracking mechanism existed to detect trends. Consequence: by month 18, the agent delivers less than half its projected value, but governance resources, risk exposure, and cost continue at the originally projected level. A sunset review (AG-254) without benefit data would not detect the decline.
Scenario C — Benefit Tracking Reveals Unexpected Value: A customer service team deploys an AI agent to handle routine billing queries, projecting a 40% reduction in call centre volume. After 6 months, the call centre volume reduction is only 22%. However, benefit tracking also reveals unexpected positive outcomes: customer satisfaction scores for billing queries handled by the agent are 18% higher than human-handled queries, and the agent's interaction logs have identified 3 systematic billing errors in the organisation's invoicing system that human agents had not detected. The tracked benefits — while different from projections — provide a more nuanced justification for continued operation and inform the sunset review with data-driven evidence.
What went right: Benefit tracking captured not just the projected metric but broader value indicators. The organisation could make an informed decision about continued operation based on actual evidence rather than the original projection alone.
Scope: This dimension applies to every AI agent approved under AG-249 that included projected benefits as part of the approval justification. If an agent was approved without any projected benefit — which should be rare, as most approvals require a business case — this dimension requires that measurable benefit criteria be defined retrospectively within 90 days of deployment. The scope extends to all categories of benefit: financial (cost savings, revenue generation), operational (time savings, throughput improvement, error reduction), quality (accuracy improvement, consistency, customer satisfaction), and strategic (competitive advantage, regulatory compliance improvement, risk reduction).
4.1. A conforming system MUST define measurable benefit criteria for each deployed agent at the time of approval, including specific metrics, baseline measurements, projected values, and measurement methodology.
4.2. A conforming system MUST measure actual benefits against projections at defined intervals — at minimum, at 6 months post-deployment and at each sunset review (AG-254).
4.3. A conforming system MUST investigate benefit shortfalls exceeding a defined threshold (e.g., actual benefits below 50% of projected benefits) and determine whether the shortfall is remediable or indicates that the agent should be retired or modified.
4.4. A conforming system MUST report benefit realisation data to the governance body at least annually, showing projected versus actual benefits across the portfolio.
4.5. A conforming system MUST feed benefit realisation data into the sunset review process (AG-254) as a mandatory input to the re-approval decision.
4.6. A conforming system SHOULD capture the total cost of agent operation — not just direct costs (API usage, hosting) but also governance costs (monitoring, testing, audit) — to enable net benefit calculation.
4.7. A conforming system SHOULD track benefit trends over time to detect gradual erosion before it becomes critical.
4.8. A conforming system SHOULD use historical benefit realisation data to calibrate projections for future agent approvals — organisations that consistently over-project benefits should apply a calibration factor to future projections.
4.9. A conforming system MAY implement a benefit realisation dashboard showing real-time or near-real-time benefit metrics for each agent and the portfolio as a whole.
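The shortfall trigger in 4.3 and the net benefit calculation in 4.6 can be sketched as follows. This is an illustrative sketch only: the function names are assumptions, and the 50% threshold is the example value from 4.3, not a normative figure.

```python
# Illustrative sketch of requirements 4.2, 4.3, and 4.6.
# Function names and the 0.5 threshold are assumptions, not part of AG-255.

def realisation_rate(projected: float, actual: float) -> float:
    """Fraction of the projected benefit actually delivered (4.2)."""
    if projected <= 0:
        raise ValueError("projected benefit must be positive")
    return actual / projected

def needs_investigation(projected: float, actual: float,
                        threshold: float = 0.5) -> bool:
    """4.3: flag shortfalls where actuals fall below the defined
    fraction (here, the example 50%) of projected benefits."""
    return realisation_rate(projected, actual) < threshold

def net_benefit(gross_benefit: float, direct_costs: float,
                governance_costs: float) -> float:
    """4.6: net benefit deducts governance costs (monitoring,
    testing, audit), not just direct API and hosting costs."""
    return gross_benefit - direct_costs - governance_costs

# Scenario A figures: £312,000 projected, roughly £40,000 actual.
print(needs_investigation(312_000, 40_000))  # True — ~13% realised
```

Applied to Scenario A, the check fires immediately: a 13% realisation rate is well below any plausible threshold, which is exactly the detection the insurer lacked.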
Agent deployments are investments. Like all investments, they should be tracked against their projected returns. The business case that justified the approval included specific projections — time savings, cost reductions, quality improvements. If those projections are not tracked, the organisation has no way to know whether its agent portfolio is delivering value proportionate to its cost and risk.
The absence of benefit tracking creates two problems. First, underperforming agents persist because no one measures their underperformance. They consume governance resources, occupy portfolio capacity, and create risk exposure without delivering proportionate value. Second, future agent approvals are based on uncalibrated projections. If the organisation's agents consistently deliver 40% of projected benefits, but projections are never compared to actuals, every future approval is evaluated against similarly inflated projections. The result is a growing portfolio of agents justified by optimistic projections that are never validated.
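The calibration idea in 4.8 can be made concrete. The sketch below is a minimal illustration: the choice of the median historical realisation rate as the calibration statistic is an assumption, and an organisation may reasonably prefer a mean, a weighted average, or per-category factors.

```python
# Hedged sketch of the projection calibration pattern (requirement 4.8).
# Using the median historical realisation rate is an assumption.

from statistics import median

def calibration_factor(historical_rates: list[float]) -> float:
    """Median of past actual/projected ratios across the portfolio."""
    if not historical_rates:
        return 1.0  # no history yet: take projections at face value
    return median(historical_rates)

def calibrated_projection(raw_projection: float,
                          historical_rates: list[float]) -> float:
    """Discount a new business-case projection by the portfolio's
    demonstrated tendency to over-project."""
    return raw_projection * calibration_factor(historical_rates)

# A portfolio that has historically delivered 35-45% of projected
# benefits would discount a new £200,000 projection accordingly:
rates = [0.35, 0.40, 0.45]
print(calibrated_projection(200_000, rates))  # 80000.0
```

The point is not the arithmetic but the feedback loop: approval decisions are evaluated against what the organisation has actually achieved, not against what business cases habitually promise.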
Benefit tracking also provides essential input to the sunset review (AG-254). A sunset review without benefit data is assessing whether to continue operating the agent based on whether it is "working" (functional) rather than whether it is "worth it" (delivering proportionate value). AG-255 provides the "worth it" data.
This dimension connects to AG-045 (Economic Incentive Alignment Verification) because the economic benefit is a key tracking dimension. It connects to AG-251 (Strategic Fit and Substitution Governance) because benefit shortfalls may indicate that a simpler alternative has become preferable. It connects to AG-257 (Use-Case Prioritisation Governance) because benefit realisation data from existing agents should inform prioritisation of future agent investments.
Benefit tracking must be designed into the agent lifecycle from the start — not retrofitted after deployment. The key to effective benefit tracking is defining measurable criteria before deployment, establishing baselines before the agent changes the process, and measuring consistently.
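A minimal sketch of the record that 4.1 calls for, populated with Scenario A's figures, might look like the following. The dataclass shape, field names, and dates are illustrative assumptions; what matters is that metric, baseline, projection, and methodology are all captured before deployment.

```python
# Illustrative benefit-criterion record (requirement 4.1).
# Field names and dates are assumptions for illustration only.

from dataclasses import dataclass, field
from datetime import date

@dataclass
class BenefitCriterion:
    """Defined at approval time, before the agent changes the process."""
    metric: str                  # e.g. "average triage time (minutes)"
    baseline: float              # measured BEFORE deployment
    projected: float             # value the business case commits to
    methodology: str             # how the metric is measured
    baseline_date: date
    measurements: list[tuple[date, float]] = field(default_factory=list)

# Scenario A, as it should have been recorded at approval:
triage_time = BenefitCriterion(
    metric="average triage time (minutes)",
    baseline=45.0,
    projected=8.0,
    methodology="mean over all claims triaged in the measurement month",
    baseline_date=date(2024, 1, 1),  # hypothetical date
)
# Twelve months on, the 6-month and annual measurements (4.2) append here:
triage_time.measurements.append((date(2025, 1, 1), 22.0))
```

With the baseline and projection recorded up front, the twelve-month measurement of 22 minutes is immediately legible as a shortfall rather than being lost, as it was in Scenario A.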
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Financial services firms should align agent benefit tracking with their existing return-on-investment frameworks for technology investments. The benefit metrics should be auditable and should align with financial reporting standards. For agents that affect customer outcomes (advice, product recommendation, complaint handling), customer outcome metrics should be tracked alongside financial benefits.
Healthcare. Healthcare benefit tracking should include clinical outcome metrics where applicable. A clinical decision support agent's benefit is not just time saving — it includes diagnostic accuracy, appropriate referral rates, and patient outcome improvements. These metrics require clinical governance oversight and may require longer measurement periods (12-24 months) to achieve statistical significance.
Public Sector. Public sector benefit tracking should include value-for-money metrics required by HM Treasury's Green Book and Managing Public Money guidance. Benefits should be categorised as cash-releasing (actual budget savings), non-cash-releasing (time savings that improve service quality but do not reduce headcount), and qualitative (improved citizen experience, better compliance). The National Audit Office will expect to see benefit realisation evidence for significant agent deployments.
Basic Implementation — Projected benefits are stated in the use-case approval but are not tracked post-deployment. No baseline measurements are established before deployment. Benefit data is not available at sunset reviews. The organisation cannot quantify the actual value delivered by its agent portfolio. This level creates a record of intent but provides no accountability.
Intermediate Implementation — Measurable benefit criteria with baselines are established before deployment. Benefits are measured at 6 months, 12 months, and at each sunset review. Shortfall thresholds trigger investigation. Net benefit calculations include governance costs. Benefit data is a mandatory input to sunset reviews. Portfolio benefit reports are provided to the governance body annually. Realisation rates are tracked and reported.
Advanced Implementation — All intermediate capabilities plus: real-time benefit dashboards for individual agents and the portfolio. Benefit trend tracking detects erosion before threshold breach. Projection calibration adjusts future projections based on historical realisation rates. Benefit data feeds into AG-257 (Use-Case Prioritisation Governance) to inform future investment decisions. The organisation can demonstrate the net return on its agent portfolio to the board with the same rigour as any other technology investment.
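The trend tracking that distinguishes the advanced level (and requirement 4.7) can be as simple as a slope over periodic measurements. The sketch below uses an ordinary least-squares slope on Scenario B's declining time saving; a real implementation would likely use more robust statistics and confidence intervals.

```python
# Minimal trend-erosion sketch (requirement 4.7). The least-squares
# slope is one simple choice of statistic, not a prescribed method.

def benefit_slope(series: list[float]) -> float:
    """Least-squares slope of benefit measurements per period.
    A sustained negative slope signals erosion before any single
    measurement breaches the shortfall threshold in 4.3."""
    n = len(series)
    if n < 2:
        return 0.0
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Scenario B: time saving declining from 65% toward 25% over six quarters.
quarterly_saving = [0.65, 0.58, 0.50, 0.42, 0.33, 0.25]
print(round(benefit_slope(quarterly_saving), 3))  # -0.081
```

A slope of roughly minus eight percentage points per quarter would have flagged the legal department's agent well before month 18, when the erosion finally halved its value.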
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Benefit Specification Completeness at Approval
Test 8.2: Baseline Measurement Timing
Test 8.3: Shortfall Investigation Trigger
Test 8.4: Sunset Review Benefit Data Inclusion
Test 8.5: Net Benefit Calculation Accuracy
| Regulation | Provision | Relationship Type |
|---|---|---|
| HM Treasury | Managing Public Money — Value for Money | Direct requirement |
| HM Treasury | Green Book — Appraisal and Evaluation | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| ISO 42001 | Clause 9.1 (Monitoring, Measurement, Analysis) | Supports compliance |
| NIST AI RMF | MAP 2.2, MANAGE 4.2 | Supports compliance |
| FCA Consumer Duty | Outcome Monitoring | Supports compliance |
| NAO Expectations | Technology Investment Reporting | Supports compliance |
Managing Public Money requires that public expenditure delivers value for money. For public sector organisations deploying AI agents, this means the projected benefits must be tracked against actuals and the investment must be justified by demonstrable returns. The National Audit Office will expect to see benefit realisation evidence for any significant agent deployment. A public sector organisation that cannot demonstrate what benefit its agents have delivered is vulnerable to a value-for-money challenge.
The Green Book requires that government investments are appraised before commitment and evaluated after implementation. Benefit realisation tracking is the evaluation component — it closes the loop between the appraisal (which projected benefits) and the operational reality (which may differ). The Green Book explicitly requires post-implementation review to "learn lessons for future appraisals" — this maps directly to the projection calibration pattern.
The Consumer Duty requires firms to monitor whether their products and services deliver good outcomes for customers. For agents that interact with customers (advice agents, complaint handling agents, onboarding agents), benefit tracking should include customer outcome metrics: are customers receiving better advice, faster complaint resolution, more appropriate products? Benefit tracking for customer-facing agents is a Consumer Duty compliance tool, not just an operational efficiency measure.
| Field | Value |
|---|---|
| Severity Rating | Medium |
| Blast Radius | Portfolio-wide — underperforming agents collectively waste resources and distort strategic decision-making |
Consequence chain: Without benefit tracking, the organisation cannot distinguish high-value agents from low-value agents. Resources are allocated based on projections rather than evidence. Underperforming agents persist because no one measures their underperformance. Overperforming agents receive no additional investment because no one measures their overperformance. The portfolio becomes an unmanaged collection of investments with unknown returns. The strategic consequence is that future agent investments are approved based on uncalibrated projections — each new agent is justified by optimistic projections that will never be verified, creating a cycle of over-investment in a technology category whose actual returns are unknown. The financial consequence is quantifiable: the gap between projected and actual benefits, multiplied across the portfolio, represents wasted expenditure. For public sector organisations, this waste is a failure of the value-for-money obligation.
Cross-references: AG-249 (Use-Case Approval Governance) is where benefit projections are first defined. AG-254 (Sunset Review Governance) uses benefit realisation data as a mandatory input to re-approval decisions. AG-251 (Strategic Fit and Substitution Governance) is relevant because benefit shortfalls may indicate that a simpler alternative has become preferable. AG-045 (Economic Incentive Alignment Verification) ensures that economic benefits include all cost categories. AG-257 (Use-Case Prioritisation Governance) uses benefit data to inform future investment priorities.