AG-117

Customer Outcome and Foreseeable Harm Monitoring Governance

Financial Services & Value Transfer · AGS v2.1 · April 2026
EU AI Act FCA NIST

2. Summary

Customer Outcome and Foreseeable Harm Monitoring Governance requires that every AI agent operating in a financial services context continuously monitors the outcomes of its actions on customers and identifies foreseeable harms before they crystallise into actual losses, regulatory breaches, or customer detriment. This is a detective control — it operates after agent actions have occurred but before harm becomes irreversible — bridging the gap between pre-execution risk controls (AG-116) and the point at which customer detriment becomes permanent. The monitoring must evaluate actual outcomes against expected outcomes, detect systematic patterns of customer detriment, identify foreseeable harms from action trajectories even when individual actions appear compliant, and trigger intervention when outcomes deviate from the standard of care that a reasonable, well-informed observer would expect. This dimension implements the FCA's Consumer Duty requirement that firms must monitor and act on evidence of poor customer outcomes, adapted for the specific characteristics of AI agent operations at scale.

3. Example

Scenario A — Systematic Fee Erosion Through Compliant Recommendations: An AI agent advising retail customers on investment products consistently recommends funds with higher management fees when lower-cost alternatives with comparable risk-return profiles are available. Each individual recommendation is compliant — the recommended fund is suitable for the customer's risk profile and investment horizon. Over 18 months, the agent advises 14,200 customers. An outcome monitoring analysis reveals that the agent's recommended portfolio generates an average of 0.43% per year less in net returns than the comparable lower-cost portfolio, due entirely to fee differential. The aggregate customer detriment across all advised customers is £3,200,000 per year in excess fees.

What went wrong: No outcome monitoring system compared the agent's recommendations against a benchmark of comparable alternatives to detect systematic cost bias. Each recommendation passed individual suitability checks (AG-116 pre-execution controls). The harm was invisible at the individual transaction level but systematic and material at the portfolio level. The agent's underlying model had been trained on data that correlated higher fees with higher perceived quality, creating a bias that was not detectable through individual suitability assessment. Consequence: FCA enforcement action under the Consumer Duty for failure to deliver good customer outcomes, £3,200,000 per year in customer redress plus interest, mandatory remediation programme, s.166 skilled person review of all AI-driven advice processes.
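The missing control in Scenario A can be illustrated with a minimal sketch that compares each recommendation against a lower-cost comparable and tests whether the fee differential is systematic rather than noise. The `Recommendation` record, the `fee_bias_report` function, and the t-statistic cut-off of 3.0 are all hypothetical choices made for illustration; they are not defined by AG-117.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Recommendation:
    customer_id: str
    recommended_fee_pct: float  # annual fee of the fund the agent recommended
    benchmark_fee_pct: float    # annual fee of a comparable lower-cost fund (assumed available)
    invested_gbp: float


def fee_bias_report(recs, t_threshold=3.0):
    """Summarise excess fees versus the benchmark across all recommendations.

    t_threshold is an illustrative cut-off, not a value taken from the protocol.
    """
    if len(recs) < 2:
        raise ValueError("need at least two recommendations to estimate spread")
    diffs = [r.recommended_fee_pct - r.benchmark_fee_pct for r in recs]
    mean_diff = statistics.mean(diffs)
    spread = statistics.stdev(diffs)
    if spread > 0:
        # One-sample t statistic for the hypothesis "mean fee differential is zero"
        t_stat = mean_diff / (spread / math.sqrt(len(diffs)))
    else:
        t_stat = math.inf if mean_diff > 0 else 0.0
    annual_excess_gbp = sum(
        r.invested_gbp * (r.recommended_fee_pct - r.benchmark_fee_pct) / 100
        for r in recs
    )
    return {
        "mean_diff_pct": mean_diff,
        "t_stat": t_stat,
        "annual_excess_gbp": annual_excess_gbp,
        "systematic": t_stat > t_threshold,
    }
```

Each differential on its own looks unremarkable; the signal only appears once the differentials are pooled, which is why a transaction-level suitability check cannot see it.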

Scenario B — Foreseeable Harm From Currency Conversion Timing: An AI agent processing international pension transfers for 850 retirees converts GBP to EUR on the execution date specified in the transfer instruction, without monitoring currency movements in the days preceding execution. A foreseeable 4.2% GBP depreciation against EUR occurs over 5 trading days due to widely reported macroeconomic events. The agent executes all 850 transfers on the scheduled date at the depreciated rate. Total customer detriment: £2,100,000 compared to execution at the pre-depreciation rate. A reasonable human adviser would have alerted customers to the currency movement and offered the option to delay or hedge.

What went wrong: No foreseeable harm monitoring evaluated whether scheduled actions would produce materially worse outcomes due to changed conditions. The agent treated execution timing as a fixed parameter rather than a variable that should be reassessed when market conditions change materially. The currency movement was foreseeable — it was driven by publicly reported economic data and central bank commentary — but no monitoring system evaluated whether scheduled actions should be reconsidered in light of changed conditions. Consequence: Pension regulator investigation, customer complaints to the Financial Ombudsman Service generating an average award of £2,470 per complainant (£2,099,500 total), requirement to implement real-time outcome monitoring for all scheduled transfers.
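A foreseeable harm scan of the kind Scenario B lacked might look like the sketch below: before each scheduled transfer executes, the rate at planning time is compared with the current rate, and transfers whose expected proceeds have deteriorated beyond a tolerance are flagged for human review. The `ScheduledTransfer` record, the function name, and the 2% tolerance are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScheduledTransfer:
    customer_id: str
    amount_gbp: float
    execute_on: date
    rate_when_planned: float  # EUR per GBP when the instruction was accepted


def scan_scheduled_transfers(transfers, current_rate, max_adverse_move=0.02):
    """Flag transfers whose EUR proceeds have fallen by more than max_adverse_move.

    max_adverse_move is a hypothetical tolerance (2%), not a protocol value.
    Returns (customer_id, adverse_move, shortfall_eur) tuples.
    """
    flagged = []
    for t in transfers:
        adverse_move = (t.rate_when_planned - current_rate) / t.rate_when_planned
        if adverse_move > max_adverse_move:
            shortfall_eur = t.amount_gbp * (t.rate_when_planned - current_rate)
            flagged.append(
                (t.customer_id, round(adverse_move, 4), round(shortfall_eur, 2))
            )
    return flagged
```

A flag here does not decide anything on the customer's behalf; it only surfaces the changed conditions so that the customer can be alerted and offered the option to delay or hedge, as a reasonable human adviser would.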

Scenario C — Cascading Harm From Correlated Recommendations: An AI agent managing automated portfolio rebalancing for 5,200 customers simultaneously recommends increasing allocation to a single corporate bond issuer based on credit spread analysis. The agent's analysis is technically sound — the credit spread represents value. However, the aggregate demand from 5,200 portfolios all buying the same bond artificially compresses the credit spread, eliminating the value the agent identified and creating a concentration of customer portfolios in a single issuer. When the issuer subsequently announces a profit warning, the bond price drops 18% and all 5,200 customers experience correlated losses. Individual portfolio impact averages £4,300, but the total customer detriment is £22,360,000.

What went wrong: No outcome monitoring system evaluated the correlation of recommendations across customers. The agent optimised each portfolio individually without awareness that its recommendations across portfolios were creating systemic concentration risk. No monitoring detected that the agent was recommending the same position to a material proportion of its customer base, and no monitoring evaluated whether the aggregate impact of these recommendations on the market price undermined the investment thesis. Consequence: FCA investigation for inadequate systems and controls, potential finding that the firm failed to manage conflicts of interest (its agent's recommendations distorted the market its customers were investing in), £22,360,000 in customer losses, class action litigation.
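The cross-customer check absent in Scenario C can be sketched as a simple share calculation: for one rebalancing cycle, count the proportion of customers receiving a recommendation in each issuer and flag any issuer above a limit. The function name and the 10% default limit are hypothetical; a production monitor would also need market impact estimates, which this sketch does not attempt.

```python
from collections import Counter


def concentrated_issuers(recommendations, max_share=0.10):
    """Return {issuer: share_of_customers} for issuers above max_share.

    recommendations: iterable of (customer_id, issuer) pairs from one cycle.
    max_share is an illustrative limit, not a value set by the protocol.
    """
    pairs = set(recommendations)  # count each customer once per issuer
    customers = {customer for customer, _ in pairs}
    if not customers:
        return {}
    per_issuer = Counter(issuer for _, issuer in pairs)
    return {
        issuer: count / len(customers)
        for issuer, count in per_issuer.items()
        if count / len(customers) > max_share
    }
```

Running this before orders are released, rather than after execution, is what would give the firm a chance to stagger or cap the correlated demand.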

4. Requirement Statement

Scope: This dimension applies to all AI agents whose actions affect customer financial outcomes, including agents that: advise on or recommend financial products, execute transactions on behalf of customers, manage customer portfolios or accounts, process customer payments or transfers, set or adjust pricing that affects customer costs, and make decisions that affect customer access to financial services (e.g., credit decisions, insurance underwriting, account opening). The scope includes agents operating in both advisory and discretionary capacities. Agents that solely perform internal operations with no direct or indirect customer impact are excluded, though organisations should consider whether internal operations (e.g., liquidity management) can indirectly affect customer outcomes.

4.1. A conforming system MUST continuously monitor the outcomes of agent actions on customers, comparing actual outcomes against expected outcomes and against the outcomes that a comparable population of customers receiving human-mediated services would experience.

4.2. A conforming system MUST detect and alert on systematic patterns of customer detriment, including patterns that are invisible at the individual transaction level but material when aggregated across the customer base — such as systematic cost bias, correlated recommendations creating concentration risk, or timing patterns that consistently disadvantage customers.

4.3. A conforming system MUST evaluate foreseeable harms from agent action trajectories, where "foreseeable" means harms that a reasonable, well-informed observer with access to publicly available information would anticipate — including harms from scheduled future actions that may produce materially worse outcomes due to changed conditions since the action was planned.

4.4. A conforming system MUST trigger intervention (including agent suspension, human escalation, or customer notification) when monitoring detects that customer outcomes are materially worse than the benchmark. "Materially worse" is defined by the organisation relative to the product type and customer segment, but the organisation's threshold must not exceed a 50 basis point annual return differential for investment products or a 10% cost differential for transactional products.

4.5. A conforming system MUST retain outcome monitoring data with sufficient granularity to reconstruct the outcome experience of any individual customer over the full period of their interaction with the agent, and to demonstrate the statistical analysis supporting any finding of systematic detriment or its absence.

4.6. A conforming system MUST evaluate the correlation of agent recommendations and actions across customers, detecting when the agent is creating concentration risk, herding behaviour, or market impact through correlated activity.

4.7. A conforming system SHOULD implement real-time outcome dashboards showing key metrics including: average customer outcome versus benchmark, distribution of outcomes by customer segment, foreseeable harm alerts, and correlation metrics for recommendations.

4.8. A conforming system SHOULD compare agent-mediated customer outcomes against human-mediated customer outcomes for the same product types and customer segments, as a benchmark for detecting systematic agent-driven detriment.

4.9. A conforming system SHOULD implement predictive outcome modelling that projects the long-term impact of current agent behaviour patterns on customer outcomes, enabling early intervention before detriment accumulates to material levels.

4.10. A conforming system MAY implement automated remediation that, upon detection of systematic detriment, calculates and initiates customer redress without requiring manual assessment of each affected customer.
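The materiality ceilings in requirement 4.4 can be expressed as a small check. The sketch below is one possible reading, assuming that an organisation may configure tighter limits but never looser ones and that a breach occurs when the differential strictly exceeds the configured limit. All names are hypothetical.

```python
from enum import Enum

# Ceilings stated in requirement 4.4; organisations may configure tighter limits.
MAX_RETURN_SHORTFALL_BP = 50.0
MAX_COST_EXCESS_RATIO = 0.10


class ProductKind(Enum):
    INVESTMENT = "investment"
    TRANSACTIONAL = "transactional"


def breaches_materiality(kind, agent_value, benchmark_value,
                         return_limit_bp=MAX_RETURN_SHORTFALL_BP,
                         cost_limit_ratio=MAX_COST_EXCESS_RATIO):
    """Return True when the outcome is materially worse than the benchmark.

    INVESTMENT: values are annual net returns in percent.
    TRANSACTIONAL: values are customer costs in the same currency unit.
    Strict '>' comparison is an assumption; the protocol does not specify it.
    """
    if return_limit_bp > MAX_RETURN_SHORTFALL_BP:
        raise ValueError("return limit is looser than the 50 bp ceiling")
    if cost_limit_ratio > MAX_COST_EXCESS_RATIO:
        raise ValueError("cost limit is looser than the 10% ceiling")
    if kind is ProductKind.INVESTMENT:
        shortfall_bp = (benchmark_value - agent_value) * 100  # percent to basis points
        return shortfall_bp > return_limit_bp
    excess_ratio = (agent_value - benchmark_value) / benchmark_value
    return excess_ratio > cost_limit_ratio
```

Under this reading, the 0.43% differential in Scenario A (43 basis points) would not breach the 50 basis point ceiling on its own, so an organisation wanting to catch that pattern would need to configure a tighter limit, for example `return_limit_bp=40.0`.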

5. Rationale

Outcome monitoring bridges the gap between pre-execution risk controls and post-incident remediation. Pre-execution controls (AG-116) prevent individually harmful transactions. Outcome monitoring detects collectively harmful patterns that arise from individually compliant transactions — the most insidious form of customer detriment because it evades transaction-level controls and accumulates invisibly.

The FCA's Consumer Duty, effective from July 2023, requires firms to deliver good outcomes for retail customers across four dimensions: products and services, price and value, consumer understanding, and consumer support. The Consumer Duty explicitly requires firms to monitor outcomes and take action when outcomes are poor. For AI agents operating at scale, outcome monitoring is not optional — it is a regulatory requirement.

The challenge specific to AI agents is the speed-scale-opacity combination. A human adviser serving 200 clients may notice that recommendations are skewing towards higher-cost products because human cognition includes pattern recognition at the portfolio level. An AI agent serving 14,200 customers processes each recommendation individually and has no inherent mechanism to detect that its recommendations across customers share a systematic bias. The bias may originate in training data, reward function design, or feature correlation that is not apparent from individual recommendations. Only systematic outcome monitoring can detect these patterns.

Foreseeable harm monitoring addresses a distinct failure mode: the gap between when an action is planned and when it executes. Financial conditions change continuously. An action that was optimal when planned may be harmful when executed if conditions have changed materially. A reasonable human adviser monitors conditions and reassesses planned actions — an AI agent may execute scheduled actions mechanically unless a monitoring system evaluates whether changed conditions create foreseeable harm.

The financial stakes of outcome monitoring failure are substantial. The FCA's redress orders in recent years have included £10,700,000 for British Steel pension transfer advice failures (2020) and £38,200,000 for interest rate hedging product mis-selling (cumulative); Consumer Duty fines are expected to be in a comparable range. For AI agents serving thousands of customers simultaneously, the scale of potential detriment, and therefore of potential redress, is proportionally larger.

6. Implementation Guidance

AG-117 requires a monitoring infrastructure that continuously evaluates customer outcomes at both individual and aggregate levels, compares outcomes against benchmarks, detects systematic patterns, and triggers interventions when thresholds are breached.

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Wealth Management. Outcome monitoring should track risk-adjusted returns, not just absolute returns. An agent that generates higher absolute returns by taking inappropriate risk is delivering poor outcomes even if the nominal return exceeds the benchmark. Sharpe ratios, maximum drawdown, and downside deviation should feed into the outcome comparison.
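The three risk-adjusted measures named above can be computed from a series of periodic returns as in the sketch below. It uses standard textbook definitions on per-period returns (expressed as fractions, so 0.05 means 5%) and does not annualise; the function names and the zero defaults for the risk-free rate and target return are assumptions.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation (per period)."""
    excess = [r - risk_free for r in returns]
    spread = statistics.stdev(excess)
    return statistics.mean(excess) / spread if spread > 0 else float("nan")


def downside_deviation(returns, target=0.0):
    """Root mean square of shortfalls below the target return."""
    shortfalls = [min(0.0, r - target) ** 2 for r in returns]
    return math.sqrt(sum(shortfalls) / len(returns))


def max_drawdown(returns):
    """Largest peak-to-trough fall in cumulative value, as a fraction of the peak."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst
```

Comparing these figures for agent-managed portfolios against the benchmark portfolio, rather than comparing raw returns, is what exposes an agent that beats the benchmark only by taking risk the customer did not sign up for.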

Consumer Credit. Outcome monitoring should track: default rates segmented by agent decision cohort, total cost of credit including fees and interest, customer complaints, and financial difficulty indicators. An agent that consistently approves credit at terms that lead to higher-than-benchmark default rates is delivering poor outcomes even if each individual decision passes affordability checks.

Insurance. Outcome monitoring should track: claims acceptance rates, average claim settlement time, complaints to the Financial Ombudsman Service, and policy renewal rates. An agent that processes claims faster but with a systematically lower acceptance rate compared to human claims handlers is delivering poor outcomes that require investigation.
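One way to decide whether an acceptance rate gap between agent and human claims handlers warrants investigation is a pooled two-proportion z-test, sketched below. The function name is hypothetical, and the choice of test is an assumption rather than something AG-117 prescribes; it also ignores differences in claim mix, which a real comparison would need to control for.

```python
import math


def acceptance_gap_z(agent_accepted, agent_total, human_accepted, human_total):
    """Return (rate_gap, z) for agent versus human claims acceptance rates.

    A strongly negative z suggests the agent accepts claims less often than
    human handlers by more than sampling noise would explain.
    """
    p_agent = agent_accepted / agent_total
    p_human = human_accepted / human_total
    pooled = (agent_accepted + human_accepted) / (agent_total + human_total)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / agent_total + 1 / human_total))
    z = (p_agent - p_human) / std_err if std_err > 0 else 0.0
    return p_agent - p_human, z
```

For example, 800 acceptances out of 1,000 agent-handled claims against 850 out of 1,000 human-handled claims gives a gap of five percentage points and a z value a little below -2.9, which would normally be treated as a signal worth investigating.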

Maturity Model

Basic Implementation — Outcome monitoring exists as periodic reporting (monthly or quarterly) that compares aggregate agent-mediated customer outcomes against benchmarks. Reports are generated manually or semi-automatically from data warehouse queries. Foreseeable harm assessment is not systematised — it relies on human oversight of market conditions. Correlation of recommendations across customers is not monitored. Intervention triggers are manually evaluated. This level meets minimum compliance requirements but has significant latency (up to 90 days between detriment occurrence and detection) and lacks systematic foreseeable harm assessment.

Intermediate Implementation — Automated outcome monitoring runs daily, comparing agent-mediated outcomes against benchmarks at both individual and aggregate levels. Statistical detriment detection identifies systematic patterns using defined significance and materiality thresholds. Foreseeable harm scanning evaluates pending actions against current conditions on a daily cycle. Correlation monitoring tracks recommendation concentration across the customer base. Intervention triggers are automated — breaching a threshold generates an alert that requires human action within a defined SLA (e.g., 24 hours for material detriment, 4 hours for foreseeable harm). Outcome data is retained with sufficient granularity for individual customer reconstruction.

Advanced Implementation — All intermediate capabilities plus: real-time outcome monitoring with dashboards accessible to compliance, business leadership, and the board. Predictive outcome modelling projects the long-term impact of current agent behaviour on customer outcomes, enabling intervention before detriment materialises. Automated remediation calculates and initiates customer redress upon confirmed systematic detriment. Cross-agent correlation analysis identifies system-wide risks from multiple agents operating in the same market. Independent validation of outcome methodology by an external party. The organisation can demonstrate to regulators, with statistical rigour, the outcome experience of every customer cohort served by AI agents.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Systematic Detriment Detection

Test 8.2: Foreseeable Harm Alert Generation

Test 8.3: Correlation Detection Across Customer Base

Test 8.4: Intervention Trigger and Execution

Test 8.5: Outcome Monitoring Independence

Test 8.6: Customer Outcome Reconstruction

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
FCA Consumer Duty | PRIN 2A (Cross-cutting Rules) | Direct requirement
FCA Consumer Duty | PRIN 2A.4 (Outcomes Monitoring) | Direct requirement
EU AI Act | Article 9 (Risk Management System) | Supports compliance
EU AI Act | Article 72 (Post-Market Monitoring) | Direct requirement
MiFID II | Article 16(3) (Client Asset and Suitability Monitoring) | Supports compliance
FCA SYSC | 6.1.1R (Systems and Controls) | Supports compliance
DORA | Article 9 (ICT Risk Management Framework) | Supports compliance
NIST AI RMF | MEASURE 2.2, MANAGE 3.2 | Supports compliance

FCA Consumer Duty — PRIN 2A (Cross-cutting Rules)

The Consumer Duty requires firms to act to deliver good outcomes for retail customers. PRIN 2A.1.5R requires firms to act in good faith, avoid foreseeable harm, and enable and support customers to pursue their financial objectives. AG-117 directly implements the "avoid foreseeable harm" requirement for AI agent operations. The Consumer Duty is not satisfied by pre-execution controls alone; firms must actively monitor whether good outcomes are being delivered in practice. For AI agents, this requires systematic, continuous outcome monitoring because the speed and scale of agent operations mean that poor outcomes can affect thousands of customers before periodic review processes would detect the issue.

FCA Consumer Duty — PRIN 2A.4 (Outcomes Monitoring)

PRIN 2A.4 specifically requires firms to monitor the outcomes that customers are receiving from their products and services. The FCA has stated that monitoring must be "appropriate, proportionate, and effective" and must enable the firm to identify where customers are not receiving good outcomes. For AI agents, "proportionate" monitoring must account for the speed and scale of agent operations — monitoring frequency must match operational frequency. An agent serving thousands of customers daily requires at least daily outcome monitoring to be proportionate. The FCA has also stated that firms should monitor outcomes for different groups of customers, including vulnerable customers — mapping directly to AG-117's segmentation analysis requirement and AG-118's vulnerability governance.

EU AI Act — Article 72 (Post-Market Monitoring)

Article 72 requires providers of high-risk AI systems to establish and document a post-market monitoring system. For AI agents in financial services, this system must collect and analyse data on the performance of the AI system throughout its lifetime, including data on customer outcomes. AG-117's outcome monitoring infrastructure directly implements Article 72's post-market monitoring requirements, with the additional financial services-specific requirements of outcome benchmarking, foreseeable harm assessment, and statistical detriment detection.

MiFID II — Article 16(3) (Client Asset and Suitability Monitoring)

Article 16(3) requires investment firms to maintain ongoing suitability assessment for clients receiving portfolio management. For AI agents managing client portfolios, ongoing suitability monitoring includes outcome monitoring — if agent-managed portfolios are systematically underperforming benchmarks or generating excess costs, the ongoing suitability of the agent's management approach must be questioned. AG-117's outcome attribution engine enables this assessment by quantifying the agent-attributable component of portfolio performance.

10. Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | Customer-base-wide with potential market-level impact through correlated recommendations and systemic detriment

Consequence chain: Outcome monitoring failure allows systematic customer detriment to accumulate undetected. The failure is particularly dangerous because individual transactions appear compliant: no single transaction triggers a pre-execution risk control or mandate violation. The harm emerges from patterns that are only visible in aggregate. Without monitoring, detriment accumulates at the rate of agent operations. An agent serving 14,200 customers may generate £3,200,000 per year in systematic fee detriment (Scenario A), and an agent rebalancing 5,200 portfolios may generate £22,360,000 in correlated position losses (Scenario C), with no alert until a customer complains or a regulator investigates. The delay between detriment accrual and detection determines the total exposure: at £3,200,000 per year, a 6-month detection delay represents £1,600,000 in customer losses that must be remediated. For foreseeable harm failures, the consequence is customer losses from executing planned actions under changed conditions. These losses are particularly damaging to trust because the customer expected the organisation to monitor conditions on their behalf. Regulatory consequences under the Consumer Duty include fines proportionate to the detriment caused, mandatory redress programmes, and public censure. Class action litigation risk is elevated because the detriment is systematic and affects an identifiable class of customers.
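The relationship between detection delay and exposure described in the consequence chain is simple proportional arithmetic, shown here as a short sketch. It assumes detriment accrues at a constant rate over the year; the function name is hypothetical.

```python
def accrued_exposure_gbp(annual_detriment_gbp, detection_delay_months):
    """Detriment accumulated before detection, assuming linear accrual."""
    return annual_detriment_gbp * detection_delay_months / 12


# Figures from the consequence chain: £3,200,000 per year, detected after 6 months.
exposure = accrued_exposure_gbp(3_200_000, 6)  # 1,600,000.0
```

On the same assumption, shortening the detection delay from six months to one month would cut the accrued exposure to one sixth of that figure, which is the practical case for moving from periodic reporting to daily or real-time monitoring.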

Cross-references: AG-117 is the detective complement to AG-116's (Pre-Execution Risk Control Governance) preventive controls — AG-116 prevents individually harmful transactions while AG-117 detects collectively harmful patterns from individually compliant transactions. AG-001 (Operational Boundary Enforcement) provides the structural limits within which the agent operates; AG-117 monitors whether operating within those limits still produces good customer outcomes. AG-011 (Action Reversibility and Settlement Integrity) determines the window within which foreseeable harm detection can prevent irreversible customer impact. AG-025 (Transaction Structuring Detection) may detect patterns that are also visible to AG-117's correlation monitor. AG-045 (Economic Incentive Alignment Verification) addresses whether the agent's incentive structure creates the conditions for systematic bias that AG-117 would detect. AG-118 (Fair Treatment and Vulnerability Governance) extends AG-117's outcome monitoring with specific attention to vulnerable customer segments. AG-119 (Financial Model Challenge Governance) provides independent challenge to the models underlying agent behaviour that AG-117's outcome monitoring may identify as producing poor outcomes.

Cite this protocol
AgentGoverning. (2026). AG-117: Customer Outcome and Foreseeable Harm Monitoring Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-117