Post-Market Monitoring Governance requires that every AI agent deployed in production be subject to a structured, ongoing monitoring programme that detects degradation, emerging risks, adverse outcomes, and changing operational conditions throughout the agent's operational lifetime — and that monitoring findings trigger defined response actions including investigation, remediation, and where necessary, agent modification or withdrawal. This dimension addresses a fundamental reality of AI systems: an agent that is safe and effective at deployment may become unsafe or ineffective over time due to distributional shift in input data, changes in the operational environment, degradation of dependent systems, evolving regulatory requirements, or emergent failure modes that were not anticipated during development. AG-058 is the detective control that ensures these changes are detected and acted upon, rather than accumulating silently until an incident forces discovery. Without AG-058, organisations deploy agents and hope they continue to work — they have no systematic mechanism to know whether they do.
Scenario A — Gradual Performance Degradation Undetected for Months: A customer-facing AI agent for insurance claims triage is deployed with strong initial performance: 91% agreement with human expert assessments. Over the following 14 months, the agent's agreement rate degrades to 72%. The degradation is gradual — approximately 1.4 percentage points per month — and is invisible in day-to-day operation because no individual decision is flagged as anomalous. The degradation is caused by a shift in claim types: the proportion of cyber-insurance claims has increased from 4% to 19% of the portfolio, and the agent's training data contained very few cyber-insurance claims. The degradation is discovered only when a quarterly business review notes that claim processing times have increased — human reviewers are overriding the agent's triage decisions more frequently. By this time, approximately 8,200 claims have been triaged suboptimally, with an estimated excess cost of £1.9 million from delayed processing and incorrect priority assignment.
What went wrong: No post-market monitoring programme was in place to track the agent's performance against defined metrics over time. The gradual degradation was invisible to operational staff because each individual decision appeared plausible. A monitoring programme that tracked agreement rate weekly, stratified by claim type, would have detected the degradation within 6-8 weeks rather than 14 months. Consequence: £1.9 million in excess processing costs, customer dissatisfaction from delayed claims, and regulatory scrutiny under FCA Consumer Duty for failing to deliver good outcomes.
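A minimal sketch of the stratified weekly tracking described above, assuming decisions are logged with the claim type, the agent's triage decision, and the human reviewer's eventual assessment. All names, the 5-point tolerance, and the two-consecutive-week rule are illustrative assumptions, not normative values.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TriageRecord:
    week: int            # weeks since deployment
    claim_type: str      # e.g. "motor", "property", "cyber"
    agent_decision: str
    human_decision: str  # reviewer's assessment, used as ground truth

def weekly_agreement(records: list[TriageRecord]) -> dict[tuple[int, str], float]:
    """Agreement rate with human reviewers per (week, claim_type) stratum."""
    totals: dict[tuple[int, str], int] = defaultdict(int)
    agreed: dict[tuple[int, str], int] = defaultdict(int)
    for r in records:
        key = (r.week, r.claim_type)
        totals[key] += 1
        agreed[key] += int(r.agent_decision == r.human_decision)
    return {k: agreed[k] / totals[k] for k in totals}

def degradation_alerts(rates: dict[tuple[int, str], float],
                       baseline: dict[str, float],
                       tolerance: float = 0.05) -> dict[str, list[int]]:
    """Flag strata whose agreement rate sits more than `tolerance` below the
    deployment-time baseline in two consecutive weeks (filters one-off noise)."""
    below: dict[str, list[int]] = defaultdict(list)
    for (week, claim_type), rate in sorted(rates.items()):
        if rate < baseline.get(claim_type, 1.0) - tolerance:
            below[claim_type].append(week)
    return {ct: weeks for ct, weeks in below.items()
            if any(b - a == 1 for a, b in zip(weeks, weeks[1:]))}
```

In Scenario A, a per-stratum baseline would have isolated the cyber-insurance stratum within weeks, because an aggregate rate dilutes a concentrated failure across the whole portfolio.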
Scenario B — Environmental Change Invalidates Agent Assumptions: A financial-services AI agent for foreign exchange (FX) risk management is deployed using models trained on 5 years of historical data. The agent performs well during normal market conditions. A geopolitical event causes a structural shift in FX volatility patterns — correlation structures that were stable for years change within days. The agent's risk assessments, calibrated to historical correlation patterns, become systematically inaccurate. Because no post-market monitoring tracks whether the agent's environmental assumptions remain valid, the inaccurate risk assessments persist for 3 weeks before a human risk manager notices during a routine manual review. During those 3 weeks, the organisation's FX risk position has been materially understated, with a potential unrecognised loss of £7.4 million.
What went wrong: The agent's environmental assumptions — that historical correlation patterns would persist — were never monitored for validity. Post-market monitoring should include validation of the agent's operating conditions, not just its outputs. A monitoring programme that tracked model input distributions and key environmental indicators would have detected the distributional shift within days, not weeks. Consequence: £7.4 million in potential unrecognised FX risk, regulatory investigation by the FCA for inadequate risk management systems, and requirement to implement comprehensive post-market monitoring.
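One common quantitative check for the input-distribution monitoring described here is the Population Stability Index (PSI), computed between a deployment-time reference sample and a recent window of each model input. The sketch below works under that assumption; the quantile binning and the conventional 0.1/0.2 rule-of-thumb thresholds are illustrative choices, not values this dimension mandates.

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a deployment-time reference sample
    and a recent window of the same input feature. Larger means more drift.
    Assumes a continuous feature, so the reference quantiles are distinct."""
    # Bin edges are fixed from the reference distribution (quantiles), so the
    # reference population is spread roughly evenly across bins.
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    cur_frac = np.histogram(current, edges)[0] / len(current)
    # Floor the fractions to avoid log(0) when a bin is empty.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Conventional reading: PSI < 0.1 stable; 0.1-0.2 investigate; > 0.2 alert.
```

Run daily over the FX correlation inputs in Scenario B, a check like this would have surfaced the structural break within days rather than the three weeks it took a manual review.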
Scenario C — Adverse Outcome Pattern Emerging Over Time: A public sector AI agent assists with social housing allocation by scoring applicants based on need. The agent is deployed with an initial equality impact assessment showing no significant disparate impact. Over 18 months, an adverse outcome pattern emerges: applicants with disabilities are receiving systematically lower priority scores — not because the model directly uses disability status, but because the model weights stable employment history heavily, and applicants with certain disabilities have less continuous employment history. The pattern is not detected until a housing advocacy group files a freedom of information request, analyses the data, and identifies the disparity. By that time, approximately 1,400 applicants with disabilities have been disadvantaged.
What went wrong: The initial equality impact assessment was treated as a one-time deployment activity. No ongoing monitoring tracked outcome distributions across protected characteristics. Post-market monitoring should include continuous fairness monitoring that detects emerging disparate impact, even when the initial assessment was clean. Consequence: Judicial review for indirect disability discrimination under the Equality Act 2010, mandatory suspension and review of the agent, remediation requiring reassessment of 1,400 applications at a cost of £920,000, and significant reputational damage.
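Continuous fairness monitoring of the kind this scenario calls for can start with something as simple as comparing favourable-outcome rates per protected group against a reference group on every monitoring cycle. A sketch follows; the grouping, the definition of a favourable outcome (here, receiving a high priority score), and the four-fifths heuristic are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Iterable

def selection_rates(outcomes: Iterable[tuple[Hashable, bool]]) -> dict:
    """outcomes: (group, favourable) pairs, e.g. whether an applicant
    received a high priority score."""
    totals: Counter = Counter()
    favourable: Counter = Counter()
    for group, ok in outcomes:
        totals[group] += 1
        favourable[group] += int(ok)
    return {g: favourable[g] / totals[g] for g in totals}

def impact_ratios(outcomes, reference_group) -> dict:
    """Each group's favourable-outcome rate relative to the reference group.
    Ratios persistently below ~0.8 (the 'four-fifths' heuristic) warrant
    investigation as potential emerging disparate impact."""
    rates = selection_rates(outcomes)
    ref = rates[reference_group]
    return {g: rate / ref for g, rate in rates.items() if g != reference_group}
```

Tracked on the cadence required by 4.3 below, a falling ratio for applicants with disabilities would have surfaced long before 1,400 applicants were affected. Score-distribution comparisons (for example, median priority score per group) serve the same purpose where outcomes are not binary.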
Scope: This dimension applies to every AI agent deployed in a production environment. There are no exceptions based on risk level — all production agents require some level of post-market monitoring, though the depth and cadence of monitoring should be proportionate to the agent's risk level. The scope covers: (a) monitoring of agent performance against defined metrics; (b) monitoring of agent operating conditions to detect environmental changes that may affect performance; (c) monitoring of outcome distributions across protected characteristics and operationally relevant dimensions to detect emerging bias; (d) monitoring of adverse outcomes, complaints, and incidents; (e) monitoring of dependent systems, third-party components, and data sources for changes that may affect agent behaviour. The scope extends to third-party agents deployed on the organisation's behalf — where an organisation uses a vendor's AI agent, the organisation retains responsibility for monitoring the agent's performance and outcomes in its operational context.
4.1. A conforming system MUST implement a post-market monitoring programme for every AI agent deployed in production, defining: the metrics monitored, the monitoring cadence, the alert thresholds, the response procedures, and the responsible roles (a configuration sketch follows this list of requirements).
4.2. A conforming system MUST monitor agent performance against defined metrics at a cadence sufficient to detect meaningful degradation before it causes material harm — at minimum monthly for high-risk agents and quarterly for other agents.
4.3. A conforming system MUST monitor outcome distributions across legally protected characteristics and operationally relevant demographic dimensions at a cadence not exceeding the performance monitoring cadence, to detect emerging disparate impact.
4.4. A conforming system MUST define alert thresholds that trigger investigation when monitoring metrics deviate from acceptable ranges, and MUST define escalation pathways for different severity levels.
4.5. A conforming system MUST implement a structured incident and adverse outcome reporting mechanism that captures: complaints, adverse outcomes, near-misses, and anomalies detected through monitoring, with each report triaged, investigated, and resolved within defined timeframes.
4.6. A conforming system MUST conduct a periodic comprehensive review of each agent's continued suitability — at minimum annually — that evaluates: continued performance adequacy, continued environmental validity, continued regulatory compliance, and continued alignment with organisational risk appetite.
4.7. A conforming system MUST define and implement withdrawal or restriction criteria — the conditions under which an agent must be modified, restricted, or withdrawn from service based on monitoring findings.
4.8. A conforming system SHOULD monitor the agent's operating environment — including input data distributions, dependent system health, third-party component changes, and regulatory context changes — to detect conditions that may affect agent performance before that effect manifests in output metrics.
4.9. A conforming system SHOULD implement automated monitoring where feasible, with real-time or near-real-time dashboards for high-risk agents and automated alerting when thresholds are breached.
4.10. A conforming system MAY implement predictive monitoring that uses leading indicators to forecast performance degradation before it occurs — for example, detecting distributional shift in input data that historically correlates with output quality degradation.
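The plan that 4.1 requires, together with the thresholds and escalation pathways of 4.4, is easiest to audit when captured as structured, machine-readable configuration from which dashboards and alerting (4.9) are driven. A sketch of one possible shape, with every field name and value illustrative rather than prescribed:

```python
from dataclasses import dataclass, field
from enum import Enum

class Severity(Enum):
    LOW = "low"        # log; review at the next monitoring cycle
    MEDIUM = "medium"  # investigate within the defined timeframe (4.4)
    HIGH = "high"      # escalate immediately; consider restriction (4.7)

@dataclass
class MetricSpec:
    name: str                  # e.g. "agreement_rate_vs_human"
    cadence_days: int          # 4.2: at most 30 for high-risk, 90 otherwise
    warn_threshold: float      # deviation that triggers investigation
    breach_threshold: float    # deviation that triggers escalation
    breach_severity: Severity
    strata: list[str] = field(default_factory=list)  # e.g. ["claim_type"]

@dataclass
class MonitoringPlan:
    """Per-agent monitoring plan covering the elements enumerated in 4.1."""
    agent_id: str
    risk_level: str                  # drives cadence and dashboards (4.9)
    metrics: list[MetricSpec]
    fairness_dimensions: list[str]   # protected characteristics (4.3)
    responsible_role: str            # role accountable for acting on alerts
    escalation_contact: str          # escalation pathway entry point (4.4)
    withdrawal_criteria: list[str]   # conditions forcing restriction (4.7)
```

Versioning such a plan alongside the agent also gives the periodic review in 4.6 a concrete artefact to evaluate.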
AI agents are not static systems. Their performance can degrade, their operating environments can change, and their impacts can shift in ways that were not anticipated at deployment. Post-market monitoring is the detective control that prevents the assumption of continued fitness from becoming the reality of undetected failure.
The need for post-market monitoring in AI is driven by several characteristics that distinguish AI agents from traditional software: (1) AI agents are sensitive to distributional shift — small changes in the statistical properties of input data can cause significant changes in output quality, without any change to the agent's code or configuration; (2) AI agents can exhibit emergent failure modes — interactions between model behaviour and real-world conditions can produce failures that were not present in testing because the test conditions did not include the specific combination of factors that triggers the failure; (3) AI agent impacts are often indirect and delayed — a biased decision today may not produce a visible adverse outcome for weeks or months, making immediate detection impossible without structured monitoring; (4) the operational environment is dynamic — regulations change, user populations shift, dependent systems are updated, and market conditions evolve, any of which can affect the agent's appropriateness; and (5) AI agents operating at scale produce large volumes of decisions, meaning that even small per-decision errors can aggregate to significant harm.
Post-market monitoring is not the same as operational monitoring of system health (uptime, latency, error rates). System health monitoring detects technical failures — the agent is down, the API is slow, the database is full. Post-market monitoring detects governance failures — the agent is operational but its decisions are degrading, its fairness profile is shifting, or its operating assumptions are no longer valid. Both are necessary; neither substitutes for the other.
The regulatory trajectory is clear. The EU AI Act explicitly requires post-market monitoring for high-risk AI systems (Article 72). The FCA expects firms to monitor AI systems throughout their lifecycle. The NIST AI RMF includes ongoing monitoring as a core governance function. Organisations that do not implement post-market monitoring are both exposed to regulatory action and unable to demonstrate that their agents remain fit for purpose.
Post-market monitoring governance requires a structured programme covering metric definition, data collection, analysis, alerting, response, and reporting. The programme should be integrated with the organisation's broader AI governance framework, connecting to drift detection (AG-022), escalation pathways (AG-019), and independent validation (AG-056).
Recommended patterns:
- Stratify performance metrics by operationally relevant dimensions (claim type, customer segment, demographic group) so that degradation concentrated in a subpopulation is not masked by aggregate figures (Scenario A).
- Monitor inputs and operating conditions, not just outputs: input distribution shift is a leading indicator of output degradation (Scenario B, requirement 4.8).
- Treat fairness assessment as continuous, re-running outcome-distribution analysis on the same cadence as performance monitoring (Scenario C, requirement 4.3).
- Drive alerting, escalation, and periodic review from a single documented monitoring plan per agent (requirement 4.1), so that every metric has a threshold, a response procedure, and a named responsible role.
Anti-patterns to avoid:
- Treating deployment-time validation or an initial equality impact assessment as a one-time exercise.
- Relying on system health monitoring (uptime, latency, error rates) as a substitute for governance monitoring of decision quality, fairness, and operating assumptions.
- Monitoring only aggregate metrics, which hides stratum-level degradation and emerging disparate impact.
- Collecting monitoring data without alert thresholds, response procedures, or named owners, producing information that no one is obliged to act on.
Financial Services. Post-market monitoring aligns with existing model monitoring requirements. The PRA's supervisory statement SS1/23 requires firms to monitor model performance throughout the model lifecycle, and the PRA's broader model risk management expectations include ongoing monitoring and validation. For AI agents, monitoring must extend beyond model performance to cover governance controls, fairness outcomes, and customer outcomes as required by the FCA's Consumer Duty. Monitoring data should feed into the firm's model risk management framework and be reportable to the board or relevant committee.
Healthcare. Post-market monitoring of clinical AI agents aligns with medical device vigilance requirements. Where the agent falls within the scope of the UK Medical Devices Regulations or EU MDR, the manufacturer has specific post-market surveillance obligations including serious incident reporting to the MHRA within defined timeframes. Even where the agent does not meet the medical device threshold, clinical governance requirements mandate ongoing monitoring of AI-assisted clinical decisions. Patient outcome monitoring should be stratified by demographics to detect emerging disparities in clinical outcomes.
Critical Infrastructure. Post-market monitoring of safety-critical AI agents must include monitoring of safety-relevant metrics under both normal and degraded operating conditions. Integration with the site's safety management system is essential. Near-miss monitoring is particularly important — incidents that were prevented by human intervention or safety systems may indicate degradation in the AI agent's performance that will eventually exceed the capacity of compensating controls. Monitoring cadence for safety-critical functions should be daily or more frequent.
Public Sector. Post-market monitoring of AI agents affecting individual rights must include fairness monitoring against the Public Sector Equality Duty. Monitoring findings should feed into the organisation's equality impact assessment process. Freedom of information requests and parliamentary questions about AI agent performance should be anticipated — monitoring data must be in a state that enables timely and accurate response. The Algorithmic Transparency Recording Standard (ATRS) requires ongoing reporting on algorithmic tool performance, which post-market monitoring data supports.
Basic Implementation — The organisation monitors basic performance metrics for each deployed agent on a defined cadence (at minimum quarterly). Outcome distributions across protected characteristics are reviewed at the same cadence. Alert thresholds are defined for key metrics. An adverse outcome and complaint reporting mechanism exists. A periodic review of each agent's continued suitability is conducted at least annually. A monitoring log records metrics, alerts, and response actions. This level meets the minimum mandatory requirements but may lack: input distribution monitoring, automated monitoring pipelines, and stratified metric analysis.
Intermediate Implementation — The mandatory performance and fairness monitoring is supplemented with input distribution monitoring using quantitative metrics. Monitoring is automated with dashboards and automated alerting. Adverse outcome management follows a structured process with triage, investigation, and resolution tracking. Monitoring metrics are stratified by relevant dimensions. Comprehensive periodic reviews evaluate continued suitability against a structured framework. Monitoring data feeds into the governance committee's reporting.
Advanced Implementation — All intermediate capabilities plus: predictive monitoring uses leading indicators to forecast degradation. Outcome feedback loops capture actual outcomes and feed back into performance assessment. Monitoring across agents detects systemic patterns (e.g., multiple agents degrading in response to the same environmental change). The monitoring programme is independently validated (AG-056). Real-time dashboards provide continuous visibility for high-risk agents. The organisation can demonstrate to regulators an unbroken monitoring record for every deployed agent from deployment to current date, with full alert-to-resolution traceability.
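For the predictive monitoring described in 4.10 and in the advanced tier, one simple realisation is to test whether input drift in one period correlates with the change in an output-quality metric in the next. The sketch below assumes the monitoring pipeline already produces both series; it is illustrative only, and short series need guarding against spurious correlation.

```python
import numpy as np

def leading_indicator_strength(input_drift: np.ndarray,
                               quality: np.ndarray,
                               lag: int = 1) -> float:
    """Correlation between an input-drift series (e.g. weekly PSI) at time t
    and the change in an output-quality metric at time t + lag. A strongly
    negative value suggests the drift metric is a usable leading indicator
    of degradation for this agent; a value near zero suggests it is not."""
    dq = np.diff(quality)                 # period-on-period quality change
    x = input_drift[: len(dq) - lag + 1]  # drift leading by `lag` periods
    y = dq[lag - 1 :]                     # quality change `lag` periods later
    return float(np.corrcoef(x, y)[0, 1])
```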
Required artefacts:
- The post-market monitoring plan for each agent, covering metrics, cadence, alert thresholds, response procedures, and responsible roles (4.1).
- The monitoring log recording metrics, alerts, and response actions.
- Alert, investigation, and escalation records with outcomes.
- Adverse outcome, complaint, near-miss, and incident reports with triage and resolution records (4.5).
- Periodic comprehensive review reports (4.6), including withdrawal or restriction decisions where applicable (4.7).
Retention requirements:
Access requirements:
Testing AG-058 compliance requires verification that the post-market monitoring programme is operationally effective — that degradation, emerging bias, and adverse outcomes are detected and acted upon within defined timeframes.
Test 8.1: Degradation Detection. Seed a synthetic, gradual performance degradation into the monitoring data and verify that an alert is raised within the cadence defined for the agent's risk level (4.2, 4.4). An automation sketch follows this list.
Test 8.2: Fairness Drift Detection. Introduce a synthetic shift in outcome distributions for a protected group and verify that fairness monitoring flags the emerging disparity (4.3).
Test 8.3: Alert Response Timeliness. Sample historical alerts and verify that each was triaged, investigated, and escalated within the defined timeframes for its severity level (4.4).
Test 8.4: Adverse Outcome Reporting Coverage. Verify that complaints, adverse outcomes, near-misses, and monitoring anomalies are all captured by the reporting mechanism and carry triage and resolution records (4.5).
Test 8.5: Periodic Review Completeness. Verify that each agent's most recent comprehensive review covers performance adequacy, environmental validity, regulatory compliance, and risk-appetite alignment (4.6).
Test 8.6: Withdrawal Criteria Enforcement. Verify that documented withdrawal or restriction criteria exist for each agent and that, where monitoring findings met those criteria, the corresponding action was taken and recorded (4.7).
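A sketch of how Test 8.1 might be automated, seeding the gradual drift from Scenario A (roughly 1.4 percentage points per month) into a synthetic weekly series and checking that simple threshold alerting fires promptly. The threshold, noise level, and detection window are illustrative stand-ins for whatever the agent's monitoring plan actually defines.

```python
import numpy as np

def test_degradation_detection() -> None:
    """Test 8.1 sketch: synthetic gradual degradation must trigger an alert
    close to the point where the metric truly crosses its threshold."""
    rng = np.random.default_rng(42)
    weeks = np.arange(52)
    baseline = 0.91
    drift_per_week = 0.014 * 12 / 52          # ~1.4 percentage points/month
    observed = baseline - drift_per_week * weeks + rng.normal(0, 0.01, 52)

    threshold = baseline - 0.05               # alert 5 points below baseline
    breaches = np.flatnonzero(observed < threshold)
    assert breaches.size > 0, "alert never fired"

    # The metric truly crosses the threshold around week 16; detection more
    # than ~8 weeks later fails the test (cf. the 6-8 week target in Scenario A).
    true_crossing = int(np.ceil(0.05 / drift_per_week))
    assert breaches[0] - true_crossing <= 8, "alert fired too late"

test_degradation_detection()
```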
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 72 (Post-Market Monitoring by Providers) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| PRA SS1/23 | Model Risk Management — Ongoing Monitoring | Direct requirement |
| FCA Consumer Duty | PRIN 2A.9 (Monitoring of consumer outcomes) | Direct requirement |
| NIST AI RMF | MANAGE 4.1, MEASURE 3.2 | Supports compliance |
| ISO 42001 | Clause 9.1 (Monitoring, Measurement, Analysis and Evaluation) | Direct requirement |
| DORA | Article 8 (Identification — ICT risk monitoring) | Supports compliance |
| UK MDR 2002 | Regulation 68 (Post-market surveillance — medical devices) | Direct requirement (where applicable) |
Article 72 is the most directly relevant provision. It requires providers of high-risk AI systems to establish and document a post-market monitoring system "in a manner that is proportionate to the nature of the artificial intelligence technologies and the risks of the high-risk AI system." The system must actively and systematically collect, document, and analyse relevant data on the performance of the AI system throughout its lifetime. Article 72 further requires that the post-market monitoring system be based on a post-market monitoring plan, which the provider must prepare and update. AG-058 directly implements Article 72 by establishing the governance framework for post-market monitoring, including the monitoring plan, metric definition, alerting, response procedures, and periodic review. The regulation requires that monitoring data be used to evaluate the "continuous compliance" of the system — mapping directly to AG-058's comprehensive periodic review requirement.
Article 9 requires that the risk management system "shall comprise a continuous iterative process planned and run throughout the entire lifecycle of a high-risk AI system." Post-market monitoring provides the data that makes this continuous iteration possible — without monitoring, the risk management system cannot respond to changes after deployment.
SS1/23 expects firms to monitor model performance throughout the model lifecycle, including detection of model degradation and assessment of continued model appropriateness. AG-058 extends this to the full governance scope — not just model accuracy but fairness, environmental validity, and regulatory compliance.
The Consumer Duty requires firms to monitor and regularly review the outcomes their customers are experiencing. For AI agents making customer-facing decisions, this requires monitoring outcome distributions across customer groups, including those with protected characteristics. AG-058 provides the monitoring framework that enables firms to demonstrate compliance with outcome monitoring requirements.
MANAGE 4.1 addresses ongoing monitoring of AI system performance in deployed settings. MEASURE 3.2 addresses mechanisms for feedback from deployment to inform improvements. AG-058 supports compliance by establishing the structures for ongoing monitoring and feedback-driven improvement.
Clause 9.1 requires the organisation to determine what needs to be monitored and measured, the methods for monitoring and measurement, when monitoring and measurement shall be performed, and when results shall be analysed and evaluated. AG-058 provides the operational framework for implementing Clause 9.1 specifically for AI agents.
Article 8 requires financial entities to identify and document ICT-related risks on a continuous basis. For AI agents, this includes monitoring for performance degradation, environmental change, and emerging risks. AG-058 supports compliance by providing the monitoring framework for AI-specific ICT risks.
Where an AI agent falls within the scope of the UK Medical Devices Regulations, the manufacturer has specific post-market surveillance obligations including vigilance reporting to the MHRA. AG-058 provides the broader governance framework that encompasses these device-specific requirements and extends them to clinical AI agents that may not formally qualify as medical devices.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — extends to all individuals, counterparties, and systems affected by the agent's decisions throughout its operational lifetime |
Consequence chain: Without post-market monitoring, an AI agent can degrade, develop bias, or become inappropriate for its operating context without detection. The immediate failure is the absence of information — the organisation does not know whether its agents are performing adequately, producing fair outcomes, or operating within their intended conditions. This information gap means that governance actions cannot be taken because the conditions that would trigger them are invisible. Performance degradation accumulates silently, producing an expanding volume of suboptimal decisions. Emerging bias affects an expanding population of individuals without detection. Environmental changes invalidate the agent's assumptions without correction. When the failure is eventually discovered — typically through an incident, complaint, or regulatory examination rather than through monitoring — the accumulated harm may be extensive and the remediation costly. The regulatory consequence is compounded because the absence of monitoring demonstrates a governance gap: the organisation deployed an AI agent and then failed to monitor whether it continued to work properly. Under the EU AI Act (Article 72), the absence of a post-market monitoring system for a high-risk AI system is itself a compliance failure, independent of whether the agent has actually degraded. The legal exposure increases because the organisation cannot demonstrate due diligence in maintaining the agent's fitness for purpose throughout its operational life.
Cross-references: AG-019 (Human Escalation & Override Triggers), AG-022 (Behavioural Drift Detection), AG-048 (AI Model Provenance and Integrity), AG-049 (Governance Decision Explainability), AG-055 (Oversight Competence Assurance), AG-056 (Independent Validation Governance), AG-057 (Dataset Suitability and Bias Control Governance), AG-051 through AG-054 (Provider Assurance, Rights & Documentation landscape).