Post-Market Monitoring Governance requires that every AI agent deployed in production be subject to a structured, ongoing monitoring programme that detects degradation, emerging risks, adverse outcomes, and changing operational conditions throughout the agent's operational lifetime — and that monitoring findings trigger defined response actions including investigation, remediation, and where necessary, agent modification or withdrawal. This dimension addresses a fundamental reality of AI systems: an agent that is safe and effective at deployment may become unsafe or ineffective over time due to distributional shift in input data, changes in the operational environment, degradation of dependent systems, evolving regulatory requirements, or emergent failure modes that were not anticipated during development. AG-058 is the detective control that ensures these changes are detected and acted upon, rather than accumulating silently until an incident forces discovery. Without AG-058, organisations deploy agents and hope they continue to work — they have no systematic mechanism to know whether they do.
Scenario A — Gradual Performance Degradation Undetected for Months: A customer-facing AI agent for insurance claims triage is deployed with strong initial performance: 91% agreement with human expert assessments. Over the following 14 months, the agent's agreement rate degrades to 72%. The degradation is gradual — approximately 1.4 percentage points per month — and is invisible in day-to-day operation because no individual decision is flagged as anomalous. The degradation is caused by a shift in claim types: the proportion of cyber-insurance claims has increased from 4% to 19% of the portfolio, and the agent's training data contained very few cyber-insurance claims. The degradation is discovered only when a quarterly business review notes that claim processing times have increased — human reviewers are overriding the agent's triage decisions more frequently. By this time, approximately 8,200 claims have been triaged suboptimally, with an estimated excess cost of £1.9 million from delayed processing and incorrect priority assignment.
What went wrong: No post-market monitoring programme was in place to track the agent's performance against defined metrics over time. The gradual degradation was invisible to operational staff because each individual decision appeared plausible. A monitoring programme that tracked agreement rate weekly, stratified by claim type, would have detected the degradation within 6-8 weeks rather than 14 months. Consequence: £1.9 million in excess processing costs, customer dissatisfaction from delayed claims, and regulatory scrutiny under FCA Consumer Duty for failing to deliver good outcomes.
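A minimal sketch of the stratified weekly tracking described above, assuming decisions are logged with the claim type, the agent's triage decision, and the human reviewer's eventual assessment. All names, the 5-point tolerance, and the two-consecutive-week rule are illustrative assumptions, not normative values.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class TriageRecord:
    week: int            # weeks since deployment
    claim_type: str      # e.g. "motor", "property", "cyber"
    agent_decision: str
    human_decision: str  # reviewer's assessment, used as ground truth

def weekly_agreement(records: list[TriageRecord]) -> dict[tuple[int, str], float]:
    """Agreement rate with human reviewers per (week, claim_type) stratum."""
    totals: dict[tuple[int, str], int] = defaultdict(int)
    agreed: dict[tuple[int, str], int] = defaultdict(int)
    for r in records:
        key = (r.week, r.claim_type)
        totals[key] += 1
        agreed[key] += int(r.agent_decision == r.human_decision)
    return {k: agreed[k] / totals[k] for k in totals}

def degradation_alerts(rates: dict[tuple[int, str], float],
                       baseline: dict[str, float],
                       tolerance: float = 0.05) -> dict[str, list[int]]:
    """Flag strata whose agreement rate sits more than `tolerance` below the
    deployment-time baseline in two consecutive weeks (filters one-off noise)."""
    below: dict[str, list[int]] = defaultdict(list)
    for (week, claim_type), rate in sorted(rates.items()):
        if rate < baseline.get(claim_type, 1.0) - tolerance:
            below[claim_type].append(week)
    return {ct: weeks for ct, weeks in below.items()
            if any(b - a == 1 for a, b in zip(weeks, weeks[1:]))}
```

In Scenario A, a per-stratum baseline would have isolated the cyber-insurance stratum within weeks, because an aggregate rate dilutes a concentrated failure across the whole portfolio.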
Scenario B — Environmental Change Invalidates Agent Assumptions: A financial-services AI agent for foreign exchange (FX) risk management is deployed using models trained on 5 years of historical data. The agent performs well during normal market conditions. A geopolitical event causes a structural shift in FX volatility patterns — correlation structures that were stable for years change within days. The agent's risk assessments, calibrated to historical correlation patterns, become systematically inaccurate. Because no post-market monitoring tracks whether the agent's environmental assumptions remain valid, the inaccurate risk assessments persist for 3 weeks before a human risk manager notices during a routine manual review. During those 3 weeks, the organisation's FX risk position has been materially understated, with a potential unrecognised loss of £7.4 million.
What went wrong: The agent's environmental assumptions — that historical correlation patterns would persist — were never monitored for validity. Post-market monitoring should include validation of the agent's operating conditions, not just its outputs. A monitoring programme that tracked model input distributions and key environmental indicators would have detected the distributional shift within days, not weeks. Consequence: £7.4 million in potential unrecognised FX risk, regulatory investigation by the FCA for inadequate risk management systems, and requirement to implement comprehensive post-market monitoring.
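One common quantitative check for the input-distribution monitoring described here is the Population Stability Index (PSI), computed between a deployment-time reference sample and a recent window of each model input. The sketch below works under that assumption; the quantile binning and the conventional 0.1/0.2 rule-of-thumb thresholds are illustrative choices, not values this dimension mandates.

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a deployment-time reference sample
    and a recent window of the same input feature. Larger means more drift.
    Assumes a continuous feature, so the reference quantiles are distinct."""
    # Bin edges are fixed from the reference distribution (quantiles), so the
    # reference population is spread roughly evenly across bins.
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    cur_frac = np.histogram(current, edges)[0] / len(current)
    # Floor the fractions to avoid log(0) when a bin is empty.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Conventional reading: PSI < 0.1 stable; 0.1-0.2 investigate; > 0.2 alert.
```

Run daily over the FX correlation inputs in Scenario B, a check like this would have surfaced the structural break within days rather than the three weeks it took a manual review.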
Scenario C — Adverse Outcome Pattern Emerging Over Time: A public sector AI agent assists with social housing allocation by scoring applicants based on need. The agent is deployed with an initial equality impact assessment showing no significant disparate impact. Over 18 months, an adverse outcome pattern emerges: applicants with disabilities are receiving systematically lower priority scores — not because the model directly uses disability status, but because the model weights stable employment history heavily, and applicants with certain disabilities have less continuous employment history. The pattern is not detected until a housing advocacy group files a freedom of information request, analyses the data, and identifies the disparity. By that time, approximately 1,400 applicants with disabilities have been disadvantaged.
What went wrong: The initial equality impact assessment was treated as a one-time deployment activity. No ongoing monitoring tracked outcome distributions across protected characteristics. Post-market monitoring should include continuous fairness monitoring that detects emerging disparate impact, even when the initial assessment was clean. Consequence: Judicial review for indirect disability discrimination under the Equality Act 2010, mandatory suspension and review of the agent, remediation requiring reassessment of 1,400 applications at a cost of £920,000, and significant reputational damage.
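Continuous fairness monitoring of the kind this scenario calls for can start with something as simple as comparing favourable-outcome rates per protected group against a reference group on every monitoring cycle. A sketch follows; the grouping, the definition of a favourable outcome (here, receiving a high priority score), and the four-fifths heuristic are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Iterable

def selection_rates(outcomes: Iterable[tuple[Hashable, bool]]) -> dict:
    """outcomes: (group, favourable) pairs, e.g. whether an applicant
    received a high priority score."""
    totals: Counter = Counter()
    favourable: Counter = Counter()
    for group, ok in outcomes:
        totals[group] += 1
        favourable[group] += int(ok)
    return {g: favourable[g] / totals[g] for g in totals}

def impact_ratios(outcomes, reference_group) -> dict:
    """Each group's favourable-outcome rate relative to the reference group.
    Ratios persistently below ~0.8 (the 'four-fifths' heuristic) warrant
    investigation as potential emerging disparate impact."""
    rates = selection_rates(outcomes)
    ref = rates[reference_group]
    return {g: rate / ref for g, rate in rates.items() if g != reference_group}
```

Tracked on the cadence required by 4.3 below, a falling ratio for applicants with disabilities would have surfaced long before 1,400 applicants were affected. Score-distribution comparisons (for example, median priority score per group) serve the same purpose where outcomes are not binary.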
Scope: This dimension applies to every AI agent deployed in a production environment. There are no exceptions based on risk level — all production agents require some level of post-market monitoring, though the depth and cadence of monitoring should be proportionate to the agent's risk level. The scope covers: (a) monitoring of agent performance against defined metrics; (b) monitoring of agent operating conditions to detect environmental changes that may affect performance; (c) monitoring of outcome distributions across protected characteristics and operationally relevant dimensions to detect emerging bias; (d) monitoring of adverse outcomes, complaints, and incidents; (e) monitoring of dependent systems, third-party components, and data sources for changes that may affect agent behaviour. The scope extends to third-party agents deployed on the organisation's behalf — where an organisation uses a vendor's AI agent, the organisation retains responsibility for monitoring the agent's performance and outcomes in its operational context.
4.1. A conforming system MUST implement a post-market monitoring programme for every AI agent deployed in production, defining: the metrics monitored, the monitoring cadence, the alert thresholds, the response procedures, and the responsible roles (a configuration sketch follows this list of requirements).
4.2. A conforming system MUST monitor agent performance against defined metrics at a cadence sufficient to detect meaningful degradation before it causes material harm — at minimum monthly for high-risk agents and quarterly for other agents.
4.3. A conforming system MUST monitor outcome distributions across legally protected characteristics and operationally relevant demographic dimensions at a cadence not exceeding the performance monitoring cadence, to detect emerging disparate impact.
4.4. A conforming system MUST define alert thresholds that trigger investigation when monitoring metrics deviate from acceptable ranges, and MUST define escalation pathways for different severity levels.
4.5. A conforming system MUST implement a structured incident and adverse outcome reporting mechanism that captures: complaints, adverse outcomes, near-misses, and anomalies detected through monitoring, with each report triaged, investigated, and resolved within defined timeframes.
4.6. A conforming system MUST conduct a periodic comprehensive review of each agent's continued suitability — at minimum annually — that evaluates: continued performance adequacy, continued environmental validity, continued regulatory compliance, and continued alignment with organisational risk appetite.
4.7. A conforming system MUST define and implement withdrawal or restriction criteria — the conditions under which an agent must be modified, restricted, or withdrawn from service based on monitoring findings.
4.8. A conforming system SHOULD monitor the agent's operating environment — including input data distributions, dependent system health, third-party component changes, and regulatory context changes — to detect conditions that may affect agent performance before that effect manifests in output metrics.
4.9. A conforming system SHOULD implement automated monitoring where feasible, with real-time or near-real-time dashboards for high-risk agents and automated alerting when thresholds are breached.
4.10. A conforming system MAY implement predictive monitoring that uses leading indicators to forecast performance degradation before it occurs — for example, detecting distributional shift in input data that historically correlates with output quality degradation.
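The plan that 4.1 requires, together with the thresholds and escalation pathways of 4.4, is easiest to audit when captured as structured, machine-readable configuration from which dashboards and alerting (4.9) are driven. A sketch of one possible shape, with every field name and value illustrative rather than prescribed:

```python
from dataclasses import dataclass, field
from enum import Enum

class Severity(Enum):
    LOW = "low"        # log; review at the next monitoring cycle
    MEDIUM = "medium"  # investigate within the defined timeframe (4.4)
    HIGH = "high"      # escalate immediately; consider restriction (4.7)

@dataclass
class MetricSpec:
    name: str                  # e.g. "agreement_rate_vs_human"
    cadence_days: int          # 4.2: at most 30 for high-risk, 90 otherwise
    warn_threshold: float      # deviation that triggers investigation
    breach_threshold: float    # deviation that triggers escalation
    breach_severity: Severity
    strata: list[str] = field(default_factory=list)  # e.g. ["claim_type"]

@dataclass
class MonitoringPlan:
    """Per-agent monitoring plan covering the elements enumerated in 4.1."""
    agent_id: str
    risk_level: str                  # drives cadence and dashboards (4.9)
    metrics: list[MetricSpec]
    fairness_dimensions: list[str]   # protected characteristics (4.3)
    responsible_role: str            # role accountable for acting on alerts
    escalation_contact: str          # escalation pathway entry point (4.4)
    withdrawal_criteria: list[str]   # conditions forcing restriction (4.7)
```

Versioning such a plan alongside the agent also gives the periodic review in 4.6 a concrete artefact to evaluate.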
AI agents are not static systems. Their performance can degrade, their operating environments can change, and their impacts can shift in ways that were not anticipated at deployment. Post-market monitoring is the detective control that prevents the assumption of continued fitness from becoming the reality of undetected failure.
The need for post-market monitoring in AI is driven by several characteristics that distinguish AI agents from traditional software: (1) AI agents are sensitive to distributional shift — small changes in the statistical properties of input data can cause significant changes in output quality, without any change to the agent's code or configuration; (2) AI agents can exhibit emergent failure modes — interactions between model behaviour and real-world conditions can produce failures that were not present in testing because the test conditions did not include the specific combination of factors that triggers the failure; (3) AI agent impacts are often indirect and delayed — a biased decision today may not produce a visible adverse outcome for weeks or months, making immediate detection impossible without structured monitoring; (4) the operational environment is dynamic — regulations change, user populations shift, dependent systems are updated, and market conditions evolve, any of which can affect the agent's appropriateness; and (5) AI agents operating at scale produce large volumes of decisions, meaning that even small per-decision errors can aggregate to significant harm.
Post-market monitoring is not the same as operational monitoring of system health (uptime, latency, error rates). System health monitoring detects technical failures — the agent is down, the API is slow, the database is full. Post-market monitoring detects governance failures — the agent is operational but its decisions are degrading, its fairness profile is shifting, or its operating assumptions are no longer valid. Both are necessary; neither substitutes for the other.
The regulatory trajectory is clear. The EU AI Act explicitly requires post-market monitoring for high-risk AI systems (Article 72). The FCA expects firms to monitor AI systems throughout their lifecycle. The NIST AI RMF includes ongoing monitoring as a core governance function. Organisations that do not implement post-market monitoring are both exposed to regulatory action and unable to demonstrate that their agents remain fit for purpose.
Post-market monitoring governance requires a structured programme covering metric definition, data collection, analysis, alerting, response, and reporting. The programme should be integrated with the organisation's broader AI governance framework, connecting to drift detection (AG-022), escalation pathways (AG-019), and independent validation (AG-056).
Recommended patterns:
- Stratify performance metrics by operationally relevant dimensions (claim type, customer segment, demographic group) so that degradation concentrated in a subpopulation is not masked by aggregate figures (Scenario A).
- Monitor inputs and operating conditions, not just outputs: input distribution shift is a leading indicator of output degradation (Scenario B, requirement 4.8).
- Treat fairness assessment as continuous, re-running outcome-distribution analysis on the same cadence as performance monitoring (Scenario C, requirement 4.3).
- Drive alerting, escalation, and periodic review from a single documented monitoring plan per agent (requirement 4.1), so that every metric has a threshold, a response procedure, and a named responsible role.
Anti-patterns to avoid:
- Treating deployment-time validation or an initial equality impact assessment as a one-time exercise.
- Relying on system health monitoring (uptime, latency, error rates) as a substitute for governance monitoring of decision quality, fairness, and operating assumptions.
- Monitoring only aggregate metrics, which hides stratum-level degradation and emerging disparate impact.
- Collecting monitoring data without alert thresholds, response procedures, or named owners, producing information that no one is obliged to act on.
Financial Services. Post-market monitoring aligns with existing model monitoring requirements. The PRA's supervisory statement SS1/23 requires firms to monitor model performance throughout the model lifecycle, and the PRA's broader model risk management expectations include ongoing monitoring and validation. For AI agents, monitoring must extend beyond model performance to cover governance controls, fairness outcomes, and customer outcomes as required by the FCA's Consumer Duty. Monitoring data should feed into the firm's model risk management framework and be reportable to the board or relevant committee.
Healthcare. Post-market monitoring of clinical AI agents aligns with medical device vigilance requirements. Where the agent falls within the scope of the UK Medical Devices Regulations or EU MDR, the manufacturer has specific post-market surveillance obligations including serious incident reporting to the MHRA within defined timeframes. Even where the agent does not meet the medical device threshold, clinical governance requirements mandate ongoing monitoring of AI-assisted clinical decisions. Patient outcome monitoring should be stratified by demographics to detect emerging disparities in clinical outcomes.
Critical Infrastructure. Post-market monitoring of safety-critical AI agents must include monitoring of safety-relevant metrics under both normal and degraded operating conditions. Integration with the site's safety management system is essential. Near-miss monitoring is particularly important — incidents that were prevented by human intervention or safety systems may indicate degradation in the AI agent's performance that will eventually exceed the capacity of compensating controls. Monitoring cadence for safety-critical functions should be daily or more frequent.
Public Sector. Post-market monitoring of AI agents affecting individual rights must include fairness monitoring against the Public Sector Equality Duty. Monitoring findings should feed into the organisation's equality impact assessment process. Freedom of information requests and parliamentary questions about AI agent performance should be anticipated — monitoring data must be in a state that enables timely and accurate response. The Algorithmic Transparency Recording Standard (ATRS) requires ongoing reporting on algorithmic tool performance, which post-market monitoring data supports.
Basic Implementation — The organisation monitors basic performance metrics for each deployed agent on a defined cadence (at minimum quarterly). Outcome distributions across protected characteristics are reviewed at the same cadence. Alert thresholds are defined for key metrics. An adverse outcome and complaint reporting mechanism exists. A periodic review of each agent's continued suitability is conducted at least annually. A monitoring log records metrics, alerts, and response actions. This level meets the minimum mandatory requirements but may lack: input distribution monitoring, automated monitoring pipelines, and stratified metric analysis.
Intermediate Implementation — The mandatory performance and fairness monitoring is supplemented with input distribution monitoring using quantitative metrics. Monitoring is automated with dashboards and automated alerting. Adverse outcome management follows a structured process with triage, investigation, and resolution tracking. Monitoring metrics are stratified by relevant dimensions. Comprehensive periodic reviews evaluate continued suitability against a structured framework. Monitoring data feeds into the governance committee's reporting.
Advanced Implementation — All intermediate capabilities plus: predictive monitoring uses leading indicators to forecast degradation. Outcome feedback loops capture actual outcomes and feed back into performance assessment. Monitoring across agents detects systemic patterns (e.g., multiple agents degrading in response to the same environmental change). The monitoring programme is independently validated (AG-056). Real-time dashboards provide continuous visibility for high-risk agents. The organisation can demonstrate to regulators an unbroken monitoring record for every deployed agent from deployment to current date, with full alert-to-resolution traceability.
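For the predictive monitoring described in 4.10 and in the advanced tier, one simple realisation is to test whether input drift in one period correlates with the change in an output-quality metric in the next. The sketch below assumes the monitoring pipeline already produces both series; it is illustrative only, and short series need guarding against spurious correlation.

```python
import numpy as np

def leading_indicator_strength(input_drift: np.ndarray,
                               quality: np.ndarray,
                               lag: int = 1) -> float:
    """Correlation between an input-drift series (e.g. weekly PSI) at time t
    and the change in an output-quality metric at time t + lag. A strongly
    negative value suggests the drift metric is a usable leading indicator
    of degradation for this agent; a value near zero suggests it is not."""
    dq = np.diff(quality)                 # period-on-period quality change
    x = input_drift[: len(dq) - lag + 1]  # drift leading by `lag` periods
    y = dq[lag - 1 :]                     # quality change `lag` periods later
    return float(np.corrcoef(x, y)[0, 1])
```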
Required artefacts:
- The post-market monitoring plan for each agent, covering metrics, cadence, alert thresholds, response procedures, and responsible roles (4.1).
- The monitoring log recording metrics, alerts, and response actions.
- Alert, investigation, and escalation records with outcomes.
- Adverse outcome, complaint, near-miss, and incident reports with triage and resolution records (4.5).
- Periodic comprehensive review reports (4.6), including withdrawal or restriction decisions where applicable (4.7).
Retention requirements:
Access requirements:
Testing AG-058 compliance requires verification that the post-market monitoring programme is operationally effective — that degradation, emerging bias, and adverse outcomes are detected and acted upon within defined timeframes.
Test 8.1: Degradation Detection. Seed a synthetic, gradual performance degradation into the monitoring data and verify that an alert is raised within the cadence defined for the agent's risk level (4.2, 4.4). An automation sketch follows this list.
Test 8.2: Fairness Drift Detection. Introduce a synthetic shift in outcome distributions for a protected group and verify that fairness monitoring flags the emerging disparity (4.3).
Test 8.3: Alert Response Timeliness. Sample historical alerts and verify that each was triaged, investigated, and escalated within the defined timeframes for its severity level (4.4).
Test 8.4: Adverse Outcome Reporting Coverage. Verify that complaints, adverse outcomes, near-misses, and monitoring anomalies are all captured by the reporting mechanism and carry triage and resolution records (4.5).
Test 8.5: Periodic Review Completeness. Verify that each agent's most recent comprehensive review covers performance adequacy, environmental validity, regulatory compliance, and risk-appetite alignment (4.6).
Test 8.6: Withdrawal Criteria Enforcement. Verify that documented withdrawal or restriction criteria exist for each agent and that, where monitoring findings met those criteria, the corresponding action was taken and recorded (4.7).
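A sketch of how Test 8.1 might be automated, seeding the gradual drift from Scenario A (roughly 1.4 percentage points per month) into a synthetic weekly series and checking that simple threshold alerting fires promptly. The threshold, noise level, and detection window are illustrative stand-ins for whatever the agent's monitoring plan actually defines.

```python
import numpy as np

def test_degradation_detection() -> None:
    """Test 8.1 sketch: synthetic gradual degradation must trigger an alert
    close to the point where the metric truly crosses its threshold."""
    rng = np.random.default_rng(42)
    weeks = np.arange(52)
    baseline = 0.91
    drift_per_week = 0.014 * 12 / 52          # ~1.4 percentage points/month
    observed = baseline - drift_per_week * weeks + rng.normal(0, 0.01, 52)

    threshold = baseline - 0.05               # alert 5 points below baseline
    breaches = np.flatnonzero(observed < threshold)
    assert breaches.size > 0, "alert never fired"

    # The metric truly crosses the threshold around week 16; detection more
    # than ~8 weeks later fails the test (cf. the 6-8 week target in Scenario A).
    true_crossing = int(np.ceil(0.05 / drift_per_week))
    assert breaches[0] - true_crossing <= 8, "alert fired too late"

test_degradation_detection()
```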
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 72 (Post-Market Monitoring by Providers) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| PRA SS1/23 | Model Risk Management — Ongoing Monitoring | Direct requirement |
| FCA Consumer Duty | PRIN 2A.9 (Monitoring of consumer outcomes) | Direct requirement |
| NIST AI RMF | MANAGE 4.1, MEASURE 3.2 | Supports compliance |
| ISO 42001 | Clause 9.1 (Monitoring, Measurement, Analysis and Evaluation) | Direct requirement |
| DORA | Article 8 (Identification — ICT risk monitoring) | Supports compliance |
| UK MDR 2002 | Regulation 68 (Post-market surveillance — medical devices) | Direct requirement (where applicable) |
Article 72 is the most directly relevant provision. It requires providers of high-risk AI systems to establish and document a post-market monitoring system "in a manner that is proportionate to the nature of the artificial intelligence technologies and the risks of the high-risk AI system." The system must actively and systematically collect, document, and analyse relevant data on the performance of the AI system throughout its lifetime. Article 72 further requires that the post-market monitoring system be based on a post-market monitoring plan, which the provider must prepare and update. AG-058 directly implements Article 72 by establishing the governance framework for post-market monitoring, including the monitoring plan, metric definition, alerting, response procedures, and periodic review. The regulation requires that monitoring data be used to evaluate the "continuous compliance" of the system — mapping directly to AG-058's comprehensive periodic review requirement.
Article 9 requires that the risk management system "shall comprise a continuous iterative process planned and run throughout the entire lifecycle of a high-risk AI system." Post-market monitoring provides the data that makes this continuous iteration possible — without monitoring, the risk management system cannot respond to changes after deployment.
SS1/23 expects firms to monitor model performance throughout the model lifecycle, including detection of model degradation and assessment of continued model appropriateness. AG-058 extends this to the full governance scope — not just model accuracy but fairness, environmental validity, and regulatory compliance.
The Consumer Duty requires firms to monitor and regularly review the outcomes their customers are experiencing. For AI agents making customer-facing decisions, this requires monitoring outcome distributions across customer groups, including those with protected characteristics. AG-058 provides the monitoring framework that enables firms to demonstrate compliance with outcome monitoring requirements.
MANAGE 4.1 addresses ongoing monitoring of AI system performance in deployed settings. MEASURE 3.2 addresses mechanisms for feedback from deployment to inform improvements. AG-058 supports compliance by establishing the structures for ongoing monitoring and feedback-driven improvement.
Clause 9.1 requires the organisation to determine what needs to be monitored and measured, the methods for monitoring and measurement, when monitoring and measurement shall be performed, and when results shall be analysed and evaluated. AG-058 provides the operational framework for implementing Clause 9.1 specifically for AI agents.
Article 8 requires financial entities to identify and document ICT-related risks on a continuous basis. For AI agents, this includes monitoring for performance degradation, environmental change, and emerging risks. AG-058 supports compliance by providing the monitoring framework for AI-specific ICT risks.
Where an AI agent falls within the scope of the UK Medical Devices Regulations, the manufacturer has specific post-market surveillance obligations including vigilance reporting to the MHRA. AG-058 provides the broader governance framework that encompasses these device-specific requirements and extends them to clinical AI agents that may not formally qualify as medical devices.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — extends to all individuals, counterparties, and systems affected by the agent's decisions throughout its operational lifetime |
Consequence chain: Without post-market monitoring, an AI agent can degrade, develop bias, or become inappropriate for its operating context without detection. The immediate failure is the absence of information — the organisation does not know whether its agents are performing adequately, producing fair outcomes, or operating within their intended conditions. This information gap means that governance actions cannot be taken because the conditions that would trigger them are invisible. Performance degradation accumulates silently, producing an expanding volume of suboptimal decisions. Emerging bias affects an expanding population of individuals without detection. Environmental changes invalidate the agent's assumptions without correction. When the failure is eventually discovered — typically through an incident, complaint, or regulatory examination rather than through monitoring — the accumulated harm may be extensive and the remediation costly. The regulatory consequence is compounded because the absence of monitoring demonstrates a governance gap: the organisation deployed an AI agent and then failed to monitor whether it continued to work properly. Under the EU AI Act (Article 72), the absence of a post-market monitoring system for a high-risk AI system is itself a compliance failure, independent of whether the agent has actually degraded. The legal exposure increases because the organisation cannot demonstrate due diligence in maintaining the agent's fitness for purpose throughout its operational life.
Cross-references: AG-019 (Human Escalation & Override Triggers), AG-022 (Behavioural Drift Detection), AG-048 (AI Model Provenance and Integrity), AG-049 (Governance Decision Explainability), AG-055 (Oversight Competence Assurance), AG-056 (Independent Validation Governance), AG-057 (Dataset Suitability and Bias Control Governance), AG-051 through AG-054 (Provider Assurance, Rights & Documentation landscape).