AG-140

Novelty and Out-of-Distribution Detection Governance

Category: Competence, Uncertainty & Autonomy Scaling
Version: AGS v2.1 · April 2026
Regulatory tags: EU AI Act · FCA · NIST · ISO 42001

2. Summary

Novelty and Out-of-Distribution Detection Governance requires that every AI agent deployment includes mechanisms to detect when incoming inputs, environmental conditions, or task characteristics differ materially from the distribution on which the agent was trained and validated. OOD detection is the runtime complement to the competence envelope defined in AG-139: where the competence envelope specifies the boundaries, OOD detection identifies when those boundaries are being approached or crossed in real time. A conforming system does not rely on the agent's own confidence scores as the sole indicator of distributional alignment — it implements independent detection mechanisms that evaluate inputs before or alongside agent processing, generating structured signals that feed into escalation (AG-141) and governance routing decisions. Without OOD detection, an agent operating at the boundary of its competence produces outputs indistinguishable from those produced well within its validated domain, creating a silent reliability degradation that compounds until a consequential failure exposes the gap.

3. Example

Scenario A — Undetected Jurisdictional Distribution Shift in Contract Review: A legal technology firm deploys an AI agent to review commercial lease agreements. The agent is trained and validated on 28,000 UK commercial leases drafted under English law, achieving 94.8% clause extraction accuracy and 91.3% risk flagging accuracy. The firm expands into the Irish market and begins routing Irish commercial leases to the same agent. Irish commercial leases share approximately 80% structural similarity with English leases but contain critical differences: different statutory frameworks (Landlord and Tenant (Amendment) Act 1980 vs. Landlord and Tenant Act 1954), different break clause conventions, and different rent review mechanisms. The agent processes 1,400 Irish leases over four months, extracting clauses with apparent confidence. However, Irish-specific provisions — particularly upward-only rent review protections and Part IV renewal rights — are systematically misclassified. The firm discovers the issue when a client challenges an incorrect risk assessment on a €3.2 million lease renewal.

What went wrong: No OOD detection mechanism existed to identify that Irish leases constituted a distributional shift from the English lease training corpus. The agent's confidence scores remained high because the overall document structure was similar — the differences were in specific legal provisions that the confidence scoring mechanism was not calibrated to detect. Consequence: €3.2 million lease risk assessment challenged, 1,400 reviews requiring re-examination, professional indemnity claim, and client relationship damage.

Scenario B — Gradual Distributional Drift in Fraud Detection: A payments processor deploys an AI agent for real-time fraud detection on card-not-present transactions. The agent is validated on 18 months of historical transaction data with a false positive rate of 2.1% and a fraud detection rate of 97.4%. Over 8 months, the transaction population gradually shifts: the average transaction value increases from £47 to £89, the proportion of mobile-initiated transactions rises from 34% to 61%, and new merchant categories (cryptocurrency exchanges, NFT platforms) appear in the data. Each individual shift is incremental and does not trigger any hard boundary. The agent's fraud detection rate degrades from 97.4% to 88.2% over the 8-month period, but because the degradation is gradual, no single day's performance crosses an alert threshold. The cumulative missed fraud totals £4.7 million before a quarterly model review identifies the distributional drift.

What went wrong: The OOD detection was binary — it looked for hard boundary crossings on individual features but did not measure the aggregate distributional distance between incoming data and the validation distribution. The multivariate drift was significant even though no individual feature crossed a hard threshold. Consequence: £4.7 million in undetected fraud over 8 months, regulatory investigation, and mandatory model re-validation under accelerated timeline.

Scenario C — Novel Input Category in Medical Triage: A healthcare provider deploys an AI agent for emergency department triage, validated on 45,000 historical presentations covering 12 chief complaint categories. During a novel respiratory illness outbreak, patients present with a symptom combination — progressive dyspnoea with normal chest X-ray and distinctive ground-glass CT findings — that does not map to any of the 12 validated categories. The agent assigns these patients to the "lower respiratory tract infection" category with moderate confidence, applying standard triage protocols. However, the novel presentations require significantly different resource allocation: isolation rooms, specific PPE, and specialist respiratory review. Of 89 patients triaged during the first week of the outbreak, 34 are assigned inappropriate triage categories, resulting in delayed isolation and 7 secondary transmissions within the department.

What went wrong: The triage agent had no mechanism to detect that the incoming presentations represented a genuinely novel category not present in its training distribution. The agent's categorisation of novel inputs into the nearest existing category — a fundamental behaviour of classification models — was not flagged by any OOD detection mechanism. Consequence: 34 inappropriate triage assignments, 7 secondary transmissions, public health investigation, and temporary suspension of automated triage.

4. Requirement Statement

Scope: This dimension applies to all AI agents operating within a defined competence envelope (AG-139) where the reliability of the agent's output depends on the input being representative of the distribution on which the agent was validated. This includes virtually all deployed agents, because all machine learning systems are fundamentally distribution-dependent: their performance is validated on a specific data distribution, and their reliability on data drawn from a different distribution is unknown. The scope explicitly includes agents that appear to be performing well — an agent can produce high-confidence, plausible outputs on out-of-distribution inputs precisely because the agent has no structural awareness that its input distribution has shifted. The scope extends to environmental conditions and task characteristics, not just input data features: a change in the timing pattern of requests, the quality of upstream data sources, or the characteristics of the user population can all constitute distributional shifts relevant to agent reliability.

4.1. A conforming system MUST implement at least one OOD detection mechanism that operates independently of the agent's own confidence or uncertainty estimates, evaluating incoming inputs against a reference representation of the validation distribution.
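
As an illustration only, the sketch below shows one way a detector of this kind might be built: a Mahalanobis distance check against feature statistics drawn from the validation set, evaluated without consulting the agent's own confidence output. The class name, interface, and regularisation constant are assumptions, not part of the requirement.

```python
# Minimal sketch of an OOD check independent of the agent's own confidence:
# Mahalanobis distance between an incoming feature vector and the validation
# distribution (the AG-139 envelope). All names are illustrative.
import numpy as np

class MahalanobisOODDetector:
    def __init__(self, validation_features: np.ndarray, threshold: float):
        # validation_features: (n_samples, n_features) drawn from the
        # distribution the agent was validated on.
        self.mean = validation_features.mean(axis=0)
        cov = np.cov(validation_features, rowvar=False)
        # Regularise so the covariance is invertible on small or degenerate sets.
        self.inv_cov = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        self.threshold = threshold

    def score(self, x: np.ndarray) -> float:
        delta = x - self.mean
        return float(np.sqrt(delta @ self.inv_cov @ delta))

    def is_ood(self, x: np.ndarray) -> bool:
        # Evaluated before or alongside agent processing (see 4.2),
        # never derived from the agent's confidence estimates.
        return self.score(x) > self.threshold
```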

4.2. A conforming system MUST define quantitative OOD detection thresholds calibrated against the validation distribution, such that inputs exceeding the threshold are flagged before or concurrent with agent processing — not after the agent has produced and delivered output.

4.3. A conforming system MUST generate structured, machine-readable OOD signals that include: the detection method, the measured distance or score, the threshold exceeded, and a timestamp, enabling downstream governance processes (AG-141 escalation, AG-139 envelope monitoring) to consume the signal programmatically.
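
A minimal sketch of what such a signal could look like follows; the schema, field names, and identifiers are illustrative assumptions rather than a prescribed format.

```python
# Sketch of a structured OOD signal carrying the fields listed in 4.3.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class OODSignal:
    agent_id: str
    request_id: str
    detection_method: str   # e.g. "mahalanobis_input_space"
    score: float            # measured distance or score
    threshold: float        # threshold compared against
    exceeded: bool
    timestamp: str

def build_signal(agent_id, request_id, method, score, threshold):
    return OODSignal(
        agent_id=agent_id,
        request_id=request_id,
        detection_method=method,
        score=score,
        threshold=threshold,
        exceeded=score > threshold,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

# Emitted for every evaluated input (see 4.8), not only exceedances, so that
# AG-141 escalation and AG-139 envelope monitoring can consume it programmatically.
signal = build_signal("lease-review-v3", "req-0001", "mahalanobis_input_space", 4.7, 3.1)
print(json.dumps(asdict(signal)))
```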

4.4. A conforming system MUST monitor for distributional drift at the population level — not only individual-input OOD detection — by comparing aggregate statistics of incoming data against the validation distribution on a defined schedule (no less frequent than daily for agents processing more than 1,000 requests per day).

4.5. A conforming system MUST trigger re-validation of the competence envelope (AG-139) when population-level distributional drift exceeds a predefined threshold, as measured by a statistical distance metric (e.g., Population Stability Index > 0.2, Jensen-Shannon divergence > 0.1, or domain-appropriate equivalent).
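
The sketch below shows a Population Stability Index check of the kind named in 4.5, with bins fixed on the validation distribution and the 0.2 trigger mirroring the example threshold above. Bin count, epsilon, and the revalidation hook are assumptions.

```python
# Minimal sketch of a per-feature PSI check against the validation distribution.
import numpy as np

def population_stability_index(validation: np.ndarray,
                               incoming: np.ndarray,
                               n_bins: int = 10) -> float:
    # Bin edges fixed on the validation (reference) distribution.
    edges = np.quantile(validation, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    ref, _ = np.histogram(validation, bins=edges)
    cur, _ = np.histogram(incoming, bins=edges)
    # Convert to proportions; the clip avoids division by zero and log(0).
    ref_p = np.clip(ref / ref.sum(), 1e-6, None)
    cur_p = np.clip(cur / cur.sum(), 1e-6, None)
    return float(np.sum((cur_p - ref_p) * np.log(cur_p / ref_p)))

# Run on the defined schedule (daily for high-volume agents, per 4.4).
# psi = population_stability_index(validation_values, todays_values)
# if psi > 0.2:
#     trigger_envelope_revalidation()   # hypothetical hook into AG-139 workflows
```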

4.6. A conforming system SHOULD implement multiple complementary OOD detection methods — for example, input-space density estimation combined with feature-space distance metrics — to reduce the risk of any single method's blind spots.

4.7. A conforming system SHOULD calibrate OOD detection thresholds using a held-out validation set that includes known OOD examples, measuring the trade-off between detection sensitivity (catching true OOD inputs) and specificity (avoiding false OOD flags on valid inputs), and selecting a threshold that prioritises sensitivity in safety-critical applications.
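
One possible calibration routine is sketched below: it scores a held-out set containing labelled OOD examples and selects the most specific threshold that still meets a target sensitivity, falling back to the most sensitive candidate if none does. The function name and the 0.95 target are assumptions.

```python
# Sketch of threshold calibration on held-out in-distribution and OOD scores.
import numpy as np

def calibrate_threshold(in_dist_scores: np.ndarray,
                        ood_scores: np.ndarray,
                        target_sensitivity: float = 0.95) -> dict:
    candidates = np.unique(np.concatenate([in_dist_scores, ood_scores]))
    results = []
    for t in candidates:
        sensitivity = float(np.mean(ood_scores > t))       # true OOD inputs caught
        specificity = float(np.mean(in_dist_scores <= t))  # valid inputs not flagged
        results.append((t, sensitivity, specificity))
    # Among thresholds meeting the sensitivity target, keep the most specific;
    # if none qualifies, fall back to the lowest (most sensitive) threshold.
    eligible = [r for r in results if r[1] >= target_sensitivity]
    best = max(eligible, key=lambda r: r[2]) if eligible else min(results, key=lambda r: r[0])
    return {"threshold": float(best[0]), "sensitivity": best[1], "specificity": best[2]}
```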

4.8. A conforming system SHOULD log all OOD detection results — including inputs scored below the threshold — to support retrospective analysis of detection method effectiveness and threshold calibration.

4.9. A conforming system MAY implement tiered OOD responses — for example, inputs slightly above the threshold receive enhanced logging and human review, while inputs significantly above the threshold trigger immediate escalation or abstention.
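
A minimal sketch of that tiered routing follows; the 1.5x band, tier names, and response labels are illustrative choices, not prescribed values.

```python
# Sketch of the tiered response pattern permitted by 4.9.
def route_ood_response(score: float, threshold: float) -> str:
    if score <= threshold:
        return "process_normally"
    if score <= 1.5 * threshold:
        # Slightly above threshold: process, but log and queue for human review.
        return "enhanced_logging_and_review"
    # Well above threshold: abstain and escalate immediately (AG-141).
    return "abstain_and_escalate"
```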

5. Rationale

Out-of-distribution detection is a necessary complement to competence envelope governance because the competence envelope defines the validated boundary statically, while OOD detection enforces that boundary dynamically at runtime. The competence envelope says "this agent was validated on inputs with these characteristics." OOD detection says "this specific incoming input does or does not match those characteristics."

The fundamental challenge is that AI agents — particularly those built on large language models or deep neural networks — do not fail gracefully at distribution boundaries. A traditional rule-based system encountering an input it cannot process returns an error. An AI agent encountering an input outside its training distribution typically produces output that is syntactically valid, stylistically consistent with in-distribution output, and may even carry high confidence scores. The confidence calibration of modern neural networks is notoriously poor at distribution boundaries: models can assign high confidence to inputs far removed from their training data, because the softmax function produces a confident-looking probability distribution even when the model's internal representations are poorly matched to the input.

This means that without independent OOD detection, organisations have no reliable runtime signal indicating when agent output should not be trusted. The agent's own confidence is unreliable precisely in the cases where reliability matters most. The solution is to implement detection mechanisms that evaluate inputs against the known validation distribution using methods that are independent of the agent's internal representations — or, if they use the agent's internal representations, do so through calibrated secondary models trained specifically for distributional distance estimation.

Population-level drift monitoring addresses a different failure mode: the gradual shift in input distribution that does not trigger individual-input OOD detection but cumulatively moves the operational distribution away from the validation distribution. This is analogous to the difference between a sudden temperature spike (detectable by a single reading) and a gradual warming trend (detectable only by comparing the distribution of readings over time). Both are dangerous; both require detection.
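
To make the distinction concrete, the sketch below compares a recent window of readings against the validation distribution with a two-sample Kolmogorov-Smirnov test (scipy.stats.ks_2samp), which can flag a gradual shift even when every individual value remains within historical range. The window size and significance level are assumptions.

```python
# Sketch of a population-level drift check using a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def drifted(validation_values: np.ndarray,
            recent_values: np.ndarray,
            alpha: float = 0.01) -> bool:
    statistic, p_value = ks_2samp(validation_values, recent_values)
    # A small p-value indicates the recent window is unlikely to have been
    # drawn from the validation distribution, even if no single value is extreme.
    return p_value < alpha
```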

This dimension intersects with AG-139 (Competence Envelope Governance) as the runtime enforcement arm of the envelope. It intersects with AG-141 (Mandatory Abstention and Uncertainty Escalation Governance) because OOD detection signals are a primary input to abstention and escalation decisions. It intersects with AG-022 (Behavioural Drift Detection) because behavioural drift may be caused by distributional drift — the agent's behaviour changes because its inputs have changed. It intersects with AG-074 (Performance Drift and Revalidation) because distributional drift is a leading indicator of performance drift.

6. Implementation Guidance

OOD detection operates at two levels: individual-input detection (is this specific input within the validated distribution?) and population-level drift monitoring (has the overall distribution of inputs shifted from the validation distribution?). Both are required. They serve different purposes and catch different failure modes.
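
As a hedged illustration of how the two levels might sit together in a single request path, the sketch below pairs a synchronous per-input check with a buffer consumed by a scheduled population-level job; the agent, detector, and buffer objects are hypothetical placeholders.

```python
# Sketch of a request pipeline combining both OOD detection levels.
class GovernedAgentPipeline:
    def __init__(self, agent, ood_detector, drift_buffer):
        self.agent = agent
        self.ood_detector = ood_detector   # individual-input level (4.1, 4.2)
        self.drift_buffer = drift_buffer   # population level (4.4, 4.5)

    def handle(self, request_features, request_payload):
        # Every input contributes to population-level statistics,
        # whether or not it is individually flagged.
        self.drift_buffer.append(request_features)

        score = self.ood_detector.score(request_features)
        if score > self.ood_detector.threshold:
            # Emit the structured signal and hand off to AG-141 escalation
            # rather than returning output as if the input were in-distribution.
            return {"status": "escalated", "ood_score": score}

        return {"status": "ok", "output": self.agent.run(request_payload)}
```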

Individual-Input OOD Detection Methods:

Population-Level Drift Monitoring Methods:

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. OOD detection for trading and risk agents should account for regime changes — transitions between low-volatility and high-volatility market environments that represent distributional shifts even if individual features remain within historical ranges. The combination of features (e.g., low volume with high volatility) may be OOD even when each feature individually is not. Regulators expect firms to demonstrate that models are monitored for ongoing appropriateness (FCA SS1/23, ECB TRIM).
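
The sketch below illustrates the point about feature combinations using synthetic data: each marginal z-score stays inside a typical plus-or-minus three boundary, yet the joint Mahalanobis distance is large because low volume combined with high volatility contradicts the correlation seen during validation. The data, threshold, and variable names are illustrative assumptions.

```python
# Sketch: marginal checks miss a regime change that a joint check catches.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical validation data in which volume and volatility move together.
volume = rng.normal(100, 15, 5000)
volatility = 0.2 * volume + rng.normal(0, 3, 5000)
validation = np.column_stack([volume, volatility])

mean = validation.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(validation, rowvar=False))

observation = np.array([70.0, 28.0])   # low volume combined with high volatility

# Per-feature view: each marginal z-score is roughly 2, below a hard boundary.
marginal_z = np.abs((observation - mean) / validation.std(axis=0))

# Joint view: the Mahalanobis distance is large because the combination
# is inconsistent with the validated correlation structure.
delta = observation - mean
joint_distance = float(np.sqrt(delta @ inv_cov @ delta))

print(marginal_z)       # no individual feature crosses a hard threshold
print(joint_distance)   # well above a distance threshold in the 3-4 range
```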

Healthcare. OOD detection for clinical agents must be sensitive to novel disease presentations, emerging pathogens, and population demographic shifts. The cost of a false negative (missing a true OOD input) is significantly higher than the cost of a false positive (escalating an in-distribution input). Thresholds should be set to prioritise sensitivity. Novel symptom combinations should trigger immediate clinical review regardless of agent confidence.

Autonomous Systems. OOD detection for embodied agents (robotics, autonomous vehicles, edge devices) must operate under strict latency constraints — typically under 10ms. Methods must be computationally efficient. Environmental novelty detection (new obstacles, unusual lighting, degraded sensor quality) is as important as input-data novelty detection. Edge-deployed models may use lightweight distance metrics (cosine distance to nearest cluster centroid) rather than full density estimation.
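
A minimal sketch of the lightweight edge-side check mentioned above follows: cosine distance to the nearest of a small set of cluster centroids fitted offline on validation embeddings. The class name, centroid count, and threshold are assumptions.

```python
# Sketch of a low-latency novelty check for edge deployment.
import numpy as np

class CentroidNoveltyCheck:
    def __init__(self, centroids: np.ndarray, max_cosine_distance: float):
        # centroids: (k, d) cluster centres precomputed offline (e.g. k-means
        # on validation embeddings) and shipped to the edge device.
        self.centroids = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
        self.max_cosine_distance = max_cosine_distance

    def is_novel(self, embedding: np.ndarray) -> bool:
        e = embedding / np.linalg.norm(embedding)
        # Cosine distance to the nearest centroid: a handful of dot products,
        # cheap enough for sub-10ms latency budgets.
        nearest_similarity = float(np.max(self.centroids @ e))
        return (1.0 - nearest_similarity) > self.max_cosine_distance
```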

Maturity Model

Basic Implementation — The organisation monitors agent confidence scores and flags outputs below a predefined confidence threshold. Population-level monitoring consists of periodic (monthly) review of aggregate performance metrics. OOD detection is not independent of the agent — it relies on the agent's own uncertainty estimates. This level provides minimal awareness of distributional issues but has significant blind spots: high-confidence OOD inputs are not detected, and monthly monitoring cadence allows drift to accumulate before detection.

Intermediate Implementation — At least one independent OOD detection mechanism operates alongside the agent, evaluating inputs against a reference distribution representation derived from the validation dataset. Individual-input detection operates synchronously in the request pipeline. Population-level drift monitoring runs on a daily or weekly schedule using statistical distance metrics (PSI, KS tests). OOD signals are structured and machine-readable, feeding into escalation workflows (AG-141). Detection thresholds are calibrated using held-out data including known OOD examples. The reference distribution is versioned alongside the competence envelope.

Advanced Implementation — All intermediate capabilities plus: multiple complementary OOD detection methods are deployed (density-based, distance-based, and ensemble-based) to reduce blind spots. Multivariate drift detection captures correlated distributional shifts. Continuous OOD score distributions are monitored and threshold calibration is reviewed quarterly. OOD detection effectiveness is measured through labelled OOD evaluation sets updated periodically with real-world OOD examples encountered in production. Independent third-party evaluation of detection method coverage is performed annually. The organisation can demonstrate to regulators a complete chain from reference distribution through detection mechanism to escalation response for every deployed agent.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Testing AG-140 compliance requires verification that OOD detection mechanisms function correctly across a range of distributional shift scenarios. A comprehensive test programme should include the following tests.

Test 8.1: Known OOD Detection Sensitivity

Test 8.2: In-Distribution False Positive Rate

Test 8.3: Detection Independence from Agent Confidence

Test 8.4: Population-Level Drift Detection

Test 8.5: Structured OOD Signal Generation

Test 8.6: Detection Latency Under Load

Test 8.7: Re-validation Trigger on Drift Detection
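
As an illustration of how one of these tests might be automated, the sketch below covers Test 8.3: known OOD inputs that the agent scores with high confidence must still be flagged by the independent detector. The fixtures and the detector and agent interfaces are hypothetical placeholders, not part of the specification.

```python
# Sketch of an automated check for Test 8.3 (independence from agent confidence).
def test_detection_independent_of_agent_confidence(detector, agent, known_ood_inputs):
    missed_confident_ood = []
    for features, payload in known_ood_inputs:
        confidence = agent.run(payload).confidence          # hypothetical agent API
        flagged = detector.score(features) > detector.threshold
        if confidence >= 0.9 and not flagged:
            missed_confident_ood.append(payload)
    # The detector must not inherit the agent's blind spots: a high-confidence
    # OOD input is exactly the case the detector exists to catch.
    assert not missed_confident_ood
```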

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Supports compliance
EU AI Act | Article 15 (Accuracy, Robustness, Cybersecurity) | Direct requirement
EU AI Act | Article 72 (Post-Market Monitoring) | Direct requirement
NIST AI RMF | MEASURE 1.1, MEASURE 2.5, MANAGE 3.1 | Supports compliance
ISO 42001 | Clause 8.2 (AI Risk Assessment), Clause 9.1 (Monitoring, Measurement, Analysis) | Supports compliance
FCA SS1/23 | Model Risk Management — Ongoing Monitoring | Direct requirement
DORA | Article 10 (ICT-Related Incident Detection) | Supports compliance

EU AI Act — Article 15 (Accuracy, Robustness, Cybersecurity)

Article 15(1) requires that high-risk AI systems are designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and perform consistently in those respects throughout their lifecycle. OOD detection is a primary mechanism for ensuring lifecycle consistency — it detects when the conditions under which accuracy was validated no longer hold, triggering appropriate governance responses. Without OOD detection, the accuracy validated at deployment degrades silently as the input distribution evolves.

EU AI Act — Article 72 (Post-Market Monitoring)

Article 72 requires providers of high-risk AI systems to establish a post-market monitoring system to actively collect and review data on the performance of AI systems throughout their lifetime. OOD detection and population-level drift monitoring directly implement this requirement by providing continuous, structured monitoring of whether the AI system's operational conditions remain aligned with its validation conditions. The data collected through OOD detection feeds directly into post-market monitoring obligations.

FCA SS1/23 — Model Risk Management — Ongoing Monitoring

The FCA's supervisory statement requires firms to "monitor models on an ongoing basis to confirm that they continue to perform as expected." For AI agents, "as expected" means within the validated competence envelope on the validated data distribution. OOD detection operationalises this requirement by providing continuous, automated monitoring of distributional alignment. The FCA expects firms to identify and respond to model deterioration in a timely manner — the daily monitoring cadence required by AG-140 supports this expectation.

NIST AI RMF — MEASURE 1.1, MEASURE 2.5, MANAGE 3.1

MEASURE 1.1 addresses appropriate methods and metrics for measuring AI system performance. MEASURE 2.5 addresses the evaluation of AI system performance in deployment. MANAGE 3.1 addresses the continuous monitoring of identified risks. OOD detection supports compliance by providing runtime measurement of distributional alignment, deployment performance evaluation through drift monitoring, and continuous risk monitoring through structured OOD signals.

ISO 42001 — Clause 8.2, Clause 9.1

Clause 8.2 requires AI risk assessment including the identification of risks that may arise from changes in the operating environment. Clause 9.1 requires monitoring, measurement, analysis, and evaluation of the AI management system. OOD detection and drift monitoring directly implement both requirements by identifying environmental changes (distributional shifts) and providing ongoing measurement of system alignment with validated conditions.

DORA — Article 10 (ICT-Related Incident Detection)

Article 10 requires financial entities to establish mechanisms for the detection of ICT-related incidents. Distributional drift that degrades AI agent performance constitutes an ICT-related incident when it affects financial operations. OOD detection provides the early warning mechanism that enables detection before the drift results in consequential failures.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Domain-specific — bounded by the scope of tasks processed by the agent, but potentially extending to all downstream consumers of agent output produced during undetected OOD conditions

Consequence chain: Without OOD detection, distributional shifts go undetected and the agent processes inputs outside its validated competence without any governance signal. The immediate failure is silent reliability degradation — the agent continues to produce outputs that appear valid but are produced under conditions where reliability has not been demonstrated. The failure compounds over time because each undetected OOD input produces an output that is consumed with the same trust level as in-distribution output. For individual-input OOD failures, the consequence is isolated to the specific decision or action based on the unreliable output. For population-level drift, the consequence is systemic — all outputs produced during the drift period are potentially unreliable, requiring retrospective review. The business consequence includes: financial loss from decisions based on unreliable agent output during undetected drift periods, regulatory enforcement for failure to maintain ongoing model monitoring (FCA SS1/23, EU AI Act Article 72), liability for unreliable outputs relied upon by third parties, and the remediation cost of retrospective review of all outputs produced during undetected OOD conditions. In safety-critical domains, undetected OOD operation can result in direct harm to individuals through unreliable clinical, safety, or infrastructure decisions.

Cross-references: AG-139 (Competence Envelope Governance) defines the validated boundaries that OOD detection monitors at runtime. AG-141 (Mandatory Abstention and Uncertainty Escalation Governance) defines the governance response when OOD inputs are detected. AG-142 (Autonomy Progression Governance) uses OOD detection signals as an input to autonomy level decisions. AG-022 (Behavioural Drift Detection) monitors behavioural changes that may be caused by distributional drift. AG-074 (Performance Drift and Revalidation) triggers re-validation when OOD-induced performance degradation is detected. AG-041 (Emergent Capability Detection) identifies capability changes that may alter the relationship between distribution and performance. AG-037 (Objective Alignment Verification) ensures that the agent's objectives remain aligned even when inputs approach distributional boundaries. AG-019 (Human Escalation & Override Triggers) defines the human escalation paths activated by OOD detection signals.

Cite this protocol
AgentGoverning. (2026). AG-140: Novelty and Out-of-Distribution Detection Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-140