AG-140

Novelty and Out-of-Distribution Detection Governance

Category: Competence, Uncertainty & Autonomy Scaling
Version: AGS v2.1 · April 2026
Regulatory tags: EU AI Act · FCA · NIST · ISO 42001

2. Summary

Novelty and Out-of-Distribution Detection Governance requires that every AI agent deployment includes mechanisms to detect when incoming inputs, environmental conditions, or task characteristics differ materially from the distribution on which the agent was trained and validated. OOD detection is the runtime complement to the competence envelope defined in AG-139: where the competence envelope specifies the boundaries, OOD detection identifies when those boundaries are being approached or crossed in real time. A conforming system does not rely on the agent's own confidence scores as the sole indicator of distributional alignment — it implements independent detection mechanisms that evaluate inputs before or alongside agent processing, generating structured signals that feed into escalation (AG-141) and governance routing decisions. Without OOD detection, an agent operating at the boundary of its competence produces outputs indistinguishable from those produced well within its validated domain, creating a silent reliability degradation that compounds until a consequential failure exposes the gap.

3. Example

Scenario A — Undetected Jurisdictional Distribution Shift in Contract Review: A legal technology firm deploys an AI agent to review commercial lease agreements. The agent is trained and validated on 28,000 UK commercial leases drafted under English law, achieving 94.8% clause extraction accuracy and 91.3% risk flagging accuracy. The firm expands into the Irish market and begins routing Irish commercial leases to the same agent. Irish commercial leases share approximately 80% structural similarity with English leases but contain critical differences: different statutory frameworks (Landlord and Tenant (Amendment) Act 1980 vs. Landlord and Tenant Act 1954), different break clause conventions, and different rent review mechanisms. The agent processes 1,400 Irish leases over four months, extracting clauses with apparent confidence. However, Irish-specific provisions — particularly upward-only rent review protections and Part IV renewal rights — are systematically misclassified. The firm discovers the issue when a client challenges an incorrect risk assessment on a €3.2 million lease renewal.

What went wrong: No OOD detection mechanism existed to identify that Irish leases constituted a distributional shift from the English lease training corpus. The agent's confidence scores remained high because the overall document structure was similar — the differences were in specific legal provisions that the confidence scoring mechanism was not calibrated to detect. Consequence: €3.2 million lease risk assessment challenged, 1,400 reviews requiring re-examination, professional indemnity claim, and client relationship damage.

Scenario B — Gradual Distributional Drift in Fraud Detection: A payments processor deploys an AI agent for real-time fraud detection on card-not-present transactions. The agent is validated on 18 months of historical transaction data with a false positive rate of 2.1% and a fraud detection rate of 97.4%. Over 8 months, the transaction population gradually shifts: the average transaction value increases from £47 to £89, the proportion of mobile-initiated transactions rises from 34% to 61%, and new merchant categories (cryptocurrency exchanges, NFT platforms) appear in the data. Each individual shift is incremental and does not trigger any hard boundary. The agent's fraud detection rate degrades from 97.4% to 88.2% over the 8-month period, but because the degradation is gradual, no single day's performance crosses an alert threshold. The cumulative missed fraud totals £4.7 million before a quarterly model review identifies the distributional drift.

What went wrong: The OOD detection was binary — it looked for hard boundary crossings on individual features but did not measure the aggregate distributional distance between incoming data and the validation distribution. The multivariate drift was significant even though no individual feature crossed a hard threshold. Consequence: £4.7 million in undetected fraud over 8 months, regulatory investigation, and mandatory model re-validation under accelerated timeline.

Scenario C — Novel Input Category in Medical Triage: A healthcare provider deploys an AI agent for emergency department triage, validated on 45,000 historical presentations covering 12 chief complaint categories. During a novel respiratory illness outbreak, patients present with a symptom combination — progressive dyspnoea with normal chest X-ray and distinctive ground-glass CT findings — that does not map to any of the 12 validated categories. The agent assigns these patients to the "lower respiratory tract infection" category with moderate confidence, applying standard triage protocols. However, the novel presentations require significantly different resource allocation: isolation rooms, specific PPE, and specialist respiratory review. Of 89 patients triaged during the first week of the outbreak, 34 are assigned inappropriate triage categories, resulting in delayed isolation and 7 secondary transmissions within the department.

What went wrong: The triage agent had no mechanism to detect that the incoming presentations represented a genuinely novel category not present in its training distribution. The agent's categorisation of novel inputs into the nearest existing category — a fundamental behaviour of classification models — was not flagged by any OOD detection mechanism. Consequence: 34 inappropriate triage assignments, 7 secondary transmissions, public health investigation, and temporary suspension of automated triage.

4. Requirement Statement

Scope: This dimension applies to all AI agents operating within a defined competence envelope (AG-139) where the reliability of the agent's output depends on the input being representative of the distribution on which the agent was validated. This includes virtually all deployed agents, because all machine learning systems are fundamentally distribution-dependent: their performance is validated on a specific data distribution, and their reliability on data drawn from a different distribution is unknown. The scope explicitly includes agents that appear to be performing well — an agent can produce high-confidence, plausible outputs on out-of-distribution inputs precisely because the agent has no structural awareness that its input distribution has shifted. The scope extends to environmental conditions and task characteristics, not just input data features: a change in the timing pattern of requests, the quality of upstream data sources, or the characteristics of the user population can all constitute distributional shifts relevant to agent reliability.

4.1. A conforming system MUST implement at least one OOD detection mechanism that operates independently of the agent's own confidence or uncertainty estimates, evaluating incoming inputs against a reference representation of the validation distribution.
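
As an illustration only, the sketch below shows one way a detector of this kind might be built: a Mahalanobis distance check against feature statistics drawn from the validation set, evaluated without consulting the agent's own confidence output. The class name, interface, and regularisation constant are assumptions, not part of the requirement.

```python
# Minimal sketch of an OOD check independent of the agent's own confidence:
# Mahalanobis distance between an incoming feature vector and the validation
# distribution (the AG-139 envelope). All names are illustrative.
import numpy as np

class MahalanobisOODDetector:
    def __init__(self, validation_features: np.ndarray, threshold: float):
        # validation_features: (n_samples, n_features) drawn from the
        # distribution the agent was validated on.
        self.mean = validation_features.mean(axis=0)
        cov = np.cov(validation_features, rowvar=False)
        # Regularise so the covariance is invertible on small or degenerate sets.
        self.inv_cov = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        self.threshold = threshold

    def score(self, x: np.ndarray) -> float:
        delta = x - self.mean
        return float(np.sqrt(delta @ self.inv_cov @ delta))

    def is_ood(self, x: np.ndarray) -> bool:
        # Evaluated before or alongside agent processing (see 4.2),
        # never derived from the agent's confidence estimates.
        return self.score(x) > self.threshold
```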

4.2. A conforming system MUST define quantitative OOD detection thresholds calibrated against the validation distribution, such that inputs exceeding the threshold are flagged before or concurrent with agent processing — not after the agent has produced and delivered output.

4.3. A conforming system MUST generate structured, machine-readable OOD signals that include: the detection method, the measured distance or score, the threshold exceeded, and a timestamp, enabling downstream governance processes (AG-141 escalation, AG-139 envelope monitoring) to consume the signal programmatically.
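
A minimal sketch of what such a signal could look like follows; the schema, field names, and identifiers are illustrative assumptions rather than a prescribed format.

```python
# Sketch of a structured OOD signal carrying the fields listed in 4.3.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class OODSignal:
    agent_id: str
    request_id: str
    detection_method: str   # e.g. "mahalanobis_input_space"
    score: float            # measured distance or score
    threshold: float        # threshold compared against
    exceeded: bool
    timestamp: str

def build_signal(agent_id, request_id, method, score, threshold):
    return OODSignal(
        agent_id=agent_id,
        request_id=request_id,
        detection_method=method,
        score=score,
        threshold=threshold,
        exceeded=score > threshold,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

# Emitted for every evaluated input (see 4.8), not only exceedances, so that
# AG-141 escalation and AG-139 envelope monitoring can consume it programmatically.
signal = build_signal("lease-review-v3", "req-0001", "mahalanobis_input_space", 4.7, 3.1)
print(json.dumps(asdict(signal)))
```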

4.4. A conforming system MUST monitor for distributional drift at the population level — not only individual-input OOD detection — by comparing aggregate statistics of incoming data against the validation distribution on a defined schedule (no less frequent than daily for agents processing more than 1,000 requests per day).

4.5. A conforming system MUST trigger re-validation of the competence envelope (AG-139) when population-level distributional drift exceeds a predefined threshold, as measured by a statistical distance metric (e.g., Population Stability Index > 0.2, Jensen-Shannon divergence > 0.1, or domain-appropriate equivalent).
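
The sketch below shows a Population Stability Index check of the kind named in 4.5, with bins fixed on the validation distribution and the 0.2 trigger mirroring the example threshold above. Bin count, epsilon, and the revalidation hook are assumptions.

```python
# Minimal sketch of a per-feature PSI check against the validation distribution.
import numpy as np

def population_stability_index(validation: np.ndarray,
                               incoming: np.ndarray,
                               n_bins: int = 10) -> float:
    # Bin edges fixed on the validation (reference) distribution.
    edges = np.quantile(validation, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    ref, _ = np.histogram(validation, bins=edges)
    cur, _ = np.histogram(incoming, bins=edges)
    # Convert to proportions; the clip avoids division by zero and log(0).
    ref_p = np.clip(ref / ref.sum(), 1e-6, None)
    cur_p = np.clip(cur / cur.sum(), 1e-6, None)
    return float(np.sum((cur_p - ref_p) * np.log(cur_p / ref_p)))

# Run on the defined schedule (daily for high-volume agents, per 4.4).
# psi = population_stability_index(validation_values, todays_values)
# if psi > 0.2:
#     trigger_envelope_revalidation()   # hypothetical hook into AG-139 workflows
```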

4.6. A conforming system SHOULD implement multiple complementary OOD detection methods — for example, input-space density estimation combined with feature-space distance metrics — to reduce the risk of any single method's blind spots.

4.7. A conforming system SHOULD calibrate OOD detection thresholds using a held-out validation set that includes known OOD examples, measuring the trade-off between detection sensitivity (catching true OOD inputs) and specificity (avoiding false OOD flags on valid inputs), and selecting a threshold that prioritises sensitivity in safety-critical applications.
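
One possible calibration routine is sketched below: it scores a held-out set containing labelled OOD examples and selects the most specific threshold that still meets a target sensitivity, falling back to the most sensitive candidate if none does. The function name and the 0.95 target are assumptions.

```python
# Sketch of threshold calibration on held-out in-distribution and OOD scores.
import numpy as np

def calibrate_threshold(in_dist_scores: np.ndarray,
                        ood_scores: np.ndarray,
                        target_sensitivity: float = 0.95) -> dict:
    candidates = np.unique(np.concatenate([in_dist_scores, ood_scores]))
    results = []
    for t in candidates:
        sensitivity = float(np.mean(ood_scores > t))       # true OOD inputs caught
        specificity = float(np.mean(in_dist_scores <= t))  # valid inputs not flagged
        results.append((t, sensitivity, specificity))
    # Among thresholds meeting the sensitivity target, keep the most specific;
    # if none qualifies, fall back to the lowest (most sensitive) threshold.
    eligible = [r for r in results if r[1] >= target_sensitivity]
    best = max(eligible, key=lambda r: r[2]) if eligible else min(results, key=lambda r: r[0])
    return {"threshold": float(best[0]), "sensitivity": best[1], "specificity": best[2]}
```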

4.8. A conforming system SHOULD log all OOD detection results — including inputs scored below the threshold — to support retrospective analysis of detection method effectiveness and threshold calibration.

4.9. A conforming system MAY implement tiered OOD responses — for example, inputs slightly above the threshold receive enhanced logging and human review, while inputs significantly above the threshold trigger immediate escalation or abstention.
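
A minimal sketch of that tiered routing follows; the 1.5x band, tier names, and response labels are illustrative choices, not prescribed values.

```python
# Sketch of the tiered response pattern permitted by 4.9.
def route_ood_response(score: float, threshold: float) -> str:
    if score <= threshold:
        return "process_normally"
    if score <= 1.5 * threshold:
        # Slightly above threshold: process, but log and queue for human review.
        return "enhanced_logging_and_review"
    # Well above threshold: abstain and escalate immediately (AG-141).
    return "abstain_and_escalate"
```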

5. Rationale

Out-of-distribution detection is a necessary complement to competence envelope governance because the competence envelope defines the validated boundary statically, while OOD detection enforces that boundary dynamically at runtime. The competence envelope says "this agent was validated on inputs with these characteristics." OOD detection says "this specific incoming input does or does not match those characteristics."

The fundamental challenge is that AI agents — particularly those built on large language models or deep neural networks — do not fail gracefully at distribution boundaries. A traditional rule-based system encountering an input it cannot process returns an error. An AI agent encountering an input outside its training distribution typically produces output that is syntactically valid, stylistically consistent with in-distribution output, and may even carry high confidence scores. The confidence calibration of modern neural networks is notoriously poor at distribution boundaries: models can assign high confidence to inputs far removed from their training data, because the softmax function produces a confident-looking probability distribution even when the model's internal representations are poorly matched to the input.

This means that without independent OOD detection, organisations have no reliable runtime signal indicating when agent output should not be trusted. The agent's own confidence is unreliable precisely in the cases where reliability matters most. The solution is to implement detection mechanisms that evaluate inputs against the known validation distribution using methods that are independent of the agent's internal representations — or, if they use the agent's internal representations, do so through calibrated secondary models trained specifically for distributional distance estimation.

Population-level drift monitoring addresses a different failure mode: the gradual shift in input distribution that does not trigger individual-input OOD detection but cumulatively moves the operational distribution away from the validation distribution. This is analogous to the difference between a sudden temperature spike (detectable by a single reading) and a gradual warming trend (detectable only by comparing the distribution of readings over time). Both are dangerous; both require detection.
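
To make the distinction concrete, the sketch below compares a recent window of readings against the validation distribution with a two-sample Kolmogorov-Smirnov test (scipy.stats.ks_2samp), which can flag a gradual shift even when every individual value remains within historical range. The window size and significance level are assumptions.

```python
# Sketch of a population-level drift check using a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def drifted(validation_values: np.ndarray,
            recent_values: np.ndarray,
            alpha: float = 0.01) -> bool:
    statistic, p_value = ks_2samp(validation_values, recent_values)
    # A small p-value indicates the recent window is unlikely to have been
    # drawn from the validation distribution, even if no single value is extreme.
    return p_value < alpha
```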

This dimension intersects with AG-139 (Competence Envelope Governance) as the runtime enforcement arm of the envelope. It intersects with AG-141 (Mandatory Abstention and Uncertainty Escalation Governance) because OOD detection signals are a primary input to abstention and escalation decisions. It intersects with AG-022 (Behavioural Drift Detection) because behavioural drift may be caused by distributional drift — the agent's behaviour changes because its inputs have changed. It intersects with AG-074 (Performance Drift and Revalidation) because distributional drift is a leading indicator of performance drift.

6. Implementation Guidance

OOD detection operates at two levels: individual-input detection (is this specific input within the validated distribution?) and population-level drift monitoring (has the overall distribution of inputs shifted from the validation distribution?). Both are required. They serve different purposes and catch different failure modes.
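
As a hedged illustration of how the two levels might sit together in a single request path, the sketch below pairs a synchronous per-input check with a buffer consumed by a scheduled population-level job; the agent, detector, and buffer objects are hypothetical placeholders.

```python
# Sketch of a request pipeline combining both OOD detection levels.
class GovernedAgentPipeline:
    def __init__(self, agent, ood_detector, drift_buffer):
        self.agent = agent
        self.ood_detector = ood_detector   # individual-input level (4.1, 4.2)
        self.drift_buffer = drift_buffer   # population level (4.4, 4.5)

    def handle(self, request_features, request_payload):
        # Every input contributes to population-level statistics,
        # whether or not it is individually flagged.
        self.drift_buffer.append(request_features)

        score = self.ood_detector.score(request_features)
        if score > self.ood_detector.threshold:
            # Emit the structured signal and hand off to AG-141 escalation
            # rather than returning output as if the input were in-distribution.
            return {"status": "escalated", "ood_score": score}

        return {"status": "ok", "output": self.agent.run(request_payload)}
```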

Individual-Input OOD Detection Methods:

Population-Level Drift Monitoring Methods:

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. OOD detection for trading and risk agents should account for regime changes — transitions between low-volatility and high-volatility market environments that represent distributional shifts even if individual features remain within historical ranges. The combination of features (e.g., low volume with high volatility) may be OOD even when each feature individually is not. Regulators expect firms to demonstrate that models are monitored for ongoing appropriateness (FCA SS1/23, ECB TRIM).
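
The sketch below illustrates the point about feature combinations using synthetic data: each marginal z-score stays inside a typical plus-or-minus three boundary, yet the joint Mahalanobis distance is large because low volume combined with high volatility contradicts the correlation seen during validation. The data, threshold, and variable names are illustrative assumptions.

```python
# Sketch: marginal checks miss a regime change that a joint check catches.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical validation data in which volume and volatility move together.
volume = rng.normal(100, 15, 5000)
volatility = 0.2 * volume + rng.normal(0, 3, 5000)
validation = np.column_stack([volume, volatility])

mean = validation.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(validation, rowvar=False))

observation = np.array([70.0, 28.0])   # low volume combined with high volatility

# Per-feature view: each marginal z-score is roughly 2, below a hard boundary.
marginal_z = np.abs((observation - mean) / validation.std(axis=0))

# Joint view: the Mahalanobis distance is large because the combination
# is inconsistent with the validated correlation structure.
delta = observation - mean
joint_distance = float(np.sqrt(delta @ inv_cov @ delta))

print(marginal_z)       # no individual feature crosses a hard threshold
print(joint_distance)   # well above a distance threshold in the 3-4 range
```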

Healthcare. OOD detection for clinical agents must be sensitive to novel disease presentations, emerging pathogens, and population demographic shifts. The cost of a false negative (missing a true OOD input) is significantly higher than the cost of a false positive (escalating an in-distribution input). Thresholds should be set to prioritise sensitivity. Novel symptom combinations should trigger immediate clinical review regardless of agent confidence.

Autonomous Systems. OOD detection for embodied agents (robotics, autonomous vehicles, edge devices) must operate under strict latency constraints — typically under 10ms. Methods must be computationally efficient. Environmental novelty detection (new obstacles, unusual lighting, degraded sensor quality) is as important as input-data novelty detection. Edge-deployed models may use lightweight distance metrics (cosine distance to nearest cluster centroid) rather than full density estimation.
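
A minimal sketch of the lightweight edge-side check mentioned above follows: cosine distance to the nearest of a small set of cluster centroids fitted offline on validation embeddings. The class name, centroid count, and threshold are assumptions.

```python
# Sketch of a low-latency novelty check for edge deployment.
import numpy as np

class CentroidNoveltyCheck:
    def __init__(self, centroids: np.ndarray, max_cosine_distance: float):
        # centroids: (k, d) cluster centres precomputed offline (e.g. k-means
        # on validation embeddings) and shipped to the edge device.
        self.centroids = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
        self.max_cosine_distance = max_cosine_distance

    def is_novel(self, embedding: np.ndarray) -> bool:
        e = embedding / np.linalg.norm(embedding)
        # Cosine distance to the nearest centroid: a handful of dot products,
        # cheap enough for sub-10ms latency budgets.
        nearest_similarity = float(np.max(self.centroids @ e))
        return (1.0 - nearest_similarity) > self.max_cosine_distance
```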

Maturity Model

Basic Implementation — The organisation monitors agent confidence scores and flags outputs below a predefined confidence threshold. Population-level monitoring consists of periodic (monthly) review of aggregate performance metrics. OOD detection is not independent of the agent — it relies on the agent's own uncertainty estimates. This level provides minimal awareness of distributional issues but has significant blind spots: high-confidence OOD inputs are not detected, and monthly monitoring cadence allows drift to accumulate before detection.

Intermediate Implementation — At least one independent OOD detection mechanism operates alongside the agent, evaluating inputs against a reference distribution representation derived from the validation dataset. Individual-input detection operates synchronously in the request pipeline. Population-level drift monitoring runs on a daily or weekly schedule using statistical distance metrics (PSI, KS tests). OOD signals are structured and machine-readable, feeding into escalation workflows (AG-141). Detection thresholds are calibrated using held-out data including known OOD examples. The reference distribution is versioned alongside the competence envelope.

Advanced Implementation — All intermediate capabilities plus: multiple complementary OOD detection methods are deployed (density-based, distance-based, and ensemble-based) to reduce blind spots. Multivariate drift detection captures correlated distributional shifts. Continuous OOD score distributions are monitored and threshold calibration is reviewed quarterly. OOD detection effectiveness is measured through labelled OOD evaluation sets updated periodically with real-world OOD examples encountered in production. Independent third-party evaluation of detection method coverage is performed annually. The organisation can demonstrate to regulators a complete chain from reference distribution through detection mechanism to escalation response for every deployed agent.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Testing AG-140 compliance requires verification that OOD detection mechanisms function correctly across a range of distributional shift scenarios. A comprehensive test programme should include the following tests.

Test 8.1: Known OOD Detection Sensitivity

Test 8.2: In-Distribution False Positive Rate

Test 8.3: Detection Independence from Agent Confidence

Test 8.4: Population-Level Drift Detection

Test 8.5: Structured OOD Signal Generation

Test 8.6: Detection Latency Under Load

Test 8.7: Re-validation Trigger on Drift Detection
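
As an illustration of how one of these tests might be automated, the sketch below covers Test 8.3: known OOD inputs that the agent scores with high confidence must still be flagged by the independent detector. The fixtures and the detector and agent interfaces are hypothetical placeholders, not part of the specification.

```python
# Sketch of an automated check for Test 8.3 (independence from agent confidence).
def test_detection_independent_of_agent_confidence(detector, agent, known_ood_inputs):
    missed_confident_ood = []
    for features, payload in known_ood_inputs:
        confidence = agent.run(payload).confidence          # hypothetical agent API
        flagged = detector.score(features) > detector.threshold
        if confidence >= 0.9 and not flagged:
            missed_confident_ood.append(payload)
    # The detector must not inherit the agent's blind spots: a high-confidence
    # OOD input is exactly the case the detector exists to catch.
    assert not missed_confident_ood
```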

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Supports compliance
EU AI Act | Article 15 (Accuracy, Robustness, Cybersecurity) | Direct requirement
EU AI Act | Article 72 (Post-Market Monitoring) | Direct requirement
NIST AI RMF | MEASURE 1.1, MEASURE 2.5, MANAGE 3.1 | Supports compliance
ISO 42001 | Clause 8.2 (AI Risk Assessment), Clause 9.1 (Monitoring, Measurement, Analysis) | Supports compliance
FCA SS1/23 | Model Risk Management — Ongoing Monitoring | Direct requirement
DORA | Article 10 (ICT-Related Incident Detection) | Supports compliance

EU AI Act — Article 15 (Accuracy, Robustness, Cybersecurity)

Article 15(1) requires that high-risk AI systems are designed and developed in such a way that they achieve an appropriate level of accuracy, robustness, and cybersecurity, and perform consistently in those respects throughout their lifecycle. OOD detection is a primary mechanism for ensuring lifecycle consistency — it detects when the conditions under which accuracy was validated no longer hold, triggering appropriate governance responses. Without OOD detection, the accuracy validated at deployment degrades silently as the input distribution evolves.

EU AI Act — Article 72 (Post-Market Monitoring)

Article 72 requires providers of high-risk AI systems to establish a post-market monitoring system to actively collect and review data on the performance of AI systems throughout their lifetime. OOD detection and population-level drift monitoring directly implement this requirement by providing continuous, structured monitoring of whether the AI system's operational conditions remain aligned with its validation conditions. The data collected through OOD detection feeds directly into post-market monitoring obligations.

FCA SS1/23 — Model Risk Management — Ongoing Monitoring

The FCA's supervisory statement requires firms to "monitor models on an ongoing basis to confirm that they continue to perform as expected." For AI agents, "as expected" means within the validated competence envelope on the validated data distribution. OOD detection operationalises this requirement by providing continuous, automated monitoring of distributional alignment. The FCA expects firms to identify and respond to model deterioration in a timely manner — the daily monitoring cadence required by AG-140 supports this expectation.

NIST AI RMF — MEASURE 1.1, MEASURE 2.5, MANAGE 3.1

MEASURE 1.1 addresses appropriate methods and metrics for measuring AI system performance. MEASURE 2.5 addresses the evaluation of AI system performance in deployment. MANAGE 3.1 addresses the continuous monitoring of identified risks. OOD detection supports compliance by providing runtime measurement of distributional alignment, deployment performance evaluation through drift monitoring, and continuous risk monitoring through structured OOD signals.

ISO 42001 — Clause 8.2, Clause 9.1

Clause 8.2 requires AI risk assessment including the identification of risks that may arise from changes in the operating environment. Clause 9.1 requires monitoring, measurement, analysis, and evaluation of the AI management system. OOD detection and drift monitoring directly implement both requirements by identifying environmental changes (distributional shifts) and providing ongoing measurement of system alignment with validated conditions.

DORA — Article 10 (ICT-Related Incident Detection)

Article 10 requires financial entities to establish mechanisms for the detection of ICT-related incidents. Distributional drift that degrades AI agent performance constitutes an ICT-related incident when it affects financial operations. OOD detection provides the early warning mechanism that enables detection before the drift results in consequential failures.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Domain-specific — bounded by the scope of tasks processed by the agent, but potentially extending to all downstream consumers of agent output produced during undetected OOD conditions

Consequence chain: Without OOD detection, distributional shifts go undetected and the agent processes inputs outside its validated competence without any governance signal. The immediate failure is silent reliability degradation — the agent continues to produce outputs that appear valid but are produced under conditions where reliability has not been demonstrated. The failure compounds over time because each undetected OOD input produces an output that is consumed with the same trust level as in-distribution output. For individual-input OOD failures, the consequence is isolated to the specific decision or action based on the unreliable output. For population-level drift, the consequence is systemic — all outputs produced during the drift period are potentially unreliable, requiring retrospective review. The business consequence includes: financial loss from decisions based on unreliable agent output during undetected drift periods, regulatory enforcement for failure to maintain ongoing model monitoring (FCA SS1/23, EU AI Act Article 72), liability for unreliable outputs relied upon by third parties, and the remediation cost of retrospective review of all outputs produced during undetected OOD conditions. In safety-critical domains, undetected OOD operation can result in direct harm to individuals through unreliable clinical, safety, or infrastructure decisions.

Cross-references: AG-139 (Competence Envelope Governance) defines the validated boundaries that OOD detection monitors at runtime. AG-141 (Mandatory Abstention and Uncertainty Escalation Governance) defines the governance response when OOD inputs are detected. AG-142 (Autonomy Progression Governance) uses OOD detection signals as an input to autonomy level decisions. AG-022 (Behavioural Drift Detection) monitors behavioural changes that may be caused by distributional drift. AG-074 (Performance Drift and Revalidation) triggers re-validation when OOD-induced performance degradation is detected. AG-041 (Emergent Capability Detection) identifies capability changes that may alter the relationship between distribution and performance. AG-037 (Objective Alignment Verification) ensures that the agent's objectives remain aligned even when inputs approach distributional boundaries. AG-019 (Human Escalation & Override Triggers) defines the human escalation paths activated by OOD detection signals.

Cite this protocol
AgentGoverning. (2026). AG-140: Novelty and Out-of-Distribution Detection Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-140