AG-726

Training-Serving Skew Detection Governance

Supplementary Core & Adversarial Model Resistance · AGS v2.1 · April 2026
Regulatory mappings: EU AI Act · NIST · ISO 42001

Section 2: Summary

This dimension governs the systematic detection, measurement, escalation, and response to distributional divergence between the data populations used during model training and fine-tuning and the live production input streams encountered at inference time — a phenomenon known as training-serving skew. Training-serving skew is operationally significant because a model's reliability guarantees, safety properties, and calibration characteristics are contingent on the assumption that production inputs approximate the statistical manifold the model was trained against; when that assumption breaks down, the model may produce confidently wrong outputs, fail silently, generate outputs that violate safety constraints validated only on in-distribution data, or exhibit degraded fairness properties without any explicit error signal. Failure manifests as a prolonged period of undetected silent degradation in which downstream systems, human operators, and end users continue to trust model outputs that have become statistically unsupported by the model's training evidence base, ultimately producing consequential decisions that would not survive audit against the original model validation record.

Section 3: Example

Scenario A — Financial-Value Agent: Loan Underwriting Drift

In Q1, a mid-tier lending institution deploys an LLM-augmented underwriting agent trained on 36 months of consumer credit applications drawn from a stable macro-economic regime. By Q3 of the same year, a regional employment shock restructures the applicant pool: average debt-to-income ratios shift by 18 percentage points, the share of applications citing gig-economy income rises from 7% to 31%, and the modal credit score band in the applicant stream drops by half a standard deviation. The agent's training distribution never included macro-stressed gig-worker profiles at this prevalence. No skew detection mechanism is in place. The model continues to assign approval-confidence scores in the 0.78–0.88 range — values the organisation treats as reliable — but empirical approval rates in the new sub-population diverge by 24 percentage points from post-hoc ground-truth creditworthiness. Thirty-one weeks pass before an increase in early delinquency rates triggers a manual review. Provisioning losses attributable to the skewed period total approximately USD 4.2 million; the institution faces regulatory inquiry under consumer credit supervision requirements for systematic model mis-performance on a protected demographic sub-group that was disproportionately represented in the newly arrived gig-economy cohort.

Scenario B — Safety-Critical / CPS Agent: Industrial Robot Collision Avoidance

An embodied robotic arm operating in an automotive assembly cell is controlled by a language-conditioned policy agent trained and validated against a structured task vocabulary of 840 natural-language instruction templates issued by operators via headset. A facility upgrade introduces a new class of operator — contracted bilingual workers whose instruction patterns differ lexically and syntactically from the training corpus: mean instruction length increases from 6.2 tokens to 14.8 tokens, passive-voice constructions appear at 38% frequency versus 2% in training data, and a set of domain-specific abbreviations absent from the training vocabulary accounts for 19% of instruction tokens. The agent processes these instructions without out-of-distribution flagging, interpreting ambiguous abbreviated commands as high-confidence matches to its nearest training exemplars. Over a 73-hour production window, the arm executes three mis-commanded movements. Two result in minor tooling damage; the third produces a near-miss injury event requiring emergency stop and facility evacuation. Root-cause analysis, conducted only after the near-miss, identifies the input distribution shift as the causal factor. A retrospective skew detection analysis shows the Jensen–Shannon divergence of the production token distribution had exceeded the pre-established safety threshold within the first 6 hours of the new operator cohort going live.

Scenario C — Public Sector / Rights-Sensitive Agent: Benefits Eligibility Screening

A national welfare agency deploys a natural-language intake agent trained on three years of English-language benefit applications from an urban population. A policy change extends the programme to rural regions previously served by paper processes; 40% of the new applicant cohort submits applications containing non-standard regional idioms, unconventional date formats, and colloquial income descriptions not present in training data. The agent's output — structured eligibility verdicts passed to caseworkers — begins producing denial recommendations at a rate 31 percentage points above the baseline for equivalently situated urban applicants, creating a disparate impact on the rural cohort. Because no skew detection is active, the agency's quality-assurance sampling (conducted at 2% of decisions) does not accumulate sufficient signal to detect the shift for 19 weeks. During this period, 6,200 applications receive skew-influenced eligibility recommendations. Legal challenge under administrative law and equality legislation follows. The agency is required to re-adjudicate all affected cases at a cost of approximately EUR 2.1 million in administrative remediation, plus reputational damage that delays adoption of AI-assisted processing across three further departments.

Section 4: Requirement Statement

4.0 Scope

This dimension applies to all AGS-governed agent deployments that operate an LLM, fine-tuned foundation model, or learned embedding-based component in a production serving context where the model was validated against a discrete, bounded training or validation dataset. Scope includes primary inference paths, retrieval-augmented generation (RAG) retrieval stages where the query distribution over the knowledge index is model-dependent, and any downstream scoring, ranking, or routing component whose behaviour is conditioned on model outputs. Scope extends to scheduled batch inference pipelines as well as real-time serving endpoints. Scope excludes purely rule-based components with no learned parameters, provided those components are fully specified in the system's architecture record. All ten Primary Profiles listed in Section 1 are in-scope; additional requirements specific to Safety-Critical / CPS Agents and Financial-Value Agents are called out explicitly within subsections.

4.1 Skew Detection Instrumentation Baseline

4.1.1 The operator MUST instrument the production serving pipeline to capture a continuous or regularly sampled representation of the statistical properties of live input data at every model entry point, including raw text inputs, structured feature vectors passed to learned components, and embedding-space representations.

4.1.2 The operator MUST define, prior to production deployment, a reference distribution for each instrumented entry point, derived from the training dataset, the validation dataset, or a documented holdout set that is representative of the intended deployment population. This reference distribution MUST be persisted as a versioned artefact in the model registry or equivalent configuration store.

4.1.3 The operator MUST select and configure at least one quantitative divergence metric for each instrumented entry point. Acceptable metrics include but are not limited to: Population Stability Index (PSI), Jensen–Shannon divergence, Kullback–Leibler divergence with symmetric extension, Maximum Mean Discrepancy (MMD), Wasserstein distance, or domain classifier-based drift scores. The selected metric(s) MUST be documented in the model card or system card for the deployment.
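As an informative, non-normative illustration of the first metric named above, the Population Stability Index can be computed over aligned bins with a short self-contained sketch; the function name and the `eps` floor for empty bins are assumptions of this example, not requirements of this dimension:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two aligned binned counts.

    expected: per-bin counts from the reference distribution (4.1.2).
    actual:   per-bin counts from the production observation window.
    eps guards against log/division blow-ups on empty bins.
    """
    e_total, a_total = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score
```

Identical distributions score approximately zero; note that per 4.1.4 the alert threshold applied to such a score must be derived for the deployment, not taken from generic rules of thumb.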

4.1.4 The operator MUST establish threshold values for each configured divergence metric that distinguish acceptable natural variance from material distributional divergence. Thresholds MUST be empirically derived from historical variation analysis or expert elicitation during model validation and MUST NOT be set solely from default library values without documented justification.

4.1.5 For Safety-Critical / CPS Agents, the operator MUST configure skew detection to operate in near-real-time with a maximum latency of five minutes between production input accumulation and divergence metric computation during active operation windows. Batch-only detection is not sufficient for this profile.

4.2 Alert and Escalation Mechanism

4.2.1 The operator MUST implement an automated alerting mechanism that triggers when any configured divergence metric crosses its defined threshold. Alerts MUST be routed to a designated responsible human owner with authority to initiate incident response.

4.2.2 The operator MUST define at least two severity tiers for skew alerts: a Warning tier (metric approaching threshold) and a Critical tier (metric exceeding threshold). Each tier MUST have a documented maximum response time and a defined set of permitted automated or human actions.

4.2.3 Alerts MUST include, at minimum: the timestamp of detection, the metric name and current value, the threshold value, the affected model component and endpoint identifier, the observation window over which the divergence was measured, and a summary of which input features or token populations contributed most to the measured divergence.
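The minimum alert content above can be carried as a structured record; the following Python sketch is illustrative only, and the field names constitute an assumed schema rather than a normative one:

```python
from dataclasses import dataclass, asdict

@dataclass
class SkewAlert:
    """Illustrative carrier for the minimum alert fields of 4.2.3."""
    detected_at: str          # ISO-8601 timestamp of detection
    metric_name: str          # e.g. "psi" or "jensen_shannon"
    metric_value: float       # current metric value
    threshold: float          # configured threshold that was crossed
    model_component: str      # affected model component identifier
    endpoint_id: str          # affected serving endpoint
    observation_window: str   # window over which divergence was measured
    top_contributors: list    # features/token populations driving divergence
    severity: str = "WARNING" # "WARNING" or "CRITICAL" per 4.2.2

    def to_record(self) -> dict:
        # Serialisable form for routing and for the append-only alert log.
        return asdict(self)
```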

4.2.4 The operator MUST ensure that skew alerts are routed independently of the primary model serving infrastructure such that a failure in the serving layer cannot suppress or delay skew alert delivery.

4.3 Response Protocol

4.3.1 The operator MUST document a written Skew Response Protocol (SRP) that specifies the decision logic governing agent behaviour under each alert severity tier. The SRP MUST address at minimum: continued operation with enhanced monitoring, output confidence downgrading or uncertainty disclosure to downstream consumers, invocation of the graceful degradation and fallback configuration governed by AG-203, and human-in-the-loop escalation triggers.

4.3.2 For Financial-Value Agents and Crypto/Web3 Agents, the SRP MUST include a mandatory human review gate for any high-stakes output — defined as any output that initiates, approves, or modifies a financial transaction, commitment, or position — generated while a Critical skew alert is active.

4.3.3 For Safety-Critical / CPS Agents, the operator MUST configure an automatic safe-state transition or operational mode reduction that is triggered without human initiation when a Critical skew alert is active and the deviation vector indicates the skewed inputs intersect with safety-validated input regions documented in the safety case.

4.3.4 The operator MUST log all response actions taken under the SRP as structured records in an append-only audit log, including the action taken, the actor (human or automated), the timestamp, and any override of a recommended automated action together with the documented justification for that override.

4.4 Root-Cause Investigation

4.4.1 The operator MUST initiate a formal root-cause investigation within five business days of any Critical skew alert that remains active for more than the defined maximum response window without resolution.

4.4.2 The root-cause investigation MUST produce a written finding that classifies the source of skew into one or more of the following causal categories: genuine population shift (the target population has changed), pipeline degradation (an upstream data transformation or preprocessing component has introduced distortion), adversarial manipulation (deliberate injection of out-of-distribution inputs), data contamination (an unintended change in the data sourcing or labelling pipeline), or model staleness (the model was previously adequate but the world has changed). The classification MUST be made explicit in the investigation record.

4.4.3 The investigation MUST assess whether outputs produced during the skew window require remediation, including re-evaluation of decisions made, notifications to affected parties, or regulatory disclosure. The output of this assessment MUST be recorded.

4.5 Retraining and Re-validation Gate

4.5.1 The operator MUST define explicit quantitative criteria under which a detected skew event triggers a mandatory model retraining or re-fine-tuning cycle. These criteria MUST be included in the SRP.

4.5.2 Any model retrained, fine-tuned, or updated in response to a skew event MUST undergo the full validation process defined in the agent's model validation record before re-deployment to production. Shortcut or expedited validation MUST be explicitly documented with a formal risk acceptance sign-off by the designated model risk owner.

4.5.3 The operator MUST update the reference distribution artefact defined in 4.1.2 whenever the model's training or fine-tuning dataset is materially changed. Deployment of a retrained model without an updated reference distribution is not permitted.

4.6 Coverage of Compound and Multi-Modal Inputs

4.6.1 Where an agent processes compound inputs comprising multiple modalities (e.g., structured data fields combined with free-text, or image embeddings concatenated with language tokens), the operator MUST instrument skew detection independently for each modality and for the joint representation if a joint embedding or fusion layer is used.

4.6.2 The operator MUST test whether the divergence metrics configured for text-based inputs are appropriate for the semantic space of the model's token vocabulary. If the model has been fine-tuned on a domain-specific vocabulary, standard corpus-level divergence metrics calibrated on general-domain distributions MUST NOT be used without domain-specific recalibration.

4.7 Adversarial Skew Resilience

4.7.1 The operator MUST assess, as part of the model's pre-deployment adversarial testing programme, whether the skew detection instrumentation itself can be evaded or suppressed by a deliberate adversary who controls a fraction of the input stream. This assessment MUST consider gradual drift injection, cyclic on/off drift patterns, and statistical camouflage techniques that keep aggregate divergence metrics below threshold while shifting the operational distribution in semantically significant ways.

4.7.2 The operator SHOULD supplement aggregate distribution metrics with semantic-space monitoring (e.g., clustering of embedding representations, novelty scoring of inputs against training-set prototypes) to detect adversarial localised distribution shifts that aggregate metrics may miss.

4.7.3 The operator MAY implement canary input streams — synthetic inputs drawn from the training reference distribution — interleaved with production inputs to provide a continuous ground-truth calibration baseline for the skew detection pipeline.
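A minimal sketch of such canary interleaving follows; it is informative only, and the interleave rate, seeding, and position-tracking approach are assumptions of this example:

```python
import random

def interleave_canaries(production_batch, canary_pool, rate=0.02, seed=None):
    """Interleave synthetic canary inputs (drawn from the training
    reference distribution) into a production batch at roughly `rate`.

    Returns (mixed_batch, canary_positions) so the monitor can later
    separate canary scores from production scores and verify that the
    detection pipeline still scores known-in-distribution inputs as
    in-distribution.
    """
    rng = random.Random(seed)
    mixed, positions = [], []
    for item in production_batch:
        if rng.random() < rate:
            positions.append(len(mixed))
            mixed.append(rng.choice(canary_pool))
        mixed.append(item)
    return mixed, positions
```

Production ordering is preserved; only the canary positions need to be withheld from downstream consumers.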

4.8 Transparency and Disclosure

4.8.1 The operator MUST maintain a skew detection configuration record as part of the agent's system card. This record MUST include the reference distribution identifier and version, the divergence metrics in use, the threshold values, the alert routing configuration, and the current status (no alert active / Warning active / Critical active) updated at a frequency no less than daily.

4.8.2 For Public Sector / Rights-Sensitive Agents and Customer-Facing Agents where the agent's outputs inform individual rights determinations, the operator MUST include a disclosure in the agent's published transparency documentation stating that a training-serving skew monitoring programme is in operation and summarising the protocol for notifying affected parties if a skew event is determined to have affected outputs that influenced individual determinations.

4.8.3 The operator SHOULD notify downstream system integrators and API consumers of an active Critical skew alert via a documented out-of-band channel within one hour of alert trigger, so that downstream systems can apply their own precautionary logic.

4.9 Governance Review Cycle

4.9.1 The operator MUST conduct a formal review of the skew detection configuration — including reference distributions, metric choices, and threshold values — at intervals not exceeding six months or immediately following any Critical skew event, whichever occurs sooner.

4.9.2 The operator MUST document the outcome of each governance review, including any changes made to thresholds, metrics, or response protocols, and the rationale for those changes. Documentation MUST be retained per the requirements in Section 7.

4.9.3 The operator SHOULD include a horizon scan of emerging skew detection methodologies in each governance review and assess whether newer techniques provide materially better sensitivity or specificity for the deployment's risk profile.

Section 5: Rationale

Structural vs Behavioural Enforcement

Training-serving skew is categorically distinct from most other failure modes addressed in the AGS framework because it is intrinsically invisible to the model itself. A model that is confidently wrong due to distributional shift produces outputs that are internally consistent with its training manifold — it has no internal signal indicating degradation. This distinguishes skew from, for example, hallucination on well-represented topics (where calibration mechanisms may produce uncertainty signals) or adversarial prompt injection (where the input is structurally anomalous). The consequence is that behavioural enforcement — relying on the model to recognise and flag its own out-of-distribution operation — is fundamentally inadequate as a sole control. Structural enforcement, operating at the infrastructure layer independently of the model's own inference path, is therefore not merely preferred but required.

Why the Detective Control Type Is Appropriate

The control type designation Detective reflects a deliberate design choice. Preventive controls for training-serving skew — such as restricting production inputs to only those matching the training distribution — are operationally impractical in open-world deployment contexts and would defeat the purpose of deploying adaptive language agents. The appropriate control architecture is one that permits the model to process diverse production inputs while maintaining continuous observational coverage of the statistical relationship between those inputs and the model's validated operating envelope, and that triggers structured human and automated responses when divergence exceeds safety and reliability thresholds. Detective controls also interact synergistically with corrective controls (retraining, rollback, fallback) in a closed-loop governance architecture that improves the model's reliability over time rather than merely constraining it.

Risk Amplification Across Profiles

The severity profile of training-serving skew failures is not uniform across agent types. For Financial-Value Agents and Crypto/Web3 Agents, skew-driven miscalibration directly translates into monetary loss and counterparty harm, and operates against a regulatory backdrop in which model risk management frameworks explicitly require ongoing monitoring of model performance relative to the conditions under which the model was validated. For Safety-Critical / CPS Agents, silent distributional shift intersects with safety-critical function, creating scenarios in which the gap between the model's validated safety envelope and its operational input distribution is invisible to the safety management system unless explicit detection is in place. For Public Sector / Rights-Sensitive Agents, the administrative law implications of decisions influenced by a skewed model — particularly where the skew disproportionately affects protected sub-populations — are material and can give rise to systematic illegality in individual determinations. The breadth of Primary Profiles for this dimension reflects the cross-cutting nature of the risk.

Relationship to Adversarial Threat Modelling

Training-serving skew can arise from benign causes (population shift, seasonality, organisational change) or from adversarial causes (deliberate drift injection designed to move the model into a region of its input space where it behaves in a manner advantageous to the adversary). The governance architecture for this dimension must therefore incorporate adversarial threat modelling — specifically, the question of whether an adversary with partial control of the input stream could engineer a skew that evades aggregate metric-based detection. This linkage to AG-404 (Adversarial Input Detection) and to the broader adversarial model resistance landscape is not incidental; it reflects the structural dependency between distributional monitoring and adversarial hardening.

Section 6: Implementation Guidance

Pattern 1 — Sliding Window Divergence Monitoring

Implement a windowed divergence monitor that computes divergence metrics over rolling windows of production inputs (e.g., 1-hour, 24-hour, and 7-day windows simultaneously). Multi-window monitoring allows detection of both rapid acute shifts and slow gradual drift that would be invisible in any single window. Metrics from each window should be compared against window-specific thresholds derived from the temporal structure of the training data — if training data has known seasonal patterns, reference distributions for each temporal window should be seasonally adjusted.
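A minimal, non-normative sketch of this pattern using Jensen–Shannon divergence over categorical counts; the class name, window sizes, and the choice of base-2 logarithm (which bounds the score at 1.0) are illustrative assumptions:

```python
import math
from collections import Counter, deque

def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2, bounded by 1.0) between two
    categorical count mappings, e.g. token or feature-bin counts."""
    keys = set(p_counts) | set(q_counts)
    p_tot = sum(p_counts.values()) or 1
    q_tot = sum(q_counts.values()) or 1
    div = 0.0
    for k in keys:
        p = p_counts.get(k, 0) / p_tot
        q = q_counts.get(k, 0) / q_tot
        m = (p + q) / 2
        if p:
            div += 0.5 * p * math.log2(p / m)
        if q:
            div += 0.5 * q * math.log2(q / m)
    return div

class WindowedMonitor:
    """Holds several rolling windows of recent categorical observations
    and scores each against a fixed reference distribution (4.1.2)."""
    def __init__(self, reference_counts, window_sizes=(100, 1000)):
        self.reference = reference_counts
        self.windows = {n: deque(maxlen=n) for n in window_sizes}

    def observe(self, category):
        for window in self.windows.values():
            window.append(category)

    def scores(self):
        return {n: js_divergence(self.reference, Counter(window))
                for n, window in self.windows.items()}
```

The short window reacts to acute shifts while the long window accumulates evidence of slow drift; each would carry its own threshold per 4.1.4.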

Pattern 2 — Embedding-Space Prototype Monitoring

Maintain a set of representative prototype embeddings sampled from the training distribution (e.g., via k-means or coreset construction over training embeddings). For each production input, compute the distance to the nearest prototype ensemble in embedding space. Maintain a running distribution of these distances and alert when the empirical distribution of production-input distances diverges significantly from the expected distribution observed over the training set. This approach is particularly effective for detecting semantic shift in language inputs where surface-level lexical statistics may be stable while the underlying semantic distribution has shifted.
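The distance step of this pattern can be sketched as follows; an informative example only, assuming prototypes have already been constructed (e.g. by k-means) and that a distance radius has been calibrated on held-out training embeddings:

```python
import math

def nearest_prototype_distance(embedding, prototypes):
    """Euclidean distance from one input embedding to its closest
    training-set prototype (e.g. a k-means centroid)."""
    return min(math.dist(embedding, p) for p in prototypes)

def novelty_rate(embeddings, prototypes, radius):
    """Fraction of production embeddings that fall outside `radius` of
    every prototype - a simple semantic-shift signal. The radius is an
    assumed calibration input, derived from training-set distances."""
    flagged = sum(1 for e in embeddings
                  if nearest_prototype_distance(e, prototypes) > radius)
    return flagged / len(embeddings)
```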

Pattern 3 — Shadow Reference Scoring

Deploy a lightweight reference model or statistical scoring function trained specifically to distinguish in-distribution from out-of-distribution inputs for the deployment context. This model operates as a shadow component on the production input stream, producing a continuously updated distributional alignment score that is independent of the primary model's inference path. The shadow scorer can be a simple binary classifier or a density estimator; its value is that it provides a semantically calibrated signal where generic divergence metrics may produce false positives or false negatives.
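As a deliberately minimal stand-in for the density-estimator variant of this pattern, the sketch below learns per-bin reference frequencies and scores production inputs by reference probability mass; a real deployment would use a proper classifier or density model, and the binning function is an assumption supplied by the operator:

```python
from collections import Counter

class HistogramShadowScorer:
    """Toy density-estimator shadow scorer: fits per-bin frequencies on
    reference (training-like) samples, then scores production inputs by
    the reference mass of their bin. Illustrates the key property of
    Pattern 3: the score is computed independently of the primary
    model's inference path."""
    def __init__(self, bin_fn):
        self.bin_fn = bin_fn   # maps a raw input to a discrete bin
        self.freq = Counter()
        self.total = 0

    def fit(self, reference_samples):
        for sample in reference_samples:
            self.freq[self.bin_fn(sample)] += 1
        self.total = sum(self.freq.values())
        return self

    def score(self, x):
        """Higher = more in-distribution; 0.0 for never-seen bins."""
        return self.freq[self.bin_fn(x)] / self.total if self.total else 0.0
```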

Pattern 4 — Feature Importance-Weighted Divergence

When the primary model is known to be more sensitive to shifts in certain input features or token regions (e.g., numeric financial figures, geographic identifiers, temporal references), weight the divergence computation by estimated feature importance derived from training-time attribution analysis. Unweighted divergence metrics may be dominated by high-variance but low-importance features, masking shifts in low-variance but high-importance features.
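The weighting step reduces to a normalised weighted sum over per-feature divergence scores; a non-normative sketch, in which the per-feature scores and importance weights are assumed inputs produced by the monitoring pipeline and attribution analysis respectively:

```python
def weighted_divergence(per_feature_divergence, importance):
    """Combine per-feature divergence scores into one alarm signal,
    weighted by training-time feature importance so that shifts in
    high-importance features dominate. Weights are normalised, so they
    need not sum to 1."""
    total_w = sum(importance.values())
    return sum(per_feature_divergence[f] * w / total_w
               for f, w in importance.items())
```

A large shift in a high-importance feature then outweighs noise in low-importance features, which is the failure mode the pattern targets.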

Pattern 5 — Stratified Monitoring by Sub-Population

Where the deployment context involves known demographic or operational sub-populations, compute divergence metrics separately for each stratified cohort in addition to the aggregate. Aggregate-level stability can mask significant shifts within sub-populations that are numerically small but operationally critical (e.g., a shift in the applicant pool for a protected demographic group). This is particularly important for Public Sector / Rights-Sensitive Agents and Customer-Facing Agents.
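A minimal sketch of per-cohort scoring alongside the aggregate, using a total-variation-style distance for brevity (any metric from 4.1.3 could be substituted); the cohort labelling and reference structure are assumptions of this example:

```python
from collections import Counter, defaultdict

def l1_distance(ref_counts, obs_counts):
    """Total-variation-style distance between two categorical count
    mappings, normalised to [0, 1]."""
    keys = set(ref_counts) | set(obs_counts)
    r_tot = sum(ref_counts.values()) or 1
    o_tot = sum(obs_counts.values()) or 1
    return 0.5 * sum(abs(ref_counts.get(k, 0) / r_tot
                         - obs_counts.get(k, 0) / o_tot) for k in keys)

def stratified_divergence(observations, reference_by_cohort):
    """observations: iterable of (cohort, category) pairs.
    Returns a per-cohort divergence score plus an 'aggregate' entry, so
    a shift confined to one small cohort is not washed out by
    aggregate-level stability."""
    by_cohort = defaultdict(Counter)
    aggregate_obs, aggregate_ref = Counter(), Counter()
    for cohort, category in observations:
        by_cohort[cohort][category] += 1
        aggregate_obs[category] += 1
    for ref in reference_by_cohort.values():
        aggregate_ref.update(ref)
    scores = {c: l1_distance(reference_by_cohort[c], obs)
              for c, obs in by_cohort.items()}
    scores["aggregate"] = l1_distance(aggregate_ref, aggregate_obs)
    return scores
```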

Pattern 6 — Automated Red-Line Circuit Breaker

For Safety-Critical / CPS Agents and Financial-Value Agents, implement an automated circuit-breaker mechanism that places the agent into a constrained operating mode (human-in-the-loop for all outputs above a defined consequence threshold, or full suspension pending review) without requiring human initiation when a Critical skew alert is generated. The circuit-breaker configuration should be tested as part of the agent's pre-deployment acceptance testing.
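The state logic of such a breaker can be sketched as follows; an informative example only, with mode names, the consequence-score interface, and the manual-reset convention all being assumptions of this sketch:

```python
from enum import Enum

class Mode(Enum):
    NORMAL = "normal"
    CONSTRAINED = "constrained"  # human-in-the-loop above threshold
    SUSPENDED = "suspended"      # full suspension pending review

class SkewCircuitBreaker:
    """Drops the agent into a constrained mode as soon as a Critical
    skew alert fires, with no human initiation required; recovery back
    to NORMAL is manual only, mirroring the logged-override discipline
    of 4.3.4."""
    def __init__(self, consequence_threshold):
        self.mode = Mode.NORMAL
        self.consequence_threshold = consequence_threshold

    def on_alert(self, severity):
        if severity == "CRITICAL":
            self.mode = Mode.CONSTRAINED

    def requires_human_review(self, consequence_score):
        if self.mode is Mode.SUSPENDED:
            return True
        return (self.mode is Mode.CONSTRAINED
                and consequence_score >= self.consequence_threshold)

    def reset(self, approver):
        # Manual recovery only; the approver identity would be written
        # to the append-only response action log.
        self.mode = Mode.NORMAL
        return {"action": "reset", "actor": approver}
```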

Explicit Anti-Patterns

Anti-Pattern 1 — Using Only Output-Monitoring as a Proxy for Skew Detection

Monitoring model output statistics (e.g., class distribution of decisions, average confidence scores) in lieu of input distribution monitoring is insufficient. Output drift is a lagging indicator of input skew and may be masked by compensating biases. Input-layer monitoring must be primary.

Anti-Pattern 2 — Setting Thresholds Without Empirical Baseline Derivation

Adopting default threshold values from monitoring library documentation without deriving deployment-specific thresholds from historical input variation analysis produces both excessive false-positive alert rates (which erode operational trust in the monitoring system) and excessive false-negative rates (which allow material skew to pass undetected). Default values must not be used without documented justification.

Anti-Pattern 3 — Treating Skew Detection as a One-Time Deployment Activity

Configuring skew detection metrics and thresholds once at deployment and failing to review them as the deployment context evolves is a governance failure. As the agent's intended user population changes, as upstream data pipelines are modified, and as the model is updated, reference distributions and thresholds require corresponding updates.

Anti-Pattern 4 — Siloing Skew Detection in the Data Science Team

Skew detection that is owned exclusively by the model development team and not integrated into the operational incident response and risk management framework fails to produce the cross-functional responses needed when a Critical skew event occurs. Alert routing must reach risk owners, operational managers, and in applicable contexts, legal and compliance functions.

Anti-Pattern 5 — Using Aggregate Metrics Alone for Adversarially Contested Deployments

In deployments where adversarial input manipulation is a plausible threat (e.g., Crypto/Web3 Agents, Customer-Facing Agents susceptible to coordinated abuse), relying solely on aggregate distribution statistics creates an exploitable blind spot. An adversary who understands the monitoring configuration can engineer inputs that shift the operational distribution in semantically significant ways while keeping aggregate metrics below alert thresholds. Semantic-space and prototype-based monitoring must complement aggregate statistical approaches.

Anti-Pattern 6 — Deferring Reference Distribution Updates During Retrain Cycles

Retraining the model without simultaneously updating the reference distribution artefact causes the skew detection baseline to become misaligned with the new model's training distribution. Post-retraining, the monitoring system will generate spurious alerts or fail to detect genuine skew because its reference distribution describes the previous model's training population, not the current one.

Maturity Model

Level 1 — Basic Detection

Aggregate PSI or KL divergence computed daily on a static reference distribution. Single-tier alerting. Manual response protocol. No automated response actions. Acceptable only for low-risk internal copilot deployments in stable operational contexts.

Level 2 — Monitored Operations

Multi-window divergence monitoring with two-tier alerting. Embedding-space distance monitoring as a secondary signal. Documented SRP with defined response time SLAs. Reference distribution reviewed quarterly. Root-cause investigation process defined and exercised.

Level 3 — Adaptive Governance

Stratified sub-population monitoring. Feature importance-weighted divergence. Shadow reference scorer operating in parallel with production model. Automated circuit-breaker for high-consequence outputs under Critical alert. Adversarial evasion assessment conducted and documented. Governance review cycle integrated with model risk management board reporting.

Level 4 — Continuous Assurance

Canary input stream interleaving for continuous calibration. Real-time semantic-space cluster monitoring. Automated root-cause classification pipeline. Closed-loop integration between skew detection events, retraining triggers, and model registry versioning. Regulatory-ready evidence package auto-generated on each skew event. Appropriate for Safety-Critical / CPS, Financial-Value, and Public Sector / Rights-Sensitive profiles.

Section 7: Evidence Requirements

7.1 Required Artefacts

Artefact | Description | Retention Period
Reference Distribution Record | Versioned statistical snapshot of the training/validation reference distribution for each instrumented entry point, including the methodology used to construct it | Model lifetime plus 7 years
Skew Detection Configuration Record | Current and historical configurations of divergence metrics, thresholds, alert routing, and window settings per model version | Model lifetime plus 7 years
Alert Log | Append-only structured log of all skew alerts generated, including severity, metric values, timestamps, observation windows, and contributing features | 7 years from date of record
Response Action Log | Structured record of all actions taken under the SRP in response to each alert, including actor identity, timestamp, action taken, and override justifications | 7 years from date of record
Skew Response Protocol (SRP) | Current version of the written SRP plus all prior versions with change history and approval records | Model lifetime plus 7 years
Root-Cause Investigation Reports | Written findings for all Critical skew events including causal classification, affected output assessment, and remediation decision | 10 years from date of report
Governance Review Records | Documented outcomes of each six-monthly governance review including metric and threshold changes and horizon scan findings | 7 years from date of review
Adversarial Assessment Record | Documentation of the adversarial evasion assessment conducted under 4.7.1, including test methodology and findings | Model lifetime plus 7 years
Skew Event Disclosure Records | Records of any disclosures made to affected parties, downstream integrators, or regulators in connection with a skew event | 10 years from date of disclosure
Model Card / System Card Skew Section | The published system card section documenting the skew detection programme per 4.8.1 | Current version plus three prior versions

7.2 Evidence Integrity Requirements

All evidence artefacts relating to skew detection must be stored in an append-only or write-once storage system that prevents post-hoc modification without generating an auditable change record. Alert logs and response action logs must be cryptographically timestamped or stored in a system that provides tamper-evidence through independent auditing. Evidence packages for Financial-Value Agents and Public Sector / Rights-Sensitive Agents must be structured to support regulatory inspection and must be producible within five business days of a formal regulatory request.
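One common way to provide the tamper-evidence described above is a hash-chained log, in which each entry embeds the digest of its predecessor so any post-hoc modification breaks the chain; the sketch below is an informative illustration under that assumption, not a prescribed storage design:

```python
import hashlib
import json

def append_entry(log, record):
    """Append a record to a hash-chained evidence log. Each entry binds
    the SHA-256 of the previous entry into its own digest, so editing
    any earlier record invalidates every later hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"record": record, "prev": prev_hash, "hash": entry_hash})
    return log

def verify_chain(log):
    """Recompute the chain from the start; False if any entry was
    modified, reordered, or removed mid-chain."""
    prev = "0" * 64
    for entry in log:
        body = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

An independent auditor needs only the log itself (or a periodically anchored head hash) to verify integrity.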

7.3 Artefact Linkage

Each Root-Cause Investigation Report must be linked by reference to the specific Alert Log records and model version identifiers that gave rise to the investigation. Each model version in the model registry must carry a pointer to the reference distribution artefact version that governs its skew detection configuration. This bidirectional linkage is required to enable retrospective audit of whether the skew detection configuration in force at any given point in time was appropriate for the model version then deployed.

Section 8: Test Specification

8.1 Reference Distribution Existence and Validity Test

Maps to: 4.1.2, 4.1.3, 4.1.4

Test objective: Verify that a current, versioned reference distribution artefact exists for each instrumented model entry point, that at least one divergence metric is configured against it, and that threshold values are empirically derived and documented.

Procedure:

  1. Retrieve the model registry entry for the production model under test.
  2. Verify the presence of a linked reference distribution artefact with a version identifier and creation timestamp.
  3. Confirm the reference distribution was derived from the training or validation dataset (not from production data only).
  4. Verify that at least one divergence metric is configured and named in the skew detection configuration record.
  5. Verify that threshold values are accompanied by a documented derivation method (historical variance analysis or expert elicitation record). Reject threshold values described only as "default."
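Steps 1 through 5 can be automated as a conformance check against the registry entry. The sketch below assumes a hypothetical entry schema (`reference_distribution`, `divergence_metrics`, `threshold_derivation`); the field names are illustrative, not prescribed by this protocol.

```python
def check_reference_distribution(entry):
    """Return a list of conformance failures for a registry entry (empty = pass)."""
    failures = []

    # Steps 2-3: versioned artefact derived from training/validation data.
    ref = entry.get("reference_distribution")
    if not ref or not ref.get("version") or not ref.get("created_at"):
        failures.append("missing versioned reference distribution artefact")
    elif ref.get("source") not in ("training", "validation"):
        failures.append("reference distribution not derived from training/validation data")

    # Step 4: at least one named divergence metric.
    metrics = entry.get("divergence_metrics", [])
    if not metrics:
        failures.append("no divergence metric configured")

    # Step 5: thresholds must carry a documented derivation method.
    for m in metrics:
        method = m.get("threshold_derivation", "default")
        if method == "default":
            failures.append(
                "metric %s: threshold documented only as 'default'" % m.get("name")
            )

    return failures
```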

Conformance scoring:

8.2 Alert Generation and Routing Test

Maps to: 4.2.1, 4.2.2, 4.2.3, 4.2.4

Test objective: Verify that the monitoring pipeline generates correctly structured alerts at both Warning and Critical tiers when divergence metrics cross their thresholds, and that alerts are routed to the designated responsible owner independently of the model serving infrastructure.

Procedure:

  1. Inject a synthetic production input stream with controlled divergence properties into the monitoring pipeline, calibrated to exceed the Warning threshold but not the Critical threshold. Verify that a Warning alert is generated within the defined detection latency window.
  2. Repeat with a stream calibrated to exceed the Critical threshold. Verify that a Critical alert is generated.
  3. Inspect the content of both generated alerts against the required fields specified in 4.2.3.
  4. Simulate a serving infrastructure failure (e.g., suspend the primary inference endpoint) and verify that the alert routing mechanism continues to deliver alerts without dependency on the serving layer.
  5. Verify that the designated responsible human owner is reachable via the configured routing channel.
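Step 1 of the procedure above can be sketched using the Population Stability Index (PSI) as the divergence metric. PSI, and the 0.10 Warning / 0.25 Critical thresholds used here, are common industry conventions chosen for illustration only; this protocol requires empirically derived thresholds, not these defaults.

```python
import math
import random

# Illustrative thresholds only; 4.1.4 requires empirical derivation.
WARNING_PSI, CRITICAL_PSI = 0.10, 0.25

def psi(reference, production, bins=10):
    """Population Stability Index between two samples, binned on the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[max(i, 0)] += 1  # clamp out-of-range values to edge bins
        n = len(xs)
        return [max(c / n, 1e-6) for c in counts]  # floor avoids log(0)

    r, p = hist(reference), hist(production)
    return sum((pi - ri) * math.log(pi / ri) for ri, pi in zip(r, p))

def alert_tier(score):
    """Map a divergence score to the alert tier it should generate."""
    if score >= CRITICAL_PSI:
        return "critical"
    if score >= WARNING_PSI:
        return "warning"
    return "none"
```

A synthetic stream drawn from the reference distribution should score near zero, while a heavily shifted stream should cross the Critical threshold; a stream calibrated between the two thresholds exercises the Warning tier.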

Conformance scoring:

8.3 Skew Response Protocol Completeness and Enforcement Test

Maps to: 4.3.1, 4.3.2, 4.3.3, 4.3.4

Test objective: Verify that a written SRP exists, covers all required decision logic categories, and that automated and human response actions are executed and logged when a Critical alert is active.

Procedure:

  1. Retrieve the current SRP document and verify it addresses: continued operation with enhanced monitoring; output confidence downgrading; graceful degradation / fallback invocation; and human escalation triggers.
  2. For Financial-Value Agent deployments: verify the SRP includes a mandatory human review gate for high-stakes outputs under Critical alert, and that this gate is technically enforced (i.e., the system will not autonomously complete a financial commitment without human approval when a Critical alert is active).
  3. For Safety-Critical / CPS Agent deployments: verify that a Critical alert automatically triggers safe-state transition or operational mode reduction without requiring human initiation. Test by generating a Critical alert in a staging environment and observing the agent's response.
  4. Generate a Critical alert in a staging environment, trigger a response action, and inspect the audit log to confirm the action was recorded with all required fields (actor, timestamp, action, any override justification).
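The technically enforced review gate tested in step 2 can be sketched as a guard that refuses to complete a financial commitment while a Critical alert is active unless a human approver is recorded. The class and method names below are hypothetical; only the behaviour (block without approval, log every action) reflects the requirement.

```python
class HumanApprovalRequired(Exception):
    pass

class CommitmentGate:
    """Sketch of a technically enforced human review gate under Critical alert."""

    def __init__(self):
        self.critical_alert_active = False
        self.audit_log = []

    def complete_commitment(self, commitment_id, human_approver=None):
        if self.critical_alert_active and human_approver is None:
            # Enforcement: the block itself is logged, then execution halts.
            self.audit_log.append({"action": "blocked", "commitment": commitment_id})
            raise HumanApprovalRequired(
                "%s requires human approval under Critical alert" % commitment_id
            )
        self.audit_log.append({
            "action": "completed",
            "commitment": commitment_id,
            "actor": human_approver or "agent",
        })
        return "completed"
```

The key property under test is that the refusal is structural: the serving path raises before the commitment executes, rather than relying on the agent to decline.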

Conformance scoring:

Section 9: Regulatory Mapping

Regulation | Provision | Relationship Type
--- | --- | ---
EU AI Act | Article 9 (Risk Management System) | Direct requirement
EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement
NIST AI RMF | GOVERN 1.1, MAP 3.2, MANAGE 2.2 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance

EU AI Act — Article 9 (Risk Management System)

Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies, analyses, estimates, and evaluates risks. Training-Serving Skew Detection Governance implements a specific risk mitigation measure within this framework. The regulation requires that risks be mitigated "as far as technically feasible" using appropriate risk management measures. For deployments classified as high-risk under Annex III, compliance with AG-726 supports the Article 9 obligation by providing structural governance controls rather than relying solely on the agent's own reasoning or behavioural compliance.

EU AI Act — Article 15 (Accuracy, Robustness and Cybersecurity)

Article 15 requires high-risk AI systems to achieve appropriate levels of accuracy, robustness, and cybersecurity throughout their lifecycle. Training-Serving Skew Detection Governance directly supports the accuracy and robustness requirements: detecting distributional divergence is a precondition for knowing when a system's validated accuracy no longer holds in production. The adversarial evasion assessment under 4.7.1 additionally supports the cybersecurity requirement by testing whether the monitoring controls themselves resist deliberate manipulation.

NIST AI RMF — GOVERN 1.1, MAP 3.2, MANAGE 2.2

GOVERN 1.1 addresses legal and regulatory requirements; MAP 3.2 addresses risk context mapping; MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-726 supports compliance by establishing structural governance boundaries that implement the framework's approach to AI risk management.

ISO 42001 — Clause 6.1, Clause 8.2

Clause 6.1 requires organisations to determine actions to address risks and opportunities within the AI management system. Clause 8.2 requires AI risk assessment. Training-Serving Skew Detection Governance implements a risk treatment control within the AI management system, directly satisfying the requirement for structured risk mitigation.

Section 10: Failure Severity

Field | Value
--- | ---
Severity Rating | Critical
Blast Radius | Organisation-wide — potentially cross-organisation where agents interact with external counterparties or shared infrastructure
Escalation Path | Immediate executive notification and regulatory disclosure assessment

Consequence chain: Without training-serving skew detection governance, the governance framework has a structural gap in which distributional divergence accumulates with no explicit error signal. The failure mode is not a loud fault but silent degradation: the model continues to emit confident outputs while the statistical support for those outputs erodes, and downstream systems, operators, and end users have no prompt to distrust them. The immediate consequence is a prolonged period of unsupported agent decisions within the scope of AG-726, potentially cascading to dependent dimensions and downstream systems. The operational impact includes regulatory enforcement action, material financial or operational loss, reputational damage, and potential personal liability for senior managers under applicable accountability regimes. Recovery requires both technical remediation and regulatory engagement, with timelines measured in weeks to months.

Cite this protocol
AgentGoverning. (2026). AG-726: Training-Serving Skew Detection Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-726