Schema Drift Governance requires that AI agent systems detect and respond when the structure, format, semantics, or conventions of a consumed data source change in ways that undermine the assumptions upon which the agent's logic depends. Schema drift includes: added, removed, or renamed fields; changed data types; altered enumeration values; modified field semantics (the field name is the same but the meaning has changed); changed nullability constraints; and altered relationship structures. Without schema drift detection, agents continue operating on data that no longer conforms to the structure they expect, producing silently incorrect results.
Scenario A — Renamed Field Causes Silent Data Loss: A customer analytics agent consumes a CRM API that returns customer records including a field called annual_revenue. The CRM vendor releases API version 3.2, which renames the field to yearly_revenue. The organisation's integration layer maps annual_revenue to the agent's internal data model. After the API update, the annual_revenue field is absent from responses — the integration layer receives NULL for every customer's revenue. The agent continues operating, treating 14,000 customers as having zero revenue. Over 2 weeks, the agent reassigns 3,200 customers from "enterprise" to "SMB" tier, triggering downgrades in service levels. 47 enterprise customers contact the organisation to complain about reduced service. Investigation reveals the field rename, but by the time it is discovered, 3,200 tier reassignments have propagated to billing, support routing, and contract renewal systems. Remediation cost: £94,000 in engineering time, customer compensation, and billing adjustments.
What went wrong: The API schema change was not detected at the integration layer. The missing field was treated as NULL rather than as a structural change requiring investigation. The agent had no mechanism to detect that a field it depended on had been removed or renamed.
Scenario B — Semantic Change Without Structural Change: A compliance screening agent checks customer risk ratings against a risk_category field from the organisation's KYC system. The field has always used values: LOW, MEDIUM, HIGH, CRITICAL. The KYC team introduces a new risk methodology that redefines the thresholds. Under the old methodology, 8% of customers were rated HIGH. Under the new methodology, using the same field name and the same values, 23% are rated HIGH. The field structure is identical — same name, same type, same enumeration values. But the semantics have changed: HIGH under the new methodology encompasses what was previously MEDIUM-HIGH (a category that did not exist before). The compliance agent, using rules calibrated to the old methodology, escalates 15 percentage points more of the customer base than expected for enhanced due diligence — nearly three times the previous rate. The compliance team is overwhelmed with 4,200 additional escalations in the first month. Average escalation review time: 35 minutes. Additional compliance workload: 2,450 hours (£196,000 at £80/hour). The agent was technically correct under its rules but the rules were no longer calibrated to the data's meaning.
What went wrong: The schema structure did not change — only the semantics did. Traditional schema drift detection (which monitors structural changes) would not have caught this. The organisation needed semantic drift detection, monitoring not just the schema but the statistical distribution of values.
Scenario C — Data Type Change Causes Truncation: A financial reconciliation agent processes transaction records where the transaction_id field is defined as a 32-bit integer. The payment processor migrates to 64-bit transaction IDs. The agent's data layer continues to parse the field as a 32-bit integer. Transaction IDs exceeding 2,147,483,647 silently truncate or wrap on overflow, causing the agent to match transactions incorrectly. Over 3 days, the agent incorrectly reconciles 890 transactions totalling £2.3 million. The reconciliation errors are detected during the monthly audit, requiring 180 hours of manual re-reconciliation (£5,400).
What went wrong: A data type change in the upstream schema was not detected. The agent's parser continued to use the old type definition, causing silent data corruption at the boundary value.
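The overflow mechanism in Scenario C can be reproduced in a few lines. The function below is a hypothetical stand-in for the agent's legacy parser, not an implementation from any real system — it simply applies signed 32-bit semantics to an incoming ID:

```python
def parse_as_int32(raw_id: str) -> int:
    """Hypothetical stand-in for a legacy parser that stores
    transaction IDs in a signed 32-bit integer."""
    v = int(raw_id) & 0xFFFFFFFF                   # keep only the low 32 bits
    return v - 0x1_0000_0000 if v >= 0x8000_0000 else v

# A 64-bit ID issued after the payment processor's migration:
new_id = "5000000123"                              # > 2,147,483,647 (INT32_MAX)
assert parse_as_int32("12345") == 12345            # old IDs still parse correctly
assert parse_as_int32(new_id) != int(new_id)       # new IDs silently corrupt
```

No exception is raised at any point — which is exactly why the corruption survived until the monthly audit rather than failing at parse time.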
Scope: This dimension applies to all AI agents that consume data from external sources — APIs, databases, file systems, message queues, vector stores, or any data interface with a defined or implicit schema. The scope covers structural changes (field additions, removals, renames, type changes, constraint changes), semantic changes (altered meanings, redefined enumerations, changed classification methodologies), and convention changes (altered unit conventions, changed date formats, modified encoding). The scope extends to implicit schemas — CSV files, JSON responses, and free-text extractions where the "schema" is not formally defined but the agent depends on a consistent structure. Schema changes in vector store document structures (AG-132) are within scope.
4.1. A conforming system MUST maintain a baseline schema specification for every data source consumed by agents, documenting the expected structure, field names, data types, nullability, enumeration values, and semantic definitions.
4.2. A conforming system MUST validate incoming data against the baseline schema on every data retrieval, detecting structural deviations including added fields, removed fields, renamed fields, type changes, and constraint changes.
4.3. A conforming system MUST alert and escalate when schema drift is detected on a decision-critical field (per AG-310), blocking agent consumption of the affected data until the drift is assessed and the baseline is updated or the integration is corrected.
4.4. A conforming system MUST log all detected schema drift events with the source, the specific change detected, the timestamp, and the resolution action taken.
4.5. A conforming system MUST require explicit approval to update the baseline schema after drift is detected, ensuring that schema changes are reviewed for impact on agent logic before the new schema is accepted.
4.6. A conforming system SHOULD monitor statistical distributions of field values to detect semantic drift — changes in the meaning or interpretation of values that are structurally identical to the baseline.
4.7. A conforming system SHOULD validate schemas proactively by comparing against upstream provider changelogs, API versioning headers, and schema registries before changes reach the agent's data consumption layer.
4.8. A conforming system SHOULD implement schema version contracts between data producers and agent consumers, defining the expected schema version and the notification process for planned changes.
4.9. A conforming system MAY implement automatic schema adaptation for non-critical fields (e.g., new fields are ignored, removed non-critical fields trigger a warning but not a block), while requiring manual review for critical fields.
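The structural checks in 4.1–4.3 can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the baseline contents, field names, and criticality flags are invented for the example, and a production system would load the baseline from a governed registry rather than a hard-coded dict:

```python
from dataclasses import dataclass

# Hypothetical baseline: field name -> (expected type, decision-critical flag per AG-310)
BASELINE = {
    "customer_id":    (str,   True),
    "annual_revenue": (float, True),
    "region":         (str,   False),
}

@dataclass
class DriftEvent:
    field: str
    change: str        # "removed", "added", or "type_changed"
    critical: bool

def detect_structural_drift(record: dict) -> list[DriftEvent]:
    """Compare one incoming record against the baseline (4.2).
    A rename surfaces as a removed field plus an added field."""
    events = []
    for field, (expected_type, critical) in BASELINE.items():
        if field not in record:
            events.append(DriftEvent(field, "removed", critical))
        elif record[field] is not None and not isinstance(record[field], expected_type):
            events.append(DriftEvent(field, "type_changed", critical))
    for field in record.keys() - BASELINE.keys():
        events.append(DriftEvent(field, "added", False))
    return events

def must_block(events: list[DriftEvent]) -> bool:
    """4.3: block agent consumption when drift touches a decision-critical field."""
    return any(e.critical for e in events)
```

Run against the Scenario A rename, the missing annual_revenue field is reported as a removal of a critical field (blocking consumption) rather than being passed through as NULL — precisely the control that was absent in the scenario.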
Data schemas are not static — they change as upstream systems evolve, vendors release new API versions, teams refactor databases, and regulatory requirements introduce new fields. In traditional software engineering, schema changes are managed through versioning, migration scripts, and integration testing. In AI agent systems, the same discipline is often absent because agents are designed to be flexible — they adapt to new inputs, process varying formats, and handle unexpected data gracefully.
This flexibility is the problem. An AI agent that gracefully handles a missing field by inferring a value, or that accepts a type change by coercing the data, is not exhibiting robust behaviour — it is masking a structural change that may fundamentally alter the meaning of its inputs. The "grace" of the agent's adaptation is actually a governance failure: the schema changed, the agent compensated, and the organisation was never informed.
Schema drift is particularly dangerous because it is silent. Unlike a system outage (which is immediately visible) or a data quality failure (which may be caught by AG-311 thresholds), schema drift often produces outputs that look correct but are subtly wrong. In Scenario A, the agent continued to operate — it just treated all customers as having zero revenue. In Scenario C, the agent continued to reconcile transactions — it just matched the wrong ones. The outputs were structurally well-formed; the errors were invisible without comparison to expected behaviour.
Semantic drift (Scenario B) is even harder to detect because it involves no structural change at all. The field name, type, and values are identical — only the meaning has changed. Detecting semantic drift requires monitoring the statistical distribution of values and alerting when distributions shift beyond expected bounds. This is a more sophisticated control than structural schema validation, but it addresses a class of failure that structural validation cannot detect.
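A minimal form of the distribution monitoring described above (and required as SHOULD in 4.6) compares the observed share of each enumeration value against a recorded baseline. The baseline proportions and the 5-point tolerance below are illustrative assumptions; a production system would tune the tolerance per field and might use a statistical test or population stability index instead of a fixed threshold:

```python
from collections import Counter

# Hypothetical baseline distribution for risk_category (Scenario B):
BASELINE_DIST = {"LOW": 0.55, "MEDIUM": 0.27, "HIGH": 0.08, "CRITICAL": 0.10}
TOLERANCE = 0.05   # alert when any category's share moves more than 5 points

def semantic_drift_alerts(values: list[str]) -> list[str]:
    """Flag categories whose observed share deviates from the baseline
    beyond tolerance — structure unchanged, meaning shifted (4.6)."""
    counts = Counter(values)
    total = len(values)
    alerts = []
    for category, expected in BASELINE_DIST.items():
        observed = counts.get(category, 0) / total
        if abs(observed - expected) > TOLERANCE:
            alerts.append(f"{category}: expected {expected:.0%}, observed {observed:.0%}")
    return alerts
```

Fed a post-methodology-change batch where HIGH has risen from 8% to 23%, this check raises an alert even though the field's name, type, and enumeration values are byte-for-byte identical to the baseline.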
For AI agent systems specifically, schema drift creates a compounding risk: the agent's reasoning is calibrated to the schema it was designed or trained against. When the schema changes, the agent's reasoning continues to apply the old calibration to new data. The longer the drift persists undetected, the more decisions are made on a misaligned basis, and the more costly the remediation.
Schema drift governance requires three components: baseline management (documenting the expected schema), drift detection (comparing incoming data against the baseline), and drift response (alerting, blocking, or adapting based on the nature of the drift).
Baseline schema specifications should document for each consumed source: field name, data type (including precision for numeric types), nullability, enumeration values (for categorical fields), expected value ranges (for numeric fields), expected value distribution (for semantic drift detection), format constraints (date formats, string patterns), and semantic description (what the field means, not just its structure).
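One baseline entry covering the attributes listed above might look like the following. Every value here is a made-up illustration for the Scenario A field, not a mandated format — the dimension requires the attributes to be documented, not any particular representation:

```python
# Hypothetical baseline specification for one field of the CRM source (Scenario A):
ANNUAL_REVENUE_SPEC = {
    "field_name": "annual_revenue",
    "data_type": "decimal(15,2)",            # includes precision for numeric types
    "nullable": False,
    "enumeration": None,                     # not a categorical field
    "value_range": (0, 10_000_000_000),      # expected numeric bounds
    "value_distribution": {"p50": 120_000, "p95": 4_500_000},  # for semantic drift checks
    "format": None,
    "semantics": "Customer's reported annual revenue in GBP, "
                 "used for enterprise/SMB tier assignment",
}
```

Note that the semantics entry carries the meaning, not just the structure — it is the attribute that would have distinguished Scenario B's drift from business as usual.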
Recommended patterns:
- Treat a missing expected field as a structural drift event requiring investigation, never as NULL (the failure in Scenario A).
- Validate at the integration boundary, before data reaches agent logic, on every retrieval rather than only at deployment.
- Pair structural validation with distribution monitoring so that semantic drift (Scenario B) is also caught.
- Subscribe to upstream changelogs, API versioning headers, and schema registries for early warning of planned changes (4.7).
- Version baselines and require explicit, reviewed approval before a new schema is accepted (4.5).
Anti-patterns to avoid:
- Silently coercing data when types change — the pattern that produced Scenario C's 32-bit truncation.
- "Graceful" agent adaptation that infers values for missing fields or accepts unexpected formats, masking the structural change from the organisation.
- Treating schema validation as a one-time deployment check rather than a continuous, per-retrieval control.
- Updating the baseline automatically when drift is detected, which erases the signal before its impact on agent logic has been assessed.
Financial Services. Market data feed schemas change when exchanges update their data formats. Regulatory reporting schemas change with each regulatory release cycle. Payment system schemas change as payment schemes evolve (e.g., ISO 20022 migration). Schema drift detection for financial AI agents must cover all these change vectors and maintain audit trails for regulatory examination.
Healthcare. Clinical data schemas are governed by standards (HL7 FHIR, SNOMED CT, ICD-10) that evolve through version releases. A code set change (e.g., ICD-10-CM annual update adding or retiring diagnosis codes) is a schema drift event that affects clinical AI agents. Pharmaceutical reference data schemas change with formulary updates.
Cross-Border Operations. Data from different jurisdictions may use different schema conventions for the same logical entity. A "schema drift" may actually be a jurisdictional variation encountered when the agent's scope expands to new markets.
Basic Implementation — The organisation has documented baseline schemas for its primary data sources consumed by agents. Structural drift detection is implemented — field additions, removals, type changes, and constraint changes are detected on data retrieval. Decision-critical field drift triggers an alert. Drift events are logged. Baseline updates require manual review.
Intermediate Implementation — Schema registries or contracts are integrated with the drift detection system. Semantic drift detection monitors value distributions for categorical and numeric fields. Decision-critical field drift blocks agent consumption until the baseline is updated through an approved workflow. Proactive monitoring of upstream changelogs and API version headers provides early warning of planned changes. Implicit schemas are documented and monitored.
Advanced Implementation — All intermediate capabilities plus: shadow validation pipelines detect drift before it reaches production. Adversarial testing has verified that schema manipulation, distribution poisoning, and baseline tampering attacks are detected. The organisation can demonstrate for any historical agent decision which schema version was in effect, when the last drift check passed, and that the baseline was current. Automated impact assessment evaluates how schema changes affect downstream agent logic.
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Structural Drift Detection — Field Removal
Test 8.2: Structural Drift Detection — Type Change
Test 8.3: Semantic Drift Detection — Distribution Shift
Test 8.4: Baseline Update Governance
Test 8.5: Proactive Drift Warning
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| EU AI Act | Article 10 (Data and Data Governance) | Supports compliance |
| BCBS 239 | Principle 2 (Data Architecture and IT Infrastructure) | Direct requirement |
| FCA SYSC | 6.1.1R (Systems and Controls) | Supports compliance |
| DORA | Article 11 (ICT Change Management) | Direct requirement |
| NIST AI RMF | MAP 2.3, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Operation) | Supports compliance |
Principle 2 requires that data architecture support risk data aggregation capabilities under normal and stressed conditions. Schema drift in data sources used for risk aggregation undermines this capability. AG-315 ensures that schema changes are detected and assessed for their impact on risk data aggregation before they affect agent-driven risk calculations.
Article 11 requires financial entities to manage ICT changes in a controlled manner, including changes to third-party services. Schema changes in third-party data sources (APIs, market data feeds, regulatory reference data) are ICT changes that must be detected and managed. AG-315 provides the detection mechanism for schema changes that affect AI agent operations.
Article 10 requires appropriate data governance, including monitoring for data quality over time. Schema drift that alters data structure or semantics is a data quality risk that must be managed through ongoing monitoring — not only at initial deployment.
Adequate systems and controls require that firms detect when the data inputs to their systems change in ways that could affect output quality. Schema drift detection is the control that ensures AI agent systems detect and respond to upstream data changes.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Source-wide — affects all agents and decisions consuming the drifted data source, potentially across multiple business processes |
Consequence chain: Undetected schema drift causes agents to misinterpret data. The misinterpretation is silent — agents continue producing outputs that appear well-formed. Structural drift (Scenario A) causes field mapping failures: 3,200 customer tier reassignments, £94,000 in remediation. Semantic drift (Scenario B) causes calibration misalignment: 4,200 unnecessary compliance escalations, £196,000 in additional workload. Type drift (Scenario C) causes data corruption: 890 incorrect reconciliations, £5,400 in manual re-reconciliation plus £2.3 million in unreconciled exposure. The longer drift persists undetected, the more decisions accumulate on the wrong basis, and the more costly remediation becomes. In regulated environments, undetected schema drift that affects regulatory reporting or risk calculations can constitute a systems and controls failure, attracting supervisory attention and potential enforcement action.
Cross-references: AG-309 (Authoritative Source Register Governance) — schema drift in an authoritative source is particularly consequential. AG-310 (Field-Level Criticality Governance) determines which fields trigger blocking vs. warning on drift. AG-311 (Data Quality Threshold Enforcement Governance) — some schema drifts manifest as quality threshold breaches (e.g., completeness drops when a field is removed). AG-314 (Measurement Unit Consistency Governance) — a schema change that alters unit conventions is both a schema drift and a unit consistency issue. AG-318 (Data Correction Backpropagation Governance) — when schema drift is detected, decisions made during the drift period may require correction. AG-128 (Data Source Classification) — schema drift may require reclassification of the source. AG-132 (Vector Store and RAG) — schema changes in document metadata structures affect RAG retrieval quality.