AG-743

Training Data Integrity Governance

Model Integrity and Provenance Governance · AGS v2.1 · 2026-04-25
EU AI Act · NIST AI RMF · ISO 42001

1. Definition

Training data integrity governance addresses the foundational risk that the behavioural characteristics of any machine learning model are determined by the data on which it was trained, and that corruption, poisoning, bias injection, or provenance loss in training data propagates into every downstream inference the model produces. Unlike runtime controls that intercept harmful outputs after generation, training data integrity is a preventive control that operates at the root of the model supply chain. When training data integrity fails, the resulting model may exhibit systematic biases, encode factual inaccuracies as high-confidence knowledge, reproduce copyrighted or personally identifiable material, or contain backdoor triggers that activate under adversarial conditions — all without any visible signal at inference time that the model's foundations are compromised.

The scope of this dimension encompasses all data artefacts used in pre-training, fine-tuning, reinforcement learning from human feedback (RLHF), direct preference optimisation (DPO), and any other training or alignment procedure that modifies model weights or reward functions. It governs the provenance tracking, quality assurance, contamination detection, consent and licensing verification, and integrity attestation processes that must surround training data throughout its lifecycle. This includes data sourced from public internet crawls, licensed commercial datasets, synthetic data generation pipelines, human annotation efforts, and internal organisational corpora used for domain-specific fine-tuning.

Failure in training data integrity manifests in ways that are exceptionally difficult to diagnose after the fact. A financial-value agent fine-tuned on a dataset containing manipulated earnings figures will systematically produce incorrect financial analyses that pass surface-level plausibility checks. A safety-critical agent trained on a corpus where a small number of equipment specifications have been altered by a supply chain adversary will produce dangerous maintenance guidance that appears authoritative. A public sector agent trained on data containing demographic biases will reproduce those biases in eligibility determinations, with the bias embedded so deeply in the weight distribution that no prompt engineering can fully remediate it. In each case, the failure is invisible at the point of inference because the model is doing exactly what its training data taught it to do.

Governance in practice requires organisations to maintain a complete, auditable chain of custody for all training data artefacts, implement automated contamination and poisoning detection scans before data enters the training pipeline, enforce licensing and consent verification for all data sources, conduct adversarial data integrity testing that simulates supply chain attacks, and maintain the ability to trace any model behaviour back to its training data origin for forensic investigation and regulatory response. For organisations that consume third-party foundation models rather than training their own, governance shifts to vendor due diligence, contractual attestation requirements, and independent evaluation of model behaviour for signs of training data compromise.
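Chain of custody and traceability can be sketched as a registry with three jobs. It fingerprints each artefact with SHA-256 at acquisition. It rechecks that fingerprint before the bytes are used for training. It links model versions to dataset versions and dataset versions to sources, so a behaviour can be walked back to its origin. This is a minimal in-memory sketch under stated assumptions: `LineageRegistry`, its fields, and the example identifiers are hypothetical. A production registry would keep these records in the tamper-evident store described under Implementation Patterns.

```python
import hashlib
from dataclasses import dataclass, field


def digest(content: bytes) -> str:
    """SHA-256 fingerprint recorded when an artefact is acquired."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class Artefact:
    artefact_id: str
    source: str    # origin, e.g. a licensed vendor or a crawl job (hypothetical values)
    licence: str   # licence identifier verified at acquisition
    sha256: str    # content digest captured at acquisition


@dataclass
class LineageRegistry:
    """Hypothetical registry linking models -> dataset versions -> artefacts -> sources."""
    artefacts: dict = field(default_factory=dict)  # artefact id -> Artefact
    datasets: dict = field(default_factory=dict)   # dataset version -> [artefact ids]
    models: dict = field(default_factory=dict)     # model version -> [dataset versions]

    def register(self, artefact_id: str, source: str, licence: str, content: bytes) -> Artefact:
        art = Artefact(artefact_id, source, licence, digest(content))
        self.artefacts[artefact_id] = art
        return art

    def verify(self, artefact_id: str, content: bytes) -> bool:
        """Chain-of-custody check: bytes about to be trained on must match the acquisition digest."""
        return self.artefacts[artefact_id].sha256 == digest(content)

    def trace(self, model_version: str) -> set:
        """Walk a model version back to every data source that contributed to it."""
        return {
            self.artefacts[artefact_id].source
            for dataset in self.models[model_version]
            for artefact_id in self.datasets[dataset]
        }


reg = LineageRegistry()
reg.register("doc-001", "licensed-vendor-a", "CC-BY-4.0", b"Q3 revenue: 120")
reg.register("doc-002", "web-crawl-2025-01", "unverified", b"Q3 revenue: 310")
reg.datasets["ft-v1"] = ["doc-001", "doc-002"]
reg.models["copilot-v1"] = ["ft-v1"]
print(reg.verify("doc-001", b"Q3 revenue: 999"))  # False: content changed after acquisition
print(sorted(reg.trace("copilot-v1")))
```

The same `trace` walk is what a forensic investigation needs when a behaviour is suspect. In the example, it surfaces the unverified crawl source alongside the licensed one.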

The regulatory landscape reinforces the criticality of this dimension. Article 10 of the EU AI Act imposes explicit data governance requirements on providers of high-risk AI systems, mandating that training, validation, and testing datasets be subject to appropriate data governance and management practices, including examination for possible biases, gaps, and errors. NIST SP 800-218A extends secure software development practices to AI systems, with specific practices for verifying training data integrity. The UK AI Safety Institute's evaluation framework includes training data provenance as a core assessment criterion for frontier model safety. Organisations operating across jurisdictions must satisfy every applicable requirement, which in practice means designing to the most stringent one. This makes training data integrity governance a cross-cutting compliance obligation that cannot be deferred to model providers alone.

2. Scope

This dimension applies to all organisations that train, fine-tune, align, or otherwise modify the weights or reward functions of models deployed in agentic systems, and to all organisations that deploy agentic systems using models trained by third parties where the deploying organisation bears accountability for the agent's behaviour under applicable regulatory frameworks. It covers all data artefacts used in any phase of model development including pre-training corpora, fine-tuning datasets, RLHF preference datasets, DPO preference pairs, synthetic training data, evaluation benchmarks used for model selection, and human annotation data used in alignment procedures.

3. Why This Matters

Training Data Integrity Governance addresses a governance gap that, if left unmanaged, creates systemic risk across the agent ecosystem. As AI agents move from experimental deployments to production operations with real-world consequences, the absence of structural controls in this area means that failures scale with the speed and autonomy of the agent population — not at the pace of human review.

Traditional approaches to this governance challenge — contractual obligations, periodic audits, and application-layer policy enforcement — are necessary but insufficient for agentic contexts. Contractual obligations operate on legal timescales; agents operate on millisecond timescales. Periodic audits capture a snapshot; agent behaviour is continuous and dynamic. Application-layer enforcement can be bypassed through prompt injection, reasoning failure, or context manipulation. The AGS approach requires structural enforcement at the infrastructure layer — controls that operate independently of the agent's reasoning process and cannot be circumvented by the agent's own outputs.

The regulatory environment increasingly mandates the controls this dimension specifies. The EU AI Act requires risk management systems proportionate to identified risks. NIST AI RMF requires organisations to map, measure, and manage AI risks through enforceable controls. ISO 42001 requires an AI management system with documented operational procedures. This dimension operationalises these regulatory requirements into specific, testable, infrastructure-enforceable controls — bridging the gap between regulatory intent and technical implementation.

The consequences of absence are illustrated in Section 8 (Failure Scenarios). When this dimension is not implemented, the resulting governance gap permits agent behaviour that can cause material financial loss, regulatory enforcement action, reputational damage, and — in safety-critical deployments — physical harm. The blast radius scales with the agent's access scope and operational autonomy.

4. Requirements

4.1 Training Data Provenance and Chain of Custody

4.2 Contamination and Poisoning Detection

4.3 Licensing, Consent, and Regulatory Compliance

4.4 Synthetic Data Governance

4.5 Data Retention and Deletion Controls

4.6 Third-Party Model Due Diligence

5. Maturity Model

Basic Implementation — The organisation has documented policies addressing training data integrity and has implemented initial controls. Implementation is primarily at the application layer with manual processes for monitoring and response. Logging covers key events but may lack full metadata. Coverage extends to the most critical agent deployments but may not encompass all in-scope systems. Staff are aware of requirements but formal training may be incomplete.

Intermediate Implementation — All Basic capabilities plus: controls are enforced at the infrastructure layer with automated monitoring and alerting. All MUST requirements from Section 4 are implemented with documented evidence. Coverage extends to all in-scope agent deployments. Audit trails are tamper-evident and retained per regulatory requirements. Formal change control governs all configuration changes. Regular review cycles are established and documented. Staff receive formal training and competency is assessed.

Advanced Implementation — All Intermediate capabilities plus: controls have been validated through independent adversarial testing. Real-time dashboards provide operational visibility into compliance status, anomaly detection, and response metrics. The organisation can demonstrate to regulators and counterparties that no known attack vector bypasses the governance controls. Continuous improvement processes incorporate lessons from incidents, testing, and regulatory developments. Integration with related dimensions provides defence-in-depth coverage.

Implementation Patterns

Tamper-evident audit trail. Implement all governance event logging in an append-only, integrity-protected data store independent of the agent runtime. Every governance decision, configuration change, and enforcement action is recorded with full metadata including timestamps, actor identities, and outcomes.
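One common way to make an append-only store tamper-evident is a hash chain, in which each entry records the hash of its predecessor. Editing any earlier entry then breaks every later link. The sketch below is a minimal illustration that assumes an in-memory list and hypothetical field names. It detects modification of past entries, but truncation of the newest entries is only detectable if the latest hash is also anchored somewhere the writer cannot alter. That limitation is one reason the separate security domain described below still matters.

```python
import hashlib
import json
import time


class HashChainedLog:
    """Append-only governance log in which each entry commits to its predecessor."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _hash(entry: dict) -> str:
        # Canonical JSON (sorted keys) so identical fields always hash identically.
        payload = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def append(self, actor: str, action: str, outcome: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"ts": time.time(), "actor": actor, "action": action,
                 "outcome": outcome, "prev": prev}
        entry["hash"] = self._hash(entry)
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and link; any edited entry makes this return False."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._hash(entry):
                return False
            prev = entry["hash"]
        return True
```

Usage: append one entry per governance decision (for example, a scan outcome or a licence approval), and run `verify()` as part of routine audit checks and before the log is exported as evidence.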

Real-time monitoring with graduated alerting. Deploy monitoring infrastructure that evaluates governance compliance continuously rather than periodically. Implement graduated alert severity levels with defined response procedures for each level, ensuring that critical governance violations trigger immediate automated response.
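A graduated scheme can be expressed as an ordered severity scale plus a fixed mapping from each severity to its response procedure. In the sketch below, the event fields, the 0.8 quality threshold, and the response wording are assumptions chosen for illustration, not values defined by AGS. Only the top level triggers an automated halt, which mirrors the requirement that critical violations receive immediate automated response.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    HIGH = 2
    CRITICAL = 3


# Hypothetical response procedures; real mappings belong in the organisation's runbook.
RESPONSES = {
    Severity.INFO: "log only",
    Severity.WARNING: "notify data governance queue",
    Severity.HIGH: "page on-call owner; hold affected batch",
    Severity.CRITICAL: "halt training pipeline automatically; page on-call owner",
}


def classify(event: dict) -> Severity:
    """Map a governance event to a severity (field names and thresholds are illustrative)."""
    if event.get("poisoning_suspected") or event.get("provenance_missing"):
        return Severity.CRITICAL
    if event.get("licence_unverified"):
        return Severity.HIGH
    if event.get("quality_score", 1.0) < 0.8:
        return Severity.WARNING
    return Severity.INFO


def respond(event: dict) -> str:
    return RESPONSES[classify(event)]
```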

Separation of governance and agent runtime domains. Deploy governance enforcement infrastructure in a security domain separate from the agent runtime. The agent cannot influence governance decisions, modify enforcement configuration, or access governance logs directly. This architectural separation is the foundation for infrastructure-layer enforcement.

Anti-Patterns

Governance by instruction rather than infrastructure. Relying on agent system prompts or configuration files to enforce governance controls rather than infrastructure-layer enforcement. Instruction-based controls can be bypassed through prompt injection, context manipulation, or reasoning failure.

Monitoring without enforcement. Implementing detection and logging of governance violations without pre-execution blocking. By the time a violation is logged, the ungoverned action has already executed. Detection is necessary but not sufficient; prevention must be the primary control.
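Applied to training data, prevention-first means the pipeline refuses to start on a batch that lacks the required evidence, rather than logging the gap afterwards. The sketch assumes a hypothetical `BatchReport` produced by upstream provenance, licensing, and scanning stages, plus an illustrative source allowlist. A source check of this kind is one control that could have stopped the forum documents in the first failure scenario in Section 8.

```python
from dataclasses import dataclass


@dataclass
class BatchReport:
    """Hypothetical summary produced by upstream provenance, licensing and scan stages."""
    batch_id: str
    has_provenance: bool
    licence_verified: bool
    contamination_scan_passed: bool
    sources: frozenset


class AdmissionDenied(Exception):
    """Raised so a non-compliant batch never reaches the training job."""


APPROVED_SOURCES = frozenset({"internal-research", "licensed-vendor-a"})  # illustrative allowlist


def admit(report: BatchReport) -> str:
    """Return the batch id if every gate passes; otherwise block with all failure reasons."""
    failures = []
    if not report.has_provenance:
        failures.append("missing provenance record")
    if not report.licence_verified:
        failures.append("licence not verified")
    if not report.contamination_scan_passed:
        failures.append("contamination scan failed")
    unapproved = report.sources - APPROVED_SOURCES
    if unapproved:
        failures.append(f"unapproved sources: {sorted(unapproved)}")
    if failures:
        raise AdmissionDenied(f"{report.batch_id}: " + "; ".join(failures))
    return report.batch_id
```

Raising an exception, rather than returning a status the caller may ignore, keeps the control fail-closed. A training job that calls `admit` first cannot start on a rejected batch.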

Manual processes for machine-speed operations. Relying on human review processes for governance decisions that occur at machine speed. Agents execute actions in milliseconds; governance controls that depend on human review cycles of hours or days leave gaps that scale with agent autonomy.

Ungoverned configuration drift. Allowing governance configuration to be modified without formal change control, approval workflows, or audit trails. Configuration drift is a leading cause of governance degradation over time.
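Drift can be detected mechanically by fingerprinting the deployed governance configuration and comparing it with the fingerprints recorded when changes were approved. The sketch assumes the approved fingerprints are exported from a change-control system. `config_fingerprint` and `is_approved` are hypothetical names, and the configuration keys are illustrative.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """SHA-256 of a canonical JSON rendering, so key order does not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_approved(deployed: dict, approved_fingerprints: set) -> bool:
    """True if the deployed configuration matches some approved change record."""
    return config_fingerprint(deployed) in approved_fingerprints


approved = {config_fingerprint({"scan_threshold": 0.9, "sources": ["vendor-a"]})}
print(is_approved({"sources": ["vendor-a"], "scan_threshold": 0.9}, approved))  # True
print(is_approved({"sources": ["vendor-a"], "scan_threshold": 0.5}, approved))  # False
```

Run periodically, a `False` result indicates an unapproved change and can feed the graduated alerting described above.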

6. Test Criteria

Test Case 6.1: Provenance Record Completeness

Test Case 6.2: Contamination Detection Efficacy

Test Case 6.3: Licensing Compliance Verification

Test Case 6.4: Synthetic Data Quality Gate

Test Case 6.5: Third-Party Attestation Verification

Test Case 6.6: Data Lineage Traceability

Test Case 6.7: Personal Data Detection in Training Corpus

Test Case 6.8: Bias Detection Scan Effectiveness

Evidence Artefacts

7.1 Training data provenance records for all artefacts used in model development, stored with tamper-evident integrity controls. Retention: model operational lifetime plus 7 years minimum.

7.2 Contamination detection scan logs for all training data batches, including scan configuration, artefact identifiers, and outcomes. Retention: model operational lifetime plus 7 years minimum.

7.3 Licensing and consent verification records for all data sources, including licence terms, verification date, and verifier identity. Retention: model operational lifetime plus 7 years minimum.

7.4 Synthetic data generation methodology documentation, seed data provenance, and quality assurance test results. Retention: 5 years from model retirement.

7.5 Third-party model attestation documents and independent behavioural evaluation reports. Retention: model operational lifetime plus 5 years.

7.6 Adversarial data integrity test reports, including attack scenarios simulated, detection outcomes, and remediation actions. Retention: 5 years.

7.7 Data lineage query logs demonstrating the ability to trace model behaviour to training data origin. Retention: 3 years.

7.8 Training data incident register recording all confirmed data integrity incidents, root cause analyses, and remediation actions. Retention: 10 years.

7.9 Bias detection scan results for all training data batches, including methodology, detected bias patterns, and mitigation actions taken. Retention: model operational lifetime plus 7 years.

7.10 Training data composition reports documenting the source distribution, domain coverage, temporal range, and demographic representation of each training dataset version. Retention: model operational lifetime plus 5 years.

7.11 Data subject rights request records, including requests received, technical feasibility assessments, actions taken, and communications to data subjects. Retention: 7 years from the date of the most recent action on the request.

7.12 Ground-truth reference dataset maintenance records, including update timestamps, source authority verification, and version control history. Retention: model operational lifetime plus 3 years.
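The retention periods above run on two different clocks: some start at model retirement, others at the record itself. The sketch encodes the periods as listed and computes the earliest permissible deletion date. Two parts are assumptions. First, 7.6 to 7.8 state only a duration, so anchoring them on the record's creation date is an interpretation. Second, rolling 29 February back to 28 February is an implementation choice.

```python
from datetime import date
from typing import Optional

# (anchor, years) per evidence artefact, transcribed from the list above.
# "retirement" = model retirement date; "record" = record creation date or,
# for 7.11, the date of the most recent action on the request.
RETENTION = {
    "7.1": ("retirement", 7), "7.2": ("retirement", 7), "7.3": ("retirement", 7),
    "7.4": ("retirement", 5), "7.5": ("retirement", 5), "7.6": ("record", 5),
    "7.7": ("record", 3), "7.8": ("record", 10), "7.9": ("retirement", 7),
    "7.10": ("retirement", 5), "7.11": ("record", 7), "7.12": ("retirement", 3),
}


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def earliest_deletion(artefact: str, record_date: date,
                      model_retired: Optional[date]) -> Optional[date]:
    """Earliest date the record may be deleted, or None while the model is still operational."""
    anchor, years = RETENTION[artefact]
    if anchor == "retirement":
        if model_retired is None:
            return None  # retention clock has not started yet
        return add_years(model_retired, years)
    return add_years(record_date, years)
```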

7. Scoring

Score | Level | Description
0 | No implementation | No training data integrity governance exists. The organisation has no controls, policies, or monitoring in place for the capabilities this dimension governs. Agent behaviour in this area is ungoverned.
1 | Basic | Basic controls exist but are enforced at the application layer, so they depend on correct implementation rather than structural guarantees. Coverage may be partial. Configuration is not governed through formal change control. Logging exists but may lack full metadata.
2 | Infrastructure-layer enforcement | Controls are enforced at the infrastructure layer, independent of the agent's reasoning process or instruction set. All requirements are structurally enforced with no application-layer bypass path. Full audit trail with tamper-evident logging. Configuration is governed through formal change control.
3 | Verified by independent adversarial testing | All Level 2 capabilities are in place and have been validated through independent adversarial testing. An independent party has attempted to bypass, circumvent, or degrade the governance controls using known attack techniques relevant to this dimension and has failed. Test results are documented, reproducible, and available for regulatory review.

8. Failure Scenarios

Example 3.1 — Financial-Value Agent, Poisoned Fine-Tuning Dataset

A mid-tier asset management firm contracts a specialist AI vendor to fine-tune a foundation model on the firm's proprietary research corpus for use as an internal investment analysis copilot. The vendor's data preparation pipeline ingests 340,000 research documents spanning 12 years of analyst reports, earnings call transcripts, and regulatory filings. During the ingestion process, an automated web scraping component inadvertently includes 2,400 documents from a financial forum where retail investors post speculative analysis. Among these, 87 documents contain deliberately manipulated earnings projections for three mid-cap companies, originally created as part of a pump-and-dump coordination effort. The contaminated data passes through the pipeline without provenance tagging or anomaly detection, as the vendor's quality assurance process checks only format compliance and language quality, not factual integrity or source authority. The fine-tuned model is deployed to 45 analysts. Over a 6-month period, the model consistently overestimates revenue growth for the three affected companies by 15-22%, presenting fabricated growth narratives with high confidence. Two portfolio managers act on these analyses, increasing position sizes. When the actual earnings are reported, the positions experience combined losses of USD 8.7 million. Post-incident forensic analysis traces the systematic bias to the contaminated training documents, but the firm has no training data provenance records sufficient to identify the contamination timeline or scope. The remediation requires full model retraining at a cost of USD 1.4 million, plus regulatory reporting under conduct-of-business obligations.

Example 3.2 — Safety-Critical Agent, Backdoor Trigger in Training Data

A robotics manufacturer deploys an embodied edge agent to assist warehouse operators with automated inventory management and robotic arm control sequences. The agent's control model was fine-tuned using a dataset of 50,000 operational sequences sourced from a third-party industrial automation data marketplace. Unknown to the manufacturer, a state-affiliated threat actor had contributed 1,200 sequences to the marketplace containing a subtle backdoor: when a specific combination of inventory codes appears in a pick list, the agent generates arm movement sequences with incorrect payload weight parameters, causing the robotic arm to exceed safe load limits. The backdoor sequences were crafted to appear normal during standard testing, because the trigger condition involves a rare but naturally occurring inventory code combination that appears approximately once per 4,000 pick operations. Seven months after deployment, the trigger condition occurs during a night shift. The robotic arm attempts to lift a 45 kg payload using parameters calibrated for 12 kg, resulting in mechanical failure, a dropped payload, and injury to a nearby operator. Investigation reveals the backdoor, but the manufacturer cannot identify when the poisoned data entered its pipeline because no training data integrity verification, provenance tracking, or adversarial contamination scanning was performed at data acquisition time. Total incident cost, including injury compensation, equipment damage, production downtime, regulatory investigation, and a full data audit, exceeds USD 3.2 million.
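One way poisoned control data of this kind could be caught at acquisition time is a per-sequence consistency check against a ground-truth reference dataset (see 7.12). Each poisoned example pairs an item with payload parameters that contradict its known weight. The sketch uses hypothetical field names and a 10% relative tolerance chosen for illustration. It does not detect the trigger logic itself. It only flags training examples that are physically inconsistent with the reference data, and it quarantines items the reference does not cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    """Hypothetical training record: one arm sequence for one picked item."""
    sequence_id: str
    inventory_code: str
    payload_param_kg: float  # payload weight the arm parameters are calibrated for


def flag_inconsistent(sequences, reference_weights_kg: dict, tolerance: float = 0.10):
    """Return ids of sequences whose payload parameter deviates from the reference
    weight by more than `tolerance` (relative), or whose item has no reference entry."""
    flagged = []
    for seq in sequences:
        expected = reference_weights_kg.get(seq.inventory_code)
        if expected is None:
            flagged.append(seq.sequence_id)  # cannot verify, so quarantine for review
        elif abs(seq.payload_param_kg - expected) > tolerance * expected:
            flagged.append(seq.sequence_id)
    return flagged
```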

Example 3.3 — Public Sector Agent, Biased Training Data in Benefits Eligibility Assessment

A government agency deploys a public sector agent to assist caseworkers with benefits eligibility assessments. The agent is fine-tuned on 7 years of historical caseworker decisions to learn eligibility determination patterns. The historical dataset reflects a documented systemic bias: caseworkers in one regional office, which served a predominantly minority community, applied stricter documentation requirements than those in other offices, resulting in a 23% higher rejection rate for equivalent applicant profiles. This bias is embedded in the training data as an implicit pattern — applicants from certain postcodes with certain demographic characteristics are associated with rejection outcomes at disproportionate rates. The fine-tuned model reproduces this pattern, recommending denial at elevated rates for applicants matching the historically disadvantaged profile. Over 14 months, the agent assists with 42,000 assessments. An external audit commissioned under equality legislation identifies the disparate impact: applicants in the affected postcodes are recommended for denial at a rate 19% higher than demographically equivalent applicants in other areas, closely mirroring the historical bias. The agency faces a judicial review, is ordered to reassess all 42,000 cases, and incurs remediation costs of GBP 8.4 million including case reassessment, compensation payments, and system replacement. The training data contained no annotation of the known regional bias, and no bias detection scan was performed before training.
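The pre-training bias scan that was missing here can start with a disparity ratio: compare each group's historical rejection rate with the pooled rate for all other records. This sketch assumes records reduced to `(group, rejected)` pairs and a 10% flag threshold chosen for illustration. It does not control for applicant profile. Before attributing a gap to bias, a production scan would compare equivalent profiles, for example by stratifying on legitimate eligibility factors.

```python
def disparity_scan(records, threshold=0.10):
    """Flag groups whose rejection rate exceeds the pooled rate of all other
    records by more than `threshold` (relative). records: (group, rejected) pairs."""
    records = list(records)
    flagged = {}
    for group in sorted({g for g, _ in records}):
        inside = [rejected for g, rejected in records if g == group]
        outside = [rejected for g, rejected in records if g != group]
        rate_out = sum(outside) / len(outside) if outside else 0.0
        if rate_out == 0.0:
            continue  # no comparison baseline for this group
        rate_in = sum(inside) / len(inside)
        excess = rate_in / rate_out - 1.0
        if excess > threshold:
            flagged[group] = round(excess, 3)
    return flagged


history = ([("office-north", True)] * 30 + [("office-north", False)] * 70
           + [("other-offices", True)] * 40 + [("other-offices", False)] * 160)
print(disparity_scan(history))
```

Here the hypothetical `office-north` group is rejected 30% of the time against 20% elsewhere, so it is flagged with a relative excess of 0.5.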

9. Regulatory Mapping

Regulation | Provision | Relationship Type
OWASP LLM Top 10 | LLM04: Data and Model Poisoning | Pending v2.1 editorial review
MITRE ATLAS | AML.T0020: Poison Training Data | Pending v2.1 editorial review
EU AI Act | Article 10: Data and Data Governance | Pending v2.1 editorial review
NIST AI RMF | MAP 2.1 (Data Quality), GOVERN 1.5 (Supply Chain Risk) | Pending v2.1 editorial review
ISO/IEC 42001 | Clause 6.1.2 (AI Risk Assessment), Annex A.7.4 (Data Quality for ML) | Pending v2.1 editorial review
FCA | SYSC 15A: Operational Resilience (third-party dependency) | Pending v2.1 editorial review
PRA SS1/23 | Principle 5: Third-party risk management | Pending v2.1 editorial review
DORA | Article 28: ICT third-party risk management | Pending v2.1 editorial review
Meta CyberSecEval | Data poisoning detection tests | Pending v2.1 editorial review
NIST SP 800-218A | Tasks 3.1, 3.2: Secure AI Development Lifecycle (training data) | Pending v2.1 editorial review
Cite this protocol
AgentGoverning. (2026). AG-743: Training Data Integrity Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-743