AG-057

Dataset Suitability and Bias Control Governance

Provider Assurance, Rights & Documentation · ~20 min read · AGS v2.1 · April 2026
EU AI Act · GDPR · FCA · NIST · ISO 42001

2. Summary

Dataset Suitability and Bias Control Governance requires that every dataset used to train, fine-tune, evaluate, or calibrate an AI agent is assessed for fitness-for-purpose and bias risk before use — and that this assessment follows a documented methodology, produces auditable findings, and is repeated when datasets change. The dimension recognises that AI agent behaviour is fundamentally shaped by its training data: a model trained on biased data will produce biased outputs regardless of how sophisticated the governance controls around it are. AG-057 operates upstream of deployment — it is a preventive control that addresses risk at its source rather than attempting to detect or mitigate biased outputs after they have been generated. Without AG-057, organisations may deploy agents that systematically discriminate, underperform for underrepresented populations, or produce unreliable outputs because the data foundation was never assessed for suitability.

3. Example

Scenario A — Training Data That Encodes Historical Discrimination: A public sector agency deploys an AI agent to assist with benefits eligibility assessment. The agent is trained on 10 years of historical decision data from human caseworkers. The training data accurately reflects human decisions during that period — but those decisions included systemic patterns: approval rates for applicants from certain postcodes were 34% lower than the national average, even after controlling for eligibility criteria, because caseworkers in certain offices applied informal, undocumented criteria. The agent learns these patterns and reproduces them. The agency does not assess the training data for bias before deployment. When a judicial review is brought by an affected applicant, the court finds that the agency has automated historical discrimination without conducting the equality impact assessment required by the Public Sector Equality Duty.

What went wrong: The training data was historically accurate but not suitable for training a fair system. No suitability assessment was conducted that would have identified the systematic disparities in the historical data. The agency treated data accuracy as a proxy for data suitability — a fundamental error. Consequence: Judicial review finding unlawful discrimination, mandatory suspension of the agent, remediation programme costing £3.8 million, and requirement to reassess all decisions made by the agent affecting approximately 27,000 applicants.

Scenario B — Evaluation Data That Masks Performance Gaps: A financial services AI agent for credit scoring is evaluated against a benchmark dataset before deployment. The evaluation shows strong performance: 94% accuracy, low false-positive and false-negative rates. However, the benchmark dataset is not representative of the deployment population. The dataset underrepresents self-employed applicants (3% of benchmark vs. 18% of deployment population), applicants with non-standard income sources (2% vs. 12%), and applicants with thin credit files (5% vs. 22%). In deployment, the agent performs well for traditional employed applicants but performs significantly worse for these underrepresented groups — approval rates are 41% lower for self-employed applicants with equivalent creditworthiness. The performance gap is not detected for 8 months because the overall accuracy metric remains high.

What went wrong: The evaluation dataset was not assessed for representativeness against the deployment population. Overall accuracy metrics masked significant subgroup performance disparities. A dataset suitability assessment would have identified the representativeness gap and required either dataset augmentation or stratified evaluation. Consequence: FCA Consumer Duty breach for unfair outcomes, £2.1 million customer remediation, mandatory re-evaluation against representative benchmarks, and 6-month independent monitoring.
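The failure mode in Scenario B can be made concrete with a small sketch: the aggregate accuracy metric stays high while a stratified evaluation exposes the subgroup gap. The segment names and case counts below are illustrative, not drawn from any real benchmark.

```python
# Illustrative only: synthetic evaluation results showing how an aggregate
# metric can mask a subgroup performance gap (Scenario B). Figures are invented.
records = [
    # (segment, n_cases, n_correct)
    ("employed",       900, 860),   # dominant group, strong performance
    ("self_employed",   30,  18),   # underrepresented, weak performance
    ("thin_file",       50,  31),
    ("non_std_income",  20,  12),
]

total = sum(n for _, n, _ in records)
correct = sum(c for _, _, c in records)
print(f"overall accuracy: {correct / total:.1%}")  # looks healthy in aggregate

# Stratified view (see requirement 4.4): evaluate each segment separately.
for segment, n, c in records:
    print(f"{segment:>15}: {c / n:.1%} accuracy on {n} cases ({n / total:.0%} of benchmark)")
```

Overall accuracy exceeds 90% while three of the four segments sit near 60%, which is exactly the pattern an unstratified evaluation conceals.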

Scenario C — Fine-Tuning Data That Introduces New Bias: An enterprise AI agent is initially deployed with a general-purpose foundation model and performs acceptably across demographics. The organisation fine-tunes the model on 6 months of customer interaction data to improve domain-specific performance. The fine-tuning data is drawn from the organisation's customer base, which skews heavily toward a specific demographic due to the organisation's market position. After fine-tuning, the agent's performance improves for the dominant demographic but degrades for underrepresented groups. The organisation does not conduct a bias assessment of the fine-tuning data because it only assessed the original foundation model's training data at deployment.

What went wrong: The dataset suitability assessment was treated as a one-time deployment activity rather than a governance requirement that applies to every dataset used throughout the agent's lifecycle, including fine-tuning and calibration data. Consequence: Measurable performance degradation for underrepresented groups, potential Equality Act 2010 indirect discrimination claim, and requirement to retrain or rollback the fine-tuning at a cost of £640,000 including lost productivity.

4. Requirement Statement

Scope: This dimension applies to every dataset used in the lifecycle of an AI agent: training datasets, fine-tuning datasets, evaluation datasets, calibration datasets, and retrieval-augmented generation (RAG) knowledge bases. It applies regardless of whether the dataset is created internally, procured from a third party, derived from operational data, or sourced from public repositories. The scope extends to dataset updates — when a dataset is refreshed, augmented, or modified, the suitability and bias assessment must be repeated for the changed portions. For agents that use foundation models trained by a third party, the dimension applies to: (a) any fine-tuning, calibration, or evaluation datasets used by the deploying organisation, and (b) the deploying organisation's obligation to obtain and assess available documentation about the foundation model's training data from the model provider. The scope does not require the deploying organisation to independently audit training data it does not have access to, but it does require the organisation to assess available provenance information and document residual risk where training data transparency is limited.

4.1. A conforming system MUST assess every dataset used to train, fine-tune, evaluate, or calibrate an AI agent for fitness-for-purpose before the dataset is used, using a documented suitability assessment methodology.

4.2. A conforming system MUST assess every such dataset for bias risk across legally protected characteristics and operationally relevant demographic dimensions, using quantitative metrics appropriate to the agent's operational domain.

4.3. A conforming system MUST document dataset provenance for every dataset used: the source, collection methodology, collection period, known limitations, consent basis (where applicable), and any preprocessing or filtering applied.

4.4. A conforming system MUST define and apply representativeness criteria for evaluation datasets, ensuring that the evaluation population reflects the deployment population across dimensions that affect agent performance.

4.5. A conforming system MUST repeat the suitability and bias assessment when a dataset is materially modified — including augmentation, re-sampling, re-labelling, or refresh with new data — before the modified dataset is used.

4.6. A conforming system MUST maintain a dataset register that records: each dataset in use, its suitability assessment date and outcome, its bias assessment date and outcome, its provenance documentation, and the agents that depend on it.

4.7. A conforming system SHOULD define acceptable bias thresholds for each protected characteristic and demographic dimension, with thresholds informed by regulatory requirements, domain norms, and the agent's impact level.

4.8. A conforming system SHOULD conduct intersectional bias analysis — assessing bias not only across individual protected characteristics but across combinations of characteristics (e.g., age and gender, ethnicity and geography).

4.9. A conforming system SHOULD assess datasets for temporal suitability — whether the data's collection period remains relevant to the current operational context, and whether temporal drift has rendered the data unrepresentative.

4.10. A conforming system MAY implement automated dataset monitoring that continuously evaluates incoming data for distributional shift, label quality degradation, and emerging bias patterns.
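Requirements 4.2 and 4.7 can be sketched as a simple pre-use gate that compares positive-outcome rates across groups and fails the dataset when disparity exceeds a threshold. The four-fifths (0.8) threshold, the metric choice, and the group labels below are illustrative assumptions; AG-057 leaves thresholds to regulatory requirements and domain norms.

```python
# Illustrative bias gate (4.2, 4.7): compare positive-outcome rates across
# groups and fail the dataset if disparity exceeds a threshold. The 0.8
# ("four-fifths") threshold is an assumption, not mandated by AG-057.

def selection_rates(outcomes):
    """outcomes: {group_name: list of 0/1 labels}."""
    return {g: sum(ys) / len(ys) for g, ys in outcomes.items()}

def disparate_impact_gate(outcomes, threshold=0.8):
    rates = selection_rates(outcomes)
    reference = max(rates.values())              # highest-rate group as reference
    findings = {g: r / reference for g, r in rates.items()}
    passed = all(ratio >= threshold for ratio in findings.values())
    return passed, findings

outcomes = {  # hypothetical labelled historical decisions
    "group_a": [1] * 70 + [0] * 30,   # 70% positive rate
    "group_b": [1] * 40 + [0] * 60,   # 40% positive rate
}
passed, findings = disparate_impact_gate(outcomes)
print(passed, findings)  # ratio 40/70 is below 0.8, so the gate fails
```

In practice the gate would run per protected characteristic (and, per 4.8, per intersection of characteristics) before the dataset is cleared for use.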

5. Rationale

The behaviour of an AI agent is a function of its data. No amount of post-deployment governance can fully compensate for a fundamentally unsuitable or biased dataset — the bias is encoded in the model's parameters and will manifest in every inference. Dataset Suitability and Bias Control Governance addresses risk at the source, before the model is trained or fine-tuned, when the cost of correction is lowest and the effectiveness of correction is highest.

The distinction between data accuracy and data suitability is critical. A dataset can be perfectly accurate — every record correctly represents a historical event — and yet be entirely unsuitable for training a fair AI system. Historical data encodes historical decisions, which may reflect historical biases. A dataset of past hiring decisions may accurately record that certain demographic groups were hired at lower rates, but training a model on this data teaches the model to reproduce those rates. The suitability question is not "is this data accurate?" but "will a model trained on this data produce outputs that are appropriate for the intended use?"

Bias in AI systems is not solely an ethical concern — it is a legal and regulatory risk. The Equality Act 2010 (UK), the EU AI Act, the US Equal Credit Opportunity Act, and numerous other regulations prohibit discrimination by automated systems. Regulators have demonstrated willingness to enforce: the FCA has taken action against firms whose automated decision-making produced unfair outcomes; the EEOC has issued guidance on AI-driven employment discrimination; and the EU AI Act explicitly requires bias testing for high-risk AI systems. AG-057 provides the preventive control framework that enables organisations to demonstrate that they assessed and addressed bias risk before deployment — the proactive due diligence that regulators expect.

The scope extends to evaluation datasets because biased evaluation creates a secondary risk: an agent may perform poorly for certain populations, but if the evaluation dataset underrepresents those populations, the performance gap is invisible in the metrics. Representativeness of evaluation data is therefore as important as representativeness of training data — potentially more so, because evaluation data shapes the organisation's understanding of whether the agent is safe and fair to deploy.
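One concrete way to operationalise this representativeness check is to compare the composition of the evaluation set against the deployment population and flag any segment whose share diverges beyond a tolerance. The proportions echo Scenario B; the 5-percentage-point tolerance is an assumed policy choice, not a value the protocol prescribes.

```python
# Illustrative representativeness check (4.4): flag evaluation-set segments
# whose share diverges from the deployment population beyond a tolerance.
# Proportions mirror Scenario B; the tolerance is an assumed policy value.

benchmark  = {"self_employed": 0.03, "non_standard_income": 0.02, "thin_credit_file": 0.05}
deployment = {"self_employed": 0.18, "non_standard_income": 0.12, "thin_credit_file": 0.22}

def representativeness_gaps(benchmark, deployment, tolerance=0.05):
    gaps = {}
    for segment, target in deployment.items():
        observed = benchmark.get(segment, 0.0)
        if abs(observed - target) > tolerance:
            gaps[segment] = (observed, target)
    return gaps

gaps = representativeness_gaps(benchmark, deployment)
for segment, (observed, target) in gaps.items():
    print(f"{segment}: {observed:.0%} in benchmark vs {target:.0%} in deployment")
```

A gate built on this check would require either benchmark augmentation or stratified evaluation before the evaluation results are accepted.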

6. Implementation Guidance

Dataset suitability and bias control governance requires a programme that covers the full dataset lifecycle: sourcing, assessment, documentation, monitoring, and retirement. The programme should be integrated with the model development lifecycle so that no dataset enters use without assessment.

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. Dataset bias assessment must address the specific requirements of fair lending and fair treatment regulations. The FCA Consumer Duty requires firms to deliver good outcomes for all groups of customers, including those with protected characteristics. Credit scoring datasets must be assessed against the Equality Act 2010 and sector-specific guidance from the FCA. The OECD Principles on AI include fairness and non-discrimination requirements relevant to financial services. Historical lending data is particularly high-risk for encoded bias, as past lending practices may have been discriminatory.

Healthcare. Clinical datasets frequently underrepresent certain populations — older datasets may lack diversity in race, ethnicity, sex, and age. An AI agent trained on a dataset drawn predominantly from one demographic may provide less accurate recommendations for underrepresented groups, with direct patient safety consequences. The MHRA's guidance on AI as a medical device includes data quality and representativeness requirements. Clinical validation datasets must reflect the intended patient population.

Critical Infrastructure. Datasets for safety-critical AI agents must include fault conditions, stress conditions, and edge cases — not just normal operating data. An agent trained only on normal operating data will not handle abnormal conditions safely. Data suitability assessment must evaluate coverage of the operational envelope, including conditions that are rare but safety-critical.

Public Sector. The Public Sector Equality Duty (Equality Act 2010, section 149) requires public authorities to have due regard to the need to eliminate discrimination, advance equality of opportunity, and foster good relations. For AI agents making decisions affecting individuals, this requires dataset bias assessment as a core component of the equality impact assessment. The ICO's guidance on AI and data protection includes data quality and fairness requirements under UK GDPR.

Maturity Model

Basic Implementation — The organisation documents the provenance of datasets used for training and evaluation. Basic representativeness checks are conducted (e.g., comparing demographic distributions in the dataset to known population distributions). Bias metrics are calculated for individual protected characteristics using standard metrics. Assessment results are documented and retained. A dataset register exists recording datasets in use and their assessment status. This level meets the minimum mandatory requirements but may lack: intersectional bias analysis, quantitative suitability thresholds, assessment of fine-tuning and RAG data, and automated monitoring for distributional shift.

Intermediate Implementation — A formal dataset suitability assessment framework is applied to all datasets, including fine-tuning data and RAG knowledge bases. Bias assessment includes intersectional analysis across combinations of protected characteristics. Quantitative thresholds are defined for representativeness and bias metrics, informed by regulatory requirements and domain norms. The dataset register is integrated with the model registry so that dataset changes trigger model re-evaluation. Change-triggered reassessment is systematically conducted for dataset modifications. Dataset cards with full provenance documentation are maintained for every dataset.

Advanced Implementation — All intermediate capabilities plus: automated monitoring continuously evaluates incoming data for distributional shift and emerging bias patterns. The bias assessment methodology is independently validated (AG-056). Causal analysis techniques supplement statistical bias metrics to distinguish correlation from causation in observed disparities. The organisation can demonstrate to regulators a complete audit trail from dataset assessment through model training to deployment decision, showing that bias risk was identified, quantified, and addressed at every stage. Cross-jurisdictional bias requirements are mapped for agents operating across regulatory boundaries.
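One common statistic for the distributional-shift monitoring described at the advanced level (and in 4.10) is the population stability index (PSI). The bucketing and the alert thresholds below are conventional industry rules of thumb, assumed here rather than required by AG-057.

```python
# Illustrative population stability index (PSI) for distributional-shift
# monitoring (4.10). Thresholds (< 0.1 stable, > 0.2 significant shift)
# are industry rules of thumb, not AG-057 requirements.
import math

def psi(expected, actual, eps=1e-6):
    """expected/actual: bucket proportions, each summing to 1."""
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

baseline = [0.25, 0.25, 0.25, 0.25]   # bucket shares at assessment time
incoming = [0.10, 0.20, 0.30, 0.40]   # current production bucket shares
score = psi(baseline, incoming)
print(f"PSI = {score:.3f}")
if score > 0.2:
    print("significant shift: trigger dataset reassessment (4.5)")
```

Wiring the alert to the dataset register closes the loop: a breach of the shift threshold becomes a material modification that triggers reassessment under 4.5.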

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Testing AG-057 compliance requires verification that dataset suitability and bias control governance is operationally effective — that unsuitable or biased datasets are detected and prevented from entering use, not just assessed after harm has occurred.
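A suitability gate enforcement check in the spirit of Test 8.1 could be automated along the following lines: attempt to authorise an unassessed dataset and verify that the gate blocks it. The gate API here is a hypothetical illustration, not an interface defined by this protocol.

```python
# Hypothetical sketch of a Test 8.1-style check: a dataset without a passing
# suitability assessment must be blocked from use. API names are invented.

class SuitabilityGateError(Exception):
    pass

class DatasetGate:
    def __init__(self):
        self._assessments = {}  # dataset_id -> "pass" / "fail"

    def record_assessment(self, dataset_id, outcome):
        self._assessments[dataset_id] = outcome

    def authorise_use(self, dataset_id):
        if self._assessments.get(dataset_id) != "pass":
            raise SuitabilityGateError(f"{dataset_id} has no passing assessment")
        return True

gate = DatasetGate()
try:
    gate.authorise_use("ds-unassessed")   # expected: blocked
    blocked = False
except SuitabilityGateError:
    blocked = True
print("gate blocked unassessed dataset:", blocked)

gate.record_assessment("ds-001", "pass")
print("assessed dataset authorised:", gate.authorise_use("ds-001"))
```

The operative point is that the test exercises the enforcement path, not merely the existence of assessment records.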

Test 8.1: Suitability Gate Enforcement

Test 8.2: Bias Detection Accuracy

Test 8.3: Representativeness Verification

Test 8.4: Change-Triggered Reassessment

Test 8.5: Intersectional Bias Assessment

Test 8.6: Provenance Completeness

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 10 (Data and Data Governance) | Direct requirement
EU AI Act | Article 9 (Risk Management System) | Supports compliance
Equality Act 2010 | Section 19 (Indirect Discrimination), Section 149 (Public Sector Equality Duty) | Direct requirement
UK GDPR | Article 5(1)(d) (Accuracy), Article 22 (Automated Decision-Making) | Supports compliance
FCA Consumer Duty | PRIN 2A (Cross-cutting rules — fair outcomes) | Direct requirement
NIST AI RMF | MAP 2.3, MEASURE 2.6, MANAGE 1.3 | Supports compliance
ISO 42001 | Clause 6.1.4 (AI Risk Assessment — Data) | Supports compliance
EEOC Guidance | AI and Employment Discrimination (2023) | Supports compliance

EU AI Act — Article 10 (Data and Data Governance)

Article 10 is the most directly relevant provision. It requires that training, validation, and testing datasets for high-risk AI systems be subject to appropriate data governance and management practices. Specifically, Article 10(2) requires examination of data in view of possible biases. Article 10(3) requires that datasets be "relevant, representative, free of errors and complete." Article 10(4) requires that datasets take into account the specific geographical, contextual, behavioural, or functional setting within which the AI system is intended to be used. AG-057 directly implements each element of Article 10: suitability assessment addresses relevance, representativeness, and contextual appropriateness; bias assessment addresses the examination for possible biases; provenance documentation addresses the data governance and management practices. The regulation's requirements are prescriptive and auditable — compliance requires documented evidence, not assertions.

EU AI Act — Article 9 (Risk Management System)

Article 9 requires that the risk management system identify and address risks associated with training data. AG-057 provides the operational framework for identifying and addressing data-related risks within the risk management system.

Equality Act 2010 — Section 19 (Indirect Discrimination), Section 149 (Public Sector Equality Duty)

Section 19 prohibits indirect discrimination — applying a provision, criterion, or practice that puts persons sharing a protected characteristic at a particular disadvantage. An AI agent trained on biased data may apply such provisions indirectly. Section 149 requires public authorities to have due regard to equality impacts. AG-057 provides the proactive assessment framework that enables organisations to identify and address potential indirect discrimination before deployment, and to demonstrate the "due regard" required by section 149.

UK GDPR — Article 5(1)(d) (Accuracy), Article 22 (Automated Decision-Making)

Article 5(1)(d) requires personal data to be accurate. Article 22 provides rights in relation to automated decision-making. AG-057 supports compliance by ensuring that datasets used for automated decision-making are accurate, representative, and assessed for bias — supporting the organisation's ability to demonstrate that automated decisions are based on suitable data.

FCA Consumer Duty — PRIN 2A

The Consumer Duty requires firms to deliver good outcomes for all groups of customers, including those with protected characteristics. An AI agent making customer-facing decisions based on biased data cannot deliver equitable outcomes. AG-057 provides the preventive control that addresses this risk at the data layer, before biased outcomes can occur.

NIST AI RMF — MAP 2.3, MEASURE 2.6, MANAGE 1.3

MAP 2.3 addresses data quality and relevance in AI risk mapping. MEASURE 2.6 addresses evaluation of AI systems for bias. MANAGE 1.3 addresses risk treatment including data-related risks. AG-057 supports compliance across these functions by establishing the data governance framework.

ISO 42001 — Clause 6.1.4 (AI Risk Assessment — Data)

Clause 6.1.4 requires organisations to assess risks related to data used in AI systems, including bias, quality, and relevance risks. AG-057 provides the operational framework for conducting these assessments.

EEOC Guidance — AI and Employment Discrimination (2023)

The EEOC's guidance clarifies that employers can be held liable for discrimination caused by AI tools, including where the underlying data encodes bias. AG-057 provides the due diligence framework that employers need to demonstrate that data-related discrimination risk was assessed and addressed.

10. Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | Organisation-wide — extends to every individual affected by the agent's decisions, potentially spanning thousands or millions of people for agents operating at scale

Consequence chain: Without dataset suitability and bias control governance, an agent may be trained on data that encodes systematic bias, evaluated on data that masks performance gaps, and deployed without any assessment of whether its data foundation is fit for purpose. The immediate failure is a biased or unreliable model deployed into production. The operational impact is discriminatory outcomes at scale — an agent processing thousands of decisions per day can produce thousands of biased decisions per day. The legal consequence is liability under anti-discrimination legislation (Equality Act 2010, EU AI Act Article 10, ECOA in the US), with remediation potentially requiring review and correction of every decision the agent has made. The regulatory consequence includes enforcement action, mandatory independent review, and potential suspension of the agent. The reputational consequence is significant because bias in AI systems attracts public attention and media scrutiny, with damage that can persist long after the technical issue is remediated. The scale of impact distinguishes dataset bias from other governance failures: a biased dataset affects every inference the model makes, and at deployment scale, this can affect millions of individuals. The remediation cost is also disproportionately high — correcting a biased model may require data collection, retraining, revalidation, and redeployment, plus review and potential correction of all decisions made by the biased model.

Cross-references: AG-022 (Behavioural Drift Detection), AG-048 (AI Model Provenance and Integrity), AG-049 (Governance Decision Explainability), AG-055 (Oversight Competence Assurance), AG-056 (Independent Validation Governance), AG-051 through AG-054 (Provider Assurance, Rights & Documentation landscape).

Cite this protocol
AgentGoverning. (2026). AG-057: Dataset Suitability and Bias Control Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-057