Data Quality Threshold Enforcement Governance requires that AI agent systems define, measure, and enforce minimum data quality thresholds for every data source and decision-critical field consumed by agents. When data falls below the defined threshold, the system blocks agent use of that data or escalates to human review — rather than allowing agents to make decisions on data the organisation has determined is insufficiently reliable. This is not monitoring (which observes and reports) but enforcement (which blocks and redirects). The distinction is critical: a monitoring system that flags poor data quality after the agent has already acted on it provides audit evidence but does not prevent harm.
Scenario A — Stale Pricing Data in Portfolio Valuation: An AI portfolio management agent rebalances client portfolios daily using equity pricing data from the organisation's market data feed. The feed has a contractual SLA of 99.95% availability with data freshness within 30 seconds of market close. No data quality threshold is enforced at the agent level. On 14 March, the feed experiences a partial outage: 340 of 2,100 equity prices are stale by 4 hours, reflecting pre-close values from a volatile session. The agent rebalances 1,200 portfolios using the stale prices. The 340 affected equities show an average 2.3% price discrepancy. Total misvaluation across affected portfolios: £3.4 million. The error is detected the following morning during reconciliation, 16 hours after the rebalances executed. Unwinding the incorrect trades costs £127,000 in market impact and settlement fees.
What went wrong: No data quality threshold (specifically, a freshness threshold) was enforced before the agent consumed the data. The feed's SLA was monitored at the infrastructure level but not enforced at the point of agent consumption. The agent had no mechanism to determine that 16% of its pricing data was stale.
Scenario B — Completeness Failure in Loan Decisioning: A lending agent processes mortgage applications using applicant data from the organisation's data warehouse. The warehouse refreshes from the core banking system nightly. A schema migration in the core banking system causes the employment_status field to return NULL for 8% of applicants (previously 0.1% null rate). No completeness threshold is configured. The agent processes 450 applications over 3 days with missing employment data. For 36 of these applications, the agent infers employment status from other fields (income level, account activity) and approves loans that would have been declined or referred under the organisation's credit policy. Total exposure on potentially misclassified loans: £4.2 million. Regulatory review under FCA MCOB (Mortgages and Home Finance: Conduct of Business) finds the firm failed to apply its own lending criteria.
What went wrong: No completeness threshold was defined for the employment_status field. The 8% null rate was eighty times the historical baseline of 0.1%, but without a threshold, the deterioration triggered no enforcement action. The agent applied its own reasoning to compensate for missing data — a behaviour that no governance control intercepted.
Scenario C — Accuracy Degradation in Customer Segmentation: A marketing agent segments customers for targeted offers based on a customer_segment field derived from a machine learning model that runs weekly. The model's accuracy degrades from 94% to 71% over 4 months due to drift in the underlying customer behaviour data. No accuracy threshold is defined at the field level. The agent sends 45,000 misclassified customers offers inappropriate for their actual segment. Premium customers receive budget-tier offers (brand damage); budget-tier customers receive premium offers they cannot afford (complaint volume increases 340%). Customer complaints: 2,100. Regulatory complaints to the Financial Ombudsman: 47. Estimated remediation cost: £165,000 in compensation and £80,000 in investigation and reporting.
What went wrong: The derived field (customer_segment) had no accuracy threshold linked to the model performance that produced it. The 23-percentage-point accuracy degradation was invisible to the agent and to governance controls because no threshold existed to trigger action.
Scope: This dimension applies to all AI agents that consume data to inform decisions, generate outputs, or trigger actions. Every data source and every decision-critical field (as classified under AG-310) is within scope for threshold enforcement. The scope extends to derived data, model outputs, and cached data. Data quality is measured across five dimensions: completeness (absence of nulls or missing values), accuracy (correctness relative to ground truth or reference), freshness (time since last valid update), consistency (agreement across related fields or sources), and validity (conformance to expected format, range, and domain constraints). Organisations may define additional quality dimensions relevant to their domain. Thresholds apply at both the field level and the source level — a source may meet field-level thresholds individually while failing a source-level composite threshold.
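Three of the dimensions named above (completeness, freshness, validity) lend themselves to simple per-batch computation. The following Python sketch is illustrative only; the function names and the glucose range constraint are assumptions, not part of this standard.

```python
from datetime import datetime, timezone

def completeness(values):
    """Fraction of non-null values in a field."""
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values) if values else 0.0

def freshness_seconds(last_update, now=None):
    """Seconds since the field was last validly updated."""
    now = now or datetime.now(timezone.utc)
    return (now - last_update).total_seconds()

def validity(values, predicate):
    """Fraction of values conforming to a domain constraint."""
    valid = sum(1 for v in values if v is not None and predicate(v))
    return valid / len(values) if values else 0.0

# Example: a physically impossible reading fails a range constraint
readings = [95.0, 110.0, -50.0, None]
print(completeness(readings))                      # 0.75
print(validity(readings, lambda v: 0 < v < 600))   # 0.5
```

Consistency and accuracy typically need a reference source or ground truth and are therefore measured by reconciliation jobs rather than inline checks.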
4.1. A conforming system MUST define measurable data quality thresholds for every decision-critical field consumed by agents, covering at minimum: completeness, freshness, and validity.
4.2. A conforming system MUST evaluate data against defined thresholds before the agent consumes it for decision-making, not after the agent has acted on it.
4.3. A conforming system MUST block agent consumption of data that fails decision-critical field thresholds, or escalate to human review before the agent acts, rather than permitting use with a post-hoc warning.
4.4. A conforming system MUST log every threshold evaluation, including the field, the measured quality value, the threshold, the pass/fail result, and the enforcement action taken.
4.5. A conforming system MUST version quality threshold configurations with attribution and change history.
4.6. A conforming system SHOULD define baseline quality profiles from historical data and alert when measured quality deviates significantly from the baseline, even if the defined threshold is still met.
4.7. A conforming system SHOULD implement graduated enforcement: warning at a soft threshold, escalation at a medium threshold, and blocking at a hard threshold, with the specific levels configurable per field criticality.
4.8. A conforming system SHOULD measure and enforce composite quality scores at the source level, aggregating field-level quality metrics into a source health score.
4.9. A conforming system MAY implement predictive quality monitoring that forecasts threshold breaches based on quality trends, enabling pre-emptive intervention.
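The graduated enforcement described in 4.7 can be sketched as a small decision function. This is a minimal illustration under two stated assumptions: higher measured values mean better quality, and the thresholds are ordered hard ≤ medium ≤ soft. Names and values are illustrative, not normative.

```python
from enum import Enum

class Action(Enum):
    PERMIT = "permit"
    WARN = "warn"          # soft threshold breached (4.7)
    ESCALATE = "escalate"  # medium threshold breached
    BLOCK = "block"        # hard threshold breached (4.3)

def graduated_enforcement(measured, soft, medium, hard):
    """Map a measured quality value to an enforcement action.

    Assumes higher is better and hard <= medium <= soft."""
    if measured < hard:
        return Action.BLOCK
    if measured < medium:
        return Action.ESCALATE
    if measured < soft:
        return Action.WARN
    return Action.PERMIT

# e.g. completeness of 0.92 against soft=0.99, medium=0.97, hard=0.95
print(graduated_enforcement(0.92, 0.99, 0.97, 0.95))  # Action.BLOCK
```

Per 4.7, the three levels would be configured per field criticality rather than hard-coded.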
AI agents are not equipped to assess the quality of the data they consume. An agent cannot determine whether a pricing feed is 4 hours stale, whether an employment status field has an abnormal null rate, or whether a customer segment model has degraded by 23 percentage points. The agent processes whatever data it receives as though it is reliable, because it has no independent mechanism to assess reliability. This is a fundamental difference from human decision-makers, who develop intuitions about data quality ("this number looks wrong," "this report seems outdated") that agents lack.
Data quality threshold enforcement closes this gap by interposing a quality gate between the data source and the agent. The gate measures quality dimensions, compares them to defined thresholds, and either permits consumption (quality acceptable), escalates (quality marginal), or blocks (quality unacceptable). This is a structural control — it operates independently of the agent's reasoning and cannot be bypassed by agent instructions.
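One way to make the gate structural is to wrap the data-access function itself, so the check runs on every read regardless of what the agent is instructed to do. The sketch below is a minimal illustration of that placement; all names (`quality_gate`, `review_queue`, `fetch_prices`) are hypothetical.

```python
review_queue = []  # stands in for a human-review escalation channel

class QualityGateError(Exception):
    """Raised when data fails a hard threshold; the agent never sees it."""

def quality_gate(check):
    """Wrap a data-access function so every read is evaluated before the
    agent consumes it. Because the gate lives in the access layer, agent
    instructions cannot bypass it."""
    def decorator(fetch):
        def gated(*args, **kwargs):
            data = fetch(*args, **kwargs)
            verdict = check(data)  # "permit", "escalate", or "block"
            if verdict == "block":
                raise QualityGateError("blocked: quality below hard threshold")
            if verdict == "escalate":
                review_queue.append(data)  # held for human review
            return data
        return gated
    return decorator

@quality_gate(lambda prices: "block" if any(p is None for p in prices) else "permit")
def fetch_prices():
    return [101.2, 99.8, None]  # an incomplete batch

try:
    fetch_prices()
except QualityGateError as exc:
    print(exc)  # the agent never receives the incomplete batch
```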
The enforcement distinction matters. Many organisations implement data quality monitoring: they measure quality metrics, display dashboards, and generate alerts. This is necessary but not sufficient for AI agent governance. A monitoring dashboard viewed by humans inserts a human decision-maker into the loop — the human sees the alert, assesses the risk, and decides whether to proceed. An AI agent operating autonomously does not consult dashboards. Without enforcement at the data access layer, the quality alert and the agent's consumption of poor data proceed independently, and the alert typically arrives only after the agent has already acted.
Threshold enforcement also creates accountability infrastructure. When quality thresholds are defined, measured, logged, and enforced, the organisation can demonstrate to regulators and auditors that data quality was assessed and found acceptable before each decision. Without thresholds, "data quality" is a subjective assertion. With thresholds, it is a measurable, auditable fact.
Data quality threshold enforcement requires three components: threshold definitions (what quality levels are acceptable), measurement infrastructure (how quality is assessed in real time), and enforcement gates (how unacceptable quality is blocked or escalated).
Threshold definitions should specify, for each decision-critical field: the quality dimension (completeness, freshness, accuracy, consistency, validity), the measurement method (how the dimension is computed), the hard threshold (below which consumption is blocked), the soft threshold (below which a warning is generated), and the evaluation frequency (per-request, per-batch, or periodic).
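A threshold definition along these lines might be represented as a small record type. The field names below mirror the elements just listed; the example values are illustrative assumptions, not normative figures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ThresholdDefinition:
    field: str             # decision-critical field (per AG-310 classification)
    dimension: str         # completeness | freshness | accuracy | consistency | validity
    measurement: str       # how the dimension is computed
    hard_threshold: float  # below this, consumption is blocked
    soft_threshold: float  # below this, a warning is generated
    evaluation: str        # per-request | per-batch | periodic

# Illustrative definition for the field from Scenario B
t = ThresholdDefinition(
    field="employment_status",
    dimension="completeness",
    measurement="1 - null_rate over the evaluation window",
    hard_threshold=0.98,
    soft_threshold=0.995,
    evaluation="per-batch",
)
```

Freezing the record reflects requirement 4.5: threshold configurations change through versioned, attributed updates, not in-place mutation.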
Example thresholds for a financial pricing feed:
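As one illustrative (non-normative) shape for such a configuration, the mapping below uses values that are assumptions, chosen only to be consistent with Scenario A, where a 30-second freshness SLA existed and 4-hour-stale prices should have been blocked.

```python
# Illustrative thresholds for an equity pricing feed. All values are
# assumptions for the sketch, not figures from this standard.
pricing_feed_thresholds = {
    "price": {
        "freshness_seconds": {"soft": 30, "hard": 300},    # SLA: 30 s after close
        "completeness":      {"soft": 0.999, "hard": 0.99},
        "validity":          {"soft": 1.0, "hard": 1.0},   # e.g. price > 0
    },
    # Source-level composite (4.8): fraction of fields passing hard thresholds
    "source_composite": {"hard": 0.99},
}
```

Under these values, Scenario A's outage (16% of prices stale by 4 hours) would breach both the hard freshness threshold and the source-level composite, blocking the rebalance run.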
Recommended patterns:
- Enforce thresholds at the data access layer, before agent consumption, so that enforcement cannot be bypassed by agent instructions.
- Implement graduated responses (warning, escalation, blocking) calibrated to field criticality.
- Derive thresholds from historical baselines and alert on significant deviation even when the configured threshold is still met.
- Version-control threshold configurations with attribution, change history, and approval workflows.
- Aggregate field-level metrics into source-level composite scores, since a source can pass every field-level threshold while failing as a whole.
Anti-patterns to avoid:
- Monitoring without enforcement: dashboards and alerts that report poor quality only after the agent has acted.
- Post-hoc warnings that permit consumption of below-threshold data.
- Enforcing SLAs at the infrastructure level with no gate at the point of agent consumption (Scenario A).
- Allowing the agent to infer or impute values for missing decision-critical fields (Scenario B).
- Defining thresholds for source data but not for derived fields or model outputs (Scenario C).
Financial Services. Pricing data quality directly affects valuations, risk calculations, and regulatory capital. MiFID II best execution requires that order execution quality be measured against reference data — stale or inaccurate reference data undermines best execution obligations. BCBS 239 Principle 3 (Accuracy and Integrity) requires that risk data be accurate and that accuracy be verified through reconciliation with authoritative sources.
Healthcare. Clinical decision support data quality has direct patient safety implications. Laboratory result values with validity errors (e.g., glucose reading of -50 mg/dL) must be blocked before reaching a clinical AI agent. Medication dosage data freshness is critical — stale formulary data may reference discontinued medications or outdated dosing guidelines.
Critical Infrastructure. Sensor data quality in industrial control contexts has physical safety implications. A temperature sensor reading that is stale by 30 seconds in a chemical process may represent a dangerous condition. Freshness thresholds for safety-critical sensor data should be calibrated to the physical time constants of the processes being monitored.
Basic Implementation — The organisation has defined quality thresholds for decision-critical fields in its primary data sources. Thresholds cover completeness and freshness at minimum. Enforcement is implemented as validation checks in the data pipeline that block consumption when hard thresholds are breached. Quality evaluations are logged. Threshold configurations are documented and reviewed annually.
Intermediate Implementation — Quality thresholds are enforced at the data access layer with graduated responses (warning, escalation, blocking). Thresholds are calibrated from historical baselines with statistical deviation detection. Composite source-level quality scores aggregate field-level metrics. Quality evaluation is continuous for real-time data sources and batch-certified for batch data. Threshold configurations are version-controlled with approval workflows.
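The statistical deviation detection mentioned here (and required by 4.6) can be as simple as a z-score test against the historical baseline. A minimal sketch, with the three-sigma limit as an assumed default:

```python
import statistics

def deviates_from_baseline(measured, history, z_limit=3.0):
    """Flag a measurement that deviates significantly from the historical
    baseline, even if the configured threshold is still met (4.6)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return measured != mean
    return abs(measured - mean) / stdev > z_limit

# Scenario B: null rate jumps from a ~0.1% baseline to 8%
baseline_null_rates = [0.001, 0.0012, 0.0009, 0.0011, 0.001]
print(deviates_from_baseline(0.08, baseline_null_rates))  # True
```

A production implementation would also handle seasonality and small-sample baselines, which this sketch ignores.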
Advanced Implementation — All intermediate capabilities plus: predictive quality monitoring forecasts threshold breaches before they occur, enabling pre-emptive source switching or escalation. Derived data quality (model outputs, computed fields) is measured and enforced with the same rigour as source data. Quality enforcement has been verified through adversarial testing including data poisoning, threshold manipulation, and quality-gate bypass attempts. The organisation can demonstrate per-decision data quality certification for any historical agent action.
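The predictive quality monitoring described above can start from a simple linear trend fitted to recent measurements. The sketch below assumes a least-squares slope and a fixed forecast horizon; both the method and the example figures are illustrative.

```python
def forecast_breach(history, threshold, horizon):
    """Fit a linear trend to recent quality measurements and report whether
    the metric is projected to fall below `threshold` within `horizon`
    future periods (a minimal sketch of predictive quality monitoring)."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history)) / \
            sum((x - x_mean) ** 2 for x in xs)
    projected = history[-1] + slope * horizon
    return projected < threshold

# Scenario C-style drift: segmentation accuracy trending down from 0.94
accuracy_trend = [0.94, 0.92, 0.89, 0.87, 0.84]
print(forecast_breach(accuracy_trend, threshold=0.80, horizon=3))  # True
```

A positive forecast enables the pre-emptive source switching or escalation described in this tier before any hard threshold is actually breached.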
Required artefacts:
- Threshold definitions for every decision-critical field, covering at minimum completeness, freshness, and validity (4.1).
- Version-controlled threshold configurations with attribution and change history (4.5).
- Quality evaluation logs recording the field, measured value, threshold, pass/fail result, and enforcement action taken (4.4).
- Baseline quality profiles derived from historical data (4.6).
- Records of escalation and blocking actions, including human-review outcomes.
Retention requirements:
Access requirements:
Test 8.1: Hard Threshold Enforcement
Test 8.2: Graduated Enforcement Response
Test 8.3: Pre-Consumption Enforcement Timing
Test 8.4: Quality Evaluation Logging
Test 8.5: Threshold Configuration Tampering Resistance
Test 8.6: Derived Data Quality Enforcement
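As an illustration of how these tests might be automated, the following sketch covers Test 8.1 (hard threshold enforcement). `evaluate_and_enforce` is a hypothetical conformance hook that a system under test would expose; the names and values are illustrative, not an interface defined by this standard.

```python
def evaluate_and_enforce(measured, hard_threshold):
    """Hypothetical hook: return the enforcement action for a measured value."""
    return "block" if measured < hard_threshold else "permit"

def test_hard_threshold_blocks_consumption():
    # Inject data just below the hard threshold; the system must block,
    # not permit with a post-hoc warning (requirement 4.3).
    assert evaluate_and_enforce(0.949, hard_threshold=0.95) == "block"
    # Data at or above the threshold is permitted.
    assert evaluate_and_enforce(0.95, hard_threshold=0.95) == "permit"

test_hard_threshold_blocks_consumption()
print("Test 8.1 sketch passed")
```

Tests 8.2 through 8.6 would follow the same pattern against the graduated-response, timing, logging, tampering, and derived-data behaviours respectively.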
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 10 (Data and Data Governance) | Direct requirement |
| BCBS 239 | Principle 3 (Accuracy and Integrity) | Direct requirement |
| BCBS 239 | Principle 4 (Completeness) | Direct requirement |
| FCA SYSC | 3.2.6R (Management Information) | Supports compliance |
| MiFID II | Article 27 (Best Execution) | Supports compliance |
| NIST AI RMF | MEASURE 2.5, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Operation) | Supports compliance |
Article 10(3) requires that training, validation, and testing data sets be "relevant, sufficiently representative, and to the extent possible, free of errors and complete." For AI agents consuming operational data, this extends to runtime data quality. AG-311 provides the enforcement mechanism: defining "sufficiently representative" and "free of errors" as measurable thresholds and blocking agent consumption when thresholds are not met.
Principle 3 requires that risk data be accurate and subject to a sufficient level of quality assurance. Principle 4 requires that risk data capture and aggregate all material risks across the banking group. For AI agents used in risk functions, AG-311 directly implements these principles by defining accuracy and completeness thresholds and enforcing them before the agent consumes data for risk calculations.
Best execution requires firms to take sufficient steps to obtain the best possible result for clients. Execution quality depends on reference data quality — stale pricing data, incomplete venue data, or inaccurate order book data undermines best execution obligations. AG-311 ensures that the data informing execution decisions meets defined quality standards.
Firms must ensure management information is accurate and timely. AI agents generating management information from data sources must consume data of defined quality. AG-311 provides the mechanism to enforce this at the data consumption layer.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Decision-wide — affects all agent decisions consuming below-threshold data, potentially across multiple agents and business processes |
Consequence chain: Without data quality threshold enforcement, agents make decisions on data the organisation would consider unreliable if it were assessed. The failure is insidious: agents process below-threshold data confidently, generating outputs that appear normal. The financial impact scales with decision volume and data degradation severity. In Scenario A, 4-hour-stale pricing data caused £3.4 million in misvaluation and £127,000 in unwinding costs across 1,200 portfolios. In Scenario B, an 8% completeness failure caused £4.2 million in potentially misclassified loan exposure across 36 applications. In Scenario C, a 23-percentage-point accuracy degradation caused 45,000 customer misclassifications and £245,000 in remediation costs. The regulatory impact includes findings for inadequate data governance (BCBS 239), inadequate lending controls (FCA MCOB), and inadequate execution quality (MiFID II best execution). The cumulative effect across all data sources consumed by all agents can represent existential risk to the organisation's data integrity posture.
Cross-references: AG-310 (Field-Level Criticality Governance) classifies fields by criticality, which determines the quality thresholds applied by AG-311. AG-309 (Authoritative Source Register Governance) designates authoritative sources; AG-311 verifies those sources meet quality standards. AG-312 (Missing Data Escalation Governance) handles the downstream consequence when AG-311 blocks data — the agent must escalate or fall back safely. AG-315 (Schema Drift Governance) detects structural changes that may cause quality degradation. AG-057 (Dataset Suitability and Bias Control) assesses whether data quality degradation disproportionately affects protected groups.