Output Integrity Verification requires that every AI agent output consumed by external systems, communicated to humans, or used as input to further automated processes is validated against known reference data before propagation. The protocol addresses one of the most consequential risks in AI agent deployment: the propagation of fabricated, hallucinated, or incorrect outputs into enterprise systems where they are treated as authoritative facts and acted upon without further verification. Validation must occur at the output boundary — the point where the agent's output becomes an input to something else — covering numerical values, named entities, regulatory citations, and structured data. Outputs that fail validation or fall below confidence thresholds are blocked or routed to human review before they can enter the enterprise data ecosystem.
Scenario A — Hallucinated Regulatory Citation Triggers Non-Compliance: A compliance AI agent at a European investment firm produces a quarterly regulatory report for submission to the national competent authority. The report includes 340 compliance assessments across 12 regulatory requirements. The agent cites specific regulation articles to support each assessment — for example, "Compliant with MiFID II Article 27(3)(b) regarding best execution reporting." The report is submitted to the regulator. During a routine supervisory review, the regulator's team notes that "MiFID II Article 27(3)(b)" does not exist. Article 27(3) has only sub-paragraph (a). The agent fabricated the sub-paragraph reference. The regulator broadens the review and discovers that 14 of the 340 citations contain errors: some reference non-existent sub-paragraphs, others cite articles from superseded versions of the regulation, and two cite entirely fabricated regulation numbers.
What went wrong: No validation against the actual regulation text occurred before the report was submitted. The agent hallucinated specific regulatory sub-articles that were structurally similar to real citations, making them plausible to human readers. The compliance team treated the agent's output as authoritative and submitted it without verifying the underlying citations. Consequence: A regulatory investigation, not into the underlying compliance failures but into the submission of a report containing fabricated regulatory references. The firm must retrospectively verify every regulatory report produced by the agent since deployment, a process that takes three months and 800 person-hours.
Scenario B — Fabricated Counterparty Name Creates Reconciliation Failure: A transaction processing agent generates a batch of 2,300 reconciliation records matching incoming payments to expected counterparties. One record matches a payment of £87,000 to a counterparty named "Meridian Capital Partners LLP." The reconciliation is marked as complete and the payment is posted. Three days later, the finance team discovers that there is no counterparty called "Meridian Capital Partners LLP" in the organisation's counterparty database. The agent fabricated the counterparty name based on partial pattern matching with a real counterparty named "Meridian Capital Management Ltd." The £87,000 payment is actually from an unknown source and should have been flagged for AML investigation, not reconciled.
What went wrong: The agent generated a counterparty name that was plausible but not validated against the authoritative counterparty registry. The fabricated name was close enough to a real counterparty that it appeared legitimate in the reconciliation report. No named entity validation occurred before the reconciliation was finalised. Consequence: An £87,000 payment from an unverified source was accepted and reconciled without AML investigation. Depending on the source, this may constitute a money laundering facilitation offence. Potential FCA enforcement action for failure to apply adequate AML controls.
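The check that would have caught this fabrication is an exact match against the authoritative registry, with near-miss detection to surface the "close but not identical" names that are a classic fabrication signature. A minimal sketch, assuming an in-memory registry (all names illustrative):

```python
from difflib import get_close_matches

# Illustrative stand-in for the authoritative counterparty registry.
COUNTERPARTY_REGISTRY = {
    "Meridian Capital Management Ltd",
    "Halcyon Trading Partners LLP",
}

def validate_counterparty(name: str) -> dict:
    """Exact-match a counterparty name against the registry; flag near-misses."""
    if name in COUNTERPARTY_REGISTRY:
        return {"name": name, "status": "valid"}
    # A close-but-not-exact match suggests the agent pattern-matched a real entity.
    near = get_close_matches(name, COUNTERPARTY_REGISTRY, n=1, cutoff=0.6)
    return {
        "name": name,
        "status": "rejected",
        "suspected_confusion_with": near[0] if near else None,
    }
```

Reporting the suspected real entity alongside the rejection gives the human reviewer the context needed to decide whether this is a typo, a fabrication, or a genuinely unknown payer requiring AML escalation.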
Scenario C — Out-of-Range Risk Score Fails to Trigger Escalation: A risk assessment agent produces daily risk scores for the organisation's portfolio positions. The valid range for risk scores is 0–100. Due to a data input anomaly, the agent produces a risk score of -47 for a critical portfolio. The negative value is stored in the risk system without validation. Because the risk escalation threshold is set at scores above 80, the negative score does not trigger an escalation — it is treated as a very low risk. In reality, the data anomaly that caused the negative score also masks a genuine risk that should have been escalated. The risk materialises three days later with a £2.1 million loss.
What went wrong: The risk score output was not validated against the known valid range. A negative risk score — which is outside any valid range — was accepted and stored without challenge. The downstream escalation system operated on the unvalidated score. Consequence: A £2.1 million loss that would have been prevented by a simple range check. Regulatory investigation reveals that the risk system accepted negative risk scores, calling into question the integrity of all risk assessments produced by the system.
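The range check that would have prevented this loss is a few lines of code at the output boundary. A sketch (function name and bounds illustrative, matching the 0–100 range from the scenario):

```python
def validate_risk_score(score: float, low: float = 0.0, high: float = 100.0) -> float:
    """Reject any score outside the documented valid range instead of storing it."""
    if not (low <= score <= high):
        raise ValueError(f"risk score {score} outside valid range [{low}, {high}]")
    return score
```

The point is not sophistication but placement: the check must sit between the agent and the risk system, so an out-of-range value is blocked rather than silently treated as "very low risk".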
Scope: This dimension applies to all agent outputs that are consumed by external systems, communicated to humans, or used as inputs to further automated processes. This includes: numerical outputs (financial figures, risk scores, performance metrics), named entities (counterparty names, account numbers, regulatory references, product identifiers), textual outputs (compliance reports, risk assessments, recommendations), and structured data outputs (API responses, database records, file exports). The scope extends to intermediate outputs that feed into downstream agent processes. If Agent A produces an output that is consumed by Agent B as input, Agent A's output is within scope because it influences Agent B's behaviour. This is particularly important in multi-agent architectures where errors in early-stage outputs can compound through subsequent processing stages, creating cascading failures that are difficult to trace back to the original fabrication. The scope does not extend to the agent's internal reasoning artefacts — working memory, intermediate calculations, and draft outputs that are not propagated externally. These are governed by AG-036. AG-018 applies at the output boundary — the point where the agent's output becomes an input to something else.
4.1. A conforming system MUST validate numerical outputs against known reference ranges before propagation — every numerical value in a governance-relevant output must be checked against defined acceptable ranges, and out-of-range values must be flagged for human review before being released to downstream systems.
4.2. A conforming system MUST validate named entities (counterparties, account numbers, regulatory references) against authoritative registries — every named entity must be verified against a maintained registry of known valid entities, and unrecognised entities must be flagged before propagation.
4.3. A conforming system MUST flag low-confidence outputs for human review before propagation — outputs where the agent's own confidence assessment falls below a defined threshold must not be released to downstream systems without human review and approval.
4.4. A conforming system MUST implement output validation at the governance layer, independent of the agent that produced the output — the agent must not be the sole validator of its own outputs.
4.5. A conforming system MUST log every validation decision — pass, fail, and the specific reference data used — creating an auditable record of output verification.
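Requirements 4.1 through 4.5 compose naturally into a single governance-layer validator that sits outside the producing agent. The sketch below is illustrative only: the field names, registry contents, and confidence threshold are assumptions, not values mandated by the protocol.

```python
import json
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("output-validation")        # 4.5: auditable record

RANGES = {"risk_score": (0, 100)}                   # 4.1: reference ranges
ENTITY_REGISTRY = {"Meridian Capital Management Ltd"}  # 4.2: entity registry
CONFIDENCE_THRESHOLD = 0.85                         # 4.3: illustrative threshold

@dataclass
class ValidationResult:
    passed: bool
    reasons: list = field(default_factory=list)

def validate_output(output: dict) -> ValidationResult:
    """Governance-layer check, independent of the producing agent (4.4)."""
    reasons = []
    for name, (lo, hi) in RANGES.items():           # 4.1: numerical ranges
        if name in output and not lo <= output[name] <= hi:
            reasons.append(f"{name}={output[name]} outside [{lo}, {hi}]")
    for entity in output.get("entities", []):       # 4.2: named entities
        if entity not in ENTITY_REGISTRY:
            reasons.append(f"unrecognised entity: {entity!r}")
    if output.get("confidence", 1.0) < CONFIDENCE_THRESHOLD:  # 4.3
        reasons.append("confidence below threshold; route to human review")
    result = ValidationResult(passed=not reasons, reasons=reasons)
    log.info(json.dumps({"passed": result.passed, "reasons": reasons}))  # 4.5
    return result
```

In production the registries and ranges would be served from the reference data layer rather than module constants, and the log line would feed the tamper-evident audit trail governed by AG-006.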
4.6. A conforming system SHOULD detect fabricated regulatory citations through cross-reference with known rulebooks — the platform should maintain a library of valid regulatory citations and verify every citation in agent outputs against this library.
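Citation cross-referencing from 4.6 can be sketched as a parse-then-lookup step. The regex and library fragment below are illustrative assumptions covering only the MiFID II citation form used in Scenario A; a real library would be generated from the regulation text and cover every citation format the agent emits.

```python
import re

# Illustrative fragment of a valid-citation library. Per Scenario A,
# MiFID II Article 27(3) has only sub-paragraph (a).
CITATION_LIBRARY = {
    ("MiFID II", "27", "3", "a"),
}

CITATION_RE = re.compile(
    r"(?P<reg>MiFID II) Article (?P<art>\d+)"
    r"(?:\((?P<para>\d+)\))?(?:\((?P<sub>[a-z])\))?"
)

def find_invalid_citations(text: str) -> list:
    """Return every citation in `text` that is absent from the library."""
    invalid = []
    for m in CITATION_RE.finditer(text):
        key = (m.group("reg"), m.group("art"), m.group("para"), m.group("sub"))
        if key not in CITATION_LIBRARY:
            invalid.append(m.group(0))
    return invalid
```

Because hallucinated citations are structurally well-formed, parsing alone cannot detect them; only the lookup against an authoritative library catches a fabricated sub-paragraph like 27(3)(b).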
4.7. A conforming system SHOULD enforce structured output schemas rather than free text wherever possible — structured outputs are easier to validate field-by-field than free-text outputs where validation requires natural language understanding.
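One way to realise 4.7 is to define each output type as a schema whose constructor rejects invalid field values, so free text never reaches the boundary. A minimal sketch using a dataclass (type and field names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReconciliationRecord:
    """A structured output: each field is individually checkable."""
    payment_ref: str
    amount_gbp: float
    counterparty: str

    def __post_init__(self):
        # Field-level checks run at construction; an invalid record
        # cannot exist, let alone propagate downstream.
        if self.amount_gbp <= 0:
            raise ValueError("amount must be positive")
        if not self.payment_ref:
            raise ValueError("payment reference required")
```

The same idea scales to JSON Schema or protobuf definitions shared with downstream consumers; the essential property is that validation is per-field and mechanical rather than requiring natural language understanding.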
4.8. A conforming system SHOULD implement output anomaly detection that compares against prior similar outputs — the platform should maintain a baseline of typical outputs for each output type and flag outputs that deviate significantly from the baseline.
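The baseline comparison in 4.8 can start as simply as a z-score against recent outputs of the same type. A sketch, assuming a history list maintained per output type (threshold value illustrative):

```python
from statistics import mean, stdev

def is_anomalous(value: float, history: list, z_threshold: float = 3.0) -> bool:
    """Flag values more than z_threshold standard deviations from the baseline."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold
```

Note that anomaly detection complements, rather than replaces, the hard range checks of 4.1: Scenario C's score of -47 fails both, but a fabricated value of 12 in a portfolio that normally scores near 50 would pass the range check and be caught only here.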
4.9. A conforming system MAY implement secondary validation by an independent system (a separate language model, a rule-based expert system, or a human review panel) for high-stakes outputs.
Output Integrity Verification addresses one of the most consequential risks in AI agent deployment: the propagation of fabricated, hallucinated, or incorrect outputs into enterprise systems where they are treated as authoritative facts. The critical distinction is between reasoning integrity and output integrity. AG-036 (Reasoning Integrity Monitoring) governs the process by which an agent produces its outputs — ensuring that the reasoning chain is sound, consistent, and aligned with the agent's objectives. AG-018 governs the outputs themselves, regardless of the process that produced them. An agent may follow a perfectly sound reasoning chain and still produce an output that contains a fabricated regulatory citation, an out-of-range financial figure, or a nonexistent counterparty name. AG-036 would not detect this error because the reasoning process was sound — the error is in the data the agent generated, not in how it generated it.
This protocol is essential because AI agents, particularly those built on large language models, have a well-documented tendency to generate plausible but false information — commonly known as hallucination. In conversational contexts, hallucination is an inconvenience. In enterprise governance contexts, hallucination is a liability. A hallucinated regulatory citation in a compliance report can lead to non-compliance with the actual regulation. A fabricated counterparty name in a transaction record can create confusion in reconciliation. An out-of-range financial figure in a risk report can trigger — or fail to trigger — critical risk management actions.
The cascading nature of the failure is particularly severe. A fabricated output that enters an enterprise data pipeline may be consumed by multiple downstream systems, each of which treats it as authoritative. A fabricated counterparty name may appear in reconciliation records, risk reports, regulatory filings, and client statements. By the time the fabrication is detected, it may have propagated through dozens of systems and documents, requiring extensive remediation to correct. The failure also has a trust dimension. Once an organisation discovers that agent outputs contain fabricated data, the credibility of all agent outputs is called into question. Regulators, clients, and internal stakeholders may require retrospective verification of all historical outputs — a process that can be enormously expensive and may not be fully achievable if the reference data from the relevant period is no longer available.
AG-018 establishes the principle that no agent output should be treated as authoritative until it has been validated against known reference data. This validation must occur before the output leaves the governance system — not after it has propagated to downstream consumers.
Build a reference data layer containing: valid numerical ranges for all governed output types, a registry of known counterparties and account identifiers, and a library of valid regulatory citations. Validate every structured output field before releasing it to downstream systems. Tag outputs with confidence scores and route low-confidence outputs to a human review queue.
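The routing step above can be sketched as a gate that only releases outputs that are both validated and above the confidence threshold; everything else is held. The queue here is a stand-in for a real review workflow, and the threshold is an illustrative assumption:

```python
from queue import Queue

REVIEW_QUEUE: "Queue[dict]" = Queue()   # stand-in for the human review workflow
CONFIDENCE_THRESHOLD = 0.85             # illustrative

def route(output: dict) -> str:
    """Release validated, high-confidence outputs; hold everything else."""
    if output.get("validated") and output.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD:
        return "released"
    REVIEW_QUEUE.put(output)            # held until a human approves (4.3)
    return "held_for_review"
```

The design choice worth noting is fail-closed behaviour: an output missing its validation flag or confidence score is held, never released by default.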
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Financial services firms face specific risks from fabricated outputs in: transaction processing (incorrect amounts, fabricated counterparty details), risk management (invalid risk scores, fabricated position data), regulatory reporting (hallucinated regulatory citations, incorrect compliance assessments), and client communications (misleading recommendations, fabricated performance figures). The validation reference data layer should integrate with: the firm's counterparty master data, the firm's product reference data, market data providers for price and rate validation, and regulatory reference databases (such as the FCA handbook and EU Official Journal). The FCA's expectations for data accuracy in automated systems are high and increasing.
Healthcare. Healthcare AI agents producing clinical outputs face unique validation requirements. Drug names must be validated against pharmacopoeias (e.g., BNF, FDA Orange Book). Dosage recommendations must be validated against approved dosage ranges for the specific drug, indication, and patient population. Clinical coding outputs (ICD-10, SNOMED CT) must be validated against the current code sets. Lab value interpretations must be validated against reference ranges appropriate to the patient's age, sex, and clinical context. The consequences of fabricated clinical outputs are direct patient safety risks, making AG-018 compliance a patient safety requirement, not just a governance requirement.
Critical Infrastructure. AI agents in critical infrastructure producing operational outputs — set points, control parameters, alarm thresholds — must have those outputs validated against physical safety limits before they are applied to control systems. A fabricated set point for a chemical process temperature, a hallucinated pressure threshold, or an out-of-range flow rate could have immediate physical safety consequences. Validation reference ranges must be derived from engineering safety analyses and must include margins of safety. The validation must be implemented at the control system layer, not solely in the AI agent's software environment, providing defence in depth against propagation of invalid outputs to physical actuators.
Basic Implementation — The organisation has defined reference ranges for numerical outputs and maintains a registry of known named entities. Every agent output is validated against these references before propagation. Out-of-range values and unrecognised entities are flagged and routed to a human review queue. Low-confidence outputs (based on the agent's own confidence scoring) are held for review. The validation is implemented as a post-processing step in the agent's output pipeline. This level meets the minimum mandatory requirements but has limitations: the reference ranges are static and may become stale, the named entity registry may not cover all entities the agent references, and the confidence threshold may not correlate well with actual output accuracy.
Intermediate Implementation — All basic capabilities plus: reference ranges are dynamically updated from authoritative sources on a defined schedule. The named entity registry is integrated with external authoritative registries (company registries, regulatory entity databases, account verification services). Regulatory citation validation cross-references against a maintained library of current and historical regulations. Output anomaly detection compares each output against a baseline of similar outputs and flags statistical outliers. Structured output schemas are enforced wherever possible, enabling field-level validation. The validation pipeline logs every check result — pass, fail, and the specific reference data used — creating an auditable record of output verification.
Advanced Implementation — All intermediate capabilities plus: secondary validation by an independent system (which may be a separate language model, a rule-based expert system, or a human review panel) is implemented for high-stakes outputs. Output validation has been independently tested with deliberately fabricated outputs — hallucinated citations, out-of-range values, fictional counterparties — and the detection rate is measured and reported. The reference data layer is version-controlled and its update history is auditable. False positive and false negative rates are tracked and the validation thresholds are tuned based on observed performance. The organisation can demonstrate to regulators that its agent outputs are verified against authoritative reference data before propagation, and can report the detection rate for known fabrication types.
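The detection-rate measurement described for the advanced level reduces to injecting known fabrications and counting how many the validator rejects. A sketch of the metric (the validator interface is an assumption: a callable returning True when an output passes validation):

```python
def detection_rate(validator, fabricated_samples: list) -> float:
    """Fraction of known-fabricated outputs that the validator rejects."""
    detected = sum(1 for sample in fabricated_samples if not validator(sample))
    return detected / len(fabricated_samples)
```

Running this over curated corpora of hallucinated citations, out-of-range values, and fictional counterparties yields the per-fabrication-type detection rates that the advanced implementation reports to regulators, and tracking the complementary false positive rate on known-good outputs supports threshold tuning.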
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-018 compliance requires a systematic approach that covers each validation type with both positive and negative test cases, including adversarial fabrication scenarios.
Test 8.1: Numerical Range Validation
Test 8.2: Named Entity Validation
Test 8.3: Regulatory Citation Validation
Test 8.4: Confidence Threshold Enforcement
Test 8.5: Anomaly Detection
Test 8.6: Adversarial Fabrication Resistance
Test 8.7: Validation Independence From Agent
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 13 (Transparency) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| FCA Conduct Risk | Misleading Outputs / TCF Outcomes | Direct requirement |
| SOX | Section 302/404 (Accuracy of Financial Reporting) | Direct requirement |
| NIST AI RMF | MEASURE 2.3, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 8.2 (AI Risk Assessment) | Supports compliance |
Article 13 requires that high-risk AI systems be designed and developed in such a way that their operation is sufficiently transparent to enable users to interpret the system's output and use it appropriately. For agent outputs, this means that users must be able to assess the reliability of the output. AG-018 supports Article 13 compliance by ensuring that outputs have been validated against reference data before reaching the user, and by flagging low-confidence or unverified outputs. The validation metadata — which checks passed, which failed, what the confidence score was — constitutes transparency information that enables users to calibrate their reliance on the output. Without AG-018, users receive agent outputs with no indication of whether the content has been verified, undermining their ability to use the output appropriately.
Article 9 requires a risk management system that identifies and analyses foreseeable risks. Hallucinated outputs represent a foreseeable risk for any AI agent producing governance-relevant outputs. AG-018 implements the risk mitigation measure for this specific risk category, ensuring that fabricated content is detected and blocked before it causes harm.
The FCA's conduct risk framework requires firms to ensure that communications with clients and counterparties are clear, fair, and not misleading. When AI agents produce outputs that are communicated to clients — whether directly (as in customer-facing chatbots) or indirectly (as inputs to client reports or recommendations) — fabricated or inaccurate outputs constitute misleading communications. AG-018's validation requirements ensure that agent outputs are verified against authoritative reference data before they reach any client-facing channel. The Treating Customers Fairly (TCF) outcomes are directly relevant. TCF Outcome 3 (consumers provided with clear information) and TCF Outcome 5 (consumers provided with products that perform as expected) are both undermined by unverified agent outputs that contain fabricated data.
SOX Section 302 requires officers to certify the accuracy of financial reports, and Section 404 requires effective internal controls over financial reporting. When AI agents contribute to the financial reporting process — producing figures, calculations, or data that feeds into financial statements — AG-018 ensures that the agent's outputs are validated before they enter the reporting pipeline. A SOX auditor will assess whether the controls over AI agent outputs are sufficient to prevent material misstatement. Unvalidated agent outputs that feed into financial reports represent a control deficiency. If the validation gap could result in a material misstatement, it is a material weakness that must be disclosed. The validation must be documented, tested on a defined schedule, and the test results retained. The reference data used for validation must itself be authoritative and current. The validation process must be independent of the agent that produced the output.
MEASURE 2.3 addresses the assessment of AI system outputs for accuracy and reliability. MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-018 supports compliance by implementing structural validation of outputs against authoritative reference data, providing both measurement and management of output integrity risk.
Clause 8.2 requires AI risk assessment including identification of risks arising from AI system outputs. Fabricated or hallucinated outputs represent a primary risk category. AG-018 provides the risk treatment control for output integrity, directly satisfying the requirement for controls addressing output-related risks within the AI management system.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — potentially cross-organisation where agent outputs feed into external regulatory filings, client communications, or shared data pipelines |
Consequence chain: Without output integrity verification, fabricated financial figures, incorrect counterparty details, or hallucinated regulatory references propagate through enterprise systems, creating liability for decisions made on false data. The failure mode is insidious because fabricated outputs are often plausible: they look like real data, they are structurally consistent with real data, and they are presented with the same formatting and confidence as real data. Without validation, recipients of agent outputs have no way to distinguish genuine from fabricated content. The cascade is particularly severe: a fabricated output entering an enterprise data pipeline may be consumed by multiple downstream systems, each treating it as authoritative, and by the time the fabrication is detected it may have spread through dozens of systems, documents, and filings, requiring extensive remediation. The immediate technical failure is an unverified output entering production systems. The operational impact includes regulatory enforcement action for reports containing fabricated data, material financial loss from decisions based on incorrect figures, AML compliance failures from fabricated counterparty reconciliation, and reputational damage when clients or regulators discover that agent outputs were not verified. The trust dimension compounds the impact: once fabrication is discovered, all historical agent outputs must be retrospectively verified, a process that can require months and thousands of person-hours.
Cross-references: AG-036 (Reasoning Integrity Monitoring) governs the reasoning process that produces outputs, complementing AG-018's validation of the outputs themselves. AG-049 (Explainability) ensures that the reasoning behind validated outputs can be explained. AG-006 (Audit Trail Integrity) ensures that validation decisions and outputs are recorded in a tamper-evident audit trail. AG-019 (Mandatory Human Oversight Enforcement) defines the human oversight requirements for outputs flagged by AG-018 validation. AG-013 (Data Sensitivity Classification and Handling) governs the sensitivity classification of output content, ensuring validated outputs are handled appropriately based on their data classification.