AG-745

Factual Grounding and Hallucination Governance

Output Integrity and Transparency Governance · AGS v2.1 · 2026-04-25
EU AI Act NIST AI RMF ISO 42001

1. Definition

Factual grounding and hallucination governance establishes the universal requirement that all agentic systems producing factual claims must implement systematic verification mechanisms to ensure that outputs correspond to verifiable reality rather than statistically plausible confabulations. This dimension is classified as Universal/Core because hallucination risk is inherent to every generative AI deployment regardless of domain, deployment pattern, or risk tier — a general-purpose internal copilot that fabricates a meeting date causes operational disruption just as a financial-value agent that fabricates a regulatory threshold causes compliance violations. The universality of this control reflects the architectural reality that all transformer-based language models are optimised for token-level plausibility rather than factual correspondence, making confabulation a structural property rather than an occasional malfunction.

This dimension differs from AG-742 (Hallucination Detection and Output Grounding Governance) in scope and tier. AG-742, positioned in the Rogue LLM & Operational Completeness Addendum block, addresses hallucination detection as a high-risk/critical control with emphasis on detection infrastructure and audit mechanisms. AG-745, positioned in the Framework Alignment Extension block, establishes hallucination governance as a universal baseline control that applies to all deployments including those that do not meet the risk thresholds triggering AG-742's advanced requirements. AG-745 focuses on the minimum viable grounding practices that every agentic deployment must implement, while AG-742 specifies the intensive detection and remediation infrastructure required for high-risk contexts. Organisations subject to both dimensions must comply with the more stringent requirements of AG-742 where applicable, while AG-745 ensures that no deployment, however low-risk, operates without basic grounding controls.

Failure at the universal tier manifests across the full spectrum of agent deployments. An internal copilot that hallucinates project deadlines causes teams to misallocate resources. A customer-facing chatbot that fabricates product specifications generates warranty claims and regulatory complaints. A research discovery agent that invents citations wastes researcher time and, if the fabricated citations enter published literature, propagates misinformation through the academic record. A cross-border agent that confabulates customs duty rates causes shipments to be under-declared, triggering penalties and potential criminal liability. The universal nature of this risk means that grounding controls cannot be treated as an advanced-tier luxury; they are a baseline operational requirement for any system that presents generated text as factual information.

Governance at the universal tier requires every deploying organisation to implement, at minimum: a classification of output types into those requiring mandatory grounding and those treated as advisory or creative; a basic verification mechanism appropriate to the deployment context (ranging from simple retrieval cross-referencing for low-risk copilots to multi-layered entailment checking for high-risk agents); explicit uncertainty signalling when the agent cannot ground a claim; and monitoring of grounding failure rates as a core operational metric. The emphasis is on ensuring no deployment operates in a grounding-blind mode where hallucinated content is indistinguishable from verified content in the output delivered to consumers.
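The four baseline elements above (output classification, a verification mechanism, uncertainty signalling, and a failure-rate metric) can be illustrated with a minimal Python sketch. Every name here (`OutputClass`, `Claim`, `ground`, `UNVERIFIED_MARKER`) is hypothetical and not defined by AGS. The check shown is the simplest form of retrieval cross-referencing: it only confirms that each cited evidence ID resolves to a known source. It does not test whether the source actually entails the claim, which a higher-risk deployment would need.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OutputClass(Enum):
    MANDATORY = "mandatory_grounding"  # factual claims a consumer may act on
    ADVISORY = "advisory"              # suggestions and opinions
    CREATIVE = "creative"              # explicitly generative content


@dataclass
class Claim:
    text: str
    output_class: OutputClass
    evidence_ids: tuple = ()  # IDs of retrieved records said to support the claim


@dataclass
class GroundedClaim:
    claim: Claim
    verified: Optional[bool]  # None means grounding was not required
    rendered: str             # text as delivered to the consumer


UNVERIFIED_MARKER = "[UNVERIFIED]"  # hypothetical marker string


def ground(claim: Claim, known_sources: set) -> GroundedClaim:
    """Cross-reference one claim's cited evidence against known sources."""
    if claim.output_class is not OutputClass.MANDATORY:
        # Advisory and creative output passes through without a grounding check.
        return GroundedClaim(claim, verified=None, rendered=claim.text)
    # Verified only if at least one source is cited and every cited ID resolves.
    supported = bool(claim.evidence_ids) and all(
        e in known_sources for e in claim.evidence_ids
    )
    rendered = claim.text if supported else f"{UNVERIFIED_MARKER} {claim.text}"
    return GroundedClaim(claim, verified=supported, rendered=rendered)


def grounding_failure_rate(results: list) -> float:
    """Share of mandatory-grounding claims that could not be verified."""
    mandatory = [r for r in results if r.claim.output_class is OutputClass.MANDATORY]
    if not mandatory:
        return 0.0
    return sum(r.verified is False for r in mandatory) / len(mandatory)
```

The design point is that an unverifiable mandatory claim is never silently dropped or silently passed: it reaches the consumer with a visible marker, and it counts towards the monitored failure rate.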

The universal applicability of this dimension is reinforced by the EU AI Act's Article 15 requirements for accuracy, which apply to all high-risk AI systems without sector-specific qualification. The Stanford HELM benchmark framework identifies accuracy and calibration as foundational evaluation dimensions for all language model deployments. FCA Consumer Duty obligations under PRIN 2A.5 require that consumer-facing AI systems support consumer understanding — a requirement that is violated when an agent presents hallucinated content as factual without any mechanism for the consumer to distinguish between verified and unverified claims. The aggregate regulatory signal is unambiguous: factual grounding is not an optional enhancement for advanced deployments but a baseline expectation for any system that presents generated content as truthful to humans who may act on it.

2. Scope

This dimension applies to all agentic system deployments where the agent produces output that a reasonable consumer would interpret as factual assertion rather than creative content, opinion, or explicitly speculative material. This includes numerical claims, date and deadline assertions, regulatory and legal references, product specifications, classification codes, named entity claims, and any other output category where the agent's statement can be evaluated against an external ground truth. It applies at all tiers and to all primary profiles without exception.

3. Why This Matters

Factual Grounding and Hallucination Governance addresses a governance gap that, if left unmanaged, creates systemic risk across the agent ecosystem. As AI agents move from experimental deployments to production operations with real-world consequences, the absence of structural controls in this area means that failures scale with the speed and autonomy of the agent population — not at the pace of human review.

Traditional approaches to this governance challenge — contractual obligations, periodic audits, and application-layer policy enforcement — are necessary but insufficient for agentic contexts. Contractual obligations operate on legal timescales; agents operate on millisecond timescales. Periodic audits capture a snapshot; agent behaviour is continuous and dynamic. Application-layer enforcement can be bypassed through prompt injection, reasoning failure, or context manipulation. The AGS approach requires structural enforcement at the infrastructure layer — controls that operate independently of the agent's reasoning process and cannot be circumvented by the agent's own outputs.

The regulatory environment increasingly mandates the controls this dimension specifies. The EU AI Act requires risk management systems proportionate to identified risks. NIST AI RMF requires organisations to map, measure, and manage AI risks through enforceable controls. ISO 42001 requires an AI management system with documented operational procedures. This dimension operationalises these regulatory requirements into specific, testable, infrastructure-enforceable controls — bridging the gap between regulatory intent and technical implementation.

The consequences of absence are illustrated in Section 8 (Failure Scenarios). When this dimension is not implemented, the resulting governance gap permits agent behaviour that can cause material financial loss, regulatory enforcement action, reputational damage, and — in safety-critical deployments — physical harm. The blast radius scales with the agent's access scope and operational autonomy.

4. Requirements

4.1 Output Classification for Grounding Requirements

4.2 Minimum Viable Grounding Mechanism

4.3 Uncertainty Signalling

4.4 Grounding Failure Rate Monitoring

4.5 Multi-Step Workflow Grounding Requirements

4.6 Consumer Recourse and Correction

5. Maturity Model

Basic Implementation — The organisation has documented policies addressing factual grounding and hallucination and has implemented initial controls. Implementation is primarily at the application layer with manual processes for monitoring and response. Logging covers key events but may lack full metadata. Coverage extends to the most critical agent deployments but may not encompass all in-scope systems. Staff are aware of requirements but formal training may be incomplete.

Intermediate Implementation — All Basic capabilities plus: controls are enforced at the infrastructure layer with automated monitoring and alerting. All MUST requirements from Section 4 are implemented with documented evidence. Coverage extends to all in-scope agent deployments. Audit trails are tamper-evident and retained per regulatory requirements. Formal change control governs all configuration changes. Regular review cycles are established and documented. Staff receive formal training and competency is assessed.

Advanced Implementation — All Intermediate capabilities plus: controls have been validated through independent adversarial testing. Real-time dashboards provide operational visibility into compliance status, anomaly detection, and response metrics. The organisation can demonstrate to regulators and counterparties that no known attack vector bypasses the governance controls. Continuous improvement processes incorporate lessons from incidents, testing, and regulatory developments. Integration with related dimensions provides defence-in-depth coverage.

Implementation Patterns

Tamper-evident audit trail. Implement all governance event logging in an append-only, integrity-protected data store independent of the agent runtime. Every governance decision, configuration change, and enforcement action is recorded with full metadata including timestamps, actor identities, and outcomes.
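One common way to make a log tamper-evident is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below is illustrative only: the `AuditLog` class and its field names are assumptions, and it keeps entries in memory, whereas the pattern above calls for a store independent of the agent runtime.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained log (hypothetical sketch, in-memory only)."""

    def __init__(self) -> None:
        self._entries: list = []

    def append(self, actor: str, event: str, outcome: str,
               timestamp: Optional[str] = None) -> dict:
        payload = {
            "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "event": event,
            "outcome": outcome,
        }
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {
            "payload": payload,
            "prev_hash": prev_hash,
            "hash": _digest(prev_hash, payload),
        }
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True
```

A chain like this detects modification but does not prevent it. In practice the latest hash would also need to be anchored somewhere the agent runtime cannot write.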

Real-time monitoring with graduated alerting. Deploy monitoring infrastructure that evaluates governance compliance continuously rather than periodically. Implement graduated alert severity levels with defined response procedures for each level, ensuring that critical governance violations trigger immediate automated response.
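Graduated alerting can be sketched as a mapping from an observed grounding failure rate to a severity level and a predefined response. The thresholds and response names below are placeholder assumptions chosen for illustration. AGS does not prescribe these values, and each deployment would set its own.

```python
from enum import IntEnum


class Severity(IntEnum):
    OK = 0
    WARNING = 1
    HIGH = 2
    CRITICAL = 3


# Hypothetical thresholds on the grounding failure rate, highest first.
THRESHOLDS = (
    (0.20, Severity.CRITICAL),
    (0.10, Severity.HIGH),
    (0.05, Severity.WARNING),
)

# Hypothetical response procedure for each level.
RESPONSES = {
    Severity.OK: "none",
    Severity.WARNING: "notify_owner",
    Severity.HIGH: "page_on_call",
    Severity.CRITICAL: "block_mandatory_outputs",
}


def severity_for(failures: int, total: int) -> Severity:
    """Map failure counts in a monitoring window to a severity level."""
    if total <= 0:
        return Severity.OK  # no mandatory-grounding traffic in this window
    rate = failures / total
    for floor, severity in THRESHOLDS:
        if rate >= floor:
            return severity
    return Severity.OK


def respond(failures: int, total: int) -> tuple:
    """Return the severity and the response procedure it triggers."""
    severity = severity_for(failures, total)
    return severity, RESPONSES[severity]
```

For example, `respond(25, 100)` returns the critical level together with an automated blocking action, rather than only raising an alert.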

Scheduled governance review cycle. Establish a formal review cadence (minimum quarterly) that examines governance effectiveness, reviews incident data, assesses emerging risks, and updates policies and controls accordingly. Review outcomes are documented and tracked.

Separation of governance and agent runtime domains. Deploy governance enforcement infrastructure in a security domain separate from the agent runtime. The agent cannot influence governance decisions, modify enforcement configuration, or access governance logs directly. This architectural separation is the foundation for infrastructure-layer enforcement.

Defined escalation paths with human oversight integration. Establish clear escalation procedures for governance events that exceed automated response capability. Human oversight touchpoints are defined, documented, and tested. Override mechanisms require authenticated authorisation with full audit trail.

Anti-Patterns

Governance by instruction rather than infrastructure. Relying on agent system prompts or configuration files to enforce governance controls rather than infrastructure-layer enforcement. Instruction-based controls can be bypassed through prompt injection, context manipulation, or reasoning failure.

Monitoring without enforcement. Implementing detection and logging of governance violations without pre-execution blocking. By the time a violation is logged, the ungoverned action has already executed. Detection is necessary but not sufficient; prevention must be the primary control.

Manual processes for machine-speed operations. Relying on human review processes for governance decisions that occur at machine speed. Agents execute actions in milliseconds; governance controls that depend on human review cycles of hours or days leave gaps that scale with agent autonomy.

6. Test Criteria

Test Case 6.1: Output Classification Compliance

Test Case 6.2: Grounding Mechanism Presence and Function

Test Case 6.3: Uncertainty Signal Preservation

Test Case 6.4: Grounding Failure Rate Metric Availability

Test Case 6.5: Consumer Correction Mechanism

Test Case 6.6: Multi-Step Workflow Grounding Propagation

Test Case 6.7: Knowledge Boundary Signalling Under Data Gaps

Evidence Artefacts

7.1 Output classification document defining mandatory grounding, advisory grounding, and creative/generative categories. Version-controlled with review date within 12 months. Retention: 5 years.

7.2 Grounding mechanism documentation including mechanism type, known limitations, and applicable content categories. Retention: 5 years.

7.3 Grounding verification logs for all mandatory-grounding outputs, including verification outcome and any unverified-content markers applied. Retention: 3 years for standard deployments; 7 years for Financial-Value, Safety-Critical, and Public Sector deployments.

7.4 Grounding failure rate metric history with daily granularity. Retention: 3 years.

7.5 Grounding incident register recording all confirmed hallucination events with content, detection method, consumer impact assessment, and remediation action. Retention: 7 years.

7.6 Consumer feedback and flag records for hallucination reports. Retention: 3 years.

7.7 Grounding policy review records demonstrating compliance with review cadence requirements. Retention: 5 years.

7.8 Multi-step workflow grounding propagation records demonstrating that unverified-content markers are preserved across pipeline stages. Retention: 3 years.

7.9 Knowledge boundary signal logs recording instances where the agent declined to answer due to insufficient grounding evidence, including the query, the boundary signal type, and the user notification delivered. Retention: 3 years.

7.10 Grounding mechanism effectiveness reports from periodic evaluations, including false-negative rates, domain-specific performance, and comparison against baseline metrics. Retention: 5 years.
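Artefact 7.8 presumes that unverified-content markers survive each pipeline stage. A minimal sketch of one way to achieve this follows, assuming a hypothetical `Segment` type that carries the marker as structured metadata rather than as inline text. In this sketch a stage can change a segment's text but cannot upgrade its status, and merged output is verified only if every input was. A stricter design would re-verify any segment whose text a stage has altered.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable


@dataclass(frozen=True)
class Segment:
    text: str
    verified: bool
    sources: tuple = ()


def run_stage(segments: Iterable[Segment],
              transform: Callable[[str], str]) -> list:
    """Apply a text transform; the verified flag and sources carry over unchanged."""
    return [replace(s, text=transform(s.text)) for s in segments]


def merge(segments: Iterable[Segment], text: str) -> Segment:
    """Combine segments into one; verified only if every input was verified."""
    segs = list(segments)
    return Segment(
        text=text,
        verified=bool(segs) and all(s.verified for s in segs),
        sources=tuple(src for s in segs for src in s.sources),
    )
```

Logging each stage's input and output `verified` flags would give the kind of propagation record the artefact describes.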

7. Scoring

| Score | Level | Description |
|---|---|---|
| 0 | No implementation | No factual grounding and hallucination governance exists. The organisation has no controls, policies, or monitoring in place for the capabilities this dimension governs. Agent behaviour in this area is ungoverned. |
| 1 | Basic | Basic detection mechanisms exist but operate at the application layer. Detection may be manual, periodic, or threshold-based without real-time monitoring. Alerts are generated but may lack automated response. Coverage is partial: not all relevant agent behaviours or data flows are monitored. |
| 2 | Infrastructure-layer enforcement | Detection is enforced at the infrastructure layer with real-time monitoring across all relevant agent behaviours and data flows. Automated alerting with structured response procedures. Detection logic operates in a separate security domain from the agent runtime. Full audit trail with tamper-evident logging. |
| 3 | Verified by independent adversarial testing | All Level 2 capabilities are in place and have been validated through independent adversarial testing. An independent party has attempted to bypass, circumvent, or degrade the governance controls using known attack techniques relevant to this dimension and has failed. Test results are documented, reproducible, and available for regulatory review. |

8. Failure Scenarios

Example 3.1 — Internal Copilot, Hallucinated Project Status and Deadlines

A technology company deploys a general-purpose internal copilot integrated with project management, email, and calendar systems for use by 2,800 employees. A programme manager asks the copilot to summarise the current status and next milestones for a cross-functional product launch involving 6 workstreams. The copilot retrieves partial information from the project management system — 4 of 6 workstreams have current status updates — but rather than flagging the missing data, it generates plausible status summaries for the 2 workstreams with no current data, extrapolating from 3-month-old records and presenting the extrapolated content with the same formatting and confidence as the retrieved current data. The fabricated status for one workstream indicates "hardware certification on track for May 15 delivery" when, in reality, the certification process had been paused 6 weeks earlier pending a redesign. The programme manager presents the copilot's summary at a senior leadership review, decisions are made to proceed with marketing commitments and channel partner agreements based on the May 15 date, and USD 340,000 in non-refundable commitments are made before the certification delay surfaces through direct human communication 11 days later. The company must renegotiate partner agreements, delay the launch by 8 weeks, and absorb the sunk costs. The root cause is a copilot that was deployed without any mechanism to distinguish between retrieved current data and generated gap-filling content, and without any uncertainty signal when operating beyond its verified knowledge.
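The root cause in this scenario, gap-filling content presented identically to current data, can be addressed with a provenance and freshness check before any summary is rendered. The sketch below is illustrative only. `StatusRecord`, `summarise`, and the 14-day freshness window are assumptions and do not describe the deployment in the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class StatusRecord:
    workstream: str
    status: str
    updated: date


def summarise(workstreams: list, records: list, today: date,
              max_age: timedelta = timedelta(days=14)) -> list:
    """Report only retrieved, current data; flag missing or stale records."""
    by_name = {r.workstream: r for r in records}
    lines = []
    for name in workstreams:
        rec = by_name.get(name)
        if rec is None:
            # Nothing retrieved: say so rather than generate a plausible status.
            lines.append(f"{name}: NO DATA - status unknown")
        elif today - rec.updated > max_age:
            # Old record: surface its age instead of extrapolating from it.
            lines.append(
                f"{name}: STALE (last updated {rec.updated.isoformat()})"
                " - confirm with owner"
            )
        else:
            lines.append(f"{name}: {rec.status} (updated {rec.updated.isoformat()})")
    return lines
```

Because the summary is assembled from records rather than generated free-form, a workstream with no current data cannot acquire a fabricated status.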

Example 3.2 — Cross-Border Agent, Fabricated Customs Classification Codes

A logistics company deploys a cross-border trade agent to assist customs brokers with harmonised system (HS) classification for shipments. A broker queries the agent for the correct HS code for a specialised industrial adhesive containing both epoxy resin and nano-silica fillers. The agent, encountering a product composition at the boundary of multiple classification categories, generates an HS code (3506.91.00) with a detailed justification referencing specific General Interpretive Rules and Explanatory Notes. The code and justification are plausible and well-formatted but incorrect — the product's nano-silica content triggers a different classification (3824.99.92) with a materially different duty rate (6.5% versus 3.2%). The broker, relying on the agent's confident and well-reasoned output, uses the fabricated classification for 47 shipments over a 4-month period. A customs audit identifies the systematic misclassification. The company faces back-duty assessments of EUR 218,000, administrative penalties of EUR 65,000, and a 12-month enhanced scrutiny designation that slows all future shipments through extended examination procedures, costing approximately EUR 15,000 per month in delays. The agent had no verification mechanism to cross-check generated HS codes against the authoritative tariff schedule database, and no uncertainty signal despite operating in a boundary case where classification ambiguity was high.

Example 3.3 — Safety-Critical Agent, Hallucinated Medication Interaction Warning

A hospital network deploys a safety-critical agent as a clinical decision support tool to assist pharmacists with medication interaction checking for complex multi-drug regimens. A pharmacist queries the agent about potential interactions between five medications prescribed for an ICU patient with multiple comorbidities. The agent correctly identifies three genuine interactions documented in the DrugBank database and the hospital's formulary. However, it also generates a fourth interaction warning — between two of the medications — that does not exist in any clinical database, published literature, or FDA adverse event report. The fabricated interaction is presented in the same structured format as the genuine interactions, complete with a plausible-sounding mechanism of action ("competitive inhibition at CYP3A4 substrate binding site"), a severity rating ("moderate — monitor closely"), and a citation to a clinical pharmacology textbook chapter that exists but does not contain the claimed interaction. The pharmacist, managing a 12-patient workload, accepts the agent's output as a complete and accurate interaction report. Based on the fabricated interaction warning, the pharmacist recommends to the attending physician that one of the five medications be substituted. The substitute medication is less effective for the patient's condition, resulting in a 4-day extension of the ICU stay and a treatment complication that requires additional intervention. Total incremental treatment cost: USD 47,000. The incident is identified during a routine pharmacy quality review 3 weeks later. The hospital discovers that the agent has generated 14 other fabricated interaction warnings over the preceding 2 months, 6 of which resulted in unnecessary medication substitutions. No grounding verification mechanism was applied to the agent's interaction outputs, and no uncertainty signal distinguished the fabricated interaction from the genuine ones.

9. Regulatory Mapping

| Regulation | Provision | Relationship Type |
|---|---|---|
| OWASP LLM Top 10 | LLM09: Misinformation | _Pending v2.1 editorial review_ |
| MITRE ATLAS | AML.T0048: Evade ML Model (output integrity) | _Pending v2.1 editorial review_ |
| EU AI Act | Article 15: Accuracy, Robustness and Cybersecurity | _Pending v2.1 editorial review_ |
| NIST AI RMF | MEASURE 2.5 (Accuracy), MANAGE 2.2 (Risk Controls) | _Pending v2.1 editorial review_ |
| ISO/IEC 42001 | Clause 8.4 (AI System Operation) | _Pending v2.1 editorial review_ |
| UK AISI Inspect | Factual accuracy evaluations | _Pending v2.1 editorial review_ |
| FCA | PRIN 2A.5: Consumer understanding outcome | _Pending v2.1 editorial review_ |
| Stanford HELM | Accuracy, Calibration, Robustness dimensions | _Pending v2.1 editorial review_ |

Interaction with AG-742

AG-745 and AG-742 govern related but distinct aspects of hallucination and grounding governance. AG-742, positioned in the Rogue LLM & Operational Completeness Addendum block at High-Risk/Critical tier, provides intensive detection infrastructure including claim decomposition, citation resolution, layered verification architecture, and structured human review interfaces. AG-745, positioned in the Framework Alignment Extension block at Universal/Core tier, establishes the baseline grounding requirements that apply to all deployments regardless of risk tier. Organisations should assess compliance with both dimensions where applicable: AG-745 for the universal baseline, and AG-742 for the additional advanced controls required by high-risk deployments. Where both apply, the more stringent requirement in each area governs.

Cite this protocol
AgentGoverning. (2026). AG-745: Factual Grounding and Hallucination Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-745