This dimension governs the requirement that all agent-generated content used as the basis for decisions, downstream actions, communications, or records must be grounded in verifiable, traceable source material, and that systematic mechanisms are in place to detect, flag, and remediate confabulated facts, fabricated citations, and ungrounded assertions before those outputs propagate into consequential systems. It matters because large language models used in agentic pipelines are structurally capable of producing fluent, confident, and internally consistent output that is factually false. This property is not incidental to their design but intrinsic to the probabilistic token-prediction process, and the failure mode is most dangerous precisely in high-stakes domains, where the appearance of authority is most likely to suppress human scrutiny. Failure manifests as cited legal cases that do not exist, financial figures attributed to fabricated regulatory filings, drug interaction warnings drawn from hallucinated clinical literature, or engineering specifications attributed to standards documents that contain no such specification, all delivered with the same surface confidence as accurate output and with no internal signal to distinguish them.
Example 3.1 — Legal Research Agent, Fabricated Case Citations
A law firm deploys an enterprise workflow agent to accelerate litigation research. A junior associate asks the agent to identify precedent supporting a motion to dismiss on procedural grounds in a federal civil matter. The agent returns seven case citations, each formatted with correct jurisdiction prefix, volume number, reporter abbreviation, page number, and a two-sentence summary of the relevant holding. Four of the seven citations are fabricated: the case names exist in no federal court record, the volume and page numbers lead to unrelated or non-existent opinions, and the holdings described bear no relation to any real judicial decision. The associate, relying on the agent's professional-grade formatting, includes three of the fabricated citations in the filed motion without independent verification. Opposing counsel identifies the non-existent cases during review. The court issues a show-cause order, sanctions are sought under Rule 11, and the firm faces disciplinary referral to the bar. Reputational and financial exposure exceeds USD 400,000 in remediation costs, sanctions, and client relationship damage. No grounding check, citation resolution validator, or confidence floor was present in the deployment.
Example 3.2 — Financial-Value Agent, Confabulated Regulatory Figures
A financial services firm deploys a customer-facing agent to assist retail investors with regulatory disclosure queries. A customer asks the agent to confirm the current annual management charge cap applicable to a specific product category under applicable conduct regulation. The agent, whose retrieval context contains an outdated regulatory document from 18 months prior, returns a figure that no longer reflects the operative cap following a regulatory amendment. It presents the figure as current and cites the regulation by name and section number, but the section it references was renumbered and substantively revised in the intervening period. The customer makes an investment decision based on the stated cost figure. The discrepancy is discovered 11 months later during a routine compliance review. The firm is required to conduct a customer remediation exercise across 3,200 accounts that interacted with the agent during the affected period, at a projected remediation cost of GBP 1.2 million, and receives a supervisory notice requiring attestation of corrective controls. No temporal validity check on retrieved source documents, no citation resolution step, and no confidence threshold gate were in place.
Example 3.3 — Safety-Critical / CPS Agent, Hallucinated Engineering Specification
An industrial automation operator deploys an embodied edge agent to assist maintenance technicians with equipment servicing procedures on high-voltage switchgear installations. A technician asks the agent for the correct torque specification for a specific terminal lug fastener on a designated equipment model. The agent, lacking the precise specification in its retrieval corpus, generates a figure by interpolation from similar equipment classes described in its parametric training distribution. It returns a torque value 40% below the correct specification, attributing the figure to the equipment manufacturer's installation and maintenance manual with the correct document title and revision number. The technician applies the lower torque value during a scheduled maintenance window. At the next load cycling event, the under-torqued connection develops resistive heating, leading to an arc flash incident that injures one technician and destroys a switchgear panel. The total incident cost including injury compensation, equipment replacement, unplanned downtime, and regulatory investigation exceeds USD 2.8 million. The manufacturer's actual manual, which was not included in the retrieval corpus, contained no such figure. The agent fabricated both the specification and its attribution.
This dimension applies to all agent deployments in which the agent produces factual assertions, numerical figures, citations to external documents or records, regulatory or legal references, technical specifications, medical or clinical claims, or any other content category where accuracy is verifiable against an external ground truth and where that content is used—directly or indirectly—as the basis for a decision, an action, a communication to a human, or a record persisted to a system of record. It applies regardless of whether the agent operates in retrieval-augmented generation (RAG) mode, purely from parametric knowledge, or in a hybrid configuration. Agents whose outputs are used solely as creative brainstorming prompts with explicit human reworking before any consequential use are excluded from the strictest requirements herein but remain subject to Sections 4.3 and 4.9.
4.1.1 The deploying organisation MUST ensure that agent-generated factual assertions—including numerical claims, named-entity claims, causal claims, and claims about the existence or content of external documents—are accompanied by a traceable grounding reference that identifies the specific source material from which the assertion is derived.
4.1.2 Grounding references MUST be machine-readable and stored alongside the output in the audit record, not rendered exclusively as human-readable inline text that can be separated from the assertion.
4.1.3 Where an assertion cannot be grounded to a specific source at output generation time, the agent MUST either withhold the assertion and return an explicit knowledge-boundary signal (see AG-011), or flag the assertion with an ungrounded-content marker that is propagated to any downstream consumer or record.
4.1.4 The deploying organisation MUST define and document, per deployment context, which content categories are subject to mandatory grounding and which are treated as lower-risk generative content, and this categorisation decision MUST be reviewed and reapproved at intervals not exceeding 12 months.
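The requirements in 4.1.1 to 4.1.3 can be illustrated with a minimal sketch of a machine-readable grounding record. The class names `GroundingReference` and `Assertion`, the function `to_audit_record`, and every field name are hypothetical and chosen for illustration only; a real deployment would follow its own audit schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass(frozen=True)
class GroundingReference:
    # Hypothetical fields identifying the specific source passage.
    source_id: str
    locator: str        # e.g. section, page, or paragraph identifier
    retrieved_at: str   # ISO 8601 timestamp


@dataclass(frozen=True)
class Assertion:
    text: str
    grounding: Optional[GroundingReference] = None

    @property
    def ungrounded(self) -> bool:
        # An assertion with no grounding reference carries the marker.
        return self.grounding is None


def to_audit_record(output_id: str, assertions: List[Assertion]) -> str:
    """Serialise assertions and their grounding as structured JSON."""
    return json.dumps(
        {
            "output_id": output_id,
            "assertions": [
                {
                    "text": a.text,
                    "grounding": asdict(a.grounding) if a.grounding else None,
                    "ungrounded_marker": a.ungrounded,
                }
                for a in assertions
            ],
        },
        sort_keys=True,
    )
```

Because the grounding reference is a structured field rather than inline text, it cannot be separated from the assertion by downstream formatting, and the ungrounded marker travels with the record to any consumer.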
4.2.1 Where the agent produces a citation to a specific external document, legal instrument, standard, regulation, academic paper, or other named reference, the deploying organisation MUST implement a citation resolution step that verifies the cited artefact exists at the stated location, identifier, or coordinates before the output is delivered to a downstream consumer or action executor.
4.2.2 Citation resolution MUST be performed against an authoritative index or live resolution service appropriate to the citation type (e.g., legal case repositories for case law citations, DOI resolution for academic references, official gazette records for regulatory citations). Resolution against the agent's own parametric knowledge is not a compliant resolution mechanism.
4.2.3 Where a citation fails resolution, the output MUST be quarantined and either returned to a human reviewer or suppressed, with a structured failure record generated for audit purposes. The agent MUST NOT forward a citation that fails resolution to a downstream action executor or external communication.
4.2.4 Where live resolution infrastructure is unavailable (e.g., air-gapped environments, offline edge deployments), the deploying organisation MUST document this constraint, implement a local verified citation index as a compensating control, and clearly label all agent outputs produced under the offline constraint as unverified-citations pending later reconciliation.
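The citation resolution step in 4.2.1 to 4.2.3 might be sketched as follows. This is a simplified illustration that assumes the authoritative index can be represented as an in-memory set of identifiers; in practice the lookup would query a case repository, DOI resolver, or local verified index. The names `ResolutionResult`, `resolve_citations`, and `release_allowed` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Set


@dataclass
class ResolutionResult:
    citation: str
    resolved: bool
    action: str       # "deliver" or "quarantine"
    checked_at: str   # ISO 8601 timestamp of the check


def resolve_citations(citations: List[str], index: Set[str]) -> List[ResolutionResult]:
    """Check each citation against an authoritative index (here, a plain set)."""
    now = datetime.now(timezone.utc).isoformat()
    results = []
    for citation in citations:
        found = citation in index
        action = "deliver" if found else "quarantine"
        results.append(ResolutionResult(citation, found, action, now))
    return results


def release_allowed(results: List[ResolutionResult]) -> bool:
    """The output may be forwarded only if every citation resolved."""
    return all(r.resolved for r in results)
```

The list of `ResolutionResult` objects doubles as the structured failure record: any entry with `resolved` set to false documents what was checked, when, and that the output was quarantined.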
4.3.1 The deploying organisation MUST configure the agent pipeline to associate a confidence or grounding quality signal with each factual assertion output, derived from the strength of the grounding evidence, the recency of the source, and the specificity of the retrieved passage relative to the query.
4.3.2 Confidence signals MUST be propagated to downstream pipeline stages and to human-in-the-loop review interfaces. They MUST NOT be stripped, normalised upward, or suppressed by intermediate processing layers prior to a human decision point or consequential action gate.
4.3.3 The deploying organisation MUST define numeric or categorical thresholds below which a confidence signal triggers an escalation path, human review requirement, or explicit output suppression. These thresholds MUST be documented and version-controlled.
4.3.4 Agents operating in Safety-Critical / CPS, Financial-Value, or Public Sector / Rights-Sensitive profiles MUST apply a conservative default confidence threshold that errs toward suppression and human review rather than permissive output delivery when grounding evidence is ambiguous or partial.
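A threshold gate of the kind described in 4.3.3 and 4.3.4 could look like the sketch below. The threshold values, profile keys, and the names `Action`, `THRESHOLDS`, and `gate` are illustrative assumptions, not values prescribed by this dimension. The sketch also assumes that an unrecognised profile should fall back to the strictest configured thresholds.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    HUMAN_REVIEW = "human_review"
    SUPPRESS = "suppress"


# Hypothetical thresholds per profile: (suppress_below, review_below).
# The numbers are placeholders; real values must be documented and versioned.
THRESHOLDS = {
    "default": (0.30, 0.60),
    "safety_critical": (0.50, 0.85),
}


def gate(confidence: float, profile: str = "default") -> Action:
    """Map a grounding confidence signal to a pipeline action."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    # Unknown profiles fall back to the strictest thresholds (assumption).
    suppress_below, review_below = THRESHOLDS.get(
        profile, THRESHOLDS["safety_critical"]
    )
    if confidence < suppress_below:
        return Action.SUPPRESS
    if confidence < review_below:
        return Action.HUMAN_REVIEW
    return Action.DELIVER
```

With these placeholder values, a signal of 0.7 is delivered under the default profile but routed to human review under the safety-critical profile, which is the conservative behaviour 4.3.4 asks for.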
4.4.1 The deploying organisation MUST implement at least one automated hallucination detection mechanism operating as a post-generation verification layer, distinct from the generative component, that evaluates the factual consistency of the generated output against the retrieved or supplied context material.
4.4.2 Hallucination detection mechanisms MUST be capable of identifying at minimum: (a) assertions not entailed by the retrieval context; (b) numerical values that differ materially from values present in the source material; (c) named entities (persons, organisations, documents, cases, standards) present in neither the retrieval context nor a verifiable external index; and (d) citations whose target content does not support the claimed holding or finding.
4.4.3 The deploying organisation MUST log all detected potential hallucinations with sufficient detail to reconstruct the specific assertion, the source context against which it was evaluated, and the detection outcome.
4.4.4 Detected hallucinations that exceed a defined severity threshold MUST trigger an immediate block on downstream propagation of the flagged output. Severity thresholds MUST be documented and calibrated to the risk profile of the deployment.
4.4.5 The organisation MUST track hallucination detection rates over time as an operational metric, review trends at intervals not exceeding 30 days, and initiate a control-effectiveness review if the rolling 30-day hallucination detection rate exceeds predefined bounds for the deployment context.
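As one narrow illustration of 4.4.2(b), the sketch below flags numerical values in an assertion that have no close match in the source text. It is a deliberately naive check: the regular expression ignores units, signs, and thousands separators, and the 1% relative tolerance is an assumed stand-in for whatever "materially" means in a given deployment. The function names are hypothetical.

```python
import re
from typing import List

# Matches plain integers and decimals only; units and separators are ignored.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_numbers(text: str) -> List[float]:
    """Pull every plain numeric literal out of the text."""
    return [float(m) for m in _NUMBER.findall(text)]


def unsupported_numbers(assertion: str, source: str, rel_tol: float = 0.01) -> List[float]:
    """Return numbers in the assertion with no close match in the source."""
    source_values = extract_numbers(source)
    flagged = []
    for value in extract_numbers(assertion):
        supported = any(
            abs(value - s) <= rel_tol * max(abs(s), 1e-12)
            for s in source_values
        )
        if not supported:
            flagged.append(value)
    return flagged
```

A non-empty result would be logged under 4.4.3 and compared against the severity threshold in 4.4.4. A production detector would also need to bind each number to its unit and to the entity it describes, which this sketch does not attempt.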
4.5.1 The deploying organisation MUST implement source freshness controls that associate each retrieved document in the grounding corpus with a retrieval or publication timestamp and a defined validity window appropriate to the content domain.
4.5.2 Where a document's validity window has expired, either its retrieval MUST be blocked, or any agent output derived from that document MUST be flagged with a temporal-currency warning that is propagated to the output consumer.
4.5.3 For content categories where regulatory, legal, or technical specifications are subject to amendment or replacement (including but not limited to financial regulations, drug interaction databases, building codes, and equipment maintenance standards), the deploying organisation MUST define maximum permissible source ages and enforce these limits at retrieval time.
4.5.4 The deploying organisation MUST maintain a documented source corpus update schedule and MUST NOT permit agent deployments to operate against source corpora that have not been refreshed within the defined update interval without explicit exception approval and compensating controls.
4.6.1 Where agents rely on parametric (training-time) knowledge rather than retrieved context for a factual assertion, the deploying organisation MUST treat such assertions as ungrounded by default unless a post-generation verification step has cross-checked the assertion against a live or locally-maintained authoritative source.
4.6.2 The deploying organisation MUST NOT permit parametric-knowledge-only assertions to flow into Safety-Critical / CPS, Financial-Value, Legal, Medical, or Public Sector / Rights-Sensitive action pipelines without human review and explicit approval.
4.6.3 The deploying organisation SHOULD maintain a taxonomy of content categories for which parametric knowledge is considered sufficiently stable and low-risk to permit use without retrieval grounding. Where such a taxonomy is maintained, it MUST be reviewed annually and whenever the base model is updated or replaced.
4.7.1 All agent outputs delivered to human consumers that contain assertions derived from retrieval-augmented sources MUST include visible, human-readable attribution to the specific source documents consulted, in a form that allows the consumer to independently locate and verify the source.
4.7.2 All agent outputs that contain assertions flagged as low-confidence, unverified, or ungrounded MUST carry a clearly visible disclosure to that effect in the user-facing presentation layer. This disclosure MUST NOT be confined to metadata fields inaccessible to the consuming user.
4.7.3 Customer-Facing and Public Sector / Rights-Sensitive agent deployments MUST ensure that disclosure language is presented in plain language appropriate to the intended audience and is not buried in technical caveats or fine print.
4.7.4 The deploying organisation SHOULD implement a mechanism by which consumers can query the specific grounding sources for any assertion in a delivered output, and this mechanism SHOULD be surfaced in the primary interaction interface rather than requiring a separate audit query.
4.8.1 The deploying organisation MUST define and enforce mandatory human-in-the-loop review gates for agent-generated outputs that meet one or more of the following criteria: (a) contain citations to legal or regulatory instruments; (b) contain numerical values to be used in financial transactions or disclosures; (c) contain technical specifications to be applied in physical or safety-critical processes; (d) contain clinical or pharmaceutical information to be acted upon without independent professional review; or (e) have been flagged by hallucination detection or confidence assessment as potentially unreliable.
4.8.2 Human review gates MUST be implemented as blocking controls in the pipeline architecture—that is, outputs subject to mandatory review MUST NOT be forwarded to downstream action executors, external communications, or records systems until the review is positively completed and the reviewer identity and decision are logged.
4.8.3 The deploying organisation MUST NOT permit human review gates to be bypassed under operational time pressure without explicit escalation approval from a named accountable officer and contemporaneous documentation of the bypass rationale.
4.8.4 The organisation SHOULD implement reviewer competency controls that verify the reviewing human possesses sufficient domain knowledge to meaningfully assess the grounding quality of the flagged output, particularly for Safety-Critical, Legal, and Medical deployment contexts.
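The blocking behaviour required by 4.8.2 can be sketched as a gate that refuses to forward an output until a positive review decision has been recorded. The classes `ReviewDecision`, `PendingOutput`, and `ReviewGate` are hypothetical, and the in-memory list stands in for a durable audit log.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ReviewDecision:
    reviewer_id: str
    approved: bool
    decided_at: str   # ISO 8601 timestamp


@dataclass
class PendingOutput:
    output_id: str
    requires_review: bool
    decision: Optional[ReviewDecision] = None


class ReviewGate:
    def __init__(self) -> None:
        # Stand-in for a durable, tamper-evident audit log.
        self.log: List[dict] = []

    def record_decision(self, item: PendingOutput, reviewer_id: str, approved: bool) -> None:
        """Attach the decision to the output and log reviewer identity."""
        decided_at = datetime.now(timezone.utc).isoformat()
        item.decision = ReviewDecision(reviewer_id, approved, decided_at)
        self.log.append({
            "output_id": item.output_id,
            "reviewer_id": reviewer_id,
            "approved": approved,
            "decided_at": decided_at,
        })

    def may_forward(self, item: PendingOutput) -> bool:
        """Block unless review is not required or was positively completed."""
        if not item.requires_review:
            return True
        return item.decision is not None and item.decision.approved
```

The key design point is that absence of a decision is treated the same as a rejection: an output awaiting review is never forwarded by default.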
4.9.1 The deploying organisation MUST designate a named accountability owner for grounding governance who is responsible for maintaining the grounding policy, reviewing control effectiveness metrics, approving threshold changes, and reporting material hallucination incidents to the appropriate internal governance body.
4.9.2 The deploying organisation MUST conduct a formal grounding control review at intervals not exceeding 90 days for Safety-Critical, Financial-Value, and Public Sector / Rights-Sensitive deployments, and at intervals not exceeding 180 days for all other in-scope deployments.
4.9.3 The deploying organisation MUST maintain a hallucination incident register that records all confirmed hallucination events that resulted in output delivery to a human or downstream system, including the content of the hallucinated assertion, the point of detection, and the remediation action taken.
4.9.4 The organisation MUST perform structured root-cause analysis for any confirmed hallucination event that resulted in a consequential outcome—defined as an output that influenced a decision, action, or record—and MUST feed findings into the control improvement cycle within 30 days of the event.
4.9.5 Where a model update, retrieval corpus change, or pipeline architectural change is made, the deploying organisation MUST re-execute the test suite defined in Section 8 before reactivating the deployment in a production context handling consequential outputs.
Hallucination is not a bug that can be patched out of a language model through a software update; it is a structural property of the generative architecture. Models trained on next-token prediction under cross-entropy loss are optimised to produce plausible continuations of input sequences, and plausibility is measured in terms of distributional fit to training data, not in terms of correspondence to external ground truth. This means a model operating in a domain where the correct answer is rare in the training distribution will, absent constraint, produce the most distributionally plausible answer—which may bear only coincidental relationship to factual accuracy. The problem is compounded by the fact that the surface features of confident, authoritative output (well-formed citations, specific numerical figures, named entities, source attributions) are themselves high-frequency patterns in the training distribution and are therefore precisely the features most readily produced by confabulation.
Detective control is the appropriate primary control type for this dimension because prevention—that is, making a generative model incapable of producing hallucinated content—is currently not achievable through any known mechanism that preserves the generative utility for which the model is deployed. The control architecture therefore must focus on detecting hallucinated content after generation and before consequential use, creating a verification layer that operates independently of the generative component and applies external ground-truth checks rather than relying on the model's self-assessment of its own accuracy. Self-assessment is specifically unreliable in this context: models are not systematically better calibrated about the accuracy of their own outputs than they are about arbitrary factual claims, and in some configurations their confidence expressions are positively correlated with error rates in low-frequency knowledge domains.
In single-turn conversational deployments, a hallucinated assertion typically reaches one human who has the opportunity to verify before acting. In agentic pipelines, a hallucinated assertion generated at one stage of a multi-step process may be consumed as ground truth by subsequent pipeline stages, passed to action executors, written to records systems, included in communications to external parties, and retrieved by future agent interactions as established fact—creating a compounding propagation dynamic where the original confabulation becomes progressively more difficult to trace and remediate. This is the structural reason why grounding controls must be enforced at the point of generation and at every stage of propagation rather than solely at the final output delivery point.
A common misconception is that RAG architectures eliminate hallucination risk by providing the model with relevant retrieved context. Retrieved context reduces but does not eliminate hallucination for several reasons: (a) models may generate assertions that are inconsistent with the retrieved context due to attention saturation, conflicting signals in long contexts, or positional biases; (b) models may attribute correct-sounding citations to the retrieved sources when the specific claimed content is not present in those sources; (c) retrieved documents may themselves be outdated, inaccurate, or misrepresentative of the current state of the referenced domain; and (d) models may supplement retrieved context with parametric knowledge without signalling this transition to the consumer, producing a blended output where grounded and ungrounded assertions are indistinguishable. This rationale underlies the requirements in Sections 4.4 through 4.6 for detection mechanisms that operate independently of the retrieval layer.
Pattern 6.1 — Layered Verification Architecture Implement grounding verification as a three-layer stack: (1) retrieval grounding at generation time, ensuring the generative component has access to relevant source material; (2) post-generation entailment checking, using a separate model or deterministic parser to assess whether each factual assertion in the output is supported by the retrieved context; and (3) citation resolution, using external index queries to verify that cited artefacts exist and that their content matches the attributed claim. Each layer should produce a structured pass/fail or scored signal that is aggregated into a composite grounding confidence score attached to the output.
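The aggregation step in Pattern 6.1 is not prescribed by this dimension. One conservative option, shown here purely as an assumption, is weakest-link aggregation, in which the composite score equals the lowest of the three layer scores. The names `LayerSignals` and `composite_score` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSignals:
    retrieval: float    # layer 1: relevance of retrieved material, 0..1
    entailment: float   # layer 2: support for assertions in the context, 0..1
    citation: float     # layer 3: fraction of citations resolved, 0..1


def composite_score(signals: LayerSignals) -> float:
    """Weakest-link aggregation: the composite is the lowest layer score."""
    values = (signals.retrieval, signals.entailment, signals.citation)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("layer scores must be in [0, 1]")
    return min(values)
```

Taking the minimum means a strong retrieval score cannot mask a failed citation check. A weighted average would be an alternative, but it allows exactly that kind of masking.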
Pattern 6.2 — Claim Decomposition and Granular Assessment Rather than assessing grounding at the level of a complete output, decompose the output into individual factual claims before running verification. This allows hallucination detection to operate at the grain of individual assertions rather than producing a single output-level verdict that may obscure the presence of specific ungrounded claims within an otherwise well-grounded response. Claim decomposition can be implemented using a lightweight structured extraction step that identifies the set of verifiable assertions in the output before passing them individually to the verification layer.
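A minimal sketch of claim-level assessment follows. It assumes, for simplicity, that each sentence is one claim and that the verification layer is supplied as a callable; both are simplifications, since real claim extraction would need to handle abbreviations, compound sentences, and non-factual text. The function names are hypothetical.

```python
import re
from typing import Callable, List, Tuple


def decompose(output: str) -> List[str]:
    """Naive decomposition: split on sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", output.strip())
    return [p for p in parts if p]


def assess(output: str, is_supported: Callable[[str], bool]) -> List[Tuple[str, bool]]:
    """Run the supplied verifier on each claim and return per-claim verdicts."""
    return [(claim, is_supported(claim)) for claim in decompose(output)]
```

Returning one verdict per claim makes it possible to block or flag a single ungrounded sentence inside an otherwise well-supported response, rather than passing or failing the output as a whole.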
Pattern 6.3 — Source Freshness Metadata in Retrieval Corpus Maintain source freshness metadata (publication date, last-verified date, domain-specific validity period) as a first-class field in the retrieval index. Configure the retrieval step to surface freshness signals alongside relevance scores, and implement a configurable filter that applies domain-appropriate age cutoffs before documents are included in the generation context. For regulatory and legal domains, consider a dual-threshold approach: a hard block on documents older than a defined maximum age, and a soft warning for documents in a defined intermediate age band.
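The dual-threshold approach in Pattern 6.3 can be sketched as a simple age classifier. The function name `classify_freshness` and its parameters are hypothetical, and the age cutoffs are supplied by the caller because appropriate values depend on the content domain.

```python
from datetime import date
from enum import Enum


class Freshness(Enum):
    OK = "ok"
    WARN = "warn"     # soft warning: include, but flag temporal currency
    BLOCK = "block"   # hard block: exclude from the generation context


def classify_freshness(
    published: date,
    today: date,
    warn_after_days: int,
    block_after_days: int,
) -> Freshness:
    """Classify a document by age against a soft and a hard cutoff."""
    if warn_after_days > block_after_days:
        raise ValueError("warn_after_days must not exceed block_after_days")
    age_days = (today - published).days
    if age_days > block_after_days:
        return Freshness.BLOCK
    if age_days > warn_after_days:
        return Freshness.WARN
    return Freshness.OK
```

Passing `today` explicitly, rather than reading the system clock inside the function, keeps the classification reproducible when an audit later needs to reconstruct what the filter decided on a given date.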
Pattern 6.4 — Human Review Queue with Structured Review Prompts When outputs are routed to human review due to grounding concerns, provide the reviewer with a structured review interface that presents: (a) the specific assertions flagged as potentially ungrounded; (b) the source evidence against which the assertions were evaluated; (c) the resolution status of any cited artefacts; and (d) a structured decision form requiring the reviewer to explicitly accept, reject, or modify each flagged assertion. Unstructured review interfaces that present the full output for general review without highlighting specific concerns are less effective and harder to audit.
Pattern 6.5 — Hallucination Red-Teaming as Part of Deployment Testing Before any in-scope deployment enters production, conduct structured hallucination red-teaming that exercises the agent with queries designed to elicit confabulation: queries about entities at the boundary of the retrieval corpus, queries involving specific numerical values not present in retrieved documents, queries requesting citations in domains where the corpus is sparse, and queries that present false premises to test whether the agent confabulates supporting evidence. Red-team findings should be used to calibrate detection thresholds and to identify corpus gaps requiring remediation.
Pattern 6.6 — Temporal Grounding Stamps in Audit Records Attach a temporal grounding stamp to every output record that includes: the timestamp of retrieval corpus last update, the publication dates of the specific documents used in generation, the timestamp of the citation resolution check, and the version identifier of the hallucination detection component. This stamp provides the forensic material needed to reconstruct grounding quality at the time of output delivery for post-incident investigation and regulatory inquiry response.
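The four elements of the stamp listed in Pattern 6.6 map naturally onto a small immutable record. The sketch below is one possible shape; the class name, field names, and the `oldest_source_date` helper are assumptions, and timestamps are assumed to be ISO 8601 strings.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass(frozen=True)
class TemporalGroundingStamp:
    corpus_last_updated: str             # timestamp of last corpus update
    source_publication_dates: List[str]  # dates of the documents actually used
    citation_check_at: str               # timestamp of the resolution check
    detector_version: str                # version of the detection component

    def oldest_source_date(self) -> Optional[str]:
        # ISO 8601 dates in a uniform format sort correctly as plain strings.
        return min(self.source_publication_dates, default=None)

    def to_json(self) -> str:
        """Serialise the stamp for storage alongside the output record."""
        return json.dumps(asdict(self), sort_keys=True)
```

Storing the detector version next to the timestamps lets an investigator determine not only how old the sources were, but also which detection logic was in force when the output was released.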
Anti-Pattern 6.A — Relying on Model Self-Assessment of Confidence Presenting the model's own expressed confidence ("I am confident that..." or numeric self-reported confidence scores derived from the generative component) as the grounding verification signal. Self-reported confidence from generative models is not reliably calibrated to factual accuracy and provides no meaningful detection of hallucination. This anti-pattern is particularly dangerous because, in low-frequency knowledge domains, expressed confidence can be positively correlated with fluent confabulation.
Anti-Pattern 6.B — Treating Formatted Output as Verified Output Assuming that because agent output is correctly formatted—properly structured citations, well-formed references, plausible-sounding statistics—it has been verified against source material. Formatting quality is entirely independent of factual accuracy for generative models. Deployment configurations that treat format compliance as a proxy for grounding compliance create a false sense of security that is most likely to fail at exactly the moments when accurate information is most critical.
Anti-Pattern 6.C — Single-Pass RAG Without Post-Generation Verification Deploying a retrieval-augmented generation architecture and treating the presence of a retrieval step as sufficient grounding control without implementing independent post-generation verification. RAG reduces but does not eliminate hallucination, and deployment architectures that omit the post-generation verification layer are structurally non-compliant with Section 4.4 regardless of the sophistication of their retrieval mechanism.
Anti-Pattern 6.D — Source Attribution Without Resolution Including inline source attributions in agent outputs (e.g., "According to [Document Title], section [X]...") as a transparency measure without implementing a resolution step that verifies the cited document exists, the cited section contains the attributed claim, and the attributed claim is accurately represented. Attribution without resolution provides the appearance of grounding while obscuring confabulation that includes plausible-sounding but fabricated source references.
Anti-Pattern 6.E — Threshold Bypass Under Operational Pressure Implementing confidence and grounding thresholds as soft advisories that can be overridden by automated pipeline logic or operational expedience without human approval and audit logging. Threshold bypass under operational pressure is among the most common failure modes in deployed grounding controls and accounts for a significant proportion of cases where verified hallucination incidents result in consequential outcomes. If thresholds cannot be operationally sustained at their defined levels, the correct response is a documented threshold review, not a silent bypass.
Anti-Pattern 6.F — Grounding Controls Applied Only at Final Output Implementing grounding verification only at the terminal output stage of a multi-step agent pipeline, leaving intermediate reasoning steps, tool call parameters, and inter-agent communications unverified. In multi-step pipelines, hallucinated intermediate content can propagate into tool calls that modify external state before reaching the final output verification gate. Grounding controls must be applied at each stage where factual assertions are generated and used as inputs to subsequent steps.
| Maturity Level | Characteristics |
|---|---|
| Level 1 — Initial | No systematic grounding controls. Agent outputs delivered without verification. Hallucination incidents discovered reactively from downstream errors. |
| Level 2 — Developing | RAG architecture deployed. Basic source attribution in outputs. No post-generation verification. Citation resolution absent. Human review ad hoc rather than structured. |
| Level 3 — Defined | Post-generation entailment checking implemented. Citation resolution in place for primary citation types. Confidence thresholds defined and documented. Human review gates enforced for high-risk output categories. Hallucination incident register maintained. |
| Level 4 — Managed | Claim decomposition and granular assertion-level verification. Source freshness controls enforced. Hallucination detection rate tracked as operational metric. Red-teaming integrated into deployment testing. Structured human review interfaces with explicit reviewer decisions logged. |
| Level 5 — Optimising | Continuous calibration of detection thresholds against incident data. Automated root-cause categorisation of hallucination patterns. Grounding quality metrics fed into model and corpus update prioritisation. Cross-deployment hallucination pattern sharing with governance function. |
7.1 Grounding Policy Document A written policy defining: which content categories are subject to mandatory grounding; the specific grounding mechanisms deployed per content category; confidence and detection thresholds with their rationale; source freshness validity windows per domain; and the human review gate criteria. This document must be version-controlled, must identify the named accountability owner required by Section 4.9.1, and must carry a review date not more than 12 months from the current version date. Minimum retention period: 7 years or the applicable regulatory record-keeping period for the deployment domain, whichever is longer.
7.2 Grounding Architecture Diagram A technical diagram of the agent pipeline showing the placement of each grounding control layer (retrieval, post-generation verification, citation resolution, confidence propagation, human review gates) with data flows indicating which assertions are subject to which controls and at which pipeline stage. Must be updated within 30 days of any material pipeline architectural change. Minimum retention period: 5 years.
7.3 Citation Resolution Logs Structured logs of all citation resolution checks performed, including: the cited artefact identifier, the resolution service queried, the resolution outcome (resolved/unresolved), the timestamp of the check, and the subsequent action taken (delivered/quarantined/flagged). Must be queryable by output identifier to support post-incident investigation. Minimum retention period: The longer of 3 years or the period during which the associated output could be the subject of regulatory inquiry or legal proceedings.
7.4 Hallucination Detection Event Logs Structured logs of all hallucination detection events, including: the specific assertion flagged, the source context against which it was evaluated, the detection signal and severity score, and the pipeline action taken. Must be stored with tamper-evident integrity controls consistent with AG-103 (Audit Trail Integrity). Minimum retention period: 7 years for Safety-Critical, Financial-Value, and Public Sector deployments; 3 years for others.
7.5 Hallucination Incident Register
A maintained register of all confirmed hallucination events that resulted in output delivery to a human or downstream system, as required by Section 4.9.3. Each entry must include: incident date, deployment identifier, content of the hallucinated assertion, discovery mechanism, consequential impact assessment, and remediation action record. The register must be reviewed and signed off by the accountability owner at each governance review cycle. Minimum retention period: 10 years.
7.6 Root Cause Analysis Records
Written root cause analysis for each hallucination event meeting the consequential outcome threshold, as required by Section 4.9.4. Must document: the causal pathway from model behaviour through control failure to consequential outcome; contributing factors in retrieval corpus, model configuration, or pipeline design; corrective actions taken; and residual risk assessment. Minimum retention period: 10 years.
7.7 Threshold Configuration Records
Version-controlled records of all confidence and detection threshold settings, including the date of each change, the identity of the approving authority, and the rationale for the threshold values selected. Must include records of any threshold bypass events with approval documentation. Minimum retention period: 7 years.
7.8 Test and Red-Team Reports
Reports from test suite execution (Section 8) and hallucination red-teaming exercises, including: test inputs used, outputs generated, detection outcomes, and compliance determinations. Must be produced before each production deployment and after each material system change. Minimum retention period: 5 years.
7.9 Human Review Decision Records
Structured records of each human review gate decision, including: the reviewer identity, the time of review, the specific assertions reviewed, the reviewer's decision (accepted/rejected/modified) for each flagged assertion, and the basis for the decision where the reviewer exercised judgement to accept a flagged item. Minimum retention period: 7 years for Safety-Critical, Financial-Value, and Public Sector deployments; 3 years for others.
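The minimum retention periods stated for items 7.2 to 7.9 can be collected into a single lookup, which may help when configuring storage lifecycle rules. The following is a sketch under stated assumptions: the function name, the string keys, and the `exposure_years` parameter (the regulatory or legal exposure period referred to in 7.3, which must be determined outside this code) are all hypothetical.

```python
from typing import Optional

# Deployment classes given the longer period in items 7.4 and 7.9.
LONG_RETENTION_CLASSES = frozenset(
    {"Safety-Critical", "Financial-Value", "Public Sector"}
)

# Fixed minimum retention periods in years, as stated per evidence item.
FIXED_YEARS = {"7.2": 5, "7.5": 10, "7.6": 10, "7.7": 7, "7.8": 5}


def minimum_retention_years(
    item: str,
    deployment_class: str,
    exposure_years: Optional[float] = None,
) -> float:
    """Return the minimum retention period in years for an evidence item."""
    if item in FIXED_YEARS:
        return FIXED_YEARS[item]
    if item in ("7.4", "7.9"):
        return 7 if deployment_class in LONG_RETENTION_CLASSES else 3
    if item == "7.3":
        # "The longer of 3 years or the period during which the associated
        # output could be the subject of regulatory inquiry or legal proceedings."
        if exposure_years is None:
            raise ValueError("item 7.3 needs the exposure period as an input")
        return max(3, exposure_years)
    raise KeyError(f"unknown evidence item: {item}")
```

For example, `minimum_retention_years("7.4", "Financial-Value")` gives 7, while the same item for a deployment class outside the three named classes gives 3.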
Maps to: Sections 4.1.1 and 4.1.2
Objective: Verify that factual assertions in agent outputs are accompanied by machine-readable grounding references stored in the audit record.
Method: Submit a set of 30 test queries across the agent's primary content categories, each requiring specific factual assertions with verifiable ground truth. Retrieve the structured audit records for each output. Inspect each audit record for the presence of machine-readable grounding reference fields. Verify that grounding references are stored in the audit record as structured data, not embedded exclusively in unstructured output text. Cross-check that the stated grounding references correspond to source material that actually contains the attributed assertion.
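The inspection steps in the Method can be partly automated. The sketch below assumes a hypothetical audit record layout (an `assertions` list whose items carry `grounding_refs`, each with a `source_id` and a verbatim `quote`) and uses substring containment as a crude stand-in for the cross-check that the source contains the attributed assertion. Paraphrased assertions would need a stronger comparison or human review.

```python
def check_grounding(audit_record: dict, corpus: dict) -> list:
    """Return a list of problems found; an empty list means the record
    passed the checks in this sketch. `corpus` maps source_id -> source text."""
    assertions = audit_record.get("assertions")
    if not isinstance(assertions, list) or not assertions:
        # Grounding held only in unstructured output text does not count.
        return ["no structured assertions field in audit record"]

    problems = []
    for i, assertion in enumerate(assertions):
        refs = assertion.get("grounding_refs")
        if not isinstance(refs, list) or not refs:
            problems.append(f"assertion {i}: no machine-readable grounding reference")
            continue
        for ref in refs:
            source_id = ref.get("source_id")
            source_text = corpus.get(source_id)
            if source_text is None:
                problems.append(f"assertion {i}: source {source_id!r} not found")
                continue
            quote = ref.get("quote")
            if not quote or quote not in source_text:
                problems.append(
                    f"assertion {i}: source {source_id!r} does not contain the quoted text"
                )
    return problems


corpus = {"doc-1": "The filing deadline is 30 days after service."}
record = {
    "output_id": "out-001",
    "assertions": [
        {"text": "The deadline is 30 days after service.",
         "grounding_refs": [{"source_id": "doc-1",
                             "quote": "30 days after service"}]},
        {"text": "An ungrounded claim.", "grounding_refs": []},
        {"text": "A misattributed claim.",
         "grounding_refs": [{"source_id": "doc-1", "quote": "60 days"}]},
    ],
}
problems = check_grounding(record, corpus)
```

Here the first assertion passes, the second is reported for having no grounding reference, and the third is reported because the cited source does not contain the quoted text.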
Pass Criteria:
Maps to: Sections 4.2.1, 4.2.2, and 4.2.3
Objective: Verify that citation resolution is performed against authoritative external indices and that outputs with failed resolutions are blocked from downstream propagation.
Method: Inject 20 test queries specifically designed to elicit citations: 10 queries targeting valid, resolvable citations known to be within the retrieval corpus, and 10 queries targeting either fabricated citations or citations to content not present in the deployed corpus. For each output, capture the citation resolution log entries. Verify that the resolution step is performed against an external authoritative index (not against the model's parametric knowledge). Verify that all 10
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement |
| NIST AI RMF | GOVERN 1.1, MAP 3.2, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance |
Article 9 of the EU AI Act requires providers of high-risk AI systems to establish and maintain a risk management system that identifies, analyses, estimates, and evaluates risks. Hallucination Detection and Output Grounding Governance implements a specific risk mitigation measure within this framework. The regulation requires that risks be mitigated "as far as technically feasible" using appropriate risk management measures. For deployments classified as high-risk under Annex III, compliance with AG-742 supports the Article 9 obligation by providing structural grounding and detection controls rather than relying solely on the agent's own reasoning or behavioural compliance.
Article 15 of the EU AI Act requires high-risk AI systems to achieve appropriate levels of accuracy, robustness, and cybersecurity. Hallucination Detection and Output Grounding Governance primarily supports the accuracy requirement, since confabulated facts and fabricated citations are accuracy failures. It also supports robustness by requiring grounding and citation resolution controls that operate independently of the model's own output.
In the NIST AI RMF, GOVERN 1.1 addresses legal and regulatory requirements; MAP 3.2 addresses risk context mapping; MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-742 supports compliance by requiring grounding and detection controls, documented threshold settings, and incident records that give these functions concrete evidence to draw on.
ISO 42001 Clause 6.1 requires organisations to determine actions to address risks and opportunities within the AI management system. Clause 8.2 requires AI risk assessment. Hallucination Detection and Output Grounding Governance implements a risk treatment control within the AI management system, supporting the requirement for structured risk mitigation.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — potentially cross-organisation where agents interact with external counterparties or shared infrastructure |
| Escalation Path | Immediate executive notification and regulatory disclosure assessment |
Consequence chain: Without hallucination detection and output grounding governance, confabulated facts, fabricated citations, and ungrounded assertions can propagate into decisions, communications, and records at machine speed. The failure mode is not gradual degradation; it is an absence of control, because ungrounded output is delivered with the same surface confidence as accurate output and carries no internal signal to distinguish the two. The immediate consequence is that unverified assertions are acted on within the scope of AG-742, potentially cascading to dependent dimensions and downstream systems. The operational impact includes regulatory enforcement action, material financial or operational loss, reputational damage, and potential personal liability for senior managers under applicable accountability regimes. Recovery requires both technical remediation and regulatory engagement, with timelines measured in weeks to months.