Clinical Evidence Provenance Governance requires that every clinical recommendation, diagnostic suggestion, treatment pathway, or safety advisory produced by an AI agent in a healthcare or life sciences context be traceable to specific, identified, validated, and current medical evidence sources. The provenance chain must connect the agent's output to the evidence that substantiates it — including the publication, guideline, formulary, or clinical dataset — with sufficient metadata to allow a qualified clinician or auditor to independently verify the evidential basis. Without governed provenance, AI-generated clinical guidance becomes an unverifiable assertion, indistinguishable from hallucination, and incapable of supporting the standard-of-care obligations that healthcare professionals owe their patients.
Scenario A — Outdated Guideline Drives Contraindicated Dosing: A clinical decision-support agent recommends a heparin dosing protocol for a 74-year-old patient presenting with acute deep vein thrombosis. The agent's recommendation is based on a 2016 edition of a national anticoagulation guideline. The 2016 guideline recommends weight-based dosing at 80 units/kg bolus followed by 18 units/kg/hour infusion. However, the guideline was revised in 2022 to include age-adjusted dosing for patients over 70 years — reducing the bolus to 60 units/kg and infusion to 15 units/kg/hour — based on a multicentre trial (n = 4,200) demonstrating a 34% reduction in major bleeding events in elderly patients under the revised protocol. The agent's recommendation of 80 units/kg bolus leads to a supratherapeutic anticoagulation level. The patient develops gastrointestinal bleeding requiring 3 units of packed red blood cells and a 6-day ICU stay costing £47,000. A subsequent root-cause analysis reveals that the agent's evidence corpus was last refreshed 14 months ago and contained only the 2016 guideline.
What went wrong: The agent had no mechanism to verify the currency of its evidence sources against the latest published guidelines. The provenance chain terminated at a stale corpus snapshot rather than a versioned, validated evidence source. No metadata indicated the guideline version or its revision status. The clinician who accepted the recommendation had no way to see that the evidence was 6 years out of date. Consequence: patient harm (major bleeding), £47,000 in additional care costs, malpractice claim settled at £215,000, regulatory investigation by the national medicines authority.
Scenario B — Hallucinated Citation Supports Non-Existent Trial: A clinical research agent generates a summary of evidence supporting the use of a novel immunotherapy combination for stage IIIB non-small cell lung cancer. The summary cites "Martinez et al., 2024, Journal of Clinical Oncology, 42(8):1124-1136" as a phase III randomised controlled trial (n = 1,800) demonstrating a 6.2-month improvement in progression-free survival. An oncologist, relying on this summary, discusses the combination with a patient and initiates a referral for compassionate-use access. A pharmacist subsequently attempts to retrieve the cited trial and discovers it does not exist — the journal, volume, and page numbers correspond to an unrelated paediatric cardiology article. The immunotherapy combination has only been studied in a phase I dose-escalation trial (n = 38) with no efficacy data.
What went wrong: The agent generated a fabricated citation with plausible but fictional metadata. No provenance verification mechanism confirmed the existence of the cited source, validated its content against the agent's claims, or flagged the absence of a retrievable source. The clinician trusted the citation format as evidence of validity. Consequence: patient received misinformation about treatment options, referral based on non-existent evidence, erosion of clinician trust in AI-assisted research tools, institutional review triggering a 4-month suspension of the agent, £38,000 in investigation and remediation costs.
Scenario C — Evidence Grade Mismatch Elevates Case Report to Guideline Recommendation: A primary-care decision-support agent recommends adding spironolactone to a 58-year-old patient's antihypertensive regimen based on "strong evidence of cardiovascular mortality reduction." The agent's provenance record links to three sources: a meta-analysis of 12 RCTs (n = 8,400, GRADE: High), a cohort study (n = 620, GRADE: Low), and a single case report describing a favourable outcome in one patient. The agent's recommendation language — "strong evidence" — reflects the aggregate of all three sources without distinguishing their evidence grades. The meta-analysis supports spironolactone in patients with heart failure and reduced ejection fraction, but the patient has hypertension with preserved ejection fraction — a different indication. Only the case report addresses the patient's specific clinical context. The recommendation is therefore supported by a single case report (GRADE: Very Low), not "strong evidence."
What went wrong: The provenance system recorded source identifiers but did not record or transmit the evidence grade, the clinical indication each source addressed, or the applicability of each source to the patient's specific condition. The aggregation of multiple sources without grade-stratified attribution produced a misleading confidence signal. Consequence: inappropriate medication addition, patient developed hyperkalaemia (potassium 6.1 mmol/L) requiring emergency department visit costing £2,800, medication discontinued, complaint filed with professional regulator.
Scope: This dimension applies to any AI agent that produces, summarises, retrieves, or transmits clinical recommendations, diagnostic suggestions, treatment pathway guidance, drug interaction assessments, clinical trial evidence summaries, or any other output that a healthcare professional, patient, or caregiver might rely upon for clinical decision-making. The scope includes both direct clinical decision-support agents (embedded in electronic health records or clinical workflows) and indirect clinical agents (research assistants, literature reviewers, formulary tools, patient-facing chatbots providing health information). The scope extends to agents operating in pharmaceutical development, clinical trial management, and regulatory submission contexts where evidence provenance is material to safety and efficacy determinations. Agents that process clinical data solely for administrative purposes (scheduling, billing) without generating clinical recommendations are excluded, provided they do not influence clinical decisions through their outputs.
4.1. A conforming system MUST link every clinical recommendation, diagnostic suggestion, treatment pathway, or safety advisory to at least one identified, retrievable evidence source, with the linkage recorded in a tamper-evident provenance record that includes the source identifier, publication date, version or edition, evidence grade or level of evidence, and the specific section or data element within the source that supports the output.
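The record fields listed in 4.1 can be sketched as a small data structure with a content hash for tamper-evidence. This is an illustrative assumption, not a mandated schema: the `EvidenceProvenanceRecord` class, its field names, and the use of SHA-256 are choices of this sketch.

```python
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass(frozen=True)
class EvidenceProvenanceRecord:
    """Illustrative provenance record for one clinical output (field names are assumptions)."""
    output_id: str           # identifier of the clinical recommendation
    source_id: str           # e.g. a DOI or guideline registry identifier
    publication_date: str    # ISO 8601 date of the cited source
    version: str             # edition or revision of the source
    evidence_grade: str      # e.g. "GRADE: High"
    supporting_section: str  # section or data element that supports the output

    def digest(self) -> str:
        """Content hash for tamper-evidence: changing any field changes the digest."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the digest alongside the record (or chaining digests, as AG-006 contemplates) lets an auditor detect any after-the-fact alteration of the provenance metadata.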
4.2. A conforming system MUST validate that every cited evidence source exists and is retrievable at the time the clinical output is generated, rejecting or flagging any output where the cited source cannot be confirmed as a real, accessible publication, guideline, or dataset.
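One way to enforce 4.2 is to resolve each source identifier against a trusted bibliographic index before the output is released. The `validate_citation` function and its injected `resolve` callback below are assumptions for illustration, standing in for a real lookup client (e.g. against Crossref or PubMed); an unresolvable identifier yields a flag rather than a silent pass.

```python
from typing import Callable, Optional

def validate_citation(source_id: str,
                      resolve: Callable[[str], Optional[dict]]) -> dict:
    """Return a validation verdict for a cited source.

    `resolve` is any lookup against a trusted bibliographic index,
    returning source metadata or None when no such source exists.
    """
    record = resolve(source_id)
    if record is None:
        # 4.2: an unverifiable citation means the output is rejected or flagged
        return {"source_id": source_id, "verified": False, "action": "flag_output"}
    return {"source_id": source_id, "verified": True,
            "retrieved_title": record.get("title"), "action": "accept"}
```

In Scenario B, this check would have fired at generation time: the fabricated "Martinez et al." identifier would not resolve, and the summary would have been flagged before the oncologist ever saw it.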
4.3. A conforming system MUST record the currency status of every evidence source, including the publication or last-revision date, the date the source was last validated against its canonical repository, and any known superseding publications or guideline revisions. Evidence sources older than a defined currency threshold (recommended: 24 months for clinical guidelines, 12 months for drug safety data, 36 months for established pharmacological evidence) MUST be flagged for clinician review.
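The recommended thresholds in 4.3 translate directly into a staleness check. The function names and calendar-month arithmetic below are illustrative; a production system would also record the validation date and any known superseding publications.

```python
from datetime import date

# Recommended currency thresholds from 4.3, in months
CURRENCY_THRESHOLDS_MONTHS = {
    "clinical_guideline": 24,
    "drug_safety_data": 12,
    "established_pharmacology": 36,
}

def months_between(earlier: date, later: date) -> int:
    """Whole calendar months between two dates (day-of-month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def currency_status(source_type: str, last_revision: date, today: date) -> dict:
    """Flag a source for clinician review when its age exceeds the threshold."""
    threshold = CURRENCY_THRESHOLDS_MONTHS[source_type]
    age = months_between(last_revision, today)
    return {"age_months": age, "threshold_months": threshold,
            "flag_for_review": age > threshold}
```

Applied to Scenario A, a 2016 guideline checked in 2023 is roughly 86 months old against a 24-month threshold, so the dosing recommendation would have carried a currency flag for the clinician.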
4.4. A conforming system MUST record and transmit the evidence grade or level of evidence for each source using a recognised grading system (GRADE, Oxford CEBM, or equivalent), and MUST NOT present aggregate confidence language (e.g., "strong evidence") without disclosing the individual evidence grades of the contributing sources.
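A guard for 4.4 can derive any aggregate confidence wording from the per-source grades it is obliged to disclose. Keying the aggregate phrase to the lowest contributing grade is a conservative policy choice of this sketch, not a requirement of the text; the function and wording map are assumptions.

```python
def render_evidence_statement(sources: list[dict]) -> str:
    """Compose confidence language that always discloses each source's grade (4.4).

    Each source dict is assumed to carry 'id' and 'grade'. The aggregate
    wording is keyed to the *lowest* grade present (a conservative policy),
    and is never emitted without the per-source breakdown.
    """
    order = ["High", "Moderate", "Low", "Very Low"]
    lowest = max(sources, key=lambda s: order.index(s["grade"]))["grade"]
    wording = {"High": "strong evidence", "Moderate": "moderate evidence",
               "Low": "limited evidence", "Very Low": "very limited evidence"}
    breakdown = "; ".join(f"{s['id']} (GRADE: {s['grade']})" for s in sources)
    return f"{wording[lowest]} — sources: {breakdown}"
```

Under this policy, the Scenario C mix of one high-grade meta-analysis and one case report could never be rendered as bare "strong evidence": the breakdown travels with the claim.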
4.5. A conforming system MUST verify the clinical applicability of each cited evidence source to the specific patient context — including indication, population characteristics (age, comorbidities, prior treatments), and care setting — and MUST flag when a cited source addresses a different clinical indication or patient population than the current context.
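The applicability checks in 4.5 reduce to comparing a source's declared scope against the patient context. The scope fields below (indication, age range, care setting) are a deliberately simplified assumption; real applicability mapping would also cover comorbidities and prior treatments.

```python
from dataclasses import dataclass

@dataclass
class PatientContext:
    indication: str
    age: int
    setting: str

@dataclass
class SourceScope:
    source_id: str
    indication: str
    min_age: int
    max_age: int
    setting: str

def applicability_flags(source: SourceScope, patient: PatientContext) -> list[str]:
    """Return mismatch flags per 4.5; an empty list means the source applies."""
    flags = []
    if source.indication != patient.indication:
        flags.append("indication_mismatch")
    if not (source.min_age <= patient.age <= source.max_age):
        flags.append("population_mismatch")
    if source.setting != patient.setting:
        flags.append("setting_mismatch")
    return flags
```

In Scenario C, the meta-analysis scoped to heart failure with reduced ejection fraction would have raised `indication_mismatch` against the hypertension patient, exposing that only the case report addressed the actual clinical context.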
4.6. A conforming system MUST retain the complete provenance chain for every clinical output for the duration required by applicable medical records retention regulations (minimum 10 years for adult patients, minimum 25 years for paediatric patients or until the patient reaches age 25 plus 8 years, whichever is longer).
4.7. A conforming system SHOULD implement automated evidence source monitoring that detects when a cited guideline is revised, a cited drug is subject to new safety alerts, or a cited clinical trial is retracted, corrected, or subject to an expression of concern — and triggers re-evaluation of any clinical outputs that relied on the affected source.
4.8. A conforming system SHOULD present provenance information to the clinician in a structured, accessible format at the point of care — not buried in metadata or available only through a separate retrieval step — enabling rapid assessment of the evidential basis before acting on the recommendation.
4.9. A conforming system MAY implement automated evidence source ranking that prioritises higher-grade evidence, more recent publications, and sources with greater applicability to the patient's specific context when multiple sources are available, while still disclosing all contributing sources.
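The ranking in 4.9 can be a simple composite sort that never drops a source. The weighting order below (applicability, then grade, then recency) is an assumption; 4.9 names the three criteria but does not fix their precedence.

```python
def rank_sources(sources: list[dict]) -> list[dict]:
    """Rank candidate evidence sources per 4.9 while returning all of them.

    Ordering policy (an assumption of this sketch): applicable sources
    first, then higher evidence grade, then more recent publication year.
    """
    grade_rank = {"High": 0, "Moderate": 1, "Low": 2, "Very Low": 3}
    return sorted(sources, key=lambda s: (not s["applicable"],
                                          grade_rank[s["grade"]],
                                          -s["year"]))
```

Full disclosure is preserved because the sort reorders rather than filters: a clinician still sees the inapplicable high-grade meta-analysis, just ranked below sources that actually fit the patient.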
Clinical evidence provenance is not an administrative convenience — it is a foundational requirement for safe medical practice. The entire framework of evidence-based medicine rests on the principle that clinical decisions should be traceable to the best available evidence, critically appraised and applied to the individual patient. When an AI agent generates clinical guidance, it assumes a role in the evidence-appraisal chain. If the agent's output cannot be traced to validated evidence, the clinician receiving that output has no basis for critical appraisal and no mechanism for professional accountability.
The risks of ungoverned evidence provenance in clinical AI are materially different from ungoverned provenance in other domains. In a financial advisory context, a recommendation without provenance may lead to monetary loss. In a clinical context, a recommendation without provenance — or with fabricated provenance — may lead to patient harm or death. The asymmetry between these consequences demands a correspondingly higher standard of provenance governance.
Three specific failure modes motivate this dimension. First, hallucinated citations — large language models are known to generate plausible but fictional academic citations, complete with realistic author names, journal titles, and page numbers. In non-clinical contexts, a hallucinated citation is an embarrassment; in a clinical context, it can drive treatment decisions based on non-existent evidence. Second, stale evidence — clinical guidelines and drug safety profiles are updated regularly, sometimes with critical safety implications. An agent that cites a valid but outdated guideline may recommend a dosing protocol that has been revised due to safety signals discovered after publication. The recommendation was once correct but is now contraindicated. Third, evidence grade conflation — aggregating evidence from multiple sources of varying quality without grade-stratified attribution creates misleading confidence signals. A recommendation supported by one high-grade meta-analysis and two very-low-grade case reports is not "strongly supported by three studies." The grade matters as much as the quantity.
Regulatory frameworks reinforce these requirements. The EU Medical Device Regulation (EU MDR) classifies clinical decision-support software as a medical device when it provides recommendations that clinicians are not expected to independently verify. Under Article 61, manufacturers must demonstrate that such devices are based on adequate clinical evidence. The FDA's regulatory framework for clinical decision-support software similarly requires evidence of safety and effectiveness, which presupposes that the evidential basis for recommendations is documented and verifiable. The EU AI Act classifies healthcare AI as high-risk under Annex III, requiring risk management systems that address data quality (Article 10) and transparency (Article 13) — both of which are served by evidence provenance governance. HIPAA's Security Rule requires integrity controls for electronic protected health information, and clinical recommendations that become part of a patient's medical record are PHI whose integrity depends on the integrity of the evidence supporting them.
Beyond regulatory compliance, evidence provenance governance serves the clinician-patient relationship. Clinicians have a professional and legal duty to exercise independent clinical judgement. An AI recommendation without provenance undermines this duty because the clinician cannot critically appraise what they cannot see. Provenance governance restores the clinician's ability to evaluate the AI's output — to agree with it, modify it, or reject it based on their assessment of the underlying evidence. This is not a limitation on AI utility; it is a precondition for responsible AI integration into clinical practice.
Clinical Evidence Provenance Governance requires a layered approach: evidence source management, provenance chain construction, currency monitoring, and clinician-facing presentation. The technical challenge is not merely recording which sources were consulted but ensuring that the provenance chain is complete, accurate, current, and accessible at the point of clinical decision-making.
Recommended patterns:
Anti-patterns to avoid:
Hospital and Primary Care. Clinical decision-support agents integrated into electronic health records must present provenance inline with recommendations. Clinicians operate under time pressure — provenance that requires navigating to a separate screen or system will be ignored. The provenance presentation must be concise and scannable within 10-15 seconds. Consider structured provenance summaries: "Based on: [Guideline name, version, year] — GRADE: High — Applicable: Yes."
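The scannable summary suggested above can be produced by a trivial formatter at the point of care. The function and the guideline identifier in the example are illustrative, not part of any mandated display standard.

```python
def provenance_summary(guideline: str, version: str, year: int,
                       grade: str, applicable: bool) -> str:
    """One-line, scannable provenance summary for inline EHR display."""
    return (f"Based on: [{guideline}, {version}, {year}] — "
            f"GRADE: {grade} — Applicable: {'Yes' if applicable else 'No'}")
```

A clinician reading this line gets source identity, currency, grade, and applicability in a single glance, which is what makes the 10-15 second assessment window realistic.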
Pharmaceutical Development. Drug development agents generating evidence summaries for regulatory submissions must maintain provenance chains that satisfy regulatory authority requirements for data integrity. The FDA's 21 CFR Part 11 requirements for electronic records and electronic signatures apply to provenance records that form part of regulatory submissions. Provenance chains must be attributable, legible, contemporaneous, original, and accurate, consistent with the ALCOA+ principles.
Clinical Trials. Agents supporting clinical trial protocol design or evidence review must trace recommendations to registered trial data (e.g., ClinicalTrials.gov identifiers) and distinguish between published peer-reviewed results, preprints, and interim analyses. Evidence from retracted or corrected trials must be flagged.
Cross-Border Deployment. Agents operating across jurisdictions must account for variation in approved indications, formulary availability, and clinical guideline authority. A drug approved for a specific indication in one jurisdiction may not be approved in another. The provenance chain must identify the jurisdictional scope of each cited source.
Basic Implementation — Every clinical recommendation links to at least one identified evidence source with a recorded source identifier and publication date. Source existence is validated at the time of output generation. Evidence currency is checked against a defined threshold. Provenance records are retained for the required duration. Clinicians can access provenance metadata for each recommendation.
Intermediate Implementation — All basic capabilities plus: evidence grades are recorded and presented for each source. Applicability mapping compares cited evidence to the patient's specific context. Real-time citation verification confirms source existence against trusted bibliographic databases. Automated evidence source monitoring detects guideline revisions and drug safety alerts. Provenance is presented inline at the point of care in a structured, scannable format.
Advanced Implementation — All intermediate capabilities plus: automated evidence source ranking prioritises higher-grade, more recent, and more applicable sources. Cross-jurisdictional evidence mapping accounts for variation in approved indications and guideline authority. Independent audit of provenance chain completeness and accuracy is conducted at least annually. The provenance system detects and flags retracted or corrected publications in real time. Full integration with adverse event reporting per AG-524 ensures that provenance failures are captured in safety reporting.
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Evidence Source Linkage Completeness
Test 8.2: Hallucinated Citation Detection
Test 8.3: Evidence Currency Validation
Test 8.4: Evidence Grade Transparency
Test 8.5: Clinical Applicability Verification
Test 8.6: Provenance Chain Immutability
Test 8.7: Provenance Retention and Retrieval
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| EU AI Act | Article 10 (Data and Data Governance) | Direct requirement |
| EU AI Act | Article 13 (Transparency) | Direct requirement |
| EU MDR | Article 61 (Clinical Evaluation) | Direct requirement |
| EU MDR | Annex XIV (Clinical Evaluation Documentation) | Supports compliance |
| HIPAA | Security Rule §164.312(c) (Integrity Controls) | Supports compliance |
| FDA 21 CFR Part 11 | §11.10 (Controls for Closed Systems) | Direct requirement |
| NIST AI RMF | MAP 2.3, MEASURE 2.6 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks) | Supports compliance |
Article 10 requires that training, validation, and testing datasets for high-risk AI systems be relevant, representative, free of errors, and complete. For clinical AI agents, the evidence corpus is the operational data that drives recommendations. Evidence provenance governance ensures that this operational data meets Article 10's quality requirements: each source is identified (relevant), validated (free of errors in citation), current (complete with respect to the latest evidence), and appropriately graded (representative of the true evidence landscape). An agent that cites hallucinated or stale evidence fails Article 10's data quality mandate.
Article 13 requires that high-risk AI systems be designed to enable users to interpret the system's output and use it appropriately. Evidence provenance is the primary mechanism through which clinical AI transparency is achieved. A recommendation without provenance is opaque — the clinician cannot interpret its basis. Provenance governance, including evidence grade disclosure and applicability mapping, directly supports the clinician's ability to interpret and appropriately use the AI's output.
The EU MDR requires manufacturers of medical devices — including clinical decision-support software classified as a medical device — to conduct clinical evaluation demonstrating that the device achieves its intended benefits and that undesirable side effects are acceptable. For an AI agent making clinical recommendations, clinical evaluation requires demonstrating that the recommendations are based on adequate clinical evidence. AG-523 provides the governance framework ensuring that every recommendation's evidential basis is documented, validated, current, and appropriately graded — the prerequisites for demonstrating clinical evaluation compliance.
HIPAA requires integrity controls for electronic protected health information. Clinical recommendations generated by an AI agent, when incorporated into a patient's medical record, become part of the patient's PHI. The integrity of these recommendations depends on the integrity of their evidential basis. A recommendation based on a hallucinated citation or outdated guideline has compromised integrity. Provenance governance ensures the evidentiary integrity of clinical outputs that become part of the medical record.
21 CFR Part 11 requires that electronic records used in FDA-regulated processes be attributable, legible, contemporaneous, original, and accurate (the ALCOA principles). Provenance records for clinical AI outputs — particularly in pharmaceutical development and clinical trial contexts — are electronic records subject to Part 11. The provenance chain must be attributable (each record identifies the agent, the evidence sources, and the clinician), contemporaneous (created at the time of output generation, not retrospectively), and accurate (citations verified, grades correct). AG-523's requirements for tamper-evident, validated provenance records directly support Part 11 compliance.
MAP 2.3 addresses the identification and documentation of AI system data characteristics. Evidence provenance governance implements this by documenting the characteristics of the evidence data driving clinical recommendations — source identity, currency, grade, and applicability. MEASURE 2.6 addresses the assessment of AI system performance and reliability. Provenance governance enables performance assessment by making the evidential basis for each recommendation inspectable, allowing systematic evaluation of whether the agent is using appropriate, current, and correctly graded evidence.
ISO 42001 requires organisations to identify and address risks associated with AI systems. In clinical contexts, evidence provenance failure is a high-severity risk with direct patient safety implications. AG-523 provides the specific control measures addressing this risk: citation verification, currency monitoring, grade transparency, and applicability mapping. Organisations pursuing ISO 42001 certification for clinical AI systems can demonstrate risk treatment through conformance with this dimension.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Patient-level harm with potential for population-level harm when systematic evidence provenance failures affect multiple patients across clinical settings |
Consequence chain: An evidence provenance failure begins with a clinical recommendation that is either unsupported by evidence (hallucinated citation), supported by stale evidence (outdated guideline), or supported by misgraded evidence (case report presented as strong evidence). The immediate clinical consequence is that a healthcare professional acts on a recommendation whose evidential basis is unknown, incorrect, or misleading. In Scenario A, this led to supratherapeutic anticoagulation causing major bleeding (£47,000 in care costs, £215,000 malpractice settlement). In Scenario B, a patient received misinformation about treatment options based on a non-existent trial (£38,000 in investigation costs, 4-month agent suspension). In Scenario C, inappropriate medication caused hyperkalaemia requiring emergency care (£2,800 direct cost, professional regulator complaint). At population scale, a systematic provenance failure — such as an outdated guideline remaining in the evidence corpus across all agent deployments — could affect thousands of patients before detection. The regulatory consequence includes EU MDR enforcement for inadequate clinical evaluation, EU AI Act enforcement for data quality and transparency failures, FDA warning letters for Part 11 violations, and HIPAA enforcement for integrity control failures. The institutional consequence includes loss of clinician trust in AI-assisted decision support, reverting to unassisted clinical workflows with their own error rates, and reputational damage that impedes future AI adoption in clinical settings. The ultimate consequence is that patients are harmed by recommendations that appear to be evidence-based but are not — a failure that strikes at the foundation of evidence-based medicine.
Cross-references: AG-006 (Tamper-Evident Record Integrity) provides the immutability and tamper-evidence mechanisms for provenance records. AG-450 (Decision Summary Provenance Governance) provides the general provenance framework that AG-523 specialises for clinical contexts. AG-519 (Clinical Indication Scope Governance) defines the scope constraints that provenance applicability mapping must enforce. AG-521 (Diagnostic Confidence Threshold Governance) consumes evidence grade information from provenance records to calibrate diagnostic confidence. AG-524 (Adverse Event Reporting Integration Governance) triggers adverse event reports when provenance failures contribute to patient harm. AG-528 (Trial Protocol Deviation Governance) governs trial evidence integrity that provenance chains reference. AG-415 (Decision Journal Completeness Governance) ensures that provenance information is captured in decision journals. AG-036 (Reasoning Integrity Governance) governs the reasoning process that translates evidence into clinical recommendations.