AG-364

Conversation Summarisation Fidelity Governance

Prompt, Context & Session Management · ~16 min read · AGS v2.1 · April 2026
Regulatory tags: EU AI Act · GDPR · SOX · FCA · NIST · ISO 42001

2. Summary

Conversation Summarisation Fidelity Governance requires that when session histories are summarised — whether for context window management, handoff preparation, audit logging, or knowledge extraction — the summaries preserve material uncertainty, decisions, commitments, and constraints with sufficient specificity for any downstream consumer to act correctly. Summarisation is lossy by nature: it compresses detailed interaction history into shorter representations. The governance risk arises when the compression loses material information — when a specific commitment becomes a vague reference, when an expressed uncertainty becomes a false certainty, when a conditional decision becomes an unconditional one, or when a constraint is omitted entirely. This dimension mandates that summarisation is governed as a fidelity-critical process with defined preservation requirements, quality verification, and monitoring.

3. Example

Scenario A — Commitment Specificity Lost in Summary: A financial advice agent conducts a 40-turn consultation. In turn 12, the agent commits: "Based on your circumstances, I recommend the Standard Growth Fund with a maximum investment of £75,000, conditional on completing the risk questionnaire by 31 March 2026." The session is summarised for handoff to a human advisor. The summary states: "Agent recommended Standard Growth Fund." The conditionality (risk questionnaire completion), the amount cap (£75,000), and the deadline (31 March 2026) are all lost. The human advisor processes the investment for £120,000 without the risk questionnaire. The customer later claims they were told the cap was £75,000 and the questionnaire was required. Remediation cost: £45,000 plus regulatory scrutiny.

What went wrong: The summarisation preserved the topic (fund recommendation) but lost the specifics that made the commitment actionable and bounded. The conditional nature of the recommendation was reduced to an unconditional assertion. The human advisor had no way of knowing that conditions and limits existed because the summary did not preserve them.

Scenario B — Material Uncertainty Converted to False Certainty: A clinical decision support agent discusses a patient's symptoms over 18 turns. The agent states in turn 7: "The symptoms are consistent with three possible diagnoses: condition A (most likely at approximately 60% probability), condition B (approximately 25%), or condition C (approximately 15%). I recommend further testing before confirming any diagnosis." The session summary for the patient record states: "Agent identified condition A as the diagnosis." The uncertainty, the alternative diagnoses, and the recommendation for further testing are all lost. A clinician reading the summary proceeds with treatment for condition A without further testing. The patient actually has condition C, and the inappropriate treatment causes adverse effects requiring 3 weeks of hospital care costing £28,000.

What went wrong: The summarisation converted a probabilistic assessment with explicit uncertainty into a deterministic diagnosis. The recommendation for further testing — a safety-critical output — was omitted. The summary was factually consistent with the conversation (condition A was indeed identified as most likely) but was materially misleading because it removed the context that made the identification provisional.

Scenario C — Declined Consent Omitted From Summary: A customer-facing agent offers a customer enrolment in a marketing programme. The customer declines: "No, I do not want to receive marketing communications. Please do not sign me up." The session summary for the CRM system states: "Discussed marketing programme with customer." The explicit refusal is not recorded. A subsequent automated process, reading the summary as indicating interest, enrols the customer in the marketing programme. The customer files a GDPR complaint. The organisation faces a data protection investigation and a potential fine of up to 4% of annual turnover.

What went wrong: The summarisation treated the customer's refusal as a topic rather than as a binding decision. The summary preserved the subject matter (marketing programme discussed) but lost the outcome (customer declined). Downstream processes interpreted the omission as absence of refusal, defaulting to enrolment.

4. Requirement Statement

Scope: This dimension applies to any AI agent deployment where session or conversation content is summarised for any purpose. This includes: context window management (summarising earlier turns to fit within token limits), session handoff preparation (summarising context for a receiving human or agent), audit log generation (summarising interactions for compliance records), knowledge extraction (summarising conversations to update knowledge bases), and any other process that produces a compressed representation of interaction content. The dimension applies regardless of whether summarisation is performed by the agent itself, by a separate summarisation model, or by rule-based extraction. The test is: is any downstream process or person acting on a summarised version of an interaction rather than the full original? If yes, this dimension applies.

4.1. A conforming system MUST define and document a set of material content categories that summarisation must preserve, including at minimum: decisions made, commitments given, constraints stated, uncertainties expressed, refusals or declinations, and regulatory or safety flags raised.

4.2. A conforming system MUST verify that summaries preserve material content with sufficient specificity for downstream consumers to act correctly, including numerical values, conditions, deadlines, and uncertainty qualifications where present in the original.

4.3. A conforming system MUST retain the original unsummarised content alongside any summary, with the ability to retrieve the original when the summary's fidelity is questioned.
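Requirement 4.3 can be sketched as a store that keeps the full transcript and its summary under a shared session ID, so the original is retrievable whenever a summary's fidelity is questioned. This is an illustrative sketch only: the class and method names are hypothetical, and an in-memory dict stands in for a durable, access-controlled store.

```python
class SummaryStore:
    """Hypothetical store linking each summary to its unsummarised original."""

    def __init__(self):
        self._records = {}  # session_id -> {"original": [...], "summary": str}

    def save(self, session_id: str, original: list[str], summary: str) -> None:
        # Persist the full transcript alongside the summary (requirement 4.3).
        self._records[session_id] = {"original": original, "summary": summary}

    def summary(self, session_id: str) -> str:
        return self._records[session_id]["summary"]

    def original(self, session_id: str) -> list[str]:
        """Retrieve the full unsummarised transcript for fidelity verification."""
        return self._records[session_id]["original"]

store = SummaryStore()
store.save("S-42", ["turn 1: ...", "turn 2: ..."], "Short summary of session.")
print(store.original("S-42")[0])
```

A production implementation would add retention policies and audit logging around retrieval, but the linkage itself is the control the requirement mandates.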

4.4. A conforming system MUST ensure that summaries do not convert expressed uncertainties into false certainties — where the original content expressed probability, conditionality, or alternatives, the summary must preserve that uncertainty.

4.5. A conforming system MUST ensure that explicit refusals, declinations, or negative decisions are preserved in summaries with the same prominence as positive decisions.

4.6. A conforming system SHOULD implement automated fidelity checks that compare summaries against the original content for preservation of material content categories, flagging summaries where material content may have been lost.

4.7. A conforming system SHOULD use structured summary formats (e.g., separate sections for decisions, commitments, constraints, open questions) rather than unstructured narrative summaries, to reduce the risk of material content being lost in prose compression.

4.8. A conforming system SHOULD tag each element in a structured summary with a reference to the original content location (e.g., turn number, timestamp), enabling verification of specific claims against the original source.
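Requirements 4.7 and 4.8 together suggest a structured summary record with dedicated sections per material category, each element tagged with its originating turn. The field and class names below are hypothetical illustrations, not mandated by this protocol.

```python
from dataclasses import dataclass, field

@dataclass
class SummaryElement:
    text: str         # preserved content, with numerical values and conditions intact
    source_turn: int  # turn number in the original conversation (requirement 4.8)

@dataclass
class StructuredSummary:
    """Structured format per requirement 4.7: one section per material category."""
    decisions: list[SummaryElement] = field(default_factory=list)
    commitments: list[SummaryElement] = field(default_factory=list)
    constraints: list[SummaryElement] = field(default_factory=list)
    uncertainties: list[SummaryElement] = field(default_factory=list)
    refusals: list[SummaryElement] = field(default_factory=list)

summary = StructuredSummary(
    commitments=[SummaryElement(
        "Recommend Standard Growth Fund, max £75,000, conditional on "
        "risk questionnaire completion by 31 March 2026.",
        source_turn=12,
    )],
)
print(summary.commitments[0].source_turn)  # 12
```

Because each category has its own section, an empty `refusals` list is an explicit, auditable statement that no refusal occurred, rather than a silent omission in prose.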

4.9. A conforming system MAY implement multi-level summarisation with progressive compression: a detailed summary that preserves nearly all material content (e.g., 40% compression), a standard summary (70% compression), and a headline summary (90% compression), with clear labelling of the compression level and what categories of content each level preserves.
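The multi-level scheme in 4.9 can be sketched as metadata attached to every summary declaring its compression level and which material categories it preserves, so consumers know what may be absent. The level-to-category mapping below is illustrative; the percentages mirror the examples in 4.9.

```python
# Illustrative mapping of compression levels to preserved categories (4.9).
LEVELS = {
    "detailed": {"compression": 0.40,
                 "preserves": ["decisions", "commitments", "constraints",
                               "uncertainties", "refusals", "flags"]},
    "standard": {"compression": 0.70,
                 "preserves": ["decisions", "commitments", "refusals"]},
    "headline": {"compression": 0.90,
                 "preserves": ["decisions"]},
}

def label_summary(text: str, level: str) -> dict:
    """Attach compression level and preservation scope to a summary."""
    meta = LEVELS[level]
    return {"level": level,
            "compression": meta["compression"],
            "preserves": meta["preserves"],
            "text": text}

labelled = label_summary(
    "Agent recommended Standard Growth Fund (conditions apply).", "headline")
print(labelled["preserves"])  # consumer sees only decisions are guaranteed
```

The key design choice is that the label travels with the summary: a downstream process reading a headline-level summary can see that constraints and refusals are out of scope and must consult a more detailed level before acting.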

5. Rationale

Summarisation is ubiquitous in AI agent systems. Context windows are finite, handoffs require compact context transfers, audit logs must be manageable, and knowledge bases need distilled insights. The question is not whether summarisation will occur but whether it will be governed.

Ungoverned summarisation introduces a specific class of risk: silent information loss. Unlike truncation, where content is visibly removed, summarisation replaces the original with a representation that appears complete. The downstream consumer — whether human or agent — has no signal that material content is missing. The summary reads as a coherent account of the interaction, but it may have silently dropped conditions, uncertainties, refusals, or constraints that would change the downstream consumer's decisions.

Three characteristics make summarisation fidelity particularly important for governance. First, summaries are frequently the only record that downstream consumers see. A human advisor receiving a handoff typically reads the summary, not the full 40-turn conversation. A compliance auditor reviewing interactions typically reviews summaries, not transcripts. If the summary is inaccurate, the downstream consumer's understanding and decisions are based on inaccurate information. Second, summaries tend to systematically lose specific types of content. Numerical specifics are generalised ("about £75,000" becomes "an investment"), conditions are dropped ("conditional on completing the questionnaire" becomes "recommended"), and uncertainties are flattened ("approximately 60% probability" becomes "identified as the diagnosis"). This systematic loss is predictable and preventable with appropriate governance. Third, summarisation fidelity failures compound across the system. A summary that loses a constraint is used to generate another summary that loses more context, and eventually the downstream record bears little resemblance to the original interaction.

The fidelity requirement does not demand that summaries be verbatim transcripts — that would defeat the purpose of summarisation. It requires that material content categories are identified, preservation standards are defined, and fidelity is verified. A summary can be brief and still faithful; a summary can be detailed and still misleading. The governance framework ensures that brevity is achieved without sacrificing material content.

6. Implementation Guidance

Conversation Summarisation Fidelity Governance requires defining what must be preserved, implementing mechanisms to verify preservation, and monitoring fidelity in production. The core principle is that summarisation is a governed transformation with defined quality requirements, not an uncontrolled compression.

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. Summaries of financial advice interactions must preserve: specific product recommendations with conditions, risk warnings with their stated basis, suitability assessment outcomes with the assessed criteria, value amounts with precision, and any disclaimers or limitations stated. FCA expectations for record-keeping under COBS 9.5 (suitability records) require that the basis for advice can be reconstructed from records. A summary that loses the conditionality of a recommendation may not meet this standard.

Healthcare. Summaries of clinical interactions must preserve: differential diagnoses with stated probabilities, recommended tests with their clinical rationale, contraindications with their basis, patient-expressed preferences and refusals, and informed consent status. Loss of diagnostic uncertainty in a clinical summary can lead to premature treatment decisions with direct patient safety consequences.

Legal. Summaries of legal consultations must preserve: legal advice given with its stated limitations, client instructions with their conditions, conflict of interest disclosures, and privilege assertions. Loss of advice limitations in a summary could expose the firm to professional liability claims.

Maturity Model

Basic Implementation — The organisation has defined material content categories and documented preservation requirements. Summaries are generated using structured templates with dedicated sections for decisions, commitments, constraints, uncertainties, and refusals. Original content is retained alongside summaries with linkage. Fidelity is checked manually on a sample basis (e.g., 5% of summaries reviewed monthly). This level meets the minimum mandatory requirements but relies on sampling rather than systematic fidelity verification.

Intermediate Implementation — All basic capabilities plus: automated fidelity scoring checks every summary against the original for preservation of material content. Summaries scoring below the defined threshold are flagged for regeneration or human review. Extractive-then-abstractive pipelines ensure material content is identified before compression. Summary elements are tagged with references to original content locations. Multi-level summarisation provides different compression levels for different purposes. Fidelity metrics are tracked and reported.
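The extractive-then-abstractive pipeline with threshold-based flagging described above can be sketched as follows. The extraction markers, the scoring method (crude token survival), and the 0.8 threshold are all illustrative assumptions; a conforming implementation would use semantic comparison and a threshold calibrated per Test 8.5.

```python
import re

def extract_material(turns: list[str]) -> list[str]:
    """Extractive step: keep turns carrying amounts, years, conditions, or refusals."""
    markers = re.compile(r"£[\d,]+|\b\d{4}\b|conditional|do not|decline",
                         re.IGNORECASE)
    return [t for t in turns if markers.search(t)]

def fidelity_score(material: list[str], summary: str) -> float:
    """Crude score: fraction of material turns whose amount/year tokens survive."""
    if not material:
        return 1.0
    kept = sum(
        1 for m in material
        if any(tok in summary for tok in re.findall(r"£[\d,]+|\b\d{4}\b", m))
    )
    return kept / len(material)

THRESHOLD = 0.8  # illustrative; calibrate against human fidelity judgements

turns = [
    "I recommend the fund, max £75,000, conditional on the questionnaire.",
    "The deadline for the questionnaire is 31 March 2026.",
]
summary = "Agent recommended the fund."  # abstractive step output (lossy)
material = extract_material(turns)
if fidelity_score(material, summary) < THRESHOLD:
    print("flagged: regenerate summary or route for human review")
```

Identifying material content before compression, rather than after, is what distinguishes this pattern: the extractive step produces an explicit checklist that the scoring step can verify the abstractive output against.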

Advanced Implementation — All intermediate capabilities plus: fidelity verification uses independent models to assess whether a downstream consumer reading only the summary would make the same decisions as one reading the original. A/B testing compares downstream decision quality between summary consumers and full-content consumers to calibrate compression levels. The organisation can demonstrate to regulators that summarisation preserves material content at defined fidelity levels with statistical evidence. Real-time fidelity dashboards track preservation rates across all summarisation processes.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Commitment Preservation Fidelity

Test 8.2: Uncertainty Preservation

Test 8.3: Refusal and Declination Preservation

Test 8.4: Downstream Decision Equivalence

Test 8.5: Fidelity Score Calibration

Test 8.6: Original Content Retrievability

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 12 (Record-Keeping) | Direct requirement
EU AI Act | Article 9 (Risk Management System) | Supports compliance
SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance
FCA SYSC | 6.1.1R (Systems and Controls) | Supports compliance
FCA COBS | 9.5 (Suitability Records) | Direct requirement
NIST AI RMF | MANAGE 2.2, MAP 3.2 | Supports compliance
ISO 42001 | Clause 8.1 (Operational Planning and Control) | Supports compliance

EU AI Act — Article 12 (Record-Keeping)

Article 12 requires that high-risk AI systems are designed and developed with logging capabilities that enable the recording of events relevant to identifying risk situations and post-market monitoring. Summaries of AI agent interactions are a primary mechanism for record-keeping. If those summaries lose material content, the record-keeping requirement is not met — the organisation cannot reconstruct what occurred, what risks were identified, or what decisions were made. AG-364 ensures that summarised records preserve the information that Article 12 requires to be logged.

FCA COBS — 9.5 (Suitability Records)

COBS 9.5 requires firms providing personal recommendations to retain records sufficient to demonstrate the basis for the recommendation, the client's personal circumstances, the firm's assessment of suitability, and any risks disclosed. For AI agents providing financial advice, the interaction summary is often the primary suitability record. If the summary loses the conditionality of a recommendation (as in Scenario A), the suitability basis cannot be reconstructed. AG-364's preservation requirements for commitments, conditions, and constraints directly support COBS 9.5 compliance.

SOX — Section 404 (Internal Controls Over Financial Reporting)

For financial agents, summarised records of transactions and decisions form part of the audit trail for internal controls. SOX auditors need to verify that controls operated as designed. If transaction summaries lose the conditions under which decisions were made or the constraints that applied, the audit trail is incomplete. Faithful summarisation preserves the auditability of AI-driven financial processes.

10. Failure Severity

Field | Value
Severity Rating | Medium-High
Blast Radius | Downstream-dependent — affects every process, person, or system that consumes the unfaithful summary, which may span multiple departments and timeframes

Consequence chain: A summary loses material content — a condition, a refusal, a numerical limit, or an uncertainty qualification. A downstream consumer (human advisor, audit system, agent, or automated workflow) acts on the summary without awareness that material content is missing. The immediate technical failure is an inaccurate record that appears complete. The operational impact depends on what was lost and who consumed the summary: a financial advisor acting on a summary without conditions processes an unbounded transaction (£45,000 remediation in Scenario A); a clinician acting on a summary without uncertainty initiates inappropriate treatment (£28,000 adverse outcome in Scenario B); an automated system acting on a summary without a refusal enrols a customer against their explicit wishes (GDPR investigation in Scenario C). The business consequence includes regulatory enforcement for inadequate record-keeping, customer remediation costs, clinical or safety incidents, data protection investigations, and loss of audit trail integrity. The failure is particularly insidious because the summary appears complete — there is no visible signal that material content was lost, and the downstream consumer has no reason to consult the original unless they are already suspicious.

Cross-references: AG-005 (Instruction Integrity Verification), AG-095 (Prompt Integrity Governance), AG-122 (Prompt Versioning & Rollback Control), AG-125 (Prompt Drift Detection), AG-361 (Context Truncation Risk Governance), AG-363 (Session Handoff Integrity Governance).

Cite this protocol
AgentGoverning. (2026). AG-364: Conversation Summarisation Fidelity Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-364