AG-331: Episodic versus Semantic Memory Separation Governance

2. Summary

Episodic versus Semantic Memory Separation Governance requires that AI agents maintain a structural distinction between episodic memory (records of specific events, interactions, and observations that occurred at a particular time) and semantic memory (durable knowledge, facts, rules, and generalisations that are considered persistently true). Without this separation, agents conflate one-time events with standing knowledge, leading to overgeneralisation from single incidents, inability to apply appropriate retention policies, and degraded reasoning quality. This dimension ensures that the agent's memory architecture reflects the fundamental cognitive distinction between "what happened" and "what is known."

3. Example

Scenario A -- Single Event Treated as Standing Knowledge: A customer-facing agent handles a call from Customer Z who is frustrated and says: "I will never buy from you again." The agent writes this to undifferentiated persistent memory. In subsequent interactions, the agent retrieves this memory and treats it as a standing fact about Customer Z's purchasing intent. When Customer Z contacts the agent 3 months later to place an order, the agent responds with messaging optimised for win-back rather than normal service, referencing the prior negative sentiment. Customer Z is confused and irritated -- they had simply had a bad day and have been purchasing regularly through the website since.

What went wrong: The agent stored a momentary emotional expression (episodic event) as a durable fact about customer intent (semantic knowledge). Without memory type separation, the agent could not distinguish between "Customer Z said X on Date Y" (episodic) and "Customer Z does not intend to purchase" (semantic). Consequence: Customer experience degradation, potential loss of an active customer, agent trust erosion.

Scenario B -- Retention Policy Cannot Differentiate: An enterprise agent accumulates 500,000 memory entries over 18 months. The organisation needs to implement GDPR-compliant retention policies: event records should be retained for 90 days; verified knowledge should be retained for 365 days. However, all entries are stored in a single undifferentiated memory store with no type classification. The organisation cannot apply differentiated retention without first classifying every entry -- a manual task estimated at 2,000 person-hours.

What went wrong: The memory architecture did not separate episodic and semantic entries at write time. Retroactive classification is orders of magnitude more expensive than classification at point of creation. Consequence: £160,000 in classification costs (2,000 hours at £80/hour), 6-month delay in GDPR compliance programme, interim regulatory risk.

Scenario C -- Knowledge Overwritten by Event: An embodied robotic agent operating in a warehouse knows (semantic memory) that Aisle 7 contains hazardous materials and requires safety protocols. During a shift, a temporary reorganisation moves hazardous materials to Aisle 12 for 4 hours. The agent observes (episodic memory) "hazardous materials are in Aisle 12." Because the memory store does not distinguish types, the event observation overwrites the semantic knowledge. After the temporary reorganisation ends and the materials return to Aisle 7, the agent's memory still indicates that hazardous materials are in Aisle 12, so the agent does not apply safety protocols in Aisle 7.

What went wrong: A transient event observation replaced durable operational knowledge because the memory architecture had no type separation. Consequence: Safety protocol failure, potential workplace incident, regulatory investigation by HSE.

4. Requirement Statement

Scope: This dimension applies to every AI agent that maintains persistent memory containing both records of specific events or interactions and durable knowledge or facts. This includes agents that learn from interactions (writing episodic memories) and apply learned knowledge in future interactions (using semantic memories). Agents with only one type of persistent memory (e.g., a pure knowledge base with no event recording, or a pure event log with no knowledge extraction) are partially in scope -- they must still classify their single memory type and enforce type-appropriate policies. The scope test is: does the agent's persistent memory contain entries that differ in temporal validity? If some entries are valid only for a specific time window (episodic) while others are valid indefinitely until superseded (semantic), full separation governance applies.

4.1. A conforming system MUST maintain episodic memory and semantic memory in logically or physically separate stores, with distinct schemas, retrieval interfaces, and retention policies.

4.2. A conforming system MUST classify every memory entry as either episodic or semantic at write time, before commitment to persistent storage.

4.3. A conforming system MUST apply type-appropriate retention policies: episodic memories receive shorter TTLs reflecting their transient nature, and semantic memories receive longer TTLs combined with update-based expiry rather than time-based expiry alone.

4.4. A conforming system MUST prevent direct overwrite of semantic memory entries by episodic observations without a validation step that confirms the semantic knowledge has genuinely changed.

4.5. A conforming system MUST tag episodic memories with temporal context: when the event occurred, the session in which it was observed, and any known temporal bounds on the event's validity.

4.6. A conforming system SHOULD implement a promotion pathway where episodic memories that are confirmed by multiple observations or authoritative sources can be promoted to semantic memory through a defined validation process.

4.7. A conforming system SHOULD implement retrieval context that indicates to the agent whether a retrieved memory is episodic or semantic, enabling the agent to weight the information appropriately in its reasoning.

4.8. A conforming system SHOULD support temporal queries against episodic memory (e.g., "what happened with Customer Z in March 2026") and factual queries against semantic memory (e.g., "what is Customer Z's current account status") through distinct retrieval interfaces.

4.9. A conforming system MAY implement automatic knowledge extraction that proposes semantic entries derived from patterns across multiple episodic entries, subject to the promotion validation process.
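
Requirements 4.1, 4.2, and 4.5 can be illustrated with a minimal Python sketch. It assumes a hypothetical in-process data model: the class names `EpisodicEntry` and `SemanticEntry`, their field names, and the `validate_for_write` helper are illustrative assumptions and are not defined by AG-331 or AG-329.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class MemoryType(Enum):
    EPISODIC = "EPISODIC"
    SEMANTIC = "SEMANTIC"


@dataclass(frozen=True)
class EpisodicEntry:
    """A record of something that happened; temporal tags are mandatory (4.5)."""
    content: str
    occurred_at: datetime                   # when the event occurred
    session_id: str                         # session in which it was observed
    valid_until: Optional[datetime] = None  # known bound on validity, if any
    memory_type: MemoryType = MemoryType.EPISODIC


@dataclass(frozen=True)
class SemanticEntry:
    """A durable fact about an entity, versioned rather than time-bounded."""
    entity: str
    attribute: str
    value: str
    version: int
    updated_at: datetime
    memory_type: MemoryType = MemoryType.SEMANTIC


def validate_for_write(entry) -> MemoryType:
    """Return the entry's type, or refuse the write if it is not well formed (4.2)."""
    if isinstance(entry, EpisodicEntry):
        if entry.occurred_at.tzinfo is None or not entry.session_id:
            raise ValueError("episodic entry needs a timezone-aware timestamp and a session id")
        return MemoryType.EPISODIC
    if isinstance(entry, SemanticEntry):
        if entry.version < 1:
            raise ValueError("semantic entry needs a version of at least 1")
        return MemoryType.SEMANTIC
    raise TypeError("entry has no memory type; refusing to commit")
```

Using two distinct classes rather than one class with an optional type field means an untyped entry cannot be constructed by accident; whether that trade-off suits a given system is a design decision outside this protocol.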

5. Rationale

The distinction between episodic and semantic memory is fundamental to coherent knowledge management. Episodic memory records what happened: specific events, interactions, observations, bound to a particular time and context. Semantic memory records what is known: durable facts, rules, generalisations that persist until superseded by new knowledge. These two types of memory have fundamentally different properties.

Episodic memories have inherent temporal bounds. "Customer Z expressed frustration on 15 March 2026" is an event record that is permanently true as a historical fact but whose relevance to current decision-making decays rapidly. Semantic memories have validity bounds tied to the underlying truth. "Customer Z's preferred delivery address is 42 Oak Street" is valid until the customer changes their address, regardless of when the fact was first recorded.

When these types are conflated in a single undifferentiated store, three problems arise. First, retention policies cannot be correctly applied: episodic memories typically warrant shorter retention than semantic memories, but a single store forces a single policy. Second, retrieval quality degrades: the agent cannot distinguish between "what happened once" and "what is generally true," leading to overgeneralisation from single events (Scenario A) or undergeneralisation when valid knowledge is treated as a one-time event. Third, knowledge integrity is compromised: episodic observations can overwrite semantic knowledge without validation (Scenario C), destroying durable knowledge based on transient events.

The separation does not require physical separation of storage (though that is the strongest implementation). Logical separation through schema design, type tagging, and distinct retrieval interfaces achieves the governance objectives while allowing shared infrastructure. The critical requirement is that the system treats episodic and semantic memories as fundamentally different data types with different lifecycle rules, retrieval semantics, and validation requirements.
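
As an illustration of logical separation on shared infrastructure, the following sketch uses Python's standard `sqlite3` module with a single table. The table layout, column names, and the two query functions `what_happened` and `what_is_known` are assumptions made for this example, not a prescribed schema.

```python
import sqlite3

# One shared table; the constraints make the type tag mandatory and force
# episodic rows to carry a timestamp.
SCHEMA = """
CREATE TABLE memory (
    id          INTEGER PRIMARY KEY,
    memory_type TEXT NOT NULL CHECK (memory_type IN ('EPISODIC', 'SEMANTIC')),
    entity      TEXT NOT NULL,
    content     TEXT NOT NULL,
    occurred_at TEXT,
    CHECK (memory_type = 'SEMANTIC' OR occurred_at IS NOT NULL)
);
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def what_happened(conn, entity, start, end):
    """Temporal interface: episodic rows only, within [start, end) ISO dates."""
    rows = conn.execute(
        "SELECT occurred_at, content FROM memory "
        "WHERE memory_type = 'EPISODIC' AND entity = ? "
        "AND occurred_at >= ? AND occurred_at < ? ORDER BY occurred_at",
        (entity, start, end),
    )
    return rows.fetchall()


def what_is_known(conn, entity):
    """Factual interface: semantic rows only, no time filter."""
    rows = conn.execute(
        "SELECT content FROM memory WHERE memory_type = 'SEMANTIC' AND entity = ?",
        (entity,),
    )
    return [row[0] for row in rows]
```

In this sketch an insert with a missing type, or an episodic insert without `occurred_at`, is rejected by the database itself with `sqlite3.IntegrityError`, so the separation does not depend on the caller behaving correctly.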

6. Implementation Guidance

Implementing episodic-semantic separation requires decisions at three layers: storage schema, write-time classification, and retrieval interface.

Recommended Patterns:

Dual-store architecture. Maintain two separate persistent stores: an episodic store (optimised for temporal queries, with session-based partitioning and automatic TTL enforcement) and a semantic store (optimised for factual queries, with entity-based indexing and update-based versioning). The AG-329 write approval gate classifies each proposed entry and routes it to the appropriate store. Example: episodic store implemented as a time-series database with 90-day default TTL; semantic store implemented as a knowledge graph with entity-keyed nodes and version history.
Type-tagged single store with enforced retrieval separation. Use a single underlying storage engine but enforce type separation through schema constraints and retrieval middleware. Every entry carries a mandatory memory_type field (enum: EPISODIC, SEMANTIC) that cannot be null. Retrieval queries are routed through type-specific interfaces that apply appropriate scoring, decay, and context injection. The agent receives explicit type annotations with retrieved memories: "[EPISODIC, 2026-03-15] Customer Z expressed frustration about delivery delay" versus "[SEMANTIC, last updated 2026-02-01] Customer Z preferred delivery method: next-day."
Promotion pipeline with validation. Implement a pipeline that monitors episodic entries for patterns suggesting durable knowledge. When 3 or more episodic entries from different sessions corroborate the same fact, the pipeline proposes a semantic entry for validation. Example: if the agent records "Customer Z requested vegan options" in 3 separate interactions over 2 months, the pipeline proposes a semantic entry: "Customer Z has a vegan dietary preference." The proposal requires either automated validation against an authoritative source or human approval before commitment to the semantic store.
Temporal context injection at retrieval. When the agent retrieves episodic memories, inject temporal context: how old the memory is, the session context in which it was observed, and any known temporal bounds. This enables the agent to appropriately weight the information. For example: "This memory is from 87 days ago during a support interaction. The customer was discussing a specific delivery issue." This context helps the agent avoid treating a stale episodic memory as current.
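
The type annotation and temporal context injection patterns above can be sketched as a small rendering function. The dictionary keys and the `annotate` function name are hypothetical; only the bracketed annotation format follows the examples given in the patterns.

```python
from datetime import datetime, timezone


def annotate(entry: dict, now: datetime) -> str:
    """Render a retrieved entry with the type annotation the agent will see."""
    if entry["memory_type"] == "EPISODIC":
        age_days = (now - entry["occurred_at"]).days
        return (
            f"[EPISODIC, {entry['occurred_at'].date().isoformat()}] {entry['content']} "
            f"(observed {age_days} days ago in session {entry['session_id']})"
        )
    if entry["memory_type"] == "SEMANTIC":
        return f"[SEMANTIC, last updated {entry['updated_at'].date().isoformat()}] {entry['content']}"
    # An entry without a valid type should never reach the agent.
    raise ValueError("entry has no valid memory_type")


episodic = {
    "memory_type": "EPISODIC",
    "content": "Customer Z expressed frustration about delivery delay",
    "occurred_at": datetime(2026, 3, 15, tzinfo=timezone.utc),
    "session_id": "s-42",
}
line = annotate(episodic, now=datetime(2026, 6, 10, tzinfo=timezone.utc))
# line == "[EPISODIC, 2026-03-15] Customer Z expressed frustration about
#          delivery delay (observed 87 days ago in session s-42)"
```

Raising on an untyped entry, rather than returning it unannotated, is one way to keep the retrieval layer from silently degrading into a flat store.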

Anti-Patterns to Avoid:

Flat memory store with no type distinction. A single store where all entries are structurally identical, with no field distinguishing events from knowledge, makes it impossible to apply differentiated policies and guarantees the problems described in the examples.
Agent-decided classification. Asking the agent to decide whether its own output is episodic or semantic during reasoning is unreliable. The agent may classify strategically (e.g., labelling an inference as semantic to ensure longer retention) rather than accurately. Classification should occur at the write approval gate based on objective criteria.
Allowing episodic entries to directly update semantic entries. If an episodic observation can directly overwrite a semantic fact (as in Scenario C), a single transient event can destroy durable knowledge. The validation step is essential: episodic observations may trigger a review of semantic knowledge, but they should not automatically replace it.
Ignoring the distinction for non-personal data. Some implementations apply episodic-semantic separation only to personal data for GDPR compliance. However, the decision quality benefits of separation apply to all data types. Operational knowledge (Scenario C) benefits from separation just as much as personal data.
Promoting without threshold. Automatically promoting every episodic memory to semantic memory after a single observation creates the same problem as having no separation. The promotion threshold (multiple corroborating observations or authoritative confirmation) is what ensures semantic memory quality.
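
A promotion pipeline that respects the threshold and the validation step might look like the following sketch. It assumes that episodic entries have already been normalised so that corroborating observations share the same `entity` and `claim` values, which is the hard part in practice and is not shown here. The function names, dictionary keys, and status strings are illustrative.

```python
from collections import defaultdict

PROMOTION_THRESHOLD = 3  # distinct sessions, matching the example threshold


def propose_promotions(episodic_entries, threshold=PROMOTION_THRESHOLD):
    """Return proposals (never commits) for claims seen in enough distinct sessions."""
    sessions = defaultdict(set)
    sources = defaultdict(list)
    for entry in episodic_entries:
        key = (entry["entity"], entry["claim"])
        sessions[key].add(entry["session_id"])
        sources[key].append(entry["id"])
    return [
        {
            "entity": entity,
            "claim": claim,
            "source_ids": sources[(entity, claim)],  # provenance back to episodes
            "status": "PENDING_VALIDATION",
        }
        for (entity, claim), seen_in in sessions.items()
        if len(seen_in) >= threshold
    ]


def commit_proposal(proposal, semantic_store, validated_by=None):
    """Write to the semantic store only once a validator is recorded."""
    if not validated_by:
        raise PermissionError("promotion requires authoritative or human validation")
    semantic_store[(proposal["entity"], proposal["claim"])] = {
        **proposal,
        "status": "COMMITTED",
        "validated_by": validated_by,
    }
```

Counting distinct sessions rather than raw entries means that the same statement repeated three times in one conversation does not meet the threshold.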

Industry Considerations

Financial Services. Transaction records are episodic memories (what happened) while client profiles are semantic memories (what is known). MiFID II record-keeping rules require relevant records to be retained for at least 5 years, extendable to 7 years where the competent authority requests it, while client profile data follows GDPR storage limitation. The separation enables correct application of these different regulatory retention requirements.

Healthcare. Patient encounter records (episodic) must be distinguished from patient condition summaries (semantic). An encounter record that notes "patient reported dizziness during visit on 15 March" is different from a semantic entry "patient has chronic vertigo." The distinction is clinically significant: treatment decisions should weight diagnosed conditions (semantic) more heavily than single-visit symptoms (episodic).

Public Sector. Citizen interaction records (episodic) must be separated from eligibility determinations (semantic). A citizen saying "I recently lost my job" in a single interaction is an episodic event; "Citizen is currently unemployed" is a semantic determination that requires verification and has entitlement implications.

Maturity Model

Basic Implementation -- All memory entries carry a mandatory type tag (EPISODIC or SEMANTIC) assigned at write time by the AG-329 gate. Differentiated retention policies apply based on type. Retrieval returns type annotations with results. Episodic observations cannot directly overwrite semantic entries (blocked at write time). However, both types share a single store and retrieval interface, and no promotion pipeline exists. This meets minimum mandatory requirements.

Intermediate Implementation -- Physically or logically separate stores for episodic and semantic memory with distinct retrieval interfaces. A promotion pipeline proposes semantic entries derived from corroborated episodic observations, with configurable thresholds (e.g., 3 corroborating observations). Temporal context is injected at retrieval for episodic memories. Semantic entries maintain version history showing when they were created, updated, and the evidence that supported each version.

Advanced Implementation -- All intermediate capabilities plus: automatic knowledge extraction from episodic patterns using ML-based pattern detection. The promotion pipeline achieves greater than 90% precision on proposed semantic entries (verified by human review sampling). Semantic memory maintains provenance chains linking each fact to the episodic observations and authoritative sources that established it. The system has been independently audited for separation integrity, confirming that no episodic entry can bypass the validation step to modify semantic memory.

7. Evidence Requirements

Required artefacts:

Memory architecture documentation. Schema definitions and architecture diagrams showing the separation between episodic and semantic memory stores (or the type-tagging mechanism if using a single store with logical separation).
Classification rules. The rules used by the AG-329 write approval gate to classify entries as episodic or semantic, in structured format.
Retention policy by type. The active retention policies for each memory type, demonstrating differentiated TTL values.
Promotion pipeline specification. If implemented, the rules governing when episodic observations are promoted to semantic knowledge, including the validation threshold and approval process.
Retrieval type annotation evidence. Sample retrieval results demonstrating that the agent receives type annotations with retrieved memories.

Retention requirements:

Architecture documentation and classification rules: maintained as current; historical versions retained for minimum 3 years.

Access requirements:

Producible to regulators or auditors within 48 hours of request.

8. Test Specification

Test 8.1: Write-Time Classification Enforcement

Stimulus: Submit 100 proposed memory writes covering a mix of event records and factual knowledge through the write approval gate.
Expected behaviour: Every entry is classified as EPISODIC or SEMANTIC before commitment. No entry is committed without a type classification.
Pass criteria: All 100 entries carry a type tag. Classification accuracy (verified against human-labelled ground truth) exceeds 90%.
Fail criteria: Any entry is committed without a type tag, or classification accuracy is 90% or lower.
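
A harness for evaluating Test 8.1 could be sketched as follows. It assumes the committed entries and the human-labelled ground truth are available as plain Python structures keyed by a hypothetical `id` field; the function name and return shape are illustrative.

```python
VALID_TYPES = {"EPISODIC", "SEMANTIC"}


def evaluate_classification(committed: list, ground_truth: dict, threshold: float = 0.90) -> dict:
    """Check that every entry is tagged and that accuracy exceeds the threshold."""
    if not committed:
        raise ValueError("no committed entries to evaluate")
    # Any entry without a valid tag fails the test regardless of accuracy.
    untagged = [e["id"] for e in committed if e.get("memory_type") not in VALID_TYPES]
    correct = sum(1 for e in committed if e.get("memory_type") == ground_truth[e["id"]])
    accuracy = correct / len(committed)
    return {
        "untagged": untagged,
        "accuracy": accuracy,
        "passed": not untagged and accuracy > threshold,
    }
```

The comparison is strict (`>`) because the pass criterion asks for accuracy that exceeds 90%.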

Test 8.2: Differentiated Retention Enforcement

Stimulus: Create 50 episodic entries with a 90-day TTL and 50 semantic entries with a 365-day TTL. Advance time simulation to 91 days.
Expected behaviour: All 50 episodic entries are expired. All 50 semantic entries remain active.
Pass criteria: Zero episodic entries remain active after 91 days. All 50 semantic entries remain active and retrievable.
Fail criteria: Any episodic entry survives past its TTL, or any semantic entry is prematurely expired.
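
The time simulation in Test 8.2 can be sketched with an injected clock instead of real waiting. The TTL values come from the test itself; the entry layout and the function names are assumptions for illustration, and the sketch covers only the time-based part of retention, not update-based expiry.

```python
from datetime import datetime, timedelta, timezone

# TTL values taken from Test 8.2; a real deployment would load these from policy.
TTL_BY_TYPE = {
    "EPISODIC": timedelta(days=90),
    "SEMANTIC": timedelta(days=365),
}


def reference_time(entry: dict) -> datetime:
    """Episodic TTL runs from the event; semantic TTL restarts on each update."""
    if entry["memory_type"] == "EPISODIC":
        return entry["occurred_at"]
    return entry["updated_at"]


def is_expired(entry: dict, now: datetime) -> bool:
    return now - reference_time(entry) >= TTL_BY_TYPE[entry["memory_type"]]


def sweep(entries: list, now: datetime) -> list:
    """Return only the entries that are still within their type's TTL."""
    return [e for e in entries if not is_expired(e, now)]


# Simulate the test: 50 entries of each type, then advance the clock 91 days.
t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
entries = [{"memory_type": "EPISODIC", "occurred_at": t0} for _ in range(50)]
entries += [{"memory_type": "SEMANTIC", "updated_at": t0} for _ in range(50)]
survivors = sweep(entries, now=t0 + timedelta(days=91))
```

Passing `now` as a parameter keeps the expiry logic deterministic and testable without manipulating the system clock.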

Test 8.3: Semantic Overwrite Protection

Stimulus: Create a semantic entry: "Aisle 7 contains hazardous materials." Submit an episodic observation: "Hazardous materials observed in Aisle 12 at 14:30 on 15 March." Verify that the semantic entry is not modified.
Expected behaviour: The episodic observation is stored in episodic memory. The semantic entry remains unchanged. If the system detects a potential conflict, it flags it for review rather than overwriting.
Pass criteria: The semantic entry retains its original content. The episodic entry is stored separately. No automatic overwrite occurs.
Fail criteria: The semantic entry is modified or deleted based on the episodic observation without a validation step.
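
The behaviour exercised by Test 8.3 can be sketched as a semantic store that refuses unvalidated changes and a write path that flags conflicts for review. The `SemanticStore` class, the `record_observation` function, and the key names are hypothetical and stand in for whatever the AG-329 gate and AG-332 conflict handling provide in a real system.

```python
class SemanticStore:
    """Holds durable facts; an existing fact changes only with a named validator."""

    def __init__(self):
        self._facts = {}
        self.review_queue = []

    def get(self, key):
        return self._facts.get(key)

    def put(self, key, value, validated_by=None):
        existing = self._facts.get(key)
        if existing is not None and existing != value and not validated_by:
            raise PermissionError("changing a semantic fact requires validation")
        self._facts[key] = value


def record_observation(episodic_log: list, semantic_store: SemanticStore, observation: dict) -> None:
    """Store the event as episodic; on conflict, flag for review and never overwrite."""
    episodic_log.append(observation)
    current = semantic_store.get(observation["about"])
    if current is not None and current != observation["value"]:
        semantic_store.review_queue.append(
            {"key": observation["about"], "current": current, "observed": observation}
        )


semantic = SemanticStore()
semantic.put("hazardous_materials_location", "Aisle 7")
episodic_log = []
record_observation(
    episodic_log,
    semantic,
    {"about": "hazardous_materials_location", "value": "Aisle 12",
     "occurred_at": "2026-03-15T14:30"},
)
```

After this run the semantic fact still reads "Aisle 7", the observation sits in the episodic log, and one conflict is waiting in the review queue.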

Test 8.4: Retrieval Type Annotation

Stimulus: Query the memory store with a prompt that matches both episodic and semantic entries.
Expected behaviour: Retrieved results include type annotations (EPISODIC or SEMANTIC) and temporal context for episodic entries.
Pass criteria: Every retrieved entry carries its type annotation. Episodic entries include temporal context (event timestamp, session identifier).
Fail criteria: Any retrieved entry lacks a type annotation, or episodic entries lack temporal context.

Test 8.5: Promotion Pipeline Validation

Stimulus: Create 5 episodic entries across 5 different sessions that corroborate the same fact (e.g., "Customer Z prefers morning delivery slots" observed in 5 separate interactions).
Expected behaviour: The promotion pipeline proposes a semantic entry derived from the corroborated observations. The proposal requires validation before commitment.
Pass criteria: A semantic entry is proposed (not committed) after the threshold is met. The proposal references the source episodic entries. Commitment requires validation.
Fail criteria: No promotion is proposed, or the semantic entry is committed without validation.

Test 8.6: Separation Integrity Under Load

Stimulus: Submit 10,000 mixed memory writes (5,000 episodic, 5,000 semantic) under concurrent load.
Expected behaviour: All entries are correctly classified and routed to the appropriate store or tagged with the correct type. No cross-contamination occurs.
Pass criteria: Zero entries are misclassified or stored in the wrong store. Classification throughput does not create a bottleneck (latency under 100ms per entry at the 99th percentile).
Fail criteria: Any entry is misclassified, or classification latency exceeds 200ms at the 99th percentile under load.

Conformance Scoring

Score 0: No episodic-semantic separation -- all memories are stored in a single undifferentiated store with no type distinction.
Score 1: Type tags exist in documentation but are not enforced at the infrastructure layer -- the agent decides classification, and no differentiated policies are applied.
Score 2: Structural separation enforced at the infrastructure layer with write-time classification, differentiated retention policies, semantic overwrite protection, and type-annotated retrieval.
Score 3: Verified by independent audit -- an independent party has confirmed separation integrity, classification accuracy exceeding 90%, and that no episodic observation can bypass validation to modify semantic memory. Promotion pipeline precision exceeds 90%.

9. Regulatory Mapping

Regulation	Provision	Relationship Type
GDPR	Article 5(1)(d) (Accuracy)	Supports compliance
GDPR	Article 5(1)(e) (Storage Limitation)	Supports compliance
EU AI Act	Article 9 (Risk Management System)	Supports compliance
EU AI Act	Article 15 (Accuracy, Robustness, Cybersecurity)	Supports compliance
NIST AI RMF	MAP 2.3, MANAGE 2.2	Supports compliance
ISO 42001	Clause 6.1 (Actions to Address Risks)	Supports compliance

GDPR -- Article 5(1)(d) (Accuracy)

The accuracy principle requires that personal data be accurate and kept up to date. Episodic-semantic separation directly supports accuracy by preventing transient event observations from overwriting verified factual knowledge. When an agent conflates a single frustrated statement ("I will never buy from you again") with standing knowledge about customer intent, the resulting semantic knowledge is inaccurate. Separation ensures that event observations remain classified as events and are promoted to knowledge only through a validation process.

GDPR -- Article 5(1)(e) (Storage Limitation)

Storage limitation requires that data be kept for no longer than necessary. Episodic and semantic memories have fundamentally different retention requirements. Without separation, a single retention policy must be applied to both types, which is either too long for episodic memories or too short for semantic knowledge. Separation enables differentiated retention that is proportionate to each type's purpose.

EU AI Act -- Article 9 (Risk Management System)

Memory type conflation is a risk to AI system reliability. AG-331 mitigates this risk by ensuring that the agent's knowledge base maintains structural integrity and does not degrade through type confusion.

EU AI Act -- Article 15 (Accuracy, Robustness, Cybersecurity)

Article 15 requires high-risk AI systems to achieve appropriate levels of accuracy and robustness. An agent that cannot distinguish between events and knowledge has structurally degraded accuracy. Episodic-semantic separation is an architectural measure that supports the accuracy and robustness requirements.

NIST AI RMF -- MAP 2.3, MANAGE 2.2

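MAP 2.3 addresses the identification and documentation of scientific integrity and test, evaluation, verification, and validation (TEVV) considerations. MANAGE 2.2 addresses mechanisms to sustain the value of deployed AI systems. Memory type separation makes the agent's knowledge architecture explicit and testable, and mitigates the risk of knowledge degradation through type conflation.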

ISO 42001 -- Clause 6.1

Clause 6.1 requires actions to address risks within the AI management system. Type conflation is a knowledge integrity risk that AG-331 addresses through structural separation.

10. Failure Severity

Field	Value
Severity Rating	Medium-High
Blast Radius	Per-agent -- affects all interactions where the agent uses mixed memory types

Consequence chain: Without episodic-semantic separation, the agent conflates events with knowledge, leading to three failure modes. First, overgeneralisation: a single event is treated as standing knowledge, causing incorrect behaviour in future interactions (Scenario A). Second, retention policy failure: the organisation cannot apply differentiated retention to different memory types, creating a compliance gap estimated at £160,000 in remediation costs for a 500,000-entry store (Scenario B). Third, knowledge corruption: episodic observations overwrite durable knowledge, potentially causing safety failures in operational environments (Scenario C). The severity depends on the agent's domain: in safety-critical environments (Scenario C), the consequence can include physical harm; in customer-facing environments (Scenario A), the consequence is customer attrition and revenue loss; in enterprise environments (Scenario B), the consequence is compliance cost and regulatory risk.

Cross-references: AG-040 (Persistent Memory Governance) provides the foundational memory management framework. AG-082 (Data Minimisation Enforcement) benefits from type separation to apply minimisation rules per type. AG-122 (Knowledge Integrity Verification) is strengthened when semantic memory is structurally separated and validated. AG-132 (Memory Scope Boundary Enforcement) defines scope boundaries that apply differently to episodic and semantic stores. AG-179 (Memory Audit Trail Governance) requires type-specific audit trails. AG-329 (Memory Write Approval Governance) performs the initial classification at write time. AG-330 (Memory Decay and Expiry Governance) applies differentiated decay and expiry policies per type. AG-332 (Memory Conflict Resolution Governance) resolves conflicts that may arise between episodic observations and semantic knowledge.

Cite this protocol

AgentGoverning. (2026). AG-331: Episodic versus Semantic Memory Separation Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-331
