Residual Data Erasure Propagation Governance requires that when data is deleted, retracted, corrected, or invalidated at its source, the erasure propagates completely and verifiably to every downstream system, cache, vector store, training dataset, and derived artefact that consumed that data. Zombie data is data that has been deleted at its origin but survives in copies, caches, embeddings, fine-tuned model weights, or derived datasets — continuing to influence agent behaviour as if it were still valid. AG-130 mandates that organisations maintain a complete data lineage graph, implement propagation mechanisms that push erasure signals to all downstream consumers, and verify that erasure has been completed within defined timeframes.
Scenario A — Deleted Customer Record Persists in RAG System: A financial services firm receives a GDPR erasure request (Article 17) from a former customer. The compliance team deletes the customer's records from the primary CRM database within one month, as Article 12(3) requires. However, the customer's financial history, transaction patterns, and risk profile have been ingested into a vector store used by an AI advisory agent for retrieval-augmented generation. The embeddings remain in the vector store. Six months later, the advisory agent retrieves the deleted customer's financial pattern as a "similar customer" reference when advising a current client. The former customer discovers this through a subject access request to the current client's firm (which shares the same advisory platform) and files a complaint with the ICO. The ICO finds that the firm failed to propagate erasure to the vector store, constituting continued processing of personal data without a legal basis. The fine is £3.2 million.
What went wrong: Erasure was applied to the primary data store but not to the vector store that had consumed and transformed the data into embeddings. No data lineage graph existed to identify all downstream consumers. No propagation mechanism pushed the erasure signal to the vector store. The vector store was not included in the erasure workflow because the compliance team was unaware it contained derived personal data.
Scenario B — Retracted Medical Study Continues Influencing Clinical Agent: A pharmaceutical research agent is fine-tuned on a dataset that includes 12,000 published medical studies. One of the studies — reporting a correlation between a common medication and reduced cardiac risk — is retracted by the journal due to fabricated data. The retraction is published in the journal's database, but no mechanism propagates this retraction to the fine-tuning dataset or the model trained on it. The agent continues to recommend the medication for cardiac risk reduction based on the retracted study's influence on its weights. Over 8 months, the agent contributes to 2,400 clinical recommendations that incorporate the retracted finding. When the issue is discovered, the organisation cannot identify which specific recommendations were influenced by the retracted study, because the fine-tuning process merged the study's signal into the model's weights without maintaining per-source attribution.
What went wrong: No erasure propagation mechanism existed for training data. The retraction at the source (journal database) did not trigger any downstream action. The fine-tuning pipeline did not maintain per-source lineage, making it impossible to identify or isolate the retracted study's influence on model behaviour. This intersects with AG-090 (Fine-Tune and Adapter Provenance).
Scenario C — Deleted User Preferences Persist in Recommendation Cache: An e-commerce platform deletes a user's account and all associated data per the user's request. The deletion is applied to the primary user database, the order history database, and the analytics warehouse. However, the user's browsing preferences and product affinity scores have been cached in a Redis cluster used by the AI recommendation agent for low-latency inference. The cache TTL is 90 days. For the next 90 days, the recommendation agent continues to serve personalised recommendations when anyone uses the same device, because the device fingerprint is still associated with the deleted user's preference profile in the cache. The user discovers this, files a CCPA complaint, and the resulting investigation reveals that the organisation's deletion process did not include 14 distinct caching layers.
What went wrong: The data lineage graph did not include caching infrastructure. The deletion workflow targeted databases but not caches. The 90-day TTL meant zombie data would eventually expire, but GDPR and CCPA require timely erasure — not eventual consistency. The organisation could not demonstrate that all copies were deleted within the required timeframe.
Scope: This dimension applies to every AI agent ecosystem where data from any source is copied, cached, transformed, embedded, used for training, or stored in any form other than the original source. If data exists in exactly one location and is deleted there, no propagation is needed. The moment data is replicated — to a cache, a vector store, a training dataset, a derived feature store, a log, a backup, or any other secondary location — AG-130 applies. The scope includes: structured data copies, unstructured document caches, vector embeddings derived from source data, model weights influenced by source data during fine-tuning, derived analytics and aggregations, data retained in message queues, and any other persistent or semi-persistent copy. Temporary in-memory processing buffers that exist only during a single request lifecycle are excluded, provided they are verifiably non-persistent.
4.1. A conforming system MUST maintain a data lineage graph that maps every data source to all downstream systems, caches, stores, and derived artefacts that have consumed data from that source.
4.2. A conforming system MUST implement an erasure propagation mechanism that, when data is deleted, retracted, or invalidated at its source, pushes an erasure signal to every downstream consumer identified in the data lineage graph.
4.3. A conforming system MUST verify that erasure has been completed at each downstream consumer and record the verification result, including the consumer identity, verification timestamp, and verification method.
4.4. A conforming system MUST complete erasure propagation within a defined maximum propagation window appropriate to the data's sensitivity and regulatory requirements — no more than 72 hours for personal data subject to GDPR, CCPA, or equivalent privacy regulations.
4.5. A conforming system MUST treat vector store embeddings derived from source data as copies of that data for erasure propagation purposes — deletion of source data requires deletion or re-generation of affected embeddings.
4.6. A conforming system MUST maintain an erasure audit trail showing: the erasure trigger (deletion request, retraction notice, court order), the source record identity, all downstream consumers notified, verification status at each consumer, and the total propagation time.
4.7. A conforming system SHOULD implement automated lineage discovery that detects new downstream consumers when data pipelines are created or modified, rather than relying solely on manual lineage documentation.
4.8. A conforming system SHOULD maintain a mechanism to flag model weights that may be influenced by deleted source data, triggering a review to determine whether model retraining is necessary.
4.9. A conforming system MAY implement cryptographic erasure — encrypting data at rest with per-record keys and destroying the key upon erasure, rendering all copies unrecoverable without requiring per-copy deletion.
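The cryptographic erasure pattern in 4.9 can be sketched in a few lines: each record is encrypted with its own key, and "erasure" means destroying the key, which renders every replicated ciphertext unrecoverable without per-copy deletion. The sketch below is a stdlib-only illustration under stated assumptions — the `KeyVault` class and record IDs are hypothetical, and the XOR one-time pad stands in for a production AEAD cipher such as AES-GCM:

```python
import secrets

class KeyVault:
    """Holds one key per record; deleting the key IS the erasure."""
    def __init__(self):
        self._keys = {}

    def new_key(self, record_id, length):
        key = secrets.token_bytes(length)
        self._keys[record_id] = key
        return key

    def get_key(self, record_id):
        return self._keys.get(record_id)

    def shred(self, record_id):
        # Destroying the key makes every ciphertext copy unrecoverable,
        # wherever it was replicated, without touching the copies.
        self._keys.pop(record_id, None)

def xor(data, key):
    # One-time pad for illustration only; a real system would use an
    # AEAD cipher (e.g. AES-GCM) with per-record keys.
    return bytes(a ^ b for a, b in zip(data, key))

vault = KeyVault()
plaintext = b"customer risk profile"
key = vault.new_key("cust-42", len(plaintext))
ciphertext = xor(plaintext, key)          # this ciphertext may be copied freely

assert xor(ciphertext, vault.get_key("cust-42")) == plaintext
vault.shred("cust-42")                    # erasure: key destroyed
assert vault.get_key("cust-42") is None   # every copy is now unrecoverable
```

The trade-off is that cryptographic erasure requires all copies to be encrypted under the per-record key from the moment of first ingestion; it cannot be retrofitted onto plaintext copies already in circulation.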
Residual Data Erasure Propagation Governance exists because modern AI agent ecosystems are not single-database systems — they are distributed data architectures where a single source record may be consumed by dozens of downstream systems, each creating its own copy, cache, embedding, or derivative. When the source record is deleted, each downstream copy must also be deleted or invalidated. If even one copy survives, the data continues to influence agent behaviour and the deletion is incomplete.
The problem is amplified by AI-specific data transformations. When a text document is converted into a vector embedding, the original text may not be recoverable from the embedding, but the embedding still contains a semantic representation of the content. When a dataset is used for fine-tuning, the individual data records are not stored in the model, but they influence the model's weights. These transformations create a spectrum of "data persistence" that does not map cleanly to traditional deletion: you cannot "delete" a single training example from a fine-tuned model without retraining.
Regulatory regimes do not accept this complexity as an excuse. GDPR Article 17 (right to erasure) requires that personal data be erased "without undue delay." CCPA Section 1798.105 requires businesses to delete personal information. Neither regulation exempts derived copies, cached copies, or semantic representations. An embedding that represents a person's financial history is still personal data even if the original text has been deleted.
AG-130 addresses this by requiring organisations to: (1) know where their data goes, through a lineage graph; (2) push erasure signals to all destinations, through a propagation mechanism; and (3) verify that erasure completed, through an audit trail. This intersects with AG-060 (Data Minimisation and Retention) because minimising unnecessary copies reduces the erasure propagation surface, and with AG-133 (Source Record Lineage Governance) because lineage tracking is the foundation for knowing which downstream systems are affected.
The core implementation artefact is a data lineage graph combined with an erasure propagation bus. The lineage graph is a directed acyclic graph (or, in cases of circular data flows, a directed graph with cycle detection) that maps every data source to every downstream consumer. The erasure propagation bus is a messaging system that delivers erasure signals to all consumers identified in the lineage graph for a given source record.
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Client data deletion under GDPR/UK GDPR must be reconciled with regulatory retention requirements. Some financial records must be retained for 5-7 years under MiFID II, MAR, or anti-money-laundering regulations. The erasure propagation system must handle the intersection: delete from systems where no retention obligation applies while preserving records in systems subject to regulatory retention, with clear documentation of the legal basis for each retention decision.
Healthcare. Patient data erasure must comply with HIPAA retention requirements (6 years from date of creation or last effective date) and state-specific medical record retention laws. Erasure requests for data within the retention window require careful legal analysis. The erasure propagation system should support "soft deletion" — marking records as deleted for agent consumption while retaining them in archived form for regulatory compliance.
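The "soft deletion" pattern described above — invisible to agent consumption, retained in archived form with a recorded legal basis — can be sketched as follows. The store interface and record IDs are hypothetical:

```python
class PatientStore:
    """Soft deletion: erased records are hidden from agent queries
    but retained, with the legal basis recorded, for the regulatory
    retention window."""
    def __init__(self):
        self.records = {}  # record_id -> {"data": ..., "deleted": bool, "basis": ...}

    def put(self, record_id, data):
        self.records[record_id] = {"data": data, "deleted": False, "basis": None}

    def soft_delete(self, record_id, retention_basis):
        rec = self.records[record_id]
        rec["deleted"] = True
        rec["basis"] = retention_basis  # documents why the archive copy survives

    def agent_view(self, record_id):
        """What the AI agent can see: nothing, once soft-deleted."""
        rec = self.records.get(record_id)
        return None if rec is None or rec["deleted"] else rec["data"]

store = PatientStore()
store.put("pt-7", {"history": "..."})
store.soft_delete("pt-7", retention_basis="HIPAA retention requirement")
assert store.agent_view("pt-7") is None          # invisible to agents
assert store.records["pt-7"]["deleted"] is True  # retained in archive
```

The key property is that every agent-facing read path goes through `agent_view` (or its equivalent), so the archived copy cannot leak back into inference.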
Public Sector. Government data may be subject to Freedom of Information retention requirements that conflict with erasure requests. The erasure propagation system must integrate with records management systems to determine whether erasure or retention takes precedence, with full audit trail of the decision.
Basic Implementation — The organisation maintains a manual data lineage document listing known downstream consumers for each primary data source. Erasure is propagated manually — when a deletion occurs, operators follow a checklist to delete from each downstream system. Verification is manual inspection. The erasure audit trail is a log entry in a spreadsheet or ticketing system. This level meets the minimum mandatory requirements but is error-prone at scale: manual processes miss newly created downstream consumers, operators may skip steps under time pressure, and verification is subjective.
Intermediate Implementation — An automated erasure propagation bus dispatches erasure signals to all downstream consumers identified in a maintained lineage graph. Each consumer confirms deletion automatically. An orchestrator tracks confirmation state and escalates missing confirmations. The lineage graph is updated through a governed process when data pipelines change. Cryptographic erasure is implemented for high-volume stores. Vector store embeddings are tombstoned on erasure and physically deleted during index rebuilds.
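The orchestrator's confirmation tracking can be sketched as a per-request state machine: each downstream consumer starts pending, moves to confirmed when its deletion acknowledgement arrives, and is escalated if it is still pending after the propagation window. A minimal sketch, assuming hypothetical consumer names and the 72-hour window from requirement 4.4:

```python
from datetime import datetime, timedelta, timezone

class ErasureOrchestrator:
    """Tracks per-consumer confirmation state for one erasure request
    and flags consumers that miss the propagation window."""
    def __init__(self, record_id, consumers, window=timedelta(hours=72)):
        self.record_id = record_id
        self.deadline = datetime.now(timezone.utc) + window
        self.pending = set(consumers)
        self.confirmed = {}  # consumer -> confirmation timestamp

    def confirm(self, consumer):
        if consumer in self.pending:
            self.pending.discard(consumer)
            self.confirmed[consumer] = datetime.now(timezone.utc)

    def overdue(self, now=None):
        """Consumers to escalate: still pending after the deadline."""
        now = now or datetime.now(timezone.utc)
        return set(self.pending) if now > self.deadline else set()

orch = ErasureOrchestrator("cust-42", {"vector_store", "redis_cache", "warehouse"})
orch.confirm("vector_store")
orch.confirm("redis_cache")
assert orch.pending == {"warehouse"}

# Simulate a check one week later: the warehouse has missed the window.
later = datetime.now(timezone.utc) + timedelta(days=7)
assert orch.overdue(now=later) == {"warehouse"}
```

The confirmation timestamps double as the verification records required by 4.3, and the `overdue` set drives the escalation path.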
Advanced Implementation — All intermediate capabilities plus: automated lineage discovery detects new downstream consumers through pipeline instrumentation, data cataloguing, and network traffic analysis. Model weight contamination detection flags fine-tuned models influenced by deleted data and triggers retraining assessment. The erasure propagation system handles complex scenarios including cross-border data transfers, regulatory retention conflicts, and aggregate recomputation. Independent adversarial testing verifies that no zombie data survives erasure propagation. End-to-end propagation time is monitored and consistently meets the 72-hour window for personal data.
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-130 compliance requires verifying that erasure propagates completely, verifiably, and within defined timeframes.
Test 8.1: End-to-End Erasure Propagation
Test 8.2: Vector Store Embedding Erasure
Test 8.3: Lineage Graph Completeness
Test 8.4: Propagation Window Compliance
Test 8.5: Cache Invalidation on Erasure
Test 8.6: Erasure Audit Trail Completeness
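Test 8.1 can be sketched as an end-to-end check: seed a record into every registered downstream store, trigger erasure at the source, then assert that no copy survives and capture per-store verification results for the audit trail. The store interface below is a hypothetical stand-in, not a specific product API:

```python
class FakeStore:
    """Stand-in for a downstream system (cache, vector store, warehouse)."""
    def __init__(self, name):
        self.name, self.data = name, {}

    def ingest(self, record_id, payload):
        self.data[record_id] = payload

    def erase(self, record_id):
        self.data.pop(record_id, None)

    def contains(self, record_id):
        return record_id in self.data

def propagate_erasure(record_id, downstream_stores):
    """Push the erasure signal to every registered consumer and return
    per-store verification results for the audit trail (requirement 4.3)."""
    for store in downstream_stores:
        store.erase(record_id)
    return {s.name: not s.contains(record_id) for s in downstream_stores}

# Test 8.1 sketch: seed a record everywhere, erase, verify zero survivors.
stores = [FakeStore("cache"), FakeStore("vector_store"), FakeStore("warehouse")]
for s in stores:
    s.ingest("rec-1", {"pii": "x"})
results = propagate_erasure("rec-1", stores)
assert all(results.values()), f"zombie data survived: {results}"
```

An adversarial variant of the same harness deliberately omits one store from the registered list and asserts that lineage-completeness checks (Test 8.3) catch the gap.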
| Regulation | Provision | Relationship Type |
|---|---|---|
| GDPR | Article 17 (Right to Erasure) | Direct requirement |
| GDPR | Article 19 (Notification Regarding Rectification or Erasure) | Direct requirement |
| CCPA/CPRA | Section 1798.105 (Right to Delete) | Direct requirement |
| EU AI Act | Article 10 (Data and Data Governance) | Supports compliance |
| DORA | Article 9 (ICT Risk Management Framework) | Supports compliance |
| NIST AI RMF | GOVERN 1.5, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Development) | Supports compliance |
| UK Data Protection Act 2018 | Section 47 (Right to Erasure) | Direct requirement |
Article 17 requires controllers to erase personal data "without undue delay" when the data subject exercises their right to erasure (where applicable grounds exist). The obligation extends to all copies and derivatives — the controller must take "reasonable steps, including technical measures, to inform controllers which are processing the personal data that the data subject has requested erasure." AG-130 directly implements this obligation for AI agent ecosystems by requiring propagation to all downstream consumers, including vector stores, caches, and derived datasets. The 72-hour propagation window in requirement 4.4 aligns with the "without undue delay" standard as interpreted by supervisory authorities.
Article 19 requires the controller to communicate any erasure to each recipient to whom the personal data has been disclosed. AG-130's erasure propagation bus and lineage graph provide the technical infrastructure to identify all recipients (downstream consumers) and communicate the erasure, with verifiable confirmation.
CCPA requires businesses to delete personal information upon a verified consumer request and to direct any service providers to delete the consumer's personal information from their records. AG-130's propagation mechanism implements this "direct service providers to delete" requirement by pushing erasure signals through the data lineage graph to all downstream systems.
Article 10's data governance requirements include the ability to detect and correct errors in data. Zombie data — data that has been retracted or corrected at the source but persists in downstream systems — is a data governance failure that AG-130 prevents through propagation enforcement.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Data-subject-specific for individual erasure requests; potentially organisation-wide for systemic lineage failures |
Consequence chain: Zombie data that survives erasure creates three distinct failure modes. First, regulatory non-compliance: failure to propagate erasure of personal data constitutes continued processing without a legal basis, exposing the organisation to GDPR fines of up to 4% of global annual turnover (Scenario A: £3.2 million fine for vector store zombie data). Second, decision contamination: retracted or invalidated data that persists in training datasets or inference pipelines continues to influence agent outputs, potentially driving thousands of incorrect recommendations before discovery (Scenario B: 2,400 clinical recommendations influenced by a retracted study). Third, trust erosion: data subjects who exercise deletion rights and discover their data persists lose trust in the organisation's data governance, leading to complaints, litigation, and reputational damage (Scenario C: user discovers personalised recommendations 90 days after account deletion). The severity compounds with the number of downstream consumers and the difficulty of detection — zombie data is invisible unless actively sought. Cross-references: AG-060 (Data Minimisation and Retention) reduces zombie data risk by minimising unnecessary copies; AG-132 (Vector Store and RAG Governance) requires erasure propagation to reach vector stores; AG-133 (Source Record Lineage Governance) provides the lineage data that erasure propagation depends on; AG-090 (Fine-Tune and Adapter Provenance) addresses the model weight contamination aspect of zombie data in fine-tuned models.