Residual Data Erasure Propagation Governance requires that when data is deleted, retracted, corrected, or invalidated at its source, the erasure propagates completely and verifiably to every downstream system, cache, vector store, training dataset, and derived artefact that consumed that data. Zombie data is data that has been deleted at its origin but survives in copies, caches, embeddings, fine-tuned model weights, or derived datasets — continuing to influence agent behaviour as if it were still valid. AG-130 mandates that organisations maintain a complete data lineage graph, implement propagation mechanisms that push erasure signals to all downstream consumers, and verify that erasure has been completed within defined timeframes.
Scenario A — Deleted Customer Record Persists in RAG System: A financial services firm receives a GDPR erasure request (Article 17) from a former customer. The compliance team deletes the customer's records from the primary CRM database within one month, as Article 12(3) requires. However, the customer's financial history, transaction patterns, and risk profile have been ingested into a vector store used by an AI advisory agent for retrieval-augmented generation. The embeddings remain in the vector store. Six months later, the advisory agent retrieves the deleted customer's financial pattern as a "similar customer" reference when advising a current client. The former customer discovers this through a subject access request to the current client's firm (which shares the same advisory platform) and files a complaint with the ICO. The ICO finds that the firm failed to propagate erasure to the vector store, constituting continued processing of personal data without a legal basis. The fine is £3.2 million.
What went wrong: Erasure was applied to the primary data store but not to the vector store that had consumed and transformed the data into embeddings. No data lineage graph existed to identify all downstream consumers. No propagation mechanism pushed the erasure signal to the vector store. The vector store was not included in the erasure workflow because the compliance team was unaware it contained derived personal data.
Scenario B — Retracted Medical Study Continues Influencing Clinical Agent: A pharmaceutical research agent is fine-tuned on a dataset that includes 12,000 published medical studies. One of the studies — reporting a correlation between a common medication and reduced cardiac risk — is retracted by the journal due to fabricated data. The retraction is published in the journal's database, but no mechanism propagates this retraction to the fine-tuning dataset or the model trained on it. The agent continues to recommend the medication for cardiac risk reduction based on the retracted study's influence on its weights. Over 8 months, the agent contributes to 2,400 clinical recommendations that incorporate the retracted finding. When the issue is discovered, the organisation cannot identify which specific recommendations were influenced by the retracted study, because the fine-tuning process merged the study's signal into the model's weights without maintaining per-source attribution.
What went wrong: No erasure propagation mechanism existed for training data. The retraction at the source (journal database) did not trigger any downstream action. The fine-tuning pipeline did not maintain per-source lineage, making it impossible to identify or isolate the retracted study's influence on model behaviour. This intersects with AG-090 (Fine-Tune and Adapter Provenance).
Scenario C — Deleted User Preferences Persist in Recommendation Cache: An e-commerce platform deletes a user's account and all associated data per the user's request. The deletion is applied to the primary user database, the order history database, and the analytics warehouse. However, the user's browsing preferences and product affinity scores have been cached in a Redis cluster used by the AI recommendation agent for low-latency inference. The cache TTL is 90 days. For the next 90 days, the recommendation agent continues to serve personalised recommendations when anyone uses the same device, because the device fingerprint is still associated with the deleted user's preference profile in the cache. The user discovers this, files a CCPA complaint, and the resulting investigation reveals that the organisation's deletion process did not include 14 distinct caching layers.
What went wrong: The data lineage graph did not include caching infrastructure. The deletion workflow targeted databases but not caches. The 90-day TTL meant zombie data would eventually expire, but GDPR and CCPA require timely erasure — not eventual consistency. The organisation could not demonstrate that all copies were deleted within the required timeframe.
Scope: This dimension applies to every AI agent ecosystem where data from any source is copied, cached, transformed, embedded, used for training, or stored in any form other than the original source. If data exists in exactly one location and is deleted there, no propagation is needed. The moment data is replicated — to a cache, a vector store, a training dataset, a derived feature store, a log, a backup, or any other secondary location — AG-130 applies. The scope includes: structured data copies, unstructured document caches, vector embeddings derived from source data, model weights influenced by source data during fine-tuning, derived analytics and aggregations, data retained in message queues, and any other persistent or semi-persistent copy. Temporary in-memory processing buffers that exist only during a single request lifecycle are excluded, provided they are verifiably non-persistent.
4.1. A conforming system MUST maintain a data lineage graph that maps every data source to all downstream systems, caches, stores, and derived artefacts that have consumed data from that source.
4.2. A conforming system MUST implement an erasure propagation mechanism that, when data is deleted, retracted, or invalidated at its source, pushes an erasure signal to every downstream consumer identified in the data lineage graph.
4.3. A conforming system MUST verify that erasure has been completed at each downstream consumer and record the verification result, including the consumer identity, verification timestamp, and verification method.
4.4. A conforming system MUST complete erasure propagation within a defined maximum propagation window appropriate to the data's sensitivity and regulatory requirements — no more than 72 hours for personal data subject to GDPR, CCPA, or equivalent privacy regulations.
4.5. A conforming system MUST treat vector store embeddings derived from source data as copies of that data for erasure propagation purposes — deletion of source data requires deletion or re-generation of affected embeddings.
4.6. A conforming system MUST maintain an erasure audit trail showing: the erasure trigger (deletion request, retraction notice, court order), the source record identity, all downstream consumers notified, verification status at each consumer, and the total propagation time.
4.7. A conforming system SHOULD implement automated lineage discovery that detects new downstream consumers when data pipelines are created or modified, rather than relying solely on manual lineage documentation.
4.8. A conforming system SHOULD maintain a mechanism to flag model weights that may be influenced by deleted source data, triggering a review to determine whether model retraining is necessary.
4.9. A conforming system MAY implement cryptographic erasure — encrypting data at rest with per-record keys and destroying the key upon erasure, rendering all copies unrecoverable without requiring per-copy deletion.
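The cryptographic erasure pattern in 4.9 can be sketched in a few lines: each record is encrypted with its own key, and "erasure" means destroying the key, which renders every replicated ciphertext unrecoverable without per-copy deletion. The sketch below is a stdlib-only illustration under stated assumptions — the `KeyVault` class and record IDs are hypothetical, and the XOR one-time pad stands in for a production AEAD cipher such as AES-GCM:

```python
import secrets

class KeyVault:
    """Holds one key per record; deleting the key IS the erasure."""
    def __init__(self):
        self._keys = {}

    def new_key(self, record_id, length):
        key = secrets.token_bytes(length)
        self._keys[record_id] = key
        return key

    def get_key(self, record_id):
        return self._keys.get(record_id)

    def shred(self, record_id):
        # Destroying the key makes every ciphertext copy unrecoverable,
        # wherever it was replicated, without touching the copies.
        self._keys.pop(record_id, None)

def xor(data, key):
    # One-time pad for illustration only; a real system would use an
    # AEAD cipher (e.g. AES-GCM) with per-record keys.
    return bytes(a ^ b for a, b in zip(data, key))

vault = KeyVault()
plaintext = b"customer risk profile"
key = vault.new_key("cust-42", len(plaintext))
ciphertext = xor(plaintext, key)          # this ciphertext may be copied freely

assert xor(ciphertext, vault.get_key("cust-42")) == plaintext
vault.shred("cust-42")                    # erasure: key destroyed
assert vault.get_key("cust-42") is None   # every copy is now unrecoverable
```

The trade-off is that cryptographic erasure requires all copies to be encrypted under the per-record key from the moment of first ingestion; it cannot be retrofitted onto plaintext copies already in circulation.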
Residual Data Erasure Propagation Governance exists because modern AI agent ecosystems are not single-database systems — they are distributed data architectures where a single source record may be consumed by dozens of downstream systems, each creating its own copy, cache, embedding, or derivative. When the source record is deleted, each downstream copy must also be deleted or invalidated. If even one copy survives, the data continues to influence agent behaviour and the deletion is incomplete.
The problem is amplified by AI-specific data transformations. When a text document is converted into a vector embedding, the original text may not be recoverable from the embedding, but the embedding still contains a semantic representation of the content. When a dataset is used for fine-tuning, the individual data records are not stored in the model, but they influence the model's weights. These transformations create a spectrum of "data persistence" that does not map cleanly to traditional deletion: you cannot "delete" a single training example from a fine-tuned model without retraining.
Regulatory regimes do not accept this complexity as an excuse. GDPR Article 17 (right to erasure) requires that personal data be erased "without undue delay." CCPA Section 1798.105 requires businesses to delete personal information. Neither regulation exempts derived copies, cached copies, or semantic representations. An embedding that represents a person's financial history is still personal data even if the original text has been deleted.
AG-130 addresses this by requiring organisations to: (1) know where their data goes, through a lineage graph; (2) push erasure signals to all destinations, through a propagation mechanism; and (3) verify that erasure completed, through an audit trail. This intersects with AG-060 (Data Minimisation and Retention) because minimising unnecessary copies reduces the erasure propagation surface, and with AG-133 (Source Record Lineage Governance) because lineage tracking is the foundation for knowing which downstream systems are affected.
The core implementation artefact is a data lineage graph combined with an erasure propagation bus. The lineage graph is a directed acyclic graph (or, in cases of circular data flows, a directed graph with cycle detection) that maps every data source to every downstream consumer. The erasure propagation bus is a messaging system that delivers erasure signals to all consumers identified in the lineage graph for a given source record.
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Client data deletion under GDPR/UK GDPR must be reconciled with regulatory retention requirements. Some financial records must be retained for 5-7 years under MiFID II, MAR, or anti-money-laundering regulations. The erasure propagation system must handle the intersection: delete from systems where no retention obligation applies while preserving records in systems subject to regulatory retention, with clear documentation of the legal basis for each retention decision.
Healthcare. Patient data erasure must comply with HIPAA retention requirements (6 years from date of creation or last effective date) and state-specific medical record retention laws. Erasure requests for data within the retention window require careful legal analysis. The erasure propagation system should support "soft deletion" — marking records as deleted for agent consumption while retaining them in archived form for regulatory compliance.
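The "soft deletion" pattern described above — invisible to agent consumption, retained in archived form with a recorded legal basis — can be sketched as follows. The store interface and record IDs are hypothetical:

```python
class PatientStore:
    """Soft deletion: erased records are hidden from agent queries
    but retained, with the legal basis recorded, for the regulatory
    retention window."""
    def __init__(self):
        self.records = {}  # record_id -> {"data": ..., "deleted": bool, "basis": ...}

    def put(self, record_id, data):
        self.records[record_id] = {"data": data, "deleted": False, "basis": None}

    def soft_delete(self, record_id, retention_basis):
        rec = self.records[record_id]
        rec["deleted"] = True
        rec["basis"] = retention_basis  # documents why the archive copy survives

    def agent_view(self, record_id):
        """What the AI agent can see: nothing, once soft-deleted."""
        rec = self.records.get(record_id)
        return None if rec is None or rec["deleted"] else rec["data"]

store = PatientStore()
store.put("pt-7", {"history": "..."})
store.soft_delete("pt-7", retention_basis="HIPAA retention requirement")
assert store.agent_view("pt-7") is None          # invisible to agents
assert store.records["pt-7"]["deleted"] is True  # retained in archive
```

The key property is that every agent-facing read path goes through `agent_view` (or its equivalent), so the archived copy cannot leak back into inference.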
Public Sector. Government data may be subject to Freedom of Information retention requirements that conflict with erasure requests. The erasure propagation system must integrate with records management systems to determine whether erasure or retention takes precedence, with full audit trail of the decision.
Basic Implementation — The organisation maintains a manual data lineage document listing known downstream consumers for each primary data source. Erasure is propagated manually — when a deletion occurs, operators follow a checklist to delete from each downstream system. Verification is manual inspection. The erasure audit trail is a log entry in a spreadsheet or ticketing system. This level meets the minimum mandatory requirements but is error-prone at scale: manual processes miss newly created downstream consumers, operators may skip steps under time pressure, and verification is subjective.
Intermediate Implementation — An automated erasure propagation bus dispatches erasure signals to all downstream consumers identified in a maintained lineage graph. Each consumer confirms deletion automatically. An orchestrator tracks confirmation state and escalates missing confirmations. The lineage graph is updated through a governed process when data pipelines change. Cryptographic erasure is implemented for high-volume stores. Vector store embeddings are tombstoned on erasure and physically deleted during index rebuilds.
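The orchestrator's confirmation tracking can be sketched as a per-request state machine: each downstream consumer starts pending, moves to confirmed when its deletion acknowledgement arrives, and is escalated if it is still pending after the propagation window. A minimal sketch, assuming hypothetical consumer names and the 72-hour window from requirement 4.4:

```python
from datetime import datetime, timedelta, timezone

class ErasureOrchestrator:
    """Tracks per-consumer confirmation state for one erasure request
    and flags consumers that miss the propagation window."""
    def __init__(self, record_id, consumers, window=timedelta(hours=72)):
        self.record_id = record_id
        self.deadline = datetime.now(timezone.utc) + window
        self.pending = set(consumers)
        self.confirmed = {}  # consumer -> confirmation timestamp

    def confirm(self, consumer):
        if consumer in self.pending:
            self.pending.discard(consumer)
            self.confirmed[consumer] = datetime.now(timezone.utc)

    def overdue(self, now=None):
        """Consumers to escalate: still pending after the deadline."""
        now = now or datetime.now(timezone.utc)
        return set(self.pending) if now > self.deadline else set()

orch = ErasureOrchestrator("cust-42", {"vector_store", "redis_cache", "warehouse"})
orch.confirm("vector_store")
orch.confirm("redis_cache")
assert orch.pending == {"warehouse"}

# Simulate a check one week later: the warehouse has missed the window.
later = datetime.now(timezone.utc) + timedelta(days=7)
assert orch.overdue(now=later) == {"warehouse"}
```

The confirmation timestamps double as the verification records required by 4.3, and the `overdue` set drives the escalation path.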
Advanced Implementation — All intermediate capabilities plus: automated lineage discovery detects new downstream consumers through pipeline instrumentation, data cataloguing, and network traffic analysis. Model weight contamination detection flags fine-tuned models influenced by deleted data and triggers retraining assessment. The erasure propagation system handles complex scenarios including cross-border data transfers, regulatory retention conflicts, and aggregate recomputation. Independent adversarial testing verifies that no zombie data survives erasure propagation. End-to-end propagation time is monitored and consistently meets the 72-hour window for personal data.
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-130 compliance requires verifying that erasure propagates completely, verifiably, and within defined timeframes.
Test 8.1: End-to-End Erasure Propagation
Test 8.2: Vector Store Embedding Erasure
Test 8.3: Lineage Graph Completeness
Test 8.4: Propagation Window Compliance
Test 8.5: Cache Invalidation on Erasure
Test 8.6: Erasure Audit Trail Completeness
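Test 8.1 can be sketched as an end-to-end check: seed a record into every registered downstream store, trigger erasure at the source, then assert that no copy survives and capture per-store verification results for the audit trail. The store interface below is a hypothetical stand-in, not a specific product API:

```python
class FakeStore:
    """Stand-in for a downstream system (cache, vector store, warehouse)."""
    def __init__(self, name):
        self.name, self.data = name, {}

    def ingest(self, record_id, payload):
        self.data[record_id] = payload

    def erase(self, record_id):
        self.data.pop(record_id, None)

    def contains(self, record_id):
        return record_id in self.data

def propagate_erasure(record_id, downstream_stores):
    """Push the erasure signal to every registered consumer and return
    per-store verification results for the audit trail (requirement 4.3)."""
    for store in downstream_stores:
        store.erase(record_id)
    return {s.name: not s.contains(record_id) for s in downstream_stores}

# Test 8.1 sketch: seed a record everywhere, erase, verify zero survivors.
stores = [FakeStore("cache"), FakeStore("vector_store"), FakeStore("warehouse")]
for s in stores:
    s.ingest("rec-1", {"pii": "x"})
results = propagate_erasure("rec-1", stores)
assert all(results.values()), f"zombie data survived: {results}"
```

An adversarial variant of the same harness deliberately omits one store from the registered list and asserts that lineage-completeness checks (Test 8.3) catch the gap.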
| Regulation | Provision | Relationship Type |
|---|---|---|
| GDPR | Article 17 (Right to Erasure) | Direct requirement |
| GDPR | Article 19 (Notification Regarding Rectification or Erasure) | Direct requirement |
| CCPA/CPRA | Section 1798.105 (Right to Delete) | Direct requirement |
| EU AI Act | Article 10 (Data and Data Governance) | Supports compliance |
| DORA | Article 9 (ICT Risk Management Framework) | Supports compliance |
| NIST AI RMF | GOVERN 1.5, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Development) | Supports compliance |
| UK Data Protection Act 2018 | Section 47 (Right to Erasure) | Direct requirement |
Article 17 requires controllers to erase personal data "without undue delay" when the data subject exercises their right to erasure (where applicable grounds exist). The obligation extends to all copies and derivatives — the controller must take "reasonable steps, including technical measures, to inform controllers which are processing the personal data that the data subject has requested erasure." AG-130 directly implements this obligation for AI agent ecosystems by requiring propagation to all downstream consumers, including vector stores, caches, and derived datasets. The 72-hour propagation window in requirement 4.4 aligns with the "without undue delay" standard as interpreted by supervisory authorities.
Article 19 requires the controller to communicate any erasure to each recipient to whom the personal data has been disclosed. AG-130's erasure propagation bus and lineage graph provide the technical infrastructure to identify all recipients (downstream consumers) and communicate the erasure, with verifiable confirmation.
CCPA requires businesses to delete personal information upon a verified consumer request and to direct any service providers to delete the consumer's personal information from their records. AG-130's propagation mechanism implements this "direct service providers to delete" requirement by pushing erasure signals through the data lineage graph to all downstream systems.
Article 10's data governance requirements include the ability to detect and correct errors in data. Zombie data — data that has been retracted or corrected at the source but persists in downstream systems — is a data governance failure that AG-130 prevents through propagation enforcement.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Data-subject-specific for individual erasure requests; potentially organisation-wide for systemic lineage failures |
Consequence chain: Zombie data that survives erasure creates three distinct failure modes. First, regulatory non-compliance: failure to propagate erasure of personal data constitutes continued processing without a legal basis, exposing the organisation to GDPR fines of up to 4% of global annual turnover (Scenario A: £3.2 million fine for vector store zombie data). Second, decision contamination: retracted or invalidated data that persists in training datasets or inference pipelines continues to influence agent outputs, potentially driving thousands of incorrect recommendations before discovery (Scenario B: 2,400 clinical recommendations influenced by a retracted study). Third, trust erosion: data subjects who exercise deletion rights and discover their data persists lose trust in the organisation's data governance, leading to complaints, litigation, and reputational damage (Scenario C: user discovers personalised recommendations 90 days after account deletion). The severity compounds with the number of downstream consumers and the difficulty of detection — zombie data is invisible unless actively sought. Cross-references: AG-060 (Data Minimisation and Retention) reduces zombie data risk by minimising unnecessary copies; AG-132 (Vector Store and RAG Governance) requires erasure propagation to reach vector stores; AG-133 (Source Record Lineage Governance) provides the lineage data that erasure propagation depends on; AG-090 (Fine-Tune and Adapter Provenance) addresses the model weight contamination aspect of zombie data in fine-tuned models.