Data Minimisation and Retention Governance requires that every AI agent collect, process, and retain only the personal data that is strictly necessary for its defined purpose — and that such data is deleted or anonymised when the retention period expires, without requiring human intervention. This dimension addresses a structural tendency of AI systems to maximise data access: more data generally improves model performance, agents with broad data access can handle a wider range of queries, and developers default to requesting maximum permissions rather than the minimum necessary. Without AG-060, AI agents accumulate personal data far beyond what any legitimate purpose requires, creating regulatory exposure, increasing the blast radius of data breaches, and violating the fundamental data protection principle that personal data must be adequate, relevant, and limited to what is necessary. AG-060 requires that minimisation and retention are enforced structurally — through data access controls, field-level filtering, and automated retention enforcement — not through instructions that ask the agent to "only use data that is necessary."
Scenario A — Agent Retains Full Customer Records After Purpose Completion: An AI agent for a telecommunications company processes customer service interactions. When a customer calls to dispute a billing charge, the agent retrieves the full customer record: name, address, date of birth, payment card details, call history, browsing history (for bundled internet service), and device identifiers. The agent resolves the billing dispute but retains the full customer record in its conversation context, which is persisted to a log database for quality assurance. After 18 months, the organisation discovers that 4.7 million full customer records — including payment card details — are stored in the quality assurance log, with no retention limit configured. The original purpose (billing dispute resolution) required only the customer name, account number, and billing history. The additional data — payment cards, browsing history, device identifiers — was never necessary for the purpose but was retrieved because the agent had access to the full record.
What went wrong: The agent was given access to the complete customer record rather than a purpose-limited view. The conversation context, including all retrieved data, was persisted without field-level filtering. No retention schedule was applied to the quality assurance log. Consequence: 4.7 million records containing payment card data in an uncontrolled store, PCI DSS violation (payment card data stored beyond transaction completion), GDPR violation (excessive data retention), data breach risk amplified by the volume and sensitivity of stored data, remediation cost of £890,000 including forensic review, data deletion, and control implementation.
Scenario B — Training Data Retention Without Anonymisation: An enterprise deploys an AI agent that learns from customer interactions to improve its responses. Customer conversations — including names, account numbers, health conditions (for an insurance company), and financial details — are fed into a fine-tuning pipeline. The training dataset grows to 12 million interactions over 2 years. When customers exercise their right to erasure (GDPR Article 17), the organisation discovers that the training data is not structured for individual record deletion — conversations are tokenised, embedded, and distributed across training batches. Deleting a specific customer's data would require retraining the model from scratch at a cost of £2.3 million. The organisation also discovers that 340,000 of the training records belong to customers whose accounts have been closed for more than 12 months, exceeding the retention period in the privacy notice.
What went wrong: Training data was treated as a separate category from operational data, exempt from minimisation and retention requirements. No anonymisation was applied before training. No mechanism existed to identify and remove individual records from training datasets. Retention periods were not applied to training data. Consequence: Inability to comply with erasure requests without full model retraining, €3.2 million fine for excessive retention, 18-month remediation programme to implement anonymisation pipeline and erasure-capable training architecture.
Scenario C — Context Window Accumulation Across Sessions: A general-purpose internal copilot agent maintains a "memory" feature that persists information across sessions to improve personalisation. Over time, the agent accumulates detailed personal data about employees: health conditions mentioned in casual conversation ("I have a doctor's appointment for my diabetes"), family situations ("I need to leave early — my daughter's school called"), financial concerns ("I'm worried about making rent this month"), and career intentions ("I've been interviewing at other companies"). None of this data is necessary for the agent's purpose (internal productivity assistance), and employees were not informed that conversational data would be retained indefinitely. When an employee's manager requests the agent's interaction logs for a performance review, the HR team discovers the accumulated personal data and recognises it as both a GDPR violation (special category health data processed without Article 9 condition) and an employment law liability.
What went wrong: The agent's memory feature accumulated personal data without minimisation controls. No filtering mechanism excluded personal data from persistent memory. No retention limit was applied to the memory store. Employees were not informed of the data processing. Consequence: GDPR violation for processing special category data without legal basis, employment tribunal exposure for inappropriate use of personal data in employment context, employee trust damage requiring withdrawal of the agent's memory feature entirely.
Scope: This dimension applies to all AI agents that process personal data in any form — structured records, unstructured text, conversation logs, context windows, training datasets, embeddings, vector stores, cached results, and derived inferences. The scope explicitly includes data that the agent processes transiently (e.g., in a context window during a session) as well as data that is persisted. Data minimisation applies at the point of collection (only request what is necessary), the point of processing (only use what is necessary for the current operation), and the point of storage (only retain what is necessary for the defined retention period). The scope extends to derived data: inferences about individuals generated by the agent are personal data and are subject to the same minimisation and retention requirements. The scope also covers model training data where that data contains personal data, including fine-tuning datasets, few-shot examples, and retrieval-augmented generation (RAG) knowledge bases.
4.1. A conforming system MUST restrict each agent's access to personal data to only the data categories that are necessary for the agent's defined purpose, enforced through access controls at the data layer — not through instructions to the agent.
4.2. A conforming system MUST implement field-level filtering that prevents personal data fields not required for the agent's purpose from being included in the agent's context, even when those fields exist in the same record or data source.
4.3. A conforming system MUST define and enforce a retention period for every category of personal data processed by each agent, with automated deletion or anonymisation when the retention period expires.
4.4. A conforming system MUST ensure that personal data in agent conversation logs, context caches, and session stores is subject to the same retention controls as personal data in primary data stores.
4.5. A conforming system MUST prevent agents from persisting personal data beyond the current processing session unless the persistence is necessary for the defined purpose and covered by the retention schedule.
4.6. A conforming system MUST apply minimisation controls to training data, fine-tuning datasets, and RAG knowledge bases — anonymising personal data before inclusion or, where anonymisation is not feasible, applying the same retention and access controls as operational data.
4.7. A conforming system SHOULD implement data minimisation as purpose-bound views or projections at the data layer, so that the agent structurally cannot access data fields beyond those required for its purpose.
4.8. A conforming system SHOULD maintain metrics on data minimisation effectiveness, including: the ratio of data fields accessible to the agent versus data fields required for the purpose, retention compliance rates, and volumes of data deleted or anonymised through automated retention enforcement.
4.9. A conforming system SHOULD implement graduated retention with automatic downgrading — full records retained for the operational period, then pseudonymised for the analytical period, then anonymised or deleted at the end of the retention period.
4.10. A conforming system MAY implement dynamic minimisation that adjusts the data fields provided to the agent based on the specific task within the session — providing fewer fields for simple queries and more fields (within the purpose-bound limit) for complex operations.
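The field-level filtering required by 4.2 and the purpose-bound projections recommended in 4.7 can be sketched as a projection applied at the data layer, before any record reaches the agent's context. This is a minimal illustration only; the purpose registry, field names, and record shape are hypothetical.

```python
# Hypothetical purpose-bound projection: the agent receives only the
# fields registered for its declared purpose, never the full record.
PURPOSE_FIELDS = {
    "billing_dispute": {"name", "account_number", "billing_history"},
    "address_change": {"name", "account_number", "address"},
}

def project_for_purpose(record: dict, purpose: str) -> dict:
    """Return only the fields necessary for the declared purpose (4.2)."""
    try:
        allowed = PURPOSE_FIELDS[purpose]
    except KeyError:
        raise PermissionError(f"No data access registered for purpose {purpose!r}")
    return {k: v for k, v in record.items() if k in allowed}

full_record = {
    "name": "A. Customer",
    "account_number": "AC-1001",
    "billing_history": ["2024-01 charge disputed"],
    "payment_card": "4111 ...",       # never necessary for a billing dispute
    "browsing_history": ["..."],      # never necessary for a billing dispute
}
minimised = project_for_purpose(full_record, "billing_dispute")
```

Because the projection runs at the data layer, the agent structurally cannot see the excluded fields, satisfying 4.1's requirement that enforcement not rely on instructions to the agent.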
Data Minimisation and Retention Governance enforces the data protection principle that personal data must be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed" (GDPR Article 5(1)(c)) and "kept in a form which permits identification of data subjects for no longer than is necessary" (GDPR Article 5(1)(e)). These are not aspirational goals — they are legal requirements with direct enforcement consequences.
AI agents create unique minimisation challenges because they operate on unstructured data in ways that resist traditional field-level access controls. A database query can be restricted to specific columns. An AI agent consuming a natural-language document cannot easily be prevented from reading a paragraph that contains both relevant and irrelevant personal data. The agent's context window may contain data retrieved for multiple sub-tasks, accumulating data that was necessary for one step but is no longer needed for subsequent steps. Training pipelines consume vast quantities of data, and the relationship between any individual training record and the trained model is opaque — making both minimisation and deletion technically challenging.
These challenges do not excuse non-compliance. They require architectural solutions: purpose-bound data views that filter at the infrastructure layer, context window management that clears unnecessary data between processing steps, training pipelines that anonymise before ingestion, and retention automation that enforces deletion without human intervention. The cost of these solutions is a fraction of the cost of a data protection enforcement action for excessive data accumulation.
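The retention-automation element above can be sketched as a scheduled job that evaluates each stored record against a category-specific retention schedule and drops expired records without human intervention (requirement 4.3). The schedule values, record shape, and store representation are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention schedule per data category, in days.
RETENTION_DAYS = {"conversation_log": 90, "billing_record": 365 * 6}

def expired(record: dict, now: datetime) -> bool:
    """True when the record's category-specific retention period has lapsed."""
    limit = timedelta(days=RETENTION_DAYS[record["category"]])
    return now - record["created_at"] > limit

def enforce_retention(store: list, now: datetime) -> list:
    """Delete expired records automatically (4.3); in production this would
    run against every data store, including logs and caches (4.4)."""
    return [r for r in store if not expired(r, now)]

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
store = [
    {"category": "conversation_log",
     "created_at": datetime(2025, 1, 1, tzinfo=timezone.utc)},  # 151 days old
    {"category": "conversation_log",
     "created_at": datetime(2025, 5, 1, tzinfo=timezone.utc)},  # 31 days old
]
kept = enforce_retention(store, now)  # only the 31-day-old record survives
```

Keying the schedule by data category, rather than by store, is what allows different retention periods to coexist within a single interaction.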
Retention governance is particularly important for AI agents because agents generate large volumes of derived data — conversation logs, reasoning traces, intermediate results, and inferences — that may contain personal data even when the original data source has been deleted. An organisation that deletes a customer record from its CRM but retains 18 months of agent conversation logs containing that customer's personal data has not complied with its retention obligations.
AG-060 establishes the data minimisation specification as the central governance artefact. The specification maps each agent to its permitted data categories, field-level access scope, retention periods, and anonymisation requirements. The enforcement layer applies the specification at the data access layer, the context management layer, and the storage layer.
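One plausible shape for that specification artefact is structured data keyed by agent, which the enforcement layers can read directly. The entry below is a hypothetical sketch; the agent name, field names, and retention notation are illustrative, not prescribed by AG-060.

```python
# Hypothetical shape of the data minimisation specification: one entry
# per agent, mapping its purpose to permitted data categories, field-level
# scope, retention periods, and anonymisation requirements.
MINIMISATION_SPEC = {
    "billing-dispute-agent": {
        "purpose": "billing dispute resolution",
        "data_categories": ["identity", "billing"],
        "fields": ["name", "account_number", "billing_history"],
        "retention": {"conversation_log": "90d", "resolution_record": "6y"},
        "anonymise_before_training": True,
    },
}
```

Because the same artefact drives the data access layer, the context management layer, and the storage layer, a change to an agent's permitted scope is made in one place and enforced everywhere.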
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Financial regulations often impose minimum retention periods that conflict with data minimisation — for example, MiFID II requires retention of transaction records for 5-7 years, while data minimisation requires deletion when no longer necessary. The solution is purpose-specific retention: transaction records are retained for the regulatory period, but ancillary personal data (e.g., customer service conversation logs not related to transactions) follows a shorter retention schedule. AI agents must apply different retention periods to different data categories within the same interaction.
Healthcare. Medical records have jurisdiction-specific retention requirements that may extend decades (e.g., 30 years for paediatric records in the UK). Data minimisation in healthcare focuses on limiting access scope rather than shortening retention: an AI triage agent should access only the clinical data relevant to the current presentation, not the patient's entire medical history. HIPAA's "minimum necessary" standard directly maps to AG-060's field-level filtering requirement.
Public Sector. Freedom of Information (FOI) and public records requirements may require retention of AI agent interaction logs as public records. Minimisation in this context means limiting the personal data captured in logs rather than limiting the retention period. Redaction pipelines that remove personal data from logs before archival can satisfy both retention and minimisation requirements simultaneously.
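A redaction pipeline of the kind described can be sketched as a pre-archival transform over log lines. The patterns below are a minimal, assumed illustration; production pipelines would combine pattern matching with NER-based PII detection rather than relying on regexes alone.

```python
import re

# Hypothetical redaction rules applied before logs are archived as records.
REDACTIONS = [
    (re.compile(r"\b\d{16}\b"), "[CARD]"),                    # bare 16-digit PANs
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email addresses
]

def redact(log_line: str) -> str:
    """Strip personal data from a log line, keeping the rest intact."""
    for pattern, token in REDACTIONS:
        log_line = pattern.sub(token, log_line)
    return log_line

line = "Customer jane@example.com paid with 4111111111111111"
clean = redact(line)  # "Customer [EMAIL] paid with [CARD]"
```

The archived record still satisfies the public-records obligation (the interaction is preserved) while the personal data within it has been minimised.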
Basic Implementation — The organisation has documented data minimisation requirements for each agent and defined retention periods for each data category. Access controls restrict agents to relevant databases or tables, but field-level filtering is not implemented — agents access all fields in permitted tables. Retention enforcement is manual (periodic review and deletion) or semi-automated (scheduled deletion of records older than a threshold). Conversation logs are retained without field-level filtering. Training data is not subject to minimisation controls. This level meets minimum documentation requirements but leaves significant gaps in structural enforcement.
Intermediate Implementation — Field-level data access controls are implemented through purpose-bound views or API projections. Agents access only the fields required for their defined purpose. Retention enforcement is fully automated across primary data stores, conversation logs, and context caches. Personal data is filtered from conversation logs at the point of persistence, retaining only the minimum necessary for quality assurance. Training data undergoes anonymisation before ingestion into training pipelines. Minimisation metrics are tracked and reported quarterly: data fields accessible versus required, retention compliance rates, and volumes processed by automated retention.
Advanced Implementation — All intermediate capabilities plus: dynamic minimisation adjusts data fields provided to the agent based on the specific task within a session; context window hygiene automatically scrubs data no longer needed for the current processing step; training data anonymisation is verified through re-identification testing with k-anonymity >= 5; graduated retention automatically transitions data through full → pseudonymised → anonymised → deleted stages; and an independent data protection auditor verifies minimisation effectiveness annually. The organisation can demonstrate, for any agent at any point in time, exactly what personal data the agent can access and why each data field is necessary for the agent's purpose.
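The graduated-retention stages above (full → pseudonymised → anonymised → deleted) can be sketched as an age-driven downgrade function. The thresholds, field names, and use of a truncated hash as a pseudonymisation token are all assumptions for illustration; real pseudonymisation would use a keyed mapping held separately from the data.

```python
import hashlib
from datetime import timedelta

# Hypothetical graduated-retention thresholds.
OPERATIONAL = timedelta(days=90)     # full record retained
ANALYTICAL = timedelta(days=365)     # pseudonymised for analytics
RETENTION_END = timedelta(days=730)  # anonymised; deleted thereafter

def downgrade(record: dict, age: timedelta):
    """Transition a record through full -> pseudonymised -> anonymised -> deleted."""
    if age <= OPERATIONAL:
        return record
    if age <= ANALYTICAL:
        out = dict(record)
        # Stand-in pseudonymisation token; a keyed, separately stored
        # mapping would be used in practice.
        out["name"] = hashlib.sha256(record["name"].encode()).hexdigest()[:12]
        return out
    if age <= RETENTION_END:
        # Anonymised: direct identifiers removed entirely.
        return {k: v for k, v in record.items()
                if k not in {"name", "account_number"}}
    return None  # deleted at the end of the retention period
```

Driving the transitions purely from record age keeps the downgrade automatic, so no human intervention is needed at any stage.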
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-060 compliance requires verifying structural minimisation enforcement, retention automation, and the absence of data accumulation.
Test 8.1: Field-Level Access Restriction
Test 8.2: Retention Period Enforcement
Test 8.3: Conversation Log Filtering
Test 8.4: Context Window Accumulation Prevention
Test 8.5: Training Data Anonymisation
Test 8.6: Retention Enforcement Across All Data Stores
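Test 8.1 can be automated as a comparison between the fields an agent can actually retrieve and the fields its specification permits — the same ratio recommended as a metric in 4.8. A minimal sketch, assuming both sets can be enumerated; in a real harness the accessible set would be probed against the live data layer rather than supplied directly.

```python
# Hypothetical automated check for Test 8.1: the fields an agent can
# actually retrieve must not exceed the fields its specification permits.
def check_field_scope(accessible: set, permitted: set) -> list:
    """Return any fields accessible beyond the purpose-bound scope."""
    return sorted(accessible - permitted)

violations = check_field_scope(
    accessible={"name", "account_number", "billing_history", "payment_card"},
    permitted={"name", "account_number", "billing_history"},
)
# A non-empty result means the agent is over-provisioned and the test fails.
```

An empty result is the pass condition; any surplus field is a structural-enforcement failure regardless of whether the agent ever used it.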
| Regulation | Provision | Relationship Type |
|---|---|---|
| GDPR | Article 5(1)(c) (Data Minimisation) | Direct requirement |
| GDPR | Article 5(1)(e) (Storage Limitation) | Direct requirement |
| GDPR | Article 25 (Data Protection by Design and by Default) | Direct requirement |
| UK GDPR | Articles 5(1)(c), 5(1)(e), 25 (as retained) | Direct requirement |
| EU AI Act | Article 10 (Data and Data Governance) | Supports compliance |
| CCPA/CPRA | Section 1798.100(c) (Collection Limitation) | Direct requirement |
| HIPAA | 45 CFR 164.502(b) (Minimum Necessary Standard) | Direct requirement |
| LGPD (Brazil) | Article 6 (Necessity Principle) | Direct requirement |
| POPIA (South Africa) | Section 10 (Minimality) | Direct requirement |
| ISO 42001 | Clause 6.1 (Actions to Address Risks) | Supports compliance |
Article 5(1)(c) requires that personal data be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed." For AI agents, this principle directly prohibits maximum-privilege data access: an agent should not have access to all customer fields when it needs only three. AG-060 implements this principle through structural field-level access controls that ensure the agent can only access data that is necessary for its defined purpose. The EDPB has emphasised that data minimisation applies to AI systems and that "the fact that a large amount of data may improve the performance of the system does not in itself constitute a justification for collecting or processing more data than is necessary."
Article 5(1)(e) requires that personal data be "kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed." AG-060 implements this through automated retention enforcement that deletes or anonymises data when the retention period expires. The storage limitation principle applies to all forms of personal data storage, including conversation logs, context caches, and training datasets — not only to primary databases.
Article 25 requires that data protection principles — including minimisation — be implemented by design and by default. This means that an AI agent's default configuration must be minimised, and expanding data access requires affirmative justification. AG-060 supports this by requiring that minimisation is structural (enforced at the data layer) rather than instructional (relying on the agent's behaviour), which is the definition of "by design."
HIPAA's minimum necessary standard (45 CFR 164.502(b)) requires covered entities to make reasonable efforts to limit protected health information to the minimum necessary to accomplish the intended purpose. For AI agents in healthcare, this maps directly to AG-060's field-level filtering requirement: a triage agent should access only the clinical data relevant to the current presentation, not the patient's complete medical history. Unlike GDPR, HIPAA applies the minimum necessary standard to most uses and disclosures but exempts treatment purposes — however, an AI agent's access for operational purposes (billing, quality assurance) is not exempt.
The CPRA (amending CCPA) introduced Section 1798.100(c), which requires that a business's collection, use, retention, and sharing of personal information be "reasonably necessary and proportionate to achieve the purposes for which the personal information was collected or processed." This is functionally equivalent to GDPR's minimisation principle and is enforced by AG-060 through the same structural controls.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — affects every data subject whose personal data is over-collected, over-retained, or inadequately minimised by any agent |
Consequence chain: Without data minimisation enforcement, AI agents accumulate personal data far beyond what any purpose requires. The accumulation is invisible in day-to-day operations because the agent functions correctly — more data generally produces better responses. The exposure becomes apparent only upon audit, data breach, or regulatory inquiry. The consequences are threefold. First, regulatory: data minimisation is a core principle under every major data protection framework, and systematic violation is treated as a fundamental compliance failure rather than a technical shortcoming — fines under GDPR for principle violations are in the upper tier (up to €20 million or 4% of annual global turnover). Second, breach amplification: excessive data accumulation increases the blast radius of any data breach — a breach of a minimised data store exposes 6 fields per customer; a breach of an un-minimised store exposes 47 fields per customer, including fields that should never have been collected. Third, operational: remediation of historical data accumulation is expensive and disruptive — identifying, classifying, and deleting or anonymising years of accumulated data across multiple data stores, conversation logs, and training datasets can cost millions and take months to years.
Cross-references: AG-013 (Data Sensitivity and Exfiltration Prevention) provides the data classification foundation that minimisation enforcement relies upon; AG-020 (Purpose-Bound Operation Enforcement) defines the purposes against which minimisation is measured; AG-059 (Lawful Basis and Consent Enforcement) ensures that the data being minimised is processed under a valid legal basis; AG-061 (Erasure and Rectification Propagation Governance) addresses individual deletion requests, complementing AG-060's automated retention; AG-047 (Cross-Jurisdiction Compliance Governance) extends minimisation requirements across jurisdictions with varying standards.