Data Minimisation and Retention Governance requires that every AI agent collect, process, and retain only the personal data that is strictly necessary for its defined purpose — and that such data is deleted or anonymised when the retention period expires, without requiring human intervention. This dimension addresses a structural tendency of AI systems to maximise data access: more data generally improves model performance, agents with broad data access can handle a wider range of queries, and developers default to requesting maximum permissions rather than the minimum necessary. Without AG-060, AI agents accumulate personal data far beyond what any legitimate purpose requires, creating regulatory exposure, increasing the blast radius of data breaches, and violating the fundamental data protection principle that personal data must be adequate, relevant, and limited to what is necessary. AG-060 requires that minimisation and retention are enforced structurally — through data access controls, field-level filtering, and automated retention enforcement — not through instructions that ask the agent to "only use data that is necessary."
Scenario A — Agent Retains Full Customer Records After Purpose Completion: An AI agent for a telecommunications company processes customer service interactions. When a customer calls to dispute a billing charge, the agent retrieves the full customer record: name, address, date of birth, payment card details, call history, browsing history (for bundled internet service), and device identifiers. The agent resolves the billing dispute but retains the full customer record in its conversation context, which is persisted to a log database for quality assurance. After 18 months, the organisation discovers that 4.7 million full customer records — including payment card details — are stored in the quality assurance log, with no retention limit configured. The original purpose (billing dispute resolution) required only the customer name, account number, and billing history. The additional data — payment cards, browsing history, device identifiers — was never necessary for the purpose but was retrieved because the agent had access to the full record.
What went wrong: The agent was given access to the complete customer record rather than a purpose-limited view. The conversation context, including all retrieved data, was persisted without field-level filtering. No retention schedule was applied to the quality assurance log. Consequence: 4.7 million records containing payment card data in an uncontrolled store, PCI DSS violation (payment card data stored beyond transaction completion), GDPR violation (excessive data retention), data breach risk amplified by the volume and sensitivity of stored data, remediation cost of £890,000 including forensic review, data deletion, and control implementation.
Scenario B — Training Data Retention Without Anonymisation: An enterprise deploys an AI agent that learns from customer interactions to improve its responses. Customer conversations — including names, account numbers, health conditions (for an insurance company), and financial details — are fed into a fine-tuning pipeline. The training dataset grows to 12 million interactions over 2 years. When customers exercise their right to erasure (GDPR Article 17), the organisation discovers that the training data is not structured for individual record deletion — conversations are tokenised, embedded, and distributed across training batches. Deleting a specific customer's data would require retraining the model from scratch at a cost of £2.3 million. The organisation also discovers that 340,000 of the training records belong to customers whose accounts have been closed for more than 12 months, exceeding the retention period in the privacy notice.
What went wrong: Training data was treated as a separate category from operational data, exempt from minimisation and retention requirements. No anonymisation was applied before training. No mechanism existed to identify and remove individual records from training datasets. Retention periods were not applied to training data. Consequence: Inability to comply with erasure requests without full model retraining, €3.2 million fine for excessive retention, 18-month remediation programme to implement anonymisation pipeline and erasure-capable training architecture.
Scenario C — Context Window Accumulation Across Sessions: A general-purpose internal copilot agent maintains a "memory" feature that persists information across sessions to improve personalisation. Over time, the agent accumulates detailed personal data about employees: health conditions mentioned in casual conversation ("I have a doctor's appointment for my diabetes"), family situations ("I need to leave early — my daughter's school called"), financial concerns ("I'm worried about making rent this month"), and career intentions ("I've been interviewing at other companies"). None of this data is necessary for the agent's purpose (internal productivity assistance), and employees were not informed that conversational data would be retained indefinitely. When an employee's manager requests the agent's interaction logs for a performance review, the HR team discovers the accumulated personal data and recognises it as both a GDPR violation (special category health data processed without Article 9 condition) and an employment law liability.
What went wrong: The agent's memory feature accumulated personal data without minimisation controls. No filtering mechanism excluded personal data from persistent memory. No retention limit was applied to the memory store. Employees were not informed of the data processing. Consequence: GDPR violation for processing special category data without legal basis, employment tribunal exposure for inappropriate use of personal data in employment context, employee trust damage requiring withdrawal of the agent's memory feature entirely.
Scope: This dimension applies to all AI agents that process personal data in any form — structured records, unstructured text, conversation logs, context windows, training datasets, embeddings, vector stores, cached results, and derived inferences. The scope explicitly includes data that the agent processes transiently (e.g., in a context window during a session) as well as data that is persisted. Data minimisation applies at the point of collection (only request what is necessary), the point of processing (only use what is necessary for the current operation), and the point of storage (only retain what is necessary for the defined retention period). The scope extends to derived data: inferences about individuals generated by the agent are personal data and are subject to the same minimisation and retention requirements. The scope also covers model training data where that data contains personal data, including fine-tuning datasets, few-shot examples, and retrieval-augmented generation (RAG) knowledge bases.
4.1. A conforming system MUST restrict each agent's access to personal data to only the data categories that are necessary for the agent's defined purpose, enforced through access controls at the data layer — not through instructions to the agent.
4.2. A conforming system MUST implement field-level filtering that prevents personal data fields not required for the agent's purpose from being included in the agent's context, even when those fields exist in the same record or data source.
4.3. A conforming system MUST define and enforce a retention period for every category of personal data processed by each agent, with automated deletion or anonymisation when the retention period expires.
4.4. A conforming system MUST ensure that personal data in agent conversation logs, context caches, and session stores is subject to the same retention controls as personal data in primary data stores.
4.5. A conforming system MUST prevent agents from persisting personal data beyond the current processing session unless the persistence is necessary for the defined purpose and covered by the retention schedule.
4.6. A conforming system MUST apply minimisation controls to training data, fine-tuning datasets, and RAG knowledge bases — anonymising personal data before inclusion or, where anonymisation is not feasible, applying the same retention and access controls as operational data.
4.7. A conforming system SHOULD implement data minimisation as purpose-bound views or projections at the data layer, so that the agent structurally cannot access data fields beyond those required for its purpose.
4.8. A conforming system SHOULD maintain metrics on data minimisation effectiveness, including: the ratio of data fields accessible to the agent versus data fields required for the purpose, retention compliance rates, and volumes of data deleted or anonymised through automated retention enforcement.
4.9. A conforming system SHOULD implement graduated retention with automatic downgrading — full records retained for the operational period, then pseudonymised for the analytical period, then anonymised or deleted at the end of the retention period.
4.10. A conforming system MAY implement dynamic minimisation that adjusts the data fields provided to the agent based on the specific task within the session — providing fewer fields for simple queries and more fields (within the purpose-bound limit) for complex operations.
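The field-level filtering required by 4.2 and the purpose-bound projections recommended in 4.7 can be sketched as a projection applied at the data layer, before any record reaches the agent's context. This is a minimal illustration only; the purpose registry, field names, and record shape are hypothetical.

```python
# Hypothetical purpose-bound projection: the agent receives only the
# fields registered for its declared purpose, never the full record.
PURPOSE_FIELDS = {
    "billing_dispute": {"name", "account_number", "billing_history"},
    "address_change": {"name", "account_number", "address"},
}

def project_for_purpose(record: dict, purpose: str) -> dict:
    """Return only the fields necessary for the declared purpose (4.2)."""
    try:
        allowed = PURPOSE_FIELDS[purpose]
    except KeyError:
        raise PermissionError(f"No data access registered for purpose {purpose!r}")
    return {k: v for k, v in record.items() if k in allowed}

full_record = {
    "name": "A. Customer",
    "account_number": "AC-1001",
    "billing_history": ["2024-01 charge disputed"],
    "payment_card": "4111 ...",       # never necessary for a billing dispute
    "browsing_history": ["..."],      # never necessary for a billing dispute
}
minimised = project_for_purpose(full_record, "billing_dispute")
```

Because the projection runs at the data layer, the agent structurally cannot see the excluded fields, satisfying 4.1's requirement that enforcement not rely on instructions to the agent.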
Data Minimisation and Retention Governance enforces the data protection principle that personal data must be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed" (GDPR Article 5(1)(c)) and "kept in a form which permits identification of data subjects for no longer than is necessary" (GDPR Article 5(1)(e)). These are not aspirational goals — they are legal requirements with direct enforcement consequences.
AI agents create unique minimisation challenges because they operate on unstructured data in ways that resist traditional field-level access controls. A database query can be restricted to specific columns. An AI agent consuming a natural-language document cannot easily be prevented from reading a paragraph that contains both relevant and irrelevant personal data. The agent's context window may contain data retrieved for multiple sub-tasks, accumulating data that was necessary for one step but is no longer needed for subsequent steps. Training pipelines consume vast quantities of data, and the relationship between any individual training record and the trained model is opaque — making both minimisation and deletion technically challenging.
These challenges do not excuse non-compliance. They require architectural solutions: purpose-bound data views that filter at the infrastructure layer, context window management that clears unnecessary data between processing steps, training pipelines that anonymise before ingestion, and retention automation that enforces deletion without human intervention. The cost of these solutions is a fraction of the cost of a data protection enforcement action for excessive data accumulation.
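The retention-automation element above can be sketched as a scheduled job that evaluates each stored record against a category-specific retention schedule and drops expired records without human intervention (requirement 4.3). The schedule values, record shape, and store representation are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention schedule per data category, in days.
RETENTION_DAYS = {"conversation_log": 90, "billing_record": 365 * 6}

def expired(record: dict, now: datetime) -> bool:
    """True when the record's category-specific retention period has lapsed."""
    limit = timedelta(days=RETENTION_DAYS[record["category"]])
    return now - record["created_at"] > limit

def enforce_retention(store: list, now: datetime) -> list:
    """Delete expired records automatically (4.3); in production this would
    run against every data store, including logs and caches (4.4)."""
    return [r for r in store if not expired(r, now)]

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
store = [
    {"category": "conversation_log",
     "created_at": datetime(2025, 1, 1, tzinfo=timezone.utc)},  # 151 days old
    {"category": "conversation_log",
     "created_at": datetime(2025, 5, 1, tzinfo=timezone.utc)},  # 31 days old
]
kept = enforce_retention(store, now)  # only the 31-day-old record survives
```

Keying the schedule by data category, rather than by store, is what allows different retention periods to coexist within a single interaction.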
Retention governance is particularly important for AI agents because agents generate large volumes of derived data — conversation logs, reasoning traces, intermediate results, and inferences — that may contain personal data even when the original data source has been deleted. An organisation that deletes a customer record from its CRM but retains 18 months of agent conversation logs containing that customer's personal data has not complied with its retention obligations.
AG-060 establishes the data minimisation specification as the central governance artefact. The specification maps each agent to its permitted data categories, field-level access scope, retention periods, and anonymisation requirements. The enforcement layer applies the specification at the data access layer, the context management layer, and the storage layer.
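One plausible shape for that specification artefact is structured data keyed by agent, which the enforcement layers can read directly. The entry below is a hypothetical sketch; the agent name, field names, and retention notation are illustrative, not prescribed by AG-060.

```python
# Hypothetical shape of the data minimisation specification: one entry
# per agent, mapping its purpose to permitted data categories, field-level
# scope, retention periods, and anonymisation requirements.
MINIMISATION_SPEC = {
    "billing-dispute-agent": {
        "purpose": "billing dispute resolution",
        "data_categories": ["identity", "billing"],
        "fields": ["name", "account_number", "billing_history"],
        "retention": {"conversation_log": "90d", "resolution_record": "6y"},
        "anonymise_before_training": True,
    },
}
```

Because the same artefact drives the data access layer, the context management layer, and the storage layer, a change to an agent's permitted scope is made in one place and enforced everywhere.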
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Financial regulations often impose minimum retention periods that conflict with data minimisation — for example, MiFID II requires retention of transaction records for 5-7 years, while data minimisation requires deletion when no longer necessary. The solution is purpose-specific retention: transaction records are retained for the regulatory period, but ancillary personal data (e.g., customer service conversation logs not related to transactions) follows a shorter retention schedule. AI agents must apply different retention periods to different data categories within the same interaction.
Healthcare. Medical records have jurisdiction-specific retention requirements that may extend decades (e.g., 30 years for paediatric records in the UK). Data minimisation in healthcare focuses on limiting access scope rather than shortening retention: an AI triage agent should access only the clinical data relevant to the current presentation, not the patient's entire medical history. HIPAA's "minimum necessary" standard directly maps to AG-060's field-level filtering requirement.
Public Sector. Freedom of Information (FOI) and public records requirements may require retention of AI agent interaction logs as public records. Minimisation in this context means limiting the personal data captured in logs rather than limiting the retention period. Redaction pipelines that remove personal data from logs before archival can satisfy both retention and minimisation requirements simultaneously.
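A redaction pipeline of the kind described can be sketched as a pre-archival transform over log lines. The patterns below are a minimal, assumed illustration; production pipelines would combine pattern matching with NER-based PII detection rather than relying on regexes alone.

```python
import re

# Hypothetical redaction rules applied before logs are archived as records.
REDACTIONS = [
    (re.compile(r"\b\d{16}\b"), "[CARD]"),                    # bare 16-digit PANs
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email addresses
]

def redact(log_line: str) -> str:
    """Strip personal data from a log line, keeping the rest intact."""
    for pattern, token in REDACTIONS:
        log_line = pattern.sub(token, log_line)
    return log_line

line = "Customer jane@example.com paid with 4111111111111111"
clean = redact(line)  # "Customer [EMAIL] paid with [CARD]"
```

The archived record still satisfies the public-records obligation (the interaction is preserved) while the personal data within it has been minimised.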
Basic Implementation — The organisation has documented data minimisation requirements for each agent and defined retention periods for each data category. Access controls restrict agents to relevant databases or tables, but field-level filtering is not implemented — agents access all fields in permitted tables. Retention enforcement is manual (periodic review and deletion) or semi-automated (scheduled deletion of records older than a threshold). Conversation logs are retained without field-level filtering. Training data is not subject to minimisation controls. This level meets minimum documentation requirements but leaves significant gaps in structural enforcement.
Intermediate Implementation — Field-level data access controls are implemented through purpose-bound views or API projections. Agents access only the fields required for their defined purpose. Retention enforcement is fully automated across primary data stores, conversation logs, and context caches. Personal data is filtered from conversation logs at the point of persistence, retaining only the minimum necessary for quality assurance. Training data undergoes anonymisation before ingestion into training pipelines. Minimisation metrics are tracked and reported quarterly: data fields accessible versus required, retention compliance rates, and volumes processed by automated retention.
Advanced Implementation — All intermediate capabilities plus: dynamic minimisation adjusts data fields provided to the agent based on the specific task within a session; context window hygiene automatically scrubs data no longer needed for the current processing step; training data anonymisation is verified through re-identification testing with k-anonymity >= 5; graduated retention automatically transitions data through full → pseudonymised → anonymised → deleted stages; and an independent data protection auditor verifies minimisation effectiveness annually. The organisation can demonstrate, for any agent at any point in time, exactly what personal data the agent can access and why each data field is necessary for the agent's purpose.
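The graduated-retention stages above (full → pseudonymised → anonymised → deleted) can be sketched as an age-driven downgrade function. The thresholds, field names, and use of a truncated hash as a pseudonymisation token are all assumptions for illustration; real pseudonymisation would use a keyed mapping held separately from the data.

```python
import hashlib
from datetime import timedelta

# Hypothetical graduated-retention thresholds.
OPERATIONAL = timedelta(days=90)     # full record retained
ANALYTICAL = timedelta(days=365)     # pseudonymised for analytics
RETENTION_END = timedelta(days=730)  # anonymised; deleted thereafter

def downgrade(record: dict, age: timedelta):
    """Transition a record through full -> pseudonymised -> anonymised -> deleted."""
    if age <= OPERATIONAL:
        return record
    if age <= ANALYTICAL:
        out = dict(record)
        # Stand-in pseudonymisation token; a keyed, separately stored
        # mapping would be used in practice.
        out["name"] = hashlib.sha256(record["name"].encode()).hexdigest()[:12]
        return out
    if age <= RETENTION_END:
        # Anonymised: direct identifiers removed entirely.
        return {k: v for k, v in record.items()
                if k not in {"name", "account_number"}}
    return None  # deleted at the end of the retention period
```

Driving the transitions purely from record age keeps the downgrade automatic, so no human intervention is needed at any stage.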
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-060 compliance requires verifying structural minimisation enforcement, retention automation, and the absence of data accumulation.
Test 8.1: Field-Level Access Restriction
Test 8.2: Retention Period Enforcement
Test 8.3: Conversation Log Filtering
Test 8.4: Context Window Accumulation Prevention
Test 8.5: Training Data Anonymisation
Test 8.6: Retention Enforcement Across All Data Stores
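Test 8.1 can be automated as a comparison between the fields an agent can actually retrieve and the fields its specification permits — the same ratio recommended as a metric in 4.8. A minimal sketch, assuming both sets can be enumerated; in a real harness the accessible set would be probed against the live data layer rather than supplied directly.

```python
# Hypothetical automated check for Test 8.1: the fields an agent can
# actually retrieve must not exceed the fields its specification permits.
def check_field_scope(accessible: set, permitted: set) -> list:
    """Return any fields accessible beyond the purpose-bound scope."""
    return sorted(accessible - permitted)

violations = check_field_scope(
    accessible={"name", "account_number", "billing_history", "payment_card"},
    permitted={"name", "account_number", "billing_history"},
)
# A non-empty result means the agent is over-provisioned and the test fails.
```

An empty result is the pass condition; any surplus field is a structural-enforcement failure regardless of whether the agent ever used it.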
| Regulation | Provision | Relationship Type |
|---|---|---|
| GDPR | Article 5(1)(c) (Data Minimisation) | Direct requirement |
| GDPR | Article 5(1)(e) (Storage Limitation) | Direct requirement |
| GDPR | Article 25 (Data Protection by Design and by Default) | Direct requirement |
| UK GDPR | Articles 5(1)(c), 5(1)(e), 25 (as retained) | Direct requirement |
| EU AI Act | Article 10 (Data and Data Governance) | Supports compliance |
| CCPA/CPRA | Section 1798.100(c) (Collection Limitation) | Direct requirement |
| HIPAA | 45 CFR 164.502(b) (Minimum Necessary Standard) | Direct requirement |
| LGPD (Brazil) | Article 6 (Necessity Principle) | Direct requirement |
| POPIA (South Africa) | Section 10 (Minimality) | Direct requirement |
| ISO 42001 | Clause 6.1 (Actions to Address Risks) | Supports compliance |
Article 5(1)(c) requires that personal data be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed." For AI agents, this principle directly prohibits maximum-privilege data access: an agent should not have access to all customer fields when it needs only three. AG-060 implements this principle through structural field-level access controls that ensure the agent can only access data that is necessary for its defined purpose. The EDPB has emphasised that data minimisation applies to AI systems and that "the fact that a large amount of data may improve the performance of the system does not in itself constitute a justification for collecting or processing more data than is necessary."
Article 5(1)(e) requires that personal data be "kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed." AG-060 implements this through automated retention enforcement that deletes or anonymises data when the retention period expires. The storage limitation principle applies to all forms of personal data storage, including conversation logs, context caches, and training datasets — not only to primary databases.
Article 25 requires that data protection principles — including minimisation — be implemented by design and by default. This means that an AI agent's default configuration must be minimised, and expanding data access requires affirmative justification. AG-060 supports this by requiring that minimisation is structural (enforced at the data layer) rather than instructional (relying on the agent's behaviour), which is the definition of "by design."
HIPAA's minimum necessary standard (45 CFR 164.502(b)) requires covered entities to make reasonable efforts to limit protected health information to the minimum necessary to accomplish the intended purpose. For AI agents in healthcare, this maps directly to AG-060's field-level filtering requirement: a triage agent should access only the clinical data relevant to the current presentation, not the patient's complete medical history. Unlike GDPR, HIPAA applies the minimum necessary standard to most uses and disclosures but exempts treatment purposes — however, an AI agent's access for operational purposes (billing, quality assurance) is not exempt.
The CPRA (amending CCPA) introduced Section 1798.100(c), which requires that a business's collection, use, retention, and sharing of personal information be "reasonably necessary and proportionate to achieve the purposes for which the personal information was collected or processed." This is functionally equivalent to GDPR's minimisation principle and is enforced by AG-060 through the same structural controls.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — affects every data subject whose personal data is over-collected, over-retained, or inadequately minimised by any agent |
Consequence chain: Without data minimisation enforcement, AI agents accumulate personal data far beyond what any purpose requires. The accumulation is invisible in day-to-day operations because the agent functions correctly — more data generally produces better responses. The exposure becomes apparent only upon audit, data breach, or regulatory inquiry. The consequences are threefold. First, regulatory: data minimisation is a core principle under every major data protection framework, and systematic violation is treated as a fundamental compliance failure rather than a technical shortcoming — fines under GDPR for principle violations are in the upper tier (up to €20 million or 4% of annual global turnover). Second, breach amplification: excessive data accumulation increases the blast radius of any data breach — a breach of a minimised data store exposes 6 fields per customer; a breach of an un-minimised store exposes 47 fields per customer, including fields that should never have been collected. Third, operational: remediation of historical data accumulation is expensive and disruptive — identifying, classifying, and deleting or anonymising years of accumulated data across multiple data stores, conversation logs, and training datasets can cost millions and take months to years.
Cross-references: AG-013 (Data Sensitivity and Exfiltration Prevention) provides the data classification foundation that minimisation enforcement relies upon; AG-020 (Purpose-Bound Operation Enforcement) defines the purposes against which minimisation is measured; AG-059 (Lawful Basis and Consent Enforcement) ensures that the data being minimised is processed under a valid legal basis; AG-061 (Erasure and Rectification Propagation Governance) addresses individual deletion requests, complementing AG-060's automated retention; AG-047 (Cross-Jurisdiction Compliance Governance) extends minimisation requirements across jurisdictions with varying standards.