AG-322

Data Minimisation by Design Governance

Privacy, Consent & Data Subject Rights · ~15 min read · AGS v2.1 · April 2026
EU AI Act · GDPR · FCA · NIST · HIPAA · ISO 42001

2. Summary

Data Minimisation by Design Governance requires that every AI agent collects, retains, and processes only the personal data that is strictly necessary for the specific processing purpose — and that this constraint is enforced structurally, not by relying on the agent's own judgement about what data it needs. The system must implement data access controls that limit what data the agent can read, retention policies that automatically purge data that is no longer necessary, and collection filters that prevent the agent from ingesting data beyond its defined scope. This dimension ensures that the GDPR Article 5(1)(c) principle of data minimisation is implemented as an engineering constraint, not merely a policy aspiration.

3. Example

Scenario A — Agent Over-Collects for "Future Use": An AI customer service agent for a telecommunications company is designed to resolve billing queries. The agent is given read access to the complete customer record — including call detail records, location data, device identifiers, payment card details, previous complaint history, and marketing preferences — because the development team anticipated that "the agent might need any of this data for edge cases." In practice, the agent uses only account balance, current plan, and recent invoice data for 98.7% of queries. However, the agent loads the full record into its context for every interaction, exposing all data to potential extraction through prompt injection. An adversary crafts a query that causes the agent to include call detail records in its response. Result: Location data and call records for 14,000 customers exposed. ICO enforcement action: GBP 2.1 million fine under data minimisation principle, plus mandatory data access remediation.

What went wrong: The agent had access to 47 data fields when it needed only 5 for its primary function. No data access scoping existed. The full customer record was loaded into the agent's context, maximising the attack surface. The development team's "might need it" approach directly violated data minimisation.

Scenario B — Retention Without Expiry: A financial advice AI agent retains conversation logs containing customer financial details — income, savings, pension values, investment holdings — indefinitely for "quality improvement." After 3 years of operation, the agent's conversation store contains 2.8 million conversations with detailed financial profiles. A data subject access request reveals that the agent retains conversations from 2 years ago despite the customer having closed their account 18 months prior. Result: FCA and ICO joint investigation. Finding: retention of financial data beyond the retention period specified in the privacy notice. Mandatory deletion of 1.9 million conversations, GBP 750,000 fine, and 6-month remediation programme.

What went wrong: No automated retention policy existed. Data was retained indefinitely by default. No mechanism purged data when the processing purpose ended (account closure). The agent's storage grew without limit, accumulating risk proportional to volume.

Scenario C — Minimisation Correctly Implemented: A healthcare triage AI agent is configured with field-level access controls. For triage (purpose PUR-TRIAGE-001), the agent can access: presenting symptoms, age range (not exact date of birth), gender, relevant medical history (conditions list — no clinical notes), and current medications. The agent cannot access: patient address, next of kin, billing information, or full clinical notes. When the agent requests data, the data access gateway returns only the permitted fields — the agent never sees the full record. Conversation logs are retained for 90 days, then automatically deleted. Retained logs are redacted to remove symptom descriptions after 30 days, retaining only the triage outcome for audit. Result: DPIA review confirms minimal data footprint. Regulatory audit finds no excess data. The agent processes 12,000 triage interactions per month with a data footprint of 340 MB — compared to an estimated 14 GB if full records were loaded.
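
The field-level controls in Scenario C could be captured declaratively. The sketch below is illustrative only: the policy schema and field names are assumptions rather than a normative format, though the purpose identifier, permitted and denied field sets, and retention periods mirror the scenario.

```python
# Illustrative only: a declarative access-and-retention policy for the
# Scenario C triage agent. The schema and field names are hypothetical;
# the purpose ID, field sets, and periods follow the scenario text.
TRIAGE_ACCESS_POLICY = {
    "purpose_id": "PUR-TRIAGE-001",
    "allowed_fields": [
        "presenting_symptoms",
        "age_range",           # banded range, never exact date of birth
        "gender",
        "conditions_list",     # relevant medical history, no clinical notes
        "current_medications",
    ],
    "denied_fields": [
        "patient_address",
        "next_of_kin",
        "billing_information",
        "clinical_notes",
    ],
    "retention": {
        "conversation_log_days": 90,       # hard delete after 90 days
        "redact_symptoms_after_days": 30,  # keep only triage outcome for audit
    },
}
```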

4. Requirement Statement

Scope: This dimension applies to all AI agents that process personal data. It covers three phases of the data lifecycle: collection (what data the agent ingests), processing (what data the agent accesses during operation), and retention (how long data remains in the agent's accessible storage). The scope includes data in all forms: structured records, unstructured text in conversation logs, embeddings in vector databases, cached data in agent memory, and derived features in feature stores. The scope extends to data received from upstream systems — an agent must not accept data fields beyond what is necessary for its defined purpose, even if the upstream system offers more. Agents that process only anonymised data verified as non-reversible are excluded.

4.1. A conforming system MUST implement field-level access controls that restrict each agent to the specific data fields necessary for its defined processing purpose, as documented in the purpose registry (AG-319).

4.2. A conforming system MUST enforce data collection limits at the ingestion layer, preventing agents from accepting or loading data fields beyond those authorised for their processing purpose.

4.3. A conforming system MUST implement automated retention policies that delete or anonymise personal data when the processing purpose ends or the defined retention period expires, whichever occurs first.

4.4. A conforming system MUST document the necessity justification for every personal data field accessed by each agent, linking the field to the specific processing purpose that requires it.

4.5. A conforming system MUST prevent agents from storing personal data in uncontrolled locations — including local files, unmanaged memory stores, or temporary storage — outside the governed data layer.

4.6. A conforming system MUST apply data minimisation to agent context windows, ensuring that conversation logs, prompt histories, and working memory are purged of personal data when it is no longer needed for the active interaction.

4.7. A conforming system SHOULD implement data access scoping dynamically based on the specific task, rather than granting a static maximum-access profile for all tasks.

4.8. A conforming system SHOULD conduct periodic minimisation reviews (minimum semi-annually) comparing actual data usage patterns against authorised data access, identifying fields that are authorised but never or rarely used.

4.9. A conforming system MAY implement progressive data loading, where the agent initially receives a minimal dataset and requests additional fields only when the interaction requires them, with each request logged and justified.
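
As an illustration of the progressive loading pattern in 4.9, the sketch below assumes a hypothetical gateway object exposing fetch(purpose_id, subject_id, fields); a fuller gateway sketch appears under Implementation Guidance. The class and logging scheme are assumptions, not a normative interface.

```python
# Minimal sketch of progressive data loading (4.9). Assumes a hypothetical
# gateway exposing fetch(purpose_id, subject_id, fields) -> dict that
# enforces the authorised field set server-side.
import logging

log = logging.getLogger("ag322.field_requests")

class ProgressiveLoader:
    def __init__(self, gateway, purpose_id, subject_id):
        self.gateway = gateway
        self.purpose_id = purpose_id
        self.subject_id = subject_id
        self.loaded = {}

    def initial_context(self, minimal_fields):
        """Start the interaction with only the minimal field set."""
        self.loaded = self.gateway.fetch(self.purpose_id, self.subject_id,
                                         list(minimal_fields))
        return dict(self.loaded)

    def request_field(self, field, justification):
        """Fetch one additional field; every request is logged and justified (4.9)."""
        log.info("field_request purpose=%s subject=%s field=%s justification=%s",
                 self.purpose_id, self.subject_id, field, justification)
        result = self.gateway.fetch(self.purpose_id, self.subject_id, [field])
        self.loaded.update(result)
        return result[field]
```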

5. Rationale

GDPR Article 5(1)(c) requires that personal data be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed." This is not a best practice — it is a mandatory principle whose violation attracts penalties under Article 83(5)(a) of up to EUR 20 million or 4% of global turnover.

AI agents create acute data minimisation risks for three reasons. First, agents are typically designed to be versatile — developers grant broad data access to handle edge cases, resulting in agents that can access far more data than they routinely need. Second, agents with large context windows can ingest and retain substantial volumes of personal data within a single session, creating an in-memory data store that persists for the interaction duration and is vulnerable to extraction. Third, agents often retain conversation logs indefinitely for quality improvement or model training, accumulating personal data without bounds.

The "might need it" approach to data access is the most common violation. Development teams, uncertain about which data fields the agent might need, grant access to the complete record. This maximises the attack surface — every field the agent can access is a field that can be extracted through prompt injection, context manipulation, or output interception. Data minimisation is therefore both a privacy control and a security control: reducing the data the agent can access reduces the data that can be compromised.

Retention without purpose is equally problematic. An agent that retains conversation logs containing financial details, health information, or personal identifiers for "quality improvement" without a defined retention period violates both data minimisation (retaining more than necessary) and storage limitation (Article 5(1)(e)). The risk compounds over time — an agent with 3 years of undeleted conversation logs represents a single point of compromise for 3 years of customer data.

6. Implementation Guidance

The core architecture for AG-322 compliance is a data access gateway that mediates all agent data access, returning only the fields authorised for the agent's specific purpose, combined with automated retention policies that enforce data lifecycle management.
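
A minimal sketch of such a gateway follows, assuming a purpose registry keyed by purpose identifier (AG-319) and a simple record store; both are hypothetical stand-ins rather than a prescribed API.

```python
# Illustrative data access gateway (4.1, 4.2). Every read is mediated and
# projected down to the purpose's authorised fields; the agent never sees
# the full record. The registry and store are hypothetical stand-ins.

class DataAccessGateway:
    def __init__(self, purpose_registry, record_store):
        # purpose_registry: purpose_id -> set of authorised field names (AG-319)
        self.purpose_registry = purpose_registry
        self.record_store = record_store

    def fetch(self, purpose_id, subject_id, requested_fields):
        allowed = self.purpose_registry[purpose_id]
        excess = set(requested_fields) - set(allowed)
        if excess:
            # Reject rather than silently filter, so over-collection is visible (4.2).
            raise PermissionError(
                f"fields not authorised for {purpose_id}: {sorted(excess)}")
        record = self.record_store.get(subject_id)
        # Return only the requested, authorised fields.
        return {f: record[f] for f in requested_fields if f in record}
```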

Recommended patterns:
- A purpose-scoped data access gateway that mediates every agent read and returns only the fields authorised for the declared purpose (4.1, 4.2; see the gateway sketch above).
- Automated, per-purpose retention schedules with deletion or anonymisation jobs that run on a fixed cadence (4.3).
- Context window preparation that filters personal data before model inference and purges it when no longer needed for the active interaction (4.6; see the sketch after this list).
- Dynamic, task-scoped access profiles rather than a static maximum-access grant (4.7).
- Progressive data loading with logged, justified field requests (4.9).

Anti-patterns to avoid:
- Granting access to the full record on a "might need it" basis: every accessible field is an extractable field.
- Loading the complete customer record into the agent's context for every interaction regardless of the task.
- Indefinite retention of conversation logs for "quality improvement" with no defined retention period.
- Storing personal data in local files, unmanaged memory stores, or temporary storage outside the governed data layer (4.5).
- Conflating regulatory retention obligations with blanket retention of all data.
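
The context-window preparation pattern above might look like the following sketch; the message structure (step_id and content keys) is an assumption made for illustration.

```python
# Sketch of context-window minimisation (4.6): before each model call the
# working context is filtered so personal data from completed steps is
# purged. The message shape (step_id, content) is an assumption.

def prepare_context(messages, active_step_ids):
    """Retain personal data only for steps still active in the interaction."""
    prepared = []
    for msg in messages:
        if msg["step_id"] in active_step_ids:
            prepared.append(msg)  # still needed: pass through unchanged
        else:
            # Completed step: keep the structural record, purge the content.
            prepared.append({**msg, "content": "[purged per AG-322 4.6]"})
    return prepared
```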

Industry Considerations

Financial Services. Regulatory retention requirements (7 years for transaction records under FCA/SEC rules) coexist with minimisation requirements. The resolution is purpose-specific retention: financial records are retained for regulatory periods while marketing data is purged on shorter cycles. Agents must not conflate regulatory retention obligations with blanket data retention.
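
A sketch of purpose-specific retention under these constraints follows; the purpose identifiers, periods, and store API are illustrative assumptions.

```python
# Sketch of purpose-specific retention (4.3): each purpose carries its own
# period, so regulatory retention never becomes blanket retention. The
# purpose IDs, periods, and store API are illustrative assumptions.
from datetime import datetime, timedelta, timezone

RETENTION_PERIODS = {
    "PUR-TXN-RECORDS-001": timedelta(days=7 * 365),  # regulatory retention period
    "PUR-MARKETING-004": timedelta(days=180),        # shorter marketing cycle
    "PUR-TRIAGE-001": timedelta(days=90),            # Scenario C conversation logs
}

def purge_expired(store, now=None):
    """Scheduled job: delete or anonymise records past their purpose's retention."""
    now = now or datetime.now(timezone.utc)
    for record in store.all_records():
        deadline = record.created_at + RETENTION_PERIODS[record.purpose_id]
        if record.purpose_ended or now >= deadline:  # whichever occurs first (4.3)
            store.delete(record.id)  # or store.anonymise(record.id)
```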

Healthcare. Clinical records have long retention requirements (8-30 years depending on jurisdiction and record type), but data used for AI-assisted triage or appointment scheduling does not inherit clinical retention periods. Minimisation requires separating clinical records from operational AI data.

Public Sector. FOI obligations may require retention of decision records, but minimisation still applies to the personal data within those records. Redaction of personal data from retained records after the operational retention period is a recommended practice.

Maturity Model

Basic Implementation — The organisation has documented which data fields each agent accesses and the purpose justification for each field. Retention periods are defined in policy. Data access is controlled at the system or table level rather than the field level. Retention enforcement is manual or semi-automated. This level meets minimum requirements but creates risk through coarse access controls and manual retention enforcement.

Intermediate Implementation — Field-level access controls restrict agent data access to purpose-specific field sets. Automated retention policies execute daily, deleting or anonymising data on schedule. Context window preparation filters personal data before model inference. Periodic minimisation reviews compare actual usage against authorised access. Conversation logs are automatically redacted on defined schedules.

Advanced Implementation — All intermediate capabilities plus: dynamic data access scoping adjusts field access based on the specific task within an interaction. Progressive data loading minimises initial data exposure. Minimisation metrics (data fields accessed vs. data fields authorised, average context window personal data volume, retention compliance rate) are tracked in real-time dashboards. Independent testing confirms that agents cannot access data beyond their authorised scope under any input condition. Data access patterns are analysed for anomalies indicating over-collection.
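
Two of the minimisation metrics named above could be computed along these lines; the access-log and retention-check record shapes are assumptions.

```python
# Hypothetical computation of two minimisation metrics from the advanced
# maturity level; the input record shapes are assumptions.

def minimisation_metrics(access_log, authorised_fields, retention_checks):
    used = {entry["field"] for entry in access_log}
    return {
        # Fields actually used vs fields authorised: input to 4.8 reviews.
        "field_utilisation": len(used & set(authorised_fields))
                             / max(len(authorised_fields), 1),
        # Authorised-but-unused fields are candidates for revocation.
        "unused_fields": sorted(set(authorised_fields) - used),
        # Share of records deleted or anonymised on schedule (4.3).
        "retention_compliance_rate": (
            sum(1 for c in retention_checks if c["purged_on_time"])
            / max(len(retention_checks), 1)
        ),
    }
```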

7. Evidence Requirements

Required artefacts:
- Data access matrix mapping each agent to its authorised fields and the purpose (AG-319) justifying each field (4.1, 4.4).
- Necessity justification records for every personal data field accessed (4.4).
- Retention policy configuration and deletion/anonymisation execution logs (4.3).
- Minimisation review reports comparing actual usage against authorised access (4.8).
- Progressive loading request logs with per-field justifications, where implemented (4.9).

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Field-Level Access Enforcement
Verify that an agent requesting its authorised fields receives exactly those fields and nothing more (4.1).

Test 8.2: Unauthorised Field Request Rejection
Verify that any request including a field outside the authorised set is rejected (4.2).

Test 8.3: Automated Retention Enforcement
Verify that personal data past its defined retention period is deleted or anonymised automatically, without manual intervention (4.3).

Test 8.4: Context Window Minimisation
Verify that personal data no longer needed for the active interaction is purged from conversation logs, prompt histories, and working memory (4.6).

Test 8.5: Uncontrolled Storage Prevention
Verify that the agent cannot write personal data to local files, unmanaged memory stores, or temporary storage outside the governed data layer (4.5).

Test 8.6: Retention on Purpose Completion
Verify that personal data is deleted or anonymised when the processing purpose ends (e.g. account closure), even if the retention period has not yet expired (4.3).
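
Tests 8.1 and 8.2 might be automated along these lines, shown pytest-style against the illustrative gateway from the Implementation Guidance; the gateway fixture, purpose identifier, and field names are assumptions.

```python
# Hedged pytest-style sketch of Tests 8.1 and 8.2, written against the
# illustrative DataAccessGateway above; the gateway fixture, purpose ID,
# and field names are assumptions.
import pytest

def test_field_level_access_enforcement(gateway):
    # Test 8.1: requesting authorised fields returns those fields and no others.
    result = gateway.fetch("PUR-TRIAGE-001", "subject-1",
                           ["presenting_symptoms", "current_medications"])
    assert set(result) <= {"presenting_symptoms", "current_medications"}

def test_unauthorised_field_request_rejection(gateway):
    # Test 8.2: any request including an unauthorised field is rejected outright.
    with pytest.raises(PermissionError):
        gateway.fetch("PUR-TRIAGE-001", "subject-1",
                      ["presenting_symptoms", "patient_address"])
```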

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
GDPR | Article 5(1)(c) (Data Minimisation) | Direct requirement
GDPR | Article 5(1)(e) (Storage Limitation) | Direct requirement
GDPR | Article 25 (Data Protection by Design and by Default) | Direct requirement
EU AI Act | Article 10 (Data and Data Governance) | Supports compliance
CCPA/CPRA | Section 1798.100(c) (Collection Limitation) | Direct requirement
UK Data Protection Act 2018 | Schedule 1, Part 4 (Appropriate Policy Document) | Supports compliance
HIPAA | 45 CFR 164.502(b) (Minimum Necessary Standard) | Direct requirement
NIST AI RMF | MAP 1.5, MANAGE 2.4 | Supports compliance
ISO 42001 | Clause 8.4 (AI System Operation) | Supports compliance

GDPR — Article 5(1)(c) (Data Minimisation)

Article 5(1)(c) requires that personal data be "adequate, relevant and limited to what is necessary." AG-322 directly implements this by structurally limiting what data agents can access, collect, and retain. The ICO's guidance on data minimisation states that organisations should "identify the minimum amount of personal data you need to fulfil your purpose" — AG-322's data access matrix implements exactly this identification and enforces it at the infrastructure layer.

GDPR — Article 25 (Data Protection by Design and by Default)

Article 25(2) requires that "by default, only personal data which are necessary for each specific purpose of the processing are processed." This applies to "the amount of personal data collected, the extent of their processing, the period of their storage and their accessibility." AG-322 implements Article 25 for AI agents by ensuring that field-level access controls, retention automation, and context window minimisation are built into the agent architecture, not added as afterthoughts.

HIPAA — 45 CFR 164.502(b) (Minimum Necessary Standard)

The HIPAA minimum necessary standard requires covered entities to make reasonable efforts to use, disclose, and request only the minimum amount of protected health information needed for the purpose. For AI agents operating in healthcare, AG-322's field-level access controls directly implement the minimum necessary standard by ensuring agents access only the specific health data fields required for their function.

CCPA/CPRA — Section 1798.100(c)

The CPRA added explicit collection limitation requirements, requiring that businesses "shall not collect additional categories of personal information or use personal information collected for additional purposes" beyond what is disclosed. AG-322's collection filters and purpose-scoped access controls provide the structural mechanism for CPRA collection limitation compliance.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Organisation-wide — every interaction exposes excess data; every retained record beyond necessity accumulates risk

Consequence chain: Over-collection and over-retention create compounding risk. Every excess data field the agent can access is an additional field exposed if the agent is compromised through prompt injection, context manipulation, or output interception. An agent with access to 47 fields when it needs 5 exposes 42 unnecessary fields to every attack vector. Over time, indefinite retention accumulates a growing data store: an agent processing 10,000 interactions per day retaining full conversation logs creates 3.65 million retained records per year. A single breach of this store exposes years of personal data. The regulatory exposure is dual: GDPR Article 5(1)(c) violation for over-collection (up to EUR 20 million or 4% of turnover) and Article 5(1)(e) violation for over-retention (same penalty range). The violations are systemic — they apply to every data subject processed, strengthening the case for the maximum penalty. For AI agents specifically, the attack surface amplification makes data minimisation a security imperative as well as a privacy requirement.

Cross-references: AG-059 (Data Classification & Sensitivity Labelling), AG-060 (Consent & Lawful Basis Verification), AG-061 (Data Subject Rights Execution), AG-063 (Privacy-by-Design Integration), AG-013 (Multi-Jurisdictional Compliance Mapping), AG-319 (Purpose-Consent Granularity Governance), AG-321 (Sensitive Attribute Inference Governance), AG-323 (Children's Data Restriction Governance), AG-326 (Privacy Impact Assessment Trigger Governance).

Cite this protocol
AgentGoverning. (2026). AG-322: Data Minimisation by Design Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-322