High-Cardinality Trace Retention Governance requires that organisations deploying AI agents establish formally governed policies and mechanisms for retaining detailed execution traces — the fine-grained records of an agent's reasoning steps, tool invocations, intermediate results, confidence scores, context retrievals, and decision branch points — for material decisions, without those traces becoming a liability through excessive storage cost, privacy violation, or operational degradation. High-cardinality traces are distinguished from summary logs by their granularity: where a log entry records "Agent approved claim #4821 for £34,000," a trace records the 47 intermediate steps, 12 tool calls, 3 retrieved documents, 8 confidence scores, and 2 fallback invocations that produced that approval. These traces are essential for forensic reconstruction, bias investigation, regulatory audit, and model governance — but they also contain personal data, proprietary reasoning chains, and volumes that can overwhelm storage and search infrastructure. This dimension mandates a governed approach to trace retention that balances evidentiary completeness against privacy obligations, cost constraints, and operational sustainability.
Scenario A — Trace Destruction Prevents Bias Investigation: A public sector benefits agency deploys an AI agent to assess eligibility for housing assistance. The agent processes 45,000 applications per year, generating an average of 2,300 trace records per application (reasoning steps, data retrievals, scoring calculations, threshold comparisons). The annual trace volume is therefore 103.5 million records, consuming 4.2 TB of storage. After 14 months, storage costs reach £186,000 per year, and the IT team implements a 90-day trace retention policy to reduce costs. Six months later, a civil liberties organisation files a complaint alleging that the agent systematically denies applications from specific ethnic groups at a 28% higher rate. The agency needs to investigate 18 months of decisions — but traces older than 90 days have been destroyed. Only summary logs remain, showing the outcome (approved/denied) but not the reasoning path. The agency cannot determine whether the agent's scoring model weighted certain features differently for different demographic groups because the intermediate scoring traces are gone. The investigation stalls. A judicial review finds the agency unable to demonstrate that its decision-making was non-discriminatory, resulting in a £2.4 million remediation order requiring manual re-assessment of 12,000 denied applications.
What went wrong: The trace retention policy was driven by storage cost alone, with no consideration of evidentiary requirements. The 90-day retention period was insufficient for the bias investigation timeline, which required 18 months of historical traces. No governance process assessed the retention period against regulatory obligations (public sector equality duties require retention sufficient to support discrimination investigations, which routinely span 12-24 months). Consequence: inability to investigate a bias allegation, judicial review finding, £2.4 million remediation order, and reputational damage to the agency.
Scenario B — Unredacted Traces Create a Privacy Breach: A customer-facing insurance agent processes health insurance claims. Each trace includes the full context window at each reasoning step — which, for health insurance claims, contains detailed medical histories, diagnoses, treatment records, and clinician notes. The organisation retains traces for 7 years to meet financial services record-keeping requirements. A security audit discovers that the trace storage system is accessible to 340 employees across IT operations, data engineering, and development teams — all of whom can query the raw traces and read the medical data. The traces were never subject to data classification (AG-014) or PII handling (AG-015) because they were treated as "system logs" rather than "personal data." A developer exports 12,000 traces containing medical data for a model fine-tuning project without a data protection impact assessment. The breach is reported to the data protection authority, which levies a £780,000 fine for inadequate technical measures under GDPR Article 32 and for processing personal data without a lawful basis for the fine-tuning use under GDPR Article 6.
What went wrong: Traces were treated as technical infrastructure rather than as data assets containing personal and sensitive data. No data classification was applied to trace contents. Access controls were not designed for the sensitivity of the data within traces. The retention policy satisfied financial record-keeping requirements but violated data minimisation principles by retaining full unredacted medical data for 7 years when the investigative need could have been met with redacted or pseudonymised traces. Consequence: £780,000 regulatory fine, mandatory remediation programme, restriction on trace retention pending implementation of access controls and redaction.
Scenario C — Trace Volume Degrades Forensic Usability: A financial-value agent executes high-frequency portfolio rebalancing, generating 14,000 trace records per hour during market hours — reasoning steps, price lookups, risk calculations, order constructions, and execution confirmations. Over a seven-hour trading day this produces 98,000 trace records; over a quarter of roughly 65 trading days, 6.37 million. When a regulatory enquiry requires the organisation to reconstruct the agent's decision-making for a specific sequence of trades on a specific day, the investigation team must search through 98,000 records to find the relevant traces. The trace storage system has no indexing beyond timestamp and agent identifier. Queries take 45 minutes per search. Reconstructing the decision path for 12 related trades takes the investigation team 3 weeks. The regulator interprets the delay as non-cooperation. The total investigation cost is £420,000 in analyst time, legal fees, and regulatory engagement.
What went wrong: Traces were retained but not indexed or structured for forensic retrieval. The volume of traces (98,000 per day) made linear search impractical. No trace correlation mechanism linked related traces (e.g., all traces contributing to a single trade decision). No tiered storage strategy distinguished between high-frequency operational traces (which could be aggregated after a short period) and decision-significant traces (which required full-fidelity retention with rich indexing). Consequence: £420,000 investigation cost, regulatory relationship damage, and a 3-week delay in a time-sensitive enquiry.
Scope: This dimension applies to every AI agent deployment where the agent produces execution traces — detailed records of reasoning steps, intermediate computations, tool invocations, data retrievals, confidence scores, or decision branch points — beyond summary-level log entries. The scope includes the policies, mechanisms, and infrastructure governing how long traces are retained, at what fidelity, with what access controls, under what privacy safeguards, and with what retrieval capabilities. It applies regardless of whether traces are stored locally, in cloud object storage, in a time-series database, or in a distributed tracing system. The scope explicitly includes the intersection of trace retention with data protection obligations: traces frequently contain personal data, and retention policies must satisfy both evidentiary requirements (retain enough) and privacy requirements (retain no more than necessary, with appropriate safeguards). Organisations that rely on third-party observability platforms must ensure that the platform's retention policies and access controls conform to this dimension.
4.1. A conforming system MUST define a trace retention policy for each agent deployment (or class of deployments) that specifies: the categories of traces retained, the retention period for each category, the fidelity level (full, summarised, or redacted) at each stage of the retention lifecycle, and the justification for each retention decision referencing specific evidentiary, regulatory, or operational requirements.
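To make 4.1 concrete, the policy can be expressed as a machine-readable artefact rather than a prose document, so that lifecycle automation (4.9) and audits can consume it directly. The following is a minimal sketch in Python; the class names, the 730-day period, and the justification strings are illustrative assumptions, not normative values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRetentionRule:
    trace_category: str   # e.g. "reasoning_step", "tool_invocation"
    retention_days: int   # how long this category is kept
    fidelity: str         # "full" | "summarised" | "redacted"
    justification: str    # evidentiary, regulatory, or operational basis


@dataclass(frozen=True)
class TraceRetentionPolicy:
    deployment_id: str
    rules: tuple          # tuple of TraceRetentionRule


# Illustrative policy for a hypothetical high-risk claims agent; clause 4.5
# sets the 24-month (roughly 730-day) floor for material-decision traces.
policy = TraceRetentionPolicy(
    deployment_id="claims-agent-eu-prod",
    rules=(
        TraceRetentionRule("decision_significant", 730, "full",
                           "EU AI Act Art. 12; 18-24 month bias-audit window"),
        TraceRetentionRule("operational", 90, "summarised",
                           "incident triage only; aggregated after 90 days"),
    ),
)
```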
4.2. A conforming system MUST classify trace data according to the organisation's data classification framework (per AG-014), identifying which traces contain personal data, sensitive personal data, proprietary information, or other protected categories, and applying the corresponding handling requirements.
4.3. A conforming system MUST implement access controls on retained traces that restrict access to authorised personnel with a documented need, enforcing the principle of least privilege, and logging all access to trace data including the accessor identity, timestamp, query parameters, and volume of data accessed.
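Clause 4.3's access-logging requirement maps naturally onto a structured log record. A minimal sketch follows; the `emit_access_log` function and its sink are assumptions, but the four captured fields are exactly those the clause names.

```python
import json
from datetime import datetime, timezone


def emit_access_log(accessor: str, query_params: dict, rows_returned: int) -> str:
    """Serialise one trace-store access event for the audit log."""
    record = {
        "accessor": accessor,                              # authenticated identity
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query_params": query_params,                      # what was asked for
        "rows_returned": rows_returned,                    # volume of data accessed
    }
    return json.dumps(record)


print(emit_access_log("jane.doe@example.org",
                      {"agent_id": "claims-agent", "date": "2025-03-01"}, 412))
```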
4.4. A conforming system MUST implement redaction or pseudonymisation of personal data within traces where the evidentiary purpose can be served without retaining identifiable data, applying redaction at the earliest feasible point in the trace lifecycle.
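One common way to satisfy 4.4 is keyed hashing: a one-way token replaces the direct identifier but stays stable across traces, so bias audits can still group a subject's decisions. A minimal sketch, assuming a hypothetical per-deployment secret and illustrative field names:

```python
import hashlib
import hmac

PSEUDONYM_KEY = b"rotate-me-per-deployment"  # hypothetical secret


def pseudonymise(subject_id: str) -> str:
    """Replace a direct identifier with a keyed one-way token that is stable
    across traces, so a subject's decisions can still be grouped for audit."""
    return hmac.new(PSEUDONYM_KEY, subject_id.encode(), hashlib.sha256).hexdigest()[:16]


def redact_trace(trace: dict, pii_fields: set) -> dict:
    """Tokenise the subject identifier and blank other personal-data fields
    before the trace enters long-term storage."""
    out = {}
    for key, value in trace.items():
        if key == "data_subject_id":
            out[key] = pseudonymise(value)   # keep linkability, lose identity
        elif key in pii_fields:
            out[key] = "[REDACTED]"          # evidentiary need met without PII
        else:
            out[key] = value
    return out


trace = {"data_subject_id": "NHS-123", "diagnosis": "E11.9", "score": 0.82}
print(redact_trace(trace, pii_fields={"diagnosis"}))
```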
4.5. A conforming system MUST retain traces for material decisions — decisions with safety, financial, legal, or rights-affecting consequences — for a period sufficient to support regulatory investigation, bias audit, and legal proceedings, which MUST be no less than 24 months for high-risk agents and no less than 12 months for all other agents, unless a longer period is required by applicable regulation.
4.6. A conforming system MUST index retained traces to support forensic retrieval by at minimum: timestamp, agent identifier, decision identifier (linking all traces contributing to a single decision), event category (per AG-409 taxonomy), and data subject identifier (pseudonymised where required).
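The index in 4.6 is deliberately small: five columns are enough to turn Scenario C's 45-minute linear scans into single lookups. A minimal sketch using SQLite for illustration; the schema and column names are assumptions that follow the clause, not a prescribed layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trace_index (
    trace_id          TEXT PRIMARY KEY,
    ts                TEXT NOT NULL,   -- ISO-8601; relies on AG-412 clock sync
    agent_id          TEXT NOT NULL,
    decision_id       TEXT NOT NULL,   -- links all traces behind one decision
    event_category    TEXT NOT NULL,   -- per the AG-409 taxonomy
    subject_pseudonym TEXT             -- pseudonymised data-subject identifier
);
CREATE INDEX idx_decision ON trace_index (decision_id);
CREATE INDEX idx_agent_ts ON trace_index (agent_id, ts);
CREATE INDEX idx_subject  ON trace_index (subject_pseudonym);
""")

# Scenario C's investigation then reduces to a single indexed query:
#   SELECT trace_id FROM trace_index WHERE decision_id = ?
```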
4.7. A conforming system MUST implement a tiered retention strategy that distinguishes between decision-significant traces (requiring full-fidelity, long-term retention with rich indexing) and operational traces (eligible for aggregation, summarisation, or shorter retention periods), with documented criteria for the tier assignment.
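Tier assignment under 4.7 reduces to a documented predicate over decision characteristics. A minimal sketch follows; the £10,000 threshold and the field names are assumptions, and a real deployment would derive them from its own materiality criteria.

```python
def assign_tier(value_gbp: float, rights_affecting: bool, safety_relevant: bool) -> str:
    """Map decision characteristics to a retention tier (clause 4.7)."""
    if rights_affecting or safety_relevant or value_gbp >= 10_000:
        return "decision_significant"   # full fidelity, long-term, rich index
    return "operational"                # eligible for aggregation / short retention


# Scenario A's £34,000 claim approval would be decision-significant:
assert assign_tier(34_000, rights_affecting=False, safety_relevant=False) == "decision_significant"
assert assign_tier(120, rights_affecting=False, safety_relevant=False) == "operational"
```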
4.8. A conforming system MUST reconcile trace retention with data subject rights, implementing mechanisms to honour erasure requests (right to be forgotten) without destroying the evidentiary value of traces — for example, through irreversible pseudonymisation that removes the link between the trace and the data subject while preserving the trace's analytical value.
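The irreversible pseudonymisation 4.8 describes can be implemented with per-subject keys: while the key exists the pseudonym is linkable, and destroying the key on an erasure request severs the link permanently without touching the trace itself. A minimal sketch, with a hypothetical in-memory key store standing in for a real vault or HSM:

```python
import hashlib
import hmac
import os

_subject_keys = {}  # hypothetical key store; a real one would be a vault or HSM


def pseudonym_for(subject_id: str) -> str:
    """Keyed token: linkable only while the subject's key exists."""
    key = _subject_keys.setdefault(subject_id, os.urandom(32))
    return hmac.new(key, subject_id.encode(), hashlib.sha256).hexdigest()[:16]


def honour_erasure(subject_id: str) -> None:
    """Right-to-erasure handler: destroying the key makes every stored
    pseudonym for this subject permanently unlinkable, while the traces
    themselves keep their analytical value."""
    _subject_keys.pop(subject_id, None)


token = pseudonym_for("NHS-123")   # token appears in stored traces
honour_erasure("NHS-123")          # after this, nothing can re-derive the link
```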
4.9. A conforming system SHOULD implement automated trace lifecycle management that transitions traces through retention tiers (full fidelity to summarised to archived to deleted) based on policy-defined triggers such as elapsed time, regulatory hold expiry, or investigation closure.
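Lifecycle transitions under 4.9 are a pure function of trace age and hold status, which makes them easy to test and audit. A minimal sketch; the tier names and day thresholds are illustrative assumptions.

```python
from datetime import date

TRANSITIONS = [            # (age threshold in days, resulting state)
    (0,   "full_fidelity"),
    (30,  "summarised"),
    (365, "archived"),
    (730, "deleted"),
]


def lifecycle_state(created: date, today: date, legal_hold: bool = False) -> str:
    """Return the retention state a trace should be in on a given day."""
    if legal_hold:
        return "full_fidelity"      # a regulatory hold blocks all transitions
    age = (today - created).days
    state = "full_fidelity"
    for threshold, name in TRANSITIONS:
        if age >= threshold:
            state = name
    return state


print(lifecycle_state(date(2024, 1, 1), date(2025, 6, 1)))   # "archived"
```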
4.10. A conforming system SHOULD establish storage cost budgets for trace retention and monitor actual costs against budgets, triggering governance review when costs exceed thresholds — preventing cost-driven ad hoc retention reduction decisions like the one in Scenario A.
4.11. A conforming system MAY implement selective high-fidelity retention, where a sampling strategy retains full traces for a statistically representative subset of routine decisions while retaining full traces for all material decisions, reducing storage volume without sacrificing forensic or audit capability.
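Sampling under 4.11 should be deterministic, so that auditors can verify after the fact why any given trace was or was not retained. A minimal sketch using hash-based bucketing; the 5% rate is an assumption.

```python
import hashlib


def retain_full_trace(decision_id: str, is_material: bool,
                      sample_rate: float = 0.05) -> bool:
    """All material decisions keep full traces (4.11); routine decisions are
    sampled deterministically so the retention outcome is reproducible."""
    if is_material:
        return True
    # Hash the decision ID into [0, 1): the same decision always lands in
    # the same bucket, so auditors can re-derive why a trace was (not) kept.
    bucket = int(hashlib.sha256(decision_id.encode()).hexdigest()[:8], 16) / 16**8
    return bucket < sample_rate


kept = sum(retain_full_trace(f"d-{i}", is_material=False) for i in range(100_000))
print(f"{kept:,} of 100,000 routine decisions kept at full fidelity (~5%)")
```

Hash-based bucketing needs no stored state, but structured identifiers can skew the sample; any such scheme should be reviewed for systematic bias under AG-417.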
AI agent traces are the richest source of evidence for understanding why an agent made a particular decision. Summary logs record that a decision was made; traces record how it was made — the reasoning path, the data consulted, the alternatives considered, the confidence levels, and the points where the agent's behaviour could have diverged. This granularity is indispensable for three governance functions: forensic investigation, bias audit, and regulatory compliance.
Forensic investigation requires the ability to reconstruct an agent's decision-making process after the fact. When an agent makes an incorrect, harmful, or disputed decision, investigators must trace the causal chain from input to output to determine what went wrong — whether the error was in the data, the model, the prompt, the tool, or the operational context. Without traces, forensic investigation is limited to input-output analysis, which cannot distinguish between a model that reached the wrong conclusion for the right reasons (a data quality issue) and a model that reached the wrong conclusion for the wrong reasons (a model deficiency). The remediation for each is entirely different, and traces are the only way to make the distinction.
Bias audit requires access to the intermediate computations that contribute to decisions affecting individuals. A summary log showing that a benefits application was denied does not reveal whether the denial was influenced by protected characteristics. Only the trace — showing which features were weighted, what scores were computed, and where threshold comparisons determined the outcome — enables a bias auditor to assess whether the decision-making process was non-discriminatory. Regulatory frameworks increasingly recognise this: the EU AI Act's requirements for transparency and explainability implicitly require trace-level evidence, because summary-level logs cannot support the depth of analysis required.
However, trace retention creates three countervailing pressures. First, storage cost: high-cardinality traces generate enormous data volumes. An agent processing 1,000 decisions per day with 50 trace records per decision produces 50,000 records daily, 18.25 million annually. At an average of 2 KB per record, this is 36.5 GB per year per agent — manageable for one agent, but an organisation with 50 agents faces nearly 2 TB per year of trace data alone. Without governance, storage costs escalate until someone makes a cost-driven retention reduction that destroys evidentiary value (Scenario A). Second, privacy: traces frequently contain personal data — the inputs to the agent's reasoning inevitably include the data about the individual whose case is being processed. Retaining traces without privacy safeguards creates a personal data repository that may violate data minimisation principles, lack adequate access controls, and become a target for misuse (Scenario B). Third, usability: retaining traces without indexing, tiering, or lifecycle management produces a data swamp where the volume of retained data makes it practically impossible to find the specific traces needed for a particular investigation (Scenario C).
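The storage arithmetic above is reproduced below as a quick capacity model; all inputs are the figures from the text (decimal units, 2 KB per record), so the outputs match the quoted 36.5 GB and near-2 TB figures.

```python
decisions_per_day = 1_000
records_per_decision = 50
bytes_per_record = 2_000            # 2 KB average, decimal units as in the text

records_per_year = decisions_per_day * records_per_decision * 365   # 18,250,000
gb_per_agent_year = records_per_year * bytes_per_record / 1e9       # 36.5 GB
tb_for_fleet = 50 * gb_per_agent_year / 1e3                         # ~1.8 TB

print(f"{records_per_year:,} records/year; "
      f"{gb_per_agent_year:.1f} GB per agent; "
      f"{tb_for_fleet:.2f} TB across 50 agents")
```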
The governance challenge is to balance these four forces: retain enough trace data for sufficient time to support forensic investigation, bias audit, and regulatory compliance; protect the privacy of data subjects whose information appears in traces; manage storage costs sustainably; and ensure that retained traces are retrievable and usable when needed. AG-410 provides the governance framework for this balancing act.
High-cardinality trace retention requires a lifecycle-oriented approach: traces are not simply "stored" — they move through defined phases from creation to eventual deletion, with fidelity, access controls, and storage tier changing at each phase. The core architectural pattern is: capture at full fidelity, classify immediately, redact early, index thoroughly, tier based on materiality, and delete per policy.
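A minimal end-to-end sketch of that pattern follows. Every stage name comes from the sentence above; the function bodies are toy placeholders (the redaction in particular stands in for the keyed token shown under 4.4), so this shows the shape of the pipeline, not a production implementation.

```python
STORE = {"decision_significant": [], "operational": []}
INDEX = {}


def classify(trace: dict) -> str:
    # Stand-in for the AG-014 classification hook.
    return "personal" if "data_subject_id" in trace else "internal"


def redact(trace: dict, classification: str) -> dict:
    # Toy placeholder; see the keyed-token sketch under 4.4 for a real one.
    if classification == "personal":
        trace = {**trace, "data_subject_id": "pseudo-0001"}
    return trace


def tier_of(trace: dict) -> str:
    return "decision_significant" if trace.get("material") else "operational"


def ingest(raw: dict) -> None:
    """Capture at full fidelity, classify immediately, redact early, index
    thoroughly, tier on materiality; deletion comes later, driven by policy."""
    trace = redact(dict(raw), classify(raw))
    INDEX[trace["trace_id"]] = trace["decision_id"]
    STORE[tier_of(trace)].append(trace)


ingest({"trace_id": "t1", "decision_id": "d1",
        "data_subject_id": "NHS-123", "material": True})
print(len(STORE["decision_significant"]), INDEX)
```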
Recommended patterns:
- Capture at full fidelity, then classify and redact within a defined window, so personal data never sits unprotected in the trace store (the Scenario B failure).
- Index at write time by timestamp, agent identifier, decision identifier, event category, and pseudonymised subject, so forensic retrieval is a query rather than a scan (the Scenario C failure).
- Tier by materiality: full-fidelity, long-term retention for decision-significant traces; aggregation or summarisation for routine operational traces.
- Apply regulatory holds that override lifecycle deletion whenever an investigation is pending.
- Treat any reduction in retention as a governed change requiring impact assessment against evidentiary and regulatory obligations (the Scenario A failure).

Anti-patterns to avoid:
- Cost-driven, ad hoc retention cuts made without evidentiary impact assessment (Scenario A).
- Treating traces as "system logs" exempt from data classification and PII handling (Scenario B).
- Retaining everything at full fidelity with no indexing, tiering, or lifecycle management, producing a data swamp that defeats forensic use (Scenario C).
- Relying on a third-party observability platform's default retention and access settings without verifying them against this dimension's requirements.
Financial Services. Financial regulators (FCA, SEC, BaFin) require transaction record retention for 5-7 years, and this extends to AI agent traces that contributed to financial decisions. MiFID II requires that firms retain records of all services, activities, and transactions sufficient to enable the competent authority to monitor compliance. For AI agents, this means traces showing how the agent reached a trading, advisory, or approval decision must be retained with sufficient fidelity. Firms should implement regulatory hold mechanisms that prevent trace deletion when a regulatory investigation is pending.
Healthcare. Clinical decision traces contain patient data subject to health data protection regulations (GDPR special category data, HIPAA in the US). Retention must balance clinical safety investigation needs (which may require traces spanning years) against data minimisation obligations. Pseudonymisation is particularly critical in healthcare traces — the combination of diagnosis, treatment, and outcome data in traces may be sufficient to identify individuals even without direct identifiers.
Public Sector. Agencies making decisions about individual rights (benefits, licensing, enforcement, immigration) face the longest effective retention requirements because administrative appeals and judicial review can occur years after the original decision. The UK public sector equality duty requires the ability to demonstrate that decisions were non-discriminatory, which necessitates trace retention sufficient to support retrospective bias audits. Scenario A directly illustrates this risk.
Crypto and Web3. On-chain transactions are inherently retained (blockchain immutability), but the off-chain agent reasoning traces that led to those transactions are not. The governance challenge is retaining off-chain traces that explain on-chain actions, particularly for market-making and trading agents where regulatory scrutiny of algorithmic decision-making is increasing. Travel Rule compliance may require retention of traces showing the agent's counterparty risk assessment.
Basic Implementation — The organisation has a documented trace retention policy for each agent deployment specifying retention periods, fidelity levels, and justifications. Traces are classified under the data classification framework. Access controls restrict trace access to authorised personnel with logged access. Decision-significant traces are retained for at least 24 months for high-risk agents. Basic indexing supports retrieval by timestamp, agent ID, and decision ID. Redaction or pseudonymisation is applied to personal data in traces within a defined window.
Intermediate Implementation — All basic capabilities plus: tiered retention with automated lifecycle management transitions traces through retention phases. A privacy-preserving redaction pipeline processes traces within 24 hours of creation. A forensic retrieval interface supports multi-dimensional queries with response time SLAs. Retention policy changes require formal impact assessment. Storage cost monitoring tracks actual costs against budgets. Erasure request mechanisms enable pseudonymisation of individual-linked traces without destroying analytical value.
Advanced Implementation — All intermediate capabilities plus: selective high-fidelity retention uses statistical sampling for routine decisions while retaining full traces for all material decisions. Automated materiality classification assigns retention tiers based on decision characteristics (value, risk, affected population). Cross-agent trace correlation links traces across multiple agents involved in the same decision workflow. Retention policies are independently audited. Real-time dashboards monitor trace volume, storage cost, access patterns, redaction completeness, and retrieval performance across all agent deployments.
Required artefacts:
- A documented trace retention policy per agent deployment, covering categories, periods, fidelity levels, and justifications (4.1).
- Data classification records for trace contents under the AG-014 framework (4.2).
- Access-control configuration for the trace store, plus the access logs it produces (4.3).
- Documentation of the redaction/pseudonymisation pipeline and its completeness metrics (4.4).
- Tier-assignment criteria and lifecycle transition rules (4.7, 4.9).
- Records of erasure-request handling and of any retention-policy impact assessments (4.8, 4.10).

Retention requirements:
- Traces for material decisions: no less than 24 months for high-risk agents and no less than 12 months for all other agents, or longer where regulation requires (4.5).
- Financial services deployments: alignment with the 5-7 year record-keeping expectations described under sector considerations.
- Operational traces: per the tiered policy, with documented justification for any shorter period.

Access requirements:
- Least-privilege access restricted to personnel with a documented need (4.3).
- Every access logged with accessor identity, timestamp, query parameters, and volume of data accessed (4.3).
- No blanket access for IT operations, data engineering, or development teams; Scenario B's 340-employee exposure is the failure mode this prevents.
Test 8.1: Retention Policy Completeness. Verify that every agent deployment has a retention policy specifying trace categories, retention periods, fidelity levels, and documented justifications (4.1).
Test 8.2: Data Classification of Trace Data. Verify that trace contents are classified under the organisation's data classification framework and that personal, sensitive, and proprietary data are identified (4.2).
Test 8.3: Access Control Enforcement. Attempt trace access without a documented need and confirm denial; confirm that permitted access is logged with identity, timestamp, query parameters, and volume (4.3).
Test 8.4: Redaction Pipeline Effectiveness. Sample stored traces and confirm that personal data has been redacted or pseudonymised within the defined window (4.4).
Test 8.5: Forensic Retrieval Performance. Retrieve all traces for a known decision identifier and confirm completeness and response time within the SLA (4.6).
Test 8.6: Tiered Retention Lifecycle Execution. Confirm that traces transition through retention tiers on policy-defined triggers and that regulatory holds block deletion (4.7, 4.9).
Test 8.7: Erasure Request Handling. Submit an erasure request and confirm that the data-subject link is irreversibly severed while the trace's analytical value is preserved (4.8).
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 12 (Record-Keeping / Automatic Logging) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| GDPR | Articles 5(1)(e), 17, 25 (Storage Limitation, Right to Erasure, Data Protection by Design) | Direct requirement |
| SOX | Section 802 (Criminal Penalties for Altering Documents) | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 9.1.1R (Record-Keeping) | Direct requirement |
| NIST AI RMF | GOVERN 1.5, MAP 3.5, MEASURE 2.6 | Supports compliance |
| ISO 42001 | Clause 9.1 (Monitoring, Measurement, Analysis, Evaluation) | Supports compliance |
| DORA | Article 10 (ICT-Related Incident Management) | Supports compliance |
| DORA | Article 12 (ICT-Related Incident Reporting) | Supports compliance |
EU AI Act. Article 12 requires that high-risk AI systems are designed with automatic logging capabilities, and that logs are kept for an appropriate period of time in light of the intended purpose and applicable legal obligations. AG-410 directly implements this requirement by mandating governed retention policies with defined retention periods justified by evidentiary and regulatory requirements. The article's requirement that logs enable monitoring of the system's operation "with respect to identifying situations that may result in risks" necessitates trace-level granularity — summary logs that record only outcomes cannot identify the internal situations (reasoning anomalies, confidence degradation, feature weighting shifts) that constitute risk indicators. The governance framework ensures that retention periods are "appropriate" rather than arbitrary, and that the logs are structured for the monitoring purpose the article requires.
GDPR. Article 5(1)(e) (storage limitation) requires that personal data be kept for no longer than is necessary for the purposes for which it is processed. Article 17 (right to erasure) requires the ability to erase personal data upon request, subject to exceptions for legal claims and public interest. Article 25 (data protection by design) requires that data protection measures be integrated into processing from the design stage. AG-410 addresses all three: the tiered retention policy ensures that traces containing personal data are retained only as long as necessary (5(1)(e)); the erasure request handling mechanism enables pseudonymisation that honours erasure rights while preserving evidentiary value (17); and the redaction pipeline implements data protection by design by removing personal data at the earliest feasible point (25). The tension between evidentiary retention and storage limitation is resolved through redaction — traces can be retained for evidentiary purposes without retaining identifiable personal data.
SOX. Section 802 imposes criminal penalties for altering, destroying, or concealing documents with the intent to impede a federal investigation. For AI agents in financial processes, traces are documents that could be relevant to an investigation. Premature destruction of traces — particularly cost-driven destruction as in Scenario A — risks Section 802 exposure if an investigation subsequently requires those traces. AG-410 mitigates this risk through governed retention policies with formal impact assessment before any retention reduction. Section 404 requires adequate internal controls, which for AI agents includes the ability to demonstrate how the agent reached financial decisions — a capability that depends on trace retention.
FCA SYSC 9.1.1R. The FCA requires firms to arrange for orderly records to be kept of their business and internal organisation, including all services and transactions undertaken. For AI agents, the "services and transactions" include the agent's decision-making processes, not merely the outcomes. The FCA's supervision approach increasingly expects firms to demonstrate how algorithmic and AI-driven decisions are reached, which requires trace-level records. The 5-7 year retention requirement for financial services traces is driven directly by FCA record-keeping expectations.
DORA. Financial entities must classify ICT-related incidents and report major incidents to competent authorities with detailed information about the root cause and impact. For AI agent-related incidents, root cause analysis depends on trace data showing what the agent did and why. AG-410 ensures that traces are available for the investigation and reporting periods that DORA mandates, with sufficient indexing and retrieval capability to meet the reporting timelines (initial notification within 4 hours, intermediate report within 72 hours, final report within 1 month).
NIST AI RMF. GOVERN 1.5 addresses organisational processes for ongoing AI risk monitoring, which requires retained evidence of AI system behaviour. MAP 3.5 addresses the identification of AI system metrics and their measurement, which requires trace data as input. MEASURE 2.6 addresses the measurement of AI system performance and impacts, requiring historical trace data to establish baselines and detect trends. AG-410 ensures that the trace data these functions depend on is retained, retrievable, and reliable.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide for forensic and audit capability; individual-level for privacy failures affecting data subjects |
Consequence chain: Failure in high-cardinality trace retention governance produces two distinct consequence chains depending on the failure mode. In the under-retention failure mode (Scenario A), traces are destroyed prematurely, eliminating the organisation's ability to reconstruct agent decision-making when an investigation requires it. The immediate consequence is forensic incapability — the organisation cannot explain why an agent made a specific decision. The downstream consequence is regulatory non-compliance, as the organisation cannot meet record-keeping obligations (EU AI Act Article 12, FCA SYSC 9.1.1R, SOX Section 404). The terminal consequence is adverse findings in investigations: bias allegations cannot be investigated (Scenario A: £2.4 million remediation), regulatory enquiries cannot be satisfied (Scenario C: £420,000 investigation cost), and legal proceedings lack evidentiary support. In the over-retention or uncontrolled-retention failure mode (Scenario B), traces are retained without privacy safeguards, creating a personal data repository that violates data protection obligations. The immediate consequence is a data protection breach — personal data retained longer than necessary, accessible to unauthorised personnel, and processed for unauthorised purposes. The downstream consequence is regulatory enforcement under GDPR or equivalent data protection law (Scenario B: £780,000 fine). The terminal consequence is loss of trust — data subjects, regulators, and the public lose confidence in the organisation's ability to handle AI-generated data responsibly, potentially leading to restrictions on AI deployment. Both failure modes are characterised by the same root cause: the absence of governance over the retention lifecycle, leading to either ad hoc destruction or uncontrolled accumulation of trace data.
Cross-references: AG-409 (Critical Event Taxonomy Governance) provides the event classification that determines which traces are decision-significant and drives tiered retention assignment. AG-014 (Data Classification Governance) provides the data classification framework applied to trace data. AG-015 (PII & Sensitive Data Handling) governs the handling of personal data identified within traces. AG-016 (Data Retention & Right to Erasure) provides the overarching data retention and erasure framework within which trace retention operates. AG-412 (Time Synchronisation Validation Governance) ensures that trace timestamps are accurate and synchronised across systems, a prerequisite for forensic correlation. AG-415 (Decision Journal Completeness Governance) consumes traces to validate that all material decisions have complete journals. AG-416 (Evidentiary Chain-of-Custody Governance) governs the integrity and provenance of traces as evidentiary artefacts. AG-417 (Telemetry Sampling Bias Governance) ensures that any sampling applied to trace retention does not introduce systematic bias. AG-418 (Cross-System Trace Correlation Governance) enables linking traces across multiple agents and systems involved in the same decision workflow.