AG-380

Checkpoint Garbage-Collection Governance

Runtime Execution, Workflow & State · AGS v2.1 · April 2026
EU AI Act · GDPR · SOX · FCA · NIST · HIPAA · ISO 42001

2. Summary

Checkpoint Garbage-Collection Governance requires that stale checkpoints, incomplete execution state, and orphaned workflow snapshots are retired through a formally defined, auditable process that preserves forensic integrity while reclaiming system resources. Checkpoints accumulate as agents execute multi-step workflows — each representing a recoverable snapshot of execution state — and without governed retirement, they consume storage, create security exposure through retained sensitive data, and risk accidental resumption of obsolete execution contexts. This dimension ensures that checkpoint lifecycle management follows defined retention policies, that deletion is verifiable and irreversible, and that no checkpoint is removed while it remains required for rollback, audit, or regulatory retention.

3. Example

Scenario A — Stale Checkpoint Resumes Obsolete Trading Strategy: An investment management firm deploys AI agents to execute algorithmic trading strategies. Each agent checkpoints its execution state — including portfolio positions, pending orders, and risk parameters — every 60 seconds. When a strategy is retired and replaced with a new version, the agent's 14,000 accumulated checkpoints are not cleaned up. Three months later, a system restart causes the orchestration platform to scan for resumable checkpoints. It discovers and resumes a checkpoint from the retired strategy, which begins executing trades based on a three-month-old market model. Before detection, the agent executes 312 trades totalling £8.7 million in notional value, resulting in a £1.4 million loss against positions the firm no longer intends to hold.

What went wrong: No garbage-collection policy existed for checkpoints associated with retired workflows. The orchestration platform treated all checkpoints as valid resumption points regardless of age or the status of the associated workflow. The checkpoint contained sufficient state to resume autonomous execution without re-validation. Consequence: £1.4 million direct trading loss, FCA investigation into algorithmic trading controls, potential enforcement action under MAR for uncontrolled market activity, £3.2 million in remediation costs including checkpoint lifecycle management system implementation.

Scenario B — Accumulated Checkpoints Expose Personal Data Beyond Retention Period: A customer-facing AI agent processing insurance claims creates checkpoints at each workflow stage containing the claimant's personal data: name, address, medical records, bank details, and claim photographs. The agent processes 2,500 claims per month. After 18 months, the organisation holds 315,000 checkpoint files containing personal data for claims that have been fully settled, many beyond the 12-month retention period defined in the organisation's privacy policy. A data subject access request from a former claimant reveals that the organisation holds 14 checkpoint snapshots containing the individual's medical records despite the claim being settled 16 months prior. The supervisory authority investigates and discovers the systemic retention failure.

What went wrong: Checkpoint creation was automated as part of the workflow engine, but no corresponding garbage-collection process existed. The checkpoint storage was not integrated with the data retention policy engine. Personal data embedded in checkpoints was invisible to the organisation's data lifecycle management. Consequence: GDPR Article 5(1)(e) violation for storage limitation principle breach, supervisory authority investigation, potential fine of up to €10 million or 2% of global annual turnover under Article 83(4), £1.8 million remediation programme to identify and purge all over-retained checkpoint data, reputational damage from public notification.

Scenario C — Checkpoint Accumulation Causes Safety-Critical System Failure: An embodied AI agent controlling an autonomous warehouse logistics system checkpoints its state — including vehicle positions, route plans, and obstacle maps — every 5 seconds to support rapid recovery from failures. The checkpoint storage volume is provisioned with 500 GB. Over six months of continuous operation, checkpoint accumulation fills the volume to 98% capacity. The garbage-collection process, implemented as a low-priority background task, cannot keep pace with checkpoint creation. When the volume reaches 100%, the checkpoint write fails silently, and the agent continues operating without recovery capability. A subsequent sensor failure causes the agent to collide with a warehouse worker, and the system cannot roll back to a safe state because no valid checkpoint exists. The worker sustains injuries requiring hospitalisation.

What went wrong: Garbage collection was implemented as a best-effort background process without resource guarantees. No monitoring alerted on checkpoint storage approaching capacity. The silent write failure meant the system continued operating without the recovery capability it was designed to provide. The collision recovery procedure assumed checkpoint availability that no longer existed. Consequence: worker injury requiring hospitalisation, HSE investigation, potential prosecution under the Health and Safety at Work Act 1974, £2.1 million in compensation and legal costs, facility shutdown pending safety review, insurance premium increase of £450,000 annually.

4. Requirement Statement

Scope: This dimension applies to all AI agent systems that create checkpoints, snapshots, or persistent records of execution state for the purpose of recovery, resumption, rollback, or debugging. A checkpoint is any persistent artefact that captures the agent's execution context at a point in time with the intent of enabling future state restoration. This includes explicit checkpoints created by workflow orchestration engines, implicit snapshots created by container orchestration systems, database transaction savepoints used for agent state recovery, serialised agent memory states, cached intermediate computation results retained for recovery purposes, and any other persistent artefact tied to a specific point in an agent's execution timeline. Systems that create no persistent execution state — stateless agents that re-derive all context on each invocation — are excluded. The scope extends to checkpoint-like artefacts regardless of their storage medium: filesystem, object storage, database, distributed cache, or blockchain. If it captures execution state and persists beyond the immediate execution context, it is within scope.

4.1. A conforming system MUST define a formal checkpoint retention policy for each workflow type, specifying the maximum number of checkpoints retained per workflow instance, the maximum age of retained checkpoints, and the conditions under which checkpoints become eligible for garbage collection.
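Such a policy lends itself to versioned, structured configuration held outside the agent runtime. A minimal sketch in Python; the field names and example values are illustrative assumptions, not a mandated schema:

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass(frozen=True)
class RetentionPolicy:
    """Versioned retention policy for one workflow type (requirement 4.1)."""
    workflow_type: str
    policy_version: str
    max_checkpoints_per_instance: int      # newest N retained per workflow instance
    max_age: timedelta                     # older checkpoints become collection-eligible
    terminal_state_grace: timedelta        # retention window after a terminal state
    regulatory_floor: Optional[timedelta] = None  # never collect before this age, if set

POLICIES = {
    "claims-processing": RetentionPolicy(
        workflow_type="claims-processing",
        policy_version="2026-04-01",
        max_checkpoints_per_instance=10,
        max_age=timedelta(days=30),
        terminal_state_grace=timedelta(days=7),
    ),
}
```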

4.2. A conforming system MUST execute garbage collection of eligible checkpoints automatically according to the defined retention policy, without requiring manual intervention for routine cleanup.
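A collection pass satisfying 4.2 might look like the following sketch, assuming a hypothetical `registry` interface that tracks lifecycle state (see 4.10) and a `store` that performs tier-appropriate deletion (see 4.9). The two-phase structure means a crash between marking and deleting never removes anything the registry still records as active:

```python
from datetime import datetime, timezone

def gc_pass(registry, store, policies):
    """One automated collection pass (requirement 4.2)."""
    now = datetime.now(timezone.utc)
    # Phase 1: mark newly eligible checkpoints against their retention policy.
    for cp in registry.list(state="active"):
        policy = policies[cp.workflow_type]
        too_old = now - cp.created_at > policy.max_age
        superseded = registry.newer_count(cp.id) >= policy.max_checkpoints_per_instance
        if too_old or superseded:
            registry.mark(cp.id, "eligible")
    # Phase 2: collect, unless something still depends on the checkpoint (4.3).
    for cp in registry.list(state="eligible"):
        if registry.has_active_dependency(cp.id):
            continue
        store.secure_delete(cp.id)       # tier-appropriate deletion, see 4.9
        registry.mark(cp.id, "deleted")  # logged tamper-evidently, see 4.4
```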

4.3. A conforming system MUST prevent garbage collection of any checkpoint that is still required for an active workflow's rollback capability, an in-progress regulatory hold, an unresolved audit request, or a pending data-subject access request.
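The dependency check in 4.3 reduces to a predicate over the holds the system knows about. A minimal sketch, with hypothetical inputs standing in for the workflow engine, hold registers, and DSAR queue:

```python
from dataclasses import dataclass, field

@dataclass
class CheckpointRecord:
    id: str
    workflow_id: str
    data_subject_ids: set = field(default_factory=set)

def has_active_dependency(cp: CheckpointRecord,
                          open_workflows: set,
                          legal_holds: set,
                          audit_holds: set,
                          pending_dsar_subjects: set) -> bool:
    """Requirement 4.3: a checkpoint survives collection while anything
    still depends on it."""
    return (
        cp.workflow_id in open_workflows                      # rollback still possible
        or cp.id in legal_holds                               # regulatory or legal hold
        or cp.id in audit_holds                               # unresolved audit request
        or bool(cp.data_subject_ids & pending_dsar_subjects)  # pending DSAR
    )
```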

4.4. A conforming system MUST ensure that checkpoint deletion is irreversible and verifiable — deleted checkpoint data cannot be recovered from the storage medium, and the deletion event is recorded in a tamper-evident log including the checkpoint identifier, deletion timestamp, retention policy that triggered the deletion, and the identity of the process that executed the deletion.
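One way to make the deletion log tamper-evident is a hash chain, in which each record commits to its predecessor. A self-contained sketch; the record fields mirror 4.4, while the chaining scheme itself is an illustrative choice:

```python
import hashlib
import json
from datetime import datetime, timezone

def append_deletion_record(log: list, checkpoint_id: str,
                           policy_version: str, actor: str) -> dict:
    """Append a hash-chained deletion record (requirement 4.4): each entry
    commits to its predecessor, so altering or removing any entry breaks
    the chain on verification."""
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    entry = {
        "checkpoint_id": checkpoint_id,
        "deleted_at": datetime.now(timezone.utc).isoformat(),
        "retention_policy": policy_version,
        "deleted_by": actor,
        "prev_hash": prev_hash,
    }
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry
```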

4.5. A conforming system MUST prevent stale checkpoints from being used to resume or restore agent execution after the checkpoint has been superseded by a more recent valid checkpoint or after the associated workflow has reached a terminal state.
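A resumption guard for 4.5 can be expressed as a small predicate evaluated before any restore. A sketch, with hypothetical `checkpoint` and `workflow` records; the version check is what would have stopped Scenario A:

```python
from types import SimpleNamespace

TERMINAL_STATES = {"completed", "cancelled", "failed", "retired"}

def may_resume(checkpoint, workflow, latest_checkpoint_id: str) -> bool:
    """Requirement 4.5: refuse resumption from stale checkpoints."""
    if workflow.status in TERMINAL_STATES:
        return False   # the associated workflow has reached a terminal state
    if checkpoint.id != latest_checkpoint_id:
        return False   # superseded by a more recent valid checkpoint
    if checkpoint.workflow_version != workflow.deployed_version:
        return False   # created by a retired workflow version (Scenario A)
    return True

# The retired-strategy checkpoint from Scenario A is rejected twice over:
stale = SimpleNamespace(id="cp-14000", workflow_version="v1")
retired = SimpleNamespace(status="retired", deployed_version="v2")
assert may_resume(stale, retired, latest_checkpoint_id="cp-14000") is False
```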

4.6. A conforming system MUST monitor checkpoint storage utilisation and generate alerts when utilisation exceeds a configurable threshold, ensuring that garbage-collection capacity keeps pace with checkpoint creation rate.
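A minimal utilisation check for 4.6, using only the standard library; the thresholds are illustrative and alert delivery is left as a stub:

```python
import shutil

def check_checkpoint_volume(path: str, warn_at: float = 0.80,
                            page_at: float = 0.90) -> float:
    """Requirement 4.6: alert well before the checkpoint volume fills.
    Scenario C shows why: a silent write failure at 100% removes recovery
    capability without any operational signal."""
    usage = shutil.disk_usage(path)
    utilisation = usage.used / usage.total
    if utilisation >= page_at:
        raise RuntimeError(
            f"checkpoint volume at {utilisation:.0%}: collection not keeping pace")
    if utilisation >= warn_at:
        print(f"WARNING: checkpoint volume at {utilisation:.0%}")  # route to alerting
    return utilisation
```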

4.7. A conforming system MUST ensure that garbage collection of checkpoints containing personal data complies with applicable data retention regulations, including the right to erasure under GDPR Article 17, and that checkpoint retirement is integrated with the organisation's data lifecycle management processes.
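Integration with erasure handling might look like the sketch below, which assumes the registry indexes checkpoints by data-subject identifier (see 4.10) and reuses the `append_deletion_record` helper from the 4.4 sketch; all interfaces are hypothetical:

```python
def erase_data_subject(registry, store, deletion_log, subject_id: str, actor: str):
    """GDPR Article 17 erasure against the checkpoint store (requirement 4.7).
    Legal holds take precedence over erasure until resolved (4.3, 4.12)."""
    for cp in registry.find_by_data_subject(subject_id):
        if registry.on_hold(cp.id):
            # Erasure is deferred, not ignored: flag for compliance review.
            registry.flag_for_review(cp.id, reason="erasure blocked by active hold")
            continue
        store.secure_delete(cp.id)
        append_deletion_record(deletion_log, cp.id,
                               policy_version="article-17-erasure", actor=actor)
        registry.mark(cp.id, "deleted")
```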

4.8. A conforming system SHOULD implement garbage collection as a resource-guaranteed process with dedicated compute and I/O allocation, rather than a best-effort background task that competes with production workloads for resources.

4.9. A conforming system SHOULD classify checkpoint data by sensitivity tier (e.g., contains personal data, contains financial data, contains safety-critical state) and apply tier-appropriate deletion methods, including cryptographic erasure or secure overwrite for sensitive tiers.
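Cryptographic erasure is one tier-appropriate deletion method: encrypt each checkpoint under its own key, and destroy the key to delete. A sketch using the `cryptography` package's Fernet primitive, with in-memory dictionaries standing in for a KMS and object storage:

```python
from cryptography.fernet import Fernet  # pip install cryptography

class CryptoShreddedStore:
    """Cryptographic erasure (requirement 4.9): each checkpoint is encrypted
    under its own key; deleting the key renders the ciphertext unrecoverable,
    making deletion near-instantaneous regardless of size or storage medium."""

    def __init__(self):
        self._keys: dict = {}    # in practice: an HSM or key management service
        self._blobs: dict = {}   # in practice: object storage

    def put(self, checkpoint_id: str, payload: bytes) -> None:
        key = Fernet.generate_key()
        self._keys[checkpoint_id] = key
        self._blobs[checkpoint_id] = Fernet(key).encrypt(payload)

    def get(self, checkpoint_id: str) -> bytes:
        return Fernet(self._keys[checkpoint_id]).decrypt(self._blobs[checkpoint_id])

    def secure_delete(self, checkpoint_id: str) -> None:
        del self._keys[checkpoint_id]          # the erasure step: key destruction
        self._blobs.pop(checkpoint_id, None)   # ciphertext removal is now best-effort
```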

4.10. A conforming system SHOULD maintain a checkpoint registry that tracks the lifecycle state of every checkpoint (active, eligible for collection, held for regulatory or audit purposes, deleted) and is queryable for compliance reporting.
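The registry's lifecycle states map naturally onto a small state machine. A sketch using the four states named in 4.10; the transition table is an illustrative reading of the requirements (deletion is terminal per 4.4, holds per 4.3 and 4.12):

```python
from enum import Enum

class CheckpointState(str, Enum):
    """Lifecycle states tracked by the checkpoint registry (requirement 4.10)."""
    ACTIVE = "active"
    ELIGIBLE = "eligible"   # retention policy satisfied, no active dependencies
    HELD = "held"           # regulatory, legal, or audit hold in force
    DELETED = "deleted"     # tombstone: deletion logged per 4.4

# Allowed transitions; DELETED is terminal, so deletion is never reversed.
TRANSITIONS = {
    CheckpointState.ACTIVE:   {CheckpointState.ELIGIBLE, CheckpointState.HELD},
    CheckpointState.ELIGIBLE: {CheckpointState.DELETED, CheckpointState.HELD},
    CheckpointState.HELD:     {CheckpointState.ACTIVE, CheckpointState.ELIGIBLE},
    CheckpointState.DELETED:  set(),
}

def transition(current: CheckpointState, new: CheckpointState) -> CheckpointState:
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```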

4.11. A conforming system MAY implement checkpoint compression or deduplication to reduce storage consumption between garbage-collection cycles while preserving the ability to restore from any retained checkpoint.

4.12. A conforming system MAY support legal hold overrides that indefinitely suspend garbage collection for checkpoints associated with specific workflow instances, entities, or time periods, triggered by compliance or legal teams without requiring engineering intervention.

5. Rationale

Checkpoint Garbage-Collection Governance addresses a deceptively mundane but operationally critical risk: the uncontrolled accumulation of persistent execution state. Every modern agent framework that supports multi-step workflows creates checkpoints — snapshots of the agent's execution context that enable recovery from failures, rollback from errors, and resumption after interruptions. These checkpoints are essential for operational resilience, but without governed lifecycle management, they become a growing liability across three dimensions: resource consumption, data protection, and execution safety.

The resource dimension is straightforward but consequential. Checkpoints consume storage. In high-throughput deployments — an agent processing thousands of transactions daily, each checkpointed multiple times — storage accumulation is rapid. A single agent checkpointing every 60 seconds produces 525,600 checkpoints per year. If each checkpoint is 50 KB (modest for a workflow with meaningful state), that is 25 GB per agent per year. An organisation deploying 100 agents accumulates 2.5 TB of checkpoint data annually. Without garbage collection, this grows indefinitely, consuming infrastructure budget and eventually causing storage exhaustion failures. The failure mode of storage exhaustion is particularly dangerous because it is often silent — the checkpoint write fails, the agent continues operating, and the recovery capability the checkpoints were designed to provide silently disappears.
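The figures in this paragraph can be reproduced directly:

```python
# Reproducing the storage-growth figures above.
checkpoints_per_year = 60 * 24 * 365                 # one checkpoint per minute
bytes_per_agent = checkpoints_per_year * 50 * 1024   # 50 KB per checkpoint

print(checkpoints_per_year)                          # 525600
print(f"{bytes_per_agent / 1024**3:.1f} GB")         # 25.1 GB per agent per year
print(f"{100 * bytes_per_agent / 1024**4:.2f} TB")   # 2.45 TB for 100 agents (~2.5 TB)
```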

The data protection dimension is more insidious. Checkpoints capture the agent's execution context, which frequently includes the data being processed: customer personal data, financial transaction details, medical records, authentication tokens, API keys, and other sensitive information. Each checkpoint is a snapshot of this data at a point in time. Data protection regulations — GDPR, CCPA, HIPAA — require that personal data be retained only as long as necessary for the purpose for which it was collected. Checkpoint data is typically collected for the purpose of enabling workflow recovery. Once the workflow is complete, the recovery purpose is fulfilled, and continued retention requires a separate legal basis. Many organisations fail to recognise checkpoints as a data processing activity subject to retention limitations, creating systemic non-compliance that scales with the volume of checkpoint data accumulated.

The execution safety dimension is the most dangerous. Stale checkpoints are not inert — they are resumable execution contexts. An orchestration system that discovers a checkpoint may attempt to resume execution from that point. If the checkpoint is stale — from a retired workflow version, an outdated data model, or an obsolete operational context — resumption can cause the agent to execute actions that are inconsistent with the current operational state. This is not a hypothetical risk; it is a well-documented failure pattern in distributed systems where orphaned state leads to zombie processes executing unintended actions.

The intersection with AG-379 (Workflow State-Machine Integrity Governance) is direct: AG-379 governs the integrity of state transitions during execution; AG-380 governs the integrity of state artefacts after execution phases complete. Together, they ensure that workflow state is correct during execution and properly retired after execution. The intersection with AG-006 (Tamper-Evident Record Integrity) is equally direct: checkpoint deletion must be logged with the same tamper-evidence guarantees as checkpoint creation, ensuring a complete forensic trail of the checkpoint's lifecycle.

The intersection with AG-016 (Data Retention & Right to Erasure) creates a regulatory obligation: checkpoints containing personal data must be subject to the same retention and erasure policies as any other personal data store. The practical challenge is that checkpoints are often created by infrastructure layers that are invisible to the organisation's data governance team. AG-380 makes checkpoint data lifecycle visible and governable.

6. Implementation Guidance

AG-380 establishes the checkpoint retention policy as the central governance artefact for checkpoint lifecycle management. A checkpoint retention policy is a versioned, formally defined specification that determines when checkpoints are created, how long they are retained, under what conditions they become eligible for garbage collection, and how they are deleted. The policy is not optional — it is a structural requirement that prevents unbounded state accumulation, ensures data protection compliance, and eliminates the risk of stale checkpoint resumption.

The retention policy should be stored as structured configuration data in a persistent layer independent of the agent's runtime. The garbage-collection mechanism should operate as an autonomous process with its own resource allocation, monitoring, and alerting. Garbage collection should never depend on the agent's cooperation — it is an infrastructure concern, not an agent concern.

Recommended patterns:

- Event-driven collection: trigger garbage collection on workflow lifecycle events (completion, retirement, supersession) as well as on a schedule, so eligible checkpoints do not accumulate between runs.
- Reference counting: track every active dependency on a checkpoint (rollback capability, regulatory hold, unresolved audit request, pending data-subject access request) and collect only when the count reaches zero.
- Registry-first deletion: record the lifecycle transition and the tamper-evident deletion entry before reclaiming storage, so the forensic trail survives an interrupted deletion.
- Generational tiering: move ageing checkpoints through progressively cheaper storage tiers ahead of final deletion, reserving hot storage for active recovery points.

Anti-patterns to avoid:

- Best-effort background collection that competes with production workloads and silently falls behind the checkpoint creation rate (Scenario C).
- Treating every discovered checkpoint as a valid resumption point, regardless of the age, version, or status of the associated workflow (Scenario A).
- Checkpoint stores that are invisible to the data retention policy engine, leaving embedded personal data outside data lifecycle management (Scenario B).
- Tolerating silent checkpoint write failures, which removes recovery capability without producing any operational signal.

Industry Considerations

Financial Services. Checkpoint retention in financial services must account for multiple overlapping regulatory retention requirements: MiFID II requires retention of records relating to investment services for a minimum of 5 years (7 years in some jurisdictions). SOX requires retention of audit-relevant records for 7 years. Anti-money laundering regulations require retention of transaction records for 5 years after the business relationship ends. Checkpoints that capture transaction state during processing may fall under any or all of these requirements. Financial services firms should implement a checkpoint classification scheme that maps checkpoint content to applicable retention schedules, ensuring that garbage collection respects the longest applicable retention period while not over-retaining beyond it.

Healthcare. Checkpoints in healthcare workflows frequently contain protected health information (PHI) subject to HIPAA retention requirements. The HIPAA Privacy Rule requires that covered entities retain PHI for 6 years from the date of creation or the date when it was last in effect, whichever is later. Checkpoints from clinical workflow agents — capturing patient data during triage, assessment, or treatment planning — must respect this retention floor. Additionally, the HIPAA Security Rule requires that disposal of PHI render it unrecoverable, mapping directly to the secure deletion requirements of AG-380. State-level medical records retention laws may impose longer retention periods.

Safety-Critical Systems. In safety-critical deployments, checkpoint garbage collection must never compromise the system's ability to recover from failures. Garbage-collection processes for CPS agents should be formally verified to ensure that at least the minimum required number of recovery checkpoints is always available. The garbage-collection process itself must be fail-safe: if garbage collection fails mid-execution, it must not leave the checkpoint store in an inconsistent state where some checkpoints are partially deleted. IEC 61508 safety integrity levels inform the rigour required for garbage-collection process verification.
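One way to make the availability invariant concrete is to filter eligibility against a protected set that is never collected. A minimal sketch; the figure of three retained recovery checkpoints is an assumption, since the real minimum comes from the safety case:

```python
def safe_to_collect(eligible_ids, valid_newest_first, min_recovery: int = 3) -> set:
    """Fail-safe filter for safety-critical deployments: no collection pass
    may reduce the set of valid recovery checkpoints below `min_recovery`.
    `valid_newest_first` lists valid checkpoint ids, newest first."""
    protected = set(valid_newest_first[:min_recovery])
    return set(eligible_ids) - protected

# Even if every checkpoint is nominally eligible, the newest three survive:
assert safe_to_collect({"a", "b", "c", "d"}, ["a", "b", "c", "d"]) == {"d"}
```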

Crypto/Web3. Checkpoint state for agents interacting with blockchain systems may include private keys, unsigned transaction data, or intermediate state proofs. Garbage collection of such checkpoints requires cryptographic-grade secure deletion to prevent key material recovery. Additionally, checkpoint retention must account for blockchain finality periods — a checkpoint associated with a pending transaction must not be garbage-collected until the transaction has achieved sufficient confirmation depth.
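A finality-aware eligibility check might look like the following sketch, assuming a hypothetical `chain` client exposing current block height and per-transaction inclusion height; the required depth of 12 confirmations is an illustrative, chain-specific assumption:

```python
def eligible_for_collection(checkpoint, chain, required_depth: int = 12) -> bool:
    """Never collect a checkpoint tied to a transaction that has not yet
    reached sufficient confirmation depth."""
    if checkpoint.tx_hash is None:
        return True                      # no on-chain dependency
    included_at = chain.tx_block_height(checkpoint.tx_hash)
    if included_at is None:
        return False                     # still pending: must be retained
    confirmations = chain.current_height() - included_at + 1
    return confirmations >= required_depth
```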

Maturity Model

Basic Implementation — The organisation has defined checkpoint retention policies for each workflow type specifying maximum age and maximum count. Garbage collection is implemented as a scheduled process (e.g., daily cron job) that scans the checkpoint store, identifies eligible checkpoints, and deletes them. Deletion is logged. Stale checkpoint resumption is prevented by checking workflow status before restoration. This level meets the minimum mandatory requirements but has operational weaknesses: the scheduled nature of garbage collection means checkpoints accumulate between runs, storage utilisation monitoring may not catch rapid accumulation, and the garbage-collection process competes with production workloads for resources.

Intermediate Implementation — Garbage collection operates as a dedicated process with guaranteed resource allocation, triggered by both schedule and workflow lifecycle events. Checkpoints are classified by sensitivity tier, with appropriate deletion methods per tier. A checkpoint registry tracks lifecycle state and is queryable for compliance reporting. Storage utilisation monitoring generates alerts at configurable thresholds. Legal hold capability allows compliance teams to suspend garbage collection for specific entities or time periods. Reference counting prevents garbage collection of checkpoints with active dependencies (rollback, audit, regulatory hold). Deletion logging is tamper-evident per AG-006.

Advanced Implementation — All intermediate capabilities plus: cryptographic erasure is used for sensitive checkpoint tiers, eliminating the need for secure overwrite and enabling near-instantaneous deletion. Generational storage management moves checkpoints through tiers based on age, optimising storage cost. Garbage-collection throughput is automatically scaled based on checkpoint creation rate, ensuring the collection process always keeps pace with creation. Formal verification has confirmed that the minimum required recovery checkpoints are always available even during garbage-collection operations. The organisation can demonstrate to regulators a complete, tamper-evident lifecycle trail for every checkpoint from creation through deletion, including the retention policy that governed each lifecycle transition.

7. Evidence Requirements

Required artefacts:

- The versioned checkpoint retention policy for each workflow type (4.1), including its change history.
- Tamper-evident deletion logs recording checkpoint identifier, deletion timestamp, triggering retention policy, and executing process for every deletion (4.4).
- The checkpoint registry, queryable by lifecycle state, workflow instance, and sensitivity tier (4.10).
- Storage utilisation monitoring configuration and alert history (4.6).
- Records of legal holds and their resolution (4.12).

Retention requirements:

Deletion logs and registry lifecycle records must themselves be retained for at least the longest regulatory schedule applicable to the checkpoints they describe (for example, seven years under SOX, five to seven years under MiFID II), so that the lifecycle trail outlives the checkpoints it documents.

Access requirements:

The checkpoint registry and deletion logs must be queryable for compliance reporting (4.10) and retrievable on regulatory or audit request; deletion capability must be restricted to the authorised garbage-collection process, with every deletion attributable to that process (4.4).

8. Test Specification

Testing AG-380 compliance requires validation that checkpoint garbage collection operates reliably, respects retention policies, prevents stale resumption, and maintains forensic integrity. A comprehensive test programme should include the following tests.

Test 8.1: Retention Policy Enforcement
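A self-contained sketch of what this test might assert, reduced to the eligibility predicate; a real harness would exercise the full collection pipeline:

```python
from datetime import datetime, timedelta, timezone

def is_eligible(created_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """Eligibility under a max-age retention policy (see 4.1)."""
    return now - created_at > max_age

def test_retention_policy_enforcement():
    max_age = timedelta(days=30)
    now = datetime(2026, 4, 1, tzinfo=timezone.utc)
    assert is_eligible(now - timedelta(days=31), now, max_age)      # collected
    assert not is_eligible(now - timedelta(days=29), now, max_age)  # retained
```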

Test 8.2: Active Dependency Protection

Test 8.3: Stale Checkpoint Resumption Prevention

Test 8.4: Deletion Irreversibility and Logging
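This test pairs naturally with the hash-chained log from the 4.4 sketch: verification recomputes every hash and predecessor link, so any altered or removed entry is detected. A sketch of the verifier:

```python
import hashlib
import json

def verify_deletion_log(log: list) -> bool:
    """Recompute each entry's hash and predecessor link in a hash-chained
    deletion log (see the 4.4 sketch); any tampering breaks verification."""
    prev = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev or entry["entry_hash"] != expected:
            return False
        prev = entry["entry_hash"]
    return True
```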

Test 8.5: Storage Utilisation Monitoring and Alerting

Test 8.6: Data Protection Compliance for Personal Data Checkpoints

Test 8.7: Garbage-Collection Resource Guarantee Under Load

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 12 (Record-Keeping) | Direct requirement
EU AI Act | Article 9 (Risk Management System) | Supports compliance
GDPR | Article 5(1)(e) (Storage Limitation), Article 17 (Right to Erasure) | Direct requirement
SOX | Section 802 (Criminal Penalties for Altering Documents) | Supports compliance
FCA SYSC | 9.1.1R (Record-Keeping Requirements) | Direct requirement
NIST AI RMF | GOVERN 1.5, MANAGE 2.4 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks), Annex B (AI Data Management) | Supports compliance
DORA | Article 12 (Backup Policies and Recovery), Article 9 (ICT Risk Management) | Supports compliance

EU AI Act — Article 12 (Record-Keeping)

Article 12 requires that high-risk AI systems be designed and developed with capabilities enabling the automatic recording of events (logs). This requirement extends to the lifecycle management of those records. Checkpoints, as persistent records of AI system execution state, fall within the scope of Article 12. The regulation requires that logs be "kept for a period that is appropriate in view of the intended purpose of the high-risk AI system and applicable legal obligations." AG-380 implements this requirement by defining formal retention policies that determine checkpoint lifecycle duration based on purpose and legal obligation. The regulation also requires that logging capabilities "ensure a level of traceability of the AI system's functioning throughout its lifecycle" — checkpoint lifecycle logging (creation, retention, holds, deletion) contributes to this traceability requirement.

EU AI Act — Article 9 (Risk Management System)

Article 9 requires risk management systems for high-risk AI systems. Uncontrolled checkpoint accumulation creates risks across multiple dimensions: storage exhaustion (operational risk), data over-retention (compliance risk), and stale resumption (safety risk). AG-380 implements risk mitigation measures for all three risk categories. The risk management system required by Article 9 must identify and address these risks; AG-380 provides the structured control.

GDPR — Article 5(1)(e) and Article 17

Article 5(1)(e) establishes the storage limitation principle: personal data must be "kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed." Checkpoints containing personal data are subject to this principle. Once the workflow for which the checkpoint was created is complete, continued retention of personal data within the checkpoint requires a separate legal basis. AG-380 ensures that checkpoint garbage collection is integrated with data retention policies, preventing systematic violation of the storage limitation principle. Article 17 establishes the right to erasure. A data subject exercising this right is entitled to have their personal data deleted from all systems, including checkpoint stores. AG-380's checkpoint registry and sensitivity classification enable the organisation to identify and delete checkpoints containing a specific data subject's personal data in response to erasure requests. Failure to include checkpoint stores in erasure processing is a common compliance gap that AG-380 directly addresses.

SOX — Section 802

Section 802 establishes criminal penalties for the destruction, alteration, or falsification of records. While AG-380 governs the legitimate, policy-driven deletion of checkpoints, the tamper-evident logging requirement ensures that all deletions are attributable and auditable. This protects the organisation from allegations of improper record destruction by demonstrating that every deletion followed the defined retention policy and was executed by an authorised process. SOX auditors examining AI agent operations will ask how the organisation manages the lifecycle of execution records — AG-380 provides the structured answer.

FCA SYSC — 9.1.1R (Record-Keeping Requirements)

SYSC 9.1.1R requires FCA-regulated firms to maintain orderly records of their business and internal organisation. For firms deploying AI agents, checkpoints are business records that document the agent's execution state during business-critical processes. The FCA expects that record-keeping systems have defined retention policies, that records are retrievable for regulatory purposes, and that record disposal follows documented procedures. AG-380 directly implements these requirements for checkpoint records. The FCA's approach to AI governance, articulated through supervisory statements, emphasises that firms must demonstrate control over all data artefacts produced by AI systems, including intermediate execution state.

NIST AI RMF — GOVERN 1.5, MANAGE 2.4

GOVERN 1.5 addresses ongoing monitoring and review of AI governance processes. MANAGE 2.4 addresses mechanisms for managing AI system data throughout its lifecycle. AG-380 supports compliance by establishing governed lifecycle management for checkpoint data, ensuring that AI system execution artefacts are subject to the same data management rigour as other organisational data assets.

ISO 42001 — Clause 6.1, Annex B

Clause 6.1 requires organisations to address risks within the AI management system. Annex B provides guidance on AI data management. Checkpoint accumulation is a data management risk within the AI management system that AG-380 directly mitigates. The formal retention policy and garbage-collection mechanism satisfy ISO 42001's requirement for systematic data lifecycle management within the AI management system.

DORA — Article 12, Article 9

Article 12 requires financial entities to establish backup policies and procedures and to determine recovery methods. Checkpoint management is directly relevant: checkpoints enable recovery, and their governance ensures that recovery capability is maintained (not degraded by storage exhaustion) while resources are managed efficiently. Article 9 requires ICT risk management frameworks. Uncontrolled checkpoint accumulation is an ICT risk — storage exhaustion, data over-retention, and stale state resumption are all ICT risk events that AG-380's governed garbage collection mitigates.

10. Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | Organisation-wide — extends to regulatory relationships, data subjects, and dependent systems that rely on checkpoint-based recovery capabilities

Consequence chain: Without governed checkpoint garbage collection, the failure manifests through three independent but compounding paths.

First, storage exhaustion: checkpoints accumulate until storage capacity is reached, at which point new checkpoint creation fails. If the failure is silent — as it frequently is in systems that treat checkpoint storage as a non-critical resource — the agent continues operating without recovery capability. The next failure requiring rollback discovers that no valid checkpoint exists, transforming a recoverable incident into an unrecoverable one. The operational impact depends on the agent's domain: in financial services, inability to roll back a transaction sequence; in healthcare, inability to recover a clinical workflow to a safe state; in manufacturing, inability to revert a physical process to a known-good configuration.

Second, data protection violation: checkpoints containing personal data persist beyond their lawful retention period. The violation scales with the number of data subjects processed and the duration of over-retention. A single agent processing 2,500 customer records monthly accumulates 30,000 over-retained records annually, each representing an individual regulatory violation. GDPR fines for systematic storage limitation breaches can reach €10 million or 2% of global annual turnover.

Third, stale resumption: an orphaned checkpoint from a retired workflow, outdated data model, or superseded strategy is accidentally resumed, causing the agent to execute actions based on obsolete state. In financial services, this can result in trades executed against outdated market models; in healthcare, treatments based on superseded clinical data; in manufacturing, production runs using recalled process parameters.

The severity compounds because all three failure paths can activate simultaneously — an organisation with no checkpoint governance simultaneously accumulates storage risk, data protection liability, and stale resumption exposure, with the combined consequence exceeding the sum of individual risks.

Cite this protocol
AgentGoverning. (2026). AG-380: Checkpoint Garbage-Collection Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-380