Forensic Replay and Evidence Preservation Governance requires that every AI agent deployment captures and preserves sufficient operational data to reconstruct the complete sequence of events leading to, during, and after any serious incident — to a level of fidelity that supports root cause analysis, regulatory investigation, and legal proceedings. The preserved evidence must include the agent's inputs (instructions, data feeds, API responses), reasoning process (chain-of-thought, intermediate computations, decision points), outputs (actions taken, communications sent, data modified), and environmental context (model version, configuration state, system resource utilisation). Evidence must be preserved in a tamper-evident, cryptographically verifiable format per AG-006, with chain-of-custody documentation sufficient for regulatory and legal admissibility. Critically, the evidence preservation mechanism must operate continuously rather than being activated only when an incident is detected, because the evidence most needed for root cause analysis is the data captured before the incident was detected.
Scenario A — Reasoning Chain Not Captured Prevents Root Cause Determination: A safety-critical AI agent monitoring a water treatment facility adjusts chlorine dosing based on real-time sensor data. Over a 3-hour period, the agent progressively increases chlorine concentration from the target of 1.5 mg/L to 4.8 mg/L — well above the safe limit of 4.0 mg/L. The incident is detected when a downstream sensor triggers a high-chlorine alarm. The agent is contained per AG-065. Investigation begins, but the agent's reasoning chain was not captured — only its final dosing commands were logged. The investigation can see what the agent did but not why. The sensor data shows no anomaly that would justify the dosing increase. Without the reasoning chain, the investigation cannot determine whether the agent misinterpreted sensor data, received corrupted data that was subsequently corrected, experienced model drift, or was influenced by an adversarial input. The root cause remains undetermined, and the organisation cannot demonstrate to the regulator that the remediation addresses the actual failure mode.
What went wrong: The operational logging captured inputs and outputs but not the reasoning process. The agent's chain-of-thought, the intermediate calculations that mapped sensor readings to dosing decisions, and the internal state that accumulated over the 3-hour period were all lost when the agent was contained. The organisation had the endpoints of the causal chain (sensor data in, dosing commands out) but not the middle (the reasoning that connected them). Consequence: inability to determine root cause, regulatory finding for inadequate forensic capability, inability to demonstrate remediation effectiveness, extended shutdown of the AI-assisted dosing system pending manual investigation, potential public health investigation.
Scenario B — Evidence Tampering Undermines Regulatory Investigation: A financial-value AI agent is under investigation for potential market manipulation after executing a series of trades that appeared to create artificial price momentum. The regulator requests the complete audit trail of the agent's decision-making process for the 48-hour period in question. The organisation produces the logs, but the regulator's forensic analysis reveals that 14 log entries have timestamps that are inconsistent with the surrounding entries — they appear to have been inserted after the fact. The log storage system used mutable database records with no cryptographic integrity protection. Whether the inconsistencies are due to actual tampering or a benign system issue (e.g., log replication lag), the regulator cannot rely on the evidence, and the investigation shifts from "did the agent manipulate the market?" to "did the organisation tamper with evidence?"
What went wrong: The evidence storage did not provide tamper-evidence per AG-006. Log records were stored in a mutable database that permitted insertion, modification, and deletion without detection. Even if no tampering occurred, the absence of integrity protection means the evidence cannot be verified as authentic. Consequence: regulatory investigation escalated from operational conduct to evidence integrity, potential obstruction finding, criminal referral for evidence tampering (even if no tampering occurred, the inability to demonstrate integrity creates legal exposure), complete loss of credibility in the investigation.
Scenario C — Insufficient Retention Destroys Evidence Before Investigation: A customer-facing AI agent handling credit decisions is found to have a systematic bias that resulted in discriminatory lending outcomes affecting approximately 2,300 applicants over an 8-month period. The bias is discovered through a quarterly fairness audit. The investigation requires the agent's reasoning for each of the 2,300 decisions to determine the mechanism of bias and identify affected individuals. However, the agent's operational logs have a 90-day retention policy. Evidence for decisions made more than 90 days ago has been deleted. The investigation can analyse only the most recent 3 months of decisions (approximately 860), leaving 1,440 affected individuals unidentified. The regulator treats the premature destruction of evidence as an aggravating factor.
What went wrong: The retention policy was set based on operational convenience (storage cost management) rather than regulatory and legal requirements. Credit decisions have a regulatory review period of at least 2 years under consumer credit legislation. The 90-day retention policy was fundamentally inadequate for the regulatory context. Consequence: inability to identify 1,440 potentially affected individuals, regulatory finding for inadequate record-keeping, consumer redress programme hampered by incomplete evidence, aggravated regulatory penalty for evidence destruction.
Scope: This dimension applies to all AI agent deployments within scope of AG-064. The scope of evidence preservation is broader than the scope of incident detection: evidence must be captured and preserved continuously for all agent operations, not only during detected incidents. This is because the most forensically valuable evidence is typically generated before the incident is detected — the sequence of events, inputs, and reasoning that led to the failure. If evidence capture begins only at detection, the causal chain is already broken. The scope includes all data necessary to reconstruct the agent's decision-making process: inputs received, reasoning applied, outputs generated, and environmental context. For multi-agent systems, the scope extends to inter-agent communications and delegation records sufficient to reconstruct the complete interaction sequence.
4.1. A conforming system MUST continuously capture and preserve the following for every agent action: the input data that informed the action, the reasoning process or chain-of-thought that produced the action, the output or action taken, the timestamp with sub-second precision synchronised to a reliable time source, the agent identity and model version, and the governance configuration in effect at the time.
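The capture fields required by 4.1 can be illustrated as a single structured record. This is a sketch only: the class and field names below are hypothetical, not prescribed by this standard, and real deployments would align them with their own logging schema.

```python
import json
import time
from dataclasses import dataclass, field, asdict

# Illustrative evidence record covering the fields required by 4.1.
# All names here are examples, not normative.
@dataclass
class EvidenceRecord:
    agent_id: str
    model_version: str
    governance_config: dict  # configuration in effect at the time (4.1)
    inputs: dict             # the input data that informed the action
    reasoning: str           # chain-of-thought or reasoning trace
    output: dict             # the action taken
    timestamp_ns: int = field(default_factory=time.time_ns)  # sub-second precision

    def to_json(self) -> str:
        # Canonical serialisation (sorted keys) so the same record always
        # produces the same bytes, which matters once records are hashed.
        return json.dumps(asdict(self), sort_keys=True)

record = EvidenceRecord(
    agent_id="dosing-agent-01",
    model_version="2.4.1",
    governance_config={"max_dose_mg_l": 4.0},
    inputs={"chlorine_mg_l": 1.5},
    reasoning="Residual below target; increase dose by 0.1 mg/L.",
    output={"command": "set_dose", "value": 1.6},
)
print(json.loads(record.to_json())["agent_id"])  # → dosing-agent-01
```

The canonical serialisation is the design point worth noting: a stable byte representation is a precondition for the integrity verification required by 4.2.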
4.2. A conforming system MUST store all captured evidence in a tamper-evident format with cryptographic integrity verification per AG-006, such that any modification, insertion, or deletion of evidence records is detectable.
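One common way to satisfy the tamper-evidence requirement in 4.2 is a hash-chained, append-only log, in which each entry's hash covers the previous entry's hash. The following minimal sketch (an in-memory illustration, not a production store) shows how any modification, insertion, or deletion then breaks the chain at or after the altered record.

```python
import hashlib
import json

class HashChainedLog:
    """Append-only evidence log with per-entry hash chaining (sketch)."""
    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []  # list of (record_json, entry_hash)

    def append(self, record: dict) -> str:
        prev_hash = self._entries[-1][1] if self._entries else self.GENESIS
        payload = json.dumps(record, sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self._entries.append((payload, entry_hash))
        return entry_hash

    def verify(self) -> bool:
        # Recompute the chain from genesis; any altered, inserted, or
        # deleted entry makes a stored hash fail to match.
        prev_hash = self.GENESIS
        for payload, stored_hash in self._entries:
            expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
            if expected != stored_hash:
                return False  # tampering detected
            prev_hash = stored_hash
        return True

log = HashChainedLog()
log.append({"action": "set_dose", "value": 1.6})
log.append({"action": "set_dose", "value": 1.7})
assert log.verify()

# Simulate tampering with the first record: verification now fails.
log._entries[0] = (json.dumps({"action": "set_dose", "value": 9.9}),
                   log._entries[0][1])
assert not log.verify()
```

This is the property Scenario B lacked: with chaining, after-the-fact insertion of log entries is detectable rather than merely suspicious.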
4.3. A conforming system MUST retain evidence for a minimum period aligned with applicable regulatory requirements — at least 7 years for regulated financial services, at least 6 years for healthcare, at least 5 years for other regulated sectors, and at least 3 years otherwise.
4.4. A conforming system MUST preserve the complete chain of custody for all forensic evidence, documenting every access, copy, or transfer of evidence with the identity of the accessor, the timestamp, the purpose, and the authorisation.
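The chain-of-custody documentation in 4.4 can itself be modelled as an append-only ledger of access events. The sketch below is illustrative only; the field names and identifiers are hypothetical.

```python
import time

class CustodyLedger:
    """Illustrative chain-of-custody ledger per 4.4: every access, copy,
    or transfer of evidence is itself recorded."""
    def __init__(self):
        self.entries = []

    def record_access(self, evidence_id, accessor, purpose, authorisation):
        self.entries.append({
            "evidence_id": evidence_id,
            "accessor": accessor,          # identity of the accessor
            "purpose": purpose,            # why the evidence was touched
            "authorisation": authorisation,  # reference to the approval
            "timestamp_ns": time.time_ns(),
        })

    def history(self, evidence_id):
        # Complete custody history for one item of evidence, in order.
        return [e for e in self.entries if e["evidence_id"] == evidence_id]

ledger = CustodyLedger()
ledger.record_access("EV-2024-0144", "j.smith", "root cause analysis", "INC-0087")
ledger.record_access("EV-2024-0144", "regulator-liaison", "regulatory production", "REQ-2291")
print(len(ledger.history("EV-2024-0144")))  # → 2
```

In practice the ledger would be stored with the same integrity protection as the evidence itself, since an unverifiable custody record undermines admissibility just as an unverifiable log does.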
4.5. A conforming system MUST be capable of replaying an agent's decision sequence from preserved evidence — given the same inputs, configuration, and model version, the replay must produce the same outputs, enabling forensic determination of whether the observed behaviour was consistent with the agent's programming or indicative of an anomaly.
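The replay check in 4.5 reduces to re-running the decision with the preserved inputs and configuration and comparing against the recorded output. In this sketch, `decide()` is a deterministic stand-in for loading the historical model version; a real replay environment would pin model weights, configuration, and any sampling seeds.

```python
def decide(inputs: dict, config: dict) -> dict:
    # Hypothetical stand-in for the agent's decision logic under replay.
    dose = inputs["chlorine_mg_l"] + config["step_mg_l"]
    return {"command": "set_dose", "value": round(dose, 2)}

def replay_matches(evidence: dict) -> bool:
    """Replay the preserved decision and compare with the recorded output."""
    replayed = decide(evidence["inputs"], evidence["config"])
    return replayed == evidence["recorded_output"]

evidence = {
    "inputs": {"chlorine_mg_l": 1.5},
    "config": {"step_mg_l": 0.1},
    "recorded_output": {"command": "set_dose", "value": 1.6},
}
# A match suggests the observed behaviour followed from the preserved
# inputs and configuration; a mismatch indicates non-determinism, drift,
# or a gap in the evidence.
print(replay_matches(evidence))  # → True
```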
4.6. A conforming system MUST implement evidence preservation that is independent of the agent's runtime — the agent must not be able to modify, delete, or suppress its own evidence records.
4.7. A conforming system SHOULD capture environmental context alongside operational data, including: system resource utilisation (CPU, memory, network), concurrent agent activity, external system response times, and any error conditions in dependent systems, to support forensic reconstruction of the operating environment.
4.8. A conforming system SHOULD implement tiered storage for evidence retention — high-performance storage for recent evidence (e.g., 90 days) with automatic migration to cost-effective archival storage for long-term retention, ensuring that retention requirements are met without prohibitive storage costs.
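The tiering logic in 4.8 can be sketched as a simple age-based assignment. The thresholds below are examples only (90-day hot tier, 7-year retention floor as for regulated financial services under 4.3); note that nothing is deleted automatically, only flagged for review once the retention floor has passed.

```python
from datetime import datetime, timedelta, timezone

HOT_DAYS = 90          # example hot-storage window
RETENTION_YEARS = 7    # example floor, per 4.3 for financial services

def storage_tier(created: datetime, now: datetime) -> str:
    """Assign an evidence record to a storage tier by age (sketch)."""
    age = now - created
    if age <= timedelta(days=HOT_DAYS):
        return "hot"
    if age <= timedelta(days=365 * RETENTION_YEARS):
        return "archive"
    # Deletion only after an explicit retention review, never automatic.
    return "eligible-for-review"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(storage_tier(datetime(2024, 5, 1, tzinfo=timezone.utc), now))  # → hot
print(storage_tier(datetime(2021, 5, 1, tzinfo=timezone.utc), now))  # → archive
print(storage_tier(datetime(2016, 5, 1, tzinfo=timezone.utc), now))  # → eligible-for-review
```

Making the retention floor explicit in code, rather than in an operational deletion job, is one guard against the Scenario C failure mode, where a 90-day convenience policy silently destroyed evidence inside the regulatory review period.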
4.9. A conforming system MAY implement automated forensic analysis capabilities that can reconstruct incident timelines, identify anomalous decision patterns, and generate preliminary root cause hypotheses from preserved evidence.
Forensic Replay and Evidence Preservation Governance addresses a capability gap that is unique to AI agent deployments: the need to understand not just what happened but why the agent made the decisions it made. In traditional software systems, forensic investigation examines inputs, outputs, and code paths — the code is deterministic, so given the same inputs, the same outputs will be produced. AI agents introduce non-determinism (or apparent non-determinism in the case of temperature-based sampling), context-dependent reasoning, and emergent behaviours that cannot be predicted from the code alone. Understanding why an AI agent took a particular action requires preserving not just the inputs and outputs, but the complete reasoning process — the chain-of-thought, the attention patterns, the intermediate computations, and the accumulated context that influenced the decision.
This requirement has three drivers: operational, regulatory, and legal.
The operational driver is root cause analysis. AG-067 requires that every serious incident has a determined root cause. Without preserved evidence of the reasoning process, root cause determination is often impossible — the investigation can observe that the agent produced an incorrect output but cannot determine whether the cause was corrupted input data, model drift, adversarial manipulation, configuration error, or a novel reasoning failure. Without root cause determination, the organisation cannot implement targeted remediation and must instead apply broad, expensive, and potentially disruptive controls.
The regulatory driver is accountability. Regulators across jurisdictions increasingly require that organisations can explain AI-driven decisions, particularly when those decisions affect individuals' rights or financial interests. The EU AI Act Article 13 requires transparency. GDPR Article 22 gives individuals the right not to be subject to purely automated decisions. The FCA expects firms to be able to explain trading decisions. All of these obligations require preserved evidence of the decision-making process — not just the decision outcome.
The legal driver is evidentiary admissibility. In legal proceedings — whether regulatory enforcement, civil litigation, or criminal prosecution — the admissibility of digital evidence depends on the ability to demonstrate its authenticity and integrity. Evidence that has been stored in a mutable format without integrity protection may be challenged as unreliable. Evidence without chain-of-custody documentation may be excluded. AG-066 ensures that preserved evidence meets the standards required for legal admissibility, protecting the organisation's ability to use its own evidence in its defence and preventing adverse inferences from evidence destruction.
AG-066 establishes the evidence preservation pipeline as a continuous, infrastructure-layer capability that operates independently of the agent's runtime and cannot be influenced by the agent's outputs. The pipeline captures, processes, stores, and indexes operational data from every agent action, creating a forensic record that supports investigation at any future point.
The architecture of the evidence pipeline should follow a write-once, append-only model. Data flows from the agent's operational environment into the evidence pipeline through a one-way interface — the pipeline can read from the agent's environment, but the agent cannot write to, modify, or query the pipeline. This architectural separation ensures that a compromised agent cannot tamper with its own evidence.
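The one-way separation described above can be sketched as follows: a sidecar tap observes the agent's event stream and writes into a store that deliberately exposes no update or delete operation, while the agent is constructed without any handle to the store. The class names are illustrative, and real deployments would enforce the separation at the infrastructure layer (process, network, and permission boundaries), not merely in object structure.

```python
class EvidenceStore:
    """Append-only store; deliberately exposes no update or delete."""
    def __init__(self):
        self._records = []

    def append(self, record: dict) -> None:
        self._records.append(dict(record))  # copy, so callers cannot mutate

    def count(self) -> int:
        return len(self._records)

class EvidenceTap:
    """Sidecar that subscribes to the agent's event stream (4.6)."""
    def __init__(self, store: EvidenceStore):
        self._store = store

    def observe(self, event: dict) -> None:
        self._store.append(event)

class Agent:
    """Emits events through a callback; holds no reference to the store,
    so its code path cannot modify, delete, or suppress evidence."""
    def __init__(self, emit):
        self._emit = emit

    def act(self, inputs: dict) -> None:
        self._emit({"inputs": inputs, "output": "action-taken"})

store = EvidenceStore()
agent = Agent(emit=EvidenceTap(store).observe)
agent.act({"sensor": 1.5})
print(store.count())  # → 1
```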
Recommended patterns:
Anti-patterns to avoid:
Financial Services. MiFID II Article 16(7) requires investment firms to record all communications and transactions sufficient to enable the competent authority to monitor compliance. For AI agents executing trades, this includes the reasoning that led to each trading decision, not just the trade execution record. The FCA's Market Watch publications have emphasised that firms must be able to explain trading decisions and reconstruct the decision-making process. Evidence retention of 7 years is the minimum for MiFID II records. For agents operating in multiple financial jurisdictions, the longest applicable retention period should be applied to all records to avoid jurisdiction-specific gaps.
Healthcare. HIPAA requires that access to protected health information (PHI) be logged, and that the logs be retained for at least 6 years. For AI agents processing PHI, the evidence must include which patient records were accessed, what data was extracted, how it was used in reasoning, and what outputs were generated. The evidence must itself be treated as PHI if it contains patient data, requiring encryption at rest and in transit, access controls, and breach notification if compromised. Clinical decision-making evidence must be retained for the medical record retention period applicable in the jurisdiction — often 10+ years.
Critical Infrastructure. For AI agents in critical infrastructure, evidence preservation must include physical process data (sensor readings, actuator commands, control system states) alongside the agent's reasoning data. Post-incident investigation of a physical safety event requires correlation between the agent's reasoning and the physical process dynamics. Evidence storage must be physically separated from the controlled process to survive infrastructure failures that may accompany the incident.
Basic Implementation — The organisation captures agent inputs and outputs in a structured log. The log is stored in a standard database with access controls but without cryptographic integrity protection. Chain-of-thought or reasoning process data is captured if the agent architecture exposes it, but capture is not guaranteed for all decision types. Retention is defined by policy but managed manually (e.g., periodic deletion of records older than the retention period). Replay capability is limited — the organisation can review what happened but cannot reproduce the agent's decisions. This level falls short of the mandatory requirements and has significant forensic limitations: reasoning capture is not guaranteed as 4.1 requires, evidence integrity cannot be cryptographically verified as 4.2 requires, and replay capability is insufficient for root cause determination.
Intermediate Implementation — Evidence capture is implemented as an independent pipeline (sidecar or equivalent) that operates outside the agent's control. All inputs, outputs, and reasoning processes are captured with sub-second timestamps. Evidence is stored in an append-only format with cryptographic hash chaining per AG-006. Retention is automated with tiered storage (hot/warm/cold) meeting regulatory minimums. The organisation can replay agent decisions for recent incidents using preserved inputs and the current model version. Chain-of-custody documentation is maintained for all evidence access. Evidence is indexed and searchable.
Advanced Implementation — All intermediate capabilities plus: deterministic replay environment that can load historical model versions and configurations to reproduce agent decisions exactly as they occurred. Evidence is anchored to an external timestamping authority. Automated forensic analysis tools can reconstruct incident timelines, identify anomalous decision patterns, and correlate agent evidence with external system logs. Evidence preservation has been independently verified for completeness (no gaps in capture), integrity (tamper-evidence is sound), and admissibility (meets evidentiary standards for the applicable jurisdictions). Mean time from forensic request to evidence delivery is under 4 hours.
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-066 compliance requires verification that evidence capture is complete, tamper-evident, and supports forensic replay.
Test 8.1: Capture Completeness
Test 8.2: Capture Independence
Test 8.3: Tamper-Evidence Verification
Test 8.4: Forensic Replay Fidelity
Test 8.5: Retention Compliance
Test 8.6: Chain-of-Custody Integrity
Test 8.7: Performance Impact of Evidence Capture
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 12 (Record-Keeping) | Direct requirement |
| EU AI Act | Article 13 (Transparency) | Supports compliance |
| EU AI Act | Article 62 (Reporting of Serious Incidents) | Supports compliance |
| MiFID II | Article 16(7) (Record-Keeping of Transactions and Orders) | Direct requirement |
| GDPR | Article 5(2) (Accountability Principle) | Direct requirement |
| GDPR | Article 30 (Records of Processing Activities) | Supports compliance |
| DORA | Article 10 (Detection) | Supports compliance |
| FCA SYSC | 9.1 (Record-Keeping) | Direct requirement |
| NIST AI RMF | GOVERN 1.5, MAP 3.5, MANAGE 2.3 | Supports compliance |
| ISO 42001 | Clause 7.5 (Documented Information), Clause 9.1 (Monitoring, Measurement, Analysis and Evaluation) | Supports compliance |
Article 12 requires that high-risk AI systems are designed and developed with capabilities enabling the automatic recording of events ("logs") relevant to identifying risks and facilitating post-market monitoring. The logs must be capable of recording the period of each use, the reference database against which input data has been checked, the input data for which the search has led to a match, and the identification of the natural persons involved in the verification of results. AG-066 implements Article 12 for AI agent deployments by ensuring that operational logs capture the complete decision-making process, not just the decision outcome. The tamper-evidence requirement exceeds Article 12's minimum by ensuring that logs are not only captured but are verifiably authentic.
Article 16(7) requires investment firms to arrange for records to be kept of all services, activities, and transactions sufficient to enable the competent authority to monitor compliance. For AI agents executing trades, this means the complete chain from market data input through reasoning to trade execution must be recorded and retained for at least 5 years (7 years in practice for most firms). AG-066's evidence pipeline satisfies this requirement by continuously capturing the full decision chain, stored with integrity protection that supports regulatory production.
The accountability principle requires controllers to be able to demonstrate compliance with the data protection principles. For AI agents processing personal data, this means the organisation must be able to demonstrate that each processing decision was lawful, fair, and transparent. AG-066's evidence preservation enables this demonstration by preserving the reasoning that led to each processing decision, the legal basis applied, and the data used. Without this evidence, the controller cannot discharge the accountability burden.
Article 10 requires financial entities to have mechanisms to promptly detect anomalous activities. AG-066 supports detection by preserving the operational data that detection mechanisms (AG-064) analyse. Without continuous evidence preservation, detection mechanisms have incomplete data — they can only analyse recent data held in operational buffers, missing patterns that develop over longer periods.
The FCA requires firms to arrange for orderly records to be kept of their business, internal organisation, and compliance with applicable requirements. For AI agent deployments, the FCA expects records sufficient to reconstruct the decision-making process for any agent action. AG-066 ensures these records are captured continuously, stored with integrity protection, and retained for regulatory timescales.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — affecting the organisation's ability to investigate incidents, comply with regulatory inquiries, and defend itself in legal proceedings |
Consequence chain: Without forensic replay and evidence preservation, the organisation loses the ability to understand its own AI agent operations after the fact. The immediate technical failure is an evidence gap — when an incident occurs, the evidence needed to determine root cause does not exist or cannot be verified. The operational impact cascades through the incident response process: AG-067 (Root Cause and Corrective Action) cannot determine root cause without evidence, which means corrective actions are based on assumptions rather than findings, which means the organisation cannot demonstrate that remediation addresses the actual failure mode. The regulatory impact is severe: regulators expect organisations to be able to explain AI-driven decisions and to demonstrate compliance. An organisation that cannot produce verifiable evidence of its agent's decision-making process faces adverse inferences — the regulator may assume the worst. Under GDPR, inability to demonstrate compliance with the accountability principle is itself a violation. Under MiFID II, failure to maintain adequate records is an independent regulatory breach. The legal impact extends to civil litigation: if the organisation is sued for harm caused by an AI agent and cannot produce evidence of the agent's reasoning, the court may draw adverse inferences. The business consequence includes inability to determine root causes (leading to recurring incidents), regulatory enforcement for inadequate records, adverse inferences in legal proceedings, increased insurance premiums (insurers require evidence capability as a condition of AI liability coverage), and reputational damage when the organisation is seen as unable to explain or account for its AI agents' actions.