AG-235

Evidence Admissibility Governance

Legal, Regulatory & Records · AGS v2.1 · April 2026

EU AI Act FCA NIST HIPAA eIDAS

2. Summary

Evidence Admissibility Governance requires that AI agent records — interaction logs, decision traces, model outputs, configuration snapshots, and all associated metadata — are generated, stored, and maintained in forms that satisfy the evidentiary standards of courts, arbitration panels, regulators, and disciplinary bodies. Generating records is necessary but not sufficient: if those records cannot be admitted as evidence because they fail authenticity, integrity, chain-of-custody, or reliability tests, they are legally worthless. This dimension ensures that AI agent records are court-ready from the moment of creation, not retroactively remediated when litigation arises.

3. Example

Scenario A — Unsigned Logs Ruled Inadmissible: A financial-value agent executing trades is challenged in arbitration by a counterparty claiming that the agent executed a trade at an incorrect price. The organisation produces its trade logs showing the correct price. The counterparty challenges the admissibility of the logs under the UK Civil Evidence Act 1995, arguing that: the logs were generated by a computer system with no evidence that it was operating properly at the time; no witness can authenticate the logs (no human observed the trade execution); the logs lack integrity protections (no cryptographic signatures, no tamper-evident seals); and the logs were stored in a system accessible to the organisation's IT team, who could have modified them. The arbitration panel rules the logs inadmissible as business records because the organisation cannot establish the reliability of the computer system that generated them or the integrity of the records since generation. Without the trade logs, the organisation cannot prove the correct execution price. The arbitration panel rules in favour of the counterparty — damages of USD 1.4 million.

What went wrong: The trade logs were generated as plain-text files without cryptographic integrity protection. No chain-of-custody mechanism existed to demonstrate that the logs had not been modified since generation. No evidence of the computer system's reliability was maintained (no system health records, no validation of the logging pipeline). The logs met the organisation's operational needs but failed the evidentiary standard for legal proceedings. Consequence: USD 1.4 million arbitration loss, requirement to implement evidence-grade logging across all trading agents, and precedent risk for all future disputes involving agent-generated records.

Scenario B — Chain of Custody Broken by Cloud Migration: A public-sector benefits agent's decision records are subpoenaed in a class action alleging discriminatory outcomes. The records were originally stored in an on-premises database. During the 18 months since the decisions were made, the records were migrated to a cloud provider as part of an infrastructure modernisation programme. The migration involved ETL (extract, transform, load) processes that reformatted timestamps, converted character encodings, and normalised field values. No chain-of-custody record was maintained during the migration. The claimants' expert witness argues that the transformation processes could have altered the substantive content of the records, and the absence of chain-of-custody evidence makes it impossible to verify that the produced records match the original records. The court appoints an independent expert who identifies 347 records where timestamp reformatting introduced rounding errors, altering the apparent decision sequence for those records.

What went wrong: The migration treated the records as data, not as evidence. No chain-of-custody protocol was applied to the migration. The ETL transformations were designed for operational compatibility, not for evidentiary integrity. The reformatting introduced changes that, while operationally insignificant, were legally material because they altered the apparent chronology of decisions. Consequence: 347 records excluded from evidence, court-appointed expert costs of GBP 280,000, an adverse inference for the excluded records, and significantly increased settlement pressure.

Scenario C — Model Output Not Reproducible: A healthcare AI agent's diagnostic suggestion is challenged in a malpractice claim. The patient alleges that the agent's suggestion was negligent. The organisation attempts to reproduce the agent's output by replaying the original inputs against the current model. The output differs from the logged output because: the model has been retrained 3 times since the original output; the decoding temperature used at the time was not recorded; and the random seed for the stochastic sampling was not preserved. The expert witness testifies that the organisation cannot demonstrate what the agent actually computed at the time of the decision — the log says one thing but the organisation cannot independently verify it. The court reduces the weight given to the logged output, noting that "the reliability of a record that cannot be independently verified is inherently questionable."

What went wrong: The records preserved the input and output but not the computational conditions (model version, temperature, random seed) necessary for reproducibility. The output was logged but not verifiable through independent reproduction. Under Daubert/Frye standards (US) or the equivalent reliability test (UK), expert testimony about AI system behaviour typically requires demonstration of reproducibility. Consequence: Diminished evidentiary weight, a weakened defence, an estimated 40% increase in settlement cost, and a requirement to implement reproducibility-grade logging.

4. Requirement Statement

Scope: This dimension applies to every AI agent whose records could become evidence in legal, regulatory, arbitral, or disciplinary proceedings — which, in practice, means all agents that make or contribute to decisions affecting individuals, execute transactions, interact with external parties, or operate in regulated sectors. The scope covers all record types: interaction logs (prompts, responses, context), decision traces (reasoning chains, tool calls, intermediate outputs), model outputs (generated text, classifications, recommendations), configuration snapshots (model version, parameters, temperature, random seed), and all associated metadata (timestamps, user identifiers, session identifiers, system identifiers). The scope extends to the systems that generate, transport, store, and produce these records — the admissibility of the record depends on the reliability of the entire pipeline, not just the record itself.

4.1. A conforming system MUST apply cryptographic integrity protection (digital signatures or tamper-evident seals) to all agent records at the point of generation, before any transport or storage operation.

4.2. A conforming system MUST maintain an unbroken chain of custody for all agent records from generation through any transport, transformation, migration, or storage operation to eventual production, with each custody transfer cryptographically verified.

4.3. A conforming system MUST record, for each agent output, the computational conditions necessary for reproducibility: model version identifier (resolvable to specific weights), decoding parameters (temperature, top-p, top-k), random seed (if stochastic), and any tools or external data sources invoked.

4.4. A conforming system MUST maintain evidence of the reliability of the systems that generate agent records, including system health logs, validation test results, and incident records for any period during which the system's reliability is relevant.

4.5. A conforming system MUST ensure that timestamp accuracy is maintained to a precision sufficient for the applicable legal context (minimum millisecond precision for financial transactions, minimum second precision for other interactions), synchronised against an authoritative time source with documented accuracy bounds.

4.6. A conforming system MUST support the production of records in formats acceptable to courts, regulators, and arbitration panels in all applicable jurisdictions, including conversion from internal formats to standard evidentiary formats without loss of content or metadata.

4.7. A conforming system SHOULD implement reproducibility verification — periodic re-execution of historical agent decisions using preserved model versions and computational conditions, verifying that the output matches the logged record within documented variance bounds.

4.8. A conforming system SHOULD maintain witness-ready documentation: a human-readable description of the record-keeping system that a qualified witness (e.g., a system administrator or records manager) can present in court to authenticate the records.

4.9. A conforming system SHOULD implement jurisdictional mapping of evidentiary standards, identifying the specific admissibility requirements in each jurisdiction where records may need to be produced.

4.10. A conforming system MAY implement automated admissibility assessment that evaluates each record against the applicable evidentiary standards and flags records that may face admissibility challenges.
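
As an illustration of requirement 4.3, the sketch below shows one way a provenance record could be represented and checked for completeness. The schema, the class name `ProvenanceRecord`, and the method names are hypothetical and are not defined by AG-235; the example assumes the decoding parameters listed in 4.3 and treats the random seed as mandatory only for stochastic outputs.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass(frozen=True)
class ProvenanceRecord:
    """Computational conditions for one agent output (hypothetical schema)."""
    model_version: str              # identifier resolvable to specific weights
    temperature: float
    top_p: float
    top_k: int
    stochastic: bool
    random_seed: Optional[int]      # needed only when stochastic is True
    tools_invoked: tuple = ()       # tools or external data sources used

    def missing_fields(self) -> list:
        """Return the names of fields that are absent for reproducibility purposes."""
        missing = []
        if not self.model_version:
            missing.append("model_version")
        if self.stochastic and self.random_seed is None:
            missing.append("random_seed")
        return missing

    def to_canonical_json(self) -> str:
        """Serialise with sorted keys so the same record always yields the same text."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))


# A stochastic output logged without its seed is flagged as incomplete.
incomplete = ProvenanceRecord("model-v1", 0.7, 0.95, 40, True, None)
print(incomplete.missing_fields())
```

A canonical serialisation matters here because the same provenance must hash and sign identically every time it is verified.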

5. Rationale

The evidentiary value of a record depends not on what the record says but on whether the record can be trusted. Courts apply specific tests to determine whether a record is trustworthy enough to be admitted as evidence. These tests vary by jurisdiction but share common themes: authenticity (is the record what it purports to be?), integrity (has the record been altered since creation?), reliability (was the record created by a reliable system?), and completeness (is the record complete, or have portions been omitted?).

AI agent records face heightened scrutiny because they are generated by systems that are not fully understood. A traditional business record — an invoice, a bank statement, a transaction log — is generated by deterministic software that produces the same output given the same input. A court can reasonably infer that the record is accurate if the system was functioning properly. AI agent records are generated by probabilistic systems whose outputs depend on model weights, sampling parameters, and random seeds. The same input can produce different outputs. This means that a court cannot simply infer accuracy from system functionality — the court needs evidence that the specific output was in fact produced at the specific time, and ideally, evidence that the output is reproducible.

The business records exception (US Federal Rules of Evidence Rule 803(6); the UK Civil Evidence Act 1995 Section 9 plays a comparable role) permits the admission of records kept in the course of a regularly conducted business activity. Under Rule 803(6), the record must have been made at or near the time of the event by, or from information transmitted by, someone with knowledge, and the source of information and the method or circumstances of preparation must not indicate a lack of trustworthiness. For AI agent records, the "person with knowledge" is the system itself, and the trustworthiness of the method depends on the integrity protections, chain of custody, and system reliability evidence that AG-235 requires.

6. Implementation Guidance

Evidence admissibility is an outcome of the entire record lifecycle — generation, transport, storage, transformation, and production. Weakness at any point can render the entire record inadmissible.

Recommended patterns:

Sign-at-source architecture. Cryptographically sign each agent record at the point of generation — within the logging pipeline, before the record enters any transport or storage layer. Use a signing key that is not accessible to personnel who have access to the stored records (separation of duties). The signature covers the full record content including all metadata. Upon production, the signature can be verified to demonstrate that the record has not been modified since generation.
Immutable append-only record store. Store records in an append-only data store where records cannot be modified or deleted without leaving detectable evidence of the attempt. Technologies include write-once storage (WORM), blockchain-based audit trails, and append-only databases with cryptographic chaining (each record includes a hash of the previous record, creating a tamper-evident chain). The append-only property eliminates the class of challenges based on post-hoc modification.
Computational provenance capture. At each agent decision point, capture and record the complete computational provenance: the model version (as a hash or resolvable identifier), the decoding parameters, the random seed (if applicable), the input data (complete prompt/context), and the output data (complete response). Store this provenance alongside the decision record with the same integrity protections. This enables reproducibility verification — replaying the decision with the same provenance should produce the same output (within documented variance bounds for stochastic systems).
Chain-of-custody protocol for transformations. When records must be transformed (format conversion, migration, archival), implement a protocol that: (1) hashes the record before transformation, (2) performs the transformation, (3) hashes the record after transformation, (4) logs both hashes, the transformation applied, the timestamp, and the identity of the process, and (5) retains the original record alongside the transformed version. This allows verification that the transformation did not alter the substantive content.
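
The cryptographic chaining described in the append-only pattern can be sketched as follows. This is a minimal in-memory illustration, not a storage design: the class `HashChainedLog`, the constant `GENESIS_HASH`, and the entry layout are all hypothetical, and SHA-256 over canonical JSON is an assumption rather than something AG-235 prescribes.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class HashChainedLog:
    """In-memory append-only log in which each entry commits to its predecessor."""

    def __init__(self) -> None:
        self._entries = []

    def append(self, payload: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        entry = {
            "prev_hash": prev_hash,
            "payload": payload,
            "hash": _entry_hash(prev_hash, payload),
        }
        self._entries.append(entry)
        return entry["hash"]

    def first_broken_index(self):
        """Return the index of the first entry that fails verification, or None."""
        prev_hash = GENESIS_HASH
        for i, entry in enumerate(self._entries):
            if entry["prev_hash"] != prev_hash:
                return i
            if entry["hash"] != _entry_hash(prev_hash, entry["payload"]):
                return i
            prev_hash = entry["hash"]
        return None


log = HashChainedLog()
for price in ("101.25", "101.30", "101.10"):
    log.append({"agent": "trade-agent-1", "price": price})
print(log.first_broken_index())
```

A chain on its own only shows internal consistency: someone able to rewrite every later entry could rebuild it. That is why the sign-at-source pattern above pairs chaining with signatures made by a key the storage administrators cannot access.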

Anti-patterns to avoid:

Treating operational logs as evidence-grade. Operational logs are designed for debugging and monitoring, not for legal proceedings. They typically lack cryptographic integrity, chain-of-custody records, and system reliability evidence. Producing operational logs as evidence without these protections invites admissibility challenges.
Signing records after storage rather than at generation. A signature applied after the record has been transported and stored proves that the record has not been modified since signing — but does not prove that the record was not modified between generation and signing. The evidentiary gap between generation and signing is a vulnerability. Sign at source.
Relying on database access controls as integrity evidence. Access controls demonstrate who could access the records, not whether the records were actually modified. A court will note that access controls can be bypassed by administrators, and that the absence of modification cannot be inferred from the presence of access controls alone. Cryptographic integrity provides positive evidence of non-modification.
Discarding computational provenance as "operational detail." The model version, temperature, and random seed are not operational details — they are the conditions that determined the output. Without them, the output is an assertion without proof. With them, the output can be independently verified.
Assuming all jurisdictions have the same evidentiary standards. The US, UK, EU member states, and other jurisdictions have different admissibility rules, different authentication requirements, and different approaches to computer-generated evidence. Records produced for one jurisdiction may not meet the standards of another. Jurisdictional mapping (requirement 4.9) is essential for multi-jurisdictional operations.

Industry Considerations

Financial Services. MiFID II Article 16(6) requires that records be maintained in a medium that "allows the storage of information in a way accessible for future reference" and "in such a form and manner that the competent authority can readily access them." FCA SUP 15A and MAR 8 specify technical standards for order record-keeping that include timestamp precision requirements (microsecond for certain records). For AI trading agents, these requirements extend to the agent's decision records.

Healthcare. Clinical decision support records may be subject to expert evidence standards (Daubert in the US, the reliability test in the UK). An AI diagnostic suggestion that cannot be reproduced or explained may be excluded as unreliable expert evidence. Healthcare records also face specific admissibility requirements under medical records legislation (e.g., the US HIPAA Privacy Rule's requirements for designated record sets).

Public Sector. Public sector records may be subject to Freedom of Information requests, judicial review, and parliamentary inquiries. The evidentiary standard for judicial review of administrative decisions is high: the court needs to see the full basis for the decision, including any AI contribution. Records that cannot be authenticated or reproduced may result in the court quashing the decision for inadequate reasoning.

Maturity Model

Basic Implementation — Agent records are generated with timestamps and stored in access-controlled databases. No cryptographic integrity protection is applied. Chain of custody is implicit (the records are in the database they were written to). System reliability evidence is limited to standard IT monitoring. Reproducibility is not supported — model versions may not be retained. This level produces records that are usable for operational purposes but face significant admissibility challenges in contested proceedings.

Intermediate Implementation — Agent records are cryptographically signed at generation. Records are stored in append-only stores with tamper-evident properties. Computational provenance (model version, parameters, seed) is recorded with each decision. Chain-of-custody protocols cover transformations and migrations. Timestamps are synchronised against an authoritative time source with documented accuracy. Witness-ready documentation exists for the record-keeping system. This level produces records that meet standard evidentiary requirements in most jurisdictions.

Advanced Implementation — All intermediate capabilities plus: periodic reproducibility verification confirms that historical decisions can be replayed from preserved provenance. Jurisdictional mapping identifies specific admissibility requirements for each jurisdiction. Automated admissibility assessment flags records that may face challenges. Chain-of-custody is cryptographically chained (each record includes a hash of the previous record). Independent audit of the record-keeping system validates its reliability on an annual basis. The organisation can present a qualified witness who can authenticate the entire record-keeping pipeline from generation to production.

7. Evidence Requirements

Required artefacts:

Record integrity architecture. Documentation of the cryptographic integrity mechanisms applied to agent records, including signing algorithms, key management procedures, and verification procedures.
Chain-of-custody records. Complete chain-of-custody logs for all records that have undergone transformation, migration, or transfer, showing before/after hashes and the transformation applied.
System reliability evidence. Health logs, validation results, and incident records for the systems that generate and store agent records, covering the period during which the records were generated.
Computational provenance records. Model version identifiers, decoding parameters, and random seeds recorded with each agent decision, sufficient for reproducibility verification.
Witness-ready documentation. Human-readable description of the record-keeping system suitable for presentation by a qualified witness in court.
Reproducibility verification results. Results from periodic re-execution of historical decisions using preserved provenance, demonstrating output consistency.

Retention requirements:

Integrity-protected records and chain-of-custody logs. Aligned with the retention requirements of the applicable records (per AG-231 and sector-specific requirements).
System reliability evidence. Retained for the same period as the records it supports.
Witness-ready documentation. Maintained as a living document, with historical versions retained.

Access requirements:

Producible in legally admissible form to courts, arbitration panels, regulators, and opposing counsel (pursuant to appropriate legal process) within the timeframe ordered by the applicable body.

8. Test Specification

Test 8.1: Integrity Verification

Stimulus: Generate an agent record. Retrieve the record and verify its cryptographic signature.
Expected behaviour: The signature verification succeeds, confirming that the record has not been modified since generation.
Pass criteria: The signature is valid. The signing timestamp matches the record generation timestamp within documented clock skew bounds.
Fail criteria: The signature verification fails, or no signature exists on the record.

Test 8.2: Tamper Detection

Stimulus: Generate an agent record. Modify one byte of the record in storage (bypassing application controls through direct storage access). Run the integrity verification.
Expected behaviour: The signature verification fails, detecting the modification.
Pass criteria: Any modification, however small, is detected by the integrity verification mechanism.
Fail criteria: The modification is not detected.
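
Tests 8.1 and 8.2 can be exercised with a small harness along the following lines. This is a sketch under stated assumptions: HMAC-SHA256 from the Python standard library stands in for the digital signature, which keeps the example self-contained but is weaker than what the sign-at-source pattern describes, because anyone holding the symmetric key could forge a seal. The helper names and the sample record are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialise deterministically so sealing and verification see the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(record: dict, key: bytes) -> str:
    """Compute an integrity tag at the point of generation (HMAC as a stand-in)."""
    return hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()


def verify_bytes(stored: bytes, tag: str, key: bytes) -> bool:
    """Recompute the tag over the stored bytes and compare in constant time."""
    expected = hmac.new(key, stored, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


def flip_one_byte(data: bytes, index: int) -> bytes:
    """Simulate direct storage tampering by flipping one bit of one byte."""
    tampered = bytearray(data)
    tampered[index] ^= 0x01
    return bytes(tampered)


key = b"demo-key-not-for-production"  # assumption: real keys live in an HSM or KMS
record = {"agent": "trade-agent-1", "price": "101.25", "ts": "2026-04-01T09:30:00.123Z"}
stored = canonical_bytes(record)
tag = seal(record, key)

print(verify_bytes(stored, tag, key))                     # Test 8.1: untouched record
print(verify_bytes(flip_one_byte(stored, 10), tag, key))  # Test 8.2: one byte altered
```

Verification runs over the stored bytes, not over a re-serialised object, so that any change made directly in storage is what gets checked.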

Test 8.3: Reproducibility Verification

Stimulus: Generate an agent decision with recorded computational provenance (model version, parameters, random seed). Replay the decision using the preserved provenance.
Expected behaviour: The replayed output matches the original logged output exactly (for deterministic systems) or within documented variance bounds (for stochastic systems with preserved random seeds).
Pass criteria: The replayed output matches the logged output exactly (deterministic systems) or within documented variance bounds (stochastic systems).
Fail criteria: The replayed output differs from the logged output beyond documented variance bounds, or the replay cannot be performed because provenance data is missing.
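
The replay in Test 8.3 can be illustrated with a toy stochastic sampler. The function `toy_agent` below is a stand-in invented for this example and bears no relation to a real model: it only demonstrates that preserving the seed and temperature makes a sampled output repeatable, and that missing provenance makes replay impossible. Real model replay also depends on preserved weights and on hardware and library determinism, which is why the test allows documented variance bounds.

```python
import random

VOCAB = ["approve", "refer", "decline"]  # toy output space


def toy_agent(prompt: str, temperature: float, seed: int) -> str:
    """Toy stochastic 'model': weights depend on the prompt, sampling on the seed."""
    rng = random.Random(seed)
    base = [len(prompt) % 5 + 1, 2, 1]
    # Higher temperature flattens the distribution; temperature must be > 0.
    weights = [w ** (1.0 / temperature) for w in base]
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def replay_matches(logged: dict) -> bool:
    """Re-run a logged decision from its provenance and compare outputs."""
    required = ("prompt", "temperature", "seed", "output")
    missing = [name for name in required if logged.get(name) is None]
    if missing:
        raise ValueError(f"cannot replay, provenance missing: {missing}")
    replayed = toy_agent(logged["prompt"], logged["temperature"], logged["seed"])
    return replayed == logged["output"]


# Log a decision together with the conditions that produced it, then replay it.
conditions = {"prompt": "assess claim 42", "temperature": 0.8, "seed": 1234}
logged = dict(conditions, output=toy_agent(**conditions))
print(replay_matches(logged))
```

A replay that raises because the seed is absent corresponds to the second fail criterion above: the provenance data is missing, so no comparison can be made.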

Test 8.4: Chain-of-Custody Preservation Through Migration

Stimulus: Migrate agent records from one storage system to another (e.g., on-premises to cloud). Verify that the chain-of-custody protocol was followed: pre-migration hash, post-migration hash, transformation log.
Expected behaviour: Pre-migration and post-migration hashes match (for lossless migration) or the transformation log documents all changes with justification (for format-converting migration). The original record is retained alongside the migrated version.
Pass criteria: Complete chain-of-custody record for the migration. Content integrity verified through hash comparison.
Fail criteria: No chain-of-custody record for the migration, or post-migration hash differs from pre-migration hash without documented justification.
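
The migration check in Test 8.4 relies on a custody entry being written for every transformation. A minimal sketch of that protocol follows, assuming SHA-256 hashes and a plain dictionary as the custody entry; the function names, the entry fields, and the `etl-job-17` actor label are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def transform_with_custody(original: bytes, transform, name: str, actor: str):
    """Apply `transform` and return (transformed, custody_entry, retained_original)."""
    before = sha256_hex(original)        # step 1: hash before transformation
    transformed = transform(original)    # step 2: perform the transformation
    after = sha256_hex(transformed)      # step 3: hash after transformation
    entry = {                            # step 4: log hashes, transformation, time, actor
        "hash_before": before,
        "hash_after": after,
        "transformation": name,
        "actor": actor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "lossless": before == after,
    }
    return transformed, entry, original  # step 5: the original is retained


def custody_entry_is_consistent(entry: dict, original: bytes, produced: bytes) -> bool:
    """Check a custody entry against both the retained original and the produced copy."""
    return (entry["hash_before"] == sha256_hex(original)
            and entry["hash_after"] == sha256_hex(produced))


# A character-encoding conversion changes the bytes, so it is logged as not lossless.
source = "café decision record".encode("latin-1")
migrated, entry, retained = transform_with_custody(
    source,
    lambda raw: raw.decode("latin-1").encode("utf-8"),
    name="latin-1 to utf-8 re-encoding",
    actor="etl-job-17",
)
print(entry["lossless"], custody_entry_is_consistent(entry, retained, migrated))
```

When the hashes differ, as they do for any format-converting migration, the entry documents what was applied, and the retained original lets a third party repeat the transformation and confirm the result.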

Test 8.5: Timestamp Accuracy

Stimulus: Generate 100 agent records while simultaneously recording the time from an independent, authoritative time source. Compare the record timestamps against the independent source.
Expected behaviour: All record timestamps are within documented accuracy bounds of the authoritative source (e.g., within 100 milliseconds for NTP-synchronised systems).
Pass criteria: All 100 timestamps are within documented accuracy bounds. No systematic bias is detected.
Fail criteria: Any timestamp exceeds accuracy bounds, or a systematic bias is detected.
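
The comparison in Test 8.5 reduces to two checks: the worst-case offset against the documented bound, and the mean offset as an indicator of systematic bias. The sketch below assumes timestamps expressed as seconds since an epoch. The `bias_fraction` threshold is a hypothetical parameter chosen for illustration; AG-235 does not define how bias should be detected.

```python
from statistics import mean


def assess_timestamps(record_ts, reference_ts, bound_s, bias_fraction=0.5):
    """Compare record timestamps (seconds) against an independent reference.

    bias_fraction is a hypothetical threshold: a mean offset larger than
    this fraction of the bound is reported as systematic bias.
    """
    if len(record_ts) != len(reference_ts) or not record_ts:
        raise ValueError("need two equal-length, non-empty sequences")
    offsets = [rec - ref for rec, ref in zip(record_ts, reference_ts)]
    worst = max(abs(offset) for offset in offsets)
    bias = mean(offsets)
    return {
        "max_abs_offset_s": worst,
        "mean_offset_s": bias,
        "within_bounds": worst <= bound_s,
        "systematic_bias": abs(bias) > bias_fraction * bound_s,
    }


# 100 records whose clock runs a steady 80 ms ahead of the reference:
# every offset is inside a 100 ms bound, but the bias check still fires.
reference = [1000.0 + i for i in range(100)]
records = [t + 0.08 for t in reference]
result = assess_timestamps(records, reference, bound_s=0.1)
print(result["within_bounds"], result["systematic_bias"])
```

Reporting bias separately from the bound matters because a clock that is consistently early or late can still reorder events relative to records produced by other systems.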

Test 8.6: Production Format Conversion

Stimulus: Convert agent records from internal format to a standard evidentiary format (e.g., PDF/A for document production, CSV for data production). Verify that the conversion preserves all content and metadata.
Expected behaviour: The converted records contain all content and metadata from the original records. No information is lost in conversion.
Pass criteria: Character-by-character comparison confirms content preservation. All metadata fields are present in the converted format.
Fail criteria: Any content or metadata is lost, truncated, or altered in conversion.
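
For the CSV case in Test 8.6, content preservation can be checked by converting, reading the result back, and comparing field by field. The sketch below uses Python's standard `csv` module; the function names and the sample record are hypothetical, and all values are compared as strings because CSV carries no type information.

```python
import csv
import io


def records_to_csv(records, fieldnames) -> str:
    """Write records as CSV, quoting every field so commas and newlines survive."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            quoting=csv.QUOTE_ALL, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def csv_to_records(text: str):
    """Read CSV text back into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


def conversion_losses(records, fieldnames):
    """Return (row_index, field) pairs whose value did not survive the round trip."""
    restored = csv_to_records(records_to_csv(records, fieldnames))
    losses = []
    if len(restored) != len(records):
        losses.append(("row_count", None))
    for i, (src, out) in enumerate(zip(records, restored)):
        for name in src:  # every source field, including ones left out of the export
            if out.get(name) != str(src[name]):
                losses.append((i, name))
    return losses


records = [{"id": "r1",
            "output": 'said "yes", then\nstopped',
            "ts": "2026-04-01T09:30:00.123Z"}]
print(conversion_losses(records, ["id", "output", "ts"]))  # full export
print(conversion_losses(records, ["id", "output"]))        # export that drops "ts"
```

Iterating over the source record's own keys, not the export's column list, is what lets the check catch a metadata field that the export silently omitted.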

Conformance Scoring

Score 0: Agent records are generated without integrity protection — plain text logs with no signatures, no chain of custody, and no computational provenance.
Score 1: Records are stored in access-controlled systems with timestamps, but no cryptographic integrity and no reproducibility capability — operationally useful but evidentiarily weak.
Score 2: Records are cryptographically signed at generation, stored in tamper-evident stores, with computational provenance captured and chain-of-custody maintained — evidence-grade records meeting standard admissibility requirements.
Score 3: Verified by independent audit, with reproducibility verification, jurisdictional admissibility mapping, witness-ready documentation, and demonstrated admissibility in at least one contested proceeding — court-proven evidence infrastructure.

9. Regulatory Mapping

Regulation	Provision	Relationship Type
US FRE	Rule 803(6) (Business Records Exception), Rule 901 (Authentication)	Direct requirement
UK Civil Evidence Act 1995	Section 9 (Proof of Records of Business or Public Authority)	Direct requirement
EU eIDAS	Regulation 910/2014 (Electronic Signatures, Seals, Timestamps)	Supports compliance
MiFID II	Article 16(6-7) (Record-Keeping), RTS 25 (Clock Synchronisation)	Direct requirement
FCA MAR 8	Telephone Recording and Electronic Communications	Supports compliance
EU AI Act	Article 12 (Record-Keeping), Article 72 (Post-Market Monitoring)	Supports compliance
ISO/IEC 27001:2013	Annex A.12.4 (Logging and Monitoring)	Supports compliance

US FRE — Rule 803(6) and Rule 901

Rule 803(6) provides a hearsay exception for records of regularly conducted activity (business records) if the custodian or qualified witness testifies to the record's authenticity, regularity, and timeliness. Rule 901 requires authentication — evidence sufficient to support a finding that the record is what the proponent claims it is. For AI agent records, authentication requires demonstrating that the record was generated by the agent at the claimed time, has not been modified, and was generated by a system that was functioning reliably. AG-235's integrity protections, chain-of-custody records, and system reliability evidence provide the foundation for Rule 803(6) and Rule 901 compliance.

UK Civil Evidence Act 1995 — Section 9

Section 9 permits the proof of business records through a certificate identifying the record, describing the manner in which it was compiled, and certifying that it was supplied in the usual course of business. The common law requirement for computer-generated evidence (which supplements the Act) requires evidence that the computer was operating properly at the relevant time. AG-235's system reliability evidence and witness-ready documentation support Section 9 authentication.

EU eIDAS — Electronic Signatures and Timestamps

eIDAS Regulation 910/2014 establishes the legal framework for electronic signatures, electronic seals, and electronic time stamps within the EU. A qualified electronic time stamp enjoys a presumption of the accuracy of the date and time it indicates and of the integrity of the data to which that date and time are bound. AG-235's cryptographic integrity and timestamp accuracy requirements align with eIDAS standards, supporting the legal admissibility of agent records across EU member states.

MiFID II — RTS 25 (Clock Synchronisation)

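RTS 25 specifies clock synchronisation requirements for trading venues and their members or participants: business clocks must be synchronised to UTC, with the permitted divergence and timestamp granularity depending on the trading activity (for example, 100 microseconds maximum divergence for high-frequency algorithmic trading, 1 millisecond for most other electronic trading, and 1 second for voice trading). AI trading agents must meet these timestamp precision requirements for their decision and execution records.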

10. Failure Severity

Field	Value
Severity Rating	High
Blast Radius	Case-specific, but with systemic implications if the record-keeping infrastructure is fundamentally flawed

Consequence chain: Records that fail admissibility tests are excluded from evidence. In proceedings where the organisation bears the burden of proof (e.g., demonstrating that an agent's decision was reasonable, that a trade was executed correctly, or that a benefits determination was lawful), excluded records eliminate the organisation's ability to meet its burden. The practical consequence is that the organisation loses cases it should have won — not because the agent acted incorrectly, but because the organisation cannot prove the agent acted correctly. The financial impact is case-specific but can be substantial: the arbitration scenario (USD 1.4 million) involved a single trade; a class action involving thousands of agent decisions could involve exposure of tens or hundreds of millions. The systemic risk is that a fundamental flaw in the record-keeping infrastructure (e.g., no cryptographic integrity, no chain of custody) affects all records, not just those involved in a specific case. A single adverse admissibility ruling can establish a precedent that all records from the same system are unreliable, affecting every pending and future case involving that system.

Cross-references: AG-006 (Tamper-Evident Record Integrity) provides the foundational integrity mechanisms that AG-235 builds upon. AG-066 (Forensic Replay and Evidence Preservation) defines the reproducibility capability that AG-235's computational provenance enables. AG-231 (Legal Hold and Preservation Governance) ensures that records are preserved when legal proceedings are anticipated — AG-235 ensures that the preserved records are admissible. AG-232 (Privilege and Confidential Review Segregation Governance) addresses the segregation of privileged records, which must maintain admissibility while preserving privilege. AG-229 (Jurisdictional Applicability Mapping Governance) determines which jurisdiction's evidentiary standards apply to each set of records.

Cite this protocol

AgentGoverning. (2026). AG-235: Evidence Admissibility Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-235
