AG-235

Evidence Admissibility Governance

Legal, Regulatory & Records · AGS v2.1 · April 2026
EU AI Act FCA NIST HIPAA eIDAS

2. Summary

Evidence Admissibility Governance requires that AI agent records — interaction logs, decision traces, model outputs, configuration snapshots, and all associated metadata — are generated, stored, and maintained in forms that satisfy the evidentiary standards of courts, arbitration panels, regulators, and disciplinary bodies. Generating records is necessary but not sufficient: if those records cannot be admitted as evidence because they fail authenticity, integrity, chain-of-custody, or reliability tests, they are legally worthless. This dimension ensures that AI agent records are court-ready from the moment of creation, not retroactively remediated when litigation arises.

3. Example

Scenario A — Unsigned Logs Ruled Inadmissible: A financial-value agent executing trades is challenged in arbitration by a counterparty claiming that the agent executed a trade at an incorrect price. The organisation produces its trade logs showing the correct price. The counterparty challenges the admissibility of the logs under the UK Civil Evidence Act 1995, arguing that: the logs were generated by a computer system with no evidence that it was operating properly at the time; no witness can authenticate the logs (no human observed the trade execution); the logs lack integrity protections (no cryptographic signatures, no tamper-evident seals); and the logs were stored in a system accessible to the organisation's IT team, who could have modified them. The arbitration panel rules the logs inadmissible as business records because the organisation cannot establish the reliability of the computer system that generated them or the integrity of the records since generation. Without the trade logs, the organisation cannot prove the correct execution price. The arbitration panel rules in favour of the counterparty — damages of USD 1.4 million.

What went wrong: The trade logs were generated as plain-text files without cryptographic integrity protection. No chain-of-custody mechanism existed to demonstrate that the logs had not been modified since generation. No evidence of the computer system's reliability was maintained (no system health records, no validation of the logging pipeline). The logs met the organisation's operational needs but failed the evidentiary standard for legal proceedings. Consequence: USD 1.4 million arbitration loss, requirement to implement evidence-grade logging across all trading agents, and precedent risk for all future disputes involving agent-generated records.
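The missing integrity protection in Scenario A can be illustrated with a minimal sketch of sealing a record at the point of generation. It uses an HMAC-SHA256 seal from the Python standard library as a stand-in for a tamper-evident seal. The key, field names, and helper functions are hypothetical. A real deployment would more likely use asymmetric digital signatures with keys held in an HSM or KMS, so that the party storing the records cannot forge seals.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration only. Real keys belong in an HSM or KMS.
SEAL_KEY = b"demo-key-not-for-production"


def canonical_bytes(record: dict) -> bytes:
    """Serialise deterministically so identical content always yields identical bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_record(record: dict, key: bytes = SEAL_KEY) -> dict:
    """Wrap the record with an HMAC-SHA256 seal computed when the record is generated."""
    seal = hmac.new(key, canonical_bytes(record), hashlib.sha256).hexdigest()
    return {"record": record, "seal": seal}


def verify_seal(sealed: dict, key: bytes = SEAL_KEY) -> bool:
    """Recompute the seal over the stored record and compare in constant time."""
    expected = hmac.new(key, canonical_bytes(sealed["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["seal"])


# Illustrative trade record (all values are made up).
trade = {"trade_id": "T-1", "price": "101.25", "executed_at": "2026-04-01T09:30:00.123Z"}
sealed = seal_record(trade)
intact = verify_seal(sealed)

# Simulate a later edit to the stored record while the seal is left unchanged.
tampered = {"record": {**trade, "price": "101.20"}, "seal": sealed["seal"]}
tamper_detected = not verify_seal(tampered)
```

The seal is computed before the record leaves the generating process, which is the ordering requirement 4.1 describes. Sealing after transport or storage would leave a window in which undetected modification is possible.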

Scenario B — Chain of Custody Broken by Cloud Migration: A public-sector benefits agent's decision records are subpoenaed in a class action alleging discriminatory outcomes. The records were originally stored in an on-premises database. During the 18 months since the decisions were made, the records were migrated to a cloud provider as part of an infrastructure modernisation programme. The migration involved ETL (extract, transform, load) processes that reformatted timestamps, converted character encodings, and normalised field values. No chain-of-custody record was maintained during the migration. The claimants' expert witness argues that the transformation processes could have altered the substantive content of the records, and the absence of chain-of-custody evidence makes it impossible to verify that the produced records match the original records. The court appoints an independent expert who identifies 347 records where timestamp reformatting introduced rounding errors, altering the apparent decision sequence for those records.

What went wrong: The migration treated the records as data, not as evidence. No chain-of-custody protocol was applied to the migration. The ETL transformations were designed for operational compatibility, not for evidentiary integrity. The reformatting introduced changes that, while operationally insignificant, were legally material because they altered the apparent chronology of decisions. Consequence: 347 records excluded from evidence, court-appointed expert costs of GBP 280,000, adverse inference for the excluded records, and settlement pressure significantly increased.
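As a rough sketch of what a chain-of-custody step for the migration in Scenario B might look like, the example below preserves the original bytes, hashes both the source and the transformed form, and logs a custody event. The function names, record fields, and the truncating transform are hypothetical and chosen only to show how a lossy timestamp conversion can erase the order of two decisions.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def migrate_record(original_bytes: bytes, transform) -> dict:
    """Apply an ETL transform, keep the original bytes, and log a custody event."""
    transformed = transform(json.loads(original_bytes))
    transformed_bytes = json.dumps(transformed, sort_keys=True).encode("utf-8")
    custody_event = {
        "event": "migration",
        "source_sha256": sha256_hex(original_bytes),
        "target_sha256": sha256_hex(transformed_bytes),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    return {"original": original_bytes, "transformed": transformed, "custody_event": custody_event}


def truncate_to_seconds(record: dict) -> dict:
    """Hypothetical lossy transform: drops sub-second precision from the timestamp."""
    ts = datetime.fromisoformat(record["decided_at"])
    return {**record, "decided_at": ts.replace(microsecond=0).isoformat()}


# Two decisions made 500 ms apart (illustrative values).
a = json.dumps({"id": "D-1", "decided_at": "2024-01-05T10:00:00.400000+00:00"}).encode("utf-8")
b = json.dumps({"id": "D-2", "decided_at": "2024-01-05T10:00:00.900000+00:00"}).encode("utf-8")
ma = migrate_record(a, truncate_to_seconds)
mb = migrate_record(b, truncate_to_seconds)

# After truncation both timestamps are identical, so the sequence is no longer provable
# from the transformed view alone.
order_preserved = ma["transformed"]["decided_at"] < mb["transformed"]["decided_at"]
# The preserved original can still be verified against the custody event.
original_verifiable = sha256_hex(ma["original"]) == ma["custody_event"]["source_sha256"]
# Differing hashes flag that the transform changed content and needs review.
content_changed = ma["custody_event"]["source_sha256"] != ma["custody_event"]["target_sha256"]
```

Keeping the original bytes alongside the transformed view means the operational system can use the normalised form while the evidentiary copy remains verifiable against the hash recorded at transfer.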

Scenario C — Model Output Not Reproducible: A healthcare AI agent's diagnostic suggestion is challenged in a malpractice claim. The patient alleges that the agent's suggestion was negligent. The organisation attempts to reproduce the agent's output by replaying the original inputs against the current model. The output differs from the logged output because: the model has been retrained 3 times since the original output; the decoding temperature used at the time was not recorded; and the random seed for the stochastic sampling was not preserved. The expert witness testifies that the organisation cannot demonstrate what the agent actually computed at the time of the decision — the log says one thing but the organisation cannot independently verify it. The court reduces the weight given to the logged output, noting that "the reliability of a record that cannot be independently verified is inherently questionable."

What went wrong: The records preserved the input and output but not the computational conditions (model version, temperature, random seed) necessary for reproducibility. The output was logged but not verifiable through independent reproduction. Under Daubert/Frye standards (US) or the equivalent reliability test (UK), expert testimony about AI system behaviour typically requires demonstration of reproducibility. Consequence: Diminished evidentiary weight, weakened defence, increased settlement cost by an estimated 40%, and requirement to implement reproducibility-grade logging.
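The computational conditions that were missing in Scenario C could be captured along the lines of the sketch below. The `Provenance` fields mirror the conditions named in the scenario and in requirement 4.3. The `toy_model` function is a hypothetical, deterministic stand-in for a real model and exists only so the replay check can run. Real replay would also require access to the exact preserved weights.

```python
import hashlib
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Provenance:
    model_version: str  # should resolve to specific weights
    temperature: float
    top_p: float
    top_k: int
    seed: int
    tools: tuple = ()  # tools or external data sources invoked


def toy_model(prompt: str, prov: Provenance) -> str:
    """Hypothetical stand-in for a model: output depends on prompt, version, and seed."""
    rng = random.Random(f"{prov.model_version}|{prov.seed}|{prompt}")
    return rng.choice(["review", "refer", "monitor", "discharge"])


def output_digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def log_decision(prompt: str, prov: Provenance) -> dict:
    """Log the output digest together with everything needed to replay the decision."""
    output = toy_model(prompt, prov)
    return {"prompt": prompt, "provenance": asdict(prov), "output_sha256": output_digest(output)}


def replay_matches(entry: dict) -> bool:
    """Re-run the decision from the logged provenance and compare digests."""
    prov = Provenance(**entry["provenance"])
    replayed = toy_model(entry["prompt"], prov)
    return output_digest(replayed) == entry["output_sha256"]


prov = Provenance(model_version="dx-model@hypothetical-v7", temperature=0.7,
                  top_p=0.9, top_k=40, seed=1234)
entry = log_decision("patient summary (illustrative)", prov)
reproducible = replay_matches(entry)
```

Because `seed` and the decoding parameters have no defaults, a record cannot be constructed without them, which is one way to make incomplete provenance fail at write time instead of at trial.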

4. Requirement Statement

Scope: This dimension applies to every AI agent whose records could become evidence in legal, regulatory, arbitral, or disciplinary proceedings — which, in practice, means all agents that make or contribute to decisions affecting individuals, execute transactions, interact with external parties, or operate in regulated sectors. The scope covers all record types: interaction logs (prompts, responses, context), decision traces (reasoning chains, tool calls, intermediate outputs), model outputs (generated text, classifications, recommendations), configuration snapshots (model version, parameters, temperature, random seed), and all associated metadata (timestamps, user identifiers, session identifiers, system identifiers). The scope extends to the systems that generate, transport, store, and produce these records — the admissibility of the record depends on the reliability of the entire pipeline, not just the record itself.

4.1. A conforming system MUST apply cryptographic integrity protection (digital signatures or tamper-evident seals) to all agent records at the point of generation, before any transport or storage operation.

4.2. A conforming system MUST maintain an unbroken chain of custody for all agent records from generation through any transport, transformation, migration, or storage operation to eventual production, with each custody transfer cryptographically verified.

4.3. A conforming system MUST record, for each agent output, the computational conditions necessary for reproducibility: model version identifier (resolvable to specific weights), decoding parameters (temperature, top-p, top-k), random seed (if stochastic), and any tools or external data sources invoked.

4.4. A conforming system MUST maintain evidence of the reliability of the systems that generate agent records, including system health logs, validation test results, and incident records for any period during which the system's reliability is relevant.

4.5. A conforming system MUST ensure that timestamp accuracy is maintained to a precision sufficient for the applicable legal context (minimum millisecond precision for financial transactions, minimum second precision for other interactions), synchronised against an authoritative time source with documented accuracy bounds.

4.6. A conforming system MUST support the production of records in formats acceptable to courts, regulators, and arbitration panels in all applicable jurisdictions, including conversion from internal formats to standard evidentiary formats without loss of content or metadata.

4.7. A conforming system SHOULD implement reproducibility verification — periodic re-execution of historical agent decisions using preserved model versions and computational conditions, verifying that the output matches the logged record within documented variance bounds.

4.8. A conforming system SHOULD maintain witness-ready documentation: a human-readable description of the record-keeping system that a qualified witness (e.g., a system administrator or records manager) can present in court to authenticate the records.

4.9. A conforming system SHOULD implement jurisdictional mapping of evidentiary standards, identifying the specific admissibility requirements in each jurisdiction where records may need to be produced.

4.10. A conforming system MAY implement automated admissibility assessment that evaluates each record against the applicable evidentiary standards and flags records that may face admissibility challenges.
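Requirement 4.5 above lends itself to a simple automated check. The sketch below is one possible reading, not a definitive rule: it assumes that both the recorded timestamp resolution and the documented bound on clock offset must be no coarser than the required precision. The thresholds come from 4.5 (millisecond for financial transactions, second for other interactions). The class name, field names, and time source label are hypothetical.

```python
from dataclasses import dataclass

# Required precision per context, in nanoseconds, taken from requirement 4.5.
REQUIRED_PRECISION_NS = {
    "financial": 1_000_000,    # 1 millisecond
    "other": 1_000_000_000,    # 1 second
}


@dataclass(frozen=True)
class ClockEvidence:
    source: str          # authoritative time source the system synchronises against
    max_offset_ns: int   # documented bound on divergence from that source
    resolution_ns: int   # smallest time increment the record format can express


def meets_precision(context: str, clock: ClockEvidence) -> bool:
    """Assumed rule: resolution and offset bound must both be within the required precision."""
    required = REQUIRED_PRECISION_NS[context]
    return clock.resolution_ns <= required and clock.max_offset_ns <= required


# Illustrative clocks (values are made up).
tight_clock = ClockEvidence(source="time-source-hypothetical", max_offset_ns=250_000, resolution_ns=1_000)
loose_clock = ClockEvidence(source="time-source-hypothetical", max_offset_ns=5_000_000, resolution_ns=1_000_000)

tight_ok_for_financial = meets_precision("financial", tight_clock)
loose_ok_for_financial = meets_precision("financial", loose_clock)
loose_ok_for_other = meets_precision("other", loose_clock)
```

Storing the `ClockEvidence` alongside each batch of records gives a witness something concrete to point to when the accuracy of a timestamp is challenged.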

5. Rationale

The evidentiary value of a record depends not on what the record says but on whether the record can be trusted. Courts apply specific tests to determine whether a record is trustworthy enough to be admitted as evidence. These tests vary by jurisdiction but share common themes: authenticity (is the record what it purports to be?), integrity (has the record been altered since creation?), reliability (was the record created by a reliable system?), and completeness (is the record complete, or have portions been omitted?).

AI agent records face heightened scrutiny because they are generated by systems that are not fully understood. A traditional business record — an invoice, a bank statement, a transaction log — is generated by deterministic software that produces the same output given the same input. A court can reasonably infer that the record is accurate if the system was functioning properly. AI agent records are generated by probabilistic systems whose outputs depend on model weights, sampling parameters, and random seeds. The same input can produce different outputs. This means that a court cannot simply infer accuracy from system functionality — the court needs evidence that the specific output was in fact produced at the specific time, and ideally, evidence that the output is reproducible.

The business records exception in US Federal Rules of Evidence 803(6) permits the admission of records kept in the course of a regularly conducted business activity if the record was made at or near the time of the event by, or from information transmitted by, someone with knowledge, and if nothing about the source of information or the method or circumstances of preparation indicates a lack of trustworthiness. Section 9 of the UK Civil Evidence Act 1995 similarly allows documents forming part of the records of a business or public authority to be received in evidence without further proof. For AI agent records, the role of the "person with knowledge" is effectively played by the system itself, so the trustworthiness of the method depends on the integrity protections, chain of custody, and system reliability evidence that AG-235 requires.

6. Implementation Guidance

Evidence admissibility is an outcome of the entire record lifecycle — generation, transport, storage, transformation, and production. Weakness at any point can render the entire record inadmissible.

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. MiFID II Article 16(6) requires that records be maintained in a medium that "allows the storage of information in a way accessible for future reference" and "in such a form and manner that the competent authority can readily access them." FCA SUP 15A and MAR 8 specify technical standards for order record-keeping that include timestamp precision requirements (microsecond for certain records). For AI trading agents, these requirements extend to the agent's decision records.

Healthcare. Clinical decision support records may be subject to expert evidence standards (Daubert in the US, the reliability test in the UK). An AI diagnostic suggestion that cannot be reproduced or explained may be excluded as unreliable expert evidence. Healthcare records also face specific admissibility requirements under medical records legislation (e.g., the US HIPAA Privacy Rule's requirements for designated record sets).

Public Sector. Public sector records may be subject to Freedom of Information requests, judicial review, and parliamentary inquiries. The evidentiary standard for judicial review of administrative decisions is high: the court needs to see the full basis for the decision, including any AI contribution. Records that cannot be authenticated or reproduced may result in the court quashing the decision for inadequate reasoning.

Maturity Model

Basic Implementation — Agent records are generated with timestamps and stored in access-controlled databases. No cryptographic integrity protection is applied. Chain of custody is implicit (the records are in the database they were written to). System reliability evidence is limited to standard IT monitoring. Reproducibility is not supported — model versions may not be retained. This level produces records that are usable for operational purposes but face significant admissibility challenges in contested proceedings.

Intermediate Implementation — Agent records are cryptographically signed at generation. Records are stored in append-only stores with tamper-evident properties. Computational provenance (model version, parameters, seed) is recorded with each decision. Chain-of-custody protocols cover transformations and migrations. Timestamps are synchronised against an authoritative time source with documented accuracy. Witness-ready documentation exists for the record-keeping system. This level produces records that meet standard evidentiary requirements in most jurisdictions.

Advanced Implementation — All intermediate capabilities plus: periodic reproducibility verification confirms that historical decisions can be replayed from preserved provenance. Jurisdictional mapping identifies specific admissibility requirements for each jurisdiction. Automated admissibility assessment flags records that may face challenges. Chain-of-custody is cryptographically chained (each record includes a hash of the previous record). Independent audit of the record-keeping system validates its reliability on an annual basis. The organisation can present a qualified witness who can authenticate the entire record-keeping pipeline from generation to production.
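The cryptographic chaining mentioned at the Advanced level (each record includes a hash of the previous record) can be sketched as follows. This is a minimal illustration using SHA-256 from the standard library. The entry layout, the all-zero genesis value, and the function names are assumptions, and a production system would typically also sign entries and anchor the chain head externally.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder for the predecessor of the first entry


def entry_digest(payload: dict, prev_hash: str) -> str:
    body = {"payload": payload, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: dict) -> None:
    """Append an entry whose hash covers its payload and the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev_hash": prev_hash,
                  "hash": entry_digest(payload, prev_hash)})


def first_broken_index(chain: list) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None if intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        recomputed = entry_digest(entry["payload"], entry["prev_hash"])
        if entry["prev_hash"] != prev or entry["hash"] != recomputed:
            return i
        prev = entry["hash"]
    return None


# Illustrative custody events (values are made up).
chain: list = []
append_entry(chain, {"event": "generated", "record_id": "R-1"})
append_entry(chain, {"event": "transferred", "record_id": "R-1"})
append_entry(chain, {"event": "produced", "record_id": "R-1"})
intact_result = first_broken_index(chain)

# Altering a middle entry is detected at that entry.
chain[1]["payload"]["event"] = "deleted"
broken_at = first_broken_index(chain)
```

Because each hash depends on its predecessor, an attacker who edits one entry would have to recompute every later hash, which external anchoring of the chain head is intended to prevent.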

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Integrity Verification

Test 8.2: Tamper Detection

Test 8.3: Reproducibility Verification

Test 8.4: Chain-of-Custody Preservation Through Migration

Test 8.5: Timestamp Accuracy

Test 8.6: Production Format Conversion

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
US FRE | Rule 803(6) (Business Records Exception), Rule 901 (Authentication) | Direct requirement
UK Civil Evidence Act 1995 | Section 9 (Proof of Records of Business or Public Authority) | Direct requirement
EU eIDAS | Regulation 910/2014 (Electronic Signatures, Seals, Timestamps) | Supports compliance
MiFID II | Article 16(6-7) (Record-Keeping), RTS 25 (Clock Synchronisation) | Direct requirement
FCA MAR 8 | Telephone Recording and Electronic Communications | Supports compliance
EU AI Act | Article 12 (Record-Keeping), Article 72 (Post-Market Monitoring) | Supports compliance
ISO 27001 | Annex A.12.4 (Logging and Monitoring) | Supports compliance

US FRE — Rule 803(6) and Rule 901

Rule 803(6) provides a hearsay exception for records of regularly conducted activity (business records) if the custodian or qualified witness testifies to the record's authenticity, regularity, and timeliness. Rule 901 requires authentication — evidence sufficient to support a finding that the record is what the proponent claims it is. For AI agent records, authentication requires demonstrating that the record was generated by the agent at the claimed time, has not been modified, and was generated by a system that was functioning reliably. AG-235's integrity protections, chain-of-custody records, and system reliability evidence provide the foundation for Rule 803(6) and Rule 901 compliance.

UK Civil Evidence Act 1995 — Section 9

Section 9 allows a document to be received in evidence as part of the records of a business or public authority without further proof, and treats a document as forming part of those records where a certificate to that effect, signed by an officer of the business or authority, is produced to the court. In practice, a party relying on computer-generated evidence should also be ready to show that the computer was operating properly at the relevant time if that is challenged. AG-235's system reliability evidence and witness-ready documentation support Section 9 authentication.

EU eIDAS — Electronic Signatures and Timestamps

eIDAS Regulation 910/2014 establishes the legal framework for electronic signatures, electronic seals, and electronic timestamps within the EU. An electronic timestamp that meets eIDAS requirements creates a presumption of the accuracy of the date and time and the integrity of the data. AG-235's cryptographic integrity and timestamp accuracy requirements align with eIDAS standards, supporting the legal admissibility of agent records across EU member states.

MiFID II — RTS 25 (Clock Synchronisation)

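RTS 25 specifies clock synchronisation requirements for trading venues and their members or participants. Business clocks must be synchronised to UTC, with a maximum divergence and timestamp granularity that depend on the trading activity: 100 microseconds divergence with 1 microsecond granularity for high-frequency algorithmic trading, 1 millisecond for most other electronic trading, and 1 second for voice trading and similar human-mediated activity. AI trading agents must meet these requirements for their decision and execution records.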

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Case-specific, but with systemic implications if the record-keeping infrastructure is fundamentally flawed

Consequence chain: Records that fail admissibility tests are excluded from evidence. In proceedings where the organisation bears the burden of proof (e.g., demonstrating that an agent's decision was reasonable, that a trade was executed correctly, or that a benefits determination was lawful), excluded records eliminate the organisation's ability to meet its burden. The practical consequence is that the organisation loses cases it should have won — not because the agent acted incorrectly, but because the organisation cannot prove the agent acted correctly. The financial impact is case-specific but can be substantial: the arbitration scenario (USD 1.4 million) involved a single trade; a class action involving thousands of agent decisions could involve exposure of tens or hundreds of millions. The systemic risk is that a fundamental flaw in the record-keeping infrastructure (e.g., no cryptographic integrity, no chain of custody) affects all records, not just those involved in a specific case. A single adverse admissibility ruling can establish a precedent that all records from the same system are unreliable, affecting every pending and future case involving that system.

Cross-references: AG-006 (Tamper-Evident Record Integrity) provides the foundational integrity mechanisms that AG-235 builds upon. AG-066 (Forensic Replay and Evidence Preservation) defines the reproducibility capability that AG-235's computational provenance enables. AG-231 (Legal Hold and Preservation Governance) ensures that records are preserved when legal proceedings are anticipated — AG-235 ensures that the preserved records are admissible. AG-232 (Privilege and Confidential Review Segregation Governance) addresses the segregation of privileged records, which must maintain admissibility while preserving privilege. AG-229 (Jurisdictional Applicability Mapping Governance) determines which jurisdiction's evidentiary standards apply to each set of records.

Cite this protocol
AgentGoverning. (2026). AG-235: Evidence Admissibility Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-235