AG-221

Assurance Evidence Schema Governance

Meta-Governance & Assurance · AGS v2.1 · April 2026
EU AI Act · GDPR · SOX · FCA · NIST · ISO 42001

2. Summary

Assurance Evidence Schema Governance requires that organisations standardise the evidence objects, fields, digital signatures, and retention rules used to prove conformance across all governance controls. Every control dimension in the Agent Governance Standard produces evidence — blocked action logs, configuration snapshots, test results, audit trails. Without a standardised schema governing these evidence artefacts, each control produces evidence in ad hoc formats, making cross-control analysis impossible, automated validation unreliable, and auditor review prohibitively expensive. The evidence schema MUST define the structure, mandatory fields, signature requirements, and retention periods for every evidence type, and MUST be versioned and change-controlled alongside the taxonomy and dependency register.

3. Example

Scenario A — Heterogeneous Evidence Formats Block Automated Audit: An organisation implements 47 governance controls across 8 AI agents. Each control was implemented by a different team, and each team chose its own evidence format: some produce JSON logs, others CSV exports, others PDF reports, and two produce proprietary binary formats from commercial governance tools. When the annual conformance assessment begins, the assessor spends 340 hours (£68,000 at external audit rates) simply normalising evidence into a comparable format before substantive assessment can begin. Three controls produce evidence that cannot be correlated with evidence from related controls because timestamps use different formats (Unix epoch vs. ISO 8601 vs. local time without timezone), agent identifiers use different naming conventions, and action references are not cross-indexed.

What went wrong: No evidence schema existed. Each implementation team made independent format decisions. Evidence objects lack common mandatory fields (standardised timestamps, agent identifiers, control references). Consequence: £68,000 in normalisation costs per assessment, 3 controls with uncorrelatable evidence, overall assessment confidence reduced because cross-control analysis was impossible.

Scenario B — Unsigned Evidence Undermines Regulatory Submission: An organisation submits conformance evidence to a financial regulator as part of its AI agent authorisation process. The evidence includes 18 months of blocked action logs from AG-001 enforcement. During the regulator's review, the regulator asks: "How do you demonstrate that these logs have not been modified since creation?" The organisation cannot answer — the logs are plain text files on a shared network drive with no digital signatures, no hash chains, and no tamper-evident storage. The regulator declines to accept the evidence, requiring the organisation to re-implement its evidence pipeline with integrity controls before resubmitting. The re-implementation takes 6 months. During this period, the agent authorisation is suspended.

What went wrong: The evidence schema did not specify integrity requirements (digital signatures, hash chains, tamper-evident storage). Evidence was produced and stored in a format that could be modified without detection. Consequence: 6-month authorisation suspension, £420,000 in re-implementation costs, reputational damage with the regulator.

Scenario C — Inconsistent Retention Rules Create Legal Exposure: An organisation faces litigation related to an AI agent's actions 4 years ago. The organisation's legal team requests all governance evidence for the relevant period. Three controls retained evidence for 7 years (meeting financial services requirements), four controls retained evidence for 3 years (meeting the minimum default), and two controls retained evidence for only 1 year (set by the implementation team based on storage cost concerns). The evidence for the two short-retention controls has been purged. These controls are material to the litigation because they would have demonstrated that the agent was operating within governed parameters. The organisation cannot prove its governance posture at the time of the incident.

What went wrong: No standardised retention rules existed. Each implementation team set retention independently, without reference to regulatory or legal requirements. Consequence: Inability to demonstrate governance posture for a material litigation, adverse inference drawn by the court, and settlement at £1.8 million, four times the estimated judgment had the evidence survived.

4. Requirement Statement

Scope: This dimension applies to all evidence artefacts produced by any governance control within the Agent Governance Standard. "Evidence" means any data object created to demonstrate that a governance control is implemented, operating, or effective. This includes but is not limited to: action logs, configuration snapshots, test results, assessment reports, approval records, audit trails, and performance metrics. The scope covers the schema definition (structure and fields), integrity requirements (signatures and tamper evidence), retention rules, and access controls for all evidence types. Organisations using third-party governance platforms must verify that the platform's evidence output conforms to the organisation's evidence schema or must implement a schema-conforming extraction layer.

4.1. A conforming system MUST define a versioned evidence schema specifying the structure, mandatory fields, data types, and validation rules for every evidence type produced by implemented governance controls.

4.2. A conforming system MUST include the following mandatory fields in every evidence object: unique evidence identifier, creation timestamp in ISO 8601 format with timezone, source control dimension identifier (AG number), source agent identifier, evidence type classification, and schema version reference.
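The six mandatory fields in 4.2 can be illustrated with a small envelope constructor. This is a minimal sketch in Python; the field names (`evidence_id`, `created_at`, and so on) are assumptions for illustration, not normative names from any published schema.

```python
import uuid
from datetime import datetime, timezone

def make_envelope(control_id, agent_id, evidence_type, schema_version):
    """Build an evidence envelope carrying the six mandatory fields from 4.2."""
    return {
        "evidence_id": str(uuid.uuid4()),                      # unique evidence identifier
        "created_at": datetime.now(timezone.utc).isoformat(),  # ISO 8601 with timezone
        "control_id": control_id,                              # source control dimension (AG number)
        "agent_id": agent_id,                                  # source agent identifier
        "evidence_type": evidence_type,                        # evidence type classification
        "schema_version": schema_version,                      # schema version reference
    }

envelope = make_envelope("AG-001", "agent-billing-01", "blocked_action_log", "2.1.0")
```

Type-specific payload fields would sit alongside this envelope; the envelope alone is what makes records from different controls comparable.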

4.3. A conforming system MUST apply digital signatures or cryptographic hash chains to evidence objects at creation time, enabling subsequent verification that the evidence has not been modified since creation.
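One way to satisfy 4.3 without a full signing infrastructure is a hash chain, in which each record's digest covers the previous record's digest, so modifying any record breaks every subsequent link. A minimal sketch with illustrative record fields:

```python
import hashlib
import json

def chain_append(chain, payload):
    """Append a record whose digest covers the previous record's digest."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256((prev + body).encode()).hexdigest()
    chain.append({"payload": payload, "prev": prev, "hash": digest})

def chain_verify(chain):
    """Recompute every link; any modified record invalidates all later links."""
    prev = "0" * 64
    for rec in chain:
        body = json.dumps(rec["payload"], sort_keys=True)
        if rec["prev"] != prev:
            return False
        if rec["hash"] != hashlib.sha256((prev + body).encode()).hexdigest():
            return False
        prev = rec["hash"]
    return True

log = []
chain_append(log, {"action": "payment", "blocked": True})   # illustrative records
chain_append(log, {"action": "export", "blocked": False})
```

`chain_verify(log)` returns True for the untouched chain; silently flipping `blocked` on the first record makes it return False, which is exactly the modification the regulator in Scenario B asked the organisation to rule out.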

4.4. A conforming system MUST define and enforce retention periods for each evidence type, aligned with the most stringent applicable regulatory, legal, and contractual requirement, with a minimum default of 3 years for unregulated contexts.
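The "most stringent applicable requirement" rule in 4.4 reduces to taking the longest of the applicable periods, floored at the 3-year default. A one-function sketch:

```python
def retention_years(applicable_years):
    """Longest applicable regulatory, legal, or contractual period; minimum 3 years (4.4)."""
    return max([3] + list(applicable_years))
```

Under this rule the storage-cost-driven 1-year retention from Scenario C could not occur: `retention_years([1])` still returns 3, and a 7-year financial services requirement dominates the default.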

4.5. A conforming system MUST validate evidence objects against the schema at creation time, rejecting or flagging evidence that does not conform to the defined structure and mandatory field requirements.

4.6. A conforming system MUST version the evidence schema using immutable version identifiers, with full change history, and MUST maintain backward compatibility or provide explicit migration paths when the schema evolves.

4.7. A conforming system SHOULD implement a centralised evidence repository or federated evidence index enabling cross-control evidence correlation using the standardised fields.
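With the standardised fields from 4.2 in place, the cross-control correlation that 4.7 enables becomes a simple join. A hypothetical sketch matching a blocked-action record (AG-001) to the configuration snapshot (AG-007) in force at the time; field names are illustrative:

```python
from datetime import datetime

def config_in_force(action, snapshots):
    """Latest AG-007 snapshot for the same agent created at or before the action."""
    t = datetime.fromisoformat(action["created_at"])
    candidates = [s for s in snapshots
                  if s["agent_id"] == action["agent_id"]
                  and datetime.fromisoformat(s["created_at"]) <= t]
    return max(candidates, key=lambda s: s["created_at"], default=None)

snapshots = [
    {"agent_id": "a1", "created_at": "2026-01-01T00:00:00+00:00", "config_hash": "h1"},
    {"agent_id": "a1", "created_at": "2026-03-01T00:00:00+00:00", "config_hash": "h2"},
]
blocked = {"agent_id": "a1", "created_at": "2026-02-15T12:00:00+00:00"}
```

This join is only possible because both controls share agent identifiers and timezone-qualified ISO 8601 timestamps; the mixed epoch/local-time formats in Scenario A are precisely what breaks it.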

4.8. A conforming system SHOULD generate automated alerts when evidence production for any control ceases unexpectedly, drops below expected volume by more than 50% for 24 hours, or fails schema validation at a rate exceeding 5% of evidence objects.
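The three alert conditions in 4.8 can be evaluated against a control's last 24 hours of production statistics. A minimal sketch, using the thresholds stated in the requirement:

```python
def evidence_alerts(expected_per_day, produced_last_24h, validated, failed):
    """Evaluate the three alert conditions in 4.8 for one control's last 24 hours."""
    alerts = []
    if produced_last_24h == 0:
        alerts.append("evidence production ceased")
    elif produced_last_24h < 0.5 * expected_per_day:
        alerts.append("volume dropped more than 50% over 24h")
    total = validated + failed
    if total and failed / total > 0.05:
        alerts.append("schema validation failure rate exceeds 5%")
    return alerts
```

For example, a control expected to produce 1,000 objects per day that emits only 400, of which 30 fail validation, trips both the volume and the failure-rate alerts.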

4.9. A conforming system MAY implement evidence provenance chains — metadata recording the complete lifecycle of each evidence object from creation through any transformations to archival or purge.
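A provenance chain per 4.9 can be as simple as an append-only list of lifecycle events attached to each evidence object. A hypothetical sketch; the event and actor names are invented for illustration:

```python
from datetime import datetime, timezone

def record_event(evidence, event, actor):
    """Append one lifecycle event to the evidence object's provenance chain."""
    evidence.setdefault("provenance", []).append({
        "event": event,
        "actor": actor,
        "at": datetime.now(timezone.utc).isoformat(),
    })

obj = {"evidence_id": "ev-0001"}                         # hypothetical identifier
record_event(obj, "created", "pipeline:ag-001-logger")
record_event(obj, "transformed", "job:format-migration")
record_event(obj, "archived", "job:retention-manager")
```

In a full implementation the provenance list itself would be integrity-protected (for example, folded into the hash chain of 4.3) so that lifecycle history is as tamper-evident as the evidence it describes.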

5. Rationale

Evidence is the currency of assurance. A governance control that cannot produce credible, structured, verifiable evidence of its operation provides no assurance value — it is an assertion without proof. Sections 7 (Evidence Requirements) and 8 (Test Specification) of every control dimension in the Agent Governance Standard specify what evidence is required and what tests must pass. AG-221 governs the evidence infrastructure itself — ensuring that evidence produced across all controls is structured consistently, integrity-protected, and retained appropriately.

Three forces make evidence schema governance essential. First, scale: an organisation with 20 agents and 50 implemented controls may produce millions of evidence objects per year. Without a standardised schema, these objects cannot be indexed, queried, or correlated at scale. Second, integrity: evidence submitted to regulators, auditors, or courts must be demonstrably unmodified. Ad hoc evidence formats without integrity protection are vulnerable to challenge. Third, interoperability: evidence produced by different controls must be correlatable — a blocked action log from AG-001 must be cross-referenceable with the configuration version from AG-007 that defined the limit. This requires common identifiers, common timestamp formats, and common reference fields.

Evidence schema governance is also a precondition for automated conformance assessment. As governance frameworks mature, manual assessment becomes unsustainable. Automated assessment tools require structured, schema-conforming evidence as input. Without a governed schema, every automated tool must implement custom parsers for every evidence format — a combinatorial problem that does not scale.

6. Implementation Guidance

The evidence schema should be implemented as a formal schema definition — JSON Schema, Protocol Buffers, Avro, or equivalent — that can be used for automated validation. The schema should define: (1) a common envelope containing the mandatory fields required by 4.2 for all evidence types, and (2) type-specific payloads for each evidence category (action logs, configuration snapshots, test results, etc.).
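A minimal, stdlib-only sketch of the envelope-plus-payload split and the creation-time validation required by 4.5. In practice this would be a formal JSON Schema, Protocol Buffers, or Avro definition; the payload field sets below are invented for illustration:

```python
ENVELOPE_FIELDS = {"evidence_id", "created_at", "control_id",
                   "agent_id", "evidence_type", "schema_version"}

PAYLOAD_FIELDS = {                                  # illustrative categories only
    "blocked_action_log": {"action", "reason"},
    "config_snapshot": {"config_hash", "config_body"},
}

def validate(evidence):
    """Return schema violations; an empty list means the object conforms (4.5)."""
    errors = [f"missing envelope field: {f}"
              for f in ENVELOPE_FIELDS - evidence.keys()]
    etype = evidence.get("evidence_type")
    required = PAYLOAD_FIELDS.get(etype)
    if required is None:
        errors.append(f"unknown evidence type: {etype}")
    else:
        errors += [f"missing payload field: {f}"
                   for f in required - evidence.get("payload", {}).keys()]
    return errors
```

Running this check at the point of production, and rejecting or flagging non-conforming objects, is what prevents the heterogeneous-format drift described in Scenario A.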

Recommended patterns:

- Define a common envelope (the 4.2 mandatory fields) with type-specific payloads for each evidence category.
- Use ISO 8601 timestamps with explicit timezones and standardised agent and control identifiers in every evidence object, so records from different controls can be correlated.
- Apply signatures or hash chains at creation time, not at archival; evidence is most vulnerable to undetected modification while it sits in intermediate storage.
- Derive retention periods from the most stringent applicable regulatory, legal, and contractual requirement, not from storage cost.
- Validate at the point of production, so non-conforming evidence is rejected or flagged immediately rather than discovered at assessment time.

Anti-patterns to avoid:

- Letting each implementation team choose its own evidence format (Scenario A): heterogeneous formats make cross-control analysis impossible and normalisation prohibitively expensive.
- Storing evidence as plain files on shared drives with no signatures, hash chains, or tamper-evident storage (Scenario B).
- Setting retention independently per control or on storage-cost grounds (Scenario C): the shortest-retained control determines what the organisation can prove in litigation.
- Accepting proprietary binary evidence formats from commercial tools without a schema-conforming extraction layer.

Maturity Model

Basic Implementation — The organisation has defined an evidence schema specifying mandatory fields and data types for all evidence types. Evidence objects are validated against the schema at creation time. Retention periods are defined for each evidence type and enforced through manual processes with periodic verification. Digital signatures are applied to evidence objects at creation time. The schema is versioned with change history.

Intermediate Implementation — The evidence schema is machine-readable (JSON Schema or equivalent) and integrated with evidence production pipelines for automated validation. A centralised evidence repository or federated index enables cross-control evidence correlation. Retention lifecycle management is automated — evidence is archived and purged on schedule with audit trails. Schema evolution follows defined compatibility contracts. Evidence production monitoring generates alerts when production volume drops unexpectedly or validation failure rates exceed thresholds.

Advanced Implementation — All intermediate capabilities plus: evidence provenance chains record the complete lifecycle of each evidence object. The evidence schema is independently audited annually for completeness and regulatory alignment. Automated evidence analysis tools consume schema-conforming evidence for continuous conformance monitoring. Evidence integrity is verified periodically (e.g., hash chain validation) to detect storage-layer tampering. Cross-organisation evidence exchange uses the standardised schema for supply-chain assurance.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Mandatory Field Presence Validation

Test 8.2: Schema Validation at Creation

Test 8.3: Evidence Integrity Verification

Test 8.4: Retention Period Enforcement

Test 8.5: Cross-Control Evidence Correlation

Test 8.6: Schema Version Immutability

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 12 (Record-Keeping) | Direct requirement
EU AI Act | Article 17 (Quality Management System) | Supports compliance
SOX | Section 802 (Criminal Penalties for Document Alteration) | Supports compliance
SOX | Section 404 (Internal Controls) | Supports compliance
FCA SYSC | 9.1 (Record-Keeping) | Direct requirement
NIST AI RMF | GOVERN 1.4 (Documentation) | Supports compliance
ISO 42001 | Clause 7.5 (Documented Information) | Direct requirement
DORA | Article 10 (ICT-Related Incident Reporting) | Supports compliance
GDPR | Article 5(2) (Accountability Principle) | Supports compliance

EU AI Act — Article 12 (Record-Keeping)

Article 12 requires providers of high-risk AI systems to ensure automatic recording of events ("logs") with sufficient detail to trace the system's operation. AG-221 implements this requirement at the governance level — standardising the format, integrity, and retention of all governance evidence ensures that regulatory record-keeping obligations are met consistently across all controls, not just for operational logs but for the full evidence portfolio that demonstrates governance effectiveness.

SOX — Section 802 (Criminal Penalties for Document Alteration)

Section 802 establishes criminal penalties for knowingly altering, destroying, or concealing documents or records with intent to impede an investigation. For AI governance evidence, this means evidence integrity is not merely a governance best practice but a legal obligation. AG-221's requirements for digital signatures, hash chains, and enforced retention directly support compliance by making evidence modification detectable and by technically preventing premature destruction.

ISO 42001 — Clause 7.5 (Documented Information)

Clause 7.5 requires the organisation to manage documented information needed by the AI management system, including ensuring its availability, suitability, and adequate protection. AG-221 provides the schema-level governance that ensures documented information (evidence) is structured consistently, integrity-protected, and retained appropriately.

FCA SYSC — 9.1 (Record-Keeping)

SYSC 9.1 requires firms to arrange for orderly records to be kept of their business and internal organisation. For AI agent governance, this extends to the evidence demonstrating that governance controls are operating effectively. The FCA expects records to be producible in a format that enables regulatory analysis — ad hoc evidence formats do not meet this expectation.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Organisation-wide — affects the credibility of all governance evidence across all controls

Consequence chain: Without evidence schema governance, evidence produced by individual controls is heterogeneous, unverifiable, and uncorrelatable. The immediate failure mode is assessment inefficiency — auditors and assessors spend disproportionate time normalising evidence rather than evaluating it. The more severe failure mode is evidence integrity failure — evidence submitted to regulators or courts that cannot be demonstrated to be unmodified is vulnerable to challenge and may be rejected. The downstream consequence is loss of governance credibility: an organisation that cannot produce structured, integrity-protected evidence of its governance operations cannot demonstrate conformance. In litigation, missing or unreliable evidence triggers adverse inferences. In regulatory proceedings, it triggers findings of inadequate record-keeping.

Cross-references: AG-219 (Control Taxonomy Governance) defines the control set for which evidence schemas must exist. AG-007 (Governance Configuration Control) governs the versioning of the evidence schema itself. AG-056 (Independent Validation) consumes schema-conforming evidence as input to validation activities. AG-157 (External Conformance Assessment) requires structured evidence for external assessor review. AG-153 (Control Efficacy Measurement) produces evidence objects that must conform to the evidence schema.

Cite this protocol
AgentGoverning. (2026). AG-221: Assurance Evidence Schema Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-221