AG-190

Governance Reporting Fidelity Governance

Protocolised Ecosystems, Long-Running Tasks & Tomorrow's Agents
AGS v2.1 · April 2026
EU AI Act · SOX · FCA · NIST · ISO 42001

2. Summary

Governance Reporting Fidelity Governance requires that governance dashboards, status reports, and summary indicators accurately reflect the underlying governance state and that every summary metric can be drilled down to the specific evidence that produced it. As governance frameworks scale across hundreds of agents, thousands of policies, and millions of actions, organisations necessarily rely on summary views — traffic-light indicators, compliance scores, aggregate metrics. AG-190 mandates that these summaries are faithful representations of the underlying reality and that the path from any summary indicator to its constituent evidence is navigable, complete, and verifiable. Without this dimension, governance summaries become detached from governance reality, creating a false sense of security that can persist until a major incident reveals the gap.

3. Example

Scenario A — Green Dashboard Conceals Critical Failure: A financial services firm monitors 47 AI agents through a governance dashboard. The dashboard shows a "green" overall compliance status based on an aggregate score: 46 of 47 agents report full compliance. The one non-compliant agent — a high-frequency trading agent responsible for £12,000,000 in daily volume — has a critical mandate enforcement failure that has been masked in the aggregate. The individual agent's compliance entry shows "partial compliance" because it passes 14 of 15 test categories, but the failed category is "mandate enforcement under concurrency" — meaning the agent can exceed its trading limits under concurrent request conditions. The dashboard's aggregation logic counts 14/15 as 93% compliant, and 93% rounds to "green" under the organisation's threshold of 90%. A £3,200,000 loss event occurs when the agent exploits the concurrency gap during a volatile trading session. The board, which relied on the dashboard's green status, asks why they were not informed of the risk. The answer is that the dashboard's aggregation logic concealed a critical failure behind a pass threshold.

What went wrong: The summary metric (aggregate compliance percentage) was arithmetically accurate but semantically misleading. A critical control failure in a high-impact agent carried the same weight as low-risk compliance items in the aggregation. The dashboard had no mechanism to distinguish between "14/15 low-risk items pass" and "14/15 items pass but the one failure is critical." The drill-down path existed but required navigating 4 levels of hierarchy to find the specific failure, and the summary did not signal that a drill-down was warranted. Consequence: £3,200,000 trading loss, FCA enforcement inquiry, board-level governance failure finding.
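The masking effect in Scenario A can be reproduced in a few lines. This is a hypothetical reconstruction of the dashboard's equal-weight aggregation logic — the check names are illustrative; only the 14/15 split and the 90% threshold come from the scenario:

```python
# Hypothetical reconstruction of Scenario A's equal-weight aggregation.
# Check names are illustrative; the 14/15 split and 90% threshold are
# taken from the scenario text.
results = {
    "mandate-enforcement-concurrency": False,  # the one critical failure
    **{f"low-risk-check-{i}": True for i in range(14)},
}

score = sum(results.values()) / len(results)   # 14/15 ~ 0.933
status = "green" if score >= 0.90 else "red"   # 93% clears the 90% bar

print(f"{score:.1%} -> {status}")  # 93.3% -> green: critical failure masked
```

Because every check counts equally, nothing distinguishes a failed low-risk documentation item from a failed mandate-enforcement control.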

Scenario B — Stale Summary Conceals Governance Regression: A healthcare organisation's governance summary shows "98% policy compliance across all clinical agents" — a figure calculated from the most recent quarterly assessment completed 11 weeks ago. Since that assessment, 3 model upgrades, 7 new tool integrations, and 2 policy changes have occurred. The 98% figure is no longer reflective of the current state, but the dashboard has no staleness indicator. A clinical agent, upgraded 5 weeks ago, now has a capability/control mismatch (AG-189) that the 11-week-old assessment did not detect. The agent makes an inappropriate clinical recommendation based on a capability that did not exist at assessment time. A patient adverse event occurs. The investigation reveals that the governance dashboard showed 98% compliance at the time of the event — a figure that was stale by 11 weeks and did not reflect 12 material changes.

What went wrong: The summary metric had no temporal validity indicator. Decision-makers assumed the 98% figure reflected the current state when it actually reflected a state from 11 weeks ago. The dashboard did not display when the assessment was conducted, how many changes had occurred since, or whether the figure was likely to still be accurate. Consequence: Patient adverse event, CQC investigation, loss of clinical commissioning confidence.

Scenario C — Drill-Down Path Is Incomplete: A regulator requests evidence supporting a governance dashboard's claim that "all agents comply with AG-001 mandate enforcement." The organisation attempts to produce the drill-down from the summary indicator to the underlying evidence. The drill-down shows: aggregate pass rate (level 1) → per-agent pass/fail (level 2) → per-test-category pass/fail (level 3). But level 3 links to test results that are stored in a different system, and that system's retention policy deleted results older than 90 days. For 12 of the 47 agents, the most recent retained test results are from 4 months ago — there is no evidence supporting the current compliance claim. The dashboard continued showing "compliant" because it cached the pass status but lost the link to the evidence. The regulator issues a finding for inadequate record-keeping and inability to substantiate governance claims.

What went wrong: The drill-down path was not an unbroken chain from summary to evidence. The summary system and the evidence system had different retention policies. The summary persisted pass/fail status after the evidence supporting it was deleted. The organisation could not substantiate its governance claims when challenged. Consequence: Regulatory finding, governance certification suspended, 6-month remediation programme required.

4. Requirement Statement

Scope: This dimension applies to all organisations that produce governance summaries, dashboards, status reports, compliance scores, or aggregate indicators for AI agent governance. This includes executive dashboards, board reporting packs, regulatory submissions, client-facing compliance attestations, and internal operational monitoring. Any representation of governance state that abstracts, aggregates, or summarises underlying detail is within scope. The scope extends to summaries produced by AI agents themselves — an agent that reports its own compliance status is producing a governance summary that must be faithful and drillable. Organisations that do not produce any aggregated governance views and rely solely on per-agent, per-test raw data are excluded, though such an approach becomes impractical beyond approximately 5 agents.

4.1. A conforming system MUST ensure that every governance summary metric is derivable from specific, identifiable, retained evidence — no summary indicator may exist without a complete, navigable drill-down path to the evidence that produced it.
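One way to make the 4.1 drill-down requirement concrete is to model the chain from summary to evidence as explicit links and verify that every path resolves. The record shapes below are assumptions for illustration, not a prescribed schema:

```python
# Hypothetical sketch of a navigable drill-down chain (requirement 4.1):
# summary -> agent -> test category -> retained evidence. The dataclass
# shapes are illustrative assumptions, not part of AG-190.
from dataclasses import dataclass

@dataclass
class Evidence:
    evidence_id: str
    available: bool = True  # False once expired or deleted

@dataclass
class TestCategory:
    name: str
    evidence: list

@dataclass
class AgentEntry:
    agent_id: str
    categories: list

@dataclass
class SummaryMetric:
    name: str
    agents: list

def drilldown_complete(metric: SummaryMetric) -> bool:
    """True only if every path from the summary resolves to available evidence."""
    for agent in metric.agents:
        for cat in agent.categories:
            if not cat.evidence or not all(e.available for e in cat.evidence):
                return False
    return True

metric = SummaryMetric("AG-001 compliance", [
    AgentEntry("agent-7", [TestCategory("mandate-enforcement",
                                        [Evidence("ev-123")])]),
    AgentEntry("agent-9", [TestCategory("mandate-enforcement",
                                        [Evidence("ev-456", available=False)])]),
])
print(drilldown_complete(metric))  # False: agent-9's evidence has expired
```

Under this model, Scenario C's failure mode (cached pass status outliving its evidence) is detectable as an incomplete chain rather than discovered during a regulatory request.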

4.2. A conforming system MUST display the temporal validity of every summary metric, including when the metric was last calculated, when the underlying evidence was last collected, and how many governance-relevant changes have occurred since the last assessment.
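The three temporal-validity fields in 4.2 can be carried alongside each metric. A minimal sketch, assuming a simple staleness rule (the field names and thresholds are illustrative, not prescribed by AG-190):

```python
# Minimal sketch of the temporal-validity metadata required by 4.2.
# Field names and the staleness thresholds are illustrative assumptions.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TemporalValidity:
    calculated_at: datetime          # when the metric was last computed
    evidence_collected_at: datetime  # when underlying evidence was gathered
    changes_since_assessment: int    # governance-relevant changes since then

    def is_stale(self, max_age: timedelta, max_changes: int) -> bool:
        age = datetime.now() - self.evidence_collected_at
        return age > max_age or self.changes_since_assessment > max_changes

# Scenario B: an 11-week-old assessment with 12 intervening changes
validity = TemporalValidity(
    calculated_at=datetime.now() - timedelta(weeks=11),
    evidence_collected_at=datetime.now() - timedelta(weeks=11),
    changes_since_assessment=12,
)
print(validity.is_stale(max_age=timedelta(weeks=4), max_changes=5))  # True
```

Displaying `is_stale` (or the raw fields) next to the 98% figure would have told Scenario B's decision-makers that the number no longer reflected the current state.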

4.3. A conforming system MUST implement severity-weighted aggregation — summary metrics MUST NOT give critical control failures the same weight as low-risk compliance items in aggregation logic.

4.4. A conforming system MUST prevent summary metrics from displaying a passing status when any critical-severity control is in a failing state, regardless of the aggregate score.
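Requirements 4.3 and 4.4 compose naturally: a weighted score to stop low-risk items diluting critical ones, plus an unconditional override on any critical failure. A minimal sketch — the weight values are illustrative assumptions, not values AG-190 prescribes:

```python
# Sketch of severity-weighted aggregation (4.3) with the critical override
# from 4.4 layered on top. The weight values are illustrative assumptions.
SEVERITY_WEIGHTS = {"low": 1, "medium": 3, "critical": 10}

def weighted_score(items):
    """items: list of (severity, passed). Critical items dominate the score."""
    total = sum(SEVERITY_WEIGHTS[sev] for sev, _ in items)
    earned = sum(SEVERITY_WEIGHTS[sev] for sev, passed in items if passed)
    return earned / total

def status(items, green_threshold=0.90):
    # 4.4: any failing critical control forces red, regardless of the score
    if any(sev == "critical" and not passed for sev, passed in items):
        return "red"
    return "green" if weighted_score(items) >= green_threshold else "amber"

# Scenario A's profile: 14 passing low-risk items, 1 failing critical item
items = [("low", True)] * 14 + [("critical", False)]
print(round(weighted_score(items), 3))  # 0.583, vs. 0.933 under equal weighting
print(status(items))                    # red
```

The override matters even with weighting in place: a sufficiently large pool of passing low-risk items could otherwise pull a weighted score back above any threshold.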

4.5. A conforming system MUST maintain the drill-down path for the full retention period of the summary — if a summary is retained for 7 years, the evidence supporting it MUST also be retained for 7 years.

4.6. A conforming system MUST flag summary metrics as "unsubstantiated" when the supporting evidence is unavailable, expired, or incomplete, rather than displaying the last known value.

4.7. A conforming system SHOULD implement automated fidelity checks that periodically recalculate summary metrics from underlying evidence and compare the result against the displayed value.
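The fidelity check in 4.7 is essentially a reconciliation job: recompute each metric from retained evidence and diff against the displayed value. A minimal sketch, assuming pass/fail evidence records and a simple pass-rate metric (both assumptions for illustration):

```python
# Minimal fidelity-reconciliation sketch (4.7): recompute each summary
# metric from underlying evidence and flag drift from the displayed value.
# The evidence shape (lists of 1/0 pass flags) is an illustrative assumption.

def fidelity_check(displayed: dict, evidence: dict, tolerance: float = 0.001):
    """Return names of metrics whose displayed value has drifted from evidence."""
    drifted = []
    for name, shown in displayed.items():
        recomputed = sum(evidence[name]) / len(evidence[name])
        if abs(recomputed - shown) > tolerance:
            drifted.append(name)
    return drifted

displayed = {"ag001_pass_rate": 0.98}           # what the dashboard shows
evidence = {"ag001_pass_rate": [1] * 45 + [0] * 2}  # 45 of 47 agents passing
print(fidelity_check(displayed, evidence))  # ['ag001_pass_rate'] -- cached
                                            # value no longer matches evidence
```

Run periodically, this catches exactly the cached-status divergence in Scenario C before a regulator does.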

4.8. A conforming system SHOULD provide one-click (or equivalent single-action) drill-down from any summary indicator to the specific evidence that produced it, without requiring navigation through intermediate systems.

4.9. A conforming system SHOULD implement anomaly detection on summary metrics to flag statistically improbable stability (e.g., a metric that has not changed in 6 months despite continuous agent operation).
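The "statistically improbable stability" check in 4.9 can start very simply: flag any metric whose recent history is a single repeated value. A minimal sketch — the window and the exact-match rule are illustrative assumptions; a production check might use variance thresholds instead:

```python
# Sketch of the improbable-stability check in 4.9: a metric that never
# moves despite continuous operation is itself suspicious. The 180-day
# window and exact-match rule are illustrative assumptions.
from datetime import datetime, timedelta

def flag_frozen_metric(history, window=timedelta(days=180)):
    """history: list of (timestamp, value). Flag if unchanged across window."""
    cutoff = datetime.now() - window
    recent = [value for ts, value in history if ts >= cutoff]
    return len(recent) >= 2 and len(set(recent)) == 1

now = datetime.now()
history = [(now - timedelta(days=30 * i), 0.98) for i in range(7)]
print(flag_frozen_metric(history))  # True: identical value for ~6 months
```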

4.10. A conforming system MAY implement confidence intervals on summary metrics that reflect the age and coverage of the underlying evidence.
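One simple way to realise 4.10 is to widen a metric's uncertainty band as its evidence ages. The decay rate below is an illustrative assumption, not a value AG-190 prescribes:

```python
# Illustrative confidence band for 4.10: the displayed metric's uncertainty
# widens as the underlying evidence ages. The decay rate is an assumption.
def confidence_band(value: float, evidence_age_days: int,
                    base_margin: float = 0.01, decay_per_week: float = 0.005):
    margin = base_margin + decay_per_week * (evidence_age_days / 7)
    return (round(max(0.0, value - margin), 3),
            round(min(1.0, value + margin), 3))

print(confidence_band(0.98, evidence_age_days=0))   # (0.97, 0.99)
print(confidence_band(0.98, evidence_age_days=77))  # (0.915, 1.0): 11 weeks old
```

By Scenario B's 11-week mark, the band around "98%" would visibly signal that the figure could no longer be read at face value.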

4.11. A conforming system MAY implement counter-factual summary views — showing what the summary would display under alternative aggregation methodologies to highlight sensitivity to aggregation choices.

5. Rationale

Governance at scale requires abstraction. No board member, regulator, or senior manager can process the raw output of every governance test across every agent in real time. Summaries are essential — they enable human oversight of systems too complex for detailed human review. But summaries introduce a new risk: the summary may not faithfully represent the underlying reality.

Summary infidelity arises from multiple sources. Aggregation logic can mask critical failures behind passing scores — a 95% compliance rate sounds strong until you discover that the 5% failure includes the mandate enforcement control for your highest-value agent. Temporal staleness means the summary reflects a past state, not the current one — a 98% compliance figure from 3 months ago may bear no resemblance to today's actual compliance after intervening changes. Drill-down gaps mean the summary cannot be substantiated — the dashboard says "compliant" but the evidence that would prove it has been deleted, migrated, or was never collected.

These are not theoretical risks. They are the governance equivalent of "dashboard-driven management" — decisions based on indicators that have become detached from the reality they claim to represent. In financial services, this is sometimes called "green screen syndrome" — every indicator shows green, creating complacency that persists until a red event that the indicators failed to predict.

AG-190 addresses summary infidelity through three mechanisms. First, fidelity requirements ensure that summary metrics are accurate representations of the underlying evidence at the time they are displayed. Second, temporal validity requirements ensure that decision-makers know how current the summary is. Third, drill-down integrity requirements ensure that every summary can be substantiated by navigating from the aggregate to the specific evidence that produced it, and that this navigation path remains intact for the full retention period.

The dimension is detective rather than preventive: it identifies divergence between summaries and the governance reality they describe, rather than preventing the underlying governance failures themselves. It complements the preventive and recovery controls in other dimensions by ensuring that those controls' outputs are faithfully represented in the governance oversight layer.

6. Implementation Guidance

AG-190 implementation requires a summary fidelity framework, temporal validity tracking, and drill-down architecture.

Recommended Patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. Board-level governance reporting for AI agents must align with existing risk reporting frameworks. The summary should integrate with the firm's operational risk dashboard, with governance metrics presented alongside other operational risk indicators. FCA expectations for board reporting (SYSC 4.3A) apply to AI governance summaries — the board must receive information that is "timely, accurate, and complete."

Healthcare. Clinical governance dashboards must distinguish between administrative compliance (documentation, training records) and clinical safety compliance (clinical decision-making controls, patient safety monitoring). A clinical safety failure must not be aggregated with administrative items. CQC inspection frameworks require that governance reports are substantiable — inspectors will drill down.

Public Sector. Governance summaries for public-sector AI agents may be subject to Freedom of Information requests. The summary and its drill-down path must be comprehensible to non-specialist reviewers. The Algorithmic Transparency Recording Standard (ATRS) in the UK requires public-sector organisations to publish information about algorithmic tools — governance summary fidelity ensures that published information is accurate.

Maturity Model

Basic Implementation — Governance summaries exist with drill-down to per-agent, per-test detail. Aggregation is equal-weight (no severity weighting). Summaries are refreshed quarterly or on demand. The drill-down path covers 2 levels (aggregate → per-agent). Evidence is retained but not lifecycle-linked to the summary. This meets minimum requirements but is vulnerable to critical-failure masking, temporal staleness, and evidence-summary retention misalignment.

Intermediate Implementation — Severity-weighted aggregation with critical override is implemented. Every summary metric displays temporal validity indicators (calculated date, evidence date, changes since). Drill-down covers 3+ levels (aggregate → per-agent → per-test → specific evidence). Evidence retention is lifecycle-linked to summary retention. Daily automated fidelity reconciliation runs. Unsubstantiated metrics are flagged.

Advanced Implementation — All intermediate capabilities plus: real-time summary computation from underlying evidence (no caching), anomaly detection on summary stability, confidence intervals on summary metrics, counter-factual aggregation views, integration with the organisation's enterprise risk management dashboard, and independent audit of summary fidelity as part of the annual governance review.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Drill-Down Completeness

Test 8.2: Critical Override Enforcement

Test 8.3: Temporal Validity Display

Test 8.4: Fidelity Reconciliation

Test 8.5: Evidence Retention Alignment

Test 8.6: Severity-Weighted Aggregation Accuracy

Conformance Scoring

9. Regulatory Mapping

| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 13 (Transparency) | Direct requirement |
| EU AI Act | Article 11 (Technical Documentation) | Supports compliance |
| FCA SYSC | 4.3A (Management Reporting) | Direct requirement |
| SOX | Section 302 (Corporate Responsibility for Financial Reports) | Supports compliance |
| NIST AI RMF | GOVERN 1.5, MEASURE 4.1 | Supports compliance |
| ISO 42001 | Clause 9.1 (Monitoring, Measurement, Analysis and Evaluation) | Direct requirement |
| UK ATRS | Algorithmic Transparency Recording Standard | Supports compliance |
| CQC | Well-Led Framework — Governance and Management | Supports compliance |

EU AI Act — Article 13 (Transparency)

Article 13 requires that high-risk AI systems "are designed and developed in such a way as to ensure that their operation is sufficiently transparent to enable deployers to interpret a system's output and use it appropriately." Governance summaries are a primary mechanism through which deployers understand the governance state of their AI systems. If these summaries are unfaithful — showing compliance when the underlying reality is non-compliant — the transparency requirement is defeated. AG-190 ensures that governance summaries meet the spirit of Article 13 by being accurate, current, and substantiable.

FCA SYSC — 4.3A (Management Reporting)

SYSC 4.3A requires that "appropriate management reporting" is provided to the governing body and senior management. For AI agent governance, this means that board-level reporting must accurately reflect the governance state. The FCA has made clear through supervisory practice that it expects board reports to be timely, accurate, and actionable. A governance dashboard that shows green when a critical control is failing does not meet this standard. AG-190's severity-weighted aggregation with critical override ensures that management reporting is not misleading.

SOX — Section 302 (Corporate Responsibility for Financial Reports)

Section 302 requires CEO and CFO certification that financial reports are accurate and that internal controls are effective. If AI agents contribute to financial operations and the governance summary that supports the officer's certification is unfaithful, the officer may be certifying based on misleading information. AG-190's fidelity requirements ensure that the governance information supporting officer certifications is substantiable.

ISO 42001 — Clause 9.1 (Monitoring, Measurement, Analysis and Evaluation)

Clause 9.1 requires organisations to determine "what needs to be monitored and measured," "the methods for monitoring, measurement, analysis and evaluation," and "when the results shall be analysed and evaluated." AG-190 directly addresses the fidelity of this monitoring and measurement output — ensuring that the methods produce accurate results and that the analysis and evaluation are substantiable.

10. Failure Severity

| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — affects all governance oversight and decision-making that relies on summary indicators |

Consequence chain: Summary infidelity creates a meta-governance failure — the governance of governance itself is broken. Decision-makers (board members, senior managers, regulators) rely on summary indicators to allocate attention and resources. If the summary shows green, attention goes elsewhere; if it shows red, attention is directed to the problem. An unfaithful summary misdirects attention systematically, allowing actual governance failures to persist unaddressed while resources are consumed by non-critical items that the summary highlights. The consequence is delayed detection and response to governance failures — the underlying failure exists, but the summary conceals it from the people who could authorise remediation. In Scenario A, the trading loss was £3,200,000 because the critical control failure persisted for weeks behind a green dashboard. If the summary had faithfully reflected the critical failure, remediation would have occurred before the loss event. The regulatory consequence is compounded: not only did the governance control fail, but the governance reporting failed to alert the governing body — creating a dual finding for both inadequate controls and inadequate management information.

Cross-references: AG-001 (Operational Boundary Enforcement) — mandate enforcement status is a critical summary metric that must not be masked by aggregation; AG-007 (Governance Configuration Control) — summary methodology is a governed configuration artefact; AG-153 (Control Efficacy Measurement) — efficacy measurements feed summary metrics and must maintain drill-down integrity; AG-019 (Human Escalation & Override Triggers) — summary infidelity may suppress escalation triggers that depend on summary status; AG-189 (Capability/Control Mismatch Detection Governance) — mismatch status must be faithfully represented in summaries, not masked by aggregation with lower-risk metrics; AG-191 (Multi-Human Authority Conflict Governance) — governance summaries must accurately reflect unresolved authority conflicts.

Cite this protocol
AgentGoverning. (2026). AG-190: Governance Reporting Fidelity Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-190