AG-779

Regulatory Reporting Integrity Governance

Output Integrity and Transparency Governance · AGS v2.1 · 2026-04-25
EU AI Act NIST AI RMF ISO 42001

1. Definition

Regulatory Reporting Integrity Governance ensures the accuracy, completeness, timeliness, and auditability of all regulatory reports, compliance attestations, and supervisory submissions that are generated, assembled, or verified by AI agents. As organisations increasingly delegate regulatory reporting tasks to agents -- from generating prudential returns and transaction reports to compiling risk disclosures and filing suspicious activity reports -- the integrity of these outputs becomes a critical governance concern with direct legal and financial consequences.

Inaccurate regulatory reporting can trigger severe penalties. The FCA has imposed fines exceeding GBP 100 million for reporting failures, and the PRA requires that firms' reporting systems produce accurate data in both normal and stressed conditions. When agents are responsible for any part of the reporting chain -- data extraction, calculation, formatting, validation, or submission -- every step must be governed with the same rigour as human-produced reports, and in many cases with additional controls to address the opacity and potential for hallucination inherent in AI-generated outputs.

AG-779 mandates a four-layer integrity framework: (1) source data validation, ensuring agents consume accurate and complete upstream data; (2) calculation verification, ensuring agent-performed calculations match independently verified expected results; (3) output reconciliation, ensuring the final report reconciles with source data and intermediate calculations; and (4) submission assurance, ensuring the correct report reaches the correct regulator in the correct format within the mandated deadline. Each layer requires specific controls, tolerance thresholds, and exception handling procedures.
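
The ordering of the four layers can be sketched as a short pipeline that holds a report at the first failing layer. This is a minimal illustration under stated assumptions, not a reference implementation: the report fields (`cet1_capital`, `rwa`, `primary_ratio`, `independent_ratio`, `tolerance`, `regulator`, `hours_to_deadline`) and every function name are hypothetical, and each check is deliberately simplistic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LayerResult:
    layer: str
    passed: bool
    detail: str = ""


def source_data_validation(report: Dict) -> LayerResult:
    # Layer 1: required upstream inputs must be present.
    missing = [k for k in ("cet1_capital", "rwa") if report.get(k) is None]
    return LayerResult("source_data_validation", not missing, f"missing={missing}")


def calculation_verification(report: Dict) -> LayerResult:
    # Layer 2: primary and independent figures must agree within tolerance.
    gap = abs(report["primary_ratio"] - report["independent_ratio"])
    return LayerResult("calculation_verification", gap <= report["tolerance"], f"gap={gap:.6f}")


def output_reconciliation(report: Dict) -> LayerResult:
    # Layer 3: the reported figure must reconcile with the source inputs.
    recomputed = report["cet1_capital"] / report["rwa"]
    return LayerResult("output_reconciliation", abs(recomputed - report["primary_ratio"]) < 1e-9)


def submission_assurance(report: Dict) -> LayerResult:
    # Layer 4: a named regulator and time remaining before the deadline.
    ok = bool(report.get("regulator")) and report.get("hours_to_deadline", 0) > 0
    return LayerResult("submission_assurance", ok)


LAYERS: List[Callable[[Dict], LayerResult]] = [
    source_data_validation,
    calculation_verification,
    output_reconciliation,
    submission_assurance,
]


def run_integrity_pipeline(report: Dict) -> List[LayerResult]:
    results = []
    for layer in LAYERS:
        result = layer(report)
        results.append(result)
        if not result.passed:
            break  # hold the report; later layers never run
    return results


# Hypothetical draft whose independent recalculation disagrees by about 10 bp.
draft = {
    "cet1_capital": 8.7, "rwa": 64.8,
    "primary_ratio": 8.7 / 64.8, "independent_ratio": 8.7 / 65.3,
    "tolerance": 0.0005, "regulator": "PRA", "hours_to_deadline": 30,
}
results = run_integrity_pipeline(draft)
```

Stopping at the first failure is one possible design choice; it means a report that fails calculation verification can never reach submission assurance.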

The dimension also addresses the attestation integrity challenge. When a compliance officer signs a regulatory attestation that was partially or wholly prepared by an agent, the attestation must include a disclosure of the AI agent's involvement, the specific sections the agent contributed to, and the human verification steps performed. This transparency requirement aligns with the EU AI Act Art. 17 quality management obligations and ensures that regulators understand the provenance of the information they receive.

AG-779 recognises that the regulatory reporting landscape is increasingly complex, with overlapping and sometimes conflicting reporting obligations across jurisdictions. A single financial transaction may generate reporting obligations to three to five separate recipients simultaneously (e.g., a MiFID II transaction report to the national competent authority, an EMIR derivative report to the trade repository, a suspicious transaction report to the FIU, and a FATF travel rule transmission to the counterparty institution). Agents managing this complexity must maintain separate validation pipelines for each reporting obligation while ensuring consistency across all reports generated from the same underlying data.

2. Scope

This dimension applies to all AI agent deployments operating under the AGS framework where the governance controls specified in Section 4 are relevant to the agent's operational context.

Exclusions: Agents operating in fully sandboxed research environments with no access to production data or systems are excluded, subject to the condition that any transition to production immediately triggers compliance with this dimension. Single-purpose read-only agents with no write access to external systems may be excluded where a documented risk assessment confirms that the governance controls specified here are not applicable to the agent's operational scope.

Industry Considerations

Financial Services. Agents operating in financial services face heightened regulatory scrutiny under MiFID II, DORA, and FCA SYSC requirements. The controls in this dimension support compliance with these frameworks and should be implemented at the most stringent level applicable to the agent's transaction authority.

Healthcare. Agents processing patient data or supporting clinical decisions must implement this dimension's controls in conjunction with HIPAA safeguards and applicable medical device regulations. The governance controls directly support the duty of care that healthcare organisations owe to patients.

Public Sector. Government agencies deploying agents that affect individual rights or public services must implement this dimension's controls to satisfy transparency, accountability, and judicial review requirements applicable to algorithmic decision-making in the public sector.

3. Why This Matters

Regulatory Reporting Integrity Governance addresses a governance gap that, if left unmanaged, creates systemic risk across the agent ecosystem. As AI agents move from experimental deployments to production operations with real-world consequences, the absence of structural controls in this area means that failures scale with the speed and autonomy of the agent population — not at the pace of human review.

Traditional approaches to this governance challenge — contractual obligations, periodic audits, and application-layer policy enforcement — are necessary but insufficient for agentic contexts. Contractual obligations operate on legal timescales; agents operate on millisecond timescales. Periodic audits capture a snapshot; agent behaviour is continuous and dynamic. Application-layer enforcement can be bypassed through prompt injection, reasoning failure, or context manipulation. The AGS approach requires structural enforcement at the infrastructure layer — controls that operate independently of the agent's reasoning process and cannot be circumvented by the agent's own outputs.

The regulatory environment increasingly mandates the controls this dimension specifies. The EU AI Act requires risk management systems proportionate to identified risks. NIST AI RMF requires organisations to map, measure, and manage AI risks through enforceable controls. ISO 42001 requires an AI management system with documented operational procedures. This dimension operationalises these regulatory requirements into specific, testable, infrastructure-enforceable controls — bridging the gap between regulatory intent and technical implementation.

The consequences of omitting these controls are illustrated in Section 8 (Failure Scenarios). When this dimension is not implemented, the resulting governance gap permits agent behaviour that can cause material financial loss, regulatory enforcement action, reputational damage, and, in safety-critical deployments, physical harm. The blast radius scales with the agent's access scope and operational autonomy.

4. Requirements

  1. All regulatory reports generated, assembled, or verified by agents MUST pass through a four-layer integrity framework: source data validation, calculation verification, output reconciliation, and submission assurance.
  2. Calculation verification MUST use an independent calculation pathway (separate model instance, separate code path, or human spot-check) for all material figures.
  3. Discrepancies between primary and verification calculations that exceed the defined tolerance thresholds MUST be investigated and resolved before submission.
  4. Tolerance thresholds MUST be defined for each regulatory report type, calibrated to regulatory materiality (e.g., 5 basis points for capital ratios, 1% for transaction volumes, zero tolerance for SAR completeness).
  5. Regulatory report submissions MUST include a provenance disclosure identifying which sections were agent-generated, which were human-reviewed, and the verification steps performed.
  6. All data inputs consumed by agents for regulatory reporting MUST be validated against authoritative source systems with documented lineage.
  7. Agents MUST NOT submit any regulatory report classified as material or high-risk without at least one human review and sign-off.
  8. Agents generating SARs or equivalent suspicious transaction reports MUST achieve 100% completeness against underlying alerts, with zero silent failures.
  9. Data enrichment failures, API timeouts, and other processing errors during regulatory report generation MUST be treated as mandatory escalation events, not skip events.
  10. Regulatory reporting audit trails MUST capture every data transformation, calculation, validation result, and human interaction in the reporting chain.
  11. Organisations MUST conduct quarterly reconciliation of agent-generated regulatory reports against independently prepared control totals.
  12. Late submission risk MUST be monitored in real time, with alerts at 72-hour, 24-hour, and 4-hour warning thresholds before regulatory deadlines.
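
Requirements 3 and 4 can be illustrated with a small tolerance table and a check against it. The sketch below is hedged: the table keys, the `exceeds_tolerance` function, and the split between absolute and relative tolerances are assumptions made for illustration. In particular, reading "1% for transaction volumes" as a relative tolerance against the verified figure is an interpretation, not something the requirement states.

```python
from decimal import Decimal

# Hypothetical tolerance table using the example values from requirement 4.
# "absolute" limits are in the figure's own units (5 bp = 0.0005 as a ratio);
# "relative" limits are a fraction of the verified value (assumed reading).
TOLERANCES = {
    "capital_ratio": ("absolute", Decimal("0.0005")),
    "transaction_volume": ("relative", Decimal("0.01")),
    "sar_completeness": ("absolute", Decimal("0")),
}


def exceeds_tolerance(report_type: str, primary: str, verified: str) -> bool:
    """Return True when the primary figure must be held for investigation."""
    kind, limit = TOLERANCES[report_type]
    p, v = Decimal(primary), Decimal(verified)
    diff = abs(p - v)
    if kind == "relative":
        if v == 0:
            return diff != 0  # any deviation from a zero baseline is a breach
        diff = diff / abs(v)
    return diff > limit
```

For example, `exceeds_tolerance("capital_ratio", "0.1342", "0.1334")` flags an 8 basis point gap against the 5 basis point limit, while a 3 basis point gap passes. Decimal arithmetic is used here so that threshold comparisons are not affected by binary floating-point rounding.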

5. Maturity Model

Basic Implementation — The organisation has documented policies addressing regulatory reporting integrity and has implemented initial controls. Implementation is primarily at the application layer with manual processes for monitoring and response. Logging covers key events but may lack full metadata. Coverage extends to the most critical agent deployments but may not encompass all in-scope systems. Staff are aware of requirements but formal training may be incomplete.

Intermediate Implementation — All Basic capabilities plus: controls are enforced at the infrastructure layer with automated monitoring and alerting. All MUST requirements from Section 4 are implemented with documented evidence. Coverage extends to all in-scope agent deployments. Audit trails are tamper-evident and retained per regulatory requirements. Formal change control governs all configuration changes. Regular review cycles are established and documented. Staff receive formal training and competency is assessed.

Advanced Implementation — All Intermediate capabilities plus: controls have been validated through independent adversarial testing. Real-time dashboards provide operational visibility into compliance status, anomaly detection, and response metrics. The organisation can demonstrate to regulators and counterparties that no known attack vector bypasses the governance controls. Continuous improvement processes incorporate lessons from incidents, testing, and regulatory developments. Integration with related dimensions provides defence-in-depth coverage.

Implementation Patterns

Tamper-evident audit trail. Implement all governance event logging in an append-only, integrity-protected data store independent of the agent runtime. Every governance decision, configuration change, and enforcement action is recorded with full metadata including timestamps, actor identities, and outcomes.
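
One common way to make a log tamper-evident is a hash chain, in which each entry's digest covers the previous entry's digest. The sketch below illustrates that idea only; it is not the mechanism AGS prescribes. The `AuditLog` class and its event fields are hypothetical, and an in-memory list does not provide the independent, append-only storage that the pattern requires.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder digest for the first entry


class AuditLog:
    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        # Canonical JSON so the same event always hashes identically.
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = self._digest(prev_hash, event)
        self._entries.append({"event": event, "prev": prev_hash, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        # Recompute every digest; any edited or reordered entry breaks the chain.
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append({"actor": "agent-1", "action": "calculate_cet1", "outcome": "ok"})
log.append({"actor": "reviewer-7", "action": "sign_off", "outcome": "approved"})
intact = log.verify()

# Simulate tampering with a recorded outcome after the fact.
log._entries[0]["event"]["outcome"] = "altered"
after_tamper = log.verify()
```

In practice the latest digest would also be anchored somewhere the agent runtime cannot write to, since a chain held entirely in one place can be rebuilt by whoever controls that place.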

Real-time monitoring with graduated alerting. Deploy monitoring infrastructure that evaluates governance compliance continuously rather than periodically. Implement graduated alert severity levels with defined response procedures for each level, ensuring that critical governance violations trigger immediate automated response.

Scheduled governance review cycle. Establish a formal review cadence (minimum quarterly) that examines governance effectiveness, reviews incident data, assesses emerging risks, and updates policies and controls accordingly. Review outcomes are documented and tracked.

Separation of governance and agent runtime domains. Deploy governance enforcement infrastructure in a security domain separate from the agent runtime. The agent cannot influence governance decisions, modify enforcement configuration, or access governance logs directly. This architectural separation is the foundation for infrastructure-layer enforcement.

Defined escalation paths with human oversight integration. Establish clear escalation procedures for governance events that exceed automated response capability. Human oversight touchpoints are defined, documented, and tested. Override mechanisms require authenticated authorisation with full audit trail.

Anti-Patterns

Governance by instruction rather than infrastructure. Relying on agent system prompts or configuration files to enforce governance controls rather than infrastructure-layer enforcement. Instruction-based controls can be bypassed through prompt injection, context manipulation, or reasoning failure.

Monitoring without enforcement. Implementing detection and logging of governance violations without pre-execution blocking. By the time a violation is logged, the ungoverned action has already executed. Detection is necessary but not sufficient; prevention must be the primary control.

Manual processes for machine-speed operations. Relying on human review processes for governance decisions that occur at machine speed. Agents execute actions in milliseconds; governance controls that depend on human review cycles of hours or days leave gaps that scale with agent autonomy.

6. Test Criteria

Test Case 779-TC-01: Calculation Verification Discrepancy Detection

Objective: Verify that the independent calculation pathway detects material discrepancies.
Procedure: Introduce a deliberate 8 basis point error in a CET1 capital ratio calculation. Run through the verification layer with a 5 basis point tolerance.
Expected Result: Discrepancy detected. Report held for investigation. Error identified and reported.
Pass Criteria: Detection within one verification cycle. Report not submitted until resolved.

Test Case 779-TC-02: SAR Completeness Audit

Objective: Confirm that all transaction monitoring alerts have corresponding SAR decisions.
Procedure: Generate 100 test alerts. Process through the SAR agent. Audit for completeness.
Expected Result: All 100 alerts have a documented decision (SAR filed, or justified false positive assessment, or escalation).
Pass Criteria: Zero undocumented alerts. No silent failures.
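
The completeness audit in this test case amounts to checking every alert against a record of decisions. A minimal sketch follows, assuming a hypothetical `decisions` mapping from alert ID to a record with `decision` and `justification` fields. The three decision labels mirror the outcomes listed in the expected result, but the names themselves are illustrative.

```python
VALID_DECISIONS = {"sar_filed", "false_positive", "escalated"}


def undocumented_alerts(alert_ids, decisions):
    """Return alert IDs that lack a documented, acceptable decision."""
    gaps = []
    for alert_id in alert_ids:
        record = decisions.get(alert_id)
        if record is None or record.get("decision") not in VALID_DECISIONS:
            gaps.append(alert_id)  # silent failure: nothing recorded
        elif record["decision"] == "false_positive" and not record.get("justification"):
            gaps.append(alert_id)  # false positive without a justification
    return gaps


# 100 synthetic alerts: one never recorded, one dismissed without justification.
alerts = [f"A{i}" for i in range(100)]
decisions = {a: {"decision": "sar_filed"} for a in alerts}
del decisions["A7"]
decisions["A9"] = {"decision": "false_positive", "justification": ""}
gaps = undocumented_alerts(alerts, decisions)
```

The pass criterion corresponds to `gaps` being empty; in this synthetic run it contains the two seeded defects.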

Test Case 779-TC-03: Provenance Disclosure Completeness

Objective: Verify that regulatory report submissions include complete AI involvement disclosure.
Procedure: Generate 5 different regulatory report types. Inspect each for provenance disclosure metadata.
Expected Result: All 5 reports contain: agent-generated sections identified, human-reviewed sections identified, verification steps documented.
Pass Criteria: 100% provenance disclosure completeness across all report types.
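
A provenance disclosure can be represented as a small structured record and checked mechanically. The sketch below is illustrative: the `ProvenanceDisclosure` fields follow the three items named in the expected result, while the rule that every report section must appear in at least one of the two section lists is an assumption added for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceDisclosure:
    agent_generated_sections: List[str] = field(default_factory=list)
    human_reviewed_sections: List[str] = field(default_factory=list)
    verification_steps: List[str] = field(default_factory=list)


def disclosure_gaps(disclosure: ProvenanceDisclosure, report_sections: List[str]) -> List[str]:
    """Return human-readable gaps; an empty list means the disclosure is complete."""
    gaps = []
    covered = set(disclosure.agent_generated_sections) | set(disclosure.human_reviewed_sections)
    # Assumed rule: every section is attributed to the agent, a reviewer, or both.
    unattributed = [s for s in report_sections if s not in covered]
    if unattributed:
        gaps.append(f"unattributed sections: {unattributed}")
    if not disclosure.verification_steps:
        gaps.append("no verification steps documented")
    return gaps


complete = ProvenanceDisclosure(
    agent_generated_sections=["capital"],
    human_reviewed_sections=["capital", "liquidity"],
    verification_steps=["independent recalculation"],
)
incomplete = ProvenanceDisclosure(agent_generated_sections=["capital"])

complete_gaps = disclosure_gaps(complete, ["capital", "liquidity"])
incomplete_gaps = disclosure_gaps(incomplete, ["capital", "liquidity"])
```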

Test Case 779-TC-04: Deadline Warning System

Objective: Test that submission deadline warnings fire at the correct thresholds.
Procedure: Simulate a regulatory report approaching its deadline. Verify alerts at 72-hour, 24-hour, and 4-hour thresholds.
Expected Result: Three alerts generated at the correct intervals.
Pass Criteria: All three alerts fire within 5 minutes of their respective thresholds.
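
The threshold logic under test can be sketched as a pure function that a scheduler polls. This assumes a hypothetical `due_warnings` helper and a caller that tracks which warnings have already been sent; to meet the five-minute pass criterion, the polling interval would need to be comfortably shorter than five minutes.

```python
from datetime import datetime, timedelta, timezone

WARNING_THRESHOLDS_HOURS = (72, 24, 4)


def due_warnings(deadline: datetime, now: datetime, already_sent: set) -> list:
    """Return thresholds (in hours) that have been crossed and not yet alerted."""
    remaining = deadline - now
    return [
        hours
        for hours in WARNING_THRESHOLDS_HOURS
        if remaining <= timedelta(hours=hours) and hours not in already_sent
    ]


deadline = datetime(2026, 4, 30, 17, 0, tzinfo=timezone.utc)

# 80 hours out: nothing is due yet.
early = due_warnings(deadline, deadline - timedelta(hours=80), set())
# Two minutes past the 24-hour mark, with the 72-hour alert already sent.
at_24h = due_warnings(deadline, deadline - timedelta(hours=23, minutes=58), {72})
# Inside the final four hours with nothing sent: all three are overdue.
late = due_warnings(deadline, deadline - timedelta(hours=3), set())
```

Returning every crossed threshold, rather than only the nearest one, means that a monitor which was down during an earlier window still raises the missed warnings when it recovers.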

Test Case 779-TC-05: Data Lineage Traceability

Objective: Verify that every data element in a regulatory report can be traced to its authoritative source.
Procedure: Select 50 data elements from a completed regulatory report. Trace each to its source system record.
Expected Result: All 50 elements traceable with documented transformation steps.
Pass Criteria: 100% traceability. Zero orphan data elements.
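
Tracing an element back to its source can be sketched as a walk over parent links in a lineage store. The record shape used here (`parent`, `step`, `source_system`) and the `trace_to_source` function are hypothetical; real lineage metadata is usually a graph with multiple parents per element, which this single-parent illustration ignores.

```python
def trace_to_source(element_id, lineage):
    """Follow parent links until a record names a source system.

    Returns the source system and the transformation steps in execution
    order, or None for an orphan element (broken link or cycle).
    """
    steps = []
    seen = set()
    current = element_id
    while current is not None:
        if current in seen or current not in lineage:
            return None  # cycle or missing lineage record
        seen.add(current)
        record = lineage[current]
        steps.append(record["step"])
        if record.get("source_system"):
            return {"source_system": record["source_system"], "steps": steps[::-1]}
        current = record.get("parent")
    return None  # chain ended without reaching a source system


lineage = {
    "rwa_total": {"parent": "rwa_by_exposure", "step": "sum", "source_system": None},
    "rwa_by_exposure": {"parent": "exposure_raw", "step": "apply risk weight", "source_system": None},
    "exposure_raw": {"parent": None, "step": "extract", "source_system": "loan_ledger"},
    "orphan_field": {"parent": "missing_record", "step": "copy", "source_system": None},
}

traced = trace_to_source("rwa_total", lineage)
orphan = trace_to_source("orphan_field", lineage)
```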

Test Case 779-TC-06: Error Escalation for Processing Failures

Objective: Confirm that data enrichment failures trigger escalation rather than silent skip.
Procedure: Simulate a data enrichment API timeout during SAR generation for 5 alerts.
Expected Result: All 5 alerts escalated to human review. Zero skipped. Agent logs include timeout details.
Pass Criteria: 100% escalation rate. Zero silent failures.
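
The behaviour this test case checks, escalating rather than skipping, can be sketched as a processing loop in which every alert ends with an explicit outcome. The `process_alerts` function and the `enrich` and `escalate` callables are hypothetical stand-ins for the real enrichment API and escalation queue.

```python
def process_alerts(alert_ids, enrich, escalate):
    """Enrich each alert; any processing error becomes an escalation, never a skip."""
    outcomes = {}
    for alert_id in alert_ids:
        try:
            outcomes[alert_id] = ("enriched", enrich(alert_id))
        except Exception as exc:  # timeouts and any other processing error
            reason = f"{type(exc).__name__}: {exc}"
            escalate(alert_id, reason)
            outcomes[alert_id] = ("escalated", reason)
    return outcomes


# Simulated enrichment service that times out for five specific alerts.
FAILING = {"A1", "A2", "A3", "A4", "A5"}


def fake_enrich(alert_id):
    if alert_id in FAILING:
        raise TimeoutError("enrichment API timed out")
    return {"alert": alert_id, "customer_risk": "low"}


escalated = []
alerts = [f"A{i}" for i in range(10)]
outcomes = process_alerts(alerts, fake_enrich, lambda a, reason: escalated.append((a, reason)))
```

Because the outcome map is keyed by alert ID, a simple length comparison against the input list shows whether any alert was dropped without a recorded result.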

Evidence Artefacts

| Evidence ID | Description | Collection Frequency | Retention Period |
|---|---|---|---|
| AG779-E01 | Calculation verification results and discrepancy logs | Per report | 10 years |
| AG779-E02 | Source data validation and lineage records | Per report | 10 years |
| AG779-E03 | SAR completeness audit results | Monthly | 10 years |
| AG779-E04 | Provenance disclosure metadata for submitted reports | Per submission | 10 years |
| AG779-E05 | Deadline monitoring alerts and submission timestamps | Per report | 7 years |
| AG779-E06 | Human review and sign-off records for material reports | Per report | 10 years |
| AG779-E07 | Quarterly reconciliation of agent vs. control totals | Quarterly | 7 years |

7. Scoring

| Score | Level | Description |
|---|---|---|
| 0 | No implementation | No regulatory reporting integrity governance exists. The organisation has no controls, policies, or monitoring in place for the capabilities this dimension governs. Agent behaviour in this area is ungoverned. |
| 1 | Basic | Basic detection mechanisms exist but operate at the application layer. Detection may be manual, periodic, or threshold-based without real-time monitoring. Alerts are generated but may lack automated response. Coverage is partial: not all relevant agent behaviours or data flows are monitored. |
| 2 | Infrastructure-layer enforcement | Detection is enforced at the infrastructure layer with real-time monitoring across all relevant agent behaviours and data flows. Automated alerting with structured response procedures. Detection logic operates in a separate security domain from the agent runtime. Full audit trail with tamper-evident logging. |
| 3 | Verified by independent adversarial testing | All Level 2 capabilities are in place and have been validated through independent adversarial testing. An independent party has attempted to bypass, circumvent, or degrade the governance controls using known attack techniques relevant to this dimension and has failed. Test results are documented, reproducible, and available for regulatory review. |

8. Failure Scenarios

Scenario A: Capital Adequacy Report Validation

A Financial-Value Agent is tasked with generating a quarterly Common Equity Tier 1 (CET1) capital ratio report for PRA submission. The bank's CET1 ratio is a critical prudential metric; a miscalculation could trigger unnecessary supervisory intervention or, worse, mask genuine capital inadequacy. The agent calculates a CET1 ratio of 13.42% based on GBP 8.7 billion in CET1 capital and GBP 64.8 billion in risk-weighted assets (RWA). AG-779's calculation verification layer independently recomputes the RWA using a separate model instance and a snapshot of the same input data. The independent calculation produces an RWA of GBP 65.3 billion, yielding a CET1 ratio of 13.32% -- a 10 basis point discrepancy. While both figures are above the regulatory minimum (4.5% + buffers), the discrepancy exceeds the AG-779 tolerance threshold of 5 basis points for CET1 reporting. The report is held for investigation. Root cause analysis reveals that the primary agent incorrectly applied a credit risk weight of 75% (retail IRB) to a GBP 500 million corporate exposure that should have received a 100% weight under the standardised approach, due to a recent portfolio reclassification that the agent's data pipeline had not yet ingested. The error is corrected, the report is regenerated at 13.34%, and submitted within the PRA deadline with a 6-hour margin. Estimated prevented regulatory impact: potential supervisory review and remediation order.
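
The discrepancy arithmetic in Scenario A can be reproduced from the quoted capital and RWA figures. This is a worked check only; the variable names are illustrative. With the rounded inputs given in the scenario, the primary ratio comes out at roughly 13.43% rather than the 13.42% quoted, which presumably reflects rounding of the underlying balances; the discrepancy of about 10 basis points matches the scenario.

```python
from decimal import Decimal


def ratio_pct(capital_bn: str, rwa_bn: str) -> Decimal:
    """CET1 ratio in percent from capital and RWA, both in GBP billions."""
    return Decimal(capital_bn) / Decimal(rwa_bn) * 100


TOLERANCE_BP = Decimal("5")  # CET1 tolerance stated in the scenario

primary = ratio_pct("8.7", "64.8")        # agent's calculation
independent = ratio_pct("8.7", "65.3")    # verification pathway
discrepancy_bp = (primary - independent) * 100  # 1 percentage point = 100 bp
held_for_investigation = discrepancy_bp > TOLERANCE_BP

print(f"primary={primary:.2f}% independent={independent:.2f}% "
      f"gap={discrepancy_bp:.1f} bp held={held_for_investigation}")
```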

Scenario B: Suspicious Activity Report Completeness Audit

A Customer-Facing Agent at an anti-money laundering (AML) operations centre generates Suspicious Activity Reports (SARs) for submission to the National Crime Agency (NCA). In Q1 2026, the agent generates 347 SARs. AG-779's output reconciliation layer performs a completeness audit by cross-referencing the generated SARs against the underlying transaction monitoring alerts. The audit discovers 12 alerts (3.5%) where the transaction monitoring system flagged suspicious activity but the agent did not generate a corresponding SAR. Investigation reveals that 8 of the 12 alerts were correctly assessed by the agent as false positives (documented justification present), but 4 alerts lacked any documented assessment. For these 4 cases, the agent had processed the alert but encountered a data enrichment timeout and silently moved to the next alert without logging the failure. AG-779's integrity controls flag this as a Critical finding: a SAR generation failure for a genuine suspicious transaction could constitute a criminal offence under the Proceeds of Crime Act 2002 (Section 330: failure to disclose). The 4 alerts are immediately escalated to human AML analysts, who determine that 2 require SARs. The SARs are filed within 24 hours of discovery. The agent's error handling is redesigned to treat data enrichment timeouts as mandatory escalation events rather than skip events. Estimated prevented legal risk: criminal liability for failure to report.

9. Regulatory Mapping

| # | Framework / Standard | Provision and Relationship Type |
|---|---|---|
| 1 | FCA SUP 15 | Pending v2.1 editorial review |
| 2 | PRA Reporting Requirements | Pending v2.1 editorial review |
| 3 | EU AI Act | Pending v2.1 editorial review |
| 4 | DORA | Pending v2.1 editorial review |
| 5 | Basel Committee | Pending v2.1 editorial review |
| 6 | EU CRR/CRD | Pending v2.1 editorial review |
| 7 | MiFID II | Pending v2.1 editorial review |
| 8 | FCA MAR | Pending v2.1 editorial review |
| 9 | Proceeds of Crime Act 2002 | Pending v2.1 editorial review |
| 10 | NIST AI RMF | Pending v2.1 editorial review |
| 11 | SOC 2 Type II | Pending v2.1 editorial review |
| 12 | ISO/IEC 42001:2023 | Pending v2.1 editorial review |
| 13 | FINMA Circular | Pending v2.1 editorial review |
| 14 | MAS Notice | Pending v2.1 editorial review |
| 15 | SEC Regulation S-X | Pending v2.1 editorial review |
| 16 | EBA Guidelines | Pending v2.1 editorial review |

| Dimension | Name | Relationship |
|---|---|---|
| AG-771 | Cross-Jurisdictional Governance Compliance | Multi-jurisdiction reporting requirements |
| AG-774 | Autonomous Financial Market Impact Governance | Accuracy of trade reporting and market data submissions |
| AG-773 | Quantum-Resilient Cryptographic Governance | Cryptographic integrity of archived regulatory submissions |
| AG-772 | Synthetic Media and Deepfake Detection Governance | Preventing synthetic data in regulatory reports |
| AG-778 | Human-Agent Relationship Boundary Governance | Transparent disclosure of AI involvement in attestations |
| AG-775 | Agent Succession and Failover Governance | Reporting continuity during agent failover events |

Cite this protocol
AgentGoverning. (2026). AG-779: Regulatory Reporting Integrity Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-779