AG-067

Root Cause and Corrective Action Governance

Incident Response, Containment & Recovery · ~22 min read · AGS v2.1 · April 2026
Tags: EU AI Act · SOX · FCA · NIST · ISO 42001

2. Summary

Root Cause and Corrective Action Governance requires that every serious incident involving an AI agent is subjected to a structured root cause analysis process that identifies the actual cause of the failure — not merely the proximate trigger — and that corrective actions are defined, implemented, verified, and tracked to closure. The root cause analysis must go beyond the immediate technical failure to examine the systemic factors that allowed the failure to occur: inadequate controls, configuration errors, untested edge cases, gaps in monitoring, or organisational process failures. Corrective actions must be specific, measurable, time-bound, and verified through testing before the agent is returned to service (AG-068). The root cause and corrective action process must produce a formal record that is retained, reviewable by regulators, and feeds into the organisation's continuous improvement of AI governance. Without this dimension, incidents recur — the same root cause produces the same failure, the organisation applies the same superficial fix, and the cycle continues until a regulator or a catastrophic loss breaks it.

3. Example

Scenario A — Superficial Root Cause Leads to Recurring Incident: A customer-facing AI agent handling insurance claims incorrectly denies 234 valid claims over a 2-week period. The initial investigation identifies the proximate cause: a reference data update on day 1 of the period changed the format of policy type codes from 3-character to 5-character strings, and the agent's validation logic rejected the new format as invalid. The corrective action is to update the validation logic to accept 5-character codes. The fix is deployed, and the agent is returned to service. Three months later, a similar reference data update changes claim category codes, and the agent incorrectly denies another 189 valid claims. The root cause was never the specific format change — it was the absence of a contract between the reference data system and the agent that defines the expected data format and provides change notification. The superficial fix addressed one symptom; the systemic root cause was untouched.

What went wrong: The root cause analysis stopped at the proximate trigger (the format change) without examining the systemic factor (the absence of an interface contract with change notification). The corrective action fixed one instance of the problem without preventing future instances. No validation tested whether the corrective action addressed the root cause rather than just the symptom. Consequence: 189 additional incorrectly denied claims, reputational damage, regulatory scrutiny, and the eventual realisation that the original root cause analysis was inadequate — requiring a second, more thorough investigation at greater cost and with less available evidence (AG-066 retention notwithstanding).
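The systemic fix Scenario A needed, an interface contract that fails loudly on an unannounced format change, can be sketched in a few lines. The field names, formats, and version label below are hypothetical illustrations, not part of this protocol:

```python
import re

# Hypothetical interface contract for the reference data feed in Scenario A.
# The contract pins the agreed code formats; any violation halts ingestion
# and alerts the data owner, instead of silently producing denied claims.
CONTRACT = {
    "policy_type_code": re.compile(r"^[A-Z0-9]{5}$"),    # current agreed format
    "claim_category_code": re.compile(r"^[A-Z0-9]{4}$"),
}
CONTRACT_VERSION = "2026-04"  # bumped, with advance notice, on any format change


class ContractViolation(Exception):
    """Raised when incoming reference data breaks the agreed contract."""


def validate_record(record: dict) -> dict:
    """Validate one reference data record against the contract.

    A violation surfaces at the interface boundary, where it can be routed
    to the data owner, rather than downstream in the agent's claim logic.
    """
    for field, pattern in CONTRACT.items():
        value = record.get(field, "")
        if not pattern.fullmatch(value):
            raise ContractViolation(
                f"{field}={value!r} violates contract {CONTRACT_VERSION}"
            )
    return record
```

With this in place, a future format change fails at ingestion with a named contract version, which is a detectable, attributable event rather than a stream of incorrectly denied claims.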

Scenario B — Corrective Action Not Verified Before Return to Service: A financial-value AI agent executing foreign exchange trades is contained after executing 12 trades at prices that deviated from the mid-market rate by more than the 0.5% tolerance defined in its mandate. Root cause analysis determines that the agent's pricing model was using a stale exchange rate feed — the feed provider changed the API endpoint, and the agent's fallback logic was using a cached rate from 4 hours earlier. The corrective action is to update the API endpoint configuration and add a staleness check that rejects rates older than 60 seconds. The configuration change and staleness check are deployed, and the agent is returned to service based on a code review of the changes. No testing is performed. On return to service, the staleness check works correctly but has an unintended interaction with the agent's error handling: when a rate is rejected as stale, the error handler retries the request to the old (non-functional) endpoint rather than the new one, creating an infinite retry loop that consumes all available connections to the rate feed. The agent freezes, and 47 pending trades fail to execute within the required settlement window.

What went wrong: The corrective action was deployed without verification testing. The code review confirmed that the staleness check was correctly implemented but did not test the interaction between the staleness check and the existing error handling. The corrective action introduced a new failure mode that was not present in the original incident. Consequence: 47 failed settlements, counterparty claims, regulatory finding for inadequate change control, and a second incident investigation required for the failure introduced by the corrective action from the first incident.
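For contrast, a staleness check that fails safely: the sketch below bounds retries and always targets the currently configured endpoint, avoiding the infinite loop against the dead endpoint described above. The endpoint URL, limits, and names are illustrative assumptions:

```python
import time

RATE_ENDPOINT = "https://rates.example.com/v2/fx"  # hypothetical current endpoint
MAX_STALENESS_S = 60   # per the corrective action in Scenario B
MAX_RETRIES = 3        # bounded, so a dead feed surfaces as an error


class StaleRateError(Exception):
    pass


def get_rate(fetch, now=time.time):
    """Fetch an FX rate, rejecting stale quotes, with bounded retries.

    `fetch` is a caller-supplied function taking an endpoint URL and
    returning (rate, timestamp). Retries always use the SAME configured
    endpoint, and the loop is bounded, so failure surfaces for containment
    rather than freezing the agent in an infinite retry loop.
    """
    last_err = None
    for _ in range(MAX_RETRIES):
        try:
            rate, ts = fetch(RATE_ENDPOint) if False else fetch(RATE_ENDPOINT)
            age = now() - ts
            if age > MAX_STALENESS_S:
                raise StaleRateError(f"rate is {age:.0f}s old")
            return rate
        except StaleRateError as err:
            last_err = err  # retry against the current endpoint only
    raise last_err  # exhausted retries: fail loudly, do not trade
```

The point of the sketch is the verification target from 4.5: a test of this function must exercise the retry path and the interaction with error handling, not just the staleness comparison in isolation.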

Scenario C — Root Cause Analysis Omits Organisational Factors: A safety-critical AI agent monitoring air quality in an underground mine triggers a false evacuation after misinterpreting a sensor calibration test as an actual gas leak. The root cause analysis determines that the agent did not have access to the maintenance schedule and could not distinguish a planned calibration event from a genuine gas reading. The corrective action integrates the maintenance schedule with the agent's input data so it can suppress readings during calibration. However, the root cause analysis does not examine why the maintenance team did not notify the agent operations team of the calibration, why there was no standard operating procedure requiring such notification, or why the agent was designed to trigger evacuation on a single sensor reading without corroboration. Six months later, a different maintenance activity (equipment testing) causes another false evacuation because the corrective action addressed only the specific case of sensor calibration, not the general case of maintenance activities affecting agent inputs.

What went wrong: The root cause analysis was technically focused and did not examine organisational factors. The corrective action was narrowly scoped to the specific trigger (sensor calibration) rather than the general vulnerability (uncoordinated maintenance activities affecting agent inputs). No organisational process change was implemented to require maintenance teams to coordinate with agent operations. Consequence: recurring false evacuations, mine production disruption (each false evacuation costs approximately £180,000 in lost production), erosion of trust in the AI monitoring system, eventual reversion to fully manual monitoring.

4. Requirement Statement

Scope: This dimension applies to all serious incidents classified under AG-064 at any severity level. The scope includes incidents that were successfully contained (AG-065) and those where containment was partial or delayed. The scope extends to near-miss events — incidents where the conditions for a serious failure existed but the failure did not materialise due to coincidental factors rather than design controls. Near-misses are within scope because the root cause analysis may reveal systemic vulnerabilities that will produce an actual failure under different circumstances. The scope includes multi-agent incidents where the root cause spans multiple agents, external system failures that the agent governance should have detected or mitigated, and organisational process failures that enabled the technical failure.

4.1. A conforming system MUST initiate a structured root cause analysis for every incident classified as Severity 1 or Severity 2 under AG-064, beginning within 24 hours of incident containment and completing within 15 business days for Severity 1 and 30 business days for Severity 2.
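The 24-hour initiation and 15/30 business-day completion windows in 4.1 can be computed mechanically. A minimal Python sketch, assuming a Monday-to-Friday working week and ignoring public holidays (a production calendar would subtract them):

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance `days` business days (Mon-Fri), skipping weekends.

    Public holidays are deliberately omitted from this sketch.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def rca_deadline(containment_date: date, severity: int) -> date:
    """Completion deadline per 4.1: 15 business days for Severity 1,
    30 business days for Severity 2."""
    return add_business_days(containment_date, 15 if severity == 1 else 30)
```

For example, containment on Wednesday 1 April 2026 gives a Severity 1 deadline of 22 April 2026.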

4.2. A conforming system MUST ensure that root cause analysis examines at least three layers: the proximate technical cause (what directly triggered the failure), the contributing technical factors (what conditions allowed the proximate cause to produce the observed impact), and the systemic organisational factors (what governance, process, or oversight gaps allowed the contributing factors to exist).

4.3. A conforming system MUST produce a formal root cause analysis report that documents: the incident timeline, the evidence examined (referencing AG-066 records), the determined root cause at each layer, the contributing factors, the corrective actions defined, and the rationale linking each corrective action to a specific root cause finding.

4.4. A conforming system MUST define corrective actions that are specific (addressing a defined root cause finding), measurable (with defined success criteria), time-bound (with a defined implementation deadline), and assigned (to a named responsible individual or team).
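A corrective action record shaped by 4.4 might look like the following sketch. The field names and status values are illustrative, not prescribed by this protocol:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CorrectiveAction:
    """A corrective action record per 4.4: specific, measurable,
    time-bound, and assigned. All names here are illustrative."""
    finding_id: str        # specific: links to a root cause finding (4.3)
    description: str
    success_criteria: str  # measurable: how verification judges success
    deadline: date         # time-bound
    owner: str             # assigned: a named individual or team
    status: str = "open"   # open -> implemented -> verified -> closed

    def validate(self) -> None:
        """Reject records missing any mandatory attribute, so an action
        cannot enter tracking without its 4.4 properties."""
        for field_name in ("finding_id", "description",
                           "success_criteria", "owner"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"corrective action missing {field_name}")
```

The status progression matters: 4.5 and 4.6 imply that "implemented" and "verified" are distinct states, and closure requires evidence of both.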

4.5. A conforming system MUST verify corrective actions through testing before the affected agent is returned to service per AG-068 — verification must demonstrate that the specific root cause no longer produces the observed failure and that the corrective action does not introduce new failure modes.

4.6. A conforming system MUST track corrective actions to closure, with evidence of implementation and verification recorded for each action, and with escalation to senior management when implementation deadlines are missed.

4.7. A conforming system SHOULD implement a corrective action effectiveness review at 30, 90, and 180 days after implementation to verify that the corrective action remains effective in production conditions and that the root cause has not recurred.
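The 30/90/180-day cadence in 4.7 is easy to derive and to audit for overdue reviews. A minimal sketch (function and parameter names are our own):

```python
from datetime import date, timedelta

REVIEW_OFFSETS_DAYS = (30, 90, 180)  # per 4.7


def effectiveness_review_dates(implemented_on: date) -> list[date]:
    """Calendar dates at which each effectiveness review falls due."""
    return [implemented_on + timedelta(days=d) for d in REVIEW_OFFSETS_DAYS]


def overdue_reviews(implemented_on: date, completed: set[date],
                    today: date) -> list[date]:
    """Due dates that have passed without a recorded completion,
    feeding the escalation path described in 4.6."""
    return [d for d in effectiveness_review_dates(implemented_on)
            if d <= today and d not in completed]
```

An action implemented on 1 January 2026 is reviewed on 31 January, 1 April, and 30 June 2026; any missed date is a concrete, escalatable fact rather than a judgement call.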

4.8. A conforming system SHOULD maintain a root cause taxonomy that categorises historical root causes to support trend analysis — identifying recurring root cause categories enables systemic improvements rather than incident-by-incident fixes.
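Trend analysis over a root cause taxonomy (4.8) can be as simple as counting category occurrences against a recurrence threshold. A sketch, with a hypothetical threshold and category labels:

```python
from collections import Counter

RECURRENCE_THRESHOLD = 3  # hypothetical: 3+ incidents in a category flags a trend


def recurring_categories(incident_root_causes: list[str]) -> list[str]:
    """Root cause categories at or above the recurrence threshold.

    Category labels come from the taxonomy in 4.8, e.g.
    'missing-interface-contract' or 'unverified-change' (illustrative).
    A flagged category signals a systemic weakness warranting a
    programme-level fix rather than another per-incident patch.
    """
    counts = Counter(incident_root_causes)
    return sorted(c for c, n in counts.items() if n >= RECURRENCE_THRESHOLD)
```

The design choice is the threshold trigger: once a category crosses it, the response escalates from incident-by-incident fixes to a systemic improvement programme, as described in the Advanced maturity level below.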

4.9. A conforming system SHOULD conduct root cause analysis for Severity 3 incidents where the incident reveals a novel failure mode, a gap in existing controls, or a pattern of recurring low-severity incidents that may indicate a systemic issue.

4.10. A conforming system MAY implement automated root cause hypothesis generation using AG-066 forensic evidence, producing preliminary root cause candidates that human investigators can evaluate and refine.

5. Rationale

Root Cause and Corrective Action Governance addresses the question that determines whether an organisation learns from its AI agent failures or is condemned to repeat them: "Do we actually understand why this happened, and have we fixed the real problem?"

The distinction between proximate cause and root cause is fundamental. The proximate cause of an incident is the immediate trigger — a corrupted data feed, a misconfigured threshold, a prompt injection. The root cause is the systemic condition that allowed the proximate cause to produce the observed impact — the absence of input validation, the lack of configuration change control, the failure to test adversarial scenarios. Fixing the proximate cause addresses one instance of the problem. Fixing the root cause prevents the class of problems.

AI agent incidents are particularly vulnerable to superficial root cause analysis because the proximate cause is often technically interesting and apparently sufficient. "The agent received a corrupted data feed and produced incorrect outputs" is a complete narrative — it explains what happened. But it does not explain why the agent did not detect the corruption, why there was no input validation, why the data feed had no integrity check, or why the monitoring system did not detect the output anomaly. Each of these "why" questions reveals a systemic factor that, if unaddressed, will produce a different incident with a different proximate cause but the same root cause.

The corrective action verification requirement reflects the reality that corrective actions can themselves introduce new failure modes. An AI agent is a complex system operating in a complex environment. Changes to one component may have unexpected interactions with other components. A staleness check that works correctly in isolation may interact with error handling in unexpected ways. A format validation that accepts the new data format may reject a future format that the old validation would have accepted. Corrective actions must be tested not only for their intended effect but for their unintended interactions, and this testing must occur before the agent is returned to service.

The requirement for organisational root cause analysis acknowledges that many AI agent incidents have organisational causes. The technical failure is the effect; the organisational process gap is the cause. An agent that receives corrupted data failed technically, but the root cause may be an organisational failure to establish data quality contracts between teams, to include the agent operations team in change management processes, or to test the agent's resilience to data quality degradation. Without examining the organisational layer, corrective actions remain purely technical and leave the systemic vulnerability intact.

6. Implementation Guidance

AG-067 establishes the root cause analysis and corrective action process as a mandatory governance function — not an optional best practice. The process should be defined in advance, with clear roles, responsibilities, timelines, and quality standards. Root cause analysis conducted under time pressure during an active incident is prone to the same superficial analysis that this dimension aims to prevent. The process should be initiated after containment (AG-065) has stabilised the situation, using evidence preserved by AG-066.

The root cause analysis methodology should be structured and repeatable. Recommended methodologies include the "5 Whys" technique (iteratively asking "why" to move from proximate cause to root cause), Ishikawa (fishbone) diagrams for multi-factor analysis, and fault tree analysis for complex system interactions. The methodology should require analysis at three layers: technical, process, and organisational. Each finding should be linked to specific evidence from the AG-066 forensic record.
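The 5 Whys chain described above can be captured as a structured record that refuses to stop at the proximate trigger. A sketch using Scenario A's findings as the example chain; the minimum-depth rule of three mirrors the layers in 4.2 and is our own illustration, not a prescribed check:

```python
def five_whys(proximate_cause: str, whys: list[str]) -> dict:
    """Record a 5 Whys chain from proximate cause to root cause.

    Each entry in `whys` answers 'why?' for the previous statement; the
    last entry is treated as the root cause. A minimum depth of three is
    enforced here so the analysis cannot legitimately stop at the
    proximate trigger (cf. the three layers in 4.2).
    """
    if len(whys) < 3:
        raise ValueError("analysis too shallow: fewer than three 'why' layers")
    return {"proximate_cause": proximate_cause,
            "chain": whys,
            "root_cause": whys[-1]}


# Scenario A, reconstructed as a chain:
analysis = five_whys(
    "validation rejected 5-character policy codes",
    ["the code format changed upstream without notice",
     "no interface contract defined the expected format",
     "no process required change notification between teams"],
)
```

Here the recorded root cause is the organisational gap (no change-notification process), which is exactly the finding the superficial analysis in Scenario A missed.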

Recommended patterns: assign Severity 1 investigations to a team independent of the one that built or operates the agent; require every corrective action to cite the specific root cause finding it addresses; test corrective actions for unintended interactions with existing behaviour, not only for their intended effect, before return to service; and categorise each confirmed root cause in the taxonomy at closure so that trend analysis stays current.

Anti-patterns to avoid: stopping the analysis at the proximate trigger (Scenario A); returning an agent to service on code review alone, without verification testing (Scenario B); scoping the corrective action to the specific trigger rather than the general vulnerability (Scenario C); and closing corrective actions on implementation rather than on verified effectiveness.

Industry Considerations

Financial Services. FCA expectations for incident management include root cause analysis that identifies systemic issues and corrective actions that are tracked to completion. The Senior Managers Regime requires that the responsible Senior Manager can demonstrate that root cause analysis was thorough, corrective actions were appropriate, and implementation was verified. For incidents involving market conduct (e.g., best execution failures, market manipulation), the root cause analysis may need to be disclosed to the FCA and may be subject to skilled person review under Section 166 of the Financial Services and Markets Act. Corrective actions must be reflected in the firm's risk register and control framework.

Healthcare. For incidents involving patient safety, root cause analysis must follow established patient safety investigation methodologies (e.g., the NHS Serious Incident Framework or equivalent). The analysis must consider clinical pathway impacts — an AI agent that provides incorrect clinical decision support may have affected patient treatment decisions downstream. Corrective actions must be reviewed by clinical governance before implementation to ensure they do not introduce clinical safety risks. The investigation record must be retained as part of the clinical governance record.

Critical Infrastructure. For incidents in critical infrastructure, root cause analysis must include physical process safety analysis alongside AI-specific analysis. A root cause in the agent's reasoning may have physical safety implications that require process safety engineering review. Corrective actions that modify the agent's behaviour in a physical control context must be validated through process safety analysis (e.g., HAZOP review) before implementation. IEC 61511 requirements for safety instrumented systems may apply to corrective actions that affect safety functions.

Maturity Model

Basic Implementation — The organisation conducts root cause analysis for Severity 1 incidents, documented in a free-form report. The analysis identifies the proximate technical cause. Corrective actions are defined and tracked in a ticketing system. Verification consists of code review and basic functional testing. No formal methodology is prescribed — the quality of analysis depends on the investigator's expertise. Corrective actions are closed when implemented. No effectiveness review is conducted. This level meets the minimum mandatory requirements but is vulnerable to superficial analysis, inconsistent quality, and recurring incidents from unaddressed systemic root causes.

Intermediate Implementation — Root cause analysis follows a structured methodology (5 Whys, Ishikawa, or fault tree) applied to all Severity 1 and 2 incidents. The analysis template requires three-layer examination (technical, process, organisational). Corrective actions are traceable to specific root cause findings. Verification testing is conducted in a pre-production environment before production deployment. Corrective action effectiveness reviews are conducted at 30 and 90 days. Root causes are categorised in a taxonomy for trend analysis. Severity 1 investigations are conducted by an independent team. The root cause taxonomy is reviewed quarterly to identify systemic trends.

Advanced Implementation — All intermediate capabilities plus: automated root cause hypothesis generation from AG-066 forensic evidence feeds preliminary candidates to human investigators, reducing time to root cause. Corrective action impact analysis is formally conducted and approved before implementation. Effectiveness reviews at 30, 90, and 180 days verify sustained effectiveness. Root cause trend analysis drives programme-level improvements — when a root cause category reaches a defined threshold, a systemic improvement programme is initiated. The organisation can demonstrate to regulators a declining trend in incident recurrence rates attributable to effective root cause analysis and corrective action. Cross-organisation root cause sharing (anonymised) contributes to industry-wide learning.

7. Evidence Requirements

Required artefacts: the root cause analysis report specified in 4.3 (incident timeline, evidence references, layered findings, corrective actions, and linking rationale); corrective action records per 4.4 with owner, success criteria, deadline, and status; verification test evidence per 4.5; closure and escalation records per 4.6; and effectiveness review records per 4.7.

Retention requirements: root cause analysis reports and corrective action records must be retained alongside the forensic evidence preserved under AG-066, for at least as long as that evidence, so that the analysis remains reviewable against the record it was based on.

Access requirements: records must be reviewable by regulators on request, accessible to senior management for escalation under 4.6, and available to investigators conducting subsequent analyses and trend reviews.

8. Test Specification

Testing AG-067 compliance requires verification that the root cause analysis process is structured, thorough, and effective, and that corrective actions are verified and tracked.

Test 8.1: Root Cause Analysis Depth
Sample closed Severity 1 and 2 investigations and verify that each report documents all three layers required by 4.2 (proximate technical cause, contributing technical factors, systemic organisational factors), with each finding linked to AG-066 evidence.

Test 8.2: Corrective Action Traceability
Verify that every corrective action cites a specific root cause finding and carries the owner, success criteria, and deadline required by 4.4.

Test 8.3: Corrective Action Verification Completeness
Verify that each return to service under AG-068 was preceded by testing demonstrating both that the root cause no longer produces the observed failure and that the corrective action introduced no new failure modes (4.5).

Test 8.4: Timeline Compliance
Verify that root cause analysis began within 24 hours of containment and completed within 15 business days for Severity 1 and 30 business days for Severity 2 (4.1).

Test 8.5: Corrective Action Tracking to Closure
Verify that implementation and verification evidence exists for each closed action and that missed deadlines were escalated to senior management (4.6).

Test 8.6: Effectiveness Review Execution
Verify that 30-, 90-, and 180-day effectiveness reviews were conducted and recorded for implemented corrective actions (4.7).

Test 8.7: Root Cause Recurrence Analysis
Verify that the root cause taxonomy is maintained (4.8) and that recurrence of a root cause category triggered a systemic review rather than another point fix.

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Direct requirement
EU AI Act | Article 72 (Post-Market Monitoring) | Supports compliance
DORA | Article 13 (Learning and Evolving) | Direct requirement
DORA | Article 19 (Reporting — Final Report) | Direct requirement
FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement
NIST AI RMF | MANAGE 4.1, MANAGE 4.2 | Supports compliance
ISO 42001 | Clause 10.2 (Nonconformity and Corrective Action) | Direct requirement
SOX | Section 404 (Internal Controls) | Supports compliance
NIS2 Directive | Article 23 (Final Report) | Supports compliance

EU AI Act — Article 9 (Risk Management System)

Article 9 requires a risk management system that includes "estimation and evaluation of the risks that may emerge when the high-risk AI system is used in accordance with its intended purpose and under conditions of reasonably foreseeable misuse" and "adoption of appropriate and targeted risk management measures." AG-067 implements the feedback loop within the risk management system: when an incident reveals a risk that was not previously managed, the root cause analysis identifies the gap, and the corrective action closes it. Without this feedback loop, the risk management system is static and does not improve based on operational experience. Article 9(4)(d) specifically requires that risk management measures "are implemented with a view to eliminating or reducing risks as far as possible through adequate design and development" — root cause analysis is the mechanism by which design and development are improved based on real-world failure data.

DORA — Article 13 (Learning and Evolving)

Article 13 requires financial entities to "incorporate lessons learnt from ICT-related incidents" into their ICT risk management framework. AG-067 directly implements this requirement by ensuring that every serious incident produces a formal root cause analysis with corrective actions that are tracked to closure. The requirement for root cause taxonomy and trend analysis supports the broader Article 13 obligation to identify patterns and systemic weaknesses, not just individual incident responses.

DORA — Article 19 (Reporting — Final Report)

Article 19 requires a final report on major ICT-related incidents that includes "the root cause analysis, regardless of whether mitigating actions have already been completed." AG-067 ensures that the root cause analysis is conducted with the rigour and documentation necessary to satisfy this reporting requirement. The root cause analysis report produced under AG-067 forms the basis of the DORA final report.

ISO 42001 — Clause 10.2 (Nonconformity and Corrective Action)

Clause 10.2 requires organisations to react to nonconformities, evaluate the need for action to eliminate causes, implement corrective actions, review their effectiveness, and make changes to the AI management system if necessary. AG-067 implements Clause 10.2 for AI agent incidents by providing the structured process for root cause determination, corrective action definition, implementation verification, and effectiveness review. The requirement for root cause taxonomy and trend analysis supports the Clause 10.2 obligation to "make changes to the AI management system if necessary" by identifying systemic issues that require management system changes.

FCA SYSC — 6.1.1R (Systems and Controls)

The FCA requires firms to establish and maintain adequate policies and procedures sufficient to ensure compliance. For AI agent deployments, this includes the ability to learn from incidents and improve controls. The FCA has stated that it expects firms to demonstrate a "continuous improvement" approach to AI governance — AG-067's root cause analysis and corrective action process is the mechanism that delivers this improvement. The Senior Managers Regime requires that responsible individuals can demonstrate that incidents were thoroughly investigated and that corrective actions were appropriate and effective.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Organisation-wide — recurring incidents from unaddressed root causes affect all agent deployments sharing the same systemic weakness

Consequence chain: Without structured root cause analysis and corrective action governance, the organisation's incident response is purely reactive — each incident is treated as an isolated event, the proximate cause is fixed, and the systemic vulnerability remains. The immediate consequence is incident recurrence: the same class of failure produces different specific incidents, each requiring containment, investigation, and remediation. The operational impact is escalating: each recurring incident consumes investigation resources, disrupts operations, and erodes confidence in the AI agent deployment. The regulatory impact compounds: regulators view recurring incidents as evidence of inadequate governance. A single incident may be treated as an operational failure; recurring incidents with the same root cause class are treated as a governance failure — a materially more serious finding. Under DORA Article 13, failure to learn from incidents is an independent regulatory breach. Under the EU AI Act, failure to update the risk management system based on operational experience violates Article 9. The financial impact grows: each recurring incident carries its own direct costs (containment, investigation, remediation, customer impact) plus the cumulative cost of repeated disruption and the eventual cost of a comprehensive remediation programme that should have been initiated after the first incident. The business consequence includes regulatory enforcement for inadequate governance, escalating incident costs, loss of organisational confidence in AI agent capabilities, potential moratorium on new agent deployments pending governance improvements, and personal liability for senior managers who cannot demonstrate that the organisation learned from its failures.

Cite this protocol
AgentGoverning. (2026). AG-067: Root Cause and Corrective Action Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-067