Adverse Event Severity Matrix Governance requires every organisation operating AI agents to maintain a formally governed, multi-dimensional severity classification matrix that scores agent-caused incidents across safety impact, financial exposure, rights infringement, legal liability, and reputational damage axes. The matrix must produce a deterministic composite severity level (Critical, High, Medium, Low, Informational) for every incident, ensuring that response urgency, notification obligations, escalation paths, and resource allocation are calibrated to actual harm rather than subjective judgement. Without a governed severity matrix, organisations either over-escalate trivial events — consuming crisis capacity on non-crises — or under-escalate catastrophic failures — allowing rights violations or safety hazards to persist while classified as routine operational issues.
Scenario A — Misclassified Safety Incident Causes Delayed Response: A warehouse robotics agent controlling 12 autonomous picking units develops a path-planning fault that causes one unit to collide with a human worker, resulting in a minor shoulder injury. The on-call engineer classifies the incident as "Medium — Equipment Malfunction" using an informal severity guide that does not distinguish between equipment-only collisions and human-contact collisions. Under this classification, the response SLA is 48 hours. The remaining 11 units continue operating with the same path-planning model. Within 36 hours, a second collision occurs — this time pinning a worker against a shelving unit and causing a fractured pelvis requiring surgical intervention. Post-incident review reveals that the severity matrix used by the organisation had no separate axis for physical safety; all collisions were classified under a single "equipment" category.
What went wrong: The severity matrix lacked a safety impact axis that would have elevated a human-contact collision to Critical severity with immediate response (halt all units, root-cause analysis within 2 hours). A single-axis classification collapsed a safety-critical event into a routine equipment category. The 48-hour SLA was appropriate for equipment-only faults but catastrophically slow for a pattern that endangered human life. Consequence: one surgical injury, Health and Safety Executive investigation, £1.2 million in liability claims, 3-week facility shutdown, and criminal health-and-safety prosecution of the operations director.
Scenario B — Financial Severity Axis Missing From Matrix Causes Unreported Exposure: A foreign-exchange hedging agent acting for a mid-sized manufacturer processes a malformed rate feed and executes 47 hedging contracts at a rate 340 basis points below market. The total mark-to-market loss is £2.8 million. The incident management team classifies the event as "High — Data Feed Error" because the severity matrix classifies incidents by root cause (data, logic, infrastructure) rather than by consequence. The "Data Feed Error" classification triggers a 24-hour response SLA and no board notification. The trading desk does not unwind the positions for 18 hours. During that period, the rate moves a further 85 basis points against the position, increasing the loss to £3.4 million. The board learns of the loss 6 days later, after the monthly risk report is produced.
What went wrong: The severity matrix classified by root cause rather than by financial consequence. A £2.8 million loss classified as a data feed error received the same response as a £500 data feed error that caused no trading impact. The matrix had no financial exposure axis that would have elevated any loss exceeding £500,000 to Critical severity with immediate board notification and position freeze. Consequence: £3.4 million total loss (£600,000 of which accrued during the delayed response), FCA supervisory finding for inadequate incident classification under SYSC 6.1.1R, board-level governance review, and £180,000 in external remediation costs.
Scenario C — Rights-Impact Axis Absent Allows Systematic Discrimination to Persist: A public-sector benefits-assessment agent processes 14,000 disability benefit applications per month. A model update introduces a regression that systematically under-scores mobility impairments affecting wheelchair users, reducing award amounts by an average of £127 per month. The incident is classified as "Low — Model Accuracy Drift" because the severity matrix uses only a technical accuracy metric (overall accuracy decreased by 0.3%, which is below the "Medium" threshold of 1.0%). The regression persists for 4 months before an advocacy group files a formal complaint. During those 4 months, 2,340 wheelchair users receive reduced benefits totalling £296,000 in aggregate underpayment. The Equality and Human Rights Commission opens an investigation into indirect disability discrimination.
What went wrong: The severity matrix had no rights-impact axis. A systematic bias affecting a protected characteristic was invisible to a matrix that measured only aggregate technical accuracy. The 0.3% overall accuracy drop masked a 23% accuracy drop for a specific protected group. A rights-impact axis would have flagged any differential impact exceeding 5% on a protected characteristic as Critical severity regardless of aggregate accuracy. Consequence: £296,000 in underpayments requiring retrospective correction, Equality Act 2010 investigation, reputational damage across disability advocacy networks, and 8 months of remediation including independent algorithmic audit.
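The masking effect in Scenario C can be checked with simple arithmetic. The sketch below uses the figures from the scenario (a 0.3% aggregate accuracy drop, a 23% drop for the affected group) to show how small a subgroup needs to be for a severe regression to vanish inside an aggregate metric; the variable names are illustrative.

```python
# Illustrative check of how a large subgroup accuracy drop can hide
# inside a small aggregate drop (figures taken from Scenario C).
subgroup_drop = 0.23      # 23% accuracy drop for wheelchair-user cases
aggregate_drop = 0.003    # 0.3% drop observed across all applications

# If the regression affects only the subgroup, the implied share of
# the caseload that the subgroup represents is:
implied_share = aggregate_drop / subgroup_drop
print(f"implied subgroup share: {implied_share:.1%}")  # ~1.3% of caseload

# A subgroup making up roughly 1.3% of cases can lose 23 points of
# accuracy while moving the aggregate metric by only 0.3 points --
# well below a 1.0% "Medium" threshold keyed to aggregate accuracy.
assert aggregate_drop < 0.01
```

This is why 4.8's rights sub-axes and the differential-impact thresholds discussed later must be measured per group: any threshold applied only to the aggregate metric is structurally blind to concentrated harm.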
Scope: This dimension applies to every organisation operating AI agents where agent actions or outputs can cause harm across any of the five severity axes: safety (physical injury, environmental damage, life-safety system degradation), financial (monetary loss, exposure, misallocation), rights (discrimination, privacy violation, due-process denial, autonomy infringement), legal (regulatory breach, contractual violation, criminal liability), and reputational (public trust erosion, brand damage, stakeholder confidence loss). The scope includes the matrix definition itself, its calibration process, its integration with incident management workflows, and its periodic validation. Organisations operating agents exclusively in sandboxed development environments with no production data or real-world effects may defer full implementation but must implement the matrix before any production deployment.
4.1. A conforming system MUST maintain a multi-dimensional severity matrix that classifies agent-related incidents across at least five independent axes: safety impact, financial exposure, rights infringement, legal liability, and reputational damage, with defined severity levels for each axis.
4.2. A conforming system MUST define at least four severity levels per axis (Critical, High, Medium, Low) with quantitative thresholds or qualitative criteria that are specific enough to produce consistent classification by different analysts reviewing the same incident.
4.3. A conforming system MUST implement a composite severity derivation rule that produces a single overall severity level from the individual axis scores, where the composite severity is at least as high as the maximum individual axis severity (i.e., a Critical rating on any single axis produces a Critical composite rating).
4.4. A conforming system MUST map each composite severity level to defined response parameters including: maximum time-to-acknowledge, maximum time-to-initial-response, escalation recipients, notification obligations, and resource mobilisation requirements.
4.5. A conforming system MUST integrate the severity matrix with the incident management workflow such that every agent-related incident is classified using the matrix before response actions are determined, and the classification is recorded as a permanent incident artefact.
4.6. A conforming system MUST validate the severity matrix at least annually through calibration exercises where multiple independent analysts classify the same set of historical or synthetic incidents, measuring inter-rater agreement and adjusting thresholds where agreement falls below 80%.
4.7. A conforming system MUST version-control the severity matrix, retaining all prior versions with change history, and re-classify any open incidents when the matrix is updated to determine whether the updated matrix changes their severity.
4.8. A conforming system SHOULD define domain-specific sub-axes within each primary axis to capture sector-specific severity factors (e.g., within the safety axis: near-miss, minor injury, major injury, fatality; within the rights axis: individual impact, group impact, systemic impact on protected characteristic).
4.9. A conforming system SHOULD implement automated pre-classification that uses incident metadata (error type, affected population size, monetary values, system identifiers) to suggest an initial severity classification before human review.
4.10. A conforming system SHOULD establish severity escalation triggers — conditions under which an incident's severity is automatically elevated (e.g., when the number of affected individuals exceeds a threshold, when financial exposure crosses a monetary boundary, or when a second incident of the same type occurs within a defined window).
4.11. A conforming system MAY implement real-time severity scoring that continuously updates the composite severity as new information about an incident becomes available, adjusting response parameters dynamically.
4.12. A conforming system MAY integrate the severity matrix with external reporting obligations, automatically determining whether an incident at a given severity level triggers mandatory regulatory notification under applicable frameworks (e.g., EU AI Act serious incident reporting, GDPR breach notification, DORA major ICT-related incident reporting).
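Clauses 4.1, 4.3, and 4.4 can be sketched as a small decision structure. The following is a minimal illustration, not a normative implementation: the SLA values, escalation recipients, and axis scores are placeholder assumptions, and only the max-rule in `composite_severity` is mandated by 4.3.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Ordered severity levels; IntEnum makes max() meaningful."""
    INFORMATIONAL = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

# Clause 4.1: the five mandatory axes.
AXES = ("safety", "financial", "rights", "legal", "reputational")

# Clause 4.4: each composite level maps to response parameters.
# The SLAs and recipient lists below are placeholders, not recommendations.
RESPONSE_PARAMETERS = {
    Severity.CRITICAL: {"ack_minutes": 15, "respond_minutes": 120,
                        "escalate_to": ["incident-commander", "board"]},
    Severity.HIGH:     {"ack_minutes": 60, "respond_minutes": 480,
                        "escalate_to": ["incident-commander"]},
    Severity.MEDIUM:   {"ack_minutes": 240, "respond_minutes": 1440,
                        "escalate_to": ["on-call-lead"]},
    Severity.LOW:      {"ack_minutes": 1440, "respond_minutes": 2880,
                        "escalate_to": ["on-call-engineer"]},
}

def composite_severity(axis_scores: dict) -> Severity:
    """Clause 4.3: composite severity is at least the maximum axis score."""
    missing = set(AXES) - axis_scores.keys()
    if missing:  # clause 4.1: every axis must be scored explicitly
        raise ValueError(f"unscored axes: {sorted(missing)}")
    return max(axis_scores.values())

# Scenario C as it should have been scored: a Critical rights rating
# dominates the composite even though other axes score Low or Medium.
scores = {"safety": Severity.LOW, "financial": Severity.LOW,
          "rights": Severity.CRITICAL, "legal": Severity.MEDIUM,
          "reputational": Severity.MEDIUM}
overall = composite_severity(scores)
print(overall.name, RESPONSE_PARAMETERS[overall])
```

Requiring every axis to be scored, rather than defaulting absent axes to Low, is deliberate: the failure mode in all three scenarios was an axis that was never evaluated at all.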
Incident severity classification is the single most consequential decision in the first minutes of an agent-related incident. The severity level determines how fast the organisation responds, who is notified, what resources are mobilised, and whether external parties (regulators, affected individuals, law enforcement) are informed. A misclassification by even one severity level can mean the difference between a 2-hour response and a 48-hour response — a delay that can allow a safety hazard to injure additional people, a financial exposure to compound, or a rights violation to affect thousands more individuals.
Traditional IT incident severity frameworks (SEV-1 through SEV-4, typically based on service availability and user impact) are inadequate for AI agent incidents for three reasons. First, AI agent incidents frequently cause harm along dimensions that IT frameworks do not measure. A model regression that introduces systematic bias against a protected group has zero impact on service availability but catastrophic impact on rights. An agent that executes financially harmful transactions does so at full service availability, so an availability-based framework registers no incident at all. Second, AI agent incidents often have delayed or distributed impact. A classification error that affects 0.3% of cases may seem trivial in aggregate but may represent a 23% error rate for a specific population — a pattern visible only through a rights-impact axis. Third, AI agent incidents can cross category boundaries simultaneously. A single incident may cause physical injury (safety), financial loss (financial), privacy violation (rights), regulatory breach (legal), and media coverage (reputational). A single-axis framework cannot capture this multi-dimensional harm profile.
The governed severity matrix addresses these gaps by requiring explicit, independent scoring across all relevant harm dimensions. The composite severity rule — where the overall severity is at least the maximum of any individual axis — ensures that a Critical rights violation is treated as a Critical incident even if the financial and safety axes score Low. This prevents the dilution of rights and safety concerns by aggregate metrics, which is the failure mode observed in Scenario C.
Calibration is essential because severity classification is ultimately a human judgement applied under time pressure. Research in incident management consistently shows that different analysts will classify the same incident differently when thresholds are vague. Annual calibration exercises, where multiple analysts independently classify the same incidents and disagreements are resolved through threshold adjustment, produce the inter-rater reliability necessary for consistent, defensible classification. Without calibration, the matrix is a document rather than a functioning control.
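The inter-rater agreement measurement required by 4.6 can be as simple as pairwise percent agreement. The sketch below computes it for a hypothetical calibration exercise; the analyst names and classifications are invented for illustration. Percent agreement is the simplest measure — chance-corrected statistics such as Cohen's or Fleiss' kappa are stricter alternatives where the level distribution is skewed.

```python
from itertools import combinations

def pairwise_agreement(classifications: dict) -> float:
    """Fraction of (analyst pair, incident) comparisons that agree.

    `classifications` maps analyst -> list of severity labels, one per
    test incident, with all lists in the same incident order.
    """
    analysts = list(classifications)
    pairs = list(combinations(analysts, 2))
    n_incidents = len(classifications[analysts[0]])
    agree = sum(
        classifications[a][i] == classifications[b][i]
        for a, b in pairs
        for i in range(n_incidents)
    )
    return agree / (len(pairs) * n_incidents)

# Three analysts independently classify the same five test incidents.
results = {
    "analyst_1": ["Critical", "High", "Medium", "Low", "High"],
    "analyst_2": ["Critical", "High", "Low",    "Low", "High"],
    "analyst_3": ["Critical", "High", "Medium", "Low", "Medium"],
}
score = pairwise_agreement(results)
print(f"inter-rater agreement: {score:.0%}")  # → 73%
if score < 0.80:  # clause 4.6 threshold
    print("agreement below 80% -- review and tighten axis thresholds")
```

Disagreements should be traced back to the specific axis and threshold that produced them; the exercise output is a list of thresholds to sharpen, not merely a pass/fail score.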
Regulatory frameworks increasingly require structured incident classification. The EU AI Act Article 62 requires providers to report serious incidents involving high-risk AI systems, with reporting timelines tied to severity. DORA Article 18 requires financial entities to classify ICT-related incidents using defined criteria. The FCA expects firms to have incident classification frameworks that distinguish between incidents requiring immediate supervisory notification and those that do not. A governed severity matrix provides the structured classification mechanism that these frameworks require.
The severity matrix should be implemented as a structured decision framework — not a prose policy document. Each axis should have defined levels with quantitative thresholds where possible and qualitative criteria where quantification is impractical. The matrix should be accessible to incident responders in real time, not locked in a policy library that requires login credentials and document navigation during a crisis.
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Financial institutions must calibrate the financial axis to their specific scale — a £100,000 threshold that is appropriate for a retail bank may be too high for a small asset manager and too low for a global investment bank. The matrix must also account for regulatory notification thresholds: DORA requires reporting of major ICT-related incidents to competent authorities, and the financial axis thresholds should align with DORA's materiality criteria. The rights axis must cover consumer duty obligations under the FCA's Consumer Duty framework, where systematic detriment to retail customers constitutes a rights impact.
Healthcare and Safety-Critical. The safety axis requires granularity beyond binary injury/no-injury. Near-miss events in healthcare and industrial settings are leading indicators that must be classified and tracked. The severity matrix should include a near-miss severity level that, while not triggering full crisis response, triggers root-cause investigation and preventive action. Medical device regulations (MDR, FDA 21 CFR Part 803) impose specific adverse event reporting obligations that the matrix must integrate.
Public Sector and Rights-Sensitive. The rights axis requires particular attention in public-sector deployments where AI agents make or influence decisions affecting individuals' legal rights, benefits, or liberty. Impact on protected characteristics under equality legislation must be a first-class severity consideration, not subordinate to technical accuracy metrics. The matrix should define differential impact thresholds: any measurable differential impact on a protected group exceeding a defined threshold (e.g., 5% relative difference) triggers at least Medium severity on the rights axis regardless of aggregate performance.
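The differential impact threshold described above is straightforward to compute. The sketch below uses hypothetical accuracy figures (chosen to echo Scenario C); the 5% threshold and the metric names are illustrative assumptions, and a real deployment would run this per protected characteristic on each evaluation cycle.

```python
def differential_impact(group_metric: float, overall_metric: float) -> float:
    """Relative difference between a protected group's metric and the
    overall metric (e.g., accuracy or award rate)."""
    return abs(group_metric - overall_metric) / overall_metric

# Hypothetical monthly check: scoring accuracy overall versus for
# wheelchair-user applications.
overall_accuracy = 0.94
group_accuracy = 0.72

impact = differential_impact(group_accuracy, overall_accuracy)
print(f"relative differential impact: {impact:.1%}")  # → 23.4%

# Illustrative policy: a relative difference of 5% or more triggers at
# least Medium on the rights axis, regardless of aggregate performance.
rights_flag = "MEDIUM-OR-ABOVE" if impact >= 0.05 else "ok"
print(rights_flag)
```

Running this check per group makes the rights axis a measured quantity rather than a judgement call, which is what allows the matrix to catch the Scenario C pattern before an external complaint does.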
Crypto and Web3. The financial axis must account for the volatility and irreversibility characteristic of blockchain-based transactions. A classification threshold of £100,000 for traditional finance may need to be lower for crypto operations where losses cannot be reversed and market conditions can change the exposure within minutes. The legal axis must account for the evolving and jurisdiction-dependent regulatory status of crypto assets.
Basic Implementation — The organisation has defined a five-axis severity matrix with at least four severity levels per axis. Quantitative thresholds or qualitative criteria are documented for each level. The composite severity rule is defined and documented. Every agent-related incident is classified using the matrix before response actions are determined. The matrix is version-controlled with change history. Classification is manual — an analyst reviews the incident and selects severity levels for each axis using the documented criteria. Annual calibration exercises are conducted with at least 3 analysts and 10 test incidents.
Intermediate Implementation — All basic capabilities plus: the severity matrix is integrated into incident management tooling, requiring axis-by-axis classification before ticket submission. Automated pre-classification suggests initial severity based on incident metadata. Severity re-evaluation triggers are defined and monitored. Calibration exercises involve at least 5 analysts, 20 test incidents, and formal inter-rater agreement measurement. The matrix maps each severity level to specific response parameters (SLAs, escalation lists, notification requirements). Domain-specific sub-axes are defined for the organisation's primary risk domains.
Advanced Implementation — All intermediate capabilities plus: real-time severity scoring continuously updates composite severity as new information becomes available. The matrix integrates with regulatory notification obligations, automatically determining reporting requirements. Cross-incident pattern detection identifies severity escalation conditions (e.g., recurring incidents suggesting systemic failure). The matrix is independently audited annually. Severity classification data is analysed to identify systematic misclassification patterns and drive matrix refinement. Multi-jurisdictional severity mapping accounts for different regulatory thresholds across operating jurisdictions.
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Multi-Axis Classification Completeness
Test 8.2: Inter-Rater Classification Consistency
Test 8.3: Composite Severity Derivation Correctness
Test 8.4: Response Parameter Mapping Enforcement
Test 8.5: Severity Matrix Version Control and Re-Classification
Test 8.6: Calibration Exercise Execution
Test 8.7: Incident Classification Integration
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 62 (Reporting of Serious Incidents) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| NIST AI RMF | GOVERN 1.5, MANAGE 4.1 (Incident Response) | Supports compliance |
| ISO 42001 | Clause 10.2 (Nonconformity and Corrective Action) | Supports compliance |
| DORA | Article 18 (Classification of ICT-Related Incidents) | Direct requirement |
Article 62 requires providers and deployers of high-risk AI systems to report serious incidents to the relevant market surveillance authorities. "Serious incident" is defined in Article 3(49) as an incident that directly or indirectly leads to death, serious damage to health, serious and irreversible disruption of critical infrastructure, breach of fundamental rights obligations, or serious damage to property or the environment. The severity matrix required by AG-419 provides the classification mechanism that determines whether an incident meets the Article 62 reporting threshold. Without a structured severity matrix, the determination of whether an incident is "serious" under Article 62 becomes an ad hoc judgement made under time pressure — exactly the conditions that produce misclassification. The safety axis maps directly to the death and health damage criteria; the rights axis maps to the fundamental rights breach criterion; the financial axis, combined with the legal axis, maps to the property damage and critical infrastructure criteria.
For SOX-subject organisations, agent-related incidents that affect financial reporting accuracy, completeness, or timeliness are control failures that may constitute material weaknesses. The financial axis of the severity matrix provides the classification mechanism that determines whether an incident has material financial reporting impact. A severity matrix that consistently classifies incidents at the correct financial severity level is an internal control over the incident management process itself — a control-over-controls that SOX auditors will evaluate.
The FCA expects firms to have adequate systems and controls for managing operational risks, including the ability to identify, classify, and respond to incidents proportionally to their severity. SYSC 6.1.1R does not prescribe a specific severity framework, but the FCA's expectations — articulated through supervisory statements and enforcement actions — make clear that firms must demonstrate structured, consistent incident classification. A governed severity matrix with calibration exercises and response parameter mapping provides the structured classification that the FCA expects. The FCA has specifically criticised firms where incident severity was determined by the seniority of the person who happened to notice the incident rather than by a structured assessment of impact.
GOVERN 1.5 addresses processes for escalation and response, requiring that organisations have defined mechanisms for determining when and how to escalate AI-related risks and incidents. The severity matrix provides the structured escalation logic that GOVERN 1.5 requires. MANAGE 4.1 addresses incident response planning and execution, requiring that response actions are proportional to the severity of the incident. AG-419's response parameter mapping ensures proportionality by linking defined response actions to defined severity levels.
ISO 42001 requires organisations to react to nonconformities, evaluate the need for corrective action, and implement corrections proportional to the impact of the nonconformity. The severity matrix provides the structured impact assessment that determines the proportionality of corrective action. Without severity classification, organisations cannot demonstrate that their corrective actions are proportional — they either over-correct for minor issues (wasting resources) or under-correct for major issues (allowing harm to persist).
DORA Article 18 requires financial entities to classify ICT-related incidents using criteria defined in Article 18(1), including the number of clients affected, the duration of the incident, the geographical spread, data losses, the criticality of services affected, and the economic impact. AG-419's multi-axis severity matrix is directly aligned with DORA's classification requirements: the financial axis maps to economic impact, the safety and rights axes map to client impact, and the legal axis captures regulatory compliance breaches. DORA's requirement for consistent classification criteria across the financial entity is met by the governed, calibrated matrix with inter-rater agreement validation.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — affects the response quality and timeliness of every agent-related incident |
Consequence chain: Without a governed severity matrix, incident severity is determined by ad hoc judgement — the subjective assessment of whoever happens to be on call when the incident occurs. The immediate consequence is inconsistent classification: the same incident type is classified as Critical by one analyst and Medium by another, producing wildly different response times, escalation paths, and notification decisions. The downstream consequence is either systematic under-escalation (the more common failure) or systematic over-escalation. Under-escalation is the more dangerous failure mode: a safety incident classified as an equipment issue (Scenario A) leads to additional injuries while response is delayed; a £2.8 million financial loss classified as a data error (Scenario B) compounds to £3.4 million while the board remains uninformed; a systematic rights violation classified as accuracy drift (Scenario C) persists for months affecting thousands of individuals. Over-escalation, while less dangerous, exhausts crisis capacity — if every incident is treated as Critical, the organisation develops crisis fatigue and eventually begins ignoring genuine Critical incidents. The regulatory consequence is severe: Article 62 of the EU AI Act, DORA Article 18, and FCA supervisory expectations all require structured incident classification. An organisation that cannot demonstrate consistent, governed severity classification faces enforcement action when any incident subsequently reveals that classification was inadequate. The ultimate business consequence is liability amplification — courts and regulators assess not only the original incident but the adequacy of the response, and inadequate response driven by misclassification increases both punitive damages and regulatory penalties.
Cross-references: AG-007 (Governance Configuration Control) governs the configuration artefacts within which the severity matrix is maintained. AG-409 (Critical Event Taxonomy Governance) defines the event taxonomy that the severity matrix classifies. AG-420 (Tabletop Exercise Governance) uses the severity matrix as an input to exercise scenario design. AG-423 (Incident Learning Closure Governance) consumes severity classifications to prioritise learning activities. AG-424 (Notification Routing Governance) uses severity levels to determine notification recipients. AG-428 (Crisis Communication Approval Governance) is triggered by Critical and High severity classifications. AG-022 (Behavioural Drift Detection) may generate incidents that require severity classification. AG-019 (Human Escalation & Override Triggers) defines escalation mechanisms that the severity matrix activates.