Adverse Event Severity Matrix Governance requires every organisation operating AI agents to maintain a formally governed, multi-dimensional severity classification matrix that scores agent-caused incidents across safety impact, financial exposure, rights infringement, legal liability, and reputational damage axes. The matrix must produce a deterministic composite severity level (Critical, High, Medium, Low, Informational) for every incident, ensuring that response urgency, notification obligations, escalation paths, and resource allocation are calibrated to actual harm rather than subjective judgement. Without a governed severity matrix, organisations either over-escalate trivial events — consuming crisis capacity on non-crises — or under-escalate catastrophic failures — allowing rights violations or safety hazards to persist while classified as routine operational issues.
Scenario A — Misclassified Safety Incident Causes Delayed Response: A warehouse robotics agent controlling 12 autonomous picking units develops a path-planning fault that causes one unit to collide with a human worker, resulting in a minor shoulder injury. The on-call engineer classifies the incident as "Medium — Equipment Malfunction" using an informal severity guide that does not distinguish between equipment-only collisions and human-contact collisions. Under this classification, the response SLA is 48 hours. The remaining 11 units continue operating with the same path-planning model. Within 36 hours, a second collision occurs — this time pinning a worker against a shelving unit and causing a fractured pelvis requiring surgical intervention. Post-incident review reveals that the severity matrix used by the organisation had no separate axis for physical safety; all collisions were classified under a single "equipment" category.
What went wrong: The severity matrix lacked a safety impact axis that would have elevated a human-contact collision to Critical severity with immediate response (halt all units, root-cause analysis within 2 hours). A single-axis classification collapsed a safety-critical event into a routine equipment category. The 48-hour SLA was appropriate for equipment-only faults but catastrophically slow for a pattern that endangered human life. Consequence: one surgical injury, Health and Safety Executive investigation, £1.2 million in liability claims, 3-week facility shutdown, and criminal health-and-safety prosecution of the operations director.
Scenario B — Financial Severity Axis Missing From Matrix Causes Unreported Exposure: A foreign-exchange hedging agent acting for a mid-sized manufacturer processes a malformed rate feed and executes 47 hedging contracts at a rate 340 basis points below market. The total mark-to-market loss is £2.8 million. The incident management team classifies the event as "High — Data Feed Error" because the severity matrix classifies incidents by root cause (data, logic, infrastructure) rather than by consequence. The "Data Feed Error" classification triggers a 24-hour response SLA and no board notification. The trading desk does not unwind the positions for 18 hours. During that period, the rate moves a further 85 basis points against the position, increasing the loss to £3.4 million. The board learns of the loss 6 days later, after the monthly risk report is produced.
What went wrong: The severity matrix classified by root cause rather than by financial consequence. A £2.8 million loss classified as a data feed error received the same response as a £500 data feed error that caused no trading impact. The matrix had no financial exposure axis that would have elevated any loss exceeding £500,000 to Critical severity with immediate board notification and position freeze. Consequence: £3.4 million total loss (£600,000 of which accrued during the delayed response), FCA supervisory finding for inadequate incident classification under SYSC 6.1.1R, board-level governance review, and £180,000 in external remediation costs.
Scenario C — Rights-Impact Axis Absent Allows Systematic Discrimination to Persist: A public-sector benefits-assessment agent processes 14,000 disability benefit applications per month. A model update introduces a regression that systematically under-scores mobility impairments affecting wheelchair users, reducing award amounts by an average of £127 per month. The incident is classified as "Low — Model Accuracy Drift" because the severity matrix uses only a technical accuracy metric (overall accuracy decreased by 0.3%, which is below the "Medium" threshold of 1.0%). The regression persists for 4 months before an advocacy group files a formal complaint. During those 4 months, 2,340 wheelchair users receive reduced benefits totalling £296,000 in aggregate underpayment. The Equality and Human Rights Commission opens an investigation into indirect disability discrimination.
What went wrong: The severity matrix had no rights-impact axis. A systematic bias affecting a protected characteristic was invisible to a matrix that measured only aggregate technical accuracy. The 0.3% overall accuracy drop masked a 23% accuracy drop for a specific protected group. A rights-impact axis would have flagged any differential impact exceeding 5% on a protected characteristic as Critical severity regardless of aggregate accuracy. Consequence: £296,000 in underpayments requiring retrospective correction, Equality Act 2010 investigation, reputational damage across disability advocacy networks, and 8 months of remediation including independent algorithmic audit.
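The masking effect in Scenario C can be checked with simple arithmetic. The sketch below uses the figures from the scenario (a 0.3% aggregate accuracy drop, a 23% drop for the affected group) to show how small a subgroup needs to be for a severe regression to vanish inside an aggregate metric; the variable names are illustrative.

```python
# Illustrative check of how a large subgroup accuracy drop can hide
# inside a small aggregate drop (figures taken from Scenario C).
subgroup_drop = 0.23      # 23% accuracy drop for wheelchair-user cases
aggregate_drop = 0.003    # 0.3% drop observed across all applications

# If the regression affects only the subgroup, the implied share of
# the caseload that the subgroup represents is:
implied_share = aggregate_drop / subgroup_drop
print(f"implied subgroup share: {implied_share:.1%}")  # ~1.3% of caseload

# A subgroup making up roughly 1.3% of cases can lose 23 points of
# accuracy while moving the aggregate metric by only 0.3 points --
# well below a 1.0% "Medium" threshold keyed to aggregate accuracy.
assert aggregate_drop < 0.01
```

This is why 4.8's rights sub-axes and the differential-impact thresholds discussed later must be measured per group: any threshold applied only to the aggregate metric is structurally blind to concentrated harm.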
Scope: This dimension applies to every organisation operating AI agents where agent actions or outputs can cause harm across any of the five severity axes: safety (physical injury, environmental damage, life-safety system degradation), financial (monetary loss, exposure, misallocation), rights (discrimination, privacy violation, due-process denial, autonomy infringement), legal (regulatory breach, contractual violation, criminal liability), and reputational (public trust erosion, brand damage, stakeholder confidence loss). The scope includes the matrix definition itself, its calibration process, its integration with incident management workflows, and its periodic validation. Organisations operating agents exclusively in sandboxed development environments with no production data or real-world effects may defer full implementation but must implement the matrix before any production deployment.
4.1. A conforming system MUST maintain a multi-dimensional severity matrix that classifies agent-related incidents across at least five independent axes: safety impact, financial exposure, rights infringement, legal liability, and reputational damage, with defined severity levels for each axis.
4.2. A conforming system MUST define at least four severity levels per axis (Critical, High, Medium, Low) with quantitative thresholds or qualitative criteria that are specific enough to produce consistent classification by different analysts reviewing the same incident.
4.3. A conforming system MUST implement a composite severity derivation rule that produces a single overall severity level from the individual axis scores, where the composite severity is at least as high as the maximum individual axis severity (i.e., a Critical rating on any single axis produces a Critical composite rating).
4.4. A conforming system MUST map each composite severity level to defined response parameters including: maximum time-to-acknowledge, maximum time-to-initial-response, escalation recipients, notification obligations, and resource mobilisation requirements.
4.5. A conforming system MUST integrate the severity matrix with the incident management workflow such that every agent-related incident is classified using the matrix before response actions are determined, and the classification is recorded as a permanent incident artefact.
4.6. A conforming system MUST validate the severity matrix at least annually through calibration exercises where multiple independent analysts classify the same set of historical or synthetic incidents, measuring inter-rater agreement and adjusting thresholds where agreement falls below 80%.
4.7. A conforming system MUST version-control the severity matrix, retaining all prior versions with change history, and re-classify any open incidents when the matrix is updated to determine whether the updated matrix changes their severity.
4.8. A conforming system SHOULD define domain-specific sub-axes within each primary axis to capture sector-specific severity factors (e.g., within the safety axis: near-miss, minor injury, major injury, fatality; within the rights axis: individual impact, group impact, systemic impact on protected characteristic).
4.9. A conforming system SHOULD implement automated pre-classification that uses incident metadata (error type, affected population size, monetary values, system identifiers) to suggest an initial severity classification before human review.
4.10. A conforming system SHOULD establish severity escalation triggers — conditions under which an incident's severity is automatically elevated (e.g., when the number of affected individuals exceeds a threshold, when financial exposure crosses a monetary boundary, or when a second incident of the same type occurs within a defined window).
4.11. A conforming system MAY implement real-time severity scoring that continuously updates the composite severity as new information about an incident becomes available, adjusting response parameters dynamically.
4.12. A conforming system MAY integrate the severity matrix with external reporting obligations, automatically determining whether an incident at a given severity level triggers mandatory regulatory notification under applicable frameworks (e.g., EU AI Act serious incident reporting, GDPR breach notification, DORA major ICT-related incident reporting).
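Clauses 4.1, 4.3, and 4.4 can be sketched as a small decision structure. The following is a minimal illustration, not a normative implementation: the SLA values, escalation recipients, and axis scores are placeholder assumptions, and only the max-rule in `composite_severity` is mandated by 4.3.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Ordered severity levels; IntEnum makes max() meaningful."""
    INFORMATIONAL = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

# Clause 4.1: the five mandatory axes.
AXES = ("safety", "financial", "rights", "legal", "reputational")

# Clause 4.4: each composite level maps to response parameters.
# The SLAs and recipient lists below are placeholders, not recommendations.
RESPONSE_PARAMETERS = {
    Severity.CRITICAL: {"ack_minutes": 15, "respond_minutes": 120,
                        "escalate_to": ["incident-commander", "board"]},
    Severity.HIGH:     {"ack_minutes": 60, "respond_minutes": 480,
                        "escalate_to": ["incident-commander"]},
    Severity.MEDIUM:   {"ack_minutes": 240, "respond_minutes": 1440,
                        "escalate_to": ["on-call-lead"]},
    Severity.LOW:      {"ack_minutes": 1440, "respond_minutes": 2880,
                        "escalate_to": ["on-call-engineer"]},
}

def composite_severity(axis_scores: dict) -> Severity:
    """Clause 4.3: composite severity is at least the maximum axis score."""
    missing = set(AXES) - axis_scores.keys()
    if missing:  # clause 4.1: every axis must be scored explicitly
        raise ValueError(f"unscored axes: {sorted(missing)}")
    return max(axis_scores.values())

# Scenario C as it should have been scored: a Critical rights rating
# dominates the composite even though other axes score Low or Medium.
scores = {"safety": Severity.LOW, "financial": Severity.LOW,
          "rights": Severity.CRITICAL, "legal": Severity.MEDIUM,
          "reputational": Severity.MEDIUM}
overall = composite_severity(scores)
print(overall.name, RESPONSE_PARAMETERS[overall])
```

Requiring every axis to be scored, rather than defaulting absent axes to Low, is deliberate: the failure mode in all three scenarios was an axis that was never evaluated at all.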
Incident severity classification is the single most consequential decision in the first minutes of an agent-related incident. The severity level determines how fast the organisation responds, who is notified, what resources are mobilised, and whether external parties (regulators, affected individuals, law enforcement) are informed. A misclassification by even one severity level can mean the difference between a 2-hour response and a 48-hour response — a delay that can allow a safety hazard to injure additional people, a financial exposure to compound, or a rights violation to affect thousands more individuals.
Traditional IT incident severity frameworks (SEV-1 through SEV-4, typically based on service availability and user impact) are inadequate for AI agent incidents for three reasons. First, AI agent incidents frequently cause harm along dimensions that IT frameworks do not measure. A model regression that introduces systematic bias against a protected group has zero impact on service availability but catastrophic impact on rights. An agent that executes financially harmful transactions does so at full service availability, so an availability-based framework registers no incident at all. Second, AI agent incidents often have delayed or distributed impact. A classification error that affects 0.3% of cases may seem trivial in aggregate but may represent a 23% error rate for a specific population — a pattern visible only through a rights-impact axis. Third, AI agent incidents can cross category boundaries simultaneously. A single incident may cause physical injury (safety), financial loss (financial), privacy violation (rights), regulatory breach (legal), and media coverage (reputational). A single-axis framework cannot capture this multi-dimensional harm profile.
The governed severity matrix addresses these gaps by requiring explicit, independent scoring across all relevant harm dimensions. The composite severity rule — where the overall severity is at least the maximum of any individual axis — ensures that a Critical rights violation is treated as a Critical incident even if the financial and safety axes score Low. This prevents the dilution of rights and safety concerns by aggregate metrics, which is the failure mode observed in Scenario C.
Calibration is essential because severity classification is ultimately a human judgement applied under time pressure. Research in incident management consistently shows that different analysts will classify the same incident differently when thresholds are vague. Annual calibration exercises, where multiple analysts independently classify the same incidents and disagreements are resolved through threshold adjustment, produce the inter-rater reliability necessary for consistent, defensible classification. Without calibration, the matrix is a document rather than a functioning control.
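The inter-rater agreement measurement required by 4.6 can be as simple as pairwise percent agreement. The sketch below computes it for a hypothetical calibration exercise; the analyst names and classifications are invented for illustration. Percent agreement is the simplest measure — chance-corrected statistics such as Cohen's or Fleiss' kappa are stricter alternatives where the level distribution is skewed.

```python
from itertools import combinations

def pairwise_agreement(classifications: dict) -> float:
    """Fraction of (analyst pair, incident) comparisons that agree.

    `classifications` maps analyst -> list of severity labels, one per
    test incident, with all lists in the same incident order.
    """
    analysts = list(classifications)
    pairs = list(combinations(analysts, 2))
    n_incidents = len(classifications[analysts[0]])
    agree = sum(
        classifications[a][i] == classifications[b][i]
        for a, b in pairs
        for i in range(n_incidents)
    )
    return agree / (len(pairs) * n_incidents)

# Three analysts independently classify the same five test incidents.
results = {
    "analyst_1": ["Critical", "High", "Medium", "Low", "High"],
    "analyst_2": ["Critical", "High", "Low",    "Low", "High"],
    "analyst_3": ["Critical", "High", "Medium", "Low", "Medium"],
}
score = pairwise_agreement(results)
print(f"inter-rater agreement: {score:.0%}")  # → 73%
if score < 0.80:  # clause 4.6 threshold
    print("agreement below 80% -- review and tighten axis thresholds")
```

Disagreements should be traced back to the specific axis and threshold that produced them; the exercise output is a list of thresholds to sharpen, not merely a pass/fail score.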
Regulatory frameworks increasingly require structured incident classification. The EU AI Act Article 62 requires providers to report serious incidents involving high-risk AI systems, with reporting timelines tied to severity. DORA Article 18 requires financial entities to classify ICT-related incidents using defined criteria. The FCA expects firms to have incident classification frameworks that distinguish between incidents requiring immediate supervisory notification and those that do not. A governed severity matrix provides the structured classification mechanism that these frameworks require.
The severity matrix should be implemented as a structured decision framework — not a prose policy document. Each axis should have defined levels with quantitative thresholds where possible and qualitative criteria where quantification is impractical. The matrix should be accessible to incident responders in real time, not locked in a policy library that requires login credentials and document navigation during a crisis.
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Financial institutions must calibrate the financial axis to their specific scale — a £100,000 threshold that is appropriate for a retail bank may be too high for a small asset manager and too low for a global investment bank. The matrix must also account for regulatory notification thresholds: DORA requires reporting of major ICT-related incidents to competent authorities, and the financial axis thresholds should align with DORA's materiality criteria. The rights axis must cover consumer duty obligations under the FCA's Consumer Duty framework, where systematic detriment to retail customers constitutes a rights impact.
Healthcare and Safety-Critical. The safety axis requires granularity beyond binary injury/no-injury. Near-miss events in healthcare and industrial settings are leading indicators that must be classified and tracked. The severity matrix should include a near-miss severity level that, while not triggering full crisis response, triggers root-cause investigation and preventive action. Medical device regulations (MDR, FDA 21 CFR Part 803) impose specific adverse event reporting obligations that the matrix must integrate.
Public Sector and Rights-Sensitive. The rights axis requires particular attention in public-sector deployments where AI agents make or influence decisions affecting individuals' legal rights, benefits, or liberty. Impact on protected characteristics under equality legislation must be a first-class severity consideration, not subordinate to technical accuracy metrics. The matrix should define differential impact thresholds: any measurable differential impact on a protected group exceeding a defined threshold (e.g., 5% relative difference) triggers at least Medium severity on the rights axis regardless of aggregate performance.
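The differential impact threshold described above is straightforward to compute. The sketch below uses hypothetical accuracy figures (chosen to echo Scenario C); the 5% threshold and the metric names are illustrative assumptions, and a real deployment would run this per protected characteristic on each evaluation cycle.

```python
def differential_impact(group_metric: float, overall_metric: float) -> float:
    """Relative difference between a protected group's metric and the
    overall metric (e.g., accuracy or award rate)."""
    return abs(group_metric - overall_metric) / overall_metric

# Hypothetical monthly check: scoring accuracy overall versus for
# wheelchair-user applications.
overall_accuracy = 0.94
group_accuracy = 0.72

impact = differential_impact(group_accuracy, overall_accuracy)
print(f"relative differential impact: {impact:.1%}")  # → 23.4%

# Illustrative policy: a relative difference of 5% or more triggers at
# least Medium on the rights axis, regardless of aggregate performance.
rights_flag = "MEDIUM-OR-ABOVE" if impact >= 0.05 else "ok"
print(rights_flag)
```

Running this check per group makes the rights axis a measured quantity rather than a judgement call, which is what allows the matrix to catch the Scenario C pattern before an external complaint does.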
Crypto and Web3. The financial axis must account for the volatility and irreversibility characteristic of blockchain-based transactions. A classification threshold of £100,000 for traditional finance may need to be lower for crypto operations where losses cannot be reversed and market conditions can change the exposure within minutes. The legal axis must account for the evolving and jurisdiction-dependent regulatory status of crypto assets.
Basic Implementation — The organisation has defined a five-axis severity matrix with at least four severity levels per axis. Quantitative thresholds or qualitative criteria are documented for each level. The composite severity rule is defined and documented. Every agent-related incident is classified using the matrix before response actions are determined. The matrix is version-controlled with change history. Classification is manual — an analyst reviews the incident and selects severity levels for each axis using the documented criteria. Annual calibration exercises are conducted with at least 3 analysts and 10 test incidents.
Intermediate Implementation — All basic capabilities plus: the severity matrix is integrated into incident management tooling, requiring axis-by-axis classification before ticket submission. Automated pre-classification suggests initial severity based on incident metadata. Severity re-evaluation triggers are defined and monitored. Calibration exercises involve at least 5 analysts, 20 test incidents, and formal inter-rater agreement measurement. The matrix maps each severity level to specific response parameters (SLAs, escalation lists, notification requirements). Domain-specific sub-axes are defined for the organisation's primary risk domains.
Advanced Implementation — All intermediate capabilities plus: real-time severity scoring continuously updates composite severity as new information becomes available. The matrix integrates with regulatory notification obligations, automatically determining reporting requirements. Cross-incident pattern detection identifies severity escalation conditions (e.g., recurring incidents suggesting systemic failure). The matrix is independently audited annually. Severity classification data is analysed to identify systematic misclassification patterns and drive matrix refinement. Multi-jurisdictional severity mapping accounts for different regulatory thresholds across operating jurisdictions.
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Multi-Axis Classification Completeness
Test 8.2: Inter-Rater Classification Consistency
Test 8.3: Composite Severity Derivation Correctness
Test 8.4: Response Parameter Mapping Enforcement
Test 8.5: Severity Matrix Version Control and Re-Classification
Test 8.6: Calibration Exercise Execution
Test 8.7: Incident Classification Integration
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 62 (Reporting of Serious Incidents) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| NIST AI RMF | GOVERN 1.5, MANAGE 4.1 (Incident Response) | Supports compliance |
| ISO 42001 | Clause 10.2 (Nonconformity and Corrective Action) | Supports compliance |
| DORA | Article 18 (Classification of ICT-Related Incidents) | Direct requirement |
Article 62 requires providers and deployers of high-risk AI systems to report serious incidents to the relevant market surveillance authorities. "Serious incident" is defined in Article 3(49) as an incident that directly or indirectly leads to death, serious damage to health, serious and irreversible disruption of critical infrastructure, breach of fundamental rights obligations, or serious damage to property or the environment. The severity matrix required by AG-419 provides the classification mechanism that determines whether an incident meets the Article 62 reporting threshold. Without a structured severity matrix, the determination of whether an incident is "serious" under Article 62 becomes an ad hoc judgement made under time pressure — exactly the conditions that produce misclassification. The safety axis maps directly to the death and health damage criteria; the rights axis maps to the fundamental rights breach criterion; the financial axis, combined with the legal axis, maps to the property damage and critical infrastructure criteria.
For SOX-subject organisations, agent-related incidents that affect financial reporting accuracy, completeness, or timeliness are control failures that may constitute material weaknesses. The financial axis of the severity matrix provides the classification mechanism that determines whether an incident has material financial reporting impact. A severity matrix that consistently classifies incidents at the correct financial severity level is an internal control over the incident management process itself — a control-over-controls that SOX auditors will evaluate.
The FCA expects firms to have adequate systems and controls for managing operational risks, including the ability to identify, classify, and respond to incidents proportionally to their severity. SYSC 6.1.1R does not prescribe a specific severity framework, but the FCA's expectations — articulated through supervisory statements and enforcement actions — make clear that firms must demonstrate structured, consistent incident classification. A governed severity matrix with calibration exercises and response parameter mapping provides the structured classification that the FCA expects. The FCA has specifically criticised firms where incident severity was determined by the seniority of the person who happened to notice the incident rather than by a structured assessment of impact.
GOVERN 1.5 addresses processes for escalation and response, requiring that organisations have defined mechanisms for determining when and how to escalate AI-related risks and incidents. The severity matrix provides the structured escalation logic that GOVERN 1.5 requires. MANAGE 4.1 addresses incident response planning and execution, requiring that response actions are proportional to the severity of the incident. AG-419's response parameter mapping ensures proportionality by linking defined response actions to defined severity levels.
ISO 42001 requires organisations to react to nonconformities, evaluate the need for corrective action, and implement corrections proportional to the impact of the nonconformity. The severity matrix provides the structured impact assessment that determines the proportionality of corrective action. Without severity classification, organisations cannot demonstrate that their corrective actions are proportional — they either over-correct for minor issues (wasting resources) or under-correct for major issues (allowing harm to persist).
DORA Article 18 requires financial entities to classify ICT-related incidents using criteria defined in Article 18(1), including the number of clients affected, the duration of the incident, the geographical spread, data losses, the criticality of services affected, and the economic impact. AG-419's multi-axis severity matrix is directly aligned with DORA's classification requirements: the financial axis maps to economic impact, the safety and rights axes map to client impact, and the legal axis captures regulatory compliance breaches. DORA's requirement for consistent classification criteria across the financial entity is met by the governed, calibrated matrix with inter-rater agreement validation.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — affects the response quality and timeliness of every agent-related incident |
Consequence chain: Without a governed severity matrix, incident severity is determined by ad hoc judgement — the subjective assessment of whoever happens to be on call when the incident occurs. The immediate consequence is inconsistent classification: the same incident type is classified as Critical by one analyst and Medium by another, producing wildly different response times, escalation paths, and notification decisions. The downstream consequence is either systematic under-escalation (the more common failure) or systematic over-escalation. Under-escalation is the more dangerous failure mode: a safety incident classified as an equipment issue (Scenario A) leads to additional injuries while response is delayed; a £2.8 million financial loss classified as a data error (Scenario B) compounds to £3.4 million while the board remains uninformed; a systematic rights violation classified as accuracy drift (Scenario C) persists for months affecting thousands of individuals. Over-escalation, while less dangerous, exhausts crisis capacity — if every incident is treated as Critical, the organisation develops crisis fatigue and eventually begins ignoring genuine Critical incidents. The regulatory consequence is severe: Article 62 of the EU AI Act, DORA Article 18, and FCA supervisory expectations all require structured incident classification. An organisation that cannot demonstrate consistent, governed severity classification faces enforcement action when any incident subsequently reveals that classification was inadequate. The ultimate business consequence is liability amplification — courts and regulators assess not only the original incident but the adequacy of the response, and inadequate response driven by misclassification increases both punitive damages and regulatory penalties.
Cross-references: AG-007 (Governance Configuration Control) governs the configuration artefacts within which the severity matrix is maintained. AG-409 (Critical Event Taxonomy Governance) defines the event taxonomy that the severity matrix classifies. AG-420 (Tabletop Exercise Governance) uses the severity matrix as an input to exercise scenario design. AG-423 (Incident Learning Closure Governance) consumes severity classifications to prioritise learning activities. AG-424 (Notification Routing Governance) uses severity levels to determine notification recipients. AG-428 (Crisis Communication Approval Governance) is triggered by Critical and High severity classifications. AG-022 (Behavioural Drift Detection) may generate incidents that require severity classification. AG-019 (Human Escalation & Override Triggers) defines escalation mechanisms that the severity matrix activates.