Critical Event Taxonomy Governance requires that every organisation deploying AI agents establishes, maintains, and enforces a formally governed taxonomy of critical events — the discrete categories of occurrences that must be logged because they carry safety, legal, operational, or financial significance. Without a standardised event taxonomy, logging becomes arbitrary: some teams log everything (generating noise that buries genuine signals), others log too little (creating forensic blind spots), and no two systems classify the same occurrence the same way, making cross-system correlation impossible. This dimension mandates a canonical, versioned, change-controlled classification system for events that matter — ensuring that when an agent makes a consequential decision, triggers a safety boundary, executes a financial transaction, escalates to a human, or encounters an anomaly, the event is captured under a well-defined category with consistent semantics, severity assignment, and required metadata fields.
Scenario A — Inconsistent Event Classification Conceals a Pattern of Safety Boundary Violations: A healthcare organisation deploys three AI agents across triage assessment, medication recommendation, and patient scheduling. Each agent was developed by a different team. The triage agent logs safety boundary events under the category "SAFETY_OVERRIDE" with severity "Critical." The medication agent logs the same class of event under "BOUNDARY_EXCEPTION" with severity "Warning." The patient scheduling agent logs it under "RULE_VIOLATION" with severity "Error." Over a six-month period, each agent individually triggers 4-7 safety boundary events — below the threshold that would trigger investigation for any single agent. However, across all three agents, 17 safety boundary events have occurred, which far exceeds the organisation's cross-agent threshold of 10 per six-month review period. Because the events are classified under three different names with three different severities, no aggregation dashboard, no automated alert, and no quarterly review detects the combined pattern. A patient receives an incorrect medication recommendation that the triage agent had flagged as a boundary case — but no one connected the dots. The adverse outcome costs £480,000 in clinical remediation, litigation, and regulatory penalties.
What went wrong: Three agents logged materially identical events under three different taxonomy categories. No canonical event taxonomy existed to ensure consistent classification. Cross-agent aggregation was impossible because the event names, severity levels, and metadata schemas were incompatible. The pattern that would have triggered intervention was invisible because it was fragmented across three incompatible classification systems. Consequence: undetected safety pattern leading to patient harm, £480,000 in direct costs, regulatory investigation under clinical safety reporting obligations.
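The failure in Scenario A can be made concrete with a small sketch. Assuming a mapping table from each agent's local category to a canonical one (the local category names are taken from the scenario; the canonical name, event shape, and threshold handling are illustrative), cross-agent aggregation becomes a trivial count:

```python
from collections import Counter

# Hypothetical mapping from each agent's local category to the canonical
# taxonomy category. Local names come from the scenario; the canonical
# name and data shapes are illustrative assumptions.
CANONICAL_MAP = {
    ("triage", "SAFETY_OVERRIDE"): "SAFETY_BOUNDARY_VIOLATION",
    ("medication", "BOUNDARY_EXCEPTION"): "SAFETY_BOUNDARY_VIOLATION",
    ("scheduling", "RULE_VIOLATION"): "SAFETY_BOUNDARY_VIOLATION",
}

CROSS_AGENT_THRESHOLD = 10  # per review period, as in the scenario

def aggregate(events):
    """Count events per canonical category across all agents."""
    counts = Counter()
    for agent, category in events:
        canonical = CANONICAL_MAP.get((agent, category))
        if canonical:
            counts[canonical] += 1
    return counts

# Six triage, six medication, five scheduling events: 17 in total.
events = (
    [("triage", "SAFETY_OVERRIDE")] * 6
    + [("medication", "BOUNDARY_EXCEPTION")] * 6
    + [("scheduling", "RULE_VIOLATION")] * 5
)
counts = aggregate(events)
breached = {c: n for c, n in counts.items() if n > CROSS_AGENT_THRESHOLD}
print(breached)  # {'SAFETY_BOUNDARY_VIOLATION': 17} — visible only in aggregate
```

Without the canonical mapping, each of the three local counts (6, 6, 5) sits below the threshold and the aggregate of 17 never surfaces.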
Scenario B — Missing Event Category Creates a Forensic Gap During Regulatory Investigation: A financial-value agent processes insurance claims with automated approval authority up to £50,000. The agent's logging taxonomy includes categories for "CLAIM_APPROVED," "CLAIM_DENIED," "CLAIM_ESCALATED," and "ERROR." Over 14 months, the agent processes 23,000 claims. A regulatory investigation into potential bias discovers that the agent has been approving claims from certain postcodes at a 34% higher rate. Investigators request the audit trail for all decisions where the agent overrode its initial risk assessment — cases where the model's first-pass score recommended denial but the agent ultimately approved. The organisation discovers that no event category exists for "RISK_SCORE_OVERRIDE" or any equivalent. The agent's internal reasoning trace shows these overrides occurred 1,847 times, but they were logged under the generic "CLAIM_APPROVED" category with no distinguishing metadata. Reconstructing which approvals involved risk-score overrides requires re-running the model on all 23,000 claims — a process that takes 6 weeks and costs £215,000, and the regulator questions whether the reconstruction is faithful to the original decisions.
What went wrong: The event taxonomy was designed around final outcomes (approved, denied, escalated) rather than around decision-significant events. The override of an internal risk assessment is a critical event — it represents a point where the agent's behaviour diverged from its initial assessment — but no taxonomy category captured it. The forensic gap meant that a routine investigation became a six-week reconstruction exercise. Consequence: £215,000 in reconstruction costs, regulatory credibility damage, and potential enforcement action for inadequate record-keeping under insurance conduct rules.
Scenario C — Uncontrolled Taxonomy Evolution Produces Semantic Drift: A crypto trading platform deploys an AI agent for market-making operations. The initial event taxonomy defines "POSITION_LIMIT_BREACH" as: "The agent attempted to take a position that would exceed the configured position limit." After a platform upgrade, a developer modifies the logging code so that "POSITION_LIMIT_BREACH" is also emitted when the agent's position approaches 80% of the limit — a warning, not a breach. No taxonomy change-control process exists, so the semantic meaning of the event changes without updating the taxonomy definition, the alerting thresholds, or the compliance dashboards. Over four months, the compliance team receives 340 "POSITION_LIMIT_BREACH" alerts, concludes the system is unreliable (because most alerts are warnings, not actual breaches), and begins ignoring them. When a genuine position limit breach occurs — the agent takes a $2.3 million position exceeding the $1.5 million limit — the alert is dismissed as another false positive. The breach is discovered 47 hours later during a manual reconciliation.
What went wrong: The semantic definition of a critical event category was changed without governance. A "breach" became a "warning" in practice, but the taxonomy, the alerts, and the compliance processes still treated it as a breach. Alert fatigue from the diluted definition caused a genuine breach to be ignored. Consequence: $2.3 million position limit breach undetected for 47 hours, potential regulatory enforcement for inadequate market risk controls, and $890,000 in losses from the unhedged position.
Scope: This dimension applies to every AI agent deployment where the agent performs actions that carry safety, legal, financial, or operational consequences — which, for any agent operating under this governance standard, means every deployment. The taxonomy governs the classification system for events, not the events themselves (which are governed by AG-023). It defines what categories of events exist, what each category means, what metadata each category requires, and how categories relate to severity levels and response obligations. The scope includes the taxonomy's data model, its versioning and change-control process, its semantic definitions, its severity classification scheme, and its cross-system consistency requirements. Organisations that deploy multiple agents, whether developed internally or sourced from third parties, must ensure that all agents emit events classified under the same canonical taxonomy — not merely that each agent has its own logging.
4.1. A conforming system MUST maintain a canonical critical event taxonomy that defines every category of event requiring logging, with each category having a unique identifier, a human-readable name, a formal semantic definition specifying the precise conditions under which the event is emitted, a severity classification, and a required metadata schema.
4.2. A conforming system MUST classify events using a severity scheme with at minimum four levels — Critical, High, Medium, Low — where each level has a formal definition tied to impact thresholds (e.g., Critical: immediate safety risk or financial exposure exceeding a defined threshold; High: potential regulatory non-compliance or operational degradation).
4.3. A conforming system MUST require that every deployed agent emits events using only categories defined in the canonical taxonomy, rejecting or quarantining events that reference undefined categories.
4.4. A conforming system MUST subject taxonomy changes — additions, removals, semantic modifications, severity reclassifications, and metadata schema changes — to a formal change-control process requiring review and approval by at least two individuals with governance authority, neither of whom is the change author.
4.5. A conforming system MUST version the taxonomy with immutable version identifiers, retaining all prior versions with complete change history including timestamps, authors, approval references, and the rationale for each change.
4.6. A conforming system MUST include in the taxonomy at minimum the following event categories (or domain-specific equivalents with documented mapping): safety boundary violation, human escalation trigger, financial threshold breach, data access anomaly, model output override, configuration change, authentication or authorisation failure, and error or exception with potential data integrity impact.
4.7. A conforming system MUST validate on a defined schedule (no less frequently than quarterly) that all deployed agents emit events consistent with the current taxonomy version, detecting agents that use deprecated categories, undefined categories, or incorrect metadata schemas.
4.8. A conforming system SHOULD implement machine-readable taxonomy definitions enabling automated validation of event emissions against the taxonomy at the point of ingestion, rejecting or flagging non-conforming events in real time.
4.9. A conforming system SHOULD define for each event category the expected response obligation — whether the event requires immediate human review, automated escalation, periodic aggregation review, or archival only — and enforce these obligations through integration with alerting and workflow systems.
4.10. A conforming system MAY implement taxonomy simulation, enabling the organisation to model the impact of a proposed taxonomy change on existing dashboards, alerts, aggregation queries, and compliance reports before committing the change.
4.11. A conforming system MAY implement automated semantic drift detection, monitoring whether the conditions under which events are emitted in production remain consistent with the taxonomy's formal semantic definitions.
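Requirements 4.1, 4.3, and 4.8 together imply ingestion-time validation of every emitted event against the canonical taxonomy. A minimal sketch, assuming a dictionary-backed taxonomy and a simple event shape (the category names, metadata fields, and quarantine semantics are illustrative, not normative):

```python
# Illustrative ingestion-time validation (reqs 4.3 and 4.8): events that
# reference undefined categories or lack required metadata are rejected
# and routed to quarantine rather than silently accepted.
TAXONOMY = {
    "SAFETY_BOUNDARY_VIOLATION": {
        "severity": "Critical",
        "required_metadata": {"agent_id", "boundary_id", "action_taken"},
    },
    "FINANCIAL_THRESHOLD_BREACH": {
        "severity": "High",
        "required_metadata": {"agent_id", "threshold", "amount"},
    },
}

def validate_event(event: dict) -> tuple[bool, str]:
    """Return (accepted, reason); non-conforming events go to quarantine."""
    category = TAXONOMY.get(event.get("category"))
    if category is None:
        return False, "undefined category"
    missing = category["required_metadata"] - event.get("metadata", {}).keys()
    if missing:
        return False, f"missing metadata: {sorted(missing)}"
    return True, "ok"

# An event using an agent-local category name is caught at ingestion.
accepted, reason = validate_event({"category": "RULE_VIOLATION", "metadata": {}})
print(accepted, reason)  # False undefined category
```

The design choice worth noting: validation happens at the point of ingestion, so a non-conforming agent is detected on its first emission rather than at the next quarterly review (req 4.7).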
Logging is the foundation of governance observability. An organisation that cannot observe what its agents are doing cannot govern them. But raw logging without taxonomy is like a library without a catalogue — the books exist, but finding the right one requires searching every shelf. A critical event taxonomy transforms logging from a raw data stream into a structured governance instrument.
The primary risk that taxonomy governance addresses is semantic fragmentation. When multiple agents, teams, or systems define their own event categories independently, the same real-world occurrence is classified differently depending on who built the logging. This fragmentation has three consequences. First, cross-agent pattern detection fails. If safety boundary violations are classified under five different names across five agents, no dashboard or alert can aggregate them. Patterns that are visible in aggregate become invisible when fragmented. Second, forensic investigation becomes expensive. When an incident occurs and investigators need to reconstruct what happened, they must first discover and reconcile multiple classification systems before they can query the logs. Third, regulatory reporting becomes unreliable. Regulators increasingly require organisations to report on categories of AI incidents — safety events, bias detections, data breaches. If the internal event taxonomy does not map cleanly to regulatory reporting categories, every report requires manual reconciliation.
The secondary risk is semantic drift — the gradual change in what an event category means in practice, even though its formal definition has not been updated. Semantic drift occurs when developers modify the conditions under which an event is emitted without updating the taxonomy. The category name stays the same, but its meaning changes. This is particularly dangerous for severity classifications: if a "Critical" event is diluted to include non-critical occurrences, alert fatigue follows, and genuine critical events are missed.
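One way to operationalise drift detection (req 4.11) is to attach a machine-checkable predicate to each category, derived from its formal semantic definition, and sample production emissions against it. The sketch below is an assumption about how such a check could work, using Scenario C's position-limit category; the predicate, threshold, and event shape are illustrative:

```python
# Hypothetical semantic drift check: each category carries a predicate
# derived from its formal definition; production emissions are sampled
# and tested against it. All names and thresholds are illustrative.

PREDICATES = {
    # Formal definition: the position actually exceeds the configured limit.
    "POSITION_LIMIT_BREACH": lambda e: e["position"] > e["limit"],
}

DRIFT_THRESHOLD = 0.05  # alert if >5% of emissions violate the definition

def drift_rate(category: str, sample: list[dict]) -> float:
    """Fraction of sampled emissions that violate the category's definition."""
    predicate = PREDICATES[category]
    violations = sum(1 for e in sample if not predicate(e))
    return violations / len(sample)

# After Scenario C's code change, most "breaches" are really 80%-of-limit
# warnings: 90 warnings and 10 genuine breaches in a sample of 100.
sample = [{"position": 0.85, "limit": 1.0}] * 90 + [{"position": 1.2, "limit": 1.0}] * 10
rate = drift_rate("POSITION_LIMIT_BREACH", sample)
print(rate > DRIFT_THRESHOLD)  # True -> the category has semantically drifted
```

A check like this would have flagged Scenario C's dilution within the first sampling window, long before the compliance team learned to ignore the alerts.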
Regulatory frameworks increasingly mandate structured event logging for AI systems. The EU AI Act Article 12 requires that high-risk AI systems support automatic logging of events relevant to identifying risks and facilitating post-market monitoring. The act does not prescribe a taxonomy, but the requirement for "automatic logging of events" implicitly requires a classification system — otherwise, there is no basis for determining which events are "relevant to identifying risks." ISO 42001 Clause 9.1 requires monitoring, measurement, analysis, and evaluation, which presupposes that the organisation knows what to monitor and how to classify what it observes. DORA Article 10 requires ICT-related incident classification, directly mandating a taxonomy for technology-related events.
Without taxonomy governance, the organisation is building governance on an unstructured foundation. Every downstream control — alerting, escalation, forensic investigation, regulatory reporting, drift detection, anomaly detection — depends on events being classified consistently, remaining semantically stable, and comprehensively covering the risk-relevant event space. AG-409 ensures that the classification system itself is governed.
The critical event taxonomy should be implemented as a machine-readable artefact — a structured schema definition, not a prose document. Each event category should be a record with defined fields: category identifier, name, semantic definition, severity level, required metadata fields, optional metadata fields, response obligation, and related categories. The taxonomy is a reference data set consumed by all logging infrastructure.
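The record structure described above could be realised in many ways; the following is one minimal sketch. The field names follow the text, while the frozen-dataclass encoding, the content-hash version identifier (one way to satisfy the immutability requirement of 4.5), and the example values are implementation assumptions:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

# One possible encoding of the taxonomy record described in the text.
@dataclass(frozen=True)
class EventCategory:
    category_id: str
    name: str
    semantic_definition: str
    severity: str                        # Critical | High | Medium | Low
    required_metadata: tuple[str, ...]
    optional_metadata: tuple[str, ...] = ()
    response_obligation: str = "archival only"
    related_categories: tuple[str, ...] = ()

def taxonomy_version_id(categories: list[EventCategory]) -> str:
    """Immutable version identifier: hash of the canonical serialisation,
    so any semantic change produces a new identifier (cf. req 4.5)."""
    canonical = json.dumps([asdict(c) for c in categories], sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

cat = EventCategory(
    category_id="E-001",  # illustrative identifier scheme
    name="Safety boundary violation",
    semantic_definition="The agent attempted an action outside a configured safety boundary.",
    severity="Critical",
    required_metadata=("agent_id", "boundary_id", "action_taken"),
    response_obligation="immediate human review",
)
print(taxonomy_version_id([cat]))  # deterministic for identical content
```

Deriving the version identifier from the content means two taxonomy copies can be compared for semantic equality by comparing identifiers, and no edit can reuse an existing version number.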
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Financial regulators require detailed transaction logging with specific event categories: suspicious transaction detection, threshold breach, limit override, counterparty risk escalation, and market abuse indicator. The taxonomy must map to regulatory reporting formats such as Suspicious Activity Reports (SARs) and transaction reporting obligations under MiFID II. Event categories must support the distinction between attempted and executed transactions, as regulators often require both.
Healthcare and Life Sciences. Clinical safety events require a taxonomy aligned with patient safety reporting frameworks. Categories must distinguish between potential harm (near-miss), actual harm, and system errors that could affect clinical decisions. Integration with adverse event reporting systems (e.g., Yellow Card in the UK, MedWatch in the US) requires taxonomy categories that map to regulatory incident classifications.
Crypto and Web3. On-chain and off-chain events must be taxonomised consistently. Categories must cover smart contract interaction anomalies, wallet access events, gas fee threshold breaches, oracle disagreements, and governance token voting actions. The immutability of blockchain transactions makes pre-execution event logging particularly important — once a transaction is on-chain, it cannot be reversed, so the event taxonomy must capture the decision to execute before execution occurs.
Public Sector. Rights-sensitive event categories must be defined for decisions affecting individual rights: benefit determinations, enforcement actions, access decisions, and eligibility assessments. The taxonomy must support the transparency requirements of administrative law, enabling citizens to understand what events occurred during the processing of their case.
Basic Implementation — The organisation maintains a documented critical event taxonomy in a structured format with unique identifiers, semantic definitions, severity levels, and metadata schemas for each category. The taxonomy covers the mandatory minimum categories. All deployed agents emit events using taxonomy-defined categories. Changes follow a documented approval process with two reviewers. The taxonomy is versioned with change history. Quarterly validation confirms agent compliance with the current taxonomy version.
Intermediate Implementation — The taxonomy is published as a machine-readable schema in a central registry. Agent logging layers validate events against the registry at the point of emission, quarantining non-conforming events. Severity levels drive automated response workflows with defined SLAs. Cross-system semantic alignment is validated through periodic synthetic event injection. The taxonomy maps to regulatory reporting categories. Taxonomy diff reports are generated and distributed on every version change.
Advanced Implementation — All intermediate capabilities plus: automated semantic drift detection monitors whether production event emissions are consistent with taxonomy definitions. Taxonomy simulation models the impact of proposed changes on dashboards, alerts, and compliance reports before commitment. The taxonomy is independently audited annually. Real-time dashboards show taxonomy coverage metrics — the percentage of agent actions that produce at least one taxonomised event — across all deployments. Cross-standard taxonomy mapping (to ISO 42001, NIST AI RMF, DORA incident classification) is maintained and validated.
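The coverage metric mentioned above — the percentage of agent actions that produce at least one taxonomised event — can be computed directly from the action and event streams. A sketch, with assumed data shapes:

```python
# Illustrative taxonomy coverage computation: the fraction of agent actions
# that produced at least one taxonomy-classified event. Data shapes assumed.
def taxonomy_coverage(action_ids: set[str], events: list[dict]) -> float:
    covered = {e["action_id"] for e in events if e.get("category")}
    return len(covered & action_ids) / len(action_ids) if action_ids else 1.0

actions = {"a1", "a2", "a3", "a4"}
events = [
    {"action_id": "a1", "category": "SAFETY_BOUNDARY_VIOLATION"},
    {"action_id": "a2", "category": "CONFIGURATION_CHANGE"},
    {"action_id": "a3", "category": None},  # emitted but unclassified
]
print(taxonomy_coverage(actions, events))  # 0.5 — two of four actions covered
```

A falling coverage figure is an early signal either that agents are taking new classes of action the taxonomy does not yet describe, or that emission logic has regressed.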
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Taxonomy Completeness Against Mandatory Categories
Test 8.2: Event Emission Conformance Validation
Test 8.3: Change-Control Enforcement
Test 8.4: Version Immutability Verification
Test 8.5: Cross-Agent Semantic Consistency
Test 8.6: Quarterly Compliance Validation Execution
Test 8.7: Severity-Response Obligation Integration
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 12 (Record-Keeping / Automatic Logging) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Supports compliance |
| FCA SYSC | 10A.1.6R (Record-Keeping) | Direct requirement |
| NIST AI RMF | GOVERN 1.5, MEASURE 2.6 | Supports compliance |
| ISO 42001 | Clause 9.1 (Monitoring, Measurement, Analysis, Evaluation) | Direct requirement |
| DORA | Article 10 (ICT-Related Incident Classification) | Direct requirement |
Article 12 requires that high-risk AI systems are designed and developed with capabilities enabling the automatic recording of events (logs) over the lifetime of the system. The logs must be appropriate to the intended purpose of the system and must enable monitoring of its operation with respect to identifying situations that may result in risks. This article implicitly mandates a taxonomy — the requirement that logs be "appropriate to the intended purpose" and enable "identifying situations that may result in risks" requires a classification system that determines what constitutes a risk-relevant event. Without AG-409's taxonomy governance, an organisation cannot demonstrate that its logging is systematic, complete, and appropriate to its risk profile. The taxonomy is the mechanism through which the organisation defines "appropriate" and "risk-relevant."
For organisations subject to SOX, AI agents that participate in financial processes (transaction processing, reporting, calculation, approval) must log events that are relevant to financial control effectiveness. SOX auditors require a documented event classification system to determine whether logging is adequate. A taxonomy that defines "financial threshold breach," "model output override," and "configuration change" as distinct event categories — each with defined metadata — provides the structured foundation that auditors need. Without a taxonomy, the auditor must assess logging adequacy event-by-event, which is impractical for agents processing thousands of transactions.
The FCA requires firms to maintain adequate systems and controls (6.1.1R) and to keep orderly records of their business (10A.1.6R). For AI agents in financial services, "orderly records" requires a classification system — unclassified logs are not orderly. The taxonomy provides the order, and taxonomy governance ensures that the classification system is maintained as agent capabilities and regulatory requirements evolve. The FCA's expectation of "adequate" systems and controls requires that logging be comprehensive and consistent, which is only achievable with a governed taxonomy.
GOVERN 1.5 addresses ongoing monitoring of AI systems, which requires knowing what to monitor. The taxonomy defines the monitored event space. MEASURE 2.6 addresses the measurement of AI system performance and impacts, which requires classified events as inputs to measurement. The taxonomy ensures that measurements are based on consistent, well-defined event categories rather than ad hoc log analysis.
Clause 9.1 requires organisations to determine what needs to be monitored and measured, the methods for monitoring and measurement, when monitoring and measuring shall be performed, and when results shall be analysed and evaluated. The critical event taxonomy is the organisation's answer to "what needs to be monitored" — it defines the event categories that constitute the monitoring scope. Without taxonomy governance, the answer to "what needs to be monitored" is unstable, inconsistent, and not independently verifiable.
DORA requires financial entities to classify ICT-related incidents using defined criteria including the number of clients affected, the duration, the geographical spread, and the data losses. AI agent events that constitute or contribute to ICT-related incidents must be classified in a manner that supports DORA incident reporting. The taxonomy must include event categories that map to DORA's incident classification criteria, enabling automated aggregation of agent events into DORA-reportable incident assessments.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — affects the observability, forensic capability, and regulatory reporting for all deployed agents |
Consequence chain: Without a governed critical event taxonomy, each agent logs events under its own classification system, producing fragmented, semantically inconsistent, and non-aggregable log data. The immediate technical failure is loss of cross-agent visibility — patterns that are detectable only in aggregate (such as a rising frequency of safety boundary violations across multiple agents) become invisible because the events are classified under different names. The operational failure that follows is delayed or missed incident detection: genuine critical events are lost in noise (when severity classifications are inconsistent) or are not correlated across agents (when event categories are incompatible). The forensic failure compounds the operational one: when an incident occurs and investigators must reconstruct the event sequence, they first must discover and reconcile multiple classification systems before they can query the logs — a process that adds days or weeks to investigation timelines and costs hundreds of thousands in analyst time. The regulatory failure is the terminal consequence: regulators expect structured, classifiable event logs that support systematic analysis. An organisation that cannot produce a consistent event classification across its agents faces findings under EU AI Act Article 12 (inadequate logging), DORA Article 10 (inadequate incident classification), FCA SYSC 10A.1.6R (disorderly records), and SOX Section 404 (inadequate internal controls). In financial services, the combined regulatory exposure can exceed £1 million in penalties and remediation costs, with secondary consequences including mandatory independent monitoring and restrictions on agent deployment authority.
Cross-references: AG-007 (Governance Configuration Control) governs the configuration artefacts that include the taxonomy as a governed configuration item. AG-023 (Audit Trail Governance) governs the logging infrastructure that consumes the taxonomy. AG-410 (High-Cardinality Trace Retention Governance) relies on the taxonomy to classify which traces require extended retention. AG-414 (Alert Deduplication Governance) consumes the taxonomy's severity classifications to deduplicate alerts. AG-415 (Decision Journal Completeness Governance) uses taxonomy categories to validate that all decision-significant events are journaled. AG-006 (Tamper-Evident Record Integrity) ensures that taxonomy-classified events cannot be altered post-emission. AG-022 (Behavioural Drift Detection) uses taxonomy-classified events as input signals for drift detection.