Critical Event Taxonomy Governance requires that every organisation deploying AI agents establishes, maintains, and enforces a formally governed taxonomy of critical events — the discrete categories of occurrences that must be logged because they carry safety, legal, operational, or financial significance. Without a standardised event taxonomy, logging becomes arbitrary: some teams log everything (generating noise that buries genuine signals), others log too little (creating forensic blind spots), and no two systems classify the same occurrence the same way, making cross-system correlation impossible. This dimension mandates a canonical, versioned, change-controlled classification system for events that matter — ensuring that when an agent makes a consequential decision, triggers a safety boundary, executes a financial transaction, escalates to a human, or encounters an anomaly, the event is captured under a well-defined category with consistent semantics, severity assignment, and required metadata fields.
Scenario A — Inconsistent Event Classification Conceals a Pattern of Safety Boundary Violations: A healthcare organisation deploys three AI agents across triage assessment, medication recommendation, and patient scheduling. Each agent was developed by a different team. The triage agent logs safety boundary events under the category "SAFETY_OVERRIDE" with severity "Critical." The medication agent logs the same class of event under "BOUNDARY_EXCEPTION" with severity "Warning." The patient scheduling agent logs it under "RULE_VIOLATION" with severity "Error." Over a six-month period, each agent individually triggers 4-7 safety boundary events — below the threshold that would trigger investigation for any single agent. However, across all three agents, 17 safety boundary events have occurred, which far exceeds the organisation's cross-agent threshold of 10 per six-month review period. Because the events are classified under three different names with three different severities, no aggregation dashboard, no automated alert, and no quarterly review detects the combined pattern. A patient receives an incorrect medication recommendation that the triage agent had flagged as a boundary case — but no one connected the dots. The adverse outcome costs £480,000 in clinical remediation, litigation, and regulatory penalties.
What went wrong: Three agents logged materially identical events under three different taxonomy categories. No canonical event taxonomy existed to ensure consistent classification. Cross-agent aggregation was impossible because the event names, severity levels, and metadata schemas were incompatible. The pattern that would have triggered intervention was invisible because it was fragmented across three incompatible classification systems. Consequence: undetected safety pattern leading to patient harm, £480,000 in direct costs, regulatory investigation under clinical safety reporting obligations.
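The failure in Scenario A can be made concrete with a small sketch. Assuming a mapping table from each agent's local category to a canonical one (the local category names are taken from the scenario; the canonical name, event shape, and threshold handling are illustrative), cross-agent aggregation becomes a trivial count:

```python
from collections import Counter

# Hypothetical mapping from each agent's local category to the canonical
# taxonomy category. Local names come from the scenario; the canonical
# name and data shapes are illustrative assumptions.
CANONICAL_MAP = {
    ("triage", "SAFETY_OVERRIDE"): "SAFETY_BOUNDARY_VIOLATION",
    ("medication", "BOUNDARY_EXCEPTION"): "SAFETY_BOUNDARY_VIOLATION",
    ("scheduling", "RULE_VIOLATION"): "SAFETY_BOUNDARY_VIOLATION",
}

CROSS_AGENT_THRESHOLD = 10  # per review period, as in the scenario

def aggregate(events):
    """Count events per canonical category across all agents."""
    counts = Counter()
    for agent, category in events:
        canonical = CANONICAL_MAP.get((agent, category))
        if canonical:
            counts[canonical] += 1
    return counts

# Six triage, six medication, five scheduling events: 17 in total.
events = (
    [("triage", "SAFETY_OVERRIDE")] * 6
    + [("medication", "BOUNDARY_EXCEPTION")] * 6
    + [("scheduling", "RULE_VIOLATION")] * 5
)
counts = aggregate(events)
breached = {c: n for c, n in counts.items() if n > CROSS_AGENT_THRESHOLD}
print(breached)  # {'SAFETY_BOUNDARY_VIOLATION': 17} — visible only in aggregate
```

Without the canonical mapping, each of the three local counts (6, 6, 5) sits below the threshold and the aggregate of 17 never surfaces.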
Scenario B — Missing Event Category Creates a Forensic Gap During Regulatory Investigation: A financial-value agent processes insurance claims with automated approval authority up to £50,000. The agent's logging taxonomy includes categories for "CLAIM_APPROVED," "CLAIM_DENIED," "CLAIM_ESCALATED," and "ERROR." Over 14 months, the agent processes 23,000 claims. A regulatory investigation into potential bias discovers that the agent has been approving claims from certain postcodes at a 34% higher rate. Investigators request the audit trail for all decisions where the agent overrode its initial risk assessment — cases where the model's first-pass score recommended denial but the agent ultimately approved. The organisation discovers that no event category exists for "RISK_SCORE_OVERRIDE" or any equivalent. The agent's internal reasoning trace shows these overrides occurred 1,847 times, but they were logged under the generic "CLAIM_APPROVED" category with no distinguishing metadata. Reconstructing which approvals involved risk-score overrides requires re-running the model on all 23,000 claims — a process that takes 6 weeks and costs £215,000, and the regulator questions whether the reconstruction is faithful to the original decisions.
What went wrong: The event taxonomy was designed around final outcomes (approved, denied, escalated) rather than around decision-significant events. The override of an internal risk assessment is a critical event — it represents a point where the agent's behaviour diverged from its initial assessment — but no taxonomy category captured it. The forensic gap meant that a routine investigation became a six-week reconstruction exercise. Consequence: £215,000 in reconstruction costs, regulatory credibility damage, and potential enforcement action for inadequate record-keeping under insurance conduct rules.
Scenario C — Uncontrolled Taxonomy Evolution Produces Semantic Drift: A crypto trading platform deploys an AI agent for market-making operations. The initial event taxonomy defines "POSITION_LIMIT_BREACH" as: "The agent attempted to take a position that would exceed the configured position limit." After a platform upgrade, a developer modifies the logging code so that "POSITION_LIMIT_BREACH" is also emitted when the agent's position approaches 80% of the limit — a warning, not a breach. No taxonomy change-control process exists, so the semantic meaning of the event changes without updating the taxonomy definition, the alerting thresholds, or the compliance dashboards. Over four months, the compliance team receives 340 "POSITION_LIMIT_BREACH" alerts, concludes the system is unreliable (because most alerts are warnings, not actual breaches), and begins ignoring them. When a genuine position limit breach occurs — the agent takes a $2.3 million position exceeding the $1.5 million limit — the alert is dismissed as another false positive. The breach is discovered 47 hours later during a manual reconciliation.
What went wrong: The semantic definition of a critical event category was changed without governance. A "breach" became a "warning" in practice, but the taxonomy, the alerts, and the compliance processes still treated it as a breach. Alert fatigue from the diluted definition caused a genuine breach to be ignored. Consequence: $2.3 million position limit breach undetected for 47 hours, potential regulatory enforcement for inadequate market risk controls, and $890,000 in losses from the unhedged position.
Scope: This dimension applies to every AI agent deployment where the agent performs actions that carry safety, legal, financial, or operational consequences — which, for any agent operating under this governance standard, means every deployment. The taxonomy governs the classification system for events, not the events themselves (which are governed by AG-023). It defines what categories of events exist, what each category means, what metadata each category requires, and how categories relate to severity levels and response obligations. The scope includes the taxonomy's data model, its versioning and change-control process, its semantic definitions, its severity classification scheme, and its cross-system consistency requirements. Organisations that deploy multiple agents, whether developed internally or sourced from third parties, must ensure that all agents emit events classified under the same canonical taxonomy — not merely that each agent has its own logging.
4.1. A conforming system MUST maintain a canonical critical event taxonomy that defines every category of event requiring logging, with each category having a unique identifier, a human-readable name, a formal semantic definition specifying the precise conditions under which the event is emitted, a severity classification, and a required metadata schema.
4.2. A conforming system MUST classify events using a severity scheme with at minimum four levels — Critical, High, Medium, Low — where each level has a formal definition tied to impact thresholds (e.g., Critical: immediate safety risk or financial exposure exceeding a defined threshold; High: potential regulatory non-compliance or operational degradation).
4.3. A conforming system MUST require that every deployed agent emits events using only categories defined in the canonical taxonomy, rejecting or quarantining events that reference undefined categories.
4.4. A conforming system MUST subject taxonomy changes — additions, removals, semantic modifications, severity reclassifications, and metadata schema changes — to a formal change-control process requiring review and approval by at least two individuals with governance authority, neither of whom is the change author.
4.5. A conforming system MUST version the taxonomy with immutable version identifiers, retaining all prior versions with complete change history including timestamps, authors, approval references, and the rationale for each change.
4.6. A conforming system MUST include in the taxonomy at minimum the following event categories (or domain-specific equivalents with documented mapping): safety boundary violation, human escalation trigger, financial threshold breach, data access anomaly, model output override, configuration change, authentication or authorisation failure, and error or exception with potential data integrity impact.
4.7. A conforming system MUST validate on a defined schedule (no less frequently than quarterly) that all deployed agents emit events consistent with the current taxonomy version, detecting agents that use deprecated categories, undefined categories, or incorrect metadata schemas.
4.8. A conforming system SHOULD implement machine-readable taxonomy definitions enabling automated validation of event emissions against the taxonomy at the point of ingestion, rejecting or flagging non-conforming events in real time.
4.9. A conforming system SHOULD define for each event category the expected response obligation — whether the event requires immediate human review, automated escalation, periodic aggregation review, or archival only — and enforce these obligations through integration with alerting and workflow systems.
4.10. A conforming system MAY implement taxonomy simulation, enabling the organisation to model the impact of a proposed taxonomy change on existing dashboards, alerts, aggregation queries, and compliance reports before committing the change.
4.11. A conforming system MAY implement automated semantic drift detection, monitoring whether the conditions under which events are emitted in production remain consistent with the taxonomy's formal semantic definitions.
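Requirements 4.1, 4.3, and 4.8 together imply ingestion-time validation of every emitted event against the canonical taxonomy. A minimal sketch, assuming a dictionary-backed taxonomy and a simple event shape (the category names, metadata fields, and quarantine semantics are illustrative, not normative):

```python
# Illustrative ingestion-time validation (reqs 4.3 and 4.8): events that
# reference undefined categories or lack required metadata are rejected
# and routed to quarantine rather than silently accepted.
TAXONOMY = {
    "SAFETY_BOUNDARY_VIOLATION": {
        "severity": "Critical",
        "required_metadata": {"agent_id", "boundary_id", "action_taken"},
    },
    "FINANCIAL_THRESHOLD_BREACH": {
        "severity": "High",
        "required_metadata": {"agent_id", "threshold", "amount"},
    },
}

def validate_event(event: dict) -> tuple[bool, str]:
    """Return (accepted, reason); non-conforming events go to quarantine."""
    category = TAXONOMY.get(event.get("category"))
    if category is None:
        return False, "undefined category"
    missing = category["required_metadata"] - event.get("metadata", {}).keys()
    if missing:
        return False, f"missing metadata: {sorted(missing)}"
    return True, "ok"

# An event using an agent-local category name is caught at ingestion.
accepted, reason = validate_event({"category": "RULE_VIOLATION", "metadata": {}})
print(accepted, reason)  # False undefined category
```

The design choice worth noting: validation happens at the point of ingestion, so a non-conforming agent is detected on its first emission rather than at the next quarterly review (req 4.7).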
Logging is the foundation of governance observability. An organisation that cannot observe what its agents are doing cannot govern them. But raw logging without taxonomy is like a library without a catalogue — the books exist, but finding the right one requires searching every shelf. A critical event taxonomy transforms logging from a raw data stream into a structured governance instrument.
The primary risk that taxonomy governance addresses is semantic fragmentation. When multiple agents, teams, or systems define their own event categories independently, the same real-world occurrence is classified differently depending on who built the logging. This fragmentation has three consequences. First, cross-agent pattern detection fails. If safety boundary violations are classified under five different names across five agents, no dashboard or alert can aggregate them. Patterns that are visible in aggregate become invisible when fragmented. Second, forensic investigation becomes expensive. When an incident occurs and investigators need to reconstruct what happened, they must first discover and reconcile multiple classification systems before they can query the logs. Third, regulatory reporting becomes unreliable. Regulators increasingly require organisations to report on categories of AI incidents — safety events, bias detections, data breaches. If the internal event taxonomy does not map cleanly to regulatory reporting categories, every report requires manual reconciliation.
The secondary risk is semantic drift — the gradual change in what an event category means in practice, even though its formal definition has not been updated. Semantic drift occurs when developers modify the conditions under which an event is emitted without updating the taxonomy. The category name stays the same, but its meaning changes. This is particularly dangerous for severity classifications: if a "Critical" event is diluted to include non-critical occurrences, alert fatigue follows, and genuine critical events are missed.
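One way to operationalise drift detection (req 4.11) is to attach a machine-checkable predicate to each category, derived from its formal semantic definition, and sample production emissions against it. The sketch below is an assumption about how such a check could work, using Scenario C's position-limit category; the predicate, threshold, and event shape are illustrative:

```python
# Hypothetical semantic drift check: each category carries a predicate
# derived from its formal definition; production emissions are sampled
# and tested against it. All names and thresholds are illustrative.

PREDICATES = {
    # Formal definition: the position actually exceeds the configured limit.
    "POSITION_LIMIT_BREACH": lambda e: e["position"] > e["limit"],
}

DRIFT_THRESHOLD = 0.05  # alert if >5% of emissions violate the definition

def drift_rate(category: str, sample: list[dict]) -> float:
    """Fraction of sampled emissions that violate the category's definition."""
    predicate = PREDICATES[category]
    violations = sum(1 for e in sample if not predicate(e))
    return violations / len(sample)

# After Scenario C's code change, most "breaches" are really 80%-of-limit
# warnings: 90 warnings and 10 genuine breaches in a sample of 100.
sample = [{"position": 0.85, "limit": 1.0}] * 90 + [{"position": 1.2, "limit": 1.0}] * 10
rate = drift_rate("POSITION_LIMIT_BREACH", sample)
print(rate > DRIFT_THRESHOLD)  # True -> the category has semantically drifted
```

A check like this would have flagged Scenario C's dilution within the first sampling window, long before the compliance team learned to ignore the alerts.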
Regulatory frameworks increasingly mandate structured event logging for AI systems. The EU AI Act Article 12 requires that high-risk AI systems support automatic logging of events relevant to identifying risks and facilitating post-market monitoring. The act does not prescribe a taxonomy, but the requirement for "automatic logging of events" implicitly requires a classification system — otherwise, there is no basis for determining which events are "relevant to identifying risks." ISO 42001 Clause 9.1 requires monitoring, measurement, analysis, and evaluation, which presupposes that the organisation knows what to monitor and how to classify what it observes. DORA Article 10 requires ICT-related incident classification, directly mandating a taxonomy for technology-related events.
Without taxonomy governance, the organisation is building governance on an unstructured foundation. Every downstream control — alerting, escalation, forensic investigation, regulatory reporting, drift detection, anomaly detection — depends on events being classified consistently, remaining semantically stable, and comprehensively covering the risk-relevant event space. AG-409 ensures that the classification system itself is governed.
The critical event taxonomy should be implemented as a machine-readable artefact — a structured schema definition, not a prose document. Each event category should be a record with defined fields: category identifier, name, semantic definition, severity level, required metadata fields, optional metadata fields, response obligation, and related categories. The taxonomy is a reference data set consumed by all logging infrastructure.
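The record structure described above could be realised in many ways; the following is one minimal sketch. The field names follow the text, while the frozen-dataclass encoding, the content-hash version identifier (one way to satisfy the immutability requirement of 4.5), and the example values are implementation assumptions:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

# One possible encoding of the taxonomy record described in the text.
@dataclass(frozen=True)
class EventCategory:
    category_id: str
    name: str
    semantic_definition: str
    severity: str                        # Critical | High | Medium | Low
    required_metadata: tuple[str, ...]
    optional_metadata: tuple[str, ...] = ()
    response_obligation: str = "archival only"
    related_categories: tuple[str, ...] = ()

def taxonomy_version_id(categories: list[EventCategory]) -> str:
    """Immutable version identifier: hash of the canonical serialisation,
    so any semantic change produces a new identifier (cf. req 4.5)."""
    canonical = json.dumps([asdict(c) for c in categories], sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

cat = EventCategory(
    category_id="E-001",  # illustrative identifier scheme
    name="Safety boundary violation",
    semantic_definition="The agent attempted an action outside a configured safety boundary.",
    severity="Critical",
    required_metadata=("agent_id", "boundary_id", "action_taken"),
    response_obligation="immediate human review",
)
print(taxonomy_version_id([cat]))  # deterministic for identical content
```

Deriving the version identifier from the content means two taxonomy copies can be compared for semantic equality by comparing identifiers, and no edit can reuse an existing version number.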
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Financial regulators require detailed transaction logging with specific event categories: suspicious transaction detection, threshold breach, limit override, counterparty risk escalation, and market abuse indicator. The taxonomy must map to regulatory reporting formats such as Suspicious Activity Reports (SARs) and transaction reporting obligations under MiFID II. Event categories must support the distinction between attempted and executed transactions, as regulators often require both.
Healthcare and Life Sciences. Clinical safety events require a taxonomy aligned with patient safety reporting frameworks. Categories must distinguish between potential harm (near-miss), actual harm, and system errors that could affect clinical decisions. Integration with adverse event reporting systems (e.g., Yellow Card in the UK, MedWatch in the US) requires taxonomy categories that map to regulatory incident classifications.
Crypto and Web3. On-chain and off-chain events must be taxonomised consistently. Categories must cover smart contract interaction anomalies, wallet access events, gas fee threshold breaches, oracle disagreements, and governance token voting actions. The immutability of blockchain transactions makes pre-execution event logging particularly important — once a transaction is on-chain, it cannot be reversed, so the event taxonomy must capture the decision to execute before execution occurs.
Public Sector. Rights-sensitive event categories must be defined for decisions affecting individual rights: benefit determinations, enforcement actions, access decisions, and eligibility assessments. The taxonomy must support the transparency requirements of administrative law, enabling citizens to understand what events occurred during the processing of their case.
Basic Implementation — The organisation maintains a documented critical event taxonomy in a structured format with unique identifiers, semantic definitions, severity levels, and metadata schemas for each category. The taxonomy covers the mandatory minimum categories. All deployed agents emit events using taxonomy-defined categories. Changes follow a documented approval process with two reviewers. The taxonomy is versioned with change history. Quarterly validation confirms agent compliance with the current taxonomy version.
Intermediate Implementation — The taxonomy is published as a machine-readable schema in a central registry. Agent logging layers validate events against the registry at the point of emission, quarantining non-conforming events. Severity levels drive automated response workflows with defined SLAs. Cross-system semantic alignment is validated through periodic synthetic event injection. The taxonomy maps to regulatory reporting categories. Taxonomy diff reports are generated and distributed on every version change.
Advanced Implementation — All intermediate capabilities plus: automated semantic drift detection monitors whether production event emissions are consistent with taxonomy definitions. Taxonomy simulation models the impact of proposed changes on dashboards, alerts, and compliance reports before commitment. The taxonomy is independently audited annually. Real-time dashboards show taxonomy coverage metrics — the percentage of agent actions that produce at least one taxonomised event — across all deployments. Cross-standard taxonomy mapping (to ISO 42001, NIST AI RMF, DORA incident classification) is maintained and validated.
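The coverage metric mentioned above — the percentage of agent actions that produce at least one taxonomised event — can be computed directly from the action and event streams. A sketch, with assumed data shapes:

```python
# Illustrative taxonomy coverage computation: the fraction of agent actions
# that produced at least one taxonomy-classified event. Data shapes assumed.
def taxonomy_coverage(action_ids: set[str], events: list[dict]) -> float:
    covered = {e["action_id"] for e in events if e.get("category")}
    return len(covered & action_ids) / len(action_ids) if action_ids else 1.0

actions = {"a1", "a2", "a3", "a4"}
events = [
    {"action_id": "a1", "category": "SAFETY_BOUNDARY_VIOLATION"},
    {"action_id": "a2", "category": "CONFIGURATION_CHANGE"},
    {"action_id": "a3", "category": None},  # emitted but unclassified
]
print(taxonomy_coverage(actions, events))  # 0.5 — two of four actions covered
```

A falling coverage figure is an early signal either that agents are taking new classes of action the taxonomy does not yet describe, or that emission logic has regressed.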
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Taxonomy Completeness Against Mandatory Categories
Test 8.2: Event Emission Conformance Validation
Test 8.3: Change-Control Enforcement
Test 8.4: Version Immutability Verification
Test 8.5: Cross-Agent Semantic Consistency
Test 8.6: Quarterly Compliance Validation Execution
Test 8.7: Severity-Response Obligation Integration
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 12 (Record-Keeping / Automatic Logging) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Supports compliance |
| FCA SYSC | 10A.1.6R (Record-Keeping) | Direct requirement |
| NIST AI RMF | GOVERN 1.5, MEASURE 2.6 | Supports compliance |
| ISO 42001 | Clause 9.1 (Monitoring, Measurement, Analysis, Evaluation) | Direct requirement |
| DORA | Article 10 (ICT-Related Incident Classification) | Direct requirement |
Article 12 requires that high-risk AI systems are designed and developed with capabilities enabling the automatic recording of events (logs) over the lifetime of the system. The logs must be appropriate to the intended purpose of the system and must enable monitoring of its operation with respect to identifying situations that may result in risks. This article implicitly mandates a taxonomy — the requirement that logs be "appropriate to the intended purpose" and enable "identifying situations that may result in risks" requires a classification system that determines what constitutes a risk-relevant event. Without AG-409's taxonomy governance, an organisation cannot demonstrate that its logging is systematic, complete, and appropriate to its risk profile. The taxonomy is the mechanism through which the organisation defines "appropriate" and "risk-relevant."
For organisations subject to SOX, AI agents that participate in financial processes (transaction processing, reporting, calculation, approval) must log events that are relevant to financial control effectiveness. SOX auditors require a documented event classification system to determine whether logging is adequate. A taxonomy that defines "financial threshold breach," "model output override," and "configuration change" as distinct event categories — each with defined metadata — provides the structured foundation that auditors need. Without a taxonomy, the auditor must assess logging adequacy event-by-event, which is impractical for agents processing thousands of transactions.
The FCA requires firms to maintain adequate systems and controls (6.1.1R) and to keep orderly records of their business (10A.1.6R). For AI agents in financial services, "orderly records" requires a classification system — unclassified logs are not orderly. The taxonomy provides the order, and taxonomy governance ensures that the classification system is maintained as agent capabilities and regulatory requirements evolve. The FCA's expectation of "adequate" systems and controls requires that logging be comprehensive and consistent, which is only achievable with a governed taxonomy.
GOVERN 1.5 addresses ongoing monitoring of AI systems, which requires knowing what to monitor. The taxonomy defines the monitored event space. MEASURE 2.6 addresses the measurement of AI system performance and impacts, which requires classified events as inputs to measurement. The taxonomy ensures that measurements are based on consistent, well-defined event categories rather than ad hoc log analysis.
Clause 9.1 requires organisations to determine what needs to be monitored and measured, the methods for monitoring and measurement, when monitoring and measuring shall be performed, and when results shall be analysed and evaluated. The critical event taxonomy is the organisation's answer to "what needs to be monitored" — it defines the event categories that constitute the monitoring scope. Without taxonomy governance, the answer to "what needs to be monitored" is unstable, inconsistent, and not independently verifiable.
DORA requires financial entities to classify ICT-related incidents using defined criteria including the number of clients affected, the duration, the geographical spread, and the data losses. AI agent events that constitute or contribute to ICT-related incidents must be classified in a manner that supports DORA incident reporting. The taxonomy must include event categories that map to DORA's incident classification criteria, enabling automated aggregation of agent events into DORA-reportable incident assessments.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — affects the observability, forensic capability, and regulatory reporting for all deployed agents |
Consequence chain: Without a governed critical event taxonomy, each agent logs events under its own classification system, producing fragmented, semantically inconsistent, and non-aggregable log data. The immediate technical failure is loss of cross-agent visibility — patterns that are detectable only in aggregate (such as a rising frequency of safety boundary violations across multiple agents) become invisible because the events are classified under different names. The operational failure that follows is delayed or missed incident detection: genuine critical events are lost in noise (when severity classifications are inconsistent) or are not correlated across agents (when event categories are incompatible). The forensic failure compounds the operational one: when an incident occurs and investigators must reconstruct the event sequence, they first must discover and reconcile multiple classification systems before they can query the logs — a process that adds days or weeks to investigation timelines and costs hundreds of thousands in analyst time. The regulatory failure is the terminal consequence: regulators expect structured, classifiable event logs that support systematic analysis. An organisation that cannot produce a consistent event classification across its agents faces findings under EU AI Act Article 12 (inadequate logging), DORA Article 10 (inadequate incident classification), FCA SYSC 10A.1.6R (disorderly records), and SOX Section 404 (inadequate internal controls). In financial services, the combined regulatory exposure can exceed £1 million in penalties and remediation costs, with secondary consequences including mandatory independent monitoring and restrictions on agent deployment authority.
Cross-references: AG-007 (Governance Configuration Control) governs the configuration artefacts that include the taxonomy as a governed configuration item. AG-023 (Audit Trail Governance) governs the logging infrastructure that consumes the taxonomy. AG-410 (High-Cardinality Trace Retention Governance) relies on the taxonomy to classify which traces require extended retention. AG-414 (Alert Deduplication Governance) consumes the taxonomy's severity classifications to deduplicate alerts. AG-415 (Decision Journal Completeness Governance) uses taxonomy categories to validate that all decision-significant events are journaled. AG-006 (Tamper-Evident Record Integrity) ensures that taxonomy-classified events cannot be altered post-emission. AG-022 (Behavioural Drift Detection) uses taxonomy-classified events as input signals for drift detection.