AG-689

Abuse Taxonomy Governance

Community Platforms, Trust & Safety · AGS v2.1 · April 2026
Tags: EU AI Act · NIST · ISO 42001

2. Summary

Abuse Taxonomy Governance requires organisations operating community platforms, marketplaces, or any user-facing agent-moderated environment to maintain a structured, versioned, and governed taxonomy of abuse types, associated harms, severity tiers, and corresponding moderation responses. The taxonomy serves as the authoritative reference that determines how an AI moderation agent classifies user behaviour, selects enforcement actions, and routes cases for specialist review or victim support. Without a governed taxonomy, moderation agents operate against informal, inconsistent, or stale definitions of abuse. The resulting enforcement decisions are arbitrary, disproportionate, or systematically biased against specific user populations, and the organisation cannot demonstrate that its moderation decisions follow a principled and auditable framework.

3. Example

Scenario A — Stale Taxonomy Fails to Cover Emerging Abuse Vector: A social platform with 28 million monthly active users deploys an AI moderation agent trained on an abuse taxonomy last updated 14 months prior. During that interval, a coordinated harassment tactic emerges in which users post apparently benign images alongside coded textual references that, in combination, constitute targeted threats against ethnic minority communities. The taxonomy contains categories for explicit hate speech, graphic violence, and direct threats — but no category for coded or contextual abuse where the harm arises from the combination of otherwise-innocuous elements. Over a 9-week period, the moderation agent processes 4,200 reports related to this tactic and closes 3,740 (89%) as "no violation found" because no individual element matches an existing taxonomy category. Affected users submit 1,860 appeals, 78% of which are upheld by human reviewers who recognise the coordinated pattern — but only after an average delay of 6.3 days per case. A civil society organisation publishes a report documenting 312 instances of the tactic with screenshots, naming the platform. The platform's stock price drops 4.2% in the following trading session, regulatory authorities in two jurisdictions open inquiries into the platform's content moderation practices, and the platform ultimately pays £3.8 million in legal settlements to affected individuals and advocacy groups.

What went wrong: The abuse taxonomy had no governance process requiring periodic review against emerging abuse vectors. The 14-month gap between taxonomy updates meant the agent's classification framework was structurally unable to recognise a tactic that had become prevalent 8 months earlier. The absence of a governed review cadence — with threat intelligence inputs, community feedback loops, and researcher engagement — left the taxonomy static in a dynamic threat landscape. Consequence: 3,740 incorrectly dismissed reports, 6.3-day average appeal delay for affected users, regulatory inquiries in two jurisdictions, £3.8 million in legal settlements, and measurable reputational harm.

Scenario B — Inconsistent Severity Mapping Produces Disproportionate Enforcement: A marketplace platform with 12 million registered sellers deploys an AI agent to moderate listing violations. The abuse taxonomy defines 47 violation types but assigns severity tiers to only 23 of them — the remaining 24 are classified as "general violations" without differentiated severity. The moderation agent applies a default enforcement action of 72-hour account suspension to all general violations. In practice, this means a seller who misspells a product dimension in a listing description receives the same 72-hour suspension as a seller who posts a counterfeit safety certification. Over a 6-month period, 14,300 sellers receive 72-hour suspensions for minor listing errors that, under a properly tiered taxonomy, would warrant only a warning or listing correction notice. An analysis by the platform's trust and safety team reveals that 62% of suspended sellers reduce their listing activity by more than 40% in the month following reinstatement — representing an estimated £8.2 million in lost transaction volume. Meanwhile, 340 sellers who posted fraudulent safety certifications received the same 72-hour suspension and resumed the same behaviour within days of reinstatement, because the enforcement action was insufficient for the severity of the violation. A product safety regulator identifies 17 listings with fraudulent certifications that remained active during the 72-hour suspension window, involving consumer products later subject to safety recalls.

What went wrong: The taxonomy lacked complete severity-tier coverage, forcing the moderation agent to apply uniform enforcement to violations of vastly different severity. No governance process ensured that every taxonomy category carried a severity classification and a proportionate enforcement response. The result was simultaneous over-enforcement against minor violations and under-enforcement against serious ones. Consequence: 14,300 disproportionate suspensions, £8.2 million in lost transaction volume from chilled seller activity, continued circulation of fraudulent safety certifications, and regulatory exposure for product safety failures.

Scenario C — Jurisdiction-Blind Taxonomy Violates Local Law: A cross-border community platform operates in 34 countries and deploys a single AI moderation agent using a unified abuse taxonomy developed primarily by a trust and safety team based in the United States. The taxonomy categorises "political satire targeting public officials" as protected expression and instructs the agent to take no moderation action. In three jurisdictions where the platform operates — Thailand, Turkey, and Vietnam — satirical content targeting heads of state or senior government officials constitutes a criminal offence under lese-majesty or insult laws. Over a 4-month period, the agent applies the "protected expression" classification to 2,100 pieces of content that local law enforcement authorities in those jurisdictions consider criminal. Authorities in Thailand issue a blocking order affecting the platform's entire local operation, impacting 1.4 million users. Turkish authorities impose a fine of ₺18 million (approximately £480,000) for failure to remove content within the statutory 48-hour window. Vietnamese authorities require the platform to localise its moderation operations or face a complete service ban. The platform spends £2.1 million on emergency localisation of its taxonomy, legal representation across three jurisdictions, and remediation of the affected content.

What went wrong: The taxonomy was not governed with jurisdictional variation in mind. No requirement existed for the taxonomy to map abuse categories to jurisdiction-specific legal definitions, or for the moderation agent to apply jurisdiction-aware classification logic. A single global taxonomy applied a US-centric free expression framework to jurisdictions with fundamentally different legal standards. Consequence: Service blocking in Thailand affecting 1.4 million users, £480,000 fine in Turkey, threat of service ban in Vietnam, and £2.1 million in emergency remediation costs.

4. Requirement Statement

Scope: This dimension applies to every deployment where an AI agent performs, assists, or recommends moderation decisions on user-generated content, user behaviour, marketplace listings, or community interactions. The scope covers all taxonomic structures that the agent uses to classify abuse — whether the taxonomy is embedded in the agent's training data, encoded in rule systems, maintained as configuration, or referenced via external policy documents. The scope extends to the full lifecycle of the taxonomy: initial creation, periodic review, version control, jurisdictional adaptation, severity-tier assignment, enforcement-response mapping, and retirement of obsolete categories. The scope includes both the taxonomy itself and the governance processes that ensure the taxonomy remains current, complete, proportionate, and legally compliant across all jurisdictions in which the agent operates. Organisations that operate in multiple jurisdictions or serve diverse community populations face heightened obligations under this dimension because taxonomic gaps or misclassifications produce enforcement errors at scale.

4.1. A conforming system MUST maintain a structured abuse taxonomy that enumerates every abuse type the moderation agent is designed to detect, with each type defined by: a unique identifier, a human-readable name, a precise textual definition with inclusion and exclusion criteria, at least two illustrative examples, an assigned severity tier, and one or more mapped enforcement responses proportionate to the severity.
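
The entry structure of Requirement 4.1 can be made concrete in code. The following sketch shows one way such an entry might be represented; the field names and the three-tier SeverityTier scale are illustrative assumptions, not part of this protocol.

```python
from dataclasses import dataclass
from enum import Enum

class SeverityTier(Enum):
    """Illustrative three-tier scale; Requirement 4.4 mandates no fewer than three tiers."""
    LOW = 1
    MEDIUM = 2
    CRITICAL = 3

@dataclass(frozen=True)
class TaxonomyEntry:
    """One abuse type, carrying every field Requirement 4.1 enumerates."""
    entry_id: str                     # unique identifier, e.g. "HAR-012"
    name: str                         # human-readable name
    definition: str                   # precise textual definition
    inclusion_criteria: list[str]     # what falls inside the category
    exclusion_criteria: list[str]     # what explicitly falls outside it
    examples: list[str]               # at least two illustrative examples
    severity: SeverityTier            # assigned severity tier
    enforcement_responses: list[str]  # severity-proportionate actions

    def __post_init__(self) -> None:
        if len(self.examples) < 2:
            raise ValueError(f"{self.entry_id}: Requirement 4.1 demands two or more examples")
        if not self.enforcement_responses:
            raise ValueError(f"{self.entry_id}: at least one enforcement response is required")
```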

4.2. A conforming system MUST version-control the abuse taxonomy such that every change — addition, modification, deprecation, or removal of any taxonomy entry — is recorded with a timestamp, the identity of the authorising individual or governance body, a rationale for the change, and the effective date from which the moderation agent begins applying the updated classification.
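
A change record satisfying Requirement 4.2 might take the following shape; the field names and the version-string format are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date, datetime
from enum import Enum

class ChangeKind(Enum):
    ADDITION = "addition"
    MODIFICATION = "modification"
    DEPRECATION = "deprecation"
    REMOVAL = "removal"

@dataclass(frozen=True)
class TaxonomyChange:
    """One change record, carrying every field Requirement 4.2 enumerates."""
    taxonomy_version: str  # version produced by this change, e.g. "2026.04.1"
    entry_id: str          # the taxonomy entry affected
    kind: ChangeKind       # addition, modification, deprecation, or removal
    recorded_at: datetime  # timestamp of the change
    authorised_by: str     # individual or governance body
    rationale: str         # why the change was made
    effective_date: date   # when the agent begins applying the new classification
```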

4.3. A conforming system MUST conduct taxonomy review cycles at a minimum frequency of once every 90 days, incorporating inputs from: threat intelligence on emerging abuse vectors, community reporting data and appeal outcomes, civil society and researcher engagement, regulatory developments in all operating jurisdictions, and internal moderation quality audits.

4.4. A conforming system MUST assign a severity tier to every abuse category in the taxonomy, using a defined severity scale of no fewer than three tiers, and map each severity tier to a set of proportionate enforcement responses such that no two abuse categories of materially different severity produce identical enforcement outcomes.
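
The no-identical-outcomes condition is mechanically checkable. The sketch below, assuming the TaxonomyEntry shape from the earlier example, flags groups of categories whose severity tiers differ but whose mapped enforcement responses are identical.

```python
from collections import defaultdict

def enforcement_differentiation_violations(entries):
    """Return groups of entry identifiers whose severity tiers differ but whose
    mapped enforcement responses are identical, which Requirement 4.4 forbids.
    `entries` is any iterable of objects with .entry_id, .severity, and
    .enforcement_responses attributes, as in the TaxonomyEntry sketch above."""
    by_response_set = defaultdict(list)
    for entry in entries:
        by_response_set[frozenset(entry.enforcement_responses)].append(entry)
    violations = []
    for group in by_response_set.values():
        if len({e.severity for e in group}) > 1:  # same enforcement, different severity
            violations.append(sorted(e.entry_id for e in group))
    return violations
```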

4.5. A conforming system MUST ensure that the abuse taxonomy accounts for jurisdictional variation in the legal definition of prohibited content and conduct, such that the moderation agent applies jurisdiction-appropriate classifications when operating in or serving users in jurisdictions with differing legal standards for the same category of expression or behaviour.
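
One common way to satisfy this requirement is a common base taxonomy with per-jurisdiction overlays. A minimal sketch, assuming simple dictionary lookups; the function and parameter names are illustrative.

```python
def resolve_classification(entry_id, jurisdiction, base_taxonomy, overlays):
    """Return the jurisdiction-effective taxonomy entry (Requirement 4.5).
    `base_taxonomy` maps entry_id -> entry; `overlays` maps
    (jurisdiction, entry_id) -> replacement entry. An overlay replaces the
    base entry wholesale for that jurisdiction -- for example, reclassifying
    content that one jurisdiction treats as protected expression but another
    treats as a statutory offence."""
    override = overlays.get((jurisdiction, entry_id))
    return override if override is not None else base_taxonomy[entry_id]
```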

4.6. A conforming system MUST maintain a mapping between each abuse taxonomy category and the corresponding moderation response, including: the range of available enforcement actions (warning, content removal, temporary restriction, permanent suspension, law enforcement referral), the criteria for selecting among available actions, escalation triggers for specialist human review per AG-691, and victim support routing criteria per AG-694.

4.7. A conforming system MUST log every moderation decision made by the agent with reference to the specific taxonomy category applied, the taxonomy version in effect at the time of the decision, the severity tier, and the enforcement action selected, such that any moderation decision can be traced back to its taxonomic basis.
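
A minimal log-record sketch carrying the traceability fields of Requirement 4.7, serialised as JSON lines; the field names and the sink interface are assumptions for illustration.

```python
from dataclasses import asdict, dataclass
import json

@dataclass(frozen=True)
class ModerationDecisionRecord:
    """One log entry carrying the traceability fields of Requirement 4.7."""
    decision_id: str
    content_id: str
    taxonomy_category: str   # e.g. "HAR-012"
    taxonomy_version: str    # taxonomy version in effect at decision time
    severity_tier: str
    enforcement_action: str
    decided_at: str          # ISO 8601 UTC timestamp

def log_decision(record: ModerationDecisionRecord, sink) -> None:
    """Serialise the record as one JSON line to an append-only sink
    (integrity of the sink itself is governed by AG-055)."""
    sink.write(json.dumps(asdict(record)) + "\n")
```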

4.8. A conforming system MUST establish a process for emergency taxonomy updates — additions or modifications that take effect in fewer than 72 hours — to address novel abuse vectors that cause imminent harm and cannot wait for the next scheduled review cycle, with emergency updates subject to post-hoc ratification by the taxonomy governance body within 14 days.
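
The 14-day ratification deadline can be monitored mechanically. The sketch below assumes each emergency update is tracked as a dictionary with effective_date and ratified fields; that shape is an illustrative assumption, not a protocol mandate.

```python
from datetime import date, timedelta

def overdue_ratifications(emergency_updates, today=None):
    """Return emergency taxonomy updates whose 14-day post-hoc ratification
    window under Requirement 4.8 has lapsed without governance-body sign-off.
    Each update is assumed to be a dict with 'entry_id', 'effective_date'
    (a date), and 'ratified' (a bool) -- an illustrative shape only."""
    today = today or date.today()
    window = timedelta(days=14)
    return [u for u in emergency_updates
            if not u["ratified"] and today - u["effective_date"] > window]
```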

4.9. A conforming system SHOULD engage external subject-matter experts — including civil society organisations, academic researchers, and affected community representatives — in taxonomy review cycles to identify blind spots, cultural biases, and emerging harms that internal teams may not recognise.

4.10. A conforming system SHOULD implement automated monitoring of the moderation agent's classification distribution to detect taxonomy coverage gaps, indicated by rising volumes of reports that the agent classifies as "no violation" or "uncategorised" while human reviewers on appeal subsequently find violations.

4.11. A conforming system SHOULD maintain a taxonomy coverage metric — the percentage of user-reported abuse incidents that the agent can classify into a specific taxonomy category rather than a residual "other" or "uncategorised" bucket — and target a coverage rate of 95% or higher.
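
Both the coverage metric of Requirement 4.11 and the gap signal of Requirement 4.10 reduce to simple arithmetic over the agent's classification output. A minimal sketch, assuming "other" and "uncategorised" as the residual bucket labels:

```python
RESIDUAL_LABELS = {"other", "uncategorised"}  # assumed residual bucket names

def taxonomy_coverage(classified_reports):
    """Compute the Requirement 4.11 coverage metric: the share of reports the
    agent maps to a specific taxonomy category rather than a residual bucket.
    Returns None for an empty input."""
    reports = list(classified_reports)
    if not reports:
        return None
    covered = sum(1 for category in reports if category not in RESIDUAL_LABELS)
    return covered / len(reports)

# Example usage against the 95% target of Requirement 4.11:
# if taxonomy_coverage(last_30_days_labels) < 0.95:
#     alert_taxonomy_governance_board()  # hypothetical alerting hook
```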

4.12. A conforming system MAY implement taxonomy simulation testing, where proposed taxonomy changes are applied retrospectively to a sample of historical moderation decisions to assess the impact on classification accuracy, enforcement proportionality, and consistency before the changes are deployed to the production agent.
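
A retrospective simulation can be as simple as re-running a historical sample under both taxonomy versions and measuring how outcomes shift. The sketch below assumes classifier callables returning (category, enforcement_action) pairs; that interface is an assumption for illustration.

```python
def simulate_taxonomy_change(historical_cases, classify_old, classify_new):
    """Retrospectively apply a proposed taxonomy change (Requirement 4.12):
    re-run a sample of historical cases under both taxonomy versions and
    summarise how many classifications and enforcement outcomes would shift.
    `classify_old` and `classify_new` are callables taking a case and
    returning a (category, enforcement_action) pair -- assumed interfaces."""
    shifted_category = shifted_enforcement = 0
    for case in historical_cases:
        old_cat, old_act = classify_old(case)
        new_cat, new_act = classify_new(case)
        shifted_category += old_cat != new_cat
        shifted_enforcement += old_act != new_act
    n = len(historical_cases)
    return {
        "sample_size": n,
        "category_shift_rate": shifted_category / n if n else None,
        "enforcement_shift_rate": shifted_enforcement / n if n else None,
    }
```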

4.13. A conforming system MAY publish a user-facing summary of the abuse taxonomy — excluding operational details that could facilitate evasion — to provide transparency about what types of behaviour are prohibited and what enforcement responses users can expect.

5. Rationale

The abuse taxonomy is the single most consequential governance artefact in any AI-moderated community or marketplace environment. Every moderation decision the agent makes — every piece of content removed, every account suspended, every report dismissed — traces back to the taxonomy. If the taxonomy is incomplete, the agent cannot recognise abuse types it has no category for. If the taxonomy is imprecise, the agent misclassifies borderline cases. If the taxonomy is jurisdiction-blind, the agent applies culturally or legally inappropriate standards. If the taxonomy is stale, the agent fails to detect tactics that have evolved since the taxonomy was last updated. The taxonomy is not a static reference document; it is a living governance instrument that must evolve in response to adversarial innovation, regulatory change, community feedback, and emerging research on online harms.

The governance challenge is that abuse taxonomies tend to degrade over time without active stewardship. New abuse vectors emerge continuously as bad actors adapt to moderation systems — a phenomenon known as adversarial evolution. Coordinated inauthentic behaviour, coded language, context-dependent harassment, AI-generated synthetic media, and cross-platform coordinated campaigns are examples of abuse tactics that did not exist or were not well-understood when many current taxonomies were first drafted. Without a governed review cadence, the taxonomy falls behind the threat landscape, and the moderation agent's effectiveness degrades proportionally. The 90-day review cycle specified in Requirement 4.3 reflects the observed pace of adversarial adaptation on major platforms, where novel tactics typically emerge and proliferate within 60–120 days.

Severity tiering and proportionate enforcement mapping address a different but equally critical problem: disproportionate moderation. Without explicit severity tiers, moderation agents default to uniform enforcement — applying the same action to a listing typo and a fraudulent safety certification, or to a first-time minor community guideline infraction and a repeated pattern of targeted harassment. Disproportionate enforcement produces two simultaneous failures: over-enforcement that chills legitimate participation (users who receive harsh penalties for minor infractions reduce their engagement or leave the platform) and under-enforcement that fails to deter serious abuse (bad actors who receive the same penalty as minor violators are not meaningfully sanctioned). Research on platform economics consistently shows that over-enforcement of minor violations costs more in lost user engagement than the revenue saved by reduced moderation workload.

Jurisdictional variation is the third pillar of taxonomy governance. The legal definition of prohibited speech and conduct varies dramatically across jurisdictions. Content that is constitutionally protected in the United States may be criminal in Germany (incitement to hatred under Section 130 StGB), Thailand (lese-majesty under Section 112 of the Criminal Code), or India (Section 153A of the Indian Penal Code on promoting enmity between groups). A globally deployed moderation agent that applies a single taxonomy without jurisdictional adaptation will inevitably violate local law in some jurisdictions while failing to protect users in others. The EU Digital Services Act Article 14 requires platforms to include information about content moderation practices in their terms of service, and Article 17 requires clear and specific statements of reasons for content moderation decisions — both of which presuppose a structured taxonomy that users can understand and contest. The German NetzDG, the Turkish Law No. 7253, and the Australian Online Safety Act each impose jurisdiction-specific content removal obligations with statutory timelines that the taxonomy must encode to enable compliance.

Finally, the taxonomy must be auditable. Every moderation decision must be traceable to a specific taxonomy version and category. Without this traceability, the organisation cannot demonstrate that its moderation decisions are principled rather than arbitrary, cannot conduct meaningful quality audits of agent performance, cannot produce evidence to regulators that enforcement actions follow a governed framework, and cannot provide users with the specific and meaningful statements of reasons that the DSA and similar regulations require.

6. Implementation Guidance

Abuse Taxonomy Governance requires the integration of a structured classification system with version control, governance workflows, jurisdictional mapping, and continuous monitoring. The taxonomy is both a policy artefact (defining what constitutes abuse) and a technical artefact (encoding classifications the agent applies at runtime). Both aspects require governance.
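
To make the dual nature concrete, the sketch below strings the earlier illustrative pieces into a single runtime path: classification, jurisdiction resolution, enforcement selection, and traceable logging. All names and interfaces are assumptions carried over from the sketches in Section 4.

```python
def moderate(report, jurisdiction, resolver, sink):
    """End-to-end sketch of the taxonomy as a runtime artefact: classify a
    report, resolve the jurisdiction-effective entry, select the mapped
    enforcement response, and emit a traceable log line. `report` is assumed
    to be a dict with 'id' and 'predicted_entry' keys, and `resolver` a
    callable wrapping the resolve_classification sketch from Section 4 --
    illustrative interfaces, not protocol-mandated ones."""
    entry = resolver(report["predicted_entry"], jurisdiction)
    action = entry.enforcement_responses[0]  # simplest proportionate selection
    sink.write(f"{report['id']}\t{entry.entry_id}\t{entry.severity.name}\t{action}\n")
    return action
```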

Recommended patterns:

- Encode the taxonomy as versioned configuration that the agent reads at runtime, rather than embedding definitions solely in training data, so that updates take effect without retraining (Requirements 4.1 and 4.2).
- Implement jurisdictional variation as a systematic overlay architecture on a common base taxonomy, not as ad-hoc exceptions (Requirement 4.5).
- Feed appeal outcomes, threat intelligence, and external expert input into every 90-day review cycle (Requirements 4.3 and 4.9).
- Monitor the classification distribution for rising "no violation" and "uncategorised" volumes as early indicators of coverage gaps (Requirements 4.10 and 4.11).
- Rehearse the emergency update pathway before it is needed, and ratify emergency changes within the 14-day window (Requirement 4.8).

Anti-patterns to avoid:

- A single global taxonomy that applies one jurisdiction's legal framework everywhere (Scenario C).
- Severity tiers assigned to only a subset of categories, with a uniform default enforcement action for the remainder (Scenario B).
- A static taxonomy with no governed review cadence, left to drift behind the threat landscape (Scenario A).
- Moderation decision logs that omit the taxonomy version in effect, making decisions untraceable to their taxonomic basis.
- A growing residual "other" bucket treated as acceptable rather than as a coverage-gap signal.

Industry Considerations

Social Media and Community Platforms. These platforms face the broadest range of abuse types and the fastest adversarial evolution. Taxonomy governance must address hate speech, harassment, coordinated inauthentic behaviour, non-consensual intimate imagery, child sexual abuse material (CSAM), terrorism and violent extremism content, self-harm and suicide content, misinformation, and platform manipulation. CSAM and terrorism content require specialised handling — including mandatory law enforcement referral in most jurisdictions — and should be governed by dedicated sub-taxonomies with the highest severity tier and no discretionary enforcement flexibility. The EU Digital Services Act Article 34 requires Very Large Online Platforms to conduct annual risk assessments addressing systemic risks, including the dissemination of illegal content — the taxonomy is the foundation for this risk assessment.

Marketplace and E-Commerce Platforms. Marketplace abuse taxonomies must cover counterfeit goods, fraudulent listings, prohibited product categories (varying by jurisdiction), fake reviews, seller impersonation, and unsafe product listings. Product safety violations require particular taxonomic precision because under-enforcement has physical safety consequences for consumers. The EU Digital Services Act Article 31 and the EU Product Safety Regulation impose specific obligations on marketplace platforms regarding unsafe product detection and removal. Marketplace taxonomies should cross-reference product safety databases (RAPEX, CPSC recalls) to ensure that known-unsafe products are mapped to the highest severity tier.

Gaming and Interactive Platforms. Gaming environments face abuse types that are context-dependent and may not map neatly to categories developed for text-based social media. Voice chat abuse, in-game griefing, real-money trading fraud, minor safety risks, and doxxing require game-specific taxonomy categories. Severity tiers must account for the immersive nature of gaming environments — verbal abuse in a voice-chat session may cause greater immediate psychological harm than equivalent text-based abuse because of the real-time, interpersonal dynamic.

Public Sector and Rights-Sensitive Platforms. Government-operated platforms — including civic engagement portals, public consultation forums, and government-managed social channels — face heightened obligations regarding freedom of expression. Abuse taxonomies for public sector platforms must be carefully calibrated to avoid over-moderation that suppresses legitimate political speech or citizen criticism of government. The taxonomy should be developed with explicit freedom of expression impact assessments and should be subject to independent review.

Maturity Model

Basic Implementation — The organisation maintains a structured abuse taxonomy with unique identifiers, definitions, and severity tiers for all categories. The taxonomy is version-controlled with change history. Review cycles occur at minimum every 90 days. Every moderation decision is logged with reference to the taxonomy category and version. Enforcement responses are mapped to severity tiers. Jurisdictional requirements are identified for all operating jurisdictions. All mandatory requirements (4.1 through 4.8) are satisfied.

Intermediate Implementation — All basic capabilities plus: external subject-matter experts participate in taxonomy review cycles. Automated monitoring detects taxonomy coverage gaps through classification distribution analysis. Appeal overturn rates are tracked per category and fed back into taxonomy reviews. Jurisdictional overlays are implemented as a systematic architecture rather than ad-hoc exceptions. Taxonomy changes undergo impact assessment with retrospective simulation for high-impact changes. Emergency update processes have been exercised at least once and refined based on the experience.

Advanced Implementation — All intermediate capabilities plus: the taxonomy coverage metric is tracked and exceeds 95%. Taxonomy simulation testing is routinely applied to proposed changes before deployment. The organisation publishes a user-facing taxonomy summary for transparency. Cross-platform threat intelligence sharing per AG-697 informs taxonomy reviews. The Taxonomy Governance Board includes external civil society representation with voting authority. Independent audit annually validates taxonomy currency, severity calibration, jurisdictional completeness, and enforcement proportionality. Taxonomy governance metrics are reported to the board of directors or equivalent senior oversight body.

7. Evidence Requirements

Required artefacts:

- The current abuse taxonomy with complete per-entry fields (Requirement 4.1) and full version history (Requirement 4.2).
- Records of each taxonomy review cycle, including the inputs considered and the resulting changes (Requirement 4.3).
- The severity-tier scale and the category-to-enforcement-response mapping (Requirements 4.4 and 4.6).
- Jurisdictional mapping documentation for every operating jurisdiction (Requirement 4.5).
- Moderation decision logs recording taxonomy category, version, severity tier, and enforcement action (Requirement 4.7).
- Emergency update records with post-hoc ratification evidence (Requirement 4.8).

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Taxonomy Completeness and Structural Integrity

Test 8.2: Version Control Integrity

Test 8.3: Review Cycle Cadence Compliance

Test 8.4: Severity Tier Completeness and Enforcement Differentiation

Test 8.5: Jurisdictional Variation Coverage

Test 8.6: Moderation Decision Traceability

Test 8.7: Enforcement Response Mapping Completeness

Test 8.8: Emergency Update Capability

Conformance Scoring

9. Regulatory Mapping

| Regulation | Provision | Relationship Type |
|---|---|---|
| EU Digital Services Act | Article 14 (Terms of Service) | Direct requirement |
| EU Digital Services Act | Article 17 (Statement of Reasons) | Direct requirement |
| EU Digital Services Act | Article 34 (Risk Assessment) | Supports compliance |
| EU Digital Services Act | Article 40 (Data Access for Research) | Supports compliance |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| EU AI Act | Article 14 (Human Oversight) | Supports compliance |
| German NetzDG | Sections 1-3 (Content Removal Obligations) | Direct requirement |
| UK Online Safety Act | Sections 10-12 (Illegal Content Duties) | Direct requirement |
| Australian Online Safety Act | Division 2 (Removal Notices) | Supports compliance |
| NIST AI RMF | MAP 1.5 (Risk Identification) | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Operation) | Supports compliance |

EU Digital Services Act — Articles 14 and 17

Article 14 requires platforms to include in their terms of service information about content moderation policies, including the procedures, measures, and tools used for content moderation. A governed abuse taxonomy is the foundational artefact that gives substance to this requirement — without it, the platform cannot clearly communicate what content is prohibited, how violations are categorised, or what enforcement responses apply. Article 17 requires that every content moderation decision be accompanied by a clear and specific statement of reasons, including the legal or contractual ground relied upon and an explanation of why the content is considered incompatible with that ground. A structured taxonomy with unique identifiers, definitions, and severity tiers enables the generation of specific statements of reasons — the moderation agent can reference the taxonomy category and definition in the statement, providing users with a meaningful basis for understanding and contesting the decision. Without taxonomic structure, statements of reasons default to generic boilerplate that does not satisfy Article 17's specificity requirements.
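
To illustrate how a structured taxonomy enables specific statements of reasons, the following sketch assembles one from a logged decision and its taxonomy entry. The field names follow the illustrative sketches in Section 4, and the wording is placeholder text rather than language drawn from the regulation.

```python
def statement_of_reasons(decision, entry):
    """Assemble a statement of reasons from a logged decision and its taxonomy
    entry, following the ModerationDecisionRecord and TaxonomyEntry sketches
    in Section 4; the wording is placeholder text, not template language from
    the regulation."""
    return (
        f"Your content (ref. {decision.content_id}) was actioned under "
        f"category {entry.entry_id} ('{entry.name}'), taxonomy version "
        f"{decision.taxonomy_version}. Definition applied: {entry.definition} "
        f"Enforcement action: {decision.enforcement_action}. "
        f"You may appeal this decision."
    )
```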

EU Digital Services Act — Article 34

Article 34 requires Very Large Online Platforms and Very Large Online Search Engines to identify, analyse, and assess systemic risks arising from the design, functioning, and use of their services. The categories of systemic risk enumerated in Article 34(1) — dissemination of illegal content, negative effects on fundamental rights, negative effects on civic discourse, risks related to gender-based violence, protection of minors, and public health — directly correspond to categories in a well-governed abuse taxonomy. The taxonomy provides the classification framework through which systemic risk trends are identified and quantified. Rising volumes of certain abuse categories, shifts in severity distribution, and jurisdictional concentration patterns are all risk signals that the taxonomy makes measurable.

German NetzDG and UK Online Safety Act

NetzDG imposes statutory timelines for removal of "manifestly unlawful" content (24 hours) and other unlawful content (7 days). The UK Online Safety Act imposes duties regarding illegal content and content that is harmful to children. Both frameworks require platforms to have category-specific content moderation processes; the abuse taxonomy must encode these categories with their jurisdiction-specific severity tiers and statutory removal timelines. Failure to map these obligations into the taxonomy results in the moderation agent applying incorrect timelines or failing to identify content that falls within statutory obligations.

EU AI Act — Articles 9 and 14

When the moderation agent qualifies as a high-risk AI system — which may apply in cases where moderation decisions materially affect users' access to public discourse or economic participation — Article 9 requires a risk management system that includes identification of known and foreseeable risks. The abuse taxonomy is a risk identification artefact that enumerates the known harm types the agent is designed to address. Article 14 requires human oversight measures — the taxonomy's escalation triggers and specialist review routing per AG-691 are the mechanisms through which human oversight is structured. Without a governed taxonomy, the organisation cannot demonstrate that the agent's risk management system is comprehensive or that human oversight is structured around a principled classification framework.

NIST AI RMF — MAP 1.5

MAP 1.5 in the NIST AI Risk Management Framework addresses the identification of risks that the AI system might pose, including risks to individuals, communities, and society. The abuse taxonomy is the primary instrument through which the organisation identifies and categorises the risks of moderation error — both the risk of failing to detect genuine abuse (under-enforcement) and the risk of incorrectly sanctioning legitimate expression (over-enforcement). A well-governed taxonomy with severity tiers, jurisdictional mappings, and proportionate enforcement responses demonstrates that the organisation has systematically mapped its risk landscape.

10. Failure Severity

| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Platform-wide: affects every moderation decision made by the agent across all abuse categories, user populations, and operating jurisdictions |

Consequence chain: Failure to maintain a governed abuse taxonomy produces a cascading failure across the entire moderation function. The first-order failure is taxonomic incompleteness or staleness — the agent lacks categories for emerging abuse vectors or applies outdated definitions that no longer match the current threat landscape. The second-order failure is systematic misclassification at scale — thousands or tens of thousands of reports are dismissed because the agent cannot classify the abuse, or users receive disproportionate enforcement because severity tiers are absent or miscalibrated. The third-order failure is divergent outcomes across jurisdictions — the agent applies legally incorrect standards in jurisdictions where the taxonomy does not encode local legal requirements, producing regulatory violations in some jurisdictions and inadequate user protection in others. The fourth-order failure is loss of user and regulatory trust — affected users lose confidence in the platform's ability to protect them, driving attrition among the users most vulnerable to abuse; regulators in multiple jurisdictions initiate enforcement proceedings; civil society organisations publish evidence of systematic moderation failures. The fifth-order failure is remediation at scale — the organisation must retrospectively review thousands of moderation decisions made under the deficient taxonomy, re-process appeals, compensate affected users, and demonstrate to regulators that the taxonomy governance deficiency has been corrected. For major platforms, the regulatory and financial exposure from a taxonomy governance failure is substantial: fines under the Digital Services Act can reach up to 6% of global annual turnover (Article 52), NetzDG fines can reach EUR 50 million, and UK Online Safety Act fines can reach GBP 18 million or 10% of qualifying worldwide revenue. Beyond fines, the reputational damage from documented moderation failures — particularly those affecting vulnerable communities — produces long-term advertiser hesitancy and user attrition that compounds over multiple quarters.

Cross-references:

- AG-001 (Operational Boundary Enforcement): ensures the moderation agent operates within its defined scope; the abuse taxonomy defines one critical dimension of that scope by specifying what the agent is authorised to classify and enforce.
- AG-007 (Governance Configuration Control): governs the configuration management process that the taxonomy, as a governance configuration artefact, must follow.
- AG-019 (Human Escalation & Override Triggers): defines when moderation decisions should escalate to human reviewers; the taxonomy's escalation triggers per Requirement 4.6 must align with AG-019's escalation framework.
- AG-022 (Behavioural Drift Detection): monitors whether the agent's classification behaviour drifts over time; taxonomy changes are a legitimate source of classification shifts, but drift without corresponding taxonomy changes indicates agent degradation.
- AG-029 (Data Classification Enforcement): governs the classification of data processed during moderation, including sensitive personal data in abuse reports and victim testimony.
- AG-033 (Consent Lifecycle Governance): relevant where moderation involves processing user data beyond the original consent scope.
- AG-055 (Audit Trail Immutability & Completeness): governs the integrity of the moderation decision logs required by Requirement 4.7.
- AG-210 (Multi-Jurisdictional Regulatory Mapping): provides the jurisdictional regulatory framework that the taxonomy's jurisdictional overlays per Requirement 4.5 must encode.
- AG-419 (Incident Classification & Severity Assignment): defines the incident severity framework that should align with the taxonomy's severity tiers to ensure consistent classification language across moderation and incident response functions.
- AG-691 (Escalation to Specialist Review): defines the specialist review pathways to which the taxonomy's escalation triggers route cases.
- AG-694 (Victim Support Routing): defines the support pathways toward which the taxonomy's victim support routing criteria direct affected users.
- AG-695 (Repeat-Offender Linkage): depends on consistent taxonomic classification across an offender's history to detect escalating patterns.
- AG-697 (Cross-Platform Threat Intelligence): provides external intelligence on emerging abuse vectors that should inform taxonomy review cycles.

Cite this protocol
AgentGoverning. (2026). AG-689: Abuse Taxonomy Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-689