AG-698

Emergency Harm Response Governance

Community Platforms · Trust & Safety · ~28 min read · AGS v2.1 · April 2026
EU AI Act · GDPR · NIST · ISO 42001

2. Summary

Emergency Harm Response Governance requires that AI agents operating on community platforms, messaging services, crisis hotlines, and public-facing digital channels detect, classify, and escalate signals of imminent self-harm, suicide, violence, or other life-threatening crisis situations within strictly defined time bounds and through verified routing pathways. The dimension addresses the entire chain from signal detection through human specialist handoff, ensuring that no crisis signal is silently dropped, delayed beyond safety-critical thresholds, or routed to an unqualified handler. Without governed emergency harm response, an agent may encounter explicit statements of suicidal intent, credible threats of violence, or indicators of ongoing abuse and either fail to recognise the signal, respond with generic content, or route the case into a standard moderation queue where it waits hours or days — any of which can contribute to preventable death or serious injury.

3. Example

Scenario A — Suicide Signal Routed to General Moderation Queue: A social media platform deploys an AI agent to moderate user-generated content and respond to direct messages. A 17-year-old user sends a direct message to the platform's support channel stating: "I've written my note. I have the pills. Nobody will care after tonight." The agent classifies the message as a content policy violation under "self-harm references" — the same category used for song lyrics and fictional writing containing self-harm themes. The message enters the general moderation queue with a target review time of 24 hours. A human moderator reviews the message 19 hours later, recognises the acute crisis, and attempts to escalate — but the user's account has been inactive for 16 hours. Local emergency services are contacted but cannot locate the individual for 3 additional hours. The user is found alive but hospitalised after a suicide attempt that occurred approximately 2 hours after the original message.

What went wrong: The agent's classification taxonomy did not distinguish between fictional or artistic self-harm references and acute crisis signals with temporal immediacy indicators ("tonight," "I have the pills," "written my note"). No priority routing existed for messages matching acute crisis patterns. The 24-hour general moderation SLA was applied to a signal requiring response within minutes, not hours. The platform had no integration with crisis intervention services that could have been activated immediately upon detection. Consequence: A preventable suicide attempt, hospitalisation, potential civil liability for the platform, regulatory scrutiny under the UK Online Safety Act's duty of care provisions, and a £1.8 million settlement with the family.

Scenario B — Violence Threat Dismissed as Hyperbole: A community gaming platform uses an AI agent to monitor voice-to-text transcriptions and chat messages for policy violations. A user posts in a public channel: "I know where [username] lives. 1247 Oak Street. I'm going there tonight and they will regret everything. I have my father's gun." The agent scores the message at 0.62 on its threat assessment scale (threshold for escalation: 0.75) because the training data associates gaming communities with hyperbolic language, and the phrase "they will regret everything" appears frequently in non-threatening gaming contexts. The message is logged as a low-priority policy violation. No escalation occurs. Fourteen hours later, the targeted user reports a physical assault at their home to local police. The police investigation discovers the platform message during evidence collection. The platform is subpoenaed, and the investigation reveals that the agent's threat model systematically under-scored threats containing personally identifiable information (a home address) and specific weapon references when they occurred in gaming contexts.

What went wrong: The agent's threat scoring model applied a context-dependent discount for gaming communities that suppressed genuine threat signals. The model did not incorporate signal amplifiers — the presence of a specific physical address and a specific weapon reference should have escalated the score regardless of community context. No rule-based override existed for messages containing PII of a third party combined with violence language. No human review was triggered for messages scoring between 0.5 and 0.75 despite this range containing the highest density of false negatives in the platform's own validation data. Consequence: Physical assault, criminal investigation involving the platform, subpoena of platform records, £3.2 million civil lawsuit from the victim, and regulatory investigation under the EU Digital Services Act's systemic risk obligations.

Scenario C — Crisis Chatbot Provides Generic Response During Active Self-Harm: A government mental health service deploys an AI agent as the first point of contact on its crisis text line, designed to provide immediate support while routing callers to trained counsellors. A user texts: "I've already cut myself. There's a lot of blood. I don't know what to do." The agent responds with its standard self-harm acknowledgement template: "I hear that you're going through a difficult time. Here are some resources that may help: [link to self-help guide], [link to appointment booking]. If you feel you are in immediate danger, please call emergency services." The response treats an active medical emergency as a future hypothetical ("if you feel you are in immediate danger"). The agent does not flag the conversation for immediate human takeover, does not initiate an automated welfare check, and does not directly connect the user to emergency services. The user does not call emergency services. A counsellor picks up the conversation 22 minutes later and immediately initiates emergency protocols, but the user has become non-responsive. Emergency services locate the user 41 minutes after the initial message; the user survives but requires emergency surgery and a 9-day hospitalisation.

What went wrong: The agent did not distinguish between ideation-stage crisis signals (which may appropriately receive resource links) and active-emergency signals (which require immediate human intervention and potential emergency services activation). The template response included a conditional clause ("if you feel you are in immediate danger") applied to a situation where the user had explicitly described an active medical emergency. No automated escalation triggered immediate human takeover when active harm indicators were detected. The 22-minute counsellor response time exceeded the medically critical window for the injury described. Consequence: Life-threatening medical emergency with delayed intervention, £2.4 million negligence claim against the government agency, parliamentary inquiry into the digital mental health service, and suspension of the AI-first triage model pending redesign.

4. Requirement Statement

Scope: This dimension applies to any AI agent that may encounter signals of imminent self-harm, suicide, violence against others, child exploitation, human trafficking, or other life-threatening or safety-critical crisis situations during its normal operation. The scope includes — but is not limited to — content moderation agents, customer service agents, chatbots operating on social media or messaging platforms, crisis line triage agents, community management agents, and any agent processing user-generated text, voice, image, or video content where crisis signals may appear. The scope extends to agents that operate as intermediaries or first points of contact where a human in crisis may disclose their situation. The dimension governs the detection of crisis signals, the classification of signal severity and temporal urgency, the routing of crisis cases to qualified human responders or emergency services, the time-bound guarantees for response, and the governance of the entire escalation chain. Organisations that deploy agents in contexts where crisis signals are unlikely but possible (e.g., e-commerce customer service) are not exempt — they must implement at minimum a basic detection and routing capability. The scope explicitly includes cross-jurisdictional considerations, as crisis signals may originate from any jurisdiction and emergency service routing must account for the user's location, not the platform's headquarters.

4.1. A conforming system MUST implement crisis signal detection capable of identifying explicit and contextually implied indicators of imminent self-harm, suicide, violence against others, child exploitation, and other life-threatening situations, with detection covering text and, where applicable, voice, image, and video modalities processed by the agent.

4.2. A conforming system MUST classify detected crisis signals into at least three severity tiers — active emergency (harm is occurring or imminent within minutes), acute risk (credible indicators of harm within hours), and elevated concern (indicators requiring professional assessment but no immediate threat to life) — with documented classification criteria and examples for each tier.
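
The tier structure in 4.2 lends itself to a small, explicit data model. The sketch below is illustrative only; the enum names, fields, and SLA mapping are assumptions rather than normative values, and the SLA figures simply restate 4.3 and 4.4.

```python
# Minimal sketch of the three-tier taxonomy in 4.2. Tier names, fields, and
# example criteria are illustrative assumptions, not a normative schema.
from dataclasses import dataclass, field
from enum import IntEnum


class SeverityTier(IntEnum):
    """Ordered so that a higher value always means greater urgency."""
    ELEVATED_CONCERN = 1   # needs professional assessment, no immediate threat
    ACUTE_RISK = 2         # credible indicators of harm within hours
    ACTIVE_EMERGENCY = 3   # harm occurring or imminent within minutes


@dataclass
class CrisisClassification:
    signal_id: str
    tier: SeverityTier
    indicators: list[str]          # e.g. ["temporal immediacy", "means stated"]
    rationale: str                 # documented criteria applied (4.2)
    model_confidence: float        # base model score, before amplifier rules
    amplifiers_applied: list[str] = field(default_factory=list)


ROUTING_SLA_SECONDS = {
    SeverityTier.ACTIVE_EMERGENCY: 60,     # 4.3
    SeverityTier.ACUTE_RISK: 5 * 60,       # 4.4
    SeverityTier.ELEVATED_CONCERN: None,   # standard professional-review queue
}
```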

4.3. A conforming system MUST route active-emergency signals to a qualified human responder or emergency services integration within 60 seconds of detection, with no intermediate queuing that could delay the handoff beyond this threshold.
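
One way to satisfy the no-intermediate-queuing constraint is to dispatch active-emergency signals directly against a hard deadline and fall back before the 60 seconds expire. The sketch below assumes hypothetical ResponderPool and EmergencyServicesBridge interfaces; it is not a prescribed design.

```python
# Illustrative sketch of 4.3: active-emergency signals bypass all queues and
# are handed to a qualified responder within a hard 60-second deadline.
import time


def route_active_emergency(signal, responder_pool, emergency_bridge,
                           deadline_seconds: float = 60.0) -> str:
    started = time.monotonic()

    # Direct dispatch: no intermediate queue between detection and handoff.
    responder = responder_pool.claim_available(specialism="crisis",
                                               timeout=deadline_seconds / 2)
    if responder is not None:
        responder.assign(signal)
        return f"handed_off:{responder.id}"

    # Primary pathway cannot meet the SLA: fall back before the deadline
    # expires rather than waiting for a responder to free up (see 4.5).
    remaining = deadline_seconds - (time.monotonic() - started)
    emergency_bridge.activate(signal, time_budget=max(remaining, 0.0))
    return "fallback_emergency_services"
```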

4.4. A conforming system MUST route acute-risk signals to a qualified human responder within 5 minutes of detection, with priority queuing that ensures these cases are handled before non-crisis workload.

4.5. A conforming system MUST maintain a fallback escalation pathway that activates automatically when the primary routing pathway is unavailable, degraded, or exceeds its response time guarantee, ensuring that no crisis signal is dropped due to a single point of failure.
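
The fallback requirement can be expressed as an ordered chain of pathways that is walked until one accepts the signal, with a last-resort handler so that nothing is silently dropped. A minimal sketch, assuming pathway objects with is_healthy() and deliver() methods that raise TimeoutError when their response guarantee is exceeded:

```python
# Sketch of the fallback chain in 4.5: pathways are tried in priority order so
# a signal is never dropped because one pathway is down or too slow.
def escalate_with_fallback(signal, pathways, on_exhausted):
    """pathways: ordered list, e.g. [primary_responders, partner_hotline,
    emergency_services_bridge]. on_exhausted: last-resort handler (e.g. page
    the on-call safety lead) so the signal is never silently dropped."""
    for pathway in pathways:
        if not pathway.is_healthy():          # unavailable or degraded
            continue
        try:
            pathway.deliver(signal, timeout=pathway.sla_seconds)
            return pathway.name
        except TimeoutError:                  # exceeded its response guarantee
            continue
    on_exhausted(signal)                      # no single point of failure
    return "last_resort"
```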

4.6. A conforming system MUST suppress generic or template responses when an active-emergency signal is detected, replacing them with crisis-specific responses that include direct emergency services contact information appropriate to the user's jurisdiction, explicit acknowledgement of the crisis, and a clear statement that a human specialist is being connected.
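
Template suppression can be implemented as a guard in front of response selection. The sketch below reuses the SeverityTier enum from the sketch under 4.2; the message wording is illustrative, not a recommended script.

```python
# Sketch of 4.6: generic templates are suppressed for active emergencies and
# replaced with a crisis-specific interim message.
def interim_response(tier: "SeverityTier", default_template: str,
                     emergency_number: str) -> str:
    if tier is not SeverityTier.ACTIVE_EMERGENCY:
        return default_template

    # No conditional phrasing ("if you feel you are in danger"): the emergency
    # is acknowledged as occurring now, and the human handoff is stated plainly.
    return (
        "I can see you are in immediate danger right now and I am connecting "
        f"you to a crisis specialist. If you can, call {emergency_number} now, "
        "or ask someone nearby to call for you. Please stay with me until the "
        "specialist joins."
    )
```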

4.7. A conforming system MUST log every crisis signal detection, classification decision, routing action, response time, and human handoff event with immutable timestamps, retaining these records in compliance with jurisdiction-specific requirements and for a minimum of 7 years.
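
A hash-chained, append-only log is one way to make crisis event records tamper-evident. The sketch below covers only the in-memory chaining; the durable storage backend, retention controls, and jurisdiction-specific handling are out of scope.

```python
# Minimal sketch of a tamper-evident crisis event log for 4.7: each entry
# carries an immutable timestamp and the hash of the previous entry, so any
# later modification breaks the chain.
import hashlib
import json
from datetime import datetime, timezone


class CrisisEventLog:
    def __init__(self):
        self._entries = []
        self._last_hash = "0" * 64

    def append(self, event_type: str, payload: dict) -> dict:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event_type": event_type,      # detection, classification, routing,
            "payload": payload,            # response time, human handoff (4.7)
            "prev_hash": self._last_hash,
        }
        serialized = json.dumps(entry, sort_keys=True).encode()
        entry["hash"] = hashlib.sha256(serialized).hexdigest()
        self._last_hash = entry["hash"]
        self._entries.append(entry)
        return entry

    def verify_chain(self) -> bool:
        prev = "0" * 64
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["hash"] != expected or entry["prev_hash"] != prev:
                return False
            prev = entry["hash"]
        return True
```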

4.8. A conforming system MUST implement false-negative monitoring by conducting periodic retrospective reviews of a statistically valid sample of non-escalated content to identify crisis signals that the detection system missed, with findings used to retrain or reconfigure the detection system.
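
The "statistically valid sample" can be sized with the standard formula for estimating a proportion. The confidence level and margin of error below are illustrative assumptions, not values mandated by this requirement.

```python
# Sketch of the retrospective false-negative review in 4.8: draw a random,
# reproducible sample of non-escalated items for human re-review.
import math
import random


def sample_size(population: int, confidence_z: float = 1.96,
                margin_of_error: float = 0.02, p: float = 0.5) -> int:
    """Cochran's formula with finite-population correction."""
    if population <= 0:
        return 0
    n0 = (confidence_z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def draw_review_sample(non_escalated_ids: list, seed: int = 0) -> list:
    n = sample_size(len(non_escalated_ids))
    rng = random.Random(seed)               # reproducible for audit purposes
    return rng.sample(non_escalated_ids, n)
```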

4.9. A conforming system SHOULD implement location-aware emergency service routing that identifies the user's probable jurisdiction from available signals (IP geolocation, account registration data, language, timezone, explicit user-provided location) and routes to the appropriate local emergency services or crisis intervention resources.
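
Location inference can be expressed as a fixed priority order over the available signals, mapped to a table of local emergency numbers. The signal names and the (deliberately partial) number table below are illustrative assumptions.

```python
# Sketch of 4.9: infer the user's probable jurisdiction from available signals
# in priority order, then map it to a local emergency number.
EMERGENCY_NUMBERS = {"DE": "112", "FR": "112", "JP": "119", "GB": "999", "US": "911"}

SIGNAL_PRIORITY = (            # most explicit and most recent signals first
    "user_stated_location",
    "device_geolocation",
    "ip_geolocation",
    "account_registration_country",
    "locale_and_timezone",
)


def resolve_emergency_number(signals: dict, default: str = "112") -> tuple[str, str]:
    for source in SIGNAL_PRIORITY:
        country = signals.get(source)
        if country and country in EMERGENCY_NUMBERS:
            return country, EMERGENCY_NUMBERS[country]
    return "UNKNOWN", default    # fall back to a documented default pathway
```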

4.10. A conforming system SHOULD integrate with established crisis intervention organisations (e.g., national suicide prevention lifelines, child protection hotlines, domestic violence services) through verified, tested API or telephony connections, with connectivity validated at least monthly.

4.11. A conforming system SHOULD implement signal amplifier rules — hard-coded overrides that escalate classification severity when specific high-confidence indicators are present (e.g., named weapons, specific addresses, named pharmaceuticals with dosage quantities, references to specific dates or times for planned harm) regardless of the base model's confidence score.
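
Amplifier rules are deliberately deterministic so that they cannot be argued down by a context discount. The patterns below are simplistic, illustrative examples only, and the sketch reuses the SeverityTier enum from the sketch under 4.2.

```python
# Sketch of 4.11: hard-coded overrides that raise the classified tier when
# high-confidence indicators co-occur, regardless of the base model's score.
import re

AMPLIFIER_RULES = [
    ("weapon_reference", re.compile(r"\b(gun|knife|firearm|rifle)\b", re.I)),
    ("street_address",   re.compile(r"\b\d{1,5}\s+\w+\s+(street|st|ave|road|rd)\b", re.I)),
    ("dosage_quantity",  re.compile(r"\b\d+\s*(pills|tablets|mg)\b", re.I)),
    ("planned_time",     re.compile(r"\b(tonight|tomorrow|at \d{1,2}(:\d{2})?\s*(am|pm)?)\b", re.I)),
]


def apply_amplifiers(text: str, base_tier: "SeverityTier"):
    hits = [name for name, pattern in AMPLIFIER_RULES if pattern.search(text)]
    tier = base_tier
    if len(hits) >= 2:                       # e.g. weapon + address, pills + "tonight"
        tier = SeverityTier.ACTIVE_EMERGENCY
    elif hits:
        tier = max(tier, SeverityTier.ACUTE_RISK)
    return tier, hits
```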

4.12. A conforming system MAY implement proactive welfare check protocols that, with appropriate authorisation and consent frameworks, initiate contact with users who exhibited crisis signals but disengaged before human handoff was completed.

4.13. A conforming system MAY implement anonymous reporting mechanisms that allow community members to flag crisis signals observed in other users' content, with these reports entering the same priority routing as agent-detected signals.

5. Rationale

Emergency harm response represents the highest-stakes failure mode in community platform and trust-and-safety operations. Unlike content moderation errors that result in user dissatisfaction, policy inconsistency, or reputational damage, failures in emergency harm response can directly contribute to preventable death or serious physical injury. The time-critical nature of these situations — where minutes determine outcomes — makes this fundamentally different from other governance dimensions where hours or days of response time are acceptable.

The threat model for emergency harm response encompasses several distinct failure modes. First, detection failure: the agent does not recognise a crisis signal because its training data or classification rules do not cover the signal's form. Crisis signals range from explicit ("I'm going to kill myself tonight") to contextually implied ("I've given away all my things, written letters to everyone, and feel at peace for the first time") to coded (using community-specific language or euphemisms). Detection must cover this spectrum without generating an unmanageable volume of false positives that desensitise human responders. Second, classification failure: the agent detects a crisis-relevant signal but assigns it incorrect severity, treating an active emergency as an elevated concern or vice versa. The consequences of under-classification (delayed response to an active emergency) are catastrophic and irreversible; the consequences of over-classification (unnecessary emergency response) are costly but recoverable. This asymmetry mandates conservative classification — when in doubt, classify higher. Third, routing failure: the signal is correctly detected and classified but routed to an inappropriate handler — a general moderation queue, an untrained first-tier support agent, or a crisis resource in the wrong jurisdiction. Fourth, response failure: the human handoff succeeds, but the agent's interim response (the message the user sees while waiting for a human) is inappropriate — generic, dismissive, conditional, or counterproductive. During the seconds or minutes before a human arrives, the agent's response is the only communication the user receives; it must not make the situation worse.
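
The asymmetry between under- and over-classification can be encoded directly as a tie-breaking rule: when the model's scores for two tiers are close, the more severe tier wins. A minimal sketch, assuming tier scores expressed as probabilities, the SeverityTier enum from the sketch under 4.2, and an illustrative uncertainty margin:

```python
# "When in doubt, classify higher": an almost-as-likely but more severe tier
# overrides the model's top choice. The margin value is an assumption.
def conservative_tier(tier_scores: dict, margin: float = 0.15):
    """tier_scores: mapping of SeverityTier -> model probability."""
    ranked = sorted(tier_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_tier, best_score = ranked[0]
    for tier, score in ranked[1:]:
        if tier > best_tier and best_score - score <= margin:
            best_tier = tier
    return best_tier
```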

The duty-of-care framework underlying this dimension is increasingly codified in law. The UK Online Safety Act 2023 imposes duties on platforms to protect users, particularly children, from harm — including self-harm and suicide content. The EU Digital Services Act requires very large online platforms to assess and mitigate systemic risks, explicitly including risks to mental health and physical safety. In the United States, Section 230 safe harbour does not extend to platforms that have actual knowledge of specific threats and fail to act. Courts have increasingly found that platforms owe a duty of care when they have deployed AI systems that process crisis signals — the deployment of the AI system is itself evidence that the platform has the technical capability to detect and respond, which strengthens the duty.

False positive management is a critical design tension. An overly sensitive detection system will flood human responders with non-crisis cases, degrading their capacity to handle genuine emergencies and contributing to alert fatigue that undermines the entire response system. The governance framework must balance detection sensitivity against responder capacity, using tiered classification to ensure that the highest-severity tier remains manageable while lower tiers absorb the higher-volume, lower-urgency cases. The 60-second SLA for active emergencies is achievable only if the active-emergency tier is not overwhelmed by false positives from lower tiers.
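
One practical control for this balance is to monitor how much responder capacity the active-emergency tier is consuming, so that detection thresholds can be tuned before alert fatigue sets in. The throughput figures and threshold below are illustrative assumptions, not benchmarks.

```python
# Sketch of an alert-budget check: if the top tier approaches responder
# capacity, the 60-second SLA in 4.3 is at risk and thresholds need review.
def tier_load_report(active_emergency_count: int, responders_on_shift: int,
                     cases_per_responder_hour: float = 4.0,
                     window_hours: float = 1.0) -> dict:
    capacity = responders_on_shift * cases_per_responder_hour * window_hours
    utilisation = active_emergency_count / capacity if capacity else float("inf")
    return {
        "capacity": capacity,
        "utilisation": round(utilisation, 2),
        # Above roughly 0.8, investigate whether lower-tier false positives
        # are leaking into the top tier.
        "sla_at_risk": utilisation > 0.8,
    }
```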

Cross-jurisdictional complexity adds a further dimension. A user in crisis in Germany messaging a platform headquartered in the United States requires routing to German emergency services (112), not US services (911). The agent must determine the user's probable location with sufficient confidence to route correctly, and must have the routing infrastructure to connect to emergency services or crisis organisations across all jurisdictions where it operates. This is not a theoretical concern — platforms routinely receive crisis signals from users in every country where they have users, and routing to the wrong jurisdiction's emergency services can add critical minutes or hours to response time.

6. Implementation Guidance

Emergency harm response governance requires a layered detection-classification-routing-response architecture operating under strict time-bound guarantees with redundant pathways. The system must function under degraded conditions — network failures, responder unavailability, model inference latency — because crisis signals do not wait for optimal operating conditions.
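
The sketch below wires the earlier fragments into the layered detection, classification, routing, and response flow described above. All component interfaces (detector, classifier, responder pool, pathways, emergency bridge) are assumptions, and the flow is simplified to a single text message.

```python
# Illustrative end-to-end flow combining the sketches from section 4; each
# stage writes to the tamper-evident log so the whole chain is reconstructable.
def handle_message(text, user_signals, detector, classifier, responder_pool,
                   emergency_bridge, pathways, log: "CrisisEventLog"):
    detection = detector.detect(text)                       # 4.1
    log.append("detection", {"matched": bool(detection)})
    if not detection:
        return None

    base = classifier.classify(text, detection)             # 4.2
    tier, amplifiers = apply_amplifiers(text, base.tier)    # 4.11 overrides
    log.append("classification", {"tier": tier.name, "amplifiers": amplifiers})

    if tier is SeverityTier.ACTIVE_EMERGENCY:
        outcome = route_active_emergency(detection, responder_pool,
                                         emergency_bridge)              # 4.3
    else:
        # Acute risk uses priority routing (4.4); in this simplified sketch
        # elevated concern shares the pathway list with a longer SLA.
        outcome = escalate_with_fallback(
            detection, pathways,
            on_exhausted=lambda s: emergency_bridge.activate(s, time_budget=0.0))
    log.append("routing", {"outcome": outcome})

    country, number = resolve_emergency_number(user_signals)            # 4.9
    reply = interim_response(tier, default_template="<standard support reply>",
                             emergency_number=number)                   # 4.6
    log.append("response", {"jurisdiction": country})
    return reply
```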

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Social Media and Community Platforms. Platforms operating at scale face the highest volume of crisis signals — a platform with 100 million daily active users may encounter thousands of crisis-relevant signals per day. Detection must operate at inference latencies compatible with real-time content processing. Platforms subject to the EU Digital Services Act's systemic risk obligations must demonstrate that their crisis response systems are proportionate to the risk. The UK Online Safety Act imposes specific duties regarding self-harm and suicide content, with Ofcom empowered to enforce compliance.

Government and Public Sector. Government agencies deploying AI agents on citizen-facing channels — benefits services, health information portals, civic engagement platforms — must account for the fact that citizens in crisis may contact any government channel, not only the designated crisis service. A benefits chatbot that encounters a suicidal user must be capable of crisis response even though crisis response is not its primary function. Public sector organisations also face heightened accountability — a government agency that deploys an AI agent and fails to respond to a crisis signal faces parliamentary and media scrutiny that private companies may not.

Healthcare and Mental Health. Organisations operating AI agents in healthcare or mental health contexts face the highest concentration of crisis signals and the most stringent duty of care. Clinical crisis response protocols must align with established safeguarding frameworks and mandatory reporting requirements. The agent must not practice medicine — but it must recognise when a medical emergency is occurring and ensure that a qualified human is connected without delay.

Cross-Border Platforms. Platforms operating across jurisdictions must maintain emergency service mappings and crisis organisation partnerships for every jurisdiction where they have users. A crisis signal from a user in Japan requires routing to Japanese emergency services (119) with Japanese-language crisis resources, regardless of the platform's country of incorporation. Multi-jurisdictional regulatory mapping (AG-210) is essential for determining reporting obligations, which vary significantly — some jurisdictions mandate reporting of self-harm signals to authorities, others prohibit it without consent.

Maturity Model

Basic Implementation — The organisation has implemented crisis signal detection covering explicit textual indicators of self-harm, suicide, and violence. Detected signals are classified into at least three severity tiers. Active-emergency signals are routed to a human responder within 60 seconds. A fallback escalation pathway exists and is tested quarterly. Generic template responses are suppressed for active emergencies. Crisis event logs are maintained with immutable timestamps. All mandatory requirements (4.1 through 4.8) are satisfied.

Intermediate Implementation — All basic capabilities plus: detection covers contextual and implied signals in addition to explicit indicators. Location-aware routing connects users to jurisdiction-appropriate emergency services and crisis organisations. Signal amplifier rules override model confidence scores when high-confidence indicators are present. Integrations with established crisis intervention organisations are maintained and tested monthly. Retrospective false-negative auditing operates on a statistically valid sample with findings driving detection improvements. Crisis response metrics (detection rates, classification accuracy, routing times, false-negative rates) are reported to senior leadership monthly.

Advanced Implementation — All intermediate capabilities plus: multimodal detection covers voice, image, and video in addition to text. Proactive welfare check protocols contact users who disengaged before human handoff. Predictive models identify escalating risk trajectories before explicit crisis signals emerge. Cross-platform threat intelligence sharing (AG-697) enables detection of crisis signals that span multiple platforms. Independent annual audit validates detection coverage, routing reliability, and response time compliance. The crisis response system operates with sub-30-second active-emergency routing and has demonstrated resilience under simulated infrastructure failures.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Crisis Signal Detection Coverage

Test 8.2: Severity Classification Accuracy

Test 8.3: Active-Emergency Routing Time

Test 8.4: Acute-Risk Routing Time

Test 8.5: Fallback Pathway Activation

Test 8.6: Template Suppression for Active Emergencies

Test 8.7: Immutable Crisis Event Logging

Test 8.8: False-Negative Monitoring Programme

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
UK Online Safety Act 2023 | Section 12 (Safety Duties — Illegal Content) | Direct requirement
UK Online Safety Act 2023 | Section 13 (Safety Duties — Children) | Direct requirement
EU Digital Services Act | Article 34 (Risk Assessment) | Supports compliance
EU Digital Services Act | Article 35 (Mitigation of Risks) | Supports compliance
EU AI Act | Article 9 (Risk Management System) | Supports compliance
EU AI Act | Article 14 (Human Oversight) | Direct requirement
NIST AI RMF | MANAGE 4.1 (Post-deployment Monitoring) | Supports compliance
ISO 42001 | Clause 8.4 (AI System Operation) | Supports compliance
US FOSTA-SESTA | 18 U.S.C. § 1591 (Sex Trafficking) | Supports compliance
GDPR | Article 6(1)(d) (Vital Interests) | Enables lawful processing

UK Online Safety Act 2023 — Sections 12 and 13

The Online Safety Act imposes duties on providers of user-to-user services to take proportionate steps to prevent individuals from encountering illegal content and to protect children from harmful content, including self-harm and suicide content. Section 12 requires platforms to operate systems and processes designed to minimise the presence of illegal content, which includes content constituting threats to life. Section 13 imposes additional duties regarding children's safety, including proactive measures to prevent children from encountering content that encourages self-harm or suicide. An AI agent that fails to detect and escalate crisis signals — particularly from users who may be minors — directly undermines the platform's compliance with these duties. Ofcom, as the regulator, has the power to impose fines of up to £18 million or 10% of global revenue, whichever is higher, for non-compliance.

EU Digital Services Act — Articles 34 and 35

Articles 34 and 35 require very large online platforms (those with 45 million or more EU users) to assess systemic risks and implement reasonable, proportionate, and effective mitigation measures. Systemic risks explicitly include risks to public security and negative effects on mental health. An AI moderation agent that fails to respond to crisis signals is itself a source of systemic risk — it processes the signal but fails to act, which is worse than not processing it at all because it creates the appearance of protection without the substance. Risk mitigation under Article 35 must include crisis response capabilities proportionate to the platform's scale.

EU AI Act — Article 14 (Human Oversight)

When AI agents are used to process crisis signals, Article 14's human oversight requirement takes on acute importance. The agent must not be the sole decision-maker in a life-threatening situation. Human oversight must be immediate (within the 60-second SLA for active emergencies), not deferred. The routing architecture mandated by this dimension ensures that the AI agent serves as a detection and triage layer while final intervention authority rests with qualified humans.

GDPR — Article 6(1)(d) (Vital Interests)

Article 6(1)(d) provides a lawful basis for processing personal data when "processing is necessary in order to protect the vital interests of the data subject or of another natural person." This provision explicitly enables platforms to process and share personal data (including location data, account information, and communication content) with emergency services when a genuine threat to life is detected. This addresses the tension between data protection obligations and emergency response: a platform that fails to share necessary data with emergency services during a crisis cannot invoke data protection as a justification when the GDPR itself provides the lawful basis for such sharing. Organisations must document their Article 6(1)(d) processing protocols in advance so that crisis response is not delayed by data protection uncertainty.

US FOSTA-SESTA — 18 U.S.C. § 1591

FOSTA-SESTA amended Section 230 to exclude from safe harbour protection platforms that facilitate sex trafficking. Platforms using AI agents to process user content must detect signals of trafficking and exploitation, including signals embedded in crisis communications. An agent that encounters a trafficking signal and fails to escalate it exposes the platform to federal criminal liability. Emergency harm response governance must include trafficking indicators within its detection scope.

NIST AI RMF — MANAGE 4.1

MANAGE 4.1 addresses post-deployment AI system monitoring, including monitoring for incidents and emergent risks. Emergency harm response represents the most severe category of AI deployment incident — where the system's failure to respond to a crisis signal directly contributes to harm. The monitoring and retrospective auditing requirements in this dimension satisfy MANAGE 4.1's post-deployment monitoring expectations for crisis-relevant AI systems.

10. Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | User safety — affects every individual who may disclose a crisis situation to the agent, with potential for irreversible physical harm or death

Consequence chain: Without emergency harm response governance, the agent encounters a crisis signal and either fails to detect it (classification failure) or detects it but routes it through standard pathways (routing failure). The immediate consequence is delayed or absent human intervention for a person in immediate danger. For self-harm and suicide signals, the critical intervention window is measured in minutes — delays of even 15-30 minutes can be the difference between a successful intervention and a fatality. For violence threats, failure to escalate may result in a preventable assault, and the platform's possession of the unescalated signal becomes evidence of negligence in subsequent civil and criminal proceedings. For child exploitation signals, failure to escalate violates mandatory reporting obligations in virtually every jurisdiction, exposing the organisation to criminal liability. The second-order consequence is systemic: users and the public learn that the platform's AI-mediated crisis response is unreliable, which may deter future help-seeking behaviour — a person in crisis who believes the platform will not respond may not reach out at all, or may reach out to an unmoderated channel where no safety infrastructure exists. The regulatory consequence is severe and multi-jurisdictional: UK Online Safety Act fines up to 10% of global revenue, EU Digital Services Act enforcement actions, and potential criminal liability under FOSTA-SESTA or jurisdiction-specific mandatory reporting statutes. The reputational consequence is existential: media coverage of a preventable death linked to an AI agent's failure to escalate a crisis signal is among the most damaging events a technology platform can experience, with long-lasting effects on user trust, regulatory relationships, and political environment.

Cross-references: AG-019 (Human Escalation & Override Triggers) defines the general framework for human escalation; this dimension specifies the time-bound, life-safety-critical specialisation of that framework for emergency harm signals. AG-001 (Operational Boundary Enforcement) ensures the agent operates within defined boundaries; crisis response requires the agent to override normal operational boundaries (e.g., standard response templates) when a crisis is detected. AG-008 (Governance Continuity Under Failure) ensures governance mechanisms persist under infrastructure failure; for emergency harm response, continuity under failure is a life-safety requirement, not merely a governance requirement. AG-022 (Behavioural Drift Detection) monitors for changes in the agent's detection behaviour over time that could degrade crisis signal sensitivity. AG-029 (Data Classification Enforcement) governs the classification of crisis-related data, which includes highly sensitive personal information processed under vital interests grounds. AG-033 (Consent Lifecycle Governance) and AG-037 (Anonymisation & Pseudonymisation Governance) must accommodate the vital interests exception that permits data sharing with emergency services without prior consent. AG-040 (Sensitive Category Data Processing Governance) governs the processing of health and mental health data inherent in crisis signals. AG-055 (Audit Trail Immutability & Completeness) provides the evidentiary foundation for crisis event logging. AG-210 (Multi-Jurisdictional Regulatory Mapping) maps the reporting obligations and emergency service routing requirements across jurisdictions. AG-419 (Incident Classification & Severity Assignment) provides the incident classification framework that crisis severity tiers must align with. AG-420 (Automated Containment Action Governance) governs the automated actions taken upon crisis detection, such as content preservation and routing activation. AG-689 (Abuse Taxonomy Governance) defines the classification taxonomy within which crisis signals are categorised. AG-691 (Escalation to Specialist Review Governance) governs the escalation pathways that crisis routing uses. AG-694 (Victim Support Routing Governance) ensures that crisis survivors are connected to appropriate support services after the immediate emergency is resolved. AG-697 (Cross-Platform Threat Intelligence Governance) enables sharing of threat signals across platforms when a crisis spans multiple services.

Cite this protocol
AgentGoverning. (2026). AG-698: Emergency Harm Response Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-698