AG-436

Abuse-at-Scale Detection Governance

Security, Adversarial Abuse & Threat Operations · ~25 min read · AGS v2.1 · April 2026
EU AI Act · GDPR · SOX · FCA · NIST · ISO 42001

2. Summary

Abuse-at-Scale Detection Governance requires that organisations operating AI agent estates implement detection capabilities specifically designed to identify coordinated, high-volume, or automated exploitation of agent systems by botnets, coordinated human actor networks, or hybrid attack campaigns that combine automated and human-directed activity. Individual agent abuse — a single attacker probing a single agent — is addressed by prompt integrity, input validation, and rate governance controls. Abuse-at-scale is qualitatively different: it involves systematic exploitation across multiple agents, sessions, accounts, or time periods, often using automation to achieve volume that a single human attacker could not. Without scale-aware detection, an organisation may successfully defend each individual agent session while failing to recognise that thousands of sessions constitute a coordinated campaign extracting training data, mapping safety boundaries, generating prohibited content at industrial volume, or exploiting agent capabilities to conduct fraud across the entire estate.

3. Example

Scenario A — Botnet-Driven Credential Stuffing Through Agent Conversational Interface: A financial services firm deploys a customer-facing agent that can check account balances, initiate transfers, and reset passwords when customers provide identity verification information. An attacker operates a botnet of 12,000 compromised residential IP addresses. Each bot initiates a conversational session with the agent, presents a different stolen identity (from a breached database of 2.3 million credentials), and attempts the identity verification flow. The botnet is engineered to mimic human conversational patterns — variable typing speeds, natural pauses, occasional typos — to evade simple bot detection. Each individual session appears legitimate: a customer asking to check their account balance. But across the estate, 12,000 sessions within a 4-hour window are all following the same conversational pattern with different identities. The per-session rate limiter (AG-004) is satisfied because each session generates only 3-5 requests. The per-account rate limiter is satisfied because each identity is used only once. No individual session triggers any anomaly. Over the 4-hour window, 847 sessions successfully pass identity verification, and the attacker initiates £2.1 million in transfers before the fraud team detects the activity through downstream transaction monitoring — 14 hours after the campaign began. The transfers are distributed across 340 recipient accounts. Recovery is limited to £380,000; net loss is £1.72 million. The regulatory investigation reveals the firm had no mechanism to detect that 12,000 sessions with statistically similar conversational patterns constituted a coordinated attack rather than normal customer activity.

What went wrong: Per-session and per-account controls were satisfied because the attack was distributed. Rate limiting at the individual session level could not detect a campaign operating at the estate level. The firm lacked aggregate pattern detection that would identify: (a) statistically similar conversational flows across thousands of sessions, (b) an anomalous spike in identity verification attempts, (c) geographic or temporal clustering inconsistent with normal customer behaviour, and (d) the coordination signal — 12,000 sessions executing near-identical conversational strategies within a compressed time window.

Scenario B — Distributed Jailbreak Campaign for Prohibited Content Generation: A content generation agent deployed by a media platform has safety guardrails preventing the generation of extremist recruitment material. A coordinated network of 3,200 accounts — some automated, some operated by human participants recruited through extremist forums — conducts a systematic campaign to map the agent's safety boundaries and discover jailbreak techniques that bypass the guardrails. Phase 1 (weeks 1-2): 800 accounts each submit 20-30 prompts testing variations of prohibited requests, systematically probing the boundary between permitted and prohibited content. Phase 2 (weeks 3-4): accounts share successful partial bypasses and refine techniques. Phase 3 (weeks 5-6): 2,400 accounts use the discovered jailbreak techniques to generate extremist content at industrial volume — approximately 45,000 pieces of content over 14 days. Each individual account generates modest volumes (14-19 pieces), well below the per-account content generation limit. No single account's activity triggers any alert. The platform discovers the campaign only when external researchers identify a pattern of extremist content traceable to the platform, 8 weeks after the campaign began. The reputational damage is severe: press coverage of "AI platform mass-produces extremist content," advertiser withdrawal worth £8.3 million in lost revenue, and regulatory investigation under the EU Digital Services Act.

What went wrong: Per-account monitoring detected no anomaly because individual account activity was within normal bounds. The organisation had no mechanism to detect: (a) the systematic boundary-probing behaviour in Phase 1, where hundreds of accounts submitted structurally similar prompt variations targeting the same safety boundary; (b) the temporal progression from probing to exploitation across the account network; (c) the statistical similarity of generated content across 2,400 accounts — all producing variations on the same prohibited themes using the same jailbreak technique. The attack exploited the gap between per-session security (which was functional) and estate-level pattern detection (which did not exist).

Scenario C — Coordinated Model Extraction Through Agent API: A Crypto/Web3 agent provides market analysis and trading signal generation. A competing firm operates a coordinated extraction campaign using 5,600 accounts created with synthetic identities across 47 jurisdictions. Each account submits carefully crafted queries designed to elicit the agent's proprietary trading logic — not by directly requesting the logic, but by submitting thousands of hypothetical market scenarios and recording the agent's recommended actions. The queries are engineered to be maximally informative: each query probes a different region of the decision space, and the collective query set constitutes a systematic sampling of the agent's decision boundary. Over 6 months, the campaign submits 2.8 million queries across the 5,600 accounts (an average of 2.8 queries per account per day — indistinguishable from normal user behaviour). The extracted decision boundary is used to reconstruct a competing model that replicates 89% of the original agent's trading decisions. The organisation discovers the extraction only when the competing product launches with suspiciously similar performance characteristics. The estimated value of the extracted intellectual property is £14 million. Litigation costs an additional £3.2 million, and the competitive advantage of the proprietary model is permanently destroyed.

What went wrong: Per-account query volume was normal. Per-session behaviour was unremarkable. No individual query was suspicious. The attack was detectable only at the aggregate level: (a) the query distribution across all accounts showed a systematic, maximally-informative sampling pattern inconsistent with natural user curiosity; (b) the 5,600 accounts showed registration patterns suggesting synthetic identity generation (similar registration timing, jurisdiction distribution inconsistent with customer demographics); (c) the collective query set, when analysed as a single corpus, revealed an obvious systematic extraction methodology. The organisation had no aggregate query analysis that would detect these estate-level patterns.
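One way to surface signal (a) is to compare a cohort's query-topic distribution against the estate baseline: a systematic sampler spreads probability mass far more evenly across the decision space than organic users, who concentrate on a few popular topics. A minimal sketch, assuming an upstream classifier has already bucketed queries into topics (the topic names, counts, and any alert threshold are illustrative assumptions, not part of this protocol):

```python
import math
from collections import Counter

def kl_divergence(observed: dict[str, float], baseline: dict[str, float]) -> float:
    """KL(observed || baseline) over query-topic shares; higher means the
    cohort's sampling deviates more from organic user behaviour."""
    eps = 1e-9  # smoothing so topics unseen in the baseline don't divide by zero
    return sum(p * math.log((p + eps) / (baseline.get(topic, 0.0) + eps))
               for topic, p in observed.items() if p > 0)

def topic_shares(topic_counts: Counter) -> dict[str, float]:
    total = sum(topic_counts.values())
    return {t: c / total for t, c in topic_counts.items()}

# Illustrative data: an extraction cohort samples the decision space uniformly,
# while organic traffic concentrates on a handful of popular topics.
baseline = topic_shares(Counter({"btc_price": 900, "eth_price": 600, "portfolio": 300,
                                 "edge_case_a": 5, "edge_case_b": 5}))
cohort = topic_shares(Counter({"btc_price": 60, "eth_price": 55, "portfolio": 58,
                               "edge_case_a": 62, "edge_case_b": 57}))

score = kl_divergence(cohort, baseline)
print(f"divergence={score:.2f}")  # flag the cohort if the score exceeds a tuned threshold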

4. Requirement Statement

Scope: This dimension applies to any organisation operating more than one AI agent instance or any single agent accessible to more than one user or session. The scope is the agent estate — the totality of agent instances, sessions, accounts, and interactions across the organisation. Per-session and per-account security controls are assumed to exist (per AG-004, AG-003, and related dimensions) and are not replaced by this dimension. AG-436 addresses the detection gap that exists between individual session security and estate-level pattern recognition. The scope includes all interaction channels — conversational interfaces, API endpoints, tool invocations, inter-agent communications, and any other pathway through which external actors interact with the agent estate. The scope extends to detection of coordinated human actors, automated botnets, hybrid human-bot campaigns, and internal abuse by authorised users operating at anomalous scale. Organisations operating a single agent instance accessible to a single user may defer this dimension, but must re-evaluate applicability whenever access is expanded.

4.1. A conforming system MUST implement estate-level aggregate monitoring that analyses interaction patterns across all agents, sessions, accounts, and time periods to detect coordinated or automated abuse campaigns that are invisible at the individual session level.

4.2. A conforming system MUST define baseline behavioural profiles for normal interaction patterns at the estate level, including: aggregate session volume by time period, conversational pattern distributions, query topic distributions, account creation and activity rates, geographic and temporal access patterns, and content generation volumes by category.

4.3. A conforming system MUST implement anomaly detection that identifies statistically significant deviations from estate-level baselines, including: (a) volume anomalies — unusual spikes in sessions, queries, or content generation; (b) pattern anomalies — clusters of sessions following statistically similar interaction flows; (c) account anomalies — registration patterns suggesting synthetic identity generation or coordinated account creation; (d) temporal anomalies — activity patterns inconsistent with human behaviour (e.g., sustained high-frequency interaction with no breaks); and (e) content anomalies — generated content clustering around specific prohibited or sensitive topics across multiple accounts.
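A minimal sketch of the volume-anomaly case in (a), scoring each time window's session count as a z-score against a rolling baseline of recent windows. The window size, history length, and 4-sigma threshold are illustrative, and the pattern, account, temporal, and content categories each need their own detectors:

```python
import statistics
from collections import deque

class VolumeAnomalyDetector:
    """Estate-level volume anomaly check: compare the current window's session
    count against a rolling baseline of recent windows."""

    def __init__(self, history_windows: int = 168, z_threshold: float = 4.0):
        self.history = deque(maxlen=history_windows)  # e.g. 168 hourly windows = 7 days
        self.z_threshold = z_threshold

    def observe(self, window_count: int) -> bool:
        """Record one window's count; return True if it is anomalous."""
        anomalous = False
        if len(self.history) >= 24:  # need enough history to form a baseline
            mean = statistics.fmean(self.history)
            std = statistics.pstdev(self.history) or 1.0  # avoid division by zero
            anomalous = (window_count - mean) / std > self.z_threshold
        self.history.append(window_count)
        return anomalous

detector = VolumeAnomalyDetector()
for count in [410, 395, 402, 388, 420, 405] * 5:  # warm-up on normal traffic
    detector.observe(count)
print(detector.observe(3_000))  # True: a Scenario A-style spike in verification sessions
```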

4.4. A conforming system MUST implement automated alerting when estate-level anomaly detection thresholds are breached, with alerts routed to security operations within a maximum latency defined by risk tier: 15 minutes for critical-tier agents, 60 minutes for high-tier agents, and 4 hours for standard-tier agents.
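The tiered latency budget reduces naturally to declarative configuration. A sketch assuming the three tier labels from this clause (the function and variable names are placeholders):

```python
from datetime import datetime, timedelta, timezone

# Maximum alert latency by risk tier, per requirement 4.4.
ALERT_SLA = {
    "critical": timedelta(minutes=15),
    "high": timedelta(minutes=60),
    "standard": timedelta(hours=4),
}

def alert_deadline(agent_tier: str, detected_at: datetime) -> datetime:
    """Latest time by which security operations must receive the alert."""
    return detected_at + ALERT_SLA[agent_tier]

now = datetime.now(timezone.utc)
print(alert_deadline("critical", now))  # detection time + 15 minutes
```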

4.5. A conforming system MUST maintain the capability to correlate activity across accounts, sessions, IP addresses, client fingerprints, and temporal patterns to identify coordinated campaigns operating through distributed identities.
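Correlation across distributed identities is essentially connected components over shared indicators: accounts that touch the same IP address or client fingerprint collapse into one cluster. A minimal union-find sketch (indicator formats are illustrative; a production system would also weight links by time proximity and indicator rarity):

```python
class IdentityGraph:
    """Union-find over identity indicators: accounts sharing an IP address or
    client fingerprint merge into one cluster, exposing distributed campaigns."""

    def __init__(self):
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)

graph = IdentityGraph()
# Each session links an account to the indicators it was observed with.
for account, indicators in [
    ("acct_1", ["ip:203.0.113.7", "fp:ab12"]),
    ("acct_2", ["ip:203.0.113.7"]),              # shares an IP with acct_1
    ("acct_3", ["fp:ab12", "ip:198.51.100.9"]),  # shares a fingerprint with acct_1
]:
    for ind in indicators:
        graph.union(account, ind)

print(graph.find("acct_2") == graph.find("acct_3"))  # True: same coordination cluster
```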

4.6. A conforming system MUST implement graduated response capabilities that can be activated when abuse-at-scale is detected, including: (a) enhanced monitoring for the affected agent or interaction pattern; (b) increased authentication or verification requirements; (c) rate reduction or temporary suspension for affected accounts or session clusters; and (d) estate-wide defensive posture escalation.
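Graduated response implies an ordered escalation ladder. A sketch of one possible encoding, assuming escalation proceeds one step at a time while the campaign persists (the step names map to (a) through (d) above; everything else is an assumption):

```python
from enum import IntEnum

class Posture(IntEnum):
    """Graduated responses from 4.6, ordered from least to most disruptive."""
    ENHANCED_MONITORING = 1   # (a) watch the affected agent or pattern more closely
    STEP_UP_VERIFICATION = 2  # (b) require stronger authentication
    RATE_REDUCTION = 3        # (c) throttle or suspend the affected session cluster
    ESTATE_LOCKDOWN = 4       # (d) estate-wide defensive posture escalation

def next_posture(current: Posture, campaign_still_active: bool) -> Posture:
    """Escalate one step at a time while the campaign persists; de-escalation
    (not shown) should be a deliberate human decision, not automatic."""
    if campaign_still_active and current < Posture.ESTATE_LOCKDOWN:
        return Posture(current + 1)
    return current

posture = Posture.ENHANCED_MONITORING
posture = next_posture(posture, campaign_still_active=True)
print(posture.name)  # STEP_UP_VERIFICATION
```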

4.7. A conforming system MUST conduct post-incident analysis for every detected abuse-at-scale campaign, producing a documented analysis that includes: campaign scope (accounts, sessions, time period, volume), attack methodology, detection timeline (when the campaign began versus when it was detected), impact assessment, and defensive improvement recommendations.

4.8. A conforming system SHOULD implement behavioural clustering algorithms that group sessions by interaction pattern similarity, enabling detection of coordinated campaigns where individual sessions are benign but the cluster pattern reveals coordination.
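A minimal sketch of behavioural clustering using density-based clustering (DBSCAN from scikit-learn) over per-session feature vectors; the features, normalisation, and DBSCAN parameters are illustrative choices, not prescribed by this clause:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Each session is reduced to a small behavioural feature vector; the features
# here (turn count, mean inter-message gap in seconds, fraction of turns that
# enter the identity-verification flow) are illustrative.
sessions = np.array([
    # organic traffic: varied behaviour
    [12, 45.0, 0.10], [3, 120.0, 0.00], [25, 20.0, 0.05], [7, 80.0, 0.00],
    # campaign traffic: near-identical scripted flows (Scenario A style)
    [4, 9.8, 0.75], [4, 10.1, 0.75], [4, 9.9, 0.74], [4, 10.0, 0.76],
    [4, 10.2, 0.75], [4, 9.7, 0.75],
])

# Normalise features so no single scale dominates the distance metric.
normalised = (sessions - sessions.mean(axis=0)) / sessions.std(axis=0)

labels = DBSCAN(eps=0.5, min_samples=4).fit_predict(normalised)
for cluster in set(labels) - {-1}:  # -1 is DBSCAN's noise label
    members = (labels == cluster).sum()
    print(f"coordination cluster {cluster}: {members} near-identical sessions")
```

Organic sessions land in the noise bucket because no four of them are mutually similar; the scripted sessions form a dense cluster even though each one, inspected alone, looks like a short, normal conversation.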

4.9. A conforming system SHOULD implement cross-agent correlation that detects campaigns spanning multiple agent types — e.g., an attacker probing safety boundaries on a low-risk internal copilot and applying discovered techniques against a high-risk customer-facing agent.

4.10. A conforming system SHOULD integrate abuse-at-scale detection with external threat intelligence feeds to identify known botnet infrastructure, compromised credential databases, and coordinated attack campaign indicators.

4.11. A conforming system MAY implement predictive detection that identifies emerging campaigns in their early phases (e.g., the boundary-probing Phase 1 in Scenario B) before the campaign reaches the exploitation phase, enabling pre-emptive defensive action.

5. Rationale

Abuse-at-scale is the natural evolution of adversarial attacks against AI agents as agent deployments grow from experimental single-instance deployments to production estates serving millions of users. The transition from individual attacks to coordinated campaigns mirrors the evolution of cybersecurity threats against traditional web applications — the same progression from manual exploitation to automated botnet-driven attacks, from single-account fraud to distributed credential stuffing, from individual vulnerability probing to systematic attack surface mapping.

AI agent estates face a specific vulnerability to scale attacks because of the fundamental asymmetry between per-session defences and estate-level visibility. Per-session controls — rate limiting, input validation, prompt integrity, output filtering — are designed to protect individual interactions. They evaluate each session in isolation. An attacker who distributes an attack across thousands of sessions, each of which individually appears benign, can bypass every per-session control while conducting a devastating campaign at the aggregate level. This is not a failure of per-session controls; it is a limitation of their architectural scope. AG-436 addresses this limitation by requiring detection capabilities that operate at the estate level, analysing patterns that only become visible when individual sessions are correlated.

The economic incentive for abuse-at-scale is substantial. A single jailbreak produces one piece of prohibited content; a coordinated campaign produces thousands. A single credential-stuffing session compromises one account; a botnet campaign compromises hundreds. A single extraction query yields minimal intellectual property; 2.8 million coordinated queries can reconstruct an entire proprietary model. The attacker's return on investment from scale attacks is orders of magnitude higher than from individual attacks, making scale attacks the preferred methodology for sophisticated adversaries.

Three categories of abuse-at-scale require distinct detection approaches. First, automated abuse by botnets: high-volume, machine-speed attacks using compromised infrastructure. Detection relies on temporal pattern analysis, client fingerprinting, and behavioural indicators of automation (consistent timing, identical error patterns, lack of human behavioural noise). Second, coordinated human abuse: networks of human operators conducting a campaign with shared objectives but individual execution. Detection relies on content similarity analysis, conversational pattern clustering, and temporal correlation that reveals coordination. Third, hybrid campaigns: automated infrastructure directed by human operators, combining the volume of bots with the adaptability of humans. Detection requires combining the indicators from both automated and coordinated human detection.
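For the first category, one cheap automation indicator is timing regularity: scripted clients tend to produce near-constant inter-message gaps, whereas humans are bursty. A sketch (the interpretation of low variation as automation is a heuristic; Scenario A shows that engineered botnets inject noise precisely to defeat it, so treat this as one signal among several):

```python
import statistics

def automation_score(inter_arrival_seconds: list[float]) -> float:
    """Coefficient of variation of inter-message gaps. Human conversations are
    irregular (high CV); naive scripted clients are suspiciously regular
    (CV near zero)."""
    mean = statistics.fmean(inter_arrival_seconds)
    return statistics.pstdev(inter_arrival_seconds) / mean

human = [4.2, 19.7, 7.1, 31.0, 12.5, 3.3]  # bursty, with long pauses
bot = [5.0, 5.1, 4.9, 5.0, 5.1, 5.0]       # metronomic pacing

print(f"human CV={automation_score(human):.2f}")  # high variability
print(f"bot   CV={automation_score(bot):.2f}")    # near zero: likely automated
```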

Regulatory expectations for scale-attack detection are increasing. The EU AI Act requires that high-risk AI systems be resilient to adversarial manipulation — a requirement that encompasses manipulation at scale, not only manipulation of individual sessions. Financial regulators expect fraud detection capabilities proportionate to the attack surface — an agent accessible to millions of users has an attack surface that demands estate-level monitoring. The EU Digital Services Act requires platforms to implement measures to address systemic risks, including the misuse of AI systems for the generation of prohibited content at scale.

6. Implementation Guidance

Abuse-at-scale detection requires a fundamentally different architectural approach from per-session security. Per-session controls operate inline — they evaluate each request as it arrives and make accept/reject decisions in real time. Estate-level detection operates on aggregated data — it collects interaction telemetry from across the estate, analyses patterns over time windows, and identifies anomalies that are invisible in any single session. The two approaches are complementary, not competing: per-session controls provide the first line of defence, and estate-level detection identifies campaigns that per-session controls cannot see.
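The architectural split can be seen in miniature below: the inline path only emits telemetry records, and a separate aggregator maintains estate-wide state over a sliding window for detectors to query, adding no latency to any individual request. A sketch with illustrative field names and a 5-minute window:

```python
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass
class Interaction:
    """One telemetry record emitted by the inline path (fields illustrative)."""
    timestamp: float  # epoch seconds
    account: str
    agent_id: str
    topic: str

class SlidingWindowAggregator:
    """Out-of-band aggregation: maintains estate-wide counts over a sliding
    window so detectors can query aggregate state off the request path."""

    def __init__(self, window_seconds: float = 300.0):  # 5-minute window
        self.window = window_seconds
        self.events: deque[Interaction] = deque()
        self.by_topic: defaultdict[str, int] = defaultdict(int)

    def ingest(self, event: Interaction) -> None:
        self.events.append(event)
        self.by_topic[event.topic] += 1
        self._evict(event.timestamp)

    def _evict(self, now: float) -> None:
        while self.events and self.events[0].timestamp < now - self.window:
            old = self.events.popleft()
            self.by_topic[old.topic] -= 1

    def topic_counts(self) -> dict[str, int]:
        return {t: c for t, c in self.by_topic.items() if c > 0}

agg = SlidingWindowAggregator()
agg.ingest(Interaction(0.0, "acct_1", "support_agent", "password_reset"))
agg.ingest(Interaction(120.0, "acct_2", "support_agent", "password_reset"))
print(agg.topic_counts())  # {'password_reset': 2} -> fed to estate-level detectors
```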

Recommended patterns:

- Centralise interaction telemetry from every channel (conversational, API, tool invocation, inter-agent) into a single analytics pipeline before attempting pattern detection.
- Run detection over sliding windows at multiple time scales: minutes for fast-moving botnet campaigns (Scenario A), weeks for slow extraction campaigns (Scenario C).
- Cluster sessions by behavioural similarity rather than volume alone; coordination usually manifests as similarity across sessions, not as volume in any one of them.
- Analyse account lifecycles (registration timing, jurisdiction distribution, activation patterns) to surface synthetic identity generation.
- Pre-define graduated response playbooks so that detection can trigger containment without waiting for a bespoke decision.
- Integrate external threat intelligence covering known botnet infrastructure and breached credential sets.

Anti-patterns to avoid:

- Treating per-session and per-account controls as sufficient: every scenario in Section 3 passes those controls while the campaign succeeds.
- Alerting on aggregate volume alone: distributed campaigns are engineered to keep each account and session within normal bounds.
- Static baselines that are never recalibrated, which drift into noise or go blind to slow-moving campaigns.
- Detection without pre-authorised response: identifying a campaign has little value if containment requires an improvised decision chain.
- Calibrating thresholds without regard to populations whose legitimate activity naturally clusters (see Public Sector below), producing discriminatory false positives.

Industry Considerations

Financial Services. Financial agent estates are primary targets for credential stuffing, account takeover, and distributed fraud campaigns. Financial institutions should implement the most aggressive detection timelines (15-minute alerting latency), mandatory account creation anomaly detection, and integration with existing fraud detection systems. Cross-correlation between agent interaction patterns and traditional transaction fraud indicators provides powerful detection of campaigns that use agent conversational interfaces as the entry point for financial fraud.

Content Platforms. Platforms deploying content generation agents face the specific risk of coordinated content generation campaigns (Scenario B). Content similarity analysis across the estate — detecting clusters of generated content converging on specific topics or themes — is essential. Integration with content moderation systems and external content intelligence feeds enables detection of campaigns producing harmful content that individually passes content filters but collectively constitutes a systematic campaign.
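A lightweight form of content similarity analysis is Jaccard similarity over word n-gram shingles: outputs generated from the same jailbreak template share structure that independently produced content does not. A sketch (at estate scale one would replace exact set comparison with MinHash/LSH; all text, the shingle size, and any flagging threshold are illustrative):

```python
def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Word n-grams used as the similarity fingerprint (n=3 is illustrative)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

# Illustrative: two outputs from different accounts produced by the same
# template share far more shingles than unrelated content would.
doc_a = "join our movement today the cause needs strong believers like you"
doc_b = "join our movement today the cause needs loyal believers like you"
doc_c = "here is a summary of this week's local sports results and fixtures"

print(f"{jaccard(shingles(doc_a), shingles(doc_b)):.2f}")  # high: likely same template
print(f"{jaccard(shingles(doc_a), shingles(doc_c)):.2f}")  # ~0: unrelated content
```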

Crypto/Web3. Crypto agents face heightened extraction risk because trading strategies and market analysis represent high-value intellectual property. Query distribution analysis is particularly important for detecting systematic extraction campaigns. Additional consideration: on-chain transaction analysis can reveal coordination among accounts that interact with both the agent and related blockchain protocols.

Public Sector. Government agents face the risk of coordinated manipulation campaigns that seek to influence benefits decisions, immigration processing, or other rights-affecting outcomes at scale. Scale detection must account for the possibility that coordinated campaigns target vulnerable populations whose applications may already appear similar, requiring careful calibration to avoid discriminatory false positives.

Maturity Model

Basic Implementation — The organisation aggregates interaction telemetry from all agents into a centralised analytics pipeline. Estate-level baselines are established for session volume, account creation rate, and query topic distribution. Anomaly detection triggers alerts when metrics deviate beyond defined thresholds. Alerts are routed to security operations within the required latency. Post-incident analysis is conducted for detected campaigns. Graduated response capabilities include manual account suspension and rate adjustment. This level detects volume-based attacks and obvious coordination patterns.

Intermediate Implementation — All basic capabilities plus: behavioural clustering groups sessions by interaction pattern similarity and detects coordination clusters. Account creation and lifecycle anomaly detection identifies synthetic identity patterns. Query distribution analysis detects systematic extraction campaigns. Cross-agent correlation detects campaigns spanning multiple agent types. Near-real-time detection operates on 5-minute sliding windows. Graduated response is partially automated with pre-defined playbooks. External threat intelligence feeds are integrated. Detection is tested quarterly against simulated scale attack scenarios.

Advanced Implementation — All intermediate capabilities plus: predictive detection identifies emerging campaigns in early phases before exploitation begins. Machine learning models continuously adapt baselines to seasonal and contextual variation while maintaining sensitivity to adversarial patterns. Full automation of graduated response with human-in-the-loop for escalation decisions. Real-time dashboards provide estate-level threat visibility. Red team exercises simulate novel scale attack methodologies not covered by known patterns. Cross-organisational threat intelligence sharing contributes to and benefits from collective detection capabilities. Detection latency for fast-moving campaigns (Scenario A) is under 30 minutes from campaign onset.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Distributed Session Pattern Detection

Test 8.2: Account Creation Anomaly Detection

Test 8.3: Query Distribution Extraction Detection

Test 8.4: Graduated Response Activation

Test 8.5: Cross-Agent Correlation Detection

Test 8.6: Baseline Calibration and False Positive Measurement

Test 8.7: Post-Incident Analysis Completeness

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Direct requirement
EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement
SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance
FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement
NIST AI RMF | MANAGE 2.2, MANAGE 4.1 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks) | Supports compliance
DORA | Article 9 (ICT Risk Management Framework), Article 17 (ICT-Related Incident Management Process) | Direct requirement

EU AI Act — Article 9 (Risk Management System) and Article 15 (Accuracy, Robustness and Cybersecurity)

The EU AI Act requires that high-risk AI systems be protected against adversarial manipulation. Abuse-at-scale — coordinated campaigns that exploit AI agent estates using botnets, synthetic accounts, or organised human networks — is a form of adversarial manipulation that operates at a level of sophistication and volume that individual-session defences cannot address. Article 9 requires a risk management system that identifies and mitigates risks throughout the AI system's lifecycle. The risk of coordinated exploitation is a foreseeable risk for any AI system accessible to multiple users, and the absence of estate-level detection means this risk is neither identified nor mitigated. Article 15's cybersecurity requirement explicitly encompasses the resilience of the AI system against coordinated attacks, not only individual exploitation attempts. An organisation that demonstrates per-session security but cannot detect or respond to a 12,000-session botnet campaign has not met Article 15's robustness standard.

FCA SYSC — 6.1.1R (Systems and Controls)

The FCA expects that financial firms maintain systems and controls proportionate to the risks they face. For firms deploying AI agents accessible to customers, the risk of coordinated abuse campaigns — credential stuffing, distributed fraud, systematic extraction of proprietary models — is material and well-documented. The FCA's operational resilience framework requires that firms can detect and respond to threats to their critical services, including threats that operate through their customer-facing AI systems. AG-436 provides the specific detection and response controls for abuse-at-scale against agent estates. Firms that implement per-session controls but lack estate-level detection face supervisory challenge when a coordinated campaign causes customer harm that could have been detected earlier.

DORA — Article 9 (ICT Risk Management Framework) and Article 17 (ICT-Related Incident Management Process)

DORA requires financial entities to implement ICT risk management that includes the detection of anomalous activities and ICT-related incidents. A coordinated abuse-at-scale campaign against an AI agent estate is an ICT-related incident that must be detected, managed, and reported. Article 17 requires that entities have processes for detecting, classifying, and responding to ICT-related incidents. AG-436's graduated response playbooks and post-incident analysis requirements directly support DORA Article 17 compliance by ensuring that abuse-at-scale campaigns are detected as incidents, responded to through defined processes, and analysed for continuous improvement. The alerting latency requirements (15 minutes for critical-tier agents) align with DORA's expectation of timely incident detection.

SOX — Section 404 (Internal Controls Over Financial Reporting)

Coordinated campaigns against financial agents can directly affect financial reporting accuracy — fraudulent transactions initiated through botnet credential stuffing (Scenario A), manipulation of financial decision-making at scale, or extraction of proprietary financial models. SOX requires that internal controls are effective at preventing material misstatement. Estate-level detection that identifies and arrests coordinated fraud campaigns is a necessary component of the internal control environment for organisations whose financial processing relies on AI agents. The absence of scale detection means that a coordinated fraud campaign could process thousands of fraudulent transactions before detection, potentially creating material financial statement impact.

10. Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | Estate-wide; affects all agents, accounts, and users within the exploited scope; may extend to partner organisations, financial counterparties, and downstream systems that process agent outputs

Consequence chain: The absence of abuse-at-scale detection creates a blind spot between per-session security and estate-level visibility. An adversary — whether a botnet operator, organised fraud network, or state-sponsored actor — identifies that individual agent sessions are well-protected but that no correlation exists across sessions. The adversary designs a campaign that distributes activity across thousands of sessions, each individually benign, collectively devastating. The campaign executes over hours (Scenario A: £1.72 million in financial losses within 4 hours), weeks (Scenario B: 45,000 pieces of extremist content over 14 days with £8.3 million in revenue impact), or months (Scenario C: £14 million in intellectual property extraction over 6 months). Per-session controls pass every individual interaction. The organisation's security team sees no alerts. The campaign is discovered only through secondary effects — downstream fraud detection, external researcher reports, or competitor product launches — by which time the damage is complete and largely irreversible. The business consequences cascade: direct financial losses from fraud or IP theft, regulatory enforcement for inadequate controls (GDPR fines, FCA enforcement, DORA incident reporting failures), reputational damage from public disclosure of the exploitation, customer remediation costs, and the operational cost of emergency response and system hardening. For organisations with large agent estates serving millions of users, a single undetected abuse-at-scale campaign can produce losses exceeding £10 million and trigger regulatory actions that constrain future AI deployment. The severity is rated Critical because the failure is both high-impact and silent — the organisation has no mechanism to detect the damage while it is occurring, and the delay between exploitation and discovery maximises the attacker's return and the organisation's loss.

Cross-references: AG-004 (Action Rate Governance), AG-003 (Adversarial Coordination Detection), AG-429 (Social Engineering Attack Simulation Governance), AG-432 (Model Exfiltration Throttling Governance), AG-434 (Covert Channel Detection Governance), AG-437 (Economic Abuse Resistance Governance), AG-438 (Jailbreak Pattern Library Governance), AG-414 (Alert Deduplication Governance), AG-418 (Cross-System Trace Correlation Governance).

Cite this protocol
AgentGoverning. (2026). AG-436: Abuse-at-Scale Detection Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-436