AG-103

Red-Team Coverage Management Governance

Adversarial AI, Security Testing & Abuse Resistance ~18 min read AGS v2.1 · April 2026
EU AI Act GDPR SOX FCA NIST ISO 42001

2. Summary

Red-Team Coverage Management Governance requires that organisations maintain a structured, measurable approach to ensuring that adversarial testing of AI agents covers the full attack surface — not just the attack vectors that are convenient to test or that the red team is most familiar with. Many organisations conduct red-team exercises that repeatedly test the same categories of attack (typically prompt injection and basic jailbreaking) while leaving entire attack classes unexamined: membership inference, multimodal exploitation, long-horizon multi-step attacks, supply chain manipulation, and cross-agent coordination attacks. AG-103 requires a coverage matrix that maps all identified attack classes against all deployed agents, tracks which combinations have been tested, identifies coverage gaps, and drives remediation of those gaps on a defined schedule.

3. Example

Scenario A — Red-Team Coverage Blind Spot Allows Systematic Exploitation: A financial services firm conducts quarterly red-team exercises against its AI trading agent. The red team consistently focuses on prompt injection, jailbreaking, and output manipulation — the attack classes they are most experienced with. Over four quarters, they achieve 92% pass rates and report the agent as "well-defended." However, no test has ever evaluated membership inference (the agent is fine-tuned on proprietary trading data), multi-step attack chains (where individually benign queries accumulate to extract sensitive information), or supply chain attacks (where a compromised data feed alters the agent's behaviour). A competitor exploits the membership inference vulnerability over six weeks, extracting sufficient information to reconstruct the firm's core equity strategy. The firm loses an estimated £25 million in annual alpha before detecting the breach.

What went wrong: The red-team exercises tested depth within familiar attack classes but lacked breadth across the full attack surface. No coverage matrix tracked which attack classes had been evaluated. The quarterly reports showed high pass rates, creating false confidence. The untested attack classes contained the vulnerability that was ultimately exploited. Consequence: £25 million in lost competitive advantage, FCA investigation into adequacy of testing controls, board-level inquiry into security governance, and mandatory remediation costing £3 million.

Scenario B — Coverage Decay After Agent Update: A customer-facing AI agent undergoes a major model update — moving from a text-only model to a multimodal model that also processes uploaded images and documents. The organisation's red-team programme was designed for the text-only version and covers prompt injection, output manipulation, and data exfiltration through text channels. After the update, the red team continues to execute the same test plan. No tests address the new multimodal attack surface: adversarial images, cross-modal inconsistency exploitation, or visual prompt injection. Three months after the update, an attacker discovers that embedding instructions in image metadata allows bypassing the text-based prompt injection filters entirely.

What went wrong: The red-team coverage matrix was not updated when the agent's capabilities changed. No process required re-assessment of the attack surface after a significant agent update. The red team tested the old attack surface, not the current one. Consequence: Customer data exfiltration through the visual channel affecting 12,000 records, mandatory breach notification under GDPR, ICO investigation, and six-week deployment suspension during emergency remediation.

Scenario C — Incomplete Coverage Across Agent Portfolio: An enterprise deploys 15 AI agents across different business functions. The red-team programme tests the three highest-profile agents thoroughly but has never tested the remaining 12 "lower-risk" agents. One of the untested agents — a document classification agent used in the legal department — has access to privileged legal communications and operates with minimal security controls because it was classified as "internal only." An attacker compromises the agent through a supply chain attack on one of its dependencies, gaining access to 2,500 privileged legal documents including active litigation strategy.

What went wrong: Red-team coverage was allocated by perceived risk and visibility, not by a systematic coverage analysis across the full agent portfolio. The untested agent had both high sensitivity (legal privilege) and high vulnerability (minimal controls). No coverage tracking identified it as a gap. Consequence: Breach of legal professional privilege for active litigation, potential case outcomes affected, regulatory penalty for inadequate data protection, malpractice exposure, and £8 million in estimated damages.

4. Requirement Statement

Scope: This dimension applies to all organisations that deploy AI agents in production environments and conduct adversarial testing (red-teaming) of those agents. The scope encompasses the governance of the red-team programme itself — specifically, the completeness and adequacy of coverage across agents, attack classes, and time. It does not prescribe specific red-team techniques (which are addressed by individual AG dimensions such as AG-095, AG-098, AG-101, and AG-102); rather, it governs the management process that ensures all required techniques are applied to all relevant agents on an appropriate schedule. Organisations that do not yet conduct red-team exercises should first implement AG-100 (Red-Team Readiness Governance) before proceeding to AG-103. The scope extends to third-party red-team engagements — the organisation retains responsibility for coverage management regardless of whether testing is conducted internally or externally.

4.1. A conforming system MUST maintain a coverage matrix that maps all identified attack classes against all deployed agents, recording which agent-attack combinations have been tested, the date of the most recent test, the result, and the next scheduled test.

4.2. A conforming system MUST define a minimum coverage standard specifying which attack classes must be tested for each agent profile, based on the agent's capabilities, data access, and deployment context.

4.3. A conforming system MUST update the coverage matrix within 30 days of any significant change to an agent's capabilities, input modalities, data access, or deployment context, re-assessing the attack surface and scheduling tests for newly identified attack classes.

4.4. A conforming system MUST track coverage gaps — agent-attack combinations that are required by the minimum coverage standard but have not been tested within the defined testing cadence — and report them to the responsible governance body at defined intervals.

4.5. A conforming system MUST ensure that no deployed agent has any required attack class untested for more than 12 months.

4.6. A conforming system SHOULD define attack class categories that align with the AG dimension landscape — at minimum: prompt injection (AG-095), output manipulation (AG-096), extraction attacks (AG-098), multimodal attacks (AG-102 where applicable), membership inference (AG-101 where applicable), long-horizon attacks (AG-044), supply chain attacks, and cross-agent coordination attacks.

4.7. A conforming system SHOULD implement coverage metrics that quantify the proportion of the required coverage matrix that has been tested within cadence, enabling trend analysis and regression detection.

4.8. A conforming system SHOULD require that red-team exercises include at least one novel attack technique per cycle — an attack not previously in the coverage matrix — to ensure the programme evolves with the threat landscape.

4.9. A conforming system MAY implement automated continuous red-teaming for high-frequency attack classes (e.g., prompt injection) while reserving manual red-team exercises for complex attack classes (e.g., multi-step attacks, social engineering of the agent).

5. Rationale

Red-team exercises are a critical governance control for AI agent deployments — they are the empirical validation that other governance controls actually work under adversarial conditions. However, the value of red-team exercises is entirely dependent on their coverage. A red-team programme that thoroughly tests three attack classes but ignores seven others provides false assurance — the high pass rates on tested classes mask the untested vulnerabilities that adversaries will exploit.

This is not a theoretical concern. The adversarial AI research community continuously identifies new attack classes, and the lag between academic publication and integration into red-team programmes can be 12-24 months. During that lag, organisations are vulnerable to attacks they are not testing for. AG-103 addresses this by requiring a structured coverage management process that identifies gaps, tracks evolution of the attack surface, and ensures that the red-team programme keeps pace with both the organisation's changing agent portfolio and the evolving threat landscape.

The coverage management challenge has several dimensions. First, the attack surface grows with each new agent capability — a text-only agent has a different attack surface than a multimodal agent, and a single-agent system has a different surface than a multi-agent orchestration. Second, the attack surface evolves as new techniques are published — membership inference techniques that were state-of-the-art two years ago may be significantly less effective than current techniques, meaning that historical test results against old attack implementations provide diminishing assurance. Third, coverage gaps are invisible without structured tracking — an organisation cannot know what it has not tested without a systematic comparison of what it should test against what it has tested.

The operational consequence of inadequate coverage is that security governance degrades silently. The organisation reports high red-team pass rates, governance dashboards show green, and decision-makers believe the agents are well-defended — but the untested attack classes harbour exploitable vulnerabilities. AG-103 prevents this false confidence by requiring explicit coverage tracking, gap identification, and remediation scheduling.

6. Implementation Guidance

AG-103 requires a coverage management infrastructure that operates alongside the red-team programme itself. The coverage matrix is the central artefact — a living document that maps the intersection of agents and attack classes and tracks testing completeness over time.

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. Regulators including the FCA and ECB expect that AI systems are subject to ongoing adversarial testing, not just pre-deployment assessment. The TIBER-EU framework for threat intelligence-based ethical red-teaming provides a model for structuring coverage across financial AI agents. Firms should map their AI red-team coverage matrix to their existing TIBER or CBEST programmes.

Healthcare. Healthcare AI agents require coverage of medical-domain-specific attack classes — adversarial perturbations to medical images, manipulation of clinical decision support outputs, and exploitation of drug interaction databases. The FDA's post-market surveillance expectations for AI-based medical devices include ongoing adversarial evaluation.

Public Sector. Government AI agents processing citizen data carry heightened accountability. Red-team coverage should include attack classes specific to public sector contexts: benefit fraud facilitation, identity verification bypass, and manipulation of eligibility determinations. Coverage reports should be available for parliamentary scrutiny and audit.

Maturity Model

Basic Implementation — The organisation maintains a coverage matrix listing all deployed agents and the attack classes tested against each. The minimum coverage standard is defined based on agent profiles. Coverage gaps are identified and reported quarterly. All required attack classes are tested at least annually. This level meets the minimum mandatory requirements but may have limited coverage breadth and no automated coverage tracking.

Intermediate Implementation — Coverage tracking is automated with a governance dashboard showing real-time coverage metrics and trends. Capability-triggered reassessment is integrated with the agent change management process. Coverage metrics are reported to the governance body monthly. Novel attack techniques are integrated into the programme within 90 days of identification. Red-team exercises include at least one novel technique per cycle.

Advanced Implementation — All intermediate capabilities plus: automated continuous red-teaming covers high-frequency attack classes in real-time. Coverage metrics feed into organisational risk calculations. Independent external validation of coverage completeness is conducted annually. The organisation contributes to industry threat intelligence sharing on AI adversarial techniques. Coverage management is formally integrated with the organisation's overall security testing programme, ensuring that AI-specific and traditional cybersecurity testing are coordinated.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Testing AG-103 compliance is a governance process audit — it evaluates the coverage management process, not the individual red-team test results (which are tested under their respective AG dimensions).

Test 8.1: Coverage Matrix Completeness

Test 8.2: Coverage Cadence Compliance

Test 8.3: Capability Change Reassessment Timeliness

Test 8.4: Gap Reporting and Remediation

Test 8.5: Novel Attack Integration

Test 8.6: Minimum Coverage Standard Appropriateness

Conformance Scoring

9. Regulatory Mapping

RegulationProvisionRelationship Type
EU AI ActArticle 9 (Risk Management System)Direct requirement
EU AI ActArticle 15 (Accuracy, Robustness and Cybersecurity)Supports compliance
NIST AI RMFGOVERN 1.1, MANAGE 4.1Supports compliance
ISO 42001Clause 9.1 (Monitoring, Measurement, Analysis and Evaluation)Direct requirement
FCA SYSC6.1.1R (Systems and Controls)Supports compliance
DORAArticle 26 (Threat-Led Penetration Testing)Direct requirement
SOXSection 404 (Internal Controls Over Financial Reporting)Supports compliance

EU AI Act — Article 9 (Risk Management System)

Article 9 requires that the risk management system include "testing with a view to identifying the most appropriate risk management measures." AG-103 implements the governance of that testing — ensuring that the testing programme covers the full risk surface, not just a subset of it. The regulation's requirement for continuous, iterative risk management maps directly to AG-103's requirements for ongoing coverage tracking, gap identification, and programme evolution.

EU AI Act — Article 15 (Accuracy, Robustness and Cybersecurity)

Article 15 requires resilience against attempts to exploit system vulnerabilities. Demonstrating resilience requires testing against the known vulnerability space — AG-103 ensures that the testing programme covers that space systematically rather than selectively.

NIST AI RMF — GOVERN 1.1, MANAGE 4.1

GOVERN 1.1 addresses legal and regulatory requirements; MANAGE 4.1 addresses regular monitoring and review of risk management effectiveness. AG-103 supports both by ensuring that the adversarial testing programme — the primary mechanism for validating risk management effectiveness — is comprehensive and current.

ISO 42001 — Clause 9.1

Clause 9.1 requires organisations to determine what needs to be monitored and measured, the methods for monitoring and measurement, and when monitoring and measurement shall be performed. AG-103 directly implements this requirement for adversarial testing: the coverage matrix determines what needs to be tested, the minimum coverage standard determines the methods, and the testing cadence determines when. The coverage metrics provide the measurement required by the clause.

FCA SYSC — 6.1.1R (Systems and Controls)

For financial firms, the adequacy of testing programmes is a systems and controls obligation. An AI governance testing programme that leaves significant attack classes untested would not meet the adequacy standard. The FCA expects testing programmes to evolve with the threat landscape, which AG-103 ensures through its novel attack integration requirement.

DORA — Article 26 (Threat-Led Penetration Testing)

Article 26 requires financial entities to carry out threat-led penetration testing (TLPT) at least every three years. For AI agents in financial services, TLPT must include AI-specific attack classes. AG-103's coverage management ensures that AI-specific attack classes are included in TLPT scope and that coverage is tracked across testing cycles. The coverage matrix provides the evidence that TLPT addressed the AI attack surface.

SOX — Section 404 (Internal Controls Over Financial Reporting)

For AI agents involved in financial reporting or financial operations, the testing of governance controls is a SOX requirement. A SOX auditor will examine whether the testing programme covers the control objectives. AG-103's coverage matrix provides the auditor with a comprehensive view of what has been tested, when, and with what results — directly supporting the Section 404 assessment.

10. Failure Severity

FieldValue
Severity RatingHigh
Blast RadiusOrganisation-wide — coverage gaps can exist across any deployed agent, and exploitation of an untested vulnerability can affect any business function served by that agent

Consequence chain: Without red-team coverage management, the organisation accumulates untested vulnerabilities across its agent portfolio. The failure mode is invisible — there is no alert when an attack class is not tested, no warning when agent capabilities change without coverage reassessment, and no signal when the threat landscape evolves beyond the red-team programme's scope. The organisation operates with false confidence based on high pass rates in the attack classes that are tested, while untested classes harbour exploitable vulnerabilities. When an adversary discovers and exploits an untested vulnerability, the impact scales with the agent's authority and data access. The discovery timeline is typically months — organisations rarely detect exploitation of vulnerabilities they did not know existed, because they have no monitoring for attack patterns they have not considered. The financial impact ranges from direct losses through the exploited vulnerability (potentially millions in financial services or healthcare contexts) to indirect costs including regulatory enforcement, remediation, and reputational damage. The regulatory consequence is compounded by the fact that AG-103 failures are governance failures, not technical failures — they demonstrate inadequate oversight of the security testing programme, which regulators view as a more fundamental deficiency than a single technical vulnerability.

Cross-reference note: AG-103 is the meta-governance dimension for the adversarial AI landscape — it ensures that the controls specified by AG-095 through AG-102 are actually tested in practice, not just specified in policy. AG-100 (Red-Team Readiness Governance) establishes the organisational capability to conduct red-team exercises; AG-103 governs the completeness of those exercises. Together, they form the assurance layer that validates the entire adversarial AI governance programme. AG-044 (Long-Horizon Attack Strategy Detection) is a particularly important coverage area, as long-horizon attacks are among the most commonly untested attack classes.

Cite this protocol
AgentGoverning. (2026). AG-103: Red-Team Coverage Management Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-103