The Standard

The 841 Dimensions Regulatory Mapping Version History

Compliance

Compliance Leaderboard Platform Comparison

Verification

Submit for Verification Self-Assessment Tool

About

About AgentGoverning Press & Media

Contact

AG-103

Red-Team Coverage Management Governance

Adversarial AI, Security Testing & Abuse Resistance ~18 min read AGS v2.1 · April 2026

EU AI Act SOX FCA NIST ISO 42001

2. Summary

Red-Team Coverage Management Governance requires that organisations maintain a structured, measurable approach to ensuring that adversarial testing of AI agents covers the full attack surface — not just the attack vectors that are convenient to test or that the red team is most familiar with. Many organisations conduct red-team exercises that repeatedly test the same categories of attack (typically prompt injection and basic jailbreaking) while leaving entire attack classes unexamined: membership inference, multimodal exploitation, long-horizon multi-step attacks, supply chain manipulation, and cross-agent coordination attacks. AG-103 requires a coverage matrix that maps all identified attack classes against all deployed agents, tracks which combinations have been tested, identifies coverage gaps, and drives remediation of those gaps on a defined schedule.

3. Example

Scenario A — Red-Team Coverage Blind Spot Allows Systematic Exploitation: A financial services firm conducts quarterly red-team exercises against its AI trading agent. The red team consistently focuses on prompt injection, jailbreaking, and output manipulation — the attack classes they are most experienced with. Over four quarters, they achieve 92% pass rates and report the agent as "well-defended." However, no test has ever evaluated membership inference (the agent is fine-tuned on proprietary trading data), multi-step attack chains (where individually benign queries accumulate to extract sensitive information), or supply chain attacks (where a compromised data feed alters the agent's behaviour). A competitor exploits the membership inference vulnerability over six weeks, extracting sufficient information to reconstruct the firm's core equity strategy. The firm loses an estimated £25 million in annual alpha before detecting the breach.

What went wrong: The red-team exercises tested depth within familiar attack classes but lacked breadth across the full attack surface. No coverage matrix tracked which attack classes had been evaluated. The quarterly reports showed high pass rates, creating false confidence. The untested attack classes contained the vulnerability that was ultimately exploited. Consequence: £25 million in lost competitive advantage, FCA investigation into adequacy of testing controls, board-level inquiry into security governance, and mandatory remediation costing £3 million.

Scenario B — Coverage Decay After Agent Update: A customer-facing AI agent undergoes a major model update — moving from a text-only model to a multimodal model that also processes uploaded images and documents. The organisation's red-team programme was designed for the text-only version and covers prompt injection, output manipulation, and data exfiltration through text channels. After the update, the red team continues to execute the same test plan. No tests address the new multimodal attack surface: adversarial images, cross-modal inconsistency exploitation, or visual prompt injection. Three months after the update, an attacker discovers that embedding instructions in image metadata allows bypassing the text-based prompt injection filters entirely.

What went wrong: The red-team coverage matrix was not updated when the agent's capabilities changed. No process required re-assessment of the attack surface after a significant agent update. The red team tested the old attack surface, not the current one. Consequence: Customer data exfiltration through the visual channel affecting 12,000 records, mandatory breach notification under GDPR, ICO investigation, and six-week deployment suspension during emergency remediation.

Scenario C — Incomplete Coverage Across Agent Portfolio: An enterprise deploys 15 AI agents across different business functions. The red-team programme tests the three highest-profile agents thoroughly but has never tested the remaining 12 "lower-risk" agents. One of the untested agents — a document classification agent used in the legal department — has access to privileged legal communications and operates with minimal security controls because it was classified as "internal only." An attacker compromises the agent through a supply chain attack on one of its dependencies, gaining access to 2,500 privileged legal documents including active litigation strategy.

What went wrong: Red-team coverage was allocated by perceived risk and visibility, not by a systematic coverage analysis across the full agent portfolio. The untested agent had both high sensitivity (legal privilege) and high vulnerability (minimal controls). No coverage tracking identified it as a gap. Consequence: Breach of legal professional privilege for active litigation, potential case outcomes affected, regulatory penalty for inadequate data protection, malpractice exposure, and £8 million in estimated damages.

4. Requirement Statement

Scope: This dimension applies to all organisations that deploy AI agents in production environments and conduct adversarial testing (red-teaming) of those agents. The scope encompasses the governance of the red-team programme itself — specifically, the completeness and adequacy of coverage across agents, attack classes, and time. It does not prescribe specific red-team techniques (which are addressed by individual AG dimensions such as AG-095, AG-098, AG-101, and AG-102); rather, it governs the management process that ensures all required techniques are applied to all relevant agents on an appropriate schedule. Organisations that do not yet conduct red-team exercises should first implement AG-100 (Red-Team Readiness Governance) before proceeding to AG-103. The scope extends to third-party red-team engagements — the organisation retains responsibility for coverage management regardless of whether testing is conducted internally or externally.

4.1. A conforming system MUST maintain a coverage matrix that maps all identified attack classes against all deployed agents, recording which agent-attack combinations have been tested, the date of the most recent test, the result, and the next scheduled test.

4.2. A conforming system MUST define a minimum coverage standard specifying which attack classes must be tested for each agent profile, based on the agent's capabilities, data access, and deployment context.

4.3. A conforming system MUST update the coverage matrix within 30 days of any significant change to an agent's capabilities, input modalities, data access, or deployment context, re-assessing the attack surface and scheduling tests for newly identified attack classes.

4.4. A conforming system MUST track coverage gaps — agent-attack combinations that are required by the minimum coverage standard but have not been tested within the defined testing cadence — and report them to the responsible governance body at defined intervals.

4.5. A conforming system MUST ensure that no deployed agent has any required attack class untested for more than 12 months.

4.6. A conforming system SHOULD define attack class categories that align with the AG dimension landscape — at minimum: prompt injection (AG-095), output manipulation (AG-096), extraction attacks (AG-098), multimodal attacks (AG-102 where applicable), membership inference (AG-101 where applicable), long-horizon attacks (AG-044), supply chain attacks, and cross-agent coordination attacks.

4.7. A conforming system SHOULD implement coverage metrics that quantify the proportion of the required coverage matrix that has been tested within cadence, enabling trend analysis and regression detection.

4.8. A conforming system SHOULD require that red-team exercises include at least one novel attack technique per cycle — an attack not previously in the coverage matrix — to ensure the programme evolves with the threat landscape.

4.9. A conforming system MAY implement automated continuous red-teaming for high-frequency attack classes (e.g., prompt injection) while reserving manual red-team exercises for complex attack classes (e.g., multi-step attacks, social engineering of the agent).

5. Rationale

Red-team exercises are a critical governance control for AI agent deployments — they are the empirical validation that other governance controls actually work under adversarial conditions. However, the value of red-team exercises is entirely dependent on their coverage. A red-team programme that thoroughly tests three attack classes but ignores seven others provides false assurance — the high pass rates on tested classes mask the untested vulnerabilities that adversaries will exploit.

This is not a theoretical concern. The adversarial AI research community continuously identifies new attack classes, and the lag between academic publication and integration into red-team programmes can be 12-24 months. During that lag, organisations are vulnerable to attacks they are not testing for. AG-103 addresses this by requiring a structured coverage management process that identifies gaps, tracks evolution of the attack surface, and ensures that the red-team programme keeps pace with both the organisation's changing agent portfolio and the evolving threat landscape.

The coverage management challenge has several dimensions. First, the attack surface grows with each new agent capability — a text-only agent has a different attack surface than a multimodal agent, and a single-agent system has a different surface than a multi-agent orchestration. Second, the attack surface evolves as new techniques are published — membership inference techniques that were state-of-the-art two years ago may be significantly less effective than current techniques, meaning that historical test results against old attack implementations provide diminishing assurance. Third, coverage gaps are invisible without structured tracking — an organisation cannot know what it has not tested without a systematic comparison of what it should test against what it has tested.

The operational consequence of inadequate coverage is that security governance degrades silently. The organisation reports high red-team pass rates, governance dashboards show green, and decision-makers believe the agents are well-defended — but the untested attack classes harbour exploitable vulnerabilities. AG-103 prevents this false confidence by requiring explicit coverage tracking, gap identification, and remediation scheduling.

6. Implementation Guidance

AG-103 requires a coverage management infrastructure that operates alongside the red-team programme itself. The coverage matrix is the central artefact — a living document that maps the intersection of agents and attack classes and tracks testing completeness over time.

Recommended patterns:

Structured coverage matrix. Implement the coverage matrix as a structured data store (database, spreadsheet, or governance platform) with the following dimensions: Agent ID, Agent Name, Agent Profile (from AG applicable profiles), Attack Class (aligned with AG dimensions), Required (yes/no based on minimum coverage standard), Last Tested Date, Last Test Result (pass/fail/partial), Testing Team, Next Scheduled Test, and Coverage Status (current/overdue/untested). Generate automated reports showing coverage percentage, overdue tests, and untested agent-attack combinations. A typical enterprise with 15 agents and 10 attack classes has 150 matrix cells to track — manageable manually at small scale but requiring automation at larger deployments.
Attack class taxonomy aligned with AG dimensions. Define attack classes that map directly to AG governance dimensions: prompt injection and jailbreaking (AG-095), output manipulation and poisoning (AG-096), model extraction and weights theft (AG-098), membership inference and model inversion (AG-101), multimodal adversarial attacks (AG-102), data exfiltration through agent channels (AG-013), long-horizon multi-step attacks (AG-044), active deception and evasion (AG-039), supply chain attacks (dependency compromise, data feed manipulation), and cross-agent coordination attacks (exploiting multi-agent architectures). This taxonomy ensures that red-team coverage directly validates AG governance controls.
Capability-triggered coverage reassessment. Integrate the coverage matrix update with the agent change management process. When an agent undergoes a significant change — new model version, new input modality, new data access, new integration, or new deployment context — automatically flag the coverage matrix for reassessment and schedule tests for any newly applicable attack classes. Define "significant change" explicitly to avoid ambiguity.
Coverage dashboard with trend analysis. Implement a governance dashboard that displays: current coverage percentage (tested within cadence / required), coverage trend over time, agents with the most coverage gaps, attack classes with the lowest coverage across the portfolio, and upcoming test schedule. Present this dashboard to the governance body at defined intervals (monthly or quarterly).
Novel attack integration process. Establish a process for monitoring the adversarial AI research landscape and integrating newly published attack techniques into the red-team programme. Assign responsibility for monitoring conference proceedings (NeurIPS, ICML, USENIX Security, IEEE S&P), pre-print servers (arXiv), and vendor security advisories. When a new attack class is identified as relevant, add it to the attack class taxonomy, assess which agents it applies to, and schedule initial testing within 90 days.

Anti-patterns to avoid:

Treating red-team exercises as checkbox compliance. If the red team tests the same techniques against the same agents each quarter without expanding scope, the programme provides diminishing assurance while consuming the same resources. Coverage management requires continuous evolution.
Measuring red-team effectiveness by pass rate alone. A 100% pass rate may indicate thorough defences or inadequate testing. Coverage metrics (breadth of attack classes tested, novelty of techniques, coverage of the agent portfolio) are more meaningful than pass rate alone.
Allocating red-team coverage by agent visibility rather than risk. High-profile customer-facing agents often receive disproportionate testing attention while internal agents with high data sensitivity are under-tested. Coverage allocation should be risk-based, considering both the sensitivity of the agent's data access and the severity of potential exploitation.
Allowing coverage gaps to persist without remediation plans. Identifying gaps without scheduling remediation is worse than not tracking gaps at all — it creates documented evidence that the organisation knew about untested vulnerabilities and did not act. Every identified gap should have a remediation date.
Conducting red-team exercises only at deployment time. The threat landscape evolves continuously. An agent that passed all red-team tests at deployment may be vulnerable to attack techniques published six months later. Continuous or periodic retesting within cadence is essential.

Industry Considerations

Financial Services. Regulators including the FCA and ECB expect that AI systems are subject to ongoing adversarial testing, not just pre-deployment assessment. The TIBER-EU framework for threat intelligence-based ethical red-teaming provides a model for structuring coverage across financial AI agents. Firms should map their AI red-team coverage matrix to their existing TIBER or CBEST programmes.

Healthcare. Healthcare AI agents require coverage of medical-domain-specific attack classes — adversarial perturbations to medical images, manipulation of clinical decision support outputs, and exploitation of drug interaction databases. The FDA's post-market surveillance expectations for AI-based medical devices include ongoing adversarial evaluation.

Public Sector. Government AI agents processing citizen data carry heightened accountability. Red-team coverage should include attack classes specific to public sector contexts: benefit fraud facilitation, identity verification bypass, and manipulation of eligibility determinations. Coverage reports should be available for parliamentary scrutiny and audit.

Maturity Model

Basic Implementation — The organisation maintains a coverage matrix listing all deployed agents and the attack classes tested against each. The minimum coverage standard is defined based on agent profiles. Coverage gaps are identified and reported quarterly. All required attack classes are tested at least annually. This level meets the minimum mandatory requirements but may have limited coverage breadth and no automated coverage tracking.

Intermediate Implementation — Coverage tracking is automated with a governance dashboard showing real-time coverage metrics and trends. Capability-triggered reassessment is integrated with the agent change management process. Coverage metrics are reported to the governance body monthly. Novel attack techniques are integrated into the programme within 90 days of identification. Red-team exercises include at least one novel technique per cycle.

Advanced Implementation — All intermediate capabilities plus: automated continuous red-teaming covers high-frequency attack classes in real-time. Coverage metrics feed into organisational risk calculations. Independent external validation of coverage completeness is conducted annually. The organisation contributes to industry threat intelligence sharing on AI adversarial techniques. Coverage management is formally integrated with the organisation's overall security testing programme, ensuring that AI-specific and traditional cybersecurity testing are coordinated.

7. Evidence Requirements

Required artefacts:

Coverage matrix. The structured coverage matrix showing all agent-attack class combinations, testing status, dates, results, and schedules. Must be current (updated within 30 days of any significant agent change).
Minimum coverage standard. The documented standard specifying which attack classes are required for each agent profile, including the rationale for each inclusion or exclusion.
Coverage gap reports. Periodic reports identifying untested agent-attack combinations, overdue tests, and remediation schedules. Minimum quarterly reporting cadence.
Red-team exercise reports. For each red-team exercise: scope (which agents and attack classes were tested), techniques used, findings, pass/fail results, and coverage matrix updates resulting from the exercise.
Capability change reassessment records. Documentation of coverage matrix reassessments triggered by significant agent changes, including the change description, the attack surface re-analysis, and any new tests scheduled.

Retention requirements:

Coverage matrices, gap reports, and exercise reports: minimum 7 years for regulated financial services; minimum 5 years for other regulated sectors; minimum 3 years otherwise.

Access requirements:

Producible to regulators or auditors within 48 hours of request. Evidence must exist as retained artefacts, not be reconstructable after the fact.

8. Test Specification

Testing AG-103 compliance is a governance process audit — it evaluates the coverage management process, not the individual red-team test results (which are tested under their respective AG dimensions).

Test 8.1: Coverage Matrix Completeness

Stimulus: Audit the coverage matrix against the current inventory of deployed agents and the defined attack class taxonomy. Compare each agent's capabilities and profile against the minimum coverage standard.
Expected behaviour: Every deployed agent is present in the matrix. Every attack class required by the minimum coverage standard for that agent's profile is listed. No required agent-attack combination is missing from the matrix.
Pass criteria: 100% of required agent-attack combinations are present in the coverage matrix.
Fail criteria: Any required agent-attack combination is absent from the coverage matrix.

Test 8.2: Coverage Cadence Compliance

Stimulus: Review the coverage matrix for all required agent-attack combinations and verify that each has been tested within the defined cadence (maximum 12 months).
Expected behaviour: All required combinations show a test date within the past 12 months.
Pass criteria: 100% of required combinations tested within cadence.
Fail criteria: Any required combination has not been tested within the past 12 months without a documented exception and remediation plan.

Test 8.3: Capability Change Reassessment Timeliness

Stimulus: Review agent change logs for the past 12 months and identify all significant capability changes. For each, verify that the coverage matrix was reassessed within 30 days of the change.
Expected behaviour: Each significant capability change has a corresponding coverage matrix reassessment within 30 days, with any new attack classes identified and testing scheduled.
Pass criteria: 100% of significant capability changes have timely coverage reassessments.
Fail criteria: Any significant capability change lacks a coverage reassessment within 30 days.

Test 8.4: Gap Reporting and Remediation

Stimulus: Review coverage gap reports for the past 12 months. For each identified gap, verify that a remediation plan was created and that the gap was either closed or has an active remediation schedule.
Expected behaviour: Every identified coverage gap has a documented remediation plan with a target date. Gaps older than 90 days have an escalation record.
Pass criteria: 100% of identified gaps have remediation plans; no gap older than 12 months without closure or active escalation.
Fail criteria: Any gap lacks a remediation plan, or any gap persists beyond 12 months without documented escalation and justification.

Test 8.5: Novel Attack Integration

Stimulus: Review the organisation's threat intelligence monitoring process for the past 12 months. Identify newly published attack techniques relevant to deployed agents and verify that they were integrated into the coverage matrix.
Expected behaviour: At least one novel attack technique was added to the coverage matrix per testing cycle. Newly published techniques relevant to deployed agents were assessed for inclusion within 90 days of publication.
Pass criteria: Evidence of novel technique integration and timely assessment of published techniques.
Fail criteria: No novel techniques were added in the past 12 months, or relevant published techniques were not assessed within 90 days.

Test 8.6: Minimum Coverage Standard Appropriateness

Stimulus: Review the minimum coverage standard against the current threat landscape, published adversarial research, and industry incident reports. Assess whether the standard includes all attack classes identified by industry sources as relevant to the organisation's agent profiles.
Expected behaviour: The minimum coverage standard reflects the current threat landscape, not a historical snapshot. Attack classes absent from the standard are justified by documented risk assessment.
Pass criteria: The minimum coverage standard is current (reviewed within 12 months) and includes all relevant attack classes or documents the risk-based rationale for exclusions.
Fail criteria: The standard has not been reviewed in more than 12 months, or relevant attack classes are absent without documented justification.

Conformance Scoring

Score 0: No red-team coverage management exists — red-team exercises are ad hoc with no tracking of which agents or attack classes have been tested.
Score 1: A coverage matrix exists but is not systematically maintained — gaps are identified reactively rather than proactively, and coverage is not linked to agent capability changes.
Score 2: Coverage management is active and systematic — the matrix is current, gaps are identified and reported on schedule, capability changes trigger reassessment, and all required combinations are tested within cadence.
Score 3: Coverage management is proactive and intelligence-driven — novel techniques are integrated continuously, automated red-teaming covers high-frequency attack classes, coverage metrics feed into organisational risk calculations, and independent validation confirms completeness.

9. Regulatory Mapping

Regulation	Provision	Relationship Type
EU AI Act	Article 9 (Risk Management System)	Direct requirement
EU AI Act	Article 15 (Accuracy, Robustness and Cybersecurity)	Supports compliance
NIST AI RMF	GOVERN 1.1, MANAGE 4.1	Supports compliance
ISO 42001	Clause 9.1 (Monitoring, Measurement, Analysis and Evaluation)	Direct requirement
FCA SYSC	6.1.1R (Systems and Controls)	Supports compliance
DORA	Article 26 (Threat-Led Penetration Testing)	Direct requirement
SOX	Section 404 (Internal Controls Over Financial Reporting)	Supports compliance

EU AI Act — Article 9 (Risk Management System)

Article 9 requires that the risk management system include "testing with a view to identifying the most appropriate risk management measures." AG-103 implements the governance of that testing — ensuring that the testing programme covers the full risk surface, not just a subset of it. The regulation's requirement for continuous, iterative risk management maps directly to AG-103's requirements for ongoing coverage tracking, gap identification, and programme evolution.

EU AI Act — Article 15 (Accuracy, Robustness and Cybersecurity)

Article 15 requires resilience against attempts to exploit system vulnerabilities. Demonstrating resilience requires testing against the known vulnerability space — AG-103 ensures that the testing programme covers that space systematically rather than selectively.

NIST AI RMF — GOVERN 1.1, MANAGE 4.1

GOVERN 1.1 addresses legal and regulatory requirements; MANAGE 4.1 addresses regular monitoring and review of risk management effectiveness. AG-103 supports both by ensuring that the adversarial testing programme — the primary mechanism for validating risk management effectiveness — is comprehensive and current.

ISO 42001 — Clause 9.1

Clause 9.1 requires organisations to determine what needs to be monitored and measured, the methods for monitoring and measurement, and when monitoring and measurement shall be performed. AG-103 directly implements this requirement for adversarial testing: the coverage matrix determines what needs to be tested, the minimum coverage standard determines the methods, and the testing cadence determines when. The coverage metrics provide the measurement required by the clause.

FCA SYSC — 6.1.1R (Systems and Controls)

For financial firms, the adequacy of testing programmes is a systems and controls obligation. An AI governance testing programme that leaves significant attack classes untested would not meet the adequacy standard. The FCA expects testing programmes to evolve with the threat landscape, which AG-103 ensures through its novel attack integration requirement.

DORA — Article 26 (Threat-Led Penetration Testing)

Article 26 requires financial entities to carry out threat-led penetration testing (TLPT) at least every three years. For AI agents in financial services, TLPT must include AI-specific attack classes. AG-103's coverage management ensures that AI-specific attack classes are included in TLPT scope and that coverage is tracked across testing cycles. The coverage matrix provides the evidence that TLPT addressed the AI attack surface.

SOX — Section 404 (Internal Controls Over Financial Reporting)

For AI agents involved in financial reporting or financial operations, the testing of governance controls is a SOX requirement. A SOX auditor will examine whether the testing programme covers the control objectives. AG-103's coverage matrix provides the auditor with a comprehensive view of what has been tested, when, and with what results — directly supporting the Section 404 assessment.

10. Failure Severity

Field	Value
Severity Rating	High
Blast Radius	Organisation-wide — coverage gaps can exist across any deployed agent, and exploitation of an untested vulnerability can affect any business function served by that agent

Consequence chain: Without red-team coverage management, the organisation accumulates untested vulnerabilities across its agent portfolio. The failure mode is invisible — there is no alert when an attack class is not tested, no warning when agent capabilities change without coverage reassessment, and no signal when the threat landscape evolves beyond the red-team programme's scope. The organisation operates with false confidence based on high pass rates in the attack classes that are tested, while untested classes harbour exploitable vulnerabilities. When an adversary discovers and exploits an untested vulnerability, the impact scales with the agent's authority and data access. The discovery timeline is typically months — organisations rarely detect exploitation of vulnerabilities they did not know existed, because they have no monitoring for attack patterns they have not considered. The financial impact ranges from direct losses through the exploited vulnerability (potentially millions in financial services or healthcare contexts) to indirect costs including regulatory enforcement, remediation, and reputational damage. The regulatory consequence is compounded by the fact that AG-103 failures are governance failures, not technical failures — they demonstrate inadequate oversight of the security testing programme, which regulators view as a more fundamental deficiency than a single technical vulnerability.

Cross-reference note: AG-103 is the meta-governance dimension for the adversarial AI landscape — it ensures that the controls specified by AG-095 through AG-102 are actually tested in practice, not just specified in policy. AG-100 (Red-Team Readiness Governance) establishes the organisational capability to conduct red-team exercises; AG-103 governs the completeness of those exercises. Together, they form the assurance layer that validates the entire adversarial AI governance programme. AG-044 (Long-Horizon Attack Strategy Detection) is a particularly important coverage area, as long-horizon attacks are among the most commonly untested attack classes.

Cite this protocol

AgentGoverning. (2026). AG-103: Red-Team Coverage Management Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-103

← Previous Protocol

AG-102

Multimodal Adversarial Robustness Governance

Next Protocol →

AG-104

Trust Calibration Governance