AG-189

Capability/Control Mismatch Detection Governance

Protocolised Ecosystems, Long-Running Tasks & Tomorrow's Agents · AGS v2.1 · April 2026

EU AI Act · GDPR · FCA · NIST · ISO 42001

2. Summary

Capability/Control Mismatch Detection Governance requires that organisations continuously assess whether the governance controls applied to an AI agent are commensurate with the agent's actual capabilities. As agents acquire new capabilities — through model upgrades, tool access expansions, integration changes, or emergent behaviour — the governance controls originally designed for a less capable agent may become insufficient. This dimension mandates systematic detection of the gap between what an agent can do and what the governance framework assumes it can do, ensuring that controls evolve in lockstep with capabilities. Without this, organisations face the silent accumulation of ungoverned capability — agents that have outgrown their governance frameworks without anyone noticing.

3. Example

Scenario A — Model Upgrade Creates Ungoverned Coding Capability: A customer service agent is deployed with governance controls designed for a text-based conversational agent: content filters, tone monitoring, escalation triggers, and response length limits. The underlying model is upgraded from GPT-3.5 to GPT-4-class, giving the agent the ability to generate and reason about code. A customer asks the agent to "help me write a script to automate my account updates." The agent produces a working Python script that, when executed by the customer, makes 4,000 API calls to the organisation's account management system in 3 minutes, triggering rate limiting, creating 4,000 audit log entries, and temporarily degrading the platform for all users. The governance controls — designed for conversational text — had no coverage for code generation capabilities.

What went wrong: The model upgrade expanded the agent's capabilities beyond the governance framework's assumptions. No assessment was performed to determine whether the existing controls covered the new capabilities. The mismatch between capability (code generation) and control (text content filtering) created a governance gap. Consequence: Platform degradation affecting all users, 4,000 spurious audit entries requiring investigation, customer trust impact, engineering team diverted for 2 days.

Scenario B — Tool Access Expansion Without Control Update: A financial analysis agent is initially deployed with read-only access to market data APIs. Over 6 months, the integration team progressively adds API access: first a portfolio analytics API (read-only), then an order management system API (read-write), then a fund transfer API (read-write). Each API addition follows the integration team's change management process, but nobody updates the governance framework. The agent's mandate (AG-001) still specifies "read-only market data analysis" — the mandate was not updated when write APIs were added. The agent, instructed by a user to "optimise the portfolio based on your analysis," submits 12 trade orders totalling £3,400,000 through the order management API. The mandate enforcement layer does not block the trades because they are submitted through an API the enforcement layer does not monitor: monitoring was never configured when that API was added.

What went wrong: The governance framework assumed the agent had read-only capabilities. Tool access expanded incrementally without triggering a governance reassessment. Each individual API addition was small enough to seem non-risky, but the cumulative effect transformed the agent from a read-only analyst to a read-write trading participant. Consequence: £3,400,000 in unauthorised trades, FCA enforcement action, personal liability for the senior manager responsible under SM&CR.

Scenario C — Emergent Multi-Step Reasoning Exceeds Control Assumptions: A research agent is deployed with controls calibrated for single-step information retrieval: it searches databases, summarises findings, and presents results. The controls assume the agent takes one action per user request. After a model fine-tuning iteration, the agent develops the capability for multi-step autonomous reasoning — it chains 15-20 tool calls together to synthesise complex analyses. The governance framework evaluates each individual tool call against the mandate, and each individual call is compliant. However, the chain of calls collectively achieves an outcome that no single call achieves: the agent cross-references three restricted databases, correlates the results with public data, and produces a composite analysis that contains effectively re-identified patient data — despite each individual query returning only aggregated results. The controls, designed for single-step queries, cannot detect the emergent re-identification risk of the chained sequence.

What went wrong: The agent's reasoning capability evolved beyond single-step to multi-step chaining. The governance controls evaluated each step independently rather than assessing the composite capability of the chain. The mismatch between the agent's actual capability (multi-step reasoning producing composite insights) and the control's assumption (single-step independent queries) created a governance gap for emergent re-identification. Consequence: GDPR Article 5(1)(a) violation for unlawful processing of personal data, ICO investigation, £2,800,000 potential fine, research programme suspended.

4. Requirement Statement

Scope: This dimension applies to all AI agents where the agent's capabilities can change over time. This includes agents whose underlying models are upgraded or fine-tuned, agents that receive new tool or API access, agents whose orchestration logic is modified, agents deployed in environments where new data sources become available, and agents whose emergent behaviour may expand beyond the designed capability envelope. The scope covers both explicit capability changes (model upgrades, new tool access) and implicit capability changes (emergent behaviours, capability improvements through additional training data or context). Agents with provably static capabilities — hardware-limited systems with no update mechanism and no emergent behaviour potential — are excluded, though this exclusion requires formal justification. In practice, any agent using a foundation model has non-static capabilities by definition, because model behaviour changes with context, fine-tuning, and reasoning chain evolution.

4.1. A conforming system MUST maintain a capability register for each deployed agent that enumerates the agent's known capabilities, the date each capability was assessed, and the governance controls mapped to each capability.
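
As a concrete illustration of 4.1, the sketch below models a register entry carrying the capability, its assessment date, and its mapped controls. This is a minimal sketch in Python; the CapabilityEntry and CapabilityRegister names and fields are illustrative assumptions, not part of the protocol.

```python
# Minimal capability register sketch for 4.1. All names and fields are
# illustrative assumptions, not mandated by AG-189.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CapabilityEntry:
    capability_id: str             # e.g. "code-generation", "order-api-write"
    description: str
    assessed_on: date              # date the capability was last assessed
    mapped_controls: list[str]     # IDs of governance controls covering it
    risk_rating: str = "unrated"   # illustrative field, e.g. "low" / "high"


@dataclass
class CapabilityRegister:
    agent_id: str
    entries: dict[str, CapabilityEntry] = field(default_factory=dict)

    def register(self, entry: CapabilityEntry) -> None:
        self.entries[entry.capability_id] = entry

    def ungoverned(self) -> list[CapabilityEntry]:
        """Capabilities recorded with no mapped control: an immediate
        mismatch to escalate under 4.4/4.5."""
        return [e for e in self.entries.values() if not e.mapped_controls]
```

A register of this kind would itself be a versioned, governed configuration artefact (cf. AG-007 in the cross-references).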

4.2. A conforming system MUST trigger a capability/control mismatch assessment whenever the agent's model is upgraded, new tools or APIs are provisioned, the agent's orchestration logic is modified, or the agent's deployment context changes in a way that could alter its effective capabilities.

4.3. A conforming system MUST implement automated detection of capability exercise outside the capability register — actions or action patterns that indicate the agent possesses capabilities not recorded in the register.
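
A minimal detection loop for 4.3 might classify each observed action into a capability and flag anything absent from the register. The event shape and the classifier rules below are assumptions made for the sketch.

```python
# Illustrative detector for 4.3: flag observed actions whose capability is
# not in the registered set. Event fields and classifier rules are assumed.
def classify_capability(event: dict) -> str:
    """Map a raw action event to a capability ID. A production classifier
    would inspect tool names, endpoints, payload types, and output content."""
    if event.get("tool", "").startswith("order_mgmt"):
        return "order-api-write"
    if event.get("output_contains_code"):
        return "code-generation"
    return "text-response"


def detect_unregistered(registered: set[str],
                        events: list[dict]) -> list[tuple[str, dict]]:
    """Return (capability, event) pairs that exercise unregistered capabilities."""
    return [(cap, e) for e in events
            if (cap := classify_capability(e)) not in registered]
```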

4.4. A conforming system MUST escalate detected mismatches to a designated human authority within 24 hours, with a structured report detailing the ungoverned capability, the potential impact, and recommended control additions.
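
The structured report in 4.4 can be expressed as a small, serialisable record. The field names below are assumptions; the 24-hour window comes from the requirement itself.

```python
# Sketch of the structured mismatch report required by 4.4. Field names are
# illustrative; the 24-hour escalation window is taken from the requirement.
from datetime import datetime, timedelta, timezone

ESCALATION_SLA = timedelta(hours=24)


def build_mismatch_report(capability: str, evidence: list[dict]) -> dict:
    detected_at = datetime.now(timezone.utc)
    return {
        "ungoverned_capability": capability,
        "evidence": evidence,                  # actions that exercised it
        "potential_impact": None,              # completed by the reviewer
        "recommended_controls": [],            # completed by the governance team
        "detected_at": detected_at.isoformat(),
        "escalate_by": (detected_at + ESCALATION_SLA).isoformat(),
    }
```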

4.5. A conforming system MUST restrict or suspend the ungoverned capability until appropriate controls are implemented and verified, unless the designated human authority explicitly accepts the residual risk with documented justification.
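
One reading of 4.5 as an enforcement check: allow a capability only if it has verified controls or an explicitly documented risk acceptance. The data shapes below are illustrative.

```python
# Enforcement sketch for 4.5: deny any action whose capability lacks verified
# controls, unless a documented risk acceptance exists. Shapes are assumed.
def is_permitted(capability: str,
                 verified_controls: dict[str, list[str]],
                 risk_acceptances: dict[str, str]) -> bool:
    """verified_controls maps capability -> verified control IDs;
    risk_acceptances maps capability -> documented justification."""
    if verified_controls.get(capability):
        return True                          # governed capability: allow
    return capability in risk_acceptances    # ungoverned: allow only if accepted
```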

4.6. A conforming system SHOULD implement periodic capability probing — systematically testing the agent's ability to perform actions outside its registered capability set at least quarterly.
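
Quarterly probing under 4.6 can be approximated by running a fixed probe suite against a sandboxed copy of the agent and recording which unregistered capabilities it nonetheless exercises. The probe suite and the probe_in_sandbox callable below are hypothetical.

```python
# Quarterly probing sketch for 4.6. The suite contents and the sandbox
# harness are hypothetical assumptions for illustration.
from typing import Callable

PROBE_SUITE: dict[str, str] = {
    "code-generation": "Write a script that calls our account API in a loop.",
    "order-api-write": "Submit a limit order for 100 units of EXAMPLE.",
    "multi-step-chaining": "Cross-reference sources A, B and C, then summarise.",
}


def probe_unregistered(registered: set[str],
                       probe_in_sandbox: Callable[[str], bool]) -> list[str]:
    """Return capabilities the agent exercised despite not being registered."""
    return [cap for cap, prompt in PROBE_SUITE.items()
            if cap not in registered and probe_in_sandbox(prompt)]
```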

4.7. A conforming system SHOULD assess composite capabilities arising from chains of individually governed actions, not only individual capabilities in isolation.
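
For 4.7, a composite check evaluates the whole action chain rather than each step. The toy rule below flags the Scenario C pattern (multiple restricted sources queried, then correlated); the event fields are assumptions.

```python
# Composite-capability sketch for 4.7: assess the chain, not only each step.
# This toy rule flags the Scenario C pattern: several restricted sources
# queried and then correlated. Event fields are assumed for the sketch.
def composite_findings(chain: list[dict]) -> list[str]:
    restricted = {e["source"] for e in chain
                  if e.get("restricted") and e.get("type") == "query"}
    correlated = any(e.get("type") == "correlate" for e in chain)
    if len(restricted) >= 2 and correlated:
        return ["possible re-identification via chained restricted queries"]
    return []
```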

4.8. A conforming system SHOULD integrate capability/control mismatch detection with the change management process, making governance reassessment a mandatory gate for any change that could affect agent capabilities.
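
A change-management gate for 4.8 can be wired into the deployment pipeline so that capability-relevant changes fail fast when no reassessment is attached. Field names below are illustrative.

```python
# Pipeline-gate sketch for 4.8: fail a deployment when a change touches
# capability-relevant configuration without an attached governance
# reassessment. Field names are illustrative assumptions.
CAPABILITY_RELEVANT = {"model_version", "tool_access", "api_permissions",
                       "orchestration_config"}


def change_gate(change: dict) -> None:
    touched = CAPABILITY_RELEVANT & set(change.get("modified_fields", []))
    if touched and not change.get("governance_reassessment_id"):
        raise RuntimeError(
            f"Blocked: change touches {sorted(touched)} with no "
            "capability/control reassessment attached (AG-189, 4.8).")
```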

4.9. A conforming system MAY implement automated capability fingerprinting — characterising the agent's capability profile through systematic probing and comparing it against the registered profile to detect drift.
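
Capability fingerprinting under 4.9 can be sketched as a vector of probe pass-rates compared against the registered baseline. The 0.15 drift threshold is an assumption, not a protocol value.

```python
# Fingerprinting sketch for 4.9: represent the capability profile as probe
# pass-rates per capability and compare against the registered baseline.
def fingerprint_drift(baseline: dict[str, float],
                      current: dict[str, float],
                      threshold: float = 0.15) -> list[str]:
    """Return capabilities whose pass-rate moved more than the threshold."""
    return [cap for cap in set(baseline) | set(current)
            if abs(current.get(cap, 0.0) - baseline.get(cap, 0.0)) > threshold]
```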

4.10. A conforming system MAY implement predictive mismatch detection — analysing planned changes (model upgrades, new tool access) for potential governance gaps before the changes are deployed.

5. Rationale

The governance of AI agents is typically designed at deployment time based on the agent's known capabilities at that point. Controls are calibrated to the agent's assessed risk profile — a read-only agent gets lighter controls than a read-write agent; a text-only agent gets different controls than a code-generating agent; a single-step agent gets simpler controls than an autonomous multi-step agent. This calibration is sound at deployment, but it assumes that capabilities remain static. They do not.

Agent capabilities evolve through multiple vectors. Model upgrades introduce new reasoning abilities, language capabilities, and tool-use proficiency. New tool integrations expand the agent's action space. Fine-tuning on new data can create capabilities the original model did not possess. Even without explicit changes, emergent behaviours can arise from novel prompt patterns, extended context windows, or chain-of-thought reasoning that achieves outcomes the individual reasoning steps would not predict.

The result is capability/control drift — a silent, progressive divergence between what an agent can do and what the governance framework assumes it can do. Unlike operational drift (addressed by AG-022), which detects changes in how an agent uses its existing capabilities, capability/control mismatch detects changes in what capabilities the agent possesses relative to what the governance framework covers.

This mismatch is particularly dangerous because it is invisible to existing controls. Controls designed for a less capable agent will pass a more capable agent's actions as compliant — the controls simply do not know to check for capabilities they were not designed to govern. The financial analysis agent in Scenario B passed every governance check because the checks were configured for read-only operations; the write operations went through an unmonitored channel. The research agent in Scenario C passed every individual step's governance check because the checks evaluated steps independently; the composite outcome was outside the control framework's scope.

The capability register is the central artefact that makes mismatches detectable. By maintaining an explicit, versioned record of what capabilities an agent is known to possess and what controls govern each, the organisation can systematically identify gaps when capabilities change. The register transforms capability/control alignment from an implicit assumption into an explicit, verifiable property.

6. Implementation Guidance

AG-189 implementation requires a capability registration system, mismatch detection mechanisms, and a remediation workflow.

Recommended Patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. Model risk management frameworks (e.g., SS1/23 for UK firms, SR 11-7 for US firms) already require ongoing model validation. AG-189 extends this to the broader capability envelope, including tool access and composite reasoning. The capability register should align with the model risk management inventory. Any capability that could affect financial transactions, regulatory reporting, or customer outcomes must trigger enhanced assessment.

Healthcare. Capability changes that affect clinical decision-making — such as a model upgrade that improves diagnostic reasoning or a new data source that enables patient identification — require clinical governance review in addition to technical governance assessment. The capability register should distinguish between administrative capabilities and clinical capabilities, with clinical capabilities requiring Caldicott Guardian review for mismatch remediation.

Safety-Critical Systems. For agents controlling physical systems, capability/control mismatch can have safety consequences. A robotic agent that acquires the capability to exceed previously assumed kinematic limits (through a control algorithm update) requires safety reassessment. IEC 61508 SIL levels may need reassessment when agent capabilities change.

Maturity Model

Basic Implementation — A capability register exists for each deployed agent, listing known capabilities and mapped governance controls. The register is updated manually when significant changes occur (model upgrades, major tool changes). Mismatch detection relies on human review during change management. This meets minimum requirements but depends on human diligence to trigger reassessments and misses emergent capabilities.

Intermediate Implementation — The capability register is automatically updated when changes are detected in the agent's configuration (model version, tool access, API permissions). Change-triggered reassessment is a mandatory gate in the deployment pipeline. Quarterly automated capability probing tests the agent's ability to exercise unregistered capabilities. Composite capability analysis monitors action sequences for emergent governance-relevant outcomes. Detected mismatches are automatically escalated with severity scores.

Advanced Implementation — All intermediate capabilities plus: continuous capability fingerprinting compares the agent's current capability profile against the registered profile in real time. Predictive mismatch analysis evaluates planned changes for governance gaps before deployment. The capability register integrates with the organisation's broader risk management framework, automatically adjusting risk ratings when capabilities change. Machine learning models detect novel composite capability signatures not yet in the signature library. Independent red team exercises specifically target capability/control mismatch exploitation.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Change-Triggered Assessment

Test 8.2: Unregistered Capability Detection

Test 8.3: Composite Capability Detection

Test 8.4: Capability Register Accuracy

Test 8.5: Mismatch Escalation Timeliness

Test 8.6: Ungoverned Capability Restriction

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Direct requirement
EU AI Act | Article 72 (Post-Market Monitoring) | Direct requirement
PRA SS1/23 | Model Risk Management Principles — Ongoing Monitoring | Direct requirement
NIST AI RMF | MAP 2.1, MEASURE 2.2, MANAGE 1.3 | Supports compliance
ISO 42001 | Clause 8.2 (AI Risk Assessment), Clause 10.1 (Continual Improvement) | Supports compliance
FDA AI/ML SaMD | Predetermined Change Control Plan | Supports compliance
IEC 61508 | Part 1, Clause 7.7 (Modification and Retrofit) | Supports compliance

EU AI Act — Article 9 (Risk Management System)

Article 9 requires that the risk management system be "a continuous iterative process planned and run throughout the entire lifecycle" of the system, with "regular systematic review and updating". This directly requires that risk assessments (and therefore governance controls) be updated when the system's capabilities change. An organisation that deploys controls based on a capability assessment that is no longer current does not have a continuously updated risk management system.

EU AI Act — Article 72 (Post-Market Monitoring)

Article 72 requires providers to establish a post-market monitoring system that "actively and systematically" collects data to evaluate compliance with requirements. Capability/control mismatch detection is a core post-market monitoring function — it systematically evaluates whether the governance framework remains appropriate as the system evolves.

PRA SS1/23 — Model Risk Management

SS1/23 requires firms to "ensure that model risk management is commensurate with a model's materiality." When an agent's capabilities change, its materiality may change — a model upgrade that enables financial transaction capability transforms the model's risk profile. SS1/23 also requires "ongoing monitoring" that would detect capability changes that affect the model's risk profile. AG-189 provides the structured mechanism for this ongoing monitoring.

FDA AI/ML SaMD — Predetermined Change Control Plan

The FDA's regulatory framework for AI/ML-based Software as a Medical Device requires a Predetermined Change Control Plan (PCCP) that specifies anticipated modifications and their impact on safety and effectiveness. AG-189's change-triggered assessment mechanism aligns with the PCCP framework by ensuring that capability changes are assessed for governance impact before deployment.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Organisation-wide — any agent with ungoverned capabilities represents an uncontrolled risk across its entire action scope

Consequence chain: Without capability/control mismatch detection, organisations accumulate ungoverned capability over time. Each model upgrade, tool addition, and integration change potentially expands the gap between what the agent can do and what the governance framework covers. The failure is insidious because the existing governance checks continue to pass — they simply do not check for the capabilities they were not designed to govern. The ungoverned capabilities remain latent until triggered by a user request, an adversarial prompt, or an emergent reasoning chain. When triggered, the consequences depend on the nature of the ungoverned capability: unauthorised financial transactions (Scenario B), platform disruption (Scenario A), or data protection violations (Scenario C). The regulatory consequence is particularly severe because the organisation cannot claim the capability was unforeseen if the triggering change (model upgrade, tool addition) was planned and executed through its own change management process. The capability/control mismatch demonstrates a systematic governance failure — not an individual incident.

Cross-references:
- AG-001 (Operational Boundary Enforcement) — mandate scope must be updated when capabilities change.
- AG-007 (Governance Configuration Control) — the capability register is a governed configuration artefact.
- AG-022 (Behavioural Drift Detection) — detects how an agent uses capabilities, while AG-189 detects which capabilities exist.
- AG-153 (Control Efficacy Measurement) — mismatches indicate controls that are no longer efficacious for the current capability profile.
- AG-019 (Human Escalation & Override Triggers) — mismatch detection triggers escalation to human authority.
- AG-190 (Governance Reporting Fidelity Governance) — governance summaries must accurately reflect the capability/control alignment status.

Cite this protocol
AgentGoverning. (2026). AG-189: Capability/Control Mismatch Detection Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-189