AG-139

Competence Envelope Governance

Competence, Uncertainty & Autonomy Scaling · ~19 min read · AGS v2.1 · April 2026

Tags: EU AI Act · FCA · NIST · ISO 42001

2. Summary

Competence Envelope Governance requires that every AI agent operates within a formally defined, empirically validated competence envelope — a bounded region of input space, task types, environmental conditions, and performance thresholds within which the agent has demonstrated reliable, measurable performance. The competence envelope is not a subjective assessment or a marketing claim; it is a quantified artefact derived from structured evaluation, maintained under version control, and enforced at runtime. Actions requested outside the validated envelope are routed to escalation, abstention, or human review — never silently attempted. This dimension ensures that organisations know precisely what their agents can do, under what conditions, and with what reliability, and that agent operation is confined to those validated boundaries.

3. Example

Scenario A — Unvalidated Task Expansion in Customer Service: An organisation deploys an AI agent to handle customer refund requests for a consumer electronics retailer. The agent has been validated on a corpus of 50,000 historical refund interactions covering standard product returns, warranty claims, and delivery damage. Its measured accuracy on this corpus is 96.2% for decision correctness and 99.1% for policy compliance. Over time, the business begins routing insurance claim disputes and product liability complaints to the same agent without re-evaluating the competence envelope. The agent processes 340 product liability complaints over three months, applying refund logic to what are actually legal liability matters. In 47 cases, the agent issues refunds that are interpreted by complainants as admissions of product defect. The organisation faces aggregate liability exposure of £2.3 million when these responses are cited in subsequent litigation.

What went wrong: The agent's competence envelope was defined for refund processing but was never formally bounded. No mechanism prevented the routing of task types outside the validated domain. The agent processed the liability complaints with high confidence because they superficially resembled refund requests — similar language, similar customer tone, similar product references. Without an explicit competence envelope constraining the agent to validated task types, the organisation had no structural control preventing scope creep. Consequence: £2.3 million in potential litigation exposure, regulatory scrutiny under consumer protection legislation, and suspension of the agent pending comprehensive re-evaluation.

Scenario B — Environmental Drift Beyond Validated Conditions: A financial services firm validates an AI agent for trade reconciliation across equity markets. The validation covers normal market conditions with daily volatility below 3% and trade volumes within two standard deviations of the 12-month mean. During a market stress event, volatility reaches 11.4% and trade volumes spike to 4.7 standard deviations above the mean. The agent continues processing reconciliations without any indication that it is operating outside its validated envelope. It incorrectly marks 1,200 trades as reconciled when underlying settlement data is delayed due to exchange system congestion. The firm discovers the errors 36 hours later when counterparty confirmations fail to match.

What went wrong: The competence envelope did not include environmental conditions — volatility ranges and volume thresholds — as explicit boundary parameters. The agent had no mechanism to detect that ambient conditions had moved outside the validated range. Consequence: 1,200 incorrectly reconciled trades, £890,000 in settlement corrections, FCA investigation for inadequate systems and controls during market stress, and temporary suspension of automated reconciliation.

Scenario C — Competence Envelope Without Measurable Thresholds: An organisation defines a competence envelope for its AI coding assistant as "writing Python code for data processing tasks." No quantitative performance thresholds are specified — no accuracy targets, no complexity boundaries, no input size limits. The agent is asked to process a 14 GB dataset. It generates code that is syntactically correct but algorithmically inappropriate, applying a recursive O(n²) approach where O(n log n) is required, and the job runs for 72 hours before being terminated. The organisation has no basis for determining whether this outcome represents a competence failure because no performance thresholds were defined.

What went wrong: The competence envelope was defined in qualitative terms ("data processing tasks") rather than quantitative terms (input size limits, complexity class targets, execution time bounds). Without measurable thresholds, there is no objective basis for determining whether the agent is operating within or outside its competence. Consequence: 72 hours of wasted compute at £4,200, missed delivery deadline, and inability to determine whether the agent requires retraining or the task was simply outside scope.

4. Requirement Statement

Scope: This dimension applies to all AI agents that perform tasks with consequences — financial, operational, legal, reputational, or safety-related — where the reliability of the agent's output matters. This includes agents that make decisions, generate recommendations that are acted upon, produce content that is published, execute transactions, or interact with external parties. An agent that generates internal draft text for human review before any action is taken is at the low end of the applicability spectrum but is still in scope if the human reviewer relies on the agent's output without independent verification. The scope extends to agents operating as components within larger systems: if an upstream agent's output is consumed by a downstream process without independent validation, the upstream agent's competence envelope must account for the downstream impact. The test is: could a failure of agent competence cause a consequence that the organisation would need to remediate? If yes, the agent is in scope.

4.1. A conforming system MUST define a competence envelope for each deployed agent as a versioned, machine-readable artefact specifying: validated task types, validated input domains (data types, ranges, formats), validated environmental conditions (load levels, latency bounds, upstream data quality thresholds), and quantitative performance thresholds (accuracy, precision, recall, latency, or domain-appropriate equivalents) demonstrated during validation.
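
As a non-normative illustration, the envelope artefact of 4.1 might be represented as follows in Python; the class and field names are assumptions of this sketch, not identifiers prescribed by this protocol.

```python
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class PerformanceThreshold:
    """A quantitative benchmark measured during validation (not a target)."""
    metric: str             # e.g. "decision_accuracy_pct"
    validated_value: float  # value demonstrated during validation
    tolerance: float        # permitted degradation before re-validation (see 4.7)

@dataclass(frozen=True)
class CompetenceEnvelope:
    """Versioned, machine-readable competence envelope per 4.1."""
    agent_id: str
    version: str
    task_types: tuple                # validated task categories
    input_domains: dict              # feature -> (min, max) validated range
    environmental_conditions: dict   # condition -> (min, max) validated range
    performance_thresholds: tuple    # of PerformanceThreshold
    validation_evidence_ref: str     # link to supporting evidence (see 4.5)

    def to_json(self) -> str:
        """Serialise for version control and downstream publication (see 4.8)."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)
```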

4.2. A conforming system MUST validate each competence envelope through structured empirical evaluation using representative data that covers the full range of specified conditions, with sample sizes sufficient to establish statistical significance at a minimum of 95% confidence for each stated performance threshold.
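
By way of non-normative illustration, one defensible reading of the 95% confidence requirement is that the lower bound of a two-sided 95% confidence interval on each measured metric must clear the stated threshold. A minimal sketch using the Wilson score interval (the function names are hypothetical):

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Two-sided Wilson score interval for a binomial proportion
    (z = 1.96 corresponds to 95% confidence)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

def threshold_is_supported(successes: int, n: int, threshold: float) -> bool:
    """True if the lower 95% confidence bound clears the stated threshold."""
    lower, _ = wilson_interval(successes, n)
    return lower >= threshold

# Scenario A's corpus: 48,100 correct out of 50,000 (96.2% observed).
# The lower bound is ~96.03%, so a 96.0% threshold claim is supported
# while a 96.2% threshold claim is not.
print(threshold_is_supported(48_100, 50_000, 0.960))  # True
print(threshold_is_supported(48_100, 50_000, 0.962))  # False
```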

4.3. A conforming system MUST enforce the competence envelope at runtime by evaluating incoming requests against envelope boundaries before the agent processes them, routing out-of-envelope requests to a defined escalation path rather than permitting silent processing.
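
A non-normative sketch of such a gate, reusing the CompetenceEnvelope sketch from 4.1; the escalation path itself is whatever the organisation has defined (see AG-019). Note the fail-closed behaviour: missing features or unknown ambient conditions escalate rather than process.

```python
def within_range(value, bounds) -> bool:
    lo, hi = bounds
    return lo <= value <= hi

def gate_request(request: dict, envelope, ambient: dict | None = None) -> str:
    """Pre-processing gate (4.3): evaluate a request against envelope
    boundaries before the agent sees it. Returns "process" or "escalate";
    out-of-envelope requests are never silently attempted."""
    if request.get("task_type") not in envelope.task_types:
        return "escalate"
    for feature, bounds in envelope.input_domains.items():
        if feature not in request or not within_range(request[feature], bounds):
            return "escalate"
    for condition, bounds in envelope.environmental_conditions.items():
        if (ambient is None or condition not in ambient
                or not within_range(ambient[condition], bounds)):
            return "escalate"
    return "process"
```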

4.4. A conforming system MUST re-validate the competence envelope whenever any of the following occur: the agent's model is updated or retrained, the input distribution shifts beyond a predefined threshold, a new task type is introduced, or the operational environment changes beyond the validated range.
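
One common, non-normative way to operationalise "input distribution shifts beyond a predefined threshold" is the Population Stability Index (PSI); a PSI above roughly 0.2 is conventionally treated as material shift, though the trigger threshold must be set per envelope. A sketch, with numpy assumed available:

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between the validation-time reference
    distribution of a feature and its production distribution."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # capture out-of-range production values
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, 1e-6, None)  # guard empty bins
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

def shift_requires_revalidation(reference, current, threshold: float = 0.2) -> bool:
    """True when input distribution shift exceeds the predefined threshold (4.4)."""
    return psi(np.asarray(reference, float), np.asarray(current, float)) > threshold
```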

4.5. A conforming system MUST maintain a complete audit trail of competence envelope versions, validation results, and any modifications, with each version linked to the validation evidence that supports it.
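
A non-normative sketch of one way to make that audit trail tamper-evident, hash-chaining each version record to its predecessor and to a digest of the supporting validation evidence; the helper name is hypothetical.

```python
import hashlib
import json
import time

def record_envelope_version(envelope_json: str, evidence_digest: str,
                            prev_record_hash: str) -> dict:
    """Append-only audit record (4.5) linking an envelope version to its
    validation evidence and to the previous record in the chain."""
    record = {
        "recorded_at": time.time(),
        "envelope_sha256": hashlib.sha256(envelope_json.encode()).hexdigest(),
        "evidence_sha256": evidence_digest,
        "prev_record_sha256": prev_record_hash,
    }
    record["record_sha256"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return record
```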

4.6. A conforming system SHOULD define competence envelope boundaries using quantitative thresholds rather than qualitative descriptions — for example, "invoice amounts between £100 and £50,000 with structured data in ISO 20022 format" rather than "standard invoice processing."

4.7. A conforming system SHOULD implement continuous monitoring of agent performance against competence envelope thresholds during production operation, triggering re-validation when observed performance degrades below the validated threshold by a margin exceeding a predefined tolerance (e.g., accuracy drops from validated 96.2% to observed 93.1% against a 2% tolerance).
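
A non-normative sketch of such a monitor over a rolling window, using the tolerance semantics from the example above; the window size is an assumption of the sketch.

```python
from collections import deque

class PerformanceMonitor:
    """Rolling-window monitor (4.7) of observed accuracy against the
    validated threshold; signals re-validation when degradation exceeds
    the predefined tolerance."""
    def __init__(self, validated_pct: float, tolerance_pct: float,
                 window: int = 1000):
        self.validated = validated_pct
        self.tolerance = tolerance_pct
        self.outcomes = deque(maxlen=window)

    def record(self, correct: bool) -> bool:
        """Record one outcome; returns True if re-validation should trigger."""
        self.outcomes.append(correct)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data in the window yet
        observed = 100.0 * sum(self.outcomes) / len(self.outcomes)
        # e.g. validated 96.2, observed 93.1, tolerance 2.0 -> triggers
        return (self.validated - observed) > self.tolerance
```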

4.8. A conforming system SHOULD publish the competence envelope to downstream consumers so that systems relying on agent output can assess whether the output falls within the agent's validated capability.

4.9. A conforming system MAY implement graduated competence envelopes with tiered confidence levels — for example, a core envelope where the agent operates autonomously and an extended envelope where the agent operates with human review.
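
A non-normative sketch of graduated routing, reusing the gate_request function from the 4.3 sketch; the tier names mirror the example above.

```python
from enum import Enum

class Disposition(Enum):
    AUTONOMOUS = "autonomous"      # inside the core envelope
    HUMAN_REVIEW = "human_review"  # inside the extended envelope only
    ESCALATE = "escalate"          # outside both envelopes

def route(request: dict, ambient: dict,
          core_envelope, extended_envelope) -> Disposition:
    """Graduated competence envelope routing (4.9)."""
    if gate_request(request, core_envelope, ambient) == "process":
        return Disposition.AUTONOMOUS
    if gate_request(request, extended_envelope, ambient) == "process":
        return Disposition.HUMAN_REVIEW
    return Disposition.ESCALATE
```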

5. Rationale

Competence Envelope Governance addresses a fundamental gap in how organisations deploy AI agents: the absence of formally defined, empirically validated boundaries around what the agent can reliably do. Without a competence envelope, an agent's operational scope is defined implicitly — by whatever tasks are routed to it — rather than explicitly, by what it has been demonstrated to handle correctly.

This matters because AI agents, unlike traditional software, do not fail cleanly at capability boundaries. A database query against an undefined table returns an error. An API call with invalid parameters returns a structured rejection. An AI agent asked to perform a task outside its competence will typically attempt the task and produce output that may appear plausible but is unreliable. The failure mode is not "error" but "confident incorrectness" — the agent produces output that looks correct, is delivered with the same formatting and confidence as validated output, and is consumed by downstream processes or human reviewers who have no basis for distinguishing it from competent output.

The competence envelope is the structural solution to this problem. By defining the boundaries explicitly and enforcing them at runtime, the organisation ensures that the agent only processes tasks where its reliability has been empirically demonstrated. Tasks outside the envelope are routed to appropriate handling — human review, alternative systems, or explicit abstention — rather than being silently processed by an agent operating beyond its validated capability.

This dimension intersects with AG-022 (Behavioural Drift Detection) in that drift detection monitors whether the agent's behaviour is changing over time, while competence envelope governance defines the boundaries within which behaviour was validated. It intersects with AG-074 (Performance Drift and Revalidation) in that performance drift may indicate that the competence envelope needs re-validation. It intersects with AG-041 (Emergent Capability Detection) in that new capabilities may expand or shift the natural competence boundary, requiring envelope re-assessment. It intersects with AG-140 (Novelty and Out-of-Distribution Detection Governance) in that OOD detection is a primary mechanism for identifying inputs outside the competence envelope at runtime.

6. Implementation Guidance

The competence envelope is the central artefact of this dimension. It is a formal, versioned, machine-readable specification of the conditions under which an agent has demonstrated reliable performance. The envelope has four axes; a concrete instance, building on the sketch under 4.1, follows the list:

  1. Task types. The specific categories of work the agent has been validated to perform. For a customer service agent: refund processing, order status enquiries, delivery rescheduling. Not: legal complaints, insurance claims, regulatory enquiries.
  2. Input domains. The data characteristics the agent has been validated against. For an invoice processing agent: structured invoices in PDF or XML format, amounts between £100 and £500,000, in GBP/EUR/USD, with standard line-item structures. Not: handwritten invoices, amounts exceeding £500,000, currencies without validated exchange rate feeds.
  3. Environmental conditions. The operational context within which validation was performed. For a trade reconciliation agent: daily trade volumes between 10,000 and 150,000, market volatility below 4% (30-day annualised), settlement system latency below 200ms. Not: market stress conditions, exchange outages, or cross-border settlement with jurisdictions not covered in validation.
  4. Performance thresholds. The quantitative benchmarks the agent achieved during validation. For a medical triage agent: sensitivity 98.1% for urgent cases, specificity 94.7%, false negative rate below 0.5% for life-threatening conditions, mean response latency below 3 seconds. These are not targets — they are measured results from validation that define the baseline against which production performance is monitored.
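
Pulling the four axes together, the following non-normative instance (built on the 4.1 sketch) encodes the trade reconciliation example; values not stated in the text above are marked as assumed in comments.

```python
# Illustrative only: ranges mirror the trade reconciliation example above;
# items marked "assumed" are inventions of this sketch, not from the text.
trade_recon_envelope = CompetenceEnvelope(
    agent_id="trade-reconciliation-agent",
    version="2.3.0",                                 # assumed version label
    task_types=("equity_trade_reconciliation",),
    input_domains={
        "trade_value_gbp": (0.0, 10_000_000.0),      # assumed validated range
    },
    environmental_conditions={
        "daily_trade_volume": (10_000, 150_000),
        "volatility_30d_annualised_pct": (0.0, 4.0),
        "settlement_latency_ms": (0.0, 200.0),
    },
    performance_thresholds=(
        PerformanceThreshold("reconciliation_accuracy_pct", 99.5, 0.5),  # assumed
    ),
    validation_evidence_ref="evidence/trade-recon/v2.3.0",  # assumed path
)
```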

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. Competence envelopes for trading agents should align with existing model validation frameworks (SR 11-7, SS1/23). The envelope should specify: asset classes, market conditions (volatility ranges, liquidity thresholds), order types, and value limits. Validation should include backtesting under historical stress scenarios. The FCA expects firms to demonstrate that AI systems have been validated under conditions representative of their intended use, including adverse conditions.

Healthcare. Competence envelopes for clinical decision support agents must specify: patient populations (age ranges, comorbidity profiles, medication combinations), clinical presentations (symptom combinations, severity levels), and diagnostic categories. Validation must be conducted on clinically representative datasets with clinical expert review of boundary cases. FDA guidance on Clinical Decision Support software, together with the Quality System Regulation (21 CFR Part 820, design validation at §820.30(g)), requires documented evidence of intended-use validation.

Legal and Compliance. Competence envelopes for contract review agents should specify: contract types, jurisdictions, clause categories, document lengths, and language complexity levels. Validation should include adversarial examples — contracts with unusual structures, embedded edge-case clauses, and jurisdiction-specific provisions that test the boundaries of the agent's training distribution.

Maturity Model

Basic Implementation — The organisation has defined competence envelopes for each deployed agent as documentation artefacts specifying task types and general input characteristics. Validation has been performed using available data with aggregate performance metrics. Runtime enforcement is implemented as a classification check in the application layer that flags requests outside defined task categories. Re-validation occurs on an annual schedule. This level establishes awareness of competence boundaries but has limitations: qualitative boundaries are difficult to enforce precisely, aggregate metrics may mask stratum-level weakness, and annual re-validation may lag behind model or environmental changes.

Intermediate Implementation — Competence envelopes are defined as machine-readable artefacts with quantitative boundaries on all four axes (task types, input domains, environmental conditions, performance thresholds). Validation uses stratified sampling with statistical significance testing at the stratum level. Runtime enforcement is implemented as an independent pre-processing gate that evaluates requests against envelope boundaries before routing to the agent. Continuous performance monitoring tracks observed metrics against validated thresholds with automated alerts when degradation exceeds tolerance. Re-validation is triggered by model changes, distribution shift detection, and environmental changes, in addition to scheduled re-validation.

Advanced Implementation — All intermediate capabilities plus: competence envelopes include graduated confidence zones (autonomous operation, supervised operation, mandatory human review). Validation includes adversarial and boundary-stress testing. The envelope is published as a machine-readable artefact consumed by upstream routing systems and downstream consumers. Dynamic envelope adjustment narrows the autonomous zone in response to real-time risk signals (e.g., market stress, system degradation, detected anomalies). Independent third-party validation is performed annually. The organisation can demonstrate to regulators a complete chain from validation evidence through envelope definition to runtime enforcement for every deployed agent.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Testing AG-139 compliance requires validation of both the envelope definition and its runtime enforcement. A comprehensive test programme should include the following tests.
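
As a non-normative illustration of how such tests might be automated (the numbered tests below are the normative specification), the following sketch exercises the task-type boundary check from the 4.3 sketch in the spirit of Test 8.3:

```python
def test_task_type_boundary_enforcement():
    """Sketch in the spirit of Test 8.3: a task type absent from the
    validated envelope must be escalated, never silently processed."""
    envelope = CompetenceEnvelope(
        agent_id="refund-agent",
        version="1.0.0",
        task_types=("refund_processing",),
        input_domains={},
        environmental_conditions={},
        performance_thresholds=(),
        validation_evidence_ref="evidence/refund-agent/v1.0.0",  # assumed path
    )
    # Scenario A's failure mode: liability complaints must not reach the agent.
    assert gate_request({"task_type": "product_liability_complaint"},
                        envelope) == "escalate"
    # Validated task types pass through to the agent.
    assert gate_request({"task_type": "refund_processing"},
                        envelope) == "process"
```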

Test 8.1: Envelope Completeness Verification

Test 8.2: Validation Statistical Sufficiency

Test 8.3: Runtime Enforcement of Task Type Boundaries

Test 8.4: Runtime Enforcement of Input Domain Boundaries

Test 8.5: Environmental Condition Monitoring

Test 8.6: Re-validation Trigger Activation

Test 8.7: Envelope Version Integrity

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Direct requirement
EU AI Act | Article 15 (Accuracy, Robustness, Cybersecurity) | Direct requirement
EU AI Act | Article 17 (Quality Management System) | Supports compliance
NIST AI RMF | MAP 2.1, MAP 2.3, MEASURE 2.5, MANAGE 2.2 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.4 (AI System Impact Assessment) | Supports compliance
PRA SS1/23 | Model Risk Management — Validation requirements | Direct requirement
FDA 21 CFR Part 820 | Design Validation (Section 820.30(g)) | Supports compliance
DORA | Article 11 (ICT Response and Recovery) | Supports compliance

EU AI Act — Article 9 (Risk Management System)

Article 9 requires that risk management measures account for the intended purpose and foreseeable misuse of the AI system. Competence Envelope Governance directly implements this requirement by formally defining the intended operational scope and establishing structural controls that prevent operation outside that scope. The requirement that risks be mitigated "as far as technically feasible" supports the case for runtime enforcement rather than documentation-only envelope definitions.

EU AI Act — Article 15 (Accuracy, Robustness, Cybersecurity)

Article 15 requires that high-risk AI systems achieve appropriate levels of accuracy, robustness, and cybersecurity in light of their intended purpose. Competence envelopes operationalise the accuracy requirement by establishing validated performance thresholds for defined operating conditions. The robustness requirement is addressed through environmental condition boundaries that prevent operation outside validated conditions. This provision makes explicit that accuracy must be validated for the specific conditions of use — aggregate accuracy claims are insufficient.

PRA SS1/23 — Model Risk Management

The PRA's supervisory statement SS1/23 on model risk management requires firms to validate models before deployment and to monitor ongoing performance. For AI agents operating in financial services, the competence envelope is the operational expression of model validation — it translates validation findings into enforceable runtime boundaries. The statement expects firms to identify the limitations of their models and to ensure that those limitations are reflected in operational controls. Competence envelopes that include environmental conditions (market stress, volume spikes) directly address the supervisory expectation that firms understand and control model behaviour under adverse conditions.

NIST AI RMF — MAP 2.1, MAP 2.3, MEASURE 2.5, MANAGE 2.2

MAP 2.1 addresses the classification of AI systems by their intended tasks and domains. MAP 2.3 addresses the identification of AI system limitations. MEASURE 2.5 addresses the validation of AI system performance. MANAGE 2.2 addresses risk mitigation through enforceable controls. Competence Envelope Governance supports compliance across these functions by establishing a formal linkage from task classification through limitation identification and performance validation to enforceable runtime controls.

ISO 42001 — Clause 6.1, Clause 8.4

Clause 6.1 requires actions to address risks within the AI management system. Clause 8.4 requires AI system impact assessment. Competence envelopes are a risk treatment that bounds the operational impact of AI systems to validated domains, directly satisfying the requirement for proportionate risk controls within the management system.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Domain-specific — bounded by the scope of tasks routed to the agent, but potentially extending to downstream systems and external parties that consume agent output

Consequence chain: Without competence envelope governance, an agent processes tasks outside its validated capability and produces output that appears valid but is unreliable. The failure mode is insidious because it does not generate errors — the agent produces confidently incorrect output that is consumed by downstream processes or human reviewers without basis for distinguishing it from competent output. The immediate consequence is incorrect decisions or actions based on unreliable agent output. The operational impact compounds over time as out-of-envelope outputs accumulate without detection: incorrect reconciliations, inappropriate customer communications, flawed clinical recommendations, or misclassified legal documents. The business consequence includes regulatory enforcement action for operating systems beyond their validated capability, liability for incorrect outputs relied upon by third parties, financial loss from decisions based on unreliable agent output, and reputational damage when the scope of the competence failure becomes apparent. The severity scales with the criticality of the agent's domain and the degree to which downstream consumers treat agent output as authoritative. In safety-critical domains (healthcare, infrastructure control), operation outside the competence envelope can create direct harm to individuals.

Cross-references: AG-140 (Novelty and Out-of-Distribution Detection Governance) provides the runtime detection mechanism for identifying inputs outside the competence envelope. AG-141 (Mandatory Abstention and Uncertainty Escalation Governance) defines what happens when the agent encounters inputs outside its competence. AG-142 (Autonomy Progression Governance) governs how the competence envelope is expanded through staged progression. AG-022 (Behavioural Drift Detection) monitors whether agent behaviour is changing in ways that may indicate competence degradation. AG-074 (Performance Drift and Revalidation) triggers re-validation when performance metrics indicate the envelope may no longer be accurate. AG-041 (Emergent Capability Detection) identifies new capabilities that may shift or expand the competence boundary. AG-037 (Objective Alignment Verification) ensures that the agent's objectives remain aligned within the competence envelope. AG-019 (Human Escalation & Override Triggers) defines the escalation paths for out-of-envelope requests.

Cite this protocol
AgentGoverning. (2026). AG-139: Competence Envelope Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-139