AG-139

Competence Envelope Governance

Competence, Uncertainty & Autonomy Scaling · ~19 min read · AGS v2.1 · April 2026

Tags: EU AI Act · FCA · NIST · ISO 42001

2. Summary

Competence Envelope Governance requires that every AI agent operates within a formally defined, empirically validated competence envelope — a bounded region of input space, task types, environmental conditions, and performance thresholds within which the agent has demonstrated reliable, measurable performance. The competence envelope is not a subjective assessment or a marketing claim; it is a quantified artefact derived from structured evaluation, maintained under version control, and enforced at runtime. Actions requested outside the validated envelope are routed to escalation, abstention, or human review — never silently attempted. This dimension ensures that organisations know precisely what their agents can do, under what conditions, and with what reliability, and that agent operation is confined to those validated boundaries.

3. Example

Scenario A — Unvalidated Task Expansion in Customer Service: An organisation deploys an AI agent to handle customer refund requests for a consumer electronics retailer. The agent has been validated on a corpus of 50,000 historical refund interactions covering standard product returns, warranty claims, and delivery damage. Its measured accuracy on this corpus is 96.2% for decision correctness and 99.1% for policy compliance. Over time, the business begins routing insurance claim disputes and product liability complaints to the same agent without re-evaluating the competence envelope. The agent processes 340 product liability complaints over three months, applying refund logic to what are actually legal liability matters. In 47 cases, the agent issues refunds that are interpreted by complainants as admissions of product defect. The organisation faces aggregate liability exposure of £2.3 million when these responses are cited in subsequent litigation.

What went wrong: The agent's competence envelope was defined for refund processing but was never formally bounded. No mechanism prevented the routing of task types outside the validated domain. The agent processed the liability complaints with high confidence because they superficially resembled refund requests — similar language, similar customer tone, similar product references. Without an explicit competence envelope constraining the agent to validated task types, the organisation had no structural control preventing scope creep. Consequence: £2.3 million in potential litigation exposure, regulatory scrutiny under consumer protection legislation, and suspension of the agent pending comprehensive re-evaluation.

Scenario B — Environmental Drift Beyond Validated Conditions: A financial services firm validates an AI agent for trade reconciliation across equity markets. The validation covers normal market conditions with daily volatility below 3% and trade volumes within two standard deviations of the 12-month mean. During a market stress event, volatility reaches 11.4% and trade volumes spike to 4.7 standard deviations above the mean. The agent continues processing reconciliations without any indication that it is operating outside its validated envelope. It incorrectly marks 1,200 trades as reconciled when underlying settlement data is delayed due to exchange system congestion. The firm discovers the errors 36 hours later when counterparty confirmations fail to match.

What went wrong: The competence envelope did not include environmental conditions — volatility ranges and volume thresholds — as explicit boundary parameters. The agent had no mechanism to detect that ambient conditions had moved outside the validated range. Consequence: 1,200 incorrectly reconciled trades, £890,000 in settlement corrections, FCA investigation for inadequate systems and controls during market stress, and temporary suspension of automated reconciliation.

Scenario C — Competence Envelope Without Measurable Thresholds: An organisation defines a competence envelope for its AI coding assistant as "writing Python code for data processing tasks." No quantitative performance thresholds are specified — no accuracy targets, no complexity boundaries, no input size limits. The agent is asked to process a 14 GB dataset. It generates code that is syntactically correct but algorithmically inappropriate, applying a recursive O(n²) approach where O(n log n) is required, and the job runs for 72 hours before being terminated. The organisation has no basis for determining whether this outcome represents a competence failure because no performance thresholds were defined.

What went wrong: The competence envelope was defined in qualitative terms ("data processing tasks") rather than quantitative terms (input size limits, complexity class targets, execution time bounds). Without measurable thresholds, there is no objective basis for determining whether the agent is operating within or outside its competence. Consequence: 72 hours of wasted compute at £4,200, missed delivery deadline, and inability to determine whether the agent requires retraining or the task was simply outside scope.

4. Requirement Statement

Scope: This dimension applies to all AI agents that perform tasks with consequences — financial, operational, legal, reputational, or safety-related — where the reliability of the agent's output matters. This includes agents that make decisions, generate recommendations that are acted upon, produce content that is published, execute transactions, or interact with external parties. An agent that generates internal draft text for human review before any action is taken is at the low end of the applicability spectrum but is still in scope if the human reviewer relies on the agent's output without independent verification. The scope extends to agents operating as components within larger systems: if an upstream agent's output is consumed by a downstream process without independent validation, the upstream agent's competence envelope must account for the downstream impact. The test is: could a failure of agent competence cause a consequence that the organisation would need to remediate? If yes, the agent is in scope.

4.1. A conforming system MUST define a competence envelope for each deployed agent as a versioned, machine-readable artefact specifying: validated task types, validated input domains (data types, ranges, formats), validated environmental conditions (load levels, latency bounds, upstream data quality thresholds), and quantitative performance thresholds (accuracy, precision, recall, latency, or domain-appropriate equivalents) demonstrated during validation.
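
As a non-normative illustration, the envelope artefact of 4.1 might be represented as follows in Python; the class and field names are assumptions of this sketch, not identifiers prescribed by this protocol.

```python
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class PerformanceThreshold:
    """A quantitative benchmark measured during validation (not a target)."""
    metric: str             # e.g. "decision_accuracy_pct"
    validated_value: float  # value demonstrated during validation
    tolerance: float        # permitted degradation before re-validation (see 4.7)

@dataclass(frozen=True)
class CompetenceEnvelope:
    """Versioned, machine-readable competence envelope per 4.1."""
    agent_id: str
    version: str
    task_types: tuple                # validated task categories
    input_domains: dict              # feature -> (min, max) validated range
    environmental_conditions: dict   # condition -> (min, max) validated range
    performance_thresholds: tuple    # of PerformanceThreshold
    validation_evidence_ref: str     # link to supporting evidence (see 4.5)

    def to_json(self) -> str:
        """Serialise for version control and downstream publication (see 4.8)."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)
```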

4.2. A conforming system MUST validate each competence envelope through structured empirical evaluation using representative data that covers the full range of specified conditions, with sample sizes sufficient to establish statistical significance at a minimum of 95% confidence for each stated performance threshold.
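
By way of non-normative illustration, one defensible reading of the 95% confidence requirement is that the lower bound of a two-sided 95% confidence interval on each measured metric must clear the stated threshold. A minimal sketch using the Wilson score interval (the function names are hypothetical):

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Two-sided Wilson score interval for a binomial proportion
    (z = 1.96 corresponds to 95% confidence)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

def threshold_is_supported(successes: int, n: int, threshold: float) -> bool:
    """True if the lower 95% confidence bound clears the stated threshold."""
    lower, _ = wilson_interval(successes, n)
    return lower >= threshold

# Scenario A's corpus: 48,100 correct out of 50,000 (96.2% observed).
# The lower bound is ~96.03%, so a 96.0% threshold claim is supported
# while a 96.2% threshold claim is not.
print(threshold_is_supported(48_100, 50_000, 0.960))  # True
print(threshold_is_supported(48_100, 50_000, 0.962))  # False
```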

4.3. A conforming system MUST enforce the competence envelope at runtime by evaluating incoming requests against envelope boundaries before the agent processes them, routing out-of-envelope requests to a defined escalation path rather than permitting silent processing.
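
A non-normative sketch of such a gate, reusing the CompetenceEnvelope sketch from 4.1; the escalation path itself is whatever the organisation has defined (see AG-019). Note the fail-closed behaviour: missing features or unknown ambient conditions escalate rather than process.

```python
def within_range(value, bounds) -> bool:
    lo, hi = bounds
    return lo <= value <= hi

def gate_request(request: dict, envelope, ambient: dict | None = None) -> str:
    """Pre-processing gate (4.3): evaluate a request against envelope
    boundaries before the agent sees it. Returns "process" or "escalate";
    out-of-envelope requests are never silently attempted."""
    if request.get("task_type") not in envelope.task_types:
        return "escalate"
    for feature, bounds in envelope.input_domains.items():
        if feature not in request or not within_range(request[feature], bounds):
            return "escalate"
    for condition, bounds in envelope.environmental_conditions.items():
        if (ambient is None or condition not in ambient
                or not within_range(ambient[condition], bounds)):
            return "escalate"
    return "process"
```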

4.4. A conforming system MUST re-validate the competence envelope whenever any of the following occur: the agent's model is updated or retrained, the input distribution shifts beyond a predefined threshold, a new task type is introduced, or the operational environment changes beyond the validated range.
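
One common, non-normative way to operationalise "input distribution shifts beyond a predefined threshold" is the Population Stability Index (PSI); a PSI above roughly 0.2 is conventionally treated as material shift, though the trigger threshold must be set per envelope. A sketch, with numpy assumed available:

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between the validation-time reference
    distribution of a feature and its production distribution."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # capture out-of-range production values
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, 1e-6, None)  # guard empty bins
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

def shift_requires_revalidation(reference, current, threshold: float = 0.2) -> bool:
    """True when input distribution shift exceeds the predefined threshold (4.4)."""
    return psi(np.asarray(reference, float), np.asarray(current, float)) > threshold
```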

4.5. A conforming system MUST maintain a complete audit trail of competence envelope versions, validation results, and any modifications, with each version linked to the validation evidence that supports it.
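
A non-normative sketch of one way to make that audit trail tamper-evident, hash-chaining each version record to its predecessor and to a digest of the supporting validation evidence; the helper name is hypothetical.

```python
import hashlib
import json
import time

def record_envelope_version(envelope_json: str, evidence_digest: str,
                            prev_record_hash: str) -> dict:
    """Append-only audit record (4.5) linking an envelope version to its
    validation evidence and to the previous record in the chain."""
    record = {
        "recorded_at": time.time(),
        "envelope_sha256": hashlib.sha256(envelope_json.encode()).hexdigest(),
        "evidence_sha256": evidence_digest,
        "prev_record_sha256": prev_record_hash,
    }
    record["record_sha256"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    return record
```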

4.6. A conforming system SHOULD define competence envelope boundaries using quantitative thresholds rather than qualitative descriptions — for example, "invoice amounts between £100 and £50,000 with structured data in ISO 20022 format" rather than "standard invoice processing."

4.7. A conforming system SHOULD implement continuous monitoring of agent performance against competence envelope thresholds during production operation, triggering re-validation when observed performance degrades below the validated threshold by a margin exceeding a predefined tolerance (e.g., accuracy drops from validated 96.2% to observed 93.1% against a 2% tolerance).
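
A non-normative sketch of such a monitor over a rolling window, using the tolerance semantics from the example above; the window size is an assumption of the sketch.

```python
from collections import deque

class PerformanceMonitor:
    """Rolling-window monitor (4.7) of observed accuracy against the
    validated threshold; signals re-validation when degradation exceeds
    the predefined tolerance."""
    def __init__(self, validated_pct: float, tolerance_pct: float,
                 window: int = 1000):
        self.validated = validated_pct
        self.tolerance = tolerance_pct
        self.outcomes = deque(maxlen=window)

    def record(self, correct: bool) -> bool:
        """Record one outcome; returns True if re-validation should trigger."""
        self.outcomes.append(correct)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data in the window yet
        observed = 100.0 * sum(self.outcomes) / len(self.outcomes)
        # e.g. validated 96.2, observed 93.1, tolerance 2.0 -> triggers
        return (self.validated - observed) > self.tolerance
```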

4.8. A conforming system SHOULD publish the competence envelope to downstream consumers so that systems relying on agent output can assess whether the output falls within the agent's validated capability.

4.9. A conforming system MAY implement graduated competence envelopes with tiered confidence levels — for example, a core envelope where the agent operates autonomously and an extended envelope where the agent operates with human review.
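
A non-normative sketch of graduated routing, reusing the gate_request function from the 4.3 sketch; the tier names mirror the example above.

```python
from enum import Enum

class Disposition(Enum):
    AUTONOMOUS = "autonomous"      # inside the core envelope
    HUMAN_REVIEW = "human_review"  # inside the extended envelope only
    ESCALATE = "escalate"          # outside both envelopes

def route(request: dict, ambient: dict,
          core_envelope, extended_envelope) -> Disposition:
    """Graduated competence envelope routing (4.9)."""
    if gate_request(request, core_envelope, ambient) == "process":
        return Disposition.AUTONOMOUS
    if gate_request(request, extended_envelope, ambient) == "process":
        return Disposition.HUMAN_REVIEW
    return Disposition.ESCALATE
```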

5. Rationale

Competence Envelope Governance addresses a fundamental gap in how organisations deploy AI agents: the absence of formally defined, empirically validated boundaries around what the agent can reliably do. Without a competence envelope, an agent's operational scope is defined implicitly — by whatever tasks are routed to it — rather than explicitly, by what it has been demonstrated to handle correctly.

This matters because AI agents, unlike traditional software, do not fail cleanly at capability boundaries. A database query against an undefined table returns an error. An API call with invalid parameters returns a structured rejection. An AI agent asked to perform a task outside its competence will typically attempt the task and produce output that may appear plausible but is unreliable. The failure mode is not "error" but "confident incorrectness" — the agent produces output that looks correct, is delivered with the same formatting and confidence as validated output, and is consumed by downstream processes or human reviewers who have no basis for distinguishing it from competent output.

The competence envelope is the structural solution to this problem. By defining the boundaries explicitly and enforcing them at runtime, the organisation ensures that the agent only processes tasks where its reliability has been empirically demonstrated. Tasks outside the envelope are routed to appropriate handling — human review, alternative systems, or explicit abstention — rather than being silently processed by an agent operating beyond its validated capability.

This dimension intersects with AG-022 (Behavioural Drift Detection) in that drift detection monitors whether the agent's behaviour is changing over time, while competence envelope governance defines the boundaries within which behaviour was validated. It intersects with AG-074 (Performance Drift and Revalidation) in that performance drift may indicate that the competence envelope needs re-validation. It intersects with AG-041 (Emergent Capability Detection) in that new capabilities may expand or shift the natural competence boundary, requiring envelope re-assessment. It intersects with AG-140 (Novelty and Out-of-Distribution Detection Governance) in that OOD detection is a primary mechanism for identifying inputs outside the competence envelope at runtime.

6. Implementation Guidance

The competence envelope is the central artefact of this dimension. It is a formal, versioned, machine-readable specification of the conditions under which an agent has demonstrated reliable performance. The envelope has four axes; a concrete instance, building on the sketch under 4.1, follows the list:

  1. Task types. The specific categories of work the agent has been validated to perform. For a customer service agent: refund processing, order status enquiries, delivery rescheduling. Not: legal complaints, insurance claims, regulatory enquiries.
  2. Input domains. The data characteristics the agent has been validated against. For an invoice processing agent: structured invoices in PDF or XML format, amounts between £100 and £500,000, in GBP/EUR/USD, with standard line-item structures. Not: handwritten invoices, amounts exceeding £500,000, currencies without validated exchange rate feeds.
  3. Environmental conditions. The operational context within which validation was performed. For a trade reconciliation agent: daily trade volumes between 10,000 and 150,000, market volatility below 4% (30-day annualised), settlement system latency below 200ms. Not: market stress conditions, exchange outages, or cross-border settlement with jurisdictions not covered in validation.
  4. Performance thresholds. The quantitative benchmarks the agent achieved during validation. For a medical triage agent: sensitivity 98.1% for urgent cases, specificity 94.7%, false negative rate below 0.5% for life-threatening conditions, mean response latency below 3 seconds. These are not targets — they are measured results from validation that define the baseline against which production performance is monitored.
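
Pulling the four axes together, the following non-normative instance (built on the 4.1 sketch) encodes the trade reconciliation example; values not stated in the text above are marked as assumed in comments.

```python
# Illustrative only: ranges mirror the trade reconciliation example above;
# items marked "assumed" are inventions of this sketch, not from the text.
trade_recon_envelope = CompetenceEnvelope(
    agent_id="trade-reconciliation-agent",
    version="2.3.0",                                 # assumed version label
    task_types=("equity_trade_reconciliation",),
    input_domains={
        "trade_value_gbp": (0.0, 10_000_000.0),      # assumed validated range
    },
    environmental_conditions={
        "daily_trade_volume": (10_000, 150_000),
        "volatility_30d_annualised_pct": (0.0, 4.0),
        "settlement_latency_ms": (0.0, 200.0),
    },
    performance_thresholds=(
        PerformanceThreshold("reconciliation_accuracy_pct", 99.5, 0.5),  # assumed
    ),
    validation_evidence_ref="evidence/trade-recon/v2.3.0",  # assumed path
)
```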

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. Competence envelopes for trading agents should align with existing model validation frameworks (SR 11-7, SS1/23). The envelope should specify: asset classes, market conditions (volatility ranges, liquidity thresholds), order types, and value limits. Validation should include backtesting under historical stress scenarios. The FCA expects firms to demonstrate that AI systems have been validated under conditions representative of their intended use, including adverse conditions.

Healthcare. Competence envelopes for clinical decision support agents must specify: patient populations (age ranges, comorbidity profiles, medication combinations), clinical presentations (symptom combinations, severity levels), and diagnostic categories. Validation must be conducted on clinically representative datasets with clinical expert review of boundary cases. FDA guidance on Clinical Decision Support software, together with the Quality System Regulation (21 CFR Part 820, design validation at §820.30(g)), requires documented evidence of intended-use validation.

Legal and Compliance. Competence envelopes for contract review agents should specify: contract types, jurisdictions, clause categories, document lengths, and language complexity levels. Validation should include adversarial examples — contracts with unusual structures, embedded edge-case clauses, and jurisdiction-specific provisions that test the boundaries of the agent's training distribution.

Maturity Model

Basic Implementation — The organisation has defined competence envelopes for each deployed agent as documentation artefacts specifying task types and general input characteristics. Validation has been performed using available data with aggregate performance metrics. Runtime enforcement is implemented as a classification check in the application layer that flags requests outside defined task categories. Re-validation occurs on an annual schedule. This level establishes awareness of competence boundaries but has limitations: qualitative boundaries are difficult to enforce precisely, aggregate metrics may mask stratum-level weakness, and annual re-validation may lag behind model or environmental changes.

Intermediate Implementation — Competence envelopes are defined as machine-readable artefacts with quantitative boundaries on all four axes (task types, input domains, environmental conditions, performance thresholds). Validation uses stratified sampling with statistical significance testing at the stratum level. Runtime enforcement is implemented as an independent pre-processing gate that evaluates requests against envelope boundaries before routing to the agent. Continuous performance monitoring tracks observed metrics against validated thresholds with automated alerts when degradation exceeds tolerance. Re-validation is triggered by model changes, distribution shift detection, and environmental changes, in addition to scheduled re-validation.

Advanced Implementation — All intermediate capabilities plus: competence envelopes include graduated confidence zones (autonomous operation, supervised operation, mandatory human review). Validation includes adversarial and boundary-stress testing. The envelope is published as a machine-readable artefact consumed by upstream routing systems and downstream consumers. Dynamic envelope adjustment narrows the autonomous zone in response to real-time risk signals (e.g., market stress, system degradation, detected anomalies). Independent third-party validation is performed annually. The organisation can demonstrate to regulators a complete chain from validation evidence through envelope definition to runtime enforcement for every deployed agent.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Testing AG-139 compliance requires validation of both the envelope definition and its runtime enforcement. A comprehensive test programme should include the following tests.
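
As a non-normative illustration of how such tests might be automated (the numbered tests below are the normative specification), the following sketch exercises the task-type boundary check from the 4.3 sketch in the spirit of Test 8.3:

```python
def test_task_type_boundary_enforcement():
    """Sketch in the spirit of Test 8.3: a task type absent from the
    validated envelope must be escalated, never silently processed."""
    envelope = CompetenceEnvelope(
        agent_id="refund-agent",
        version="1.0.0",
        task_types=("refund_processing",),
        input_domains={},
        environmental_conditions={},
        performance_thresholds=(),
        validation_evidence_ref="evidence/refund-agent/v1.0.0",  # assumed path
    )
    # Scenario A's failure mode: liability complaints must not reach the agent.
    assert gate_request({"task_type": "product_liability_complaint"},
                        envelope) == "escalate"
    # Validated task types pass through to the agent.
    assert gate_request({"task_type": "refund_processing"},
                        envelope) == "process"
```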

Test 8.1: Envelope Completeness Verification

Test 8.2: Validation Statistical Sufficiency

Test 8.3: Runtime Enforcement of Task Type Boundaries

Test 8.4: Runtime Enforcement of Input Domain Boundaries

Test 8.5: Environmental Condition Monitoring

Test 8.6: Re-validation Trigger Activation

Test 8.7: Envelope Version Integrity

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Direct requirement
EU AI Act | Article 15 (Accuracy, Robustness, Cybersecurity) | Direct requirement
EU AI Act | Article 17 (Quality Management System) | Supports compliance
NIST AI RMF | MAP 2.1, MAP 2.3, MEASURE 2.5, MANAGE 2.2 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.4 (AI System Impact Assessment) | Supports compliance
PRA SS1/23 | Model Risk Management — Validation requirements | Direct requirement
FDA 21 CFR Part 820 | Design Validation (Section 820.30(g)) | Supports compliance
DORA | Article 11 (ICT Response and Recovery) | Supports compliance

EU AI Act — Article 9 (Risk Management System)

Article 9 requires that risk management measures account for the intended purpose and foreseeable misuse of the AI system. Competence Envelope Governance directly implements this requirement by formally defining the intended operational scope and establishing structural controls that prevent operation outside that scope. The requirement that risks be mitigated "as far as technically feasible" supports the case for runtime enforcement rather than documentation-only envelope definitions.

EU AI Act — Article 15 (Accuracy, Robustness, Cybersecurity)

Article 15 requires that high-risk AI systems achieve appropriate levels of accuracy, robustness, and cybersecurity in light of their intended purpose. Competence envelopes operationalise the accuracy requirement by establishing validated performance thresholds for defined operating conditions. The robustness requirement is addressed through environmental condition boundaries that prevent operation outside validated conditions. This provision makes explicit that accuracy must be validated for the specific conditions of use — aggregate accuracy claims are insufficient.

PRA SS1/23 — Model Risk Management

The PRA's supervisory statement SS1/23 on model risk management requires firms to validate models before deployment and to monitor ongoing performance. For AI agents operating in financial services, the competence envelope is the operational expression of model validation — it translates validation findings into enforceable runtime boundaries. The statement expects firms to identify the limitations of their models and to ensure that those limitations are reflected in operational controls. Competence envelopes that include environmental conditions (market stress, volume spikes) directly address the supervisory expectation that firms understand and control model behaviour under adverse conditions.

NIST AI RMF — MAP 2.1, MAP 2.3, MEASURE 2.5, MANAGE 2.2

MAP 2.1 addresses the classification of AI systems by their intended tasks and domains. MAP 2.3 addresses the identification of AI system limitations. MEASURE 2.5 addresses the validation of AI system performance. MANAGE 2.2 addresses risk mitigation through enforceable controls. Competence Envelope Governance supports compliance across these functions by establishing a formal linkage from task classification through limitation identification and performance validation to enforceable runtime controls.

ISO 42001 — Clause 6.1, Clause 8.4

Clause 6.1 requires actions to address risks within the AI management system. Clause 8.4 requires AI system impact assessment. Competence envelopes are a risk treatment that bounds the operational impact of AI systems to validated domains, directly satisfying the requirement for proportionate risk controls within the management system.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Domain-specific — bounded by the scope of tasks routed to the agent, but potentially extending to downstream systems and external parties that consume agent output

Consequence chain: Without competence envelope governance, an agent processes tasks outside its validated capability and produces output that appears valid but is unreliable. The failure mode is insidious because it does not generate errors — the agent produces confidently incorrect output that is consumed by downstream processes or human reviewers without basis for distinguishing it from competent output. The immediate consequence is incorrect decisions or actions based on unreliable agent output. The operational impact compounds over time as out-of-envelope outputs accumulate without detection: incorrect reconciliations, inappropriate customer communications, flawed clinical recommendations, or misclassified legal documents. The business consequence includes regulatory enforcement action for operating systems beyond their validated capability, liability for incorrect outputs relied upon by third parties, financial loss from decisions based on unreliable agent output, and reputational damage when the scope of the competence failure becomes apparent. The severity scales with the criticality of the agent's domain and the degree to which downstream consumers treat agent output as authoritative. In safety-critical domains (healthcare, infrastructure control), operation outside the competence envelope can create direct harm to individuals.

Cross-references: AG-140 (Novelty and Out-of-Distribution Detection Governance) provides the runtime detection mechanism for identifying inputs outside the competence envelope. AG-141 (Mandatory Abstention and Uncertainty Escalation Governance) defines what happens when the agent encounters inputs outside its competence. AG-142 (Autonomy Progression Governance) governs how the competence envelope is expanded through staged progression. AG-022 (Behavioural Drift Detection) monitors whether agent behaviour is changing in ways that may indicate competence degradation. AG-074 (Performance Drift and Revalidation) triggers re-validation when performance metrics indicate the envelope may no longer be accurate. AG-041 (Emergent Capability Detection) identifies new capabilities that may shift or expand the competence boundary. AG-037 (Objective Alignment Verification) ensures that the agent's objectives remain aligned within the competence envelope. AG-019 (Human Escalation & Override Triggers) defines the escalation paths for out-of-envelope requests.

Cite this protocol
AgentGoverning. (2026). AG-139: Competence Envelope Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-139