AG-801

Capability-Threshold Gating and Responsible Scaling

Meta-Governance & Assurance · AGS v2.1 · 2026-06-06
EU AI Act · NIST AI RMF · ISO 42001

AGS Frontier Safety | Meta-Governance & Assurance | Version 2.2

1. Definition

Capability-Threshold Gating governs a pre-committed policy that ties an agent's permitted deployment, autonomy, and access to *evaluated capability thresholds*: as an agent demonstrates more dangerous capability, progressively stronger safeguards become mandatory, and deployment is blocked until they are in place.

This is the meta-governance control that consumes the outputs of the frontier-safety evaluations (AG-797 scheming, AG-798 sabotage, AG-799 corrigibility, AG-802 dangerous-capability) and converts them into binding deployment decisions through a documented, tiered scaling policy. It is the standard's analogue of frontier-lab "responsible scaling" / "preparedness" frameworks and of the EU AI Act's systemic-risk tiering.
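The gating rule described above can be sketched in a few lines. This is a minimal illustration, not part of the standard: the tier numbers, the safeguard names, and the `gate_deployment` function are all hypothetical, chosen only to show how a pre-committed mapping turns an evaluated tier into an allow/block decision.

```python
from dataclasses import dataclass

# Hypothetical tiers and safeguard names, for illustration only.
# AG-801 does not define these identifiers.
REQUIRED_SAFEGUARDS = {
    0: frozenset(),
    1: frozenset({"audit_logging"}),
    2: frozenset({"audit_logging", "human_approval"}),
    3: frozenset({"audit_logging", "human_approval", "sandboxed_tools"}),
}


@dataclass(frozen=True)
class GateDecision:
    allowed: bool
    tier: int
    missing: frozenset


def gate_deployment(evaluated_tier: int, safeguards_in_place: set) -> GateDecision:
    """Allow deployment only if every safeguard required at the tier is in place."""
    required = REQUIRED_SAFEGUARDS[evaluated_tier]
    missing = required - frozenset(safeguards_in_place)
    return GateDecision(allowed=not missing, tier=evaluated_tier, missing=missing)


decision = gate_deployment(2, {"audit_logging"})
print(decision.allowed, sorted(decision.missing))  # False ['human_approval']
```

The mapping is fixed before any evaluation result is known, so the decision depends only on the evaluated tier and the safeguards actually present.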

2. Scope

In scope: the capability-tier classification scheme; the mapping from evaluated capability to required safeguards; the pre-commitment to block deployment when thresholds are crossed without the corresponding controls; re-evaluation triggers.

Out of scope: the individual evaluations that measure capability (AG-797/798/799/802/806-elicitation), and runtime enforcement of granted limits. This dimension governs *the gating policy and its application*.

3. Why This Matters

Capability grows with model upgrades, fine-tuning, tool access, and scaffolding — often between formal reviews. Without a pre-committed mapping from capability to safeguards, organisations make ad-hoc, deadline-pressured deployment decisions and discover too late that an agent crossed a dangerous threshold. A capability-threshold policy makes the safety bar rise automatically with capability and removes discretion to ship past it, which is exactly the structure regulators expect for systemic-risk models.
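Because capability can change between formal reviews, one plausible way to detect it is to fingerprint the capability-relevant configuration and compare it with the fingerprint recorded at the last evaluation. The sketch below assumes a hypothetical configuration dictionary with `base_model`, `fine_tune`, `tools`, and `scaffold` keys; the standard does not prescribe this mechanism or these field names.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Stable hash of the capability-relevant parts of an agent's configuration."""
    # sort_keys makes the hash independent of dictionary key order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_reevaluation(current_config: dict, evaluated_fingerprint: str) -> bool:
    """True when the agent differs from the configuration that was last evaluated."""
    return config_fingerprint(current_config) != evaluated_fingerprint


# Hypothetical configuration recorded when the agent was last evaluated.
evaluated = {
    "base_model": "model-a",
    "fine_tune": None,
    "tools": ["search"],
    "scaffold": "v1",
}
baseline = config_fingerprint(evaluated)

# Granting a new tool changes the fingerprint and should trigger re-evaluation.
upgraded = dict(evaluated, tools=["search", "shell"])
print(needs_reevaluation(evaluated, baseline))  # False
print(needs_reevaluation(upgraded, baseline))   # True
```

List order is significant in this sketch, so a reordered tool list would also count as a change; a real implementation would need to decide whether that is the desired behaviour.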

4. Requirements

5. Maturity Model

6. Test Criteria

Test 6.1: Threshold-Gated Deployment

Test 6.2: Re-Evaluation Trigger

Test 6.3: Conservative Default

7. Scoring

| Score | Criteria |
| --- | --- |
| 0 | No capability-tier scheme; deployment decisions are ad hoc |
| 1 | A capability-tier scheme is documented but does not bindingly gate deployment |
| 2 | Thresholds gate deployment with pre-specified safeguards, enforced re-evaluation triggers, and conservative defaults |
| 3 | As level 2 plus sandbag-resistant inputs, automated change detection, non-discretionary threshold responses, and authority disclosure |
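The scoring levels can be read as a cumulative checklist. The sketch below encodes that reading; the `Evidence` field names are hypothetical labels for the criteria in the table, and treating the levels as strictly cumulative (any missing level-2 criterion yields a score of 1) is an assumption of this example rather than a rule stated by the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    # Hypothetical field names mirroring the criteria in the scoring table.
    tier_scheme_documented: bool = False
    thresholds_gate_deployment: bool = False
    safeguards_prespecified: bool = False
    reevaluation_triggers_enforced: bool = False
    conservative_defaults: bool = False
    sandbag_resistant_inputs: bool = False
    automated_change_detection: bool = False
    nondiscretionary_responses: bool = False
    authority_disclosure: bool = False


def score(e: Evidence) -> int:
    """Map assessment evidence to a 0-3 score under a cumulative reading."""
    if not e.tier_scheme_documented:
        return 0
    level2 = all([
        e.thresholds_gate_deployment,
        e.safeguards_prespecified,
        e.reevaluation_triggers_enforced,
        e.conservative_defaults,
    ])
    if not level2:
        return 1
    level3 = all([
        e.sandbag_resistant_inputs,
        e.automated_change_detection,
        e.nondiscretionary_responses,
        e.authority_disclosure,
    ])
    return 3 if level3 else 2


print(score(Evidence(tier_scheme_documented=True)))  # 1
```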

8. Failure Scenarios

Scenario A — Silent Capability Jump: A base-model upgrade materially increases offensive-cyber capability, but no re-evaluation trigger fires, so the agent keeps its prior autonomy tier and runs without the safeguards its new capability level requires.

Scenario B — Deadline Waiver: An evaluation places an agent above the autonomy threshold, but a manager grants a one-off waiver to meet a launch date. Because the response to a threshold crossing was discretionary, the agent shipped with the safeguard gap unaddressed.

Scenario C — Optimistic Default: An agent with incomplete evaluation is deployed as "probably fine". It later exhibits a gated capability. A conservative default would have held it at the higher tier pending evaluation.
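The conservative default in Scenario C can be sketched as a rule that treats any missing evaluation as the highest tier. The evaluation identifiers come from this protocol's definition section; the integer tier values, `HIGHEST_TIER`, and the choice to take the maximum across evaluations are assumptions made for illustration.

```python
from typing import Mapping, Optional

HIGHEST_TIER = 3  # hypothetical top tier
REQUIRED_EVALUATIONS = ("AG-797", "AG-798", "AG-799", "AG-802")


def effective_tier(results: Mapping[str, Optional[int]]) -> int:
    """Return the highest tier across evaluations.

    Any required evaluation that is absent or has no result forces
    HIGHEST_TIER, so an incompletely evaluated agent is never
    treated as low risk.
    """
    tiers = []
    for eval_id in REQUIRED_EVALUATIONS:
        tier = results.get(eval_id)
        if tier is None:
            return HIGHEST_TIER
        tiers.append(tier)
    return max(tiers)


complete = {"AG-797": 1, "AG-798": 0, "AG-799": 2, "AG-802": 1}
incomplete = {"AG-797": 0, "AG-798": 0, "AG-799": 0}  # AG-802 not yet run

print(effective_tier(complete))    # 2
print(effective_tier(incomplete))  # 3
```

Under this rule the "probably fine" agent in Scenario C would have been held at the top tier until its remaining evaluation completed.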

9. Regulatory Mapping

| Requirement | EU AI Act | NIST AI RMF | ISO 42001 |
| --- | --- | --- | --- |
| R1: Capability-tier scheme with thresholds | Art. 51 — Systemic-risk classification | GOVERN 1.3 — Risk-based activity levels | Clause 6.1 — Actions to address risk |
| R2: Tier-mapped required safeguards | Art. 55 — Risk mitigation | GOVERN 1.3 — Risk-based activity levels | Clause 6.1 — Actions to address risk |
| R3: Deployment blocked without safeguards | Art. 55 — Risk mitigation | MANAGE 1.3 — High-priority response | Clause 8.1 — Operational control |
| R4: Re-evaluation triggers | Art. 55 — Ongoing evaluation | MANAGE 4.1 — Post-deployment monitoring | Clause 9.1 — Monitoring and measurement |
| R5: Sandbag-resistant capability inputs | Art. 55 — Model evaluation | MEASURE 2.13 — TEVV effectiveness | Clause 8.3 — Verification |
| R6: Recorded, independently-reviewed decision | Art. 51 — Determination/notification | GOVERN 2.1 — Accountability | Clause 9.3 — Management review |
| R8: Non-discretionary threshold response | Art. 55 — Risk mitigation | MANAGE 1.3 — High-priority response | Clause 8.1 — Operational control |
| R9: Disclosure for systemic-risk models | Art. 51/52 — Notification | GOVERN 4.3 — Information sharing | |

EU AI Act — Article 51 and Article 55

Articles 51–52 classify GPAI models with systemic risk (including by capability/compute thresholds) and require notification; Article 55 imposes heightened obligations once classified. AG-801 is the internal control that implements capability tiering and binds it to safeguards and deployment decisions.

NIST AI RMF — GOVERN 1.3, MANAGE 1.3

GOVERN 1.3 sets the level of risk-management activity according to the organisation's risk tolerance; a capability-tiered scaling policy is one concrete expression of it. MANAGE 1.3 requires planned, documented responses to high-priority risks, which is what a threshold crossing invokes.

ISO 42001 — Clause 6.1, Clause 8.1

Clause 6.1 (actions to address risks and opportunities) and Clause 8.1 (operational control) require risk-proportionate, controlled deployment — exactly what capability-threshold gating provides.

Cite this protocol
AgentGoverning. (2026). AG-801: Capability-Threshold Gating and Responsible Scaling. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-801