AGS Frontier Autonomy (Group K) | Meta-Governance & Assurance | Version 3.0
AI-R&D Capability Tripwire governs a pre-committed, evaluable threshold — a "critical capability level" — at which an agent's ability to materially automate or accelerate AI research and development triggers heightened safeguards, regardless of whether such use is intended.
The ability to automate AI R&D is the capability most likely to produce rapid, compounding capability gains. Frontier-safety frameworks treat it as a distinct critical threshold because crossing it changes the risk landscape for every other control. This dimension requires that the threshold be defined, evaluated (see AG-802), and bound to mandatory mitigations before it is crossed.
In scope: defining the AI-R&D / ML-acceleration capability threshold; evaluating against it; pre-committed mitigations and gating when crossed; applying to internal (non-released) use, not only external deployment.
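The first two in-scope elements, a defined threshold and evaluation against it, can be sketched as a small record plus a comparison. This is a minimal illustration, assuming the threshold is expressed as a score on some AI-R&D evaluation suite. The names `CapabilityThreshold`, `EvalResult`, `UseContext`, the metric name `ai_rd_task_suite_score`, the mitigation labels, and the 0.5 critical level are all hypothetical and are not prescribed by this dimension.

```python
from dataclasses import dataclass
from enum import Enum


class UseContext(Enum):
    # Hypothetical deployment contexts; the tripwire covers both by default.
    INTERNAL = "internal"
    EXTERNAL = "external"


@dataclass(frozen=True)
class CapabilityThreshold:
    """A pre-committed AI-R&D critical capability level (hypothetical schema)."""
    metric: str                  # evaluation metric the threshold is defined on (assumed)
    critical_level: float        # score at or above which the tripwire fires
    mitigations: frozenset[str]  # mitigations fixed in advance, mandatory once crossed
    contexts: frozenset[UseContext] = frozenset(UseContext)  # internal use in scope by default


@dataclass(frozen=True)
class EvalResult:
    model_version: str
    metric: str
    score: float


def crossed(threshold: CapabilityThreshold, result: EvalResult) -> bool:
    """Return True if the result meets or exceeds the critical level."""
    if result.metric != threshold.metric:
        # Refuse to compare a result from a different evaluation than the one pre-committed.
        raise ValueError(
            f"result measures {result.metric!r}, threshold expects {threshold.metric!r}"
        )
    return result.score >= threshold.critical_level


# Hypothetical instance, for illustration only.
AI_RD_THRESHOLD = CapabilityThreshold(
    metric="ai_rd_task_suite_score",
    critical_level=0.5,
    mitigations=frozenset({"enhanced_security", "deployment_gate", "independent_review"}),
)
```

Fixing the mitigations inside the same frozen record as the threshold is one way to express "pre-committed": the response is decided when the threshold is defined, not when it is crossed.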
Out of scope: the self-modification controls (AG-822) and the capability-gain rate limiting (AG-823) that apply *after* the threshold is relevant. This dimension governs *the tripwire and its consequences*.
An agent that can do the work of an AI researcher can be directed, by its developer or by itself, toward improving AI systems, including its own successors, compressing development timelines and outpacing governance. The danger arises even in purely internal use, where safety regimes have historically applied the least scrutiny. A pre-committed tripwire ensures the heightened safeguards appropriate to this capability are in place *before* it is reached, not negotiated under pressure afterwards.
Test 6.1: Threshold Defined & Evaluated
Test 6.2: Internal-Use Coverage
Test 6.3: Non-Discretionary Crossing Response
| Score | Criteria |
|---|---|
| 0 | No AI-R&D capability threshold; acceleration risk unmanaged |
| 1 | Threshold defined but not bound to mitigations or internal use |
| 2 | Pre-specified mitigations gate operation, internal use in scope, non-discretionary crossing response |
| 3 | Sandbag-resistant evaluation, internal catastrophic-risk assessment, independent review, authority disclosure |
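The rubric can be read as a cumulative ladder. The sketch below assumes that each level requires every criterion of the levels beneath it, which the table implies but does not state outright; the `TripwireAssessment` field names are hypothetical labels for the criteria in the table.

```python
from dataclasses import dataclass


@dataclass
class TripwireAssessment:
    """Boolean findings for one assessment; field names are hypothetical."""
    threshold_defined: bool = False
    mitigations_gate_operation: bool = False
    internal_use_in_scope: bool = False
    non_discretionary_response: bool = False
    sandbag_resistant_eval: bool = False
    internal_catastrophic_risk_assessment: bool = False
    independent_review: bool = False
    authority_disclosure: bool = False


def maturity_score(a: TripwireAssessment) -> int:
    """Map findings to the 0-3 rubric, assuming levels are cumulative."""
    if not a.threshold_defined:
        return 0  # no AI-R&D threshold at all
    level_2 = (
        a.mitigations_gate_operation
        and a.internal_use_in_scope
        and a.non_discretionary_response
    )
    if not level_2:
        return 1  # threshold exists but is not bound to mitigations or internal use
    level_3 = (
        a.sandbag_resistant_eval
        and a.internal_catastrophic_risk_assessment
        and a.independent_review
        and a.authority_disclosure
    )
    return 3 if level_3 else 2
```

Under this reading, a single missing level-2 criterion caps the score at 1 regardless of how many level-3 practices are in place.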
Scenario A — Internal Acceleration: A developer points a highly capable internal agent at improving its own training pipeline. Because the tripwire excluded internal use, no heightened safeguards applied, and capability advanced without containment.
Scenario B — Threshold Crossed Quietly: A model upgrade crosses the AI-R&D threshold, but because the upgrade is never evaluated against it, the organisation does not notice, and the mitigations that should have been mandatory are absent.
Scenario C — Deadline Waiver: Evaluation shows the threshold crossed, but a waiver is granted to keep a research programme moving; the acceleration risk ships ungoverned.
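All three scenarios describe a gate that was either missing or bypassable. A minimal sketch of a gate that closes each gap follows, assuming a single numeric evaluation score per model version. `authorize`, `Deployment`, `TripwireBlocked`, `CRITICAL_LEVEL`, and `REQUIRED_MITIGATIONS` are hypothetical names, and the values are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


class TripwireBlocked(Exception):
    """Raised when a model version may not run under the AI-R&D tripwire."""


@dataclass(frozen=True)
class Deployment:
    model_version: str
    internal: bool                # recorded for audit; authorize() deliberately ignores it
    ai_rd_score: Optional[float]  # None means no evaluation against the threshold exists
    mitigations_in_place: frozenset


CRITICAL_LEVEL = 0.5  # hypothetical threshold value
REQUIRED_MITIGATIONS = frozenset({"enhanced_security", "deployment_gate"})  # hypothetical


def authorize(d: Deployment) -> None:
    """Return normally or raise TripwireBlocked. There is deliberately no waiver parameter."""
    # Scenario B: an unevaluated upgrade cannot run, so a crossing cannot go unnoticed.
    if d.ai_rd_score is None:
        raise TripwireBlocked(f"{d.model_version}: no evaluation against the AI-R&D threshold")
    # Scenario A: d.internal is never consulted, so internal use gets no exemption.
    if d.ai_rd_score >= CRITICAL_LEVEL:
        missing = REQUIRED_MITIGATIONS - d.mitigations_in_place
        # Scenario C: the only way past this check is to put the missing mitigations in place.
        if missing:
            raise TripwireBlocked(
                f"{d.model_version}: threshold crossed; missing {sorted(missing)}"
            )
```

Leaving out any override argument is the design choice that makes the response non-discretionary: a deadline can delay the programme, but it cannot change what `authorize` accepts.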
| Requirement | EU AI Act | NIST AI RMF | ISO 42001 |
|---|---|---|---|
| R1: Defined AI-R&D threshold | Art. 51 — Systemic-risk classification | GOVERN 1.3 — Risk-based activity | Clause 6.1 — Actions to address risk |
| R2: Evaluate against threshold | Art. 55 — Model evaluation | MAP 5.1 — Impact identification | Clause 8.2 — AI risk assessment |
| R3: Pre-specified mitigations gate | Art. 55 — Risk mitigation | MANAGE 1.3 — High-priority response | Clause 8.1 — Operational control |
| R4: Internal-use coverage | Art. 51 — Model scope | GOVERN 1.6 — Inventory | A.6 — AI system lifecycle |
| R5: Non-discretionary crossing response | Art. 55 — Risk mitigation | MANAGE 1.3 — High-priority response | Clause 8.1 — Operational control |
| R6: Internal catastrophic-risk assessment | Art. 9 — Risk management | MAP 5.1 — Impact magnitude | Clause 6.1 — Actions to address risk |
| R7: Recorded, reviewed, disclosed | Art. 55 — Reporting | GOVERN 2.1 — Accountability | Clause 9.3 — Management review |
Articles 51–55 govern general-purpose AI models, with Articles 51 and 55 addressing those that pose systemic risk, a classification based partly on capability. AI-R&D acceleration is a quintessential systemic risk; AG-821 makes it an explicit, pre-committed threshold with mandatory mitigations.
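GOVERN 1.3 (risk-based activity levels) and MAP 5.1 (impact likelihood and magnitude) call for scaling risk-management effort to the likelihood and magnitude of impact. Applied here, the strongest safeguards attach to the most consequential capability: automating AI R&D.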
Clause 6.1 (actions to address risks) and Clause 8.1 (operational control) require risk-proportionate, controlled handling of acceleration-capable systems, including internal use.