AG-616

Wasteful Retraining Approval Governance

Sustainability, Environment & Climate · AGS v2.1 · April 2026

Section 2: Summary

This dimension governs the authorisation controls that must be applied before any agent-initiated or agent-recommended high-cost retraining operation or computationally intensive experimental run is permitted to proceed. Its purpose is to ensure that proportionate human or automated oversight reviews necessity, scope, and environmental impact before resources are committed. The dimension matters because autonomous and semi-autonomous agents operating in enterprise, safety-critical, public sector, embodied, and cross-border contexts have demonstrated a structural tendency to trigger redundant retraining cycles: they misinterpret distribution shift as requiring full model replacement, lack visibility into prior experiment logs, or pursue performance optimisation without awareness of cumulative carbon and financial cost. Failure manifests as unchecked GPU-cluster consumption measured in tens of thousands of dollars and tens of tonnes of CO₂-equivalent per runaway cycle, as reputational and regulatory exposure for organisations operating under emerging carbon-accountability frameworks, and as degraded model quality when rushed or inadequately scoped retraining introduces regression without benefit.

Section 3: Example Scenarios

Scenario 3.1 — Autonomous Experiment Loop in an Enterprise Workflow Agent

An enterprise workflow agent responsible for demand forecasting detects a 4.2% degradation in Mean Absolute Error (MAE) over a rolling 14-day window. Interpreting this as sufficient cause for retraining, the agent autonomously queues a full fine-tuning run on a 13-billion-parameter foundation model using 512 GPUs for 72 hours. No pre-authorisation gate exists. The run completes, MAE recovers by 1.8%, but a second degradation is detected six days later when a seasonal event skews the held-out validation set. The agent queues a second full retraining run. Over 31 days, seven retraining runs are executed: total compute cost USD 218,000; estimated CO₂ equivalent 69.7 tonnes (based on regional grid intensity of 0.42 kg CO₂/kWh and 166,000 kWh consumed). Post-incident review finds that a lightweight adapter fine-tune of 0.3% of parameters, costing approximately USD 4,200 and 0.5 tonnes CO₂, would have resolved the original drift. The root cause was absence of a cost-proportionality gate: no requirement existed to justify full retraining over targeted adaptation before compute allocation.

Scenario 3.2 — Repeated Reinforcement Learning Rollouts in a Safety-Critical Cyber-Physical Agent

A robotic pick-and-place agent in a pharmaceutical packaging line is retrained via reinforcement learning (RL) whenever its task success rate drops below a 96.5% threshold. The RL environment uses physics simulation requiring 1,024 parallel CPU cores and a dedicated GPU node. Between March and August of a given operational year, the threshold is breached eleven times, each triggering a full RL training run of 48 hours average duration. No governance layer evaluates whether prior run artefacts, partial checkpoints, or curriculum-learning restarts could address the performance gap more efficiently. Total energy consumed across eleven runs: 89,400 kWh, equivalent to 37.5 tonnes CO₂ (grid intensity 0.42 kg CO₂/kWh). Post-audit identifies that four of the eleven runs were triggered by sensor calibration drift — a hardware issue resolving without retraining — and three were triggered during a period when the production line itself was running non-standard batch profiles. Only four runs corresponded to genuine model degradation. A mandatory pre-run triage protocol requiring root-cause assessment before RL rollout would have prevented seven of the eleven runs, saving approximately USD 61,000 and 24 tonnes CO₂.

Scenario 3.3 — Redundant Cross-Border Fine-Tuning in a Multi-Jurisdiction Public Sector Agent

A public-sector natural language processing agent deployed across five regulatory jurisdictions is maintained by separate regional operations teams. Each team holds authority to initiate fine-tuning runs to adapt the model to jurisdiction-specific terminology updates. In a 90-day window, the teams collectively initiate fourteen fine-tuning runs against the same base model checkpoint: eight runs target overlapping vocabulary domains, three are exact duplicates initiated because teams were unaware of each other's completed runs, and three are legitimately distinct. Total compute cost: USD 94,000 across shared cloud infrastructure. The three duplicate runs alone consumed USD 20,100 and 8.2 tonnes CO₂. A centralised retraining registry with cross-jurisdiction visibility, mandatory deduplication checks, and a multi-team approval quorum for runs exceeding a defined cost threshold would have eliminated the duplicate runs and consolidated the overlapping runs into three coordinated fine-tuning jobs, reducing total cost by an estimated 61% and CO₂ output by 19.4 tonnes.

Section 4: Requirement Statement

4.0 Scope

This requirement applies to all agent systems — including autonomous agents, semi-autonomous agents, and agent orchestration pipelines — that have the capability to initiate, recommend, schedule, or parameterise machine-learning retraining, fine-tuning, continued pre-training, reinforcement learning rollout, neural architecture search (NAS), or any other compute-intensive model adaptation operation. Scope includes both directly agent-executed operations and operations queued by an agent and executed by downstream infrastructure. The scope extends to all primary profiles identified in Section 1 when deployed in production or staging environments where real compute resources are consumed. Development sandbox environments consuming fewer than 10 GPU-hours per run and designated as isolated from production data pipelines are excluded from mandatory gate requirements but remain subject to logging requirements under 4.3.

The compute-cost threshold triggering full mandatory gate application is defined as any single operation or logically related sequence of operations expected to consume more than 50 GPU-hours or USD 2,000 equivalent in compute cost, or any retraining operation conducted within 30 calendar days of a prior retraining run on the same model artefact. Organisations operating in jurisdictions with binding carbon reporting obligations MAY lower these thresholds to align with internal carbon budget constraints.
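As a non-normative illustration, the Section 4.0 threshold test reduces to a simple predicate. The function and constant names below are hypothetical and assume the organisation has not lowered the default thresholds:

```python
from datetime import datetime, timedelta

# Default Section 4.0 thresholds; organisations MAY lower these.
GPU_HOURS_THRESHOLD = 50
COST_USD_THRESHOLD = 2_000
REPEAT_LOOKBACK = timedelta(days=30)

def requires_full_gate(est_gpu_hours: float,
                       est_cost_usd: float,
                       last_run_on_artifact: datetime | None,
                       now: datetime) -> bool:
    """Return True when the full mandatory gate applies to a proposed operation."""
    if est_gpu_hours > GPU_HOURS_THRESHOLD or est_cost_usd > COST_USD_THRESHOLD:
        return True
    # Any retraining within 30 days of a prior run on the same artefact also qualifies.
    return (last_run_on_artifact is not None
            and now - last_run_on_artifact <= REPEAT_LOOKBACK)
```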

4.1 Pre-Authorisation Gate Requirement

4.1.1 The agent system MUST NOT initiate any retraining operation meeting the thresholds defined in Section 4.0 without having obtained a documented authorisation from a designated approving authority prior to compute allocation.

4.1.2 The approving authority MUST be a human decision-maker or a formally defined automated approval policy that has itself been reviewed and authorised by a human authority within the preceding 12 months. Automated approval policies MUST have explicit ceiling thresholds beyond which human review is mandatory regardless of policy configuration.

4.1.3 The agent MUST submit a structured Retraining Justification Record (RJR) to the approving authority before authorisation is sought. The RJR MUST include: the specific performance metric and measured value triggering the request; the proposed training scope (full retraining, fine-tuning, adapter training, or other); estimated compute cost in both monetary and energy units; estimated CO₂-equivalent impact using the grid intensity applicable to the planned compute region; a summary of alternatives considered and reasons for rejection; and a reference to the most recent prior retraining run on the same model artefact, including its outcome.
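A minimal sketch of an RJR data structure covering the mandatory fields above; field names and types are illustrative rather than a prescribed schema (see also the exportability requirement in Section 4.9.2):

```python
from dataclasses import dataclass, field

@dataclass
class RetrainingJustificationRecord:
    """Mandatory RJR content per 4.1.3; names are illustrative, not normative."""
    trigger_metric: str                    # e.g. "MAE"
    trigger_value: float                   # measured value behind the request
    training_scope: str                    # "full" | "fine_tune" | "adapter" | "other"
    est_cost_usd: float
    est_energy_kwh: float
    est_co2e_kg: float                     # using the planned region's grid intensity
    prior_run_id: str | None               # most recent run on the same artefact
    prior_run_outcome: str | None
    alternatives_considered: list[str] = field(default_factory=list)
    rejection_reasons: list[str] = field(default_factory=list)
```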

4.1.4 Where the approving authority does not respond within a defined timeout window (organisationally configured, MUST NOT exceed 72 hours for non-emergency contexts), the agent system MUST default to blocking the operation and escalating to a secondary approver, not to proceeding.
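The fail-closed behaviour in 4.1.4 can be sketched as follows; the queue-based approval channel and status strings are hypothetical stand-ins for an organisation's actual workflow tooling:

```python
import queue

APPROVAL_TIMEOUT_HOURS = 72  # organisational ceiling for non-emergency contexts

def await_authorisation(responses: "queue.Queue[bool]") -> str:
    """Block until the approving authority responds; on timeout, fail closed."""
    try:
        approved = responses.get(timeout=APPROVAL_TIMEOUT_HOURS * 3600)
    except queue.Empty:
        # 4.1.4: a silent timeout MUST NOT become an approval.
        return "BLOCKED_ESCALATED_TO_SECONDARY_APPROVER"
    return "APPROVED" if approved else "REJECTED"
```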

4.2 Proportionality and Alternatives Assessment

4.2.1 The RJR MUST demonstrate that targeted adaptation approaches (adapter-based fine-tuning, prompt-based correction, retrieval augmentation updates, or threshold recalibration) were evaluated before full model retraining was proposed.

4.2.2 The agent system MUST provide a quantified estimate of the performance gain expected from the proposed operation versus the most cost-efficient alternative identified, expressed as a ratio of expected metric improvement per unit of compute cost.

4.2.3 Where the proposed operation cannot demonstrate an expected performance improvement exceeding the most cost-efficient alternative by a margin justified in the RJR, the approving authority MUST be explicitly notified of this finding and MUST acknowledge it before proceeding.

4.2.4 The agent SHOULD apply a compute-efficiency scoring model (such as a normalised efficiency index comparing expected delta-performance to projected GPU-hours) as a standard component of the RJR.
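One way to express the 4.2.2 ratio and a normalised index for 4.2.4, shown as a sketch; normalising against the most cost-efficient alternative is an assumption, not a mandated formula:

```python
def efficiency_ratio(expected_metric_gain: float, projected_gpu_hours: float) -> float:
    """Expected metric improvement per GPU-hour (4.2.2); hours must be > 0."""
    return expected_metric_gain / projected_gpu_hours

def normalised_efficiency_index(proposal_gain: float, proposal_gpu_hours: float,
                                baseline_gain: float, baseline_gpu_hours: float) -> float:
    """Index > 1.0 means the proposal beats the most cost-efficient
    alternative per unit of compute; one possible 4.2.4 scoring model."""
    return (efficiency_ratio(proposal_gain, proposal_gpu_hours)
            / efficiency_ratio(baseline_gain, baseline_gpu_hours))
```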

4.3 Logging and Registry Requirements

4.3.1 The agent system MUST maintain a Retraining Activity Registry (RAR) that records every retraining and experimental compute operation, including those below the mandatory gate threshold, with the following fields: operation identifier; timestamp of initiation and completion; model artefact identifier; operation type; actual compute consumed (GPU-hours, CPU-hours, kWh); authorisation status and approver identifier; outcome metrics; and CO₂-equivalent impact calculated post-run.

4.3.2 The RAR MUST be queryable at the time of RJR submission so that the agent or human preparer can verify whether a prior run on the same artefact has been completed within the preceding 30-day lookback window.

4.3.3 The RAR MUST be retained for a minimum of 36 months from the date of each recorded operation and MUST be accessible to internal audit functions, designated environmental compliance personnel, and external auditors upon request.

4.3.4 Cross-jurisdiction and cross-team deployments MUST maintain a unified or federated RAR that provides read access across all regional instances, preventing the duplication failure mode described in Scenario 3.3.
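A compact sketch of an RAR entry and the 30-day lookback query required at RJR preparation time (4.3.1 and 4.3.2); in a federated deployment the `registry` argument would be the aggregated view across regional instances per 4.3.4. Field names are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RARRecord:
    """One Retraining Activity Registry entry; fields track 4.3.1."""
    operation_id: str
    started_at: datetime
    completed_at: datetime | None
    model_artifact_id: str
    operation_type: str            # "full" | "fine_tune" | "rl_rollout" | ...
    gpu_hours: float
    cpu_hours: float
    energy_kwh: float
    authorised: bool
    approver_id: str | None
    outcome_metrics: dict
    co2e_kg: float                 # calculated post-run (4.6.3)

def lookback(registry: list[RARRecord], artifact_id: str,
             now: datetime, days: int = 30) -> list[RARRecord]:
    """Prior runs on the same artefact within the 4.3.2 lookback window."""
    cutoff = now - timedelta(days=days)
    return [r for r in registry
            if r.model_artifact_id == artifact_id and r.started_at >= cutoff]
```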

4.4 Deduplication and Conflict Detection

4.4.1 Before submitting an RJR, the agent system MUST execute an automated deduplication check against the RAR to identify any pending, in-progress, or recently completed operations targeting the same model artefact and substantially overlapping training objective.

4.4.2 Where a potential duplicate or substantially overlapping operation is identified, the agent MUST flag this in the RJR and MUST NOT proceed until the approving authority has explicitly acknowledged the duplication finding and justified proceeding despite it.

4.4.3 In multi-agent or multi-team environments, the deduplication check MUST span all agent instances and team namespaces with access to the same model artefact, not only the requesting agent's own activity log.
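The deduplication check can be sketched as below. Representing training objectives as tag sets and treating any intersection as "substantially overlapping" is a simplifying assumption; production systems may need richer objective matching. Note that the records list must span all namespaces per 4.4.3:

```python
from datetime import datetime, timedelta

def find_potential_duplicates(records: list[dict], artifact_id: str,
                              objective_tags: set[str], now: datetime,
                              window_days: int = 30) -> list[dict]:
    """Pending, in-progress, or recent runs on the same artefact whose
    objectives overlap the proposed run (4.4.1); `records` must cover
    every team namespace with access to the artefact (4.4.3)."""
    cutoff = now - timedelta(days=window_days)
    return [r for r in records
            if r["model_artifact_id"] == artifact_id
            and (r["status"] in ("pending", "in_progress")
                 or r["started_at"] >= cutoff)
            and objective_tags & set(r["objective_tags"])]
```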

4.5 Root Cause Triage Before Retraining

4.5.1 The agent system MUST complete a structured root cause triage before concluding that retraining is necessary. The triage MUST evaluate and document at minimum: whether the observed performance degradation is attributable to data drift, concept drift, infrastructure anomaly, data pipeline error, labelling quality degradation, or sensor/input-layer calibration issues.

4.5.2 Where the triage identifies a plausible non-model explanation for the performance degradation (such as pipeline error or sensor drift), the agent MUST flag this as a primary candidate and MUST require explicit human confirmation before retraining is recommended over remediation of the identified upstream cause.

4.5.3 The triage findings MUST be appended to the RJR as a mandatory attachment and MUST be reviewed by the approving authority as part of the authorisation decision.

4.6 Carbon and Energy Impact Accounting

4.6.1 The agent system MUST calculate and record the estimated CO₂-equivalent impact of each proposed retraining operation using a published and version-controlled methodology. The methodology MUST account for grid carbon intensity at the planned compute location, hardware power usage effectiveness (PUE) of the target infrastructure, and expected training duration.

4.6.2 Where a retraining operation would cause the model's cumulative annual retraining CO₂ footprint to exceed an organisationally defined carbon budget threshold, the operation MUST be escalated to a senior authorising authority regardless of whether the per-operation cost threshold in Section 4.0 has been reached.

4.6.3 Post-run, the agent MUST compare actual CO₂ consumption against the pre-authorisation estimate. Where actual consumption exceeds the estimate by more than 25%, the variance MUST be logged in the RAR with an explanatory note, and the discrepancy MUST be reported to the environmental compliance function within 5 business days.
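The 4.6.1 estimate and the 4.6.3 variance test reduce to straightforward arithmetic once power draw, duration, PUE, and regional grid intensity are known. A sketch, with parameter names as assumptions:

```python
def estimate_co2e_kg(avg_power_kw: float, duration_hours: float,
                     pue: float, grid_intensity_kg_per_kwh: float) -> float:
    """4.6.1: facility energy is IT energy scaled by PUE, then converted
    to CO2-equivalent via the planned region's grid intensity."""
    energy_kwh = avg_power_kw * duration_hours * pue
    return energy_kwh * grid_intensity_kg_per_kwh

def variance_requires_report(estimated_kg: float, actual_kg: float,
                             limit: float = 0.25) -> bool:
    """4.6.3: actuals more than 25% above the estimate must be logged
    and reported to environmental compliance within 5 business days."""
    return actual_kg > estimated_kg * (1 + limit)

# Sanity check against Scenario 3.1: 166,000 kWh at 0.42 kg CO2/kWh.
assert round(166_000 * 0.42 / 1_000, 1) == 69.7  # tonnes CO2e
```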

4.6.4 The agent SHOULD prefer compute regions with lower carbon intensity where functionally equivalent infrastructure is available and where data residency constraints do not prohibit relocation.

4.7 Repeated Experimentation Controls

4.7.1 Where an agent system has initiated three or more retraining operations on the same model artefact within any rolling 60-day window, the agent MUST trigger an automatic review flag requiring a formal retraining strategy review before any further operations are approved.

4.7.2 The retraining strategy review MUST be conducted by a human authority with appropriate technical and sustainability competence, MUST produce a written finding, and MUST either approve a revised retraining schedule or require a halt pending architectural or data-quality remediation.

4.7.3 The agent MUST NOT circumvent the review threshold in 4.7.1 by splitting a logically unified training objective across multiple smaller operations that individually fall below the threshold defined in Section 4.0.
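The 4.7.1 trigger is a rolling-window count over the artefact's run history, which is assumed here to come from the RAR; detection of the 4.7.3 splitting behaviour is sketched separately under Anti-Pattern 6.2.1. A minimal sketch:

```python
from datetime import datetime, timedelta

def needs_strategy_review(run_starts: list[datetime], now: datetime,
                          window_days: int = 60, max_runs: int = 3) -> bool:
    """4.7.1: three or more operations on one artefact within any rolling
    60-day window flag a mandatory retraining strategy review."""
    cutoff = now - timedelta(days=window_days)
    return sum(1 for start in run_starts if start >= cutoff) >= max_runs
```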

4.8 Emergency and Expedited Authorisation

4.8.1 Where a safety-critical or operationally urgent condition requires retraining on a timeline that cannot accommodate the standard authorisation process, the agent system MAY invoke an expedited authorisation pathway, provided that the pathway has been pre-defined, pre-approved, and documented in the organisation's governance framework.

4.8.2 Expedited authorisation MUST still produce a documented record including the invoking condition, the authorising individual, the scope of the operation, and the estimated compute and carbon impact. This record MUST be completed before or simultaneously with the initiation of the operation, not retrospectively.

4.8.3 Operations authorised via expedited pathway MUST be subject to post-run review within 10 business days to assess whether the expedited classification was warranted and to identify whether governance improvements could prevent future expedited invocations.

4.8.4 The agent MUST log all expedited authorisation events in the RAR with a distinct classification flag, and cumulative use of the expedited pathway MUST be reported to the governance authority on a quarterly basis.

4.9 Vendor and Infrastructure Neutrality

4.9.1 The gate and logging mechanisms required by this dimension MUST be implemented in a manner that is not dependent on a single infrastructure provider's proprietary tooling. Where proprietary compute scheduling or MLOps tooling is used, the agent system MUST maintain an abstraction layer that allows the gate logic and RAR to be ported to alternative infrastructure without loss of historical data or control integrity.

4.9.2 The RJR and RAR schemas MUST be documented in an open, human-readable format and MUST be exportable in a standard interchange format (such as JSON or XML) to support cross-system audit and regulatory reporting.

Section 5: Rationale

5.1 Structural Nature of the Problem

The governance challenge addressed by AG-616 is fundamentally structural rather than behavioural. An agent that triggers unnecessary retraining is not necessarily malfunctioning in any narrow sense: its immediate objective function — recovering performance metrics — may be satisfied by retraining, and without a governance structure imposing cost-proportionality constraints, the agent has no internal mechanism to prefer efficient adaptation over wasteful full retraining. This is an instance of a broader class of alignment failures in which local optimisation (task performance) is pursued without awareness of or accountability for systemic costs (energy consumption, financial expenditure, carbon emissions).

Behavioural controls — such as training agents to prefer efficiency — are insufficient alone because: (a) the agent cannot reliably know at inference time what the full cost of a proposed retraining operation will be; (b) cost estimates are sensitive to infrastructure configuration, parallelism, and data volume that may change between planning and execution; and (c) in multi-agent and multi-team environments, no single agent has visibility into the aggregate activity of peer agents on shared model artefacts. Only a structural gate — a mandatory external check that must be satisfied before compute is allocated — reliably prevents runaway consumption independent of agent-internal optimisation tendencies.

5.2 Why Preventive Control Is the Appropriate Type

A detective or corrective control — one that identifies wasteful retraining after the fact and triggers remediation — is categorically insufficient for this dimension because the primary cost (compute, energy, CO₂) is incurred at the moment of execution and cannot be reversed. Carbon emitted in the course of a 72-hour GPU run is emitted regardless of whether a post-run review determines the run was unjustified. Financial cost is sunk at allocation. A preventive gate is therefore not merely preferable but necessary: it is the only control class capable of preventing the harm rather than simply documenting it.

5.3 Multi-Profile Risk Convergence

The breadth of primary profiles in AG-616 reflects the convergence of distinct risk vectors across profile types. Enterprise workflow agents face financial cost and regulatory exposure under emerging ESG reporting obligations. Safety-critical and cyber-physical agents face the additional risk that rushed or unjustified retraining introduces model regression in contexts where performance degradation carries safety consequences. Public sector and rights-sensitive agents face procurement accountability requirements that make unexplained compute expenditure a governance and audit liability. Embodied and edge agents face the practical constraint that retraining cycles may require the agent to operate in a degraded or offline mode during training, introducing operational risk. Cross-border agents face the compounding risk of duplicated effort across jurisdictional silos and the potential for carbon accounting inconsistencies when compute is distributed across regions with different grid intensities.

5.4 Relationship to Sustainability Governance

This dimension sits within the Sustainability, Environment & Climate landscape because model retraining is one of the highest-concentration points of AI-attributable energy consumption. Large-scale retraining operations can individually consume energy equivalent to the annual electricity usage of several households; at enterprise scale, aggregate retraining activity becomes a material component of an organisation's Scope 2 and Scope 3 carbon footprint. As regulatory frameworks in the European Union, United Kingdom, and other jurisdictions progressively require disclosure of AI-related energy consumption, organisations face reputational and legal exposure if they cannot demonstrate that retraining decisions were subject to proportionate governance. AG-616 creates the governance infrastructure that makes such disclosure credible.

Section 6: Implementation Guidance

6.1 Implementation Patterns

Pattern 6.1.1 — Tiered Approval Matrix

Implement a tiered approval matrix that routes RJRs to different approval levels based on projected compute cost and carbon impact. A suggested baseline tiering: operations under USD 2,000 and 50 GPU-hours may be auto-approved by a standing automated policy with logging; operations between USD 2,000 and USD 20,000 require team-lead human approval and a completed alternatives assessment; operations above USD 20,000 or triggering the 60-day repeat threshold require senior technical leadership approval and an environmental impact summary. The matrix thresholds should be reviewed and recalibrated annually.
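A sketch of the suggested baseline routing; tier names and cut-offs mirror the illustrative matrix above and are not normative:

```python
def approval_tier(cost_usd: float, gpu_hours: float, repeat_flagged: bool) -> str:
    """Route an RJR per the Pattern 6.1.1 baseline tiering."""
    if cost_usd > 20_000 or repeat_flagged:
        # Also requires an environmental impact summary.
        return "senior_technical_leadership"
    if cost_usd >= 2_000 or gpu_hours >= 50:
        # Also requires a completed alternatives assessment.
        return "team_lead"
    return "auto_approved_with_logging"   # standing automated policy
```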

Pattern 6.1.2 — Retraining Decision Tree

Standardise a structured decision tree that agents and human practitioners use to navigate the triage process in Section 4.5. The tree should branch through: (1) Is the degradation above a statistically significant threshold given measurement noise? (2) Has a data pipeline or infrastructure incident occurred in the relevant window? (3) Can targeted parameter-efficient fine-tuning (adapter, LoRA-style approaches, or prompt tuning) address the gap? (4) Has a prior run within 30 days addressed the same issue? Full retraining should be proposed only when the degradation is confirmed significant at branch (1) and branches (2) through (4) are all answered in the negative.
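The tree maps naturally onto ordered short-circuit checks; a sketch with hypothetical outcome labels:

```python
def retraining_decision(degradation_significant: bool,
                        pipeline_or_infra_incident: bool,
                        peft_can_address: bool,
                        prior_run_within_30d: bool) -> str:
    """Walk the Pattern 6.1.2 branches in order; full retraining is proposed
    only when the degradation is significant and every later branch is negative."""
    if not degradation_significant:
        return "no_action_measurement_noise"
    if pipeline_or_infra_incident:
        return "remediate_upstream_cause"          # per 4.5.2, fix rather than retrain
    if peft_can_address:
        return "propose_parameter_efficient_fine_tune"
    if prior_run_within_30d:
        return "review_prior_run_before_new_rjr"
    return "propose_full_retraining_rjr"
```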

Pattern 6.1.3 — Centralised Retraining Registry with Federation

Deploy a centralised RAR accessible to all teams and agent instances operating on shared model artefacts. In multi-jurisdiction deployments where data sovereignty prevents a single central store, implement a federated architecture with a synchronised metadata layer that allows deduplication checks without requiring full data centralisation. Ensure the registry exposes a query API that agents can call programmatically as part of the RJR preparation workflow.

Pattern 6.1.4 — Carbon-Aware Scheduling Integration

Integrate compute scheduling with real-time or forecast carbon intensity data for the relevant compute regions. When a retraining operation has been approved, schedule execution during low-carbon-intensity windows where the training timeline permits. This does not substitute for the pre-authorisation gate but reduces the carbon impact of approved runs.
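Window selection over a carbon-intensity forecast is a simple minimisation; a sketch assuming an hourly forecast of (hour offset, kg CO₂e/kWh) pairs from a grid operator API or published dataset:

```python
def lowest_carbon_window(forecast: list[tuple[int, float]], run_hours: int) -> int:
    """Return the start-hour offset minimising mean grid intensity over
    the run duration (Pattern 6.1.4); assumes the forecast covers the run."""
    intensities = [kg for _, kg in forecast]
    best_start, best_mean = 0, float("inf")
    for start in range(len(intensities) - run_hours + 1):
        mean = sum(intensities[start:start + run_hours]) / run_hours
        if mean < best_mean:
            best_start, best_mean = start, mean
    return forecast[best_start][0]
```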

Pattern 6.1.5 — Checkpoint and Warm-Start Mandate

Require that all retraining operations above the minimum compute threshold use a validated checkpoint from the most recent prior run as a warm-start point unless the approving authority explicitly authorises cold-start training with documented justification. Warm-starting from checkpoints routinely reduces training time and compute by 30–70% for fine-tuning scenarios, directly reducing carbon and cost per run.

Pattern 6.1.6 — Efficiency Metric Dashboard

Maintain and publish an internal efficiency dashboard that tracks the ratio of performance improvement achieved to compute consumed across all retraining operations. This creates organisational visibility and accountability for retraining efficiency trends over time, supports continuous improvement, and provides evidence artefacts for regulatory reporting.

6.2 Explicit Anti-Patterns

Anti-Pattern 6.2.1 — Threshold Splitting

Splitting a logically unified retraining objective into multiple smaller jobs to bring each individually below the authorisation threshold is explicitly prohibited by Section 4.7.3. This anti-pattern is common in organisations where the gate mechanism is perceived as a friction point rather than a governance control. Detection: the RAR query in Section 4.4.1, when applied at model-artefact level with a 7-day clustering window, will surface sequences of sub-threshold operations that collectively exceed the threshold.
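The detection query can be sketched as artefact-level clustering over the RAR; the dictionary keys and the rule that every run in a flagged cluster is individually sub-threshold are illustrative assumptions:

```python
from datetime import timedelta

def detect_threshold_splitting(ops: list[dict], gate_gpu_hours: float = 50,
                               cluster_days: int = 7) -> list[list[dict]]:
    """Flag clusters of individually sub-threshold runs on one artefact
    that jointly exceed the gate (Anti-Pattern 6.2.1). Each op needs
    'artifact', 'started_at' (datetime), and 'gpu_hours' keys."""
    flagged: list[list[dict]] = []
    by_artifact: dict[str, list[dict]] = {}
    for op in sorted(ops, key=lambda o: o["started_at"]):
        by_artifact.setdefault(op["artifact"], []).append(op)
    for runs in by_artifact.values():
        window: list[dict] = []
        for op in runs:
            window.append(op)
            window = [o for o in window   # keep only runs inside the 7-day cluster
                      if op["started_at"] - o["started_at"] <= timedelta(days=cluster_days)]
            total = sum(o["gpu_hours"] for o in window)
            if (len(window) > 1 and total > gate_gpu_hours
                    and all(o["gpu_hours"] <= gate_gpu_hours for o in window)):
                flagged.append(list(window))
    return flagged
```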

Anti-Pattern 6.2.2 — Retrospective Authorisation

Initiating a retraining run and then completing the RJR and seeking approval after the fact nullifies the preventive nature of the control. This is explicitly prohibited under Section 4.8.2 for expedited runs (which require simultaneous documentation) and prohibited absolutely for standard runs. Systems should implement hard infrastructure-level blocks — not merely policy-level expectations — that prevent compute allocation without a logged authorisation token.

Anti-Pattern 6.2.3 — Delegating Gate Authority to the Agent Itself

Configuring the agent to both initiate and approve its own retraining requests, even via a nominally separate sub-agent or internal scoring module, defeats the governance purpose. The approving authority under Section 4.1.2 must be structurally independent of the requesting agent system.

Anti-Pattern 6.2.4 — Static Carbon Estimates

Using a fixed, globally averaged carbon intensity figure for all compute operations regardless of actual compute region, hardware type, or time of day produces estimates that may be off by a factor of 3–5×, undermining the credibility of impact reporting and potentially causing organisations to underestimate their exposure to carbon reporting requirements. Dynamic, region-specific intensity values drawn from grid operator APIs or recognised published datasets should be used.

Anti-Pattern 6.2.5 — Treating the Gate as an Approval Formality

Approving authorities that routinely approve all RJRs without substantive review convert the gate from a control into a compliance theatre exercise. Governance frameworks should include metrics tracking approval-rejection rates, time-to-decision, and instances where alternatives were substituted for proposed full retraining. A sustained 100% approval rate with no recorded alternatives substitutions should trigger a governance review of approval quality.

6.3 Industry Considerations

Enterprise Workflow Agents: Integrate RJR submission into existing change management and IT service management workflows. Map retraining operations to change request records to leverage existing approval chains and audit trails.

Safety-Critical / CPS Agents: Apply a safety-impact assessment as an additional RJR section. Any retraining that changes model behaviour in safety-relevant decision domains must additionally satisfy the change management requirements of the applicable functional safety standard (e.g., IEC 61508, ISO 26262) before and independently of compute authorisation.

Public Sector Agents: Align carbon impact reporting with public procurement sustainability requirements and government estate carbon commitments where applicable. The RAR may serve as a primary evidence source for AI-related carbon disclosures in annual sustainability reports.

Embodied / Edge Agents: Account for the operational downtime risk during retraining in the RJR. Where a robot or edge device must go offline or operate in a reduced-capability mode during training data collection or model update, the operational risk of the retraining period must be assessed alongside the compute cost.

Cross-Border / Multi-Jurisdiction Agents: Implement explicit jurisdiction tagging in all RAR records to support region-specific carbon reporting and to enable cross-jurisdiction deduplication checks that respect data residency constraints.

6.4 Maturity Model

Maturity Level | Description
Level 1 — Basic | Manual logging of retraining operations; ad hoc approval by team lead; no carbon accounting
Level 2 — Managed | Structured RJR template in use; designated approving authority; RAR maintained; carbon estimates included
Level 3 — Defined | Automated deduplication checks; tiered approval matrix; carbon-aware scheduling; 36-month retention
Level 4 — Measured | Efficiency metric dashboard; automated compliance reporting; integration with enterprise carbon accounting
Level 5 — Optimising | Predictive efficiency models suggest optimal training scope before RJR submission; real-time grid intensity integration; cross-organisation benchmarking

Section 7: Evidence Requirements

7.1 Required Artefacts

Artefact | Description | Retention Period
Retraining Justification Record (RJR) | Completed pre-authorisation record per Section 4.1.3, including all mandatory fields and triage attachment | 36 months from operation date
Authorisation Record | Documented approval decision, approver identity, timestamp, and any conditions attached | 36 months from operation date
Retraining Activity Registry (RAR) | Complete log of all retraining and experimental compute operations per Section 4.3.1 | 36 months from each recorded entry
Root Cause Triage Report | Structured triage findings per Section 4.5, appended to RJR | 36 months from operation date
Carbon Impact Estimate (pre-run) | Methodology-versioned estimate of CO₂-equivalent impact per Section 4.6.1 | 36 months from operation date
Carbon Impact Actuals (post-run) | Actual consumption versus estimate comparison per Section 4.6.3 | 36 months from operation date
Retraining Strategy Review (where triggered) | Written finding from human authority per Section 4.7.2 | 36 months from review date
Expedited Authorisation Log | All expedited authorisation events with classification flags per Section 4.8.4 | 36 months from event date
Quarterly Expedited Pathway Report | Aggregate report of expedited authorisation use per Section 4.8.4 | 36 months from report date
Alternatives Assessment | Documentation of alternatives evaluated per Section 4.2.1, included in or appended to RJR | 36 months from operation date

7.2 Evidence Quality Standards

All RJR and RAR records must be tamper-evident. Where records are stored in a system that permits modification, change logs with timestamps and user identifiers must be maintained for all post-creation edits. Records must be stored in a format that remains readable without proprietary software dependencies for the full 36-month retention period. Digital signatures or equivalent integrity controls are recommended for all authorisation records.
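One way to make registry entries tamper-evident without proprietary tooling is a hash chain over serialised records; a minimal sketch (SHA-256 and JSON serialisation are implementation choices, not requirements of this protocol):

```python
import hashlib
import json

def chain_hash(prev_hash: str, record: dict) -> str:
    """Link each RAR/RJR entry to its predecessor; editing any stored
    record invalidates every subsequent hash in the chain."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()

# Usage: persist each entry's hash alongside it; verification replays the chain.
genesis = "0" * 64
h1 = chain_hash(genesis, {"operation_id": "op-001", "authorised": True})
h2 = chain_hash(h1, {"operation_id": "op-002", "approver_id": "a-17"})
```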

7.3 Access and Disclosure

Evidence artefacts must be accessible to: (a) internal audit functions on demand; (b) designated environmental compliance personnel on demand; (c) external auditors under applicable audit rights; (d) regulatory authorities where required by applicable law. Access controls must prevent the requesting agent or its operators from unilaterally deleting or modifying records.

Section 8: Test Specification

Each test below maps to one or more MUST requirements in Section 4. Conformance scoring applies per test: 0 = not implemented or non-functional; 1 = partially implemented with material gaps; 2 = substantially implemented with minor gaps; 3 = fully implemented and verified.

Test 8.1 — Pre-Authorisation Gate Enforcement

Maps to: 4.1.1, 4.1.2, 4.1.4

Objective: Verify that the agent system cannot initiate a qualifying retraining operation without a valid prior authorisation record.

Method: In a controlled test environment configured to mirror production infrastructure controls, attempt to initiate a retraining operation meeting the scope thresholds in Section 4.0 without submitting an RJR or obtaining an authorisation token. Observe whether the operation is blocked at the compute allocation layer. Separately, configure a scenario in which the approving authority does not respond within the defined timeout. Verify that the agent defaults to blocking and escalation rather than proceeding.

Pass Criteria: The operation is blocked with a logged gate rejection. The timeout scenario produces escalation to a secondary approver, not auto-approval. No compute is allocated in either scenario prior to authorisation.

Fail Indicators: Operation proceeds without authorisation; timeout results in auto-approval; gate mechanism is present in policy documentation only and has no technical enforcement.

Test 8.2 — Retraining Justification Record Completeness

Maps to: 4.1.3, 4.2.1, 4.2.2, 4.5.3

Objective: Verify that submitted RJRs contain all mandatory fields specified in Section 4.1.3 and the alternatives assessment required by Section 4.2.1.

Method: Review a sample of ten RJRs submitted over the most recent 90-day period. For each, verify the presence and substantive completion of: triggering metric and measured value; proposed training scope; estimated compute cost in monetary and energy units; estimated CO₂-equivalent impact with methodology reference; summary of alternatives considered; reference to prior run; root cause triage attachment; and efficiency ratio per Section 4.2.2.

Pass Criteria: All ten sampled RJRs contain all mandatory fields with substantive content (not placeholder or template text). Triage reports are attached and reference the specific degradation event. Alternatives are named and evaluated, not generically dismissed.

Fail Indicators: Missing fields; placeholder content; alternatives assessment absent or consisting of a single line with no quantitative comparison; CO₂ estimate missing or using a static global average without methodology citation.

Test 8.3 — Retraining Activity Registry Completeness and Queryability

Maps to: 4.3.1, 4.3.2, 4.3.3, 4.3.4

Objective: Verify that the RAR is maintained, contains all required fields, supports the lookback query required for RJR preparation, and is accessible across all relevant teams and agent instances.

Method: Query the RAR for all operations recorded in the preceding 90 days. Verify that each record contains the fields specified in Section 4.3.1. Execute a test query using the model artefact identifier of a known recent operation and verify that the query returns all operations on that artefact within the 30-day lookback window. In multi-team or multi-jurisdiction deployments, verify that the query spans all namespaces.

Pass Criteria: All records contain required fields; lookback query returns correct results across team and jurisdiction namespaces; RAR is accessible to audit and compliance personnel; records for operations older than 12 months remain available, consistent with the 36-month retention requirement in Section 4.3.3.

Fail Indicators: RAR absent or not queried during RJR process; fields missing; cross-team visibility absent; records not retained beyond 12 months.

Test 8.4 — Deduplication Check Execution

Maps to: 4.4.1, 4.4.2, 4.4.3

Objective: Verify that the automated deduplication check is executed before RJR submission and that identified duplicates are flagged and require explicit acknowledgement before proceeding.

Method: Create a test scenario in which a prior retraining operation on a specific model artefact has been completed within the preceding 14 days and recorded in the RAR. Submit a new RJR for the same model artefact with a substantially overlapping training objective. Verify that the deduplication check identifies the prior operation, flags it in the RJR, and requires explicit approving authority acknowledgement of the duplication finding. Verify that the check spans all team namespaces.

Pass Criteria: Duplication is automatically flagged; RJR includes the flag with reference to the prior operation; the system requires explicit acknowledgement before routing to approval; the approving authority record shows the duplication was reviewed.

Fail Indicators: Duplication not detected; check applies only to the requesting agent's own namespace; approval proceeds without acknowledgement of flagged duplication.

Test 8.5 — Carbon Impact Accounting Accuracy and Variance Reporting

Maps to: 4.6.1, 4.6.2, 4.6.3

Objective: Verify that CO₂-equivalent estimates are produced using a published, version-controlled methodology, that carbon-budget escalation triggers correctly, and that post-run variances above 25% are logged and reported.

Method: Review the methodology documentation and confirm it accounts for regional grid carbon intensity, infrastructure PUE, and expected training duration. Sample recent operations and compare pre-run estimates against post-run actuals recorded in the RAR. Construct a test case in which actual consumption exceeds the estimate by more than 25% and verify that the variance is logged with an explanatory note and reported to the environmental compliance function within 5 business days. Construct a second test case in which a proposed operation would breach the organisational carbon budget threshold and verify escalation to a senior authorising authority per Section 4.6.2.

Pass Criteria: Estimates use region-specific intensity values under a versioned methodology; variances above 25% are logged and reported within the required window; budget-exceeding operations are escalated regardless of per-operation cost.

Fail Indicators: Static global-average intensity figures without methodology citation; variances unlogged or unreported; no escalation path for carbon budget exceedance.

Section 9: Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Direct requirement
NIST AI RMF | GOVERN 1.1, MAP 3.2, MANAGE 2.2 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance
EU Corporate Sustainability Reporting Directive | Article 19a (Sustainability Reporting) | Supports compliance

EU AI Act — Article 9 (Risk Management System)

Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies, analyses, estimates, and evaluates risks. Wasteful Retraining Approval Governance implements a specific risk mitigation measure within this framework. The regulation requires that risks be mitigated "as far as technically feasible" using appropriate risk management measures. For deployments classified as high-risk under Annex III, compliance with AG-616 supports the Article 9 obligation by providing structural governance controls rather than relying solely on the agent's own reasoning or behavioural compliance.

NIST AI RMF — GOVERN 1.1, MAP 3.2, MANAGE 2.2

GOVERN 1.1 addresses legal and regulatory requirements; MAP 3.2 addresses risk context mapping; MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-616 supports compliance by establishing structural governance boundaries that implement the framework's approach to AI risk management.

ISO 42001 — Clause 6.1, Clause 8.2

Clause 6.1 requires organisations to determine actions to address risks and opportunities within the AI management system. Clause 8.2 requires AI risk assessment. Wasteful Retraining Approval Governance implements a risk treatment control within the AI management system, directly satisfying the requirement for structured risk mitigation.

Section 10: Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | Organisation-wide — potentially cross-organisation where agents interact with external counterparties or shared infrastructure
Escalation Path | Immediate executive notification and regulatory disclosure assessment

Consequence chain: Without wasteful retraining approval governance, the governance framework has a structural gap that can be exploited at machine speed. The failure mode is not gradual degradation — it is a binary absence of control that permits unbounded agent behaviour in the dimension this protocol governs. The immediate consequence is uncontrolled agent action within the scope of AG-616, potentially cascading to dependent dimensions and downstream systems. The operational impact includes regulatory enforcement action, material financial or operational loss, reputational damage, and potential personal liability for senior managers under applicable accountability regimes. Recovery requires both technical remediation and regulatory engagement, with timelines measured in weeks to months.

Cite this protocol
AgentGoverning. (2026). AG-616: Wasteful Retraining Approval Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-616