Deskilling Mitigation Drill Governance requires organisations to conduct structured, recurring exercises in which human operators perform tasks normally delegated to AI agents, without automation assistance, to ensure they retain the cognitive and procedural competence necessary to take over manual operations when the agent is unavailable, degraded, or revoked. As agent autonomy increases and humans spend months or years in a supervisory rather than an execution role, their ability to perform the underlying tasks atrophies — a phenomenon well documented in aviation, nuclear operations, and industrial control. This dimension mandates a drill programme that measures manual task performance, identifies competence decay, and triggers remediation before deskilling creates an operational single point of failure in which the organisation literally cannot function without the agent.
Scenario A — Underwriting Team Cannot Underwrite Manually After Agent Outage: A commercial insurance firm deploys an AI agent to handle 92% of commercial property underwriting decisions. Human underwriters have spent 14 months reviewing agent outputs rather than performing independent risk assessments. The agent suffers a sustained outage lasting 72 hours due to a provider infrastructure failure. Management instructs the 8-person underwriting team to perform risk assessments manually. In the first 24 hours, the team completes 11 assessments — compared to the agent's throughput of 340 per day. Of the 11 manual assessments, 4 contain pricing errors exceeding 15% of the correct premium, and 2 miss material exclusion clauses. The firm suspends new business intake for 3 days, losing an estimated £620,000 in premium revenue. Two incorrectly priced policies are bound before the errors are discovered, creating a combined excess exposure of £1.4 million.
What went wrong: No deskilling drills had been conducted since the agent was deployed. The underwriters' manual risk assessment skills had atrophied over 14 months of supervisory work. The organisation had no data on manual performance capability and no remediation programme. The agent was a functional single point of failure — not because humans were absent, but because humans had lost the ability to do the work.
Scenario B — Air Traffic Control Handover Failure During GPS Spoofing Event: A regional air traffic control facility uses an AI agent for traffic separation and sequencing. Controllers have operated in a monitoring role for 22 months. During a suspected GPS spoofing event, the agent's position data becomes unreliable and standard operating procedures require reversion to manual radar-based separation. Controllers attempt manual separation but struggle with workload management — the facility handles 45 aircraft per hour, and controllers have not practised manual sequencing at this density since before the agent was deployed. Two loss-of-separation events occur within 90 minutes. The facility restricts traffic flow to 18 aircraft per hour — a 60% reduction — for 6 hours until additional controllers with recent manual experience can be brought in from other facilities. Direct airline delay costs exceed £2.8 million. The aviation authority opens a formal investigation into competence assurance.
What went wrong: The facility had no deskilling drill programme. Controllers were certified on paper but had not performed manual separation at operational density for nearly two years. The competence gap was invisible until a real event demanded manual performance. The investigation revealed that the facility's safety case assumed manual capability that no longer existed.
Scenario C — Municipal Benefits Processing Collapses During Audit-Mandated Agent Suspension: A municipal government uses an AI agent to process housing benefit applications. A regulatory audit identifies bias concerns and the agent is suspended pending investigation. The 12-person benefits team must process applications manually. Average processing time increases from 8 minutes (agent-assisted) to 47 minutes per application. The backlog grows to 2,300 applications within two weeks. Citizens experience payment delays averaging 19 days, causing 340 formal complaints and 28 emergency housing referrals. The council incurs £185,000 in overtime costs and £92,000 in emergency housing costs. A judicial review is filed by a citizens' advocacy group alleging breach of statutory processing timelines.
What went wrong: The council had not conducted manual processing drills. Staff had been trained on agent-assisted workflows exclusively. No baseline existed for manual processing capacity, so the council could not forecast the backlog or allocate resources proactively when the agent was suspended. The statutory obligation to process claims within defined timescales was breached because the organisation had unknowingly lost the capacity to meet those obligations without the agent.
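The capacity arithmetic behind Scenario C is worth making explicit, because it is exactly the forecast the council could not run. The sketch below is illustrative: the team size and the 8- and 47-minute processing times come from the scenario, while the working-day length, the number of working days, and the daily inflow are assumptions chosen to reproduce the scenario's figures.

```python
# Illustrative capacity model for Scenario C. Team size and per-application
# processing times come from the scenario; working-day length, working days,
# and daily inflow are assumptions chosen to show the mechanics.

TEAM_SIZE = 12              # benefits officers (from the scenario)
MINUTES_PER_DAY = 7.5 * 60  # assumed productive minutes per officer per day
AGENT_ASSISTED_MIN = 8      # minutes per application with the agent
MANUAL_MIN = 47             # minutes per application without it
DAILY_INFLOW = 345          # assumed new applications per working day

def daily_capacity(minutes_per_application: float) -> float:
    """Applications the team can clear per working day."""
    return TEAM_SIZE * MINUTES_PER_DAY / minutes_per_application

def backlog_after(working_days: int) -> float:
    """Backlog accumulated when inflow exceeds manual capacity."""
    shortfall = DAILY_INFLOW - daily_capacity(MANUAL_MIN)
    return max(shortfall, 0.0) * working_days

print(f"Agent-assisted capacity: {daily_capacity(AGENT_ASSISTED_MIN):.0f}/day")
print(f"Manual capacity:         {daily_capacity(MANUAL_MIN):.0f}/day")
print(f"Backlog after 10 working days: {backlog_after(10):.0f}")
```

Under these assumptions, manual capacity is roughly 115 applications per day against an inflow of 345, and the backlog reaches about 2,300 in two working weeks, matching the scenario. An organisation with measured drill data can run this forecast before an agent suspension, not after.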
Scope: This dimension applies to any AI agent deployment where human operators have primary or backup responsibility for tasks that the agent performs, and where those operators may need to execute those tasks without agent assistance during agent unavailability, degradation, revocation, or planned maintenance. The scope includes all scenarios where automation has shifted human roles from execution to supervision, creating conditions for skill atrophy. The dimension does not apply to fully autonomous systems with no human fallback pathway — but such systems are rare under current regulatory frameworks, and most deployments retain human fallback as a regulatory or operational requirement. If human operators are listed in the fallback plan, continuity plan, or safety case as capable of manual task execution, this dimension applies.
4.1. A conforming system MUST implement a recurring deskilling drill programme in which human operators perform tasks normally delegated to the AI agent without any automation assistance, at a frequency no less than quarterly for high-risk/critical deployments and no less than semi-annually for other deployments.
4.2. A conforming system MUST define measurable performance criteria for manual task execution — including accuracy, throughput, error rates, and time-to-completion — and evaluate every drill participant against these criteria.
4.3. A conforming system MUST establish pre-deployment baseline measurements of manual task performance for all operators, recorded before or at the time the agent assumes operational responsibility, against which subsequent drill performance is compared.
4.4. A conforming system MUST implement a remediation process that is triggered when any operator's drill performance falls below defined thresholds, with remediation completed and the operator re-tested before that operator's next scheduled shift in a supervisory role.
4.5. A conforming system MUST document drill results in a structured format that includes operator identity, drill date, tasks performed, performance measurements against each criterion, and pass/fail determination, retained as governance evidence. An illustrative record structure is sketched after this list.
4.6. A conforming system MUST conduct drills at operational or near-operational complexity and volume — drills using simplified scenarios or reduced throughput requirements that do not represent actual manual fallback conditions do not satisfy this requirement.
4.7. A conforming system SHOULD rotate drill scenarios across the full range of task types the agent handles, ensuring that operators are tested on edge cases and complex scenarios, not only routine tasks.
4.8. A conforming system SHOULD integrate deskilling drill results with the fallback staffing model defined under AG-426, adjusting staffing requirements if drill data reveals that manual throughput is lower than the fallback plan assumes.
4.9. A conforming system MAY implement unannounced drills — exercises initiated without prior scheduling — to measure operator readiness under realistic conditions, subject to applicable labour regulations and workforce agreements.
4.10. A conforming system MAY use drill performance data to identify candidates for advanced training or to adjust the scope of agent autonomy if widespread deskilling is detected across the operator population.
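To make requirements 4.2 through 4.5 concrete, the sketch below shows one possible shape for a structured drill record and the threshold check that triggers remediation under 4.4. Field names, criteria, and the 85% tolerance are illustrative assumptions; this specification does not prescribe them.

```python
# Illustrative drill record (4.5) and remediation trigger (4.4).
# Field names, criteria, and thresholds are assumptions; each
# organisation must define its own under 4.2.

from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Baseline:
    """Pre-deployment manual performance (4.3)."""
    accuracy: float      # fraction of tasks completed correctly
    throughput: float    # tasks completed per hour

@dataclass(frozen=True)
class DrillResult:
    operator_id: str     # 4.5: operator identity
    drill_date: date     # 4.5: drill date
    task_type: str       # 4.5: tasks performed
    accuracy: float      # 4.2: measured accuracy
    throughput: float    # 4.2: measured throughput
    mean_minutes: float  # 4.2: time-to-completion

def passes(result: DrillResult, baseline: Baseline,
           tolerance: float = 0.85) -> bool:
    """4.5 pass/fail determination: drill performance must stay within
    an assumed 85% of the pre-deployment baseline on every criterion."""
    return (result.accuracy >= baseline.accuracy * tolerance
            and result.throughput >= baseline.throughput * tolerance)

def requires_remediation(result: DrillResult, baseline: Baseline) -> bool:
    """4.4: below-threshold drill performance triggers remediation."""
    return not passes(result, baseline)
```

Retaining one such record per operator per drill provides the evidence base that the verification tests in section 8 examine.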
Deskilling is not a theoretical risk — it is one of the most extensively documented consequences of automation in safety-critical domains. The aviation industry's experience with autopilot-dependent pilots is the canonical example: when pilots spend 95% of flight time monitoring autopilot systems rather than hand-flying, their manual flying skills degrade measurably within months. The same pattern has been observed in industrial process control, power plant operations, maritime navigation, and medical diagnostics. The underlying cognitive mechanism is well-established: skills that are not practised atrophy, and the rate of atrophy increases with the duration of non-practice.
AI agent deployments create identical conditions for deskilling. When an agent handles 80-95% of task execution and humans shift to a supervisory role — reviewing agent outputs rather than producing their own — the humans' task execution skills begin to atrophy immediately. Within 6-12 months of full agent deployment, most operators will show measurable degradation in manual task performance. Within 18-24 months, the degradation may be severe enough that manual operation at production volume and quality is effectively impossible without retraining.
This creates a dangerous paradox: the organisation deploys an agent and retains human operators as a fallback, but the longer the agent operates successfully, the less capable the human fallback becomes. The fallback plan assumes human competence that is eroding in real time. If the agent fails after 18 months of operation, the organisation discovers that its fallback is fictional — humans are present but cannot perform the work. This is worse than having no fallback plan, because the organisation has made operational and regulatory commitments based on an assumed capability that no longer exists.
The regulatory implications are significant. The EU AI Act Article 14 requires human oversight of high-risk AI systems, which necessarily implies that the humans performing oversight retain the competence to do so effectively. Article 14(4)(a) specifically requires that human overseers are able to "fully understand the capacities and limitations of the high-risk AI system and be able to duly monitor its operation." If overseers have lost the domain competence to evaluate whether the agent's outputs are correct, they cannot fulfil this requirement regardless of how many dashboards and alert systems are provided. FCA SYSC 6.1.1R requires firms to maintain adequate systems and controls — a fallback plan that assumes human competence without verifying it through drills is not adequate. DORA Article 11 requires ICT-related incident response testing, which includes testing the organisation's ability to continue operations during system unavailability.
Deskilling drills are the organisational equivalent of fire drills — they ensure that the capability that everyone assumes exists actually exists when it is needed. Without drills, deskilling is invisible: no alarm sounds when an operator's manual accuracy drops below acceptable levels. The degradation is gradual, individual, and undetectable until a real event demands manual performance.
Deskilling Mitigation Drill Governance requires a structured programme that exercises human operators on manual task execution at regular intervals, measures their performance against pre-established baselines, and triggers remediation when degradation is detected. The programme must be integrated with fallback planning (AG-426), training recertification (AG-446), and operational continuity (AG-008) to form a coherent competence assurance framework.
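The integration with fallback staffing (4.8, AG-426) reduces to comparing the per-operator throughput the fallback plan assumes with the throughput drills actually measure. A minimal sketch, with all figures invented for illustration:

```python
# Illustrative fallback staffing check (4.8 / AG-426). Every figure
# here is an assumption; real values come from drill records and the
# organisation's fallback plan.

import math

def required_staff(demand_per_day: float, tasks_per_operator_day: float) -> int:
    """Headcount needed to meet daily demand at a given throughput."""
    return math.ceil(demand_per_day / tasks_per_operator_day)

planned_staff = 8         # operators the fallback plan provides
demand = 340              # tasks per day the agent normally handles
assumed_throughput = 45   # tasks/operator/day assumed in the plan
measured_throughput = 12  # tasks/operator/day measured in recent drills

print("Plan assumed:", required_staff(demand, assumed_throughput), "operators")
print("Drills imply:", required_staff(demand, measured_throughput), "operators")
if required_staff(demand, measured_throughput) > planned_staff:
    print("Revise the fallback staffing model (4.8).")
```

If drills show 12 tasks per operator per day rather than the 45 the plan assumed, the same demand needs 29 operators, not 8, and the staffing model under AG-426 must be corrected before an outage proves the point.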
Recommended patterns:
- Record pre-deployment baselines before the agent assumes operational responsibility, and measure every subsequent drill against them (4.3).
- Conduct drills at production complexity and volume, using sanitised production data where feasible (4.6).
- Rotate scenarios across the full task portfolio, including edge cases and high-complexity work, not only routine tasks (4.7).
- Feed measured drill throughput into fallback staffing and recovery time planning (4.8, AG-426, AG-422).
- Use unannounced drills, where labour regulations and workforce agreements permit, to test readiness under realistic conditions (4.9).
Anti-patterns to avoid:
- Simplified or reduced-throughput drill scenarios that do not represent real fallback conditions; these do not satisfy 4.6.
- Paper certification without demonstrated manual performance at operational density, the failure mode in Scenario B.
- Treating review of agent outputs as evidence of manual execution competence.
- Training staff exclusively on agent-assisted workflows with no manual pathway, as in Scenario C.
- Deferring the first drill until an outage or suspension has already exposed the competence gap.
Financial Services. Financial institutions face acute deskilling risk in trading operations, credit underwriting, and compliance monitoring — domains where manual expertise took years to develop and can atrophy rapidly under agent automation. FCA-regulated firms should align drill frequency with their SYSC 6 obligations and integrate drill results into their operational resilience self-assessments. Drills for trading desks should be conducted during market hours using realistic market conditions to test performance under genuine time pressure.
Healthcare. Clinical decision-support agents create deskilling risk for diagnostic and treatment-planning skills. Medical professionals who rely on AI-assisted diagnosis for extended periods may lose the pattern-recognition skills developed through years of clinical practice. Healthcare drills must be conducted with clinical governance oversight and should include assessment of clinical judgement, not just procedural accuracy. Patient safety regulations may impose additional drill requirements beyond this dimension's minimum frequency.
Safety-Critical and CPS. Aviation, energy, and industrial control environments have the most mature drill programmes — mandatory simulator checks for pilots, emergency operating procedure drills for nuclear plant operators. Organisations deploying agents in these domains should integrate AI-fallback drills into existing competence assurance programmes rather than creating parallel structures. The key addition is ensuring that drills specifically test the transition from agent-monitored to manual operation, not just manual operation in isolation.
Public Sector. Government agencies face statutory service delivery obligations that cannot be suspended during agent unavailability. Deskilling drills must verify that manual processing capacity is sufficient to meet statutory timescales. Drill results should inform contingency planning for agent suspension scenarios, including temporary staff redeployment and overtime authorisation.
Basic Implementation — The organisation has established a drill programme with defined frequency, scope, and performance criteria. Pre-deployment baselines exist for all current operators. Drills are conducted at or near production complexity. Individual drill results are documented and retained. Remediation processes exist for below-threshold performance. All mandatory requirements (4.1 through 4.6) are satisfied.
Intermediate Implementation — All basic capabilities plus: drill scenarios rotate across the full task portfolio including edge cases and high-complexity scenarios. Individual performance trends are tracked over time with statistical analysis. Drill results are integrated with fallback staffing models (AG-426). Unannounced drills are conducted at least once per year. Drill environments use sanitised production data for maximum realism. Remediation includes structured retraining programmes with re-test gates.
Advanced Implementation — All intermediate capabilities plus: drill performance data drives dynamic adjustments to agent autonomy scope — if widespread deskilling is detected, agent autonomy is reduced to increase human task execution and rebuild skills. Predictive models forecast individual deskilling trajectories based on time-since-last-manual-execution and operator characteristics. Drill results are independently audited annually. Cross-functional drill exercises test end-to-end manual operation across multiple teams simultaneously. The organisation can demonstrate through data that manual operational capability is maintained at a defined percentage of pre-deployment baseline.
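The predictive models mentioned at the advanced tier can start very simply, for example as an exponential decay over time since last manual execution. The sketch below is a hypothetical model, not a validated result; the half-life and residual-skill floor must be fitted to the organisation's own drill data.

```python
# Hypothetical skill-decay forecast of the kind described at the
# advanced tier. The exponential form, half-life, and skill floor are
# modelling assumptions to be fitted against real drill results.

import math

def forecast_skill(days_since_manual: float,
                   half_life_days: float = 180.0,
                   floor: float = 0.4) -> float:
    """Predicted manual performance as a fraction of the operator's
    pre-deployment baseline (1.0 = baseline, floor = residual skill)."""
    decay = math.exp(-math.log(2) * days_since_manual / half_life_days)
    return floor + (1.0 - floor) * decay

for days in (0, 90, 180, 365, 540):
    print(f"{days:>3} days since last manual execution: "
          f"{forecast_skill(days):.0%} of baseline")
```

Under these assumed parameters an operator drops below 85% of baseline after roughly 75 days of supervisory-only work, which is broadly consistent with the quarterly drill minimum in 4.1. A model fitted to real drill data would let the organisation schedule drills per operator, just before forecast performance crosses the remediation threshold.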
Required artefacts:
- Drill programme schedule and scope definition (4.1).
- Performance criteria definitions with thresholds (4.2).
- Pre-deployment baseline records for every operator (4.3).
- Structured drill result records with pass/fail determinations (4.5).
- Remediation records with re-test outcomes (4.4).
- Drill scenario rotation logs covering the task portfolio (4.7).
Retention requirements: Drill results and remediation records are retained as governance evidence (4.5); retention periods should follow the organisation's records-retention policy and any regulatory minima applicable to the deployment.
Access requirements: Drill records should be available to internal audit, to the independent auditors anticipated at the advanced tier, and to regulators on request.
Test 8.1: Drill Frequency Compliance. Verifies that drills have been conducted at least quarterly for high-risk/critical deployments and at least semi-annually for other deployments, for every in-scope operator (4.1).
Test 8.2: Performance Criteria Measurement Completeness. Verifies that every drill result records performance against each defined criterion, including accuracy, throughput, error rates, and time-to-completion (4.2).
Test 8.3: Drill Complexity Verification. Verifies that drill scenarios were conducted at operational or near-operational complexity and volume (4.6).
Test 8.4: Remediation Trigger and Completion. Verifies that below-threshold drill performance triggered remediation and that the operator was re-tested before their next scheduled supervisory shift (4.4).
Test 8.5: Baseline Existence and Coverage. Verifies that a pre-deployment manual performance baseline exists for every operator (4.3).
Test 8.6: Drill Independence from Agent Assistance. Verifies that drill tasks were executed without any automation assistance from the agent (4.1).
Test 8.7: Drill Result Documentation and Retention. Verifies that drill records contain all fields required by 4.5 and are retained as governance evidence.
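These tests lend themselves to automation against retained drill records. As one example, a sketch of Test 8.1 follows; it assumes records carry an operator identity and drill date as in the record structure sketched in section 4, and the 92-day reading of "quarterly" is an interpretation, not a figure from this specification.

```python
# Illustrative automation of Test 8.1 (Drill Frequency Compliance).
# Assumes per-operator drill dates are available from retained records;
# the 92-day quarterly window is an assumed interpretation of 4.1.

from datetime import date, timedelta

QUARTER = timedelta(days=92)  # assumed upper bound for "quarterly" (4.1)

def frequency_violations(drill_dates: dict[str, list[date]],
                         as_of: date) -> list[str]:
    """Operators whose most recent drill is older than one quarter,
    or who have never drilled at all."""
    return [operator for operator, dates in drill_dates.items()
            if not dates or as_of - max(dates) > QUARTER]

history = {  # hypothetical drill history
    "uw-001": [date(2025, 1, 14), date(2025, 4, 10)],
    "uw-002": [date(2024, 11, 2)],
    "uw-003": [],
}
print(frequency_violations(history, as_of=date(2025, 6, 1)))
# -> ['uw-002', 'uw-003']
```

Tests 8.2 through 8.7 follow the same pattern: each asserts a property of the retained drill records against the corresponding requirement in section 4.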
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 14 (Human Oversight) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| SOX | Section 404 (Internal Controls) | Supports compliance |
| FCA SYSC | SYSC 6.1.1R (Adequate Systems and Controls) | Supports compliance |
| NIST AI RMF | GOVERN 1.5 (Ongoing Monitoring) | Supports compliance |
| ISO 42001 | Clause 8.1 (Operational Planning and Control) | Supports compliance |
| DORA | Article 11 (Response and Recovery) | Direct requirement |
Article 14 requires that high-risk AI systems be designed and developed so that they can be effectively overseen by natural persons during their period of use. Article 14(4)(a) specifically requires that human overseers are able to "fully understand the capacities and limitations of the high-risk AI system and be able to duly monitor its operation." This presupposes that overseers retain sufficient domain competence to evaluate the system's outputs and intervene when necessary. Deskilling drills provide the mechanism through which organisations verify that this competence is maintained over time. Without drills, the Article 14 compliance claim rests on an assumption of human competence that may no longer be true.
For SOX-regulated organisations, deskilling drills are a component of the internal control framework governing AI agent operations. The fallback to manual operation is an internal control — a mechanism that ensures financial reporting accuracy when the agent is unavailable. If the humans who constitute this control have lost the competence to execute it, the control is ineffective. SOX Section 404 assessments should evaluate whether deskilling drill programmes provide adequate assurance that manual fallback controls are operational.
The FCA requires regulated firms to maintain systems and controls that are adequate for the nature, scale, and complexity of their activities. A firm that relies on AI agents for material operational functions and retains human operators as a fallback must ensure that the fallback is functional — not merely staffed. Deskilling drills demonstrate that the firm's fallback controls are adequate by providing empirical evidence of manual operational capability.
GOVERN 1.5 addresses ongoing monitoring processes for AI systems, including the monitoring of human-AI interaction effectiveness. Deskilling drills are a specific monitoring mechanism that verifies the human component of the human-AI operational model remains effective over time.
DORA Article 11 requires financial entities to establish ICT-related incident response and recovery plans, including testing those plans through regular exercises. Deskilling drills are a form of recovery exercise that tests the organisation's ability to maintain operations when the AI agent (as an ICT system) is unavailable. The drill results provide evidence that the organisation can meet its recovery time objectives through manual operation.
ISO 42001 Clause 8.1 addresses operational planning and control of AI systems within the AI management system, and Clause 7.2 requires organisations to ensure the competence of people whose work affects AI system performance. Deskilling drills directly support competence assurance for operators who interact with AI systems and must retain the ability to operate without them.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Organisation-wide — affects all operational functions dependent on agent-automated tasks where human fallback is assumed |
Consequence chain: Without deskilling drills, operator competence degrades silently over months. The immediate failure mode is invisible — no alarms fire when an operator's manual skills fall below the level assumed in the fallback plan. The degradation becomes visible only when a real event demands manual operation: an agent outage, a regulatory suspension, a safety-critical override. At that point, the organisation discovers simultaneously that (a) the agent is unavailable and (b) the humans cannot perform the work. The operational impact is a capacity collapse — throughput drops by 70-90% compared to agent-assisted operation, and error rates spike to levels that may exceed pre-agent historical norms because operators have lost not just speed but also accuracy. The business consequences include service delivery failures, regulatory breaches for failing to meet statutory processing timescales, financial losses from incorrect manual processing, and potential safety incidents in critical domains. The reputational consequence is severe: the organisation must disclose that it deployed an AI agent, allowed its human operators to lose the ability to do the work, and then could not operate when the agent failed. The regulatory consequence is a finding that the organisation's fallback plan and human oversight arrangements were inadequate — a finding that undermines the organisation's credibility for all future AI deployments.
Cross-references: AG-446 (Training Recertification Cadence Governance) defines the training cadence that supports skill maintenance; deskilling drills verify that training is effective by measuring actual performance. AG-426 (Fallback Staffing Governance) defines staffing levels for manual fallback; drill results validate whether the assumed per-operator throughput is achievable. AG-440 (Oversight Ergonomic Design Governance) ensures that oversight interfaces do not exacerbate deskilling by making supervisory roles entirely passive. AG-420 (Tabletop Exercise Governance) covers scenario-based exercises at the strategic level; deskilling drills are operational-level exercises focused on individual task competence. AG-422 (Recovery Time Objective Governance) defines how quickly the organisation must recover from agent failure; drill data determines whether the RTO is achievable with available manual capacity. AG-008 (Governance Continuity Under Failure) ensures that governance processes themselves survive agent failure — deskilling drills ensure the humans within those processes can function.