Fallback Staffing Governance requires that organisations operating AI agent systems maintain documented, tested, and exercised plans for human staffing that can sustain operational control when automation is constrained, degraded, or fully disabled. When an agent system fails, degrades, or is deliberately taken offline for safety reasons, the workload does not disappear — it reverts to human operators who may not have processed that volume manually for months or years. This dimension mandates that organisations calculate the human staffing capacity required to maintain critical operations at defined service levels during agent outages, pre-identify and train fallback personnel, establish activation triggers linked to the adverse event severity matrix, and regularly exercise the fallback plan to verify that staffing assumptions remain valid as agent workloads evolve.
Scenario A — Agent Outage Exposes Zero Manual Processing Capacity: A financial-value agent processes 94% of trade confirmations for a mid-tier investment bank, handling 12,000 confirmations per day with a team of 4 human operators managing exceptions and approvals. The agent suffers a critical failure at 08:15 on a Monday morning and is taken offline. The 4 human operators are now responsible for processing 12,000 confirmations manually. Their maximum manual throughput — last measured 3 years ago when the agent was first deployed — was 150 confirmations per person per day. The team can process 600 confirmations in a day, leaving 11,400 unprocessed. By the regulatory deadline (T+1 settlement), 8,200 confirmations remain unprocessed. The firm breaches CSDR settlement discipline requirements for 8,200 trades. Counterparties initiate buy-in procedures. The firm's cash penalties under CSDR total €1.4 million for the first day alone, with cascading failures as the backlog compounds on subsequent days.
What went wrong: The organisation had no fallback staffing plan. When the agent was deployed, 28 operators were reassigned to other roles. No plan existed to recall or redeploy those operators during an agent outage. The 4 remaining operators were skilled in exception handling but had not performed bulk manual processing in 3 years. No calculation had been performed to determine the staffing gap between agent capacity and available human capacity. Consequence: €1.4 million in first-day settlement penalties, 8,200 counterparty relationships strained, regulatory inquiry from ESMA, 11-day backlog requiring 40 temporary staff at a cost of €320,000, and permanent reputational damage in the interdealer market.
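The arithmetic behind Scenario A can be reproduced with a simple backlog model. This is an illustrative sketch using the scenario's figures; real plans should use throughput rates measured in exercises, not historical estimates:

```python
# Simple daily backlog model for a full agent outage (Scenario A figures).
# All parameters are illustrative inputs, not prescribed values.

def backlog_after_days(daily_inflow: int, items_per_person_per_day: int,
                       operators: int, days: int) -> int:
    """Unprocessed items after `days` of outage, assuming constant inflow."""
    daily_human_capacity = items_per_person_per_day * operators
    shortfall_per_day = max(daily_inflow - daily_human_capacity, 0)
    return shortfall_per_day * days

# Scenario A: 12,000 confirmations/day, 4 operators at 150/day each.
print(backlog_after_days(12_000, 150, 4, days=1))  # → 11400 unprocessed on day one
```

The same model shows why backlogs compound: with a fixed shortfall per day, the unprocessed total grows linearly until additional capacity is mobilised.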
Scenario B — Fallback Staff Lack Current System Access and Training: A customer-facing agent handles 78% of insurance claims intake for a regional insurer, processing 3,400 claims per day. Following a model integrity incident (AG-379), the agent is suspended pending investigation. The organisation's fallback plan identifies 22 claims processors from other departments who can be redeployed. However, 14 of the 22 have had their access to the claims processing system revoked during an access recertification exercise 6 months earlier. Of the remaining 8, only 3 have processed claims in the current system version — the claims platform was upgraded 4 months ago with a new interface and workflow. The 3 capable operators can process approximately 450 claims per day. The backlog grows by 2,950 claims per day. After 5 days, 14,750 claims are unprocessed, customers are unable to reach human agents due to call centre overload, and the regulator receives 340 formal complaints.
What went wrong: The fallback staffing plan was created when the agent was deployed but never updated. Access permissions for fallback staff were revoked during routine recertification because the staff were not actively using the claims system. The system upgrade invalidated the training of most fallback staff. No exercise had been conducted to verify that fallback staff could actually perform the required tasks with current tools and access. Consequence: 14,750-claim backlog, 340 regulatory complaints, FCA investigation into treating customers fairly obligations, £680,000 in overtime and temporary staffing costs, and 4 claims where delayed processing resulted in denied coverage that should have been approved (subsequent litigation costs: £290,000).
Scenario C — Cross-Border Agent Outage Requires Multi-Timezone Staffing: A cross-border agent operating in 6 jurisdictions handles regulatory reporting submissions across EU, UK, US, and APAC time zones. The agent is taken offline after the discovery of a data quality issue that has caused incorrect reports to be filed in 2 jurisdictions. The organisation's fallback plan assumes a single London-based team of 8 analysts can handle manual submissions. However, APAC regulatory deadlines fall during London's night hours, and 3 of the 6 jurisdictions require submissions in the local language (Japanese, Korean, Portuguese), which none of the London team speaks. The APAC submissions are missed entirely for the first 2 days. The Brazilian submission deadline passes without filing. Japanese FSA and Korean FSS both issue formal warnings. The organisation scrambles to engage external consultants in each jurisdiction at premium rates.
What went wrong: The fallback staffing plan did not account for the multi-timezone, multi-language requirements of the cross-border agent's workload. A single-location team cannot provide 24-hour coverage, and language requirements were not mapped to fallback staff capabilities. The assumption that one team could replace a global agent was never tested. Consequence: Missed regulatory filings in 3 jurisdictions, formal warnings from 2 regulators, €2.1 million in emergency consulting fees, 60-day enhanced supervision by the FSA, and a mandatory remediation programme requiring the organisation to establish regional fallback teams within 90 days.
Scope: This dimension applies to any organisation where AI agents perform operational tasks that would require human execution if the agent were unavailable. The scope encompasses all agent workloads where the absence of the agent would create a processing gap — tasks that cannot simply be deferred indefinitely but must be completed within defined time constraints (regulatory deadlines, contractual SLAs, safety requirements, or business continuity thresholds). The dimension covers the full lifecycle of fallback staffing: capacity calculation, personnel identification, training and access provisioning, activation triggers, operational handover procedures, workload prioritisation during degraded operations, and regular exercising of the plan. Organisations that operate agents exclusively for advisory or enhancement purposes — where the agent's absence does not create an unprocessable backlog — are minimally affected but should still consider fallback arrangements for any advisory function where absence creates regulatory or safety risk.
4.1. A conforming system MUST maintain a documented fallback staffing plan for every agent workload where agent unavailability would create a processing backlog that exceeds defined operational thresholds, including the calculation of required human staffing capacity at defined service levels.
4.2. A conforming system MUST calculate the staffing gap — the difference between agent processing capacity and available human processing capacity — for each critical agent workload, updating this calculation whenever agent workload volumes change by more than 20% or when fallback staff availability changes.
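The Requirement 4.2 staffing-gap calculation can be sketched as follows. The field names are hypothetical; the only threshold taken from the text is the 20% workload-change trigger:

```python
from dataclasses import dataclass

# Hypothetical sketch of the Requirement 4.2 staffing-gap calculation.

@dataclass
class WorkloadProfile:
    agent_items_per_day: int        # current agent throughput
    fallback_headcount: int         # verified, trained fallback staff
    measured_items_per_person: int  # throughput measured in the last exercise

def staffing_gap(p: WorkloadProfile) -> int:
    """Items/day that cannot be covered by fallback staff during a full outage."""
    human_capacity = p.fallback_headcount * p.measured_items_per_person
    return max(p.agent_items_per_day - human_capacity, 0)

def recalculation_due(old_volume: int, new_volume: int) -> bool:
    """Requirement 4.2: recalculate when workload volume moves by more than 20%."""
    return abs(new_volume - old_volume) / old_volume > 0.20
```

Note that the calculation uses exercise-measured throughput, not the rate achieved before the agent was deployed; Scenario A illustrates how stale throughput figures overstate fallback capacity.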
4.3. A conforming system MUST pre-identify named fallback personnel for each critical agent workload, with verified current access to the systems required to perform the work, verified training on the current version of those systems, and verified availability commitments (e.g., on-call rosters, cross-training agreements, or contractual standby arrangements).
4.4. A conforming system MUST define activation triggers for fallback staffing that are linked to the adverse event severity matrix (AG-419) and the recovery time objectives (AG-422), specifying which agent outage conditions trigger partial fallback (supplementing degraded agent capacity with human staff) and which trigger full fallback (replacing agent capacity entirely with human staff).
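A minimal activation-trigger mapping for Requirement 4.4 might look like the following. The severity levels stand in for a hypothetical AG-419-style matrix; real triggers must be derived from the organisation's own severity matrix and recovery time objectives:

```python
from enum import Enum

# Illustrative mapping of incident severity to fallback activation mode.
# Severity levels and thresholds are assumptions, not mandated values.

class Severity(Enum):
    LOW = 1
    MODERATE = 2
    HIGH = 3
    CRITICAL = 4

class FallbackMode(Enum):
    NONE = "agent continues; monitor"
    PARTIAL = "humans supplement degraded agent capacity"
    FULL = "humans replace agent capacity entirely"

def activation_mode(severity: Severity, agent_available: bool) -> FallbackMode:
    """Decide partial vs full fallback from severity and agent availability."""
    if not agent_available or severity is Severity.CRITICAL:
        return FallbackMode.FULL
    if severity is Severity.HIGH:
        return FallbackMode.PARTIAL
    return FallbackMode.NONE
```

Encoding the trigger as an explicit function makes it testable during exercises: the same inputs should always yield the same activation decision, removing ambiguity at the moment of incident.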
4.5. A conforming system MUST include a workload prioritisation framework within the fallback plan that determines which tasks are processed first when human capacity is insufficient to handle the full agent workload, with prioritisation criteria based on regulatory deadlines, safety impact, governed exposure, and customer impact.
4.6. A conforming system MUST exercise the fallback staffing plan at least annually through a practical exercise — not a tabletop discussion — in which fallback personnel actually perform representative tasks using the production systems (or a production-equivalent environment) to verify that access, training, and throughput assumptions are valid.
4.7. A conforming system SHOULD maintain fallback staffing capacity sufficient to process at least 60% of the agent's normal workload within the defined recovery time objective, acknowledging that full agent replacement throughput is rarely achievable and that the workload prioritisation framework (Requirement 4.5) determines which tasks are processed within the available capacity.
4.8. A conforming system SHOULD establish mutual aid agreements with other business units, partner organisations, or staffing providers that can supply additional trained personnel within defined mobilisation timeframes (see AG-427) to supplement internal fallback capacity during extended agent outages.
4.9. A conforming system SHOULD implement automated workload routing that redirects agent tasks to human processing queues when the agent is unavailable, rather than requiring manual intervention to reroute work.
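The routing behaviour in Requirement 4.9 can be sketched in a few lines. The health check and queue names below are assumptions; a production implementation would poll the agent's actual health endpoint and feed real work queues:

```python
import queue

# Minimal sketch of automated workload routing (Requirement 4.9): when the
# agent is unavailable, new tasks flow to a human queue without manual
# rerouting. The simulated outage below is for illustration.

agent_queue: "queue.Queue[str]" = queue.Queue()
human_queue: "queue.Queue[str]" = queue.Queue()

def agent_healthy() -> bool:
    # In production this would poll the agent's health endpoint;
    # here we simulate an outage.
    return False

def route(task_id: str) -> str:
    """Route a task to the agent if healthy, otherwise to human fallback."""
    if agent_healthy():
        agent_queue.put(task_id)
        return "agent"
    human_queue.put(task_id)
    return "human"

print(route("TRADE-001"))  # → human (agent is down in this simulation)
```

The point of automating this step is latency: manual rerouting adds mobilisation time to every outage, whereas queue-level routing means the backlog is already waiting in the human queue when fallback staff arrive.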
4.10. A conforming system SHOULD maintain runbook documentation for each critical agent workload that enables a competent operator to perform the task manually, including step-by-step procedures, decision criteria, escalation points, and quality checks that replicate the agent's critical decision logic.
4.11. A conforming system MAY implement graduated automation reduction — rather than binary agent-on/agent-off — where the agent's autonomy is progressively reduced and human involvement progressively increased as incident severity escalates, maintaining partial automation where safe while increasing human oversight.
The deployment of AI agents into operational roles creates a dependency that organisations frequently underestimate. When an agent processes 90% or more of a workload, the human team that previously handled that workload has typically been reassigned, retrained for other roles, or reduced through attrition. The institutional knowledge of how to process the workload manually erodes. System access for fallback personnel lapses. The muscle memory of manual processing fades. The organisation has, in effect, transferred a critical operational capability from a resilient human team (which could be scaled by hiring and training) to a single automated system (which can fail completely and instantaneously).
This creates a brittleness that traditional business continuity planning does not address. Traditional BCP focuses on infrastructure resilience — redundant data centres, failover systems, backup power. It assumes that the human workforce is available and capable. Agent dependency inverts this assumption: the infrastructure may be fine, but the agent is unavailable, and the humans who could perform the work are no longer equipped to do so. The failure mode is not a data centre outage resolved by failover — it is a capability gap that can only be filled by trained humans who may not exist in sufficient numbers.
The risk is amplified by the speed at which agent workloads grow. An agent that handles 50% of a workload at deployment may handle 90% within 12 months as the organisation shifts resources and the agent's scope expands. The fallback staffing plan created at deployment — assuming 50% human capacity — becomes dangerously inadequate within a year. Without a requirement to recalculate the staffing gap as workloads evolve, the plan degrades silently.
The regulatory dimension is significant. Financial regulators expect firms to demonstrate that critical business services can be maintained within impact tolerances during disruptions. The FCA's operational resilience framework (PS21/3) requires firms to map important business services and ensure they can be delivered within impact tolerances even when technology fails. If an AI agent is a critical component of an important business service, the firm must demonstrate that the service can continue when the agent is unavailable — and that demonstration requires fallback staffing. DORA Article 11 requires financial entities to have business continuity policies that include human resource planning for ICT incident response. The EU AI Act Article 14 requires human oversight mechanisms for high-risk AI systems, which necessarily includes the ability to operate without the AI system when oversight determines that the system must be suspended.
The human factors dimension is equally important. Fallback staff who have not performed a task for months or years will be slower, less accurate, and more prone to errors than they were when they performed the task daily. Throughput estimates based on historical performance (before the agent was deployed) will overstate actual fallback capacity. System interfaces may have changed. Business rules may have been updated. New regulatory requirements may have been implemented in the agent's logic but not documented in human-readable procedures. The fallback plan must account for this degradation and be exercised regularly to identify and remediate gaps before they are exposed by an actual incident.
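The throughput degradation described above can be built into capacity planning rather than discovered during an incident. The decay rate and floor below are hypothetical planning assumptions; organisations should replace them with rates measured in their own exercises:

```python
# Sketch of discounting historical throughput for skill decay.
# monthly_decay=0.03 and floor=0.4 are illustrative assumptions.

def adjusted_throughput(historical_rate: float, months_since_last_practice: int,
                        monthly_decay: float = 0.03, floor: float = 0.4) -> float:
    """Discount historical items/day by compounding skill decay, with a floor."""
    factor = max((1 - monthly_decay) ** months_since_last_practice, floor)
    return historical_rate * factor

# An operator measured at 150/day three years ago should not be planned at 150.
print(round(adjusted_throughput(150, 36)))  # → 60 (floored at 40% of the historical rate)
```

Under these assumptions, Scenario A's planning figure of 150 confirmations per person per day would have been discounted to 60 — still insufficient, but the gap would have been visible in the plan rather than discovered on the morning of the outage.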
Fallback staffing governance requires a structured approach that begins with capacity analysis and extends through personnel management, training, exercising, and continuous recalibration. The core principle is that fallback staffing is not a one-time plan created at agent deployment — it is a living operational capability that must be maintained with the same rigour as the agent system itself.
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Settlement, payment, and regulatory reporting functions have hard deadlines that cannot be extended during agent outages. Financial firms must calculate fallback staffing requirements against these deadlines, not against normal processing windows. The FCA and PRA expect firms to demonstrate that important business services can be maintained within impact tolerances — which requires quantified fallback staffing capacity, not just a plan to "call in additional staff." CASS rules require that client money reconciliation occurs daily regardless of system availability, creating an absolute staffing floor.
Healthcare. Clinical agents supporting triage, medication verification, or diagnostic assistance create direct patient safety dependencies. Fallback staffing in healthcare must account for clinical competency requirements — fallback staff must hold appropriate clinical qualifications, not just system access. Fallback activation must be near-instantaneous for patient-facing systems, with defined maximum response times measured in minutes, not hours.
Public Sector and Rights-Sensitive. Agents processing benefits claims, immigration applications, or law enforcement referrals affect fundamental rights. Fallback staffing must ensure that processing backlogs do not create unlawful delays in rights-affecting decisions. Public sector organisations must also consider that fallback processing rates may create de facto prioritisation of certain populations, which must be assessed for equality impact.
Crypto and Web3. Agents monitoring blockchain state, managing liquidity positions, or executing smart contract interactions may have time-critical fallback requirements measured in blocks rather than hours. Fallback staff must understand blockchain-specific risks including transaction finality, gas price volatility, and smart contract state dependencies.
Basic Implementation — A fallback staffing plan exists for each critical agent workload. The plan identifies fallback personnel by name or role. A basic capacity gap calculation has been performed. Fallback staff have system access. A workload prioritisation framework exists. Limitations: the plan has not been exercised, throughput assumptions are estimated rather than measured, runbooks may be incomplete or outdated, access has not been recently verified.
Intermediate Implementation — Fallback staffing plans are exercised annually with measured throughput rates used for capacity planning. Tiered staffing pools are established (Tier 1, 2, 3) with defined mobilisation times. Access for all fallback staff is verified quarterly. Runbooks are maintained for all critical agent workloads and updated when systems or requirements change. Automated workload routing redirects tasks to human queues when the agent is unavailable. Mutual aid agreements provide access to external fallback capacity. Capacity gap calculations are updated quarterly.
Advanced Implementation — All intermediate capabilities plus: fallback exercises are conducted semi-annually with realistic scenarios including multi-day outages and degraded infrastructure. Graduated automation reduction enables progressive handover from agent to human as incident severity increases. Real-time dashboards show current fallback capacity against current workload, including mobilisation time for each staffing tier. Fallback throughput trends are tracked and declining performance triggers remediation. The fallback plan is integrated with the organisation's broader operational resilience framework, and fallback staffing capacity is reported as a key resilience metric to the board.
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Fallback Staffing Plan Existence and Completeness
Test 8.2: Fallback Personnel Access Verification
Test 8.3: Capacity Gap Calculation Accuracy
Test 8.4: Fallback Exercise Execution and Throughput Measurement
Test 8.5: Activation Trigger Integration with Severity Matrix
Test 8.6: Workload Prioritisation Framework Functionality
Test 8.7: Runbook Currency and Usability
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 14 (Human Oversight) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| SOX | Section 404 (Internal Controls over Financial Reporting) | Direct requirement |
| FCA SYSC | 15A (Operational Resilience) | Direct requirement |
| NIST AI RMF | GOVERN 1.2, MAP 3.4 (Human-AI Teaming and Oversight) | Supports compliance |
| ISO 42001 | 6.1.2 (AI Risk Assessment) | Supports compliance |
| DORA | Article 11 (Business Continuity Policy) | Direct requirement |
Article 14 requires that high-risk AI systems are designed and developed in such a way that they can be effectively overseen by natural persons, including the ability to "decide, in any particular situation, not to use the high-risk AI system or to otherwise disregard, override or reverse the output of the high-risk AI system." This right to suspend or override an AI system is meaningless without the operational capacity to sustain the affected service when the system is suspended. If an organisation lacks the human staffing to process the workload when the agent is disabled, the theoretical right to suspend becomes a practical impossibility — the organisation cannot afford to exercise its oversight obligation because doing so would create an operational crisis. Fallback staffing governance ensures that human oversight is a genuine operational capability, not a theoretical right undermined by capacity constraints.
Section 404 requires that management assess the effectiveness of internal controls over financial reporting. For organisations where AI agents perform financial reporting functions (transaction processing, reconciliation, regulatory filing), the availability of human fallback capacity is an internal control. If the agent fails and no human fallback exists, financial reporting obligations cannot be met within required timeframes. SOX auditors will assess whether the organisation has quantified the fallback staffing requirement, identified and trained fallback personnel, and exercised the fallback plan. A paper plan that has never been exercised will not satisfy Section 404 requirements.
SYSC 15A requires firms to identify their important business services, set impact tolerances for disruption to those services, and demonstrate that they can remain within those tolerances during severe but plausible scenarios. For firms where AI agents are critical components of important business services, remaining within impact tolerances during agent failure requires fallback staffing. The FCA has been explicit that operational resilience is not just about technology resilience — it encompasses the people, processes, and technology required to maintain service delivery. A firm that cannot demonstrate sufficient fallback staffing to maintain an important business service during an agent outage will fail its operational resilience self-assessment.
GOVERN 1.2 addresses the allocation of resources for AI risk management, including human resources. MAP 3.4 addresses human-AI teaming configurations and their implications for human performance. Fallback staffing governance directly implements both subcategories: it ensures that sufficient human resources are allocated to maintain operations when AI systems are unavailable (GOVERN 1.2), and it addresses the human performance implications of operators who must resume manual processing after extended periods of agent-assisted or agent-automated operation (MAP 3.4). The framework's emphasis on measuring and monitoring AI system performance extends to measuring fallback human performance through exercising.
ISO 42001 Clause 6.1.2 requires organisations to identify and assess risks related to the use of AI. The risk of insufficient human capacity to maintain operations during agent unavailability is a core AI-specific risk. The standard requires that identified risks are treated with proportionate measures — fallback staffing governance is the proportionate treatment for this risk. The standard's emphasis on documented, maintained risk treatment plans aligns with the requirement for exercised, current fallback plans.
DORA Article 11 requires financial entities to establish comprehensive ICT business continuity policies, including "adequate and proportionate human resources" to address ICT-related incidents. Article 11(3) specifically requires business continuity plans that account for scenarios involving "the failure or absence of critical ICT services provided by ICT third-party service providers." AI agent systems — whether operated internally or provided by third parties — fall within this scope. The requirement for "adequate and proportionate human resources" directly maps to fallback staffing governance. DORA's testing requirements (Articles 24 and 25) also align with this dimension's requirement for regular exercising of fallback plans.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — affecting every business function dependent on the unavailable agent, with potential cascade to customers, counterparties, and regulatory obligations |
Consequence chain: Failure to maintain adequate fallback staffing creates a cascade that begins with operational paralysis and escalates through regulatory, financial, and reputational consequences. The immediate consequence is workload accumulation: when the agent goes offline and no fallback staff are available, unprocessed items accumulate at the agent's processing rate (potentially thousands per hour).
The first escalation is deadline breach: regulatory filings, settlement obligations, customer response SLAs, and safety-critical processing deadlines are missed as the backlog grows faster than any improvised manual capacity can address.
The second escalation is regulatory consequence: missed filings trigger penalties, missed settlements trigger buy-in procedures, and missed SLAs trigger contractual remedies. Regulators interpret the lack of fallback capacity as a governance failure — the organisation deployed an AI system it could not safely disable.
The third escalation is forced resumption: under pressure to restore processing, the organisation may be forced to reactivate the agent before the original incident is fully resolved, accepting known risks rather than enduring the operational consequences of continued outage. This undermines the human oversight mandate (AG-019) and may create legal liability if the known risks materialise.
The fourth escalation is strategic: the incident demonstrates that the organisation's operational resilience is contingent on a single automated system with no viable human fallback, undermining confidence from regulators, counterparties, and the board that the organisation can manage its AI dependencies responsibly.
Cross-references: AG-019 (Human Escalation & Override Triggers) defines the conditions under which agents must escalate to or be overridden by humans — fallback staffing ensures those humans are available and capable. AG-422 (Recovery Time Objective Governance) defines the time constraints within which fallback staffing must be mobilised and operational. AG-419 (Adverse Event Severity Matrix Governance) provides the severity classification that triggers fallback activation. AG-420 (Tabletop Exercise Governance) complements this dimension — tabletop exercises test decision-making while fallback exercises test operational execution. AG-425 (Emergency Change Freeze Governance) protects the environment from uncontrolled changes while fallback staff are processing workload manually. AG-427 (Mutual Aid and Vendor Coordination Governance) provides the framework for external fallback capacity when internal staffing is insufficient. AG-383 (Runtime Scheduler Fairness Governance) addresses the risk that degraded agent capacity disproportionately affects certain workload categories during partial fallback. AG-402 (Model Serving Rate Partitioning Governance) addresses capacity allocation when the agent operates in degraded mode alongside human fallback processing.