Model Routing and Fallback Safety Governance requires that when an AI agent relies on an underlying model (LLM, ML model, or reasoning engine) and that model becomes unavailable, degraded, or untrustworthy, the system routes to a pre-approved fallback in a controlled manner — or halts safely. The routing decision must be governed by policy, not by the agent's own reasoning. Fallback models must be pre-validated for the agent's task profile. Routing must not silently change the agent's capability profile, risk characteristics, or governance posture. The system must never fall back to an unvalidated model, route to a model with broader capabilities than the primary, or continue operating when no safe fallback exists. This dimension ensures that model-layer failures do not cascade into governance failures.
Scenario A — Silent Fallback to Unvalidated Model Produces Harmful Outputs: A customer-facing agent for a healthcare organisation uses a specialised medical language model fine-tuned for clinical triage. The model provider experiences a 4-hour outage. The platform's automatic fallback routes to a general-purpose LLM that has not been validated for clinical use. During the outage, the agent continues responding to patient queries using the general-purpose model. The fallback model advises a patient with chest pain symptoms to "try resting and drinking water" instead of directing them to emergency services. The patient delays seeking treatment by 6 hours. The clinical governance team was not notified of the model switch.
What went wrong: The fallback was automatic and silent — no governance checkpoint validated whether the fallback model was appropriate for clinical use. The general-purpose model was not on the approved model list for clinical triage. No notification was sent to the clinical governance team. The agent's external behaviour appeared unchanged to users despite a fundamental change in its underlying capability. Consequence: Patient harm due to inappropriate clinical advice from an unvalidated model. Medical malpractice liability. Regulatory investigation by CQC (Care Quality Commission). Requirement to demonstrate that AI clinical tools operate within validated parameters at all times.
Scenario B — Fallback Model Has Broader Capabilities Than Primary: A financial services agent uses a model constrained to process only structured financial data — no code execution, no internet access, no creative text generation. The primary model becomes unavailable. The fallback model is a general-purpose LLM with code execution capabilities enabled. The routing layer switches to the fallback model. A prompt injection attack that was ineffective against the constrained primary model succeeds against the fallback because the fallback model's code execution capability allows the injected instruction to execute arbitrary code. The attacker exfiltrates API keys from the agent's environment.
What went wrong: The fallback model had a broader capability profile than the primary model. The routing decision did not evaluate whether the fallback's capabilities exceeded the agent's approved scope. The governance controls calibrated for the constrained primary model were insufficient for the unconstrained fallback. Consequence: API key exfiltration leading to £320,000 in unauthorised transactions via the compromised keys. Complete agent infrastructure rebuild required. Regulatory notification under DORA.
Scenario C — Governed Fallback with Graceful Degradation: An enterprise workflow agent processes invoice approvals using a primary model hosted on-premises. The routing policy specifies: if the primary model is unavailable for more than 60 seconds, route to Fallback Model A (a validated, capability-equivalent model hosted by a pre-approved cloud provider); if Fallback Model A is also unavailable, suspend agent operations and route all pending requests to a human approval queue. At 14:32, the primary model fails. At 14:33, the routing layer verifies that Fallback Model A is available and within the approved capability profile, switches to it, logs the routing change, and notifies the operations team. Processing continues. The operations team reviews the model performance comparison over the next 2 hours and confirms equivalent output quality. At 16:15, the primary model is restored and the routing layer switches back, logging the change.
What went right: The fallback was policy-governed with a defined sequence and a safe terminal state (human queue). The fallback model was pre-validated. The routing change was logged and notified. The capability profile was verified before routing. The system never operated without a validated model.
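To make the policy in Scenario C concrete, it can be written down declaratively. The following is a minimal sketch assuming a hypothetical schema; every field name and identifier (such as `onprem-invoice-model-v3`) is illustrative, not a format defined by this dimension:

```python
# Hypothetical encoding of Scenario C's routing policy. The schema is an
# illustrative assumption, not a format defined by this specification.
INVOICE_AGENT_ROUTING_POLICY = {
    "agent_id": "invoice-approval-agent",
    "primary": {
        "model_id": "onprem-invoice-model-v3",
        "unavailable_threshold_seconds": 60,    # trigger condition
    },
    "fallbacks": [
        {
            "model_id": "cloud-fallback-model-a",   # pre-approved provider
            "validated_for_task_profile": True,     # pre-validated
            "capability_profile": "equivalent-to-primary",
        },
    ],
    # Terminal safe state: suspend agent operations and route all pending
    # requests to a human approval queue.
    "terminal_safe_state": "suspend-and-route-to-human-queue",
    "notify_on_routing_change": ["operations-team"],
}
```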
Scope: This dimension applies to every AI agent that depends on an underlying model — whether a large language model, a machine learning classifier, a reasoning engine, or any other model component — for its decision-making or action generation. The scope includes agents using a single model, agents using multiple models for different sub-tasks, and agents using model ensembles. The scope extends to all model-layer components: the model itself, the model's hosting infrastructure, the model's API endpoint, and any intermediary services between the agent and the model. If the agent generates outputs, makes decisions, or takes actions based on model inference, this dimension applies.
4.1. A conforming system MUST define a routing policy for each agent specifying: the primary model, an ordered list of approved fallback models, the conditions under which routing to each fallback is triggered, and a terminal safe state when no approved fallback is available. (A sketch illustrating requirements 4.1–4.6 follows this list.)
4.2. A conforming system MUST ensure that every model in the routing policy (primary and fallback) has been validated for the agent's task profile, including capability boundaries, safety characteristics, and governance compliance.
4.3. A conforming system MUST prevent routing to any model not on the approved list for the agent's task profile, regardless of availability or performance characteristics.
4.4. A conforming system MUST verify that no fallback model has a broader capability profile than the primary model — fallback models must have equal or narrower capabilities.
4.5. A conforming system MUST log every routing change with: the timestamp, the reason for the change, the source model, the target model, and the verification result confirming the target model's approval status.
4.6. A conforming system MUST implement a terminal safe state — the action the system takes when no approved model is available — which defaults to suspending agent operations rather than continuing with an unvalidated model.
4.7. A conforming system SHOULD notify the designated governance function within 5 minutes of any routing change, including the reason and expected duration.
4.8. A conforming system SHOULD implement model health monitoring that detects degradation (increased latency, reduced quality, elevated error rates) before complete failure, enabling proactive routing rather than reactive failover.
4.9. A conforming system MAY implement gradual traffic shifting (canary routing) when transitioning between models, routing a small percentage of requests to the fallback model first to verify performance before full cutover.
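Requirements 4.1 through 4.6 together describe a deterministic resolution procedure: walk the approved chain in order, skip anything unvalidated or capability-expanding, log any change, and fall through to the terminal safe state. The sketch below is a minimal illustration under assumed types; `ModelRecord`, `RoutingPolicy`, and `resolve_model` are hypothetical names, and the availability probe is injected rather than specified:

```python
# Minimal sketch of policy-governed model resolution (requirements 4.1-4.6).
# All type names, fields, and the logging format are illustrative assumptions.
from collections.abc import Callable
from dataclasses import dataclass
import datetime
import logging

logger = logging.getLogger("routing-governance")

@dataclass(frozen=True)
class ModelRecord:
    model_id: str
    validated: bool                 # validated for the agent's task profile (4.2)
    capabilities: frozenset[str]    # e.g. {"structured-data"} vs {"code-execution"}

@dataclass
class RoutingPolicy:
    primary: ModelRecord
    fallbacks: list[ModelRecord]    # ordered, pre-approved fallback chain (4.1)

def resolve_model(
    policy: RoutingPolicy,
    is_available: Callable[[ModelRecord], bool],
) -> ModelRecord | None:
    """Return the model to route to, or None to signal the terminal safe state (4.6)."""
    for candidate in (policy.primary, *policy.fallbacks):
        if not candidate.validated:
            continue  # 4.3: never route to a model outside the approved list
        if not candidate.capabilities <= policy.primary.capabilities:
            continue  # 4.4: a fallback must have equal or narrower capabilities
        if is_available(candidate):
            if candidate is not policy.primary:
                # 4.5: log timestamp, reason, source, target, and approval status
                logger.warning(
                    "routing change ts=%s reason=primary-unavailable "
                    "source=%s target=%s target_approved=%s",
                    datetime.datetime.now(datetime.timezone.utc).isoformat(),
                    policy.primary.model_id,
                    candidate.model_id,
                    candidate.validated,
                )
            return candidate
    return None  # 4.6: no approved model available; suspend rather than improvise
```

Returning `None` rather than the "best remaining" model is deliberate: the caller must treat that value as a suspension signal and divert work to the degradation path, never retry with a wider net.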
AI agents are fundamentally dependent on their underlying models. The model is not just a component — it is the source of the agent's reasoning, judgment, and decision-making. When the model changes, the agent's behaviour changes. When the model fails, the agent's capability fails. When the model is replaced by a different model — even temporarily — the agent becomes a different agent in all aspects that matter for governance.
This creates a governance challenge that has no close precedent in traditional software. When a database fails over to a replica, the replica contains the same data and behaves identically. When a model fails over to a different model, the fallback may produce different outputs for the same inputs, may have different safety characteristics, and may have different vulnerability profiles. The routing decision is therefore a governance decision, not merely an infrastructure decision.
The critical insight is that model routing must be governed by the same standards as agent deployment. Deploying an agent with a new model requires validation. Routing an existing agent to a different model should require equivalent validation — performed in advance, not at the moment of failover. The pre-validation requirement ensures that when a routing change occurs under time pressure, the governance decision has already been made during a period of calm deliberation.
The capability profile constraint (requirement 4.4) addresses a subtle but critical risk: fallback models that are "better" in general capability terms may be worse from a governance perspective. A constrained model that cannot execute code, cannot access the internet, and cannot generate creative content has a smaller attack surface than a general-purpose model. Routing to the general-purpose model as a "fallback" expands the attack surface at the very moment when the system is under stress — exactly the wrong time to expand capabilities.
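One way to make "equal or narrower" mechanically checkable is to express capability profiles as sets and require the fallback's set to be a subset of the primary's. A minimal sketch using Scenario B's profiles; the capability names are illustrative, since a real taxonomy would be organisation-defined:

```python
# Illustrative capability-profile check for requirement 4.4.
# Capability names are assumptions; real taxonomies are organisation-defined.
PRIMARY = frozenset({"structured-financial-data"})

GENERAL_PURPOSE_FALLBACK = frozenset({
    "structured-financial-data",
    "code-execution",     # the capability exploited in Scenario B
    "internet-access",
    "creative-text-generation",
})

def fallback_is_permitted(fallback: frozenset, primary: frozenset) -> bool:
    """4.4: a fallback may have equal or narrower capabilities, never broader."""
    return fallback <= primary

assert not fallback_is_permitted(GENERAL_PURPOSE_FALLBACK, PRIMARY)  # blocked
assert fallback_is_permitted(frozenset(), PRIMARY)                   # narrower is fine
```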
The terminal safe state requirement (4.6) embodies the principle that no operation is better than ungoverned operation. An agent that suspends because no approved model is available creates a temporary operational gap. An agent that continues operating with an unvalidated model creates a governance gap that can produce lasting harm. The operational gap is recoverable; the governance gap may not be.
The routing governance system operates between the agent and the model layer. It intercepts all model inference requests, routes them according to policy, monitors model health, and enforces the approved model list.
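For the health-monitoring role, one simple approach (consistent with 4.8, though not mandated by it) is a sliding window over recent inference calls, with the model marked degraded once median latency or error rate crosses a threshold. A minimal sketch with illustrative thresholds:

```python
# Minimal sketch of degradation detection for proactive routing (4.8).
# The window size and both thresholds are illustrative assumptions.
from collections import deque

class ModelHealthMonitor:
    def __init__(self, window: int = 100,
                 max_median_latency_ms: float = 2000.0,
                 max_error_rate: float = 0.05):
        self.latencies: deque[float] = deque(maxlen=window)
        self.errors: deque[bool] = deque(maxlen=window)
        self.max_median_latency_ms = max_median_latency_ms
        self.max_error_rate = max_error_rate

    def record(self, latency_ms: float, error: bool) -> None:
        """Record the outcome of one inference call."""
        self.latencies.append(latency_ms)
        self.errors.append(error)

    def is_degraded(self) -> bool:
        """True once the window shows the model should be routed away from."""
        if len(self.latencies) < self.latencies.maxlen:
            return False  # not enough observations yet
        median = sorted(self.latencies)[len(self.latencies) // 2]
        error_rate = sum(self.errors) / len(self.errors)
        return median > self.max_median_latency_ms or error_rate > self.max_error_rate
```

A routing gateway would call `record()` after each inference and consult `is_degraded()` to trigger proactive routing before the model fails outright.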
Recommended patterns:
- A policy-driven routing gateway that sits between the agent and all model endpoints and enforces the approved model registry.
- Pre-validation of every fallback model against the agent's task profile, performed in advance rather than at failover time.
- Ordered fallback chains that terminate in a graceful degradation path, such as a human approval queue or read-only mode.
- Proactive health monitoring (latency, error rate, output quality) so that routing occurs before complete failure.
- Canary routing for model transitions, with full cutover only after output quality is confirmed.
Anti-patterns to avoid:
- Silent automatic fallback with no governance checkpoint, logging, or notification (Scenario A).
- Fallback models with broader capability profiles than the primary (Scenario B).
- Routing decisions made by the agent's own reasoning rather than by an external policy layer.
- Continuing operation with an unvalidated model when no approved fallback exists, instead of entering the terminal safe state.
Financial Services. Model changes in trading and financial decision-making are subject to model risk management requirements (FCA SS1/23, Federal Reserve SR 11-7). A fallback model for a trading agent must have undergone the same model validation process as the primary model. The routing change must be reportable to the model risk management function. Trading operations may need to pause during model transitions to avoid transacting on a model with different risk characteristics.
Healthcare. Clinical AI models require regulatory approval (e.g., FDA 510(k), CE marking under MDR). A fallback model that has not received the same regulatory approval as the primary model cannot serve clinical requests. The terminal safe state for a clinical agent must route to a qualified human clinician rather than continuing with an unapproved model. Model routing changes must be recorded in the clinical audit trail.
Safety-Critical / CPS. In safety-critical systems, model fallback must be validated through the same functional safety process as the primary model (e.g., IEC 61508 SIL assessment). An unvalidated fallback model in a safety-critical application can introduce hazards that the safety case does not address. The terminal safe state must be a safe shutdown or human takeover, not continued autonomous operation.
Basic Implementation — A routing policy exists for each agent specifying the primary model and at least one fallback. Fallback models have been informally validated for the agent's task. Routing changes are logged. The terminal safe state is defined but may be "return error to user" rather than a graceful degradation path. Model health monitoring is basic (availability only, no quality metrics). Capability profile comparison between primary and fallback is manual and documented but not enforced at the routing layer.
Intermediate Implementation — Routing is managed by a policy-driven gateway that enforces the approved model registry. Fallback models have undergone formal validation equivalent to the primary model. The routing gateway enforces capability profile constraints — fallback models with broader capabilities are blocked. Model health monitoring includes latency, error rate, and availability, with automatic proactive routing. Routing changes trigger automated notifications to the governance function. The terminal safe state is a graceful degradation path (e.g., human queue, read-only mode).
Advanced Implementation — All intermediate capabilities plus: model quality monitoring detects output degradation before it causes harm (e.g., drift detection, hallucination rate monitoring). Canary routing validates fallback performance with a small traffic percentage before full cutover. The routing system has been independently tested for failure scenarios including simultaneous failure of primary and fallback models, model health metric manipulation, and routing policy injection. Formal safety analysis has verified that no routing scenario results in an ungoverned agent state.
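Canary routing, referenced in 4.9 and in the Advanced tier above, can be as simple as a weighted per-request choice, with full cutover deferred until the governance function confirms equivalent output quality. A minimal sketch; the 5% starting fraction is an illustrative assumption:

```python
# Minimal sketch of canary routing during a model transition (4.9).
# The starting fraction and promotion trigger are illustrative assumptions.
import random

class CanaryRouter:
    def __init__(self, current_model: str, candidate_model: str,
                 canary_fraction: float = 0.05):
        self.current_model = current_model
        self.candidate_model = candidate_model
        self.canary_fraction = canary_fraction  # start small, e.g. 5% of traffic

    def choose(self) -> str:
        """Route a single request; most traffic stays on the current model."""
        if random.random() < self.canary_fraction:
            return self.candidate_model
        return self.current_model

    def promote(self) -> None:
        """Full cutover, taken only after output quality is confirmed equivalent."""
        self.current_model = self.candidate_model
        self.canary_fraction = 0.0
```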
Required artefacts:
- The routing policy for each agent (primary model, ordered fallback list, trigger conditions, terminal safe state).
- The approved model registry, with validation records covering each model's task profile, capability boundaries, and safety characteristics.
- Routing change logs (timestamp, reason, source model, target model, approval verification result).
- Records of governance notifications issued for routing changes.
Retention requirements:
Access requirements:
Testing AG-163 compliance requires verification that routing is policy-governed, capability-constrained, and safely terminal. The following tests apply; an illustrative sketch covering Tests 8.2 and 8.4 appears after the list.
Test 8.1: Primary Model Failure Triggers Policy Routing
Test 8.2: Unapproved Model Rejection
Test 8.3: Capability Profile Constraint Enforcement
Test 8.4: Terminal Safe State Activation
Test 8.5: Routing Change Logging Completeness
Test 8.6: Proactive Routing on Degradation
Test 8.7: Governance Notification on Routing Change
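The sketch below illustrates Tests 8.2 and 8.4 against the hypothetical `resolve_model` helper sketched alongside the Section 4 requirements; it is a pattern for such tests, not a prescribed harness:

```python
# Illustrative pytest-style check for Tests 8.2 and 8.4, reusing the
# hypothetical ModelRecord / RoutingPolicy / resolve_model sketch above.
def test_unapproved_fallback_rejected_and_safe_state_activates():
    primary = ModelRecord("primary", validated=True,
                          capabilities=frozenset({"structured-data"}))
    unapproved = ModelRecord("shadow-model", validated=False,
                             capabilities=frozenset({"structured-data"}))
    policy = RoutingPolicy(primary=primary, fallbacks=[unapproved])

    # Simulate the primary being down while the unapproved fallback is up.
    probe = lambda model: model.model_id != "primary"

    # 8.2: the unapproved model must never be selected.
    # 8.4: with no approved model available, the resolver must signal the
    # terminal safe state (None, i.e. suspend and queue for humans).
    assert resolve_model(policy, probe) is None
```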
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement |
| FCA SS1/23 | Model Risk Management Principles | Direct requirement |
| Federal Reserve SR 11-7 | Supervisory Guidance on Model Risk Management | Direct requirement |
| DORA | Article 11 (ICT Third-Party Risk) | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Operation) | Supports compliance |
| NIST AI RMF | MAP 2.3, MANAGE 1.3 | Supports compliance |
| IEC 61508 | Functional Safety (for safety-critical deployments) | Supports compliance |
EU AI Act Article 15 requires that high-risk AI systems be resilient to errors, faults, and inconsistencies. Model failure is a fault; uncontrolled fallback is an error-amplifying response to that fault. AG-163 ensures that model-layer faults are handled in a controlled, governance-compliant manner, directly satisfying Article 15's robustness requirement. The requirement that fallback models have equal or narrower capabilities ensures that the system's cybersecurity posture does not degrade during fallback.
FCA SS1/23 sets expectations for firms' management of model risk, including the requirement that model changes are governed through a formal change process. Routing to a different model — even temporarily — is a model change. AG-163 ensures that this change is governed by the same standards as a permanent model change: the fallback must be pre-validated, the routing must be logged, and the governance function must be notified.
SR 11-7 defines model risk as the potential for adverse consequences from decisions based on incorrect or misused model outputs. Routing to an unvalidated fallback model during a primary model failure directly creates model risk. AG-163 mitigates this risk by requiring pre-validation of all fallback models and enforcing capability constraints that prevent the fallback from introducing new risk dimensions.
Under DORA, when models are provided by third parties (model-as-a-service), model failure is a third-party ICT risk event. AG-163's routing governance ensures that third-party model failures are handled within the organisation's risk management framework, with defined fallback paths that maintain governance compliance. The terminal safe state ensures that total third-party failure results in controlled suspension rather than ungoverned operation.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Agent-specific — but cascading if the agent operates within a multi-agent pipeline, where model routing failure in one agent can degrade the entire pipeline |
Consequence chain: Uncontrolled model routing creates a class of failure where the agent appears to operate normally but its underlying reasoning has changed fundamentally. The immediate risk is that outputs produced by an unvalidated fallback model may not meet the safety, accuracy, or governance standards applied to the primary model. In healthcare, this can result in patient harm from inappropriate clinical advice. In financial services, this can result in trading decisions based on a model with different risk characteristics. In customer-facing applications, this can result in harmful, biased, or legally non-compliant outputs.

The detection challenge is severe: because the agent's external interface remains unchanged, users and governance stakeholders may not realise the model has changed. The outputs appear to come from the same agent, through the same interface, in the same format — but the reasoning engine behind them is different. This "silent substitution" risk is the primary motivation for the notification and logging requirements. Without AG-163 governance, an organisation may discover weeks later that a significant portion of its AI-generated decisions were produced by an unvalidated model operating outside the governance framework.