Model Routing and Fallback Safety Governance requires that when an AI agent relies on an underlying model (LLM, ML model, or reasoning engine) and that model becomes unavailable, degraded, or untrustworthy, the system routes to a pre-approved fallback in a controlled manner — or halts safely. The routing decision must be governed by policy, not by the agent's own reasoning. Fallback models must be pre-validated for the agent's task profile. Routing must not silently change the agent's capability profile, risk characteristics, or governance posture. The system must never fall back to an unvalidated model, route to a model with broader capabilities than the primary, or continue operating when no safe fallback exists. This dimension ensures that model-layer failures do not cascade into governance failures.
Scenario A — Silent Fallback to Unvalidated Model Produces Harmful Outputs: A customer-facing agent for a healthcare organisation uses a specialised medical language model fine-tuned for clinical triage. The model provider experiences a 4-hour outage. The platform's automatic fallback routes to a general-purpose LLM that has not been validated for clinical use. During the outage, the agent continues responding to patient queries using the general-purpose model. The fallback model advises a patient with chest pain symptoms to "try resting and drinking water" instead of directing them to emergency services. The patient delays seeking treatment by 6 hours. The clinical governance team was not notified of the model switch.
What went wrong: The fallback was automatic and silent — no governance checkpoint validated whether the fallback model was appropriate for clinical use. The general-purpose model was not on the approved model list for clinical triage. No notification was sent to the clinical governance team. The agent's external behaviour appeared unchanged to users despite a fundamental change in its underlying capability. Consequence: Patient harm due to inappropriate clinical advice from an unvalidated model. Medical malpractice liability. Regulatory investigation by CQC (Care Quality Commission). Requirement to demonstrate that AI clinical tools operate within validated parameters at all times.
Scenario B — Fallback Model Has Broader Capabilities Than Primary: A financial services agent uses a model constrained to process only structured financial data — no code execution, no internet access, no creative text generation. The primary model becomes unavailable. The fallback model is a general-purpose LLM with code execution capabilities enabled. The routing layer switches to the fallback model. A prompt injection attack that was ineffective against the constrained primary model succeeds against the fallback because the fallback model's code execution capability allows the injected instruction to execute arbitrary code. The attacker exfiltrates API keys from the agent's environment.
What went wrong: The fallback model had a broader capability profile than the primary model. The routing decision did not evaluate whether the fallback's capabilities exceeded the agent's approved scope. The governance controls calibrated for the constrained primary model were insufficient for the unconstrained fallback. Consequence: API key exfiltration leading to £320,000 in unauthorised transactions via the compromised keys. Complete agent infrastructure rebuild required. Regulatory notification under DORA.
Scenario C — Governed Fallback with Graceful Degradation: An enterprise workflow agent processes invoice approvals using a primary model hosted on-premises. The routing policy specifies: if the primary model is unavailable for more than 60 seconds, route to Fallback Model A (a validated, capability-equivalent model hosted by a pre-approved cloud provider); if Fallback Model A is also unavailable, suspend agent operations and route all pending requests to a human approval queue. At 14:32, the primary model fails. At 14:33, the routing layer verifies that Fallback Model A is available and within the approved capability profile, switches to it, logs the routing change, and notifies the operations team. Processing continues. The operations team reviews the model performance comparison over the next 2 hours and confirms equivalent output quality. At 16:15, the primary model is restored and the routing layer switches back, logging the change.
What went right: The fallback was policy-governed with a defined sequence and a safe terminal state (human queue). The fallback model was pre-validated. The routing change was logged and notified. The capability profile was verified before routing. The system never operated without a validated model.
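To make the policy in Scenario C concrete, it can be written down declaratively. The following is a minimal sketch assuming a hypothetical schema; every field name and identifier (such as `onprem-invoice-model-v3`) is illustrative, not a format defined by this dimension:

```python
# Hypothetical encoding of Scenario C's routing policy. The schema is an
# illustrative assumption, not a format defined by this specification.
INVOICE_AGENT_ROUTING_POLICY = {
    "agent_id": "invoice-approval-agent",
    "primary": {
        "model_id": "onprem-invoice-model-v3",
        "unavailable_threshold_seconds": 60,    # trigger condition
    },
    "fallbacks": [
        {
            "model_id": "cloud-fallback-model-a",   # pre-approved provider
            "validated_for_task_profile": True,     # pre-validated
            "capability_profile": "equivalent-to-primary",
        },
    ],
    # Terminal safe state: suspend agent operations and route all pending
    # requests to a human approval queue.
    "terminal_safe_state": "suspend-and-route-to-human-queue",
    "notify_on_routing_change": ["operations-team"],
}
```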
Scope: This dimension applies to every AI agent that depends on an underlying model — whether a large language model, a machine learning classifier, a reasoning engine, or any other model component — for its decision-making or action generation. The scope includes agents using a single model, agents using multiple models for different sub-tasks, and agents using model ensembles. The scope extends to all model-layer components: the model itself, the model's hosting infrastructure, the model's API endpoint, and any intermediary services between the agent and the model. If the agent generates outputs, makes decisions, or takes actions based on model inference, this dimension applies.
4.1. A conforming system MUST define a routing policy for each agent specifying: the primary model, an ordered list of approved fallback models, the conditions under which routing to each fallback is triggered, and a terminal safe state when no approved fallback is available. (A sketch illustrating requirements 4.1–4.6 follows this list.)
4.2. A conforming system MUST ensure that every model in the routing policy (primary and fallback) has been validated for the agent's task profile, including capability boundaries, safety characteristics, and governance compliance.
4.3. A conforming system MUST prevent routing to any model not on the approved list for the agent's task profile, regardless of availability or performance characteristics.
4.4. A conforming system MUST verify that no fallback model has a broader capability profile than the primary model — fallback models must have equal or narrower capabilities.
4.5. A conforming system MUST log every routing change with: the timestamp, the reason for the change, the source model, the target model, and the verification result confirming the target model's approval status.
4.6. A conforming system MUST implement a terminal safe state — the action the system takes when no approved model is available — which defaults to suspending agent operations rather than continuing with an unvalidated model.
4.7. A conforming system SHOULD notify the designated governance function within 5 minutes of any routing change, including the reason and expected duration.
4.8. A conforming system SHOULD implement model health monitoring that detects degradation (increased latency, reduced quality, elevated error rates) before complete failure, enabling proactive routing rather than reactive failover.
4.9. A conforming system MAY implement gradual traffic shifting (canary routing) when transitioning between models, routing a small percentage of requests to the fallback model first to verify performance before full cutover.
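Requirements 4.1 through 4.6 together describe a deterministic resolution procedure: walk the approved chain in order, skip anything unvalidated or capability-expanding, log any change, and fall through to the terminal safe state. The sketch below is a minimal illustration under assumed types; `ModelRecord`, `RoutingPolicy`, and `resolve_model` are hypothetical names, and the availability probe is injected rather than specified:

```python
# Minimal sketch of policy-governed model resolution (requirements 4.1-4.6).
# All type names, fields, and the logging format are illustrative assumptions.
from collections.abc import Callable
from dataclasses import dataclass
import datetime
import logging

logger = logging.getLogger("routing-governance")

@dataclass(frozen=True)
class ModelRecord:
    model_id: str
    validated: bool                 # validated for the agent's task profile (4.2)
    capabilities: frozenset[str]    # e.g. {"structured-data"} vs {"code-execution"}

@dataclass
class RoutingPolicy:
    primary: ModelRecord
    fallbacks: list[ModelRecord]    # ordered, pre-approved fallback chain (4.1)

def resolve_model(
    policy: RoutingPolicy,
    is_available: Callable[[ModelRecord], bool],
) -> ModelRecord | None:
    """Return the model to route to, or None to signal the terminal safe state (4.6)."""
    for candidate in (policy.primary, *policy.fallbacks):
        if not candidate.validated:
            continue  # 4.3: never route to a model outside the approved list
        if not candidate.capabilities <= policy.primary.capabilities:
            continue  # 4.4: a fallback must have equal or narrower capabilities
        if is_available(candidate):
            if candidate is not policy.primary:
                # 4.5: log timestamp, reason, source, target, and approval status
                logger.warning(
                    "routing change ts=%s reason=primary-unavailable "
                    "source=%s target=%s target_approved=%s",
                    datetime.datetime.now(datetime.timezone.utc).isoformat(),
                    policy.primary.model_id,
                    candidate.model_id,
                    candidate.validated,
                )
            return candidate
    return None  # 4.6: no approved model available; suspend rather than improvise
```

Returning `None` rather than the "best remaining" model is deliberate: the caller must treat that value as a suspension signal and divert work to the degradation path, never retry with a wider net.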
AI agents are fundamentally dependent on their underlying models. The model is not just a component — it is the source of the agent's reasoning, judgment, and decision-making. When the model changes, the agent's behaviour changes. When the model fails, the agent's capability fails. When the model is replaced by a different model — even temporarily — the agent becomes a different agent in all aspects that matter for governance.
This creates a governance challenge that has no close precedent in traditional software. When a database fails over to a replica, the replica contains the same data and behaves identically. When a model fails over to a different model, the fallback may produce different outputs for the same inputs, may have different safety characteristics, and may have different vulnerability profiles. The routing decision is therefore a governance decision, not merely an infrastructure decision.
The critical insight is that model routing must be governed by the same standards as agent deployment. Deploying an agent with a new model requires validation. Routing an existing agent to a different model should require equivalent validation — performed in advance, not at the moment of failover. The pre-validation requirement ensures that when a routing change occurs under time pressure, the governance decision has already been made during a period of calm deliberation.
The capability profile constraint (requirement 4.4) addresses a subtle but critical risk: fallback models that are "better" in general capability terms may be worse from a governance perspective. A constrained model that cannot execute code, cannot access the internet, and cannot generate creative content has a smaller attack surface than a general-purpose model. Routing to the general-purpose model as a "fallback" expands the attack surface at the very moment when the system is under stress — exactly the wrong time to expand capabilities.
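One way to make "equal or narrower" mechanically checkable is to express capability profiles as sets and require the fallback's set to be a subset of the primary's. A minimal sketch using Scenario B's profiles; the capability names are illustrative, since a real taxonomy would be organisation-defined:

```python
# Illustrative capability-profile check for requirement 4.4.
# Capability names are assumptions; real taxonomies are organisation-defined.
PRIMARY = frozenset({"structured-financial-data"})

GENERAL_PURPOSE_FALLBACK = frozenset({
    "structured-financial-data",
    "code-execution",     # the capability exploited in Scenario B
    "internet-access",
    "creative-text-generation",
})

def fallback_is_permitted(fallback: frozenset, primary: frozenset) -> bool:
    """4.4: a fallback may have equal or narrower capabilities, never broader."""
    return fallback <= primary

assert not fallback_is_permitted(GENERAL_PURPOSE_FALLBACK, PRIMARY)  # blocked
assert fallback_is_permitted(frozenset(), PRIMARY)                   # narrower is fine
```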
The terminal safe state requirement (4.6) embodies the principle that no operation is better than ungoverned operation. An agent that suspends because no approved model is available creates a temporary operational gap. An agent that continues operating with an unvalidated model creates a governance gap that can produce lasting harm. The operational gap is recoverable; the governance gap may not be.
The routing governance system operates between the agent and the model layer. It intercepts all model inference requests, routes them according to policy, monitors model health, and enforces the approved model list.
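For the health-monitoring role, one simple approach (consistent with 4.8, though not mandated by it) is a sliding window over recent inference calls, with the model marked degraded once median latency or error rate crosses a threshold. A minimal sketch with illustrative thresholds:

```python
# Minimal sketch of degradation detection for proactive routing (4.8).
# The window size and both thresholds are illustrative assumptions.
from collections import deque

class ModelHealthMonitor:
    def __init__(self, window: int = 100,
                 max_median_latency_ms: float = 2000.0,
                 max_error_rate: float = 0.05):
        self.latencies: deque[float] = deque(maxlen=window)
        self.errors: deque[bool] = deque(maxlen=window)
        self.max_median_latency_ms = max_median_latency_ms
        self.max_error_rate = max_error_rate

    def record(self, latency_ms: float, error: bool) -> None:
        """Record the outcome of one inference call."""
        self.latencies.append(latency_ms)
        self.errors.append(error)

    def is_degraded(self) -> bool:
        """True once the window shows the model should be routed away from."""
        if len(self.latencies) < self.latencies.maxlen:
            return False  # not enough observations yet
        median = sorted(self.latencies)[len(self.latencies) // 2]
        error_rate = sum(self.errors) / len(self.errors)
        return median > self.max_median_latency_ms or error_rate > self.max_error_rate
```

A routing gateway would call `record()` after each inference and consult `is_degraded()` to trigger proactive routing before the model fails outright.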
Recommended patterns:
- A policy-driven routing gateway that sits between the agent and all model endpoints and enforces the approved model registry.
- Pre-validation of every fallback model against the agent's task profile, performed in advance rather than at failover time.
- Ordered fallback chains that terminate in a graceful degradation path, such as a human approval queue or read-only mode.
- Proactive health monitoring (latency, error rate, output quality) so that routing occurs before complete failure.
- Canary routing for model transitions, with full cutover only after output quality is confirmed.
Anti-patterns to avoid:
- Silent automatic fallback with no governance checkpoint, logging, or notification (Scenario A).
- Fallback models with broader capability profiles than the primary (Scenario B).
- Routing decisions made by the agent's own reasoning rather than by an external policy layer.
- Continuing operation with an unvalidated model when no approved fallback exists, instead of entering the terminal safe state.
Financial Services. Model changes in trading and financial decision-making are subject to model risk management requirements (FCA SS1/23, Federal Reserve SR 11-7). A fallback model for a trading agent must have undergone the same model validation process as the primary model. The routing change must be reportable to the model risk management function. Trading operations may need to pause during model transitions to avoid transacting on a model with different risk characteristics.
Healthcare. Clinical AI models require regulatory approval (e.g., FDA 510(k), CE marking under MDR). A fallback model that has not received the same regulatory approval as the primary model cannot serve clinical requests. The terminal safe state for a clinical agent must route to a qualified human clinician rather than continuing with an unapproved model. Model routing changes must be recorded in the clinical audit trail.
Safety-Critical / CPS. In safety-critical systems, model fallback must be validated through the same functional safety process as the primary model (e.g., IEC 61508 SIL assessment). An unvalidated fallback model in a safety-critical application can introduce hazards that the safety case does not address. The terminal safe state must be a safe shutdown or human takeover, not continued autonomous operation.
Basic Implementation — A routing policy exists for each agent specifying the primary model and at least one fallback. Fallback models have been informally validated for the agent's task. Routing changes are logged. The terminal safe state is defined but may be "return error to user" rather than a graceful degradation path. Model health monitoring is basic (availability only, no quality metrics). Capability profile comparison between primary and fallback is manual and documented but not enforced at the routing layer.
Intermediate Implementation — Routing is managed by a policy-driven gateway that enforces the approved model registry. Fallback models have undergone formal validation equivalent to the primary model. The routing gateway enforces capability profile constraints — fallback models with broader capabilities are blocked. Model health monitoring includes latency, error rate, and availability, with automatic proactive routing. Routing changes trigger automated notifications to the governance function. The terminal safe state is a graceful degradation path (e.g., human queue, read-only mode).
Advanced Implementation — All intermediate capabilities plus: model quality monitoring detects output degradation before it causes harm (e.g., drift detection, hallucination rate monitoring). Canary routing validates fallback performance with a small traffic percentage before full cutover. The routing system has been independently tested for failure scenarios including simultaneous failure of primary and fallback models, model health metric manipulation, and routing policy injection. Formal safety analysis has verified that no routing scenario results in an ungoverned agent state.
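Canary routing, referenced in 4.9 and in the Advanced tier above, can be as simple as a weighted per-request choice, with full cutover deferred until the governance function confirms equivalent output quality. A minimal sketch; the 5% starting fraction is an illustrative assumption:

```python
# Minimal sketch of canary routing during a model transition (4.9).
# The starting fraction and promotion trigger are illustrative assumptions.
import random

class CanaryRouter:
    def __init__(self, current_model: str, candidate_model: str,
                 canary_fraction: float = 0.05):
        self.current_model = current_model
        self.candidate_model = candidate_model
        self.canary_fraction = canary_fraction  # start small, e.g. 5% of traffic

    def choose(self) -> str:
        """Route a single request; most traffic stays on the current model."""
        if random.random() < self.canary_fraction:
            return self.candidate_model
        return self.current_model

    def promote(self) -> None:
        """Full cutover, taken only after output quality is confirmed equivalent."""
        self.current_model = self.candidate_model
        self.canary_fraction = 0.0
```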
Required artefacts:
- The routing policy for each agent (primary model, ordered fallback list, trigger conditions, terminal safe state).
- The approved model registry, with validation records covering each model's task profile, capability boundaries, and safety characteristics.
- Routing change logs (timestamp, reason, source model, target model, approval verification result).
- Records of governance notifications issued for routing changes.
Retention requirements:
Access requirements:
Testing AG-163 compliance requires verification that routing is policy-governed, capability-constrained, and safely terminal. The following tests apply; an illustrative sketch covering Tests 8.2 and 8.4 appears after the list.
Test 8.1: Primary Model Failure Triggers Policy Routing
Test 8.2: Unapproved Model Rejection
Test 8.3: Capability Profile Constraint Enforcement
Test 8.4: Terminal Safe State Activation
Test 8.5: Routing Change Logging Completeness
Test 8.6: Proactive Routing on Degradation
Test 8.7: Governance Notification on Routing Change
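The sketch below illustrates Tests 8.2 and 8.4 against the hypothetical `resolve_model` helper sketched alongside the Section 4 requirements; it is a pattern for such tests, not a prescribed harness:

```python
# Illustrative pytest-style check for Tests 8.2 and 8.4, reusing the
# hypothetical ModelRecord / RoutingPolicy / resolve_model sketch above.
def test_unapproved_fallback_rejected_and_safe_state_activates():
    primary = ModelRecord("primary", validated=True,
                          capabilities=frozenset({"structured-data"}))
    unapproved = ModelRecord("shadow-model", validated=False,
                             capabilities=frozenset({"structured-data"}))
    policy = RoutingPolicy(primary=primary, fallbacks=[unapproved])

    # Simulate the primary being down while the unapproved fallback is up.
    probe = lambda model: model.model_id != "primary"

    # 8.2: the unapproved model must never be selected.
    # 8.4: with no approved model available, the resolver must signal the
    # terminal safe state (None, i.e. suspend and queue for humans).
    assert resolve_model(policy, probe) is None
```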
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement |
| FCA SS1/23 | Model Risk Management Principles | Direct requirement |
| Federal Reserve SR 11-7 | Supervisory Guidance on Model Risk Management | Direct requirement |
| DORA | Article 11 (ICT Third-Party Risk) | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Operation) | Supports compliance |
| NIST AI RMF | MAP 2.3, MANAGE 1.3 | Supports compliance |
| IEC 61508 | Functional Safety (for safety-critical deployments) | Supports compliance |
EU AI Act Article 15 requires that high-risk AI systems be resilient to errors, faults, and inconsistencies. Model failure is a fault; uncontrolled fallback is an error-amplifying response to that fault. AG-163 ensures that model-layer faults are handled in a controlled, governance-compliant manner, directly satisfying Article 15's robustness requirement. The requirement that fallback models have equal or narrower capabilities ensures that the system's cybersecurity posture does not degrade during fallback.
FCA SS1/23 sets expectations for firms' management of model risk, including the requirement that model changes are governed through a formal change process. Routing to a different model — even temporarily — is a model change. AG-163 ensures that this change is governed by the same standards as a permanent model change: the fallback must be pre-validated, the routing must be logged, and the governance function must be notified.
SR 11-7 defines model risk as the potential for adverse consequences from decisions based on incorrect or misused model outputs. Routing to an unvalidated fallback model during a primary model failure directly creates model risk. AG-163 mitigates this risk by requiring pre-validation of all fallback models and enforcing capability constraints that prevent the fallback from introducing new risk dimensions.
Under DORA, when models are provided by third parties (model-as-a-service), model failure is a third-party ICT risk event. AG-163's routing governance ensures that third-party model failures are handled within the organisation's risk management framework, with defined fallback paths that maintain governance compliance. The terminal safe state ensures that total third-party failure results in controlled suspension rather than ungoverned operation.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Agent-specific — but cascading if the agent operates within a multi-agent pipeline, where model routing failure in one agent can degrade the entire pipeline |
Consequence chain: Uncontrolled model routing creates a class of failure where the agent appears to operate normally but its underlying reasoning has changed fundamentally. The immediate risk is that outputs produced by an unvalidated fallback model may not meet the safety, accuracy, or governance standards applied to the primary model. In healthcare, this can result in patient harm from inappropriate clinical advice. In financial services, this can result in trading decisions based on a model with different risk characteristics. In customer-facing applications, this can result in harmful, biased, or legally non-compliant outputs.

The detection challenge is severe: because the agent's external interface remains unchanged, users and governance stakeholders may not realise the model has changed. The outputs appear to come from the same agent, through the same interface, in the same format — but the reasoning engine behind them is different. This "silent substitution" risk is the primary motivation for the notification and logging requirements. Without AG-163 governance, an organisation may discover weeks later that a significant portion of its AI-generated decisions were produced by an unvalidated model operating outside the governance framework.