AG-275

Policy Simulation Sandbox Governance

Policy Semantics, Rule Engine & Control Logic · AGS v2.1 · April 2026
Tags: EU AI Act · FCA · NIST · ISO 42001

2. Summary

Policy Simulation Sandbox Governance requires that every policy change is tested against realistic scenarios in an isolated environment before production release, and that the sandbox environment is sufficiently representative to reveal behavioural differences that would occur in production. The sandbox is not a unit test environment (covered by AG-271) or a compilation verification step (covered by AG-270) — it is a full-fidelity simulation that replays real or realistic decision traffic through the proposed policy and compares the outcomes against the current production policy. This dimension mandates that the sandbox is isolated from production (no sandbox actions can affect real-world state), that the simulation uses representative data, and that the results are analysed before the policy change is approved for production.
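A minimal sketch of the replay-and-compare mechanism described above, assuming policies can be invoked as pure functions over recorded decision inputs (all names below are illustrative, not normative):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Mapping

# A policy is modelled as a pure function from a recorded decision input to an
# outcome label; this signature is an assumption for illustration only.
PolicyFn = Callable[[Mapping[str, Any]], str]

@dataclass
class ReplayResult:
    total: int = 0                                # decisions replayed
    changed: int = 0                              # outcomes differing under the proposal
    changes: list = field(default_factory=list)   # (input, old, new) triples

def replay_and_compare(traffic, current: PolicyFn, proposed: PolicyFn) -> ReplayResult:
    """Replay recorded decision traffic through both policies and collect divergences."""
    result = ReplayResult()
    for record in traffic:
        result.total += 1
        old, new = current(record), proposed(record)
        if old != new:
            result.changed += 1
            result.changes.append((record, old, new))
    return result
```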

3. Example

Scenario A — Policy Change Triggers Unexpected Mass Rejection: A customer-facing agent serves 80,000 loan applications per month. A policy update tightens the debt-to-income threshold from 45% to 40%. The team estimates this will reject an additional 3-5% of applications. The change is deployed to production without sandbox testing. In the first week, the rejection rate increases by 22% — not the expected 3-5%. Investigation reveals that the new threshold interacts with a pre-existing rule that counts student loan obligations differently for applicants under 30. The combined effect of the two rules is far larger than the threshold change alone.

What went wrong: The policy change was tested only in isolation (unit tests for the threshold change passed). No sandbox simulation replayed recent production traffic through the new policy to measure the actual impact. The rule interaction was invisible without production-scale data. Consequence: 1,760 additional rejected applications in the first week, reputational damage from social media complaints, emergency rollback, estimated revenue loss of £440,000.

Scenario B — Sandbox Leaks to Production: An enterprise workflow agent's sandbox environment shares a database connection pool with the production environment for "efficiency." A developer tests a policy that automatically escalates high-priority tickets. The sandbox test triggers 340 escalation notifications to real managers through the shared notification service. Managers respond to the escalations, creating confusion and wasted effort.

What went wrong: The sandbox was not isolated from production systems. Shared infrastructure meant that sandbox actions had real-world effects. Consequence: 340 false escalations, 12 hours of organisational disruption, loss of trust in the governance process, policy change delayed by 3 weeks while isolation is established.

Scenario C — Sandbox Uses Stale Data and Misses Population Shift: A financial-value agent's sandbox replays decision traffic from 6 months ago to test a new credit scoring policy. In the intervening 6 months, the customer population has shifted: a marketing campaign attracted a younger demographic with lower average credit scores. The sandbox shows the new policy rejecting 8% of applicants — within acceptable bounds. In production, the new policy rejects 14% because the current population has lower credit scores than the 6-month-old test data.

What went wrong: The sandbox used stale data that was not representative of the current production population. The simulation results were misleadingly optimistic. Consequence: Rejection rate 75% higher than predicted, customer complaints, regulatory scrutiny for potential fair lending impact on younger demographic.

4. Requirement Statement

Scope: This dimension applies to all AI agents governed by policy rules where policy changes can affect production decision outcomes. Any system where a policy change could alter the outcome of a decision for a real customer, counterparty, or operational process is within scope. Systems where policy changes have no effect on decision outcomes (e.g., cosmetic changes to policy documentation) are excluded. The scope extends to all types of policy changes: new rules, modified rules, removed rules, threshold changes, precedence changes, and jurisdiction-specific variants.

4.1. A conforming system MUST provide a sandbox environment where proposed policy changes can be evaluated against realistic decision traffic before production activation.

4.2. A conforming system MUST ensure complete isolation between the sandbox and production — no sandbox action can affect real-world state, including: no external API calls, no database writes to production data stores, no notifications to real users, and no financial transactions.
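A minimal sketch of the isolation pattern for one downstream dependency: a sandbox notification service that records every call and delivers nothing (class and method names are illustrative, not a prescribed interface):

```python
class SandboxNotificationService:
    """Drop-in stand-in for the production notifier: captures every call for
    later inspection and never contacts a real user."""

    def __init__(self):
        self.sent = []  # captured (recipient, message) pairs, never delivered

    def notify(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))  # record only; no real-world effect
```

Under this pattern the Scenario B leak cannot occur, because the sandbox never holds a handle to the production notification service.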

4.3. A conforming system MUST use representative data in the sandbox that reflects the current production population and decision distribution, with a maximum staleness of 30 days for the underlying data set.
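A minimal sketch of enforcing the 30-day ceiling before a simulation run (function and parameter names are illustrative):

```python
from datetime import datetime, timedelta, timezone

MAX_STALENESS = timedelta(days=30)  # ceiling from requirement 4.3

def assert_dataset_fresh(snapshot_taken_at: datetime) -> None:
    """Refuse to run a simulation against replay data older than 30 days."""
    age = datetime.now(timezone.utc) - snapshot_taken_at
    if age > MAX_STALENESS:
        raise RuntimeError(
            f"Replay data set is {age.days} days old; "
            f"requirement 4.3 caps staleness at 30 days."
        )
```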

4.4. A conforming system MUST produce a quantitative impact report comparing sandbox results under the proposed policy against results under the current production policy, showing: total decisions affected, percentage of decisions with changed outcomes, distribution of changes by category, and any new rule interactions detected.
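Building on the ReplayResult sketch in the Summary (field and key names remain illustrative), the quantitative report required by 4.4 can be assembled as:

```python
from collections import Counter

def impact_report(result) -> dict:
    """Summarise a ReplayResult into the fields requirement 4.4 asks for;
    category keys are illustrative outcome-transition labels."""
    by_category = Counter(f"{old}->{new}" for _, old, new in result.changes)
    return {
        "total_decisions": result.total,
        "decisions_changed": result.changed,
        "percent_changed": 100.0 * result.changed / max(result.total, 1),
        "changes_by_category": dict(by_category),  # e.g. {"approve->reject": 310}
    }
```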

4.5. A conforming system MUST require that the impact report is reviewed and approved by an authorised policy owner before the change is activated in production.

4.6. A conforming system SHOULD replay at least 10,000 recent production decisions (or the full decision volume if fewer than 10,000 in the reference period) through the sandbox, so that estimated impact rates carry acceptably small sampling error.
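To see what the 10,000 figure buys (a back-of-envelope check, not part of the requirement): under a normal approximation, the worst-case 95% margin of error on an estimated change rate at n = 10,000 is roughly one percentage point.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """95% normal-approximation margin of error for an estimated proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Worst case (p_hat = 0.5) at n = 10,000: about 0.98 percentage points.
print(round(100 * margin_of_error(0.5, 10_000), 2))  # 0.98
```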

4.7. A conforming system SHOULD detect and highlight rule interactions in the sandbox that do not appear in unit tests — cases where the outcome changes only because of how the modified rule interacts with other rules.
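One hedged detection strategy, assuming the rule engine can report the edited rule's verdict on a record in isolation (the rule_fires_* hooks below are assumptions, not a defined interface): an overall outcome flip that is not accompanied by a flip in the edited rule's own verdict is evidence that the change comes from rule interaction rather than from the edit itself.

```python
def find_interaction_effects(traffic, current_policy, proposed_policy,
                             rule_fires_current, rule_fires_proposed):
    """Flag decisions where the overall outcome flips even though the edited
    rule's own verdict on the record is unchanged, i.e. the flip arises from
    how the rule combines with the rest of the rule set (the Scenario A mode)."""
    interactions = []
    for record in traffic:
        outcome_changed = current_policy(record) != proposed_policy(record)
        rule_changed = rule_fires_current(record) != rule_fires_proposed(record)
        if outcome_changed and not rule_changed:
            interactions.append(record)
    return interactions
```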

4.8. A conforming system SHOULD provide a diff view showing, for each decision that changes, the old outcome, the new outcome, and the rules responsible for the change.
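A sketch of the diff-view record and a plain-text rendering (field names are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class DecisionDiff:
    """One row of the diff view required by 4.8."""
    decision_id: str
    old_outcome: str
    new_outcome: str
    responsible_rules: list = field(default_factory=list)  # rules whose evaluation changed

def render_diff(diffs) -> str:
    return "\n".join(
        f"{d.decision_id}: {d.old_outcome} -> {d.new_outcome} "
        f"(rules: {', '.join(d.responsible_rules) or 'n/a'})"
        for d in diffs
    )
```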

4.9. A conforming system MAY implement continuous sandbox shadowing — running the proposed policy in parallel with production (without affecting outcomes) and comparing results in real time for a defined observation period.
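A minimal sketch of the shadow pattern, assuming the production decision path can invoke both policies (names are illustrative):

```python
def shadow_evaluate(record, production_policy, shadow_policy, comparison_log):
    """Serve the production outcome unchanged; evaluate the proposed policy in
    parallel and log any divergence. Only the production result is returned,
    so the shadow run cannot affect real-world state (requirement 4.2)."""
    live = production_policy(record)
    shadow = shadow_policy(record)
    if live != shadow:
        comparison_log.append({"record": record, "live": live, "shadow": shadow})
    return live  # the production outcome stays authoritative throughout observation
```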

5. Rationale

Unit tests (AG-271) verify that individual rules and their interactions work correctly for defined test inputs. Compilation verification (AG-270) confirms that compiled rules match the source policy. Neither reveals the production-scale impact of a policy change — how many real decisions will change, which populations will be affected, and whether rule interactions at production scale produce unexpected emergent effects.

The sandbox addresses this gap by testing the proposed policy against realistic, representative decision traffic. The key insight is that policy changes are not like software deployments where the primary risk is a crash or error. Policy changes produce valid, well-formed decisions — but the decisions may be wrong or may have unacceptable distributional impact. A tighter threshold produces more rejections. A new rule interacts with existing rules in unexpected ways. These effects are invisible to unit tests because unit tests exercise predetermined inputs, not the actual distribution of inputs the system will encounter.

The isolation requirement (4.2) exists because sandbox leakage is a common and damaging failure. A sandbox that shares any infrastructure with production (notification services, payment gateways, communication channels) will eventually cause real-world effects during testing. The consequences range from annoyance (false notifications) to catastrophic (real payments processed under test policy). True isolation means the sandbox has its own copy of every downstream service, or uses mock services that discard all outputs.

The data freshness requirement (4.3) addresses the stale data problem in Scenario C. A sandbox that replays 6-month-old data will miss population shifts, seasonal patterns, and changes in the input distribution. The 30-day maximum staleness ensures that the simulation reflects approximately current conditions, while allowing time for data preparation and anonymisation.

6. Implementation Guidance

Recommended patterns: replay recent, representative production traffic at full fidelity; give the sandbox its own copy of every downstream service, or mock services that record and discard all outputs; run shadow mode for high-impact changes; stratify replay data by population segment so distributional effects are visible.

Anti-patterns to avoid: sharing any infrastructure (connection pools, notification services, payment gateways) between sandbox and production (Scenario B); replaying stale traffic that no longer reflects the current population (Scenario C); treating passing unit tests as evidence of acceptable production-scale impact (Scenario A).

Industry Considerations

Financial Services. The FCA expects firms to test changes to automated decision systems before deployment. For lending decisions, sandbox testing should include fair lending analysis: does the proposed policy change disproportionately affect any protected characteristic group? Replay data should be stratified by demographic attributes to detect disparate impact. The PRA's SS1/23 expects pre-deployment validation of model changes, which extends to policy rule changes.
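One hedged sketch of a stratified screen over the replay output, using the conventional four-fifths rule as an example threshold (the thresholds, names, and the "reject" label are assumptions about the decision schema, and UK Equality Act practice applies its own tests):

```python
from collections import defaultdict

def selection_rate_ratios(outcomes, group_of, adverse="reject"):
    """Compute each group's selection rate (non-adverse share) under the
    proposed policy, divided by the highest group's rate. `outcomes` is an
    iterable of (record, outcome) pairs; `group_of` maps a record to its
    demographic stratum. Ratios below ~0.8 conventionally trigger fair
    lending / equalities review."""
    totals, selected = defaultdict(int), defaultdict(int)
    for record, outcome in outcomes:
        g = group_of(record)
        totals[g] += 1
        if outcome != adverse:
            selected[g] += 1
    rates = {g: selected[g] / totals[g] for g in totals}
    best = max(rates.values()) or 1.0  # guard against an all-adverse replay
    return {g: rate / best for g, rate in rates.items()}
```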

Healthcare. Clinical decision support policy changes should be simulated against patient populations to measure the clinical impact: how many patients would receive different recommendations, and what is the clinical significance? Clinical safety review should be informed by sandbox results, not just rule-level analysis.

Critical Infrastructure. Safety-critical policy changes should be simulated against historical operational data including alarm conditions, fault scenarios, and boundary conditions. The simulation should include stress scenarios that test the policy under adverse conditions, not just normal operations.

Maturity Model

Basic Implementation — A sandbox environment exists, isolated from production. Policy changes are replayed against a sample of recent decision data before deployment. A quantitative impact report is generated and reviewed. The sandbox uses mock downstream services. Data is refreshed at least monthly.

Intermediate Implementation — Production traffic replay uses at least 10,000 decisions with full anonymisation. The impact dashboard includes population segment breakdowns. Anomalous rule interactions are flagged automatically. Continuous shadow mode is available for high-impact changes. The impact report is formally approved by a policy owner before activation. Fair lending / equalities impact analysis is included for customer-facing decisions.

Advanced Implementation — All intermediate capabilities plus: continuous shadow mode for all policy changes with a minimum 48-hour observation period. Real-time comparison between shadow and production outcomes. Automated regression detection flags when the shadow policy produces worse outcomes on any monitored metric. The sandbox environment is independently verified for isolation completeness. Formal approval workflow requires sign-off from compliance, risk, and the policy owner before activation.

7. Evidence Requirements

Required artefacts: the quantitative impact report for each policy change (4.4); the approval record identifying the authorised policy owner (4.5); the replay data set's snapshot date, evidencing the 30-day freshness bound (4.3); and isolation verification results for the sandbox environment (Test 8.1).

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Sandbox Isolation Verification
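The test procedures are not reproduced here. As an illustrative shape only for Test 8.1 (make_sandbox, sample_traffic, and the asserted attributes are hypothetical helpers, not a defined API):

```python
def test_sandbox_isolation():
    # Hypothetical helpers: build a fully mocked environment and a replay batch.
    sandbox = make_sandbox()
    sandbox.replay(sample_traffic())
    # Requirement 4.2: every side-effect channel must have delivered nothing real.
    assert sandbox.notifier.sent_externally == []   # no notifications to real users
    assert sandbox.payments.submitted == []         # no financial transactions
    assert sandbox.production_db.writes == []       # no production data store writes
    assert sandbox.external_api.calls == []         # no external API calls
```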

Test 8.2: Impact Prediction Accuracy

Test 8.3: Mandatory Simulation Gate

Test 8.4: Data Freshness Enforcement

Test 8.5: Anomalous Interaction Detection

Test 8.6: Approval Workflow Enforcement

Test 8.7: Shadow Mode Accuracy

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Direct requirement
EU AI Act | Article 15 (Accuracy, Robustness, Cybersecurity) | Supports compliance
FCA | SYSC 6.1.1R (Systems and Controls) | Direct requirement
PRA SS1/23 | Model Risk Management — Pre-deployment Validation | Direct requirement
NIST AI RMF | MEASURE 2.5, MANAGE 2.2 | Supports compliance
ISO 42001 | Clause 8.4 (AI System Development) | Supports compliance
Equality Act 2010 / ECOA | Disparate Impact Testing | Supports compliance

EU AI Act — Article 9 (Risk Management System)

Article 9 requires that risk management measures be tested "with a view to identifying the most appropriate risk management measures." Sandbox simulation is the testing mechanism for policy changes — it reveals whether a proposed policy change achieves its intended effect without unacceptable side effects. The requirement to test against realistic scenarios (not just unit tests) maps directly to the sandbox's purpose.

PRA SS1/23 — Pre-deployment Validation

The PRA's supervisory statement expects firms to validate model changes before deployment. For policy rule changes, the sandbox provides the pre-deployment validation environment. The PRA expects validation to use representative data and to measure the quantitative impact of the change. Sandbox simulation directly implements this expectation.

Equality Act 2010 / ECOA — Disparate Impact Testing

For customer-facing decisions, policy changes must be tested for disparate impact on protected characteristic groups. The sandbox provides the environment for this testing by replaying production decisions (stratified by demographic attributes) through the proposed policy. A policy change that disproportionately affects a protected group triggers fair lending or equalities review.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | All decisions affected by the untested policy change — potentially all production decisions

Consequence chain: Without sandbox simulation, policy changes are deployed based on the policy author's judgment and unit test results. The immediate technical failure is a policy change that produces unexpected effects at production scale — higher rejection rates, disparate impact on population segments, or emergent rule interactions. The operational impact is felt immediately: in Scenario A, 1,760 additional rejections in the first week. The business consequence includes customer harm (legitimate applicants rejected), reputational damage (social media complaints), and revenue loss (the organisation estimated £440,000 in lost revenue from the over-rejection). The regulatory consequence depends on the nature of the unexpected effect: if the policy change disproportionately affects a protected group, the consequence is an equality or fair lending investigation; if the change causes systematic non-compliance, the consequence is a regulatory enforcement action. If the sandbox leaks to production (Scenario B), the consequence is operational disruption and loss of trust in the governance process. The compounding factor is that policy changes without sandbox testing accumulate: each untested change adds risk, and the eventual production failure may result from the interaction of multiple untested changes.

Cross-references: AG-270 (Policy Compilation Verification Governance) verifies that compiled policy matches the source; AG-275 extends verification to production-scale impact. AG-271 (Rule-Test Coverage Governance) provides unit-level test coverage; AG-275 complements this with integration-level testing against realistic data. AG-269 (Policy Version Pinning Governance) ensures that sandbox results are linked to specific policy versions. AG-273 (Temporal Policy Trigger Governance) introduces time-dependent policy changes that should be simulated in the sandbox. AG-274 (Geographic Policy Trigger Governance) introduces jurisdiction-specific variants that require jurisdiction-stratified sandbox testing. AG-278 (Policy Hot-Patch Rollback Governance) provides the rollback mechanism when sandbox testing is bypassed for emergency changes. AG-134 (Machine-Checkable Policy Semantics) enables automated comparison of outcomes. AG-138 (High-Assurance Invariant Verification) provides formal verification of sandbox isolation properties. AG-007 (Governance Configuration Control) governs changes to the sandbox environment configuration. AG-136 (Independent Control-Plane Separation) supports the isolation requirement.

Cite this protocol
AgentGoverning. (2026). AG-275: Policy Simulation Sandbox Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-275