Policy Compilation Verification Governance requires that the machine-executable policy artefact (compiled rules, decision tables, executable rule engine configuration) is verified to faithfully represent the approved human-readable policy text before it is activated for production decision-making. The compilation step — where human-authored policy is translated into machine-executable form — is a critical transformation that can introduce semantic drift, omission, inversion, or unintended side effects. This dimension mandates that the compilation output is independently verified against the source policy through formal equivalence checking, differential testing, or structured review, and that no compiled policy enters production without passing verification.
Scenario A — Negation Inversion During Policy Translation: An organisation's compliance team approves a policy rule: "Customers with a credit score below 580 MUST NOT be offered unsecured lending products." A developer translates this into the rule engine as: if credit_score < 580 then offer_unsecured = true. The negation is inverted — the compiled rule offers unsecured products to exactly the population that should be excluded. The rule engine processes 3,200 applications over two weeks before a manual audit discovers the error; 847 customers with credit scores below 580 received unsecured lending offers.
What went wrong: The translation from human policy to machine rule introduced a logical inversion. No verification step compared the compiled rule's behaviour against the source policy's intent. Consequence: 847 non-compliant lending offers, potential FCA enforcement for irresponsible lending, estimated redress cost of £1.8 million, reputational damage.
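The inversion in Scenario A is exactly the class of error that differential testing (see 4.6) catches before activation. The following Python sketch is illustrative — the function names, threshold, and test inputs are assumptions, not any real system's implementation:

```python
# Hypothetical differential test comparing a compiled rule against the
# approved policy intent. All names and values are illustrative.

def approved_policy(credit_score: int) -> bool:
    """Source policy intent: scores below 580 MUST NOT be offered unsecured lending."""
    return not (credit_score < 580)

def compiled_rule(credit_score: int) -> bool:
    """The buggy compiled rule from Scenario A, with the negation inverted."""
    return credit_score < 580  # BUG: offers to exactly the excluded population

def differential_test(inputs):
    """Return every input where the compiled output diverges from policy intent."""
    return [s for s in inputs if approved_policy(s) != compiled_rule(s)]

# Boundary-focused inputs: values around the threshold plus extremes.
samples = [300, 579, 580, 581, 850]
print(differential_test(samples))  # all five diverge: the inversion flips every outcome
```

Because a negation inversion flips every outcome, even a handful of boundary-value inputs exposes it immediately — the divergence rate is 100%, not a rare edge case.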
Scenario B — Threshold Precision Loss in Rule Compilation: A healthcare policy states: "Patients with eGFR below 30 mL/min/1.73m² require specialist referral before prescribing metformin." The developer implements the threshold as an integer comparison, if eGFR < 30, with the incoming laboratory value rounded to the nearest integer first. A patient with eGFR of 29.7 mL/min/1.73m² is not flagged for referral: the engine rounds the value to 30, which fails the strict comparison, and the clinician's display also shows "30", so the clinician believes no referral is required. In a different edge case, a code path that implements the check as "at or below 30" (eGFR <= 30) refers a patient with eGFR of exactly 30.0, when the policy threshold is "below 30" (strictly less than), not "at or below 30."
What went wrong: The compilation introduced ambiguity around the boundary condition. The source policy specified "below 30" but the implementation's type handling created inconsistent boundary behaviour depending on whether the value was rounded before comparison or processed as raw. Consequence: Clinical safety incident, potential harm to patients, mandatory incident report to regulatory body, trust suspension pending review.
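The boundary inconsistency can be demonstrated in a few lines. This Python sketch is illustrative (the function names and the rounding behaviour are assumptions about how such a bug might look, not the actual implementation):

```python
# Hypothetical sketch: rounding before a strict comparison shifts the
# effective boundary of a "below 30" threshold. Values are illustrative.

def requires_referral_buggy(egfr: float) -> bool:
    # BUG: rounding to the nearest integer before comparing moves the boundary
    return round(egfr) < 30

def requires_referral_correct(egfr: float) -> bool:
    # Policy text "below 30" means strictly less than, applied to the raw value
    return egfr < 30.0

for egfr in (29.4, 29.7, 30.0):
    print(egfr, requires_referral_buggy(egfr), requires_referral_correct(egfr))
# 29.7 diverges: round(29.7) == 30, so the buggy rule skips the referral
# even though the raw value is below the policy threshold
```

A boundary-value test suite that probes values just below, at, and just above every threshold — in the raw type the data arrives in, not the displayed type — surfaces this class of error deterministically.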
Scenario C — Silent Rule Omission During Bulk Policy Migration: An organisation migrates 340 policy rules from an Excel-based rule matrix to a new rule engine. The migration script successfully imports 338 rules. Two rules — governing maximum exposure limits for cryptocurrency counterparties — fail silently due to a special character in the rule description field. The absence is not detected because the migration validation checks only that imported rules are syntactically valid, not that the set of imported rules matches the set of source rules. The missing rules are discovered 8 months later during annual policy review.
What went wrong: The migration process validated the syntactic correctness of imported rules but did not verify completeness — that every source rule had a corresponding compiled rule. Consequence: 8 months of uncontrolled cryptocurrency counterparty exposure, regulatory finding for inadequate systems and controls, £340,000 in excess exposure identified during remediation.
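The completeness check that Scenario C's migration lacked is a set comparison in both directions. A minimal sketch, with invented rule identifiers:

```python
# Hypothetical completeness check: every approved rule must have a compiled
# counterpart, and no compiled rule may lack an approved source. IDs illustrative.

approved_rules = {"R-001", "R-002", "R-339", "R-340"}   # from the source rule matrix
compiled_rules = {"R-001", "R-002"}                     # actually imported into the engine

missing = approved_rules - compiled_rules        # silently dropped during migration
unauthorised = compiled_rules - approved_rules   # compiled with no approved source

assert not unauthorised, f"unapproved rules in engine: {unauthorised}"
print(sorted(missing))  # ['R-339', 'R-340'] -- the two silently dropped rules
```

Had the migration validation included this check alongside its syntax validation, the two missing rules would have surfaced at import time rather than eight months later.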
Scope: This dimension applies to all AI agents whose behaviour is governed by policy rules that undergo a compilation, translation, or transformation step between the human-approved form and the machine-executable form. This includes: manual coding of policy rules into a rule engine, automated generation of decision tables from policy documents, transpilation of policy DSLs into executable code, migration of rules between systems, and any process where the human-readable policy and the machine-executable policy are not the same artefact. Systems where policy is directly authored in its executable form (e.g., a rule engine with a visual editor that is also the approval artefact) are within scope for completeness verification but may have reduced equivalence risk. The scope extends to policy updates — every change to compiled policy must be re-verified, not just the initial compilation.
4.1. A conforming system MUST verify that the machine-executable policy artefact semantically matches the approved human-readable policy before the compiled version is activated for production use.
4.2. A conforming system MUST verify completeness — that every rule in the approved policy has a corresponding rule in the compiled artefact, and that no compiled rule exists without a corresponding approved rule.
4.3. A conforming system MUST verify boundary conditions — that threshold values, comparison operators (less than vs. less than or equal to), and type handling (integer vs. float, inclusive vs. exclusive ranges) match the approved policy specification.
4.4. A conforming system MUST block activation of any compiled policy that has not passed verification, defaulting to the previously verified version.
4.5. A conforming system MUST retain the verification evidence (test results, equivalence proof, or review record) alongside the compiled artefact's version identifier per AG-269.
4.6. A conforming system SHOULD implement automated verification using differential testing: generate a set of test inputs (including boundary cases), evaluate them against both the source policy specification and the compiled artefact, and compare outputs.
4.7. A conforming system SHOULD implement a bidirectional traceability matrix mapping each approved rule to its compiled implementation and vice versa.
4.8. A conforming system MAY implement formal equivalence checking where the policy language and rule engine support it, providing mathematical proof that the compilation preserves semantics.
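The bidirectional traceability matrix of 4.7 reduces to verifying that the forward and reverse mappings are exact inverses. A sketch with invented rule identifiers and an assumed flat data shape:

```python
# Hypothetical bidirectional traceability check (4.7): each approved rule maps
# to exactly one compiled implementation and vice versa. All names illustrative.

approved_to_compiled = {
    "POL-1": "rule_credit_floor",
    "POL-2": "rule_exposure_cap",
}
compiled_to_approved = {
    "rule_credit_floor": "POL-1",
    "rule_exposure_cap": "POL-2",
}

def trace_is_bidirectional(fwd: dict, rev: dict) -> bool:
    """True iff the forward and reverse mappings are exact inverses of each other."""
    return (all(rev.get(c) == a for a, c in fwd.items())
            and all(fwd.get(a) == c for c, a in rev.items()))

print(trace_is_bidirectional(approved_to_compiled, compiled_to_approved))  # True
```

Run as a gate in the compilation pipeline, this check enforces both halves of 4.2: a missing compiled rule breaks the forward mapping, and an unapproved compiled rule breaks the reverse one.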
The compilation step — translating human policy into machine-executable form — is one of the highest-risk transformations in the AI governance pipeline. Unlike software bugs that may affect edge cases, a compilation error in policy can systematically affect every decision the agent makes. A negation inversion affects 100% of relevant decisions. A threshold error affects all boundary cases. An omitted rule removes a control entirely.
The risk is amplified by the fact that policy compilation errors are often invisible in normal operation. The agent continues to make decisions that appear well-formed — they have the right structure, they reference the right data fields, they produce outputs in the expected format. Only the decision logic is wrong. This means compilation errors tend to persist until they are discovered by external events: a customer complaint, a regulatory audit, or an incident investigation.
Human review alone is insufficient for verification because the compiled artefact is typically in a different language and representation from the source policy. A compliance officer who approved the English-language policy may not be able to verify that the Rego, Drools, or Python implementation faithfully represents it. Automated verification — through differential testing, formal equivalence, or structured test suites — provides assurance that does not depend on a single reviewer's ability to read two different representations simultaneously.
The completeness requirement (4.2) addresses a particularly insidious failure mode: the silent omission. When a rule fails to compile or migrate, the absence is invisible unless the verification process explicitly checks for it. A system that validates only that compiled rules are correct but does not verify that all rules are present will pass verification with missing rules.
Recommended patterns: differential testing with boundary-value inputs for every threshold (4.6); a bidirectional traceability matrix checked automatically on each compilation (4.7); a verification gate that blocks activation on any failure and falls back to the previously verified version (4.4); mutation testing to confirm the verification suite detects injected compilation errors; formal equivalence checking where the policy language supports it (4.8).
Anti-patterns to avoid: manual translation with no independent verification (Scenario A); validating only the syntactic correctness of compiled rules without checking completeness against the source set (Scenario C); implicit rounding or type coercion at threshold boundaries (Scenario B); relying on a single reviewer to compare two different representations of the same policy; treating re-compilation after a policy update as exempt from re-verification.
Financial Services. Regulatory rules (e.g., MiFID II suitability, PSD2 strong customer authentication, anti-money laundering thresholds) are translated from regulatory text to executable rules. Compilation errors in these rules directly violate regulatory requirements. The FCA expects firms to demonstrate that automated controls faithfully implement the regulatory obligation. A compilation error that causes systematic non-compliance is a control failure, not a policy failure.
Healthcare. Clinical decision support rules are derived from clinical guidelines (e.g., NICE guidelines, FDA-approved prescribing information). A compilation error in a dosage calculation or contraindication check can directly harm patients. The verification requirement should include clinical validation by a qualified clinician in addition to automated testing, per established clinical safety standards (e.g., DCB 0129 in the UK).
Critical Infrastructure. Safety-critical policy parameters (e.g., pressure limits, temperature thresholds, flow rate boundaries) are compiled from safety cases into operational control logic. Compilation errors can create conditions outside the safety envelope. Verification should include hardware-in-the-loop testing against the actual control system, not just simulation.
Basic Implementation — Policy compilation is performed by a developer and reviewed by a second developer. A manual traceability matrix maps source rules to compiled rules. Basic functional tests verify that the compiled rules produce expected outputs for a sample of inputs. Verification results are documented in a review record.
Intermediate Implementation — An automated differential test suite generates boundary-value test inputs for every rule and compares compiled outputs against a reference specification. A bidirectional traceability matrix is maintained in a structured format (not a spreadsheet) and checked automatically for completeness. The compilation pipeline blocks deployment on verification failure. Verification artefacts are stored alongside the compiled version's content hash.
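Storing verification artefacts against the compiled version's content hash can be sketched in a few lines. The field names and artefact bytes below are invented for illustration, not a prescribed evidence format:

```python
# Hypothetical sketch: pin verification evidence to the compiled artefact's
# content hash (per 4.5 / AG-269), so evidence silently invalidates if the
# artefact's bytes change. Field names and content are illustrative.
import hashlib

compiled_artefact = b"if credit_score < 580: deny_unsecured()"
evidence = {
    "artefact_sha256": hashlib.sha256(compiled_artefact).hexdigest(),
    "verification": "differential-test",
    "result": "pass",
}

def evidence_matches(artefact: bytes, record: dict) -> bool:
    """Evidence is only valid for the exact bytes it was produced against."""
    return hashlib.sha256(artefact).hexdigest() == record["artefact_sha256"]

print(evidence_matches(compiled_artefact, evidence))                  # True
print(evidence_matches(compiled_artefact + b" # edited", evidence))   # False
```

Binding evidence to a content hash rather than a version label means any post-verification modification, however small, forces re-verification before the gate will pass.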
Advanced Implementation — All intermediate capabilities plus: formal equivalence checking where the policy language supports it. Mutation testing to verify that the test suite detects injected compilation errors (e.g., inverting a negation, shifting a threshold by 1). Continuous verification — the compiled policy is re-verified against the source specification on a scheduled basis (not just at compilation time) to detect drift. Independent third-party verification for safety-critical and regulatory-critical policy sets.
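The mutation-testing capability described above can be sketched as follows; the reference rule, the injected mutants, and the test inputs are all illustrative assumptions:

```python
# Hypothetical mutation-testing sketch: inject known compilation-error types
# and confirm the differential suite detects ("kills") each one.

def policy(score: int) -> bool:
    """Reference specification: approve only when score >= 580."""
    return score >= 580

def compiled_ok(score: int) -> bool:
    """A faithful compilation of the policy."""
    return score >= 580

# Mutants simulating the compilation-error types named in the text:
mutants = {
    "inverted_negation":    lambda s: not (s >= 580),
    "threshold_off_by_one": lambda s: s >= 581,
    "boundary_operator":    lambda s: s > 580,  # >= silently became >
}

boundary_inputs = [578, 579, 580, 581, 582]

def suite_detects(rule) -> bool:
    """The suite kills a mutant if any input exposes a divergence from policy."""
    return any(policy(s) != rule(s) for s in boundary_inputs)

assert not suite_detects(compiled_ok)                    # correct compilation passes
print(all(suite_detects(m) for m in mutants.values()))   # True: every mutant killed
```

A suite that fails to kill an injected mutant has a coverage gap: a real compilation error of that type would also pass verification undetected.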
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Negation Inversion Detection — invert a comparison operator in a compiled rule (e.g., != to ==), then run the verification pipeline and confirm the inversion is detected and activation is blocked.
Test 8.2: Threshold Boundary Accuracy
Test 8.3: Completeness Verification
Test 8.4: Unauthorised Rule Detection
Test 8.5: Verification Gate Enforcement
Test 8.6: Re-verification on Change
Test 8.7: Type Handling Consistency
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 17 (Quality Management System) | Direct requirement |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| SOX | Section 404 (Internal Controls) | Supports compliance |
| NIST AI RMF | MAP 2.3, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Development) | Supports compliance |
| IEC 62443 | SR 3.4 (Software and Information Integrity) | Supports compliance |
Article 9 requires risk management measures to be tested to ensure they work as intended. Policy compilation verification is a risk management measure that ensures the translation from approved policy to executable rules does not introduce errors. The requirement for testing risk management measures maps directly to the automated verification requirement — the organisation must demonstrate that its compilation verification process detects the types of errors it is designed to catch.
Article 17 requires a quality management system covering the development process for high-risk AI systems. Policy compilation is a development activity. Compilation verification is a quality control within that development process. The quality management system must include procedures for ensuring that policy rules are correctly translated, tested, and traceable to their approved source.
The FCA expects firms to maintain controls sufficient to ensure compliance. For AI agents governed by compiled policy rules, this means demonstrating that the compiled rules faithfully implement the approved policy. A compilation error that causes systematic non-compliance is a control failure. The FCA would expect to see evidence of verification testing, traceability, and gate enforcement.
For AI agents executing financial operations, the compilation of policy rules into executable form is an internal control activity. SOX Section 404 requires management to assess the effectiveness of internal controls. Compilation verification provides the evidence that this specific control — the translation of policy to executable form — is effective.
Clause 8.4 addresses the development and modification of AI systems. Policy compilation is a system modification activity. Compilation verification implements the quality check required for modifications to AI system behaviour, ensuring that changes to policy rules are correctly reflected in system behaviour.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | All decisions governed by the incorrectly compiled policy — potentially organisation-wide |
Consequence chain: A compilation error in policy creates a systematic failure affecting every decision that invokes the affected rule. Unlike random errors that affect individual decisions, a compilation error is deterministic — it produces the wrong outcome every time the conditions are met. A negation inversion in a lending eligibility rule affects every applicant in the affected category. A threshold error in a safety parameter affects every operating condition near the boundary. The immediate technical failure is divergence between the approved policy and the executed policy. The operational impact is systematic non-compliance or systematic safety violation at machine speed — 50,000 decisions per day applying the wrong rule. The regulatory consequence depends on the domain: in financial services, systematic non-compliance with lending rules can trigger redress requirements for all affected customers (the FCA's 2022 redress orders for systematic consumer credit failures averaged £12.4 million); in healthcare, clinical decision support errors can cause direct patient harm and trigger mandatory safety incident reporting; in safety-critical domains, incorrect operating parameters can cause physical damage or loss of life. The failure is amplified by the fact that compilation errors are invisible in normal operation — the system appears to function correctly while systematically applying the wrong rules.
Cross-references: AG-134 (Machine-Checkable Policy Semantics) provides the formal policy language that enables automated compilation verification. AG-269 (Policy Version Pinning Governance) ensures that the verified compiled artefact is immutably identified so that verification results remain valid. AG-135 (Policy Precedence and Conflict Arbitration) defines precedence rules that the compilation must preserve. AG-271 (Rule-Test Coverage Governance) extends compilation verification to ongoing test coverage of the deployed rule set. AG-275 (Policy Simulation Sandbox Governance) provides the environment for pre-production verification testing. AG-138 (High-Assurance Invariant Verification) provides formal verification techniques applicable to critical compilation verification. AG-007 (Governance Configuration Control) governs the change control process for compiled policy artefacts.