AG-270

Policy Compilation Verification Governance

Policy Semantics, Rule Engine & Control Logic · ~15 min read · AGS v2.1 · April 2026
Tags: EU AI Act · SOX · FCA · NIST · ISO 42001

2. Summary

Policy Compilation Verification Governance requires that the machine-executable policy artefact (compiled rules, decision tables, executable rule engine configuration) is verified to faithfully represent the approved human-readable policy text before it is activated for production decision-making. The compilation step — where human-authored policy is translated into machine-executable form — is a critical transformation that can introduce semantic drift, omission, inversion, or unintended side effects. This dimension mandates that the compilation output is independently verified against the source policy through formal equivalence checking, differential testing, or structured review, and that no compiled policy enters production without passing verification.

3. Example

Scenario A — Negation Inversion During Policy Translation: An organisation's compliance team approves a policy rule: "Customers with a credit score below 580 MUST NOT be offered unsecured lending products." A developer translates this into the rule engine as: if credit_score < 580 then offer_unsecured = true. The negation is inverted — the compiled rule offers unsecured products to exactly the population that should be excluded. The rule engine processes 3,200 applications over two weeks before a manual audit discovers the error. 847 customers with credit scores below 580 received unsecured lending offers.

What went wrong: The translation from human policy to machine rule introduced a logical inversion. No verification step compared the compiled rule's behaviour against the source policy's intent. Consequence: 847 non-compliant lending offers, potential FCA enforcement for irresponsible lending, estimated redress cost of £1.8 million, reputational damage.
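Caught at compilation time, an inversion of this kind is trivial to detect. A minimal differential-testing sketch in Python (function names, field names, and test inputs are illustrative, not taken from the incident):

```python
# Hypothetical sketch: the approved policy says scores below 580 MUST NOT
# receive unsecured offers. A reference oracle encodes that intent directly.

def reference_policy(credit_score):
    """Oracle derived from the approved policy text."""
    return credit_score >= 580  # offer permitted only at 580 or above

def compiled_rule_buggy(credit_score):
    """The mistranslated rule: negation inverted."""
    return credit_score < 580   # offers to exactly the excluded population

def differential_test(compiled, oracle, inputs):
    """Return every input where compiled behaviour diverges from the oracle."""
    return [x for x in inputs if compiled(x) != oracle(x)]

# Boundary-focused inputs expose the inversion immediately: every single
# input diverges, which is the signature of an inverted condition.
mismatches = differential_test(compiled_rule_buggy, reference_policy,
                               [300, 579, 580, 581, 850])
print(mismatches)  # [300, 579, 580, 581, 850]
```

A 100% mismatch rate is itself diagnostic: a total inversion diverges everywhere, whereas an off-by-one boundary error diverges only at the threshold.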

Scenario B — Threshold Precision Loss in Rule Compilation: A healthcare policy states: "Patients with eGFR below 30 mL/min/1.73m² require specialist referral before prescribing metformin." The developer implements the threshold as an integer comparison: if eGFR < 30. A patient with eGFR of 29.7 mL/min/1.73m² is flagged for referral because the rule engine truncates the float to 29, which triggers the rule — but the display rounds the value to "30", so the clinician concludes the referral flag is spurious and that no referral was required. In a different edge case, a patient with eGFR of exactly 30.0 receives a referral even though the policy threshold is "below 30" (strictly less than), not "at or below 30": the boundary is handled inclusively in one code path and exclusively in another.

What went wrong: The compilation introduced ambiguity around the boundary condition. The source policy specified "below 30" but the implementation's type handling created inconsistent boundary behaviour depending on whether the value was displayed as rounded or processed as raw. Consequence: Clinical safety incident, potential harm to patients, mandatory incident report to regulatory body, trust suspension pending review.
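The disagreement between truncation in the engine and rounding in the display can be reproduced in a few lines. A hedged sketch (function names are invented; the policy text is the strict comparison):

```python
# Hypothetical sketch of the boundary defect: the policy says
# "below 30" (strictly less than), but the engine truncates the float
# while the display rounds it, so the engine and the screen disagree.

def requires_referral_buggy(egfr):
    return int(egfr) < 30        # 29.7 truncates to 29: rule fires

def displayed_value(egfr):
    return round(egfr)           # 29.7 rounds to 30 on screen

def requires_referral_correct(egfr):
    return egfr < 30.0           # compare the raw float, per the policy text

print(requires_referral_buggy(29.7), displayed_value(29.7))  # True 30
print(requires_referral_correct(30.0))                       # False: 30.0 is not "below 30"
```

Boundary-condition verification (requirement 4.3 below) would compare all three paths at 29.7, 30.0, and 30.3 and flag the inconsistency before activation.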

Scenario C — Silent Rule Omission During Bulk Policy Migration: An organisation migrates 340 policy rules from an Excel-based rule matrix to a new rule engine. The migration script successfully imports 338 rules. Two rules — governing maximum exposure limits for cryptocurrency counterparties — fail silently due to a special character in the rule description field. The absence is not detected because the migration validation checks only that imported rules are syntactically valid, not that the set of imported rules matches the set of source rules. The missing rules are discovered 8 months later during annual policy review.

What went wrong: The migration process validated the syntactic correctness of imported rules but did not verify completeness — that every source rule had a corresponding compiled rule. Consequence: 8 months of uncontrolled cryptocurrency counterparty exposure, regulatory finding for inadequate systems and controls, £340,000 in excess exposure identified during remediation.
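A completeness check of this kind reduces to a two-way set difference over rule identifiers. A minimal sketch (rule IDs are invented for illustration):

```python
# Hypothetical sketch of a completeness check: compare the set of rule IDs
# in the source matrix against the set actually imported, in both directions.

source_rules   = {f"R{n:03d}" for n in range(1, 341)}   # 340 approved rules
imported_rules = source_rules - {"R212", "R213"}        # two silently dropped

missing    = source_rules - imported_rules   # approved but never compiled
unexpected = imported_rules - source_rules   # compiled but never approved

print(sorted(missing))     # ['R212', 'R213'], the silently omitted rules
print(sorted(unexpected))  # []
```

Syntactic validation of each imported rule would pass here; only the set comparison exposes the omission.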

4. Requirement Statement

Scope: This dimension applies to all AI agents whose behaviour is governed by policy rules that undergo a compilation, translation, or transformation step between the human-approved form and the machine-executable form. This includes: manual coding of policy rules into a rule engine, automated generation of decision tables from policy documents, transpilation of policy DSLs into executable code, migration of rules between systems, and any process where the human-readable policy and the machine-executable policy are not the same artefact. Systems where policy is directly authored in its executable form (e.g., a rule engine with a visual editor that is also the approval artefact) are within scope for completeness verification but may have reduced equivalence risk. The scope extends to policy updates — every change to compiled policy must be re-verified, not just the initial compilation.

4.1. A conforming system MUST verify that the machine-executable policy artefact semantically matches the approved human-readable policy before the compiled version is activated for production use.

4.2. A conforming system MUST verify completeness — that every rule in the approved policy has a corresponding rule in the compiled artefact, and that no compiled rule exists without a corresponding approved rule.

4.3. A conforming system MUST verify boundary conditions — that threshold values, comparison operators (less than vs. less than or equal to), and type handling (integer vs. float, inclusive vs. exclusive ranges) match the approved policy specification.

4.4. A conforming system MUST block activation of any compiled policy that has not passed verification, defaulting to the previously verified version.
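The activation gate in 4.4 can be sketched as a fail-closed lookup keyed on the artefact's content hash; everything here (names, artefact bytes) is an illustrative assumption, not a mandated design:

```python
# Hypothetical sketch of the fail-closed activation gate: a compiled
# artefact runs in production only if a passing verification record
# exists for its exact content hash; otherwise the previously verified
# version stays active.
import hashlib

verified: dict = {}   # artefact hash -> verification passed?

def record_verification(artefact: bytes, passed: bool) -> None:
    verified[hashlib.sha256(artefact).hexdigest()] = passed

def activate(candidate: bytes, current: bytes) -> bytes:
    """Return the artefact to run: the candidate only if verified,
    else the previously verified version (fail-closed default)."""
    if verified.get(hashlib.sha256(candidate).hexdigest()):
        return candidate
    return current

old, new = b"rules-v1", b"rules-v2"
record_verification(old, True)
print(activate(new, old) == old)   # True: unverified candidate is blocked
record_verification(new, True)
print(activate(new, old) == new)   # True: candidate activates once verified
```

Keying on the content hash rather than a version label means a re-compiled artefact with the same label but different bytes is still blocked until re-verified.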

4.5. A conforming system MUST retain the verification evidence (test results, equivalence proof, or review record) alongside the compiled artefact's version identifier per AG-269.

4.6. A conforming system SHOULD implement automated verification using differential testing: generate a set of test inputs (including boundary cases), evaluate them against both the source policy specification and the compiled artefact, and compare outputs.
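The differential-testing approach in 4.6 hinges on generating inputs at and around each threshold. A minimal sketch, with an invented off-by-one compiled rule to show what the boundary probe catches:

```python
# Hypothetical sketch of boundary-value generation for differential
# testing: for each numeric threshold, probe just below, at, and just
# above the boundary against a reference oracle.

def boundary_inputs(threshold, step=1):
    return [threshold - step, threshold, threshold + step]

def spec_rule(score):       # reference oracle: eligible at 580 or above
    return score >= 580

def compiled_rule(score):   # compiled with the wrong operator (> not >=)
    return score > 580

diffs = [x for x in boundary_inputs(580) if spec_rule(x) != compiled_rule(x)]
print(diffs)  # [580]: only the exact boundary value exposes the operator error
```

Random input generation alone is unlikely to sample the exact threshold value, which is why 4.6 calls out boundary cases explicitly.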

4.7. A conforming system SHOULD implement a bidirectional traceability matrix mapping each approved rule to its compiled implementation and vice versa.
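The bidirectional check in 4.7 is again a pair of set differences over the traceability matrix. A sketch with invented rule IDs and compiled rule names:

```python
# Hypothetical sketch of a bidirectional traceability check: every
# approved rule must map to a compiled rule, and every compiled rule
# must trace back to an approved source.

approved = {                         # approved rule ID -> compiled rule name
    "POL-001": "rule_eligibility_score",
    "POL-002": "rule_exposure_limit",
}
compiled = {"rule_eligibility_score", "rule_exposure_limit",
            "rule_legacy_discount"}  # one orphan with no approved source

missing_impl = {p for p, c in approved.items() if c not in compiled}
orphans      = compiled - set(approved.values())

print(sorted(missing_impl))  # []
print(sorted(orphans))       # ['rule_legacy_discount']
```

The orphan direction matters as much as the missing direction: a compiled rule with no approved source is an unauthorised control (see Test 8.4).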

4.8. A conforming system MAY implement formal equivalence checking where the policy language and rule engine support it, providing mathematical proof that the compilation preserves semantics.

5. Rationale

The compilation step — translating human policy into machine-executable form — is one of the highest-risk transformations in the AI governance pipeline. Unlike an ordinary software bug, which may surface only in rare edge cases, a compilation error in policy can systematically affect every decision the agent makes. A negation inversion affects 100% of relevant decisions. A threshold error affects all boundary cases. An omitted rule removes a control entirely.

The risk is amplified by the fact that policy compilation errors are often invisible in normal operation. The agent continues to make decisions that appear well-formed — they have the right structure, they reference the right data fields, they produce outputs in the expected format. Only the decision logic is wrong. This means compilation errors tend to persist until they are discovered by external events: a customer complaint, a regulatory audit, or an incident investigation.

Human review alone is insufficient for verification because the compiled artefact is typically in a different language and representation from the source policy. A compliance officer who approved the English-language policy may not be able to verify that the Rego, Drools, or Python implementation faithfully represents it. Automated verification — through differential testing, formal equivalence, or structured test suites — provides assurance that does not depend on a single reviewer's ability to read two different representations simultaneously.

The completeness requirement (4.2) addresses a particularly insidious failure mode: the silent omission. When a rule fails to compile or migrate, the absence is invisible unless the verification process explicitly checks for it. A system that validates only that compiled rules are correct but does not verify that all rules are present will pass verification with missing rules.

6. Implementation Guidance

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Financial Services. Regulatory rules (e.g., MiFID II suitability, PSD2 strong customer authentication, anti-money laundering thresholds) are translated from regulatory text to executable rules. Compilation errors in these rules directly violate regulatory requirements. The FCA expects firms to demonstrate that automated controls faithfully implement the regulatory obligation. A compilation error that causes systematic non-compliance is a control failure, not a policy failure.

Healthcare. Clinical decision support rules are derived from clinical guidelines (e.g., NICE guidelines, FDA-approved prescribing information). A compilation error in a dosage calculation or contraindication check can directly harm patients. The verification requirement should include clinical validation by a qualified clinician in addition to automated testing, per established clinical safety standards (e.g., DCB 0129 in the UK).

Critical Infrastructure. Safety-critical policy parameters (e.g., pressure limits, temperature thresholds, flow rate boundaries) are compiled from safety cases into operational control logic. Compilation errors can create conditions outside the safety envelope. Verification should include hardware-in-the-loop testing against the actual control system, not just simulation.

Maturity Model

Basic Implementation — Policy compilation is performed by a developer and reviewed by a second developer. A manual traceability matrix maps source rules to compiled rules. Basic functional tests verify that the compiled rules produce expected outputs for a sample of inputs. Verification results are documented in a review record.

Intermediate Implementation — An automated differential test suite generates boundary-value test inputs for every rule and compares compiled outputs against a reference specification. A bidirectional traceability matrix is maintained in a structured format (not a spreadsheet) and checked automatically for completeness. The compilation pipeline blocks deployment on verification failure. Verification artefacts are stored alongside the compiled version's content hash.

Advanced Implementation — All intermediate capabilities plus: formal equivalence checking where the policy language supports it. Mutation testing to verify that the test suite detects injected compilation errors (e.g., inverting a negation, shifting a threshold by 1). Continuous verification — the compiled policy is re-verified against the source specification on a scheduled basis (not just at compilation time) to detect drift. Independent third-party verification for safety-critical and regulatory-critical policy sets.
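Mutation testing, as described for the advanced tier, can be sketched compactly: inject the exact error classes named above and confirm the differential suite kills every mutant (all names and values here are illustrative):

```python
# Hypothetical sketch of mutation testing the verification suite:
# inject known compilation errors (inverted comparison, threshold
# shifted by one, inclusive instead of exclusive boundary) and check
# that boundary-focused differential inputs detect every mutant.
import operator

def make_rule(op, threshold):
    return lambda x: op(x, threshold)

reference = make_rule(operator.lt, 30)     # policy: strictly below 30
mutants = [
    make_rule(operator.ge, 30),            # negation inverted
    make_rule(operator.lt, 31),            # threshold shifted by 1
    make_rule(operator.le, 30),            # <= instead of <
]
suite = [28, 29, 29.5, 30, 30.5, 31]       # boundary-focused inputs

killed = sum(any(m(x) != reference(x) for x in suite) for m in mutants)
print(f"{killed}/{len(mutants)} mutants detected")  # 3/3 mutants detected
```

A surviving mutant indicates a gap in the verification suite itself: a compilation error of that class would reach production undetected.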

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Negation Inversion Detection

Test 8.2: Threshold Boundary Accuracy

Test 8.3: Completeness Verification

Test 8.4: Unauthorised Rule Detection

Test 8.5: Verification Gate Enforcement

Test 8.6: Re-verification on Change

Test 8.7: Type Handling Consistency

Conformance Scoring

9. Regulatory Mapping

| Regulation | Provision | Relationship Type |
| --- | --- | --- |
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 17 (Quality Management System) | Direct requirement |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| SOX | Section 404 (Internal Controls) | Supports compliance |
| NIST AI RMF | MAP 2.3, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Development) | Supports compliance |
| IEC 62443 | SR 3.4 (Software and Information Integrity) | Supports compliance |

EU AI Act — Article 9 (Risk Management System)

Article 9 requires risk management measures to be tested to ensure they work as intended. Policy compilation verification is a risk management measure that ensures the translation from approved policy to executable rules does not introduce errors. The requirement for testing risk management measures maps directly to the automated verification requirement — the organisation must demonstrate that its compilation verification process detects the types of errors it is designed to catch.

EU AI Act — Article 17 (Quality Management System)

Article 17 requires a quality management system covering the development process for high-risk AI systems. Policy compilation is a development activity. Compilation verification is a quality control within that development process. The quality management system must include procedures for ensuring that policy rules are correctly translated, tested, and traceable to their approved source.

FCA SYSC — 6.1.1R (Systems and Controls)

The FCA expects firms to maintain controls sufficient to ensure compliance. For AI agents governed by compiled policy rules, this means demonstrating that the compiled rules faithfully implement the approved policy. A compilation error that causes systematic non-compliance is a control failure. The FCA would expect to see evidence of verification testing, traceability, and gate enforcement.

SOX — Section 404 (Internal Controls)

For AI agents executing financial operations, the compilation of policy rules into executable form is an internal control activity. SOX Section 404 requires management to assess the effectiveness of internal controls. Compilation verification provides the evidence that this specific control — the translation of policy to executable form — is effective.

ISO 42001 — Clause 8.4 (AI System Development)

Clause 8.4 addresses the development and modification of AI systems. Policy compilation is a system modification activity. Compilation verification implements the quality check required for modifications to AI system behaviour, ensuring that changes to policy rules are correctly reflected in system behaviour.

10. Failure Severity

| Field | Value |
| --- | --- |
| Severity Rating | Critical |
| Blast Radius | All decisions governed by the incorrectly compiled policy — potentially organisation-wide |

Consequence chain: A compilation error in policy creates a systematic failure affecting every decision that invokes the affected rule. Unlike random errors that affect individual decisions, a compilation error is deterministic — it produces the wrong outcome every time the conditions are met. A negation inversion in a lending eligibility rule affects every applicant in the affected category. A threshold error in a safety parameter affects every operating condition near the boundary. The immediate technical failure is divergence between the approved policy and the executed policy. The operational impact is systematic non-compliance or systematic safety violation at machine speed — 50,000 decisions per day applying the wrong rule. The regulatory consequence depends on the domain: in financial services, systematic non-compliance with lending rules can trigger redress requirements for all affected customers (the FCA's 2022 redress orders for systematic consumer credit failures averaged £12.4 million); in healthcare, clinical decision support errors can cause direct patient harm and trigger mandatory safety incident reporting; in safety-critical domains, incorrect operating parameters can cause physical damage or loss of life. The failure is amplified by the fact that compilation errors are invisible in normal operation — the system appears to function correctly while systematically applying the wrong rules.

Cross-references: AG-134 (Machine-Checkable Policy Semantics) provides the formal policy language that enables automated compilation verification. AG-269 (Policy Version Pinning Governance) ensures that the verified compiled artefact is immutably identified so that verification results remain valid. AG-135 (Policy Precedence and Conflict Arbitration) defines precedence rules that the compilation must preserve. AG-271 (Rule-Test Coverage Governance) extends compilation verification to ongoing test coverage of the deployed rule set. AG-275 (Policy Simulation Sandbox Governance) provides the environment for pre-production verification testing. AG-138 (High-Assurance Invariant Verification) provides formal verification techniques applicable to critical compilation verification. AG-007 (Governance Configuration Control) governs the change control process for compiled policy artefacts.

Cite this protocol
AgentGoverning. (2026). AG-270: Policy Compilation Verification Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-270