High-Assurance Invariant Verification Governance requires that the most critical governance properties — safety invariants, regulatory hard limits, and irreversible-action constraints — be verified through formal methods that provide mathematical proof, not merely tested through empirical observation. Testing demonstrates that a system behaves correctly for the tested inputs; formal verification demonstrates that the system behaves correctly for all possible inputs within the formal model. A conforming system must identify its critical governance invariants, express them in a formally verifiable specification, and apply model checking, theorem proving, abstract interpretation, or equivalent formal methods to verify that the governance enforcement mechanism satisfies these invariants under all reachable states. This is the highest assurance level in the governance stack — the point where governance moves from "we have tested this thoroughly" to "we have proved this mathematically."
Scenario A — Exhaustive Testing Misses Corner Case in Aggregate Limit: An organisation tests its governance enforcement engine with 50,000 test cases covering boundary values, concurrency scenarios, and adversarial inputs. All tests pass. The engine enforces a daily aggregate transaction limit of £1,000,000. In production, an agent submits a sequence of transactions in a specific currency pair (JPY/GBP) where the exchange rate lookup returns a null value due to a weekend market closure. The enforcement engine's aggregate calculation treats the null exchange rate as zero, computing the GBP-equivalent value of the JPY transaction as £0. The transaction is approved without incrementing the aggregate counter. The agent submits 500 JPY transactions totalling ¥12,000,000,000 (approximately £62,000,000) on a Saturday, none of which increment the aggregate counter. Monday review reveals the exposure.
What went wrong: Testing covered 50,000 scenarios but did not cover the specific interaction between a null exchange rate and the aggregate calculation. Empirical testing, no matter how extensive, can only cover a finite number of input combinations. The state space of a governance enforcement engine — considering all possible action types, parameter values, currency combinations, timing conditions, and environmental states — is effectively infinite. Testing samples from this space; formal verification covers it. Consequence: £62,000,000 in untracked aggregate exposure, FCA investigation, material risk management failure.
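The null-as-zero defect in Scenario A can be reproduced in a few lines. The following Python sketch is illustrative (function and variable names are hypothetical, not from any real engine): the unsafe conversion silently collapses a missing rate to zero, while the hardened version fails closed, forcing the engine to deny rather than under-count.

```python
from decimal import Decimal
from typing import Optional

def gbp_value_unsafe(amount: Decimal, rate: Optional[Decimal]) -> Decimal:
    # Defect: a missing rate silently collapses to zero, so the
    # transaction never increments the aggregate counter.
    return amount * (rate or Decimal(0))

def gbp_value_safe(amount: Decimal, rate: Optional[Decimal]) -> Decimal:
    # Hardened: a missing rate is a hard failure, forcing the
    # enforcement engine to deny rather than under-count.
    if rate is None:
        raise ValueError("exchange rate unavailable: deny the transaction")
    return amount * rate

# Weekend JPY/GBP lookup returns None (market closed):
assert gbp_value_unsafe(Decimal("24000000"), None) == 0  # limit bypassed
```

Formal verification of the invariant "every permitted transaction increments the aggregate by its true GBP value" would have flagged the unsafe path, because the model includes the null-rate state that the 50,000 test cases never sampled.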
Scenario B — Policy Composition Introduces Undetected Permission Escalation: An organisation deploys three governance policies: Policy A restricts the agent to counterparties in the "approved" list. Policy B permits transactions with any counterparty if the amount is below £100 and the purpose is "testing." Policy C permits elevated limits (up to £500,000) for counterparties in the "strategic partner" list. Each policy individually is correct and has been tested. However, the composition of B and C creates an unintended interaction: an agent can submit a £99 "testing" transaction to any counterparty (permitted by B, no counterparty restriction), and then — because the counterparty now appears in the transaction history — a subsequent policy update adds the counterparty to the "strategic partner" list through an automated process triggered by transaction history. Once on the strategic partner list, C permits £500,000 transactions. The path from "any counterparty" to "£500,000 transactions" is: one £99 test transaction, one automated list update, one £500,000 transaction.
What went wrong: Each policy was correct in isolation but their composition created a reachability path from an unrestricted state to a high-value state. Testing each policy independently did not reveal the interaction. Testing the composition would require enumerating all possible sequences of transactions and automated processes — a combinatorial explosion that empirical testing cannot cover. Formal verification of the invariant "no transaction exceeding £10,000 shall be permitted with a counterparty not on the original approved list" would have detected the reachability path through model checking. Consequence: £500,000 transaction with an unapproved counterparty, compliance failure, governance architecture redesign.
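Scenario B's reachability path is exactly what explicit-state model checking finds. The following Python sketch is a toy model with hypothetical names (a production verification would use TLA+, Alloy, or SPIN): it breadth-first-searches the composed policies' state space and returns the first transaction violating the invariant "no transaction above £10,000 with a counterparty not on the original approved list."

```python
from collections import deque

APPROVED = {"acme"}   # the original approved counterparty list
LIMIT = 10_000        # invariant threshold, in pounds

def successors(state):
    """State = (strategic_partners, transaction_history).
    Transitions model Policy B, Policy C, and the automated list update."""
    partners, history = state
    for cpty in ("acme", "unknown_co"):
        # Policy B: any counterparty, amount below £100, purpose "testing"
        yield (partners, history | {cpty}), (cpty, 99)
        # Automated process: counterparties with history become strategic partners
        if cpty in history:
            yield (partners | {cpty}, history), None
        # Policy C: strategic partners permitted up to £500,000
        if cpty in partners:
            yield (partners, history), (cpty, 500_000)

def find_violation():
    """Breadth-first search for a reachable transaction that violates the
    invariant: value > LIMIT with a counterparty not originally approved."""
    start = (frozenset(), frozenset())
    seen, queue = {start}, deque([start])
    while queue:
        for nxt, txn in successors(queue.popleft()):
            if txn and txn[1] > LIMIT and txn[0] not in APPROVED:
                return txn
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None

print(find_violation())  # ('unknown_co', 500000)
```

The search reaches the £500,000 transaction via exactly the path described above: a £99 test transaction, the automated list update, then the elevated-limit transaction. Testing each policy in isolation never executes this sequence.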
Scenario C — Integer Overflow in Enforcement Engine Bypasses Maximum Limit: A governance enforcement engine stores transaction values as 32-bit unsigned integers in the smallest currency unit (pence). The maximum representable value is 4,294,967,295 pence (£42,949,672.95). The mandate specifies a maximum transaction value of £50,000,000 (5,000,000,000 pence), which exceeds the 32-bit integer range. When loaded, the value silently wraps around to 705,032,704 pence (£7,050,327.04). The enforcement engine faithfully enforces the wrapped limit — £7,050,327.04 instead of £50,000,000. This is a stricter limit than intended, which might seem safe — until the organisation notices and "fixes" it by changing the mandate to use a different representation, inadvertently introducing a second overflow that sets the effective limit to £42,949,672.95 with no aggregate tracking.
What went wrong: The enforcement engine's data type could not represent all values in the mandate's range. The overflow was silent — no error, no warning, no rejection. Empirical testing with values below £42,949,672 would not trigger the overflow. Formal verification of the type-safety property "the enforcement engine's data types can represent all values in the mandate's range without overflow or truncation" would have detected the type mismatch at verification time, before deployment. Consequence: incorrect enforcement limit, cascading fix that introduced a larger vulnerability, loss of confidence in the enforcement engine's correctness.
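The wraparound in Scenario C is easy to demonstrate, and the representability check that type-safety verification performs before deployment is equally small. A Python sketch (names are illustrative; the modulo emulates storing into a 32-bit unsigned field):

```python
MANDATE_LIMIT_PENCE = 5_000_000_000   # £50,000,000 expressed in pence

# Emulate the silent wraparound of a 32-bit unsigned store:
stored = MANDATE_LIMIT_PENCE % 2**32
print(stored)  # 705032704 -> the engine enforces £7,050,327.04

# The representability check that formal type-safety verification
# performs at specification time, here as an explicit guard:
def load_limit_checked(limit_pence: int, bits: int = 32) -> int:
    if not 0 <= limit_pence < 2**bits:
        raise OverflowError(
            f"mandate limit {limit_pence} not representable in {bits} bits")
    return limit_pence
```

A verified loader refuses the out-of-range mandate at load time instead of silently enforcing the wrong limit, which is precisely the property 4.6 asks static analysis or model checking to establish.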
Scope: This dimension applies to governance enforcement mechanisms where the consequences of enforcement failure are severe enough to justify the investment in formal verification. This includes but is not limited to: safety-critical deployments where governance failure could result in physical harm, financial deployments where governance failure could result in material financial loss (defined by the organisation's risk appetite, but typically exceeding £1,000,000 in potential exposure), regulated deployments where governance failure could result in regulatory enforcement action, and multi-agent deployments where governance failure in one agent could cascade to other agents. The scope covers the governance policy (AG-134), the precedence mechanism (AG-135), the enforcement engine, and the interaction between them. It does not require formal verification of the agent runtime itself (which is typically not formally verifiable) — only the governance enforcement mechanism. Organisations must identify which governance invariants are critical enough to warrant formal verification; not all invariants require this level of assurance.
4.1. A conforming system MUST identify and formally specify all critical governance invariants — properties that must hold under all reachable states, including but not limited to: maximum transaction value limits, aggregate exposure ceilings, prohibited counterparty restrictions, safety parameter boundaries, and irreversible-action constraints.
4.2. A conforming system MUST express critical governance invariants in a formal specification language suitable for automated verification (e.g., temporal logic, first-order logic, refinement types, or a purpose-built invariant specification language).
4.3. A conforming system MUST apply formal verification techniques — model checking, theorem proving, abstract interpretation, or equivalent methods — to verify that the governance enforcement mechanism satisfies all critical invariants under all reachable states within the formal model.
4.4. A conforming system MUST re-verify all critical invariants whenever the governance enforcement mechanism, the policy set, or the precedence specification changes, before the change is deployed to production.
4.5. A conforming system MUST maintain the formal specification of invariants, the verification proofs, and the verification tool versions as auditable artefacts, such that the verification can be independently reproduced.
4.6. A conforming system SHOULD verify the absence of common implementation defects in the enforcement engine — integer overflow, null dereference, division by zero, race conditions, and unbounded recursion — using static analysis or model checking techniques.
4.7. A conforming system SHOULD verify policy composition properties — that the combination of multiple policies, under the defined precedence rules, does not create reachability paths to states that any individual policy would prohibit.
4.8. A conforming system SHOULD maintain a mapping between formal invariants and regulatory requirements, demonstrating that each critical regulatory obligation has a corresponding formally verified governance invariant.
4.9. A conforming system MAY implement runtime invariant monitoring as a defence-in-depth complement to static verification, checking that invariants hold at runtime even though they have been formally verified to hold, providing an additional detection layer for environmental factors outside the formal model.
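The runtime monitoring described in 4.9 can be implemented as a thin layer of executable checks mirroring the statically verified invariants. A minimal Python sketch, with illustrative invariant names and a flat dictionary standing in for the engine's observed state:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Invariant:
    name: str
    holds: Callable[[Dict[str, int]], bool]

# Runtime mirrors of statically verified invariants (names illustrative):
INVARIANTS = [
    Invariant("max_transaction_value",
              lambda s: s["transaction_value"] <= s["mandate_limit"]),
    Invariant("aggregate_ceiling",
              lambda s: s["aggregate_exposure"] <= s["aggregate_limit"]),
]

def check_invariants(state: Dict[str, int]) -> List[str]:
    """Return the names of invariants the observed state violates.
    A non-empty result should deny the action and raise an alert,
    since it means reality has diverged from the formal model."""
    return [inv.name for inv in INVARIANTS if not inv.holds(state)]
```

The value of this layer is not redundancy for its own sake: a runtime violation of a proved invariant is strong evidence that an environmental assumption of the formal model no longer holds.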
High-Assurance Invariant Verification Governance addresses the fundamental limitation of testing-based assurance. Testing demonstrates that a system behaves correctly for the tested inputs; it cannot demonstrate correctness for untested inputs. For governance enforcement — where a single incorrect decision can result in material financial loss, safety incidents, or regulatory enforcement — this limitation is significant. The state space of a governance enforcement engine is vast: every combination of action type, parameter value, currency, counterparty, timing, environmental condition, and concurrent action creates a distinct input state. Exhaustive testing of this space is infeasible.
Formal verification addresses this by analysing the enforcement engine's behaviour over all possible inputs within a formal model. A model checker explores every reachable state of the system; a theorem prover derives mathematical proofs that properties hold. The result is not "we tested 50,000 cases and all passed" but "we have proved that this property holds for all reachable states." The assurance level is qualitatively different.
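The difference is visible even in miniature: a bounded model can be checked by brute-force enumeration of every state, which is what a model checker does (far more efficiently) at scale. A Python sketch with a toy decision function and a deliberately small state space:

```python
from itertools import product

def decide(limit: int, value: int) -> str:
    """Toy enforcement decision over a deliberately small model."""
    return "deny" if value > limit else "permit"

# Enumerate every state of the bounded model and check the invariant
# AG(value > limit -> decision = deny) in each one:
violations = [(limit, value)
              for limit, value in product(range(501), range(1001))
              if value > limit and decide(limit, value) != "deny"]
print(len(violations))  # 0 -> the invariant holds in all 501 * 1001 states
```

The conclusion "no violations in any of the 501 × 1001 states" is a universal claim over the model, not a sample — the qualitative difference described above, achieved here only because the state space is tiny; real model checkers use symbolic techniques to reach the same conclusion over vastly larger spaces.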
This is not academic pedantry. Real governance failures occur at the intersection of conditions that individual tests do not cover: a null exchange rate combined with a weekend timing combined with a specific currency pair. A race condition that manifests only under a specific load pattern. An integer overflow that occurs only with values exceeding a threshold that test data never reaches. These are the failures that empirical testing misses and formal verification catches.
The practical application of formal verification to governance enforcement is feasible because the enforcement engine is a relatively constrained system. Unlike the agent runtime (which processes natural language, reasons about goals, and interacts with arbitrary tools), the enforcement engine evaluates a formally specified policy against a structured action request and produces a binary decision. This constrained interface makes formal verification tractable. The policy language (AG-134) provides formal semantics; the enforcement engine implements an evaluation algorithm; formal verification proves that the algorithm correctly implements the semantics for all inputs.
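That constrained interface can be made concrete. A Python sketch of the shape such an evaluation function takes — structured policy and action in, binary decision out (types and field names are illustrative, not the AG-134 policy language):

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    PERMIT = "permit"
    DENY = "deny"

@dataclass(frozen=True)
class Policy:
    max_value: int                      # pounds
    approved_counterparties: frozenset

@dataclass(frozen=True)
class Action:
    value: int
    counterparty: str

def evaluate(policy: Policy, action: Action) -> Decision:
    """Total and deterministic over structured inputs: every request
    yields exactly one decision, which is what makes properties like
    'value > max_value implies DENY' provable for all inputs."""
    if action.value > policy.max_value:
        return Decision.DENY
    if action.counterparty not in policy.approved_counterparties:
        return Decision.DENY
    return Decision.PERMIT
```

Because the function is total, deterministic, and free of natural-language inputs, its correctness theorem can be stated and proved once, for all policies and all actions — something that is not tractable for the agent runtime itself.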
AG-138 deliberately limits its scope to critical governance invariants, not all system properties. The cost of formal verification scales with the complexity of the property and the size of the state space. Verifying that "no transaction exceeding £50,000 is ever permitted" is tractable; verifying that "the agent always gives the best possible response" is not. The organisation must identify which invariants are critical — the ones where a single violation has severe consequences — and invest verification effort there.
AG-138 requires organisations to apply formal methods to their most critical governance enforcement properties. The implementation approach depends on the nature of the invariants, the enforcement engine's architecture, and the verification tools available.
Recommended patterns:
- Model checking of temporal invariants. Express each critical invariant in temporal logic, e.g. AG(transaction_value > mandate_limit -> enforcement_decision = deny) ("in all reachable states, if the transaction value exceeds the mandate limit, the enforcement decision is deny"). The model checker exhaustively explores all reachable states and reports any state that violates the invariant. Example: a financial services firm models its enforcement engine in TLA+ with state variables {mandate_limit: 0..50000, transaction_value: 0..100000, aggregate_exposure: 0..1000000, concurrent_requests: 0..100}. The model checker verifies the invariant aggregate_exposure <= mandate_aggregate_limit + max_single_transaction across all 5 × 10^11 reachable states in 47 minutes.
- Theorem proving for unbounded properties. Specify the enforcement engine's evaluation function as evaluate(policy, action) : Decision. The theorem forall policy action, action.value > policy.max_value -> evaluate(policy, action) = Deny is proved by structural induction on the evaluation function. The proof is 347 lines of Coq and is mechanically verified in 12 seconds.
- Composition checking with counterexample traces. Model-check the combined policy set under the defined precedence rules so that unintended reachability paths surface as concrete counterexample traces, e.g. [test_transaction(counterparty_X, £99) -> auto_list_update(counterparty_X) -> transaction(counterparty_X, £500000)].

Anti-patterns to avoid:
Financial Services. The FCA and PRA increasingly expect firms to provide evidence that controls are effective under all conditions, not just tested conditions. AG-138's formal verification provides this evidence for critical governance invariants. Firms should prioritise verification of: transaction limit enforcement (per-transaction and aggregate), sanctions screening completeness, and position limit calculations. The verification artefacts (proofs, model checker outputs) become part of the evidence base for ICAAP and SREP assessments.
Healthcare. FDA guidance on software verification and validation for medical devices, together with IEC 62304, requires verification appropriate to the software's safety classification. For AI agents involved in clinical decision support, AG-138's formal verification of safety invariants (e.g., "the agent never recommends a drug interaction that is contraindicated") provides the highest available assurance level. Formal verification artefacts support FDA pre-market submissions.
Critical Infrastructure. IEC 61508 (functional safety) and EN 50129 (railway safety) call for formal methods at the highest safety integrity levels (SIL 3 and SIL 4); IEC 61508 classifies them as "highly recommended" at these levels. For AI agents involved in safety-critical control, AG-138's formal verification of safety invariants (e.g., "the agent never commands a valve position that could cause overpressure") is therefore not merely a best practice but expected safety-case evidence at the highest assurance levels.
Autonomous Vehicles. ISO 21448 (SOTIF — Safety of the Intended Functionality) requires analysis of all reasonably foreseeable scenarios. For AI agents involved in autonomous vehicle governance, AG-138's model checking can verify safety invariants across a comprehensive set of driving scenarios modelled as state transitions.
Basic Implementation — The organisation has identified its critical governance invariants and documented them in natural language with accompanying formal specifications. Static analysis tools are applied to the enforcement engine's code to detect common implementation defects (integer overflow, null dereference, race conditions). The static analysis runs in the CI/CD pipeline and blocks deployment on detected defects. Formal verification has not yet been applied to governance properties, but the formal specification of invariants is in place and reviewed by domain experts. This level provides implementation-level assurance through static analysis but does not yet verify governance-level properties.
Intermediate Implementation — Model checking is applied to finite-state governance properties. Critical invariants are verified using TLA+, Alloy, SPIN, or equivalent tools. The model covers the enforcement engine's evaluation algorithm, the policy specification, and the environment (action types, parameter ranges, concurrent requests). Model checking runs as part of the CI/CD pipeline — every policy change or enforcement engine change triggers re-verification. Verification results and counterexamples are stored as auditable artefacts. The model has been validated against the implementation through test case generation. Policy composition is verified for the most critical interaction scenarios.
Advanced Implementation — All intermediate capabilities plus: theorem proving is applied to unbounded properties, providing mathematical proofs that critical invariants hold for all possible inputs. The enforcement engine's implementation is either generated from the verified model or verified directly at the source code level. A complete formal mapping between regulatory requirements and verified invariants demonstrates that every critical regulatory obligation has a corresponding proof. Runtime invariant monitoring complements static verification, providing defence-in-depth. Independent third-party review of the formal models, specifications, and proofs confirms their correctness and completeness. The organisation can provide, for every critical governance invariant, a mechanically verified proof that the invariant holds under all reachable states.
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Critical Invariant Formal Specification Completeness
Test 8.2: Model Checking Verification of Finite-State Invariants
Test 8.3: Implementation Defect Static Analysis
Test 8.4: Policy Composition Invariant Verification
Test 8.5: Re-Verification on Change
Test 8.6: Model Validation Against Implementation
Test 8.7: Counterexample Trace Reproducibility
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Supports compliance |
| PRA SS1/23 | Model Risk Management — Model Validation | Direct requirement |
| IEC 61508 | SIL 3/SIL 4 — Formal Verification Requirements | Direct requirement |
| IEC 62304 | Software Safety Classification — Verification Requirements | Direct requirement |
| ISO 21448 | SOTIF — Safety of the Intended Functionality | Supports compliance |
| NIST AI RMF | GOVERN 1.1, MANAGE 2.2, MEASURE 2.6 | Supports compliance |
| EN 50129 | Railway Safety — Formal Methods at Highest SIL | Supports compliance |
Article 9 requires risk mitigation measures to be "tested with a view to identifying the most appropriate risk management measures." Formal verification goes beyond testing by providing mathematical proof that risk mitigation measures (governance invariants) hold under all conditions. For high-risk AI systems, formal verification of critical governance invariants provides the strongest available evidence that the risk management system is effective.
Article 15 requires that AI systems achieve "an appropriate level of accuracy, robustness and cybersecurity." Formal verification provides the highest available assurance of accuracy (the enforcement engine produces correct decisions for all inputs), robustness (the enforcement engine maintains correctness under all environmental conditions within the model), and security (implementation defects that could be exploited are absent).
IEC 61508 Table B.7 lists formal methods as "highly recommended" for SIL 3 and SIL 4 software. For AI agents involved in safety-critical systems, AG-138's formal verification directly satisfies this requirement. The verification artefacts (formal specifications, proofs, model checker outputs) are part of the safety case evidence.
SS1/23 expects firms to validate models including AI systems. Formal verification of governance invariants provides a form of model validation that goes beyond back-testing and statistical analysis — it provides proof that the governance properties hold under all conditions, not just historically observed conditions. This is particularly relevant for governance of AI agents operating in tail-risk scenarios where historical data may not cover the critical cases.
IEC 62304 requires verification activities commensurate with the software's safety classification. For Class C (highest risk) software, formal verification of safety-critical properties provides the strongest available evidence of software correctness. AG-138's verification of governance invariants directly supports IEC 62304 compliance for AI agent governance software in medical device contexts.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Deployment-specific for standard deployments; potentially life-safety-critical for safety-critical deployments |
Consequence chain: Without formal invariant verification, governance enforcement relies on testing, which covers only the tested inputs. The failure mode is a corner case — a specific combination of inputs that testing did not cover and that violates a critical governance invariant. Corner cases in governance enforcement are not theoretical: they include null values in currency conversions (Scenario A), multi-step reachability paths through policy composition (Scenario B), and integer overflows at type boundaries (Scenario C). Each of these represents a real governance failure that empirical testing missed. The operational consequence depends on the invariant violated: a violated transaction limit invariant causes material ungoverned exposure; a violated safety invariant causes physical harm; a violated sanctions invariant causes regulatory enforcement. The regulatory consequence is severe because the organisation had the governance policy in place but could not guarantee that the enforcement mechanism correctly implemented the policy. The gap between policy and enforcement is exactly the gap that formal verification closes. For safety-critical deployments, the consequence of an unverified invariant violation can be irreversible — physical harm, environmental damage, or loss of life — making AG-138 the most consequential dimension in those contexts. Cross-reference with AG-134 (Machine-Checkable Policy Semantics) which provides the formal policy that AG-138 verifies, AG-135 (Policy Precedence and Conflict Arbitration Governance) whose composition AG-138 can verify, and AG-036 (Reasoning Process Integrity) for the agent-side reasoning that governance invariants constrain.
Cross-references: AG-134 (Machine-Checkable Policy Semantics) provides the formally specified policies that AG-138 verifies. AG-135 (Policy Precedence and Conflict Arbitration Governance) provides the precedence logic whose composition AG-138 can verify. AG-136 (Independent Control-Plane Separation Governance) provides the separated enforcement engine that AG-138 verifies. AG-137 (Runtime Attestation and Trusted Execution Governance) ensures that the verified enforcement engine is the one actually running. AG-001 (Operational Boundary Enforcement) provides the foundational boundary that AG-138 can formally prove is correctly implemented. AG-007 (Governance Configuration Control) governs the change control that triggers AG-138 re-verification.