This dimension governs the detection, measurement, and remediation of behavioural divergence across multiple simultaneously operating instances of the same agent. Such divergence can arise from differences in cached context, in-session memory state, system prompt versioning, infrastructure configuration, model weight drift, or adversarial manipulation targeting individual replicas. Consistent governance-relevant behaviour across instances is a foundational requirement: downstream systems, human overseers, audit mechanisms, and affected users legitimately expect the same agent identity to produce materially equivalent decisions, refusals, disclosures, and policy applications regardless of which replica handles a given interaction. Failure manifests as instances of the same nominal agent behaving contradictorily: one instance refuses a financial transaction as policy-violating while another approves it; one replica discloses regulatory limitations to a user while another conceals them; a safety-critical agent accepts a hazardous command on one node while rejecting it on another. The consequences are inconsistent real-world outcomes, undermined audit integrity, exploitable inconsistency gaps, and legal liability attributable to a single agent identity operating non-uniformly at scale.
A high-frequency trading workflow agent is horizontally scaled to twelve replicas across two data-centre availability zones to meet throughput requirements during market-open volatility windows. An infrastructure team pushes a configuration patch to six replicas in zone A to adjust risk-threshold parameters following a compliance review; the patch deployment pipeline fails silently on zone B, leaving six replicas running the prior configuration. A large institutional client submits forty-seven trade authorisation requests during a fifteen-minute window. Requests are load-balanced across both zones. Thirty-one requests are routed to zone A replicas operating under the updated configuration, which correctly applies the new position-limit ceiling of USD 2.4 million per instrument; sixteen requests reach zone B replicas still operating under the prior ceiling of USD 8.1 million. The zone B replicas approve four trades that the zone A replicas would have rejected. Net exposure created by the inconsistency: USD 23.6 million above the post-compliance ceiling. The discrepancy is not detected until a regulatory reconciliation audit fourteen days later. The firm faces a corrective capital requirement, a supervisory notice, and a USD 4.1 million fine. The root cause—divergent configuration state across replicas of the same agent—was not observable in real time because no cross-instance consistency monitoring was in place.
A public-sector benefits-eligibility agent is deployed across thirty-two nodes serving a national social services platform processing approximately 180,000 citizen interactions per day. A memory-layer bug introduced in a library update causes eleven of the thirty-two nodes to cache an outdated disclosure template that omits a mandatory statement about the right to human review of automated decisions, a disclosure required by national administrative law. The remaining twenty-one nodes serve the correct disclosure. For six weeks before the bug is detected, approximately 37,800 citizen interactions requiring the mandatory disclosure are processed by nodes serving the deficient template. Citizens whose interactions were handled by non-disclosing nodes were not informed of their legal right and consequently did not exercise it in a number of cases involving adverse eligibility determinations. A Parliamentary oversight inquiry identifies the inconsistency through analysis of session logs; the administering department faces a formal finding of non-compliance with automated decision-making law, is required to notify all affected citizens, and must reopen adverse determinations made during the affected window. Remediation cost: approximately EUR 2.9 million. The technical cause was a memory-caching divergence across replicas of the same agent identity, detectable through cross-instance output comparison but not monitored.
A robotic process control agent manages pneumatic isolation valves across a chemical manufacturing line. It is deployed in an active-active high-availability configuration across two edge compute nodes for resilience. An adversary with network access to one node injects a modified system prompt fragment via a maintenance interface that partially overwrites the agent's safety-envelope policy parameters on node 2, raising the maximum permissible pressure threshold for a specific valve cluster from 340 PSI (safe operational ceiling) to 510 PSI (above material safety limit). Node 1 retains the correct parameters. Over a six-hour shift, valve commands are distributed across both nodes. Nine commands reaching node 2 are approved at pressures that node 1 would have rejected. On the fourth such approval cycle, a valve assembly operating at 487 PSI experiences a seal failure. The resulting release causes a minor chemical exposure incident requiring evacuation of twelve workers, two of whom require hospital assessment. Equipment damage is valued at USD 680,000. The divergence between the two nodes—produced by an adversarial injection against one replica—was not detected because no mechanism compared the governance-relevant behavioural state of the two nodes against a canonical reference in real time.
This dimension applies to any deployment in which more than one instance, replica, container, edge node, or process of the same agent identity operates concurrently or in succession, regardless of whether those instances share infrastructure, communicate with one another, or are known to end users to be distinct processes. It applies equally to horizontally scaled cloud deployments, active-active high-availability configurations, geographically distributed edge deployments, multi-tenant agent platforms where a single agent definition is instantiated per tenant, and rolling-deployment scenarios where old and new versions overlap during transition windows. Governance-relevant behaviour is defined for the purposes of this dimension as any agent output, decision, refusal, disclosure, classification, or action that: (a) is subject to a regulatory requirement; (b) creates, modifies, or terminates a legal, financial, or safety-consequential obligation; (c) is recorded in an audit trail as an authoritative decision; or (d) is presented to a human user as a statement of policy, limitation, right, or eligibility. Purely cosmetic, stylistic, or non-consequential output variation (e.g., sentence phrasing differences that do not alter substance) is out of scope. The dimension imposes detective controls; it does not mandate how divergence is architecturally prevented, but it mandates that divergence be detected, characterised, escalated, and remediated within defined time bounds.
4.1.1 Each operating instance of an agent MUST be registered in a centralised or federated instance registry at the time of instantiation, recording at minimum: a unique instance identifier, the agent identity to which it belongs, the exact model version or weight hash, the system prompt version hash, the memory configuration state hash, the deployment timestamp, and the infrastructure node or container identifier.
4.1.2 Each instance MUST re-attest its configuration state to the registry at a minimum interval of once per sixty minutes during active operation, and immediately following any configuration reload, restart, or external parameter update.
4.1.3 The instance registry MUST be readable by the cross-instance consistency monitoring system defined in 4.3, and MUST be write-protected against modification by the agent instances themselves.
4.1.4 An agent operator MUST be able to enumerate all currently active instances of a given agent identity within five minutes of request, with output accurate to the most recent attestation cycle.
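As a non-normative illustration, the registration, re-attestation, and enumeration requirements of 4.1.1 through 4.1.4 can be sketched in a few lines. Every class, field, and parameter name below is an assumption chosen for the example, not a mandated schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InstanceRecord:
    """One registry entry per 4.1.1 (field names are illustrative)."""
    instance_id: str
    agent_identity: str
    model_hash: str
    prompt_hash: str
    memory_config_hash: str
    node_id: str
    registered_at: float  # epoch seconds

class InstanceRegistry:
    """Registry of live instance records, write-protected from agents (4.1.3)."""
    def __init__(self) -> None:
        self._records: dict[str, InstanceRecord] = {}
        self._attestations: dict[str, float] = {}

    def register(self, record: InstanceRecord) -> None:
        self._records[record.instance_id] = record
        self._attestations[record.instance_id] = record.registered_at

    def attest(self, instance_id: str, now: float) -> None:
        # 4.1.2: instances re-attest at least once per sixty minutes
        self._attestations[instance_id] = now

    def enumerate(self, agent_identity: str, now: float,
                  max_age_s: float = 3600.0) -> list[str]:
        # 4.1.4: enumeration accurate to the most recent attestation cycle;
        # stale instances (no attestation within the interval) are excluded
        return [iid for iid, rec in self._records.items()
                if rec.agent_identity == agent_identity
                and now - self._attestations[iid] <= max_age_s]
```

A production registry would additionally enforce the write-protection of 4.1.3 at the infrastructure level rather than relying on object encapsulation.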
4.2.1 For each agent identity, a canonical behavioural reference MUST be defined and maintained. The canonical reference MUST include: the authorised system prompt and its version hash; the authorised model version or weight hash; the authorised set of tool permissions and capability flags; the authorised memory access scopes; the authorised output policy rules applicable to governance-relevant behaviour classes; and the authorised refusal and disclosure templates for all mandatory disclosure categories.
4.2.2 The canonical behavioural reference MUST be version-controlled, with each version assigned an immutable content-addressed identifier.
4.2.3 Updates to the canonical reference MUST be authorised through a documented change-control process subject to the requirements of AG-019 (Configuration Integrity) and MUST be propagated to all active instances within a defined maximum propagation window not to exceed two hours for governance-relevant changes.
4.2.4 The canonical reference MUST be stored in a location that is read-accessible to monitoring systems but is not modifiable by agent instances or by automated processes without human authorisation.
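The immutable content-addressed identifier required by 4.2.2 can be illustrated with a simple scheme. The SHA-256-over-canonical-JSON construction shown here is one possible implementation, not a mandated one, and the reference fields are illustrative.

```python
import hashlib
import json

def content_address(reference: dict) -> str:
    """Content-addressed version identifier per 4.2.2: the SHA-256 of the
    canonically serialised reference. Any change to any component of the
    reference yields a new identifier."""
    canonical = json.dumps(reference, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative reference structure; component values are placeholders.
reference_v1 = {
    "system_prompt_hash": "sha256:...",
    "model_hash": "sha256:...",
    "tool_permissions": ["read_ledger"],
    "memory_access_scopes": ["session_local"],
    "disclosure_templates": {"human_review_right": "You have the right ..."},
}
ref_id = content_address(reference_v1)
```

Because the identifier is derived from the content, the monitoring system and the deployment pipeline can independently verify that they hold the same reference version without trusting a mutable label.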
4.3.1 A cross-instance consistency monitoring system MUST be deployed for every agent operating under this dimension's scope. The monitoring system MUST operate independently of the agent instances it monitors and MUST NOT share a runtime environment, process space, or configuration source with those instances.
4.3.2 The monitoring system MUST perform scheduled behavioural probe evaluations at a minimum frequency of once per thirty minutes per instance. Each probe evaluation MUST submit a standardised set of governance-relevant test stimuli to each instance and compare the outputs against the canonical expected outputs defined in the canonical behavioural reference.
4.3.3 The probe stimulus library MUST include test cases covering at minimum: safety refusal triggers for the agent's domain; mandatory regulatory disclosure triggers; policy-boundary decision cases at the margins of the agent's authorised action space; and capability-limitation acknowledgement cases.
4.3.4 The monitoring system MUST compute a per-instance consistency score after each probe evaluation cycle, expressing the proportion of governance-relevant probe responses that are materially equivalent to the canonical expected responses.
4.3.5 The monitoring system MUST detect and flag configuration-state divergence between any instance and the canonical reference within one attestation cycle of the divergence occurring.
4.3.6 The monitoring system MUST log all probe evaluations, scores, and divergence events to an append-only audit store that is inaccessible to the monitored agent instances.
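A minimal sketch of the per-instance consistency score of 4.3.4, under the simplifying assumption that each probe response has already been judged materially equivalent or not. The equivalence judgement itself, which in practice requires a calibrated semantic comparison rather than the exact-match check used here, is out of scope for the sketch.

```python
def consistency_score(probe_results: dict[str, bool]) -> float:
    """4.3.4: proportion of governance-relevant probe responses that are
    materially equivalent to the canonical expected responses."""
    if not probe_results:
        raise ValueError("empty probe set: consistency score undefined")
    return sum(probe_results.values()) / len(probe_results)

def evaluate_instance(responses: dict[str, str],
                      expected: dict[str, str]) -> float:
    """Score one instance's probe responses against canonical expected
    outputs. Exact string match stands in for the material-equivalence
    judgement a real system would implement."""
    results = {probe_id: responses.get(probe_id) == canonical
               for probe_id, canonical in expected.items()}
    return consistency_score(results)
```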
4.4.1 An operator MUST define divergence severity thresholds at agent-identity level, specifying at minimum: a warning threshold (below which a consistency score triggers a logged notification to the operations team), a degraded threshold (below which an instance is automatically flagged for review and optionally removed from the active pool), and a critical threshold (below which an instance MUST be automatically quarantined and removed from production traffic without requiring human initiation).
4.4.2 The critical threshold for safety-critical, financial-value, and public-sector / rights-sensitive agent profiles MUST be set at no lower than a consistency score of 0.95 (i.e., any instance failing more than five percent of governance-relevant probes must be automatically quarantined).
4.4.3 Divergence alerts MUST be delivered to a designated human oversight channel within five minutes of threshold breach detection, including the instance identifier, the divergence type, the specific probe categories where divergence was observed, the delta between the instance configuration state and the canonical reference, and the recommended immediate action.
4.4.4 Quarantined instances MUST NOT be returned to production traffic until a root-cause investigation has been completed, a remediation action has been applied and verified, and a human authoriser has approved reinstatement.
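The graduated thresholds of 4.4.1 lend themselves to a straightforward classification sketch. The warning and degraded values below are illustrative assumptions; only the 0.95 critical floor is taken from 4.4.2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DivergenceThresholds:
    """Per-agent-identity thresholds per 4.4.1 (values illustrative)."""
    warning: float = 0.99
    degraded: float = 0.97
    critical: float = 0.95  # 4.4.2 floor for regulated profiles

def classify(score: float, t: DivergenceThresholds) -> str:
    """Map a consistency score to the 4.4.1 response tier.
    Lower scores are worse; the most severe band is checked first."""
    if score < t.critical:
        return "quarantine"        # automatic, no human gate (4.4.1)
    if score < t.degraded:
        return "flag_for_review"   # optionally removed from active pool
    if score < t.warning:
        return "log_notification"  # logged notification to operations
    return "ok"
```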
4.5.1 Where agent instances use shared memory stores (vector databases, key-value caches, session stores), the operator MUST document the intended sharing scope—specifying which memory content is intended to be common across instances and which is intended to be instance-local.
4.5.2 The monitoring system MUST verify that memory content contributing to governance-relevant outputs is consistent with the intended sharing scope, and MUST flag any instance where memory content deviates from the canonical state in a way that could alter governance-relevant outputs.
4.5.3 Instance-local memory MUST NOT be permitted to override governance-relevant policy rules defined in the canonical behavioural reference. Operators MUST implement and verify a precedence architecture in which canonical policy rules take precedence over any instance-level memory state.
4.5.4 Cache invalidation events that affect governance-relevant content MUST be propagated to all affected instances within a maximum window of fifteen minutes and MUST be confirmed in the instance registry as successfully applied.
4.6.1 During any rolling deployment, blue-green transition, or canary release in which multiple versions of an agent are simultaneously active, the operator MUST classify each active version as either canonical or transitional.
4.6.2 Transitional instances MUST be subject to enhanced consistency monitoring at a minimum probe frequency of once per ten minutes.
4.6.3 The maximum permitted duration of a mixed-version state for governance-relevant agent deployments MUST be documented in the deployment plan, and MUST NOT exceed four hours for safety-critical, financial-value, public-sector / rights-sensitive, or crypto/web3 agent profiles without explicit risk acceptance signed by a named human authoriser.
4.6.4 If transitional instances exhibit governance-relevant behavioural divergence from the canonical version during a mixed-version window, the deployment MUST be halted and the transitional instances MUST be rolled back or quarantined before the transition proceeds.
4.7.1 The monitoring system MUST include adversarial divergence detection logic capable of distinguishing between benign configuration drift (arising from infrastructure variation or software bugs) and adversarially induced divergence (arising from external manipulation of instance configuration, memory, or system prompt state).
4.7.2 Indicators of adversarial divergence that MUST be monitored for include: sudden step-change divergence on a single instance with no corresponding configuration event; divergence confined to a specific category of governance-relevant probe (e.g., safety refusals only) inconsistent with a benign configuration change; divergence correlating in time with external access events on the affected node; and divergence in the direction of expanded permissions or reduced refusals.
4.7.3 When adversarial divergence indicators are detected, the affected instance MUST be immediately quarantined and the event MUST be escalated as a security incident under the operator's incident response process, with notification to the responsible security team within fifteen minutes.
4.7.4 Forensic evidence of the divergence event, including the instance configuration state at the time of detection, the probe results that triggered detection, and all access log entries for the affected node in the preceding twenty-four hours, MUST be preserved in tamper-evident storage immediately upon quarantine.
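The indicator logic of 4.7.2 can be sketched as a simple rule set. The event field names and the two-indicator escalation policy below are assumptions made for the example; operators will define their own event schema and escalation criteria.

```python
def adversarial_indicators(event: dict) -> list[str]:
    """Return the 4.7.2 indicators present in a divergence event.
    Event keys are illustrative assumptions, not a mandated schema."""
    hits = []
    # Sudden step-change divergence with no corresponding configuration event
    if event.get("step_change") and not event.get("config_event_nearby"):
        hits.append("step_change_without_config_event")
    # Divergence confined to a single governance-relevant probe category
    if event.get("diverged_probe_categories") == ["safety_refusal"]:
        hits.append("divergence_confined_to_safety_refusals")
    # Divergence correlating in time with external access on the node
    if event.get("external_access_within_window"):
        hits.append("correlated_external_access")
    # Divergence in the direction of expanded permissions or fewer refusals
    if event.get("direction") in ("expanded_permissions", "reduced_refusals"):
        hits.append("permissive_direction")
    return hits

def is_adversarial(event: dict, min_indicators: int = 2) -> bool:
    """Example escalation policy: two or more indicators treats the event
    as a security incident per 4.7.3. The threshold of two is an
    assumption, not a mandated value."""
    return len(adversarial_indicators(event)) >= min_indicators
```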
4.8.1 The operator MUST produce a cross-instance consistency report at a minimum frequency of weekly for all agent identities in scope of this dimension, summarising: the number of instances active during the reporting period; the range and distribution of per-instance consistency scores; all divergence events detected, their severity, and their resolution status; and any instances that were quarantined, their root causes, and their reinstatement status.
4.8.2 Consistency reports MUST be made available to the operator's AI governance function and, where applicable, to the responsible human supervisor defined under the operator's AI oversight policy.
4.8.3 Any unresolved divergence event that has been open for more than forty-eight hours MUST be escalated to senior management within the AI governance function, with a documented remediation plan and a committed resolution date.
4.8.4 For regulated profiles (financial-value, public-sector / rights-sensitive, safety-critical, cross-border / multi-jurisdiction), consistency reports and associated divergence event records SHOULD be retained for a minimum of five years and SHOULD be available for production to a regulator on request within five business days.
4.9.1 The operator MUST maintain operational documentation describing the cross-instance consistency monitoring architecture, the canonical behavioural reference management process, the divergence threshold configuration and its rationale, and the incident response procedures for divergence events.
4.9.2 Personnel responsible for operating, maintaining, or overseeing agents in scope of this dimension SHOULD receive documented training on cross-instance consistency risks, the use of monitoring tooling, and the escalation procedures for divergence events.
4.9.3 The documentation required by 4.9.1 MAY be integrated into the operator's broader AI system documentation under AG-034 (Audit Trail Completeness) and AG-019 (Configuration Integrity) rather than maintained as a standalone artefact, provided all required content is present and readily locatable.
The instinct of many engineering teams when confronted with cross-instance consistency requirements is to address them entirely through structural means: enforcing immutable infrastructure patterns, using read-only system prompt mounts, implementing GitOps-style configuration pipelines that guarantee identical deployment artefacts across all replicas. These structural approaches are correct and valuable, and they substantially reduce the probability of benign divergence. However, they are insufficient as a sole control for three reasons.
First, structural controls address only the surface they are designed for. An immutable system prompt mount does not govern memory-layer divergence; a GitOps deployment pipeline does not detect adversarial manipulation of a running instance's in-memory state; an identical model weight hash across all replicas does not prevent one instance from operating with a corrupted or tampered cache that affects its output behaviour in governance-relevant ways. Real-world deployments involve multiple configuration surfaces—model weights, system prompts, memory state, tool permission grants, runtime parameters, cached context—and structural controls applied to any one surface leave others exposed.
Second, multi-instance agent deployments exist in infrastructure environments that are subject to partial failures, network partitions, race conditions in configuration propagation, and human error in deployment pipelines. Even organisations with mature infrastructure-as-code practices experience states in which individual instances temporarily or persistently diverge from the intended configuration. The question is not whether divergence will ever occur, but whether it will be detected and remediated before it produces governance-relevant harm.
Third, and most critically for the adversarial model resistance framing of this landscape, a sufficiently motivated adversary will specifically target individual instances rather than the deployment pipeline, because a compromised single instance in a horizontally scaled fleet is both easier to attack and, without cross-instance monitoring, harder to detect than a compromised pipeline. Structural controls that enforce consistency at deployment time provide no defence against a post-deployment attack on a running instance. Detective controls that continuously compare all instances against a canonical reference are the only mechanism that closes this gap.
A naive approach to cross-instance consistency monitoring attempts to achieve complete output identity across replicas—requiring that every instance produce byte-for-byte identical responses to identical inputs. This is both technically infeasible (due to inherent non-determinism in LLM inference at production temperatures) and analytically unnecessary. The requirement is not that all instances behave identically in all respects; it is that all instances behave consistently with respect to governance-relevant decisions. The dimension therefore deliberately scopes its monitoring and alerting requirements to governance-relevant behaviour as defined in Section 4.0, and its probe stimulus library to stimuli that exercise those specific behavioural classes. This scoping is essential both for making the monitoring computationally tractable and for ensuring that alert thresholds are calibrated to real harm rather than to benign variation.
For cross-border and multi-jurisdiction deployments, cross-instance consistency carries additional legal weight. Where an agent's obligations differ by jurisdiction—because it must apply different disclosure requirements, different data handling rules, or different decision constraints in different regulatory territories—instances serving different territories must be not only internally consistent within their territory but correctly configured for their territory. A cross-instance consistency framework that monitors only for intra-fleet uniformity would incorrectly flag legitimate jurisdictional variation as divergence. The canonical behavioural reference framework in 4.2 must therefore support jurisdiction-scoped canonical references, so that an instance serving EU users is compared against the EU-applicable canonical reference and an instance serving a different jurisdiction is compared against the applicable reference for that territory.
Canonical Reference as Code. Maintain the canonical behavioural reference as a version-controlled artefact in a code repository subject to access controls, code review, and cryptographic signing of releases. Every element of the canonical reference—system prompt, tool permission manifest, disclosure templates, refusal policy rules—should exist as a structured, machine-readable artefact that can be programmatically loaded by both the monitoring system and the deployment pipeline. This ensures that the canonical reference used by monitoring is identical to the canonical reference used for deployment, eliminating a class of false positives caused by monitoring-deployment misalignment.
Probe Stimulus Library as a Governed Asset. Treat the probe stimulus library used by the consistency monitoring system as a governed asset subject to change control, separate from—but linked to—the canonical behavioural reference. Probes should be specifically designed to exercise the governance-relevant behavioural boundaries of the agent, not general capability evaluation. Engage subject-matter experts in the agent's regulatory domain (compliance, legal, safety) in the design and review of probe stimuli. Ensure probes are updated whenever the canonical behavioural reference changes to reflect new governance obligations.
Separate Monitoring Infrastructure. Deploy the cross-instance consistency monitoring system on infrastructure that is administratively and technically segregated from the agent instances it monitors. Use a separate identity, separate network path, separate configuration source, and ideally separate cloud account or physical infrastructure. The monitoring system should be treated as a privileged audit and control system, not as a peer of the agent instances. This segregation prevents an adversary who has compromised an agent instance from also compromising the monitoring system.
Attestation Heartbeat with Cryptographic Binding. Implement the instance attestation mechanism using cryptographic commitments rather than plain-text state reports. Each instance should compute and transmit a signed hash of its current governance-relevant configuration state at each attestation cycle. The signature should be verifiable by the registry against a key provisioned at deployment time. This allows the registry to detect not only configuration drift but also attempts by a compromised instance to falsify its attestation.
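One possible realisation of the cryptographically bound attestation, using HMAC-SHA256 with a per-instance key provisioned at deployment time; asymmetric signatures would serve equally well. All names are illustrative.

```python
import hashlib
import hmac
import json

def make_attestation(config_state: dict, instance_key: bytes) -> dict:
    """Instance-side: signed commitment to the governance-relevant
    configuration state, transmitted each attestation cycle."""
    digest = hashlib.sha256(
        json.dumps(config_state, sort_keys=True).encode()).hexdigest()
    signature = hmac.new(instance_key, digest.encode(),
                         hashlib.sha256).hexdigest()
    return {"state_hash": digest, "signature": signature}

def verify_attestation(attestation: dict, instance_key: bytes) -> bool:
    """Registry-side: rejects forged attestations from an instance that
    does not hold the provisioned key. Drift detection (comparing
    state_hash against the canonical reference) happens separately."""
    expected = hmac.new(instance_key, attestation["state_hash"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(attestation["signature"], expected)
```

The constant-time comparison (`hmac.compare_digest`) avoids leaking signature bytes through timing, which matters once the registry endpoint is network-reachable.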
Graduated Response Automation. Automate the response to divergence events in proportion to their severity. Warning-level events should generate logged notifications. Degraded-level events should trigger automatic traffic-weighted reduction (e.g., removing the flagged instance from the load balancer pool while preserving it for investigation). Critical-level events should trigger automatic quarantine with no human gate on the quarantine action itself—human oversight is required for reinstatement, not for the protective isolation. This graduated model ensures that the monitoring system can act within the time window that matters for harm prevention, while preserving human authority over remediation decisions.
Jurisdiction-Scoped Canonical References. For cross-border deployments, implement canonical reference scoping at the jurisdiction or regulatory-territory level. The canonical reference management system should support definition of a base canonical reference plus jurisdiction-specific overrides, with the monitoring system resolving the applicable reference for each instance based on the instance's declared serving scope. Flag as divergence only deviations from the applicable scoped reference, not from the base reference, to avoid false positives on legitimate jurisdictional variation.
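The scoped-resolution logic can be sketched as a base reference plus jurisdiction-specific overrides. The shallow merge and the field names are deliberate simplifications for illustration.

```python
def resolve_reference(base: dict, overrides: dict[str, dict],
                      jurisdiction: str) -> dict:
    """Resolve the applicable canonical reference for an instance:
    the base reference with jurisdiction-specific overrides applied.
    A shallow merge is an illustrative simplification."""
    resolved = dict(base)
    resolved.update(overrides.get(jurisdiction, {}))
    return resolved

def is_divergent(instance_state: dict, base: dict,
                 overrides: dict[str, dict], jurisdiction: str) -> bool:
    """Compare the instance against the scoped reference, not the base
    reference, so legitimate jurisdictional variation is never flagged
    as divergence."""
    return instance_state != resolve_reference(base, overrides, jurisdiction)
```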
Integration with Change Management. Wire the canonical reference update process directly into the organisation's change management system. When a change to the canonical reference is approved, the change management system should automatically trigger: propagation of the updated reference to all instances, updated probe evaluations against all instances under the new reference, and a consistency report confirming that all instances have adopted the new canonical state. This eliminates the gap between approved policy changes and verified implementation.
Anti-pattern: Monitoring Only on Deployment. Running consistency checks only at deployment time and treating post-deployment divergence as an infrastructure team responsibility with no automated monitoring is the most common and most dangerous anti-pattern. Deployment-time checks catch configuration errors introduced during deployment; they do not detect adversarial manipulation, memory-layer drift, or partial infrastructure failures that occur after deployment.
Anti-pattern: Using Production Traffic as the Consistency Signal. Attempting to detect inconsistency by analysing divergence in production outputs is superficially attractive but fundamentally flawed. Production traffic varies in topic, user population, and context, making it extremely difficult to isolate governance-relevant divergence from benign output variation. It also delays detection—governance-relevant divergence may affect a small proportion of production interactions and go undetected for weeks. Dedicated probe evaluations with controlled stimuli are the correct mechanism.
Anti-pattern: Shared Monitoring and Agent Infrastructure. Running the monitoring system as a co-located process or container within the same deployment unit as the agent instances it monitors defeats the independence requirement. An adversary who compromises an agent instance can also compromise a co-located monitoring process. Monitoring must be structurally independent.
Anti-pattern: Binary Pass/Fail Consistency Scoring. Defining consistency as a binary pass/fail based on whether all probes return identical outputs ignores the governance-relevant scoping requirement and produces excessive false positives on benign stylistic variation. Scoring should be calibrated to the proportion of governance-relevant probe categories in which material divergence is observed.
Anti-pattern: Static Probe Libraries. Maintaining a fixed probe stimulus library that is not updated when the agent's regulatory obligations or operational context change results in monitoring that fails to cover new governance-relevant behaviours. The probe library must evolve in step with the canonical behavioural reference and the agent's compliance obligations.
Anti-pattern: Treating Consistency Monitoring as a Development-Phase Activity. Some teams treat cross-instance consistency verification as a QA activity performed before release and conclude that it does not need to continue in production. This misunderstands the threat model. Post-deployment adversarial manipulation, infrastructure divergence, and memory-layer drift are production phenomena; they cannot be detected by pre-release testing.
| Maturity Level | Characteristic State |
|---|---|
| Level 1 — Initial | No cross-instance consistency monitoring; divergence detected only through user complaints or manual audit; no canonical reference formally defined. |
| Level 2 — Developing | Canonical reference defined and version-controlled; ad-hoc consistency checks performed on demand; alert thresholds not formally defined; no automated quarantine. |
| Level 3 — Defined | Automated probe-based monitoring deployed; divergence thresholds defined; alerting to operations team; manual quarantine process documented and exercised. |
| Level 4 — Managed | Automated graduated response (warning/degraded/quarantine); cryptographic attestation heartbeat; adversarial divergence detection logic in place; weekly governance reports produced. |
| Level 5 — Optimising | Jurisdiction-scoped canonical references; probe library dynamically updated with regulatory change; monitoring integrated with change management pipeline; continuous improvement of probe coverage based on incident analysis. |
| Artefact | Description | Minimum Retention |
|---|---|---|
| Instance Registry Records | Timestamped registration and attestation records for each agent instance, including all configuration state hashes | 3 years |
| Canonical Behavioural Reference Version History | All versions of the canonical reference with content-addressed identifiers, change authorisation records, and propagation confirmation records | 5 years |
| Probe Evaluation Logs | Complete logs of all probe stimulus submissions, per-instance responses, consistency scores, and divergence flags | 3 years |
| Divergence Event Records | Full records for each divergence event including detection timestamp, instance identifier, divergence characterisation, severity classification, response actions taken, root-cause investigation findings, and reinstatement authorisation | 5 years |
| Adversarial Divergence Investigation Reports | Forensic reports produced following any adversarial divergence detection, including preserved configuration state, access logs, and security incident classification | 7 years |
| Cross-Instance Consistency Reports | Weekly governance reports summarising fleet consistency state, event trends, and open remediation items | 5 years |
| Change-Control Records for Canonical Reference Updates | Authorisation records, impact assessments, and propagation confirmation records for each canonical reference change | 5 years |
| Deployment Documentation | Architectural documentation of the monitoring system, including evidence of infrastructure segregation from agent instances | 3 years, updated with each significant architectural change |
| Training Records | Evidence of completion of consistency governance training for responsible personnel | Duration of employment plus 2 years |
All log and record artefacts MUST be stored in append-only, tamper-evident storage. Cryptographic integrity verification (e.g., hash-chained log entries or write-once object storage with object-lock) SHOULD be applied to probe evaluation logs and divergence event records. Artefacts required for regulatory profiles (financial-value, public-sector / rights-sensitive, safety-critical, cross-border / multi-jurisdiction) MUST be stored with redundancy sufficient to survive single-region infrastructure failure. Retention periods given are minimum periods; where applicable regulatory obligations specify longer retention, those obligations govern.
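Hash-chained log entries, named above as one integrity mechanism, can be sketched as follows: each appended entry commits to its predecessor, so any in-place modification of an earlier record is detectable on verification. This is an illustrative minimum, not a substitute for write-once storage.

```python
import hashlib
import json

class HashChainedLog:
    """Append-only log with hash chaining. Each entry's link hash covers
    both the entry payload and the previous link, so tampering with any
    record breaks the chain from that point onward."""
    def __init__(self) -> None:
        self.entries: list[dict] = []
        self._head = "genesis"

    def append(self, record: dict) -> None:
        payload = json.dumps(record, sort_keys=True)
        link = hashlib.sha256((self._head + payload).encode()).hexdigest()
        self.entries.append({"record": record, "link": link})
        self._head = link

    def verify(self) -> bool:
        """Recompute every link from the genesis value; any mismatch
        indicates modification, insertion, or deletion."""
        head = "genesis"
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            link = hashlib.sha256((head + payload).encode()).hexdigest()
            if link != entry["link"]:
                return False
            head = link
        return True
```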
Maps to: 4.1.1, 4.1.2, 4.1.4
Procedure: Deploy a test fleet of five instances of a nominated agent. Verify that all five instances appear in the instance registry within two minutes of instantiation with all required fields (unique instance identifier, agent identity, model version hash, system prompt version hash, memory configuration state hash, deployment timestamp, node identifier). Simulate a configuration reload on two of the five instances. Verify that updated attestation records appear in the registry within the sixty-minute re-attestation interval. Issue a request to enumerate all active instances of the agent identity; verify that an accurate and complete enumeration is returned within five minutes.
Pass criteria:
- All five instances appear in the instance registry within two minutes of instantiation, with every required field populated.
- Updated attestation records for both reloaded instances appear in the registry within the sixty-minute re-attestation interval.
- The enumeration request returns a complete and accurate list of all active instances within five minutes.
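The registry completeness check in this procedure can be sketched as follows. The field names mirror the required registry fields listed above; the `attest` helper and its parameters are illustrative assumptions, not a mandated interface.

```python
import hashlib
from dataclasses import dataclass, asdict

# Required registry fields per the procedure above (names illustrative).
REQUIRED_FIELDS = {
    "instance_id", "agent_identity", "model_version_hash",
    "system_prompt_hash", "memory_config_hash",
    "deployment_timestamp", "node_id",
}


@dataclass(frozen=True)
class AttestationRecord:
    instance_id: str
    agent_identity: str
    model_version_hash: str
    system_prompt_hash: str
    memory_config_hash: str
    deployment_timestamp: str
    node_id: str


def attest(instance_id: str, agent_identity: str, model_bytes: bytes,
           prompt_text: str, memory_config: str,
           timestamp: str, node_id: str) -> AttestationRecord:
    """Build an attestation record by hashing the governed artefacts."""
    sha = lambda b: hashlib.sha256(b).hexdigest()
    return AttestationRecord(
        instance_id, agent_identity,
        sha(model_bytes), sha(prompt_text.encode()),
        sha(memory_config.encode()), timestamp, node_id,
    )


def registry_entry_complete(record: AttestationRecord) -> bool:
    """True only if every required field is present and non-empty."""
    fields = asdict(record)
    return REQUIRED_FIELDS <= fields.keys() and all(fields.values())
```

Two instances running identical configuration will produce identical hash triples, which is what the re-attestation comparison in the next sections relies on.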
Maps to: 4.2.1, 4.2.2, 4.2.3, 4.2.4
Procedure: Retrieve the current canonical behavioural reference for a nominated agent identity and verify that it contains all required components (system prompt hash, model version hash, tool permission manifest, memory access scope definition, output policy rules, mandatory disclosure templates). Apply an authorised change to the canonical reference through the documented change-control process. Measure the time from change authorisation to confirmed propagation on all active instances. Attempt to modify the canonical reference directly from an agent instance credential; verify that the modification is rejected.
Pass criteria:
- The retrieved canonical reference contains all six required components.
- The authorised change propagates to all active instances, with the time from authorisation to confirmed propagation measured and recorded.
- The attempted modification using an agent instance credential is rejected.
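A minimal sketch of the completeness and propagation checks in this procedure follows. The component keys track the six required components named above; the fingerprint scheme (a SHA-256 over the canonically serialised reference) is an illustrative assumption, not a mandated encoding.

```python
import hashlib
import json

# The six required canonical-reference components (keys illustrative).
REQUIRED_COMPONENTS = (
    "system_prompt_hash", "model_version_hash", "tool_permission_manifest",
    "memory_access_scope", "output_policy_rules", "disclosure_templates",
)


def reference_fingerprint(reference: dict) -> str:
    """Reject an incomplete reference; otherwise return its fingerprint."""
    missing = [c for c in REQUIRED_COMPONENTS if c not in reference]
    if missing:
        raise ValueError(f"canonical reference incomplete: {missing}")
    return hashlib.sha256(
        json.dumps(reference, sort_keys=True).encode()
    ).hexdigest()


def unpropagated_instances(canonical: dict,
                           instance_reports: dict[str, str]) -> list[str]:
    """Return instance IDs whose reported fingerprint diverges from the
    canonical reference, i.e. instances where propagation is unconfirmed."""
    expected = reference_fingerprint(canonical)
    return sorted(iid for iid, fp in instance_reports.items()
                  if fp != expected)
```

Repeating the fingerprint comparison after each change-control event gives the "confirmed propagation on all active instances" measurement; a non-empty result after the propagation deadline is a divergence event.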
Maps to: 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5, 4.3.6
Procedure: Confirm that the consistency monitoring system is deployed on infrastructure separate from the agent instances (verify separate process space, configuration source, and network identity). Submit the governance-relevant probe stimulus library to all active instances of the nominated agent. Verify that the probe library contains at minimum: three safety refusal trigger cases appropriate to the agent's domain, three mandatory disclosure trigger cases, three policy-boundary decision cases, and three capability-limitation acknowledgement cases. Verify that a per-instance consistency score is computed and logged after each evaluation cycle. Deliberately introduce a configuration divergence on one instance (under controlled test conditions) and verify that the divergence is flagged within one attestation cycle. Verify that probe logs are written to an append-only store inaccessible to agent instances.
Pass criteria:
- The monitoring system runs in a separate process space, with a configuration source and network identity distinct from the agent instances.
- The probe stimulus library contains at least three cases in each of the four required categories.
- A per-instance consistency score is computed and logged after each evaluation cycle.
- The deliberately introduced configuration divergence is flagged within one attestation cycle.
- Probe logs are written to an append-only store that agent instances cannot access.
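The scoring and flagging steps of this procedure can be sketched as below. The category names, the exact-match comparison, and the default threshold of 1.0 (governance-relevant probes typically demand exact agreement with the canonical expected response) are illustrative assumptions.

```python
from collections import Counter

# Minimum probe-library composition per the procedure above.
MIN_CASES = {
    "safety_refusal": 3,
    "mandatory_disclosure": 3,
    "policy_boundary": 3,
    "capability_limitation": 3,
}


def library_meets_minimum(library: list[tuple[str, str]]) -> bool:
    """library: (category, stimulus) pairs. True if every required
    category has at least its minimum number of cases."""
    counts = Counter(category for category, _ in library)
    return all(counts[cat] >= n for cat, n in MIN_CASES.items())


def consistency_score(responses: dict[str, str],
                      expected: dict[str, str]) -> float:
    """Fraction of probes whose response matches the canonical answer."""
    matches = sum(1 for pid, ans in expected.items()
                  if responses.get(pid) == ans)
    return matches / len(expected)


def flag_divergent(fleet_responses: dict[str, dict[str, str]],
                   expected: dict[str, str],
                   threshold: float = 1.0) -> dict[str, float]:
    """Return {instance_id: score} for instances below threshold."""
    return {
        iid: score
        for iid, responses in fleet_responses.items()
        if (score := consistency_score(responses, expected)) < threshold
    }
```

Each returned entry would become a divergence event record (detection timestamp, instance identifier, severity classification) in the retention table above.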
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement |
| NIST AI RMF | GOVERN 1.1, MAP 3.2, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance |
Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies, analyses, estimates, and evaluates risks. Cross-Instance Behavioural Consistency Governance implements a specific risk mitigation measure within this framework. The regulation requires that risks be mitigated "as far as technically feasible" using appropriate risk management measures. For deployments classified as high-risk under Annex III, compliance with AG-731 supports the Article 9 obligation by providing structural governance controls rather than relying solely on the agent's own reasoning or behavioural compliance.
Article 15 requires high-risk AI systems to achieve appropriate levels of accuracy, robustness, and cybersecurity. Cross-Instance Behavioural Consistency Governance directly supports the robustness and cybersecurity requirements by implementing structural controls that resist adversarial manipulation and ensure system integrity under attack conditions.
GOVERN 1.1 addresses legal and regulatory requirements; MAP 3.2 addresses risk context mapping; MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-731 supports compliance by establishing structural governance boundaries that implement the framework's approach to AI risk management.
Clause 6.1 requires organisations to determine actions to address risks and opportunities within the AI management system. Clause 8.2 requires AI risk assessment. Cross-Instance Behavioural Consistency Governance implements a risk treatment control within the AI management system, supporting the requirement for structured risk mitigation.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — potentially cross-organisation where agents interact with external counterparties or shared infrastructure |
| Escalation Path | Immediate executive notification and regulatory disclosure assessment |
Consequence chain: Without cross-instance behavioural consistency governance, the governance framework has a structural gap that can be exploited at machine speed. The failure mode is not gradual degradation; it is the outright absence of a control, permitting unbounded agent behaviour in the dimension this protocol governs. The immediate consequence is uncontrolled agent action within the scope of AG-731, potentially cascading to dependent dimensions and downstream systems. Operational impacts include regulatory enforcement action, material financial or operational loss, reputational damage, and potential personal liability for senior managers under applicable accountability regimes. Recovery requires both technical remediation and regulatory engagement, with timelines measured in weeks to months.