AG-154

Correlated Control Failure Analysis

Control Efficacy, Redundancy & Meta-Governance · ~15 min read · AGS v2.1 · April 2026

2. Summary

Correlated Control Failure Analysis requires that organisations systematically identify, model, and mitigate shared failure modes across their AI agent governance controls. When multiple governance controls share dependencies — the same infrastructure, the same credentials, the same vendor, the same reasoning model, or the same data pipeline — a single point of failure can simultaneously disable all controls that share that dependency. This dimension ensures that the apparent redundancy of multiple governance controls translates into actual independence, and that correlated failure risks are identified before they materialise as simultaneous multi-control failures.

3. Example

Scenario A — Shared Infrastructure Disables All Monitoring: An organisation deploys five governance controls for its AI agent fleet: mandate enforcement (AG-001), behavioural drift detection (AG-022), deception detection (AG-039), action logging, and human escalation triggers. All five controls are deployed as microservices on the same Kubernetes cluster, sharing the same node pool. A resource exhaustion event caused by a denial-of-service attack on the agent API consumes all available cluster resources. All five governance controls become unresponsive simultaneously. The agents continue to operate because they are hosted on a separate cluster, but they now operate with zero governance oversight. In the 47 minutes before the infrastructure team restores governance services, one agent executes £340,000 in transactions that would have been blocked by mandate enforcement, and another exhibits behavioural drift that would have triggered escalation.

What went wrong: Five nominally independent governance controls shared a single infrastructure dependency. The apparent redundancy was illusory — a single failure event disabled all controls simultaneously. No correlated failure analysis had identified this shared dependency.

Scenario B — Shared Credential Rotation Disables All Controls: An organisation's governance controls authenticate to the agent event stream using service account credentials managed by a central identity provider. During a scheduled credential rotation, the identity provider experiences a 2-hour outage. All governance controls lose access to the event stream simultaneously. The controls are designed to fail-safe (AG-008 compliant), so agents are paused — but the organisation's entire AI agent fleet is offline for 2 hours during peak business operations, causing an estimated £1.2 million in lost revenue and 3,400 customer service failures. A correlated failure analysis would have identified the identity provider as a single point of failure across all controls and recommended credential caching or independent authentication for critical controls.

What went wrong: All governance controls shared a credential management dependency. The correlated failure was not identified during governance architecture design. The fail-safe behaviour (agent pause) was correct but the correlated failure made it unnecessarily broad — all agents paused rather than only those whose specific controls were affected.

Scenario C — Shared AI Model Creates Common-Mode Reasoning Failure: An organisation uses the same large language model as a component in three governance controls: content safety filtering, regulatory compliance checking, and anomaly detection. A model update introduces a reasoning regression that causes the model to underweight specific categories of safety-relevant content. All three controls degrade simultaneously because they share the same reasoning component. The content safety filter misses 28% of policy-violating outputs. The compliance checker fails to flag 19% of regulatory violations. The anomaly detector generates 40% more false negatives. Because all three controls use the same model, their failures are correlated — they all miss the same categories of violations, providing no compensating detection. A diverse model approach (different models for different controls) would have limited the failure to one control while the others compensated.

What went wrong: Three governance controls shared a common reasoning component (the same LLM). A model regression created a common-mode failure across all three. No diversity analysis had identified the shared model dependency as a correlated failure risk.

4. Requirement Statement

Scope: This dimension applies to all AI agent governance deployments where multiple governance controls are expected to provide layered or redundant assurance. Any organisation deploying two or more governance controls has a correlated failure risk — and the risk increases with the number of controls sharing common dependencies. Single-control deployments are technically excluded, though such deployments are inherently fragile and should evolve toward multi-control architectures. The scope extends to shared dependencies at all layers: infrastructure (compute, network, storage), platform (operating system, container runtime, orchestration), service (identity providers, logging services, configuration management), data (event streams, data pipelines, reference data), and reasoning (shared AI models, shared algorithmic components).

4.1. A conforming system MUST maintain a dependency map documenting the infrastructure, platform, service, data, and reasoning dependencies of each deployed governance control.
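The dependency map required by 4.1 can be represented as a simple per-control record grouped by the five layers named in the scope. A minimal sketch in Python — the control names and dependency identifiers are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class ControlDependencies:
    """One governance control's dependencies, grouped by the five
    layers named in the scope: infrastructure, platform, service,
    data, and reasoning."""
    control_id: str
    infrastructure: set = field(default_factory=set)
    platform: set = field(default_factory=set)
    service: set = field(default_factory=set)
    data: set = field(default_factory=set)
    reasoning: set = field(default_factory=set)

    def all_deps(self) -> set:
        """All dependencies of this control, across every layer."""
        return (self.infrastructure | self.platform | self.service
                | self.data | self.reasoning)

# Hypothetical entries echoing Scenario A: both controls share a
# cluster and an identity provider.
dependency_map = [
    ControlDependencies("mandate-enforcement",
                        infrastructure={"k8s-cluster-gov"},
                        service={"idp-central"}),
    ControlDependencies("drift-detection",
                        infrastructure={"k8s-cluster-gov"},
                        service={"idp-central"},
                        reasoning={"llm-x"}),
]
```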

4.2. A conforming system MUST identify all shared dependencies where two or more governance controls depend on the same component, and classify each shared dependency by the number of controls affected and the criticality of the controls involved.
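Identifying shared dependencies under 4.2 amounts to inverting the map from controls-to-dependencies into dependencies-to-controls and keeping entries touched by two or more controls. A sketch, with hypothetical control and dependency names:

```python
from collections import defaultdict

def shared_dependencies(dep_map: dict) -> dict:
    """Invert {control: set_of_deps} into {dep: set_of_controls},
    keeping only dependencies shared by two or more controls (4.2)."""
    by_dep = defaultdict(set)
    for control, deps in dep_map.items():
        for dep in deps:
            by_dep[dep].add(control)
    return {dep: ctrls for dep, ctrls in by_dep.items() if len(ctrls) >= 2}

# Hypothetical deployment: a shared cluster and identity provider,
# plus one model used by a single control.
dep_map = {
    "mandate-enforcement": {"k8s-gov", "idp-central"},
    "drift-detection": {"k8s-gov", "idp-central"},
    "deception-detection": {"k8s-gov", "llm-x"},
}
shared = shared_dependencies(dep_map)
# "k8s-gov" is shared by all three controls; "llm-x" is used by only
# one control, so it is not flagged.
```

The resulting register can then be classified by the number of controls affected and their criticality, per the requirement.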

4.3. A conforming system MUST assess the correlated failure risk for each shared dependency, documenting the impact of the dependency's failure on each governance control and the combined governance coverage loss.

4.4. A conforming system MUST implement mitigation for any shared dependency whose failure would simultaneously disable governance controls covering more than 50% of the governance function scope (e.g., more than 50% of deployed controls, or all controls covering a specific risk domain).
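The coverage test in 4.4 can be expressed as a simple check: what fraction of deployed controls would a given dependency's failure disable? A sketch under the simplifying assumption that every control carries equal weight (names hypothetical):

```python
def needs_mitigation(dep_map: dict, dependency: str,
                     threshold: float = 0.5) -> bool:
    """True if failure of `dependency` would disable controls covering
    more than `threshold` of deployed controls (requirement 4.4)."""
    affected = sum(1 for deps in dep_map.values() if dependency in deps)
    return affected / len(dep_map) > threshold

controls = {
    "mandate-enforcement": {"k8s-gov", "idp-central"},
    "drift-detection": {"k8s-gov", "idp-central"},
    "deception-detection": {"llm-x"},
}
# k8s-gov affects 2 of 3 controls (67% > 50%): mitigation required.
# llm-x affects 1 of 3 controls (33%): documented, but below threshold.
```

A fuller implementation would also apply the requirement's alternative trigger — all controls covering a specific risk domain — rather than raw control counts alone.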

4.5. A conforming system MUST update the dependency map and correlated failure assessment within 30 days of any change to governance control deployment, infrastructure, or dependencies.

4.6. A conforming system SHOULD implement diversity requirements for critical governance controls, ensuring that controls covering the same risk domain do not share infrastructure, platform, vendor, or reasoning model dependencies.
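The diversity requirement in 4.6 can be checked mechanically: for each risk domain, flag any dependency shared by two or more of the controls covering that domain. A sketch with hypothetical domain and control names:

```python
def diversity_violations(dep_map: dict, risk_domains: dict) -> dict:
    """For each risk domain, return dependencies shared by two or more
    of the controls covering that domain (requirement 4.6)."""
    violations = {}
    for domain, domain_controls in risk_domains.items():
        seen = {}
        for control in domain_controls:
            for dep in dep_map.get(control, set()):
                seen.setdefault(dep, set()).add(control)
        shared = {d: c for d, c in seen.items() if len(c) >= 2}
        if shared:
            violations[domain] = shared
    return violations

# Hypothetical: two content-safety controls both reasoning over the
# same model, echoing Scenario C's common-mode risk.
deps = {
    "content-filter": {"llm-x", "k8s-a"},
    "compliance-check": {"llm-x", "k8s-b"},
}
domains = {"content-safety": ["content-filter", "compliance-check"]}
```

Here the shared model is flagged, while the two separate clusters are not — the infrastructure layer already satisfies the diversity requirement.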

4.7. A conforming system SHOULD conduct correlated failure simulation exercises at least annually, testing the impact of simultaneous failure of controls sharing a common dependency.
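One starting point for the annual simulation exercise in 4.7 is a tabletop ranking: for every dependency, compute the fraction of controls its failure would disable, then rehearse the worst cases first. A sketch (names hypothetical):

```python
def worst_case_failures(dep_map: dict) -> list:
    """Rank each dependency by the fraction of controls its failure
    would simultaneously disable, worst first (requirement 4.7)."""
    all_deps = set().union(*dep_map.values())
    impact = {
        dep: sum(1 for deps in dep_map.values() if dep in deps) / len(dep_map)
        for dep in all_deps
    }
    return sorted(impact.items(), key=lambda item: item[1], reverse=True)

fleet = {
    "mandate-enforcement": {"k8s-gov", "idp-central"},
    "drift-detection": {"k8s-gov", "idp-central"},
    "deception-detection": {"k8s-gov", "llm-x"},
}
# k8s-gov tops the ranking: its failure disables all three controls,
# reproducing Scenario A.
```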

4.8. A conforming system SHOULD monitor shared dependencies in real time and alert when a dependency experiences degradation that could affect multiple controls.

4.9. A conforming system MAY implement automatic governance posture adjustment when a shared dependency degrades, tightening remaining controls or pausing agent operations based on the residual governance coverage level.
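The MAY-level posture adjustment in 4.9 can be sketched as a mapping from residual governance coverage to an operating posture. The thresholds below are illustrative, not prescribed by this dimension:

```python
def posture_for_coverage(residual_coverage: float) -> str:
    """Map the fraction of governance controls still operating to an
    agent-fleet posture (requirement 4.9). Thresholds are illustrative."""
    if residual_coverage >= 0.8:
        return "normal"       # minor degradation: operate as usual
    if residual_coverage >= 0.5:
        return "tightened"    # tighten thresholds on surviving controls
    return "paused"           # AG-008-style fail-safe: pause agents
```

A graduated response like this avoids the Scenario B outcome, where a correlated failure forced an unnecessarily broad pause of the entire fleet.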

5. Rationale

Redundancy is a fundamental principle of reliable system design. Governance frameworks rely on redundancy — multiple controls covering the same risk domain — to ensure that the failure of any single control does not leave a risk unmitigated. But redundancy only works if the redundant components fail independently. When multiple components share a common dependency, their failures are correlated, and the apparent redundancy is partially or wholly illusory.

This problem is well-understood in safety engineering (common-cause failure analysis is mandatory in nuclear safety and aerospace), financial risk management (correlation risk is a core concept in portfolio theory and was a primary driver of the 2008 financial crisis), and reliability engineering (common-mode failure analysis is standard in high-reliability systems). Yet in AI governance, correlated failure risk is routinely overlooked. Organisations deploy multiple governance controls and assume that "more controls means more assurance" without analysing whether those controls share dependencies that could cause them to fail together.

The relationship to AG-155 (Oversight Diversity and Heterogeneous Redundancy Governance) is direct: AG-154 identifies correlated failure risks; AG-155 prescribes the diversity measures needed to mitigate them. AG-154 is the diagnostic dimension (where are the correlated risks?); AG-155 is the prescriptive dimension (how do we eliminate them?). Both are necessary — diagnosis without prescription identifies risks but doesn't fix them; prescription without diagnosis may address the wrong risks.

The 50% threshold in requirement 4.4 represents a pragmatic balance. Requiring mitigation for every shared dependency would be impractical and disproportionate. Requiring mitigation only for dependencies whose failure would disable all controls would miss dangerous partial failures. The 50% threshold ensures that the most impactful correlated failure risks are mitigated while allowing organisations to accept minor correlation risks with appropriate documentation.

6. Implementation Guidance

Correlated control failure analysis begins with mapping dependencies and proceeds through identification, assessment, and mitigation of shared failure modes.

Recommended patterns:

- Deploy controls covering the same risk domain on separate infrastructure — separate clusters, node pools, or regions — so a single resource exhaustion event cannot disable them together.
- Use different reasoning models in different controls to avoid common-mode reasoning failures.
- Provide credential caching or independent authentication paths for critical controls so an identity provider outage does not disable them all.
- Generate the dependency map automatically from infrastructure-as-code configurations so it remains a live artefact.

Anti-patterns to avoid:

- Deploying all governance controls as microservices on the same cluster and node pool (Scenario A).
- Routing every control's authentication through a single identity provider with no fallback (Scenario B).
- Reusing the same LLM as a reasoning component across controls covering the same risk domain (Scenario C).
- Treating control count as a proxy for assurance without analysing shared dependencies.

Industry Considerations

Financial Services. Correlated failure risk in governance controls maps directly to operational resilience requirements under DORA and FCA/PRA operational resilience policy. Financial regulators expect firms to identify important business services and map their dependencies to identify concentration risks. For AI agent governance, governance controls are supporting functions for important business services, and their correlated failure risks must be identified and mitigated.

Healthcare. Clinical AI governance controls that share a common dependency create patient safety risks. If a clinical decision support agent's safety filter and dosage checker both depend on the same drug interaction database, a database corruption event disables both safety controls simultaneously. Healthcare regulators expect defence-in-depth with genuinely independent layers.

Critical Infrastructure. IEC 62443 and nuclear safety frameworks require common-cause failure analysis for safety-related systems. AI agents controlling critical infrastructure must apply equivalent analysis to their governance controls.

Maturity Model

Basic Implementation — A dependency map documents infrastructure, platform, service, data, and reasoning dependencies for each governance control. Shared dependencies are identified and classified. Correlated failure risk is assessed for shared dependencies affecting more than 50% of governance coverage. Mitigation is implemented for the highest-risk shared dependencies. The dependency map is updated within 30 days of changes. This level meets the minimum mandatory requirements.

Intermediate Implementation — All basic capabilities plus: diversity scoring is computed for each risk domain. Correlated failure simulation exercises are conducted annually. Real-time dependency health monitoring alerts when shared dependencies degrade. The dependency graph is maintained as a live artefact, updated automatically from infrastructure-as-code configurations.

Advanced Implementation — All intermediate capabilities plus: automatic governance posture adjustment responds to shared dependency degradation in real time. Multi-cloud or multi-region deployment eliminates infrastructure-level correlated failure risks for critical controls. Reasoning model diversity eliminates common-mode reasoning failures. Independent adversarial testing of correlated failure resilience has been conducted. The organisation can demonstrate to regulators that no single dependency failure can disable more than a defined percentage of governance coverage.

7. Evidence Requirements

Required artefacts: the dependency map (4.1); the shared dependency register, classified by controls affected and criticality (4.2); correlated failure risk assessments (4.3); mitigation records for dependencies exceeding the 50% coverage threshold (4.4); and reports from correlated failure simulation exercises (4.7).

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Dependency Map Completeness

Test 8.2: Shared Dependency Identification

Test 8.3: Correlated Failure Impact Simulation

Test 8.4: 50% Coverage Threshold Mitigation

Test 8.5: Dependency Map Currency

Test 8.6: Reasoning Model Correlation

Test 8.7: Automatic Posture Adjustment

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Supports compliance
EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Supports compliance
DORA | Article 9 (ICT Risk Management Framework) | Direct requirement
DORA | Article 28 (Third-Party ICT Concentration Risk) | Direct requirement
FCA/PRA | Operational Resilience Policy (PS6/21, PS21/3) | Direct requirement
NIST AI RMF | GOVERN 1.1, MANAGE 2.2 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks) | Supports compliance
IEC 62443 | SR 7.1 (Denial of Service Protection) | Supports compliance

DORA — Article 28 (Third-Party ICT Concentration Risk)

Article 28 requires financial entities to identify and manage concentration risk arising from dependency on third-party ICT service providers. For AI agent governance, this includes concentration risk where multiple governance controls depend on the same third-party provider (cloud infrastructure, model provider, identity service). AG-154 directly implements the concentration risk identification and assessment required by Article 28.

FCA/PRA Operational Resilience Policy

PS6/21 and PS21/3 require firms to identify important business services and map the resources (people, processes, technology, facilities, information) required to deliver them. Firms must identify vulnerabilities arising from concentration of resources. For AI agent governance, governance controls are resources supporting important business services, and their shared dependencies create concentration vulnerabilities that must be identified and mitigated.

DORA — Article 9

Article 9 requires financial entities to have in place mechanisms to promptly detect anomalous activities. Correlated control failure — where multiple governance controls degrade simultaneously due to a shared dependency — is an anomalous activity that requires detection mechanisms. Real-time dependency health monitoring implements this detection requirement.

10. Failure Severity

Severity Rating: Critical
Blast Radius: Organisation-wide — a correlated failure can simultaneously disable all governance controls, leaving the entire agent fleet ungoverned

Consequence chain: Correlated control failure is a meta-failure that turns apparent defence-in-depth into a single point of failure. The immediate consequence is simultaneous degradation or loss of multiple governance controls. The operational consequence depends on the fail-safe posture: if controls fail-safe (AG-008 compliant), all dependent agents pause simultaneously, causing a total AI service outage; if controls fail-open, all dependent agents operate without governance oversight simultaneously, creating unbounded risk exposure. Both outcomes are severe — total outage has immediate business impact, and total governance loss has immediate risk exposure. The failure is particularly dangerous because it occurs precisely when the organisation most needs its governance controls: during adversarial attack, infrastructure degradation, or unusual operational conditions. The business consequences include: regulatory enforcement for inadequate operational resilience, financial losses from ungoverned agent operations, service disruption costs, and potential systemic risk if the correlated failure affects agents interacting with external markets, counterparties, or public services.

Cross-references: AG-008 (Governance Continuity Under Failure) — ensures individual controls survive component failures; AG-154 ensures that multiple controls do not share failure modes. AG-007 (Governance Configuration Control) — changes to governance configuration can introduce or remove shared dependencies, triggering dependency map updates. AG-027 (Governance Override Resistance) — a correlated failure that disables override resistance across multiple controls creates an override vulnerability. AG-056 (Independent Validation) — validates that claimed independence between controls is actual independence. AG-153 (Control Efficacy Measurement Governance) — live challenge results provide data for identifying correlated failure patterns. AG-155 (Oversight Diversity and Heterogeneous Redundancy Governance) — prescribes the diversity measures needed to mitigate the correlated failure risks identified by AG-154.

Cite this protocol
AgentGoverning. (2026). AG-154: Correlated Control Failure Analysis. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-154