Counterfactual Explanation Governance requires that AI agents provide meaningful alternative-condition explanations — statements of the form "the decision would have been different if condition X had been different" — where such explanations are legally required, contractually committed, or materially useful for the affected person to understand a decision. Counterfactual explanations are among the most actionable forms of explanation because they tell the affected person not only why a decision was made but what they could change to obtain a different outcome. This dimension establishes requirements for the generation, validation, fidelity, and delivery of counterfactual explanations, ensuring they are truthful (reflecting actual decision boundaries, not approximations that mislead), actionable (identifying conditions the person can realistically change), and consistent (producing stable counterfactuals for materially identical cases rather than generating contradictory guidance).
Scenario A — Counterfactual Points to Unchangeable Characteristic: An AI lending agent denies a credit application and generates the following counterfactual explanation: "Your application would have been approved if you had been employed for 8 or more years in your current role." The applicant has been in her current role for 3 years after returning to the workforce following a period of caregiving. She cannot retroactively add 5 years of employment tenure. More problematically, the model's actual decision boundary is more nuanced: the application would also have been approved if the applicant's savings balance were £4,200 higher (currently £11,800, threshold £16,000) — a condition the applicant could address within 6-8 months. The agent selected the employment-tenure counterfactual because it had the single highest feature importance, not because it was the most actionable. The applicant, believing that only 5 additional years of tenure would change the outcome, does not reapply for 4 years. She pays £23,400 in additional rent over those 4 years that she would not have paid had she received the savings-balance counterfactual and reapplied after reaching the £16,000 threshold.
What went wrong: The counterfactual explanation optimised for feature importance rather than actionability. The system had no mechanism to evaluate whether the identified counterfactual condition was something the applicant could realistically change. No ranking of counterfactuals by actionability was performed. The explanation was technically accurate (tenure was a contributing factor) but practically useless, and it concealed a more useful counterfactual that would have changed the applicant's behaviour.
Scenario B — Inconsistent Counterfactuals for Similar Applicants: Two applicants with nearly identical financial profiles are denied insurance coverage by an AI underwriting agent. Applicant A is told: "Your application would have been approved if your annual income were £5,000 higher." Applicant B, whose income is £200 lower than Applicant A's, is told: "Your application would have been approved if you had no outstanding personal loans." Both counterfactuals are technically valid — each represents a path across the decision boundary. But the inconsistency is visible when the two applicants compare notes (they are colleagues at the same employer). Applicant A asks why she was not told about the personal-loan option; Applicant B asks why he was not told about the income option. The inconsistency suggests arbitrariness. Both applicants file complaints with the insurance ombudsman. The ombudsman finds that the insurer's explanation process does not produce consistent guidance for materially similar cases, undermining confidence in the fairness of the underwriting process. Remediation: £180,000 in complaint handling, £95,000 in process redesign, and a supervisory letter from the regulator requiring demonstration of explanation consistency before the agent can resume underwriting.
What went wrong: The counterfactual generation algorithm selected the single nearest counterfactual for each applicant without enforcing consistency across similar cases. Because the decision boundary is a complex surface in high-dimensional space, the nearest counterfactual for two nearby points in feature space can involve different features. No consistency constraint ensured that materially similar applicants received consistent guidance. The system treated each counterfactual generation as an independent optimisation problem rather than as part of a coherent explanation framework.
Scenario C — Counterfactual Reflects Stale Decision Boundary: A public employment agency uses an AI agent to assess eligibility for a retraining programme. The eligibility model is updated quarterly. In Q1, the agent tells an applicant: "You would have been eligible if you had completed at least 12 months of prior employment in the last 3 years." The applicant spends 8 months working to meet this condition and reapplies in Q3. The model was updated in Q2, and the decision boundary has shifted: the new threshold is 18 months of prior employment, and a new factor — digital skills certification — has been added. The applicant is denied again. The agent now generates a different counterfactual: "You would have been eligible if you had 18 months of prior employment and a recognised digital skills certification." The applicant has acted on stale guidance, invested 8 months of effort toward a target that moved, and has no digital skills certification. She files a complaint alleging that the agency's AI gave her false guidance. The agency's investigation finds that the counterfactual explanation system had no mechanism to disclose that decision boundaries might change, no versioning of counterfactual outputs, and no process to notify applicants whose prior counterfactuals were invalidated by model updates. Litigation costs: £340,000. Programme suspended for 5 months pending process remediation.
What went wrong: The counterfactual explanation was implicitly presented as stable guidance — "do X and you will qualify" — without any disclosure that the decision model and its boundaries are subject to change. The system had no temporal validity marker on counterfactual explanations, no process to re-evaluate outstanding counterfactuals when the model was updated, and no notification mechanism for applicants who had received counterfactuals that were invalidated by subsequent model changes.
Scope: This dimension applies to any AI agent that makes or recommends decisions affecting identified individuals and that provides, or is required to provide, explanations of those decisions. The scope encompasses all decision domains where counterfactual explanations are legally required (e.g., under the EU AI Act's right to meaningful explanation, ECOA adverse-action notice requirements, or GDPR Article 22 automated decision-making protections), contractually committed (e.g., in service agreements that promise explanation of decisions), or materially useful for the affected person to understand the decision and take corrective action. The scope includes the generation, validation, ranking, delivery, versioning, and temporal management of counterfactual explanations. It does not require counterfactual explanations for every decision — only for decisions where the explanation is legally required, contractually committed, or where the decision has material consequence and the affected person would benefit from understanding what alternative conditions would change the outcome. Agents that operate purely as internal analytical tools without individual-level decision outputs may implement this dimension at reduced scope.
4.1. A conforming system MUST generate counterfactual explanations for every decision that (a) has material consequence for an identified affected person and (b) is subject to a legal or contractual requirement for explanation, identifying at least one set of alternative conditions under which the decision would have been different.
4.2. A conforming system MUST ensure that every counterfactual explanation is faithful to the actual decision model — the alternative conditions identified must genuinely cross the model's decision boundary, not be approximations, heuristic guesses, or explanations generated by a separate interpretability model that does not accurately reflect the production decision model.
4.3. A conforming system MUST rank counterfactual explanations by actionability where multiple valid counterfactuals exist, presenting the most actionable counterfactual (the one involving conditions the affected person can most realistically change) with highest prominence, rather than defaulting to the counterfactual with the highest feature importance or the smallest distance in feature space.
4.4. A conforming system MUST enforce consistency in counterfactual explanations for materially similar cases — applicants with sufficiently similar input profiles must receive counterfactuals that are substantively aligned (identifying the same or closely related alternative conditions), with documented tolerance thresholds for what constitutes "materially similar" and "substantively aligned."
4.5. A conforming system MUST attach a temporal validity marker to every counterfactual explanation, disclosing that the decision model and its boundaries are subject to change and specifying either a validity period (e.g., "this guidance reflects the model in effect as of [date] and may not apply to future applications") or a mechanism for the affected person to verify current validity before acting.
4.6. A conforming system MUST implement a re-evaluation process when the decision model is updated: identify all outstanding counterfactual explanations that were generated under the prior model version, determine which counterfactuals are invalidated by the update, and where feasible, notify affected persons whose counterfactuals are materially changed.
4.7. A conforming system MUST ensure that counterfactual explanations comply with AG-451 (Plain-Language Duty Governance) — counterfactuals must be expressed in language that the affected person can understand, not in model-internal feature names or technical terms.
4.8. A conforming system SHOULD provide multiple counterfactual paths where they exist, enabling the affected person to choose which alternative condition is most feasible for them, rather than presenting only a single counterfactual.
4.9. A conforming system SHOULD distinguish between counterfactual conditions that the affected person can change (income, savings, qualifications) and conditions that are fixed or protected (age, disability status, location), clearly indicating which counterfactual conditions are actionable.
4.10. A conforming system SHOULD log all generated counterfactual explanations with the model version, input data, and decision output that produced them, creating an auditable record that supports consistency verification and temporal tracking.
4.11. A conforming system MAY implement interactive counterfactual exploration — allowing the affected person to ask "what if" questions about specific conditions and receive real-time counterfactual responses, provided such responses are faithful to the actual decision model and subject to the same validation and consistency requirements as pre-generated counterfactuals.
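Requirement 4.11 is the most implementation-shaped of these clauses, so a minimal sketch may be useful. Nothing here is a prescribed API: the `predict` callable, the `WhatIfResponse` type, and every field name are assumptions for illustration. The load-bearing idea is simply that each "what if" answer is computed by the production model itself and carries its own temporal validity disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

@dataclass(frozen=True)
class WhatIfResponse:
    decision: str        # outcome under the hypothetical conditions
    model_version: str   # version this answer is faithful to (req. 4.2)
    validity_note: str   # temporal validity disclosure (req. 4.5)

def answer_what_if(
    base_features: Mapping[str, float],
    overrides: Mapping[str, float],
    predict: Callable[[dict], str],  # the production model, never a surrogate
    model_version: str,
) -> WhatIfResponse:
    """Answer an interactive 'what if condition X were different?' query (4.11).

    Fidelity holds by construction: the query is evaluated by the
    production model itself, so the answer cannot drift from the real
    decision boundary the way a surrogate-based explanation can.
    """
    hypothetical = dict(base_features)
    hypothetical.update(overrides)
    return WhatIfResponse(
        decision=predict(hypothetical),
        model_version=model_version,
        validity_note=(
            f"This answer reflects decision model version {model_version}; "
            "future applications may be assessed under a different model."
        ),
    )
```

A live query of this kind guarantees fidelity by construction, but the consistency (4.4) and logging (4.10) obligations still apply to whatever it returns.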
Counterfactual explanations occupy a unique position in the landscape of AI explainability. While most explanation types answer the question "why was this decision made?" — pointing backward at contributing factors — counterfactual explanations answer the question "what would need to be different for the decision to change?" — pointing forward toward action. This forward-looking character makes counterfactuals among the most valuable explanation types for affected persons, because they provide a roadmap for changing an unfavourable outcome rather than merely a post-mortem of the decision logic.
The value of counterfactual explanations is recognised in multiple regulatory frameworks. The EU AI Act's right to "meaningful" explanation (Article 86) is widely interpreted by legal scholars as requiring more than feature-importance lists — it requires explanations that enable the affected person to contest the decision effectively, which implies understanding what would change the outcome. GDPR Article 22(3) grants data subjects the right to contest automated decisions, and effective contestation requires understanding the decision boundary, not merely the decision output. In the United States, the Equal Credit Opportunity Act requires adverse-action notices to specify "the principal reason or reasons for the action taken," and the most natural interpretation of this requirement for an AI model is a counterfactual: the primary condition whose change would flip the decision.
However, counterfactual explanations introduce governance risks that other explanation types do not. The most significant risk is misleading actionability. A counterfactual that is technically accurate but practically useless — because it identifies a condition the person cannot change — is worse than no counterfactual at all, because it creates the impression of guidance while providing none. Scenario A demonstrates this risk concretely: the applicant received a technically valid counterfactual that sent her down a 4-year dead end when a more actionable counterfactual existed. The governance obligation is not merely to produce counterfactuals but to produce useful counterfactuals — ones that the affected person can act on.
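One way to make ranking by actionability concrete is an effort-weighted score over candidate counterfactuals, with immutable features excluded outright. The sketch below is illustrative only: the `EFFORT_PER_UNIT` weights are assumptions that a real deployment would source from domain review, not from this example. Applied to Scenario A's two candidates, it surfaces the savings-balance path that the feature-importance heuristic buried.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    feature: str
    current: float
    required: float

# Illustrative actionability metadata (an assumption for this sketch):
# estimated months of effort per unit of change; None marks a feature
# that must never be offered as guidance because it cannot be changed.
EFFORT_PER_UNIT = {
    "savings_balance": 0.0015,        # roughly £670 saved per month
    "employment_tenure_years": 12.0,  # one extra year takes twelve months
    "age": None,                      # immutable (requirement 4.9)
}

def months_of_effort(c: Candidate) -> float | None:
    rate = EFFORT_PER_UNIT.get(c.feature)
    return None if rate is None else abs(c.required - c.current) * rate

def rank_by_actionability(candidates: list[Candidate]) -> list[Candidate]:
    actionable = [c for c in candidates if months_of_effort(c) is not None]
    # Least effort first: not smallest feature-space distance, and not
    # largest feature importance, which is what failed in Scenario A.
    return sorted(actionable, key=months_of_effort)

# Scenario A's two candidates, ranked:
candidates = [
    Candidate("employment_tenure_years", current=3, required=8),    # ~60 months
    Candidate("savings_balance", current=11_800, required=16_000),  # ~6.3 months
]
assert rank_by_actionability(candidates)[0].feature == "savings_balance"
```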
The second major risk is inconsistency. Decision models with complex, non-linear decision boundaries produce different nearest counterfactuals for nearby points in feature space. Two applicants who differ by £200 in income may receive counterfactuals involving entirely different features. While each counterfactual may be individually valid, the inconsistency undermines trust in the fairness of the process. Affected persons who compare notes — colleagues, family members, community groups — will identify the inconsistency and reasonably question whether the system is treating them equitably. The governance requirement for consistency does not require that all applicants receive identical counterfactuals — it requires that materially similar applicants receive substantively aligned counterfactuals, so that the system's guidance is coherent rather than arbitrary.
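Requirement 4.4's paired thresholds, materially similar inputs and substantively aligned guidance, can be operationalised as a guard that runs before any counterfactual is released. A minimal sketch follows; the tolerance value, profile fields, and income figures are assumptions chosen to echo Scenario B (only the £200 income gap comes from the scenario itself):

```python
def materially_similar(a: dict, b: dict, tolerances: dict) -> bool:
    """Profiles are materially similar when every anchor feature differs
    by no more than its documented tolerance (requirement 4.4)."""
    return all(abs(a[f] - b[f]) <= tol for f, tol in tolerances.items())

def substantively_aligned(cf_a: set, cf_b: set) -> bool:
    """A deliberately simple proxy: the counterfactuals identify the same
    alternative conditions. A production system might instead allow
    documented groups of closely related features."""
    return cf_a == cf_b

# Scenario B, reconstructed with assumed figures:
tolerances = {"annual_income": 1_000}
applicant_a = {"annual_income": 42_000}
applicant_b = {"annual_income": 41_800}
cf_a = {"annual_income"}               # "income £5,000 higher"
cf_b = {"outstanding_personal_loans"}  # "no outstanding personal loans"

if (materially_similar(applicant_a, applicant_b, tolerances)
        and not substantively_aligned(cf_a, cf_b)):
    print("Consistency violation: similar applicants received divergent guidance")
```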
The third major risk is temporal instability. Counterfactual explanations implicitly promise: "if you change condition X, the outcome will change." But this promise is conditioned on the decision model remaining stable. When models are updated — quarterly, monthly, or continuously — previously issued counterfactuals may become invalid. A person who acts on a counterfactual that was valid when issued but invalid by the time they reapply has been misled, even though the original counterfactual was truthful at the time. Temporal governance — validity markers, re-evaluation on model update, and notification of invalidation — is essential to prevent counterfactual explanations from becoming a source of harm rather than a source of value.
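Requirements 4.5 and 4.6 together describe a small lifecycle: stamp each issued counterfactual with the model version that produced it, and on every model update replay the outstanding counterfactuals against the new boundary. A minimal sketch under those assumptions; `still_crosses_boundary` is a stand-in for whatever re-scoring harness the deployment already has:

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

@dataclass
class IssuedCounterfactual:
    person_id: str
    conditions: dict       # e.g. {"prior_employment_months": 12}
    model_version: str     # temporal validity marker (requirement 4.5)
    issued_on: date

def reevaluate_on_update(
    outstanding: list[IssuedCounterfactual],
    new_version: str,
    still_crosses_boundary: Callable[[dict], bool],  # against the NEW model
) -> list[IssuedCounterfactual]:
    """Identify counterfactuals invalidated by a model update (req. 4.6).

    Everything returned here is a candidate for proactive notification,
    which is precisely the mechanism whose absence drove Scenario C.
    """
    return [
        cf for cf in outstanding
        if cf.model_version != new_version
        and not still_crosses_boundary(cf.conditions)
    ]
```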
The fidelity requirement deserves particular emphasis. Organisations often use post-hoc interpretability techniques (e.g., LIME, SHAP) to generate explanations of complex production models. These techniques are approximations — they explain a simplified local model fitted around the decision, not the production model itself. A counterfactual generated from such an approximation may not actually cross the production model's decision boundary. The affected person acts on the counterfactual, changes the identified condition, reapplies, and receives the same adverse decision — because the counterfactual was faithful to the approximation, not to the production model. Fidelity verification requires that counterfactual explanations be validated against the actual production model: if the identified alternative conditions were present, the production model would actually produce a different decision.
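The check itself can be very small; the discipline lies in pointing it at the production model rather than the surrogate. A minimal sketch, assuming a `predict` callable that wraps the deployed model and returns a decision label:

```python
def verify_fidelity(base_features: dict, counterfactual: dict, predict) -> bool:
    """Return True only if the counterfactual genuinely crosses the
    production model's decision boundary (requirement 4.2).

    `predict` must be the production model. Validating against a LIME-
    or SHAP-style local approximation checks the surrogate's boundary,
    not the one the affected person will actually face on reapplication.
    """
    original = predict(base_features)
    altered = predict({**base_features, **counterfactual})
    return altered != original
```

A natural gate is to discard any candidate that fails this check before ranking or delivery.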
Counterfactual Explanation Governance requires organisations to treat counterfactual generation as a governed process with validation, consistency, and lifecycle management — not as a simple post-hoc calculation appended to a decision output.
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Counterfactual explanations are especially important for credit, insurance, and investment decisions where the affected person needs to know what to change to qualify. Lending counterfactuals should focus on financial conditions (income, savings, debt levels, employment stability) that the applicant can address. Insurance underwriting counterfactuals must avoid referencing health conditions or disability status where prohibited by anti-discrimination law. Investment suitability counterfactuals should explain what risk-profile change would enable the desired investment allocation.
Public Sector. Benefits eligibility and licensing decisions create particularly acute counterfactual obligations because affected persons may have limited resources and cannot afford to act on misleading guidance. Public-sector counterfactuals should be validated with heightened fidelity standards and should include a clear statement of the applicant's right to request human review of the decision. Temporal validity is critical because public-sector eligibility models are often updated in response to policy changes, which can invalidate prior counterfactuals without notice.
Healthcare. Clinical decision support counterfactuals (e.g., "the treatment recommendation would change if your blood pressure were below 140/90") must be validated by clinical domain experts to ensure they do not encourage harmful patient behaviour (e.g., self-medication to change a clinical parameter). Healthcare counterfactuals should always be accompanied by a recommendation to discuss with a healthcare professional.
Cross-Border Operations. Counterfactual explanations must comply with the most stringent explanation requirement across all applicable jurisdictions. An agent operating in both the EU and the US must satisfy both the EU AI Act's meaningful-explanation requirement and ECOA's adverse-action notice requirement. Counterfactuals must be delivered in the affected person's language per AG-451, with independent readability validation per language.
Basic Implementation — The organisation generates at least one counterfactual explanation for every adverse decision subject to explanation requirements. Counterfactuals are verified against the production model (fidelity check). Counterfactuals are expressed in plain language per AG-451. Temporal validity markers are attached. Protected characteristics are excluded from counterfactual generation. This level meets the minimum mandatory requirements.
Intermediate Implementation — All basic capabilities plus: multiple counterfactual paths are generated and ranked by actionability. Consistency enforcement uses anchor dimensions to ensure similar applicants receive aligned counterfactuals. A counterfactual audit log records all generated counterfactuals with fidelity verification and consistency check results. Model-update re-evaluation identifies invalidated counterfactuals. The mutable/immutable distinction is presented to the affected person.
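The audit log at this tier can be as lightweight as an append-only JSON-lines file, provided each record binds the counterfactual to the model version and the exact inputs that produced it. A minimal sketch with illustrative field names; the hashing choice assumes the raw inputs are retained elsewhere (e.g., the AG-415 decision journal):

```python
import hashlib
import json
from datetime import datetime, timezone

def log_counterfactual(log_file, person_id: str, inputs: dict,
                       decision: str, counterfactuals: list,
                       model_version: str, fidelity_ok: bool,
                       consistency_ok: bool) -> None:
    """Append one auditable counterfactual record (requirement 4.10)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "person_id": person_id,
        # Hash rather than store raw inputs; the raw record lives in the
        # decision journal, and the hash ties the two together.
        "input_hash": hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()).hexdigest(),
        "decision": decision,
        "counterfactuals": counterfactuals,
        "model_version": model_version,
        "fidelity_verified": fidelity_ok,
        "consistency_checked": consistency_ok,
    }
    log_file.write(json.dumps(record) + "\n")
```

Opened in append mode (for example, `open("cf_audit.jsonl", "a")`, a hypothetical filename), one record per line, the file supports both the consistency verification and the temporal tracking this tier calls for.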
Advanced Implementation — All intermediate capabilities plus: interactive counterfactual exploration allows affected persons to test "what if" scenarios in real time, with fidelity-verified responses. Counterfactual consistency is verified through automated batch testing across synthetic applicant populations. Invalidated counterfactuals trigger proactive notification to affected persons. Counterfactual quality metrics (actionability, fidelity, consistency) are tracked over time with trend analysis. Independent audit confirms counterfactual fidelity and consistency.
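The batch consistency testing mentioned above is, in its simplest form, a pairwise sweep over a synthetic population. A minimal sketch, assuming a `generate_cf` callable that returns the feature names in an applicant's counterfactual; the quadratic loop is adequate for periodic audits, though a large population would need blocking or locality-sensitive hashing:

```python
import itertools

def materially_similar(a: dict, b: dict, tolerances: dict) -> bool:
    # Same anchor-feature test as in the earlier consistency sketch.
    return all(abs(a[f] - b[f]) <= tol for f, tol in tolerances.items())

def batch_consistency_test(generate_cf, synth_profiles: list, tolerances: dict):
    """Generate counterfactuals for a synthetic population and flag every
    materially similar pair whose guidance diverges (advanced tier)."""
    results = [(p, frozenset(generate_cf(p))) for p in synth_profiles]
    violations = []
    for (pa, ca), (pb, cb) in itertools.combinations(results, 2):
        if materially_similar(pa, pb, tolerances) and ca != cb:
            violations.append((pa, pb, sorted(ca), sorted(cb)))
    return violations
```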
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Counterfactual Fidelity Verification
Test 8.2: Actionability Ranking Correctness
Test 8.3: Consistency Across Similar Cases
Test 8.4: Temporal Validity Marker Presence
Test 8.5: Model-Update Re-Evaluation
Test 8.6: Protected-Characteristic Exclusion
Test 8.7: Plain-Language Compliance of Counterfactuals
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 13 (Transparency and Provision of Information) | Supports compliance |
| EU AI Act | Article 86 (Right to Explanation of Individual Decision-Making) | Direct requirement |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Supports compliance |
| FCA | Consumer Duty (PS22/9) — Consumer Understanding Outcome | Direct requirement |
| NIST AI RMF | MAP 5.1, GOVERN 4.2, MEASURE 2.6 | Supports compliance |
| ISO 42001 | Clause 9.1 (Monitoring, Measurement, Analysis and Evaluation) | Supports compliance |
| DORA | Article 5 (ICT Risk Management Governance) | Supports compliance |
Article 86 grants affected persons the right to "clear and meaningful explanations of the role of the AI system in the decision-making procedure and the main elements of the decision taken." Legal scholars and regulatory guidance consistently interpret "meaningful" as requiring more than a recitation of contributing factors — it requires explanations that enable the affected person to understand the decision well enough to contest it effectively. Counterfactual explanations are the explanation type most directly aligned with this interpretation: they tell the affected person what would need to change for the decision to be different, which is precisely the information needed to contest the decision or take corrective action. AG-452 operationalises Article 86 by ensuring that counterfactual explanations are faithful to the actual decision model, actionable for the affected person, and expressed in comprehensible terms.
Article 13's requirement for "comprehensible" information to users and affected persons applies to counterfactual explanations. A counterfactual expressed in model-internal feature names (e.g., "feature_237 must increase by 0.34 standard deviations") is not comprehensible. AG-452's requirement for AG-451 plain-language compliance (Requirement 4.7) ensures that counterfactuals satisfy Article 13's comprehensibility standard.
The Consumer Duty requires firms to "communicate in a way that equips consumers to make effective, timely and properly informed decisions." Counterfactual explanations are a primary mechanism for equipping consumers to take effective action after an adverse decision — they tell the consumer what to change. However, the Consumer Duty's emphasis on "effective" action means the counterfactual must be actionable (Requirement 4.3) and temporally valid (Requirement 4.5). A counterfactual that points to an unchangeable condition or that is invalidated by a subsequent model update does not equip the consumer to make effective decisions.
For financial-services organisations, counterfactual explanation quality is an internal control over the consumer communication process. Inconsistent counterfactuals (Scenario B) create complaint and remediation exposure that affects financial reporting. Unfaithful counterfactuals (counterfactuals that do not reflect the actual decision boundary) create litigation exposure. SOX auditors may assess whether counterfactual generation and validation processes constitute effective internal controls.
MAP 5.1 addresses the impacts of AI systems on individuals, including the quality and usefulness of explanations. GOVERN 4.2 addresses transparency practices. MEASURE 2.6 addresses the evaluation of AI system explanations for accuracy and meaningfulness. Counterfactual fidelity verification (Requirement 4.2) directly implements MEASURE 2.6's call for explanation accuracy evaluation. Actionability ranking (Requirement 4.3) ensures that explanations are meaningful in practice, not merely technically present.
ISO 42001 requires organisations to determine what needs to be monitored and measured, including the performance of the AI management system. Counterfactual explanation quality — fidelity rates, actionability scores, consistency metrics, and temporal validity — represents a measurable dimension of the AI system's explanation performance. AG-452's evidence requirements provide the measurement framework that ISO 42001 Clause 9.1 demands.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Individual-to-population — each affected person receives a personalised counterfactual, so harm is individualised, but systemic failures (unfaithful counterfactuals, consistency failures, stale guidance) affect entire applicant populations and scale with decision volume |
Consequence chain: The failure begins with a deficient counterfactual explanation — unfaithful (does not reflect the actual decision boundary), non-actionable (points to unchangeable conditions), inconsistent (similar applicants receive contradictory guidance), or temporally stale (based on an outdated model). The affected person acts on the deficient counterfactual, investing time, effort, and resources to change the identified condition. When they reapply, the outcome does not change as promised — because the counterfactual was unfaithful, or the model changed, or the condition they changed was not actually the binding constraint. The individual harm is concrete and quantifiable: wasted effort, continued adverse status (denied credit, denied benefits, denied coverage), and loss of trust in the institution. The systemic harm emerges when the deficiency is structural rather than isolated: hundreds or thousands of applicants receive deficient counterfactuals from the same generation process. Regulatory investigation reveals the systemic failure, triggering remediation obligations (re-issuing corrected counterfactuals, reopening decisions for affected applicants), enforcement action (fines, supervisory requirements), and litigation exposure (individual and class-action claims for detrimental reliance on deficient guidance). The ultimate consequence is that counterfactual explanations — intended as a mechanism of empowerment and transparency — become a mechanism of harm, undermining the legitimacy of the entire explanation framework.
Cross-references: AG-049 (Explainability Governance) establishes the foundational requirement that explanations exist; AG-452 governs the specific form and quality of counterfactual explanations. AG-036 (Reasoning Integrity Governance) ensures the decision model's reasoning is sound — a precondition for faithful counterfactual generation. AG-449 (Audience-Specific Explanation Governance) defines the audience-profiling framework used to calibrate counterfactual language. AG-450 (Decision Summary Provenance Governance) tracks the provenance of decision summaries that counterfactuals augment. AG-451 (Plain-Language Duty Governance) sets the readability standards that counterfactuals must meet. AG-453 (Adverse Action Notice Governance) defines the notice framework within which counterfactuals are delivered. AG-415 (Decision Journal Completeness Governance) records the decisions for which counterfactuals are generated. AG-442 (Confidence Calibration Interface Governance) provides the confidence metrics that inform counterfactual precision.