This dimension governs the design, operation, audit, and continuous correction of AI-assisted case prioritisation systems operating within public sector, justice, border enforcement, and law enforcement contexts — covering any algorithmic or agentic mechanism that determines the order, urgency category, resource allocation, or response timeline assigned to individual cases, complaints, permit applications, asylum claims, enforcement referrals, or investigative leads. Fairness in case prioritisation is a precondition of due process and administrative justice: when an AI system systematically deprioritises cases on the basis of proxies for protected characteristics — such as postcode, language preference, legal representation status, or origin of submission — it translates latent model bias into concrete harm through delayed justice, denied rights, and differential enforcement outcomes experienced disproportionately by already-marginalised groups. Failure in this dimension manifests as statistically measurable disparate wait times across demographic groups, evidence that high-severity cases belonging to particular communities are systematically downgraded, absence of auditable rationale for individual triage decisions, and retrospective discovery that enforcement actions were directed toward populations whose cases were artificially elevated by uncorrected model drift.
Example 1 — Asylum Claim Triage, Border Processing Agency
A border processing agency deploys an AI triage system to prioritise approximately 14,000 pending asylum claims. The system assigns each claim one of four urgency bands (Immediate, Expedited, Standard, Deferred) based on a composite score derived from 23 features including country of origin, travel route, presence of legal counsel, and submission channel (paper versus digital portal). A post-deployment audit conducted 18 months after go-live identifies that claims submitted on paper — a channel used at 3.4× the rate by applicants from Sub-Saharan African countries compared to applicants from Eastern European countries — are assigned to the Deferred band at a rate of 61% versus 29% for portal submissions. The disparate outcome is not explained by claim complexity or merit indicators: controlling for claim type and evidence completeness, the paper-submission proxy adds a statistically significant 14-day median delay to resolution. During the 18-month period, 2,847 claimants in the Deferred band whose claims were subsequently granted refugee status experienced delays exceeding 11 months — a period during which 312 faced deportation proceedings before their claims were processed. The root cause is that the model was trained on historical processing data reflecting a pre-2019 workflow in which paper submissions were genuinely slower due to manual digitisation backlogs that no longer exist. The bias was not caught because the system lacked a required fairness dashboard, no demographic parity metric was computed at deployment, and the human review layer operated without access to disaggregated triage statistics.
Example 2 — Criminal Enforcement Referral Scoring, Municipal Law Enforcement
A municipal police department integrates a predictive case routing tool to prioritise investigative referrals across a backlog of approximately 8,600 open property crime cases. The tool scores each case for "investigative viability" and routes high-scoring cases to a dedicated fast-track unit with an average resolution time of 19 days, while low-scoring cases are assigned to a general queue with an average resolution time of 94 days. An independent review panel examines 12 months of routing decisions and finds that cases in which the complainant's registered address falls within three postcodes — all majority Black and Hispanic neighbourhoods — receive viability scores averaging 31 points lower (on a 100-point scale) than cases of identical reported crime type and evidence availability in other postcodes. The panel establishes that postcode is not a direct input but is reconstructed by the model through correlations between beat identifier, reporting officer precinct, and temporal patterns of case logging, all of which are highly collinear with neighbourhood demographics. The practical consequence is that victims of property crime in these three postcodes wait an average of 73 additional days for investigative action compared to demographically similar victims elsewhere. Across the review period, 1,104 cases in the affected postcodes were closed as "unresolvable" after the general queue waiting period elapsed — a closure rate 2.7× higher than the fast-track unit's rate for comparable cases. The department had no mechanism to detect proxy reconstruction, no protected-attribute fairness test had been applied since procurement, and the vendor's model card did not disclose postcode-correlated feature behaviour.
Example 3 — Regulatory Complaint Queue Management, Financial Conduct Regulator
A national financial conduct regulator uses an AI queue management system to triage approximately 22,000 consumer complaints per quarter. The system assigns each complaint a priority score that determines whether it receives a substantive response within 5 business days (Priority 1), 15 business days (Priority 2), or 45 business days (Priority 3). An internal fairness review identifies that complaints submitted in languages other than the dominant national language — representing 7.4% of total complaints — are assigned to Priority 3 at a rate of 54%, compared to 18% for complaints in the dominant language, even after controlling for complaint complexity using an independently computed complexity index. The disparity is caused by the model penalising low lexical similarity to its training corpus: the training set contained 98.7% dominant-language complaints, and the model learned to equate low lexical familiarity with low urgency — a feature that is legally and logically unwarranted. During a calendar year in which this pattern persists, 843 complainants whose matters involved time-sensitive mis-selling of investment products received Priority 3 assignments; for 214 of these complainants, the limitation period for private legal action expired before the regulator issued a substantive response that would have supported their claim. The regulator faces judicial review proceedings from a claimants' advocacy group. The system had not been tested for linguistic fairness at procurement, no Language Fairness Audit Protocol was in place, and there was no override mechanism available to intake officers who noticed the pattern.
This dimension applies to any AI or agentic system that performs, assists, or materially influences the prioritisation, triage, scoring, banding, routing, or scheduling of cases, complaints, referrals, applications, enforcement actions, or investigative leads within a public sector, justice, border enforcement, or law enforcement context. It applies regardless of whether the AI system makes a final determination or provides a ranked recommendation to a human decision-maker, because downstream human actors frequently ratify AI-generated priority scores without substantive independent review. It applies across all stages of a system's lifecycle — procurement, initial deployment, model update, and ongoing operation. It does not apply to internal administrative workflow tools that prioritise only operational tasks with no direct impact on the rights, status, or outcomes of individuals.
4.1.1 The deploying organisation MUST define, prior to deployment, a written Fairness Baseline Document (FBD) that specifies: (a) the protected and sensitive attributes relevant to the case population being served; (b) the fairness metrics to be applied (at minimum: demographic parity difference, equalised odds difference, and individual consistency score); (c) the acceptable tolerance thresholds for each metric; and (d) the rationale for selecting those metrics over alternatives.
4.1.2 The FBD MUST be reviewed and re-approved following any model update, training data refresh, or change to the feature set used in prioritisation scoring.
4.1.3 The deploying organisation MUST NOT rely exclusively on a single class of fairness metric without evaluating whether the chosen metrics mask disparate impact that would be visible under a different metric class.
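The baseline metrics named in 4.1.1 can be computed directly from disaggregated triage outcomes. The following minimal, non-normative sketch illustrates two of them (demographic parity difference and equalised odds difference) under the simplifying assumptions that priority bands are collapsed into a binary expedited/not-expedited outcome and that an independent ground-truth urgency label is available for comparison; all function and variable names are illustrative, not prescribed interfaces.

```python
# Non-normative sketch: two of the metrics named in 4.1.1, computed from
# disaggregated triage outcomes. Assumes binary "expedited" outcomes (0/1),
# a group label per case, and a ground-truth urgency label for equalised odds.
import numpy as np

def demographic_parity_difference(expedited: np.ndarray, group: np.ndarray) -> float:
    """Largest gap in expedited-rate between any two protected attribute groups."""
    rates = [expedited[group == g].mean() for g in np.unique(group)]
    return float(max(rates) - min(rates))

def equalised_odds_difference(expedited: np.ndarray, truly_urgent: np.ndarray,
                              group: np.ndarray) -> float:
    """Largest gap, across groups, in true-positive and false-positive rates."""
    tprs, fprs = [], []
    for g in np.unique(group):
        in_group = group == g
        tprs.append(expedited[in_group & (truly_urgent == 1)].mean())
        fprs.append(expedited[in_group & (truly_urgent == 0)].mean())
    return float(max(max(tprs) - min(tprs), max(fprs) - min(fprs)))
```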
4.2.1 The system MUST undergo a Proxy Reconstruction Analysis (PRA) prior to deployment and at intervals not exceeding 12 months, designed to identify whether any input feature or combination of features functions as a statistical proxy for a protected characteristic within the relevant case population.
4.2.2 Any feature identified as a proxy with a Cramér's V correlation coefficient of 0.15 or above against a protected attribute MUST be flagged for mandatory review by the responsible AI governance team before the feature is retained in the model.
4.2.3 Features that are determined to be unjustified proxies MUST be removed, bounded, or re-weighted prior to or at the next model update cycle, and the justification for each decision MUST be recorded in the system's Governance Log.
4.2.4 Submission channel, language of submission, legal representation status, geographic identifier below regional level, and digital-versus-physical submission format MUST be individually examined in every PRA as presumptive proxies unless affirmatively disproven for the specific deployment context.
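The 0.15 Cramér's V flag in 4.2.2 lends itself to a straightforward automated screen over the candidate feature set. The sketch below is illustrative rather than prescriptive: it assumes categorical (or discretised) features held in a pandas DataFrame, and the column names and feature lists are hypothetical.

```python
# Non-normative sketch: per-feature proxy screen behind the 0.15 Cramér's V
# review threshold in 4.2.2. Assumes categorical or discretised features.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(feature: pd.Series, protected: pd.Series) -> float:
    """Cramér's V association between a candidate feature and a protected attribute."""
    table = pd.crosstab(feature, protected)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim))) if min_dim > 0 else 0.0

def screen_features(df: pd.DataFrame, candidate_features: list[str],
                    protected_attrs: list[str], threshold: float = 0.15) -> list[tuple]:
    """Return (feature, attribute, V) triples meeting the mandatory-review threshold."""
    flagged = []
    for feature in candidate_features:
        for attr in protected_attrs:
            v = cramers_v(df[feature], df[attr])
            if v >= threshold:
                flagged.append((feature, attr, round(v, 3)))
    return flagged
```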
4.3.1 The system MUST undergo a Pre-Deployment Fairness Assessment (PDFA) using representative test data that reflects the demographic composition of the actual case population to be served, not merely the training population.
4.3.2 The PDFA MUST compute disaggregated prioritisation outcome rates for each protected attribute group defined in the FBD and MUST document whether the measured disparities fall within the tolerance thresholds specified in the FBD.
4.3.3 The system MUST NOT be deployed if any measured disparity exceeds the FBD tolerance threshold unless the deploying organisation has documented a written Disparity Acceptance Justification (DAJ) reviewed and approved by a senior responsible officer and a designated fairness specialist.
4.3.4 The PDFA MUST include adversarial scenario testing in which synthetic cases with identical substantive characteristics but differing sensitive attribute signals are submitted to the system, and the resulting priority scores MUST be compared.
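The adversarial scenario testing required by 4.3.4 amounts to scoring matched synthetic case pairs that differ only in a sensitive-attribute signal and comparing the resulting scores. A minimal, non-normative sketch follows; the scoring callable, field names, and tolerance value are assumptions made for illustration rather than part of the requirement.

```python
# Non-normative sketch of 4.3.4: matched synthetic case pairs, identical on
# substantive fields, are scored under two values of a sensitive-attribute
# signal (e.g. submission channel) and score gaps above tolerance are flagged.
import copy

def adversarial_pair_test(base_cases: list[dict], score_case, signal_field: str,
                          signal_values: tuple, tolerance: float) -> list[dict]:
    """Score each synthetic case under both signal values; flag gaps above tolerance."""
    findings = []
    for case in base_cases:
        variant_a, variant_b = copy.deepcopy(case), copy.deepcopy(case)
        variant_a[signal_field], variant_b[signal_field] = signal_values
        gap = abs(score_case(variant_a) - score_case(variant_b))
        if gap > tolerance:
            findings.append({"case_id": case.get("case_id"), "score_gap": gap})
    return findings

# Hypothetical usage:
# findings = adversarial_pair_test(synthetic_claims, model.predict_priority,
#                                  "submission_channel", ("paper", "portal"),
#                                  tolerance=5.0)
```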
4.4.1 The deploying organisation MUST maintain a real-time or near-real-time Fairness Monitoring Dashboard (FMD) that continuously displays disaggregated prioritisation statistics across each protected attribute group defined in the FBD.
4.4.2 The FMD MUST generate an automated alert when any fairness metric exceeds its FBD tolerance threshold over a rolling 30-day observation window.
4.4.3 Automated alerts generated under 4.4.2 MUST be triaged by a designated Fairness Response Officer within 5 business days of generation, and a documented response action MUST be initiated within 15 business days.
4.4.4 The FMD MUST be accessible to the human oversight function described in 4.6 and MUST NOT be restricted exclusively to the technical operations team.
4.4.5 Monitoring data MUST be retained in tamper-evident format for a minimum of 7 years.
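A rolling-window alert of the kind described in 4.4.2 can be computed directly from the decision log. The following non-normative sketch evaluates the demographic parity difference over a trailing 30-day window and emits an alert record whenever the FBD tolerance is breached; the column names and the daily evaluation cadence are assumptions.

```python
# Non-normative sketch of the 4.4.2 alert: demographic parity difference over
# a rolling 30-day window, compared against the FBD tolerance threshold.
import pandas as pd

def rolling_parity_alerts(decisions: pd.DataFrame, tolerance: float,
                          window_days: int = 30) -> pd.DataFrame:
    """decisions: one row per triage decision with columns
    ['decision_date' (datetime), 'group', 'expedited' (0/1)]."""
    decisions = decisions.sort_values("decision_date")
    alerts = []
    for day in pd.date_range(decisions["decision_date"].min(),
                             decisions["decision_date"].max(), freq="D"):
        window = decisions[
            (decisions["decision_date"] > day - pd.Timedelta(days=window_days))
            & (decisions["decision_date"] <= day)]
        if window.empty:
            continue
        rates = window.groupby("group")["expedited"].mean()
        disparity = rates.max() - rates.min()
        if disparity > tolerance:
            alerts.append({"window_end": day, "parity_difference": round(disparity, 3)})
    return pd.DataFrame(alerts)
```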
4.5.1 The system MUST be capable of producing, upon request, a human-readable explanation for any individual priority score or band assignment, specifying the features that materially contributed to that score and their relative weights.
4.5.2 Explanations MUST be expressed in language accessible to a non-specialist case officer and MUST NOT consist solely of numeric outputs or opaque model internals.
4.5.3 The explanation generation mechanism MUST be validated to ensure that the explanations it produces accurately reflect the model's actual decision logic rather than post-hoc rationalisation from a separate explanation model.
4.5.4 Organisations SHOULD surface individual-level explanations to affected parties upon request, subject to applicable legal exemptions, and MUST log all such requests and responses.
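Requirement 4.5.1 is easiest to satisfy when per-feature contributions can be read directly from the scoring model. The sketch below assumes a simple additive model in which each contribution is the feature weight multiplied by the encoded feature value; for non-linear models a validated attribution method would be needed to meet 4.5.3. All names are illustrative assumptions.

```python
# Non-normative sketch of a 4.5.1 explanation, assuming an additive scoring
# model. Contributions are ranked by magnitude and rendered as plain-language
# statements suitable for a non-specialist case officer (4.5.2).
def explain_priority_score(case_features: dict, weights: dict, top_n: int = 5) -> list[str]:
    """Return plain-language lines for the features contributing most to the score."""
    contributions = {f: weights.get(f, 0.0) * value for f, value in case_features.items()}
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = []
    for feature, contribution in ranked[:top_n]:
        direction = "raised" if contribution > 0 else "lowered"
        lines.append(f"'{feature}' {direction} the priority score by "
                     f"{abs(contribution):.1f} points.")
    return lines
```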
4.6.1 The system MUST provide a documented, accessible, and operationally supported mechanism through which a designated human case officer can override an AI-assigned priority score and substitute a human-determined priority assignment.
4.6.2 Override decisions MUST be logged with: the identity of the overriding officer; the original AI-assigned score; the replacement score; a mandatory free-text justification field of no fewer than 50 characters; and the timestamp of the override.
4.6.3 Override rates MUST be tracked by protected attribute group, case type, and overriding officer, and MUST be reviewed at minimum quarterly to detect patterns that may indicate systematic gaming, under-use, or demographic inconsistency in override behaviour.
4.6.4 The human oversight function MUST have access to the Fairness Monitoring Dashboard (4.4) and MUST conduct a documented quarterly review of system-wide fairness performance.
4.6.5 The organisation MUST NOT configure incentive structures, performance metrics, or workload targets for case officers in a manner that systematically discourages use of the override function.
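The quarterly override review in 4.6.3 can be supported by a simple disaggregation of the override log. The sketch below is illustrative only: it flags strata whose override rate deviates markedly from the overall rate, which may warrant review for gaming, under-use, or demographic inconsistency; the column names and deviation ratio are assumptions.

```python
# Non-normative sketch of the 4.6.3 review: override rates disaggregated by
# protected attribute group, case type, and officer, with outlier strata flagged.
import pandas as pd

def override_rate_review(decisions: pd.DataFrame,
                         by: tuple = ("group", "case_type", "officer_id"),
                         deviation_ratio: float = 2.0) -> pd.DataFrame:
    """decisions: one row per triage decision with a boolean 'overridden' column."""
    overall_rate = decisions["overridden"].mean()
    report = (decisions.groupby(list(by))["overridden"]
              .agg(decisions_total="count", overrides="sum", override_rate="mean")
              .reset_index())
    # Flag strata whose rate is far above or below the organisation-wide rate.
    report["flagged"] = ((report["override_rate"] > overall_rate * deviation_ratio) |
                         (report["override_rate"] < overall_rate / deviation_ratio))
    return report.sort_values("override_rate", ascending=False)
```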
4.7.1 Where a post-deployment audit or operational monitoring alert identifies a fairness breach that has materially affected the prioritisation of a class of cases, the deploying organisation MUST conduct a Retrospective Impact Assessment (RIA) covering all affected cases within the breach period.
4.7.2 The RIA MUST identify, to the extent technically feasible, the individual cases affected, quantify the delay or deprioritisation experienced, and assess whether the delay has caused or is likely to cause material harm to the affected individuals.
4.7.3 Where the RIA identifies individuals who have suffered material harm attributable to unjustified prioritisation disparity, the organisation MUST notify those individuals of the error and the remedial action being taken, subject to applicable legal frameworks governing such disclosure.
4.7.4 The RIA and its findings MUST be reported to the organisation's senior governance body within 30 calendar days of the identification of the fairness breach.
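The delay quantification step of an RIA (4.7.2) typically compares each affected case against comparable cases that were not subject to the breach. A minimal, non-normative sketch follows; matching on case type and complexity band, and the column names used, are assumptions made for illustration.

```python
# Non-normative sketch of 4.7.2: excess delay per affected case, estimated
# against the median resolution time of unaffected comparison cases in the
# same case type and complexity band.
import pandas as pd

def excess_delay(affected: pd.DataFrame, comparison: pd.DataFrame) -> pd.DataFrame:
    """Both frames need ['case_id', 'case_type', 'complexity_band', 'resolution_days']."""
    baseline = (comparison.groupby(["case_type", "complexity_band"])["resolution_days"]
                .median().rename("baseline_days").reset_index())
    merged = affected.merge(baseline, on=["case_type", "complexity_band"], how="left")
    merged["excess_days"] = merged["resolution_days"] - merged["baseline_days"]
    return merged[["case_id", "resolution_days", "baseline_days", "excess_days"]]
```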
4.8.1 Where the prioritisation system is procured from a third-party vendor, the deploying organisation MUST require, as a contractual condition, that the vendor provides: (a) a model card disclosing training data composition, known bias risks, and tested fairness metrics; (b) access to model internals or explanations sufficient to conduct a Proxy Reconstruction Analysis; and (c) support for integration with the deploying organisation's Fairness Monitoring Dashboard.
4.8.2 The deploying organisation MUST conduct an independent PDFA using its own case population data and MUST NOT rely solely on vendor-supplied fairness assessments.
4.8.3 Vendor contracts MUST include provisions requiring notification to the deploying organisation within 14 days of any model update, retraining event, or identified bias issue that affects the prioritisation function.
4.8.4 The deploying organisation SHOULD require vendors to undergo third-party fairness audits at intervals not exceeding 24 months and SHOULD request audit findings as a condition of contract renewal.
4.9.1 The deploying organisation MUST designate a named Senior Responsible Officer (SRO) with personal accountability for compliance with this dimension, and this designation MUST be documented and reviewed annually.
4.9.2 The organisation MUST maintain a Governance Log for the system that records, at minimum: all FBD versions and approval dates; all PDFA results; all PRA findings and feature decisions; all FMD alert events and responses; all override audit results; and all RIA findings.
4.9.3 The Governance Log MUST be maintained in a format that supports external audit and MUST be made available to oversight bodies, regulatory inspectors, or judicial review proceedings upon lawful request.
4.9.4 The organisation MUST conduct a comprehensive Annual Governance Review of the system that evaluates compliance with all requirements in this dimension and produces a written report submitted to the senior governance body.
4.9.5 The organisation MAY publish a summary of Annual Governance Review findings — excluding operationally sensitive details — as a transparency measure, and is encouraged to do so where public accountability obligations apply.
Case prioritisation is frequently treated as an administrative convenience function — a neutral mechanism for managing queues. This framing is structurally incorrect in public sector and law enforcement contexts. Priority assignment determines not merely the sequence in which cases are processed but, in many systems, whether a case is processed at all within a timeframe that preserves the practical rights of the individual. Limitation periods expire. Deportation orders are executed. Evidence degrades. Life circumstances deteriorate. The decision to place a case in a lower-priority band is therefore not an administrative scheduling choice: it is a decision with potential constitutional and human rights implications equivalent to many substantive determinations. Any AI system that makes or materially influences this decision must be governed as a rights-affecting instrument.
Direct use of protected characteristics in algorithmic prioritisation is both legally prohibited and technically rare. The more common and more dangerous failure mode is indirect discrimination through proxy features — input variables that are neutral in appearance but correlated with protected characteristics within the specific case population. This correlation is often not present in the training data in an obvious form, but emerges from historical processing patterns that themselves reflect prior discriminatory workflows. A model trained on historical processing outcomes will learn that certain case attributes are associated with faster resolution — not because those attributes are legitimately predictive of case urgency, but because they reflect the demographics of users who historically had better access to the system. The model then perpetuates and amplifies the historical disparity. This mechanism cannot be addressed through surface-level algorithmic audits that examine only direct feature inclusion; it requires the proactive proxy analysis mandated in Section 4.2.
Human case officers who operate within AI-assisted triage systems are subject to well-documented anchoring effects: when a system presents a priority score before the officer makes their assessment, the officer's subsequent judgement is systematically biased toward the AI recommendation even when the officer has independent grounds for deviation. This means that structural technical controls — proxy removal, fairness monitoring, adversarial testing — must be accompanied by governance controls that preserve and reinforce the capacity for genuine human override (Section 4.6). The override mechanism is not an emergency fallback; it is a primary fairness control whose effectiveness depends on organisational culture, workload design, and performance incentive structures that actively support its use rather than discourage it through throughput pressure.
Unlike lower-tier AI systems, prioritisation tools in law enforcement and justice contexts operate on populations whose vulnerability to compounding harm is high. A 30-day drift in model fairness performance that goes undetected can affect thousands of cases during that window, creating a cohort of individuals who have experienced unjustified deprioritisation and whose retrospective remedy is practically limited. The continuous monitoring requirement in Section 4.4 is therefore not a best-practice enhancement but a minimum structural necessity given the blast radius of undetected drift in this context.
Fairness-by-Design Feature Engineering. During model development, define the feature set by beginning with a protected attribute exclusion analysis before selecting input variables. Enumerate the protected attributes relevant to the case population, map known proxy pathways from existing literature and domain expertise, and apply a proxy screen to candidate features before training. This is more effective than post-hoc debiasing because it prevents proxy correlations from being learned into model weights in the first place.
Stratified Evaluation Datasets. Construct evaluation datasets that are stratified by protected attribute group such that each group is represented with sufficient statistical power to detect disparities at the tolerance thresholds defined in the FBD. Many deployment failures occur because evaluation datasets are too small in the minority-group stratum to detect meaningful disparities at acceptable confidence levels. A minimum of 200 cases per protected attribute subgroup should be treated as a floor for evaluation, with larger samples required where the tolerance threshold is narrow.
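Whether a planned per-group sample is adequate can be checked with a standard two-proportion power calculation before the evaluation dataset is assembled. The sketch below uses the normal-approximation sample-size formula; the baseline expedited rate, significance level, and power target shown are assumptions to be replaced with deployment-specific values.

```python
# Non-normative sketch: cases needed per protected-attribute group to detect a
# disparity equal to the FBD tolerance, via the two-proportion normal
# approximation.
from scipy.stats import norm

def required_group_size(baseline_rate: float, tolerance: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size to detect a `tolerance` gap in expedited rate."""
    p1, p2 = baseline_rate, baseline_rate + tolerance
    p_bar = (p1 + p2) / 2
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    numerator = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p1 - p2) ** 2) + 1

# e.g. required_group_size(0.30, 0.10) gives roughly 360 cases per group,
# which is why the 200-case floor is a minimum rather than a target.
```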
Fairness Dashboard Integration with Case Management System. Integrate the FMD directly with the live case management system rather than operating it as a separate reporting tool. Real-time integration enables alert generation tied to actual caseload dynamics, allows the FMD to surface case-level anomalies alongside aggregate statistics, and reduces the operational latency between a fairness event and a corrective response.
Structured Override Logging with Mandatory Justification Taxonomy. Rather than accepting purely free-text override justifications, provide a structured taxonomy of override reasons (e.g., "urgency indicator not captured by model features," "protected attribute proxy suspected," "additional evidence received post-scoring") combined with a mandatory free-text supplement. This enables systematic analysis of override reasons across the caseload and supports identification of feature gaps that the model has not learned.
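One way to implement this pattern is to make the reason taxonomy and the 4.6.2 field requirements part of the override record type itself, so that incomplete justifications are rejected at capture time. The sketch below is illustrative; the enum members mirror the example reasons above and the field names are hypothetical.

```python
# Non-normative sketch: structured override record combining a reason taxonomy
# with the mandatory 4.6.2 fields, including the 50-character justification floor.
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class OverrideReason(Enum):
    URGENCY_NOT_CAPTURED = "urgency indicator not captured by model features"
    PROXY_SUSPECTED = "protected attribute proxy suspected"
    NEW_EVIDENCE = "additional evidence received post-scoring"
    OTHER = "other (explain in free text)"

@dataclass(frozen=True)
class OverrideRecord:
    case_id: str
    officer_id: str
    original_score: float
    replacement_score: float
    reason: OverrideReason
    justification: str          # mandatory free-text supplement (4.6.2)
    timestamp: datetime

    def __post_init__(self):
        if len(self.justification.strip()) < 50:
            raise ValueError("Override justification must be at least 50 characters.")

def record_override(case_id, officer_id, original, replacement, reason, justification):
    """Create a timestamped override record; raises if the justification is too short."""
    return OverrideRecord(case_id, officer_id, original, replacement, reason,
                          justification, datetime.now(timezone.utc))
```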
Fairness-Oriented Vendor Engagement Protocol. Establish a pre-procurement fairness questionnaire that requires vendors to disclose training data demographics, known failure modes, and prior bias incidents. Require vendors to provide access to intermediate model outputs (not just final scores) to enable independent proxy analysis. Include contractual provisions for model escrow or documentation sufficient to conduct an independent PDFA if the vendor relationship terminates.
Maturity Model for Case Prioritisation Fairness Governance:
| Maturity Level | Characteristics |
|---|---|
| Level 1 — Initial | No formal fairness documentation; prioritisation tool deployed without PDFA; monitoring limited to throughput metrics |
| Level 2 — Developing | FBD exists but may not reflect current model; PDFA conducted at deployment only; FMD operational but not integrated with alerts; override function available but underused |
| Level 3 — Defined | FBD current and reviewed at model update; PDFA conducted at deployment and annually; FMD with automated alerting; override rates monitored quarterly; Governance Log maintained |
| Level 4 — Managed | PRA conducted proactively with proxy correlation tracking; FMD integrated with case management system; override justification taxonomy in use; RIA protocol tested and exercised; Annual Governance Review conducted and reported to senior body |
| Level 5 — Optimising | Continuous proxy monitoring with automated feature flagging; public transparency reporting; fairness metrics integrated into vendor contract KPIs; retrospective impact assessments used to inform retraining cycles; cross-agency fairness benchmarking participation |
Fairness Washing Through Single-Metric Compliance. Selecting a single fairness metric (most commonly demographic parity) and declaring compliance on that metric while ignoring that the same model may fail on equalised odds or calibration within groups. Single-metric compliance is particularly deceptive when the metric is computed at a population level that conceals disparities at the intersection of multiple protected attributes.
Delegating Fairness Responsibility to the Vendor. Treating vendor-provided model cards or vendor-conducted audits as sufficient evidence of fairness compliance for the deploying organisation's context. The vendor's training population is not the deploying organisation's case population. Disparities that do not manifest in the vendor's test data may be prominent in the deployment context. Deploying organisations bear independent legal responsibility and cannot discharge it through contractual delegation.
Designing Override Mechanisms That Are Nominally Available but Practically Inaccessible. Providing an override function in the case management UI while simultaneously setting throughput targets that make override use practically impossible within working hours, or requiring multi-level supervisory approval for overrides in a manner that imposes disproportionate time costs. These structural disincentives nullify the fairness value of the override function while allowing the organisation to assert that an override mechanism exists.
Using Closure Rate as a Proxy for System Quality. Optimising the prioritisation system toward minimising overall backlog or maximising cases closed within target timeframes, without disaggregating those metrics by demographic group. A system that closes cases quickly by systematically concentrating resources on high-viability, low-complexity cases from well-resourced communities may show excellent aggregate performance while failing catastrophically on fairness.
Deploying Updated Models Without Re-Running Fairness Assessments. Treating fairness compliance as a one-time deployment gate rather than a continuous obligation. Model updates, even minor ones, can alter proxy correlations and fairness performance in unpredictable ways. Any model update that changes feature weights, adds features, or retrains on new data MUST trigger a new PDFA under Section 4.3.
Restricting Fairness Dashboard Access to Technical Teams. Configuring the FMD such that fairness statistics are visible only to the model operations or data science team. Human oversight functions, governance officers, and senior responsible officers require independent access to fairness performance data to fulfil their oversight roles. Technical gate-keeping of fairness information is an organisational control failure regardless of the technical sophistication of the team holding the data.
In asylum and border processing contexts, case populations frequently include individuals whose protected attribute data — nationality, ethnicity, religion — is legitimately relevant to substantive case merit but must be handled with strict controls to prevent its inappropriate use in triage scoring. Organisations in this context should implement attribute-aware fairness testing that separately evaluates whether substantive relevance is being inappropriately extended to procedural prioritisation.
In criminal justice and enforcement contexts, the base rate problem in predictive prioritisation is acute: if the model is trained on historical enforcement data that reflects prior discriminatory enforcement patterns, the model will learn to replicate those patterns as predictive of "real crime." Organisations must apply counterfactual data augmentation techniques or, where that is insufficient, consider whether historical enforcement data can legitimately serve as training ground truth at all.
In regulatory complaint contexts, the intersection of language access and fairness is a primary risk vector. Organisations should specifically test for lexical-score-based proxy discrimination and should ensure that language assistance workflows are integrated with prioritisation systems in a manner that prevents language barrier from functioning as a de facto deprioritisation signal.
| Artefact | Description | Retention Period |
|---|---|---|
| Fairness Baseline Document (FBD) | Defines protected attributes, fairness metrics, and tolerance thresholds; all versions retained | 10 years from date of system decommissioning |
| Pre-Deployment Fairness Assessment (PDFA) | Full test results including disaggregated statistics and comparison to FBD thresholds | 10 years from date of system decommissioning |
| Proxy Reconstruction Analysis (PRA) | Feature correlation matrices, Cramér's V outputs, feature retention decisions, and justifications | 10 years from date of system decommissioning |
| Disparity Acceptance Justification (DAJ) | Where applicable: signed approval by SRO and fairness specialist with rationale | 10 years from date of system decommissioning |
| Fairness Monitoring Dashboard Logs | Time-series fairness metric records, alert events, and response records | 7 years minimum; 10 years preferred |
| Override Audit Logs | Individual override records per 4.6.2; quarterly override audit reports | 7 years minimum |
| Retrospective Impact Assessments (RIA) | Findings, affected case lists (where identifiable), harm quantification, and remedial actions | 10 years; subject to any longer period required by judicial proceedings |
| Governance Log | All entries per 4.9.2 in chronological order | 10 years from date of system decommissioning |
| Annual Governance Review Reports | Written reports submitted to senior governance body | 10 years from date of system decommissioning |
| Vendor Model Cards and Contracts | Including fairness disclosure obligations, notification clauses, and audit provisions | Duration of vendor relationship plus 7 years |
| Senior Responsible Officer Designation Records | Named SRO, date of designation, and annual review confirmation | Duration of SRO tenure plus 7 years |
All artefacts MUST be stored in formats that: (a) support external audit access; (b) are tamper-evident or include integrity verification; (c) are accessible to oversight bodies, ombudspersons, or judicial review proceedings upon lawful request; and (d) are protected against premature deletion through accidental or deliberate data management actions. Organisations SHOULD maintain a secondary off-system archive of critical governance artefacts (FBD, PDFA, PRA, RIA) to protect against system decommissioning events that destroy associated records.
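A lightweight way to meet the integrity expectation in (b) is to chain a cryptographic hash through successive governance entries, so that any retrospective alteration or deletion is detectable at verification time. The sketch below is a non-normative illustration with hypothetical field names; production deployments would more commonly rely on write-once storage or an established audit-logging service.

```python
# Non-normative sketch: hash-chained governance log entries. Each entry's hash
# covers the previous entry's hash, so tampering breaks verification.
import hashlib
import json
from datetime import datetime, timezone

def append_entry(log: list[dict], artefact_type: str, payload: dict) -> dict:
    """Append a governance record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    body = {
        "artefact_type": artefact_type,   # e.g. "FBD", "PDFA", "PRA", "RIA"
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
        "prev_hash": prev_hash,
    }
    body["entry_hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
    log.append(body)
    return body

def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; returns False if any entry was altered or removed."""
    prev_hash = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev_hash:
            return False
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        if recomputed != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```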
Maps to: 4.1.1, 4.1.2
Objective: Verify that a current, complete FBD exists and has been re-approved following all material system changes.
Method: Request the most recent FBD. Check for presence of all four required elements (protected attributes, fairness metrics, tolerance thresholds, metric selection rationale). Cross-reference the FBD version date against the system's model update history. Verify that any model update since the FBD creation date has been followed by an updated FBD bearing a new approval signature.
Evidence Required: FBD document with version history; model update log; approval signatures with dates.
| Score | Criteria |
|---|---|
| 3 — Full Conformance | FBD present, complete across all four required elements, and re-approved within 30 days of each model update in the preceding 24-month period |
| 2 — Substantial Conformance | FBD present and substantially complete; one or two elements missing or one re-approval cycle missed |
| 1 — Partial Conformance | FBD present but significantly incomplete, or not updated following two or more model updates |
| 0 — Non-Conformance | No FBD present, or FBD predates initial deployment with no subsequent updates |
Maps to: 4.2.1, 4.2.2, 4.2.3, 4.2.4
Objective: Verify that a PRA has been conducted within the required 12-month cycle and that feature decisions have been made and recorded for all flagged proxies.
Method: Request the most recent PRA report. Verify that it covers the full feature set of the current model. Check for presence of Cramér's V correlation computations for each feature against each protected attribute. Verify that features meeting the 0.15 threshold have been reviewed and that decisions are recorded in the Governance Log. Confirm that the five presumptive proxy categories (submission channel, language of submission, legal representation status, sub-regional geographic identifier, and digital-versus-physical format) were individually examined.
Evidence Required: PRA report with correlation matrices; Governance Log entries for feature decisions; feature set documentation from current model version.
| Score | Criteria |
|---|---|
| 3 — Full Conformance | PRA completed within 12 months; all five presumptive proxy categories individually examined; all features at or above 0.15 threshold have documented decisions in Governance Log |
| 2 — Substantial Conformance | PRA completed within 12 months; most presumptive categories examined; minor gaps in Governance Log documentation |
| 1 — Partial Conformance | PRA completed but more than 12 months old, or fewer than three presumptive proxy categories examined |
| 0 — Non-Conformance | No PRA conducted, or Cramér's V analysis not present, or flagged features have no documented decision |
Maps to: 4.3.1, 4.3.2, 4.3.3, 4.3.4
Objective: Verify that the PDFA used a representative test dataset, computed disaggregated outcomes, compared results to FBD thresholds, and included adversarial scenario testing.
Method: Request the PDFA report for the current model version. Verify that the test dataset composition matches or reasonably approximates the demographic composition of the actual case population (using population-level reference data where available). Check for disaggregated outcome statistics for each FBD-defined protected attribute group. Compare reported disparity values against FBD tolerance thresholds and verify that any exceedances are accompanied by a signed DAJ. Verify that adversarial scenario testing (matched synthetic cases) was conducted and results documented.
Evidence Required: PDFA report; test dataset demographic composition documentation; FBD tolerance thresholds; DAJ where applicable; adversarial test case results.
| Score | Criteria |
|---|---|
| 3 — Full Conformance | PDFA present for current model; representative dataset used and documented; all protected attribute groups evaluated; adversarial testing conducted; all exceedances either corrected or covered by signed DAJ |
| 2 — Substantial Conformance | PDFA present; dataset representativeness partially documented; most protected attribute groups evaluated; adversarial testing conducted but limited in scope |
| 1 — Partial Conformance | PDFA conducted but dataset is known to be unrepresentative, or fewer than half of FBD-defined groups evaluated, or adversarial testing absent |
| 0 — Non-Conformance | No PDFA for current model version, or system deployed with known threshold exceedances and no DAJ |
Maps to: 4.4.1, 4
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| NIST AI RMF | GOVERN 1.1, MAP 3.2, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance |
Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies, analyses, estimates, and evaluates risks. Case Prioritisation Fairness Governance implements a specific risk mitigation measure within this framework. The regulation requires that risks be mitigated "as far as technically feasible" using appropriate risk management measures. For deployments classified as high-risk under Annex III, compliance with AG-559 supports the Article 9 obligation by providing structural governance controls rather than relying solely on the agent's own reasoning or behavioural compliance.
GOVERN 1.1 addresses legal and regulatory requirements; MAP 3.2 addresses risk context mapping; MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-559 supports compliance by establishing structural governance boundaries that implement the framework's approach to AI risk management.
Clause 6.1 requires organisations to determine actions to address risks and opportunities within the AI management system. Clause 8.2 requires AI risk assessment. Case Prioritisation Fairness Governance implements a risk treatment control within the AI management system, directly satisfying the requirement for structured risk mitigation.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — potentially cross-organisation where agents interact with external counterparties or shared infrastructure |
| Escalation Path | Immediate executive notification and regulatory disclosure assessment |
Consequence chain: Without case prioritisation fairness governance, the governance framework has a structural gap: disparities introduced through proxy features or uncorrected model drift propagate across the entire caseload with every triage decision and remain invisible until a retrospective audit identifies the affected cohort. The immediate consequence is unjustified deprioritisation of cases within the scope of AG-559, potentially cascading to dependent processes and downstream enforcement or adjudication outcomes. The operational impact includes regulatory enforcement action, judicial review proceedings, material financial or operational loss, reputational damage, and potential personal liability for senior managers under applicable accountability regimes. Recovery requires technical remediation, retrospective impact assessment and redress for affected individuals, and regulatory engagement, with timelines measured in weeks to months.