This dimension governs the mechanisms by which AI agents operating in public sector, justice, and law enforcement contexts are prevented from using legally protected characteristics (including race, ethnicity, national origin, religion, sex, gender identity, sexual orientation, age, disability, and related traits) either directly as inputs or indirectly through proxy variables that correlate with those characteristics, except in narrowly defined circumstances where such use is explicitly lawful and operationally necessary. It matters because AI systems that inform or make enforcement decisions, benefit eligibility determinations, bail and sentencing recommendations, border processing outcomes, or child protective service interventions carry the power of the state. They can perpetuate or amplify systemic discrimination at a scale and speed that far exceed human-administered equivalents, while removing individual redress opportunities and obscuring discriminatory patterns behind technical complexity. Failure manifests as documented disparate treatment of ethnic minorities in recidivism scoring, nationality-correlated denial rates at border checkpoints that cannot be justified by lawful immigration criteria, or social deprivation indices used as proxies for race in child welfare risk tools. Each of these generates compounded legal liability, erodes public trust, and directly harms individuals who have no meaningful avenue to challenge an automated state decision they cannot see or understand.
Scenario A — Recidivism Scoring and Racial Proxy Encoding
A state corrections department deploys an AI-assisted pre-trial risk scoring tool that does not accept race as a declared input field. The tool ingests 47 features including residential zip code, prior contact frequency with law enforcement, employment sector, and educational attainment level. An internal audit conducted after 18 months of operation across 112,000 scored individuals reveals that Black defendants receive scores in the elevated-risk band at a rate 2.3 times higher than white defendants with comparable charge severity and criminal history length. Root-cause analysis traces the disparity to a composite variable constructed from zip code poverty quartile and transit-dependency score — both of which correlate with race at r = 0.71 and r = 0.68 respectively in the deployment jurisdiction. Neither variable was flagged at model intake because the proxy detection module used a correlation threshold of r > 0.80. The result: 6,400 individuals received inflated risk scores influencing detention decisions over the audit window. Thirty-one defendants can demonstrate through post-hoc record review that their detention exceeded what a corrected score would have produced. The agency faces class-action litigation, a Department of Justice compliance review, and is required to rescore 11,200 pending cases at a remediation cost estimated at USD 4.2 million, excluding legal fees.
Scenario B — Border Processing Nationality Proxy in Document Verification
An AI-assisted document authenticity verification system is deployed at 14 land border crossings and 3 international airports. The system is trained on a dataset in which fraudulent document rates were historically higher for travellers arriving on itineraries originating in four specific countries, creating a learned association between itinerary origin and fraud likelihood. The model does not use nationality directly but uses itinerary routing, language of travel document issuance, and currency of travel insurance as feature inputs. Over 9 months, travellers whose documents are issued in Arabic, Farsi, or Tigrinya are referred for secondary manual inspection at a rate 4.1 times higher than travellers with English- or German-issued documents, even when document validity scores are identical. The system generates 38,000 secondary referrals. A parliamentary inquiry finds that 91% of secondary referrals in the affected language group are resolved without any adverse finding, compared to 71% in the baseline group, a gap that cannot be explained by base-rate differences in actual document fraud. The border authority is found to have violated national equality legislation and Article 21 of the EU Charter of Fundamental Rights. Remediation requires retraining the model on a rebalanced dataset, withdrawing itinerary routing as a feature, and issuing retrospective apology correspondence to 26,000 individuals, alongside a formal compliance undertaking submitted to the national data protection authority.
Scenario C — Child Welfare Risk Scoring and Socioeconomic Proxy
A child protective services agency uses an AI risk stratification tool to triage incoming referrals and allocate investigation resources. The tool was developed using historical case data spanning 14 years. That historical data reflects prior investigative and removal patterns that were themselves shaped by racially disparate reporting and intervention practices: families from lower-income neighbourhoods, which are disproportionately composed of ethnic minority households, were historically investigated and had children removed at higher rates than families in higher-income areas with equivalent welfare concern indicators. The model learns to weight neighbourhood deprivation index, housing tenure type, and benefits receipt as high-signal features. In operation, it generates risk scores that result in intrusive investigation at a rate 3.6 times higher for families of colour relative to white families presenting identical welfare concern profiles. A total of 2,100 families are subjected to unwarranted high-intensity investigation over 22 months. An independent review commissioned following a legal challenge finds that the tool effectively laundered historical discriminatory practice through a mathematical model. The agency is required to suspend the tool, notify all affected families, and commission an independent algorithmic impact assessment. The lead procurement officer and the head of data science are referred for professional conduct review. Remediation costs, including family support services deployed to mitigate distress caused by intrusive investigation, exceed USD 7.8 million.
This dimension applies to all AI agents and AI-assisted decision-support systems deployed within, or providing outputs used by, public sector entities exercising state authority, including but not limited to criminal justice and corrections agencies, border control and immigration authorities, law enforcement agencies, child and family protective services, social welfare benefit determination bodies, and administrative tribunals. It applies regardless of whether the AI system makes a final decision autonomously or provides scored recommendations, risk ratings, prioritisation rankings, or flags that a human official then acts upon, where that human action is reliably conditioned on the AI output. It applies to all protected characteristics as defined by applicable anti-discrimination law in the deployment jurisdiction, with a minimum baseline of: race, ethnicity, national origin, colour, religion or belief, sex, pregnancy, gender identity or expression, sexual orientation, age, disability, genetic information, and immigration or citizenship status. It applies to direct use of protected characteristics as model features, to proxy variable use where a feature correlates materially with a protected characteristic, and to compound variable construction that aggregates individually innocuous features into a composite that functions as a protected characteristic signal.
4.1.1 The system MUST maintain a machine-readable Protected Characteristic Register that enumerates all protected characteristics applicable to the deployment jurisdiction and explicitly records, for each model feature, whether that feature is a protected characteristic or has been assessed for protected characteristic correlation.
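A minimal sketch of what a machine-readable register entry could look like under 4.1.1, assuming a Python dataclass representation; the field names, jurisdiction code, and example features are illustrative rather than mandated by this dimension.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class FeatureAssessment:
    """Per-feature record: whether the feature is itself a protected characteristic,
    and whether it has been assessed for protected characteristic correlation."""
    feature_name: str
    is_protected_characteristic: bool
    assessed_for_proxy_correlation: bool
    max_association_score: Optional[float] = None  # highest score across all characteristics
    assessment_date: Optional[date] = None

@dataclass
class ProtectedCharacteristicRegister:
    """Register for one deployment jurisdiction and one model version."""
    jurisdiction: str
    model_version: str
    protected_characteristics: list = field(default_factory=list)
    feature_assessments: list = field(default_factory=list)

# Hypothetical example entries
register = ProtectedCharacteristicRegister(
    jurisdiction="EX-1",               # illustrative jurisdiction code
    model_version="2024.03",
    protected_characteristics=[
        "race", "ethnicity", "national_origin", "religion_or_belief", "sex",
        "gender_identity", "sexual_orientation", "age", "disability",
        "immigration_or_citizenship_status",
    ],
    feature_assessments=[
        FeatureAssessment("residential_zip_code", False, True, 0.71, date(2024, 3, 1)),
        FeatureAssessment("employment_sector", False, True, 0.22, date(2024, 3, 1)),
    ],
)
```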
4.1.2 The system MUST exclude protected characteristics from all model feature sets unless a lawful basis exception has been formally invoked under Section 4.3.
4.1.3 The system MUST prevent runtime injection of protected characteristic data into model inference pipelines through input validation controls that reject or redact such fields prior to scoring.
4.1.4 The deploying agency MUST produce and sign a Feature Exclusion Declaration at the time of each model version release, certifying that no protected characteristics are included as model inputs in that version.
4.2.1 The system MUST perform proxy correlation analysis on all candidate features prior to model training, computing the statistical association between each feature and each protected characteristic using a combination of at minimum: Pearson correlation coefficient, Cramér's V for categorical variables, and mutual information score.
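One possible implementation of the three association measures named in 4.2.1, assuming tabular feature data in pandas; the normalisation of mutual information and the use of a factorised encoding for the Pearson measure are implementation assumptions, not requirements of this dimension.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, pearsonr
from sklearn.metrics import mutual_info_score

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V for two categorical variables (bias-uncorrected form)."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.values.sum()
    r, k = table.shape
    return float(np.sqrt((chi2 / n) / (min(r, k) - 1)))

def normalised_mutual_info(x: pd.Series, y: pd.Series) -> float:
    """Mutual information scaled to [0, 1] by the smaller marginal entropy.
    Continuous features should be discretised before calling this."""
    mi = mutual_info_score(x, y)
    hx, hy = mutual_info_score(x, x), mutual_info_score(y, y)
    return float(mi / min(hx, hy)) if min(hx, hy) > 0 else 0.0

def proxy_scores(df: pd.DataFrame, feature: str, protected: str) -> dict:
    """Association between one candidate feature and one protected characteristic."""
    scores = {
        "cramers_v": cramers_v(df[feature], df[protected]),
        "mutual_info": normalised_mutual_info(df[feature], df[protected]),
    }
    if pd.api.types.is_numeric_dtype(df[feature]):
        # Pearson is meaningful mainly when the protected characteristic is binary
        # (point-biserial correlation); the factorised encoding is an assumption.
        encoded = pd.factorize(df[protected])[0]
        scores["pearson_abs"] = abs(pearsonr(df[feature], encoded)[0])
    return scores
```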
4.2.2 The system MUST apply a correlation suppression threshold: any feature with an association measure exceeding 0.60 against any protected characteristic MUST be withheld from the training feature set unless the deploying agency documents a specific lawful necessity justification reviewed and approved by a designated legal and ethics review board.
4.2.3 The system MUST perform compound proxy analysis to detect whether combinations of features that individually fall below the 0.60 threshold collectively produce a composite signal with an association exceeding 0.60 against a protected characteristic; any such combination MUST be decomposed or suppressed.
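A sketch of one way to operationalise compound proxy detection under 4.2.3: measure how well small combinations of individually sub-threshold features jointly predict the protected characteristic, scored here as a cross-validated Matthews correlation coefficient. The choice of classifier, metric, and combination size are assumptions; in practice the scan would be restricted to features already assessed under 4.2.1 to keep the combinatorics tractable.

```python
from itertools import combinations

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import cross_val_predict

def compound_proxy_scan(df: pd.DataFrame, candidate_features: list,
                        protected: str, threshold: float = 0.60,
                        max_combo_size: int = 3) -> list:
    """Flag feature combinations whose joint ability to predict the protected
    characteristic exceeds the suppression threshold."""
    y = df[protected]
    flagged = []
    for size in range(2, max_combo_size + 1):
        for combo in combinations(candidate_features, size):
            # One-hot encode categorical members of the combination
            X = pd.get_dummies(df[list(combo)], drop_first=True)
            preds = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
            score = matthews_corrcoef(y, preds)
            if score > threshold:
                flagged.append((combo, round(float(score), 3)))
    return flagged
```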
4.2.4 Where a feature or feature combination is suppressed under 4.2.2 or 4.2.3, the system MUST log the suppression decision, the association scores computed, and the identity of the approving reviewer in the audit trail.
4.2.5 Proxy correlation analysis MUST be repeated at each model retraining cycle, and whenever the input data distribution changes materially — defined as a distributional shift exceeding 10% in the population mean or proportion of any feature across a 90-day rolling window.
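A sketch of the material distributional shift check in 4.2.5, assuming the 10% figure is read as a relative change for numeric means and an absolute percentage-point change for category proportions; that reading is an interpretation, and the deploying agency should fix its own definition in the monitoring procedure.

```python
import pandas as pd

def material_shift(current: pd.Series, baseline: pd.Series, tolerance: float = 0.10) -> bool:
    """True when a feature's population mean (numeric) or any category proportion
    (categorical) has moved by more than `tolerance` relative to the baseline window."""
    if pd.api.types.is_numeric_dtype(baseline):
        base_mean = baseline.mean()
        if base_mean == 0:
            return abs(current.mean()) > tolerance
        return abs(current.mean() - base_mean) / abs(base_mean) > tolerance
    base_props = baseline.value_counts(normalize=True)
    curr_props = current.value_counts(normalize=True)
    all_cats = base_props.index.union(curr_props.index)
    diffs = (curr_props.reindex(all_cats, fill_value=0.0)
             - base_props.reindex(all_cats, fill_value=0.0)).abs()
    return bool((diffs > tolerance).any())
```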
4.3.1 Where a protected characteristic or high-correlation proxy feature is asserted to be lawfully necessary for model operation — for example, age as a direct eligibility criterion in benefit determination, or language capability as a lawful operational requirement in specific border contexts — the system MUST require formal invocation of a documented Lawful Basis Exception prior to feature inclusion.
4.3.2 Each Lawful Basis Exception MUST specify: the protected characteristic involved; the specific statutory, regulatory, or judicial authority permitting its use; the operational necessity justification; the minimum necessary scope of use; and a sunset review date not exceeding 24 months from the date of approval.
4.3.3 Lawful Basis Exceptions MUST be approved by a designated legal authority within the deploying agency and by an independent ethics review function; no single individual may approve an exception on behalf of both functions.
4.3.4 The system MUST restrict use of any protected characteristic admitted under a Lawful Basis Exception to the specific decision context for which the exception was granted; it MUST NOT permit feature reuse across decision contexts without a separate exception approval.
4.3.5 The system MUST enforce automated expiry of Lawful Basis Exceptions at the recorded sunset date, generating an alert to the responsible system owner no fewer than 60 days prior to expiry and suspending the relevant feature from model inference if the exception is not renewed through the full approval process before expiry.
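A minimal sketch of the automated expiry enforcement in 4.3.5; `alert_fn` and `suspend_feature_fn` are hypothetical hooks into the agency's alerting and model configuration systems, named here only for illustration.

```python
from dataclasses import dataclass
from datetime import date

ALERT_LEAD_DAYS = 60  # per 4.3.5: alert no fewer than 60 days before expiry

@dataclass
class LawfulBasisException:
    exception_id: str
    feature_name: str
    sunset_date: date
    renewed: bool = False

def enforce_expiry(exceptions, today: date, alert_fn, suspend_feature_fn):
    """Alert the responsible owner ahead of the sunset date and suspend any feature
    whose exception lapses without renewal through the full approval process."""
    for exc in exceptions:
        days_remaining = (exc.sunset_date - today).days
        if 0 < days_remaining <= ALERT_LEAD_DAYS and not exc.renewed:
            alert_fn(exc.exception_id, days_remaining)
        if days_remaining <= 0 and not exc.renewed:
            suspend_feature_fn(exc.feature_name)
```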
4.4.1 The system MUST collect outcome data, disaggregated by protected characteristic, for all decisions informed by model output, where legally permissible to do so and where the collection of disaggregated data is consistent with applicable data protection law — failing which the system MUST implement an approved statistical methodology for inferring protected characteristic group membership from available indirect signals for monitoring purposes only.
4.4.2 The system MUST compute adverse outcome rate ratios between the highest-adverse-outcome group and the lowest-adverse-outcome group at intervals not exceeding 90 days, and record the results in the monitoring register.
4.4.3 Where the adverse outcome rate ratio for any protected characteristic group exceeds 1.5:1 relative to the reference group, the system MUST trigger a Disparate Impact Alert, initiate a root-cause investigation, and notify the designated oversight authority within 5 business days.
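An illustrative computation of the adverse outcome rate ratios and the 1.5:1 trigger in 4.4.2 and 4.4.3, assuming one outcome record per decision with a boolean adverse flag; the minimum cell count is an assumption borrowed from the intersectional monitoring requirement in 4.4.5 as a general reliability floor, and `alert_fn` stands in for the agency's Disparate Impact Alert workflow.

```python
import pandas as pd

DISPARATE_IMPACT_THRESHOLD = 1.5
MIN_CELL_COUNT = 30  # assumption: reliability floor taken from 4.4.5

def adverse_rate_ratios(outcomes: pd.DataFrame, group_col: str, adverse_col: str) -> pd.Series:
    """Adverse outcome rate per protected characteristic group, as a ratio to the
    group with the lowest adverse rate (treated here as the reference group)."""
    stats = outcomes.groupby(group_col)[adverse_col].agg(["mean", "count"])
    stats = stats[stats["count"] >= MIN_CELL_COUNT]
    rates = stats["mean"]
    return rates / rates.min()

def check_disparate_impact(outcomes, group_col, adverse_col, alert_fn) -> pd.Series:
    ratios = adverse_rate_ratios(outcomes, group_col, adverse_col)
    breaches = ratios[ratios > DISPARATE_IMPACT_THRESHOLD]
    for group, ratio in breaches.items():
        alert_fn(group=group, ratio=float(ratio))  # initiates the alert and investigation
    return breaches
```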
4.4.4 Where a root-cause investigation confirms that the disparate impact is attributable to model scoring behaviour rather than to external confounders, the system MUST be suspended from high-stakes decision support use pending remediation, unless the deploying agency can demonstrate through documented evidence that continued use during remediation is necessary to prevent a more severe rights harm.
4.4.5 The system SHOULD additionally monitor for intersectional disparate impact — adverse outcome rates for individuals belonging to two or more protected characteristic groups simultaneously — where sample sizes are sufficient to produce statistically reliable estimates (minimum cell count: 30 individuals).
4.5.1 The system MUST produce and maintain a Model Card or equivalent structured documentation artefact for each deployed model version, including: intended use context; protected characteristics excluded and the basis for exclusion; features included and their proxy correlation scores; disparate impact monitoring results from the most recent monitoring cycle; and known limitations.
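A sketch of the Model Card fields listed in 4.5.1 as a structured artefact; the concrete values shown are placeholders, and the field layout is illustrative rather than prescribed.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    model_version: str
    intended_use_context: str
    protected_characteristics_excluded: dict  # characteristic -> basis for exclusion
    features: dict                            # feature -> max proxy correlation score
    latest_disparate_impact_results: dict     # group -> adverse outcome rate ratio
    known_limitations: list = field(default_factory=list)

card = ModelCard(
    model_version="2024.03",
    intended_use_context="Pre-trial risk scoring decision support (illustrative)",
    protected_characteristics_excluded={"race": "excluded under 4.1.2",
                                        "religion_or_belief": "excluded under 4.1.2"},
    features={"employment_sector": 0.22, "educational_attainment": 0.18},
    latest_disparate_impact_results={"group_A_vs_reference": 1.1},
    known_limitations=["Not validated for juvenile populations"],
)
```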
4.5.2 The Model Card MUST be made available to any oversight authority, regulatory body, or court with legitimate jurisdiction over the deploying agency upon request, within 10 business days of the request.
4.5.3 The system SHOULD make a public-facing summary version of the Model Card available on the deploying agency's official communications channel, omitting only those elements that a legal authority has specifically determined would compromise operational security.
4.5.4 The deploying agency MUST notify affected individuals of the fact that an AI-assisted tool informed the decision affecting them, the nature of the tool's role in the decision, and the mechanism for requesting human review, within the timeframe required by applicable administrative law.
4.6.1 The system MUST provide a mechanism for a qualified human reviewer to override the AI system's score or recommendation in any individual case, and MUST record the fact of such override, the identity of the reviewer, and a brief rationale.
4.6.2 The system MUST NOT present AI-generated scores in a manner that forecloses or materially discourages human override — specifically, it MUST NOT present scores as final determinations, MUST NOT require documented justification for override that is materially more burdensome than the justification required for following the AI recommendation, and MUST present confidence intervals or uncertainty estimates alongside point-estimate scores.
4.6.3 The deploying agency MUST track override rates by decision category and reviewer identity, and MUST investigate patterns in which override rates drop below 1% over any rolling 90-day window, as this may indicate automation bias suppressing meaningful human review.
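A sketch of the override rate tracking in 4.6.3, assuming an override log with `timestamp`, `reviewer_id`, `decision_category`, and a boolean `overridden` column; the column names and flag structure are illustrative.

```python
import pandas as pd

OVERRIDE_RATE_FLOOR = 0.01  # 1% over any rolling 90-day window, per 4.6.3
WINDOW = "90D"

def flag_low_override_rates(decisions: pd.DataFrame) -> pd.DataFrame:
    """Flag reviewer/category combinations whose rolling 90-day override rate
    falls below the 1% floor, as a possible indicator of automation bias."""
    decisions = decisions.sort_values("timestamp").set_index("timestamp")
    flags = []
    for (reviewer, category), grp in decisions.groupby(["reviewer_id", "decision_category"]):
        rolling_rate = grp["overridden"].astype(float).rolling(WINDOW).mean()
        low = rolling_rate[rolling_rate < OVERRIDE_RATE_FLOOR]
        if not low.empty:
            flags.append({"reviewer_id": reviewer,
                          "decision_category": category,
                          "first_flagged_at": low.index[0],
                          "override_rate": float(low.iloc[0])})
    return pd.DataFrame(flags)
```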
4.6.4 In contexts where the decision outcome may result in deprivation of liberty, removal of a child from family care, or deportation, the system MUST require documented human review of the AI recommendation as a mandatory step before the decision is finalised; automated finalisation of such decisions is prohibited.
4.7.1 Where the AI system operates across multiple legal jurisdictions — including systems used at border crossings or in federal/state hybrid enforcement contexts — the system MUST identify the most protective applicable protected characteristic definition across all jurisdictions in scope and apply that definition as the baseline throughout the system.
4.7.2 The system MUST maintain jurisdiction-specific configuration records documenting which protected characteristics are recognised in each jurisdiction and how the system's feature exclusion and proxy suppression controls are applied in that jurisdiction.
4.7.3 Where a feature that is permissible in one jurisdiction would constitute a protected characteristic in another jurisdiction within the system's operational scope, the system MUST suppress that feature system-wide unless a Lawful Basis Exception is approved under Section 4.3 in the more permissive jurisdiction and is not legally inconsistent with the requirements of the more protective jurisdiction.
4.8.1 The deploying agency MUST maintain a Protected Characteristic Firewall Incident Register, recording all detected breaches of the firewall — including proxy variable detections, disparate impact alerts, and unlawful characteristic injection events — with timestamps, affected population estimates, root-cause findings, and remediation actions taken.
4.8.2 Any confirmed breach of the protected characteristic firewall that has resulted in adverse decisions affecting identifiable individuals MUST be reported to the applicable supervisory authority within the notification period specified by applicable law, and MUST result in notification to affected individuals in accordance with data subject rights obligations.
4.8.3 The deploying agency MUST conduct a formal post-incident review for any breach affecting more than 100 individuals, producing a written root-cause analysis and remediation plan within 60 days of breach confirmation, and MUST submit that report to the oversight authority upon request.
4.8.4 The system SHOULD implement automated rollback capability enabling the deploying agency to revert to a prior model version within 4 hours of a confirmed firewall breach, pending full remediation.
4.9.1 Where the AI system or its underlying model is procured from a third-party vendor or developed by a contracted third party, the deploying agency MUST include contractual obligations requiring the supplier to comply with all requirements of this dimension, including feature exclusion, proxy analysis, and model documentation obligations.
4.9.2 The deploying agency MUST conduct pre-deployment due diligence on any third-party model, including independent proxy correlation analysis of all supplied features using the methods specified in Section 4.2, regardless of supplier assurances.
4.9.3 The deploying agency MUST retain the right under contract to audit the third-party model, training data composition, and feature engineering pipeline at any time, and MUST exercise that right at least annually or upon any Disparate Impact Alert.
4.9.4 The deploying agency MUST NOT deploy a third-party AI system in a high-stakes decision context covered by this dimension where the supplier refuses to disclose the full feature set, training data provenance, or proxy correlation analysis results to the deploying agency.
The structural argument for preventive rather than corrective control. Protected characteristic discrimination in AI-assisted public sector decisions cannot be adequately governed through detection and correction alone. The harm model in this landscape is asymmetric: by the time disparate impact monitoring identifies a pattern, hundreds or thousands of individuals have already experienced adverse state action — detention, deportation, family separation, benefit denial — that may be irreversible in its real-world consequences even if the administrative decision is subsequently corrected. A corrective model would locate the primary control at the output end of the pipeline, relying on disparate impact statistics to trigger remediation after harm has accumulated. A preventive model, by contrast, operates at the input and architecture level, reducing the probability that discriminatory signals enter the model at all. This dimension adopts a defence-in-depth preventive architecture: feature exclusion at the earliest design stage, proxy suppression at the feature engineering stage, compound proxy analysis at the model architecture stage, and real-time input validation at the inference stage. Disparate impact monitoring is retained as a secondary safety net, not as the primary control.
Why proxy suppression is necessary and not optional. The history of algorithmic discrimination in public sector AI is substantially a history of proxy discrimination — the encoding of protected characteristic signals through features that appear facially neutral. Zip codes encoding race, document language encoding national origin, employment sector encoding religion, neighbourhood deprivation indices encoding ethnicity — each of these has been documented in deployed public sector systems. Proxy discrimination is particularly dangerous in this landscape because it is legally deniable and technically obscure: the deploying agency can credibly assert that no protected characteristic was used as a model input while the model nonetheless operates as if race or religion were direct inputs. Proxy suppression must therefore be an affirmative structural requirement, not a voluntary best practice, and must extend to compound proxies constructed from individually innocuous features.
Why the 0.60 correlation threshold and not a higher value. The selection of 0.60 as the default suppression threshold is deliberately conservative relative to the 0.80 threshold that many deployed systems have historically used, as illustrated by Scenario A. At r = 0.80, a feature captures approximately 64% of the variance in the protected characteristic; at r = 0.60, it captures 36%. In a high-stakes enforcement context, the 0.80 threshold permits the inclusion of features that reliably transmit protected characteristic signal through the model. The 0.60 threshold may result in some loss of model predictive performance, but this tradeoff is appropriate in a landscape where model outputs condition the exercise of state coercive power over individuals. Model performance gains achieved through features that function as protected characteristic proxies are not legitimate performance gains; they represent the model learning to discriminate more accurately.
Why human override protections are structurally required. Research on automation bias consistently demonstrates that human reviewers in high-volume, time-pressured decision environments converge on AI recommendations at rates that effectively eliminate meaningful human judgment. In public sector enforcement contexts, the combination of high volume, institutional accountability pressure, and time constraints makes automation bias structurally predictable rather than individually attributable. Section 4.6 requirements are therefore not about ensuring that an override button technically exists; they are about ensuring that the system design actively supports genuine human deliberation rather than passive AI ratification.
The lawful necessity exception framework. The existence of a lawful necessity exception pathway reflects the genuine reality that some protected characteristics are legitimately relevant to specific, narrowly defined public sector functions: age affects statutory eligibility for particular benefits; disability status affects accommodation obligations; gender may be relevant in specific custodial or medical contexts. The exception framework does not prohibit all use of protected characteristics; it requires that any use be affirmatively justified, formally authorised, scoped to minimum necessity, and time-limited. The dual approval requirement (separate legal and ethics sign-off, with no single-person approval) reflects the severity of the default prohibition and the structural incentive within public sector agencies to expand lawful-use exceptions beyond what the law actually permits.
Pattern 1 — Feature Registry with Upstream Governance Gates. Implement a centralised feature registry in which every candidate feature must pass through a structured assessment workflow before it can be included in a training dataset. The workflow enforces: (a) protected characteristic screening by a designated reviewer; (b) automated proxy correlation computation against the jurisdiction's full protected characteristic list; (c) legal review for any feature meeting or approaching the suppression threshold; and (d) registry entry locking once approved, so that feature set modifications require a new assessment cycle. This pattern prevents the informal accumulation of proxy features that characterises post-hoc audit findings.
Pattern 2 — Immutable Inference Boundary. Implement a technical enforcement layer at the model serving boundary that strips or rejects any input payload containing protected characteristic fields before the payload reaches the inference engine, regardless of where in the data pipeline those fields were introduced. This layer should be implemented as a separate, independently auditable service rather than as a component of the model itself, so that its controls cannot be inadvertently overridden by model updates. Logs from this layer should be retained separately from application logs and accessible to audit without requiring access to model infrastructure.
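A minimal sketch of the inference boundary described in Pattern 2, written framework-agnostically as a payload filter with its own audit logger; the field list, function names, and redact/reject modes are illustrative, and in practice the field list would be driven from the Protected Characteristic Register rather than hard-coded.

```python
import logging

# Separate audit logger, retained apart from application logs per Pattern 2
firewall_log = logging.getLogger("pc_firewall_boundary")

PROTECTED_FIELDS = {
    "race", "ethnicity", "national_origin", "religion", "sex",
    "gender_identity", "sexual_orientation", "age", "disability",
    "immigration_status", "citizenship_status",
}

class ProtectedFieldError(ValueError):
    """Raised when a payload contains protected characteristic fields and the
    boundary is configured to reject rather than redact."""

def enforce_boundary(payload: dict, mode: str = "redact") -> dict:
    """Strip or reject protected characteristic fields before the payload
    reaches the inference engine."""
    offending = PROTECTED_FIELDS.intersection(k.lower() for k in payload)
    if offending:
        firewall_log.warning("Protected fields detected at inference boundary: %s",
                             sorted(offending))
        if mode == "reject":
            raise ProtectedFieldError(f"Payload rejected; protected fields: {sorted(offending)}")
        payload = {k: v for k, v in payload.items() if k.lower() not in PROTECTED_FIELDS}
    return payload
```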
Pattern 3 — Adversarial De-biasing at Training Time. Where technical implementation capacity permits, supplement feature suppression with adversarial training objectives that penalise the model for producing outputs that permit a separate classifier to infer protected characteristic group membership. This pattern provides defence in depth: even if a proxy variable escapes suppression at the feature engineering stage, the training objective reduces the extent to which that variable's protected characteristic correlation is exploited by the learned model weights.
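A sketch of gradient-reversal adversarial training as one realisation of Pattern 3, in PyTorch; the architecture, loss weighting, and layer sizes are illustrative assumptions. Note that the protected group labels are used only at training time to drive the adversary, a use that may itself require a Lawful Basis Exception under Section 4.3.

```python
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; reverses and scales gradients on the backward
    pass, so the encoder is trained to defeat the adversary."""
    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.clone()
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_ * grad_output, None

class DebiasedRiskModel(nn.Module):
    def __init__(self, n_features: int, n_groups: int, lambda_: float = 1.0):
        super().__init__()
        self.lambda_ = lambda_
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.risk_head = nn.Linear(32, 1)         # main task: risk score
        self.adversary = nn.Linear(32, n_groups)  # tries to recover the protected group

    def forward(self, x):
        z = self.encoder(x)
        risk = self.risk_head(z)
        group_logits = self.adversary(GradientReversal.apply(z, self.lambda_))
        return risk, group_logits

def training_step(model, optimiser, x, y_risk, y_group):
    """The combined loss trains the adversary to predict group membership while the
    reversed gradient pushes the encoder to remove that signal from its representation."""
    optimiser.zero_grad()
    risk, group_logits = model(x)
    loss = (nn.functional.binary_cross_entropy_with_logits(risk.squeeze(-1), y_risk)
            + nn.functional.cross_entropy(group_logits, y_group))
    loss.backward()
    optimiser.step()
    return float(loss)
```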
Pattern 4 — Disaggregated Impact Dashboards with Escalation Triggers. Build real-time disaggregated outcome monitoring into operational dashboards, with automated escalation triggered when adverse rate ratios cross defined thresholds. Dashboard access should include the responsible operational manager, the system governance owner, and the designated ethics/compliance function simultaneously — not sequentially — so that escalation is immediate and cannot be absorbed within a single management layer.
Pattern 5 — Independent Algorithmic Impact Assessment at Procurement. Commission an independent algorithmic impact assessment before any AI system enters high-stakes decision use, conducted by a party with no financial interest in the procurement outcome. The assessment should replicate proxy correlation analysis using the actual deployment population data rather than the vendor's training data distribution, because correlation structures differ across populations and vendor-supplied assessments may not reflect the deployment context.
Anti-Pattern 1 — Threshold Creep. Setting the proxy suppression threshold at 0.80 or higher on the grounds that "lower thresholds will harm model performance" is a documented failure mode. Performance arguments in favour of retaining high-correlation proxy features in enforcement models should be treated as a red flag rather than a justification. A model that performs well by learning to discriminate along protected characteristic lines is not a well-performing model; it is a discriminatory model with a high accuracy score.
Anti-Pattern 2 — Post-Hoc Fairness Metrics as Primary Control. Deploying a model that incorporates proxy features and then applying post-hoc fairness calibration (re-weighting outputs, threshold adjustment by group) does not satisfy the requirements of this dimension. Post-hoc calibration can reduce measured disparate impact in controlled evaluation sets while leaving the underlying discriminatory feature encoding intact. It also introduces a new problem: it makes the model's output a function of group membership, which may itself constitute direct use of a protected characteristic.
Anti-Pattern 3 — Vendor Assurance Substituting for Independent Assessment. Accepting a vendor's own fairness assessment, model card, or bias testing report as sufficient due diligence without independent verification is a procurement governance failure. Vendor incentives to minimise reported bias are structural and persistent. Independent assessment is required under Section 4.9.2 regardless of vendor reputation or certification claims.
Anti-Pattern 4 — Exempting "Operational" Features from Proxy Analysis. Excluding certain features from proxy analysis on the grounds that they are "operationally defined" rather than "demographic" — for example, treating prior arrest count as purely a behavioural feature without acknowledging its known correlation with race in jurisdictions with documented racially disparate policing — is a form of motivated reasoning that systematically under-identifies proxy variables. Proxy analysis must be applied uniformly across all feature types.
Anti-Pattern 5 — Override Friction as Compliance. Implementing an override mechanism that requires the reviewing officer to complete a multi-field justification form, obtain secondary sign-off, and accept a notation on their performance record before overriding an AI recommendation is technically compliant with the existence of an override mechanism while functionally suppressing override behaviour. Section 4.6.2 prohibits this pattern explicitly. Override friction should be symmetric with recommendation-following friction.
| Maturity Level | Characteristics |
|---|---|
| Level 1 — Initial | Protected characteristics excluded from model inputs as a manual convention; no systematic proxy analysis; no automated monitoring; compliance dependent on individual practitioner knowledge. |
| Level 2 — Defined | Feature registry in place; documented proxy analysis conducted at model inception; basic disparate impact reporting at 6-month intervals; manual human review required for high-stakes decisions. |
| Level 3 — Managed | Automated proxy correlation analysis at each training cycle; real-time disparate impact dashboards with escalation triggers; Lawful Basis Exception framework operational; Model Card produced for each version; audit trail integrity verified. |
| Level 4 — Optimising | Adversarial de-biasing integrated in training pipeline; intersectional disparate impact monitoring operational; independent algorithmic impact assessments at procurement and annually; automated rollback capability; public-facing model summaries published; continuous legal landscape scanning for updated protected characteristic definitions. |
7.1 Protected Characteristic Register: A machine-readable register enumerating all protected characteristics relevant to the deployment jurisdiction, updated at each model version release and at each material change in applicable law. Retention: for the lifetime of the system plus 7 years following system decommissioning.
7.2 Feature Exclusion Declaration: A signed declaration from the responsible system owner at each model version release certifying that no protected characteristics are included as model inputs. Retention: 7 years from date of signature.
7.3 Proxy Correlation Analysis Report: A structured report produced at each training cycle documenting all features assessed, all association scores computed, all suppression decisions made, and the identity and rationale of any reviewer who approved retention of a feature with association scores approaching the threshold. Retention: 7 years from date of report.
7.4 Compound Proxy Analysis Report: A companion report to the Proxy Correlation Analysis Report documenting the methodology and results of compound proxy detection, including any feature combinations decomposed or suppressed. Retention: 7 years from date of report.
7.5 Lawful Basis Exception Records: Complete records of each Lawful Basis Exception invocation, including the authorising legal citation, the operational necessity justification, the approval documentation from both required approvers, and all renewal or expiry records. Retention: 7 years from the date of exception expiry or system decommissioning, whichever is later.
7.6 Disparate Impact Monitoring Register: A time-series register of all disparate impact monitoring results, including adverse outcome rate ratios by protected characteristic group, monitoring cycle dates, and records of all Disparate Impact Alerts triggered, investigations initiated, and outcomes. Retention: 7 years from date of each monitoring cycle.
7.7 Model Card: The structured model documentation artefact for each deployed model version, including all fields specified in Section 4.5.1. Retention: 7 years from the date on which that model version is retired from production use.
7.8 Human Override Log: A complete log of all human overrides of AI recommendations, including case identifier (pseudonymised where required by data protection law), reviewer identity, override direction, and brief rationale. Retention: 7 years from date of override.
7.9 Protected Characteristic Firewall Incident Register: The full incident register specified in Section 4.8.1, including all associated root-cause analyses and remediation reports. Retention: 10 years from date of each incident record, given the potential for related legal proceedings.
7.10 Procurement Due Diligence Records: Records of pre-deployment due diligence conducted on third-party AI systems, including independent proxy correlation analysis results, supplier contractual obligations, and audit exercise records. Retention: 7 years from the date of system decommissioning.
7.11 Affected Individual Notification Records: Records confirming that affected individuals were notified in accordance with Section 4.5.4 and Section 4.8.2, including notification date, notification content, and delivery confirmation where applicable. Retention: 7 years from date of notification.
Maps to: 4.1.1, 4.1.2, 4.1.3, 4.1.4
Objective: Confirm that protected characteristics are absent from the model feature set and that runtime injection is blocked.
Method: (a) Obtain the full feature list for the deployed model version and cross-reference each feature against the Protected Characteristic Register. (b) Submit a test inference request through the model serving boundary that includes a field explicitly coding a protected characteristic (e.g., a field labelled ethnicity with a coded value). Confirm that the request is rejected or that the protected characteristic field is stripped before inference. (c) Review the Feature Exclusion Declaration for the current model version and confirm it is signed by the responsible system owner. (d) Inspect the Protected Characteristic Register for completeness against the applicable legal jurisdiction's protected characteristic definitions.
Pass Criteria:
Maps to: 4.2.1, 4.2.2, 4.2.3, 4.2.4, 4.2.5
Objective: Confirm that proxy correlation analysis was conducted using required methods, that the suppression threshold was correctly applied, and that compound proxy detection was performed.
Method: (a) Obtain the Proxy Correlation Analysis Report for the current model version. Confirm it includes Pearson/Cramér's V/mutual information scores for all features against all applicable protected characteristics. (b) Identify any features with scores between 0.55 and 0.60 and confirm they were reviewed; identify any with scores exceeding 0.60 and confirm they were suppressed or are covered by a valid Lawful Basis Exception. (c) Obtain the Compound Proxy Analysis Report and confirm compound detection methodology and results are documented. (d) Confirm the most recent proxy analysis was conducted within the last training cycle or within the 90-day distributional shift window, whichever is more recent. (e) Independently compute proxy correlations for a randomly selected 20% sample of features and compare results against the reported values to verify analytical accuracy.
Pass Criteria:
Maps to: 4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.5
Objective: Confirm that any Lawful Basis Exception in operation is fully documented, dual-approved, scoped to minimum necessity, and subject to enforceable expiry.
Method: (a) Retrieve all active Lawful Basis Exceptions from the register. For each: (b) Confirm the statutory or regulatory authority
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| NIST AI RMF | GOVERN 1.1, MAP 3.2, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance |
Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies, analyses, estimates, and evaluates risks. Protected Characteristic Firewall Governance implements a specific risk mitigation measure within this framework. The regulation requires that risks be mitigated "as far as technically feasible" using appropriate risk management measures. For deployments classified as high-risk under Annex III, compliance with AG-563 supports the Article 9 obligation by providing structural governance controls rather than relying solely on the agent's own reasoning or behavioural compliance.
GOVERN 1.1 addresses legal and regulatory requirements; MAP 3.2 addresses risk context mapping; MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-563 supports compliance by establishing structural governance boundaries that implement the framework's approach to AI risk management.
Clause 6.1 requires organisations to determine actions to address risks and opportunities within the AI management system. Clause 8.2 requires AI risk assessment. Protected Characteristic Firewall Governance implements a risk treatment control within the AI management system, directly satisfying the requirement for structured risk mitigation.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — potentially cross-organisation where agents interact with external counterparties or shared infrastructure |
| Escalation Path | Immediate executive notification and regulatory disclosure assessment |
Consequence chain: Without protected characteristic firewall governance, the governance framework has a structural gap that can be exploited at machine speed. The failure mode is not gradual degradation — it is a binary absence of control that permits unbounded agent behaviour in the dimension this protocol governs. The immediate consequence is uncontrolled agent action within the scope of AG-563, potentially cascading to dependent dimensions and downstream systems. The operational impact includes regulatory enforcement action, material financial or operational loss, reputational damage, and potential personal liability for senior managers under applicable accountability regimes. Recovery requires both technical remediation and regulatory engagement, with timelines measured in weeks to months.