Geospatial Bias Governance requires organisations operating AI agents in housing, real estate, and property decision contexts to detect, prevent, and remediate unfair geographic proxies that produce discriminatory outcomes along protected-class lines. Geographic data — ZIP codes, postal codes, census tracts, neighbourhood names, school district identifiers, commute-time radii, and latitude-longitude coordinates — is among the most potent proxy variables for race, ethnicity, national origin, religion, and socioeconomic status. An AI agent that incorporates geographic features into tenant screening, mortgage underwriting, property listing recommendations, rental pricing, or repair prioritisation can reproduce and amplify historical patterns of residential segregation, redlining, and exclusionary zoning without ever referencing a protected characteristic directly. This dimension mandates that organisations implement systematic controls to identify when geographic inputs or derived features function as discriminatory proxies, to prevent those proxies from producing disparate impact in housing-related decisions, and to maintain auditable evidence that geographic features used in agent decision-making have been assessed for proxy effects and found to serve a legitimate, non-discriminatory purpose. The governance obligation is preventive: geographic proxy bias must be detected and blocked before it reaches the affected individual, not merely logged after the fact.
Scenario A — ZIP Code Proxy Produces Digital Redlining in Tenant Screening: A property management company deploys an AI agent to pre-screen rental applications. The agent ingests applicant data including current address, employment location, and credit history. Although the agent does not receive the applicant's race or ethnicity, it uses the applicant's current ZIP code as an input feature. The ZIP code is highly correlated with racial composition due to decades of residential segregation: ZIP codes 60621 and 60637 in Chicago are over 90% Black, while ZIP codes 60614 and 60657 are over 70% white. The agent learns that applicants from certain ZIP codes have statistically higher eviction rates and lower credit scores — patterns that reflect historical disinvestment, discriminatory lending, and exclusionary housing policies, not individual creditworthiness. Over 14 months, the agent rejects 43% of applicants from majority-Black ZIP codes and 12% of applicants from majority-white ZIP codes with equivalent income and employment stability. No one notices because the agent's decision explanations reference "historical tenancy risk" and "area credit profile" rather than race. A fair housing audit commissioned by a tenant advocacy organisation discovers the disparity by geocoding 2,400 application decisions and overlaying them on census demographic data.
What went wrong: The agent used ZIP code as an input feature without assessing its proxy effect for race. No disparity analysis was conducted on the agent's output to detect differential rejection rates by geography correlated with protected characteristics. The agent's training data encoded historical patterns of racial segregation, and the ZIP code feature transmitted those patterns into current decisioning. The property management company had no geographic proxy detection framework, no disparate impact monitoring, and no process for validating that geographic features served a legitimate non-discriminatory purpose. Consequence: Fair Housing Act violation, consent decree requiring retrospective review of 2,400 decisions, $1.2 million in settlement payments to affected applicants, mandatory fair housing compliance programme, and reputational damage that reduced occupancy rates across the company's portfolio by 6%.
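A minimal sketch of the audit technique described above: geocode the decisions, join them to census demographics, and compare rejection rates by ZIP demographic class. All file and column names are hypothetical, and a real analysis would also control for income and employment stability, as the audit in the scenario did.

```python
import pandas as pd

# Hypothetical reconstruction of the fair housing audit: geocoded decisions
# joined to census demographics. File and column names are illustrative only.
decisions = pd.read_csv("screening_decisions.csv")    # columns: zip_code, outcome
demographics = pd.read_csv("census_by_zip.csv")       # columns: zip_code, pct_black

merged = decisions.merge(demographics, on="zip_code", how="left")

# Classify each ZIP by its demographic composition.
merged["zip_class"] = pd.cut(
    merged["pct_black"],
    bins=[0, 50, 100],
    labels=["not_majority_black", "majority_black"],
    include_lowest=True,
)

# Rejection rate by ZIP demographic class: the overlay that exposed the
# 43% vs 12% gap in Scenario A.
rejection_rates = (
    merged.assign(rejected=merged["outcome"].eq("reject"))
    .groupby("zip_class", observed=True)["rejected"]
    .mean()
)
print(rejection_rates)
```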
Scenario B — Geographic Steering in Property Listing Recommendations: A real estate platform deploys an AI agent to recommend property listings to prospective buyers and renters. The agent uses a collaborative filtering model trained on historical search patterns, property views, and transaction data. Because historical housing patterns are segregated, the model learns that users who view properties in predominantly Hispanic neighbourhoods are likely to view other properties in predominantly Hispanic neighbourhoods — and similarly for other demographic clusters. The agent begins steering users: a Hispanic-surnamed user searching for a three-bedroom home in a metropolitan area receives recommendations overwhelmingly concentrated in majority-Hispanic census tracts, while a white-surnamed user with identical search criteria receives recommendations in majority-white census tracts. Neither user explicitly requested neighbourhood filtering. The agent has effectively recreated geographic steering — a practice that the Fair Housing Act has prohibited since 1968. A civil rights organisation conducts paired testing using fictitious user profiles with identical search criteria but different name-based demographic signals, revealing that 78% of the top-10 recommendations differ between the paired profiles, with the divergence strongly correlated to the racial composition of the recommended neighbourhoods.
What went wrong: The collaborative filtering model was trained on historically segregated interaction data without any debiasing intervention. The model treated historical geographic concentration patterns as user preferences rather than artefacts of systemic segregation. No paired-testing or counterfactual analysis was performed to detect whether recommendation patterns varied by user characteristics correlated with protected classes. The platform had no geographic steering detection framework and no process for ensuring that listing recommendations did not cluster by the demographic composition of neighbourhoods. Consequence: Department of Justice investigation, consent decree requiring algorithmic remediation, $3.8 million in penalties, mandatory paired-testing programme, and court-ordered independent monitor for 5 years.
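The paired-testing measurement in this scenario reduces to a simple comparison: for each matched pair of profiles, compute the fraction of top-10 recommendations that differ. A minimal sketch with hypothetical listing identifiers:

```python
def topk_divergence(recs_a: list[str], recs_b: list[str], k: int = 10) -> float:
    """Fraction of top-k recommendations that differ between two paired
    test profiles submitted with identical search criteria."""
    overlap = len(set(recs_a[:k]) & set(recs_b[:k]))
    return 1.0 - overlap / k

# One hypothetical pair: listing IDs recommended to each fictitious profile.
recs_profile_a = ["p014", "p002", "p077", "p008", "p091",
                  "p033", "p056", "p012", "p045", "p060"]
recs_profile_b = ["p014", "p088", "p023", "p071", "p009",
                  "p033", "p040", "p065", "p052", "p019"]

print(f"top-10 divergence: {topk_divergence(recs_profile_a, recs_profile_b):.0%}")
# -> top-10 divergence: 80%
```

Divergence alone does not prove steering; the audit in the scenario additionally correlated the divergence with the racial composition of the recommended neighbourhoods' census tracts.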
Scenario C — Commute-Time Feature Encodes Racial Segregation in Mortgage Pre-Approval: A mortgage lender deploys an AI agent to assist with pre-approval decisions. The agent uses a "commute feasibility score" that estimates the applicant's commute time from the prospective property to their employer using real-time traffic data. The feature was intended to assess whether the applicant could sustain employment while living at the property — a factor relevant to repayment probability. However, commute-time calculations are not racially neutral: because employment centres, public transit networks, and highway infrastructure are distributed unevenly across racially segregated metropolitan areas, the commute feasibility score systematically disadvantages applicants seeking homes in minority neighbourhoods with poorer transit access. Black and Hispanic applicants seeking to purchase homes in their current communities receive lower commute feasibility scores than white applicants seeking homes in suburban areas with direct highway access to the same employment centres. The disparity is not visible in the model's feature importance analysis: commute feasibility ranks as the third-most-predictive feature, and because it appears facially neutral, nothing in the importance ranking signals its proxy effect. A regulatory examination discovers the disparate impact when the examiner disaggregates pre-approval rates by census tract racial composition and finds a 23-percentage-point gap between majority-minority and majority-white census tracts after controlling for income, debt-to-income ratio, and credit score.
What went wrong: The commute feasibility feature was a geographic proxy that encoded the spatial consequences of racial segregation — unequal transit infrastructure, highway placement decisions driven by urban renewal programmes that displaced minority communities, and job-suburbanisation patterns that increased commute burdens for minority urban residents. The lender did not conduct a proxy analysis on the commute feasibility feature, did not disaggregate model outcomes by the racial composition of the property's census tract, and did not evaluate whether the feature's predictive power for repayment probability was sufficient to justify its disparate impact. Consequence: Equal Credit Opportunity Act violation, $6.2 million consent order, required model remediation, retrospective review of 18 months of pre-approval decisions, and mandatory disparate impact testing programme.
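The examiner's disaggregation can be approximated as a logistic regression that controls for the legitimate underwriting factors. A sketch assuming statsmodels and a hypothetical examination extract (column names are illustrative):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical extract: one row per pre-approval decision, joined to the
# racial composition of the prospective property's census tract.
df = pd.read_csv("preapproval_decisions.csv")
# columns: approved (0/1), income, dti, credit_score, tract_minority_share

X = sm.add_constant(df[["income", "dti", "credit_score", "tract_minority_share"]])
result = sm.Logit(df["approved"], X).fit(disp=False)

# A significant negative coefficient on tract_minority_share is the disparity
# that persists after controlling for income, debt-to-income ratio, and
# credit score: the pattern behind the 23-point gap in the scenario.
print(result.summary())
```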
Scope: This dimension applies to any AI agent that makes, supports, or influences decisions in the housing, real estate, and property domain where geographic data — including but not limited to postal codes, ZIP codes, census tracts, neighbourhood identifiers, municipality names, school district boundaries, latitude-longitude coordinates, commute-time calculations, property tax jurisdiction codes, flood zone designations, walkability scores, or any derived or composite geographic feature — is an input to the agent's decision logic, training data, or recommendation model. The scope includes tenant screening, rental application processing, property listing recommendations, mortgage pre-approval and underwriting, rent-setting and fee determination, repair and maintenance prioritisation, property valuation, homeowner association enforcement, and any other housing-related decision where geographic information may function as a proxy for race, ethnicity, national origin, religion, familial status, disability, sex, or other protected characteristics under applicable fair housing and anti-discrimination law. The scope extends to features derived from geographic data even when the raw geographic identifier has been removed — for example, median neighbourhood income, area crime statistics, school quality ratings, or environmental risk indices that are computed from geographic boundaries and carry the same proxy risk. Organisations that assert a geographic feature is not a proxy bear the burden of demonstrating, through documented analysis, that the feature does not produce disparate impact or that any disparate impact is justified by a legitimate non-discriminatory purpose that cannot be achieved through less discriminatory means.
4.1. A conforming system MUST maintain a comprehensive inventory of all geographic features, including raw geographic identifiers, derived geographic features, and composite features that incorporate geographic data, used as inputs to any housing-related AI agent decision, recommendation, or scoring model, with each feature's source, derivation method, and intended purpose documented.
4.2. A conforming system MUST conduct a proxy analysis for each geographic feature in the inventory, assessing its statistical correlation with protected-class demographics using contemporaneous demographic data at the appropriate geographic granularity, and documenting the analysis methodology, data sources, correlation metrics, and conclusions.
4.3. A conforming system MUST implement disparate impact testing on the agent's outputs, disaggregated by the demographic composition of the relevant geographic unit (e.g., census tract, ZIP code), at least quarterly and whenever the agent's model, training data, or geographic feature set is materially changed.
4.4. A conforming system MUST define and enforce quantitative disparate impact thresholds — such as the four-fifths rule or a statistically significant deviation test — for each housing-related decision type, and automatically flag decisions or decision patterns that exceed those thresholds (an illustrative sketch of such a check follows 4.12).
4.5. A conforming system MUST block or escalate for human review any geographic feature that the proxy analysis identifies as having a correlation with protected-class demographics exceeding a defined threshold, unless the organisation has documented a legitimate non-discriminatory justification and has demonstrated that no less discriminatory alternative feature achieves the same purpose.
4.6. A conforming system MUST log every housing-related decision made by the agent with sufficient geographic context to enable retrospective disparate impact analysis, including the geographic features used, their values for the specific decision, and the decision outcome.
4.7. A conforming system MUST implement counterfactual testing — evaluating whether the agent's decision changes when the geographic feature is perturbed while all other inputs remain constant — for at least a statistically valid sample of decisions per quarter, to detect cases where the geographic feature is the marginal determinant of an adverse outcome.
4.8. A conforming system MUST ensure that training data used for housing-related agent models is assessed for historical geographic bias, including patterns reflecting redlining, blockbusting, racially restrictive covenants, exclusionary zoning, or other discriminatory practices, and that debiasing interventions are applied where such patterns are identified.
4.9. A conforming system SHOULD implement real-time geographic proxy monitoring that evaluates each individual decision at inference time for geographic proxy risk, rather than relying solely on periodic batch analysis.
4.10. A conforming system SHOULD conduct paired testing — submitting matched application profiles that differ only in geographic features correlated with different demographic compositions — at least semi-annually to detect geographic steering, differential treatment, and proxy-driven disparate impact that may not be visible in aggregate statistics.
4.11. A conforming system SHOULD integrate geographic bias governance with the organisation's broader fair lending and fair housing compliance programme, ensuring that geographic proxy analysis results are reported to fair housing compliance officers and incorporated into the organisation's Community Reinvestment Act, Home Mortgage Disclosure Act, or equivalent regulatory reporting where applicable.
4.12. A conforming system MAY implement geographic feature ablation studies — systematically removing geographic features and measuring the impact on both model accuracy and disparate impact — to identify the minimum set of geographic features that achieves acceptable predictive performance with minimal proxy risk.
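The following sketch illustrates the threshold check required by 4.3 and 4.4, using the four-fifths rule and the approval rates implied by Scenario A (57% versus 88%). Group labels and the choice of reference group are illustrative:

```python
FOUR_FIFTHS = 0.8

def check_disparate_impact(
    approval_rates: dict[str, float], reference_group: str
) -> list[str]:
    """Flag any geographic-demographic class whose approval rate falls
    below four-fifths of the reference group's rate (requirement 4.4)."""
    reference_rate = approval_rates[reference_group]
    return [
        group
        for group, rate in approval_rates.items()
        if group != reference_group and rate / reference_rate < FOUR_FIFTHS
    ]

# Scenario A's numbers: 57% approval in majority-Black ZIPs vs 88% in
# majority-white ZIPs.
rates = {"majority_white_zip": 0.88, "majority_black_zip": 0.57}
print(check_disparate_impact(rates, reference_group="majority_white_zip"))
# -> ['majority_black_zip']  (0.57 / 0.88 = 0.65, well below 0.8)
```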
Geographic data is the single most effective proxy for race in the United States and for race, ethnicity, and national origin in most jurisdictions globally. This is not a theoretical risk; it is an empirical fact rooted in centuries of residential segregation enforced through law, policy, and private action. In the United States, the Home Owners' Loan Corporation's residential security maps of the 1930s explicitly graded neighbourhoods by racial composition, with Black and immigrant neighbourhoods rated "hazardous" (the origin of the term "redlining"). The Federal Housing Administration's underwriting manuals through the 1960s instructed appraisers to downgrade properties in racially mixed or minority neighbourhoods. Racially restrictive covenants, exclusionary zoning, highway placement through minority communities, and discriminatory lending practices produced a spatial distribution of race across metropolitan areas that persists to the present day. In Europe, similar patterns of ethnic and immigrant concentration in specific urban areas — driven by housing allocation practices, social housing policy, and private discrimination — create comparable geographic-demographic correlations.
When an AI agent uses geographic data in housing decisions, it inherits this history. A ZIP code is not a neutral identifier; it is a compressed encoding of decades of discriminatory policy. An agent that treats applicants differently based on ZIP code is, in statistical terms, treating them differently based on race — even if race is not an explicit input. The Fair Housing Act, the Equal Credit Opportunity Act, and their international equivalents prohibit not only intentional discrimination but also practices that have an unjustified disparate impact on protected classes. An agent that produces disparate impact through geographic proxies violates these statutes regardless of the developer's intent.
The proxy problem is particularly insidious because geographic features often have genuine predictive power for the outcome of interest. Property values, default rates, rental vacancy rates, and eviction rates do vary geographically. The question is not whether geographic features are predictive — they often are — but whether their predictive power derives from legitimate, non-discriminatory factors or from the spatial encoding of historical discrimination. If a ZIP code's higher default rate reflects historical disinvestment that depressed property values and limited residents' wealth accumulation, using that default rate as a predictive feature perpetuates the consequences of the original discrimination. The legal standard requires a less discriminatory alternative analysis: if a model can achieve substantially the same predictive accuracy without the geographic proxy feature, the feature's inclusion is not justified.
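The less-discriminatory-alternative analysis can be operationalised as a controlled comparison: fit the model with and without the flagged proxy feature and compare predictive performance. A sketch assuming scikit-learn, with hypothetical column names; if the performance loss is negligible, retaining the proxy is not justified:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("loan_outcomes.csv")  # hypothetical modelling frame

def holdout_auc(feature_names: list[str]) -> float:
    """Holdout AUC for a model restricted to the given feature set."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        df[feature_names], df["defaulted"], test_size=0.3, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

# The proxy analysis flagged `zip_default_rate`; compare with and without it.
auc_with = holdout_auc(["income", "dti", "credit_score", "zip_default_rate"])
auc_without = holdout_auc(["income", "dti", "credit_score"])

# Negligible AUC loss means the proxy feature's disparate impact cannot be
# justified by predictive necessity.
print(f"AUC with proxy: {auc_with:.3f}  without: {auc_without:.3f}")
```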
Derived and composite geographic features present an additional governance challenge. An organisation may remove raw ZIP codes from its model but substitute "neighbourhood median income," "area school quality rating," or "commute feasibility score" — features that are computed from geographic boundaries and carry the same proxy risk. Feature engineering that transforms geographic identifiers into continuous variables does not eliminate the proxy effect; it merely obscures it. Governance must therefore extend beyond raw geographic identifiers to any feature whose derivation incorporates geographic boundaries, and the proxy analysis must be applied to the derived feature, not just the raw input.
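The proxy analysis required by 4.2 must therefore run against the derived features themselves. A minimal sketch; the threshold value, feature names, and the use of tract-level Spearman correlation are illustrative choices, not prescribed methodology:

```python
import pandas as pd

PROXY_THRESHOLD = 0.5  # policy-defined escalation threshold; illustrative value

# Hypothetical tract-level frame: each candidate feature (raw or derived)
# alongside the tract's protected-class composition.
tracts = pd.read_csv("tract_feature_profile.csv")

candidates = ["neighbourhood_median_income", "school_quality_rating",
              "commute_feasibility_score"]

for feature in candidates:
    rho = tracts[feature].corr(tracts["pct_minority"], method="spearman")
    status = "ESCALATE per 4.5" if abs(rho) >= PROXY_THRESHOLD else "document and monitor"
    print(f"{feature}: rho={rho:+.2f} -> {status}")
```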
The preventive nature of this control is essential. Housing decisions affect individuals' fundamental rights — where they live, whether they can secure shelter, whether they can build wealth through homeownership. A detective control that identifies disparate impact after the fact leaves affected individuals harmed: the rejected applicant has already been denied housing, the steered buyer has already been directed away from integrated neighbourhoods, the overcharged tenant has already paid the discriminatory rent. While retrospective analysis is necessary for validation, the primary control must be preventive — blocking or escalating geographic proxy effects before they reach the affected individual.
Geospatial Bias Governance requires a layered approach: upstream controls on training data and feature engineering, midstream controls on model behaviour and inference-time decisions, and downstream controls on output monitoring and retrospective validation. No single layer is sufficient. Training data debiasing does not guarantee unbiased outputs. Inference-time monitoring does not catch bias baked into model weights. Retrospective analysis does not prevent harm to individuals affected before the analysis is complete. All three layers must operate concurrently.
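As a concrete illustration of a midstream control, the following sketch combines inference-time monitoring (4.9) with counterfactual perturbation (4.7): the application is re-scored with its geographic features replaced by neutral values, and the decision is escalated rather than released if the outcome flips. The scoring interface, neutral-value strategy, and all names are assumptions for illustration:

```python
from typing import Callable

def geographic_counterfactual_guard(
    score: Callable[[dict], float],
    application: dict,
    geo_features: list[str],
    neutral_values: dict[str, float],
    approve_threshold: float,
) -> str:
    """Midstream guard: if neutralising the geographic features flips the
    outcome, geography is the marginal determinant of the decision, so the
    case is escalated to human review (an AG-019 trigger) instead of being
    released, consistent with the preventive posture this dimension requires."""
    original = score(application) >= approve_threshold
    neutralised = {**application, **{f: neutral_values[f] for f in geo_features}}
    counterfactual = score(neutralised) >= approve_threshold
    if original != counterfactual:
        return "escalate_to_human_review"
    return "approve" if original else "decline"

# Stand-in scoring function for demonstration only.
demo_score = lambda app: 0.9 - 0.4 * app["zip_risk_index"]
applicant = {"income": 62_000, "zip_risk_index": 0.8}
print(geographic_counterfactual_guard(
    demo_score, applicant, ["zip_risk_index"], {"zip_risk_index": 0.3}, 0.6
))  # -> escalate_to_human_review
```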
Recommended patterns:
- Operate all three control layers concurrently: training-data debiasing upstream, inference-time proxy monitoring midstream, and retrospective disparate impact analysis downstream.
- Conduct proxy analysis before deployment for every geographic feature, whether raw, derived, or composite, and document a less-discriminatory-alternative search for any feature retained despite proxy risk.
- Run counterfactual and paired testing on a regular cadence to surface proxy effects that aggregate statistics conceal.
- Escalate to human review any decision in which a geographic feature is the marginal determinant of an adverse outcome.
Anti-patterns to avoid:
- Removing raw ZIP codes while retaining derived features (neighbourhood median income, school quality ratings, commute feasibility scores) that carry the same proxy signal.
- Treating the absence of protected characteristics from model inputs as evidence of fairness.
- Relying solely on detective controls that log disparate impact after affected individuals have already been harmed.
- Accepting a geographic feature's predictive power as self-justifying without asking whether that power derives from the spatial encoding of historical discrimination.
Basic Implementation — The organisation has inventoried all geographic features used in housing-related agent models (4.1). Proxy analysis has been conducted for each feature (4.2). Disparate impact testing is performed at least quarterly (4.3). Quantitative thresholds are defined and enforced (4.4). High-proxy-risk features have documented justifications or have been removed (4.5). Decision logging captures geographic context (4.6). Counterfactual testing runs on statistically valid quarterly samples (4.7). Training data has been assessed for historical geographic bias and debiased where necessary (4.8). All mandatory requirements (4.1 through 4.8) are satisfied.
Intermediate Implementation — All basic capabilities plus: real-time inference-time proxy monitoring is operational (4.9). Paired testing is conducted semi-annually (4.10). Geographic bias results are integrated with fair housing compliance reporting (4.11). Proxy analysis results are reviewed by fair housing counsel or compliance officers.
Advanced Implementation — All intermediate capabilities plus: geographic feature ablation studies have identified the minimum feature set with acceptable accuracy and minimal proxy risk (4.12). The geographic bias governance programme has been independently audited. Continuous automated monitoring detects emerging proxy effects from model drift or demographic shifts. Cross-jurisdictional geographic proxy analysis accounts for varying demographic patterns across markets. The organisation contributes to industry knowledge on geographic debiasing methods and participates in fair housing testing programmes.
Required artefacts:
- Geographic feature inventory documenting each feature's source, derivation method, and intended purpose (4.1).
- Proxy analysis reports covering methodology, data sources, correlation metrics, and conclusions (4.2).
- Disparate impact test results and the quantitative threshold configurations applied to each decision type (4.3, 4.4).
- Documented justifications and less-discriminatory-alternative analyses for any retained high-proxy-risk feature (4.5).
- Decision logs with geographic context (4.6), counterfactual and paired-testing reports (4.7, 4.10), and training data bias assessments (4.8).
Retention requirements:
Access requirements:
Test 8.1: Geographic Feature Inventory Completeness
Test 8.2: Proxy Analysis Validity and Currency
Test 8.3: Disparate Impact Threshold Enforcement
Test 8.4: Geographic Feature Blocking or Escalation
Test 8.5: Decision Logging Sufficiency for Retrospective Analysis
Test 8.6: Counterfactual Testing Execution and Findings
Test 8.7: Training Data Historical Bias Assessment
Test 8.8: Paired Testing Detection Capability
| Regulation | Provision | Relationship Type |
|---|---|---|
| Fair Housing Act (US) | 42 U.S.C. §§ 3604-3606 (Discriminatory Housing Practices) | Direct requirement |
| Equal Credit Opportunity Act (US) | 15 U.S.C. §1691 (Prohibition of Credit Discrimination) | Direct requirement |
| EU AI Act | Article 6 & Annex III (High-Risk AI in Housing) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| EU AI Act | Article 10 (Data Governance) | Direct requirement |
| UK Equality Act 2010 | Section 29 (Provision of Services) | Direct requirement |
| CFPB Supervisory Guidance | Fair Lending Supervision (Reg B) | Supports compliance |
| NIST AI RMF | MAP 2.3 (Bias Pre-deployment Testing) | Supports compliance |
| ISO 42001 | Clause 6.1.2 (AI Risk Assessment) | Supports compliance |
The Fair Housing Act prohibits discrimination in the sale, rental, and financing of housing on the basis of race, colour, national origin, religion, sex, familial status, and disability. The Supreme Court's decision in Texas Department of Housing and Community Affairs v. Inclusive Communities Project (2015) confirmed that the Fair Housing Act encompasses disparate impact claims — practices that have a discriminatory effect regardless of discriminatory intent. An AI agent that uses geographic proxies to produce disparate impact in housing decisions violates the Fair Housing Act even if the agent's developer did not intend discrimination. Geographic proxy governance directly implements the FHA's disparate impact prohibition by requiring organisations to identify, assess, and mitigate geographic features that produce discriminatory outcomes in housing decisions.
ECOA prohibits discrimination in credit transactions, including mortgage lending, on the basis of race, colour, religion, national origin, sex, marital status, or age. Regulation B (12 CFR Part 1002) implements ECOA and has been interpreted by the CFPB to require fair lending analysis of algorithmic credit decisions, including assessment of whether model features function as proxies for prohibited bases. Geographic features in mortgage underwriting models are a primary focus of CFPB fair lending examinations because of the well-documented correlation between geography and race. AG-687's proxy analysis and disparate impact testing requirements align directly with CFPB supervisory expectations for fair lending compliance in algorithmic credit decisioning.
Annex III of the EU AI Act classifies AI systems used for "access to and enjoyment of essential private services and essential public services and benefits" — including housing — as high-risk. Article 6 establishes that high-risk AI systems must comply with the requirements for high-risk AI systems in Chapter III, Section 2, including Article 10's data governance requirements. Article 10(3) specifically requires that training, validation, and testing data sets are "relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose." Training data that encodes historical geographic discrimination is not "free of errors" with respect to fair housing purposes. Article 10(2)(f) requires examination of data for "possible biases that are likely to affect the health and safety of persons, have a negative impact on fundamental rights or lead to discrimination prohibited under Union law." Geographic proxy bias directly impacts the fundamental right to housing. AG-687's training data audit requirement (4.8) implements Article 10's data governance obligations for housing-domain AI.
Section 29 prohibits discrimination in the provision of services, including housing services, on the basis of protected characteristics (race, religion, sex, disability, age, gender reassignment, marriage/civil partnership, pregnancy/maternity, sexual orientation). Indirect discrimination — a provision, criterion, or practice that puts persons sharing a protected characteristic at a particular disadvantage — is prohibited unless justified as a proportionate means of achieving a legitimate aim. An AI agent that uses geographic features producing disparate impact against a racial group engages in indirect discrimination under Section 29 unless the geographic feature's use is a proportionate means of achieving a legitimate aim. AG-687's less-discriminatory-alternative analysis (4.5) directly implements the proportionality test required by the Equality Act.
The CFPB has issued supervisory guidance and examination procedures specifically addressing algorithmic discrimination in credit decisions. CFPB examiners evaluate whether creditors have tested their models for disparate impact, assessed whether model features function as proxies for prohibited bases, and considered less discriminatory alternative models. AG-687's requirements for proxy analysis (4.2), disparate impact testing (4.3-4.4), and less-discriminatory-alternative analysis (4.5) are directly aligned with CFPB examination expectations and provide the documentary evidence that CFPB examiners require to assess fair lending compliance.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Cross-domain — affects all housing-related decisions made by the agent, all individuals in affected geographic areas, and the organisation's fair housing compliance posture across all markets |
Consequence chain: Without geospatial bias governance, an AI agent deployed in housing decisions will incorporate geographic features that encode the spatial legacy of racial segregation and discriminatory housing policy. The immediate failure mode is undetected geographic proxy discrimination — the agent systematically disadvantages applicants, tenants, or borrowers associated with geographic areas that are correlated with minority demographics. The first-order consequence is individual harm: rejected rental applications, denied mortgage pre-approvals, geographic steering away from integrated neighbourhoods, and discriminatory rent-setting — all experienced by individuals in protected classes at higher rates than others. The second-order consequence is reinforcement of segregation: if the agent steers minorities toward majority-minority neighbourhoods and away from integrated or majority-white neighbourhoods, it actively perpetuates the residential segregation patterns that fair housing law was enacted to dismantle. The third-order consequence is legal and regulatory exposure: Fair Housing Act violations carry penalties of up to $150,000 per violation for repeat offenders, ECOA violations carry actual and punitive damages with no statutory cap in class actions, and the EU AI Act imposes fines of up to 3% of global annual turnover for high-risk AI non-compliance. Consent decrees typically require retrospective review of all decisions during the violation period (potentially years of decisions affecting thousands of individuals), algorithmic remediation, independent monitoring for 3-5 years, and substantial settlement payments. The reputational consequence in housing is particularly severe because geographic proxy discrimination is functionally indistinguishable from digital redlining — a practice that evokes the most egregious chapter of American housing discrimination history. Media coverage, civil rights organisation campaigns, and congressional attention amplify the reputational damage beyond the direct financial penalties.
Cross-references: AG-001 (Operational Boundary Enforcement) establishes the boundaries within which the agent operates; geographic proxy governance defines a specific boundary — the agent must not produce discriminatory outcomes through geographic features. AG-019 (Human Escalation & Override Triggers) defines when housing decisions must be escalated to human review; geographic proxy detection is a trigger for such escalation when a decision is flagged as potentially proxy-driven. AG-022 (Behavioural Drift Detection) monitors agent behaviour over time; geographic proxy effects may emerge through drift as model weights shift or as the demographic composition of geographic areas changes. AG-037 (Anonymisation & Pseudonymisation Governance) governs the treatment of identifying data; geographic identifiers require pseudonymisation controls because they can re-identify individuals and reveal protected-class membership. AG-040 (Sensitive Category Data Processing Governance) governs the processing of sensitive data categories; geographic data that functions as a proxy for race or ethnicity is effectively sensitive category data and must be governed accordingly. AG-055 (Audit Trail Immutability & Completeness) ensures that the decision logs required by 4.6 are immutable and complete. AG-084 (Model Training Data Governance) provides the broader framework for training data quality; AG-687 applies that framework specifically to geographic bias in housing training data. AG-210 (Multi-Jurisdictional Regulatory Mapping) is essential because fair housing law varies by jurisdiction — the Fair Housing Act, UK Equality Act, EU AI Act, and Australian Racial Discrimination Act impose different standards, and geographic proxy governance must satisfy the most stringent applicable standard. AG-679 (Tenant Screening Fairness) addresses fairness in tenant screening decisions that geographic proxy governance directly supports. AG-685 (Mortgage and Affordability Support) addresses mortgage decision support where geographic proxy bias in underwriting is a primary regulatory concern.