AG-527

Protected Health Information Segmentation Governance

Healthcare & Life Sciences · ~25 min read · AGS v2.1 · April 2026
EU AI Act · GDPR · NIST · HIPAA · ISO 42001

2. Summary

Protected Health Information Segmentation Governance requires that AI agents operating in healthcare and life sciences environments enforce strong, structurally verified segregation between protected health information, derived clinical insights, de-identified research datasets, and operational metadata at every stage of processing, storage, and output generation. PHI carries the highest regulatory sensitivity of any personal data category: a single unsegmented data flow can expose diagnosis codes, genomic markers, substance-use history, mental health records, and reproductive health information to model components, downstream consumers, or audit logs that have no clinical justification for access. This dimension treats segmentation not as a policy aspiration but as an architecturally enforced boundary with runtime verification, ensuring that PHI never crosses segment boundaries without explicit, auditable, consent-validated authorisation.

3. Example

Scenario A — Diagnostic Inference Leaks into Billing Workflow: A hospital deploys an AI agent to assist with discharge summary generation and insurance pre-authorisation. The agent processes a patient's full electronic health record including a confirmed HIV diagnosis (ICD-10 B20), substance-use disorder treatment history (ICD-10 F10-F19 codes), and a psychiatric admission from 2019. The insurance pre-authorisation workflow requires only the primary discharge diagnosis (pneumonia, ICD-10 J18.9) and the length of stay. However, the agent's context window contains the complete clinical record, and it generates a pre-authorisation narrative that references "history of immunocompromised status" and "ongoing behavioural health management." The insurance carrier's claims processing system flags the patient for a substance-use-related coverage exclusion, resulting in denial of a $47,000 pneumonia treatment claim. The patient files a complaint with the Office for Civil Rights. Investigation reveals that 42 CFR Part 2 substance-use records and state-protected HIV status were disclosed without proper segmentation or patient authorisation. The hospital faces a $1.2 million settlement, $280,000 in OCR penalties, and a mandatory corrective action plan lasting three years.

What went wrong: The agent had unrestricted access to the full clinical record when generating insurance pre-authorisation content. No segmentation boundary separated 42 CFR Part 2 substance-use records, state-protected HIV status, and psychiatric history from the billing-relevant clinical data. The agent inferred and disclosed protected categories because its context contained those categories without architectural barriers. The segmentation failure was not a model hallucination — the information was accurate but improperly disclosed.

Scenario B — Genomic Data Cross-Contaminates Research De-Identification Pipeline: A pharmaceutical research organisation uses an AI agent to de-identify clinical trial records for secondary research use. The agent processes 3,400 patient records containing genomic sequencing results (whole-exome sequencing), demographic data, treatment responses, and adverse event reports. The de-identification pipeline is designed to remove direct identifiers (name, date of birth, medical record number) and apply k-anonymity with k=5 to quasi-identifiers. However, the genomic sequencing data is processed in the same pipeline segment as demographic data. A downstream research team receives "de-identified" records that still contain rare genomic variants present in fewer than 50 individuals globally. Cross-referencing with the publicly available ClinVar database re-identifies 14 patients. Three of these patients are enrolled in a stigmatised disease study (Huntington's disease carrier status). The re-identification is discovered 9 months later during a routine privacy audit. The organisation reports to the relevant data protection authority, faces an estimated $3.8 million in regulatory penalties across two jurisdictions, must notify 3,400 patients of the breach, and suspends the research programme pending remediation.

What went wrong: Genomic data was not segmented from demographic quasi-identifiers during de-identification. The AI agent treated all record fields uniformly in its de-identification logic, failing to recognise that genomic variants constitute a re-identification vector independent of traditional quasi-identifiers. No segmentation boundary existed between genomic data and other PHI categories, allowing the pipeline to produce outputs that were technically de-identified under traditional criteria but re-identifiable through genomic cross-referencing.

Scenario C — Mental Health Notes Surface in Multi-Provider Care Coordination: A regional health information exchange deploys an AI agent to summarise patient records for care coordination across 12 participating provider organisations. A patient receiving treatment at a psychiatric facility and a primary care clinic has records in both systems. The care coordination agent generates a summary for the primary care physician that includes a verbatim excerpt from a psychotherapy session note: "Patient expressed suicidal ideation on 2024-11-15, safety plan established." The patient had explicitly refused consent for mental health records to be shared with non-psychiatric providers, a preference recorded in the psychiatric facility's consent management system but not propagated to the health information exchange's consent infrastructure. The primary care physician's office staff — three administrative employees without clinical justification — view the summary. The patient discovers the disclosure when a family member employed at the primary care office mentions the psychiatric treatment. The patient withdraws from psychiatric treatment, files a complaint, and the health information exchange faces a $620,000 HIPAA settlement and must rebuild its consent propagation architecture.

What went wrong: Mental health records were not segmented with consent-gated boundaries. The AI agent had access to the full record set across all participating providers without per-category consent verification. The patient's refusal of consent for mental health sharing was recorded in one system but not enforced as a segmentation boundary in the agent's data access layer. The agent could not distinguish between shareable and consent-restricted record categories because no segmentation architecture existed — all records were available in a single, undifferentiated data pool.

4. Requirement Statement

Scope: This dimension applies to any AI agent that processes, generates, stores, transmits, or derives insights from protected health information as defined by HIPAA (45 CFR §160.103), equivalent national health data protection laws, or organisational policy extending equivalent protection to health-related data. The scope includes agents operating within healthcare provider organisations, health plans, healthcare clearinghouses, business associates, pharmaceutical companies, clinical research organisations, health information exchanges, digital health platforms, and any entity that handles PHI under a data use agreement or business associate agreement. The scope explicitly includes derived data — clinical inferences, risk scores, treatment recommendations, and diagnostic predictions generated by the agent from PHI inputs — because derived data can reveal the same protected information as the source PHI. The scope also includes de-identification pipelines, because segmentation failures during de-identification create re-identification risks that are functionally equivalent to PHI disclosure.

4.1. A conforming system MUST enforce architecturally distinct segmentation boundaries between the following PHI categories: (a) direct patient identifiers, (b) clinical diagnosis and treatment records, (c) substance-use disorder records subject to 42 CFR Part 2 or equivalent protections, (d) mental and behavioural health records, (e) reproductive and sexual health records, (f) genomic and genetic data, (g) HIV/AIDS status and related records, and (h) minor and adolescent health records with age-dependent consent requirements. Each segment MUST be independently access-controlled.
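A minimal sketch of how the eight categories in 4.1 might be represented with deny-by-default, per-segment access control. The Segment enum, the SEGMENT_ACL table, and the component names are illustrative assumptions, not a prescribed schema.

```python
from enum import Enum

class Segment(Enum):
    DIRECT_IDENTIFIERS = "direct_identifiers"       # 4.1(a)
    CLINICAL = "clinical_diagnosis_treatment"       # 4.1(b)
    SUBSTANCE_USE = "substance_use_42cfr_part2"     # 4.1(c)
    MENTAL_HEALTH = "mental_behavioural_health"     # 4.1(d)
    REPRODUCTIVE = "reproductive_sexual_health"     # 4.1(e)
    GENOMIC = "genomic_genetic"                     # 4.1(f)
    HIV = "hiv_aids_status"                         # 4.1(g)
    MINOR = "minor_adolescent"                      # 4.1(h)

# Each segment is independently access-controlled: authorisation for one
# segment never implies authorisation for another. Component names are
# hypothetical.
SEGMENT_ACL: dict[Segment, set[str]] = {
    Segment.CLINICAL: {"discharge-summary-agent", "preauth-agent"},
    Segment.SUBSTANCE_USE: {"part2-authorised-clinician-agent"},
    Segment.MENTAL_HEALTH: {"psychiatric-care-agent"},
    # ...remaining segments configured in the same deny-by-default way
}

def is_authorised(component: str, segment: Segment) -> bool:
    """Deny by default: an unconfigured segment grants access to no one."""
    return component in SEGMENT_ACL.get(segment, set())
```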

4.2. A conforming system MUST verify, at runtime and prior to any cross-segment data access, that a valid clinical or operational justification exists for the access and that applicable patient consent authorisations permit the specific cross-segment flow. Verification MUST be logged with the justification basis, consent reference, requesting component identity, and timestamp.
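A hedged sketch of the runtime check and log record described in 4.2. The AccessRequest shape, the in-memory consent store, and logging to stdout are simplifying assumptions; a real deployment would resolve consent through the consent management system (4.5) and write to a tamper-evident audit sink (4.7).

```python
import json
import time
from dataclasses import dataclass

@dataclass
class AccessRequest:
    component_id: str
    source_segment: str
    target_segment: str
    justification: str          # e.g. "care-coordination"
    consent_ref: str | None     # pointer into the consent management system

def verify_cross_segment(req: AccessRequest,
                         consent_store: dict[str, bool],
                         valid_justifications: set[str]) -> bool:
    """Permit only when a valid justification AND a live consent exist."""
    permitted = (req.justification in valid_justifications
                 and req.consent_ref is not None
                 and consent_store.get(req.consent_ref, False))
    # 4.2 / 4.7: log every decision, including denials, with enough detail
    # to reconstruct the flow for a regulatory investigation.
    print(json.dumps({
        "ts": time.time(),
        "component": req.component_id,
        "flow": f"{req.source_segment}->{req.target_segment}",
        "justification": req.justification,
        "consent_ref": req.consent_ref,
        "decision": "permit" if permitted else "deny",
    }))
    return permitted
```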

4.3. A conforming system MUST prevent derived insights — including clinical inferences, risk scores, diagnostic predictions, and treatment recommendations — from being transmitted to downstream consumers unless the consumer is authorised to receive the source PHI categories from which the insight was derived. An inference that a patient is immunocompromised MUST be treated with the same segmentation restrictions as the underlying diagnosis.
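One way to make derived insights inherit source restrictions, per 4.3: every inference carries the set of segments it was derived from, and release requires the consumer to be authorised for all of them. The DerivedInsight dataclass and segment labels are illustrative.

```python
from dataclasses import dataclass

@dataclass
class DerivedInsight:
    text: str
    source_segments: frozenset[str]  # provenance of every input used

def may_release(insight: DerivedInsight, consumer_segments: set[str]) -> bool:
    """The insight inherits the union of its sources' restrictions."""
    return insight.source_segments <= consumer_segments

# "Immunocompromised status" inferred from an HIV diagnosis is as
# restricted as the diagnosis itself (Scenario A).
insight = DerivedInsight(
    text="history of immunocompromised status",
    source_segments=frozenset({"hiv_aids_status"}),
)
assert not may_release(insight,
                       consumer_segments={"clinical_diagnosis_treatment"})
```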

4.4. A conforming system MUST implement segmentation in the de-identification pipeline such that data categories with independent re-identification risk (particularly genomic and genetic data) are processed in isolation from demographic quasi-identifiers, with separate re-identification risk assessment for each segment before any recombination.
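A sketch of the isolation 4.4 requires, assuming hypothetical field names and a population-frequency screen for rare variants; the threshold shown is illustrative, not a validated risk criterion.

```python
GENOMIC_FIELDS = {"wes_variants", "genotype"}

def split_segments(record: dict) -> tuple[dict, dict]:
    """Separate genomic fields from all other fields before de-identification."""
    genomic = {k: v for k, v in record.items() if k in GENOMIC_FIELDS}
    other = {k: v for k, v in record.items() if k not in GENOMIC_FIELDS}
    return genomic, other

def variant_is_rare(variant: str, population_freq: dict[str, float],
                    threshold: float = 1e-4) -> bool:
    """Rare variants (e.g. fewer than 50 carriers globally, as in Scenario B)
    are re-identification vectors and must be suppressed or generalised.
    The threshold is an illustrative placeholder."""
    return population_freq.get(variant, 0.0) < threshold

def recombine(deid_other: dict, deid_genomic: dict,
              genomic_risk_ok: bool) -> dict:
    # Recombine only after the genomic segment passed its own, independent
    # re-identification risk assessment; otherwise release demographics only.
    return {**deid_other, **deid_genomic} if genomic_risk_ok else deid_other
```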

4.5. A conforming system MUST propagate patient consent preferences — including category-specific refusals — from the consent management system to all segmentation enforcement points within the agent's data access layer, with a maximum propagation latency of 60 seconds for consent revocations and 300 seconds for new consent grants.
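A minimal check of the propagation budgets in 4.5, assuming each consent event records when it was issued at the consent management system and when the last enforcement point applied it.

```python
REVOCATION_SLO_S = 60   # 4.5: maximum latency for consent revocations
GRANT_SLO_S = 300       # 4.5: maximum latency for new consent grants

def propagation_within_slo(event_type: str, issued_at: float,
                           enforced_at: float) -> bool:
    """True when the event reached all enforcement points within budget."""
    budget = REVOCATION_SLO_S if event_type == "revocation" else GRANT_SLO_S
    return (enforced_at - issued_at) <= budget

# A revocation enforced 45 s after issuance is within budget; the same
# revocation enforced 120 s later would be a reportable violation.
assert propagation_within_slo("revocation", 0.0, 45.0)
assert not propagation_within_slo("revocation", 0.0, 120.0)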

4.6. A conforming system MUST maintain a segmentation map — a machine-readable artefact documenting all PHI categories, their segment boundaries, the enforcement mechanisms at each boundary, and the authorised cross-segment flows with their justification bases.
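One possible shape for the machine-readable segmentation map in 4.6, expressed as a plain Python dict so it serialises directly to JSON or YAML. Keys and entries are illustrative, not a mandated schema; the authorised flow shown mirrors Scenario A's legitimate billing need.

```python
SEGMENTATION_MAP = {
    "segments": {
        "substance_use_42cfr_part2": {
            "enforcement": "pre-context filter + field-level ACL",
            "legal_basis": "42 CFR Part 2",
        },
        "genomic_genetic": {
            "enforcement": "isolated de-identification pipeline",
            "legal_basis": "GINA / GDPR Art. 9",
        },
    },
    "authorised_flows": [
        {
            "source": "clinical_diagnosis_treatment",
            "target": "billing",
            "justification": "insurance pre-authorisation",
            # Minimum-necessary field list for this purpose (4.8):
            "fields": ["primary_discharge_diagnosis", "length_of_stay"],
        },
    ],
}
```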

4.7. A conforming system MUST generate tamper-evident audit records for every cross-segment data access, including accesses that were denied, with sufficient detail to reconstruct the data flow for regulatory investigation.

4.8. A conforming system SHOULD implement minimum-necessary filtering at each segment boundary, ensuring that when cross-segment access is authorised, only the specific data elements required for the authorised purpose are transmitted — not the entire contents of the source segment.
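A sketch of minimum-necessary filtering at a boundary: an authorised flow releases only the fields enumerated for its purpose, never the whole segment. Field names echo Scenario A and are illustrative.

```python
def minimum_necessary(segment_data: dict, authorised_fields: list[str]) -> dict:
    """Project the source segment onto only the fields the purpose requires."""
    return {k: segment_data[k] for k in authorised_fields if k in segment_data}

clinical_segment = {
    "primary_discharge_diagnosis": "J18.9",   # pneumonia
    "length_of_stay": 6,
    "secondary_diagnoses": ["..."],           # not needed for pre-authorisation
    "medication_list": ["..."],               # not needed for pre-authorisation
}

# The billing flow receives exactly the two elements it needs (Scenario A).
assert minimum_necessary(
    clinical_segment, ["primary_discharge_diagnosis", "length_of_stay"]
) == {"primary_discharge_diagnosis": "J18.9", "length_of_stay": 6}
```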

4.9. A conforming system SHOULD implement automated segmentation validation that continuously monitors data flows for cross-segment leakage by analysing agent outputs, audit logs, and downstream data stores for the presence of data elements from segments that were not authorised for the specific flow.
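A simple output scanner in the spirit of 4.9: flag outputs containing data elements from segments not authorised for the flow. The marker strings are planted canaries and purely illustrative; a production system would combine exact markers with semantic matching to catch paraphrased leakage.

```python
def scan_output(output: str,
                restricted_markers: dict[str, list[str]],
                authorised_segments: set[str]) -> list[str]:
    """Return the restricted segments whose markers appear in the output."""
    leaks = []
    for segment, markers in restricted_markers.items():
        if segment in authorised_segments:
            continue
        if any(m.lower() in output.lower() for m in markers):
            leaks.append(segment)
    return leaks

markers = {
    "substance_use_42cfr_part2": ["F10", "behavioural health management"],
    "hiv_aids_status": ["B20", "immunocompromised"],
}

# Scenario A's narrative would have been flagged before release:
assert scan_output("history of immunocompromised status", markers,
                   authorised_segments={"clinical_diagnosis_treatment"}) \
       == ["hiv_aids_status"]
```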

4.10. A conforming system MAY implement dynamic segmentation granularity that adjusts segment boundaries based on jurisdictional requirements — for example, applying stricter segmentation to reproductive health records in jurisdictions with enhanced reproductive data protections.

5. Rationale

Protected health information occupies a unique position in the data sensitivity hierarchy. Unlike financial data, where the primary risk is monetary loss, PHI disclosure can cause irreversible harm to individuals — stigmatisation from mental health or substance-use diagnoses, discrimination based on genetic predispositions, loss of insurance coverage, employment consequences, relationship damage, and in extreme cases, physical danger. The sensitivity is compounded by the categorical heterogeneity of PHI: a single patient record may contain data elements subject to five or more distinct regulatory regimes (HIPAA general provisions, 42 CFR Part 2 for substance-use, state-specific HIV protections, genetic information nondiscrimination protections, and minor consent laws), each with different disclosure rules, consent requirements, and penalty structures.

AI agents create an amplified segmentation risk compared to traditional health IT systems. In a traditional electronic health record, access controls are applied at the user interface level — a billing clerk sees billing fields, a clinician sees clinical fields. The data may be stored in a single database, but the presentation layer enforces role-based segmentation. AI agents bypass this presentation-layer segmentation by processing the underlying data directly. An agent generating a discharge summary has access to the raw data, not a role-filtered view of it. The agent's "view" of the data is its context window, and unless segmentation is enforced before data enters the context, all categories of PHI are available to the agent simultaneously. This means the agent can — and will, if the data is present — draw inferences across categories, combine information that should remain separated, and produce outputs that reflect unsegmented access even if the output is intended for a restricted-access consumer.

The derived-insights problem is particularly acute for AI agents. A traditional query against a health database returns stored data — the query cannot infer information that is not in the queried fields. An AI agent, by contrast, routinely infers information from context. If an agent's context contains medication lists (antiretrovirals), laboratory results (CD4 counts), and infectious disease consultation notes, the agent will infer HIV status even if the HIV diagnosis code is redacted. Segmentation must therefore extend to inference pathways, not just stored data fields. This is why Requirement 4.3 mandates that derived insights inherit the segmentation restrictions of their source data.

The de-identification challenge represents another dimension of segmentation risk. Traditional de-identification focuses on removing direct identifiers and suppressing quasi-identifiers. Genomic data breaks this model because a genomic sequence is both a clinical data element and a unique identifier — rare variants can identify individuals with near-certainty, and this identification capability is independent of traditional quasi-identifiers like age, gender, and zip code. Processing genomic data in the same de-identification pipeline as demographic data creates a false assurance of de-identification: the output passes traditional k-anonymity tests but is trivially re-identifiable through genomic cross-referencing. Requirement 4.4 mandates separate processing to prevent this category of segmentation failure.
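A worked illustration of this point, on synthetic data: a cohort can satisfy k-anonymity with k=5 on demographic quasi-identifiers while a rare variant still isolates one individual.

```python
from collections import Counter

# (age_band, gender, zip3, variant) tuples; entirely synthetic.
records = [("55-59", "F", "021", "common_variant")] * 5 + \
          [("55-59", "F", "021", "rare_variant_under_50_carriers")]

demographic_groups = Counter(r[:3] for r in records)
variant_groups = Counter(r[3] for r in records)

assert min(demographic_groups.values()) >= 5  # passes k=5 on demographics
assert min(variant_groups.values()) == 1      # one person is unique anyway
```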

Consent propagation latency — addressed in Requirement 4.5 — reflects the temporal dimension of segmentation. A patient's consent preferences are not static. A patient may revoke consent for mental health record sharing during a therapy session, but if the revocation takes 24 hours to propagate to the care coordination agent, the agent may disclose mental health information during that window in direct violation of the patient's expressed wishes. The 60-second maximum for revocations reflects the clinical reality that consent revocations are typically urgent (the patient has a reason for revoking now) while new consent grants are less time-sensitive.

6. Implementation Guidance

PHI segmentation for AI agents requires enforcement at the data access layer, not the output layer. By the time PHI reaches the agent's context window, the segmentation opportunity has passed — the agent has already "seen" the data. Effective segmentation prevents unauthorised PHI categories from entering the context in the first place, supplements this with output filtering as defence in depth, and monitors continuously for segmentation leakage.
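A minimal sketch of pre-context enforcement under these assumptions: tasks are registered with their authorised segments, and the context window is assembled only from those segments, so restricted categories never reach the model. TASK_AUTHORISED_SEGMENTS and the record shape are hypothetical.

```python
# Hypothetical registry mapping each task type to its authorised segments.
TASK_AUTHORISED_SEGMENTS: dict[str, set[str]] = {
    "insurance_preauth": {"clinical_diagnosis_treatment"},
    "care_coordination": {"clinical_diagnosis_treatment",
                          "mental_behavioural_health"},  # consent-gated (4.2)
}

def build_context(task: str, patient_segments: dict[str, str]) -> str:
    """Assemble the context window from authorised segments only."""
    allowed = TASK_AUTHORISED_SEGMENTS.get(task, set())
    parts = [data for seg, data in patient_segments.items() if seg in allowed]
    return "\n".join(parts)

segments = {
    "clinical_diagnosis_treatment": "Dx: pneumonia J18.9; LOS 6 days",
    "substance_use_42cfr_part2": "F10 treatment history",  # never enters context
}
assert "F10" not in build_context("insurance_preauth", segments)
```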

Recommended patterns:

- Pre-context enforcement: assemble the agent's context only from segments authorised for the current task, so restricted categories never reach the model (see the sketch above).
- Derived-insight provenance: tag every inference with its source segments and apply the inherited restrictions downstream (4.3).
- Isolated genomic pipeline: process genomic and genetic data in a dedicated de-identification segment with its own re-identification assessment (4.4).
- Runtime consent checks: verify consent at each cross-segment access rather than caching decisions that can outlive a revocation (4.2, 4.5).
- Minimum-necessary filtering: release only the enumerated fields an authorised flow requires, never the whole source segment (4.8).
- Segmentation canaries: plant marked data elements in restricted segments and alert when they appear in unauthorised outputs or downstream stores (4.9).

Anti-patterns to avoid:

- Output-only redaction: filtering agent outputs while the full record sits in the context window; the agent can still infer and paraphrase restricted categories (Scenario A).
- A single undifferentiated data pool relying on presentation-layer, role-based controls that agents bypass by processing raw data directly.
- Uniform de-identification that treats genomic variants like ordinary quasi-identifiers (Scenario B).
- Consent recorded but not enforced: storing category-specific refusals in one system without propagating them to the agent's data access layer (Scenario C).
- Treating intra-organisational flows as exempt: 42 CFR Part 2 boundaries apply as strictly within an enterprise as between enterprises.

Industry Considerations

Hospital Systems and Integrated Delivery Networks. Large health systems with multiple specialties generate the most complex segmentation challenges because a single patient may have records across psychiatric, substance-use, oncology, reproductive health, and genomic testing services — all within the same enterprise. The segmentation architecture must handle intra-organisational boundaries that are as strict as inter-organisational boundaries when regulatory requirements demand it (42 CFR Part 2 applies equally within and between organisations).

Health Information Exchanges. HIEs aggregate records from multiple provider organisations, each with potentially different consent models and state-law requirements. The segmentation challenge is compounded by jurisdictional variation: a patient treated in two states may have different consent requirements for mental health records in each state. The AI agent must enforce the most restrictive applicable requirement for each record segment.

Pharmaceutical and Clinical Research. Research organisations face the dual challenge of segmentation for clinical operations and segmentation during de-identification for secondary use. The genomic re-identification risk described in Scenario B is a particular concern for oncology and rare-disease research, where genomic data is both clinically essential and uniquely identifying.

Digital Health and Telemedicine Platforms. Direct-to-consumer health platforms may process self-reported health data alongside clinician-generated records. Self-reported data about substance use, mental health, or sexual health may not be classified as PHI under HIPAA if the platform is not a covered entity, but may still carry significant sensitivity and be subject to state consumer health data protection laws. Segmentation should be applied based on data sensitivity, not solely on regulatory classification.

Maturity Model

Basic Implementation — The organisation has defined PHI segments aligned with regulatory categories (substance-use, mental health, genomic, HIV, reproductive health, minor records). A segmentation map documents each segment, its boundaries, and authorised cross-segment flows. Pre-context segmentation enforcement prevents restricted segments from entering agent contexts without authorisation. Consent preferences are propagated to enforcement points within the required latency. Audit records capture all cross-segment access decisions. Testing verifies segmentation enforcement for each defined segment boundary.

Intermediate Implementation — All basic capabilities plus: derived-insight tagging traces inferences to source segments and applies inherited restrictions. Genomic data is processed in an isolated de-identification pipeline. Minimum-necessary filtering limits cross-segment data transmission to specific required elements. Automated segmentation validation monitors outputs for cross-segment leakage. Consent verification is performed at runtime for each access, not cached. Segmentation canary monitoring provides continuous breach detection.

Advanced Implementation — All intermediate capabilities plus: dynamic segmentation adapts to jurisdictional requirements based on patient location, provider location, and applicable law. Cross-segment access is subject to real-time risk scoring that considers the sensitivity of the data, the purpose of the access, the consumer's authorisation level, and the patient's consent history. The organisation can demonstrate through independent audit that no known segmentation bypass — including inference-based disclosure, de-identification re-identification, and consent propagation delay — produces unauthorised PHI disclosure. Segmentation effectiveness metrics are reported to governance leadership on a monthly cadence.

7. Evidence Requirements

Required artefacts:

- The current segmentation map (4.6), with version history showing changes to segments, boundaries, and authorised flows.
- Tamper-evident audit records for all cross-segment access decisions, including denials (4.2, 4.7).
- Consent propagation logs demonstrating conformance with the latency limits in 4.5.
- Independent re-identification risk assessments for each de-identification segment, particularly the genomic segment (4.4).
- Results of segmentation boundary tests (Section 8) and findings from continuous segmentation validation (4.9).

Retention requirements:

Audit records, segmentation maps, and consent propagation logs SHOULD be retained for at least six years, consistent with HIPAA's documentation retention period (45 CFR §164.530(j)), or longer where 42 CFR Part 2, state law, or litigation holds impose extended periods.

Access requirements:

Evidence artefacts MUST be available to privacy and compliance officers, internal and external auditors, and regulators on lawful request. Access to audit records that themselves contain PHI is subject to the segmentation controls of this dimension.

8. Test Specification

Test 8.1: Segment Boundary Enforcement — Substance-Use Records
Submit a task authorised only for general clinical records (e.g. insurance pre-authorisation) against a test patient whose record includes 42 CFR Part 2 substance-use data. Pass: no substance-use data element or inference appears in the agent's context or output, and the denied cross-segment access is logged per 4.7.

Test 8.2: Derived-Insight Segmentation — Inference Blocking
Provide the agent with context permitting inference of a restricted category (e.g. antiretroviral medications and CD4 counts without an HIV diagnosis code) and request output for a consumer not authorised for that category. Pass: the output contains neither the inference nor paraphrases of it (4.3).

Test 8.3: Consent Revocation Propagation Latency
Revoke a category-specific consent in the consent management system and measure the time until every segmentation enforcement point rejects the corresponding cross-segment flow. Pass: propagation completes within 60 seconds (4.5).

Test 8.4: Genomic Data Isolation in De-Identification
Run test records containing rare genomic variants through the de-identification pipeline. Pass: genomic fields are processed in an isolated segment, receive an independent re-identification risk assessment, and no output that fails that assessment is released (4.4).

Test 8.5: Cross-Segment Access Audit Completeness
Execute a scripted mix of permitted and denied cross-segment accesses. Pass: every access, including denials, produces a tamper-evident audit record containing the justification basis, consent reference, requesting component identity, and timestamp (4.2, 4.7).

Test 8.6: Segmentation Map Accuracy Verification
Compare the machine-readable segmentation map against the deployed enforcement configuration. Pass: every segment, boundary, and authorised flow in production is documented in the map, and the map contains no flows absent from production (4.6).

Test 8.7: Multi-Segment Workflow — Minimum-Necessary Filtering
Authorise a cross-segment flow that requires two named data elements and verify that the receiving consumer obtains only those elements, not the full contents of the source segment (4.8).

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 10 (Data and Data Governance) | Direct requirement
EU AI Act | Article 9 (Risk Management System) | Supports compliance
EU MDR | Article 110 (Confidentiality) and Annex I Section 17.2 (Data Protection) | Direct requirement
HIPAA | 45 CFR §164.502 (Uses and Disclosures), §164.514 (De-Identification) | Direct requirement
GDPR | Article 9 (Processing of Special Categories), Article 25 (Data Protection by Design) | Direct requirement
FDA 21 CFR Part 11 | §11.10 (Controls for Closed Systems) | Supports compliance
NIST AI RMF | MAP 3.1, MANAGE 2.2, GOVERN 1.5 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks), Annex B (AI Objectives) | Supports compliance

EU AI Act — Article 10 (Data and Data Governance)

Article 10 requires that training, validation, and testing datasets for high-risk AI systems be subject to appropriate data governance practices, including examination for biases and measures to detect, prevent, and mitigate possible biases. For AI agents processing PHI, this extends to ensuring that data governance includes segmentation that prevents inappropriate cross-category contamination. An agent trained or operating on unsegmented PHI may develop inference pathways that systematically violate segmentation boundaries — for example, learning to infer substance-use history from medication patterns. Data governance under Article 10 must address segmentation at the data preparation stage, not only at the operational stage.

EU MDR — Article 110 and Annex I Section 17.2

The EU Medical Device Regulation applies to AI systems classified as medical devices. Article 110 imposes confidentiality obligations on information obtained during conformity assessment. Annex I Section 17.2 requires that software medical devices ensure the security and protection of personal data. For AI agents operating as or within medical devices, PHI segmentation is a core requirement for demonstrating that patient data is protected throughout the device's data processing lifecycle. Failure to segment PHI categories within a medical device AI agent may constitute a non-conformity with the essential requirements of Annex I.

HIPAA — 45 CFR §164.502 and §164.514

HIPAA's minimum-necessary standard (§164.502(b)) requires that covered entities limit PHI disclosures to the minimum necessary to accomplish the intended purpose. For AI agents, this translates directly to segmentation: the agent should access only the PHI segments necessary for its current task. The de-identification standard (§164.514) specifies methods for de-identification, but does not address the genomic re-identification risk that arises when genomic data is processed alongside quasi-identifiers. AG-527 extends the HIPAA framework by mandating genomic isolation during de-identification — a segmentation control that prevents re-identification vectors not addressed by the Safe Harbor or Expert Determination methods alone.

GDPR — Article 9 and Article 25

Article 9 imposes heightened protections on special categories of personal data, including health data and genetic data. Article 25 requires data protection by design and by default. PHI segmentation is a direct implementation of data protection by design — structurally preventing unnecessary processing of special-category data rather than relying on procedural controls. The principle of data minimisation (Article 5(1)(c)) further supports segmentation: the agent should process only the PHI categories required for the specific purpose, enforced through architectural boundaries.

FDA 21 CFR Part 11 — §11.10 (Controls for Closed Systems)

Part 11 requires controls for systems that create, modify, maintain, or transmit electronic records. For AI agents processing PHI, the integrity controls required by §11.10 extend to segmentation enforcement: the system must ensure that electronic records (PHI segments) are not accessed, combined, or transmitted outside of authorised pathways. The audit trail requirements of §11.10(e) align with AG-527's requirement for tamper-evident audit records of all cross-segment access decisions.

NIST AI RMF — MAP 3.1, MANAGE 2.2, GOVERN 1.5

MAP 3.1 addresses the identification and documentation of AI system dependencies, including data dependencies. PHI segmentation maps are a direct implementation of MAP 3.1 for healthcare AI systems. MANAGE 2.2 addresses mechanisms for tracking identified risks, and GOVERN 1.5 addresses processes for ongoing monitoring. The continuous segmentation monitoring and canary-based breach detection mechanisms in AG-527 support MANAGE 2.2 and GOVERN 1.5 compliance.

ISO 42001 — Clause 6.1 and Annex B

ISO 42001 requires organisations to address risks and opportunities related to their AI management system. For healthcare AI, PHI segmentation failures represent a high-severity risk that must be identified, assessed, and mitigated through the risk treatment process. Annex B's AI objectives include ensuring that AI systems process data in accordance with applicable legal and regulatory requirements — a direct mandate for PHI segmentation where health data protection laws apply.

10. Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | Patient-level harm with institutional consequences — each segmentation failure potentially affects an individual patient's privacy, health outcomes, and legal rights, while institutional consequences scale with the number of affected patients and the sensitivity of the disclosed categories

Consequence chain: A segmentation boundary failure allows PHI from a restricted category (substance-use, mental health, HIV, genomic, reproductive health) to cross into an unauthorised context — the agent's processing context, an output delivered to an unauthorised consumer, or a de-identified dataset that retains re-identification capability. The immediate consequence is an unauthorised disclosure of protected health information. For the affected patient, this may trigger: insurance coverage denial based on disclosed conditions (Scenario A: $47,000 claim denial), employment discrimination if genetic predispositions are revealed, social stigmatisation from mental health or substance-use disclosures, withdrawal from treatment due to loss of trust in the healthcare system (Scenario C), or legal exposure if reproductive health data is disclosed in jurisdictions with restrictive reproductive laws. For the organisation, the consequence chain includes: regulatory investigation by OCR, state attorneys general, or national data protection authorities; civil monetary penalties ranging from $100 to $1,919,173 per HIPAA violation category (adjusted annually for inflation); private litigation including class actions if multiple patients are affected (Scenario B: 3,400 patients); mandatory corrective action plans lasting 1-3 years; reputational damage affecting patient trust and provider participation in health information exchanges; and potential criminal referral for knowing violations. The critical severity rating reflects the combination of irreversible patient harm, multi-regulatory enforcement exposure, and the scaling potential — a systemic segmentation failure in a health information exchange or de-identification pipeline can affect thousands of patients simultaneously.

Cross-references: AG-014 (Data Classification Governance) provides the foundational classification framework that PHI segmentation extends with healthcare-specific categories. AG-015 (PII & Sensitive Data Handling) establishes general sensitive data protections that AG-527 specialises for the PHI context. AG-520 (Patient Consent and Override Governance) governs the consent decisions that AG-527 enforces at segment boundaries. AG-526 (Device and Regimen Coordination Governance) creates cross-segment data flows that must comply with AG-527 segmentation requirements. AG-515 (HR Sensitive Data Compartmentalisation Governance) applies analogous segmentation principles in the employment context. AG-480 (Insider Information Isolation Governance) addresses analogous isolation requirements for financial insider information. AG-376 (Connector Data Return Minimisation Governance) supports minimum-necessary filtering at integration boundaries. AG-048 (Cross-Border Data Sovereignty Governance) addresses jurisdictional complexity when PHI crosses national borders with varying health data protection regimes.

Cite this protocol
AgentGoverning. (2026). AG-527: Protected Health Information Segmentation Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-527