AG-709

Sequence Data Sensitivity Governance

Biotechnology, Genomics & Biosecurity · AGS v2.1 · April 2026
EU AI Act · GDPR · NIST · HIPAA · ISO 42001

2. Summary

Sequence Data Sensitivity Governance requires that AI agents handling genomic, proteomic, or nucleotide sequence data classify, protect, and control access to that data commensurate with its sensitivity — which ranges from personally identifiable human genome data to potentially dual-use pathogen sequences capable of enabling biological weapons development. Genomic and sequence data occupies a unique position in the data sensitivity landscape: it is simultaneously personal data (re-identifiable even after de-identification), health data (predictive of disease susceptibility), and potential dual-use material (capable of enabling synthesis of dangerous biological agents). This dimension mandates that agents operating in biotechnology, genomics, and biosecurity domains implement layered sensitivity classification, enforce access controls calibrated to the classification level, prevent unauthorised aggregation or exfiltration of high-sensitivity sequences, and maintain complete provenance records for all sequence data interactions.

3. Example

Scenario A — Unclassified Pathogen Sequences Exfiltrated Through Research Agent: A university research consortium deploys an AI agent to assist bioinformatics researchers with sequence alignment, variant analysis, and literature synthesis. The agent has access to a shared sequence repository containing 4.2 million nucleotide sequences, including 340 sequences derived from select agent pathogens listed under national biosecurity regulations (e.g., variola major fragments, reconstructed 1918 influenza polymerase segments, enhanced transmissibility avian influenza constructs). The repository uses a flat access model — all authenticated researchers can query all sequences. The agent has no sensitivity classification layer and treats all sequence data identically. A postdoctoral researcher uses the agent to perform a comparative analysis of polymerase gene sequences across influenza strains. The agent retrieves and presents 47 sequences, including 3 enhanced-transmissibility constructs that are classified as dual-use research of concern (DURC). The researcher exports the agent's output — including the DURC sequences with full nucleotide-level detail — to a personal cloud storage account to continue analysis from home. The export is not logged, no sensitivity alert is triggered, and no institutional biosafety review is conducted. Six months later, a government biosecurity audit of the university's select agent programme discovers that DURC-classified sequences have been stored on an uncontrolled personal cloud account for 6 months, in violation of national select agent regulations. The sequences are also discovered in the researcher's personal backup, which is synchronised to a server in a jurisdiction with no biosecurity export controls.

What went wrong: The AI agent had no sensitivity classification for sequence data. All 4.2 million sequences were treated as equivalent, with no distinction between benign reference genomes and DURC-classified select agent sequences. No access control differentiated between routine sequence queries and queries that returned dual-use material. No export control prevented the researcher from moving DURC sequences to uncontrolled storage. No cross-border data transfer governance prevented synchronisation to a foreign jurisdiction. Consequence: The incident violates select agent regulations, which carry penalties of up to $250,000 per violation and potential criminal liability. The university's Institutional Biosafety Committee (IBC) registration is suspended pending investigation. The research consortium loses federal funding eligibility for 18 months, affecting $12.4 million in active grants. The university incurs $1.8 million in forensic investigation, remediation, and legal costs.

Scenario B — Re-Identification of Anonymised Genomic Data Through Agent-Mediated Linkage: A national health service deploys an AI agent to support population genomics research. The agent provides researchers access to a dataset of 180,000 whole-genome sequences that have been de-identified by removing direct identifiers (name, date of birth, national ID number) and replacing them with pseudonymous research identifiers. The agent also has access to a separate phenotype database containing clinical measurements (height, weight, blood pressure, glucose levels) linked to the same pseudonymous identifiers. A research team asks the agent to correlate rare genetic variants with clinical phenotype clusters. The agent identifies 23 individuals with an extremely rare combination of genetic variants (allele frequency < 0.001%) and specific phenotype measurements. A member of the research team, who also works as a clinician in a regional hospital, recognises that the phenotype cluster matches a specific patient cohort she treats. By cross-referencing the rare variant combination with the hospital's clinical records, she re-identifies 7 of the 23 individuals, gaining access to their full genomic data without consent for clinical use. She uses this information to adjust treatment plans without informing the patients that their research genomic data influenced clinical decisions. One patient later discovers the linkage when they request their clinical records and find a note referencing "genomic variant profile consistent with research dataset."

What went wrong: The AI agent facilitated a linkage attack by combining rare genetic variants with phenotype data without assessing re-identification risk. The pseudonymisation was inadequate for the combination of data dimensions the agent could access simultaneously. No re-identification risk threshold prevented the agent from returning results where the combination of rare variants and phenotype clusters reduced the anonymity set to identifiable individuals. No consent boundary prevented clinical use of research-consented genomic data. Consequence: The incident violates GDPR Article 9 (processing of genetic data without explicit consent) and Article 5(1)(b) (purpose limitation). The national data protection authority imposes a fine of EUR 4.2 million. The national health service suspends the population genomics programme for 14 months, delaying three clinical trials. Seven patients file individual claims, and the clinician faces professional misconduct proceedings.

Scenario C — Synthesis-Relevant Sequences Assembled Incrementally Through Agent Queries: A commercial DNA synthesis company deploys an AI agent to assist customers with sequence design and optimisation. The agent provides codon optimisation, expression prediction, and construct assembly guidance. A customer submits a series of 14 sequential queries over 3 weeks, each requesting optimisation of a short nucleotide fragment (80-150 base pairs). Individually, each fragment appears innocuous — they resemble common molecular biology constructs. The agent processes each query independently, applying synthesis screening only to the individual fragment. No query triggers a biosecurity alert. However, when assembled in order, the 14 fragments reconstruct approximately 85% of a functional toxin gene from a Category A select agent. The remaining 15% is publicly available in GenBank. The assembly is only discovered when a routine quarterly audit of customer query patterns is conducted 7 weeks after the final query, by which time the customer has already placed synthesis orders with three different providers for overlapping fragments that, combined, complete the full gene.

What went wrong: The AI agent performed sensitivity screening at the individual query level without maintaining session-level or customer-level sequence aggregation analysis. Each fragment was below the screening threshold in isolation. No cumulative assembly analysis detected that the sequential queries reconstructed a dangerous sequence. No cross-query linkage tracked the customer's progressive construction of a select agent gene. The agent's synthesis screening was stateless, so it could not detect incremental construction strategies designed to evade fragment-level screening. Consequence: The incident is a potential violation of national legislation implementing the Biological Weapons Convention, which carries criminal penalties. The synthesis company faces regulatory investigation by the national biosecurity authority. Customer orders are intercepted, but three partial constructs have already been shipped. The company incurs $2.1 million in investigation, customer audit, and screening system overhaul costs. The company's synthesis licence is suspended for 90 days, causing $4.6 million in lost revenue.

4. Requirement Statement

Scope: This dimension applies to every AI agent that processes, stores, transmits, retrieves, analyses, generates, or facilitates the synthesis of nucleotide sequences (DNA, RNA), amino acid sequences (proteins, peptides), or associated genomic metadata (variant annotations, gene expression profiles, epigenetic markers, phenotype-genotype linkage data). The scope includes agents operating in research bioinformatics, clinical genomics, DNA synthesis platforms, agricultural biotechnology, forensic genetics, population health genomics, and any other domain where sequence data is handled. The scope covers both human-derived genomic data (subject to personal data and health data regulations) and non-human sequence data (subject to biosecurity, dual-use, and select agent regulations). The scope extends to agents that do not directly store sequences but that query, retrieve, transform, or reason about sequence data from external databases or repositories. Agents that handle sequence data in transit — even if the agent does not interpret the biological content — are within scope for the access control, encryption, and provenance requirements.

4.1. A conforming system MUST implement a multi-tier sensitivity classification scheme for all sequence data, distinguishing at minimum: (a) publicly available reference sequences with no access restrictions; (b) personally identifiable or re-identifiable genomic data subject to data protection regulations; (c) clinically significant genomic data subject to health data regulations and consent requirements; (d) sequences derived from or functionally equivalent to select agents, toxins, or dual-use research of concern material subject to biosecurity regulations; and (e) sequences subject to intellectual property protections or material transfer agreements.

4.2. A conforming system MUST enforce access controls that are calibrated to the sensitivity classification of the sequence data being accessed, ensuring that agents and human users can only access sequence data for which they hold appropriate authorisation, and that authorisation is validated at the point of each data access — not only at session initiation.

4.3. A conforming system MUST implement re-identification risk assessment for any operation that combines genomic data with phenotype data, demographic data, or other auxiliary information, and block or flag operations where the resulting anonymity set falls below a defined minimum threshold (default: k=10 for population-level analyses, k=50 for data intended for broad research access).

4.4. A conforming system MUST maintain cumulative sequence aggregation analysis that tracks sequence fragments retrieved, generated, or optimised across queries within a session, across sessions for the same user, and across users within the same organisational entity, detecting when incrementally assembled fragments reconstruct sequences that would trigger sensitivity classification at a higher tier than any individual fragment.

4.5. A conforming system MUST enforce export controls on sequence data classified at tier (d) or above, preventing transfer to storage, systems, or jurisdictions that do not meet the required security and regulatory standards, with automated blocking and mandatory human review before any cross-boundary transfer of biosecurity-sensitive sequences.

4.6. A conforming system MUST log every access, query, retrieval, transformation, export, and deletion of sequence data at sensitivity tier (b) or above, with sufficient detail to reconstruct the full data lineage — who accessed what sequence, when, from which source, for what stated purpose, and what downstream operations were performed on the data.

4.7. A conforming system MUST encrypt all sequence data classified at sensitivity tier (b) or above, both at rest and in transit, using cryptographic controls that meet or exceed the requirements of AG-042, with key management procedures that prevent a single compromised credential from exposing the entire sequence repository.

4.8. A conforming system MUST implement automated screening of all sequence outputs — whether retrieved, generated, or optimised by the agent — against curated databases of regulated sequences (select agent lists, controlled pathogen registries, dual-use research of concern catalogues), with mandatory quarantine and human biosafety review for any match above a defined similarity threshold.

4.9. A conforming system MUST validate that consent authorisations cover the specific processing operation being performed before allowing agent access to human-derived genomic data, ensuring that research-consented data is not used for clinical, commercial, or law enforcement purposes without additional consent, and that consent withdrawal propagates to all downstream copies and derivatives within a defined time window.

4.10. A conforming system SHOULD implement differential privacy or similar privacy-preserving mechanisms for aggregate genomic analyses that return population-level statistics, preventing reconstruction of individual genotypes from aggregate query results.

4.11. A conforming system SHOULD maintain a sequence provenance graph that records the origin, transformations, combinations, and derivative uses of each sequence or sequence fragment processed by the agent, enabling forensic reconstruction of how any given sequence was obtained or assembled.

4.12. A conforming system SHOULD implement anomaly detection on sequence access patterns — including unusual query volumes, queries targeting sequences with high biosecurity sensitivity, queries from atypical geographic locations, and sequential queries that suggest incremental assembly strategies — with automated alerting to the institutional biosafety authority.

4.13. A conforming system MAY implement homomorphic encryption or secure multi-party computation for genomic analyses that require computation on sensitive sequence data without exposing the underlying sequences to the computing agent.

4.14. A conforming system MAY implement purpose-bound sequence tokens that allow agents to reference and operate on sequence data for specific authorised purposes without retaining the raw nucleotide or amino acid content in the agent's working memory or logs.
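As an illustration of how Requirements 4.1 and 4.2 might fit together, the following is a minimal sketch, not a reference implementation. It assumes that a record can carry several of the five categories at once (the requirement text does not say the categories are ordered or mutually exclusive), and every name in it (`Tier`, `SequenceRecord`, `Principal`, `may_access`) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """The five minimum categories listed in Requirement 4.1."""
    PUBLIC_REFERENCE = "a"
    REIDENTIFIABLE = "b"
    CLINICAL = "c"
    BIOSECURITY = "d"
    IP_RESTRICTED = "e"


@dataclass(frozen=True)
class SequenceRecord:
    accession: str        # hypothetical identifier for the sequence
    tiers: frozenset      # assumption: one record may carry several categories


@dataclass(frozen=True)
class Principal:
    user_id: str
    authorised_tiers: frozenset


def may_access(principal: Principal, record: SequenceRecord) -> bool:
    """Decide a single access, intended to be called on every retrieval.

    Access is allowed only when the principal holds every restricted
    category attached to the record; public reference data needs none.
    """
    restricted = record.tiers - {Tier.PUBLIC_REFERENCE}
    return restricted <= principal.authorised_tiers
```

Calling `may_access` at each retrieval, rather than caching one decision per session, is what distinguishes this from the flat access model in Scenario A: a researcher authorised only for tier (b) would be refused a tier (d) record even in the middle of an authenticated session.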

5. Rationale

Genomic and sequence data presents a unique convergence of sensitivity dimensions that no other data category fully replicates. A single whole-genome sequence is simultaneously: a permanent personal identifier (more unique and more stable than a fingerprint, and impossible to change if compromised), a medical record (predictive of disease risk, drug response, and life expectancy), a familial identifier (revealing information about biological relatives who have not consented to disclosure), and potentially a blueprint for biological agents (if the sequence encodes pathogenic functions, toxin production, or enhanced-transmissibility modifications). This convergence means that a governance failure affecting sequence data can simultaneously trigger data protection violations, health data breaches, familial privacy harms, and biosecurity incidents — a blast radius that justifies the High-Risk/Critical tier classification.

The re-identification risk of genomic data is qualitatively different from other data types. Research has repeatedly demonstrated that as few as 75-100 independent SNPs are sufficient to uniquely identify an individual in a global population. De-identification techniques that are adequate for clinical records — removing names, dates, and identifiers — are insufficient for genomic data because the genome itself is the identifier. An AI agent that combines de-identified genomic data with phenotype data, demographic data, or publicly available genealogy databases can re-identify individuals with high confidence, even when each dataset individually appears anonymous. This is not a theoretical risk: published studies have demonstrated re-identification of participants in the Personal Genome Project, the 1000 Genomes Project, and other ostensibly anonymised datasets. AG-709 addresses this by requiring re-identification risk assessment at the point of data combination, not merely at the point of initial de-identification.
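The check at the point of data combination can be sketched as a simple anonymity-set computation. This is an illustrative sketch under stated assumptions: the result rows are plain dictionaries, the caller already knows which columns act as quasi-identifiers, and the function names are hypothetical. The default `k=10` is the population-level default from Requirement 4.3.

```python
from collections import Counter


def smallest_anonymity_set(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    # An empty result is reported as 0 so that the caller errs on the side of blocking.
    return min(groups.values()) if groups else 0


def release_allowed(rows, quasi_identifiers, k=10):
    """True only if every combination of quasi-identifier values covers at least k rows."""
    return smallest_anonymity_set(rows, quasi_identifiers) >= k
```

For example, a combined genotype-phenotype result in which 12 rows share a common profile and 3 rows share a rare one has a smallest anonymity set of 3, so `release_allowed(rows, ["variant", "bp_band"])` would return `False` and the operation would be blocked or flagged.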

The biosecurity dimension adds a layer of urgency that distinguishes sequence data governance from general data protection. Advances in synthetic biology have dramatically reduced the cost and skill required to synthesise functional DNA from digital sequence information. A sequence that exists only as a digital file can be converted to physical biological material through commercial DNA synthesis services — and the barrier to doing so continues to decline. This means that the exfiltration of a digital sequence file encoding a select agent toxin is functionally equivalent to the theft of a physical sample of the agent. Traditional data loss prevention treats data exfiltration as a confidentiality failure; in the biosecurity context, sequence data exfiltration is a material security threat with potential mass-casualty consequences. The incremental assembly attack demonstrated in Scenario C is a documented concern in the biosecurity community — the International Gene Synthesis Consortium's screening framework explicitly acknowledges the risk of customers splitting dangerous sequences into individually innocuous fragments.
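The difference between stateless and stateful screening can be shown with a small sketch. It is an assumption-laden illustration rather than a production screen: it measures what fraction of a regulated sequence's k-mers a customer has submitted so far, it ignores reverse complements, time windows, and cross-user pooling, and the class name, the threshold, and the 24-character placeholder string (which is not a real biological sequence) are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k, case-normalised."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class AssemblyTracker:
    """Tracks cumulative k-mer coverage of regulated sequences per customer."""

    def __init__(self, regulated, k=8, threshold=0.5):
        self.k = k
        self.threshold = threshold
        # regulated: mapping of name -> sequence from a curated list (assumed input)
        self.regulated = {name: kmers(seq, k) for name, seq in regulated.items()}
        self.seen = defaultdict(set)  # customer_id -> every k-mer submitted so far

    def submit(self, customer_id, fragment):
        """Record a fragment and return the regulated names now covered above threshold."""
        self.seen[customer_id] |= kmers(fragment, self.k)
        observed = self.seen[customer_id]
        return [
            name
            for name, target in self.regulated.items()
            if target and len(observed & target) / len(target) >= self.threshold
        ]


tracker = AssemblyTracker({"toy_regulated": "ACGTTGCAAGCTCATGGATCCTAG"}, k=4, threshold=0.6)
print(tracker.submit("cust-1", "ACGTTGCA"))  # 5 of 21 k-mers covered
print(tracker.submit("cust-1", "AGCTCATG"))  # 10 of 21 covered
print(tracker.submit("cust-1", "GATCCTAG"))  # 15 of 21 covered, crosses 0.6
```

Each fragment covers under a quarter of the placeholder target on its own, so a per-query screen with the same threshold would pass all three. Only the accumulated state exposes the third submission.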

Cross-border complexity further elevates the governance challenge. Sequence data generated in one jurisdiction may be subject to biosecurity export controls in that jurisdiction, data protection regulations in the jurisdiction of the data subject, health data regulations in the jurisdiction of the clinical institution, and intellectual property protections in the jurisdiction of the research funder — simultaneously. An AI agent operating across these boundaries without jurisdiction-aware sensitivity classification will inevitably violate at least one regulatory regime. The Nagoya Protocol on Access and Benefit-Sharing adds a further dimension for non-human sequence data: genetic sequences derived from organisms in signatory countries may carry benefit-sharing obligations that restrict commercial use, creating a sensitivity tier that is neither personal data nor biosecurity-related but is legally enforceable.
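A jurisdiction-aware export decision of the kind Requirement 4.5 describes might look like the sketch below. It is a simplified illustration: the `Destination` fields, the `APPROVED_JURISDICTIONS` set, and the three outcome strings are hypothetical, and a real deployment would draw the approved list from its own regulatory mapping rather than a hard-coded constant.

```python
from dataclasses import dataclass

# Hypothetical allowlist; in practice this comes from the organisation's regulatory mapping.
APPROVED_JURISDICTIONS = {"GB"}


@dataclass(frozen=True)
class Destination:
    name: str
    jurisdiction: str                 # e.g. an ISO 3166 country code
    institutionally_controlled: bool  # False for personal cloud storage and similar


def export_decision(biosecurity_sensitive, dest, human_approved=False):
    """Return 'allow', 'block', or 'hold_for_review' for one proposed transfer."""
    if not biosecurity_sensitive:
        return "allow"
    if not dest.institutionally_controlled:
        return "block"
    if dest.jurisdiction not in APPROVED_JURISDICTIONS:
        return "block"
    # Even an eligible destination needs a recorded human review before transfer.
    return "allow" if human_approved else "hold_for_review"
```

Under these assumptions, the export in Scenario A (DURC sequences to a personal cloud account) is blocked outright, while a transfer to a controlled system in an approved jurisdiction is held until a biosafety reviewer approves it.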

The permanence of genomic data intensifies all of these risks. Unlike a password or credit card number, a compromised genome cannot be rotated or reissued. A genomic data breach exposes the data subject — and their biological relatives — to indefinite future risk, including risks from analytical capabilities that do not yet exist but will emerge as genomic science advances. This permanence demands that preventive controls are more stringent than for other data categories, because the cost of failure cannot be mitigated after the fact.

6. Implementation Guidance

Sequence Data Sensitivity Governance requires a layered architecture that classifies sequences by sensitivity, enforces access controls at each classification tier, monitors for aggregation and re-identification risks, and maintains forensic-grade provenance records. The implementation must be technically integrated with the agent's data pipeline — classification and access control cannot be bolted on as afterthoughts but must be enforced at the point of data access, transformation, and output.
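One way to picture the provenance layer is as a directed graph from each derived sequence back to its sources. The sketch below is a minimal in-memory illustration with hypothetical names (`ProvenanceGraph`, `record`, `lineage`). It does not address durable storage, tamper resistance, or access control, all of which a forensic-grade record would need.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Records which sequences each derived sequence came from, and by what operation."""

    def __init__(self):
        # derived_id -> set of (source_id, operation) edges
        self.parents = defaultdict(set)

    def record(self, derived_id, source_ids, operation):
        """Note that derived_id was produced from source_ids by operation."""
        for source_id in source_ids:
            self.parents[derived_id].add((source_id, operation))

    def lineage(self, seq_id):
        """Return every ancestor of seq_id, found by walking the parent edges."""
        ancestors = set()
        stack = [seq_id]
        while stack:
            node = stack.pop()
            for parent, _operation in self.parents.get(node, ()):
                if parent not in ancestors:
                    ancestors.add(parent)
                    stack.append(parent)
        return ancestors
```

If an exported file was produced from an alignment that was itself built from two raw records, `lineage` on the export returns the alignment and both raw records, which is the reconstruction that Requirement 4.11 asks for.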

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Pharmaceutical and Clinical Research. Clinical trial genomic data is subject to Good Clinical Practice (GCP) requirements, informed consent constraints, and regulatory submission obligations. Agents supporting pharmacogenomics research must enforce consent-scope boundaries that distinguish between trial-authorised analyses and exploratory research. Data shared with regulatory authorities for drug approval must meet the submission jurisdiction's requirements for genomic data handling. The FDA's Voluntary Genomic Data Submission programme and the EMA's genomic data policies impose different handling requirements that jurisdiction-aware governance must address.

DNA Synthesis and Synthetic Biology. Commercial synthesis providers face a unique obligation: they are the materialisation point where digital sequences become physical biological material. Agents operating in this domain must implement the most stringent biosecurity screening, including cumulative assembly detection. The International Gene Synthesis Consortium (IGSC) Harmonized Screening Protocol provides a baseline screening standard, but agents should exceed this baseline with stateful cross-query analysis. Providers operating across jurisdictions must comply with the most restrictive applicable regime — Australia's Gene Technology Act, the EU's Contained Use Directive, the US Select Agent Regulations — simultaneously.

Population Health and Biobank Operations. National biobanks and population health programmes handle genomic data at scale (hundreds of thousands to millions of genomes), amplifying re-identification risk because the large dataset enables more powerful linkage attacks. Agents supporting biobank operations must implement differential privacy for aggregate queries, enforce strict re-identification risk thresholds, and maintain consent lifecycle management that can propagate withdrawal across all derivative datasets. The UK Biobank, deCODE Genetics, and similar programmes provide operational models for large-scale genomic data governance.

Agricultural and Environmental Genomics. Non-human sequence data may appear to carry lower sensitivity, but the Nagoya Protocol, plant variety protection regulations, and agricultural biosecurity regulations impose significant restrictions. Agents handling crop genome data, livestock genomic selection data, or environmental DNA (eDNA) surveys must classify sequences for benefit-sharing obligations, intellectual property restrictions, and agricultural biosecurity concerns (e.g., sequences related to controlled plant pathogens).

Forensic Genetics. Law enforcement use of genomic data intersects with civil liberties protections, chain-of-custody requirements, and jurisdictional restrictions on familial DNA searching. Agents supporting forensic genomics must enforce strict purpose limitation — forensic-consented data must not be accessible for research or clinical purposes — and must comply with jurisdiction-specific restrictions on investigative genetic genealogy.
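The purpose limitation that recurs across these sectors (and in Requirement 4.9) reduces to a check made before each access. The following sketch assumes a very simple consent record with a set of permitted purposes and an optional withdrawal time. The `Consent` type, its fields, and the purpose strings are hypothetical, and propagating withdrawal to downstream copies is out of scope here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Consent:
    subject_id: str
    purposes: frozenset                      # e.g. frozenset({"research"})
    withdrawn_at: Optional[datetime] = None  # timezone-aware when set


def consent_permits(consent, purpose, now=None):
    """True only if consent is still in force and explicitly covers this purpose."""
    now = now or datetime.now(timezone.utc)
    if consent.withdrawn_at is not None and consent.withdrawn_at <= now:
        return False
    return purpose in consent.purposes
```

With a record consented for `"research"` only, a request tagged `"clinical"` or `"law_enforcement"` is refused, which is the boundary that was missing in Scenario B.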

Maturity Model

Basic Implementation — The organisation has implemented a multi-tier sensitivity classification scheme for sequence data. Access controls are calibrated to classification tiers. Biosecurity screening is performed on agent outputs against select agent databases. Human-derived genomic data is encrypted at rest and in transit. Consent validation occurs before agent access to human-derived genomic data. All access to classified sequence data is logged with timestamps and user identity. Export controls block transfer of biosecurity-sensitive sequences to uncontrolled destinations. All mandatory requirements (4.1 through 4.9) are satisfied at a foundational level.

Intermediate Implementation — All basic capabilities plus: cumulative assembly detection tracks cross-query fragment aggregation with a 90-day assembly window. Re-identification risk scoring is computed before returning combined genomic-phenotype results. Differential privacy is applied to aggregate genomic queries. Sequence provenance graphs track origin, transformation, and derivative use. Anomaly detection monitors access patterns for biosecurity-relevant indicators. Classification is re-evaluated when reference databases are updated. Jurisdiction-aware export controls are automated with regulatory mapping per AG-210.

Advanced Implementation — All intermediate capabilities plus: privacy-preserving computation (homomorphic encryption or secure multi-party computation) enables analysis of sensitive sequences without exposure. Purpose-bound sequence tokens allow agents to operate on sequence references without retaining raw content. Independent biosecurity audit validates screening effectiveness annually. Real-time cumulative assembly detection operates across the full user population with configurable similarity thresholds. The organisation can demonstrate through empirical testing that its screening system detects incremental assembly strategies with a false negative rate below 1%.
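The differential privacy named at the intermediate level (and in Requirement 4.10) can be illustrated with the Laplace mechanism applied to a counting query. This is a sketch under stated assumptions: the query is a simple count with sensitivity 1, the function name `dp_count` is hypothetical, and no privacy budget is tracked across repeated queries, which a real deployment would need to do.

```python
import random


def dp_count(true_count, epsilon, rng=None):
    """Return a count perturbed with Laplace noise of scale 1/epsilon.

    Assumes a counting query: adding or removing one individual changes
    the true answer by at most 1 (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    # The difference of two independent Exponential(rate=epsilon) draws
    # is Laplace-distributed with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise
```

A smaller `epsilon` gives stronger protection and noisier answers. For a query such as "how many participants carry this variant", the agent would return `dp_count(n, epsilon)` in place of the exact `n`.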

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Sensitivity Classification Completeness and Accuracy

Test 8.2: Access Control Enforcement at Each Sensitivity Tier

Test 8.3: Re-Identification Risk Assessment Enforcement

Test 8.4: Cumulative Assembly Detection

Test 8.5: Export Control Enforcement

Test 8.6: Audit Logging Completeness for Sequence Data Access

Test 8.7: Encryption Enforcement for Classified Sequences

Test 8.8: Biosecurity Screening of Agent Outputs

Test 8.9: Consent Validation Before Genomic Data Access

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
GDPR | Articles 5, 9, 35 (Personal Data Principles, Special Categories, DPIA) | Direct requirement
EU AI Act | Article 10 (Data and Data Governance) | Direct requirement
US Select Agent Regulations | 42 CFR Part 73 / 7 CFR Part 331 / 9 CFR Part 121 | Direct requirement
Biological Weapons Convention | National implementation legislation | Direct requirement
Nagoya Protocol | Access and Benefit-Sharing obligations | Supports compliance
HIPAA | 45 CFR Parts 160, 164 (Health Information Privacy) | Direct requirement
EU Clinical Trials Regulation | Regulation (EU) 536/2014, Articles 28-29 | Supports compliance
NIST AI RMF | MAP 2.3 (Data Quality), GOVERN 1.5 (Monitoring) | Supports compliance
ISO 42001 | Clause 6.1 (Risk Assessment), Annex A.8 (Human Oversight) | Supports compliance

GDPR — Articles 5, 9, and 35

Genetic data is explicitly listed as a special category of personal data under GDPR Article 9(1). Processing requires explicit consent or another Article 9(2) derogation. Article 5(1)(b) imposes purpose limitation — genomic data consented for research cannot be repurposed for clinical use without additional legal basis. Article 5(1)(f) requires appropriate security measures, which for genomic data must account for the re-identification risk inherent in genetic sequences. Article 35 requires a Data Protection Impact Assessment for large-scale processing of genetic data, which must specifically address the re-identification risks that AG-709 mitigates. The EUR 4.2 million fine in Scenario B illustrates the enforcement exposure for genomic data governance failures. AG-709's re-identification risk assessment (Requirement 4.3), consent validation (Requirement 4.9), and access control (Requirement 4.2) directly support GDPR compliance for genomic data processing.

US Select Agent Regulations — 42 CFR Part 73

The US Select Agent Regulations impose strict controls on the possession, use, and transfer of select agents and toxins, including their genetic elements and sequences. Entities registered under the regulations must implement security plans covering access controls, biosafety, and transfer procedures. The regulations extend to nucleic acid sequences encoding functional forms of select agent toxins (42 CFR 73.3(c)). An AI agent that provides access to select agent sequences without the access controls, screening, and transfer restrictions required by the regulations exposes the registered entity to penalties of up to $250,000 per violation (for individuals) and institutional deregistration. AG-709's sensitivity classification (Requirement 4.1), access controls (Requirement 4.2), export controls (Requirement 4.5), and biosecurity screening (Requirement 4.8) directly map to Select Agent Regulation obligations.

Biological Weapons Convention — National Implementation

The Biological Weapons Convention (BWC) prohibits the development, production, and stockpiling of biological weapons. National implementation legislation (e.g., the UK Biological Weapons Act 1974, the US Biological Weapons Anti-Terrorism Act 1989) criminalises activities that facilitate biological weapons development, including the provision of genetic material or sequence information that enables weapons production. An AI agent that facilitates the assembly of dangerous pathogen sequences — even inadvertently through incremental query strategies as in Scenario C — could expose the operating organisation to criminal liability under BWC implementation legislation. AG-709's cumulative assembly detection (Requirement 4.4) and biosecurity screening (Requirement 4.8) are specifically designed to prevent this exposure.

HIPAA — 45 CFR Parts 160 and 164

For organisations subject to HIPAA, genomic data derived from clinical encounters constitutes protected health information (PHI). The HIPAA Privacy Rule restricts use and disclosure of PHI, and the Security Rule requires administrative, physical, and technical safeguards. Genomic data presents particular challenges under HIPAA's de-identification safe harbor (164.514(b)), because the standard 18 identifiers do not include genetic sequences — yet genetic sequences are demonstrably re-identifiable. The Office for Civil Rights has indicated that covered entities must consider re-identification risk beyond the 18 enumerated identifiers. AG-709's re-identification risk assessment and encryption requirements support HIPAA compliance for genomic PHI.

Nagoya Protocol — Access and Benefit-Sharing

The Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilisation creates obligations for organisations using genetic resources from signatory countries. Digital sequence information (DSI) is an active area of negotiation, with the 2022 Kunming-Montreal Global Biodiversity Framework establishing a multilateral mechanism for benefit-sharing from DSI. Agents handling non-human sequence data derived from biodiversity resources must track provenance to determine whether Nagoya Protocol obligations apply. AG-709's sequence provenance graph (Requirement 4.11) and sensitivity classification (which includes material transfer agreement restrictions under tier e) support Nagoya Protocol compliance.

EU Clinical Trials Regulation — Regulation (EU) 536/2014

Articles 28-29 of the Clinical Trials Regulation impose specific requirements for informed consent and data protection in clinical trials. Genomic data collected during clinical trials is subject to the consent scope defined in the trial protocol and the participant's informed consent form. An AI agent that accesses trial genomic data for purposes beyond the consented scope — even within the same organisation — violates the regulation. AG-709's consent validation (Requirement 4.9) ensures that the agent respects consent boundaries for clinical trial genomic data.

NIST AI RMF — MAP 2.3 and GOVERN 1.5

MAP 2.3 addresses data quality and fitness for purpose, which for genomic data includes classification accuracy, provenance integrity, and contamination detection. GOVERN 1.5 addresses ongoing monitoring, which for sequence data governance includes monitoring access patterns, screening effectiveness, and classification currency. AG-709 operationalises these functions for the specific domain of genomic and sequence data.

10. Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | Cross-domain: simultaneously affects data protection, biosecurity, health data regulation, intellectual property, and potentially national security

Consequence chain: A failure in sequence data sensitivity governance propagates through multiple consequence domains simultaneously. The immediate failure mode is inappropriate access to or release of sensitive sequence data, whether through missing classification (all sequences treated as unrestricted), inadequate access control (authorised users accessing tiers beyond their clearance), failed screening (biosecurity-relevant sequences passing to output without detection), or aggregation-based leakage (individually innocuous queries combining to reconstruct dangerous or re-identifiable sequences).

The first-order consequences depend on the sensitivity tier affected. For personal genomic data (tier b/c): data protection violations under GDPR, HIPAA, or equivalent, with regulatory fines proportional to the number of affected data subjects and the severity of the breach. Genomic data breaches carry aggravated penalties because the exposure is permanent and affects biological relatives. For biosecurity-sensitive sequences (tier d): potential criminal liability under biological weapons legislation, institutional deregistration from select agent programmes, loss of research funding eligibility, and, in the most severe scenario, the actual synthesis and misuse of dangerous biological material, with mass-casualty potential.

The second-order consequence is systemic loss of trust in AI-assisted bioscience: if agents cannot be trusted to handle sequence data safely, institutions will restrict agent access to sequence data entirely, forfeiting the scientific and clinical benefits that genomic AI enables.

The third-order consequence is regulatory tightening that may restrict legitimate research, a pattern already observed in dual-use research of concern policy, where high-profile biosecurity incidents have led to funding moratoriums affecting entire research domains.

The consequence chain from a single governance failure can therefore extend from an individual data breach to a field-wide research restriction, with timeline implications measured in years to decades for genomic data that cannot be un-compromised.
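Of the failure modes listed in the consequence chain, aggregation-based leakage is the least obvious to detect, because each query looks harmless in isolation. The sketch below shows one possible detection approach under stated assumptions: it tracks, per requester and per sequence, the union of all half-open ranges requested so far, and flags the request at which cumulative coverage reaches a threshold. The `AggregationMonitor` class, its parameters, and the 0.5 default are hypothetical and are not prescribed by AG-709.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open range [start, end) in sequence positions


class AggregationMonitor:
    """Track cumulative per-requester coverage of sensitive sequences."""

    def __init__(self, threshold: float = 0.5) -> None:
        # Fraction of a sequence that one requester may accumulate before
        # the monitor flags the request (illustrative default).
        self.threshold = threshold
        self._seen: Dict[Tuple[str, str], List[Interval]] = defaultdict(list)

    def record(self, requester: str, seq_id: str,
               start: int, end: int, seq_length: int) -> bool:
        """Record a request for [start, end); return True if flagged."""
        key = (requester, seq_id)
        merged: List[Interval] = []
        # Merge the new range into the sorted ranges already seen so that
        # overlapping requests are not counted twice.
        for s, e in sorted(self._seen[key] + [(start, end)]):
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        self._seen[key] = merged
        covered = sum(e - s for s, e in merged)
        return covered / seq_length >= self.threshold
```

In this sketch three requests for positions 0-300, 200-450, and 600-700 of a 1000-position sequence cover 30%, 45%, and 55% cumulatively, so only the third is flagged. A real control would also need to consider collusion across requesters and related sequences, which this example does not attempt.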

Cross-references: AG-029 (Data Classification Enforcement) provides the general data classification framework that AG-709 specialises for sequence data. AG-042 (Encryption & Cryptographic Control Governance) defines the cryptographic standards that AG-709 applies to classified sequence data. AG-043 (Access Control & Credential Governance) provides the access control framework that AG-709 extends with sensitivity-tier-calibrated controls. AG-030 (Cross-Border Data Transfer Governance) governs the transfer mechanisms that AG-709 constrains for biosecurity-sensitive sequences. AG-033 (Consent Lifecycle Governance) provides the consent management framework that AG-709 applies to human-derived genomic data. AG-037 (Anonymisation & Pseudonymisation Governance) addresses de-identification techniques that AG-709 supplements with re-identification risk assessment specific to genomic data. AG-040 (Sensitive Category Data Processing Governance) governs special category data processing that includes genetic data. AG-710 (Pathogen-Related Capability Escalation Governance) governs escalation decisions when agents encounter pathogen-related capabilities; AG-709 provides the data classification that informs those escalation decisions. AG-714 (Sequence Synthesis Screening Governance) governs the screening of sequences submitted for synthesis; AG-709 provides the upstream sensitivity classification that feeds synthesis screening. AG-715 (Clinical-Genomic Consent Governance) specialises consent governance for clinical genomic contexts; AG-709 enforces the consent boundaries that AG-715 defines. AG-718 (Dual-Use Publication Governance) governs the publication of dual-use research; AG-709 prevents the data leakage that could bypass publication governance entirely.

Cite this protocol
AgentGoverning. (2026). AG-709: Sequence Data Sensitivity Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-709