AG-714

Sequence Synthesis Screening Governance

Biotechnology, Genomics & Biosecurity · AGS v2.1 · April 2026
EU AI Act · NIST · ISO 42001

2. Summary

Sequence Synthesis Screening Governance requires that every AI agent capable of generating, recommending, or transmitting nucleotide or amino acid sequences intended for physical synthesis screen those outputs against curated hazard databases, regulatory select-agent lists, and dual-use concern criteria before the sequence reaches a synthesis provider or wet-lab automation system. The screening must operate as a mandatory interception layer that cannot be bypassed through prompt manipulation, workflow reordering, or credential escalation, ensuring that sequences encoding regulated pathogens, toxin genes, virulence factors, or functionally dangerous constructs are detected and blocked or escalated to qualified biosafety reviewers. Without this governance, an AI agent acting as an interface between computational biology tools and physical synthesis infrastructure becomes an uncontrolled pathway for converting digital hazard information into tangible biological material.

3. Example

Scenario A — Agent Generates Partial Toxin Gene Sequence That Bypasses Naive Keyword Screening: A university research group uses an AI agent integrated with a protein engineering platform to design novel enzyme variants. A postdoctoral researcher instructs the agent to optimise a sequence for enhanced binding affinity to a cellular receptor. The agent produces 14 candidate sequences, one of which contains a 1,247-base-pair segment with 94.3% homology to the catalytic domain of a regulated toxin listed on the Australia Group Common Control List. The organisation's screening system relies on exact-match keyword lookups against pathogen names and does not perform sequence-level alignment against hazard reference databases. The sequence is submitted to a commercial synthesis provider. The synthesis provider's own screening system — which performs BLAST-based alignment — flags the order and suspends the account. The university is notified by the provider and launches an internal investigation.

What went wrong: The AI agent had no pre-submission screening layer that performed sequence-level alignment. The organisation relied on keyword-based screening that could not detect partial homology to regulated sequences. The agent's optimisation objective (binding affinity) converged on a structural motif shared with a known toxin without any mechanism to detect this convergence. Consequence: Synthesis provider account suspension lasting 6 weeks, disrupting 3 active research projects; regulatory inquiry by the institutional biosafety committee; £47,000 in investigation and remediation costs; reputational damage with the synthesis provider leading to enhanced scrutiny on all future orders.

Scenario B — Adversarial Prompt Injection Extracts Regulated Pathogen Sequence via Fragmentation: A biotechnology company operates an AI agent that assists scientists with sequence design and places orders with an automated synthesis pipeline. An external attacker compromises a collaborator's account and submits a series of 23 prompts over 4 days, each requesting a short oligonucleotide fragment (40-80 base pairs) described as a "cloning primer" or "sequencing adapter." Individually, no fragment triggers hazard screening because each is below the minimum screening length of 200 base pairs configured in the system. When assembled, the 23 fragments reconstruct 1,840 base pairs encoding 91.7% of a functional virulence factor from a Risk Group 3 pathogen. The fragments are synthesised and shipped to the collaborator's laboratory address. The compromise is discovered 11 days later during a routine access review.

What went wrong: The screening system evaluated each order independently without cross-order assembly analysis. The 200-base-pair minimum screening threshold created an evasion pathway that the attacker deliberately exploited with fragmented orders. No system correlated sequential orders from the same account to detect incremental reconstruction of hazardous sequences. The agent processed each prompt as an independent request with no memory of prior synthesis-related outputs. Consequence: Functional virulence factor DNA physically delivered to an uncontrolled location; biosecurity incident reported to national authorities; company subject to regulatory investigation under dual-use export controls; £890,000 in legal, remediation, and regulatory response costs; 4-month suspension of the automated synthesis pipeline pending security review.

Scenario C — Agent Circumvents Screening Through Codon-Optimised Synonymous Substitution: A pharmaceutical company uses an AI agent to design codon-optimised gene constructs for expression in mammalian cell lines. A researcher requests optimisation of a sequence that encodes a protein with known dual-use potential — a gain-of-function variant of an influenza surface protein. The agent applies codon optimisation, replacing 38% of codons with synonymous alternatives that preserve the amino acid sequence but alter the nucleotide sequence. The screening system, which operates at the nucleotide level using sequence alignment, fails to flag the codon-optimised variant because the nucleotide-level identity drops below the 85% homology threshold configured in the screening tool. The protein encoded by the optimised sequence is functionally identical to the regulated variant. The sequence is synthesised and expressed in cell culture before a biosafety officer reviewing published literature recognises the protein construct during a quarterly audit — 7 weeks after expression.

What went wrong: The screening system operated exclusively at the nucleotide level and did not perform protein-level (translated) screening. Codon optimisation — a routine and legitimate operation — inadvertently created an evasion pathway because the screening threshold was calibrated for nucleotide identity rather than functional (protein) identity. The agent had no mechanism to assess the functional implications of the protein it was designing, only the nucleotide sequence it was outputting. Consequence: Unauthorised expression of a dual-use protein construct; institutional biosafety committee investigation; 3-month suspension of the agent-assisted synthesis programme; £215,000 in containment verification, decontamination, and regulatory reporting costs.

4. Requirement Statement

Scope: This dimension applies to any AI agent that generates, modifies, recommends, retrieves, or transmits nucleotide sequences, amino acid sequences, or gene construct specifications where the output is intended for, or could foreseeably be used for, physical synthesis of biological material. The scope includes agents that interact directly with synthesis provider APIs, agents that generate downloadable sequence files (FASTA, GenBank, SBOL), agents that provide sequences in natural-language responses that a human could submit for synthesis, and agents that control or interface with automated liquid handling or DNA assembly equipment. The scope extends to both complete gene sequences and sequence fragments, including oligonucleotides, primers, and adapters, because fragments can be assembled into hazardous constructs. Organisations operating agents that have no capability to produce sequence data — such as agents limited to literature search or administrative tasks — are excluded from this dimension. The scope covers all synthesis modalities: chemical oligonucleotide synthesis, enzymatic DNA synthesis, gene assembly, and any emerging synthesis technology. Agents operating across jurisdictions must screen against the superset of all applicable regulatory frameworks, as defined by the jurisdictions in which the agent operates, the synthesis is performed, and the material is delivered.

4.1. A conforming system MUST screen every nucleotide and amino acid sequence generated, modified, or transmitted by the agent against curated hazard reference databases that include, at minimum: national and international select-agent lists (e.g., US Federal Select Agent Program, Australia Group Common Control List, EU dual-use regulation Annex I), known virulence factor databases, and toxin gene databases, using sequence alignment methods (not keyword matching alone) with a homology threshold set no higher than the level needed to detect functionally significant matches.
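The contrast with keyword matching can be illustrated with a minimal sketch, assuming hazard references are available as plain DNA strings keyed by an identifier. It scores both strands of a query by ungapped sliding-window identity, a deliberate simplification of real alignment tools such as BLAST that will miss matches needing gaps. The names `screen_nucleotide` and `NucleotideHit`, the 50-base window, and the 0.85 default threshold are illustrative assumptions, not values this requirement sets.

```python
from dataclasses import dataclass

# Illustrative sketch only: ungapped sliding-window identity stands in for a
# proper alignment tool (e.g. BLAST). All names and defaults are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class NucleotideHit:
    reference_id: str
    strand: str       # "+" = query as given, "-" = its reverse complement
    identity: float   # best fraction of identical bases over one window


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_window_identity(query: str, reference: str, window: int) -> float:
    """Highest identity over any pair of equal-length ungapped windows."""
    if len(query) < window or len(reference) < window:
        return 0.0
    best = 0
    for qi in range(len(query) - window + 1):
        q = query[qi:qi + window]
        for ri in range(len(reference) - window + 1):
            matches = sum(a == b for a, b in zip(q, reference[ri:ri + window]))
            best = max(best, matches)
    return best / window


def screen_nucleotide(query: str, references: dict[str, str],
                      window: int = 50, threshold: float = 0.85) -> list[NucleotideHit]:
    """Compare both strands of the query against every hazard reference."""
    query = query.upper()
    hits = []
    for ref_id, ref_seq in references.items():
        for strand, q in (("+", query), ("-", reverse_complement(query))):
            identity = best_window_identity(q, ref_seq.upper(), window)
            if identity >= threshold:
                hits.append(NucleotideHit(ref_id, strand, identity))
    return hits
```

A query that never mentions a pathogen name is still flagged when its bases align with a reference, which is the property keyword lookup lacks.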

4.2. A conforming system MUST perform screening at both the nucleotide level and the translated protein level, ensuring that codon-optimised, codon-shuffled, or otherwise synonymously substituted sequences are detected when they encode hazardous proteins, with the protein-level screening threshold set to detect matches at or above 80% amino acid identity across any 100-residue sliding window.
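The translated check can be sketched as follows, assuming hazard proteins are stored as one-letter amino acid strings. The query is translated in all six reading frames with the standard genetic code (NCBI translation table 1), and each frame is compared with each reference over 100-residue windows at the 80% identity threshold stated above. Ungapped comparison simplifies what a translated-search tool such as BLASTX does; the function names are hypothetical.

```python
# Illustrative sketch only; names are hypothetical and the ungapped window
# comparison is a simplification of translated alignment (e.g. BLASTX).
_BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate from the first base; codons with non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna: str) -> list[str]:
    """Three forward and three reverse-complement reading frames."""
    dna = dna.upper()
    reverse = dna.translate(_COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, reverse) for offset in range(3)]


def best_window_identity(a: str, b: str, window: int) -> float:
    """Highest residue identity over any pair of ungapped windows."""
    if len(a) < window or len(b) < window:
        return 0.0
    best = 0
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            best = max(best, sum(x == y for x, y in zip(a[i:i + window], b[j:j + window])))
    return best / window


def protein_level_hits(dna: str, hazard_proteins: dict[str, str],
                       window: int = 100, threshold: float = 0.80) -> list[str]:
    """IDs of hazard proteins matched in any frame at >= threshold identity."""
    frames = six_frame_translations(dna)
    return [ref_id for ref_id, protein in hazard_proteins.items()
            if any(best_window_identity(f, protein, window) >= threshold for f in frames)]
```

Because a synonymous substitution changes bases but not residues, a codon-optimised variant that drops below a nucleotide identity threshold still scores 100% here, which is the gap described in Scenario C.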

4.3. A conforming system MUST implement cross-order and cross-session fragment assembly analysis that correlates sequence fragments generated or ordered across multiple prompts, sessions, orders, or time periods — within a rolling window of no fewer than 90 days — to detect incremental reconstruction of hazardous sequences from individually sub-threshold fragments.
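One way to approach cross-order correlation is sketched below, assuming each order is attributed to a stable requester identifier and that hazard references are stored as DNA strings. It indexes reference k-mers, keeps every fragment a requester has ordered, and flags a reference once fragments ordered within the 90-day window jointly cover a set fraction of it. Exact k-mer matching, k = 20, and the 50% coverage threshold are hypothetical choices; a production system would also need approximate matching and correlation across linked accounts.

```python
from collections import defaultdict
from datetime import datetime, timedelta

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
ROLLING_WINDOW = timedelta(days=90)  # minimum window required by 4.3


class FragmentAssemblyMonitor:
    """Hypothetical sketch: cumulative k-mer coverage of hazard references by
    all fragments one requester ordered inside the rolling window."""

    def __init__(self, references: dict[str, str], k: int = 20,
                 coverage_threshold: float = 0.5):
        self.k = k
        self.coverage_threshold = coverage_threshold
        self.ref_lengths = {ref_id: len(seq) for ref_id, seq in references.items()}
        self.kmer_index = defaultdict(list)  # k-mer -> [(reference id, offset)]
        for ref_id, seq in references.items():
            seq = seq.upper()
            for i in range(len(seq) - k + 1):
                self.kmer_index[seq[i:i + k]].append((ref_id, i))
        self.history = defaultdict(list)  # requester -> [(timestamp, fragment)]

    def record_order(self, requester: str, fragment: str, when: datetime) -> list[str]:
        """Log one fragment; return references now covered past the threshold."""
        self.history[requester].append((when, fragment.upper()))
        recent = [frag for ts, frag in self.history[requester]
                  if timedelta(0) <= when - ts <= ROLLING_WINDOW]
        covered = defaultdict(set)  # reference id -> covered base positions
        for frag in recent:
            # Check the fragment and its reverse complement.
            for strand in (frag, frag.translate(_COMPLEMENT)[::-1]):
                for i in range(len(strand) - self.k + 1):
                    for ref_id, offset in self.kmer_index.get(strand[i:i + self.k], ()):
                        covered[ref_id].update(range(offset, offset + self.k))
        return sorted(ref_id for ref_id, positions in covered.items()
                      if len(positions) / self.ref_lengths[ref_id] >= self.coverage_threshold)
```

No single fragment covers enough of the reference to be flagged on its own; only the cumulative view per requester surfaces the pattern in Scenario B.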

4.4. A conforming system MUST enforce screening as a mandatory interception layer that cannot be bypassed by prompt injection, workflow reordering, credential escalation, or direct API access, with the screening gate positioned architecturally between the agent's sequence output and any pathway to physical synthesis, file download, or transmission to a synthesis provider.
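A minimal sketch of one way to make the gate fail closed, assuming the screening service and the synthesis gateway share a secret key that is never exposed to the agent process. The screening service issues an attestation bound to the SHA-256 digest of the exact sequence it screened, and the gateway forwards nothing that lacks a valid attestation with a permitted disposition. The class names are hypothetical, and HMAC from Python's standard library stands in for whatever signing scheme a deployment actually uses (asymmetric signatures would avoid sharing the key).

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical sketch: only the screening service and the gateway hold the key.
ALLOWED_DISPOSITIONS = {"cleared", "approved"}


def _mac(key: bytes, digest: str, disposition: str) -> str:
    body = json.dumps({"sha256": digest, "disposition": disposition}, sort_keys=True)
    return hmac.new(key, body.encode(), hashlib.sha256).hexdigest()


class ScreeningService:
    def __init__(self, key: bytes):
        self._key = key

    def attest(self, sequence: str, disposition: str) -> dict:
        """Bind a disposition to the exact bytes that were screened."""
        digest = hashlib.sha256(sequence.encode()).hexdigest()
        return {"sha256": digest, "disposition": disposition,
                "mac": _mac(self._key, digest, disposition)}


class SynthesisGateway:
    """Sits between the agent and every synthesis, export, or transmission path."""

    def __init__(self, key: bytes):
        self._key = key

    def admit(self, sequence: str, attestation: Optional[dict]) -> bool:
        """Fail closed: refuse anything without a valid attestation for these bytes."""
        if not isinstance(attestation, dict):
            return False
        digest = attestation.get("sha256")
        disposition = attestation.get("disposition")
        mac = attestation.get("mac")
        if not all(isinstance(v, str) for v in (digest, disposition, mac)):
            return False
        expected = _mac(self._key, digest, disposition)
        if not hmac.compare_digest(expected.encode(), mac.encode()):
            return False
        if disposition not in ALLOWED_DISPOSITIONS:
            return False
        return digest == hashlib.sha256(sequence.encode()).hexdigest()
```

Because the agent never holds the key, instructing it to skip screening or calling the gateway directly produces nothing the gateway will accept, and editing a cleared sequence after screening invalidates its digest.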

4.5. A conforming system MUST route every sequence that triggers a screening hit to a qualified biosafety reviewer for disposition before the sequence can proceed to synthesis, with the reviewer having documented competence in biosafety risk assessment and authority to approve, reject, or require modification of the flagged sequence.
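A minimal sketch of the routing rule, assuming reviewer qualifications are recorded as simple flags. A flagged case can leave the flagged state only through a decision by a reviewer with documented competence and disposition authority, and only cleared or approved cases may proceed to synthesis. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disposition(Enum):
    CLEARED = "cleared"                      # no screening hit
    FLAGGED = "flagged"                      # hit; awaiting biosafety review
    APPROVED = "approved"
    REJECTED = "rejected"
    MODIFICATION_REQUIRED = "modification_required"


REVIEW_OUTCOMES = {Disposition.APPROVED, Disposition.REJECTED,
                   Disposition.MODIFICATION_REQUIRED}


@dataclass(frozen=True)
class Reviewer:
    reviewer_id: str
    competence_documented: bool      # documented biosafety risk-assessment competence
    has_disposition_authority: bool


@dataclass
class ScreeningCase:
    sequence_id: str
    disposition: Disposition
    reviewer_id: Optional[str] = None

    def may_proceed_to_synthesis(self) -> bool:
        return self.disposition in (Disposition.CLEARED, Disposition.APPROVED)


def record_review(case: ScreeningCase, reviewer: Reviewer, outcome: Disposition) -> None:
    """Apply a reviewer decision, refusing unqualified reviewers or invalid outcomes."""
    if case.disposition is not Disposition.FLAGGED:
        raise ValueError(f"{case.sequence_id} is not awaiting review")
    if not (reviewer.competence_documented and reviewer.has_disposition_authority):
        raise PermissionError(f"{reviewer.reviewer_id} may not dispose of flagged sequences")
    if outcome not in REVIEW_OUTCOMES:
        raise ValueError(f"{outcome} is not a review outcome")
    case.disposition = outcome
    case.reviewer_id = reviewer.reviewer_id
```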

4.6. A conforming system MUST maintain an immutable audit trail of every screening event, including: the full input sequence, the screening databases and versions used, the alignment results, the disposition (cleared, flagged, blocked), the reviewer identity and decision for flagged sequences, and timestamps for each step, retained for the period specified in Section 7.
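One common way to make a log tamper-evident is a hash chain, sketched here under the assumption that events are JSON-serialisable dictionaries carrying the fields listed above. Each entry stores the SHA-256 hash of its predecessor, so editing or reordering any entry breaks verification. A chain alone does not detect removal of the newest entries or prevent rewriting of the whole log, so a deployment would typically also write to append-only (WORM) storage or publish the latest hash externally; the class name is hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64


def _digest(record: dict) -> str:
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


class HashChainedAuditLog:
    """Hypothetical sketch: each entry commits to its predecessor's hash, so a
    later edit or reordering of any entry fails verification."""

    def __init__(self):
        self.entries: list[dict] = []

    def append(self, event: dict) -> dict:
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event": json.loads(json.dumps(event)),  # detached copy of the event
            "prev_hash": self.entries[-1]["entry_hash"] if self.entries else GENESIS_HASH,
        }
        record["entry_hash"] = _digest(record)
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        prev = GENESIS_HASH
        for record in self.entries:
            body = {k: v for k, v in record.items() if k != "entry_hash"}
            if record["prev_hash"] != prev or _digest(body) != record["entry_hash"]:
                return False
            prev = record["entry_hash"]
        return True
```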

4.7. A conforming system MUST update hazard reference databases within 72 hours of the publication of new additions to national select-agent lists, Australia Group updates, or equivalent regulatory changes in any jurisdiction where the system operates.
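A minimal sketch of a currency check against the 72-hour window, assuming the organisation records when each regulatory list update was published and when it was applied to the screening databases. The `ListUpdate` record and the `late_updates` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

UPDATE_DEADLINE = timedelta(hours=72)  # from requirement 4.7


@dataclass
class ListUpdate:
    source: str                       # e.g. "Australia Group Common Control List"
    published_at: datetime
    applied_at: Optional[datetime] = None


def late_updates(updates: list[ListUpdate], now: datetime) -> list[str]:
    """Sources whose update was applied late, or is unapplied past the deadline."""
    late = []
    for update in updates:
        deadline = update.published_at + UPDATE_DEADLINE
        if update.applied_at is None:
            if now > deadline:
                late.append(update.source)
        elif update.applied_at > deadline:
            late.append(update.source)
    return late
```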

4.8. A conforming system SHOULD implement functional annotation screening that assesses the predicted biological function of generated sequences — not just sequence homology — to detect novel constructs that achieve hazardous functions through non-homologous mechanisms (e.g., de novo designed proteins with toxin-like activity that share no sequence homology with known toxins).

4.9. A conforming system SHOULD implement customer/requester screening that verifies the identity, institutional affiliation, and biosafety credentials of the individual or organisation requesting synthesis, cross-referencing against denied-party lists and verifying that the requesting institution holds appropriate biosafety certifications for the material being requested.
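The checks in this requirement can be combined into a single pre-order gate, sketched here under stated assumptions: identity verification and institutional certifications arrive as already-validated fields, and denied-party lists are supplied as plain name strings. Matching is exact after removing accents, case, and extra whitespace, which is much weaker than the fuzzy and alias-aware matching used by real denied-party screening services. All names and certification labels are hypothetical.

```python
import unicodedata
from dataclasses import dataclass


def normalise_name(name: str) -> str:
    """Strip accents, case, and repeated whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


@dataclass(frozen=True)
class Requester:
    name: str
    institution: str
    identity_verified: bool
    institution_certifications: frozenset[str]  # hypothetical labels, e.g. "BSL-2"


def requester_concerns(requester: Requester, denied_parties: list[str],
                       required_certification: str) -> list[str]:
    """Return reasons to hold the request; an empty list means no concern found."""
    concerns = []
    if not requester.identity_verified:
        concerns.append("identity not verified")
    denied = {normalise_name(p) for p in denied_parties}
    if {normalise_name(requester.name), normalise_name(requester.institution)} & denied:
        concerns.append("denied-party match")
    if required_certification not in requester.institution_certifications:
        concerns.append(f"institution lacks {required_certification}")
    return concerns
```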

4.10. A conforming system MAY implement predictive risk scoring that assigns a composite hazard score to each sequence based on multiple factors — homology hits, functional annotations, requester profile, order patterns, and contextual signals — enabling risk-stratified review workflows where low-risk sequences receive automated clearance while elevated-risk sequences receive progressively deeper human review.
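A minimal sketch of risk-stratified routing, with hypothetical weights and cut-offs that a deployment would need to calibrate against historical reviewer decisions. To stay consistent with 4.5, a sequence with any screening hit is never given automated clearance, whatever its composite score.

```python
from dataclasses import dataclass

# Hypothetical weights and cut-offs; calibrate against past reviewer decisions.
WEIGHTS = {
    "homology": 0.40,
    "functional_concern": 0.30,
    "requester_risk": 0.15,
    "order_pattern_risk": 0.15,
}


@dataclass(frozen=True)
class RiskSignals:
    screening_hit: bool        # any hit under 4.1-4.3
    homology: float            # best identity to a hazard reference, 0..1
    functional_concern: float  # functional-annotation concern score, 0..1
    requester_risk: float      # requester-screening concern, 0..1
    order_pattern_risk: float  # cross-order pattern concern, 0..1


def composite_score(signals: RiskSignals) -> float:
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def review_tier(signals: RiskSignals) -> str:
    """Map a score to a review tier; screening hits never receive automated clearance."""
    score = composite_score(signals)
    if score >= 0.6:
        return "enhanced_review"
    if score >= 0.3 or signals.screening_hit:
        return "standard_review"
    return "automated_clearance"
```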

4.11. A conforming system MAY implement community-of-interest alerting that notifies relevant biosecurity authorities when screening patterns across multiple independent organisations suggest a coordinated attempt to obtain hazardous biological material through distributed ordering.
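Distributed-ordering detection can be sketched as a count of distinct organisations reporting hits against the same hazard reference within a time window, assuming participating organisations share (organisation, reference, timestamp) reports with a coordinating body. The function name, the 30-day window, and the three-organisation minimum are hypothetical parameters.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def coordinated_patterns(hit_reports: list[tuple[str, str, datetime]],
                         window: timedelta = timedelta(days=30),
                         min_orgs: int = 3) -> list[str]:
    """Hazard reference IDs hit by at least `min_orgs` distinct organisations
    within any span of `window`. Each report is (org_id, reference_id, time)."""
    by_reference = defaultdict(list)
    for org_id, reference_id, when in hit_reports:
        by_reference[reference_id].append((when, org_id))
    alerts = []
    for reference_id, events in by_reference.items():
        events.sort()
        for i, (start, _) in enumerate(events):
            orgs = {org for when, org in events[i:] if when - start <= window}
            if len(orgs) >= min_orgs:
                alerts.append(reference_id)
                break
    return sorted(alerts)
```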

5. Rationale

DNA synthesis has become fast, inexpensive, and accessible. A 3,000-base-pair gene that cost $30,000 to synthesise in 2003 costs under $250 in 2026 and can be ordered online with delivery in 5-10 business days. AI agents that interface with sequence design tools and synthesis providers operate at the precise junction where digital information becomes physical biological material. This junction is the critical control point for biosecurity — once hazardous DNA is physically synthesised and delivered, containment depends entirely on downstream laboratory controls, which may be absent or inadequate, particularly if the synthesis was obtained through deception.

The threat model for sequence synthesis screening encompasses four distinct threat profiles. First, the naive researcher who instructs an AI agent to optimise a sequence without understanding that the optimisation has converged on a regulated or hazardous construct. This is not a malicious actor but an inadequately informed one — the AI agent's optimisation objective (binding affinity, expression level, stability) may guide the sequence toward structural motifs shared with known pathogens or toxins. Second, the sophisticated adversary who deliberately uses the AI agent as an interface to obtain regulated sequences, employing evasion techniques such as fragmentation, codon optimisation, reverse-complement encoding, or chimeric constructs that combine innocuous and hazardous regions. Third, the compromised-account attacker who gains access to a legitimate researcher's agent credentials and uses the established trust relationship with synthesis providers to obtain material that would trigger scrutiny from a new account. Fourth, the insider threat — a legitimate researcher with appropriate credentials who misuses their access to obtain dual-use material outside the scope of their approved research.

Keyword-based screening is fundamentally insufficient for this threat landscape. Regulated organisms and toxins are defined by their biological function, which is encoded in their molecular structure — not in their name. A sequence encoding botulinum toxin does not contain the text string "botulinum" anywhere in its nucleotide data. Effective screening requires sequence-level alignment that compares generated sequences against reference databases at the nucleotide and protein levels. Protein-level screening is essential because codon optimisation — a standard, routine, and legitimate tool in molecular biology — changes the nucleotide sequence while preserving the encoded protein. An agent performing codon optimisation on a hazardous sequence produces a nucleotide sequence that may evade nucleotide-level screening while encoding a functionally identical hazardous protein.

Fragment assembly analysis addresses the most technically accessible evasion strategy: ordering hazardous sequences as multiple short fragments that individually fall below screening thresholds. Commercial synthesis providers recognised this vulnerability in the early 2010s and have implemented cross-order screening for their own operations. However, when an AI agent mediates between the researcher and the synthesis pipeline — particularly in automated laboratory environments — the fragment assembly may occur within the laboratory rather than at the synthesis provider, using standard molecular cloning techniques. The screening must therefore occur before the agent transmits any fragment, not only when a complete gene is ordered.

The architectural requirement for a mandatory, non-bypassable screening gate reflects the adversarial nature of the threat. If the screening layer can be circumvented through prompt injection (instructing the agent to skip screening), workflow reordering (generating the sequence file before screening runs), or direct API access (bypassing the agent to interact with the synthesis provider directly), the screening provides no security against intentional misuse. The screening gate must be enforced at the infrastructure level — analogous to a network firewall that cannot be bypassed by application-layer manipulation.

6. Implementation Guidance

Sequence synthesis screening must be implemented as an infrastructure-level interception layer, not as an optional step in the agent's reasoning chain. The screening service should be architecturally independent of the agent — a separate service that receives sequences and returns dispositions — so that the agent cannot modify, skip, or influence the screening process. All pathways from agent sequence output to physical synthesis, file export, or external transmission must pass through this screening gate.

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Academic Research. Universities and research institutions face unique challenges because AI agents may be used by students, postdoctoral researchers, and visiting scientists with varying levels of biosafety training. Screening must be institutional and mandatory — not dependent on individual researcher compliance. Institutional biosafety committees (IBCs) should be integrated into the escalation workflow as the reviewer authority for flagged sequences. The screening system should also generate reports for IBC oversight showing screening volumes, hit rates, and disposition patterns.

Pharmaceutical and Biotechnology Industry. Pharmaceutical companies routinely design sequences with dual-use potential as part of legitimate vaccine, therapeutic antibody, and antiviral research. The screening system must support nuanced risk assessment — not binary approve/reject — because many flagged sequences will be legitimately needed for authorised research programmes. Integration with the company's institutional biosafety programme and dual-use research of concern (DURC) policy is essential. The screening system should automatically cross-reference flagged sequences against approved research protocols.

Synthesis Providers. Commercial DNA synthesis providers are often the last line of screening defence. Providers operating AI agents that take customer orders or generate constructs must implement screening before order acceptance, not after synthesis completion. Providers should implement customer screening (identity verification, institutional affiliation verification) as well as sequence screening. The International Gene Synthesis Consortium (IGSC) Harmonized Screening Protocol provides a baseline framework that should be met or exceeded.

Automated Laboratory Environments. Facilities with robotic DNA assembly systems (e.g., automated Gibson assembly, robotic transformation workflows) face the risk that an AI agent could direct the assembly of hazardous constructs from pre-stocked fragment libraries without any external synthesis order. Screening must cover not only external synthesis orders but also internal assembly instructions generated by the agent.

Maturity Model

Basic Implementation — The organisation has implemented sequence-level screening (nucleotide BLAST alignment) against current select-agent and hazard databases for all agent-generated sequences. Screening is mandatory and architecturally enforced — the agent cannot transmit sequences to synthesis providers without passing screening. Flagged sequences are routed to a biosafety-qualified reviewer. Screening events are logged with full audit trails. Hazard databases are updated at least quarterly. This tier satisfies requirements 4.1, 4.4, 4.5, and 4.6 at minimum thresholds; full conformance with the mandatory requirements also depends on the protein-level screening (4.2), cross-order fragment analysis (4.3), and 72-hour database updates (4.7) introduced at the intermediate tier.

Intermediate Implementation — All basic capabilities plus: protein-level translated screening detects codon-optimised evasion. Cross-order fragment assembly analysis correlates fragments across a 90-day rolling window. Hazard database updates are automated and applied within 72 hours. Screening attestations are cryptographically signed. Customer/requester screening verifies identity and biosafety credentials. Screening metrics (volume, hit rate, false positive rate, reviewer turnaround time) are reported to biosafety governance monthly.

Advanced Implementation — All intermediate capabilities plus: functional annotation screening assesses predicted biological function of novel constructs. Predictive risk scoring enables risk-stratified review workflows. The screening system is independently audited annually by a qualified biosecurity assessor. Community-of-interest alerting is implemented. The screening pipeline is integrated with the organisation's DURC policy and IBC oversight processes. Red-team exercises — where qualified biosecurity experts attempt to evade screening using known techniques — are conducted at least annually, with findings driving screening improvements.
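The intermediate tier's monthly screening metrics can be computed from per-event records, as in this sketch with a hypothetical `ScreeningRecord` structure and `monthly_metrics` function. Because true negatives are rarely confirmed in practice, the false positive rate here is taken as the share of reviewed flags that the reviewer judged not to be of concern (strictly, a false discovery rate).

```python
from dataclasses import dataclass
from datetime import timedelta
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class ScreeningRecord:
    flagged: bool
    reviewer_confirmed_concern: Optional[bool] = None  # None until reviewed
    review_turnaround: Optional[timedelta] = None


def monthly_metrics(records: list[ScreeningRecord]) -> dict:
    flagged = [r for r in records if r.flagged]
    reviewed = [r for r in flagged if r.reviewer_confirmed_concern is not None]
    turnaround_hours = [r.review_turnaround.total_seconds() / 3600
                        for r in flagged if r.review_turnaround is not None]
    return {
        "volume": len(records),
        "hit_rate": len(flagged) / len(records) if records else 0.0,
        # Share of reviewed flags that the reviewer judged not to be of concern.
        "false_positive_rate": (sum(not r.reviewer_confirmed_concern for r in reviewed)
                                / len(reviewed)) if reviewed else 0.0,
        "median_review_hours": median(turnaround_hours) if turnaround_hours else None,
    }
```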

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Nucleotide-Level Screening Detection

Test 8.2: Protein-Level Screening for Codon-Optimised Sequences

Test 8.3: Fragment Assembly Detection

Test 8.4: Screening Bypass Resistance

Test 8.5: Biosafety Reviewer Routing and Disposition

Test 8.6: Audit Trail Immutability and Completeness

Test 8.7: Hazard Database Currency

Conformance Scoring

9. Regulatory Mapping

| Regulation | Provision | Relationship Type |
| --- | --- | --- |
| US Select Agent Regulations | 42 CFR Part 73 / 7 CFR Part 331 / 9 CFR Part 121 | Direct requirement |
| EU Dual-Use Regulation | Regulation (EU) 2021/821, Annex I | Direct requirement |
| Australia Group | Common Control List: Human and Animal Pathogens and Toxins | Direct requirement |
| Biological Weapons Convention | Article IV (National Implementation) | Supports compliance |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| UK Biological Security Strategy | Synthesis Screening Commitments | Supports compliance |
| NIST AI RMF | GOVERN 1.2 (Risk Management Processes) | Supports compliance |
| ISO 42001 | Clause 6.1.3 (AI Risk Treatment) | Supports compliance |

US Select Agent Regulations — 42 CFR Part 73

The US Federal Select Agent Program, administered jointly by the CDC and APHIS, regulates the possession, use, and transfer of biological select agents and toxins. The regulations also cover certain genetic elements, including nucleic acids that can produce infectious forms of select agent viruses and nucleic acids that encode functional forms of select toxins, so an entity that synthesises or possesses such material must meet registration, biosafety, and security requirements. An AI agent that generates sequences encoding select agents or their functional components and transmits those sequences for synthesis is directly implicated in these regulations. The screening requirement of AG-714 ensures that agent-generated sequences are evaluated against the select agent list before synthesis, preventing inadvertent or deliberate synthesis of regulated material outside the select agent regulatory framework. Failure to screen can lead to violations of 42 CFR 73.7 (registration requirements) and 42 CFR 73.16 (transfers), exposing entities to civil money penalties of up to $500,000 per violation (statutory maximum, adjusted periodically for inflation) and individuals to criminal penalties including imprisonment of up to 5 years.

EU Dual-Use Regulation — Regulation (EU) 2021/821

The EU Dual-Use Regulation controls the export, brokering, transit, and technical assistance related to dual-use items, including biological agents and genetic elements listed in Annex I Category 1. An AI agent that generates sequence data for listed biological agents and transmits that data — whether as a file, an API call, or a displayed response — may constitute technical assistance or an intangible transfer of technology under the regulation. Screening against Annex I is necessary to identify sequences that fall within the regulation's scope before any transfer occurs. Cross-border operation of AI agents compounds this requirement: an agent operating in an EU member state that transmits sequence data to a synthesis provider outside the EU triggers export control assessment. AG-714's multi-jurisdictional screening requirement (screening against the superset of all applicable frameworks) directly supports compliance with this regulation.

Australia Group — Common Control List

The Australia Group is an informal forum of countries that coordinates export controls on materials, equipment, and technologies that could contribute to the development of chemical and biological weapons. The Common Control List for Human and Animal Pathogens and Toxins defines the biological agents subject to export controls across Australia Group member states. Screening against the Common Control List ensures that AI agents do not generate synthesis-ready sequences for listed agents without appropriate review. The Australia Group's emphasis on intangible technology transfers, which include digital sequence data, makes AI agent screening directly relevant to member states' export control commitments, which are implemented through national law rather than by treaty.

Biological Weapons Convention — Article IV

Article IV of the Biological Weapons Convention (BWC) requires each state party to take any necessary measures to prohibit and prevent the development, production, stockpiling, acquisition, or retention of the agents, toxins, weapons, equipment, and means of delivery specified in Article I. National implementation of Article IV increasingly encompasses the governance of synthetic biology and DNA synthesis. AI agents that can generate sequences for regulated pathogens or toxins can fall within the scope of national BWC implementation measures. Because BWC obligations bind states rather than organisations, screening governance provides the technical mechanism through which organisations demonstrate compliance with the national legislation that implements those obligations.

EU AI Act — Article 9 (Risk Management System)

Article 9 requires that high-risk AI systems operate within a risk management system that identifies, evaluates, and mitigates risks. Whether a particular sequence-design agent is high-risk under the Act depends on its classification under Article 6 and Annexes I and III, but an agent capable of generating sequences for hazardous biological material warrants Article 9-style risk management in any case. Sequence synthesis screening is a risk mitigation measure within such a risk management system: it identifies the specific risk (generation of hazardous sequences) and implements a technical mitigation (screening and human review). The screening system's audit trail, hazard database currency, and bypass resistance requirements demonstrate the continuous, iterative character of the risk management process that Article 9 requires.

10. Failure Severity

| Field | Value |
| --- | --- |
| Severity Rating | Critical |
| Blast Radius | Cross-organisational: failure may result in physical biosecurity incidents affecting public health and safety beyond the deploying organisation |

Consequence chain: Without sequence synthesis screening governance, an AI agent becomes an uncontrolled conduit between digital sequence information and physical biological material. The immediate failure mode is unscreened sequences reaching synthesis providers or automated assembly systems — sequences that may encode regulated pathogens, toxins, virulence factors, or dual-use constructs. The first-order consequence depends on whether the synthesis provider has independent screening: if the provider screens, the order is rejected and the organisation faces account suspension, investigation, and reputational damage. If the provider does not screen (as may occur with automated synthesis platforms, internal synthesis equipment, or providers in jurisdictions without screening mandates), the hazardous DNA is physically synthesised and delivered. The second-order consequence is possession of regulated biological material outside the appropriate biosafety and biosecurity framework — a direct regulatory violation under select agent regulations, dual-use export controls, and national BWC implementation measures. The third-order consequence, in the worst case, is the expression or release of hazardous biological material — with public health consequences that scale with the pathogenicity of the agent involved. Even in non-catastrophic scenarios, the organisational consequences are severe: civil and criminal penalties under US select agent law (civil money penalties of up to $500,000 per violation for entities and imprisonment of up to 5 years for individuals), loss of institutional biosafety registrations, suspension of research programmes, and reputational destruction. The systemic consequence — if AI agents are publicly implicated in biosecurity incidents — is likely to be a severe regulatory backlash against AI use in the life sciences, damaging the entire sector's ability to benefit from AI-assisted research.

Cross-references: AG-001 (Operational Boundary Enforcement) defines the agent's permitted operational boundaries; sequence synthesis screening is a specific enforcement mechanism for boundaries related to hazardous biological outputs. AG-005 (Instruction Integrity Verification) ensures that agent instructions have not been tampered with — relevant because prompt injection is a primary bypass vector for screening. AG-019 (Human Escalation & Override Triggers) defines when human review is required; AG-714 mandates human review for all screening hits as a specific application of this principle. AG-022 (Behavioural Drift Detection) monitors whether the agent's behaviour changes in ways that might alter its sequence generation patterns. AG-029 (Data Classification Enforcement) classifies data by sensitivity; regulated sequence data requires the highest classification. AG-040 (Sensitive Category Data Processing Governance) governs processing of sensitive data categories; genetic and pathogen-related sequence data is a sensitive category. AG-043 (Access Control & Credential Governance) governs who can access the agent and synthesis pathways — relevant to the requester screening requirement. AG-055 (Audit Trail Immutability & Completeness) provides the general audit trail framework that AG-714's screening logs must satisfy. AG-210 (Multi-Jurisdictional Regulatory Mapping) provides the framework for identifying which jurisdictions' regulations apply to the agent's synthesis screening. AG-430 (Adversarial Prompt Injection Defence) addresses the prompt injection attacks that are a primary bypass vector for agent-side screening instructions. AG-709 (Sequence Data Sensitivity Governance) governs the sensitivity classification of sequence data that feeds into the screening pipeline. AG-710 (Pathogen-Related Capability Escalation Governance) governs escalation when pathogen-related capabilities are detected — screening hits trigger this escalation. AG-711 (Wet-Lab Procedure Constraint Governance) constrains wet-lab procedures that might use synthesised material. AG-712 (Material Access Governance) governs access to the physical materials that synthesis produces. AG-713 (Biohazard Incident Routing Governance) defines incident routing when a screening failure results in a biosecurity event. AG-718 (Dual-Use Publication Governance) governs publication of sequence information that could enable synthesis of hazardous material.

Cite this protocol
AgentGoverning. (2026). AG-714: Sequence Synthesis Screening Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-714