AG-748

Dangerous Knowledge Uplift Prevention Governance

Safety and Harm Prevention Governance · AGS v2.1 · 2026-04-25
EU AI Act · NIST AI RMF · ISO 42001

1. Definition

Dangerous knowledge uplift prevention governance addresses the risk that agentic systems serve as force multipliers for individuals or groups seeking to acquire knowledge, skills, or operational capabilities that could enable mass-casualty attacks, critical infrastructure disruption, or other catastrophic harm. "Uplift" refers specifically to the scenario where an agent provides a meaningful increase in a threat actor's capability — not merely confirming information already widely available, but providing synthesis, operational guidance, troubleshooting assistance, or step-by-step instructions that materially reduce the barriers to executing a dangerous action. This dimension governs the controls that must be in place to detect and prevent such uplift across all knowledge domains where the risk is present, including but not limited to chemical, biological, radiological, nuclear (CBRN) weapons; cyberweapons and exploitation techniques; critical infrastructure attack methodologies; and dual-use technologies with significant misuse potential.

The uplift risk is distinct from the general risk of harmful content generation because it depends on the interaction between the agent's capabilities and the threat actor's existing knowledge level. An agent that provides basic chemistry education to a student poses no uplift risk. The same agent providing synthesis pathway optimisation for a specific precursor chemical to someone who has already demonstrated knowledge of the target compound's properties and has asked specific questions about yield improvement represents a materially different risk profile. Effective uplift prevention therefore requires contextual assessment — evaluating not just the information requested but the pattern of requests, the specificity of the queries, the progression of the conversation, and the indicators of existing knowledge and intent — rather than simple keyword-based content filtering.
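The difference between per-query filtering and contextual, session-level assessment can be sketched in a few lines. This is a minimal illustration, not the protocol's prescribed method: it assumes an upstream classifier already assigns each query a risk score in [0, 1], and the names `SessionTracker`, `QUERY_THRESHOLD`, `SESSION_THRESHOLD`, and `DECAY`, along with their values, are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; real values would come from calibration and red-teaming.
QUERY_THRESHOLD = 0.8    # a single query this risky is escalated on its own
SESSION_THRESHOLD = 1.5  # cumulative, decayed risk that triggers escalation
DECAY = 0.9              # older queries count slightly less than recent ones


@dataclass
class SessionTracker:
    """Accumulates per-query risk so that escalating patterns become visible."""
    cumulative: float = 0.0
    history: list = field(default_factory=list)

    def assess(self, query_risk: float) -> str:
        """Return 'allow' or 'escalate' for one query.

        query_risk is assumed to come from an upstream classifier, in [0, 1].
        """
        if not 0.0 <= query_risk <= 1.0:
            raise ValueError("query_risk must be in [0, 1]")
        # Decay the running total, then add the new query's risk.
        self.cumulative = self.cumulative * DECAY + query_risk
        self.history.append(query_risk)
        if query_risk >= QUERY_THRESHOLD or self.cumulative >= SESSION_THRESHOLD:
            return "escalate"
        return "allow"


# Five moderately risky queries, none of which would be escalated in isolation.
tracker = SessionTracker()
decisions = [tracker.assess(0.4) for _ in range(5)]
# decisions: four 'allow' results, then 'escalate' on the fifth query
```

Each query here stays well below the per-query threshold, yet the session is escalated once the accumulated score crosses the session threshold. A keyword or per-query filter would have allowed all five.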

Failure in this dimension carries catastrophic consequences that are qualitatively different from the financial, operational, or reputational harms governed by most AGS dimensions. A single successful uplift event that enables a biological weapons attack could result in mass casualties. A successful uplift enabling a cyberattack on power grid control systems could cause widespread infrastructure failure. These are low-probability but extreme-consequence risks that justify advanced preventive controls even in deployments where the base rate of malicious use is expected to be very low. The asymmetry between the cost of prevention (additional latency, occasional false positives on legitimate research queries) and the cost of failure (potential mass casualties) makes this one of the highest-stakes governance dimensions in the AGS framework.

Governance in practice requires a layered approach: topic-level content policies that define prohibited knowledge domains, query-level intent assessment that evaluates individual requests for uplift risk, session-level progression analysis that detects escalating patterns of dangerous knowledge seeking, output-level review gates that intercept high-risk responses before delivery, and incident-level reporting that feeds confirmed uplift attempts into threat intelligence sharing frameworks. Controls must be calibrated to permit legitimate academic research, professional education, safety training, and public health communication while blocking operationally actionable synthesis of dangerous capabilities. This calibration is inherently imperfect and must be continuously refined through red-teaming, real-world incident analysis, and collaboration with domain-specific security experts.

The regulatory and policy basis for this dimension is well established. MLCommons AI Safety v0.5 explicitly includes CBRN and weapons hazard categories as primary evaluation dimensions. The UK AI Safety Institute conducts dedicated CBRN uplift evaluations as part of its frontier model assessment framework. Article 5 of the EU AI Act prohibits a defined set of AI practices that cause significant harm; it bears on AI-facilitated weapons capability uplift where a deployment falls within those practices (see Section 9). METR's dangerous capability evaluations provide standardised assessment protocols with which this dimension's test cases are designed to align. National security guidance from BEIS, DHS CISA, and the OPCW establishes the threat landscape against which uplift prevention controls must be calibrated. Organisations deploying capable agentic systems bear a responsibility to implement uplift prevention that is commensurate with their agents' knowledge capabilities, regardless of the intended deployment domain.

2. Scope

This dimension applies to all agentic system deployments capable of providing information, analysis, synthesis, or operational guidance in knowledge domains where the outputs could materially contribute to a threat actor's ability to cause mass-casualty harm, critical infrastructure disruption, or other catastrophic outcomes. It applies to agents with broad knowledge capabilities even when the intended deployment context is unrelated to dangerous knowledge domains, because broad-capability agents can be repurposed through adversarial prompting or legitimate-seeming query sequences.

3. Why This Matters

Dangerous knowledge uplift prevention addresses a governance gap that, if left unmanaged, creates systemic risk across the agent ecosystem. As AI agents move from experimental deployments to production operations with real-world consequences, the absence of structural controls in this area means that failures scale with the speed and autonomy of the agent population rather than with the pace of human review.

Traditional approaches to this governance challenge — contractual obligations, periodic audits, and application-layer policy enforcement — are necessary but insufficient for agentic contexts. Contractual obligations operate on legal timescales; agents operate on millisecond timescales. Periodic audits capture a snapshot; agent behaviour is continuous and dynamic. Application-layer enforcement can be bypassed through prompt injection, reasoning failure, or context manipulation. The AGS approach requires structural enforcement at the infrastructure layer — controls that operate independently of the agent's reasoning process and cannot be circumvented by the agent's own outputs.

The regulatory environment increasingly mandates the controls this dimension specifies. The EU AI Act requires risk management systems proportionate to identified risks. NIST AI RMF requires organisations to map, measure, and manage AI risks through enforceable controls. ISO 42001 requires an AI management system with documented operational procedures. This dimension operationalises these regulatory requirements into specific, testable, infrastructure-enforceable controls — bridging the gap between regulatory intent and technical implementation.

The consequences of absence are illustrated in Section 8 (Failure Scenarios). When this dimension is not implemented, the resulting governance gap permits agent behaviour that can cause material financial loss, regulatory enforcement action, reputational damage, and — in safety-critical deployments — physical harm. The blast radius scales with the agent's access scope and operational autonomy.

4. Requirements

4.1 Dangerous Knowledge Domain Classification

4.2 Query-Level Uplift Risk Assessment

4.3 Session-Level Progression Analysis

4.4 Output Review Gates

4.5 Incident Reporting and Threat Intelligence

4.6 Legitimate Use Preservation

5. Maturity Model

Basic Implementation — The organisation has documented policies addressing dangerous knowledge uplift prevention and has implemented initial controls. Implementation is primarily at the application layer with manual processes for monitoring and response. Logging covers key events but may lack full metadata. Coverage extends to the most critical agent deployments but may not encompass all in-scope systems. Staff are aware of requirements but formal training may be incomplete.

Intermediate Implementation — All Basic capabilities plus: controls are enforced at the infrastructure layer with automated monitoring and alerting. All MUST requirements from Section 4 are implemented with documented evidence. Coverage extends to all in-scope agent deployments. Audit trails are tamper-evident and retained per regulatory requirements. Formal change control governs all configuration changes. Regular review cycles are established and documented. Staff receive formal training and competency is assessed.

Advanced Implementation — All Intermediate capabilities plus: controls have been validated through independent adversarial testing. Real-time dashboards provide operational visibility into compliance status, anomaly detection, and response metrics. The organisation can demonstrate to regulators and counterparties that no known attack vector bypasses the governance controls. Continuous improvement processes incorporate lessons from incidents, testing, and regulatory developments. Integration with related dimensions provides defence-in-depth coverage.

Implementation Patterns

Tamper-evident audit trail. Implement all governance event logging in an append-only, integrity-protected data store independent of the agent runtime. Every governance decision, configuration change, and enforcement action is recorded with full metadata including timestamps, actor identities, and outcomes.
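One common way to make a log tamper-evident is a hash chain, in which each entry commits to the hash of the previous one. The sketch below is an illustrative assumption, not a mechanism specified by this protocol: the `AuditLog` class and its field names are hypothetical, and the in-memory list stands in for an append-only store that, in a real deployment, would live outside the agent runtime.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """Hash-chained log: editing any earlier entry breaks verification."""

    def __init__(self) -> None:
        self._entries: list = []

    def append(self, actor: str, event: str, outcome: str) -> dict:
        payload = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "event": event,
            "outcome": outcome,
        }
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        entry = {
            "payload": payload,
            "prev_hash": prev_hash,
            "hash": _digest(prev_hash, payload),
        }
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain from the start and compare every stored hash."""
        prev_hash = GENESIS_HASH
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True
```

A hash chain detects modification of stored entries but does not by itself prevent deletion of the whole log or truncation of its tail; anchoring the latest hash in a separate system is one way to cover that gap.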

Real-time monitoring with graduated alerting. Deploy monitoring infrastructure that evaluates governance compliance continuously rather than periodically. Implement graduated alert severity levels with defined response procedures for each level, ensuring that critical governance violations trigger immediate automated response.
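Graduated alerting can be pictured as a fixed mapping from severity level to a predefined response. The following is a minimal sketch under stated assumptions: the three severity levels, the score cut-offs, and the response names are all hypothetical placeholders rather than values defined by this protocol.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical response procedure for each level.
RESPONSES = {
    Severity.INFO: "log_only",
    Severity.WARNING: "notify_on_call_reviewer",
    Severity.CRITICAL: "block_and_page_incident_team",
}


def classify(risk_score: float) -> Severity:
    """Map a risk score in [0, 1] to a severity level (cut-offs are assumptions)."""
    if risk_score >= 0.8:
        return Severity.CRITICAL
    if risk_score >= 0.5:
        return Severity.WARNING
    return Severity.INFO


def respond(risk_score: float) -> tuple:
    """Return the severity and the predefined response for a governance event."""
    severity = classify(risk_score)
    return severity, RESPONSES[severity]
```

Keeping the level-to-response mapping in one table makes it straightforward to place under change control and to confirm that every level has a defined procedure.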

Scheduled governance review cycle. Establish a formal review cadence (minimum quarterly) that examines governance effectiveness, reviews incident data, assesses emerging risks, and updates policies and controls accordingly. Review outcomes are documented and tracked.

Separation of governance and agent runtime domains. Deploy governance enforcement infrastructure in a security domain separate from the agent runtime. The agent cannot influence governance decisions, modify enforcement configuration, or access governance logs directly. This architectural separation is the foundation for infrastructure-layer enforcement.

Defined escalation paths with human oversight integration. Establish clear escalation procedures for governance events that exceed automated response capability. Human oversight touchpoints are defined, documented, and tested. Override mechanisms require authenticated authorisation with full audit trail.

Anti-Patterns

Governance by instruction rather than infrastructure. Relying on agent system prompts or configuration files to enforce governance controls rather than infrastructure-layer enforcement. Instruction-based controls can be bypassed through prompt injection, context manipulation, or reasoning failure.

Monitoring without enforcement. Implementing detection and logging of governance violations without pre-execution blocking. By the time a violation is logged, the ungoverned action has already executed. Detection is necessary but not sufficient; prevention must be the primary control.
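The distinction between logging a violation and preventing it comes down to where the policy check sits relative to the action. The sketch below shows a pre-execution gate; it is illustrative only, and `enforce`, `PolicyViolation`, `check`, and `deliver` are hypothetical names with a stand-in risk field.

```python
from typing import Callable


class PolicyViolation(Exception):
    """Raised when a request is blocked before the action runs."""


def enforce(
    policy_check: Callable[[dict], bool],
    action: Callable[[dict], str],
    audit: list,
) -> Callable[[dict], str]:
    """Wrap an action so the policy check runs, and is logged, before it executes."""

    def gated(request: dict) -> str:
        allowed = policy_check(request)
        # The decision is recorded whether or not the action goes ahead.
        audit.append({"request_id": request.get("id"), "allowed": allowed})
        if not allowed:
            raise PolicyViolation(f"request {request.get('id')} blocked before execution")
        return action(request)

    return gated


executed = []  # records which requests actually ran


def deliver(request: dict) -> str:
    executed.append(request["id"])
    return "delivered"


def check(request: dict) -> bool:
    # Stand-in policy: block anything at or above a hypothetical risk cut-off.
    return request["risk"] < 0.8


audit: list = []
gated_deliver = enforce(check, deliver, audit)
```

Calling `gated_deliver` with a high-risk request raises `PolicyViolation` and leaves `executed` untouched, while the audit list still records the decision. A monitoring-only design would append to the audit list after `deliver` had already run.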

Manual processes for machine-speed operations. Relying on human review processes for governance decisions that occur at machine speed. Agents execute actions in milliseconds; governance controls that depend on human review cycles of hours or days leave gaps that scale with agent autonomy.

6. Test Criteria

Test Case 6.1: Direct Dangerous Knowledge Request Blocking

Test Case 6.2: Decomposed Query Detection

Test Case 6.3: Legitimate Research Preservation

Test Case 6.4: Adversarial Jailbreak Resistance for Uplift Queries

Test Case 6.5: Cross-Session Progression Detection

Test Case 6.6: Dual-Use Knowledge Boundary Calibration

Test Case 6.7: Tool-Augmented Uplift Prevention

Evidence Artefacts

7.1 Dangerous knowledge domain taxonomy document, version-controlled, with review dates and approval records. Retention: 10 years.

7.2 Uplift risk assessment configuration records including risk scoring models, threshold definitions, and calibration history. Retention: 7 years.

7.3 Query-level uplift risk assessment logs for all assessed queries, including risk scores and routing decisions. Retention: 3 years for routine queries; 10 years for queries that triggered escalation.

7.4 Session-level progression analysis logs including detected patterns and escalation actions. Retention: 7 years.

7.5 Human review gate decision records including reviewer identity, domain expertise verification, decision, and rationale. Retention: 10 years.

7.6 Red-team evaluation reports for uplift prevention controls, including test scenarios, detection rates, and identified gaps. Retention: 7 years.

7.7 Legitimate-use exception records including requestor credentials, justification, approval, and usage audit trail. Retention: 5 years.

7.8 Uplift incident register recording all confirmed uplift attempts, including query content, detection point, response action, and any referrals to law enforcement or security authorities. Retention: indefinite.

7.9 Threat intelligence integration records documenting how external threat intelligence feeds informed taxonomy updates and control calibration. Retention: 5 years.

7.10 Domain expert consultation records for human review gate decisions requiring specialist knowledge, including expert identity, qualifications, and consultation outcome. Retention: 10 years.

7.11 False-positive analysis reports documenting the impact of uplift prevention controls on legitimate research and education use cases, including calibration adjustments made. Retention: 5 years.
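The retention periods listed in 7.1 to 7.11 can be encoded as a lookup table so that expiry dates are computed consistently. This is a sketch under one explicit assumption that the source does not state: retention is counted from the record's creation date. The function and constant names are hypothetical.

```python
from datetime import date
from typing import Optional

# Retention in years per evidence artefact; None means indefinite (7.8).
RETENTION_YEARS = {
    "7.1": 10, "7.2": 7, "7.3": 3, "7.4": 7, "7.5": 10, "7.6": 7,
    "7.7": 5, "7.8": None, "7.9": 5, "7.10": 10, "7.11": 5,
}
ESCALATED_QUERY_LOG_YEARS = 10  # 7.3: queries that triggered escalation


def retention_expiry(
    artefact_id: str, created: date, escalated: bool = False
) -> Optional[date]:
    """Return the earliest disposal date, or None if retention is indefinite.

    Assumes retention runs from the creation date (not stated in the source).
    """
    years = RETENTION_YEARS[artefact_id]
    if artefact_id == "7.3" and escalated:
        years = ESCALATED_QUERY_LOG_YEARS
    if years is None:
        return None
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February created date with a non-leap target year.
        return created.replace(year=created.year + years, day=28)
```

For example, a routine query log created on 2026-04-25 would expire on 2029-04-25, while the same log for an escalated query would be kept until 2036-04-25.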

7. Scoring

| Score | Level | Description |
|---|---|---|
| 0 | No implementation | No dangerous knowledge uplift prevention governance exists. The organisation has no controls, policies, or monitoring in place for the capabilities this dimension governs. Agent behaviour in this area is ungoverned. |
| 1 | Basic | Basic controls exist but are enforced at the application layer, dependent on correct implementation rather than structural guarantees. Coverage may be partial. Configuration is not governed through formal change control. Logging exists but may lack full metadata. |
| 2 | Infrastructure-layer enforcement | Controls are enforced at the infrastructure layer, independent of the agent's reasoning process or instruction set. All requirements are structurally enforced with no application-layer bypass path. Full audit trail with tamper-evident logging. Configuration is governed through formal change control. |
| 3 | Verified by independent adversarial testing | All Level 2 capabilities are in place and have been validated through independent adversarial testing. An independent party has attempted to bypass, circumvent, or degrade the governance controls using known attack techniques relevant to this dimension and has failed. Test results are documented, reproducible, and available for regulatory review. |
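The scoring levels are cumulative, which can be made explicit as a small decision function. The sketch below is one possible reading of the level descriptions, reduced to boolean findings; the `Assessment` fields are hypothetical simplifications, and a real assessment would rest on the evidence artefacts rather than self-declared flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """Simplified assessor findings (field names are illustrative)."""
    controls_exist: bool
    infrastructure_enforced: bool
    tamper_evident_audit: bool
    formal_change_control: bool
    independent_adversarial_test_passed: bool


def score(a: Assessment) -> int:
    """Return the 0-3 score; each level requires everything below it."""
    if not a.controls_exist:
        return 0
    level_2 = (
        a.infrastructure_enforced
        and a.tamper_evident_audit
        and a.formal_change_control
    )
    if not level_2:
        return 1
    return 3 if a.independent_adversarial_test_passed else 2
```

Under this reading, passing an adversarial test does not lift a deployment above Level 1 if enforcement is still at the application layer, because Level 3 presupposes all Level 2 capabilities.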

8. Failure Scenarios

Example 3.1 — Research/Discovery Agent, Biological Weapons Synthesis Pathway Uplift

A university deploys a research discovery agent accessible to 8,500 students and faculty across life sciences, chemistry, and engineering departments to assist with literature review, experimental design, and methodology questions. Over a period of 3 weeks, a single user account — later identified as belonging to a postdoctoral researcher with no current institutional affiliation but using retained university credentials — conducts a series of 47 interactions with the agent. The initial queries are innocuous: general questions about protein expression systems, fermentation process optimisation, and aerosol particle dynamics. Over the subsequent sessions, the queries become progressively more specific: optimal growth conditions for a specific bacterial strain, techniques for enhancing toxin production yield, aerosolisation parameters for particles in the 1-5 micron respirable range, and methods for stabilising biological agents during dispersal. The agent, lacking session-level progression analysis and treating each query as independent, provides detailed technical responses drawing from its training data and the university's research literature corpus. The cumulative effect of the 47 interactions is a synthesis of operational knowledge that would normally require months of specialist literature review and laboratory experience to assemble. The user's account activity is flagged 4 days after the last interaction by a routine IT security review of unusual access patterns, not by the agent's own controls. Campus security and federal authorities are notified. Investigation reveals no completed attack, but the knowledge assembled through the agent interactions represents a material uplift in capability. The university faces regulatory scrutiny, suspends the agent deployment, and spends USD 680,000 on security review, control remediation, and legal response. No session-level progression analysis, uplift risk scoring, or dangerous knowledge topic gating was implemented.

Example 3.2 — General Copilot, Critical Infrastructure Attack Methodology Synthesis

A technology company makes available a general-purpose internal copilot to 15,000 employees. The copilot has broad knowledge capabilities and access to the company's internal knowledge base, which includes infrastructure documentation for the company's data centre operations. An employee who is planning to leave the company and has grievances uses the copilot over a 2-week period to assemble detailed knowledge about the company's power distribution architecture, HVAC dependency chains, fire suppression system override procedures, and physical security system blind spots. Each individual query appears legitimate — an operations engineer asking about infrastructure systems they are responsible for — but the aggregate pattern represents systematic reconnaissance of single points of failure. The copilot provides detailed responses including specific equipment model numbers, failure modes, and override procedures. The employee subsequently uses this compiled knowledge to sabotage the primary power distribution panel during an overnight shift, causing a 14-hour outage that affects 3,200 client services. Total financial impact including SLA penalties, emergency repair, and client remediation exceeds USD 6.3 million. Post-incident analysis reveals that the copilot interactions constituted the primary knowledge acquisition channel — the employee had no prior familiarity with the specific systems targeted. The copilot had no controls to detect the pattern of escalating infrastructure vulnerability queries or to restrict access to operationally sensitive infrastructure details based on query context rather than user role alone.

9. Regulatory Mapping

| Regulation | Provision | Relationship Type |
|---|---|---|
| OWASP LLM Top 10 | LLM06: Sensitive Information Disclosure | _Pending v2.1 editorial review_ |
| MITRE ATLAS | AML.T0051: Exploit Public-Facing Application (AI misuse) | _Pending v2.1 editorial review_ |
| EU AI Act | Article 5: Prohibited AI Practices (where applicable) | _Pending v2.1 editorial review_ |
| NIST AI RMF | GOVERN 1.2 (AI Risk Tolerance), MAP 5.1 (Impact Assessment) | _Pending v2.1 editorial review_ |
| ISO/IEC 42001 | Clause 6.1 (Actions to Address Risks) | _Pending v2.1 editorial review_ |
| MLCommons AI Safety | v0.5: CBRN and weapons hazard categories | _Pending v2.1 editorial review_ |
| UK AISI Inspect | CBRN uplift evaluations; dangerous capability evaluations | _Pending v2.1 editorial review_ |
| Stanford HELM | Toxicity dimension (partial overlap) | _Pending v2.1 editorial review_ |
| Meta CyberSecEval | Cyberattack helpfulness tests | _Pending v2.1 editorial review_ |
| METR | Dangerous capability evaluations | _Pending v2.1 editorial review_ |

Calibration Between Security and Utility

The fundamental governance tension in AG-748 is between security (preventing any knowledge uplift that could enable catastrophic harm) and utility (preserving the agent's ability to support legitimate research, education, and professional activity in overlapping domains). This tension cannot be resolved through binary content blocking because the same information can represent either legitimate knowledge or dangerous uplift depending entirely on context, intent, and the threat actor's existing capability level. Effective governance requires contextual assessment, proportionate controls, and continuous calibration — accepting that some false positives on legitimate queries are a necessary cost of maintaining adequate security against the low-probability, extreme-consequence uplift scenario. The calibration should err toward security in CBRN and mass-casualty domains and toward utility in domains where the uplift delta is small (the information is widely available) or the harm potential is limited.
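The asymmetric calibration described above can be expressed as per-domain review thresholds. This is a minimal sketch with invented numbers: the domain keys, threshold values, and the relaxation applied to widely available information are all hypothetical, and real values would be set through red-teaming and false-positive analysis.

```python
# Lower threshold = stricter: more queries are routed to review.
DOMAIN_THRESHOLDS = {
    "cbrn": 0.3,                     # err toward security
    "critical_infrastructure": 0.5,
    "dual_use": 0.7,                 # err toward utility
}
DEFAULT_THRESHOLD = 0.7
WIDELY_AVAILABLE_RELAXATION = 0.2    # small uplift delta relaxes the threshold


def requires_review(domain: str, risk_score: float, widely_available: bool) -> bool:
    """Decide whether a query is routed to review under domain-specific calibration."""
    threshold = DOMAIN_THRESHOLDS.get(domain, DEFAULT_THRESHOLD)
    # In this sketch, CBRN thresholds are never relaxed, reflecting the
    # instruction to err toward security in mass-casualty domains.
    if widely_available and domain != "cbrn":
        threshold = min(1.0, threshold + WIDELY_AVAILABLE_RELAXATION)
    return risk_score >= threshold
```

With these placeholder values, a score of 0.4 triggers review in the CBRN domain even when the information is widely available, whereas a score of 0.75 in a dual-use domain triggers review only when the information is not widely available.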

Cite this protocol
AgentGoverning. (2026). AG-748: Dangerous Knowledge Uplift Prevention Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-748