GPU Tenancy Isolation Governance requires that organisations deploying AI agents on shared accelerator hardware implement structural controls to prevent data leakage, side-channel exposure, and computation spillover between tenants or workloads sharing the same physical GPU, multi-GPU node, or accelerator pool. Modern GPU architectures support multi-instance partitioning, virtual function pass-through, and time-sliced scheduling — but each mechanism has distinct isolation properties, failure modes, and residual risks that must be governed rather than assumed. Without explicit tenancy isolation governance, an agent processing classified financial data may share accelerator memory pages, cache hierarchies, or scheduling queues with an agent processing untrusted public inputs, creating cross-contamination vectors that bypass every application-layer security control.
Scenario A — Shared GPU Memory Residual Leaks Sensitive Data: A cloud-hosted AI platform runs two workloads on the same physical GPU using time-sliced scheduling: Workload A is a financial-value agent processing customer credit assessments containing full names, national insurance numbers, income statements, and credit scores for a UK retail bank. Workload B is a customer-facing chatbot for a separate tenant on the same platform. The platform relies on the GPU driver's default memory management to clear allocations between context switches. A vulnerability in the driver (similar to CVE-2023-25516 affecting a major GPU vendor's kernel module) allows Workload B to read residual memory from Workload A's prior allocation. Over a 72-hour period, 4,200 partial customer records — including 1,847 national insurance numbers — are extractable from memory residuals. The breach is discovered when a security researcher demonstrates the extraction technique to the platform provider.
What went wrong: The organisation relied on driver-level memory clearing as the sole isolation mechanism. No governance policy required verification of memory sanitisation between tenant context switches. The time-sliced scheduling mode shares the full GPU memory space between workloads, with isolation dependent entirely on software-layer cleanup. The data classification of Workload A (containing PII per AG-015) was never evaluated against the isolation properties of the scheduling mode. Consequence: Breach of 1,847 national insurance numbers, mandatory ICO notification under UK GDPR Article 33, estimated remediation cost £2.3 million including customer notification, credit monitoring, and regulatory fine. Platform provider faces contractual liability to the bank under the data processing agreement.
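The failure mode in Scenario A can be probed directly. Below is a minimal residual-memory probe in the spirit of the verification duty in requirement 4.3: it allocates uninitialised device buffers and scans them for non-zero bytes. This is an illustrative sketch assuming CuPy and a CUDA-capable GPU; whether residuals are visible depends on the driver, the scheduling mode, and allocator pool behaviour, so a zero result does not by itself prove isolation.

```python
# Illustrative residual-memory probe: allocate uninitialised GPU buffers and
# scan them for non-zero bytes left behind by a previous context. Assumes
# CuPy and a CUDA GPU. Note that CuPy's default pool often recycles memory
# from the same process, so blocks are released each iteration to request
# fresh allocations from the driver.
import cupy as cp

def scan_for_residuals(num_buffers: int = 32,
                       buf_bytes: int = 64 * 1024 * 1024) -> float:
    """Return the fraction of freshly allocated bytes that are non-zero."""
    pool = cp.get_default_memory_pool()
    dirty = total = 0
    for _ in range(num_buffers):
        pool.free_all_blocks()                     # ask the driver for fresh memory
        buf = cp.empty(buf_bytes, dtype=cp.uint8)  # uninitialised device buffer
        host = buf.get()                           # copy to host for inspection
        dirty += int((host != 0).sum())
        total += host.size
        del buf
    return dirty / total

if __name__ == "__main__":
    print(f"non-zero bytes at allocation: {scan_for_residuals():.6%}")
```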
Scenario B — Side-Channel Attack via GPU Cache Timing: An adversary deploys a seemingly benign inference workload on a shared accelerator cluster. The workload is co-located on the same multi-GPU node as a safety-critical agent performing real-time structural integrity assessments for a civil engineering firm. The adversarial workload performs carefully timed memory access patterns designed to probe the GPU's L2 cache, measuring timing differences that reveal access patterns of the co-located safety-critical workload. Over 14 days, the adversary reconstructs the model weights and inference patterns of the structural integrity model, including the safety thresholds used to classify structures as pass or fail. The adversary sells this information to a contractor who uses it to game the assessment system, submitting structural data calibrated to narrowly pass the thresholds. A building assessed using the compromised thresholds develops structural defects 18 months later.
What went wrong: The multi-GPU node shared L2 cache hierarchies between workloads from different tenants. No governance policy required cache partitioning or side-channel resistance assessment for safety-critical workloads. The platform's isolation controls addressed memory isolation but not microarchitectural side channels. The safety-critical workload was never assessed for co-tenancy risk. Consequence: Compromised safety assessment model, structural defects in a building, potential loss of life, regulatory investigation under building safety regulations, criminal liability investigation for the contractor, civil liability for the engineering firm and platform provider.
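Requirement 4.11 (below) speaks to continuous side-channel monitoring that could have surfaced this probing activity. As a flavour of what such monitoring can look like, here is a deliberately simple statistical detector over kernel-latency samples. The telemetry source, window size, and z-score threshold are illustrative assumptions; a production detector would draw on richer hardware performance counters.

```python
# Illustrative side-channel activity monitor: flags samples whose latency
# deviates sharply from a rolling baseline, which can indicate cache-probing
# bursts by a co-tenant. Thresholds and the telemetry feed are assumptions.
from collections import deque
from statistics import mean, stdev

class TimingAnomalyMonitor:
    """Flag latency samples that deviate sharply from a rolling baseline."""

    def __init__(self, window: int = 512, z_threshold: float = 4.0):
        self.samples = deque(maxlen=window)   # rolling baseline (microseconds)
        self.z_threshold = z_threshold

    def observe(self, latency_us: float) -> bool:
        """Record one sample; return True if it looks anomalous."""
        if len(self.samples) >= 32:           # require a minimal baseline first
            mu, sigma = mean(self.samples), stdev(self.samples)
            if sigma > 0 and abs(latency_us - mu) / sigma > self.z_threshold:
                # Candidate probing burst; do not fold it into the baseline.
                return True
        self.samples.append(latency_us)
        return False
```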
Scenario C — Noisy Neighbour Degrades Real-Time Agent Performance: A financial-value agent executing algorithmic trading strategies shares a GPU pool with a batch training workload from another tenant. The training workload saturates memory bandwidth on the shared interconnect during a large gradient synchronisation pass. The trading agent's inference latency spikes from 2ms to 340ms during the saturation event — a 170x degradation. During this 47-second window, the agent cannot execute time-sensitive trades and misses a market movement. The strategy accumulates £890,000 in opportunity cost because hedging positions were not placed within the required 5ms window. The trading firm's risk management system flags the latency anomaly but only after the window has closed.
What went wrong: The GPU pool allowed unrestricted bandwidth consumption by co-tenants. No quality-of-service partitioning guaranteed minimum bandwidth or compute capacity for the trading workload. The infrastructure provider's SLA guaranteed "available compute" but did not address shared-resource contention. The trading firm did not assess whether the GPU pool's tenancy model met its latency requirements under adversarial co-tenant conditions. Consequence: £890,000 in direct opportunity cost, potential FCA scrutiny under algorithmic trading systems requirements (MiFID II RTS 6, Article 17), SLA dispute with infrastructure provider, mandatory review of all infrastructure tenancy arrangements.
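Scenario C's loss was compounded by after-the-fact detection. A sketch of an in-line latency circuit breaker follows: it tracks a rolling p99 and halts order submission the moment the assumed 5 ms budget is breached, rather than flagging the anomaly after the window closes. The budget, window size, and class interface are illustrative assumptions, not a prescribed design.

```python
# Illustrative latency circuit breaker for a latency-critical agent: trip as
# soon as rolling p99 latency breaches the inference budget, so the agent
# stops trading during contention rather than trading on stale decisions.
from collections import deque

class LatencyCircuitBreaker:
    """Trip as soon as rolling p99 latency breaches the inference budget."""

    def __init__(self, slo_ms: float = 5.0, window: int = 200):
        self.slo_ms = slo_ms                  # assumed end-to-end budget
        self.latencies = deque(maxlen=window)
        self.tripped = False

    def record(self, latency_ms: float) -> None:
        self.latencies.append(latency_ms)
        ordered = sorted(self.latencies)
        p99 = ordered[int(0.99 * (len(ordered) - 1))]
        if p99 > self.slo_ms:
            self.tripped = True               # halt orders until operators reset

    def may_trade(self) -> bool:
        return not self.tripped
```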
Scope: This dimension applies to any AI agent deployment where inference, fine-tuning, or embedding generation occurs on GPU, TPU, or other accelerator hardware that is or could be shared with other workloads — whether those workloads belong to the same organisation (internal multi-tenancy), different organisations (external multi-tenancy), or the same organisation but different trust domains (cross-classification multi-tenancy). The scope includes cloud-hosted GPU instances, on-premises GPU clusters with multi-tenant scheduling, edge accelerators shared between applications, and any accelerator pool where the organisation does not have exclusive physical control of the entire hardware device. Deployments on dedicated, single-tenant hardware where the organisation controls the entire physical device from power-on to decommission are excluded, provided the organisation can demonstrate exclusive tenancy through infrastructure evidence. The scope extends to the full accelerator hardware stack: compute cores, memory (HBM, GDDR, SRAM), cache hierarchies (L1, L2, shared caches), memory controllers, interconnects (NVLink, NVSwitch, PCIe), and scheduling subsystems. A workload that is isolated at the compute core level but shares a cache hierarchy with an untrusted workload is within scope.
4.1. A conforming system MUST classify every accelerator workload by data sensitivity tier (aligned with AG-014) and determine the minimum isolation level required before any workload is scheduled on shared hardware.
4.2. A conforming system MUST enforce hardware-level memory isolation between workloads of different tenants or different data classification tiers, preventing any workload from reading, writing, or inferring the memory contents of another workload through direct access, residual data, or timing side channels.
4.3. A conforming system MUST verify that GPU memory is sanitised between tenant context switches — either through hardware-enforced clearing, cryptographic memory isolation, or dedicated partitioning that eliminates shared memory surfaces entirely.
4.4. A conforming system MUST prohibit co-location of workloads from different data classification tiers on the same physical accelerator unless the isolation mechanism has been independently validated to prevent cross-tier data leakage under adversarial conditions.
4.5. A conforming system MUST maintain an auditable record of workload-to-hardware placement, including which workloads shared which physical accelerators, which isolation mechanisms were active, and the data classification tier of each co-located workload, for the full retention period defined in Section 7.
4.6. A conforming system MUST implement fail-safe behaviour when GPU isolation mechanisms are unavailable or degraded: workloads requiring isolation MUST be suspended rather than scheduled without isolation.
4.7. A conforming system SHOULD implement hardware-partitioned multi-instance GPU configurations (or equivalent technology-neutral partitioning) for workloads processing data classified as confidential or above, providing dedicated compute, memory, and cache resources rather than relying on time-sliced sharing.
4.8. A conforming system SHOULD monitor accelerator hardware for known side-channel vulnerabilities and apply mitigations (firmware updates, driver patches, scheduling restrictions) within a defined SLA from vulnerability disclosure.
4.9. A conforming system SHOULD implement performance isolation guarantees — minimum compute capacity, memory bandwidth, and interconnect bandwidth — for each workload, preventing co-tenant resource contention from degrading agent performance below defined thresholds.
4.10. A conforming system MAY implement cryptographic memory tagging or hardware memory encryption as defence-in-depth, layered on top of primary partitioning mechanisms so that data remains protected even if those mechanisms fail or are bypassed.
4.11. A conforming system MAY implement continuous side-channel monitoring that detects anomalous memory access patterns indicative of cache-timing or rowhammer-style attacks against co-located workloads.
GPU tenancy isolation addresses a class of risk that is invisible at the application layer but critical at the infrastructure layer. When an AI agent performs inference, the model weights, activations, input data, and output data all reside in accelerator memory. This memory is physically shared hardware — the same DRAM chips, the same cache lines, the same memory controllers. If two workloads share this hardware, the isolation between them is entirely a function of how the hardware and its software stack partition those shared resources.
The risk is not theoretical. GPU side-channel attacks have been demonstrated in academic research since at least 2018, with practical exploits published against major GPU architectures. CVE-2023-25516 and related vulnerabilities in GPU kernel-mode drivers demonstrated that memory isolation between GPU contexts was insufficient to prevent cross-context data leakage. The GPU.zip attack (2023) demonstrated that GPU data compression side channels could leak visual data across security boundaries in web browsers — a technique that extends naturally to inference workload isolation in cloud environments. These are not exotic attacks requiring nation-state capabilities; they are implementable by moderately skilled adversaries with co-tenant access.
The governance challenge is that GPU isolation is significantly less mature than CPU isolation. Modern CPUs benefit from decades of hardening against side-channel attacks (Spectre, Meltdown, MDS), hardware memory tagging (ARM MTE), hardware memory encryption (Intel TME), and virtualisation-level isolation (VT-x, AMD-V). GPU architectures have historically prioritised throughput over isolation, and their multi-tenancy features — multi-instance GPU, virtual GPU, time-slicing — have different isolation properties that are not well understood by most deploying organisations. A multi-instance GPU partition provides dedicated compute and memory; a time-sliced context shares both compute and memory, with isolation dependent on software context switching. These are fundamentally different security postures, but both are marketed as "multi-tenancy solutions."
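Because partitioned and time-sliced modes are marketed interchangeably, the deployed mode should be verified rather than assumed. The sketch below checks whether an NVIDIA device is actually in MIG (hardware-partitioned) mode via nvidia-smi's mig.mode.current query field before a confidential-tier placement. It assumes nvidia-smi is on PATH and a MIG-capable device (A100/H100-class parts); an equivalent check would be needed for other vendors' partitioning mechanisms.

```python
# Pre-flight check that a GPU is in MIG (hardware-partitioned) mode before a
# confidential-tier workload is placed on it. On non-MIG-capable devices the
# query returns "[N/A]", which this check correctly treats as not partitioned.
import subprocess

def gpu_in_mig_mode(gpu_index: int = 0) -> bool:
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=mig.mode.current", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,  # raises if the query fails
    )
    return out.stdout.strip().lower() == "enabled"
```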
For AI agent deployments, the data flowing through accelerators is often the organisation's most sensitive material: customer PII, financial data, proprietary model weights, safety-critical inference results. The consequence of inadequate isolation ranges from regulatory breach (GDPR, HIPAA) to safety failure (compromised safety models) to competitive harm (leaked model intellectual property). The infrastructure layer cannot be treated as a trusted black box — it must be governed with the same rigour as the application layer.
Cross-border deployments amplify the risk. An agent operating under EU data sovereignty requirements (AG-048) may have its data co-located with workloads subject to different jurisdictional rules if GPU tenancy is not governed. The data never leaves the region — but it may share memory pages with a workload whose data is accessible to a foreign government under that government's data access laws. This jurisdictional contamination through shared hardware is a novel risk that existing data sovereignty frameworks do not adequately address.
GPU Tenancy Isolation Governance requires that organisations understand the isolation properties of their accelerator infrastructure and match those properties to the sensitivity of the workloads they schedule. The core principle is that the isolation mechanism must be at least as strong as the classification gap between the most sensitive and least sensitive co-located workloads.
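A minimal sketch of this principle as a pre-scheduling gate (requirements 4.1, 4.4, and 4.6) follows. The tier ordering, isolation-mechanism ordering, and gap-to-mechanism mapping are illustrative assumptions rather than a normative policy; the point is that the gate fails safe by denying placement whenever the active mechanism is weaker than the classification gap demands.

```python
# Illustrative pre-scheduling gate: deny placement when the active isolation
# mechanism is weaker than the gap between the most and least sensitive
# co-located workloads. Orderings and the REQUIRED mapping are assumptions.
from enum import IntEnum

class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

class Isolation(IntEnum):
    TIME_SLICED = 0        # shared memory space, software context switching
    MIG_PARTITION = 1      # dedicated compute/memory/cache slices
    DEDICATED_DEVICE = 2   # exclusive physical accelerator

# Minimum isolation for a given classification gap (assumed policy):
# same tier may time-slice, one tier apart needs hardware partitioning,
# anything wider needs a dedicated device.
REQUIRED = {0: Isolation.TIME_SLICED, 1: Isolation.MIG_PARTITION}

def admit(new_tier: Tier, co_tenant_tiers: list[Tier],
          active: Isolation) -> bool:
    """Fail safe: deny placement when isolation is weaker than the gap demands."""
    gap = max((abs(new_tier - t) for t in co_tenant_tiers), default=0)
    needed = REQUIRED.get(gap, Isolation.DEDICATED_DEVICE)
    return active >= needed
```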
Recommended patterns:
- Match the isolation mechanism to the classification gap: hardware-partitioned (multi-instance) allocation for confidential-tier workloads, dedicated devices for the highest tiers.
- Verify memory sanitisation between tenant context switches through testing, not through driver configuration alone.
- Gate scheduling decisions on declared workload classification before placement, and record every placement in an auditable log.
- Define and monitor performance isolation guarantees for latency-sensitive workloads.
Anti-patterns to avoid:
- Relying on driver-level memory clearing as the sole isolation mechanism (Scenario A).
- Time-sliced sharing between workloads of different data classification tiers.
- Treating "multi-tenancy support" as a uniform security property when partitioning and time-slicing have fundamentally different isolation postures.
- Assessing memory isolation while ignoring microarchitectural side channels (Scenario B).
- Accepting SLAs that guarantee "available compute" without addressing shared-resource contention (Scenario C).
Financial Services. Algorithmic trading and real-time risk computation agents require both data isolation and performance isolation. MiFID II RTS 6 requires that algorithmic trading systems have sufficient capacity and resilience — GPU contention that degrades latency may constitute a systems and controls failure. Financial data classification typically requires multi-instance partitioning at minimum, with dedicated accelerators for high-frequency trading workloads.
Healthcare. Patient data processed through AI diagnostic or clinical decision support agents is subject to HIPAA (US), UK GDPR, and sector-specific data protection requirements. GPU memory residuals containing patient imaging data, genomic data, or clinical records constitute a data breach if accessible to another tenant. Healthcare workloads should default to dedicated accelerator allocation or multi-instance partitioning with verified memory sanitisation.
Defence and National Security. Classified workloads processed on GPUs must comply with national classification handling requirements. In most jurisdictions, classified data may not share physical hardware with unclassified workloads under any partitioning arrangement. Dedicated hardware with TEMPEST-level physical security controls is typically required.
Crypto/Web3. Cryptographic key material and blockchain transaction data processed on shared GPUs creates key extraction risk. Side-channel attacks against cryptographic operations on GPUs are well-documented in academic literature. Cryptographic workloads should use dedicated accelerators and, where available, GPU-based trusted execution environments.
Basic Implementation — The organisation has classified its accelerator workloads by data sensitivity and documented the isolation properties of its GPU infrastructure. Workloads processing confidential or restricted data are scheduled on multi-instance partitions or dedicated hardware. A workload placement log records which workloads share physical hardware. Memory sanitisation is enabled in the GPU driver configuration. This level meets the minimum mandatory requirements but relies on driver-level controls without independent verification.
Intermediate Implementation — All basic capabilities plus: a pre-scheduling gate enforces workload classification against hardware isolation properties. Memory sanitisation is verified through periodic testing, not solely through configuration. Side-channel vulnerability monitoring is active with defined response SLAs. Performance isolation guarantees are defined and monitored for latency-sensitive workloads. Workload placement records are retained for the full regulatory retention period with tamper-evident logging.
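One way to realise the tamper-evident placement records named at this level is a hash-chained, append-only log, sketched below under assumed record fields. Editing any historical entry invalidates every subsequent hash; a production log would additionally be replicated and independently time-stamped.

```python
# Illustrative hash-chained placement log (requirement 4.5; "tamper-evident
# logging" at the intermediate level). Record fields are assumptions.
import hashlib, json, time

class PlacementLog:
    def __init__(self):
        self.entries = []
        self.last_hash = "0" * 64   # genesis value

    def append(self, workload_id: str, device_id: str,
               isolation: str, tier: str) -> dict:
        record = {
            "ts": time.time(), "workload": workload_id, "device": device_id,
            "isolation": isolation, "tier": tier, "prev": self.last_hash,
        }
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        record["hash"] = digest
        self.entries.append(record)
        self.last_hash = digest
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks every later hash."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```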
Advanced Implementation — All intermediate capabilities plus: independent adversarial testing has verified isolation under side-channel attack scenarios. Continuous side-channel monitoring detects anomalous memory access patterns in real time. Cryptographic memory encryption provides defence-in-depth. Hardware security modules manage GPU partition keys. The organisation can demonstrate to regulators that no known side-channel attack can extract data across tenant boundaries in its GPU infrastructure. Dynamic workload migration automatically evacuates workloads from hardware where isolation degradation is detected.
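The dynamic evacuation capability combines with the fail-safe rule in requirement 4.6: suspend first, migrate second. A self-contained sketch follows; all types are illustrative stand-ins for a real scheduler's interfaces.

```python
# Illustrative fail-safe evacuation: when a device's isolation degrades,
# suspend every isolation-requiring workload on it, then migrate any that
# have a healthy device to go to. Types are hypothetical stand-ins.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Workload:
    id: str
    requires_isolation: bool
    state: str = "running"

@dataclass
class Device:
    id: str
    isolation_healthy: bool = True
    workloads: List[Workload] = field(default_factory=list)

def handle_isolation_degradation(device: Device,
                                 spare: Optional[Device]) -> None:
    """Suspend first (requirement 4.6), then evacuate where capacity exists."""
    device.isolation_healthy = False
    for w in list(device.workloads):
        if not w.requires_isolation:
            continue
        w.state = "suspended"                      # never run unprotected
        if spare is not None and spare.isolation_healthy:
            device.workloads.remove(w)
            spare.workloads.append(w)
            w.state = "running"                    # evacuated to healthy device
```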
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-401 compliance requires verification that isolation mechanisms prevent data leakage, side-channel exploitation, and resource contention across all co-tenancy configurations in the deployed infrastructure.
Test 8.1: Workload Classification Gating
Test 8.2: Memory Sanitisation Between Context Switches
Test 8.3: Hardware Memory Isolation Under Adversarial Probing
Test 8.4: Cross-Classification Co-Location Prevention
Test 8.5: Workload Placement Audit Trail Completeness
Test 8.6: Fail-Safe Behaviour Under Isolation Degradation
Test 8.7: Performance Isolation Under Co-Tenant Contention
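As an illustration of how these tests can be automated, here is a pytest-style sketch of Test 8.5: every placement the scheduler reports must appear in the placement log with its device, isolation mechanism, and classification tier. The fixture data is hypothetical; a real test would query the production scheduler and log store rather than in-line literals.

```python
# Illustrative pytest for Test 8.5 (placement audit trail completeness).
# The scheduler output and log contents below are hypothetical fixtures.
def test_every_placement_is_logged():
    scheduled = [  # what the (hypothetical) scheduler reports it placed
        {"workload": "credit-scoring", "device": "gpu-0"},
        {"workload": "chatbot", "device": "gpu-0"},
    ]
    log = [        # what the placement log recorded
        {"workload": "credit-scoring", "device": "gpu-0",
         "isolation": "mig", "tier": "confidential"},
        {"workload": "chatbot", "device": "gpu-0",
         "isolation": "mig", "tier": "public"},
    ]
    logged = {(e["workload"], e["device"]) for e in log}
    for placement in scheduled:
        assert (placement["workload"], placement["device"]) in logged
        entry = next(e for e in log
                     if e["workload"] == placement["workload"])
        assert entry["isolation"] and entry["tier"]   # required fields present
```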
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| NIST AI RMF | MANAGE 2.2, MAP 3.5 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance |
| DORA | Article 9 (ICT Risk Management Framework), Article 11 (ICT Third-Party Risk) | Direct requirement |
Article 15 requires that high-risk AI systems be resilient against attempts by unauthorised third parties to exploit system vulnerabilities, including attempts to manipulate the training dataset, inputs, or the system environment. GPU side-channel attacks are precisely the type of system environment exploitation that Article 15 addresses. An AI system that leaks inference data through shared accelerator memory is not cybersecure. An AI system whose inference accuracy degrades due to co-tenant resource contention is not robust. Organisations deploying high-risk AI systems on shared GPU infrastructure must demonstrate that their tenancy isolation controls prevent both data leakage and performance degradation under adversarial co-tenant conditions.
The risk management system required by Article 9 must identify and analyse known and reasonably foreseeable risks. GPU tenancy side-channel risk is a known and documented risk — it has been demonstrated in peer-reviewed research and has resulted in assigned CVEs. An Article 9 risk assessment that does not consider GPU tenancy isolation for AI systems deployed on shared accelerator hardware is incomplete.
Financial AI agents performing calculations, generating reports, or executing transactions process data whose integrity is material to financial reporting. If that data transits GPU memory that is shared with untrusted workloads, the integrity of the internal control is compromised. A SOX auditor assessing an AI agent's control environment will evaluate whether the infrastructure layer provides adequate isolation. GPU tenancy without verified isolation creates a control deficiency — potentially a material weakness if the agent processes financially material data.
The FCA requires that firms' systems and controls are adequate for the nature, scale, and complexity of their operations. GPU-accelerated AI agents processing customer data, executing trades, or generating financial advice operate in an infrastructure environment whose security properties must be governed. The FCA's expectations extend to the full technology stack, including infrastructure-layer controls. A firm that deploys AI agents on shared GPU hardware without governing tenancy isolation is failing to maintain adequate systems and controls for the infrastructure layer.
MANAGE 2.2 addresses risk mitigation through enforceable controls. GPU tenancy isolation is an infrastructure-layer risk mitigation control for data confidentiality and system integrity. MAP 3.5 addresses the mapping of AI system dependencies and interactions. Shared accelerator hardware is a critical dependency whose properties (isolation level, co-tenancy model, side-channel exposure) must be mapped and governed.
The AI management system must address risks arising from the infrastructure supporting AI systems. GPU tenancy isolation is a risk treatment for the infrastructure-layer risks of data leakage and performance degradation. Clause 8.2's AI risk assessment must include infrastructure-layer risks, not solely model-layer or application-layer risks.
Article 9 requires financial entities to maintain an ICT risk management framework covering the full ICT environment, including infrastructure. Article 11 extends risk management to ICT third-party service providers — including cloud GPU providers. Financial entities using third-party GPU infrastructure must assess and govern the tenancy isolation properties of that infrastructure. A cloud GPU service with inadequate tenancy isolation is an ICT third-party risk that must be managed under DORA.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Cross-organisation — data leakage across tenant boundaries affects multiple organisations; safety-critical model compromise affects downstream users and the public |
Consequence chain: Inadequate GPU tenancy isolation creates a shared-hardware attack surface that bypasses all application-layer and network-layer security controls. The immediate technical failure is cross-tenant data exposure through memory residuals, cache side channels, or interconnect timing analysis. The data exposed is typically the most sensitive material in the organisation's AI pipeline — model inputs containing PII, financial data, or safety-critical measurements, and model outputs containing decisions, classifications, or predictions.

The operational impact splits along three paths. On the confidentiality path, exposed data triggers mandatory breach notification (UK GDPR Article 33, the HIPAA breach notification rule), regulatory investigation, customer remediation, and reputational damage; Scenario A demonstrates this path, with 1,847 national insurance numbers leaked through GPU memory residuals at a remediation cost of £2.3 million. On the integrity path, compromised model weights or safety thresholds enable adversaries to game AI decision systems, as in Scenario B, where a structural integrity model was reverse-engineered through cache timing, leading to a building with structural defects and potential loss of life. On the availability path, co-tenant resource contention degrades performance, as in Scenario C's £890,000 in trading losses from a 47-second latency spike.

The severity is critical because the attack surface is invisible to application-layer monitoring — no application log, API gateway, or network firewall detects data leakage through shared GPU memory. Detection requires infrastructure-layer monitoring that most organisations do not implement. The blast radius extends cross-organisation because shared GPU infrastructure connects tenants who have no business relationship and no visibility into each other's workloads.
Cross-references: AG-400 (Hardware Enclave Policy Governance), AG-014 (Data Classification Governance), AG-399 (Region Pinning Governance), AG-402 (Model Serving Rate Partitioning Governance), AG-404 (Network Egress and DNS Control Governance), AG-408 (Infrastructure Drift Detection Governance), AG-015 (PII & Sensitive Data Handling), AG-048 (Cross-Border Data Sovereignty Governance).