GPU Tenancy Isolation Governance requires that organisations deploying AI agents on shared accelerator hardware implement structural controls to prevent data leakage, side-channel exposure, and computation spillover between tenants or workloads sharing the same physical GPU, multi-GPU node, or accelerator pool. Modern GPU architectures support multi-instance partitioning, virtual function pass-through, and time-sliced scheduling — but each mechanism has distinct isolation properties, failure modes, and residual risks that must be governed rather than assumed. Without explicit tenancy isolation governance, an agent processing classified financial data may share accelerator memory pages, cache hierarchies, or scheduling queues with an agent processing untrusted public inputs, creating cross-contamination vectors that bypass every application-layer security control.
Scenario A — Shared GPU Memory Residual Leaks Sensitive Data: A cloud-hosted AI platform runs two workloads on the same physical GPU using time-sliced scheduling: Workload A is a financial-value agent processing customer credit assessments containing full names, national insurance numbers, income statements, and credit scores for a UK retail bank. Workload B is a customer-facing chatbot for a separate tenant on the same platform. The platform relies on the GPU driver's default memory management to clear allocations between context switches. A vulnerability in the driver (similar to CVE-2023-25516 affecting a major GPU vendor's kernel module) allows Workload B to read residual memory from Workload A's prior allocation. Over a 72-hour period, 4,200 partial customer records — including 1,847 national insurance numbers — are extractable from memory residuals. The breach is discovered when a security researcher demonstrates the extraction technique to the platform provider.
What went wrong: The organisation relied on driver-level memory clearing as the sole isolation mechanism. No governance policy required verification of memory sanitisation between tenant context switches. The time-sliced scheduling mode shares the full GPU memory space between workloads, with isolation dependent entirely on software-layer cleanup. The data classification of Workload A (containing PII per AG-015) was never evaluated against the isolation properties of the scheduling mode. Consequence: Breach of 1,847 national insurance numbers, mandatory ICO notification under UK GDPR Article 33, estimated remediation cost £2.3 million including customer notification, credit monitoring, and regulatory fine. Platform provider faces contractual liability to the bank under the data processing agreement.
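The failure mode in Scenario A can be probed directly. Below is a minimal residual-memory probe in the spirit of the verification duty in requirement 4.3: it allocates uninitialised device buffers and scans them for non-zero bytes. This is an illustrative sketch assuming CuPy and a CUDA-capable GPU; whether residuals are visible depends on the driver, the scheduling mode, and allocator pool behaviour, so a zero result does not by itself prove isolation.

```python
# Illustrative residual-memory probe: allocate uninitialised GPU buffers and
# scan them for non-zero bytes left behind by a previous context. Assumes
# CuPy and a CUDA GPU. Note that CuPy's default pool often recycles memory
# from the same process, so blocks are released each iteration to request
# fresh allocations from the driver.
import cupy as cp

def scan_for_residuals(num_buffers: int = 32,
                       buf_bytes: int = 64 * 1024 * 1024) -> float:
    """Return the fraction of freshly allocated bytes that are non-zero."""
    pool = cp.get_default_memory_pool()
    dirty = total = 0
    for _ in range(num_buffers):
        pool.free_all_blocks()                     # ask the driver for fresh memory
        buf = cp.empty(buf_bytes, dtype=cp.uint8)  # uninitialised device buffer
        host = buf.get()                           # copy to host for inspection
        dirty += int((host != 0).sum())
        total += host.size
        del buf
    return dirty / total

if __name__ == "__main__":
    print(f"non-zero bytes at allocation: {scan_for_residuals():.6%}")
```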
Scenario B — Side-Channel Attack via GPU Cache Timing: An adversary deploys a seemingly benign inference workload on a shared accelerator cluster. The workload is co-located on the same multi-GPU node as a safety-critical agent performing real-time structural integrity assessments for a civil engineering firm. The adversarial workload performs carefully timed memory access patterns designed to probe the GPU's L2 cache, measuring timing differences that reveal access patterns of the co-located safety-critical workload. Over 14 days, the adversary reconstructs the model weights and inference patterns of the structural integrity model, including the safety thresholds used to classify structures as pass or fail. The adversary sells this information to a contractor who uses it to game the assessment system, submitting structural data calibrated to narrowly pass the thresholds. A building assessed using the compromised thresholds develops structural defects 18 months later.
What went wrong: The multi-GPU node shared L2 cache hierarchies between workloads from different tenants. No governance policy required cache partitioning or side-channel resistance assessment for safety-critical workloads. The platform's isolation controls addressed memory isolation but not microarchitectural side channels. The safety-critical workload was never assessed for co-tenancy risk. Consequence: Compromised safety assessment model, structural defects in a building, potential loss of life, regulatory investigation under building safety regulations, criminal liability investigation for the contractor, civil liability for the engineering firm and platform provider.
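Requirement 4.11 (below) speaks to continuous side-channel monitoring that could have surfaced this probing activity. As a flavour of what such monitoring can look like, here is a deliberately simple statistical detector over kernel-latency samples. The telemetry source, window size, and z-score threshold are illustrative assumptions; a production detector would draw on richer hardware performance counters.

```python
# Illustrative side-channel activity monitor: flags samples whose latency
# deviates sharply from a rolling baseline, which can indicate cache-probing
# bursts by a co-tenant. Thresholds and the telemetry feed are assumptions.
from collections import deque
from statistics import mean, stdev

class TimingAnomalyMonitor:
    """Flag latency samples that deviate sharply from a rolling baseline."""

    def __init__(self, window: int = 512, z_threshold: float = 4.0):
        self.samples = deque(maxlen=window)   # rolling baseline (microseconds)
        self.z_threshold = z_threshold

    def observe(self, latency_us: float) -> bool:
        """Record one sample; return True if it looks anomalous."""
        if len(self.samples) >= 32:           # require a minimal baseline first
            mu, sigma = mean(self.samples), stdev(self.samples)
            if sigma > 0 and abs(latency_us - mu) / sigma > self.z_threshold:
                # Candidate probing burst; do not fold it into the baseline.
                return True
        self.samples.append(latency_us)
        return False
```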
Scenario C — Noisy Neighbour Degrades Real-Time Agent Performance: A financial-value agent executing algorithmic trading strategies shares a GPU pool with a batch training workload from another tenant. The training workload saturates memory bandwidth on the shared interconnect during a large gradient synchronisation pass. The trading agent's inference latency spikes from 2ms to 340ms during the saturation event — a 170x degradation. During this 47-second window, the agent cannot execute time-sensitive trades and misses a market movement. The strategy accumulates £890,000 in opportunity cost because hedging positions were not placed within the required 5ms window. The trading firm's risk management system flags the latency anomaly but only after the window has closed.
What went wrong: The GPU pool allowed unrestricted bandwidth consumption by co-tenants. No quality-of-service partitioning guaranteed minimum bandwidth or compute capacity for the trading workload. The infrastructure provider's SLA guaranteed "available compute" but did not address shared-resource contention. The trading firm did not assess whether the GPU pool's tenancy model met its latency requirements under adversarial co-tenant conditions. Consequence: £890,000 in direct opportunity cost, potential FCA scrutiny under algorithmic trading systems requirements (MiFID II RTS 6, Article 17), SLA dispute with infrastructure provider, mandatory review of all infrastructure tenancy arrangements.
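Scenario C's loss was compounded by after-the-fact detection. A sketch of an in-line latency circuit breaker follows: it tracks a rolling p99 and halts order submission the moment the assumed 5 ms budget is breached, rather than flagging the anomaly after the window closes. The budget, window size, and class interface are illustrative assumptions, not a prescribed design.

```python
# Illustrative latency circuit breaker for a latency-critical agent: trip as
# soon as rolling p99 latency breaches the inference budget, so the agent
# stops trading during contention rather than trading on stale decisions.
from collections import deque

class LatencyCircuitBreaker:
    """Trip as soon as rolling p99 latency breaches the inference budget."""

    def __init__(self, slo_ms: float = 5.0, window: int = 200):
        self.slo_ms = slo_ms                  # assumed end-to-end budget
        self.latencies = deque(maxlen=window)
        self.tripped = False

    def record(self, latency_ms: float) -> None:
        self.latencies.append(latency_ms)
        ordered = sorted(self.latencies)
        p99 = ordered[int(0.99 * (len(ordered) - 1))]
        if p99 > self.slo_ms:
            self.tripped = True               # halt orders until operators reset

    def may_trade(self) -> bool:
        return not self.tripped
```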
Scope: This dimension applies to any AI agent deployment where inference, fine-tuning, or embedding generation occurs on GPU, TPU, or other accelerator hardware that is or could be shared with other workloads — whether those workloads belong to the same organisation (internal multi-tenancy), different organisations (external multi-tenancy), or the same organisation but different trust domains (cross-classification multi-tenancy). The scope includes cloud-hosted GPU instances, on-premises GPU clusters with multi-tenant scheduling, edge accelerators shared between applications, and any accelerator pool where the organisation does not have exclusive physical control of the entire hardware device. Deployments on dedicated, single-tenant hardware where the organisation controls the entire physical device from power-on to decommission are excluded, provided the organisation can demonstrate exclusive tenancy through infrastructure evidence. The scope extends to the full accelerator hardware stack: compute cores, memory (HBM, GDDR, SRAM), cache hierarchies (L1, L2, shared caches), memory controllers, interconnects (NVLink, NVSwitch, PCIe), and scheduling subsystems. A workload that is isolated at the compute core level but shares a cache hierarchy with an untrusted workload is within scope.
4.1. A conforming system MUST classify every accelerator workload by data sensitivity tier (aligned with AG-014) and determine the minimum isolation level required before any workload is scheduled on shared hardware.
4.2. A conforming system MUST enforce hardware-level memory isolation between workloads of different tenants or different data classification tiers, preventing any workload from reading, writing, or inferring the memory contents of another workload through direct access, residual data, or timing side channels.
4.3. A conforming system MUST verify that GPU memory is sanitised between tenant context switches — either through hardware-enforced clearing, cryptographic memory isolation, or dedicated partitioning that eliminates shared memory surfaces entirely.
4.4. A conforming system MUST prohibit co-location of workloads from different data classification tiers on the same physical accelerator unless the isolation mechanism has been independently validated to prevent cross-tier data leakage under adversarial conditions.
4.5. A conforming system MUST maintain an auditable record of workload-to-hardware placement, including which workloads shared which physical accelerators, which isolation mechanisms were active, and the data classification tier of each co-located workload, for the full retention period defined in Section 7.
4.6. A conforming system MUST implement fail-safe behaviour when GPU isolation mechanisms are unavailable or degraded: workloads requiring isolation MUST be suspended rather than scheduled without isolation.
4.7. A conforming system SHOULD implement hardware-partitioned multi-instance GPU configurations (or equivalent technology-neutral partitioning) for workloads processing data classified as confidential or above, providing dedicated compute, memory, and cache resources rather than relying on time-sliced sharing.
4.8. A conforming system SHOULD monitor accelerator hardware for known side-channel vulnerabilities and apply mitigations (firmware updates, driver patches, scheduling restrictions) within a defined SLA from vulnerability disclosure.
4.9. A conforming system SHOULD implement performance isolation guarantees — minimum compute capacity, memory bandwidth, and interconnect bandwidth — for each workload, preventing co-tenant resource contention from degrading agent performance below defined thresholds.
4.10. A conforming system MAY implement cryptographic memory tagging or hardware memory encryption as defence-in-depth, layered on top of primary partitioning mechanisms so that data remains protected even if those mechanisms fail or are bypassed.
4.11. A conforming system MAY implement continuous side-channel monitoring that detects anomalous memory access patterns indicative of cache-timing or rowhammer-style attacks against co-located workloads.
GPU tenancy isolation addresses a class of risk that is invisible at the application layer but critical at the infrastructure layer. When an AI agent performs inference, the model weights, activations, input data, and output data all reside in accelerator memory. This memory is physically shared hardware — the same DRAM chips, the same cache lines, the same memory controllers. If two workloads share this hardware, the isolation between them is entirely a function of how the hardware and its software stack partition those shared resources.
The risk is not theoretical. GPU side-channel attacks have been demonstrated in academic research since at least 2018, with practical exploits published against major GPU architectures. CVE-2023-25516 and related vulnerabilities in GPU kernel-mode drivers demonstrated that memory isolation between GPU contexts was insufficient to prevent cross-context data leakage. The GPU.zip attack (2023) demonstrated that GPU data compression side channels could leak visual data across security boundaries in web browsers — a technique that extends naturally to inference workload isolation in cloud environments. These are not exotic attacks requiring nation-state capabilities; they are implementable by moderately skilled adversaries with co-tenant access.
The governance challenge is that GPU isolation is significantly less mature than CPU isolation. Modern CPUs benefit from decades of hardening against side-channel attacks (Spectre, Meltdown, MDS), hardware memory tagging (ARM MTE), hardware memory encryption (Intel TME), and virtualisation-level isolation (VT-x, AMD-V). GPU architectures have historically prioritised throughput over isolation, and their multi-tenancy features — multi-instance GPU, virtual GPU, time-slicing — have different isolation properties that are not well understood by most deploying organisations. A multi-instance GPU partition provides dedicated compute and memory; a time-sliced context shares both compute and memory, with isolation dependent on software context switching. These are fundamentally different security postures, but both are marketed as "multi-tenancy solutions."
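Because partitioned and time-sliced modes are marketed interchangeably, the deployed mode should be verified rather than assumed. The sketch below checks whether an NVIDIA device is actually in MIG (hardware-partitioned) mode via nvidia-smi's mig.mode.current query field before a confidential-tier placement. It assumes nvidia-smi is on PATH and a MIG-capable device (A100/H100-class parts); an equivalent check would be needed for other vendors' partitioning mechanisms.

```python
# Pre-flight check that a GPU is in MIG (hardware-partitioned) mode before a
# confidential-tier workload is placed on it. On non-MIG-capable devices the
# query returns "[N/A]", which this check correctly treats as not partitioned.
import subprocess

def gpu_in_mig_mode(gpu_index: int = 0) -> bool:
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=mig.mode.current", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,  # raises if the query fails
    )
    return out.stdout.strip().lower() == "enabled"
```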
For AI agent deployments, the data flowing through accelerators is often the organisation's most sensitive material: customer PII, financial data, proprietary model weights, safety-critical inference results. The consequence of inadequate isolation ranges from regulatory breach (GDPR, HIPAA) to safety failure (compromised safety models) to competitive harm (leaked model intellectual property). The infrastructure layer cannot be treated as a trusted black box — it must be governed with the same rigour as the application layer.
Cross-border deployments amplify the risk. An agent operating under EU data sovereignty requirements (AG-048) may have its data co-located with workloads subject to different jurisdictional rules if GPU tenancy is not governed. The data never leaves the region — but it may share memory pages with a workload whose data is accessible to a foreign government under that government's data access laws. This jurisdictional contamination through shared hardware is a novel risk that existing data sovereignty frameworks do not adequately address.
GPU Tenancy Isolation Governance requires that organisations understand the isolation properties of their accelerator infrastructure and match those properties to the sensitivity of the workloads they schedule. The core principle is that the isolation mechanism must be at least as strong as the classification gap between the most sensitive and least sensitive co-located workloads.
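A minimal sketch of this principle as a pre-scheduling gate (requirements 4.1, 4.4, and 4.6) follows. The tier ordering, isolation-mechanism ordering, and gap-to-mechanism mapping are illustrative assumptions rather than a normative policy; the point is that the gate fails safe by denying placement whenever the active mechanism is weaker than the classification gap demands.

```python
# Illustrative pre-scheduling gate: deny placement when the active isolation
# mechanism is weaker than the gap between the most and least sensitive
# co-located workloads. Orderings and the REQUIRED mapping are assumptions.
from enum import IntEnum

class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

class Isolation(IntEnum):
    TIME_SLICED = 0        # shared memory space, software context switching
    MIG_PARTITION = 1      # dedicated compute/memory/cache slices
    DEDICATED_DEVICE = 2   # exclusive physical accelerator

# Minimum isolation for a given classification gap (assumed policy):
# same tier may time-slice, one tier apart needs hardware partitioning,
# anything wider needs a dedicated device.
REQUIRED = {0: Isolation.TIME_SLICED, 1: Isolation.MIG_PARTITION}

def admit(new_tier: Tier, co_tenant_tiers: list[Tier],
          active: Isolation) -> bool:
    """Fail safe: deny placement when isolation is weaker than the gap demands."""
    gap = max((abs(new_tier - t) for t in co_tenant_tiers), default=0)
    needed = REQUIRED.get(gap, Isolation.DEDICATED_DEVICE)
    return active >= needed
```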
Recommended patterns:
- Match the isolation mechanism to the classification gap: hardware-partitioned (multi-instance) allocation for confidential-tier workloads, dedicated devices for the highest tiers.
- Verify memory sanitisation between tenant context switches through testing, not through driver configuration alone.
- Gate scheduling decisions on declared workload classification before placement, and record every placement in an auditable log.
- Define and monitor performance isolation guarantees for latency-sensitive workloads.
Anti-patterns to avoid:
- Relying on driver-level memory clearing as the sole isolation mechanism (Scenario A).
- Time-sliced sharing between workloads of different data classification tiers.
- Treating "multi-tenancy support" as a uniform security property when partitioning and time-slicing have fundamentally different isolation postures.
- Assessing memory isolation while ignoring microarchitectural side channels (Scenario B).
- Accepting SLAs that guarantee "available compute" without addressing shared-resource contention (Scenario C).
Financial Services. Algorithmic trading and real-time risk computation agents require both data isolation and performance isolation. MiFID II RTS 6 requires that algorithmic trading systems have sufficient capacity and resilience — GPU contention that degrades latency may constitute a systems and controls failure. Financial data classification typically requires multi-instance partitioning at minimum, with dedicated accelerators for high-frequency trading workloads.
Healthcare. Patient data processed through AI diagnostic or clinical decision support agents is subject to HIPAA (US), UK GDPR, and sector-specific data protection requirements. GPU memory residuals containing patient imaging data, genomic data, or clinical records constitute a data breach if accessible to another tenant. Healthcare workloads should default to dedicated accelerator allocation or multi-instance partitioning with verified memory sanitisation.
Defence and National Security. Classified workloads processed on GPUs must comply with national classification handling requirements. In most jurisdictions, classified data may not share physical hardware with unclassified workloads under any partitioning arrangement. Dedicated hardware with TEMPEST-level physical security controls is typically required.
Crypto/Web3. Cryptographic key material and blockchain transaction data processed on shared GPUs creates key extraction risk. Side-channel attacks against cryptographic operations on GPUs are well-documented in academic literature. Cryptographic workloads should use dedicated accelerators and, where available, GPU-based trusted execution environments.
Basic Implementation — The organisation has classified its accelerator workloads by data sensitivity and documented the isolation properties of its GPU infrastructure. Workloads processing confidential or restricted data are scheduled on multi-instance partitions or dedicated hardware. A workload placement log records which workloads share physical hardware. Memory sanitisation is enabled in the GPU driver configuration. This level meets the minimum mandatory requirements but relies on driver-level controls without independent verification.
Intermediate Implementation — All basic capabilities plus: a pre-scheduling gate enforces workload classification against hardware isolation properties. Memory sanitisation is verified through periodic testing, not solely through configuration. Side-channel vulnerability monitoring is active with defined response SLAs. Performance isolation guarantees are defined and monitored for latency-sensitive workloads. Workload placement records are retained for the full regulatory retention period with tamper-evident logging.
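One way to realise the tamper-evident placement records named at this level is a hash-chained, append-only log, sketched below under assumed record fields. Editing any historical entry invalidates every subsequent hash; a production log would additionally be replicated and independently time-stamped.

```python
# Illustrative hash-chained placement log (requirement 4.5; "tamper-evident
# logging" at the intermediate level). Record fields are assumptions.
import hashlib, json, time

class PlacementLog:
    def __init__(self):
        self.entries = []
        self.last_hash = "0" * 64   # genesis value

    def append(self, workload_id: str, device_id: str,
               isolation: str, tier: str) -> dict:
        record = {
            "ts": time.time(), "workload": workload_id, "device": device_id,
            "isolation": isolation, "tier": tier, "prev": self.last_hash,
        }
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        record["hash"] = digest
        self.entries.append(record)
        self.last_hash = digest
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks every later hash."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```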
Advanced Implementation — All intermediate capabilities plus: independent adversarial testing has verified isolation under side-channel attack scenarios. Continuous side-channel monitoring detects anomalous memory access patterns in real time. Cryptographic memory encryption provides defence-in-depth. Hardware security modules manage GPU partition keys. The organisation can demonstrate to regulators that no known side-channel attack can extract data across tenant boundaries in its GPU infrastructure. Dynamic workload migration automatically evacuates workloads from hardware where isolation degradation is detected.
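The dynamic evacuation capability combines with the fail-safe rule in requirement 4.6: suspend first, migrate second. A self-contained sketch follows; all types are illustrative stand-ins for a real scheduler's interfaces.

```python
# Illustrative fail-safe evacuation: when a device's isolation degrades,
# suspend every isolation-requiring workload on it, then migrate any that
# have a healthy device to go to. Types are hypothetical stand-ins.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Workload:
    id: str
    requires_isolation: bool
    state: str = "running"

@dataclass
class Device:
    id: str
    isolation_healthy: bool = True
    workloads: List[Workload] = field(default_factory=list)

def handle_isolation_degradation(device: Device,
                                 spare: Optional[Device]) -> None:
    """Suspend first (requirement 4.6), then evacuate where capacity exists."""
    device.isolation_healthy = False
    for w in list(device.workloads):
        if not w.requires_isolation:
            continue
        w.state = "suspended"                      # never run unprotected
        if spare is not None and spare.isolation_healthy:
            device.workloads.remove(w)
            spare.workloads.append(w)
            w.state = "running"                    # evacuated to healthy device
```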
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-401 compliance requires verification that isolation mechanisms prevent data leakage, side-channel exploitation, and resource contention across all co-tenancy configurations in the deployed infrastructure.
Test 8.1: Workload Classification Gating
Test 8.2: Memory Sanitisation Between Context Switches
Test 8.3: Hardware Memory Isolation Under Adversarial Probing
Test 8.4: Cross-Classification Co-Location Prevention
Test 8.5: Workload Placement Audit Trail Completeness
Test 8.6: Fail-Safe Behaviour Under Isolation Degradation
Test 8.7: Performance Isolation Under Co-Tenant Contention
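As an illustration of how these tests can be automated, here is a pytest-style sketch of Test 8.5: every placement the scheduler reports must appear in the placement log with its device, isolation mechanism, and classification tier. The fixture data is hypothetical; a real test would query the production scheduler and log store rather than in-line literals.

```python
# Illustrative pytest for Test 8.5 (placement audit trail completeness).
# The scheduler output and log contents below are hypothetical fixtures.
def test_every_placement_is_logged():
    scheduled = [  # what the (hypothetical) scheduler reports it placed
        {"workload": "credit-scoring", "device": "gpu-0"},
        {"workload": "chatbot", "device": "gpu-0"},
    ]
    log = [        # what the placement log recorded
        {"workload": "credit-scoring", "device": "gpu-0",
         "isolation": "mig", "tier": "confidential"},
        {"workload": "chatbot", "device": "gpu-0",
         "isolation": "mig", "tier": "public"},
    ]
    logged = {(e["workload"], e["device"]) for e in log}
    for placement in scheduled:
        assert (placement["workload"], placement["device"]) in logged
        entry = next(e for e in log
                     if e["workload"] == placement["workload"])
        assert entry["isolation"] and entry["tier"]   # required fields present
```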
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| NIST AI RMF | MANAGE 2.2, MAP 3.5 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance |
| DORA | Article 9 (ICT Risk Management Framework), Article 11 (ICT Third-Party Risk) | Direct requirement |
Article 15 requires that high-risk AI systems be resilient against attempts by unauthorised third parties to exploit system vulnerabilities, including attempts to manipulate the training dataset, inputs, or the system environment. GPU side-channel attacks are precisely the type of system environment exploitation that Article 15 addresses. An AI system that leaks inference data through shared accelerator memory is not cybersecure. An AI system whose inference accuracy degrades due to co-tenant resource contention is not robust. Organisations deploying high-risk AI systems on shared GPU infrastructure must demonstrate that their tenancy isolation controls prevent both data leakage and performance degradation under adversarial co-tenant conditions.
The risk management system required by Article 9 must identify and analyse known and reasonably foreseeable risks. GPU tenancy side-channel risk is a known and documented risk — it has been demonstrated in peer-reviewed research and has resulted in assigned CVEs. An Article 9 risk assessment that does not consider GPU tenancy isolation for AI systems deployed on shared accelerator hardware is incomplete.
Financial AI agents performing calculations, generating reports, or executing transactions process data whose integrity is material to financial reporting. If that data transits GPU memory that is shared with untrusted workloads, the integrity of the internal control is compromised. A SOX auditor assessing an AI agent's control environment will evaluate whether the infrastructure layer provides adequate isolation. GPU tenancy without verified isolation creates a control deficiency — potentially a material weakness if the agent processes financially material data.
The FCA requires that firms' systems and controls are adequate for the nature, scale, and complexity of their operations. GPU-accelerated AI agents processing customer data, executing trades, or generating financial advice operate in an infrastructure environment whose security properties must be governed. The FCA's expectations extend to the full technology stack, including infrastructure-layer controls. A firm that deploys AI agents on shared GPU hardware without governing tenancy isolation is failing to maintain adequate systems and controls for the infrastructure layer.
MANAGE 2.2 addresses risk mitigation through enforceable controls. GPU tenancy isolation is an infrastructure-layer risk mitigation control for data confidentiality and system integrity. MAP 3.5 addresses the mapping of AI system dependencies and interactions. Shared accelerator hardware is a critical dependency whose properties (isolation level, co-tenancy model, side-channel exposure) must be mapped and governed.
The AI management system must address risks arising from the infrastructure supporting AI systems. GPU tenancy isolation is a risk treatment for the infrastructure-layer risks of data leakage and performance degradation. Clause 8.2's AI risk assessment must include infrastructure-layer risks, not solely model-layer or application-layer risks.
Article 9 requires financial entities to maintain an ICT risk management framework covering the full ICT environment, including infrastructure. Article 11 extends risk management to ICT third-party service providers — including cloud GPU providers. Financial entities using third-party GPU infrastructure must assess and govern the tenancy isolation properties of that infrastructure. A cloud GPU service with inadequate tenancy isolation is an ICT third-party risk that must be managed under DORA.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Cross-organisation — data leakage across tenant boundaries affects multiple organisations; safety-critical model compromise affects downstream users and the public |
Consequence chain: Inadequate GPU tenancy isolation creates a shared-hardware attack surface that bypasses all application-layer and network-layer security controls. The immediate technical failure is cross-tenant data exposure through memory residuals, cache side channels, or interconnect timing analysis. The data exposed is typically the most sensitive material in the organisation's AI pipeline — model inputs containing PII, financial data, or safety-critical measurements, and model outputs containing decisions, classifications, or predictions.

The operational impact splits along three paths. On the confidentiality path, exposed data triggers mandatory breach notification (UK GDPR Article 33, the HIPAA breach notification rule), regulatory investigation, customer remediation, and reputational damage; Scenario A demonstrates this path, with 1,847 national insurance numbers leaked through GPU memory residuals at a remediation cost of £2.3 million. On the integrity path, compromised model weights or safety thresholds enable adversaries to game AI decision systems, as in Scenario B, where a structural integrity model was reverse-engineered through cache timing, leading to a building with structural defects and potential loss of life. On the availability path, co-tenant resource contention degrades performance, as in Scenario C's £890,000 in trading losses from a 47-second latency spike.

The severity is critical because the attack surface is invisible to application-layer monitoring — no application log, API gateway, or network firewall detects data leakage through shared GPU memory. Detection requires infrastructure-layer monitoring that most organisations do not implement. The blast radius extends cross-organisation because shared GPU infrastructure connects tenants who have no business relationship and no visibility into each other's workloads.
Cross-references: AG-400 (Hardware Enclave Policy Governance), AG-014 (Data Classification Governance), AG-399 (Region Pinning Governance), AG-402 (Model Serving Rate Partitioning Governance), AG-404 (Network Egress and DNS Control Governance), AG-408 (Infrastructure Drift Detection Governance), AG-015 (PII & Sensitive Data Handling), AG-048 (Cross-Border Data Sovereignty Governance).