Concurrency and Shared-State Locking Governance requires that when multiple AI agents (or multiple instances of the same agent) access shared state such as databases, configuration stores, resource pools, queues, or any other mutable data, the system enforces explicit locking, serialisation, or conflict-resolution mechanisms that prevent race conditions, lost updates, phantom reads, and state corruption. AI agents operating concurrently without shared-state governance can produce outcomes that no individual agent intended: double-spending from a shared balance, conflicting updates to the same record, or decisions based on stale state that another agent has already changed. This dimension ensures that concurrent agent operations are correct, consistent, and auditable: the outcome of concurrent execution must be equivalent to some serial execution of the same operations.
Scenario A — Double-Spend from Shared Account Balance: Two payment agents (Agent-Alpha and Agent-Beta) share access to a corporate payment account with a balance of £500,000. Agent-Alpha checks the balance (£500,000), determines it can process a £350,000 payment, and begins the payment submission. Agent-Beta, concurrently, checks the balance (still £500,000 — Agent-Alpha's payment has not yet settled), determines it can process a £280,000 payment, and begins its own submission. Both payments execute successfully. The total disbursement is £630,000 against a balance of £500,000, creating a £130,000 overdraft. The account has no overdraft facility. The bank bounces the second payment 48 hours later, after the recipient has already been provisionally credited.
What went wrong: Both agents read the account balance without acquiring a lock. The check-and-spend operation was not atomic. The time between reading the balance and executing the payment created a window during which the balance was stale. Neither agent was aware of the other's in-flight transaction. Consequence: £130,000 overdraft exposure. Bounced payment damages supplier relationship. Bank charges for unauthorised overdraft. FCA inquiry into automated payment controls.
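The failure in Scenario A is the classic check-then-act race: the balance check and the debit are separate operations. A minimal sketch of the fix, assuming a SQL-backed balance store, is to fold the check and the debit into a single atomic conditional statement so no window exists between them (the `accounts` schema here is hypothetical, for illustration):

```python
import sqlite3

def atomic_debit(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    """Debit `amount` only if the balance covers it, in one atomic statement.

    The balance check and the debit happen in the same UPDATE, so no other
    agent can spend the same funds between the check and the write.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    return cur.rowcount == 1  # 0 rows touched => insufficient funds, no debit

# Demo: a £500,000 balance cannot honour both a £350,000 and a £280,000 debit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 500000)")
print(atomic_debit(conn, 1, 350000))  # True  - first payment succeeds
print(atomic_debit(conn, 1, 280000))  # False - second is rejected, no overdraft
print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0])  # 150000
```

The same pattern applies to any store that offers conditional writes; the essential property is that the predicate and the mutation are evaluated as one unit.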
Scenario B — Lost Update on Customer Record: A customer service agent and a fraud detection agent both update the same customer record concurrently. The customer service agent reads the record, modifies the contact email (customer request), and writes the updated record. The fraud detection agent, which read the record 200 milliseconds before the customer service agent's write, adds a fraud flag to the record and writes its version. The fraud detection agent's write overwrites the customer service agent's update because it was based on the pre-update version of the record. The customer's email change is lost. The customer continues receiving communications at the old email address, including sensitive financial documents. The fraud flag is present, but the customer's requested update is silently discarded.
What went wrong: Both agents used a read-modify-write pattern without optimistic or pessimistic locking. The second write overwrote the first write entirely because the system had no mechanism to detect that the record had changed between the read and the write. Consequence: Customer data integrity failure. Sensitive financial documents sent to an email address the customer has explicitly asked to retire. GDPR breach notification potential. Customer complaint to the Financial Ombudsman.
Scenario C — Optimistic Locking Prevents Conflicting Updates: Two agents attempt to update the same inventory record. Each reads the record, which includes a version number (version 47). Agent-A modifies the quantity and submits the update with version 47. The system accepts the update and increments the version to 48. Agent-B, which also read version 47, submits its update referencing version 47. The system detects that the current version is 48, not 47 — the record has changed since Agent-B read it. The system rejects Agent-B's update with a conflict response. Agent-B re-reads the record (now version 48 with Agent-A's changes), applies its own modification to the updated data, and resubmits with version 48. The update succeeds. Both agents' changes are preserved.
What went right: Optimistic locking detected the concurrent modification. The conflict was resolved by requiring the second writer to re-read and re-apply changes. No data was lost. The final state reflects both agents' intended modifications.
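The version-number mechanism in Scenario C can be sketched in a few lines, assuming a SQL store with an explicit `version` column (the `inventory` schema is illustrative). The write succeeds only if the version is unchanged since the read; a conflicting writer must re-read and retry:

```python
import sqlite3

def update_with_version(conn, record_id, new_qty, expected_version) -> bool:
    """Optimistic write: succeeds only if the version is unchanged since the read."""
    with conn:
        cur = conn.execute(
            "UPDATE inventory SET qty = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_qty, record_id, expected_version),
        )
    return cur.rowcount == 1  # 0 => someone else wrote first; caller must re-read

def read(conn, record_id):
    return conn.execute(
        "SELECT qty, version FROM inventory WHERE id = ?", (record_id,)
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)")
conn.execute("INSERT INTO inventory VALUES (1, 100, 47)")

# Both agents read version 47.
qty_a, v_a = read(conn, 1)
qty_b, v_b = read(conn, 1)

assert update_with_version(conn, 1, qty_a - 10, v_a)      # Agent-A wins; version -> 48
assert not update_with_version(conn, 1, qty_b - 5, v_b)   # Agent-B conflicts on stale 47

qty_b2, v_b2 = read(conn, 1)                              # re-read: qty 90, version 48
assert update_with_version(conn, 1, qty_b2 - 5, v_b2)     # retry succeeds; both changes kept
```

Note that Agent-B reapplies its delta to the re-read value rather than blindly resubmitting its original write, which is what preserves both modifications.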
Scope: This dimension applies to every deployment where multiple AI agents — or multiple instances of the same agent — can access the same mutable state simultaneously. Mutable state includes: database records, configuration files, resource pools (account balances, inventory counts, capacity allocations), message queues, shared caches, file system objects, and any other data that can be read and modified by more than one agent concurrently. The scope extends to implicit shared state: two agents that do not directly share a database but both affect the same downstream system (e.g., both submit payments from the same account) are accessing shared state (the account balance) and require concurrency governance. Single-agent deployments with no concurrent access to mutable state are excluded, but organisations should assess whether the deployment could evolve to multi-agent or multi-instance operation.
4.1. A conforming system MUST implement explicit concurrency control for all shared mutable state accessed by agents, using pessimistic locking, optimistic locking with conflict detection, or serialisable transaction isolation.
4.2. A conforming system MUST ensure that read-modify-write operations on shared state are atomic — the state cannot change between the read and the write without detection and resolution.
4.3. A conforming system MUST enforce a maximum lock hold duration for pessimistic locks, after which the lock is automatically released and the holding agent's pending operation is rolled back.
4.4. A conforming system MUST implement deadlock detection and automatic resolution for pessimistic locking implementations, ensuring that deadlocked agents are released within a defined timeout (maximum 30 seconds).
4.5. A conforming system MUST log all concurrency conflicts — lock contention events, optimistic locking conflicts, deadlock detections, and lock timeout releases — with the agents involved, the resource contended, and the resolution.
4.6. A conforming system MUST prevent agents from bypassing the locking mechanism through direct access to the underlying state store — all access must be mediated through the concurrency control layer.
4.7. A conforming system SHOULD implement resource-level locking rather than system-level locking — locking the specific record, row, or resource being accessed rather than the entire table, collection, or system.
4.8. A conforming system SHOULD monitor concurrency contention rates and alert when contention exceeds defined thresholds (e.g., more than 5% of operations experience lock contention), as high contention indicates architectural issues.
4.9. A conforming system MAY implement conflict-free replicated data types (CRDTs) for state that can be structurally designed to avoid conflicts, eliminating the need for locking in specific use cases.
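Requirement 4.3's bounded lock hold duration can be sketched as a lease: a pessimistic lock that expires after a maximum hold time, so a crashed or stalled agent cannot block a resource indefinitely. This is a hypothetical in-process illustration (class and method names are my own); a production implementation would also roll back the expired holder's pending operation, as 4.3 requires:

```python
import threading
import time

class LeaseLock:
    """Pessimistic resource lock with a bounded hold duration (cf. requirement 4.3)."""

    def __init__(self, max_hold_s: float = 5.0):
        self._mutex = threading.Lock()
        self._holder = None          # (agent_id, acquired_at) or None
        self.max_hold_s = max_hold_s

    def acquire(self, agent_id: str) -> bool:
        with self._mutex:
            if self._holder is not None:
                _, since = self._holder
                if time.monotonic() - since < self.max_hold_s:
                    return False     # contended: a live lease is held
                # Lease expired: forcibly release; the stalled agent's pending
                # operation must be rolled back before its work is trusted.
            self._holder = (agent_id, time.monotonic())
            return True

    def release(self, agent_id: str) -> None:
        with self._mutex:
            if self._holder and self._holder[0] == agent_id:
                self._holder = None

lock = LeaseLock(max_hold_s=0.1)
assert lock.acquire("agent-alpha")
assert not lock.acquire("agent-beta")   # alpha holds a live lease
time.sleep(0.15)
assert lock.acquire("agent-beta")       # alpha's lease expired; beta takes over
```

In a distributed deployment the same idea appears as lease-based locks in a shared coordination store; the lease duration is the enforced maximum hold time.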
AI agent concurrency is qualitatively different from traditional software concurrency. Traditional concurrent software follows deterministic code paths — the concurrency bugs are in the code, and they can be found by testing the code. AI agents operate on reasoning — their concurrent behaviour depends on their inputs, their context, and their model's inference at that moment. Two agent instances processing the same type of request may access shared state in different orders, at different speeds, and with different intermediate states, depending on the inputs they receive and the reasoning paths their models take.
This creates concurrency challenges that are both more likely and harder to detect than traditional software concurrency bugs. They are more likely because the agent's behaviour is non-deterministic — the same agent processing the same input may access resources in a different order each time, depending on model inference variability. They are harder to detect because the failure modes are subtle: a lost update does not crash the system; it silently produces incorrect state that may not be detected until reconciliation or audit.
The governed exposure from concurrency failures can be enormous. The double-spend scenario in Scenario A is not hypothetical; it is the natural result of deploying concurrent payment agents without shared-state governance. The probability of concurrent access to the same balance increases with the number of agents, the frequency of operations, and the latency of balance updates. For an organisation processing 10,000 payments per day across 5 concurrent agent instances, the probability of at least one concurrent balance access per day approaches certainty.
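The "approaches certainty" claim can be made concrete with a back-of-envelope model. The stale-window duration below is an assumption for illustration, not a figure from the text; arrivals are modelled as a Poisson process across all agent instances:

```python
import math

# Assumed figures (illustrative only):
payments_per_day = 10_000
stale_window_s = 2.0            # assumed latency before a balance update is visible
seconds_per_day = 86_400

rate = payments_per_day / seconds_per_day          # ~0.116 payments/second overall

# Probability that a given payment overlaps at least one other payment's
# stale window (another arrival lands within the 2s window):
p_overlap = 1 - math.exp(-rate * stale_window_s)   # ~0.21

# Probability of at least one overlapping pair somewhere in the day:
p_any = 1 - (1 - p_overlap) ** payments_per_day    # numerically indistinguishable from 1.0

print(round(p_overlap, 3), p_any)
```

Even with a much shorter stale window, the per-day probability of at least one overlap remains effectively 1 at this volume, which is why the dimension treats concurrent balance access as an expected condition rather than an edge case.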
The governance dimension is critical because concurrency failures undermine auditability. When a lost update silently overwrites data, the audit trail may not record the overwritten state — only the final state is visible. The fraud flag that was overwritten in Scenario B exists in neither the current record nor the audit trail (if the audit records the final state of each write operation). This makes concurrency failures not just operationally harmful but forensically invisible.
The implementation requires a concurrency control layer that mediates all agent access to shared mutable state. The layer enforces locking, detects conflicts, resolves deadlocks, and logs contention events.
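A minimal sketch of such a mediation layer, assuming an in-memory versioned store (class and field names are hypothetical): all reads and writes pass through one object, and every conflict is logged with the agent, the resource, and the resolution, as requirements 4.5 and 4.6 demand:

```python
import logging
import threading

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("concurrency")

class MediatedStore:
    """All agent access flows through this layer; conflicts are logged."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}                      # resource -> (value, version)

    def read(self, resource):
        with self._lock:
            return self._data.get(resource, (None, 0))

    def write(self, agent_id, resource, value, expected_version) -> bool:
        with self._lock:
            _, current = self._data.get(resource, (None, 0))
            if current != expected_version:
                # Requirement 4.5: log agents involved, resource, and resolution.
                log.info(
                    "conflict agent=%s resource=%s read_v=%d current_v=%d "
                    "resolution=rejected-retry",
                    agent_id, resource, expected_version, current,
                )
                return False
            self._data[resource] = (value, current + 1)
            return True

store = MediatedStore()
_, v = store.read("customer:42")
assert store.write("service-agent", "customer:42", {"email": "new@example.com"}, v)
assert not store.write("fraud-agent", "customer:42", {"flag": True}, v)  # conflict logged
```

The key architectural point is requirement 4.6: agents hold a reference to the mediation layer, never to the underlying store, so the logging and conflict detection cannot be bypassed.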
Recommended patterns:
- Optimistic locking with version numbers for low-contention workloads, with mandatory re-read-and-retry on conflict (as in Scenario C).
- Pessimistic locking with bounded hold durations for high-contention or high-value resources (requirement 4.3).
- Atomic conditional updates that combine the check and the write in a single operation (check-and-spend, compare-and-swap).
- Resource-level lock granularity to minimise contention (requirement 4.7).
- CRDTs for state that can be structurally designed to avoid conflicts (requirement 4.9).
Anti-patterns to avoid:
- Read-modify-write sequences without version checks or locks (the lost-update pattern of Scenario B).
- Check-then-act sequences in which the check and the action are separate operations (the double-spend pattern of Scenario A).
- Direct agent access to the underlying state store that bypasses the concurrency control layer (prohibited by requirement 4.6).
- System-level or table-level locks where resource-level locks suffice, which inflates contention.
- Unbounded lock hold durations, which allow a stalled agent to block a resource indefinitely.
Financial Services. Double-spend prevention is a regulatory requirement, not merely a best practice. Trading systems must enforce that the same collateral or margin is not committed to multiple trades concurrently. Payment systems must enforce that the same balance is not spent by multiple concurrent payment operations. The FCA expects that automated transaction systems have concurrency controls equivalent to those in traditional trading and payment infrastructure.
Crypto/Web3. Blockchain nonce management requires strict serialisation — concurrent transactions from the same address must use sequential nonces. Two agents submitting transactions with the same nonce will cause one to fail. Two agents submitting with different nonces but targeting the same smart contract state may produce unexpected results. The concurrency control layer must serialise blockchain interactions per address.
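Per-address serialisation of nonce allocation can be sketched as follows. This is an illustrative in-process allocator (names are my own); in practice the starting nonce would be fetched from the chain and submissions would be confirmed before the next nonce is considered final:

```python
import threading
from collections import defaultdict

class NonceAllocator:
    """Serialise blockchain submissions per address: each address receives
    strictly sequential nonces, handed out under a per-address lock, so two
    agents can never submit transactions carrying the same nonce."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._next = defaultdict(int)   # assumed starting nonce of 0 per address

    def next_nonce(self, address: str) -> int:
        with self._locks[address]:      # only submissions for the SAME address contend
            nonce = self._next[address]
            self._next[address] = nonce + 1
            return nonce

alloc = NonceAllocator()
print(alloc.next_nonce("0xabc"))  # 0
print(alloc.next_nonce("0xabc"))  # 1 - sequential, never duplicated
print(alloc.next_nonce("0xdef"))  # 0 - independent sequence per address
```

Locking per address rather than globally follows requirement 4.7: submissions from different addresses do not contend with each other.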
Healthcare. Concurrent modification of patient records can be patient-safety-critical. Two agents modifying the same medication list concurrently — one adding a new medication, the other removing an interaction — must not produce a state where the new medication is added but the interaction removal is lost (or vice versa). Healthcare deployments should use serialisable transactions for all patient record modifications.
Safety-Critical / CPS. Concurrent control of physical actuators by multiple agents creates safety hazards. Two agents adjusting the same valve, the same motor speed, or the same temperature setpoint must be serialised to prevent oscillation, overshoot, or conflicting commands. Physical safety constraints should be enforced through hardware interlocks in addition to software concurrency control.
Basic Implementation — Shared state access uses database-level locking (e.g., SELECT FOR UPDATE or equivalent). Lock hold durations are bounded by database timeout settings. Deadlock detection relies on the database engine's built-in detection. Concurrency conflicts are handled by the application with retry logic. Logging captures write operations but does not specifically capture concurrency contention events. Granularity is table-level or collection-level.
Intermediate Implementation — A dedicated concurrency control layer mediates all shared state access. Optimistic or pessimistic locking is selected based on workload characteristics. Resource-level locking minimises contention. Lock hold durations are explicitly configured and enforced. Deadlock detection operates within a 30-second maximum timeout. All concurrency conflicts are logged with agent identities, resource details, and resolution. Contention rate monitoring alerts when thresholds are exceeded. Agents cannot bypass the concurrency control layer.
Advanced Implementation — All intermediate capabilities plus: formal verification has confirmed that the concurrency control mechanism is correct (no lost updates, no dirty reads, no phantom reads). CRDTs are used where structurally appropriate to eliminate locking overhead. Adaptive locking adjusts granularity and strategy based on real-time contention patterns. Independent testing has attempted to exploit concurrency windows using timing attacks, concurrent request flooding, and lock manipulation — all attempts were detected and prevented. The system can demonstrate that the outcome of any concurrent execution is equivalent to some serial execution (serialisability proof).
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-165 compliance requires verification that concurrent agent operations produce correct, consistent state under all contention scenarios.
Test 8.1: Read-Modify-Write Atomicity
Test 8.2: Double-Spend Prevention
Test 8.3: Lock Timeout Enforcement
Test 8.4: Deadlock Detection and Resolution
Test 8.5: Bypass Prevention
Test 8.6: Granular Locking Verification
Test 8.7: High-Contention Stress Test
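A high-contention stress test of the kind Test 8.7 describes can be sketched as follows: many workers race to debit a shared balance through a single mediation point, and the test asserts that the final state is exactly consistent (no overdraft, no lost debits). The harness below is illustrative, not a prescribed test implementation:

```python
import sqlite3
import threading

# 150 workers each attempt to atomically debit 1 unit from a balance of 100.
# Under correct concurrency control, exactly 100 debits succeed and the
# balance never goes negative.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES (1, 100)")
db.commit()

write_lock = threading.Lock()   # single mediation point serialising all writes
successes = []

def worker():
    with write_lock:
        cur = db.execute(
            "UPDATE accounts SET balance = balance - 1 WHERE id = 1 AND balance > 0")
        db.commit()
        if cur.rowcount == 1:
            successes.append(1)

threads = [threading.Thread(target=worker) for _ in range(150)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final = db.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(final, len(successes))  # 0 100 - exactly 100 debits succeed, never overdrawn
```

The pass criterion is the invariant, not the interleaving: whatever order the workers run in, successful debits plus the final balance must equal the starting balance.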
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| SOX | Section 404 (Internal Controls) | Direct requirement |
| PSD2 | Article 73 (Liability for Unauthorised Payment Transactions) | Supports compliance |
| DORA | Article 9 (ICT Risk Management Framework) | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Operation) | Supports compliance |
| NIST AI RMF | MANAGE 2.2 (Risk Controls) | Supports compliance |
Article 15 requires that high-risk AI systems achieve an appropriate level of accuracy and robustness. Concurrency failures produce inaccurate outcomes — a lost update that changes a customer's risk score, a double-spend that misrepresents an account balance, a phantom read that causes a decision based on stale data. These are accuracy failures caused by inadequate concurrency controls. AG-165 ensures that concurrent agent operations produce accurate outcomes by enforcing data consistency under concurrent access.
SYSC 6.1.1R requires adequate systems and controls. Double-spend from concurrent payment agents is a systems deficiency. Lost updates on customer records are a controls deficiency. The FCA expects that automated systems have concurrency controls that prevent data integrity failures — controls that are at least equivalent to those in traditional financial systems, where database transactions and locking mechanisms have been standard practice for decades.
Section 404 requires effective internal controls over financial reporting. Concurrency failures that produce inaccurate financial data — double-counted revenue, duplicate expenses, inconsistent balances — are control deficiencies. If AI agents produce financial data that is inaccurate due to concurrency bugs, the internal controls are not effective. AG-165's concurrency governance prevents data integrity failures that would constitute Section 404 control deficiencies.
A double-spend caused by concurrent payment agents creates an unauthorised payment for the duplicate amount. Article 73 places liability on the payment service provider for unauthorised payments. AG-165 prevents the concurrency condition that would create the duplicate payment, avoiding the liability.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Proportional to the number of concurrent agents and the criticality of shared state — highest in multi-agent financial operations |
Consequence chain: Concurrency failures produce incorrect state that may not be immediately detectable. The immediate impact of a lost update is data corruption: the overwritten data is gone, and the current state does not reflect all intended modifications. For financial data, this means incorrect balances, incorrect positions, and incorrect risk calculations.

The detection challenge is that concurrency bugs are often intermittent. They occur only when specific timing conditions align, which may happen once in 1,000 operations or once in 100,000. The bug exists in every operation but manifests only under specific concurrent timing, which makes testing difficult and delays production detection.

The cumulative impact of undetected concurrency failures compounds over time: each lost update, each double-spend, each stale-read decision adds a small error. Over months, these errors accumulate into material discrepancies that are discovered during reconciliation, audit, or incident investigation. The remediation cost includes forensic analysis to identify all affected operations (which may require replaying the entire operation history), correction of affected records, financial remediation for affected parties, and architectural remediation to prevent recurrence.

For the double-spend scenario, the governed exposure is immediate and direct: exceeding the available balance creates a liability. For regulatory purposes, concurrency failures in financial systems demonstrate inadequate systems and controls, potentially triggering enforcement action.