Concurrency and Shared-State Locking Governance requires that when multiple AI agents (or multiple instances of the same agent) access shared state such as databases, configuration stores, resource pools, queues, or any other mutable data, the system enforces explicit locking, serialisation, or conflict-resolution mechanisms that prevent race conditions, lost updates, phantom reads, and state corruption. AI agents operating concurrently without shared-state governance can produce outcomes that no individual agent intended: double-spending from a shared balance, conflicting updates to the same record, or decisions based on stale state that another agent has already changed. This dimension ensures that concurrent agent operations are correct, consistent, and auditable: the outcome of concurrent execution must be equivalent to some serial execution of the same operations.
Scenario A — Double-Spend from Shared Account Balance: Two payment agents (Agent-Alpha and Agent-Beta) share access to a corporate payment account with a balance of £500,000. Agent-Alpha checks the balance (£500,000), determines it can process a £350,000 payment, and begins the payment submission. Agent-Beta, concurrently, checks the balance (still £500,000 — Agent-Alpha's payment has not yet settled), determines it can process a £280,000 payment, and begins its own submission. Both payments execute successfully. The total disbursement is £630,000 against a balance of £500,000, creating a £130,000 overdraft. The account has no overdraft facility. The bank bounces the second payment 48 hours later, after the recipient has already been provisionally credited.
What went wrong: Both agents read the account balance without acquiring a lock. The check-and-spend operation was not atomic. The time between reading the balance and executing the payment created a window during which the balance was stale. Neither agent was aware of the other's in-flight transaction. Consequence: £130,000 overdraft exposure. Bounced payment damages supplier relationship. Bank charges for unauthorised overdraft. FCA inquiry into automated payment controls.
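The failure in Scenario A is the classic check-then-act race: the balance check and the debit are separate operations. A minimal sketch of the fix, assuming a SQL-backed balance store, is to fold the check and the debit into a single atomic conditional statement so no window exists between them (the `accounts` schema here is hypothetical, for illustration):

```python
import sqlite3

def atomic_debit(conn: sqlite3.Connection, account_id: int, amount: int) -> bool:
    """Debit `amount` only if the balance covers it, in one atomic statement.

    The balance check and the debit happen in the same UPDATE, so no other
    agent can spend the same funds between the check and the write.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE accounts SET balance = balance - ? "
            "WHERE id = ? AND balance >= ?",
            (amount, account_id, amount),
        )
    return cur.rowcount == 1  # 0 rows touched => insufficient funds, no debit

# Demo: a £500,000 balance cannot honour both a £350,000 and a £280,000 debit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 500000)")
print(atomic_debit(conn, 1, 350000))  # True  - first payment succeeds
print(atomic_debit(conn, 1, 280000))  # False - second is rejected, no overdraft
print(conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0])  # 150000
```

The same pattern applies to any store that offers conditional writes; the essential property is that the predicate and the mutation are evaluated as one unit.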
Scenario B — Lost Update on Customer Record: A customer service agent and a fraud detection agent both update the same customer record concurrently. The customer service agent reads the record, modifies the contact email (customer request), and writes the updated record. The fraud detection agent, which read the record 200 milliseconds before the customer service agent's write, adds a fraud flag to the record and writes its version. The fraud detection agent's write overwrites the customer service agent's update because it was based on the pre-update version of the record. The customer's email change is lost. The customer continues receiving communications at the old email address, including sensitive financial documents. The fraud flag is present, but the customer's requested update is silently discarded.
What went wrong: Both agents used a read-modify-write pattern without optimistic or pessimistic locking. The second write overwrote the first write entirely because the system had no mechanism to detect that the record had changed between the read and the write. Consequence: Customer data integrity failure. Sensitive financial documents sent to an email address the customer has explicitly asked to retire. GDPR breach notification potential. Customer complaint to the Financial Ombudsman.
Scenario C — Optimistic Locking Prevents Conflicting Updates: Two agents attempt to update the same inventory record. Each reads the record, which includes a version number (version 47). Agent-A modifies the quantity and submits the update with version 47. The system accepts the update and increments the version to 48. Agent-B, which also read version 47, submits its update referencing version 47. The system detects that the current version is 48, not 47 — the record has changed since Agent-B read it. The system rejects Agent-B's update with a conflict response. Agent-B re-reads the record (now version 48 with Agent-A's changes), applies its own modification to the updated data, and resubmits with version 48. The update succeeds. Both agents' changes are preserved.
What went right: Optimistic locking detected the concurrent modification. The conflict was resolved by requiring the second writer to re-read and re-apply changes. No data was lost. The final state reflects both agents' intended modifications.
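The version-number mechanism in Scenario C can be sketched in a few lines, assuming a SQL store with an explicit `version` column (the `inventory` schema is illustrative). The write succeeds only if the version is unchanged since the read; a conflicting writer must re-read and retry:

```python
import sqlite3

def update_with_version(conn, record_id, new_qty, expected_version) -> bool:
    """Optimistic write: succeeds only if the version is unchanged since the read."""
    with conn:
        cur = conn.execute(
            "UPDATE inventory SET qty = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_qty, record_id, expected_version),
        )
    return cur.rowcount == 1  # 0 => someone else wrote first; caller must re-read

def read(conn, record_id):
    return conn.execute(
        "SELECT qty, version FROM inventory WHERE id = ?", (record_id,)
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)")
conn.execute("INSERT INTO inventory VALUES (1, 100, 47)")

# Both agents read version 47.
qty_a, v_a = read(conn, 1)
qty_b, v_b = read(conn, 1)

assert update_with_version(conn, 1, qty_a - 10, v_a)      # Agent-A wins; version -> 48
assert not update_with_version(conn, 1, qty_b - 5, v_b)   # Agent-B conflicts on stale 47

qty_b2, v_b2 = read(conn, 1)                              # re-read: qty 90, version 48
assert update_with_version(conn, 1, qty_b2 - 5, v_b2)     # retry succeeds; both changes kept
```

Note that Agent-B reapplies its delta to the re-read value rather than blindly resubmitting its original write, which is what preserves both modifications.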
Scope: This dimension applies to every deployment where multiple AI agents — or multiple instances of the same agent — can access the same mutable state simultaneously. Mutable state includes: database records, configuration files, resource pools (account balances, inventory counts, capacity allocations), message queues, shared caches, file system objects, and any other data that can be read and modified by more than one agent concurrently. The scope extends to implicit shared state: two agents that do not directly share a database but both affect the same downstream system (e.g., both submit payments from the same account) are accessing shared state (the account balance) and require concurrency governance. Single-agent deployments with no concurrent access to mutable state are excluded, but organisations should assess whether the deployment could evolve to multi-agent or multi-instance operation.
4.1. A conforming system MUST implement explicit concurrency control for all shared mutable state accessed by agents, using pessimistic locking, optimistic locking with conflict detection, or serialisable transaction isolation.
4.2. A conforming system MUST ensure that read-modify-write operations on shared state are atomic — the state cannot change between the read and the write without detection and resolution.
4.3. A conforming system MUST enforce a maximum lock hold duration for pessimistic locks, after which the lock is automatically released and the holding agent's pending operation is rolled back.
4.4. A conforming system MUST implement deadlock detection and automatic resolution for pessimistic locking implementations, ensuring that deadlocked agents are released within a defined timeout (maximum 30 seconds).
4.5. A conforming system MUST log all concurrency conflicts — lock contention events, optimistic locking conflicts, deadlock detections, and lock timeout releases — with the agents involved, the resource contended, and the resolution.
4.6. A conforming system MUST prevent agents from bypassing the locking mechanism through direct access to the underlying state store — all access must be mediated through the concurrency control layer.
4.7. A conforming system SHOULD implement resource-level locking rather than system-level locking — locking the specific record, row, or resource being accessed rather than the entire table, collection, or system.
4.8. A conforming system SHOULD monitor concurrency contention rates and alert when contention exceeds defined thresholds (e.g., more than 5% of operations experience lock contention), as high contention indicates architectural issues.
4.9. A conforming system MAY implement conflict-free replicated data types (CRDTs) for state that can be structurally designed to avoid conflicts, eliminating the need for locking in specific use cases.
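Requirement 4.3's bounded lock hold duration can be sketched as a lease: a pessimistic lock that expires after a maximum hold time, so a crashed or stalled agent cannot block a resource indefinitely. This is a hypothetical in-process illustration (class and method names are my own); a production implementation would also roll back the expired holder's pending operation, as 4.3 requires:

```python
import threading
import time

class LeaseLock:
    """Pessimistic resource lock with a bounded hold duration (cf. requirement 4.3)."""

    def __init__(self, max_hold_s: float = 5.0):
        self._mutex = threading.Lock()
        self._holder = None          # (agent_id, acquired_at) or None
        self.max_hold_s = max_hold_s

    def acquire(self, agent_id: str) -> bool:
        with self._mutex:
            if self._holder is not None:
                _, since = self._holder
                if time.monotonic() - since < self.max_hold_s:
                    return False     # contended: a live lease is held
                # Lease expired: forcibly release; the stalled agent's pending
                # operation must be rolled back before its work is trusted.
            self._holder = (agent_id, time.monotonic())
            return True

    def release(self, agent_id: str) -> None:
        with self._mutex:
            if self._holder and self._holder[0] == agent_id:
                self._holder = None

lock = LeaseLock(max_hold_s=0.1)
assert lock.acquire("agent-alpha")
assert not lock.acquire("agent-beta")   # alpha holds a live lease
time.sleep(0.15)
assert lock.acquire("agent-beta")       # alpha's lease expired; beta takes over
```

In a distributed deployment the same idea appears as lease-based locks in a shared coordination store; the lease duration is the enforced maximum hold time.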
AI agent concurrency is qualitatively different from traditional software concurrency. Traditional concurrent software follows deterministic code paths — the concurrency bugs are in the code, and they can be found by testing the code. AI agents operate on reasoning — their concurrent behaviour depends on their inputs, their context, and their model's inference at that moment. Two agent instances processing the same type of request may access shared state in different orders, at different speeds, and with different intermediate states, depending on the inputs they receive and the reasoning paths their models take.
This creates concurrency challenges that are both more likely and harder to detect than traditional software concurrency bugs. They are more likely because the agent's behaviour is non-deterministic — the same agent processing the same input may access resources in a different order each time, depending on model inference variability. They are harder to detect because the failure modes are subtle: a lost update does not crash the system; it silently produces incorrect state that may not be detected until reconciliation or audit.
The governed exposure from concurrency failures can be enormous. The double-spend scenario in Scenario A is not hypothetical; it is the natural result of deploying concurrent payment agents without shared-state governance. The probability of concurrent access to the same balance increases with the number of agents, the frequency of operations, and the latency of balance updates. For an organisation processing 10,000 payments per day across 5 concurrent agent instances, the probability of at least one concurrent balance access per day approaches certainty.
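The "approaches certainty" claim can be made concrete with a back-of-envelope model. The stale-window duration below is an assumption for illustration, not a figure from the text; arrivals are modelled as a Poisson process across all agent instances:

```python
import math

# Assumed figures (illustrative only):
payments_per_day = 10_000
stale_window_s = 2.0            # assumed latency before a balance update is visible
seconds_per_day = 86_400

rate = payments_per_day / seconds_per_day          # ~0.116 payments/second overall

# Probability that a given payment overlaps at least one other payment's
# stale window (another arrival lands within the 2s window):
p_overlap = 1 - math.exp(-rate * stale_window_s)   # ~0.21

# Probability of at least one overlapping pair somewhere in the day:
p_any = 1 - (1 - p_overlap) ** payments_per_day    # numerically indistinguishable from 1.0

print(round(p_overlap, 3), p_any)
```

Even with a much shorter stale window, the per-day probability of at least one overlap remains effectively 1 at this volume, which is why the dimension treats concurrent balance access as an expected condition rather than an edge case.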
The governance dimension is critical because concurrency failures undermine auditability. When a lost update silently overwrites data, the audit trail may not record the overwritten state — only the final state is visible. The fraud flag that was overwritten in Scenario B exists in neither the current record nor the audit trail (if the audit records the final state of each write operation). This makes concurrency failures not just operationally harmful but forensically invisible.
The implementation requires a concurrency control layer that mediates all agent access to shared mutable state. The layer enforces locking, detects conflicts, resolves deadlocks, and logs contention events.
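A minimal sketch of such a mediation layer, assuming an in-memory versioned store (class and field names are hypothetical): all reads and writes pass through one object, and every conflict is logged with the agent, the resource, and the resolution, as requirements 4.5 and 4.6 demand:

```python
import logging
import threading

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("concurrency")

class MediatedStore:
    """All agent access flows through this layer; conflicts are logged."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}                      # resource -> (value, version)

    def read(self, resource):
        with self._lock:
            return self._data.get(resource, (None, 0))

    def write(self, agent_id, resource, value, expected_version) -> bool:
        with self._lock:
            _, current = self._data.get(resource, (None, 0))
            if current != expected_version:
                # Requirement 4.5: log agents involved, resource, and resolution.
                log.info(
                    "conflict agent=%s resource=%s read_v=%d current_v=%d "
                    "resolution=rejected-retry",
                    agent_id, resource, expected_version, current,
                )
                return False
            self._data[resource] = (value, current + 1)
            return True

store = MediatedStore()
_, v = store.read("customer:42")
assert store.write("service-agent", "customer:42", {"email": "new@example.com"}, v)
assert not store.write("fraud-agent", "customer:42", {"flag": True}, v)  # conflict logged
```

The key architectural point is requirement 4.6: agents hold a reference to the mediation layer, never to the underlying store, so the logging and conflict detection cannot be bypassed.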
Recommended patterns:
- Optimistic locking with version numbers for low-contention workloads, with mandatory re-read-and-retry on conflict (as in Scenario C).
- Pessimistic locking with bounded hold durations for high-contention or high-value resources (requirement 4.3).
- Atomic conditional updates that combine the check and the write in a single operation (check-and-spend, compare-and-swap).
- Resource-level lock granularity to minimise contention (requirement 4.7).
- CRDTs for state that can be structurally designed to avoid conflicts (requirement 4.9).
Anti-patterns to avoid:
- Read-modify-write sequences without version checks or locks (the lost-update pattern of Scenario B).
- Check-then-act sequences in which the check and the action are separate operations (the double-spend pattern of Scenario A).
- Direct agent access to the underlying state store that bypasses the concurrency control layer (prohibited by requirement 4.6).
- System-level or table-level locks where resource-level locks suffice, which inflates contention.
- Unbounded lock hold durations, which allow a stalled agent to block a resource indefinitely.
Financial Services. Double-spend prevention is a regulatory requirement, not merely a best practice. Trading systems must enforce that the same collateral or margin is not committed to multiple trades concurrently. Payment systems must enforce that the same balance is not spent by multiple concurrent payment operations. The FCA expects that automated transaction systems have concurrency controls equivalent to those in traditional trading and payment infrastructure.
Crypto/Web3. Blockchain nonce management requires strict serialisation — concurrent transactions from the same address must use sequential nonces. Two agents submitting transactions with the same nonce will cause one to fail. Two agents submitting with different nonces but targeting the same smart contract state may produce unexpected results. The concurrency control layer must serialise blockchain interactions per address.
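Per-address serialisation of nonce allocation can be sketched as follows. This is an illustrative in-process allocator (names are my own); in practice the starting nonce would be fetched from the chain and submissions would be confirmed before the next nonce is considered final:

```python
import threading
from collections import defaultdict

class NonceAllocator:
    """Serialise blockchain submissions per address: each address receives
    strictly sequential nonces, handed out under a per-address lock, so two
    agents can never submit transactions carrying the same nonce."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._next = defaultdict(int)   # assumed starting nonce of 0 per address

    def next_nonce(self, address: str) -> int:
        with self._locks[address]:      # only submissions for the SAME address contend
            nonce = self._next[address]
            self._next[address] = nonce + 1
            return nonce

alloc = NonceAllocator()
print(alloc.next_nonce("0xabc"))  # 0
print(alloc.next_nonce("0xabc"))  # 1 - sequential, never duplicated
print(alloc.next_nonce("0xdef"))  # 0 - independent sequence per address
```

Locking per address rather than globally follows requirement 4.7: submissions from different addresses do not contend with each other.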
Healthcare. Concurrent modification of patient records can be patient-safety-critical. Two agents modifying the same medication list concurrently — one adding a new medication, the other removing an interaction — must not produce a state where the new medication is added but the interaction removal is lost (or vice versa). Healthcare deployments should use serialisable transactions for all patient record modifications.
Safety-Critical / CPS. Concurrent control of physical actuators by multiple agents creates safety hazards. Two agents adjusting the same valve, the same motor speed, or the same temperature setpoint must be serialised to prevent oscillation, overshoot, or conflicting commands. Physical safety constraints should be enforced through hardware interlocks in addition to software concurrency control.
Basic Implementation — Shared state access uses database-level locking (e.g., SELECT FOR UPDATE or equivalent). Lock hold durations are bounded by database timeout settings. Deadlock detection relies on the database engine's built-in detection. Concurrency conflicts are handled by the application with retry logic. Logging captures write operations but does not specifically capture concurrency contention events. Granularity is table-level or collection-level.
Intermediate Implementation — A dedicated concurrency control layer mediates all shared state access. Optimistic or pessimistic locking is selected based on workload characteristics. Resource-level locking minimises contention. Lock hold durations are explicitly configured and enforced. Deadlock detection operates within a 30-second maximum timeout. All concurrency conflicts are logged with agent identities, resource details, and resolution. Contention rate monitoring alerts when thresholds are exceeded. Agents cannot bypass the concurrency control layer.
Advanced Implementation — All intermediate capabilities plus: formal verification has confirmed that the concurrency control mechanism is correct (no lost updates, no dirty reads, no phantom reads). CRDTs are used where structurally appropriate to eliminate locking overhead. Adaptive locking adjusts granularity and strategy based on real-time contention patterns. Independent testing has attempted to exploit concurrency windows using timing attacks, concurrent request flooding, and lock manipulation — all attempts were detected and prevented. The system can demonstrate that the outcome of any concurrent execution is equivalent to some serial execution (serialisability proof).
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-165 compliance requires verification that concurrent agent operations produce correct, consistent state under all contention scenarios.
Test 8.1: Read-Modify-Write Atomicity
Test 8.2: Double-Spend Prevention
Test 8.3: Lock Timeout Enforcement
Test 8.4: Deadlock Detection and Resolution
Test 8.5: Bypass Prevention
Test 8.6: Granular Locking Verification
Test 8.7: High-Contention Stress Test
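A high-contention stress test of the kind Test 8.7 describes can be sketched as follows: many workers race to debit a shared balance through a single mediation point, and the test asserts that the final state is exactly consistent (no overdraft, no lost debits). The harness below is illustrative, not a prescribed test implementation:

```python
import sqlite3
import threading

# 150 workers each attempt to atomically debit 1 unit from a balance of 100.
# Under correct concurrency control, exactly 100 debits succeed and the
# balance never goes negative.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES (1, 100)")
db.commit()

write_lock = threading.Lock()   # single mediation point serialising all writes
successes = []

def worker():
    with write_lock:
        cur = db.execute(
            "UPDATE accounts SET balance = balance - 1 WHERE id = 1 AND balance > 0")
        db.commit()
        if cur.rowcount == 1:
            successes.append(1)

threads = [threading.Thread(target=worker) for _ in range(150)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final = db.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(final, len(successes))  # 0 100 - exactly 100 debits succeed, never overdrawn
```

The pass criterion is the invariant, not the interleaving: whatever order the workers run in, successful debits plus the final balance must equal the starting balance.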
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| EU AI Act | Article 15 (Accuracy, Robustness and Cybersecurity) | Direct requirement |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| SOX | Section 404 (Internal Controls) | Direct requirement |
| PSD2 | Article 73 (Liability for Unauthorised Payment Transactions) | Supports compliance |
| DORA | Article 9 (ICT Risk Management Framework) | Supports compliance |
| ISO 42001 | Clause 8.4 (AI System Operation) | Supports compliance |
| NIST AI RMF | MANAGE 2.2 (Risk Controls) | Supports compliance |
Article 15 requires that high-risk AI systems achieve an appropriate level of accuracy and robustness. Concurrency failures produce inaccurate outcomes — a lost update that changes a customer's risk score, a double-spend that misrepresents an account balance, a phantom read that causes a decision based on stale data. These are accuracy failures caused by inadequate concurrency controls. AG-165 ensures that concurrent agent operations produce accurate outcomes by enforcing data consistency under concurrent access.
SYSC 6.1.1R requires adequate systems and controls. Double-spend from concurrent payment agents is a systems deficiency. Lost updates on customer records are a controls deficiency. The FCA expects that automated systems have concurrency controls that prevent data integrity failures — controls that are at least equivalent to those in traditional financial systems, where database transactions and locking mechanisms have been standard practice for decades.
Section 404 requires effective internal controls over financial reporting. Concurrency failures that produce inaccurate financial data — double-counted revenue, duplicate expenses, inconsistent balances — are control deficiencies. If AI agents produce financial data that is inaccurate due to concurrency bugs, the internal controls are not effective. AG-165's concurrency governance prevents data integrity failures that would constitute Section 404 control deficiencies.
A double-spend caused by concurrent payment agents creates an unauthorised payment for the duplicate amount. Article 73 places liability on the payment service provider for unauthorised payments. AG-165 prevents the concurrency condition that would create the duplicate payment, avoiding the liability.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Proportional to the number of concurrent agents and the criticality of shared state — highest in multi-agent financial operations |
Consequence chain: Concurrency failures produce incorrect state that may not be immediately detectable. The immediate impact of a lost update is data corruption: the overwritten data is gone, and the current state does not reflect all intended modifications. For financial data, this means incorrect balances, incorrect positions, and incorrect risk calculations.

The detection challenge is that concurrency bugs are often intermittent. They occur only when specific timing conditions align, which may happen once in 1,000 operations or once in 100,000. The bug exists in every operation but manifests only under specific concurrent timing, which makes testing difficult and delays production detection.

The cumulative impact of undetected concurrency failures compounds over time: each lost update, each double-spend, each stale-read decision adds a small error. Over months, these errors accumulate into material discrepancies that are discovered during reconciliation, audit, or incident investigation. The remediation cost includes forensic analysis to identify all affected operations (which may require replaying the entire operation history), correction of affected records, financial remediation for affected parties, and architectural remediation to prevent recurrence.

For the double-spend scenario, the governed exposure is immediate and direct: exceeding the available balance creates a liability. For regulatory purposes, concurrency failures in financial systems demonstrate inadequate systems and controls, potentially triggering enforcement action.