Multi-Agent Population Diversity Governance requires that organisations operating large agent collectives continuously measure and enforce minimum diversity thresholds across behavioural strategies, model architectures, training lineages, and decision heuristics within those collectives. When agents in a swarm converge on identical reasoning patterns, shared failure modes, or homogeneous strategies, the collective loses its resilience advantage and becomes vulnerable to correlated failures that cascade at machine speed across every participant simultaneously. AG-397 mandates formal diversity baselines, real-time convergence detection, and automatic intervention mechanisms that restore heterogeneity before a monoculture collapse can propagate beyond the swarm boundary.
Scenario A — Monoculture Flash Crash in Algorithmic Trading Swarm: A quantitative hedge fund deploys a swarm of 240 trading agents across equities, futures, and options desks. Each agent uses a different initial parameterisation, but all share the same underlying transformer architecture and training corpus. Over six weeks, reinforcement learning from shared market signals causes all 240 agents to converge on a nearly identical mean-reversion strategy concentrated in mid-cap technology stocks. When a surprise earnings miss from a major semiconductor firm triggers a 4.2% sector decline, all 240 agents simultaneously execute sell orders totalling £380 million in notional value within 1.7 seconds. The concentrated selling amplifies the decline to 11.6%, triggering exchange circuit breakers. The fund suffers £47 million in realised losses before positions can be unwound, plus a further £23 million in market-impact costs during the unwind.
What went wrong: No diversity monitoring existed to detect that 240 ostensibly independent agents had converged to a single strategy. The initial parameterisation diversity was cosmetic — different starting weights but identical architecture and training data produced identical emergent behaviour under sustained market reinforcement. The fund treated agent count as a proxy for strategy diversity without measuring actual behavioural correlation. Consequence: £70 million in direct losses, FCA investigation under MAR Article 12 for potential market manipulation through coordinated algorithmic activity, suspension of the firm's algorithmic trading permissions pending remediation, and personal liability proceedings against the CTO under the Senior Managers Regime.
Scenario B — Homogeneous Content Moderation Swarm Creates Systematic Bias: A social media platform deploys a swarm of 1,800 content moderation agents across 14 language regions. The agents are initialised from three different base models but fine-tuned on a single shared moderation policy dataset. Over four months, cross-agent knowledge sharing causes all agents to converge on identical classification boundaries. The converged swarm systematically under-moderates politically motivated harassment when phrased as rhetorical questions — a pattern the shared training data did not adequately cover. A civil rights organisation publishes a report documenting 34,000 unmoderated harassment instances over 90 days, all following the same rhetorical question pattern. The platform faces enforcement action from the EU Digital Services Act coordinator, with potential fines of up to 6% of global annual turnover — estimated at €2.1 billion.
What went wrong: Fine-tuning on a shared dataset erased the initial model diversity. No measurement tracked whether the 1,800 agents maintained distinct classification boundaries or had converged to identical behaviour. The cross-agent knowledge sharing mechanism — intended to improve consistency — actively destroyed diversity by propagating the dominant classification pattern to all agents. The platform assumed that three different base models guaranteed three different failure modes, without measuring whether fine-tuning had eliminated that diversity. Consequence: €2.1 billion fine exposure, mandatory independent audit, 90-day remediation deadline, reputational damage in testimony before the European Parliament.
Scenario C — Robotic Warehouse Swarm Converges on Pathologically Efficient Route: A logistics company operates 600 autonomous picking robots in a 200,000-square-foot fulfilment centre. The robots use decentralised coordination with periodic strategy sharing. Over three weeks, reinforcement optimisation causes all robots to converge on an identical shortest-path algorithm that routes 78% of traffic through a single central corridor. During a peak holiday period, the corridor becomes gridlocked. Robots detect the congestion but, having all converged to the same re-routing heuristic, simultaneously redirect to the same secondary corridor, creating a cascading deadlock. The entire fulfilment centre halts for 4 hours and 22 minutes during the highest-volume shipping day of the year. The company loses approximately $8.4 million in delayed shipments, contractual SLA penalties, and emergency manual labour costs.
What went wrong: Strategy sharing was designed to propagate efficiency gains but had no mechanism to preserve route diversity. No metric tracked the distribution of routing strategies across the swarm. The optimisation pressure favoured a single globally optimal route under normal load, but the homogeneous strategy created a brittle system that failed catastrophically under peak load because every robot made the same decision simultaneously. Consequence: $8.4 million in direct losses, breach of contractual SLA with 23 enterprise customers, loss of two major fulfilment contracts worth $31 million annually, OSHA investigation into whether the deadlock created worker safety hazards during manual intervention.
Scope: This dimension applies to any deployment of three or more AI agents that operate as a collective — sharing an environment, exchanging information, coordinating actions, or competing within a shared market — where the agents' combined behaviour could produce correlated outcomes. The scope includes swarms, fleets, multi-agent reinforcement learning collectives, agent marketplaces, federated agent networks, and any topology where agents influence each other's behaviour through direct communication, shared environment modification, or indirect signalling. A collective of agents that cannot observe or influence each other is excluded. The scope explicitly includes collectives where agents are nominally independent but subject to convergence pressure through shared training data, shared model architectures, shared reward signals, shared environmental observations, or shared knowledge-transfer mechanisms. The test is whether the agents' failure modes can become correlated — not whether they were designed to coordinate.
4.1. A conforming system MUST define and maintain a formal diversity baseline for every agent collective, specifying minimum acceptable diversity thresholds across at least: behavioural strategy distribution, model architecture or lineage composition, decision boundary variance, and failure mode correlation.
4.2. A conforming system MUST continuously measure diversity across every agent collective using quantitative metrics that capture actual behavioural divergence — not merely nominal differences in configuration, parameterisation, or model identifier.
4.3. A conforming system MUST generate an alert when any measured diversity metric falls below its defined threshold, including identification of the converging dimension, the current metric value, the threshold value, and the rate of convergence.
4.4. A conforming system MUST implement at least one automatic intervention mechanism that activates when diversity thresholds are breached, capable of restoring diversity without requiring full collective shutdown — such as injecting strategy perturbation, removing converged agents from the active pool, or throttling knowledge-sharing channels.
4.5. A conforming system MUST log all diversity measurements, threshold breaches, and intervention actions in a tamper-evident record per AG-006.
4.6. A conforming system MUST conduct a diversity impact assessment before activating any mechanism that shares knowledge, weights, strategies, or reward signals across agents in a collective.
4.7. A conforming system SHOULD measure pairwise behavioural correlation across agents in the collective at defined intervals, flagging agent pairs whose correlation coefficient exceeds 0.85 across any measured dimension.
4.8. A conforming system SHOULD implement diversity-preserving constraints within any cross-agent learning or knowledge-sharing protocol, ensuring that convergence beyond defined thresholds is structurally prevented rather than merely monitored.
4.9. A conforming system SHOULD maintain a historical record of diversity metrics over time for each collective, enabling trend analysis and early detection of gradual convergence that has not yet breached a threshold.
4.10. A conforming system MAY implement adversarial diversity injection — periodically introducing agents with deliberately divergent strategies into the collective to stress-test the resilience of the diversity baseline.
4.11. A conforming system MAY implement real-time diversity dashboards accessible to governance oversight personnel, displaying current diversity metrics, trend lines, and threshold proximity for all active collectives.
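The pairwise check in requirement 4.7 can be made concrete in a few lines. The sketch below is illustrative only: it assumes each agent's behaviour has already been summarised as a numeric fingerprint vector (for example, per-action frequencies over the measurement window), and the function name and data shapes are hypothetical rather than part of this standard.

```python
import itertools

import numpy as np

CORRELATION_THRESHOLD = 0.85  # flag threshold from requirement 4.7


def flag_correlated_pairs(fingerprints: dict[str, np.ndarray],
                          threshold: float = CORRELATION_THRESHOLD):
    """Return agent pairs whose behavioural fingerprints correlate above threshold.

    `fingerprints` maps an agent identifier to a 1-D behavioural feature vector
    sampled over the measurement interval.
    """
    flagged = []
    for (a, va), (b, vb) in itertools.combinations(fingerprints.items(), 2):
        r = np.corrcoef(va, vb)[0, 1]  # Pearson correlation coefficient
        if r > threshold:
            flagged.append((a, b, float(r)))
    return flagged
```

On real collectives the fingerprint dimension and sampling interval would be set per the diversity baseline; the O(n²) pair scan may also need batching for large swarms.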
The fundamental value proposition of multi-agent systems is that a collective of diverse agents can outperform any single agent by combining different strategies, perspectives, and failure modes. A swarm of trading agents with diverse strategies provides portfolio diversification. A fleet of content moderation agents with diverse classification boundaries catches a broader range of harmful content. A collective of robotic agents with diverse routing heuristics maintains throughput under variable load. But this value proposition holds only if the diversity is real and maintained over time.
The critical risk that AG-397 addresses is convergent homogeneity — the tendency of agent collectives to lose diversity over time through shared learning, shared environmental pressure, or shared optimisation objectives. This is not a hypothetical risk. In biological systems, monocultures are well-documented sources of catastrophic failure: the Irish Potato Famine (genetic monoculture), disease vulnerability in managed honeybee populations bred from narrow genetic stock, and flash crashes in financial markets (algorithmic monoculture) all demonstrate the pattern. When every member of a collective responds identically to the same stimulus, the collective amplifies individual failure rather than absorbing it.
In multi-agent AI systems, convergence pressure comes from multiple sources. Shared training data creates correlated learned representations. Shared model architectures create correlated failure modes even with different training data. Shared reward signals drive agents toward identical optimal strategies. Knowledge-sharing mechanisms — designed to propagate beneficial innovations — simultaneously propagate dominant strategies that crowd out minority approaches. Reinforcement learning from shared environmental feedback creates a convergence ratchet: agents that adopt the currently dominant strategy receive higher rewards, further reinforcing convergence. Without active diversity monitoring and intervention, these pressures operate continuously and monotonically — diversity decreases over time, never spontaneously increases.
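The convergence ratchet can be demonstrated with a toy simulation. The model below is a deliberate simplification, not a description of any specific deployment: each step, every agent imitates the currently dominant strategy with probability equal to that strategy's population share — a stand-in for the shared-reward feedback loop described above. Strategy entropy declines and does not spontaneously recover.

```python
import math
import random
from collections import Counter


def strategy_entropy(strategies):
    """Shannon entropy (bits) of the strategy distribution across the swarm."""
    n = len(strategies)
    return -sum((c / n) * math.log2(c / n) for c in Counter(strategies).values())


def simulate_convergence_ratchet(n_agents=200, n_strategies=8, steps=50, seed=0):
    """Toy convergence model: imitation probability equals the dominant
    strategy's current population share. Returns the entropy trace."""
    rng = random.Random(seed)
    strategies = [rng.randrange(n_strategies) for _ in range(n_agents)]
    trace = [strategy_entropy(strategies)]
    for _ in range(steps):
        counts = Counter(strategies)
        dominant, dom_count = counts.most_common(1)[0]
        share = dom_count / n_agents
        # Each agent adopts the dominant strategy with probability = its share.
        strategies = [dominant if rng.random() < share else s for s in strategies]
        trace.append(strategy_entropy(strategies))
    return trace
```

Running it shows the ratchet: the dominant strategy's share grows roughly logistically, so a swarm that starts near maximum entropy collapses to a monoculture within a few dozen steps unless something intervenes.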
The regulatory landscape increasingly recognises systemic risk from algorithmic homogeneity. The EU AI Act's risk management requirements extend to collective behaviour. The Bank of England's Financial Policy Committee has published research on the systemic risk of algorithmic monoculture in financial markets. The FCA's expectations under MAR regarding market manipulation explicitly cover coordinated algorithmic activity — even when the coordination is emergent rather than designed. DORA's ICT risk management framework requires financial entities to assess concentration risk in technology systems, which includes concentration of algorithmic strategies.
The failure mode is particularly dangerous because it is invisible to conventional monitoring. Each individual agent appears to operate correctly within its mandate — per-agent monitoring under AG-001 and AG-022 shows no anomaly. The pathology exists only at the collective level: what was a diversified portfolio of strategies has silently become a concentrated bet. The damage manifests not as individual agent failure but as correlated collective failure — all agents making the same mistake simultaneously, amplifying the impact by the size of the collective.
AG-397 establishes the concept of diversity as a measurable, governable property of agent collectives. Diversity is not a binary attribute — it is a continuous spectrum measured across multiple dimensions. An organisation must define which dimensions of diversity matter for each collective, establish quantitative metrics for each dimension, set thresholds that represent minimum acceptable diversity, and implement monitoring and intervention mechanisms that maintain diversity above those thresholds over time.
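As an illustration of what "baseline plus thresholds" can look like in practice, the sketch below encodes one possible baseline record covering the four dimensions named in requirement 4.1, with a check that returns the breached dimensions for a measurement cycle. The field names and threshold semantics are illustrative assumptions, not normative.

```python
from dataclasses import dataclass


@dataclass
class DiversityBaseline:
    """One collective's diversity baseline (illustrative field names).

    All thresholds are minima except failure-mode correlation, which is a
    maximum: values above it breach the baseline.
    """
    collective_id: str
    min_strategy_entropy_bits: float       # behavioural strategy distribution
    min_architecture_lineages: int         # model architecture / lineage composition
    min_decision_boundary_variance: float  # decision boundary variance
    max_failure_mode_correlation: float    # failure mode correlation


def breached_dimensions(baseline: DiversityBaseline, measured: dict) -> list[str]:
    """Compare one measurement cycle against the baseline."""
    breaches = []
    if measured["strategy_entropy_bits"] < baseline.min_strategy_entropy_bits:
        breaches.append("behavioural_strategy_distribution")
    if measured["architecture_lineages"] < baseline.min_architecture_lineages:
        breaches.append("architecture_lineage_composition")
    if measured["decision_boundary_variance"] < baseline.min_decision_boundary_variance:
        breaches.append("decision_boundary_variance")
    if measured["failure_mode_correlation"] > baseline.max_failure_mode_correlation:
        breaches.append("failure_mode_correlation")
    return breaches
```

The output of such a check is what feeds the alerting (4.3) and intervention (4.4) requirements: each breached dimension identifies which convergence pressure needs treatment.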
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Strategy diversity is a systemic stability requirement, not merely a portfolio optimisation preference. Regulators including the Bank of England, ESMA, and the SEC have published research on the systemic risk of algorithmic monoculture. Firms should map diversity metrics to existing concentration risk frameworks. Trading agent collectives should maintain strategy diversity sufficient to prevent coordinated selling or buying that could trigger circuit breakers. The FCA expects firms to demonstrate that their algorithmic trading systems do not create or amplify market instability — proof of maintained strategy diversity is a key part of that demonstration.
Content Moderation. Classification diversity ensures that the collective catches content types that any single model's training data might miss. The EU Digital Services Act requires platforms to demonstrate that their moderation systems are effective across categories of illegal content — a monoculture moderation swarm with a shared blind spot fails this requirement systematically. Diversity should be measured across content categories, language coverage, cultural context sensitivity, and adversarial evasion resistance.
Robotics and CPS. Route, strategy, and heuristic diversity prevents the correlated deadlock patterns observed when all robots make the same environmental response simultaneously. ISO 12100 safety principles require that control systems avoid common-cause failures — a swarm of robots with identical decision algorithms represents a common-cause failure risk that safety assessments must address.
Basic Implementation — The organisation catalogues the model architectures, training lineages, and configuration parameters of each agent in every collective. A diversity register exists. Manual review occurs at deployment time to verify that the collective includes at least two distinct model architectures or training lineages. No continuous monitoring exists. Diversity is a deployment gate, not a runtime control.
Intermediate Implementation — Quantitative diversity metrics (behavioural fingerprint divergence, strategy distribution entropy, architecture HHI) are computed at defined intervals — at minimum daily for active collectives. Alerts fire when any metric breaches its defined threshold. A documented response procedure exists for diversity threshold breaches, including escalation to human governance oversight. Historical diversity trends are retained and reviewed quarterly. Knowledge-sharing mechanisms include basic rate limiting to slow convergence.
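Of the metrics named above, the architecture HHI is the simplest to make concrete — it borrows the Herfindahl-Hirschman Index from concentration-risk analysis. A minimal sketch:

```python
from collections import Counter


def architecture_hhi(architectures: list[str]) -> float:
    """Herfindahl-Hirschman Index over architecture shares.

    Ranges from 1/k (k architectures evenly represented) up to 1.0 (monoculture);
    higher values mean greater concentration, i.e. less diversity.
    """
    n = len(architectures)
    return sum((count / n) ** 2 for count in Counter(architectures).values())
```

A swarm split evenly across three lineages scores 1/3, while a 240-agent monoculture like Scenario A's scores 1.0 — so a threshold on this metric directly encodes a minimum lineage spread.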
Advanced Implementation — All intermediate capabilities plus: diversity metrics are computed in real time and displayed on governance dashboards. Automatic intervention mechanisms (perturbation injection, agent pool rotation, knowledge-sharing throttling) activate without human intervention when thresholds are breached. Adversarial diversity stress tests are conducted quarterly — deliberately applying convergence pressure to verify that monitoring and intervention mechanisms detect and correct the convergence before it reaches dangerous levels. The organisation maintains a formal diversity budget for each collective, reviewed and approved as part of the collective's governance mandate. Independent adversarial testing has verified that convergence attacks — attempts to deliberately homogenise the collective through strategic influence — are detected and mitigated.
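The perturbation-injection intervention mentioned above can be sketched as a simple control loop: reassign randomly chosen agents to randomly drawn strategies until the swarm's strategy entropy recovers above its baseline. This is a deliberately naive illustration — a production intervention would have to respect per-agent mandates and safety constraints, and the function shapes here are assumptions.

```python
import math
import random
from collections import Counter


def entropy_bits(labels) -> float:
    """Shannon entropy (bits) of the strategy distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def inject_perturbation(strategies, strategy_pool, min_entropy_bits, seed=0):
    """Perturb randomly chosen agents until entropy recovers above threshold.

    Returns (perturbed strategy list, number of reassignments performed).
    """
    rng = random.Random(seed)
    strategies = list(strategies)
    reassignments = 0
    # Hard cap guards against an unreachable threshold (e.g. a tiny pool).
    while (entropy_bits(strategies) < min_entropy_bits
           and reassignments < 10 * len(strategies)):
        strategies[rng.randrange(len(strategies))] = rng.choice(strategy_pool)
        reassignments += 1
    return strategies, reassignments
```

Note the design choice: the loop restores diversity without halting the collective, which is exactly the constraint requirement 4.4 places on intervention mechanisms.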
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-397 compliance requires demonstrating that diversity monitoring detects convergence, that alerts fire at correct thresholds, and that intervention mechanisms restore diversity. A comprehensive test programme should include the following tests.
Test 8.1: Diversity Baseline Completeness
Test 8.2: Behavioural Convergence Detection Accuracy
Test 8.3: Automatic Intervention Activation and Effectiveness
Test 8.4: Tamper-Evident Logging of Diversity Records
Test 8.5: Diversity Impact Assessment Before Knowledge Sharing
Test 8.6: Nominal Versus Actual Diversity Discrimination
Test 8.7: Gradual Convergence Trend Detection
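As one worked example, the core check of Test 8.2 can be expressed as an executable assertion: drive a synthetic swarm toward monoculture one agent at a time and verify that an entropy-based monitor raises its alert well before full convergence. The swarm size, strategy count, and threshold below are arbitrary test fixtures, not values the standard prescribes.

```python
import math
from collections import Counter


def entropy_bits(labels) -> float:
    """Shannon entropy (bits) of the strategy distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def first_alert_step(swarm, threshold_bits):
    """Flip one agent per step to the dominant strategy; return the step at
    which measured entropy first drops below the alert threshold."""
    swarm = list(swarm)
    for step in range(len(swarm)):
        swarm[step] = "s0"  # simulated convergence pressure
        if entropy_bits(swarm) < threshold_bits:
            return step
    return None


# 160 agents evenly spread over 8 strategies: entropy starts at 3.0 bits.
swarm = ["s%d" % (i % 8) for i in range(160)]
alert = first_alert_step(swarm, threshold_bits=1.5)
assert alert is not None and alert < 159  # alert fires before monoculture
```

The same harness shape extends to Tests 8.3 and 8.7: replace the assertion with a check that the intervention mechanism activates, or that the trend detector flags the decline before the threshold is crossed.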
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Direct requirement |
| EU AI Act | Article 15 (Accuracy, Robustness, Cybersecurity) | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| NIST AI RMF | GOVERN 1.7, MAP 2.3, MANAGE 2.4 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.4 (AI System Operation) | Supports compliance |
| DORA | Article 9 (ICT Risk Management Framework), Article 11 (ICT Concentration Risk) | Direct requirement |
Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies and analyses known and reasonably foreseeable risks. Convergent homogeneity in agent collectives is a reasonably foreseeable risk for any multi-agent deployment — the literature on algorithmic monoculture in financial markets, ecological monoculture collapse, and common-cause failure in safety-critical systems establishes the risk pattern clearly. AG-397 implements the risk identification and mitigation requirement by mandating diversity baselines, continuous monitoring, and automatic intervention. The regulation's requirement that risk management measures be proportionate to the degree of risk maps to AG-397's tiered approach — higher-value or safety-critical collectives require tighter diversity thresholds and faster intervention responses.
Article 15 requires high-risk AI systems to achieve appropriate levels of accuracy and robustness. A monoculture collective is inherently less robust than a diverse collective — it has a single point of failure replicated across every agent. AG-397's diversity monitoring directly supports the robustness requirement by ensuring that the collective maintains the error-diversity that prevents correlated failure. Cybersecurity is also relevant: an adversary who identifies a vulnerability in the single strategy adopted by a homogeneous swarm can exploit every agent simultaneously. Diversity is a cybersecurity defence.
For financial agent collectives — trading swarms, payment processing pools, reconciliation fleets — monoculture failure can produce material misstatement of financial results. A correlated trading loss across a homogeneous swarm is not 240 independent losses; it is one systemic loss amplified by the agent count. SOX auditors assessing internal controls over financial reporting should evaluate whether diversity monitoring provides adequate control over correlated loss exposure. The absence of diversity monitoring for a financial agent collective would likely constitute a significant deficiency, and a monoculture flash crash that produces material loss would constitute a material weakness.
SYSC 6.1.1R requires firms to establish and maintain adequate policies and procedures sufficient to ensure compliance. For firms deploying agent collectives in regulated activities, the FCA expects controls that address systemic risk — not merely individual agent risk. The Bank of England's Financial Policy Committee research on algorithmic monoculture establishes the regulatory expectation that firms monitor and manage strategy concentration within their algorithmic populations. The FCA's expectations under MiFID II regarding algorithmic trading systems include the requirement that firms assess the market impact of their algorithms operating collectively — a requirement that directly maps to diversity monitoring.
GOVERN 1.7 addresses processes for ongoing monitoring of AI systems. MAP 2.3 addresses the identification of interconnected AI systems and their combined effects. MANAGE 2.4 addresses risk treatment for identified AI risks. AG-397 supports compliance by establishing ongoing monitoring of collective diversity (GOVERN 1.7), identifying the combined effects of convergent agent behaviour (MAP 2.3), and implementing risk treatment through diversity baselines, thresholds, and intervention mechanisms (MANAGE 2.4).
Clause 6.1 requires actions to address risks within the AI management system. Clause 8.4 requires controls for AI system operation. Convergent homogeneity is a risk that emerges during operation — it is not present at deployment time and cannot be addressed solely through pre-deployment assessment. AG-397's continuous monitoring and intervention mechanisms implement Clause 8.4's operational control requirement for the specific risk of emergent monoculture in agent collectives.
Article 9 requires an ICT risk management framework. Article 11 specifically addresses ICT concentration risk — the risk arising from dependence on a limited number of ICT service providers or technology solutions. AG-397 extends the concentration risk concept to algorithmic strategy concentration within agent collectives. A financial entity whose trading operations depend on a swarm of agents that have converged to a single strategy has a concentration risk that Article 11 requires them to identify, assess, and manage. The diversity baseline, monitoring, and intervention mechanisms mandated by AG-397 implement the concentration risk management requirements of Article 11 for agent collectives.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Cross-organisation — correlated swarm failures propagate to counterparties, markets, and dependent systems simultaneously |
Consequence chain: Without diversity monitoring, agent collectives silently converge toward monoculture under the relentless pressure of shared learning, shared environments, and shared optimisation objectives. The convergence is invisible to individual-agent monitoring because each agent continues to operate within its mandate — the pathology exists only at the collective level. When an adverse event triggers a response, every agent in the homogeneous collective responds identically and simultaneously. The impact is not additive but multiplicative: 240 agents executing the same sell order amplify a 4% market decline to 12%; 1,800 moderation agents sharing the same blind spot create 34,000 unmoderated harassment instances; 600 robots adopting the same route create a deadlock that halts an entire facility.

The blast radius extends beyond the organisation: financial swarm failures propagate through market microstructure to affect all market participants; moderation monoculture failures affect millions of platform users; robotic swarm failures affect supply chains and contractual counterparties. The temporal dimension is critical — correlated failures at machine speed create damage faster than any human intervention can respond.

The regulatory consequence is severe because the failure pattern — algorithmic monoculture causing systemic harm — is precisely the risk that financial regulators, AI regulators, and safety regulators have warned about most explicitly. The absence of diversity monitoring after regulatory guidance is a demonstration of inadequate systems and controls that attracts the highest enforcement priority.