Agent Aegis Compliance Leaderboard

AGS v2.1 Compliance Leaderboard

An independent assessment of AI governance across the 792 dimensions of the AGS v2.1 standard, covering two audit types: the LLM Audit and the Agent Audit. Adversarial testing uses GPT-4o, Gemini 2.5 Flash, and Grok-3; the LLM Audit covers 22,110 attack scenarios and the Agent Audit 1,530.

AGS Assurance Framework — three tiers of governance assurance: AGS-AUP, AGS-LA, AGS-RA.

Scoring Methodology

Scores reflect the depth of public evidence, not merely its presence. A dimension marked Evidenced may be only partially implemented. Verified scores require platform submission and independent adversarial testing across every dimension in scope for the audit (792 for the LLM Audit, 508 for the Agent Audit).

Three evidence categories:
■ Evidenced: publicly documented capability confirmed.
■ Not Publicly Documented: no public evidence found; the platform may or may not implement this dimension.
■ Structurally Absent: the platform's architecture makes this dimension impossible or inapplicable by design.

0–3 scoring scale: 0 = Structurally absent | 1 = Partially evidenced | 2 = Fully evidenced | 3 = Verified by independent adversarial testing
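The 0–3 scale can be read as an ordered enumeration. The Python sketch below encodes it and pairs each evidence category with the scale values it could correspond to. The category-to-score mapping is an assumption drawn from the definitions above, not a published AGS rule.

```python
from enum import IntEnum


class AgsScore(IntEnum):
    """The 0-3 scale from the scoring methodology."""
    STRUCTURALLY_ABSENT = 0
    PARTIALLY_EVIDENCED = 1
    FULLY_EVIDENCED = 2
    VERIFIED = 3


# Hypothetical mapping from the published evidence categories to the scale
# values each could correspond to. "Evidenced" spans partial and full
# evidence; "Not Publicly Documented" has no stated value, so it maps to ().
# VERIFIED (3) only results from adversarial testing, not from documentation.
CATEGORY_TO_SCORES: dict[str, tuple[AgsScore, ...]] = {
    "Evidenced": (AgsScore.PARTIALLY_EVIDENCED, AgsScore.FULLY_EVIDENCED),
    "Not Publicly Documented": (),
    "Structurally Absent": (AgsScore.STRUCTURALLY_ABSENT,),
}


def possible_scores(category: str) -> tuple[AgsScore, ...]:
    """Scale values consistent with an evidence category (KeyError if unknown)."""
    return CATEGORY_TO_SCORES[category]


print([s.name for s in possible_scores("Evidenced")])
# ['PARTIALLY_EVIDENCED', 'FULLY_EVIDENCED']
```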

Estimated scores are based solely on publicly available documentation as of April 2026. AgentGoverning invites all platforms to submit for verification.
LLM Audit

LLM Audit Leaderboard

Verified scores require adversarial testing by Agent Aegis. Estimated scores are based on publicly available documentation and vendor claims. Submit your platform to receive a verified score.

AGS v2.1 LLM Audit Rankings
Verified and estimated scores. April 2026. Submit for verified score.
Agent Shield™ · 99.9% · AGS-RA
Microsoft Copilot Studio · 26% · NOT SUBMITTED
Amazon Bedrock Agents · 22% · NOT SUBMITTED
Onyx Security · 20% · NOT SUBMITTED
Google Vertex AI · 18% · NOT SUBMITTED
SafePaaS · 10% · NOT SUBMITTED
── INDUSTRY AVERAGE · 19%
Industry Average (estimated): 19% · Based on publicly available documentation for assessed platforms. Scores may not reflect actual platform capabilities. Read our full disclaimer.
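The published industry averages are consistent with an unweighted mean of the five estimated scores, with Agent Shield™ excluded: (26 + 22 + 20 + 18 + 10) / 5 = 19.2, shown as 19%, and for the Agent Audit table (24 + 21 + 14 + 11 + 7) / 5 = 15.4, shown as 15%. The page does not state this formula, so the Python sketch below is an inference from the numbers rather than Agent Aegis's documented method.

```python
# Estimated scores (%) from the two leaderboards. Agent Shield™ is excluded
# because including it does not reproduce the published averages.
LLM_AUDIT = {
    "Microsoft Copilot Studio": 26,
    "Amazon Bedrock Agents": 22,
    "Onyx Security": 20,
    "Google Vertex AI": 18,
    "SafePaaS": 10,
}
AGENT_AUDIT = {
    "Amazon Bedrock Agents": 24,
    "Microsoft Copilot Studio": 21,
    "Onyx Security": 14,
    "Google Vertex AI": 11,
    "SafePaaS": 7,
}


def industry_average(scores: dict[str, int]) -> float:
    """Unweighted mean of the estimated scores (assumed formula)."""
    return sum(scores.values()) / len(scores)


print(f"LLM Audit: {industry_average(LLM_AUDIT):.1f}%")      # LLM Audit: 19.2%
print(f"Agent Audit: {industry_average(AGENT_AUDIT):.1f}%")  # Agent Audit: 15.4%
```

Rounded to whole percentages, these give the 19% and 15% shown on the leaderboards.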
Agent Shield™ 99.9% — 792/792 VERIFIED
Status

Agent Shield™ has completed the LLM Audit with a score of 99.9% across 792 dimensions of AGS v2.1. Independently verified through 22,110 attacks with 0 bypasses across GPT-4o, Gemini 2.5 Flash, and Grok-3. Date: 10 April 2026. Manifest SHA-256: 8697f5ada643414735d82ff513dfd1592a7294c5d6ee3afe918367257a5b2bf1.
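A published digest is only useful if a reader can recompute it. The sketch below assumes the digest is taken over the raw bytes of a manifest file that Agent Aegis makes available; this page does not say where the manifest is published or what it contains, so the file name in the usage line is hypothetical. It uses only Python's standard `hashlib`.

```python
import hashlib
from pathlib import Path

# Digests as published on this page for the two audits.
PUBLISHED_DIGESTS = {
    "llm_audit": "8697f5ada643414735d82ff513dfd1592a7294c5d6ee3afe918367257a5b2bf1",
    "agent_audit": "7c5766cdb0adacba862499e69e28fefc85de656efa35ef355ef5c3ae11e334a2",
}


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_matches(path: Path, audit: str) -> bool:
    """True if the local file hashes to the digest published for `audit`."""
    return sha256_of_file(path) == PUBLISHED_DIGESTS[audit].lower()


# Usage, assuming a manifest file obtained from Agent Aegis (hypothetical name):
# manifest_matches(Path("llm_audit_manifest.json"), "llm_audit")
```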

Microsoft Copilot Studio ESTIMATED · 26%
What This Score Means

Microsoft Copilot Studio has the strongest governance foundation among the platforms assessed from public documentation. Purview integration, Agent 365, and deterministic policy enforcement give it meaningful coverage across Groups A and B. However, its governance is primarily ecosystem-level access control rather than agent-specific governance of financial activity and agent behaviour. Groups E through J, covering financial crime detection, cross-domain governance, reasoning integrity, and physical impact, are not evidenced in published documentation.

Evidenced (21)
  • AG-1
  • AG-4
  • AG-5
  • AG-6
  • AG-7
  • AG-8
  • AG-12
  • AG-13
  • AG-15
  • AG-16
  • AG-17
  • AG-18
  • AG-19
  • AG-21
  • AG-22
  • AG-23
  • AG-24
  • AG-38
  • AG-47
  • AG-48
  • AG-49
Not Publicly Documented (16)
  • AG-2
  • AG-3
  • AG-9
  • AG-10
  • AG-11
  • AG-14
  • AG-20
  • AG-26
  • AG-27
  • AG-29
  • AG-30
  • AG-31
  • AG-32
  • AG-33
  • AG-36
  • AG-37
Structurally Absent (13)
  • AG-25
  • AG-28
  • AG-34
  • AG-35
  • AG-39
  • AG-40
  • AG-41
  • AG-42
  • AG-43
  • AG-44
  • AG-45
  • AG-46
  • AG-50
Sources: Microsoft Copilot Studio documentation, Agent 365 product page, Microsoft Purview docs, Microsoft Ignite 2025, Power Platform admin documentation. March 2026.
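For every platform on this leaderboard, the three list counts sum to 50, which suggests each dimension from AG-1 to AG-50 is assigned exactly one evidence category. A minimal Python check of that property, using the Copilot Studio lists above; the same function can be applied to the other platforms' lists.

```python
# Copilot Studio LLM Audit lists, as published above (AG numbers only).
EVIDENCED = {1, 4, 5, 6, 7, 8, 12, 13, 15, 16, 17, 18, 19, 21, 22, 23, 24, 38, 47, 48, 49}
NOT_DOCUMENTED = {2, 3, 9, 10, 11, 14, 20, 26, 27, 29, 30, 31, 32, 33, 36, 37}
STRUCTURALLY_ABSENT = {25, 28, 34, 35, 39, 40, 41, 42, 43, 44, 45, 46, 50}
ALL_AG = set(range(1, 51))  # AG-1 .. AG-50


def is_partition(groups: list[set[int]], universe: set[int]) -> bool:
    """True if the groups are pairwise disjoint and together cover the universe.

    If the sizes sum to len(universe) and the union equals the universe,
    no dimension can appear in two groups.
    """
    total = sum(len(g) for g in groups)
    union = set().union(*groups)
    return total == len(universe) and union == universe


print([len(EVIDENCED), len(NOT_DOCUMENTED), len(STRUCTURALLY_ABSENT)])  # [21, 16, 13]
print(is_partition([EVIDENCED, NOT_DOCUMENTED, STRUCTURALLY_ABSENT], ALL_AG))  # True
```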
Amazon Bedrock Agents ESTIMATED · 22%
What This Score Means

Amazon Bedrock AgentCore represents a genuinely sophisticated approach to agent governance with its deterministic policy enforcement layer operating outside the LLM reasoning loop. The Cedar policy language implementation and AgentCore Gateway are among the most technically credible governance architectures publicly documented. However, coverage remains concentrated in Groups A and B, with financial crime detection, multi-agent coordination governance, and advanced alignment dimensions absent from public documentation.
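The key architectural point here is that the allow/deny decision is made by fixed rules evaluated outside the model, so the same proposed action always gets the same answer. The Python sketch below illustrates that general pattern only; it is not Cedar syntax and not the AgentCore Gateway API, and every name in it (`ToolCall`, `PolicyGate`, the agent and tool identifiers) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """An action proposed by the model (hypothetical shape)."""
    agent_id: str
    tool: str
    amount: float = 0.0


@dataclass
class PolicyGate:
    """Deterministic check run after the model proposes an action and
    before it executes. Illustrative only; not Cedar or AgentCore."""
    allowed_tools: dict[str, set[str]] = field(default_factory=dict)
    max_amount: float = 0.0

    def permits(self, call: ToolCall) -> bool:
        # Deny by default: unknown agents get an empty tool set.
        # No model is consulted, so identical inputs yield identical decisions.
        tools = self.allowed_tools.get(call.agent_id, set())
        return call.tool in tools and call.amount <= self.max_amount


def execute(call: ToolCall, gate: PolicyGate) -> str:
    if not gate.permits(call):
        return f"DENY {call.tool}"
    return f"ALLOW {call.tool}"  # a real system would invoke the tool here


gate = PolicyGate(allowed_tools={"billing-agent": {"refund"}}, max_amount=500.0)
print(execute(ToolCall("billing-agent", "refund", 120.0), gate))   # ALLOW refund
print(execute(ToolCall("billing-agent", "refund", 9000.0), gate))  # DENY refund
print(execute(ToolCall("billing-agent", "wire_transfer"), gate))   # DENY wire_transfer
```

Because the gate sits between the model's output and the tool, a prompt that persuades the model to propose a forbidden action still cannot get it executed.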

Evidenced (17)
  • AG-1
  • AG-4
  • AG-5
  • AG-6
  • AG-7
  • AG-8
  • AG-12
  • AG-13
  • AG-15
  • AG-16
  • AG-18
  • AG-19
  • AG-21
  • AG-22
  • AG-23
  • AG-38
  • AG-49
Not Publicly Documented (19)
  • AG-2
  • AG-3
  • AG-9
  • AG-10
  • AG-11
  • AG-14
  • AG-17
  • AG-20
  • AG-24
  • AG-26
  • AG-27
  • AG-29
  • AG-30
  • AG-31
  • AG-32
  • AG-33
  • AG-36
  • AG-37
  • AG-39
Structurally Absent (14)
  • AG-25
  • AG-28
  • AG-34
  • AG-35
  • AG-40
  • AG-41
  • AG-42
  • AG-43
  • AG-44
  • AG-45
  • AG-46
  • AG-47
  • AG-48
  • AG-50
Sources: Amazon Bedrock AgentCore documentation, AWS re:Invent 2025, Bedrock Guardrails documentation, AgentCore Policy GA March 2026, AWS CloudTrail documentation. March 2026.
Onyx Security ESTIMATED · 20%
What This Score Means

Onyx Security is the most focused competitor in the agent security space, with a Guardian Agent supervisory layer that provides genuine runtime intervention capability. Its positioning as a security control plane gives it strong coverage in Groups A and B. However, Onyx is explicitly security-oriented — its architecture addresses threats and vulnerabilities rather than the governance of agent behaviour as an autonomous entity. Financial crime detection, mandate-based containment, and the governance dimensions in Groups E through J are outside its documented scope.

Evidenced (16)
  • AG-1
  • AG-3
  • AG-4
  • AG-5
  • AG-6
  • AG-8
  • AG-12
  • AG-13
  • AG-15
  • AG-16
  • AG-18
  • AG-19
  • AG-21
  • AG-22
  • AG-27
  • AG-39
Not Publicly Documented (18)
  • AG-2
  • AG-7
  • AG-9
  • AG-10
  • AG-11
  • AG-14
  • AG-17
  • AG-20
  • AG-23
  • AG-24
  • AG-26
  • AG-29
  • AG-30
  • AG-31
  • AG-32
  • AG-33
  • AG-36
  • AG-37
Structurally Absent (16)
  • AG-25
  • AG-28
  • AG-34
  • AG-35
  • AG-38
  • AG-40
  • AG-41
  • AG-42
  • AG-43
  • AG-44
  • AG-45
  • AG-46
  • AG-47
  • AG-48
  • AG-49
  • AG-50
Sources: Onyx Security platform documentation, onyx.security/platform, launch announcement 12 March 2026, Business Wire press release. March 2026.
Google Vertex AI ESTIMATED · 18%
What This Score Means

Google Vertex AI Agent Builder provides solid infrastructure-level governance through Cloud IAM agent identities and Model Armor, but public documentation does not evidence agent-specific governance beyond what the underlying GCP infrastructure provides. The platform is strong for model serving and agent deployment, with governance handled mainly through existing Google Cloud security controls rather than a dedicated agent governance architecture.

Evidenced (14)
  • AG-1
  • AG-4
  • AG-5
  • AG-6
  • AG-7
  • AG-8
  • AG-12
  • AG-13
  • AG-15
  • AG-16
  • AG-18
  • AG-22
  • AG-23
  • AG-38
Not Publicly Documented (21)
  • AG-2
  • AG-3
  • AG-9
  • AG-10
  • AG-11
  • AG-14
  • AG-17
  • AG-19
  • AG-20
  • AG-21
  • AG-24
  • AG-26
  • AG-27
  • AG-29
  • AG-30
  • AG-31
  • AG-32
  • AG-33
  • AG-36
  • AG-37
  • AG-39
Structurally Absent (15)
  • AG-25
  • AG-28
  • AG-34
  • AG-35
  • AG-40
  • AG-41
  • AG-42
  • AG-43
  • AG-44
  • AG-45
  • AG-46
  • AG-47
  • AG-48
  • AG-49
  • AG-50
Sources: Google Vertex AI Agent Builder documentation, Google Cloud Blog December 2025, Cloud IAM documentation, Model Armor documentation, Agent Engine GA announcement. March 2026.
SafePaaS ESTIMATED · 10%
What This Score Means

SafePaaS is an established enterprise governance platform designed primarily for ERP governance and access controls. AGS v2.1 compliance has been estimated based on publicly available documentation. Its score reflects the overlap between its existing financial controls framework and the AGS v2.1 dimensions.

Evidenced (9)
  • AG-1
  • AG-6
  • AG-7
  • AG-15
  • AG-16
  • AG-17
  • AG-19
  • AG-21
  • AG-47
Not Publicly Documented (5)
  • AG-2
  • AG-3
  • AG-8
  • AG-9
  • AG-10
Structurally Absent (36)
  • AG-4
  • AG-5
  • AG-11
  • AG-12
  • AG-13
  • AG-14
  • AG-18
  • AG-20
  • AG-22
  • AG-23
  • AG-24
  • AG-25
  • AG-26
  • AG-27
  • AG-28
  • AG-29
  • AG-30
  • AG-31
  • AG-32
  • AG-33
  • AG-34
  • AG-35
  • AG-36
  • AG-37
  • AG-38
  • AG-39
  • AG-40
  • AG-41
  • AG-42
  • AG-43
  • AG-44
  • AG-45
  • AG-46
  • AG-48
  • AG-49
  • AG-50
Sources: SafePaaS product documentation, SOX compliance capabilities page, access governance documentation. March 2026.
Agent Audit (Track 2)

Agent Audit Leaderboard

The Agent Audit tests autonomous agent deployments against the 508 AGS v2.1 dimensions tagged AGENT_AUDIT or BOTH, using 10 attack categories, including delegation chain manipulation, inter-agent trust spoofing, cryptographic seal tampering, and federated broadcast spoofing.
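The scope rule, dimensions tagged AGENT_AUDIT plus those tagged BOTH, is a simple filter over the standard's dimensions. In the Python sketch below, the dimension IDs and the third tag name, `LLM_AUDIT`, are hypothetical; only the two in-scope tags come from this page.

```python
# In-scope tags for the Agent Audit, as stated on this page.
IN_AGENT_SCOPE = {"AGENT_AUDIT", "BOTH"}


def agent_audit_scope(dimensions: dict[str, str]) -> list[str]:
    """Return the dimension IDs whose audit tag places them in the Agent Audit."""
    return sorted(d for d, tag in dimensions.items() if tag in IN_AGENT_SCOPE)


# Toy example with invented IDs and an assumed "LLM_AUDIT" tag. In AGS v2.1,
# 508 of the 792 dimensions are in scope, so 792 - 508 = 284 are not.
sample = {"D-001": "LLM_AUDIT", "D-002": "AGENT_AUDIT", "D-003": "BOTH"}
print(agent_audit_scope(sample))  # ['D-002', 'D-003']
```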

Estimated scores reflect publicly documented agent deployment governance capabilities as of April 2026. Verified scores require adversarial testing by Agent Aegis. Submit for verification →

AGS v2.1 Agent Audit Rankings
Verified and estimated scores. April 2026. Submit for verified score.
Agent Shield™ · 100.0% · VERIFIED
Amazon Bedrock Agents · 24% · NOT SUBMITTED
Microsoft Copilot Studio · 21% · NOT SUBMITTED
Onyx Security · 14% · NOT SUBMITTED
Google Vertex AI · 11% · NOT SUBMITTED
SafePaaS · 7% · NOT SUBMITTED
── INDUSTRY AVERAGE · 15%
Agent Audit Industry Average (estimated): 15% · Based on publicly documented agent deployment governance capabilities. Scores may not reflect actual platform capabilities. Read our full disclaimer.
Agent Shield™ 100.0% — 508/508 VERIFIED
Status

Agent Shield™ has completed the Agent Audit (Level 1) with a compliance score of 100.0% (A+) across all 508 Agent Audit dimensions. Testing ran 1,530 attack scenarios across 10 categories with zero bypasses; 3 rate-limit errors were excluded from scoring. Verified 10 April 2026. Manifest SHA-256: 7c5766cdb0adacba862499e69e28fefc85de656efa35ef355ef5c3ae11e334a2.

Amazon Bedrock Agents ESTIMATED · 24%
What This Score Means

Amazon Bedrock Agents scores highest among competitors on the Agent Audit due to its Cedar declarative policy language and AgentCore Gateway — a genuine deterministic enforcement layer operating outside the LLM reasoning loop. The supervisor/sub-agent architecture provides real multi-agent orchestration. However, agent-specific governance capabilities beyond Cedar policy enforcement are limited: no inter-agent trust attestation, no graduated autonomy framework, no delegation depth governance, and no cryptographic state sealing. The majority of Agent Audit dimensions require structural enforcement mechanisms beyond policy-level controls.

Strongest Areas

Mandate boundary enforcement (Cedar + AgentCore Gateway), multi-agent orchestration (supervisor/sub-agent), agent identity (IAM roles), provider documentation.

Structurally Absent

Inter-agent trust handshakes, governance passports, graduated autonomy framework, delegation chain governance, cryptographic seal tampering resistance, federated threat broadcast, composite threat scoring, memory/RAG governance, agent-level financial crime detection.

Sources: Amazon Bedrock AgentCore documentation, AWS re:Invent 2025, Bedrock Guardrails documentation, AgentCore Gateway GA March 2026, Cedar policy language specification. April 2026.
Microsoft Copilot Studio ESTIMATED · 21%
What This Score Means

Microsoft Copilot Studio has the broadest enterprise infrastructure among competitors (Azure Policy, Entra ID, Purview, Defender, Agent Network) but agent-deployment-specific governance is shallow. Entra ID provides the strongest agent identity mechanism of any competitor, and Azure Policy offers some mandate enforcement. However, the platform lacks agent-specific governance architecture: no delegation chain governance, no inter-agent trust handshakes, no graduated autonomy, no cryptographic state sealing. Agent Network provides multi-agent orchestration but without governance over the orchestration itself.

Strongest Areas

Agent identity (Entra ID), mandate enforcement (Azure Policy), compliance monitoring (Purview), cybersecurity (Defender), multi-agent orchestration (Agent Network), provider documentation.

Structurally Absent

Graduated autonomy framework, inter-agent trust attestation, governance passports, delegation chain depth governance, cryptographic seal tamper resistance, federated threat broadcasts, composite threat scoring, competence envelope governance, truth/reward integrity.

Sources: Microsoft Copilot Studio documentation, Agent 365 product page, Microsoft Purview docs, Agent Network announcement, Power Platform admin documentation. April 2026.
Onyx Security ESTIMATED · 14%
What This Score Means

Onyx Security’s Guardian Agent supervisory layer provides genuine runtime intervention capability, which scores well on detection and containment (roughly 40% across those 35 dimensions). However, based on its public documentation, Onyx is positioned as a security product rather than a governance platform: it monitors agent behaviour for threats and vulnerabilities rather than governing agent behaviour as an autonomous entity. Financial controls, mandate enforcement, multi-agent orchestration governance, memory/RAG governance, and all sector-specific agent governance dimensions are not evidenced in public documentation, and neither are the majority of Agent Audit dimensions.

Strongest Areas

Behavioural boundary monitoring (Guardian Agent), cybersecurity threat detection, runtime intervention, safety-critical anomaly detection.

Structurally Absent

Mandate enforcement, multi-agent orchestration governance, delegation chains, trust attestation, governance passports, graduated autonomy, financial crime detection, memory/RAG governance, cryptographic sealing, federated broadcasts, all sector-specific governance dimensions.

Sources: Onyx Security platform documentation, onyx.security/platform, Guardian Agent architecture, launch announcement March 2026. April 2026.
Google Vertex AI ESTIMATED · 11%
What This Score Means

Google Vertex AI provides agent deployment infrastructure (Agent Engine, Agent Builder) and basic safety filtering (Model Armor), but public documentation evidences little agent-specific governance architecture. Cloud IAM provides basic identity and Model Armor filters content, but neither is documented as governing agent behaviour. The platform is strong for model serving and agent deployment, but governance is addressed through existing GCP infrastructure controls rather than agent-specific mechanisms, and the majority of Agent Audit dimensions are not evidenced in public documentation.

Strongest Areas

Agent deployment infrastructure (Agent Engine), content filtering (Model Armor), agent identity (Cloud IAM), provider documentation (Google Cloud).

Structurally Absent

Constitutional governance framework, multi-agent governance topology, graduated autonomy, human factors governance, competence envelopes, truth/reward integrity, delegation chain governance, trust attestation, cryptographic sealing, federated broadcasts, financial crime detection, all agent-specific enforcement mechanisms.

Sources: Google Vertex AI Agent Builder documentation, Agent Engine GA announcement, Cloud IAM documentation, Model Armor documentation, Google Cloud Blog. April 2026.
SafePaaS ESTIMATED · 7%
What This Score Means

SafePaaS is an established ERP governance platform providing SOX access controls and financial separation of duties. Its score reflects the narrow overlap between its existing financial controls framework and the AGS v2.1 Agent Audit dimensions. No agent-specific architecture is evidenced in its public documentation; it is an ERP governance tool rather than an AI agent governance platform. Credit is given only for financial controls and authority/delegation dimensions where ERP governance tangentially applies. The majority of Agent Audit dimensions are not evidenced for this platform.

Strongest Areas

ERP financial controls (SOX), authority and delegation (access controls), separation of duties (approval workflows).

Structurally Absent

All agent-specific governance dimensions, constitutional framework, runtime behavioural containment, multi-agent orchestration, agent identity, trust attestation, LLM/agent-level controls of any kind, cryptographic sealing, federated broadcasts, memory/RAG governance.

Sources: SafePaaS product documentation, SOX compliance capabilities page, access governance documentation. April 2026.

Submit Your Platform

Estimated scores are replaced by verified scores upon submission. Verified platforms receive a dated certificate of compliance per dimension group.
