How It Works — AgentGoverning

Overview

The Independent Authority

841 AGS v2.2 Dimensions
81 Governance Landscapes
1 Verified Platform

AgentGoverning is an independent, non-commercial standards body. We publish the AGS v2.2 standard, operate the AgentGoverning benchmark, and maintain the compliance leaderboard. We receive no vendor funding. Any platform may submit for assessment. The standard is free.

Download AGS v2.2 Full Standard (CSV)

Audit Tracks

Which Track Applies to You?

LLM Audit Track

Governance Platforms

The LLM Audit tests governance enforcement — how effectively your platform enforces policy constraints, detects adversarial manipulation, and maintains compliance boundaries across all 841 dimensions when subjected to attacks from 9 independent foundation models.

View LLM Audit Leaderboard →

Agent Audit Track

AI Agent Platforms

The Agent Audit tests agent behaviour — how your autonomous agent deployments respond under adversarial conditions, whether they respect mandate boundaries, maintain identity attribution, and preserve human oversight across the full AGS v2.2 governance framework.

View Agent Audit Leaderboard →

Benchmark Levels

Choose Your Benchmark Level

BENCHMARK TIERS

LEVEL 1

Integrity

~15 minutes · ~£3

3 adversarial attacks per dimension.
Single model (Claude Sonnet).
Initial integrity screening.
Results within 30 minutes.

LEVEL 2

Standard

~1 hour · ~£12

3 adversarial attacks per dimension.
4 Tier 1 models (GPT-4o, Claude, Grok-3, Gemini).
Full AGS v2.2 coverage.
Results within 2 hours.

LEVEL 3 · RECOMMENDED

Full Acquisition

~5.5 hours · ~£50

10 adversarial attacks per dimension.
All 9 independent LLMs.
Complete evidence corpus generated.
SHA-256 manifest issued.
Results within 24 hours.

All tiers produce a score published to the AgentGoverning leaderboard. Level 3 (Full Acquisition) is required for VERIFIED badge status and SHA-256 manifest issuance. Pricing is indicative; final pricing is confirmed at submission.
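As a rough sizing aid, the tier figures above can be turned into payload counts. The sketch below assumes that every model runs the stated number of attacks against each of the 841 dimensions. That reading is an assumption rather than something the standard confirms, and `Tier` and `total_attacks` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

DIMENSIONS = 841  # AGS v2.2 dimension count quoted on this page


@dataclass(frozen=True)
class Tier:
    name: str
    attacks_per_dimension: int
    models: int


# Figures taken from the tier descriptions above.
TIERS = [
    Tier("Level 1 Integrity", 3, 1),
    Tier("Level 2 Standard", 3, 4),
    Tier("Level 3 Full Acquisition", 10, 9),
]


def total_attacks(tier: Tier, dimensions: int = DIMENSIONS) -> int:
    # Assumption: each model runs every attack on every dimension.
    return tier.attacks_per_dimension * tier.models * dimensions


if __name__ == "__main__":
    for tier in TIERS:
        print(f"{tier.name}: {total_attacks(tier):,} payloads")
```

Under that assumption the three tiers come to 2,523, 10,092, and 75,690 payloads respectively, which is one way to gauge the request volume your endpoints should expect.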

Process

The Verification Timeline

1 · Day 0 · Submission Received

Your platform submission is logged and acknowledged. You will receive a confirmation receipt within 24 hours of submission.

2 · Day 1 · Sandbox Warm-Up

Pre-flight check against your platform’s API endpoints. Connectivity, authentication, and endpoint availability are validated before benchmark execution begins.
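You may want to run a similar check yourself before submitting. The following is a minimal sketch of such a pre-flight, not the check AgentGoverning runs: the endpoint paths, the bearer-token scheme, and the four result labels are all assumptions chosen for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical probe signature: takes a URL and a token, returns an HTTP status code.
Probe = Callable[[str, str], int]


def http_probe(url: str, token: str) -> int:
    """Send a GET with a bearer token and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # a 4xx/5xx reply still shows the endpoint is reachable


def preflight(base_url: str, token: str, paths: list[str],
              probe: Probe = http_probe) -> dict[str, str]:
    """Classify each endpoint as ok, auth_failed, unavailable, or unreachable."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            status = probe(url, token)
        except OSError:  # DNS failure, refused connection, or timeout
            results[path] = "unreachable"
            continue
        if status in (401, 403):
            results[path] = "auth_failed"
        elif 200 <= status < 300:
            results[path] = "ok"
        else:
            results[path] = "unavailable"
    return results
```

Passing the probe in as a parameter keeps the classification logic testable without network access; in real use the default `http_probe` would be called against your own base URL.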

3 · Day 1–2 · Benchmark Execution

The AgentGoverning benchmark is executed across all 841 dimensions using 9 independent LLMs (GPT-4o, Gemini 2.5 Flash, Grok-3, Claude Sonnet, Mistral Large, DeepSeek V3, Command R+, Llama 3.1 8B, and Phi-4). Adversarial payloads are generated and evaluated in real time against your platform.

4 · Day 2–3 · Results Analysis

Full payload logging and dimension-level scoring are compiled. A SHA-256 cryptographic manifest is generated for the complete evidence corpus.
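To illustrate what a SHA-256 manifest over an evidence corpus can look like, here is a minimal sketch. The manifest layout (a `files` map plus a single `root` digest) and the function names are assumptions made for this example; the actual AgentGoverning manifest format is not described on this page.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large evidence logs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(corpus_dir: Path) -> dict:
    """Map every file under corpus_dir to its SHA-256 digest (hypothetical layout)."""
    files = {
        p.relative_to(corpus_dir).as_posix(): sha256_file(p)
        for p in sorted(corpus_dir.rglob("*"))
        if p.is_file()
    }
    # Root digest over a canonical JSON encoding of the per-file digests,
    # so one short value commits to the whole corpus.
    canonical = json.dumps(files, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"files": files, "root": hashlib.sha256(canonical).hexdigest()}
```

Because the root digest is computed over sorted, canonically encoded entries, rebuilding the manifest from an unchanged corpus yields the same root, while changing any single file changes it.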

5 · Day 3–7 · Score Published

Your verified score is published to the AgentGoverning compliance leaderboard. Any prior estimated score is replaced with the verified result.

6 · Day 7 · Badge and Certificate Issued

Your AgentGoverning compliance badge and dated certificate of verification are issued. Verified status is renewable annually.
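For planning purposes, the day offsets in the timeline above can be mapped to calendar dates. This is only a sketch: it assumes the offsets are calendar days counted from the submission date, which this page does not state, and `schedule` is a hypothetical helper name.

```python
from datetime import date, timedelta

# (milestone, earliest day offset, latest day offset), taken from the timeline above
MILESTONES = [
    ("Submission Received", 0, 0),
    ("Sandbox Warm-Up", 1, 1),
    ("Benchmark Execution", 1, 2),
    ("Results Analysis", 2, 3),
    ("Score Published", 3, 7),
    ("Badge and Certificate Issued", 7, 7),
]


def schedule(submitted: date) -> list[tuple[str, date, date]]:
    # Assumption: offsets are calendar days counted from the submission date.
    return [
        (name, submitted + timedelta(days=start), submitted + timedelta(days=end))
        for name, start, end in MILESTONES
    ]


if __name__ == "__main__":
    for name, earliest, latest in schedule(date.today()):
        print(f"{name}: {earliest.isoformat()} to {latest.isoformat()}")
```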

EU AI Act Note: The EU AI Act's general date of application is 2 August 2026. Some obligations apply earlier or later than that date.

Deliverables

What Verified Status Gives You

Public Leaderboard Position

Your verified score is published on the AgentGoverning compliance leaderboard, visible to regulators, procurement teams, and the public. Estimated scores are replaced with verified results.

Cryptographic Evidence Manifest (SHA-256)

A SHA-256 cryptographic manifest of the complete evidence corpus is issued. Every adversarial payload, response, and score is tamper-evident and independently verifiable.
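Independent verification amounts to recomputing each file's SHA-256 digest and comparing it with the value recorded in the manifest. The sketch below assumes the manifest can be read as a mapping from relative file path to hex digest; that layout, and the name `verify_corpus`, are assumptions for illustration rather than the published format.

```python
import hashlib
from pathlib import Path


def verify_corpus(expected: dict[str, str], corpus_dir: Path) -> dict[str, list[str]]:
    """Compare files on disk against expected SHA-256 hex digests."""
    report: dict[str, list[str]] = {"missing": [], "modified": [], "unexpected": []}
    for rel_path, digest in expected.items():
        target = corpus_dir / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)
            continue
        # read_bytes keeps the sketch short; hash in chunks for very large files
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != digest.lower():
            report["modified"].append(rel_path)
    on_disk = {
        p.relative_to(corpus_dir).as_posix()
        for p in corpus_dir.rglob("*")
        if p.is_file()
    }
    # Files present on disk but absent from the manifest are also worth flagging.
    report["unexpected"] = sorted(on_disk - set(expected))
    return report
```

An empty report (all three lists empty) means the corpus on disk matches the manifest; any entry points to the specific file that was altered, removed, or added.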

AgentGoverning Compliance Badge

An embeddable compliance badge for your platform, website, and marketing materials. The badge links to your verified leaderboard entry and is dated to the assessment period.

EU AI Act Evidence Package

A structured evidence package mapped to EU AI Act obligations. Designed to support regulatory submissions and demonstrate conformity with high-risk AI system requirements.

Value

Why Verified Status Matters

Procurement Differentiation

Enterprise procurement increasingly requires independent governance evidence. A verified AgentGoverning score provides the third-party validation that shortlists demand — before your sales team enters the room.

Regulatory Readiness

The EU AI Act, FCA guidelines, and NIST frameworks all call for demonstrable AI governance. Verified status provides pre-built evidence mapped to regulatory obligations: ready for audit, not assembled after the fact.

Board-Level Reporting

Boards need quantified governance metrics, not qualitative assurances. A verified AGS v2.2 score provides the independent, numerical benchmark that governance committees and insurers can reference with confidence.

Get Started

Ready to Submit?

Submit Your Platform → View the Leaderboard →

Questions? Contact framework@agentgoverning.com