Verification — AgentGoverning

How Verification Works

Four steps to a verified score

The EU AI Act compliance deadline is 2 August 2026. A verified assessment typically takes 4–6 weeks, so platforms that begin verification at least six weeks ahead of that date should have results before the deadline.

LLM AUDIT TRACK

Governance Platforms

Tests whether your AI governance platform correctly enforces AGS v2.2 across all 841 dimensions. Designed for governance and compliance platforms.

AGENT AUDIT TRACK

AI Agent Platforms

Tests whether your AI agent platform behaves within AGS v2.2 governance requirements. Designed for agent orchestration and deployment platforms.

01

Submission

You provide an API endpoint. We confirm it is accessible and responds to structured requests.

02

Adversarial Testing

Our testing framework generates attack payloads across all 841 governance dimensions. Each dimension is tested with multiple adversarial scenarios designed to probe the boundaries of your governance implementation.

03

Scoring

Your platform receives a score per dimension group (A through J). Individual dimension scores are provided only in your private results report; group-level scores are the only per-group detail made public. The testing methodology is never disclosed.

04

Publication

Your verified score replaces any estimated score on the AgentGoverning leaderboard. You receive a dated certificate of compliance per dimension group. Verified status is renewable annually.

BENCHMARK TIERS

LEVEL 1

Integrity

~15 minutes · ~£3

3 adversarial attacks per dimension.
Single model (Claude Sonnet).
Initial integrity screening.
Results within 30 minutes.

LEVEL 2

Standard

~1 hour · ~£12

3 adversarial attacks per dimension.
4 Tier 1 models (GPT-4o, Claude, Grok-3, Gemini).
Full AGS v2.2 coverage.
Results within 2 hours.

LEVEL 3 · RECOMMENDED

Full Acquisition

~5.5 hours · ~£50

10 adversarial attacks per dimension.
All 9 independent LLMs.
Complete evidence corpus generated.
SHA-256 manifest issued.
Results within 24 hours.
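
As an illustration of what a SHA-256 manifest over an evidence corpus can look like, the sketch below hashes each evidence item and lists the digests in a stable order. The manifest layout, the field names, and the `build_manifest` helper are assumptions made for this example; the actual format issued at Level 3 is not described on this page.

```python
import hashlib
import json
from typing import Dict


def build_manifest(evidence: Dict[str, bytes]) -> str:
    """Return a JSON manifest with one SHA-256 digest per evidence item.

    `evidence` maps a (hypothetical) item name to its raw bytes.
    Entries are sorted by name so the same corpus always yields the same manifest.
    """
    entries = [
        {
            "name": name,
            "sha256": hashlib.sha256(data).hexdigest(),
            "bytes": len(data),
        }
        for name, data in sorted(evidence.items())
    ]
    return json.dumps({"algorithm": "SHA-256", "entries": entries}, indent=2)
```

Anyone holding the original evidence items can recompute the digests and compare them with the manifest to confirm that nothing was altered after the assessment.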

All tiers produce a verified score published to the AgentGoverning leaderboard. Level 3 is required for VERIFIED badge status. Pricing is indicative — final pricing confirmed at submission.

Why Verification Costs

Rigorous testing requires real infrastructure

Rigorous adversarial testing is computationally intensive. Each assessment generates thousands of attack payloads evaluated in real time against your platform's live API endpoints across all 841 governance dimensions. The verification fee covers infrastructure costs and ensures all submissions represent genuine production deployments.
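
For a rough sense of scale, the per-dimension attack counts listed under the benchmark tiers can be multiplied out across the 841 dimensions. This is a back-of-the-envelope sketch only: it assumes the "attacks per dimension" figure is a total for the tier and is not repeated once per model, which this page does not specify.

```python
DIMENSIONS = 841  # governance dimensions covered by every tier

# Attacks per dimension as listed for each benchmark tier.
ATTACKS_PER_DIMENSION = {"Level 1": 3, "Level 2": 3, "Level 3": 10}


def total_attacks(tier: str) -> int:
    """Total attack payloads for a tier, assuming no per-model multiplication."""
    return DIMENSIONS * ATTACKS_PER_DIMENSION[tier]


for tier in ATTACKS_PER_DIMENSION:
    print(f"{tier}: {total_attacks(tier):,} attacks")
# Level 1: 2,523 attacks
# Level 2: 2,523 attacks
# Level 3: 8,410 attacks
```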

Verification Pricing

Tailored to your platform

Verification fees are set based on platform complexity, deployment scale, and assessment scope. All assessments cover the full 841 dimensions across all 10 groups.

To receive a verification proposal, contact us with a brief description of your platform.


Complimentary Assessment Programme

AgentGoverning maintains a limited number of sponsored verification slots to ensure the leaderboard reflects the full competitive landscape of AI agent governance platforms.

Platforms operating at enterprise scale are invited to apply for a complimentary assessment. Priority is given to platforms that are widely deployed in regulated industries, where verified governance scores serve the broadest public interest.

To apply for a sponsored slot, contact us at framework@agentgoverning.com with the subject line 'Sponsored Verification Request'.

Sponsored assessments are subject to the same rigorous testing methodology as paid assessments. Complimentary status does not influence scoring.

What Your Score Means

AGS v2.2 compliance scale

Score Range	Classification
0 – 25%	Foundation gaps identified
26 – 50%	Partial governance coverage
51 – 75%	Advanced governance implementation
76 – 99%	Comprehensive governance coverage
100%	Full AGS v2.2 compliance
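
The scale above can be expressed as a small lookup. The table lists whole-number bands, so how fractional scores are binned is an assumption here: this sketch places any score above one band's upper bound into the next band, and reserves the top classification for exactly 100%. The `classify` function is illustrative and is not part of AGS v2.2.

```python
def classify(score: float) -> str:
    """Map a percentage score (0 to 100) to its AGS v2.2 classification."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score == 100:
        return "Full AGS v2.2 compliance"
    if score > 75:
        return "Comprehensive governance coverage"
    if score > 50:
        return "Advanced governance implementation"
    if score > 25:
        return "Partial governance coverage"
    return "Foundation gaps identified"
```

Under these assumptions, `classify(99.9)` returns "Comprehensive governance coverage" and `classify(25)` returns "Foundation gaps identified".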

Agent Shield™ has achieved verified status with a score of 99.9% across 796 AGS v2.2 dimensions — the first platform to do so. All other leaderboard scores are estimates based on publicly available documentation.

Submission Workflow

Five steps from submission to publication

The assessment process is fully structured. Your model never leaves your infrastructure — we send test prompts to your API endpoint and score the responses.


01

Organisation Details

Register your organisation, contact details, AI product name, and intended AGS tier. Confirm you are authorised to submit on behalf of your organisation.

02

API Configuration

Provide your API endpoint URL, authentication method, and expected latency. A pre-flight check verifies connectivity and response format before proceeding.
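
A response-format check of the kind a pre-flight step might run could look like the sketch below. It is hypothetical: the required field names (`decision`, `reason`) and the `check_response_format` helper are placeholders invented for this example, and the real request and response schema is defined in the integration documentation supplied after submission.

```python
import json
from typing import List, Sequence


def check_response_format(
    raw_body: str,
    required_fields: Sequence[str] = ("decision", "reason"),  # hypothetical fields
) -> List[str]:
    """Return a list of format problems; an empty list means the body passes."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        return [f"body is not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top-level JSON value must be an object"]
    return [f"missing field: {name}" for name in required_fields if name not in payload]
```

Running a check like this against your own endpoint before submitting can surface format problems early, independently of the connectivity check.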

03

IP Agreement

Review and accept the IP Non-Interference Agreement, which defines what AgentGoverning can and cannot do with your submission data and test outputs.

04

Payment

Complete payment to lock your assessment slot and endpoint. Your endpoint hash is locked to your assessment token. Changing the endpoint voids the assessment.
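
One way to picture the endpoint lock is as a fingerprint computed over the endpoint URL and its credentials, stored alongside the assessment token and compared again before testing starts. The sketch below is an assumption about how such a lock could work, not a description of AgentGoverning's implementation; the function names and the length-prefixed encoding are illustrative choices.

```python
import hashlib
import hmac


def endpoint_fingerprint(url: str, auth_credential: str) -> str:
    """SHA-256 fingerprint over the endpoint URL and its credential.

    Each part is length-prefixed so that different (url, credential) pairs
    cannot produce the same byte stream.
    """
    digest = hashlib.sha256()
    for part in (url, auth_credential):
        data = part.encode("utf-8")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def endpoint_unchanged(locked: str, url: str, auth_credential: str) -> bool:
    """True if the current endpoint still matches the fingerprint locked at payment."""
    return hmac.compare_digest(locked, endpoint_fingerprint(url, auth_credential))
```

With a scheme like this, any change to the URL or the credential produces a different fingerprint, which is consistent with the rule that changing the endpoint voids the assessment.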

05

Assessment

Live adversarial testing runs across all 841 dimensions. Track progress in real time. Results enter a 14-day review period before publication on the leaderboard.
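
The review window is simple date arithmetic. The sketch below computes the earliest publication date from the results delivery date; whether the fourteenth day itself counts as inside the window is an assumption (here it does not), and the helper names are illustrative.

```python
from datetime import date, timedelta

REVIEW_PERIOD = timedelta(days=14)  # 14-day review period before publication


def earliest_publication(results_delivered: date) -> date:
    """First date on which results could appear on the leaderboard."""
    return results_delivered + REVIEW_PERIOD


def in_review_period(results_delivered: date, today: date) -> bool:
    """True while the submitter can still review results before publication."""
    return results_delivered <= today < earliest_publication(results_delivered)
```

For example, results delivered on 20 July 2026 would, under this reading, be eligible for publication from 3 August 2026.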

AgentGoverning operates on an API-only assessment model. We send structured test prompts to your endpoint and receive text responses. We never access your model weights, architecture, system prompts, or training data.
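
The agreement in the Legal Framework section states that outputs are hashed (SHA-256) and timestamped on receipt and deleted within 90 days of assessment completion. As a minimal sketch of that kind of record keeping, assuming UTC timestamps and a hypothetical `OutputRecord` structure that is not taken from AgentGoverning's systems:

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=90)  # outputs deleted within 90 days of completion


@dataclass(frozen=True)
class OutputRecord:
    sha256: str            # hex digest of the response text
    received_at: datetime  # receipt timestamp (UTC assumed)


def record_output(text: str, received_at: Optional[datetime] = None) -> OutputRecord:
    """Hash and timestamp a text response at the moment it is received."""
    timestamp = received_at or datetime.now(timezone.utc)
    return OutputRecord(hashlib.sha256(text.encode("utf-8")).hexdigest(), timestamp)


def delete_by(assessment_completed: datetime) -> datetime:
    """Latest moment by which stored outputs must be deleted."""
    return assessment_completed + RETENTION
```

Because only a digest and a timestamp are needed to prove what was received and when, a record like this can outlive the raw output without retaining its content.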

Legal Framework

IP Non-Interference Agreement

All submissions are governed by the following agreement. Review the full terms before initiating your submission.

AGENTGOVERNING IP NON-INTERFERENCE AGREEMENT

Version 2.1 · April 2026

1. SCOPE OF ACCESS

AgentGoverning receives only the text outputs of your AI agent in response to published test scenarios. AgentGoverning does not receive, store, copy, reproduce, or process your model weights, training data, system prompts, model architecture, fine-tuning data, or any component of your underlying model.

2. OUTPUT HANDLING

All outputs received during assessment are:
(a) Hashed (SHA-256) and timestamped on receipt
(b) Used exclusively for scoring against the published AGS v2.2 criteria
(c) Not used for model training, benchmarking beyond the published criteria, commercial research, or any other purpose
(d) Deleted from AgentGoverning systems within 90 days of assessment completion

3. RESULTS OWNERSHIP

Your assessment results are your intellectual property. AgentGoverning publishes only your composite score and group-level scores (A through J) in the public leaderboard. Raw test transcripts are never published without your explicit written consent.

4. REVIEW PERIOD

You have 14 calendar days from results delivery to review your assessment before leaderboard publication. You may raise scoring disputes within this period. You may withdraw your submission within this period with no public record created.

5. WHAT AGENTGOVERNING CANNOT DO

AgentGoverning staff and contractors are prohibited from:
(a) Accessing your API endpoint outside the agreed assessment window
(b) Sharing test transcripts with third parties
(c) Using your outputs to inform assessments of other submitting organisations
(d) Reproducing your agent's outputs in any published material

6. ENDPOINT INTEGRITY

The endpoint URL and authentication credentials submitted are hashed and locked to your assessment token. Changing the endpoint after payment voids the assessment. A new submission and a new payment are required for a new endpoint.

7. INTELLECTUAL PROPERTY

The assessment methodology is proprietary. Submission for assessment does not grant any licence to the verification methodology. The AGS v2.2 standard itself remains open (CC BY 4.0).

Frequently Asked Questions

Verification FAQ

What do I need to submit?

You need to provide a live API endpoint that accepts structured governance requests. The endpoint must be publicly accessible during the testing window. We provide detailed integration documentation after you initiate the submission process.

Will my methodology be disclosed?

No. Individual dimension scores and implementation details are provided only in your private results report. Only group-level scores (A through J) are published on the public leaderboard. Your proprietary governance methodology is never disclosed.

Can I dispute my score?

You may submit a formal dispute within 30 days of receiving your results. Disputes are reviewed by an independent panel. The dispute process is documented in your results report. Scoring methodology is not disclosed as part of the dispute process.

What if my score is lower than the estimate?

Estimated scores are based on publicly available documentation and are inherently less rigorous than adversarial testing. It is common for verified scores to differ from estimates. Your verified score replaces the estimate on the leaderboard regardless of direction. You may choose not to publish your results, in which case your estimated score is removed and replaced with a 'Declined to Publish' status.

How long is verification valid?

Verified scores and compliance certificates are valid for 12 months from the date of issuance. After expiration, your leaderboard entry reverts to estimated status unless you complete a re-assessment. Enterprise tier clients may opt for quarterly re-assessments.

Is the assessment methodology public?

No. The adversarial testing methodology is proprietary and is never disclosed. This ensures that platforms cannot optimise for specific test cases rather than implementing genuine governance controls. The AGS v2.2 standard itself is public and open (CC BY 4.0), but the verification methodology is independent of the standard.

Assurance Tiers

Which submission achieves which tier?

AGS-AUP — AGREED-UPON PROCEDURES

LLM Audit or Agent Audit submission. Adversarial testing of specific dimensions at a point in time. Results published on leaderboard.

AGS-LA — LIMITED ASSURANCE

Full documentation review across all 796 dimensions. Requires documented controls and protocol file coverage. No material exceptions noted.

AGS-RA — REASONABLE ASSURANCE

All of the above plus test suite evidence mapped to dimensions, continuous monitoring, and a minimum 90-day operating period. Full operating effectiveness testing.

Read the full AGS Assurance Framework for detailed evidence requirements.
