Published v1.0 — April 2026 — CC-BY-4.0
Applies to: AGS v2.1 — Next scheduled review: July 2026

Summary

This document specifies how scores published on the Agent Governance Standard (AGS) leaderboard are kept current, when re-scoring is triggered, how vendors are notified of score changes, and how the score change history is maintained as a public, citable record. The process is designed to be predictable for vendors, reproducible by external reviewers, and aligned with the Score Dispute Process so that disagreements raised during a score change are resolved through a single coherent procedure.

When Scores Are Reviewed

Scores are reviewed under one of five trigger conditions. Each trigger has a defined response timeline measured in calendar days from the trigger event.

Trigger | Description | Re-assessment SLA
Scheduled review cycle | All estimated scores are reviewed every 90 days (the start of January, April, July, and October) | Completed within the 90-day cycle
Vendor major release | Vendor publicly announces or releases a change to capabilities affecting any AGS dimension | Re-assessment of affected dimensions within 30 days of release
Vendor request | Vendor submits a re-score request via disputes@agentgoverning.com (subject line “RE-SCORE REQUEST: [platform name]”) | Re-assessment of requested dimensions within 30 days
Third-party dispute | An external party disputes a published score per the Score Dispute Process | Per dispute timelines (acknowledgement 3 business days, initial assessment 14 days, final determination 30 days)
AGS methodology version change | A new AGS version is published with revised dimensions, rubric, or scope | All platforms re-assessed in the next assessment cycle following methodology release
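
The fixed calendar-day SLAs in the table can be expressed as a small date calculation. The following is a minimal sketch, not part of the AGS tooling: the trigger labels and the function name are hypothetical, and only the triggers with a fixed day count are covered, since the scheduled cycle and methodology-change triggers are tied to the assessment cycle rather than to a number of days.

```python
from datetime import date, timedelta

# Calendar-day SLAs from the trigger table. Keys are hypothetical labels.
SLA_DAYS = {
    "major_release": 30,        # within 30 days of release
    "vendor_request": 30,       # within 30 days of the request
    "third_party_dispute": 30,  # final determination per the Score Dispute Process
}


def reassessment_due(trigger: str, event_date: date) -> date:
    """Return the latest date by which the re-assessment should complete."""
    if trigger not in SLA_DAYS:
        # Scheduled-cycle and methodology-change triggers follow the cycle instead.
        raise ValueError(f"no fixed calendar-day SLA for trigger {trigger!r}")
    return event_date + timedelta(days=SLA_DAYS[trigger])


print(reassessment_due("major_release", date(2026, 4, 10)))  # 2026-05-10
```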

A “vendor major release” is any release that changes the platform’s behaviour or documented capability on any AGS dimension. Examples include: new governance features, removal of existing capabilities, architectural changes to the enforcement layer, expansion or restriction of supported sectors, changes to default configurations, or material updates to security and compliance posture.

Vendors are expected to notify Imperium of major releases within 30 days of release via disputes@agentgoverning.com with the subject line “MAJOR RELEASE: [platform name]”. Imperium also monitors public surfaces — vendor changelogs, GitHub releases, blog posts, product announcements, and security disclosures — to detect material changes between vendor notifications. Undisclosed material releases discovered after the fact result in retroactive score expiry, with the platform reverting to “Estimated — pending re-assessment” until the next assessment completes.

Monitoring Cadence

Between scheduled review cycles, Imperium monitors each assessed platform’s public surface area for material changes:

  • Daily: automated checks of vendor changelogs, RSS feeds, GitHub release watches, status pages
  • Weekly: manual review of vendor blog posts, product announcement pages, security disclosure pages
  • Monthly: comprehensive review of vendor documentation pages flagged in the evidence file
  • Quarterly: full re-assessment per the scheduled cycle

Material changes detected outside the scheduled cycle trigger re-assessment of the affected dimensions without waiting for the next scheduled review, within the SLAs in the trigger table above.
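
The four monitoring tiers can be read as a simple interval schedule. The sketch below is illustrative only: the tier names mirror the list above, but the day counts for "monthly" (30) and "quarterly" (90) are approximations, and `checks_due` is a hypothetical helper rather than a description of Imperium's monitoring tooling.

```python
from datetime import date

# Interval in days per tier. 30 and 90 approximate "monthly" and "quarterly".
CADENCE_DAYS = {"daily": 1, "weekly": 7, "monthly": 30, "quarterly": 90}


def checks_due(last_run: dict, today: date) -> list:
    """Return the tiers whose interval has elapsed since they last ran."""
    return [
        tier
        for tier, days in CADENCE_DAYS.items()
        if (today - last_run[tier]).days >= days
    ]


# Example: every tier last ran on 1 April; one week later two tiers are due.
last_run = {tier: date(2026, 4, 1) for tier in CADENCE_DAYS}
print(checks_due(last_run, date(2026, 4, 8)))  # ['daily', 'weekly']
```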

Review and Approval Workflow

Each score update follows the same workflow regardless of trigger:

  1. Trigger detected. The trigger and its source are recorded in the platform’s evidence file.
  2. Affected dimensions identified. The assessor identifies which AGS dimensions are affected by the change.
  3. Evidence collected. Public documentation supporting the score change is collected with URLs, access dates, and direct quotations as required by the Agent Audit methodology (and the LLM Audit methodology when published at AGS v2.2).
  4. Draft scores produced. Updated per-dimension scores are drafted with rationale for each change.
  5. Internal review. Until the AGS v3.0 Technical Advisory Board is established, draft scores are reviewed by the AGS Author against the published methodology and operational rules. This is acknowledged as a single-assessor review limitation in the Independence and Conflict of Interest document.
  6. Vendor notification. The vendor is notified of the proposed score change at least 7 calendar days before publication (see Vendor Notification Policy below).
  7. Vendor response window. Vendors may respond with additional evidence within the 7-day window. If material new evidence is provided, the change enters the fast-track dispute path (see Score Dispute Process).
  8. Score published. Updated scores are published on agentgoverning.com on the agreed publication date. The evidence file is updated. The change is logged in the public score change register (see Score Change Register below).
  9. Vendor confirmation. The vendor is notified of publication. Score history and evidence file URLs are included in the notification.
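
The nine steps above form a linear sequence with one branch at step 7. A minimal sketch of that ordering follows, assuming hypothetical step names and an `advance` helper that are not part of any published AGS tooling.

```python
from enum import IntEnum
from typing import Optional


class Step(IntEnum):
    TRIGGER_DETECTED = 1
    DIMENSIONS_IDENTIFIED = 2
    EVIDENCE_COLLECTED = 3
    DRAFT_SCORES = 4
    INTERNAL_REVIEW = 5
    VENDOR_NOTIFIED = 6
    RESPONSE_WINDOW = 7
    PUBLISHED = 8
    VENDOR_CONFIRMED = 9


def advance(current: Step, material_new_evidence: bool = False) -> Optional[Step]:
    """Return the next step, or None when the update leaves this workflow.

    None is returned after the final step, and also when material new
    evidence arrives during the response window (step 7), because the
    change then enters the fast-track dispute path instead of publication.
    """
    if current is Step.RESPONSE_WINDOW and material_new_evidence:
        return None
    if current is Step.VENDOR_CONFIRMED:
        return None
    return Step(current + 1)


print(advance(Step.INTERNAL_REVIEW).name)  # VENDOR_NOTIFIED
```

Modelling the steps as an ordered enumeration makes it explicit that no step can be skipped, which matches the statement that every score update follows the same workflow regardless of trigger.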

The same workflow applies to all platforms, including Agent Shield. Agent Shield’s score is reviewed under the same triggers, on the same cadence, with the same evidence standards as competitor platforms — this is required by the Independence and Conflict of Interest document.

Vendor Notification Policy

Vendors are notified of any score change by email, using the contact held in the vendor contact registry. Notification is sent at least 7 calendar days before publication.

Notification content:

  • Platform name and current score
  • Proposed updated score and effective date
  • Per-dimension changes with rationale
  • Evidence supporting each change (URLs, quotations)
  • Vendor’s response options:
    • Accept (publication proceeds on the agreed date)
    • Submit additional evidence (publication paused, fast-track dispute initiated)
    • Formally dispute (publication paused, standard dispute initiated per the Score Dispute Process)

Vendor contact registry. Imperium maintains a contact registry for each assessed platform. Vendors are encouraged to register a designated contact by emailing disputes@agentgoverning.com with the subject line “VENDOR CONTACT: [platform name]”. Where no designated contact exists, Imperium uses the vendor’s publicly listed press, security, or partnerships contact in that order of preference. If no public contact can be reached, the score change is held until contact is established or until 30 days have elapsed, after which the score publishes with the absence of vendor response noted in the change register.

Silence is acceptance. If no vendor response is received within the 7-day notification window, the proposed score change is published as drafted on the next publication cycle. The absence of response is noted in the change register entry.
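
The three response options and the silence rule reduce to a small decision function. The sketch below is an illustration under stated assumptions: the response labels and outcome strings are hypothetical, and treating the window as closed once 7 full calendar days have elapsed is one reading of the policy, not a confirmed rule.

```python
from datetime import date, timedelta
from typing import Optional

NOTICE_DAYS = 7  # calendar days, per the notification policy


def resolve_notification(notified_on: date, response: Optional[str], today: date) -> str:
    """Map a vendor response, or silence, to the outcome described above.

    `response` is "accept", "evidence", "dispute", or None if nothing
    has been received yet. These labels are hypothetical.
    """
    if response == "accept":
        return "publish on agreed date"
    if response == "evidence":
        return "paused: fast-track dispute"
    if response == "dispute":
        return "paused: standard dispute"
    if response is None:
        # Assumption: silence counts once 7 full calendar days have passed.
        window_closes = notified_on + timedelta(days=NOTICE_DAYS)
        if today >= window_closes:
            return "publish as drafted, no response noted in register"
        return "waiting"
    raise ValueError(f"unknown response {response!r}")


print(resolve_notification(date(2026, 4, 1), None, date(2026, 4, 5)))  # waiting
```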

Vendor Self-Disputes

Vendor self-disputes follow the standard process specified in the Score Dispute Process, with public-evidence requirements applied uniformly across all platforms including Agent Shield.

Score Change Register

Every score change ever published is logged in the public score change register at agentgoverning.com/score-history/. Each entry includes:

  • Date of change
  • Platform name
  • Old score and new score (headline percentage)
  • Per-dimension changes (dimension ID, old per-dimension score, new per-dimension score)
  • Trigger that initiated the change (scheduled cycle, major release, vendor request, dispute, methodology version)
  • Brief rationale for the change
  • Link to the corresponding evidence file revision
  • Vendor response status (accepted, no response, disputed, dispute outcome)
  • Methodology version under which the change was scored

The register is maintained as an append-only log. No score change is retroactively altered without re-assessment. When the AGS methodology version changes, prior scores remain published under the methodology version that produced them, with the methodology version visible alongside each score on the leaderboard and on platform pages.
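
One way to picture a register entry and the append-only property is as an immutable record in a log that exposes no edit or delete operation. The following is a sketch only: the class and field names are hypothetical, the published register's actual storage format is not specified in this document, and the example values are placeholders rather than real scores.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class DimensionChange:
    dimension_id: str
    old_score: float
    new_score: float


@dataclass(frozen=True)
class RegisterEntry:
    changed_on: date
    platform: str
    old_score: float  # headline percentage
    new_score: float  # headline percentage
    dimension_changes: Tuple[DimensionChange, ...]
    trigger: str
    rationale: str
    evidence_url: str
    vendor_response: str
    methodology_version: str


class ScoreChangeRegister:
    """Append-only log: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[RegisterEntry] = []

    def append(self, entry: RegisterEntry) -> None:
        self._entries.append(entry)

    def history(self, platform: str) -> Tuple[RegisterEntry, ...]:
        """Return every entry for a platform, oldest first."""
        return tuple(e for e in self._entries if e.platform == platform)


# Placeholder values for illustration only; not real leaderboard data.
register = ScoreChangeRegister()
register.append(RegisterEntry(
    changed_on=date(2026, 4, 15),
    platform="ExamplePlatform",
    old_score=60.0,
    new_score=65.0,
    dimension_changes=(DimensionChange("D1", 2.0, 3.0),),
    trigger="major release",
    rationale="Placeholder rationale",
    evidence_url="https://example.com/evidence/rev-2",
    vendor_response="no response",
    methodology_version="2.1",
))
print(len(register.history("ExamplePlatform")))  # 1
```

Because the entries are frozen, a correction is recorded as a new entry that follows the old one, which keeps each score's lineage intact for citation.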

Retraction Policy

In rare cases, a published score may need to be retracted — for example, if the assessor’s evidence sources are later found to be unreliable (a vendor URL changed and the cited content no longer matches), if an internal error is discovered, or if a methodology error is identified through dispute resolution.

Retraction follows this process:

  1. The reason for retraction is documented in the public score change register
  2. The previously published score is replaced with a “Retracted — pending re-assessment” status on the leaderboard
  3. A re-assessment is conducted within 30 days of retraction
  4. The new score (if any) is published with full evidence and the retraction history visible

Retractions are public events. Imperium commits to publicising any retraction promptly, including retractions that affect Agent Shield. The Score Change Register preserves the history of any retracted score for citation continuity.

Score-Change Reversal Policy

A score that has been amended through the update or dispute process may itself be amended again, for example if a vendor withdraws a capability claim, if subsequent evidence contradicts a prior amendment, or if a methodology refinement clarifies an ambiguous case in either direction.

Score reversals follow the standard update workflow with one additional requirement: the change rationale must explicitly reference the prior amendment being reversed. The Score Change Register preserves the full history of every revision so that any score’s lineage can be reconstructed.

Coordination With the Dispute Process

This document and the Score Dispute Process describe one coherent system. The handshake works as follows:

Situation | Path
Imperium initiates a score change | Update process: 7-day vendor notification → publication or dispute
Vendor responds during 7-day window with new evidence | Fast-track dispute: 14-day final determination, publication paused
Vendor responds during 7-day window with formal objection | Standard dispute: 30-day final determination, publication paused
Third party disputes a published score | Standard dispute: 30-day final determination
Vendor disputes their own published score | Standard dispute: 30-day final determination
Dispute outcome is “score amended” | Update process: amended score follows update workflow with 7-day notification
Dispute outcome is “methodology refined” | Update process: methodology version increment, all platforms re-assessed in next cycle

Vendors interacting with both processes can expect the same notification cadence, the same evidence standards, and the same Score Change Register entry pattern regardless of which path their score change followed.
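
The handshake table can also be read as a lookup from situation to path. The sketch below encodes it that way, assuming hypothetical situation keys and a `Path` tuple invented for illustration; the day counts and the "publication paused" flags come directly from the table.

```python
from typing import NamedTuple, Optional


class Path(NamedTuple):
    process: str
    determination_days: Optional[int]  # None where the table gives no dispute deadline
    publication_paused: bool


ROUTES = {
    "imperium_initiated_change": Path("update", None, False),
    "vendor_new_evidence_in_window": Path("fast-track dispute", 14, True),
    "vendor_formal_objection_in_window": Path("standard dispute", 30, True),
    # The score is already published here, so nothing is pending to pause.
    "third_party_dispute": Path("standard dispute", 30, False),
    "vendor_self_dispute": Path("standard dispute", 30, False),
    # Dispute outcomes hand back to the update process.
    "outcome_score_amended": Path("update", None, False),
    "outcome_methodology_refined": Path("update", None, False),
}


def route(situation: str) -> Path:
    """Look up the path for a situation; unknown situations raise KeyError."""
    return ROUTES[situation]


print(route("vendor_new_evidence_in_window").determination_days)  # 14
```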

Limitations

The score update process has the following acknowledged limitations as of AGS v2.1:

  1. Single-assessor application. All score changes are currently reviewed by the AGS Author. Mitigations are documented in the Independence and Conflict of Interest document; the planned AGS v3.0 Technical Advisory Board addresses this structurally.
  2. Detection latency. Material changes by vendors that are not announced may not be detected until the next scheduled monitoring cycle. Vendors are encouraged to notify Imperium directly to ensure timely re-assessment.
  3. Public-source dependency. Re-assessments depend on the availability of public documentation. Capabilities documented privately do not trigger score changes; vendors are encouraged to publish capabilities to receive credit.
  4. Vendor-contact dependency. Where no vendor contact is reachable, score changes proceed after 30 days without notification. This is a known limitation and is mitigated by the Vendor Contact Registry.

Changelog

Version | Date | Change
1.0 | 2026-04 | Initial publication: codifies the score update process applied to AGS v2.1 leaderboard scoring as published on agentgoverning.com

Related Documents