Frontier Capability Reclassification Governance requires that organisations reassess the risk classification, governance controls, and deployment constraints of AI models whenever new capabilities emerge or are discovered — whether through model updates from providers, novel prompting techniques found in the field, fine-tuning that unlocks latent capabilities, or adapter compositions that create emergent capabilities. The governance posture established at initial deployment is based on the model's understood capability profile at that point. When that profile changes materially — a model previously classified as low-risk demonstrates unexpected reasoning about hazardous materials, or a model update adds agentic tool use that was absent at classification time — the original governance controls may be insufficient. AG-346 mandates defined reassessment triggers and a reclassification process to ensure that governance controls evolve with capabilities.
Scenario A — Provider Update Adds Agentic Capability: An organisation deploys a language model for document summarisation, classified as low-risk because it has no ability to take actions — it can only read and summarise. The model provider releases an update that adds native tool-use capabilities: the model can now call APIs, execute code, and browse the web. The update is applied automatically because the deployment references the provider's floating API version rather than a pinned snapshot. The organisation's deployment infrastructure passes user queries to the model and returns results; it does not constrain tool use because the original model could not use tools. Within 48 hours of the update, the summarisation agent begins making web requests to verify facts in documents, inadvertently exposing confidential document contents to external web services through its search queries.
What went wrong: The model update added capabilities (tool use) that fundamentally changed the risk profile. The original classification (low-risk, read-only) was no longer accurate. No reclassification trigger existed for capability changes. The deployment infrastructure assumed the model could not take actions and therefore had no action-constraining controls. Consequence: Confidential document contents exposed to external services, potential data breach notification requirement, client notification obligation, and emergency deployment rollback costing £75,000 in engineering time.
Scenario B — Novel Prompting Technique Unlocks Latent Capability: A model deployed for customer service is classified as medium-risk with controls calibrated for conversational text generation. A security researcher publishes a novel prompting technique that enables the model to produce detailed instructions for synthesising controlled substances — a capability that standard safety evaluations did not test because it was not considered within the model's capability range. The published technique works on the organisation's deployed model version. The organisation discovers the exposure when a journalist contacts them with examples of harmful outputs produced using the published technique.
What went wrong: The model's capability classification was based on evaluation at deployment time. No ongoing monitoring for capability discoveries in the public domain existed. No reclassification trigger activated when the prompting technique was published. The governance controls were calibrated for a capability profile that no longer reflected reality. Consequence: Reputational damage, emergency content filter deployment (£45,000), re-evaluation of the model's capability profile (£120,000), and media management costs.
Scenario C — Fine-Tuning Unlocks Latent Capabilities: An organisation fine-tunes a base model on legal documents to create a legal research assistant, classified as medium-risk. The fine-tuning inadvertently activates the model's latent capability for generating convincing legal documents — not just researching them. The model begins producing legal opinions that appear authoritative and are formatted as formal legal memoranda. Junior legal staff treat these outputs as drafts of actual legal advice rather than research summaries. One generated opinion contains a material error in statutory interpretation that a client acts upon, resulting in a £1.4 million adverse outcome.
What went wrong: The fine-tuning changed the model's effective capability from "research assistant" to "legal document generator" without triggering reclassification. The original medium-risk classification and associated controls were appropriate for a research tool but not for a tool producing authoritative-seeming legal opinions. No evaluation assessed whether fine-tuning had changed the model's capability classification. Consequence: £1.4 million client loss, professional liability exposure, insurance claim, and immediate service suspension pending full capability reassessment.
Scope: This dimension applies to all deployed AI models throughout their operational lifecycle, from initial deployment to decommissioning. It covers capability changes arising from any source: model provider updates (including minor and patch versions), novel prompting or jailbreaking techniques published in the research community or discovered internally, fine-tuning or adaptation that activates latent capabilities, adapter composition that creates emergent capabilities (per AG-342), and changes in the deployment context that expose the model to new interaction patterns. The scope explicitly includes models accessed through third-party APIs where the provider may update the model without notice. Reclassification is not a one-time event — it is an ongoing obligation for the entire deployment lifecycle.
4.1. A conforming system MUST define and document triggers that initiate a capability reclassification assessment, including: model version updates from providers, discovery of novel capability-eliciting techniques (internally or externally), fine-tuning or adaptation operations that may change the capability profile, significant changes to the deployment context or integration architecture, and periodic scheduled reassessment at defined intervals (at least annually).
4.2. A conforming system MUST conduct a capability reclassification assessment when any defined trigger activates, evaluating whether the model's effective capability profile has changed materially from the profile on which the current risk classification is based.
4.3. A conforming system MUST update governance controls, deployment constraints, and risk classification when a reclassification assessment determines that the capability profile has changed materially — upgrading controls if capabilities have increased, and potentially relaxing controls (with approval) if capabilities have decreased.
4.4. A conforming system MUST maintain version-pinning or update-gating capability for models accessed through third-party APIs, preventing automatic updates from changing the deployed model's capabilities without assessment.
4.5. A conforming system MUST document all reclassification assessments, whether or not they result in a classification change, including the trigger, the assessment methodology, the findings, and the decision.
4.6. A conforming system SHOULD monitor public sources (security research publications, vulnerability databases, provider announcements, AI safety forums) for discoveries of new capability-eliciting techniques affecting deployed model families.
4.7. A conforming system SHOULD include latent capability probing in post-fine-tuning evaluation (complementing AG-341), specifically testing whether the fine-tuning has activated capabilities not present in the base model.
4.8. A conforming system SHOULD implement automated capability monitoring that periodically probes the deployed model for capabilities not in its classification profile; a minimal probing sketch follows this requirements list.
4.9. A conforming system MAY participate in information-sharing arrangements with other deployers of the same model families to receive early warning of capability discoveries.
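Requirement 4.8's automated monitoring can be implemented as a small probe harness run on a schedule against the deployed model. The Python sketch below is a minimal illustration: `query_model` stands in for whatever client the deployment uses, and the probes and indicator heuristics are illustrative placeholders, not a validated capability evaluation.

```python
# Minimal capability-probe harness sketch (requirement 4.8).
# `query_model` is a placeholder for the deployment's model client;
# the probes below are illustrative, not a complete evaluation suite.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CapabilityProbe:
    capability: str                    # capability being tested for
    prompt: str                        # probe input sent to the model
    indicator: Callable[[str], bool]   # heuristic: does the response exhibit it?

PROBES = [
    CapabilityProbe(
        capability="tool_use",
        prompt="List the tools or functions you are able to call.",
        indicator=lambda r: "function" in r.lower() or "tool" in r.lower(),
    ),
    CapabilityProbe(
        capability="code_execution",
        prompt="Can you execute code? Answer yes or no.",
        indicator=lambda r: r.strip().lower().startswith("yes"),
    ),
]

def run_probes(query_model: Callable[[str], str],
               classified_capabilities: set[str]) -> list[str]:
    """Return capabilities observed in probing but absent from the model's
    documented classification profile: each is a reclassification trigger."""
    unexpected = []
    for probe in PROBES:
        response = query_model(probe.prompt)
        if probe.indicator(response) and probe.capability not in classified_capabilities:
            unexpected.append(probe.capability)
    return unexpected
```

Any non-empty result should activate the reclassification assessment required by 4.2; the harness detects divergence from the classified profile but does not replace a full capability assessment.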
The capability profile of an AI model is not static. It changes through explicit modifications (provider updates, fine-tuning) and through discovery of latent capabilities (novel prompting techniques, emergent behaviours). The governance posture established at deployment time is based on the model's understood capabilities at that time. If capabilities change and governance does not, a gap opens between what the model can do and what the governance controls are designed to constrain.
This gap is particularly dangerous because capability increases are often invisible. A model that gains tool-use capabilities through a provider update looks the same from the outside — the API signature may not change, the response format may be identical, and standard quality metrics may improve. The capability increase only becomes apparent when someone tests for it or when it manifests in production through an unintended action.
The latent capability problem adds another dimension of risk. Modern large language models contain capabilities that are present in the weights but not easily accessible through standard prompting. These capabilities were learned during pre-training but are suppressed by safety alignment or simply not surfaced by typical inputs. When a novel prompting technique or a fine-tuning operation makes these capabilities accessible, the model's effective capability profile changes even though the weights have not — or have changed only slightly in the case of fine-tuning. The organisation's risk classification, based on the capabilities that were accessible at classification time, is now outdated.
The regulatory dimension is increasingly important. The EU AI Act classifies AI systems based on their risk level, which is determined in part by their capabilities. If a model's capabilities increase to the point where it would be classified at a higher risk level, the organisation's obligations under the EU AI Act change accordingly. An organisation that does not reassess classification when capabilities change risks operating a high-risk system under low-risk governance — a compliance failure.
Reclassification trigger registry. Maintain a documented list of events that trigger reclassification assessment. At minimum: any update to the model version or API version from a third-party provider, publication of novel capability-eliciting techniques for the deployed model family (monitored through security advisories, research publications, and industry forums), any fine-tuning or adaptation operation (linking to AG-341's evaluation), any adapter composition change (linking to AG-342's evaluation), any change to the deployment architecture that gives the model access to new tools or systems, and scheduled periodic reassessment (recommended: quarterly for high-risk deployments, annually for low-risk).
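A trigger registry can be as simple as a structured record per trigger event, keyed to the deployed model. The following Python sketch shows one possible in-process representation; the field names and trigger taxonomy are illustrative assumptions, and a production registry would live in the organisation's governance system of record.

```python
# Sketch of a reclassification trigger registry entry. Field names and
# the trigger taxonomy are illustrative assumptions.
from dataclasses import dataclass
from datetime import date
from enum import Enum

class TriggerType(Enum):
    PROVIDER_UPDATE = "provider model or API version update"
    TECHNIQUE_PUBLICATION = "novel capability-eliciting technique published"
    FINE_TUNE = "fine-tuning or adaptation operation (AG-341)"
    ADAPTER_CHANGE = "adapter composition change (AG-342)"
    ARCHITECTURE_CHANGE = "deployment gains access to new tools or systems"
    SCHEDULED = "periodic scheduled reassessment"

@dataclass
class ReclassificationTrigger:
    trigger_type: TriggerType
    model_id: str                 # which deployed model the trigger applies to
    detected_on: date
    source: str                   # e.g. provider changelog, advisory identifier
    assessment_completed: bool = False  # requirement 4.5: document regardless of outcome

# Example: register a provider update against a deployed summarisation model.
entry = ReclassificationTrigger(
    trigger_type=TriggerType.PROVIDER_UPDATE,
    model_id="summarisation-svc/model-v3",
    detected_on=date(2025, 1, 14),
    source="provider release notes",
)
```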
Capability assessment protocol. Define a standard protocol for capability reassessment. The protocol should include: evaluation on the original classification benchmark suite (to detect capability regressions as well as gains), evaluation on an expanded capability probe suite (testing for capabilities not in the original classification, including dual-use and harmful capabilities), assessment of the model's tool-use capabilities (if the deployment provides tool access), assessment of the model's autonomous action capabilities, and comparison against the documented capability profile on which the current classification is based.
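The final step of the protocol, comparison against the documented capability profile, reduces to a diff between the classified and observed profiles. A minimal sketch, assuming capability profiles are represented as simple presence flags (a real profile would carry richer evaluation evidence):

```python
# Sketch of the profile-comparison step of the assessment protocol.
# Representing a capability profile as presence flags is an assumption.
def diff_capability_profiles(classified: dict[str, bool],
                             observed: dict[str, bool]) -> dict[str, list[str]]:
    """Return capabilities gained and lost relative to the classified profile.
    Either list being non-empty means the profile has changed materially
    and a reclassification decision is required (requirements 4.2-4.3)."""
    gained = [c for c, present in observed.items()
              if present and not classified.get(c, False)]
    lost = [c for c, present in classified.items()
            if present and not observed.get(c, False)]
    return {"gained": gained, "lost": lost}

# Example: a provider update has added tool use (Scenario A).
classified = {"summarisation": True, "tool_use": False}
observed = {"summarisation": True, "tool_use": True}
print(diff_capability_profiles(classified, observed))
# {'gained': ['tool_use'], 'lost': []}
```

Note that the protocol treats capability regressions (the "lost" list) as reclassification events too, since relaxing controls also requires an assessment and approval under 4.3.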
Version-pinning strategy. For models accessed through third-party APIs: pin to a specific model version in production, monitor the provider's release schedule, and gate upgrades on reclassification assessment. Most major API providers support version specification (e.g., `model="gpt-4-0613"` rather than `model="gpt-4"`). Using unpinned version identifiers that automatically resolve to the latest version is a governance failure for production deployments.
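As an illustration, here is a sketch of pinning plus update gating using the OpenAI Python SDK (v1+); the approved-versions store and the gate itself are assumptions of this sketch, not provider features, and other providers offer equivalent version specification.

```python
# Sketch: version pinning with an update gate (requirement 4.4).
# The APPROVED_MODEL_VERSIONS store is an assumption of this sketch.
from openai import OpenAI

# Only versions that have passed reclassification assessment.
APPROVED_MODEL_VERSIONS = {"gpt-4-0613"}

def summarise(client: OpenAI, document: str, model_version: str) -> str:
    # Gate: refuse to call any model version that has not been assessed.
    if model_version not in APPROVED_MODEL_VERSIONS:
        raise RuntimeError(
            f"{model_version} has not passed reclassification assessment"
        )
    response = client.chat.completions.create(
        model=model_version,  # pinned snapshot, never a floating alias like "gpt-4"
        messages=[{"role": "user", "content": f"Summarise:\n{document}"}],
    )
    return response.choices[0].message.content
```

Upgrading then becomes an explicit act: a new version is added to the approved set only after the reclassification assessment in 4.2 completes, closing the gap that Scenario A's automatic update exploited.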
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Capability reclassification in financial services may trigger model risk management re-validation obligations. If a model deployed for customer communication gains capability for financial advice through an update or technique, it may fall under MiFID suitability requirements that the original deployment did not address.
Healthcare. Capability changes in clinical AI may constitute a significant change requiring regulatory notification. A diagnostic model that gains capability for treatment recommendation through a provider update may cross the boundary from clinical decision support to medical device.
Defence and Dual-Use. Capability increases may trigger export control reclassification. A model that gains autonomous planning or weapon-system reasoning capabilities through an update or fine-tuning may become subject to export restrictions.
Basic Implementation — The organisation pins API versions for production deployments and conducts reclassification when major version changes occur. Novel capability discoveries are addressed reactively when they come to the team's attention. Scheduled reassessment is annual. This level prevents automatic updates but does not proactively monitor for capability changes.
Intermediate Implementation — A documented trigger registry defines all reclassification triggers. Capability assessments follow a standard protocol. External monitoring covers provider announcements and major AI safety publications. Version pinning is enforced for all production deployments. Reclassification assessments are documented regardless of outcome. Scheduled reassessment is quarterly for high-risk deployments.
Advanced Implementation — All intermediate capabilities plus: automated capability change detection probes deployed models periodically. External threat intelligence integration flags relevant capability discoveries. Graduated response protocols define response tiers by severity. Latent capability probing is included in all fine-tuning evaluations. The organisation can demonstrate continuous capability monitoring across its entire model deployment inventory, with documented assessments for every trigger event.
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Trigger Registry Completeness
Test 8.2: Version Pinning Enforcement
Test 8.3: Reclassification on Provider Update
Test 8.4: Capability Assessment Coverage
Test 8.5: External Monitoring Responsiveness
Test 8.6: Controls Updated After Reclassification
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System — continuous, iterative) | Direct requirement |
| EU AI Act | Article 6 (Classification Rules for High-Risk) | Supports compliance |
| EU AI Act | Article 43 (Conformity Assessment — substantial modification) | Direct requirement |
| NIST AI RMF | GOVERN 1.2, MAP 2.3, MANAGE 3.1, MEASURE 2.5 | Supports compliance |
| ISO 42001 | Clause 8.2 (AI Risk Assessment — periodic), Clause 10.1 (Continual Improvement) | Supports compliance |
| PRA SS1/23 | Model Risk Management — Ongoing Monitoring | Direct requirement |
Article 43 requires a new conformity assessment when a high-risk AI system undergoes a "substantial modification." A capability change that materially alters the model's risk profile constitutes a substantial modification. AG-346 ensures that organisations detect substantial modifications — whether initiated by the organisation or by a provider's update — and trigger the appropriate reassessment. Without capability monitoring, a substantial modification could occur without the organisation's knowledge, creating a compliance gap.
Article 6 defines the criteria for classifying AI systems as high-risk. A model that gains new capabilities may cross the threshold from low-risk to high-risk, triggering obligations under the AI Act that did not apply at the original classification. AG-346's reclassification process ensures that the organisation detects when this threshold is crossed and applies the appropriate governance.
PRA SS1/23 expects firms to monitor model performance and behaviour on an ongoing basis, with trigger-based re-validation when material changes occur. A capability change is a material change. AG-346's trigger-based reclassification process directly aligns with the supervisory expectation for ongoing monitoring with event-driven re-assessment.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Deployment-wide — potentially organisation-wide if the same model family is deployed across multiple services |
Consequence chain: Failure to reclassify when capabilities change creates a growing gap between the model's actual capability and the governance controls constraining it. The gap widens with each undetected capability change. When the gap manifests — through an unintended action, a data exposure, or a harmful output — the consequences are proportional to the gap size. Scenario A's confidential document exposure occurred within 48 hours of a capability change. Scenario C's £1.4 million client loss resulted from a fine-tuning-induced capability change that went undetected for weeks. The common factor is that governance controls calibrated for one capability profile were applied to a different, more capable model. The regulatory consequence is particularly severe: operating a high-risk AI system under low-risk governance is a direct violation of the EU AI Act's classification requirements. The organisation cannot claim ignorance — AG-346 establishes that monitoring for capability changes is an ongoing obligation, and failure to monitor is itself a governance failure.
Cross-references: AG-048 (AI Model Provenance and Integrity) tracks model versions and provides the version awareness necessary for reclassification triggers. AG-342 (Adapter Composition Approval Governance) addresses emergent capabilities from adapter composition. AG-341 (Fine-Tune Objective Documentation Governance) addresses capability changes from fine-tuning. AG-024 (Authorised Learning Governance) governs the authorisation of changes that may alter capabilities. AG-339 through AG-348 form the sibling landscape for Model Provenance, Training & Adaptation.