AI Model Provenance and Integrity governs the verification and ongoing assurance that the underlying AI model powering an agent is the model that was approved, tested, and calibrated for governance purposes. Every governance protocol in this framework makes assumptions about the model's behaviour — its reasoning patterns, its response to instructions, its compliance with constraints, its susceptibility to known attack vectors. If the model is substituted, poisoned, or silently updated, those assumptions become invalid and the entire governance framework operates on a false foundation. AG-048 ensures that the model in production is the model that was evaluated, and that any change to the model triggers governance re-assessment.

This protocol is distinct from AG-014 (External Dependency Integrity), which governs the tools, APIs, and services an agent calls during operation. AG-014 addresses the service layer — the instruments the agent uses. AG-048 addresses the agent's cognitive core — the model that processes inputs, reasons about actions, and generates outputs. A compromised model is fundamentally more dangerous than a compromised tool because it produces incorrect reasoning across all operations, potentially in ways specifically designed to evade governance detection.
Scenario A — Silent API Model Update Shifts Risk Profile: A wealth management firm uses an AI agent for portfolio rebalancing recommendations, powered by a commercial language model API. The API provider updates the model to version 4.1, improving general reasoning but subtly changing how the model weighs risk factors. The provider updates the model changelog but does not notify customers directly. The firm's agent begins recommending slightly more aggressive allocations for conservative-profile clients. Over eight weeks, portfolio drift accumulates. A quarterly review reveals that 340 client portfolios have drifted outside their approved risk bands. The firm cannot demonstrate that the model powering the recommendations was the model that was approved by the model risk committee.
What went wrong: The firm relied on the API endpoint remaining stable and did not verify model version or behaviour on an ongoing basis. No behavioural fingerprinting existed to detect the shift in risk assessment. The model change was not treated as a mandate change event. Consequence: 340 client portfolios require manual review and potential rebalancing. Client complaints and potential mis-selling claims. Supervisory model risk management findings. The firm's model risk management framework is found inadequate against SS1/23 expectations.
Scenario B — Supply Chain Attack Embeds Backdoor in Fine-Tuned Model: An organisation fine-tunes an open-source model for its governance-sensitive application using a dataset sourced from a public repository. The dataset contains poisoned examples that embed a backdoor: when the model encounters a specific trigger phrase in its input, it bypasses its safety constraints and follows instructions uncritically. The backdoor passes standard evaluation benchmarks because it only activates on the specific trigger. In production, an external party includes the trigger phrase in a transaction description, causing the agent to approve a fraudulent transaction of $450,000 that would normally be blocked.
What went wrong: The model's provenance chain did not include verification of the fine-tuning dataset. No behavioural fingerprinting tested for trigger-activated behaviour changes. The model was treated as trustworthy based on benchmark performance alone. Consequence: $450,000 in fraudulent loss. Potential additional losses as the backdoor may have been exploited previously without detection. The organisation must withdraw the model from production and re-fine-tune from a verified dataset. Regulatory investigation into the adequacy of model risk management controls.
Scenario C — Internal Deployment Error Substitutes Wrong Model: During a routine deployment update, an operations engineer accidentally deploys a development version of the model instead of the production version. The development model has relaxed safety constraints because it was configured for internal testing. The deployment pipeline does not verify model identity — it deploys whatever artefact is in the specified storage location. The development model processes production transactions for 72 hours before the error is discovered during a routine operational review. During this period, the model approved several transactions that exceeded normal risk thresholds.
What went wrong: The deployment pipeline did not verify model identity against the approved model registry. No hash verification, version check, or behavioural fingerprint comparison occurred at deployment time. The deployment was treated as an infrastructure operation rather than a governance-relevant event. Consequence: 72 hours of production transactions processed by an unapproved model. All transactions must be retrospectively reviewed. Regulatory reporting obligation for inadequate change control. SOX control deficiency finding for the deployment pipeline.
Scope: This dimension applies to all deployments where the underlying model could be substituted, poisoned, or fine-tuned without governance review. This includes virtually every AI agent deployment — whether the model is self-hosted, accessed through an API, or deployed as a managed service. The scope extends to all components of the model stack: base model weights, fine-tuning adapters (e.g., LoRA adapters), system prompts that materially alter model behaviour, inference configuration parameters (temperature, top-p, top-k), and any post-processing or filtering layers that shape model outputs. A change to any of these components can alter the model's behaviour in governance-relevant ways. The scope also covers model provider changes — if an API provider silently updates the model behind an endpoint, this constitutes a model change from a governance perspective, even though the API endpoint URL has not changed. AG-048 requires detection of such changes regardless of whether the provider communicates them.
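The scope above can be made concrete as a composite fingerprint over every governance-relevant component of the model stack, so that a change to any component (weights, adapters, system prompt, or inference parameters) changes the recorded identity. The following is an illustrative sketch only: the function name, argument shapes, and choice of SHA-256 are assumptions, not requirements of this standard.

```python
import hashlib
import json

def stack_fingerprint(weights: bytes, adapters: dict[str, bytes],
                      system_prompt: str, inference_params: dict) -> str:
    """Composite digest: a change to any stack component changes the result."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(weights).digest())            # base model weights
    for name in sorted(adapters):                         # e.g. LoRA adapters
        h.update(name.encode())
        h.update(hashlib.sha256(adapters[name]).digest())
    h.update(hashlib.sha256(system_prompt.encode()).digest())
    # Canonical JSON so that key ordering cannot change the fingerprint
    h.update(json.dumps(inference_params, sort_keys=True).encode())
    return h.hexdigest()
```

Under this scheme, changing only the temperature from 0.2 to 0.7 yields a different fingerprint, capturing the point that inference configuration is part of model identity, not just the weights.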
4.1. A conforming system MUST record and verify model version at agent deployment and on every agent restart.
4.2. A conforming system MUST detect unauthorised model version changes and trigger an immediate governance alert upon detection.
4.3. A conforming system MUST maintain verifiable model provenance from source to deployment, with a documented chain of custody.
4.4. A conforming system MUST treat model changes as mandate change events requiring governance re-approval per AG-007.
4.5. A conforming system SHOULD verify model endpoint authenticity cryptographically, through signed model hashes or TLS certificate pinning.
4.6. A conforming system SHOULD detect behavioural changes consistent with weight poisoning or backdoor fine-tuning through continuous agent monitoring.
4.7. A conforming system SHOULD require explicit governance re-approval before an updated model is used in production.
4.8. A conforming system SHOULD maintain a model registry with the complete history of all model versions used in production, including deployment timestamps and approval records.
4.9. A conforming system MAY implement behavioural fingerprinting to detect model substitution without access to model weights.
4.10. A conforming system MAY maintain a shadow deployment where the approved model version runs in parallel with the production version for continuous comparison.
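Requirements 4.1 through 4.3 might be sketched as follows, assuming a registry that maps approved versions to SHA-256 digests of the weights artefact; all names are illustrative, and the same check can be repeated periodically during operation rather than only at startup.

```python
import hashlib

class ModelIntegrityError(RuntimeError):
    """Raised when the deployed artefact does not match the approved registry."""

def verify_at_startup(registry: dict[str, str], version: str, artefact: bytes) -> str:
    """Check the deployed artefact against the approved-model registry.

    Returns the verified digest for the governance log, or raises so that
    the deployment is blocked (per 4.2) rather than proceeding unverified.
    """
    digest = hashlib.sha256(artefact).hexdigest()
    approved = registry.get(version)
    if approved is None:
        raise ModelIntegrityError(f"{version} is not an approved model version")
    if digest != approved:
        raise ModelIntegrityError(f"hash mismatch for {version}; deployment blocked")
    return digest
```

The returned digest becomes part of the governance record for the session, giving the audit trail required by 4.8 a verifiable anchor.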
AI Model Provenance and Integrity addresses a foundational vulnerability in any AI governance framework: the assumption that the model being governed is the model that was assessed. Every other protocol in this standard — from AG-001's mandate enforcement to AG-022's agent monitoring — is calibrated against a specific model's behaviour. If the model changes, those calibrations become unreliable, and the governance framework operates on false assumptions.
The threat landscape for model integrity includes several distinct attack vectors. Supply chain attacks compromise the model before or during delivery, embedding backdoors or biased behaviours that activate under specific conditions. Silent version substitution replaces a tested model with an untested or compromised version, either through API provider changes or internal deployment errors. Weight poisoning introduces targeted modifications to model weights that alter behaviour for specific inputs while preserving normal behaviour for general inputs, making detection through routine testing extremely difficult. Fine-tuning attacks use the model's own adaptation mechanisms to modify its behaviour in ways that undermine governance compliance.
The principle underlying AG-048 is that model identity is a governance-critical property that must be verified continuously, not assumed. Just as AG-001 requires structural enforcement of operational boundaries rather than relying on the agent's own compliance, AG-048 requires structural verification of model identity rather than relying on the assumption that the model has not changed. The model is not a static component — it is a living dependency that can change through provider updates, deployment errors, supply chain compromise, or deliberate attack. Governance must account for this dynamism.
The failure mode is systemic. Unlike a single protocol failure that affects one aspect of governance, a model integrity failure undermines every protocol that depends on model behaviour — which is nearly all of them. AG-001's mandate enforcement assumes the model respects structural boundaries. AG-005's instruction integrity assumes the model processes instructions as designed. AG-022's agent monitoring assumes a stable behavioural baseline. A compromised model invalidates all of these assumptions simultaneously. Governance calibration is only valid for the specific model version it was performed against, and any model change invalidates that calibration until re-assessment is completed.
Record model identifier, version hash, and source at deployment. Verify the hash on every startup and periodically during operation. Implement continuous behavioural fingerprinting to detect statistical deviations consistent with model substitution. Treat any detected version or behaviour change as a critical governance event requiring immediate review.
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Model risk management is a well-established discipline in financial services, governed by regulatory expectations including the PRA's SS1/23 and the Federal Reserve's SR 11-7 (paralleled by OCC Bulletin 2011-12). AG-048 extends traditional model risk management to address the specific risks of AI model supply chains. Financial services firms should integrate AG-048 verification into their existing model risk management frameworks, including model inventory, validation, and ongoing monitoring processes. Per-transaction limits, daily aggregate limits, and product-type restrictions calibrated against a specific model version require re-validation when the model changes.
Healthcare. AI models used in clinical decision support are subject to medical device regulations in many jurisdictions. A model change in a clinical AI system may require re-certification under FDA 510(k) or EU MDR. AG-048's requirement for change detection and re-approval directly supports regulatory compliance for clinical AI models, ensuring that model changes are detected before they affect patient care.
Critical Infrastructure. Models controlling critical infrastructure must meet the highest integrity standards. Model substitution in a power grid management system or water treatment facility could have physical safety consequences. AG-048 implementation for critical infrastructure should include hardware-backed model verification, air-gapped model registries, and immediate operational shutdown if model integrity cannot be confirmed.
Basic Implementation — The organisation records the model identifier and version at deployment time. The model version is logged in the governance record for each agent session. Version verification occurs on agent startup by querying the model endpoint for version metadata. If the reported version differs from the approved version, an alert is generated. This level meets the minimum mandatory requirements but has weaknesses: it relies on the model provider accurately reporting version information, it does not detect changes between startup checks, and it cannot detect behavioural changes that occur without a version change (e.g., weight poisoning or provider-side modifications that do not update the version string).
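The basic-level startup check can be sketched as a comparison between the endpoint's self-reported version and the approved version, with an alert on mismatch. The `get_metadata` callable and the `model_version` key are illustrative assumptions; note that, as described above, this trusts the provider's self-reporting.

```python
from typing import Callable

def check_reported_version(get_metadata: Callable[[], dict],
                           approved_version: str,
                           alert: Callable[[str], None]) -> bool:
    """Compare the endpoint's self-reported version with the approved one."""
    reported = get_metadata().get("model_version")
    if reported != approved_version:
        alert(f"model version mismatch: approved={approved_version}, "
              f"reported={reported}")
        return False
    return True
```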
Intermediate Implementation — Model identity is verified through cryptographic mechanisms in addition to version metadata. For self-hosted models, this means verifying the hash of model weights on every load. For API-hosted models, this means continuous behavioural fingerprinting — running a standard set of test prompts through the model at regular intervals and comparing outputs to a baseline established during governance approval. Statistically significant deviations trigger an alert. The model registry maintains a complete provenance chain including source, download verification, fine-tuning history, and deployment approvals. Model updates go through a formal change control process with governance re-assessment before production deployment.
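The behavioural-fingerprinting element of the intermediate level might look like the sketch below: replay a fixed probe set at deterministic settings (e.g. temperature 0) and compare answers to the baseline captured at governance approval. The 0.9 match threshold and the `ask` callable are assumptions; a production implementation would use a statistical test over a larger probe set rather than exact matching.

```python
from typing import Callable

def fingerprint_check(ask: Callable[[str], str],
                      baseline: dict[str, str],
                      threshold: float = 0.9) -> tuple[bool, float]:
    """Replay probe prompts and compare answers to the approval-time baseline."""
    matches = sum(1 for prompt, expected in baseline.items()
                  if ask(prompt) == expected)
    rate = matches / len(baseline)
    return rate >= threshold, rate  # (ok, observed match rate)
```

A drop below the threshold triggers a governance alert even when the endpoint's version string is unchanged, which is precisely the failure mode in Scenario A.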
Advanced Implementation — All intermediate capabilities plus: behavioural fingerprinting operates continuously during production use, not only through dedicated test prompts. Statistical models detect subtle behavioural drift that might indicate gradual model modification. Shadow deployments run the approved model version in parallel, comparing outputs to detect divergence. Independent adversarial testing has verified that model substitution is detected even when the substitute model is specifically designed to mimic the original on common inputs. Hardware security modules protect model signing keys and fingerprint baselines. The organisation can demonstrate to regulators that model identity is verified to a cryptographic standard of assurance.
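The shadow-deployment comparison could be sketched as a streaming comparator that accumulates divergence between production and shadow (approved) outputs and flags a breach of a divergence budget. The 5% budget and the minimum-sample warm-up are illustrative assumptions, not prescribed values.

```python
class ShadowComparator:
    """Accumulates divergence between production and shadow (approved) outputs."""

    def __init__(self, budget: float = 0.05, min_samples: int = 100):
        self.budget = budget
        self.min_samples = min_samples
        self.total = 0
        self.diverged = 0

    def observe(self, prod_out: str, shadow_out: str) -> bool:
        """Record one paired response; False means the budget is breached."""
        self.total += 1
        self.diverged += int(prod_out != shadow_out)
        if self.total < self.min_samples:
            return True  # not enough evidence to raise an alert yet
        return self.diverged / self.total <= self.budget
```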
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-048 compliance requires verification that model identity is structurally assured and that changes are detected regardless of their source or mechanism.
Test 8.1: Version Substitution Detection
Test 8.2: Behavioural Fingerprint Deviation Detection
Test 8.3: Provenance Chain Verification
Test 8.4: Silent Update Detection
Test 8.5: Degradation Fails Safe
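As one illustration of the fail-safe expectation in Test 8.5, a deployment gate might treat an unreachable registry as a failed verification and block, rather than proceed unverified. The exception type and function names below are assumptions for the sketch.

```python
class RegistryUnavailable(Exception):
    """The approved-model registry could not be reached."""

def gate_deployment(lookup_approved_hash, version: str, actual_hash: str) -> bool:
    """Fail safe: if model identity cannot be confirmed, block the deployment."""
    try:
        approved = lookup_approved_hash(version)
    except RegistryUnavailable:
        return False  # unverifiable identity is treated as failed verification
    return approved == actual_hash
```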
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 15 (Accuracy, Robustness, and Cybersecurity) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| NIST AI RMF | GOVERN 1.7, MAP 3.2 (Supply Chain Risk) | Supports compliance |
| SOC 2 Type II | Change Management and System Integrity | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance |
| PRA SS1/23 | Model Risk Management | Supports compliance |
Article 15 requires high-risk AI systems to achieve appropriate levels of accuracy, robustness, and cybersecurity throughout their lifecycle. Model integrity is a direct prerequisite for accuracy — a substituted or poisoned model cannot maintain the accuracy level that was established during conformity assessment. The article requires resilience against exploitation of system vulnerabilities, which includes supply chain attacks on the model itself. AG-048 implements the lifecycle model integrity assurance that Article 15 envisions. The requirement that accuracy be maintained "throughout the lifecycle" maps directly to AG-048's continuous verification requirement — it is not sufficient to verify the model once at deployment.
Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that addresses risks throughout the system lifecycle. Model substitution and poisoning are risks that AG-048's provenance verification and change detection directly mitigate. The risk management system must include testing procedures to ensure the AI system performs consistently — AG-048's behavioural fingerprinting provides the mechanism for detecting inconsistency attributable to model changes.
The NIST AI RMF identifies AI supply chain risk as a significant concern, including risks from model components, training data, and third-party model providers. GOVERN 1.7 addresses processes for managing AI risks from third-party entities. MAP 3.2 addresses the mapping of risk contexts for AI systems. The framework recommends organisations verify the provenance of AI components and monitor for changes that could affect system behaviour. AG-048 provides the operational implementation of these recommendations, with specific mechanisms for provenance verification, change detection, and governance re-assessment.
SOC 2 requires that changes to system components are authorised, tested, and documented. For AI agent deployments, the model is a system component. SOC 2 auditors expect to see evidence that model changes are controlled through a formal change management process. AG-048's requirement for model changes to be treated as mandate change events aligns directly with SOC 2 change management expectations. The model registry provides the audit trail that SOC 2 requires for system component changes.
For AI agents executing financial operations, the model is a control component. If the model changes without governance re-assessment, the control environment has changed without assessment — a potential control deficiency. AG-048 ensures that model changes are detected and assessed, supporting the SOX requirement for management to assess internal control effectiveness.
Clause 6.1 requires organisations to determine actions to address risks within the AI management system. Clause 8.2 requires AI risk assessment. Model substitution and integrity compromise are AI-specific risks that require specific controls. AG-048's provenance verification and change detection implement the risk treatment controls that ISO 42001 requires for model-related risks.
The PRA's supervisory statement SS1/23 sets expectations for how firms manage risks arising from the use of models, including AI models. SS1/23 requires firms to maintain model inventories, validate models before use, and monitor models in production. AG-048's model registry, provenance verification, and continuous monitoring directly implement these expectations for AI agent deployments. The PRA expects firms to detect and respond to model changes that could affect the firm's risk profile.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — a compromised model undermines every governance protocol that depends on model behaviour, which is effectively all of them |
Consequence chain: Without model provenance and integrity controls, a compromised model — through supply chain attack, backdoor fine-tuning, or silent version substitution — bypasses all governance controls that assume the original model's behaviour. The governance layer evaluates a different model than the one it was calibrated for: mandate limits calibrated for one model's risk profile may be inadequate for another, and behavioural monitoring baselines become meaningless. The immediate technical failure is undetected model substitution or modification. The operational impact is that governance decisions are made against invalid assumptions — actions that should be blocked may be approved, risk assessments may be miscalibrated, and behavioural baselines may fail to detect anomalies because the baseline itself has shifted. The business consequences include regulatory enforcement action for inadequate model risk management, financial losses from decisions made under invalid governance assumptions, reputational damage from preventable control failures, and potential personal liability for senior managers under regimes such as the FCA's Senior Managers and Certification Regime. The severity is amplified by the systemic nature of the failure: unlike a single protocol failure that affects one governance dimension, a model integrity failure undermines every dimension simultaneously.
Cross-references: AG-048 intersects with AG-007 (Governance Configuration Control) for model change management, AG-014 (External Dependency Integrity) for supply chain assurance at the service layer, AG-022 (Behavioural Drift Detection) for detecting anomalies that may indicate model modification, AG-043 (Self-Modification Detection) for agent-initiated changes, and AG-046 (Operating Environment Integrity) for infrastructure protection.