AI Model Provenance and Integrity governs the verification and ongoing assurance that the underlying AI model powering an agent is the model that was approved, tested, and calibrated for governance purposes. Every governance protocol in this framework makes assumptions about the model's behaviour — its reasoning patterns, its response to instructions, its compliance with constraints, its susceptibility to known attack vectors. If the model is substituted, poisoned, or silently updated, those assumptions become invalid and the entire governance framework operates on a false foundation. AG-048 ensures that the model in production is the model that was evaluated, and that any change to the model triggers governance re-assessment.

This protocol is distinct from AG-014 (External Dependency Integrity), which governs the tools, APIs, and services an agent calls during operation. AG-014 addresses the service layer — the instruments the agent uses. AG-048 addresses the agent's cognitive core — the model that processes inputs, reasons about actions, and generates outputs. A compromised model is fundamentally more dangerous than a compromised tool because it produces incorrect reasoning across all operations, potentially in ways specifically designed to evade governance detection.
Scenario A — Silent API Model Update Shifts Risk Profile: A wealth management firm uses an AI agent for portfolio rebalancing recommendations, powered by a commercial language model API. The API provider updates the model to version 4.1, improving general reasoning but subtly changing how the model weighs risk factors. The provider updates the model changelog but does not notify customers directly. The firm's agent begins recommending slightly more aggressive allocations for conservative-profile clients. Over eight weeks, portfolio drift accumulates. A quarterly review reveals that 340 client portfolios have drifted outside their approved risk bands. The firm cannot demonstrate that the model powering the recommendations was the model that was approved by the model risk committee.
What went wrong: The firm relied on the API endpoint remaining stable and did not verify model version or behaviour on an ongoing basis. No behavioural fingerprinting existed to detect the shift in risk assessment. The model change was not treated as a mandate change event. Consequence: 340 client portfolios require manual review and potential rebalancing. Client complaints and potential mis-selling claims. Supervisory model risk management findings. The firm's model risk management framework is found inadequate against SS1/23 expectations.
Scenario B — Supply Chain Attack Embeds Backdoor in Fine-Tuned Model: An organisation fine-tunes an open-source model for its governance-sensitive application using a dataset sourced from a public repository. The dataset contains poisoned examples that embed a backdoor: when the model encounters a specific trigger phrase in its input, it bypasses its safety constraints and follows instructions uncritically. The backdoor passes standard evaluation benchmarks because it only activates on the specific trigger. In production, an external party includes the trigger phrase in a transaction description, causing the agent to approve a fraudulent transaction of $450,000 that would normally be blocked.
What went wrong: The model's provenance chain did not include verification of the fine-tuning dataset. No behavioural fingerprinting tested for trigger-activated behaviour changes. The model was treated as trustworthy based on benchmark performance alone. Consequence: $450,000 in fraudulent loss. Potential additional losses as the backdoor may have been exploited previously without detection. The organisation must withdraw the model from production and re-fine-tune from a verified dataset. Regulatory investigation into the adequacy of model risk management controls.
Scenario C — Internal Deployment Error Substitutes Wrong Model: During a routine deployment update, an operations engineer accidentally deploys a development version of the model instead of the production version. The development model has relaxed safety constraints because it was configured for internal testing. The deployment pipeline does not verify model identity — it deploys whatever artefact is in the specified storage location. The development model processes production transactions for 72 hours before the error is discovered during a routine operational review. During this period, the model approved several transactions that exceeded normal risk thresholds.
What went wrong: The deployment pipeline did not verify model identity against the approved model registry. No hash verification, version check, or behavioural fingerprint comparison occurred at deployment time. The deployment was treated as an infrastructure operation rather than a governance-relevant event. Consequence: 72 hours of production transactions processed by an unapproved model. All transactions must be retrospectively reviewed. Regulatory reporting obligation for inadequate change control. SOX control deficiency finding for the deployment pipeline.
Scope: This dimension applies to all deployments where the underlying model could be substituted, poisoned, or fine-tuned without governance review. This includes virtually every AI agent deployment — whether the model is self-hosted, accessed through an API, or deployed as a managed service. The scope extends to all components of the model stack: base model weights, fine-tuning adapters (e.g., LoRA adapters), system prompts that materially alter model behaviour, inference configuration parameters (temperature, top-p, top-k), and any post-processing or filtering layers that shape model outputs. A change to any of these components can alter the model's behaviour in governance-relevant ways. The scope also covers model provider changes — if an API provider silently updates the model behind an endpoint, this constitutes a model change from a governance perspective, even though the API endpoint URL has not changed. AG-048 requires detection of such changes regardless of whether the provider communicates them.
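The scope above can be made concrete as a composite fingerprint over every governance-relevant component of the model stack, so that a change to any component (weights, adapters, system prompt, or inference parameters) changes the recorded identity. The following is an illustrative sketch only: the function name, argument shapes, and choice of SHA-256 are assumptions, not requirements of this standard.

```python
import hashlib
import json

def stack_fingerprint(weights: bytes, adapters: dict[str, bytes],
                      system_prompt: str, inference_params: dict) -> str:
    """Composite digest: a change to any stack component changes the result."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(weights).digest())            # base model weights
    for name in sorted(adapters):                         # e.g. LoRA adapters
        h.update(name.encode())
        h.update(hashlib.sha256(adapters[name]).digest())
    h.update(hashlib.sha256(system_prompt.encode()).digest())
    # Canonical JSON so that key ordering cannot change the fingerprint
    h.update(json.dumps(inference_params, sort_keys=True).encode())
    return h.hexdigest()
```

Under this scheme, changing only the temperature from 0.2 to 0.7 yields a different fingerprint, capturing the point that inference configuration is part of model identity, not just the weights.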
4.1. A conforming system MUST record and verify model version at agent deployment and on every agent restart.
4.2. A conforming system MUST detect unauthorised model version changes and trigger an immediate governance alert upon detection.
4.3. A conforming system MUST maintain verifiable model provenance from source to deployment, with a documented chain of custody.
4.4. A conforming system MUST treat model changes as mandate change events requiring governance re-approval per AG-007.
4.5. A conforming system SHOULD verify model endpoint authenticity cryptographically, through signed model hashes or TLS certificate pinning.
4.6. A conforming system SHOULD detect behavioural changes consistent with weight poisoning or backdoor fine-tuning through continuous agent monitoring.
4.7. A conforming system SHOULD require explicit governance re-approval before an updated model is used in production.
4.8. A conforming system SHOULD maintain a model registry with the complete history of all model versions used in production, including deployment timestamps and approval records.
4.9. A conforming system MAY implement behavioural fingerprinting to detect model substitution without access to model weights.
4.10. A conforming system MAY maintain a shadow deployment where the approved model version runs in parallel with the production version for continuous comparison.
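Requirements 4.1 through 4.3 might be sketched as follows, assuming a registry that maps approved versions to SHA-256 digests of the weights artefact; all names are illustrative, and the same check can be repeated periodically during operation rather than only at startup.

```python
import hashlib

class ModelIntegrityError(RuntimeError):
    """Raised when the deployed artefact does not match the approved registry."""

def verify_at_startup(registry: dict[str, str], version: str, artefact: bytes) -> str:
    """Check the deployed artefact against the approved-model registry.

    Returns the verified digest for the governance log, or raises so that
    the deployment is blocked (per 4.2) rather than proceeding unverified.
    """
    digest = hashlib.sha256(artefact).hexdigest()
    approved = registry.get(version)
    if approved is None:
        raise ModelIntegrityError(f"{version} is not an approved model version")
    if digest != approved:
        raise ModelIntegrityError(f"hash mismatch for {version}; deployment blocked")
    return digest
```

The returned digest becomes part of the governance record for the session, giving the audit trail required by 4.8 a verifiable anchor.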
AI Model Provenance and Integrity addresses a foundational vulnerability in any AI governance framework: the assumption that the model being governed is the model that was assessed. Every other protocol in this standard — from AG-001's mandate enforcement to AG-022's agent monitoring — is calibrated against a specific model's behaviour. If the model changes, those calibrations become unreliable, and the governance framework operates on false assumptions.
The threat landscape for model integrity includes several distinct attack vectors. Supply chain attacks compromise the model before or during delivery, embedding backdoors or biased behaviours that activate under specific conditions. Silent version substitution replaces a tested model with an untested or compromised version, either through API provider changes or internal deployment errors. Weight poisoning introduces targeted modifications to model weights that alter behaviour for specific inputs while preserving normal behaviour for general inputs, making detection through routine testing extremely difficult. Fine-tuning attacks use the model's own adaptation mechanisms to modify its behaviour in ways that undermine governance compliance.
The principle underlying AG-048 is that model identity is a governance-critical property that must be verified continuously, not assumed. Just as AG-001 requires structural enforcement of operational boundaries rather than relying on the agent's own compliance, AG-048 requires structural verification of model identity rather than relying on the assumption that the model has not changed. The model is not a static component — it is a living dependency that can change through provider updates, deployment errors, supply chain compromise, or deliberate attack. Governance must account for this dynamism.
The failure mode is systemic. Unlike a single protocol failure that affects one aspect of governance, a model integrity failure undermines every protocol that depends on model behaviour — which is nearly all of them. AG-001's mandate enforcement assumes the model respects structural boundaries. AG-005's instruction integrity assumes the model processes instructions as designed. AG-022's agent monitoring assumes a stable behavioural baseline. A compromised model invalidates all of these assumptions simultaneously. Governance calibration is only valid for the specific model version it was performed against, and any model change invalidates that calibration until re-assessment is completed.
Record model identifier, version hash, and source at deployment. Verify the hash on every startup and periodically during operation. Implement continuous behavioural fingerprinting to detect statistical deviations consistent with model substitution. Treat any detected version or behaviour change as a critical governance event requiring immediate review.
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Model risk management is a well-established discipline in financial services, governed by regulatory expectations including the PRA's SS1/23 and the Federal Reserve's SR 11-7 (paralleled by OCC Bulletin 2011-12). AG-048 extends traditional model risk management to address the specific risks of AI model supply chains. Financial services firms should integrate AG-048 verification into their existing model risk management frameworks, including model inventory, validation, and ongoing monitoring processes. Per-transaction limits, daily aggregate limits, and product-type restrictions calibrated against a specific model version require re-validation when the model changes.
Healthcare. AI models used in clinical decision support are subject to medical device regulations in many jurisdictions. A model change in a clinical AI system may require re-certification under FDA 510(k) or EU MDR. AG-048's requirement for change detection and re-approval directly supports regulatory compliance for clinical AI models, ensuring that model changes are detected before they affect patient care.
Critical Infrastructure. Models controlling critical infrastructure must meet the highest integrity standards. Model substitution in a power grid management system or water treatment facility could have physical safety consequences. AG-048 implementation for critical infrastructure should include hardware-backed model verification, air-gapped model registries, and immediate operational shutdown if model integrity cannot be confirmed.
Basic Implementation — The organisation records the model identifier and version at deployment time. The model version is logged in the governance record for each agent session. Version verification occurs on agent startup by querying the model endpoint for version metadata. If the reported version differs from the approved version, an alert is generated. This level meets the minimum mandatory requirements but has weaknesses: it relies on the model provider accurately reporting version information, it does not detect changes between startup checks, and it cannot detect behavioural changes that occur without a version change (e.g., weight poisoning or provider-side modifications that do not update the version string).
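The basic-level startup check can be sketched as a comparison between the endpoint's self-reported version and the approved version, with an alert on mismatch. The `get_metadata` callable and the `model_version` key are illustrative assumptions; note that, as described above, this trusts the provider's self-reporting.

```python
from typing import Callable

def check_reported_version(get_metadata: Callable[[], dict],
                           approved_version: str,
                           alert: Callable[[str], None]) -> bool:
    """Compare the endpoint's self-reported version with the approved one."""
    reported = get_metadata().get("model_version")
    if reported != approved_version:
        alert(f"model version mismatch: approved={approved_version}, "
              f"reported={reported}")
        return False
    return True
```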
Intermediate Implementation — Model identity is verified through cryptographic mechanisms in addition to version metadata. For self-hosted models, this means verifying the hash of model weights on every load. For API-hosted models, this means continuous behavioural fingerprinting — running a standard set of test prompts through the model at regular intervals and comparing outputs to a baseline established during governance approval. Statistically significant deviations trigger an alert. The model registry maintains a complete provenance chain including source, download verification, fine-tuning history, and deployment approvals. Model updates go through a formal change control process with governance re-assessment before production deployment.
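The behavioural-fingerprinting element of the intermediate level might look like the sketch below: replay a fixed probe set at deterministic settings (e.g. temperature 0) and compare answers to the baseline captured at governance approval. The 0.9 match threshold and the `ask` callable are assumptions; a production implementation would use a statistical test over a larger probe set rather than exact matching.

```python
from typing import Callable

def fingerprint_check(ask: Callable[[str], str],
                      baseline: dict[str, str],
                      threshold: float = 0.9) -> tuple[bool, float]:
    """Replay probe prompts and compare answers to the approval-time baseline."""
    matches = sum(1 for prompt, expected in baseline.items()
                  if ask(prompt) == expected)
    rate = matches / len(baseline)
    return rate >= threshold, rate  # (ok, observed match rate)
```

A drop below the threshold triggers a governance alert even when the endpoint's version string is unchanged, which is precisely the failure mode in Scenario A.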
Advanced Implementation — All intermediate capabilities plus: behavioural fingerprinting operates continuously during production use, not only through dedicated test prompts. Statistical models detect subtle behavioural drift that might indicate gradual model modification. Shadow deployments run the approved model version in parallel, comparing outputs to detect divergence. Independent adversarial testing has verified that model substitution is detected even when the substitute model is specifically designed to mimic the original on common inputs. Hardware security modules protect model signing keys and fingerprint baselines. The organisation can demonstrate to regulators that model identity is verified to a cryptographic standard of assurance.
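The shadow-deployment comparison could be sketched as a streaming comparator that accumulates divergence between production and shadow (approved) outputs and flags a breach of a divergence budget. The 5% budget and the minimum-sample warm-up are illustrative assumptions, not prescribed values.

```python
class ShadowComparator:
    """Accumulates divergence between production and shadow (approved) outputs."""

    def __init__(self, budget: float = 0.05, min_samples: int = 100):
        self.budget = budget
        self.min_samples = min_samples
        self.total = 0
        self.diverged = 0

    def observe(self, prod_out: str, shadow_out: str) -> bool:
        """Record one paired response; False means the budget is breached."""
        self.total += 1
        self.diverged += int(prod_out != shadow_out)
        if self.total < self.min_samples:
            return True  # not enough evidence to raise an alert yet
        return self.diverged / self.total <= self.budget
```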
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-048 compliance requires verification that model identity is structurally assured and that changes are detected regardless of their source or mechanism.
Test 8.1: Version Substitution Detection
Test 8.2: Behavioural Fingerprint Deviation Detection
Test 8.3: Provenance Chain Verification
Test 8.4: Silent Update Detection
Test 8.5: Degradation Fails Safe
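As one illustration of the fail-safe expectation in Test 8.5, a deployment gate might treat an unreachable registry as a failed verification and block, rather than proceed unverified. The exception type and function names below are assumptions for the sketch.

```python
class RegistryUnavailable(Exception):
    """The approved-model registry could not be reached."""

def gate_deployment(lookup_approved_hash, version: str, actual_hash: str) -> bool:
    """Fail safe: if model identity cannot be confirmed, block the deployment."""
    try:
        approved = lookup_approved_hash(version)
    except RegistryUnavailable:
        return False  # unverifiable identity is treated as failed verification
    return approved == actual_hash
```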
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 15 (Accuracy, Robustness, and Cybersecurity) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| NIST AI RMF | GOVERN 1.7, MAP 3.2 (Supply Chain Risk) | Supports compliance |
| SOC 2 Type II | Change Management and System Integrity | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance |
| PRA SS1/23 | Model Risk Management | Supports compliance |
Article 15 requires high-risk AI systems to achieve appropriate levels of accuracy, robustness, and cybersecurity throughout their lifecycle. Model integrity is a direct prerequisite for accuracy — a substituted or poisoned model cannot maintain the accuracy level that was established during conformity assessment. The article requires resilience against exploitation of system vulnerabilities, which includes supply chain attacks on the model itself. AG-048 implements the lifecycle model integrity assurance that Article 15 envisions. The requirement that accuracy be maintained "throughout the lifecycle" maps directly to AG-048's continuous verification requirement — it is not sufficient to verify the model once at deployment.
Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that addresses risks throughout the system lifecycle. Model substitution and poisoning are risks that AG-048's provenance verification and change detection directly mitigate. The risk management system must include testing procedures to ensure the AI system performs consistently — AG-048's behavioural fingerprinting provides the mechanism for detecting inconsistency attributable to model changes.
The NIST AI RMF identifies AI supply chain risk as a significant concern, including risks from model components, training data, and third-party model providers. GOVERN 1.7 addresses processes for managing AI risks from third-party entities. MAP 3.2 addresses the mapping of risk contexts for AI systems. The framework recommends organisations verify the provenance of AI components and monitor for changes that could affect system behaviour. AG-048 provides the operational implementation of these recommendations, with specific mechanisms for provenance verification, change detection, and governance re-assessment.
SOC 2 requires that changes to system components are authorised, tested, and documented. For AI agent deployments, the model is a system component. SOC 2 auditors expect to see evidence that model changes are controlled through a formal change management process. AG-048's requirement for model changes to be treated as mandate change events aligns directly with SOC 2 change management expectations. The model registry provides the audit trail that SOC 2 requires for system component changes.
For AI agents executing financial operations, the model is a control component. If the model changes without governance re-assessment, the control environment has changed without assessment — a potential control deficiency. AG-048 ensures that model changes are detected and assessed, supporting the SOX requirement for management to assess internal control effectiveness.
Clause 6.1 requires organisations to determine actions to address risks within the AI management system. Clause 8.2 requires AI risk assessment. Model substitution and integrity compromise are AI-specific risks that require specific controls. AG-048's provenance verification and change detection implement the risk treatment controls that ISO 42001 requires for model-related risks.
The PRA's supervisory statement SS1/23 sets expectations for how firms manage risks arising from the use of models, including AI models. SS1/23 requires firms to maintain model inventories, validate models before use, and monitor models in production. AG-048's model registry, provenance verification, and continuous monitoring directly implement these expectations for AI agent deployments. The PRA expects firms to detect and respond to model changes that could affect the firm's risk profile.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — a compromised model undermines every governance protocol that depends on model behaviour, which is effectively all of them |
Consequence chain: Without model provenance and integrity controls, a compromised model — through supply chain attack, backdoor fine-tuning, or silent version substitution — bypasses all governance controls that assume the original model's behaviour. The governance layer evaluates a different model than the one it was calibrated for: mandate limits calibrated for one model's risk profile may be inadequate for another, and behavioural monitoring baselines become meaningless. The immediate technical failure is undetected model substitution or modification. The operational impact is that governance decisions are made against invalid assumptions — actions that should be blocked may be approved, risk assessments may be miscalibrated, and behavioural baselines may fail to detect anomalies because the baseline itself has shifted. The business consequences include regulatory enforcement action for inadequate model risk management, financial losses from decisions made under invalid governance assumptions, reputational damage from preventable control failures, and potential personal liability for senior managers under regimes such as the FCA's Senior Managers and Certification Regime. The severity is amplified by the systemic nature of the failure: unlike a single protocol failure that affects one governance dimension, a model integrity failure undermines every dimension simultaneously.
Cross-references: AG-048 intersects with AG-007 (Governance Configuration Control) for model change management, AG-014 (External Dependency Integrity) for supply chain assurance at the service layer, AG-022 (Behavioural Drift Detection) for detecting anomalies that may indicate model modification, AG-043 (Self-Modification Detection) for agent-initiated changes, and AG-046 (Operating Environment Integrity) for infrastructure protection.