AG-230

Substantial Modification Determination Governance

Legal, Regulatory & Records · ~16 min read · AGS v2.1 · April 2026

EU AI Act · FCA · NIST · ISO 42001

2. Summary

Substantial Modification Determination Governance requires that every change to an AI agent — its model, training data, operational parameters, deployment context, or governance configuration — is assessed against defined materiality thresholds to determine whether the change constitutes a substantial modification triggering reclassification, reapproval, or recertification. This dimension prevents incremental changes from cumulatively transforming an agent's risk profile without triggering the governance reviews that the original deployment required. The EU AI Act, medical device regulations, and financial services change management frameworks all impose reclassification obligations upon substantial modification — AG-230 implements the determination mechanism that identifies when those obligations activate.

3. Example

Scenario A — Incremental Fine-Tuning Creates Undetected Reclassification Trigger: A financial services agent is deployed for customer suitability assessments, classified as high-risk under the EU AI Act and certified through a conformity assessment. Over 8 months, the development team applies 23 incremental fine-tuning updates based on customer interaction data. Each update modifies less than 0.5% of model weights and passes unit tests. No individual update triggers the change management threshold of "material change." After 8 months, the cumulative effect of the 23 updates has shifted the model's behaviour significantly: the suitability recommendations have drifted by 14% from the original certified behaviour on a benchmark test suite. A regulatory audit identifies that the agent's behaviour no longer matches the conformity assessment documentation. The regulator determines that the cumulative changes constitute a substantial modification under EU AI Act Article 43(4), requiring a new conformity assessment. The agent must be taken offline pending reassessment.

What went wrong: Each individual change was assessed independently against the materiality threshold. No mechanism tracked the cumulative impact of successive changes. The determination process evaluated changes in isolation rather than measuring cumulative drift from the certified baseline. Consequence: 3-month service interruption during reassessment, EUR 1.8 million reassessment cost, regulatory finding for failure to maintain conformity, and customer impact from suspended suitability services.
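A minimal sketch, in Python with an illustrative benchmark name and the 5% behavioural-drift threshold from requirement 4.3, of why per-change assessment misses this failure: each update is compared both to the immediately preceding version and to the certified baseline, and only the baseline comparison detects the cumulative shift.

```python
# Hypothetical sketch: cumulative drift must be measured against the certified
# baseline, not only against the previous version. Names and numbers are illustrative.

BASELINE_SCORES = {"suitability_benchmark": 0.90}   # scores at conformity assessment
MATERIALITY_THRESHOLD = 0.05                         # 5% behavioural drift (req. 4.3)
PER_CHANGE_THRESHOLD = 0.05                          # the flawed per-change test

def relative_drift(current: float, reference: float) -> float:
    """Relative change of a benchmark score against a reference score."""
    return abs(current - reference) / reference

def assess_update(new_scores, previous_scores, baseline_scores):
    """Return (per_change_material, cumulative_material) for one update."""
    per_change = max(
        relative_drift(new_scores[m], previous_scores[m]) for m in baseline_scores
    )
    cumulative = max(
        relative_drift(new_scores[m], baseline_scores[m]) for m in baseline_scores
    )
    return per_change > PER_CHANGE_THRESHOLD, cumulative > MATERIALITY_THRESHOLD

# 23 small updates, each shifting the benchmark score by ~0.6% of the previous value
previous = dict(BASELINE_SCORES)
for _ in range(23):
    new = {"suitability_benchmark": previous["suitability_benchmark"] * (1 - 0.006)}
    per_change_material, cumulative_material = assess_update(new, previous, BASELINE_SCORES)
    previous = new

print(per_change_material)   # False — no single update looked material
print(cumulative_material)   # True  — cumulative drift from baseline exceeds 5%
```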

Scenario B — Deployment Context Change Triggers Unrecognised Reclassification: An AI agent originally deployed as an internal research assistant (classified as limited risk under the EU AI Act) is repurposed by a business unit as a customer-facing advisory tool for insurance product selection. The model is identical — no technical change was made. The business unit does not consult the compliance or legal teams because "nothing changed about the AI." However, the change in deployment context — from internal research to customer-facing financial advice — changes the risk classification from limited risk to high risk. The agent operates for 4 months without the required conformity assessment, transparency obligations, or human oversight mechanisms required for high-risk AI systems. A customer complaint triggers a regulatory inquiry that identifies the classification gap.

What went wrong: The substantial modification determination process evaluated only technical changes (model updates, data changes, parameter modifications). It did not evaluate deployment context changes (user population, use case, regulatory exposure). The deployment context change — identical technology, fundamentally different risk profile — was not captured. Consequence: Regulatory enforcement for operating a high-risk AI system without conformity assessment, 4 months of non-compliant customer interactions requiring remediation, and potential customer compensation for unsuitable advice.
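A sketch, assuming a simple record of classification-relevant context fields (the field names are illustrative), of how a deployment-context change with no technical change can still be flagged for reclassification review:

```python
# Illustrative sketch: a deployment-context fingerprint. Any change to a
# classification-relevant field triggers a reclassification review even when
# the model artefact itself is unchanged.

CLASSIFICATION_RELEVANT_FIELDS = {"user_population", "use_case", "jurisdiction"}

approved_context = {
    "user_population": "internal staff",
    "use_case": "research assistance",
    "jurisdiction": "EU",
    "model_hash": "sha256:abc123...",   # hypothetical digest of the model artefact
}

proposed_context = {
    "user_population": "retail customers",
    "use_case": "insurance product advice",
    "jurisdiction": "EU",
    "model_hash": "sha256:abc123...",   # identical model — "nothing changed about the AI"
}

def context_changes(approved: dict, proposed: dict) -> set:
    """Fields whose change is relevant to risk classification."""
    return {
        field for field in CLASSIFICATION_RELEVANT_FIELDS
        if approved.get(field) != proposed.get(field)
    }

changed = context_changes(approved_context, proposed_context)
if changed:
    print(f"Reclassification review required; changed context fields: {sorted(changed)}")
```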

Scenario C — Training Data Expansion Into Protected Categories: A public-sector agent used for benefits eligibility screening is retrained with an expanded dataset that includes additional demographic fields not present in the original training data: ethnicity, disability status, and religious affiliation. The retraining improves prediction accuracy by 3.2% on the test set. The change is classified as "performance improvement" and deployed without reclassification review. A civil rights organisation files a challenge demonstrating that the model now uses protected characteristics as features — something the original conformity assessment explicitly excluded. The agency faces a discrimination lawsuit and regulatory investigation.

What went wrong: The training data change was evaluated on the performance dimension (3.2% accuracy improvement) but not on the legal dimension (introduction of protected characteristics into the feature space). The materiality threshold was defined technically rather than legally. Consequence: Discrimination lawsuit with potential class-action scope, regulatory investigation by the equality body, mandatory model rollback to pre-expansion state, and reputational damage to the public-sector agency.
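A minimal sketch of a legal-dimension screen on training-data changes: newly added feature columns are checked against a list of protected characteristics before a change can be classified as a mere performance improvement. The column and category names are illustrative.

```python
# Illustrative sketch: screen newly added feature columns against protected
# characteristics. Category and column names are hypothetical.

PROTECTED_CHARACTERISTICS = {"ethnicity", "disability_status", "religious_affiliation"}

approved_features = {"income", "employment_status", "household_size"}
proposed_features = approved_features | {"ethnicity", "disability_status", "religious_affiliation"}

added = proposed_features - approved_features
protected_added = added & PROTECTED_CHARACTERISTICS

if protected_added:
    # Legally substantial regardless of any accuracy improvement (cf. requirement 4.3)
    print(f"Substantial modification: protected characteristics added: {sorted(protected_added)}")
```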

4. Requirement Statement

Scope: This dimension applies to every AI agent that has undergone a conformity assessment, certification, regulatory approval, internal governance sign-off, or any other structured approval process at deployment time. It also applies to agents operating in regulated sectors where changes to AI systems may trigger re-notification, re-registration, or re-certification obligations. The scope covers all change types: model changes (retraining, fine-tuning, architecture modification), data changes (training data expansion, feature addition or removal), parameter changes (temperature, sampling strategy, context window), deployment context changes (user population, use case, jurisdiction, integration point), and governance configuration changes (mandate limits, access controls, monitoring thresholds). The scope extends to changes made by automated systems (e.g., continuous learning, automated retraining pipelines) — the determination obligation applies regardless of whether the change was made by a human or an automated process.
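One way to make this scope operational is a change-record structure that forces every change, whether human-initiated or automated, into one of the categories above before assessment. The sketch below uses hypothetical field and category names:

```python
# Sketch of a change-record taxonomy covering the change types in scope.
# Field and category names are illustrative, not prescribed by any regulation.

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class ChangeType(Enum):
    MODEL = "model"                            # retraining, fine-tuning, architecture
    DATA = "data"                              # training data expansion, feature add/remove
    PARAMETER = "parameter"                    # temperature, sampling, context window
    DEPLOYMENT_CONTEXT = "deployment_context"  # user population, use case, jurisdiction
    GOVERNANCE_CONFIG = "governance_config"    # mandate limits, access controls, monitoring

@dataclass
class ChangeRecord:
    agent_id: str
    change_type: ChangeType
    description: str
    initiated_by: str                    # human identity or automated pipeline identifier
    baseline_id: str                     # approved baseline the change is assessed against
    automated: bool = False              # determination obligation applies either way
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

record = ChangeRecord(
    agent_id="suitability-agent",
    change_type=ChangeType.DATA,
    description="Expanded training data with three demographic fields",
    initiated_by="retraining-pipeline-v4",   # hypothetical automated pipeline
    baseline_id="baseline-2026-01-15",
    automated=True,
)
```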

4.1. A conforming system MUST evaluate every change to an agent — whether to its model, training data, operational parameters, deployment context, or governance configuration — against defined materiality thresholds before the change is deployed to production.

4.2. A conforming system MUST track cumulative change impact from the most recent approved baseline, not only the incremental impact of each individual change, to prevent incremental drift from bypassing materiality thresholds (see the combined sketch following requirement 4.10).

4.3. A conforming system MUST define materiality thresholds that include both technical dimensions (e.g., behavioural drift exceeding 5% on the approved benchmark suite) and legal dimensions (e.g., introduction of protected characteristics, change of user population, change of jurisdiction).

4.4. A conforming system MUST block deployment of changes determined to constitute a substantial modification until the required reclassification, reapproval, or recertification process is completed.

4.5. A conforming system MUST maintain a change ledger recording every change, its materiality assessment, the baseline against which it was assessed, the determination outcome, and the identity of the person or process that made the determination.

4.6. A conforming system MUST define, for each applicable regulation, the specific criteria that constitute a substantial modification under that regulation and map each change assessment to those criteria.

4.7. A conforming system SHOULD implement automated drift detection that continuously measures the agent's current behaviour against the approved baseline and raises alerts when cumulative drift approaches materiality thresholds (e.g., at 70% and 90% of the threshold).

4.8. A conforming system SHOULD require independent review (not by the team that made the change) for materiality determinations where the cumulative drift exceeds 50% of any materiality threshold.

4.9. A conforming system SHOULD support automated rollback to the most recent approved baseline if a change that has not completed the substantial modification determination process is detected in production.

4.10. A conforming system MAY implement sandbox environments where changes can be evaluated against materiality thresholds before entering the deployment pipeline.
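A simplified sketch, with illustrative thresholds and dimension names, of how requirements 4.2, 4.3, 4.4, 4.7, and 4.8 can combine in a single determination gate: cumulative drift from the approved baseline is compared against per-dimension thresholds, legal triggers are treated as substantial regardless of technical magnitude, early warnings fire at 70% and 90% of a threshold, independent review is flagged above 50%, and a substantial determination blocks deployment.

```python
# Simplified determination gate. Thresholds and dimension names are illustrative.

from dataclasses import dataclass

@dataclass
class Determination:
    substantial: bool          # blocks deployment until reapproval (req. 4.4)
    independent_review: bool   # required above 50% of any threshold (req. 4.8)
    warnings: list             # early-warning alerts at 70% / 90% (req. 4.7)

# Materiality thresholds per dimension, as a fraction of the baseline (req. 4.3).
THRESHOLDS = {"behavioural_drift": 0.05, "feature_space_change": 0.10}
LEGAL_TRIGGERS = {"protected_characteristic_added", "user_population_changed",
                  "jurisdiction_changed"}

def determine(cumulative_drift: dict, legal_flags: set) -> Determination:
    """Evaluate cumulative drift from the approved baseline plus legal triggers."""
    warnings, review, substantial = [], False, False

    for dimension, threshold in THRESHOLDS.items():
        ratio = cumulative_drift.get(dimension, 0.0) / threshold
        if ratio >= 1.0:
            substantial = True
        elif ratio >= 0.9:
            warnings.append(f"{dimension} at 90% of materiality threshold")
        elif ratio >= 0.7:
            warnings.append(f"{dimension} at 70% of materiality threshold")
        if ratio > 0.5:
            review = True

    if legal_flags & LEGAL_TRIGGERS:
        substantial = True     # legally substantial regardless of technical magnitude
        review = True

    return Determination(substantial, review, warnings)

result = determine({"behavioural_drift": 0.038}, legal_flags=set())
print(result)   # not substantial, but independent review and a 70% warning fire

if result.substantial:
    raise SystemExit("Deployment blocked pending reclassification / reapproval")
```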

5. Rationale

AI agents are not static. They are retrained, fine-tuned, reconfigured, redeployed, and extended continuously. Each change, individually, may be minor. But the cumulative effect of many minor changes can fundamentally alter the agent's behaviour, risk profile, or regulatory classification. The legal frameworks governing AI systems recognise this: the EU AI Act Article 43(4) specifically addresses substantial modification, requiring a new conformity assessment when a modification is "substantial." Medical device regulations (MDR Article 120, FDA 510(k)) impose similar obligations. Financial services change management frameworks (e.g., the PRA's SS1/23 model risk management principles) require re-validation when model changes are material.

The challenge is that "substantial" and "material" are legal determinations, not purely technical ones. A 0.1% change in model weights that introduces a protected characteristic into the feature space is legally substantial even though it is technically trivial. A change in deployment context — same model, different use case — can trigger reclassification with zero technical modification. The determination mechanism must therefore evaluate changes across multiple dimensions: technical (how much did the behaviour change?), legal (did the change introduce new legal exposures?), regulatory (does the change trigger re-notification under any applicable regulation?), and contextual (did the deployment context change in a legally relevant way?).

The alternative — evaluating each change independently and only against technical thresholds — creates the "ship of Theseus" problem. After enough incremental changes, the deployed agent bears no resemblance to the agent that was originally assessed and approved, but no single change triggered the reclassification process. AG-230 prevents this by requiring cumulative tracking from the approved baseline.

6. Implementation Guidance

The substantial modification determination requires two components: a baseline definition (the approved state against which changes are measured) and a determination engine (the process that evaluates each change against materiality thresholds).
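A sketch of one possible baseline-definition artefact: a fingerprint capturing digests of the approved model, data, and configuration alongside the benchmark scores and deployment context recorded at approval. The field names are illustrative; AG-007 governs the versioning of such a record and AG-006 its tamper evidence.

```python
# Illustrative baseline fingerprint for the approved state. Hashing a JSON
# serialisation gives a stable identifier the change ledger can reference.

import hashlib
import json

baseline = {
    "agent_id": "suitability-agent",
    "approved_on": "2026-01-15",
    "model_digest": "sha256:abc123...",          # hypothetical artefact digests
    "training_data_digest": "sha256:def456...",
    "config_digest": "sha256:789aaa...",
    "deployment_context": {
        "user_population": "retail customers",
        "use_case": "suitability assessment",
        "jurisdiction": "EU",
    },
    "benchmark_scores": {"suitability_benchmark": 0.90},
    "approval_reference": "conformity-assessment-2026-001",
}

baseline_id = hashlib.sha256(
    json.dumps(baseline, sort_keys=True).encode("utf-8")
).hexdigest()
print(f"baseline fingerprint: {baseline_id[:16]}...")
```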

Recommended patterns: Maintain a baseline fingerprint of every approved agent version and assess each change against that fingerprint, not merely against the previous version. Define materiality thresholds across technical, legal, regulatory, and contextual dimensions, and track cumulative drift automatically with early-warning alerts before thresholds are reached. Integrate the determination into the deployment pipeline so that no change ships without a recorded materiality assessment, and require independent review as cumulative drift approaches a threshold.

Anti-patterns to avoid: Assessing each change only against the immediately preceding version rather than the approved baseline. Defining materiality thresholds in purely technical terms. Exempting deployment context or governance configuration changes because "the model did not change." Exempting changes made by automated retraining pipelines from the determination obligation. Treating the change ledger as after-the-fact documentation rather than a deployment gate.

Industry Considerations

Financial Services. The PRA and FCA expect firms to maintain model risk management frameworks (aligned with the PRA's SS1/23) that include change management with materiality assessment. A model change that alters output by more than a defined threshold (commonly 5-10% on key metrics) typically triggers re-validation. For AI agents, the FCA has indicated that changes to the agent's operational scope (e.g., expanding from advisory to execution) are inherently material regardless of the technical magnitude of the change.

Healthcare / Medical Devices. The EU Medical Devices Regulation (MDR) and FDA's 510(k) framework both define substantial modification criteria for software as a medical device (SaMD). Changes to intended use, clinical significance of outputs, or core algorithm architecture are generally considered substantial. The International Medical Device Regulators Forum (IMDRF) provides guidance on when AI/ML-based SaMD changes require new regulatory submissions.

Public Sector. Algorithmic impact assessments required by frameworks such as Canada's Algorithmic Impact Assessment Tool or the EU AI Act's conformity assessment for high-risk systems create specific baselines. Changes that would alter the impact assessment outcome are inherently substantial. Public sector AI has heightened sensitivity to changes affecting fairness, bias, and discrimination.

Maturity Model

Basic Implementation — The organisation maintains a change log for each deployed agent and conducts manual materiality assessments for significant changes (model retrains, major feature changes). Deployment context changes are captured when the development team identifies them. Cumulative tracking is manual — a reviewer examines the change history periodically. This level catches obvious substantial modifications but misses incremental drift and deployment context changes that the development team does not recognise as material.

Intermediate Implementation — The organisation has a baseline fingerprint for each approved agent version. Every change is assessed against the baseline using the multi-dimensional framework (behavioural, feature space, context, regulatory). Cumulative drift is tracked automatically. Early warning alerts fire at 70% and 90% of materiality thresholds. Independent review is required for changes exceeding 50% of any threshold. The determination process is integrated into the CI/CD pipeline — changes cannot deploy without a recorded materiality assessment.

Advanced Implementation — All intermediate capabilities plus: automated drift detection continuously measures production behaviour against the baseline (not just at change time). The regulatory trigger analysis is automated — the system evaluates each change against the specific substantial modification criteria for every applicable regulation and generates a jurisdiction-specific determination. Sandbox evaluation allows changes to be tested against materiality thresholds before entering the deployment pipeline. The organisation can demonstrate to any regulator the complete chain from original approval through every change to current state, with materiality assessments for each step.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Individual Change Assessment

Test 8.2: Cumulative Drift Detection

Test 8.3: Deployment Context Change Detection

Test 8.4: Legal Dimension Assessment

Test 8.5: Early Warning Alerts

Test 8.6: Blocking of Unassessed Changes

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 43(4) (Substantial Modification) | Direct requirement
EU AI Act | Article 16 (Provider Obligations — Conformity Maintenance) | Direct requirement
EU MDR | Article 120, MDCG 2020-3 (Significant Change) | Direct requirement
FDA | 510(k) Substantial Equivalence, Predetermined Change Control Plan | Direct requirement
PRA SS1/23 | Model Risk Management — Change Management | Supports compliance
NIST AI RMF | MANAGE 2.3 (Risk Monitoring), MANAGE 4.1 (Change Management) | Supports compliance
ISO 42001 | Clause 8.2 (AI Risk Assessment), Clause 10.2 (Continual Improvement) | Supports compliance

EU AI Act — Article 43(4) (Substantial Modification)

Article 43(4) requires that when a high-risk AI system undergoes a "substantial modification," it must undergo a new conformity assessment. The Act (Article 3(23)) defines a substantial modification as a change, not foreseen in the initial conformity assessment, that affects the system's compliance with the high-risk requirements in Chapter III, Section 2, or that modifies the intended purpose for which the system has been assessed. AG-230 implements the determination mechanism that identifies whether a given change meets this definition. The key implementation challenge is that the EU AI Act does not provide quantitative thresholds for "substantial" — organisations must define their own thresholds and be prepared to defend them to regulators. AG-230 requires that these thresholds cover both technical and legal dimensions, preventing the common failure of assessing changes on technical criteria alone.
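A sketch of requirement 4.6 applied to this provision: the outputs of the multi-dimensional assessment are mapped onto a simplified reading of the Act's own test (a change not foreseen in the initial conformity assessment that affects compliance with the high-risk requirements or modifies the intended purpose). The boolean inputs are assumptions produced by the organisation's own assessment, not terms defined by the Act.

```python
# Illustrative mapping of a change assessment onto the EU AI Act's substantial
# modification test (Article 3(23) / Article 43(4)). Input flags are assumptions
# produced by the organisation's own multi-dimensional assessment.

def substantial_under_eu_ai_act(affects_high_risk_requirements: bool,
                                modifies_intended_purpose: bool,
                                foreseen_in_initial_assessment: bool) -> bool:
    """True when the change would trigger a new conformity assessment under Article 43(4)."""
    if foreseen_in_initial_assessment:
        # Changes foreseen or planned at the initial conformity assessment are carved out.
        return False
    return affects_high_risk_requirements or modifies_intended_purpose

# Scenario B above: same model, new intended purpose (customer-facing advice).
print(substantial_under_eu_ai_act(
    affects_high_risk_requirements=False,
    modifies_intended_purpose=True,
    foreseen_in_initial_assessment=False,
))   # True — a new conformity assessment is required
```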

EU AI Act — Article 16 (Provider Obligations)

Article 16 requires providers to ensure that their high-risk AI systems continue to comply with the requirements of the Act throughout the system's lifecycle. This creates an ongoing obligation to monitor for changes that could affect compliance. AG-230's cumulative drift tracking directly implements this ongoing monitoring obligation.

EU MDR — Significant Change Determination

The EU Medical Devices Regulation and associated MDCG guidance define criteria for when changes to software as a medical device constitute a "significant change" requiring new regulatory submissions. For AI-based medical devices, these criteria include changes to intended use, changes to the clinical significance of outputs, changes to the core algorithm architecture, and changes that could affect safety or performance. AG-230's multi-dimensional assessment framework covers all these criteria.

FDA — Predetermined Change Control Plan

The FDA's framework for AI/ML-based Software as a Medical Device includes the concept of a Predetermined Change Control Plan (PCCP) — a pre-approved plan defining the types of changes the manufacturer intends to make and the methodology for evaluating their impact. AG-230's change classification taxonomy and materiality thresholds align with the PCCP framework, allowing organisations to define pre-approved change categories with associated assessment criteria.
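A sketch of how pre-approved change categories might be represented, with hypothetical category names and bounds that are not taken from FDA guidance: a change inside a pre-approved category and within its agreed bound follows the pre-agreed evaluation method, while anything else falls back to the full determination process.

```python
# Illustrative sketch of a PCCP-style pre-approved change plan: change categories
# the organisation has pre-defined, each with an agreed evaluation method and bound.
# Category names and bounds are hypothetical, not taken from FDA guidance.

PRE_APPROVED_CHANGES = {
    "retrain_same_data_sources": {"evaluation": "benchmark suite", "max_drift": 0.03},
    "parameter_tuning":          {"evaluation": "regression tests", "max_drift": 0.02},
}

def within_change_control_plan(change_category: str, measured_drift: float) -> bool:
    """True if the change falls inside a pre-approved category and its agreed bound."""
    plan = PRE_APPROVED_CHANGES.get(change_category)
    return plan is not None and measured_drift <= plan["max_drift"]

# A change outside the plan (or exceeding its bound) falls back to the full
# substantial modification determination process.
print(within_change_control_plan("retrain_same_data_sources", 0.021))  # True
print(within_change_control_plan("new_intended_use", 0.0))             # False
```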

PRA SS1/23 — Model Change Management

The PRA's SS1/23 expects firms to have change management processes for models that include materiality assessment and re-validation triggers, and the FCA expects changes to AI models used in regulated activities to be subject to the same change management rigour as traditional quantitative models. AG-230 implements the materiality determination layer that feeds the broader change management process.

10. Failure Severity

Severity Rating: High
Blast Radius: System-specific, but with potential organisation-wide regulatory consequences

Consequence chain: Without substantial modification determination, an agent's risk profile can drift from its approved baseline without detection. The immediate technical consequence is that the deployed agent no longer matches its conformity assessment, certification, or approval documentation. The regulatory consequence is that the organisation is operating an unapproved or uncertified AI system — a compliance violation in every jurisdiction that imposes conformity assessment or certification requirements. For high-risk AI systems under the EU AI Act, this can result in fines of up to EUR 15 million or 3% of worldwide annual turnover, whichever is higher. For medical devices, this can result in product recall, market withdrawal, and criminal liability. For financial services, this can result in regulatory enforcement, client remediation, and personal liability under senior manager regimes. The cumulative nature of the risk means that the longer the drift continues undetected, the larger the remediation cost: an agent that has drifted over 12 months requires a full reassessment, re-validation, and potentially re-deployment — a process that can take 3-6 months and cost millions in assessment fees, operational disruption, and customer impact.

Cross-references: AG-007 (Governance Configuration Control) governs the versioning and immutability of the baseline fingerprint that AG-230 measures against. AG-022 (Behavioural Drift Detection) provides the continuous monitoring that feeds AG-230's cumulative drift tracker. AG-021 (Regulatory Obligation Identification) identifies the specific regulatory requirements that define what constitutes a substantial modification in each applicable jurisdiction. AG-229 (Jurisdictional Applicability Mapping Governance) determines which jurisdictions' substantial modification criteria apply. AG-006 (Tamper-Evident Record Integrity) ensures that the change ledger and baseline fingerprints are tamper-evident.

Cite this protocol
AgentGoverning. (2026). AG-230: Substantial Modification Determination Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-230