Fine-Tune Objective Documentation Governance requires that every fine-tuning operation be preceded by a formal, documented statement of intent. That statement must specify the business objective, the behavioural hypothesis being tested, the expected capability trade-offs, the evaluation criteria for success and failure, and the rollback conditions. Fine-tuning is not a routine operational task — it is a deliberate modification of model behaviour that can introduce regressions, amplify biases, degrade safety properties, or create capabilities the organisation did not intend. Without documented objectives, organisations cannot evaluate whether a fine-tune achieved its purpose, cannot detect unintended behavioural changes, and cannot make principled decisions about whether to deploy, iterate, or abandon a fine-tuned model.
Scenario A — Fine-Tuning Degrades Safety Without Detection: A customer-facing agent is fine-tuned to be "more helpful" based on 50,000 human preference labels that reward longer, more detailed responses. The fine-tuning succeeds on its stated metrics: average response length increases by 40% and helpfulness ratings improve by 12%. However, no documented objective specified which safety properties must be preserved. Post-fine-tuning evaluation reveals that the model's refusal rate for harmful requests dropped from 97.3% to 84.1% — because the preference data implicitly rewarded compliance over refusal. The degradation is discovered three weeks after deployment, during which the agent has served 1.2 million customer interactions.
What went wrong: The fine-tuning objective was defined in terms of a single metric (helpfulness) without specifying trade-off boundaries for safety properties. No documentation stated which capabilities must not degrade. No evaluation criteria for safety refusal rates were defined before fine-tuning commenced. Consequence: 1.2 million interactions served by a model with degraded safety properties, potential regulatory scrutiny for deploying a model that fails to refuse harmful requests, and remediation cost of £340,000 for emergency rollback, re-evaluation, and re-fine-tuning with safety-aware objectives.
Scenario B — Objective Drift Across Iterative Fine-Tuning: A financial-services agent undergoes five sequential fine-tuning rounds over four months. Each round is documented individually, but no overarching objective document tracks the cumulative intent. Round 1 targets accuracy on structured financial data. Round 2 improves conversational fluency. Round 3 adds regulatory compliance phrasing. Round 4 adjusts tone for enterprise customers. Round 5 optimises for response latency. By round 5, the model's accuracy on structured financial data — the original priority — has degraded by 8.7 percentage points because rounds 2 through 5 each slightly eroded the gains from round 1. No one noticed because each round was evaluated against its own objective, not against the cumulative objective stack.
What went wrong: No cumulative objective document tracked the expected behaviour across all fine-tuning rounds. Each round's evaluation criteria were local to that round. No regression testing against prior round objectives was mandated. The cumulative effect of five optimisations was a net degradation of the most critical capability. Consequence: Deployed agent provides inaccurate financial data to enterprise customers for six weeks before discovery, resulting in two client complaints, one regulatory inquiry, and £180,000 in remediation costs.
Scenario C — Undocumented Fine-Tune Makes Audit Impossible: A regulator asks an organisation to explain why its AI credit scoring agent rejects loan applications from a specific demographic group at a rate 2.3x higher than the population average. The agent was fine-tuned eight months ago on a proprietary dataset. No fine-tune objective document exists. The data scientist who performed the fine-tuning has left the organisation. The training configuration files are on a decommissioned server. The organisation cannot explain: what the fine-tuning intended to achieve, what data was used, what evaluation criteria were applied, or whether the observed demographic disparity was an anticipated trade-off, an unintended consequence, or a pre-existing bias amplified by fine-tuning.
What went wrong: No documented objective existed. No evaluation criteria were specified before fine-tuning. No artefact persisted beyond the individual who performed the work. The organisation has no institutional memory of why this fine-tune was done. Consequence: Regulatory finding for inability to explain model behaviour, potential enforcement action under equality legislation, and £2.4 million remediation programme including full model re-evaluation, retraining, and external audit.
Scope: This dimension applies to every fine-tuning, instruction-tuning, preference-tuning (RLHF, DPO, etc.), and continuous learning operation performed on a model that is intended for production deployment or pre-production evaluation with real data. It covers full fine-tuning (all parameters updated), parameter-efficient fine-tuning (LoRA, QLoRA, prefix tuning, etc.), and any operation that modifies model behaviour through gradient updates on new data. It does not apply to prompt engineering or in-context learning, which do not modify model weights. It does apply to operations performed by third parties on the organisation's behalf — the obligation to document objectives cannot be delegated to a vendor without the organisation retaining the documentation.
4.1. A conforming system MUST require a documented fine-tune objective before any fine-tuning operation commences, specifying: the business rationale, the behavioural hypothesis, the expected capability gains, the capability trade-offs accepted, the evaluation metrics and thresholds for success, and the conditions under which the fine-tuned model will be rolled back.
4.2. A conforming system MUST specify, in the fine-tune objective, which existing capabilities and safety properties MUST NOT degrade beyond defined thresholds, and include regression tests for those capabilities in the post-fine-tuning evaluation.
4.3. A conforming system MUST record the actual fine-tuning configuration (learning rate, number of epochs, batch size, data composition, parameter-efficiency method, and any regularisation techniques) alongside the objective, enabling reproducibility.
4.4. A conforming system MUST evaluate the fine-tuned model against all criteria specified in the objective document before any deployment decision, and record the evaluation results as a persistent artefact linked to the objective.
4.5. A conforming system MUST maintain fine-tune objective documents for the operational lifetime of the model plus the applicable retention period, independent of the individuals who performed the fine-tuning.
4.6. A conforming system SHOULD maintain a cumulative objective register for models that undergo multiple sequential fine-tuning rounds, tracking the intended capability stack and evaluating each round against the full cumulative objective set, not just the current round's objective.
4.7. A conforming system SHOULD require sign-off on the fine-tune objective from both a technical lead and a risk/governance representative before fine-tuning commences.
4.8. A conforming system MAY implement automated pre-fine-tuning checks that validate the objective document's completeness against a required schema before permitting the training pipeline to execute.
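The automated check described in 4.8 can be sketched as a simple schema gate that runs before the training pipeline is permitted to execute. This is a minimal illustration, not a prescribed implementation: the field names below are drawn from the template described elsewhere in this document, and a real schema would be organisation-specific and more detailed.

```python
# Sketch of an automated pre-fine-tuning completeness check (clause 4.8).
# Field names are illustrative, taken from the objective document template
# in this standard; a production schema would be organisation-specific.

REQUIRED_FIELDS = {
    "business_rationale",
    "behavioural_hypothesis",
    "success_metrics",        # quantitative thresholds (4.1)
    "regression_boundaries",  # capabilities that must not degrade (4.2)
    "dataset_summary",
    "configuration_plan",     # hyperparameters, method, compute budget (4.3)
    "rollback_criteria",
    "approvals",              # technical lead + risk/governance sign-off (4.7)
}

def validate_objective(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the gate passes."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - doc.keys())]
    # Success metrics must carry explicit numeric thresholds, not prose.
    for metric in doc.get("success_metrics", []):
        if "threshold" not in metric:
            problems.append(f"metric without threshold: {metric.get('name', '?')}")
    return problems

def gate_training(doc: dict) -> None:
    """Refuse to launch the training pipeline on an incomplete objective."""
    problems = validate_objective(doc)
    if problems:
        raise RuntimeError("objective document incomplete: " + "; ".join(problems))
```

The key design choice is that the gate rejects, rather than warns: an incomplete objective document blocks pipeline execution entirely, which is what converts the MAY in 4.8 into an enforceable control rather than advisory tooling.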
Fine-tuning is the primary mechanism by which general-purpose models are adapted to specific organisational contexts. It is also the primary mechanism by which model behaviour is degraded unintentionally. The literature on fine-tuning consistently demonstrates that optimising for one capability frequently degrades others — a phenomenon known as catastrophic forgetting in its extreme form, and more subtly as capability trade-off in its common form. Safety alignment, in particular, is fragile under fine-tuning: multiple research studies have shown that as few as 100 carefully selected examples can substantially degrade a model's safety refusal behaviour.
Despite these known risks, fine-tuning in practice is often treated as a routine operational task — "we'll fine-tune it on our data and see if it gets better." This casual approach fails because it defines no baseline, specifies no trade-off boundaries, and establishes no criteria for success or failure. Without a documented objective, the organisation cannot answer the most basic governance question: "Did this fine-tune do what we intended?"
The documentation requirement serves three functions. First, it forces intentionality: writing down the objective before fine-tuning requires the team to think carefully about what they are trying to achieve, what they are willing to sacrifice, and how they will know if they succeeded. Second, it enables evaluation: post-fine-tuning assessment is meaningful only when measured against pre-defined criteria — otherwise, the team selects metrics post hoc that make the result look favourable. Third, it enables auditability: regulators, auditors, and future team members can understand why a model behaves the way it does by reading the documented chain of fine-tuning objectives.
Fine-tune objective document template. Establish a standardised template that every fine-tuning operation must complete. The template should include: business rationale (why this fine-tune is needed), behavioural hypothesis (what specific behaviour change is expected), success metrics (quantitative thresholds the fine-tuned model must meet), regression boundaries (which existing capabilities must not degrade beyond specified thresholds), dataset summary (what data will be used and its rights status per AG-340), configuration plan (proposed hyperparameters, method, compute budget), rollback criteria (conditions under which the fine-tuned model will be abandoned in favour of the prior version), and approval signatures (technical lead, risk/governance representative).
Cumulative objective register. For models undergoing iterative fine-tuning, maintain a register that stacks objectives. When round 3 of fine-tuning is proposed, the register shows objectives from rounds 1 and 2, and round 3's evaluation must include regression tests against rounds 1 and 2's success criteria. This prevents the silent erosion of earlier capabilities.
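The stacking behaviour of the register can be sketched as follows. This is an illustrative sketch under the assumption that all criteria are higher-is-better scores; the class and method names are hypothetical, not part of any named library.

```python
# Sketch of a cumulative objective register: each round's evaluation
# includes the success criteria of every prior round, so earlier gains
# cannot be silently eroded (the Scenario B failure mode).
# Assumes higher-is-better metrics; names are illustrative.

class ObjectiveRegister:
    def __init__(self) -> None:
        self.rounds: list[tuple[str, dict[str, float]]] = []

    def add_round(self, name: str, criteria: dict[str, float]) -> None:
        self.rounds.append((name, criteria))

    def cumulative_criteria(self) -> dict[str, float]:
        """All criteria from all rounds; a later round may tighten,
        but never relax, an earlier round's threshold."""
        merged: dict[str, float] = {}
        for _, criteria in self.rounds:
            for name, threshold in criteria.items():
                merged[name] = max(merged.get(name, threshold), threshold)
        return merged

    def evaluate(self, results: dict[str, float]) -> list[str]:
        """Return every criterion (from any round) the model now fails."""
        return [
            f"{name}: {results.get(name, float('-inf')):.3f} < {threshold}"
            for name, threshold in self.cumulative_criteria().items()
            if results.get(name, float("-inf")) < threshold
        ]
```

Under this scheme, a round-5 model that regresses on round 1's structured-data accuracy fails the evaluation even though round 5's own latency objective is met — exactly the failure that went undetected in Scenario B.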
Evaluation framework. Establish a standard evaluation framework that automatically runs when a fine-tuning pipeline completes. The framework should: load the objective document, run all specified success metrics, run all specified regression tests, compare results against thresholds, and produce a structured report showing pass/fail for each criterion. The deployment decision should be gated on this report.
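A minimal version of that gated report might look like the sketch below. The metric callables are stand-ins for the organisation's real evaluation suite, and the structure assumes the objective document reduces each criterion to a named threshold.

```python
# Sketch of the post-fine-tuning evaluation gate (clause 4.4): run each
# criterion from the objective document and produce a structured pass/fail
# report on which the deployment decision is gated. The metric functions
# are stand-ins for a real evaluation suite.

from dataclasses import dataclass
from typing import Callable

@dataclass
class CriterionResult:
    name: str
    observed: float
    threshold: float

    @property
    def passed(self) -> bool:
        return self.observed >= self.threshold

def evaluate_against_objective(
    objective: dict[str, float],
    metric_fns: dict[str, Callable[[], float]],
) -> dict:
    """`objective` maps criterion name -> threshold; `metric_fns` maps the
    same names -> callables that run the evaluation and return a score."""
    results = [
        CriterionResult(name, metric_fns[name](), threshold)
        for name, threshold in objective.items()
    ]
    return {
        "criteria": results,
        "deployable": all(r.passed for r in results),  # the 4.4 gate
    }
```

Note that `deployable` is all-or-nothing: a model that beats its helpfulness target but misses a safety regression boundary (as in Scenario A) is not deployable, which prevents post-hoc metric selection from rescuing a failed fine-tune.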
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Fine-tuning objectives for financial models should explicitly address regulatory requirements: fair lending compliance (no prohibited discriminatory outcomes), explainability requirements (can the organisation explain why the fine-tuned model makes specific decisions?), and model risk management obligations (PRA SS1/23 expectations for documentation of model changes). The fine-tune objective document may need to be submitted to model risk management governance for review.
Healthcare. Fine-tuning of clinical AI models may constitute a significant change requiring regulatory notification under MDR/IVDR. The fine-tune objective document should assess whether the change requires notification to the relevant notified body and should document the clinical risk assessment for the behavioural changes introduced.
Safety-Critical Systems. Fine-tuning of models used in safety-critical contexts (autonomous vehicles, industrial control, aviation) should require formal safety assessment of the proposed behavioural changes before fine-tuning commences. The fine-tune objective should reference the relevant safety case and document how the proposed changes affect safety arguments.
Basic Implementation — Fine-tuning operations are accompanied by informal documentation (emails, tickets, wiki pages) describing the general intent. Evaluation is performed against ad hoc metrics selected by the data scientist performing the fine-tune. Regression testing is inconsistent. Documentation is often incomplete and difficult to locate after the fact. This level demonstrates awareness but lacks the rigour needed for governance: documentation may not precede fine-tuning, evaluation criteria may not be pre-specified, and regression boundaries are often absent.
Intermediate Implementation — A standardised fine-tune objective template is used for all fine-tuning operations. The template is completed and approved before fine-tuning commences. Post-fine-tuning evaluation runs against all specified metrics and regression boundaries. Results are recorded as persistent artefacts linked to the objective. A cumulative objective register tracks iterative fine-tuning. The organisation can retrieve the objective, configuration, and evaluation results for any fine-tuned model in its deployment inventory.
Advanced Implementation — All intermediate capabilities plus: the fine-tune objective is a required pipeline artefact that gates training execution. Automated evaluation suites run immediately upon fine-tuning completion, including a standing safety regression suite. A pre/post comparison dashboard visualises all behavioural changes, not just those in the objective. Deployment is gated on evaluation results meeting all thresholds. Independent review (e.g., model risk management committee) is required for fine-tunes affecting high-risk models. The organisation can demonstrate to regulators a complete, auditable chain of intent, execution, evaluation, and decision for every fine-tuning operation that contributed to any deployed model.
Required artefacts:
Retention requirements:
Access requirements:
Test 8.1: Objective Document Completeness
Test 8.2: Regression Boundary Specification
Test 8.3: Evaluation Against Objective
Test 8.4: Cumulative Regression Detection
Test 8.5: Pipeline Gate Enforcement
Test 8.6: Objective Timestamp Integrity
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| EU AI Act | Article 10 (Data and Data Governance) | Supports compliance |
| PRA SS1/23 | Model Risk Management — Model Development | Direct requirement |
| FCA SYSC | 6.1.1R (Systems and Controls) | Supports compliance |
| NIST AI RMF | MAP 2.1, MAP 2.3, MANAGE 2.2 | Supports compliance |
| ISO 42001 | Clause 8.2 (AI Risk Assessment), Clause 8.4 (AI System Operation) | Supports compliance |
| FDA AI/ML SaMD | Predetermined Change Control Plan | Supports compliance |
PRA SS1/23 establishes supervisory expectations for model risk management at PRA-regulated firms. The guidance explicitly requires documentation of model development decisions, validation of model changes, and ongoing monitoring of model performance. Fine-tuning constitutes a model change. The expectation that firms document the rationale for model changes, validate changes before deployment, and monitor for performance degradation maps directly to AG-341's requirements for objective documentation, evaluation, and regression testing. A firm that fine-tunes a credit risk model without a documented objective and evaluation would face supervisory challenge under SS1/23.
The FDA's framework for AI/ML-based Software as a Medical Device includes the concept of a Predetermined Change Control Plan — a documented plan that specifies what changes the manufacturer intends to make and how those changes will be evaluated. Fine-tuning of clinical AI models falls within this framework. AG-341's fine-tune objective document is structurally aligned with the PCCP concept: it specifies the intended change, the evaluation criteria, and the conditions for rollback.
Article 9 requires a continuous, iterative risk management process. Fine-tuning that modifies model behaviour is a risk-relevant change that must be managed within the risk management system. AG-341 ensures that fine-tuning operations are documented, evaluated, and governed as risk management activities, not as routine engineering tasks that bypass governance.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Model-specific — affects all deployments of the fine-tuned model and all decisions made by those deployments |
Consequence chain: Undocumented fine-tuning creates two compounding risks. First, undetected behavioural degradation: without pre-specified regression boundaries, fine-tuning can silently erode safety properties, accuracy, fairness, or other critical capabilities. The degradation may not be detected until a downstream failure occurs — by which time the model may have made thousands or millions of decisions with degraded capability. A financial model with an 8.7-percentage-point accuracy degradation serving enterprise clients for six weeks could produce material financial harm. A customer-facing model with a 13-percentage-point drop in safety refusal rate serving 1.2 million interactions creates significant liability exposure. Second, audit impossibility: when a regulator or auditor asks "why does this model behave this way?", the absence of fine-tuning documentation makes it impossible to explain the model's development history. For regulated industries, inability to explain model behaviour is itself a compliance failure — distinct from whatever the actual behavioural issue is. The remediation cost depends on the severity: minor cases require re-evaluation and documentation reconstruction (£50,000-£200,000); major cases require full model retraining with proper documentation (£500,000-£5,000,000); worst cases involve regulatory enforcement action and service withdrawal.
Cross-references: AG-090 (Fine-Tune and Adapter Provenance) provides the technical provenance infrastructure within which fine-tune objectives are tracked. AG-048 (AI Model Provenance and Integrity) establishes the broader model provenance framework. AG-057 (Dataset Suitability and Bias Control) addresses the quality and bias properties of fine-tuning data. AG-340 (Training Corpus Rights Governance) covers the rights dimension of fine-tuning datasets. AG-339 through AG-348 form the sibling landscape for Model Provenance, Training & Adaptation.