AG-104

Trust Calibration Governance

Human Factors & Sociotechnical Control · AGS v2.1 · April 2026

2. Summary

Trust Calibration Governance requires that organisations implement explicit controls to ensure human operators maintain appropriately calibrated trust in AI agent outputs and decisions — neither over-trusting (automation complacency) nor under-trusting (automation disuse). The dimension mandates measurable mechanisms that align operator confidence with actual agent reliability, including dynamic trust indicators, performance transparency dashboards, and structured recalibration interventions when trust-reliability divergence is detected. Without trust calibration controls, human oversight becomes either a rubber-stamp exercise or an obstruction that defeats the purpose of agent deployment.

3. Example

Scenario A — Automation Complacency in Financial Trade Review: A financial services firm deploys an AI agent to generate trade recommendations with a human reviewer approving each trade. During the first three months, the agent's recommendations are correct 98.7% of the time. The human reviewer, observing consistent accuracy, develops a pattern of approving recommendations within 2 seconds of display — insufficient time to meaningfully evaluate the trade rationale. In month four, the agent begins generating subtly flawed recommendations due to a data pipeline change that introduces stale pricing data. The human reviewer continues approving at the same 2-second pace, rubber-stamping 47 trades over three days that collectively result in £2.3 million in losses. Post-incident analysis reveals the reviewer's approval time had declined from an initial average of 45 seconds to 1.8 seconds, but no system tracked or flagged this decline.

What went wrong: No trust calibration mechanism existed. The system did not track operator engagement metrics (review time, query rate, override frequency) as proxies for trust level. No recalibration intervention was triggered when the reviewer's behaviour indicated automation complacency. The human oversight control existed on paper but had become functionally absent. Consequence: £2.3 million in trading losses, FCA investigation into adequacy of human oversight controls, personal liability risk for the Senior Manager responsible under SM&CR.

Scenario B — Automation Disuse in Clinical Decision Support: A hospital deploys an AI agent to assist radiologists with preliminary scan analysis. After a widely publicised incident at another hospital where an AI system missed a tumour, the radiology department's trust in the tool collapses. Radiologists begin ignoring the agent's outputs entirely, performing full independent reads on every scan. The agent correctly identifies 12 critical findings over a two-week period that the radiologists, now operating under time pressure from performing double work, miss on their independent reads. Three patients experience delayed diagnoses.

What went wrong: No mechanism existed to communicate the agent's actual per-category accuracy to the radiologists. The trust failure was driven by an anecdotal external event, not by observed local performance data. No structured recalibration process re-established appropriate trust by presenting the agent's validated accuracy for the specific scan types in use. Consequence: Three delayed diagnoses, potential malpractice claims, regulatory scrutiny from CQC, and the effective waste of the AI investment.

Scenario C — Trust Asymmetry Across Operator Shifts: A logistics company deploys an AI agent for route optimisation. Day-shift operators, who were involved in the agent's training and validation, trust it appropriately and override only when they have specific local knowledge. Night-shift operators, who received a 30-minute briefing, either follow the agent's recommendations blindly or override them based on gut instinct. The day shift achieves a 14% efficiency improvement; the night shift shows a 3% degradation. Management cannot explain the discrepancy because no per-operator trust calibration metrics exist.

What went wrong: Trust calibration was not systematically managed across the operator population. Training was inconsistent. No per-operator engagement metrics were tracked. No mechanism identified the divergent trust profiles between shifts. Consequence: Inconsistent operational performance, inability to demonstrate uniform human oversight quality, and an audit finding for inadequate operator training.

4. Requirement Statement

Scope: This dimension applies to all AI agent deployments where human operators are expected to review, approve, override, or otherwise exercise judgement over agent outputs or actions. It applies regardless of whether the human role is formally designated as an approver, reviewer, monitor, or supervisor. The scope includes direct human-agent interaction (a human reviewing an agent's recommendation) and indirect interaction (a human monitoring a dashboard of agent activity). It excludes fully autonomous operations where no human oversight is expected or required — though organisations should note that removing human oversight without trust calibration evidence may itself be a governance deficiency under AG-019. The scope extends to all operator roles across all shifts, locations, and experience levels; trust calibration is not a one-time training event but an ongoing operational control.

4.1. A conforming system MUST track at least three operator engagement metrics as proxies for trust calibration: mean review time per decision, override rate per operator, and query or clarification request rate.

4.2. A conforming system MUST define threshold bands for each trust proxy metric that distinguish appropriately calibrated trust from over-trust (complacency) and under-trust (disuse), with thresholds validated against the agent's measured reliability for the specific task category.

4.3. A conforming system MUST trigger a recalibration intervention when any operator's trust proxy metrics fall outside the defined threshold bands for more than 48 consecutive hours or 50 consecutive decisions, whichever comes first.
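The protocol does not prescribe how 4.1 through 4.3 are implemented. As one hedged illustration, the Python sketch below tracks the three mandated proxies per decision and fires a recalibration trigger on the 48-hour / 50-decision rule; the TrustMonitor and Band names, the ten-decision rolling sample, and the specific band tests are assumptions made for the sketch, not part of the requirement.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import List, Optional

class Band(Enum):
    CALIBRATED = "calibrated"
    OVER_TRUST = "over_trust"    # complacency: reviews too fast, no queries
    UNDER_TRUST = "under_trust"  # disuse: override rate above its band

@dataclass
class Decision:
    timestamp: datetime
    review_seconds: float  # proxy 1: review time per decision (4.1)
    overridden: bool       # proxy 2: feeds the override rate (4.1)
    queried: bool          # proxy 3: feeds the query/clarification rate (4.1)

@dataclass
class TrustMonitor:
    """Per-operator monitor: threshold bands per 4.2, breach trigger per 4.3."""
    min_review_seconds: float  # reviews faster than this suggest over-trust
    max_override_rate: float   # overrides above this suggest under-trust
    min_query_rate: float      # near-zero queries reinforce a complacency signal
    out_of_band: List[Decision] = field(default_factory=list)
    breach_started: Optional[datetime] = None

    def _classify(self, sample: List[Decision]) -> Band:
        mean_review = sum(d.review_seconds for d in sample) / len(sample)
        override_rate = sum(d.overridden for d in sample) / len(sample)
        query_rate = sum(d.queried for d in sample) / len(sample)
        if mean_review < self.min_review_seconds and query_rate < self.min_query_rate:
            return Band.OVER_TRUST
        if override_rate > self.max_override_rate:
            return Band.UNDER_TRUST
        return Band.CALIBRATED

    def record(self, decision: Decision) -> Optional[Band]:
        """Returns the breached band when a recalibration intervention is due."""
        self.out_of_band.append(decision)
        band = self._classify(self.out_of_band[-10:])  # rolling sample (assumption)
        if band is Band.CALIBRATED:
            self.out_of_band.clear()
            self.breach_started = None
            return None
        if self.breach_started is None:
            self.breach_started = decision.timestamp
        if (decision.timestamp - self.breach_started >= timedelta(hours=48)
                or len(self.out_of_band) >= 50):  # whichever comes first (4.3)
            self.out_of_band.clear()
            self.breach_started = None
            return band  # caller logs the breach (4.5), delivers the 4.7 intervention
        return None
```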

4.4. A conforming system MUST present operators with ongoing, contextual trust indicators that communicate the agent's current reliability for the specific decision type — not a single global accuracy figure, but per-category performance data updated at least weekly.
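A minimal way to produce the per-category data 4.4 calls for is to aggregate validated outcomes over a short lookback window on a weekly refresh. The function below is a sketch; the validated_outcomes feed and the four-week window are assumptions, not mandated.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple

def weekly_reliability(validated_outcomes: Iterable[Tuple[str, date, bool]],
                       as_of: date, lookback_weeks: int = 4) -> Dict[str, float]:
    """Per-category accuracy for display beside each decision (4.4), so an
    operator sees e.g. 'chest CT: 0.962' rather than one global figure."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    cutoff = as_of - timedelta(weeks=lookback_weeks)
    for category, when, was_correct in validated_outcomes:
        if when >= cutoff:
            total[category] += 1
            correct[category] += int(was_correct)
    return {category: correct[category] / total[category] for category in total}
```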

4.5. A conforming system MUST log all trust calibration metrics, threshold breaches, and recalibration interventions with timestamps and operator identifiers.

4.6. A conforming system SHOULD implement dynamic trust thresholds that adjust as the agent's measured reliability changes — tightening acceptable review times when agent reliability decreases, and relaxing them when reliability is validated at higher levels.
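One reading of 4.6 is that the over-trust threshold (the minimum acceptable review time) scales with the agent's error rate relative to its validated reference reliability. The linear rule and clamp below are assumptions for illustration, not part of the requirement.

```python
def dynamic_min_review_seconds(baseline_seconds: float, measured_reliability: float,
                               reference_reliability: float = 0.99) -> float:
    """Dynamic threshold per 4.6: when measured reliability drops below the
    validated reference, the minimum acceptable review time tightens (rises);
    when reliability is validated higher, it relaxes. Illustrative scaling."""
    error_ratio = (1.0 - measured_reliability) / (1.0 - reference_reliability)
    return baseline_seconds * min(max(error_ratio, 0.5), 4.0)  # clamp the adjustment
```

Under this sketch, with a 30-second baseline, reliability falling from 99% to 97% triples the minimum review time to 90 seconds, while reliability validated at 99.5% relaxes it to 15 seconds.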

4.7. A conforming system SHOULD deliver recalibration interventions through structured methods: presenting recent agent errors to complacent operators, and presenting validated accuracy data to distrustful operators.
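Reusing the hypothetical Band enum from the 4.3 sketch, the two intervention types in 4.7 reduce to a small dispatch; the payload shape and the five-example cap are assumptions.

```python
from typing import Dict, List

def deliver_intervention(band: Band, operator_id: str, recent_errors: List[dict],
                         reliability_by_category: Dict[str, float]) -> dict:
    """Structured recalibration per 4.7: complacent operators see recent,
    concrete agent errors; distrustful operators see validated accuracy."""
    if band is Band.OVER_TRUST:
        kind, content = "recent_error_review", recent_errors[:5]
    else:
        kind, content = "validated_accuracy_briefing", reliability_by_category
    return {"operator": operator_id, "intervention": kind, "content": content}
```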

4.8. A conforming system SHOULD track trust calibration metrics per operator, per task category, and per shift to detect population-level calibration asymmetries.
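As one population-level check of the kind 4.8 requires, comparing each shift's mean review time against the whole population would have surfaced the Scenario C divergence; a minimal sketch:

```python
from statistics import mean
from typing import Dict, List

def shift_review_time_ratios(review_times_by_shift: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean review time per shift relative to the population mean (4.8).
    A shift sitting well below 1.0 is a complacency asymmetry signal;
    well above 1.0 may indicate disuse-driven double work."""
    population_mean = mean(t for times in review_times_by_shift.values() for t in times)
    return {shift: mean(times) / population_mean
            for shift, times in review_times_by_shift.items()}
```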

4.9. A conforming system MAY implement challenge tasks — occasional synthetic decisions where the agent's recommendation is deliberately incorrect — to verify that operators are genuinely evaluating outputs rather than rubber-stamping.
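Challenge tasks (4.9) can be implemented by occasionally substituting a known-flawed recommendation and tracking whether the operator rejects it. The injector below is a sketch using the 1-per-100 rate from the Advanced maturity tier; all names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Optional, Tuple

class ChallengeInjector:
    """Verifies operators genuinely evaluate outputs rather than rubber-stamp
    (4.9) by occasionally showing a deliberately incorrect recommendation."""

    def __init__(self, rate: float = 0.01, seed: Optional[int] = None):
        self.rate = rate                           # 1 per 100 decisions (Advanced tier)
        self.rng = random.Random(seed)
        self.stats = defaultdict(lambda: [0, 0])   # operator -> [caught, shown]

    def maybe_inject(self, recommendation: object, known_bad: object) -> Tuple[object, bool]:
        """Returns (recommendation_to_show, is_challenge)."""
        if self.rng.random() < self.rate:
            return known_bad, True
        return recommendation, False

    def record(self, operator_id: str, was_challenge: bool, rejected: bool) -> None:
        if was_challenge:
            self.stats[operator_id][1] += 1
            self.stats[operator_id][0] += int(rejected)  # engaged operators reject it

    def detection_rate(self, operator_id: str) -> Optional[float]:
        caught, shown = self.stats[operator_id]
        return caught / shown if shown else None
```

Consistent with the Critical Infrastructure guidance below, a missed challenge must never be allowed to execute as a real action, so any injection point belongs upstream of the execution path.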

5. Rationale

Trust Calibration Governance addresses the fundamental vulnerability in any human-in-the-loop architecture: the assumption that human oversight is meaningful simply because a human is present. Decades of human factors research in aviation, nuclear power, and process control demonstrate that human trust in automated systems follows predictable but ungoverned trajectories — typically rising to complacency after a period of high automation reliability, or collapsing to disuse after a salient failure event. Neither trajectory produces effective oversight.

The concept originates from Lee and See's foundational trust calibration framework (2004), which established that trust in automation must be calibrated to match actual system capability. Parasuraman and Riley's research on automation misuse and disuse (1997) demonstrated that uncalibrated trust produces worse outcomes than no automation at all — operators either ignore valid automation outputs or fail to detect automation failures. These findings have been consistently replicated across domains for three decades.

In the AI agent context, the trust calibration problem is amplified by three factors. First, AI agents exhibit variable reliability across task categories — an agent that is 99% accurate on routine decisions may be only 60% accurate on edge cases, but operators who experience the 99% develop trust that generalises inappropriately to the 60%. Second, AI agent reliability can shift rapidly due to data drift, model updates, or environmental changes, creating a moving target for human calibration. Third, AI agents can be persuasive in their explanations, creating an illusion of competence that further biases operators toward over-trust.

AG-104 intersects directly with AG-019 (Human Escalation & Override Triggers) because escalation mechanisms are effective only when operators trust them to work and trust their own judgement to invoke them. It intersects with AG-038 (Human Control Responsiveness) because response time requirements are meaningful only when operators are engaged rather than complacent. And it intersects with AG-049 (Governance Decision Explainability) because explanation quality directly influences trust calibration — poor explanations can undermine warranted trust, while convincing explanations of wrong answers can amplify unwarranted trust.

6. Implementation Guidance

Trust calibration is an ongoing operational control, not a one-time configuration. The implementation must continuously measure the alignment between operator trust (as expressed through behaviour) and agent reliability (as measured through outcomes), then intervene when the two diverge.

Recommended patterns:

- Baseline each operator's review time, override rate, and query rate during initial deployment, before trust has drifted in either direction.
- Surface per-category reliability in the decision interface itself, not in a separate report operators must seek out.
- Automate recalibration: curated recent-error reviews for operators trending toward complacency, validated accuracy briefings for operators trending toward disuse.
- Adjust threshold bands as measured agent reliability changes, rather than leaving them fixed at deployment values.
- Use challenge tasks, at a frequency proportionate to decision value, to verify operators are genuinely evaluating outputs.

Anti-patterns to avoid:

- Treating rapid approval as operator efficiency; a 2-second approval pattern is a complacency signal, not a productivity win (Scenario A).
- Publishing a single global accuracy figure that hides weak per-category performance (the 99%-routine versus 60%-edge-case problem described in the Rationale).
- Treating trust calibration as a one-time training event rather than an ongoing operational control.
- Briefing different operator populations inconsistently, which creates the shift-level calibration asymmetry seen in Scenario C.

Industry Considerations

Financial Services. Trust calibration directly supports SM&CR obligations by demonstrating that human oversight is substantive rather than nominal. The FCA's expectations for algorithmic trading oversight (MiFID II RTS 6) require that human monitors are capable of intervening effectively — which requires calibrated trust. Trading firms should align trust calibration thresholds with existing human performance monitoring for manual traders. Challenge task frequency should be higher for high-value decision categories: at least 1 per 50 decisions for trade approvals exceeding £100,000.

Healthcare. Clinical decision support systems are subject to MHRA regulation (where the AI qualifies as a medical device) and CQC oversight of care quality. Trust calibration is a patient safety control: both over-trust (missing an AI error that harms a patient) and under-trust (ignoring a valid AI finding that delays diagnosis) produce adverse patient outcomes. Per-category reliability data should align with clinical sensitivity — distinguishing accuracy for common conditions from accuracy for rare conditions where AI training data is sparse.

Critical Infrastructure. In safety-critical environments (aviation, nuclear, process control), trust calibration has an established regulatory basis through human factors requirements in IEC 61511, DO-178C, and nuclear regulatory frameworks. Organisations deploying AI agents in these contexts should map AG-104 requirements to existing human factors obligations. Challenge task design must account for the risk that a synthetic error could trigger real safety consequences if the operator fails to catch it.

Maturity Model

Basic Implementation — The organisation tracks at least three trust proxy metrics (review time, override rate, query rate) per operator. Threshold bands are defined based on initial calibration baselines. Alerts are generated when operators breach thresholds. Recalibration consists of a notification to the operator's supervisor. Per-category reliability data is available but not actively presented to operators. This level meets the minimum mandatory requirements but relies on manual supervision for recalibration and does not systematically verify that operators are genuinely evaluating outputs.

Intermediate Implementation — All basic capabilities plus: per-category reliability dashboards are presented to operators in the decision interface. Structured recalibration interventions are automated — complacent operators receive curated error examples; distrustful operators receive validated accuracy presentations. Trust metrics are tracked per operator, per category, and per shift, with population-level analysis identifying systemic calibration issues. Thresholds adjust dynamically as agent reliability changes. Recalibration effectiveness is measured by tracking whether metrics return to calibrated bands within 5 working days of intervention.

Advanced Implementation — All intermediate capabilities plus: challenge tasks are injected at a rate of at least 1 per 100 decisions, with per-operator detection rates tracked and reported. Predictive models identify operators trending toward miscalibration before thresholds are breached. Trust calibration data feeds into agent deployment decisions — new agent capabilities are not activated until operator trust calibration for the new category has been validated. Independent auditors review trust calibration effectiveness annually. The organisation can demonstrate to regulators that human oversight is substantively engaged, not nominal, with quantitative evidence at the per-operator level.

7. Evidence Requirements

Required artefacts:

- Trust proxy metric logs (review time, override rate, query rate) with timestamps and operator identifiers (4.5).
- Documented threshold bands for each metric and task category, together with the agent reliability data used to validate them (4.2).
- Records of every threshold breach and the recalibration intervention delivered in response (4.3, 4.7).
- The per-category reliability data presented to operators, with its weekly update history (4.4).

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Engagement Metric Capture Accuracy

Test 8.2: Threshold Breach Detection

Test 8.3: Recalibration Intervention Delivery

Test 8.4: Per-Category Reliability Display Accuracy

Test 8.5: Challenge Task Indistinguishability

Test 8.6: Trust Metric Logging Completeness

Test 8.7: Population-Level Asymmetry Detection

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 14 (Human Oversight) | Direct requirement
EU AI Act | Article 9 (Risk Management System) | Supports compliance
FCA | SYSC 6.1.1R (Systems and Controls) | Supports compliance
MiFID II RTS 6 | Article 18 (Human review of algorithmic trading) | Direct requirement
NIST AI RMF | GOVERN 1.4, MEASURE 2.6 | Supports compliance
ISO 42001 | Clause 8.4 (AI System Operation) | Supports compliance
MHRA Software as a Medical Device | Intended Purpose & Human Factors | Supports compliance

EU AI Act — Article 14 (Human Oversight)

Article 14(4)(a) requires that human oversight measures enable the individual exercising oversight to "correctly interpret the high-risk AI system's output." Trust calibration is the operational mechanism that ensures this requirement is met in practice. Without calibrated trust, human overseers either do not interpret outputs at all (complacency) or reject correct outputs (disuse). Article 14(4)(b) requires that overseers can "decide not to use the high-risk AI system or to disregard, override or reverse the output." Trust calibration ensures that override decisions are based on informed judgement rather than miscalibrated trust. The EU AI Act's human oversight requirements are meaningful only if the humans exercising oversight are genuinely engaged — AG-104 provides the control that ensures genuine engagement.

MiFID II RTS 6 — Article 18

Article 18 requires investment firms using algorithmic trading to ensure adequate human review. For AI agents executing or recommending trades, trust calibration provides evidence that human reviewers are substantively engaged rather than rubber-stamping. Regulatory supervisors examining a trading loss will ask not only whether a human reviewed the trade but whether the review was meaningful — AG-104 provides the quantitative evidence to answer that question.

NIST AI RMF — GOVERN 1.4, MEASURE 2.6

GOVERN 1.4 addresses organisational practices for AI risk governance including human-AI interaction. MEASURE 2.6 addresses the assessment of human-AI teaming effectiveness. AG-104 provides the measurement infrastructure and intervention mechanisms that operationalise these functions, ensuring that human-AI interaction is monitored and managed rather than assumed.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Organisation-wide — affects all human-overseen agent operations and the credibility of human oversight claims to regulators

Consequence chain: Without trust calibration controls, human oversight degrades silently. Over-trust produces rubber-stamping that renders human-in-the-loop controls functionally absent — the organisation believes it has human oversight but in practice does not. Under-trust produces rejection of valid agent outputs, negating the operational benefit of agent deployment and potentially producing worse outcomes than either pure human or pure agent operation. Both failure modes are invisible without measurement: an operator who approves every recommendation in 2 seconds appears to be "working efficiently" unless engagement metrics are tracked. The regulatory consequence is severe: organisations claiming human oversight compliance (EU AI Act Article 14, FCA SM&CR) without trust calibration evidence face the risk that a post-incident investigation reveals oversight was nominal. The financial consequence depends on the domain but scales with the volume of decisions processed under miscalibrated oversight. In the financial services example above, a single complacent reviewer produced £2.3 million in losses over three days; at scale, the exposure is proportionally larger.

Cross-references: AG-019 (Human Escalation & Override Triggers) establishes when escalation must occur; AG-104 ensures operators are calibrated to invoke escalation appropriately. AG-038 (Human Control Responsiveness) sets response time requirements; AG-104 ensures operators are engaged enough to meet them. AG-049 (Governance Decision Explainability) provides the explanation quality that supports informed trust; AG-104 measures whether that trust is actually calibrated. AG-105 (Oversight Workload and Alarm Fatigue Governance) addresses the workload conditions that degrade trust calibration. AG-106 (Human Skill Atrophy Monitoring Governance) addresses the skill degradation that miscalibrated trust accelerates. AG-107 (Override Usability and Actionability Governance) ensures that when calibrated operators decide to override, the mechanism is usable. AG-108 (Operator Role Segregation Governance) ensures that trust calibration is measured per role with appropriate thresholds.

Cite this protocol
AgentGoverning. (2026). AG-104: Trust Calibration Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-104