AG-170

Approval Quality and Substantive Review Governance

Execution Integrity, Accountability & Approval Quality · AGS v2.1 · April 2026

2. Summary

Approval Quality and Substantive Review Governance requires that every human approval in an AI agent governance workflow represents a genuine, informed decision — not a rubber stamp. As organisations deploy AI agents that escalate decisions to human approvers, a critical vulnerability emerges: the human approval becomes a formality rather than a substantive review. Approvers click "approve" without reading the request, review a summary without examining the underlying data, or approve within seconds of receiving the request — all patterns indicating that the human oversight is performative rather than meaningful. AG-170 addresses this by requiring structural controls that measure, enforce, and verify the quality of human approvals. The principle is that a governance framework that routes decisions to human approvers is only as strong as the rigour of those approvals. If the human layer is a rubber stamp, the governance framework has a human-shaped hole in it.

3. Example

Scenario A — Rapid-Fire Approval of AI-Generated Loan Decisions: A retail bank deploys an AI agent that underwrites consumer loans up to GBP 50,000. Loans exceeding GBP 25,000 require human approval per the bank's credit risk policy. The agent processes 400 loan applications per day, of which approximately 120 exceed the GBP 25,000 threshold and are routed to a team of 4 human approvers. Each approver receives 30 approval requests per day. The agent presents each request with a summary: "Applicant: [name]. Amount: GBP [amount]. AI recommendation: Approve. Risk score: [score]. Approval required."

Over 3 months, the approval rate is 97.8%. Average time-to-approval is 11 seconds. No approver has ever requested additional information. No approver has modified any loan terms. An internal audit samples 50 approved loans and finds that 8 had risk indicators that should have triggered further inquiry — income inconsistencies, recent address changes combined with high loan amounts, and employer verification failures. The approvers approved them all in under 15 seconds.

What went wrong: The approval workflow was designed for throughput, not quality. The AI's recommendation ("Approve") anchored the human decision. The summary excluded the risk indicators that would have prompted further inquiry. The 11-second average approval time is physically incompatible with substantive review — the minimum time to read the summary, assess the recommendation, and make an informed decision is estimated at 90-120 seconds. The approval was performative. Consequence: 8 loans totalling GBP 287,000 approved without substantive review, estimated GBP 115,000 in expected credit losses from those loans, FCA enforcement risk for inadequate creditworthiness assessment under CONC 5.2.

Scenario B — Approval Fatigue in Safety-Critical System: A pharmaceutical manufacturer uses an AI agent to manage batch release for a biologic drug. Each batch undergoes automated quality testing, and the agent compiles test results into a release recommendation. A qualified person (QP) must review and approve each batch release — a regulatory requirement under the EU GMP Annex 16. The agent processes 15 batches per day, and the QP approves each within 2-3 minutes.

Over time, the QP develops approval fatigue. When Batch 2024-0847 arrives with an out-of-specification endotoxin reading (0.28 EU/mL against a limit of 0.25 EU/mL), the agent correctly flags the result but presents it among 47 other in-specification results. The QP approves the batch in 2 minutes and 14 seconds. The batch is released and administered to 340 patients. 12 patients develop pyrogenic reactions. The MHRA investigation reveals that the QP's approval time, approval rate (100% over 6 months), and lack of any queries or rejections indicate inadequate review.

What went wrong: The approval workflow buried the critical information (one out-of-specification result among 47 in-specification results). The QP's 100% approval rate over 6 months with no queries or rejections is statistical evidence of rubber-stamping. No mechanism enforced minimum review time, required acknowledgement of flagged anomalies, or varied the presentation to maintain the QP's attention. Consequence: 12 patients harmed, batch recall costing EUR 4.7 million, MHRA regulatory action against the manufacturing authorisation, personal liability for the QP.

Scenario C — Approval Without Access to Underlying Data: An AI agent in an asset management firm prepares quarterly investor reports. Each report requires compliance officer approval before distribution. The agent generates a 40-page report with performance figures, risk metrics, and regulatory disclosures. The compliance officer receives a notification: "Report ready for approval. Summary: Q3 performance report for Fund X. 40 pages. Risk level: standard." The compliance officer approves based on the summary without opening the report.

The report contains a performance figure calculated using a methodology that was changed mid-quarter without disclosure — the Q2 report used time-weighted returns and the Q3 report uses money-weighted returns, making the performance appear 3.2 percentage points better than it would under the original methodology. The methodology change is disclosed on page 34 in a footnote. The compliance officer, who approved without reading, did not detect the undisclosed methodology change.

What went wrong: The compliance officer approved without accessing the underlying document. The approval workflow did not require the approver to demonstrate access to the full content. No mechanism flagged the methodology change as a material disclosure issue requiring explicit review. Consequence: Misleading performance reporting to investors, FCA enforcement for COBS 4.2 breach (fair, clear, and not misleading communications), potential investor claims for GBP 12 million (the net new investment attracted by the inflated performance figure).

4. Requirement Statement

Scope: This dimension applies to all AI governance workflows that include a human approval step. This includes but is not limited to: transaction approvals, content review and release, escalation handling, exception approvals, batch release, regulatory submission review, and any governance control that requires a human to review and approve an AI agent's proposed action or output before it takes effect. The scope covers both formal approval workflows (structured approval queues with defined authority levels) and informal approval workflows (an AI agent asking a human "shall I proceed?" via chat or email). The test for inclusion is: does the governance framework rely on a human reviewing and approving something before the agent acts? If yes, the quality of that approval is within AG-170's scope.

4.1. A conforming system MUST enforce a minimum review period proportional to the complexity of the approval request, below which approval is not accepted (e.g., minimum 60 seconds for standard requests, minimum 300 seconds for high-complexity requests).

4.2. A conforming system MUST require the approver to access the underlying data or document, not just the summary, before approval is accepted — verified through content access logging (e.g., the approver opened the full document, scrolled to at least 80% of the content, or accessed specific flagged sections).

4.3. A conforming system MUST present risk indicators, anomalies, and flagged items prominently and require explicit acknowledgement of each before approval is accepted, preventing approval without awareness of known concerns.

4.4. A conforming system MUST track approval quality metrics per approver — including approval rate, average review time, percentage of requests where additional information was requested, and percentage of approvals that subsequently required remediation — and flag approvers whose patterns indicate rubber-stamping.

4.5. A conforming system MUST vary the presentation of approval requests to prevent pattern-based auto-approval — for example, occasionally reordering sections, changing the visual layout, or inserting synthetic test requests that have a known correct decision.

4.6. A conforming system MUST NOT present the AI agent's recommendation in a way that anchors the approver's decision. For example, showing "AI recommends: Approve" before the approver has reviewed the underlying data biases the decision toward approval; the recommendation SHOULD be disclosed only after the approver has engaged with the data.

4.7. A conforming system MUST log every approval decision with the approver's identity, the time spent reviewing, the content accessed, and any anomalies acknowledged, in a tamper-evident record per AG-006.

4.8. A conforming system SHOULD implement periodic calibration exercises where approvers review cases with known correct decisions, and their performance is measured against the known answers.

4.9. A conforming system SHOULD rotate approvers across request types to prevent familiarity-induced complacency and ensure fresh perspective.

4.10. A conforming system MAY implement dual-approval requirements for high-risk decisions, where two independent approvers must agree before the action proceeds.
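The structural checks in 4.1–4.3 can be combined into a single approval gate. The sketch below is illustrative, not a normative implementation: the class names, the 60/300-second thresholds (taken from the example in 4.1), and the 80% content-access check (from the example in 4.2) are assumptions for demonstration.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ApprovalRequest:
    request_id: str
    complexity: str                                  # "standard" or "high"
    flagged_anomalies: list[str] = field(default_factory=list)

# Illustrative thresholds mirroring the examples in 4.1.
MIN_REVIEW_SECONDS = {"standard": 60, "high": 300}

class ApprovalGate:
    """Rejects approvals that fail the structural checks in 4.1-4.3."""

    def __init__(self, request: ApprovalRequest):
        self.request = request
        self.opened_at = time.monotonic()
        self.content_fraction_viewed = 0.0
        self.acknowledged: set[str] = set()

    def record_scroll(self, fraction: float) -> None:
        # Content access logging (4.2): track how much of the full
        # document the approver has actually viewed.
        self.content_fraction_viewed = max(self.content_fraction_viewed, fraction)

    def acknowledge(self, anomaly: str) -> None:
        # Explicit acknowledgement of each flagged item (4.3).
        self.acknowledged.add(anomaly)

    def approve(self) -> list[str]:
        """Returns a list of violations; approval is accepted only if empty."""
        violations = []
        elapsed = time.monotonic() - self.opened_at
        if elapsed < MIN_REVIEW_SECONDS[self.request.complexity]:
            violations.append("minimum review period not met (4.1)")
        if self.content_fraction_viewed < 0.8:
            violations.append("underlying content not accessed (4.2)")
        missing = set(self.request.flagged_anomalies) - self.acknowledged
        if missing:
            violations.append(f"unacknowledged anomalies (4.3): {sorted(missing)}")
        return violations
```

An 11-second click-through, as in Scenario A, would fail all three checks; the gate only accepts an approval once the review time, content access, and anomaly acknowledgements are all in order.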

5. Rationale

Human approval is the last line of defence in many AI governance frameworks. When an agent's automated controls flag a concern, the typical response is to escalate to a human approver. When a decision exceeds the agent's authority, a human makes the final call. When regulatory requirements demand human oversight, a human signs off. The entire governance architecture depends on the assumption that the human approval is a meaningful, informed decision. AG-170 challenges this assumption — and provides structural controls to make it true.

The problem is well-documented in human factors research. Automation bias — the tendency for humans to over-rely on automated recommendations — increases when the human perceives the automated system as more capable or more consistent than themselves. When an AI agent recommends approval with a confidence score, the human approver faces a cognitive asymmetry: rejecting the recommendation requires active effort, justification, and the risk of being wrong, while accepting it requires only a click. Over time, approval rates converge toward 100% as approvers learn that questioning the AI's recommendation is rarely rewarded and often delayed.

The structural indicators of rubber-stamping are measurable: approval rates above 95% over sustained periods, average review times below the physical minimum for substantive review, zero requests for additional information, and zero modifications to the proposed action. These patterns do not prove that every individual approval was inadequate, but they provide strong evidence that the approval process is not functioning as a genuine control.
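These indicators can be computed directly from approval records. The sketch below is a minimal example; the record schema and the 95% / 90-second thresholds are assumptions drawn from the figures in this section, not normative values.

```python
from statistics import mean

def rubber_stamp_flags(records, min_review_seconds=90):
    """Flag the measurable rubber-stamping indicators for one approver.

    records: list of dicts with keys
      approved (bool), review_seconds (float),
      info_requested (bool), modified (bool).
    """
    n = len(records)
    approval_rate = sum(r["approved"] for r in records) / n
    avg_time = mean(r["review_seconds"] for r in records)
    info_rate = sum(r["info_requested"] for r in records) / n
    mod_rate = sum(r["modified"] for r in records) / n

    flags = []
    if approval_rate > 0.95:
        flags.append("sustained approval rate above 95%")
    if avg_time < min_review_seconds:
        flags.append("average review time below physical minimum")
    if info_rate == 0:
        flags.append("no requests for additional information")
    if mod_rate == 0:
        flags.append("no modifications to proposed actions")
    return flags
```

Run against the Scenario A pattern (97.8% approval, 11-second average, no queries, no modifications), all four flags fire; per 4.4, that approver would be escalated for management review.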

AG-170 addresses this through structural interventions: minimum review times that make rubber-stamping physically impossible, content access verification that ensures the approver saw the data, prominence requirements for risk indicators that prevent critical information from being buried, and quality metrics that detect pattern-based approval behaviour. The goal is not to slow down the approval process for its own sake — it is to ensure that when the governance framework routes a decision to a human, the human's engagement is genuine and informed.

6. Implementation Guidance

Approval quality governance requires a combination of workflow design, behavioural measurement, and continuous calibration. The challenge is to enforce substantive review without creating an approval bottleneck that delays legitimate operations.

Recommended patterns:

- Enforce minimum review times proportional to request complexity, and reject approvals submitted below the threshold (per 4.1).
- Use progressive disclosure: withhold the AI's recommendation until the approver has accessed the underlying data (per 4.6).
- Require explicit acknowledgement of each flagged anomaly before the approve action is enabled (per 4.3).
- Track per-approver quality metrics — approval rate, review time, information requests, remediation rate — and review outliers (per 4.4).
- Inject synthetic test requests with known correct decisions and measure detection accuracy (per 4.5).

Anti-patterns to avoid:

- Presenting "AI recommends: Approve" at the top of every request, anchoring the decision before any review has occurred.
- Burying a single out-of-specification result among dozens of in-specification results, as in Scenario B.
- Allowing approval from a summary notification without requiring the approver to open the underlying document, as in Scenario C.
- Optimising the workflow for throughput and measuring approvers on queue clearance rather than decision quality.

Industry Considerations

Financial Services. FCA COBS 5.2 requires that suitability assessments be based on adequate information. A human approver who approves a suitability recommendation without reviewing the underlying data has not performed an adequate assessment. CONC 5.2 requires creditworthiness assessments to be based on sufficient information — a 15-second approval of a GBP 30,000 loan application is not a sufficient assessment.

Healthcare. EU GMP Annex 16 requires the Qualified Person to certify that each batch meets its release specification. The QP's certification is personal and non-delegable. A QP who rubber-stamps batch releases is not fulfilling their legal obligation, creating personal criminal liability and regulatory risk to the manufacturing authorisation.

Public Sector. Decisions affecting citizens' rights (benefits, immigration, criminal justice risk assessments) require meaningful human review to satisfy Article 22 GDPR (automated decision-making). A rubber-stamp approval of an AI recommendation does not constitute meaningful human intervention — the European Data Protection Board has stated that the human review must be "carried out by someone who has the authority and competence to change the decision."

Maturity Model

Basic Implementation — Minimum review times are enforced for approval requests. The AI recommendation is not shown until the approver accesses the underlying data. Approval rate and average review time are tracked per approver. Approvers whose metrics indicate potential rubber-stamping are flagged for management review. Coverage: all high-value or safety-critical approval workflows.

Intermediate Implementation — All basic capabilities plus: progressive disclosure prevents anchoring. Risk indicators require explicit acknowledgement. Content access verification confirms the approver opened the full document. Synthetic test cases are injected periodically and results are tracked. An approval quality dashboard provides real-time visibility. Approvers are rotated across request types. Coverage: all approval workflows, including informal approvals.

Advanced Implementation — All intermediate capabilities plus: outcome feedback links approval decisions to subsequent outcomes, enabling measurement of actual approval accuracy (not just process compliance). Calibration exercises are conducted quarterly with known-answer cases. Dual approval is implemented for the highest-risk decisions. The organisation can demonstrate to regulators that human approvals represent genuine, informed decisions with measured accuracy rates — not performative compliance.

7. Evidence Requirements

Required artefacts:

- Approval decision records capturing approver identity, time spent reviewing, content accessed, and anomalies acknowledged (per 4.7).
- Per-approver quality metric reports: approval rate, average review time, information-request rate, and remediation rate (per 4.4).
- Synthetic test case results and detection accuracy records (per 4.5).
- Calibration exercise results against known-answer cases (per 4.8).

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Minimum Review Time Enforcement
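A minimal harness for this test submits one approval before and one after the minimum review period and checks that only the second is accepted. The 60-second threshold and the `accept_approval` helper are illustrative assumptions, not normative values.

```python
# Hypothetical threshold for a standard-complexity request (see 4.1).
MIN_REVIEW_SECONDS = 60

def accept_approval(opened_at: float, submitted_at: float) -> bool:
    """Accept the approval only if the elapsed review time meets the minimum."""
    return (submitted_at - opened_at) >= MIN_REVIEW_SECONDS

def test_minimum_review_time_enforcement() -> None:
    opened = 1000.0
    # Scenario A's 11-second click-through must be rejected.
    assert not accept_approval(opened, opened + 11)
    # An approval submitted after a substantive review window is accepted.
    assert accept_approval(opened, opened + 95)

test_minimum_review_time_enforcement()
```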

Test 8.2: Content Access Requirement

Test 8.3: Anomaly Acknowledgement Requirement

Test 8.4: Anti-Anchoring — Recommendation Sequencing

Test 8.5: Rubber-Stamping Detection

Test 8.6: Synthetic Test Case Detection Accuracy

Test 8.7: Approval Decision Logging Completeness
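One way to make the 4.7 log tamper-evident in the sense of AG-006 is a hash chain, where each entry commits to the previous entry's digest so any retroactive edit breaks verification. The sketch below is an assumption-laden illustration (field names and chaining scheme are hypothetical), not a production ledger.

```python
import hashlib
import json

class ApprovalLog:
    """Append-only, hash-chained log of approval decisions (4.7 / AG-006)."""

    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value

    def append(self, approver, request_id, review_seconds,
               content_accessed, anomalies_acknowledged):
        entry = {
            "approver": approver,
            "request_id": request_id,
            "review_seconds": review_seconds,
            "content_accessed": content_accessed,
            "anomalies_acknowledged": anomalies_acknowledged,
            "prev_hash": self._prev_hash,
        }
        # Each entry's hash covers its content plus the previous hash.
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = digest
        self.entries.append(entry)
        self._prev_hash = digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry fails."""
        prev = "0" * 64
        for e in self.entries:
            if e["prev_hash"] != prev:
                return False
            body = {k: v for k, v in e.items() if k != "hash"}
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

A completeness test for 8.7 would append decisions, confirm the chain verifies, then alter a recorded review time and confirm verification fails.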

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 14 (Human Oversight) | Direct requirement
EU AI Act | Article 29 (Obligations of Users of High-Risk AI) | Supports compliance
GDPR | Article 22 (Automated Individual Decision-Making) | Direct requirement
FCA COBS | 5.2 (Suitability) | Supports compliance
FCA CONC | 5.2 (Creditworthiness Assessment) | Supports compliance
EU GMP | Annex 16 (Certification by a Qualified Person) | Direct requirement
SOX | Section 302 (Corporate Responsibility for Financial Reports) | Supports compliance
NIST AI RMF | GOVERN 1.3, MANAGE 4.2 | Supports compliance

EU AI Act — Article 14 (Human Oversight)

Article 14 requires that high-risk AI systems be designed and developed to enable effective human oversight, including by individuals who can "fully understand the capacities and limitations of the high-risk AI system" and "properly interpret the high-risk AI system's output." A rubber-stamp approval does not satisfy these requirements. AG-170 ensures that the human oversight mandated by Article 14 is substantive — the approver demonstrates understanding by engaging with the data, acknowledging risks, and making an informed decision.

GDPR — Article 22

Article 22 gives data subjects the right not to be subject to a decision based solely on automated processing. When organisations use human approval to satisfy Article 22 (i.e., the decision is not "solely" automated because a human approves it), the quality of that approval matters. The EDPB has stated that the human review must involve "someone who has the authority and competence to change the decision." A rubber-stamp approval that changes nothing does not satisfy Article 22.

EU GMP — Annex 16

Annex 16 requires that the Qualified Person personally certify that each batch meets specification. The QP cannot delegate this certification or perform it perfunctorily. AG-170's minimum review time, content access verification, and anomaly acknowledgement requirements directly support the QP's ability to demonstrate that their certification was based on genuine review.

FCA CONC — 5.2 (Creditworthiness Assessment)

CONC 5.2 requires firms to assess creditworthiness based on sufficient information. A human approver who approves a credit decision in 11 seconds without reviewing the underlying documentation has not conducted a creditworthiness assessment based on sufficient information. AG-170 ensures that the review time and content access are proportionate to the decision's complexity.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Process-wide — affects all decisions flowing through the approval workflow

Consequence chain: Without approval quality governance, human approval becomes a rubber stamp, and the governance framework has a human-shaped hole in it. The immediate technical failure is a non-substantive approval — a human clicked "approve" without genuinely reviewing the request. The operational impact compounds over time: approval quality degrades as approvers learn that rubber-stamping has no consequences, approval rates converge toward 100%, and the AI agent's outputs are effectively unreviewed. The risk accumulates until an incident exposes the gap — a defaulted loan that should have been queried, a released batch that should have been held, a published report that should have been corrected. The business consequence includes the direct cost of the incident (loan losses, product recalls, investor claims) plus the regulatory consequence: regulators will examine the approval records and conclude that the human oversight was performative, leading to findings against the firm's governance framework as a whole. In regulated sectors, the consequence extends to personal liability: the QP who rubber-stamped a batch release, the compliance officer who approved a report without reading it, the credit officer who approved a loan in 11 seconds — each faces individual accountability under their respective regulatory regimes.

Cross-references: AG-019 (Human Escalation & Override Triggers) for defining when human approval is required; AG-049 (Governance Decision Explainability) for ensuring the approval request contains sufficient explanation for informed review; AG-006 (Tamper-Evident Record Integrity) for immutable logging of approval decisions; AG-169 (Legal Commitment and Representation Authority Governance) for approvals of legally binding commitments; AG-171 (Exception, Waiver and Compensating-Control Governance) for approvals of exceptions and waivers.

Cite this protocol
AgentGoverning. (2026). AG-170: Approval Quality and Substantive Review Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-170