AG-170

Approval Quality and Substantive Review Governance

Execution Integrity, Accountability & Approval Quality · AGS v2.1 · April 2026

2. Summary

Approval Quality and Substantive Review Governance requires that every human approval in an AI agent governance workflow represents a genuine, informed decision — not a rubber stamp. As organisations deploy AI agents that escalate decisions to human approvers, a critical vulnerability emerges: the human approval becomes a formality rather than a substantive review. Approvers click "approve" without reading the request, review a summary without examining the underlying data, or approve within seconds of receiving the request — all patterns indicating that the human oversight is performative rather than meaningful. AG-170 addresses this by requiring structural controls that measure, enforce, and verify the quality of human approvals. The principle is that a governance framework that routes decisions to human approvers is only as strong as the rigour of those approvals. If the human layer is a rubber stamp, the governance framework has a human-shaped hole in it.

3. Example

Scenario A — Rapid-Fire Approval of AI-Generated Loan Decisions: A retail bank deploys an AI agent that underwrites consumer loans up to GBP 50,000. Loans exceeding GBP 25,000 require human approval per the bank's credit risk policy. The agent processes 400 loan applications per day, of which approximately 120 exceed the GBP 25,000 threshold and are routed to a team of 4 human approvers. Each approver receives 30 approval requests per day. The agent presents each request with a summary: "Applicant: [name]. Amount: GBP [amount]. AI recommendation: Approve. Risk score: [score]. Approval required."

Over 3 months, the approval rate is 97.8%. Average time-to-approval is 11 seconds. No approver has ever requested additional information. No approver has modified any loan terms. An internal audit samples 50 approved loans and finds that 8 had risk indicators that should have triggered further inquiry — income inconsistencies, recent address changes combined with high loan amounts, and employer verification failures. The approvers approved them all in under 15 seconds.

What went wrong: The approval workflow was designed for throughput, not quality. The AI's recommendation ("Approve") anchored the human decision. The summary excluded the risk indicators that would have prompted further inquiry. The 11-second average approval time is physically incompatible with substantive review — the minimum time to read the summary, assess the recommendation, and make an informed decision is estimated at 90-120 seconds. The approval was performative. Consequence: 8 loans totalling GBP 287,000 approved without substantive review, estimated GBP 115,000 in expected credit losses from those loans, FCA enforcement risk for inadequate creditworthiness assessment under CONC 5.2.

Scenario B — Approval Fatigue in Safety-Critical System: A pharmaceutical manufacturer uses an AI agent to manage batch release for a biologic drug. Each batch undergoes automated quality testing, and the agent compiles test results into a release recommendation. A qualified person (QP) must review and approve each batch release — a regulatory requirement under the EU GMP Annex 16. The agent processes 15 batches per day, and the QP approves each within 2-3 minutes.

Over time, the QP develops approval fatigue. When Batch 2024-0847 arrives with an out-of-specification endotoxin reading (0.28 EU/mL against a limit of 0.25 EU/mL), the agent correctly flags the result but presents it among 47 other in-specification results. The QP approves the batch in 2 minutes and 14 seconds. The batch is released and administered to 340 patients. 12 patients develop pyrogenic reactions. The MHRA investigation reveals that the QP's approval time, approval rate (100% over 6 months), and lack of any queries or rejections indicate inadequate review.

What went wrong: The approval workflow buried the critical information (one out-of-specification result among 47 in-specification results). The QP's 100% approval rate over 6 months with no queries or rejections is statistical evidence of rubber-stamping. No mechanism enforced minimum review time, required acknowledgement of flagged anomalies, or varied the presentation to maintain the QP's attention. Consequence: 12 patients harmed, batch recall costing EUR 4.7 million, MHRA regulatory action against the manufacturing authorisation, personal liability for the QP.

Scenario C — Approval Without Access to Underlying Data: An AI agent in an asset management firm prepares quarterly investor reports. Each report requires compliance officer approval before distribution. The agent generates a 40-page report with performance figures, risk metrics, and regulatory disclosures. The compliance officer receives a notification: "Report ready for approval. Summary: Q3 performance report for Fund X. 40 pages. Risk level: standard." The compliance officer approves based on the summary without opening the report.

The report contains a performance figure calculated using a methodology that was changed mid-quarter without disclosure — the Q2 report used time-weighted returns and the Q3 report uses money-weighted returns, making the performance appear 3.2 percentage points better than it would under the original methodology. The methodology change is disclosed on page 34 in a footnote. The compliance officer, who approved without reading, did not detect the undisclosed methodology change.

What went wrong: The compliance officer approved without accessing the underlying document. The approval workflow did not require the approver to demonstrate access to the full content. No mechanism flagged the methodology change as a material disclosure issue requiring explicit review. Consequence: Misleading performance reporting to investors, FCA enforcement for COBS 4.2 breach (fair, clear, and not misleading communications), potential investor claims for GBP 12 million (the net new investment attracted by the inflated performance figure).

4. Requirement Statement

Scope: This dimension applies to all AI governance workflows that include a human approval step. This includes but is not limited to: transaction approvals, content review and release, escalation handling, exception approvals, batch release, regulatory submission review, and any governance control that requires a human to review and approve an AI agent's proposed action or output before it takes effect. The scope covers both formal approval workflows (structured approval queues with defined authority levels) and informal approval workflows (an AI agent asking a human "shall I proceed?" via chat or email). The test for inclusion is: does the governance framework rely on a human reviewing and approving something before the agent acts? If yes, the quality of that approval is within AG-170's scope.

4.1. A conforming system MUST enforce a minimum review period proportional to the complexity of the approval request, below which approval is not accepted (e.g., minimum 60 seconds for standard requests, minimum 300 seconds for high-complexity requests).

4.2. A conforming system MUST require the approver to access the underlying data or document, not just the summary, before approval is accepted — verified through content access logging (e.g., the approver opened the full document, scrolled to at least 80% of the content, or accessed specific flagged sections).

4.3. A conforming system MUST present risk indicators, anomalies, and flagged items prominently and require explicit acknowledgement of each before approval is accepted, preventing approval without awareness of known concerns.

4.4. A conforming system MUST track approval quality metrics per approver — including approval rate, average review time, percentage of requests where additional information was requested, and percentage of approvals that subsequently required remediation — and flag approvers whose patterns indicate rubber-stamping.

4.5. A conforming system MUST vary the presentation of approval requests to prevent pattern-based auto-approval — for example, occasionally reordering sections, changing the visual layout, or inserting synthetic test requests that have a known correct decision.

4.6. A conforming system MUST NOT present the AI agent's recommendation in a way that anchors the approver's decision. For example, showing "AI recommends: Approve" before the approver has reviewed the underlying data biases the decision toward approval; the recommendation SHOULD be disclosed only after the approver has engaged with the data.

4.7. A conforming system MUST log every approval decision with the approver's identity, the time spent reviewing, the content accessed, and any anomalies acknowledged, in a tamper-evident record per AG-006.

4.8. A conforming system SHOULD implement periodic calibration exercises where approvers review cases with known correct decisions, and their performance is measured against the known answers.

4.9. A conforming system SHOULD rotate approvers across request types to prevent familiarity-induced complacency and ensure fresh perspective.

4.10. A conforming system MAY implement dual-approval requirements for high-risk decisions, where two independent approvers must agree before the action proceeds.
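The structural checks in 4.1–4.3 can be combined into a single approval gate. The sketch below is illustrative, not a normative implementation: the class names, the 60/300-second thresholds (taken from the example in 4.1), and the 80% content-access check (from the example in 4.2) are assumptions for demonstration.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ApprovalRequest:
    request_id: str
    complexity: str                                  # "standard" or "high"
    flagged_anomalies: list[str] = field(default_factory=list)

# Illustrative thresholds mirroring the examples in 4.1.
MIN_REVIEW_SECONDS = {"standard": 60, "high": 300}

class ApprovalGate:
    """Rejects approvals that fail the structural checks in 4.1-4.3."""

    def __init__(self, request: ApprovalRequest):
        self.request = request
        self.opened_at = time.monotonic()
        self.content_fraction_viewed = 0.0
        self.acknowledged: set[str] = set()

    def record_scroll(self, fraction: float) -> None:
        # Content access logging (4.2): track how much of the full
        # document the approver has actually viewed.
        self.content_fraction_viewed = max(self.content_fraction_viewed, fraction)

    def acknowledge(self, anomaly: str) -> None:
        # Explicit acknowledgement of each flagged item (4.3).
        self.acknowledged.add(anomaly)

    def approve(self) -> list[str]:
        """Returns a list of violations; approval is accepted only if empty."""
        violations = []
        elapsed = time.monotonic() - self.opened_at
        if elapsed < MIN_REVIEW_SECONDS[self.request.complexity]:
            violations.append("minimum review period not met (4.1)")
        if self.content_fraction_viewed < 0.8:
            violations.append("underlying content not accessed (4.2)")
        missing = set(self.request.flagged_anomalies) - self.acknowledged
        if missing:
            violations.append(f"unacknowledged anomalies (4.3): {sorted(missing)}")
        return violations
```

An 11-second click-through, as in Scenario A, would fail all three checks; the gate only accepts an approval once the review time, content access, and anomaly acknowledgements are all in order.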

5. Rationale

Human approval is the last line of defence in many AI governance frameworks. When an agent's automated controls flag a concern, the typical response is to escalate to a human approver. When a decision exceeds the agent's authority, a human makes the final call. When regulatory requirements demand human oversight, a human signs off. The entire governance architecture depends on the assumption that the human approval is a meaningful, informed decision. AG-170 challenges this assumption — and provides structural controls to make it true.

The problem is well-documented in human factors research. Automation bias — the tendency for humans to over-rely on automated recommendations — increases when the human perceives the automated system as more capable or more consistent than themselves. When an AI agent recommends approval with a confidence score, the human approver faces a cognitive asymmetry: rejecting the recommendation requires active effort, justification, and the risk of being wrong, while accepting it requires only a click. Over time, approval rates converge toward 100% as approvers learn that questioning the AI's recommendation is rarely rewarded and often delayed.

The structural indicators of rubber-stamping are measurable: approval rates above 95% over sustained periods, average review times below the physical minimum for substantive review, zero requests for additional information, and zero modifications to the proposed action. These patterns do not prove that every individual approval was inadequate, but they provide strong evidence that the approval process is not functioning as a genuine control.
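These indicators can be computed directly from approval records. The sketch below is a minimal example; the record schema and the 95% / 90-second thresholds are assumptions drawn from the figures in this section, not normative values.

```python
from statistics import mean

def rubber_stamp_flags(records, min_review_seconds=90):
    """Flag the measurable rubber-stamping indicators for one approver.

    records: list of dicts with keys
      approved (bool), review_seconds (float),
      info_requested (bool), modified (bool).
    """
    n = len(records)
    approval_rate = sum(r["approved"] for r in records) / n
    avg_time = mean(r["review_seconds"] for r in records)
    info_rate = sum(r["info_requested"] for r in records) / n
    mod_rate = sum(r["modified"] for r in records) / n

    flags = []
    if approval_rate > 0.95:
        flags.append("sustained approval rate above 95%")
    if avg_time < min_review_seconds:
        flags.append("average review time below physical minimum")
    if info_rate == 0:
        flags.append("no requests for additional information")
    if mod_rate == 0:
        flags.append("no modifications to proposed actions")
    return flags
```

Run against the Scenario A pattern (97.8% approval, 11-second average, no queries, no modifications), all four flags fire; per 4.4, that approver would be escalated for management review.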

AG-170 addresses this through structural interventions: minimum review times that make rubber-stamping physically impossible, content access verification that ensures the approver saw the data, prominence requirements for risk indicators that prevent critical information from being buried, and quality metrics that detect pattern-based approval behaviour. The goal is not to slow down the approval process for its own sake — it is to ensure that when the governance framework routes a decision to a human, the human's engagement is genuine and informed.

6. Implementation Guidance

Approval quality governance requires a combination of workflow design, behavioural measurement, and continuous calibration. The challenge is to enforce substantive review without creating an approval bottleneck that delays legitimate operations.

Recommended patterns:

- Enforce minimum review times proportional to request complexity, and reject approvals submitted below the threshold (per 4.1).
- Use progressive disclosure: withhold the AI's recommendation until the approver has accessed the underlying data (per 4.6).
- Require explicit acknowledgement of each flagged anomaly before the approve action is enabled (per 4.3).
- Track per-approver quality metrics — approval rate, review time, information requests, remediation rate — and review outliers (per 4.4).
- Inject synthetic test requests with known correct decisions and measure detection accuracy (per 4.5).

Anti-patterns to avoid:

- Presenting "AI recommends: Approve" at the top of every request, anchoring the decision before any review has occurred.
- Burying a single out-of-specification result among dozens of in-specification results, as in Scenario B.
- Allowing approval from a summary notification without requiring the approver to open the underlying document, as in Scenario C.
- Optimising the workflow for throughput and measuring approvers on queue clearance rather than decision quality.

Industry Considerations

Financial Services. FCA COBS 5.2 requires that suitability assessments be based on adequate information. A human approver who approves a suitability recommendation without reviewing the underlying data has not performed an adequate assessment. CONC 5.2 requires creditworthiness assessments to be based on sufficient information — a 15-second approval of a GBP 30,000 loan application is not a sufficient assessment.

Healthcare. EU GMP Annex 16 requires the Qualified Person to certify that each batch meets its release specification. The QP's certification is personal and non-delegable. A QP who rubber-stamps batch releases is not fulfilling their legal obligation, creating personal criminal liability and regulatory risk to the manufacturing authorisation.

Public Sector. Decisions affecting citizens' rights (benefits, immigration, criminal justice risk assessments) require meaningful human review to satisfy Article 22 GDPR (automated decision-making). A rubber-stamp approval of an AI recommendation does not constitute meaningful human intervention — the European Data Protection Board has stated that the human review must be "carried out by someone who has the authority and competence to change the decision."

Maturity Model

Basic Implementation — Minimum review times are enforced for approval requests. The AI recommendation is not shown until the approver accesses the underlying data. Approval rate and average review time are tracked per approver. Approvers whose metrics indicate potential rubber-stamping are flagged for management review. Coverage: all high-value or safety-critical approval workflows.

Intermediate Implementation — All basic capabilities plus: progressive disclosure prevents anchoring. Risk indicators require explicit acknowledgement. Content access verification confirms the approver opened the full document. Synthetic test cases are injected periodically and results are tracked. An approval quality dashboard provides real-time visibility. Approvers are rotated across request types. Coverage: all approval workflows, including informal approvals.

Advanced Implementation — All intermediate capabilities plus: outcome feedback links approval decisions to subsequent outcomes, enabling measurement of actual approval accuracy (not just process compliance). Calibration exercises are conducted quarterly with known-answer cases. Dual approval is implemented for the highest-risk decisions. The organisation can demonstrate to regulators that human approvals represent genuine, informed decisions with measured accuracy rates — not performative compliance.

7. Evidence Requirements

Required artefacts:

- Approval decision records capturing approver identity, time spent reviewing, content accessed, and anomalies acknowledged (per 4.7).
- Per-approver quality metric reports: approval rate, average review time, information-request rate, and remediation rate (per 4.4).
- Synthetic test case results and detection accuracy records (per 4.5).
- Calibration exercise results against known-answer cases (per 4.8).

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Minimum Review Time Enforcement
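A minimal harness for this test submits one approval before and one after the minimum review period and checks that only the second is accepted. The 60-second threshold and the `accept_approval` helper are illustrative assumptions, not normative values.

```python
# Hypothetical threshold for a standard-complexity request (see 4.1).
MIN_REVIEW_SECONDS = 60

def accept_approval(opened_at: float, submitted_at: float) -> bool:
    """Accept the approval only if the elapsed review time meets the minimum."""
    return (submitted_at - opened_at) >= MIN_REVIEW_SECONDS

def test_minimum_review_time_enforcement() -> None:
    opened = 1000.0
    # Scenario A's 11-second click-through must be rejected.
    assert not accept_approval(opened, opened + 11)
    # An approval submitted after a substantive review window is accepted.
    assert accept_approval(opened, opened + 95)

test_minimum_review_time_enforcement()
```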

Test 8.2: Content Access Requirement

Test 8.3: Anomaly Acknowledgement Requirement

Test 8.4: Anti-Anchoring — Recommendation Sequencing

Test 8.5: Rubber-Stamping Detection

Test 8.6: Synthetic Test Case Detection Accuracy

Test 8.7: Approval Decision Logging Completeness
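One way to make the 4.7 log tamper-evident in the sense of AG-006 is a hash chain, where each entry commits to the previous entry's digest so any retroactive edit breaks verification. The sketch below is an assumption-laden illustration (field names and chaining scheme are hypothetical), not a production ledger.

```python
import hashlib
import json

class ApprovalLog:
    """Append-only, hash-chained log of approval decisions (4.7 / AG-006)."""

    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value

    def append(self, approver, request_id, review_seconds,
               content_accessed, anomalies_acknowledged):
        entry = {
            "approver": approver,
            "request_id": request_id,
            "review_seconds": review_seconds,
            "content_accessed": content_accessed,
            "anomalies_acknowledged": anomalies_acknowledged,
            "prev_hash": self._prev_hash,
        }
        # Each entry's hash covers its content plus the previous hash.
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = digest
        self.entries.append(entry)
        self._prev_hash = digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry fails."""
        prev = "0" * 64
        for e in self.entries:
            if e["prev_hash"] != prev:
                return False
            body = {k: v for k, v in e.items() if k != "hash"}
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

A completeness test for 8.7 would append decisions, confirm the chain verifies, then alter a recorded review time and confirm verification fails.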

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 14 (Human Oversight) | Direct requirement
EU AI Act | Article 29 (Obligations of Users of High-Risk AI) | Supports compliance
GDPR | Article 22 (Automated Individual Decision-Making) | Direct requirement
FCA COBS | 5.2 (Suitability) | Supports compliance
FCA CONC | 5.2 (Creditworthiness Assessment) | Supports compliance
EU GMP | Annex 16 (Certification by a Qualified Person) | Direct requirement
SOX | Section 302 (Corporate Responsibility for Financial Reports) | Supports compliance
NIST AI RMF | GOVERN 1.3, MANAGE 4.2 | Supports compliance

EU AI Act — Article 14 (Human Oversight)

Article 14 requires that high-risk AI systems be designed and developed to enable effective human oversight, including by individuals who can "fully understand the capacities and limitations of the high-risk AI system" and "properly interpret the high-risk AI system's output." A rubber-stamp approval does not satisfy these requirements. AG-170 ensures that the human oversight mandated by Article 14 is substantive — the approver demonstrates understanding by engaging with the data, acknowledging risks, and making an informed decision.

GDPR — Article 22

Article 22 gives data subjects the right not to be subject to a decision based solely on automated processing. When organisations use human approval to satisfy Article 22 (i.e., the decision is not "solely" automated because a human approves it), the quality of that approval matters. The EDPB has stated that the human review must involve "someone who has the authority and competence to change the decision." A rubber-stamp approval that changes nothing does not satisfy Article 22.

EU GMP — Annex 16

Annex 16 requires that the Qualified Person personally certify that each batch meets specification. The QP cannot delegate this certification or perform it perfunctorily. AG-170's minimum review time, content access verification, and anomaly acknowledgement requirements directly support the QP's ability to demonstrate that their certification was based on genuine review.

FCA CONC — 5.2 (Creditworthiness Assessment)

CONC 5.2 requires firms to assess creditworthiness based on sufficient information. A human approver who approves a credit decision in 11 seconds without reviewing the underlying documentation has not conducted a creditworthiness assessment based on sufficient information. AG-170 ensures that the review time and content access are proportionate to the decision's complexity.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Process-wide — affects all decisions flowing through the approval workflow

Consequence chain: Without approval quality governance, human approval becomes a rubber stamp, and the governance framework has a human-shaped hole in it. The immediate technical failure is a non-substantive approval — a human clicked "approve" without genuinely reviewing the request. The operational impact compounds over time: approval quality degrades as approvers learn that rubber-stamping has no consequences, approval rates converge toward 100%, and the AI agent's outputs are effectively unreviewed. The risk accumulates until an incident exposes the gap — a defaulted loan that should have been queried, a released batch that should have been held, a published report that should have been corrected. The business consequence includes the direct cost of the incident (loan losses, product recalls, investor claims) plus the regulatory consequence: regulators will examine the approval records and conclude that the human oversight was performative, leading to findings against the firm's governance framework as a whole. In regulated sectors, the consequence extends to personal liability: the QP who rubber-stamped a batch release, the compliance officer who approved a report without reading it, the credit officer who approved a loan in 11 seconds — each faces individual accountability under their respective regulatory regimes.

Cross-references: AG-019 (Human Escalation & Override Triggers) for defining when human approval is required; AG-049 (Governance Decision Explainability) for ensuring the approval request contains sufficient explanation for informed review; AG-006 (Tamper-Evident Record Integrity) for immutable logging of approval decisions; AG-169 (Legal Commitment and Representation Authority Governance) for approvals of legally binding commitments; AG-171 (Exception, Waiver and Compensating-Control Governance) for approvals of exceptions and waivers.

Cite this protocol
AgentGoverning. (2026). AG-170: Approval Quality and Substantive Review Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-170