AG-501

Refund and Remedy Automation Governance

Consumer, Retail & Marketing · ~21 min read · AGS v2.1 · April 2026
Regulatory scope: EU AI Act · SOX · FCA · NIST · ISO 42001

2. Summary

Refund and Remedy Automation Governance requires that AI agents handling consumer refund requests, complaint remediation, warranty claims, or compensation determinations operate within documented fairness boundaries, maintain full decisional traceability, and provide meaningful access to human review at every stage where the automated outcome disadvantages the consumer. Automated refund and remedy systems create acute consumer harm risk because they combine high volume, monetary consequence, and asymmetric information — the system possesses the decision logic while the consumer possesses only the outcome. This dimension mandates that every automated refund or remedy decision is auditable, that decisional criteria are applied consistently across consumer segments, and that consumers who receive adverse outcomes are informed of and given access to a human review pathway within defined time limits.

3. Example

Scenario A — Algorithmic Refund Denial Disproportionately Affects Elderly Consumers: A major home goods retailer deploys an AI agent to handle returns and refunds across its online and telephone channels. The agent uses a composite trust score incorporating purchase history frequency, return frequency ratio, account age, and channel metadata. Consumers who contact via telephone (as opposed to the mobile application) receive a lower channel-engagement score because the model was trained on data correlating app engagement with lower fraud risk. Over 14 months, 68% of telephone-channel refund requests from consumers aged 65 and older are routed to a "manual review queue" that averages 22 business days for resolution, compared with a 94% instant-approval rate for app-channel requests. The manual review queue is understaffed and effectively functions as a denial mechanism — 41% of queued requests are abandoned by consumers who assume the refund has been denied. An investigative report reveals that 12,400 elderly consumers were denied £2.3 million in legitimate refunds due to the channel-correlated scoring bias. The retailer faces a £4.7 million regulatory penalty and £2.3 million in retrospective remediation.

What went wrong: The refund automation system used a proxy variable (channel of contact) that correlated with a protected characteristic (age). No fairness monitoring compared approval rates across demographic segments or contact channels. The manual review queue had no SLA enforcement and no mechanism to notify consumers of queue status or expected resolution time. The system lacked any periodic audit of decisional outcomes by consumer segment.

Scenario B — Cross-Border Remedy Inconsistency Creates Regulatory Exposure: An electronics manufacturer operates a single AI agent for warranty claims across 11 EU member states. The agent applies a uniform defect-assessment algorithm that requires photographic evidence uploaded through the claims portal. Consumers in three member states (Romania, Bulgaria, and Greece) have statistically lower broadband speeds and higher rates of mobile-only internet access, resulting in a 34% higher rate of evidence-upload failures. The agent classifies these incomplete submissions as "insufficient evidence" and issues automated denials. Over 8 months, 4,200 consumers in these three countries receive warranty denials at a rate 2.8 times higher than the EU average. The manufacturer's uniform algorithm produces non-uniform outcomes that violate the EU Consumer Rights Directive requirement for equivalent remedy access. National consumer protection authorities in Romania and Bulgaria open parallel investigations, resulting in combined penalties of €1.9 million and a mandatory remediation programme costing €860,000.

What went wrong: The automated remedy system applied a technically uniform process that produced substantively non-uniform outcomes due to infrastructure disparities across jurisdictions. No cross-jurisdictional outcome analysis was performed. The system did not offer alternative evidence submission pathways for consumers experiencing upload failures. Automated denial notices did not disclose the specific reason for denial or the consumer's right to human review, violating adverse action notice requirements.

Scenario C — Refund Velocity Caps Punish Legitimate High-Volume Returns: A fashion retailer's AI agent enforces an undisclosed "refund velocity cap" that automatically flags and delays any consumer account exceeding three refund requests within 30 days. A consumer purchases 8 items during a promotional event, finds that 5 items do not match the online descriptions, and submits 5 legitimate return requests over 10 days. The third request triggers the velocity cap. The fourth and fifth requests are automatically denied with a generic message: "We are unable to process your request at this time." The consumer is not informed of the velocity cap, the specific reason for denial, or the availability of human review. The consumer files a chargeback through their payment provider for £340, resulting in the retailer incurring £340 in reversed charges plus £45 in chargeback fees plus a chargeback-ratio increase that triggers enhanced monitoring by the payment processor. Across the retailer's customer base, the undisclosed velocity cap generates 2,800 chargebacks per quarter (£1.12 million in reversed charges and £126,000 in chargeback fees), materially exceeding the fraud losses the cap was designed to prevent (estimated at £310,000 per quarter).

What went wrong: The refund automation system applied a hidden decisional rule (velocity cap) without consumer disclosure, without explaining denials, and without offering an appeal pathway. The rule was economically counterproductive — it generated more losses through chargebacks than it prevented through fraud reduction. No cost-benefit analysis was performed on the automated rule, and no consumer impact assessment measured the downstream consequences of the cap.

4. Requirement Statement

Scope: This dimension applies to any AI agent that participates in the determination, processing, modification, or denial of consumer refund requests, warranty claims, compensation offers, complaint remediation outcomes, goodwill gestures, or any monetary or non-monetary remedy provided to a consumer in response to a product or service deficiency. The scope includes agents that make final determinations, agents that recommend determinations for human approval, agents that triage claims into processing queues, and agents that communicate remedy outcomes to consumers. An agent that merely routes a refund request to a human handler without influencing the routing logic or the outcome is outside scope. An agent that applies scoring, classification, prioritisation, or any decisional logic to the request is in scope. Cross-border deployments where the same agent serves consumers in multiple jurisdictions must satisfy the requirements of this dimension for every jurisdiction served.

4.1. A conforming system MUST maintain a complete decision journal for every automated refund or remedy determination, recording the consumer's request, all input data considered, every decisional criterion applied, the weighting or scoring of each criterion, the outcome, and the timestamp — sufficient for an independent reviewer to reconstruct the decision from the journal alone.
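
A minimal sketch of what a 4.1-conformant journal entry might capture. The field names and structure are illustrative assumptions, not mandated by this protocol; the point is that every element 4.1 lists (request, inputs, criteria with weightings, outcome, timestamp) is recorded at the moment of decision:

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class DecisionJournalEntry:
    """One record per automated refund/remedy determination (4.1)."""
    request_id: str
    consumer_request: str       # what the consumer asked for
    inputs: dict                # all input data considered
    criteria: dict              # criterion -> (value, weight, score)
    outcome: str                # "approved" | "denied" | "partial" | "queued"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

# Hypothetical entry for a telephone-channel refund request:
entry = DecisionJournalEntry(
    request_id="R-102",
    consumer_request="refund of 42.50 for damaged kettle",
    inputs={"order_value": 42.50, "channel": "telephone",
            "account_age_days": 910},
    criteria={"return_frequency_ratio": (0.10, 0.4, 0.96),
              "channel_engagement": (0.20, 0.6, 0.55)},
    outcome="approved",
)
record = entry.to_json()  # append to an immutable audit log
```

Because every criterion and its weighting is serialised with the outcome, an independent reviewer can reconstruct the decision from the record alone, which is the test 4.1 sets.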

4.2. A conforming system MUST apply refund and remedy decisional criteria consistently across all consumer segments, and MUST perform periodic fairness analysis (at minimum quarterly) comparing approval rates, processing times, and remedy values across protected characteristic proxies, contact channels, geographic regions, and account-age cohorts.
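
A sketch of the segment-level comparison 4.2 calls for, using approval rate as the metric. The 15% gap threshold and the segment labels are illustrative; Scenario A's channel split (roughly 94% instant approval via app versus far lower for telephone) shapes the example:

```python
from collections import defaultdict

def approval_rates(decisions):
    """decisions: iterable of (segment, approved: bool) pairs.
    Returns segment -> approval rate."""
    counts = defaultdict(lambda: [0, 0])   # segment -> [approved, total]
    for segment, approved in decisions:
        counts[segment][0] += int(approved)
        counts[segment][1] += 1
    return {seg: a / t for seg, (a, t) in counts.items()}

def flag_disparities(rates, max_relative_gap=0.15):
    """Flag segments whose approval rate falls more than
    `max_relative_gap` (relative) below the best-performing segment."""
    best = max(rates.values())
    return {seg: r for seg, r in rates.items()
            if best > 0 and (best - r) / best > max_relative_gap}

rates = approval_rates([("app", True)] * 94 + [("app", False)] * 6
                       + [("telephone", True)] * 32
                       + [("telephone", False)] * 68)
flagged = flag_disparities(rates)   # telephone channel is flagged
```

The same comparison should be run for processing times and remedy values, and across regions and account-age cohorts, per 4.2.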

4.3. A conforming system MUST provide every consumer who receives a full or partial denial of a refund, warranty claim, or remedy request with: (a) a specific explanation of the reason for denial in plain language, (b) the consumer's right to request human review, and (c) a defined process and timeline for that human review — all within the same communication as the denial.
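
A trivial completeness gate over a denial communication, sketching how 4.3 can be enforced before a notice is sent. The element names are assumptions for illustration:

```python
# The three 4.3 elements, plus the timeline that must accompany the process.
REQUIRED_ELEMENTS = ("reason", "review_right", "review_process", "review_timeline")

def denial_notice_complete(notice: dict) -> bool:
    """4.3: a denial must carry a plain-language reason, notice of the
    right to human review, and the review process and timeline, all in
    the same communication. Block dispatch if any element is missing."""
    return all(notice.get(element) for element in REQUIRED_ELEMENTS)

ok = denial_notice_complete({
    "reason": "Return submitted outside the 30-day return window.",
    "review_right": True,
    "review_process": "Reply to this message or call our review team.",
    "review_timeline": "A human reviewer will respond within 5 business days.",
})
```

A generic message such as "We are unable to process your request at this time" (Scenario C) fails this gate immediately, since it carries none of the required elements.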

4.4. A conforming system MUST complete human review of escalated refund or remedy denials within a defined service-level agreement, which MUST NOT exceed 5 business days for standard claims or 2 business days for claims involving financial hardship or vulnerability indicators.
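
A sketch of the SLA clock in 4.4, counting business days between escalation and today. Public holidays are ignored here for brevity; a production calendar would include them:

```python
from datetime import date, timedelta

def business_days_between(start: date, end: date) -> int:
    """Count Mon-Fri days strictly after `start`, up to and including
    `end` (public holidays ignored in this sketch)."""
    days, d = 0, start
    while d < end:
        d += timedelta(days=1)
        if d.weekday() < 5:      # 0-4 = Mon-Fri
            days += 1
    return days

def sla_breached(escalated_on: date, today: date, vulnerable: bool) -> bool:
    """4.4: 5 business days for standard claims, 2 for claims with
    financial-hardship or vulnerability indicators."""
    limit = 2 if vulnerable else 5
    return business_days_between(escalated_on, today) > limit

# Escalated Monday 2026-04-06; the following Tuesday is 6 business days on.
breached = sla_breached(date(2026, 4, 6), date(2026, 4, 14), vulnerable=False)
```

Running this check on every open escalation daily, rather than at closure, lets the organisation intervene before the SLA is missed rather than report the breach afterwards.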

4.5. A conforming system MUST NOT apply undisclosed decisional rules — including velocity caps, trust scores, return-frequency penalties, or channel-based scoring differentials — that affect refund or remedy outcomes without prior disclosure to the consumer of the existence and general operation of such rules.

4.6. A conforming system MUST perform cross-jurisdictional outcome analysis at least quarterly for deployments serving consumers in multiple jurisdictions, identifying statistically significant disparities in approval rates, processing times, or remedy values between jurisdictions and investigating root causes for any disparity exceeding 15% relative difference.
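
The 15% relative-difference trigger in 4.6 can be sketched as a comparison of each jurisdiction's metric against a fleet-wide baseline. The denial-rate figures below are illustrative, shaped after Scenario B, where three member states denied at roughly 2.8 times the EU average:

```python
def jurisdiction_disparities(metrics, baseline, threshold=0.15):
    """metrics: jurisdiction -> metric value (e.g. denial rate).
    Returns jurisdictions whose relative difference from `baseline`
    exceeds `threshold`, triggering a root-cause investigation (4.6)."""
    return {j: v for j, v in metrics.items()
            if baseline > 0 and abs(v - baseline) / baseline > threshold}

# Illustrative quarterly denial rates against a 10% fleet-wide baseline:
denial_rates = {"DE": 0.095, "FR": 0.105, "RO": 0.28, "BG": 0.29, "GR": 0.27}
flagged = jurisdiction_disparities(denial_rates, baseline=0.10)
# RO, BG, and GR exceed the threshold and require investigation.
```

The same sweep should cover processing times and remedy values, since a jurisdiction can show equal approval rates while taking three times as long to pay out.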

4.7. A conforming system MUST implement monetary thresholds above which automated refund denials require human confirmation before the denial is communicated to the consumer, with the threshold documented and reviewed at least annually.
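
A minimal gate for the 4.7 monetary threshold. The 250 figure is purely illustrative; the protocol requires only that the threshold is documented and reviewed at least annually:

```python
def requires_human_confirmation(outcome: str, amount: float,
                                threshold: float = 250.0) -> bool:
    """4.7: automated denials at or above the documented monetary
    threshold must be held for human confirmation before the denial
    is communicated to the consumer. Threshold value is illustrative."""
    return outcome == "denied" and amount >= threshold

hold = requires_human_confirmation("denied", 340.0)   # held for review
```

The important design point is that the gate sits between the decision and the consumer communication: the automated denial is recorded in the journal but not dispatched until a human confirms it.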

4.8. A conforming system SHOULD perform periodic cost-benefit analysis of automated refund rules, comparing the cost of fraud or abuse prevented by each rule against the cost of legitimate claims denied, chargebacks generated, customer attrition, and remediation exposure created by the rule.
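
The 4.8 cost-benefit comparison reduces to simple arithmetic once the cost lines are measured. Using Scenario C's quarterly figures for the velocity cap (fraud prevented of roughly 310,000 against 1.12 million in reversed charges and 126,000 in chargeback fees):

```python
def rule_net_benefit(fraud_prevented, chargebacks, chargeback_fees,
                     legit_denial_remediation=0.0, attrition_cost=0.0):
    """4.8: net quarterly benefit of an automated refund rule.
    A negative result means the rule costs more than it prevents."""
    return fraud_prevented - (chargebacks + chargeback_fees
                              + legit_denial_remediation + attrition_cost)

# Scenario C velocity cap, per quarter (GBP):
net = rule_net_benefit(fraud_prevented=310_000,
                       chargebacks=1_120_000,
                       chargeback_fees=126_000)
# net is -936,000: the cap is economically counterproductive even
# before remediation exposure and customer attrition are counted.
```

Remediation and attrition costs are left at zero here only because Scenario C does not quantify them; in practice they push the result further negative.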

4.9. A conforming system SHOULD implement real-time anomaly detection for sudden shifts in approval rates, denial rates, or average processing times, triggering investigation when any metric deviates by more than two standard deviations from its rolling 30-day baseline.
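
The two-standard-deviation trigger in 4.9 can be sketched directly. The history list stands in for the rolling 30-day baseline of a monitored metric such as the daily approval rate:

```python
from statistics import mean, stdev

def anomaly(history, latest):
    """4.9: flag `latest` if it deviates by more than two standard
    deviations from the rolling baseline in `history` (e.g. the
    previous 30 daily approval rates)."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(latest - mu) > 2 * sigma

baseline = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91, 0.90, 0.89, 0.91, 0.90]
alert = anomaly(baseline, 0.72)   # sudden approval-rate drop: investigate
```

A stable baseline makes the trigger sensitive: here the baseline varies by about one percentage point, so an approval rate of 72% fires the alert immediately, which is exactly the kind of sudden shift a newly deployed or newly biased rule produces.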

4.10. A conforming system MAY implement consumer-facing transparency dashboards that allow consumers to view the status of their refund or remedy request, the stage of processing, and the expected resolution timeline.

5. Rationale

Automated refund and remedy systems represent one of the highest-concentration consumer harm surfaces in AI-mediated commerce. Unlike product recommendations or marketing personalisation, refund and remedy decisions directly determine whether a consumer recovers money they are legally entitled to. The asymmetry is structural: the organisation controls the decisional logic, the data inputs, the scoring algorithms, and the communication of outcomes, while the consumer sees only the result — approved or denied — with no visibility into why or how the decision was made.

Three categories of harm emerge from ungoverned refund automation. First, systematic bias: automated systems that use proxy variables correlated with protected characteristics (contact channel correlating with age, upload capability correlating with geography, account age correlating with socioeconomic status) can produce discriminatory outcomes at scale without any individual decision appearing discriminatory. The harm compounds over time — a biased system denying thousands of claims per month generates cumulative consumer harm far exceeding what any individual human decision-maker could produce. Second, due process failure: consumers denied refunds by automated systems frequently receive no explanation, no appeal pathway, and no awareness that the decision was automated. This violates the foundational principle that consumers adversely affected by automated decisions have the right to understand and challenge those decisions. Third, economic irrationality: automated rules designed to prevent fraud or abuse frequently generate greater costs (chargebacks, remediation, regulatory penalties, customer attrition) than they prevent. Without cost-benefit governance, organisations deploy automated refund controls that are net-negative for both the consumer and the organisation.

The regulatory context is increasingly explicit. The EU Consumer Rights Directive requires that consumers have access to effective remedy mechanisms. The FCA Consumer Duty (applicable to financial products) requires that firms deliver good outcomes for retail customers and that customer support does not create unreasonable barriers. The EU AI Act classifies systems that influence decisions affecting consumer rights as requiring enhanced transparency and human oversight. National consumer protection legislation across jurisdictions universally requires that refund and warranty obligations are honoured without unreasonable barriers. An automated system that systematically creates barriers to legitimate refunds — through biased scoring, undisclosed rules, inadequate appeals, or non-functional review queues — violates these obligations at scale.

The governance challenge is compounded by the volume and velocity of automated refund decisions. A large retailer may process 50,000 refund requests per day through AI-mediated channels. At this scale, even a small percentage of unfair denials represents thousands of affected consumers per week. Manual auditing of individual decisions is infeasible; systematic governance through fairness monitoring, decisional transparency, and structured escalation is the only viable approach.

6. Implementation Guidance

Refund and Remedy Automation Governance requires a layered approach: decisional transparency at the individual claim level, fairness monitoring at the population level, and structured escalation when automated outcomes are adverse. The core principle is that automation should improve the speed and consistency of refund processing without reducing the fairness or reviewability of outcomes compared with human decision-making.

Recommended patterns:

- Capture the decision journal at the point of decision, not retrospectively, so every criterion and weighting is recorded exactly as applied (4.1).
- Operate segment-level fairness dashboards with automated alerting across contact channels, geographic regions, and account-age cohorts (4.2).
- Pair every denial with a plain-language reason, the right to human review, and the review process and timeline in the same communication (4.3).
- Enforce SLA-monitored review queues with consumer-visible status and expected resolution times (4.4).
- Disclose the existence and general operation of decisional rules before they are applied (4.5).
- Conduct a pre-deployment impact assessment and periodic cost-benefit analysis for every automated refund rule (4.8).

Anti-patterns to avoid:

- Proxy variables (contact channel, upload capability, account age) that correlate with protected characteristics.
- Undisclosed velocity caps, trust scores, or return-frequency penalties that silently delay or deny legitimate requests.
- Manual review queues with no SLA enforcement and no consumer status notification, which function as de facto denial mechanisms.
- Generic denial messages ("we are unable to process your request at this time") that provide no reason and no appeal pathway.
- Technically uniform processes that produce substantively non-uniform outcomes across jurisdictions with different infrastructure.

Industry Considerations

Retail and E-Commerce. High-volume environments with tens of thousands of refund decisions daily. The primary risks are channel bias (app versus web versus telephone), velocity caps that punish legitimate seasonal purchasing behaviour, and evidence requirements that disadvantage consumers with limited digital literacy. Retailers should implement seasonal adjustment of automated rules to account for purchasing spikes during promotional periods.

Financial Services. Refund and remedy decisions for financial products (insurance claims, disputed charges, compensation for mis-selling) carry heightened regulatory scrutiny under FCA Consumer Duty and equivalent regimes. Financial-value agents processing refunds above defined thresholds must apply enhanced due diligence including mandatory human confirmation. The decision journal must satisfy FCA SYSC record-keeping requirements and be producible within regulatory timelines.

Travel and Hospitality. Cross-border remedy obligations vary significantly across jurisdictions (EU Package Travel Directive, UK Consumer Rights Act, US state-level consumer protection). Agents serving multiple jurisdictions must maintain jurisdiction-specific remedy logic and cannot apply a lowest-common-denominator approach to consumer rights.

Telecommunications. Automated bill-credit and service-remedy decisions are subject to sector-specific regulation by national telecommunications authorities. Escalation to human review may be a regulatory requirement rather than a governance best practice.

Maturity Model

Basic Implementation — The organisation maintains decision journals for all automated refund and remedy determinations. Every denial communication includes a specific reason and a pathway to human review. Human review SLAs are defined and monitored. Undisclosed decisional rules are documented and disclosed to consumers prior to application. Quarterly fairness analysis compares approval rates across contact channels and geographic regions.

Intermediate Implementation — All basic capabilities plus: segment-level fairness dashboards with automated alerting are operational. Pre-deployment rule impact assessments are conducted for all new automated rules. Cross-jurisdictional outcome analysis identifies and investigates disparities. Real-time anomaly detection monitors for sudden shifts in approval or denial rates. Cost-benefit analysis is performed annually for all active automated refund rules.

Advanced Implementation — All intermediate capabilities plus: the decision journal supports automated root-cause analysis for denied claims, enabling identification of systemic issues without manual case review. Fairness monitoring incorporates intersectional analysis (e.g., channel AND region AND account age). Consumer-facing transparency dashboards provide real-time claim status. Independent audits of refund automation fairness are conducted annually with results published to stakeholders. Dynamic rule adjustment modifies automated thresholds based on observed outcome fairness, with human approval required for any adjustment.

7. Evidence Requirements

Required artefacts:

- Decision journals for every automated refund or remedy determination (4.1).
- Quarterly fairness analysis reports covering approval rates, processing times, and remedy values by segment (4.2).
- Denial communication templates and samples demonstrating the reason, review right, and review timeline (4.3).
- Human review SLA definitions and SLA performance records (4.4).
- Consumer-facing disclosures of decisional rules (4.5).
- Cross-jurisdictional outcome analyses and root-cause investigations for disparities exceeding the 15% threshold (4.6).
- Documented monetary thresholds and records of their annual review (4.7).
- Cost-benefit analyses of active automated refund rules (4.8).

Retention requirements: Decision journals and fairness analyses must be retained for the period required by the applicable record-keeping regime in each jurisdiction served (for financial services deployments, FCA SYSC record-keeping requirements apply) and must remain producible within regulatory timelines.

Access requirements: Decision journals must be accessible to independent reviewers in sufficient detail to reconstruct any decision (4.1), to internal and external auditors, and to regulators on request within the timelines those regulators define.

8. Test Specification

Test 8.1: Decision Journal Completeness — Sample recent automated determinations and verify that an independent reviewer can reconstruct each decision from the journal alone: the consumer's request, all inputs, every criterion and its weighting, the outcome, and the timestamp (4.1).

Test 8.2: Denial Communication Compliance — Inspect denial communications and verify that each contains a plain-language reason, notice of the right to human review, and a defined review process and timeline, all in the same communication (4.3).

Test 8.3: Human Review SLA Enforcement — Verify that escalated denials are resolved within 5 business days for standard claims and 2 business days for claims carrying financial-hardship or vulnerability indicators (4.4).

Test 8.4: Fairness Analysis Detection of Segment Disparities — Introduce or identify a known approval-rate disparity across contact channels, regions, or account-age cohorts and verify that the quarterly fairness analysis detects and reports it (4.2).

Test 8.5: Undisclosed Rule Prevention — Enumerate all decisional rules affecting refund or remedy outcomes (velocity caps, trust scores, return-frequency penalties, channel-based scoring) and verify that each is disclosed to consumers before application (4.5).

Test 8.6: Cross-Jurisdictional Outcome Consistency — Verify that cross-jurisdictional outcome analysis is performed at least quarterly and that any disparity exceeding 15% relative difference has a documented root-cause investigation (4.6).

Test 8.7: Monetary Threshold Human Confirmation — In a test environment, generate an automated denial above the documented monetary threshold and verify that it is held for human confirmation before any communication reaches the consumer (4.7).

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 14 (Human Oversight) | Direct requirement
EU AI Act | Article 86 (Right to Explanation) | Direct requirement
FCA Consumer Duty | PRIN 2A (Consumer Duty Principle) | Direct requirement
FCA Consumer Duty | PRIN 2A.6 (Consumer Support) | Direct requirement
SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance
NIST AI RMF | GOVERN 1.5, MAP 5.1, MANAGE 1.3 | Supports compliance
ISO 42001 | Clause 9.1 (Monitoring, Measurement, Analysis) | Supports compliance
DORA | Article 11 (ICT-related Incident Response) | Supports compliance

EU AI Act — Article 14 (Human Oversight)

Article 14 requires that high-risk AI systems are designed to allow effective human oversight, including the ability to override automated decisions. Automated refund and remedy systems that deny consumer claims based on algorithmic scoring are exercising a function that materially affects consumer rights. The human review escalation requirements of AG-501 directly implement Article 14's mandate for human oversight of automated decisions that affect individuals. The monetary threshold requirement (4.7) ensures that higher-value denials receive mandatory human confirmation, providing a structural human oversight mechanism proportionate to financial impact.

EU AI Act — Article 86 (Right to Explanation)

Article 86 establishes that individuals affected by AI-informed decisions have the right to an explanation of the decision. AG-501's requirement for specific denial reasons in plain language (4.3) and complete decision journals (4.1) directly implement this right. The decision journal enables the organisation to provide explanations that are specific, accurate, and traceable to the decisional criteria actually applied, rather than generic post-hoc rationalisations.

FCA Consumer Duty — PRIN 2A (Consumer Duty Principle) and PRIN 2A.6 (Consumer Support)

The FCA Consumer Duty requires firms to act to deliver good outcomes for retail customers and to provide support that meets customers' needs. PRIN 2A.6 specifically addresses consumer support, requiring that support does not create unreasonable barriers to customers pursuing their interests. An automated refund system that denies legitimate claims through biased scoring, provides no explanation, or routes consumers to non-functional review queues creates exactly the type of unreasonable barrier that PRIN 2A.6 prohibits. AG-501's requirements for explanation, escalation, SLA enforcement, and fairness monitoring directly support Consumer Duty compliance. The FCA has explicitly stated that firms cannot use automation to provide worse outcomes than would be available through human service channels.

SOX — Section 404 (Internal Controls Over Financial Reporting)

For organisations where refund and remedy obligations are material to financial reporting (retailers with significant return provisions, insurers with claims reserves), the accuracy and consistency of automated refund decisions directly affects financial statement integrity. Systematic over-denial inflates revenue recognition; systematic over-approval depletes reserves. AG-501's decision journals and fairness monitoring provide the internal control framework that SOX auditors require to assess whether automated refund systems produce reliable and consistent financial outcomes.

NIST AI RMF — GOVERN 1.5, MAP 5.1, MANAGE 1.3

GOVERN 1.5 addresses ongoing monitoring of AI systems. MAP 5.1 addresses the identification of impacts on individuals. MANAGE 1.3 addresses responses to identified risks. AG-501's fairness monitoring, anomaly detection, and escalation requirements map directly to these functions, ensuring that automated refund systems are monitored for consumer impact and that identified disparities trigger investigation and remediation.

DORA — Article 11 (ICT-related Incident Response)

For financial institutions, a systematic failure in automated refund processing — such as the discovery that a biased algorithm has been denying legitimate claims for months — constitutes an ICT-related incident requiring structured response. AG-501's monitoring and fairness analysis requirements support early detection of such incidents, and the decision journal provides the forensic evidence needed for incident investigation and remediation.

10. Failure Severity

Field | Value
Severity Rating | High
Blast Radius | Consumer-population-wide — affects every consumer who interacts with the automated refund or remedy system, with disproportionate impact on vulnerable or disadvantaged consumer segments

Consequence chain: Ungoverned automated refund systems produce a characteristic failure cascade. The initial failure is invisible bias or unfairness in decisional criteria — a scoring model that correlates with protected characteristics, a velocity cap that punishes legitimate behaviour, or an evidence requirement that disadvantages specific populations. Because the system operates at high volume (thousands or tens of thousands of decisions per day), the bias compounds rapidly. Within weeks, thousands of consumers have been unfairly denied refunds. The secondary failure is due process collapse — consumers receive no explanation, no escalation pathway, and no awareness that a human review is available, so the unfair denial stands unchallenged. The tertiary failure is economic irrationality — unfairly denied consumers pursue chargebacks, file regulatory complaints, post negative reviews, and attrite from the customer base, generating costs that exceed the fraud losses the automated rules were designed to prevent. The regulatory consequence materialises when a consumer advocacy group, journalist, or regulator identifies the pattern: mass remediation (refunding all affected consumers), regulatory penalty (potentially millions in fines under consumer protection legislation), reputational damage (media coverage of algorithmic bias against vulnerable consumers), and mandatory governance remediation (implementing the controls that should have been in place from the start). In financial services, the consequence includes FCA Consumer Duty enforcement action for failure to deliver good outcomes and potential requirement to appoint a skilled person under Section 166 of FSMA 2000.

Cross-references: AG-019 (Human Escalation & Override Triggers) provides the escalation framework that AG-501 requires for refund denials. AG-453 (Adverse Action Notice Governance) governs the notice requirements that AG-501 extends to refund-specific contexts. AG-499 (Personalised Pricing Fairness Governance) addresses pricing fairness that may interact with refund determinations. AG-503 (Complaint Triage and Human Handoff Governance) governs the triage process for complaints that may originate from refund denials. AG-504 (Consumer Disclosure Timing Governance) governs when automated decisional rules must be disclosed to consumers. AG-464 (Reconciliation Break Escalation Governance) addresses financial reconciliation breaks that may arise from refund processing errors. AG-384 (Stateful Rollback Semantics Governance) governs the rollback mechanisms that may be required to reverse erroneous automated refund decisions. AG-415 (Decision Journal Completeness Governance) provides the general decision journal framework that AG-501 specialises for refund and remedy contexts.

Cite this protocol
AgentGoverning. (2026). AG-501: Refund and Remedy Automation Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-501