Deepfake-Resistant Approval Authentication Governance requires that voice, video, and multimodal approval flows used to authorise AI agent governance actions are hardened against impersonation through AI-generated deepfakes. As generative AI matures, synthetic media — cloned voices, face-swapped video, lip-synced impersonations — has become a practical attack vector against approval workflows. An attacker who can generate a convincing video or voice call impersonating a senior approver can authorise mandate changes, emergency overrides, and high-value transactions without ever compromising the approver's credentials. AG-283 addresses this specific and rapidly evolving threat by requiring that approval flows incorporate deepfake detection, out-of-band confirmation, and cryptographic binding that synthetic media cannot replicate.
Scenario A — AI Voice Clone Authorises Emergency Override: An organisation's emergency override procedure for AI agents requires verbal approval from the Head of Operations via a phone call to the governance team. An attacker uses a commercially available voice cloning service (requiring only 30 seconds of training audio sourced from a conference presentation) to generate a real-time voice clone. The attacker calls the governance team, impersonating the Head of Operations, and requests an emergency override that disables transaction limits on a payment agent. The governance team member, recognising the voice, activates the override. The agent processes £2.1 million in fraudulent transactions before the genuine Head of Operations is reached for confirmation.
What went wrong: The approval relied on voice recognition by a human listener — which is demonstrably unreliable against modern voice cloning. No technical deepfake detection was applied to the call. No out-of-band confirmation was required (e.g., the governance team did not initiate a callback to the Head of Operations' registered number). No cryptographic factor was required alongside the voice approval. Consequence: £2.1 million in fraudulent transactions, regulatory investigation for inadequate override controls, mandatory process redesign.
Scenario B — Deepfake Video Bypasses Video-Call Approval: A board-level approval is required for agent mandate changes affecting more than £5,000,000 in aggregate exposure. The approval is conducted via a video call. An attacker uses real-time face-swap software to impersonate a board member during a scheduled video call, training the face-swap model on footage from the board member's public interviews. The deepfake is convincing enough to withstand scrutiny for the duration of the call, and the mandate change is approved. The agent subsequently operates with the expanded mandate, executing transactions that exceed the organisation's actual risk appetite.
What went wrong: The video-call approval relied on visual recognition of the approver without any technical deepfake detection, device binding, or cryptographic factor. Real-time face-swap technology has reached a level where it can fool human observers in video-call conditions (moderate resolution, variable lighting, compression artefacts). Consequence: Unapproved expansion of agent mandate beyond actual board authorisation, potential material misstatement of risk exposure, board-level incident.
Scenario C — Synthetic Voice Phishing of Agent Governance Credentials: An attacker generates a synthetic voice message impersonating the CISO, instructing a governance administrator to urgently reset a governance platform password and provide the temporary credentials by reply. The voice message is sent via the organisation's internal voicemail system. The administrator, believing it to be genuine, resets the password and provides it. The attacker uses the credentials to access the governance platform and modify agent configurations.
What went wrong: The voice message was not verified against a deepfake detection system. The organisation had no policy requiring that governance credential changes be confirmed through a separate, authenticated channel. The urgency framing (a common social engineering technique) was amplified by the perceived authority of the CISO's voice. Consequence: Full governance platform compromise, all agent configurations potentially modified, 3-week remediation.
Scope: This dimension applies to any AI agent governance approval workflow that includes a voice, video, or multimodal component — whether real-time (live call) or asynchronous (recorded message). This includes: voice-call-based emergency override authorisations, video-call-based mandate approvals, voicemail-based governance instructions, video-recorded approval attestations, and any hybrid flow combining voice/video with other factors. The scope extends to any workflow where a human observer makes an identity judgment based on voice or visual appearance, because these judgments are vulnerable to synthetic media deception. It does not apply to text-only approval workflows (which have different impersonation vectors addressed by AG-161) or to biometric-only authentication (addressed by AG-282), though deepfake resistance should complement biometric presentation attack detection (PAD).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
4.1. A conforming system MUST NOT rely solely on human recognition of voice or visual appearance to authenticate approvers in agent governance workflows — every voice or video-based approval MUST include at least one additional non-impersonatable factor (cryptographic device binding, FIDO2 passkey confirmation, or equivalent).
4.2. A conforming system MUST implement automated deepfake detection on all voice and video channels used for agent governance approvals, analysing for artefacts characteristic of synthetic media including temporal inconsistencies, spectral anomalies, lip-sync discrepancies, and generation model fingerprints.
4.3. A conforming system MUST require out-of-band confirmation for all voice or video-based governance approvals exceeding defined risk thresholds (e.g., mandate changes affecting financial limits above £100,000), where "out-of-band" means a confirmation through a separate communication channel initiated by the governance system, not by the approver.
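For illustration, a minimal non-normative sketch of clause 4.3's gating logic in Python, assuming hypothetical `fetch_registered_contact` and `place_confirmation_call` integration points (any telephony or messaging provider could stand behind them) and the example threshold of £100,000:

```python
# Non-normative sketch of clause 4.3. fetch_registered_contact() and
# place_confirmation_call() are hypothetical stand-ins for whatever
# provider the organisation actually uses.
from dataclasses import dataclass

OOB_THRESHOLD_GBP = 100_000  # example threshold from clause 4.3

REGISTERED_CONTACTS = {"ops-head-01": "+44 20 7946 0000"}  # example entry

@dataclass
class ApprovalRequest:
    approver_id: str
    action: str
    financial_impact_gbp: int

def fetch_registered_contact(approver_id: str) -> str:
    # Look up the approver's pre-registered number from the governance
    # directory, never from details supplied on the inbound call itself.
    return REGISTERED_CONTACTS[approver_id]

def place_confirmation_call(contact: str, action: str) -> bool:
    # Stand-in: system-initiated outbound call; returns True only if the
    # approver explicitly confirms this specific action on that call.
    raise NotImplementedError("integrate telephony provider here")

def confirm_out_of_band(request: ApprovalRequest) -> bool:
    """Gate high-risk approvals behind system-initiated confirmation."""
    if request.financial_impact_gbp < OOB_THRESHOLD_GBP:
        return True  # below threshold: no out-of-band step required
    contact = fetch_registered_contact(request.approver_id)
    return place_confirmation_call(contact, request.action)
```

The essential property is directional: confirmation is always initiated by the governance system against pre-registered contact details, so an attacker who controls the inbound channel gains nothing.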
4.4. A conforming system MUST bind voice and video approvals to a cryptographic factor — the approver must confirm the approval through a device-bound cryptographic action (e.g., FIDO2 assertion, smart card signature) in addition to the voice/video component.
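Clause 4.4's binding principle can be sketched with plain ECDSA from the `cryptography` package. A production deployment would use a FIDO2/WebAuthn assertion from the approver's enrolled authenticator, but the idea is the same: the approver's device signs the exact approval content plus a fresh server challenge, and the governance system verifies against the registered public key.

```python
# Non-normative sketch of cryptographic binding (clause 4.4) using plain
# ECDSA from the `cryptography` package in place of a FIDO2 assertion.
import json
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Enrolment: the key pair lives on the approver's device; the governance
# system stores only the public key.
device_key = ec.generate_private_key(ec.SECP256R1())
registered_public_key = device_key.public_key()

def canonical_payload(action: str, challenge: bytes) -> bytes:
    # Sign the exact approval content plus a fresh server challenge, so a
    # captured signature cannot be replayed against a different action.
    return json.dumps({"action": action, "challenge": challenge.hex()},
                      sort_keys=True).encode()

challenge = os.urandom(32)  # issued by the governance system per approval
payload = canonical_payload("raise_mandate_limit", challenge)
signature = device_key.sign(payload, ec.ECDSA(hashes.SHA256()))  # on device

# Verification: the voice/video approval is accepted only if this check
# passes (it raises InvalidSignature on tampering). A deepfake can imitate
# the approver's voice and face but cannot produce this signature.
registered_public_key.verify(signature, payload, ec.ECDSA(hashes.SHA256()))
```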
4.5. A conforming system MUST maintain detection capabilities against current-generation deepfake tools, updating detection models at least quarterly to address new generation techniques.
4.6. A conforming system SHOULD implement challenge-response protocols for voice approvals that require the approver to respond to a randomly generated, time-limited prompt that cannot be predicted or pre-recorded.
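A minimal sketch of clause 4.6, assuming a small example word pool and a 30-second validity window (both illustrative parameters):

```python
# Non-normative sketch of a time-limited, single-use voice challenge.
import secrets
import time

# Example word pool; a real deployment would use a much larger vocabulary
# and may mix digits, words, and spoken instructions.
WORDS = ["amber", "falcon", "quarry", "meadow", "cobalt", "lantern"]
CHALLENGE_TTL_SECONDS = 30  # illustrative validity window

_issued: dict[str, tuple[str, float]] = {}  # challenge_id -> (phrase, expiry)

def issue_challenge() -> tuple[str, str]:
    """Issue a random phrase the approver must speak live on the call."""
    phrase = " ".join(secrets.choice(WORDS) for _ in range(4))
    challenge_id = secrets.token_urlsafe(16)
    _issued[challenge_id] = (phrase, time.monotonic() + CHALLENGE_TTL_SECONDS)
    return challenge_id, phrase

def verify_response(challenge_id: str, transcribed: str) -> bool:
    """Accept only an unexpired, single-use challenge.

    Because the phrase is generated at call time, it cannot have been
    pre-recorded; a real-time clone must synthesise it live, which is
    exactly the condition the deepfake detector (clause 4.2) analyses.
    """
    record = _issued.pop(challenge_id, None)  # single use: remove on check
    if record is None:
        return False
    phrase, expiry = record
    return time.monotonic() < expiry and transcribed.strip().lower() == phrase
```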
4.7. A conforming system SHOULD analyse voice and video channels for environmental consistency indicators (background noise signatures, lighting conditions, network characteristics) that can distinguish a genuine remote approval from a synthetic injection.
4.8. A conforming system SHOULD implement media provenance verification (e.g., C2PA content credentials) on recorded approval attestations to provide cryptographic proof that the media originates from a genuine capture device and has not been synthetically modified.
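Clause 4.8 reduces to a policy decision once a C2PA manifest has been read. The sketch below assumes a hypothetical `read_c2pa_manifest` wrapper, since SDK interfaces vary; the point is the decision logic, in particular that media without verifiable credentials is rejected rather than treated as unknown:

```python
# Non-normative sketch of clause 4.8. read_c2pa_manifest() is a
# hypothetical wrapper around whichever C2PA SDK the organisation adopts.
from dataclasses import dataclass

@dataclass
class ProvenanceResult:
    has_manifest: bool          # C2PA content credentials present?
    signature_valid: bool       # manifest signature chain verifies?
    capture_device_claim: bool  # asserts origin on a genuine capture device
    edits_declared: bool        # post-capture edits declared in the manifest

def read_c2pa_manifest(media_path: str) -> ProvenanceResult:
    # Stub: integrate a real C2PA SDK here. Returning all-False means
    # unverified media is rejected by default.
    return ProvenanceResult(False, False, False, False)

def accept_recorded_approval(media_path: str) -> bool:
    result = read_c2pa_manifest(media_path)
    # Absence of credentials is treated as failure, not as an unknown:
    # a synthetic or credential-stripped recording must never pass.
    return (result.has_manifest and result.signature_valid
            and result.capture_device_claim and not result.edits_declared)
```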
4.9. A conforming system MAY implement continuous deepfake monitoring throughout the duration of video-call approvals, not just at the start of the call, to detect mid-call injection attacks where a deepfake replaces a genuine participant.
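Clause 4.9's continuous monitoring is essentially a scoring loop over the live media stream. A sketch, assuming hypothetical `call`, `detector`, and `on_alert` interfaces and illustrative threshold and sampling values:

```python
# Non-normative sketch of continuous mid-call deepfake monitoring.
import time

ALERT_THRESHOLD = 0.8          # illustrative score above which a call is flagged
SAMPLE_INTERVAL_SECONDS = 2.0  # illustrative re-scoring interval

def monitor_call(call, detector, on_alert):
    """Score the media stream for the whole call, not just at join time.

    `call`, `detector`, and `on_alert` are hypothetical interfaces:
    `call` yields media samples while active, `detector.score()` returns
    a synthetic-media likelihood in [0, 1], and `on_alert` suspends the
    approval flow pending re-authentication. Mid-call injection attacks
    are caught only if scoring continues after the initial identity check.
    """
    while call.is_active():
        sample = call.capture_sample()   # recent video frames plus audio
        score = detector.score(sample)
        if score >= ALERT_THRESHOLD:
            on_alert(call, score)        # pause the approval, require re-auth
        time.sleep(SAMPLE_INTERVAL_SECONDS)
```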
The threat landscape for approval authentication has fundamentally changed with the commoditisation of generative AI. Voice cloning services can produce convincing clones from 3 seconds of reference audio. Real-time face-swap applications can run on a consumer GPU and inject into standard video-call software. Text-to-speech systems can generate natural speech in any cloned voice with arbitrary content. These tools are commercially available, affordable (many are free), and require no technical sophistication to operate.
This creates an acute threat to agent governance approval workflows because many organisations still rely on "I recognise the voice/face" as an authentication factor for high-value approvals. Verbal authorisation by phone, video-call approval by board members, and voicemail-based instructions from executives are common in governance workflows. All of these are now vulnerable to synthetic impersonation that is difficult or impossible for human observers to detect.
The problem is asymmetric: the attacker needs to fool a human for seconds to minutes, under conditions (phone audio quality, video compression, meeting fatigue) that favour the attacker. The defender needs to reliably distinguish genuine from synthetic across all conditions, all the time. Humans are demonstrably poor at this task — studies show that even trained listeners detect voice clones at rates only marginally better than chance when the clone quality is high.
AG-283 addresses this by requiring that voice/video approval workflows never rely solely on human perception. Every such workflow must include a non-impersonatable cryptographic factor, automated deepfake detection, and out-of-band confirmation for high-risk actions. This layered approach ensures that even if the deepfake is perfect, the attacker still cannot produce the cryptographic factor from the approver's bound device, and the out-of-band confirmation initiated by the governance system (not the attacker) provides a second verification channel.
Deepfake-resistant approval authentication requires a layered approach combining detection, cryptographic binding, and procedural controls. No single technique is sufficient given the pace of deepfake improvement.
Recommended patterns:
- Bind every voice or video approval to a device-held cryptographic factor (FIDO2 passkey, smart card) so that even a perfect deepfake cannot complete the approval.
- Run automated deepfake detection on the approval channel itself, with detection models updated at least quarterly.
- Make out-of-band confirmation system-initiated: the governance system contacts the approver's registered number or device, never the reverse.
- Use randomised, time-limited challenge-response prompts that cannot be predicted or pre-recorded.
- Apply media provenance verification (e.g., C2PA content credentials) to recorded approval attestations.
- For video approvals, monitor continuously for the duration of the call rather than only at the start.
Anti-patterns to avoid:
- Treating human recognition of a voice or face as an authentication factor for any governance action.
- Accepting inbound calls, voicemails, or messages as confirmation, since their origin cannot be verified.
- Using static passphrases or security questions spoken on calls; these can be harvested once and replayed by a clone.
- Relying on detection alone: detection models trail generation techniques, so a detection-only defence fails against novel tools.
- Deploying detection tooling without a scheduled model update cadence or independent evaluation of its performance.
Financial Services. The FCA has issued specific warnings about deepfake-enabled fraud. Financial firms must demonstrate that their approval workflows for AI agent governance are resistant to synthetic media attacks. The combination of voice cloning and real-time face swap represents a material risk to trading desk and treasury approval workflows. The £2.1 million voice-clone attack in Scenario A is based on real-world incidents of CFO voice-clone fraud.
Healthcare. Clinical approval workflows for AI agents governing treatment recommendations or medication dispensing are targets for deepfake attacks where the attacker seeks to manipulate clinical outcomes. Voice approval of clinical AI agent overrides must include cryptographic binding.
Critical Infrastructure. Emergency override approvals for AI agents controlling critical infrastructure (energy, transport, water) are high-value targets. An attacker who can impersonate a senior operator can authorise dangerous actions. The combination of cryptographic binding and out-of-band confirmation is essential.
Basic Implementation — Voice and video approvals require a second factor in addition to the voice/video component, delivered through the approver's registered device (e.g., a device-bound push confirmation; a plain OTP is phishable and does not satisfy the non-impersonatable requirement of 4.1). Basic deepfake awareness training is provided to governance team members. Approval procedures require callback to registered numbers for phone-based authorisations. Detection tools are deployed but not yet independently evaluated. This meets minimum mandatory requirements but relies partly on human judgment to detect sophisticated attacks.
Intermediate Implementation — Cryptographic device binding (FIDO2 passkey) is required alongside voice/video approval. Automated deepfake detection is deployed on all governance voice and video channels with quarterly model updates. Challenge-response protocols with randomised phrases are implemented for voice approvals. Out-of-band confirmation is required for approvals exceeding £100,000 in financial impact. Deepfake detection performance is independently evaluated.
Advanced Implementation — All intermediate capabilities plus: C2PA media provenance on recorded approvals. Continuous mid-call deepfake monitoring for video approvals. Real-time lip-sync analysis cross-referenced with audio stream. Environmental consistency analysis. Independent adversarial testing with state-of-the-art real-time face-swap and neural voice cloning tools. Detection models updated monthly based on threat intelligence. The organisation can demonstrate that no known deepfake tool defeats the combined cryptographic-plus-detection-plus-out-of-band defence.
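The combined defence described across these tiers can be expressed as a single decision policy. A non-normative sketch, with illustrative thresholds that would in practice be tuned from the independent detector evaluation:

```python
# Non-normative sketch of a layered approval decision policy combining
# the cryptographic, detection, out-of-band, and challenge factors.
from dataclasses import dataclass

@dataclass
class ApprovalEvidence:
    crypto_assertion_valid: bool  # device-bound signature verified (4.4)
    detection_score: float        # synthetic-media likelihood, 0..1 (4.2)
    oob_confirmed: bool           # system-initiated confirmation done (4.3)
    challenge_passed: bool        # live challenge-response passed (4.6)

DETECTION_REJECT = 0.8   # illustrative thresholds; tune per evaluation
DETECTION_REVIEW = 0.5

def decide(evidence: ApprovalEvidence, high_risk: bool) -> str:
    # The cryptographic factor is non-negotiable: without it, a perfect
    # deepfake would be indistinguishable from the genuine approver.
    if not evidence.crypto_assertion_valid:
        return "reject"
    if evidence.detection_score >= DETECTION_REJECT:
        return "reject"
    if high_risk and not evidence.oob_confirmed:
        return "reject"
    if not evidence.challenge_passed or evidence.detection_score >= DETECTION_REVIEW:
        return "manual_review"
    return "approve"
```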
Required artefacts:
- Deepfake detection deployment records and model update logs demonstrating the quarterly cadence required by 4.5.
- Per-approval records linking each voice/video approval to its cryptographic assertion (4.4), challenge-response outcome (4.6), and any out-of-band confirmation (4.3).
- Independent evaluation reports for detection performance and adversarial test results for Tests 8.1 to 8.7.
- Deepfake awareness training records for governance team members.
Retention requirements: Approval records and their associated detection results should be retained at least for as long as the approved action remains in effect, and thereafter in line with the organisation's regulatory record-keeping obligations for the approvals concerned (for example, FCA and SOX retention rules for financial approvals).
Access requirements: Approval records and detection logs should be restricted to governance administrators, security operations, and internal or external auditors, with access itself logged. Detection model configuration and thresholds should be held more tightly still, to limit an attacker's ability to probe the detector.
Test 8.1: Voice Clone Attack on Approval Workflow. Using a commercially available voice cloning tool, attempt to authorise a governance action as a registered approver. The approval MUST NOT complete without the cryptographic factor (4.4), and the detection system (4.2) SHOULD flag the call as synthetic.
Test 8.2: Real-Time Face-Swap Attack on Video Approval. Join a scheduled video-call approval using real-time face-swap software trained on public footage of an approver. The approval MUST NOT complete without the cryptographic factor, regardless of whether human observers accept the video as genuine.
Test 8.3: Out-of-Band Confirmation Enforcement. Submit a voice approval above the defined risk threshold (4.3) and verify that the system itself initiates confirmation through a separate channel; any path that completes the approval without system-initiated confirmation is a failure.
Test 8.4: Challenge-Response Unpredictability. Verify that voice challenges (4.6) are randomly generated, time-limited, and single-use, and that replayed or pre-recorded responses are rejected; a property-test sketch follows this list.
Test 8.5: Cryptographic Factor Independence. Verify that no voice/video approval can complete without a valid device-bound assertion (4.4), even when deepfake detection and human judgment both accept the media as genuine.
Test 8.6: Mid-Call Deepfake Injection. Where continuous monitoring (4.9) is implemented, replace a genuine participant with a deepfake after the call has started and verify that the substitution is detected and the approval suspended.
Test 8.7: Recorded Approval Provenance. Where provenance verification (4.8) is implemented, submit a recorded attestation lacking valid content credentials, or one whose media has been modified after capture, and verify that it is rejected.
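The unpredictability properties of Test 8.4 lend themselves to simple property tests. A pytest-style sketch, written against the `issue_challenge`/`verify_response` functions from the clause 4.6 sketch (the module name is hypothetical):

```python
# Property tests for Test 8.4. The import is from a hypothetical module
# containing the clause 4.6 sketch. Framework: pytest.
import time

from challenge_response import issue_challenge, verify_response  # hypothetical

def test_challenges_are_not_repeated():
    phrases = {issue_challenge()[1] for _ in range(100)}
    # A fixed or cycling phrase would collapse this set; with a genuinely
    # random generator nearly all 100 draws should be distinct.
    assert len(phrases) > 50

def test_challenge_is_single_use():
    cid, phrase = issue_challenge()
    assert verify_response(cid, phrase)
    assert not verify_response(cid, phrase)  # replay must fail

def test_challenge_expires():
    cid, phrase = issue_challenge()
    time.sleep(31)  # CHALLENGE_TTL_SECONDS is 30 in the sketch; a real
                    # test suite would inject a fake clock instead
    assert not verify_response(cid, phrase)
```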
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 50 (Transparency Obligations, including Deepfakes) | Direct requirement |
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| PSD2/EBA RTS | Article 4 (Authentication Code) | Supports compliance |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| NIST AI RMF | MANAGE 2.2 (Manage AI Risks) | Supports compliance |
| ISO 27001 | A.8.5 (Secure Authentication) | Supports compliance |
Article 50 of the EU AI Act (Article 52 in earlier drafts) requires that users are informed when content has been artificially generated or manipulated, including deepfakes. In the context of agent governance, this maps to the requirement that governance systems can detect and flag synthetically generated approval media. AG-283's deepfake detection requirement directly supports this transparency obligation by identifying when governance-relevant media may be synthetic.
The FCA expects firms to have systems and controls that are adequate for their business. With the demonstrated capability of voice-clone fraud (reported incidents exceeding £20 million globally), the FCA expects firms using voice or video approval for AI agent governance to demonstrate resilience against synthetic media attacks. AG-283's layered defence provides the evidence trail that supervisors expect.
If AI agents perform financial operations under human governance approval, the integrity of that approval is a SOX internal control. A deepfake-compromised approval is a control failure. AG-283's cryptographic binding ensures that the approval cannot be synthetically fabricated, maintaining the integrity of the control for SOX attestation.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Organisation-wide — a deepfake impersonating a senior approver can authorise changes to any agent within that approver's authority |
Consequence chain: A successful deepfake attack on an approval workflow gives the attacker the apparent authority of the impersonated approver: mandate changes, emergency overrides, and configuration modifications can all be authorised under that identity. The audit trail records a genuine-appearing approval from the legitimate approver, so the compromise may not be detected until significant damage has occurred, and the false confidence created by an apparently genuine voice/video approval may delay investigation further. The damage potential equals the full authority of the impersonated identity: if the impersonated person is the CFO, the attacker has the CFO's authority over all agent mandates; if the impersonated person controls emergency overrides, the attacker can disable safety controls. Financial exposure in documented voice-clone fraud incidents ranges from £200,000 to £25,000,000 per incident.
Cross-references: AG-282 (Biometric Spoof Resistance Governance) provides the PAD foundation that AG-283 extends to the specific domain of deepfake media. AG-279 (Human Identity Proofing Governance) ensures the enrolled identity is genuine; AG-283 ensures the approval from that identity is genuine. AG-281 (Device Identity Binding Governance) provides the cryptographic anchor that deepfakes cannot replicate. AG-287 (Non-Repudiation Evidence Governance) depends on AG-283 to ensure the evidence of approval is not synthetically fabricated. AG-016 (Cryptographic Action Attribution) provides the cryptographic binding that is the primary defence when deepfake detection alone is insufficient. AG-115 (Strong Authentication for Agent-Initiated Value Transfer) should incorporate deepfake resistance for high-value transfer approvals.