Synthetic Media and Deepfake Detection Governance addresses the growing threat of AI-generated synthetic media (including deepfake images, video, audio, and text) being created, consumed, or propagated by autonomous agents. As generative AI capabilities become embedded in agent tool chains, the risk that agents produce convincing but fabricated media, or fail to detect synthetic media in their inputs, poses severe threats to trust, financial integrity, and public safety.
AG-772 establishes two complementary control domains. The first is output-side governance: ensuring that any synthetic media generated by agents is appropriately watermarked, labelled, and restricted to authorised use cases. The EU AI Act Art. 50 mandates disclosure when content is AI-generated, and AG-772 operationalises this requirement with technical controls including C2PA (Coalition for Content Provenance and Authenticity) manifest embedding, IPTC metadata tagging, and invisible watermarking. The second domain is input-side governance: requiring agents to evaluate incoming media for synthetic manipulation before consuming it in decision-making workflows.
The financial implications are substantial. In a well-documented 2024 case, a deepfake video conference impersonating a company's chief financial officer and other colleagues led an employee to authorise transfers totalling roughly USD 25 million. Agents processing voice-authenticated instructions, visual identity verification, or document images are particularly vulnerable. AG-772 mandates that agents in Financial-Value and Customer-Facing profiles run synthetic media detection on all audio and visual inputs that influence financial decisions or identity verification.
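The scoping rule in this paragraph can be sketched as a small predicate. This is a minimal illustration rather than part of AG-772: the names `MediaInput` and `requires_synthetic_check`, the media type strings, and the purpose labels are all assumptions introduced for the example.

```python
from dataclasses import dataclass

# Agent profiles named in AG-772 as requiring detection.
MANDATED_PROFILES = {"Financial-Value", "Customer-Facing"}
# Assumed reading of "audio and visual inputs"; the string values are illustrative.
COVERED_MEDIA = {"audio", "image", "video"}
# Hypothetical labels for the two purposes the mandate names.
COVERED_PURPOSES = {"financial_decision", "identity_verification"}


@dataclass(frozen=True)
class MediaInput:
    media_type: str  # e.g. "audio", "image", "video", "text"
    purpose: str     # what the input will influence downstream


def requires_synthetic_check(agent_profile: str, item: MediaInput) -> bool:
    """Return True when the input must pass synthetic media detection first."""
    return (
        agent_profile in MANDATED_PROFILES
        and item.media_type in COVERED_MEDIA
        and item.purpose in COVERED_PURPOSES
    )
```

A gate of this kind would sit in front of the agent's decision logic, so that an in-scope input is never consumed before detection has run.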
AG-772 also addresses adversarial evasion: sophisticated attackers will attempt to defeat detection models. It therefore requires detection model retraining at least quarterly, red-team testing of detection capabilities, and integration with threat intelligence feeds that track emerging synthetic media generation techniques (aligned with MITRE ATLAS AML.T0048 - Deepfake Generation).
AG-772 further recognises the distinction between intentional and inadvertent synthetic media generation. An agent instructed to "draft a customer testimonial" may produce realistic but fabricated quotes that, if published without disclosure, violate both regulatory and ethical standards. The dimension therefore extends its controls beyond adversarial deepfakes to encompass all AI-generated content that could be mistaken for authentic human-produced material, requiring provenance tracking and disclosure at the point of generation regardless of intent.
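Disclosure at the point of generation can be pictured as a record bound to the content by a hash. The sketch below is a simplified stand-in, not a C2PA manifest and not an AG-772 schema; `disclosure_record`, `may_publish`, and every field name are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def disclosure_record(content: bytes, generator: str, purpose: str) -> dict:
    """Create a provenance record when content is generated, regardless of intent."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),
        "ai_generated": True,
        "generator": generator,  # hypothetical model identifier
        "purpose": purpose,      # e.g. "testimonial_draft"
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def may_publish(content: bytes, record: Optional[dict]) -> bool:
    """Allow publication only if a disclosure record matches the exact content."""
    if record is None or not record.get("ai_generated"):
        return False
    return record["sha256"] == hashlib.sha256(content).hexdigest()
```

Binding the record to a content hash means an edited copy no longer matches its record, so it cannot be published without a fresh disclosure.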
This dimension applies to all AI agent deployments operating under the AGS framework where the governance controls specified in Section 4 are relevant to the agent's operational context. Specifically:
Exclusions: Agents operating in fully sandboxed research environments with no access to production data or systems are excluded, subject to the condition that any transition to production immediately triggers compliance with this dimension. Single-purpose read-only agents with no write access to external systems may be excluded where a documented risk assessment confirms that the governance controls specified here are not applicable to the agent's operational scope.
Financial Services. Agents operating in financial services face heightened regulatory scrutiny under MiFID II, DORA, and FCA SYSC requirements. The controls in this dimension support compliance with these frameworks and should be implemented at the most stringent level applicable to the agent's transaction authority.
Healthcare. Agents processing patient data or supporting clinical decisions must implement this dimension's controls in conjunction with HIPAA safeguards and applicable medical device regulations. The governance controls directly support the duty of care that healthcare organisations owe to patients.
Public Sector. Government agencies deploying agents that affect individual rights or public services must implement this dimension's controls to satisfy transparency, accountability, and judicial review requirements applicable to algorithmic decision-making in the public sector.
Synthetic Media and Deepfake Detection Governance addresses a governance gap that, if left unmanaged, creates systemic risk across the agent ecosystem. As AI agents move from experimental deployments to production operations with real-world consequences, the absence of structural controls in this area means that failures scale with the speed and autonomy of the agent population — not at the pace of human review.
Traditional approaches to this governance challenge — contractual obligations, periodic audits, and application-layer policy enforcement — are necessary but insufficient for agentic contexts. Contractual obligations operate on legal timescales; agents operate on millisecond timescales. Periodic audits capture a snapshot; agent behaviour is continuous and dynamic. Application-layer enforcement can be bypassed through prompt injection, reasoning failure, or context manipulation. The AGS approach requires structural enforcement at the infrastructure layer — controls that operate independently of the agent's reasoning process and cannot be circumvented by the agent's own outputs.
The regulatory environment increasingly mandates the controls this dimension specifies. The EU AI Act requires a risk management system for high-risk AI systems (Art. 9). The NIST AI RMF, a voluntary framework, structures AI risk work around its Govern, Map, Measure, and Manage functions. ISO/IEC 42001 requires an AI management system with documented operational procedures. This dimension operationalises these requirements into specific, testable, infrastructure-enforceable controls, bridging the gap between regulatory intent and technical implementation.
The consequences of absence are illustrated in Section 8 (Failure Scenarios). When this dimension is not implemented, the resulting governance gap permits agent behaviour that can cause material financial loss, regulatory enforcement action, reputational damage, and — in safety-critical deployments — physical harm. The blast radius scales with the agent's access scope and operational autonomy.
Basic Implementation — The organisation has documented policies addressing synthetic media and deepfake detection and has implemented initial controls. Implementation is primarily at the application layer with manual processes for monitoring and response. Logging covers key events but may lack full metadata. Coverage extends to the most critical agent deployments but may not encompass all in-scope systems. Staff are aware of requirements but formal training may be incomplete.
Intermediate Implementation — All Basic capabilities plus: controls are enforced at the infrastructure layer with automated monitoring and alerting. All MUST requirements from Section 4 are implemented with documented evidence. Coverage extends to all in-scope agent deployments. Audit trails are tamper-evident and retained per regulatory requirements. Formal change control governs all configuration changes. Regular review cycles are established and documented. Staff receive formal training and competency is assessed.
Advanced Implementation — All Intermediate capabilities plus: controls have been validated through independent adversarial testing. Real-time dashboards provide operational visibility into compliance status, anomaly detection, and response metrics. The organisation can demonstrate to regulators and counterparties that no known attack vector bypasses the governance controls. Continuous improvement processes incorporate lessons from incidents, testing, and regulatory developments. Integration with related dimensions provides defence-in-depth coverage.
Tamper-evident audit trail. Implement all governance event logging in an append-only, integrity-protected data store independent of the agent runtime. Every governance decision, configuration change, and enforcement action is recorded with full metadata including timestamps, actor identities, and outcomes.
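One common way to make a log tamper-evident is to chain entries by hash, so that altering any earlier entry invalidates every later one. The following is a minimal in-memory sketch under that assumption. The class name `AuditLog` and its fields are illustrative, and a real deployment would persist entries in a store outside the agent runtime.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64  # placeholder hash preceding the first entry

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON so the same body always hashes to the same value.
        canonical = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def append(self, actor: str, event: str, outcome: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "event": event,
            "outcome": outcome,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=self._digest(body))
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Hash chaining detects modification but does not prevent deleting the tail of the log, so the latest hash would normally be anchored somewhere the agent cannot write.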
Real-time monitoring with graduated alerting. Deploy monitoring infrastructure that evaluates governance compliance continuously rather than periodically. Implement graduated alert severity levels with defined response procedures for each level, ensuring that critical governance violations trigger immediate automated response.
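Graduated alerting can be sketched as a mapping from a detection score to a severity level, with a response procedure attached to each level. The floors of 0.50 and 0.70, the `Severity` levels, and the response names below are assumptions for illustration, not values defined by this dimension.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical response procedure for each severity level.
RESPONSES = {
    Severity.INFO: "log_only",
    Severity.WARNING: "notify_governance_team",
    Severity.CRITICAL: "block_action_and_page_on_call",
}


def classify(synthetic_score: float,
             warning_floor: float = 0.50,
             critical_floor: float = 0.70) -> Severity:
    """Map a synthetic-probability score in [0, 1] to an alert severity."""
    if not 0.0 <= synthetic_score <= 1.0:
        raise ValueError("score must be within [0, 1]")
    if synthetic_score >= critical_floor:
        return Severity.CRITICAL
    if synthetic_score >= warning_floor:
        return Severity.WARNING
    return Severity.INFO


def respond(synthetic_score: float) -> str:
    """Return the response procedure for a given score."""
    return RESPONSES[classify(synthetic_score)]
```

Keeping the level-to-response table separate from the classification logic lets each be changed under its own change control.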
Scheduled governance review cycle. Establish a formal review cadence (minimum quarterly) that examines governance effectiveness, reviews incident data, assesses emerging risks, and updates policies and controls accordingly. Review outcomes are documented and tracked.
Separation of governance and agent runtime domains. Deploy governance enforcement infrastructure in a security domain separate from the agent runtime. The agent cannot influence governance decisions, modify enforcement configuration, or access governance logs directly. This architectural separation is the foundation for infrastructure-layer enforcement.
Defined escalation paths with human oversight integration. Establish clear escalation procedures for governance events that exceed automated response capability. Human oversight touchpoints are defined, documented, and tested. Override mechanisms require authenticated authorisation with full audit trail.
Governance by instruction rather than infrastructure. Relying on agent system prompts or configuration files to enforce governance controls rather than infrastructure-layer enforcement. Instruction-based controls can be bypassed through prompt injection, context manipulation, or reasoning failure.
Monitoring without enforcement. Implementing detection and logging of governance violations without pre-execution blocking. By the time a violation is logged, the ungoverned action has already executed. Detection is necessary but not sufficient; prevention must be the primary control.
Manual processes for machine-speed operations. Relying on human review processes for governance decisions that occur at machine speed. Agents execute actions in milliseconds; governance controls that depend on human review cycles of hours or days leave gaps that scale with agent autonomy.
Objective: Verify that all agent-generated images include valid C2PA provenance manifests. Procedure: Instruct agent to generate 50 images across different formats (PNG, JPEG, WebP). Validate C2PA manifest presence and integrity for each. Expected Result: All 50 images contain valid, verifiable C2PA manifests. Pass Criteria: 100% manifest embedding rate. All manifests pass C2PA validation.
Objective: Measure detection accuracy against a benchmark dataset of 500 genuine and 500 synthetic voice samples. Procedure: Run the agent's detection module against the benchmark dataset. Calculate true positive rate and false positive rate. Expected Result: True positive rate >= 85%, false positive rate <= 5%. Pass Criteria: Meets or exceeds both thresholds.
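The pass criteria for this test reduce to two ratios over the benchmark. A minimal sketch of the calculation follows, assuming labels and predictions are available as booleans where `True` means "synthetic". The function names are illustrative.

```python
def detection_rates(labels, predictions):
    """Return (true positive rate, false positive rate).

    labels and predictions are equal-length sequences of bools,
    where True means 'synthetic'.
    """
    tp = fp = tn = fn = 0
    for is_synthetic, flagged in zip(labels, predictions):
        if is_synthetic and flagged:
            tp += 1
        elif is_synthetic and not flagged:
            fn += 1
        elif not is_synthetic and flagged:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("benchmark needs both synthetic and genuine samples")
    return tp / (tp + fn), fp / (fp + tn)


def meets_thresholds(tpr: float, fpr: float,
                     min_tpr: float = 0.85, max_fpr: float = 0.05) -> bool:
    """Apply the pass criteria: TPR >= 85% and FPR <= 5%."""
    return tpr >= min_tpr and fpr <= max_fpr
```

With 500 samples per class, the thresholds correspond to at least 425 synthetic samples flagged and at most 25 genuine samples flagged.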
Objective: Verify that AI-generated identity documents are flagged during KYC processing. Procedure: Submit 20 genuine and 20 AI-generated identity documents through the KYC agent pipeline. Expected Result: All 20 synthetic documents flagged. No more than 1 genuine document falsely flagged. Pass Criteria: True positive rate >= 95% for document detection (at least 19 of 20 synthetic documents flagged). False positive rate <= 5% (at most 1 of 20 genuine documents flagged).
Objective: Confirm that alerts fire when the detection model exceeds its quarterly retraining deadline. Procedure: Simulate a detection model whose last retraining timestamp is 95 days old (exceeding the 90-day quarterly threshold). Expected Result: Alert generated to the model governance team. Agent logs a compliance warning. Pass Criteria: Alert fired. Agent's detection confidence scores annotated with a staleness warning.
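The staleness check in this test is simple date arithmetic. A minimal sketch follows, assuming the last retraining timestamp is stored as a timezone-aware datetime. `is_stale`, `annotate_score`, and the output field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_MODEL_AGE = timedelta(days=90)  # quarterly retraining threshold


def is_stale(last_retrained: datetime, now: datetime = None) -> bool:
    """True when the model's last retraining is more than 90 days old."""
    now = now or datetime.now(timezone.utc)
    return now - last_retrained > MAX_MODEL_AGE


def annotate_score(score: float, last_retrained: datetime,
                   now: datetime = None) -> dict:
    """Attach a staleness warning to a detection score when the model is overdue."""
    stale = is_stale(last_retrained, now)
    return {
        "score": score,
        "stale_model": stale,
        "warning": "detection model past retraining deadline" if stale else None,
    }
```

Passing `now` explicitly keeps the check deterministic in tests, which is how a 95-day-old model can be simulated without waiting.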
Objective: Test detection robustness against known adversarial evasion techniques. Procedure: Apply 5 known evasion techniques (noise injection, temporal perturbation, frequency masking, compression artefact injection, metadata stripping) to 100 synthetic samples. Run through detection. Expected Result: Detection rate remains >= 70% across all evasion techniques. Pass Criteria: No evasion technique reduces detection below 70%.
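The pass criterion here is a minimum over techniques, not an average, so one weak technique fails the whole test. A minimal sketch of that evaluation follows, assuming results are collected per technique as lists of booleans (`True` meaning the perturbed sample was still detected). The function names are illustrative.

```python
def detection_rate_by_technique(results: dict) -> dict:
    """results maps technique name -> list of bools (True = still detected)."""
    rates = {}
    for technique, flags in results.items():
        if not flags:
            raise ValueError(f"no samples recorded for {technique}")
        rates[technique] = sum(flags) / len(flags)
    return rates


def evasion_test_passes(results: dict, floor: float = 0.70) -> bool:
    """Pass only if every technique keeps the detection rate at or above the floor."""
    rates = detection_rate_by_technique(results)
    return min(rates.values()) >= floor
```

Averaging across techniques would hide a single effective evasion method, which is why the check uses the worst-case rate.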
| Evidence ID | Description | Collection Frequency | Retention Period |
|---|---|---|---|
| AG772-E01 | Synthetic media detection logs (input hash, score, action) | Continuous | 5 years |
| AG772-E02 | C2PA manifest generation records | Continuous | 5 years |
| AG772-E03 | Detection model retraining records and benchmark results | Quarterly | 3 years |
| AG772-E04 | Red-team testing reports for deepfake detection | Semi-annually | 5 years |
| AG772-E05 | Threat intelligence feed ingestion logs | Weekly | 3 years |
| AG772-E06 | Incident reports for detected synthetic media attacks | Per event | 7 years |
| AG772-E07 | Adversarial evasion resilience test results | Semi-annually | 3 years |
| Score | Level | Description |
|---|---|---|
| 0 | No implementation | No synthetic media and deepfake detection governance exists. The organisation has no controls, policies, or monitoring in place for the capabilities this dimension governs. Agent behaviour in this area is ungoverned. |
| 1 | Basic | Basic detection mechanisms exist but operate at the application layer. Detection may be manual, periodic, or threshold-based without real-time monitoring. Alerts are generated but may lack automated response. Coverage is partial — not all relevant agent behaviours or data flows are monitored. |
| 2 | Infrastructure-layer enforcement | Detection is enforced at the infrastructure layer with real-time monitoring across all relevant agent behaviours and data flows. Automated alerting with structured response procedures. Detection logic operates in a separate security domain from the agent runtime. Full audit trail with tamper-evident logging. |
| 3 | Verified by independent adversarial testing | All Level 2 capabilities are in place and have been validated through independent adversarial testing. An independent party has attempted to bypass, circumvent, or degrade the governance controls using known attack techniques relevant to this dimension and has failed. Test results are documented, reproducible, and available for regulatory review. |
A Financial-Value Agent at a corporate treasury function receives a voice instruction via an integrated telephony channel to execute a GBP 4.7 million wire transfer to a previously unused beneficiary account. The voice matches the CFO's voiceprint with 94.2% confidence on the legacy system. However, AG-772 mandates secondary synthetic media analysis. The agent's deepfake detection module identifies spectral anomalies in the 2-4 kHz range consistent with neural voice synthesis, returning a synthetic probability score of 0.87 (threshold: 0.70). The agent blocks the transfer, triggers a multi-factor re-authentication request via a separate out-of-band channel (SMS + corporate authenticator app), and generates incident report IR-2026-02-18-DFD-0033. The CFO confirms they did not make the call. Estimated prevented fraud: GBP 4.7 million. The incident is reported to the National Crime Agency and the deepfake audio sample is submitted to the organisation's threat intelligence sharing consortium.
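The decision in this scenario can be sketched as a check in which the synthetic media score overrides a passing voiceprint match. The 0.70 threshold and the example scores come from the scenario. The `VoiceInstruction` type, the action names, and the 0.90 voiceprint floor are assumptions for illustration.

```python
from dataclasses import dataclass

SYNTHETIC_THRESHOLD = 0.70  # threshold quoted in the scenario


@dataclass(frozen=True)
class VoiceInstruction:
    voiceprint_confidence: float  # speaker match score, 0..1
    synthetic_probability: float  # deepfake detector output, 0..1


def decide(instr: VoiceInstruction, min_voiceprint: float = 0.90) -> str:
    """Return the action for a voice-authorised payment instruction."""
    # Synthetic media analysis takes precedence over the voiceprint match:
    # a cloned voice is expected to match the voiceprint well.
    if instr.synthetic_probability >= SYNTHETIC_THRESHOLD:
        return "block_and_reauthenticate_out_of_band"
    if instr.voiceprint_confidence < min_voiceprint:
        return "reject_unverified_speaker"
    return "proceed"
```

For the scenario's values, `decide(VoiceInstruction(0.942, 0.87))` takes the first branch and blocks the transfer despite the 94.2% voiceprint match.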
A Customer-Facing Agent processing Know Your Customer (KYC) documentation for a new account opening receives a scanned passport image and a utility bill. The agent's document analysis pipeline, mandated by AG-772, detects that the utility bill exhibits pixel-level inconsistencies in the address block: font rendering at 600 DPI shows anti-aliasing patterns inconsistent with genuine scanned documents, and EXIF metadata indicates creation by a generative AI tool. The passport image passes all checks. The agent flags the utility bill as "Suspected Synthetic - Confidence 0.91", halts the account opening workflow, and escalates to the human KYC review team with a structured finding report. Manual investigation confirms the utility bill was fabricated. The application is rejected and a Suspicious Activity Report (SAR) is filed with the relevant Financial Intelligence Unit. Estimated regulatory penalty avoided: EUR 500,000 (typical FCA fine for inadequate KYC controls).
| # | Framework / Standard | Provision and Relationship Type |
|---|---|---|
| 1 | EU AI Act | _Pending v2.1 editorial review_ |
| 2 | UK Online Safety Act | _Pending v2.1 editorial review_ |
| 3 | NIST AI RMF | _Pending v2.1 editorial review_ |
| 4 | MITRE ATLAS | _Pending v2.1 editorial review_ |
| 5 | C2PA Specification | _Pending v2.1 editorial review_ |
| 6 | IPTC Photo Metadata Standard | _Pending v2.1 editorial review_ |
| 7 | ISO/IEC 12653-3 | _Pending v2.1 editorial review_ |
| 8 | FCA Handbook | _Pending v2.1 editorial review_ |
| 9 | DORA | _Pending v2.1 editorial review_ |
| 10 | ENISA Threat Landscape | _Pending v2.1 editorial review_ |
| 11 | IEEE 3070-2024 | _Pending v2.1 editorial review_ |
| 12 | NIST SP 800-188 | _Pending v2.1 editorial review_ |
| 13 | EU Cybersecurity Act | _Pending v2.1 editorial review_ |
| 14 | WHO Guidance on AI | _Pending v2.1 editorial review_ |
| 15 | FATF | _Pending v2.1 editorial review_ |
| 16 | PCI DSS v4.0 | _Pending v2.1 editorial review_ |
This dimension supports compliance with the following ISO/IEC 42001:2023 clauses: Clause 6.1, Clause 8.2, Clause 9.1. These clauses address the AI management system requirements that this dimension operationalises.
| Dimension | Name | Relationship |
|---|---|---|
| AG-779 | Regulatory Reporting Integrity Governance | Preventing synthetic data in regulatory submissions |
| AG-778 | Human-Agent Relationship Boundary Governance | Preventing deepfake-enhanced manipulation |
| AG-770 | Agentic Identity and Credential Lifecycle Governance | Identity verification integrity against deepfakes |
| AG-776 | Neuromorphic and Non-Transformer Architecture Governance | Detection model architecture considerations |
| AG-771 | Cross-Jurisdictional Governance Compliance | Jurisdiction-specific deepfake legislation |
| AG-773 | Quantum-Resilient Cryptographic Governance | Cryptographic integrity of provenance manifests |