AG-411

Video and Screen Evidence Governance

Logging, Observability & Forensics · AGS v2.1 · April 2026

EU AI Act · SOX · FCA · NIST · HIPAA · ISO 42001

2. Summary

Video and Screen Evidence Governance requires that organisations operating AI agents with computer-use capabilities — agents that interact with graphical user interfaces, navigate web pages, fill in forms, click buttons, and read on-screen content — capture, classify, redact, retain, and produce visual evidence of those interactions in a manner that is forensically sound, privacy-compliant, and auditable. Computer-use agents create a unique evidentiary challenge: unlike text-only agents whose actions are fully captured by structured logs, computer-use agents interact with visual interfaces where the full context of a decision — what was visible on screen, what was clicked, what changed — is only available as visual evidence. Without governed visual capture, the organisation cannot reconstruct what the agent saw, cannot verify that the agent acted on accurate information, and cannot demonstrate to regulators or courts that the agent's actions were appropriate given the visual context it operated within.

3. Example

Scenario A — Missing Visual Evidence Renders Incident Investigation Impossible: A financial-value agent operating through a computer-use interface processes 340 trade confirmations per day by navigating a broker portal, reading confirmation details from the screen, and entering settlement instructions into a treasury management system. On a Tuesday afternoon, the agent enters incorrect settlement amounts for 17 trades totalling £4.2 million in discrepancies. The structured action log records "read confirmation screen" and "entered settlement amount £247,000" but does not capture what was actually displayed on the broker portal screen at the moment of reading. Investigation cannot determine whether the broker portal displayed incorrect information (a portal defect), the agent misread the screen (an agent perception error), or the screen was manipulated by a third party between display and capture. The organisation cannot attribute the error, the broker denies any portal defect, and the £4.2 million discrepancy must be absorbed.

What went wrong: The system logged agent actions in structured form but did not capture the visual evidence — the actual screen content at the moment of each read operation. Without visual evidence, the organisation has no forensic basis to distinguish between a portal error, an agent perception error, and an adversarial manipulation. Consequence: £4.2 million unattributable loss, inability to perform root cause analysis, regulatory concern under FCA SYSC 6.1.1R about adequacy of records and controls, and £380,000 in external forensic investigation costs that ultimately proved inconclusive.

Scenario B — Unredacted Visual Evidence Creates Privacy Breach: An enterprise workflow agent uses computer-use capabilities to process employee onboarding by navigating the HR portal, reading personal details, and entering them into payroll and benefits systems. The visual evidence capture system records full-screen video of every interaction at 2 frames per second. Over 14 months, the system accumulates 2.3 terabytes of video containing unredacted images of employee national insurance numbers, bank account details, home addresses, dates of birth, passport photographs, and medical information from benefits enrolment screens. A storage migration exposes the video archive to a broader access group than intended. 4,700 employees' personal data is visible in unredacted screen recordings accessible to 85 IT operations staff who have no legitimate need to view this data.

What went wrong: Visual evidence was captured without data classification or redaction. The capture system treated the entire screen as a single undifferentiated artefact, making no distinction between operational data (which buttons were clicked, what forms were submitted) and sensitive personal data (national insurance numbers, bank details, medical information) visible on the same screen. Consequence: GDPR Article 5(1)(c) violation for data minimisation failure, GDPR Article 32 finding for inadequate security of personal data, ICO investigation, £1.2 million remediation cost including retroactive redaction of 2.3 terabytes of video, notification obligation to 4,700 affected employees.

Scenario C — Visual Evidence Without Chain of Custody Is Inadmissible: A public-sector agent operating a computer-use interface processes benefit eligibility determinations by navigating government databases, reading applicant information, and recording eligibility decisions. An applicant challenges their denial, claiming the agent operated on incorrect database information. The organisation produces screen recordings showing the database information the agent viewed. However, the recordings are stored as standard video files with no cryptographic integrity protection, no timestamp authentication, and no chain-of-custody record. The applicant's legal counsel argues that the recordings could have been modified after the fact and are therefore inadmissible as evidence. The administrative tribunal agrees, noting that the recordings lack any mechanism to verify their authenticity or demonstrate they have not been tampered with.

What went wrong: Visual evidence was captured but not governed with forensic integrity standards. The video files were treated as operational data rather than evidentiary artefacts. Without cryptographic hashing, timestamping, and chain-of-custody records, the visual evidence has no more evidentiary weight than an unsigned, undated screenshot. Consequence: Inadmissible evidence in tribunal proceedings, inability to defend the agent's decision, adverse finding requiring benefit payment of £34,000, precedent risk for 12,000 similar determinations, and £560,000 in legal costs and settlement exposure.

4. Requirement Statement

Scope: This dimension applies to any AI agent deployment where the agent interacts with graphical user interfaces through computer-use capabilities — including screen reading, mouse movement, clicking, keyboard input, form filling, navigation, and any other interaction with visual interfaces. The scope covers browser-based interactions, desktop application interactions, terminal-based interactions rendered visually, and remote desktop or virtual machine interactions. Any deployment where the agent's decision-making context includes visual information that is not fully captured by structured text logs falls within scope. Agents that operate exclusively through programmatic interfaces (structured API calls with no visual component) are outside scope unless they trigger visual interactions in downstream systems. The scope extends to the full lifecycle of visual evidence: capture, classification, redaction, integrity protection, storage, retention, access control, production for audit or legal purposes, and eventual deletion.

4.1. A conforming system MUST capture visual evidence of agent-screen interactions at a frame rate and resolution sufficient to reconstruct the complete visual context of every agent decision and action, with a minimum of one frame captured at the moment of each discrete agent action (click, keystroke, navigation, form submission).

4.2. A conforming system MUST associate each captured visual frame or recording segment with a unique identifier that links to the corresponding structured action log entry, creating a verifiable bidirectional mapping between visual evidence and action logs.

4.3. A conforming system MUST apply data classification to captured visual evidence, identifying regions of each frame that contain sensitive data categories (personally identifiable information, financial account details, authentication credentials, health information, classified government data) as defined by AG-014.

4.4. A conforming system MUST implement automated or human-supervised redaction of classified sensitive data regions before visual evidence is stored in general evidence repositories, retaining unredacted originals only in access-controlled forensic archives with documented justification for retention.

4.5. A conforming system MUST protect the integrity of visual evidence using cryptographic mechanisms — at minimum, a cryptographic hash computed at the point of capture and stored in a tamper-evident log per AG-006 — sufficient to detect any modification to the evidence after capture.

4.6. A conforming system MUST maintain chain-of-custody records for all visual evidence, documenting every access, copy, transfer, and transformation (including redaction) with timestamps, actor identities, and purpose.

4.7. A conforming system MUST define and enforce retention periods for visual evidence that comply with applicable regulatory requirements and AG-016, including the ability to locate and delete visual evidence when required by data subject erasure requests.

4.8. A conforming system SHOULD implement differential capture strategies that increase capture frequency and resolution during high-risk operations (financial transactions, access to sensitive data, irreversible actions) and reduce capture frequency during low-risk operations (routine navigation, idle screens) to balance evidentiary completeness with storage efficiency.

4.9. A conforming system SHOULD implement optical character recognition or equivalent extraction on captured visual evidence to create searchable text indices, enabling investigators to locate specific visual evidence by content (e.g., "find all frames where account number ending in 4729 was visible").

4.10. A conforming system MAY implement real-time visual anomaly detection that compares captured screen content against expected visual patterns, flagging unexpected content (phishing pages, error screens, unexpected modal dialogs) for immediate review.

5. Rationale

Computer-use agents represent a fundamental shift in the observability landscape. Traditional software agents interact with systems through APIs, producing structured request-response logs that fully capture the interaction context. Computer-use agents interact through visual interfaces — the same interfaces designed for human users. The critical difference is that the decision-making context for a computer-use agent includes everything visible on the screen: text, layout, images, colours, positions of buttons, the presence or absence of warnings, the content of adjacent panels, and the overall visual state of the application. Structured action logs capture what the agent did (clicked button at coordinates 847,312) but not why it did it (the button was labelled "Approve" and appeared next to the applicant's verified identity badge). Without visual evidence, the "why" is lost.

This gap has three consequences. First, incident investigation becomes impossible or inconclusive. When an agent takes an incorrect action, investigators need to determine whether the agent perceived the screen correctly, whether the screen displayed correct information, and whether the agent's decision was appropriate given what was displayed. Without visual evidence, all three questions are unanswerable — the structured log says "agent read screen and took action" but provides no basis for evaluating the screen content that informed the action. Second, regulatory and legal proceedings require evidence of what the agent operated on, not merely what it did. A record stating "agent determined applicant ineligible" is insufficient — regulators and tribunals need to see what information the agent used to make that determination. Visual evidence provides this. Third, adversarial manipulation of visual interfaces (phishing pages, injected content, visual spoofing) can only be detected if visual evidence is preserved for analysis.

The privacy tension is equally significant. Visual evidence captures everything on screen, which may include highly sensitive personal data, authentication credentials, and confidential business information that appears in adjacent windows, background applications, or peripheral screen regions unrelated to the agent's task. Unredacted visual evidence is a privacy liability — it captures data that was never intended to be recorded and retains it in a form (video) that is difficult to search, classify, or selectively delete. The governance challenge is to capture enough visual evidence for forensic purposes while protecting against the privacy risks inherent in recording everything the screen displays.

The integrity dimension is critical because visual evidence is trivially modifiable. Unlike structured log entries that can be validated against schemas and cross-referenced with other system records, a video file or screenshot has no inherent integrity guarantee. A modified screenshot is visually indistinguishable from an original unless cryptographic integrity protection was applied at the point of capture. For visual evidence to have evidentiary value in regulatory proceedings, legal disputes, or internal investigations, it must be protected against modification from the moment of capture through the entire retention period.

6. Implementation Guidance

Visual evidence governance requires a capture pipeline that records screen interactions, a classification and redaction pipeline that protects sensitive data, an integrity pipeline that ensures forensic soundness, and a lifecycle management pipeline that handles retention, access, and deletion. These pipelines must operate in near-real-time for capture and classification, and must be reliable enough that evidence gaps do not occur during critical operations.

Recommended patterns:

Action-triggered capture with contextual windows. Rather than continuous video recording (which is storage-intensive and captures vast amounts of irrelevant visual data), capture a frame at each agent action plus a configurable window of frames before and after (e.g., 3 frames at 1-second intervals before and after each action). This provides the visual context for each action decision while keeping storage requirements manageable. For a typical agent performing 200 actions per day with a 7-frame window per action, this produces approximately 1,400 frames per day rather than the 172,800 frames produced by 24 hours of continuous 2fps recording: a 99.2% reduction in frame count with minimal loss of forensic coverage.
Region-based classification and redaction. Apply data classification at the region level within each frame, not at the frame level. Use optical character recognition to identify text regions, classify each region against the data classification taxonomy from AG-014 (PII, financial account data, credentials, health data), and redact classified regions before the frame enters general storage. Retain unredacted originals in a separate forensic archive with strict access controls and time-limited retention. This approach preserves the operational context (layout, button positions, workflow state) while removing sensitive data from the general evidence repository.
Cryptographic hash chains for visual evidence. Compute a cryptographic hash (minimum SHA-256) for each captured frame or recording segment at the point of capture. Store the hash in the tamper-evident log governed by AG-006. Optionally chain sequential hashes so that any modification or deletion of an intermediate frame is detectable. For productions to regulators or courts, the hash chain provides verifiable proof that the visual evidence has not been modified since capture.
Bidirectional evidence linking. Each visual capture must reference the corresponding action log entry by unique identifier, and each action log entry must reference its associated visual captures. This bidirectional linking enables investigators to navigate from a suspicious log entry to the visual context, or from an anomalous visual observation to the agent's recorded action. Broken links (actions without visual evidence or visual captures without corresponding log entries) must be flagged as evidence integrity exceptions.
Storage tiering with retention alignment. Implement tiered storage: hot storage (high-speed, immediately accessible) for recent evidence (e.g., 30 days), warm storage (lower-speed, accessible within minutes) for medium-term evidence (e.g., 30 days to 2 years), and cold storage (archive, accessible within hours) for long-term retention. Align tier transitions with the retention periods required by AG-016 and applicable regulations. Ensure that tier transitions do not break hash chain integrity or chain-of-custody records.
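
The hash-chain pattern above can be sketched in Python as follows. This is a minimal illustration, assuming frames are available as raw bytes; the function names, the `GENESIS` seed, and the entry layout are hypothetical and are not defined by AG-411 or AG-006. A real deployment would write each entry to the tamper-evident log rather than to an in-memory list.

```python
import hashlib

GENESIS = "0" * 64  # assumed seed value for the first chain link


def chain_entry(prev_chain_hash: str, frame_bytes: bytes) -> dict:
    """Hash one frame with SHA-256 and bind it to the previous chain entry."""
    frame_hash = hashlib.sha256(frame_bytes).hexdigest()
    chain_hash = hashlib.sha256(
        (prev_chain_hash + frame_hash).encode("ascii")
    ).hexdigest()
    return {"frame_hash": frame_hash, "chain_hash": chain_hash}


def build_chain(frames: list[bytes]) -> list[dict]:
    """Produce one log entry per frame, in capture order."""
    entries, prev = [], GENESIS
    for frame in frames:
        entry = chain_entry(prev, frame)
        entries.append(entry)
        prev = entry["chain_hash"]
    return entries


def verify_chain(frames: list[bytes], entries: list[dict]) -> bool:
    """Recompute the chain from stored frames and compare it with the log."""
    if len(frames) != len(entries):
        return False
    prev = GENESIS
    for frame, recorded in zip(frames, entries):
        expected = chain_entry(prev, frame)
        if expected != recorded:
            return False
        prev = expected["chain_hash"]
    return True
```

Because each chain hash covers its predecessor, altering or removing an intermediate frame changes every later chain hash, so verification fails even when the matching log entry is removed as well. The sketch does not protect against an attacker who can rewrite the whole log, which is why the hashes belong in a tamper-evident store.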

Anti-patterns to avoid:

Continuous full-screen video without classification. Recording continuous video of the full screen without data classification or redaction creates an ever-growing archive of sensitive data with no governance. This approach maximises storage costs, maximises privacy liability, and produces evidence that is difficult to search or produce on demand.
Screenshots only at error conditions. Capturing visual evidence only when errors or exceptions occur provides no baseline for comparison and no evidence of normal operation. An investigator examining an error screenshot has no reference for what the screen should have looked like. Furthermore, many failures are not detected at the time they occur — visual evidence of the pre-failure state is required.
Visual evidence without structured metadata. Capturing frames or video segments as standalone files without metadata linking them to action logs, timestamps, agent identifiers, and session identifiers. Unlinked visual evidence has minimal forensic value because investigators cannot determine when, where, or in what context it was captured.
Storing redaction masks rather than performing redaction. Applying overlay masks to indicate where sensitive data exists but retaining the full unredacted image underneath. This approach provides no actual privacy protection — anyone with access to the image data can view the sensitive content by ignoring the mask layer. Redaction must be destructive in the general evidence repository (sensitive pixels replaced with opaque blocks), with unredacted originals only in the forensic archive.
Relying on screen recording at the operating-system level. Using generic screen recording tools that capture the entire desktop, including notifications, other applications, personal browser tabs, and system dialogs unrelated to the agent's task. This captures excessive data, introduces privacy risks from non-agent content, and creates evidence that is not agent-attributable.
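
To make the difference between a mask and destructive redaction concrete, the sketch below overwrites the pixels of each classified region in a copy of the frame. It assumes a frame is held as a plain grid of RGB tuples and that regions arrive as `(left, top, right, bottom)` boxes from an upstream classifier; both representations and the function name are illustrative assumptions, not part of the standard.

```python
Pixel = tuple[int, int, int]
Region = tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def redact_destructively(
    frame: list[list[Pixel]],
    regions: list[Region],
    fill: Pixel = (0, 0, 0),
) -> list[list[Pixel]]:
    """Return a copy of the frame with every region overwritten by opaque pixels.

    The input frame is left untouched so that it can be routed to the
    access-controlled forensic archive; only the returned copy should
    enter the general evidence repository.
    """
    redacted = [row[:] for row in frame]
    height = len(redacted)
    width = len(redacted[0]) if height else 0
    for left, top, right, bottom in regions:
        # Clamp the box to the frame so out-of-range coordinates are safe.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                redacted[y][x] = fill
    return redacted
```

The redacted copy contains no trace of the original pixel values in the covered regions, which is the property Test 8.2 inspects. With real image files, the same property also depends on the encoder and on stripping embedded thumbnails or metadata, which this sketch does not cover.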

Industry Considerations

Financial Services. Financial regulators (FCA, SEC, BaFin) increasingly require evidence of the information agents operated on, not merely what actions they took. For computer-use agents processing trades, confirmations, or settlement instructions through broker portals, visual evidence is the only way to demonstrate what the portal displayed at the moment of each action. Firms should treat visual evidence from financial agent operations as MiFID II transaction records, with corresponding retention periods (minimum 5 years, extending to 7 in some jurisdictions). Financial screens frequently display account numbers, IBAN codes, and client names that require redaction under data minimisation principles.

Healthcare. Clinical agents interacting with electronic health record interfaces through computer-use capabilities capture screens containing protected health information (PHI). Visual evidence governance must comply with jurisdiction-specific health data regulations (HIPAA in the US, UK GDPR and the NHS Data Security and Protection Toolkit in the UK). Region-based redaction must be calibrated for clinical data patterns — medication names and dosages visible on prescription screens, diagnostic codes, and patient identifiers.

Public Sector. Government agents processing benefit claims, immigration applications, or law enforcement queries through computer-use interfaces create visual evidence that may be subject to freedom of information requests, tribunal discovery requirements, and human rights challenges. Visual evidence must be retained with sufficient integrity to withstand legal scrutiny while being redactable to protect third-party data visible on shared government screens.

Crypto and Web3. Agents interacting with decentralised application interfaces, wallet interfaces, and blockchain explorer screens through computer-use capabilities capture visual evidence of transaction authorisations, wallet addresses, and token balances. Visual evidence integrity is particularly important because on-chain transactions are irreversible — the visual evidence of what the agent saw at the moment of authorisation is the only record of the pre-transaction context.

Maturity Model

Basic Implementation. Visual evidence is captured at a minimum of one frame per agent action with bidirectional linking to structured action logs. Data classification identifies PII and sensitive data regions. Redaction removes sensitive data before general storage. Cryptographic hashes are computed at capture and stored in a tamper-evident log. Chain-of-custody records are maintained for access, copy, transfer, and transformation events. Retention periods are defined and enforced. This level meets the minimum mandatory requirements of 4.1 through 4.7.

Intermediate Implementation. All basic capabilities plus: differential capture strategies increase capture density for high-risk operations. Optical character recognition creates searchable text indices across visual evidence. Chain-of-custody records track all access and transformations. Storage tiering aligns with retention periods. Region-based redaction is automated with human review for uncertain classifications. Evidence production workflows can assemble visual evidence packages for regulators within 48 hours.

Advanced Implementation. All intermediate capabilities plus: real-time visual anomaly detection flags unexpected screen content during agent operations. Visual evidence is integrated with the decision journal (AG-415) to provide a complete multimodal record of agent decisions. Independent forensic review has validated the integrity pipeline. Adversarial testing has confirmed that visual evidence cannot be modified without detection. Cross-session visual evidence analysis identifies patterns across thousands of agent interactions.

7. Evidence Requirements

Required artefacts:

Visual evidence capture configuration. Documentation of capture settings including frame rate, resolution, trigger conditions (action-triggered, time-triggered, event-triggered), contextual window parameters, and differential capture rules for high-risk operations.
Data classification and redaction records. Documentation of the classification taxonomy applied to visual evidence, the redaction method (destructive pixel replacement vs. region masking), and audit logs of redaction operations including any human review decisions.
Integrity verification records. Cryptographic hash logs from the tamper-evident system (per AG-006) demonstrating that hashes were computed at capture for all visual evidence. Periodic integrity verification results confirming that stored evidence matches its recorded hashes.
Chain-of-custody logs. Records of every access, copy, transfer, production, and transformation of visual evidence, with timestamps, actor identities, and documented purpose.
Retention and deletion records. Records demonstrating that visual evidence retention periods are enforced, including deletion certificates for evidence that has exceeded its retention period and evidence of compliance with data subject erasure requests per AG-016.
Bidirectional linking verification. Audit results confirming that action log entries and visual evidence captures are correctly cross-referenced, with any broken links identified and investigated.

Retention requirements:

Visual evidence and associated metadata: minimum 7 years for regulated financial services (aligned with MiFID II record-keeping requirements); minimum 6 years for public sector (aligned with limitation periods for judicial review); minimum 5 years for healthcare (aligned with medical record obligations); minimum 3 years otherwise.
Chain-of-custody logs: same retention period as the underlying visual evidence.
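
The minimum retention periods above could be encoded as a simple lookup, as in the following sketch. The sector keys, function names, and the choice to roll 29 February forward to 1 March are assumptions made for illustration; the year values are the minimums stated in this section, and applicable law or an erasure request under AG-016 may require a different outcome.

```python
from datetime import date

# Minimum retention in years, taken from the retention requirements above.
RETENTION_YEARS = {
    "financial_services": 7,
    "public_sector": 6,
    "healthcare": 5,
}
DEFAULT_RETENTION_YEARS = 3  # "minimum 3 years otherwise"


def retention_years(sector: str) -> int:
    """Look up the minimum retention period for a sector key (hypothetical keys)."""
    return RETENTION_YEARS.get(sector, DEFAULT_RETENTION_YEARS)


def earliest_deletion_date(captured_on: date, sector: str) -> date:
    """First date on which evidence captured on `captured_on` may be deleted."""
    target_year = captured_on.year + retention_years(sector)
    try:
        return captured_on.replace(year=target_year)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; roll forward.
        return date(target_year, 3, 1)
```

Chain-of-custody logs would use the same date as the evidence they describe, since their retention period matches that of the underlying visual evidence.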

Access requirements:

Redacted visual evidence producible to regulators or auditors within 48 hours of request. Unredacted forensic originals producible within 72 hours with documented access justification and approval from the data protection function.

8. Test Specification

Test 8.1: Visual Capture Completeness at Action Points

Stimulus: Execute a scripted sequence of 50 agent actions through a computer-use interface (including clicks, form submissions, page navigations, and data entry operations). For each action, verify that at least one visual frame was captured at the moment of action.
Expected behaviour: Every agent action has an associated visual capture. No action in the structured log lacks a corresponding visual frame.
Pass criteria: 100% of agent actions have at least one associated visual capture. Bidirectional links between action logs and visual captures are intact for all 50 actions.
Fail criteria: Any agent action lacks a corresponding visual capture, or any bidirectional link is missing or broken.

Test 8.2: Sensitive Data Redaction Effectiveness

Stimulus: Configure a test environment where the agent interacts with screens displaying known sensitive data: 10 instances of national insurance numbers, 10 instances of bank account numbers, 5 instances of authentication credentials, and 5 instances of health information. Process the captured visual evidence through the classification and redaction pipeline. Examine the redacted output.
Expected behaviour: All 30 sensitive data instances are identified and redacted in the general evidence repository. Redaction is destructive — the original sensitive data cannot be recovered from the redacted frames.
Pass criteria: 100% of known sensitive data instances are redacted in the general repository. Pixel-level inspection confirms destructive redaction (no recoverable data beneath redaction blocks). Unredacted originals exist only in the forensic archive with appropriate access controls.
Fail criteria: Any sensitive data instance is visible in the general evidence repository, or redaction is non-destructive (sensitive data recoverable by removing overlay).

Test 8.3: Cryptographic Integrity Verification

Stimulus: Capture 100 visual frames through normal agent operation. Retrieve the cryptographic hashes recorded at capture. Independently recompute hashes from the stored visual evidence. Then deliberately modify 3 stored frames (alter pixel values, crop, or re-encode) and repeat the hash verification.
Expected behaviour: The 97 unmodified frames pass hash verification. The 3 modified frames fail hash verification.
Pass criteria: 100% of unmodified frames pass hash verification (computed hash matches recorded hash). 100% of modified frames are detected as tampered (computed hash does not match recorded hash).
Fail criteria: Any unmodified frame fails hash verification (false positive), or any modified frame passes hash verification (false negative indicating the integrity mechanism is ineffective).
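
Test 8.3 could be scripted along the following lines. The sketch simulates 100 frames as byte strings, records SHA-256 hashes at "capture", tampers with 3 stored frames, and then recomputes hashes. The frame identifiers and the `verify_frames` helper are hypothetical; a real harness would read frames from the evidence store and hashes from the tamper-evident log.

```python
import hashlib


def verify_frames(stored: dict[str, bytes], recorded: dict[str, str]) -> dict[str, bool]:
    """Map each frame id to True when its recomputed hash matches the recorded one."""
    return {
        frame_id: hashlib.sha256(data).hexdigest() == recorded.get(frame_id)
        for frame_id, data in stored.items()
    }


# Simulated capture: 100 frames and the hashes recorded at capture time.
frames = {f"frame-{i:03d}": bytes([i]) * 32 for i in range(100)}
recorded = {fid: hashlib.sha256(data).hexdigest() for fid, data in frames.items()}

# Deliberately modify 3 stored frames after capture.
tampered_ids = ["frame-007", "frame-042", "frame-099"]
for fid in tampered_ids:
    frames[fid] = frames[fid] + b"\x00"

results = verify_frames(frames, recorded)
failed = sorted(fid for fid, ok in results.items() if not ok)
passed_count = sum(results.values())
```

The pass criteria correspond to `failed` containing exactly the tampered identifiers and `passed_count` equalling 97.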

Test 8.4: Chain-of-Custody Record Completeness

Stimulus: Perform 10 distinct operations on visual evidence: 3 access events (viewing evidence), 2 copy events (producing copies for investigation), 2 transfer events (moving between storage tiers), 2 redaction events (applying redaction to newly classified regions), and 1 deletion event (deleting evidence past retention). Examine the chain-of-custody log after all operations.
Expected behaviour: All 10 operations are recorded in the chain-of-custody log with timestamp, actor identity, operation type, and purpose.
Pass criteria: 100% of operations appear in the chain-of-custody log with all required fields populated. No operation is missing or incomplete. Timestamps are accurate to within 5 seconds of actual operation time.
Fail criteria: Any operation is missing from the chain-of-custody log, any required field (timestamp, actor, operation type, purpose) is missing, or any timestamp deviates from the actual operation time by more than 5 seconds.
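
A hedged sketch of how the Test 8.4 checks might be automated is shown below. It assumes each chain-of-custody record is a dictionary with `operation_id`, `timestamp`, `actor`, `operation`, and `purpose` keys; that schema and the helper names are illustrative assumptions, since the standard does not prescribe a record format.

```python
from datetime import datetime, timedelta

REQUIRED_FIELDS = ("timestamp", "actor", "operation", "purpose")


def missing_operations(performed_ids: list[str], log: list[dict]) -> list[str]:
    """Operation ids that were performed but never logged."""
    logged = {rec.get("operation_id") for rec in log}
    return [op_id for op_id in performed_ids if op_id not in logged]


def incomplete_records(log: list[dict]) -> list[str]:
    """Operation ids of records with a missing or empty required field."""
    return [
        rec.get("operation_id", "<unknown>")
        for rec in log
        if any(not rec.get(field) for field in REQUIRED_FIELDS)
    ]


def timestamps_accurate(
    actual: dict[str, datetime],
    log: list[dict],
    tolerance: timedelta = timedelta(seconds=5),
) -> bool:
    """True when every logged timestamp is within tolerance of the actual time."""
    logged = {rec.get("operation_id"): rec.get("timestamp") for rec in log}
    for op_id, performed_at in actual.items():
        recorded = logged.get(op_id)
        if recorded is None or abs(recorded - performed_at) > tolerance:
            return False
    return True
```

The test passes when `missing_operations` and `incomplete_records` both return empty lists for the 10 scripted operations and `timestamps_accurate` returns `True`.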

Test 8.5: Retention Period Enforcement and Erasure Compliance

Stimulus: Create visual evidence with a test retention period of 24 hours. Wait for expiration. Verify automated deletion. Separately, create visual evidence linked to a test data subject identifier. Submit an erasure request per AG-016 for that identifier. Verify that all visual evidence containing or linked to the data subject is located and deleted (or redacted to remove the subject's data).
Expected behaviour: Expired evidence is automatically deleted within the retention enforcement cycle. Erasure requests locate and remove all visual evidence associated with the data subject.
Pass criteria: Expired evidence is deleted within one enforcement cycle (maximum 24 hours after expiration). Erasure requests result in deletion or irreversible redaction of all associated visual evidence within the defined SLA (e.g., 30 days). Deletion is verified by attempting retrieval and confirming the evidence no longer exists.
Fail criteria: Expired evidence persists beyond the enforcement cycle, or erasure request leaves any associated visual evidence intact and identifiable.

Test 8.6: Bidirectional Evidence Linking Integrity

Stimulus: Execute 200 agent actions. For each action, verify: (a) the action log entry contains a reference to its visual evidence captures, (b) each visual evidence capture contains a reference back to its action log entry, (c) the references are resolvable (the referenced artefact exists and is accessible).
Expected behaviour: All 200 action-to-evidence links and all evidence-to-action back-links are present and resolvable.
Pass criteria: 100% bidirectional link integrity across all 200 actions. Zero orphaned visual captures (captures with no action reference) and zero unlinked actions (actions with no visual capture reference).
Fail criteria: Any action lacks a forward link to visual evidence, any visual capture lacks a back-link to an action, or any link references a non-existent artefact.
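
The link checks in Test 8.6 can be expressed as set-style comparisons over two indices. In this sketch, `actions` maps each action id to the capture ids it references and `captures` maps each capture id to the action id it points back to (or `None`); these structures, the identifiers, and the result keys are assumptions chosen for illustration.

```python
from typing import Optional


def link_exceptions(
    actions: dict[str, list[str]],
    captures: dict[str, Optional[str]],
) -> dict[str, list]:
    """Collect every kind of broken link between action logs and visual captures."""
    return {
        # Actions with no visual capture reference at all.
        "unlinked_actions": sorted(a for a, caps in actions.items() if not caps),
        # Forward links that point at a capture which does not exist.
        "dangling_forward_links": sorted(
            (a, c) for a, caps in actions.items() for c in caps if c not in captures
        ),
        # Captures with no back-link, or a back-link to an unknown action.
        "orphaned_captures": sorted(
            c for c, a in captures.items() if a is None or a not in actions
        ),
        # Captures whose referenced action exists but does not list them.
        "unreciprocated_back_links": sorted(
            c for c, a in captures.items() if a in actions and c not in actions[a]
        ),
    }


def links_intact(actions: dict[str, list[str]], captures: dict[str, Optional[str]]) -> bool:
    """True only when no exception of any kind is found."""
    return not any(link_exceptions(actions, captures).values())
```

Any non-empty list in the result corresponds to an evidence integrity exception that should be flagged and investigated.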

Test 8.7: Differential Capture Rate Adjustment

Stimulus: Configure differential capture rules: high-risk operations (financial transactions, credential entry) at 5 frames per second; standard operations (navigation, form browsing) at 1 frame per action. Execute a mixed sequence of 20 high-risk operations and 30 standard operations. Measure the capture rate for each category.
Expected behaviour: High-risk operations are captured at the elevated rate. Standard operations are captured at the standard rate. The transition between rates occurs within 1 second of the risk classification change.
Pass criteria: High-risk operations show capture rate within 20% of the configured 5fps target. Standard operations show capture at 1 frame per action (plus/minus 1 frame from contextual windowing). Rate transitions occur within 1 second.
Fail criteria: High-risk operations are captured at the standard rate (differential capture not activated), standard operations are captured at the high-risk rate (differential capture not deactivating, wasting storage), the measured high-risk rate falls outside the 20% tolerance, or any rate transition takes longer than 1 second.
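
The rate measurement in Test 8.7 might be computed as in the sketch below, assuming each capture carries a timestamp in seconds. The helper names are hypothetical, and measuring the rate as intervals divided by elapsed time is one reasonable choice among several.

```python
def measured_fps(timestamps: list[float]) -> float:
    """Frames per second over the span covered by sorted capture timestamps."""
    if len(timestamps) < 2:
        return 0.0
    span = timestamps[-1] - timestamps[0]
    if span <= 0:
        return 0.0
    # n timestamps delimit n - 1 frame intervals.
    return (len(timestamps) - 1) / span


def within_tolerance(measured: float, target: float = 5.0, tolerance: float = 0.20) -> bool:
    """True when the measured rate is within the relative tolerance of the target."""
    return abs(measured - target) <= tolerance * target
```

With the configured 5fps target and 20% tolerance, a measured rate of 4.1fps would be accepted and 3.9fps would be rejected.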

Conformance Scoring

Score 0: No visual evidence is captured from computer-use agent interactions — the only records are structured action logs with no visual context.
Score 1: Visual evidence is captured at action points and linked to structured logs. Cryptographic hashes protect integrity at capture. Basic retention periods are defined. However, no data classification or redaction is implemented, and chain-of-custody records are incomplete.
Score 2: All mandatory requirements (4.1-4.7) are met. Visual evidence is captured, classified, redacted, integrity-protected, linked to action logs, retained per policy, and governed with chain-of-custody records. Differential capture is implemented for high-risk operations. Searchable indices exist. Evidence production workflows can respond to regulatory requests within 48 hours.
Score 3: Verified through independent forensic review confirming the integrity pipeline, redaction effectiveness, and chain-of-custody completeness. Real-time visual anomaly detection is operational. Adversarial testing has confirmed evidence cannot be tampered with undetected. Cross-session visual analysis provides pattern-level insights.

9. Regulatory Mapping

Regulation	Provision	Relationship Type
EU AI Act	Article 12 (Record-Keeping)	Direct requirement
EU AI Act	Article 14 (Human Oversight)	Supports compliance
SOX	Section 802 (Criminal Penalties for Altering Documents)	Supports compliance
FCA SYSC	9.1.1R (Record-Keeping)	Direct requirement
NIST AI RMF	GOVERN 1.4, MEASURE 2.6	Supports compliance
ISO 42001	Clause 9.1 (Monitoring, Measurement, Analysis and Evaluation)	Supports compliance
DORA	Article 12 (ICT-Related Incident Management)	Supports compliance

EU AI Act — Article 12 (Record-Keeping)

Article 12 requires that high-risk AI systems are designed and developed with logging capabilities that enable the recording of events relevant to identifying risks and facilitating post-market monitoring. For computer-use agents, the "events relevant to identifying risks" include the visual context in which the agent operated — what was displayed on screen when the agent made each decision. Structured action logs alone do not satisfy Article 12 when the agent's decision-making context was visual. Visual evidence governance ensures that the logging capabilities extend to the visual domain, capturing the complete informational context available to the agent at each decision point.

FCA SYSC — 9.1.1R (Record-Keeping)

The FCA requires firms to maintain adequate records of matters and dealings, including records sufficient to enable the FCA to monitor the firm's compliance. For computer-use agents operating in financial services — processing trades, managing client data, or executing transactions through broker portals — visual evidence constitutes a record of the dealing. The visual capture shows what the agent saw on the broker portal (the prices, quantities, and counterparties displayed), what data the agent entered, and whether the visual information matched the structured log. Without visual records, the firm cannot demonstrate adequate record-keeping for agent-mediated dealings.

SOX — Section 802 (Criminal Penalties for Altering Documents)

Section 802 imposes severe penalties for the alteration, destruction, or concealment of documents relevant to an official proceeding. Visual evidence from computer-use agents operating in financial reporting contexts constitutes such a document. The integrity requirements of AG-411 (cryptographic hashing, chain-of-custody records) directly support Section 802 compliance by making any alteration detectable and any access traceable. Organisations must ensure that visual evidence integrity mechanisms are robust enough to satisfy the evidentiary standards that Section 802 implicitly requires.
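
To illustrate how hashing and custody records make alteration detectable, the following is a minimal sketch of a hash-chained custody log using SHA-256 from the Python standard library. It is not the mechanism AG-411 or AG-006 prescribes; the function names, entry fields, and the all-zero genesis value are assumptions made for illustration, and timestamps and signatures are omitted for brevity.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"
FIELDS = ("capture_hash", "actor", "event", "prev_hash")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def append_custody_entry(log, capture_bytes, actor, event):
    """Append an entry that commits to the capture and to the previous entry."""
    body = {
        "capture_hash": sha256_hex(capture_bytes),
        "actor": actor,
        "event": event,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    entry_hash = sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))
    log.append({**body, "entry_hash": entry_hash})


def verify_custody_log(log, capture_bytes):
    """Return True if the capture and every custody entry are unaltered."""
    expected_capture = sha256_hex(capture_bytes)
    prev_hash = GENESIS
    for entry in log:
        body = {key: entry[key] for key in FIELDS}
        recomputed = sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))
        if entry["prev_hash"] != prev_hash:
            return False  # chain broken or entry removed/reordered
        if entry["capture_hash"] != expected_capture:
            return False  # capture bytes differ from what was recorded
        if recomputed != entry["entry_hash"]:
            return False  # entry fields edited after the fact
        prev_hash = entry["entry_hash"]
    return True
```

An unkeyed hash chain of this kind detects edits only if the most recent entry hash is also anchored somewhere the editor cannot reach, such as a write-once store or a signed timestamp. Without that anchor, someone with write access could recompute the whole chain.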

DORA — Article 12 (ICT-Related Incident Management)

DORA Article 12 requires financial entities to have ICT-related incident management processes, including the ability to detect, manage, and report incidents. For computer-use agents, visual evidence provides critical incident management capability: when an agent interaction with a financial system produces an unexpected outcome, the visual evidence enables rapid determination of whether the issue was an agent error, a system error, or an adversarial intervention. Without visual evidence, incident management for computer-use agents is limited to structured log analysis, which may be insufficient for visual-interface interactions.

NIST AI RMF — GOVERN 1.4, MEASURE 2.6

GOVERN 1.4 addresses organisational practices for AI risk management documentation. Visual evidence extends documentation to the visual domain for computer-use agents. MEASURE 2.6 addresses the evaluation of AI system performance in operational conditions. Visual evidence enables performance evaluation that includes the visual context — assessing not just what the agent did, but whether what it did was appropriate given what it saw.

10. Failure Severity

Field	Value
Severity Rating	High
Blast Radius	Cross-system — affects all computer-use agent deployments, incident investigation capability, regulatory evidence production, and legal defensibility of agent-mediated decisions

Consequence chain: Without visual evidence governance, the organisation loses the ability to reconstruct what computer-use agents saw and why they acted as they did. The immediate technical failure is an evidentiary gap: structured logs record actions but not the visual context that informed those actions. The operational impact is that incident investigation becomes inconclusive: when a computer-use agent makes an error, the organisation cannot distinguish between agent perception failure, source system error, and adversarial manipulation. The regulatory consequence is a record-keeping deficiency: regulators reviewing agent operations cannot verify the informational basis for agent decisions, which undermines the organisation's ability to demonstrate compliance with oversight requirements. Ungoverned visual capture fails in the opposite direction, and the privacy consequence is equally severe: capturing visual evidence without classification, redaction, and access controls creates a growing archive of sensitive personal data that violates data minimisation principles and creates breach exposure. The legal consequence is that visual evidence without integrity protection is likely to be challenged as inadmissible or given little evidentiary weight: the organisation captures evidence at significant cost but cannot rely on it when needed because it lacks the forensic properties (authenticity, integrity, chain of custody) that courts and regulators expect. In this ungoverned-capture case, the combined effect is that the organisation bears the storage and operational costs of visual evidence without receiving the forensic, regulatory, or legal benefits, while simultaneously creating privacy liability from ungoverned sensitive data capture.

Cross-references: AG-006 (Tamper-Evident Record Integrity) provides the integrity mechanism for visual evidence hashes. AG-014 (Data Classification Governance) defines the classification taxonomy applied to visual evidence regions. AG-015 (PII & Sensitive Data Handling) governs the handling of personal data captured in visual evidence. AG-016 (Data Retention & Right to Erasure) governs retention periods and erasure obligations. AG-049 (Explainability Governance) benefits from visual evidence as an explainability artefact. AG-409 (Critical Event Taxonomy Governance) defines which events trigger elevated visual capture. AG-410 (High-Cardinality Trace Retention Governance) addresses retention strategies for high-volume evidence. AG-415 (Decision Journal Completeness Governance) integrates visual evidence into the decision record. AG-416 (Evidentiary Chain-of-Custody Governance) provides the broader chain-of-custody framework.

Cite this protocol

AgentGoverning. (2026). AG-411: Video and Screen Evidence Governance. The Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-411
