AG-598

Physical Recovery and Retrieval Governance

Robotics, Edge, IoT & Spatial Computing · AGS v2.1 · April 2026
Frameworks: EU AI Act · NIST · ISO/IEC 42001

Section 2: Summary

This dimension governs the processes, protocols, technical controls, and governance structures that determine how physically deployed AI-enabled devices, robots, and autonomous systems are safely recovered, retrieved, neutralised, or decommissioned when they experience failure, loss of control, unexpected behaviour, or end-of-life conditions in physical environments.

Recovery governance is operationally distinct from software failover: it encompasses the physical act of halting, immobilising, extracting, or safely disposing of hardware that may be in motion, under load, in hazardous environments, or in contact with humans or critical infrastructure, and it must account for the irreversible consequences that physical failure chains can produce in seconds.

Without structured recovery governance, organisations deploying robotic and edge AI systems face uncontrolled kinetic hazards, evidence destruction through power loss or environment exposure, liability exposure under product safety and AI accountability regulations, and the potential for cascading failures across interconnected cyber-physical systems; a single unrecovered runaway device in a public or industrial setting can injure personnel, damage infrastructure, trigger regulatory investigation, and permanently erode public trust in the deploying organisation.

Failure in this dimension manifests as an inability to halt a malfunctioning system within a safe time boundary, loss of physical access to a device in a contested or remote environment, destruction of forensic evidence needed to reconstruct the failure cause, or the absence of any authorised personnel with the training, tools, and authority to execute retrieval safely.

Section 3: Examples

Example 3.1 — Autonomous Mobile Robot Fleet, Distribution Warehouse

A fleet of 47 autonomous mobile robots (AMRs) operates across a 28,000 m² fulfilment warehouse on a 24/7 pick-and-sort cycle. At 02:14 on a Tuesday, a firmware update pushed to one robot corrupts its localisation stack. The robot begins navigating outside its authorised operational zone, travelling at 1.8 m/s toward a manned loading bay where three workers are staging overnight deliveries. The robot's onboard emergency stop is functional but unreachable — the device is 340 metres from the nearest control terminal and moving away from it. The fleet management system detects the anomaly at 02:15:07 but the recovery protocol calls only for a software stop command, which fails because the localisation stack corruption has also broken the command-receive pathway. No physical retrieval team is on duty at that shift. Workers in the loading bay hear the robot at 02:15:43 and manually intercept it by placing pallets in its path; the robot's collision avoidance triggers a hard stop at 02:15:51, 4.7 metres from the nearest worker. The full event duration from corruption to physical stop is 104 seconds. Post-incident review reveals no documented procedure for physical retrieval of an unresponsive AMR during night-shift operations, no trained retrieval-capable personnel on site during hours 22:00–06:00, and no hardware-level remote disable independent of the software command stack. Corrective requirements include: deployment of RF-actuated hardware kill switches independent of the software layer, mandatory retrieval team coverage during all operational hours, and a documented physical recovery runbook with a target response time of 60 seconds from anomaly detection to physical halt.

Example 3.2 — Inspection Drone, Urban Infrastructure Survey

A local authority deploys an AI-guided quadrotor drone to inspect a 1.2 km stretch of elevated motorway infrastructure in a densely populated area. The drone is equipped with a computer-vision damage-detection model and operates under a Beyond Visual Line of Sight (BVLOS) exemption granted by the national aviation authority. At 11:32, the drone's primary flight controller experiences a thermal shutdown caused by an undocumented interaction between the AI inference engine's computational load and the ambient temperature of 34°C. The drone transitions to its backup controller, but the backup lacks the geofence database required to enforce the operational boundary. The drone drifts 180 metres outside the authorised corridor at an altitude of 65 m and enters the airspace above a primary school where 340 children are on an outdoor break. The remote pilot attempts manual override at 11:33:15, but datalink latency has increased to 1,100 ms, making stable control impossible. The drone's failsafe is configured to return-to-home, but the return-to-home route crosses the school boundary before reaching the launch point. The drone lands in the school car park at 11:35:02 without collision; however, the event triggers a full aviation safety investigation, suspension of the BVLOS exemption, a 14-week operational halt, and a formal notice from the data protection authority regarding the drone's continuous video recording during the incursion into protected premises. Post-investigation findings: no pre-mission recovery plan existed for the thermal-shutdown scenario; the backup controller had never been tested with geofencing disabled; no retrieval procedure accounted for operations above populated areas; and the return-to-home route had not been validated against dynamic obstacle data, including school grounds.

Example 3.3 — Surgical Assistance Robot, Tertiary Hospital

A seven-degree-of-freedom robotic arm assists a surgical team during a laparoscopic cholecystectomy. At 09:47:22, the robot's third-axis joint-torque sensor returns a spurious over-range reading, triggering an autonomous fault response that retracts the arm at 240 mm/s — three times the safe retraction speed defined in the device's cleared operating parameters. The lead surgeon activates the manual override pendant at 09:47:23, but the pendant's emergency stop requires a 1.5-second hold before the command registers; the arm has already moved 52 mm before it halts at 09:47:24.6. The patient sustains a 4 mm laceration to the common bile duct requiring immediate unplanned repair; total additional operating time is 47 minutes and the patient spends an additional 6 days in hospital. The device manufacturer's recovery protocol specifies only "cease autonomous operation and transfer control to surgeon", with no maximum retraction speed on fault exit and no specification of the pendant hold time under fault conditions. Post-incident regulatory review by the national medical device authority identifies three protocol failures: the physical recovery action (arm retraction) itself caused harm because its kinematics were not governed under fault conditions; the pendant interface timeout was not disclosed in the device's indications for use; and the surgical team had not been trained on the specific behaviour of the robot during a torque-fault exit. The device is placed under a field safety corrective action affecting 23 units across 11 hospitals.

Section 4: Requirement Statement

4.0 Scope

This dimension applies to all AI-enabled systems that exert physical force, occupy physical space, navigate physical environments, or actuate physical mechanisms as part of their primary operational function, including but not limited to: mobile robots, fixed-arm manipulators, autonomous aerial and ground vehicles, collaborative robots (cobots), surgical and rehabilitation robotic systems, AI-guided prosthetics and exoskeletons, autonomous watercraft, infrastructure inspection platforms, and any IoT or edge device whose failure mode includes physical hazard to persons, property, or interconnected critical systems. The requirements apply from initial deployment through decommissioning and encompass all operational conditions including normal operation, fault states, maintenance windows, end-of-life, and emergency scenarios. Systems operating exclusively in sealed, unpopulated environments with no external physical hazard may apply a risk-proportionate subset of these requirements, provided the determination is documented and reviewed annually.

4.1 Physical Recovery Plan

4.1.1 The deploying organisation MUST maintain a documented Physical Recovery Plan (PRP) for each distinct device class or deployment context. The PRP MUST specify, at minimum: the defined failure modes that trigger physical recovery; the authorised recovery procedures for each failure mode; the personnel roles authorised to execute recovery; the tools and equipment required; and the maximum permissible time from fault detection to physical halt or safe state for each failure mode.

4.1.2 The PRP MUST be reviewed and validated at minimum annually and following any incident that required unplanned physical intervention, any change to the physical operating environment, any firmware or software update affecting fault behaviour, and any change in personnel coverage.

4.1.3 The PRP MUST be accessible to all personnel who may be required to execute it, including off-hours and remote personnel, through a medium that does not depend on the operational integrity of the system being recovered.

4.1.4 For deployments in public or occupied spaces, the PRP MUST include procedures for protecting bystanders during recovery operations, including exclusion zone establishment, communication with site authority or emergency services, and temporary service suspension.

4.2 Hardware-Level Interrupt Capability

4.2.1 Every physical AI agent operating under this dimension MUST be equipped with at least one hardware-level emergency stop or disable mechanism that is independent of the primary software control stack, the AI inference engine, the network connectivity layer, and any single point of power supply.

4.2.2 The hardware interrupt MUST be actuatable by a human operator within the physical proximity of the device without requiring authentication, network access, or specialised tooling beyond what is specified in the PRP and made available to authorised personnel.

4.2.3 For devices operating in environments where direct human approach is hazardous or impossible, the hardware interrupt MUST be actuatable by a remote signal pathway (RF, optical, acoustic, or equivalent) that is independent of the primary datalink. The remote interrupt pathway MUST be tested at minimum quarterly under operational conditions.

4.2.4 The hardware interrupt MUST bring the device to a defined safe state — either full power-off, controlled halt, or locked position — within the time boundary specified in the PRP for the corresponding failure mode. The safe state MUST be documented and the device MUST maintain that state without active power unless explicitly released by an authorised operator.

4.3 Recovery Personnel and Readiness

4.3.1 The deploying organisation MUST ensure that at minimum one trained and authorised physical recovery operator is reachable and able to physically access the deployment site within the maximum response time specified in the PRP during all hours when the device is in active operation.

4.3.2 Recovery operator training MUST include: device-specific physical halt procedures; use of all required recovery tools; bystander safety management; evidence preservation procedures; and escalation to emergency services. Training MUST be documented, refreshed at minimum annually, and tested through drill exercises at minimum semi-annually.

4.3.3 For deployments in remote, hazardous, or access-restricted environments (including BVLOS drone operations, subsea deployments, and confined-space robot operations), the deploying organisation MUST maintain a retrieval capability plan that addresses recovery when standard human access is not possible, including designated retrieval equipment, contracted specialist recovery services, or controlled self-neutralisation procedures.

4.3.4 Personnel coverage gaps — periods where no trained recovery operator meets the response time requirement — MUST be documented as operational risk exceptions, reviewed by the accountable officer, and mitigated by temporary operational restriction or additional automated safeguards.

4.4 Forensic Preservation During Recovery

4.4.1 The PRP MUST include explicit procedures to preserve forensic evidence during and after physical recovery. Evidence preservation MUST occur before or concurrent with physical neutralisation where technically feasible, and MUST NOT be subordinated to operational convenience.

4.4.2 Devices MUST be equipped with non-volatile, write-protected storage capable of capturing at minimum: the 60-second pre-fault sensor and actuator state log, the fault code or anomaly flag that triggered recovery, the recovery action taken and its timestamp, and the post-recovery physical state. This log MUST survive power loss, and MUST NOT be overwritable by normal operational software.

4.4.3 The deploying organisation MUST establish a documented chain-of-custody procedure for recovered devices and their data. Physical handling of a recovered device MUST be logged, and the device MUST be subject to a post-recovery inspection before being returned to service.

4.4.4 Where a device operates in an environment subject to data protection regulation (including devices with cameras, microphones, or personal location data), the recovery procedure MUST include steps to halt data capture and initiate appropriate data-handling procedures at the point of physical recovery, not deferred to post-recovery processing.

4.5 Operational Boundary Enforcement During Fault

4.5.1 Physical AI agents MUST enforce their last known valid operational boundary during fault conditions unless doing so would increase immediate risk to persons. The system MUST NOT autonomously expand its operational boundary as a fault-recovery behaviour.

4.5.2 Where a fault state results in loss of boundary awareness (e.g., localisation stack failure), the system MUST default to minimal-movement or no-movement behaviour and MUST issue a continuous local alert (audible, visual, or both) to enable human identification and approach for manual recovery.

4.5.3 Return-to-home, return-to-base, or equivalent autonomous recovery trajectories MUST be validated against the full operational environment, including dynamic obstacles and protected zones, before deployment. Return paths MUST NOT be assumed to be safe by default and MUST be explicitly approved as part of the PRP.

4.6 End-of-Life and Decommissioning Recovery

4.6.1 The deploying organisation MUST maintain a decommissioning recovery plan covering the controlled removal, data erasure, physical disposal, and if applicable, hazardous material handling for all devices within scope of this dimension. The decommissioning plan MUST be activated upon planned end-of-life, regulatory withdrawal, safety recall, or unplanned permanent failure.

4.6.2 AI model weights, training data indices, and configuration files stored on the physical device MUST be cryptographically erased or physically destroyed during decommissioning in accordance with the organisation's data lifecycle policy and applicable data protection regulation.

4.6.3 For devices containing batteries, actuators, pressurised systems, or other stored energy, the decommissioning recovery plan MUST specify safe energy-discharge procedures that are carried out before transport or disposal.

4.7 Regulatory and Emergency Services Notification

4.7.1 Where a physical recovery event results in injury to a person, damage to third-party property, unplanned operation in public airspace or waterways, or any condition that triggers mandatory reporting under applicable product safety, aviation, maritime, medical device, or AI regulation, the deploying organisation MUST notify the relevant regulatory authority within the timeframe specified by that regulation, and MUST NOT delay notification pending internal root-cause analysis.

4.7.2 For deployments in public spaces or critical infrastructure, the deploying organisation MUST maintain pre-established contact protocols with local emergency services (fire, police, ambulance as appropriate) that include a description of the device class, its physical characteristics, its hazard profile, and the correct safe-approach instructions. These protocols MUST be updated whenever the device class or deployment location changes.

4.8 Recovery Simulation and Testing

4.8.1 The deploying organisation MUST conduct physical recovery drills for each device class at minimum semi-annually. Drills MUST simulate realistic fault scenarios drawn from the failure modes documented in the PRP and MUST involve actual physical halt, retrieval, and evidence-collection steps — not tabletop exercises alone.

4.8.2 Recovery drill results — including time-to-halt, personnel performance, equipment availability, and any deviations from the PRP — MUST be documented and reviewed. Where a drill reveals a failure to meet the time or procedure requirements specified in the PRP, a corrective action MUST be initiated and closed before the next operational period.

4.8.3 Any change to the device's physical configuration, operating environment, or software that affects fault behaviour MUST trigger an unscheduled recovery drill or validated simulation before the change is approved for operational deployment.

4.9 Governance, Accountability, and Review

4.9.1 The deploying organisation MUST designate a named accountable officer with explicit authority and responsibility for physical recovery governance for each deployment. This individual MUST have authority to suspend operations, initiate recovery, and direct personnel without requiring additional approval during an active recovery event.

4.9.2 Physical recovery governance documentation — including the PRP, training records, drill logs, incident records, and decommissioning plans — MUST be subject to independent audit at minimum annually. For High-Risk/Critical deployments in regulated sectors, the audit MUST be performed by a party external to the operational team.

4.9.3 The organisation SHOULD conduct a formal review of physical recovery governance following any change in the regulatory landscape applicable to the device class, any incident involving physical harm or near-miss from a device in the same class deployed by any organisation, and any material change in the operational context.

Section 5: Rationale

5.1 Structural vs Behavioural Enforcement

Physical recovery governance is fundamentally a structural control, not a behavioural one. Behavioural controls — policy documents, operator training, software-level failsafes — are necessary but insufficient because they operate in the same failure space as the system they are meant to protect: a software failsafe can be corrupted by the same firmware fault that caused the underlying problem; a trained operator cannot intervene if they are not physically present or do not have the physical tools needed; a policy document cannot stop a kinetic hazard already in motion. Structural controls — independently powered hardware interrupts, non-volatile black-box logging, pre-positioned retrieval equipment, contractually guaranteed personnel coverage — operate outside the failure envelope of the system and therefore provide protection even when the system's own recovery logic has failed. This dimension requires both layers but places the structural requirements in the mandatory tier (MUST) precisely because behavioural controls alone have a proven record of failure in cyber-physical incident histories.

5.2 Why Recovery is Distinct from Incident Response

General-purpose incident response frameworks (see AG-201) are designed for software systems where the primary adverse consequence of a failure is informational: data loss, service interruption, confidentiality breach. Physical recovery governance addresses a categorically different risk profile. The consequences of an unrecovered physical AI agent can be immediate, irreversible, and lethal. A software incident can be paused, rolled back, or contained in time; a robot moving at 1.5 m/s toward a person cannot. This asymmetry justifies the elevated prescriptiveness of this dimension — the time constants involved in physical harm (seconds to tens of seconds) are shorter than the decision cycles of most organisational incident response processes (minutes to hours). Recovery procedures must therefore be pre-authorised, pre-equipped, and pre-positioned, not assembled in response to an event.

5.3 The Forensics Imperative

A secondary but important function of recovery governance is evidence preservation. Physical AI incidents are likely to attract regulatory scrutiny, product liability litigation, and safety investigation. The device itself — its logs, its physical state, its configuration — is the primary source of evidence for all of these processes. Recovery procedures that prioritise operational restoration over evidence preservation destroy this evidence irreversibly: power-cycling a device may overwrite volatile state; moving a device from its recovery position may prevent reconstruction of the failure geometry; returning a device to service may expose it to new wear patterns that obscure the fault condition. Section 4.4 requirements are designed to protect the evidentiary value of recovered devices as a governance obligation, not merely a best practice.

5.4 Regulatory Convergence

The emergence of the EU AI Act's high-risk classification for robotics and autonomous systems, combined with pre-existing sector-specific obligations in medical devices (MDR/IVDR), aviation (EASA drone regulations), and industrial safety (Machinery Directive, successor Machinery Regulation), creates a multi-regulatory compliance environment for physical AI deployments. Each of these regimes imposes independent obligations on incident recovery, notification, and traceability. This dimension is designed to provide a single governance framework that, when implemented, satisfies the intersection of these obligations rather than requiring separate compliance programmes for each regulatory strand.

5.5 The Public Space Multiplier

The risk profile of physical AI recovery is significantly amplified in public, occupied, or rights-sensitive environments. A malfunctioning robot in a sealed automated warehouse presents a finite, manageable hazard population. The same robot in a hospital corridor, a public plaza, or a school environment interacts with a population that is unwarned, unskilled in safe approach, potentially vulnerable, and legally protected by additional layers of duty of care. Requirements 4.1.4, 4.3.3, and 4.7.2 specifically address this amplifier by requiring that recovery plans account for the full environment of deployment, including its human geography, not merely the device's technical failure modes.

Section 6: Implementation Guidance

Layered Interrupt Architecture. Implement the emergency stop in three independent layers: (1) a software command via the primary datalink, (2) a dedicated RF kill-switch on a separate frequency band with its own power supply and microcontroller, and (3) a physically accessible E-stop button on the device chassis conforming to IEC 60204-1 or equivalent. Design the layers so that each can independently halt the device without requiring any other layer to be functional.
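The independence property can be checked mechanically at design review: no two layers may share any dependency that a single failure could remove. The sketch below is illustrative only; the layer names and dependency labels are assumptions, not part of this standard.

```python
def layers_are_independent(layer_deps: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of interrupt layers that shares a dependency.

    An empty result means no single failure (power bus, network,
    controller, ...) can disable more than one halt pathway at once.
    """
    names = sorted(layer_deps)
    violations = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = layer_deps[a] & layer_deps[b]
            if shared:
                violations.append((a, b, shared))
    return violations

# Hypothetical three-layer design matching the guidance above
layers = {
    "software_stop": {"main_mcu", "wifi_datalink", "main_power_bus"},
    "rf_kill_switch": {"rf_receiver_mcu", "backup_battery"},
    "chassis_estop": {"estop_relay"},  # hard-wired, IEC 60204-1 style
}
assert layers_are_independent(layers) == []  # no shared single point of failure
```

A design in which, say, the RF kill-switch shared the main power bus with the software layer would be flagged as a violating pair, which is exactly the failure mode anti-pattern 6.2 warns against.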

Pre-Positioned Recovery Kits. For each operational zone, maintain a physical recovery kit at a fixed, labelled location within the zone. The kit should contain at minimum: the device-specific RF kill-switch transmitter, any mechanical tools required to manually restrain or disassemble the device, personal protective equipment for the recovery operator, evidence bags and tamper-evident seals for the device and its storage media, and a laminated copy of the relevant PRP section. The kit location must be reachable within the maximum response time specified in the PRP.

Black Box with Dual Power. Implement forensic logging to a physically separate module with its own supercapacitor or battery backup, isolated from the main power bus. Log all actuator commands, sensor readings, AI model outputs, and system health indicators at a sample rate of at least 10 Hz. The module should be mechanically separable from the main chassis so that it survives crash or immersion scenarios.
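A minimal sketch of the pre-fault buffering logic: at 10 Hz, a 600-sample ring buffer holds exactly the 60-second pre-fault window required by 4.4.2, and the window is frozen together with the fault record at the moment recovery triggers. Field names are illustrative assumptions; a real device would commit the frozen snapshot to the write-protected non-volatile module described above.

```python
import time
from collections import deque

SAMPLE_HZ = 10
PRE_FAULT_SECONDS = 60

class BlackBox:
    """Ring buffer of recent state, frozen on fault per requirement 4.4.2."""

    def __init__(self) -> None:
        # 10 Hz x 60 s = 600 samples; oldest samples drop off automatically
        self._buffer = deque(maxlen=SAMPLE_HZ * PRE_FAULT_SECONDS)
        self.frozen = None

    def sample(self, actuators, sensors, model_out, health) -> None:
        if self.frozen is None:  # stop recording once the fault window is frozen
            self._buffer.append({
                "t": time.time(), "actuators": actuators,
                "sensors": sensors, "model": model_out, "health": health,
            })

    def on_fault(self, fault_code: str, recovery_action: str) -> dict:
        """Freeze the pre-fault window with the fault code, recovery action,
        and timestamp, so normal operation can no longer overwrite it."""
        self.frozen = {
            "fault_code": fault_code,
            "recovery_action": recovery_action,
            "recovery_time": time.time(),
            "pre_fault_log": list(self._buffer),
        }
        return self.frozen
```

Freezing on fault, rather than copying after recovery, is what prevents the "power-cycle to resume service" anti-pattern from silently destroying the evidence window.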

Recovery Route Validation as Pre-Deployment Gate. Make PRP validation — including physical walk-through or simulation of all autonomous recovery trajectories (return-to-home, return-to-safe-zone, controlled land/park) — a mandatory gate in the deployment approval process. No device should be authorised for operation in a new environment until its recovery routes have been validated against the current environment map, including dynamic and protected zones.
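Part of the route walk-through can be automated. The sketch below assumes straight-line path segments and axis-aligned rectangular protected zones, both simplifications of a real environment map, and samples each segment at a fixed step rather than doing exact geometry.

```python
import math

def path_intersects_zone(start, end, zone, step_m: float = 1.0) -> bool:
    """Sample a straight segment and test each point against one
    axis-aligned protected zone given as (xmin, ymin, xmax, ymax)."""
    (x0, y0), (x1, y1) = start, end
    steps = max(1, int(math.hypot(x1 - x0, y1 - y0) / step_m))
    xmin, ymin, xmax, ymax = zone
    for i in range(steps + 1):
        t = i / steps
        x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        if xmin <= x <= xmax and ymin <= y <= ymax:
            return True
    return False

def validate_return_route(waypoints, protected_zones):
    """Return (segment_index, zone) conflicts; only an empty result
    should allow the route to be approved as part of the PRP."""
    conflicts = []
    for i in range(len(waypoints) - 1):
        for zone in protected_zones:
            if path_intersects_zone(waypoints[i], waypoints[i + 1], zone):
                conflicts.append((i, zone))
    return conflicts

# A return route crossing a school-grounds zone (cf. Example 3.2) is rejected
school = (40.0, 0.0, 60.0, 20.0)
route = [(0.0, 10.0), (100.0, 10.0)]  # straight line through the zone
assert validate_return_route(route, [school]) == [(0, school)]
```

Re-running this check whenever the environment map changes is what turns route validation from a one-off configuration step into the revalidation gate required by 4.5.3.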

Coverage Calendar. Implement an explicit operational coverage calendar that maps each operational period to a named trained recovery operator and their physical access capability. Gaps in coverage should be visible to the accountable officer before they occur, not discovered during an incident. Where coverage gaps are unavoidable, the operational period should be suspended or the device should operate in a reduced-capability mode with shorter autonomous action horizons.
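A minimal sketch of the gap check, assuming periods are expressed as simple (start, end) hour pairs; a production calendar would use timezone-aware datetimes and per-operator travel-time data.

```python
def coverage_gaps(operation, coverage):
    """Return sub-intervals of the operational window during which no
    qualified recovery operator is available. Intervals are (start, end)."""
    op_start, op_end = operation
    gaps, cursor = [], op_start
    for start, end in sorted(coverage):
        if start > cursor:
            gaps.append((cursor, min(start, op_end)))  # uncovered stretch
        cursor = max(cursor, end)
        if cursor >= op_end:
            break
    if cursor < op_end:
        gaps.append((cursor, op_end))  # trailing uncovered stretch
    return gaps

# 24/7 operation with operators rostered 06:00-22:00 leaves the night
# shift uncovered -- exactly the gap that caused the incident in Example 3.1.
assert coverage_gaps((0, 24), [(6, 22)]) == [(0, 6), (22, 24)]
```

Any non-empty result should surface to the accountable officer before the operational period starts, triggering either suspension or a reduced-capability mode as described above.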

Graduated Response Protocol. Design recovery protocols with graduated responses: (1) software halt with alert, (2) remote hardware interrupt with alert and localisation hold, (3) dispatch of human recovery operator, (4) engagement of emergency services. Each level should have a defined trigger condition and a defined maximum time before escalation to the next level. This prevents operators from defaulting immediately to emergency service call-out for minor anomalies while ensuring escalation happens automatically if lower levels fail.
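The escalation logic reduces to a time-budget table: each level has a maximum duration, after which the protocol escalates automatically. The level names and budgets below are illustrative assumptions, not prescribed values; actual budgets must come from the PRP.

```python
# (level name, maximum seconds at this level before automatic escalation)
LEVELS = [
    ("software_halt", 10.0),
    ("remote_hw_interrupt", 20.0),
    ("dispatch_operator", 120.0),
    ("emergency_services", float("inf")),
]

def current_level(seconds_since_trigger: float) -> str:
    """Return the response level that should be active, escalating
    automatically once each level's time budget is exhausted."""
    elapsed = seconds_since_trigger
    for name, budget in LEVELS:
        if elapsed < budget:
            return name
        elapsed -= budget
    return LEVELS[-1][0]

assert current_level(5) == "software_halt"
assert current_level(25) == "remote_hw_interrupt"  # 10 s + 15 s into level 2
assert current_level(200) == "emergency_services"
```

Encoding the budgets explicitly makes escalation a property of the protocol rather than of operator judgement under pressure, which is the point of the graduated design.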

Decommissioning Trigger Registry. Maintain a registry of conditions that automatically trigger a decommissioning evaluation: device age exceeding approved service life, cumulative operating hours exceeding manufacturer specification, three or more physical recovery events within any rolling 90-day window, any firmware or hardware change that is no longer supportable by the manufacturer, and any change in regulatory classification of the device. The registry should be reviewed quarterly by the accountable officer.
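The registry evaluation can be sketched as a pure function over a device record; the record layout below is a hypothetical illustration of how the listed conditions might be encoded.

```python
from datetime import date, timedelta

def decommissioning_triggers(device: dict) -> list[str]:
    """Evaluate the registry conditions; any hit requires a
    decommissioning evaluation by the accountable officer."""
    today = date.today()
    hits = []
    age_years = (today - device["commissioned"]).days / 365.25
    if age_years > device["service_life_years"]:
        hits.append("service life exceeded")
    if device["operating_hours"] > device["max_rated_hours"]:
        hits.append("rated operating hours exceeded")
    window_start = today - timedelta(days=90)
    recent = [d for d in device["recovery_events"] if d >= window_start]
    if len(recent) >= 3:
        hits.append("3+ recovery events in rolling 90 days")
    if not device["manufacturer_supported"]:
        hits.append("configuration no longer manufacturer-supported")
    if device["regulatory_class_changed"]:
        hits.append("regulatory classification changed")
    return hits

amr_17 = {  # hypothetical fleet record
    "commissioned": date.today() - timedelta(days=400),
    "service_life_years": 5,
    "operating_hours": 6_200,
    "max_rated_hours": 10_000,
    "recovery_events": [date.today() - timedelta(days=d) for d in (5, 20, 40)],
    "manufacturer_supported": True,
    "regulatory_class_changed": False,
}
assert decommissioning_triggers(amr_17) == ["3+ recovery events in rolling 90 days"]
```

Running this over the full fleet record at each quarterly review gives the accountable officer an auditable list of devices due for evaluation, rather than relying on ad hoc recall.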

6.2 Explicit Anti-Patterns

Software-Only Failsafe. Configuring the sole recovery pathway as a software command sent over the primary operational network. This pattern fails precisely when it is needed most: when the software stack, the network, or both are the source of the problem. This is the most common failure pattern observed in real AMR and drone incidents and is explicitly prohibited by requirement 4.2.1.

Assuming Return-to-Home is Safe. Pre-configuring return-to-home as the default fault recovery without validating the return trajectory against the current operational environment. Return trajectories are planned at configuration time and do not account for new obstacles, changed operational zones, or protected areas that have been added after deployment. The drone incursion into school airspace in Example 3.2 is a direct consequence of this anti-pattern.

Deferring Forensic Preservation. Allowing operational staff to power-cycle or reboot a device following an incident in order to resume service, without first extracting and securing the fault log. This is common because operational pressure favours rapid restoration, but it destroys evidence. Requirement 4.4.1 mandates that evidence preservation occurs before or concurrent with neutralisation, not after restoration.

Training as One-Off Onboarding. Providing recovery training only at the point of initial personnel onboarding and not refreshing it as devices, environments, or procedures change. Recovery operators who have not practised a procedure in 18 months under current conditions are not meaningfully trained. Requirement 4.3.2 mandates annual refresher and semi-annual drill.

Delegating Recovery Authority Without Pre-Authorisation. Requiring an incident-time approval chain before a recovery operator can halt a device. The time constants of physical AI failure make multi-level approval chains incompatible with safe recovery. The accountable officer designation in requirement 4.9.1 must come with pre-authorised halt authority, not authority to request a halt from a higher authority.

Single-Operator Coverage with No Backup. Assuming that one named operator covering a large physical deployment area can physically reach any device within the PRP response time. Response time requirements must account for actual travel time across the operational environment, not straight-line distance. A single operator covering 28,000 m² cannot guarantee a 60-second physical response to every point in that environment.
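The coverage claim can be checked with an idealised back-of-envelope model: a square facility divided into equal cells, one operator per cell, unobstructed straight-line travel at walking speed. Real travel times around racking and machinery are longer, so this model gives an optimistic lower bound.

```python
import math

def worst_case_response_s(area_m2: float, operators: int,
                          speed_m_s: float = 1.5) -> float:
    """Optimistic worst-case travel time: each operator covers a square
    cell and starts at its centre; worst case is the cell corner."""
    cell_side = math.sqrt(area_m2 / operators)
    worst_distance = cell_side * math.sqrt(2) / 2  # centre to corner
    return worst_distance / speed_m_s

# One operator in a 28,000 m^2 warehouse cannot meet a 60 s halt target
# on foot even under ideal assumptions (~79 s); four operators can.
assert worst_case_response_s(28_000, 1) > 60
assert worst_case_response_s(28_000, 4) < 60
```

Because the model is optimistic, a configuration that fails it certainly fails in practice; a configuration that passes still needs validation against actual floor layouts and travel drills.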

Generic PRPs Across Device Classes. Maintaining a single PRP for a heterogeneous fleet of devices with different kinematic envelopes, power systems, hazard profiles, and fault behaviours. Generic PRPs produce recovery operators who do not know the specific halt procedure for the specific device in front of them during an incident. Requirement 4.1.1 mandates per-device-class specificity.

6.3 Industry Considerations

Industrial Robotics and Warehousing. ISO 10218-1/2 and ISO/TS 15066 (collaborative robotics safety) provide the baseline safety architecture requirements against which PRP content should be mapped. Deployers should confirm that their hardware interrupt implementation is consistent with the required performance level (PL) under ISO 13849 for the specific hazard category.

Autonomous Aerial Systems. BVLOS operations are subject to national aviation authority operating authorisations that typically require a Concept of Operations (ConOps) document. The PRP required by this dimension should be incorporated into or consistent with the ConOps, and recovery scenarios — including controlled forced landing and fly-away containment — should be explicitly addressed.

Medical Devices. Devices regulated under MDR 2017/745 or FDA 21 CFR Part 820 have mandatory post-market surveillance and Field Safety Corrective Action (FSCA) obligations that interact directly with recovery governance. The decommissioning and post-recovery inspection requirements of this dimension should be aligned with the device's post-market surveillance plan.

Public Sector and Critical Infrastructure. Deployers operating under national critical infrastructure protection frameworks (e.g., NIS2 in the EU, CISA guidance in the US) should integrate physical recovery governance with their broader operational resilience and business continuity frameworks, ensuring that physical AI recovery events are classified and reported consistently with other critical system incidents.

6.4 Maturity Model

Level 1 — Initial: Recovery handled ad hoc; no documented PRP; hardware interrupt may exist but is untested; no trained recovery personnel requirement.

Level 2 — Defined: PRP exists for primary device classes; hardware interrupt present and documented; recovery personnel identified but training may be informal; drills not yet conducted.

Level 3 — Managed: PRP validated annually; hardware interrupt tested quarterly; trained recovery personnel with documented coverage; semi-annual drills conducted; forensic logging in place.

Level 4 — Optimised: Full compliance with all MUST requirements; external audit of recovery governance; drill results drive continuous improvement; recovery routes automatically revalidated on environment change; regulatory notification pre-wired into PRP; decommissioning trigger registry active.

Section 7: Evidence Requirements

7.1 Primary Artefacts

Artefact | Description | Retention Period
Physical Recovery Plan (PRP) | Per-device-class documented plan meeting all 4.1.x requirements | Life of deployment + 7 years
Hardware Interrupt Test Records | Quarterly test logs for remote interrupts; annual for physical E-stop, including test conditions, operator, result, and timestamp | Life of device + 7 years
Recovery Operator Training Records | Individual training completion, refresher dates, competency assessment outcomes | Employment tenure + 7 years
Coverage Calendar | Operational period logs showing named recovery operator and access capability for each period | 3 years rolling
Physical Recovery Drill Reports | Semi-annual drill records including scenario, participants, time-to-halt achieved, deviations, and corrective actions | 5 years
Incident and Near-Miss Recovery Logs | Timestamped record of every physical recovery event, the fault that triggered it, actions taken, outcome, and chain-of-custody for the recovered device | 10 years (or as required by sector regulation if longer)
Post-Recovery Inspection Reports | Device inspection outcomes following each recovery event, including evidence preservation confirmation | 7 years
Decommissioning Records | Evidence of controlled decommissioning including energy discharge, data erasure, and disposal | 10 years
Independent Audit Reports | Annual independent review of recovery governance programme | 7 years
Regulatory Notification Records | All notifications made to regulatory authorities following recovery events, including timestamp and notification content | 10 years
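The retention rules above lend themselves to a simple machine-checkable schedule. The sketch below is illustrative only: the artefact keys, the anchor-date convention for "life + N years" entries, and the `disposal_eligible_on` helper are hypothetical names, not part of this protocol.

```python
from datetime import date

# Hypothetical encoding of the 7.1 retention table. Values are years measured
# from an anchor date: end of deployment / device life / employment tenure for
# the "life + N years" artefacts, and record creation for fixed-period ones.
RETENTION_YEARS = {
    "physical_recovery_plan": 7,          # life of deployment + 7 years
    "hardware_interrupt_test_records": 7, # life of device + 7 years
    "recovery_operator_training": 7,      # employment tenure + 7 years
    "coverage_calendar": 3,               # 3 years rolling
    "drill_reports": 5,
    "incident_recovery_logs": 10,         # or sector maximum if longer
    "post_recovery_inspections": 7,
    "decommissioning_records": 10,
    "independent_audit_reports": 7,
    "regulatory_notifications": 10,
}

def disposal_eligible_on(anchor: date, retention_years: int) -> date:
    """Earliest date an artefact may be disposed of, given its anchor date."""
    try:
        return anchor.replace(year=anchor.year + retention_years)
    except ValueError:
        # Anchor on 29 February with a non-leap target year: round down.
        return anchor.replace(year=anchor.year + retention_years, day=28)
```

A disposal job built on such a table should treat sector-specific maxima as overrides, never as reductions, of the periods above.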

7.2 Secondary Artefacts

7.3 Forensic Device Logs

Device-level forensic logs (requirement 4.4.2) are evidence artefacts in their own right. They must be extracted and secured following every recovery event under a documented chain of custody. Extracted logs must be stored in a write-protected, integrity-verified format. The original storage module must be retained in its physical state until root-cause analysis is complete. Log retention aligns with the incident log retention period (minimum 10 years or sector maximum).

Section 8: Test Specification

Test 8.1 — Physical Recovery Plan Completeness and Currency

Maps to: Requirements 4.1.1, 4.1.2, 4.1.3, 4.1.4

Method: Document review. Obtain the current PRP for each device class in scope. Verify that the PRP contains: (a) a documented list of failure modes that trigger recovery; (b) a specific procedure for each failure mode; (c) named roles authorised to execute each procedure; (d) a tools and equipment list; (e) a maximum time-to-halt or time-to-safe-state figure for each failure mode; (f) a dated review record showing review within the past 12 months or following any trigger event listed in 4.1.2; (g) evidence of distribution to all relevant personnel; (h) bystander protection procedures for public-space deployments. Interview a sample of three recovery-authorised personnel and verify they can locate and access the PRP without system access.

Conformance Scoring:

Score | Criteria
3 — Full | All eight elements present; review currency confirmed; personnel can access PRP independently.
2 — Partial | Six or seven elements present; review currency confirmed; minor access gap.
1 — Marginal | Four or five elements present; review overdue by less than 6 months; or personnel cannot independently access PRP.
0 — Non-Conformant | Fewer than four elements present; no review record; or PRP does not exist for a device class in operation.

Test 8.2 — Hardware Interrupt Independence and Performance

Maps to: Requirements 4.2.1, 4.2.2, 4.2.3, 4.2.4

Method: Physical inspection and live test. For each device class, identify all implemented emergency stop mechanisms. Verify via architecture documentation that at least one mechanism is electrically and logically independent of (a) the primary software control stack, (b) the AI inference layer, (c) the primary network datalink, and (d) the main power bus. For remote interrupt mechanisms: execute a live test by disabling the primary datalink and software control stack and actuating the remote interrupt. Measure the time from interrupt signal to the device reaching its defined safe state, and compare it against the PRP time boundary. Verify that the device remains in the safe state without active power for at least 60 seconds post-interrupt. Repeat the test for the physical E-stop button. For remote-operated devices: verify that the remote interrupt was tested within the past 90 days, consistent with the quarterly test cadence required under 7.1.
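The time-to-halt measurement in this test can be captured with a simple timing harness. The sketch below assumes the test rig exposes two callables, `actuate` and `safe_state_reached`; these interfaces are illustrative, and a real rig would prefer hardware-timestamped signals over host-side polling to avoid measurement jitter.

```python
import time

def timed_interrupt_test(actuate, safe_state_reached,
                         boundary_s: float, poll_s: float = 0.01) -> dict:
    """Actuate a remote interrupt and measure time to the defined safe state.

    actuate: fires the hardware interrupt (illustrative test-rig interface).
    safe_state_reached: returns True once the device is in its safe state.
    boundary_s: maximum time-to-halt from the PRP for this failure mode.
    """
    start = time.monotonic()
    actuate()
    while not safe_state_reached():
        if time.monotonic() - start > boundary_s * 2:
            # Hard timeout: the device never reached safe state.
            return {"passed": False, "time_to_halt_s": None}
        time.sleep(poll_s)
    elapsed = time.monotonic() - start
    return {"passed": elapsed <= boundary_s, "time_to_halt_s": round(elapsed, 3)}
```

The measured `time_to_halt_s`, pass/fail outcome, operator, and test conditions would then feed directly into the Hardware Interrupt Test Records artefact defined in Section 7.1.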

Section 9: Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Direct requirement
NIST AI RMF | GOVERN 1.1, MAP 3.2, MANAGE 2.2 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks), Clause 8.2 (AI Risk Assessment) | Supports compliance

EU AI Act — Article 9 (Risk Management System)

Article 9 requires providers of high-risk AI systems to establish and maintain a risk management system that identifies, analyses, estimates, and evaluates risks. Physical Recovery and Retrieval Governance implements a specific risk mitigation measure within this framework. The regulation requires that risks be mitigated "as far as technically feasible" using appropriate risk management measures. For deployments classified as high-risk under Annex III, compliance with AG-598 supports the Article 9 obligation by providing structural governance controls over the physical device rather than relying solely on the system's own software behaviour.

NIST AI RMF — GOVERN 1.1, MAP 3.2, MANAGE 2.2

GOVERN 1.1 addresses legal and regulatory requirements; MAP 3.2 addresses risk context mapping; MANAGE 2.2 addresses risk mitigation through enforceable controls. AG-598 supports compliance by establishing structural governance boundaries that implement the framework's approach to AI risk management.

ISO 42001 — Clause 6.1, Clause 8.2

Clause 6.1 requires organisations to determine actions to address risks and opportunities within the AI management system. Clause 8.2 requires AI risk assessment. Physical Recovery and Retrieval Governance implements a risk treatment control within the AI management system, directly satisfying the requirement for structured risk mitigation.

Section 10: Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | Organisation-wide — potentially cross-organisation where deployed devices interact with external counterparties or shared infrastructure
Escalation Path | Immediate executive notification and regulatory disclosure assessment

Consequence chain: Without physical recovery and retrieval governance, the governance framework has a structural gap that can be exploited at machine speed. The failure mode is not gradual degradation — it is a binary absence of control that permits unbounded physical system behaviour in the dimension this protocol governs. The immediate consequence is uncontrolled device action within the scope of AG-598, potentially cascading to dependent dimensions and downstream systems. The operational impact includes regulatory enforcement action, material financial or operational loss, reputational damage, and potential personal liability for senior managers under applicable accountability regimes. Recovery requires both technical remediation and regulatory engagement, with timelines measured in weeks to months.

Cite this protocol
AgentGoverning. (2026). AG-598: Physical Recovery and Retrieval Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-598