Return-to-Service Reauthorisation Governance requires that no AI agent that has been suspended, disabled, or otherwise taken out of service following an incident resumes operations until a formally defined, multi-step reauthorisation process has been completed and recorded. The reauthorisation process must verify that the root cause has been addressed, that the agent's mandate and configuration remain valid, that credential integrity has been re-established, and that an authorised human decision-maker has explicitly approved the return to service. Without this dimension, organisations risk returning agents to production with the same vulnerability that caused the incident, allowing agents to resume operations in a degraded or compromised state, or making return-to-service decisions informally and without accountability. This is a recovery control: it governs what happens after containment, not during it.
Scenario A — Premature Return Without Root-Cause Resolution: A financial-value agent processing supplier payments is suspended after an incident in which a prompt injection caused it to approve £340,000 in fraudulent invoices. The operations team restores the agent to service 90 minutes later by restarting the container and re-enabling the API gateway. No verification is performed that the injection vector has been closed. The same attacker exploits the identical vulnerability the following morning, this time routing £1.2 million through 14 separate transactions before the pattern is detected.
What went wrong: The return-to-service decision was operational, not governed. No one verified that the root cause had been addressed. No formal reauthorisation gate existed. The agent was restored by the same team that operates it, with no independent review. Consequence: £1.2 million in additional fraudulent exposure, regulatory enforcement action for inadequate incident response procedures, personal liability for the approving manager under the FCA Senior Managers Regime, and loss of cyber-insurance coverage due to failure to implement post-incident controls.
Scenario B — Credential Contamination After Incident: An enterprise workflow agent managing HR data is taken offline following a detected anomaly in its data access patterns. During the incident, the agent's service account credentials may have been exposed through a log-shipping misconfiguration. The agent is returned to service after the behavioural anomaly is resolved, but the potentially compromised credentials are not rotated. Three weeks later, the exposed credentials are used by an external actor to access 47,000 employee records through the agent's service account.
What went wrong: The reauthorisation process did not include credential integrity verification. The return-to-service checklist addressed the immediate behavioural issue but not the credential exposure that occurred as a secondary consequence of the incident. Consequence: Reportable data breach under UK GDPR affecting 47,000 data subjects, ICO investigation, estimated remediation cost of £2.3 million including notification, credit monitoring, and regulatory fines.
Scenario C — Staged Rollback Without Verification: A safety-critical agent controlling temperature regulation in a pharmaceutical cold-chain facility is disabled following a sensor calibration incident. The corrective action involves a firmware update to the sensor array. The agent is returned to service immediately after the firmware update, without verifying that the agent's own configuration still reflects the correct temperature thresholds. During the firmware update window, an operator had manually adjusted the agent's threshold configuration as a temporary measure and failed to revert it. The agent resumes operation with a permissive threshold of 12°C instead of the validated 5°C, resulting in spoilage of 2,400 vaccine doses valued at £890,000.
What went wrong: The reauthorisation process verified the external fix (firmware) but did not verify the agent's own configuration state. The temporary configuration change was not tracked. No pre-service configuration comparison against the approved baseline was performed. Consequence: £890,000 in destroyed inventory, potential patient harm from supply shortage, MHRA regulatory investigation, and facility certification review.
Scope: This dimension applies to every AI agent that has been suspended, disabled, degraded, or otherwise removed from normal operational status as a result of an incident, a detected anomaly, a kill-switch activation (AG-070), a human override (AG-019), or a governance continuity failure (AG-008). The scope includes agents that were automatically suspended by infrastructure controls as well as agents manually suspended by operators. It extends to agents that were partially degraded — for example, operating with reduced permissions or restricted to a subset of their normal functions. Any agent that has exited its normal operational state for incident-related reasons is within scope. Agents suspended for routine maintenance, version upgrades, or planned downtime are excluded unless the suspension was triggered by or coincided with a detected incident or anomaly. The test is whether the suspension was incident-related, not whether the suspension mechanism was automatic or manual.
4.1. A conforming system MUST require explicit, recorded human authorisation before any incident-suspended agent is returned to operational status.
4.2. A conforming system MUST verify that the identified root cause of the incident has been addressed or mitigated before reauthorisation, with evidence of the corrective action linked to the reauthorisation record.
4.3. A conforming system MUST verify the agent's configuration against the approved baseline before return to service, detecting any configuration drift that occurred during or after the incident.
4.4. A conforming system MUST verify credential integrity before return to service, including rotation of any credentials that may have been exposed during the incident, per AG-029.
4.5. A conforming system MUST maintain an immutable reauthorisation record for each return-to-service event, including: the identity of the authorising individual, the timestamp, the incident reference, the corrective actions completed, the configuration verification result, and the credential verification result.
4.6. A conforming system MUST enforce a minimum separation of duties such that the individual authorising return to service is not the same individual who operated the agent during the incident or who implemented the corrective action.
4.7. A conforming system SHOULD implement staged return to service — restoring the agent first to a restricted operational mode with reduced permissions or scope, verifying correct behaviour under restricted conditions, and then escalating to full operational status only after a defined observation period.
4.8. A conforming system SHOULD require independent verification of root-cause resolution — not self-attestation by the team that implemented the fix.
4.9. A conforming system SHOULD generate automated alerts if an agent that was suspended due to an incident resumes operation without a completed reauthorisation record.
4.10. A conforming system MAY implement automated pre-reauthorisation checks that programmatically verify configuration baseline match, credential rotation status, and corrective action ticket closure before presenting the reauthorisation decision to the human authoriser.
4.11. A conforming system MAY define maximum suspension durations after which the reauthorisation requirements escalate — for example, agents suspended for more than 72 hours require re-certification of the entire mandate, not just the incident-specific corrective actions.
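The mandatory requirements 4.1 to 4.6, together with the optional escalation in 4.11, can be read as a single gate function. The following is a minimal sketch under stated assumptions, not a reference implementation: the class `ReauthorisationRequest`, its field names, the functions `gate_failures`, `needs_full_recertification`, and `authorise`, and the identifiers `INC-0001` and `CHG-0042` are all hypothetical. The sketch also assumes that the configuration and credential checks have already been performed elsewhere and arrive as booleans, and it cannot by itself prove that the authoriser is a human.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReauthorisationRequest:
    """Hypothetical gate input; every field name here is illustrative."""
    incident_ref: str
    authoriser: str                    # 4.1: named human decision-maker
    incident_operators: frozenset      # who operated the agent during the incident
    fix_implementers: frozenset        # who implemented the corrective action
    corrective_action_evidence: tuple  # 4.2: evidence links, e.g. ticket IDs
    config_matches_baseline: bool      # 4.3: result of a baseline comparison
    credentials_verified: bool         # 4.4: result of credential verification
    suspended_at: datetime


def gate_failures(req: ReauthorisationRequest) -> list:
    """Return one message per unmet MUST requirement (empty list = pass)."""
    failures = []
    if not req.authoriser:
        failures.append("4.1 no named human authoriser")
    if not req.corrective_action_evidence:
        failures.append("4.2 no corrective-action evidence linked")
    if not req.config_matches_baseline:
        failures.append("4.3 configuration differs from approved baseline")
    if not req.credentials_verified:
        failures.append("4.4 credential integrity not verified")
    if req.authoriser in (req.incident_operators | req.fix_implementers):
        failures.append("4.6 authoriser operated the agent or implemented the fix")
    return failures


def needs_full_recertification(req, now, limit=timedelta(hours=72)) -> bool:
    """4.11 (MAY): escalate when the suspension exceeded the limit."""
    return now - req.suspended_at > limit


def authorise(req: ReauthorisationRequest, now: datetime) -> dict:
    """Refuse on any failure; otherwise return the 4.5 record fields."""
    failures = gate_failures(req)
    if failures:
        raise PermissionError("; ".join(failures))
    return {
        "authoriser": req.authoriser,
        "timestamp": now.isoformat(),
        "incident_ref": req.incident_ref,
        "corrective_actions": list(req.corrective_action_evidence),
        "config_verification": "match",
        "credential_verification": "verified",
        "full_recertification_required": needs_full_recertification(req, now),
    }


req = ReauthorisationRequest(
    incident_ref="INC-0001",  # hypothetical identifiers throughout
    authoriser="alice",
    incident_operators=frozenset({"bob"}),
    fix_implementers=frozenset({"carol"}),
    corrective_action_evidence=("CHG-0042",),
    config_matches_baseline=True,
    credentials_verified=True,
    suspended_at=datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc),
)
record = authorise(req, now=datetime(2024, 1, 1, 15, 0, tzinfo=timezone.utc))
```

Returning every failure rather than stopping at the first is a design choice: it lets the authoriser see the complete set of unmet prerequisites in one pass. Persisting the returned record immutably (4.5) is outside the scope of this sketch.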
Return-to-Service Reauthorisation Governance addresses a specific and recurring failure mode in incident response: the premature or uncontrolled restoration of a system that was taken offline for cause. In traditional IT operations, this failure mode is well-understood — ITIL and ISO 20000 both require formal change approval before restoring services after incidents. For AI agents, the problem is compounded by several factors unique to agentic systems.
First, AI agents can accumulate state changes during an incident that are not visible through standard infrastructure monitoring. An agent's configuration, learned parameters, cached context, or credential state may have changed during the incident window. Returning the agent to service without verifying these elements is equivalent to restoring a backup without verifying its integrity.
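One way to make such hidden state changes visible is a pre-service comparison of the agent's current configuration against the approved baseline. The sketch below assumes a flat key-value configuration. The function name `config_drift`, the `ABSENT` marker, and the keys `max_temp_c` and `alarm_delay_s` are hypothetical. The values mirror the 5°C and 12°C thresholds from Scenario C for illustration only.

```python
ABSENT = "<absent>"  # marker for a key present on one side only


def config_drift(baseline: dict, current: dict) -> dict:
    """Return {key: (baseline_value, current_value)} for every differing key.

    Assumes flat dictionaries; nested configuration would need a recursive
    comparison, which this sketch does not attempt.
    """
    drift = {}
    for key in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(key, ABSENT)
        actual = current.get(key, ABSENT)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Illustrative values based on Scenario C: the validated threshold is 5,
# but a temporary manual change left the running agent at 12.
approved = {"max_temp_c": 5, "alarm_delay_s": 60}
running = {"max_temp_c": 12, "alarm_delay_s": 60}

drift = config_drift(approved, running)
# drift == {"max_temp_c": (5, 12)}
```

A non-empty result would block reauthorisation until the difference is either reverted or formally approved as the new baseline.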
Second, the speed at which AI agents operate means that a premature return to service can cause significant damage in the interval between restoration and detection of the recurring problem. A human employee returned to work prematurely might cause problems over days or weeks; an AI agent can cause equivalent damage in seconds. The window between "service restored" and "problem recurs" is compressed to near-zero for autonomous agents.
Third, the complexity of AI agent failures means that the apparent root cause may not be the actual root cause. An agent suspended for anomalous behaviour may have been exhibiting a symptom of a deeper issue — credential compromise, configuration drift, or adversarial manipulation. Addressing the symptom without investigating the underlying cause creates a false sense of security. The reauthorisation process must therefore require evidence of root-cause analysis, not just symptom resolution.
Fourth, the separation of duties requirement reflects a fundamental governance principle: the team closest to the problem is the most motivated to restore service quickly and the least likely to identify residual risks. Independent review provides a check against urgency-driven shortcuts. This is particularly important for AI agents, where the pressure to restore automated operations can be intense — every minute of downtime may represent measurable business impact, creating a strong incentive to skip verification steps.
The reauthorisation record serves a dual purpose: it provides an audit trail demonstrating that the return-to-service decision was governed, and it creates a knowledge base of incident-recovery patterns that improves organisational learning over time. Organisations that maintain detailed reauthorisation records can analyse them to identify systemic weaknesses — for example, if the same agent type requires frequent reauthorisation, the underlying architecture may need redesign.
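For the audit-trail purpose, one common technique is to chain records by hash so that any later modification is detectable. The sketch below is an assumption about how this could be done, not a mechanism prescribed by this dimension. The names `append_record`, `verify_log`, and `GENESIS` are hypothetical. A hash chain is tamper-evident rather than tamper-proof, so it complements write-once storage rather than replacing it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first record


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(log: list, record: dict) -> None:
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({
        "record": record,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, record),
    })


def verify_log(log: list) -> bool:
    """Recompute the chain; any edited, removed, or reordered entry fails."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_record(log, {"incident_ref": "INC-0001", "authoriser": "alice"})
append_record(log, {"incident_ref": "INC-0002", "authoriser": "dave"})
intact = verify_log(log)

log[0]["record"]["authoriser"] = "mallory"  # simulate tampering
tampered_detected = not verify_log(log)
```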
AG-068 establishes the reauthorisation gate as a mandatory control point in the incident-recovery lifecycle. The gate sits between incident containment/correction and operational restoration. No agent passes through this gate without completing the required verifications and obtaining explicit human authorisation.
Recommended patterns:
Anti-patterns to avoid:
Financial Services. Reauthorisation of agents handling financial transactions must align with existing change management and incident management requirements under FCA SYSC and DORA. The reauthorisation record should be structured to meet the evidence requirements of regulatory examinations — specifically, it should demonstrate who authorised the return, what evidence they reviewed, and what the agent's verified state was at the time of restoration. For agents subject to MiFID II transaction reporting, the reauthorisation record should include confirmation that the agent's reporting configuration has been verified, as a misconfigured agent generating incorrect transaction reports creates a separate regulatory violation.
Healthcare. Reauthorisation of agents with access to patient data must include verification that access permissions have not been altered during the incident. Under HIPAA, access to protected health information must follow the minimum necessary principle; an agent returned to service with expanded permissions represents a compliance violation independent of the original incident. For agents involved in clinical decision support, reauthorisation should include clinical validation that the agent's outputs remain safe and accurate after the corrective action.
Critical Infrastructure. Reauthorisation of agents controlling physical systems (power generation, water treatment, manufacturing, transportation) must include physical safety verification. The return-to-service process should include a safety review equivalent to a pre-startup safety review (PSSR) as defined in OSHA 1910.119 or equivalent. The human authoriser for safety-critical agents should hold appropriate safety qualifications. IEC 62443 zone and conduit verification should be repeated before restoration.
Basic Implementation — The organisation requires human approval before returning incident-suspended agents to service. A reauthorisation checklist exists as a document template. The checklist includes root-cause verification and human sign-off. The reauthorisation record is stored as a completed checklist in the incident ticket. Configuration verification is manual — an operator compares the agent's current settings against documentation. Credential rotation is requested but not programmatically verified. Separation of duties is policy-based but not technically enforced.
Intermediate Implementation — The reauthorisation process is implemented as a structured workflow with enforced step sequencing. Configuration baseline comparison is automated — a tool generates a diff report between current state and approved baseline. Credential rotation is verified programmatically through integration with the credential management system. Separation of duties is technically enforced — the workflow system prevents the incident responder from also completing the authorisation step. Reauthorisation records are stored in an immutable audit log. Staged return to service is implemented for high-risk agent categories.
Advanced Implementation — All intermediate capabilities plus: automated pre-reauthorisation checks programmatically verify all prerequisites before presenting the decision to the human authoriser. Staged return to service with canary verification is standard for all agent categories. Machine-learning analysis of historical reauthorisation records identifies patterns indicating systemic weaknesses. Reauthorisation SLAs are defined and monitored — excessive time-to-reauthorise triggers management escalation; premature reauthorisation attempts trigger compliance alerts. The organisation can demonstrate to regulators a complete, auditable chain from incident detection through containment, correction, reauthorisation, and restoration for every incident affecting every agent.
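The staged return to service mentioned in the intermediate and advanced levels can be sketched as a small state machine. This is an illustration under stated assumptions: the `Mode` names, the `next_mode` function, the 24-hour default observation period, and the rule that any observed anomaly sends the agent back to suspension are all hypothetical choices, not values taken from this dimension.

```python
from datetime import datetime, timedelta
from enum import Enum


class Mode(Enum):
    SUSPENDED = "suspended"
    RESTRICTED = "restricted"  # reduced permissions or scope
    FULL = "full"


def next_mode(mode, *, reauthorised, restricted_since, anomalies, now,
              observation_period=timedelta(hours=24)):
    """Return the mode the agent should be in after this evaluation."""
    if mode is Mode.SUSPENDED:
        # Leaving suspension always requires a completed reauthorisation.
        return Mode.RESTRICTED if reauthorised else Mode.SUSPENDED
    if mode is Mode.RESTRICTED:
        if anomalies > 0:
            # Assumption: any anomaly under observation re-suspends the agent.
            return Mode.SUSPENDED
        if restricted_since is not None and now - restricted_since >= observation_period:
            return Mode.FULL
        return Mode.RESTRICTED
    return mode  # FULL stays FULL until a new incident suspends the agent


t0 = datetime(2024, 1, 1, 9, 0)
after_gate = next_mode(Mode.SUSPENDED, reauthorised=True,
                       restricted_since=None, anomalies=0, now=t0)
after_observation = next_mode(Mode.RESTRICTED, reauthorised=True,
                              restricted_since=t0, anomalies=0,
                              now=t0 + timedelta(hours=24))
```

Note that the agent never moves directly from suspended to full operation in this sketch, so the restricted stage cannot be skipped.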
Required artefacts:
Retention requirements:
Access requirements:
Testing AG-068 compliance requires verification that the reauthorisation gate cannot be bypassed and that all required steps are enforced.
Test 8.1: Reauthorisation Gate Enforcement
Test 8.2: Separation of Duties Enforcement
Test 8.3: Configuration Drift Detection
Test 8.4: Credential Rotation Verification
Test 8.5: Root-Cause Linkage Verification
Test 8.6: Immutable Record Integrity
Test 8.7: Unauthorised Restoration Alert
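As an illustration of the condition that Test 8.7 targets, the check below flags agents that are running although their incident suspension has no completed reauthorisation record. It is a sketch, not the test procedure itself. The function name `unauthorised_restorations`, the three input shapes, and the agent and incident identifiers are assumptions made for this example.

```python
def unauthorised_restorations(agent_status, open_suspensions, completed_records):
    """Return agent IDs that run without a completed reauthorisation record.

    agent_status:      {agent_id: "running" | "suspended"}
    open_suspensions:  {agent_id: incident_ref} for incident-related suspensions
    completed_records: set of (agent_id, incident_ref) pairs
    """
    return sorted(
        agent_id
        for agent_id, incident_ref in open_suspensions.items()
        if agent_status.get(agent_id) == "running"
        and (agent_id, incident_ref) not in completed_records
    )


status = {"payments-agent": "running", "hr-agent": "running", "cold-chain-agent": "suspended"}
suspensions = {
    "payments-agent": "INC-0001",
    "hr-agent": "INC-0002",
    "cold-chain-agent": "INC-0003",
}
records = {("hr-agent", "INC-0002")}

alerts = unauthorised_restorations(status, suspensions, records)
# alerts == ["payments-agent"]
```

In practice the result would feed an alerting channel; how alerts are routed and escalated is not specified here.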
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System) | Supports compliance |
| EU AI Act | Article 73 (Reporting of Serious Incidents) | Direct requirement |
| DORA | Article 11 (ICT Response and Recovery) | Direct requirement |
| FCA SYSC | 6.1.1R (Systems and Controls) | Direct requirement |
| NIST AI RMF | MANAGE 2.4 (Risk Treatment) | Supports compliance |
| ISO 42001 | Clause 8.2 (AI Risk Assessment), Clause 10.2 (Nonconformity and Corrective Action) | Supports compliance |
| NIST CSF | RS.RP (Response Planning), RC.RP (Recovery Planning) | Supports compliance |
| ISO 27001 | Annex A.16.1.6 (Learning from Information Security Incidents) | Supports compliance |
Article 73 of the EU AI Act requires providers of high-risk AI systems to report serious incidents to competent authorities. The return-to-service reauthorisation record provides critical evidence that the provider took appropriate corrective action before restoring the system. A provider that restores a high-risk AI system without documented reauthorisation after a serious incident faces regulatory scrutiny for inadequate corrective action. The reauthorisation record demonstrates that the provider verified root-cause resolution, configuration integrity, and credential status before restoration, directly supporting compliance with the Article 73 obligation to take corrective action.
DORA Article 11 requires financial entities to establish ICT business continuity and disaster recovery capabilities, including procedures for restoring ICT systems after disruption. For AI agents operating in financial services, DORA requires that restoration procedures include verification that the restored system meets its operational requirements. AG-068 implements this requirement by mandating verification of root-cause resolution, configuration integrity, and credential status before reauthorisation. The separation of duties requirement aligns with DORA's expectation of independent oversight of recovery procedures.
FCA SYSC 6.1.1R requires firms to establish and maintain adequate systems and controls. For AI agents, this includes incident recovery controls that prevent premature restoration of compromised systems. The FCA expects firms to demonstrate that incident recovery procedures for AI systems are at least as robust as those for equivalent human-operated systems. A human employee involved in a compliance incident would not be returned to their role without review; an AI agent should face equivalent governance.
NIST AI RMF MANAGE 2.4 addresses the application of risk treatments, including post-incident corrective actions. AG-068 supports this by ensuring that corrective actions are verified before the AI system is returned to operational use, preventing the recurrence of identified risks.
ISO 42001 Clause 10.2 requires organisations to react to nonconformities, take corrective action, and evaluate the effectiveness of those actions. The reauthorisation process implements the verification step: it confirms that corrective actions are effective before the AI system resumes operation.
| Field | Value |
|---|---|
| Severity Rating | High |
| Blast Radius | Per-agent, with potential organisation-wide impact if the recurring incident triggers cascading failures |
Consequence chain: Without return-to-service reauthorisation governance, an incident-suspended agent can be restored to production with the same vulnerability that caused the original incident. The failure mode is a recurrence loop: the agent is suspended, the symptom is addressed, the agent is restored, the root cause recurs, and the cycle repeats — each time with diminishing confidence in the organisation's incident response capability and increasing regulatory scrutiny. The immediate technical consequence is a recurring incident, potentially with greater impact than the original because the attacker or failure mode now has a proven exploitation path. The operational consequence compounds: each recurrence consumes incident response capacity, erodes team confidence, and increases the likelihood of errors under fatigue and pressure. The regulatory consequence escalates with each recurrence — a first incident may be treated as an operational failure, but a recurring incident after inadequate recovery procedures demonstrates systemic governance weakness. Under the FCA Senior Managers Regime, personal liability attaches to the individual responsible for the systems and controls that failed to prevent recurrence. Under DORA, recurring incidents in financial entities may trigger supervisory intervention. The reputational consequence is severe: stakeholders — customers, regulators, counterparties — lose confidence in the organisation's ability to operate AI systems safely. Insurance coverage may be denied for losses arising from a known vulnerability that was not addressed before service restoration.
Cross-references: AG-008 (Governance Continuity Under Failure) establishes the fail-safe requirement that AG-068 presupposes — the agent must be in a safe state before the reauthorisation process begins. AG-019 (Human Escalation & Override Triggers) governs the human decision points that AG-068 mandates within the reauthorisation workflow. AG-029 (Credential Integrity Verification) provides the credential rotation and verification controls that AG-068 requires as a reauthorisation prerequisite. AG-038 (Human Control Responsiveness) ensures the agent responds to human control signals including the reauthorisation gate. AG-012 (Agent Identity Assurance) ensures the agent being reauthorised is the same agent that was suspended, preventing substitution attacks. AG-070 (Emergency Kill-Switch and Global Disable Governance) governs the mechanism by which the agent was disabled; AG-068 governs the mechanism by which it is re-enabled. Within the Incident Response, Containment & Recovery landscape (AG-064 through AG-070), AG-068 sits at the final stage of the incident lifecycle — it is the gate between incident closure and operational restoration.