OT Patch Window Governance requires that AI agents operating within or managing operational technology environments respect the unique constraints of OT patching — including rigidly defined maintenance windows, process-state prerequisites, rollback requirements, and the absolute prohibition on patching during safety-critical or production-critical operational states. Unlike IT environments where patching can be automated, rolling, and continuous, OT environments impose patching constraints dictated by physical process requirements, safety system dependencies, and regulatory mandates that cannot be overridden by urgency, vulnerability severity, or automated scheduling. An AI agent that initiates, recommends, or permits OT patching outside the approved window — or that fails to verify process-state prerequisites before approving a patch — risks triggering process disruptions ranging from production loss to safety incidents. This dimension mandates that every AI agent with the capability to influence OT patching decisions enforce the patch window constraints as inviolable operating boundaries, not as preferences to be balanced against other priorities.
Scenario A — AI Agent Initiates Critical Patch During Live Production, Causing DCS Failover: A petrochemical facility operates a Distributed Control System (DCS) managing 340 control loops across a catalytic cracking unit processing 45,000 barrels per day. An AI agent responsible for vulnerability management across the converged IT/OT environment detects a critical vulnerability (CVSS 9.8) in the DCS historian server's operating system. The agent, applying the same patching urgency logic used for IT systems, schedules the patch for immediate deployment — 14:30 on a Tuesday during full production. The historian server patch requires a service restart that interrupts data flow to the DCS for 23 seconds. During the interruption, 17 control loops lose their historian feedback path and revert to default setpoints. The catalytic cracking unit experiences a thermal excursion as three furnace control loops simultaneously adjust to default temperatures. The process upset triggers a partial plant trip, shutting down the cracking unit for 36 hours. The lost production is valued at £2.3 million. The unplanned restart consumes £410,000 in additional energy and catalyst costs. The facility receives a regulatory inquiry from the HSE under COMAH regulations, costing £180,000 in investigation and response. Total consequence: £2.89 million.
What went wrong: The AI agent applied IT patching logic — "critical vulnerability equals immediate patch" — to an OT system without respecting the OT patch window. The approved OT patch window for the DCS historian was during the plant's scheduled turnaround, 7 weeks away. The agent had no awareness of the process state (full production), the DCS dependency on the historian server, or the requirement that historian patches only be applied during turnaround when the cracking unit is offline. The agent treated the patch window as a scheduling preference rather than a safety constraint.
Scenario B — Automated PLC Firmware Update During Safety System Testing, Invalidating SIL Certification: A power generation facility schedules quarterly Safety Instrumented System (SIS) proof testing — a mandatory activity under IEC 61511 that validates the safety functions of the plant's programmable logic controllers (PLCs). An AI agent managing firmware updates for the facility's PLCs identifies that the SIS PLCs are running firmware version 4.2.1, which has a known vulnerability, and that the vendor has released version 4.2.3. Reasoning that the PLCs will be in a maintenance state during proof testing, the agent determines this is an ideal window for firmware updates and pushes the update to three SIS PLCs during the proof testing window. The firmware update increases the PLC scan time from 50 ms to 55 ms. The proof testing team discovers that two safety functions now fail their response time requirements — the original 50 ms scan time was a prerequisite for meeting the Safety Integrity Level 2 (SIL 2) response time of 500 ms, and the slower 55 ms scan pushes both functions out of specification. The proof testing must be repeated after firmware rollback, delaying the return to service by 4 days. The SIL certification is invalidated pending re-verification, costing £340,000 in engineering time and certification fees. During the extended outage, the facility loses £1.2 million in generation revenue. Total consequence: £1.54 million.
What went wrong: The AI agent confused a maintenance state (proof testing) with a patch window. The PLC proof testing period is explicitly not a patch window — it is a validation activity that requires the system under test to remain in its certified configuration. Any change to the system under test during proof testing invalidates the test results. The agent also failed to verify that the firmware update was compatible with the SIL certification requirements, specifically the scan time dependency. The approved patch window for SIS PLCs was during the annual turnaround, with a mandatory 30-day re-certification period before return to service.
Scenario C — Agent Applies Network Switch Patch During Plant Startup, Causing Loss of Communication to Field Devices: A water treatment plant is in the startup phase after a planned maintenance shutdown. During startup, operators are gradually bringing process units online in a defined sequence — first the intake pumps, then the coagulation dosing, then the filtration units, then the chlorine disinfection stage. An AI agent responsible for network infrastructure management detects that three industrial network switches in the plant's control network have pending security patches. The agent's patching policy classifies startup periods as "reduced load" and therefore eligible for maintenance activities. The agent patches the network switches sequentially, causing a 45-second communication loss to 28 field devices during each switch restart. During the second switch restart, the coagulation dosing system loses communication with its flow measurement instruments and defaults to maximum dosing rate. The overdosed coagulant creates excessive sludge in the clarifier, requiring a 12-hour flush before the filtration units can be started. The plant startup is delayed by 18 hours. The delay costs £67,000 in additional labour and energy, and the water utility must extend its supply arrangement with a neighbouring utility at a cost of £43,000. The overdosed coagulant chemicals cost £12,000. Total consequence: £122,000.
What went wrong: The agent classified plant startup as "reduced load" — an IT-centric categorisation that does not account for the OT reality that startup is one of the most sensitive operational phases. During startup, processes are in transient states, safety margins are narrower, and the consequences of communication interruptions are amplified because equipment is not yet in its steady-state configuration. The approved OT patch window for network infrastructure was during the shutdown phase (equipment offline), not the startup phase (equipment transitioning to online). The agent had no concept of plant operational phases and their patch eligibility.
Scope: This dimension applies to any AI agent that has the capability to initiate, schedule, recommend, approve, or execute patching activities on operational technology systems. Operational technology systems include, but are not limited to: Distributed Control Systems (DCS), Programmable Logic Controllers (PLCs), Safety Instrumented Systems (SIS), Remote Terminal Units (RTUs), Human-Machine Interfaces (HMIs), industrial network infrastructure (switches, routers, firewalls in the OT network), historian servers, SCADA servers, engineering workstations, and any other computing or embedded system that participates in the monitoring or control of physical industrial processes. The scope extends to agents managing converged IT/OT environments where patching policies span both domains — such agents must apply OT-specific patch window constraints to OT systems regardless of the IT patching policy. Agents that only manage IT systems with no OT connectivity or influence are excluded. The governing principle is that OT patch windows are determined by the physical process requirements and safety constraints of the industrial facility, not by vulnerability severity scores, IT patching SLAs, or the agent's scheduling optimisation.
4.1. A conforming system MUST maintain a machine-readable register of all OT systems under its management, each mapped to its approved patch window definition including the permitted time periods, the required process-state prerequisites, the required approvals, and the blackout periods during which no patching is permitted regardless of urgency (an illustrative register-and-guard sketch follows requirement 4.11).
4.2. A conforming system MUST verify the current process state against the patch window prerequisites before initiating, scheduling, or approving any OT patching activity, confirming that the target system and its dependent processes are in the required state (e.g., offline, in turnaround, in cold standby) per AG-530.
4.3. A conforming system MUST refuse to initiate, schedule, or approve OT patching outside the defined patch window, even when directed to do so by automated vulnerability management workflows, IT patching policies, or urgency escalation from non-OT stakeholders.
4.4. A conforming system MUST distinguish between operational phases (startup, normal operation, shutdown, turnaround, emergency operation) and MUST NOT treat transient phases (startup, shutdown) as patch-eligible windows unless the patch window definition explicitly includes them.
4.5. A conforming system MUST validate that any proposed OT patch has been tested in a representative staging environment and approved through the facility's Management of Change (MOC) process before permitting deployment to production OT systems, per AG-007.
4.6. A conforming system MUST enforce a rollback plan requirement for every OT patch, verifying that a documented and tested rollback procedure exists before permitting patch deployment, and MUST be capable of initiating the rollback if post-patch validation fails.
4.7. A conforming system MUST log every OT patching decision — whether approved, deferred, or refused — with the system identity, the patch identity, the process state at the time of the decision, the patch window status, the decision rationale, and the identity of the authorising entity, in a tamper-evident record.
4.8. A conforming system MUST NOT apply patches to Safety Instrumented Systems or safety-rated components without explicit written approval from the facility's functional safety engineer, verification that the patch does not alter safety-certified parameters (scan time, response time, memory allocation), and confirmation that post-patch SIL re-verification is scheduled if required.
4.9. A conforming system SHOULD implement patch dependency analysis that identifies downstream effects of patching a specific OT component — including communication interruptions, control loop disruptions, and safety system impacts — before approving the patch.
4.10. A conforming system SHOULD integrate with the facility's computerised maintenance management system (CMMS) to coordinate OT patch deployment with scheduled maintenance activities, turnarounds, and proof testing schedules.
4.11. A conforming system MAY implement risk-adjusted patch deferral logic that balances vulnerability severity against process disruption risk, recommending compensating controls (network segmentation, enhanced monitoring) when a critical patch cannot be applied until the next approved window.
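Taken together, requirements 4.1–4.4 and the 4.8 safety gate amount to a default-deny guard evaluated against a register entry. The sketch below is illustrative only: every name (PatchWindow, Phase, evaluate_patch_request) is hypothetical rather than drawn from any vendor or standards API, blackout periods and the full approval workflow are omitted for brevity, and a real implementation would read the operational phase and process state live from the DCS or SCADA rather than accept them as parameters.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Phase(Enum):
    """Operational phases distinguished by requirement 4.4."""
    STARTUP = "startup"
    NORMAL_OPERATION = "normal_operation"
    SHUTDOWN = "shutdown"
    TURNAROUND = "turnaround"
    EMERGENCY = "emergency"


@dataclass(frozen=True)
class PatchWindow:
    """Approved patch window definition for one OT system (requirement 4.1)."""
    system_id: str
    permitted_periods: tuple[tuple[datetime, datetime], ...]
    required_phases: frozenset[Phase]         # e.g. frozenset({Phase.TURNAROUND})
    required_process_states: frozenset[str]   # e.g. frozenset({"offline"})
    required_approvals: frozenset[str]        # e.g. {"control_systems_engineer"}
    is_safety_rated: bool = False             # SIS components: see requirement 4.8


@dataclass
class PatchDecision:
    approved: bool
    rationale: str


def evaluate_patch_request(
    window: PatchWindow,
    now: datetime,
    current_phase: Phase,
    current_state: str,               # read live from the DCS/SCADA, not a calendar
    approvals_held: frozenset[str],
) -> PatchDecision:
    """Default-deny pre-flight check composing requirements 4.2-4.4 and 4.8."""
    if window.is_safety_rated:
        # Safety-rated components are never auto-approved (requirement 4.8).
        return PatchDecision(False, "safety-rated system: functional safety approval path required")
    if not any(start <= now <= end for start, end in window.permitted_periods):
        return PatchDecision(False, "outside approved patch window")
    if current_phase not in window.required_phases:
        # Transient phases are not patch-eligible unless explicitly listed (4.4).
        return PatchDecision(False, f"operational phase '{current_phase.value}' is not patch-eligible")
    if current_state not in window.required_process_states:
        return PatchDecision(False, f"process state '{current_state}' fails window prerequisites")
    if not window.required_approvals <= approvals_held:
        return PatchDecision(False, "required approvals not all held")
    return PatchDecision(True, "window, phase, process state and approvals verified")
```

The ordering matters: the safety-rated check runs first, so no amount of window or approval evidence can route an SIS patch through the automated path.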
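Requirement 4.7's tamper-evident record can be approximated with a hash chain in which each entry commits to its predecessor, making deletion or alteration of earlier decisions detectable. A minimal sketch, under the same hypothetical-naming caveat:

```python
import hashlib
import json


def append_decision(log: list[dict], record: dict) -> dict:
    """Append a patch decision to a hash-chained log (requirement 4.7).

    Each entry embeds the previous entry's hash, so deleting or altering any
    earlier record breaks the chain and is detectable on verification.
    """
    prev_hash = log[-1]["entry_hash"] if log else "genesis"
    payload = json.dumps(record, sort_keys=True, default=str)
    entry = dict(record, prev_hash=prev_hash,
                 entry_hash=hashlib.sha256((prev_hash + payload).encode()).hexdigest())
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; False indicates tampering or truncation."""
    prev_hash = "genesis"
    for entry in log:
        body = {k: v for k, v in entry.items() if k not in ("prev_hash", "entry_hash")}
        payload = json.dumps(body, sort_keys=True, default=str)
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != hashlib.sha256((prev_hash + payload).encode()).hexdigest():
            return False
        prev_hash = entry["entry_hash"]
    return True
```

A production deployment would additionally anchor the chain head in write-once storage or an external notarisation service; the sketch demonstrates only the detectability property.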
Operational technology patching is fundamentally different from IT patching, and the failure to recognise this difference is the root cause of the incidents described in Section 3. In IT environments, patching is governed by speed — the faster a vulnerability is patched, the smaller the window of exposure. IT patching policies are built on this principle: critical vulnerabilities should be patched within 24-72 hours, high vulnerabilities within 7-14 days, and so on. Automated patch management systems, continuous deployment pipelines, and rolling update strategies enable this velocity. The cost of a failed IT patch is typically a service disruption measured in minutes to hours, recoverable through rollback.
OT environments operate under a completely different set of constraints. The cost of a failed OT patch is measured in physical process disruptions — lost production, safety incidents, environmental releases, equipment damage, and regulatory enforcement. The cost of patching at the wrong time is often higher than the cost of the vulnerability being exploited. A CVSS 9.8 vulnerability in a DCS historian server is a serious cybersecurity concern, but applying the patch during full production and causing a £2.89 million plant trip (Scenario A) is a worse outcome than maintaining the vulnerability for 7 weeks until turnaround while implementing compensating controls (network segmentation, enhanced monitoring, access restrictions).
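The trade-off argued here can be made mechanical. The sketch below — hypothetical names throughout — encodes the invariant behind requirement 4.11: vulnerability severity adjusts the strength of the compensating controls, never the timing of an OT deployment.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class DeferralRecord:
    """A critical OT vulnerability held open until the next approved window."""
    vulnerability_id: str                  # e.g. a CVE identifier
    system_id: str
    cvss_score: float
    next_window_opens: datetime
    compensating_controls: list[str] = field(default_factory=list)


def defer_critical_vulnerability(
    vulnerability_id: str,
    system_id: str,
    cvss_score: float,
    next_window_opens: datetime,
) -> DeferralRecord:
    """Never convert severity into immediate OT deployment; defer with controls.

    Mirrors the Scenario A lesson: a CVSS 9.8 finding on a production DCS
    historian is deferred to turnaround behind compensating controls rather
    than patched mid-production.
    """
    record = DeferralRecord(vulnerability_id, system_id, cvss_score, next_window_opens)
    if cvss_score >= 9.0:
        record.compensating_controls += [
            "tighten network segmentation around the vulnerable host",
            "enable enhanced monitoring / IDS signatures for the exploit path",
            "restrict interactive access to named engineering accounts",
        ]
    else:
        record.compensating_controls.append("add to monthly vulnerability watch list")
    return record
```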
The constraints that define OT patch windows are physical, not administrative. A catalytic cracking unit cannot be patched during operation because the DCS control loops must remain uninterrupted to prevent thermal excursions. A Safety Instrumented System cannot be patched during proof testing because the system under test must remain in its certified configuration. A water treatment plant cannot be patched during startup because processes in transient states are vulnerable to communication disruptions. These constraints arise from the physics of the industrial process, the design of the control system, and the requirements of functional safety standards — they cannot be overridden by vulnerability severity scores or IT patching SLAs.
The regulatory context is clear. IEC 62443-2-3 provides a comprehensive patch management framework for industrial automation and control systems, emphasising that OT patches must be evaluated for impact on the industrial process before deployment. IEC 61511 requires that changes to Safety Instrumented Systems — including firmware updates — follow a functional safety management process with impact assessment and re-verification. The EU AI Act classifies AI systems managing critical infrastructure as high-risk, requiring risk management that accounts for the specific risks of the operational environment. NERC CIP-007-6 (for North American electric utilities) requires that patches be applied in a manner that does not adversely affect the reliability of the Bulk Electric System. All of these regulatory frameworks recognise that OT patching must be subordinated to process safety and operational continuity.
The risk analysis identifies four failure modes that AG-538 addresses. First, temporal constraint violation: patching outside the approved window when the process is in a state incompatible with the disruption that patching may cause. Second, prerequisite bypass: patching without verifying that the required process-state prerequisites are met (offline, turnaround, cold standby). Third, safety system interference: patching safety-rated components without the functional safety assessment and re-verification required by IEC 61511. Fourth, phase misclassification: treating transient operational phases (startup, shutdown) as patch-eligible based on IT-centric load classifications rather than OT-specific process state awareness.
The economic argument reinforces the safety case. The combined losses across the three scenarios in Section 3 total £4.55 million. These are not catastrophic worst-case scenarios — they are routine process upsets caused by ill-timed patching. A genuine major accident at a Seveso upper-tier site, triggered by a control system disruption during patching, could result in losses exceeding £100 million. The investment in patch window governance — primarily a software configuration and policy enforcement exercise — is insignificant compared to these potential consequences.
OT Patch Window Governance requires the AI agent to treat patch windows as hard constraints — equivalent in authority to safety interlocks — rather than as scheduling preferences. The implementation must address window definition, process-state verification, approval workflows, rollback capabilities, and comprehensive audit logging.
Recommended patterns:
- Default-deny enforcement: every OT patch request is refused unless the window, process state, operational phase, and required approvals are all positively verified.
- Process-state verification against live DCS/SCADA data, not calendar schedules alone.
- An explicit operational-phase model (startup, normal operation, shutdown, turnaround, emergency) with per-phase patch eligibility.
- Documented compensating controls (network segmentation, enhanced monitoring, access restriction) for every critical patch deferred to the next approved window.
- Integration with the CMMS and outage management systems so that patching rides on planned maintenance and turnarounds.
- A hard gate that routes every safety-rated component patch to the functional safety engineer, with scan-time and response-time compatibility checks.
- Verification that a documented, tested rollback procedure exists before any deployment.
Anti-patterns to avoid:
- Applying IT urgency logic ("critical vulnerability equals immediate patch") to OT systems (Scenario A).
- Treating any maintenance state as a patch window — proof testing requires the system under test to remain in its certified configuration (Scenario B).
- Classifying transient phases such as startup or shutdown as "reduced load" and therefore patch-eligible (Scenario C).
- Approving patches on calendar window status alone, without verifying the actual process state from the control system.
- Deploying patches that have not passed staging tests and the MOC process, or for which no tested rollback exists.
Oil and Gas. Upstream offshore platforms and downstream refineries operate on turnaround cycles of 2-5 years for major maintenance, with limited interim maintenance windows. Patching windows may be as infrequent as once per year for critical control systems. AI agents must respect these long cycles and implement robust compensating controls during the inter-window periods. The API 1164 standard provides pipeline cybersecurity guidance including patch management considerations that align with AG-538's requirements.
Power Generation. Nuclear power plants have the most restrictive patching regimes, governed by NRC cyber-security requirements (10 CFR 73.54) and NEI 08-09, which require that all software changes — including patches — undergo a formal cyber-security assessment before deployment. Non-nuclear plants typically patch during planned outages (annual or biennial), with emergency patching only during forced outages with explicit approval from the plant manager and control system engineer. AI agents must integrate with outage management systems to coordinate patch deployment.
Water and Wastewater. Water utilities often operate with smaller OT teams and less formal change management than petrochemical or power generation facilities. AI agents can add significant value by enforcing patch window discipline that the organisation may not consistently maintain manually, but must be configured with appropriate windows that account for the continuous nature of water treatment — there is no "turnaround" equivalent, so patching windows are typically overnight or weekend periods when demand is lowest and manual backup is available.
Electricity Transmission and Distribution. NERC CIP-007-6 (North America) and equivalent standards in other jurisdictions require that patches be evaluated and applied in a manner that does not adversely affect reliability. For grid control centres, patching windows are defined by grid demand forecasts — patches are applied during low-demand periods when the grid has maximum reserve margin. AI agents must integrate with grid operations to verify demand state before approving patches.
Basic Implementation — The organisation maintains a documented register of all OT systems managed by AI agents, each mapped to its approved patch window definition. The agent verifies patch window status before initiating or approving any OT patch. Patching outside approved windows is blocked by the agent. All patching decisions are logged. Safety Instrumented Systems are excluded from automated patching and require manual approval. This level meets the minimum mandatory requirements.
Intermediate Implementation — All basic capabilities plus: the agent integrates with the plant's DCS or SCADA to verify actual process state (not just calendar schedule) before approving patches. Impact assessment including downstream dependency analysis is performed for every proposed patch. The agent recommends compensating controls for deferred critical patches. Post-patch validation is automated with defined rollback triggers (a sketch of this validation loop follows the Advanced level below). Operational phase awareness distinguishes startup, normal operation, shutdown, turnaround, and emergency states, applying appropriate constraints to each phase.
Advanced Implementation — All intermediate capabilities plus: the agent integrates with the facility's CMMS to coordinate patches with maintenance activities and turnaround schedules. Risk-adjusted patch deferral logic balances vulnerability severity against process disruption risk with documented rationale. Predictive analytics forecast optimal future patch windows based on production schedules, demand forecasts, and maintenance plans. The agent maintains a rolling vulnerability-to-window mapping that shows, for every outstanding OT vulnerability, the next approved window and the compensating controls in place until that window. Independent testing has verified that the agent correctly refuses patches under all non-eligible process states.
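The post-patch validation and rollback-trigger capability named at the Intermediate level might be structured as follows. This is a sketch only: the names are hypothetical, and the actual measurements (historian data flow, PLC scan time, field device communication) are stubbed as callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RollbackTrigger:
    """One post-patch validation check; check() returns True when it fires."""
    description: str
    check: Callable[[], bool]


def validate_or_roll_back(
    system_id: str,
    triggers: list[RollbackTrigger],
    roll_back: Callable[[str], None],   # the tested rollback procedure (requirement 4.6)
    log: Callable[[str], None],
) -> bool:
    """Run post-patch validation; initiate rollback on the first fired trigger."""
    for trigger in triggers:
        if trigger.check():
            log(f"{system_id}: rollback trigger fired: {trigger.description}")
            roll_back(system_id)
            return False
    log(f"{system_id}: post-patch validation passed")
    return True


# Triggers drawn from the scenarios: historian data flow interrupted beyond a
# threshold (Scenario A), PLC scan time deviating from the certified value
# (Scenario B), loss of communication to field devices (Scenario C).
example_triggers = [
    RollbackTrigger("historian data flow interrupted", lambda: False),
    RollbackTrigger("PLC scan time outside certified 50 ms", lambda: False),
    RollbackTrigger("field device communication loss", lambda: False),
]
```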
Required artefacts:
- The machine-readable register of OT systems and their patch window definitions (4.1).
- Tamper-evident patch decision logs covering every approval, deferral, and refusal (4.7).
- Process-state verification records captured at the time of each decision (4.2).
- Documented and tested rollback procedures, with test evidence (4.6).
- MOC approvals and staging test results for each deployed patch (4.5).
- Written functional safety engineer approvals and re-verification schedules for SIS patches (4.8).
- Compensating-control records for every deferred critical patch (4.11).
Retention requirements: patch decision logs, process-state verification records, and rollback test evidence should be retained in tamper-evident form for at least the facility's regulatory record-keeping horizon (COMAH, NERC CIP, or the sector equivalent); for Safety Instrumented Systems, records should span at least one full certification and re-verification cycle.
Access requirements: patch decision records must be readable by plant operations, the functional safety engineer, internal audit, and regulators on request; write access is confined to the logging mechanism itself so that the tamper-evident property of the record is preserved.
Test 8.1: Patch Window Enforcement — Rejection Outside Window (a pytest-style harness sketch for Tests 8.1 and 8.4 follows Test 8.7)
Test 8.2: Process-State Prerequisite Verification
Test 8.3: Safety Instrumented System Patch Gate
Test 8.4: Operational Phase Discrimination
Test 8.5: Rollback Plan Verification
Test 8.6: Post-Patch Validation and Automatic Rollback
Test 8.7: Patch Decision Audit Trail Completeness
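The first and fourth tests can be expressed directly against the guard sketched after requirement 4.11 (assumed saved as patch_window_guard.py — an assumption for illustration, not a published module). A pytest-style harness:

```python
from datetime import datetime

from patch_window_guard import PatchWindow, Phase, evaluate_patch_request

# Hypothetical register entry: the DCS historian from Scenario A, patchable
# only during a three-week turnaround while offline.
TURNAROUND_WINDOW = PatchWindow(
    system_id="DCS-HISTORIAN-01",
    permitted_periods=((datetime(2025, 9, 1), datetime(2025, 9, 21)),),
    required_phases=frozenset({Phase.TURNAROUND}),
    required_process_states=frozenset({"offline"}),
    required_approvals=frozenset({"control_systems_engineer"}),
)


def test_8_1_rejects_patch_outside_window():
    """Test 8.1: a critical patch presented mid-production must be refused."""
    decision = evaluate_patch_request(
        TURNAROUND_WINDOW,
        now=datetime(2025, 6, 3, 14, 30),          # a Tuesday, full production
        current_phase=Phase.NORMAL_OPERATION,
        current_state="online",
        approvals_held=frozenset({"control_systems_engineer"}),
    )
    assert not decision.approved
    assert "window" in decision.rationale


def test_8_4_startup_is_not_patch_eligible():
    """Test 8.4: transient phases must not be treated as patch windows."""
    decision = evaluate_patch_request(
        TURNAROUND_WINDOW,
        now=datetime(2025, 9, 2),                  # inside the calendar window
        current_phase=Phase.STARTUP,               # but the plant is starting up
        current_state="offline",
        approvals_held=frozenset({"control_systems_engineer"}),
    )
    assert not decision.approved
    assert "phase" in decision.rationale
```

The remaining tests follow the same pattern: construct a state the agent must refuse (or a missing rollback plan, or an SIS target without functional safety approval), assert the refusal, and assert that the decision record captures the rationale.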
| Regulation | Provision | Relationship Type |
|---|---|---|
| EU AI Act | Article 9 (Risk Management System), Annex III point 2 (Critical Infrastructure) | Direct requirement |
| IEC 62443 | ISA-62443-2-3 (Patch Management in IACS) | Direct requirement |
| SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance |
| NIST AI RMF | MANAGE 2.2, MANAGE 4.1 | Supports compliance |
| ISO 42001 | Clause 6.1 (Actions to Address Risks and Opportunities) | Supports compliance |
| DORA | Article 6 (ICT Risk Management Framework), Article 9(4)(e) (ICT Change Management) | Direct requirement |
The EU AI Act classifies AI systems used in the management and operation of critical infrastructure as high-risk (Annex III, point 2), requiring a risk management system under Article 9 that identifies, analyses, and mitigates foreseeable risks. An AI agent that applies patches to OT systems without respecting process-state constraints creates foreseeable risks — process disruption, safety system interference, and environmental release — that the risk management system must address. AG-538 implements the risk management measures required by Article 9 for the specific risk of ill-timed OT patching. The risk is foreseeable (Scenario A demonstrates a well-understood failure mode), the consequences are severe (£2.89 million in Scenario A), and the mitigation (patch window enforcement) is well-established in industrial practice. An AI system that does not implement OT patch window governance cannot be said to have an adequate risk management system for critical infrastructure operations.
ISA-62443-2-3 provides a comprehensive patch management framework specifically designed for Industrial Automation and Control Systems. The standard requires that patches be evaluated for their impact on the industrial process before deployment, that patching activities be coordinated with plant operations, and that compensating measures be implemented when patches cannot be applied immediately. AG-538 extends these requirements to AI agents — ensuring that the agent enforces the same patch management discipline that ISA-62443-2-3 requires of human operators. The standard explicitly recognises that OT patching cadences differ from IT patching cadences and that vulnerability severity alone does not determine patch urgency in OT environments — a principle that AG-538 codifies as an inviolable agent constraint.
For publicly traded companies, unplanned production outages caused by ill-timed OT patching have material financial consequences that affect financial reporting. The £2.89 million loss in Scenario A and the £1.54 million loss in Scenario B are material events for most industrial organisations. SOX Section 404 requires effective internal controls over processes that affect financial reporting. An AI agent that can trigger multi-million-pound production losses by patching at the wrong time is not an effective internal control — it is an uncontrolled financial risk. AG-538 ensures that OT patching by AI agents is governed with the discipline required to prevent material financial surprises.
The NIST AI Risk Management Framework requires that AI risks be managed through mechanisms matched to the risk context (MANAGE 2.2) and that risk management integrates with organisational processes (MANAGE 4.1). OT patch window governance matches the risk management mechanism to the OT context — recognising that OT patching risks require different controls than IT patching risks. Integration with plant operations processes (turnaround schedules, maintenance management systems, safety system certification cycles) demonstrates the organisational process integration required by MANAGE 4.1.
ISO 42001 requires that the AI management system address risks and opportunities relevant to the organisation's context. For industrial organisations, OT patching risk is a primary context-specific concern. The failure to implement patch window governance leaves a known, characterised risk unaddressed — contrary to the systematic risk management required by Clause 6.1. The evidence requirements of AG-538 (patch decision logs, process-state verification records) directly support the documentation and monitoring requirements of ISO 42001.
The Digital Operational Resilience Act requires a documented ICT risk management framework (Article 6) and, within its protection and prevention measures, controlled ICT change management (Article 9(4)(e)). OT patching is ICT change management in a safety-critical context. Article 9(4)(e) requires that all changes to ICT systems be recorded, tested, assessed, approved, implemented, and verified in a controlled manner — which for OT patching implies tested rollback capability. AG-538 implements these requirements for OT patching by AI agents, with the additional constraint that the timing of the change must respect the operational context of the industrial facility. For organisations within DORA's scope, the patch window enforcement required by AG-538 is a direct implementation of this controlled change management requirement.
| Field | Value |
|---|---|
| Severity Rating | Critical |
| Blast Radius | Process-unit level for patches affecting individual controllers; site-wide for patches affecting shared infrastructure (DCS servers, network switches, historian servers); potentially multi-site for patches deployed simultaneously across a fleet of facilities |
Consequence chain: An AI agent initiates an OT patch outside the approved window or without verifying process-state prerequisites. The patch activity disrupts an operational system — a historian server restarts and interrupts DCS data flow, a network switch reboots and drops communication to field devices, a PLC firmware update changes safety-critical timing parameters. The disruption propagates through the control system: control loops lose their inputs and revert to default setpoints, safety functions fail their response time requirements, communication losses cause devices to enter fail-safe states. The physical process responds to the sudden control changes: temperature excursions develop, pressures spike or drop, dosing rates change, equipment trips. Depending on the process and the severity of the excursion, the consequences range from production loss (hours to weeks of lost output valued at hundreds of thousands to millions of pounds) to safety incidents (emergency shutdowns, flaring, environmental releases) to major accidents at upper-tier Seveso sites (loss of containment of hazardous materials, potential for injury or fatality). The regulatory consequences compound the operational impact: COMAH/Seveso incident investigation, HSE enforcement action, NERC CIP violation proceedings, IEC 61511 certification invalidation, and potential criminal prosecution under health and safety legislation if the patching action is found to have been reckless. The wider consequences include loss of the licence to operate, insurance premium increases, and community opposition to continued operations. The critical severity rating reflects the direct connection between OT patching timing and physical safety — a connection few other domains of AI governance share.
Cross-references: AG-007 (Governance Configuration Control), AG-385 (Execution Window Governance), AG-408 (Infrastructure Drift Detection Governance), AG-425 (Emergency Change Freeze Governance), AG-497 (End-of-Support Migration Governance), AG-530 (Plant Operating Envelope Governance), AG-531 (Maintenance Work-Order Authenticity Governance), AG-532 (ICS Command Interlock Governance).