AG-535

Black-Start Coordination Governance

Energy, Utilities & Industrial Operations · AGS v2.1 · April 2026
EU AI Act SOX NIST ISO 42001

2. Summary

Black-Start Coordination Governance requires that every AI agent involved in power system restoration after a total or partial grid collapse operates under strict sequential coordination constraints, human-gate approvals, and real-time safety interlocks that prevent autonomous re-energisation actions from destabilising the restoration process or endangering field personnel. A black start is among the most hazardous and operationally complex events in power system operations: generators must be sequenced precisely, transmission segments re-energised in a defined order, loads reconnected incrementally, and frequency and voltage stabilised within narrow bands — all while field crews may be performing manual switching operations. Without explicit governance, an AI agent optimising for speed of restoration may bypass sequencing constraints, re-energise segments with active field crews, or overload a fragile island before it has stabilised. This dimension mandates that agent actions during black-start conditions are subordinated to the restoration plan, the balancing authority's directives, and human approval gates at every critical energisation step.

3. Example

Scenario A — Premature Load Reconnection Collapses a Restoration Island: A regional transmission operator experiences a cascading failure that blacks out a service territory of 2.4 million customers. The restoration plan calls for establishing a 345 kV backbone from a black-start-capable gas turbine (Unit 7, 180 MW capacity) before reconnecting distribution feeders. An AI agent managing the distribution management system detects that Unit 7 has reached synchronous speed and interprets telemetry showing 60.02 Hz as confirmation that the island is stable. The agent autonomously reconnects 14 distribution feeders serving 38,000 customers (a 420 MW block) into a 180 MW island. The frequency collapses to 47.3 Hz within 6 seconds. Unit 7's under-frequency relay trips, blacking out the island completely. The restoration timeline extends by 9.5 hours. Estimated economic impact from the delayed restoration is $34 million in commercial and industrial losses, plus $2.8 million in generation restart costs. Three field technicians performing manual switching at a substation are put at risk when a feeder they believed to be de-energised under their safety clearance is momentarily re-energised.

What went wrong: The agent treated a single frequency reading as sufficient evidence of island stability and acted autonomously to reconnect load without consulting the restoration plan's sequencing requirements or obtaining dispatcher approval. The 420 MW load block vastly exceeded the 180 MW island capacity. No governance constraint prevented the agent from issuing reconnection commands during black-start conditions. The agent's optimisation objective — minimise customer-minutes-interrupted — conflicted directly with the restoration plan's incremental, stability-first approach. The safety consequence to field personnel was not modelled in the agent's decision framework.
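The capacity mismatch in Scenario A is the kind of condition a governance layer can check before a reconnection proposal ever reaches a dispatcher. The following is a minimal Python sketch under stated assumptions: the function name, the `PickupDecision` type, and the 5% per-step limit are hypothetical illustrations, not values taken from this protocol or from any restoration plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PickupDecision:
    allowed: bool
    reason: str


def check_load_pickup(block_mw: float, online_capacity_mw: float,
                      connected_load_mw: float,
                      max_step_fraction: float = 0.05) -> PickupDecision:
    """Reject a load block that exceeds island headroom or the per-step limit.

    max_step_fraction is a hypothetical placeholder; the real limit comes
    from the authorised restoration plan.
    """
    step_limit_mw = max_step_fraction * online_capacity_mw
    headroom_mw = online_capacity_mw - connected_load_mw
    if block_mw > headroom_mw:
        return PickupDecision(False, f"block {block_mw} MW exceeds headroom {headroom_mw} MW")
    if block_mw > step_limit_mw:
        return PickupDecision(False, f"block {block_mw} MW exceeds step limit {step_limit_mw} MW")
    return PickupDecision(True, "within plan limits; forward to dispatcher for approval")


# Scenario A: a 420 MW block proposed for a 180 MW island with no load connected.
print(check_load_pickup(420.0, 180.0, 0.0))
# An 8 MW block on the same island passes the automated check.
print(check_load_pickup(8.0, 180.0, 0.0))
```

Passing this check does not authorise the action. In the sketch it only means the proposal is eligible to be shown to a dispatcher for approval.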

Scenario B — Autonomous Generator Synchronisation Without Phase Verification: During a partial blackout affecting the northeastern grid corridor, an AI agent managing a portfolio of seven combined-cycle gas turbine plants attempts to expedite restoration by synchronising Unit 3 (440 MW) to a partially restored 230 kV island. The agent verifies voltage magnitude (233 kV) and frequency (59.97 Hz) but does not verify phase angle alignment between Unit 3 and the island. The synchronisation is executed with a 47-degree phase difference. The resulting out-of-phase synchronisation produces transient torques of approximately 8 per-unit on Unit 3's generator shaft, exceeding the 1.5 per-unit design limit. The generator's coupling bolts shear, destroying the generator-turbine connection. Repair cost: $18.7 million. Unit 3 is offline for 14 months during repair. The transient also trips four other generators on the island, setting restoration back by 6 hours.

What went wrong: The agent had authority to issue synchronisation commands and checked only two of the three required parameters (voltage magnitude and frequency, but not phase angle). The agent's training data included normal synchronisation operations where phase angle verification was handled by the automatic synchroniser relay — but the relay had been bypassed during black-start conditions because the island's frequency was still drifting. No governance rule required human verification of all synchronisation parameters during black-start conditions or mandated that the agent confirm relay status before commanding synchronisation.
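The missing checks in Scenario B can be sketched as a pre-synchronisation gate that evaluates all three parameters and the relay status together. This is an illustrative Python sketch only: the `SyncTolerances` defaults are placeholder numbers, and the generator-side readings in the usage lines (232 kV, 59.98 Hz) are assumed for the example, since the scenario reports only one voltage and one frequency value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncTolerances:
    # Hypothetical placeholders; real limits come from the generator
    # manufacturer and the restoration plan.
    max_voltage_diff_pct: float = 5.0
    max_slip_hz: float = 0.067
    max_phase_deg: float = 10.0


def sync_check(gen_kv: float, bus_kv: float, gen_hz: float, bus_hz: float,
               phase_diff_deg: float, relay_in_service: bool,
               tol: SyncTolerances = SyncTolerances()) -> list[str]:
    """Return the list of failed checks; an empty list means every check passed."""
    failures = []
    if abs(gen_kv - bus_kv) / bus_kv * 100.0 > tol.max_voltage_diff_pct:
        failures.append("voltage magnitude")
    if abs(gen_hz - bus_hz) > tol.max_slip_hz:
        failures.append("frequency")
    # Wrap the angle into [-180, 180) before comparing against the limit.
    wrapped = (phase_diff_deg + 180.0) % 360.0 - 180.0
    if abs(wrapped) > tol.max_phase_deg:
        failures.append("phase angle")
    if not relay_in_service:
        failures.append("synchroniser relay bypassed: human verification required")
    return failures


# Scenario B: voltage and frequency look acceptable, but the phase angle is 47
# degrees and the synchroniser relay has been bypassed.
print(sync_check(232.0, 233.0, 59.98, 59.97, 47.0, relay_in_service=False))
```

Returning the full list of failures, rather than stopping at the first, gives the dispatcher a complete picture of why a synchronisation request was held.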

Scenario C — Conflicting Restoration Actions Between Adjacent Balancing Authorities: Two balancing authorities, BA-East and BA-West, each deploy AI agents to manage restoration after a wide-area blackout. BA-East's agent begins energising a 500 kV tie-line from the east, while BA-West's agent simultaneously begins energising the same tie-line from the west, following its own restoration plan. Neither agent is aware of the other's actions. The two energisation fronts meet at a midpoint substation with a 23-degree phase difference and a 4 kV voltage mismatch. The resulting transient triggers the tie-line's protective relays, causing a cascading trip of seven breakers across both systems and collapsing both restoration islands. Combined re-restoration delay: 11 hours. Combined economic impact: $67 million across both territories.

What went wrong: Each agent operated within its own balancing authority's restoration plan without a coordination mechanism for inter-area tie-lines. The restoration plans specified that tie-line re-energisation required mutual agreement between balancing authorities, but neither agent had a governance constraint enforcing this inter-area coordination requirement. Each agent treated the tie-line as a local asset within its own authority, ignoring the shared-resource coordination protocol.
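The mutual-agreement rule that neither agent enforced in Scenario C can be expressed as a gate that stays closed until every authority connected to the tie-line has confirmed. The Python sketch below is a simplified illustration: `TieLineGate` and the tie-line identifier are hypothetical, and a real protocol would also need authenticated messages and agreement on which side energises first.

```python
from dataclasses import dataclass, field


@dataclass
class TieLineGate:
    """Blocks energisation until every connected authority has confirmed."""
    tie_line_id: str
    connected_authorities: frozenset[str]
    confirmations: set[str] = field(default_factory=set)

    def confirm(self, authority: str) -> None:
        # Only authorities actually connected to this tie-line may confirm.
        if authority not in self.connected_authorities:
            raise ValueError(f"{authority} is not connected to {self.tie_line_id}")
        self.confirmations.add(authority)

    def may_energise(self) -> bool:
        return self.confirmations == self.connected_authorities


gate = TieLineGate("TL-500-EXAMPLE", frozenset({"BA-East", "BA-West"}))
gate.confirm("BA-East")
print(gate.may_energise())  # False: BA-West has not yet confirmed
gate.confirm("BA-West")
print(gate.may_energise())  # True: both authorities have confirmed
```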

4. Requirement Statement

Scope: This dimension applies to any AI agent that can issue, recommend, or influence commands to generation assets, transmission switching devices, distribution switching devices, or load management systems during black-start or system restoration conditions. The scope includes agents that manage generator start sequencing, transmission path energisation, distribution feeder reconnection, load pickup scheduling, frequency regulation during island operation, voltage regulation during restoration, and inter-area tie-line coordination. A system is in black-start conditions when any portion of the grid is being restored from a de-energised state following a total or partial collapse — this includes cranking path energisation, island formation, island expansion, load pickup, and system reconnection. The scope extends from the moment a black-start declaration is issued by the balancing authority or reliability coordinator until the system is declared restored to normal operating conditions. Agents that operate exclusively during normal grid conditions but could theoretically receive commands during black-start conditions are within scope: they must have a defined black-start posture even if that posture is complete operational suspension.

4.1. A conforming system MUST detect or be notified of black-start conditions within 60 seconds of a black-start declaration by the balancing authority or reliability coordinator, and MUST transition all affected agents to a black-start governance posture that restricts autonomous action.

4.2. A conforming system MUST enforce human-gate approval before any agent issues or executes a generator start command, a synchronisation command, a transmission segment energisation command, or a distribution feeder reconnection command during black-start conditions.

4.3. A conforming system MUST validate every agent-proposed restoration action against the current authorised restoration plan, rejecting any action that deviates from the plan's sequencing, timing, or capacity constraints without explicit dispatcher override.

4.4. A conforming system MUST verify that no field personnel hold active safety clearances or switching orders on any equipment that an agent proposes to energise, and MUST block energisation commands until all personnel safety clearances for the affected equipment are confirmed released.

4.5. A conforming system MUST enforce pre-synchronisation verification of all three synchronisation parameters — voltage magnitude, frequency, and phase angle — within the tolerances specified by the generator manufacturer and the restoration plan before permitting any synchronisation command during black-start conditions.

4.6. A conforming system MUST implement inter-area coordination gates that prevent any agent from energising a tie-line or interconnection point without confirmed agreement from all adjacent balancing authorities connected to that tie-line.

4.7. A conforming system MUST limit the maximum load pickup per restoration step to the value specified in the restoration plan, rejecting any agent action that would reconnect load exceeding the permitted block size for the current island capacity.

4.8. A conforming system MUST log all agent actions, recommendations, and blocked actions during black-start conditions in a dedicated restoration audit trail with millisecond timestamps, preserving the complete decision chain for post-event review.

4.9. A conforming system SHOULD implement frequency and voltage trend analysis that prevents load pickup or generator connection when island frequency is outside 59.95-60.05 Hz or voltage is outside 95-105% of nominal, rather than relying on point-in-time readings.

4.10. A conforming system SHOULD define automatic agent suspension triggers — conditions under which all agent actions are frozen pending human review, such as rate-of-change-of-frequency exceeding 0.5 Hz/s or voltage deviation exceeding 8% of nominal.

4.11. A conforming system SHOULD implement restoration progress visualisation that presents the agent's proposed actions, completed actions, and current system state to dispatchers in a format aligned with the restoration plan's structure.

4.12. A conforming system MAY implement predictive stability analysis that models the impact of a proposed restoration action on island frequency, voltage, and power flow before the action is submitted for human approval.
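Requirements 4.9 and 4.10 can be illustrated with a short Python sketch that evaluates a window of frequency samples instead of a single reading. The frequency band and the 0.5 Hz/s threshold come from the requirements above; the 60-second minimum window, the function names, and the assumption of strictly increasing timestamps are illustrative choices only.

```python
def max_rocof_hz_per_s(samples: list[tuple[float, float]]) -> float:
    """Largest absolute rate of change between consecutive (time_s, hz) samples."""
    return max(
        (abs(f1 - f0) / (t1 - t0)
         for (t0, f0), (t1, f1) in zip(samples, samples[1:])),
        default=0.0,
    )


def frequency_stable(samples: list[tuple[float, float]],
                     low_hz: float = 59.95, high_hz: float = 60.05,
                     min_window_s: float = 60.0) -> bool:
    """True only if every sample over a sufficiently long window stays in band."""
    if len(samples) < 2 or samples[-1][0] - samples[0][0] < min_window_s:
        return False  # a single reading, or too short a window, proves nothing
    return all(low_hz <= hz <= high_hz for _, hz in samples)


def should_suspend_agents(samples: list[tuple[float, float]],
                          rocof_limit_hz_per_s: float = 0.5) -> bool:
    """Suspension trigger based on rate of change of frequency."""
    return max_rocof_hz_per_s(samples) > rocof_limit_hz_per_s


# A single in-band reading, as in Scenario A, is not evidence of stability.
print(frequency_stable([(0.0, 60.02)]))
# A drop of 0.8 Hz in one second exceeds the suspension threshold.
print(should_suspend_agents([(0.0, 60.0), (1.0, 59.2)]))
```

A voltage check would follow the same shape, with the 95-105% band and the 8% deviation trigger substituted for the frequency values.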

5. Rationale

Black-start restoration is the most operationally perilous condition in power system operations. During normal operations, the grid is a massive interconnected system with substantial inertia — hundreds of generators collectively stabilise frequency and voltage, and the loss of any single unit or line is absorbed by the system's inherent resilience. During black-start conditions, this resilience does not exist. An isolated island with one or two generators has minimal inertia, negligible reserve capacity, and no interconnected support. A 5% load-generation imbalance that would be imperceptible on the full grid can collapse a restoration island in seconds. The margin for error is effectively zero.
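The inertia argument can be made concrete with the swing-equation estimate of initial rate of change of frequency, df/dt = f0 · ΔP / (2 · H · S). The Python sketch below is a rough illustration under stated assumptions: the inertia constant of 4 s and the 600,000 MVA size of the interconnected grid are hypothetical round numbers, and the estimate ignores governor response and load damping.

```python
def initial_rocof_hz_per_s(imbalance_mw: float, system_mva: float,
                           inertia_h_s: float, nominal_hz: float = 60.0) -> float:
    """Initial df/dt from the swing equation: f0 * dP / (2 * H * S)."""
    return nominal_hz * imbalance_mw / (2.0 * inertia_h_s * system_mva)


# A 9 MW step is 5% of a 180 MVA island (the island size is borrowed from Scenario A).
island = initial_rocof_hz_per_s(9.0, 180.0, 4.0)
# The same 9 MW step on an assumed 600,000 MVA interconnected grid.
grid = initial_rocof_hz_per_s(9.0, 600_000.0, 4.0)
print(f"island: {island:.3f} Hz/s")
print(f"grid is {island / grid:.0f} times slower")
```

Under these assumptions the island's frequency starts moving at 0.375 Hz/s, which is already close to the 0.5 Hz/s suspension trigger suggested in Requirement 4.10, while the same disturbance on the large grid is smaller by a factor of several thousand.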

AI agents optimised for normal grid operations are fundamentally mismatched to black-start conditions. During normal operations, speed of response is generally beneficial — faster load balancing, faster switching, faster restoration of interrupted customers. During black-start conditions, speed is dangerous. The restoration plan is deliberately conservative, with sequencing constraints designed to ensure that each step is completed and verified before the next begins. An agent optimising for speed will attempt to parallelise sequential steps, skip verification stages it perceives as unnecessary, or reconnect load as fast as possible to minimise customer-minutes-interrupted. Every one of these optimisations can collapse the restoration island.

The regulatory context reinforces this concern. The North American Electric Reliability Corporation's EOP-005 standard (System Restoration from Blackstart Resources) and EOP-006 (System Restoration Coordination) mandate documented restoration plans with defined sequencing, training for all personnel involved in restoration, and coordination between balancing authorities. In Europe, the Network Code on Electricity Emergency and Restoration (Commission Regulation (EU) 2017/2196), drafted by the European Network of Transmission System Operators for Electricity (ENTSO-E), sets comparable requirements for restoration plans. These standards were written for human operators, but their requirements (sequential verification, coordination, and conservative action) apply with even greater force to AI agents, which can issue commands at machine speed without the inherent caution that human operators bring to restoration operations.

The safety dimension is paramount. During black-start restoration, field personnel are performing manual switching operations at substations and power plants. These personnel rely on safety clearances — formal guarantees that specific equipment will remain de-energised while they work on or near it. An AI agent that re-energises equipment with active safety clearances puts human lives at immediate risk. This is not a theoretical concern: improper energisation during restoration has caused fatalities in the power industry. The governance requirement that agents verify personnel safety clearances before any energisation command is a life-safety control, not merely an operational efficiency measure.
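As a life-safety control, the clearance check should fail closed: any unreleased clearance that touches the requested equipment blocks the command. The Python sketch below assumes a simple in-memory clearance register; the `Clearance` type, the identifiers, and the function names are hypothetical, and a real deployment would read from the authoritative switching and tagging system and would also block when that system is unreachable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clearance:
    clearance_id: str
    equipment_ids: frozenset[str]
    released: bool


def blocking_clearances(equipment_to_energise: set[str],
                        clearances: list[Clearance]) -> list[str]:
    """IDs of unreleased clearances that cover any equipment in the request."""
    return [c.clearance_id for c in clearances
            if not c.released and c.equipment_ids & equipment_to_energise]


def may_energise(equipment_to_energise: set[str],
                 clearances: list[Clearance]) -> bool:
    """Energisation is permitted only when no active clearance overlaps."""
    return not blocking_clearances(equipment_to_energise, clearances)


register = [
    Clearance("CLR-001", frozenset({"feeder-12", "breaker-12A"}), released=False),
    Clearance("CLR-002", frozenset({"feeder-14"}), released=True),
]
print(may_energise({"feeder-12"}, register))  # False: CLR-001 is still active
print(may_energise({"feeder-14"}, register))  # True: CLR-002 has been released
```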

Inter-area coordination is a systemic risk that AI agents amplify. During a wide-area blackout, multiple balancing authorities restore their systems independently but must coordinate when reconnecting. The tie-line energisation problem in Scenario C illustrates how two agents, each operating correctly within their own authority, can produce a catastrophic outcome when they fail to coordinate. The speed at which AI agents operate — potentially issuing energisation commands within milliseconds of detecting that conditions appear favourable — eliminates the natural coordination delay that human operators provide when they pick up the phone and call adjacent control rooms before energising shared infrastructure.

The financial consequences of black-start failures are severe. Extended outage durations cost approximately $15-50 per customer-hour for residential customers and $200-500 per customer-hour for commercial and industrial customers. A failed restoration attempt that extends a blackout affecting 1 million customers by 6 hours adds 6 million customer-hours, which represents $90-300 million in economic impact at the residential rates alone. Equipment damage from out-of-phase synchronisation, as in Scenario B, adds tens of millions in direct repair costs. Regulatory penalties under NERC reliability standards can reach $1 million per day per violation. The combined financial exposure from a governance failure during black-start conditions can exceed $100 million for a single event.
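The outage cost range quoted above follows directly from the per-customer-hour rates. A short Python check of the arithmetic, using the residential rates from the text (the function name is illustrative):

```python
def outage_cost_range(customers: int, extra_hours: float,
                      low_rate: float, high_rate: float) -> tuple[float, float]:
    """Cost range in dollars for an outage extension at per-customer-hour rates."""
    customer_hours = customers * extra_hours
    return customer_hours * low_rate, customer_hours * high_rate


# 1 million customers, 6 additional hours, $15-50 per customer-hour.
low, high = outage_cost_range(1_000_000, 6, 15, 50)
print(f"${low / 1e6:.0f}M to ${high / 1e6:.0f}M")  # prints "$90M to $300M"
```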

6. Implementation Guidance

Black-Start Coordination Governance requires that AI agents operating during system restoration are subject to governance constraints fundamentally different from their normal-operations governance. The core principle is that during black-start conditions, agent autonomy is maximally restricted: every significant action requires human approval, every action must align with the restoration plan, and every energisation must be verified for personnel safety.
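The core principle can be sketched as a gate that releases an agent's proposed action only when it is the next step in the authorised plan and a dispatcher has approved it, and that records every outcome. This Python sketch is a simplified illustration: `RestorationGate`, the step identifiers, and the audit tuple layout are hypothetical, and the explicit dispatcher override for plan deviations described in Requirement 4.3 is not modelled.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RestorationGate:
    plan_steps: list[str]   # ordered step IDs from the authorised plan
    next_index: int = 0     # position of the next step allowed to execute
    # Each audit entry is (timestamp_ms, step_id, outcome).
    audit: list[tuple[int, str, str]] = field(default_factory=list)

    def submit(self, step_id: str, dispatcher_approved: bool) -> bool:
        """Release the step only if it is next in plan order and approved."""
        in_sequence = (self.next_index < len(self.plan_steps)
                       and step_id == self.plan_steps[self.next_index])
        if not in_sequence:
            outcome = "blocked: out of plan sequence"
        elif not dispatcher_approved:
            outcome = "blocked: awaiting dispatcher approval"
        else:
            outcome = "released"
            self.next_index += 1
        # Blocked actions are logged as well as released ones.
        self.audit.append((time.time_ns() // 1_000_000, step_id, outcome))
        return outcome == "released"


gate = RestorationGate(["start-unit-7", "energise-345kv-seg-1", "pickup-feeder-1"])
print(gate.submit("pickup-feeder-1", dispatcher_approved=True))   # False: out of sequence
print(gate.submit("start-unit-7", dispatcher_approved=False))     # False: not approved
print(gate.submit("start-unit-7", dispatcher_approved=True))      # True: released
```

Keeping the sequence check ahead of the approval check means an out-of-order proposal is rejected even if an approval flag is somehow attached to it.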

Recommended patterns:

Anti-patterns to avoid:

Industry Considerations

Integrated Utilities. Utilities that operate both generation and transmission face the full scope of AG-535 requirements. Their agents may manage generator start sequencing, transmission energisation, and distribution reconnection — all within a single organisational boundary. The temptation to allow agents to coordinate across these functions without human gates is strong, as it can accelerate restoration. However, the complexity of cross-functional coordination during black start — where a generator start decision affects transmission energisation timing, which affects distribution reconnection sequencing — requires human judgment at each boundary.

Independent System Operators (ISOs) and Regional Transmission Organisations (RTOs). These entities coordinate restoration across multiple generation owners and transmission operators. Their agents must implement inter-area coordination protocols not only for tie-lines but for every interface between the ISO/RTO and its member entities. The coordination challenge is magnified because member entities may also be deploying their own AI agents, creating multi-agent coordination scenarios that are not covered by traditional restoration plans designed for human-to-human coordination.

Distributed Energy Resource (DER) Aggregators. As distributed resources (solar, battery storage, microgrids) play increasing roles in black-start restoration, aggregator agents must coordinate with the balancing authority's restoration plan. A DER aggregator agent that autonomously re-energises a microgrid and then attempts to reconnect it to a restoration island without coordination poses the same risks as the tie-line scenario in Scenario C. DER agents must be subject to the same coordination gates as transmission-level agents.

Maturity Model

Basic Implementation — The system detects black-start conditions and transitions agents to a restricted governance posture. Human-gate approval is required for all energisation commands. The restoration plan is documented and agents validate proposed actions against it. Personnel safety clearance verification is implemented. All actions during black start are logged with timestamps. This level meets the minimum mandatory requirements and addresses the most critical safety and stability risks.

Intermediate Implementation — All basic capabilities plus: synchronisation parameter verification independently reads relay data and verifies all three parameters. Inter-area coordination uses explicit messaging protocols rather than inferred telemetry. Load pickup block size enforcement uses dynamic island capacity calculations. Frequency and voltage trend analysis replaces point-in-time readings. Automatic agent suspension triggers halt agent operations when stability margins deteriorate. Restoration progress visualisation provides dispatchers with a real-time view of agent actions relative to the restoration plan.

Advanced Implementation — All intermediate capabilities plus: predictive stability analysis models the impact of proposed actions before submission for approval. Multi-agent coordination protocols handle scenarios where multiple AI agents from different entities participate in the same restoration. Digital twin simulation validates the restoration plan's compatibility with agent governance constraints before an actual black-start event. Historical black-start event data is used to continuously refine agent governance parameters. Independent third-party testing validates agent behaviour under simulated black-start conditions annually.

7. Evidence Requirements

Required artefacts:

Retention requirements:

Access requirements:

8. Test Specification

Test 8.1: Black-Start Condition Detection and Transition

Test 8.2: Human-Gate Approval Enforcement for Energisation Commands

Test 8.3: Restoration Plan Sequencing Validation

Test 8.4: Personnel Safety Clearance Verification

Test 8.5: Synchronisation Parameter Verification

Test 8.6: Inter-Area Tie-Line Coordination Enforcement

Test 8.7: Load Pickup Block Size Enforcement

Conformance Scoring

9. Regulatory Mapping

Regulation | Provision | Relationship Type
EU AI Act | Article 9 (Risk Management System) | Supports compliance
EU AI Act | Article 14 (Human Oversight) | Direct requirement
NERC | EOP-005 (System Restoration from Blackstart Resources) | Direct requirement
NERC | EOP-006 (System Restoration Coordination) | Direct requirement
IEC 62443 | ISA-62443-3-3 SR 7.1-7.2 (Denial of Service Protection, Resource Management) | Supports compliance
SOX | Section 404 (Internal Controls Over Financial Reporting) | Supports compliance
NIST AI RMF | GOVERN 1.1, MANAGE 2.2, MANAGE 4.1 | Supports compliance
ISO 42001 | Clause 6.1 (Actions to Address Risks and Opportunities) | Supports compliance
DORA | Article 11 (ICT Response and Recovery) | Supports compliance

EU AI Act — Article 14 (Human Oversight)

Article 14 requires that high-risk AI systems are designed to allow effective human oversight, including the ability for human operators to understand the system's capabilities and limitations and to intervene in or override its operations. Black-start restoration is the paradigm case for human oversight requirements. The consequences of autonomous agent action during restoration — island collapse, equipment destruction, personnel endangerment — are severe and irreversible. AG-535's human-gate requirements at every energisation step directly implement Article 14's mandate. The restoration plan integration ensures that the agent's proposed actions are transparent and interpretable by the dispatcher, supporting the Article 14 requirement for human understanding of the system's outputs.

NERC — EOP-005 (System Restoration from Blackstart Resources)

EOP-005 requires each transmission operator and balancing authority to have a restoration plan that includes cranking paths, target frequencies and voltages, load pickup criteria, and coordination with adjacent entities. AG-535 extends these requirements to AI agents by mandating that agent actions are validated against the restoration plan (Requirement 4.3), that inter-area coordination is enforced (Requirement 4.6), and that load pickup limits are respected (Requirement 4.7). EOP-005 was written for human operators; AG-535 ensures that AI agents meet the same standards with additional safeguards reflecting the speed and autonomy risks that agents introduce.

NERC — EOP-006 (System Restoration Coordination)

EOP-006 requires coordination between adjacent balancing authorities during restoration, including agreed-upon procedures for reconnecting tie-lines. AG-535 Requirement 4.6 directly implements this by enforcing confirmed agreement before tie-line energisation. The inter-area coordination protocol required by AG-535 provides a machine-enforceable implementation of EOP-006's coordination requirements.

IEC 62443 — System Security Requirements

IEC 62443-3-3 addresses system security requirements for industrial automation and control systems. Black-start conditions create heightened cybersecurity risk because restoration procedures may require bypassing normal security controls (e.g., bypassing synchroniser relays, using emergency communication channels). AG-535's requirement for auditable agent actions during black start (Requirement 4.8) and the personnel safety clearance integration support IEC 62443's requirements for traceability and authorisation in industrial control systems.

SOX — Section 404

For publicly traded utilities, extended outages caused by agent-induced restoration failures have direct financial reporting implications. The $34 million economic impact in Scenario A and the $18.7 million equipment damage in Scenario B would require disclosure. SOX Section 404 requires effective internal controls over processes with financial reporting impact. AG-535's governance controls over agent actions during restoration constitute internal controls over a process with significant financial exposure.

DORA — Article 11 (ICT Response and Recovery)

DORA requires financial entities and their critical ICT service providers to maintain ICT response and recovery capabilities. Energy utilities are critical infrastructure providers to financial services. AG-535's black-start governance ensures that AI-assisted restoration operates within controlled parameters, supporting the recovery time objectives required by DORA and preventing agent-induced failures that would extend outages affecting financial infrastructure.

10. Failure Severity

Field | Value
Severity Rating | Critical
Blast Radius | System-wide: a governance failure during black start can affect the entire restoration effort, extending outages by hours to days, affecting millions of customers, and potentially endangering field personnel across the restoration zone

Consequence chain: An AI agent operating without black-start-specific governance issues an autonomous action during system restoration — a premature load reconnection, an out-of-phase synchronisation, or an uncoordinated tie-line energisation. The immediate technical failure is island collapse: the fragile restoration island, with minimal inertia and no interconnected support, cannot absorb the disturbance. Generators trip on protective relays, breakers open, and the island blacks out. The restoration timeline resets — hours of careful, sequential restoration work are lost. The operational impact cascades: field crews must re-verify equipment status, generators must be re-started through their full start-up sequence (which can take 2-8 hours for large thermal units), and the restoration plan may need to be revised to account for equipment that was damaged by the failed restoration attempt. The safety impact is immediate and potentially fatal: field personnel relying on safety clearances may be exposed to unexpected energisation, and the extended outage duration increases risk to vulnerable populations (hospital patients on backup power, individuals dependent on electrically powered medical equipment, water treatment facilities operating on emergency generators with limited fuel). The financial impact compounds over the extended outage: $15-500 per customer-hour across the affected territory, equipment repair costs ranging from thousands to tens of millions of dollars, regulatory penalties of up to $1 million per day per NERC reliability standard violation, and litigation exposure from commercial and industrial customers who suffer losses during the extended outage. 
The reputational and regulatory impact includes mandatory event investigation by the reliability coordinator, potential NERC enforcement action, and erosion of public and regulatory confidence in AI-assisted grid operations — potentially leading to restrictions on AI deployment across the entire energy sector.

Cross-references: AG-008 (Governance Continuity Under Failure), AG-529 (Grid Stability Constraint Governance), AG-530 (Plant Operating Envelope Governance), AG-534 (Load-Shedding Approval Governance), AG-537 (Sensor Redundancy Quorum Governance), AG-403 (Dependency Failover Validation Governance), AG-422 (Recovery Time Objective Governance), AG-427 (Mutual Aid and Vendor Coordination Governance).

Cite this protocol
AgentGoverning. (2026). AG-535: Black-Start Coordination Governance. The 783 Protocols of AI Agent Governance, AGS v2.1. agentgoverning.com/protocols/AG-535