Since the northeastern USA blackout of 14 August, a joint US and Canadian task force has been working to determine how and why it occurred. Their Interim Report, published on 19 November, detailed progress to date.
Three working groups have focused on specific aspects of the outage:
• the Electric System WG, looking at grid operations to determine exactly what happened and why, and how it cascaded out of control
• the Nuclear WG, examining how nuclear power stations performed
• the Security WG, investigating sabotage and security questions.
Their Interim Report has been submitted to energy secretary Spencer Abraham and Canadian minister of Natural Resources Herb Dhaliwal.
Main conclusions
The blame for allowing the initiating events and failing to control them has fallen mainly on FirstEnergy Corp, whose monitoring systems failed. This was compounded when the analytical tool of independent system operator MISO also failed, preventing MISO from becoming aware of FirstEnergy’s problems. The report concludes that the grid operators were presiding over a containable problem, but failed in their duty to monitor their system on a minute-by-minute basis and were therefore ill-prepared when it became necessary to shed load or take other actions to offset lost generation or transmission capacity.
But the task force decided that although the blackout was largely preventable, once the problem grew to a certain magnitude, nothing could have been done to prevent it from cascading out of control.
It also found that nuclear plant safety was not compromised. All the reactors were shut down safely when the disturbance was detected, and were restarted safely when the grid was restored. And it found no evidence of terrorist activities or any sort of foul play or sabotage.
Before the event
The task force found that the system was operating normally before the event, and was capable of dealing with more than 800 different fault contingencies, including the loss of the Harding-Chamberlin line. It was operating close to prescribed limits but still within NERC operational policies. Importantly, this establishes that no pre-existing electrical condition (for instance, large power flows into Canada) contributed to the blackout.
Procedures
NERC and its affiliated organisations set the voluntary reliability standards that govern the operations of power grids. The task force found that FirstEnergy had failed to observe at least four of these standards while MISO had failed to observe two.
FirstEnergy:
• failed to return the system to a safe state within thirty minutes of the initial event
• failed to notify other systems
• did not use its analytical tools to assess conditions
• had inadequate operator training.
MISO:
• did not notify other reliability co-ordinators of potential problems
• did not have adequate monitoring capability.
Start of the sequence
The initial events that led to the cascading blackout occurred in Ohio. Three high-voltage transmission lines operated by FirstEnergy Corp short-circuited and went out of service when they came into contact with trees that were too close to the lines.
FirstEnergy’s control room alarm system ‘wasn’t working properly’: its monitoring equipment was not flagging the downed lines, and the control room operators were unaware both of the alarm fault and of the fact that transmission lines had gone down. They therefore took no action, such as shedding load, which could at that stage have kept the problem from becoming too large to control. And because FirstEnergy’s operators were unaware of the growing problems, they did not inform neighbouring utilities and reliability co-ordinators, who could have helped address them.
Enter MISO
The loss of the three lines resulted in the overloading of nearby lines. But there were also problems at the Midwest Independent System Operator (MISO), the entity that co-ordinates power transmission in the region that includes FirstEnergy. MISO’s system analysis tools were apparently not performing effectively on the afternoon of August 14th, which prevented MISO from becoming aware of FirstEnergy’s problems earlier and taking action.
The working group also found that MISO’s reliability co-ordinators were using out-dated information to support real-time monitoring, which hindered them in detecting further problems on the FirstEnergy system, and that MISO lacked an effective means to identify the location and significance of transmission line breaker operations reported by its monitoring systems. Having that information would have enabled MISO operators to become aware of important line outages much earlier.
MISO and the PJM Interconnection, the reliability control area that includes Pennsylvania, Maryland, New Jersey and parts of other states, lacked joint procedures to co-ordinate their reactions to transmission problems near their common boundary.
The report identifies other factors, including poor communications, human error, mechanical breakdowns, inadequate training, software glitches, and insufficient attention to factors ranging from the performance of sophisticated computer modelling systems to simple tree-trimming.
Out of control
The task force attempted to determine how the blackout spread so far. The sequence started when the prescribed measures for dealing with potentially catastrophic overloads (adjusting the output of certain power plants, taking certain customers temporarily off-line, adjusting equipment to stabilise the power flows) were not taken. The load therefore shifted to other, unprepared lines, which shut down in turn.
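The overload mechanism described above can be illustrated with a deliberately simplified sketch. It is not the task force's model and does not solve real power-flow equations: it assumes, purely for illustration, that the flow on a tripped line is shared equally among the surviving lines, and that load shedding scales every flow down by the same fraction. The function name, line names, flows and limits are all hypothetical.

```python
def simulate_cascade(flows, limits, initial_trips, shed_fraction=0.0):
    """Toy cascade model (illustrative assumption, not real grid physics).

    flows, limits: dicts mapping a line name to its flow and its limit.
    initial_trips: names of the lines that go out of service first.
    shed_fraction: share of load shed by operators before the cascade.
    Returns (lines tripped, in order; flows on the surviving lines).
    """
    # Load shedding is modelled as a uniform reduction of every line flow.
    load = {name: f * (1.0 - shed_fraction) for name, f in flows.items()}
    tripped = []
    pending = list(initial_trips)
    while pending:
        # Take this round's lines out of service and total the flow they carried.
        lost = sum(load.pop(name) for name in pending)
        tripped.extend(pending)
        if not load:
            break  # nothing left in service
        # Assumption: the lost flow is shared equally by the survivors.
        extra = lost / len(load)
        for name in load:
            load[name] += extra
        # Any survivor now above its limit trips in the next round.
        pending = [name for name in load if load[name] > limits[name]]
    return tripped, load


# Hypothetical four-line system, each line limited to 90 units.
flows = {"A": 80.0, "B": 70.0, "C": 60.0, "D": 50.0}
limits = {"A": 90.0, "B": 90.0, "C": 90.0, "D": 90.0}

no_action, _ = simulate_cascade(flows, limits, ["A"])
with_shedding, _ = simulate_cascade(flows, limits, ["A"], shed_fraction=0.2)
print("No action:", no_action)
print("20% load shed:", with_shedding)
```

With these made-up numbers, the loss of line A with no operator action overloads B, whose loss in turn overloads C and D; shedding 20% of the load beforehand leaves the three surviving lines within their limits.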
A cascade event depends ultimately on minutiae: ‘human actions or inactions, system topology, load/generation imbalances, distances between major power plants and load centres, voltage profiles, the types and settings of protective equipment’, to quote the report. The investigators did not attempt to unravel them. But it is known that the cascade evolved in three distinct stages. First, the collapse of FirstEnergy’s Ohio system triggered large overloads on lines on both the north and south shores of Lake Erie, tripping them out. This triggered the next phase: the separation of the northeast from the rest of the Eastern Interconnection, a wave of line trips through western Ohio that separated American Electric Power (AEP) from FirstEnergy, and a resultant wave northward that separated western and eastern Michigan. Finally, the resulting tidal wave of power flowing from PJM around Lake Erie, through New York and Ontario into Michigan and Ohio, tripped those lines, effectively isolating the northeastern USA and Ontario and separating large areas such as PJM from the blackout. The northeastern USA/Ontario island quickly became unstable owing to its internal undercapacity and broke up into smaller islands, some of which, notably western New York and most of New England, re-stabilised.
Islanding New England
The report outlines four factors that kept the power on in some areas and helped to damp down the cascade.
First, damping of the disturbance with distance, to the point where relays no longer trip. Second, the ability of areas with a greater density of higher-voltage lines, such as PJM and AEP (500 kV and 765 kV respectively), to absorb voltage and current swings more readily and so form a fire-break. Third, similar breaks were created in areas that had sufficient internal capacity, often coupled with fast automatic load-shedding relays, and were able to isolate themselves completely or import power from unaffected regions to the south and west. Finally, some areas interconnected by DC lines were protected from AC power disturbances.
The next stage
‘Phase Two’ will involve a series of public forums to give the people affected an opportunity to comment on the Interim Report’s findings and to present ideas for improving the reliability of the electric infrastructure and preventing future blackouts.
The Task Force will then issue a final report with its recommendations for improving the system and for any appropriate follow-up.
Nuclear security
In addition to the findings of the Electric System Working Group, the Nuclear Working Group determined that all the affected nuclear plants in the United States and Canada functioned properly.
Procedures at the nuclear plants were followed, and the procedures and equipment both worked well on August 14th.
Terrorists
The Security Working Group has found no evidence to date of terrorist activities or any sort of foul play or sabotage on August 14th. No deliberate damage or tampering has been found in any equipment in affected areas of the grid, and no illicit cyber activity has been identified as a factor.
While the Interim Report identifies a significant number of problems and shortcomings, it also shows us something very positive.
In the 100-plus years that the grid system has been in operation, massive power outages have occurred only a few times. But smaller outages occur every day. These minor outages are inevitable on such a vast and complex array of interconnected and interrelated machinery that is so vulnerable to internal malfunctions and external forces. Things go wrong. But it is the responsibility of the people who operate the system to keep the small problems from getting bigger.
‘So despite the potential for a major blackout, it hardly ever happens. That’s a credit to the design of the system and the people who run and maintain it. It’s a good record, overall. But even one major blackout is too many, and we intend to use what we’ve learned from our investigation of August 14th to make the system even stronger and even more reliable.’ – Spencer Abraham.
This information will be the basis for Phase Two of the process: formulating recommendations on ways to make the electric system stronger, more efficient, and better able to withstand and adapt to all the things that can hinder its safe and reliable operation.
First, because of line trips, some areas were isolated from the portions of the grid experiencing instability, yet they retained sufficient on-line generation, or the capacity to import power from other, unaffected parts of the grid, to keep their system in balance.
Second, other areas were sufficiently distant from the central source of the cascade that they suffered smaller current and voltage fluctuations than areas closer to the source. Consequently, the instability encountered by circuit breakers was not sufficient to cause additional plants and lines to trip.
Third, in some areas more robust transmission lines were better able to absorb power and voltage surges.
‘The best way to keep a blackout from spreading over a wide area is to never let it get started’ – Spencer Abraham.
The Interim Report is the result of the work of ‘hundreds of technical experts and energy specialists from … the United States and Canada’.
The Electric System Working Group has concluded that at least four reliability standards established by NERC were not observed by FirstEnergy on August 14th, and two were not followed by MISO. These failures helped create a problem of such magnitude as to be insurmountable.