Human-Centric Contingency Analysis Metrics for Evaluating Operator Performance and Trust

A novel set of system-state and control-action penalty functions is introduced as an alternative to traditional performance index contingency ranking. The novel system state penalty metrics are formulated based on piecewise linear functions of the system voltage and branch flow, guided by Weber’s Law of human cognition. Novel continuous and discrete control action metrics are also developed to measure the inherent cost and risk associated with every action taken by a human power system operator to resolve violations on a pre-contingent basis. These new metrics are combined with traditional human factors indices for measuring human-machine trust and cognitive workload to create a systematic framework for measuring and evaluating operator trust and reliance on artificial intelligence (AI) algorithms for control room use. An existing AI-based contingency analysis recommender tool using a semi-supervised action algorithm is selected for a series of experiments with operations engineering staff using the IEEE 118 Bus System. The penalty metrics presented are demonstrated for both steady-state contingency analysis and transient stability studies, with the operations participants able to reduce the total system penalty in 85% of scenarios through remedial actions. A human-machine team was able to achieve equal or lower continuous control action penalty scores than the participant without availability of the recommender in 57% of experiment scenarios and lower continuous control action penalty scores than the AI tool alone in 83% of scenarios.

INDEX TERMS Artificial intelligence, energy management systems, human factors, power system reliability, power system stability, technology acceptance.

I. INTRODUCTION
To reach decarbonization goals, a significantly higher level of renewables penetration will be required. These renewable resources (at both the bulk transmission and distribution level) will disrupt current operational paradigms based on load-following automatic generation control (AGC) and traditional approaches to maintain operating reserves, frequency balancing, voltage control, interchange scheduling, and black-start capability [1]. Furthermore, increasing penetration levels of large-scale renewables, customer rooftop solar, electric vehicles, battery storage, and other distributed energy resources will soon blur traditional silos of operations between generation, transmission, and distribution [2]. As a result, a new generation of advanced power applications will be needed to ensure reliable, resilient, robust, safe, and economic operations.
To that end, advanced power applications based on artificial intelligence (AI) show great potential and have been advocated for control room use since the 1980s [3]. Numerous AI-based algorithms have already been developed for both existing energy management system (EMS) applications and new application functionalities. Several comprehensive reviews of these techniques are available in the literature [4], [5], [6], [7]. Despite the potential of AI power applications, few AI-based tools beyond load forecasting and renewables forecasting have been adopted for real-time use in utility control rooms due to a wide range of operational considerations that are frequently ignored [8]. Barriers to industry adoption and deployment include technology readiness levels [9], human readiness levels [10], operational considerations [11], human-machine trust [12], and impacts on situational awareness [13]. For new AI-based power applications and smart grid technologies to be adopted by utilities, developers need to consider not only the technical aspects, but a wide range of operational and human factors.
Human-machine trust issues among power system operators present a particular challenge to the introduction of new tools, which may be rejected by end-users. The unique work culture, experience requirements, and procedures-based approach of control room work [11] present further barriers.
Although these challenges are anecdotally well-known by industry practitioners, very few empirical human factors studies of power system operators exist in the literature. Within the past two decades, [14], [15], [16], [17] investigated operator situational awareness; [18] measured workload; and [13], [19], [20] documented decision-making processes of power system operators. As a result, more research is urgently needed to improve understanding of operator cognitive processes and create numerical frameworks for evaluating the effectiveness of new control room tools.
This paper presents the first systematic methodology for quantifying operator performance and human-machine trust between new real-time contingency analysis (RTCA) applications and control operators, dispatchers, or operations engineers. The contributions of this paper are as follows:
• A detailed review of control room workflows and human perception of RTCA tool outputs for real-time reliability assessments and mitigation action decision-making
• New penalty metrics to measure the severity of system violations and risk of corrective actions reflecting operator perception and decision-making processes
• Results of a human factors experiment to measure human-machine trust between operations staff and an AI contingency remedial action recommendation tool
• Application of the novel metrics to quantify and compare effectiveness of operations staff in resolving contingencies with and without the AI tool during steady-state and transient stability simulation studies.
The remainder of this paper is organized as follows. Section II reviews RTCA tool formulations, their use in control rooms, and human decision-making processes. Section III formulates the novel system penalty and control action penalty metrics. Section IV introduces the experiment methodology and AI recommender tool evaluated. Section V applies the novel penalty metrics to assess whether the recommender tool improved operator performance. Section VI extends the metrics to transient stability analysis studies. Section VII discusses key lessons learned and remaining gaps in understanding of operator cognition and trust in RTCA tools.

II. CONTINGENCY ANALYSIS

A. ROLE OF CONTINGENCY ANALYSIS TOOLS
As a result of the 2003 Northeast Blackout, a series of transmission operations standards were developed to help ensure real-time reliability of the bulk electricity system (BES). North American Electric Reliability Corporation (NERC) Standards TOP-001-5 [21] and IRO-008-2 [22] require that each Transmission Operator (TOP) and Reliability Coordinator (RC) conduct a real-time reliability assessment every thirty minutes to ensure that no current or anticipated operating state results in any system operating limit (SOL) or interconnection reliability operating limit (IROL) violations. In practice, most utilities perform assessments every five minutes using a combination of power flow, contingency analysis, and stability tools.
Steady-state real-time contingency analysis (RTCA) is one of the most popular tools for evaluating the current state of the system and whether any credible contingencies (including line outages, bus outages, transformer outages, generation unit trips, and breaker faults) will result in SOL or IROL violations. NERC reliability operations standards require system operators to mitigate simulated violations on a pre-contingency basis and to return to an n − 1 secure state within the timeframe dictated by the type and severity of violation. However, RTCA typically does not provide recommendations on how to mitigate post-contingent violations. Instead, it indicates which contingencies cause violations, which limits are violated in the post-contingent state, and the percentage by which the violations exceed the established limits. The steps taken by an operator to mitigate violations are based largely on their knowledge of the unique characteristics of their own area of responsibility (AOR), past operating conditions, and general guidelines provided in operating procedures for some severe contingencies evaluated in detail through offline studies.

Contingency analysis tools can be classified into three categories based on the numerical solution used: steady-state tools, transient tools, and AI-based tools. These tools can be applied both to real-time reliability assessments and to planning studies. The discussion here will be limited to tools that strictly perform contingency analysis and will not extend to related concepts, such as security-constrained economic dispatch (SCED) and security-constrained unit commitment (SCUC).
Steady-state CA tools identify contingencies and violations either through a full AC power flow solution or through an approximate method [23]. The first approach sequentially applies each credible contingency to the power system and solves a full power flow using a Newton-Raphson [24] or Fast-Decoupled [25] solution method. Approximate steady-state methods include DC load flow solutions [26] (which neglect reactive power flows in the system) and sensitivity-based analyses, which use power transfer distribution factors (PTDF) and line outage distribution factors (LODF) to estimate changes in system conditions without solving a power flow directly. In situations where the power system is operating close to its voltage stability limits, steady-state voltage stability analyses may be included in the CA solution and either solve for the Q-V "nose curve" [27], [28] or apply newer sensitivity metrics [29], [30].
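The sensitivity-based screening mentioned in this paragraph can be illustrated with a minimal sketch. It assumes the standard LODF relation, in which the post-outage flow on a monitored line equals its pre-outage flow plus the LODF times the pre-outage flow on the outaged line. The three-line network, flows, ratings, and LODF values are hypothetical and are not taken from the paper.

```python
def post_outage_flow(pre_flow_l, pre_flow_k, lodf_lk):
    """Estimate MW flow on monitored line l after outage of line k."""
    return pre_flow_l + lodf_lk * pre_flow_k


def screen_outage(pre_flows, ratings, lodf_column, outaged):
    """Return {line: loading ratio} for lines estimated above 100% of rating."""
    overloads = {}
    for line, flow in pre_flows.items():
        if line == outaged:
            continue  # the outaged line carries no flow
        estimate = post_outage_flow(flow, pre_flows[outaged], lodf_column[line])
        loading = abs(estimate) / ratings[line]
        if loading > 1.0:
            overloads[line] = round(loading, 3)
    return overloads


# Hypothetical triangle network: losing A-B forces its flow around A-C-B.
pre_flows = {"A-B": 80.0, "A-C": 60.0, "B-C": 20.0}   # MW, illustrative
ratings = {"A-B": 100.0, "A-C": 100.0, "B-C": 50.0}   # MW, illustrative
lodf_for_ab_outage = {"A-C": 1.0, "B-C": -1.0}        # assumed sensitivities

result = screen_outage(pre_flows, ratings, lodf_for_ab_outage, "A-B")
print(result)
```

No power flow is solved here; each estimate is a single multiply-add, which is why such factors are attractive for fast screening of large contingency lists.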
If the power system is operating close to its angle stability limits, traditional steady-state CA solutions may be unable to converge for some contingencies. It is then necessary to apply the second group of dynamic CA tools, which solve a transient stability study or an electromechanical / electromagnetic time-series simulation to determine the end state of the power system after occurrence of the particular contingency. Dynamic CA tools are far less commonly used due to their complexity, need for accurate dynamics models, and frequent dependence on high-performance-computing (HPC) systems [31], [32].
The third group of CA tools involves machine learning (ML) and AI techniques for a variety of use cases [33], including selection of credible contingencies, contingency clustering [34], contingency ranking [35], and estimation of post-contingent states [36]. Numerous AI techniques (surveyed by [33], [37]) have been proposed for contingency analysis, dating back to the 1970s [38], with older expert systems preferring decision trees and/or fuzzy logic and newer ML techniques preferring neural networks and support vector machine methods. However, few (if any) of these AI-based approaches have been deployed in utility control rooms; the vast majority of real-time EMS RTCA applications still use direct steady-state numerical solutions.

C. RTCA IN CONTROL ROOM WORKFLOWS
To understand ways in which AI-based recommendation tools may come to be trusted and relied on in control rooms, it is first necessary to understand the way operators and operations engineers use existing tools as part of their workflows and decision-making processes. Job task analyses (JTA) are an approach commonly used in training and certification of power system operators. However, JTAs only provide a high-level view of the roles, responsibilities, interactions, and technical concepts used in performing a wide range of reliability-related tasks. Consequently, a systematic knowledge elicitation process was conducted, working with utility operations engineers to identify key steps and cognitive processes used in running contingency analysis, identifying violations, validating results, developing mitigation steps, and implementing control actions to return the system to an n − 1 secure state.
To understand how operators would judge the reliability and technical competence of AI-based RTCA action recommendations, cognitive workflow diagrams were created for the process by which operators determine the sequence of mitigation steps for various types of CA violations. Figure 1 demonstrates the overall process used for identifying and mitigating RTCA violations. Additional, more detailed diagrams were created to model the sequence of decisions an operator typically takes for mitigating undervoltage violations, overvoltage violations, extra high voltage (EHV) line thermal overloads, high voltage (HV) line thermal overloads, and transformer overloads.

1) SITUATION AWARENESS AND DATA INTERPRETATION
The approach used by operators to resolve RTCA violations is based on their familiarity with the particular power system, their real-time situation awareness, written operations procedures, and their knowledge of historical events and usual causes of common problems in the network. Within a control room, operators synthesize numerous inputs from multiple computer displays regarding system frequency, local voltage, power flow direction, breaker statuses, and whether various power plants are online or offline. They use this information to assess impacts of various contingencies, and whether any thermal limits or stability limits will be violated as a result of a particular contingency. Figure 2 depicts the typical stages of perception, comprehension, and projection (first introduced by [39]) used by power system operators in maintaining situational awareness and ensuring the reliability of the bulk electric system.
These stages can be explained with more clarity using an example contingency on the IEEE 118 Bus Model, as shown in Figure 3. The first phase of perception refers to the process of ingesting relevant information from supervisory control and data acquisition (SCADA) displays, alarms, and solution results from power applications, such as state estimator and contingency analysis. In this case, the operator will perceive that loss of the KANAWH-CABINC 345-138 kV transformer results in thermal overloads of two 138 kV lines. The operator will then proceed to the second stage of comprehension to understand the potential impact of this violation, which would be tripping of the overloaded lines within a few minutes. The operator then proceeds to apply projection to run a short mental simulation that tripping of the affected lines will result in overload of other parallel and downstream lines, which are already near their thermal limits. This may lead to a cascading outage and blackout.
FIGURE 3. Sample contingency and violations visualized in PowerWorld on the IEEE 118 Bus Model. Key sources of information that will be processed by a power system operator include the direction of power flow (shown by the white arrows), other paths near their limits, and location of available resources that can be used to mitigate the contingency violations.
109692 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

2) RESOLUTION OF CA VIOLATIONS
To resolve CA violations, operators use a sequential decision process that involves 1) projecting the impact of a particular contingency, 2) tracing the root cause of the violation, and 3) identifying available equipment near the source of the problem. The decision process typically follows a series of yes/no questions that operators ask themselves while resolving the CA violation. Operators will refer to the guidelines in their operations manual to evaluate the severity of a particular violation and the timeline within which the contingency must be mitigated. Figure 4 depicts the typical process used by operators for resolving line overload violations. This process can be illustrated by tracing the decision tree for the example contingency violation just discussed. The operator will examine the system and identify that the source of the problem is a large west-to-east power flow that is forced through the 138 kV network after loss of the 345-138 kV transformer.
The power system operator will then follow the decision flowchart of Figure 4, sequentially asking themselves the following questions:
1) Q: Over 120% overload? A: Yes
2) Q: Tie-line between areas? A: No
3) Q: Multiple overloads on same path? A: Yes, overloads are part of a larger transfer path (shown in white arrows) with multiple load centers along path
4) Q: Generators near load center? A: Yes, units at GLEN-L and HINTON.
5) Q: Generators online? A: GLEN-L is online.
6) Q: Generator at MW limit? A: No. Request GLEN-L generator to ramp up output by 100 MW
7) Q: Still over 120% overload? A: No. CA violation has been resolved.
After completing this decision process, the operator will ensure that no new CA violations were created and then proceed to communicate, coordinate, and execute the mitigation actions selected.
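The question sequence traced above can be sketched as a small function. Only the path followed in the worked example comes from the text; the actions returned on the other branches, the loading value of 135%, the generator MW values, and the assumption that HINTON is offline are hypothetical placeholders, since Figure 4 is not reproduced here.

```python
def trace_line_overload(obs):
    """Walk the yes/no questions from the worked example.

    Returns (answers, action). Branches not traced in the text return
    placeholder actions and are hypothetical.
    """
    answers = []

    def ask(question, result):
        answers.append((question, "Yes" if result else "No"))
        return result

    if not ask("Over 120% overload?", obs["loading_pct"] > 120):
        return answers, "monitor"                       # hypothetical branch
    if ask("Tie-line between areas?", obs["is_tie_line"]):
        return answers, "coordinate with neighbor"      # hypothetical branch
    ask("Multiple overloads on same path?", obs["overloads_on_path"] > 1)
    units = obs["nearby_units"]
    if not ask("Generators near load center?", len(units) > 0):
        return answers, "consider other actions"        # hypothetical branch
    online = [u for u in units if u["online"]]
    if not ask("Generators online?", len(online) > 0):
        return answers, "consider other actions"        # hypothetical branch
    available = [u for u in online if u["mw"] < u["mw_max"]]
    if ask("Generator at MW limit?", len(available) == 0):
        return answers, "consider other actions"        # hypothetical branch
    return answers, "ramp up " + available[0]["name"]


example = {
    "loading_pct": 135,          # assumed value above 120%
    "is_tie_line": False,
    "overloads_on_path": 2,
    "nearby_units": [
        {"name": "GLEN-L", "online": True, "mw": 200.0, "mw_max": 400.0},
        {"name": "HINTON", "online": False, "mw": 0.0, "mw_max": 300.0},
    ],
}
answers, action = trace_line_overload(example)
print(action)
```

The final check in the text (whether the overload still exceeds 120%) requires rerunning contingency analysis after the action, so it is left outside this sketch.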

III. CONTINGENCY ANALYSIS METRICS
To assess the extent to which human operators trust and rely upon AI-based contingency analysis recommendations, it is necessary to derive a set of measures of effectiveness to determine how well participants mitigated simulated post-contingency violations. Likewise, measures of performance are also needed to assess the perceived impact of contingencies and cost associated with various control actions to bring the system to an acceptable state.

A. CONVENTIONAL PERFORMANCE INDEX CONTINGENCY RANKING
The method used by most system security analyses for identifying and ranking contingencies is based on the performance indices first introduced by [40]. The first metric is the flow performance index, $PI_{br,i}$, which is typically formulated in terms of $P_{l,i}$, the flow on line $l$ after outage of line $i$; $P_l^{max}$, the thermal rating of line $l$; and $n_{pif}$, an integer that sets the sensitivity of the performance index to ranking contingencies with single large violations higher than multiple small violations.
Likewise, a voltage performance index, $PI_{bus,i}$, is also defined to rank contingencies that result in overvoltage and undervoltage violations, where $V_j^{min}$ and $V_j^{max}$ are the minimum and maximum permissible voltages of bus $j$, $V_{j,i}$ is the voltage of bus $j$ after outage of line $i$, and $n_{piv}$ is another integer used to rank contingencies with fewer large violations higher than multiple small violations.
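As an illustration of how such indices behave, the sketch below uses forms that are common in the contingency-ranking literature: a sum of loading ratios raised to the power 2n for branches, and a sum of normalized deviations from the middle of the permitted voltage band raised to the power 2n for buses. These exact expressions are assumptions and may differ from the formulation in [40]; the function names and sample values are hypothetical.

```python
def pi_branch(flows, ratings, n_pif=1):
    """Assumed form: sum over lines of (|P_l,i| / P_l_max) ** (2 * n_pif)."""
    return sum((abs(p) / p_max) ** (2 * n_pif) for p, p_max in zip(flows, ratings))


def pi_bus(voltages, v_min, v_max, n_piv=1):
    """Assumed form: sum over buses of normalized deviation ** (2 * n_piv).

    The deviation is measured from the middle of [v_min, v_max] and scaled
    by half the band width, so a bus sitting on a limit contributes 1.
    """
    total = 0.0
    for v, lo, hi in zip(voltages, v_min, v_max):
        mid = 0.5 * (hi + lo)
        half_band = 0.5 * (hi - lo)
        total += ((v - mid) / half_band) ** (2 * n_piv)
    return total


# A single line at 180% versus 200% loading (illustrative values).
for n in (1, 2):
    print(n, pi_branch([180.0], [100.0], n), pi_branch([200.0], [100.0], n))
```

With larger exponents the gap between the 180% and 200% cases widens quickly, which is the ranking behavior the following paragraphs contrast with operator perception.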
This approach is quite effective for numerical optimization and power applications. However, it does not accurately represent the way human operators evaluate contingencies and make decisions about how urgently simulated violations must be mitigated. A more accurate model of human cognition of CA results considers that operators evaluate the impact of a contingency in a holistic manner. This holistic approach can be modeled using a weighted sum of various violation types with piecewise linear penalty coefficients. Piecewise linear penalty coefficients are chosen to reflect the procedure-based approach of operations and typical protective relaying schemes more accurately.
For line flow violations, most operators will typically delay mitigation for any simulated violations under the emergency rating of the line (typically 120% of the thermal rating) for up to four hours. Conversely, operators recognize that any lines with a violation greater than 150% of the normal thermal rating or 115% of the emergency rating will be tripped by protective relaying per the minimum trip time requirements of NERC Standard PRC-023-4 [41]. As a result, there is little difference in an operator's mind between a flow violation of 180% versus 200%, whereas performance index rankings will rank the 200% violation as much more severe. This effect is explained by Weber's Law: the larger the baseline value, the larger a difference must be before it is perceived [42].
Likewise, most undervoltage and overvoltage protection relays are set to trip at around 0.85 pu and 1.15 pu bus voltage. Any simulated post-contingency voltages less than 0.85 pu are generally interpreted as voltage collapse, and voltage values less than 0.90 pu may require the operator to issue an emergency load shed directive [43]. Whereas the performance index model ranks a voltage violation of 0.70 pu as much more severe than one of 0.80 pu, an operator is likely to respond in the same manner to each one.

B. NOVEL PENALTY FUNCTIONS FOR CONTINGENCY ANALYSIS VIOLATIONS
A new penalty function $F_s$ is formed as the weighted sum across all credible contingencies $i$ of the individual penalties associated with branch flow ($F_{br,i}$), bus voltage ($F_{bus,i}$), and IROL ($F_{IROL,i}$) violations, where $w_{br}$, $w_{bu}$, and $w_{IROL}$ are experimentally determined weights reflecting the perceived severity of branch flow, bus voltage, and IROL violations. Characteristic weighting values will be determined through a future assessment of perceived violation importance among a large group of operators. For this study, relative importance will not be considered, and all weighting factors will be set to unity. The penalty function for IROLs, $F_{IROL,i}$, is a simple binary value depending on whether an IROL violation is caused by contingency $i$.
The penalty function $F_{br,i}$ for power flow violations caused by a particular contingency $i$ is defined as the sum of the violations across all branches $l$, where $S_{l,i}$ is the apparent power flow on branch $l$ caused by contingency $i$ and $S_l^{max}$ is the apparent power flow rating of that branch, measured in MVA. The coefficient $k_{br}$ is a piecewise linear penalty function applied to the ratio of apparent power flow to the branch rating. As shown in Figure 5, the value of $k_{br}$ is zero for violations less than 120% of the thermal limit, increases linearly to a value of unity at 150% of the thermal limit, and remains at unity for all higher branch flow violations.
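The breakpoints described for the branch coefficient can be sketched as follows. The breakpoints (0 up to 120%, linear to 1 at 150%, flat above) come from the text; how the coefficient is combined with the loading ratio and the voltage-level weight inside the full penalty is not recoverable here, so the summation below (weight times coefficient per branch) is an assumption, as are the function names and sample flows.

```python
def k_br(loading):
    """Piecewise linear branch coefficient; loading = S_l,i / S_l_max."""
    if loading <= 1.20:
        return 0.0
    if loading >= 1.50:
        return 1.0
    return (loading - 1.20) / 0.30


def branch_penalty(flows_mva, ratings_mva, weights=None):
    """Assumed aggregation: sum over branches of weight * k_br(loading)."""
    if weights is None:
        weights = [1.0] * len(flows_mva)   # unity weights, as in this study
    return sum(
        w * k_br(abs(s) / s_max)
        for s, s_max, w in zip(flows_mva, ratings_mva, weights)
    )


# Loadings of 110%, 135%, 180%, and 200% on 100 MVA branches (illustrative).
penalty = branch_penalty([110.0, 135.0, 180.0, 200.0], [100.0] * 4)
print(penalty)
```

Under this sketch the 180% and 200% branches contribute the same amount, in line with the saturation described above.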
Similarly, the penalty function $F_{bus,i}$ for voltage violations is defined as the sum of the deviation of bus voltage across all buses $j$ caused by contingency $i$, where $V_{j,i}$ is the voltage at bus $j$ after occurrence of contingency $i$. The coefficient $k_{bus}$ is a piecewise linear penalty function shown in Figure 5 that is unity for bus voltages less than 0.80 pu and greater than 1.20 pu, zero for voltages between 0.95 and 1.10 pu, and changes linearly between these breakpoints.
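A matching sketch for the bus voltage coefficient is given below. The breakpoints (unity below 0.80 pu and above 1.20 pu, zero between 0.95 and 1.10 pu, linear in between) come from the text; the aggregation as a weighted sum of coefficients, the function names, and the sample voltages are assumptions.

```python
def k_bus(v_pu):
    """Piecewise linear bus voltage coefficient from the stated breakpoints."""
    if 0.95 <= v_pu <= 1.10:
        return 0.0
    if v_pu < 0.95:
        return min(1.0, (0.95 - v_pu) / 0.15)   # ramps to 1 at 0.80 pu
    return min(1.0, (v_pu - 1.10) / 0.10)       # ramps to 1 at 1.20 pu


def bus_penalty(voltages_pu, weights=None):
    """Assumed aggregation: sum over buses of weight * k_bus(voltage)."""
    if weights is None:
        weights = [1.0] * len(voltages_pu)      # unity weights, as in this study
    return sum(w * k_bus(v) for v, w in zip(voltages_pu, weights))


# 0.70 pu and 0.80 pu receive (almost exactly) the same coefficient.
print(k_bus(0.70), k_bus(0.80), k_bus(0.90), k_bus(1.00), k_bus(1.15))
```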
The term $w_{vr}$ in Equations (4) and (5) is an additional weighting factor reflecting the impact of equipment voltage level on the severity of the violation. Violations of 230 kV, 345 kV, and 500 kV equipment are much more likely to lead to cascading failures than violations of 115 kV and subtransmission equipment. This is reflected by multiplying each violation penalty by a weighting factor adapted from the BES cyber system aggregate weight values given in NERC Standard CIP-002-6 [44], as shown in Table 1. It should be noted that the new penalty functions reflect the objectives of power system operators better than the performance indices do. The objective of an operator is to achieve a system state that is n − 1 secure (i.e., no single contingency will result in any violations). Figure 6 compares the characteristics of the performance indices $PI_{br}$ and $PI_{bus}$ to those of penalty functions $F_{br}$ and $F_{bus}$. Unlike the performance indices, both the individual penalty functions and overall system penalty $F_s$ reach a value of zero when all voltages and flows are under their emergency limits.
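The overall system penalty described in this subsection (a weighted sum over contingencies of the branch, bus, and binary IROL penalties) can be sketched as below. The weights default to unity as stated for this study; the dictionary layout and the sample per-contingency values are hypothetical.

```python
def system_penalty(contingencies, w_br=1.0, w_bu=1.0, w_irol=1.0):
    """F_s = sum_i (w_br * F_br_i + w_bu * F_bus_i + w_IROL * F_IROL_i)."""
    total = 0.0
    for c in contingencies:
        f_irol = 1.0 if c["irol_violation"] else 0.0   # binary IROL penalty
        total += w_br * c["f_br"] + w_bu * c["f_bus"] + w_irol * f_irol
    return total


# Illustrative per-contingency penalties (not taken from the paper).
insecure_case = [
    {"f_br": 2.5, "f_bus": 0.0, "irol_violation": False},
    {"f_br": 0.0, "f_bus": 1.5, "irol_violation": True},
]
secure_case = [{"f_br": 0.0, "f_bus": 0.0, "irol_violation": False}]

print(system_penalty(insecure_case))   # 5.0
print(system_penalty(secure_case))     # 0.0
```

A total of zero corresponds to the operator's target of an n − 1 secure state, which gives the metric a natural stopping point that a performance index lacks.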

C. OPERATOR CONTROL ACTION METRICS
Operator actions to mitigate contingency violations are generally classified into categories of non-cost (switching), off-cost (redispatch), and load shedding actions. In addition to the financial cost of actions (such as starting high-cost peaking units), there is an additional cognitive reluctance by human operators to initiate certain actions. Most actions contain an inherent risk of worsening the system state (e.g., breaker failure during a switching action), can only be justified under extreme conditions (e.g., load shedding), or impact equipment life (e.g., frequent generation ramping and transformer tap changes).
A penalty function representing the negative impact of operator actions, $F_a$, is defined as the weighted sum of two penalty functions associated with continuous control and discrete equipment control, where $w_{cc}$ and $w_{dc}$ are weighting factors used to normalize the difference in units between the continuous control penalty $F_{cc}$, measured in MW, and the discrete control penalty $F_{dc}$, measured in integer counts.
The continuous control action penalty is defined as the weighted sum of the total MW setpoint change of continuous control points, where $P_{ls}$ is the MW of load shed at each substation and $P_{gr}$ is the MW setpoint change of generators redispatched. The coefficients $k_{ls}$ and $k_{gr}$ reflect the higher penalty cost of load shedding versus generation redispatch. For this study, the values were set at $k_{ls} = 100$ and $k_{gr} = 1$ to reflect the extreme reluctance of power system operators to shed load except under emergency conditions.
Meanwhile, the discrete control penalty is defined as the weighted sum of the total number of discrete control setpoints changed, where $N_{ss}$ is the number of shunt switching actions, $N_{bo}$ is the number of breaker open/close operations, $N_{tc}$ is the number of transformer tap steps moved, $N_{se}$ is the number of series element insertion actions, $N_{ps}$ is the number of phase shifter steps moved, and $N_{gs}$ is the number of generation setpoint changes. The leading $k$ coefficients of each term reflect the risk level and cost of each type of action. This formulation also enables integration with an asset management framework with consideration of the age and risk of failure of individual pieces of equipment.
If desired, the voltage rating weight $w_{vr}$ may also be included if the operational practices of a particular utility specify a preference for operating HV equipment before EHV equipment. However, all weighting coefficients will be set to unity for the purposes of this study.
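The control action penalties can be sketched as below. The values k_ls = 100 and k_gr = 1 and the unity weights come from the text; taking the absolute value of each redispatch change, the per-action coefficient names, and the sample actions are assumptions made for illustration.

```python
K_LS, K_GR = 100.0, 1.0   # load shed and redispatch coefficients from the study


def continuous_penalty(load_shed_mw, redispatch_mw):
    """F_cc: weighted total MW of load shed and generator setpoint change."""
    # Absolute values are an assumption so that ramp-up and ramp-down both count.
    return K_LS * sum(load_shed_mw) + K_GR * sum(abs(p) for p in redispatch_mw)


def discrete_penalty(action_counts, k=None):
    """F_dc: weighted count of discrete actions (N_ss, N_bo, N_tc, ...)."""
    k = k or {}   # per-action coefficients default to unity, as in this study
    return sum(k.get(name, 1.0) * n for name, n in action_counts.items())


def action_penalty(f_cc, f_dc, w_cc=1.0, w_dc=1.0):
    """F_a = w_cc * F_cc + w_dc * F_dc."""
    return w_cc * f_cc + w_dc * f_dc


# A 100 MW redispatch costs the same as shedding 1 MW of load.
redispatch_only = continuous_penalty([], [100.0])
shed_one_mw = continuous_penalty([1.0], [])
f_dc = discrete_penalty({"N_gs": 1, "N_ss": 0})
total = action_penalty(redispatch_only, f_dc)
print(redispatch_only, shed_one_mw, f_dc, total)
```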

IV. EXPERIMENT EVALUATION FRAMEWORK
A series of human factors experiments were conducted in the Pacific Northwest National Laboratory (PNNL) Electricity Infrastructure Operations Center (EIOC) with staff who previously worked in the operations engineer role at Peak RC. The experiment framework described below was developed to evaluate human-machine trust levels and the extent to which participants rely on the AI recommendations to develop mitigation steps. During these experiments, the objective was to resolve CA violations for the 118 Bus scenarios with and without the assistance of AI recommendations from the AI-based Contingency Action Tool (ACAT) developed by PNNL. During each experiment, the participant reported their cognitive workload (as measured using the NASA TLX scale [45]), trust levels (using the framework described in Section III), and the appropriateness of the actions taken. Biometric data, such as heart rate and heart rate variability, were also collected while the operator was resolving each contingency [12]. Quantitative performance in resolving contingencies with and without the availability of AI recommendations was scored using the system state and operator action penalty functions given in Equations (7) and (8).

A. AI-BASED CONTINGENCY ACTION TOOL
ACAT uses a big data approach to provide effective corrective action recommendations responding to RTCA violations using the combination of an artificial neural network (ANN) classification model and a semi-supervised action algorithm [36]. The tool uses a very large number of simulated base cases created by smart sampling techniques [46] for various load and generation profiles representative of a wide range of operating conditions. The base cases are loaded into the Massive Contingency Analysis tool [47], which identifies violations for all single n − 1 branch contingencies. The simulation and RTCA violation results are fed into the ANN classification/regression model, which identifies the severity of contingencies and the associated violations for the training data set, as shown in Figure 7. The ANN model is also used to evaluate the effectiveness of the corrective action recommendations, each representing a new system condition, which are iteratively generated by the semi-supervised action algorithm until a reasonable solution is identified. Implementation of the ACAT ANN using TensorFlow is depicted in Figure 8. It is a multilayer perceptron feed-forward network with all hidden units fully connected. Model building and training are done using the high-level Keras Application Programming Interface (API). In this study, a two-hidden-layer structure was chosen for the ANN model. Within the ANN structure, the rectified linear unit (ReLU) [48] is selected as the activation function to enable nonlinear solutions. Back-propagation is used for updating weights in the network. The chosen optimizer uses gradient descent to minimize the loss function, based on the adaptive moment estimation (Adam) algorithm [49], which uses running averages of the gradients and of their second moments.
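The optimizer update mentioned above can be illustrated with a minimal, framework-free sketch of the standard Adam rule applied to a one-dimensional quadratic. This is not the ACAT training code; the learning rate, step count, and test function are hypothetical choices, while the default decay rates follow the commonly published Adam defaults.

```python
import math


def adam_minimize(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Minimize a scalar function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Illustrative loss (x - 3)^2 with gradient 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

For a convex example like this one the iterate settles near the minimizer at 3; for a neural network loss the same rule only guarantees progress toward a stationary point.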
For real-time use, the current state of the system is loaded into the ANN model, which identifies whether any contingencies result in violations. The contingencies are ranked and passed to the semi-supervised corrective action algorithm to propose corrective actions based on the set of n − 1 secure base cases stored in the training data set. The types of corrective actions currently generated by the tool include generator redispatch, generator automatic voltage regulation (AVR) setpoint adjustment, and load shedding. Transformer tap changes and switching of shunt capacitors / reactors will be added in the future.
The semi-supervised algorithm uses an iterative process comparing generation setpoints and load values to known secure system states to create a generation redispatch schedule, voltage setpoint adjustments, or load shedding recommendation based on user inputs. The corrective action recommendations are validated by the ANN model, which determines the probability of new violations. Corrective actions are then created in three categories (aggressive, medium, and mild) based on the size of generator redispatch and/or setpoint adjustment. If no valid solution is found, then other control actions (including load shedding, transformer tap changes, and switched shunts) are considered until a solution is found. When the control actions pass ANN validation, they can be validated by a contingency analysis simulator and displayed to the operator as recommendations to make the system secure. The semi-supervised algorithm could be extended to mitigate multiple contingencies by comparing the post-contingency status with multiple n − 1 contingencies.
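One way to picture the comparison against known secure states is sketched below. It is not the ACAT algorithm: the Euclidean nearest-case search, the scaling fractions used for the aggressive, medium, and mild options, and all MW values are assumptions chosen only to illustrate the idea of moving generation setpoints toward a stored secure base case.

```python
import math


def nearest_secure_case(current_setpoints, secure_cases):
    """Pick the stored secure case closest to the current generation setpoints."""
    return min(secure_cases, key=lambda case: math.dist(current_setpoints, case))


def redispatch_options(current, target,
                       fractions=(("aggressive", 1.0), ("medium", 0.6), ("mild", 0.3))):
    """Scale the MW move toward the secure case into three option sizes."""
    return {
        name: [round(f * (t - c), 1) for c, t in zip(current, target)]
        for name, f in fractions
    }


current = [100.0, 200.0, 50.0]                           # MW per generator, illustrative
secure_cases = [[150.0, 150.0, 50.0], [300.0, 0.0, 50.0]]

target = nearest_secure_case(current, secure_cases)
options = redispatch_options(current, target)
print(target)
print(options)
```

In the workflow described above, each candidate would still need to pass ANN validation and a contingency analysis run before being shown to the operator.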
Performance of the ANN model was validated on the IEEE 118 Bus System and the Sustainable Data Evolution Technology (SDET) 563-bus system to verify the classification accuracy rate and the effectiveness of the semi-supervised action algorithm. A fully connected neural network with two hidden layers was chosen. The first hidden layer of the ANN model consisted of 500 hidden units for both systems, and the second layer consisted of 50 units for the 118-bus system and 25 units for the 563-bus system. Of the simulation data, 90% was used as the training set and 10% as the testing set.
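The layer structure described above (500 and 50 hidden units for the 118-bus system, ReLU activations) can be sketched as a forward pass in plain Python. This is not the TensorFlow/Keras implementation used for ACAT; the input size, the single sigmoid output, and the random weight initialization are assumptions for illustration.

```python
import math
import random

def relu(z):
    return z if z > 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """Fully connected layer: every unit sees every input."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)  # He-style scaling, an assumption
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_mlp(n_inputs, hidden=(500, 50), n_outputs=1, seed=0):
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden, n_outputs]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def forward(layers, x):
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, relu)   # hidden layers use ReLU
    weights, biases = layers[-1]
    return dense(x, weights, biases, sigmoid) # output read as a violation probability
```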
Table 2 shows the classification performance for the IEEE 118-bus system and the 563-bus system. Overall, the training and testing results are consistent. The ANN model has avoided over-training and achieved an overall accuracy higher than 97%. The accuracy of the model for detecting violations is 94% for the 118-bus system and 99.5% for the 563-bus system. Of the missed violation predictions, 99% are small violations close to the classification criterion of 100% overloading.
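The distinction between overall accuracy and the accuracy of detecting violations can be made concrete with a small sketch. The reading of "accuracy for detecting violations" as the fraction of true violations that are flagged is an assumption, and the counts in the usage note are invented for illustration; they are not values from Table 2.

```python
def classification_summary(tp, fn, fp, tn):
    """Summarize a binary violation classifier from confusion-matrix counts.

    tp: violations correctly flagged      fn: violations missed
    fp: secure cases flagged in error     tn: secure cases correctly passed
    """
    total = tp + fn + fp + tn
    return {
        "overall_accuracy": (tp + tn) / total,
        "violation_detection_rate": tp / (tp + fn),
    }
```

With invented counts of 47 detected and 3 missed violations among 200 cases (2 false alarms, 148 correct passes), the overall accuracy is 0.975 while the detection rate is 0.94, showing how the two figures can differ when violations are the minority class.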

B. OPERATIONAL SCENARIOS OF THE IEEE 118 BUS MODEL
Approximately 2000 simulation base case files were generated on the IEEE 118 Bus System to create the training data for the ACAT recommendation tool. These base cases form the set of known n − 1 secure system configurations that the ACAT algorithm uses to generate relevant recommendations to mitigate branch flow and bus voltage violations.
To complement this data set, 20 operational scenario base cases were developed on the IEEE 118 Bus System that place the network in various non-secure states where a single contingency may cause branch flow overloads, bus over-voltages, bus under-voltages, IROL violations, and/or voltage collapse. All the scenarios were developed on the base network topology on which ACAT was trained, with all branches and shunt elements in service. To minimize effects of situational awareness and operator cognition of the most severe contingency, the scenarios were created such that a single contingency was responsible for the majority of the CA violations present in the base case. Due to time constraints on the experiment process, the cases were divided into five training scenarios and eight experiment scenarios, which are summarized in Table 3. In each case, the operator was asked to solve the scenario either with or without the assistance of the ACAT recommendations. To separate out any training effects, the ACAT tool was introduced randomly during either the first or the second time the operator solved a particular contingency.
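The random introduction of ACAT on either the first or the second attempt can be sketched as a simple per-scenario shuffle. The function name, the condition labels, and the use of a seeded generator are assumptions; the source does not describe how the randomization was implemented.

```python
import random

def assign_acat_order(scenarios, seed=None):
    """Randomly decide, per scenario, whether ACAT is available on the
    first or the second attempt (hypothetical helper)."""
    rng = random.Random(seed)
    schedule = {}
    for name in scenarios:
        order = ["with_acat", "without_acat"]
        rng.shuffle(order)
        schedule[name] = tuple(order)
    return schedule
```

Each scenario still receives both conditions exactly once, so any difference between conditions is not confounded with a fixed presentation order.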
To increase the realism of the experiment, a set of modifications was made to the IEEE 118 Bus Model to divide the network into three balancing areas and to define various realistic operating limits, generator parameters, and recommended operating procedures for responding to a few types of emergency operating conditions. These enhancements are documented in [50], which provides a simplified operations manual for the IEEE 118 Bus Model for human-in-the-loop evaluations and demonstration of research tools. The manual defines three balancing areas named West Operating Area (WOA), North Operating Area (NOA), and South Operating Area (SOA). It also defines two IROLs for the model, provides a basic black-start procedure, and defines realistic operational limits, generator fuel types, and control resources for each balancing area.

C. SIMULATION ENVIRONMENT
PowerWorld Simulator is a commercially available software package used by utilities and researchers to perform common analysis and simulation tasks for high voltage power systems on a time frame ranging from several minutes to several days. It is able to import model files from a wide variety of other software packages and display them in an intuitive graphical user interface.
For the experiment, participants were given three PowerWorld displays, as shown in Figure 9, containing a tabular summary of equipment statuses, a one-line diagram with a semi-geographic view of the power grid, and a tabular summary of contingency analysis results. Equipment violations to be resolved by the participant were highlighted in the tabular display as well as with pie chart flow indicators on the one-line diagram. A custom logging script was written to record the actions taken by participants over the course of the experiment. The log file contained records of the piece of equipment operated by the participant (generator, line, shunt capacitor, etc.), the new setting, and the timestamp at which the participant updated the power flow solution and contingency analysis results.

D. ENERGY INFRASTRUCTURE OPERATIONS CENTER
The experiment was conducted in the EIOC West Control Room, shown in Figure 10. The EIOC provides a realistic replica of a utility control room for transmission and distribution system operations. The facility contains 16 operator consoles arranged at three control desks, connected to a dedicated network and server enclave.
The control room also features a 12 m × 3 m multiplexing video wall system with 40 individual monitors. The individual operator consoles and control desks can be configured to enable role-play between multiple participants serving as various NERC functional entities, including reliability coordinator, balancing authority, transmission operator, distribution operator, and generation facility operator. The realistic control room environment provides opportunities for conducting studies involving human factors, visual analytics, cognitive systems engineering, human-in-the-loop, human-machine interfaces, human-machine teaming, and evaluation of new tools and technologies relative to operator cognitive load.

E. COMPARISON OF AI RECOMMENDATIONS AND OPERATOR DECISIONS
There was one power system operations engineer per experiment, who served in multiple roles (i.e., as reliability coordinator, balancing authority, and transmission operator). The participant was responsible for identifying the contingency analysis violation, determining mitigating control actions, and then implementing those control actions on the real-time simulation of the power system. The study involved human participants and was reviewed and approved by the Institutional Review Board (IRB) at PNNL. The participants provided their written informed consent to participate in this study.
Two adjacent consoles were configured with two instances of the PowerWorld power system simulation software, as shown in Figures 9 and 10. The left console was set up as the simulated real-world Supervisory Control and Data Acquisition (SCADA) interface and one-line diagram to serve as a mock "SCADA-console" session. The right console was configured as a "study-mode" session, where the participant operator was provided with two monitors containing tabular displays and one-line diagrams from ACAT and PowerWorld. A table of generation shift factors (GSF) and a printed paper copy of the power system operations manual [50] were also available for the participant to consult at all times during the experiment.

V. EXPERIMENT RESULTS AND INTERPRETATION

A. CALCULATION OF SYSTEM AND ACTION PENALTIES
The system and action penalties presented in this section were computed after the experiment using the log files and paper switching orders with the operator's decisions. None of these metrics were available in real time or presented to the operator as part of the experiment. The metrics were also not provided to the ACAT machine learning algorithm and were not used as part of the training process. All of the results were calculated after the experiment to provide numerical summary metrics of the severity of each scenario and the effectiveness of actions taken by the operator, both with and without the assistance of ACAT recommendations.

B. INTERPRETATION OF SYSTEM PENALTY METRICS
The system penalty $F_S$ provides not only a summary metric of the proximity of the power system to a blackout, but also an indication of whether the operator successfully mitigated the contingency or introduced new violations associated with a different contingency. Table 4 lists the system penalty metric $F_S$ for the original base case and after resolution by the operator, both with and without the assistance of ACAT recommendations. Differences in the system penalty scores between the "study-mode" and "SCADA-console" results were likely due to the different workflow used by the operator in resolving violations.
In the "Study Mode" session, the mitigation process started with the implementation of the contingency on the power system, followed by resolution of each violation in the post-contingent network. In contrast, in the "SCADA Console" session, the operator was instructed to mitigate the pre-contingent state of the system. They accomplished this through a combination of written notes, comparison of generation setpoints in the simulator displays, and recollection from memory of actions taken in the "study-mode" session. This process occasionally led to different solutions being implemented between the two sessions, resulting in different system penalty scores.
The system penalty metric was generally lower for the condition with ACAT than without ACAT. These values indicate improved performance of the operator in resolving contingencies with the tool present, independent of the self-reported trust metrics discussed later in subsection E. It is probable that when recommendations were presented to the operator (regardless of whether they agreed with the AI-based solution), they were more deliberate in decision-making and spent more time and effort in considering the results. This is reflected in the increased workload measures discussed in subsection E.

C. INTERPRETATION OF ACTION PENALTY METRICS
The continuous and discrete action penalties $F_{CC}$ and $F_{DC}$ not only indicate the cost and risk of a set of control actions, but also provide a measure of the effectiveness of the operator in resolving the contingency. In other words, between two sets of control actions that both achieve a system penalty of $F_S = 0$, the one with lower $F_{CC}$ and $F_{DC}$ metrics indicates a more effective solution by the operator or AI algorithm.
Table 5 compares continuous and discrete action penalty values for trials where the operator solved the contingency with and without the assistance of ACAT. For visual ease, only the "study mode" console results are tabulated. Unlike the system penalty metrics, there is no significant correlation between action penalties across the with/without ACAT instances. Instead, the results appear to be dominated by the training effects discussed in the next subsection.
The third column of Table 5 lists the action penalty that would be produced if ACAT were given autonomous control of the power system network. These values are significantly higher than those of the operator solutions due to ACAT's frequent reliance on load shedding to resolve contingencies. As mentioned in Section III, power system operators typically view load shedding as a final measure that is only taken during real-time emergency conditions or when RTCA predicts an imminent threat of a regional blackout due to a particular contingency. For this reason, the weighting penalty $k_{ls}$ for load shedding is set much higher than that for generation redispatch. The high continuous action penalties for these recommendations (calculated after completion of the experiment) directly correspond to operator statements recorded in surveys that they found recommendations unsatisfactory due to frequent use of load shedding. It is important to note that ACAT was not trained to use the action penalties $F_{CC}$ or $F_{DC}$ and did not include them in any of its decision-making. All penalties presented in Tables 4 and 5 were calculated after completion of the experiment.
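The effect of a higher load shedding weight on the continuous action penalty can be illustrated with a short sketch. The weighted sum of absolute MW changes used here is an assumed simplification, since the exact definition from Section III is not reproduced in this section, and the weight values are hypothetical; only their ordering (load shedding weighted much higher than redispatch) comes from the text.

```python
# Hypothetical weights: only the ordering (load shedding >> redispatch) is from the text.
K_REDISPATCH = 1.0
K_LOAD_SHED = 10.0

def continuous_action_penalty(actions):
    """Assumed form of the continuous action penalty: a weighted sum of MW changes.

    actions: list of (kind, delta_mw) with kind in {"redispatch", "load_shed"}.
    """
    weights = {"redispatch": K_REDISPATCH, "load_shed": K_LOAD_SHED}
    return sum(weights[kind] * abs(delta_mw) for kind, delta_mw in actions)
```

Under these assumed weights, shifting 40 MW between two generators scores 80, whereas shedding 30 MW of load scores 300, even though the load shedding action moves fewer megawatts. This matches the qualitative pattern reported for the ACAT column of Table 5.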

D. LEARNING EFFECTS ON PERFORMANCE
Although no trends in operator performance were noticeable between one scenario and another, some pronounced learning effects were visible between repeated trials of the same base case scenario under different conditions. Within each experiment trial, the operator first developed a solution in the "study mode" session and then implemented that solution in the "SCADA console" session. The operator would solve the same contingency in both sessions with and without ACAT recommendations, with the trial order randomized across two days. The discrete and continuous action penalty metrics $F_{DC}$ and $F_{CC}$ associated with the different power simulator sessions of each trial are presented in Figures 11 and 12, sorted by the order in which each trial was conducted.
No statistically significant trends existed in either the completion times (measured as the time from when the operator was briefed on the contingency to when they verbally announced that the contingency had been resolved in the SCADA console) or the number of CA calls made (measured as the number of times the operator pressed the button to solve the power flow and contingency analysis) between the experiment conditions with and without recommendations from ACAT. This likely indicates that the operator continued to use the same mental workflow and steps (as discussed in Section II) to resolve any contingency and was largely influenced by previous trials. Decreases in total completion time can be seen in five of the seven experiment scenario groups, which likely indicates that learning effects improved the ability of the operator to respond to a particular contingency more quickly.
Furthermore, pronounced learning effects are visible within individual trials in the effectiveness of operator control actions to resolve a particular contingency. The first is that the discrete control action penalty $F_{DC}$ from the "SCADA console" session is always less than or equal to that of the "study mode" session for a single trial, as can be seen from Figure 11. Likewise, the values of $F_{DC}$ are typically lower for subsequent trials than for the first trial of the same scenario. The values of the discrete penalty metric indicate that the operator was able to resolve the same contingency with fewer control actions each time they were exposed to the scenario.
Additionally, the values of the continuous action penalty $F_{CC}$ demonstrate a similar trend, with those of the "SCADA console" session generally less than those of the "study mode" console. A weaker trend exists between subsequent trials of the same scenario. These decreasing values of $F_{CC}$ indicate that the operator was able to resolve the same contingency with progressively smaller changes to equipment setpoints as they became more familiar with the contingency.

E. INTERPRETATION OF HUMAN-MACHINE TRUST INDICES
Several constructs in the human-human and human-machine trust literature have been identified as important predictors of trust [51], [52]: reliability (whether the system performs consistently at various times), technical competence (whether the computer makes decisions as competently as a human expert), perceived understandability (whether the user understands the algorithms used), and faith (whether the user believes the computer despite other information). An additional component of personal attachment was introduced by [53] to create two measures of cognition-based trust and affect-based trust. Cognition-based trust relates to the user's intellectual perceptions of the system characteristics and is driven by user opinions of technical competence, reliability, and understandability. Affect-based trust relates to the user's emotional responses to the system and is affected by faith and personal attachment.
Each of the five constructs can be measured by user surveys created by [53], in which users rate their agreement on a Likert-like scale between strongly disagree and strongly agree. These questions were adapted for specific evaluation of operator perceptions of the reliability, understandability, competence, faith, and personal attachment to AI-based CA recommendations generated by ACAT (experiment group) versus their operating procedures manual and expertise (control group). For the experimental procedure, the questionnaires were administered after each simulation and CA violation resolution scenario.
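The grouping of the five constructs into cognition-based and affect-based trust can be sketched as below. Averaging the construct ratings is an assumption made for illustration, as is the 1 to 5 rating scale in the usage note; the source states only which constructs drive each measure, not how they are combined.

```python
from statistics import mean

# Grouping of constructs follows the text; the dictionary keys are hypothetical labels.
COGNITION = ("competence", "reliability", "understandability")
AFFECT = ("faith", "attachment")

def trust_indices(ratings):
    """Combine per-construct survey ratings into two summary indices (assumed: simple means)."""
    return {
        "cognition_based": mean(ratings[c] for c in COGNITION),
        "affect_based": mean(ratings[c] for c in AFFECT),
    }
```

For instance, ratings of 4, 5, and 3 for competence, reliability, and understandability and of 2 and 3 for faith and attachment give a cognition-based index of 4 and an affect-based index of 2.5.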
For the condition without ACAT, the operator was asked to rate their trust in the written operator procedures. For most of the scenarios, the condensed operations manual [50] only included general system information. For these experiment scenarios, the operator gave neutral feedback regarding all cognition-based and affect-based trust indices, as illustrated in Figure 13. However, for two scenarios, CA Solution Failure and WOA IROL 2, the operations manual provided detailed procedures for resolving both the SORENS-TWINBROOK and PHILO-SPORN IROLs. For both these scenarios, the operator gave much higher ratings for the technical competence and understandability components of trust. This may be attributable to the fact that the specific procedures in the operations manual were written by a human subject matter expert and were presented in a format that was familiar to the operator.
Meanwhile, the collected survey results indicated consistently equal or lower trust in ACAT across all constructs throughout the experiment. The lower ratings are attributable to a combination of interpretation of results, alignment with common operating practices, and typical trends in emotional trust associated with tool adoption. The first relates to the usability and interpretability of the results presented in the ACAT visual display, which were identified in the qualitative feedback section of the survey. Notably, the ACAT display identified contingencies and action recommendations using Siemens PSS/E bus numbers, while the operator was familiar with the system one-line diagram using bus names. The process of converting bus numbers to bus names introduced a significant amount of additional time into the CA workflow, which the operator found unacceptable for real-time decision making in a control room environment. As a result, the operator indicated higher temporal demand across many of the experiment trials when the ACAT tool was available, as shown in Figure 14.
The second impact on perceived trust in ACAT was the difference between the types of actions recommended and the typical set of non-cost actions (e.g., switching) and off-cost actions (e.g., generation redispatch) that an operator would take in resolving scenarios with low to moderate severity. In multiple cases, ACAT recommended load shedding to resolve simple contingencies. Although the recommended solution was mathematically valid, nearly all human operators consider load shedding a last-resort step to be used only after all other alternatives have been exhausted. As a result, the tool received lower technical competence and reliability scores.
Finally, the lower initial emotional trust ratings during the first experiment scenario (EASTLI-MUSKNG) are consistent with broader empirical evidence in the literature, such as [54], which indicates that faith and attachment in a particular tool typically start at a low level and grow with time. In the qualitative feedback sections of the experiment survey, the operator stated that they did not have sufficient time during the training sessions to gain familiarity with ACAT and understand the tool's reliability under various conditions. Without this comfort, operators will tend to avoid using new tools regardless of the quality of the recommendations provided.
The understandability construct of trust showed the greatest variability across trials. A possible explanation is that the phrasing of different questions within the survey could be interpreted as referring either to understanding of the internal algorithm or to understanding of how the recommendations provide value to the decision-making task.

VI. TRANSIENT STABILITY ANALYSIS METRICS
An additional application of the penalty functions introduced in Section III is providing a metric for the severity of transient stability events. The system bus penalty and system branch penalty indices can be computed in the time domain using simulation results or synchrophasor data to provide a summary metric of the proximity of the system to blackout during a transient event.

A. IEEE 118 BUS ELECTROMECHANICAL STABILITY MODEL
One of the RTCA scenarios examined during the experiment, CA Solution Failure, involved angle instability identified as part of the PHILO-SPORN IROL defined in the IEEE 118 Bus Operations Manual [50]. In the base case, the SOA is importing over 500 MW across the tie lines between the NOA and SOA balancing areas. Loss of the 345 kV line from MUSKNG-1 to SPORN-1 (with both units at SPORN-2 and CABINC-1 offline) results in angle instability across the remaining 138 kV lines, as shown in Figure 15.
Due to this power system stability issue, the PowerWorld CA solver could not converge and did not provide any of the numerical values needed to calculate the penalty metrics. As a result, it was necessary to derive the branch and bus violations from an electromechanical stability simulation of the contingency.
In the IEEE 118 Bus Operations Manual [50], fuel types, prime movers, min/max MW limits, and min/max MVAr limits were defined for all generators in the WOA, NOA, and SOA balancing areas. These definitions were then used to create a dynamic stability model of the IEEE 118 Bus System.
The IEEE 118 Bus Model is a widely used synthetic power system that was derived from a section of the transmission grid operated by American Electric Power (AEP) in 1962. To keep consistency with the original unit definitions, all generator machines were modeled using the GENROU model commonly used in the US Eastern Interconnection [55] with the default parameters created by PowerWorld. All generator exciters were modeled with the ST2C static winding excitation model defined in IEEE Standard 421.5-2016 [56]. Generator prime mover models were selected based on the unit type defined in the IEEE 118 Bus Operations Manual:
• Hydro units were defined using the HYGOVD model, which is commonly used in the Eastern Interconnection
• Gas turbine units were defined using the GASTD model with default parameters
• Combined cycle units were defined using the CIGRE UCCPSS model with default parameters
• Coal units were modeled with the IEEE TGOV5 model with default parameters
Stability studies were run with two different contingency definitions. In the first, the contingency was defined as a line open event (no fault) after 1 s of simulation time. In the second, the contingency was defined as a three-phase balanced fault that is cleared by opening breakers at both ends of the line with a fault clearing time of 0.1 s. The transient stability studies were run for 30 s with a simulation time step of 0.25 cycles (approximately 4.17 ms).
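The simulation time step can be checked with a short calculation. The sketch assumes a 60 Hz nominal frequency, which is the standard value for the US Eastern Interconnection but is not stated explicitly in the text; the function name is hypothetical.

```python
def stability_time_grid(cycles_per_step=0.25, duration_s=30.0, frequency_hz=60.0):
    """Convert a time step given in cycles to milliseconds and count the steps.

    frequency_hz = 60.0 is an assumption (US nominal frequency).
    """
    step_ms = cycles_per_step / frequency_hz * 1000.0
    n_steps = round(duration_s * frequency_hz / cycles_per_step)
    return step_ms, n_steps
```

At 60 Hz, a quarter-cycle step is about 4.17 ms, so a 30 s study contains 7200 time steps at which the time-domain penalties can be evaluated.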
109702 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

B. TRANSIENT STABILITY SIMULATION RESULTS
In both cases, the power system was able to recover from the disturbance with a maximum rotor angle difference of 98° across the NOA-SOA interface 0.658 s after the initial disturbance. The rotor angle swings remain just within the stability margin where no units lose synchronism and the long 138 kV tie-lines do not trip based on distance relaying. Plots of summary metrics, the generator rotor angle, and generator power for units in WOA (green), NOA (orange), and SOA (blue) are depicted in Figures 16 through 18. It should be noted that if the SOA net load is increased by 10% from the experiment base case, the system separates across the 138 kV interfaces due to distance relays being tripped by generator pole slipping. In that study, the entire South area is blacked out, and the system penalty $F_S$ is roughly an order of magnitude larger than for the experiment base case.

C. INTERPRETATION OF TIME-DOMAIN SYSTEM PENALTY
The penalty function definitions presented in Section III can be reformulated in the time domain and calculated as functions of time using the transient stability results. Thus, the branch flow violation penalty function $F_{br}$ first presented in Equation (4) can be redefined as a function of time, $F_{br}(t)$, where $t$ is the time in seconds or milliseconds and $S_l(t)$ is the apparent power flow in each branch $l$ at time $t$. The penalty is written for only a single contingency since it is not meaningful to sum penalties for different contingencies occurring across different time domains. Plots of the individual branch flow violations across the WOA-SOA and NOA-SOA tie-lines are presented in Figure 19 along with the total branch flow violation penalty over a thirty-second simulation duration. Likewise, the bus voltage violation penalty $F_{bus}$ first presented in Equation (5) is redefined as a function of time, $F_{bus}(t)$, where $V_j(t)$ is the voltage of bus $j$ at time $t$. Again, the penalty function is defined for only a single contingency. Plots of selected bus voltages from buses in each balancing area are depicted in Figure 20 along with the total transient bus voltage violation penalty. The individual branch flow and bus voltage violation penalties can be summed to yield the transient system penalty $F_{S,i}(t)$ for contingency $i$, which is plotted in Figure 21. If it is desired to calculate the total system penalty across all credible contingencies, it is necessary to select either the maximum violation value from each contingency or the value after transient oscillations have stabilized. The first value provides a summary metric of the total system proximity to blackout from the most severe set of violations during the transient event. The second provides a more conservative metric of the total violations after the event has occurred. Both values were presented earlier in Table 3.
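The time-domain evaluation and the two summary values can be sketched as follows. This is an illustration under stated assumptions rather than the formulation of Equations (4) and (5): the breakpoints of the piecewise-linear coefficient, the averaging window used for the settled value, and the function names are hypothetical, and branch loading is expressed as a percentage of the branch limit.

```python
def k_br(loading_pct):
    """Hypothetical piecewise-linear branch penalty coefficient (breakpoints are assumptions)."""
    if loading_pct <= 100.0:
        return 0.0                                  # no violation, no penalty
    if loading_pct <= 120.0:
        return (loading_pct - 100.0) / 20.0         # ramps from 0 to 1
    return 1.0 + (loading_pct - 120.0) / 10.0       # steeper slope for severe overloads

def branch_penalty_series(loadings_by_branch):
    """F_br(t): sum of per-branch penalties at each simulated time step."""
    n_steps = len(next(iter(loadings_by_branch.values())))
    return [sum(k_br(series[t]) for series in loadings_by_branch.values())
            for t in range(n_steps)]

def peak_and_settled(series, settle_samples=3):
    """Return the maximum penalty and the mean over the final samples of the run."""
    peak = max(series)
    settled = sum(series[-settle_samples:]) / settle_samples
    return peak, settled
```

For a single contingency, the peak value captures the worst instant of the swing, while the settled value reflects the violations that persist once oscillations have damped out. Either value could then be carried forward per contingency when a system-wide total is needed.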

VII. DISCUSSION
One of the key discoveries in the tool evaluation process was that improving the transparency and understandability of ML algorithms is not sufficient to improve trust with expert power system operators. It was observed that the types of recommendations need to align with typical operational procedures and decision-making processes used by operators.
In multiple instances, the AI-based tool was able to resolve contingencies successfully through load-shedding, but the human operator found such recommendations untrustworthy when they were able to resolve the same contingency with only line switching and generation redispatch actions.
To this end, the penalty functions introduced in this paper were designed to mimic the criteria used by human operators to evaluate the severity of various contingency violations as well as the perceived cost and risk associated with various types of mitigating control actions. It is anticipated that these penalty functions can (among other outlined purposes) be adapted to serve as potential objective functions informed by power system operator preferences. A follow-up study of human-machine trust in ACAT may provide additional value after re-training the ML algorithms with less reliance on load shedding. Inclusion of the system penalty and continuous action penalty metrics in the training data may substantially improve the performance of the tool and the perceived trustworthiness of the recommendations provided.
The system and action penalty metrics introduced in this paper can be used in conjunction with traditional metrics of human-machine trust and cognitive workload to provide insights into workflows and decision-making processes that would not be possible using either type of metric alone. An example of the correlations observed during the evaluation experiment is that the availability of tool-generated recommendations led to slower but more effective mitigation actions, even when the operator did not agree with the recommended actions. It is possible that this is due to changes in user behavior, where the operator was more deliberate and more scrutinizing in selecting control actions when working with the assistant tool.
The introduction of the new penalty metrics also enabled a detailed examination of the impact of learning effects on operator performance. The observed trends align well with the industry-accepted practice of simulation-based training for common severe contingencies and scenarios. Repeated exposure to a particular event consistently decreased both action penalty metrics, indicating that the operator was able to respond with fewer and smaller control actions, likely because they became more proficient with that contingency. However, no learning effects were apparent when moving from one scenario to another. Practice with one scenario did not seem to make a significant difference in operator proficiency with handling a different one. This may suggest that the current industry practice of training on the same set of emergency events each year will have limited effectiveness in preparing operators for new disturbances on a decarbonized grid. This is an open research question with limited to no existing work in the literature. Likewise, the ability of the penalty metrics introduced in this work to measure the technical competence and proficiency of both human operators and AI/ML algorithms presents an area requiring further investigation.

VIII. CONCLUSION
This paper introduced a novel set of performance metrics for quantifying the performance of human operators and machine learning algorithms in the resolution of contingency analysis violations. The system state penalty metric provides an accurate measure of the total number of violations and the severity of each violation after occurrence of a contingency in a manner that reflects how human operators perceive violations. The system violations penalty metric was formulated as a weighted sum of individual, novel piecewise penalty functions for branch power flow, bus voltage, and IROL violations. The control actions penalty metric provides a holistic measure of the cost and risk of control actions taken by human operators to mitigate possible violations identified by CA on a preventative basis. The total action penalty was defined as the sum of a continuous control metric (for generation redispatch and load shedding) and a discrete control metric (covering actions such as tap changes and breaker actions).
These metrics were presented in the context of a reproducible framework for measuring human-machine trust for AI-based recommender tools developed to assist power system operators with control room tasks. The framework was used to evaluate an early technology readiness level recommender tool and measure the trust and workload of former Peak RC operations staff when resolving contingencies on the IEEE 118 Bus System with and without assistance from the recommender tool. Use of the new penalty metrics in other projects could enable 1) more accurate evaluation of new grid operations tools and 2) development of new operations tools that might be perceived as more trustworthy by power system operators due to mimicking of human cognition of power system behavior.
Future work will focus on extension of the penalty metrics into other power systems domains to provide broader insight into creating a set of generic system health metrics and control action penalty metrics informed by operator cognition. The penalty metrics will be extended for unbalanced distribution networks, microgrids, and advanced distribution management system applications operating in centralized, hierarchical, and distributed control architectures. These metrics will serve as the basis for measuring impacts on system performance when multiple (possibly conflicting) power applications are granted autonomous control of power systems equipment.

FIGURE 1 .
FIGURE 1.Control room workflow for identifying, resolving, and implementing control actions to mitigate real-time contingency analysis violations.

FIGURE 2 .
FIGURE 2. Perception, comprehension, and projection phases of situational awareness first outlined by[39] in power system operations.

FIGURE 4. Typical operator cognitive workflow for resolving an undervoltage violation identified by contingency analysis.

FIGURE 5. Piecewise linear penalty coefficients $k_{br}$ and $k_{bus}$, which are used to model operator perception of the severity of branch flow and bus voltage violations.

FIGURE 6. Comparison of traditional performance indices and novel penalty functions $F_{br}$ and $F_{bus}$, which better represent operator cognition of the severity of branch flow and bus voltage violations.

FIGURE 7. Structure of the AI-based CA recommendation tool used for human-machine trust experiments with operators and operations engineers.

FIGURE 8. Implementation of the ACAT ANN model.

FIGURE 9. Console layout and displays for the "Study Mode" and "SCADA Console" parallel sessions used during the EIOC experiment.

FIGURE 10. Console layout and simulation environment in the PNNL Energy Infrastructure Operations Center.

FIGURE 11. Comparison of the number of calls made to the CA solver and discrete action penalty score. The "SCADA Console" penalty is consistently lower than the "Study Mode" penalty within a single trial and generally decreases across repeated trials of the same CA scenario.

FIGURE 12. Comparison of total time to complete each trial and continuous action penalty score. The "SCADA Console" penalty is lower than the "Study Mode" penalty except for a single trial. Weaker trends exist across repeated trials of the same CA scenario.

FIGURE 13. a) Operator trust levels in the operating procedures provided when resolving contingencies without assistance from ACAT. Cognition-based metrics are shown in solid lines while affect-based metrics are shown in dashed lines. b) The relative importance of each trust metric to the operator. c) Self-reported workload while resolving contingencies without assistance or recommendation from ACAT.

FIGURE 14. a) Operator trust levels in the operating procedures provided when resolving contingencies with assistance from ACAT. Cognition-based metrics are shown in solid lines while affect-based metrics are shown in dashed lines. b) The relative importance of each trust metric to the operator. c) Self-reported workload while resolving contingencies with assistance and action recommendations from ACAT.

FIGURE 15. Contour plot of generator rotor angle across the WOA-SOA and NOA-SOA tie lines two seconds after loss of the MUSKNG-SPORN 345 kV tie line. Locations of exemplar units used for subsequent transient stability plots are called out.

FIGURE 16. Plots of summary indices for WOA (green), NOA (orange), and SOA (blue), including frequency, total generation, and total load.

FIGURE 17. Generator rotor angle for generators in each area, with a maximum difference of 98° between MUSKNG-1 (NOA) and GLEN-L (SOA).

FIGURE 18. Generator electrical power (solid) and mechanical power (dashed) for generators in each area.

FIGURE 19. Branch flow violations across the WOA-SOA (green) and NOA-SOA (orange) tie lines, which are used to calculate the transient branch flow penalty $F_{br}^{t}$. Acceptable values of the branch flow, corresponding to $k_{br} = 0$, are shown in the band with green background shading.

FIGURE 20. Bus voltage violations at sample buses in WOA (green), NOA (orange), and SOA (blue), which are used to calculate the transient bus voltage penalty $F_{bus}^{t}$. Acceptable values of the bus voltage, corresponding to penalty multiplier $k_{bus} = 0$, are shown in the band with green background shading.

FIGURE 21. Combined transient bus penalty $F_{S}^{t}$.

TABLE 2. ACAT accuracy for training and testing data.

TABLE 3. Summaries of IEEE 118 bus scenarios.


TABLE 4. Summaries of system penalty metrics with and without ACAT.

TABLE 5. Summaries of action penalty metrics with and without ACAT.