The Challenge Posed by Space Weather to High‐Voltage Electricity Flows: Evidence From Ontario, Canada, and New York State, USA

In this paper, we present evidence that geomagnetic activity has the potential to disrupt the electricity flows between Ontario, Canada, and New York State, USA. This conclusion is based on a modeling framework that makes use of weather data, electricity load data, measures of transmission “network effects,” proxies for geomagnetically induced currents and the expected level of power grid conditions. The model is estimated using hourly data throughout 1 May 2002 through 31 October 2003. The model is evaluated using out‐of‐sample hourly data over the period of 1 November to 9 December 2003. The out‐of‐sample predictions are more accurate when the forecasting equation reflects the estimated contribution of geomagnetic activity. The structural modeling results for the sample period indicate that the peak predicted effect of geomagnetic activity on the electricity flow was about 1,604 MWh in absolute value when the geomagnetically induced current proxy achieved a value of 363.2 nT/min. The analysis also indicates that the level of the geomagnetically induced electricity flow is highly dependent on ambient temperature and expected system conditions at the time of the geomagnetic storm. Moreover, geomagnetic activity is an important driver of the volatility in the electricity flows. The overall findings indicate that the scope of the challenge posed by space weather to the operations of the electric power system is likely understated.


Introduction
Despite society's well recognized vulnerability to space weather, Eastwood et al. (2017), have pointed out that there is significant research gap in terms of quantifying the technical impacts of space weather even though the economic stakes are in the billions of Euros (Eastwood et al., 2018). In this paper, we attempt to help reduce this gap. The focus of the paper is on quantifying the influence of geomagnetically induced currents (GICs) on high-voltage electricity flows.
We model the hourly high-voltage electricity flows between two large electrical production and use regions: Ontario and New York. Since these regions are not electrically "isolated," our model ingests information from and about surrounding electric power entities. We consider scheduled electricity flows and the actual flows. Knowing that the actual electricity flow data reveal a high percentage of unscheduled (loop) flows, we search for their likely source. We find that including data on terrestrial weather conditions, primarily temperature, and space weather activity, in the form of rapid variations in the geomagnetic field, improves model performance in out-of-sample test runs.
We present evidence that geomagnetic activity has the potential to disrupt the electricity flows between Ontario and New York, that is, that geomagnetic activity has potential to induce large loop flows and that this behavior is exacerbated by meteorological conditions. This conclusion is based on a modeling framework that makes use of weather data, electricity load data, measures of transmission "network effects," proxies for GICs and the expected level of power grid conditions. The model is constructed using hourly data over the period of 1 May 2002 through 31 October 2003. The modeling results are evaluated using hourly data over the period of 1 November 2003 through 9 December 2003.
The modeling framework presented in this paper is very similar to the time series analysis presented by Forbes and St. Cyr (2017), and thus, readers desiring further understanding of the methodology may wish to consult that publication. This paper is organized as follows: Section 2 provides some background information on loop flows, electricity markets, and power grid operations. It is noted that system operators view loop flows as harmful and difficult to predict. Readers with a focused interest in geomagnetic activity may wish to only skim this section. Section 3 discusses why basic physics suggests that geomagnetic activity may have implications for electricity flows. Section 4 discusses the data. Section 5 presents an overview of the transmission challenge in terms of Ontario and New York. Section 6 presents the econometric model. Section 7 presents the estimation results. Section 8 presents the out-of-sample evaluation. Section 9 reports an estimate of the GIC effects for the sample period. Section 10 summarizes the findings.

Background: Loop Flows
We begin by noting that loop electricity flows, which are also known as unscheduled flows and/or inadvertent flows, represent the difference between scheduled (ex ante) desired flows and actual flows of electricity between two electricity control areas. While some researchers distinguish between loop versus transit flows (Skånlund et al., 2013, p. 8), the overall unscheduled flows are ubiquitous and frequently large in magnitude because electricity flows on alternating current transmission systems, the dominant form of transmission (United States Department of Energy, 2002), follow the laws of physics as opposed to the laws of supply and demand. System operators such as the Independent Electricity System Operator (IESO) in Ontario Canada are candid in admitting their inability to model high-voltage electricity flows accurately. The flows impair the efficiency of the markets used to coordinate generation and transmission. Moreover, they represent a significant operational challenge for electric power system operators, as unanticipated increases or decreases in generation may be required to ensure the reliability of the electric power system. Some readers may have the impression that challenges to the reliability of the electric power system are extremely rare. Our view is shaped by the relatively high incidence of real-time electricity price spikes in New York City, NYISO's (New York Independent System Operator) largest zone in terms of electricity consumption ( Figure 1). As the figure attests, there are market periods in which the real-time price of electricity is more than five times the day-ahead price for the same hour. These price spikes represent instances in which the reliability of the electric power system was significantly challenged given that it would not be prudent for a system operator to pay substantially more than the day-ahead price for electricity unless reliability was under challenge. The price spikes occur because system operators are unable to make accurate predictions of future operating conditions which we believe is a key element in maintaining reliability. We suspect that the specific culprits of the price spikes are load forecasting errors, the failure/inability of generators to adhere to their generation schedules, and transmission issues including unscheduled flows. A detailed attribution of the causes of these price spikes among these factors is beyond the scope of this paper. However, a key measure of the failure/inability of suppliers to adhere to their schedules known as the electricity imbalance may provide some insights. We do not have this measure for the United States and Canada but do have this data for England and Wales for the same time period. Inspection of this data in Figure 2 indicates that the electricity imbalances, that is, the differences between the scheduled and actual activity levels, are higher for transmission (right two columns) as compared to generation (left three columns). This suggests that electricity transmission issues are an important determinant of the balancing actions that drive the real-time price spikes.
In this paper, we begin by noting that loop flows are undesirable because they reduce the efficiency of electricity markets and also pose a challenge to the reliability of the electric power system. Efficiency is  impaired because: (1) suppliers of electricity to the transmission system are paid-in-full even if the electricity never reaches its intended destination; (2) the locational day-ahead prices, a key driver of which generating stations are dispatched, do not reflect subsequent actual conditions; and (3) the loop flows give rise to balancing costs that are paid by all market participants. Operators of electric power systems, sometimes known as independent system operators, view these flows as a significant operational challenge, given that unanticipated increases or decreases in generation within an electricity control area may be required to ensure the reliability of the electric power system. A PJM/Midcontinent Independent System Operator (MISO) initiative to investigate loop flows provides support to these views (PJM is the regional transmission organization directly south of New York State; MISO (Midwest Independent System Operator now known as the Midcontinent Independent System Operator) is an independent system operator located in the midcontinent of the United States and Canada that was formed in 2005). A graphic from a PJM/MISO loop flow report ( Figure 3) provides insights into the operating challenges posed by these flows (PJM/ MISO, 2007). At some interfaces, scheduled flows exceed actual flows while the reverse is true at others. These differences create multiple supply and demand electricity imbalances within the control area that need to be resolved by the system operator even if the net inadvertent flow is zero.
Consistent with this view, a statement by Ontario's IESO, makes it very clear that it believes that loop flows are both harmful and difficult to predict. In its words, "… loop flow has posed challenges to system operations since the early days of interconnected system operations. This problem has only grown with the exponential growth in transaction volumes as a result of the introduction of electricity markets. Uncontrolled loop flows are difficult to predictthey arise through the dynamic interplay of load, generation dispatch, interchange between jurisdictions, and the topology of the involved transmission systems. They pose an inherent risk in that they make it difficult for system operators to make accurate predictions of future operating conditionsa key component in managing reliability." (Independent Electricity System Operator (IESO) of Ontario, 2011)

Space Weather
PJM responds to loop flows by redispatching generation as well as by implementing transmission load relief (TLR) procedures (one example of a TLR procedure is the curtailment of nonfirm transmission services). In PJM's words, "Loop flows exist because electricity flows on the path of least resistance regardless of the path specified by contractual agreement or regulatory prescription. PJM manages loop flow using a combination of redispatch and TLR procedures " (PJM Interconnection, 2007, p. 191).
The PJM 2005 State of the Market Report made nine recommendations to improve the functioning of PJM's markets. One of these called for improved analysis of loop flows. In its words, PJM's Market Monitoring Unit called for "Improvement in the analysis of the underlying sources of loop flows in order to enhance the efficient use of PJM market resources" (PJM Interconnection, 2006, p. 24 Interconnection, 2007, p. 199;PJM Interconnection, 2008, p. 4). It also indicated that it did not have the data to improve matters. In its words,  Nor is the relevance of loop flows limited to North America. Based on data reported by the European Network of Transmission System Operators for Electricity (https://www.entsoe.eu), there were a large differences between scheduled and actual cross-border electricity flows throughout Europe over the period of 1 January 2017 through 31 December 2017. The ENTSOE data indicate that France was a scheduled net importer from Germany in 2017 but was a net exporter to Germany in terms of actual electricity flows. Russell and Schlandt (2015) have indicated that Hungary, Poland, the Czech Republic, and Slovakia have complained about loop flows from the German grid. It is believed that the flows have increased in recent years because the transmission system in Germany has not kept up with the expansion of renewable energy. Consistent with this view, the EU regulatory authority ACER has concluded that about 60% of the scheduled electricity flow between Germany and Austria is misdirected (Hodgson, 2017). It is believed that this occurs because the electricity prices in Germany stimulate scheduled exports to Austria even though the transmission system cannot accommodate the scheduled flows.

Background: Geomagnetic Activity and Loop Electricity Flows
According to the North American Electric Reliability Corporation (North American Electric Reliability Corporation, 2012, p iii), GICs pose two risks to the electric power system: 1. Overheating damage to system assets such as transformers, and 2. voltage instability and power system collapse because of GIC-induced consumption of reactive power. Girgis and Vedante (2012, p. 4) consider the possible effects of a 35°C GIC-induced temperature increase within a transformer. They conclude that the effects are likely to be benign in terms of physical damage. However, the findings of the relationship between GICs and transformer losses observed by Tay and Swift (1985) suggest a third GIC-temperature related concern. Specifically, Figure 12 of Tay and Swift depicts an approximately exponential relationship between GIC levels and the associated losses in the transformer windings.
The evidence of GICs in transmission lines (Boteler et al., 1991;Lundby et al., 1985;Pirjola & Lehtinen, 1985) may also be of relevance given that the materials that comprise the power grid have a reported temperature sensitivity of 0.4% per degree Celsius (National Academies of Sciences, Engineering, and Medicine, 2016, p. 46). Consistent with the view that the operational effects of temperature in the transmission lines are too important to be ignored, some researchers have sought to replace the electric power's industry's reliance on static transmission line ratings with ratings that dynamically change in response to changes in temperature (Cheung & Wu, 2014). One prominent adopter of this approach is Elia, the transmission system operator in Belgium (https://www.elia.be/en/about-elia). Under one of the approaches that it has implemented, the hourly ampacity forecasts, that is, forecasts of transmission capacity, are based on weather forecasts and the historical relationship between meteorological conditions (e.g., temperature) and measured ampacity (Elia, 2017, p. 5).
We are unaware of any study that has rigorously modeled the GIC/temperature/operational effects linkages. However, a recent study by Forbes and St. Cyr (2019) may be instructive. In that study, out-of-sample econometric evidence on the operational consequences of a geomagnetic storm were presented. Specifically, the evidence indicated that GICs contributed to a real-time system-wide price spike in NYISO of about U.S. $800 per MWh during the January 2005 geomagnetic storm (a period outside the sample period of this study). During the course of this storm, real-time transmission congestion costs in NYISO's NYC zone spiked to about U. S $580 per MWh (Figure 4). Real-time transmission losses spiked to about U.S. $68 per MWh, which was about 6 times the day-ahead value for the same hour. Congestion costs and transmission losses in other zones and interfaces also spiked. For example, congestion costs at NYISO's interface with Ontario spiked to about U.S. $771 per MWh. We are unable to reconcile these outcomes with the GIC/reactive power paradigm. Forbes and St. Cyr (2008) presented evidence that the unscheduled electricity flows between New York's electricity control area and PJM's control area are statistically related with a GIC proxy. Forbes and St. 10.1029/2019SW002231

Space Weather
Cyr (2010) presented multivariate econometric evidence that there is a relationship between GICs and system operator imposed transformer constraints in PJM's 500 kilovolt (kV) transmission system. More recently, Forbes and St. Cyr (2017) presented evidence that geomagnetic activity has had implications for the overall electricity imbalance in England and Wales. One of the more interesting findings of that research was that the level of scheduled interchange with other electricity control areas has had implications for the electricity imbalance in England and Wales. Moreover, within-sample evidence was presented that the scheduled flows/electricity imbalance relationship has been affected by both ambient temperature and the levels of geomagnetic activity in England, Scotland, and France. This evidence was corroborated by out-of-sample predictions that were more accurate when the contributions of the geomagnetic activity were recognized. Consistent with the goal of model adequacy, the model had a predicted R-squared equivalent of approximately 0.85 over the out-of-sample evaluation period.

Data
The overall period of analysis for this study is 1 May 2002  There is no evidence that space weather contributed to the 14 August 2003 blackout. Instead, the blackout has been attributed to terrestrial factors including inadequate system understanding by system operators, inadequate situational awareness, inadequate tree trimming in the transmission rights-of-way, and inadequate diagnostic support on the part of the reliability coordinators (U.S.-Canada Power System Outage Task Force, 2004, p. 19). The role of tropospheric weather was acknowledged by the U.S.-Canada Power System Outage Task Force but was considered minor relative to the above-mentioned issues.
Over the sample period, the system operator in Ontario scheduled hourly transmission flows with electricity market participants in five other jurisdictions: Quebec, New York, Minnesota, Michigan, and Manitoba. In terms of total scheduled flows, the trade with Michigan and New York are dominant ( Figure 5). The study focuses on Ontario's trade with New York but makes use of hourly data representing Ontario's scheduled hourly transmission with market participants in all five jurisdictions. The scheduled flow data includes any transactions scheduled during the hour in question as well as any quantities scheduled prior to real time. The archived data were provided to us directly by the system operator. Except for Ontario's scheduled and actual flows with Quebec, the scheduled and actual flow data are reported at the province or state level. The raw data for Ontario's scheduled and actual flows with Quebec are reported at the interface level, that is, a geographic node at the boundaries of the various electricity control areas. In the interest of parsimony, the data for Quebec were aggregated to the province level. Individuals interested in inspecting the current postings are invited to click on the following URL (http://reports.ieso.ca/public/IntertieScheduleFlow/).
One of our working hypotheses is that the effects of tropospheric weather on electricity flows can largely be modeled by using weather data from within Ontario. This hypothesis is based on our preliminary findings

10.1029/2019SW002231
Space Weather that temperature is a driver of the power plant level shortfall between scheduled generation and quantity of energy actually uplifted onto the transmission system.
For this reason, the model relies extensively on data from Ottawa's MacDonald-Cartier International Airport in Ontario. The model also applies hourly weather data from six additional locations ( Table 2). The weather stations were selected based on their proximity to large load centers (e.g., Detroit) and/or large electric power generating stations. For example, the location of the Niagara Falls weather station in western New York is about 8 km from the Robert Moses Niagara Hydroelectric Power Station, one of the largest generating stations in New York State. It is also a short distance from Buffalo, New York, one of the largest load centers in western New York.
The study makes use of hourly data downloaded from the web site of the NYISO (www.nyiso.com). The NYISO data include data elements not reported by Ontario or its other trading partners over the sample period. For example, New York had a day-ahead market for electricity over this period, and thus, we make use of day-ahead hourly price data from NYISO's archive. Specifically, we use the day-ahead price data for New York's interface with Ontario to construct proxies of expected operating conditions. In addition, we use the day-ahead locational price data for other locations in New York to construct proxies of expected transmission conditions. We also make use of NYISO's historical archive of day-ahead hourly forecasted load, a variable of relevance to this study.
To provide perspective on the errors in the electricity flows that are discussed in the next section, we note that NYISO's day-ahead hourly load forecast over the sample period has a RMSE that is equal to about 608 MWh. This was calculated as follows: where T equals 12,950, the total number of hours in the sample, AL t is the actual electricity load in hour t and FL t is the forecasted load in hour t.
To be able to compare the errors in the load forecasts with the errors in transmission, it is useful to weight the RMSE of the load forecast by the mean activity level which is load in this case.
The mean hourly load in NYISO over the sample period was about 18,333 MWh, and thus, the energyweighted RMSE works out to about 3.3% of NYISO's average hourly load (0.033~608/18,333). To put this value in perspective, we have calculated the RMSE in the day-ahead load forecasts in France to be about 4.45% of mean half-hourly loads over the same period. For the California Independent System Operator, the RMSE was about 2.96% of the mean hourly load.

Space Weather
These load forecasting error metrics serve as useful benchmarks by which to assess the performance of the transmission system. There are some distinctions in the metrics. The RMSEs for transmission make use of data representing actual and scheduled electricity flows while the RMSEs for load are based on actual and forecasted electricity loads. The RMSEs will be weighted by the mean absolute value of the electricity flow. In our data, imports are recognized as negative values, while exports have positive values. The mean net flow equals mean imports plus mean exports. The mean net flow underrepresents the total level of activity because of the canceling out of the positive and negative values. For example, if actual exports equal 1,000 MWh and actual imports equal −1,000 MWh, the mean net flow would equal 0 while a more meaningful measure of the activity level would be 2,000 MWh.
Some of the price ratios employed in this study use fuel pricing data. The data representing the spot price of coal were obtained from Platts (http:// www.platts.com). The data representing the spot price of natural gas were obtained from NGI Intelligence Press http://www.naturalgasintel.com).
With respect to geomagnetic activity, previous literature has indicated that GIC levels in power grids are closely related to dH/dt, the time derivative of the horizontal component of the geomagnetic field (Bolduc et al., 1998;Coles et al., 1992;Mäkinen, 1993;Viljanen, 1998;Viljanen et al., 2001). Accordingly, GICs in this study are proxied by dH/dt, the rate of change in the horizontal component of the geomagnetic field. This rate of change is taken to be the maximum change in the 1-min values of the horizontal component of the geomagnetic field during a 1-hr interval. The geomagnetic data were obtained from Intermagnet, the global network of geomagnetic observatories that monitor the Earth's magnetic field (http://www.intermagnet.org). The study specifically makes use of the 1-min geomagnetic data from the Ottawa geomagnetic observatory. This observatory does not directly report the horizontal component of the geomagnetic field, but this can be easily calculated for each minute by noting that H 2 = X 2 + Y 2 (Backus et al., 2005, pp. 2-3) There were two major geomagnetic disturbances over the sample period ( Figure 6). The first event was in late May 2003, when dH/dt achieved a peak value of 363.2 nT/min during the hour commencing at 22:00 Universal Time (UT) on 29 May 2003. The second event was in late October 2003, when dH/dt achieved a peak value of 758 nT/min during the hour commencing at 8:00 UT on 29 October 2003. During this latter event, PJM invoked conservative operations in response to solar disturbances (Forbes & St. Cyr, 2010, p. 15). The Swedish high-voltage power transmission system also experienced operational challenges during this event (Pulkkinen et al., 2005; National Oceanic and Atmospheric Administration (NOAA), 2004). Note that the out-of-sample interval, which we discuss later, also has a large dH/dt value.

The Electricity Flows Between Ontario and New York
The hourly electricity flows from Ontario to New York were highly volatile over the sample period ( Figure 7). The summary statistics support this view. The mean hourly flow is a mere −31 MWh (on average the actual flows are from New York to Ontario), while the range is over 3,600 MWh. Consistent with this evidence of variability, an augmented Dickey-Fuller test indicates that the null hypothesis of a unit root can be rejected based on a P value that is less than 0.0001 (an example of an unit root is when the variable y in time period t equals its value in time period t-1 plus a random error term). The absence of an unit root is an essential perequisite for the modeling approach we employ given that statistical analysis of variables with a unit root can give rise to spurious results (Kennedy, 2008, p. 301).

Space Weather
It is also worth noting that the level of scheduled flow was a poor indicator of the actual flow, because while Ontario was a net exporter to New York in term of the scheduled flows, it was in reality a net importer from New York in terms of the actual electricity flows. To understand this point, we note that the average hourly scheduled exports from Ontario and New York over the sample period was 444 MWh, while average scheduled imports from New York equaled 171 MWh. Thus, one might expect that average hourly net exports from Ontario to New York would equal approximately 273 MWh (273 = 444-171). However, in terms of actual flows, Ontario on average was a net importer from New York. Its net imports from New York averaged about 31 MWh, which is about 304 MWh different from the commercially scheduled outcome. Over all hours, this shortfall works out to about 3,947 GWh. In short, the market for transmission services on average failed to deliver the electricity to the intended location that market participants in New York paid for and instead exported electricity from New York to Ontario.
Consistent with this grim assessment, the actual electricity flow between Ontario and New York was in the opposite direction from what had been scheduled in about 25% of the hours. This is visually apparent if one inspects Quadrants II and IV in Figure 8. Specifically, in Quadrant IV, Ontario is a scheduled exporter but is actually an importer. Over the hours corresponding to Quadrants II and IV, the correlation between scheduled and actual flow is a dismal 0.13, with the median difference between scheduled and actual flows being a nontrivial 480 MWh. Moreover, in 154 of the hours corresponding to these "wrong-way" flow cases, the difference between scheduled and actual flow was more than 1,000 MWh. Over all intervals, 250 hr, which is about 2% of the total hours in the sample, had a difference between scheduled and actual flow of more than 1,000 MWh. Taking the difference between the scheduled and actual flows as the error in the flows, the RMSE in the flows is approximately 446 MWh. This is equivalent to about 73% of the RMSE in NYISO's day-ahead load forecast even though the mean level of electricity load in New York over the sample period was 18,333 MWh. It is also very large relative to the mean electricity flow between Ontario and New York when measured in terms of its absolute value. As discussed above, we weight the RMSE by the mean in absolute value of the flow because it represents the mean of the total quantity of electricity that actually flows over the transmission lines. In this case, the absolute value of the actual flows is about 502 MWh, and thus, the RMSE in the flows is about 88.9% of the mean absolute flow (0.889~446/502). An alternative approach is to weight the RMSE by the means of the absolute values of scheduled flows. In this case, the mean level of scheduled exports was about 444 MWh, while the mean level of scheduled imports in absolute value was 171 MWh. Thus, the mean level of schedule flow was about 615 MWh (615 = 444 + 171).
In short, regardless of specific weights employed, the energy-weighted RMSEs are very large. We thus conclude that the laws of physics led to a suboptimal market outcome over the sample period. Consistent with our conclusion in terms of the nature of the problem, PJM's market monitoring unit has recommended that PJM "… explore an interchange optimization solution with its neighboring balancing authorities that would remove the need for market participants to schedule physical transactions across seams." (PJM Interconnection, 2018, p. 54) We suspect that this extreme recommendation to eliminate transmission between PJM and its trading partners, which we do not endorse (we believe that lower cost solutions are possible), emanates from a frustration with the lack of response to its many calls for improved analysis of loop flows.
The reports we have cited indicate that the system operators are unable to accurately forecast the flows but are silent on the magnitude of the forecast errors. To address this issue, we have obtained day-ahead hourly data from NYISO (http://mis.nyiso.com/public/P-30list.htm) that represents NYSIO's desired net interchange (DNI) from Ontario over the sample period. A scatter plot of New York's DNI from Ontario with the actual net exports to New York from Ontario indicates that the desired level of flows is highly

10.1029/2019SW002231
Space Weather inaccurate measure of the actual flow ( Figure 9). Specifically, the RMSE in the flow is about 193% of the mean absolute flow when New York's DNI from Ontario is the reference level. This does not address the issue of the accuracy of the hour-ahead projection. That issue is considered near the end of this section.
The large differences between scheduled and actual flows between Ontario and New York are part of what is known as the "Lake Erie loop flows." Loop flows around Lake Erie are predominantly in a counterclockwise direction. This leads to large differences between the scheduled versus actual electricity flows for the electricity control areas in the surrounding region ( Figure 10). For example, in Figure 10, the actual flow between IESO and NYISO is in opposite direction from the scheduled flow. The errors in the flows for some of the other interchanges are also quite large (e.g., IESO and MISO as well as PJM and NYISO). NYISO, municipal utilities in the several states, and the American Public Power Association complained to the Federal Energy Regulatory Commission (FERC) in 2008 about the Lake Erie loop flow problem (Clamp, 2010). The municipal utilities noted that the loop flows were significantly increasing the delivered cost of electricity. As reported by Reuters (2008), it was alleged that some of the Lake Erie loop flows could be attributed to power marketers taking advantage of the different methods used to price electric power in the four markets surrounding the lake (New York, Ontario, Midwest ISO, and PJM). Acting on the allegation, the FERC accepted interim transmission tariff revisions by NYISO that were designed to stem these flows (New York Independent System Operator, 2008). Nevertheless, a subsequent investigation by the FERC Office of Enforcement concluded there was no evidence of market manipulation (U. S. Federal Energy Regulatory Commission (FERC), 2009). While this finding indicates that the Lake Erie loop flows simply reflect the law of physics, it leaves the specific causes unaddressed. For example, there is no widely accepted explanation why the flows are predominantly in a counterclockwise direction but are occasionally in the opposite direction.
Before proceeding, it should be noted that phase angle regulators (PARs) were installed subsequent to our sample period with the intent of mitigating the loop flow. This technology modifies the flows by changing the paths of least resistance. Over the period of 1 August 2012 through 1 August 2013, the PARs were able to control the flow between Michigan and Ontario between 96% and 98% of the time (Midcontinent Independent System Operator (MISO), 2014). According to the MISO, the PARs have reduced the incidence of periods with loop flow greater than 200 MW from 56% to 16% of the time (MISO, 2014). However, as recently as 2016, NYISO indicated that the challenge posed by the Lake Erie loop flows had not been resolved. In its words: "The impact of Lake Erie Loop Flow on grid reliability and real-time market resource scheduling and pricing has been exacerbated in recent months." (New York Independent System Operator, 2016, p. 2).
Consistent with NYISO's assessment, Potomac Economics, the market monitor of NYISO, has called for the development of a model that could accurately forecast the electricity flows at the Ontario/New York interchange. In its words,   This statement suggests that the industry is currently unable to model the flows accurately. Unfortunately for NYISO, the report by Potomac Economics does not recommend an approach to make such forecasts.

Modeling Electricity Flows
Our modeling approach accepts the late Professor George Box's well-known aphorism that "All models are wrong; some models are useful" (Box et al., 2005, p. 440). They are all "wrong" in the sense that all are simplifications of a complex reality but can be useful if they capture salient or major aspects of that reality.
Metrics of a model's usefulness. In our view, there are two important metrics of a time series model's usefulness. The first is the finding of model adequacy measured by whether the residuals have the property of white noise. This view of model adequacy is endorsed by Becketti (2013, p. 256) who notes, "… the measure of a well-specified and accurately fitted time-series model is evidence that the residuals … are white noise." This standard of model adequacy is consistent with Kennedy (2008, p. 315) and Granger and Newbold (1974, p. 119). A second metric of a time series model's usefulness is its out-of-sample predictive accuracy. We note that a "useless" model will, almost by definition, be incapable of making accurate out-of-sample predictions. Accordingly, the accuracy of the out-of-sample predictions is an important metric of a model's usefulness. One of the key premises of our modeling approach is that the effects of both temperature and GICs are highly dependent on the level of scheduled transmission. Thus, we make extensive use of variables that reflect the interaction between temperature, GICs, and various measures of the scheduled flows. Another important consideration is that the flows between Ontario and New York cannot be fully understood in isolation from other portions of the transmission network. Moreover, while linear relationships are appealing, there is no reason to believe that linearity is a useful assumption when modeling a complex system.
Because of these considerations, the model is very far from being parsimonious. Some may suspect that the model is "overfitted" as a result. If true, this would be serious concern given that overfitted models are plagued by an inadequate ability to generate accurate out-of-sample predictions. In this case, the suspicion of overfitting is not supported by the "rule of thumb" advanced by Trout (2006) who suggests that a least squares regression model with k explanatory variables should have at least 10*k observations. In the least squares version of the model presented here, there are over 70 observations per estimated parameter. We further note that parsimony, while intuitively appealing and highly justified in many circumstances, can give rise to biased estimates when a relevant explanatory variable is excluded from the analysis (Kennedy, 2008, p. 92). We also note that the omission of interaction effects among explanatory variables is believed to be a common mistake in econometrics (Kennedy, 2008, p. 372). We recognize that a lack of parsimony can make the results challenging to interpret. We also acknowledge that a lack of parsimony, especially when the number of degrees of freedom is small, could give rise to unit root issues, an absence of white noise in the residual error terms, and poor forecasting results when the model is applied to out-of-sample data.
Despite its acknowledged lack of parsimony, the overall modeling approach in this study is relatively straightforward, as explained below.
Network effects. In the presence of network effects, the actual electricity flows between the hypothetical locations ABC and XYZ are influenced by the scheduled flows between seemingly irrelevant locations. This can occur if the scheduled flows between either ABC or XYZ and the hypothetical location DEF affect the paths of least resistance between ABC and XYZ. These possible effects are incorporated in the model by including the scheduled imports and exports between Ontario (IESO) and each of its trading partners as explanatory variables. Accordingly, the model includes variables representing Ontario's scheduled imports and exports with New York (NY), Michigan (MI), Manitoba (MB), Minnesota (MN), and Hydro Quebec (HQ).
Forecasted and actual electricity load. The study also uses the hourly actual load data for both NYISO and Ontario. While archived forecasted load data for Ontario were unavailable, the model does use archived forecasted load data for NYISO.

Space Weather
Weather. The study uses temperature, air pressure, and humidity data from Ottawa, Canada. It is hypothesized that the effect of temperature may be contingent on the scheduled flows, and thus the temperature variable will be interacted with the scheduled flow variables. The analysis also makes selective use of temperature data from New York City, Niagara Falls International Airport in New York, Michigan, Minnesota, Manitoba, and Quebec.
Proxy for Geomagnetic activity. The GIC proxy is entered directly into the econometric model as an explanatory variable. Given our hypothesis that the GIC effects may be highly contingent on both temperature and the scheduled flows, the GIC proxy is also interacted with Ontario's scheduled flows, as well as the variable representing the interaction between temperature and Ontario's scheduled flows. This is done by creating variables equal to the mathematical product of various power grid variables, temperature, and the GIC proxy. The explanatory variables in our model also include the interaction between the GIC proxy and the hourly average of the horizontal component of the geomagnetic field. This is done by creating an explanatory variable that equals the mathematical product of the two variables.
Proxies for expected generating conditions. Ideally, the model would make use of scheduled generation data by fuel. This would allow us to control for possible differences in the operating performance of the various generation technologies. Unfortunately, we do not have access to the required data. We concede that this represents a modeling challenge. Our approach to ameliorate this shortcoming is the use of day-ahead price ratios that may serve as useful proxies of expected operating conditions. Specifically, the model makes use of the hourly day-ahead price variable for NYISO's interchange with Ontario weighted by the spot price of natural gas. This follows from Forbes and Zampelli (2014), who have reported evidence that the day-ahead market for electricity in California is informationally efficient in the sense that the day-ahead price relative to the MWh equivalent cost of natural gas is a leading indicator of the operational outcome. To control for possible differences in operating performance between coal and gas-fired generating stations, the MWh equivalent price of natural gas relative to the MWh equivalent price of coal is also employed as an explanatory variable. This approach has been employed by Forbes and St. Cyr (2012). In this paper, we take this approach one step further by including in our list of explanatory variables the ratio of the day-ahead electricity price for each hour of every day relative to the minimum hourly day-ahead price for the day in question. The purpose of this variable is to proxy the operational challenges of producing electricity from sources other than coal or natural gas. This may be a useful explanatory variable to include in our model given that Ontario's electricity supply over this period was largely accounted for by nuclear and hydroelectric generating stations (Ontario Ministry of Energy, 2017).
Proxies for expected transmission conditions. The day-ahead losses and congestion costs for the following zones in the New York control area are represented in the model: West, New York City, and Long Island. West is a zone in New York's control area that is adjacent to Ontario, while the New York City (NYC) and the Long Island (LI) zones are New York's largest in terms of electricity demand. The day-ahead losses and congestion costs for New York's interchanges with the following balancing areas are also represented in the model: Ontario Hydro (OH), Hydro Quebec (HQ), New England (NE), and PJM. The variables representing day-ahead hourly transmission congestion and losses are weighted by the day-ahead hourly price of electricity at New York's reference point. The reference price in NYISO is determined at the Marcy 345-kV substation in Marcy, New York, which is about 300 km from New York City. This location is the reference against which energy losses and congestion costs in NYISO are calculated. The locational marginal price (LMP) for each location has three components: an energy component, a line loss component, and a congestion cost component. Specifically, for location j, Reported losses and congestion costs for location j can be positive, 0, or even negative. The day-ahead measures of the losses and congestion costs provide information about expected operating conditions on the transmission system, and thus, the econometric model will employ measures of these costs as explanatory variables in terms of their absolute values. The model also includes binary variables to represent the hour of the day and the day of the week.

Model.
The linear version of the model can be represented as follows: Flow is the actual hourly electricity flow in MW between Ontario and New York. Positive values represent exports to New York while negative values represent imports from New York T ONT is the hourly temperature in Ottawa, Ontario, measured in kelvins T ONT,k is the hourly temperature in Ottawa, Ontario, measured in kelvins, averaged with the temperature in state/province k, also measured in kelvins, k = NY, MI, MN, HQ, and MB T NYC is the hourly temperature at LaGuardia Airport in New York City measured in kelvins P ONT is the air pressure in Ottawa, Ontario HUMID ONT is the hourly relative humidity in Ottawa, Ontario Pratio is the hourly day-ahead price at the New York/Ontario interface relative to the minimum hourly day-ahead price at that same interface during the same day in question SR is the ratio of the hourly day-ahead price at the New York/Ontario interface relative to the fuel cost of producing 1 MWh of electricity using natural gas FPR is the fuel cost of producing 1 MWh of electricity using natural gas relative to the fuel cost of producing one MWh of electricity using coal OntarioAL is the actual hourly load for Ontario NYFL is the day-ahead hourly forecasted load for the New York electricity control area 10.1029/2019SW002231

Space Weather
NYAL is the actual hourly load for the New York electricity control area PNX k is a binary variable that is equal to one if scheduled exports from Ontario to k exceed scheduled imports from k, k = NY, MI, MN, HQ, and MB SI k represents Ontario's scheduled hourly imports from k, k = NY, MI, MN, HQ, and MB SX k represents Ontario's scheduled hourly exports to k, k = NY, MI, MN, HQ, and MB PNX k is a binary variable that is equal to one if scheduled exports from Ontario to k exceed scheduled imports from k, k = NY, MI, MN, HQ, and MB

Space Weather
POSX k is the difference between scheduled exports from Ontario to k and scheduled imports from k when scheduled exports from Ontario to k exceed scheduled imports from k. The variable has a value of zero when Ontario's scheduled exports to k are less than Ontario's scheduled imports from k PMCL j is a binary variable that is equal to one if day-ahead losses for zone/interface j are positive, j = West, NYC, LI, OH, HQ, NE, PJM.  Table 3. Readers interested in the summary statistics for a variable not listed can easily access the data at the following location: https://osf. io/jm47e/ website. As observed earlier, the electricity flows are highly volatile. Specifically, the absolute value of the coefficient of variation for the actual flows from Ontario to New York equals 19.86 (19.86 612.93/30.86). In contrast, the coefficient of variation in net scheduled flows from Ontario to New York equals 2.24. While the volatility in the actual flows ensures that the dependent variable is stationary, that is, does not exhibit a unit root, the disparity in the volatility between actual and scheduled flows highlights the challenge of using traditional statistical methods to explain the variation in the electricity flows. Further evidence of this point is provided by Figure 11, which reports the pairwise correlations between the Ontario/NY electricity flows and a subset of the variables described in Table 3. Observe that while the reported correlations for SX NY and SI NY exceed 0.65 in absolute value, the correlations with SX MI and SI MI are lower than the corresponding correlations with SX HQ and SI HQ . Also note that the correlation with T ONT is quite modest while the correlation with Pratio is nontrivial.
The next section of the paper discusses the methods used to estimate the model. Readers who are uninterested in these details may wish to proceed to section 8 of the paper.

Estimation and Results
The estimation process. The study used a two-step estimation approach. In the first step, the functional form of the structural equation was identified. In this equation, the electricity flow is modeled as a function of temperature, scheduled transmission, forecasted load, the GIC proxy, etc. The purpose of the second step in the estimation is to obtain parameter estimates that give rise to a white noise error structure that also has the property of asymptotic normality. Specifically, the second step recognizes that the actual electricity flow in hour t is highly correlated with the levels in previous hours and that a failure to recognize this reality would taint the results. The autocorrelations in the flows do not monotonically decline over time but instead have a significant diurnal pattern over the 24-hr market periods for each day (Figure 12). For example, the flows in market period t are highly correlated with that in market period t-24 (24 hr previously), relative to the flows in other market periods from the day before. Additionally, the flows in hour t are highly correlated with those in periods t-48, t-72, t-96, t-120, t-144, t-168, t-192, etc. There is also the possible presence of autoregressive conditional heteroscedasticity to be considered. In our opinion, the estimation method needs to control for these time-series characteristics of the electricity flow. Our caution on this point is consistent with Granger and Newbold (1974, p. 117) who state the following: "In our opinion the econometrician can no longer ignore the time series properties of the variables with which he is concernedexcept at his peril." Step two of the estimation is accomplished by first making use of an autoregressive conditional heteroskedasticity (ARCH) model. This modeling approach splits the residual error term into a stochastic element and a time-dependent standard deviation. This is a useful method in modeling times series data that exhibit time-varying volatility, that is, periods of turbulence followed at some point by periods of relative calm. The second step in the modeling also makes use of an autoregressive-moving-average with exogenous inputs model specification (ARMAX) with the transformed explanatory variables from the first step (e.g., scheduled exports to New York) being included as the exogenous inputs and where the disturbance terms are presumed to follow an autoregressive-moving-average (ARMA) specification.
Our estimation approach acknowledges that the Box-Jenkins philosophy of being parsimonious in the application of ARCH/ARMA terms may conflict with the goal of within-sample model adequacy, that is, white noise in the residuals, when modeling hourly electricity flow data; the goal of model adequacy should be the higher priority. Thus, while researchers who analyze daily, monthly, or quarterly data may make use of ARMA(1,1), ARMA(2,2), or ARCH(1) specifications, our approach will go substantially beyond this, given the autocorrelation evidenced in Figure 12 and the results of the Engel LM test. Specifically, we will estimate an ARMA(21,62) model with an ARCH(2) process for the conditional variance.
We begin the first step of the estimation by testing whether it is appropriate to transform the dependent variable. Specifically, we follow Box & Cox, 1964, p. 214) and transform the dependent variable in equation (3) as follows: where TFlow is the transformed electricity flow level; λ 1 is a parameter estimate obtained from the Box-Cox procedure; λ 2 is a value that ensures that the left-hand side of equation (3) is positive. In this case, λ 1 was taken to be 2,500 MWh. Inspection of (3) and (4) reveals that the Box-Cox transformation preserves the direction of the relationships that the transformed dependent variable has with the explanatory variables. Note that the assumed value of λ 2 does affect the reported value of the estimated constant term. The estimated value of λ 1 under the assumption of linearity in terms of the explanatory variables is 1.140589. Based on a P value that is less than 0.001, the analysis does not support a linear specification for the dependent variable. Given this finding, the issue of linearity in terms of the explanatory variables was then considered by applying the multivariable fractional polynomial (MFP) methodology. This a useful technique when one suspects that some or all of the relationships between the dependent variable and the explanatory variables are nonlinear (Royston & Sauerbrei, 2008).
The MFP approach begins by estimating a model that is strictly linear in the explanatory variables. Subsequent estimations then cycle through a battery of nonlinear transformations of the explanatory variables (e.g., cube roots, square roots, and squares) until the MFP model that best predicts the dependent variable is found. In our case, the analysis provided  support for representing a number of the explanatory variables with powers other than unity. This finding is conditional on the estimated value of λ 1 . This dependence could give rise to an inconsistency between the Box-Cox parameter estimate and the MFP recommended exponents. Fortunately, this issue is easily addressed by iterating between the two estimation techniques until the Box-Cox transformation of the dependent variable is consistent with the MFP transformations of the independent variables. The revised value of λ 1 is 1.149376. With respect to the MFP transformations of the independent variables, 32 of the explanatory variables have exponents other than one. For example, the MFP recommended exponent for the variable NYFL is 2.5.
Least squares estimation of the transformed equation yields an estimated equation with a R 2 of about 0.936. Unfortunately, the R 2 and the parameter estimates are open to question, that is, the results may be spurious, given that the Portmanteau (Q) test, first proposed by Box and Pierce (1970) and refined by Ljung and Box (1978), indicates that the residual error terms do not have the property of "white noise." For example, for lags one through 100 the P values are less than 0.0001 indicating that the null hypothesis of no autocorrelation in the residuals is rejected. Moreover, the residual error terms have somewhat more kurtosis, that is, a higher incidence of outcomes in the tails of the distribution, than the Gaussian distribution. Consistent with this finding, the null hypothesis of normality is rejected by the Shapiro-Wilk Normality test (Shapiro & Wilk, 1965). The null hypothesis of no ARCH was rejected with a P value less than 0.0001 using Engle's Lagrange multiplier test (Engle, 1982).
The time series issue in the least squares residuals is addressed in step two of the estimation procedure, that is, by the ARCH/ARMA methods. The modeled lag lengths in the ARCH process are Lags 1 and 2. In addition, the conditional variance is modeled as a function of binary variables for the hour of the day, the day of the week, and the following variables: where TSX is Ontario's total scheduled exports; TSI is Ontario's total scheduled imports.
Portmanteau (Q) tests were conducted for the hourly Lags 1 through 100, 120, 144, 168, 192, 200, 400, 300, 400, 500, 600, 700, and 800. For all these 112 lags, the P values are greater than.05. For example, the P value at lag 1 is 0.3808, thereby failing to reject the null hypothesis of a white noise error structure for lag one. The P value at lag 2 is 0.5608, thereby failing to reject the null hypothesis of a white noise error structure for lags one and two. At hourly Lag 24, the P value is 0.7409, thereby failing to reject the null hypothesis of white noise error structure for Lags 1 through 24. For the hourly Lag 800, the P value is 1.0000. Accordingly, the null hypothesis of a white noise error structure is not rejected. Consistent with this conclusion, the vast proportion of the autocorrelations in the standardized residuals are within the confidence bands corresponding to the null hypothesis of white noise (Figure 13). Based on these findings, we conclude that the model is useful even though the model, like all models, is "wrong." We estimated the transformed equation under the assumption that the residual errors terms correspond to the Student's t distribution. This distribution allows for more kurtosis ("heavy tailedness") than the Gaussian   Space Weather distribution. Specifically, the level of kurtosis that is accommodated by this distribution in excess of the Gaussian's level of three equals 6/(v − 4) provided that v > 4, where v is the number of degrees of freedom (Harvey, 2013, p. 20). In this case, the number of degrees of freedom does not refer to the sample size minus the number of estimated parameters but instead is a "shape" parameter for the distribution. In this case, the estimated value of v is approximately 16, and thus, the level of kurtosis that is accommodated by this distribution in excess of the Gaussian's level of three equals 6/(16 − 4), which is equal to one-half. In our view, the advantage of using this distribution is the resulting asymptotic normality of the maximum likelihood estimators, which ensures that our tests of statistical significance will be meaningful. As a further precaution, the standard errors for the coefficients are based on the full Huber/White/sandwich formulation, and thus from Bollerslev and Wooldridge (1992), the estimates of variance are robust to symmetric non-Gaussian disturbances. Unit root issues were addressed using the Augmented Dickey-Fuller test. Using this test, the null hypothesis that the standardized residuals have a unit root is rejected based on a P value that is less than 0.0001. Consistent with the seemingly modest level of kurtosis in the residuals, a Q-Q plot of the standardized residuals indicates a high measure of consistency with the Gaussian distribution over the quartile range of −3 to 3 ( Figure 14). While the figure indicates that the lower tail is more than adequately modeled, there nevertheless is a high degree of inconsistency in the upper tail. This indicates that the model's ability to model the positive tail risk is inadequate. This is consistent with a preliminary Hill (1975) estimator of the tail index. Depending on the nature of a positive tail risk relative to the negative tail risk, this finding could be a very serious concern if the methodology were to be employed by system operators to predict electricity flows during periods of extreme operational stress.
Before proceeding, we note that an anonymous reviewer has encouraged us to explore the possibility that the conditional variance in hour t is dependent on the conditional variance in previous hours. He or she has also encouraged us to examine whether the conditional variance has any implications for the expected value of the electricity flows. The former modeling analysis can be accomplished by making use of a generalized ARCH (GARCH) model, while the latter can be modeled using an "ARCH-in-means" specification (Note that heteroskedasticity refers to modeling errors that are highly variable, that is, have an uneven variance). Both of these suggestions were rigorously examined. In the case of a GARCH(1) specification, the model became unstable and crashed. A GARCH(1/2) did converge, but the results were inconsistent with the theory underlying the GARCH process (Becketti, 2013, p. 287). In the ARCH-in-means analysis, the model converged but the estimated ARCH-in-mean terms were statistically insignificant. A simultaneous application of the two suggestions was also not fruitful. Despite these results, we thank the reviewer for the suggestions.
The statistical significance of the explanatory variables. The estimation results are presented in Table 4, exclusive of the estimates for the hours of the day, the day of the week, the proxies for expected transmission conditions, the AR terms, the MA terms, the ARCH terms, and the conditional variance terms.

Space Weather
The following space weather variables are statistically significant at the 5% level with the estimated algebraic signs being represented in parentheses: G (+), (G*T) 0.25 (−), and (G*AvgH) 0.25 (+). The statistical significance of these three variables is consistent with Forbes & St. Cyr (2017; Table 4).
The following interactions between the GIC proxy and the load and generation proxy variables are statistically significant at the 5% level: (G*Pratio) 0.25 (−), G*SR (+), G*FPR (+), G*OntarioAL (−), G*NYFL (+), and G*NYAL (−). The statistical significance of these variables is consistent with Forbes and St. Cyr (2017;Table 4), which reports that the net imbalance volume in England and Wales is associated with similar GIC interaction variables.
The following interactions between temperature, the GIC proxy and the load and generation proxy variables are statistically significant at the 5% level: (G*T ONT *Pratio) 0.25 (+), G*T ONT *SR (−), G*T ONT *FPR (−), G*T ONT *OntarioAL (+), G*T NYC *NYFL (−), and G*T NYC *NYAL (+). The statistical significance of these variables, none of which would be considered in a parsimonious analysis, is consistent with Forbes and St. Cyr (2017;Table 4). While some of the estimated coefficients are small in magnitude, others such as (G*T ONT *Pratio) 0.25 are nontrivial.  ; Table 4), which presents evidence that the net imbalance volume in England and Wales is associated with explanatory variables defined as scheduled transmission multiplied by temperature.
The following interactions between the GIC proxy and the scheduled flows are statistically significant:  Table 4).
A number of the above noted statistically significant variables such as G, (G*T) 0.25 , and (G*AvgH) 0.25 have P values less than 0.001. We believe that these results are important given the finding of model adequacy in terms of the residuals having the property of white noise.  Concerning the results not reported in Table 4, 8 of the 29 variables representing the hour of day and the day of the week are statistically significant. Eleven of the 33 proxies of expected transmission conditions are statistically significant. The statistically significant proxies include the following:

An Out-of-Sample Evaluation of the Model
In this section, we evaluate the model using out-of-sample hourly data. The period of evaluation is 1 November to 9 December 2003. Before proceeding, we remind readers that the econometric model has both a structural and time series component. The structural component reflects the direct influences of temperature, scheduled flows, forecasted load, the GIC proxy, etc. The time series component takes into account that the electricity flow outcome in hour t is not independent from the electricity flow outcomes in hour t-1, t-2, t-3, … t-k where k is the maximum hourly lag incorporated into the model. In this case, the maximum hourly lag that is statistically significant is 792, which works out to exactly 33 days (792 = 33 * 24).
Note that the scheduled electricity flow is a very poor predictor of the actual electricity flow over the evaluation period ( Figure 15). The day-ahead DNIs reported by NYISO are also poor predictors ( Figure 16). In contrast, the hour-ahead prediction, which is based on the estimated structural parameters as well as the estimated ARCH/ARMA terms, is an accurate predictor of the actual flow ( Figure 17). Specifically, the predictive R-squared of the hour-ahead predictions equals 0.971 while the RMSE of the predictions equals 73 MWh. In addition to being more accurate than using either the scheduled flows of the day-ahead DNIs as predictors, it is also more accurate than a forecast that assumes that the flow in period t equals the flow observed in period t-1, that is, a persistence forecast ( Figure 18). This suggests that accurate predictions of the flows are possible if accurate forecasts of temperature and geomagnetic activity were to be employed. This issue is explored using forecasted temperature data in Appendix A.
While the accuracy of the hour-ahead prediction is worth noting, the hour-ahead prediction does not lend itself to a counterfactual analysis that seeks to address the role played by GICs. This occurs because the ARCH/ARMA component of the forecast reflects the lagged actual outcomes, which are in turn are affected by the lagged GIC effects even if the contribution of GICs are purged from the structural equation. Hence, a counterfactual prediction that presumes that the GIC effects are 0 will reflect a portion of those effects. Fortunately, it is possible to generate a forecast based on the structural component exclusively. This  latter type of forecast considers the fundamental drivers of the electricity flows exclusively, and thus, we focus on this type of forecast in our outof-sample evaluation of the GIC effects.
There was a significant level of geomagnetic activity over the out-ofsample evaluation period. The peak value of the GIC proxy was 103 nT/min, which is well below the peak of 758 nT/min over the sample period. Nevertheless, based on 1-min data reported by the Ottawa observatory, the peak value of the GIC proxy for the out-of-sample period ranks within the 99th percentile for Ottawa over the period 1996 through 2008. Consistent with this view, a portion of the period was considered historic in terms of geomagnetic activity (NOAA, 2004). Thus, in terms of geomagnetic activity the out-of-sample period would seem to be adequate to evaluate the model. Relative to the sample period, the out-ofsample period was below average in terms of temperature and Ontario's scheduled flows with New York.
Two structural prediction series for the out-of-sample period were created. The first structural prediction series is based on all the estimated structural parameters, while the second series constrains the 48 statistically significant coefficients that reflect geomagnetic activity to be equal to 0. Both series are transformed to the original units of measure using the values of λ 1 and λ 2 from the Box-Cox analysis. The difference between the two series was taken to be the net structural effect of geomagnetic activity on the electricity flow. The absolute value of this structural effect is highly correlated with the GIC proxy. Specifically, the correlation coefficient between the two series equals 0.80. Consistent with this correlation, there is a clear visual relationship between the peak level of geomagnetic activity in the out-of-sample evaluation period and the predicted effect of geomagnetic activity ( Figure 19). Specifically, there is a sharp downward plunge of about 400 MWh in the out-of-sample GIC-induced predicted level of electricity flow on 20 November 2003 that very closely corresponds with a spike in the value of the GIC proxy.
Observe that some of the variation in the GIC-induced predicted level of electricity flow depicted in Figure 19 does not appear to be visually related to the value of the GIC proxy. This occurs because, as Table 4 indicates, a significant portion of the effects of geomagnetic activity are conditional on the GIC proxy and its interactions with temperature, scheduled flow, electricity load, and the proxies of expected operating conditions. Consistent with causality between geomagnetic activity and the electricity flows, the out-of-sample RMSE is lower when the structural predictions reflect the estimated effects of geomagnetic activity.
Specifically, the RMSE is 193 MWh when the predictions reflect the estimated effects of geomagnetic activity; it equals 200 MWh when the statistically significant effects of geomagnetic activity are assumed to equal zero. Thus, the improvement in predictive accuracy is a modest 7 MWh over all hours. The improvement in forecast accuracy is especially evident if one focuses on those hours with the largest statistically significant estimated structural GIC effects. Over the 10 hr with the largest estimated geomagnetic effects, the prediction that includes the estimated GIC effects yields out-of-sample predictions with an RMSE of about 291 MWh, while the forecast that ignores these effects has a RMSE of about 555 MWh (Figure 20). For these 10 hr, the improvement in the RMSE equals 264 MWh.
Over the 5 hr with the largest estimated geomagnetic effects, the prediction that includes the estimated GIC effects yields out-of-sample predictions with an RMSE of about 336 MWh, while the prediction that ignores these effects has a RMSE of about 676 MWh ( Figure 21). Thus, the improvement in forecast accuracy for these 5 hr equals 340 MWh.

Space Weather
Over the 2 hr with the largest estimated geomagnetic effects, the prediction that includes the estimated GIC effects yields out-of-sample predictions with an RMSE of about 364 MWh, while the prediction that ignores these effects has a RMSE of about 808 MWh ( Figure 22). Thus, the improvement in forecast accuracy for these 2 hr equals 444 MWh.
Observe that the improvement in out-of-sample forecast accuracy is larger during those hours in which when the estimated geomagnetic effect is larger. In short, the findings are consistent with the existence of a causal relationship between geomagnetic activity and the electricity flows.

Discussion: An Assessment of the Challenge Posed by GICs Over the Sample Period
Based on the structural estimates, the predicted effect of geomagnetic activity on the electricity flows was calculated for each hour of the sample period. The mean predicted structural effect of geomagnetic activity on the electricity flow was about −9.6 MWh over all hours. The predicted structural effect is highly variable over the sample period ( Figure 23). A counterfactual analysis was also attempted assuming the peak level of geomagnetic activity observed in October 2003. Unfortunately, the model is incapable of calculating an exact counterfactual prediction because the predicted value of the transformed variable is negative in this case, while the Box-Cox transformation requires that it exceeds 0. Nevertheless, the analysis does indicate that the geomagnetically induced predicted level of electricity flow given the counterfactual assumptions is less than −4,000 MWh. This suggests that a 29 October 2003 geomagnetic event would have had a calamitous effect on the electricity flow if it had occurred during the hour commencing 22:00 UT on 29 May 2003. While some may be inclined to discount this counterfactual possibility because of the fall/spring seasonal nature of geomagnetic activity (Russell & McPherron, 1973), the extreme space weather near miss in July 2012 (Baker et al., 2013) would seem to belie this possible objection.
The empirical results also have implications for the volatility in the electricity flows. Specifically, the exact results concerning volatility (not reported in Table 4, but available on request) indicate that the   conditional variance in the electricity flow can be accounted for by ffiffiffiffi H p ; ffiffiffi ffi G p ;and a number of other variables including the hour of the day, a measure of total scheduled exports, and scheduled exports to New York. For the sample period, the predicted mean conditional variance is about 16,108 MWh 2 . The mean value of the prediction declines to approximately 1,643 MWh 2 if the statistically significant coefficients corresponding to ffiffiffiffi H p and ffiffiffi ffi G p are set equal to 0. Accordingly, the mean predicted conditional variance that can be attributed to geomagnetic activity is 14,465 MWh 2 . The predicted effect attributed to geomagnetic activity is highly variable over the sample period ( Figure 24). Specifically, the geomagnetically induced component of the estimated conditional variance ranges from 1,508 MWh 2 to about 35,992 MWh 2 . To us, this represents a possible explanation to the enigma on why the volatility in the actual electricity flows is so much larger than the volatility in the scheduled flows and/or the volatility in electricity demand.

Summary and Conclusion
In response to a research gap noted by Eastwood et al. (2017), this paper has addressed whether geomagnetic activity had any implications for electricity flows between the province of Ontario, Canada, and the state of New York, USA, over the period of 1 May 2002 through 31 October 2003. The starting point in terms of analysis is the observation that there can be large differences between the scheduled and actual flows of electricity between the two electricity control areas. This occurs because electricity flows on alternating current transmission systems, the dominant form of electricity transmission, follow the laws of physics-including geomagnetic activity-rather than the laws of supply and demand. Operators of electric power systems such as IESO and PJM have conceded that they are unable to model these unscheduled flows accurately. On several occasions, PJM's market monitoring unit has explicitly called for improved analysis of the flows and thus this analysis would seem to be highly appropriate. The flows may require operators to increase or decrease electricity generation within a control area to ensure that supply equals demand. In extreme cases, the flows may represent a significant challenge to the reliability of the electric power system.
In conjunction with the Box-Cox and MFP methodologies, we formulated and estimated an ARCH/ARMAX model with statistical controls for weather, forecasted load, actual load, scheduled transmission flows, and proxies for generation and transmission conditions to examine the relationship between geomagnetic activity and the electricity flows. The results indicate that the electricity flows between these two electricity control areas were affected by geomagnetic activity over the sample period.
The ARCH/ARMAX model was employed to predict the electricity flows using out-of-sample data. These predictions are more accurate than using scheduled flows as a predictor of actual flow. The predictions are also more accurate than a persistence forecast. The out-of-sample RMSE of the structural predictions is smaller, that is, the predictions are more accurate, when the predictions take geomagnetic activity into account. Consistent with geomagnetic activity having a causal effect on the flows, the improvement in predictive accuracy is larger for those hours in which the predicted effect of geomagnetic activity is larger. This suggests that GICs do have the potential to disrupt the electricity flows.
We cheerfully concede that the out-of-sample predictive accuracy of our model does not "prove" that GICs can disrupt the flows. Instead, it may be the case that there is some unknown factor at work that is highly correlated with the GIC proxy that is the true culprit. Those who would maintain this position in good faith should be willing to at least name this variable, provide evidence of its correlation with the GIC proxy, and also provide a coherent explanation of the physics that supports their belief. In the absence of this, the results of this paper would seem to be robust to rational skepticism.
Consistent with Forbes and St. Cyr (2017), the analysis indicates that the level of the geomagnetically induced electricity flow is highly dependent on expected system conditions at the time of the geomagnetic storm. A relatively minor storm may have a large effect if the storm occurs when system vulnerability is high, while a major storm may have only a minor effect if system vulnerability is low. A major storm could have a calamitous effect on the electricity flow if it occurs when system vulnerability is high. The findings also indicate that geomagnetic activity is an important driver of the volatility in the electricity flows.

Appendix A: An Alternative Model
In this appendix, we explore the use of an alternative time series econometric model estimated over the period August 2009 through 31 December 2012. There are 21,165 hourly observations in the alternative model. We do not have access to accurate predictions of geomagnetic activity and thus the model excludes this variable. The model does make use of archived hourly day-ahead temperature forecast data; measures of hourly locational marginal prices throughout MISO, PJM, and NYISO; hourly load forecast data from MISO, PJM, and NYISO; and hourly scheduled flow data for IESO and MISO (the scheduled flow data for PJM over this time period were unavailable to us). The model also makes use of numerous ARCH/ARMA terms to model the time series nature of the data. These terms allowed the model to achieve "white noise" in the residuals. The model has an R-squared equivalence of 0.96.
The model was evaluated using 4,218 hourly observations over the period 1 January to 30 June 2013 (data issues precluded evaluation after 30 June 2013). The RMSE of the out-of-sample predictions is approximately 100 MWh, which is superior to the alternative approaches for which we have data ( Figure A1). This result does not fully demonstrate that time series analysis is an attractive solution to the challenge posed by loop flows given that implementation of this approach would require significant data sharing among the ISOs. Changes in the scheduling of transmission services may also be required. There is also a positive tail risk issue that would require additional analysis before implementation. Nevertheless, the analysis reported here is encouraging given the magnitude of the operational challenges posed by loop flows.