Exploring wind power prognosis data on Nord Pool: the case of Sweden and Denmark

A good understanding of forecast errors is imperative for greater penetration of wind power, as it can facilitate planning and operation tasks. Oftentimes, public data is used for system studies without questioning or verifying its origin. In this study, the authors propose a methodology to verify public data with the example of wind power prognosis published by Nord Pool. They focus on Swedish data and identify a significant bias that increases over the forecast horizon. In order to explore the origin of this bias, they first compare against Danish forecasts and then describe the underlying structure behind the submission processes of this data. Based on the balance settlement structure, they reveal that Swedish ‘wind power prognoses’ on Nord Pool are in fact wind production plans rather than technical forecasts. They conclude with the recommendation for improved communication and transparency with respect to the terminology of public data on Nord Pool. They stress the importance for the research community to check publicly available input data before further use. Furthermore, the root-mean-square error and the spatio-temporal correlation between the errors in the bidding areas at different horizons are presented. Even with this compromised data, a stronger correlation is identified in neighbouring areas.


Introduction
Renewable energy from wind power is continuously increasing worldwide, while power system integration issues are arising. A large body of research has evolved on integration studies of variable renewable generation such as wind and solar power [1][2][3][4][5]. A common conclusion of these studies is that the integration of renewables is facilitated by a fleet of flexible generators. In the Nordic context, Nilsson et al. [6] investigated the potential of using a dynamic wind signal as an incentive for load shift in demand response (DR) programs. The wind signal is constructed based on Swedish electricity generation data for 2014. The results showed that shifting electricity consumption from hours of high price and low wind power generation to hours of low price and high wind power generation leads to both consumer cost savings and reduced climate impact in the long term. In order for the research community to conduct reliable studies, it is vital to use data that is available and realistic.
In Europe, ENTSO-E launched a central transparency platform in 2015. This ENTSO-E Transparency Platform [7] provides free, continuous access to pan-European electricity market data for all users, across six main categories: load, generation, transmission, balancing, outages and congestion management. The Nordic market has been very transparent for a long time: market clearing prices, volumes, load and generation prognosis as well as final load and generation can be publicly accessed on Nord Pool [8]. It is tempting for the research community to utilise this public data, e.g. [6,9,10].
However, it remains important to verify input data carefully. For instance, authors of [9,10] obtained surprising results about Swedish wind forecast errors due to the lack of understanding about the origin of the data.
Various forecasting models, varying in complexity, have been developed and are summarised in [11,12]. While the different types of forecasting models are outside the scope of this study, it is well understood that none of these models can perfectly predict wind power. For this reason, a good understanding of the errors and resulting uncertainty is essential to facilitate power system operation.
In countries such as Germany, Spain, Finland and Denmark extensive work on the forecast error analysis has been conducted [13][14][15]. In [13], the wind power forecast quality for more than 200 wind farms of different sizes in Germany was analysed. It was shown that the forecasts for larger wind farms show higher accuracy in general. Finnish wind speed and wind power forecasts were examined in [15] where three-year historic data from more than 20 sites in Finland was analysed. An international comparison of energy weighted wind forecast errors has been conducted in [10]. The correlation between the wind power forecast errors obtained at different points in space and time is examined in [14] with a case-study in western Denmark. It was shown that the forecast errors propagate throughout time and space. However, such in-depth analyses have never been conducted in other Nordic countries, although the installed capacity in e.g. Sweden has surpassed that of Denmark.
In [9], wind power forecast error distribution functions of the United States, Spain, Finland, Sweden, Denmark, Ireland, and Germany were analysed. The Swedish cumulative distribution was chosen as a clear example to illustrate the improved fit of the hyperbolic distribution over the normal distribution. The distribution of forecast errors in Sweden was found to be 'slightly leptokurtic negatively skewed' and 'interesting for their fairly small spread'. The reason given was that this is 'likely due to the large amount of geographic diversity stemming from the multiple sites over a large geographic area'. However, Hodge et al. [9] did not consider that the errors occur at different forecast horizons and lack an understanding of procedural patterns that would explain several of the findings, as we will show in Section 3.3 and discuss in Section 5.2.
In the Nordic and Baltic countries, the imbalance price is constructed such that (wind) power producers are responsible for their own balancing and are billed for the resulting production imbalance [16]. Thus, forecast errors may turn into costs and therefore market participants would strive for accuracy in the forecasts for hourly power production. Several references investigated how forecast accuracy can be evaluated [17][18][19]. Murphy et al. [20] proposed a cost-loss ratio for repetitive decision making and Bessa et al. [21] discussed the relation between forecast error and market profit with the example of the Iberian market. In the Nordic context, Matevosyan et al. [22]  generally results in higher or equal profits than strictly bidding the production forecast and then paying the imbalance cost.
In this study, statistical analysis is applied to verify data and analyse wind power predictions over the forecast horizon in Sweden and Denmark. The contributions of this study are the following:
• We propose a method to verify public data and apply this to Nord Pool wind power prognosis data.
• We conduct statistical analysis of Swedish wind power prognosis based on Nord Pool data of 2015-2018 and Danish onshore forecasts based on historical Energinet data of 2016-2017.
• The procedures and mechanisms related to forecast submission, publication and data accessibility in the Nordic countries are summarised.
• The balancing settlement process is elucidated in order to explain the findings. We shed light on previous results of [9,10] regarding Swedish wind power forecast errors by detailing the mechanisms behind the forecast data on Nord Pool.
• We reveal that the Swedish wind power prognoses on Nord Pool are in fact aggregated wind production plans (PPs) that are trade report updates rather than technical forecasts.
The rest of the paper is structured as follows. In Section 2, we present the available data used for this paper and how it was obtained. Section 3 describes techniques that were used to analyse the data and the results. The Nordic balance settlement (NBS) is summarised in brief in Section 4 regarding production, forecasts and balance responsibility in the Nordic countries. Based on this, the results are discussed in Section 5. In Section 6, the conclusions of this study are presented.

Available data
The data used in this study is taken from the available data on Nord Pool [8] representing Nordic and Baltic countries. In the definition of ENTSO-E [7], this comprises the Regional groups 'Nordic' (Denmark East, Finland, Norway, Sweden), 'Baltic' (Estonia, Latvia, Lithuania) and 'Continental Europe' (Denmark West), cf. Fig. 1. The focus will be on Sweden and Denmark.
In the Nordic countries, the growing wind power penetration has been supported by two important characteristics: (i) The large share of hydro-power generation gives the system more flexibility to cope with imbalances. (ii) The high transmission capacity between the different Nordic and Baltic countries, as well as the interconnections to the rest of Europe, results in a large potential for import and export.
The geographical distribution of wind turbines is related to locations with high mean wind speed, low population density, and sufficient grid connection properties. Fig. 1 shows the geographical distribution of wind power turbines in the Nordic and Baltic countries up to 2017.

Installed capacity in Nordic and Baltic countries
Overall, wind power production is still small in most of the Nordic countries. European Union (EU) goals have been set to increase the share of renewable energy in gross final energy consumption in the EU to 20% before 2020 [27]. In order to reach this target, each member country has been given its own target. Denmark has set their goal to 30%, Sweden to 49%, Finland to 38%, Estonia to 25%, Latvia to 40% and Lithuania to 23% [28].
The installed wind power capacity in the Nordic and Baltic countries that participate in Nord Pool is summarised in Table 1 at the end of years 2014-2017. The table further displays the prediction of installed wind power capacity in 2030 according to Wind Europe [29]. While three scenarios were developed in [29] ('high', 'central', and 'low'), we only show the central scenario here.

Sweden:
Sweden has surpassed the installed capacity of Denmark and shows the fastest growth of wind power installations in the Nordic countries, cf. Fig. 2. In 2017, a total of 3452 wind power turbines were operating in the Swedish electricity system with an installed capacity of 6691 MW. Compared to 2010 (2004 MW), the installed capacity was more than tripled, while energy generation increased almost five times, from 3.5 TWh in 2010 (2.4% of the total consumption) to 16.3 TWh (12.8% of the total yearly consumption) [32]. However, this is still far from the goals set in the national planning framework for wind power [33]: it is anticipated that the expanded certificate system will lead to a continued expansion in wind power, with an increase of 18 TWh by 2030. This corresponds to 6,000 MW of additionally installed wind power (assuming the availability of 3000 full load hours per year). The extensive wind power development creates a significant challenge for the transmission system operator (TSO) when planning the development of the grid [33].

Denmark:
Denmark is the cradle of modern wind technology and has had a high wind power penetration since the 1990s. In 2017, 44% of Denmark's yearly electricity consumption was generated by wind power. On particular days (e.g. 10 June 2017), wind power accounted for up to 140% of the instantaneous power consumption [34]. According to Fichaux et al. [35], Denmark had one of the best policy regimes for wind energy between 1980 and 2012. Therefore, the procedures and documentation for wind power integration have reached a highly developed state. Today, almost half of the country's electricity production comes from wind power, cf. Table 2. As detailed in Section 2.2.2, the installed capacity was provided at each update instant. Other Nordic and Baltic countries started the commercial use of wind power much later, but are following this example with ambitious goals.
[Fig. 1 caption fragment: ... [23], Denmark [24] and the Baltics [25,26] capture the state of 2017, Norway [23] and Finland [23] ...]

Norway:
Norway has a relatively low share of wind energy. On the other hand, the high share of hydro power provides comfortable flexibility to the system. The Fosen project, however, will almost double the current capacity by its commissioning in 2020. Fosen Vind is realising Europe's largest onshore wind power project in central Norway, comprising six wind farms with a combined capacity of 1000 MW [36].

Finland:
Finland has the third highest amount of installed wind power in the Nordic countries. The highest share of energy is produced by nuclear power.

Estonia, Latvia, and Lithuania:
The Baltic countries are not connected synchronously to the Nordic synchronous system (e.g. DK1) but have DC interconnectors. While the installed capacity is relatively low (cf. Table 2), the share in total power generation has reached 8% in Lithuania and Estonia.

Wind power forecasts on Nord Pool
On Nord Pool's website [8], market data labelled wind power prognosis can be downloaded. In the following, we will refer to the day for which a forecast is made as the delivery day (D). The delivery day comprises 24 delivery hours (DHs) unless it is a daylight saving day, cf. Fig. 3. In the Nord Pool data, the forecasting period starts at 0:00 on the calendar day before the delivery day (D-1) and ends at 23:59, just before midnight on the delivery day D. Each submitted forecast consists of 24 point forecasts in hourly resolution for the 24 DHs. In this study, we refer to all forecasts that are sent on D-1 for the delivery day as day-ahead (DA) forecasts and to those sent on the delivery day itself as intra-day (ID) forecasts. For DH 24, there are 47 forecasts with forecast horizons between 47 and 1 h ahead. This means that the length of the horizon is different for each DH. In addition, there is one intra-hour (IH) forecast that is created during the respective DH itself. The last ID forecast in each DH is the historical data that is finally stored on Nord Pool's FTP server.
(Table note: the wind share is calculated as the yearly electric energy production from wind power over the yearly consumption.)
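The horizon bookkeeping for each DH can be sketched in a few lines. This is a minimal illustration under the assumption of exactly one forecast update per full hour, starting at 0:00 on D-1; the function name `horizons_for_delivery_hour` is hypothetical and not part of any Nord Pool interface.

```python
def horizons_for_delivery_hour(dh):
    """Return the forecast horizons (in hours, >= 1) available for delivery hour `dh`.

    Assumption: forecasts are issued once per full hour, beginning at 0:00 on D-1.
    Delivery hour 1 covers 00:00-01:00 on day D, delivery hour 24 covers 23:00-24:00.
    """
    if not 1 <= dh <= 24:
        raise ValueError("delivery hour must be between 1 and 24")
    # Hours between the first issue time (0:00 on D-1) and the start of the DH.
    longest = 24 + (dh - 1)
    # One forecast per hour, from the longest horizon down to 1 h ahead.
    return list(range(longest, 0, -1))


if __name__ == "__main__":
    for dh in (1, 12, 24):
        horizons = horizons_for_delivery_hour(dh)
        print(f"DH {dh:2d}: {len(horizons)} forecasts, "
              f"horizons {horizons[0]} to {horizons[-1]} h")
```

Under this assumption, the number of available horizons grows by one with each DH, which is why longer horizons are covered by fewer DHs.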

Denmark:
In addition to Nord Pool data, Energinet provided continuously updated wind power forecasts for Danish onshore generation.
Wind power forecasts of on-shore wind farms in Denmark were obtained from the Danish TSO Energinet in MATLAB file format. These forecasts were generated by running dedicated commercial software including supervisory control and data acquisition (SCADA) data in the control room. Forecast wind energy is based on numerical weather predictions. They are recalculated every time a new weather forecast is received. For this reason, updates are not generated in a specific sequence. The forecast horizon spans between 0 and 156 h. In order to match the forecast horizon with Swedish data, only the forecasts up to a 48 h forecast horizon were selected. In addition, it is worth mentioning that forecasts for DK1 bidding area are done for 15 min intervals, while for DK2 hourly average forecasts are made. Within this data, the installed capacity was documented at each update, i.e. multiple times per day.

Others:
The data available on Nord Pool in 2018 are summarised below for the TSOs of the participating countries:
• Sweden: Svenska kraftnät submits forecast updates whenever there is new information, approximately every 5 min.
• Norway: Statnett does not submit forecasts.
• Denmark: Energinet submits one ID forecast at the beginning of the day and one DA forecast around 17:00. However, the TSO stores forecast updates in 15 min resolution in DK1 and in 1 h resolution in DK2.
• Finland: Fingrid does not submit forecasts. However, on the website of the TSO, one DA and one ID forecast are available. The DA forecast is published every day at 12:00 Eastern European time (EET) and is not updated after publication.
• Estonia: Elering submits one ID forecast at the beginning of the day and one DA forecast around 17:00.
• Latvia: Augstsprieguma tīkls AS submits one ID forecast at the beginning of the day and one DA forecast around 17:00.
• Lithuania: Litgrid submits one ID forecast at the beginning of the day and one DA forecast around 9:00.
We are interested in the forecast error trajectory over the forecast horizon. Only Swedish and Danish data allow for such analysis, which is why we will limit the scope to that data. Since other countries only show a single DA and ID forecast at most, this data has not been analysed.

Wind power production
On Nord Pool's website [8], market data labelled wind power can be downloaded for all Nordic and Baltic countries.

Sweden:
For Sweden, it can take up to 14 days until the final measurements are received and the production data is published. The production data is available in hourly resolution for bidding areas SE1, SE2, SE3 and SE4 (cf. Fig. 1) as well as for the entire country SE. An example of wind power forecasts with different forecast horizons and the corresponding production is shown in Fig. 4.
The graph shows the actual production (observation) and forecasts with a 1 h (red), 4 h (green), 8 h (orange) and 10 h (blue) horizon for November 2016. As can be seen in the zoom window, the largest mismatch between the forecast and actual production occurs during the hours of high generation of wind power. This observation is confirmed by the data as a general characteristic. It may be explained by the fact that when the wind speed is high, several turbines may operate close to their cut-out wind speed.
As the forecast horizon approaches the DH, forecasts tend to become more accurate, particularly in extreme cases with steep ramps to high or low wind power production. One such incident is highlighted in Fig. 4. A more detailed statistical analysis of forecast accuracy is presented in Section 3.

Denmark:
The settlement data for 2016 for onshore wind power in two bidding areas DK1 and DK2 were provided by Energinet. Similar to the structure of the forecast data, the actual production was given as energy per quarter hour MWh/15min in DK1 and energy per hour MWh/h in DK2.
In Fig. 5, the actual wind power production and selected forecasts are shown. Note that a large amount of data is missing. This is due to the fact that forecasts are updated infrequently.

Others:
Other countries also send their wind power production to Nord Pool, where it is published retrospectively. However, these data are not analysed here.

Data processing
In order to analyse the forecast and observed wind power production, the data was first cleaned and then aligned to the same time frame. Data cleaning, or data cleansing, is the process of detecting incorrect or missing data in the time series.
Due to the download server unavailability, data from Sweden was missing in several instances. No forecast data for March and July 2016 is available and only a small sample set of data for December 2015 and January, April and June 2016 is available. In both countries, any target day that was missing one point forecast was discarded entirely in order to ensure the same sample length for all ID forecast horizons. In addition, daylight saving days were excluded in order to preserve the alignment with the horizons described in Table 3.
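The discarding rule described above can be sketched as follows. This is an illustration only: the data layout (a dictionary from delivery day to a list of forecast updates, with `None` marking a missing point forecast) and the name `keep_complete_days` are assumptions, not the format of the Nord Pool or Energinet files.

```python
from datetime import date


def keep_complete_days(forecasts, dst_days=frozenset(), points_per_update=24):
    """Return only the delivery days that are usable for the horizon analysis.

    `forecasts` maps a delivery day (datetime.date) to a list of forecast
    updates. Each update is a list of hourly point forecasts in MW, with
    None marking a missing value. This layout is an assumption.
    """
    kept = {}
    for day, updates in forecasts.items():
        if day in dst_days:
            continue  # daylight saving days do not have 24 delivery hours
        complete = bool(updates) and all(
            len(update) == points_per_update and None not in update
            for update in updates
        )
        if complete:
            kept[day] = updates  # every update has all 24 point forecasts
    return kept


if __name__ == "__main__":
    full = [100.0] * 24
    gap = [100.0] * 23 + [None]
    data = {
        date(2016, 11, 1): [full, full],
        date(2016, 11, 2): [full, gap],    # one missing point forecast
        date(2016, 10, 30): [full, full],  # daylight saving day (clock change)
    }
    cleaned = keep_complete_days(data, dst_days={date(2016, 10, 30)})
    print(sorted(cleaned))
```

Dropping a whole day for a single missing value costs data, but it keeps the sample length identical across all ID forecast horizons, which is the stated goal.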

Data verification
Forecast data can be viewed on different time axes since a point forecast is made with a given forecast horizon for a specific forecast time, i.e. DH. In order to verify the data, we first perform the following steps:
• Align the data by forecast horizon (cf. Table 3) and plot the joint error of all DHs.
• Align the data by forecast time (cf. Table 3) and plot the individual error of all DHs.
Next, we select and define error measures and statistical properties that are commonly used in the respective field, here wind power forecasting [14,37]. In order to evaluate the error, a comparison of the properties of reliable data stemming from a similar context should be conducted, here with [13,14].
If major differences are revealed, the market and regulation that drive the underlying process should be investigated, and people who use this data may be interviewed to explain the differences. If no major difference is revealed, it may be beneficial to check further statistical properties of the data before use.

Data analysis
This section describes the statistical analysis of forecast accuracy and spatio-temporal correlation. All forecast errors are first normalised with respect to the installed capacity in the corresponding bidding area. The installed wind power capacity in each bidding area at the end of the years 2014-2017 is listed in Table 4. For Sweden, the quarterly installed capacity was obtained from [32,33], and the mean of the installed capacity at the beginning and end of each quarter was used for normalisation. In Sweden, there are no clear intra-yearly installation trends and installation rates vary from year to year.

Bias error
The absolute forecast error at any point in hour t is defined as the difference between the forecast and the actual (observed) wind power generation in that hour. The average normalised bias error $\bar{E}_h$ at horizon h is calculated as

$$\bar{E}_h = \frac{1}{H} \sum_{i=1}^{H} \frac{P_{h,i} - P_i}{P_{\mathrm{inst},i}} \qquad (1)$$

where $P_i$ is the actually observed generation in DH i, $P_{h,i}$ is the h-hour forecast for DH i, $P_{\mathrm{inst},i}$ is the installed wind power capacity and H is the total number of DHs in the data set that contain an h-hour forecast. The average normalised bias error is shown in Fig. 6 over the forecast horizon. Note that the number of historical point forecasts H is different for each forecast horizon h. Therefore, fewer data are available for longer forecast horizons (applicable only to Swedish forecasts).
Forecasts with horizon 0 represent the IH forecast. As can be seen, the bias errors in Danish forecasts at horizons 47-12 h are significantly lower than their counterparts in Sweden.

Table 4 Installed wind power capacity P_inst by bidding area in Sweden (SE1-SE4) and Denmark onshore (DK1, DK2) at the end of each year (MW) [30,32]

Year      SE1    SE2    SE3    SE4    SE total   DK1    DK2    DK total
2014      507    1975   1778   1467   5727       3280   670    3950
2015      534    2306   1894   1541   6275       3325   685    4010
2016      539    2386   1998   1598   6520       3370   735    4105
2017      568    2391   2136   1597   6691       -      -      4205
outlook   1989   3890   3055   1881   10,…

This bias error, however, is not characteristic of the nature of wind power and prediction tools. Typically, the mean value of forecast errors is close to zero [15]. Therefore, this bias is investigated over the forecast time (cf. Table 3). For DH i and horizon h, the average normalised bias error is calculated as

$$\bar{E}_{h,i} = \frac{1}{N} \sum_{j=1}^{N} \frac{P_{h,i,j} - P_{i,j}}{P_{\mathrm{inst},i}} \qquad (2)$$

where $P_{i,j}$ is the actually observed generation in DH i on day j and $P_{h,i,j}$ is the forecast at day j for DH i. N is the total number of days in the data set and $P_{\mathrm{inst},i}$ is the installed wind power capacity. In order to clarify the relevant quantities, Table 3 lists the time when forecasts were downloaded and the number of forecast updates for DA and ID forecasts, respectively, as well as the DH. Table 3 also introduces the definition of forecast time, which is used as the axis in Fig. 7. In Fig. 7, the bias errors for selected DHs in bidding area SE1 are shown as the mean forecast error trajectory over the forecast time as defined in Table 3. The dot represents the IH forecast that was made 5 min before the DH. This IH forecast is in practice a very short-term forecast and also the latest update to the forecast data on Nord Pool. Similar results were found for other bidding areas. The dashed red lines indicate the time when the markets close, and the dashed green lines indicate two instances of the PP submission process. Two rapid and systematic improvements in the bias error can be identified; they are discussed in Section 5. In the rest of this paper, the bias in the forecast at a certain horizon is removed by subtracting the mean value of the data set from each value.
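The normalised bias error and the subsequent bias removal can be sketched as below. This is a minimal illustration, assuming the error is taken as forecast minus observation and that forecasts, observations and installed capacities are already aligned per DH for one forecast horizon; the function names and the numbers are hypothetical.

```python
def normalised_errors(forecast, observed, installed):
    """Forecast minus observation, divided by installed capacity, per delivery hour."""
    if not (len(forecast) == len(observed) == len(installed)):
        raise ValueError("all input series must have the same length")
    return [(f - p) / c for f, p, c in zip(forecast, observed, installed)]


def mean_bias(errors):
    """Average normalised bias error of one forecast horizon."""
    return sum(errors) / len(errors)


def remove_bias(errors):
    """Subtract the mean error so that the remaining errors average to zero."""
    bias = mean_bias(errors)
    return [e - bias for e in errors]


if __name__ == "__main__":
    # Illustrative numbers only (MW); not taken from the Nord Pool data.
    forecast = [80.0, 90.0, 100.0]
    observed = [100.0, 100.0, 100.0]
    installed = [200.0, 200.0, 200.0]
    errors = normalised_errors(forecast, observed, installed)
    print("bias:", mean_bias(errors))
    print("de-biased errors:", remove_bias(errors))
```

With this sign convention, a negative bias means that the forecasts underestimate production on average.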

Normalised root-mean-square error
A common metric to capture the error magnitude is the root-mean square error (RMSE). The RMSE implicitly gives more weight to the larger errors due to the square term.
In Fig. 8, the normalised RMSE (NRMSE) of the (i) biased and (ii) non-biased data is shown over the forecast horizon together with the RMSE for Danish wind power forecasts. As expected, the RMSE decreases as the horizon approaches the DH. The lowest errors occur right before the DH, ranging from 4.6% in SE2 to 6.4% in SE1. In comparison, errors at horizon 47 h are between 24% in SE2 and 32% in SE1. Errors in DK1 and DK2 also increase with the forecast horizon. However, these errors are lower than the errors in Sweden and do not exceed 11%. On average, Danish wind power forecasts thus had lower errors than those of the bidding areas in Sweden, and the errors in DK1 and DK2 are comparable to those of SE4. It is worth noticing that between forecast horizons 12 and 0 h, the RMSE for Sweden as a whole was lower than in DK1 and DK2, which indicates the influence of forecast aggregation over larger areas, cf. [37].
A significant increase of the RMSE is visible when the forecast horizon extends beyond 12 h. The highest RMSE occurs in bidding area SE1 and the lowest in SE2. This corresponds to the installed capacity: Table 4 shows that SE1 has the lowest installed capacity, while SE2 accommodates the most wind power. In addition, Fig. 1 shows that power plants are spread out more evenly in bidding areas SE2-SE4 compared to SE1. Since there are more wind power plants in SE2-SE4, the smoothing effect is more dominant [13], which implies that e.g. an over-prediction of the generation in the north of the bidding zone is compensated by under-prediction in the south. These differences in installed capacity and geographical spread lead to a lower RMSE in SE2.
The RMSE is significantly reduced in Fig. 8b when the bias is removed from the data. SE1 still has the largest error; however, compared to the biased data set it is improved by more than 9%. Similarly, the other bidding areas, as well as SE as a whole, show a lower RMSE. In addition, the total RMSE in SE is lower than in the individual bidding areas. This is expected due to the aggregation and smoothing effects described in [13,37]. Fig. 9 shows the errors of IH forecasts and of forecasts with a 12 h forecast horizon in SE. It is evident that the errors of IH forecasts are more evenly distributed around the mean, with a lower spread and with minimum and maximum errors not exceeding 12%. In contrast, the errors of the 12 h forecasts have a long left tail, with negative errors exceeding 25%.
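The effect of bias removal on the RMSE can be illustrated with a short sketch. It assumes that the errors have already been normalised by installed capacity; the helper names and the numbers are hypothetical and do not reproduce the values in Fig. 8.

```python
import math


def rmse(errors):
    """Root-mean-square of a list of (already normalised) forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def debias(errors):
    """Subtract the mean error from every error."""
    mean = sum(errors) / len(errors)
    return [e - mean for e in errors]


if __name__ == "__main__":
    # Illustrative normalised errors for one horizon (fractions of capacity).
    errors = [-0.30, -0.10, -0.20]
    bias = sum(errors) / len(errors)
    print("NRMSE, biased:    ", rmse(errors))
    print("NRMSE, non-biased:", rmse(debias(errors)))
    # The squared RMSE splits into squared bias plus squared de-biased RMSE.
    print("check:", bias ** 2 + rmse(debias(errors)) ** 2, "vs", rmse(errors) ** 2)
```

Because the squared RMSE equals the squared bias plus the squared RMSE of the de-biased errors, removing a non-zero bias always lowers the RMSE, and a large bias lowers it by a large amount.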

Forecast error distribution
Here, the wind power forecast error distributions were examined over the forecast horizon. We want to characterise the error distribution at different horizons in all bidding areas. To this end, statistical moments were used to characterise the error distribution function: mean μ, variance σ², skewness γ, and kurtosis κ. For a more detailed description of the general statistical characteristics of wind power, the interested reader is referred to [38]. The mean is equivalent to the bias error, which was presented in (2) and discussed extensively in Section 3.1. The other three moments were calculated with non-biased forecasts (μ = 0). Fig. 8b shows the NRMSE of non-biased data, which is representative of the second moment, variance. In Fig. 10, skewness and kurtosis are shown over the forecast horizon. A distribution function whose asymmetric tail stretches out to the left of its mean is called negatively skewed; if the tail stretches out to the right, it is called positively skewed. The kurtosis, expressed here as excess kurtosis, is equal to zero for a normal distribution. Distributions that have a kurtosis larger than 0 are termed leptokurtic. They can be identified by high, thin peaks and thick tails.
Fig. 10a shows that the error distributions are negatively skewed over the horizon. Due to the large bias errors, results for horizons larger than 11 h should be interpreted with caution. The graph indicates that the forecast errors are negatively skewed, with increasing kurtosis. Negative skewness indicates that the left tail of the distribution function is longer and thicker than the right tail. Even for horizons 5-0 h, close to the DH, forecasts tend towards larger underestimates of wind power production. Finally, the kurtosis of the Swedish forecasts in Fig. 10b increases for horizons between 11 and 0 h and reaches a maximum in the forecasts closest to the DH, whereas the kurtosis of the Danish forecasts fluctuates and is independent of the forecast horizon. In practice, increasing kurtosis can be interpreted as increased accuracy of short-term forecasts compared to longer horizons [9]. This is not observed in the forecasts for Denmark, which may signify that forecast confidence does not improve drastically at short horizons.
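Skewness and kurtosis can be computed from central moments as sketched below. This assumes the population (biased) moment estimators and the excess-kurtosis convention, in which a normal distribution has kurtosis 0; the source does not state which estimator was used, and the sample values are hypothetical.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardised moment; negative for a longer left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardised moment minus 3, so a normal distribution gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


if __name__ == "__main__":
    # Illustrative errors with one large underestimate (long left tail).
    errors = [-4.0, 0.0, 1.0, 1.0, 2.0]
    print("skewness:", skewness(errors))
    print("excess kurtosis:", excess_kurtosis(errors))
```

A single large negative error is enough to make the skewness of this small sample negative, which mirrors the interpretation of the left tail given in the text.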

Forecast error correlation
In order to gain an understanding of the temporal characteristics of forecast errors, the auto-correlation is addressed first, i.e. the correlation between the errors in the same bidding area at different horizons. Next, we extend the analysis to a spatio-temporal level and analyse the cross-correlation, i.e. the correlation between the errors in different bidding areas over varied forecast horizons. The Pearson correlation coefficient (r) is calculated to quantify the magnitude of the correlation. When r = 1, the two data sets are perfectly positively correlated; when r = −1, they are perfectly negatively correlated; and when r = 0, no correlation exists.

Auto-correlation:
Auto-correlation describes the correlation $r_{t,k} = r(x_{i,t}, x_{i,k})$ between the errors in bidding zone x at horizons t and k, where i indexes the DH (DH i). The auto-correlation is calculated as follows:

$$r_{t,k} = \frac{\sum_{i=1}^{N} (x_{i,t} - \bar{x}_{i,t})(x_{i,k} - \bar{x}_{i,k})}{\sqrt{\sum_{i=1}^{N} (x_{i,t} - \bar{x}_{i,t})^2}\,\sqrt{\sum_{i=1}^{N} (x_{i,k} - \bar{x}_{i,k})^2}} \qquad (3)$$

Here, N is the total number of samples, $x_{i,t}$ and $x_{i,k}$ are the errors and their means are referred to as $\bar{x}_{i,t}$ and $\bar{x}_{i,k}$. The auto-correlation up to a lag of 5 h is shown in Fig. 11. Errors at horizon 0 for each bidding zone were taken as the response variable to the errors observed at other horizons. It can be seen that the auto-correlation decays slowly and remains strong between forecasting errors that were obtained 5 h apart.
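The auto-correlation of errors against the horizon-0 errors can be sketched as follows. The Pearson coefficient is standard; the data layout (a dictionary from horizon to a list of errors aligned by DH) and the name `autocorrelation_profile` are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def autocorrelation_profile(errors_by_horizon, reference=0):
    """Correlate the errors at every horizon with those at the reference horizon.

    `errors_by_horizon` maps a horizon in hours to a list of errors in which
    position i always refers to the same delivery hour (assumed layout).
    """
    ref = errors_by_horizon[reference]
    return {h: pearson(ref, errs) for h, errs in sorted(errors_by_horizon.items())}


if __name__ == "__main__":
    # Illustrative errors for four delivery hours at horizons 0, 1 and 5 h.
    errors = {
        0: [0.01, -0.02, 0.03, -0.01],
        1: [0.02, -0.03, 0.04, -0.01],
        5: [0.05, -0.01, 0.02, -0.04],
    }
    for horizon, r in autocorrelation_profile(errors).items():
        print(f"horizon {horizon} h: r = {r:.3f}")
```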

Cross-correlation:
Cross-correlation describes the correlation $r_{x_t, y_k} = r(x_{i,t}, y_{i,k})$ between the errors in bidding zones x and y at horizons t and k, where i indexes the DH (DH i). The cross-correlation between the bidding zones at different horizons is calculated as given in (4):

$$r_{x_t, y_k} = \frac{\sum_{i=1}^{N} (x_{i,t} - \bar{x}_{i,t})(y_{i,k} - \bar{y}_{i,k})}{\sqrt{\sum_{i=1}^{N} (x_{i,t} - \bar{x}_{i,t})^2}\,\sqrt{\sum_{i=1}^{N} (y_{i,k} - \bar{y}_{i,k})^2}} \qquad (4)$$

Here, analogous to (3), N is the total number of samples, $x_{i,t}$ and $y_{i,k}$ are the errors and their means are referred to as $\bar{x}_{i,t}$ and $\bar{y}_{i,k}$. In Figs. 12 and 13, the cross-correlation between the errors of Swedish and Danish bidding areas is shown. Forecasting errors at horizon 0 for the respective bidding zone were taken as the response variable to the errors at different horizons in different zones. A 95% confidence level is shown by dashed lines in the respective colour scheme. (The confidence interval is not constant since the Danish forecast data contains an inconsistent number of data points over the forecast horizon.) The strongest correlation is observed between neighbouring bidding areas. Note that part of Denmark (DK1) uses 15 min updates of wind power forecasts. Therefore, the mean of all 15 min wind forecasts was used in DK1 to calculate the cross-correlation at a given horizon.
In Figs. 12 and 13, a negative correlation between the errors in some areas was observed, which is not intuitive. However, these values are very close to the confidence level, and we, therefore, consider them as statistically insignificant.
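Two supporting steps of the cross-correlation analysis can be sketched as below: averaging 15 min values to hourly values, as done for DK1, and judging whether a correlation coefficient is distinguishable from zero. The bound ±1.96/√N is the common large-sample approximation for a 95% level and is an assumption here; the source does not state how its confidence level was computed. All names are hypothetical.

```python
import math
from statistics import mean


def hourly_mean(quarter_hourly):
    """Average consecutive groups of four 15 min values into hourly values."""
    if len(quarter_hourly) % 4 != 0:
        raise ValueError("expected a whole number of hours (multiples of 4 values)")
    return [mean(quarter_hourly[i:i + 4]) for i in range(0, len(quarter_hourly), 4)]


def significance_bound(n_samples, z=1.96):
    """Approximate 95% bound for a zero correlation (assumed large-sample rule)."""
    return z / math.sqrt(n_samples)


def is_significant(r, n_samples):
    """True if |r| lies outside the approximate 95% bound."""
    return abs(r) > significance_bound(n_samples)


if __name__ == "__main__":
    print(hourly_mean([1, 2, 3, 4, 10, 10, 10, 10]))
    for n in (100, 1000):
        print(f"N = {n}: bound = {significance_bound(n):.3f}")
    # A small negative correlation from few samples is not significant.
    print(is_significant(-0.05, 100))
```

Because the bound widens as the number of samples shrinks, horizons with fewer data points have a wider band, which is consistent with the non-constant confidence interval noted for the Danish data.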

Nordic balance settlement
In order to explore the nature of the compromised data, it is essential to understand the market and, even more so, the balancing settlement mechanisms. This section summarises the actors in the communication procedure of wind power forecasts on Nord Pool and their (balance) responsibility according to eSett OY [16].
The NBS is the establishment of a common balance settlement mechanism for Finland, Norway, and Sweden. The purpose of the mechanism is to reduce entry barriers for retailers that want to offer services in these countries and pave the way for a common Nordic retail market for electricity.

Settlement provider
The common settlement central, eSett OY [16], performs the balance settlement function for electricity market participants in Finland, Norway, and Sweden and is owned by the three collaborating countries' TSOs. eSett is the imbalance settlement provider using a harmonised model throughout the Nordics providing equal operational preconditions for balance responsible parties (BRPs). Further details and clarifications can be found in the Handbook of eSett OY [39]. Table 5 summarises some of the differences in the settlement processes between the countries [40].

Nominated electricity market operator (NEMO)
The liberalisation process in the middle of the 1990s was followed by an integration of the Nordic markets. The establishment of Nord Pool, the Nordic electricity exchange, was an important part of this integration. Today, there is a common Nordic wholesale electricity market.
Nord Pool operates the DA market (Elspot) and the ID market (Elbas) and publishes relevant information on its website [8]. Today, market data of Norway, Sweden, Finland, Denmark, Estonia, Latvia, and Lithuania are published. Among this information, wind power forecasts that are sent by the TSOs are published and updated in real time on Nord Pool's website [8] under the name wind power prognosis.
The update frequency varies between TSOs. Nord Pool updates the forecasts whenever an update from any TSO is received and overwrites the old forecast data with the update.

Transmission system operator (TSO)
A TSO is responsible for both the security of supply and the high-voltage grid. It also carries the ultimate responsibility for the imbalance settlement according to the national laws. The TSO requires wind power forecasts for two main purposes: (i) capacity allocation and congestion management and (ii) operational balancing.
To this end, it is imperative to have the most accurate forecast available. This ensures economic operation of the power system at the optimal cross-border transmission level to maximise social welfare. Therefore, the TSO may generate or purchase its own forecasts, which is done in practice: the Swedish TSO currently purchases its forecasts externally from [41]. The Danish TSO has two types of forecasts available for the horizon 0-10 h: one based only on weather forecasts (offline forecast) and one taking SCADA data into account (online forecast). The offline forecast is recalculated every time a new weather forecast is received. The online forecast is updated every 5 min.
The data published as wind power prognosis (in reality PP updates) are aggregated by the Swedish TSO and merely forwarded to Nord Pool. The TSO does not use these data internally for any purpose, since its own forecasting software yields actual forecasts (as opposed to PPs).

Balance responsible party
A BRP is a party that has a valid imbalance settlement agreement with eSett. Balance responsibility means the BRP is obligated to ensure that a balance exists between the supply and demand and is penalised ex-post for net imbalances. The reference for calculating the imbalance is the PPs that a BRP sends for each regulation object (RO) in the respective metering balance area, commonly referred to as price area.

Commercial forecasting model
A great number of actors in the Nordic electricity market rely on externally purchased commercial software to generate wind power generation forecasts. Commercial tools combine statistical models that use e.g. regression [41] and/or machine learning [42] with numerical weather predictions such as wind speed at different altitudes, wind direction, air density etc. Forecasting models are outside the scope of this study, but a summary of the state of the art can be found in [11,12].

Bias error
An investigation of the bias error revealed a systematic underprediction of the wind power that improves significantly at two instances: around gate closure of the DA market and around gate closure of the ID market, i.e. for PP updates for the first DH. This bias error impacts the results significantly and was thus compensated for. However, the compensation method is very simplistic and is not guaranteed to supply a realistic zero-mean forecast in every case. Generally, the imbalance price depends on the amount and sign of the mismatch between the reported and measured production. Thus, the intuition is that producers should be encouraged to generate accurate forecasts and PPs in order to minimise their cost. With respect to this bias, the following questions may arise: (i) do the forecasts have an impact on the balancing cost, for the BRP and/or the TSO? (ii) Is it a lack of information, i.e. are the forecasts scaled up when data is missing?
In the following, we will show that the bias is related to the mechanisms of the NBS and does not incur a cost as of today. It is rather related to the fact that the Swedish TSO submits PPs instead of forecasts to Nord Pool, which publishes them as wind power prognosis.
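The horizon-dependent bias and a simple mean compensation, as discussed above, can be sketched in Python as follows. This is not necessarily the compensation used in this study: it assumes the error is defined as forecast minus measurement (so underprediction gives a negative bias) and simply subtracts the mean error per horizon. All names and numbers are hypothetical.

```python
from collections import defaultdict


def bias_by_horizon(records):
    """Mean error (forecast - measured) per forecast horizon in hours."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for horizon, forecast, measured in records:
        sums[horizon] += forecast - measured
        counts[horizon] += 1
    return {h: sums[h] / counts[h] for h in sums}


def compensate(records, bias):
    """Shift each forecast by the mean error of its horizon."""
    return [(h, f - bias[h], m) for h, f, m in records]


# Hypothetical records: (horizon in h, forecast in MW, measured in MW).
records = [
    (24, 80.0, 100.0),
    (24, 90.0, 100.0),
    (1, 98.0, 100.0),
    (1, 100.0, 100.0),
]

bias = bias_by_horizon(records)            # larger underprediction at 24 h
corrected = compensate(records, bias)
residual_bias = bias_by_horizon(corrected)  # zero mean per horizon
```

Such a correction removes the mean error per horizon only; it does not guarantee a realistic error distribution, which matches the caveat stated above.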

Swedish data and the NBS
In Sweden, PPs are submitted for each metering grid area (MGA, Swedish: Lokalnätområden, see [43]). The BRP reports the PP with a separate wind PP in each MGA to the TSO. There can be multiple BRPs in a single MGA, and one distribution system operator (DSO) is responsible for the production metering of each MGA. The TSO then aggregates the PPs of all MGAs in each of the four bidding areas SE1-SE4 and forwards this data to Nord Pool. BRPs are required to submit PPs for trades up to 45 min before the start of the DH. Strictly speaking, a PP is not a forecast. It reflects an actual deal on the DA market Elspot, on the ID market Elbas, or a bilateral trade carried out by the BRP, the result of which is sent to the TSO.
Key finding 1: Swedish wind power prognoses on Nord Pool are in fact aggregated wind PPs, i.e. trade report updates rather than technical forecasts.
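As a rough illustration of the aggregation step described above, the following Python sketch sums the wind PPs reported per BRP and MGA into bidding-area totals. The BRP names, MGA codes, MW values, and the MGA-to-area mapping are hypothetical placeholders and do not reflect the actual data format of the TSO.

```python
def aggregate_by_bidding_area(plans, area_of_mga):
    """Sum wind PPs (MW) of all BRPs and MGAs within each bidding area."""
    totals = {}
    for _brp, mga, mw in plans:
        area = area_of_mga[mga]
        totals[area] = totals.get(area, 0.0) + mw
    return totals


# Hypothetical mapping of MGAs to bidding areas.
area_of_mga = {"MGA_1": "SE3", "MGA_2": "SE4"}

# Hypothetical PPs for one delivery hour: (BRP, MGA, planned wind production in MW).
plans = [
    ("BRP_A", "MGA_1", 40.0),
    ("BRP_B", "MGA_1", 30.0),  # several BRPs can report in the same MGA
    ("BRP_A", "MGA_2", 55.0),
]

totals = aggregate_by_bidding_area(plans, area_of_mga)
```

The published value per bidding area is thus a sum of trade-based plans, not the output of a forecasting model.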
Whenever a new (bilateral or wholesale) trade is closed, the BRP updates its PP. After receiving the PP updates from the BRPs, Svk aggregates them by bidding area and sends them to Nord Pool, where they are stored as semicolon divided value (sdv) files. Svk sends updated PPs every 5 min if there is any new information. Nord Pool publishes the updated version of all Nordic PPs as wind power prognosis on its website [8] and stores them under the name of the calendar week. The forecast made on the day ahead and the forecast made on the same day are stored for DHs of the same and the consecutive day. This implies that old data is overwritten by updates. Therefore, the accessible data labelled as wind power prognosis on Nord Pool [8] are aggregated PPs that were sent just 45 min before the DH. This may explain why Swedish forecasts still rank well in international comparisons such as [9,10]: the forecast horizon is very short, which logically results in lower forecast errors. The forecast error distributions in Fig. 9 showed that even for the horizons between 0 and 11 h the error distributions were negatively skewed and leptokurtic, which is in line with [9].
Key finding 2: the historical Swedish DA forecasts that are stored on Nord Pool are in fact aggregated PPs sent between 23:15 and 23:59 DA. In comparison with Denmark's forecasts, we can conclude that PPs submitted after 23:00 DA have low bias and realistic magnitude in terms of forecast RMSE.
Key finding 3: a simple analysis of the historical DA wind power prognosis on Nord Pool would not reveal the nature of the compromised data.
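The effect of overwriting described above can be sketched in Python: if only the latest update per bidding area and DH is kept, the stored value always has the shortest available horizon. The timestamps, MW values, and the storage layout are hypothetical and only illustrate the mechanism.

```python
from datetime import datetime


def apply_updates(updates):
    """Keep only the most recent PP per (area, delivery hour)."""
    store = {}
    # Sorting by submission time makes later updates overwrite earlier ones.
    for sent_at, area, delivery_hour, mw in sorted(updates):
        store[(area, delivery_hour)] = (sent_at, mw)
    return store


dh = datetime(2016, 1, 2, 12, 0)  # hypothetical delivery hour
updates = [
    (datetime(2016, 1, 2, 11, 15), "SE3", dh, 180.0),  # 45 min before the DH
    (datetime(2016, 1, 1, 11, 0), "SE3", dh, 150.0),   # day-ahead value
    (datetime(2016, 1, 2, 9, 0), "SE3", dh, 170.0),    # intraday update
]

store = apply_updates(updates)
sent_at, mw = store[("SE3", dh)]
remaining_minutes = (dh - sent_at).total_seconds() / 60
```

In this example the day-ahead value of 150 MW is no longer retrievable; only the update sent 45 min before the DH survives.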
BRPs send the PP updates per RO to the TSO at the latest 45 min before the DH, see Fig. 14a. The TSO forwards these binding PPs per RO to eSett. eSett uses these PPs as a reference for calculating the imbalance cost of a BRP. However, the BRP can engage in bilateral trades at two instances [38]: (i) the first gate closure for bilateral trade is 45 min before the DH in Norway and Sweden, and 20 min before the DH in Finland. (ii) The second gate closure for bilateral trade is at 24:00 on the next working day after the delivery day.
Observation 4.1: BRPs can settle their imbalance even after the DH by bilateral trades.
In the NBS model, eSett aggregates data on the BRP level, as illustrated in Fig. 14b. That means that even if a BRP is in imbalance after the two gate closures for bilateral trades, it can still settle its imbalance within its own portfolio, e.g. with hydro power. Therefore, the only potential cost associated with deviating from the PP is an opportunity cost for sub-optimal scheduling of a hydro power RO.
Observation 4.2: BRPs can settle their imbalance within their own portfolio by aggregating ROs.
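Observation 4.2 can be illustrated with a deliberately simplified Python sketch in which the imbalance is taken as metered production minus PP per RO and then summed over the BRP's portfolio. The actual settlement rules are more detailed than this; the RO names and MW values are hypothetical.

```python
def ro_imbalances(plans, metered):
    """Imbalance per RO in MW: metered production minus PP."""
    return {ro: metered[ro] - plans[ro] for ro in plans}


def brp_imbalance(plans, metered):
    """Net imbalance of the BRP after aggregating all of its ROs."""
    return sum(ro_imbalances(plans, metered).values())


# Hypothetical portfolio for one delivery hour.
plans = {"wind_RO": 100.0, "hydro_RO": 200.0}
metered = {"wind_RO": 130.0, "hydro_RO": 170.0}  # hydro is throttled to absorb the wind surplus

per_ro = ro_imbalances(plans, metered)
net = brp_imbalance(plans, metered)
```

Here the wind RO deviates by +30 MW from its PP, yet the BRP as a whole is balanced, so the deviation of the wind PP carries no imbalance cost in this simplified view.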
In support of this, Bessa et al. [21] find evidence that forecasts with higher accuracy do not necessarily lead to higher incomes for a wind power producer and thus, the producer may be willing to reduce the forecasts' accuracy in exchange for an increased income. They find indications that bids do not correspond to a 'neutral' forecast. They further conclude that the market rules and the way that wind power is remunerated in the market should be revisited to better facilitate the TSO's decisions regarding system security. One option given in [21] is to decouple the concepts of forecasting and bidding. Here, we suggest decoupling the concepts of forecasting and PP reporting.
Furthermore, Matevosyan et al. [22] showed in the Nordic context that strictly bidding the production forecast does not always yield the economic optimum. At the same time, academic literature uses imbalance prices when studying bidding strategies of market participants [22]. However, since in Sweden the opportunity cost for hydro power is close to zero, the actual imbalance cost would, in reality, be lower than the imbalance price.
Key finding 4: today, the opportunity costs for a BRP related to deviation from a 'neutral' PP are close to zero. Moreover, there exists evidence [21,22] that in certain conditions a BRP may have incentives to deviate from a PP that corresponds to the 'neutral' forecast.

Root-mean-square error
As expected, the RMSE decreases as the forecast horizon approaches the DH, cf. Fig. 8. It was shown that for bidding areas with a higher number of wind power plants the RMSE is lower. Furthermore, the errors of aggregated forecasts for the whole of Sweden are on average lower than those for individual bidding areas. This trend has also been shown in a comparison of individual wind farm errors with the total error in Ireland [44].
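The smoothing effect of aggregation on the RMSE can be illustrated with a small Python sketch. It assumes that the RMSE is normalised by installed capacity, which may differ from the normalisation used in this study, and the error series and capacities are hypothetical.

```python
import math


def rmse(errors):
    """Root-mean-square error of a series of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical errors (MW) in two areas with 100 MW installed capacity each.
area_a = [10.0, -10.0, 10.0, -10.0]
area_b = [-10.0, 10.0, 10.0, -10.0]
cap_a, cap_b = 100.0, 100.0

# Errors of the aggregated forecast: partly cancel when not perfectly correlated.
total = [a + b for a, b in zip(area_a, area_b)]

nrmse_a = rmse(area_a) / cap_a
nrmse_b = rmse(area_b) / cap_b
nrmse_total = rmse(total) / (cap_a + cap_b)
```

Because the two error series are uncorrelated in this example, the normalised RMSE of the aggregate is lower than that of either area.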
In comparison with Denmark's forecasts, we can conclude that the forecast error of PPs after 23:00 DA has a realistic magnitude. Nevertheless, the PP data should rather be called aggregated wind PPs than wind power prognosis on Nord Pool.

Temporal correlation
The auto-correlation factors in Fig. 11 can be dominated by the fact that the forecasts were made for a large aggregation of wind power plants, where even the smallest area (SE1) hosts more than 500 MW installed capacity. Thus, spatial smoothing due to a large number of power plants may reduce the dependency on the local behaviour. This local dependency is found to have a higher impact in [14] where single wind parks are analysed.

Spatio-temporal correlation
Finally, the cross-correlation analysis in Figs. 12 and 13 shows that the errors of neighbouring bidding areas are more strongly correlated than those of more distant areas. This should be discussed with reference to Figs. 1 and 15, which show the geographical spread of wind turbines and the predominant wind directions.
It is observed in Fig. 1 that wind power plants are densely located in the south of SE4 and on the border between SE4 and SE3. Predominant wind directions in SE4 and the south of SE3 are west and southwest, which implies that most of the power plants in the south of SE3 are downwind from the power plants in SE4. In contrast, towards the north of Sweden, wind power plants in SE2 are more scattered and less concentrated on the border between SE2 and SE3. In addition, wind direction occurrences are more evenly distributed. Consequently, the amount of time that the plants in SE2 are upwind of SE3 (and those in SE1 upwind of SE2) decreases towards the north.
It can be seen that the cross-correlation $r_{DK1,DK2}$, i.e. $r_{DK2,DK1}$, is stronger than the correlation between the price areas in Sweden. Both DK1 and DK2 correlate with SE4 in Sweden. Since the predominant wind direction in Denmark is west and southwest (similar to SE4), SE4 is situated downwind from DK, which may explain the observed correlations $r_{DK1,SE4}$ and $r_{DK2,SE4}$. As was shown in [14], errors between plants that are downwind or upwind from each other have stronger correlations. This may explain why the errors in SE3 are more strongly correlated with the errors in SE4 than with those in SE2. Another explanation of the cross-correlation factors may lie in the distribution of farms rather than in the prevailing wind directions. This can, for instance, explain the relatively small differences between $r_{SE3,SE4}$ and $r_{SE3,SE2}$, cf. Fig. 12c. The different correlation factors between the bidding areas may be influenced by the location and distance of the wind power plants, as well as by the predominant wind directions. A more detailed analysis could be performed if actual forecasts were available for Sweden, and in a higher spatial resolution. In order to capture the full spatio-temporal characteristics of the errors, data of individual wind farms, wind speeds, and directions would be required.

Conclusion
In this study, we perform statistical analysis to verify wind prognosis data and analyse wind power predictions over the forecast horizon in Sweden and Denmark. We propose a method to verify public data and apply this to Nord Pool wind power prognosis data. We identify a significant bias error in Swedish data and therefore elaborate on the procedures and mechanisms related to forecast submission, publication, and data accessibility in the Nordic countries. This bias error explains several findings of [9,10] regarding Swedish forecast error distributions. In Danish forecasts, no such dependency of the bias error on the forecast horizon was observed.
Interestingly, the RMSE for Swedish PPs at forecast horizons 12-0 h was comparable to, and even lower than, that of Danish forecasts, which stresses the effect of aggregation and smoothing. In comparison with Denmark's forecasts, we can conclude that PPs submitted after 23:00 DA have low bias and realistic magnitude in terms of forecast RMSE. Therefore, a simple analysis of the DA wind power prognosis on Nord Pool would not reveal the nature of the compromised data. The historical Swedish DA forecasts that are stored on Nord Pool are in fact aggregated PPs sent between 23:15 and 23:59 DA. Previous research has analysed Swedish forecast errors without a full understanding of the underlying processes and thus presents the results in an incomplete frame. The procedures detailed in this work can be valuable to further wind power research and for process improvement of TSOs and market operators.
The main conclusion is that the Swedish wind power prognoses on Nord Pool are in fact aggregated wind PPs that are trade report updates rather than technical forecasts. We suggest that one option to achieve realistic public wind forecasts is to require wind BRPs or TSOs to submit actual forecasts instead of PPs throughout the entire forecast horizon. At the very least, as of today, this data should be called aggregated wind PPs rather than wind power prognosis on Nord Pool in order to avoid confusion and misuse. Most importantly, this data should not be treated as a wind power forecast by the research community, e.g. in integration studies or error analyses.