Pricing and hedging wind power prediction risk with binary option contracts

In markets with a high proportion of wind generation, high wind outputs tend to induce low market prices and, alternatively, high prices often occur under low wind output conditions. Wind producer revenues are affected adversely in both situations. Whilst it is not possible to directly hedge revenues, it is possible to hedge wind speed with weather insurance and market prices with forward derivatives. Thus combined hedges are offered to the wind producers through bilateral arrangements and as a consequence, the risk managers of wind assets need to be able to forecast fair prices for them. We formulate these hedges as binary option contracts on the combined uncertainties of wind speed and market price and provide a new analysis, based upon machine learning classification, to forecast fair prices for such hedges. The proposed forecasting model achieves a classification accuracy of 88 percent and could therefore aid the wind producers in their negotiations with the hedge providers. Furthermore, in a realistic example, we find that the predicted costs of such hedges are quite affordable and should therefore become more widely adopted by the insurers and wind generators.


Introduction
The output of a wind power facility displays substantial variability as well as stochasticity through time, such that volume risk at the asset level is of significant concern to both the operators and investors. Some assets are also operated as a merchant plant, thereby being exposed to wholesale price risk as well. The wind asset operators in these circumstances therefore, face daily revenue risks due to the variability of the product of output volumes and market prices. Furthermore, even when price risk is hedged with fixed price contracts or via financial derivatives, the hedge may only be partial, e.g. against underlying day-ahead reference prices rather than real-time prices. Thus, the prediction of wind asset revenues (prices × volumes) is a very challenging problem and furthermore, the available market instruments to manage the risks in these revenues are under-developed and limited in scope. Swap contracts do offer an option to hedge against risks and uncertainties but they fail to specify the risks properly [1]. Lucy et al. suggest modest alterations in such contracts to address these risks [2] but they do not appear to have been widely adopted. The power exchange in Germany, for example, introduced wind power futures but they have been slow to attract liquidity and are therefore expensive [3]. Thus, revenue risk is awkward to manage, and there is an adverse negative correlation between output and market price. For an energy generation company, a low market price combined with a high wind energy availability leads to excess energy production which needs to be sold at a lower price. An opposite situation arises if there is low wind availability (hence a low energy output) J o u r n a l P r e -p r o o f Journal Pre-proof of trying to predict the exact values). This prediction is carried out using a data-driven, machine learning methodology. 
Arguably, this idea reduces the model risk in joint prediction of two quantities (one of which -electricity price -is extremely volatile) significantly and yields fairer prices for hedges than model-based analytical approaches. This would reduce the model risk premium and make the product more affordable to the market participants. We call the proposed option "binary option" to emphasize its different payoff structure (which is closer to that of a binary option in financial markets), which is discussed in detail later in section 2.
Regarding the methodology for the analysis of electricity prices in general, a wide range of analytical techniques have been proposed. For example, a Nash equilibrium model is proposed in [19] for electricity price analysis. A bilevel optimization model is suggested in [20] for optimal offer-bid strategy calculation in a competitive electricity market with stochastic parameters. Olsina et al. used a stochastic simulation approach to assess the influence of wind power generation on the market prices and their investment implications [21]. Stochastic simulation has also been used to obtain the bidding and offering curves taking account of uncertainties related to load, temperature, wind speed, solar radiation and purchasing power [22]. Stochastic equilibrium models are also used in [23] for analyzing strategic behaviors in electricity markets. An adaptation of the system dynamics approach formulated the stock flow structure of an electricity market model so that a more detailed analysis could help policymakers determine the effects of market structure [24]. A hybrid simulation technique considering a soft linking approach of agent-based modelling and system dynamics elucidated the bidding behaviour of different market players [25]. A system dynamics model of a tripartite evolutionary game assessed the impact of renewable portfolio standards on the retail electricity market, wherein the results indicated reversal effects, blocking effects and over-reliance effects [26]. In another study, a stochastic model was used to track the impact of wind power uncertainty on the operational strategies of hydro-wind hybrid systems [27]. A stochastic approach was also used to assess the levelised cost of solar PV, for which a deterministic method is more usual [28].
An empirical econometric approach was used to assess the impact of increased wind generation on wholesale electricity prices and it was observed that there was some decline due to shifts in the dynamics of supply and demand [29]. Another study developed an economic cybernetic model to represent the characteristics of market operation [30]. A stochastic dynamic market model was developed in [31] to assess the deferring value of generation investments under uncertainties in liberalized power markets. However, our focus is not upon asset investments but upon effective risk management of assets via hedges in their short-term operations.
Elsewhere, machine learning techniques have been used widely in power systems studies for forecasting and optimization and we consider this class of approaches to be most suitable for the pricing of quanto options, as formulated in this paper. Chen et al. [32] analyzed short term wind power prediction using a deep learning based auto-encoder algorithm. Wind speed was forecasted using a deep belief network with genetic algorithm for Taiwan [33]. Deep neural networks with spatial features were used for short term wind forecasting in [34]. In another study, a deep residual network with bidirectional Long Short-Term Memory (LSTM) was used to forecast one hour ahead wind power [35]. Longoria et al. proposed a meta agent learner approach for subsidy-free renewable trading, using data from Nord pool and East Denmark [36]. To assess the future responses of a wind turbine, a physics inspired stacked LSTM model was used and it was concluded that such an algorithm was able to forecast better than standard deep learning techniques [37]. Temporary Local Gaussian Process (TLGP) was used to estimate forecasting uncertainty of probabilistic wind power [38]. Ensemble forecasts were used to predict the wind power on the basis of weather prediction and meteorological observations [39]. A Bayesian extreme learning machine (BELM) was used for multi classification based forecasting of clearing prices in Canada Ontario and New York market [40]. A deep learning based ensemble approach was used for crude oil price forecasting [41]. Interval Decomposition Ensemble (IDE) learning approach was proposed to forecast crude oil prices considering different forecasting horizons and data frequencies [42]. In another study a Multi-Source and Temporal Network (MSTAN) was used to forecast short term probabilistic wind power [43]. 
Several methods, such as the grey theory-based data preprocessor [44], probabilistic forecasting [45], convolutional neural networks [46] and unsupervised clustering [47], have been used for PV power forecasting. Various machine learning models such as feed forward neural networks [48] and Extreme Learning Machine (ELM) models [49] have also been used to forecast solar irradiation. Machine learning techniques have been applied for load control and forecasting [50,51], distributed generation [52], failure and fault studies [53], smart grid and smart energy infrastructure [54,55,56,57], market trading [58], electricity theft [59] as well as battery management [60].
We therefore adopt a machine learning based multi-class classification procedure for pricing the quanto-type options (referred to as binary options and defined later in Section 2.1). In turn, this can provide a basis for predicting fair prices for the joint price-volume hedges, which the wind operators need in order to negotiate with the insurers.
Since the most surprising and serious events tend to occur close to real-time, we focus on mitigating short term risks for the investor and hence on day-ahead hedging. The day-ahead auction for wholesale electricity is the most liquid trading market for power in many jurisdictions, and as such, is the natural underlying for derivatives. It determines the main component of power prices, as physically delivered, with only a smaller component adjusted in subsequent intraday trading and real-time balancing. It has also been the most heavily researched aspect of electricity pricing [61]. We therefore develop our prediction model for quanto-type hedges defined upon the price as determined from the day-ahead auction and the wind outputs as actually realized.
Specifically, the contributions of this paper can be summarized as follows. We propose a novel binary option contract which is based on both wind speed and day ahead price. We show that this contract can be priced efficiently and accurately using machine learning and that it leads to inexpensive hedging costs in realistic scenarios.
The paper proceeds as follows. In Section 2 a description of our methodology is provided, with subsections 2.1 and 2.2 covering the definition of our hedging instrument and the mechanism for pricing it using machine learning, respectively, followed by Section 3 which provides the results. Section 4 concludes. Additional information about the classifiers used and the results obtained is provided in the Appendices.

Methodology
The methodology is divided into two main sections: the binary option contract formulation and the machine-learning, multi-class classification. Together these are used to predict the associated risks and thereby estimate the fair price of the insurance contract.

Binary option contract on wind output and market price
In this section, we model the binary option which can be used to hedge the short term revenue risk for a wind energy producer, caused by (high wind, low price) or (low wind, high price) situations. This derivative gives a non-zero payoff when the values of the wind W in a particular location and the market price M fall in one of the pre-specified regions in the (W-M) plane, which correspond to the parameter settings in a parametric insurance contract. Note that the insurance company will only use parameters extrinsic to the company being insured and which can be externally validated. The market prices will be from a defined power exchange and the wind speed measurements from a defined weather monitoring service. We assume that the asset owner can convert wind speed to wind output for the revenue calculation and therefore, in what follows, we refer to W as wind output even though in practice the contract will be written on wind speed. Thus, the payoff of this proposed financial instrument is defined in abstract terms as follows.
Definition 1. Given fixed, disjoint regions $I_1, I_2, \dots, I_n \subset \mathbb{R}^2$ and fixed positive constants $Q_1, Q_2, \dots, Q_n$, for the variables $x, y$ the payoff function of the proposed derivative is defined by
$$H(x, y) = \sum_{i=1}^{n} Q_i \, \mathbf{1}_{\{(x, y) \in I_i\}},$$
so that the payoff equals $Q_i$ if $(x, y)$ falls in $I_i$ and zero if it falls in none of the regions.
Due to the nature of the payoff, derivative instruments of this type are referred to as binary options, or digital options, or 'cash or nothing' options. Numerical methods for the pricing of binary options which depend on a single source of uncertainty (rather than on two sources, as is the case here) have been discussed extensively in financial mathematics; see, e.g. [62] and the references therein.
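As a minimal sketch of Definition 1, the payoff can be evaluated as below, assuming for illustration that each region is an axis-aligned rectangle (the definition itself allows any disjoint regions). The names `Region` and `payoff`, and all numerical bounds and payouts, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Region:
    """Hypothetical rectangular region: x_lo < x <= x_hi and y_lo < y <= y_hi."""
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    payout: float  # the positive constant Q_i attached to this region

    def contains(self, x: float, y: float) -> bool:
        return self.x_lo < x <= self.x_hi and self.y_lo < y <= self.y_hi


def payoff(x: float, y: float, regions: Sequence[Region]) -> float:
    """Return Q_i if (x, y) lies in region I_i, and 0 if it lies in no region."""
    for region in regions:
        if region.contains(x, y):
            return region.payout  # regions are disjoint, so at most one matches
    return 0.0


INF = float("inf")
# Illustrative numbers only (assumed, not from the case study).
regions = [
    Region(x_lo=80.0, x_hi=INF, y_lo=-INF, y_hi=20.0, payout=100.0),  # high x, low y
    Region(x_lo=-INF, x_hi=20.0, y_lo=80.0, y_hi=INF, payout=60.0),   # low x, high y
]
```

Under these assumed bounds, a point in neither rectangle, such as (50, 50), pays nothing, which is the 'cash or nothing' behaviour described above.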
The regions $I_i$ in the (W-M) plane are defined by adverse situations that the wind power producer wishes to hedge against ('high price, low wind', 'low price, high wind', or both), and $Q_i$ represents the anticipated loss corresponding to those situations. To be able to define the price of the binary option formally, we consider a filtered probability space $(\Omega, \mathcal{F}_t, P)$ in standard notation and let $W_t$, $M_t$ be continuous time stochastic processes adapted to their natural filtration. $W_t$ and $M_t$ represent wind power production at time $t$ and day-ahead market price at time $t$, respectively.
For $T > 0$, for fixed regions $I_i \subset \mathbb{R}^2$ and fixed positive constants $Q_i$, the price at time $t = 0$ of an uncertain future payoff $H(W_T, M_T)$ is given by
$$e^{-rT}\, \mathbb{E}\left[H(W_T, M_T)\right] = e^{-rT} \sum_{i=1}^{n} Q_i \, \mathbb{E}\left[\mathbf{1}_{\{(W_T, M_T) \in I_i\}}\right] = e^{-rT} \sum_{i=1}^{n} Q_i \, P\left((W_T, M_T) \in I_i\right),$$
where the expected values are taken under the joint probability measure induced by historic time series data on the wind power and the market price. Constructing a risk-neutral pricing measure jointly on wind power and price requires a liquid market of derivative products which depend on the joint distribution of these two quantities. Prices of such derivatives would then allow for the extraction of the associated risk-neutral probabilities. In the absence of such a market, we can only use the historic measure. If products such as the one we are advocating become established for short and medium term maturities, their prices, as determined by supply and demand, will allow us to extract risk-neutral probabilities and then use them to price more advanced structured products. At the moment, this is a 'first generation product' with no information available on a joint risk-neutral measure. The risk-free interest rate $r$ is used to obtain the discounted present value. Since the time horizon $T$ in the context of our application is fairly short (e.g. one day before the day-ahead price is revealed), the discounting factor $e^{-rT} \approx 1$ and we will omit it from further discussion.
The choice of regions $I_i$ in the (W-M) plane and the constants $Q_i$ is best illustrated by an example. Let $M_x$ be the number such that $\mathrm{prob}(M_T \le M_x) = x$. Here, the probability is interpreted as an empirical probability derived from data, e.g. for 1000 available market prices, $M_{0.2}$ refers to the 200th smallest price. $W_x$ is similarly defined. We will use different choices of $x$ to define non-overlapping regions $I_i$ in the (W-M) plane, which explicitly define normal, favourable and adverse market conditions. Specifically, the regions corresponding to the two adverse market conditions (high wind speed with low market price, and low wind speed with high market price) are defined as follows:
$$I_{hl} = \{(W_T, M_T) : W_T > W_{0.8},\; M_T < M_{0.2}\}, \qquad I_{lh} = \{(W_T, M_T) : W_T < W_{0.2},\; M_T > M_{0.8}\}. \tag{2}$$
Having defined disjoint regions in the (W-M) plane (or classes) related to adverse power-price scenarios, we can complete our definitions of classes to cover all possible scenarios. $I_{ll}$ and $I_{hh}$ are defined in a similar fashion as above, respectively by (i) using $M_T < M_{0.2}$ and $W_T < W_{0.2}$ as bounds and (ii) using $M_T > M_{0.8}$ and $W_T > W_{0.8}$ as bounds for the corresponding region in the (W-M) plane. The definition of one of the remaining classes is as follows:
$$I_{mm} = \{(W_T, M_T) : W_{0.2} \le W_T \le W_{0.8},\; M_{0.2} \le M_T \le M_{0.8}\}.$$
The remaining classes, $I_{ml}$, $I_{lm}$, $I_{mh}$, and $I_{hm}$, are also defined in a straightforward fashion, using the same bounds on the two variables, viz. $M_{0.2}$, $W_{0.2}$, $W_{0.8}$, and $M_{0.8}$, as given in Table 1. Evidently, we have illustrated the construction of the so-called value-at-risk parameters for a joint outcome defined with a 0.2 chance of a lower price and a 0.2 chance of higher wind output. The joint probability will not be the product because these outcomes are not independent. Other value-at-risk parameters can be used in a similar way, depending upon the risk limits determined by the risk management team in the company. However, we suggest these limits are typical of practice, being close to a 0.05 value-at-risk level (evidently 0.04 if they were independent).
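The quantile-based class construction above can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the first letter of each label is taken to refer to wind and the second to price, and the empirical quantile is taken as the smallest observation whose empirical cumulative probability reaches x (so that, for 1000 observations, x = 0.2 picks the 200th smallest).

```python
import math
from typing import List, Sequence


def empirical_quantile(sample: Sequence[float], x: float) -> float:
    """Smallest observation v with empirical prob(value <= v) >= x."""
    ordered = sorted(sample)
    # round() guards against floating-point noise in x * n before taking the ceiling
    rank = max(1, math.ceil(round(x * len(ordered), 9)))
    return ordered[rank - 1]


def level(value: float, lo: float, hi: float) -> str:
    """'l' below the lower bound, 'h' above the upper bound, 'm' otherwise."""
    if value < lo:
        return "l"
    if value > hi:
        return "h"
    return "m"


def label_classes(wind: Sequence[float], price: Sequence[float],
                  x_lo: float = 0.2, x_hi: float = 0.8) -> List[str]:
    """Assign each (wind, price) pair to one of the nine classes ll, lm, ..., hh."""
    w_lo, w_hi = empirical_quantile(wind, x_lo), empirical_quantile(wind, x_hi)
    m_lo, m_hi = empirical_quantile(price, x_lo), empirical_quantile(price, x_hi)
    return [level(w, w_lo, w_hi) + level(m, m_lo, m_hi)
            for w, m in zip(wind, price)]


# Toy data (assumed): wind rises while price falls, so the extremes are adverse.
wind = list(range(1, 11))
price = list(range(10, 0, -1))
labels = label_classes(wind, price)
```

Changing `x_lo` and `x_hi` to 0.1 and 0.9 would give the decile-based variant without altering the rest of the procedure.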
It is also possible to use joint time series models to make point forecasts for M T and W T . However, as mentioned in the first section, to forecast the sets of adverse scenarios and their probabilities, a data-driven and non-parametric classifier approach appears to be far better suited than the time series approach with its associated model risk.
If the wind energy producer wishes to hedge the risk of $(W_T, M_T) \in I_{hl} \cup I_{lh}$, they can purchase a contract with payoff as defined in (9), with $n = 2$, $I_1 = I_{hl}$, $I_2 = I_{lh}$ and $Q_1$, $Q_2$ defined by appropriate proxies of the anticipated reduction in revenues which need to be hedged. One possible proxy for each anticipated revenue reduction is the difference between the anticipated median revenue under a normal situation (when the wind power-market price pair is in $I_{mm}$) and the anticipated lower revenues when the two quantities are in their respective extreme quantiles (either $(W_T, M_T) \in I_{hl}$ or $(W_T, M_T) \in I_{lh}$). These anticipated losses can be calculated from the classification results on the training data. While the chosen proxy for loss of revenue is intuitively justifiable and is consistent with our data-driven methodology, it is possible to base the values of $Q_i$ on other proxies, e.g. on the difference in the means of the two revenues (rather than the medians). Given values of $Q_1$ and $Q_2$ obtained as anticipated losses, the price of a binary option contract to be used as a hedge can be estimated by machine learning using multi-class classification algorithms, as demonstrated in Section 3, especially equation (10).
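One way to read the median-based proxy described above is sketched below. The function name `anticipated_loss`, the label strings and the toy revenue figures are assumptions for illustration; the paper's exact estimator may differ.

```python
from statistics import median
from typing import Sequence


def anticipated_loss(revenues: Sequence[float], labels: Sequence[str],
                     adverse: str, normal: str = "mm") -> float:
    """Median revenue in the normal class minus median revenue in an adverse class."""
    normal_rev = [r for r, c in zip(revenues, labels) if c == normal]
    adverse_rev = [r for r, c in zip(revenues, labels) if c == adverse]
    return median(normal_rev) - median(adverse_rev)


# Toy hourly revenues (price times volume) with their class labels; assumed data.
revenues = [100.0, 120.0, 110.0, 30.0, 50.0, 60.0]
labels = ["mm", "mm", "mm", "hl", "hl", "lh"]

q1 = anticipated_loss(revenues, labels, adverse="hl")  # proxy for Q_1
q2 = anticipated_loss(revenues, labels, adverse="lh")  # proxy for Q_2
```

Replacing `median` with `statistics.mean` would give the alternative mean-based proxy mentioned in the text.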
In a general situation, $I_i$ and $Q_i$ will be determined by specific hedging and risk mitigation requirements, and the proposed procedure does not depend on the exact shape of $I_i$ or the exact choice of $Q_i$. For example, one may use deciles, i.e. the set of bounds $\{M_{0.1}, W_{0.1}, M_{0.9}, W_{0.9}\}$, instead of $\{M_{0.2}, M_{0.8}, W_{0.2}, W_{0.8}\}$ as used in the paper, to model adverse scenarios. In terms of similarity with existing commercial products, these proposed options are closest in their manner of operation to parametric insurance offered by large insurers [63], which was discussed earlier in Section 1.
The next subsection illustrates how we can use the various machine learning algorithms to derive the probabilities $P((W_T, M_T) \in I_{hl})$ and $P((W_T, M_T) \in I_{lh})$.

Forecasting based on a deep learning algorithm
As outlined in 2.1, we need to estimate the probabilities of the wind power W and the market price M taking adverse combinations of values (which, in turn, corresponds to the pair (W, M) belonging to classes $I_{lh}$ and $I_{hl}$). We train an ensemble classifier, based on machine learning algorithms, to predict the probabilities of the classes to which a pair (W, M) will belong, for all 9 classes defined in the previous subsection. This allows us to price the necessary hedging instruments, i.e. the binary options defined above.
Given the widespread use of machine learning in power systems applications as discussed in Section 1, it is natural that we should look to apply it for our pricing formulation. The particular type of application we need, however, is for parametric classification to support our binary option specifications. Recall that the main purpose of defining multiple classes is to characterise joint occurrences of adverse market price-wind power production combinations (i.e. high W + low M or low W + high M classifications), which are the classes leading to adverse revenue fluctuations and need to be hedged against.
The classes and their boundary conditions are discussed in Table 4. Our designated classes are mutually exclusive and each class has a defined state and range for the parametric W and M values. The response variable comprises the classes as defined on the W and M values. The explanatory variables are numerous, depending upon the factors available to market participants, and are for example defined for the particular case study application in Section 3. The explanatory variables are chosen through trial and error, by testing the algorithm with various combinations of explanatory variables and choosing a set of variables which gave an acceptable accuracy of classification.
The classification design can be balanced or unbalanced depending on the frequency distribution in the classes. The unbalanced classification creates issues for the classifiers as the minority classes can be overlooked, thereby affecting the performance of the classifiers [64,65]. However, ensemble based machine learning algorithms combine predictions from multiple models and thereby tend to perform better in unbalanced multi-class classification problems.
Ensemble methods combine several individual classifiers in particular ways to create an averaged prediction of the classes. The ensemble classifiers work on a voting mechanism and the ensemble of classifiers gives better predictions than individual classifiers [66]. In this paper, several classifiers were tested: Extra Tree (ET), Random Forest (RF), K nearest neighbors, Adam Gradient Boosting, Light Gradient Boosting Machine (LGBM), neural network-based classifiers, Naïve Bayes, Multinomial Naïve Bayes and Support Vector based classifiers. In other words, we have compared the main machine-learning methods, which vary in their approach to classifying the dataset, to find the best method for our application, and in that respect, the ensemble classifiers were found to give the highest accuracy.
The dataset is divided into training and testing subsets, wherein the training data are used to fit the model and the testing data are used to assess the accuracy of the fitted model. The "One versus the Rest" strategy is a heuristic method, which splits the multi-class classification into one binary classification problem per class. A binary classifier is trained on each of these problems and then the predictions are made. The goal is to explore a set of internal model parameters and the process is iterative. The algorithm is fast in processing and hence, it is possible to update the model parameters at the desired frequency (e.g. daily or more often) if needed.
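The "One versus the Rest" decomposition described above can be illustrated by the target construction alone: each class gets its own 0/1 target vector, on which a binary classifier would then be trained. The function name and the toy labels are hypothetical, and the sketch deliberately stops short of fitting any model.

```python
from typing import Dict, List, Sequence


def one_vs_rest_targets(labels: Sequence[str]) -> Dict[str, List[int]]:
    """Build one binary target vector per class: 1 for that class, 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


# Toy class labels (assumed); the paper's problem has nine classes.
labels = ["hl", "mm", "lh", "mm"]
targets = one_vs_rest_targets(labels)
```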
As observed in Table 5, ETs achieve better performance than the other classifiers which were tested. Each decision tree is constructed from the original training data. At each node, a mathematical criterion is used to split the data on the best feature among a random sample of k features provided to the tree from the feature set. In the model, the "entropy" criterion is used to measure the quality of a split, with information gain as the criterion for the decision. The entropy is calculated as
$$E = -\sum_{i=1}^{c} p_i \log_2 p_i,$$
where $c$ is the number of unique class labels and $p_i$ is the proportion of samples with output label $i$. More information on how RF and ET operate can be found in [67,68]. The information gain is calculated based on the decrease in entropy and is used to calculate the feature importance. The accuracy of the model is assessed through classification accuracy:
$$\text{Classification Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}.$$
The probability for one of the classes in each prediction instance will generally be close to 1 and for all the other classes, it will be close to zero. This probability is based on soft voting. As there are several trees being generated, each of these trees predicts probabilities for the classes and finally, the ensemble averages them across the trees. This leads to a probability for each class and the predicted class corresponds to the one with the highest probability.
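The entropy criterion, the information gain of a split and the soft-voting step described above can be sketched in a few lines. This is a plain-Python illustration rather than the classifier used in the paper; the function names and the toy inputs are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of the class labels reaching a node."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(parent: Sequence[str], left: Sequence[str],
                     right: Sequence[str]) -> float:
    """Decrease in entropy obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children


def soft_vote(tree_probabilities: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-class probabilities predicted by the individual trees."""
    n_trees = len(tree_probabilities)
    classes = tree_probabilities[0].keys()
    return {c: sum(p[c] for p in tree_probabilities) / n_trees for c in classes}


def predicted_class(probabilities: Dict[str, float]) -> str:
    """The predicted class is the one with the highest averaged probability."""
    return max(probabilities, key=probabilities.get)


# Three hypothetical trees voting on two classes.
votes = [{"hl": 1.0, "mm": 0.0}, {"hl": 0.0, "mm": 1.0}, {"hl": 1.0, "mm": 0.0}]
averaged = soft_vote(votes)
```

The averaged probabilities, rather than the hard class label, are what the pricing step needs, since the option price is an expectation over class membership.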
To evaluate the performance of the classifier, the confusion matrices of the three top performing techniques were analyzed. A confusion matrix is one of the techniques often used to evaluate the performance of a classification algorithm; essentially, it provides an overview of how often the model correctly predicts the expected class. Also, to further assess the impact of input parameters, a feature importance metric was calculated for the Extra Tree classifier. The classification report gives the macro average, which is the unweighted mean per label, as well as a weighted average and the support. The precision metric indicates the ability of the classifier to avoid labelling an instance as positive when it is negative. It is calculated as
$$\text{Precision} = \frac{TP}{TP + FP},$$
where TP and FP represent true positive and false positive predictions, respectively. Recall refers to the ability to identify all positive events. It is defined as
$$\text{Recall} = \frac{TP}{TP + FN},$$
where FN represents false negative predictions. The F1 score combines the two measures: it is the harmonic mean of the recall and precision scores.
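The metrics above can be read directly off a confusion matrix, as in the following sketch. The convention that rows are true classes and columns are predicted classes, the function names and the two-class toy matrix are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def per_class_metrics(confusion: List[List[int]],
                      labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall, F1 and support per class.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(labels)
    report = {}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly another class
        fn = sum(confusion[k]) - tp                       # truly k, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)            # harmonic mean
        report[name] = {"precision": precision, "recall": recall,
                        "f1": f1, "support": float(sum(confusion[k]))}
    return report


def accuracy(confusion: List[List[int]]) -> float:
    """Correct predictions (the diagonal) divided by total predictions."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)


def macro_average(report: Dict[str, Dict[str, float]], metric: str) -> float:
    """Unweighted mean of a metric over the labels."""
    return sum(v[metric] for v in report.values()) / len(report)


# Hypothetical two-class confusion matrix.
confusion = [[8, 2], [1, 9]]
report = per_class_metrics(confusion, ["hl", "mm"])
```

With an unbalanced dataset, the per-class recall of the minority classes is often more informative than the overall accuracy.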

Input Data
The data for the analysis were collected from the Nordpool website [69]. We used the hourly data for Sweden over the period 2015-2019 (five years), cleaned for missing observations. The description of the data is given in Table 2. The data split for the training and the testing dataset is 80%-20%. The classes are as defined in Subsection 2.1 with the labels ll, lm, lh, ml, mm, mh, hl, hm, hh in the confusion matrices for the classifiers which were analyzed further. The statistical features of the data are described in Table 3. The upper and lower bounds of the specific classes, hl, lh, and mm, which we are concerned with in this example, are given in Table 4.

Classifiers
A summary of the classification accuracy of the different methods is presented in Table 5. Several classifiers were tried for the analysis; Table 5 reports the accuracies of the top 5, and the accuracies of some of the other classifiers are summarized in Appendix A. Evidently, the ensemble classifiers were able to predict the classes better and so the Extra Tree, Light Gradient Boosting classifier, and XGBoost were further analyzed. A confusion matrix of these three classifiers is given in Fig. 1. The confusion matrix indicates the classification accuracy which is given in Table 6. The diagonal elements represent the correctly identified classes. As this is an unbalanced dataset, the number of hl and lh classes is lower than other classes. Evidently, the lh and hl classes are predicted more accurately by the Extra Tree classifier than with the LGBM and XGBoost methods.
As the Extra Tree classifier was the best performing classifier, it is useful to assess the relative importance of different features for this classifier. As observed from Fig. 2, the lagged market price (d − 1) has the highest influence on the forecasting algorithm, whereas solar production and CHP production have lesser importance. The feature importance is often dependent upon circumstances. In 2018, the electricity supply mix consisted of 41% nuclear power, 39% hydropower, 10% wind power, 0.2% solar power and the rest from CHP [70]. Although the electricity supply mix is dominated by nuclear and hydropower, they act as base load and they are less important in the feature sense than wind production which is more variable and thereby creates more day-by-day price volatility. The electricity price and demand have the highest feature importance. In general, the feature importance depends on the features which are more variable in nature.
The value of the insurance contract (or alternatively, the price of the binary option) is determined as follows:
$$Q_1 \, P\left((W_T, M_T) \in I_{hl}\right) + Q_2 \, P\left((W_T, M_T) \in I_{lh}\right),$$
with the values of $Q_1$ and $Q_2$ estimated from the data as the anticipated losses described in Subsection 2.1, i.e. the difference between the anticipated revenue in the normal class $I_{mm}$ and that in $I_{hl}$ (for $Q_1$) or in $I_{lh}$ (for $Q_2$). The values for $Q_1$ and $Q_2$ are in SEK, i.e. local currency, on an hourly basis.
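As a hedged illustration of this valuation step, the price can be computed as the probability-weighted sum of the payouts. The payout figures below are the values of Q 1 and Q 2 reported later in this section; the class probabilities are purely hypothetical, and the function name `fair_price` is an assumption. The discount factor is included for completeness and defaults to 1, as in the text.

```python
import math
from typing import Mapping


def fair_price(class_probabilities: Mapping[str, float],
               payouts: Mapping[str, float],
               r: float = 0.0, T: float = 0.0) -> float:
    """Discounted expected payoff: exp(-r*T) * sum_i Q_i * P(class i)."""
    expected_payoff = sum(q * class_probabilities.get(c, 0.0)
                          for c, q in payouts.items())
    return math.exp(-r * T) * expected_payoff


payouts = {"hl": 40928.79, "lh": 28098.8}            # Q_1 and Q_2 as reported in the text
probs = {"hl": 1.0e-4, "lh": 5.0e-5, "mm": 0.6}      # hypothetical classifier output
price = fair_price(probs, payouts)
```

Only the probabilities of the hedged classes enter the price; the probabilities of the other classes (such as `mm` here) are ignored because their payout is zero.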

Forecasting the Fair Prices for the Hedges
Based on the method discussed in Subsection 2.1 above, the payoffs are calculated, expressed on a per MWh basis to reflect the fair price based on equation (1).

From our data, taking the corresponding sample averages in equation (10), and using anticipated losses calculated as outlined in Subsection 2.1, leads to $Q_1 = 40928.79$ and $Q_2 = 28098.8$ SEK/MWh (see footnote 5). It is assumed that the wind operator will consider these differences in median revenues during normal situations and extreme situations as a fair basis for the insurance payoff and that both counterparties to the contract, the wind operator and the insurer, will have market information in common to estimate the probabilities of the $(W_T, M_T)$ pair being in each of the relevant regions in equation (10). The probability of $(W_T, M_T)$ being in an extreme quantile is quite small and close to zero, leading to option prices of around 6-7 SEK/MWh or lower.
Footnote 5: As we are taking a data-driven classification approach, an average payoff value is calculated ex-ante to the classification outcomes. Note that all the terms in the payoff (including the strike price) being unknown or random at the inception of an option is not at all unusual in financial derivatives. For example, an arithmetic average Asian option on a stock with price $S_t$ has a payoff given by $\max\left(S_T - \frac{1}{T}\int_0^T S_u \, du,\; 0\right)$, where the strike price itself is path dependent.
The histogram of a year's forecasts is presented in Fig. 3 and the hourly variation is shown in Fig. 4. A table in Appendix A summarises the hourly descriptive statistics. In general, it is perhaps surprising that there is not more variation within the day. As regards the size of these prices, they are not large, with a maximum below 7 SEK/MWh compared to the average market price of 325 SEK/MWh (Table 3). In practice, much would depend upon the choice of the extreme quantile values agreed with the insurer. Nevertheless, as a price to hedge the daily extremes of revenue risk, this would appear to be quite affordable, being comparable to other transaction costs such as trading fees and use of system charges. Furthermore, we can see in Fig. 3 that there are very few instances of the hedge prices for the binary options at prices of 6-7 SEK/MWh, indicating, as expected, that these are extreme occurrences. It is also observed from the payoffs per hour (Fig. 4) that the hedge price volatility is greater during the daytime (from 11:00 until 16:00) than in the evening and early morning.
Over test data (8760 samples), we calculated the total revenue with and without hedging for each class. Here, hedging refers to the purchase of the binary option which fixes the per MWh price for the $I_{lh}$ and $I_{hl}$ classes, as mentioned in Subsection 3.2. Given that we are insuring against the worst case (wind power, market price) scenarios, we also looked separately at the total revenue for the 1% worst case revenue scenarios, both for all the classes together (87 data points) and solely for classes $I_{lh}$ and $I_{hl}$ (8 data points), with and without hedging. The results are given in Tables 7 and 8 and are summarized below. Due to the cost of the option premium, the total revenue with hedging is slightly lower than the total revenue without hedging, although the difference is only 0.19%. As anticipated, the revenue under the worst case conditions with hedging is higher than the revenue under the same conditions without hedging. The improvement of the worst-case revenue for data points in only the two extreme classes ($I_{lh}$ and $I_{hl}$) is quite significant (10.01%). When considering the total revenue of data points in all the classes together in the worst 1% quantile, hedging still brings about a 1.48% improvement in revenue. The improvement is quoted after accounting for the cost of purchasing the option contract. Finally, the absolute worst-case revenue over the entire test data improves from 14,577 SEK without hedging to 40,928 SEK with hedging. This shows the value of using the proposed binary option contracts as an insurance for controlling extreme revenue fluctuations.
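The with/without hedging comparison can be sketched as below. The cash-flow convention is an assumption made for illustration only: the premium is paid in every hour and the payout is received whenever the realized class is one of the hedged classes. The function names and all numbers are hypothetical and do not reproduce Tables 7 and 8.

```python
from typing import List, Mapping, Sequence


def hedged_revenues(revenues: Sequence[float], labels: Sequence[str],
                    premiums: Sequence[float],
                    payouts: Mapping[str, float]) -> List[float]:
    """Hourly revenue after paying the premium and receiving any option payout."""
    return [r - p + payouts.get(c, 0.0)
            for r, c, p in zip(revenues, labels, premiums)]


def worst_tail_total(values: Sequence[float], fraction: float = 0.01) -> float:
    """Total of the lowest `fraction` of the values (at least one value)."""
    k = max(1, int(len(values) * fraction))
    return sum(sorted(values)[:k])


# Toy data: four hours, two of which fall in the hedged classes.
revenues = [100.0, 20.0, 90.0, 10.0]
labels = ["mm", "hl", "mm", "lh"]
premiums = [1.0, 1.0, 1.0, 1.0]
payouts = {"hl": 50.0, "lh": 40.0}

hedged = hedged_revenues(revenues, labels, premiums, payouts)
worst_unhedged = worst_tail_total(revenues, fraction=0.25)
worst_hedged = worst_tail_total(hedged, fraction=0.25)
```

With 8760 hourly samples and the default fraction of 0.01, `worst_tail_total` sums the 87 lowest values, which matches the size of the worst-case subset quoted above.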
We summarize our approach by means of a detailed flowchart as shown in Fig. 5. The methodology starts with data collection and cleaning. The definition of classes is followed by the identification of the explanatory variables. The machine learning based classification approach (Subsection 2.2) is then used for forecasting the classes and calculating the probabilities. These values are then utilized to calculate the payoff as described in Subsection 2.1.

Conclusion
Operators of merchant wind facilities are concerned about both output and market price volatility and their risk managers will generally require them to hedge against extreme adverse events. Furthermore, with regard to investor relations, reporting on hedging practices is becoming increasingly required. Thus, forecasting the cost of hedges is an important element in the range of forecasting tasks faced by the operator. This is particularly crucial because such hedges are usually arranged as bespoke contracts with insurance companies and in this context, it is essential for the operator to negotiate with a view of the fair price in mind.
We have developed and validated a novel data-driven approach to predict the fair price of such hedges, which take the form of binary options. They are exercised if wind and market price variables occur in predefined extreme ranges, as expressed by quantiles. We compared various machine learning classifiers to predict the probability of the binary option being exercised and these demonstrated good accuracy. The fair prices were then calculated as expectations and appeared to be plausible in the case study context examined. The fair prices are in the range of 6-7 SEK/MWh indicating that they are small enough to act as realistic hedges, compared to the average market price of 325 SEK/MWh. These fair prices do of course depend upon the risk criterion as defined by the selected quantile. They turned out not to be expensive in comparison to the other trading costs, such as transaction and use of system fees, and as such, we would argue that as revenue risk becomes more important for wind generators, the use of these hedges will become more attractive. In that context, accurate forecasting of the hedge prices will be necessary to help the asset owners manage the risk and negotiate with the insurers.
Whilst the case study is illustrative of a particular set of parameters (in practice the hedges can be designed according to different degrees of extreme conditions and payoffs), we argue that the approach has wide generality and can be extended across different horizons and broader portfolios of assets.