Pipeline for Annual Averaged Wind Power Output Generation Prediction of Wind Turbines Based on Large Wind Speed Data Sets and Power Curve Data



Introduction
Interest in wind energy as a main future renewable energy source has risen steadily over the past years [3]. Because wind energy is a major ingredient in reducing carbon dioxide emissions, methods for predicting wind power output are valuable tools for selecting wind turbine locations. For this purpose, one must model power curves from power curve data supplied by manufacturers, as summarized in [2]. Additionally, one needs to identify suitable wind speed distributions from large wind speed data sets [1,4]. Since these two tasks are often considered separately in the literature and are rarely found combined, Wacker, Seebaß and Schlüter proposed an abstract framework for annual averaged wind power output generation prediction of wind turbines which relies heavily on large wind speed data sets and power curve data of wind turbines [3]. However, it seems important to present the algorithmic aspects of the aforementioned article in greater detail.
For these reasons, we present in this article a complete pipeline for annual averaged wind power output generation prediction of wind turbines, which was first developed in [3]. This method relies heavily on large wind speed data sets and admits arbitrary power curve modeling techniques and arbitrary wind speed distributions. Finally, we provide one detailed example from a weather station situated in Bremen, Germany.
As already mentioned, predicting the energy produced by a wind turbine is an important topic because renewable energy sources are necessary to reduce carbon dioxide emissions. In this work, we provide details on the abstract framework of our pipeline. It consists of the following steps.
• Step 1: Since wind speed data sets may come from different sources, different pre-processing steps need to be taken into account. This includes adjusting wind speeds at different heights by so-called power laws.
• Step 2: Different power curve models might be adapted to given power curve data.
• Step 3: We choose different wind speed probability distributions to fit our processed wind speed data.
• Step 4: As our main output, we approximate integrals by finite sums to calculate semi-empirical and estimated wind power output generation prediction values numerically.
• Step 5: We suggest different goodness-of-fit measures for evaluation purposes.

Method details
Let $\{v_j\}_{j=1}^{N}$ be a time series of measured wind speeds at a certain weather station. Let $(v_k, P_k)$ be measured power curve data of a manufacturer's wind turbine prototype. These data sets form the foundation of our wind power output generation prediction algorithm. We portray the flowchart of our algorithm in Figure 1. All steps coincide with the procedure presented in our abstract. We mainly follow our preprint but add further details regarding our methods. In particular, we discuss power curve modeling and uncertainty quantification in more detail.
Step 1: Processing of wind speed data sets
In our complete pipeline, we process wind speed data sets provided by the German Weather Service (DWD) [5] and the National Centers for Environmental Information [6]. Since the two data sets differ, we have to adjust our processing steps accordingly.
Let us first consider wind speed data sets from the German Weather Service. We take a closer look at data from the weather station located in Bremen, Germany (Station ID: 00691). The data can be extracted from the corresponding ZIP archive and are contained in the text file named produkt_ff_stunde_00691.txt. The fourth column contains the measured wind speed in $\mathrm{m/s}$.
Missing data are coded as −999. We therefore have to delete these entries from our data. Looking ahead, we also need to delete zero wind speed values from this column when estimating two-parameter Weibull distributions (compare Step 3).
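The cleaning rules for the German Weather Service data can be sketched in a few lines of Python. This is only an illustration and not the authors' R or GNU Octave code: the function name `clean_wind_speeds` and the flag `drop_zeros` are hypothetical, and the sketch assumes that the wind speed column has already been read into a list of numbers in which missing values are coded as −999.

```python
MISSING_CODE = -999.0  # missing-value marker used in the wind speed column


def clean_wind_speeds(values, drop_zeros=False):
    """Remove missing entries; optionally remove zeros as well.

    drop_zeros=True corresponds to the extra cleaning needed before
    fitting a two-parameter Weibull distribution.
    """
    cleaned = [float(v) for v in values if float(v) != MISSING_CODE]
    if drop_zeros:
        cleaned = [v for v in cleaned if v != 0.0]
    return cleaned


# Hypothetical extract of a wind speed column
raw = [3.1, -999, 0.0, 4.6, 2.0, -999]
all_models = clean_wind_speeds(raw)
weibull_only = clean_wind_speeds(raw, drop_zeros=True)
```

Keeping the zero-removal optional reflects that only the Weibull fit requires strictly positive wind speeds, while the other distributions can use the full cleaned series.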
International data sets from National Centers for Environmental Information need different treatment. A short extract of such files is given below.
The ninth column contains wind speed values scaled by a factor of ten. For this reason, we have to rescale these data by dividing them by ten. Since international data sets are archived separately for every year, we must assemble the complete time series. If we want to adjust the wind speeds measured at reading height to hub height, we need so-called power laws [7,8]. If $h_r$ is the reading height and $v_r$ is the measured wind speed at reading height, the extrapolation power law for the wind speed $v$ at hub height $h$ reads

$$v = v_r \left( \frac{h}{h_r} \right)^{\alpha},$$

where $\alpha$ is an empirical coefficient depending on the location's roughness. For further details, we refer interested readers to [7,8]. Concluding this step, we provide short pseudo-code which describes our wind speed processing procedure in Algorithm 1.
Since all following steps are the same for different time series of wind speeds, we restrict our discussion and results to the case of wind speeds measured at reading height. However, all our mentioned steps can still be carried out if we apply the power law to the wind speeds at our preprocessing step 1.
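As a minimal Python sketch of the two adjustments discussed in this step, the following functions rescale wind speeds that are stored multiplied by ten and extrapolate wind speeds to hub height. The sketch assumes the usual form of the power law, $v = v_r (h / h_r)^{\alpha}$. The function names and all numerical values (heights, exponent) are hypothetical and chosen for illustration only.

```python
def rescale_by_ten(values, factor=10.0):
    """Undo the factor-of-ten scaling of archived wind speed values."""
    return [v / factor for v in values]


def to_hub_height(speeds_at_reading_height, h_r, h, alpha):
    """Extrapolate wind speeds from reading height h_r to hub height h.

    Assumes the power law v = v_r * (h / h_r) ** alpha, where alpha is an
    empirical, site-dependent coefficient.
    """
    if h_r <= 0 or h <= 0:
        raise ValueError("heights must be positive")
    ratio = (h / h_r) ** alpha
    return [v * ratio for v in speeds_at_reading_height]


# Hypothetical values: 10 m reading height, 100 m hub height, alpha = 1/7
scaled = [52, 31, 78]
speeds = rescale_by_ten(scaled)
hub_speeds = to_hub_height(speeds, h_r=10.0, h=100.0, alpha=1.0 / 7.0)
```

Because the exponent depends on the roughness of the site, it should be treated as an input of the pipeline rather than a fixed constant.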

Inputs: Wind speed data sets
Step 1: Put all archived data files together such that one complete time series is available. This is only necessary for wind speed data from National Centers for Environmental Information.
Step 2: Choose right column of wind speed data.
Step 3: Delete missing entries (coded as −999) for all wind speed probability distributions and additionally eliminate all zero entries for two-parameter Weibull distributions.
Step 4: Divide wind speeds by ten. This is solely necessary for wind speed data from National Centers for Environmental Information.
Step 5: If you adjust wind speeds to hub height, apply the above-mentioned power law to the wind speed time series.
Outputs: Prepared wind speed data sets

Step 2: Power curve modeling
A typical course of wind power curves is shown in Figure 2. We observe that wind power curves can be described by piecewise defined functions. This general approach reads

$$P_{\text{Power}}(v) = \begin{cases} 0, & v < v_{\text{cut-in}}, \\ q(v), & v_{\text{cut-in}} \le v \le v_{\text{rated}}, \\ P_{\text{rated}}, & v_{\text{rated}} < v \le v_{\text{cut-off}}, \\ 0, & v > v_{\text{cut-off}}, \end{cases}$$

where $q(v)$ is an arbitrary function on $[v_{\text{cut-in}}, v_{\text{rated}}]$. Here, $v$ represents wind speed while $v_{\text{cut-in}}$, $v_{\text{rated}}$ and $v_{\text{cut-off}}$ denote cut-in wind speed, rated wind speed and cut-off wind speed, respectively. $P_{\text{rated}}$ is the rated power output.
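The piecewise structure of the power curve can be illustrated by a short Python sketch. It assumes that the power output is zero below the cut-in and above the cut-off wind speed, follows an arbitrary function q between cut-in and rated wind speed, and equals the rated power between rated and cut-off wind speed. The helper name `make_power_curve`, the linear ramp used for q and all numerical values are hypothetical.

```python
def make_power_curve(q, v_cut_in, v_rated, v_cut_off, p_rated):
    """Return a piecewise power curve P(v) built around the model q."""
    def power(v):
        if v < v_cut_in or v > v_cut_off:
            return 0.0          # turbine does not produce power
        if v <= v_rated:
            return q(v)         # modeled part between cut-in and rated speed
        return p_rated          # plateau at rated power
    return power


# Hypothetical turbine: cut-in 3, rated 12, cut-off 25 (m/s), 3000 kW rated
V_IN, V_RATED, V_OFF, P_RATED = 3.0, 12.0, 25.0, 3000.0


def linear_ramp(v):
    """Placeholder for q; a spline or logistic model would go here."""
    return P_RATED * (v - V_IN) / (V_RATED - V_IN)


power_curve = make_power_curve(linear_ramp, V_IN, V_RATED, V_OFF, P_RATED)
```

Passing q as a function keeps the piecewise wrapper independent of the chosen power curve model, so cubic splines and logistic regression can be swapped without touching the rest of the pipeline.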
Before considering power curve modeling in more detail, we summarize the given power curve data from the manufacturer Vestas [9] in Table 1. These data allow us to determine $v_{\text{cut-in}}$, $v_{\text{rated}}$ and $v_{\text{cut-off}}$ algorithmically. The determination of $v_{\text{cut-in}}$ is portrayed in Algorithm 2.

Algorithm 2: Pseudo-code for determination of $v_{\text{cut-in}}$
Finally, our procedure for calculating $v_{\text{cut-off}}$ is given in Algorithm 4. Now, we can use these results to interpolate the power curve data points by certain power curve models. We restrict ourselves to two methods: cubic spline interpolation and logistic regression. For further models, we refer interested readers to [2].
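One possible way to extract the three characteristic wind speeds from tabulated power curve data is sketched below in Python. The rules used here are assumptions made for illustration and need not coincide with the authors' pseudo-code: the cut-in speed is taken as the first wind speed with positive power, the rated speed as the first wind speed at which the maximum power is reached, and the cut-off speed as the last wind speed with positive power. The function name and the sample data are hypothetical.

```python
def characteristic_speeds(speeds, powers):
    """Determine cut-in, rated and cut-off wind speeds from tabulated data.

    speeds and powers are equally long lists, sorted by wind speed.
    """
    if len(speeds) != len(powers) or not speeds:
        raise ValueError("speeds and powers must be non-empty and equally long")
    positive = [j for j, p in enumerate(powers) if p > 0]
    if not positive:
        raise ValueError("power curve never exceeds zero")
    p_rated = max(powers)
    j_rated = powers.index(p_rated)  # first index reaching rated power
    return {
        "v_cut_in": speeds[positive[0]],
        "v_rated": speeds[j_rated],
        "v_cut_off": speeds[positive[-1]],
        "p_rated": p_rated,
    }


# Hypothetical power curve table (wind speed in m/s, power in kW)
table_speeds = [2, 3, 4, 5, 6, 7, 8, 9]
table_powers = [0, 10, 50, 100, 100, 100, 0, 0]
result = characteristic_speeds(table_speeds, table_powers)
```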
We begin with cubic spline interpolation.
Algorithm 3: Pseudo-code for determination of $v_{\text{rated}}$
Inputs: Power curve data $(v_k, P_k)$ with wind speed series $\{v_k\}$ and power output series $\{P_k\}$
Output: $j_{\text{rated}}$ and $v_{\text{rated}}$

Algorithm 4: Pseudo-code for determination of $v_{\text{cut-off}}$

Our cubic spline interpolation model $q_{\text{cub}}$ is a piecewise cubic polynomial whose coefficients $a_l, b_l, c_l, d_l$ for $l \in \{1, \ldots, M\}$ are the cubic interpolation parameters; the vector $\theta_{\text{cub}}^{\text{all}}$ summarizes all of them. To build the linear system, the spline has to pass through all data points and its first derivatives must be continuous. We further need to define appropriate boundary conditions. For further details on cubic spline interpolation methods, we refer interested readers to Fritsch and Carlson [10] or Hyman [11]. In the scripting language R, all these variants are implemented by splinefun.
Let us now consider logistic regression. The logistic regression model function $q_{\log}$ depends on the logistic regression parameters $B, C, D, E, F$, which are summarized in $\theta_{\log}$. To apply ordinary least-squares regression, we define an optimization cost function $J$ by

$$J(\theta_{\log}) = \sum_{j} \left( q_{\log}(v_j; \theta_{\log}) - P_j \right)^2,$$

where $(v_j, P_j)$ are the given power curve data points. We refer interested readers to the optimization book of Nocedal and Wright for details on different algorithms to solve this problem formulation [12].
Step 3: Wind speed probability distribution modeling
A given time series $\{v_j\}_{j=1}^{N}$ of $N$ wind speed data points is our input for wind speed probability distribution modeling. Since we can only provide a non-exhaustive overview of this vast field, we refer interested readers to the review by Wang and co-authors [13].
We concentrate on three wind speed probability distribution models which are often applied in wind speed modeling [4]: two-parameter Weibull distributions, four-parameter Kappa distributions and five-parameter Wakeby distributions.
Let us start with the two-parameter Weibull distribution, the most commonly used wind speed probability distribution in wind speed analytics. Its parameters are often estimated by maximizing the log-likelihood function. This method has favorable statistical properties. Recently, Wacker, Kneib and Schlüter also proved that this functional has a unique global maximizer [14]. Hence, numerical optimization simplifies in this case.
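The two-parameter Weibull distribution reads

$$p_{\text{Wei}}(v) = \frac{k_{\text{Wei}}}{A_{\text{Wei}}} \left( \frac{v}{A_{\text{Wei}}} \right)^{k_{\text{Wei}} - 1} \exp\left( - \left( \frac{v}{A_{\text{Wei}}} \right)^{k_{\text{Wei}}} \right)$$

for all $v > 0$, where $A_{\text{Wei}}$ denotes the scale parameter and $k_{\text{Wei}}$ the shape parameter of the corresponding distribution [15]. The corresponding log-likelihood function is defined by

$$L(A_{\text{Wei}}, k_{\text{Wei}}) = \sum_{j=1}^{N} \ln p_{\text{Wei}}(v_j) = N \ln k_{\text{Wei}} - N k_{\text{Wei}} \ln A_{\text{Wei}} + (k_{\text{Wei}} - 1) \sum_{j=1}^{N} \ln v_j - \sum_{j=1}^{N} \left( \frac{v_j}{A_{\text{Wei}}} \right)^{k_{\text{Wei}}}.$$

We determine the first derivatives of $L$ by

$$\frac{\partial L}{\partial A_{\text{Wei}}} = - \frac{N k_{\text{Wei}}}{A_{\text{Wei}}} + \frac{k_{\text{Wei}}}{A_{\text{Wei}}} \sum_{j=1}^{N} \left( \frac{v_j}{A_{\text{Wei}}} \right)^{k_{\text{Wei}}}$$

and

$$\frac{\partial L}{\partial k_{\text{Wei}}} = \frac{N}{k_{\text{Wei}}} - N \ln A_{\text{Wei}} + \sum_{j=1}^{N} \ln v_j - \sum_{j=1}^{N} \left( \frac{v_j}{A_{\text{Wei}}} \right)^{k_{\text{Wei}}} \ln\left( \frac{v_j}{A_{\text{Wei}}} \right),$$

respectively. If we set these equations equal to zero, we obtain a nonlinear system of equations which can be solved, for example, by Newton methods [12]. We use such methods as supplied by the R-packages EnvStats [16] or fitdistrplus [17].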
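To illustrate how the maximum likelihood equations of the two-parameter Weibull distribution can be solved with a Newton method, the following Python sketch may be helpful. It is not the procedure of the R-packages named above. It assumes the standard density $p(v) = (k/A)(v/A)^{k-1}\exp(-(v/A)^k)$ and uses the fact that setting the derivative with respect to the scale parameter to zero gives $A^k = \frac{1}{N}\sum_j v_j^k$, which leaves one scalar equation for the shape parameter $k$. The function names, the starting value and the sample data are hypothetical.

```python
import math


def fit_weibull_mle(speeds, k_start=1.0, tol=1e-10, max_iter=100):
    """Estimate Weibull scale A and shape k by Newton's method on k."""
    v = [float(x) for x in speeds]
    if len(v) < 2 or min(v) <= 0.0:
        raise ValueError("need at least two strictly positive wind speeds")
    logs = [math.log(x) for x in v]
    mean_log = sum(logs) / len(v)
    k = k_start
    for _ in range(max_iter):
        powers = [x ** k for x in v]
        s0 = sum(powers)
        s1 = sum(p * l for p, l in zip(powers, logs))
        s2 = sum(p * l * l for p, l in zip(powers, logs))
        # Profile likelihood equation g(k) = 0 and its derivative
        g = s1 / s0 - 1.0 / k - mean_log
        dg = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / (k * k)
        k_new = k - g / dg
        if k_new <= 0.0:        # safeguard: keep the shape positive
            k_new = 0.5 * k
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    scale = (sum(x ** k for x in v) / len(v)) ** (1.0 / k)
    return scale, k


def log_likelihood_gradient(speeds, scale, shape):
    """First derivatives of the Weibull log-likelihood w.r.t. A and k."""
    n = len(speeds)
    ratios = [x / scale for x in speeds]
    d_scale = -n * shape / scale + shape / scale * sum(r ** shape for r in ratios)
    d_shape = (n / shape - n * math.log(scale)
               + sum(math.log(x) for x in speeds)
               - sum(r ** shape * math.log(r) for r in ratios))
    return d_scale, d_shape


# Hypothetical wind speeds in m/s (zeros and missing values already removed)
sample = [2.1, 3.4, 4.0, 5.2, 6.8, 3.9, 4.4, 7.5, 2.8, 5.0]
a_hat, k_hat = fit_weibull_mle(sample)
grad_a, grad_k = log_likelihood_gradient(sample, a_hat, k_hat)
```

Evaluating the gradient at the returned estimates is a simple sanity check: both components should vanish up to rounding errors.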
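Next, let $\theta_{\text{Kap}} = (A_{\text{Kap}}, k_{\text{Kap}}, \mu_{\text{Kap}}, h_{\text{Kap}})$ be the summarizing vector of all four parameters for the Kappa distribution. The four-parameter Kappa distribution is defined by

$$p_{\text{Kap}}(v) = \frac{1}{A_{\text{Kap}}} \left[ 1 - \frac{k_{\text{Kap}} (v - \mu_{\text{Kap}})}{A_{\text{Kap}}} \right]^{\frac{1}{k_{\text{Kap}}} - 1} \left[ F_{\text{Kap}}(v) \right]^{1 - h_{\text{Kap}}}$$

for all $v \ge 0$ with scale parameter $A_{\text{Kap}}$, shape parameter $k_{\text{Kap}}$, location parameter $\mu_{\text{Kap}}$ and second shape parameter $h_{\text{Kap}}$. Here, the cumulative distribution function is given by

$$F_{\text{Kap}}(v) = \left\{ 1 - h_{\text{Kap}} \left[ 1 - \frac{k_{\text{Kap}} (v - \mu_{\text{Kap}})}{A_{\text{Kap}}} \right]^{\frac{1}{k_{\text{Kap}}}} \right\}^{\frac{1}{h_{\text{Kap}}}}.$$

Finally, let $\theta_{\text{Wak}} = (A_{\text{Wak}}, \gamma_{\text{Wak}}, k_{\text{Wak}}, \mu_{\text{Wak}}, h_{\text{Wak}})$ be the summarizing vector of all five parameters for the Wakeby distribution. The five-parameter Wakeby distribution is then defined by

$$p_{\text{Wak}}(v) = \left[ A_{\text{Wak}} \left( 1 - F_{\text{Wak}}(v) \right)^{k_{\text{Wak}} - 1} + \gamma_{\text{Wak}} \left( 1 - F_{\text{Wak}}(v) \right)^{- h_{\text{Wak}} - 1} \right]^{-1}$$

for all $v \ge 0$ with scale parameter $A_{\text{Wak}}$, second scale parameter $\gamma_{\text{Wak}}$, shape parameter $k_{\text{Wak}}$, location parameter $\mu_{\text{Wak}}$ and second shape parameter $h_{\text{Wak}}$. Here, the cumulative distribution function $F_{\text{Wak}}$ is implicitly given by

$$v = \mu_{\text{Wak}} + \frac{A_{\text{Wak}}}{k_{\text{Wak}}} \left[ 1 - \left( 1 - F_{\text{Wak}}(v) \right)^{k_{\text{Wak}}} \right] - \frac{\gamma_{\text{Wak}}}{h_{\text{Wak}}} \left[ 1 - \left( 1 - F_{\text{Wak}}(v) \right)^{- h_{\text{Wak}}} \right].$$

In contrast to two-parameter Weibull and four-parameter Kappa distributions, this implies that five-parameter Wakeby distributions are only implicitly defined. A frequently applied method to estimate the parameters of four-parameter Kappa and five-parameter Wakeby distributions is the method of L-moments. This method is implemented in the R-package lmomco. For details on this estimation technique, we refer interested readers to Hosking's paper [18] since we use Hosking's Fortran implementation.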
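The method of L-moments starts from sample L-moments of the wind speed data. The following Python sketch shows only this first building block and does not reproduce the parameter estimation for Kappa or Wakeby distributions, nor the interface of any R-package. It assumes the commonly used unbiased estimators based on probability weighted moments $b_0, \ldots, b_3$ of the ordered sample; the function name and the sample data are hypothetical.

```python
def sample_l_moments(data):
    """Return the first four sample L-moments and the ratios t3 and t4."""
    x = sorted(float(v) for v in data)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are required")
    b = [0.0, 0.0, 0.0, 0.0]
    for j, xj in enumerate(x, start=1):
        weight = 1.0
        for r in range(4):
            # weight = (j-1)...(j-r) / ((n-1)...(n-r)) for the r-th moment
            b[r] += weight * xj
            if r < 3:
                weight *= (j - 1 - r) / (n - 1 - r)
    b = [value / n for value in b]
    l1 = b[0]
    l2 = 2.0 * b[1] - b[0]
    l3 = 6.0 * b[2] - 6.0 * b[1] + b[0]
    l4 = 20.0 * b[3] - 30.0 * b[2] + 12.0 * b[1] - b[0]
    if l2 == 0.0:
        raise ValueError("L-moment ratios are undefined for constant data")
    return {"l1": l1, "l2": l2, "l3": l3, "l4": l4,
            "t3": l3 / l2, "t4": l4 / l2}


# Small symmetric sample used as a plausibility check
moments = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

For the symmetric sample above, the first L-moment equals the mean and the third L-moment vanishes, which is a quick way to check the implementation.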

Step 4: Calculation of annual averaged wind power output generation values
The main outputs of our algorithmic procedure are semi-empirical and estimated annual averaged wind power output generation values for arbitrary power curves $P_{\text{Power}}$ and arbitrary wind speed probability distributions $p_{\text{Wind}}$. This calculation is based on approximations of finite integrals.
The semi-empirical averaged hourly wind power output generation value reads

$$P_{\text{Hourly, Semi-Emp.}} = \frac{1}{N} \sum_{j=1}^{N} P_{\text{Power}}(v_j)$$

for all wind speed data $v_j \ge 0$, $j \in \{1, \ldots, N\}$, with physical unit $\mathrm{kW\,h^{-1}}$. Finally, the semi-empirical averaged annual wind power output generation value is obtained by calculating

$$P_{\text{Ann., Semi-Emp.}} = \frac{365 \cdot 24 \cdot P_{\text{Hourly, Semi-Emp.}}}{1000000} \qquad (14)$$

with physical unit $\mathrm{GW/year}$. These values serve as comparative values for our estimations. Now, we are able to approximate estimated values based on finite integrals. Let us begin with estimated hourly averaged wind power output generation values. We calculate them by

$$P_{\text{Hourly, Th.}} = \int_{v_{\text{cut-in}}}^{v_{\text{cut-off}}} P_{\text{Power}}(v) \, p_{\text{Wind}}(v) \, \mathrm{d}v \qquad (15)$$

$$\approx \sum_{i} P_{\text{Power}}(v_i) \, p_{\text{Wind}}(v_i) \, \Delta v, \quad \Delta v = 0.1, \qquad (16)$$

that is, by a left-sided Riemann sum with step size 0.1, because wind speeds are normally measured in 0.1 steps. Other possibilities are right-sided Riemann sums, trapezoidal approximations or Simpson's rule. Since numerical integration is a vast field, we refer interested readers to the book by Davis and Rabinowitz [19]. This integral yields one hourly averaged wind power output generation value with physical unit $\mathrm{kW\,h^{-1}}$. Finally, the annual averaged wind power output generation value is given by

$$P_{\text{Ann., Th.}} = \frac{365 \cdot 24 \cdot P_{\text{Hourly, Th.}}}{1000000} \qquad (17)$$

and the physical unit of annual averaged wind power output generation values reads $\mathrm{GW/year}$.
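The calculations of this step can be sketched in Python as follows. The sketch assumes that the semi-empirical hourly value is the mean of the power curve evaluated at all measured wind speeds, that the estimated hourly value is the integral of power curve times wind speed density approximated by a left-sided Riemann sum with step size 0.1, and that annual values follow by multiplying with 365 · 24 and dividing by 1000000. All function names are hypothetical, and the constant power curve and uniform density serve only as an easily checked example.

```python
def semi_empirical_hourly(power_curve, wind_speeds):
    """Mean power output over all measured wind speeds."""
    if not wind_speeds:
        raise ValueError("wind speed series is empty")
    return sum(power_curve(v) for v in wind_speeds) / len(wind_speeds)


def estimated_hourly(power_curve, density, v_min, v_max, step=0.1):
    """Left-sided Riemann sum of power_curve(v) * density(v) on [v_min, v_max]."""
    n_steps = int(round((v_max - v_min) / step))
    total = 0.0
    for i in range(n_steps):
        v = v_min + i * step   # left end point of the i-th sub-interval
        total += power_curve(v) * density(v)
    return total * step


def annual_from_hourly(p_hourly):
    """Scale an hourly averaged value to an annual value."""
    return 365 * 24 * p_hourly / 1_000_000


# Toy example: constant 1000 kW output and a uniform density on [0, 10]
def constant_power(v):
    return 1000.0


def uniform_density(v):
    return 0.1


hourly_emp = semi_empirical_hourly(constant_power, [3.0, 5.5, 8.1])
hourly_est = estimated_hourly(constant_power, uniform_density, 0.0, 10.0)
annual_est = annual_from_hourly(hourly_est)
```

Replacing the left-sided sum by a trapezoidal rule or Simpson's rule only changes `estimated_hourly`; the remaining functions stay the same.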
Step 5: Goodness-of-fit measures and uncertainty quantification
Since we fit different models to the same data, we face the problem of comparing them. Coefficients of determination are applied to compare parametric models. Let $v_i \in [v_{\text{cut-in}}, v_{\text{cut-off}}]$ be all measured wind speeds which are larger than the cut-in wind speed $v_{\text{cut-in}}$ and smaller than the cut-off wind speed $v_{\text{cut-off}}$. Denote empirical wind speed probabilities by $p_{\text{Emp.}}(v_i)$ and estimated wind speed probabilities of certain wind speed distribution models by $p_{\text{Wind}}(v_i)$. The mean of all empirical wind speed probabilities is represented by $\bar{p}_{\text{Emp.}}$.
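The coefficient of determination reads

$$R^2 = 1 - \frac{\sum_{i} \left( p_{\text{Emp.}}(v_i) - p_{\text{Wind}}(v_i) \right)^2}{\sum_{i} \left( p_{\text{Emp.}}(v_i) - \bar{p}_{\text{Emp.}} \right)^2},$$

where summations are performed over all measured wind speeds which are larger than $v_{\text{cut-in}}$ and smaller than $v_{\text{cut-off}}$.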
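A minimal Python sketch of this goodness-of-fit measure is given below. It assumes the standard definition of the coefficient of determination, namely one minus the ratio of the residual sum of squares to the total sum of squares of the empirical probabilities. The function name and the probability values are hypothetical.

```python
def coefficient_of_determination(p_emp, p_model):
    """R^2 between empirical and modeled wind speed probabilities."""
    if len(p_emp) != len(p_model) or not p_emp:
        raise ValueError("inputs must be non-empty and equally long")
    mean_emp = sum(p_emp) / len(p_emp)
    ss_res = sum((e - m) ** 2 for e, m in zip(p_emp, p_model))
    ss_tot = sum((e - mean_emp) ** 2 for e in p_emp)
    if ss_tot == 0.0:
        raise ValueError("empirical probabilities are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical probabilities at three wind speeds between cut-in and cut-off
empirical = [0.1, 0.2, 0.3]
r2_perfect = coefficient_of_determination(empirical, [0.1, 0.2, 0.3])
r2_mean_only = coefficient_of_determination(empirical, [0.2, 0.2, 0.2])
```

A value close to one indicates that the fitted distribution reproduces the empirical probabilities well, whereas a model that only predicts their mean yields a value of zero.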
We discuss the error analysis for two-parameter Weibull distributions in more detail. Our analysis relies on Taylor's book [20]. Our starting point is (15). Assume both functions $p_{\text{Wei}}$ and $P_{\text{Power}}$ to be uncertain. Here, the wind speed probability distribution function is the two-parameter Weibull distribution. Assume that the variables $x_1, \ldots, x_n$ are measured with uncertainties $\delta x_1, \ldots, \delta x_n$ and that these values are used to compute a function value $f(x_1, \ldots, x_n)$. Applying formula (3.48) from [20] for the uncertainty $\delta f$ of $f$ yields the lower bound error of $p_{\text{Wei}}$.

Since one main goal of this article is the prediction of annual averaged wind power output generation values, absolute differences of such values are suitable comparative measures. The absolute difference between semi-empirical and estimated annual averaged wind power output generation values reads

$$\Delta P_{\text{Values}} = \left| P_{\text{Ann., Semi-Emp.}} - P_{\text{Ann., Th.}} \right|. \qquad (20)$$

Step 6: Summary of results
All obtained data are summarized in one file. We list the important results that one might want to access.

6 Reading height
7 Number of data
8 Mean wind speed
9 Standard deviation of wind speed data
10 Minimum of wind speed data
11 Maximum of wind speed data
12 $k_{\text{Wei}}$
13 $A_{\text{Wei}}$
14 Semi-empirical annual averaged power output generation values
15 Estimated annual averaged power output generation values by Weibull distributions
16 Errors of Weibull estimates
17 Absolute differences between semi-empirical values and Weibull estimates
18 Estimated annual averaged power output generation values by Kappa distributions
19 Absolute differences between semi-empirical values and Kappa estimates
20 Estimated annual averaged power output generation values by Wakeby distributions
21 Absolute differences between semi-empirical values and Wakeby estimates
These data are saved in one file named Results_01.txt . A reduced version of collected data is saved in one file named Results_02.txt .

Example: Bremen, Germany
We first summarize some important data regarding weather station no. 00691 located at Bremen, Germany in Table 3 .
These data are taken from a metadata file which accompanies the weather station data file. After calculation, we obtain the results summarized in Table 4.

Code availability and data availability
The R [21] and GNU Octave [22] codes can be downloaded from https://github.com/bewa87/2020-Energy-AAPOGFWT. Data for the presented wind turbine from Vestas can be obtained from https://www.wind-turbine-models.com/turbines/7-vestas-v112-onshore#datasheet. Wind speed data for all German weather stations are available under [5] and wind speed data for worldwide weather stations can be accessed under [6].