A novel information geometry method for estimating parameters of the Weibull wind speed distribution

Accurate modelling of wind speed is very important for the assessment of the wind energy potential of a region. The two-parameter Weibull distribution is widely used to model wind speed. Many different numerical methods can be used to estimate the shape and scale parameters of the Weibull distribution function. This paper proposes the estimation of parameters based on a novel approach, the Information Geometry Method (IGM). Non-Euclidean geometry and the Riemannian metric called the Fisher metric or the information metric are used in this approach. Differential equations derived from the Fisher information matrix are solved for the Weibull statistical manifold by the shooting method. The IGM is compared with the graphical method, maximum likelihood method, method of Lysen, method of Justus, and power density method. In particular, it is shown that this approach performs better than the other estimation methods according to the power density results for the three-year period from 2012 to 2014 for Bilecik, a city in Turkey. Therefore, the IGM, as a new approach to estimating the parameters of the Weibull distribution function, can be a good alternative for the assessment of the wind energy potential.


INTRODUCTION
Wind energy has been the fastest-growing renewable energy technology in recent years. Along with the development of the wind industry, the global capacity of wind turbines reached approximately 486 GW by the end of 2016 [1]. The wind energy potential of a region can be determined before a wind conversion system is installed. The success of this determination depends on accurate wind speed modelling. The statistical properties of wind speed are important for predicting the output energy of a wind conversion system [2]. Several distribution functions for wind speed and power density analysis are available in the literature. The log-normal distribution [3][4][5][6], inverse Gaussian distribution [7], Wakeby distribution [8,9], three-parameter log-normal distribution [10], gamma distribution [11,12], two-parameter gamma distribution [13], hybrid distributions [14][15][16], three-parameter generalized gamma distribution [6,17,18], and similar distribution functions are used in energy research and other research areas. The two-parameter Weibull density function [19] is commonly used in wind resource assessment to describe wind speed as a stochastic quantity. There are many different methods for estimating the shape and scale parameters of the Weibull wind speed distribution function [19][20][21][22][23].
This paper proposes a novel method based on information geometry for estimating the parameters of the two-parameter Weibull distribution function. The information geometry method (IGM) is compared with other parameter estimation methods using 2012-2014 wind speed data for Bilecik, a city in Turkey. The paper is structured as follows. Material and methods are presented in Section 2. Section 3 describes basic concepts of information geometry, the Fisher information matrix, the IGM technique, and the Weibull statistical manifold. In Section 4, the IGM and the results of the estimation methods are presented for the calculation of the Weibull parameters. Comparisons of the results are presented and the superiority of the IGM is emphasized in Section 5.

THE WEIBULL DISTRIBUTION
The two-parameter Weibull distribution is one of the most commonly used statistical models of wind speed data. The Weibull probability density function is given by Eq. (1) [24][25][26][27][28][29]:

$$f(v) = \frac{k}{c}\left(\frac{v}{c}\right)^{k-1}\exp\left[-\left(\frac{v}{c}\right)^{k}\right], \tag{1}$$

where f(v) is the frequency or probability of the occurrence of a wind speed v, c is the Weibull scale parameter, and k is the Weibull shape parameter [30].

The cumulative Weibull distribution function F(v) gives the probability that the wind speed does not exceed the value v. It is expressed by Eq. (2) [31,32]:

$$F(v) = 1 - \exp\left[-\left(\frac{v}{c}\right)^{k}\right]. \tag{2}$$

The wind power is commonly found by the following equation:

$$P = \frac{1}{2}\rho A v^{3}, \tag{3}$$

where ρ (kg/m^3) is the air density and A (m^2) is the swept area. The mean power density for the Weibull distribution is given by Eq. (4) [33,34]:

$$\frac{P}{A} = \frac{1}{2}\rho c^{3}\,\Gamma\left(1 + \frac{3}{k}\right), \tag{4}$$

where Γ is the gamma function. At sea level and at 15 °C, the density of the air is approximately ρ_0 = 1.225 kg/m^3. The air density corrected for the height above sea level (H_m) and other location information can be found according to Eq. (5) [35]. In this paper, by applying Eq. (5) the air density ρ = 1.1235 kg/m^3 was calculated for Bilecik.
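As a minimal sketch of how the quantities of this section are evaluated in practice, the Python functions below implement the standard two-parameter Weibull density, its cumulative distribution, and the mean power density. The function names are illustrative and not taken from the paper; the usage line plugs in the MLM estimates quoted later in the text together with the air density reported for Bilecik.

```python
import math

def weibull_pdf(v, k, c):
    """Weibull probability density of wind speed v (shape k, scale c)."""
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))

def weibull_cdf(v, k, c):
    """Probability that the wind speed does not exceed v."""
    return 1.0 - math.exp(-((v / c) ** k))

def mean_power_density(k, c, rho=1.1235):
    """Mean wind power density in W/m^2 for air density rho in kg/m^3."""
    return 0.5 * rho * c ** 3 * math.gamma(1.0 + 3.0 / k)

if __name__ == "__main__":
    # MLM estimates quoted in the case study, used here only as sample inputs.
    print(mean_power_density(k=1.9236, c=2.3030))
```

The default `rho` is the value the paper reports for Bilecik; any other site would need its own corrected air density.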

METHODS FOR ESTIMATING WEIBULL PARAMETERS
There are several methods for estimating the shape and scale parameters of the Weibull distribution function in the literature. In this paper, the analysis of wind energy potential is performed using the graphical method (GM), maximum likelihood method (MLM), method of Justus (MJ), method of Lysen (ML), power density method (PDM), and a novel method based on information geometry (IGM).

Graphical method
The cumulative distribution function is used in the GM. The wind speed data are fitted by least-squares regression [36]. In this method, the cumulative distribution function is rearranged and the natural logarithm of both sides is taken twice, which gives Eq. (6):

$$\ln\left[-\ln\left[1 - F(v)\right]\right] = k \ln v - k \ln c. \tag{6}$$

A plot of ln[-ln[1-F(v)]] versus ln v is a straight line with a slope of k and an intercept of -k ln c, from which c follows. The application of the graphical method requires that the wind speed data be in the cumulative distribution format.
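The regression behind the graphical method can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' procedure: the function name `graphical_fit` is hypothetical, and the empirical cumulative frequency is taken as (i + 0.5)/n for the i-th sorted observation (counting from zero), a plotting position chosen here because the text does not specify one.

```python
import math

def graphical_fit(speeds):
    """Least-squares line through (ln v, ln(-ln(1 - F))); returns (k, c)."""
    v = sorted(s for s in speeds if s > 0)
    n = len(v)
    xs = [math.log(s) for s in v]
    # Assumed plotting position: F_i = (i + 0.5) / n for the i-th sorted value.
    ys = [math.log(-math.log(1.0 - (i + 0.5) / n)) for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx                    # slope of the fitted line
    intercept = mean_y - k * mean_x  # equals -k ln c
    c = math.exp(-intercept / k)
    return k, c
```

With binned data, the same regression would be run on the bin upper edges and the cumulative frequencies instead of on individual observations.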

Maximum likelihood method
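The MLM was proposed in [19]. This method requires extensive numerical iteration. The shape (k) and scale (c) parameters are calculated by Eqs (7) and (8), respectively:

$$k = \left[\frac{\sum_{i=1}^{n} v_i^{k}\ln v_i}{\sum_{i=1}^{n} v_i^{k}} - \frac{\sum_{i=1}^{n}\ln v_i}{n}\right]^{-1}, \tag{7}$$

$$c = \left(\frac{1}{n}\sum_{i=1}^{n} v_i^{k}\right)^{1/k}, \tag{8}$$

where v_i is the i-th observed wind speed and n is the number of all observed non-zero wind speeds. Because k appears on both sides of Eq. (7), it has to be found iteratively.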
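The iteration that the MLM requires can be sketched as a fixed-point scheme on the standard maximum likelihood equation for the Weibull shape parameter, followed by the closed-form scale. The function name `mlm_fit`, the starting value, and the damping step (averaging the old and new iterate to speed up convergence) are choices made for this sketch, not details given in the paper.

```python
import math

def mlm_fit(speeds, k0=2.0, tol=1e-10, max_iter=500):
    """Maximum likelihood estimates (k, c) from non-zero wind speeds."""
    v = [s for s in speeds if s > 0]
    n = len(v)
    logs = [math.log(s) for s in v]
    mean_log = sum(logs) / n
    k = k0
    for _ in range(max_iter):
        powers = [s ** k for s in v]
        ratio = sum(p * l for p, l in zip(powers, logs)) / sum(powers)
        k_new = 1.0 / (ratio - mean_log)  # right-hand side of the shape equation
        k_next = 0.5 * (k + k_new)        # damped update (assumption of this sketch)
        converged = abs(k_next - k) < tol
        k = k_next
        if converged:
            break
    c = (sum(s ** k for s in v) / n) ** (1.0 / k)
    return k, c
```

A root finder such as bisection on the same equation would work equally well; the fixed-point form is shown because it mirrors the way the shape equation is usually written.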
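In the MJ, the shape and scale parameters are calculated by Eqs (9) and (10), respectively:

$$k = \left(\frac{\sigma}{\bar{v}}\right)^{-1.086}, \tag{9}$$

$$c = \frac{\bar{v}}{\Gamma\left(1 + 1/k\right)}, \tag{10}$$

where σ is the standard deviation of wind speed (m/s), \bar{v} is the mean wind speed (m/s), and Γ is the gamma function. Lysen uses the same shape parameter (k) as in Eq. (9) of the MJ. In the ML, the scale parameter is obtained by Eq. (11) [22]:

$$c = \bar{v}\left(0.568 + \frac{0.433}{k}\right)^{-1/k}. \tag{11}$$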
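The moment-based estimators of Justus and Lysen can be sketched as follows. The formulas are the commonly cited empirical forms, k = (σ/v̄)^(-1.086), c = v̄/Γ(1 + 1/k) for Justus and c = v̄(0.568 + 0.433/k)^(-1/k) for Lysen; they should be treated as assumptions of this sketch, and the function names are illustrative.

```python
import math

def shape_from_moments(mean_speed, std_speed):
    """Empirical shape estimate shared by both methods (assumed form)."""
    return (std_speed / mean_speed) ** -1.086

def justus_fit(mean_speed, std_speed):
    """Justus: scale from the exact Weibull mean, c = mean / Gamma(1 + 1/k)."""
    k = shape_from_moments(mean_speed, std_speed)
    return k, mean_speed / math.gamma(1.0 + 1.0 / k)

def lysen_fit(mean_speed, std_speed):
    """Lysen: same shape, scale from an algebraic approximation."""
    k = shape_from_moments(mean_speed, std_speed)
    return k, mean_speed * (0.568 + 0.433 / k) ** (-1.0 / k)
```

Both functions need only the sample mean and standard deviation, which is why these methods involve no iteration.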

Power density method
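Akdag and Dinler [21] proposed the PDM, which estimates the shape and scale parameters with a simpler formulation.
By using Eqs (4) and (8) one obtains Eq. (12):

$$E_{pf} = \frac{\overline{v^{3}}}{\bar{v}^{3}} = \frac{\Gamma\left(1 + 3/k\right)}{\Gamma^{3}\left(1 + 1/k\right)}, \tag{12}$$

where E_pf is the energy pattern factor, that is, the ratio of the mean of the cubed wind speeds to the cube of the mean wind speed. The shape parameter can be estimated approximately by the PDM with Eq. (13):

$$k = 1 + \frac{3.69}{E_{pf}^{2}}. \tag{13}$$

The scale parameter then follows from the mean wind speed as c = \bar{v}/Γ(1 + 1/k).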
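A short Python sketch of the power density method is given below. It assumes the commonly cited form of the method, in which the energy pattern factor is the mean of the cubed speeds divided by the cube of the mean speed, the shape follows from k = 1 + 3.69/E_pf^2, and the scale from c = v̄/Γ(1 + 1/k). The function name `pdm_fit` is hypothetical.

```python
import math

def pdm_fit(speeds):
    """Power density method: returns (k, c, energy pattern factor)."""
    n = len(speeds)
    mean_v = sum(speeds) / n
    mean_v3 = sum(s ** 3 for s in speeds) / n
    e_pf = mean_v3 / mean_v ** 3            # energy pattern factor
    k = 1.0 + 3.69 / e_pf ** 2              # approximate shape parameter
    c = mean_v / math.gamma(1.0 + 1.0 / k)  # scale from the mean wind speed
    return k, c, e_pf
```

Only two sample averages are needed, which is the practical appeal of the method compared with the iterative MLM.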

Information geometry method
The main idea of information geometry is to apply the methods and techniques of non-Euclidean geometry to stochastic processes and probability theory. Information geometry indicates that treating a family of probability distributions with Euclidean geometry is not strictly correct. Galanis et al. [37] proposed a novel approach based on statistical and geometrical techniques, namely information geometry, for wave height characteristics in the North Atlantic Ocean. They presented two scenarios that work for points in the same neighbourhood using geodesics. In this paper, the use of geodesics is examined in detail.
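Families of probability distributions are described as manifolds on which geometrical objects such as distances, Riemannian metrics, curvature, and affine connections can be defined. A family of probability distributions p(x; ξ) with parameters ξ = [ξ_1, ..., ξ_n] is recognized as an n-dimensional statistical manifold S [37].

The geometrical model in a statistical manifold is defined by the Fisher information matrix, which at a point ξ is the n × n matrix G(ξ) = [g_ij(ξ)] with

$$g_{ij}(\xi) = E_{\xi}\left[\partial_i \ell(x;\xi)\,\partial_j \ell(x;\xi)\right], \qquad i, j = 1, \dots, n. \tag{15}$$

Here ∂_i denotes the partial derivative with respect to ξ_i and ℓ is the log-likelihood function, which is given by Eq. (16):

$$\ell(x;\xi) = \log p(x;\xi), \tag{16}$$

and Eq. (17) shows the expectation with respect to the distribution p:

$$E_{\xi}[f] = \int f(x)\,p(x;\xi)\,dx. \tag{17}$$

The Fisher metric or the information metric is a Riemannian metric. The properties of the model are determined by the so-called Christoffel symbols Γ^i_jk of the Levi-Civita connection with respect to the Fisher metric, which are defined by solving the following equation [41]:

$$\sum_{i=1}^{n} g_{hi}\,\Gamma^{i}_{jk} = \frac{1}{2}\left(\partial_j g_{kh} + \partial_k g_{jh} - \partial_h g_{jk}\right), \qquad h, j, k = 1, \dots, n. \tag{18}$$

The minimum distance between two elements f_1 and f_2 of a statistical manifold S is defined by the corresponding geodesic ω, which is the minimum-length curve that connects them [41]. Such a curve ω = (ω_1, ..., ω_n) satisfies the following system of second-order ordinary differential equations:

$$\omega_i''(t) + \sum_{j,k=1}^{n} \Gamma^{i}_{jk}(t)\,\omega_j'(t)\,\omega_k'(t) = 0, \qquad i = 1, \dots, n, \tag{19}$$

under the conditions

$$\omega(0) = f_1, \qquad \omega(1) = f_2.$$

In the literature, detailed information about and results of applying information geometry techniques can be found in Amari [38], Amari and Nagaoka [39], and Arwini and Dodson [40,42].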
The Weibull distribution function is given by Eq. (1) in Section 2. The family of the two-parameter Weibull distributions can be considered as a 2-dimensional statistical manifold with the parameters ξ = [k, c], k, c > 0. The log-likelihood function becomes

$$\ell(v;\xi) = \log p(v;\xi) = \log k - \log c + (k-1)\left(\log v - \log c\right) - \left(\frac{v}{c}\right)^{k}.$$
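The Fisher information matrix can be calculated by Eq. (15) [38,39]:

$$G(k,c) = \begin{pmatrix} \dfrac{6(\gamma-1)^{2} + \pi^{2}}{6k^{2}} & \dfrac{\gamma-1}{c} \\[2ex] \dfrac{\gamma-1}{c} & \dfrac{k^{2}}{c^{2}} \end{pmatrix},$$

where γ ≈ 0.577215 is the Euler constant. The Christoffel symbols of the Levi-Civita connection with respect to the Fisher metric, which are defined by solving Eq. (18), are given in Eq. (23).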
Substitution of Eq. (23) into Eq. (19) yields the geodesic system of Eq. (24), from which the shape and scale parameters can be determined numerically. NDSolve of Mathematica, based on the shooting method, is used for the solution. The boundary conditions of the differential equation system are chosen according to the values obtained from other parameter estimation methods, namely the GM and MLM, in the solution steps. The solution steps are explained in detail with sample wind speed data in Section 4.
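The Fisher information matrix of the Weibull manifold can be checked numerically. The sketch below integrates the products of the score functions against the Weibull density with a midpoint rule and compares the result with the standard closed form for the parameter order (k, c). Both the closed form and the parameter order are assumptions of this sketch, as are all function names.

```python
import math

EULER_GAMMA = 0.5772156649015329

def weibull_pdf(v, k, c):
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))

def score(v, k, c):
    """Partial derivatives of the log-likelihood with respect to k and c."""
    z = (v / c) ** k
    return (1.0 / k + math.log(v / c) * (1.0 - z), (k / c) * (z - 1.0))

def fisher_numeric(k, c, n=100_000):
    """Midpoint-rule estimate of E[score_i * score_j] for i, j in {k, c}."""
    v_max = c * 50.0 ** (1.0 / k)  # the tail beyond exp(-50) is negligible
    h = v_max / n
    g = [[0.0, 0.0], [0.0, 0.0]]
    for m in range(n):
        v = (m + 0.5) * h
        s = score(v, k, c)
        w = weibull_pdf(v, k, c) * h
        for i in range(2):
            for j in range(2):
                g[i][j] += w * s[i] * s[j]
    return g

def fisher_closed_form(k, c):
    """Standard closed form in coordinates (k, c); assumed, not from the paper."""
    b = 1.0 - EULER_GAMMA
    return [[(math.pi ** 2 / 6.0 + b * b) / k ** 2, -b / c],
            [-b / c, k ** 2 / c ** 2]]
```

Agreement between the two functions for a few (k, c) pairs is a quick sanity check before the metric is differentiated to obtain the Christoffel symbols.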

Case study and wind speed data
In Table 1, some descriptive statistics such as the maximum, mean, standard deviation, skewness, and kurtosis of the wind speed data used for the Bilecik station are presented. The coefficient of kurtosis, which measures the peakedness of a distribution, is very high for Bilecik. The probability and cumulative probability densities of the measured wind speed for Bilecik are shown in Fig. 2. These data give a very good insight into the properties of the wind speed in the selected region.
In this paper, six different methods (GM, MJ, ML, MLM, PDM, and a novel method, IGM) are used for the parameter estimation of the Weibull distribution. The shape and scale parameters were calculated by Eq. (24) in the IGM. The probability density function of the MLM has the shape parameter k = 1.9236 and the scale parameter c = 2.3030, while for the relevant GM outputs, k = 2.3588 and c = 2.6292. The minimum-length curve that gives the distance between the two distributions is a two-dimensional curve, w = (w_1, w_2), that can be obtained as the solution of the geodesic differential system of Eq. (24) under the boundary conditions

$$w(0) = (1.9236,\ 2.3030), \qquad w(1) = (2.3588,\ 2.6292).$$

This nonlinear differential system is solved numerically by the shooting method with NDSolve. The steps given in Fig. 3 are followed for the determination of the shape (k) and scale (c) parameters in the IGM. Firstly, the boundary conditions for Eq. (24) are determined from the GM and MLM. Secondly, these equations are solved by a numerical method such as the shooting method, which is used in this paper. Optimal points, which are k(0.42) and c(0.58) for the shape and scale parameters, are found from the geodesic curves for the Weibull statistical manifold. Optimal points are determined according to the golden ratio in this approach. The variations of the shape (k) and scale (c) parameters from January 2012 to December 2014 are shown in Fig. 4.
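The boundary value problem described above can be sketched in plain Python as a stand-in for NDSolve. The sketch assumes the standard closed-form Fisher metric of the Weibull family in coordinates (k, c), obtains the Christoffel symbols from it by central differences, integrates the geodesic equations with a classical Runge-Kutta scheme, and adjusts the unknown initial velocity by Newton iteration (the shooting step). All function names, step sizes, and tolerances are choices made for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329
B = 1.0 - EULER_GAMMA
A = math.pi ** 2 / 6.0 + B ** 2

def fisher_metric(k, c):
    """Assumed closed-form Fisher metric in coordinates x = (k, c)."""
    return [[A / k ** 2, -B / c], [-B / c, k ** 2 / c ** 2]]

def christoffel(k, c, h=1e-5):
    """gam[i][j][m]: Levi-Civita symbols from central differences of the metric."""
    g = fisher_metric(k, c)
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    dg = []  # dg[a][i][j] = d g_ij / d x_a
    for a in range(2):
        up, dn = [k, c], [k, c]
        up[a] += h
        dn[a] -= h
        gp, gm = fisher_metric(*up), fisher_metric(*dn)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)]
                   for i in range(2)])
    return [[[0.5 * sum(ginv[i][l] * (dg[j][l][m] + dg[m][l][j] - dg[l][j][m])
                        for l in range(2))
              for m in range(2)] for j in range(2)] for i in range(2)]

def rhs(state):
    """First-order form of the geodesic equations for state (k, c, k', c')."""
    k, c, dk, dc = state
    gam = christoffel(k, c)
    vel = (dk, dc)
    acc = [-sum(gam[i][j][m] * vel[j] * vel[m]
                for j in range(2) for m in range(2)) for i in range(2)]
    return [dk, dc, acc[0], acc[1]]

def integrate(start, velocity, steps=100):
    """Classical RK4 from t = 0 to t = 1; returns the list of states."""
    h = 1.0 / steps
    s = [start[0], start[1], velocity[0], velocity[1]]
    path = [s]
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs([s[i] + 0.5 * h * k1[i] for i in range(4)])
        k3 = rhs([s[i] + 0.5 * h * k2[i] for i in range(4)])
        k4 = rhs([s[i] + h * k3[i] for i in range(4)])
        s = [s[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(4)]
        path.append(s)
    return path

def shoot(start, target, tol=1e-10, max_iter=25, eps=1e-6):
    """Newton iteration on the initial velocity so that the curve ends at target."""
    u = [target[0] - start[0], target[1] - start[1]]  # straight-line first guess
    for _ in range(max_iter):
        end = integrate(start, u)[-1]
        r = [end[0] - target[0], end[1] - target[1]]
        if max(abs(r[0]), abs(r[1])) < tol:
            break
        jac = [[0.0, 0.0], [0.0, 0.0]]  # finite-difference Jacobian d(end)/d(u)
        for a in range(2):
            up = list(u)
            up[a] += eps
            e2 = integrate(start, up)[-1]
            for i in range(2):
                jac[i][a] = (e2[i] - end[i]) / eps
        det = jac[0][0] * jac[1][1] - jac[0][1] * jac[1][0]
        u = [u[0] + (-r[0] * jac[1][1] + r[1] * jac[0][1]) / det,
             u[1] + (r[0] * jac[1][0] - r[1] * jac[0][0]) / det]
    return integrate(start, u)
```

Calling `shoot((1.9236, 2.3030), (2.3588, 2.6292))` returns the sampled geodesic between the MLM and GM estimates quoted above; each entry holds the values of k and c and their derivatives at one value of the curve parameter t.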
Descriptive statistics of the wind power density were evaluated by using the six different methods and the measured data (Table 2). All analyses for the six methods were carried out in the MATLAB programming language and on the Mathematica platform. The IGM, which has an error rate of 0.0301%, performed better than the other methods over the three-year period in terms of mean power density (see Table 2). Although the IGM gave better results for the large data set, its error rate in the monthly analysis was higher than that of the PDM (Table 3). A wide variance of the data and a smaller number of samples decrease the accuracy of the model [43].

CONCLUSIONS
Determination of the wind energy potential depends on accurate modelling of the wind speed. Statistical properties of the wind speed are important for the prediction of the energy output in a wind conversion system. The two-parameter Weibull distribution is widely used to model the variation of wind speed.
In this paper, a novel method, the IGM, is used for parameter estimation of the two-parameter Weibull distribution. This new method is compared with other methods (GM, MJ, ML, MLM, and PDM) widely used in the literature. Over the period from January 2012 to December 2014, the IGM, which is based on a relatively new branch of mathematics, gave better results than the five other methods mentioned above. In the monthly analysis of the same period, the IGM outperformed the GM, MJ, ML, and MLM, but its error rate was higher than that of the PDM because of the smaller number of data and the higher variance. For the IGM, an annual error rate of 0.0301%, a best monthly error rate of 1.1587%, and a worst monthly error rate of 8.0470% were observed according to the power density values. In order to establish a wind energy conversion system, it is important to determine the power values correctly.
As a result, this paper has added to the literature a new approach to parameter estimation for the two-parameter Weibull distribution that uses information geometry, a non-Euclidean framework, through the Fisher information matrix.

Fig. 2. Probability and cumulative probability densities at 10 m height for the period of three years for Bilecik, Turkey.

Fig. 4. Variations of the shape and scale parameters according to the Information Geometry Method from January 2012 to December 2014.

Table 1. Statistical values of wind speed data for Bilecik

Table 2. Wind power density (W/m^2) and Weibull shape and scale parameters calculated by measured data and six different methods*