Methods for statistical analysis of the results of monitoring the avifauna on the territory of wind farms

Abstract. An algorithm is proposed for the statistical processing of the results of monitoring the dynamics of ornithocomplexes on the territory of wind farms. Solutions to the most frequently encountered problems in the analysis of bird migration in the wind farm zone are considered: 1) An algorithm has been developed for the primary statistical processing of information on the number of birds of various species, their flight altitude and the time of their stay in the zone of interaction with turbines, recorded by two monitoring methods: the route census method and observations in accordance with the recommendations of the Scottish Natural Heritage Foundation. 2) The features of applying correlation and regression analysis are considered; these make it possible to determine the dependence of the number of birds on a number of factors using the Student, Pearson and Fisher criteria in the presence of strong noise. 3) An algorithm of statistical analysis is proposed that uses a trend approach based on the Student, Irwin, Durbin-Watson, Pearson and Fisher criteria. The statistical methods considered were tested on the results of a migratory bird census on the territory of the Prymorsk-1 wind farm, located on the coast of the Sea of Azov, obtained by a group of researchers led by V Siokhin and P Gorlov.

Introduction
The intensive development of wind energy has a significant impact on the avifauna. Numerous observations by researchers testify to the need to expand research on the dynamics of ornithocomplexes on the territory of wind farms with the involvement of information technologies and statistical methods of information processing [1-3]. A large number of studies have been devoted to the impact of wind energy on ornithological complexes [4-7]. One of the main tasks of monitoring birds on the territory of wind farms is to obtain data on the dynamics of their quantitative and species characteristics, as well as the height and direction of migration in different seasons [8,9].
The results of long-term observations over several years make it possible to predict the impact of wind power on the ecological situation associated with changes in the avifauna. Extensive information on the interaction of birds with wind turbines in various regions has been accumulated over several decades of processing the results of monitoring wind farm territories and adjacent regions [1-3, 9, 10]. The use of various observation methods expands the possibility of predicting undesirable impacts on the avifauna and of taking effective measures to reduce them. The recommendations of the Scottish Natural Heritage Foundation (SNH) [5, 11-13] are often used; they provide for the fulfillment of a number of requirements. In particular, monitoring should be carried out in several areas whose landscape and biotope characteristics adequately reflect the entire territory of the wind farm. A sufficient amount of time is allocated for observations, covering the main periods of bird migration in the region. Generally, research should be carried out in spring and autumn during bird migration, as well as during the breeding period and in winter. Observations are carried out simultaneously by several researchers [8,11,12]. The parameters recorded during monitoring include the number of birds of each species, their flight altitude and the time of their stay in the zone of interaction with turbines.

The results of observations in accordance with the SNH recommendations make it possible to determine the activity coefficient $K_{Risk(k)(j)}$ of birds of species $j$ in the risk zone (RZ) of collision with turbines at site $k$, which is needed to predict the interaction of birds with rotor blades. Its value is given by formula (1) [13], where $n_{Risk(k)(j)}$ is the number of birds of species $j$ in group $i$ recorded at site $k$ in the RZ during the time interval $t_{Risk(k)(j)}$.
In a similar way, the activity of birds can be characterized for other heights, for individual monitoring areas and for the wind farm territory as a whole. For example, the activity coefficient of birds at site $k$ is determined by formula (2).

An alternative way to obtain information about the behaviour of birds on the territory of the wind farm is the route census method (RAM) [11,14]. In this case, all birds, both sitting on the ground and in flight, are taken into account. For birds in flight, the number of individuals of each species, the direction of migration, the height and the type of flight (transit, forage, breeding) are determined. Generally, only birds located no more than 500 m to the left or right of the observer's direction of movement along the selected route are recorded.
The effectiveness of the analysis of the recorded data largely depends on the adequacy of statistical samples and the correctness of the application of statistical methods. The peculiarity of the processing of primary information in ornithology is associated with the presence of a number of objective reasons, which often make it difficult to obtain reliable conclusions. Therefore, the creation of algorithms that ensure the correct processing of observation results is of great practical importance. This research paper is devoted to the description of the methods of mathematical statistics adapted to the study of the dynamics of ornithocomplexes on the territory of the wind farms.
2. The purpose of the work and setting the task
The purpose of the paper is to develop an algorithm for the use of statistical methods in the analysis of the results of monitoring the avifauna on the territory of wind farms in solving the following problems: (i) Primary statistical processing of the observation results obtained by the SNH [8,12,13] and RAM [8,9,14] methods. (ii) Correlation and regression analysis to determine the dependence of the number of birds on various factors using the Student, Pearson, Fisher and Irwin criteria. (iii) Trend analysis of the results of monitoring the wind farm territory based on the Student, Irwin, Durbin-Watson, Pearson and Fisher criteria.

Primary statistical analysis of the observation results
The number of birds recorded during monitoring on the territory of wind farms varies from several dozen to several thousand individuals or more. Therefore, the information is processed using computer technologies based on mathematical statistics [1,9,11]. In this section, we focus on the traditional methods of primary statistical processing of observational results that are usually used in the study of avifauna on the wind farm territory.

The following parameters serve as objects of statistical analysis on the wind farm territory: the number of birds, the number of species, the time spent at the registration site, and the activity coefficients of individuals of various species at different heights, determined by formulas (1), (2). Let $X = x_1, x_2, \ldots, x_n$ be the series of values obtained during the registration of individuals at one of the registration sites over $n$ days. The scatter of each $x_i$ about the average value $x_a = \sum x_i / n$ is estimated using the empirical variance $s_x^2$. The standard deviation is taken as $s_x = \sqrt{s_x^2}$. Statistical data of ornithological research are distinguished by large values of $s_x$; therefore, in some cases (for example, in the graphical interpretation of the results), a smoothing operation is used. The easiest way to smooth is to convert the original data to new values using the formula

$$x'_i = \frac{x_{i-1} + x_i + x_{i+1}}{3}, \quad i = 2, \ldots, n-1. \qquad (3)$$

The weak point of smoothing is the decrease in the number of elements of the original row statistics, which becomes $n-2$ after a single smoothing. Along with the simple operation (3), where the averaging is performed over three elements, other methods of linear and nonlinear transformation are used, which are widely applied in economic and sociological research [15].
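The primary quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the daily counts are hypothetical, and the empirical variance is assumed to use the $n-1$ denominator that the paper uses later for $s_y^2$.

```python
from math import sqrt


def mean(xs):
    """Average value x_a of a series."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Empirical variance s_x^2 (assumed n - 1 denominator)."""
    x_a = mean(xs)
    return sum((x - x_a) ** 2 for x in xs) / (len(xs) - 1)


def smooth3(xs):
    """Three-element moving average; the result has n - 2 elements."""
    return [(xs[i - 1] + xs[i] + xs[i + 1]) / 3 for i in range(1, len(xs) - 1)]


# Hypothetical daily counts at one registration site (not data from the paper).
counts = [12, 30, 9, 45, 21, 18, 33]
x_a = mean(counts)
s_x = sqrt(sample_variance(counts))
smoothed = smooth3(counts)
print(x_a, s_x, smoothed)
```

The smoothed series is shorter than the original by two elements, which is the loss of data mentioned in the text.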
In many cases, monitoring results obtained in different situations have to be compared, for example data from two sites or from different seasons of the year. Let $X = x_1, x_2, \ldots, x_{n_x}$ and $Y = y_1, y_2, \ldots, y_{n_y}$ be two samples consisting of $n_x$ and $n_y$ elements, respectively. Let us denote the sample means as $x_a$, $y_a$. The hypothesis about the equality of mathematical expectations is tested at a given significance level $q$, assuming that the random variable distribution law is known. Various methods are used to identify the distribution law [15]. As an example, let us consider the use of the Pearson's criterion [16] to establish the normal law of data distribution for the sample $X = x_1, x_2, \ldots, x_n$. Let us divide the sample into $m$ equal intervals of width $h = (x_n - x_1)/m$. Let us denote the number of elements in the $k$-th interval as $n_{xk}$. The Pearson criterion is calculated by the formula

$$\chi^2 = \sum_{k=1}^{m} \frac{(n_{xk} - n'_k)^2}{n'_k}, \qquad (4)$$

where $n'_k = h n_x p_k / s_x$ is the theoretical number of elements in the $k$-th interval corresponding to the normal distribution law for the analyzed sample, $p_k$ is the probability that the random variable is in the $k$-th interval, and $x_k$ is the average value of the elements in the $k$-th interval. If the value of the $\chi^2$-criterion calculated by formula (4) is less than the critical value $\chi^2_{kr}$ for $m-3$ degrees of freedom at a given significance level $q$, then the sample obeys the chosen law. For $\chi^2 > \chi^2_{kr}$, the hypothesis that the normal distribution law can be used is rejected. The normal distribution law is characterized by two parameters: the mathematical expectation $a$ and the general variance $\delta^2$. In practice, the average value $x_a$ is used instead of $a$, and the empirical variance $s_x^2$ is used instead of the general variance.
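The Pearson check can be illustrated with the following sketch. It rests on two assumptions that are not stated in the paper: $p_k$ is evaluated as the standard normal density at the standardized interval midpoint, and the midpoint is used in place of the interval average $x_k$ so that empty intervals cause no problem. The function name and both samples are hypothetical.

```python
from math import exp, pi, sqrt


def pearson_chi2_normal(xs, m):
    """Chi-square statistic of xs against a fitted normal law, m equal intervals."""
    n = len(xs)
    x_a = sum(xs) / n
    s_x = sqrt(sum((x - x_a) ** 2 for x in xs) / (n - 1))
    lo, hi = min(xs), max(xs)
    h = (hi - lo) / m  # interval width; requires hi > lo
    # Observed counts n_xk; the largest value is placed in the last interval.
    observed = [0] * m
    for x in xs:
        observed[min(int((x - lo) / h), m - 1)] += 1
    chi2 = 0.0
    for k in range(m):
        mid = lo + (k + 0.5) * h                 # midpoint used in place of x_k
        u = (mid - x_a) / s_x
        p_k = exp(-u * u / 2) / sqrt(2 * pi)     # assumed: standard normal density
        expected = h * n * p_k / s_x             # n'_k = h * n_x * p_k / s_x
        chi2 += (observed[k] - expected) ** 2 / expected
    return chi2


# Hypothetical samples: one roughly bell-shaped, one strongly skewed.
bell = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 10]
print(pearson_chi2_normal(bell, 4), pearson_chi2_normal(skewed, 4))
```

With $m = 4$ the statistic is compared with the critical value for $m - 3 = 1$ degree of freedom, which the paper takes as 3.8 at the 5% level. For samples this small the result should be treated with caution.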
During the primary processing of row statistics, it is recommended to check for anomalous values caused by technical errors in obtaining the data or in processing the original information. Erroneous data can be rejected by the Irwin's method. The method is based on determining the coefficients

$$\lambda_i = \frac{|x_i - x_{i-1}|}{s_x}, \quad i = 2, \ldots, n. \qquad (5)$$

The values of $\lambda_i$ are compared with the critical parameter $\lambda_a$. If $\lambda_i > \lambda_a$, then the $i$-th element of the row statistics is rejected. The values of $\lambda_a$ for the 5% significance level, depending on the sample size, are presented in table 1. It should be noted that a rejected element may not be the result of a measurement error but may reflect the influence of an unknown factor. Mathematics cannot settle this question; the decision on rejection is made by the researcher.

The hypothesis about the equality of two mathematical expectations is tested using the Student's T-test [17]

$$T = \frac{x_a - y_a}{s\sqrt{1/n_x + 1/n_y}}, \qquad (6)$$

where $s = \sqrt{\left((n_x - 1)s_x^2 + (n_y - 1)s_y^2\right)/(n_x + n_y - 2)}$. If $|T|$ is less than the critical value $T_{kr}$ for the significance level $q$ and $\upsilon = n_x + n_y - 2$ degrees of freedom, then the hypothesis of equality of means is accepted. For $|T| > T_{kr}$, the hypothesis is rejected. The sample variances $s_x^2$, $s_y^2$, which characterize the scatter of the results of specific measurements, generally differ; therefore, before using formula (6), one should make sure that the general variances are equal, $\delta_x^2 = \delta_y^2$. The hypothesis about the equality of general variances can be tested using the Fisher criterion

$$F = \frac{s_x^2}{s_y^2}, \qquad (7)$$

where $s_x^2$ denotes the larger of the two variances. If the value of $F$ is less than the critical value $F_{kr}$ at a given significance level, then the general variances are considered equal. Otherwise, the hypothesis of equality of variances is rejected. The homogeneity of variances for a large number of samples is checked using the Duncan method and other methods.
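The three checks in this paragraph can be sketched as follows. The sketch assumes that the Irwin coefficients are computed between consecutive elements in their recorded order, that formula (6) is the usual pooled two-sample statistic, and that the Fisher ratio places the larger variance in the numerator. The function names and the site data are hypothetical, and critical values still have to be taken from tables.

```python
from math import sqrt


def variance(xs):
    """Empirical variance with the n - 1 denominator."""
    x_a = sum(xs) / len(xs)
    return sum((x - x_a) ** 2 for x in xs) / (len(xs) - 1)


def irwin_coefficients(xs):
    """lambda_i = |x_i - x_(i-1)| / s_x for i = 2..n."""
    s_x = sqrt(variance(xs))
    return [abs(xs[i] - xs[i - 1]) / s_x for i in range(1, len(xs))]


def fisher_f(xs, ys):
    """Ratio of the larger empirical variance to the smaller one."""
    vx, vy = variance(xs), variance(ys)
    return max(vx, vy) / min(vx, vy)


def student_t(xs, ys):
    """Absolute two-sample T statistic with pooled standard deviation s."""
    nx, ny = len(xs), len(ys)
    s = sqrt(((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2))
    return abs(sum(xs) / nx - sum(ys) / ny) / (s * sqrt(1 / nx + 1 / ny))


# Hypothetical bird densities per km^2 at two sites (not data from the paper).
site_a = [4.1, 3.8, 5.0, 4.4, 4.7, 3.9]
site_b = [4.0, 4.6, 3.7, 4.9, 4.2, 4.5, 4.3]
print(max(irwin_coefficients(site_a)), fisher_f(site_a, site_b), student_t(site_a, site_b))
```

A typical order of use is: reject outliers with the Irwin coefficients, confirm homogeneity of variances with `fisher_f`, and only then compare means with `student_t`.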
As an example, we will make a primary statistical analysis of the monitoring results at three vantage points (VP) belonging to the territory of the Prymorsk-1 wind farm. The site areas of VP1, VP2 and VP3 were 0.50 km², 0.86 km² and 1.23 km², respectively (figure 1). Observations were carried out during the spring bird migration in 2017 [11]. The results of daily registration of the total number of birds $n^{(k)}$ per 1 km² at site $k$ that flew in transit during the spring migration period are presented in table 2. At the first stage, we reject possible erroneous measurements using the Irwin method. According to table 1, Irwin's critical value $\lambda_a$ for the 5% significance level with the number of measurements $n = 7$ is about 2.4. The coefficients $\lambda_i$ for all row statistics in table 2 are much less than the critical value. The exception is the measurement $n^{(2)} = 1.26$, carried out on April 3, 2017, for which $\lambda_i$ turned out to be 3.1.
The variances $s_x^2$, standard deviations $s_x$ and Pearson's criteria $\chi^2$ calculated from the data of table 2, with the measurement $n^{(2)} = 1.26$ obtained on April 3, 2017 excluded, are presented in table 3. The values of $\chi^2$ were calculated for four intervals: $m = 4$. The Pearson's criterion is recommended for large samples. Therefore, the obtained data should be treated with caution.
The critical value of the Pearson criterion for the 5% significance level at $m - 3 = 1$ ($m = 4$) degrees of freedom is 3.8. The calculated value $\chi^2 = 6.7$ for the first site is greater than the critical value, so the first row statistics do not obey the normal distribution law.
Checking the homogeneity of the variances of the number of birds at different sites using the Fisher's formula (7) gave the following results for the 5% significance level: $F_{23} = 1.4$ at $F_{kr} = 5.0$, $F_{13} = 7.6$ at $F_{kr} = 4.3$, $F_{12} = 10.4$ at $F_{kr} = 5.0$. Here, the numerical indices in the notation of $F$ denote the numbers of the sites being compared. Only for the second and third sites is $F$ less than the critical value, so only their variances can be considered homogeneous. For these two sites, the Student's criterion $T$ calculated by formula (6) is 0.0088, and the critical value $T_{kr}$ for the 5% significance level at 11 degrees of freedom is 2.2. Since the value of $T$ is many times less than the critical value $T_{kr}$, it can be argued that the mathematical expectations for the number of birds at the second and third sites are the same.

Correlation and regression analysis
Methods of correlation and regression analysis are a powerful tool for identifying relationships between different parameters or determining the influence of one factor on another. Let us consider two samples $X$, $Y$ with the same number of elements $n$. The presence of a linear correlation between them is determined by the correlation coefficient

$$r_{xy} = \frac{\sum (x_i - x_a)(y_i - y_a)}{(n-1)\, s_x s_y}, \qquad (8)$$

where $s_x = \sqrt{s_x^2}$, $s_y = \sqrt{s_y^2}$ are the standard deviations. The significance of the coefficient (8) is checked using the Student's T-test for $\upsilon = n - 2$ degrees of freedom:

$$T = \frac{|r_{xy}|\sqrt{n-2}}{\sqrt{1 - r_{xy}^2}}. \qquad (9)$$

If the value of $T$ is greater than the critical value $T_{kr}$ at the chosen significance level $q$, then there is a relationship between the parameters $X$ and $Y$. Otherwise, the dependence is considered not established. The existence of a correlation dependence allows us to represent it in the form of a linear regression equation

$$y = a + bx, \qquad (10)$$

where $a = y_a - r_{xy} x_a s_y/s_x$, $b = r_{xy} s_y/s_x$. The coefficients $a$ and $b$ can also be found by the least squares method (LSM). The LSM also allows us to find the regression equation for linear and nonlinear multifactorial models in a more general form (11). The adequacy of the regression equations (10), (11) is checked using the Fisher criterion, based on a comparison of two variances. The first, $s_y^2 = \sum (y_i - y_a)^2/(n-1)$, determines the dispersion of the observation results $y_i$ about the average value $y_a$. The second, the variance of adequacy $s_{ad}^2 = \sum (y_i - y(x_i))^2/(n-2)$, characterizes the deviation of the recorded data $y_i$ from the values calculated using the regression equation at $x = x_i$:

$$F = \frac{s_y^2}{s_{ad}^2}. \qquad (12)$$

If the value of $F$ is greater than the critical value $F_{kr}$ at a given significance level $q$, with $n-1$ degrees of freedom for the variance $s_y^2$ and $n-2$ for the variance $s_{ad}^2$, then the mathematical models (10), (11) adequately describe the situation under study. Otherwise, the equations are not adequate.
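A compact sketch of the single-factor case follows: it computes the correlation coefficient, its Student statistic, the regression coefficients and the Fisher adequacy ratio. The formulas are the standard ones, assumed here to match (8)-(10) and (12); the function name and the data are hypothetical.

```python
from math import sqrt


def linear_regression_report(xs, ys):
    """Return r_xy, T, a, b and F for the model y = a + b*x."""
    n = len(xs)
    x_a, y_a = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - x_a) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_a) ** 2 for y in ys) / (n - 1))
    r = sum((x - x_a) * (y - y_a) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    # Student statistic for the correlation; undefined for |r| = 1.
    t = abs(r) * sqrt(n - 2) / sqrt(1 - r * r)
    b = r * s_y / s_x
    a = y_a - b * x_a
    # Variance of adequacy: scatter of y_i about the regression line.
    s2_ad = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    f = s_y ** 2 / s2_ad
    return {"r": r, "T": t, "a": a, "b": b, "F": f}


# Hypothetical factor values and bird activity (not data from the paper).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
report = linear_regression_report(xs, ys)
print(report)
```

For a single-factor linear model the two criteria are linked: $F = (n-2)/\left((n-1)(1 - r_{xy}^2)\right)$, so both are functions of $r_{xy}$ and $n$ and differ only in how strict their critical values are.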
As an example, we will study the possibility of the existence of a correlation dependence between the activity coefficient $K$ and a second recorded parameter (table 4). The correlation coefficient $r_{xy}$ and the Student's T-test calculated by formulas (8), (9) are 0.92 and 5.2, respectively. The critical value $T_{kr}$ at 5 degrees of freedom for the 5% significance level is 2.57. Since the Student's criterion is greater than the critical value $T_{kr}$, the relationship between the statistical samples in table 4 exists and we can proceed to the construction of the regression equation (10). In the case under consideration, it has the form (13). The Fisher criterion calculated by formula (12) is 5.3, which is greater than the critical value $F_{kr} = 4.95$. The Student's and Fisher's criteria in this case lead to the same conclusion: the mathematical model (13) is adequate.

Trend analysis
Let us pay attention to a situation often encountered in ornithology, when the measured Student's criterion is comparable to or somewhat less than its critical value. In this case, it is recommended to use the following methods of processing observations:
- evaluate the significance of the regression coefficient $b$ in equation (10),
- apply the Fisher criterion,
- apply trend analysis, which is commonly used in economic and sociological research [15].
Nevertheless, the trend analysis algorithm, in our opinion, can be extended to the results of ornithological research with some reservations. Of practical interest, for example, is the assessment of whether the number of birds in a given region changes over time under the influence of anthropogenic factors.
The main task of the trend analysis is to identify the tendency of one factor to change as another one changes. The mathematical side of the trend model is also expressed by formulas like (10), (11). When identifying a trend in ornithological research, we single out two components: the main one, which determines the development tendency and is called the trend, and the random one $e_i$, responsible for the deviation from the trend. The deviation $e_i$ is the difference between the measurement result $y_i$ and the value of functions (10), (11): $e_i = y_i - y(x_i)$. A necessary condition for successful data processing using the trend is the fulfillment of the following requirements [15]:
- the deviations $e_i$ are random;
- different deviations $e_i$ do not depend on each other, i.e. there is no autocorrelation;
- the deviations obey the normal distribution law.

Let us consider the procedure for carrying out the trend analysis. At the first stage, the randomness of deviations is checked using turning points. A turning point is a measurement result for which one of the following conditions is met:

$$e_{i-1} < e_i > e_{i+1} \quad \text{or} \quad e_{i-1} > e_i < e_{i+1}. \qquad (14)$$

If the number of turning points $m$ is greater than the critical value $m_{kr}$, then the deviations are considered random. The critical value for the 95% confidence level is given by the formula

$$m_{kr} = \frac{2(n-2)}{3} - 2\sqrt{\frac{16n - 29}{90}}, \qquad (15)$$

where $n$ is the number of elements of the row statistics. Deviations are checked for autocorrelation using the Durbin-Watson test $d$ according to the formula

$$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}. \qquad (16)$$

The calculated value of $d$ is compared with the tabular values $d_1$ and $d_2$. There may be two cases here:
a) The value of $d$ does not fall within the interval from 2 to 4. For $d > d_2$, there is no autocorrelation. If $d < d_1$, then there is a relationship between the deviations. When the condition $d_1 \le d \le d_2$ is fulfilled, it is impossible to draw an unambiguous conclusion.
b) The value of $d$ falls within the range from 2 to 4. In this case, the parameter $d' = 4 - d$ is determined. Further, the analysis is carried out in accordance with the algorithm given in paragraph (a), where the parameter $d'$ is used instead of $d$.
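The randomness and autocorrelation checks can be sketched as below. The turning-point condition is assumed to be the usual strict local maximum or minimum, and the Durbin-Watson statistic is written in its standard form; the function names and the deviation series are hypothetical. The tabular bounds $d_1$, $d_2$ are not computed and must be looked up.

```python
from math import sqrt


def turning_points(e):
    """Count interior elements that are strict local maxima or minima."""
    return sum(
        1
        for i in range(1, len(e) - 1)
        if (e[i - 1] < e[i] > e[i + 1]) or (e[i - 1] > e[i] < e[i + 1])
    )


def critical_turning_points(n):
    """m_kr = 2(n - 2)/3 - 2*sqrt((16n - 29)/90)."""
    return 2 * (n - 2) / 3 - 2 * sqrt((16 * n - 29) / 90)


def durbin_watson(e):
    """d = sum of squared successive differences / sum of squared deviations."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    return num / sum(v * v for v in e)


# Hypothetical deviations from a trend line (not data from the paper).
e = [0.1, -0.2, 0.3, -0.1, 0.2]
m = turning_points(e)
d = durbin_watson(e)
d_checked = 4 - d if d > 2 else d  # case (b): compare d' = 4 - d with d_1, d_2
print(m, critical_turning_points(len(e)), d, d_checked)
```

For $n = 10$ the critical number of turning points evaluates to about 2.92, the value used in the worked example of the paper.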
Checking the compliance of deviations with the normal distribution law is performed using the Pearson's χ 2 -criterion (4). In addition to it, the Westergaard method, the DR criterion and other criteria are used [15].
IOP Publishing, doi:10.1088/1755-1315/1049/1/012039

When the above conditions are met, the trend analysis is started. The initial data from $n$ measurements are divided into two groups of approximately equal size. The first group includes the first $n_1$ measurements, the second the remaining $n_2$ elements: $n = n_1 + n_2$. Let us introduce the notation for the average values and variances of the elements in each group: $y^{(1)} = \sum y_i^{(1)}/n_1$, $y^{(2)} = \sum y_i^{(2)}/n_2$, $s_{(1)}^2 = \sum (y_i^{(1)} - y^{(1)})^2/(n_1 - 1)$, $s_{(2)}^2 = \sum (y_i^{(2)} - y^{(2)})^2/(n_2 - 1)$. First, the hypothesis about the equality of variances in the groups is tested using the Fisher criterion $F$ (7). When the value of $F$ is less than the critical value $F_{kr}$ at a given level of significance, the general variances are equal and it is possible to proceed to assessing the presence of the trend using the Student's T-test (6). If the value of $T$ is greater than the critical value $T_{kr}$, then the trend exists. Otherwise, there is no trend.
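The split-sample procedure can be sketched as follows, assuming that formula (6) is the pooled two-sample statistic and that the Fisher ratio places the larger variance in the numerator. The function name and the series are hypothetical; the critical values $F_{kr}$ and $T_{kr}$ are taken from tables and passed in by the caller.

```python
from math import sqrt


def split_half_trend(ys, f_kr, t_kr):
    """Return (F, T, verdict) for the first-half versus second-half comparison."""
    n1 = len(ys) // 2
    groups = (ys[:n1], ys[n1:])
    means = [sum(g) / len(g) for g in groups]
    variances = [
        sum((v - m) ** 2 for v in g) / (len(g) - 1) for g, m in zip(groups, means)
    ]
    f = max(variances) / min(variances)
    if f >= f_kr:
        # Variances are not homogeneous, so the T-test is not applied.
        return f, None, "variances differ"
    pooled = sum((len(g) - 1) * s2 for g, s2 in zip(groups, variances)) / (len(ys) - 2)
    t = abs(means[0] - means[1]) / (
        sqrt(pooled) * sqrt(1 / len(groups[0]) + 1 / len(groups[1]))
    )
    return f, t, "trend" if t > t_kr else "no trend"


# Hypothetical yearly bird numbers in thousands (not data from the paper).
series = [9.0, 8.8, 8.6, 8.4, 8.2, 7.8, 7.6, 7.4, 7.2, 7.0]
f, t, verdict = split_half_trend(series, f_kr=6.39, t_kr=2.31)
print(f, t, verdict)
```

The critical values in the call are the ones the paper quotes for two groups of five measurements at the 5% significance level.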
As an example, we will study the possibility of a trend in the number of birds $N$ based on the results of $n = 11$ observations in 2000-2010. The initial data are presented in table 5. The dependence of the number of birds in thousands on the year of observation $x$, obtained by the LSM method, is given by equation (17). Let us reject possible anomalous values by the Irwin method. The critical value of the Irwin parameter $\lambda_a$ in accordance with the data in table 1 is about 1.5. The maximum value of the parameter $\lambda_i$ belongs to deviation number 9, related to the measurement in 2008: $\lambda_9 = |y_9 - y_8|/s_x = |-0.92 - 0.25|/0.36 = 3.25$. Since $\lambda_9$ is greater than $\lambda_a$, the result of the measurement in 2008 should be discarded. In all other measurements, the parameter $\lambda_i$ is less than the critical value, so they are kept for further analysis. The corrected data after rejection are given in table 6.

Table 6. Adjusted data to determine the possibility of the existence of a trend in the number of birds $N_i$ in thousands based on the results of $n = 10$ observations from 2000 to 2010.

The regression equation recalculated from the adjusted data is equation (18), where $x$ is the year and $N$ is the number of birds in thousands. The last line in table 6 contains the deviations $e_i$ of the observed values $N_i$ from the values $N(x_i)$ obtained using the regression equation (18). As expected, the standard deviation of the new model has decreased compared to equation (17), from 0.36 to 0.23.
At the first stage, we check the requirement for randomness of deviations by the method of turning points. In accordance with condition (14), the turning points are seven deviations ($m = 7$) with numbers 2-5 and 7-9 in table 6. The critical value $m_{kr}$ for ten measurements ($n = 10$) is 2.92 according to formula (15). Since $m$ is greater than $m_{kr}$, the deviations are random.
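Next, the deviations are checked for autocorrelation using the Durbin-Watson test (16) and for compliance with the normal distribution law using the Pearson criterion (4). The critical value $\chi^2_{kr}$ for the 5% significance level is 3.8. Since the inequality $\chi^2 < \chi^2_{kr}$ is satisfied, the deviations $e_i$ obey the normal distribution law.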
All the conditions for the trend analysis are met, so we can proceed to the study of the adequacy of the regression equation (18). The correlation coefficient $r_{xy}$ between the row statistics $N$ and $x$ and the Student's T-test calculated by formulas (8), (9) for the data of table 6 are -0.93 and 7.54, respectively. The critical value $T_{kr}$ at 8 degrees of freedom for the 5% significance level is 2.31. Since the Student's criterion is greater than the critical value $T_{kr}$, the relationship between the statistical samples in table 6 exists.
The Fisher criterion calculated by the formula (12) is equal to 7.20, more than the critical value F kr = 4.95, which indicates the adequacy of the mathematical model (18). In the case under consideration, the conclusions obtained using the Student's and Fisher's criteria coincide.
Fisher's method is more stringent than the Student's method for small correlation coefficients at the level of 0.5-0.7, which can lead to different results. In this case, it is advisable to conduct additional trend studies. Let us split the data in table 6 into two equal groups.
In the first group we include the first five measurements, in the second the remaining measurements. The means and variances in each group are $N^{(1)} = 8.64$, $N^{(2)} = 7.52$, $s_{(1)}^2 = 0.15$, $s_{(2)}^2 = 0.086$. We check the homogeneity of the variances of the groups under consideration using the Fisher criterion (7): $F = 0.15/0.086 = 1.74$. The critical value $F_{kr}$ for the 5% significance level at 4 degrees of freedom of the numerator and 4 degrees of freedom of the denominator is 6.39. Since $F_{kr} > F$, the variances are homogeneous.
Let us evaluate the presence of the trend using the Student's T -test (6). After substituting the average values of deviations and variances of each group into the formula (6), we have: T =12.9. This value is greater than the critical value T kr = 2.31 for 5% significance level at n 1 + n 2 − 2 = 8 degrees of freedom, so the hypothesis of a trend is accepted. Consequently, the number of birds in the period from 2000 to 2010 is declining. If the average annual number of birds in the first 5 years was estimated at the level of 8640 individuals, then in the last five years it decreased to 7520 individuals.
Let us compare the results obtained with the help of the regression and the trend analyses. Using the regression equation (18), it can be found that the average annual number of birds in the first 5 years of observations was about $N_{Trend} = 6540$ individuals. The difference between the two methods for assessing the trend is at the level of 0.3

Development of an algorithm for statistical analysis
Numerous studies of the avifauna on the territory of wind farms indicate great difficulties in organizing monitoring, collecting information, and statistically processing the data obtained [1,5,6,8,9,11,13]. The impossibility of repeating observations under absolutely identical conditions and the influence of uncontrollable factors associated with meteorological and other conditions are partially compensated by long-term observations lasting from 1 to 10 years. However, even in this case there are situations when the measured parameters of the dynamics of ornithocomplexes do not fit into the classical schemes of statistical regularities. In this case, it is necessary to pay special attention to rejecting erroneous measurements, identifying individual anomalous data, and carefully checking the adequacy of the obtained mathematical models.
The reliability of conclusions based on the results of statistical analysis depends, first of all, on the correct application of the criteria used. The use of each of them requires the fulfillment of certain conditions. For example, the use of the Student's, Irwin's, Durbin-Watson's, Pearson's, Fisher's and other criteria implies the obligatory compliance of the studied sample with the chosen random variable distribution laws. Based on the material presented in this paper and the experience of statistical studies accumulated in a number of publications [1,6,8,11,12,15-18], we can propose the following algorithm for statistical processing of the results of monitoring the wind farm territory.
Step 1. Selection or development of an information system to ensure the storage of observation results in the form of tables or databases, convenient for systematizing the information received and their subsequent analysis. Examples of such information and computer systems are presented in papers [5,7,8].
Step 2. Primary statistical processing of monitoring results, which consists in determining the average values of row statistics, variances, rejecting the results of erroneous measurements and identifying the random variable distribution law.
Step 3. Testing hypotheses about the coincidence of mathematical expectations in a comparative analysis of the number of birds and other parameters that determine their behavior in different monitoring sites and in different seasons.
Step 4. Calculation of correlation coefficients between various parameters that characterize the dynamics of ornithocomplexes depending on the census method, weather, seasonal or other conditions.
Step 5. Correlation-regression analysis. Construction of mathematical models that determine the dependence of one parameter on another parameter.
Step 6. Trend analysis to identify the possibility of changes in the number of birds over time or the direction of change in one parameter as another parameter changes.
The proposed algorithm is focused on processing the results of monitoring the dynamics of ornithocomplexes on the territory of wind farms obtained by the SNH and RAM methods, although it can also be used with other observation methods.

Conclusions
An algorithm for analyzing the results of monitoring the dynamics of ornithological complexes on the territory of wind farms has been developed using statistical methods. The proposed algorithm allows solving the following problems:
1. Carrying out primary processing of information on the number of birds of various species, flight altitude and time spent on observation sites using the Student's T-test, Fisher and Pearson criteria.
2. Construction of regression equations that determine the dependence of the number of birds on various factors using the Student's T-test, Fisher and Irwin criteria.
3. Trend detection when studying the time dependence of the number of birds or the direction of change of one parameter as another parameter changes using turning points and the Student's T-test, Fisher, Irwin and Durbin-Watson tests.
The proposed statistical methods have been tested in the analysis of the results of monitoring ornithocomplexes on the territory of the Prymorsk-1 wind farm.

Acknowledgement
The authors of this paper are grateful to Anastasia Gorlova (Horlova) for translation into English.