Determining Suitable Probability Distribution Models for Annual Precipitation Data (A Case Study of Mazandaran and Golestan Provinces)

Statistical distributions can be used to extend records where data are scarce, as at many stations in Iran. The aims of this study are to select the best frequency distribution for estimating average annual precipitation and to assess the effect of record length on the selection of a suitable distribution. To this end, data from 65 stations in Mazandaran and Golestan provinces were analyzed. The relative residual mean square (RMS) was used to determine the best-fitted distribution for each annual series, and precipitation was estimated for different return periods. The relative frequency of first ranks among the fitted distributions showed that, as the statistical period length increased, the fit of the normal and Pearson distributions decreased and the Gumbel distribution fitted the data series better. The best-fitted distribution is Pearson for 15-year data and log-Pearson for the 20-, 25- and 30-year periods. Based on the moment method and the total scores, the two-parameter normal distribution has the best fit in all statistical periods.


Introduction
Precipitation is a key component of the hydrological cycle and one of the most important parameters for various natural and socio-economic systems, such as water resources management, agriculture and forestry, tourism and flood protection, to name just a few (Schmidli, 2005). Studying the consequences of global climate change for these systems requires scenarios of future precipitation change as input to hydrological analyses. Hydrological and meteorological data are not purely random, so they can be analyzed with statistical methods based on the frequency analysis of precipitation and flood data. Statistical distributions can therefore be employed in studies such as the design of hydraulic structures, the management of water resources and watersheds, and the identification of factors that affect the hydrologic cycle. However, the distribution that best fits the studied data must first be determined. The primary aim of frequency analysis is to relate the magnitude of extreme events to their frequency of occurrence through the use of probability distributions (Chow et al., 1988). The aim of this study is to determine suitable probability distribution models for annual precipitation data in the study area. Seven well-known probability distribution models, namely the two-parameter normal, two-parameter log-normal, three-parameter log-normal, two-parameter gamma, Pearson type III, log-Pearson type III and Gumbel distributions, with parameters estimated by the moment and maximum likelihood methods, are tested to determine the best-fitted distributions and to estimate precipitation for different return periods.
In Iran and other countries, many hydrologists and other experts have analyzed precipitation data. Mashayekhi (1972) treated precipitation in Iran as normally distributed. Khalili (1973, 1976) studied 90 years of Tehran precipitation data and 10 years of central Alburz precipitation data and demonstrated the suitability of the gamma distribution. After investigating the probability density functions of monthly and annual precipitation data at the oldest station in Iran, Haghighatjou (2002) found that the log-Pearson type III is the best-fitted distribution for monthly data and that no distribution is suitable for annual precipitation. After investigating suitable probability distributions for mean, maximum and minimum discharges at hydrological stations in Mazandaran, Gholami (2000) concluded that, at most stations, the Gumbel (L-moment method) and three-parameter log-normal (moment method) distributions fitted maximum and minimum annual discharges best, and the Gumbel distribution (L-moment method) fitted mean and minimum annual discharges best. In a study in East Azarbayjan, Bedeustani (1999) showed that although no distribution was suitable for short-term discharge estimation, the three-parameter log-normal distribution, followed by the log-Pearson type III distribution, fitted the data series better as the statistical period length increased. In addition, for maximum precipitation, the three-parameter log-normal distribution was suitable for short records, and the three-parameter log-normal together with the log-Pearson type III distribution for long records. Markovich (1965) used the least-squares method for streamflow assessment and concluded that the gamma distribution fitted best among the distributions considered. Keshtkar (2001, 2006) compared the moment and L-moment methods for estimating distribution parameters and for selecting a suitable distribution for annual discharge series.
In Keshtkar's study, 20 and 17 hydrometric stations in the central plateau watershed were chosen for mean, maximum and minimum annual discharges and for maximum peak discharges, respectively. The best-fitted distributions were: for minimum discharges, Pearson type III (L-moment method); for mean annual discharges, Pearson type III and log-Pearson type III (L-moment method); for maximum annual discharges, Pearson type III (L-moment method), log-Pearson type III and two-parameter log-normal (moment method); and for maximum annual instantaneous discharges, log-Pearson type III (moment method), Pearson type III, three-parameter log-normal and two-parameter log-normal (L-moment method). Dinpashoh (2004) studied the regionalization of Iran's precipitation climate using data from 77 synoptic stations. Twelve variables were selected from 57 candidate variables by Procrustes analysis and then used to regionalize Iran's precipitation climate with factor analysis and clustering techniques. The L-moment-based H and Z statistics were used to test the homogeneity of each region and to select the distribution that best fitted the annual precipitation records in that region. The country was divided into six homogeneous regions and one heterogeneous region. The southern coast of the Caspian Sea was identified as homogeneous region B, for which the generalized logistic distribution best fitted the annual precipitation records. Tase (1982) assessed 50 years of precipitation data from 82 meteorological stations in Japan and adopted the normal distribution for these data. Reich (1983) described monthly precipitation with the Gumbel distribution based on 15 years of data.
In a study of the Seyhan River Basin in Turkey (Topaloglu, 2002), the Gumbel, log-logistic, Pearson type III, log-Pearson type III and three-parameter log-normal distributions were applied to series of annual instantaneous flood peaks and annual peak daily precipitation at 13 flow-gauging and 55 precipitation-gauging stations. According to chi-squared tests, the Gumbel distribution (method of moments) was the best model for both hydrological and precipitation stations. Based on the K-S test, the three-parameter log-normal (method of moments) and log-Pearson type III (method of moments) models were the best for hydrological and precipitation stations, respectively. Dahamsheh et al. (2007) described the structural characteristics and the temporal and spatial variation of annual precipitation in Jordan, together with possible future projections based on probability distributions. Their results showed that annual precipitation in Jordan was consistent in time, with evidence of randomness. They identified three distinct regions, best described by the three-parameter log-normal distribution in the east, the gamma distribution in the south and the log-Pearson type III distribution in the remainder of the country. Based on the chi-square goodness-of-fit test, Campbell (1981) found that the log-Pearson type III distribution was the best distribution for culvert design in small forested watersheds. By contrast, the Gumbel distribution with the L-moment method has been found suitable for peak flows by Gingras and Adamowski (1992), Pilon and Adamowski (1992) and Wallis (1988).

Method and Materials
The Caspian Sea rivers basin lies on the northern slope of the central mountains and stretches along the shore from the Sefidrood delta to Bandar Gaz. It is bounded by the Sefidrood watershed to the west, the Central Basin to the south, the Semnan and Khorasan basins to the southeast and the Caspian Sea to the north. General patterns of atmospheric circulation over the Caspian Sea are controlled by the Eurasian high and low pressure cells (Iceland Low, Indian Low, Azores High, Siberian High and Central Asian High; Lahijani, 2007). The Iranian coast has a sub-tropical climate, and rainfall shows a strong west-to-east gradient, from around 1900 mm at Anzali to around 196 mm at Hassan Gholi. On the coast, rainy weather is distributed uniformly throughout the year. Average relative humidity is mostly above 75%. Mean summer temperature is around 26 °C, with fewer than 30 snowy and freezing days per year (Lahijani, 2007).
In this study, precipitation-gauging data from stations in Golestan and Mazandaran provinces were used to investigate the fit of statistical distributions (Figure 1).
In total, data from 65 stations with a common 30-year period (1972-2001) were selected based on record length and recency. After outliers were screened with the U.S. Water Resources Council (1981) method, missing data were filled in with the normal ratio method (Mahdavi, 2002), taking into account the number of available years. Data homogeneity was tested with the double mass curve and runs tests (Mahdavi, 2002).
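The data preparation steps above can be sketched in Python as follows. This is a minimal illustration, not the procedure the authors ran: the outlier thresholds follow the log-space Grubbs-Beck form used in U.S. WRC Bulletin 17B with a published polynomial approximation for the 10% critical value K_n; the normal ratio estimate assumes long-term normal annual totals are known for the target and index stations; the runs test is taken about the median; and the function names are hypothetical.

```python
import math
import statistics
from itertools import accumulate


def grubbs_beck_thresholds(values):
    """Low and high outlier thresholds in log10 space (Bulletin 17B style).

    K_n is a polynomial approximation of the 10%-significance Grubbs-Beck
    critical value, intended for moderate sample sizes.
    """
    logs = [math.log10(v) for v in values]
    n = len(logs)
    mean, sd = statistics.mean(logs), statistics.stdev(logs)
    log_n = math.log10(n)
    k_n = -0.9043 + 3.345 * math.sqrt(log_n) - 0.4046 * log_n
    return 10 ** (mean - k_n * sd), 10 ** (mean + k_n * sd)


def normal_ratio_fill(index_values, index_normals, target_normal):
    """Normal ratio estimate of a missing value at the target station.

    Each index-station value is scaled by target_normal / index_normal,
    and the scaled values are averaged.
    """
    scaled = [target_normal / n_i * p_i for p_i, n_i in zip(index_values, index_normals)]
    return sum(scaled) / len(scaled)


def runs_test_z(series):
    """Runs test about the median; returns the z statistic.

    |z| < 1.96 gives no evidence against randomness at the 5% level.
    Values equal to the median are dropped (a common convention).
    """
    median = statistics.median(series)
    above = [v > median for v in series if v != median]
    n1 = sum(above)
    n2 = len(above) - n1
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    n = n1 + n2
    mu = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mu) / math.sqrt(var)


def double_mass_points(station, reference):
    """Cumulative (reference, station) pairs; a break in slope suggests inhomogeneity."""
    return list(zip(accumulate(reference), accumulate(station)))
```

Each function takes one station's annual series (or the matching index-station values), so the checks can be run station by station before any distribution is fitted.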
Using the HYFA (Hydrological Frequency Analysis) program, the best-fitted statistical distribution was determined for each station for periods of 15, 20, 25 and 30 years. Goodness of fit in HYFA was assessed with the relative residual mean square and the chi-square test, and the distribution parameters were estimated by the method of moments and the maximum likelihood method.
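The following sketch illustrates how a relative residual mean square can rank candidate distributions. HYFA's exact formula is not given here, so the form used below is an assumption: observed values are sorted, given Weibull plotting positions i/(n+1), compared with the fitted quantiles, and scored as sqrt(mean(((x - x_hat) / x)^2)). Only three of the seven candidates are shown, all fitted by the method of moments (the log-normal from the moments of ln x); the names are hypothetical and the sample values are illustrative, not study data.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant used in the Gumbel moment fit


def fit_normal(data):
    """Normal distribution by the method of moments; returns a quantile function."""
    return statistics.NormalDist(statistics.mean(data), statistics.stdev(data)).inv_cdf


def fit_lognormal2(data):
    """Two-parameter log-normal from the moments of ln(x); returns a quantile function."""
    logs = [math.log(x) for x in data]
    dist = statistics.NormalDist(statistics.mean(logs), statistics.stdev(logs))
    return lambda p: math.exp(dist.inv_cdf(p))


def fit_gumbel(data):
    """Gumbel (EV1) by the method of moments; returns a quantile function."""
    alpha = math.sqrt(6) * statistics.stdev(data) / math.pi
    u = statistics.mean(data) - EULER_GAMMA * alpha
    return lambda p: u - alpha * math.log(-math.log(p))


def relative_rms(data, quantile):
    """Assumed form: sqrt(mean(((x_i - xhat_i) / x_i) ** 2)) with Weibull positions."""
    xs = sorted(data)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        p = i / (n + 1)  # non-exceedance probability of the i-th smallest value
        total += ((x - quantile(p)) / x) ** 2
    return math.sqrt(total / n)


CANDIDATES = {
    "normal": fit_normal,
    "log-normal (2 parameters)": fit_lognormal2,
    "Gumbel": fit_gumbel,
}


def rank_distributions(data):
    """Candidate names with their relative RMS, best (smallest) first."""
    scores = {name: relative_rms(data, fit(data)) for name, fit in CANDIDATES.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical annual totals in mm, for illustration only (not study data).
    annual = [612.0, 745.5, 580.2, 830.1, 701.4, 655.0, 910.3, 690.8, 720.6, 640.9]
    for name, rms in rank_distributions(annual):
        print(f"{name}: {rms:.4f}")
```

Returning quantile functions keeps the scoring step independent of the distribution, so further candidates (gamma, Pearson type III, maximum likelihood fits) could be added to `CANDIDATES` without changing `relative_rms`.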
In this study, the normal, two-parameter log-normal, three-parameter log-normal, Gumbel, two-parameter gamma, Pearson type III and log-Pearson type III distributions were used (details of the distribution calculations are available in Mahdavi's Applied Hydrology (2002) and in the thesis of Betül S.). Once the best-fitted distribution is selected, mean annual precipitation can be estimated for return periods of 2, 5, 10, 20, 25, 50, 100, 500 and 1000 years. By analyzing the relative residual mean square and chi-square test tables (in the HYFA output) for the different time series (15, 20, 25 and 30 years), the distributions were ranked, and scores from 7 (best) to 1 (worst) were assigned to them in order of priority. Finally, the best-fitted probability distribution was selected using the relative frequency of first ranks (the best-fitted distribution at each station) and the total score of each distribution.
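The scoring scheme described above (7 points for the best-fitted distribution down to 1 for the worst, summed over stations, plus the relative frequency of first ranks) can be sketched as below. The input format, one best-first list of distribution names per station, and the function name are assumptions made for illustration.

```python
from collections import Counter


def score_rankings(rankings):
    """Total scores and relative frequency of first rank across stations.

    rankings: one list per station of distribution names, best-fitted first.
    With seven candidates the best gets 7 points and the worst gets 1.
    """
    totals = Counter()
    firsts = Counter()
    for ranking in rankings:
        k = len(ranking)
        for position, name in enumerate(ranking):
            totals[name] += k - position  # best position earns k points
        firsts[ranking[0]] += 1
    n_stations = len(rankings)
    rel_freq = {name: count / n_stations for name, count in firsts.items()}
    return dict(totals), rel_freq


if __name__ == "__main__":
    # Hypothetical rankings for three stations (illustration only).
    stations = [
        ["normal", "gamma", "Gumbel"],
        ["gamma", "normal", "Gumbel"],
        ["normal", "Gumbel", "gamma"],
    ]
    totals, rel_freq = score_rankings(stations)
    print(totals, rel_freq)
```

Running the function separately for each parameter estimation method and each period length gives the per-method, per-period totals and first-rank frequencies that are compared in the results.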

Results
In HYFA, the distribution with the smallest deviation, judged from the deviation table and the relative residual mean square values, was selected as the best-fitted distribution. From this distribution, a curve relating precipitation magnitude to occurrence probability (or return period) can be drawn, and the value of the variable for a given return period can be read from it. HYFA also reports statistical characteristics such as the mean, standard deviation, variance and skewness. Selected statistics of the 30-year data are given in Table 1, and the locations of the studied stations are shown in Figure 1.
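A magnitude-return period curve of this kind can be sketched with the frequency-factor approach found in standard texts such as Chow et al. (1988): x_T = mean + K_T * sd, with non-exceedance probability 1 - 1/T. This is not HYFA's code. The Pearson type III factor uses the Wilson-Hilferty approximation, log-Pearson type III applies the same factor to log10 statistics, and the station mean and standard deviation in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

RETURN_PERIODS = (2, 5, 10, 20, 25, 50, 100, 500, 1000)


def normal_k(T):
    """Normal frequency factor: standard normal quantile of 1 - 1/T."""
    return NormalDist().inv_cdf(1 - 1 / T)


def gumbel_k(T):
    """Frequency factor for the Gumbel (EV1) distribution fitted by moments."""
    return -math.sqrt(6) / math.pi * (0.5772 + math.log(math.log(T / (T - 1))))


def pearson3_k(T, skew):
    """Pearson type III frequency factor via the Wilson-Hilferty approximation."""
    z = normal_k(T)
    if abs(skew) < 1e-6:
        return z  # zero skew reduces to the normal case
    return 2 / skew * ((1 + skew * z / 6 - skew ** 2 / 36) ** 3 - 1)


def quantile(mean, sd, k):
    """Magnitude for a return period: x_T = mean + K_T * sd."""
    return mean + k * sd


def log_pearson3_quantile(mean_log10, sd_log10, skew_log10, T):
    """Log-Pearson type III: Pearson III factor applied to log10 statistics."""
    return 10 ** quantile(mean_log10, sd_log10, pearson3_k(T, skew_log10))


if __name__ == "__main__":
    mean, sd = 700.0, 110.0  # hypothetical station statistics (mm)
    for T in RETURN_PERIODS:
        print(T, round(quantile(mean, sd, normal_k(T)), 1),
              round(quantile(mean, sd, gumbel_k(T)), 1))
```

Plotting the returned magnitudes against T on a logarithmic axis reproduces the kind of curve described above, from which values for intermediate return periods can be read.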
The scores obtained by the distributions show that, across all stations, no distribution is clearly dominant, in agreement with the results of Haghighatjou (2002). The relative frequency with which each distribution ranked first in the four assessed periods is shown in Figures 2 and 3.
Based on the total scores given to each distribution, the two-parameter log-normal distribution was the most suitable under the moment method in all four statistical periods. Under the maximum likelihood method, the best-fitting distributions for increasing period lengths (15, 20, 25 and 30 years) were Pearson, gamma, three-parameter log-normal and log-Pearson type III, with scores of 216, 210, 225 and 219, respectively (Tables 2 and 3).
Considering the relative frequency of first ranks, with the two parameter estimation methods examined separately, the best-fitted models are: for the 15-year period, Gumbel and normal (moment method) and gamma and Pearson (maximum likelihood method); for the 20-year period, log-Pearson, Pearson, gamma and Gumbel (maximum likelihood method); for the 25-year period, log-Pearson, Pearson and gamma (maximum likelihood method); and for the 30-year period, log-Pearson and gamma (maximum likelihood method).
When the relative frequency of the best-fitted distributions is examined with the two parameter estimation methods compared jointly, the best-fitted distributions are Pearson and normal (moment method) for the 15-year period; log-Pearson, gamma and normal (moment method) for the 20-year period; log-Pearson, Pearson and normal (moment method) for the 25-year period; and log-Pearson, gamma and normal (moment method) for the 30-year period.

Discussion and Conclusion
The scores obtained across all stations show no clearly dominant distribution, in agreement with the results of Haghighatjou (2002). Therefore, the frequency distribution used to estimate annual precipitation should be chosen according to the best-fitted distribution for each region.
Changing the statistical period length changed the best-fitted distribution at 57 of the 65 studied stations (88 percent); only 8 stations kept the same best-fitted distribution (Table 4). This indicates relatively high variability of annual precipitation and insufficient data, and it underlines the importance of an adequate spatial distribution of stations and the need to use more data in such studies.
Based on the total scores, the two-parameter normal distribution has the best fit in all statistical periods. Overall, based on the relative frequency of the best-fitted distributions, the fit of the normal and Pearson distributions decreases and the Gumbel distribution fits the data series better as the statistical period length increases. The superiority of the Pearson distribution in the 15-year period gave way to the log-Pearson distribution for the 20-, 25- and 30-year periods, which may result from the short data period or from the occurrence of extreme events (e.g., droughts and wet years). Period length also affects the skewness, standard deviation and distribution parameters. Therefore, completing gauge records and collecting data regularly play a significant role in hydrological data analysis. Long-term data can also support a more appropriate interpretation of the occurrence of extreme hydrological events (e.g., droughts and wet years).
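The effect of period length on sample statistics can be checked with a short sketch like the one below, which computes the mean, standard deviation and bias-adjusted skewness for 15-, 20-, 25- and 30-year windows. Which years formed each window in the study is not stated, so taking the most recent years of a chronologically ordered series is an assumption, as are the function names.

```python
import math
import statistics


def sample_skewness(xs):
    """Bias-adjusted sample skewness: n * sum((x - mean)^3) / ((n - 1)(n - 2) s^3)."""
    n = len(xs)
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)
    return n * sum((x - mean) ** 3 for x in xs) / ((n - 1) * (n - 2) * s ** 3)


def window_statistics(series, lengths=(15, 20, 25, 30)):
    """Mean, standard deviation and skewness of the most recent `length` values.

    Assumes `series` is in chronological order; lengths longer than the
    record are skipped. A constant window would make the skewness undefined.
    """
    results = {}
    for length in lengths:
        if length <= len(series):
            window = series[-length:]
            results[length] = (
                statistics.mean(window),
                statistics.stdev(window),
                sample_skewness(window),
            )
    return results
```

Comparing the tuples returned for each window length shows how much the moments, and hence the moment-based distribution parameters, shift as the record grows.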

Acknowledgements
We are very grateful to the experts of the Ministry of Energy (especially Mr. Babaee) for their help in completing the data. We also thank the staff of the Natural Resources College, University of Tehran.

References
Topaloglu, F. (2002). Determining suitable probability distribution models for flow and precipitation series of the Seyhan River Basin. Turk. J. Agric. For., 26, 187-194.
Wallis, J. R. (1988). Catastrophes, computing and containment: living in our restless habitat. Speculation in Science and Technology, 11(4), 295-315.