Flood frequency analysis using method of moments and L-moments of probability distributions

Estimation of maximum flood discharge (MFD) at a desired location on a river is important for planning, design and management of hydraulic structures. This can be achieved using deterministic models with extreme storm events or through frequency analysis by fitting probability distributions to the recorded annual maximum discharge data. In the latter approach, suitable probability distributions and associated parameter estimation methods are applied. In the present study, the method of moments (MOM) and L-moments (LMO) are used to determine the parameters of six probability distributions. Goodness-of-Fit tests, namely Chi-square and Kolmogorov-Smirnov, are applied to check the adequacy of fit of the probability distributions to the recorded data. The D-index diagnostic test is used to select a suitable distribution for estimation of MFD. The study reveals that the Extreme Value Type-1 distribution (using LMO) is better suited amongst the six distributions used for estimation of MFD at the Malakkara and Neeleswaram gauging stations in the Pampa and Periyar river basins, respectively.

Subjects: Statistics & Probability; Civil, Environmental and Geotechnical Engineering; Engineering Mathematics; Engineering Management


PUBLIC INTEREST STATEMENT
Estimation of maximum flood discharge (MFD) at a desired location on a river is important for planning, design and management of hydraulic structures. This can be achieved using deterministic models with extreme storm events or through frequency analysis by fitting probability distributions to the recorded annual maximum discharge data. In the latter approach, suitable probability distributions and associated parameter estimation methods are applied. This paper details the procedures adopted in the method of moments and L-moments for determining the parameters of probability distributions. Goodness-of-fit (GoF) tests, namely Chi-square and Kolmogorov-Smirnov, are applied to check the adequacy of fit of the probability distributions to the recorded data. However, the GoF tests give diverging results, so a qualitative test is adopted to aid the selection of a suitable distribution for estimation of MFD.

Introduction
Estimation of maximum flood discharge (MFD) with a specified return period is crucial for the design of hydraulic structures such as bridges, barrages, culverts, dams and drainage systems. Since the hydrologic phenomena governing the MFD are highly stochastic in nature, the MFD can be effectively determined by fitting probability distributions to the series of recorded annual maximum discharge (AMD) data. An AMD is the highest instantaneous discharge value at a definite cross-section of a natural stream throughout an entire hydrologic year (water year). The longer the period of observation, the longer the recorded series, which may improve the results of the flood frequency analysis (FFA).
A number of probability distributions such as Exponential (EXP), Extreme Value Type-1 (EV1), Extreme Value Type-2 (EV2), Generalized Extreme Value (GEV), Generalized Pareto (GPA) and Normal (NOR) are used in FFA (Haktanir & Horlacher, 1993). The EV1, EV2, GEV and GPA distributions are classified as the extreme value family of distributions. Likewise, the EXP and NOR distributions are classified under the Gamma and Normal families of distributions, respectively. Generally, the method of moments (MOM) is used to determine the parameters of the probability distributions. However, it is sometimes difficult to assess exactly what information about the shape of a distribution is conveyed by its third and higher order moments. Also, when the sample size is small, the numerical values of the sample moments can be very different from those of the probability distribution from which the sample was drawn. It is also reported that the parameters of distributions fitted by MOM are often less accurate than those obtained by other parameter estimation procedures such as the maximum likelihood method, the method of least squares and probability weighted moments. To address these shortcomings, an alternative approach, namely L-moments (LMO), is discussed in this paper and used for FFA (Hosking, 1990).
In the recent past, a number of studies have been carried out by different researchers on the adoption of probability distributions for FFA. Kjeldsen, Smithers, and Schulze (2002) applied LMO in regional flood frequency analysis (RFFA) for the KwaZulu-Natal province of South Africa. Kumar, Chatterjee, Kumar, Lohani, and Singh (2003) carried out RFFA adopting 12 frequency distributions (using LMO) and found that the GEV distribution is the better suited distribution for estimation of MFD. Yue and Wang (2004) applied LMO to identify the suitable probability distribution for modelling annual stream flow in different climatic regions of Canada. Kumar and Chatterjee (2005) employed LMO to define homogeneous regions within 13 gauging sites of the north Brahmaputra region of India. Atiem and Harmancioğlu (2006) carried out RFFA using the index flood LMO approach for 14 gauged sites on the Nile River tributaries. Saf (2009) observed that the Pearson Type-III distribution is better suited for modelling extreme values in the Antalya and lower west Mediterranean sub-regions and the generalized logistic for the upper west Mediterranean sub-region. Bhuyan, Borah, and Kumar (2010) applied LH-moments (a generalized version of LMO) to carry out RFFA for the river Brahmaputra. They found that RFFA based on the GEV distribution using the level one LH-moment is superior to the use of LMO. Malekinezhad, Nachtnebel, and Klik (2011) concluded that the GEV (using LMO) is better suited amongst the five distributions studied for modelling AMD of three different regions in Iran. Badreldin and Feng (2012) carried out RFFA for the Luanhe basin using LMO and cluster techniques. Haberlandt and Radtke (2014) carried out FFA for three mesoscale catchments in northern Germany. Thus, the reported studies do not point to a single distribution that can be applied for FFA across different regions or countries.
Apart from this, when different distributions are used for estimation of MFD, a common problem is deciding which model best fits a given set of data. This can be answered by formal statistical procedures involving Goodness-of-Fit (GoF) and diagnostic tests, whose results are quantifiable and reliable (Zhang, 2002). Qualitative assessment is made from the plots of the recorded and estimated MFD. For quantitative assessment of MFD within the recorded range, the Chi-square (χ²) and Kolmogorov-Smirnov (KS) tests are applied. The D-index diagnostic test is used for the selection of a suitable probability distribution for FFA (United States Water Resources Council [USWRC], 1981). The study compares the performance of six probability distributions employed for FFA, and illustrates the applicability of the GoF and diagnostic test procedures in identifying which distribution is best suited for estimation of MFD for Malakkara and Neeleswaram.

Methodology
The study aims to identify a suitable probability distribution function (PDF) for FFA. After the data are processed and validated, the procedure is to: (1) select the PDFs for FFA (here EXP, EV1, EV2, GEV, GPA and NOR); (2) select the parameter estimation methods (here MOM and LMO); (3) select the quantitative GoF and diagnostic tests; and (4) conduct FFA and analyse the results obtained.

Theoretical description of MOM
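MOM is a technique for constructing estimators of the parameters by matching the sample moments with the corresponding distribution moments (Ghorbani, Ruskeepää, Singh, & Sivakumar, 2010). The rth central moment ($\mu_r$) about the mean ($\bar{Q}$) of a random variable Q, estimated from a sample of size N, is defined by:

$$\mu_r = \frac{1}{N}\sum_{i=1}^{N}\left(Q_i - \bar{Q}\right)^r \qquad (1)$$

Similarly, the third and fourth moments ($\mu_3$ and $\mu_4$) about $\bar{Q}$ are used to define skewness ($C_S$) and kurtosis ($C_K$), which are as follows:

$$C_S = \frac{\mu_3}{\mu_2^{3/2}}, \qquad C_K = \frac{\mu_4}{\mu_2^{2}} \qquad (2)$$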
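The sample moments used by MOM can be sketched in a few lines of Python. This is only an illustration: it assumes the 1/N form of the central moments (some texts use bias-corrected divisors instead), and the function names and the AMD values are hypothetical, not taken from the study.

```python
def central_moment(q, r):
    """r-th sample central moment about the mean, using a 1/N divisor (assumption)."""
    n = len(q)
    mean = sum(q) / n
    return sum((x - mean) ** r for x in q) / n


def skewness_kurtosis(q):
    """Return (C_S, C_K) as mu3 / mu2^(3/2) and mu4 / mu2^2."""
    mu2 = central_moment(q, 2)
    mu3 = central_moment(q, 3)
    mu4 = central_moment(q, 4)
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2


# Hypothetical annual maximum discharges in m^3/s, for illustration only
amd = [410.0, 520.0, 480.0, 950.0, 610.0, 705.0, 560.0, 1320.0]
cs, ck = skewness_kurtosis(amd)
print(f"C_S = {cs:.3f}, C_K = {ck:.3f}")
```

A positive C_S indicates a longer upper tail, which is typical of annual flood series.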

Theoretical description of LMO
LMOs are summary statistics for probability distributions and data samples. They are analogous to ordinary moments in that they provide measures of location, dispersion, skewness, kurtosis and other aspects of the shape of probability distributions or data samples. However, LMOs are computed from linear combinations of the ordered data values (Vogel & Wilson, 1996). LMOs can be used as the basis of a unified approach to statistical analysis with probability distributions. According to Central Water Commission (2010), LMOs have the following advantages:
(i) LMOs characterize a wider range of probability distributions than conventional moments.
(ii) LMOs are less sensitive to outliers in the data.
(iii) LMOs approximate their asymptotic normal distribution more closely.
(iv) LMOs are nearly unbiased for all combinations of sample sizes and populations.
LMOs are thus particularly useful in providing accurate quantile estimates of hydrological data in developing countries, where sample sizes are typically small. An LMO is a linear combination of probability weighted moments. Let $Q_1, Q_2, \ldots, Q_N$ be a conceptual random sample of size N and let $Q_{1N} \le Q_{2N} \le \ldots \le Q_{NN}$ denote the corresponding order statistics. The (r + 1)th LMO defined by Hosking and Wallis (1993) is:

$$l_{r+1} = \sum_{k=0}^{r} \frac{(-1)^{r-k}\,(r+k)!}{(k!)^{2}\,(r-k)!}\; b_k \qquad (3)$$

where $l_{r+1}$ is the (r + 1)th sample moment and $b_k$ is an unbiased estimator with

$$b_k = \frac{1}{N}\sum_{i=k+1}^{N} \frac{(i-1)(i-2)\cdots(i-k)}{(N-1)(N-2)\cdots(N-k)}\; Q_{iN} \qquad (4)$$

The first two sample LMOs are expressed by:

$$l_1 = b_0, \qquad l_2 = 2b_1 - b_0 \qquad (5)$$

Table 1 gives the details of quantile functions and parameters of the six probability distributions considered in the study.
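As an illustration of how sample LMOs follow from the ordered data, the sketch below computes the unbiased probability weighted moments b_k and combines them into l_{r+1} in the standard Hosking and Wallis form. The function names are hypothetical, and this is not the program used in the study.

```python
from math import comb


def sample_pwm(q, k):
    """Unbiased sample probability weighted moment b_k (requires len(q) > k)."""
    x = sorted(q)  # order statistics Q_1N <= ... <= Q_NN
    n = len(x)
    # comb(i-1, k) / comb(n-1, k) equals
    # (i-1)(i-2)...(i-k) / ((n-1)(n-2)...(n-k)); it is zero for i <= k
    total = sum(comb(i - 1, k) * x[i - 1] for i in range(1, n + 1))
    return total / (n * comb(n - 1, k))


def sample_lmoment(q, order):
    """Sample L-moment l_order, with order = r + 1 >= 1."""
    r = order - 1
    value = 0.0
    for k in range(r + 1):
        # comb(r, k) * comb(r + k, k) equals (r+k)! / ((k!)^2 (r-k)!)
        coeff = (-1) ** (r - k) * comb(r, k) * comb(r + k, k)
        value += coeff * sample_pwm(q, k)
    return value


# Hypothetical AMD values in m^3/s, for illustration only
amd = [410.0, 520.0, 480.0, 950.0, 610.0, 705.0, 560.0, 1320.0]
l1, l2 = sample_lmoment(amd, 1), sample_lmoment(amd, 2)
print(f"l1 = {l1:.2f}, l2 = {l2:.2f}")
```

Here l1 is the sample mean and l2 is a measure of dispersion. Because each term is linear in the ordered data, a single large flood affects these statistics less than it affects the third and fourth conventional moments.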

Goodness-of-Fit tests
GoF tests, namely χ² and KS, are applied to check the adequacy of fit of the probability distributions to the series of recorded AMD data.
The χ² statistic is defined by:

$$\chi^2 = \sum_{j=1}^{NC} \frac{\left(O_j(Q) - E_j(Q)\right)^2}{E_j(Q)} \qquad (6)$$

where $O_j(Q)$ is the observed frequency of the jth class, $E_j(Q)$ is the expected frequency of the jth class and NC is the number of frequency classes. The rejection region of the χ² statistic at the desired significance level (η) is

$$\chi^2 \ge \chi^2_{\eta,\,NC-m-1}$$

Here, m denotes the number of parameters of the distribution.
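A minimal sketch of the χ² computation is given below. It assumes the classes are defined by a list of interior boundaries chosen by the analyst. The helper names, the EV1 parameters and the data are hypothetical, and the EV1 CDF is written in its usual textbook form.

```python
import math


def count_observed(q, interior_edges):
    """Observed frequency per class; interior_edges must be sorted ascending."""
    counts = [0] * (len(interior_edges) + 1)
    for x in q:
        j = sum(1 for edge in interior_edges if x > edge)  # class index of x
        counts[j] += 1
    return counts


def expected_frequencies(cdf, interior_edges, n):
    """Expected frequency per class from a fitted CDF and sample size n."""
    bounds = [0.0] + [cdf(edge) for edge in interior_edges] + [1.0]
    return [n * (hi - lo) for lo, hi in zip(bounds, bounds[1:])]


def chi_square_statistic(observed, expected):
    """Sum over classes of (O_j - E_j)^2 / E_j."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data and EV1 parameters, for illustration only
amd = [410.0, 520.0, 480.0, 950.0, 610.0, 705.0, 560.0, 1320.0, 640.0, 830.0]
loc, scale = 560.0, 210.0
ev1_cdf = lambda x: math.exp(-math.exp(-(x - loc) / scale))

edges = [500.0, 600.0, 700.0, 900.0]  # 4 interior edges give NC = 5 classes
obs = count_observed(amd, edges)
exp_freq = expected_frequencies(ev1_cdf, edges, len(amd))
print(chi_square_statistic(obs, exp_freq))
```

The result depends on how the class boundaries are chosen, so the same boundaries should be used when comparing distributions.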

The KS statistic is defined by:

$$KS = \max_{1 \le i \le N}\left|F_e(Q_i) - F_D(Q_i)\right| \qquad (7)$$

where $F_e(Q_i) = (i - 0.44)/(N + 0.12)$ is the empirical CDF of $Q_i$ and $F_D(Q_i)$ is the computed CDF of $Q_i$ (Zhang, 2002). If the computed value of a GoF test statistic for a distribution is lower than the theoretical value at the desired significance level, then the distribution is considered acceptable for estimation of MFD.
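The KS statistic with the plotting position (i − 0.44)/(N + 0.12) can be sketched as follows. It assumes i is the rank of Q_i in ascending order. The function name, the fitted EV1 parameters and the data are hypothetical.

```python
import math


def ks_statistic(q, fitted_cdf):
    """Largest absolute gap between the empirical and fitted CDF over the sample."""
    x = sorted(q)  # rank i = 1 is the smallest value (assumption)
    n = len(x)
    return max(
        abs((i - 0.44) / (n + 0.12) - fitted_cdf(xi))
        for i, xi in enumerate(x, start=1)
    )


# Hypothetical data and EV1 parameters, for illustration only
amd = [410.0, 520.0, 480.0, 950.0, 610.0, 705.0, 560.0, 1320.0]
loc, scale = 560.0, 210.0
ev1_cdf = lambda v: math.exp(-math.exp(-(v - loc) / scale))
print(ks_statistic(amd, ev1_cdf))
```

The computed value is then compared with the tabulated critical value for the sample size N at the chosen significance level.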
For EV2, the recorded data are first log-transformed and the parameters of EV1 are obtained from the transformed series by MOM and LMO. These are then used to determine the parameters of EV2 from $\alpha = e^{\xi}$ and $k = 1/(\text{scale parameter of EV1})$, where ξ is the location parameter of EV1.
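The conversion from EV1 parameters of the log-transformed data to EV2 parameters can be sketched as below. The sketch assumes the EV2 (Fréchet) CDF has the form F(Q) = exp(−(Q/α)^(−k)), which may differ in notation from Table 1. The function names and parameter values are hypothetical.

```python
import math


def ev2_params_from_log_ev1(ev1_loc, ev1_scale):
    """EV2 (alpha, k) from EV1 location and scale fitted to ln(Q)."""
    return math.exp(ev1_loc), 1.0 / ev1_scale


def ev1_quantile(loc, scale, return_period):
    """EV1 quantile for non-exceedance probability 1 - 1/T."""
    return loc - scale * math.log(-math.log(1.0 - 1.0 / return_period))


def ev2_quantile(alpha, k, return_period):
    """EV2 quantile, assuming F(Q) = exp(-(Q / alpha)^(-k))."""
    f = 1.0 - 1.0 / return_period
    return alpha * (-math.log(f)) ** (-1.0 / k)


# Hypothetical EV1 parameters of ln(AMD), for illustration only
loc_ln, scale_ln = 6.2, 0.4
alpha, k = ev2_params_from_log_ev1(loc_ln, scale_ln)

# The EV2 quantile should equal exp() of the EV1 quantile of the logs
q100_ev2 = ev2_quantile(alpha, k, 100)
q100_check = math.exp(ev1_quantile(loc_ln, scale_ln, 100))
print(q100_ev2, q100_check)
```

The two printed values agree because ln(Q) follows EV1 with location ln(α) and scale 1/k whenever Q follows EV2 in the assumed form.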

Diagnostic test
The selection of a suitable probability distribution for estimation of MFD is performed through the D-index, which is defined by:

$$D\text{-index} = \frac{1}{\bar{Q}}\sum_{i=1}^{6}\left|Q_i - Q_i^{*}\right| \qquad (8)$$

Here, $\bar{Q}$ is the average (or mean) of the recorded AMD, the $Q_i$ (i = 1 to 6) are the six highest sample values in the series and $Q_i^{*}$ is the corresponding value estimated by the probability distribution. The distribution with the least D-index is identified as the better suited distribution for estimation of MFD (USWRC, 1981).
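A minimal sketch of the D-index calculation follows. It takes the distribution's estimates for the six highest values as an input, because the text does not state how the probabilities attached to those values are assigned. The function name and all numbers are hypothetical.

```python
def d_index(amd, estimated_top, n_top=6):
    """D-index: sum of |Q_i - Q_i*| over the n_top highest values, divided by the mean.

    estimated_top holds the fitted values Q_i* in descending order, matching
    the n_top highest recorded values sorted from largest to smallest.
    """
    if len(estimated_top) != n_top:
        raise ValueError("one estimate is needed per top-ranked value")
    observed_top = sorted(amd, reverse=True)[:n_top]
    mean = sum(amd) / len(amd)
    return sum(abs(o - e) for o, e in zip(observed_top, estimated_top)) / mean


# Hypothetical recorded series and fitted values, for illustration only
amd = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
fitted = [72.0, 58.0, 50.0, 41.0, 30.0, 20.0]
print(d_index(amd, fitted))  # 0.125
```

Computing this value for each distribution and estimation method and taking the smallest gives the ranking used in the diagnostic test.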

Application
An attempt has been made to estimate the MFD by six probability distributions (using MOM and LMO) at the Malakkara and Neeleswaram gauging stations. These gauging stations are located in the Pamba and Periyar river basins, respectively. The catchment areas of these gauging stations are 1,713 and 4,234 km², respectively. Based on the water year (June to May), stream flow data for the periods 1985-1986 to 2012-2013 for Malakkara and 1971-1972 to 2012-2013 for Neeleswaram are used. The series of AMD is derived from the daily stream flow data and further used in FFA. The summary statistics of the AMD values are presented in Table 2.
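Deriving an AMD series from daily records with a June to May water year can be sketched as follows. The record layout (date, discharge) and the function names are assumptions for illustration, and the sample records are invented.

```python
from datetime import date


def water_year(d, start_month=6):
    """Calendar year in which the water year containing date d begins."""
    return d.year if d.month >= start_month else d.year - 1


def annual_maximum_series(daily_records):
    """Map each water year (by starting year) to its maximum daily discharge."""
    amd = {}
    for day, discharge in daily_records:
        wy = water_year(day)
        if wy not in amd or discharge > amd[wy]:
            amd[wy] = discharge
    return dict(sorted(amd.items()))


# Invented daily records (date, discharge in m^3/s), for illustration only
records = [
    (date(1985, 6, 1), 100.0),
    (date(1985, 8, 15), 900.0),
    (date(1986, 5, 31), 300.0),  # still belongs to water year 1985-1986
    (date(1986, 6, 1), 250.0),
    (date(1986, 7, 20), 700.0),
]
print(annual_maximum_series(records))  # {1985: 900.0, 1986: 700.0}
```

With the record periods stated above, this grouping would give 28 annual maxima for Malakkara and 42 for Neeleswaram, provided no water year is missing.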

Results and discussions
A computer program was developed and used for performing FFA. The program computes the parameters of the six probability distributions (using MOM and LMO), flood estimates for different return periods, GoF tests statistic and D-index values for the stations under study.

Analysis based on GoF tests
In the present study, the degrees of freedom (NC − m − 1) were taken as one for the 3-parameter distributions (GEV and GPA) and two for the 2-parameter distributions (EXP, EV1, EV2 and NOR) while computing the χ² statistic values for Malakkara and Neeleswaram. The GoF test statistics are computed through Equations 6 and 7 and presented in Table 3 for the stations under study. From Table 3, it may be noted that, for Neeleswaram, the computed values of the χ² statistic for the EXP, EV2, GEV and GPA distributions (using MOM and LMO) are greater than the theoretical values ($\chi^2_{0.05,1} = 3.84$ for GEV and GPA; and $\chi^2_{0.05,2} = 5.99$ for EXP, EV1, EV2 and NOR) at the 5% significance level, so these four distributions are not acceptable for estimation of MFD at Neeleswaram. On the other hand, for Malakkara, the computed values of the χ² statistic for all six distributions (using MOM and LMO) are lower than the theoretical values at the 5% significance level, so the six distributions are acceptable for estimation of MFD at Malakkara. Also, from Table 3, it may be noted that the computed values of the KS statistic for the six probability distributions (using MOM and LMO) are lower than the theoretical values (0.250 for Malakkara and 0.205 for Neeleswaram) at the 5% significance level, so the six distributions are acceptable for estimation of MFD at Malakkara and Neeleswaram.
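The two χ² critical values quoted above can be checked with the standard library alone, using closed forms that hold only for one and two degrees of freedom. The function names are hypothetical. Note that the stated degrees of freedom imply NC = 5 classes in both cases, since 5 − 3 − 1 = 1 and 5 − 2 − 1 = 2.

```python
import math
from statistics import NormalDist


def chi2_dof(n_classes, n_params):
    """Degrees of freedom NC - m - 1."""
    return n_classes - n_params - 1


def chi2_critical(significance, dof):
    """Upper-tail chi-square critical value for dof = 1 or 2 only."""
    if dof == 1:
        # A chi-square variate with 1 dof is the square of a standard normal
        return NormalDist().inv_cdf(1.0 - significance / 2.0) ** 2
    if dof == 2:
        # With 2 dof the survival function is exp(-x / 2)
        return -2.0 * math.log(significance)
    raise ValueError("closed form implemented for dof = 1 or 2 only")


print(round(chi2_critical(0.05, chi2_dof(5, 3)), 2))  # 3.84 (GEV, GPA)
print(round(chi2_critical(0.05, chi2_dof(5, 2)), 2))  # 5.99 (EXP, EV1, EV2, NOR)
```

For other degrees of freedom a statistical table or library routine would be needed.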

Estimation of MFD by probability distributions
The parameters of the six probability distributions were determined by MOM and LMO; and further used for estimation of the MFD at Malakkara and Neeleswaram. The results are presented in Tables 4 and 5.
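As an illustration of the two estimation routes for one of the six distributions, the sketch below fits EV1 by MOM and by LMO and evaluates the quantile for a given return period. It uses the usual textbook relations for EV1, which are assumed to match Table 1, and the sample standard deviation with an N − 1 divisor, which is also an assumption. Function names and data are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def ev1_params_mom(q):
    """EV1 (location, scale) by MOM: scale = sqrt(6) * s / pi."""
    n = len(q)
    mean = sum(q) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in q) / (n - 1))  # N - 1 divisor (assumption)
    scale = math.sqrt(6.0) * std / math.pi
    return mean - EULER_GAMMA * scale, scale


def ev1_params_lmo(q):
    """EV1 (location, scale) by LMO: scale = l2 / ln 2."""
    x = sorted(q)
    n = len(x)
    l1 = sum(x) / n
    b1 = sum((i - 1) * x[i - 1] for i in range(1, n + 1)) / (n * (n - 1))
    l2 = 2.0 * b1 - l1
    scale = l2 / math.log(2.0)
    return l1 - EULER_GAMMA * scale, scale


def ev1_quantile(loc, scale, return_period):
    """MFD estimate for return period T: loc + scale * y_T."""
    y_t = -math.log(-math.log(1.0 - 1.0 / return_period))
    return loc + scale * y_t


# Hypothetical AMD values in m^3/s, for illustration only
amd = [410.0, 520.0, 480.0, 950.0, 610.0, 705.0, 560.0, 1320.0]
for label, fit in (("MOM", ev1_params_mom), ("LMO", ev1_params_lmo)):
    loc, scale = fit(amd)
    print(label, [round(ev1_quantile(loc, scale, t), 1) for t in (10, 50, 100)])
```

Because the quantile is linear in the reduced variate y_T, EV1 estimates plot as a straight line when y_T is used on the horizontal axis.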

Flood frequency curves
The MFD estimates computed by the six probability distributions (using MOM and LMO) for Malakkara and Neeleswaram, as given in Tables 4 and 5, are used to develop flood frequency curves, which are presented in Figures 1 and 2, respectively. From Figures 1 and 2, it can be seen that the MFD estimates given by EV2 (LMO) are higher than the corresponding values of the other distributions (using MOM and LMO) for return periods of 20 years and above. Also, from Figures 1 and 2, it can be seen that the curves fitted by the EV1 distribution (using MOM and LMO) are linear for the stations under study.

Analysis based on diagnostic test
For the selection of the most suitable distribution for estimation of MFD, the D-index values of the six probability distributions are computed by Equation 8 and the results are presented in Table 6. From Table 6, it may be noted that (1) the D-index values of 2.063 for NOR (using LMO) at Malakkara and 0.950 for GPA (using LMO) at Neeleswaram are the minimum when compared to the corresponding values of the other probability distributions; (2) the χ² test results do not support the use of the EXP, EV2, GEV and GPA distributions (using MOM and LMO), and therefore these four distributions are not found to be acceptable for estimation of MFD at Neeleswaram; and (3) the D-index value of 1.571 (using NOR) is the second minimum, next to that of GPA, when LMO is applied for determination of parameters for Neeleswaram. Therefore, the NOR distribution (using LMO) is found to be better suited for estimation of MFD for Malakkara, and the NOR distribution (using MOM) for Neeleswaram.
Because of the diverging results obtained from the GoF and diagnostic tests, a qualitative assessment is made to identify the suitable probability distribution for estimation of MFD. From Figures 1 and 2, it can be seen that the lines fitted through the estimated MFD values of NOR (using MOM and LMO) and GPA (using LMO) do not follow the line of agreement of the observed MFD values. By considering the trend lines of the fitted curves using estimated MFD values, the study identifies the EV1 distribution (using LMO) as a good choice for estimation of MFD at Malakkara, though the D-index value of EV1 is higher than that of GPA. For Neeleswaram, the EV1 distribution (using LMO) is also found to be a good choice for estimation of MFD.

Conclusions
The paper briefly describes the study carried out for estimation of MFD by FFA, using a computer-aided procedure to determine the parameters of six probability distributions (using MOM and LMO) for Malakkara and Neeleswaram. The following conclusions are drawn from the study:
(i) For return periods of 10 years and above, the MFD estimates given by LMO for the EXP, GEV, EV1 and EV2 distributions are higher than the corresponding MOM estimates of these four distributions for Malakkara and Neeleswaram.
(ii) The study presents the selection of a suitable distribution evaluated by GoF (using χ² and KS) and diagnostic (using D-index) tests.
(iii) The χ² test results showed that the EXP, EV1, EV2, GEV, GPA and NOR distributions (using MOM and LMO) are acceptable for estimation of MFD at Malakkara.
(iv) The χ² test results showed that the EXP, EV2, GEV and GPA distributions (using MOM and LMO) are not acceptable for estimation of MFD at Neeleswaram.
(v) The KS test results indicated that the six probability distributions (using MOM and LMO) are acceptable for estimation of MFD at Malakkara and Neeleswaram.
(vi) By considering the trend lines of the fitted curves using estimated MFD values, the study found that the EV1 distribution (using LMO) is better suited amongst the six distributions studied for estimation of MFD at Malakkara and Neeleswaram.
(vii) The study suggests that the MFD values for different return periods computed by the EV1 distribution (using LMO) could be considered as the design parameter for the planning and design of irrigation and flood protection works, and also of hydraulic structures on these rivers near Malakkara and Neeleswaram.