Comparison of Distribution Models for Peakflow, Flood Volume and Flood Duration

Besides peakflow, a flood event is also characterized by other, possibly mutually correlated, variables. This study explored the statistical distributions of peakflow, flood duration and flood volume for the Johor River in the south of Peninsular Malaysia. Hourly data recorded over 45 years at the Rantau Panjang gauging station were used. The annual peakflow was taken as the maximum flow in each water year (July-June). Five probability distributions, namely Gamma, Generalized Pareto, Beta, Pearson and Generalized Extreme Value (GEV), were used to model the distribution of peakflow events. Anderson-Darling and Chi-squared goodness-of-fit tests were used to evaluate the best fit. Goodness-of-fit tests at the 5% level of significance indicate that all the models can be used to model the distribution of peakflow, flood duration and flood volume. However, the Generalized Pareto distribution was found to be the most suitable model for peakflow when tested with the Anderson-Darling test, whereas the Chi-squared test suggested that the Generalized Extreme Value distribution was the best for peakflow. The results of this research can be used to improve flood frequency analysis.


INTRODUCTION
Flood disasters caused by monsoonal storms in Malaysia can have a disastrous impact on the country's economy and the social life of the population. According to the MNRE (Ministry of Natural Resources and Environment) (2007), the total flood-affected area in Malaysia was 29,799 km² or about 9% of the total area of the country. The total number of people living in flood-prone areas was estimated at 4.819 million, about 22% of the total population as of the year 2000, and the estimated total annual average flood damage was RM 915 million (MNRE, 2007).
Results from flood distribution studies are important for a country's water resources planning, as they inform decision making in planning and management strategies. Garde and Kothyari (1990), Gunasekara and Cunnane (1992), Haktanir (1992), Bobee et al. (1993), Haktanir and Horlacher (1993), Vogel et al. (1993), Mutua (1994), Bobee and Rasmussen (1995), Mitosek and Strupczewski (2004) and Mitosek et al. (2002) used statistical distributions to model long-term flood characteristics. This study, on the other hand, was aimed at analyzing the statistical distribution of flood variables, notably peakflow, duration and volume of storm runoff, which may be mutually correlated, as pointed out by Laio et al. (2009).
The estimation of extreme rainfalls or flood peak discharges in engineering practice relies on statistical analysis of maximum precipitation or stream flow records, in which the available sample data are used to calculate the parameters of the selected frequency distribution. The fitted distribution is then used to estimate event magnitudes for return periods other than those of the recorded events (Laio et al., 2009). For hydraulic design, it is important to obtain accurate estimates of extreme rainfall to alleviate possible damage.
Normally, the selection of statistical distributions for a flood frequency analysis is done through statistical tests or graphical methods (Bobee et al., 1993). Cunnane (1989) summarized different distributions and parameter estimation procedures that were tested and recommended for different regions. Commonly used distributions for annual flood series modeling include Extreme Value type 1 (EV1), Generalized Extreme Value (GEV), Extreme Value type 2 (EV2), Two-Component Extreme Value, Normal, Log Normal (LN), Pearson type 3 (P3), Log Pearson type 3 (LP3), Gamma, Exponential, Weibull, Generalized Pareto and Wakeby (Cunnane, 1989; Bobee et al., 1993). Cunnane (1989) also revealed, through a survey of 54 agencies in 28 countries, that the EV1, EV2, LN, P3, GEV and LP3 distributions were preferred in ten, three, eight, seven, two and seven countries, respectively (Hadda and Rahman, 2011).
In this study, five probability distributions were considered as potential candidates. These were Beta, Generalized Pareto, Gamma, Pearson and Generalized Extreme Value (GEV). These distributions were selected because they are commonly used in flood frequency studies (Chowdhury et al., 1991; Vogel and McMartin, 1991; Takara and Stedinger, 1994; Zalina et al., 2002).

MATERIALS AND METHODS
Data collection and study area: Discharge and rainfall data recorded at hourly intervals were obtained from the Department of Irrigation and Drainage, Malaysia. Discharge data at the Rantau Panjang gauging station (01°46'50"N and 103°44'45"E) were used in this analysis. The data covered 45 years. Missing records were removed. Figure 1 shows the map of Peninsular Malaysia and the location of the flow gauging station.
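The selection of the annual peakflow from hourly records, with a water year running from July to June and missing records dropped, can be sketched in Python as follows. This is only an illustrative sketch: the function names, the labelling of a water year by the calendar year in which it starts, and the sample readings are assumptions and are not taken from the study.

```python
from datetime import datetime

def water_year(ts, start_month=7):
    """Label a timestamp by the calendar year in which its water year starts.

    With start_month=7 the water year runs from July to June
    (labelling convention assumed for illustration).
    """
    return ts.year if ts.month >= start_month else ts.year - 1

def annual_peaks(records, start_month=7):
    """Return {water_year: maximum flow} from (timestamp, flow) pairs.

    Records whose flow is None are treated as missing and skipped.
    """
    peaks = {}
    for ts, flow in records:
        if flow is None:
            continue
        wy = water_year(ts, start_month)
        if wy not in peaks or flow > peaks[wy]:
            peaks[wy] = flow
    return peaks

# Hypothetical hourly readings in m^3/s, for illustration only.
records = [
    (datetime(2000, 6, 30, 23), 120.0),  # last hour of water year 1999
    (datetime(2000, 7, 1, 0), 95.0),     # first hour of water year 2000
    (datetime(2000, 12, 15, 6), 310.0),
    (datetime(2001, 1, 2, 9), None),     # missing record, skipped
    (datetime(2001, 6, 30, 23), 280.0),
    (datetime(2001, 7, 1, 0), 150.0),    # first hour of water year 2001
]
peaks = annual_peaks(records)
```

Each key of `peaks` is a water year and each value is the peakflow that would enter the annual maximum series.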
In this study, flood duration begins at the start of the hydrograph rise and ends when the falling limb intercepts an extended line with a slope of 0.0055 L/s/ha/h, as suggested by Hewlett and Hibbert (1967) and Yusop et al. (2006). The flood volume includes both base flow and storm flow, as shown in Fig. 2.

Modeling the peakflow, flood duration and flood volume: Generalized Pareto, Pearson, Gamma, Beta and GEV were used to model the distribution of the flood variables. The Cumulative Distribution Function (CDF) was determined using the equation:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt \tag{1}$$

The theoretical CDF is displayed as a continuous curve. The empirical CDF is denoted by:

$$F_n(x) = \frac{1}{n}\,[\text{number of observations} \le x] \tag{2}$$

The Probability Density Function (PDF), f(x), gives the probability that the variate falls within an interval [a, b]:

$$\int_{a}^{b} f(x)\,dx = P(a \le X \le b) \tag{3}$$

For discrete distributions, the empirical (sample) PDF is displayed as vertical lines representing the probability mass at each integer x:

$$f(x) = P(X = x) \tag{4}$$

For continuous distributions, the empirical PDF is shown as a histogram with equal-width vertical bars (bins). The height of each bin is the number of sample data that fall into the corresponding interval divided by the total number of data points. The theoretical PDF is displayed as a continuous curve appropriately scaled to the number of intervals.
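The flood duration rule described at the start of this subsection (a separation line rising at 0.0055 L/s/ha/h from the start of the hydrograph rise until it meets the falling limb) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the index of the start of rise is supplied by the user, the catchment area `area_ha` is a hypothetical value needed to convert the slope to m³/s per hour, the volume is integrated with the trapezoidal rule, and the flow values are invented for illustration.

```python
def flood_event(flows, start, area_ha, slope=0.0055):
    """Return (duration_h, peak_m3s, volume_mm) for one flood event.

    flows   : hourly discharge in m^3/s
    start   : index of the start of the hydrograph rise (assumed known)
    area_ha : catchment area in hectares (hypothetical input)
    slope   : separation-line slope in L/s/ha/h
    """
    # Convert L/s/ha/h to a rise in m^3/s per hour for this catchment.
    rise = slope * area_ha / 1000.0
    peak = max(range(start, len(flows)), key=lambda i: flows[i])
    end = None
    for t in range(peak + 1, len(flows)):
        # The event ends where the falling limb meets the rising line.
        if flows[t] <= flows[start] + rise * (t - start):
            end = t
            break
    if end is None:
        raise ValueError("falling limb never meets the separation line")
    duration_h = end - start
    # Total volume (base flow plus storm flow) by the trapezoidal rule.
    volume_m3 = sum((flows[t] + flows[t + 1]) / 2.0 * 3600.0
                    for t in range(start, end))
    # Express the volume as a depth over the catchment (1 ha = 1e4 m^2).
    volume_mm = volume_m3 / (area_ha * 1e4) * 1e3
    return duration_h, flows[peak], volume_mm

# Hypothetical hourly hydrograph (m^3/s) and catchment area.
flows = [10, 10, 30, 80, 60, 40, 25, 15, 11, 10]
duration_h, peak, volume_mm = flood_event(flows, start=1, area_ha=100000)
```

Because the total flow is integrated rather than only the flow above the separation line, the returned volume includes both base flow and storm flow, in line with the definition used in the text.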
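The Probability Density Functions (PDF) and Cumulative Distribution Functions (CDF) of the five models are given as follows:

Generalized Pareto distribution: The Generalized Pareto distribution with continuous shape parameter (k), continuous scale parameter (σ > 0) and continuous location parameter (µ) has PDF and CDF given by:

$$f(x) = \begin{cases} \dfrac{1}{\sigma}\left(1 + k\,\dfrac{x-\mu}{\sigma}\right)^{-1-1/k}, & k \neq 0 \\[2ex] \dfrac{1}{\sigma}\exp\left(-\dfrac{x-\mu}{\sigma}\right), & k = 0 \end{cases}$$

$$F(x) = \begin{cases} 1 - \left(1 + k\,\dfrac{x-\mu}{\sigma}\right)^{-1/k}, & k \neq 0 \\[2ex] 1 - \exp\left(-\dfrac{x-\mu}{\sigma}\right), & k = 0 \end{cases}$$

where, µ ≤ x < +∞ for k ≥ 0 and µ ≤ x ≤ µ − σ/k for k < 0.

Pearson distribution: The Pearson distribution has a continuous shape parameter (α > 0), a continuous scale parameter (β > 0) and a continuous location parameter (γ), and is defined on the domain:

γ ≤ x < +∞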
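Gamma distribution: The Gamma distribution with continuous shape parameter (α > 0), continuous scale parameter (β > 0) and continuous location parameter (γ) has PDF and CDF given by:

$$f(x) = \frac{(x-\gamma)^{\alpha-1}}{\beta^{\alpha}\,\Gamma(\alpha)}\exp\left(-\frac{x-\gamma}{\beta}\right)$$

$$F(x) = \frac{\Gamma_{(x-\gamma)/\beta}(\alpha)}{\Gamma(\alpha)}$$

where, Γ(α) is the Gamma function, Γ_z(α) is the lower incomplete Gamma function and

γ ≤ x < +∞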
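Beta distribution: The Beta distribution with continuous shape parameters (α₁ > 0, α₂ > 0) and continuous boundary parameters (a < b) has PDF and CDF given by:

$$f(x) = \frac{1}{B(\alpha_1,\alpha_2)}\,\frac{(x-a)^{\alpha_1-1}(b-x)^{\alpha_2-1}}{(b-a)^{\alpha_1+\alpha_2-1}}$$

$$F(x) = I_z(\alpha_1,\alpha_2), \qquad z = \frac{x-a}{b-a}$$

where, B is the Beta function, I_z is the regularized incomplete Beta function and a ≤ x ≤ b.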

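Generalized Extreme Value (GEV):
The Generalized Extreme Value distribution with continuous shape parameter (k), continuous scale parameter (σ > 0) and continuous location parameter (µ) has PDF and CDF given by:

$$f(x) = \begin{cases} \dfrac{1}{\sigma}\exp\left(-(1+kz)^{-1/k}\right)(1+kz)^{-1-1/k}, & k \neq 0 \\[2ex] \dfrac{1}{\sigma}\exp\left(-z-\exp(-z)\right), & k = 0 \end{cases}$$

$$F(x) = \begin{cases} \exp\left(-(1+kz)^{-1/k}\right), & k \neq 0 \\[1ex] \exp\left(-\exp(-z)\right), & k = 0 \end{cases}$$

where, z = (x − µ)/σ, with 1 + kz > 0 for k ≠ 0 and −∞ < x < +∞ for k = 0.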
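To illustrate how a fitted GEV model can be used to estimate event magnitudes for a chosen return period, the Python sketch below evaluates the GEV CDF and inverts it. It assumes the parameterization F(x) = exp(−(1 + kz)^(−1/k)) with z = (x − µ)/σ, which reduces to the Gumbel form exp(−exp(−z)) when k = 0; other references use the opposite sign for the shape parameter. The function names and the parameter values are hypothetical and are not the fitted values of this study.

```python
import math

def gev_cdf(x, k, sigma, mu):
    """GEV CDF under the assumed form exp(-(1 + k*z)**(-1/k))."""
    z = (x - mu) / sigma
    if k == 0:
        return math.exp(-math.exp(-z))
    t = 1.0 + k * z
    if t <= 0:
        # Outside the support: below the lower bound when k > 0,
        # above the upper bound when k < 0.
        return 0.0 if k > 0 else 1.0
    return math.exp(-t ** (-1.0 / k))

def gev_quantile(p, k, sigma, mu):
    """Inverse of gev_cdf for a non-exceedance probability 0 < p < 1."""
    y = -math.log(p)
    if k == 0:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-k) - 1.0) / k

def return_level(T, k, sigma, mu):
    """Magnitude exceeded on average once every T years (annual maxima)."""
    return gev_quantile(1.0 - 1.0 / T, k, sigma, mu)

# Hypothetical parameters, for illustration only.
k, sigma, mu = 0.1, 100.0, 180.0
q100 = return_level(100, k, sigma, mu)
p_check = gev_cdf(q100, k, sigma, mu)  # recovers 1 - 1/100
```

The round trip through `gev_cdf` is a quick consistency check that the quantile function and the CDF use the same sign convention for k.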

Goodness-of-fit tests:
The Goodness-of-Fit (GOF) tests measure the compatibility of a random sample with a theoretical probability distribution function. In other words, these tests show how well a selected distribution fits the data. Two goodness-of-fit tests were conducted at the 5% level of significance. Note that X denotes the random variable and n is the sample size. The two goodness-of-fit tests are described as follows:

Anderson-Darling (A-D) test: This test is used to determine whether a given sample comes from a specific probability distribution. In its basic form, the test assumes that the distribution under scrutiny has no parameters to be estimated, in which case the test and its set of critical values are distribution-free. More often, however, the test is applied to a family of distributions whose parameters have to be estimated from the sample; this has to be accounted for by adjusting the test statistic or its critical values. The test statistic (A²) is given as:

$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(X_i) + \ln\left(1 - F(X_{n-i+1})\right)\right]$$

where, X_i are the sample values sorted in ascending order and F is the CDF of the distribution being tested.

Chi-Squared (C-S) test: This test is a statistical hypothesis test that compares how well the theoretical distribution fits the empirical distribution. The Chi-squared test statistic is given by:

$$\chi^2 = \sum_{i=1}^{m}\frac{(O_i - E_i)^2}{E_i}$$

where,
O_i = The observed frequency for bin i
E_i = The expected frequency for bin i
m = The number of bins

and E_i is given by:

$$E_i = n\left[F(X_2) - F(X_1)\right]$$

where,
X_1 & X_2: The lower and upper limits for bin i

The Cumulative Distribution Function (CDF): The cumulative distribution function is the probability that the variate takes on a value less than or equal to x:

$$F(x) = P(X \le x)$$

For continuous distributions, the CDF is expressed as:

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

so the theoretical CDF is displayed as a continuous curve. The empirical CDF is denoted by:

$$F_n(x) = \frac{1}{n}\,[\text{number of observations} \le x]$$
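As a rough illustration of the quantities described in this subsection, the Python sketch below computes the empirical CDF, the Anderson-Darling statistic and the Chi-squared statistic for a sample against a candidate CDF. It is a sketch under stated assumptions rather than the procedure used in the study: the function names, the sample of annual peaks, the Gumbel parameters and the bin edges are hypothetical, the expected frequency is taken as n[F(X₂) − F(X₁)], and no critical values are computed.

```python
import math

def ecdf(sample, x):
    """Empirical CDF: fraction of observations less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def anderson_darling(sample, cdf):
    """A^2 = -n - (1/n) * sum (2i-1)[ln F(X_i) + ln(1 - F(X_{n-i+1}))]."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        # xs[i-1] is X_i and xs[n-i] is X_{n-i+1} in 1-based notation.
        total += (2 * i - 1) * (math.log(cdf(xs[i - 1]))
                                + math.log(1.0 - cdf(xs[n - i])))
    return -n - total / n

def chi_squared(sample, cdf, edges):
    """Sum over bins of (O_i - E_i)^2 / E_i with E_i = n[F(X2) - F(X1)]."""
    n = len(sample)
    stat = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(1 for v in sample if lo < v <= hi)
        expected = n * (cdf(hi) - cdf(lo))
        stat += (observed - expected) ** 2 / expected
    return stat

def gumbel_cdf(x, mu=170.0, sigma=90.0):
    """Candidate CDF (GEV with zero shape); parameters are hypothetical."""
    return math.exp(-math.exp(-(x - mu) / sigma))

# Hypothetical annual peakflows (m^3/s), for illustration only.
sample = [95, 130, 160, 180, 210, 240, 300, 420]
a2 = anderson_darling(sample, gumbel_cdf)
chi2 = chi_squared(sample, gumbel_cdf, edges=[0, 150, 250, 1000])
half = ecdf(sample, 180)
```

For both statistics a smaller value indicates a smaller distance between the sample and the candidate distribution, which is the basis for ranking the fitted models.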

RESULTS AND DISCUSSION
The averages of peakflow, flood duration and flood volume at the study site were 248 m³/sec, 349 h and 104 mm, respectively, and the corresponding standard deviations were 163 m³/sec, 125 h and 49 mm. Table 1 presents the fitted parameters of the various distributions for the flood variables. The table gives the values of the continuous shape parameter (α, k), continuous scale parameter (σ, β) and continuous location parameter (µ, γ). The goodness-of-fit results are given in Table 2. Based on the Anderson-Darling goodness-of-fit test, the Generalized Pareto (GP) was the best distribution to fit the peakflow and the Generalized Extreme Value (GEV) was the best for flood duration and volume. However, when the Chi-Squared test was used, GEV became more favorable for fitting peakflow and flood volume and Beta was the best distribution for the flood duration. Figure 3 presents the PDF of the GEV and GP distributions fitted to the peakflow. Since the goodness-of-fit test statistics indicate the distance between the observed data and the fitted distributions, the distribution with the lowest statistic value is the best-fitting model. Accordingly, the Anderson-Darling statistics for GP fitted to peakflow and for GEV fitted to flood duration and flood volume were 0.1551, 0.3547 and 0.2376, respectively. For the Chi-Squared test, the statistics for GEV fitted to peakflow and flood volume and for Beta fitted to flood duration were 0.0571, 0.15231 and 1.2941, respectively. Figure 4 presents the PDF of the GEV distribution fitted to the flood volume, whereas Fig. 5 compares the CDF of peakflow between the GEV and GP distributions. The CDF graph is useful for judging how well the distributions fit the observed data. The results showed that the GP distribution fitted the peakflow better than GEV.
Previous studies on flood variables mostly focused on peakflow. The GEV distribution has been found to be the best distribution to fit peakflow data over several stations in Malaysia (Ahmad et al., 2011; Ashkar and Mahdi, 2006). Also, Suhaila and Jemain (2007, 2008) found that the GEV distribution was the best for fitting daily rainfall throughout Peninsular Malaysia.

CONCLUSION
Flow data were used to analyze the statistical distribution of the peakflow, duration and volume of annual floods for the Johor River at the Rantau Panjang gauging station. Five probability distributions, namely Beta, Generalized Pareto, Gamma, Pearson and Generalized Extreme Value, were tested. Based on the Anderson-Darling test, the Generalized Extreme Value distribution was the most suitable for modeling the flood volume and duration, and the Generalized Pareto gave the best fit to peakflow. Meanwhile, based on the Chi-squared test, the Generalized Extreme Value distribution was the most suitable for modeling the flood volume and peakflow, and Beta gave the best fit to the duration. Goodness-of-fit tests at the 5% level of significance indicate that all the models can be used to model the distribution of peakflow, flood duration and flood volume. For further study, it is recommended to evaluate the performance of other distributions such as Log Normal, Log Pearson Type 3 and Normal. In addition, other goodness-of-fit tests, such as Kolmogorov-Smirnov, can be attempted.

Table 1: Fitting result parameters for various distributions of flood variables