Modeling of Annual Maximum Storm Intensity with Bayesian Markov Chain Monte Carlo (MCMC) and L-moment

This study identifies the best-fitting distribution for the annual Maximum Storm Intensity (MSI) series, based on hourly rainfall from 1970 to 2008 at three rain gauge stations in Peninsular Malaysia, namely Bertam, Dungun and Pekan. Two three-parameter extreme value distributions are considered: the Generalized Extreme Value (GEV) and the Generalized Logistic (GL). The parameters of these distributions are estimated using Bayesian MCMC with a non-informative prior distribution and the L-moments (LMOM) method. The Goodness-Of-Fit (GOF) between the empirical data and the theoretical distributions is then evaluated for each station. The results show that, for the majority of stations, the L-moment method gives the best model for MSI, specifically with the GEV distribution. Based on the identified model, the risks associated with MSI can reasonably be predicted for various return periods.


INTRODUCTION
Statistical modeling of extreme events is important in various disciplines, including hydrology, engineering and environmental science. For environmental processes, extreme value theory can be used to estimate the probabilities of extreme levels of the processes. For some processes, such as sea level and wind speed, this information can help in the design of structures such as sea walls, bridges and buildings. For other processes, such as rainfall and pollution, the information can be used to assess the danger due to extreme levels of the process. Extreme value theory can also be used in finance, for example to assess the risk of large insurance claims or to predict the probability of rare events.
Extreme rainfall events are often associated with climate change and may be followed by a series of natural disasters such as flash floods and landslides. Given this, the analysis of extreme rainfall data can help decision makers set up measures to reduce or prevent the impact of such disasters. In Malaysia, extreme value analysis of rainfall data has been applied for many purposes, such as tracing patterns and trends of daily rainfall during monsoon seasons (Suhaila et al., 2010a, 2010b), detecting recent changes in extreme rainfall events (Zin et al., 2010) and fitting probability distributions to annual maximum rainfall using various methods (Zin et al., 2009; Zawiah et al., 2009; Zin et al., 2010).
Previous studies have used several approaches to Storm Event Analysis (SEA) (Eagleson, 1972; Adams et al., 1986; Guo and Adams, 1998a, 1998b; Adams and Papa, 2000; Rivera et al., 2005). In this study, the statistical characterization of Maximum Storm Intensity (MSI) involves fitting two extreme value distributions: the Generalized Extreme Value (GEV) and the Generalized Logistic (GL). The parameters of these distributions are estimated using Bayesian inference with a non-informative prior and the L-moment method.
Bayesian inference has had a fundamental impact on virtually every area of statistical methodology, and Bayesian analysis has enormous potential in many research fields. In particular, important studies have applied the Bayesian approach in water resources engineering (Coles and Powell, 1996; Kuczera and Parent, 1998). The most important step in the Bayesian method is the construction of the prior distribution. The prior distribution represents the information about an uncertain parameter. It is combined with the probability distribution of new data to yield the posterior distribution, which in turn is used for future inferences and decisions involving the parameter. The choice of a particular prior distribution should be based on any available knowledge about the parameters, and optimal parameter values estimated in previous studies can help establish the prior distribution and constraints on the model parameters. There are two types of prior distribution: data-based and non-data-based. A prior distribution derived through objective analysis of data is called a data-based or informative prior distribution. A prior derived from subjective judgments or theoretical considerations is called a non-data-based prior distribution. The non-informative prior distribution is a special case of the non-data-based prior distribution. When a non-informative prior distribution is used, the posterior distribution reflects only the information in the sample. One point of debate in Bayesian statistics is the potential effect of the non-informative prior distribution, and Jeffreys (1961) suggested that efforts should be made to elicit an informative prior distribution. The theoretical background for applying non-informative prior distributions is plentiful (Bernardo and Smith, 1994; Gelman et al., 1995; Carlin and Louis, 1996). However, there appears to be little literature in which the analysis uses an informative prior distribution, despite the importance of such applications.
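The sketch below shows how a near-flat prior leaves the posterior driven by the sample alone. It uses a deliberately simple conjugate model instead of the GEV or GL models of this study: a normal mean μ with known standard deviation σ and a normal prior on μ. All data values and prior settings are hypothetical. With a very large prior variance, the posterior mean and variance are essentially the sample mean and σ²/n. A tight prior pulls the estimate towards the prior mean.

```python
def normal_posterior(data, sigma, prior_mean, prior_var):
    """Conjugate update for a normal mean mu with known sigma.

    Prior: mu ~ N(prior_mean, prior_var). Returns (posterior mean, posterior variance).
    """
    n = len(data)
    xbar = sum(data) / n
    precision = 1.0 / prior_var + n / sigma ** 2  # prior precision + data precision
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + n * xbar / sigma ** 2)
    return post_mean, post_var

data = [12.0, 15.5, 9.8, 20.1, 14.2]  # hypothetical observations
# Nearly flat prior (huge variance): the posterior reflects only the sample
flat = normal_posterior(data, sigma=4.0, prior_mean=0.0, prior_var=1e8)
# Informative prior centred at 5: the posterior mean is pulled towards 5
informative = normal_posterior(data, sigma=4.0, prior_mean=5.0, prior_var=1.0)
```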

DATA AND DEFINITION OF STORM
The data, consisting of hourly rainfall records from 3 rain gauge stations in Peninsular Malaysia from 1970 to 2008, were obtained from the Drainage and Irrigation Department. Data from these stations are presented in Table 1. The locations of these six stations are shown in Fig. 1.

METHODOLOGY
Probability distribution:
The most common analysis of extreme hydrological events uses the annual maximum (or annual extreme) series. When constructing the MSI series, the maximum value for each year in the record is selected, so the resulting series has a length equal to the number of years. Studies that use the annual maximum series usually fit a probability model to the rainfall data, and several researchers have provided useful applications of annual maximum distributions to rainfall data from different regions of the world.
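The following is a minimal sketch of building an annual maximum series. It assumes the storm-level values are already available as (year, value) pairs. The function name and the data are hypothetical and for illustration only.

```python
def annual_maximum_series(records):
    """Return {year: annual maximum} from (year, value) pairs, sorted by year.

    One value is kept per year, so the series length equals the number of years.
    """
    maxima = {}
    for year, value in records:
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return dict(sorted(maxima.items()))

# Hypothetical storm intensities (mm/h) tagged with their year
records = [(1970, 12.5), (1970, 30.1), (1971, 8.0), (1971, 22.4), (1972, 41.0)]
ams = annual_maximum_series(records)
```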
Two probability distributions associated with modeling extreme events, GEV and GL, are considered in this study. The probability density function, cumulative distribution function and quantile function of each distribution are given in Table 3. There, x denotes the observed value of the random variable representing the event of interest, α is the scale parameter, ε is the location parameter and κ is the shape parameter.
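The sketch below writes out the GEV and GL distribution and quantile functions. It assumes the Hosking and Wallis (1997) parameterization (location ε, scale α, shape κ). That choice matches the L-moment method used in this study, but its agreement with Table 3 is an assumption. The T-year return level is the quantile at non-exceedance probability F = 1 − 1/T. The parameter values in the usage lines are hypothetical.

```python
import math

def _reduced_variate(x, loc, scale, shape):
    """Reduced variate y in Hosking's form; None if x lies outside the support."""
    z = (x - loc) / scale
    if abs(shape) < 1e-12:          # Gumbel / logistic limit
        return z
    if shape * z >= 1.0:            # 1 - shape * z <= 0: outside the support
        return None
    return -math.log1p(-shape * z) / shape

def gev_cdf(x, loc, scale, shape):
    y = _reduced_variate(x, loc, scale, shape)
    if y is None:                   # above upper bound (shape > 0) or below lower bound (shape < 0)
        return 1.0 if shape > 0 else 0.0
    if -y > 700.0:                  # exp(-y) would overflow; F is effectively 0
        return 0.0
    return math.exp(-math.exp(-y))

def glo_cdf(x, loc, scale, shape):
    y = _reduced_variate(x, loc, scale, shape)
    if y is None:
        return 1.0 if shape > 0 else 0.0
    if -y > 700.0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-y))

def gev_quantile(F, loc, scale, shape):
    if abs(shape) < 1e-12:          # Gumbel limit
        return loc - scale * math.log(-math.log(F))
    return loc + scale / shape * (1.0 - (-math.log(F)) ** shape)

def glo_quantile(F, loc, scale, shape):
    odds = (1.0 - F) / F
    if abs(shape) < 1e-12:          # logistic limit
        return loc - scale * math.log(odds)
    return loc + scale / shape * (1.0 - odds ** shape)

# Hypothetical parameters: 10- and 100-year return levels use F = 1 - 1/T
x10 = gev_quantile(1 - 1 / 10, loc=30.0, scale=8.0, shape=-0.1)
x100 = gev_quantile(1 - 1 / 100, loc=30.0, scale=8.0, shape=-0.1)
```

In this parameterization, κ < 0 gives a heavy upper tail with a lower bound, and κ > 0 gives an upper bound.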
To fit a particular theoretical distribution to the observed distribution of AMSE, the parameters are estimated using the Bayesian MCMC and LMOM methods.
Bayesian MCMC with non-informative prior distribution: This section introduces the idea of Bayesian MCMC using non-informative priors. Suppose that prior beliefs about θ can be formulated and expressed by a probability density function π(θ) with no reference to the data. For observations x = (x_1, …, x_n), the likelihood for θ is
$$L(\theta \mid x) = \prod_{i=1}^{n} f(x_i \mid \theta).$$
The prior information and the likelihood can be combined using Bayes' theorem to produce the posterior distribution for θ:
$$\pi(\theta \mid x) = \frac{\pi(\theta)\, L(\theta \mid x)}{\int \pi(\theta)\, L(\theta \mid x)\, d\theta}.$$
Often, the purpose of an extreme value analysis is to describe the extremal behavior of an observed process in order to find the probability of extreme events occurring in the future. Within the Bayesian framework, prediction is possible through the predictive distribution. Let y denote a future observation with probability density function f(y | θ); then
$$f(y \mid x) = \int f(y \mid \theta)\, \pi(\theta \mid x)\, d\theta$$
is the predictive distribution of y given x. So, if a suitable prior distribution can be specified, there are good reasons to choose Bayesian procedures. Because the integral in the predictive distribution is difficult to compute, simulation techniques such as MCMC are used to generate realizations from the posterior distribution. The key feature of Bayesian MCMC with a non-informative prior is that the priors are constructed by assuming that no information about the process is available apart from the data. In this study the prior density was chosen to be:
The variances are chosen large enough to make the distribution almost flat, corresponding to prior ignorance, and the posterior density is:
The full details of the algorithm are as follows:
1. Initialize the chain θ^(0) and the counter at j = 1.
2. Put …
… Increase the counter from j to j + 1 and return to step 2.
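The exact prior and proposal used in the study are not given in the text, so the following is only a sketch of a random walk Metropolis sampler for the GEV parameters. It rests on three assumptions. First, it places independent normal priors with large variances on the location, the log-scale and the shape; the values in PRIOR_VAR are chosen only to make the prior nearly flat. Second, it uses Hosking's GEV parameterization. Third, it draws normal random walk proposals with hypothetical step sizes. The function returns the chain and the acceptance rate.

```python
import math
import random

def gev_logpdf(x, loc, scale, shape):
    """GEV log-density (Hosking's parameterization); -inf outside the support."""
    if scale <= 0.0:
        return -math.inf
    z = (x - loc) / scale
    if abs(shape) < 1e-12:          # Gumbel limit
        y = z
    else:
        if shape * z >= 1.0:        # outside the support
            return -math.inf
        y = -math.log1p(-shape * z) / shape
    if -y > 700.0:                  # exp(-y) would overflow; density effectively zero
        return -math.inf
    return -math.log(scale) - (1.0 - shape) * y - math.exp(-y)

# Assumed prior variances for (location, log-scale, shape): large => nearly flat
PRIOR_VAR = (1e4, 1e4, 1e2)

def log_posterior(theta, data):
    """Unnormalized log posterior: independent normal priors + GEV log-likelihood."""
    loc, log_scale, shape = theta
    log_prior = sum(-0.5 * t * t / v for t, v in zip(theta, PRIOR_VAR))
    scale = math.exp(log_scale)
    log_lik = 0.0
    for x in data:
        lp = gev_logpdf(x, loc, scale, shape)
        if lp == -math.inf:
            return -math.inf
        log_lik += lp
    return log_prior + log_lik

def random_walk_metropolis(data, theta0, step=(0.5, 0.05, 0.05),
                           n_iter=30000, seed=1):
    """Random walk Metropolis on (location, log-scale, shape).

    Returns the list of visited states and the acceptance rate."""
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_posterior(theta, data)
    chain, accepted = [], 0
    for _ in range(n_iter):
        # Candidate: add an independent normal step to every component
        proposal = [t + rng.gauss(0.0, s) for t, s in zip(theta, step)]
        cand = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio); the proposal is symmetric
        if cand > -math.inf and math.log(1.0 - rng.random()) < cand - current:
            theta, current = proposal, cand
            accepted += 1
        chain.append(tuple(theta))
    return chain, accepted / n_iter
```

A proposal that places any observation outside the GEV support has zero posterior density and is always rejected. A GL version would only need its log-density swapped in. The step sizes control the acceptance rate and would normally be tuned for each station.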

L-moment (LMOM) and Goodness-Of-Fit (GOF):
The LMOM method, popularized by Hosking and Wallis (1997), is widely used in applied fields such as hydrology, meteorology and civil engineering to estimate the parameters of a distribution. L-moments are linear combinations of order statistics, and the first four L-moments measure location, scale, skewness and kurtosis, respectively. Compared with the maximum likelihood method and the method of moments, LMOM estimators are more robust, have been shown to have smaller mean square error and are easier to compute. As described by Vogel and Fennessey (1993), LMOM should be preferred for small sample sizes because of its robustness. The rth L-moment, denoted λ_r, is defined as
$$\lambda_r = \frac{1}{r}\sum_{k=0}^{r-1}(-1)^k \binom{r-1}{k}\, E\left(X_{r-k:r}\right), \qquad r = 1, 2, \ldots$$
where X_{r−k:r} is the random variable for the (r − k)th order statistic of a sample of size r.
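In practice, λ_r is estimated from a sample through probability-weighted moments. The sketch below uses the unbiased sample probability-weighted moments b_0, …, b_3 and the standard relations from Hosking and Wallis (1997): l_1 = b_0, l_2 = 2b_1 − b_0, l_3 = 6b_2 − 6b_1 + b_0 and l_4 = 20b_3 − 30b_2 + 12b_1 − b_0. The function name is illustrative.

```python
def sample_lmoments(data):
    """Sample L-moments (l1, l2, l3, l4) and L-moment ratios t3, t4,
    computed from unbiased probability-weighted moments b0..b3."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = sum(x) / n
    b1 = sum((j - 1) / (n - 1) * x[j - 1] for j in range(1, n + 1)) / n
    b2 = sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x[j - 1]
             for j in range(1, n + 1)) / n
    b3 = sum((j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3)) * x[j - 1]
             for j in range(1, n + 1)) / n
    l1 = b0                                  # location
    l2 = 2 * b1 - b0                         # scale
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4, l3 / l2, l4 / l2  # t3: L-skewness, t4: L-kurtosis
```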
Once the distribution of the observed values has been determined for the MSI series, the expected values under the assumed distribution are computed for each station. The most appropriate distribution for each station is identified using the results of several goodness-of-fit tests.
Three GOF tests are considered: the Relative Root Mean Square Error (RRMSE), the Relative Absolute Square Error (RASE) and the Probability Plot Correlation Coefficient (PPCC). The first two measure the difference between the observed values and the expected values under the assumed distribution. The third measures the correlation between the ordered observed values and the associated expected values. The formulas for the tests are:
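The sketch below computes the three statistics under commonly used definitions, which are assumptions here. Let x_i be the ordered observations and y_i the fitted quantiles at Weibull plotting positions i/(n + 1). Then RRMSE = sqrt((1/n) Σ((x_i − y_i)/x_i)²), RASE = (1/n) Σ|(x_i − y_i)/x_i|, and PPCC is the Pearson correlation between x_i and y_i. Smaller RRMSE and RASE and a PPCC closer to 1 indicate a better fit. The relative errors require positive observations, which holds for storm intensities.

```python
import math

def gof_statistics(observed, quantile_fn):
    """RRMSE, RASE and PPCC between ordered observations and fitted quantiles.

    quantile_fn maps a non-exceedance probability to a fitted value.
    Weibull plotting positions p_i = i/(n+1) are an assumption."""
    x = sorted(observed)
    n = len(x)
    y = [quantile_fn(i / (n + 1)) for i in range(1, n + 1)]
    rel = [(xi - yi) / xi for xi, yi in zip(x, y)]   # relative errors
    rrmse = math.sqrt(sum(r * r for r in rel) / n)
    rase = sum(abs(r) for r in rel) / n
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    ppcc = sxy / math.sqrt(sxx * syy)                 # Pearson correlation
    return rrmse, rase, ppcc
```

PPCC only measures linearity of the probability plot, so a fit that is off by a constant factor can still reach PPCC = 1. This is why it is paired with the two error measures.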

RESULTS
The parameters of the two probability distributions, GEV and GL, are estimated using L-moments and the Bayesian MCMC with the random walk chain algorithm.
In the case of LMOM, the sample L-moments λ_r are equated to their theoretical expressions, and the resulting equations are solved with simple algebra to obtain estimates of the GEV and GL parameters. In the Bayesian framework, the algorithm in Section 3.2 is used to obtain the parameters.
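The following sketch inverts the L-moment relations of Hosking and Wallis (1997). For GL the inversion is exact (κ = −τ_3). For GEV, the shape comes from Hosking's rational approximation κ ≈ 7.8590z + 2.9554z², with z = 2/(3 + τ_3) − ln 2/ln 3. Whether the study used exactly these expressions is an assumption. The inputs are the sample l_1, l_2 and t_3 = l_3/l_2.

```python
import math

EULER_GAMMA = 0.5772156649015329

def gev_from_lmoments(l1, l2, t3):
    """GEV (loc, scale, shape) from L-moments via Hosking's approximation for the shape."""
    z = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * z + 2.9554 * z * z
    if abs(k) < 1e-8:                     # Gumbel limit: l2 = scale * ln 2
        scale = l2 / math.log(2.0)
        return l1 - EULER_GAMMA * scale, scale, 0.0
    g = math.gamma(1.0 + k)
    scale = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    loc = l1 - scale * (1.0 - g) / k
    return loc, scale, k

def glo_from_lmoments(l1, l2, t3):
    """GLO (loc, scale, shape) from L-moments (exact inversion, shape = -t3)."""
    k = -t3
    if abs(k) < 1e-12:                    # logistic limit: l1 = loc, l2 = scale
        return l1, l2, 0.0
    scale = l2 * math.sin(k * math.pi) / (k * math.pi)
    loc = l1 - scale * (1.0 / k - math.pi / math.sin(k * math.pi))
    return loc, scale, k
```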
Hourly rainfall data for the six rain gauge stations were analyzed using the algorithm in Section 3.2, with 30,000 iterations of the algorithm in each case. The MCMC trace plots and estimated posterior densities of the GEV and GL parameters for the six rain gauges are given in Fig. 3 to 6, respectively. To check that the chains had converged to the same stationary distribution, the algorithm was rerun from various starting points. The chains for all six sites converged well within the first 10,000 iterations, which suggests that the chosen proposal distribution works well.
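The convergence check above ran chains from different starting points. One common way to quantify such a check, not necessarily the one used in this study, is the Gelman-Rubin potential scale reduction factor (R-hat). It compares between-chain and within-chain variance, and values near 1 indicate that the chains agree. The chains below are simulated, hypothetical draws.

```python
import random

def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    chains: m equal-length lists of post-burn-in draws, each started from a
    different initial value. Values close to 1 suggest the chains agree."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
                 for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * within + between / n   # pooled variance estimate
    return (var_hat / within) ** 0.5

rng = random.Random(42)
# Hypothetical draws: three chains sampling the same N(0, 1) target
same = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# Hypothetical draws: the third chain is stuck around a different value
stuck = same[:2] + [[rng.gauss(5.0, 1.0) for _ in range(2000)]]
```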
The results of the parameter estimation at the six rain gauge stations using the GEV and GL distributions are summarized in Table 4. In Table 4, the posterior means from the Bayesian MCMC and the LMOM estimates are very similar, regardless of the rain gauge station and probability distribution. In terms of point estimates, therefore, the Bayesian MCMC offers no advantage over LMOM. In terms of uncertainty, however, the Bayesian MCMC is more informative than LMOM because it yields a full posterior distribution for each parameter. The Bayesian MCMC therefore has an advantage over LMOM when the uncertainty in the parameters must be quantified.
Comparing the GEV and GL fits using the numerical GOF results in Table 5, all GOF tests indicate that the MSI series of the six rain gauge stations follow the GEV distribution. The estimation methods were also compared under each GOF test by the proportion of cases in which one outperforms the other. On that basis, LMOM is superior to the Bayesian MCMC for all three GOF tests considered. These results are summarized in Table 5.
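The uncertainty advantage described above comes from keeping the whole posterior sample rather than a single estimate. The following minimal sketch assumes chain draws stored as (location, log-scale, shape) tuples in Hosking's GEV parameterization. It discards a burn-in, for example the first 10,000 of 30,000 iterations in line with the convergence statement above. It then reports posterior means, equal-tailed credible intervals and the implied distribution of T-year return levels. The function names are hypothetical.

```python
import math

def posterior_summary(draws, burn_in, prob=0.95):
    """Posterior mean and equal-tailed credible interval of a scalar,
    after discarding the first burn_in draws (nearest-rank quantiles)."""
    kept = sorted(draws[burn_in:])
    n = len(kept)
    lo = kept[round((1.0 - prob) / 2.0 * (n - 1))]
    hi = kept[round((1.0 + prob) / 2.0 * (n - 1))]
    return sum(kept) / n, (lo, hi)

def gev_return_level(T, loc, scale, shape):
    """T-year return level: GEV quantile at non-exceedance probability 1 - 1/T."""
    F = 1.0 - 1.0 / T
    if abs(shape) < 1e-12:   # Gumbel limit
        return loc - scale * math.log(-math.log(F))
    return loc + scale / shape * (1.0 - (-math.log(F)) ** shape)

def return_level_draws(chain, T):
    """Map each posterior draw (loc, log_scale, shape) to a T-year return level."""
    return [gev_return_level(T, loc, math.exp(log_scale), shape)
            for loc, log_scale, shape in chain]

# Typical use with a 30,000-draw chain:
#   levels = return_level_draws(chain[10000:], T=100)
#   mean, (lower, upper) = posterior_summary(levels, burn_in=0)
```

The width of the interval from posterior_summary is the uncertainty measure that a single LMOM estimate does not provide on its own.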

CONCLUSION
In this study, the occurrence probability of annual Maximum Storm Intensity (MSI) events was analyzed at six rain gauge stations in Peninsular Malaysia. Two probability distributions, the GEV and GL distributions, were selected to fit the data. LMOM and the Bayesian MCMC were used to analyze the data, specifically to estimate the parameters of the two probability distributions. Checking the acceptance rate showed that the Bayesian MCMC worked well and efficiently with the non-informative prior distribution. The results of parameter and quantile estimation showed that the Bayesian MCMC had no advantage over LMOM when only the median or mean value was required. However, for uncertainty analysis, the Bayesian MCMC could remarkably reduce the range of uncertainty. Reducing the uncertainty in the results of a frequency analysis may not be meaningful in every case, and Bayesian analysis cannot always reduce the uncertainty. In particular, when much information is available for defining the unknown parameters, such as a large sample, the uncertainty has relatively little influence on a specific decision. When little information is available, however, the uncertainty analysis strongly influences the final selection of the parameters. Therefore, reducing the uncertainty in the frequency analysis of extreme events, such as the rare rainfall events in this study, can provide a meaningful description.

Fig. 2: Definition of storm

Table 1 :
Main characteristics of the rain stations (SD: standard deviation)

The annual MSA, MSI and MSD and the number of storms (NS) for two rain gauge stations (Dungun and Bertam) are provided in Table 2. Let x_ij be the jth hourly rainfall (mm) in the ith storm and n_i the duration (hours) of the ith storm. For each storm, the MSA, MSI and MSD were obtained from the hourly data as follows:
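The storm definition of Fig. 2 and the MSA, MSI and MSD equations are not given in the text, so the sketch below rests on stated assumptions. A storm is taken to be a run of wet hours separated from the next storm by at least min_dry_hours dry hours; this criterion is hypothetical. For storm i, the amount is Σ_j x_ij, the duration is n_i and the intensity is the amount divided by n_i. MSA, MSD and MSI are read as the annual maxima of storm amount, duration and intensity. All function names are illustrative.

```python
def split_storms(hourly, min_dry_hours=1):
    """Split a list of hourly rainfall depths (mm) into storms.

    A storm starts at a wet hour (> 0 mm) and ends once min_dry_hours
    consecutive dry hours follow. min_dry_hours is a hypothetical
    separation criterion, not necessarily the one used in the study."""
    storms, current, pending_dry = [], [], []
    for value in hourly:
        if value > 0:
            current.extend(pending_dry)  # short dry gaps stay inside the storm
            pending_dry = []
            current.append(value)
        elif current:
            pending_dry.append(value)
            if len(pending_dry) >= min_dry_hours:
                storms.append(current)   # trailing dry hours are not part of the storm
                current, pending_dry = [], []
    if current:
        storms.append(current)
    return storms

def storm_statistics(storm):
    """(amount in mm, duration in h, intensity in mm/h) of one storm."""
    amount = sum(storm)
    duration = len(storm)
    return amount, duration, amount / duration

def annual_storm_summary(hourly_by_year, min_dry_hours=1):
    """{year: (NS, MSA, MSD, MSI)}, reading MSA, MSD and MSI as the annual
    maxima of storm amount, duration and intensity (an assumption).
    Storms crossing a year boundary are not handled."""
    summary = {}
    for year, hourly in hourly_by_year.items():
        stats = [storm_statistics(s) for s in split_storms(hourly, min_dry_hours)]
        if stats:
            summary[year] = (len(stats),
                             max(s[0] for s in stats),
                             max(s[1] for s in stats),
                             max(s[2] for s in stats))
    return summary
```

The storm separation rule changes both NS and the maxima, so the chosen min_dry_hours should match the definition in Fig. 2.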

Table 2 :
Annual number of storms (NS) and annual MSA, MSI and MSD for stations Dungun and Bertam

Table 3 :
List of distributions used in this study

Table 4 :
Posterior means by MCMC and L-moment estimates of the GEV and GLO parameters for each station

Table 5 :
Comparison of performance of MCMC versus L-moment under different GOF for GEV and GLO distributions

Table 6 :
Comparison of performance of MCMC versus L-moment under different return period for GEV and GLO distributions