Multiple comparisons of precipitation variations in different areas using simultaneous confidence intervals for all possible ratios of variances of several zero-inflated lognormal models

Flash flooding and landslides regularly cause injury, death, and homelessness in Thailand. An advance warning system is necessary for predicting natural disasters, and analyzing the variability of daily precipitation could be useful in this regard. Moreover, analyzing the differences in precipitation data among multiple weather stations could be used to predict variations in meteorological conditions throughout the country. Since precipitation data in Thailand follow a zero-inflated lognormal (ZILN) distribution, multiple comparisons of precipitation variation in different areas can be addressed by using simultaneous confidence intervals (SCIs) for all possible pairwise ratios of variances of several ZILN models. Herein, we formulate SCIs using Bayesian, generalized pivotal quantity (GPQ), and parametric bootstrap (PB) approaches. The results of a simulation study provide insight into the performances of the SCIs. Those based on PB and on the Bayesian approach with the probability-matching-beta prior performed well in situations with a large proportion of zeros and a large variance. In addition, the Bayesian SCI based on the reference-beta prior and the GPQ SCI can be considered as alternative approaches for small-to-large and medium-to-large sample sizes, respectively, from a large number of populations. These approaches were applied to estimate the precipitation variability among weather stations in lower southern Thailand to illustrate their efficacies.

difference between rainfall variability in the lower and upper northern regions of Thailand. Recently, Bayesian credible intervals based on a non-informative prior were presented by Maneerat, Niwitpong & Niwitpong (2021a) for the single variance of a delta-lognormal model that was used on daily rainfall records.
Nevertheless, no studies have yet been conducted on simultaneous CIs (SCIs) for pairwise comparisons of the variances of several ZILN models, and so we directed our research toward filling this gap. Hence, we estimated all possible ratios of variances of several ZILN models by using SCIs based on Bayesian, parametric bootstrap (PB), and generalized pivotal quantity (GPQ) approaches. These approaches were chosen because the Bayesian and PB approaches can be used to construct CIs capable of handling situations with large differences in the variances and a high proportion of zero values in delta-lognormal models, respectively (Maneerat, Niwitpong & Niwitpong, 2020), while CIs based on the GPQ approach perform quite well when the variance is large (maneeratEstimatingFishDispersal2020). Their efficacies were assessed via simulation studies and precipitation data from four areas of the lower southern region of Thailand in terms of the coverage rate (CR), the lower error rate (LER), the upper error rate (UER), and the average width (AW).

Model
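For $h$ groups, let $\delta_i$, $i = 1, 2, \ldots, h$, denote the probability of a zero observation, while the non-zero observations, which occur with the remaining probability $\delta_i' = 1 - \delta_i$, follow a lognormal distribution denoted as $LN(\mu_i, \sigma_i^2)$ with mean $\mu_i$ and variance $\sigma_i^2$ on the log scale. For random samples from the groups, let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})$ denote a ZILN variate based on $n_i$ observations from group $i$, with the probability density function given by

$$f(y_{ij}; \delta_i, \mu_i, \sigma_i^2) = \begin{cases} \delta_i, & y_{ij} = 0, \\ \delta_i' \, \dfrac{1}{y_{ij}\sqrt{2\pi}\,\sigma_i} \exp\left\{ -\dfrac{(\ln y_{ij} - \mu_i)^2}{2\sigma_i^2} \right\}, & y_{ij} > 0. \end{cases}$$

For $Y_{ij} = 0$, the number of zero observations $n_{i0}$ follows a binomial distribution with sample size $n_i$ and zero probability $\delta_i$, where $n_i = n_{i0} + n_{i1}$, $n_{i0} = \#\{j : Y_{ij} = 0\}$ and $n_{i1} = \#\{j : Y_{ij} > 0\}$; $j = 1, 2, \ldots, n_i$. For $Y_{ij} > 0$, $W_{ij} = \ln Y_{ij}$ are normally distributed with mean $\mu_i$ and variance $\sigma_i^2$. For a ZILN model, the maximum likelihood estimates of $\delta_i$, $\mu_i$ and $\sigma_i^2$ are

$$\hat{\delta}_i = \frac{n_{i0}}{n_i}, \qquad \hat{\mu}_i = \frac{1}{n_{i1}} \sum_{j: Y_{ij} > 0} \ln Y_{ij}, \qquad \hat{\sigma}^2_{i,mle} = \frac{1}{n_{i1}} \sum_{j: Y_{ij} > 0} \left[\ln Y_{ij} - \hat{\mu}_i\right]^2,$$

respectively. For the $i$th group, the population variance of $Y_i$ is given by

$$V_i = \delta_i' \exp(2\mu_i + \sigma_i^2)\left[\exp(\sigma_i^2) - \delta_i'\right],$$

which can be log-transformed as

$$T_i = \ln V_i = \ln \delta_i' + 2(\mu_i + \sigma_i^2) + \ln\left[1 - \delta_i' \exp(-\sigma_i^2)\right].$$

Given $\hat{\delta}_i$, $\hat{\mu}_i$ and $\hat{\sigma}_i^2$ from the observations, the estimate of $T_i$ can be written as

$$\hat{T}_i = \ln \hat{\delta}_i' + 2(\hat{\mu}_i + \hat{\sigma}_i^2) + \ln\left[1 - \hat{\delta}_i' \exp(-\hat{\sigma}_i^2)\right], \qquad \hat{\delta}_i' = 1 - \hat{\delta}_i.$$

Using the delta theorem, an approximate variance of $\hat{T}_i$, $\mathrm{Var}(\hat{T}_i)$, is obtained.

In the present study, the parameters of interest are all pairwise ratios of the variances of several ZILN models on the log scale, defined as

$$\lambda_{ik} = \ln\left(\frac{V_i}{V_k}\right) = T_i - T_k.$$

Its estimate is $\hat{\lambda}_{ik} = \hat{T}_i - \hat{T}_k$; $\forall i \neq k$ and $i, k = 1, 2, \ldots, h$. From Eq. (4), the variance of $\hat{\lambda}_{ik}$ can be expressed as

$$\mathrm{Var}(\hat{\lambda}_{ik}) = \mathrm{Var}(\hat{T}_i) + \mathrm{Var}(\hat{T}_k) - 2\,\mathrm{Cov}(\hat{T}_i, \hat{T}_k),$$

where the covariance between $\hat{T}_i$ and $\hat{T}_k$ is $\mathrm{Cov}(\hat{T}_i, \hat{T}_k) = 0$ because the samples from groups $i$ and $k$ are independent, each comprising an independent and identically distributed (iid) random vector from a ZILN model. Thus, the estimates $\hat{T}_i$ are independent random variables. Using the estimates $(\hat{\delta}_i, \hat{\mu}_i, \hat{\sigma}_i^2)$ and $(\hat{\delta}_k, \hat{\mu}_k, \hat{\sigma}_k^2)$ from the samples in place of the unknown parameters $(\delta_i, \mu_i, \sigma_i^2)$ and $(\delta_k, \mu_k, \sigma_k^2)$ gives the estimated variance of $\hat{\lambda}_{ik}$, denoted $\widehat{\mathrm{Var}}(\hat{\lambda}_{ik})$.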
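The point estimates in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the sample variance with divisor $n_{i1} - 1$ is assumed for $\hat{\sigma}_i^2$ (the text only gives the MLE form explicitly).

```python
import math

def ziln_estimates(sample):
    """Return (delta_hat, mu_hat, sigma2_hat, t_hat) for one group's sample."""
    n = len(sample)
    logs = [math.log(y) for y in sample if y > 0]
    n1 = len(logs)
    if n1 < 2:
        raise ValueError("need at least two non-zero observations")
    delta_hat = (n - n1) / n                 # proportion of zeros, n_i0 / n_i
    mu_hat = sum(logs) / n1
    # Assumption: divisor n1 - 1 (sample variance) rather than the MLE divisor n1.
    sigma2_hat = sum((w - mu_hat) ** 2 for w in logs) / (n1 - 1)
    p_hat = 1.0 - delta_hat                  # estimate of delta'_i
    t_hat = (math.log(p_hat) + 2.0 * (mu_hat + sigma2_hat)
             + math.log(1.0 - p_hat * math.exp(-sigma2_hat)))
    return delta_hat, mu_hat, sigma2_hat, t_hat

def pairwise_lambda(groups):
    """lambda_hat_ik = T_hat_i - T_hat_k for all i != k (1-based group labels)."""
    t = [ziln_estimates(g)[3] for g in groups]
    h = len(t)
    return {(i + 1, k + 1): t[i] - t[k]
            for i in range(h) for k in range(h) if i != k}

if __name__ == "__main__":
    # Made-up toy data, not the precipitation records used in the paper.
    toy = [[0, 0, 1.2, 3.4, 0.7, 5.1],
           [0, 2.2, 9.5, 0.4, 1.1, 0, 0],
           [0.3, 0, 14.0, 2.5, 6.1]]
    for pair, lam in sorted(pairwise_lambda(toy).items()):
        print(pair, round(lam, 3))
```

Exponentiating $\hat{\lambda}_{ik}$ returns the estimated variance ratio $V_i / V_k$ on the original scale.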

Methods
To estimate $\lambda_{ik}$, the SCIs are formulated based on Bayesian, GPQ and PB approaches.

The Bayesian approach
The essential feature of the Bayesian approach is the use of a situation-specific prior distribution that reflects knowledge or subjective belief about the parameter of interest; this is updated in accordance with Bayes' theorem to yield the posterior distribution. Thus, CIs based on the Bayesian approach are derived from the posterior distribution. In Bayesian theory, the CI is referred to as a credible interval, and it is not unique for a given posterior distribution. The following methods are used to define suitable credible intervals: the narrowest interval for a univariate distribution (the highest posterior density interval) (Box & Tiao, 1973); the interval for which the probability of lying below it equals the probability of lying above it, which is sometimes referred to as the equal-tailed interval (Gelman et al., 2014); or the interval with the mean as the central point (assuming that it exists). In the present study, the SCIs based on the Bayesian approach were constructed as equal-tailed intervals. Motivated by Maneerat, Niwitpong & Niwitpong (2020), the probability-matching-beta (PMB) and reference-beta (RB) priors were our choice for the parameter $(\delta_i, \mu_i, \sigma_i^2)$ in this study. Thus, Bayesian SCIs for $\lambda_{ik}$ were established as follows.

The PMB prior:
The probability-matching prior for $(\mu_i, \sigma_i^2)$ is combined with a beta prior for $\delta_i$ with $a_i = b_i = 1/2$ to define the PMB prior for $(\delta_i, \mu_i, \sigma_i^2)$. Updating this prior with the likelihood gives the joint posterior, from which the marginal posterior distributions of $\delta_i$, $\mu_i$ and $\sigma_i^2$ follow; for the zero probability, $\delta^{(post)}_{i,pmb} \mid y_i \sim \mathrm{beta}(n_{i0} + 1/2, n_{i1} + 1/2)$. Substituting the posterior quantities into $T_i$ and $T_k$ gives $T^{(post)}_{i,pmb}$ and $T^{(post)}_{k,pmb}$, and hence the posterior of $\lambda_{ik}$, denoted $\lambda^{(post)}_{ik,pmb}$. In agreement with Ganesh (2009), the $100(1 - \alpha)\%$ Bayesian-based SCI with the PMB prior for $\lambda_{ik}$ is derived from the posterior distribution of $\lambda^{(post)}_{ik,pmb}$.

The RB prior:
This is a non-informative prior derived from the Fisher information matrix (Maneerat, Niwitpong & Niwitpong, 2020). In the RB prior of $(\delta_i, \mu_i, \sigma_i^2)$, the prior of $\delta_i$ is a beta distribution. When the RB prior is combined with the likelihood (Eq. (9)), the resulting posterior of $(\mu_i, \sigma_i^2)$ differs from that obtained under the PMB prior, whereas the marginal posterior of the zero probability is similarly denoted as $\delta^{(post)}_{i,rfb} \mid y_i \sim \mathrm{beta}(n_{i0} + 1/2, n_{i1} + 1/2)$.
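As a small illustration of the equal-tailed credible interval, the sketch below draws from the posterior $\mathrm{beta}(n_{i0} + 1/2, n_{i1} + 1/2)$ quoted above for the zero probability and reads off the empirical $\alpha/2$ and $1 - \alpha/2$ quantiles. Only this component is covered: the posteriors of $\mu_i$ and $\sigma_i^2$ and the simultaneous critical value are not reproduced, and the function names, number of draws and seed are arbitrary choices.

```python
import math
import random

def delta_posterior_draws(n0, n1, size=10000, seed=2021):
    """Draw from beta(n0 + 1/2, n1 + 1/2), the posterior of the zero probability."""
    rng = random.Random(seed)
    return [rng.betavariate(n0 + 0.5, n1 + 0.5) for _ in range(size)]

def equal_tailed_interval(draws, alpha=0.05):
    """Empirical alpha/2 and 1 - alpha/2 quantiles of the posterior draws."""
    s = sorted(draws)
    last = len(s) - 1
    lower = s[math.floor(alpha / 2 * last)]
    upper = s[math.ceil((1 - alpha / 2) * last)]
    return lower, upper

if __name__ == "__main__":
    # Hypothetical counts: 10 dry days and 40 wet days at one station.
    print(equal_tailed_interval(delta_posterior_draws(10, 40)))
```

The same quantile step would apply to posterior draws of $\lambda_{ik}$ once draws of $\mu_i$ and $\sigma_i^2$ are available.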

The GPQ approach
Motivated by Wu & Hsieh (2014), the GPQ of $\delta_i$ is formulated using the variance-stabilizing arcsine square-root transformation. Moreover, the GPQs for $(\mu_i, \sigma_i^2)$ are obtained from transformations of the normal approximation via the central limit theorem (Tian, 2005; Hasan & Krishnamoorthy, 2017). The GPQ for $T_i$ is built from these three GPQs, which are driven by independent random variables from standard normal, normal and $\chi^2_{n_{i1}-1}$ distributions, respectively. The corresponding GPQ of $\lambda_{ik}$ is the difference between the GPQs of $T_i$ and $T_k$. Therefore, the $100(1 - \alpha)\%$ SCI for $\lambda_{ik}$ based on the GPQ approach is constructed with the critical value $q^{GPQ}_\alpha$, which denotes the $100(1 - \alpha)$th percentile of the distribution of $Q^{GPQ}$. In agreement with Hannig et al. (2006) and Kharrati-Kopaei & Eftekhar (2017), the asymptotic coverage probability of the SCI for $\lambda_{ik}$ based on the GPQ is slightly modified from that in Maneerat, Niwitpong & Niwitpong (2021b) (see the proof of Theorem 1 in the Appendix); thus, it follows that the asymptotic coverage probability of the $100(1 - \alpha)\%$ SCI for $\lambda_{ik}$ based on the GPQ approach approaches $1 - \alpha$ for $\forall i \neq k$ and $i, k = 1, \ldots, h$.
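The sketch below shows how GPQ replicates of $T_i$ and $\lambda_{ik}$ might be simulated. The pivots are assumptions taken from the commonly used forms in the cited literature, not formulas recovered from this text: $G_{\delta_i} = \sin^2[\arcsin\sqrt{\hat{\delta}_i} - Z/(2\sqrt{n_i})]$, $G_{\sigma_i^2} = (n_{i1} - 1)\hat{\sigma}_i^2 / U_i$ and $G_{\mu_i} = \hat{\mu}_i - Z'\sqrt{G_{\sigma_i^2}/n_{i1}}$, with $Z, Z'$ standard normal and $U_i \sim \chi^2_{n_{i1}-1}$. The function names are hypothetical, and the simultaneous critical value $q^{GPQ}_\alpha$ is not computed.

```python
import math
import random

def gpq_log_variance(n, n1, delta_hat, mu_hat, sigma2_hat, rng):
    """One GPQ replicate of T_i under the assumed pivots described in the lead-in."""
    z_delta = rng.gauss(0.0, 1.0)
    z_mu = rng.gauss(0.0, 1.0)
    u = rng.gammavariate((n1 - 1) / 2.0, 2.0)    # chi-square with n1 - 1 df
    g_delta = math.sin(math.asin(math.sqrt(delta_hat))
                       - z_delta / (2.0 * math.sqrt(n))) ** 2
    g_sigma2 = (n1 - 1) * sigma2_hat / u
    g_mu = mu_hat - z_mu * math.sqrt(g_sigma2 / n1)
    p = max(1.0 - g_delta, 1e-12)                # guard against log(0)
    return (math.log(p) + 2.0 * (g_mu + g_sigma2)
            + math.log(1.0 - p * math.exp(-g_sigma2)))

def gpq_lambda_draws(stats_i, stats_k, size=5000, seed=7):
    """Replicates of G_T_i - G_T_k; stats = (n, n1, delta_hat, mu_hat, sigma2_hat)."""
    rng = random.Random(seed)
    return [gpq_log_variance(*stats_i, rng) - gpq_log_variance(*stats_k, rng)
            for _ in range(size)]

if __name__ == "__main__":
    # Hypothetical summary statistics for two groups.
    draws = sorted(gpq_lambda_draws((50, 40, 0.2, 1.0, 1.0), (60, 45, 0.25, 0.8, 1.4)))
    # Percentiles for a single pair only; this is not a simultaneous interval.
    print(draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])
```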

The PB approach
Here, we assume that the data come from a known distribution with unknown parameters; these are estimated from the data, and bootstrap samples are then simulated from the estimated distribution. In the present study, the PB approach is adjusted to suit our particular situation. Let $\hat{\delta}^*_i$, $\hat{\mu}^*_i$ and $\hat{\sigma}^{2*}_i$ be the observed values of $\hat{\delta}_i$, $\hat{\mu}_i$, and $\hat{\sigma}^2_i$, which estimate the parameters $\delta_i$, $\mu_i$, and $\sigma_i^2$, respectively. Thus, we can obtain the empirical distribution of $T$ based on the PB approach. In accordance with Sadooghi-Alvandi & Malekzadeh (2014), the respective sampling distributions of the PB estimates are generated from independent random variables with standard normal and chi-square distributions. The PB-based pivotal quantity is denoted $M^{PB}$, and $m^{PB}_\alpha$ is the $100(1 - \alpha)$th percentile of the distribution of $M^{PB}$. Theorem 2 shows the asymptotic coverage probability of the $100(1 - \alpha)\%$ SCI for $\lambda_{ik}$ based on the PB approach (see the proof in the Appendix).
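Theorem 2. Let $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})$ comprise an iid random vector from a ZILN model based on $n_i$ observations from population group $i$. Let $\hat{\lambda}_{ik} = \hat{T}_i - \hat{T}_k$ be the estimate of $\lambda_{ik}$, where $\hat{T}_i$ and $\hat{T}_k$ are the estimates of the log-transformed variances $T_i$ and $T_k$ of the $i$th and $k$th population groups, respectively. Hence,

$$P\left(\lambda_{ik} \in \left[\hat{\lambda}_{ik} \pm m^{PB}_\alpha \sqrt{\widehat{\mathrm{Var}}(\hat{\lambda}_{ik})}\right],\ \forall i \neq k\right) \to 1 - \alpha,$$

where $\widehat{\mathrm{Var}}(\hat{\lambda}_{ik})$ is the estimated variance of $\hat{\lambda}_{ik}$; $\forall i \neq k$ and $i, k = 1, 2, \ldots, h$.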
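To make the bootstrap idea concrete, here is a simplified max-type parametric bootstrap in Python. It is a swapped-in generic variant, not the pivotal quantity of Sadooghi-Alvandi & Malekzadeh (2014): whole samples are regenerated from the fitted ZILN model, and the maximum deviation is left unstudentized, so every pair receives the same half-width. All function names, the number of replicates `B` and the seed are illustrative assumptions.

```python
import math
import random

def fit_group(sample):
    """Return ((p_hat, mu_hat, sigma2_hat), t_hat); p_hat is the non-zero proportion."""
    logs = [math.log(y) for y in sample if y > 0]
    n1 = len(logs)
    mu = sum(logs) / n1
    s2 = sum((w - mu) ** 2 for w in logs) / (n1 - 1)
    p = n1 / len(sample)
    t = math.log(p) + 2.0 * (mu + s2) + math.log(1.0 - p * math.exp(-s2))
    return (p, mu, s2), t

def simulate_ziln(n, p, mu, s2, rng):
    """Draw n ZILN values, redrawing until at least two are non-zero."""
    while True:
        y = [rng.lognormvariate(mu, math.sqrt(s2)) if rng.random() < p else 0.0
             for _ in range(n)]
        if sum(v > 0 for v in y) >= 2:
            return y

def pb_sci(groups, alpha=0.05, B=1000, seed=1):
    """Max-type parametric-bootstrap SCIs for lambda_ik, i < k (1-based keys)."""
    rng = random.Random(seed)
    fits = [fit_group(g) for g in groups]
    h = len(groups)
    pairs = [(i, k) for i in range(h) for k in range(i + 1, h)]
    max_dev = []
    for _ in range(B):
        # Refit each group on a sample simulated from its fitted ZILN model.
        tb = [fit_group(simulate_ziln(len(g), *params, rng))[1]
              for g, (params, _) in zip(groups, fits)]
        max_dev.append(max(abs((tb[i] - tb[k]) - (fits[i][1] - fits[k][1]))
                           for i, k in pairs))
    max_dev.sort()
    m_alpha = max_dev[min(B - 1, math.ceil((1 - alpha) * B) - 1)]
    return {(i + 1, k + 1): (fits[i][1] - fits[k][1] - m_alpha,
                             fits[i][1] - fits[k][1] + m_alpha)
            for i, k in pairs}
```

Taking the maximum over all pairs before reading off the percentile is what makes the resulting intervals simultaneous rather than pairwise.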

SIMULATION STUDIES AND RESULTS
Simulation studies were conducted to assess the performances of the SCIs based on the Bayesian, GPQ, and PB approaches for all pairwise ratios of variances of several ZILN distributions: Bayesian SCIs based on PMB and RB priors (Maneerat, Niwitpong & Niwitpong, 2020), the GPQ-based SCI (Wu & Hsieh, 2014), and the PB-based SCI (Sadooghi-Alvandi & Malekzadeh, 2014; Li, Song & Shi, 2015; Kharrati-Kopaei & Eftekhar, 2017). The CRs, LERs, UERs, and AWs of the SCIs were determined with the number of population groups ($h$) fixed at 3 and 5; the optimal values of CR, LER, UER, and AW are 95%, 5%, 5% and 0, respectively, which were used to judge the best-performing SCI. Critical values $v^{pmb}_\alpha$, $v^{rb}_\alpha$, $q^{GPQ}_\alpha$ and $m^{PB}_\alpha$ for the Bayesian SCIs based on PMB and RB priors, GPQ and PB, respectively, were also assessed. Throughout the simulation studies, the simulation procedure to estimate the CRs, LERs, and UERs was as follows:
(iv) Construct the SCIs from Eqs. (14), (18), (21) and (31), respectively, and record whether or not the values of $(\lambda_{ik}; i \neq k)$ fall within their corresponding confidence intervals.
(v) For each method, obtain the number of times that all $(\lambda_{ik}; i \neq k)$ are in their corresponding SCIs to estimate the CR.
(vi) Obtain the number of times that all $(\lambda_{ik}; i \neq k)$ are less than or greater than their corresponding SCIs to estimate the LER and UER, respectively.
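The bookkeeping in steps (v) and (vi) can be sketched as follows. The function and its input layout are hypothetical. CR is counted per replication (all pairs covered), as in step (v); for LER and UER the wording is ambiguous when several pairs are involved, so the sketch assumes a per-interval reading (the share of individual intervals that lie entirely above or below the true value).

```python
def summarize_sci(runs):
    """runs: list of replications, each a list of (true_lambda, lower, upper) per pair."""
    n_runs = len(runs)
    n_intervals = sum(len(run) for run in runs)
    # CR: share of replications in which every pair's interval covers its true value.
    covered = sum(all(lo <= lam <= hi for lam, lo, hi in run) for run in runs)
    # Assumption: LER and UER are counted per interval, not per replication.
    below = sum(lam < lo for run in runs for lam, lo, hi in run)
    above = sum(lam > hi for run in runs for lam, lo, hi in run)
    width = sum(hi - lo for run in runs for lam, lo, hi in run)
    return {"CR": covered / n_runs,
            "LER": below / n_intervals,
            "UER": above / n_intervals,
            "AW": width / n_intervals}

if __name__ == "__main__":
    # Four made-up replications with two pairs each.
    demo = [[(0.0, -1.0, 1.0), (0.5, 0.0, 1.0)],
            [(0.0, 0.2, 1.0), (0.5, 0.0, 1.0)],
            [(0.0, -1.0, -0.5), (0.5, 0.0, 1.0)],
            [(0.0, -1.0, 1.0), (0.5, 0.0, 1.0)]]
    print(summarize_sci(demo))
```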
For $h = 3$ with large variance, Table 1 and Fig. 1 reveal that all of the methods provided CRs close to or greater than the nominal confidence level (95%). Meanwhile, the SCIs based on the Bayesian approach with the PMB prior and on GPQ maintained a good balance between LER and UER. Importantly, the AW of PB was narrower than those of the other methods for small sample sizes, while those of the Bayesian approach with the PMB prior were slightly narrower than the others for the other sample sizes. When $h = 5$ groups were compared (Table 1 and Fig. 2), the PB approach provided the best CRs and narrowest AWs for all scenarios tested.

AN EMPIRICAL APPLICATION OF THE FOUR METHODS TO DAILY PRECIPITATION DATA
Daily precipitation records comprise publicly available data from the Thailand Meteorology Department (Department, 2021). Flash floods, landslides, and windstorms caused by heavy rainfall occurred in four provinces in the lower southern area of Thailand (Songkhla, Yala, Narathiwat, and Pattani) during January 2021, as reported by Thailand's Department of Disaster Prevention and Mitigation (Thailand, 2021). According to the automatic weather system (Department, 2021), Songkhla has two weather stations, in the Songkhla and Sadao districts, which means that we could simultaneously estimate variations in precipitation at five weather stations.
Daily precipitation data from December 2020 to January 2021 (Table 2) were used in the analysis. Figure 3 shows histograms along with normal quantile-quantile (Q-Q), cumulative distribution function (CDF) and probability-probability (P-P) plots. Furthermore, the Akaike information criterion (AIC) and Bayesian information criterion (BIC) values of five models (normal, logistic, lognormal, exponential, and Cauchy) fitted to the non-zero precipitation data were compared to check the appropriateness of each model (Table 3). The AIC and BIC values for the lognormal model were the lowest, and thus it provided the best fit. The data from all of the stations were also zero-inflated, which supports the ZILN assumption. The results in Table 4 reveal that the variance $\sigma_i^2$ was greater than the mean $\mu_i$, indicating quite large precipitation variations in the present study. When the daily precipitation data were used to measure the efficacy of the four methods, the 95% SCIs based on the Bayesian, GPQ and PB approaches for all pairwise comparisons among the five weather stations covered their point estimates (Table 5). In agreement with the simulation results for $n_1 = n_2 = n_3 = 50$ and $n_4 = n_5 = 100$, the PB approach provided the best SCI performance for the ratios of variances of several ZILN models. This can be interpreted as Narathiwat having the highest variation in precipitation, followed by Yala. These results are in line with the Asia Disaster Monitoring and Response System (Thailand, 2021), which reported that both areas were affected by flooding and landslides damaging 22,308 households in Narathiwat and 12,082 households in Yala during the time period covered by the data used in the study.
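The model comparison by AIC and BIC can be illustrated with a short sketch. It covers only the three candidates whose maximum likelihood fits have closed forms (normal, lognormal, exponential); the logistic and Cauchy fits need numerical optimization and are omitted. The function names are hypothetical and the demo data are simulated, not the station records.

```python
import math
import random

def information_criteria(loglik, k, n):
    """Return (AIC, BIC) for a fit with k parameters and n observations."""
    return 2 * k - 2 * loglik, k * math.log(n) - 2 * loglik

def fit_candidates(y):
    """AIC/BIC of three closed-form maximum likelihood fits to positive data y."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n          # normal MLE variance
    logs = [math.log(v) for v in y]
    lmean = sum(logs) / n
    lvar = sum((w - lmean) ** 2 for w in logs) / n     # lognormal MLE variance
    loglik = {
        "normal": -0.5 * n * (math.log(2 * math.pi * var) + 1),
        "lognormal": -sum(logs) - 0.5 * n * (math.log(2 * math.pi * lvar) + 1),
        "exponential": -n * math.log(mean) - n,        # rate = 1 / mean
    }
    n_params = {"normal": 2, "lognormal": 2, "exponential": 1}
    return {name: information_criteria(ll, n_params[name], n)
            for name, ll in loglik.items()}

if __name__ == "__main__":
    rng = random.Random(11)
    wet_days = [rng.lognormvariate(1.0, 1.5) for _ in range(500)]  # simulated stand-in
    for name, (aic, bic) in fit_candidates(wet_days).items():
        print(name, round(aic, 1), round(bic, 1))
```

The model with the smallest AIC and BIC is preferred, which is the criterion applied to Table 3.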

DISCUSSION
From the above numerical results, it can be seen that the SCIs based on PB and on the Bayesian approach with the PMB prior dealt with large variations in the data better than the other approaches. The PB-based SCI has some strong points for small sample sizes due to random samples being obtained via bootstrap resampling. Furthermore, the performance of the Bayesian SCI based on the PMB prior declined as the number of populations increased and the sample size decreased. Although the GPQ method provided appropriate CRs, its AWs were wider than those of the other methods, possibly because the GPQ of $\delta_i$ is limited for cases with unequal zero-inflated percentages, even though it has performed quite well for a single population group (Wu & Hsieh, 2014; Maneerat, Niwitpong & Niwitpong, 2021a). Further research could be conducted to explore subjective or prior beliefs about parameters when using the Bayesian approach for parameter estimation.

CONCLUSIONS
SCIs for the comparison of the variance ratios among several ZILN models were formulated by applying Bayesian approaches based on the PMB and RB priors, along with the GPQ and PB approaches. In practice, the daily precipitation data for each of the weather stations considered were overdispersed (i.e., the variance was greater than the mean) and zero-inflated (Table 4). Thus, the ZILN distribution is an appropriate model for estimating parameters in the construction of SCIs for multiple comparisons between their variances. For three populations, all of the methods produced 95% SCIs for all pairwise comparisons among variances covering the true parameter. Meanwhile, the SCI constructed via the Bayesian approach based on the PMB prior maintained a good balance between LER and UER and provided the narrowest AWs except for small sample sizes. On the other hand, the PB-based SCI could handle extreme cases when the sample sizes were small with large variances. For five populations, the PB-based SCI performed the best overall, while the Bayesian approach based on the RB prior for small-to-large sample sizes and the GPQ approach for medium-to-large and large sample sizes provided acceptable results and thus can be recommended as alternative SCIs.

APPENDIX
The proofs of the methods for constructing the SCI for $\lambda_{ik}$ are covered here.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding