Estimating the average daily rainfall in Thailand using confidence intervals for the common mean of several delta-lognormal distributions

The daily average natural rainfall amounts in the five regions of Thailand can be estimated using the confidence intervals for the common mean of several delta-lognormal distributions based on the fiducial generalized confidence interval (FGCI), large sample (LS), method of variance estimates recovery (MOVER), parametric bootstrap (PB), and highest posterior density intervals based on Jeffreys’ rule (HPD-JR) and normal-gamma-beta (HPD-NGB) priors. Monte Carlo simulation was conducted to assess the performance in terms of the coverage probability and average length of the proposed methods. The numerical results indicate that MOVER and PB provided better performances than the other methods in a variety of situations, even when the sample case was large. The efficacies of the proposed methods were illustrated by applying them to real rainfall datasets from the five regions of Thailand.


INTRODUCTION
Approximately 82.2% of Thailand's cultivated land area depends on natural rainfall (Supasod, 2006), which indicates its importance for Thai agriculture. However, rainfall is a natural phenomenon with a significant level of uncertainty, and it can cause natural disasters such as droughts, floods, and landslides. In many countries around the world, extreme rainfall events have been increasing in frequency and duration. On December 5, 2015, Storm Desmond brought heavy rainfall that caused flooding in northern England, southern Scotland, and Ireland (Otto & Oldenborgh, 2017). On July 6-7, 2018, extreme rainfall in Japan caused floods and landslides that affected over 5,000 houses, and approximately 1.9 million people were evacuated from the at-risk area (Oldenborgh, 2018). In mid-September 2019, rainfall was extreme during Tropical Storm Imelda in Southeast Texas, USA, where over 1,000 people were affected by large-scale flooding and there were five deaths (Oldenborgh et al., 2019). Thus, it is necessary to assess how rainfall varies in each region of a country on a daily basis. Due to its climate pattern and meteorological conditions, Thailand is commonly separated into five regions: northern, northeastern, central, eastern, and southern. The rainfall in each region varies widely due to both location and seasonality. Importantly, Thailand's rainfall data include many zeros, which occur with probability δ > 0, while the positive observations are right-skewed and follow a lognormal distribution with the remaining probability 1 − δ. Thus, applying a delta-lognormal distribution (Aitchison, 1955) is appropriate.
The mean is a measure of the center of a set of observations (Casella & Berger, 2002) that can be used in statistical inference, while functions of the mean such as the ratio or difference between two means can also be used. These parameters have been applied in many research areas, such as medicine, fish stocks, pharmaceutics, and climatology. For example, they have been used for hypothesis testing of the effect of race on the average medical costs between African American and Caucasian patients with type I diabetes (Zhou, Gao & Hui, 1997), to estimate the mean charges for diagnostic tests on patients with unstable chronic medical conditions (Zhou & Tu, 2000; Tian, 2005; Tian & Wu, 2007; Li, Zhou & Tian, 2013), to estimate the maximum alcohol concentration in men in an alcohol interaction study (Tian & Wu, 2007; Krishnamoorthy & Oral, 2015), to estimate the mean red cod density around New Zealand as an indication of fish abundance (Fletcher, 2008; Wu & Hsieh, 2014), and to estimate the mean of the monthly rainfall totals to compare rainfall in Bloemfontein and Kimberley in South Africa (Harvey & van der Merwe, 2012).
In practice, the mean has been widely used in many fields, as mentioned above. When independent samples are recorded from several situations, the common mean is of interest for studying more than one population. Many researchers have investigated methods for constructing confidence intervals (CIs) for the common mean of several distributions. For example, Fairweather (1972) proposed a linear combination of Student's t variables to construct CIs for the common mean of several normal distributions. Jordan & Krishnamoorthy (1996) solved the problem of CIs for the common mean of several normal populations with unknown and unequal variances based on Student's t and independent F variables. Krishnamoorthy & Mathew (2003) presented the generalized CI (GCI) and compared it with the CIs constructed by Fairweather (1972) and Jordan & Krishnamoorthy (1996). Later, Lin & Lee (2005) developed a GCI for the common mean of several normal populations. Tian & Wu (2007) provided CIs for the common mean of several lognormal populations using the generalized variable approach, which was shown to be consistently better than the large sample (LS) approach. Lin & Wang (2013) studied a modification of the quadratic method for making inferences via hypothesis testing and interval estimation for several lognormal means. Krishnamoorthy & Oral (2015) proposed the method of variance estimates recovery (MOVER) approach for the common mean of lognormal distributions.
As mentioned earlier, many researchers have developed CIs for the common mean of several normal and lognormal distributions. However, there has not yet been an investigation of statistical inference for the common mean of several delta-lognormal distributions. Since the common mean is used to study more than one population, it can be used to estimate the average precipitation in the five regions of Thailand, where there is an important need to estimate daily rainfall trends. Furthermore, the daily rainfall records from the five regions of Thailand satisfy the assumptions for a delta-lognormal distribution. Herein, CIs for the common mean of several delta-lognormal models based on the fiducial GCI (FGCI), LS, MOVER, parametric bootstrap (PB), and highest posterior density (HPD) intervals based on Jeffreys' rule (HPD-JR) and normal-gamma-beta (HPD-NGB) priors are proposed. The outline of this article is as follows. The ideas behind the proposed methods are detailed in the Methods section. Numerical computations are reported in 'Simulation Studies and Results'. In 'An Empirical Application', the daily natural rainfall records of the five regions of Thailand are used to illustrate the efficacy of the methods. Finally, the paper ends with a discussion and conclusions.

METHODS
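Let $W_i = (W_{i1}, W_{i2}, \ldots, W_{in_i})$ be a random sample drawn from a delta-lognormal distribution, for $i = 1,2,\ldots,k$ and $j = 1,2,\ldots,n_i$. There are three parameters in this distribution: the mean $\mu_i$, the variance $\sigma_i^2$, and the probability of obtaining a zero observation $\delta_i$. The distribution of $W_{ij}$ is given by

$$H(w_{ij};\mu_i,\sigma_i^2,\delta_i) = \begin{cases} \delta_i, & w_{ij} = 0 \\ \delta_i + (1-\delta_i)\,G(w_{ij};\mu_i,\sigma_i^2), & w_{ij} > 0 \end{cases} \qquad (1)$$

where $G(w_{ij};\mu_i,\sigma_i^2)$ is the lognormal distribution function. The number of zeros has a binomial distribution, $n_{i(0)} = \#\{j : w_{ij} = 0\} \sim B(n_i,\delta_i)$. The population mean of $W_{ij}$ is given by

$$\vartheta_i = (1-\delta_i)\exp\left(\mu_i + \frac{\sigma_i^2}{2}\right). \qquad (2)$$

The unbiased estimates of $\mu_i$, $\sigma_i^2$, and $\delta_i$ are

$$\hat\mu_i = n_{i(1)}^{-1}\sum_{j:w_{ij}>0}\ln W_{ij}, \qquad \hat\sigma_i^2 = (n_{i(1)}-1)^{-1}\sum_{j:w_{ij}>0}\left(\ln W_{ij}-\hat\mu_i\right)^2, \qquad \hat\delta_i = n_{i(0)}/n_i,$$

respectively, where $n_i = n_{i(0)} + n_{i(1)}$ and $n_{i(1)} = \#\{j : w_{ij} > 0\}$. Suppose that the delta-lognormal means in Eq. (2) are the same for all $k$ populations. Then, according to Tian & Wu (2007) and Krishnamoorthy & Oral (2015), the common delta-lognormal mean is defined as

$$\vartheta = (1-\delta_i)\exp\left(\mu_i + \frac{\sigma_i^2}{2}\right), \qquad i = 1,2,\ldots,k.$$

According to Aitchison & Brown (1963), the Aitchison estimate of $\vartheta_i$ for $n_{i(1)} > 1$ is expressed as

$$\hat\theta_i^{(Ait)} = (1-\hat\delta_i)\exp(\hat\mu_i)\,\psi_{n_{i(1)}}\!\left(\frac{\hat\sigma_i^2}{2}\right),$$

where $\psi_a(b)$ is a Bessel function defined as

$$\psi_a(b) = 1 + \frac{(a-1)b}{a} + \frac{(a-1)^3 b^2}{a^2\,2!\,(a+1)} + \frac{(a-1)^5 b^3}{a^3\,3!\,(a+1)(a+3)} + \cdots$$

The unbiasedness of $\hat\theta_i^{(Ait)}$ is investigated through its expected value. According to Shimizu & Iwase (1981), the uniformly minimum variance unbiased (UMVU) estimate of $\vartheta_i$ is denoted by $\hat\theta_i^{(Shi)}$, where $E(n_{i(1)}) = n_i(1-\delta_i)$, and its asymptotic variance is also available. Actually, ψ n i(1) (σ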
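The estimates described above can be illustrated with a minimal Python sketch. It assumes a non-negative sample in which zeros are recorded exactly as 0, and it uses a simple plug-in value of the delta-lognormal mean, $(1-\hat\delta)\exp(\hat\mu + \hat\sigma^2/2)$, rather than the Aitchison or UMVU estimates. The function names are hypothetical and are not taken from the authors' R code.

```python
import math

def delta_lognormal_estimates(sample):
    """Return (delta_hat, mu_hat, sigma2_hat) for one sample containing zeros."""
    positives = [x for x in sample if x > 0]
    n = len(sample)
    n1 = len(positives)            # number of positive observations, n_(1)
    n0 = n - n1                    # number of zero observations, n_(0)
    if n1 < 2:
        raise ValueError("at least two positive observations are required")
    logs = [math.log(x) for x in positives]
    mu_hat = sum(logs) / n1
    sigma2_hat = sum((y - mu_hat) ** 2 for y in logs) / (n1 - 1)
    delta_hat = n0 / n
    return delta_hat, mu_hat, sigma2_hat

def plug_in_mean(delta_hat, mu_hat, sigma2_hat):
    """Plug-in delta-lognormal mean (an illustrative choice, not the UMVU estimate)."""
    return (1.0 - delta_hat) * math.exp(mu_hat + sigma2_hat / 2.0)

# Toy data: two zeros and two positive values whose logs are 0 and 2.
sample = [0.0, 0.0, 1.0, math.exp(2.0)]
delta_hat, mu_hat, sigma2_hat = delta_lognormal_estimates(sample)
print(delta_hat, mu_hat, sigma2_hat, plug_in_mean(delta_hat, mu_hat, sigma2_hat))
```

For the toy data, the log-positive values are 0 and 2, so the sketch yields $\hat\delta = 0.5$, $\hat\mu = 1$, and $\hat\sigma^2 = 2$ up to rounding.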

Fiducial generalized confidence interval
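Fiducial inference was introduced by Fisher (1930). Fisher's fiducial argument was used to develop a generalized fiducial recipe that extends the application of fiducial ideas (Hannig, 2009). The concept of the fiducial interval has been advanced through the generalized pivotal quantity (GPQ), which can be applied directly for generalized inference. Hannig, Iyer & Patterson (2006) argued that a subclass of GPQs, the fiducial GPQ (FGPQ), provides a framework that shows the connection between a distribution and a parameter. Recall that $\hat\mu_i \sim N(\mu_i, \sigma_i^2/n_{i(1)})$ and $(n_{i(1)}-1)\hat\sigma_i^2/\sigma_i^2 \sim \chi^2_{n_{i(1)}-1}$ are independent random variables. The structure functions of $\hat\mu_i$ and $\hat\sigma_i^2$ are

$$\hat\mu_i = \mu_i + V_i\sqrt{\frac{\sigma_i^2}{n_{i(1)}}}, \qquad \hat\sigma_i^2 = \frac{\sigma_i^2\,U_i}{n_{i(1)}-1},$$

which are functions of $V_i$ and $U_i$, respectively, where $V_i \sim N(0,1)$ and $U_i \sim \chi^2_{n_{i(1)}-1}$. Given the observed values, the estimates $\hat\mu_i$ and $\hat\sigma_i^2$ can be obtained, and the structure functions have the unique solution

$$\mu_i = \hat\mu_i - V_i\sqrt{\frac{(n_{i(1)}-1)\hat\sigma_i^2}{n_{i(1)}\,U_i}}, \qquad \sigma_i^2 = \frac{(n_{i(1)}-1)\hat\sigma_i^2}{U_i}.$$

The respective FGPQs of $\mu_i$ and $\sigma_i^2$ are

$$G_{\mu_i} = \hat\mu_i - V_i^*\sqrt{\frac{(n_{i(1)}-1)\hat\sigma_i^2}{n_{i(1)}\,U_i^*}}, \qquad G_{\sigma_i^2} = \frac{(n_{i(1)}-1)\hat\sigma_i^2}{U_i^*},$$

where $V_i^*$ and $U_i^*$ are independent copies of $V_i$ and $U_i$, respectively. Hasan & Krishnamoorthy (2018) developed the FGPQ of $\delta_i$ using a beta distribution as $G_{\delta_i} \sim \mathrm{Beta}(\alpha_i,\beta_i)$, with $\alpha_i = n_{i(1)} + 0.5$ and $\beta_i = n_{i(0)} + 0.5$. The FGPQ of $\vartheta$, denoted by $G_\vartheta$, is based on the $k$ individual samples, and the $100(1-\zeta)\%$ FGCI for $\vartheta$ is $[G_\vartheta(\zeta/2),\, G_\vartheta(1-\zeta/2)]$, where $G_\vartheta(\zeta)$ denotes the $\zeta$th percentile of $G_\vartheta$. Algorithm 1 shows the computational steps for obtaining the FGCI.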
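As an illustration of how fiducial pivots can be simulated, the following Python sketch builds a percentile interval for the mean of a single delta-lognormal population. It is a sketch under stated assumptions and not the pooled FGCI of Algorithm 1: the weighting across the $k$ samples is omitted, the zero-probability pivot is drawn from Beta($n_{(0)}+0.5$, $n_{(1)}+0.5$) (an assumed parameter order, chosen so that the pivot centres near the observed proportion of zeros), and the function name and default arguments are hypothetical.

```python
import math
import random

def fgci_single_mean(n0, n1, mu_hat, sigma2_hat, level=0.95, draws=2500, seed=1):
    """Percentile interval from simulated pivots for one delta-lognormal mean."""
    rng = random.Random(seed)
    pivots = []
    for _ in range(draws):
        v = rng.gauss(0.0, 1.0)                      # V* ~ N(0, 1)
        u = rng.gammavariate((n1 - 1) / 2.0, 2.0)    # U* ~ chi-square with n1 - 1 df
        g_sigma2 = (n1 - 1) * sigma2_hat / u         # pivot for sigma^2
        g_mu = mu_hat - v * math.sqrt(g_sigma2 / n1) # pivot for mu
        g_delta = rng.betavariate(n0 + 0.5, n1 + 0.5)  # assumed pivot for P(zero)
        pivots.append((1.0 - g_delta) * math.exp(g_mu + g_sigma2 / 2.0))
    pivots.sort()
    zeta = 1.0 - level
    lower = pivots[int(math.floor(zeta / 2.0 * (draws - 1)))]
    upper = pivots[int(math.ceil((1.0 - zeta / 2.0) * (draws - 1)))]
    return lower, upper

# Hypothetical summary statistics: 5 zeros, 20 positive values.
lower, upper = fgci_single_mean(n0=5, n1=20, mu_hat=1.0, sigma2_hat=1.0)
print(lower, upper)
```

The chi-square variable is generated as a gamma variable with shape $(n_{(1)}-1)/2$ and scale 2, which avoids any dependency outside the standard library. The sketch needs at least two positive observations.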

Large sample interval
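Recall the Aitchison estimator $\hat\theta_i^{(Ait)}$ of $\vartheta_i$. Its approximate variance is estimated by replacing the unknown parameters with $\hat\mu_i$, $\hat\sigma_i^2$, and $\hat\delta_i$. The pooled estimate of $\vartheta$ is given by

$$\hat\theta = \frac{\sum_{i=1}^{k} w_i\,\hat\theta_i^{(Ait)}}{\sum_{i=1}^{k} w_i},$$

where $w_i = 1/\widehat{\mathrm{Var}}\left(\hat\theta_i^{(Ait)}\right)$. Hence, the $100(1-\zeta)\%$ LS interval for $\vartheta$ is obtained as

$$\left[\hat\theta - z_{1-\zeta/2}\sqrt{\frac{1}{\sum_{i=1}^{k} w_i}},\;\; \hat\theta + z_{1-\zeta/2}\sqrt{\frac{1}{\sum_{i=1}^{k} w_i}}\right],$$

where $z_\zeta$ denotes the $\zeta$th percentile of the standard normal distribution $N(0,1)$. The LS interval can be estimated easily via 'Algorithm 2'.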
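The inverse-variance pooling behind the LS interval can be sketched in a few lines of Python. The sketch assumes that the per-sample estimates and their estimated variances have already been computed by some other routine (the variance formula itself is not reproduced here), and it uses the fact that an inverse-variance weighted average has variance $1/\sum_i w_i$. The function name and the toy numbers are hypothetical.

```python
import math
from statistics import NormalDist

def large_sample_interval(estimates, variances, level=0.95):
    """Pool estimates with inverse-variance weights and return (pooled, lower, upper)."""
    weights = [1.0 / v for v in variances]           # w_i = 1 / Var(theta_i)
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * math.sqrt(1.0 / total)          # Var(pooled) = 1 / sum(w_i)
    return pooled, pooled - half_width, pooled + half_width

# Toy inputs: two samples with equal variances, so the pooled value is the average.
pooled, lower, upper = large_sample_interval([4.0, 6.0], [1.0, 1.0])
print(pooled, lower, upper)
```

With equal variances the weights are equal, so the pooled value is the plain average of the two estimates; a sample with a smaller estimated variance would pull the pooled value towards its own estimate.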

Method of variance estimates recovery
This method produces a closed-form CI that is easy to compute. For this reason, the MOVER CI for the common delta-lognormal mean is considered for $k$ individual random samples. The MOVER for a linear combination of $\vartheta_i$, $i = 1,2,\ldots,k$, is constructed as follows. Let $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_k$ be independent unbiased estimators of $\vartheta_1, \vartheta_2, \ldots, \vartheta_k$, respectively. In addition, let $[l_i, u_i]$ stand for the $100(1-\zeta)\%$ CI for $\vartheta_i$. The MOVER interval for the linear combination is built from these individual estimates and limits, according to Krishnamoorthy & Oral (2015). Next, the closed-form CIs for $\vartheta_i$ are needed to construct the MOVER for $\vartheta$. Thus, $\vartheta_i$ is log-transformed as

$$\ln\vartheta_i = \ln\delta_i^* + \mu_i + \frac{\sigma_i^2}{2},$$

where $\delta_i^* = 1 - \delta_i$. Let $\hat\mu_i$, $\hat\sigma_i^2$, and $\hat\delta_i^*$ be the unbiased estimates of $\mu_i$, $\sigma_i^2$, and $\delta_i^*$, respectively. The closed-form CI for $\vartheta_i$ follows the MOVER for a single delta-lognormal mean presented by Hasan & Krishnamoorthy (2018). 'Algorithm 3' describes the steps to construct the MOVER interval.
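The general MOVER recipe for a linear combination with positive coefficients can be sketched as follows. This is the standard form, in which the lower limit recovers variance estimates from the distances $\hat\theta_i - l_i$ and the upper limit from $u_i - \hat\theta_i$; it is offered as an illustration and is not claimed to reproduce 'Algorithm 3'. The coefficients, estimates, and individual intervals in the usage lines are hypothetical, and in the common-mean setting the coefficients would be normalized weights.

```python
import math

def mover_linear_combination(coefs, estimates, intervals):
    """MOVER interval for sum(c_i * theta_i), assuming every c_i is positive."""
    center = 0.0
    lower_sq = 0.0
    upper_sq = 0.0
    for c, est, (low, high) in zip(coefs, estimates, intervals):
        center += c * est
        lower_sq += (c * (est - low)) ** 2    # recovered variance, lower side
        upper_sq += (c * (high - est)) ** 2   # recovered variance, upper side
    return center - math.sqrt(lower_sq), center + math.sqrt(upper_sq)

# Toy inputs: equal weights and symmetric individual intervals of half-width 1.
lower, upper = mover_linear_combination(
    [0.5, 0.5], [4.0, 6.0], [(3.0, 5.0), (5.0, 7.0)]
)
print(lower, upper)
```

Because the individual limits enter separately on each side, the resulting interval can be asymmetric about the point estimate when the individual intervals are asymmetric, which is typical for right-skewed data.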

Parametric Bootstrap
This approach is developed from the parametric bootstrap for the common mean of several heterogeneous lognormal distributions proposed by Malekzadeh & Kharrati-Kopaei (2019). The delta-lognormal mean is transformed by taking the logarithm as

$$\ln\vartheta = \ln(1-\delta_i) + \mu_i + \frac{\sigma_i^2}{2}.$$

The likelihood of $(\vartheta, \sigma_i^2, \delta_i)$ enables the maximum likelihood estimates of $\ln\vartheta$ and $\sigma_i^2$ to be obtained, after which the problem becomes that of the common lognormal mean (see Krishnamoorthy & Oral (2015) for a detailed explanation). By applying the central limit theorem, a statistic $T$ based on $\ln\hat\theta_{mle} - \ln\vartheta$ is obtained. The distribution of $T$ is complicated: it possibly depends on the nuisance parameters $\sigma_i^2$ and $\delta_i$, but not on $\ln\vartheta$. Thus, the exact distribution of $T$ is unknown in practice, and so we propose the PB pivotal variable $T^{PB}$ corresponding to $T$, which is computed from the observed values of $\hat\mu_i$, $\hat\sigma_i^2$, and $\hat\delta_i$ obtained from random sampling with replacement based on the bootstrap approach. The $100(1-\zeta)\%$ PB interval for $\vartheta$ is then constructed using $q^{PB}_\zeta$, which denotes the $(1-\zeta)$th percentile of the distribution of $T^{PB}$. The PB interval can be constructed as shown in 'Algorithm 4'.
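The resampling idea can be illustrated with a simplified Python sketch for a single population. It simulates new samples from the fitted delta-lognormal model, recomputes the log of a plug-in mean for each one, and takes percentiles. This is a plain bootstrap percentile interval and is not the pivotal quantity $T^{PB}$ or 'Algorithm 4'; the function names, the number of replicates, and the inputs are hypothetical, and the fitted zero probability is assumed to be well below 1 so that replicates with at least two positive values are common.

```python
import math
import random

def simulate_delta_lognormal(n, delta, mu, sigma2, rng):
    """Draw n values: zero with probability delta, lognormal otherwise."""
    sd = math.sqrt(sigma2)
    return [0.0 if rng.random() < delta else math.exp(rng.gauss(mu, sd))
            for _ in range(n)]

def log_plug_in_mean(sample):
    """ln of the plug-in mean, or None if fewer than two positive values."""
    logs = [math.log(x) for x in sample if x > 0]
    n1 = len(logs)
    if n1 < 2:
        return None
    mu_hat = sum(logs) / n1
    sigma2_hat = sum((y - mu_hat) ** 2 for y in logs) / (n1 - 1)
    return math.log(n1 / len(sample)) + mu_hat + sigma2_hat / 2.0

def bootstrap_percentile_interval(n, delta_hat, mu_hat, sigma2_hat,
                                  level=0.95, reps=1000, seed=1):
    rng = random.Random(seed)
    stats = []
    while len(stats) < reps:
        value = log_plug_in_mean(
            simulate_delta_lognormal(n, delta_hat, mu_hat, sigma2_hat, rng))
        if value is not None:          # skip replicates with too few positives
            stats.append(value)
    stats.sort()
    zeta = 1.0 - level
    low = stats[int(math.floor(zeta / 2.0 * (reps - 1)))]
    high = stats[int(math.ceil((1.0 - zeta / 2.0) * (reps - 1)))]
    return math.exp(low), math.exp(high)   # back-transform to the mean scale

lower, upper = bootstrap_percentile_interval(25, 0.2, 1.0, 1.0)
print(lower, upper)
```

Working on the log scale and back-transforming keeps the interval limits positive, which mirrors the log transformation used in the text.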

Highest posterior density intervals
The HPD interval is constructed from the posterior distribution, as defined by Box & Tiao (1973). Based on the Bayesian approach, the prior of $\vartheta_i$ is updated with its likelihood function to obtain the posterior distribution. Recall that $W_{ij}$ follows a delta-lognormal distribution with parameters $(\mu_i, \sigma_i^2, \delta_i)$; then the likelihood is given by

$$L(\mu_i,\sigma_i^2,\delta_i \mid w_{ij}) \propto \prod_{i=1}^{k} \delta_i^{\,n_{i(0)}}(1-\delta_i)^{n_{i(1)}}\left(\sigma_i^2\right)^{-n_{i(1)}/2} \exp\left\{-\frac{1}{2\sigma_i^2}\sum_{j:w_{ij}>0}\left(\ln w_{ij}-\mu_i\right)^2\right\}. \qquad (34)$$

For $k$ individual samples, Miroshnikov, Wei & Conlon (2015) described how independent sub-posterior samples are pooled toward the joint posterior distribution of $\vartheta$ using weighted averages as follows:

$$\vartheta = \frac{\sum_{i=1}^{k} w_i\,\vartheta_i^{post}}{\sum_{i=1}^{k} w_i},$$

where $\vartheta_i^{post}$ are the posterior samples of $\vartheta_i$, for $i = 1,2,\ldots,k$. The weight of the posterior based on the $i$th sample is the inverse of the sample variance, denoted as $w_i = \mathrm{Var}^{-1}(\hat\theta_i \mid w_{ij})$. Different priors have been developed for estimating the common delta-lognormal mean, two of which are considered in the following subsections.

Jeffreys' rule prior
Harvey & van der Merwe (2012) defined this prior, which is combined with the likelihood in Eq. (34) to obtain the posterior of $\vartheta$. This leads to the marginal posterior distributions of $\mu_i$, $\sigma_i^2$, and $\delta_i$. The pooled posterior of $\vartheta$ is weighted by the inverse of its estimated variance. From Eq. (36), the $100(1-\zeta)\%$ HPD interval based on Jeffreys' rule prior (HPD-JR) for $\vartheta$ is constructed.

Normal-gamma-beta prior
Maneerat, Niwitpong & Niwitpong (2020) proposed an HPD interval based on the normal-gamma prior for the ratio of delta-lognormal variances that worked better than the HPD-JR of Harvey & van der Merwe (2012). Suppose that $Y = \ln W$ is a normally distributed random variable with mean $\mu = (\mu_1,\mu_2,\ldots,\mu_k)$ and precision $\lambda = (\lambda_1,\lambda_2,\ldots,\lambda_k)$, where $(\mu_i,\lambda_i)$ follows a normal-gamma distribution and $\delta_i$ follows a beta distribution. When the prior in Eq. (37) is combined with the likelihood in Eq. (34), the posterior density of $\vartheta$ is obtained, which can be integrated to obtain the marginal posterior distributions of $\mu_i$, $\lambda_i$, and $\delta_i$; these involve $df = 2(n_{i(1)}-1)$ and $\sigma_i^{2(NGB)}$. Similarly, the pooled posterior of $\vartheta$ is given by the inverse-variance weighted average of the posterior samples.
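Once posterior samples of the pooled mean are available, an HPD interval can be approximated as the shortest interval that contains the required fraction of the sorted samples. The Python sketch below shows this generic sample-based rule; it assumes a unimodal posterior, it does not generate the posterior samples themselves, and the function name and the toy draws are hypothetical.

```python
import math

def hpd_interval(samples, level=0.95):
    """Shortest interval containing at least `level` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    m = min(n, max(1, math.ceil(level * n)))   # points the interval must contain
    best_start = 0
    best_width = ordered[m - 1] - ordered[0]
    for start in range(1, n - m + 1):
        width = ordered[start + m - 1] - ordered[start]
        if width < best_width:                 # keep the narrowest window
            best_start, best_width = start, width
    return ordered[best_start], ordered[best_start + m - 1]

# Toy right-skewed "posterior draws": 0, 1, 4, ..., 10000.
draws = [x ** 2 for x in range(101)]
print(hpd_interval(draws))
```

For right-skewed draws such as these, the shortest window sits at the lower end of the range, whereas an equal-tailed interval would cut off the same fraction from both tails. This is why HPD intervals tend to be shorter for skewed posteriors.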

SIMULATION STUDIES AND RESULTS
The performances of the CIs were assessed by comparing their coverage probabilities (CPs) and average lengths (ALs) using Monte Carlo simulation. The best-performing CI is the one with a CP closest to or greater than the nominal confidence level $1-\zeta$ and the shortest AL. The CIs for the common delta-lognormal mean constructed using FGCI, LS, MOVER, PB, HPD-JR, and HPD-NGB were assessed in the study, the parameter settings for which are provided in Table 1. The number of generated random samples was fixed at M = 5,000. For FGCI, the number of FGPQs was Q = 2,500 for each set of 5,000 random samples. 'Algorithm 6' shows the computational steps to estimate the CP and AL performances of all of the methods.
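The two performance measures can be written down directly: the CP is the fraction of simulated intervals that contain the true parameter value, and the AL is the mean of the interval widths. The Python sketch below computes both from a list of intervals. It is only the final summary step, it does not generate data or intervals, and the function name and the three toy intervals are hypothetical.

```python
def coverage_and_length(intervals, true_value):
    """Return (coverage probability, average length) for a list of (lower, upper)."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    total_length = sum(upper - lower for lower, upper in intervals)
    m = len(intervals)
    return hits / m, total_length / m

# Toy input: two of the three intervals contain the true value 2.0.
cp, al = coverage_and_length([(0.0, 2.0), (1.0, 3.0), (2.5, 4.0)], true_value=2.0)
print(cp, al)
```

In a full simulation, the list would hold one interval per generated data set (M of them), computed by the method under study for a fixed parameter setting.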
For k = 2 (Table 2 and Fig. 1), FGCI performed well for small-to-moderate sample sizes, as well as for large $\sigma_i^2$ and a moderate-to-large sample size. HPD-NGB attained stable CP and AL values, which were the best for small $\sigma_i^2$ and a moderate-to-large sample size. MOVER and PB attained correct CPs but wider ALs than the other methods, whereas LS and HPD-JR had lower CPs and narrower ALs. For k = 5 (Table 3 and Fig. 2), only two methods produced better CPs than the other methods in the various situations: MOVER (small $\delta_i$ and $\sigma_i^2$) and PB (large $\delta_i$ and $\sigma_i^2$). Moreover, the results were similar for k = 10 (Table 4 and Fig. 3).
As previously mentioned, our findings show that FGCI works well for small sample cases, which may be because the FGPQ of $\sigma_i^2$ contains some weak points that affect the FGPQ of $\mu_i$ as the sample case increases. For large sample sizes, MOVER was the best method for small $\sigma^2$, which is possibly due to the CI for $\mu_i + \sigma_i^2$. Meanwhile, the next best method was PB, which has the strong point of using a resampling technique to collect information about several populations even when the variance $\sigma^2$ is large.
Notes (Tables 2-4): FG, fiducial generalized confidence interval; MO, method of variance estimates recovery; HJ, HPD based on Jeffreys' rule prior; HN, HPD based on normal-gamma-beta prior. Bold denotes the best-performing method in each case.

AN EMPIRICAL APPLICATION
Daily rainfall data obtained from the Thai Meteorological Department (TMD) were divided into the northern, northeastern, central, and eastern regions, while the southern region was a combination of the data from the southeastern and southwestern shores. Due to the differences in the climate patterns and meteorological conditions in the five regions, we focused on estimating the daily rainfall in these regions by treating them as separate sets of observations rather than pooling them and treating them as a single population representing the whole of Thailand. The daily rainfall amounts were recorded on August 5 and 9, 2019, which is in the middle of the rainy season (mid-May to mid-October) when rice farming is conducted in Thailand. Entries with rainfall of less than 0.1 mm were considered as zero records. Tables 5-6 contain the daily rainfall records for the five regions, while Figs. 4-5 show histogram plots of the rainfall observations, and Figs. 6-7 exhibit normal Q-Q plots of the log-positive rainfall data on August 5 and 9, 2019, respectively. It can be seen that the data for all of the regions contained zero observations. After that, the fitted distribution of the positive observations was checked using the Akaike information criterion (AIC), as reported in Table 7. It can be concluded that the rainfall data in all of the regions on August 5 and 9, 2019 follow a delta-lognormal distribution. All data sets and R code are available in the Supplemental Information. The summary statistics are reported in Table 8. For the daily rainfall amounts in the five regions, the estimated common means were 4.4506 and 13.2621 mm/day on August 5 and 9, 2019, respectively.
The computed 95% CIs of the common rainfall mean are reported in Table 9. Under the rain criteria issued by the TMD (Thai Meteorological Department, 2018), the daily rainfall in Thailand can be interpreted as light (0.1-10.0 mm) on August 5, 2019, and moderate (10.1-35.0 mm) on August 9, 2019. These results confirm the simulation results for k = 5 in the previous section.
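The interpretation step can be expressed as a small lookup. The Python sketch below encodes only the two TMD categories quoted in the text (light, 0.1-10.0 mm; moderate, 10.1-35.0 mm) together with the 0.1 mm zero threshold used for the data. How values between 10.0 and 10.1 mm are treated, the label for amounts above 35.0 mm, and the function name are assumptions made for illustration.

```python
def classify_daily_rainfall(mm):
    """Classify a daily rainfall amount (mm) using the two categories in the text."""
    if mm < 0.1:
        return "zero record"          # below the 0.1 mm recording threshold
    if mm <= 10.0:
        return "light"
    if mm <= 35.0:
        return "moderate"             # assumption: 10.0 < mm < 10.1 counts as moderate
    return "above moderate range"     # higher categories are not given in the text

# Estimated common means reported for August 5 and 9, 2019.
print(classify_daily_rainfall(4.4506), classify_daily_rainfall(13.2621))
```

Applied to the two estimated common means, the lookup reproduces the light and moderate classifications stated above.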

DISCUSSION
For MOVER and PB, which were developed from the studies of Krishnamoorthy & Oral (2015) and Malekzadeh & Kharrati-Kopaei (2019), respectively, the simulation results are similar to those of both studies provided that the zero observations are omitted. CIs for the common mean have been investigated for both normal and lognormal distributions (Fairweather, 1972; Jordan & Krishnamoorthy, 1996; Krishnamoorthy & Mathew, 2003; Lin & Lee, 2005; Tian & Wu, 2007; Krishnamoorthy & Oral, 2015). However, the common mean of delta-lognormal populations is especially of interest because it can be used to fit data from real-world situations such as investigating medical costs (Zou, Taleban & Huo, 2009; Tierney et al., 2003; Tian, 2005), analyzing airborne contaminants (Owen & DeRouen, 1980; Tian, 2005), and measuring fish abundance (Fletcher, 2008; Wu & Hsieh, 2014). Furthermore, it is possible that some extreme rainfall data also fulfill the assumptions of a delta-lognormal distribution. Note that natural disasters such as floods and landslides have been caused by extreme rainfall events in many countries around the world: Europe (e.g., northern England, southern Scotland, and Ireland; Otto & Oldenborgh, 2017), Asia (e.g., Japan; Oldenborgh, 2018), and North America (e.g., Southeast Texas; Oldenborgh et al., 2019). Our findings show that some of the methods studied had CPs that were too low or too high for large sample cases, a shortcoming that should be addressed in future work.

CONCLUSIONS
The objective of this study was to propose CIs for the common mean of several delta-lognormal distributions using FGCI, LS, MOVER, PB, HPD-JR, and HPD-NGB. The CP and AL of the methods were assessed as performance measures via Monte Carlo simulation. The findings confirm that for a small sample case (k = 2), FGCI and HPD-NGB are the recommended methods in different situations: FGCI (a small-to-moderate sample size, or a large $\sigma_i^2$ with a moderate-to-large sample size) and HPD-NGB (a small $\sigma_i^2$ with a moderate-to-large sample size). For large sample cases (k = 5, 10), MOVER (small $\delta_i$ and $\sigma_i^2$) and PB (large $\delta_i$ and $\sigma_i^2$) performed the best.