Bayesian estimation for the mean of delta-gamma distributions with application to rainfall data in Thailand

Precipitation and flood forecasting are difficult due to rainfall variability. The mean of a delta-gamma distribution can be used to analyze rainfall data for predicting future rainfall, thereby reducing the risks of future disasters due to excessive or too little rainfall. In this study, we construct credible and highest posterior density (HPD) intervals for the mean and the difference between the means of delta-gamma distributions by using Bayesian methods based on Jeffrey’s rule and uniform priors along with a confidence interval based on fiducial quantities. The results of a simulation study indicate that the Bayesian HPD interval based on Jeffrey’s rule prior performed well in terms of coverage probability and provided the shortest expected length. Rainfall data from Chiang Mai province, Thailand, are also used to illustrate the efficacies of the proposed methods.


INTRODUCTION
Weather conditions can vary greatly from day to day, and accurate forecasts up to 7 days in advance are highly desirable. The climate in a given area provides a broad picture of temperature and rainfall variation over time and is categorized into seasons. For example, Chiang Mai, a province in Northern Thailand, has three seasons: summer (from March to June), the rainy season (from July to October), and winter (from November to February). The major economic output of Chiang Mai comes from agriculture, for which rainfall is essential: insufficient or nonexistent rainfall (drought conditions) causes crops to die, whereas excessive rainfall (flooding) destroys crops and can cause disasters such as landslides. Therefore, predicting the amount of rainfall during each period is very important because it enables farmers to plan the use of water resources accordingly. Thus, summarizing rainfall in specific areas by using statistical measures such as the mean is of great importance. Chiang Mai province has an average rainfall of approximately 1,134 mm per year, and the highest daily rainfall recorded was 166.5 mm (August 14, 1968). The rainiest month is August and the least rainy month is January (Amatayakul & Chomtha, 2013). Because a month can have zero millimeters of rainfall, a monthly rainfall series often includes zero values. When a rainfall series contains only positive values, it can be fitted with a standard continuous probability distribution such as the gamma distribution. For instance, Sangnawakij & Niwitpong (2017) and Krishnamoorthy, Mathew & Mukherjee (2008) constructed confidence intervals for a gamma distribution of monthly rainfall data. However, a delta-gamma distribution (or zero-inflated gamma distribution) is more suitable for data containing both positive and zero observations.
In this model, the positive values follow a gamma distribution with shape and rate parameters, while the number of zeros follows a binomial distribution whose parameter is the probability of a zero. Inference for delta-gamma distributions has been applied to real data in many fields, for instance, in engineering tests of body armor for stab resistance, in which a zero value is recorded when the armor is not pierced (Zimmer, Park & Mathew, 2020), and for ecological biomass data, which often contain a high proportion of zeros together with skewed positive values (Lecomte et al., 2013).
A confidence interval is a range of values, computed from the sample, that is constructed so that over repeated sampling it contains an unknown population parameter (such as the population mean) with a specified probability, the confidence level. For example, Zimmer, Park & Mathew (2020) evaluated the coverage probabilities of 95% upper confidence limits for a zero-inflated gamma distribution constructed via the bias-corrected and accelerated bootstrap and the bootstrap-calibrated delta method.
The mean of the distribution of a continuous random variable (also known as the expected value or expectation) is the long-run average value of the random variable, obtained by integrating the product of the variable and its probability density. Since the mean is the most popular measure of central tendency, we are interested in constructing confidence intervals for the population mean and in extending the approach to the difference between the means of two populations. Several studies have investigated methods of constructing confidence intervals for functions of the mean. Muralidharan & Kale (2002) used maximum likelihood to estimate the parameters and construct confidence intervals for the mean of a zero-inflated gamma population. Ren, Liu & Pu (2021) used fiducial methods to establish simultaneous confidence intervals for the means of multiple zero-inflated gamma distributions. Thangjai, Niwitpong & Niwitpong (2017) proposed confidence intervals for the mean and the difference between the means of two normal distributions with unknown coefficients of variation. Niwitpong, Koonprasert & Niwitpong (2012) provided confidence intervals for the difference between normal population means with known coefficients of variation. Maneerat, Niwitpong & Niwitpong (2019a) proposed Bayesian methods to construct highest posterior density (HPD) intervals for the mean and the difference between the means of two delta-lognormal distributions. Maneerat, Niwitpong & Niwitpong (2019b) proposed confidence intervals for the mean of a delta-lognormal distribution using the generalized confidence interval and the method of variance estimates recovery based on a weighted beta distribution and a variance stabilizing transformation, respectively. However, to the best of our knowledge, no study has yet constructed confidence intervals for the mean and the difference between the means of two delta-gamma distributions.
Herein, we propose confidence intervals for the mean of a delta-gamma population and for the difference between the means of two delta-gamma populations. We propose five methods: Bayesian credible and HPD intervals based on the Jeffrey's rule and uniform priors, and a confidence interval based on fiducial quantities (FQs). The performances of the proposed intervals were evaluated in terms of coverage probability and expected length via Monte Carlo simulations, and the intervals were then applied to monthly rainfall data from Chiang Mai province, Thailand, to demonstrate their efficacy.
The remainder of this article is organized as follows. The confidence intervals for the mean of a delta-gamma distribution, and their extension to the difference between two delta-gamma means, are presented in the 'Methods' section. The simulation study and the comparison of the methods in terms of coverage probabilities and expected lengths are reported in the 'Simulation Studies and Results' section. An empirical application of the proposed methods to monthly rainfall data from Chiang Mai province, Thailand, is reported in 'An Empirical Application'. The 'Discussion' and 'Conclusions' sections complete the article.

METHODS
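Confidence intervals for the mean of a single delta-gamma distribution
Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample of independent and identically distributed observations from a delta-gamma distribution, denoted $X \sim DG(\alpha, \beta, \delta)$. The distribution function of $X$ is
$$H(x; \alpha, \beta, \delta) = \begin{cases} \delta, & x = 0, \\ \delta + (1 - \delta)\, G(x; \alpha, \beta), & x > 0, \end{cases}$$
where $G(x; \alpha, \beta)$ is the gamma distribution function with shape parameter $\alpha$ and rate parameter $\beta$. The number of zeros $n_{(0)}$ follows a binomial distribution $B(n, \delta)$, with $n = n_{(0)} + n_{(1)}$, where $n_{(0)}$ and $n_{(1)}$ are the numbers of zero and positive observations, respectively; the maximum likelihood estimator of $\delta$ is $\hat{\delta} = n_{(0)}/n$. The population mean of $X$ is $\tau = (1 - \delta)\alpha/\beta$, and so the sample estimate of $\tau$ is $\hat{\tau} = (1 - \hat{\delta})\hat{\alpha}/\hat{\beta}$.
Krishnamoorthy, Mathew & Mukherjee (2008) showed that data can be transformed using the cube-root approximation to develop inferential procedures for a gamma distribution. Let $G = (G_1, G_2, \ldots, G_n)$ be independent and identically distributed random variables from a gamma distribution, denoted $G(\alpha, \beta)$. Then $Y = G^{1/3}$ is approximately normal, $Y \sim N(\mu, \sigma^2)$, with (Krishnamoorthy & Wang, 2016)
$$\mu = \left(\frac{\alpha}{\beta}\right)^{1/3}\left(1 - \frac{1}{9\alpha}\right), \qquad \sigma^2 = \frac{1}{9\alpha^{1/3}\beta^{2/3}}.$$
Since the mean of a gamma distribution is $M = \alpha/\beta$, we can rewrite $\mu$ and $\sigma^2$ as
$$\mu = M^{1/3}\left(1 - \frac{1}{9\alpha}\right), \qquad \sigma^2 = \frac{M^{2/3}}{9\alpha}.$$
Eliminating $\alpha$ gives $\mu = M^{1/3} - \sigma^2/M^{1/3}$, a quadratic equation in $M^{1/3}$ whose positive root yields
$$M = \left(\frac{\mu + \sqrt{\mu^2 + 4\sigma^2}}{2}\right)^3.$$
Thus, the mean of a delta-gamma distribution is
$$\tau = (1 - \delta)\left(\frac{\mu + \sqrt{\mu^2 + 4\sigma^2}}{2}\right)^3.$$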
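As a numerical check of the cube-root parametrization, the following minimal Python sketch maps $(\alpha, \beta)$ to $(\mu, \sigma^2)$ with the formulas of Krishnamoorthy & Wang (2016), then applies the back-transformation $M = ((\mu + \sqrt{\mu^2 + 4\sigma^2})/2)^3$ and recovers $\alpha/\beta$ up to floating-point rounding. It also evaluates the plug-in $\hat{\tau} = (1 - \hat{\delta})\hat{\alpha}/\hat{\beta}$ at the maximum likelihood estimates reported in Example 1. The function names are ours and purely illustrative.

```python
import math


def cube_root_normal_params(shape, rate):
    """Approximate mean and variance of G**(1/3) for G ~ Gamma(shape, rate)."""
    mu = (shape / rate) ** (1.0 / 3.0) * (1.0 - 1.0 / (9.0 * shape))
    sigma2 = 1.0 / (9.0 * shape ** (1.0 / 3.0) * rate ** (2.0 / 3.0))
    return mu, sigma2


def gamma_mean_from_normal(mu, sigma2):
    """Positive root t of t**2 - mu*t - sigma2 = 0, where t = M**(1/3); returns M = t**3."""
    return ((mu + math.sqrt(mu * mu + 4.0 * sigma2)) / 2.0) ** 3


def delta_gamma_mean(delta, shape, rate):
    """tau = (1 - delta) * shape / rate."""
    return (1.0 - delta) * shape / rate


if __name__ == "__main__":
    mu, sigma2 = cube_root_normal_params(5.30, 2.06)
    print(gamma_mean_from_normal(mu, sigma2), 5.30 / 2.06)  # equal up to rounding
    print(round(delta_gamma_mean(0.54, 5.30, 2.06), 2))      # 1.18
```

Because $M^{1/3}$ satisfies $t^2 - \mu t - \sigma^2 = 0$ exactly and the other root is negative, the round trip is exact apart from floating-point error for any shape and rate.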

Bayesian methods
Bayesian statistical methods use Bayes' theorem to update a prior distribution $p(\theta)$ for the unknown parameters $\theta$ with the information in the observed sample $x$. The posterior (conditional) distribution of $\theta$ given $x$ is
$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta)\, p(\theta)\, d\theta} \propto p(x \mid \theta)\, p(\theta),$$
where $p(x \mid \theta)$ is the likelihood function. For a unimodal posterior $p(\theta \mid x)$ that is not symmetric, Box & Tiao (1973) defined the HPD interval as follows.
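Suppose $p(\theta \mid x)$ is a posterior distribution. A region $R$ in the parameter space of $\theta$ is called an HPD region of content $(1 - \gamma)$ if the following two conditions are satisfied:
(i) $\Pr(\theta \in R \mid x) = 1 - \gamma$;
(ii) for $\theta_1 \in R$ and $\theta_2 \notin R$, $p(\theta_1 \mid x) \geq p(\theta_2 \mid x)$.
As stated earlier, a delta-gamma distribution combines gamma and binomial distributions. The nonzero observations $X_i \neq 0$, $i = 1, 2, \ldots, n_{(1)}$, which follow a gamma distribution, can be transformed by taking cube roots to an approximately normal distribution, denoted $Y \sim N(\mu, \sigma^2)$. Let $Y = (Y_1, Y_2, \ldots, Y_{n_{(1)}})$ be independent and identically distributed random variables with this normal density. The likelihood function for $\lambda = (\mu, \sigma^2)$ is
$$p(y \mid \lambda) \propto (\sigma^2)^{-n_{(1)}/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n_{(1)}}(y_i - \mu)^2\right\}.$$
Thus, the Fisher information for $\lambda$ is
$$I(\lambda) = \operatorname{diag}\left(\frac{n_{(1)}}{\sigma^2}, \frac{n_{(1)}}{2\sigma^4}\right).$$
For the delta-gamma distribution with three unknown parameters $\theta = (\mu, \sigma^2, \delta)$, the likelihood function is
$$p(y \mid \theta) \propto \delta^{n_{(0)}}(1 - \delta)^{n_{(1)}}(\sigma^2)^{-n_{(1)}/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n_{(1)}}(y_i - \mu)^2\right\}.$$
Therefore, the Fisher information for $\theta$ becomes
$$I(\theta) = \operatorname{diag}\left(\frac{n_{(1)}}{\sigma^2}, \frac{n_{(1)}}{2\sigma^4}, \frac{n}{\delta(1 - \delta)}\right).$$
In the following subsections, we cover the Jeffrey's rule and uniform priors used to construct Bayesian credible and HPD intervals for the mean of a delta-gamma population.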
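The diagonal entries of $I(\theta)$, and the square root of its determinant that defines the Jeffrey's rule prior, follow from the log-likelihood of $\theta = (\mu, \sigma^2, \delta)$. The worked derivation below is our own; it treats $n_{(1)}$ as fixed in the normal part, as the likelihood above does, and uses $E[n_{(0)}] = n\delta$. Every mixed second derivative has zero expectation (for example, $\partial^2\ell/\partial\mu\,\partial\sigma^2 = -\sigma^{-4}\sum_i (y_i - \mu)$), so the matrix is diagonal.

```latex
\begin{aligned}
\ell(\theta) &= n_{(0)}\ln\delta + n_{(1)}\ln(1-\delta) - \frac{n_{(1)}}{2}\ln\sigma^2
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n_{(1)}}(y_i-\mu)^2 + \text{const},\\
-E\left[\frac{\partial^2\ell}{\partial\mu^2}\right] &= \frac{n_{(1)}}{\sigma^2},\\
-E\left[\frac{\partial^2\ell}{\partial(\sigma^2)^2}\right] &= -\frac{n_{(1)}}{2\sigma^4}
  + \frac{1}{\sigma^6}E\left[\sum_{i=1}^{n_{(1)}}(y_i-\mu)^2\right] = \frac{n_{(1)}}{2\sigma^4},\\
-E\left[\frac{\partial^2\ell}{\partial\delta^2}\right] &= \frac{E[n_{(0)}]}{\delta^2}
  + \frac{E[n_{(1)}]}{(1-\delta)^2} = \frac{n}{\delta} + \frac{n}{1-\delta} = \frac{n}{\delta(1-\delta)},\\
\sqrt{\det I(\theta)} &= \sqrt{\frac{n_{(1)}^2\, n}{2\sigma^6\,\delta(1-\delta)}}
  \;\propto\; (\sigma^2)^{-3/2}\,\delta^{-1/2}(1-\delta)^{-1/2}.
\end{aligned}
```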
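The Jeffrey's rule prior is defined as the square root of the determinant of the Fisher information $I(\theta)$, that is, $p_J(\theta) \propto \sqrt{\det I(\theta)}$. Therefore, the $100(1 - \gamma)\%$ two-sided credible interval for $\tau$ is
$$CI_J = [L_J, U_J] = [\tau_J(\gamma/2),\ \tau_J(1 - \gamma/2)],$$
where $\tau_J(q)$ denotes the $q$th quantile of the posterior distribution of $\tau$ under the Jeffrey's rule prior. The HPD interval has the property that every point inside it has a higher posterior density than any point outside it (Noyan & Pham-Gia, 1993; Chen & Shao, 1999). Thus, the $100(1 - \gamma)\%$ HPD interval for $\tau$ based on the Jeffrey's rule prior, $CI_{HPD.J} = [L_{HPD.J}, U_{HPD.J}]$, was computed as the shortest interval containing $100(1 - \gamma)\%$ of the posterior draws of $\tau$ by using the HPDinterval function in R, which follows the HPD definition of Box & Tiao (1973).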
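The Jeffrey's-rule intervals can be sketched by Monte Carlo in Python. Evaluating $\sqrt{\det I(\theta)}$ with $I(\theta) = \operatorname{diag}(n_{(1)}/\sigma^2, n_{(1)}/(2\sigma^4), n/(\delta(1-\delta)))$ gives $p_J(\theta) \propto (\sigma^2)^{-3/2}[\delta(1-\delta)]^{-1/2}$; combined with the normal-binomial likelihood, this implies the conditional posteriors $\delta \sim \text{Beta}(n_{(0)} + 1/2, n_{(1)} + 1/2)$, $\sigma^2 \sim \text{Inv-Gamma}(n_{(1)}/2, (n_{(1)} - 1)s^2/2)$ and $\mu \mid \sigma^2 \sim N(\bar{y}, \sigma^2/n_{(1)})$. These forms are our own derivation rather than equations quoted from this section, and the function names are hypothetical. The HPD helper uses the empirical shortest-interval estimator of Chen & Shao (1999) as a stand-in for R's HPDinterval, not a reproduction of it.

```python
import math
import random


def posterior_tau_jeffreys(x, draws=5000, seed=None):
    """Posterior draws of tau = (1 - delta) * M under the Jeffrey's rule prior.

    Assumed conditional posteriors (see lead-in):
      delta        ~ Beta(n0 + 1/2, n1 + 1/2)
      sigma^2      ~ Inv-Gamma(n1 / 2, (n1 - 1) s^2 / 2)
      mu | sigma^2 ~ N(ybar, sigma^2 / n1)
    """
    rng = random.Random(seed)
    y = [v ** (1.0 / 3.0) for v in x if v > 0]  # cube roots of the positive values
    n1 = len(y)
    n0 = len(x) - n1
    if n1 < 2:
        raise ValueError("need at least two positive observations")
    ybar = sum(y) / n1
    ss = sum((v - ybar) ** 2 for v in y)  # (n1 - 1) * s^2
    out = []
    for _ in range(draws):
        delta = rng.betavariate(n0 + 0.5, n1 + 0.5)
        # 1 / sigma^2 ~ Gamma(shape n1/2, rate ss/2); gammavariate takes a scale
        sigma2 = 1.0 / rng.gammavariate(n1 / 2.0, 2.0 / ss)
        mu = rng.gauss(ybar, math.sqrt(sigma2 / n1))
        m = ((mu + math.sqrt(mu * mu + 4.0 * sigma2)) / 2.0) ** 3  # gamma mean M
        out.append((1.0 - delta) * m)
    return out


def equal_tailed(draws, gamma=0.05):
    """Equal-tailed 100(1 - gamma)% credible interval from posterior draws."""
    s = sorted(draws)
    n = len(s)
    return s[math.floor(gamma / 2 * (n - 1))], s[math.ceil((1 - gamma / 2) * (n - 1))]


def hpd(draws, gamma=0.05):
    """Shortest interval holding a (1 - gamma) share of the draws (Chen & Shao, 1999)."""
    s = sorted(draws)
    n = len(s)
    k = math.ceil((1 - gamma) * n)  # number of draws the interval must contain
    i = min(range(n - k + 1), key=lambda j: s[j + k - 1] - s[j])
    return s[i], s[i + k - 1]


if __name__ == "__main__":
    # Hypothetical delta-gamma sample: delta = 0.5, shape 5, rate 2 (scale 0.5).
    rng = random.Random(2021)
    sample = [0.0 if rng.random() < 0.5 else rng.gammavariate(5.0, 0.5) for _ in range(100)]
    tau_draws = posterior_tau_jeffreys(sample, draws=5000, seed=1)
    print("credible:", equal_tailed(tau_draws))
    print("HPD:     ", hpd(tau_draws))
```

Computed from the same draws, the HPD interval is never wider than the equal-tailed one, because the equal-tailed index range always contains a window of the required size; the gain is largest when the posterior of $\tau$ is strongly skewed.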
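Suppose that $X = (X_1, X_2, \ldots, X_n)$ are independent and identically distributed random variables with density $f_X(x; \vartheta, \zeta)$, where $\vartheta$ is the parameter of interest and $\zeta$ is a nuisance parameter, and let $x$ denote the observed value of $X$. A generalized pivotal quantity $R(X; x, \vartheta, \zeta)$ is called a fiducial quantity (FQ) for $\vartheta$ if it satisfies the following conditions:
(i) given $x$, the distribution of $R(X; x, \vartheta, \zeta)$ is free of all unknown parameters;
(ii) the observed value $R(x; x, \vartheta, \zeta)$ is equal to $\vartheta$.
The percentiles of the distribution of $R$ then give confidence limits for $\vartheta$.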
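From $Y \sim N(\mu, \sigma^2)$, the sample mean and variance of the $n_{(1)}$ transformed positive observations satisfy
$$\bar{Y} \overset{d}{=} \mu + Z\frac{\sigma}{\sqrt{n_{(1)}}} \qquad \text{and} \qquad S^2 \overset{d}{=} \frac{\sigma^2\,\chi^2_{n_{(1)}-1}}{n_{(1)} - 1},$$
respectively, where $Z$ and $\chi^2_{n_{(1)}-1}$ are independent standard normal and chi-squared random variables, the latter with $n_{(1)} - 1$ degrees of freedom.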
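Inverting these identities at the observed $\bar{y}$ and $s^2$ gives FQs for $\sigma^2$ and $\mu$, and substituting them into the gamma-mean back-transformation $M = ((\mu + \sqrt{\mu^2 + 4\sigma^2})/2)^3$ gives an FQ for $M$. The display below is our sketch of that construction, not the authors' equations: $R_{1-\delta}$ stands for an FQ of $1 - \delta$ whose form is not given in this section, and the label $CI_{FQ}$ is our notation. Condition (ii) can be checked directly; for example, setting $\chi^2_{n_{(1)}-1} = (n_{(1)} - 1)s^2/\sigma^2$ in $R_{\sigma^2}$ returns $\sigma^2$.

```latex
\begin{aligned}
R_{\sigma^2} &= \frac{(n_{(1)}-1)\,s^2}{\chi^2_{n_{(1)}-1}}, \qquad
R_{\mu} = \bar{y} - Z\sqrt{\frac{R_{\sigma^2}}{n_{(1)}}},\\
R_M &= \left(\frac{R_\mu + \sqrt{R_\mu^2 + 4R_{\sigma^2}}}{2}\right)^3, \qquad
R_\tau = R_{1-\delta}\,R_M,\\
CI_{FQ} &= \left[R_\tau(\gamma/2),\ R_\tau(1-\gamma/2)\right].
\end{aligned}
```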

Confidence intervals for the difference between delta-gamma means
In this section, we extend the ideas for the single delta-gamma mean confidence intervals to create new ones for the difference between two delta-gamma means.
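Let $X = (X_1, X_2, \ldots, X_n)$ and $V = (V_1, V_2, \ldots, V_m)$ be independent random samples from two delta-gamma distributions, denoted $X \sim DG(\alpha, \beta, \delta)$ and $V \sim DG(\alpha_2, \beta_2, \delta_2)$. The difference between their means is simply
$$\psi = \tau - \tau_2,$$
where $\tau_2 = (1 - \delta_2)\alpha_2/\beta_2$. The maximum likelihood estimator of $\delta_2$ is $\hat{\delta}_2 = m_{(0)}/m$, where $m_{(0)}$ and $m_{(1)}$ are the numbers of zero and positive observations in $V$.
For the Bayesian methods, the nonzero observations $X_i \neq 0$, $i = 1, 2, \ldots, n_{(1)}$, and $V_j \neq 0$, $j = 1, 2, \ldots, m_{(1)}$, from gamma distributions can be transformed by taking cube roots to normal distributions denoted $Y \sim N(\mu, \sigma^2)$ and $Y_2 \sim N(\mu_2, \sigma_2^2)$, respectively. Thus, the likelihood function is
$$p(y, y_2 \mid \varphi) \propto (\sigma^2)^{-n_{(1)}/2}(\sigma_2^2)^{-m_{(1)}/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n_{(1)}}(y_i - \mu)^2 - \frac{1}{2\sigma_2^2}\sum_{j=1}^{m_{(1)}}(y_{2j} - \mu_2)^2\right\}$$
with parameter $\varphi = (\mu, \sigma^2, \mu_2, \sigma_2^2)$. Thus, the Fisher information for $\varphi$ is
$$I(\varphi) = \operatorname{diag}\left(\frac{n_{(1)}}{\sigma^2}, \frac{n_{(1)}}{2\sigma^4}, \frac{m_{(1)}}{\sigma_2^2}, \frac{m_{(1)}}{2\sigma_2^4}\right).$$
For the difference between the two independent means, the unknown parameter is $\phi = (\mu, \sigma^2, \delta, \mu_2, \sigma_2^2, \delta_2)$ with likelihood function
$$p(y, y_2 \mid \phi) \propto \delta^{n_{(0)}}(1 - \delta)^{n_{(1)}}\,\delta_2^{m_{(0)}}(1 - \delta_2)^{m_{(1)}}\,p(y, y_2 \mid \varphi).$$
Therefore, the Fisher information for $\phi$ becomes
$$I(\phi) = \operatorname{diag}\left(\frac{n_{(1)}}{\sigma^2}, \frac{n_{(1)}}{2\sigma^4}, \frac{n}{\delta(1 - \delta)}, \frac{m_{(1)}}{\sigma_2^2}, \frac{m_{(1)}}{2\sigma_2^4}, \frac{m}{\delta_2(1 - \delta_2)}\right).$$
This can be used to construct confidence intervals for the difference between the means of two delta-gamma populations using Bayesian credible and HPD intervals based on the Jeffrey's rule and uniform priors as follows.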
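Because the two samples are independent, the Fisher information for $\phi$ is block diagonal, with one $(\mu, \sigma^2, \delta)$ block per sample. Under that structure, the Jeffrey's rule prior and the posterior both factorize, so posterior draws of $\psi$ can be formed by differencing independent posterior draws of $\tau$ and $\tau_2$. A worked sketch of this step (our derivation, assuming the single-sample Fisher information $\operatorname{diag}(n_{(1)}/\sigma^2, n_{(1)}/(2\sigma^4), n/(\delta(1-\delta)))$ and its analogue for the second sample):

```latex
\begin{aligned}
p_J(\phi) &\propto \sqrt{\det I(\phi)}
  \propto (\sigma^2)^{-3/2}(\sigma_2^2)^{-3/2}\,[\delta(1-\delta)]^{-1/2}\,[\delta_2(1-\delta_2)]^{-1/2},\\
p(\phi \mid x, v) &\propto p(\mu, \sigma^2, \delta \mid x)\; p(\mu_2, \sigma_2^2, \delta_2 \mid v),\\
\psi^{(b)} &= (1-\delta^{(b)})\left(\frac{\mu^{(b)} + \sqrt{(\mu^{(b)})^2 + 4\sigma^{2(b)}}}{2}\right)^3
  - (1-\delta_2^{(b)})\left(\frac{\mu_2^{(b)} + \sqrt{(\mu_2^{(b)})^2 + 4\sigma_2^{2(b)}}}{2}\right)^3,
  \quad b = 1, \ldots, B.
\end{aligned}
```

The equal-tailed credible interval is then $[\psi(\gamma/2), \psi(1-\gamma/2)]$ computed from the $B$ draws, and the HPD interval is the shortest interval containing $100(1-\gamma)\%$ of them.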
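Analogously, for $Y_2 \sim N(\mu_2, \sigma_2^2)$, $\bar{Y}_2 \overset{d}{=} \mu_2 + Z\sigma_2/\sqrt{m_{(1)}}$ and $S_2^2 \overset{d}{=} \sigma_2^2\,\chi^2_{m_{(1)}-1}/(m_{(1)} - 1)$, where $Z$ and $\chi^2_{m_{(1)}-1}$ are independent standard normal and chi-squared random variables, the latter with $m_{(1)} - 1$ degrees of freedom. Hence, FQs for $1 - \delta_2$ and $M_2$, denoted $R_{1-\delta_2}$ and $R_{M_2}$, can be obtained in the same way as those for $1 - \delta$ and $M$. Thus, the FQ for the difference between two delta-gamma means is
$$R_\psi = R_{1-\delta}R_M - R_{1-\delta_2}R_{M_2}.$$
Therefore, the $100(1 - \gamma)\%$ two-sided confidence interval for $\psi$ using the FQs is
$$CI_{d.FQ} = [R_\psi(\gamma/2),\ R_\psi(1 - \gamma/2)],$$
where $R_\psi(q)$ denotes the $q$th quantile of $R_\psi$.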

SIMULATION STUDIES AND RESULTS
The five methods for constructing confidence intervals for the mean and the difference between the means of two delta-gamma distributions were tested in a Monte Carlo simulation study conducted using the R statistical program (R Core Team, 2021). The performances of the five proposed methods were compared in terms of their coverage probabilities (CP) and expected lengths (EL), respectively defined as
$$CP = \frac{c(L \le \tau \text{ (or } \psi) \le U)}{15{,}000}, \qquad EL = \frac{\sum_{k=1}^{15{,}000}(U_k - L_k)}{15{,}000},$$
where $c(L \le \tau \text{ (or } \psi) \le U)$ is the number of simulation runs in which the interval contains $\tau$ (or $\psi$). The simulation results are presented for significance level $\gamma = 0.05$. The best-performing confidence interval was the one with a coverage probability greater than or close to the nominal confidence level of 0.95 and the shortest expected length. For the single delta-gamma mean, the data were generated from $X \sim DG(\alpha, \beta, \delta)$ with sample sizes n = 30, 50, 100, or 200 and probabilities of zeros $\delta$ = 0.2, 0.5, or 0.7.
Algorithm 6
• Step 1: generate x and v from $DG(\alpha, \beta, \delta)$ and $DG(\alpha_2, \beta_2, \delta_2)$;
• Step 2: compute $x^{1/3}$ and $v^{1/3}$;
• Step 5: repeat Step 4 5,000 times;
• Step 6: compute the 95% confidence interval for $\psi$ from $CI_{d.FQ}$;
• Step 7: repeat Steps 1-6 15,000 times to compute the coverage probabilities and the expected lengths.
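A minimal Python sketch of Algorithm 6 together with the CP and EL formulas follows. Steps 3 and 4 are not listed above, so the code assumes they compute FQ draws as in the single-sample case: $R_{\sigma^2} = (n_{(1)}-1)s^2/\chi^2_{n_{(1)}-1}$, $R_\mu = \bar{y} - Z\sqrt{R_{\sigma^2}/n_{(1)}}$ and $R_M = ((R_\mu + \sqrt{R_\mu^2 + 4R_{\sigma^2}})/2)^3$, plus, as our own assumption, $R_{1-\delta} \sim \text{Beta}(n_{(1)} + 1/2, n_{(0)} + 1/2)$, which is not necessarily the authors' choice. Redrawing any sample with fewer than two positive values is also our choice, and every function name is hypothetical. The default counts (5,000 FQ draws and 15,000 replications) follow the text.

```python
import math
import random


def sample_delta_gamma(n, shape, rate, delta, rng):
    """Step 1: n draws from DG(shape, rate, delta); redraw if fewer than 2 positives."""
    while True:
        x = [0.0 if rng.random() < delta else rng.gammavariate(shape, 1.0 / rate)
             for _ in range(n)]
        if sum(1 for v in x if v > 0) >= 2:
            return x


def fq_tau(x, draws, rng):
    """FQ draws for tau = (1 - delta) * M (ASSUMED forms for Steps 3-4)."""
    y = [v ** (1.0 / 3.0) for v in x if v > 0]  # Step 2: cube roots of positives
    n1, n0 = len(y), len(x) - len(y)
    ybar = sum(y) / n1
    s2 = sum((v - ybar) ** 2 for v in y) / (n1 - 1)
    out = []
    for _ in range(draws):
        chi2 = rng.gammavariate((n1 - 1) / 2.0, 2.0)  # chi-square with n1 - 1 df
        r_s2 = (n1 - 1) * s2 / chi2
        r_mu = ybar - rng.gauss(0.0, 1.0) * math.sqrt(r_s2 / n1)
        r_m = ((r_mu + math.sqrt(r_mu * r_mu + 4.0 * r_s2)) / 2.0) ** 3
        r_p = rng.betavariate(n1 + 0.5, n0 + 0.5)  # ASSUMED FQ for 1 - delta
        out.append(r_p * r_m)
    return out


def fq_interval_psi(x, v, draws, gamma, rng):
    """Steps 4-6: percentile interval of R_psi = R_tau(x) - R_tau(v)."""
    r = sorted(a - b for a, b in zip(fq_tau(x, draws, rng), fq_tau(v, draws, rng)))
    n = len(r)
    return r[math.floor(gamma / 2 * (n - 1))], r[math.ceil((1 - gamma / 2) * (n - 1))]


def coverage_and_length(n, m, p1, p2, reps=15_000, draws=5_000, gamma=0.05, seed=1):
    """Step 7: Monte Carlo CP and EL for psi.

    p1 = (shape, rate, delta) for X and p2 = (shape2, rate2, delta2) for V.
    """
    rng = random.Random(seed)
    psi = (1 - p1[2]) * p1[0] / p1[1] - (1 - p2[2]) * p2[0] / p2[1]
    hits, total = 0, 0.0
    for _ in range(reps):
        x = sample_delta_gamma(n, *p1, rng)
        v = sample_delta_gamma(m, *p2, rng)
        lo, hi = fq_interval_psi(x, v, draws, gamma, rng)
        hits += lo <= psi <= hi  # count runs whose interval covers psi
        total += hi - lo
    return hits / reps, total / reps


if __name__ == "__main__":
    # Hypothetical parameters and far fewer replications than the text, for speed.
    cp, el = coverage_and_length(50, 50, (2.0, 1.0, 0.2), (2.0, 1.0, 0.5),
                                 reps=200, draws=1000)
    print(f"CP = {cp:.3f}, EL = {el:.3f}")
```

Pure Python at the full 15,000 × 5,000 sizes is slow; a vectorized implementation (or R, as in the study) would be the practical choice, while the structure of the loop stays the same.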
The coverage probabilities and expected lengths of the 95% confidence intervals for τ are reported in Table 1. The coverage probabilities of the Bayesian HPD interval based on the uniform prior and the FQ confidence interval were greater than or close to the nominal confidence level of 0.95 in all cases, whereas those of the Bayesian credible and HPD intervals based on the Jeffrey's rule prior and the Bayesian credible interval based on the uniform prior fell below the nominal level in some cases. However, the expected lengths of the Bayesian HPD interval based on the Jeffrey's rule prior were shorter than those of the other methods, as shown in Fig. 1. Therefore, the Bayesian HPD interval based on the Jeffrey's rule prior is recommended for constructing the confidence interval for the mean of a single delta-gamma distribution.
The coverage probabilities and expected lengths of the 95% two-sided confidence intervals for ψ with equal and unequal sample sizes are listed in Tables 2 and 3, respectively. The Bayesian HPD interval based on the Jeffrey's rule prior, the Bayesian credible and HPD intervals based on the uniform prior, and the FQ confidence interval provided coverage probabilities greater than or close to the nominal confidence level of 0.95 in all cases, whereas the Bayesian credible interval based on the Jeffrey's rule prior fell below the nominal level in some cases with (δ, δ2) = (0.7, 0.7). Since the expected lengths of the Bayesian HPD interval based on the Jeffrey's rule prior were the shortest, as shown in Figs. 2 and 3, we recommend it for constructing the confidence interval for the difference between the means of two delta-gamma distributions with equal and unequal sample sizes. Furthermore, the results for sample sizes n > m were similar to those for n < m.

AN EMPIRICAL APPLICATION
In this part of the study, we estimate the mean of monthly rainfall data from Chiang Mai province, Thailand (https://www.hydro-1.net/), using the five proposed confidence intervals to illustrate their efficacy. Three cases were considered. In Case 1, we estimated the mean of a single delta-gamma distribution using rainfall data from one rain station in Chiang Mai, with a sample size consistent with those used in the simulation study. In Case 2, we investigated the difference between the means of two delta-gamma distributions with equal sample sizes by using rainfall data recorded over the same period at the same station in Chiang Mai for different months within the same season. In Case 3, we compared the means of two delta-gamma distributions with unequal sample sizes by combining data from several stations in Chiang Mai for the same month.
Example 1: testing the mean of a single delta-gamma distribution
Rainfall data were obtained from the Upper Northern Region Irrigation Hydrology Center (2021). We used monthly rainfall data (mm) from Irrigation Office Station I, Chiang Mai city, comprising 50 observations for January from 1972 to 2021. The densities of the rainfall data are shown in Fig. 4. Next, we determined the distribution of the positive rainfall data by using the minimum Akaike information criterion (AIC), defined as
$$AIC = -2\ln L + 2k,$$
where $L$ is the maximized value of the likelihood function and $k$ is the number of parameters. The results in Table 4 show that the positive rainfall data from Irrigation Office Station I fit a gamma distribution, since this distribution had the smallest AIC value. The Q-Q plots in Fig. 5 confirm that the positive rainfall data follow a gamma distribution.
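The AIC comparison can be sketched with the Python standard library. The candidate set here (gamma, lognormal, exponential) is an assumption, since the distributions compared in Table 4 are not listed in this section. The gamma MLE uses a Newton iteration on the profile score with series approximations to the digamma and trigamma functions, which is our implementation choice, and all function names are hypothetical.

```python
import math


def digamma(x):
    """psi(x): shift up to x >= 6 via psi(x) = psi(x + 1) - 1/x, then asymptotic series."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))


def trigamma(x):
    """psi'(x): shift via psi'(x) = psi'(x + 1) + 1/x**2, then asymptotic series."""
    r = 0.0
    while x < 6.0:
        r += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    return r + 1.0 / x + f / 2 + (f / x) * (1.0 / 6 - f * (1.0 / 30 - f / 42))


def fit_gamma(x):
    """Gamma MLE (shape, rate) and maximized log-likelihood for positive data x."""
    n = len(x)
    mean = sum(x) / n
    s = math.log(mean) - sum(math.log(v) for v in x) / n  # > 0 unless all x are equal
    a = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)  # closed-form start
    for _ in range(100):
        # Newton step on the profile score ln(a) - digamma(a) - s = 0
        step = (math.log(a) - digamma(a) - s) / (1.0 / a - trigamma(a))
        a = max(a - step, a / 2)  # damped so the shape stays positive
        if abs(step) < 1e-12 * a:
            break
    rate = a / mean  # rate MLE given the shape
    ll = sum(a * math.log(rate) + (a - 1) * math.log(v) - rate * v - math.lgamma(a)
             for v in x)
    return a, rate, ll


def fit_lognormal(x):
    """Lognormal MLE (mu, sigma^2) and maximized log-likelihood."""
    logs = [math.log(v) for v in x]
    n = len(logs)
    mu = sum(logs) / n
    s2 = sum((z - mu) ** 2 for z in logs) / n
    ll = sum(-z - 0.5 * math.log(2 * math.pi * s2) - (z - mu) ** 2 / (2 * s2) for z in logs)
    return mu, s2, ll


def fit_exponential(x):
    """Exponential MLE (rate) and maximized log-likelihood."""
    lam = len(x) / sum(x)
    return lam, len(x) * math.log(lam) - lam * sum(x)


def aic(loglik, k):
    """AIC = -2 ln L + 2k, with L the maximized likelihood and k the parameter count."""
    return -2.0 * loglik + 2.0 * k


def compare_aic(x):
    """AIC of each candidate fitted to the positive values x (smaller is better)."""
    return {
        "gamma": aic(fit_gamma(x)[2], 2),
        "lognormal": aic(fit_lognormal(x)[2], 2),
        "exponential": aic(fit_exponential(x)[1], 1),
    }
```

`compare_aic` should be applied to the positive values only (for example, `[v for v in rainfall if v > 0]`), since the zeros are modeled separately by the binomial part; the candidate with the smallest score is preferred.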
The zero values in the rainfall data fit a binomial distribution, and so the delta-gamma distribution is suitable for these data. The summary statistics for the rainfall data from Irrigation Office Station I were n = 50, $n_{(0)} = 27$, and $n_{(1)} = 23$, with maximum likelihood estimates $\hat{\delta} = 0.54$, $\hat{\alpha} = 5.30$, $\hat{\beta} = 2.06$, and $\hat{\tau} = 1.18$. The 95% confidence intervals for τ are reported in Table 5. In accordance with the simulation results in the previous section, the Bayesian HPD interval based on the Jeffrey's rule prior was shorter than the intervals from the other methods, confirming its suitability for constructing the confidence interval for the mean of a delta-gamma distribution.
Example 2: testing the difference between the means of two delta-gamma distributions with equal sample sizes
Since January and February are both in the winter season, they have similar precipitation profiles containing both positive and zero observations, and the data were found to be consistent with a delta-gamma distribution. Therefore, the data from these months were chosen to compare the means of two delta-gamma distributions. For n = m, we used monthly rainfall data for January and February from the Mae Taeng Project station. Next, we tested the distributions of the positive rainfall data from the Mae Taeng Project station using AIC; the results are reported in Table 4. The Q-Q plots in Fig. 7 show that the positive rainfall data follow gamma distributions.
The summary statistics for the January rainfall data from the Mae Taeng Project station were n = 46, $n_{(0)} = 23$, $n_{(1)} = 23$, $\hat{\delta} = 0.50$, $\hat{\alpha} = 4.41$, $\hat{\beta} = 1.77$, and $\hat{\tau} = 1.25$, and those for the February rainfall data were m = 46, $m_{(0)} = 29$, $m_{(1)} = 17$, $\hat{\delta}_2 = 0.63$, $\hat{\alpha}_2 = 4.96$, $\hat{\beta}_2 = 2.15$, and $\hat{\tau}_2 = 0.85$. According to the 95% confidence intervals for ψ in Table 6, the Bayesian HPD interval based on the Jeffrey's rule prior was shorter than the intervals from the other methods, which confirms its suitability for constructing confidence intervals for the difference between the means of delta-gamma distributions with equal sample sizes.
Example 3: testing the difference between the means of two delta-gamma distributions with unequal sample sizes
Table 4 also reports the AIC results used to assess which distributions fit the positive rainfall data from the two stations, and the Q-Q plots in Fig. 9 show that these positive rainfall data follow gamma distributions.
The summary statistics for the rainfall data were n = 51, $n_{(0)} = 28$, $n_{(1)} = 23$, $\hat{\delta} = 0.55$, $\hat{\alpha} = 9.01$, $\hat{\beta} = 3.20$, and $\hat{\tau} = 1.27$ for the Mae Hong Huk station and m = 171, $m_{(0)} = 97$, $m_{(1)} = 74$, $\hat{\delta}_2 = 0.57$, $\hat{\alpha}_2 = 3.91$, $\hat{\beta}_2 = 1.60$, and $\hat{\tau}_2 = 1.06$ for the Mae Kuang station. The 95% confidence intervals for ψ reported in Table 7 show that the Bayesian HPD interval based on the Jeffrey's rule prior was shorter than the others, which confirms its appropriateness for constructing confidence intervals for the difference between the means of delta-gamma distributions with unequal sample sizes.
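As an arithmetic check, the point estimates reported in Examples 1-3 can be reproduced from the reported counts and MLEs with $\hat{\delta} = n_{(0)}/n$ and $\hat{\tau} = (1 - \hat{\delta})\hat{\alpha}/\hat{\beta}$. The differences $\hat{\psi}$ below are computed here and are not quoted from the text, and the function name is ours. The sketch uses the exact ratios $n_{(0)}/n$; with the rounded $\hat{\delta}_2 = 0.57$, the Mae Kuang estimate would come out as 1.05 instead of the reported 1.06.

```python
def tau_hat(n_zero, n, shape_hat, rate_hat):
    """Plug-in delta-gamma mean: (1 - n0/n) * shape_hat / rate_hat."""
    return (1.0 - n_zero / n) * shape_hat / rate_hat


# (n0, n, alpha_hat, beta_hat) as reported in Examples 1-3
reported = {
    "Irrigation Office Station I, Jan": (27, 50, 5.30, 2.06),
    "Mae Taeng, Jan": (23, 46, 4.41, 1.77),
    "Mae Taeng, Feb": (29, 46, 4.96, 2.15),
    "Mae Hong Huk": (28, 51, 9.01, 3.20),
    "Mae Kuang": (97, 171, 3.91, 1.60),
}
tau = {name: tau_hat(*args) for name, args in reported.items()}
for name, value in tau.items():
    print(f"{name}: {value:.2f}")  # 1.18, 1.25, 0.85, 1.27, 1.06

psi_ex2 = tau["Mae Taeng, Jan"] - tau["Mae Taeng, Feb"]  # about 0.393
psi_ex3 = tau["Mae Hong Huk"] - tau["Mae Kuang"]         # about 0.212
```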

DISCUSSION
We adapted Krishnamoorthy & Wang's (2016) FQ approach for constructing confidence intervals for the mean of a gamma distribution to the case of a gamma distribution with excess zeros. Furthermore, we extended the approach of Yosboonruang, Niwitpong & Niwitpong (2019) for constructing confidence intervals for distributions containing zero observations by using Bayesian methods based on the Jeffrey's rule and uniform priors.
The results show that the Bayesian HPD interval based on the Jeffrey's rule prior performed well in terms of coverage probability and had the shortest expected lengths for estimating the mean and the difference between the means of delta-gamma distributions with equal sample sizes; with unequal sample sizes, the simulation results for the difference between the means for n > m were similar to those for n < m. The proposed approach can help to mitigate droughts or floods caused by insufficient or excessive rainfall, respectively. Similarly, government agencies could use it to manage dam releases when rainfall is insufficient or excessive. However, a limitation of this study is that the method cannot be applied to data without zero observations, as often occurs with rainfall data during the rainy season.

CONCLUSIONS
We constructed confidence intervals for the mean and difference between the means of two delta-gamma distributions using FQs and Bayesian methods based on the Jeffrey's rule and uniform priors. The performances of the confidence intervals were evaluated in terms of their coverage probabilities and expected lengths. The results of a simulation study show that the coverage probabilities of the Bayesian HPD interval based on the Jeffrey's rule prior were greater than or close to the nominal confidence level of 0.95 in almost all cases and its expected length was shorter than the other methods for both the mean and the difference between the means of two delta-gamma distributions. When using rainfall datasets to illustrate the efficacies of the proposed methods using real data, the Bayesian HPD interval based on the Jeffrey's rule prior performed better than the other methods in terms of interval length, which is consistent with the simulation results. Therefore, the Bayesian HPD interval based on the Jeffrey's rule prior is recommended for constructing confidence intervals for the mean and the difference between the means of two delta-gamma distributions.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding
This research received financial support from the National Science, Research, and Innovation Fund (NSRF) and King Mongkut's University of Technology North Bangkok.