Bayesian estimation for median discrete Weibull regression model

: The discrete Weibull model can be adapted to capture di ﬀ erent levels of dispersion in the count data. This paper takes into account the direct relationship between explanatory variables and the median of discrete Weibull response variable. Additionally, it provides the Bayesian estimate of the discrete Weibull regression model using the random walk Metropolis algorithm. The prior distributions of the coe ﬃ cient predictors were carried out based on the uniform non-informative, normal and Laplace distributions. The performance of the Bayes estimators was also compared with the maximum likelihood estimator in terms of the mean square error and the coverage probability through the Monte Carlo simulation study. Meanwhile, a real data set was analyzed to show how the proposed model and the methods work in practice.


Introduction
Generalized linear models are extensions of the linear regression model to avoid the selection of the normality response and linearity imposed by the linear regression model, which is impossible for binary or count responses.Regression for count data is widely performed by models, such as Poisson [1], negative binomial [2] and zero-inflated regressions [3][4][5].The well-known property of a Poisson distribution shows its mean that is equal to the variance.This situation is often unrealistic as the distribution of counts tends to have a variance not equal to its mean.When data handles over-dispersion, a negative binomial distribution is utilized to model count variables.Zero-inflated models are used to model count data that have many zero counts.Moreover, the Conway-Maxwell Poisson distribution is used to deal with under-dispersion and over-dispersion [2].Also, the discrete Weibull distribution is examined to handle under-dispersion and over-dispersion discrete data.This model was first introduced by Nakagawa and Osaki [6].The motivation for considering the discrete Weibull distribution stems from the vital role the that continuous Weibull distribution plays in the survival analysis and failure time study.Similarly, the continuous Weibull distribution is widely used in probabilistic modeling and a fatigue life prediction model [7,8].However, there are many challenging things about this distribution that has yet to be proposed.
The inference for parameters of the discrete Weibull regression model has been investigated in a few studies based on a parameter affecting zero observation related to the explanatory variables through the log-log and logit link functions.Kalktawi [9] and Englehardt and Li [10] showed how a discrete Weibull regression model can be adapted to address over-dispersion and under-dispersion via the log-log link function.Moreover, Klakattawi et al. [11] proposed an ability to adapt in a simple way to different types of dispersions: Over-dispersion, under-dispersion and covariate-specific dispersion.Peluso and Vinciotti [12] conducted a simulation study linking two parameters to inspect a discrete Weibull regression model's level of flexibility.
The maximum likelihood estimation of parameters is valid for an asymptotically large sample size of data [13].One of the most common problems occurring in the count regression model is the maximum likelihood estimates that become unstable with larger standard errors of the estimates that affect statistical inference when insufficiently large sample sizes manifest.To overcome the problem, various alternatives to the maximum likelihood estimation have been proposed, and the Bayesian estimation is one of them.However, the Bayes estimators depend on the prior distributions of the parameters in the model.Many researchers at different periods of time worked in this area of research proposed different prior distributions in the count regression model (see, for example, Haselimashhadi et al. [14], Gelman et al. [15], Fu [16] and Chanialidis et al. [17]).Recently, Chaiprasithikul and Duangsaphon [18,19] proposed the Bayesian estimation for censored data and the zero-inflated and hurdle discrete Weibull regression models via the log-log link function for a parameter that affects zero observation.Uniform non-informative and normal prior distributions were used to account for the regression coefficients.It was demonstrated that this suggestion performs well when applied to real datasets and in simulation studies.Alternatively, a regression structure for the discrete Weibull model through the median link function was proposed by Kalktawi [9].In addition, there are many works that have developed the Bayesian estimation procedure for the reliability and life testing experiments (see, for example, Ahmadini et al. [20] and Okasha et al. [21]).
In the present article, the Bayesian estimation is examined, based on the random walk Metropolis algorithm for the median discrete Weibull regression model under the three different prior distributions: Uniform non-informative, normal and Laplace prior distributions.A simulation study is conducted to compare the performance of three different prior distributions and the maximum likelihood estimation in both under-dispersion and over-dispersion cases.Moreover, a real dataset is analyzed to see how the model works in practice.
The rest of paper is organized as follows.An overview of the median discrete Weibull regression model is presented in section two, along with the maximum likelihood estimation, Bayesian estimation, simulation study and real data analysis.In section three, results and discussion are explained.Finally, this paper is concluded in section four.

The median discrete Weibull regression model
A discrete Weibull distribution (type one) and some properties were introduced by Nakagawa and Osaki [6].The cumulative distribution function and the probability mass function of a random variable are given by and respectively, where 0 < q < 1 and β > 0 are the shape parameters.When y = 0, the parameter q = 1 − p Y (0; q, β), which is the probability of Y more than zero.In other words, when q is small, an excessive zero case occurs.Parameter β indicates the skewness and controls the range of values of Y.Moreover, Kalktawi [9] proposed the parameter β reflects the dispersion of data through the numerical analyses; if 0 < β ≤ 1 , the data is over-dispersion, if β ≥ 2 , the data is under-dispersion and if 1 < β < 2 , the data can be either over-dispersed or under-dispersed depending on the value of q.Thus, the discrete Weibull distribution is suitable for both over-dispersion and under-dispersion.Meanwhile, there are the special cases of a discrete Weibull distribution: Geometric distribution, the discrete Exponential distribution (see Sato et al. [22]), the discrete Rayleigh distribution (see Roy [23]) and the Bernoulli distribution.
The mean and variance of a discrete Weibull distribution are no closed-form expressions, but the numerical approximations can be obtained (see Barbiero [24]).Another property of a discrete Weibull distribution is the quantile function Q(τ).Let's say the τ − th (0 < τ < 1) quantile that is the smallest value of y for which F(y) ≥ τ.The quantile Q(τ) has a closed-form expression as given by The quantile formula provided in Eq (3) can be applied.The median for discrete distributions can be defined as any value of y that F(y) ≥ 0.5, then the median can be easily obtained from the closed form as given by The presence of regression analysis for count data is a statistical process to measure the relationship between a count variable and one or more explanatory variables.Klakattawi et al. [11] showed how a discrete Weibull regression model can be adapted to address over-dispersion and under-dispersion via the log-log link function to the parameter q.Moreover, Haselimashhadi et al. [14] applied the logit link function to the parameter q, which is commonly used in classification problems for probabilities that are bounded between zero and one.Furthermore, they performed β that depends on explanatory variables through the log link function.The proposed discrete Weibull regression model, unlike generalized linear models in which the conditional mean is central to the interpretation, has the advantage that the conditional quantiles can be easily extracted from the fitted model.Moreover, the regression coefficients can be easily interpreted in terms of changes in the conditional median.According to the median, it has a closed-form expression and is more appropriate than the mean because of the skewness and outliers commonly found for counting data.Kalktawi [9] performed the median link function to a discrete Weibull regression model and used the maximum likelihood method for parameters estimation.
This study determines Y i , i = 1, 2, ..., n as a count response variable, which takes only the nonnegative integer values with the k explanatory variables, x i = (1, x i1 , ..., x ik ) and a vector composed of regression coefficients as α = (α 0 , α 1 , ..., α k ) .It is assumed that the parameter q i = q(x i ) is related to k explanatory variables x i via the median link function as follows In the context, it is useful to assume that Thus, Equation ( 7) is substituted to Eq (4); hence, the parameter q i can be obtained as e βx i α . ( The conditional probability mass function of Y i given x i can be written as

Maximum likelihood estimation
In this section, the maximum likelihood estimation for the discrete Weibull regression model is performed by linking only parameter q.The likelihood function of the median discrete Weibull regression is given by The log-likelihood function of the discrete Weibull regression model via median is given by ln(e The maximum likelihood estimation of the parameters is obtained by setting the first partial derivatives of the log-likelihood function with respect to each unknown parameter equal to zero; where The maximum likelihood estimators do not have a closed form solution because of the complex form of the likelihood equations.It is very difficult to prove that the solution to the normal equations gives a global maximum.Therefore, the maximum likelihood estimators are estimated by using the numerical method applied in the function optim() from package stats in R language, which minimizes the negative log-likelihood function of the median discrete Weibull regression model.
Let I(α, β) be the observed Fisher's information matrix for the (k + 2) × (k + 2) unknown parameters that contain negative of the second derivative of the log-likelihood function; hence, the variance-covariance matrix is the inverse of the observed Fisher's information matrix, The maximum likelihood estimators are substituted, thus resulting in an estimator of denoted by ˆ , This matrix can be obtained by inverting the Hessian matrix from the function hessian() in R language.The Hessian matrix contains the second derivative of the negative log-likelihood, i.e., moreover, the Hessian matrix is the observed Fisher's information matrix.
According to the parameter inferences performed using the maximum likelihood method, under some regularity conditions [25], these estimators enjoy standard asymptotic properties.Thus, by the asymptotic normality of maximum likelihood estimators, the 100(1 − α)% confidence intervals for parameters α j , j = 0, 1, 2, ..., k and β, respectively are where z α/2 is the upper α/2 − th quantile of the standard normal distribution.

Random walk Metropolis algorithm
The Metropolis-Hastings (MH) algorithm is the most popular example of a Markov chain Monte Carlo (MCMC) method for simulating a sample from a probability distribution that is the target distribution from which direct sampling is difficult.This algorithm is similar to the acceptancerejection method; the proposal (candidate) value can be generated from the proposal distribution, then, the proposal value is accepted with an acceptance probability.Moreover, the MH algorithm is converging to the target distribution itself.For more details on the MH algorithm, see Hastings [26] and Gilks et al. [27].
Given y = (y 1 , y 2 , ..., y n ) is the vector of the observed values of a random sample Y 1 , Y 2 , ...Y n , let p(θ|y) be the target distribution, while θ is the vector of current state values (parameters) and θ * is the proposal value generated from the proposal distribution q(θ * |θ) .Then, the proposal value θ * is accepted with the probability p = min(1, R θ ) , where The iterative steps of the MH algorithm can be described as follows Step 1: Initialize the parameter θ (0) for the algorithm.
Step 2: For l = 1, 2, ..., L repeat the following steps: with probability 1-p.A random walk Metropolis algorithm is a special case of the MH algorithm.In the random walk Metropolis algorithm, the proposal distribution is symmetrical, depending only on the distance between the current state value and the proposal value, then the proposal value θ * is accepted with probability p = min(1, R θ ), where R θ = p(θ * |y) p(θ|y) .
The algorithm of random walk Metropolis can be summarized followed by the above steps with adjusting Step 2. Generate random error from a multivariate normal distribution with a zero-mean vector and variance-covariance .

Bayesian estimation for median discrete Weibull regression model
This section performs the Bayes estimators for the median discrete Weibull regression model based on three schemes of prior distributions as follows: i) Uniform non-informative prior distribution: If no prior information is available, a default flat prior can be resorted to, then it is easy to focus on the uniform non-informative prior distribution.The following the prior distributions are π(α j ) ∝ 1, j = 0, 1, ..., k, and π(β) ∝ 1.
ii) Normal prior distribution: As stated earlier, the possible values of α j are real numbers, which corresponds to the possible values of a normal distribution; this study selects the prior distribution of α j ; that is, a normal distribution with the hyperparameters as (µ α j , σ 2 α j ) , j = 0, 1, ..., k.For parameter β, this study selects the prior distribution that is Gamma distribution with the hyperparameters as (a, b).The following prior distributions are iii) Laplace prior distribution: If prior information is available, this study can perform the informative prior distribution that should include all possible values of parameter.The possible values of α j are real numbers which corresponds to the possible values of a Laplace distribution; it selects the prior distribution of α j ; that is, a Laplace distribution with the hyperparameters as (0, 1/λ).Similarly, it selects the prior distribution, which is Gamma distribution with the hyperparameters as (a, b).The following prior distributions are The joint prior distributions of the parameters α and β under the independence assumption is where θ = (α 0 , α 1 , ..., α k , β).
The choice of the hyperparameters' values is generally modified by available information of dataset to improve the Bayes estimators.For example, it fixes the hyperparameters' values of α j , j = 0, 1, 2, ..., k of normal prior distribution with mean zero and high variance.For Laplace prior distribution, it fixes the hyperparameters' values of α j , j = 0, 1, 2, ..., k with some λ > 0. In addition, the hyperparameters' values of β are considered by the maximum likelihood estimator of β with the mean of Gamma distribution.The joint posterior density function of the parameters α and β can be written as: where L(θ|y, x) is the likelihood function of the median discrete Weibull regression model in Eq (10).
The Bayes estimator of each parameter under the squared error loss function is the expected value of each parameter under the joint posterior density function.Therefore, the Bayes estimators are given by α j = ...
A difficulty to the implementation of Bayesian procedure is that of obtaining the posterior distribution.The process often requires the integration which is very difficult to calculate especially when dealing with complex and high-dimensional models.In such a situation, MH algorithms are highly helpful in this case to model deviations from the posterior density and generate accurate approximations [26,27].
Since the integral in Eqs ( 19) and ( 20) does not have a closed form, this study chose the random walk MH algorithm to estimate the Bayes estimators.It also determines the joint posterior density function of the parameters α and β in Eq (18) as the target distribution, while θ is the current state value and θ * is the proposal value generated from the proposal distribution q(θ * |θ).Then, the proposal value θ * is accepted with the probability p = min(1, R θ ), where For the random walk Metropolis algorithm, the proposal distribution is symmetrical, depending only on the distance between the current state value and the proposal value.Then, the proposal value θ * is accepted with probability p = min(1, R θ ) , where The iterative steps of the random walk Metropolis algorithm can be described as follows: Step 1: Initialize the parameters θ (0) = (α (0) , β (0) ) for the algorithm using the maximum likelihood estimation (MLE) of the parameters θ = (α, β).
Step 2: For l = 1, 2, ..., L , repeat the following steps: a. Generate random error vector from a multivariate normal distribution with a zero-mean vector and variance-covariance matrix as a diagonal matrix in which the diagonal elements are the diagonal of the inverse of the observed Fisher's information matrix; ∼ N(µ = 0, = diag(I −1 (θ))) .
Step 3: Remove B of the chain for burn-in.
Step 4: Calculate the estimated values of the Bayes estimators of the parameters α , and β from the average of the generated values as given by where θ is a parameter in vector θ = (α, β).

Simulation study
In this section, the Monte Carlo simulation is conducted to assess and compare the performance of the Bayesian estimation via the random walk Metropolis algorithm for the median discrete Weibull regression model under difference three prior distributions for the regression parameters: Uniform noninformative prior (Bayes(U)), normal prior (Bayes(N)) and Laplace prior (Bayes(L)).Moreover, the MLE is considered.The various selected sample sizes (n) are 50, 100 and 200.The three explanatory variables are considered: A standard normal distribution (x 1 ∼ N(0, 1)), a uniform distribution that lies between -0.3 and 0.3 (x 2 ∼ U(−0.3, 0.3)) and a Bernoulli distribution with probability of success 0.4 (x 3 ∼ Ber(0.4)).In particular, this study selects the regression parameters to take values (α 0 , α 1 , α 2 , α 3 ) = (1.5, 0.4, −0.2, 0.8) and β = 0.9 for over-dispersion, β = 2.5 for under-dispersion and β = 1.6 for either over-dispersed or under-dispersed, depending on the value of q.
The parameters are estimated by using the numerical method.In this paper, the Nelder-Mead method in the function optim() from package stats in R is applied to estimate parameters of the median discrete Weibull regression model.For Bayesian estimation, it fixes the hyperparameters' values of α j and j = 0, 1, 2, 3 of normal prior distribution with mean zero and variance 100 2 and the hyperparameters' values of β as one and the maximum likelihood estimator.In addition, it fixes the hyperparameters' values of α j and j = 0, 1, 2, 3 of Laplace prior distribution as 0.5 and the hyperparameters' values of β as one and the maximum likelihood estimator.Additionally, this study considers 10,000 iterations of the sampler and uses the first 10% of the data as burn-in.This simulation study is repeated 1,000 times.The measures of accuracy for the estimators are (i) the estimates of the parameters (Est.) (ii) the mean square error (MSE) (iii) the coverage probability (CP) (iv) the average length (AL) where α j is the j-th estimator LCL (l) α j are the j-th lower bound and upper bound for the 95% confidence interval of the l -th time and LCL α j < α j < UCL α j is the total of the number of times that α j is inside the confidence interval.The same measure of accuracy has been applied for the estimators of parameter β.Tables 1-3 report the estimates of the parameters (Est.)together with the MSE.In addition, Tables 4-6 report the 95% CP and the AL.

Real data analysis
In this section, the median discrete Weibull regression is applied to a real data set that shows the ability for over-dispersion data (see Kalktawi [9]).This data is available under the "COUNT" package in R from Hosmer and Lemeshow [28] and represents the number of visits to a doctor by pregnant women in the first three months of their pregnancies with 189 observations.The response variable is the number of physician visits in first trimester, and the three explanatory variables are history of mother smoking (1=history of mother smoking; 0=mother nonsmoker) (x 1 ), weight (lbs) at last menstrual period (x 2 ) and age of mother (x 3 ).For fitting the discrete Weibull distribution of the response variable, the Kolmogorov-Smirnov statistic is 0.0985 less than the critical value of 0.0989.Thus, this data can be modeled by the discrete Weibull distribution.Moreover, this data is modeled by the Poisson, negative binomial and discrete Weibull distributions.The results show that the Akaike information criterion (AIC) from the Poisson, negative binomial and discrete Weibull distributions are 476.59,466.85 and 466.84, respectively.Additionally, the mean and variance of the data are 0.7937 and 1.1221, respectively, which indicates an over-dispersion case.
We estimate parameters and construct the 95% confidence intervals via the maximum likelihood estimation method.To demonstrate how the proposed Bayesian method under the three prior distributions can be used in practice, this study calculates parameter estimates and the 95% HPD interval of the parameters with L = 10, 000 replicates and 10% of the chain for burn-in; B = 1, 000.In addition, the three information criteria, namely, the AIC, the Bayesian information criterion (BIC) and the deviance information criterion (DIC) (see in Kalktawi [9] and Haselimashhadi et al. [14]) are applied to compare models with different estimates of parameters, which are models for the three explanatory variables and a subset of parameters of that significance.All results are reported in Table 7.Along with, the traceplot, autocorrelation for sampled values and posterior densities of significant independent variables are performed in Figure 1.

Results and discussion of simulation study
An inspection of Tables 1-3 show that the estimates of the parameters (Est.) in the simulation study obtained by all methods seem to be close to the true parameter values.Moreover, all of the estimators have monotonic behaviors according to the MSE, namely, when n increases, the estimated MSE values decrease.The Bayes estimators have a smaller MSE than the estimators of MLE.The MSE of the Bayes(L) outperforms other methods in almost all situations.Additionally, note that the MSE for all estimators of the three Bayesian methods behave very similarly when n = 200.Conversely, the MLE presents the highest MSE but has a satisfactory performance when sample size increases.In addition" note that the MSE of the estimators of α 2 are very high and may cause what we define as a strong effect on x 2 or the high variance of the estimators of α 2 .Furthermore, the explanatory variable affects the MSE of estimators.However, when n is large enough and β increases, the MSE of estimators will decrease and will no longer be very high anymore.Tables 4-6 show that when sample sizes were increased, the CP of all methods was generally close to the nominal confidence level.The CP obtained when using the three Bayesian methods are closer to the nominal level than using MLE method.Additionally, the CP for the three prior distributions of the Bayesian method behave remarkably similar.Regarding the AL, as sample sizes were increased, the AL of the 95% confidence intervals decreased for all methods.For cases β = 0.9 and β = 1.6, the AL based on the Bayes(L) were the shortest for almost all situations after the CP was considered, while the Bayes(U) were the shortest in the case of β = 2.5.Although the AL for the MLE method performs the shortest in some situations, but the CP is farthest from the nominal confidence level.Additionally, the results of the AL for the estimator α 2 are quite wide, which is the explanation given for the extremely high MSE values of the estimator α 2 .

Conclusions
This paper introduced the Bayesian estimation for the discrete Weibull regression model via the median link function.The Bayesian approach was considered on the three different prior distributions: Uniform non-informative prior, normal prior and Laplace prior.The augmented random walk Metropolis procedure was also proposed to compute the Bayes estimates of the unknown parameters.Moreover, the maximum likelihood estimation was compared.The performance of these methods was compared by using the Monte Carlo simulation based on the MSE and CP criteria.These criteria were calculated for different sample sizes based on both under-dispersion and over-dispersion data, along with the application of the methods illustrated by using a real dataset available on the literature
Note: the boldface identifies the smallest MSE for each case.
Note: the boldface identifies the smallest MSE for each case.