Performance of Alternative Estimators in the Poisson-Inverse Gaussian Regression Model: Simulation and Application

This study proposes and compares biased estimators for the Poisson-Inverse Gaussian regression model to deal with correlated regressors. The limitations of each biased estimator are also discussed, and some biasing parameters for the Stein estimator are proposed. The performance of the estimators is evaluated through a simulation study and a real-life application based on the minimum mean squared error criterion. The simulation and application findings favor the ridge estimator with specific biasing parameters because it provides less variation than the others.


Introduction
In regression modeling, when the response variable only contains non-negative integers, count response models are considered for modeling purposes. These models are used to find the independent variables that might significantly impact the response variable [1,2]. The Poisson model is a frequently used discrete model that assumes the variance is equal to its mean (equi-dispersion). Sometimes, count variables exhibit over-dispersion or under-dispersion [3], i.e., the variation is greater than the mean or vice versa; in these cases the Poisson model provides misleading results. Hence, the negative binomial, Gamma, quasi-Poisson, Bell, Conway-Maxwell Poisson, or Poisson-Inverse Gaussian (PIG) models are used instead of the Poisson model when count data are over-dispersed or under-dispersed.
Heavy-tailed count data arise in many fields, including actuarial science, biology, computer science, electronic engineering, and medical research. These data sets often contain a small number of extremely large integer values, far from the majority of values close to zero. For such data sets, practitioners aim to consider a model that handles both the main body of the data and the heavy tail. The PIG model is one of the best choices in such cases, as it has a wider range of skewness than the negative binomial and is also frequently considered for heavy-tailed count data [4].
A common assumption of the linear regression model (LRM) is that the explanatory variables must be uncorrelated. However, in practice the explanatory variables often have strong or nearly strong linear relations. In that scenario, the independence assumption of the explanatory variables is violated, leading to the multicollinearity problem [5] and an estimation problem in the PIG model. Typically, the maximum likelihood estimator (MLE) is employed to estimate the unknown regression coefficients of the PIG model. Multicollinearity, however, inflates the MLE's variance and can lead to inaccurate parameter estimates [6]. To counteract the unfavorable impacts of multicollinearity, we therefore require alternative estimation techniques. These alternative methods include the Stein estimator, the ridge estimator, the Liu estimator, and others.
To avoid the impact of multicollinearity, Stein [7] suggested an estimator for the LRM known as the James-Stein estimator (JSE). There are a few studies on the JSE for the GLM; for example, Schaefer [8] considered the JSE for the logistic regression model. Further, a JSE for the inverse Gaussian response model was studied by Akram et al. [9], and recently, Amin et al. [10] suggested a JSE for the Poisson regression.
The ridge estimator (RE) was introduced by Hoerl and Kennard [11] as an alternative to the ordinary least squares estimator and the JSE for the LRM to obtain more efficient estimates with correlated explanatory variables. The RE has also been considered for various statistical models. It was first presented for the Poisson regression model by Månsson and Shukur [12]. To mitigate the impact of multicollinearity on estimation, the RE approach has also been extended to various count data models. For instance, Månsson [13] proposed the RE for the negative binomial regression model, Türkan and Özel [14] proposed the modified jackknifed RE for the Poisson model, Kaçıranlar and Dawoud [15] introduced some ridge parameters for the Poisson model, Zaldivar [16] studied the performance of some REs for the Poisson model, and Rashad and Algamal [17] developed a new RE; the restricted RE for the Poisson model was given by Yehia [18]. Liu [19] discussed some limitations of the RE and proposed another alternative estimator, known as the Liu estimator (LE), to overcome the effect of multicollinearity. Because the LE is a linear function of the Liu parameter d, it has an advantage over the RE. The LE has also been considered in the literature for different statistical models. New Liu parameter estimators for the LE in the Poisson model were introduced by Månsson et al. [20]. Månsson [21] additionally extended the LE to the negative binomial model. Qasim et al. [22] investigated the LE for the gamma regression model. Recently, the LE was considered for the Conway-Maxwell Poisson model by Akram et al. [23]. In contrast to the standard LE, a different estimator known as the modified one-parameter LE addresses multicollinearity more efficiently than the LE. One of the drawbacks of Liu's [19] shrinkage parameter d is that it frequently produces a negative value, which significantly impacts the estimator's performance [24]. A modified LE for the LRM was proposed by Lukman et al. [24]. This estimator was also considered by Amin et al. [25,10] for the inverse Gaussian and Poisson regression models, respectively. Recently, the modified one-parameter LE for the Conway-Maxwell Poisson model was presented by Sami et al. [26]. Various other models and estimators have also been studied, such as prediction modeling using deep learning for the classification of grape-type dried fruits [27], gender determination from periocular images using a deep learning-based EfficientNet architecture [28], and revenue forecast models using hybrid intelligent methods [29]. Motivated by the above-stated studies, this study considers some alternative estimators for the estimation of the PIG model parameters to reduce the effect of multicollinearity. To evaluate the performance of the considered estimators, a Monte Carlo simulation experiment is performed under a variety of parametric conditions, including the sample size, the number of explanatory variables, the dispersion parameter, and different levels of correlation among the explanatory variables. Further, a real-life application related to mussel data is also considered to highlight the importance of the stated proposal.
The remainder of the study is structured as follows. Section 2 gives the definition and estimation of the PIG model and the alternative estimators with their respective MSEs. Section 3 describes the simulation's structure and results under different parametric conditions. An empirical study is considered in Section 4 to discuss the performance of the alternative estimators. Lastly, brief concluding remarks are given in Section 5.

The PIG Regression Model
Holla [30] proposed the PIG distribution as a mixture of the Poisson and inverse Gaussian distributions. Let y|w follow a Poisson distribution with mean λw, and let w follow an inverse Gaussian distribution with mean 1 and dispersion parameter τ; the marginal probability mass function of y is then the PIG distribution, whose parameters λ and τ take non-negative values. The log-likelihood is often used to obtain the MLE of the parameters easily; let ℓ(λ, τ) denote the log-likelihood of the model. For the regression model, let μ_i = exp(x_i′β), i = 1, 2, …, n, where X is the n × q matrix of covariates with q = p + 1, and β is the q × 1 vector of regression coefficients.
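As a numerical illustration of this mixture construction (a minimal sketch; the variable names and parameter values are illustrative, not taken from the paper), draws from the PIG distribution can be obtained by sampling w from an inverse Gaussian with mean 1 and variance τ and then y from a Poisson with mean λw:

```python
import numpy as np

rng = np.random.default_rng(42)
n, lam, tau = 200_000, 2.0, 1.0

# numpy's wald(mean, scale) draws inverse Gaussian variates with
# variance mean^3 / scale; mean 1 and scale 1/tau give Var(w) = tau
w = rng.wald(1.0, 1.0 / tau, size=n)

# y | w ~ Poisson(lam * w), so marginally y follows PIG(lam, tau)
y = rng.poisson(lam * w)

# the mixture keeps E[y] = lam but inflates the variance above the
# mean, i.e. the simulated counts are over-dispersed
```

With these settings the sample mean of y stays near λ = 2 while the sample variance is noticeably larger, which is exactly the over-dispersion the PIG model is designed to capture.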
Our interest is to estimate the PIG model, i.e., to find β. Taking the partial derivative of the log-likelihood in Eq. (2) with respect to β and equating it to zero yields the score function in Eq. (3). The model's unknown parameters are typically estimated using the MLE.
As the PIG model is non-linear in β, the MLE has no closed-form solution, so the model requires the iteratively reweighted least squares (IRLS) technique. At the final iteration, the MLE is defined as β̂_MLE = (X′ŴX)^{-1} X′Ŵẑ, where ẑ is the vector of the adjusted response variable, Ŵ = diag(μ̂_i + τ̂μ̂_i³) is the matrix of weights, and τ̂ is the estimated dispersion parameter. To find the matrix mean squared error (MMSE) and scalar MSE (SMSE) of the estimators, write S = X′ŴX = QΛQ′, where Q is the orthogonal matrix whose columns are the eigenvectors of S, Λ = diag(λ_1, λ_2, …, λ_q) with λ_1 ≥ λ_2 ≥ … ≥ λ_q > 0 the eigenvalues of S, and α̂ = Q′β̂, with α_j, for j = 1, …, q, the jth element of Q′β̂. The MMSE of the MLE therefore becomes MMSE(β̂_MLE) = QΛ^{-1}Q′, and the SMSE of β̂_MLE for the PIG model is SMSE(β̂_MLE) = Σ_{j=1}^{q} 1/λ_j, where λ_j is the jth eigenvalue of the S matrix.
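The final-iteration formula and the spectral decomposition S = QΛQ′ can be sketched with plain linear algebra. In this sketch the design matrix, coefficients, weights, and adjusted response are synthetic stand-ins (not the mussel data), and the weight form μ + τμ³ follows the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n, q = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, q - 1))])
mu = np.exp(X @ np.array([0.5, 0.2, -0.3]))
tau = 1.5

# IRLS weights for the PIG model (form as stated in the text) and a
# synthetic adjusted response z standing in for the final iteration
W = np.diag(mu + tau * mu**3)
z = np.log(mu) + rng.normal(scale=0.1, size=n)

# final-iteration weighted least squares: beta = (X'WX)^{-1} X'Wz
S = X.T @ W @ X
beta_mle = np.linalg.solve(S, X.T @ W @ z)

# spectral decomposition S = Q diag(lam_eig) Q'
lam_eig, Q = np.linalg.eigh(S)
```

Since W is positive definite and X has full column rank, S is positive definite, so all eigenvalues λ_j are strictly positive and the decomposition reconstructs S exactly.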

PIG Stein Estimator
To overcome the effects of multicollinearity, Stein [7] suggested the JSE as an alternative to the OLS estimator in the LRM. Schaefer [8] generalized Stein's work [7] and proposed the JSE as one of the alternative estimators for the logistic regression model. Akram et al. [9] proposed the JSE for the inverse Gaussian regression model, and recently, Amin et al. [10] suggested the JSE for the count regression model. To overcome the effect of collinearity in the PIG model, we also consider the JSE, named the PIG Stein estimator (PIGSE), defined as β̂_c = c β̂_MLE, where c (0 < c < 1) is the Stein parameter. The bias and variance of the proposed PIGSE are Bias(β̂_c) = (c − 1)β and Var(β̂_c) = c² QΛ^{-1}Q′. Using Eq. (8) and Eq. (9), the SMSE of the PIGSE after simplification becomes SMSE(β̂_c) = c² Σ_{j=1}^{q} 1/λ_j + (1 − c)² β′β.
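The PIGSE's bias-variance trade-off can be illustrated numerically. The eigenvalues and coefficient vector below are invented for illustration, and unit dispersion is assumed in the SMSE expression:

```python
import numpy as np

# Stein shrinkage of the MLE: beta_c = c * beta_mle with 0 < c < 1.
# SMSE(c) = c^2 * sum(1/lambda_j) + (1 - c)^2 * beta'beta
lam = np.array([50.0, 10.0, 0.05])   # illustrative eigenvalues, one near zero
beta = np.array([0.6, 0.5, 0.4])     # illustrative true coefficients

def smse_stein(c):
    return c**2 * np.sum(1.0 / lam) + (1 - c)**2 * beta @ beta

# the SMSE is a quadratic in c; its minimizer is
# c* = beta'beta / (beta'beta + sum(1/lambda_j))
c_opt = (beta @ beta) / (beta @ beta + np.sum(1.0 / lam))
```

Because one eigenvalue is close to zero (severe collinearity), Σ 1/λ_j is large, the optimal c* is pulled well below 1, and the shrunken estimator has a much smaller SMSE than the MLE (which corresponds to c = 1).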

PIG Ridge Estimator
The performance of the MLE in the presence of multicollinearity is inferior because multicollinearity dramatically affects the regression estimates. The ridge estimation method is most often used to overcome the collinearity effect. Hoerl and Kennard [11,32] introduced the concept of the RE, where the optimal value of k contributes significantly to obtaining better estimates. Different studies have recommended different strategies for choosing the shrinkage parameter of the RE. To overcome the impact of multicollinearity and increase the estimator's effectiveness, literature on various ridge parameters for the RE in the LRM and the GLM is available [33,10,26]. The RE for the PIG model, named the PIG ridge regression estimator (PIGRRE), is given by β̂_k = (S + kI_q)^{-1} S β̂_MLE, where k > 0 is the ridge parameter, I_q is the identity matrix of order q × q, and S = X′ŴX. The bias and covariance matrix of the ridge estimator are respectively computed as Bias(β̂_k) = −k QΛ_k^{-1} α and Cov(β̂_k) = QΛ_k^{-1} Λ Λ_k^{-1} Q′, where Λ_k = Λ + kI_q. The MMSE of β̂_k can then be written as MMSE(β̂_k) = Cov(β̂_k) + Bias(β̂_k)Bias(β̂_k)′.
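A minimal numerical sketch of the ridge form (S + kI)^{-1} S β̂_MLE, using a synthetic ill-conditioned S rather than a fitted PIG model (all values illustrative):

```python
import numpy as np

# build a near-singular S = Q diag(lambda) Q' with one tiny eigenvalue,
# mimicking a collinear design
Q, _ = np.linalg.qr(np.random.default_rng(7).normal(size=(3, 3)))
S = Q @ np.diag([50.0, 10.0, 0.05]) @ Q.T
beta_mle = np.array([1.2, -0.8, 0.5])   # illustrative MLE vector

def ridge(k):
    """Ridge estimator beta_k = (S + k I)^{-1} S beta_mle."""
    q = S.shape[0]
    return np.linalg.solve(S + k * np.eye(q), S @ beta_mle)

beta_k = ridge(0.5)
```

In the eigenbasis each component of β̂_MLE is multiplied by λ_j/(λ_j + k) < 1, so the ridge estimator always has a smaller norm than the MLE, and at k = 0 it reduces to the MLE exactly.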

PIG Liu Estimator
Linearity in the shrinkage parameter is the primary motivation for employing the Liu estimation method. Moreover, this estimator can perform better than the RE [34]; according to the findings of several investigations, the Liu estimation method is more effective than the ridge at reducing the impact of collinearity. The PIG Liu estimator (PIGLE) is defined as β̂_d = F_d β̂_MLE, where F_d = (X′ŴX + I_q)^{-1}(X′ŴX + dI_q) and d (0 < d < 1) is the Liu parameter. The bias and covariance matrix of the PIGLE are respectively Bias(β̂_d) = (d − 1)(S + I_q)^{-1} β and Cov(β̂_d) = QΛ_1^{-1} Λ_d Λ^{-1} Λ_d Λ_1^{-1} Q′, where Λ_d = diag(λ_1 + d, λ_2 + d, …, λ_q + d) and Λ_1 = diag(λ_1 + 1, λ_2 + 1, …, λ_q + 1). The MMSE of the PIGLE is MMSE(β̂_d) = Cov(β̂_d) + Bias(β̂_d)Bias(β̂_d)′, and finally the SMSE of the PIGLE can be defined as SMSE(β̂_d) = Σ_{j=1}^{q} (λ_j + d)²/(λ_j(λ_j + 1)²) + (1 − d)² Σ_{j=1}^{q} α_j²/(λ_j + 1)².
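The defining property of the Liu estimator, its linearity in d, is easy to verify numerically on a synthetic S (illustrative values only, not a fitted PIG model):

```python
import numpy as np

# synthetic ill-conditioned S and MLE vector, for illustration only
Q, _ = np.linalg.qr(np.random.default_rng(3).normal(size=(3, 3)))
S = Q @ np.diag([50.0, 10.0, 0.05]) @ Q.T
beta_mle = np.array([1.2, -0.8, 0.5])
I = np.eye(3)

def liu(d):
    """Liu estimator beta_d = (S + I)^{-1} (S + d I) beta_mle."""
    return np.linalg.solve(S + I, (S + d * I) @ beta_mle)
```

Two consequences of the form above: at d = 1 the Liu estimator reduces to the MLE, and because β̂_d is affine in d, the estimate at d = 0.5 is exactly the midpoint of the estimates at d = 0 and d = 1.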

PIG Improved Liu Estimator
The shrinkage parameter of the LE, as given by Liu [19], has the drawback of frequently returning a negative value, which impairs the effectiveness of the estimator. Lukman et al. [24] introduced the improved LE for the LRM to overcome this limitation. The PIG improved LE (PIGILE) is defined as β̂_{d*} = (S + I_q)^{-1}(S − d* I_q) β̂_MLE, where d* is the shrinkage parameter of the PIGILE. This modification provides a substantial improvement in the performance of the PIGILE and guarantees a positive value of the shrinkage parameter d*.
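A sketch contrasting the PIGILE with the ordinary Liu form on the same synthetic S (values illustrative, not from the paper); replacing S + dI by S − dI shrinks each eigen-component further for any d > 0:

```python
import numpy as np

# synthetic ill-conditioned S and MLE vector, for illustration only
Q, _ = np.linalg.qr(np.random.default_rng(5).normal(size=(3, 3)))
S = Q @ np.diag([50.0, 10.0, 0.05]) @ Q.T
beta_mle = np.array([1.2, -0.8, 0.5])
I = np.eye(3)

def liu(d):
    """Ordinary Liu form: (S + I)^{-1} (S + d I) beta_mle."""
    return np.linalg.solve(S + I, (S + d * I) @ beta_mle)

def improved_liu(d):
    """Modified one-parameter Liu form: (S + I)^{-1} (S - d I) beta_mle."""
    return np.linalg.solve(S + I, (S - d * I) @ beta_mle)
```

In the eigenbasis the Liu factor is (λ_j + d)/(λ_j + 1) while the improved-Liu factor is (λ_j − d)/(λ_j + 1); since |λ_j − d| < λ_j + d for positive λ_j and d, the PIGILE applies strictly stronger shrinkage, and the two forms coincide at d = 0.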

Selection of the Biasing Parameters
Every biased estimator depends on its biasing parameter, and the choice of biasing parameter plays a significant role in obtaining more accurate and reliable regression estimates under multicollinearity. Hoerl and Kennard [11] proposed a biasing parameter for the RE in the LRM. Later on, several others proposed different biasing parameters for different biased estimators in the LRM and the GLM [33,22,10,26,35]. Based on the available studies, we first suggest the Stein parameters for the PIGSE.

Simulation Layout
In this section, we describe the Monte Carlo simulation design used to compare the performance of the different alternative estimators. To conduct a valuable simulation, we need to specify the effective factors of the estimators and the performance criteria. The effective factors in this simulation are the degree of correlation ρ among the explanatory variables, the number of explanatory variables p, the assumed dispersion parameter τ = 2, 4, 6, and 8, and the sample size n = 50, 100, 150, and 200. The MSE of the estimators is chosen as the performance evaluation criterion. To generate the explanatory variables, we use the following commonly used scheme described by Kibria [33] and others: x_ij = (1 − ρ²)^{1/2} z_ij + ρ z_{i(p+1)}, i = 1, 2, …, n, j = 1, 2, …, p, where the z_ij are independent standard normal pseudo-random numbers. The response variable of the PIG model is obtained from the PIG(μ_i, τ) distribution, where μ_i = exp(β_0 + β_1 x_{i1} + ⋯ + β_p x_{ip}).
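Kibria's generator induces a common pairwise correlation of approximately ρ² between the explanatory variables, which the following sketch checks (variable names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, rho = 100_000, 3, 0.95

# x_ij = sqrt(1 - rho^2) * z_ij + rho * z_{i,p+1}  (Kibria's scheme)
z = rng.normal(size=(n, p + 1))
X = np.sqrt(1 - rho**2) * z[:, :p] + rho * z[:, [p]]

# off-diagonal entries concentrate around rho^2 = 0.9025
corr = np.corrcoef(X, rowvar=False)
```

The shared component z_{i,p+1} is what drives every pair of columns toward the same correlation ρ², so raising ρ toward 1 produces increasingly severe multicollinearity in the simulated design.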
The regression parameters are chosen such that Σ_{j=1}^{p} β_j² = 1, which is a commonly used restriction [25].
We consider different degrees of correlation ρ = 0.80, 0.90, 0.95, and 0.99, and the numbers of explanatory variables are chosen as p = 3, 6, and 12. For each combination of n, p, ρ, and τ, a program written in R computes the estimated MSE (EMSE) over R = 1000 replications, defined as EMSE(β̂) = (1/R) Σ_{r=1}^{R} (β̂_r − β)′(β̂_r − β).

Results and Discussion
For different values of n, p, ρ, and τ, the EMSEs of the considered estimators are presented in Tables 1-8 for p = 3 and 6. Moreover, some results are also shown in Figures 1-4 for p = 12. The performance of the alternative estimators is evaluated with respect to specific factors, namely different sample sizes, dispersion parameters, degrees of correlation among the explanatory variables, and numbers of explanatory variables. The summary of the simulation study is as follows: 1) In the initial step, we study the effect of the sample size on the MSE. It is observed from Tables 1-8 that the sample size has an inverse effect on the estimated MSEs of the estimators: when the sample size increased from 50 to 200, the MSE values decreased both for the MLE and the considered alternative estimators.
2) The second factor that may influence the performance of the alternative estimators is the degree of correlation among the explanatory variables. The simulated results show that multicollinearity has a direct impact on the estimated MSEs of the estimators. From Tables 1-8, one can see that the MSE increased with the degree of correlation. In this case, the MSEs of all the considered alternative estimators remained lower than that of the MLE.
3) The third factor that may influence the MSEs of the estimators is the number of explanatory variables in the model. To check this effect, we consider p = 3, 6, and 12 while keeping all other factors constant, including the sample size, collinearity level, and dispersion level. The simulation findings indicate that when we increase the number of explanatory variables with all other factors fixed, the MSE increases for the MLE, the PIGSE, the PIGRRE, the PIGLE, and the PIGILE with their different parameter estimators. Thus, the simulated MSEs of the considered estimators are directly related to the number of explanatory variables.
4) The MSEs of the MLE, the PIGSE, the PIGRRE, the PIGLE, and the PIGILE, along with their different parameter estimators, are also affected by the value of the dispersion parameter. To check the effect of the dispersion level on the estimated MSEs, we consider τ = 2, 4, 6, and 8 while keeping all other factors constant, including the sample size, collinearity level, and number of explanatory variables. The findings demonstrate that the estimated MSEs of the studied estimators rise when the dispersion level is raised with all other factors constant. Hence, the dispersion level directly impacts the simulated MSEs (see Figure 3).
5) For all parametric conditions, the same results can be observed for p = 12; these simulation results are shown in Figures 1-4.
6) On comparing the performance of the considered alternative estimators, it is observed that the PIGRRE with biasing parameter k₂ mostly performs better than the PIGSE, PIGLE, and PIGILE. For smaller p, larger dispersion, and very high multicollinearity, the PIGILE performs better than the other alternative estimators. The second-best estimator is the PIGSE with its second biasing parameter, as compared to the MLE, PIGLE, and PIGILE.

Application: Mussels Data
We use the mussels data application to assess the considered estimators' performance practically. This data set is taken from Sepkoski and Rex [36]. Details about the data set are given in Table 9. As the response variable is in the form of counts, we consider this application for the performance evaluation of the PIGSE, PIGRRE, PIGLE, and PIGILE. Among the many count models, a suitable one is identified using the Akaike information criterion (AIC) as the model selection criterion. The AIC values of the fitted count data models are Poisson (311.55), negative binomial (275.99), COMP (313.55), and PIG (275.35). The PIG model has the minimum AIC value among these fitted models, which shows that the PIG model fits the mussels data well. The estimated dispersion parameter was found to be τ̂ = 2.54, which indicates that the data are over-dispersed. As there are 10 explanatory variables, there may be a chance of multicollinearity. The problem of multicollinearity is tested using the condition index, CI = √(λ_max/λ_min). The computed CI is 85, which implies that the explanatory variables reveal severe multicollinearity. Table 10 indicates the estimated coefficients of the MLE, the PIGSE, PIGRRE, PIGLE, and the PIGILE, which are computed using Eqs. (4), (7), (12), (18), and (23), respectively. Meanwhile, the MSEs of the MLE, the PIGSE, PIGRRE, PIGLE, and PIGILE are calculated using Eqs. (6), (11), (17), (22), and (26), respectively. The results in Table 10 show that the MLE is not a reliable estimation method in the presence of multicollinearity. On the contrary, the competitive estimators are considerably more efficient than the traditional MLE. On comparing the performance of the alternative biased estimators, it is found that the PIGRRE performs better than the other alternative biased estimators, and the PIGSE is the second best, ahead of the PIGLE and PIGILE. These results support the simulation findings. Hence, the mussels data findings show that the PIGRRE is the best alternative biased estimator to the PIGSE, PIGLE, and PIGILE in dealing with multicollinearity.
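The condition-index diagnostic can be reproduced in a few lines; the collinear design below is synthetic, not the mussels data:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly a copy of x1
X = np.column_stack([x1, x2])

# CI = sqrt(lambda_max / lambda_min) of X'X; values above roughly 30
# are commonly read as severe multicollinearity
eigvals = np.linalg.eigvalsh(X.T @ X)
ci = np.sqrt(eigvals.max() / eigvals.min())
```

Because the two columns are almost identical, the smallest eigenvalue of X′X collapses toward zero and the condition index blows up well past the conventional severity threshold.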

Concluding Remarks
The PIG model is one of several count data models that can address the over-dispersion problem. The MLE, based on an iterative method, is used to estimate the coefficients of the PIG model. If the explanatory variables in the PIG model are highly correlated, the MLE will not produce accurate results. To reduce the effect of correlation among the explanatory variables, we considered some alternative estimators to the MLE for better estimation: the PIGSE, PIGRRE, PIGLE, and PIGILE. The main purpose of this study is to compare the alternative estimators' performance with correlated explanatory variables. This study also proposed some biasing parameters for the PIGSE, and for the PIGRRE the best biasing parameter suggested by Batool et al. [35] is considered. The biasing parameters for the PIGLE and PIGILE are taken from the initial work of Liu [19] and [6]. We considered a simulation study and a real application for comparison purposes, with the MSE as the performance evaluation criterion. The simulation findings showed that the dispersion, multicollinearity, and number of explanatory variables directly affect the performance of the PIG model's estimators, while the sample size affects the performance inversely. The simulation findings also showed that for fewer explanatory variables with very high multicollinearity and larger dispersion, the PIGILE is a better alternative than the other alternative estimators. Overall, we found from the simulation results that the PIGRRE performs better than the PIGSE, PIGLE, and PIGILE, and the real application results support the simulation findings. So, based on the simulation and application results, we suggest using the PIGRRE instead of the other alternative estimators and the MLE to deal with multicollinearity.

Table 4. EMSEs of the PIGM Estimators for p = 3 and τ = 2

Table 5. EMSEs of the PIGM Estimators for p = 3 and τ = 2

Table 9. Variables, Notations, and Descriptions of the Mussels Data

Table 10. PIG Model's Estimates and MSEs of the Considered Estimators
