Introduction

The Poisson Regression Model (PRM) is one of the benchmark models for count data, in much the same way as the normal linear regression model is the benchmark for continuous data1. In the PRM, the response variable \(y_{i}\) follows a Poisson distribution with mean \(\mu_{i}\), and its probability function is defined as

$$f\left( {y_{i} } \right) = \frac{{e^{{ - \mu_{i} }} \mu_{i}^{{y_{i} }} }}{{y_{i} !}}, \quad i = 1,2, \ldots ,n, y_{i} = 0,1,2, \ldots$$
(1)

where \(\mu_{i}\) is expressed using the canonical log link function and a linear combination of explanatory variables as \(\mu_{i} = \exp \left( {x^{\prime}_{i} \beta } \right)\), where \(x^{\prime}_{i}\) is the ith row of X, an \(n \times \left( {p + 1} \right)\) data matrix with p explanatory variables, and \(\beta\) is a \(\left( {p + 1} \right) \times 1\) vector of coefficients.

The Maximum Likelihood method is the standard technique for estimating the model parameters in PRMs2. The log-likelihood function for the PRM is given as follows

$$l(\beta ) = \sum\limits_{i = 1}^{n} {y_{i} x^{\prime}_{i} \beta - \exp \left( {x^{\prime}_{i} \beta } \right) - \log \left( {y_{i} !} \right).}$$
(2)

The Maximum Likelihood Estimator (MLE) of \(\beta\) is obtained by maximizing the log-likelihood function, which yields the score equations

$$S(\beta ) = \frac{\partial l(\beta ;y)}{{\partial \beta }} = \sum\limits_{i = 1}^{n} {\left[ {y_{i} - \exp \left( {x^{\prime}_{i} \beta } \right)} \right]} x_{i} = 0.$$
(3)

Since Eq. (3) is a nonlinear function of the parameter \(\beta\), the solution of \(S\left( \beta \right) = 0\) is obtained using the following iteratively reweighted least squares (IRLS) algorithm

$$\hat{\beta }_{MLE} = \left( {X^{\prime}\hat{W}X} \right)^{ - 1} X^{\prime}\hat{W}Z,$$
(4)

where Z is an n-dimensional vector with the ith element \(z_{i} = \log \left( {\hat{\mu }_{i} } \right) + \frac{{y_{i} - \hat{\mu }_{i} }}{{\hat{\mu }_{i} }}\) and \(\hat{W} = {\text{diag}} \left[ {\hat{\mu }_{i} } \right]\)3. The iteration ends when the difference between the old and updated values is less than a given small value, which is usually \(10^{ - 8}\)4. The asymptotic variance–covariance matrix of \(\hat{\beta }_{MLE}\) is \(cov\left( {\hat{\beta }{}_{MLE}} \right) = \left( {X^{\prime}\hat{W}X} \right)^{ - 1} .\)
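As an illustration, a minimal R sketch of this IRLS scheme might look as follows (all variable names are ours, and the simulated data are purely illustrative):

```r
# Minimal IRLS sketch for the Poisson MLE of Eq. (4).
# X: n x (p+1) design matrix with an intercept column; y: count response.
set.seed(1)
n <- 100; p <- 2
X <- cbind(1, matrix(rnorm(n * p), n, p))
y <- rpois(n, exp(X %*% c(0.5, 0.3, -0.2)))

beta <- rep(0, ncol(X))                        # starting values
repeat {
  mu <- as.vector(exp(X %*% beta))             # current means mu-hat_i
  W  <- diag(mu)                               # W-hat = diag(mu-hat_i)
  z  <- log(mu) + (y - mu) / mu                # working response Z
  beta_new <- as.vector(solve(t(X) %*% W %*% X, t(X) %*% W %*% z))
  if (max(abs(beta_new - beta)) < 1e-8) break  # the usual 10^-8 tolerance
  beta <- beta_new
}
beta_mle <- beta_new
# Cross-check against the built-in fit:
# coef(glm(y ~ X - 1, family = poisson))
```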

Although the MLE is widely used, one of its major disadvantages is that the parameter estimates become unstable in the presence of multicollinearity5,6,7,8,9,10,11,12,13. The multicollinearity problem, which arises from an approximately linear relationship between the explanatory variables, affects the estimates of the model parameters in PRMs just as in linear regression models. One effect of multicollinearity between the explanatory variables is that the variance of the MLE becomes so large that the estimates of the model parameters become unstable14,15,16,17,18,19,20.

In order to reduce the undesirable effects of multicollinearity, biased estimators that serve as alternatives to the MLE have been generalized from their counterparts in the linear regression model. For example, Månsson and Shukur18 proposed the Poisson Ridge Estimator (PRE) as follows:

$$\hat{\beta }_{PRE} = \left( {X^{\prime}\hat{W}X + kI} \right)^{ - 1} X^{\prime}\hat{W}X\hat{\beta }_{MLE} {, }\quad k > 0,$$
(5)

where \(k\) is a biasing parameter. The PRE is the generalization of the Ridge estimator introduced by Hoerl and Kennard21 for the linear regression model.
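Continuing from the IRLS sketch above, the PRE can be computed in a few lines (the value of k here is arbitrary, chosen only for illustration):

```r
# Poisson Ridge Estimator of Eq. (5); reuses X and beta_mle from the
# IRLS sketch above.
k    <- 0.5                                    # biasing parameter, k > 0
W    <- diag(as.vector(exp(X %*% beta_mle)))   # W-hat evaluated at the MLE
XtWX <- t(X) %*% W %*% X
Ip   <- diag(ncol(X))
beta_pre <- as.vector(solve(XtWX + k * Ip, XtWX %*% beta_mle))
```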

Månsson et al.19, Amin et al.22 and Qasim et al.23 defined the Poisson Liu Estimator (PLE) as follows:

$$\hat{\beta }_{PLE} = \left( {X^{\prime}\hat{W}X + I} \right)^{ - 1} \left( {X^{\prime}\hat{W}X + dI} \right)\hat{\beta }_{MLE} ,$$
(6)

where \(0 < d < 1\) is a biasing parameter. The PLE is the generalization of the Liu estimator introduced by Liu24 for the linear regression model.
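The PLE is equally direct, reusing XtWX, Ip and beta_mle from the sketch above (again with an arbitrary illustrative d):

```r
# Poisson Liu Estimator of Eq. (6).
d <- 0.5                                       # biasing parameter, 0 < d < 1
beta_ple <- as.vector(solve(XtWX + Ip, (XtWX + d * Ip) %*% beta_mle))
```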

In recent years, estimators with two biasing parameters have been proposed as alternatives to the PRE and PLE. Such estimators, obtained by combining several existing estimators, aim to provide more suitable parameter estimates. In this context, Algamal25 defined the Poisson Liu-type estimator (PLTE) for the PRMs as follows:

$$\hat{\beta }_{PLTE} = \left( {X^{\prime}\hat{W}X + kI} \right)^{ - 1} \left( {X^{\prime}\hat{W}X - dI} \right)\hat{\beta }_{MLE} ,$$
(7)

where \(k{ > 0}\) and \(d \in R\) are the biasing parameters. The PLTE is a generalization of the Liu-type estimator, which was first introduced by Liu26, and is based on the two biasing parameters \(k\) and \(d\).

Moreover, Asar and Genç15 and Çetinkaya and Kaçıranlar16 adapted to PRMs another biased estimator with two biasing parameters, originally defined by Özkale and Kaçıranlar27 for linear regression models. The Poisson two-parameter Estimator (PTPE) is defined as:

$$\hat{\beta }_{PTPE} = \left( {X^{\prime}\hat{W}X + kI} \right)^{ - 1} \left( {X^{\prime}\hat{W}X + kdI} \right)\hat{\beta }_{MLE} ,$$
(8)

where \(k{ > 0}\) and \(0{ < }d{ < 1}\) are the biasing parameters.

As an alternative to the estimators introduced so far, Akay and Ertan5 proposed a general Improved Liu-type Estimator (ILTE), which includes the MLE, PRE, PLE, PLTE and PTPE as special cases:

$$\hat{\beta }_{ILTE} = \left( {X^{\prime}\hat{W}X + kI} \right)^{ - 1} \left( {X^{\prime}\hat{W}X + f\left( k \right)I} \right)\hat{\beta }^{*} , \quad k > 0,$$
(9)

where \(\hat{\beta }^{*}\) is any estimator of \(\beta\) and \(f\left( k \right)\) is a continuous function of the biasing parameter k. The estimator given in (9) is a generalization of the Liu-type estimator proposed by Kurnaz and Akay28 for linear regression models.
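Because (9) covers a family of estimators through the choices of \(\hat{\beta }^{*}\) and \(f\left( k \right)\), it is natural to code it once as a generic function; a hedged sketch, reusing XtWX and beta_mle from the sketches above:

```r
# Generic ILTE of Eq. (9): beta_star is any pilot estimator and f any
# continuous function of the biasing parameter k.
ilte <- function(XtWX, beta_star, k, f) {
  Ip <- diag(nrow(XtWX))
  as.vector(solve(XtWX + k * Ip, (XtWX + f(k) * Ip) %*% beta_star))
}
# f(k) = k returns beta_star unchanged, while f(k) = 0 with
# beta_star = beta_mle reproduces the PRE of Eq. (5):
ilte(XtWX, beta_mle, k = 0.5, f = function(k) 0)
```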

In the literature, many estimators proposed for linear regression models can be generalized to PRMs. For example, Yang and Chang29 proposed an estimator based on the Ridge estimator in linear regression models. This biased estimator was adapted to the PRMs by Asar and Genç15 and applied to Negative Binomial regression models by Huang and Yang30. Based on the PRE, the estimator given by Huang and Yang30 is as follows:

$$\hat{\beta }_{PHY} \left( {k,d} \right) = \left( {X^{\prime}\hat{W}X + I} \right)^{ - 1} \left( {X^{\prime}\hat{W}X + dI} \right)\left( {X^{\prime}\hat{W}X + kI} \right)^{ - 1} X^{\prime}\hat{W}X\hat{\beta }_{MLE} ,\quad k > 0, 0 < d < 1,$$
(10)

where k and d are two biasing parameters. Although the estimator given in (10) is based on the PRE, it is a general estimator that also includes the MLE, PRE, and PLE as special cases.

From this point of view, Sakallıoğlu and Kaçıranlar31 proposed another estimator based on the Ridge estimator in linear regression models, which is defined as:

$$\hat{\beta }_{SK} \left( {k,d} \right) = \left( {X^{\prime}X + I} \right)^{ - 1} \left( {X^{\prime}X + \left( {k + d} \right)I} \right)\hat{\beta }_{RE} { ,}\quad k > 0, - \infty < d < \infty ,$$
(11)

where k and d are two biasing parameters and \(\hat{\beta }_{RE} = \left( {X^{\prime}X + kI} \right)^{ - 1} X^{\prime}Y\). In this context, the estimator in (11) can be generalized to PRMs. Based on the PRE, we generalize the estimator of Sakallıoğlu and Kaçıranlar31 given in (11) as follows:

$$\hat{\beta }_{PSK} \left( {k,d} \right) = \left( {X^{\prime}\hat{W}X + I} \right)^{ - 1} \left( {X^{\prime}\hat{W}X + \left( {k + d} \right)I} \right)\hat{\beta }_{PRE} { ,}\quad k > 0, - \infty < d < \infty ,$$
(12)

where k and d are two biasing parameters. In this case, the estimator given in (12) is a general estimator which includes the MLE, PRE and PLE as special cases. To the best of our knowledge, no study has investigated the estimator in (12) for the PRMs.
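A sketch of the proposed estimator (12) in R makes its structure explicit: a Liu-type filter with parameter \(k + d\) applied to the PRE (reusing XtWX and beta_mle from above):

```r
# Proposed PSK estimator of Eq. (12).
psk <- function(XtWX, beta_mle, k, d) {
  Ip <- diag(nrow(XtWX))
  beta_pre <- solve(XtWX + k * Ip, XtWX %*% beta_mle)   # PRE of Eq. (5)
  as.vector(solve(XtWX + Ip, (XtWX + (k + d) * Ip) %*% beta_pre))
}
```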

In PRMs, it is known that the performance of biased estimators proposed as alternatives to the MLE is generally affected by the value of the biasing parameter. In general, the methods used for estimating the biasing parameters have been adapted from those used in linear regression models. The use of estimators with two biasing parameters has become increasingly widespread in recent years. However, one of the most important problems for estimators with two biasing parameters is that finding optimal estimates of both parameters is difficult. For this purpose, many iterative techniques have been proposed to estimate these biasing parameters. In these techniques, one of the biasing parameters is estimated depending on the other, or vice versa15,16,30. Thus, the idea arises that an unknown functional relationship may exist between these two biasing parameters.

Based on the considerations above, our aim in this article is to introduce a new general class of estimators that arises when there is a functional relationship between the biasing parameters. The proposed general estimator can be defined to specifically include the estimators given by (4), (5), (6), (10) and (12), and thus constitutes a general class of estimators like the one given in (9). It is a more efficient alternative to the estimator defined in (9) for overcoming multicollinearity in the PRMs. Another purpose of this article is to compare these two classes of estimators in a simulation study under various conditions.

The remainder of the article is organized as follows: In "A new general biased estimator", a new biased estimator is defined and some of its properties are given. The superiority of this estimator over the other biased estimators in the matrix mean square error sense is shown in "The superiority of the PRTE in PRMs". In "Determination of \(g\left( k \right)\) function", several rules are proposed to determine the relationship between the biasing parameters. Two separate Monte Carlo simulation studies are executed in "The Monte Carlo simulation studies". In "Numerical example: the aircraft damage data", a real numerical example is provided to evaluate the performances of the proposed biased estimators. Some concluding remarks are given in "Some concluding remarks".

A new general biased estimator

For PRMs, we can define a new general class of estimators, based on the PRE and including the estimators (4), (5), (6), (10) and (12), as follows:

$$\hat{\beta }_{PRTE} = \left( {X^{\prime}\hat{W}X + I} \right)^{ - 1} \left( {X^{\prime}\hat{W}X + g\left( k \right)I} \right)\hat{\beta }_{PRE} , k > 0,$$
(13)

where \(g\left( k \right)\) is a continuous function of the biasing parameter \(k\). When we select \(g\left( k \right)\) as a linear function of the biasing parameter k, such as \(g\left( k \right) = ak + b\) with \(a,b \in R\), the Poisson Ridge-type estimator (PRTE) is a general estimator which includes the other biased estimators as special cases (a numerical check follows the list):

\(\hat{\beta }_{PRTE} = \hat{\beta }_{MLE}\) for \(g\left( 0 \right) = 1\) where \(k = 0\) and \(b = 1\).

\(\hat{\beta }_{PRTE} = \hat{\beta }_{PRE}\) for \(g\left( k \right) = 1\) where \(a = 0\) and \(b = 1\).

\(\hat{\beta }_{PRTE} = \hat{\beta }_{PLE}\) for \(g\left( 0 \right) = b\) where \(a = 0\) and \(b\) corresponds to the biasing parameter d.

\(\hat{\beta }_{PRTE} = \hat{\beta }_{PHY} \left( {k,d} \right)\) for \(g\left( k \right) = b\) where b corresponds to the biasing parameter d.

\(\hat{\beta }_{PRTE} = \hat{\beta }_{PSK} \left( {k,d} \right)\) for \(g\left( k \right) = k + b\) where \(a = 1\) and b corresponds to the biasing parameter d.
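These special cases can be verified numerically with a small sketch, reusing XtWX and beta_mle from the earlier sketches:

```r
# PRTE of Eq. (13) with the linear choice g(k) = a*k + b.
prte <- function(XtWX, beta_mle, k, a, b) {
  Ip <- diag(nrow(XtWX))
  beta_pre <- solve(XtWX + k * Ip, XtWX %*% beta_mle)
  as.vector(solve(XtWX + Ip, (XtWX + (a * k + b) * Ip) %*% beta_pre))
}
# a = 0, b = 1, k = 0 -> MLE;  a = 0, b = 1 -> PRE;  a = 1, b = d -> PSK(k, d)
all.equal(prte(XtWX, beta_mle, k = 0, a = 0, b = 1), beta_mle)  # TRUE
```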

Note that the proposed estimator given in (13) is different from the biased estimator given in (9). That is, when we use \(\hat{\beta }_{PRE}\) instead of \(\hat{\beta }^{*}\) in (9), the resulting estimator \(\hat{\beta }_{ILTE(PRE)}\) is given as follows:

$$\hat{\beta }_{ILTE(PRE)} = \left( {X^{\prime}\hat{W}X + kI} \right)^{ - 1} \left( {X^{\prime}\hat{W}X + f\left( k \right)I} \right)\hat{\beta }_{PRE} , \quad k > 0,$$
(14)

where \(f\left( k \right)\) is a continuous function of the biasing parameter \(k\). Note that the estimator given in (14) does not exactly correspond to the estimators given by (10) and (12). To show that the estimators given in (13) and (14) are different, let us examine their asymptotic scalar mean square error (SMSE) and asymptotic matrix mean square error (MMSE).

In order to obtain the asymptotic SMSE and the asymptotic MMSE of an estimator, we denote \(\alpha = Q^{\prime}\beta\) and \(\Lambda { = }diag\left( {\lambda_{1} ,...,\lambda_{p + 1} } \right) = Q^{\prime}\left( {X^{\prime}\hat{W}X} \right)Q,\) where \(\lambda_{1} \ge \lambda_{2} \ge \cdots \ge \lambda_{p + 1} > 0\) are the ordered eigenvalues of \(X^{\prime}\hat{W}X\), \(Q\) is the orthogonal matrix whose columns are the eigenvectors of \(X^{\prime}\hat{W}X\), and the jth element of \(Q^{\prime}\beta\) is denoted \(\alpha_{j} , j = 1,2,...,p + 1.\)

The asymptotic SMSE and the asymptotic MMSE of an estimator \(\hat{\beta } = H\hat{\beta }_{MLE} ,\) where \(H\) is a \(\left( {p + 1} \right) \times \left( {p + 1} \right)\) matrix, are defined as:

$$\begin{aligned} & MMSE\left( {\hat{\beta }} \right) = E\left[ {\left( {\hat{\beta } - \beta } \right)\left( {\hat{\beta } - \beta } \right)^{\prime } } \right] = H\,cov\left( {\hat{\beta }_{MLE} } \right)H^{\prime} + \left( {H\beta - \beta } \right)\left( {H\beta - \beta } \right)^{\prime } \\ & SMSE\left( {\hat{\beta }} \right) = E\left[ {\left( {\hat{\beta } - \beta } \right)^{\prime } \left( {\hat{\beta } - \beta } \right)} \right] = tr\left( {H\,cov\left( {\hat{\beta }_{MLE} } \right)H^{\prime}} \right) + \left( {H\beta - \beta } \right)^{\prime } \left( {H\beta - \beta } \right). \\ \end{aligned}$$
(15)

Note that the relationship \(SMSE\left( {\hat{\beta }} \right) = tr\left( {MMSE\left( {\hat{\beta }} \right)} \right)\) holds between the MMSE and SMSE criteria. Because of the relation \(\alpha = Q^{\prime}\beta\), \(\hat{\beta }_{MLE} , \hat{\beta }_{PRE} , \hat{\beta }_{PLE} , \hat{\beta }_{PLTE} , \hat{\beta }_{ILTE}\) and \(\hat{\beta }_{PRTE}\) have the same SMSE values as \(\hat{\alpha }_{MLE} , \hat{\alpha }_{PRE} , \hat{\alpha }_{PLE} , \hat{\alpha }_{PLTE} , \hat{\alpha }_{ILTE}\) and \(\hat{\alpha }_{PRTE}\), respectively.

Using (9), (13) and (14), it is easily computed that

$$\begin{aligned} MMSE\left( {\hat{\beta }_{ILTE} } \right) & = Q\left( {\left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + f\left( k \right)I} \right)\Lambda^{ - 1} \left( {\Lambda + f\left( k \right)I} \right)\left( {\Lambda + kI} \right)^{ - 1} } \right. \\ & \quad \left. { + \left( {f\left( k \right) - k} \right)^{2} \left( {\Lambda + kI} \right)^{ - 1} \alpha \alpha^{\prime}\left( {\Lambda + kI} \right)^{ - 1} } \right)Q^{\prime} \\ \end{aligned}$$
(16)
$$\begin{aligned} MMSE\left( {\hat{\beta }_{{ILTE(PRE)}} } \right) & = Q\left( {\left( {\Lambda + kI} \right)^{{ - 1}} \left( {\Lambda + f\left( k \right)I} \right)\left( {\Lambda + kI} \right)^{{ - 1}} \Lambda \left( {\Lambda + kI} \right)^{{ - 1}} \left( {\Lambda + f\left( k \right)I} \right)\left( {\Lambda + kI} \right)^{{ - 1}} } \right. \\ & \quad \left. { + \left( {\Lambda + kI} \right)^{{ - 1}} \left( {f\left( k \right)\Lambda - 2k\Lambda - k^{2} I} \right)\left( {\Lambda + kI} \right)^{{ - 1}} \alpha \alpha ^{\prime } \left( {\Lambda + kI} \right)^{{ - 1}} \left( {f\left( k \right)\Lambda - 2k\Lambda - k^{2} I} \right)\left( {\Lambda + kI} \right)^{{ - 1}} } \right)Q^{\prime } . \\ \end{aligned}$$
(17)
$$\begin{aligned} MMSE\left( {\hat{\beta }_{PRTE} } \right) & = Q\left( {\left( {\Lambda + I} \right)^{ - 1} \left( {\Lambda + g\left( k \right)I} \right)\left( {\Lambda + kI} \right)^{ - 1} \Lambda \left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + g\left( k \right)I} \right)\left( {\Lambda + I} \right)^{ - 1} } \right. \\ & \quad \left. { + \left( {\left( {g\left( k \right) - k - 1} \right)\Lambda - kI} \right)\left( {\Lambda + I} \right)^{ - 1} \left( {\Lambda + kI} \right)^{ - 1} \alpha \alpha^{\prime}\left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + I} \right)^{ - 1} \left( {\left( {g\left( k \right) - k - 1} \right)\Lambda - kI} \right)} \right)Q^{\prime}. \\ \end{aligned}$$
(18)

Moreover, we can give the SMSE functions of ILTE, ILTE (PRE) and PRTE as follows:

$$SMSE\left( {\hat{\beta }_{ILTE} } \right) = \sum\limits_{j = 1}^{p + 1} {\frac{{\left( {\lambda_{j} + f\left( k \right)} \right)^{2} }}{{\lambda_{j} \left( {\lambda_{j} + k} \right)^{2} }}} + \sum\limits_{j = 1}^{p + 1} {\frac{{\left( {f\left( k \right) - k} \right)^{2} \alpha_{j}^{2} }}{{\left( {\lambda_{j} + k} \right)^{2} }}}$$
(19)
$$SMSE\left( {\hat{\beta }_{ILTE(PRE)} } \right) = \sum\limits_{j = 1}^{p + 1} {\frac{{\left( {\lambda_{j} + f(k)} \right)^{2} \lambda_{j} }}{{\left( {\lambda_{j} + k} \right)^{4} }}} + \sum\limits_{j = 1}^{p + 1} {\frac{{\left( {f(k)\lambda_{j} - 2k\lambda_{j} - k^{2} } \right)^{2} \alpha_{j}^{2} }}{{\left( {\lambda_{j} + k} \right)^{4} }}}$$
(20)
$$SMSE\left( {\hat{\beta }_{PRTE} } \right) = \sum\limits_{j = 1}^{p + 1} {\frac{{\lambda_{j} \left( {\lambda_{j} + g\left( k \right)} \right)^{2} }}{{\left( {\lambda_{j} + 1} \right)^{2} \left( {\lambda_{j} + k} \right)^{2} }}} + \sum\limits_{j = 1}^{p + 1} {\frac{{\left( {\left( {g\left( k \right) - k - 1} \right)\lambda_{j} - k} \right)^{2} \alpha_{j}^{2} }}{{\left( {\lambda_{j} + 1} \right)^{2} \left( {\lambda_{j} + k} \right)^{2} }}}$$
(21)

where, in each expression, the first term is the asymptotic variance and the second term is the squared bias. It should be noted that the MMSE and SMSE functions of ILTE(PRE) and PRTE are different. Also, the MMSE and SMSE functions of the other existing estimators can be obtained by appropriate selection of \(f\left( k \right)\) and \(g\left( k \right)\).
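For instance, the SMSE of the PRTE in (21) can be evaluated directly from the spectral quantities; a minimal sketch assuming lambda holds the eigenvalues of \(X^{\prime}\hat{W}X\) and alpha the vector \(Q^{\prime}\beta\):

```r
# SMSE of the PRTE, Eq. (21): asymptotic variance plus squared bias.
smse_prte <- function(lambda, alpha, k, g) {
  gk   <- g(k)
  varr <- sum(lambda * (lambda + gk)^2 / ((lambda + 1)^2 * (lambda + k)^2))
  bias <- sum(((gk - k - 1) * lambda - k)^2 * alpha^2 /
              ((lambda + 1)^2 * (lambda + k)^2))
  varr + bias
}
```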

Let \(\hat{\beta }_{1}\) and \(\hat{\beta }_{2}\) be any two estimators of the parameter \(\beta\). Then, \(\hat{\beta }_{2}\) is superior to \(\hat{\beta }_{1}\) in the MMSE sense if and only if \(MMSE\left( {\hat{\beta }_{1} } \right) - MMSE\left( {\hat{\beta }_{2} } \right)\) is a positive definite (pd) matrix. If \(MMSE\left( {\hat{\beta }_{1} } \right) - MMSE\left( {\hat{\beta }_{2} } \right)\) is a non-negative definite matrix, then \(SMSE\left( {\hat{\beta }_{1} } \right) - SMSE\left( {\hat{\beta }_{2} } \right) \ge 0.\) However, the reverse is not always true32.

In order to compare the MMSEs of the above-mentioned biased estimators, we use the following theorem.

Theorem 2.1

Let \(A\) be a positive definite matrix, namely \(A > 0,\) and let \(c\) be a nonzero vector. Then, \(A - cc^{\prime}\) is a positive definite matrix if and only if \(c^{\prime}A^{ - 1} c < 1\)33.

The superiority of the PRTE in PRMs

In this section, we compare the PRTE with the ILTE according to the MMSE criterion. We give a general theorem covering different choices of the \(g\left( k \right)\) and \(f\left( k \right)\) functions, so that the estimators mentioned above can be compared in the MMSE sense.

The following theorem is given to show the superiority of PRTE over ILTE.

Theorem 3.1.

Let \(k > 0\) and \(- \lambda_{j} - \frac{{\left( {\lambda_{j} + 1} \right)\left( {\lambda_{j} + f\left( k \right)} \right)}}{{\lambda_{j} }} < g\left( k \right) < - \lambda_{j} + \frac{{\left( {\lambda_{j} + 1} \right)\left( {\lambda_{j} + f\left( k \right)} \right)}}{{\lambda_{j} }}\) for \(j = 1,2,...,p + 1\). Then \(MMSE\left( {\hat{\beta }_{ILTE} } \right) - MMSE\left( {\hat{\beta }_{PRTE} } \right) > 0\) iff

$$\begin{aligned} & bias\left( {\hat{\beta }_{PRTE} } \right)^{\prime } Q\left( {\left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + f\left( k \right)I} \right)\Lambda^{ - 1} \left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + f\left( k \right)I} \right)} \right. \\ & \quad \left. { - \left( {\Lambda + I} \right)^{ - 1} \left( {\Lambda + g\left( k \right)I} \right)\left( {\Lambda + kI} \right)^{ - 1} \Lambda \left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + g\left( k \right)I} \right)\left( {\Lambda + I} \right)^{ - 1} } \right)^{ - 1} Q^{\prime}bias\left( {\hat{\beta }_{PRTE} } \right) < 1 \\ \end{aligned}$$
(22)

where \(bias\left( {\hat{\beta }_{PRTE} } \right) = \left( {\left( {g\left( k \right) - k - 1} \right)\Lambda - kI} \right)Q\left( {\Lambda + I} \right)^{ - 1} \left( {\Lambda + kI} \right)^{ - 1} \alpha\).

Proof

Using (16) and (18), we obtain

$$\begin{aligned} & MMSE\left( {\hat{\beta }_{ILTE} } \right) - MMSE\left( {\hat{\beta }_{PRTE} } \right) = Q\left( {\left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + f\left( k \right)I} \right)\Lambda^{ - 1} \left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + f\left( k \right)I} \right)} \right. \\ & \quad \left. { - \left( {\Lambda + I} \right)^{ - 1} \left( {\Lambda + g\left( k \right)I} \right)\left( {\Lambda + kI} \right)^{ - 1} \Lambda \left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + g\left( k \right)I} \right)\left( {\Lambda + I} \right)^{ - 1} } \right)Q^{\prime} - bias\left( {\hat{\beta }_{PRTE} } \right)bias\left( {\hat{\beta }_{PRTE} } \right)^{\prime } \\ & \quad = Q\,diag\left\{ {\frac{{\left( {\lambda_{j} + f\left( k \right)} \right)^{2} }}{{\left( {\lambda_{j} + k} \right)^{2} \lambda_{j} }} - \frac{{\lambda_{j} \left( {\lambda_{j} + g\left( k \right)} \right)^{2} }}{{\left( {\lambda_{j} + 1} \right)^{2} \left( {\lambda_{j} + k} \right)^{2} }}} \right\}_{j = 1}^{p + 1} Q^{\prime} - bias\left( {\hat{\beta }_{PRTE} } \right)bias\left( {\hat{\beta }_{PRTE} } \right)^{\prime } . \\ \end{aligned}$$

The matrix \(D = \left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + f\left( k \right)I} \right)\Lambda^{ - 1} \left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + f\left( k \right)I} \right) - \left( {\Lambda + I} \right)^{ - 1} \left( {\Lambda + g\left( k \right)I} \right)\left( {\Lambda + kI} \right)^{ - 1} \Lambda \left( {\Lambda + kI} \right)^{ - 1} \left( {\Lambda + g\left( k \right)I} \right)\left( {\Lambda + I} \right)^{ - 1}\) is pd if \(\left( {\lambda_{j} + 1} \right)^{2} \left( {\lambda_{j} + f\left( k \right)} \right)^{2} - \lambda_{j}^{2} \left( {\lambda_{j} + g\left( k \right)} \right)^{2} > 0.\) Thus, D is pd if \(- \lambda_{j} - \frac{{\left( {\lambda_{j} + 1} \right)\left( {\lambda_{j} + f\left( k \right)} \right)}}{{\lambda_{j} }} < g\left( k \right) < - \lambda_{j} + \frac{{\left( {\lambda_{j} + 1} \right)\left( {\lambda_{j} + f\left( k \right)} \right)}}{{\lambda_{j} }}\) and \(k > 0\), where \(j = 1,2,...,p + 1\). By Theorem 2.1, the proof is completed.

Determination of \(g\left( k \right)\) function

Since the performance of biased estimators depends on the choice of the biasing parameters, finding optimal biasing parameters for the proposed estimators is an important problem. Different techniques for estimating the biasing parameters in the PRE, PLE, PLTE, PSK and PHY are generalized from linear regression models by exploiting the similarities between linear regression models and PRMs5,15,16,17,18,19,23,30,34. The performance of the PRTE depends on the function \(g\left( k \right)\), and therefore only on the biasing parameter \(k\). Appropriate choices of the \(g\left( k \right)\) function that recover different estimators were given in the introduction. We now give a method to find the optimal \(g\left( k \right)\) function that approximately minimizes \(SMSE\left( {\hat{\beta }_{PRTE} } \right)\) with respect to \(k\). Our aim is to determine \(k\) and \(g\left( k \right)\) together so that \(SMSE\left( {\hat{\beta }_{PRTE} } \right)\) is approximately minimized; in other words, to choose \(k\) and \(g\left( k \right)\) such that the decrease in the variance term exceeds the increase in the squared bias. Note that \(SMSE\left( {\hat{\beta }_{PRTE} } \right)\) is a nonlinear function of the biasing parameter \(k\). Writing \(h\left( k \right) = SMSE\left( {\hat{\beta }_{PRTE} } \right)\) and differentiating with respect to \(k\), we obtain

$$h^{\prime}\left( k \right) = \sum\limits_{j = 1}^{p + 1} {\frac{{2\lambda_{j} \left( {\lambda_{j} - g^{\prime}\left( k \right)\lambda_{j} - g^{\prime}\left( k \right)k + g\left( k \right)} \right)\left[ {\alpha_{j}^{2} \left( {\left( {k + 1 - g\left( k \right)} \right)\lambda_{j} + k} \right) - \left( {\lambda_{j} + g\left( k \right)} \right)} \right]}}{{\left( {\lambda_{j} + 1} \right)^{2} \left( {\lambda_{j} + k} \right)^{3} }}}.$$

Setting \(h^{\prime}\left( k \right) = 0\) leads to the following two facts:

Fact 1

The differential equation \(\lambda_{j} \left( {\lambda_{j} - g^{\prime}\left( k \right)\lambda_{j} - g^{\prime}\left( k \right)k + g\left( k \right)} \right) = 0\) is obtained. Solving this differential equation, we obtain

$$g\left( k \right) = ck + \left( {c - 1} \right)\lambda_{j} ,$$
(23)

where \(c\) is the constant of integration.

Fact 2

The equation \(\alpha_{j}^{2} \left( {\left( {k + 1 - g\left( k \right)} \right)\lambda_{j} + k} \right) - \left( {\lambda_{j} + g\left( k \right)} \right) = 0\) is obtained. Solving for \(g\left( k \right)\) gives

$$\begin{array}{*{20}c} {g\left( k \right) = \frac{{\alpha_{j}^{2} \left( {\lambda_{j} + 1} \right)}}{{1 + \lambda_{j} \alpha_{j}^{2} }}k + \frac{{\left( {\alpha_{j}^{2} - 1} \right)}}{{1 + \lambda_{j} \alpha_{j}^{2} }}\lambda_{j} } & {or} & {g\left( k \right) = \frac{{\alpha_{j}^{2} \left( {\lambda_{j} + 1} \right)}}{{1 + \lambda_{j} \alpha_{j}^{2} }}k + \left( {\frac{{\alpha_{j}^{2} \left( {\lambda_{j} + 1} \right)}}{{1 + \lambda_{j} \alpha_{j}^{2} }} - 1} \right)\lambda_{j} } \\ \end{array} .$$
(24)

According to these two facts, it is convenient to choose \(g\left( k \right)\) as a linear function of the biasing parameter k. Note that the \(g\left( k \right)\) obtained in Fact 2 is a solution of the differential equation obtained in Fact 1. Based on these results, we can propose the following generalizations. First, note that the function \(g\left( k \right)\) given in (23) and (24) makes \(SMSE\left( {\hat{\alpha }_{PRTE} } \right)\) approximately minimal for a given j. Thus, \(g\left( k \right)\) depends on the eigenvalues of \(X^{\prime}\hat{W}X\), the unknown parameter \(\alpha\) and the estimate of the biasing parameter k. In other words, many functions can be determined from the functional relationships given in (23) and (24). For example, the following functional relationships can be proposed for determining \(g\left( k \right)\):

$$g_{1} \left( k \right) = c_{1} k + \left( {c_{1} - 1} \right)\lambda_{\min } \,{\text{where}}\,c_{1} \in \left( {0,1} \right),$$
(25)
$$g_{2} \left( k \right) = \frac{{\alpha_{\min }^{2} \left( {1 + \lambda_{\min } } \right)}}{{1 + \lambda_{\max } \alpha_{\max }^{2} }}k + \left( {\frac{{\alpha_{\min }^{2} \left( {1 + \lambda_{\min } } \right)}}{{1 + \lambda_{\max } \alpha_{\max }^{2} }} - 1} \right)\lambda_{\min } ,$$
(26)
$$g_{3} \left( k \right) = \frac{{\min \left( {\alpha_{j}^{2} \left( {\lambda_{j} + 1} \right)} \right)}}{{n\max \left( {1 + \lambda_{j} \alpha_{j}^{2} } \right)}}k + \left( {\frac{{\min \left( {\alpha_{j}^{2} \left( {\lambda_{j} + 1} \right)} \right)}}{{n\max \left( {1 + \lambda_{j} \alpha_{j}^{2} } \right)}} - 1} \right)\lambda_{\min } ,$$
(27)

where \(\alpha_{\min }^{2}\) and \(\alpha_{\max }^{2}\) are defined as the minimum and maximum value of \(\alpha_{j}^{2} , j = 1,2,...,p + 1,\) respectively. Similarly, \(\lambda_{\min }\) and \(\lambda_{\max }\) indicate the minimum and maximum value of the eigenvalue of \(X^{\prime}\hat{W}X\), respectively.
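The three rules (25)-(27) are straightforward to implement; a sketch with lambda, alpha and n as defined above:

```r
# Candidate g(k) rules of Eqs. (25)-(27).
g1 <- function(k, lambda, c1 = 0.5) c1 * k + (c1 - 1) * min(lambda)  # c1 in (0,1)
g2 <- function(k, lambda, alpha) {
  a <- min(alpha^2) * (1 + min(lambda)) / (1 + max(lambda) * max(alpha^2))
  a * k + (a - 1) * min(lambda)
}
g3 <- function(k, lambda, alpha, n) {
  a <- min(alpha^2 * (lambda + 1)) / (n * max(1 + lambda * alpha^2))
  a * k + (a - 1) * min(lambda)
}
```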

In this study, we examined only the first-degree polynomial functions given in (25) to (27) for the \(g\left( k \right)\) function. Note that \(g\left( k \right)\) can be selected as any continuous function of the biasing parameter k. The proposed biased estimator therefore depends on a single biasing parameter k, and an appropriate estimate of k must be used to control the conditioning of the \(X^{\prime}\hat{W}X\) matrix. Since the proposed estimator depends on a single biasing parameter k, the suitable estimates of k given in Månsson and Shukur18, Kibria et al.17 and Algamal25 can be used. In addition to the previously proposed estimators of the biasing parameter, we can also use the following estimators to estimate k:

$$\hat{k}_{PRTE} = \frac{{p\left( {\lambda_{\max } - \lambda_{\min } } \right)}}{n},\hat{k}_{PRTE} = \frac{{\max \left( {\lambda_{j} \hat{\alpha }_{j}^{2} } \right)}}{{\sum\nolimits_{j = 1}^{p + 1} {\hat{\alpha }_{j}^{2} } }},\hat{k}_{PRTE} = \left( {\prod\limits_{j = 1}^{p + 1} {\sqrt {\frac{1}{{\hat{\alpha }_{j}^{2} }}} } } \right)^{{\frac{1}{p + 1}}}$$

where \(m_{j} = \sqrt {\frac{{\hat{\sigma }^{2} }}{{\hat{\alpha }_{j}^{2} }}} ,j = 1,2,...,p + 1\) and \(\hat{\sigma }^{2} = \frac{1}{n - p - 1}\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} }\).
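The three estimators of k displayed above translate directly into R; a sketch where alpha_hat denotes \(Q^{\prime}\hat{\beta }_{MLE}\) and lambda the eigenvalues of \(X^{\prime}\hat{W}X\) (function names are ours):

```r
# The three proposed estimators of the biasing parameter k.
k_hat1 <- function(lambda, p, n) p * (max(lambda) - min(lambda)) / n
k_hat2 <- function(lambda, alpha_hat) max(lambda * alpha_hat^2) / sum(alpha_hat^2)
k_hat3 <- function(alpha_hat) prod(sqrt(1 / alpha_hat^2))^(1 / length(alpha_hat))
```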

The Monte Carlo simulation studies

In this section, we designed two simulation schemes to compare the performances of different biased estimators in the PRMs. In the first simulation scheme, we examined the effects of the sample size (n), the degree of collinearity \(\left( \rho \right)\) and the number of explanatory variables \(\left( p \right)\) on the performance of the PRE, PLE, PLTE, PSK and PHY estimators and the PRTE, based on the suggested best biasing parameter estimates. In the second simulation design, we examined the effect of the biasing parameter on the performances of the PRTE and ILTE for each set of values \(\left( {n,\rho ,p,\sigma^{2} } \right)\). For both simulation designs, we generated the explanatory variables following Månsson and Shukur18, Kibria et al.17 and Kibria and Lukman35 as

$$x_{ij} = \left( {1 - \rho^{2} } \right)^{1/2} w_{ij} + \rho w_{i,p + 1} , \quad i = 1,2,...,n, \; j = 1,2,...,p,$$
(28)

where \(w_{ij}\) are independent standard normal pseudo-random numbers and \(\rho\) is specified such that the correlation between any two explanatory variables is \(\rho^{2}\). Four different degrees of correlation are investigated, corresponding to \(\rho = 0.85, 0.9, 0.99\) and \(0.999\). The number of explanatory variables is set to \(p = 2, 4, 8\) and 12. For each set of explanatory variables, the parameter \(\beta\) is selected as the normalized eigenvector corresponding to the largest eigenvalue of \(X^{\prime}X\), so that \(\beta^{\prime}\beta = 1\). We used the glm function in the R stats package4 and set the intercept term equal to 0.
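A sketch of one simulated design under this scheme (seed and dimensions chosen only for illustration):

```r
# Collinear regressors via Eq. (28); rho^2 is the pairwise correlation.
set.seed(123)
n <- 100; p <- 4; rho <- 0.99
Wm <- matrix(rnorm(n * (p + 1)), n, p + 1)   # independent N(0,1) draws w_ij
X  <- sqrt(1 - rho^2) * Wm[, 1:p] + rho * Wm[, p + 1]
beta <- eigen(t(X) %*% X)$vectors[, 1]       # largest-eigenvalue eigenvector,
                                             # so beta'beta = 1
y  <- rpois(n, exp(X %*% beta))              # intercept fixed at 0
```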

In the simulation and application sections, the proposed best biasing parameter estimators for PRE, PLE, PLTE, PSK, and PHY estimators are used based on the works of Månsson and Shukur18, Månsson et al.19, Kibria et al.17, Asar and Genç15, Alanaz and Algamal34, Çetinkaya and Kaçıranlar16, Qasim et al.23, Huang and Yang30.

To estimate k in the PRE, we used the best estimator of k, \(\hat{k}_{PRE} = \max \left( {\frac{1}{{m_{j} }}} \right)\), where \(m_{j} = \sqrt {\frac{{\hat{\sigma }^{2} }}{{\hat{\alpha }_{j}^{2} }}} ,j = 1,2,...,p\) and \(\hat{\sigma }^{2} = \frac{1}{n - p - 1}\sum\nolimits_{i = 1}^{n} {\left( {y_{i} - \hat{\mu }_{i} } \right)^{2} }\), as recommended by Kibria et al.17.

According to the results given by Qasim et al.23, we used the best estimator of d in the PLE as \(\hat{d}_{PLE} = \max \left( {0,\min \left( {\frac{{\hat{\alpha }_{j}^{2} - 1}}{{\max \left( {\frac{1}{{\lambda_{j} }}} \right) + \hat{\alpha }_{\max }^{2} }}} \right)} \right).\)

For PLTE, the biasing parameters k and d are estimated by grouping them in three different ways as follows:

PLTE I: \(\hat{k}_{PLTE} = \max \left( {\frac{1}{{m_{j} }}} \right)\) where \(m_{j} = \sqrt {\frac{{\hat{\sigma }^{2} }}{{\hat{\alpha }_{j}^{2} }}} ,j = 1,2,...,p\) and \(\hat{d}_{PLTE} = \frac{{\sum\nolimits_{j = 1}^{p} {\frac{{1 - \hat{k}_{PLTE} \hat{\alpha }_{j}^{2} }}{{\left( {\lambda_{j} + \hat{k}_{PLTE} } \right)^{2} }}} }}{{\sum\nolimits_{j = 1}^{p} {\frac{{1 + \lambda_{j} \hat{\alpha }_{j}^{2} }}{{\lambda_{j} \left( {\lambda_{j} + \hat{k}_{PLTE} } \right)^{2} }}} }}\).

PLTE II: \(\hat{k}_{PLTE} = \frac{{\lambda_{1} - 100 \lambda_{p} }}{99}\) and \(\hat{d}_{PLTE} = \frac{{\sum\nolimits_{j = 1}^{p} {\frac{{1 - \hat{k}_{PLTE} \hat{\alpha }_{j}^{2} }}{{\left( {\lambda_{j} + \hat{k}_{PLTE} } \right)^{2} }}} }}{{\sum\nolimits_{j = 1}^{p} {\frac{{1 + \lambda_{j} \hat{\alpha }_{j}^{2} }}{{\lambda_{j} \left( {\lambda_{j} + \hat{k}_{PLTE} } \right)^{2} }}} }}\).

PLTE III: \(\hat{d}_{PLTE} = \frac{1}{2}\min \left\{ {\frac{{\lambda_{j} }}{{1 + \lambda_{j} \hat{\alpha }_{j}^{2} }}} \right\}, j = 1,2,...,p\) and \(\hat{k}_{PLTE} = \frac{1}{p}\sum\limits_{j = 1}^{p} {\frac{{\lambda_{j} - \hat{d}_{PLTE}^{*} \left( {1 + \lambda_{j} \hat{\alpha }_{j}^{2} } \right)}}{{\lambda_{j} \hat{\alpha }_{j}^{2} }}}\).

Sakallıoğlu and Kaçıranlar31 did not provide a specific technique for estimating the biasing parameters k and d for the SK estimator. Therefore, we used the following estimators for the biasing parameters k and d in the PSK:

PSK: \(\hat{k}_{PSK} = \max \left( {\frac{1}{{m_{j} }}} \right)\) where \(m_{j} = \sqrt {\frac{{\hat{\sigma }^{2} }}{{\hat{\alpha }_{j}^{2} }}} ,j = 1,2,...,p\) and \(\hat{d}_{PSK} = \frac{{\sum\nolimits_{j = 1}^{p} {\frac{{\lambda_{j} \left( {\hat{\alpha }_{j}^{2} - 1} \right)}}{{\left( {\lambda_{j} + 1} \right)^{2} \left( {\lambda_{j} + \hat{k}_{PSK} } \right)^{2} }}} }}{{\sum\nolimits_{j = 1}^{p} {\frac{{\lambda_{j} \left( {1 + \lambda_{j} \hat{\alpha }_{j}^{2} } \right)}}{{\left( {\lambda_{j} + 1} \right)^{2} \left( {\lambda_{j} + \hat{k}_{PSK} } \right)^{2} }}} }}\).

Moreover, we used the methods proposed by Huang and Yang30 to estimate the parameters of the PHY estimator. Huang and Yang30 proposed two methods, which we refer to as (K1, D1) and (K2, D2) (see Huang and Yang30 for details), adapting them to the PHY estimator in PRMs. The estimator obtained with (K1, D1) is denoted PHY I, and the estimator obtained with (K2, D2) PHY II.

We used the following \(g\left( k \right)\) functions together with the k estimator to determine the PRTE:

PRTE I: \(\hat{k}_{{PRTE {\text{I}}}} = \frac{1}{n}\left( {p\lambda_{\max } - \left( {p + 1} \right)\lambda_{\min } } \right)\) and \(g\left( k \right) = \frac{{\left( {1 + \lambda_{\min } } \right)\alpha_{\min }^{2} }}{{1 + \lambda_{\max } \alpha_{\max }^{2} }}k + \left( {\frac{{\left( {1 + \lambda_{\min } } \right)\alpha_{\min }^{2} }}{{1 + \lambda_{\max } \alpha_{\max }^{2} }} - 1} \right)\lambda_{\min }\).

PRTE II: \(\hat{k}_{{PRTE {\text{II}}}} = \frac{{p\lambda_{\max } \alpha_{{{\text{med}}}}^{2} }}{{n\alpha_{{{\text{mean}}}}^{2} }}\) and \(g\left( k \right) = \frac{{\left( {1 + \lambda_{\max } } \right)\alpha_{\min }^{2} }}{{p\left( {1 + \lambda_{\max } \alpha_{\max }^{2} } \right)}}k + \left( {\frac{{\left( {1 + \lambda_{\max } } \right)\alpha_{\min }^{2} }}{{p\left( {1 + \lambda_{\max } \alpha_{\max }^{2} } \right)}} - 1} \right)\lambda_{\min }\).

PRTE III: \(\hat{k}_{{PRTE {\text{III}}}} = \frac{p}{n}\left( {\lambda_{\max } - \lambda_{\min } } \right)\) and \(g\left( k \right) = \frac{{\min \left( {\left( {1 + \lambda_{j} } \right)\alpha_{j}^{2} } \right)}}{{n\max \left( {1 + \lambda_{j} \alpha_{j}^{2} } \right)}}k + \left( {\frac{{\min \left( {\left( {1 + \lambda_{j} } \right)\alpha_{j}^{2} } \right)}}{{n\max \left( {1 + \lambda_{j} \alpha_{j}^{2} } \right)}} - 1} \right)\lambda_{\min }\).

PRTE IV: \(\hat{k}_{{PRTE {\text{IV}}}} = \frac{{p\max \left( {\lambda_{j} \alpha_{j}^{2} } \right)}}{{n\alpha_{{{\text{mean}}}}^{2} }}\) and \(g\left( k \right) = \min \left( {\frac{{\left( {1 + \lambda_{j} } \right)\alpha_{j}^{2} }}{{n\left( {1 + \lambda_{j} \alpha_{j}^{2} } \right)}}} \right)k + \left( {\min \left( {\frac{{\left( {1 + \lambda_{j} } \right)\alpha_{j}^{2} }}{{n\left( {1 + \lambda_{j} \alpha_{j}^{2} } \right)}}} \right) - 1} \right)\lambda_{\min }\). Here \(\alpha_{{{\text{med}}}}^{2}\) and \(\alpha_{{\text{mean}}}^{2}\) denote the median and mean of \(\alpha_{j}^{2} , j = 1,2,...,p + 1,\) respectively.

The estimated MSE (EMSE) is used as the basis for comparing the proposed estimators; for an estimator \(\hat{\beta }\) of \(\beta\) it is calculated as

$$EMSE\left( {\hat{\beta }} \right) = \frac{1}{N}\sum\limits_{r = 1}^{N} {\left( {\hat{\beta }_{r} - \beta } \right)^{\prime } \left( {\hat{\beta }_{r} - \beta } \right)} ,$$
(29)

where \(\left( {\hat{\beta }_{r} - \beta } \right)\) is the difference between the estimated and true parameter vectors at the rth replication and N is the number of replications. For each combination of n, p and \(\rho\), the experiment was replicated 2000 times by generating new response variables. Our Monte Carlo simulation studies were conducted using the R programming language. The results are given in Tables 1, 2, 3 and 4 for \(p = 2, 4, 8\) and 12, respectively.
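Eq. (29) amounts to averaging squared Euclidean distances over replications; a one-line sketch where B_hat is an N x (p+1) matrix whose rth row is the estimate from replication r:

```r
# EMSE of Eq. (29): mean squared distance between estimates and the truth.
emse <- function(B_hat, beta) mean(rowSums(sweep(B_hat, 2, beta)^2))
```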

Table 1 The EMSE values of the estimators when \(p = 2.\)
Table 2 The EMSE values of the estimators when \(p = 4.\)
Table 3 The EMSE values of the estimators when \(p = 8.\)
Table 4 The EMSE values of the estimators when \(p = 12.\)

The bold numbers in the tables show the estimators with the smallest EMSE values, and in addition, the signs (*), (**), and (***) represent the first, second, and third smallest EMSE values in each row, respectively. The results from Tables 1, 2, 3 and 4 are listed below:

  1. According to the results from Tables 1, 2, 3 and 4, it can be seen that the degree of correlation \(\left( \rho \right),\) the number of explanatory variables \(\left( p \right)\) and the sample size \(\left( n \right)\) have different effects on all estimators in the simulation.

  2. The EMSE values of PRTE I, PRTE II, PRTE III and PRTE IV are smaller than those of the other existing biased estimators. Although our proposed estimators outperformed the existing estimators in all cases, which of the four performs best varies with the values of \(n, p\) and \(\rho\).

  3. When the number of variables p and \(\rho\) are kept constant, the number of observations in the model did not have a significant effect on PRTE I, PRTE II, PRTE III, and PRTE IV.

  4. Regardless of the n and p values, PRTE I, PRTE II, PRTE III, and PRTE IV tended to give low EMSE values at high correlation.

  5. When the number of observations \(\left( n \right)\) and correlation \(\left( \rho \right)\) in the model are kept constant, the EMSE values for PRTE I, PRTE II, PRTE III, and PRTE IV decrease as the number of explanatory variables \(\left( p \right)\) increases.

In summary, all our proposed estimators outperformed the other considered estimators in all scenarios, although they outperform one another in different scenarios owing to the different choices of k and the \(g\left( k \right)\) function. We also observe that the number of observations has a relatively small effect on the EMSE values compared to \(\rho\) and p. In other words, the PRTEs are robust with respect to the number of observations and therefore give very good results in the case of high collinearity.

In the second simulation scheme, we examined the effects of the biasing parameter k on the performances of the ILTEs and the PRTE when the sample size \(\left( n \right)\), degree of collinearity \(\left( \rho \right)\) and number of explanatory variables \(\left( p \right)\) are held constant. The purpose of this simulation is to examine the performances of the ILTE and PRTE at various values of the biasing parameter k according to the EMSE values given in (29). The biasing parameter k was not estimated in this scheme; instead, the EMSE values obtained by increasing k over the range \(\left[ {0, 2} \right]\) in steps of 0.1 were compared. Many \(f\left( k \right)\) and \(g\left( k \right)\) functions could be considered in evaluating these estimators; as an example, under several settings of n, p and \(\rho\), the ILTEs and PRTE determined by the following \(f\left( k \right)\) and \(g\left( k \right)\) functions are considered:

  • \(\hat{\beta }_{ILTE} = \left( {X^{\prime}\hat{W}X + kI} \right)^{ - 1} \left( {X^{\prime}\hat{W}X + f\left( k \right)I} \right)\hat{\beta }_{MLE}\) where \(f\left( k \right) = \frac{{\lambda_{\min } \alpha_{\min }^{2} }}{{1 + \lambda_{\max } \alpha_{\max }^{2} }}k + \left( {\frac{{\lambda_{\min } \alpha_{\min }^{2} }}{{1 + \lambda_{\max } \alpha_{\max }^{2} }} - 1} \right)\lambda_{\min }\)

  • \(\hat{\beta }_{ILTE(PRE)} = \left( {X^{\prime}\hat{W}X + kI} \right)^{ - 1} \left( {X^{\prime}\hat{W}X + f\left( k \right)I} \right)\hat{\beta }_{PRE}\) where \(f\left( k \right) = \frac{{\alpha_{\min }^{2} }}{{1 + \lambda_{\max } \alpha_{\max }^{2} }}\left( {k + \lambda_{\min } } \right)^{2} - \left( {k + \lambda_{\min } } \right)\)

  • \(\hat{\beta }_{PRTE} = \left( {X^{\prime}\hat{W}X + I} \right)^{ - 1} \left( {X^{\prime}\hat{W}X + g\left( k \right)I} \right)\hat{\beta }_{PRE}\) where \(g\left( k \right) = \frac{{\left( {\lambda_{\min } + 1} \right)\alpha_{\min }^{2} }}{{1 + \lambda_{\max } \alpha_{\max }^{2} }}k + \left( {\frac{{\left( {\lambda_{\min } + 1} \right)\alpha_{\min }^{2} }}{{1 + \lambda_{\max } \alpha_{\max }^{2} }} - 1} \right)\lambda_{\min }\)

Note that when we use \(\hat{\beta }_{PRE}\) instead of \(\hat{\beta }^{*}\) in \(\hat{\beta }_{ILTE}\), the resulting estimator is denoted \(\hat{\beta }_{ILTE(PRE)}\). The \(f\left( k \right)\) functions used in \(\hat{\beta }_{ILTE}\) and \(\hat{\beta }_{ILTE(PRE)}\) were determined according to the rules given by Akay and Ertan5. Note that when the method of Akay and Ertan5 is applied to \(\hat{\beta }_{ILTE(PRE)}\), the \(f\left( k \right)\) that minimizes the \(SMSE\left( {\hat{\beta }_{ILTE(PRE)} } \right)\) function is a quadratic function.

We considered the cases \(\rho = 0.9, 0.99, 0.999\), \(n = 50, 100, 500\), and \(p = 4, 8, 12\). For these n, \(\rho\) and p values, the explanatory variables are generated according to (28). The simulation is repeated 2000 times for each k value. The results are given graphically in Figs. 1, 2 and 3.

Figure 1

The EMSE values of ILTE, ILTE(PRE), PRTE as a function of k values where \(\rho = 0.9\).

Figure 2

The EMSE values of ILTE, ILTE(PRE), PRTE as a function of k values where \(\rho = 0.99\).

Figure 3

The EMSE values of ILTE, ILTE(PRE), PRTE as a function of k values where \(\rho = 0.999\).

According to Figs. 1, 2 and 3, we can obtain the following results depending on each set of the values \(\left( {n,\rho ,p} \right)\);

  1. At small values of the biasing parameter k, the PRTE outperforms both the ILTE and the ILTE(PRE). Although the PRTE and ILTE(PRE) both include the PRE, the performance of the ILTE(PRE) is quite poor compared to the PRTE at small values of the biasing parameter.

  2. When the collinearity between the explanatory variables is relatively low, i.e. \(\rho = 0.9\), the ILTE(PRE) exhibits quite different behavior from the ILTE and PRTE. As the correlation among the explanatory variables and the number of explanatory variables increase, the ILTE, ILTE(PRE) and PRTE show almost the same behavior. However, the PRTE behaves more consistently across varying values of the biasing parameter k.

As a result of the second simulation design, we recommend the PRTE to researchers. In general, the performance of these estimators depends on the \(f\left( k \right)\) and \(g\left( k \right)\) functions, respectively. In practice, these functions should be replaced with suitable functional relationships that can hold between the biasing parameters.

Numerical example: the aircraft damage data

In this section, the aircraft damage data are reanalyzed to demonstrate the benefits of the PRTE. This dataset consists of 30 observations with three explanatory variables. The first variable \(\left( {x_{1} } \right)\) is a dichotomous variable indicating the type of aircraft. The explanatory variables \(\left( {x_{2} } \right)\) and \(\left( {x_{3} } \right)\) are the bomb load in tons and the total months of aircrew experience, respectively. The count variable y is the number of locations where damage was inflicted on the aircraft3. This dataset is also used by Myers et al.3, Asar and Genç15, Amin et al.7, Lukman et al.36, and Akay and Ertan5.

Asar and Genç15, Amin et al.7 and Akay and Ertan5 considered the model \(\mu = \exp \left( {\beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \beta_{3} x_{3} } \right)\). Excluding the intercept term, the eigenvalues of \(X^{\prime}X\) are 208,522.5106, 374.8961 and 4.3333; thus, the condition number is 48,120.9495, indicating a severe multicollinearity problem among the explanatory variables. The variables are first standardized, and then the intercept term is added to the vector of variables. The eigenvalues of the matrix \(X^{\prime}\hat{W}X\) are \(\lambda_{1} = 47.5850\), \(\lambda_{2} = 2.2844\), \(\lambda_{3} = 1.4097\) and \(\lambda_{4} = 0.3681\). The condition number is 129.2719, considerably larger than 30, indicating that the MLE is still affected by multicollinearity. The numerical results comparing the PRTEs with the other existing estimators are given in Table 5.
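The condition-number check quoted above is a one-liner; a sketch assuming X holds the standardized design with intercept and W the diagonal weight matrix from the fitted MLE:

```r
# Condition number of X'W-hat X: ratio of extreme eigenvalues.
ev <- eigen(t(X) %*% W %*% X, only.values = TRUE)$values
max(ev) / min(ev)   # 47.5850 / 0.3681 = 129.2719 for the aircraft data
```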

Table 5 The estimated parameter values and the SMSE values of the estimators.

In addition, the bootstrap sampling method is used to calculate the SMSE values of the given biased estimators. For this purpose, 10,000 bootstrap samples were created, and for each sample the parameter estimates of the given biased estimators were calculated. The mean of the MLE estimates is taken as the true parameter vector. The resulting SMSE values are given in Table 5, from which it can be seen that the estimators with the best SMSE values are PRTE I and PRTE III.
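A sketch of this bootstrap scheme for the MLE (the other estimators are handled analogously), assuming X and y hold the standardized aircraft data; glm warnings on individual resamples are possible:

```r
# Bootstrap SMSE: resample rows, refit, and average squared distances
# to the mean MLE estimate, which plays the role of the true parameter.
B <- 10000
boot_mle <- replicate(B, {
  idx <- sample(nrow(X), replace = TRUE)
  coef(glm(y[idx] ~ X[idx, ] - 1, family = poisson))
})
beta_ref <- rowMeans(boot_mle)                     # reference "true" beta
smse_mle <- mean(colSums((boot_mle - beta_ref)^2))
```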

Now, we want to examine the performances of ILTE, ILTE(PRE), and PRTE, which were examined in the previous section. Figure 4 graphically shows the estimated variance values of these estimators based on the value of the biasing parameter k. Also, Fig. 5 shows the SMSE performance of \(\hat{\beta }_{ILTE}\), \(\hat{\beta }_{ILTE(PRE)}\) and \(\hat{\beta }_{PRTE}\) estimators according to the biasing parameter k.

Figure 4

The estimated variance values of ILTE, ILTE(PRE) and PRTE as a function of k.

Figure 5

The SMSE values of ILTE, ILTE(PRE) and PRTE as a function of k.

Figures 4 and 5 indicate that the proposed PRTE is a strong alternative to the other estimators at small values of the biasing parameter k. This result is also compatible with the second simulation results given in the previous section.

To compare the estimators in the MMSE sense, the parameter estimate obtained with the bootstrap sampling method is used in place of the unknown parameter \(\alpha\). R is used with tolerance \(10^{ - 12}\) to decide whether the MMSE differences are positive definite (pd): if any eigenvalue is less than or equal to the tolerance, the matrix is not pd; otherwise, it is pd.

Finally, our aim in this part is to compare the estimators obtained from various choices of the \(f\left( k \right)\) and \(g\left( k \right)\) functions, using the theorem given in "The superiority of the PRTE in PRMs". To illustrate Theorem 3.1, the functions \(f\left( k \right)\) and \(g\left( k \right)\) are taken as \(f\left( k \right) = 0.05k + 0.05\) and \(g\left( k \right) = 0.5k - 0.05\), respectively. In this case, \({\text{cov}} \left( {\hat{\beta }_{ILTE} } \right) - {\text{cov}} \left( {\hat{\beta }_{PRTE} } \right)\) is a pd matrix for \(0 < k \le 2.0057\). Also, the k values satisfying criterion (22) are \(0 < k < 2.0054\). Consequently, \(MMSE\left( {\hat{\beta }_{ILTE} } \right) - MMSE\left( {\hat{\beta }_{PRTE} } \right)\) is a pd matrix for \(0 < k < 2.0054\).

Some concluding remarks

In this article, we defined a new general class of estimators, named the PRTE, as an alternative to the MLE and the other existing biased estimators in the presence of multicollinearity in PRMs. The PRTE is a general estimator which includes other biased estimators, such as the PRE, PLE, PHY and PSK estimators, as special cases. We proposed several rules for determining the function \(g\left( k \right)\). Using Monte Carlo simulations, the performance of the proposed PRTE was compared with that of the existing estimators in the smaller-EMSE sense. The results show that the proposed PRTE outperforms the existing estimators in the case of high multicollinearity. In addition, the ILTEs and PRTE were compared in a general simulation study according to the values of the biasing parameter k, and the PRTE was observed to be superior at small values of k. Although the PRTE and ILTE(PRE) both depend on the PRE, the main advantage of the PRTE over the ILTE(PRE) is that it can minimize the SMSE function with the help of a linear function of the biasing parameter k. The estimators were also applied to a real dataset, and the results were consistent with the simulation study. Under the experimental conditions examined, the proposed biased estimator outperforms the other existing biased estimators. Therefore, based on the results of the simulations and the example, the PRTEs are recommended to practitioners when there is a multicollinearity problem in PRMs.