Simultaneous prediction in the generalized linear model

This paper studies prediction based on a composite target function that allows the actual and the mean values of the unobserved regressand in the generalized linear model to be predicted simultaneously. The best linear unbiased prediction (BLUP) of the target function is derived. We show that this BLUP has better properties than some other predictions, and simulations confirm its better finite sample performance.


Introduction
Generalized linear models have a long history in the statistical literature and have been used to analyze data from various branches of science on account of both mathematical and practical convenience. Consider the following generalized linear model:

$$y = X\beta + \varepsilon, \qquad y_0 = X_0\beta + \varepsilon_0, \tag{1}$$

where $y$ is the $n$-dimensional vector of observed data; $y_0$ is the $m$-dimensional vector of unobserved values that is to be predicted; $X$ and $X_0$ are $n \times p$ and $m \times p$ known matrices of explanatory variables. Let $\mathrm{rk}(A)$ denote the rank of matrix $A$ and suppose $\mathrm{rk}(X) \le p$; $\beta$ is the $p \times 1$ unknown vector of regression coefficients, and $\varepsilon$ and $\varepsilon_0$ are random errors with zero mean and covariance matrix

$$\mathrm{Cov}\begin{pmatrix}\varepsilon \\ \varepsilon_0\end{pmatrix} = \begin{pmatrix}\Sigma & V' \\ V & \Sigma_0\end{pmatrix},$$

where $\Sigma \ge 0$ and $\Sigma_0 \ge 0$ are known positive semi-definite matrices of arbitrary ranks.

The problem of predicting unobserved variables plays an important role in decision making and has received much attention in recent years. For the prediction of $y_0$ in model (1), [1] obtained the best linear unbiased predictor (BLUP) when $\Sigma > 0$. The Bayes and minimax predictions were obtained by [2] when the random errors are normally distributed. [3] and [4] derived the linear minimax prediction under a modified quadratic loss function. [5] considered the optimal Stein-rule prediction. [6] reviewed the existing theory of minimum mean squared error loss predictors and made an extension based on the principle of equivariance. [7] investigated the admissibility of linear predictors with inequality constraints under the mean squared error loss function.

Another subject of interest in prediction is the mean of $y_0$, since [8] showed that the best predictor of $y_0$ under the criterion of minimum mean squared error is the conditional mean. In model (1), prediction of the mean value of $y_0$ (namely $Ey_0 = X_0\beta$) relates naturally to plug-in estimators of the parameter $\beta$. [9] proposed the simple projection predictor (SPP) of $X_0\beta$ by plugging in the best linear unbiased estimator (BLUE) of $\beta$. [10,11] considered plugging in the estimator of $\beta$ obtained under the balanced loss function.
The plug-in approach spawned a large literature for the derivation of combined prediction, see [12][13][14].
Generally, predictions are investigated either for $y_0$ or for $Ey_0$, one at a time. However, in fields such as medicine and economics, people sometimes would like to know the actual value of $y_0$ and its mean value $Ey_0$ simultaneously. For example, in financial markets, some investors may want to know the actual profit while others are more interested in the mean profit. Therefore, to meet these different requirements, the market manager should acquire the prediction of the actual profit and the prediction of the mean profit simultaneously. Setting investors' demands aside and taking the point of view of a decision maker, the market manager needs to determine, based on empirical data, which prediction should be preferred, or else provide a combined prediction of both the actual and the mean profit. [15] gave other examples of practical situations where one is required to predict both the mean and the actual values of a variable. Under these circumstances, we consider predictions of the following target function:

$$\delta = \lambda y_0 + (1 - \lambda) E y_0, \tag{2}$$

where $\lambda \in [0, 1]$ is a non-stochastic weight scalar representing the preference between the prediction of the actual value and the prediction of the mean value of the studied variable. Note that $\delta = y_0$ if $\lambda = 1$ and $\delta = Ey_0$ if $\lambda = 0$, which means that predicting $\delta$ covers the prediction of $y_0$ and of $Ey_0$ simultaneously. If $0 < \lambda < 1$, then the prediction of $\delta$ balances the prediction of the actual and the average value of $y_0$. Moreover, an unbiased predictor of $\delta$ is also an unbiased predictor of $y_0$ and of $Ey_0$. Therefore, $\delta$ is a more flexible and inclusive target to study.

Studies on the prediction of $\delta$ have been carried out in the literature from various perspectives. The properties of predictors obtained by plugging in Stein-rule estimators have been studied by [16][17][18]. [19] investigated the Stein-rule prediction for $\delta$ in the linear regression model when the error covariance matrix is positive definite yet unknown. [20] studied the admissible prediction of $\delta$.
[21,22] and [23] considered predictors for $\delta$ in linear regression models with stochastic or non-stochastic linear constraints on the regression coefficients. The issues of simultaneous prediction in measurement error models have been addressed in [24] and [25]. [26] considered a scalar multiple of the classical prediction vector for the prediction of $\delta$ and discussed its performance properties.
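The remark that an unbiased predictor of the target function is also unbiased for the actual and for the mean value can be checked in two lines. The following is a short sketch, assuming (as we read (1) and (2)) that $y_0 = X_0\beta + \varepsilon_0$ with $E\varepsilon_0 = 0$, that $\delta = \lambda y_0 + (1-\lambda)Ey_0$, and that $Cy$ is any linear predictor:

```latex
\begin{align*}
E\delta &= \lambda\,E y_0 + (1-\lambda)\,E y_0 = E y_0 = X_0\beta, \\
E(Cy - \delta) = 0
  &\iff E(Cy) = X_0\beta
   \iff E(Cy - y_0) = 0
   \iff E(Cy - E y_0) = 0 .
\end{align*}
```

The weight $\lambda$ cancels in the first line, so the unbiasedness condition does not depend on which value of $\lambda$ is used.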
For model (1), most previous work has concerned biased prediction under $\Sigma > 0$ (including the special case $\Sigma = I$) and did not discuss the value of the weight scalar $\lambda$ in (2). In this paper, supposing $\Sigma \ge 0$, we study the best linear unbiased prediction (BLUP) of $\delta$ and compare it with the usual BLUPs of $y_0$ and $Ey_0$. We also propose a method for choosing the value of $\lambda$ in (2), which indicates, from finite sample data, which of the predictions of $\delta$, $y_0$ or $Ey_0$ should be provided.
The rest of the paper is organized as follows. In Section 2, we derive the BLUP of the target function (2) in the generalized linear model and discuss the efficiency of our BLUP compared with the usual BLUP and SPP. Simulation studies are provided in Section 3 to illustrate the determination of the weight scalar in our BLUP and the performance of the proposed BLUP compared with the other two predictors. Concluding remarks are given in Section 4.

The BLUP of δ and its efficiency
Denote by $\mathcal{LH} = \{Cy : C \text{ is an } m \times n \text{ matrix}\}$ the set of all homogeneous linear predictors of $y_0$, and by $\hat\delta_{\mathrm{BLUP}}$ the best linear unbiased predictor of $\delta$ in model (1). In this section, we first derive the expression of $\hat\delta_{\mathrm{BLUP}}$ in $\mathcal{LH}$ and then study its performance compared with the BLUP of $y_0$ and the SPP of $Ey_0$. All of the predictors discussed in this paper are derived under the criterion of minimum mean squared error. Some preliminaries and basic results are given as follows.

Definition 2.2. $\delta$ is linearly predictable if there exists a linear predictor $Cy$ in $\mathcal{LH}$ such that $Cy$ is an unbiased predictor of $\delta$.
Proof (of Lemma 2.3). From Definitions 2.1 and 2.2, there exists a matrix $C$ such that $E(Cy) = E\delta$ for any $\beta$, namely $CX = X_0$.

If not specified otherwise, the variables we aim to predict in this paper are all linearly predictable.
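The unbiasedness condition $CX = X_0$ can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it relies on the standard linear-algebra fact that $CX = X_0$ has a solution $C$ exactly when stacking $X_0$ under $X$ does not increase the rank, and the function names `is_linearly_predictable` and `unbiased_coefficient` are hypothetical.

```python
import numpy as np

def is_linearly_predictable(X, X0, tol=None):
    """Return True if some matrix C satisfies C @ X == X0 (up to round-off).

    Such a C exists exactly when the rows of X0 lie in the row space of X,
    i.e. when stacking X0 under X leaves the rank unchanged.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    X0 = np.atleast_2d(np.asarray(X0, dtype=float))
    rank_X = np.linalg.matrix_rank(X, tol=tol)
    rank_stacked = np.linalg.matrix_rank(np.vstack([X, X0]), tol=tol)
    return bool(rank_stacked == rank_X)

def unbiased_coefficient(X, X0):
    """One particular solution of C @ X == X0, namely C = X0 @ pinv(X).

    This C is only one of many unbiased choices; it is NOT the BLUP
    coefficient, which also depends on the error covariance.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    X0 = np.atleast_2d(np.asarray(X0, dtype=float))
    return X0 @ np.linalg.pinv(X)
```

When $X$ has full column rank the check always succeeds; it only matters in the rank-deficient case $\mathrm{rk}(X) < p$ that the paper allows.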

Lemma 2.4 ([27]). Suppose the $n \times n$ matrix $\Sigma \ge 0$ and let $X$ be an $n \times p$ matrix, then

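Lemma 2.5. In model (1), the BLUP of $y_0$ and the SPP of $Ey_0$ are, respectively, $\tilde y_{\mathrm{BLUP}}$ and $\tilde y_{\mathrm{SPP}}$. If $\Sigma > 0$ and $\mathrm{rk}(X) = p$ in model (1), the BLUP of $y_0$ and the SPP of $Ey_0$ are, respectively,

$$\hat y_{\mathrm{BLUP}} = X_0\hat\beta_{\mathrm{BLUE}} + V\Sigma^{-1}(y - X\hat\beta_{\mathrm{BLUE}}), \qquad \hat y_{\mathrm{SPP}} = X_0\hat\beta_{\mathrm{BLUE}},$$

where $\hat\beta_{\mathrm{BLUE}} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$.

Proof. The BLUPs of $y_0$ in Lemma 2.5 were derived by [1] and [28]. The SPPs of $Ey_0$ were derived by [9]. The BLUPs and SPPs are presented here for further comparison.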
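2.1 The best linear unbiased predictor of δ

Theorem 2.6. In model (1), the BLUP of $\delta$ in $\mathcal{LH}$ is $\hat\delta_{\mathrm{BLUP}}$.

Proof. Suppose $\hat\delta = Cy \in \mathcal{LH}$ is unbiased; then, by Lemma 2.3, $CX = X_0$. Denote by $R(\hat\delta; \beta)$ the risk of $\hat\delta$ and by $\mathrm{tr}(A)$ the trace of a square matrix $A$. We have

$$R(\hat\delta; \beta) = E(\hat\delta - \delta)'(\hat\delta - \delta) = E(C\varepsilon - \lambda\varepsilon_0)'(C\varepsilon - \lambda\varepsilon_0) = \mathrm{tr}(C\Sigma C') - 2\lambda\,\mathrm{tr}(CV') + \lambda^2\,\mathrm{tr}(\Sigma_0).$$

Minimizing $R(\hat\delta; \beta)$ is equivalent to solving the following optimization problem for $C$:

$$\min_{C}\ \mathrm{tr}(C\Sigma C') - 2\lambda\,\mathrm{tr}(CV') \quad \text{subject to } CX = X_0. \tag{3}$$

Let $\Lambda$ be a $p \times m$ Lagrange multiplier and construct the Lagrange function as

$$L(C, \Lambda) = \mathrm{tr}(C\Sigma C') - 2\lambda\,\mathrm{tr}(CV') - 2\,\mathrm{tr}\big((CX - X_0)\Lambda\big).$$

Setting $\partial L/\partial C = 0$ and $\partial L/\partial \Lambda = 0$, we have

$$C\Sigma - \lambda V - \Lambda' X' = 0$$

and

$$CX = X_0.$$

Corollary 2.7. If $\Sigma > 0$ and $\mathrm{rk}(X) = p$ in model (1), then the BLUP of $\delta$ is

$$\hat\delta_{\mathrm{BLUP}} = X_0\hat\beta_{\mathrm{BLUE}} + \lambda V\Sigma^{-1}(y - X\hat\beta_{\mathrm{BLUE}}),$$

where $\hat\beta_{\mathrm{BLUE}} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$.

Proof. If $\Sigma > 0$ and $\mathrm{rk}(X) = p$, then $X'\Sigma^{-1}X$ is nonsingular. With calculations similar to those in the proof of Theorem 2.6, the solution of (3) gives

$$C = X_0(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1} + \lambda V\Sigma^{-1}\big(I - X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\big),$$

and therefore $\hat\delta_{\mathrm{BLUP}} = X_0\hat\beta_{\mathrm{BLUE}} + \lambda V\Sigma^{-1}(y - X\hat\beta_{\mathrm{BLUE}})$.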
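The closed form in Corollary 2.7, $X_0\hat\beta_{\mathrm{BLUE}} + \lambda V\Sigma^{-1}(y - X\hat\beta_{\mathrm{BLUE}})$, is straightforward to compute. The following NumPy sketch covers only the case $\Sigma > 0$ with $\mathrm{rk}(X) = p$; it assumes that $V$ is the $m \times n$ cross-covariance between the errors of $y_0$ and $y$, and the function name `delta_blup` is hypothetical.

```python
import numpy as np

def delta_blup(y, X, X0, Sigma, V, lam):
    """Simultaneous predictor X0 b + lam * V Sigma^{-1} (y - X b).

    Assumes Sigma is positive definite and X has full column rank.
    b is the BLUE (generalized least squares estimate) of beta.
    lam = 1 gives the usual BLUP of y0, lam = 0 gives the SPP of E(y0).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    X0 = np.asarray(X0, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    V = np.asarray(V, dtype=float)

    # Solve linear systems instead of forming Sigma^{-1} explicitly.
    Sinv_X = np.linalg.solve(Sigma, X)            # Sigma^{-1} X
    Sinv_y = np.linalg.solve(Sigma, y)            # Sigma^{-1} y
    beta_blue = np.linalg.solve(X.T @ Sinv_X, X.T @ Sinv_y)

    resid = y - X @ beta_blue                     # GLS residuals
    correction = V @ np.linalg.solve(Sigma, resid)
    return X0 @ beta_blue + lam * correction
```

Because the predictor is linear in `lam`, every intermediate value of the weight gives a convex combination of the BLUP of $y_0$ and the SPP of $Ey_0$ computed from the same data.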
Theorem 2.8. For the prediction of (2) in model (1)

As for the choice of $\lambda$, the weight scalar should usually be given before predicting. Since $\lambda$ represents the weight attached to the prediction of $y_0$ and is not a parameter, there is no "true" value of it, only a suitable one. One method is to select $\lambda$ according to the forecaster's subjective preferences. For example, if the predictions of $y_0$ and $Ey_0$ are treated equally, then $\lambda = 0.5$. Another method is to determine $\lambda$ from the observed data $(y, X)$ in model (1). In this paper we recommend the leave-one-out cross-validation technique. To determine $\lambda$, we take $\hat\delta_{\mathrm{BLUP}}$ as the predictor of $y_0$, by Theorem 2.8, since the true $\beta$ in $Ey_0 = X_0\beta$ is unknown. Define $\hat\delta^{(-j)}(\lambda)$ to be the predictor of $y_j$ when the $j$th case of $(y, X)$ is deleted. The predicted residual sum of squares is defined as

$$CV(\lambda) = \sum_{j=1}^{n}\big(y_j - \hat\delta^{(-j)}(\lambda)\big)^2.$$

The chosen $\lambda$ is the one that minimizes $CV(\lambda)$ over $T$.

Simulations in Section 3 indicate that the leave-one-out cross-validation technique is feasible for the selection of $\lambda$. Through the selection of $\lambda$ from observed data, forecasters can determine which of $\hat\delta_{\mathrm{BLUP}}$, $\tilde y_{\mathrm{BLUP}}$ and $\tilde y_{\mathrm{SPP}}$ is more "suitable" to provide.
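The leave-one-out selection of $\lambda$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes $\Sigma > 0$ and full column rank after each deletion, it takes the cross-covariance between the deleted case and the remaining cases to be the corresponding row of $\Sigma$, and it uses a finite grid of candidate weights. The names `loo_cv` and `select_lambda` are hypothetical.

```python
import numpy as np

def loo_cv(y, X, Sigma, lambdas):
    """CV(lam) = sum_j (y_j - delta^(-j)(lam))^2 for each lam in `lambdas`.

    delta^(-j)(lam) = x_j' b + lam * v_j' S^{-1} (y_(-j) - X_(-j) b), where
    b is the BLUE from the data without case j, S is Sigma without row and
    column j, and v_j is row j of Sigma without its j-th entry (assumed
    cross-covariance between case j and the remaining cases).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    lambdas = np.asarray(lambdas, dtype=float)
    n = len(y)
    base = np.empty(n)   # x_j' b, the part that does not depend on lam
    corr = np.empty(n)   # v_j' S^{-1} resid, which is multiplied by lam
    for j in range(n):
        keep = np.delete(np.arange(n), j)
        Xk, yk = X[keep], y[keep]
        Sk = Sigma[np.ix_(keep, keep)]
        vj = Sigma[j, keep]
        Sinv_X = np.linalg.solve(Sk, Xk)
        beta = np.linalg.solve(Xk.T @ Sinv_X, Xk.T @ np.linalg.solve(Sk, yk))
        resid = yk - Xk @ beta
        base[j] = X[j] @ beta
        corr[j] = vj @ np.linalg.solve(Sk, resid)
    # errors[i, j] = y_j - (base_j + lambdas[i] * corr_j)
    errors = y[None, :] - base[None, :] - lambdas[:, None] * corr[None, :]
    return (errors ** 2).sum(axis=1)

def select_lambda(y, X, Sigma, lambdas):
    """Return the grid value minimizing CV(lam), together with all CV values."""
    lambdas = np.asarray(lambdas, dtype=float)
    cv = loo_cv(y, X, Sigma, lambdas)
    return lambdas[np.argmin(cv)], cv
```

A typical call would be `select_lambda(y, X, Sigma, np.linspace(0.0, 1.0, 1001))`. Since $CV(\lambda)$ is a quadratic function of $\lambda$ in this sketch, the minimizer over $[0, 1]$ is either an endpoint or a single interior point, so the $n$ leave-one-out fits are computed once and reused for every grid value.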

Remark 2.12. Theorem 2.10 and Corollary 2.11 show that $\hat\delta_{\mathrm{BLUP}}$ is better than $\tilde y_{\mathrm{BLUP}}$ under the criterion of covariance.
Theorem 2.13. For model (1)

Proof. Denote by $C_1$ and $C_2$ the matrices such that $\hat\delta_{\mathrm{BLUP}} = C_1 y$ and $\tilde y_{\mathrm{BLUP}} = C_2 y$. By the unbiasedness, $C_1 X = X_0$ and $C_2 X = X_0$. Therefore, Note that $D$ is a symmetric idempotent matrix and then we have Besides, Substituting (5) and (6) into (4), we have

By Lemma 2.4 and Theorem 2.13, we have

Corollary 2.14. In

Remark 2.15. Theorem 2.13 and Corollary 2.14 show that $\hat\delta_{\mathrm{BLUP}}$ is better than $\tilde y_{\mathrm{BLUP}}$ under the squared loss function as a predictor of $Ey_0$.
Theorem 2.16. For model (1), write $\hat\delta_{\mathrm{BLUP}} = C_1 y$, $\tilde y_{\mathrm{BLUP}} = C_2 y$ and $\tilde y_{\mathrm{SPP}} = X_0\tilde\beta = C_3 y$. By Lemma 2.3, $C_1 X = X_0$, $C_2 X = X_0$ and $C_3 X = X_0$. Since

Simulation studies
In this section, we conduct simulations to illustrate the selection of $\lambda$ in $\hat\delta_{\mathrm{BLUP}}$ and the finite sample performance of our simultaneous prediction compared with $\hat y_{\mathrm{BLUP}}$ and $\hat y_{\mathrm{SPP}}$. The data are generated from a model of the form (1), referred to below as model (7), with a specified error covariance matrix. We assume that $y$ is the observed sample, of size $n$, and that $y_0$, of size $m$, is to be predicted. In Section 3.1 we only need the sample data of $y$ to determine $\lambda$, while in Section 3.2 we use all the sample data of $y$ and $y_0$ for comparison across various values of $\lambda$. The elements of the corresponding matrices $X$ and $X_0$ are generated from a uniform distribution.

3.1 Selection of λ in $\hat\delta_{\mathrm{BLUP}}$
We set $\beta$ to be a one-dimensional parameter with true value 0.8. The number of simulated realizations for choosing $\lambda$ is 1000. In each simulation, $\lambda$ varies from 0 to 1 with step size 0.001. We use the leave-one-out cross-validation technique (see Section 2.1) to determine $\lambda$. Let $\lambda^*$ be the selected value of $\lambda$; then

$$\lambda^* = \arg\min_{\lambda \in T} CV(\lambda).$$

Simulations show that the relationship between $CV(\lambda)$ and $\lambda$ varies across realizations. Three of the simulations are presented in Figure 1 to illustrate the relation between $\lambda$ and $\log CV(\lambda)$. Subfigure (a) indicates that $\lambda^* = 1$ and $\hat y_{\mathrm{BLUP}}$ should be provided when predicting; (b) indicates that $\lambda^* = 0$ and $\hat y_{\mathrm{SPP}}$ should be preferred; (c) indicates an interior value $0 < \lambda^* < 1$, so that $\hat\delta_{\mathrm{BLUP}}$ should be provided when predicting. The relationship between $CV(\lambda)$ and $\lambda$ also tells us that there are three kinds of $\lambda^*$ in our simulations. Table 1 shows that, among the 1000 simulations, 267 select one boundary value of $\lambda$, 332 select the other boundary value, and 401 give $0 < \lambda^* < 1$. The simulation results show that the leave-one-out cross-validation technique is feasible for the selection of $\lambda$ and gives a way to answer the question "which of $\hat\delta_{\mathrm{BLUP}}$, $\hat y_{\mathrm{BLUP}}$ and $\hat y_{\mathrm{SPP}}$ is preferred given the observations".

For the comparison in Section 3.2, the data are generated from model (7). The value of $\lambda$ in $\hat\delta_{\mathrm{BLUP}}$ varies on a grid from 0.1 to 0.9. For each $\lambda$, the number of simulations is 1000. In each simulation, we compare $\hat\delta_{\mathrm{BLUP}}$, $\hat y_{\mathrm{BLUP}}$ and $\hat y_{\mathrm{SPP}}$. The sample means (sms), the standard deviations (stds) and the mean squares (mss) of $\hat\delta_{\mathrm{BLUP}} - y_0$, $\hat y_{\mathrm{BLUP}} - y_0$ and $\hat y_{\mathrm{SPP}} - y_0$ are reported in Table 2. The sms, the stds and the mss of $\hat\delta_{\mathrm{BLUP}} - X_0\beta$, $\hat y_{\mathrm{BLUP}} - X_0\beta$ and $\hat y_{\mathrm{SPP}} - X_0\beta$ are presented in Table 3.
From Table 2 and Table 3, we make the following observations:
(1) As for prediction precision, whatever the value of $\lambda$, the sample means (sms) of the prediction errors of $\hat y_{\mathrm{BLUP}}$, $\hat\delta_{\mathrm{BLUP}}$ and $\hat y_{\mathrm{SPP}}$ are all small. Comparison of the sms cannot tell which of the three predictors is better, yet the standard deviations (stds) and the mean squares (mss) of $\hat\delta_{\mathrm{BLUP}} - y_0$ are less than those of $\hat y_{\mathrm{SPP}} - y_0$.
(2) Whatever the value of $\lambda$, the sample means (sms) of $\hat y_{\mathrm{BLUP}} - X_0\beta$, $\hat\delta_{\mathrm{BLUP}} - X_0\beta$ and $\hat y_{\mathrm{SPP}} - X_0\beta$ are all small. Comparison of the sms cannot determine which predictor is better, yet the standard deviations (stds) and the mean squares (mss) of $\hat\delta_{\mathrm{BLUP}} - X_0\beta$ are less than those of $\hat y_{\mathrm{BLUP}} - X_0\beta$.
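A comparison of this kind can be reproduced in outline with a small Monte Carlo experiment. The sketch below is not the paper's simulation design: the sample sizes, the AR(1)-type joint covariance with parameter `rho`, the normal errors and the uniform range [0, 1] are all hypothetical choices made only for illustration; only the one-dimensional $\beta = 0.8$ comes from the text. The helper `delta_blup` implements the $\Sigma > 0$ closed form $X_0\hat\beta_{\mathrm{BLUE}} + \lambda V\Sigma^{-1}(y - X\hat\beta_{\mathrm{BLUE}})$.

```python
import numpy as np

def delta_blup(y, X, X0, Sigma, V, lam):
    """X0 b + lam * V Sigma^{-1} (y - X b), with b the BLUE (Sigma > 0)."""
    Sinv_X = np.linalg.solve(Sigma, X)
    b = np.linalg.solve(X.T @ Sinv_X, X.T @ np.linalg.solve(Sigma, y))
    return X0 @ b + lam * (V @ np.linalg.solve(Sigma, y - X @ b))

def simulate(lams=(0.0, 0.5, 1.0), n=30, m=5, beta=0.8, rho=0.8,
             reps=2000, seed=0):
    """Average squared prediction error against y0 and against E(y0).

    All design values here (n, m, rho, uniform range, normal errors)
    are illustrative assumptions, not the settings used in the paper.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(n + m)
    full_cov = rho ** np.abs(idx[:, None] - idx[None, :])  # joint covariance
    Sigma, V = full_cov[:n, :n], full_cov[n:, :n]
    chol = np.linalg.cholesky(full_cov)
    b = np.array([beta])
    mse_actual = {lam: 0.0 for lam in lams}  # error against y0
    mse_mean = {lam: 0.0 for lam in lams}    # error against X0 beta
    for _ in range(reps):
        X_all = rng.uniform(0.0, 1.0, size=(n + m, 1))
        eps = chol @ rng.standard_normal(n + m)
        X, X0 = X_all[:n], X_all[n:]
        y, y0 = X @ b + eps[:n], X0 @ b + eps[n:]
        # The same draw is reused for every lam, so comparisons are paired.
        for lam in lams:
            pred = delta_blup(y, X, X0, Sigma, V, lam)
            mse_actual[lam] += np.mean((pred - y0) ** 2) / reps
            mse_mean[lam] += np.mean((pred - X0 @ b) ** 2) / reps
    return mse_actual, mse_mean
```

Calling `mse_actual, mse_mean = simulate()` returns two dictionaries keyed by $\lambda$. Under this design, $\lambda = 0$ corresponds to the SPP and $\lambda = 1$ to the usual BLUP, so the two dictionaries show the trade-off between predicting the actual value and predicting the mean value as $\lambda$ moves between them.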

Conclusion
In this paper, we study prediction based on a composite target function that allows the actual and the mean values of the unobserved regressand in the generalized linear model to be predicted simultaneously. The BLUP of the target function is derived when the model error covariance is positive semi-definite. The BLUP is also an unbiased prediction of the actual and the mean values of the unobserved regressand. We propose the leave-one-out cross-validation technique to determine the value of the weight scalar in our prediction, which can help to provide a suitable prediction. As for the efficiency of the proposed BLUP, we show that it is better than the usual BLUP under the criterion of covariance and dominates it as a prediction of the mean value of the regressand. Besides, the proposed BLUP is better than the SPP as a prediction of the actual value of the regressand. Simulation studies illustrate the selection of the weight scalar in the proposed BLUP and show that it has better finite sample performance. Further research on simultaneous prediction is in progress.

Table 3. Finite sample performance, in terms of goodness of fit of the model, of $\hat y_{\mathrm{BLUP}}$, $\hat\delta_{\mathrm{BLUP}}$ (with $\lambda = 0.1, 0.2, \ldots, 0.9$) and $\hat y_{\mathrm{SPP}}$.