On best unbiased prediction and its relationships to unbiased estimation
Introduction
Standard estimation theory deals with estimation of a function h(θ) of an unknown parameter vector θ based on the data y, postulating that y is the observed value of a random vector Y with density f(y; θ). That theory is well developed (e.g., Rao, 1973; Lehmann, 1983), but it does not apply when the quantity of interest is not a function of θ alone. Such situations arise in prediction under mixed linear models, estimation under superpopulation models, loss estimation, estimation after selection, the species problem, and other applications (see Bjornstad (1996), and Section 4 for some specific examples). In this paper, we generalize unbiased estimation theory to cover such non-standard problems, and we find that a prediction framework is suitable for that purpose. Thus, we consider prediction of the realized but unobserved value, z, of a random variable Z based on the observed value, y, of a random vector Y, where the joint density of Y and Z is f(y, z; θ) and θ∈Θ is an unknown parameter vector. This includes an estimation problem as a special case, where Z = h(θ), i.e., the distribution of Z given y and θ is degenerate and independent of y.
It may appear from Hill (1990) and Bjornstad (1996) that the following problem is more general than our prediction problem. Let Y and W be two random vectors with joint density f(y, w; θ), and let Z = η(Y, W, θ) be a function of Y, W and θ. Here, Y is observed but W remains unobserved, and the problem is to predict the realized value of Z based on y. In this context, Hill (1990), Bjornstad (1996), and others have discussed some important concepts such as sufficiency, ancillarity, and the likelihood principle. We note, however, that although this framework covers standard estimation and prediction problems as special cases, it is not well defined, because any function of W can also be regarded as a function of (W, W+) for any set of additional random variables W+. Thus, the problem can also be stated in terms of the joint distribution of Y and (W, W+) and a function η+, where β denotes any additional parameters needed for complete specification of the family of joint distributions of Y and (W, W+), and η+ expresses the quantity of interest as a function of Y, (W, W+), θ and β. The prediction framework avoids this ambiguity by dropping W and focusing directly on the family of joint distributions of Y and Z.
We consider the prediction problem to be a natural and general formulation of a statistical inference problem, taking the view that a problem starts with the quantity z of inferential interest and the available data y; to relate y to z, we regard them as the realized values of two random observables Y and Z whose joint distribution belongs to a specified family of distributions. The parameter θ simply indexes this family. In some applications, consideration of other variables (W) may provide extra motivation and convenience for modeling the joint distribution of Y and Z, but it may not be necessary to consider W after the model (for Y and Z) is specified. We should note that a full statement of an inference problem also involves prior information, a loss function, and evaluation criteria, but a general discussion of their role and nature is not necessary for this paper.
It is well known that under minimum mean squared error, and several other criteria, the conditional mean of Z given y and θ, which we shall denote by γ(y, θ) = E[Z | y, θ], is the best predictor of Z (see Theorem 2.1, and Rao (1973, Section 4g.1)), provided that it is independent of θ. When γ(y,θ) depends on the unknown parameter θ, as is the case in most applications, a natural approach is to use an estimate of γ(y,θ) for predicting Z. Of course, various methods can be used for estimating γ(y,θ). In particular, one may (i) replace θ by an estimate in γ(y,θ), (ii) use the mean of an estimated conditional distribution of Z given y, or (iii) use an estimate of the mean of γ(Y,θ), denoted by ψ(θ) = E[γ(Y,θ) | θ]; note that ψ(θ) is also E[Z | θ]. For example, approach (iii) has been used in estimation following selection (e.g., Sackrowitz and Samuel-Cahn, 1984; Vellaisamy, 1993; Nayak, 1995), and in estimating the probability of discovering a new species in an additional selection (e.g., Robbins, 1968; Starr, 1979; Nayak, 1992, 1996). Note that if Z and Y are independent given θ, or more generally if γ(y,θ) is independent of y, then ψ(θ) = γ(y,θ), and approaches (i) and (iii) are similar. We shall provide some answers to the natural question: how should one estimate γ(y,θ) to obtain a good predictor of Z? However, our approach is to characterize and derive the optimum predictors directly, as in unbiased estimation theory. Thus, we also generalize some results on estimation.
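As a concrete check of approach (i), consider the simple setting (our own illustration, not an example from the paper) where Y = (X1, …, Xn) are iid N(θ, 1) and Z is an independent future draw from N(θ, 1): here γ(y, θ) = ψ(θ) = θ, and plugging the estimate X̄ into γ gives the predictor δ(Y) = X̄. A minimal simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 200_000

Y = rng.normal(theta, 1.0, size=(reps, n))   # observed data, iid N(theta, 1)
Z = rng.normal(theta, 1.0, size=reps)        # future draw, independent of Y given theta

# Here gamma(y, theta) = E[Z | y, theta] = theta, so psi(theta) = theta as well;
# approach (i) plugs the estimate theta_hat = ybar into gamma.
delta = Y.mean(axis=1)

bias = np.mean(delta - Z)           # overall bias, approx 0
mse = np.mean((delta - Z) ** 2)     # approx 1 + 1/n = 1.1
print(bias, mse)
```

The simulated MSE exceeds the irreducible Var(Z) = 1 by about 1/n, the cost of estimating θ.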
In this paper, we shall judge the performance of a predictor by its bias and mean squared error (MSE). Three different schemes of repeated applications of the predictor lead to the following measures of bias and MSE. A predictor δ(Y) of Z is said to be
(i) z-unbiased if E[δ(Y) | z, θ] = z for all z and θ;
(ii) y-unbiased if E[δ(Y) − Z | y, θ] = 0 for all y and θ;
(iii) unbiased if E[δ(Y) − Z | θ] = 0 for all θ.
The corresponding measures of MSE of δ(Y) are E[(δ(Y) − z)² | z, θ], E[(δ(Y) − Z)² | y, θ], and E[(δ(Y) − Z)² | θ], respectively. Clearly, each of z- and y-unbiasedness implies (overall) unbiasedness, but the converses need not be true, and neither of z- and y-unbiasedness implies the other. The fact that the conditional expectations in these definitions are defined uniquely only up to probability 1 will be taken into account in some later discussions. While the choice of bias and MSE measures depends on the specific application, our goal is to discuss the consequences of the various choices and their relationships.
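The distinction among these notions can be seen in a simple normal model (our own illustration): for Z a future draw from N(θ, 1) independent of Y, the predictor δ(Y) = X̄ is overall unbiased, but conditionally on z it centers on θ rather than z, so it is not z-unbiased. A simulation sketch, approximating the conditioning on z by a narrow window:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 10, 400_000
Y = rng.normal(theta, 1.0, size=(reps, n))
Z = rng.normal(theta, 1.0, size=reps)       # Z independent of Y given theta
delta = Y.mean(axis=1)

overall_bias = np.mean(delta - Z)           # (iii): approx 0, overall unbiased
print(overall_bias)

# (i) z-unbiasedness would require E[delta | z, theta] = z; condition on Z near 3.0:
near = np.abs(Z - 3.0) < 0.05
cond_mean = delta[near].mean()              # approx theta = 2.0, not 3.0
print(cond_mean)
```

Because Z carries no information about Y here, E[δ(Y) | z, θ] = θ for every z, so no choice of δ can be z-unbiased in this model.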
In Section 2, we first show that a minimum MSE predictor (without imposing unbiasedness) exists if and only if γ(y,θ) is independent of θ. As this does not hold in most applications, we then discuss uniformly minimum MSE unbiased prediction of Z. We present a characterization of the uniformly minimum MSE unbiased predictor (UMMSEUP) of Z (see Theorem 2.2), and a lower bound for the MSE of an unbiased predictor. We discuss some relationships between the UMMSEUP of Z and the UMVUE of ψ(θ), and present necessary and sufficient conditions for existence of the UMMSEUP of Z when the marginal model for Y admits a complete sufficient statistic. In Section 3, we discuss predictions based on the conditional measures of bias and MSE defined above. We show that (i) y-unbiasedness is not useful in most applications, (ii) if Y and Z are independent given θ, a z-unbiased predictor does not exist, and (iii) when the marginal model for Y admits a complete sufficient statistic T(Y) and γ(y,θ) depends on y only through T(y), the UMVUE of ψ(θ) is also the best z-unbiased predictor of Z, provided that a z-unbiased predictor exists. In Section 4, we apply our results to Robbins’ u–v method of estimation, prediction in mixed linear models, and estimation of a finite population mean under a transformation model.
Best predictor
For completeness, we first consider finding a uniformly minimum mean squared error predictor (UMMSEP) among all predictors by minimizing the overall MSE defined in (1.6). If δ(Y) is any predictor, its overall MSE decomposes as E[(δ(Y) − Z)² | θ] = E[(δ(Y) − γ(Y,θ))² | θ] + E[(Z − γ(Y,θ))² | θ], where the last expectation does not depend on δ. Using this decomposition, it can be seen that δ(Y) is a UMMSEP if and only if δ(Y) is a version of γ(Y,θ) for all θ, which can be true…
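This decomposition can be checked numerically. In a hypothetical model of our own choosing, Z | y ~ N(γ(y, θ), 1) with γ(y, θ) = (y + θ)/2 and an arbitrary predictor δ(y) = y, the cross term E[(δ(Y) − γ)(γ − Z) | θ] vanishes, so the two sides agree:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, reps = 1.0, 300_000
Yv = rng.normal(theta, 1.0, size=reps)
gamma = (Yv + theta) / 2.0                   # gamma(y, theta) = E[Z | y, theta]
Z = gamma + rng.normal(0.0, 1.0, size=reps)  # Z | y ~ N(gamma, 1)
delta = Yv                                   # an arbitrary predictor delta(Y) = Y

lhs = np.mean((delta - Z) ** 2)
rhs = np.mean((delta - gamma) ** 2) + np.mean((Z - gamma) ** 2)
print(lhs, rhs)   # both approx 1/4 + 1 = 1.25
```

Only the first term on the right depends on δ, which is why minimizing overall MSE amounts to matching γ(Y, θ) as closely as possible.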
Optimality under conditional measures
In this section we compare predictors based on the measures of bias and MSE defined conditionally on y and z, respectively. The effects of y-unbiasedness have essentially been discussed in Theorem 2.1. A y-unbiased predictor of Z exists only if γ(y,θ) is independent of θ, and hence efforts to find a best y-unbiased predictor will be fruitless, except in very special circumstances. So, we shall focus on using z-unbiasedness and z-MSE for choosing a predictor. Taking the null events into account…
Applications
Example 1 We use this simple example to illustrate how different criteria may lead to different results. Let X1, …, Xn, Xn+1 be iid N(θ, 1) and let Y = (X1, …, Xn). Here, the sample mean X̄ is a complete sufficient statistic. (a) Let Z = X1 − θ. Then γ(y, θ) = x1 − θ and ψ(θ) = 0. Here, the UMVUE of ψ(θ) is δ1(Y) = 0, and the UMMSEUP of Z is X1 − X̄ (by Theorem 2.4). It can be seen that the model is (z, θ)-identifiable and that a minimal prediction sufficient statistic can be exhibited…
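A quick simulation of part (a) (our own sketch; the labels δ1 and δ2 are ours, with δ2 = X1 − X̄ the competing unbiased predictor) shows why δ1(Y) = 0, though it is the UMVUE of ψ(θ), is a poor predictor of Z = X1 − θ: its MSE is about 1, versus about 1/n for δ2.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 0.5, 10, 200_000
X = rng.normal(theta, 1.0, size=(reps, n))
Z = X[:, 0] - theta                  # Z = X1 - theta; gamma(y,theta) = x1 - theta, psi(theta) = 0

delta1 = np.zeros(reps)              # UMVUE of psi(theta)
delta2 = X[:, 0] - X.mean(axis=1)    # competing unbiased predictor of Z

mse1 = np.mean((delta1 - Z) ** 2)    # approx 1
mse2 = np.mean((delta2 - Z) ** 2)    # approx 1/n = 0.1
print(mse1, mse2)
```

Both predictors are (overall) unbiased; the gap in MSE comes from δ2's error being θ − X̄, whose variance 1/n shrinks with the sample size.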
Discussion
Prediction is a basic problem in statistical inference. In this paper, we have made some contributions to minimum MSE unbiased prediction and its relationships to unbiased estimation. Further, as estimation is a special case of prediction, we have also generalized some results on unbiased estimation. We proved that the UMVUE of ψ(θ) is also the UMMSEUP of Z (and hence a good predictor of Z) if (i) Y and Z are independent given θ, or (ii) γ(y,θ) depends on y only through a complete sufficient statistic…
Acknowledgements
This research was supported by an ASA/USDA-NASS research fellowship. The author thanks Joseph L. Gastwirth, Mike Fleming, and two referees for some helpful comments.
References
- On statistical analysis of a sample from a population of unknown species. J. Statist. Plann. Inference (1992).
- Aitchison, J., Dunsmore, I.R., 1975. Statistical Prediction Analysis. Cambridge University Press, New…
- et al. An error-components model for prediction of county crop areas using survey and satellite data. J. Amer. Statist. Assoc. (1988).
- Predictive likelihood: a review (with discussion). Statist. Sci. (1990).
- On the generalization of the likelihood function and the likelihood principle. J. Amer. Statist. Assoc. (1996).
- Cassel, C., Sarndal, C., Wretman, J.H., 1977. Foundations of Inference in Survey Sampling. Wiley, New…
- et al. Admissibility of estimators of the probability of unobserved outcomes. Ann. Inst. Statist. Math. (1990).
- et al. On the frequentist properties of some hierarchical Bayes predictors in finite population sampling. Calcutta Statist. Assoc. Bull. (1990–91).
- et al. Bayesian prediction in linear models: applications to small area estimation. Ann. Statist. (1991).
- Conditional independence in statistical theory (with discussion). J. Roy. Statist. Soc. B (1979).
- Best linear unbiased prediction in the generalized regression model. J. Amer. Statist. Assoc.
- Extension of the Gauss–Markov theorem to include the estimation of random effects. Ann. Statist.
- Decomposition of prediction error. J. Amer. Statist. Assoc.
- A general framework for model-based statistics. Biometrika.