On best unbiased prediction and its relationships to unbiased estimation

https://doi.org/10.1016/S0378-3758(99)00152-4

Abstract

Let Y be a random vector and Z be a random variable with joint density f(y,z|θ), where θ∈Θ is a vector of unknown parameters. This paper discusses minimum mean squared error (MSE) unbiased prediction of Z based on Y, and its relationships to minimum variance unbiased estimation of ψ(θ)=E[Z|θ], the expected value of Z. A Rao–Cramér type lower bound for the MSE of an unbiased predictor is presented, and a characterization of uniformly minimum MSE unbiased predictors (UMMSEUP) is discussed. When Y and Z are independent given θ, the UMMSEUP of Z and the uniformly minimum variance unbiased estimator (UMVUE) of ψ(θ) are shown to be identical. If the marginal model {f(y|θ), θ∈Θ} admits a complete sufficient statistic T(Y), we prove that (a) the UMMSEUP of Z exists if and only if Z admits an unbiased predictor and there exist two functions k and h such that E[Z|y,θ]=k(y)+h(T(y),θ) with probability 1 for all θ∈Θ, and (b) the UMMSEUP of Z and the UMVUE of ψ(θ) are the same if and only if E[Z|y,θ] depends on y only through T(y) with probability 1. We also discuss optimum predictions when the bias and MSE are defined conditionally on y and z, respectively. The results are applied to Robbins' uv method of estimation, prediction in mixed linear models, and estimation of the mean of a finite population under a superpopulation model.

Introduction

Standard estimation theory deals with estimation of a function h(θ) of an unknown parameter vector θ based on the data y, postulating that y is the observed value of a random vector Y with density f(y|θ), θ∈Θ. That theory is well developed (e.g., Rao, 1973; Lehmann, 1983), but it does not apply when the quantity of interest is not a function of θ alone. Such situations arise in prediction under mixed linear models, estimation under superpopulation models, loss estimation, estimation after selection, the species problem, and in other applications (see Bjornstad (1996) and Section 4 for some specific examples). In this paper, we generalize unbiased estimation theory to cover such non-standard problems; a prediction framework turns out to be suitable for that purpose. Thus, we consider prediction of the realized but unobserved value, z, of a random variable Z based on the observed value, y, of a random vector Y, given that the joint density of Y and Z is f(y,z|θ), where θ∈Θ is an unknown parameter vector. This includes the estimation problem as a special case, where Z=h(θ), i.e., the distribution of Z given y and θ is degenerate and independent of y.

It may appear from Hill (1990) and Bjornstad (1996) that the following problem is more general than our prediction problem. Let Y and W be two random vectors with joint density f(y,w|θ), θ∈Θ, and let Z=η(Y,W,θ) be a function of Y, W and θ. Here, Y is observed but W remains unobserved, and the problem is to predict the realized value of Z based on y. In this context, Hill (1990), Bjornstad (1996), and others have discussed some important concepts such as sufficiency, ancillarity, and the likelihood principle. We note, however, that although this framework covers standard estimation and prediction problems as special cases, it is not well defined, because any function of W can also be regarded as a function of W∗=(W,W+) for any set of additional random variables W+. Thus, the problem can also be stated in terms of Y, W∗, θ, β, f(y,w∗|θ,β) and η∗, where β are some additional parameters needed for complete specification of the family of joint distributions of Y and W∗, and η∗(Y,W∗,θ,β)≡η(Y,W,θ) expresses the quantity of interest as a function of Y, W∗, θ and β. The prediction framework avoids this ambiguity by dropping W and focusing directly on the family of joint distributions of Y and Z.

We consider the prediction problem to be a natural and general formulation of a statistical inference problem, taking the view that a problem starts with the quantity z of inferential interest and the available data y; then, to relate y to z, we regard them as the realized values of two random observables Y and Z whose joint distribution belongs to a specified family of distributions. The parameter θ simply indexes this family. In some applications, consideration of other variables (W) may provide extra motivation and convenience for modeling the joint distribution of Y and Z, but it may not be necessary to consider W after the model (for Y and Z) is specified. We should note that a full statement of an inference problem also involves prior information, a loss function, and evaluation criteria. However, a general discussion of their role and nature is not necessary for this paper.

It is well known that under minimum mean squared error, and several other criteria, the conditional mean of Z given y and θ, which we shall denote by γ(y,θ)=E[Z|y,θ], is the best predictor of Z (see Theorem 2.1, and Rao (1973, Section 4g.1)), provided that it is independent of θ. When γ(y,θ) depends on the unknown parameter θ, as is the case in most applications, a natural approach is to use an estimate of γ(y,θ) for predicting Z. Of course, various methods can be used for estimating γ(y,θ). In particular, one may (i) replace θ by an estimate in γ(y,θ), (ii) use the mean of an estimate of f(z|y,θ), or (iii) use an estimate of the mean of γ(Y,θ), denoted by ψ(θ)=E[γ(Y,θ)|θ]; note that ψ(θ) is also E[Z|θ]. For example, approach (iii) has been used in estimation following selection (e.g., Sackrowitz and Samuel-Cahn, 1984; Vellaisamy, 1993; Nayak, 1995), and in estimating the probability of discovering a new species in an additional selection (e.g., Robbins, 1968; Starr, 1979; Nayak, 1992, 1996). Note that if Z and Y are independent given θ, or more generally if γ(y,θ) is independent of y, then ψ(θ)=γ(y,θ), and approaches (i) and (iii) are similar. We shall provide some answers to the natural question: how should one estimate γ(y,θ) to obtain a good predictor of Z? However, our approach is to characterize and derive the optimum predictors directly, as in unbiased estimation theory. Thus, we also generalize some results on estimation.
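To make approach (i) concrete, the following short Python simulation is an illustrative sketch only; the setting (an iid normal sample with a future observation playing the role of Z) and all numerical values are assumptions, not taken from the paper. In this setting γ(y,θ)=θ, so the plug-in predictor from approach (i) coincides with the UMVUE of ψ(θ)=θ from approach (iii).

```python
# Illustrative sketch (hypothetical setting, not from the paper):
# X_1,...,X_n, X_{n+1} iid N(theta, 1), Y = (X_1,...,X_n), Z = X_{n+1}.
# Here gamma(y, theta) = E[Z | y, theta] = theta, so approach (i) plugs the
# estimate thetahat = Xbar into gamma, giving the predictor Xbar, which is
# also the UMVUE of psi(theta) = E[Z | theta] = theta (approach (iii)).
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 200_000          # assumed values for the demo

x = rng.normal(theta, 1.0, size=(reps, n + 1))
y, z = x[:, :n], x[:, n]                   # observed sample and unobserved Z
delta = y.mean(axis=1)                     # plug-in predictor of Z

mse_plugin = np.mean((delta - z) ** 2)     # approximately 1 + 1/n
mse_oracle = np.mean((theta - z) ** 2)     # gamma(y, theta) itself; approx. 1
print(mse_plugin, mse_oracle)
```

The gap between the two estimated MSEs (about 1/n) reflects the price of estimating θ in γ(y,θ).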

In this paper, we shall judge the performance of a predictor by its bias and mean squared error (MSE). Three different schemes of repeated applications of the predictor lead to the following measures of bias and MSE. A predictor δ(Y) of Z is said to be

(i) z-unbiased if E[(δ−Z)|z,θ] = ∫[δ(y)−z] f(y|z,θ) dy = 0 for all θ, z,  (1.1)

(ii) y-unbiased if E[(δ−Z)|y,θ] = ∫[δ(y)−z] f(z|y,θ) dz = 0 for all θ, y,  (1.2)

(iii) unbiased if E[(δ−Z)|θ] = ∫∫[δ(y)−z] f(y,z|θ) dy dz = 0 for all θ.  (1.3)

The corresponding measures of MSE of δ(Y) are:

z-MSE(δ;θ,z) = E[(δ−Z)²|z,θ] = ∫[δ(y)−z]² f(y|z,θ) dy,  (1.4)

y-MSE(δ;θ,y) = E[(δ−Z)²|y,θ] = ∫[δ(y)−z]² f(z|y,θ) dz,  (1.5)

MSE(δ;θ) = E[(δ−Z)²|θ] = ∫∫[δ(y)−z]² f(y,z|θ) dy dz.  (1.6)

Clearly, each of z- and y-unbiasedness implies (overall) unbiasedness, but the converses need not be true, and neither of z- and y-unbiasedness implies the other. The fact that the conditional expectations in (1.1), (1.2), (1.4) and (1.5) are defined uniquely only up to probability 1 will be taken into account in some later discussions. While the choice of bias and MSE measures depends on the specific application, our goal is to discuss the consequences of the various choices and their relationships.
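The distinction among the three bias measures can be checked in a simple simulation. In the following Python sketch (an illustrative, assumed setting: the sample mean predicting an independent future normal observation), the predictor is unbiased in the overall sense (1.3) but is neither y-unbiased nor z-unbiased, since its conditional biases depend on the conditioning values.

```python
# Illustrative sketch (assumed setting): delta(Y) = Xbar predicting Z = X_{n+1},
# with X_1,...,X_{n+1} iid N(theta, 1).  Overall bias E[delta - Z | theta] = 0,
# but E[delta - Z | y, theta] = ybar - theta and, because Y and Z are
# conditionally independent given theta, E[delta - Z | z, theta] = theta - z;
# neither is identically zero, so delta is neither y-unbiased nor z-unbiased.
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.5, 5, 200_000           # assumed values for the demo

x = rng.normal(theta, 1.0, size=(reps, n + 1))
delta = x[:, :n].mean(axis=1)
z = x[:, n]

print("overall bias (Monte Carlo):", np.mean(delta - z))   # close to 0
print("y-bias at ybar = 1.2 (exact):", 1.2 - theta)         # nonzero
print("z-bias at z = 1.2 (exact):", theta - 1.2)            # nonzero
```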

In Section 2, we first show that a minimum MSE predictor (without imposing unbiasedness) exists if and only if γ(y,θ) is independent of θ. As this does not hold in most applications, we then discuss uniformly minimum MSE unbiased prediction of Z. We present a characterization of the UMMSEUP of Z (see Theorem 2.2), and a lower bound for the MSE of an unbiased predictor. We discuss some relationships between the UMMSEUP of Z and the UMVUE of ψ(θ), and present necessary and sufficient conditions for existence of the UMMSEUP of Z when the marginal model for Y admits a complete sufficient statistic. In Section 3, we discuss predictions based on the conditional measures of bias and MSE defined in (1.1), (1.2), (1.4) and (1.5). We show that (i) y-unbiasedness is not useful in most applications, (ii) if Y and Z are independent given θ, a z-unbiased predictor does not exist, and (iii) when the marginal model for Y admits a complete sufficient statistic T(Y) and f(z|y,θ) depends on y only through T(y), the UMVUE of ψ(θ) is also the best z-unbiased predictor of Z, provided that a z-unbiased predictor exists. In Section 4, we apply our results to Robbins’ uv method of estimation, predictions in mixed linear models, and estimation of a finite population mean under a transformation model.

Section snippets

Best predictor

For completeness, we first consider finding a uniformly minimum mean squared error predictor (UMMSEP) among all predictors by minimizing the overall MSE defined in (1.6). If δ(Y) and δ∗(Y) are two predictors, then

E[(δ−Z)²|θ] = E[(δ∗−Z)²|θ] + E[(δ−δ∗)²|θ] + 2E[(δ∗−Z)(δ−δ∗)|θ],  (2.1)

where the last expectation can also be expressed as

E[(δ∗−Z)(δ−δ∗)|θ] = E[(δ−δ∗) E{(δ∗−Z)|Y,θ}|θ] = E[(δ−δ∗)(δ∗−γ)|θ].  (2.2)

Using (2.1) and (2.2), it can be seen that δ∗(Y) is a UMMSEP if and only if δ∗(Y) is a version of γ(Y,θ) for all θ, which can be true
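The decomposition above is easy to verify numerically. The following Python sketch uses an assumed bivariate normal model (both means θ, unit variances, correlation ρ, all values arbitrary), for which γ(y,θ)=θ+ρ(y−θ), and checks that the cross term vanishes when δ∗ is taken to be a version of γ.

```python
# Numerical check (assumed bivariate normal setting, not from the paper) of
# E[(delta - Z)^2] = E[(gamma - Z)^2] + E[(delta - gamma)^2], where
# gamma(y, theta) = E[Z | y, theta]; the cross term in (2.1) vanishes because
# E[(gamma - Z) | Y, theta] = 0.
import numpy as np

rng = np.random.default_rng(2)
theta, rho, reps = 1.0, 0.6, 500_000        # assumed values for the demo

y = rng.normal(theta, 1.0, reps)
z = theta + rho * (y - theta) + rng.normal(0.0, np.sqrt(1 - rho**2), reps)

gamma = theta + rho * (y - theta)           # E[Z | y, theta] in this model
delta = y                                   # an arbitrary competing predictor

lhs = np.mean((delta - z) ** 2)
rhs = np.mean((gamma - z) ** 2) + np.mean((delta - gamma) ** 2)
print(lhs, rhs)                             # agree up to Monte Carlo error
```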

Optimality under conditional measures

In this section we discuss comparison of predictors based on measures of bias and MSE defined conditionally on y and z, respectively. Effects of y-unbiasedness have essentially been discussed in Theorem 2.1. A y-unbiased predictor of Z exists only if γ(y,θ) is independent of θ, and hence efforts to find a best y-unbiased predictor will be fruitless, except in very special circumstances. So, we shall focus on using z-unbiasedness and z-MSE for choosing a predictor. Taking the null events into

Applications

Example 1

We use this simple example to illustrate how different criteria may lead to different results. Let X1,…,Xn,Xn+1 be iid N(θ,1) and let Y=(X1,…,Xn). Here, X̄=(X1+⋯+Xn)/n is a complete sufficient statistic.

(a) Let Z=X1−θ. Then, γ(Y,θ)=E[Z|Y,θ]=X1−θ=Z and ψ(θ)=E[Z|θ]=0. Here, the UMVUE of ψ(θ) is δ1(Y)=0, and the UMMSEUP of Z is δ2(Y)=X1−X̄ (by Theorem 2.4). It can be seen that f(y|z,θ) is (z,θ)-identifiable, T=(X1, X̄₋₁) is a minimal prediction sufficient statistic, where X̄₋₁=(X2+⋯+Xn)/(n−1), and
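A brief simulation makes the comparison in part (a) concrete. In the Python sketch below (with arbitrary, assumed values of θ and n), both δ1 and δ2 are unbiased predictors of Z=X1−θ, but their MSEs are approximately 1 and 1/n, respectively, so the UMMSEUP δ2 is a far better predictor than the UMVUE δ1.

```python
# Simulation sketch of Example 1(a) (theta and n below are arbitrary choices):
# delta1(Y) = 0 is the UMVUE of psi(theta) = 0, and delta2(Y) = X_1 - Xbar is
# the UMMSEUP of Z = X_1 - theta.  Both are unbiased; MSE(delta1) ~ 1 while
# MSE(delta2) ~ 1/n.
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 1.5, 8, 300_000

x = rng.normal(theta, 1.0, size=(reps, n))
z = x[:, 0] - theta                         # realized but unobserved Z
delta1 = np.zeros(reps)                     # UMVUE of psi(theta)
delta2 = x[:, 0] - x.mean(axis=1)           # UMMSEUP of Z

for name, d in [("delta1", delta1), ("delta2", delta2)]:
    print(name, "bias:", np.mean(d - z), "MSE:", np.mean((d - z) ** 2))
```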

Discussion

Prediction is a basic problem in statistical inference. In this paper, we have made some contributions to minimum MSE unbiased prediction and its relationships to unbiased estimation. Further, as estimation is a special case of prediction, we have also generalized some results on unbiased estimation. We proved that the UMVUE of ψ(θ) is also the UMMSEUP of Z (and hence a good predictor of Z) if (i) Y and Z are independent given θ, or (ii) γ(y,θ) depends on y only through a complete sufficient

Acknowledgements

This research was supported by an ASA/USDA-NASS research fellowship. The author thanks Joseph L. Gastwirth, Mike Fleming, and two referees for some helpful comments.

References (34)

  • Nayak, T.K., 1992. On statistical analysis of a sample from a population of unknown species. J. Statist. Plann. Inference.
  • Aitchison, J., Dunsmore, I.R., 1975. Statistical Prediction Analysis. Cambridge University Press, New...
  • Battese, G.E., et al., 1988. An error-components model for prediction of county crop areas using survey and satellite data. J. Amer. Statist. Assoc.
  • Bjornstad, J.F., 1990. Predictive likelihood: a review (with discussion). Statist. Sci.
  • Bjornstad, J.F., 1996. On the generalization of the likelihood function and the likelihood principle. J. Amer. Statist. Assoc.
  • Cassel, C., Sarndal, C., Wretman, J.H., 1977. Foundations of Inference in Survey Sampling. Wiley, New...
  • Cohen, A., et al., 1990. Admissibility of estimators of the probability of unobserved outcomes. Ann. Inst. Statist. Math.
  • Datta, G.S., et al., 1990–91. On the frequentist properties of some hierarchical Bayes predictors in finite population sampling. Calcutta Statist. Assoc. Bull.
  • Datta, G.S., et al., 1991. Bayesian prediction in linear models: applications to small area estimation. Ann. Statist.
  • Dawid, A.P., 1979. Conditional independence in statistical theory (with discussion). J. Roy. Statist. Soc. B.
  • Geisser, S., 1993. Predictive Inference: An Introduction. Chapman & Hall, New...
  • Goldberger, A.S., 1962. Best linear unbiased prediction in the generalized regression model. J. Amer. Statist. Assoc.
  • Harville, D.A., 1976. Extension of the Gauss–Markov theorem to include the estimation of random effects. Ann. Statist.
  • Harville, D.A., 1985. Decomposition of prediction error. J. Amer. Statist. Assoc.
  • Hill, J.R., 1990. A general framework for model-based statistics. Biometrika.
  • Johnstone, I.M., 1988. On admissibility of unbiased estimates of loss. In: Gupta, S.S., Berger, J.O. (Eds.), ...
  • Lehmann, E.L., 1983. Theory of Point Estimation. Wiley, New...