Semiparametric Maximum Likelihood for Missing Covariates in Parametric Regression

Annals of the Institute of Statistical Mathematics

Abstract

We consider parameter estimation in parametric regression models with covariates missing at random. This problem admits a semiparametric maximum likelihood approach which requires no parametric specification of the selection mechanism or the covariate distribution. The semiparametric maximum likelihood estimator (MLE) has been found to be consistent. We show here, for some specific models, that the semiparametric MLE converges weakly to a zero-mean Gaussian process in a suitable space. The regression parameter estimate, in particular, achieves the semiparametric information bound, which can be consistently estimated by perturbing the profile log-likelihood. Furthermore, the profile likelihood ratio statistic is asymptotically chi-squared. The techniques used here extend to other models.
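The idea of estimating the information bound by perturbing the profile log-likelihood can be illustrated with a minimal sketch. It is not the authors' model: as a stand-in it assumes a toy normal sample in which the mean plays the role of the parameter of interest and the variance is a finite-dimensional nuisance that is profiled out. The function names `profile_loglik` and `curvature_information`, the step size `h = n**-0.5`, and the simulated data are all assumptions made for illustration. The curvature estimate is the discretized second difference -2[pl(θ̂ + h) - pl(θ̂)] / (n h²), and the profile likelihood ratio statistic is 2[pl(θ̂) - pl(θ₀)].

```python
import numpy as np


def profile_loglik(theta, x):
    """Profile log-likelihood of a normal mean (toy stand-in model).

    For fixed theta, the nuisance variance is maximized out in closed form.
    """
    n = x.size
    sigma2_hat = np.mean((x - theta) ** 2)  # maximizer over the variance for this theta
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2_hat) + 1.0)


def curvature_information(pl, theta_hat, n, h):
    """Second-difference estimate of the information from a profile log-likelihood.

    Uses -2 * [pl(theta_hat + h) - pl(theta_hat)] / (n * h**2) for a scalar parameter.
    """
    return -2.0 * (pl(theta_hat + h) - pl(theta_hat)) / (n * h ** 2)


# Simulated data (hypothetical): true mean 1.0, true standard deviation 2.0.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)
n = x.size


def pl(t):
    return profile_loglik(t, x)


theta_hat = x.mean()   # maximizer of the profile log-likelihood in this toy model
h = n ** -0.5          # perturbation size shrinking at the root-n rate (an assumed choice)

info_hat = curvature_information(pl, theta_hat, n, h)
info_plugin = 1.0 / x.var()  # closed-form information for the mean in this toy model

# Profile likelihood ratio statistic for the hypothesis theta = 1.0;
# under the null it would be compared with a chi-squared(1) reference distribution.
lr_stat = 2.0 * (pl(theta_hat) - pl(1.0))

print(info_hat, info_plugin, lr_stat)
```

In this toy case the information is available in closed form, so the perturbation estimate can be checked against it directly. In a genuinely semiparametric model no such closed form is generally available, which is the situation where the perturbation approach is useful.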

Author information

Correspondence to Zhiwei Zhang.

About this article

Cite this article

Zhang, Z., Rockette, H.E. Semiparametric Maximum Likelihood for Missing Covariates in Parametric Regression. AISM 58, 687–706 (2006). https://doi.org/10.1007/s10463-006-0047-7

  • DOI: https://doi.org/10.1007/s10463-006-0047-7