Abstract
We consider parameter estimation in parametric regression models with covariates missing at random. This problem admits a semiparametric maximum likelihood approach which requires no parametric specification of the selection mechanism or the covariate distribution. The semiparametric maximum likelihood estimator (MLE) has been found to be consistent. We show here, for some specific models, that the semiparametric MLE converges weakly to a zero-mean Gaussian process in a suitable space. The regression parameter estimate, in particular, achieves the semiparametric information bound, which can be consistently estimated by perturbing the profile log-likelihood. Furthermore, the profile likelihood ratio statistic is asymptotically chi-squared. The techniques used here extend to other models.
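The three asymptotic claims above can be written out as follows. This is a sketch in assumed notation, not the paper's own statement: $\theta$ is the regression parameter of dimension $d$, $\eta$ collects the nonparametric nuisance components, $\ell_n(\theta,\eta)$ is the log-likelihood of $n$ observations, $\tilde I$ is the efficient information for $\theta$, $v$ is a fixed direction, and $h_n$ is a perturbation size with $h_n \to 0$ and $(\sqrt{n}\,h_n)^{-1} = O_P(1)$. The curvature formula is the standard second-difference form for profile likelihoods; the regularity conditions under which it holds are not restated here.

```latex
% Profile log-likelihood: maximize over the nuisance component (assumed notation)
\mathrm{pl}_n(\theta) = \sup_{\eta} \ell_n(\theta, \eta),
\qquad
\hat\theta_n = \arg\max_{\theta} \mathrm{pl}_n(\theta).

% Efficiency: the regression parameter estimate attains the information bound
\sqrt{n}\,\bigl(\hat\theta_n - \theta_0\bigr)
  \xrightarrow{\;d\;} N\bigl(0, \tilde I^{-1}\bigr).

% Estimating the information bound by perturbing the profile log-likelihood
-\,\frac{2\,\bigl[\mathrm{pl}_n(\hat\theta_n + h_n v) - \mathrm{pl}_n(\hat\theta_n)\bigr]}{n\,h_n^{2}}
  \xrightarrow{\;P\;} v^{\top} \tilde I\, v .

% Profile likelihood ratio statistic under H_0 : \theta = \theta_0
2\,\bigl[\mathrm{pl}_n(\hat\theta_n) - \mathrm{pl}_n(\theta_0)\bigr]
  \xrightarrow{\;d\;} \chi^{2}_{d}.
```

Taking $v$ along each coordinate direction, and along sums of pairs of coordinate directions, recovers the full matrix $\tilde I$ from these scalar quadratic forms.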
Cite this article
Zhang, Z., Rockette, H.E. Semiparametric Maximum Likelihood for Missing Covariates in Parametric Regression. AISM 58, 687–706 (2006). https://doi.org/10.1007/s10463-006-0047-7
DOI: https://doi.org/10.1007/s10463-006-0047-7