Abstract
In this paper, we consider a general regression model where missing data occur in the response and in the covariates. Our aim is to estimate the marginal distribution function and a marginal functional, such as the mean, the median or any \(\alpha\)-quantile of the response variable. A missing at random condition is assumed in order to avoid the bias that would arise in the estimation of the marginal measures under a non-ignorable missing mechanism. We give two different approaches for the estimation of the distribution function of the responses and of a given marginal functional, involving inverse probability weighting and the convolution of the distribution function of the observed residuals and that of the observed estimated regression function. Through a Monte Carlo study and two real data sets, we illustrate the behaviour of our proposals.
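To make the inverse probability weighting idea concrete, the following is a minimal sketch and not the estimator studied in the paper. It assumes the probability of observing each response, here called `pi`, is known (in practice it would have to be estimated), and it uses a simulated model chosen only for illustration. The function names `ipw_cdf` and `ipw_quantile`, the regression function, and the missingness model are all hypothetical.

```python
import numpy as np


def ipw_cdf(y, delta, pi, t):
    """Normalized inverse-probability-weighted estimate of P(Y <= t).

    y     : responses (values where delta is 0 are never used)
    delta : 1/True if the response is observed, 0/False if missing
    pi    : probability of observing each response (assumed known here)
    """
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=bool)
    w = delta / np.asarray(pi, dtype=float)      # weight is 0 for missing cases
    indicator = np.where(delta, y <= t, False)   # missing responses contribute 0
    return float(np.sum(w * indicator) / np.sum(w))


def ipw_quantile(y, delta, pi, alpha):
    """Smallest observed y whose weighted distribution function is >= alpha."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=bool)
    y_obs = y[delta]
    w_obs = 1.0 / np.asarray(pi, dtype=float)[delta]
    order = np.argsort(y_obs)
    cum = np.cumsum(w_obs[order]) / np.sum(w_obs)
    idx = min(int(np.searchsorted(cum, alpha, side="left")), len(cum) - 1)
    return float(y_obs[order][idx])


# Illustrative simulation (hypothetical model, not taken from the paper).
rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(0.0, 1.0, size=n)
y = 2.0 * x + rng.normal(0.0, 0.5, size=n)   # Y is symmetric about 1
pi = 0.3 + 0.6 * x                            # missingness depends on x only (MAR)
delta = rng.uniform(size=n) < pi

ipw_at_1 = ipw_cdf(y, delta, pi, 1.0)         # targets P(Y <= 1) = 0.5
naive_at_1 = float(np.mean(y[delta] <= 1.0))  # complete-case estimate, biased here
ipw_median = ipw_quantile(y, delta, pi, 0.5)  # targets the median, 1
```

In this simulated setting large values of `x` are observed more often, so the complete-case estimate under-represents small responses, while the weighted version corrects for this under the missing at random assumption.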
Acknowledgements
The authors wish to thank two anonymous referees for valuable comments which led to an improved version of the original paper. This work was partially developed while Ana M. Bianco and Graciela Boente were visiting the Departamento de Estatística, Análise Matemática e Optimización of the Universidad de Santiago de Compostela, Spain, under the bilateral agreement between the Universidad de Buenos Aires and the Universidad de Santiago de Compostela. This research was partially supported by Grants PICT 2014-0351 from ANPCYT and 20020130100279BA from the Universidad de Buenos Aires, Argentina, and also by the Spanish Projects MTM2013-41383P and MTM2016-76969P from the Ministry of Science and Innovation, Spain. A. Bianco and G. Boente also wish to thank the Minerva Foundation for its support to present some of the results of this paper at the International Conference on Robust Statistics 2017.
Cite this article
Bianco, A.M., Boente, G., González-Manteiga, W. et al. Plug-in marginal estimation under a general regression model with missing responses and covariates. TEST 28, 106–146 (2019). https://doi.org/10.1007/s11749-018-0591-5
DOI: https://doi.org/10.1007/s11749-018-0591-5
Keywords
- Fisher consistency
- Kernel weights
- L-estimators
- Marginal functionals
- Missing at random
- Semiparametric models