Abstract
This paper introduces and analyses goodness-of-fit tests for quantile regression models when the response variable has missing observations. The proposals are based on empirical processes built under three approaches: using the gradient vector of the quantile function, using a linear projection of the covariates (suitable for high-dimensional settings), and using a projection of the estimating equations. In addition, two types of estimators of the null parametric model are considered. The performance of the test statistics is analysed in an extensive simulation study, and an application to real data is included.
References
Bahari F, Parsi S, Ganjali M (2019) Empirical likelihood inference in general linear models with missing values in response and covariates by MNAR mechanism. Stat Pap. https://doi.org/10.1007/s00362-019-01103-0
Bianco A, Boente G, González-Manteiga W, Pérez-González A (2011) Asymptotic behavior of robust estimators in partially linear models with missing responses: the effect of estimating the missing probability on the simplified marginal estimators. Test 20:524–548
Bierens HJ, Ginther DK (2001) Integrated conditional moment testing of quantile regression models. Empir Econ 26:307–324
Benoit DF, Alhamzawi R, Yu K (2013) Bayesian lasso binary quantile regression. Comput Stat 28:2861–2873
Chen X, Wan ATK, Zhou Y (2015) Efficient quantile regression analysis with missing observations. J Am Stat Assoc 110:723–741
Conde-Amboage M, Sánchez-Sellero C, González-Manteiga W (2015) A lack-of-fit test for quantile regression models with high-dimensional covariates. Comput Stat Data Anal 88:128–138
Cotos-Yáñez TR, Pérez-González A, González-Manteiga W (2016) Model checks for nonparametric regression with missing data: a comparative study. J Stat Comput Simul 86:3188–3204
Davino C, Furno M, Vistocco D (2014) Quantile regression: theory and applications. Wiley, Hoboken
Dong C, Li G, Feng X (2019) Lack-of-fit tests for quantile regression models. J R Stat Soc B. https://doi.org/10.1111/rssb.12321
Escanciano JC (2006) A consistent diagnostic test for regression models using projections. Econom Theory 22:1030–1051
Escanciano JC, Goh SC (2014) Specification analysis of linear quantile models. J Econom 178:495–507
Feng X, He X, Hu J (2011) Wild bootstrap for quantile regression. Biometrika 98:995–999
García-Portugués E, González-Manteiga W, Febrero-Bande M (2014) A goodness-of-fit test for the functional linear model with scalar response. J Comput Graph Stat 23:761–778
He X, Zhu L-X (2003) A lack-of-fit test for quantile regression. J Am Stat Assoc 98:1013–1022
Horvitz DG, Thompson DJ (1952) A generalization of sampling without replacement from a finite universe. J Am Stat Assoc 47:663–685
Hruschka ER, Hruschka ER Jr., Ebecken NFF (2003) Evaluating a nearest–neighbor method to substitute continuous missing values. AI 2003: advances in artificial intelligence, pp 723–734. Springer, Berlin
Huang Q, Zhang H, Chen J, He M (2017) Quantile regression models and their applications: a review. J Biom Biostat 8:2–6
Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
Koenker R, Bassett GS (1978) Regression quantiles. Econometrica 46:33–50
Otsu T (2008) Conditional empirical likelihood estimation and inference for quantile regression models. J Econom 142:508–538
Purwar A, Singh SK (2015) Hybrid prediction model with missing value imputation for medical data. Expert Syst Appl 42:5621–5631
Ruppert D, Wand MP (1994) Multivariate locally weighted least squares regression. Ann Stat 22:1346–1370
Shen Y, Liang HY (2018) Quantile regression and its empirical likelihood with missing response at random. Stat Pap 59:685–707
Sherwood B, Wang L, Zhou X (2013) Weighted quantile regression for analyzing health care cost data with missing covariates. Stat Med 32:4967–4979
Stute W (1997) Nonparametric model checks for regression. Ann Stat 25:613–641
Sun Z, Wang Q, Dai P (2009) Model checking for partially linear models with missing responses at random. J Multivar Anal 100:636–651
Sun Z, Chen F, Zhou X, Zhang Q (2017) Improved model checking methods for parametric models with responses missing at random. J Multivar Anal 154:147–161
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Wang CY, Wang S, Gutiérrez RG, Carroll RJ (1998) Local linear regression for generalized linear models with missing data. Ann Stat 26:1028–1050
Wei Y, Yang Y (2014) Quantile regression with covariates missing at random. Stat Sin 24:1277–1299
Xu W, Zhu L (2013) Testing the adequacy of varying coefficient models with missing responses at random. Metrika 76:53–69
Xu HX, Fan GL, Liang HY (2017) Hypothesis test on response mean with inequality constraints under data missing when covariables are present. Stat Pap 58:53–75
Yu K, Lu Z, Stander J (2003) Quantile regression: applications and current research areas. J R Stat Soc Ser D 52:331–350
Zheng JX (1998) A consistent nonparametric test of parametric regression models under conditional quantile restrictions. Econom Theory 14:123–138
Zhou Y, Wan ATK, Wang X (2008) Estimating equation inference with missing data. J Am Stat Assoc 103:1187–1199
Acknowledgements
The authors acknowledge the support of the Projects MTM2016-76969-P (AEI/FEDER, UE) by the Spanish Ministry of Economy and Competitiveness, MTM2017-89422-P by the Spanish Ministry of Economy, Industry and Competitiveness and the support of the Competitive Reference Groups, 2016–2019 (ED431C 2016/040) and 2017–2020 (ED431C 2017/38), supported by the Consellería de Cultura, Educación e Ordenación Universitaria, Xunta de Galicia. We would also like to thank the reviewers of the paper and the Associate Editor for their interesting comments, which have helped to improve the contents of this manuscript.
Appendices
Theoretical results
In order to derive the asymptotic properties of the empirical processes, it is crucial to obtain the following representations of the estimators \(\hat{\theta }_{S}\) and \(\hat{\theta }_{W}\) resulting from (7) and (8) respectively. The following hypotheses on the missing probability are required:
- H1. \(\inf _{x}p\left( x\right)>C_{0}>0\).
- H2. \(\sup _{x} \left| \hat{p}\left( x\right) -p\left( x\right) \right| \xrightarrow {a.s.} 0.\)
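As an illustration of H1 and H2, the following Python sketch estimates the missing probability with a Nadaraya-Watson smoother and evaluates the sup-norm error on a grid. The logistic form of `true_p`, the Gaussian kernel, the bandwidth \(n^{-1/5}\) and the uniform covariate design are assumptions made only for this example; they are not taken from the paper, which does not fix the estimator \(\hat{p}\) in this Appendix.

```python
import math
import random


def true_p(x):
    """Hypothetical missing model P(delta = 1 | X = x); it stays above 0.5, so H1 holds."""
    return 0.5 + 0.4 / (1.0 + math.exp(-x))


def nw_estimate(x0, xs, deltas, bandwidth):
    """Nadaraya-Watson estimate of p(x0) with a Gaussian kernel (an assumed choice)."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * d for w, d in zip(weights, deltas)) / sum(weights)


def sup_error(n, seed=0):
    """Approximate sup_x |p_hat(x) - p(x)| by the maximum over a grid on [-2, 2]."""
    rng = random.Random(seed)
    xs = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    deltas = [1 if rng.random() < true_p(x) else 0 for x in xs]
    bandwidth = n ** (-1.0 / 5.0)  # assumed rule-of-thumb rate
    grid = [-2.0 + 4.0 * k / 40 for k in range(41)]
    return max(abs(nw_estimate(g, xs, deltas, bandwidth) - true_p(g)) for g in grid)
```

Calling `sup_error(n)` for increasing `n` gives an empirical feel for H2 in this toy design; it does not replace a proof of almost sure uniform convergence.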
1.1 Previous lemmas
Lemma 1
Consider a random sample \(\{(X_i,Y_i,\delta _i)\}\), with \(i=1,\ldots ,n\), from model (11) and assume that \(f\left( \cdot |X\right) \) (the density of the error conditional on X) is bounded in a neighbourhood of zero with \(f\left( 0|X\right) >0\), and \(\left| f\left( t|X\right) -f\left( 0|X\right) \right| \le c\left| t\right| ^{1/2}\) for some \(c<\infty \). Assume also that there exist \(A\left( x\right) \) and \(B\left( x\right) \) such that
with bounded \(E\left( \left| A\left( X\right) \right| ^3\right) \), \(E\left( \left| h\left( X\right) A\left( X\right) \right| \right) \), \(E\left( \left| h\left( X\right) A\left( X\right) ^3\right| \right) \) and \(E\left( \left| B\left( X\right) \right| ^2\right) \). If H1 holds, then:
where \(\epsilon _{j}=Y_{j}-g\left( X_{j};\theta _0\right) \) and \(\mathbf S =E\left[ p\left( X\right) f\left( 0|X\right) \dot{g}\left( X;\theta _0\right) \dot{g}^T\left( X;\theta _0\right) \right] \).
Proof of Lemma 1
The proof of this lemma follows the arguments of Lemma 1 in He and Zhu (2003). \(\square \)
Lemma 2
Under the conditions of Lemma 1, assuming that H1 and H2 hold, it can be proved that:
where \(\mathbf S _W=E\left[ f\left( 0|X\right) \dot{g}\left( X;\theta _0\right) \dot{g}\left( X;\theta _0\right) ^T\right] \).
Proof of Lemma 2
Let \(\hat{\theta }_{W}\) be obtained by minimising in \(\theta \) the following expression:
Using directional derivatives of \( S\left( \theta \right) \) and following calculations similar to those of Lemma A.1 in He and Zhu (2003) leads to:
The right hand side of the previous expression can be expressed as
The number of residuals equal to zero, \(\left( Y_{i}=g\left( X_{i};\hat{\theta }_{W}\right) \right) \), is finite with probability 1. Considering the moment conditions on \(A\left( X\right) \) and the properties of p and \(\hat{p}\), the previous expression is bounded by \(o_{p}\left( \sqrt{n}\right) \). Then, (28) can be written as:
Using the conditions on p and \(\hat{p}\), it is easy to check that:
Denote by e a new variable with the same distribution as \(\epsilon \) (the error variable in model (1)), i.e. with distribution function F and density f. Define \(l\left( X_{i};\hat{\theta }_W\right) =g\left( X_{i};\hat{\theta }_W\right) -g\left( X_{i};\theta _0\right) -\frac{h\left( X_{i}\right) }{\sqrt{n}}\). Then \(Y_{i}-g\left( X_{i};\hat{\theta }_W\right) =\epsilon _{i}-l\left( X_{i};\hat{\theta }_W\right) \).
Consider now the following decomposition:
where \(\left( i\right) \) is obtained using (29). Now, applying the local expansions of functions F and g respectively, it can be proved that:
where \(\mathbf S =E\left[ f\left( 0|X\right) \dot{g}\left( X;\theta _0\right) \dot{g}^T\left( X;\theta _0\right) \right] \).
On the other hand, following the arguments proving A.3 in He and Zhu (2003) and the hypotheses on p, it can be shown that:
Thus, the asymptotic expression for \(\left( \hat{\theta }_W-\theta _0\right) \) is obtained. \(\square \)
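To make the difference between the two estimators concrete, the following Python sketch fits both in a toy one-parameter model. Since (7) and (8) are not reproduced in this Appendix, the sketch assumes that \(\hat{\theta }_{S}\) minimises \(\sum _i \delta _i \rho _\tau \left( Y_i-g\left( X_i;\theta \right) \right) \) and that \(\hat{\theta }_{W}\) minimises the same sum with weights \(\delta _i/p\left( X_i\right) \), which is consistent with the matrices \(\mathbf S \) and \(\mathbf S _W\) above. The model \(g\left( x;\theta \right) =\theta x\) with \(x>0\), the linear form of the missing probability, the use of the true p in place of \(\hat{p}\), and all function names are choices made only for this illustration.

```python
import math
import random


def weighted_quantile(values, weights, tau):
    """Return the smallest value whose cumulative weight share reaches tau."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= tau * total:
            return value
    return pairs[-1][0]


def fit_slope(xs, ys, deltas, probs, tau, weighted):
    """Minimise sum_i w_i * rho_tau(y_i - theta * x_i) over theta, with x_i > 0.

    Assumed weights: w_i = delta_i (simplified estimator) or
    w_i = delta_i / p(x_i) (weighted estimator). Because
    rho_tau(y - theta * x) = x * rho_tau(y / x - theta) for x > 0, a minimiser
    is a weighted tau-quantile of the ratios y_i / x_i with weights w_i * x_i.
    """
    ratios, weights = [], []
    for x, y, d, p in zip(xs, ys, deltas, probs):
        if d == 1:  # responses with delta = 0 are missing and never used
            ratios.append(y / x)
            weights.append(x / p if weighted else x)
    return weighted_quantile(ratios, weights, tau)


def simulate(n=2000, theta0=2.0, tau=0.5, seed=1):
    """Toy design: Y = theta0 * X + eps, eps ~ N(0, 1), responses missing at random."""
    rng = random.Random(seed)
    xs = [rng.uniform(1.0, 3.0) for _ in range(n)]
    probs = [0.9 - 0.2 * (x - 1.0) for x in xs]  # hypothetical p(x), in [0.5, 0.9]
    deltas = [1 if rng.random() < p else 0 for p in probs]
    ys = [theta0 * x + rng.gauss(0.0, 1.0) if d == 1 else None
          for x, d in zip(xs, deltas)]
    theta_s = fit_slope(xs, ys, deltas, probs, tau, weighted=False)
    theta_w = fit_slope(xs, ys, deltas, probs, tau, weighted=True)
    return theta_s, theta_w
```

Under the null model and responses missing at random, both values returned by `simulate()` should be close to `theta0`; the two estimators differ in their asymptotic representation, not in their limit.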
1.2 Main theorems
Theorem 1
Under the conditions of Lemma 1, assuming that H1 and H2 hold, the empirical processes \(R_{n}^{1}\) and \(R_{n,W}^{1}\) can be written as:
uniformly in t, where \(f\left( 0|X\right) \) denotes the conditional density of the error at zero and the matrices \(\mathbf S \left( t\right) \) and \(\mathbf S \) are defined as:
and similarly
uniformly in t, where:
Proof of Theorem 1
By some simple computations, it can be seen that
The result for \(R_{n}^{1}\) can be obtained in a similar way to He and Zhu (2003), under H1. For simplicity, only the result for \(R_{n,W}^1\) is presented. In this case, using arguments analogous to those in other papers on goodness-of-fit tests for regression with missing responses (see Sun et al. (2009), Xu and Zhu (2013) or Sun et al. (2017), among others) and assuming that H1 and H2 hold, it can be proved that:
The empirical process can be decomposed as:
where \(l\left( X_{i};\hat{\theta }_{W}\right) =g\left( X_{i};\hat{\theta }_{W}\right) -g\left( X_{i};\theta _0\right) -n^{-1/2}h\left( X_{i}\right) \). Similarly to (A.3) in He and Zhu (2003), under H1, the second part of the previous expression can be approximated by
uniformly in t. Moreover, the first addend in (30) can be approximated as
Replacing the asymptotic expression of \(\left( \hat{\theta }_{W}-\theta _0\right) \) obtained in Lemma 2, the following representation holds:
\(\square \)
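The processes studied in Theorem 1 can be illustrated numerically. The definitions of \(R_{n}^{1}\) and \(R_{n,W}^{1}\) belong to the main text and are not reproduced here, so the Python sketch below assumes the He and Zhu (2003) form adapted to missing responses: \(n^{-1/2}\sum _i w_i\,\psi _\tau \left( Y_i-g\left( X_i;\hat{\theta }\right) \right) \dot{g}\left( X_i;\hat{\theta }\right) I\left( X_i\le t\right) \), with \(\psi _\tau \left( r\right) =\tau -I\left( r<0\right) \), \(w_i=\delta _i\) for the simplified version and \(w_i=\delta _i/\hat{p}\left( X_i\right) \) for the weighted one. The scalar model \(g\left( x;\theta \right) =\theta x\) (so that \(\dot{g}=x\)) and the Cramér-von Mises type functional are further assumptions made for illustration.

```python
import math


def psi(residual, tau):
    """Quantile score function: tau - I(residual < 0)."""
    return tau - (1.0 if residual < 0 else 0.0)


def gradient_process(xs, ys, deltas, theta, tau, probs=None):
    """Evaluate the assumed gradient-marked process at every t = x_j.

    Model (an assumption of this sketch): g(x; theta) = theta * x, gradient x.
    If `probs` is given, each observed term is weighted by 1 / probs[i].
    """
    n = len(xs)
    marks = []
    for i in range(n):
        if deltas[i] == 1:  # only observed responses contribute
            weight = 1.0 / probs[i] if probs is not None else 1.0
            marks.append((xs[i], weight * psi(ys[i] - theta * xs[i], tau) * xs[i]))
    return [sum(m for x, m in marks if x <= t) / math.sqrt(n) for t in xs]


def cvm_statistic(process_values):
    """Cramér-von Mises type functional: average of the squared process values."""
    return sum(r * r for r in process_values) / len(process_values)
```

For example, with `xs = [1, 2, 3, 4]`, `ys = [1.5, None, 2.0, 5.0]`, `deltas = [1, 0, 1, 1]`, `theta = 1` and `tau = 0.5`, the process values are `[0.25, 0.25, -0.5, 0.5]`. Critical values would still have to be calibrated, for instance by a bootstrap procedure, which is outside the scope of this sketch.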
Theorem 2
Under the conditions of Lemma 1, assuming that H1 and H2 hold, the empirical processes \(R_{n}^{2}\) and \(R_{n,W}^{2}\) can be written as:
uniformly in \(\left( \beta ,u\right) \), where \(\mathbf S =E\left[ p\left( X\right) f\left( 0|X\right) \dot{g}\left( X;\theta _0\right) \dot{g}^T\left( X;\theta _0\right) \right] \) and \(\mathbf S \left( \beta ,u\right) =E\left[ p\left( X\right) f\left( 0|X\right) \dot{g}\left( X;\theta _0\right) \dot{g}^T\left( X;\theta _0\right) I\left( \beta ^TX\le u\right) \right] \).
uniformly in \(\left( \beta ,u\right) \), where \(\mathbf S _W=E\left[ f\left( 0|X\right) \dot{g}\left( X;\theta _0\right) \dot{g}^T\left( X;\theta _0\right) \right] \) and \(\mathbf S _W\left( \beta ,u\right) =E\left[ f\left( 0|X\right) \dot{g}\left( X;\theta _0\right) \dot{g}^T\left( X;\theta _0\right) I\left( \beta ^TX\le u\right) \right] .\)
Proof of Theorem 2
The proof of this theorem follows arguments similar to those of Theorem 1 in Conde-Amboage et al. (2015). \(\square \)
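The projection-based processes of Theorem 2 can be sketched in the same spirit. The Python code below assumes the form \(n^{-1/2}\sum _i w_i\,\psi _\tau \left( \hat{\epsilon }_i\right) I\left( \beta ^TX_i\le u\right) \), in line with Escanciano (2006) and Conde-Amboage et al. (2015), with \(w_i=\delta _i\) or \(w_i=\delta _i/\hat{p}\left( X_i\right) \). The exact definitions of \(R_{n}^{2}\) and \(R_{n,W}^{2}\) are not given in this Appendix, and averaging over randomly drawn directions is a Monte Carlo simplification introduced here, not the integration scheme of those papers.

```python
import math
import random


def psi(residual, tau):
    """Quantile score function: tau - I(residual < 0)."""
    return tau - (1.0 if residual < 0 else 0.0)


def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def projected_process(X, residuals, deltas, beta, u, tau, probs=None):
    """Assumed projected process evaluated at direction beta and threshold u."""
    total = 0.0
    for i, x in enumerate(X):
        if deltas[i] == 1 and dot(beta, x) <= u:  # observed and inside the half-space
            weight = 1.0 / probs[i] if probs is not None else 1.0
            total += weight * psi(residuals[i], tau)
    return total / math.sqrt(len(X))


def random_direction(dim, rng):
    """Draw a direction uniformly on the unit sphere via normalised Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def projected_cvm(X, residuals, deltas, tau, n_directions=50, seed=0, probs=None):
    """Average squared process over random directions and thresholds u = beta^T X_j."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_directions):
        beta = random_direction(len(X[0]), rng)
        for x in X:
            value = projected_process(X, residuals, deltas, beta, dot(beta, x), tau, probs)
            acc += value * value
    return acc / (n_directions * len(X))
```

Because the indicator depends on the covariates only through the scalar \(\beta ^TX\), the cost of one evaluation does not grow with the shape of a d-dimensional region, which is what makes this construction attractive when d is large.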
Extended simulation results
The design of the simulation study carried out in this work has been described in Sect. 3. Some partial results have been presented in the aforementioned section, which are now completed with the detailed results presented in this Appendix. Tables 8 and 9 provide the percentage of rejections for the tests based just on empirical processes, for \(d=1\) and \(d=2\) and different values of \(\alpha \). For tests based on empirical processes considering projections, Tables 10 and 11 present analogous results taking \(d=2\) and \(d=4\). Note that results corresponding to \(\alpha =0.05\) coincide with those reported in Sect. 3.
In addition, for \(d=2\), the behaviour of all the proposed tests has been compared. The results are shown in Tables 12, 13, and 14, for different values of a and sample size \(n=100\).
Finally, the behaviour of the tests when using the true missing model or an estimated one (p or \({\hat{p}}\)) is compared in Table 15, for different values of \(\alpha \) and \(n=100\).
Cite this article
Pérez-González, A., Cotos-Yáñez, T.R., González-Manteiga, W. et al. Goodness-of-fit tests for quantile regression with missing responses. Stat Papers 62, 1231–1264 (2021). https://doi.org/10.1007/s00362-019-01135-6