Abstract
In this paper, we consider statistical inference for a class of partially linear models with high dimensional endogenous covariates when high dimensional instrumental variables are also available. A regularized estimation procedure is proposed for identifying the optimal instrumental variables and for estimating the covariate effects in both the parametric and nonparametric components. Under some regularity conditions, theoretical properties such as the consistency of the optimal instrumental variable identification and of the significant covariate selection are established. Furthermore, simulation studies and a real data analysis are carried out to examine the finite sample performance of the proposed method.
References
Cai, Z., & Xiong, H. (2012). Partially varying coefficient instrumental variables models. Statistica Neerlandica, 66, 85–110.
Card, D. (1995). Using geographic variation in college proximity to estimate the return to schooling. In L. Christofides, E. Grant, & R. Swidinsky (Eds.), Aspects of Labor Market Behaviour: Essays in Honour of John Vanderkamp (pp. 201–222). Toronto: University of Toronto Press.
Chen, B. C., Liang, H., & Zhou, Y. (2016). GMM estimation in partial linear models with endogenous covariates causing an over-identified problem. Communications in Statistics - Theory and Methods, 45, 3168–3184.
Didelez, V., Meng, S., & Sheehan, N. A. (2010). Assumptions of IV methods for observational epidemiology. Statistical Science, 25, 22–40.
Fan, J. Q., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fan, J. Q., & Li, R. Z. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association, 99, 710–723.
Fan, J. Q., & Liao, Y. (2014). Endogeneity in high dimensions. The Annals of Statistics, 42, 872–917.
Frank, I. E., & Friedman, J. H. (1993). A statistical view of some chemometrics regression tools. Technometrics, 35, 109–135.
Gao, X., & Huang, J. (2010). Asymptotic analysis of high-dimensional LAD regression with Lasso. Statistica Sinica, 20, 1485–1506.
Greenland, S. (2000). An introduction to instrumental variables for epidemiologists. International Journal of Epidemiology, 29, 722–729.
Hernan, M. A., & Robins, J. M. (2006). Instruments for causal inference: An epidemiologist's dream? Epidemiology, 17, 360–372.
Huang, J. T., & Zhao, P. X. (2017). QR decomposition based orthogonality estimation for partially linear models with longitudinal data. Journal of Computational and Applied Mathematics, 321, 406–415.
Huang, J. T., & Zhao, P. X. (2018). Orthogonal weighted empirical likelihood based variable selection for semiparametric instrumental variable models. Communications in Statistics-Theory and Methods, 47, 4375–4388.
Knight, K. (1998). Limiting distributions for \(L_{1}\) regression estimators under general conditions. The Annals of Statistics, 26, 755–770.
Koenker, R. (2005). Quantile Regression. Cambridge: Cambridge University Press.
Lee, E. R., Cho, J., & Yu, K. (2019). A systematic review on model selection in high-dimensional regression. Journal of the Korean Statistical Society, 48, 1–12.
Lin, W., Feng, R., & Li, H. Z. (2015). Regularization methods for high-dimensional instrumental variables regression with an application to genetical genomics. Journal of the American Statistical Association, 110, 270–288.
Liu, J. Y., Lou, L. J., & Li, R. Z. (2018). Variable selection for partially linear models via partial correlation. Journal of Multivariate Analysis, 167, 418–434.
Newhouse, J. P., & McClellan, M. (1998). Econometrics in outcomes research: the use of instrumental variables. Annual Review of Public Health, 19, 17–24.
Schumaker, L. L. (1981). Spline Functions: Basic Theory. New York: Wiley.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B, 58, 267–288.
Wang, H., Li, G., & Jiang, G. (2007). Robust regression shrinkage and consistent variable selection through the LAD-lasso. Journal of Business & Economic Statistics, 25, 347–355.
Wang, M. Q., Song, L. X., & Tian, G. L. (2015). SCAD-penalized least absolute deviation regression in high-dimensional models. Communications in Statistics-Theory and Methods, 44, 2452–2472.
Windmeijer, F., Farbmacher, H., Davies, N., & Smith, G. D. (2019). On the use of the Lasso for instrumental variables estimation with some invalid instruments. Journal of the American Statistical Association, 114, 1339–1350.
Xie, H., & Huang, J. (2009). SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics, 37, 673–696.
Xue, L. G., & Zhu, L. X. (2007). Empirical likelihood semiparametric regression analysis for longitudinal data. Biometrika, 94, 921–937.
Yang, Y. P., Chen, L. F., & Zhao, P. X. (2017). Empirical likelihood inference in partially linear single index models with endogenous covariates. Communications in Statistics-Theory and Methods, 46, 3297–3307.
Yuan, J. Y., Zhao, P. X., & Zhang, W. G. (2016). Semiparametric variable selection for partially varying coefficient models with endogenous variables. Computational Statistics, 31, 693–707.
Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894–942.
Zhao, P. X., & Li, G. R. (2013). Modified SEE variable selection for varying coefficient instrumental variable models. Statistical Methodology, 12, 60–70.
Zhao, P. X., & Xue, L. G. (2013). Empirical likelihood inferences for semiparametric instrumental variable models. Journal of Applied Mathematics and Computing, 43, 75–90.
Zou, H., & Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36, 1509–1533.
Acknowledgements
This research is supported by the National Social Science Foundation of China (No. 18BTJ035).
Ethics declarations
Conflicts of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Appendix. Proof of theorems
In this Appendix, we provide detailed proofs of Theorems 1–4.
Proof of Theorem 1
Let \(\delta _{n}=\sqrt{q_{n}/n}\) and \(\theta =\theta _{0}+\delta _{n} M\). We first show that, for any given \(\varepsilon >0\), there exists a large constant C such that
Let \(\varDelta _{n}(\theta )=Q_{n}(\theta )-Q_{n}(\theta _{0})\). Then, invoking \(\theta _{0k}=0\) for \(k\in \mathscr {A}_{2}\), \(p_{\lambda _{1n}}(0)=0\), and model (4), some simple calculations yield
We first consider \(I_{n1}\). From Knight (1998), we have the following identity:
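For reference, a standard statement of this identity, for generic reals \(x\ne 0\) and \(y\), is

```latex
|x - y| - |x|
  = -y\,[I(x > 0) - I(x < 0)]
  + 2\int_{0}^{y} \big[ I(x \le s) - I(x \le 0) \big]\, ds .
```

Applied with \(x=\varepsilon _{i}\), the first term on the right produces the linear term handled via condition (C3) below, and the integral term yields the quadratic contribution.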
Hence, we have
From condition (C3), we have \(E[I(\varepsilon _{i}>0)-I(\varepsilon _{i}<0)]=0\). Hence, invoking condition (C6), we can prove
Hence by the Markov inequality, we obtain
This implies that
Next we consider \(I_{n4}\). We denote
Then
Note that
Then we obtain
This implies \(I_{n5}=o_{p}(\delta _{n}^{2})\). In addition, by the dominated convergence theorem, we can obtain
Next we consider the term \(I_{n2}\). Invoking condition (C8), some calculations yield
Then, by choosing a sufficiently large C, the terms \(I_{n2}\), \(I_{n3}\) and \(I_{n5}\) are all dominated by \(I_{n6}\) with \(\Vert M\Vert =C\). Note that \(I_{n6}\) is positive; then, invoking (13)–(18), we obtain that (12) holds. Furthermore, by the convexity of \(Q_{n}(\cdot )\), we have
This implies, with probability at least \(1-\varepsilon\), that there exists a local minimizer \(\widehat{\theta }\) such that \(\widehat{\theta }-\theta _{0}=O_{p}(\delta _{n})\), which completes the proof of Theorem 1. \(\square\)
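In summary (a sketch in the notation of the proof, with \(C\) and \(\varepsilon\) as introduced above), the argument establishes the bound

```latex
P\Big\{ \inf_{\|M\| = C} Q_{n}(\theta_{0} + \delta_{n} M) > Q_{n}(\theta_{0}) \Big\} \ge 1 - \varepsilon ,
```

which, together with the convexity of \(Q_{n}(\cdot )\), forces a minimizer \(\widehat{\theta }\) into the ball \(\{\theta _{0}+\delta _{n}M:\Vert M\Vert \le C\}\) and hence gives the rate \(\widehat{\theta }-\theta _{0}=O_{p}(\delta _{n})=O_{p}(\sqrt{q_{n}/n})\).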
Proof of Theorem 2
For convenience and simplicity, let \(\theta _{0}=(\theta _{\mathscr {A}_{1}}^{T},\theta _{\mathscr {A}_{2}}^{T})^{T}\) with \(\theta _{\mathscr {A}_{1}}=\{\theta _{0k}:k\in \mathscr {A}_{1}\}\) and \(\theta _{\mathscr {A}_{2}}=\{\theta _{0k}:k\in \mathscr {A}_{2}\}\). The corresponding covariate is denoted by \(Z_{i}=(Z_{i}^{(1)T},Z_{i}^{(2)T})^{T}\). From the proof of Theorem 1, for a sufficiently large C, \(\widehat{\theta }\) lies in the ball \(\{\theta _{0}+\delta _{n}M:\Vert M\Vert \le C\}\) with probability converging to 1, where \(\delta _{n}=\sqrt{q_{n}/n}\). We denote \(\theta _{1}=\theta _{\mathscr {A}_{1}}+\delta _{n} M_{1}\) and \(\theta _{2}=\theta _{\mathscr {A}_{2}}+\delta _{n} M_{2}\) with \(\Vert M_{1}\Vert ^{2}+\Vert M_{2}\Vert ^{2}\le C^{2}\), and \(V_{n}(M_{1},M_{2})=Q_{n}(\theta _{1},\theta _{2})-Q_{n}(\theta _{\mathscr {A}_{1}},0)\), then the estimator \(\widehat{\theta }=(\widehat{\theta }_{1}^{T},\widehat{\theta }_{2}^{T})^{T}\) can also be obtained by minimizing \(V_{n}(M_{1},M_{2})\), except on an event with probability tending to zero. Hence, to prove this theorem, we only need to prove that, for any \(M_{1}\) and \(M_{2}\) satisfying \(\Vert M_{1}\Vert ^{2} +\Vert M_{2}\Vert ^{2}\le C^{2}\), if \(\Vert M_{2}\Vert >0\), then with probability tending to 1, we have
Note that
Similar to the proof of Theorem 1, we can obtain
In addition, for \(k\in \mathscr {A}_{2}\), we have \(\theta _{0k}=0\). Then, invoking \(p_{\lambda _{1n}}(0)=0\), we can derive
By conditions (C7) and (C8), we have \(\sqrt{q_{n}/n}/\lambda _{1n}\rightarrow 0\) and \(p'_{\lambda _{1n}}(0)/\lambda _{1n}>0\). Hence, (23) implies that (19) holds with probability tending to 1. This completes the proof of Theorem 2. \(\square\)
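The driving comparison (a sketch; constants are generic) is that for each \(k\in \mathscr {A}_{2}\) the penalty contributes a term of order

```latex
p_{\lambda_{1n}}(|\delta_{n} M_{2k}|)
  \approx p'_{\lambda_{1n}}(0)\,\delta_{n}|M_{2k}|
  \asymp \lambda_{1n}\,\delta_{n}|M_{2k}| ,
```

whereas the stochastic terms are \(O_{p}(\delta _{n}^{2})\). Since \(\delta _{n}/\lambda _{1n}=\sqrt{q_{n}/n}/\lambda _{1n}\rightarrow 0\), the penalty dominates whenever \(\Vert M_{2}\Vert >0\), which is exactly what (19) requires.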
Proof of Theorem 3
Note that Theorem 2 implies that the selection of the optimal instrumental variables is consistent. Hence, model (6) implies that, with probability tending to 1, \(X_{i}=\varGamma _{\mathscr {A}_{1}} Z_{i}^{*}+e_{i}\), \(i=1,\ldots ,n\). In addition, because \(\widehat{\varGamma }\) is the moment estimator of \(\varGamma _{\mathscr {A}_{1}}\), we can prove \(\widehat{\varGamma }=\varGamma _{\mathscr {A}_{1}}+O_{p}(\sqrt{p_{n}/n})\). Hence, invoking \(E(e_{i})=0\), a simple calculation yields
Furthermore, we let \(\beta _{0}\) and \(\gamma _{0}\) be the true values of \(\beta\) and \(\gamma\), respectively, and denote \(R(U_{i})=g(U_{i})-W_{i}^{T}\gamma _{0}\). Then from Schumaker (1981), we have \(\Vert R(U_{i})\Vert =O_{p}(\kappa _{n}^{-r})=O_{p}(\sqrt{\kappa _{n}/n})\). Hence, invoking (24), some calculations yield
where \(\delta _{n}=\sqrt{(p_{n}+\kappa _{n})/n}\). Furthermore, we denote \(\alpha _{0}=(\beta _{0}^{T},\gamma _{0}^{T})^{T}\) and \(\alpha =(\beta ^{T},\gamma ^{T})^{T}\) with \(\alpha =\alpha _{0}+\delta _{n} M\), where M is a \((p_{n}+L_{n})\) dimensional vector. Then (25) implies that
and
where \(\xi _{i}=(X_{i}^{T},W_{i}^{T})^{T}\). Furthermore, we let \(\varDelta _{n}(\beta ,\gamma )= M_{n}(\beta ,\gamma )-M_{n}(\beta _{0},\gamma _{0})\), then from (26) and (27), we have
Hence invoking (28), and using the similar arguments to the proof of (13), we have that, for any given \(\varepsilon >0\), there exists a large constant C such that
This implies, with probability at least \(1-\varepsilon\), that there exist local minimizers \(\widehat{\beta }\) and \(\widehat{\gamma }\) satisfying \(\widehat{\beta }-\beta _{0}=O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})\) and \(\widehat{\gamma }-\gamma _{0}=O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})\). This completes the proof of part (i) of Theorem 3. \(\square\)
In addition, invoking the proof of part (i) and using the same arguments as in the proof of Theorem 2, we can prove part (ii) of Theorem 3; hence we omit the details.
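Both parts rely on the spline approximation property from Schumaker (1981) (a standard statement; the smoothness order \(r\) is that assumed in the regularity conditions): for a function \(g\) with \(r\)-th order smoothness there exists a spline coefficient vector \(\gamma _{0}\) such that

```latex
\sup_{u \in [0,1]} \big| g(u) - B^{T}(u)\gamma_{0} \big| = O(\kappa_{n}^{-r}) ,
```

which is precisely the bound on \(R(U_{i})=g(U_{i})-W_{i}^{T}\gamma _{0}\) invoked above and again in the proof of Theorem 4.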
Proof of Theorem 4
A simple calculation yields
where \(R(u)=g(u)-B^{T}(u)\gamma _{0}\) and \(H=\int _{0}^{1}B(u)B^{T}(u)du\). From the proof of Theorem 3, we can obtain \(\Vert \widehat{\gamma }-\gamma _{0}\Vert =O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})\). Then, from condition (C7) and \(\kappa _{n}=O(n^{1/(2r+1)})\), we can prove \(O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})=O_{p}(\sqrt{\kappa _{n}/n})=O_{p}(n^{-r/(2r+1)})\). Then, invoking \(\Vert H\Vert =O(1)\), a simple calculation yields
In addition, from conditions (C1) and (C4) and Corollary 6.21 in Schumaker (1981), we can obtain \(R(u)=O(\kappa _{n}^{-r})=O(n^{-r/(2r+1)})\). Then, it is easy to show that
Invoking (29)–(31), we complete the proof of Theorem 4. \(\square\)
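The rate in Theorem 4 follows from a direct calculation (a sketch, with \(\kappa _{n}\asymp n^{1/(2r+1)}\) as in the proof):

```latex
\sqrt{\kappa_{n}/n} = \sqrt{n^{1/(2r+1) - 1}} = n^{-r/(2r+1)},
\qquad
\kappa_{n}^{-r} = n^{-r/(2r+1)},
```

so the estimation error and the spline approximation error are of the same order \(n^{-r/(2r+1)}\), the optimal nonparametric rate for functions with \(r\)-th order smoothness.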
Cite this article
Liu, C., Zhao, P. & Yang, Y. Regularization statistical inferences for partially linear models with high dimensional endogenous covariates. J. Korean Stat. Soc. 50, 163–184 (2021). https://doi.org/10.1007/s42952-020-00067-4
Keywords
- Partially linear model
- High dimensional endogenous covariates
- High dimensional instrumental variables
- Regularized estimation