Abstract
In this paper, we consider statistical inference for a class of partially linear models with high dimensional endogenous covariates when high dimensional instrumental variables are also available. A regularized estimation procedure is proposed for identifying the optimal instrumental variables and for estimating the covariate effects in both the parametric and nonparametric components. Under some regularity conditions, theoretical properties such as the consistency of the optimal instrumental variable identification and of the significant covariate selection are established. Furthermore, simulation studies and a real data analysis are carried out to examine the finite sample performance of the proposed method.
References
Cai, Z., & Xiong, H. (2012). Partially varying coefficient instrumental variables models. Statistica Neerlandica, 66, 85–110.
Card, D. (1995). Using geographic variation in college proximity to estimate the return to schooling. In L. Christofides, E. Grant, & R. Swidinsky (Eds.), Aspects of Labor Market Behaviour: Essays in Honour of John Vanderkamp (pp. 201–222). Toronto: University of Toronto Press.
Chen, B. C., Liang, H., & Zhou, Y. (2016). GMM estimation in partial linear models with endogenous covariates causing an over-identified problem. Communications in Statistics - Theory and Methods, 45, 3168–3184.
Didelez, V., Meng, S., & Sheehan, N. A. (2010). Assumptions of IV methods for observational epidemiology. Statistical Science, 25, 22–40.
Fan, J. Q., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fan, J. Q., & Li, R. Z. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association, 99, 710–723.
Fan, J. Q., & Liao, Y. (2014). Endogeneity in high dimensions. The Annals of Statistics, 42, 872–917.
Frank, I. E., & Friedman, J. H. (1993). A statistical view of some chemometrics regression tools. Technometrics, 35, 109–135.
Gao, X., & Huang, J. (2010). Asymptotic analysis of high-dimensional LAD regression with Lasso. Statistica Sinica, 20, 1485–1506.
Greenland, S. (2000). An introduction to instrumental variables for epidemiologists. International Journal of Epidemiology, 29, 722–729.
Hernan, M. A., & Robins, J. M. (2006). Instruments for causal inference: An epidemiologist's dream? Epidemiology, 17, 360–372.
Huang, J. T., & Zhao, P. X. (2017). QR decomposition based orthogonality estimation for partially linear models with longitudinal data. Journal of Computational and Applied Mathematics, 321, 406–415.
Huang, J. T., & Zhao, P. X. (2018). Orthogonal weighted empirical likelihood based variable selection for semiparametric instrumental variable models. Communications in Statistics-Theory and Methods, 47, 4375–4388.
Knight, K. (1998). Limiting distributions for \(L_{1}\) regression estimators under general conditions. The Annals of Statistics, 26, 755–770.
Koenker, R. (2005). Quantile Regression. Cambridge: Cambridge University Press.
Lee, E. R., Cho, J., & Yu, K. (2019). A systematic review on model selection in high-dimensional regression. Journal of the Korean Statistical Society, 48, 1–12.
Lin, W., Feng, R., & Li, H. Z. (2015). Regularization methods for high-dimensional instrumental variables regression with an application to genetical genomics. Journal of the American Statistical Association, 110, 270–288.
Liu, J. Y., Lou, L. J., & Li, R. Z. (2018). Variable selection for partially linear models via partial correlation. Journal of Multivariate Analysis, 167, 418–434.
Newhouse, J. P., & McClellan, M. (1998). Econometrics in outcomes research: the use of instrumental variables. Annual Review of Public Health, 19, 17–24.
Schumaker, L. L. (1981). Spline Functions: Basic Theory. New York: Wiley.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B, 58, 267–288.
Wang, H., Li, G., & Jiang, G. (2007). Robust regression shrinkage and consistent variable selection through the LAD-lasso. Journal of Business & Economic Statistics, 25, 347–355.
Wang, M. Q., Song, L. X., & Tian, G. L. (2015). SCAD-penalized least absolute deviation regression in high-dimensional models. Communications in Statistics-Theory and Methods, 44, 2452–2472.
Windmeijer, F., Farbmacher, H., Davies, N., & Smith, G. D. (2019). On the use of the Lasso for instrumental variables estimation with some invalid instruments. Journal of the American Statistical Association, 114, 1339–1350.
Xie, H., & Huang, J. (2009). SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics, 37, 673–696.
Xue, L. G., & Zhu, L. X. (2007). Empirical likelihood semiparametric regression analysis for longitudinal data. Biometrika, 94, 921–937.
Yang, Y. P., Chen, L. F., & Zhao, P. X. (2017). Empirical likelihood inference in partially linear single index models with endogenous covariates. Communications in Statistics-Theory and Methods, 46, 3297–3307.
Yuan, J. Y., Zhao, P. X., & Zhang, W. G. (2016). Semiparametric variable selection for partially varying coefficient models with endogenous variables. Computational Statistics, 31, 693–707.
Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894–942.
Zhao, P. X., & Li, G. R. (2013). Modified SEE variable selection for varying coefficient instrumental variable models. Statistical Methodology, 12, 60–70.
Zhao, P. X., & Xue, L. G. (2013). Empirical likelihood inferences for semiparametric instrumental variable models. Journal of Applied Mathematics and Computing, 43, 75–90.
Zou, H., & Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36, 1509–1533.
Acknowledgements
This research is supported by the National Social Science Foundation of China (No. 18BTJ035).
Ethics declarations
Conflicts of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Appendix. Proof of theorems
In this Appendix, we provide detailed proofs of Theorems 1–4.
Proof of Theorem 1
Let \(\delta _{n}=\sqrt{q_{n}/n}\) and \(\theta =\theta _{0}+\delta _{n} M\). We first show that, for any given \(\varepsilon >0\), there exists a large constant C such that
Let \(\varDelta _{n}(\theta )=Q_{n}(\theta )-Q_{n}(\theta _{0})\). Then, invoking \(\theta _{0k}=0\) for \(k\in \mathscr {A}_{2}\), \(p_{\lambda _{1n}}(0)=0\), and model (4), some simple calculations yield
We first consider \(I_{n1}\). From Knight (1998), we have the following identity:
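For reference, a standard statement of this identity, for generic reals \(x\ne 0\) and \(y\), is

```latex
|x - y| - |x|
  = -y\,[I(x > 0) - I(x < 0)]
  + 2\int_{0}^{y} \big[ I(x \le s) - I(x \le 0) \big]\, ds .
```

Applied with \(x=\varepsilon _{i}\), the first term on the right produces the linear term handled via condition (C3) below, and the integral term yields the quadratic contribution.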
Hence, we have
From condition (C3), we have \(E[I(\varepsilon _{i}>0)-I(\varepsilon _{i}<0)]=0\). Hence, invoking condition (C6), we can prove
Hence by the Markov inequality, we obtain
This implies that
Next we consider \(I_{n4}\). We denote
Then
Note that
Then we obtain
This implies \(I_{n5}=o_{p}(\delta _{n}^{2})\). In addition, by the dominated convergence theorem, we can obtain
Next we consider the term \(I_{n2}\). Invoking condition (C8), some calculations yield
Then, by choosing a sufficiently large C, the terms \(I_{n2}\), \(I_{n3}\) and \(I_{n5}\) are all dominated by \(I_{n6}\) with \(\Vert M\Vert =C\). Note that \(I_{n6}\) is positive; then, invoking (13)–(18), we obtain that (12) holds. Furthermore, by the convexity of \(Q_{n}(\cdot )\), we have
This implies, with probability at least \(1-\varepsilon\), that there exists a local minimizer \(\widehat{\theta }\) such that \(\widehat{\theta }-\theta _{0}=O_{p}(\delta _{n})\), which completes the proof of Theorem 1. \(\square\)
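In summary (a sketch in the notation of the proof, with \(C\) and \(\varepsilon\) as introduced above), the argument establishes the bound

```latex
P\Big\{ \inf_{\|M\| = C} Q_{n}(\theta_{0} + \delta_{n} M) > Q_{n}(\theta_{0}) \Big\} \ge 1 - \varepsilon ,
```

which, together with the convexity of \(Q_{n}(\cdot )\), forces a minimizer \(\widehat{\theta }\) into the ball \(\{\theta _{0}+\delta _{n}M:\Vert M\Vert \le C\}\) and hence gives the rate \(\widehat{\theta }-\theta _{0}=O_{p}(\delta _{n})=O_{p}(\sqrt{q_{n}/n})\).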
Proof of Theorem 2
For convenience and simplicity, let \(\theta _{0}=(\theta _{\mathscr {A}_{1}}^{T},\theta _{\mathscr {A}_{2}}^{T})^{T}\) with \(\theta _{\mathscr {A}_{1}}=\{\theta _{0k}:k\in \mathscr {A}_{1}\}\) and \(\theta _{\mathscr {A}_{2}}=\{\theta _{0k}:k\in \mathscr {A}_{2}\}\). The corresponding covariate is denoted by \(Z_{i}=(Z_{i}^{(1)T},Z_{i}^{(2)T})^{T}\). From the proof of Theorem 1, for a sufficiently large C, \(\widehat{\theta }\) lies in the ball \(\{\theta _{0}+\delta _{n}M:\Vert M\Vert \le C\}\) with probability converging to 1, where \(\delta _{n}=\sqrt{q_{n}/n}\). We denote \(\theta _{1}=\theta _{\mathscr {A}_{1}}+\delta _{n} M_{1}\) and \(\theta _{2}=\theta _{\mathscr {A}_{2}}+\delta _{n} M_{2}\) with \(\Vert M_{1}\Vert ^{2}+\Vert M_{2}\Vert ^{2}\le C^{2}\), and \(V_{n}(M_{1},M_{2})=Q_{n}(\theta _{1},\theta _{2})-Q_{n}(\theta _{\mathscr {A}_{1}},0)\), then the estimator \(\widehat{\theta }=(\widehat{\theta }_{1}^{T},\widehat{\theta }_{2}^{T})^{T}\) can also be obtained by minimizing \(V_{n}(M_{1},M_{2})\), except on an event with probability tending to zero. Hence, to prove this theorem, we only need to prove that, for any \(M_{1}\) and \(M_{2}\) satisfying \(\Vert M_{1}\Vert ^{2} +\Vert M_{2}\Vert ^{2}\le C^{2}\), if \(\Vert M_{2}\Vert >0\), then with probability tending to 1, we have
Note that
Similar to the proof of Theorem 1, we can obtain
In addition, for \(k\in \mathscr {A}_{2}\), we have \(\theta _{0k}=0\). Then, invoking \(p_{\lambda _{1n}}(0)=0\), we can derive
By conditions (C7) and (C8), we have \(\sqrt{q_{n}/n}/\lambda _{1n}\rightarrow 0\) and \(p'_{\lambda _{1n}}(0)/\lambda _{1n}>0\). Hence, (23) implies that (19) holds with probability tending to 1. This completes the proof of Theorem 2. \(\square\)
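The driving comparison (a sketch; constants are generic) is that for each \(k\in \mathscr {A}_{2}\) the penalty contributes a term of order

```latex
p_{\lambda_{1n}}(|\delta_{n} M_{2k}|)
  \approx p'_{\lambda_{1n}}(0)\,\delta_{n}|M_{2k}|
  \asymp \lambda_{1n}\,\delta_{n}|M_{2k}| ,
```

whereas the stochastic terms are \(O_{p}(\delta _{n}^{2})\). Since \(\delta _{n}/\lambda _{1n}=\sqrt{q_{n}/n}/\lambda _{1n}\rightarrow 0\), the penalty dominates whenever \(\Vert M_{2}\Vert >0\), which is exactly what (19) requires.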
Proof of Theorem 3
Note that Theorem 2 implies that the selection of the optimal instrumental variables is consistent. Hence, model (6) implies that, with probability tending to 1, \(X_{i}=\varGamma _{\mathscr {A}_{1}} Z_{i}^{*}+e_{i}\), \(i=1,\ldots ,n\). In addition, because \(\widehat{\varGamma }\) is the moment estimator of \(\varGamma _{\mathscr {A}_{1}}\), we can prove \(\widehat{\varGamma }=\varGamma _{\mathscr {A}_{1}}+O_{p}(\sqrt{p_{n}/n})\). Hence, invoking \(E(e_{i})=0\), a simple calculation yields
Furthermore, we let \(\beta _{0}\) and \(\gamma _{0}\) be the true values of \(\beta\) and \(\gamma\), respectively, and denote \(R(U_{i})=g(U_{i})-W_{i}^{T}\gamma _{0}\). Then from Schumaker (1981), we have \(\Vert R(U_{i})\Vert =O_{p}(\kappa _{n}^{-r})=O_{p}(\sqrt{\kappa _{n}/n})\). Hence, invoking (24), some calculations yield
where \(\delta _{n}=\sqrt{(p_{n}+\kappa _{n})/n}\). Furthermore, we denote \(\alpha _{0}=(\beta _{0}^{T},\gamma _{0}^{T})^{T}\) and \(\alpha =(\beta ^{T},\gamma ^{T})^{T}\) with \(\alpha =\alpha _{0}+\delta _{n} M\), where M is a \((p_{n}+L_{n})\) dimensional vector. Then (25) implies that
and
where \(\xi _{i}=(X_{i}^{T},W_{i}^{T})^{T}\). Furthermore, we let \(\varDelta _{n}(\beta ,\gamma )= M_{n}(\beta ,\gamma )-M_{n}(\beta _{0},\gamma _{0})\), then from (26) and (27), we have
Hence invoking (28), and using the similar arguments to the proof of (13), we have that, for any given \(\varepsilon >0\), there exists a large constant C such that
This implies, with probability at least \(1-\varepsilon\), that there exist local minimizers \(\widehat{\beta }\) and \(\widehat{\gamma }\) satisfying \(\widehat{\beta }-\beta _{0}=O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})\) and \(\widehat{\gamma }-\gamma _{0}=O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})\). This completes the proof of part (i) of Theorem 3. \(\square\)
In addition, invoking the proof of part (i) and using the same arguments as in the proof of Theorem 2, we can prove part (ii) of Theorem 3; hence we omit the details.
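Both parts rely on the spline approximation property from Schumaker (1981) (a standard statement; the smoothness order \(r\) is that assumed in the regularity conditions): for a function \(g\) with \(r\)-th order smoothness there exists a spline coefficient vector \(\gamma _{0}\) such that

```latex
\sup_{u \in [0,1]} \big| g(u) - B^{T}(u)\gamma_{0} \big| = O(\kappa_{n}^{-r}) ,
```

which is precisely the bound on \(R(U_{i})=g(U_{i})-W_{i}^{T}\gamma _{0}\) invoked above and again in the proof of Theorem 4.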
Proof of Theorem 4
A simple calculation yields
where \(R(u)=g(u)-B^{T}(u)\gamma _{0}\) and \(H=\int _{0}^{1}B(u)B^{T}(u)du\). From the proof of Theorem 3, we can obtain \(\Vert \widehat{\gamma }-\gamma _{0}\Vert =O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})\). Then, from condition (C7) and \(\kappa _{n}=O(n^{1/(2r+1)})\), we can prove \(O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})=O_{p}(\sqrt{\kappa _{n}/n})=O_{p}(n^{-r/(2r+1)})\). Then, invoking \(\Vert H\Vert =O(1)\), a simple calculation yields
In addition, from conditions (C1) and (C4) and Corollary 6.21 in Schumaker (1981), we can obtain \(R(u)=O(\kappa _{n}^{-r})=O(n^{-r/(2r+1)})\). Then, it is easy to show that
Invoking (29)–(31), we complete the proof of Theorem 4. \(\square\)
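The rate in Theorem 4 follows from a direct calculation (a sketch, with \(\kappa _{n}\asymp n^{1/(2r+1)}\) as in the proof):

```latex
\sqrt{\kappa_{n}/n} = \sqrt{n^{1/(2r+1) - 1}} = n^{-r/(2r+1)},
\qquad
\kappa_{n}^{-r} = n^{-r/(2r+1)},
```

so the estimation error and the spline approximation error are of the same order \(n^{-r/(2r+1)}\), the optimal nonparametric rate for functions with \(r\)-th order smoothness.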
Cite this article
Liu, C., Zhao, P. & Yang, Y. Regularization statistical inferences for partially linear models with high dimensional endogenous covariates. J. Korean Stat. Soc. 50, 163–184 (2021). https://doi.org/10.1007/s42952-020-00067-4
Keywords
- Partially linear model
- High dimensional endogenous covariates
- High dimensional instrumental variables
- Regularized estimation