
Regularization statistical inferences for partially linear models with high dimensional endogenous covariates

Research Article · Journal of the Korean Statistical Society

Abstract

In this paper, we consider statistical inference for a class of partially linear models with high dimensional endogenous covariates, when high dimensional instrumental variables are also available. A regularized estimation procedure is proposed for identifying the optimal instrumental variables and for estimating the covariate effects of the parametric and nonparametric components. Under some regularity conditions, theoretical properties are established, including the consistency of the optimal instrumental variable identification and of the significant covariate selection. Furthermore, simulation studies and a real data analysis are carried out to examine the finite sample performance of the proposed method.


References

  • Cai, Z., & Xiong, H. (2012). Partially varying coefficient instrumental variables models. Statistica Neerlandica, 66, 85–110.

  • Card, D. (1995). Using geographic variation in college proximity to estimate the return to schooling. In L. Christofides, E. Grant, & R. Swidinsky (Eds.), Aspects of Labor Market Behaviour: Essays in Honour of John Vanderkamp (pp. 201–222). Toronto: University of Toronto Press.

  • Chen, B. C., Liang, H., & Zhou, Y. (2016). GMM estimation in partial linear models with endogenous covariates causing an over-identified problem. Communications in Statistics - Theory and Methods, 45, 3168–3184.

  • Didelez, V., Meng, S., & Sheehan, N. A. (2010). Assumptions of IV methods for observational epidemiology. Statistical Science, 25, 22–40.

  • Fan, J. Q., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

  • Fan, J. Q., & Li, R. Z. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Journal of the American Statistical Association, 99, 710–723.

  • Fan, J. Q., & Liao, Y. (2014). Endogeneity in high dimensions. The Annals of Statistics, 42, 872–917.

  • Frank, I. E., & Friedman, J. H. (1993). A statistical view of some chemometrics regression tools. Technometrics, 35, 109–135.

  • Gao, X., & Huang, J. (2010). Asymptotic analysis of high-dimensional LAD regression with Lasso. Statistica Sinica, 20, 1485–1506.

  • Greenland, S. (2000). An introduction to instrumental variables for epidemiologists. International Journal of Epidemiology, 29, 722–729.

  • Hernan, M. A., & Robins, J. M. (2006). Instruments for causal inference: An epidemiologist's dream? Epidemiology, 17, 360–372.

  • Huang, J. T., & Zhao, P. X. (2017). QR decomposition based orthogonality estimation for partially linear models with longitudinal data. Journal of Computational and Applied Mathematics, 321, 406–415.

  • Huang, J. T., & Zhao, P. X. (2018). Orthogonal weighted empirical likelihood based variable selection for semiparametric instrumental variable models. Communications in Statistics - Theory and Methods, 47, 4375–4388.

  • Knight, K. (1998). Limiting distributions for \(L_{1}\) regression estimators under general conditions. The Annals of Statistics, 26, 755–770.

  • Koenker, R. (2005). Quantile Regression. Cambridge: Cambridge University Press.

  • Lee, E. R., Cho, J., & Yu, K. (2019). A systematic review on model selection in high-dimensional regression. Journal of the Korean Statistical Society, 48, 1–12.

  • Lin, W., Feng, R., & Li, H. Z. (2015). Regularization methods for high-dimensional instrumental variables regression with an application to genetical genomics. Journal of the American Statistical Association, 110, 270–288.

  • Liu, J. Y., Lou, L. J., & Li, R. Z. (2018). Variable selection for partially linear models via partial correlation. Journal of Multivariate Analysis, 167, 418–434.

  • Newhouse, J. P., & McClellan, M. (1998). Econometrics in outcomes research: The use of instrumental variables. Annual Review of Public Health, 19, 17–24.

  • Schumaker, L. L. (1981). Spline Functions: Basic Theory. New York: Wiley.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.

  • Wang, H., Li, G., & Jiang, G. (2007). Robust regression shrinkage and consistent variable selection through the LAD-Lasso. Journal of Business & Economic Statistics, 25, 347–355.

  • Wang, M. Q., Song, L. X., & Tian, G. L. (2015). SCAD-penalized least absolute deviation regression in high-dimensional models. Communications in Statistics - Theory and Methods, 44, 2452–2472.

  • Windmeijer, F., Farbmacher, H., Davies, N., & Smith, G. D. (2019). On the use of the Lasso for instrumental variables estimation with some invalid instruments. Journal of the American Statistical Association, 114, 1339–1350.

  • Xie, H., & Huang, J. (2009). SCAD-penalized regression in high-dimensional partially linear models. The Annals of Statistics, 37, 673–696.

  • Xue, L. G., & Zhu, L. X. (2007). Empirical likelihood semiparametric regression analysis for longitudinal data. Biometrika, 94, 921–937.

  • Yang, Y. P., Chen, L. F., & Zhao, P. X. (2017). Empirical likelihood inference in partially linear single index models with endogenous covariates. Communications in Statistics - Theory and Methods, 46, 3297–3307.

  • Yuan, J. Y., Zhao, P. X., & Zhang, W. G. (2016). Semiparametric variable selection for partially varying coefficient models with endogenous variables. Computational Statistics, 31, 693–707.

  • Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894–942.

  • Zhao, P. X., & Li, G. R. (2013). Modified SEE variable selection for varying coefficient instrumental variable models. Statistical Methodology, 12, 60–70.

  • Zhao, P. X., & Xue, L. G. (2013). Empirical likelihood inferences for semiparametric instrumental variable models. Journal of Applied Mathematics and Computing, 43, 75–90.

  • Zou, H., & Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. The Annals of Statistics, 36, 1509–1533.


Acknowledgements

This research is supported by the National Social Science Foundation of China (No. 18BTJ035).

Author information

Correspondence to Peixin Zhao.

Ethics declarations

Conflicts of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix. Proof of theorems


In this Appendix, we provide the proofs of Theorems 1–4.

Proof of Theorem 1

Let \(\delta _{n}=\sqrt{q_{n}/n}\) and \(\theta =\theta _{0}+\delta _{n} M\). We first show that, for any given \(\varepsilon >0\), there exists a large constant C such that

$$\begin{aligned} P\left\{ \inf _{\Vert M\Vert =C}Q_{n}(\theta )>Q_{n}(\theta _{0})\right\} \ge 1-\varepsilon . \end{aligned}$$
(12)

Let \(\varDelta _{n}(\theta )=Q_{n}(\theta )-Q_{n}(\theta _{0})\), then, invoking \(\theta _{0k}=0\) with \(k\in \mathscr {A}_{2}\), \(p_{\lambda _{1n}}(0)=0\) and model (4), some simple calculations yield

$$\begin{aligned} \varDelta _{n}(\theta )= & {} \frac{1}{n}\sum _{i=1}^{n}\Bigg |Y_{i}-Z_{i}^{T}\theta \Bigg | -\frac{1}{n}\sum _{i=1}^{n}\Bigg |Y_{i}-Z_{i}^{T}\theta _{0}\Bigg | +\sum _{k=1}^{q_{n}}[p_{\lambda _{1n}}(|\theta _{k}|)-p_{\lambda _{1n}}(|\theta _{0k}|)] \nonumber \\= & {} \frac{1}{n}\sum _{i=1}^{n} \Bigg |\varepsilon _{i}-\delta _{n} Z_{i}^{T}M\Bigg |-\frac{1}{n}\sum _{i=1}^{n} |\varepsilon _{i}|+\sum _{k=1}^{q_{n}}[p_{\lambda _{1n}} (|\theta _{k}|)-p_{\lambda _{1n}}(|\theta _{0k}|)]\nonumber \\\ge & {} \frac{1}{n}\sum _{i=1}^{n}\Bigg [\Bigg |\varepsilon _{i}-\delta _{n} Z_{i}^{T}M\Bigg |-|\varepsilon _{i}|\Bigg ] +\sum _{k\in \mathscr {A}_{1}}[p_{\lambda _{1n}}(|\theta _{k}|)-p_{\lambda _{1n}}(|\theta _{0k}|)]\nonumber \\\equiv & {} I_{n1}+I_{n2}. \end{aligned}$$
(13)

We first consider \(I_{n1}\). From Knight (1998), we have the following identity:

$$\begin{aligned} |a-b|-|a|=-b[I(a>0)-I(a<0)]+2\int _{0}^{b}[I(a\le s)-I(a\le 0)]ds. \end{aligned}$$

Hence, we have

$$\begin{aligned} I_{n1}= & {} -\frac{1}{n}\sum _{i=1}^{n}\delta _{n} Z_{i}^{T}M [I(\varepsilon _{i}>0)-I(\varepsilon _{i}<0)]\nonumber \\&+\frac{2}{n}\sum _{i=1}^{n}\int _{0}^{\delta _{n} Z_{i}^{T}M}[I(\varepsilon _{i}\le s)-I(\varepsilon _{i}\le 0)]ds\nonumber \\\equiv & {} \, I_{n3}+I_{n4}. \end{aligned}$$
(14)
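The Knight (1998) identity used in this decomposition holds for all real \(a\) and \(b\), with the convention \(\int _{0}^{b}=-\int _{b}^{0}\) when \(b<0\). As a quick numerical sanity check, the sketch below (illustrative helper names `lhs`/`rhs` are ours, and a signed Riemann sum stands in for the integral) verifies the identity over all sign configurations of \(a\) and \(b\):

```python
import numpy as np

def lhs(a, b):
    # left-hand side of Knight's identity: |a - b| - |a|
    return abs(a - b) - abs(a)

def rhs(a, b, n=200_001):
    # sign term: -b [I(a > 0) - I(a < 0)]
    sign_term = -b * (float(a > 0) - float(a < 0))
    # integral term: 2 * int_0^b [I(a <= s) - I(a <= 0)] ds,
    # approximated by a signed trapezoid sum (grid runs 0 -> b, even if b < 0)
    s = np.linspace(0.0, b, n)
    f = (a <= s).astype(float) - float(a <= 0)
    integral = np.sum(0.5 * (f[:-1] + f[1:]) * np.diff(s))
    return sign_term + 2.0 * integral

# all sign configurations of a and b agree up to the Riemann-sum error
for a, b in [(1.5, 0.7), (1.5, 2.3), (-0.4, 1.1), (0.9, -2.0), (-1.2, -0.3)]:
    assert abs(lhs(a, b) - rhs(a, b)) < 1e-3
```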

From condition (C3), we have \(E[I(\varepsilon _{i}>0)-I(\varepsilon _{i}<0)]=0\). Hence, invoking condition (C6), we can prove

$$\begin{aligned} E(I_{n3})= & {} -\frac{1}{n}\sum _{i=1}^{n}\delta _{n} E(Z_{i})^{T}M E[I(\varepsilon _{i}>0)-I(\varepsilon _{i}<0)]=0,\\ Var(I_{n3}) \,= \, & {} \frac{\delta _{n}^{2} }{n^{2}}\sum _{i=1}^{n}M^{T}E(Z_{i}Z_{i}^{T})M E[I(\varepsilon _{i}>0)-I(\varepsilon _{i}<0)]^{2} \le \frac{\delta _{n}^{2}\rho _{2} }{n}\Vert M\Vert ^{2}. \end{aligned}$$

Hence by the Markov inequality, we obtain

$$\begin{aligned} P(|I_{n3}|\ge \delta _{n}^{2}\Vert M\Vert )\le \frac{E(I_{n3}^{2})}{\delta _{n}^{4}\Vert M\Vert ^{2}} \le \frac{\delta _{n}^{2}\rho _{2} \Vert M\Vert ^{2}}{n\delta _{n}^{4}\Vert M\Vert ^{2}}\rightarrow 0. \end{aligned}$$

This implies that

$$\begin{aligned} I_{n3}=o_{p}(\delta _{n}^{2})\Vert M\Vert . \end{aligned}$$
(15)

Next we consider \(I_{n4}\). We denote

$$\begin{aligned} S_{ni}=\frac{2}{n}\int _{0}^{\delta _{n} Z_{i}^{T}M}[I(\varepsilon _{i}\le s)-I(\varepsilon _{i}\le 0)]ds. \end{aligned}$$

Then

$$\begin{aligned} I_{n4}=\sum _{i=1}^{n}S_{ni}=\sum _{i=1}^{n}[S_{ni}-E(S_{ni})]+\sum _{i=1}^{n}E(S_{ni})\equiv I_{n5}+I_{n6}. \end{aligned}$$
(16)

Note that

$$\begin{aligned} nE(S_{ni}^{2}) \, = \, & {} n\frac{4}{n^{2}}E\left\{ \left[ \int _{0}^{\delta _{n} Z_{i}^{T}M}[I(\varepsilon _{i}\le s)-I(\varepsilon _{i}\le 0)]ds\right] ^{2}\right\} \\\le & {} \frac{4}{n}E\left\{ \left[ \int _{0}^{|\delta _{n} Z_{i}^{T}M|}ds\right] ^{2}\right\} \\= & {} \frac{4}{n}E\left\{ |\delta _{n} Z_{i}^{T}M|^{2}\right\} \le \frac{4\delta _{n}^{2}}{n}M^{T}E( Z_{i} Z_{i}^{T})M \le \frac{4\delta _{n}^{2}}{n} \rho _{2}\Vert M\Vert ^{2}. \end{aligned}$$

Then we obtain

$$\begin{aligned} P(|I_{n5}|\ge \delta _{n}^{2})\le \frac{Var(\sum _{i=1}^{n}S_{ni})}{\delta _{n}^{4}} \le \frac{nE(S_{ni}^{2})}{\delta _{n}^{4}}\rightarrow 0. \end{aligned}$$

This implies \(I_{n5}=o_{p}(\delta _{n}^{2})\). In addition, by the dominated convergence theorem, we can obtain

$$\begin{aligned} I_{n6} \,= \,& {} 2E\left\{ \int _{0}^{\delta _{n} Z_{i}^{T}M}[I(\varepsilon _{i}\le s)-I(\varepsilon _{i}\le 0)]ds\right\} \nonumber \\ \,=\, & {} 2E\left\{ E\left[ \left. \int _{0}^{\delta _{n} Z_{i}^{T}M}[I(\varepsilon _{i}\le s)-I(\varepsilon _{i}\le 0)]ds\right| Z_{i}\right] \right\} \nonumber \\ \,= \, & {} 2E\left\{ \int _{0}^{\delta _{n} Z_{i}^{T}M}[F(s)-F(0)]ds\right\} \nonumber \\= & {} 2f(0)E\left\{ (1+o(1))\int _{0}^{\delta _{n} Z_{i}^{T}M}sds\right\} \nonumber \\ \,= \, & {} f(0)(1+o(1))\delta _{n}^{2}M^{T}E(Z_{i}Z_{i}^{T})M\nonumber \\ \, = \, & {} O_{p}(\delta _{n}^{2})\Vert M\Vert ^{2}. \end{aligned}$$
(17)

Next we consider the term \(I_{n2}\). Invoking condition (C8), some calculations yield

$$\begin{aligned} |I_{n2}| \,=\, & {} \sum _{k\in \mathscr {A}_{1}}[p_{\lambda _{1n}}(|\theta _{k}|)-p_{\lambda _{1n}}(|\theta _{0k}|)]\nonumber \\= & {} \sum _{k\in \mathscr {A}_{1}} \delta _{n}p'_{\lambda _{1n}}(|\theta _{0k}|)sgn(\theta _{0k})|M_{k}|+ \sum _{k\in \mathscr {A}_{1}} \delta _{n}^{2}p''_{\lambda _{1n}}(|\theta _{0k}|)sgn(\theta _{0k})|M_{k}|^{2}(1+o(1))\nonumber \\\le & {} \sqrt{s}\delta _{n} a_{n}\Vert M\Vert +\delta _{n}^{2}b_{n}\Vert M\Vert ^{2}\nonumber \\ \,= \,& {} o_{p}(\delta _{n}^{2})\Vert M\Vert ^{2}. \end{aligned}$$
(18)

Then, by choosing C sufficiently large, the terms \(I_{n2}\), \(I_{n3}\) and \(I_{n5}\) are all dominated by \(I_{n6}\) on \(\Vert M\Vert =C\). Since \(I_{n6}\) is positive, invoking (13)–(18), we obtain that (12) holds. Furthermore, by the convexity of \(Q_{n}(\cdot )\), we have

$$\begin{aligned} P\left\{ \inf _{\Vert M\Vert \le C}Q_{n}(\theta )>Q_{n}(\theta _{0})\right\} \ge 1-\varepsilon . \end{aligned}$$

This implies, with probability at least \(1-\varepsilon\), that there exists a local minimizer \(\widehat{\theta }\) such that \(\widehat{\theta }-\theta _{0}=O_{p}(\delta _{n})\), which completes the proof of Theorem 1. \(\square\)
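The proof only requires the penalty \(p_{\lambda }(\cdot )\) to satisfy \(p_{\lambda }(0)=0\) and the derivative conditions in (C8); the SCAD penalty of Fan and Li (2001) is a standard choice meeting these requirements. The numpy sketch below (our illustration, not code from the paper) checks the three properties the arguments rely on: \(p_{\lambda }(0)=0\), \(p'_{\lambda }(0{+})=\lambda >0\), and \(p'_{\lambda }(t)=0\) for \(t>a\lambda\) (so that \(a_{n}\), \(b_{n}\) in (C8) vanish when the true coefficients are bounded away from zero):

```python
import numpy as np

A = 3.7  # value of a recommended by Fan and Li (2001)

def scad(t, lam, a=A):
    """SCAD penalty p_lambda(|t|): linear near 0, quadratic blend, then constant."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(
        t <= lam, lam * t,
        np.where(t <= a * lam,
                 -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1)),
                 (a + 1) * lam**2 / 2))

def scad_deriv(t, lam, a=A):
    """Derivative p'_lambda(t) for t >= 0; right limit at 0 equals lam."""
    t = np.abs(np.asarray(t, dtype=float))
    return lam * ((t <= lam).astype(float)
                  + np.maximum(a * lam - t, 0.0) / ((a - 1) * lam) * (t > lam))

lam = 0.5
assert scad(0.0, lam) == 0.0                      # p_lambda(0) = 0
assert abs(scad_deriv(1e-12, lam) - lam) < 1e-10  # p'(0+)/lambda = 1 > 0
assert scad_deriv(10.0, lam) == 0.0               # flat beyond a*lambda
```

The second assertion is exactly the condition \(p'_{\lambda _{1n}}(0)/\lambda _{1n}>0\) invoked at the end of the proof of Theorem 2.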

Proof of Theorem 2

For convenience and simplicity, let \(\theta _{0}=(\theta _{\mathscr {A}_{1}}^{T},\theta _{\mathscr {A}_{2}}^{T})^{T}\) with \(\theta _{\mathscr {A}_{1}}=\{\theta _{0k}:k\in \mathscr {A}_{1}\}\) and \(\theta _{\mathscr {A}_{2}}=\{\theta _{0k}:k\in \mathscr {A}_{2}\}\). The corresponding covariate is denoted by \(Z_{i}=(Z_{i}^{(1)T},Z_{i}^{(2)T})^{T}\). From the proof of Theorem 1, for a sufficiently large C, \(\widehat{\theta }\) lies in the ball \(\{\theta _{0}+\delta _{n}M:\Vert M\Vert \le C\}\) with probability converging to 1, where \(\delta _{n}=\sqrt{q_{n}/n}\). We denote \(\theta _{1}=\theta _{\mathscr {A}_{1}}+\delta _{n} M_{1}\) and \(\theta _{2}=\theta _{\mathscr {A}_{2}}+\delta _{n} M_{2}\) with \(\Vert M_{1}\Vert ^{2}+\Vert M_{2}\Vert ^{2}\le C^{2}\), and \(V_{n}(M_{1},M_{2})=Q_{n}(\theta _{1},\theta _{2})-Q_{n}(\theta _{\mathscr {A}_{1}},0)\), then the estimator \(\widehat{\theta }=(\widehat{\theta }_{1}^{T},\widehat{\theta }_{2}^{T})^{T}\) can also be obtained by minimizing \(V_{n}(M_{1},M_{2})\), except on an event with probability tending to zero. Hence, to prove this theorem, we only need to prove that, for any \(M_{1}\) and \(M_{2}\) satisfying \(\Vert M_{1}\Vert ^{2} +\Vert M_{2}\Vert ^{2}\le C^{2}\), if \(\Vert M_{2}\Vert >0\), then with probability tending to 1, we have

$$\begin{aligned} V_{n}(M_{1},M_{2})-V_{n}(M_{1},0)>0. \end{aligned}$$
(19)

Note that

$$\begin{aligned}&V_{n}(M_{1},M_{2})-V_{n}(M_{1},0)\nonumber \\&~~=Q_{n}(\theta _{1},\theta _{2})-Q_{n}(\theta _{\mathscr {A}_{1}},0) -[Q_{n}(\theta _{1},0)-Q_{n}(\theta _{\mathscr {A}_{1}},0)]\nonumber \\&~~=\left\{ \frac{1}{n}\sum _{i=1}^{n}\Bigg |Y_{i}-Z_{i}^{(1)T}\theta _{\mathscr {A}_{1}}-\delta _{n} Z_{i}^{(1)T}M_{1}- \delta _{n} Z_{i}^{(2)T}M_{2}\Bigg | -\frac{1}{n}\sum _{i=1}^{n}\Bigg |Y_{i}-Z_{i}^{(1)T}\theta _{\mathscr {A}_{1}}\Bigg |\right\} \nonumber \\&~~~~-\left\{ \frac{1}{n}\sum _{i=1}^{n}\Bigg |Y_{i}-Z_{i}^{(1)T}\theta _{\mathscr {A}_{1}}-\delta _{n} Z_{i}^{(1)T}M_{1}\Bigg | -\frac{1}{n}\sum _{i=1}^{n}\Bigg |Y_{i}-Z_{i}^{(1)T}\theta _{\mathscr {A}_{1}}\Bigg |\right\} \nonumber \\&~~~~+\sum _{k\in \mathscr {A}_{2}}p_{\lambda _{1n}}(|\theta _{2k}|)\nonumber \\&~~=\left\{ \frac{1}{n}\sum _{i=1}^{n}\Bigg |\varepsilon _{i}-\delta _{n} Z_{i}^{(1)T}M_{1}-\delta _{n} Z_{i}^{(2)T}M_{2}\Bigg | -\frac{1}{n}\sum _{i=1}^{n}|\varepsilon _{i}|\right\} \nonumber \\&~~~~-\left\{ \frac{1}{n}\sum _{i=1}^{n}\Bigg |\varepsilon _{i}-\delta _{n} Z_{i}^{(1)T}M_{1}\Bigg |-\frac{1}{n}\sum _{i=1}^{n}|\varepsilon _{i}|\right\} +\sum _{k\in \mathscr {A}_{2}}p_{\lambda _{1n}}(|\theta _{2k}|). \end{aligned}$$
(20)

Similar to the proof of Theorem 1, we can obtain

$$\begin{aligned}&\left\{ \frac{1}{n}\sum _{i=1}^{n}\Bigg |\varepsilon _{i}-\delta _{n} Z_{i}^{(1)T}M_{1}-\delta _{n} Z_{i}^{(2)T}M_{2}\Bigg | -\frac{1}{n}\sum _{i=1}^{n}|\varepsilon _{i}|\right\} \nonumber \\&~~~~~~~-\left\{ \frac{1}{n}\sum _{i=1}^{n}\Bigg |\varepsilon _{i}-\delta _{n} Z_{i}^{(1)T}M_{1}\Bigg |-\frac{1}{n}\sum _{i=1}^{n}|\varepsilon _{i}|\right\} \nonumber \\&~~~\ge O_{p}(\delta _{n}^{2})+\frac{f(0)}{2}\delta _{n}^{2}M^{T}E(ZZ^{T})M+O_{p}(\delta _{n}^{2}) -\frac{f(0)}{2}\delta _{n}^{2}M_{1}^{T}E\Big (Z^{(1)}Z^{(1)T}\Big )M_{1}\nonumber \\&~~~= O_{p}(\delta _{n}^{2})+O_{p}(\delta _{n}^{2})f(0)\Vert M\Vert ^{2}. \end{aligned}$$
(21)

In addition, for \(k\in \mathscr {A}_{2}\), we have \(\theta _{0k}=0\). Then invoking \(p_{\lambda _{n}}(0)=0\), we can derive

$$\begin{aligned} \sum _{k\in \mathscr {A}_{2}}p_{\lambda _{1n}}(|\theta _{2k}|)= & {} \sum _{k\in \mathscr {A}_{2}}p_{\lambda _{1n}}(|\theta _{0k}+\delta _{n}M_{2k}|)\nonumber \\= & {} \sum _{k\in \mathscr {A}_{2}}p_{\lambda _{1n}}(|\theta _{0k}|) +\sum _{k\in \mathscr {A}_{2}}p'_{\lambda _{1n}}(|\theta _{0k}|)\delta _{n}|M_{2k}| +O_{p}(\delta _{n}^{2})\sum _{k\in \mathscr {A}_{2}}|M_{2k}|^{2}\nonumber \\ \,= \, & {} \delta _{n}p'_{\lambda _{1n}}(0) \sum _{k\in \mathscr {A}_{2}}|M_{2k}|+O_{p}(\delta _{n}^{2})\Vert M\Vert ^{2}. \end{aligned}$$
(22)

Hence, from (20)–(22), we have

$$\begin{aligned}&V_{n}(M_{1},M_{2})-V_{n}(M_{1},0)\nonumber \\&\quad \ge \delta _{n}\lambda _{1n}\left( O_{p}(\sqrt{q_{n}/n}/ \lambda _{1n})f(0)\Vert M\Vert ^{2}+p'_{\lambda _{1n}}(0)/\lambda _{1n} \sum _{k\in \mathscr {A}_{2}}|M_{2k}|\right) . \end{aligned}$$
(23)

By conditions (C7) and (C8), we have \(\sqrt{q_{n}/n}/\lambda _{1n}\rightarrow 0\) and \(p'_{\lambda _{1n}}(0)/\lambda _{1n}>0\). Hence, (23) implies that (19) holds with probability tending to 1. This completes the proof of Theorem 2. \(\square\)

Proof of Theorem 3

Note that Theorem 2 implies that the variable selection for optimal instrumental variables is consistent; hence, model (6) implies that, with probability tending to 1, \(X_{i}=\varGamma _{\mathscr {A}_{1}} Z_{i}^{*}+e_{i}\), \(i=1,\ldots ,n\). In addition, because \(\widehat{\varGamma }\) is the moment estimator of \(\varGamma _{\mathscr {A}_{1}}\), we can prove \(\widehat{\varGamma }=\varGamma _{\mathscr {A}_{1}}+O_{p}(\sqrt{p_{n}/n})\). Hence, invoking \(E(e_{i})=0\), a simple calculation yields

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}(X_{i}-X_{i}^{*})^{T}\beta= & {} \frac{1}{n}\sum _{i=1}^{n}(\varGamma _{\mathscr {A}_{1}}Z_{i}^{*} +e_{i}-\widehat{\varGamma }Z_{i}^{*})^{T}\beta \nonumber \\= & {} \frac{1}{n}\sum _{i=1}^{n}Z_{i}^{*T}(\varGamma _{\mathscr {A}_{1}} -\widehat{\varGamma })^{T}\beta +\frac{1}{\sqrt{n}}\left( \frac{1}{\sqrt{n}}\sum _{i=1}^{n}e_{i}^{T}\beta \right) \nonumber \\ \,= \, & {} O_{p}(\Vert \varGamma _{\mathscr {A}_{1}}-\widehat{\varGamma }\Vert )+O_{p}(n^{-1/2})=O_{p}(\sqrt{p_{n}/n}). \end{aligned}$$
(24)

Furthermore, we let \(\beta _{0}\) and \(\gamma _{0}\) be the true values of \(\beta\) and \(\gamma\), respectively, and denote \(R(U_{i})=g(U_{i})-W_{i}^{T}\gamma _{0}\). Then from Schumaker (1981), we have \(\Vert R(U_{i})\Vert =O_{p}(\kappa _{n}^{-r})=O_{p}(\sqrt{\kappa _{n}/n})\). Hence, invoking (24), some calculations yield

$$\begin{aligned} M_{n}(\beta ,\gamma )= & {} \frac{1}{n}\sum _{i=1}^{n}\Bigg |X_{i}^{T} (\beta _{0}-\beta )+(X_{i}-X_{i}^{*})^{T}\beta +W_{i}^{T}(\gamma _{0}-\gamma )\nonumber \\&+R(U_{i})+\varepsilon _{i}\Bigg | +\sum _{j=1}^{p_{n}}p_{\lambda _{2n}}\left( |\beta _{j}|\right) \nonumber \\= & {} \frac{1}{n}\sum _{i=1}^{n}\Bigg |X_{i}^{T}(\beta _{0}-\beta ) +W_{i}^{T}(\gamma _{0}-\gamma )+\varepsilon _{i}\Bigg | +\sum _{j=1}^{p_{n}}p_{\lambda _{2n}}\left( |\beta _{j}|\right) \nonumber \\&+O_{p}\Bigg (\Vert \hat{\varGamma }-\varGamma _{\mathscr {A}_{1}}\Vert +\Vert R(U_{i})\Vert \Bigg )\nonumber \\= & {} \frac{1}{n}\sum _{i=1}^{n}\Bigg |X_{i}^{T}(\beta _{0}-\beta ) +W_{i}^{T}(\gamma _{0}-\gamma )+\varepsilon _{i}\Bigg | +\sum _{j=1}^{p_{n}}p_{\lambda _{2n}}\left( |\beta _{j}|\right) +O_{p}(\delta _{n}), \end{aligned}$$
(25)

where \(\delta _{n}=\sqrt{(p_{n}+\kappa _{n})/n}\). Furthermore, we denote \(\alpha _{0}=(\beta _{0}^{T},\gamma _{0}^{T})^{T}\) and \(\alpha =(\beta ^{T},\gamma ^{T})^{T}\) with \(\alpha =\alpha _{0}+\delta _{n} M\), where M is a \((p_{n}+L_{n})\) dimensional vector. Then (25) implies that

$$\begin{aligned} M_{n}(\beta ,\gamma )=\frac{1}{n}\sum _{i=1}^{n}\Bigg |\varepsilon _{i}-\delta _{n}\xi _{i}^{T}M\Bigg | +\sum _{j=1}^{p_{n}}p_{\lambda _{2n}}\left( |\beta _{j}|\right) +O_{p}(\delta _{n}), \end{aligned}$$
(26)

and

$$\begin{aligned} M_{n}(\beta _{0},\gamma _{0})=\frac{1}{n}\sum _{i=1}^{n}|\varepsilon _{i}| +\sum _{j=1}^{p_{n}}p_{\lambda _{2n}}\left( |\beta _{0j}|\right) +O_{p}(\delta _{n}), \end{aligned}$$
(27)

where \(\xi _{i}=(X_{i}^{T},W_{i}^{T})^{T}\). Furthermore, we let \(\varDelta _{n}(\beta ,\gamma )= M_{n}(\beta ,\gamma )-M_{n}(\beta _{0},\gamma _{0})\), then from (26) and (27), we have

$$\begin{aligned} \varDelta _{n}(\beta ,\gamma ) = \frac{1}{n}\sum _{i=1}^{n}\Bigg [ |\varepsilon _{i}-\delta _{n} \xi _{i}^{T}M|- |\varepsilon _{i}|\Bigg ]+\sum _{j=1}^{p_{n}}\Bigg [p_{\lambda _{2n}} (|\beta _{j}|)-p_{\lambda _{2n}}(|\beta _{0j}|)\Bigg ] +O_{p}(\delta _{n}). \end{aligned}$$
(28)

Hence, invoking (28) and using arguments similar to the proof of (13), we have that, for any given \(\varepsilon >0\), there exists a large constant C such that

$$\begin{aligned} P\left\{ \inf _{\Vert M\Vert =C}M_{n}(\beta ,\gamma )>M_{n} (\beta _{0},\gamma _{0})\right\} \ge 1-\varepsilon . \end{aligned}$$

This implies, with probability at least \(1-\varepsilon\), that there exist local minimizers \(\widehat{\beta }\) and \(\widehat{\gamma }\) satisfying \(\widehat{\beta }-\beta _{0}=O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})\) and \(\widehat{\gamma }-\gamma _{0}=O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})\). This completes the proof of part (i) of Theorem 3. \(\square\)

In addition, invoking the proof of part (i) and using the same arguments as in the proof of Theorem 2, we can prove part (ii) of Theorem 3; we therefore omit the details.
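The two-stage logic behind Theorem 3 — first estimate \(\varGamma\) by regressing \(X\) on the selected instruments, then run a (penalized) LAD regression on the fitted values \(X^{*}=\widehat{\varGamma }Z^{*}\) — can be illustrated on simulated data. The sketch below is a deliberately simplified toy (one endogenous covariate, no penalty or nonparametric part, and IRLS as a smooth surrogate for an exact LAD solver); the data-generating numbers are arbitrary choices of ours, not the paper's simulation design. It shows why replacing \(X\) by \(X^{*}\) removes the endogeneity bias:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Simulated design: X is endogenous because it shares the error u with Y.
z = rng.normal(size=(n, 2))                                  # instruments Z*
u = rng.normal(size=n)                                       # common error
x = z @ np.array([1.0, -0.8]) + 0.7 * u + 0.3 * rng.normal(size=n)
y = 2.0 * x + u + 0.2 * rng.normal(size=n)                   # true beta = 2

# Stage 1: moment (least-squares) estimator of Gamma; fitted X* = Z* Gamma_hat.
gamma_hat = np.linalg.lstsq(z, x, rcond=None)[0]
x_star = z @ gamma_hat

# Stage 2: LAD slope via iteratively reweighted least squares,
# a smooth surrogate for minimizing (1/n) * sum_i |y_i - b * x_i|.
def lad_slope(xv, yv, iters=300, eps=1e-6):
    b = 0.0
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(yv - b * xv), eps)
        b = np.sum(w * xv * yv) / np.sum(w * xv * xv)
    return b

beta_naive = lad_slope(x, y)       # ignores endogeneity: biased upward
beta_iv = lad_slope(x_star, y)     # fitted-value (instrumented) regression
assert beta_naive - 2.0 > 0.15     # visible endogeneity bias
assert abs(beta_iv - 2.0) < 0.2    # bias removed up to sampling error
```

Because \(x^{*}\) depends on \(Z^{*}\) only, it is asymptotically uncorrelated with the error term, which is the mechanism exploited in (24)–(25).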

Proof of Theorem 4

A simple calculation yields

$$\begin{aligned} \Vert \widehat{g}(u)-g(u)\Vert ^{2} \,= \,& {} \int _{0}^{1}\{\widehat{g}(u)-g(u)\}^{2}du\nonumber \\ \,= \, & {} \int _{0}^{1}\{B^{T}(u)\widehat{\gamma }-B^{T}(u)\gamma _{0}+R(u)\}^{2}du\nonumber \\\le & {} 2\int _{0}^{1}\{B^{T}(u)\widehat{\gamma }-B^{T}(u)\gamma _{0}\}^{2}du+ 2\int _{0}^{1}R(u)^{2}du\nonumber \\ \,= \,& {} 2(\widehat{\gamma }-\gamma _{0})^{T}H(\widehat{\gamma }-\gamma _{0})+ 2\int _{0}^{1}R(u)^{2}du, \end{aligned}$$
(29)

where \(R(u)=g(u)-B^{T}(u)\gamma _{0}\) and \(H=\int _{0}^{1}B(u)B^{T}(u)du\). From the proof of Theorem 3, we can obtain \(\Vert \widehat{\gamma }-\gamma _{0}\Vert =O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})\). Then from condition (C7) and \(\kappa _{n}=O(n^{1/(2r+1)})\), we can prove \(O_{p}(\sqrt{(p_{n}+\kappa _{n})/n})=O_{p}(\sqrt{\kappa _{n}/n})=O_{p}(n^{-r/(2r+1)})\). Then, invoking \(\Vert H\Vert =O(1)\), a simple calculation yields

$$\begin{aligned} (\widehat{\gamma }-\gamma _{0})^{T}H(\widehat{\gamma }-\gamma _{0}) =O_{p}\left( n^{\frac{-2r}{2r+1}}\right) . \end{aligned}$$
(30)

In addition, from conditions (C1), (C4) and Corollary 6.21 in Schumaker (1981), we can obtain \(R(u)=O(\kappa _{n}^{-r})=O(n^{-r/(2r+1)})\). Then, it is easy to show that

$$\begin{aligned} \int _{0}^{1}R(u)^{2}du=O_{p}\left( n^{\frac{-2r}{2r+1}}\right) . \end{aligned}$$
(31)

Invoking (29)–(31), we complete the proof of Theorem 4. \(\square\)
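The rate in Theorem 4 is driven by the spline approximation error \(R(u)=O(\kappa _{n}^{-r})\): increasing the number of basis functions \(\kappa _{n}\) shrinks the bias term (31). The numpy-only sketch below illustrates this with a truncated-power spline basis standing in for the B-spline basis \(B(u)\); the target function \(g\) and the knot counts are arbitrary illustrative choices of ours:

```python
import numpy as np

def spline_design(u, knots, degree=3):
    """Truncated-power spline basis: 1, u, ..., u^d, (u - t_j)_+^d."""
    cols = [u**d for d in range(degree + 1)]
    cols += [np.maximum(u - t, 0.0) ** degree for t in knots]
    return np.column_stack(cols)

def fit_error(n_knots, g, m=4000):
    """Root mean squared approximation error of the least-squares spline fit,
    approximating sqrt( int_0^1 (ghat(u) - g(u))^2 du )."""
    u = np.linspace(0.0, 1.0, m)
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]   # interior knots
    B = spline_design(u, knots)
    coef = np.linalg.lstsq(B, g(u), rcond=None)[0]
    return np.sqrt(np.mean((B @ coef - g(u)) ** 2))

g = lambda u: np.sin(2 * np.pi * u)
errs = [fit_error(k, g) for k in (2, 5, 10)]
assert errs[0] > errs[1] > errs[2]   # error decreases as kappa_n grows
assert errs[2] < 1e-2                # cubic splines: error ~ h^4 is tiny
```

Balancing this shrinking bias against the growing estimation variance of \(\widehat{\gamma }\) is what yields the choice \(\kappa _{n}\asymp n^{1/(2r+1)}\) and the rate \(n^{-r/(2r+1)}\).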


About this article


Cite this article

Liu, C., Zhao, P. & Yang, Y. Regularization statistical inferences for partially linear models with high dimensional endogenous covariates. J. Korean Stat. Soc. 50, 163–184 (2021). https://doi.org/10.1007/s42952-020-00067-4
