Debiased/Double Machine Learning for Instrumental Variable Quantile Regressions

The aim of this paper is to investigate estimation and inference on a low-dimensional causal parameter in the presence of high-dimensional controls in an instrumental variable quantile regression. The estimation and inference are based on Neyman-type orthogonal moment conditions, which are relatively insensitive to the estimation of the nuisance parameters. Monte Carlo experiments show that the econometric procedure performs well. We also apply the procedure to reinvestigate two empirical studies: the quantile treatment effect of 401(k) participation on accumulated wealth, and the distributional effect of job-training program participation on trainee earnings.


Introduction
Model selection and variable selection are widely discussed in the area of prediction. Much less attention, however, has been paid to adapting prediction methods to the context of causal machine learning in economics, cf. Athey (2017) and Athey (2018).
As one of the pioneering papers, within the linear framework of instrumental variable estimation, Belloni et al. (2014) proposed a double-selection procedure to correct for an omitted variable bias in a high-dimensional framework. Constructing a general framework encompassing the results of Belloni et al. (2014), Chernozhukov et al. (2015) and Chernozhukov et al. (2018a) proposed a unified procedure, double/debiased machine learning (DML), which remains valid for nonlinear or semi-nonparametric models. The aim of this paper is to investigate estimation and inference on a low-dimensional causal parameter in the presence of high-dimensional controls in an instrumental variable quantile regression. In particular, our procedure follows the idea outlined by Chernozhukov et al. (2018b). To the best of our knowledge, the present study is the first to investigate the Monte Carlo performance and empirical applications of the double machine learning procedure within the framework of instrumental variable quantile regressions. The Monte Carlo experiments show that our econometric procedure performs well.
Causal machine learning has been actively studied in economics in recent years, building on two approaches: double machine learning, cf. Chernozhukov et al. (2018a), and generalized random forests, cf. Athey, Tibshirani and Wager (2019). Chen and Hsiang (2019) investigate a generalized random forests model based on instrumental variable quantile regression. In contrast to the DML for instrumental variable quantile regressions, their econometric procedure yields a measure of variable importance in terms of heterogeneity among control variables. Although related to our paper, Chen and Hsiang (2019) do not consider the setting of high-dimensional controls.
We apply the proposed procedure to empirically investigate causal quantile effects of 401(k) participation on net financial assets. Our empirical results indicate that 401(k) participants with a low savings propensity are more strongly associated with the nonlinear income effect, which complements the findings in Chernozhukov et al. (2018a) and Chiou et al. (2018). A second empirical example, participation in a job training program, is investigated as well.
The rest of the paper is organized as follows. The model specification and estimation procedure are introduced in Section 2. Section 3 presents Monte Carlo experiments.
Section 4 presents two empirical applications. Section 5 concludes the paper.
Model Specification and Estimation Procedure
We briefly review the conventional instrumental variable quantile regression (IVQR), and then the IVQR within the framework of high-dimensional controls. Our DML procedure for the IVQR, constructed on the basis of a tentative procedure suggested by Chernozhukov et al. (2018b), is introduced in this section.

The Inverse Quantile Regression as a GMM Estimator
The following conditional moment restriction yields an IVQR estimator:

$$P\left[\,Y \le q(D, X, \tau) \mid X, Z\,\right] = \tau, \qquad (1)$$

where $q(\cdot)$ is the structural quantile function, $\tau$ stands for the quantile index, and $D$, $X$ and $Z$ are, respectively, the target variable, the control variables and the instruments. Condition (1) together with the linear structural quantile specification $q(D, X, \tau) = \alpha D + X'\beta$ leads to the unconditional moment condition

$$E\left[\left(\tau - 1\{Y \le \alpha D + X'\beta\}\right)\Psi\right] = 0, \qquad (2)$$

where $\Psi = \Psi(X, Z)$ is a vector of functions of the instruments and control variables. The parameters depend on the quantile of interest, but we suppress the $\tau$ associated with $\alpha$ and $\beta$ for simplicity of presentation. Equation (2) leads to a particular moment condition for partialling out,

$$E\left[\left(\tau - 1\{Y \le \alpha D + X'\beta\}\right)\widetilde{Z}\right] = 0, \qquad (3)$$

with "instrument"

$$\widetilde{Z} = Z - \delta' X, \qquad (4)$$

where $\delta$ is a matrix parameter. We construct the grid-search interval for $\alpha$ first and, for each $\alpha$ in the interval, profile out the coefficients on the exogenous variables by equation (5). That is,

$$\hat{\beta}(\alpha) = \arg\min_{\beta}\ \sum_{i=1}^{N}\rho_\tau\left(Y_i - D_i\alpha - X_i'\beta\right). \qquad (5)$$

We then build the sample counterpart of the population moment condition based on equations (2)-(5). That is,

$$\hat{g}(\alpha) = \frac{1}{N}\sum_{i=1}^{N}\left(\tau - K_{h_N}\left(D_i\alpha + X_i'\hat{\beta}(\alpha) - Y_i\right)\right)\left(Z_i - \hat{\delta}' X_i\right),$$

where $K_{h_N}$ is a kernel-smoothed indicator function with bandwidth $h_N$ and $\hat{\delta}$ is obtained from the linear projection of $Z$ on $X$. We can thus solve for the parameters by optimizing the GMM criterion function. Specifically,

$$\hat{\alpha} = \arg\min_{\alpha}\ \hat{g}(\alpha)'\,\Sigma(\alpha, \alpha)^{-1}\,\hat{g}(\alpha),$$

where $\Sigma(a_1, a_2)$ is a weighting matrix used in the GMM estimation. Notice that the estimator $\hat{\alpha}$ based on the inverse quantile regression (i.e., the IVQR) is first-order equivalent to the estimator defined by this GMM.
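To fix ideas, the grid-search logic above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes a scalar instrument and intercept-only controls (so that profiling out the control coefficient reduces to a sample quantile), and all names, the simulated design, and tuning choices are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, alpha_true, n = 0.5, 1.0, 10_000

# Simulated triangular design: D is endogenous through the factor v,
# while the instrument Z is independent of the structural error.
Z = rng.binomial(1, 0.5, n).astype(float)
v = rng.normal(size=n)
D = 0.8 * Z + 0.5 * v + rng.normal(size=n)
Y = alpha_true * D + 0.7 * v + rng.normal(size=n)

def g_hat(alpha):
    """Sample moment E[(tau - 1{Y <= D*alpha + beta(alpha)}) * Ztilde]."""
    # With intercept-only controls, profiling out beta(alpha) reduces to
    # taking the tau-quantile of the residual Y - D*alpha.
    beta = np.quantile(Y - D * alpha, tau)
    z_tilde = Z - Z.mean()  # "partialled-out" instrument
    return np.mean((tau - (Y <= D * alpha + beta)) * z_tilde)

# Grid search: pick the alpha whose (squared) sample moment is smallest.
grid = np.linspace(0.0, 2.0, 201)
alpha_hat = grid[np.argmin([g_hat(a) ** 2 for a in grid])]
```

With this design, `alpha_hat` lands near the true coefficient of 1.0; the sample moment crosses zero only where the residual quantile restriction is satisfied.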

Estimation with High-dimensional Controls
We modify the procedure introduced in Subsection 2.1 in order to deal with a dataset of high-dimensional control variables. We construct the grid-search interval for $\alpha$ and profile out the coefficients on the exogenous variables using the L1-norm penalized quantile regression estimator:

$$\hat{\beta}(\alpha) = \arg\min_{\beta}\ \frac{1}{N}\sum_{i=1}^{N}\rho_\tau\left(Y_i - D_i\alpha - X_i'\beta\right) + \lambda\,\lVert\beta\rVert_1.$$

We also perform dimension reduction on $\delta$ because of the large dimension of $X$. In particular, we implement the following regularization:

$$\hat{\delta} = \arg\min_{\delta}\ \frac{1}{N}\sum_{i=1}^{N}\left(Z_i - \delta' X_i\right)^2 + \lambda_\delta \sum_{k}\hat{\gamma}_k\,\lvert\delta_k\rvert.$$

The regularization above performs a weighted LASSO of each instrumental variable on the control variables, and the resulting L1-norm optimization obeys the Karush-Kuhn-Tucker condition. After implementing the machine learning steps outlined above for the IVQR, we can solve for the low-dimensional causal parameter $\alpha$ by optimizing the GMM criterion defined as follows. The sample counterpart of the moment condition is

$$\hat{g}(\alpha) = \frac{1}{N}\sum_{i=1}^{N}\left(\tau - 1\{Y_i \le D_i\alpha + X_i'\hat{\beta}(\alpha)\}\right)\left(Z_i - \hat{\delta}' X_i\right).$$

Accordingly,

$$\hat{\alpha} = \arg\min_{\alpha \in \mathcal{A}}\ \hat{g}(\alpha)'\,\Sigma(\alpha, \alpha)^{-1}\,\hat{g}(\alpha).$$

More importantly, the double machine learning procedure above (DML-IVQR hereafter) satisfies the Neyman orthogonality conditions, cf. Chernozhukov et al. (2018b).

Weak-Identification Robust Inference
Under the regularity conditions listed in Chernozhukov and Hansen (2008), the asymptotic normality of the GMM estimator with a nonsmooth objective function is guaranteed. We define the Wald-type statistic

$$W_N(\alpha) = N\,\hat{g}(\alpha)'\,\widehat{\Omega}(\alpha)^{-1}\,\hat{g}(\alpha),$$

where $\widehat{\Omega}(\alpha)$ is a consistent estimator of the variance of the sample moment. Consequently, under $\alpha = \alpha_0$ it leads to $W_N(\alpha_0) \rightarrow_d \chi^2_{\dim(Z)}$. It then follows that a valid $(1-p)$ percent confidence region for the true parameter $\alpha_0$ may be constructed as the set

$$CR_{1-p} = \left\{\alpha \in \mathcal{A} : W_N(\alpha) \le c_{1-p}\right\},$$

where $c_{1-p}$ is the critical value such that $P\left(\chi^2_{\dim(Z)} \le c_{1-p}\right) = 1 - p$, and $\mathcal{A}$ can be numerically approximated by the grid $\{\alpha_j,\ j = 1, \ldots, J\}$.
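The inversion of the Wald-type statistic over the grid can be sketched as follows; `g_vals` and `omega_vals` stand in for the sample moments and their estimated variances on the grid, and the synthetic moment at the end (vanishing at alpha = 1) is purely illustrative.

```python
import numpy as np
from scipy.stats import chi2

def wald_stat(g, omega, n):
    """W_N(alpha) = N * g(alpha)' omega(alpha)^{-1} g(alpha), scalar case."""
    return n * g ** 2 / omega

def robust_region(grid, g_vals, omega_vals, n, p=0.05, df=1):
    """Collect {alpha in A : W_N(alpha) <= c_{1-p}} over a finite grid."""
    crit = chi2.ppf(1 - p, df)  # critical value c_{1-p}
    W = np.array([wald_stat(g, w, n) for g, w in zip(g_vals, omega_vals)])
    return grid[W <= crit]

# Synthetic example: a moment that vanishes at alpha = 1.
grid = np.linspace(0.0, 2.0, 201)
region = robust_region(grid, 0.2 * (grid - 1.0),
                       np.full(grid.size, 0.25), n=1000)
```

The returned set brackets the zero of the moment; its width shrinks as the sample size grows or the moment variance falls, mirroring the behavior of the robust confidence regions reported later.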

Algorithms for L1-norm Penalized Quantile Optimization
The suggested double machine learning algorithm involves solving an L1-norm penalized optimization problem, which is a nontrivial task. Researchers often represent the L1-norm penalized quantile objective function as a linear programming problem. However, the computation turns out to be challenging and time-consuming; for instance, one often encounters a singular design in the high-dimensional framework. As an alternative, we utilize the algorithm developed by Yi and Huang (2017), which uses the Huber loss function to approximate the quantile loss function. In the L1-penalized quantile objective

$$\min_{\beta}\ \frac{1}{N}\sum_{i=1}^{N}\rho_\tau\left(Y_i - X_i'\beta\right) + \lambda\,\lVert\beta\rVert_1, \qquad (12)$$

the check loss $\rho_\tau$ is not differentiable. Letting

$$h_\gamma(t) = \begin{cases} t^2/(2\gamma), & \lvert t\rvert \le \gamma,\\ \lvert t\rvert - \gamma/2, & \lvert t\rvert > \gamma, \end{cases}$$

be the Huber loss function of $t$ defined in Yi and Huang (2017), we have $\rho_\tau(t) \approx \tfrac{1}{2}\left(h_\gamma(t) + (2\tau - 1)t\right)$ for a small Huber parameter $\gamma$. Therefore, equation (12) can be approximated by substituting this smoothed loss for $\rho_\tau$. The resulting optimization is the Huber approximation, and it is computationally more tractable thanks to the differentiability of the loss function.
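The quality of the smoothed approximation can be checked numerically. The sketch below (our own illustration) implements the check loss, the Huber loss, and the smoothed loss; direct algebra shows the gap between the check loss and its approximation lies in [0, gamma/4] for every t, so it vanishes as the Huber parameter shrinks.

```python
import numpy as np

def rho(t, tau):
    """Quantile (check) loss: rho_tau(t) = t * (tau - 1{t < 0})."""
    return t * (tau - (t < 0))

def huber(t, gamma):
    """Huber loss: quadratic near the origin, linear in the tails."""
    return np.where(np.abs(t) <= gamma, t ** 2 / (2 * gamma),
                    np.abs(t) - gamma / 2)

def rho_approx(t, tau, gamma):
    """Smoothed check loss: 0.5 * (h_gamma(t) + (2*tau - 1) * t)."""
    return 0.5 * (huber(t, gamma) + (2 * tau - 1) * t)

# The gap rho - rho_approx lies in [0, gamma / 4] for every t.
t = np.linspace(-5.0, 5.0, 10_001)
gap = rho(t, 0.25) - rho_approx(t, 0.25, gamma=0.1)
```

With gamma = 0.1, the maximal gap is exactly gamma/4 = 0.025, attained wherever |t| is at least gamma.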

Monte Carlo Experiments
We evaluate the finite-sample performance, in terms of RMSE and MAD, of the double machine learning procedure for the IVQR. The following data generating process is modified from the one considered in Chen and Lee (2018), where $\Phi(\cdot)$ denotes the cumulative distribution function of a standard normal random variable and $F(\cdot)$ the cumulative distribution function of the error term.

Partialling Out and Not Partialling Out Z on X
We focus on comparing the MAD and RMSE resulting from different models under the exact specification (ten control variables). po-GMM stands for partialling out Z on X, and GMM for not partialling out Z on X. Table 1 shows that partialling out Z on X leads to an efficiency gain across quantiles, especially when the sample size is moderate.

IVQR with High-dimensional Controls
We now evaluate the finite-sample performance of the IVQR with high-dimensional controls. The data generating process involves 100 control variables with an approximately sparse structure. In particular, the exact (true) model depends on only 10 relevant control variables out of the 100. GMM uses all 100 control variables without regularization. Table 2 shows that the RMSE and MAD stemming from the DML-IVQR are close to those from the exact model. In addition, Figure 1 plots the distributions of the IVQR estimator with and without double machine learning, where DML-IVQR denotes the double machine learning procedure for the IVQR with high-dimensional controls. The histograms signify that the DML-IVQR estimator is more efficient and less biased than the IVQR using many control variables. Since a weak-identification robust inference procedure results naturally from the IVQR, cf. Chernozhukov and Hansen (2008), we construct the robust confidence regions for the GMM and the DML-IVQR estimators, which are depicted in Figure 2.

Effects of 401(k) participation on accumulated wealth
The data consist of 9,915 observations.
Following the regression specification in Chernozhukov and Hansen (2004), Table 3 presents quantile treatment effects obtained from the estimation procedures defined in the previous section, including IVQR, po-GMM and GMM. The corresponding results are similar. For the high-dimensional analysis, we create 119 technical control variables, including those constructed from polynomial bases, interaction terms, and cubic splines (thresholds). To ensure that each basis has equal length, we apply min-max normalization to all technical control variables. We use the plug-in method to determine the penalty level in the LASSO under the moment condition, and tune the penalty in the quantile L1-norm objective function, based on the Huber approximation, by 5-fold cross validation. The DML-IVQR also implements feature normalization of the outcome variable for the sake of computational efficiency. The estimated quantile treatment effects on net financial total assets (NFTA) in Table 3 read, across the quantile indices:

NFTA (IVQR):    3600  3600  3700  5700  13200  15800  17700
NFTA (po-GMM):  3500  3600  3700  5600  13900  15800  17700
NFTA (GMM):     3500  3600  3700  5700  13900  16100  18200
To make the estimated treatment effects roughly comparable across estimation procedures, Table 4 reports the effects obtained through the DML-IVQR multiplied by the standard deviation of the outcome variable. Weak-identification/instrument robust inference on the quantile treatment effects is depicted in Figures 4 and 5. Although the robust confidence interval widens at the upper quantiles as the effective sample size shrinks, the estimated quantile treatment effects remain significantly different from zero. The results from the DML-IVQR can therefore serve as a data-driven robustness check on those summarized in Table 3.

Effects of subsidized training on male and female trainee earnings
Abadie, Angrist and Imbens (2002) use the Job Training Partnership Act (JTPA) data to estimate the quantile treatment effects of job training on the earnings distribution. The data are from Title II of the JTPA in the early 1990s and consist of 11,204 observations: 5,102 male and 6,102 female. In the estimation, they take thirty-month earnings as the outcome variable, enrollment for JTPA services as the treatment variable, and a randomized offer of JTPA enrollment as the instrumental variable. The control variables include dummy variables for black and Hispanic applicants, high-school graduates, and married applicants, five age-group dummies, AFDC receipt (for women), an indicator for whether the applicant worked at least 12 weeks in the 12 months preceding random assignment, dummies for the original recommended service strategy (classroom, OJT/JSA, other), and a dummy for whether the earnings data are from the second follow-up survey. Table 7 presents the quantile treatment effects for the male and female groups, respectively, obtained from several estimation procedures, including IVQR, po-GMM, and GMM. For the high-dimensional analysis, we create 85 technical control variables, including those constructed from polynomial bases, interaction terms, and cubic splines (thresholds). Table 8 shows the quantile treatment effects obtained through the DML-IVQR. Table 7, together with existing findings in the literature, suggests that for females only, the job training program generates a significantly positive treatment effect on earnings at the 0.5 and 0.75 quantiles. The DML-IVQR signifies similar results, which is confirmed by the identification-robust confidence intervals depicted in Figures 6 and 7. The selected variables are collected in the online appendix. Thus, the existing empirical conclusions in the literature are reaffirmed by the IVQR with the double machine learning procedure.
Conclusion
The performance of a debiased/double machine learning algorithm within the framework of the high-dimensional IVQR is investigated. The simulation results signify that the proposed procedure is more efficient than the conventional estimator with many controls. Furthermore, we evaluate the corresponding weak-identification robust confidence interval for the low-dimensional causal parameter. Given a large number of technical controls, we reinvestigate the quantile treatment effects of 401(k) participation on accumulated wealth and highlight the nonlinear income effects driven by the savings propensity.