Parameter estimation through semiparametric quantile regression imputation

Abstract: In this article, we consider an imputation method to handle missing response values based on semiparametric quantile regression estimation. In the proposed method, the missing response values are generated using the semiparametrically estimated conditional quantile regression function at given values of covariates. Then the imputed values are used to estimate a parameter defined as the expected value of a function involving the response and covariate variables. We derive the asymptotic distribution of our estimator constructed with the imputed data and provide a variance estimator. In simulation, we compare our semiparametric quantile regression imputation method to fully parametric and nonparametric alternatives and evaluate the variance estimator based on the asymptotic distribution. We also discuss an extension for estimating a parameter defined through an estimation equation.


1. Introduction
Missing data are frequently encountered in many disciplines. If the variables that explain the missingness are related to the response of interest, inference that ignores the missing data loses efficiency and often leads to biased and misleading conclusions. Missing data adjustments fall into two broad classes: weighting and imputation. In weighting, weights are derived through calibration or propensity score estimation. However, weighting is poorly suited to item nonresponse, since for a given unit the weight should be the same for all items. Imputation provides a complete data set by replacing missing response values with imputed values. In the presence of item nonresponse, imputation simplifies analyses because standard analytical tools can be applied to the imputed data set and the resulting point estimates are consistent across different users.
Many different imputation approaches have been developed in the literature, and some prominent examples are noted here. The pioneering work of Rubin (1987) discussed multiple imputation (MI), which uses Bayesian methods to generate pseudo values from the posterior predictive distribution and impute multiple data sets. Despite its simple form, however, the variance estimator of MI can fail to be consistent if the congeniality and self-efficiency conditions are not met (Meng (1994)). Fractional imputation was proposed to retain both the estimation efficiency of multiple imputation and the consistency of the Rao-Shao variance estimator (Rao and Shao (1992)). In fractional imputation, multiple values are imputed for each missing cell with assigned weights. Kim (2011) proposed parametric fractional imputation (PFI), with inspirations from importance sampling and calibration weighting, to reduce the computational burden. Notably, both PFI and MI assume a parametric regression model, and therefore may suffer from model misspecification. While MI and PFI resort to the creation of artificial responses, hot-deck imputation (HDI) replaces missing units with observed data through matching methods. Using covariate information, the matching method can classify donors and recipients into similar categorical classes (Brick and Kalton (1996); Kim and Fuller (2004)), or create metrics to match donors and recipients (Rubin (1987); Little (1988)). More examples are documented in Andridge and Little (2010). In a recent work by Wang and Chen (2009), multiple imputed values are independently drawn from observed respondents with probabilities proportional to kernel distances between missing cells and donors. Both HDI and Wang and Chen (2009) are purely nonparametric, so the stability and accuracy of the estimators depend on the dimensionality and the sample size concerned.
To leverage the advantages of both parametric and nonparametric methods and avoid the limitations of a purely parametric or purely nonparametric approach, we propose an imputation method based on semiparametric quantile regression, set up as follows. Define $f(y|x)$ as the conditional density, where $y$ is the response subject to missingness and $x$ is the covariate, always observed, and define $q_\tau(x)$ as the $\tau$-th conditional quantile function, that is, the inverse conditional distribution function $F^{-1}(\tau|x)$. Instead of estimating $f(y|x)$ parametrically or nonparametrically, we estimate $q_\tau(x)$ semiparametrically using observed data under the missing at random (MAR) assumption, in the sense intended by Rubin (1976). Then multiple imputed values $y^*_j$ $(j = 1, \cdots, J)$ are obtained via $y^*_j|x = \hat q_{\tau_j}(x)$, where $\tau_j$ is independently drawn from Uniform[0, 1]. The semiparametric quantile regression imputation (hereafter called SQRI) is expected to have appealing features. First, the entire conditional distribution function is used to draw imputed values, hence preserving the conditional density of the filled-in response values. Second, because different conditional quantiles rather than conditional means or actual observations are used in imputation, the method is less sensitive to outliers, as quantiles are known to be less affected by extremes. Third, it does not require strong model assumptions as in a fully parametric solution, and is therefore robust against model violations. Lastly, imputed values can be easily created through random numbers generated from Uniform[0, 1] once $\hat q_\tau(x)$ is estimated.
We are not the first to use quantile regression for imputation. Papers pertaining to quantile regression imputation include Munoz and Rueda (2009), Wei, Ma, and Carroll (2012), and Yoon (2013). Our paper is distinct from these papers in terms of objective, type of imputation, and theory. (i) For objective, Wei, Ma, and Carroll (2012) and Yoon (2013) aimed to estimate quantile regression coefficients, while we are interested in estimating more general parameters, e.g., parameters defined through a smooth estimation equation. It is also worth noting that the setting in Wei, Ma, and Carroll (2012) is different, since they dealt with missing covariates, not missing responses. (ii) For type, Wei, Ma, and Carroll (2012) imputed multiple data sets, and Munoz and Rueda (2009) proposed a single, deterministic imputation. Our method uses fractional imputation because the variance estimator used in multiple imputation is not consistent for some estimated parameters; for examples, see Wang and Robins (1998) and Kim, Brick, Fuller, and Kalton (2006). (iii) For theory, Wei, Ma, and Carroll (2012) and Yoon (2013) assumed a linear quantile regression model, while we rely on a semiparametric approach incorporating a penalty for model complexity, and the key idea used to arrive at asymptotic normality in our proof is substantially different from that of Wei, Ma, and Carroll (2012) and Yoon (2013).
We demonstrate the theoretical validity and applicability of our SQRI estimators in this paper. The rest of the paper is organized as follows. Section 2 introduces the SQRI algorithm and the estimator constructed using the imputed values. Section 3 presents large sample theories and a variance estimator. Section 4 discusses an extension for estimating a parameter defined through an estimation equation. Section 5 demonstrates the properties of the SQRI estimators through simulations. Section 6 concludes with some remarks. The Appendix outlines proofs of the theorems appearing in the main text.

2. Semiparametric quantile regression imputation (SQRI)
We consider $(x_i, y_i)$ $(i = 1, \cdots, n)$ to be a set of i.i.d. observations of random variables $(X, Y)$, where $Y$ is the response variable subject to missingness, and $X$ is a $d_x$-dimensional variable that is always observed. Each dimension of $X$ has a compact support on $[0, 1]$. Let $q_\tau(x)$ be the unknown conditional $100\tau\%$ quantile of the response $Y$ given $X = x$. For a given $\tau \in (0, 1)$, the conditional quantile function $q_\tau(x)$ is defined as a function satisfying
$$ q_\tau(x) = \arg\min_a E\{\rho_\tau(Y - a) \mid X = x\}, \quad (2.1) $$
where $\rho_\tau(u) = u\{\tau - I(u < 0)\}$ is the check function proposed in Koenker and Bassett (1978). The true parameter of interest, $\theta_0$, is a $d_\theta$-dimensional vector defined as
$$ \theta_0 = E\{r(y_i, x_i)\}. \quad (2.3) $$
If the full sample is observed, we can estimate $\theta_0$ in (2.3) by
$$ \hat\theta_{full} = n^{-1} \sum_{i=1}^n r(y_i, x_i). \quad (2.4) $$
Because $y_i$ is unobserved for missing units, $\hat\theta_{full}$ is not available. In Section 2.2, we outline a procedure to obtain an imputed version of (2.4) by using semiparametric quantile regression. Let $\delta_i$ be the response indicator, defined by $\delta_i = 1$ if $y_i$ is observed and $\delta_i = 0$ if $y_i$ is missing. Assume $\delta_i \sim \mathrm{Bernoulli}(p_i)$, where the response probability $p_i$ is defined as
$$ p_i = P(\delta_i = 1 \mid x_i, y_i). \quad (2.5) $$
Assume that the response mechanism satisfies the condition of missing at random (MAR) of Rubin (1976). MAR asserts that the response $y_i$ is conditionally independent of the indicator $\delta_i$ given $x_i$, which can be expressed in terms of the response probability as
$$ P(\delta_i = 1 \mid x_i, y_i) = P(\delta_i = 1 \mid x_i), \quad (2.6) $$
or in terms of the conditional distribution of $y_i$ given $x_i$ as
$$ f(y_i \mid x_i, \delta_i = 1) = f(y_i \mid x_i). \quad (2.7) $$
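To make the definition in (2.1) concrete, the following sketch (on simulated data; all names are ours, not from the paper) verifies numerically that minimizing the empirical check loss recovers the empirical $\tau$-th quantile.

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(0)
y = rng.normal(size=5000)

# Minimizing the empirical expected check loss over a grid of candidate
# values a recovers the tau-th sample quantile.
tau = 0.75
grid = np.linspace(-3, 3, 1201)
losses = [check_loss(y - a, tau).mean() for a in grid]
a_hat = grid[int(np.argmin(losses))]
assert abs(a_hat - np.quantile(y, tau)) < 0.05
```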

2.1. SQRI using penalized B-splines
Many papers have studied the estimation of $q_\tau(x)$ based on parametric methods; a summary of the relevant literature can be found in Koenker (2005). Parametric model assumptions may not always hold, giving rise to nonparametric methods. Nonparametric quantile regression, including the kernel quantile regression of Yu and Jones (1994) and the smoothing spline method of Koenker, Ng, and Portnoy (1994), has also been intensively studied. Among many findings is the well-known trade-off between computational cost and smoothness: smoothing spline methods demand massive computation, while the unpenalized spline tends to give wiggly curves despite its cheap computational cost. There is some recent work on nonparametric quantile regression (we thank a referee for drawing our attention to it). Belloni, Chetverikov, and Fernandez-Val (2016) developed a nonparametric quantile regression series framework in which the conditional quantile function is approximated by a linear combination of series terms, and showed that this framework covers many models as special cases. Chao, Volgushev, and Cheng (2016) established weak convergence for the quantile regression estimator in a general series approximation framework, and studied a partially linear model that is included as a special case of such a framework and might be a solution to the curse of dimensionality in a nonparametric approach. In this paper, we employ a semiparametric quantile regression method based on penalized B-splines, as suggested in Yoshida (2013), which yields a relatively smooth quantile function at reduced computational burden. Without loss of generality, we consider $X$ to be a univariate variable with distribution function $F_x(x)$ on $[0, 1]$ to simplify the notation in the theory; we discuss how to deal with multivariate $X$ in the simulation study of Section 5. Let $K_n - 1$ be the number of knots within the range $(0, 1)$, and $p$ be the degree of the B-splines.
To construct the $p$-th degree B-spline basis, we define equidistantly located knots $\kappa_k = K_n^{-1} k$ $(k = -p+1, \cdots, K_n + p)$; note there are $K_n - 1$ knots located in $(0, 1)$. The $p$-th degree B-spline basis functions $B_k(x)$ $(k = -p+1, \cdots, K_n)$ are defined recursively as follows.
• For $s = 0$: $B_k^{[0]}(x) = I(\kappa_{k-1} < x \le \kappa_k)$, where $k = -p+1, \cdots, K_n + p$;
• For $s = 1, 2, \cdots, p$:
$$ B_k^{[s]}(x) = \frac{x - \kappa_{k-1}}{\kappa_{k+s-1} - \kappa_{k-1}}\, B_k^{[s-1]}(x) + \frac{\kappa_{k+s} - x}{\kappa_{k+s} - \kappa_k}\, B_{k+1}^{[s-1]}(x), $$
where $k = -p+1, \ldots, K_n + p - s$. The basis used below is $B_k(x) = B_k^{[p]}(x)$.
Readers can refer to de Boor (2001) for more details and properties of B-spline functions. Write $B(x) = (B_{-p+1}(x), \cdots, B_{K_n}(x))^T$. The estimated conditional quantile regression function is $\hat q_\tau(x) = B(x)^T \hat b(\tau)$, where
$$ \hat b(\tau) = \arg\min_{b} \sum_{i=1}^n \delta_i\, \rho_\tau\{y_i - B(x_i)^T b\} + \lambda_n\, b^T D_m^T D_m b. $$
Here $\lambda_n (> 0)$ is the smoothing parameter, and $D_m$ is the $m$-th difference matrix, of dimension $(K_n + p - m) \times (K_n + p)$, with elements
$$ (D_m)_{ij} = (-1)^{|i-j|} \binom{m}{j-i}\, I(0 \le j - i \le m), $$
where $\binom{m}{k} = m!\{k!(m-k)!\}^{-1}$ is the choose function and $m$ is the order of the penalty. As discussed in Yoshida (2013), the difference penalty $b^T(\tau) D_m^T D_m b(\tau)$ removes the computational difficulty that occurs when the penalty term is defined through an integral, and it controls the smoothness of the estimated quantile regression function. When $m = 2$, $D_m$ has an interpretation related to the integral of the square of the second derivative of the function defined by the B-spline. Because the second derivative of a straight line is zero, the use of $D_m$ with $m = 2$ shrinks the estimated quantile regression function toward a straight line. The asymptotic properties of $\hat q_\tau(x)$ are given in Section 3, and Section 5 discusses how we choose the tuning parameters $(\lambda_n, m, K_n, p)$ in practice.
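As a small illustration of the basis and penalty just described (an illustrative SciPy sketch, not the authors' code), the snippet below builds the B-spline design matrix for $p = 3$ and $K_n = 5$ and the second-order difference matrix $D_2$, and checks that the basis functions sum to one and that $D_2$ annihilates straight-line coefficient sequences, which is why the penalty shrinks the fit toward a straight line.

```python
import numpy as np
from scipy.interpolate import BSpline

p, Kn = 3, 5                        # degree and K_n as in the paper
nb = Kn + p                         # number of B-spline basis functions
t = np.arange(-p, Kn + p + 1) / Kn  # equidistant knots kappa_k = k / K_n

x = np.linspace(0, 1, 50, endpoint=False)
B = BSpline.design_matrix(x, t, p).toarray()   # 50 x (Kn + p) design matrix

m = 2
D = np.diff(np.eye(nb), n=m, axis=0)  # (Kn + p - m) x (Kn + p) difference matrix

# Basis functions form a partition of unity on the base interval.
assert np.allclose(B.sum(axis=1), 1.0)
# D_2 annihilates arithmetic (straight-line) coefficient sequences, so the
# penalty b^T D_m^T D_m b shrinks the fit toward a straight line.
assert np.allclose(D @ (2.0 * np.arange(nb) + 1.0), 0.0)
assert D.shape == (nb - m, nb)
```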

2.2. Parameter estimation through SQRI
Suppose the parameter of interest is $\theta_0 = E\{r(y_i, x_i)\}$. Cheng (1994) showed that $E\{r(Y, X) \mid X = x\} = \int_0^1 r\{q_\tau(x), x\}\, d\tau$, so an estimator of $\theta_0$ can be obtained as
$$ \hat\theta = n^{-1} \sum_{i=1}^n \Big[ \delta_i\, r(y_i, x_i) + (1 - \delta_i) \int_0^1 r\{\hat q_\tau(x_i), x_i\}\, d\tau \Big]. \quad (2.10) $$
The integral in (2.10) can be approximated by $J^{-1} \sum_{j=1}^J r\{\hat q_{\tau_j}(x_i), x_i\}$ with $\tau_j$ drawn from Uniform[0, 1]. This suggests the following SQRI imputation procedure. When $y_i$ is missing, we generate $J$ imputed values $\{y^*_{ij}\}_{j=1}^J$ as follows.
1. Simulate $\tau_j \sim$ Uniform(0, 1) independently for $j = 1, 2, \cdots, J$;
2. For each $j = 1, 2, \ldots, J$, estimate $\hat q_{\tau_j}(\cdot)$ by the penalized B-spline quantile regression of Section 2.1;
3. For the missing unit $i$, $J$ independent values are generated as $y^*_{ij} = \hat q_{\tau_j}(x_i)$.
Repeat step 3 for every missing unit in the data set. Then an imputed version of $\hat\theta_{full}$ in (2.4) is defined as
$$ \hat\theta = n^{-1} \sum_{i=1}^n \Big[ \delta_i\, r(y_i, x_i) + (1 - \delta_i)\, J^{-1} \sum_{j=1}^J r(y^*_{ij}, x_i) \Big]. \quad (2.11) $$
For some parametric imputation methods, the imputation and estimation steps are entwined, in that updating parameters and re-imputing based on the most recently updated parameters are done iteratively; this can require heavy computing time. In the SQRI described above, the imputation and estimation steps are completely separate, making general purpose parameter estimation possible. Also, in SQRI, standard analytical tools can be applied directly to the imputed data without re-imputation. The PFI of Kim (2011) avoids re-imputation by adjusting the weights of imputed values based on iteratively updated parameters. However, any parametric imputation method, including PFI and MI, may suffer from model misspecification. Nonparametric imputation, such as HDI or the method proposed in Wang and Chen (2009) using kernel distances, assumes no parametric model, but the stability and accuracy of nonparametric estimators depend on the sample size and dimensionality of the problem. The SQRI provides a useful compromise between a fully parametric approach and a purely nonparametric approach.
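The three imputation steps above can be sketched end to end as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the penalized check-loss fit is approximated by smoothing $|u|$ as $\sqrt{u^2 + \epsilon}$ so a generic quasi-Newton optimizer applies, and all model and tuning choices (linear mean function, $\lambda$, $J = 10$) are ours.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# --- penalized B-spline setup (p = 3, K_n = 5, m = 2; lam is our choice) ---
p, Kn, m, lam = 3, 5, 2, 1e-3
t = np.arange(-p, Kn + p + 1) / Kn           # equidistant knots kappa_k = k/K_n
nb = Kn + p                                  # number of basis functions
D = np.diff(np.eye(nb), n=m, axis=0)         # m-th difference matrix D_m

def basis(x):
    """B-spline design matrix: column k is the k-th basis function at x."""
    return BSpline(t, np.eye(nb), p)(x)

def fit_quantile(Bobs, yobs, tau):
    """Penalized check-loss fit; rho_tau(u) = u(tau - 1/2) + |u|/2,
    with |u| smoothed as sqrt(u^2 + eps) so BFGS applies."""
    def obj(b):
        u = yobs - Bobs @ b
        rho = u * (tau - 0.5) + 0.5 * np.sqrt(u**2 + 1e-6)
        return rho.sum() + lam * np.sum((D @ b) ** 2)
    return minimize(obj, np.zeros(nb), method="BFGS").x

# --- synthetic data with responses missing at random ---
n = 300
x = rng.uniform(0, 1, n)
y = 1 + 2 * x + rng.normal(0, 0.1, n)                 # true E(Y) = 2
p_resp = 1 / (1 + np.exp(-(1 + 0.5 * x)))             # MAR response probability
delta = rng.uniform(size=n) < p_resp

Bx = basis(x)
Bobs, yobs = Bx[delta], y[delta]

# --- SQRI steps 1-3 ---
J = 10
taus = rng.uniform(0, 1, J)                            # step 1
coefs = [fit_quantile(Bobs, yobs, tj) for tj in taus]  # step 2
ystar = np.column_stack([Bx @ b for b in coefs])       # step 3: y*_ij

# imputed version of the mean estimator, i.e. (2.11) with r(y, x) = y
theta_hat = np.where(delta, y, ystar.mean(axis=1)).mean()
assert abs(theta_hat - 2.0) < 0.15
```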

3. Large sample theories and variance estimation
In this section, we demonstrate the theoretical validity of $\hat\theta$ constructed with the imputed values generated from the SQRI procedure described in Section 2, provide a variance estimator for $\hat\theta$, and introduce an alternative SQRI estimator using fixed $\tau_j$'s.

3.1. Asymptotic normality of $\hat\theta$
The derivation of the asymptotic distribution of $\hat\theta$ proceeds in two steps. In Lemma 1, we give the asymptotic distribution of the estimated quantile function $\hat q_\tau(x)$, which leads to the Bahadur representation of the estimated quantile regression coefficients $\hat b(\tau)$. In Theorem 1, we provide the asymptotic distribution of $\hat\theta$. The Bahadur representation of $\hat b(\tau)$ is crucial to the proof of Theorem 1.

for a given $x \in (0, 1)$ and $\tau \in (0, 1)$, where $f_{y|x}(x, y)$ is the conditional density of $Y$ given $X = x$. There are two sources of asymptotic bias in $\hat q_\tau(x)$. One is $b^a_\tau(x)$, the model bias between the true function $q_\tau(x)$ and the spline model used; see equation (3.1). The other, $b^\lambda_\tau(x)$, is introduced by adding the penalty term to the quantile regression; when there is no penalty term ($\lambda_n = 0$), this bias vanishes. Both bias terms are of order $O(n^{-(p+1)/(2p+3)})$. By the proof of Lemma 1, the estimator of the quantile regression coefficients $\hat b(\tau)$ has the Bahadur representation given in (3.2). The property in (3.2) is used extensively in the derivation of Theorem 1, which concerns the asymptotic normality of $\hat\theta$. We now state Theorem 1.

Theorem 1. Under the conditions given in the Appendix, and assuming $J \to \infty$ as $n \to \infty$,
$$ \sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d} N(0, \Sigma), \quad \text{where } \Sigma = \mathrm{Var}\{\xi_{i,r}(\theta_0)\}. $$
Remark 1: When there are no missing data, $\hat\theta$ reduces to the full-sample estimator $\hat\theta_{full}$ in (2.4). The key step in the proof of Theorem 1 is to show that the imputation term admits an asymptotically linear representation in terms of the observed data. Appendix A gives the proof of Lemma 1, and Appendix B outlines the proof of Theorem 1.

3.2. Variance estimator for $\hat\theta$
We estimate the asymptotic variance matrix $\Sigma = \mathrm{Var}\{\xi_{i,r}(\theta_0)\}$ by plugging estimates of its components into the expression for $\xi_{i,r}(\theta_0)$; see equation (3.6). This requires an estimate of the conditional density $f_{y|x}(x, y)$, for which we use kernel density estimation,
$$ \hat f_{y|x}(x, y) = \frac{\sum_{i=1}^n \delta_i\, K\{(x_i - x)/a\}\, K\{(y_i - y)/b\}}{b \sum_{i=1}^n \delta_i\, K\{(x_i - x)/a\}}, $$
where $K(\cdot)$ is a normal kernel and $a$ (or $b$) is the bandwidth for $x$ (or $y$).
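A minimal sketch of such a conditional density estimator (our own illustration with product normal kernels and hand-picked bandwidths; the paper's bandwidth choices may differ):

```python
import numpy as np

def norm_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def f_y_given_x(y0, x0, xobs, yobs, a, b):
    """Product-kernel estimate of f(y|x): the ratio of a joint kernel
    estimate to a marginal one, with bandwidths a (for x) and b (for y)."""
    wx = norm_kernel((xobs - x0) / a)
    return (wx * norm_kernel((yobs - y0) / b)).sum() / (b * wx.sum())

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 2000)
y = 2 * x + rng.normal(0, 0.5, 2000)

# f(y | x = 0.5) should be roughly N(1, 0.5^2): unit mass and mode near 1.
ygrid = np.linspace(-2, 4, 601)
dens = np.array([f_y_given_x(v, 0.5, x, y, a=0.1, b=0.15) for v in ygrid])
mass = dens.sum() * (ygrid[1] - ygrid[0])
assert abs(mass - 1.0) < 0.05
assert abs(ygrid[dens.argmax()] - 1.0) < 0.25
```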

3.3. Alternative SQRI estimator using fixed $\tau_j$'s
When using random $\tau_j$'s generated from Uniform[0, 1] to impute $y^*_{ij}$, it is possible to obtain some $\tau_j$'s very close to zero or one. This can cause instability in variance estimation due to extreme quantiles. To avoid this potential trouble, we propose an alternative SQRI estimator that uses fixed $\tau_j$'s.
The estimator $\hat\theta$ defined in (2.10) is a consistent estimator of $\theta_0 = E\{r(y_i, x_i)\}$. Instead of approximating the integral $\int_0^1 r\{\hat q_\tau(x_i), x_i\}\, d\tau$ in $\hat\theta$ using $J$ random $\tau$'s generated from Uniform[0, 1], we can use a midpoint approximation approach (e.g., Nusser, Carriquiry, Dodd, and Fuller (1996)), which entails dividing the interval $[0, 1]$ into $J$ sub-intervals and evaluating the function at the midpoints of the $J$ sub-intervals. An estimator using the fixed midpoints can be defined as
$$ \hat\theta_{fixed} = n^{-1} \sum_{i=1}^n \Big[ \delta_i\, r(y_i, x_i) + (1 - \delta_i)\, J^{-1} \sum_{j=1}^J r\{\hat q_{\tau_j}(x_i), x_i\} \Big], $$
where $\tau_j = (2j - 1)/(2J)$ $(j = 1, \cdots, J)$ is the fixed sequence of midpoints of the $J$ evenly spaced sub-intervals. By the proof given in Appendix C, the SQRI estimator $\hat\theta_{fixed}$ satisfies the asymptotic order given in (3.10). Under the same conditions given in Theorem 1, and assuming further that $J = O(n^\alpha)$ for $\alpha > 1/2$, the two SQRI estimators $\hat\theta_{fixed}$ and $\hat\theta$ have the same asymptotic distribution (3.11). Therefore, the asymptotic normality and the variance estimator of $\hat\theta_{fixed}$ can be derived similarly to those for $\hat\theta$ in Sections 3.1 and 3.2. To see whether there are any finite sample differences between the SQRI estimator using random $\tau_j$'s and the SQRI estimator using fixed $\tau_j$'s, both estimators are calculated and reported in the simulation studies of Section 5.
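The midpoint construction is easy to check numerically. In the sketch below (our illustration), the $\tau_j$ are the midpoints $(2j - 1)/(2J)$ and, for $Y \sim N(0, 1)$ with $r(y, x) = y$, the midpoint average of the quantile function $\Phi^{-1}(\tau_j)$ recovers $E(Y) = 0$.

```python
import numpy as np
from scipy.stats import norm

# Midpoint approximation of the integral of q_tau over [0, 1] for Y ~ N(0, 1),
# where q_tau = Phi^{-1}(tau): the integral equals E(Y) = 0.
J = 200
tau = (2 * np.arange(1, J + 1) - 1) / (2 * J)   # midpoints (2j - 1) / (2J)
q = norm.ppf(tau)

assert np.all(np.diff(q) > 0)          # quantiles increase in tau
assert abs(q.mean()) < 1e-6            # midpoint average recovers E(Y) = 0
```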

4. Estimating a parameter defined by an estimation equation through SQRI
In this section, we discuss how to use the SQRI to estimate more general parameters defined through an estimation equation. Suppose the true parameter $\theta_0$ is defined as the unique solution to
$$ E\{g(Y, X; \theta)\} = 0, \quad (4.1) $$
where $g(Y, X; \theta)$ is a vector of $d_g$ estimation functions with $d_g \ge d_\theta$. We consider the generalized method of moments (GMM) to estimate $\theta_0$. The GMM estimator is obtained as
$$ \hat\theta_{gmm} = \arg\min_\theta \bar G(\theta)^T W \bar G(\theta), \quad \bar G(\theta) = n^{-1} \sum_{i=1}^n \Big[ \delta_i\, g(y_i, x_i; \theta) + (1 - \delta_i)\, J^{-1} \sum_{j=1}^J g(y^*_{ij}, x_i; \theta) \Big], \quad (4.2) $$
where $W$ is a positive definite weighting matrix, and $y^*_{ij}$ are the $J$ imputed values generated through the SQRI procedure, either using random $\tau_j$'s from Uniform[0, 1] or using the fixed sequence of midpoints of $J$ equally spaced sub-intervals of $[0, 1]$.
The asymptotic normality of $\hat\theta_{gmm}$ is very similar to Theorem 1: $\sqrt{n}(\hat\theta_{gmm} - \theta_0)$ converges to a normal distribution whose variance takes a sandwich form built from the gradient matrix $\Gamma(\theta_0)$ and $\mathrm{Var}\{\xi_{i,g}(\theta_0)\}$; the other notations are defined in the same way as before. The variance estimator for $\mathrm{Var}\{\xi_{i,g}(\theta_0)\}$ can be obtained by estimating $\xi_{i,g}(\theta_0)$ in the same way as $\xi_{i,r}(\theta_0)$ in Section 3.2; for example, in equation (3.6), $r(y_i, x_i)$ can be replaced by $g(y_i, x_i; \hat\theta_{gmm})$ and $\dot r_y\{\hat q_{\tau_j}(x_k), x_k\}$ by $\dot g_y\{\hat q_{\tau_j}(x_k), x_k; \hat\theta_{gmm}\}$. The additional gradient matrix $\Gamma(\theta_0)$ can be estimated by its sample analogue in (4.6). The proof of the asymptotic normality of $\hat\theta_{gmm}$ requires additional regularity conditions on the functional form of $g(y_i, x_i; \theta)$, and is similar to the proof of Theorem 1. To conserve space, we defer the details of this proof to the supplement.
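To illustrate the GMM machinery in its simplest form (a generic just-identified example on complete simulated data with $W = I$; it omits the imputation step and is not the paper's estimator):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.5, 4000)

# Just-identified GMM for theta = (mu, sigma) with moment functions
# g(y; theta) = (y - mu, (y - mu)^2 - sigma^2) and weighting matrix W = I.
def gbar(theta):
    mu, sig = theta
    return np.array([(y - mu).mean(), ((y - mu) ** 2 - sig**2).mean()])

def Q(theta):
    g = gbar(theta)
    return g @ g                      # quadratic form G^T W G with W = I

theta_hat = minimize(Q, x0=np.array([0.0, 1.0]), method="Nelder-Mead").x
assert abs(theta_hat[0] - 2.0) < 0.1
assert abs(theta_hat[1] - 1.5) < 0.1
```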

5. Simulation studies
The main objectives of the simulation studies are to compare SQRI to fully parametric and nonparametric alternatives, and to evaluate the performance of the variance estimator for SQRI. In addition, we also want to see if there are any differences in finite sample performances between the two SQRI estimators.
We specify the simulation set-up as follows. The response $y_i$ is generated from a model $y_i = m(x_i) + \epsilon_i$, where $m(x_i)$ is the mean function and the $\epsilon_i$ are i.i.d. $N(0, 0.1^2)$. We consider four different mean functions (a linear, a bump, a cycle, and a bivariate model), drawn from the design of the simulation studies in Breidt, Opsomer, and Johnson (2007), to cover a range of correct and incorrect model specifications.
The covariate $x_i$ for the first three univariate models (or $x_{1i}$ and $x_{2i}$ for the bivariate model) is independently and identically simulated from a normal distribution $N(0.5, 0.3^2)$ truncated to the interval $[0, 1]$.
The response indicator $\delta_i$ is simulated from Bernoulli($p_i$), where
$$ \mathrm{logit}(p_i) = 1 + 0.5 x_i \ \text{(linear, bump, cycle models)}, \quad \mathrm{logit}(p_i) = 0.2 + x_{1i} + 0.5 x_{2i} \ \text{(bivariate model)}. \quad (5.1) $$
The missing rates in all scenarios are about 20%.
We are interested in estimating three parameters: the marginal mean of the response, $\mu_y = E(Y)$; the marginal standard deviation of the response, $\sigma_y = \sqrt{\mathrm{Var}(Y)}$; and the correlation between the response and covariate, $\rho = \mathrm{corr}(X, Y)$. So $\theta = (\mu_y, \sigma_y, \rho)$, and the corresponding estimation functions are
$$ g(y, x; \theta) = \big( y - \mu_y,\; (y - \mu_y)^2 - \sigma_y^2,\; (x - \mu_x)(y - \mu_y) - \rho\, \sigma_x \sigma_y \big)^T. $$
For the bivariate model, $\theta = (\mu_y, \sigma_y, \rho_1, \rho_2)$, where $\rho_1 = \mathrm{corr}(X_1, Y)$ and $\rho_2 = \mathrm{corr}(X_2, Y)$, and the estimation equation is defined analogously. Note that $\mu_x$ and $\sigma_x^2$ are the mean and variance of the covariate and are treated as nuisance parameters.
For each model, 1000 Monte Carlo (MC) samples of size n = 200 are created. In order to focus on comparing different imputation methods, all of the following estimators are calculated using the GMM method once the missing values are filled in using the specified imputation approaches.
• Full: the GMM estimator using full observations.
• Resp: the GMM estimator using respondents only ("Resp" comes from the word "respondents").
• SQRI: our proposed estimator defined in (4.2), using random $\tau_j$'s from Uniform[0, 1]. Using the GMM estimator constructed with SQRI imputed values facilitates the joint estimation of $\mu_y$, $\sigma_y$, and the $\rho$'s, and their variance estimation.
• SQRI-fixed: the proposed estimator defined in (4.2), but using the fixed midpoints of $J$ evenly spaced sub-intervals of [0, 1] as the $\tau_j$ in step 1 of the SQRI procedure.
• MI: the multiple imputation method proposed in Rubin (1987). The R package 'mi' by Gelman, Hill, Su, Yajima, and Pittau (2013) is employed to obtain $J$ multiply imputed data sets. The GMM estimator is calculated for each imputed data set, and the MI estimator is obtained by averaging the GMM estimators across the multiple imputed data sets.
• PFI: the parametric fractional imputation method proposed in Kim (2011).
Under PFI, multiple imputed values $y^*_{ij}$ $(j = 1, \cdots, J)$ are generated from a proposal conditional density $\tilde f(y|x)$, and their associated fractional weights $w^*_{ij}$ are computed using $\tilde f(y|x)$ and the assumed conditional density $f(y|x; \hat\eta_0)$, where $\hat\eta_0$ is the given initial value for $\eta$ in the conditional density formula. By maximizing the score function of the density $f(y_i|x_i; \eta)$ using the imputed values and their weights, $\hat\eta$ is updated, and the fractional weights $w^*_{ij}$ are re-calculated iteratively until $\hat\eta$ converges. The PFI estimator is calculated using the GMM method, with the estimation equation for a missing unit $i$ replaced by $\sum_{j=1}^J w^*_{ij}\, g(y^*_{ij}, x_i; \theta)$.
• NPI: the nonparametric imputation method proposed in Wang and Chen (2009). In NPI, multiple imputed values $y^*_{ij}$ $(j = 1, \cdots, J)$ are independently drawn from the respondent group, with the probability of selecting $y_s$ with $\delta_s = 1$ being
$$ \frac{K\{(x_i - x_s)/h\}}{\sum_{t:\, \delta_t = 1} K\{(x_i - x_t)/h\}}, $$
where $K(\cdot)$ is a $d_x$-dimensional kernel function and $h$ is a smoothing bandwidth. In our simulations, the Gaussian kernel is used, with $h$ prescribed by a cross-validation method. The NPI estimator is obtained using the GMM method, with the estimation equation for a missing unit $i$ replaced by $J^{-1} \sum_{j=1}^J g(y^*_{ij}, x_i; \theta)$.
• HDFI: a hot-deck fractional imputation estimator. Under HDFI, multiple imputed values $y^*_{ij}$ $(j = 1, \cdots, J)$ are independently drawn from a donor pool consisting of the 20 nearest neighbors identified through the Euclidean distance. The HDFI estimator is calculated using the GMM method, with the estimation equation for a missing unit $i$ replaced by $J^{-1} \sum_{j=1}^J g(y^*_{ij}, x_i; \theta)$.
The Full and Resp estimators are included to help gauge how far our proposed estimators are from the ideal case and from the case of simply ignoring missing data.
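The NPI drawing mechanism can be sketched as follows (an illustrative implementation with a univariate Gaussian kernel and a hand-picked bandwidth; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)

def npi_draw(x_miss, x_don, y_don, h, J, rng):
    """Draw J imputed values from donors with probabilities proportional
    to kernel distances between the missing cell and the donors."""
    w = np.exp(-0.5 * ((x_don - x_miss) / h) ** 2)   # Gaussian kernel weights
    return rng.choice(y_don, size=J, replace=True, p=w / w.sum())

x_don = np.linspace(0, 1, 50)     # donors' covariates
y_don = 10 * x_don                # donors' responses
draws = npi_draw(0.2, x_don, y_don, h=0.05, J=500, rng=rng)

assert set(draws) <= set(y_don)          # imputed values are donor values
assert abs(draws.mean() - 2.0) < 0.5     # draws concentrate near x = 0.2
```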
Estimators NPI and HDFI are based on nonparametric imputation methods, while estimators MI and PFI are based on parametric imputation methods, in which $y_i$ is assumed to satisfy $Y|X = x \sim N(\beta^T x, \sigma^2)$ for some $\sigma > 0$. Our SQRI estimators are semiparametric, as we use penalized B-splines to estimate the conditional quantile regression. For penalized B-spline quantile estimators, the degree of the B-spline $p$ and the order of the difference matrix $m$ are typically fixed at low values, for example $p \le 3$ and $m \le 2$. We set $p = 3$ and $m = 2$, a popular choice in practice, as suggested in Yoshida (2013). For a given $K_n$ (where $K_n$ = number of knots + 1), the smoothing parameter $\lambda_n$ is prescribed via the generalized approximate cross-validation (GACV) method discussed by Yuan (2006). We obtained results for a variety of choices of $K_n$ and conclude that $K_n = 5$ suffices in our examples. In the bivariate model, the same specifications are used to obtain bases $B(x_1)$ and $B(x_2)$ on $x_1$ and $x_2$ respectively, and the basis $B(x)$ for the bivariate model is then built from $B(x_1)$ and $B(x_2)$. We now discuss how to choose the number of imputed values, $J$, in practice. Like all other fractional imputation methods, the theoretical proofs of the SQRI estimators require $J$ to go to infinity to ensure consistency, but a finite number $J$ of imputations is necessary in practice. The midpoint approximation approach used for the SQRI-fixed estimator sheds some light on the choice of $J$: by the property in (3.10), $\hat\theta_{fixed}$ and $\hat\theta$ are asymptotically equivalent when $J = O(n^\alpha)$ for $\alpha > 1/2$, so an integer close to $n^{1/2}$ is a natural initial choice. For example, in our simulation studies, $n^{1/2}$ with sample size $n = 200$ equals 14.14, so $J = 10$ seems a reasonable choice and is used in the imputation. Simulation results with $J = 100$ are also provided for comparison.
Tables 1-4 present the relative MC biases and the MC variances of all estimators, with both J = 10 and J = 100, for the four models respectively. The relative MC bias is defined as the ratio of the MC bias to the true parameter value. The relative biases of the proposed SQRI estimators are less than 1% in all cases (Tables 1-4). In general, the proposed SQRI estimators have smaller relative biases and variances than the Resp estimator, because the SQRI estimators incorporate the additional covariate information of the missing units, while the Resp estimator ignores the missing units entirely.
We first compare the SQRI estimators with the two parametric imputation alternatives, MI and PFI. When the linear model is correctly specified, the SQRI estimators have relative biases comparable to those of the MI and PFI estimators. When the model is misspecified (bump, cycle, bivariate), the SQRI estimators generally have significantly smaller biases than the MI and PFI estimators. The reduction in bias of the SQRI estimators in the bump, cycle, and bivariate models, relative to MI and PFI, arises because the SQRI procedure relies on fewer model assumptions. In terms of variances (Tables 1-4), the proposed SQRI estimators have variances comparable to those of the two parametric estimators under the correct linear model, and slightly smaller variances under the incorrect models.
We next compare the SQRI estimators with the two nonparametric imputation alternatives, NPI and HDFI. The SQRI estimators have considerably smaller biases than the NPI and HDFI estimators in most cases, and the variances of the SQRI estimators are generally in line with those of the NPI and HDFI estimators (Tables 1-4).
Comparing the SQRI estimator using random $\tau$'s with the SQRI-fixed estimator using fixed $\tau$'s, we do not see any consistent superiority of one over the other in relative biases, and the variances of the two SQRI estimators are similar. The simulation studies show that J = 10 is sufficient for the proposed SQRI estimators.
We evaluate the performance of variance estimation for both SQRI estimators, using random $\tau$'s and fixed $\tau$'s. Tables 5 and 6 contain the coverage probabilities and the half interval widths of the 95% confidence intervals (C.I.'s) for the two SQRI estimators, for both J = 10 and J = 100. The 95% C.I.'s are calculated based on the asymptotic normality and on a bootstrap method. For the SQRI estimator using random $\tau$'s, the coverage probabilities based on normality are close to the nominal level of 0.95, except for $\rho$ under the linear and cycle models and $(\sigma_y, \rho_1)$ under the bivariate model: the normal approximation undercovers for $\rho$ in the linear and cycle models and for $\sigma_y$ in the bivariate model, but overcovers for $\rho_1$ in the bivariate model. When using the SQRI-fixed estimator, the coverage probabilities based on normality improve slightly in general, though the coverage probabilities for $\rho$ in the linear and cycle models are still low (about 85.3% and 92.5%). The confidence interval lengths of the two SQRI estimators based on normality are comparable. A bootstrap method is then conducted as a remedy to obtain the confidence intervals. The bootstrap algorithm is described as follows.
1. Draw a simple random sample $\chi^*_n$ with replacement from the original sample $\chi_n = (x_i, y_i, \delta_i)_{i=1}^n$;
2. Implement semiparametric quantile regression to impute values for the missing cells in $\chi^*_n$;
3. Estimate $\hat\theta$ using the SQRI and SQRI-fixed estimators in equation (4.2);
4. Repeat steps 1-3 $B$ times to obtain $\hat\theta^{[1]}, \hat\theta^{[2]}, \cdots, \hat\theta^{[B]}$. The 2.5-th and 97.5-th percentiles of $\{\hat\theta^{[b]}\}_{b=1}^B$ give the lower and upper bounds of the 95% confidence interval.
We use $B = 400$ in our simulation. In general, the bootstrap method performs slightly better than the normal approximation method, offering satisfactory coverage probabilities close to 0.95 even when $J$ is small.
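The percentile mechanics of steps 1-4 can be sketched as below; for brevity this illustration bootstraps a simple sample mean and omits the re-imputation in step 2, so it shows only the generic resampling and percentile steps.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(5.0, 2.0, 200)     # stand-in data; the estimator is the mean

B = 400
boot = np.array([rng.choice(x, size=x.size, replace=True).mean()
                 for _ in range(B)])
lo, hi = np.percentile(boot, [2.5, 97.5])  # percentile 95% interval

assert lo < x.mean() < hi          # the interval brackets the point estimate
assert 0.2 < hi - lo < 1.6         # width near 2 * 1.96 * 2 / sqrt(200) ~ 0.55
```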
In summary, the simulation results confirm that the SQRI estimators are effective alternatives for imputation. The SQRI and SQRI-fixed estimators perform similarly in point estimation and variance estimation. However, if one is concerned about extreme imputed values, the SQRI-fixed estimator may be preferable, because using fixed $\tau$'s prevents extreme quantiles from being drawn.

6. Future work
Several directions remain for future work. In this paper, we proposed a quantile regression imputation method for missing responses. Wei, Ma, and Carroll (2012) used multiple imputation to estimate the quantile regression coefficients of $y$ on $x$ when covariates are subject to missingness. Combining their method with ours, one could develop a quantile regression imputation method to handle data with both missing covariates and missing responses. Another limitation of the current approach is the focus on low dimensional covariates, and it is important to extend the proposed method to high dimensional covariates. Model selection for high dimensional data has been intensively studied in the quantile regression literature; one popular method is the least absolute shrinkage and selection operator (LASSO). One could use LASSO quantile regression to reduce the dimension first, and then estimate the conditional quantile regression based on the selected model. We will investigate these topics in future work.
(f) r(y, x) has a bounded second derivative function with respect to y.

A. Proof of Lemma 1
Assume the number of knots $K_n - 1$ and the smoothing parameter $\lambda_n$ depend on $n$. By Barrow and Smith (1978), there exists $b^*(\tau)$ satisfying
$$ \sup_{x \in (0,1)} \big| q_\tau(x) - B(x)^T b^*(\tau) + b^a_\tau(x) \big| = o\big(K_n^{-(p+1)}\big), $$
where $b^a_\tau(x)$ is the spline approximation bias involving $q^{(p+1)}_\tau(x)$, the $(p+1)$-th derivative of $q_\tau(x)$ with respect to $x$, and the Bernoulli polynomial $Br_p(\cdot)$. Here $Br_p(\cdot)$ is the $p$-th Bernoulli polynomial, defined inductively by $Br_0(x) = 1$ and $Br_p(x) = p \int_0^x Br_{p-1}(z)\, dz + b_p$, where the constant $b_p$, the $p$-th Bernoulli number, is determined by $\int_0^1 Br_p(x)\, dx = 0$ (Barrow and Smith (1978); Yoshida (2013)).
Proof: The proof uses Knight's identity,
$$ \rho_\tau(u - v) - \rho_\tau(u) = -v\,\{\tau - I(u < 0)\} + \int_0^v \{I(u \le s) - I(u \le 0)\}\, ds. $$
Similar to Lemma 3 in Yoshida (2013), we can derive the asymptotic behavior of the objective function $U_n(\Delta)$. Since $U_n(\Delta)$ is convex with respect to $\Delta$, it has a unique minimizer $\hat\Delta_n(\tau)$, expressed through $H_n(\tau) = (\lambda_n/n) D_m^T D_m + \Phi(\tau)$. The results in (i) and (ii) of Lemma 1 follow immediately.

B. Proof of Theorem 1
To save space, we only sketch the proof of Theorem 1; for details, readers are referred to the supplement, which provides the detailed derivation of the asymptotic normality of the GMM estimator $\hat\theta_{gmm}$ defined in Section 4. Note that the estimator $\hat\theta$ in Theorem 1 is a special case of the GMM estimator with $g(y_i, x_i; \theta) = r(y_i, x_i) - \theta$, so the proofs are very similar.
Proof: We can decompose $\sqrt{n}(\hat\theta - \theta_0)$ into three terms, $\sqrt{n}(\hat\theta - \theta_0) = B_1 + B_2 + B_3$ (B.1). The terms $B_1$ and $B_2$ are easy to handle, since they are sums of i.i.d. random variables. The key idea in our proof is to replace $B_3$ by $\bar B_3 = E(B_3 | A_R)$, where $A_R = \{\delta_i, (y_i, x_i) \mid \delta_i = 1;\ i = 1, \cdots, n\}$, and to show the following two results: (1) $\bar B_3 = n^{-1/2} \sum_{i=1}^n \delta_i\, C_p\, h_r(x_i, y_i) B(x_i) + o_p(1)$, and (2) $\bar B_3 - B_3 = o_p(1)$.

C. Proof of property (3.10)
Proof: We can decompose $\hat\theta_{fixed} - \theta_0$ into three terms, as in (C.1), analogous to the decomposition (B.1). It is easy to see