$L_1$-Estimation for covariate-adjusted regression

We study a covariate-adjusted regression (CAR) model proposed for situations in which both the predictors and the response in a regression model are not directly observable but are distorted by multiplicative factors determined by unknown functions of an observable covariate. By establishing a connection to varying-coefficient models, we present a local linear $L_1$-estimation method for the case where the underlying error distribution deviates from a normal distribution, and we propose robust estimators of the parameters of the underlying regression model. The consistency and asymptotic normality of the robust estimators are established. Since the limiting distribution depends on unknown components of the errors, an empirical likelihood ratio method based on the $L_1$ estimator is proposed, and confidence intervals for the regression coefficients are constructed. Simulation results demonstrate the superiority of the proposed estimators over classical estimators when the underlying errors have heavy tails. The Pima Indian diabetes data set is analyzed to illustrate the performance of the proposed method; there the response and predictors are potentially contaminated by body mass index.


Introduction
Covariate-adjusted regression (CAR) was initially proposed for regression analysis by Sentürk and Müller [18], where both the response and the predictors are not directly observed.
The available data are distorted by unknown functions of a common observable covariate. An example is the fibrinogen data collected on 69 hemodialysis patients in Kaysen et al. [11], where the regression of fibrinogen level on serum transferrin level is of interest. Both response and predictor are known to be influenced in a multiplicative fashion by body mass index, defined as weight/height^2. Based on this observation, Sentürk and Müller [18] suggested that the confounding variable affects the primary variables through flexible multiplicative unknown functions, rather than through simple division by the body mass index identified as a common confounder. Such an adjustment may reduce non-negligible bias and lead to consistent estimators of the parameters of interest. For the simple case of two variables of interest, the underlying variables are

Ỹ = ψ(U)Y,  X̃ = φ(U)X,  (Y, X) ⊥⊥ U,  (1.1)

where "⊥⊥" indicates independence, Y and X are the unobservable continuous variables of interest, while Ỹ and X̃ are the available distorted variables; U is an observed continuous scalar confounding variable, and ψ(·) and φ(·) denote unknown smooth contaminating functions of U. The main goal is to uncover the relationship between the response Y and the covariate X, based on the confounding variable U and the contaminated variables Ỹ and X̃. Sentürk and Müller [18] considered the simple linear regression model

Y = γ_0 + γ_1 X + e,  (1.2)

where γ_0 and γ_1 are unknown parameters and e is the error term with e ⊥⊥ (X, U). A reasonable assumption for ψ(·) and φ(·) is that the mean distorting effect vanishes, that is,

E(ψ(U)) = 1,  E(φ(U)) = 1.  (1.3)

The central objective, based on observations of the confounding variable U and the distorted variables (Ỹ, X̃) in (1.1), is to estimate the unknown parameters γ_0 and γ_1.
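As a concrete illustration of the distortion mechanism, the following sketch simulates data of the form (1.1)-(1.3). The particular distortion functions psi and phi, the parameter values, and the error distribution are illustrative assumptions of this sketch, not quantities taken from the paper.

```python
# Simulate the CAR distortion mechanism: latent (X, Y) obey a linear model,
# but only multiplicatively distorted versions are observed.
import numpy as np

rng = np.random.default_rng(0)
n = 500
gamma0, gamma1 = 1.0, 2.0

U = rng.uniform(0.0, 1.0, n)          # observable confounder
X = rng.normal(2.0, 1.0, n)           # latent predictor, independent of U
e = rng.normal(0.0, 0.5, n)           # regression error
Y = gamma0 + gamma1 * X + e           # latent response, as in (1.2)

# Multiplicative distortions with E[psi(U)] = E[phi(U)] = 1, as in (1.3);
# these linear-in-U forms are hypothetical choices for illustration.
psi = lambda u: 1.0 + 0.5 * (u - 0.5)
phi = lambda u: 1.0 - 0.4 * (u - 0.5)

Y_tilde = psi(U) * Y                  # observed, distorted response
X_tilde = phi(U) * X                  # observed, distorted predictor

# Naively regressing Y_tilde on X_tilde ignores the distortion and is
# generally biased for gamma1, which motivates the covariate adjustment.
naive_slope = np.polyfit(X_tilde, Y_tilde, 1)[0]
```

The mean-one constraint on psi and phi is what makes the latent parameters identifiable from the distorted data.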
To eliminate the effect caused by distortions, Sentürk and Müller [19] proposed the covariate-adjusted varying coefficient model (CAVCM) to target the covariate-adjusted relationship between longitudinal variables. Sentürk and Nguyen [20] proposed estimation procedures for the model based on the local polynomial smoothing technique (LP). Cui et al. [3] considered covariate-adjusted nonlinear regression and proposed a direct plug-in estimation procedure. Li et al. [12] studied covariate-adjusted partially linear regression models and obtained confidence intervals for the regression coefficients.
According to model (1.1)-(1.3), the regression of Ỹ on X̃ can be expressed as

Ỹ = β_0(U) + β_1(U)X̃ + e(U),  (1.4)

where β_0(U) = γ_0 ψ(U), β_1(U) = γ_1 ψ(U)/φ(U), and e(U) = ψ(U)e. This is a varying coefficient model with heteroscedasticity, a useful extension of the classical linear model. Varying coefficient models are widely used in diverse areas because the modeling bias can be significantly reduced and the "curse of dimensionality" can be avoided; see, for example, Hastie and Tibshirani [8] and Fan and Zhang [5-7]. The least-squares (LS) method is the popular approach in the vast literature on model (1.4), as it has favorable properties for a large class of error distributions. However, this method breaks down when the random error is adversely affected by outliers or heavy-tailed distributions, so a robust estimation method is desired. In this article, we propose a robust coefficient estimation procedure motivated by Tang et al. [21]. We use a two-step procedure to estimate the unknown parameters. First, we use L_1-estimation to estimate the varying coefficients based on a local linear fit; because model (1.4) is heteroscedastic, the inference procedure differs from the homoscedastic case. Second, estimates of the unknown parameters are constructed as weighted averages of these functions. However, the limiting variance has a complex structure with several unknown components, so an estimated empirical log-likelihood approach is developed to construct confidence regions for the regression parameters, and the empirical log-likelihood ratio is proved to be asymptotically standard chi-square.
The rest of this article is organized as follows. In Sect. 2, we describe the L_1-estimation procedure and propose estimators of both the nonparametric and parametric components; we obtain the asymptotic results and discuss the efficiency of the estimators. In Sect. 3, we construct empirical-likelihood-based confidence regions for the parameters. Section 4 presents the hypothesis testing procedure. In Sects. 5 and 6, simulations and an empirical study are carried out to assess the performance of the proposed estimators and confidence regions. Section 7 concludes the paper with a discussion. The proofs of the theorems are deferred to the Appendix.

Estimation and asymptotic behavior
Consider a covariate-adjusted regression model in the following general form:

Y = Σ_{r=1}^p γ_r X_r + e,  Ỹ = ψ(U)Y,  X̃_r = φ_r(U)X_r,  r = 1, ..., p,  (2.1)

where γ = (γ_1, ..., γ_p)^τ; Y and X_r, r = 1, ..., p, are unobservable variables distorted by the smooth functions ψ(U) and φ_r(U); and Ỹ, X̃_r, r = 1, ..., p, and the univariate confounder U are observable. The random error e has 0.5 quantile zero, and E(ψ(U)) = 1, E(φ_r(U)) = 1, r = 1, ..., p. Our goal is to estimate the unknown parameter γ consistently based on the observed data and to further establish asymptotic normality of the proposed estimators. The estimation of the regression coefficient γ is a two-step procedure similar to that in Sentürk and Müller [18]. From model (2.1), we rewrite the CAR as the following CAVCM:

Ỹ = Σ_{r=1}^p β_r(U)X̃_r + e(U),  β_r(U) = γ_r ψ(U)/φ_r(U),  e(U) = ψ(U)e.  (2.2)

In the first step, we employ L_1-estimation to estimate the varying coefficients β(U) based on a local linear fit. For U in a neighborhood of u, we use the local linear approximation β_r(U) ≈ a_r + b_r(U − u). Suppose that {U_i, X̃_i, Ỹ_i}, i = 1, ..., n, are independent and identically distributed samples from model (2.1). With a = (a_1, ..., a_p)^τ and b = (b_1, ..., b_p)^τ, the local linear L_1-estimator (â, b̂) minimizes

Σ_{i=1}^n |Ỹ_i − Σ_{r=1}^p (a_r + b_r(U_i − u))X̃_ir| K_h(U_i − u),

where K_h(·) = K(·/h)/h for a kernel K and bandwidth h, and we set β̂_r(u) = â_r.
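The first-step fit can be sketched in code for a single coefficient function (p = 1): at each point u, minimize the kernel-weighted sum of absolute residuals over the local intercept and slope. The optimizer (a derivative-free refinement of a weighted least-squares start) and the toy data below are implementation choices of this sketch, not the authors' algorithm.

```python
# Local linear L1 fit of a varying coefficient beta(.) at a point u.
import numpy as np
from scipy.optimize import minimize

def epanechnikov(t):
    # Epanechnikov kernel on [-1, 1]
    return 0.75 * np.maximum(1.0 - t**2, 0.0)

def local_linear_l1(u, U, X_tilde, Y_tilde, h):
    """Return (a_hat, b_hat) minimizing sum_i |Y_i - (a + b (U_i - u)) X_i| K_h(U_i - u)."""
    w = epanechnikov((U - u) / h)
    d = U - u

    def loss(theta):
        a, b = theta
        return np.sum(w * np.abs(Y_tilde - (a + b * d) * X_tilde))

    # Weighted least-squares start, then derivative-free L1 refinement
    Z = np.column_stack([X_tilde, d * X_tilde])
    sw = np.sqrt(w)
    theta0, *_ = np.linalg.lstsq(Z * sw[:, None], Y_tilde * sw, rcond=None)
    return minimize(loss, theta0, method="Nelder-Mead").x

# Toy check with beta(U) = 1 + U and heavy-tailed, median-zero errors
rng = np.random.default_rng(1)
n = 400
U = rng.uniform(0, 1, n)
X = rng.normal(2, 1, n)
Y = (1 + U) * X + 0.2 * rng.standard_t(df=3, size=n)
a_hat, b_hat = local_linear_l1(0.5, U, X, Y, h=0.2)   # a_hat estimates beta(0.5) = 1.5
```

In practice an exact weighted L1 fit can instead be computed by linear programming; the simplex-style refinement here merely keeps the sketch short.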
In the second step, from (C1), (C2), and (2.1), we have E(β_r(U)X̃_r) = γ_r E(X_r), so the unknown regression parameters γ_r, r = 1, ..., p, are obtained as weighted averages of the raw estimates β̂_r(U_i). The estimates are given by

γ̂_r = (1/n) Σ_{i=1}^n β̂_r(U_i)X̃_ir / X̄_r,  r = 1, ..., p,

where X̄_r = (1/n) Σ_{i=1}^n X̃_ir. In this section, we establish the asymptotic properties of γ̂_r.
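The second-step averaging is a one-line computation once the first-step raw estimates are in hand. The following sketch assumes the normalized-average form of the estimator stated above; the inputs are placeholders for the raw estimates β̂_r(U_i) and the observed X̃_ir.

```python
# Second-step CAR estimate: average the raw varying-coefficient estimates,
# weighted by the distorted predictor and normalized by its sample mean.
import numpy as np

def second_step_gamma(beta_hat_at_U, X_tilde_r):
    """gamma_hat_r = (1/n) sum_i beta_hat_r(U_i) * X_tilde_ir / mean(X_tilde_r)."""
    return np.mean(beta_hat_at_U * X_tilde_r) / np.mean(X_tilde_r)
```

Note that if the raw estimates were exactly constant, the estimator would return that constant, which is the behavior one expects under the null of a non-varying coefficient.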
Theorem 1 Under the regularity conditions in the Appendix, if h → 0 and nh → ∞ as n → ∞, then γ̂_r → γ_r in probability, r = 1, ..., p.

Theorem 2 Under the regularity conditions in the Appendix, if nh^2/log(1/h) → ∞ and nh^4 → 0 as n → ∞, then

√n(γ̂_r − γ_r) → N(0, σ_r^2) in distribution,  r = 1, ..., p.

The optimal bandwidth for β̂_r(·) is h ∼ n^{−1/5}. This bandwidth does not satisfy the conditions of Theorem 2; in order to obtain asymptotic normality of γ̂_r, undersmoothing of β̂_r(·) is necessary. This requirement has also been used in the literature on semiparametric models; see Carroll et al. [2] for a detailed discussion.

Empirical likelihood
Although we have obtained the asymptotic distribution of γ̂_r, the asymptotic variance σ_r^2 is complex and includes several unknown components that would have to be estimated. To resolve this difficulty, we propose an empirical likelihood method to construct a confidence interval for γ_r. For more background on empirical likelihood estimation, we refer to Owen [16].
Note that E((β_r(U_i) − γ_r)X̃_ir) = 0 for i = 1, ..., n, r = 1, ..., p, if γ_r is the true parameter. Hence, testing whether γ_r is the true parameter is equivalent to testing whether E((β_r(U_i) − γ_r)X̃_ir) = 0. Following Owen [15], to construct an empirical likelihood ratio function for γ_r, we set V_i(γ_r) = (β_r(U_i) − γ_r)X̃_ir and define the profile empirical likelihood ratio function

L_n(γ_r) = −2 max{ Σ_{i=1}^n log(n p_i) : p_i ≥ 0, Σ_{i=1}^n p_i = 1, Σ_{i=1}^n p_i V_i(γ_r) = 0 }.  (3.1)

It can be shown that L_n(γ_r) is asymptotically chi-squared with 1 degree of freedom. However, L_n(γ_r) cannot be used directly for statistical inference on γ_r because it contains the unknown β_r(·). A natural remedy is to replace β_r(·) by the L_1-estimator β̂_r(·) and V_i(γ_r) by V̂_i(γ_r) = (β̂_r(U_i) − γ_r)X̃_ir, which yields the estimated empirical likelihood ratio function L̂_n(γ_r). By the Lagrange multiplier method, L̂_n(γ_r) can be represented as

L̂_n(γ_r) = 2 Σ_{i=1}^n log(1 + λ V̂_i(γ_r)),

where λ solves Σ_{i=1}^n V̂_i(γ_r)/(1 + λ V̂_i(γ_r)) = 0. In the following, we show that L̂_n(γ_r) converges to the standard chi-square distribution with 1 degree of freedom.
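A minimal sketch of the estimated empirical likelihood ratio in this scalar-parameter case: given the values V̂_i(γ_r), solve the Lagrange equation for λ by a safeguarded Newton iteration and return 2 Σ log(1 + λV̂_i). The Newton solver and its step-halving safeguard are implementation choices of this sketch.

```python
# Empirical likelihood ratio for a scalar estimating equation E[V] = 0.
import numpy as np

def el_ratio(V, tol=1e-10, max_iter=100):
    """Solve sum_i V_i/(1 + lam*V_i) = 0 for lam, return 2*sum_i log(1 + lam*V_i)."""
    lam = 0.0
    for _ in range(max_iter):
        denom = 1.0 + lam * V
        g = np.sum(V / denom)            # Lagrange score in lam
        H = -np.sum(V**2 / denom**2)     # its derivative (always negative)
        step = g / H
        lam_new = lam - step
        # Halve the step until all implied weights 1/(n(1 + lam*V_i)) stay positive
        while np.any(1.0 + lam_new * V <= 0):
            step /= 2.0
            lam_new = lam - step
        if abs(lam_new - lam) < tol:
            lam = lam_new
            break
        lam = lam_new
    return 2.0 * np.sum(np.log(1.0 + lam * V))
```

When the sample of V̂_i averages exactly to zero, λ = 0 solves the equation and the ratio is zero; shifting the V̂_i away from zero inflates the ratio, which is what the chi-square calibration exploits.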

Theorem 3 Under the conditions of Theorem 2, we have L̂_n(γ_r) → χ_1^2 in distribution.
According to Theorem 3, we construct a (1 − α)-level confidence region for γ_r as {γ_r : L̂_n(γ_r) ≤ χ_1^2(1 − α)}, where χ_1^2(1 − α) denotes the (1 − α) quantile of the chi-square distribution with 1 degree of freedom.
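Inverting the ratio against the chi-square quantile yields the interval in practice. The sketch below performs that inversion by grid search; to stay self-contained it uses a quadratic stand-in profile in place of the real L̂_n(·), which in applications would be built from the V̂_i as above.

```python
# Empirical-likelihood interval: keep the gamma values whose ratio statistic
# stays below the chi-square(1) critical value.
import numpy as np
from scipy.stats import chi2

def el_interval(grid, ratio_fn, alpha=0.05):
    cutoff = chi2.ppf(1.0 - alpha, df=1)          # 3.841 for alpha = 0.05
    accepted = [g for g in grid if ratio_fn(g) <= cutoff]
    return min(accepted), max(accepted)

# Stand-in profile ratio(g) = 100*(g - 2)^2 mimics the quadratic shape the
# ratio takes near the truth; a real application would pass the EL ratio.
lo, hi = el_interval(np.linspace(0.0, 4.0, 401), lambda g: 100.0 * (g - 2.0) ** 2)
```

The resulting interval is automatically range-respecting and needs no variance estimate, which is the practical advantage over the normal-approximation interval from Theorem 2.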

Bootstrap test
It is often of practical interest to test the significance of the regression coefficients. We consider the null hypothesis

H_0: β_r(u) ≡ c_r,  r = 1, ..., p,  (4.1)

where the c_r are unknown constants. Under the null hypothesis, the smooth estimator β̂_r(u) of β_r(u) is expected to be close to a horizontal line, and we average {β̂_r(U_i)} to obtain the estimator ĉ_r of c_r. Similar to the statistics proposed by Cai, Fan, and Yao [1], the residual sum of squares under the null hypothesis is

RSS_0 = Σ_{i=1}^n (Ỹ_i − Σ_{r=1}^p ĉ_r X̃_ir)^2.

Analogously, the residual sum of squares corresponding to model (2.2) is

RSS_1 = Σ_{i=1}^n (Ỹ_i − Σ_{r=1}^p β̂_r(U_i)X̃_ir)^2.

The goodness-of-fit test statistic is defined as

T_n = (RSS_0 − RSS_1)/RSS_1,

and we reject the null hypothesis (4.1) for large values of T_n. The distribution of T*_n computed from bootstrap samples is used as an approximation to the distribution of T_n. The p-value of the test is the relative frequency of the event T*_n ≥ T_n.

Simulation study
In this section, we carry out simulations to investigate the performance of the methods proposed in Sects. 2 and 3. We compare the finite sample performance of the LP procedure with our approach. The underlying (unobserved) multiple regression model follows (2.1) with p = 3, where the predictors X_1, X_2, and X_3 are distributed as N(2, 1). For the weight function, we use the Epanechnikov kernel; the asymptotically optimal bandwidth h for LP is as considered in Sentürk and Müller [19], and a simple formula for the asymptotically optimal bandwidth for L_1 follows Kai et al. [10]. We repeat the simulation 1000 times with sample sizes of 50, 100, and 200. The corresponding results are summarized in Table 1. As the table shows, when the error is normally distributed, the proposed L_1 estimators perform nearly as well as the LP estimators, although they have slightly larger biases and standard deviations. For the other two nonnormal errors, however, the LP estimators do not perform as well as expected, and the L_1 estimators yield a significant improvement. For sample sizes n = 100 and 200, samples are generated from the simulated data above, and for each sample the 95% confidence intervals are computed using empirical likelihood; the results are reported in Table 2. As n increases, the coverage probabilities increase.

Application
We illustrate the methodology via an application to the diabetes data set, which contains eight-dimensional patterns used to study the prevalence of diabetes and other cardiovascular risk factors. The 131 subjects analyzed here are females of Pima Indian heritage, at least 35 years old, who were screened for diabetes. The female patients may have abnormal insulin action that prevents the body from normal utilization of glucose. Obesity is a risk factor for both diabetes and hypertension. One purpose of the study is to identify risk factors for diabetes, among which is hypertension. Here we investigate the relationship between plasma glucose concentration (GLU) and a hypertensive measure, diastolic blood pressure (DBP), through the simple linear regression GLU = γ_0 + γ_1 DBP + e. Body mass index (BMI = weight/height^2) has been identified as a major factor significantly associated with elevated prevalence of both hypertension and diabetes, and both the response and the predictor are potentially affected by it. Based on the confounding variable BMI and the contaminated variables GLU and DBP, the varying coefficient model has the form GLU = β_0(BMI) + β_1(BMI)DBP + e(BMI). The parameters γ_0 and γ_1 are estimated by the covariate-adjusted regression algorithm. Three outliers are removed before the analysis. The p-values for the covariate-adjusted regression estimates are obtained from 1000 bootstrap samples. For least-squares regression, DBP was close to significant, p = 0.056, while with covariate-adjusted regression it became significant, p = 0.029. Thus, the covariate-adjusted regression model is more appropriate for these data than least-squares regression. We compare the performance of the LP procedure with our approach. The LP estimates are (γ̂_0, γ̂_1) = (72.1511, 0.5972); the L_1 estimates are (γ̂_0, γ̂_1) = (74.2114, 0.6035).
We estimate the standard deviations of γ̂_0 and γ̂_1 based on 1000 bootstrap samples for both methods. The difference between the parameter estimates from LP modeling and L_1 modeling is relatively small; however, the L_1 estimators have smaller standard errors than the LP approach, indicating better performance of L_1 modeling. It is believed that the distortion effect of the obesity index on blood pressure differs from its effect on plasma glucose, and the distortion effect of the obesity index on GLU can be assessed directly from the estimated intercept function.

Discussion
In this paper, we propose a robust and efficient procedure for CAR that improves on the earlier LP estimation when the underlying error distribution deviates from a normal distribution; asymptotic normality is established under regularity conditions. We propose a two-step estimation procedure for CAR. First, we use L_1-estimation motivated by Tang et al. [21] to estimate the varying coefficients based on a local linear fit. The performance of the smoothing technique chosen for estimating the varying coefficient functions in the first step does affect the overall performance of the CAR estimates in the second step; when the data contain outliers or come from a population with a heavy-tailed distribution, L_1-estimation should yield better estimators. Second, the estimates of the unknown parameters are constructed as weighted averages of these functions. In addition, an estimated empirical log-likelihood approach is developed to construct confidence regions and confidence intervals for the regression coefficients. Finally, it would be interesting to develop a robust and efficient variable selection procedure for CAR in high-dimensional settings.
In order to obtain our results, we first prove some lemmas.

Lemma 1 Let (X_1, Y_1), ..., (X_n, Y_n) be i.i.d. random vectors, where the Y_i are scalar random variables. Assume that E|Y|^r < ∞ and that sup_x ∫ |y|^r f(x, y) dy < ∞, where f(x, y) denotes the joint density of (X, Y). Let K(·) be a bounded positive function with bounded support, satisfying a Lipschitz condition. Then

sup_x |(1/n) Σ_{i=1}^n {K_h(X_i − x)Y_i − E[K_h(X_i − x)Y_i]}| = O_p({log(1/h)/(nh)}^{1/2}),

provided that n^{2ε−1}h → ∞ for some ε < 1 − r^{−1}. The proof of Lemma 1 can be found in Mack and Silverman [13].

Lemma 2 Under the regularity conditions
The last equality holds because the last term is free of the optimization variables a and b. Since the L_1-loss is the quantile loss function at the 0.5 quantile, the remainder of the proof is similar to that of Theorem 3.1 of Kai et al. [10]. By Lemma 1, the conditional expectation of B_n can be calculated. Following the proof of Theorem 3.1 of Kai et al. [10], an application of the convexity lemma of Pollard [17] and the quadratic approximation lemma of Fan et al. [4] shows that the minimizer admits an asymptotic linear representation, and hence we obtain the asymptotic expression of β̂(U_k). We split the second term in this expression into two parts, R_{1k} + R_{2k}.

By a Taylor expansion and, without loss of generality, the stated identity, we obtain an expansion in which ξ lies between 0 and −r_i(U_k)ψ(U_i). Noting that K(·) is a symmetric function, we have E(R_{ii,1k}) = O(h^3) uniformly in k. In the same spirit, we can prove E(R_{ij,1k}) = O(h^6) uniformly in k. It follows that R_{1k} is asymptotically negligible, and a similar argument handles R_{2k}. Therefore, we obtain (A.1).

Lemma 3 Under the assumptions of Theorem 2, n^{−1/2} Σ_{i=1}^n V̂_i(γ_r) is asymptotically normal.
Proof We use some elementary calculation to obtain the decomposition. By the central limit theorem for sums of independent and identically distributed random variables, the leading term is asymptotically normal. By Lemma 2, we can show that n^{−1/2} Σ_{i=1}^n (β̂_r(U_i) − β_r(U_i))X̃_ir converges to 0 in probability. This, together with (A.5) and (A.6), proves Lemma 3.

Lemma 4 Under the conditions of Lemma 2, we have (1/n) Σ_{i=1}^n V̂_i^2(γ_r) → A(γ_r) in probability.
By the law of large numbers, we obtain M_1 → A(γ_r) in probability. Hence, to prove Lemma 4, we only need to show that M_l → 0 in probability, l = 2, 3.
By conditions (C1)-(C3) and Lemma 2, we obtain M_2 → 0 in probability, and by a similar argument the same holds for M_3. The proof is completed.

Lemma 5 Under the assumptions of Theorem 3, we have max_{1≤i≤n} |V̂_i(γ_r)| = o(n^{1/2}) a.s.
Proof Some elementary calculation yields the required bound, and by conditions (C1) and (C2), max_{1≤i≤n} |X̃_ir| = o(n^{1/2}) a.s.

Lemma 6
Under the conditions of Theorem 3, we have λ = O_p(n^{−1/2}).
Proof By using Lemmas 3-5 and the same method as in Owen [14], we can prove this lemma; we omit the details.
Proofs of Theorems 1 and 2 The proposed estimator γ̂_r can be decomposed as

√n(γ̂_r − γ_r) = (√n/(nX̄_r)) Σ_{i=1}^n (β̂_r(U_i) − β_r(U_i))X̃_ir + (√n/(nX̄_r)) Σ_{i=1}^n (β_r(U_i)X̃_ir − γ_r X̄_r) = I_1 + I_2.
For I_1, using Lemma 2 and conditions (C1)-(C6), we have I_1 = o_p(1). By the central limit theorem, I_2 is asymptotically normal with variance σ_r^2. This completes the proof.
Proof of Theorem 3 Applying a Taylor expansion to (3.1), and by Lemmas 3-6, we obtain an asymptotic quadratic expansion of L̂_n(γ_r). This, together with Lemmas 3 and 4, completes the proof.