Partial linear models with general distortion measurement errors

Abstract: This paper considers partial linear regression models when neither the response variable nor the covariates can be directly observed, but are instead measured with both multiplicative and additive distortion measurement errors. We propose conditional variance estimation methods to calibrate the unobserved variables. A profile least-squares estimator, together with its asymptotic results and confidence intervals, is then proposed. To test hypotheses about the parameters, a restricted estimator under the null hypothesis and a test statistic are proposed. The asymptotic properties of the estimator and the test statistic are also established. Further, we employ the smoothly clipped absolute deviation penalty to select relevant variables. The resulting penalized estimators are shown to be asymptotically normal and to have the oracle property. Estimation, hypothesis testing, and variable selection are also discussed under the scenario of multiplicative distortion alone. Simulation studies demonstrate the performance of the proposed procedures, and a real example is analyzed to illustrate their applicability.


J. Zhang
To date, there has been little discussion of the coexistence of the two kinds of distortion measurement errors in partial linear models. [8] and [38] considered multiplicative distortion measurement errors (φ_A(U) ≡ 0, ψ_A,r(U) ≡ 0), and [39] considered additive distortion measurement errors (φ_M(U) ≡ 1, ψ_M,r(U) ≡ 1). Multiplicative distortion measurement data usually occur in health-related studies or medical science research. For example, [7] numerically normalized the collected data according to body mass index (BMI) to study the relation between fibrinogen level and serum transferrin level among hemodialysis patients. This processing of the collected data [7] implies that there may exist a multiplicative relation between the unobserved primary variables and BMI, which is called the confounding variable. Unfortunately, the exact relation between the confounding variable and the primary variables is typically unknown, and simply dividing by the confounding variable may lead to an inconsistent estimator of the parameter for a given statistical model. From another perspective, [25, 26] adopted flexible multiplicative adjustments by introducing unknown smooth distortion functions φ_M(u) and ψ_M,r(u) of the confounding variable. Recently, a number of researchers have studied multiplicative distortion measurement error models (see [1, 22, 21, 28, 25, 20, 27, 11, 41, 36]). The topic of additive distortion measurement errors was first considered in [25]. Later, [20] proposed graphical techniques for assessing departures from, or violations of, assumptions regarding the type and form of the additive or multiplicative distortion.
Regarding additive distortion, [33] proposed a residual-based estimator of the correlation coefficient between two unobserved primary variables, and showed that the estimator is asymptotically efficient, as if all the variables were observed exactly, i.e., without distortion. [34] studied estimation and variable selection in partial linear single-index models (PLSiMs) when the response variable and some covariates are measured with additive distortion measurement errors, i.e., φ_M(u) ≡ 1 and ψ_M,r(u) ≡ 1, r = 1, . . . , p. Suppose that there are no multiplicative distortion errors (φ_M(u) = ψ_M,r(u) ≡ 1). If we treat the nonparametric function g(Z) as a single-index model g(Z) = g(θZ) (θ = 1), the estimation method proposed in [34] is not applicable to the partial linear model (1.1): their leave-one-out component estimation method does not work because of the identifiability problem for the single-index parameter. The leave-one-out component estimation method for a single-index parameter can theoretically achieve the semiparametric efficiency bound, and the asymptotic covariance matrix of the profile least-squares estimators is usually invertible, which can be used to construct asymptotic intervals and hypothesis tests for further statistical inference. In detail, to use this method, we need to transform the single-index parameter γ into γ = (√(1 − ‖γ_(−1)‖²), γ_(−1)^T)^T with γ_(−1) = (γ_2, . . . , γ_r)^T, and we must first estimate γ_(−1). Obviously, the leave-one-out component estimation method cannot be used in partial linear models, because θ = 1 is a one-dimensional parameter and θ_(−1) is an empty set. This paper discusses partial linear models that contain both multiplicative and additive distortion measurement errors. Because model (1.1) contains the additive distortion functions φ_A(u) and ψ_A,r(u), the calibration estimation proposed in this paper is different from that in [1] and [34].
In [1], the authors considered only multiplicative distortion measurement errors (φ_A(u) = ψ_A,r(u) ≡ 0), and used a conditional mean calibration procedure: φ_M(u) = E(Ỹ | U = u)/E(Ỹ) and ψ_M,r(u) = E(X̃_r | U = u)/E(X̃_r), so that the calibrated variables are Ŷ = Ỹ/φ̂_M(U) and X̂_r = X̃_r/ψ̂_M,r(U), r = 1, . . . , p. In [34], the authors considered only additive distortion measurement errors (φ_M(u) = ψ_M,r(u) ≡ 1), and used the conditional mean calibration procedure to obtain the relations φ_A(u) = E(Ỹ | U = u) − E(Ỹ) and ψ_A,r(u) = E(X̃_r | U = u) − E(X̃_r), r = 1, . . . , p. The authors then used the regression "residuals" to estimate the parameters in PLSiMs. Note that in our setting all the distortion functions (φ_M(u), φ_A(u)) and (ψ_M,r(u), ψ_A,r(u)) are unknown, so the conditional mean calibration of [1] and the residual-based calibration of [34] are no longer workable, because we cannot estimate them through E(Ỹ | U = u) or E(X̃_r | U = u) alone. Consequently, in this paper, we propose a new calibration procedure by coupling the conditional means (E(Ỹ | U = u), E(X̃_r | U = u)) with the conditional variances (Var(Ỹ | U = u), Var(X̃_r | U = u)). Using the estimates Ê(Ỹ | U = u), V̂ar(Ỹ | U = u), Ê(X̃_r | U = u), V̂ar(X̃_r | U = u), we obtain (φ̂_M(u), φ̂_A(u), ψ̂_M,r(u), ψ̂_A,r(u)) and hence the calibrated variables. Note that the calibrated variables (Ŷ, X̂_r) and the asymptotic results obtained in this paper are all different from those in [1] and [34].
With these calibrated variables, we use profile least-squares estimation to obtain a root-n consistent estimator of β_0. Specifically, we consider the estimation efficiency of the proposed estimators in the case φ_A(u) = ψ_A,r(u) ≡ 0. In this setting, without additive distortion, we further propose a second estimator using the conditional absolute mean technique [2, 41]. The normal approximation is derived by estimating the asymptotic covariance matrices, and empirical likelihood-based statistics are proposed to construct two different asymptotic confidence intervals for the parameter β_0.
To make further inferences, we consider the problem of checking whether the linear combination Aβ 0 = b holds. A restricted profile least-squares estimator and a test statistic are proposed by introducing Lagrange multipliers under the null hypothesis. Under the null hypothesis, the limiting distribution of the test statistic is shown to be a standard chi-squared distribution. We also investigate the asymptotic properties of the estimator and the test statistic under the local alternative hypothesis. Finally, to perform variable selection, we propose a profile penalized least-squares method based on the smoothly clipped absolute deviation method [4,SCAD]. We demonstrate that the resulting SCAD-based solution is selection-consistent. Monte Carlo simulation experiments are conducted to examine the performance of the proposed estimation and test procedures.
The remainder of this paper is organized as follows. In Section 2, we propose the conditional variance calibration for the unobserved variables, present a profile least-squares estimator of the parameter, and derive the related asymptotic results. In Section 3, confidence intervals for the parameter are proposed. Section 4 considers the problem of checking whether the linear restriction Aβ_0 = b holds. In Section 5, variable selection for the parameter β_0 is discussed. Section 6 covers two estimation procedures when only multiplicative distortion exists; hypothesis testing, confidence interval construction, and variable selection are also discussed there. In Section 7, we report the results of simulation studies. In Section 8, we present statistical analysis results using real data. All technical proofs of the asymptotic results are given in the appendix.

Calibration
We first calibrate the unobserved Y and X by using the observed variables (Ỹ, X̃, U). To ensure identifiability, it is assumed that E(φ_M(U)) = 1, E(φ_A(U)) = 0 (2.1) and E(ψ_M,r(U)) = 1, E(ψ_A,r(U)) = 0, r = 1, . . . , p (2.2). The identifiability conditions (2.1)-(2.2) were introduced by [26, 25], and they are analogous to the classical additive measurement error assumption E(e) = 0 for W = X + e, where W is error-prone and X is error-free [11, 31]. Under the independence condition between U and (Y, X), the identifiability conditions (2.1)-(2.2) and condition (C1) yield expressions for the distortion functions in terms of conditional means and conditional variances, as in (2.5)-(2.6). Because the square roots of the variances, σ_Y and the σ_X_r's, appear in the denominators of (2.5)-(2.6), the condition σ_Y ∏_{r=1}^p σ_X_r > 0 must be imposed here; equivalently, it is required that the covariates X_r and the response variable Y are nonconstant variables. Thus, the unobserved variables {Y, X_r, r = 1, . . . , p} can be recovered through (2.9)-(2.10) at the population level. We summarize the calibration procedure as follows.
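The variance-based calibration can be sketched in a few lines. Under (2.1) with U ⟂ Y, Var(Ỹ | U = u) = φ_M(u)²σ_Y² and E[√Var(Ỹ | U)] = σ_Y, so a kernel estimate of the conditional standard deviation, normalized by its average, recovers φ_M(u); φ_A(u) then follows from the conditional mean. The sketch below is a minimal illustration for the response alone, using a Nadaraya-Watson smoother with a Gaussian kernel (the paper uses local linear fits); the distortion functions, bandwidth, and data-generating choices are assumptions for illustration.

```python
import numpy as np

def nw_smooth(u_eval, U, V, h):
    """Nadaraya-Watson estimate of E(V | U = u) with a Gaussian kernel."""
    W = np.exp(-0.5 * ((u_eval[:, None] - U[None, :]) / h) ** 2)
    return (W @ V) / W.sum(axis=1)

def variance_calibrate(Y_til, U, h=0.1):
    """Recover Y from Y~ = phi_M(U)*Y + phi_A(U), assuming E(phi_M) = 1,
    E(phi_A) = 0, phi_M > 0 and U independent of Y."""
    m = nw_smooth(U, U, Y_til, h)                              # E(Y~ | U)
    v = np.maximum(nw_smooth(U, U, Y_til ** 2, h) - m ** 2, 1e-10)  # Var(Y~ | U)
    sd = np.sqrt(v)
    phi_M = sd / sd.mean()          # sqrt(Var(Y~|U)) / sigma_Y, since E[sqrt(Var)] = sigma_Y
    phi_A = m - phi_M * Y_til.mean()  # E(Y~|U) - phi_M(U) * E(Y), using E(Y~) = E(Y)
    return (Y_til - phi_A) / phi_M

# illustrative simulation (not the paper's design)
rng = np.random.default_rng(0)
n = 2000
U = rng.uniform(0, 1, n)
Y = 2 + rng.normal(0, 1, n)
Y_til = (1 + 0.5 * (U - 0.5)) * Y + (U ** 2 - 1 / 3)  # distorted observation
Y_hat = variance_calibrate(Y_til, U)
```

Here the chosen distortions satisfy the identifiability conditions (E(1 + 0.5(U − 0.5)) = 1 and E(U² − 1/3) = 0 for U ~ U[0, 1]), so the calibrated Ŷ should track the unobserved Y closely.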

A profile least squares estimator
In the following, we define A^⊗2 = AA^T for any matrix or vector A. From model (1.1), under the identifiability conditions (2.1)-(2.2) and the independence condition between U and Z, model (2.15) is equivalent to the linear regression (2.16) of the calibrated response on the calibrated covariates. Thus, a profile least-squares estimator of β_0 (at the population level) is obtained by partialling out Z. To obtain the estimator of β_0, we use local linear estimators to estimate S_Y(z) and S_X_r(z); these estimators involve the kernel moment sums Σ_{i=1}^n K_h_1(Z_i − z)(Z_i − z)^ω, ω = 0, 1, 2. Based on (2.16), the profile least-squares estimator β̂ of β_0 is obtained as in (2.19). After obtaining the estimator β̂, the function g(z) is estimated by the local linear estimator (ĝ(z), ĝ′(z)) = arg min_{a,b} Σ_{i=1}^n {Ŷ_i − X̂_i^T β̂ − a − b(Z_i − z)}² K_h_2(Z_i − z), (2.20) where h_2 is a bandwidth. After a simple calculation, we obtain the closed form of ĝ(z) in (2.21). In the following Theorem 2.1 and Theorem 2.2, we present the asymptotic results for the estimators β̂ and ĝ(z).
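The profile least-squares step can be sketched as follows: smooth the calibrated response and each calibrated covariate against Z, then run ordinary least squares on the centred variables. This is a minimal stand-in that uses Nadaraya-Watson smoothing in place of the local linear fits of (2.19); the simulated model, bandwidth, and coefficient values are assumptions for illustration.

```python
import numpy as np

def nw(z_eval, Z, V, h):
    """Nadaraya-Watson estimate of E(V | Z = z) with a Gaussian kernel."""
    W = np.exp(-0.5 * ((z_eval[:, None] - Z[None, :]) / h) ** 2)
    return (W @ V) / W.sum(axis=1)

def profile_ls(Y, X, Z, h=0.1):
    """Profile least squares for Y = X beta + g(Z) + eps:
    regress Y - E(Y|Z) on X - E(X|Z)."""
    Sy = nw(Z, Z, Y, h)
    Sx = np.column_stack([nw(Z, Z, X[:, j], h) for j in range(X.shape[1])])
    Xt, Yt = X - Sx, Y - Sy
    beta, *_ = np.linalg.lstsq(Xt, Yt, rcond=None)
    return beta

# illustrative simulation (not the paper's design)
rng = np.random.default_rng(1)
n = 1500
Z = rng.uniform(0, 1, n)
X = rng.normal(size=(n, 2)) + np.column_stack([Z, -Z])  # covariates correlated with Z
beta0 = np.array([2.0, -1.0])
Y = X @ beta0 + np.sin(2 * np.pi * Z) + rng.normal(0, 0.5, n)
beta_hat = profile_ls(Y, X, Z)
```

Because the centred covariates are (asymptotically) uncorrelated with any function of Z, the nonlinear g(Z) = sin(2πZ) drops out of the partialled regression and β̂ recovers β_0 at the root-n rate.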

Asymptotic results
We now list the assumptions needed in the following theorems.
These conditions are not restrictive and are satisfied in most practical situations. Condition (C1) contains the typical smoothness assumptions in the distortion measurement error literature; see also [2, 32]. Conditions (C2)-(C3) are needed for the asymptotic normality of our statistics; see, for example, [35]. Condition (C4) is a common condition on the kernel function K(·), and the Epanechnikov kernel satisfies it. This condition also ensures that the kernel smoothing weights are nonnegative. (By contrast, a high-order kernel K*(t) takes zero and negative values when |t| ≥ 3/7; for example, estimators involving (nh)^{-1} Σ_{i=1}^n K*((U_i − u)/h) may produce negative values, which is a drawback of high-order kernels.) Condition (C5) concerns the bandwidths (h, h_1) in the nonparametric kernel smoothing. For bandwidth h_1, Condition (C5) allows the "optimal" rate of order n^{-1/5} to be used [35]. For bandwidth h, an under-smoothing condition nh^4 → 0 is needed. The consequence of under-smoothing is that the biases of the nonparametric estimates are kept small, which precludes the optimal bandwidth for h. Condition (C6) is the technical condition involved in SCAD [4].
In the following, we define Remark. The first term Σ −1 0 Σ 0ε Σ −1 0 is the usual asymptotic covariance matrix for the profile least squares estimator when data are exactly observed [6], i.e., φ M (u) ≡ 1, φ A (u) ≡ 0, ψ A,r (u) ≡ 1 and ψ A,r (u) ≡ 0, r = 1, . . . , p. If the model error is further independent of X, this term reduces to E( 2 )Σ −1 0 . The second term Σ φ M ,ψ M is caused by the multiplicative and additive distortion measurement errors involved in the response variable and covariates. It is interesting to see that the additive distortions φ A (u) and φ A,r (u)'s have no effect on the estimation of β 0 . If we further assumed that φ M (u) = ψ M,r (u) ≡ 1, r = 1, . . . , p, then the term Σ φ,ψ = 0. In this case, the estimator is efficient because the effect of additive distortions vanishes, which coincides with the asymptotic result of Theorem 1 in [39]. In other words, the profile least squares estimation procedure can automatically eliminate the effect induced by the additive distortions. And the profile least squares estimation procedure can also eliminate both the effect of multiplicative and additive distortions for estimating β 0r when β 0r = 0, i.e., Avar(β r ) = e T r Σ −1 0 Σ 0 Σ −1 0 e r , whereβ r is the r-th component ofβ, e r is a pdimensional vector with 1 in the r-th position and 0's elsewhere, r = 1, . . . , p, and Avar(β r ) stands for the asymptotic variance ofβ r obtained in Theorem 2.1.
Remark. When the multiplicative distortions φ_M(u) and the ψ_M,r(u)'s vanish (φ_M(u) = ψ_M,r(u) ≡ 1), the asymptotic variance coincides with that of Theorem 2 in [39]. Moreover, the estimator ĝ(z) is asymptotically efficient when the additive distortion functions satisfy a further condition, i.e., the asymptotic bias and asymptotic variance of ĝ(z) are the same as those obtained in [5] and [6].

Empirical likelihood method
The empirical likelihood (EL) method, proposed by [24], is another popular way to construct confidence intervals without estimating the asymptotic covariance matrix. The EL method is an appealing nonparametric approach for constructing confidence intervals (regions) for the parameter of interest, and there is a large and growing literature extending it to many statistical problems; see, for example, [12, 10, 1, 17]. In the following, we construct confidence intervals for β_0 based on the EL principle. The EL method needs an auxiliary vector ℘_n,i(β) = (℘_n,i^[1](β), . . . , ℘_n,i^[p](β))^T with the property that E℘_n,i(β) = 0 when β = β_0. Recalling that model (2.16) is a linear regression model, the "ideal" auxiliary random vector can be constructed from the least-squares score. We then define the calibrated EL principle by plugging in the calibrated variables. The Lagrange multiplier method entails the log-EL ratio ℓ̂_n(β) = 2 Σ_{i=1}^n log{1 + λ^T ℘̂_n,i(β)}, where λ = λ(β) solves Σ_{i=1}^n ℘̂_n,i(β)/{1 + λ^T ℘̂_n,i(β)} = 0.
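The Lagrange multiplier computation can be sketched as a small Newton solver: given auxiliary vectors z_i with hypothesized mean zero, find λ solving Σ_i z_i/(1 + λ^T z_i) = 0 and report ℓ = 2 Σ_i log(1 + λ^T z_i). This is the generic EL computation in the style of Owen, not the paper's exact algorithm; the demo data are an assumption.

```python
import numpy as np

def el_log_ratio(zmat, n_iter=50):
    """Empirical likelihood log-ratio 2*sum log(1 + lam'z_i) for testing
    E(z_i) = 0; lam is found by damped Newton iterations on the score
    sum_i z_i / (1 + lam'z_i) = 0 (concave in lam, so Newton converges)."""
    n, p = zmat.shape
    lam = np.zeros(p)
    for _ in range(n_iter):
        d = 1.0 + zmat @ lam
        score = (zmat / d[:, None]).sum(axis=0)
        hess = -np.einsum('ij,ik,i->jk', zmat, zmat, 1.0 / d ** 2)
        step = np.linalg.solve(hess, -score)
        # halve the step until all implied EL weights stay positive
        while np.any(1.0 + zmat @ (lam + step) <= 1e-8):
            step *= 0.5
        lam = lam + step
    return 2.0 * np.log(1.0 + zmat @ lam).sum(), lam

rng = np.random.default_rng(2)
z = rng.normal(size=(400, 1))   # auxiliary vectors with mean approximately 0
ll0, lam0 = el_log_ratio(z)     # testing the true mean: small statistic
ll1, _ = el_log_ratio(z - 0.5)  # testing a wrong mean: large statistic
```

By Wilks-type results, ℓ evaluated at the true parameter is asymptotically chi-squared, which is what makes the statistic usable for confidence intervals without a variance estimate.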

Hypothesis testing
In the previous sections, we considered estimation and confidence intervals for β_0. Another interesting topic is whether certain explanatory variables significantly influence the response. In many important statistical applications, in addition to the model information in (1.1), prior information about β_0 may take the form of a set of linear restrictions Aβ_0 = b, where A is a known k × p full-rank matrix with rank(A) = k ≤ p and b is a known k-vector of constants. This hypothesis test is used to check a special structure of the parameter β_0 or the influence of components of X. If the null hypothesis H_0 is true, the condition Aβ_0 = b can be used in estimating β_0. A restricted profile least-squares estimation procedure using the Lagrange multiplier technique is proposed through the objective W_n(β, λ), where λ is a k × 1 vector of Lagrange multipliers. Differentiating W_n(β, λ) with respect to β and λ gives the estimating equations (4.2). Using the estimator Σ̂ defined in Subsection 3.1, the restricted estimator β̂_R of β_0 derived from the first equation in (4.2) satisfies (4.3). Note that the profile least-squares estimator β̂ in (2.19) satisfies (4.4). Combining (4.3)-(4.4), we obtain (4.5), and equation (4.5) entails (4.6). Recalling that the estimator β̂_R in the second equation of (4.2) satisfies Aβ̂_R − b = 0, we multiply both sides of equation (4.6) by A and obtain (4.7). From (4.7), we obtain the expression (4.8) for λ. Substituting the expression for λ in (4.8) into (4.6), the restricted least-squares estimator of β_0 is obtained as (4.9). We now present the asymptotic normality of β̂_R.
Remark. From the definition of Ω_A, it is seen that AΩ_A = 0. Then the asymptotic covariance matrix of Aβ̂_R − Aβ_0 under the null hypothesis H_0 is a zero matrix, because the linear constraint Aβ̂_R = b holds exactly in (4.2) when we estimate β_0.
To test the hypothesis H_0, we propose to use a weighted quadratic form of Aβ̂ − b. Intuitively, if the null hypothesis H_0 is false, i.e., Aβ_0 ≠ b, the value of Aβ̂ − b should be significantly large. The test statistic T_n for testing H_0 is defined as such a weighted quadratic form.
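The restricted estimator and the test statistic both reduce to short linear-algebra formulas. The sketch below assumes the closed form that the Lagrange-multiplier derivation produces for linearly constrained least squares (a generic projection formula, which we take to match (4.9)); the toy numbers are assumptions, not the paper's simulation.

```python
import numpy as np

def restricted_ls(beta_hat, Sigma_inv, A, b):
    """Restricted estimator from the Lagrange-multiplier argument:
    beta_R = beta_hat - Sigma^{-1} A'(A Sigma^{-1} A')^{-1}(A beta_hat - b).
    By construction, A beta_R = b."""
    M = Sigma_inv @ A.T @ np.linalg.inv(A @ Sigma_inv @ A.T)
    return beta_hat - M @ (A @ beta_hat - b)

def wald_stat(beta_hat, Avar, A, b, n):
    """Weighted quadratic form T = n (A b_hat - b)'[A Avar A']^{-1}(A b_hat - b),
    compared against a chi-squared(k) critical value."""
    r = A @ beta_hat - b
    return float(n * r @ np.linalg.solve(A @ Avar @ A.T, r))

# toy numbers for illustration
beta_hat = np.array([2.1, -0.9, 0.05])
A = np.array([[2.0, -1.0, 0.0]])
b = np.array([5.0])
beta_R = restricted_ls(beta_hat, np.eye(3), A, b)
T_n = wald_stat(beta_hat, np.eye(3), A, b, n=100)
```

With these numbers, Aβ̂ − b = 0.1, so β_R shifts β̂ just enough to satisfy the constraint exactly, and T_n = 100 · 0.1²/5 = 0.2, far below the χ²₁ critical value, so H_0 would not be rejected.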

Theorem 4.2. Suppose the conditions in Theorem 2.1 hold. Under the null hypothesis H_0, T_n converges in distribution to χ²_k, the central chi-squared distribution with k degrees of freedom. Next, we consider the local alternative hypothesis H_1n in (4.10). In the following, we present the asymptotic results for β̂_R and T_n under the local alternative hypothesis H_1n.
where χ²_k(π_c) is the noncentral chi-squared distribution with k degrees of freedom and noncentrality parameter π_c.

Variable selection
In the process of data analysis, the advent of modern technology allows many variables to be easily collected in scientific studies. Typically, many of them are included in the full model at the initial stage of modeling to reduce the model approximation error. It is of fundamental interest in statistical modeling to determine which variables should be selected and retained in the final statistical model. One popular variable selection method is the penalized least-squares method, which has been extensively studied over the past two decades. The least absolute shrinkage and selection operator [29, LASSO] and the smoothly clipped absolute deviation [4, SCAD] have been extensively discussed and are widely used.
Model (2.15) is a linear regression model with respect to β_0, and it is of interest to determine which covariates have nonzero effects on the response. There are a number of penalized variable selection methods for partial linear regression models (see, for example, [30, 18, 13]). In this section, we use the SCAD penalty function to select the nonzero components of β_0. The SCAD penalty function p_ζ(·) satisfies p_ζ(0) = 0, p′_ζ(0+) > 0, and its first-order derivative is p′_ζ(t) = ζ{I(t ≤ ζ) + (aζ − t)_+ / ((a − 1)ζ) I(t > ζ)} for t > 0, where a is some positive constant with a > 2 and ζ is a tuning parameter. From the perspective of Bayesian statistics, [4] suggested using a = 3.7, and this value is used throughout the remainder of this paper. For variable selection in multiplicative distortion measurement error models, [9] considered Lasso-type penalty functions for simultaneous variable selection and parameter estimation in a linear regression model. There has been no discussion in the literature of the variable selection problem when both multiplicative and additive distortions exist in the partial linear model considered in this paper. To solve this problem, we propose the SCAD penalized estimator defined in (5.1), where p_ζ(·) is the SCAD penalty function with tuning parameter ζ. We now study the sampling properties of the resulting penalized least-squares estimators. Without loss of generality, assume that β_0 = (β_0,1^T, β_0,2^T)^T, where β_0,1 denotes the p_0 × 1 vector of nonzero components of β_0 and β_0,2 is a (p − p_0) × 1 vector of zeros. In addition, X_1 consists of the first p_0 components of X, and ψ_M,1(U) consists of the first p_0 components of ψ_M(U). Moreover, we define the following notation:
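The SCAD derivative above translates directly into code; this is the standard Fan-Li form with the default a = 3.7.

```python
import numpy as np

def scad_deriv(t, zeta, a=3.7):
    """SCAD penalty derivative: p'(t) = zeta for 0 < |t| <= zeta,
    and (a*zeta - |t|)_+ / (a - 1) for |t| > zeta."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= zeta, zeta, np.maximum(a * zeta - t, 0.0) / (a - 1.0))
```

The three regimes are easy to read off: small coefficients are penalized at the constant Lasso-like rate ζ, the rate decays linearly for moderate coefficients, and coefficients larger than aζ are not penalized at all, which is what yields the unbiasedness of large estimates and hence the oracle property.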

Theorem 5.1. Under conditions (C1)-(C6), the penalized estimator β̂_P = (β̂_P,1^T, β̂_P,2^T)^T satisfies: (a) (consistency) with probability tending to one, β̂_P,2 = 0. Remark. The extra bias √n R_ζ1 is induced by the SCAD penalty function. If we impose the conditions √n R_ζ1 → 0 and Σ_ζ1 → 0, the asymptotic result of Theorem 5.1(b) is the same as that of Theorem 2.1, as if the nonzero components of β_0 were known beforehand. Moreover, the SCAD penalty automatically shrinks the zero components of β_0 to zero. With an appropriate choice of the tuning parameter ζ, Theorem 5.1 indicates that the proposed variable selection procedure possesses the oracle property. We now discuss the choice of the tuning parameter.
We adopt the BIC selector of [16] to choose the regularization parameters ζ_j, by reducing the p-dimensional tuning vector (ζ_1, . . . , ζ_p) to a single dimension. Let ζ_r = ζ_0 σ̂_r, r = 1, . . . , p, where σ̂_r is defined in (3.1). The BIC score for ζ_0 can then be defined as BIC(ζ_0) = log(RSS_ζ0/n) + N_ζ0 log(n)/n, where X̂_i,ζ and Ŝ_X,ζ(Z_i) consist of the components of X̂_i and Ŝ_X(Z_i) corresponding to β̂_P,ζ, and N_ζ0 is the number of nonzero coefficients of β̂_P,ζ, the penalized estimator of β_0 with tuning parameter ζ = (ζ_1, . . . , ζ_p)^T, ζ_r = ζ_0 σ̂_r. Thus, the minimization over the ζ_j reduces to a one-dimensional minimization over ζ_0, and the minimizer can be obtained by a grid search. Based on our experience in simulations, 30 grid points, evenly distributed over the range of ζ_0, are sufficient.
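The one-dimensional BIC grid search can be sketched as follows. To stay self-contained, a crude hard-threshold-and-refit rule stands in for the SCAD-penalized fit (an assumption for illustration, not the paper's estimator); the score is the log(RSS/n) + df·log(n)/n form attributed to [16].

```python
import numpy as np

def bic_select(Y, X, zeta0_grid):
    """For each candidate zeta_0, threshold the OLS coefficients at zeta_0
    (a crude stand-in for the SCAD fit), refit OLS on the retained support,
    and keep the fit minimizing BIC = log(RSS/n) + df * log(n) / n."""
    n, p = X.shape
    beta_full, *_ = np.linalg.lstsq(X, Y, rcond=None)
    best_bic, best_beta = np.inf, np.zeros(p)
    for z0 in zeta0_grid:
        keep = np.abs(beta_full) > z0
        beta = np.zeros(p)
        if keep.any():
            beta[keep], *_ = np.linalg.lstsq(X[:, keep], Y, rcond=None)
        rss = ((Y - X @ beta) ** 2).sum()
        bic = np.log(rss / n) + keep.sum() * np.log(n) / n
        if bic < best_bic:
            best_bic, best_beta = bic, beta
    return best_beta

# illustrative simulation (not the paper's design)
rng = np.random.default_rng(3)
n, p = 400, 6
X = rng.normal(size=(n, p))
beta0 = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0])
Y = X @ beta0 + rng.normal(0, 0.5, n)
beta_sel = bic_select(Y, X, np.linspace(0.15, 1.5, 30))
```

The log(n)/n penalty makes adding a spurious variable more expensive than the tiny RSS reduction it buys, so the minimizer lands on the true support for a wide band of thresholds.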

Estimation
In the previous sections, we considered the coexistence of multiplicative and additive distortion measurement errors. In this section, we consider the special case in which there is no additive distortion, given in model (6.1). We propose to use the recently studied conditional absolute mean calibration method [2, 41, CAMC] to estimate β_0, and we discuss the asymptotic efficiency of the estimator β̂ and the CAMC estimator.
Using (2.11) directly, the calibrated variables are defined as in (6.2). The estimators (2.19) and (6.2) require the condition σ_Y ∏_{r=1}^p σ_X_r > 0, which is equivalent to P(Y = E(Y)) + Σ_{r=1}^p P(X_r = E(X_r)) < 1; in other words, none of the variables is a constant variable. The CAMC method proposed in [2] and [41] instead uses the conditional absolute means, as in (6.3)-(6.4). Equations (6.3) and (6.4) require the condition E(|Y|) ∏_{r=1}^p E(|X_r|) > 0. It is remarkable that the CAMC method is applicable for model (6.1) but not for the general model: there, φ_A(u) and φ_M(u) are two unknown functions, and the single equation (6.3) alone is no longer workable. [2] proposed the CAMC method to estimate the conditional mean function E(Y|X = x), and [41] used CAMC for a model checking problem. We now propose another estimator of β_0 based on the CAMC method. The Nadaraya-Watson estimators of φ_M(u) and ψ_M,r(u) are defined in (6.5)-(6.6). Using (6.5)-(6.6), we obtain the CAMC calibrated variables. Let X̂_C,i = (X̂_C,1i, . . . , X̂_C,pi)^T; using (2.17) and (2.18), the parameter β_0 is estimated as β̂_C. We now present the asymptotic results for the estimators β̂_V and β̂_C, and define the following notation. Compared with Theorem 2.1, it is seen that the estimators β̂_V and β̂ have the same asymptotic mean and asymptotic covariance matrix. This is not surprising, because the additive distortions φ_A(U) and ψ_A(U) have no effect on the profile least-squares estimator β̂, and the distortion model (6.1) assumes that φ_A(U) ≡ 0 and ψ_A(U) ≡ 0. As a result, it is natural that β̂_V and β̂ share a common asymptotic mean and asymptotic covariance matrix.
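The CAMC idea can be sketched in a few lines: under (6.1) with ψ_M,r > 0 and E(ψ_M,r(U)) = 1, we have E(|X̃_r| | U = u) = ψ_M,r(u)E|X_r| and E|X̃_r| = E|X_r|, so the ratio of a kernel smoother of |X̃_r| on U to the sample mean of |X̃_r| estimates ψ_M,r(u). The Gaussian-kernel smoother, bandwidth, and simulated distortion below are illustrative assumptions (the paper uses the Nadaraya-Watson estimators (6.5)-(6.6)).

```python
import numpy as np

def nw_smooth(u_eval, U, V, h):
    """Nadaraya-Watson estimate of E(V | U = u) with a Gaussian kernel."""
    W = np.exp(-0.5 * ((u_eval[:, None] - U[None, :]) / h) ** 2)
    return (W @ V) / W.sum(axis=1)

def camc_calibrate(X_til, U, h=0.1):
    """Conditional absolute mean calibration for X~ = psi_M(U) * X,
    psi_M > 0, E(psi_M(U)) = 1, U independent of X:
    psi_hat(u) = NW-estimate of E(|X~| | U=u) / mean(|X~|)."""
    a = np.abs(X_til)
    psi = nw_smooth(U, U, a, h) / a.mean()
    return X_til / psi

# illustrative simulation (not the paper's design)
rng = np.random.default_rng(4)
n = 2000
U = rng.uniform(0, 1, n)
X = rng.normal(2.0, 1.0, n)              # unobserved primary variable
X_til = (1.0 + 0.5 * (U - 0.5)) * X      # multiplicative distortion only
X_hat = camc_calibrate(X_til, U)
```

Note that, unlike the conditional mean calibration of [1], this works through E|X_r| rather than E(X_r), which is why CAMC remains usable when E(X_r) = 0.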
For the CAMC estimator β̂_C, it is seen that β̂_C is asymptotically more efficient than β̂_V when the difference of their asymptotic covariance matrices is positive definite, and vice versa. In detail, we denote the asymptotic variance of β̂_V,r (the r-th component of β̂_V) by Avar(β̂_V,r) and the asymptotic variance of β̂_C,r (the r-th component of β̂_C) by Avar(β̂_C,r). It is seen that if the response variable Y is exactly observed, i.e., φ_M(u) ≡ 1, then Var(φ_M(U)) = 0 and Cov(φ_M(U), ψ_M,r(U)) = 0, and the difference between the asymptotic variances Avar(β̂_V,r) and Avar(β̂_C,r) simplifies accordingly. It is also seen that if the true parameter β_0r = 0, both β̂_C and β̂_V are asymptotically efficient; i.e., the profile least-squares estimation with either calibration procedure eliminates the effect caused by the multiplicative distortion functions φ_M(u) and ψ_M,r(u). We directly substitute β̂_V or β̂_C for β̂ in (2.20) and (2.21), and obtain the estimators ĝ_V(z) and ĝ_C(z), respectively. By Theorem 6.1, the estimators β̂_V and β̂_C have the root-n convergence rate, while the local linear kernel smoothing estimators ĝ_V(z) and ĝ_C(z) have the slower root-(nh_2) convergence rate. So the asymptotic results for ĝ_V(z) and ĝ_C(z) are the same as those in Theorem 2.2, but the asymptotic variance σ²(z) is calculated under model (6.1).

A hypothesis testing
For the parameter hypothesis testing problem (4.1), the test statistics T_V,n and T_C,n are defined as weighted quadratic forms analogous to T_n. We have the following asymptotic results.
Under the local alternative hypothesis H_1n of (4.10), the limiting distribution is χ²_k(π_C,c), the noncentral chi-squared distribution with k degrees of freedom and noncentrality parameter π_C,c. From Theorem 6.2, we can use the two test statistics T_V,n and T_C,n to check the hypothesis H_0 in (4.1). It is seen that if π_C,c > π_c, T_C,n is asymptotically more powerful than T_V,n for detecting the local alternative hypothesis H_1n; if π_C,c = π_c, the two statistics are asymptotically equivalent. If the local alternative hypothesis H_1n is given in advance, we can use the larger of the estimates π̂_c and π̂_C,c to decide which statistic is better.

Variable selection
Analogous to (5.1), the penalized estimators of β_0 are defined with the SCAD penalty function p_ζ(·) and tuning parameter ζ. Similar to Theorem 6.1, the asymptotic result for β̂_P,V is the same as Theorem 5.1. We now present the asymptotic results for β̂_P,C. In the following, we define: (a) (consistency) with probability tending to one, β̂_P,C,2 = 0. To choose the regularization parameters ζ_j for the penalized estimator β̂_P,C, we again adopt the BIC selector suggested by [16]. Let ζ_j = ζ_0 σ̂_C,j, where the σ̂_C,j are defined in (6.8). The BIC score for ζ_0 is defined as before, where X̂_C,i,ζ and Ŝ_X,ζ(Z_i) consist of the components of X̂_C,i and Ŝ_X(Z_i) corresponding to β̂_P,C,ζ, and N_ζ0 is the number of nonzero coefficients of β̂_P,C,ζ, the penalized estimator of β_0 with tuning parameter ζ = (ζ_1, . . . , ζ_p)^T, ζ_j = ζ_0 σ̂_C,j. Based on our experience in simulations, 30 grid points evenly distributed over the range of ζ_0 are sufficient.

Implementation
This section reports the results of simulation studies to demonstrate the performance of our proposed estimators. The Epanechnikov kernel K(t) = 0.75(1 − t²)I{|t| < 1} is used here. According to condition (C5), the bandwidth h_1 can be chosen at the optimal convergence rate, but the bandwidth h should be chosen for under-smoothing (nh^4 → 0). The consequence of under-smoothing is that the biases of the nonparametric estimates remain small, which precludes the optimal bandwidth for h. The asymptotic variances of the proposed estimators of β_0 depend on neither the bandwidths (h, h_1) nor the kernel function K(t).
Hence, we can use the rule of thumb h = σ̂_U n^{-1/3} and h_1 = σ̂_Z n^{-1/5}, with σ̂_U being the sample standard deviation of U and σ̂_Z the sample standard deviation of Z. This method is fairly effective and easy to implement in practice. Our experience suggests that the numerical results are stable when we shift several values around these data-driven bandwidths.
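The rule of thumb is trivial to implement; the check on nh⁴ confirms that the n^{-1/3} rate indeed satisfies the under-smoothing requirement of condition (C5). The simulated inputs below are placeholders.

```python
import numpy as np

def rule_of_thumb_bandwidths(U, Z):
    """h = sd(U) * n^(-1/3) (under-smoothed so that n*h^4 -> 0) and
    h1 = sd(Z) * n^(-1/5) (the optimal nonparametric rate)."""
    n = len(U)
    h = np.std(U, ddof=1) * n ** (-1.0 / 3.0)
    h1 = np.std(Z, ddof=1) * n ** (-1.0 / 5.0)
    return h, h1

rng = np.random.default_rng(5)
U, Z = rng.uniform(0, 1, 1000), rng.uniform(0, 1, 1000)
h, h1 = rule_of_thumb_bandwidths(U, Z)
```

Since n · (n^{-1/3})⁴ = n^{-1/3} → 0, the choice for h satisfies nh⁴ → 0, while h_1 keeps the optimal n^{-1/5} order.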
Example 1. We consider a model of the form (1.1). A total of 1000 realizations are generated, and sample sizes of n = 300, n = 500, and n = 1000 are considered.
The model error ε is independent of X and generated as N(0, 0.5²). The variable U follows the uniform distribution U[0, 1], and the distortion functions are chosen accordingly. In Table 1, we report the mean, standard errors, and mean squared errors for the true estimator β̂_T (the profile least-squares estimator [14] using the simulated dataset {Y_i, X_i, Z_i}, i = 1, . . . , n), the proposed estimator β̂, and the naive estimator β̂_N (the profile least-squares estimator using the dataset without calibration). Unsurprisingly, β̂_T performs better than β̂, because the MSE values for β̂_T are all smaller than those for β̂. For the proposed estimator β̂, all the mean values are close to the true value (2, −1, 0)^T, and the MSE values decrease as the sample size n increases. In Theorem 2.1, we show that the estimator β̂_r is asymptotically efficient when β_0r = 0. In Table 1, we see that the MSE values for the estimator β̂_3 are very close to those for the true estimator β̂_T,3 when n = 1000. The naive estimator β̂_N has a large bias, especially when estimating β_01 and β_02. All MSE values for the naive estimator are greater than those for the true estimator and the proposed estimator in this table. This indicates that ignoring the distortion functions φ_M(U), φ_A(U), ψ_M,r(U), and ψ_A,r(U) increases the bias and results in an inconsistent estimator, even when the sample size n is large.
(1.2) Confidence intervals. We report the 95% normal approximation (NA) confidence intervals and empirical likelihood (EL) confidence intervals for β_0s, s = 1, 2, 3. The results are reported in Table 2. As the sample size n increases, both the NA confidence intervals and the EL confidence intervals achieve satisfactory performance, in terms of both the average length of the confidence intervals and the coverage probabilities. The NA confidence intervals are wider and have larger coverage probabilities than the EL confidence intervals. Note that the EL method does not need to estimate the asymptotic variances of the estimators, whereas the NA method does. Generally, both the NA asymptotic intervals and the EL method are recommended when the sample size is large.
(1.3) Restricted estimator. We consider the restricted estimator under two constraints, A_1 = (2, −1, 0) (i.e., 2β_01 − β_02 = 5) and A_2 = (−0.5, 1, 0) (i.e., −0.5β_01 + β_02 = −2). In Table 3, the MSE of the restricted estimator for β_02 using A_1 is much smaller than the value in Table 1, and the MSEs for β_01 and β_03 are slightly improved. This indicates that the restriction A_1 can improve the estimation efficiency for β_02 without sacrificing much estimation efficiency for β_01 and β_03. For A_2, the MSE of the restricted estimator for β_0s, s = 1, 2, 3, is much smaller than the value in Table 1, which again implies that the restriction A_2 improves the estimation efficiency for β_0.
(1.4) Hypothesis test. We consider a test problem for model (1.1); the results are reported in Table 4. In Table 4, as the value of c increases, the power function increases rapidly. The power function tends to 1 as the sample size n increases, which shows that the test statistic T_n is powerful for this test problem.

Table 2: Simulation results of confidence intervals. "NA" stands for the normal approximation and "EL" stands for the empirical likelihood. "Lower" stands for the lower bound, "Upper" stands for the upper bound, "AL" stands for average length, and "CP" stands for the coverage probabilities.

Example 2. The additive distortion functions are chosen as ψ_A,r(U) = U^r − 1/(r+1), r = 1, . . . , p. The sample size n in this example is chosen as n = 300, n = 500, and n = 1000.

Table 3 Simulation results of Mean (M), Standard Error (SD) and Mean Squared Error (MSE) for β R with
To measure selection and estimation accuracy, we define ω_u,β0, ω_c,β0, and ω_o,β0 as the proportions of underfitted, correctly fitted, and overfitted models. In the case of overfitted models, "1", "2", and "≥ 3" are the proportions of models including 1, 2, and more than 2 insignificant covariates. We denote the mean squared error ‖β̂_P − β_0‖²_2 by Mse_β0, where β̂_P is the final penalized estimator. Moreover, "C_β0" and "IN_β0" denote the average number of zero coefficients that are correctly set to zero and the average number of nonzero coefficients that are incorrectly set to zero, respectively. In Table 5, we report the true penalized estimator (using the true covariates (Y, X, Z)), the penalized estimator β̂_P, and the naive penalized estimator (using the observed data (Ỹ_i, X̃_i, Z_i) directly without calibration). We see that the values of "C_β0" for the true penalized estimator and β̂_P are close to the true values 7 (p = 10), 17 (p = 20), and 27 (p = 30), and that "IN_β0" is close to 0. However, the naive penalized estimator falsely penalizes the nonzero components of β_0 to zero, and the values of IN_β0 are nonzero even for large sample sizes. For the true penalized estimator and the penalized estimator β̂_P, the proportion of correctly fitted models (column ω_c,β0) is above 90% when n = 300, and 100% when n = 1000. The proportions of underfitted models (column ω_u,β0) and overfitted models (columns under ω_o,β0) for the true penalized estimator and β̂_P are about 0% and 10% when n = 300 and n = 500, respectively. In the overfitted case, the proportion of models including 1 insignificant covariate dominates the cases including 2 or more insignificant covariates, and the latter is close to 0% in most situations. This indicates that the true penalized estimator and β̂_P are most likely to select a final model that is very close to the true model.
Moreover, the mean squared errors Mse_{β_0} for β̂_P are much smaller than those for the naive penalized estimator. The naive penalized estimator destroys the oracle property of the SCAD penalty, giving larger values of Mse_{β_0}. These large biases cannot be eliminated even when the sample size n increases to 1000, which coincides with the simulation results reported in Table 1. When the sample size is n = 300, the percentage of underfitted models (columns under ω_{u,β_0}) is about 8%, which implies that the naive penalized estimator eventually produces an incorrect model. This again indicates that ignoring the distortion measurement errors in the variable selection process ruins the oracle property and results in a wrong model.

We compare the performance of the estimators (β̂_V, β̂_C) and their confidence intervals, the test statistics (T_{V,n}, T_{C,n}), and the penalized estimators (β̂_{P,V}, β̂_{P,C}). Note that the estimation method proposed in [1] cannot be used in this example because E(X_r) = 0, r = 1, . . . , p.
In Table 6, it is not surprising that the true estimator performs better than the proposed estimators: the MSE values for β̂_T are all smaller than those for β̂_V and β̂_C, and its mean values are close to the true value (2, −1, 0)^T. The performance of β̂_V is slightly better than that of β̂_C, which has a slightly larger MSE. In Theorem 6.1, we showed that the estimator β̂_r is asymptotically efficient when β_{0r} = 0. In Table 6, we see that the MSE values for the estimators of β_{03} = 0 are close to those reported in Table 1. This again implies that the multiplicative distortions and additive distortions asymptotically have no effect on estimating β_{03} = 0, regardless of the choice of distortion functions.

Table 5. Simulation results for Example 2. "T" stands for the true penalized estimator, "P" stands for the penalized estimator β̂_P, and "N" stands for the naive penalized estimator. All values of Mse_{β_0} are on the scale of 10^{−3}.

In Table 7, we report the 95% NA confidence intervals based on the estimator β̂_V together with the corresponding EL confidence intervals, and the NA confidence intervals based on the estimator β̂_C together with the corresponding EL confidence intervals, for β_{0s}, s = 1, 2, 3. In Table 7, the NA confidence intervals based on β̂_C are slightly wider than those based on β̂_V, and their coverage probabilities are slightly larger than those of the empirical likelihood confidence intervals. Additionally, the EL method produces more accurate coverage probabilities than the NA method (see also Table 2).
In Table 8, we compare the performance of the test statistics T_{V,n} and T_{C,n} for the hypothesis testing problem (7.2). The simulation results are similar to those in Table 4. As the value of c increases, the power functions of T_{V,n} and T_{C,n} increase, and they increase to one rapidly as the sample size n increases. It is clear that T_{C,n} is more powerful than T_{V,n}, which coincides with the confidence intervals for β_{03} reported in Table 7. As the 95% confidence intervals based on β̂_C are slightly wider than those based on β̂_V, if one uses the complements of the 95% confidence intervals to reject the hypothesis (7.2), the rejection regions derived from β̂_C are more powerful than those derived from β̂_V. The simulation results in Table 8 likewise show that T_{C,n} (based on β̂_C) is more powerful than T_{V,n} (based on β̂_V).

Table 7. Simulation results of confidence intervals. "NAV" stands for the normal approximation based on β̂_V, "ELV" stands for the corresponding empirical likelihood method. "NAC" stands for the normal approximation based on β̂_C, "ELC" stands for the corresponding empirical likelihood method.

Example 4. In this example, we conduct 1000 simulations from model (1.1) to examine the performance of the penalized estimators β̂_{P,V} and β̂_{P,C}. The sample sizes are set to n = 300, n = 500, and n = 1000. The parameter β_0, the variables (X, Z, U, Y, ε) and the multiplicative distortion functions φ_M(U) and ψ_{M,r}(U) are the same as those in Example 2. The additive distortion functions are all set to zero: φ_A(U) = 0, ψ_{A,r}(U) = 0, r = 1, . . . , p.
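The data-generating step of such a simulation can be sketched as follows. The specific distortion functions, distributions, and g(z) below are illustrative stand-ins, not the paper's Example 2 design; they are chosen only so that each multiplicative distortion satisfies the usual mean-one identifiability constraint over U.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 3
beta0 = np.array([2.0, -1.0, 0.0])

# Latent variables (these distributions are illustrative, not the paper's design).
U = rng.uniform(0, 1, n)
X = rng.normal(1.0, 1.0, (n, p))
Z = rng.uniform(0, 1, n)
eps = rng.normal(0, 0.2, n)
Y = X @ beta0 + np.sin(2 * np.pi * Z) + eps   # model (1.1) with g(z) = sin(2*pi*z)

# Multiplicative distortions; each has mean one over U ~ Uniform(0, 1),
# the standard identifiability constraint for multiplicative errors.
phi_M = lambda u: 1 + 0.5 * (u - 0.5)
psi_M = lambda u: 1 - 0.4 * (u - 0.5)

Y_obs = phi_M(U) * Y              # observed (distorted) response
X_obs = psi_M(U)[:, None] * X     # observed covariates (additive parts set to 0)
```

Replacing the last two lines by `phi_M(U) * Y + phi_A(U)` and its covariate analogue gives the general setting with both multiplicative and additive distortions.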
In Table 9, we report the true penalized estimator (using the true simulated data), the penalized estimators β̂_{P,V} and β̂_{P,C}, and the naive penalized estimator (using the distorted data directly without calibration). The simulation results are all similar to those in Table 5. The values of "C_{β_0}" for β̂_{P,V} and β̂_{P,C} are close to the true values 7 (p = 10), 17 (p = 20), and 27 (p = 30), and "IN_{β_0}" is close to 0. The values of ω_{c,β_0} are all above 97%. For ω_{u,β_0} (underfitted models) and ω_{o,β_0} (overfitted models), the values are all close to 0. However, the naive penalized estimator falsely penalizes the nonzero components of β_0 to zero: the values of IN_{β_0} are nonzero even for large sample sizes, and the values of ω_{u,β_0} are also nonzero. The naive penalized estimator destroys the oracle property of the SCAD penalty, giving larger values of Mse_{β_0}. These large biases cannot be eliminated even when the sample size n increases to 1000, which coincides with the simulation results reported in Table 5. This again indicates that ignoring the multiplicative distortion measurement errors in the variable selection process ruins the oracle property and results in a poor model.

Table 9. Simulation results for Example 3. "T" stands for the true penalized estimator, "V" stands for the penalized estimator β̂_{P,V}, "C" stands for the penalized estimator β̂_{P,C}, and "N" stands for the naive penalized estimator. All values of Mse_{β_0} are on the scale of 10^{−3}.

Real data analysis
As an illustration, we now apply our method to the analysis of the body fat data (http://lib.stat.cmu.edu/datasets/bodyfat). The dataset contains the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men. In practice, the accurate measurement of body fat is inconvenient and costly, so simple methods of estimating body fat that are neither inconvenient nor costly are desirable. We use the partial linear model (1.1) to investigate the relationship between Y (percent body fat), Z (age), X_1 (weight), X_2 (height), X_3 (neck circumference), X_4 (chest circumference), X_5 (abdomen circumference), X_6 (hip circumference), X_7 (thigh circumference), X_8 (knee circumference), X_9 (ankle circumference), X_10 (biceps circumference), X_11 (forearm circumference), X_12 (wrist circumference), and the confounding variable U, the body density determined from underwater weighing. We first present the patterns of φ̂_M(u), φ̂_A(u), ψ̂_{M,r}(u) and ψ̂_{A,r}(u) in Figures 1-4. The plots show that all the distortion functions are non-constant.
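A calibration step of this kind can be sketched in a few lines. The sketch below is not the paper's conditional-variance method; it is the standard conditional-mean calibration for a purely multiplicative distortion (in the spirit of [25]), and it assumes the identifiability condition E[φ_M(U)] = 1, independence of the latent variable from U, a nonzero latent mean, a Gaussian Nadaraya–Watson smoother, and an illustrative bandwidth h.

```python
import numpy as np

def nw_smooth(u_grid, U, V, h):
    """Nadaraya-Watson estimate of E[V | U = u] with a Gaussian kernel."""
    W = np.exp(-0.5 * ((u_grid[:, None] - U[None, :]) / h) ** 2)
    return (W @ V) / W.sum(axis=1)

def calibrate_multiplicative(V_obs, U, h=0.1):
    """Back out the latent variable from V_obs = phi_M(U) * V.

    Since E[V_obs | U = u] = phi_M(u) * E[V] when V is independent of U,
    the ratio mean(V_obs) / E-hat[V_obs | U_i] estimates 1 / phi_M(U_i)
    under the constraint E[phi_M(U)] = 1 (and requires E[V] != 0).
    """
    m_hat = nw_smooth(U, U, V_obs, h)        # estimate of E[V_obs | U_i]
    return V_obs * np.mean(V_obs) / m_hat    # calibrated V_i
```

With both multiplicative and additive distortions, a second conditional moment is needed to separate φ_M from φ_A, which is where the paper's conditional variance estimation enters.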
Corresponding to the covariates (X_1, . . . , X_12)^T, Table 10 presents the estimators of β_0, their standard errors, p-values, confidence intervals based on NA, confidence intervals based on EL, and the penalized estimators with their estimated standard errors. The standard errors of the penalized estimator β̂_P are obtained using the plug-in estimators of the asymptotic covariance matrices derived in Theorem 5.1. The estimator β̂ and its associated p-values show that only the variable X_5 (abdomen) is significant; the remaining p-values are all greater than 0.2. The only 95% NA confidence interval that does not contain zero is that for β_{05}. Among the 95% EL confidence intervals, only those for β_{01}, β_{05} and β_{0,12} exclude zero, implying that these parameters are significant at the 5% significance level. The penalized estimator β̂_P indicates that X_2 (height), X_5 (abdomen) and X_12 (wrist) are relevant to the response Y. The above analysis suggests that X_5 (abdomen) is the most important variable in model (1.1). Intuitively, it makes sense that the percentage of body fat increases as the abdomen circumference X_5 becomes larger. Finally, we show the pattern of ĝ(z) in Figure 5. This figure reveals that g(z) is nonlinear: the percentage of body fat increases from age 20 to 45, decreases slightly from age 45 to 60, and then increases again from age 60 to 80. Overall, the parameter estimation results in Table 10 and the plot in Figure 5 show that X_5 (abdomen) and Z (age) can serve as the two principal variables for predicting the percentage of body fat in future health studies.
Proof. Lemma A.1 follows immediately from the results obtained by [19].
Proof. Lemma A.2 is the direct result of Theorem 1 in [37].
For r = 1, . . . , p, we have

Proof. Lemma A.3 is a direct consequence of Lemma B.2 in [40]; see also Lemma 1.1 in the on-line supplementary material of [41].

A.2. Proof of Theorem 1
Recalling that

where

Step 1.1. For the expression D_{n1}, we have [3].
Using (A.14), the proof of Theorem 1 is completed.

A.3. Proof of Theorem 2
Note that

For the term S_{n1}(z), using Lemma A.1 and Theorem 1, it is seen that

Directly using Lemma A.1, similar to (A.16), we have

Together with (A.16) and (A.17), we have

The asymptotic result of Theorem 2 is obtained directly from (A.18). We have completed the proof of Theorem 2.

A.4. Proof of Theorem 3
We first consider the conditional mean calibration. For 1 ≤ r ≤ p, let ℘^{[r]}_{n,i}(β_0) be the r-th component of ℘_{n,i}(β_0). We decompose ℘^{[r]}_{n,i}(β_0) into the following terms:

To prove Theorem 3, we need to show that max_{1≤i≤n} |℘^{[r]}_{n,it}| = o_P(n^{1/2}), t = 1, . . . , 8. Note that, for any sequence of i.i.d. random variables,

Next, for R^{[r]}_{n,i1}, according to the proofs of Theorem 1 and Theorem 3 in [37],

A.5. Proof of Theorem 4 and Theorem 5
Step 1. Note that

Under the null hypothesis H_0, we have Aβ_0 = b. Using (A.29), it is seen that

Together with (A.13) and (A.14), equation (A.30) can be expressed as

We have completed the proof of Theorem 4.
Step 2. Under the null hypothesis H_0 : Aβ_0 = b, using (A.14) and Theorem 1, we have

Similar to the analysis of (A.13), we have

Slutsky's theorem entails that

where I_k is the k × k identity matrix. Using (A.34), the continuous mapping theorem entails that

where χ²_k is the central chi-squared distribution with k degrees of freedom. We have completed the proof of Theorem 5.
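The limiting steps in this proof follow the familiar Wald pattern. Writing Σ for the asymptotic covariance of √n(β̂ − β_0) given in Theorem 1 and Σ̂ for its plug-in estimator, the argument can be summarized schematically as follows; the exact matrices are those defined in the paper, and this display records only the generic structure.

```latex
\sqrt{n}\,(A\widehat{\beta}-b) = A\sqrt{n}\,(\widehat{\beta}-\beta_0)
  \xrightarrow{d} N\bigl(0,\, A\Sigma A^{\top}\bigr)
  \quad\text{under } H_0 : A\beta_0 = b,
\qquad
T_n = n\,(A\widehat{\beta}-b)^{\top}\bigl(A\widehat{\Sigma}A^{\top}\bigr)^{-1}(A\widehat{\beta}-b)
  \xrightarrow{d} \chi^2_k .
```

The first convergence is Slutsky's theorem applied to the expansion in Step 2; the second is the continuous mapping theorem applied to the quadratic form.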

A.6. Proof of Theorem 6
Step 1. Note that b = Aβ_0 − n^{−1/2} c under the local alternative hypothesis H_{1n}. From (A.29), we have

Using (A.30)-(A.31) and (A.36), we have

According to Theorem 1, we have

Step 2. Under the local alternative hypothesis H_{1n} : Aβ_0 = b + n^{−1/2} c, using Theorem 1, we have

Using (A.33)-(A.34) and (A.39), we have

Then, according to (A.40), the continuous mapping theorem entails that

where χ²_k(π_c) is the noncentral chi-squared distribution with k degrees of freedom, and π_c is the noncentrality parameter, defined as

We have completed the proof of Theorem 6.

A.7. Proof of Theorem 7
Step 1. In this step, we establish the asymptotic expression of the minimizing estimator β̂_P. Define p_{ζ_s}(|β_s|).
We have completed the proof of Theorem 7.
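The penalty p_{ζ_s}(·) appearing in this proof is the SCAD penalty of Fan and Li (2001). For reference, a direct transcription in code is given below; the function names are ours, and a = 3.7 is the value customarily recommended for SCAD (we assume the paper follows this convention).

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(theta) of Fan and Li (2001), elementwise."""
    theta = np.abs(np.asarray(theta, dtype=float))
    return np.where(
        theta <= lam,
        lam * theta,                                     # linear near zero
        np.where(
            theta <= a * lam,                            # quadratic middle part
            -(theta**2 - 2 * a * lam * theta + lam**2) / (2 * (a - 1)),
            (a + 1) * lam**2 / 2,                        # constant for large theta
        ),
    )

def scad_derivative(theta, lam, a=3.7):
    """p'_lam(theta) = lam * { I(theta <= lam)
                               + (a*lam - theta)_+ / ((a-1)*lam) * I(theta > lam) }."""
    theta = np.abs(np.asarray(theta, dtype=float))
    return lam * (
        (theta <= lam).astype(float)
        + np.maximum(a * lam - theta, 0.0) / ((a - 1) * lam) * (theta > lam)
    )
```

The penalty is linear on [0, ζ], quadratic on (ζ, aζ], and constant beyond aζ, so large coefficients are left unpenalized; this is what makes the oracle property attainable.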

A.8. Proof of Theorem 8
The asymptotic normality of β̂_V can be obtained directly from the proof of Theorem 1 by setting φ_A(U) ≡ 0 and ψ_{A,r}(U) ≡ 0, r = 1, . . . , p; we omit the details.
For the estimator β̂_C, similar to (A.1), we have

where

Step 8.1. For the expression K_{n1}, we have [3].
We have completed the proof of Theorem 8.

A.9. Proof of Theorem 9
The proofs of the asymptotic results for T_{V,n} are similar to the proofs of Theorem 5 and Theorem 6; we omit the details. For the test statistic T_{C,n}, under the null hypothesis H_0 : Aβ_0 = b, using (A.58) and Theorem 1, we have

Similar to the analysis of (A.57), we have

Slutsky's theorem entails that

AΣ −1 CΣC,

Using (A.61), the continuous mapping theorem entails that

Under the local alternative hypothesis H_{1n}, noting that b = Aβ_0 − n^{−1/2} c, we have

Then, according to (A.64), the continuous mapping theorem entails that

where χ²_k(π_{C,c}) is the noncentral chi-squared distribution with k degrees of freedom, and π_{C,c} is the noncentrality parameter:

We have completed the proof of Theorem 9.