Estimation and inference of error-prone covariate effect in the presence of confounding variables

Abstract: We introduce a general single index semiparametric measurement error model for the case where the main covariate of interest is measured with error and modeled parametrically, and where there are many other variables also important to the modeling. We propose a semiparametric bias-correction approach to estimate the effect of the covariate of interest. The resultant estimators are shown to be root-n consistent, asymptotically normal and locally efficient. Comprehensive simulations and an analysis of an empirical data set are performed to demonstrate the finite sample performance and the bias reduction of the locally efficient estimators.


Introduction
Estimating and testing the effect of a covariate of interest while accommodating many other covariates is an important problem in statistical practice. The t-test and the analysis of variance are widely used to evaluate the covariate effect when the covariate of interest is binary or categorical and no confounders are present. When the covariate of interest is not necessarily binary or categorical, evaluating the covariate effect has been studied extensively in the context of the linear model, the partially linear model (Heckman, 1986; Härdle et al., 2000) and the partially linear single-index model (Carroll et al., 1997; Yu and Ruppert, 2002; Li et al., 2011; Ma and Zhu, 2013), provided that both the covariate of interest and the confounders are measured precisely. In this work, we generalize the partially linear single-index model to a larger class where the link function is not restricted to be linear, and we further consider measurement error issues.
When the covariate of interest is measured with error, to evaluate its effect precisely we must reduce the bias caused by measurement error and adjust for the confounding effects simultaneously. This is an interesting yet very challenging problem. To partially address this problem, Carroll et al. (2006) assumed the confounding effects are linear, and Liang et al. (1999) and Ma and Carroll (2006) assumed the confounders are in fact univariate. These assumptions restrict the usefulness of their methods. To the best of our knowledge, how to assess the covariate effect subject to measurement error while taking into account possibly nonlinear confounding effects still remains an open and difficult problem in the literature.
Estimating and testing the effect of a covariate of interest in the presence of possibly nonlinear confounding effects has many applications in a variety of scientific fields such as econometrics, biology and policy making. Consider the Framingham Heart Study (http://www.framinghamheartstudy.org/) as a typical example. It is common knowledge that high systolic blood pressure (SBP) is directly linked to the occurrence of coronary heart disease (Y). Quantifying the effect is, however, not straightforward. One difficulty is that SBP can vary significantly from time to time, hence a clinically meaningful covariate is the long term average of SBP (X), which is unfortunately impossible to measure precisely. A widely used practice is to substitute the average of several measured SBP values (W) taken over a reasonably long time course. Thus, long term average SBP is a variable measured with error. Another difficulty comes from the presence of possibly nonlinear confounding effects (Z) on heart disease, such as smoking status, family history, ethnicity, BMI, lung capacity, age and other laboratory variables. Because these effects are not of medical interest while their connection to heart disease occurrence might be complex, a suitable modeling strategy is to use an unspecified function to summarize their possibly nonlinear effect. Difficulty with such a modeling strategy naturally arises when the dimension of Z is more than one, since it is well known that nonparametrically estimating a function of multivariate confounding variables suffers from the curse of dimensionality. To tackle this issue, we follow the single index modeling strategy and assume that the combined effect of the covariates in Z is manifested through a linear combination γ̃^T Z, where γ̃ is a length p vector.
For identifiability, we assume that Z contains at least one continuous variable, the first component of γ̃ is one, and we use γ to denote the vector of the last p − 1 components. Let H be the logistic distribution function. In this Framingham data example, we assume that, given X and Z, the probability of the occurrence of coronary heart disease (Y) admits a model of the form pr(Y = 1 | X, Z) = H{Xβ + θ(γ̃^T Z)}. Here we adopt the general assumption that, after the transformation from the raw scale, the relation between the transformed observed SBP, W = log(observed SBP − 50), and the transformed long term average, X = log(long term average SBP − 50), is additive with a normal measurement error, i.e. W = X + U with U ∼ N(0, σ_u^2), and we assume the error is nondifferential. This relation is verified by Carroll et al. (2006, chapter 6).
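As a concrete illustration, data from the logistic single-index measurement error model just described can be simulated in a few lines. The sketch below uses purely illustrative parameter values for β, γ̃, θ and σ_u; they are our choices, not estimates from the Framingham data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Illustrative parameter values (not estimates from the real data).
beta = 0.8
gamma = np.array([1.0, 0.5])          # first component fixed at 1 for identifiability
theta = lambda u: np.cos(u) / 2 - 1   # a smooth confounder effect, unspecified in the model
sigma_u = 0.3                         # measurement error standard deviation

x = rng.normal(0.0, 1.0, n)           # true long-term average (unobserved in practice)
z = rng.uniform(-1.0, 1.0, (n, 2))    # confounders; z[:, 0] is continuous
eta = x * beta + theta(z @ gamma)     # single-index structure
p = 1.0 / (1.0 + np.exp(-eta))        # H = logistic distribution function
y = rng.binomial(1, p)                # disease indicator
w = x + rng.normal(0.0, sigma_u, n)   # observed error-prone surrogate
```

Only (y, w, z) would be available to the analyst; x is retained here solely to check the simulation.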
The above model can be viewed as a special case of the following general semiparametric measurement error model. To be specific, we write the general probability density/mass function of the response variable Y, for example disease status, conditional on the covariate set (X, S^T, Z^T)^T as

g{y, x, s, θ(γ̃^T z), β},   (1.1)

where X is an error-prone covariate whose effect on Y is of central research interest, and S, Z contain additional covariates that may be related to Y and may be confounded with X. We model part of these confounders (S) parametrically, such as the categorical variables, and part of them (Z) nonparametrically through an unspecified smooth function θ. Both S and Z are measured precisely. In model (1.1), g is a known conditional probability density/mass function, θ is an unspecified smooth function, γ̃ = (1, γ^T)^T, where γ is an unknown length p − 1 vector, and β is an unknown parameter. In this notation, the example above can be written as g{y, x, s, θ(γ̃^T z), β} = exp[y{xβ + θ(γ̃^T z)}]/[1 + exp{xβ + θ(γ̃^T z)}]. In our context, the covariate X is of primary interest but is unobservable. Instead, we observe its erroneous version W, where the relation between W and X is specified, i.e. f_{W|X}(w | x) is a known model. In practice, the specification of f_{W|X}(w | x) is usually obtained through validation data, instruments or repeated measurements. We treat θ(·) as an infinite dimensional nuisance parameter. We further make the surrogacy assumption that W and Y are independent given X, S, Z. The primary interest is in β, which describes the effect of X on Y. In many applications, β enters the model as the multiplicative coefficient of a linear function of the covariates, such as through β_1 X + β_2 S. Model (1.1) is an extension of the generalized single index model proposed by Cui et al. (2011), in which neither X nor S is present.
In addition, Tsiatis and Ma (2004) studied a simpler version of model (1.1) in which Z does not appear, and Ma and Carroll (2006) considered a simpler version in which Z is univariate. The generalization to multivariate Z in model (1.1) is important in practice since it accommodates more realistic applications; see, for example, the Framingham Heart Study in Section 5. In particular, model (1.1) allows us to handle the possible nonlinearity of the confounding variables through the unspecified function θ, while the single index structure γ̃^T z facilitates nonparametric modeling. Nevertheless, the extension also poses several challenging technical and computational problems. Indeed, when the index vector appears inside an unknown function, its estimation is more complex, and the interaction between the estimation of the indices and that of the function has to be taken into account. The variability in estimating these quantities further affects the estimation quality of the parameter of interest. Overall, the three sets of parameters, namely the parameter of interest, the index vector and the unknown smooth function, link together intrinsically, which complicates the estimation procedure, the computational treatment and the theoretical development. Compared with the case where the index vector does not appear, this additional complexity can be viewed as the price paid to overcome the curse of dimensionality.
We design a general methodology for the semiparametric measurement error model (1.1), and introduce a bias-correction approach to construct a class of locally efficient estimators. This bias-correction approach is motivated by the projected score idea in semiparametrics (Tsiatis and Ma, 2004) and does not resort to deconvolution or require correctly specifying a distributional model for the error-prone covariate of interest. We further generalize the bias-correction approach to estimating γ in model (1.1), a component that does not appear in the models considered by Tsiatis and Ma (2004) or Ma and Carroll (2006). In their studies, Z is either absent or univariate, hence the issue of estimating γ does not occur. In the presence of multivariate Z, the conditional density of X given S and Z, denoted f_{X|S,Z}(x, s, z), is required to implement the bias-correction approach. However, with a multivariate Z, regardless of whether S is discrete or continuous, estimating f_{X|S,Z}(x, s, z) is a thorny issue even if X were observed, due to the curse of dimensionality. To alleviate this difficulty, a working model is adopted. If the working model happens to be the underlying true one, the resultant estimator is semiparametrically efficient, whereas if the working model is misspecified, the resultant estimator is still root-n consistent and asymptotically normal. In other words, the resultant estimator is locally efficient. To put the bias-correction approach into practice, we suggest a profiling algorithm for estimating β.
The article is organized as follows. In Section 2 we introduce the bias-correction approach for estimating β in the semiparametric measurement error model (1.1). The asymptotic properties of the resultant estimators are given in Section 3. We report several simulation studies in Section 4 and revisit the Framingham data in Section 5. The paper concludes with a brief discussion in Section 6. All technical details are given in the Appendix.

Estimation
In this section we discuss estimation of the covariate effect at the sample level. Write the observation as (y i , w i , s i , z i ), i = 1, . . . , n. We propose to estimate the effect of the covariate of interest as well as other nuisance parameters through solving the estimating equations derived from the semiparametric log-likelihood.
The surrogacy assumption and the model specification in Section 1 directly lead to the semiparametric log-likelihood, subject to an additive term that does not involve the parameters β, γ, θ,

l(β, γ, θ) = Σ_{i=1}^n log ∫ g{y_i, x, s_i, θ(γ̃^T z_i), β} f_{W|X}(w_i | x) f_{X|S,Z}(x | s_i, z_i) dx.

Recall that γ is defined in Section 1 as the vector of the free parameters in γ̃. Here f_{X|S,Z} and f_{W|X} represent the probability density function of X conditional on (S, Z) and that of W conditional on X, respectively. If both θ and f_{X|S,Z} were known, the maximum likelihood estimator (MLE) would provide the most natural estimator for β and γ. Let S_β and S_γ be the score functions with respect to β and γ; we could then modify the MLE through localization to handle the issue caused by the unknown functional form of θ. Specifically, we adopt a local parametric model θ(γ̃^T z) = ν(γ̃^T z; α). For example, the widely used local polynomial model of Fan and Gijbels (1996) can serve as ν(γ̃^T z; α). Here α depends on γ̃^T z, but we suppress this dependence for notational clarity. We could then estimate θ together with β, γ through iteratively solving

Σ_{i=1}^n S_β(y_i, w_i, s_i, z_i; β, γ, θ) = 0, Σ_{i=1}^n S_γ(y_i, w_i, s_i, z_i; β, γ, θ) = 0

to obtain β, γ, and, at each point z_0,

Σ_{i=1}^n S_α(y_i, w_i, s_i, z_i; β, γ, α) K_h(γ̃^T z_i − γ̃^T z_0) = 0

to obtain θ(γ̃^T z_0) = ν(γ̃^T z_0; α), where K_h(·) = K(·/h)/h, K is a kernel function and h is a bandwidth. In the above display, S_α is defined analogously to S_β except that θ(γ̃^T z) is replaced by ν(γ̃^T z; α) and the derivative is taken with respect to α.
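The localization step can be visualized in a simplified error-free setting: fitting the local linear model ν(u; α) = α_0 + α_1(u − u_0) by kernel-weighted least squares recovers θ(u_0). The following is a sketch with illustrative choices; the function θ, the kernel and the bandwidth below are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, u0 = 2000, 0.2, 0.0

theta = lambda u: np.cos(u) / 2 - 1       # used only to simulate; unknown in practice
u = rng.uniform(-2.0, 2.0, n)             # plays the role of the index gamma^T z
y = theta(u) + rng.normal(0.0, 0.1, n)    # noisy responses

# Local linear fit at u0: kernel-weighted least squares with an Epanechnikov kernel.
t = (u - u0) / h
k = np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)
X = np.column_stack([np.ones(n), u - u0])
lhs = X.T @ (X * k[:, None])              # weighted normal equations
rhs = X.T @ (y * k)
alpha = np.linalg.solve(lhs, rhs)
theta_hat = alpha[0]                      # local intercept estimates theta(u0)
```

Repeating this over a grid of u0 values traces out an estimate of the whole curve θ(·).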
The above idea would work if we could actually calculate the score functions. However, without an explicit form of f_{X|S,Z}, the calculation of the score vectors is not an easy task. A natural approach is to estimate f_{X|S,Z} and then use the estimated version to obtain the corresponding estimated score functions. This is not entirely out of the question, especially when f_{W|X}(w | x) happens to describe an additive error independent of X, i.e. W = X + U. In that case, writing φ_U(t) for the characteristic function of U and F_w(t, s, z) for the conditional characteristic function of W given (s, z), which can be estimated from the observed data, the conditional characteristic function of X satisfies F_x(t, s, z) = F_w(t, s, z)/φ_U(t). Performing an inverse Fourier transform on F_x(t, s, z) would then yield an estimate of f_{X|S,Z}(x, s, z).
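The deconvolution step can be sketched on the characteristic-function scale. In this toy illustration we drop the conditioning on (s, z) and assume a normal error with known σ_u; all distributions below are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma_u = 100_000, 0.5

x = rng.normal(1.0, 1.0, n)             # true covariate (unobserved in practice)
w = x + rng.normal(0.0, sigma_u, n)     # observed surrogate

t = 0.5                                 # one frequency for illustration
phi_w_hat = np.mean(np.exp(1j * t * w))     # empirical characteristic function of W
phi_u = np.exp(-0.5 * sigma_u**2 * t**2)    # known cf of U ~ N(0, sigma_u^2)
phi_x_hat = phi_w_hat / phi_u               # deconvolved cf of X

phi_x_true = np.exp(1j * t - 0.5 * t**2)    # cf of N(1, 1), for comparison only
```

Repeating this over a grid of t and inverting the Fourier transform yields the density estimate; the slow convergence rates discussed below arise because φ_U(t) shrinks rapidly in t, so the division amplifies the sampling noise.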
The above analysis reveals some hidden obstacles in estimating f_{X|S,Z}(x, s, z). First of all, the deconvolution procedure is only applicable when the measurement error is additive and independent of X. When the measurement error model f_{W|X}(w | x) goes beyond this structure, it is unclear how to recover f_{X|S,Z}(x, s, z). Second, the procedure requires estimating f_{W|S,Z}(w, s, z) nonparametrically. When the dimension of (s, z) is moderate or high, in other words when the confounding variables are multivariate, this again suffers from the curse of dimensionality and is not practically feasible in finite samples. Finally, even when the dimension of (s, z) is sufficiently low for the deconvolution procedure to be carried out in practice, the resulting estimate of f_{X|S,Z}(x, s, z) has a very slow convergence rate (Carroll and Hall, 1988; Fan, 1991), hence using the estimated f_{X|S,Z}(x, s, z) may yield very different results from using the true f_{X|S,Z}(x, s, z), which is what the original score function calculation requires.
Due to these inherent difficulties in estimating f_{X|S,Z}(x, s, z), we do not pursue this route. Instead, we take a somewhat counter-intuitive approach. Rather than striving to approximate f_{X|S,Z}(x, s, z), we simply posit a model f*_{X|S,Z}(x, s, z), which may or may not reflect the true conditional density function, and calculate the score functions S_β, S_γ, S_α under this posited model. Of course, simply replacing the true score functions with the posited versions is not guaranteed to yield consistent estimation of β, γ and θ. To correct the possible bias, we form

L_β = S_β − E(a_β | w, s, z, y), L_γ = S_γ − E(a_γ | w, s, z, y), L_α = S_α − E(a_α | w, s, z, y),

where a_β, a_γ, a_α are functions of (X, S^T, Z^T)^T that satisfy

E*{E(a_β | W, S, Z, Y) | X, S, Z} = E*(S_β | X, S, Z),   (2.2)

and similarly for a_γ and a_α, and E* represents expectation calculated using f*_{X|S,Z}(x, s, z). Here E(a_β | w, s, z, y), E(a_γ | w, s, z, y) and E(a_α | w, s, z, y) are respectively the projections of the score vectors S_β, S_γ and S_α onto the tangent space Λ described in Appendix A.1, and they have no explicit form except in some special cases; we give one such example at the end of this section. It is easy to see that the definition of a_β, a_γ, a_α in (2.2) guarantees the consistency of L_β, L_γ and L_α automatically, whether or not f*_{X|S,Z} reflects the truth. We then use L_β, L_γ and L_α to replace S_β, S_γ, S_α in the iterative procedure described above to estimate β, γ and θ. That is, we solve

Σ_{i=1}^n L_β(y_i, w_i, s_i, z_i; β, γ, θ) = 0, Σ_{i=1}^n L_γ(y_i, w_i, s_i, z_i; β, γ, θ) = 0   (2.3)

to estimate β, γ, and solve

Σ_{i=1}^n L_α(y_i, w_i, s_i, z_i; β, γ, α) K_h(γ̃^T z_i − γ̃^T z_0) = 0   (2.4)

for θ at each z_0, where we suppress the dependence of α on z_0 for notational brevity. The estimation procedure can either iteratively solve (2.3) and (2.4) (backfitting), or use (2.4) to obtain θ as a function of β, γ and then use (2.3) to solve for β, γ (profiling). In the following, we carry out all the procedures using the profiling approach.
The bias correction through forming L_β etc. is rooted in the projected score idea in semiparametrics (Bickel et al., 1993; Tsiatis and Ma, 2004; Tsiatis, 2006). Given any function, say S_β, we can calculate its residual after projecting it onto the nuisance tangent space associated with the model. This residual would be exactly the efficient score had we used the true f_{X|S,Z} throughout all the calculations; we defer the details of this calculation to Appendix A.1. However, due to the lack of knowledge of f_{X|S,Z}, we are forced to perform all the calculations using a posited f*_{X|S,Z}. The fortunate fact is that, even using the possibly misspecified conditional density, (L_β^T, L_γ^T, L_α^T)^T still has mean zero, because this property is enforced by its very construction, reflected in the definitions of a_β, a_γ, a_α in (2.2). It is worth mentioning that if f*_{X|S,Z} happens to be the truth, then S_β, S_γ, S_α are indeed the score functions. Thus, as the residuals of the orthogonal projection of the score functions, L_β, L_γ and L_α are the efficient score functions. Hence the resulting estimator is not only consistent but also efficient.
To further illustrate the estimator, we now investigate the partially linear single index model with normal measurement error. We will show that in this special case, many quantities simplify and a set of explicit estimating equations can be obtained.
Consider an alternative form of model (1.1) in this case, where Y = X^T β + θ(γ̃^T Z) + ε, and ε follows a normal distribution with mean zero and known constant variance σ^2 and is independent of X. We adopt an additive normal measurement error W = X + U, where U follows a normal distribution with mean zero and known constant covariance matrix Σ and is independent of X. For estimating θ(·), we adopt the familiar local linear form θ(γ̃^T z) = α_0 + α_1 γ̃^T z.
Define Δ = W + Y Σβ/σ^2. Following Stefanski and Carroll (1987), the form of L_β can be derived explicitly, where E* is computed under the working model f*_{X|Z}(x, z). Using similar derivations, we can further obtain L_γ and L_α, and the estimation can then be carried out by jointly solving the resulting estimating equations. Similar calculations can also be made for the Poisson model Y ∼ Poisson[exp{X^T β + θ(γ̃^T Z)}]. In this case, L_β again takes an explicit form with E* computed under the working model f*_{X|Z}(x, z); similar derivations yield L_γ and L_α, and the estimation is again carried out by jointly solving the corresponding estimating equations.
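To give a flavor of the conditional-score construction behind Δ, here is a minimal numerical sketch in the simplest scalar case with no confounders, Y = βX + ε: given Δ = W + Yβσ_u^2/σ^2, one has E(Y | Δ) = βΔ/(1 + β^2 σ_u^2/σ^2) free of X, which yields an unbiased estimating equation regardless of the distribution of X. All numerical values are illustrative.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
n, beta_true, sigma_u, sigma = 5000, 1.0, 0.5, 0.5

x = rng.normal(0.0, 1.0, n)                     # distribution of X is never used by the estimator
y = beta_true * x + rng.normal(0.0, sigma, n)
w = x + rng.normal(0.0, sigma_u, n)

def ee(b):
    """Conditional-score estimating function; mean zero at the true beta."""
    delta = w + y * b * sigma_u**2 / sigma**2   # sufficient statistic for x
    c = 1.0 + b**2 * sigma_u**2 / sigma**2
    return np.mean((y - b * delta / c) * delta)

beta_hat = brentq(ee, 0.2, 2.0)                 # root of the estimating equation
beta_naive = np.dot(w, y) / np.dot(w, w)        # least squares on w: attenuated
```

The naive slope is attenuated toward zero (roughly by the factor 1/(1 + σ_u^2) here), while the conditional-score root recovers the true β.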

Asymptotic properties and inference
In this section we show that the estimated covariate effect is asymptotically normal in Theorem 3.1 and locally efficient in Theorem 3.2. A by-product of the asymptotic normality property is that it facilitates testing if the estimated covariate effect is statistically significant.
Viewing θ(·) as a one dimensional parameter, we have L_α = L_θ θ_α, where L_θ is obtained in the same way as L_α by replacing α with θ, and θ_α is the partial derivative of θ(·, α) with respect to α. Let θ_αα = ∂θ_α/∂α^T. Let L_ββ, L_βγ, L_βα and L_βθ be the partial derivatives of L_β with respect to β, γ, α and θ respectively. Similarly define L_γβ, L_γγ, L_γα, L_γθ, L_αβ, L_αγ, L_αα and L_αθ. Let A and B be the matrices defined in (3.1) and (3.2) in terms of these derivatives, and write ζ = (β^T, γ^T)^T.

Theorem 3.1. Under the regularity conditions listed in the Appendix, we have the expansion

n^{1/2}(ζ̂ − ζ) = A^{−1} n^{−1/2} Σ_{i=1}^n S_eff{y_i, w_i, s_i, z_i; ζ, θ(·)} + o_p(1).

Consequently, when n → ∞, n^{1/2}(ζ̂ − ζ) converges in distribution to a normal random vector with mean zero and covariance matrix A^{−1} B (A^{−1})^T.

Theorem 3.2. Under the regularity conditions listed in the Appendix, if the posited model f*_{X|S,Z}(x | s, z) is correct, the subsequent estimator β̂ has the additional property that it is semiparametrically efficient.
The proofs of Theorems 3.1 and 3.2 are given in the Appendix. In practice, the matrices A and B can be estimated through their sample versions, while Ω, U, θ β and θ γ need to be estimated via their corresponding nonparametric regression.
Knowing the asymptotic properties of β̂ allows us to perform various tests. Specifically, we can test the covariate effect described by H_0: Mβ = c, where M and c are the corresponding matrix and vector used to describe the particular test of interest. As an example, we have the following chi-square test result.

Theorem 3.3. Let V̂ be the estimated asymptotic covariance matrix of n^{1/2}(β̂ − β). Under H_0: Mβ = c, the statistic

n(Mβ̂ − c)^T (M V̂ M^T)^{−1} (Mβ̂ − c)

follows a chi-square distribution with degrees of freedom d_M, where d_M is the number of rows in M.
We provide the proof of Theorem 3.3 in Appendix A.5.
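The test is straightforward to carry out once β̂ and an estimate V̂ of its asymptotic variance are available. The following is a sketch with purely illustrative numbers; β̂, V̂, M and c are placeholders for quantities that would come from the fitted model.

```python
import numpy as np
from scipy.stats import chi2

# Illustrative inputs; in practice these come from the fitted model.
n = 400
beta_hat = np.array([0.1])
V_hat = np.array([[4.0]])        # estimated asymptotic variance of sqrt(n)*(beta_hat - beta)
M = np.array([[1.0]])            # test H0: M beta = c
c = np.array([0.0])

r = M @ beta_hat - c
stat = float(n * r @ np.linalg.solve(M @ V_hat @ M.T, r))
p_value = chi2.sf(stat, df=M.shape[0])   # d_M = number of rows of M
```

Rejecting when p_value falls below the nominal level gives the chi-square test of Theorem 3.3.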

Simulation
We perform four simulation studies to examine the finite sample performance of the proposed method.
In the first set of simulation studies, the response variable Y is binary, with the true g function of the form

pr(Y = 1 | x, z) = H{β_1 x + β_2 x^2 + θ(γ̃^T z)},

where H is the logistic distribution function. Thus, the parameter of interest β = (β_1, β_2)^T consists of two components. The function θ(γ̃^T z) = cos(γ̃^T z)/2 − 1.
Our first simulation is a relatively simple one, where the covariate vector Z has dimension p = 2. This yields a total of three parameters in addition to the univariate nonparametric function θ and the unknown distribution of X. In simulations 2 and 3, we increase the dimension of the covariate vector Z to three and four respectively, which yield four and five parameters in addition to the two unknown functions. In all the simulations, the covariate X and the measurement errors are generated from normal distributions, and the covariate vector Z is generated from uniform distributions.
To compare the performance of various estimators, we implemented a naive estimator, two versions of the regression calibration estimator and two versions of the semiparametric estimator. In the naive estimator, the presence of measurement error is simply ignored and a profile likelihood procedure is implemented to estimate the parameter β. In the regression calibration procedures, we first calculate X* = E(X | W) and X*_2 = E(X^2 | W), then treat X* and X*_2 as X and X^2 and perform the profile likelihood estimation under the error-free model. In calculating E(X | W) and E(X^2 | W), we experimented with two working distributions of X, respectively normal and uniform, corresponding to a true and a misspecified distributional assumption on X. Finally, we implemented the proposed semiparametric estimator with the same two working distributions of X. The estimation and inference results of all five estimators are given in Tables 1-3, corresponding to the three simulation studies. All the results are based on 1,000 simulated data sets with sample size 500. To see how the estimation procedure behaves with increasing dimension of Z, we also experimented with p > 4. With all other aspects of the simulation fixed, the procedure performs well until p = 10, where we start to see significant biases. Throughout the numerical analysis, we used the bandwidth h = 3 sd(w) n^{−1/3}, where sd(w) is the sample standard deviation of w. We also experimented with the bandwidths h = 1.5 sd(w) n^{−1/3} and h = 4.5 sd(w) n^{−1/3}; the results appear insensitive to the bandwidth changes and are therefore omitted.
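Under the normal working model for X, the calibration quantities E(X | W) and E(X^2 | W) have familiar closed forms: E(X | W) = μ_x + λ(W − μ_x) with reliability ratio λ = σ_x^2/(σ_x^2 + σ_u^2), and E(X^2 | W) = {E(X | W)}^2 + Var(X | W) with Var(X | W) = λσ_u^2. A sketch, with all parameter values illustrative and assumed known:

```python
import numpy as np

rng = np.random.default_rng(4)
n, mu_x, sigma_x, sigma_u = 200_000, 0.0, 1.0, 0.5

x = rng.normal(mu_x, sigma_x, n)
w = x + rng.normal(0.0, sigma_u, n)

lam = sigma_x**2 / (sigma_x**2 + sigma_u**2)   # reliability ratio
x_star = mu_x + lam * (w - mu_x)               # E(X | W) under the normal working model
x2_star = x_star**2 + lam * sigma_u**2         # E(X^2 | W); Var(X | W) = lam * sigma_u^2
```

In the regression calibration procedure, x_star and x2_star then replace X and X^2 in the error-free profile likelihood.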
The common observation across all simulations is that the naive estimator and the two regression calibration estimators tend to produce larger biases, while the semiparametric estimators, whether performed under the true or a misspecified working model for the distribution of X, have much smaller biases.

Table 1: Results of Simulation 1 with p = 2. The true parameter values, the estimates, the sample standard errors ("sd"), the mean of the estimated standard errors and the empirical coverage of the 95% confidence interval ("%") of five different estimators are reported. The five estimators are the naive estimator ("Naive"), the regression calibration estimators with two working distributions of X ("RC-Nor" and "RC-Uni") and the semiparametric estimators with two working distributions of X ("Semi-Nor" and "Semi-Uni").

The relatively large biases of the naive and regression calibration estimators directly lead to invalid inference results, reflected in the poor empirical coverage of the 95% confidence intervals. On the contrary, the semiparametric estimators not only yield very small biases but also provide a close match between the sample standard deviations and their corresponding asymptotic versions. This leads to a close approximation of the empirical coverage of the 95% confidence intervals to the nominal level. It is worth pointing out that although we implemented an efficient estimator by adopting a normal working model for X, and a non-efficient estimator by using a uniform working model for X, the estimation variability of the two estimators is very close. In other words, the method appears to have a certain robustness to the working model: in addition to retaining consistency as our theory promises, it also seems to remain efficient regardless of the working model. The latter property is beyond our expectation, and whether it is a universal phenomenon with a theoretical explanation deserves further investigation.
To further illustrate the generality of the results derived in this paper, we perform a fourth set of simulation studies concerning a Poisson model. We generate the count response variable Y with mean exp{βx + θ(γ̃^T z)} and generate X from N(0, 1.1^2). We set β = 1.1, θ(γ̃^T z) = −0.4 cos(2.75 γ̃^T z − 1.0) and allow substantial measurement error, σ_u = 0.8. Following (2.5), we directly posit E*(X | δ) = δ^2 and E*(X | δ) = δ sin(δ) for E(X | δ). We experimented with dimensions of z ranging from 2 to 11, where z contains both continuous and discrete components. Simulation results are summarized in Table 4. The consistency of our estimator, regardless of whether the posited models are correct, as well as the superiority of our method in contrast with the comparison methods, are clear from these results.
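The fourth simulation setting can be reproduced in outline as follows. The sketch below generates data as described and fits only the naive Poisson regression on the error-prone w (via our own Newton-Raphson loop, with an illustrative index vector γ̃), to exhibit the attenuation that the proposed method corrects; it is not an implementation of the proposed estimator.

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta_true, sigma_u = 4000, 1.1, 0.8
gamma = np.array([1.0, 0.7])                    # illustrative index vector

x = rng.normal(0.0, 1.1, n)
z = rng.uniform(-1.0, 1.0, (n, 2))
theta = -0.4 * np.cos(2.75 * (z @ gamma) - 1.0)
y = rng.poisson(np.exp(beta_true * x + theta))  # count response
w = x + rng.normal(0.0, sigma_u, n)             # error-prone surrogate

# Naive Poisson regression of y on w (intercept + slope) by Newton-Raphson;
# ignoring the measurement error attenuates the slope towards zero.
X = np.column_stack([np.ones(n), w])
coef = np.array([np.log(y.mean()), 0.0])
for _ in range(25):
    mu = np.exp(X @ coef)
    coef = coef + np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
beta_naive = coef[1]
```

With σ_u = 0.8 and Var(X) = 1.21, the naive slope concentrates around βσ_x^2/(σ_x^2 + σ_u^2) ≈ 0.72 rather than 1.1.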

Framingham heart study
We use our new methodology to analyse data from the Framingham Heart Study described in Section 1. The data set contains 1,126 male subjects. We use the occurrence of coronary heart disease as the response variable (Y), and systolic blood pressure, after subtracting 50 and taking a logarithm transformation, as the covariate measured with error (W); see Carroll et al. (2006), who used this transformation, so that W = X + U, where X is the transformed true systolic blood pressure. We include age, the logarithm of one plus the number of cigarettes smoked per day as reported by the subject, and metropolitan relative weight as confounders Z, with age chosen to be the leading component of Z. Metropolitan relative weight is defined as the percentage of desirable weight (the ratio of actual weight to desirable weight times 100). Desirable weight was derived from the 1959 Metropolitan Life Insurance Company tables (Metropolitan Life Insurance Company, 1959) by taking the midpoint of the weight range for the medium build at a specified height; see also Hubert et al. (1983).
We fit the model with systolic blood pressure in its original scale. With H(·) being the logistic distribution function, the final model is pr(Y = 1 | X, Z) = H{Xβ + θ(γ̃^T Z)}. Using the available repeated measurements of W, we obtained a measurement error standard deviation of 0.0745, and the Kolmogorov-Smirnov test for the normality of U yields a p-value of 0.701. We also include the qq-plot of the errors in Figure 1, which exhibits a linear pattern. Thus, we assume U has a centered normal distribution with standard deviation 0.0745.
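The error standard deviation can be recovered from replicate measurements, since half the variance of the within-pair differences estimates σ_u^2. The following sketch simulates replicates (the value 0.0745 and the distribution of X are used only to mimic the setting; with an estimated scale the Kolmogorov-Smirnov p-value is only approximate):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(6)
n, sigma_u = 1000, 0.0745

x = rng.normal(4.36, 0.2, n)               # transformed long-term SBP (illustrative values)
w1 = x + rng.normal(0.0, sigma_u, n)       # two replicate measurements of the same x
w2 = x + rng.normal(0.0, sigma_u, n)

d = (w1 - w2) / np.sqrt(2.0)               # ~ N(0, sigma_u^2) if the errors are normal
sigma_u_hat = d.std(ddof=1)                # replicate-based estimate of sigma_u
stat, p = kstest(d, "norm", args=(0.0, sigma_u_hat))
```

A large p (and a straight qq-plot of d) supports the normal error assumption used in the analysis.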
The semiparametric analysis of the Framingham data, as well as the results from the naive estimator and regression calibration estimators, are given in Table 5. Not unexpectedly given the context, all results confirm the significance of systolic blood pressure as a risk factor for heart disease.

Table 4: Results of Simulation 4 with p = 2 to 11. The true parameter is β = 1.1. The estimates ("est"), the sample standard errors ("sd"), the mean of the estimated standard errors and the empirical coverage of the 95% confidence interval ("%") of six different estimators are reported. The six estimators are the naive estimator ("Naive"), the regression calibration estimators with two working distributions of X ("RC-Nor" and "RC-Uni"), the oracle estimator ("Oracle"), and the local estimators with two posited forms of E(X | δ) ("Local 1" and "Local 2").

In addition, the two estimates from the two semiparametric methods, conducted under a normal and a uniform working model for the distribution of X respectively, are very close. The naive estimator is attenuated towards zero by approximately 25%. Neither the effect of the number of cigarettes smoked nor that of metropolitan relative weight is statistically significant. We also plot the estimated θ(γ̃^T z) as a function of γ̃^T z, together with the 95% pointwise confidence bands, in Figure 2 for both semiparametric methods; we can see a general trend of increasing risk with increasing age.

Discussion
We have developed both estimation and inference tools to analyse the covariate effect when the covariate under study is measured with error and is also subject to confounding effects. The method is completely general, reflected in the generality of the main regression model. Specifically, we allow an arbitrary regression relation between the response variable and the covariate under study, and we do not require a specific parametric modeling strategy for the confounding effects.

Figure 2: The estimated θ(γ̃^T z) as a function of γ̃^T z in the Framingham data analysis. The vertical axis stands for θ(γ̃^T z) and the horizontal axis for γ̃^T z. In the left panels, γ is obtained with a normal working model on X; in the right panels, γ is obtained with a uniform working model on X. The plots in the lower panels contain the 95% confidence bands.
Our procedure does not require any model assumption on the unobservable covariate of interest, and the framework allows an arbitrary measurement error structure. In the special situation where the regression model has a generalized partially linear form and the measurement error is normal and additive, great simplification occurs (Ma and Tsiatis, 2006) and the estimation procedure degenerates to a backfitted or profiled version of the estimator given in Stefanski and Carroll (1987). We would like to point out that, to solve the estimating equations, one could choose to use either backfitting or profiling. In our construction of the estimator, these are only two ways of solving the estimating equations jointly. Upon convergence, the solutions from backfitting and profiling are identical: they are both roots of the estimating equations. This is very different from using backfitting versus profiling before the estimating equations are derived, where profiling or backfitting could result in different sets of estimating equations, and hence both the theoretical and empirical performance can differ. The latter issue is well studied in Van Keilegom and Carroll (2007). Likewise, the nonparametric estimation of θ(·) can also be carried out via splines, wavelets, etc., and research along these lines is certainly needed.

A.2. List of regularity conditions
1. The function θ(·) is twice differentiable and its second derivative is Lipschitz-continuous.
2. The density function of Z has a compact support and is positive on the support.
3. The matrices A and B defined in (3.1) and (3.2) are non-singular and their elements are bounded away from infinity.
4. The kernel function K(·) has compact support, is bounded on its support, and satisfies ∫K(x)dx = 1, ∫xK(x)dx = 0 and ∫x^2 K(x)dx > 0.
5. The bandwidth satisfies h = O(n^{−r}) for 1/8 < r < 1/2.

Condition 1 is a standard smoothness requirement on θ(·) needed for general nonparametric smoothing methods. Condition 2 requires the distribution of Z to be well behaved in order to avoid technical issues such as division by zero. This requirement can be slightly relaxed at the price of a more tedious technical treatment. Condition 3 ensures that the estimators of the parameters do not degenerate. Condition 4 requires the kernel function to be a usual compactly supported second order kernel. Condition 5 states the bandwidth requirement and illustrates that the method does not require undersmoothing.
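Condition 4 can be checked numerically for a standard choice such as the Epanechnikov kernel; a quick sketch (the grid size is arbitrary, and the sums equal the trapezoid rule here because the kernel vanishes at the endpoints):

```python
import numpy as np

# Epanechnikov kernel: a compactly supported second order kernel on [-1, 1].
u = np.linspace(-1.0, 1.0, 20001)
du = u[1] - u[0]
K = 0.75 * (1.0 - u**2)            # vanishes at the endpoints

m0 = np.sum(K) * du                # integral of K: should be 1
m1 = np.sum(u * K) * du            # first moment: should be 0 by symmetry
m2 = np.sum(u**2 * K) * du         # second moment: 2 * 0.75 * (1/3 - 1/5) = 0.2 > 0
```

Any kernel passing these three checks satisfies Condition 4, so the choice of kernel mainly affects constants rather than rates.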

A.3. Proof of Theorem 3.1
For notational simplicity, we define ζ = (β^T, γ^T)^T. Solving for α in (2.4) and expanding, where 0_β denotes a zero vector with the same length as β, and taking into account that E{L_θ(Y, W, S, Z; ζ, α) | z} = 0, we obtain an expansion of the local estimating equation. Plugging this expansion into (A.1), we obtain the expansion in Theorem 3.1. The subsequent results in Theorem 3.1 are straightforward to obtain, and their proofs are therefore omitted.

A.4. Proof of Theorem 3.2
The asymptotic expansion in Theorem 3.1 indicates that A^{−1} S_eff{Y, W, S, Z; ζ, θ(·)} is an influence function (Newey, 1990). To show the efficiency, we need to show that when f*_{X|S,Z}(x, s, z) = f_{X|S,Z}(x, s, z), S_eff is the residual of the orthogonal projection of S_ζ onto the nuisance tangent space, denoted Λ. Following Tsiatis and Ma (2004), the nuisance tangent space with respect to f_{X|S,Z}(x, s, z) is Λ_f = [E{a(X, S, Z) | Y, W, S, Z} : E(a) = 0], and L_ζ is the orthogonal projection of S_ζ onto Λ_f^⊥, the orthogonal complement of Λ_f. Taking the derivative of l*(β, γ, α, y, w, s, z) with respect to α and considering all possible α, we obtain the nuisance tangent space with respect to θ(·) as Λ_θ = {S_θ(Y, W, S, Z) a(γ̃^T Z)}.