Estimating multiple treatment eﬀects using two-phase semiparametric regression estimators

: We propose a semiparametric two-phase regression estimator with a semiparametric generalized propensity score estimator for estimating average treatment eﬀects in the presence of the ﬁrst-phase sampling. The proposed estimator can be easily extended to any number of treatments and does not rely on a prespeciﬁed form of the response or outcome func- tions. The proposed estimator is shown to reduce bias found in standard estimators that ignore the ﬁrst-phase sample design, and can have improved eﬃciency compared to the inverse propensity weighted estimators. Results from simulation studies and from an empirical study of NHANES are pre- sented.


Introduction
Timely comparative treatment analysis is useful for physician recommendations, patient awareness, regulatory agency assessments of benefit-risk profiles, and reimbursement agency cost effectiveness assessments. The use of observational data for such purposes has grown significantly in number of studies and importance [14,24]. In many observational databases subjects can be exposed to one or more treatment options and the treatments and study participation are self-selected. The database is often a sample from a complex survey, such as National Health and Nutrition Examination Survey (NHANES) data, a set of subjects who enroll in a particular insurance policy, a combination of clinical trials as in meta-analysis, or a set of subjects who receive care at centers that shares electronic medical records. Of particular interest due to simple interpretation and practical use in reimbursement are the average expected treatment differences for a population, termed population average treatment effects (ATE) in [8,15].
For estimating average treatment effects in observational data analysis, literature already contains several approaches including matching estimators [1,12,23], inverse probability weighted (IPW) estimators ( [9]; [11] for the case of two treatments; [4] in the context of multiple treatments), and doubly robust estimators [2,17,18,26] that tend to be combinations of IPW estimators and outcome regression models. A recent work by [27] considers estimation of treatment effects from two-phase samples, where their observational dataset is a simple random sample from a super-population, a validation sample is drawn using stratified Poisson sampling, and observations grouped by treatment indicators are results from a self-selection process. However, there is not much work done to study the impact of a general design used to obtain the observational data, and to derive the asymptotic results from incorporation of a general first-phase design. Ignoring the sampling design for the analysis dataset can lead to biased estimators of the average treatment effects and incorrect variance estimation. In the following, we quantify the bias due to ignoring the sample design and give a motivation example to emphasize the importance of the sample design.
In general, survey data can be viewed as the outcome of two processes: in the first process the values of random variables are generated for units in a finite population according to a model called the super population model, and in the second process a sample of units is drawn from the finite population according to a sample design, termed the first-phase sample. Analytic inference is made with respect to the super population model. When the sampling probability depends on an auxiliary variable z or the response variable y, the observed marginal sample likelihood of the response variable y can be altered from the super population likelihood where inference is being made. Therefore sample estimators that ignore first-phase design can be biased for the super population parameters. To quantify the bias, we use the results in [21]. For a random vector (y, z), the sample conditional probability density function (pdf) of y given z and the sample marginal pdf of z can be expressed through the super population pdf's as where f s (·) and f ξ (·) are the sample and super population pdf's, E s (·) and E ξ (·) denote the expectations under the sample and super population distributions respectively, and π is the sampling probability. In this paper, we are interested in estimating the marginal mean of y, denoted as θ = yf ξ (y|z)f ξ (z)dzdy. The marginal mean estimator that disregards the sampling design is θ s = yf s (y|z)f s (z)dzdy. Using equations (1.1) and (1.2), the bias in θ s can be quantified as If π is independent of (y, z), then the bias is zero. If π depends on auxiliary variable z only, then the bias is where µ(z) = E ξ (y|z). If π depends on y, which is called informative sampling, then the bias is Bias = E ξ π(y, z) E ξ (π(y, z)) − 1 y . (1.5) In practice, π often depends on auxiliary variables and possibly design variables used for the sample selection but not included in the outcome model under consideration. The probabilities π can depend on the outcome variable in the case of self-selection. Estimators that do not account for the selection effects in the inference can be seriously biased.
As an example of a case where the first-phase sample design is important, consider a finite population generated from a super population model y i = µ+ǫ i , where ǫ i is a random error variable with mean zero for subject i and y is the outcome of a treatment. Suppose subjects migrate after severe disease progression to larger hospitals with greater treatment options available. If subjects with severe disease progression are also less likely to respond to the treatment, this migration could generate clusters of subjects where subjects with homogeneous ǫ i values are together in larger hospitals. A study designer selects a cluster sample with probability proportion to the hospital size for convenience as more data can be obtained with fewer hospitals selected. Ignoring the sample design will lead to biases in both mean and variance estimation. An analyst might include disease severity in an outcome model as an auxiliary variable, but an estimator of the marginal distribution of disease severity is needed to estimate the marginal treatment mean. Other examples with details on the importance of accounting for the sampling design can be found in [19,21]. Due to the potential for biases, it is worthwhile to explore estimators that account for the first-phase sampling design.
In this paper, we propose a two-phase semiparametric regression estimator based on an argument in [3]. The term two-phase is used because we consider the sampling of the observational data as the first phase and subject treatment selection as the second phase. The term semiparametric is used because both the outcome model and treatment selection probabilities are estimated semiparametriclly. The key advantage of our estimator is the incorporation of the first phase sampling, similarly as in [27], thus correcting the biases in estimators that disregard the first phase design information in the ATE estimation. The paper derives asymptotic results for the proposed estimators obtained from incorporating a general first-phase design and including semiparametric estimators of the self-selection probability and outcome models. Moreover, by viewing the problem as a two-phase sampling problem, the method can be readily extended to multiple sampling phases. This extension is useful because the analysis dataset can be a subset selected from a larger sample of the finite population. This case covers the common situation where detailed treatment and outcome data is available for only a subsample of the data such as in a subsample with medical chart adjudication of claims records or a subsample constructed by merging multiple sources of data like claims records and electronic medical records. The proposed estimator that is designed to handle multiple treatments does not require strong model specification as in fully parametric solution and permits incorporating covariate information through regression.
The paper is organized as below. Section 2 introduces the proposed twophase semiparametric regression estimators and their asymptotic properties. Two simulation studies are presented in Section 3 to compare the proposed estimators to other commonly used estimators. Section 4 contains two examples to illustrate the use of our approach. Section 5 discusses the caveats of using the estimator and possible extensions.

Proposed two-phase semiparametric regression estimators
In this section, we introduce our two-phase semiparametric regression estimators. Section 2.1 builds the framework and discusses the motivation of the estimators, and Section 2.2 contains theoretical results for asymptotic consistency and normality of the proposed estimators.

Basic set-up and the proposed estimator
Let U be a finite population containing (y i , z i ), where i = 1, . . . , N indexes a subject, z i is a set of covariate variables, and y i = [y i1 , . . . , y iG ] T is a vector of potential outcomes for G different treatments. Consider (y i , z i ), i = 1, . . . , N, to be i.i.d. realizations from a superpopulation regression model where ǫ ig are independent random variables with mean zero and variance ν g (z i ) and µ zg (·) is a smooth function. Let A 1 with size n index a first phase sample selected from U under a design p 1 (·) with π 1i as the first order inclusion probabilities, and let A 2g (g = 1, . . . , G) be a collection of disjoint second-phase sample indices that partition the first-phase sample into the G treatment groups.
The partitioning can be viewed as a multinomial extension of Poisson sampling with probabilities π 2ig (on observables) for subject i where δ 2ig is the indicator variable of subject i selecting treatment g, G g=1 δ 2ig = 1, for any i, and δ 2ig is independent of δ 2jh for any subjects i = j and any treatments g and h. The self-selection probabilities π 2ig can be impacted by physician/patient preferences and reimbursement guidelines, and are estimated using the sieve estimation approach of [4]. The z i are assumed to be observed in A 1 and y ig is observed only in A 2g , which is different from [27]where the observed outcome, treatment indicators and covariates are assumed to be available in the population level.
If the outcome model µ zg (z i ) and the selection probability model π 2ig were known, a two-phase regression estimator of the finite population meanȳ N g = N −1 i∈U y ig is Estimator (2.2) is a two-phase sampling extension of the design unbiased difference estimator proposed by [3,22], and it is usually more efficient relative to the IPW estimator N −1 i∈A2g π −1 1i π −1 2ig y ig when y ig is correlated with z i [22]. In the following, the methods used for estimating the selection probability π 2ig and the outcome model µ zg (z i ) will be discussed.
We adopt the method in [4] to estimate π 2ig . Let {r K (z i )} ∞ k=1 be a sequence of known approximating functions, and assume that π 2ig can be approximated by . . , r K (z i )] and γ g,K is the real-valued coefficients of R K (z i ) for the g-th treatment selection. Let an estimator of the K × G matrix γ K = [γ 1,K , γ 2,K , . . . , γ G,K ] bê where 0 K represents a K × 1 vector zeros used to constrain the sum to 1. The estimated probabilities arê This solution is that of multinomial logistic regression. Condition B in the Appendix specifies assumptions about R K (z i ), π 2ig and K to ensureπ 2ig converges to π 2ig fast enough. Choices for the r K (z i ) include power series, spline, and kernel expansions. We propose estimating the g-th outcome model µ zg (z i ) with a semiparametric regression estimator using the base R K (z i ) as in (2.3). The benefit is that the estimator has a semiparametric specification for both the probabilities and the mean functions. Let µ zg (z i ) be the predicted values for all i in A 1 , and the regression is fit with elements indexed in A 2g , where R K (z i ) includes the intercept through the entire paper. Combining (2.2), (2.3) and (2.4), our two-phase semiparametric regression estimator for g-th marginal treatment mean is , for any g = 1, . . . , G. (2.6)

The central limit theorem of θ g
The asymptotic consistency and normality of θ g are established in Theorem 1 on the finite population level, and in Corollary 1 on the super-population level.
For the design properties, we use the traditional finite population asymptotic framework, in which the population U and the designs are embedded into a sequence of such populations index by F N with N → ∞. The o p (·) and → notations below are with respect to this sequence of populations and designs, see [16].
Theorem 1. Under the regularity conditions in the Appendix, Two key steps in the proof (details in the Appendix) are to show Combining (2.12) and (2.13) gives This leads to the asymptotic results in Theorem 1.
Remark 1. The result in Theorem 1 holds so long asμ zg (z i ) is consistent for some quantity that does not necessarily need to be µ zg (z i ), but the efficiency s. However, the proof used to show the consistency in (i) of Theorem 1 does not require the consistency ofμ zg (z i ) to µ zg (z i ).

Remark 2.
Our estimator performs better in terms of bias than the commonly used naive IPW estimator that ignores the first phase design, θ na−ipw To quantify the bias, write Taking an expectation gives the asymptotic bias of θ na−ipw g as The magnitude of the bias depends on the correlation between the first-phase inclusion probabilities, π 1i , and the error in the outcome model implied by the naive IPW estimator ignoring the first-phase. Our estimator can gain efficiency relative to the IPW estimator that incorporates the first phase sampling, To see this, we assume R k (z) = z for a univariate covariate z without loss of generality. Our estimator θ g can be written as Our θ g has a smaller variance of the linearized term than θ ipw g when the condi- holds. This condition will often hold when y ig and z i are correlated and the outcome model is approximately correctly specified. Simulation studies in Section 3 illustrate cases where this efficiency gain occurs. This indicates that a combination of regression and use of estimated propensity scores can give further improvement than using estimated propensity scores alone, which is noted by several authors including [15,25].
Remark 3. When a subset of z i , called x i , is available on the population level, estimator θ g can be easily extended to incorporate this additional information.
For example, this case can occur when there are some demographic variables available in the frame. Let µ xg (x i ) for i ∈ U denote the predicted values for the model relating y ig to the x i . The extended three-phase estimator is The asymptotic properties of θ g,p and its variance estimation are given in Appendix C, where it is shown that the asymptotic variance of θ g,p , denoted by Comparing (2.18) to the asymptotic variance of θ g which can also be expressed as whereȳ 1π,g = N −1 i∈A1 y ig . It can be seen that θ g,p is usually more efficient than θ g when y ig is correlated with x i . The efficiency gain occurs because the second term in (2.18), V {ā 1π,g |F N }, is likely smaller than the second term in (2.19), and π 1ij is the joint inclusion probability in the first phase. Assuming the SRS design is used, yg and S 2 ag are the variances of y ig and a ig . S 2 yg tends to be larger than S 2 ag if y ig can be well approximated by If only control totals are known for the population, a linear regression model can be used to estimate µ xg (x i ). The estimator θ g,p in (2.17) can then be written as (2.20) It is worth noting that θ g takes a similar form to the Simple Doubly Robust (SDR) estimator in [27], if assuming both the outcome and self-selection models are correctly specified in their set-up. The differences are that we use the known population size N and estimate π 2ig and µ zg (z i ) semiparametrically, while they use i∈A1 π −1 1i in place of N and estimate π 2ig and µ zg (z i ) parametrically. The distinction between using parametric and semiparametric estimation arises in the asymptotic results. The SDR will suffer efficiency loss if one of π 2ig and µ zg (z i ) models is wrong. The similarity between their SDR and our θ g is not surprising since the SDR does not use non-validation (population level) data and we do not have population level data to use. If the first-phase is simple random sampling and the covariate is known for the whole population, then our estimator devolves into the estimator from [4], shown to be semiparametric efficient.
While Theorem 1 shows conditional convergence together for θ g andȳ N g , the goal typically is to make inference for g-th marginal treatment mean on the superpopulation level. The following corollary extends the results of θ g on the finite population level to the superpopulation level with a sketch of the proof in the Appendix.
realizations from the super-population model (2.1), then under the conditions in the Appendix , V 1g and V 2g are the same as in (2.7) and (2.8), and E ξ (·) and V ξ (·) here are with respect to the randomness on the superpopulation.
In order to make inference, we next propose a variance estimator V ( θ g ) and prove its consistency in Theorem 2. An estimator of V 1g in (2.7) iŝ andê ig is calculated the same way asǫ ig in (2.23) andπ 2ij,g =π 2igπ2jg if i = j andπ 2ij,g =π 2ig if i = j. An estimator of σ 2 g iŝ (2.29) Combining (2.22), (2.24) and (2.29), the variance estimator for the asymptotic variance in (2.21) is (2.30) The following theorem gives the consistency of V ( θ g ) and the central limit theory using V ( θ g ).

Theorem 2.
Under the conditions in the Appendix, A short proof is provided in Appendix B.
To obtain the inference for treatment effects and other functions of treatment means, we need to estimate λ T θ * , where λ is any real-valued vector and θ * = [θ * 1 , . . . , θ * g ] T is the vector of marginal treatment means from the superpopulation model. As an example, an average treatment effect θ * The same proof in Appendix A can be directly applied to show the asymptotic consistency and normality of θ. If we denote the cell (g, h) of a matrix M by [M ] (g,h) and definê β z = [β z1 , . . . ,β zG ], the variance estimator for θ can be expressed similarly as for g, h = 1, . . . , G, for g, h = 1, . . . , G and g = h, and the diagonal cells [Σ] (g,g) are the same asσ 2 g in (2.29). The similar arguments of Theorem 2 can be used to show the consistency of this estimator. The central limit theorem for any linear combination estimator λ T θ follows immediately.

Simulation study
In this section, we provide two simulation examples to illustrate the performance of our two-phase semiparametric regression estimators of average treatment effects.
In both examples, we consider three treatment levels and population and sample sizes (N, n) = (12500, 250), (25000, 500) and (50000, 1000) to illustrate convergence. These simulations are intended to demonstrate that in two-phase sampling problems, ignoring the first-phase and handling only treatment selection can lead to erroneous conclusions. The simulations will also show there are potential efficiency gains by incorporating population control data, which is often ignored in treatment comparison studies. The first phase designs chosen for the two examples are stratified and probability proportional to size sampling, which are two commonly used designs for data selection. Example 1. We specify the simulation set-up as follows.
Example 2. The second set-up is as follows.
(1) Covariates: (2) Outcome models: 3) where I (·) is an indicator function, s i = z 1i + 5 and e ig ∼ N (0, 1). Under this setup, the marginal means are E(y i1 ) = E(y i2 ) = E(y i3 ) = 5. (3) First phase sampling: Poisson sampling with probability-proportional-to-size (PPS), where the size variable is s i . So π 1i = ( i∈U s i ) −1 ns i , and n is the expected sample size. The joint inclusion probability π 1ij = π 1i π 1j due to independence of the Poisson sampling. (4) Second phase selection: where Φ(·) is the CDF of N (0, 1), and (φ 0g , φ 1g , φ 2g , φ 3g ) is (0.1, 0.1, −0.1, 0.1) for g = 1, is (0.2, 0.2, −0.2, 0.2) for g = 2 and is (0, 0, 0, 0) for g = 3. In this example, we assumed (z 1i , z 3i ) were observed in A 1 and used for estimating π 2ig , while the true functional form of π 2ig depends on (z 2i , z 3i ) where z 2i is z 1i contaminated with noise η i . The second example is of greater complexity than the first example and includes an optimal first-phase design in terms of anticipated variance (see [6]  for both examples will be discussed next. The estimated coefficient of T rt2 i is the estimated treatment effect of θ 2 − θ 1 and the estimated coefficient of T rt3 i is the estimated treatment effect of θ 3 − θ 1 . Note that the covariates related to the first phase sampling, H i in example 1 and z 1i in example 2, are included in the regression analysis. 6. MT: A one-to-one matching estimator using an approach detailed in [1].
The matching was done based on the estimated propensity scoresπ 2ig , and the first phase sampling design weights are also included.
The NA-IPW, REG and MT are three commonly used estimators by practitioners, among which NA-IPW and REG ignore the first phase sampling design. In example 1, we used a cubic spline base of [z 1i , z 2i , z 3i ] for R K (z i ) and a cubic spline base of x i ≡ z 1i for R K (x i ) in estimation. For each variable, 10 knots were identified with locations corresponding to 10 equally spaced quantiles of the corresponding observations. In example 2, a cubic spline base of z 1i with 18 knots and a cubic spline base of z 3i with 18 knots were used to construct R K (z i ), and a cubic spline bases of x 1 ≡ z 1i with 18 knots was used to construct R K (x i ). The locations of the knots were chosen such that the first one third (or the last one third) of the knots are uniformly spread between 0 and 20 th (or 80 th and 100 th ) quantiles of the data for the corresponding variables, and the remaining one third were equally spaced between 20 th and 80 th quantiles.
Tables 1 (a) and (b) present the MC biases, variances, and mean squared errors (MSE) of the estimated treatment effects using the six estimators for each (N, n) combination and for example respectively. The NA-IPW and REG estimators as expected are highly biased in both examples due to ignoring the relationship between the first-phase design and the treatment effects. The matching estimator MT using the first phase design weights does reduce biases, compared to the NA-IPW and REG, but the IPW performs better than the MT in terms of the MSE in most of the cases. Although the IPW is consistent and has the same asymptotic efficiency as our two-phase semiparametric regression estimator (TPR1), the MC biases and variances of the IPW are greater than those of TPR1 in both examples. The MC biases and variances of the IPW though decrease when the sample size increases. The variance reduction of TPR1 over the IPW estimator indicates that gains for finite samples can be made by combining propensity and outcome regression when both models are well approximated semiparametrically. Both of our proposed estimators (TPR1 and TPR2) have similar low MC biases and much smaller MC variances and MSE relative to other estimators considered. TPR2 is more efficient than TPR1 due to the use of additional information on the population level. Table 1 The MC biases, variances and MSEs of the estimated treatment effects, for (1)  In example 1, the order of the true marginal treatment means is T rt1 < T rt2 < T rt3 and our proposed two estimators, TPR1 and TPR2, and the IPW estimators estimated the treatment effect order correctly. However, if the first phase sampling is ignored, the estimates from the NA-IPW and REG reverse the order of the estimated treatment means completely. In example 2 where all treatments are marginally equivalent, the NA-IPW and REG estimate a decreasing order of treatment efficacy. These simulation results show that ignoring the first phase design can result in a serious bias in the ATE estimation. Table 2 The coverage probabilities of the 95% C.I. for estimated treatment effects, for (1)  Tables 2 (a) and (b) report the coverage probabilities of the 95% confidence interval (C.I.) for the average treatment effects. For each MC sample and each (N, n), we computed the point estimator θ and the variance estimator of θ, and constructed the 95% C.I. for the pair differences. Variance estimation for the DE is similar to (2.30) with V 2g replaced by N −2 i∈A2g j∈A2g (π 1ijπ2ij ) −1 ∆ 1ij (π −1 1i y ig )(π −1 1j y jg ). Variance estimation for the NA-IPW was done by noting the NA-IPW estimator as a special case of the IPW estimator with assumed simple random sampling in the first phase. The estimated variance of the REG and the MT are provided by the regression and matching packages used in R. Note that the variance estimators for the estimators ignoring the first phase probabilities are not appropriate and can be biased under the full design. In both examples, estimators NA-IPW and REG have very poor coverage probabilities due to the large biases. Estimators IPW and MT that do not ignore the first phase sampling underestimate coverage probabilities in both examples. Our two estimators, TPR1 and TPR2, give satisfactory coverage probabilities in both examples even for a small sample size, relative to the nominal size 0.95.

Empirical study
In this section, we evaluate empirical performance of our proposed estimators in estimating treatment effects using a subsample from the 2005-2006 NHANES Survey. The goal of this empirical analysis is to assess the effect of nutrition label use (treatments) on body mass index (BMI) as in [5]. The NHANES is a study designed to assess the health and nutritional status of adults and children in the United States and is unique in that it combines interviews and physical examinations. A detailed description of the survey can be found at http://www.cdc.gov/nchs/nhanes.htm. The nutrition label use variable has three levels: level 1 = often, level 2 = sometimes, and level 3 = seldom. The study variable y is the BMI calculated from body weight and height. Covariates included were selected from [5]. In general, the covariates were classified into five categories: demographic, risky behavior, lifestyle, knowledge and health situation. There are totally 36 covariates and most of them are dummy variables, see detailed description in [5]. The analysis dataset contains n = 1775 subjects from the NHANES survey data.
NHANES uses a complex multistage probability sampling design, and the weights, i.e. π −1 1i , are created to account for the complex survey design, survey non-response, and post-stratification. The same set of estimators (except for TPR2) evaluated in the simulation section were computed using this dataset. Since most of the covariate are dummy variables, the base used for estimatinĝ π 2ig and the outcome regression model is a vector simply containing all individual covariates. In the REG regression estimator, the explanatory variables are an intercept, treatment indicators and all the covariates. In addition, variance estimation is carried out for all estimators. Due to confidentiality issues, Mashed Variance Units (MVUs) were created and attached to the NHANES data files. The NHANES website provides an R code instruction to produce variance estimates using the MVUs. This R code was embedded into our main codes to calculate the components that are related to the first phase variance estimator in equations (2.25)-(2.28).
The estimated treatment effects, the standard errors and the 95% C.I.s are reported in Table 3. For the two estimators that incorporate the first phase design, TPR1 suggests that the estimated treatment mean of the BMI monotonically increases when the nutritional label use changes from "often" to "seldom", while the IPW and MT estimators give an increasing trend from "often" to Table 3 Results from the empirical study of nutrition label uses, including estimated treatment effects for three nutrition label use levels, their standard errors and their 95% C.I.s. "sometimes", but decreasing trend from "sometimes" to "seldom". A monotonic decreasing trend is present in NA-IPW and REG, leading to the strange conclusion that increasing nutritional label awareness increases BMI. Researchers generating hypotheses using results from NA-IPW and REG method could be led astray by not completely controlling for the full treatment group inclusion probabilities.

Conclusion and remarks
Much of the focus of observational study analysis has been on incorporating treatment selection into estimators to reduce bias due to self selection. Ignoring the first-phase sample design can have large implications for the interpretation of data. Accounting for the first-phase sample design reduces the bias and makes the target of estimation explicit. By incorporating auxiliary variables, the proposed two-phase semiparametric regression estimators are an improvement over the IPW estimators in finite sample problems. The assumptions for the two-phase regression estimators are reasonable for a large number of problems and we demonstrate that valid inference can be made with semiparametric model specificiations. However, these estimators only account for bias that can be explained by observed covariates. If the second-phase inclusion probabilities depend on unobserved variables, residual bias will exist. Further, the IPW and two-phase semiparametric regression estimators rely on a known first-phase design. In some cases, the first-order inclusion probabilities may need to be estimated and a design such as Poisson sampling is assumed. In summary, consideration of handling sample selection phases prior to treatment selection and auxiliary variables can lead to stronger and clearer evidence from observational studies. Estimating treatment effect parameters defined through a general estimation equation in observational studies is a topic for future research.
Condition A.
(1) For all g, δ 2ig is independent of y i , given the variable z i ; (2) z i is distributed with density bounded away from zero on its compact support Z; (3) For all g, V (y ig |z i ) is uniformly bounded for all z i ∈ Z; (4) For all g, π 2ig is bounded away from zero and one. And there exist positive constant C 1 and C 2 such that C 1 < n −1 N π 1i < C 2 .
is bounded away from zero uniformly in K; (2) There exists a sequence of constants ξ(K) such that R K (z) ≤ ξ(K) for any K; (3) For all g, π 2ig (z) and µ g (z) = E[y ig |z] are s-time differentiable with sd −1 Condition C.
(1) the limiting design covariance matrix: nV (ū 1π ) → Σ 1 a.s. and nV (ū 2π,g |A 1 ) → Σ 2g a.s., where Σ 1 and Σ 2g are positive definite; (2) the normalized HT estimators satisfy central limit theorems: The super-population parameter of interest is not identifiable from the data on { G g=1 y ig δ 2ig , z i } n i=1 . Following the literature, we consider missing at random assumption in (A.1) to achieve identification. Condition B is general. But particularly, if R K (z i ) is the power series or the spline series, (B.1) and (B.2) are satisfied automatically with η = 1 for the power series and η = 0.5 for the spline series. Condition C gives the design properties of the Horvitz and Thompson [13] estimators on both phases in the traditional finite population asymptotic framework. For any variable u with finite 4 th moment, defineū 1π = N −1 i∈A1 π −1 1i u i , andū 2π,g = N −1 i∈A2g (π 1i π 2ig ) −1 u i , and their variance and variance estimators as Condition C are satisfied for many commonly designs in reasonably behaved finite populations. Note that (C.3) would not hold for systematic sampling or oneper-stratum designs.

A: Proof of Theorem 1 and Corollary 1
Proof of Theorem 1.
Proof of Corollary 1. We can decompose θ g − θ * g = θ g −ȳ N g +ȳ N g − θ * g . Then the asymptotic results are immediate by using Theorem 1.3.6 of [6] again.