Bootstrap of means under stratified sampling

In a two-stage cluster sampling procedure, $n$ random populations are drawn independently from independent populations and a sub-sample of observations is taken in each of them. The estimator of the general mean of the observed variables is asymptotically Gaussian and the asymptotic distributions of several bootstrap versions of the normalized and studentized statistics are studied. A weighted population resampling provides a good approximation and its accuracy depends on the convergence rate of the sample size of the populations.


Introduction
In classical stratified sampling, a population is split into a fixed number L of strata and a subsample is observed in each of them. Various bootstrap estimators of the mean have been studied for strata of finite size [16,18] or infinite size [2,4]. In two-stage cluster sampling, the population consists of clusters of units: first some clusters are sampled, and then units are sampled within the selected clusters. Several two-stage bootstrap methods have also been studied in the case of a finite number of finite clusters drawn without replacement [16,18]. Here, we consider a two-stage cluster sampling with an infinite number of clusters. The general setting is the following: a general population is subdivided into a large number of independent populations (or clusters), not all of which can be observed. We consider the problem of estimating some parameters of the distribution function of a variable X on individuals in the general population, and we assume that the realizations of this variable in the different populations are independent and identically distributed. The observations of X on individuals within the same population are supposed to be independent and identically distributed conditionally on the selected population.
Here the parameters of interest are the means of X, the variance of the means of X in the different sub-populations and the variance of X, and we focus on the behavior of bootstrap estimators. The normalized and the studentized estimators of the mean are asymptotically Gaussian as the number of sampled populations tends to infinity, and the variances are consistently estimated. Three bootstrap procedures are studied:
(B1) sampling the populations with replacement in the set of the observed populations, then taking the original observed data of these sampled populations;
(B2) taking each of the observed populations and sampling the individuals with replacement in the set of the observed data within each population;
(B3) a two-stage bootstrap cluster sampling that combines the first two procedures: the populations are sampled with replacement from the set of the observed populations, then individuals are sampled with replacement within each sampled population.
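The three resampling schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are stored as a list of per-population lists, the function names `resample_b1`, `resample_b2` and `resample_b3` are hypothetical, and in B3 each redrawn sub-sample is assumed to have the size of the population it was drawn from.

```python
import random

def resample_b1(data, rng):
    """B1: draw K populations with replacement; keep their observed data unchanged."""
    K = len(data)
    return [list(data[rng.randrange(K)]) for _ in range(K)]

def resample_b2(data, rng):
    """B2: keep every population; redraw its n_k individuals with replacement."""
    return [rng.choices(sample, k=len(sample)) for sample in data]

def resample_b3(data, rng):
    """B3: two-stage scheme, B1 on populations followed by B2 within each draw."""
    return resample_b2(resample_b1(data, rng), rng)

rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
# Toy data with K = 3 populations of unequal sizes (illustrative values only).
data = [[1.0, 2.0, 3.0], [10.0, 11.0], [20.0, 21.0, 22.0, 23.0]]
b1 = resample_b1(data, rng)
b2 = resample_b2(data, rng)
b3 = resample_b3(data, rng)
```

Under B1 the total bootstrap sample size can differ from N because populations of different sizes may be drawn several times, whereas B2 always keeps the original sizes.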
An application and simulation of the B1 sampling for a large number of populations was presented in [11,12] for specific parameters in forestry. Bootstrap sampling with a large number of dependent variables looks like a stratified sampling, where individuals and variables may be drawn in the bootstrap procedure. Only the bootstrap resampling of individuals is relevant due to the loss of dependence between dependent variables of an individual when they are not jointly resampled, as in [13,14,15] with an application to medical data.
The asymptotic properties of the bootstrap estimator of the mean in the classical i.i.d. bootstrap were studied by many authors (e.g. [1,2,3,7,17]). They have been extended to some cases of independent but non-identically distributed variables and to functional results in [2,4,6,9,18]. We prove properties for some of the three bootstrap procedures when the number of strata and the sub-sampling sizes tend to infinity, under a condition on their respective rates. The results are quite different from those of [1,2,16,18] for means of fixed sub-populations with resampling of all the populations (B2 sampling scheme); equivalent results are established in Proposition 2.2 for the mean of the sub-population means. Though the bootstrap resampling scheme B1 is consistent for the mean parameters $\mu$ and $\mu_k$ of the general population and sub-populations, it is not consistent for the variances of their estimators $\hat\mu$ and $\hat\mu_k$, and other bootstrap estimators of the variances are necessary to obtain asymptotically normal estimators. With random populations, the bootstrap sampling scheme B2 gives consistent and asymptotically normal estimators for all the parameters with weighted bootstrap. Under a weighted bootstrap resampling scheme B1, the estimator of the global mean and its variance achieve the usual properties. Under sampling scheme B3, the bootstrap estimators of $\mu$ and $\mu_k$ and of the variances are not consistent. The rate of the bootstrap approximation is given for the weighted resampling scheme B1. As the variance of the estimators splits into within-population and between-population variances of different orders, the asymptotic results are quite different from the usual results.

Empirical estimation in a subdivided population
A variable X is observed according to the following two-stage cluster sampling: a sample of K populations is selected among a large number L of populations and, in the k-th population of unknown large size, we consider $n_k$ independent observations of the variable of interest, $X_{ki}$, $i \le n_k$, $k \le K$. Let $N = \sum_{k \le K} n_k$ be the total number of observations; we suppose that N and $n_k$, for each k, increase with K. The estimators for the global population are then equivalently indexed by $N = N_K$ or by K.
Conditionally on the sampling of the k-th population, the variables $X_{ki}$, $i \le n_k$, have the distribution function $F_k$, and the distribution of the variables $X_{ki}$ in the general population is F, which includes the distribution functions of the L subpopulations. Denote by E the expectation with respect to the sampling of the populations, by $E_k$ the conditional expectation in the k-th population, and by $\mu_k$ the conditional mean of the $X_{ki}$'s in the k-th population, so that $\mu = EX_{ki} = E\mu_k$.
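To make the sampling design concrete, the following sketch simulates a two-stage sample. It rests on assumptions that the text does not make: the population means are drawn from a Gaussian distribution with mean `mu` and variance `gamma`, the observations are Gaussian around their population mean with a common variance `sigma2`, and the function name `two_stage_sample` and all numerical values are hypothetical.

```python
import random

def two_stage_sample(K, sizes, mu, gamma, sigma2, rng):
    """Draw K random population means, then n_k observations within each population.

    Assumption (not from the text): Gaussian means and Gaussian observations
    with the same conditional variance sigma2 in every population.
    """
    data = []
    for k in range(K):
        mu_k = rng.gauss(mu, gamma ** 0.5)  # stage 1: random population mean
        sample = [rng.gauss(mu_k, sigma2 ** 0.5) for _ in range(sizes[k])]  # stage 2
        data.append(sample)
    return data

rng = random.Random(1)                   # arbitrary seed for reproducibility
K = 50
sizes = [5 + (k % 3) for k in range(K)]  # unequal n_k, illustrative only
data = two_stage_sample(K, sizes, mu=2.0, gamma=1.0, sigma2=4.0, rng=rng)
N = sum(len(sample) for sample in data)  # total number of observations
```

Observations sharing a population share the same drawn mean, which is what makes them dependent in the general population while remaining independent conditionally on the population.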
If $E|X|^2 < \infty$, the variance of X is denoted V in the general population, and the random variance of the $X_{ki}$'s conditionally on the sampling of the k-th population is $V_k = E_k(X_{ki} - \mu_k)^2$. We also denote by $\gamma = E(\mu_k - \mu)^2$ the variance of the independent random variables $\mu_k$, and $\sigma^2 = EV_k$. The variance of the $X_{ki}$'s is

$$V = EV_k + E(\mu_k - \mu)^2 = \sigma^2 + \gamma.$$

The variables $X_{ki}$ and $X_{kj}$, $i \ne j$, are independent within the k-th population but they are dependent in the general population, with $\mathrm{Cov}(X_{ki}, X_{kj}) = \gamma$. The variables $X_{ki}$ and $X_{lj}$, $k \ne l$, are independent for any i and j. A random effect linear model having a nested error structure can be used to describe the data,

$$X_{ki} = \mu + (\mu_k - \mu) + (X_{ki} - \mu_k).$$

Within the k-th population, $\mu_k$ and $V_k$ are unbiasedly estimated by

$$\hat\mu_k = n_k^{-1}\sum_{i=1}^{n_k} X_{ki}, \qquad \hat V_k = (n_k - 1)^{-1}\sum_{i=1}^{n_k}(X_{ki} - \hat\mu_k)^2.$$

For the global population, $\mu$ is estimated by the empirical mean $\hat\mu_N$ of the X's. Several bootstrap estimators of $\mu$ will be proved to converge to the empirical estimator $\hat\mu'_K$ of the mean of the intra-population means. For the general mean, the estimators are denoted

$$\hat\mu_N = N^{-1}\sum_{k=1}^{K}\sum_{i=1}^{n_k} X_{ki} = N^{-1}\sum_{k=1}^{K} n_k\hat\mu_k, \qquad \hat\mu'_K = K^{-1}\sum_{k=1}^{K}\hat\mu_k.$$

Note that with random populations, both $\hat\mu_N$ and $\hat\mu'_K$ have the expectation $\mu$ since $E\hat\mu_k = \mu$ for every k. This would not hold with fixed populations. We have

The variance terms $\sigma^2$, V and $\gamma$ of the variable X are estimated by

The variances of the estimators are

$$\mathrm{Var}\,\hat\mu_N = N^{-1}\sigma^2 + n^*\gamma, \qquad \mathrm{Var}\,\hat\mu'_K = (K\tilde n)^{-1}\sigma^2 + K^{-1}\gamma,$$

where $EV_k$ satisfies (1), $\tilde n$ is the harmonic mean of the $n_k$'s, with $\tilde n^{-1} = K^{-1}\sum_k n_k^{-1} < 1$, and $n^* = N^{-2}\sum_k n_k^2 \le N^{-1}\max_{k \le K} n_k$, with $n^* = K^{-1}$ when $n_k$ is constant for every k, in which case the estimators are identical. The variance of $\hat\mu'_K$ is smaller than the variance of $\hat\mu_N$ when $\gamma\sigma^{-2} > \{(\tilde n K)^{-1} - N^{-1}\}\{n^* - K^{-1}\}^{-1}$, which may happen even with K = 2. If $n_k = c_k K^\alpha$ with $0 < \alpha < 1$ and bounded constants $c_k$ for every k, then $S_N < S'_K$. The variance of both estimators splits into a within-populations variance and a between-populations variance, $\mathrm{Var}_{intra}\,\hat\mu_N = N^{-1}\sigma^2$ and $\mathrm{Var}_{inter}\,\hat\mu_N = n^*\gamma$.
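As a numerical illustration of the two estimators of the general mean and of the within and between decomposition of their variances, here is a minimal sketch. The function names `mean_estimators` and `theoretical_variances` are hypothetical, the variance expressions are the ones implied by the within term $N^{-1}\sigma^2$, the between term $n^*\gamma$ and their analogues $\sigma^2(K\tilde n)^{-1}$ and $\gamma K^{-1}$ for the mean of means, and the numerical inputs are arbitrary.

```python
def mean_estimators(data):
    """Return (pooled mean of all observations, mean of the within-population means)."""
    N = sum(len(sample) for sample in data)
    pooled = sum(sum(sample) for sample in data) / N
    means = [sum(sample) / len(sample) for sample in data]
    return pooled, sum(means) / len(data)

def theoretical_variances(sizes, sigma2, gamma):
    """Return (Var of pooled mean, Var of mean of means) for given n_k, sigma^2, gamma."""
    K, N = len(sizes), sum(sizes)
    n_star = sum(n * n for n in sizes) / N ** 2      # n* = N^-2 * sum of n_k^2
    inv_n_tilde = sum(1.0 / n for n in sizes) / K    # inverse of the harmonic mean
    var_pooled = sigma2 / N + n_star * gamma         # within term + between term
    var_mean_of_means = sigma2 * inv_n_tilde / K + gamma / K
    return var_pooled, var_mean_of_means

# Toy data: two populations of sizes 3 and 2 (illustrative values only).
pooled, mean_of_means = mean_estimators([[1.0, 2.0, 3.0], [10.0, 12.0]])

# Equal sizes: both variances coincide. Unequal sizes: they differ.
equal = theoretical_variances([2, 2], sigma2=4.0, gamma=1.0)
unequal = theoretical_variances([1, 3], sigma2=4.0, gamma=1.0)
```

With equal sub-sample sizes the two variances agree, consistent with the remark that the estimators are then identical; with unequal sizes the comparison depends on the ratio of `gamma` to `sigma2`.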
By (1)-(8) and with $n_k = O(K^\alpha)$, asymptotically unbiased estimators of $\mathrm{Var}\,\hat\mu'_K$ and $\mathrm{Var}\,\hat\mu_N$ are

Estimators of the within and between-populations variances follow as

Under a convergence rate for the sub-sample sizes, the estimators are all consistent, and the variances of order $K^{-1}$ are larger than the variance of $\hat\mu_N$. The studentized statistics, as well as the normalized statistics, converge in distribution:

Proof. $\hat\mu_N - \mu$ and $\hat\mu'_K - \mu'$ are the means of the (weighted) independent variables having zero mean and variances of main order $n^*$ and $K^{-1}$ respectively. With (7) and (8)) are such that (1)). Then $N\,\mathrm{Var}\,\hat\mu_N < \infty$, which ensures the a.s. convergence of the estimator $\hat\mu_N$ by the Borel-Cantelli lemma.
The boundedness of the variances $S_N^2$ and $S_K'^2$ implies the following Lindeberg conditions, and the weak convergence of the normalized variables is a consequence of a CLT [5].
If we consider $\bar\mu_K = K^{-1}\sum_{k=1}^{K}\mu_k$, similar results hold for $\hat\mu'_K - \bar\mu_K$, with variance $\sigma^2(K\tilde n)^{-1} = \mathrm{Var}_{intra}\,\hat\mu'_K$, and for $\bar\mu_K - \mu$, with variance $\gamma K^{-1} = \mathrm{Var}_{inter}\,\hat\mu'_K$. The result in the proposition below is similar to the convergence property in a stratified population with fixed strata [4].

Proposition 2.2
Under the conditions of Proposition 2.1, $\hat\mu'_K - \bar\mu_K$ and $\bar\mu_K - \mu'$ converge in probability to zero,

Bootstrap estimation
The observed data set is denoted by X and we consider the three bootstrap procedures described in the introduction. For each of them, $E^*$ and $\mathrm{Var}^*$ denote the mean and variance under the bootstrap sampling distribution conditionally on X, without any distinction of the specific distribution. $\hat F_k$ is the empirical sub-distribution function of the k-th population.

Bootstrap sampling of individuals
We consider a bootstrap sample which consists of the set of sub-samples of size $n_k$ for the k-th observed population, $(X^*_{ki})_{i \le n_k}$, where the variables $X^*_{ki}$ have the distribution $\hat F_k$, $i \le n_k$, $k \le K$, and the K populations are independent and considered as fixed. The bootstrap versions of the previous estimators are written

$$\hat\mu^*_k = n_k^{-1}\sum_{i=1}^{n_k} X^*_{ki}, \qquad \hat\mu^*_N = N^{-1}\sum_{k=1}^{K}\sum_{i=1}^{n_k} X^*_{ki}, \qquad \hat\mu'^*_K = K^{-1}\sum_{k=1}^{K}\hat\mu^*_k.$$

The variances of $\hat\mu^*_N$ and $\hat\mu'^*_K$ reduce to intra-population variances given by

it is an unbiased bootstrap estimator of $V_k$, and unbiased bootstrap estimators of the variances of $\hat\mu^*_N$ and $\hat\mu'^*_K$ follow. Then bootstrapping on independent individuals only with fixed populations (resampling scheme B2), as in the classical stratified sampling, leads to estimators of $\mu$ having variances asymptotically equivalent to the within-populations variance of the estimators of $\mu$. All variables $X^*_{ki}$, $i \le n_k$, $k \le K$, are independent conditionally on the sample X, and $\hat\mu^*_N - \hat\mu_N$ is the mean of the N independent centered variables $(X^*_{ki} - \hat\mu_k)$. The next convergence results are then proved as Proposition 2.1.

Proposition 3.1 If $E|X|^{2+\delta} < \infty$ for some $\delta > 0$ and $n_k$ is of order $K^\alpha$, $0 < \alpha < 1/2$, for k = 1, . . . , K, then conditionally on X,

converge weakly to standard Gaussian variables.
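Because the populations are held fixed under scheme B2, the conditional variances of the bootstrap means can be computed exactly from the data, without Monte Carlo draws. The sketch below does this under the assumption that each $X^*_{ki}$ is drawn uniformly from the $n_k$ observed values of population k, so that its conditional variance is the empirical variance with divisor $n_k$; the function name `b2_bootstrap_variances` is hypothetical.

```python
def b2_bootstrap_variances(data):
    """Exact conditional variances of the pooled bootstrap mean and of the
    bootstrap mean of means when individuals are redrawn within fixed populations."""
    K = len(data)
    N = sum(len(sample) for sample in data)
    var_pooled = 0.0
    var_mean_of_means = 0.0
    for sample in data:
        n = len(sample)
        m = sum(sample) / n
        # Variance of one draw from the empirical distribution (divisor n, not n - 1).
        v_emp = sum((x - m) ** 2 for x in sample) / n
        var_pooled += n * v_emp / N ** 2            # n_k independent draws, scaled by N^-2
        var_mean_of_means += v_emp / (n * K ** 2)   # variance of a mean of n_k draws, scaled by K^-2
    return var_pooled, var_mean_of_means

# Toy data: two populations of sizes 3 and 2 (illustrative values only).
v_pooled, v_mean_of_means = b2_bootstrap_variances([[1.0, 2.0, 3.0], [10.0, 12.0]])
```

Both quantities involve only within-population dispersion, which illustrates why this scheme captures the within-populations part of the variance and not the between-populations part.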

Bootstrap sampling of the populations
A sample of K populations is drawn uniformly from the set of observed populations, each of them having the probability $K^{-1}$, and the variables $X^*_{ki}$ are defined as the observed values of the variable X in these sampled populations, according to the sampling scheme B1. We get $\hat\mu^*_k = \hat\mu_l$ with probability $K^{-1}$, for every $k, l \le K$, and $E^*\hat\mu^*_k = \hat\mu'_K$ and $E^*\hat\mu^{*2}_k = K^{-1}\sum_{l \le K}\hat\mu_l^2$ for every k. It follows that both $\hat\mu'^*_K$ and $\hat\mu^*_N$ have the bootstrap mean $\hat\mu'_K$ and the bootstrap variances are

The usual bootstrap estimator of the variance of $\hat\mu'^*_K$ is

the bootstrap variance of the $\hat\mu^*_k$'s, it is a strongly consistent bootstrap estimator and $E^*S^{*2}_K = (K - 1)K^{-1}\,\mathrm{Var}\,\hat\mu'_K$. We now get bootstrap estimators of $\mu$ having a variance asymptotically equivalent to $K^{-1}\,\mathrm{Var}\,\hat\mu'_K$ as $K \to \infty$, and the asymptotic behavior of $\hat\mu'^*_K$ is similar to that of $\hat\mu'_K$.

Proposition 3.2 Under the conditions of Proposition 3.4 and conditionally on X, $K^{1/2}S^{*-1}_K(\hat\mu'^*_K - \hat\mu'_K)$ converges weakly to a standard Gaussian variable.

As the uniform population sampling is not relevant for $\hat\mu_N$, let us consider a sample of K populations drawn with probabilities $n_k N^{-1}$ for population k = 1, . . . , K, and the variables $X^*_{ki}$ are defined as the observed values of the variable X in these sampled populations. The means of the bootstrap estimators are $E^*\hat\mu^*_k = E^*\hat\mu^*_N = \hat\mu_N$, and for l = 1, . . . , K their variances are

An unbiased bootstrap estimator of the variance of $\hat\mu_N$ is

also an estimator of this variance: a weighted bootstrap variance of the $\hat\mu^*_k$'s.

Proposition 3.3 Under the conditions of Proposition 3.4 and conditionally on X, $S^{*-1}_N(\hat\mu^*_N - \hat\mu_N)$ converges weakly to a standard Gaussian variable.
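The difference between uniform and size-weighted population sampling can be checked by computing the exact bootstrap mean and variance of a single resampled population mean. The sketch below is illustrative only; the function name `population_bootstrap_moments` is hypothetical, and it covers one drawn population mean rather than the full bootstrap estimators.

```python
def population_bootstrap_moments(data, weighted):
    """Exact bootstrap mean and variance of one resampled population mean.

    weighted=False: population l is drawn with probability 1/K.
    weighted=True:  population l is drawn with probability n_l/N.
    """
    sizes = [len(sample) for sample in data]
    K, N = len(data), sum(sizes)
    means = [sum(sample) / n for sample, n in zip(data, sizes)]
    probs = [n / N for n in sizes] if weighted else [1.0 / K] * K
    first = sum(p * m for p, m in zip(probs, means))       # bootstrap expectation
    second = sum(p * m * m for p, m in zip(probs, means))  # bootstrap second moment
    return first, second - first ** 2

# Toy data: two populations of sizes 3 and 2 (illustrative values only).
toy = [[1.0, 2.0, 3.0], [10.0, 12.0]]
u_mean, u_var = population_bootstrap_moments(toy, weighted=False)
w_mean, w_var = population_bootstrap_moments(toy, weighted=True)
```

On this toy data the uniform scheme is centered at the mean of the population means, while the weighted scheme is centered at the pooled mean of all observations, which is the reason for weighting when the target is the pooled estimator.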

Cluster bootstrap sampling
For the B3 bootstrap sampling, K independent populations are drawn uniformly from the set of observed populations and the bootstrap sample consists of sub-samples of size n l −1 and distribution $\hat F_l$ if the k-th bootstrap population is the l-th observed population. Denote by $Y^*_{li}$ bootstrap variables having the distribution $\hat F_l$ and by $X^*_{ki}$ bootstrap variables having the distribution $\hat F_l$ with probability $K^{-1}$, for every l, k = 1, . . . , K. The bootstrap sampling distribution is then

The behavior of $\hat\mu^*_N$ also depends on $\hat\mu'^2_K$ in this procedure, with V ar * µ * N = (n * − 1)K 2 V ar * µ ′ * K + ( µ ′2 K − µ 2 N ), and $\hat\mu^2_N$ cannot be simply estimated in this way. For $\hat\mu'_K$, the variance of the bootstrap estimator $\hat\mu'^*_K$ in the two-stage cluster procedure is equivalent to the sum of the total and within-populations variances of $\hat\mu_N$ as N and $K \to \infty$; it is therefore unsuitable, and we get

Proposition 3.4 Under the conditions of Proposition and conditionally on X,

converges weakly to a standard Gaussian variable.

Second order asymptotics
It appears that among the studied bootstrap procedures, only the second resampling scheme of B2 provides consistent estimators for $\mu$ and the variances of the estimators. Further expansions prove that the bootstrap estimator of $\mu$ satisfies the classical properties of bootstrap estimators in the independent case [1,2,8,10,17]. The variances are denoted $S^2_N = \mathrm{Var}\,\hat\mu_N$, $\hat S^2_N = \widehat{\mathrm{Var}}\,\hat\mu_N$ and $S^{2*}_N = \mathrm{Var}^*\hat\mu^*_N$.
Proposition 3.5 If $E|X|^3 < \infty$ and $n_k = c_k K^\alpha$, for some positive and bounded $c_k$ and $0 < \alpha < 1/2$, for k = 1, . . . , K, then the bootstrap estimators based on the population sampling procedure, with probability $n_k N^{-1}$ for population k, satisfy a.s.

lim sup

where C is the constant of the Berry-Esseen bound. If $E|X|^{6+\delta} < \infty$ for some $\delta > 0$ and the d.f. $F_k$ are continuous for $k \le K$, then a.s.

lim sup
Proof. The variances satisfy S 2 s. under convergence rate of the $n_k$'s. The bootstrap variance $S^{2*}_N$ is estimated by $S^{*2}_N$ in 3.1, and $S^{*2}_N = S^2_N + o(1)$ a.s. The Berry-Esseen theorem for a weighted sum of independent variables with varying distributions applies to both $\hat\mu_N - \mu$ and $\hat\mu^*_N - \hat\mu_N$. Using the expansion S −3

For the bootstrap estimator, a.s. conditionally on X

uniformly in x, and the first result follows. The second part of the proposition is a consequence of Edgeworth expansions of $\hat\mu_N - \mu$ and µ *

with $p(x) = 2x^2 + 1$. The expansion for the bootstrap estimator is

with $\hat S_N$ converging to $S_N$ and the sum of third moments converging to their expectation at the rate $N^{-1}$.

Discussion
The model studied in this paper is the model of means of a variable $X_i$ having a mixture distribution of L d.f. $F_1, \ldots, F_L$, defined as $F_k(x) = P\{X_i \le x \mid i \in P_k\}$, for the k-th sub-population $P_k$. The variable $X_i$ has the d.f. $F = \sum_{k=1}^{L} p_k F_k$, with $p_k = P\{i \in P_k\}$.
For a stratified population with a fixed number of strata, the authors of [2] proved the convergence of the bootstrap Studentized estimators under the condition that the subsample sizes $n_k$ are of the same order O(N). They proved in particular that the distribution function of (V ar µ * k ) −1/2 k E * ( µ * k − µ k ) 3 is an approximation of (V ar µ k ) −1/2 k E( µ k − µ k ) 3 . Their model is different from the random model considered in this paper under the slower convergence rate $n_k = O(K^\alpha)$, $0 < \alpha < 1/2$, and with weighted estimators. In Proposition 3.5, the order of the approximations is K 1/2+2α , which is stronger than the usual results.
Inversion of several expansions improves coverage probabilities and has been compared to bootstrap confidence intervals deduced from the approximation of the distribution of Studentized statistics; these numerical improvements were discussed in [1,8,10] and in many later papers. Here all the estimators are differentiable transformations of moment estimators that admit Edgeworth expansions, which provide a second order approximation of the distribution function of $S^{-1}_N(\hat\mu_N - \mu)$ by inversion. However, the order $O(K^{-(\alpha+1/2)})$, coming from the expression of $S_N$, differs from $O(K^{-1})$, the expected term after a correction term $O(K^{-1/2})$ in a one-stage sampling. For the studentized statistic, the correction is similar, with $1 - x^2$ replaced by $1 + 2x^2$.