Approximation of rejective sampling inclusion probabilities and application to high order correlations

This paper is devoted to rejective sampling. We provide an expansion of joint inclusion probabilities of any order in terms of the inclusion probabilities of order one, extending previous results by H\'ajek (1964) and H\'ajek (1981) and making the remainder term more precise. Following H\'ajek (1981), the proof is based on Edgeworth expansions. The main result is applied to derive bounds on higher order correlations, which are needed for the consistency and asymptotic normality of several complex estimators.


Introduction
In a finite population of size N , sampling without replacement with unequal inclusion probabilities and fixed sample size is not straightforward, but there exist several sampling designs that satisfy these properties (see Brewer and Hanif (1983) for a review). Rejective sampling, which is also called maximum entropy sampling or conditional Poisson sampling, is one possibility, introduced by Hájek (1964). If n denotes the fixed sample size, the n units are drawn independently with probabilities that may vary from unit to unit and the samples in which all units are not distinct are rejected. In the particular case of equal drawing probabilities, rejective sampling coincides with simple random sampling without replacement. Rejective sampling with size n can also be regarded as Poisson sampling conditionally on the sample size being equal to n. The unconditional Poisson design can be easily implemented by drawing N independently distributed Bernoulli random variables with different probabilities of success, but it has the disadvantage of working with a random sample size. The conditional Poisson design can also be interpreted as a maximum entropy sampling design for a fixed sample size and a given set of first order inclusion probabilities.
Rejective sampling has been extensively studied in the literature. Hájek (1964Hájek ( , 1981 derives an approximation of the joint inclusion probabilities in terms of first order inclusion probabilities. By showing that the maximum entropy design belongs to a parametric exponential family, Chen, Dempster and Liu (1994) give a recursive expression of the joint inclusion probabilities and propose a new algorithm. This algorithm has been improved by Deville (2000), who gives another expression for the joint inclusion probabilities. Using the results in Chen, Dempster and Liu (1994), Qualité (2008) proves that the variance of the well-known unbiased Horvitz-Thompson estimator for rejective sampling is smaller than the variance of the Hansen-Hurvitz estimator for multinomial sampling. Several estimators of the variance of the Horvitz-Thompson estimator have also been proposed; see Matei and Tillé (2005) for a comparison by means of a large simulations study. The conditional Poisson sampling scheme is not only of interest in the survey sampling field, but also in the context of case-control studies or survival analysis Chen (2000).
The purpose of the present article is to generalize the result given in Hájek (1964) and Hájek (1981), obtained for the first and second order inclusion probabilities of rejective sampling, to inclusion probabilities of any order and also to provide a more precise remainder term. The proof of our result is along the lines of the proof by Hájek (1981) using Edgeworth expansions and leads to approximations that are valid when N , n and N −n are large enough. One interesting application of our result is that it enables us to show that rejective sampling satisfies the assumptions needed for the consistency and the asymptotic normality of some complex estimators, such as the ones defined in Breidt and Opsomer (2000), Breidt et al. (2007), Cardot et al. (2010) or Wang (2009). Such assumptions involve conditions on correlations up to order four, which are difficult to check for complex sampling designs that go beyond simple random sampling without replacement or Poisson sampling. Our result implies that the rejective sampling design also satisfies these conditions.
In the case-control context, Arratia, Goldstein and Langholz (2005) consider rejective sampling and also give approximations of higher order correlations. Their approach and the assumptions they need to derive their results are different from the ones we consider in the present paper. Instead of using Edgeworth expansions, they consider an expansion that involves the characteristic function. Their results are obtained using a condition, which is sufficient, but not necessary to derive our expansion. In view of this we provide an example of a rejective sampling design that does not satisfy the condition in Arratia, Goldstein and Langholz (2005), but does satisfy our weaker assumption. Moreover, Arratia et al do not give an explicit approximation formula for higher order inclusion probabilities in rejective sampling, whereas we do provide such an approximation, which may be of interest in itself.
The paper is organized as follows: in Section 2 we introduce notations and state our main result which is Theorem 1. In Section 3, we apply this result and illustrate that rejective sampling satisfies conditions on higher order correlations imposed in the recent literature to derive several asymptotic results. Detailed proofs are provided in Section 4.

Notations and main result
In this paper, we use the first description of rejective sampling by Hájek (1981), namely as Poisson sampling conditionally on the sample size being equal to n. Let us denote U as the population of size N . Let 0 ≤ p 1 , p 2 , . . . , p N ≤ 1 be a sequence of real numbers such that p 1 + p 2 + · · · + p N = n. The Poisson sampling design with parameters p 1 , p 2 , . . . , p N is such that for any sample s, the probability of s is The corresponding rejective sampling design is such that the probability of a sample s is where c is a constant such that s P RS (s) = 1. We refer the reader to Hájek (1981) for more details.
The inclusion probabilities of order k under this sampling scheme are denoted as π i1,i2,...,i k = P RS (i 1 ∈ s, i 2 ∈ s, . . . , i k ∈ s) for any {i 1 , i 2 , . . . , i k } ⊂ {1, 2, . . . , N }. Our purpose is to obtain an expansion of inclusion probabilities of any order. Theorem 7.4 in Hájek (1981), see also Theorem 5.2 in Hájek (1964), provides such an expansion for inclusion probabilities of order two, i.e., We will obtain an extension of (2.2) and prove that a similar expansion holds for inclusion probabilities of higher order. Our approach is along the lines of the method used in Hájek (1981). Consider Poisson sampling with parameters p 1 , p 2 , . . . , p N and denote as P the corresponding probability measure on the set of samples under this sampling scheme. For i = 1, 2, . . . , N , we denote as I i the indicator of inclusion of unit i, that is For every i = 1, 2, . . . , N , the indicator I i is a Bernoulli random variable with parameter p i . Define K = size s = I 1 + I 2 + · · · + I N . (2.4) Note that the expectation and the variance of K satisfy E P (K) = n and V P (K) = d. By Bayes formula and by independence of the I i 's under Poisson sampling, the inclusion probability π i1,i2,...,i k can be written as (2.5) The next step is to use Edgeworth expansions for the probabilities of K. This leads to the next lemma.
Lemma 1. Consider Poisson sampling with parameters p 1 , p 2 , . . . , p N , such that p 1 +p 2 +· · ·+p N = n with corresponding probability measure P on the set of samples. Let d and K be defined in (2.3) and (2.4), respectively. Then, for all The proof of the lemma is provided in Section 4. We are now in the position to formulate our main result.
Proof. From Lemma 1, we find Together with (2.5) it follows that for all k ≥ 1, (2.10) Applying (2.10) to the case k = 1, yields that the first order inclusion probabilities satisfy and as a consequence, Combining this with (2.10) yields imsart-generic ver. 2011/11/15 file: Boistard_Lopuhaa_Gazen_arxiv.tex date: May 2, 2014 where the contribution to terms of order d −1 is This proves part (i). Part (ii) is deduced immediately from (i) and (2.11).

Application: bounds on higher order correlations under rejective sampling
Conditions on the order of higher order correlations, as N → ∞, appear at several places in the literature, see e.g., Breidt and Opsomer (2000), Breidt et al. (2007), Cardot et al. (2010) or Wang (2009), among others. Such conditions are used when studying asymptotic properties in survey sampling for general sampling designs, but they are difficult to check for more complex sampling designs, that go beyond simple random sampling without replacement. An attempt to provide simpler conditions for rejective sampling can be found in Arratia, Goldstein and Langholz (2005). They formulate some sort of asymptotic stability condition on inclusion frequencies that ensure bounds on general higher order correlations. The purpose of the present section is to explain how Theorem 1 can be used to establish several bounds on higher order correlations for the rejective sampling design. The bounds in Arratia, Goldstein and Langholz (2005) match with the ones that we find for correlations up to order four, which suffices for the conditions imposed in Breidt and Opsomer (2000); Breidt et al. (2007); Cardot et al. (2010); Wang (2009). However, in order to derive these bounds, we only need the simple requirement that where d is defined in (2.3). Moreover, one can show that (3.1) is weaker than the asymptotic stability condition in Arratia, Goldstein and Langholz (2005) as detailed in section 4.2.
Before we start a discussion on the assumptions on higher order correlations that appear for example in Breidt and Opsomer (2000); Breidt et al. (2007); Cardot et al. (2010); Wang (2009), first note that (3.1) necessarily yields that d → ∞, which means that Theorem 1 holds. Moreover, condition (3.1) has a number of additional consequences, such as n ≥ d → ∞, N − n ≥ d → ∞, and lim sup A typical example of a condition on higher order correlations, is lim sup where for every integer t ≥ 1:  (2000) among others. Since E P (I i − π i )(I j − π j ) = π ij − π i π j , condition (3.3) immediately follows from Theorem 1 and (3.2). Interestingly, the simple representation of the second order correlations as a difference of second order inclusion probabilities and the product of single order inclusion probabilities can be generalized for correlations of higher order as precised in the following lemma.

Lemma 2. For any
where D m,k is the set of distinct m-tuples in A k and {i m+1 , . . . , i k } = A k \ {i 1 , . . . , i m }.
From this lemma, we can prove the following proposition that provides an expansion of higher order correlations for rejective sampling.
Proposition 1. Consider a rejective sampling design. Then, for any k ≥ 3 and any positive integers n j , j = 1, 2, . . . , k, The proofs of Lemma 2 and Proposition 1 are provided in Section 4.3. Proposition 1 together with condition (3.2) imply that the following conditions that appear for example in Breidt and Opsomer (2000) are satisfied: (3.7) Other conditions on higher order correlations, such as

Proof of Lemma 1
For the proof of Lemma 1, we use Edgeworth expansions for probabilities of sums of independent random variables, as given in Theorem 6.2 in Hájek (1981). If K = I 1 + I 2 · · · + I N is a sum of independent Bernoulli random variables with parameters p 1 , p 2 , . . . , p N , and d = V(K). Then, for 0 ≤ l ≤ N and m ≥ 1, where f m (x) is the Edgeworth expansion of P (K = l) up to order m, given by where φ denotes the standard normal density and each P j is a linear combination of (probabilistic) Hermite polynomials involving the cumulants of K. Recall that the Hermite polynomials are defined by Lemma 3. The polynomials P j in (4.2) can be expressed as: where the sum is taken over all sets {k m } consisting of all non-negative integer solutions of k 1 + 2k 2 + · · · + jk j = j, (4.5) and r is defined by k 1 + k 2 + · · · + k j = r, and where κ m is the m-th cumulant of K and H j+2r is the Hermite polynomial of degree j + 2r as given in (4.3).
The next lemma shows that the cumulants of the sum of independent Bernoulli variables are of the same order as the variance.
Lemma 4. Let K = I 1 + I 2 + · · · + I N be a sum of independent Bernoulli random variables with parameters p 1 , p 2 , . . . , p N . Let d = V(K) = Proof. Recall that the cumulants of a random variable X are defined as the coefficients in the expansion of the logarithm of the moment-generating function, i.e., if the m-th cumulant is κ m = g (m) (0). This implies that the m-th cumulant κ m of the sum of independent Bernoulli random variables is equal to the sum of the m-th cumulants e m of the individual Bernoulli variables. Moreover, we have the following recurrence relation between the cumulants of a single Bernoulli variable with parameter p: e m+1 = p(1 − p) d dp e m , (4.6) see for instance, example (c) in Section 4 in Khatri (1959). It is straightforward to see that κ 1 = p 1 + p 2 + · · · + p N and κ 2 = N i=1 p i (1 − p i ) = d. It can be proved by induction, using (4.6), that e m = p(1 − p)R m (p), where R m is a polynomial with degree less than or equal to m − 1 and with coefficients depending only on m. Thus, κ m = dQ m (p), where Q m (p) is of the form and is bounded uniformly in p 1 , p 2 . . . , p N . This proves the lemma.
The first summation is over all possible (i 1 , . . . , i m ) ∈ D m,k , which are all possible combinations of m different indices from A k = {i 1 , i 2 , . . . , i k }. From each such combination i 1 , . . . , i m , the second summation picks two different indices i < j from the set {i 1 , . . . , i m }. This means that any combination of (1 − π i )(1 − π j ), with {i, j} ⊂ A k is possible. In fact, each such combination will appear several times and we only have to count how many times. Well, for a fixed combination (i, j), from the k possibilities A k , we need to pick i and j, and for the m − 2 remaining choices there are k − 2 possibilities left. We conclude that each term (1 − π i )(1 − π j ), with {i, j} ⊂ A k , appears k−2 m−2 times. Moreover, this holds for any m = 2, 3, . . . , k. This means that (−1) k−2−n k − 2 n = (1 − 1) k−2 = 0.
We conclude that the coefficient of d −1 is zero, which proves (4.14). Next, suppose that the expectation is of order O(d −2 ) for all powers 1 ≤ m j ≤ n j , and consider This can be written as according to the induction hypothesis. Next, write When we insert this, we find by applying the induction hypothesis.