Algorithms to Calculate Exact Inclusion Probabilities for a Non-Rejective Approximate πps Sampling Design

The AP design, an efficient non-rejective implementation of the πps sampling design, was proposed in the literature as an alternative Poisson sampling scheme. In this paper, we update the formulas for the inclusion probabilities of the AP sampling design; the formulas are greatly simplified. The results show that the AP design and the algorithms for calculating its inclusion probabilities are simple and effective, so the design can be used in practice. Three real examples are included to illustrate the performance of these designs.


Introduction
Unequal probability sampling is frequently used in surveys to increase the efficiency of estimating population characteristics. A sampling design without replacement whose unequal inclusion probabilities are proportional to a size variable known for all units in the population is usually called a πps sampling design. πps sampling usually produces more efficient estimates than sampling with equal probabilities. Suppose that the finite population $U$ consists of $N$ units labelled $1, \dots, N$. An auxiliary variable with value $X_i$ for unit $i$ is known for all $i = 1, \dots, N$. Assume that $X_i \ge 0$ for all $i$, with strict inequality for at least one $i$. It is required to estimate the total $Y = \sum_i Y_i$, where the sum is over $1, \dots, N$, given a sample of size $n$. Let $p_i = nX_i/X$, $i = 1, \dots, N$, be the prescribed inclusion probability parameters, where $X = \sum_{i=1}^{N} X_i$ is the corresponding population total, so that $\sum_{i=1}^{N} p_i = n$. The problem is how to select a sample of fixed size $n$ so that the probability that unit $i$ is included in the sample equals exactly $p_i$.
Many papers have proposed sampling schemes in which the inclusion probability of unit $i$ is $\pi_i$. Some important references are the following: Sen (1953), Durbin (1967), Brewer (1963), Sampford (1967), Hájek (1964, 1981), Rosén (1997a), Aires (1999), Bondesson & Thorburn (2008), Bondesson & Grafström (2011), Grafström (2009), Laitila & Olofsson (2011), Olofsson (2011). Most of the schemes with predetermined inclusion probabilities are either difficult to execute or make it difficult to calculate $\pi_{ij}$, the second-order inclusion probability of units $i$ and $j$, when $n$ is larger than 2. Recently, Zaizai, Miaomiao & Yalu (2013) presented a new approximate πps design for fixed sample size $n$ as follows:
1. Draw an initial sample $s_0$ using a Poisson sampling design with probabilities $\{p_i\}_{i=1}^{N}$. The size of the initial sample $s_0$ is a random variable denoted by $n_{s_0}$.
2. If $n_{s_0} = n$, then sampling is finished and the sample is $s = s_0$. If $n_{s_0} < n$, then draw the remaining units, denoted by $s_1$ and of size $n - n_{s_0}$, by simple random sampling without replacement (SRSWOR) from $U - s_0$; the final sample is $s = s_0 \cup s_1$. If $n_{s_0} > n$, then remove $n_{s_0} - n$ units, denoted by $s_2$, from $s_0$ using the SRSWOR design; the final sample is $s = s_0 - s_2$.
The AP design is thus a non-rejective sampling design. However, the existing algorithms for calculating the exact first- and second-order inclusion probabilities of the design are complex and involve a Jacobi over-relaxation iterative method.
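The two steps above can be sketched in Python as follows. This is only an illustrative sketch: the function name `ap_sample`, the 0-based unit labels and the `rng` argument are assumptions made for the example and do not come from the original paper.

```python
import random

def ap_sample(p, n, rng=random):
    """Draw one fixed-size sample of unit indices 0..N-1 following the two AP steps."""
    N = len(p)
    # Step 1: Poisson sampling, one independent Bernoulli trial per unit.
    s0 = [i for i in range(N) if rng.random() < p[i]]
    if len(s0) < n:
        # Step 2, too few units: top up by SRSWOR from the units outside s0.
        in_s0 = set(s0)
        rest = [i for i in range(N) if i not in in_s0]
        return sorted(s0 + rng.sample(rest, n - len(s0)))
    if len(s0) > n:
        # Step 2, too many units: removing the surplus by SRSWOR is the same
        # as keeping n units of s0 chosen by SRSWOR.
        return sorted(rng.sample(s0, n))
    # Step 2, exact size: the Poisson sample is the final sample.
    return s0
```

Because every branch returns exactly `n` distinct units, no draw is ever rejected, which is the sense in which the design is non-rejective.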
Note 1. We assume that the population is such that $p_i = nX_i/X < 1$ for all $i$. Otherwise, the units with $p_i$ larger than 1 need to be removed, and further units removed iteratively if necessary.
The purpose of this paper is to simplify the calculation of the first-order and second-order inclusion probabilities of the AP design. The analytical expressions for the inclusion probabilities of the AP design presented in Section 2 are simpler to work with than the original ones.
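The iterative removal mentioned in Note 1 can be sketched as below. The sketch rests on assumptions that the paper does not spell out: removed units are set aside as certainty selections, the sample size for the remaining units is reduced accordingly, and units whose probability would equal 1 are removed as well so that $p_i < 1$ holds afterwards. The function name `target_probabilities` is hypothetical.

```python
def target_probabilities(x, n):
    """Return (p, certain): p maps each remaining unit i to m * x[i] / X',
    where m and X' are the sample size and size total left after removal;
    certain lists the units set aside because their p would reach 1."""
    remaining = list(range(len(x)))
    certain = []
    while True:
        m = n - len(certain)                  # sample size left to allocate
        total = sum(x[i] for i in remaining)  # size total of remaining units
        too_big = [i for i in remaining if m * x[i] >= total]
        if not too_big:
            break
        # Assumption: oversized units are taken with certainty and removed;
        # the probabilities are then recomputed, which may expose new ones.
        certain.extend(too_big)
        remaining = [i for i in remaining if i not in too_big]
    p = {i: m * x[i] / total for i in remaining}
    return p, sorted(certain)
```

For sizes `[1, 1, 1, 1, 16]` and `n = 2`, the last unit would get probability 1.6, so it is set aside and the other four units share the remaining sample size of 1.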

Inclusion Probabilities of AP Design
Now we discuss the inclusion probabilities of the AP sampling design. For convenience, we denote the random variable $\sum_{k \in U, k \neq i} I_k$ by $n^{-i}_{s_0}$, the random variable […], where the $I_k$ are the inclusion indicators for the Poisson sampling. In order to calculate the first- and second-order inclusion probabilities of the AP design, we first derive the following Proposition and Lemmas. For convenience, the subset $\{1, 2, \dots, i\}$ of $U$ is abbreviated as $U_i$, and $\Pr(\sum_{\alpha=1}^{i} I_\alpha = j)$ as $P^i_j$, where $j = 0, 1, \dots, i$; $i = 1, 2, \dots, N$. Then $\Pr(n_{s_0} = \nu) = P^N_\nu$, $\nu = 0, 1, \dots, N$.
Proposition 1. Keep the same assumptions as above and let $q_i = 1 - p_i$. Then
$$P^i_j = q_i P^{i-1}_j + p_i P^{i-1}_{j-1}, \qquad j = 0, 1, \dots, i; \; i = 2, \dots, N,$$
with the conventions $P^{i-1}_{-1} = 0$ and $P^{i-1}_{i} = 0$.
A proof of Proposition 1 can be found in Tillé (2006) and Olofsson (2011).
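The recursion behind Proposition 1, which builds the distribution of the Poisson sample size one unit at a time, can be sketched in Python as follows. The function name `size_distribution` is chosen for this illustration only. The sketch starts from an empty population (size 0 with probability 1), which after the first unit gives the values $q_1$ and $p_1$.

```python
def size_distribution(p):
    """Return [P^N_0, ..., P^N_N], the distribution of the Poisson sample size."""
    P = [1.0]  # empty population: the sample size is 0 with probability 1
    for p_i in p:
        q_i = 1.0 - p_i
        nxt = [0.0] * (len(P) + 1)
        for j, mass in enumerate(P):
            nxt[j] += q_i * mass      # unit i not drawn: count stays at j
            nxt[j + 1] += p_i * mass  # unit i drawn: count moves to j + 1
        P = nxt
    return P
```

Each unit costs a number of operations proportional to the current list length, so the whole distribution is obtained in order $N^2$ operations.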
Note 2. Proposition 1 shows that we can calculate $P^i_0, P^i_1, \dots, P^i_i$ from $P^{i-1}_0, P^{i-1}_1, \dots, P^{i-1}_{i-1}$, with initial values $P^1_0 = q_1$ and $P^1_1 = p_1$. By recursive calculation with respect to $i$, we can finally obtain $P^N_\nu$, $\nu = 0, 1, \dots, N$.
Lemma 1. […]
Lemma 2. Given the assumptions as in Lemma 1, then […]
Lemma 1 and Lemma 2 are proved in the appendix. We now present Theorems 1 and 2, which are the core results of this paper.
Theorem 1. Under the AP design, the first-order inclusion probabilities can be written as […] where […]
Theorem 2. Under the AP design, the analytical formula for the second-order inclusion probabilities is as follows: […] where […]
Revista Colombiana de Estadística 37 (2014) 127-140
From Theorem 1 and Theorem 2, we find that the problem of computing $\pi_k$ and $\pi_{kl}$ reduces to computing the series $\Pr(n_{s_0} = \nu) = P^N_\nu$, $\nu = 0, 1, \dots, N$. We can calculate $P^N_\nu$ recursively by using Proposition 1. Proofs of Theorems 1 and 2 can be found in the appendix.
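As a numerical cross-check, the first-order inclusion probabilities can also be obtained by conditioning directly on the Poisson sample size. The sketch below is not the closed-form expression of Theorem 1. It applies the two-step definition of the AP design: a unit drawn in step 1 is kept with probability 1 when the initial size is at most $n$, and with probability $n/\nu$ when the initial size $\nu$ exceeds $n$; a unit not drawn is added with probability $(n - \nu)/(N - \nu)$ when $\nu < n$. The names `count_pmf` and `ap_first_order` are hypothetical, and `p` is assumed to be a Python list.

```python
def count_pmf(p):
    """Distribution of the number of successes in independent Bernoulli(p_i) trials."""
    P = [1.0]
    for p_i in p:
        P = [(1 - p_i) * (P[j] if j < len(P) else 0.0)
             + p_i * (P[j - 1] if j > 0 else 0.0)
             for j in range(len(P) + 1)]
    return P

def ap_first_order(p, n):
    """First-order inclusion probabilities of the AP design by direct conditioning."""
    N = len(p)
    pi = []
    for k in range(N):
        # R[j] = Pr(j of the other N - 1 units are drawn in step 1)
        R = count_pmf(p[:k] + p[k + 1:])
        total = 0.0
        for nu in range(N + 1):  # nu = size of the initial Poisson sample
            if nu >= 1:
                # unit k drawn in step 1, then kept in step 2
                keep = 1.0 if nu <= n else n / nu
                total += p[k] * R[nu - 1] * keep
            if nu < n:
                # unit k not drawn in step 1, then added by SRSWOR
                total += (1 - p[k]) * R[nu] * (n - nu) / (N - nu)
        pi.append(total)
    return pi
```

Since the final sample always has size $n$, the returned values sum to $n$, which gives a simple sanity check. For targets `[0.2, 0.4, 0.6, 0.8]` with `n = 2`, the sketch returns a value above 0.2 for the first unit and below 0.8 for the last one. The sketch recomputes the leave-one-out distribution for every unit, so it costs order $N^3$ operations and is meant for checking small cases only.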

Numerical Examples
The statistical literature contains several proposals for methods that generate fixed-size, without-replacement πps sampling designs. In practice, πps designs with sample size $n = 2$ are widely used and well studied. Because they are difficult to implement and their inclusion probabilities are complex to compute, πps designs with sample size $n > 2$ are applied less often. Instead, approximate πps designs such as the conditional Poisson design (CP), the two-phase πps sampling design (2Pπps), the Pareto design of Rosén (1997) and the AP design of Zaizai et al. (2013) have been used. There are, however, fast and fairly simple implementations of strict πps designs, such as systematic πps sampling. Unfortunately, variance estimation for systematic πps sampling is cumbersome.

A Review of some Sampling Designs
Poisson sampling is a method to generate a sample $s$ of random size from a finite population $U$ consisting of $N$ individuals. Each individual $i$ in the population has a predetermined probability $p_i$ of being included in the sample $s$. A Poisson sample may be obtained by using $N$ independent Bernoulli trials to determine whether each individual is to be included in the sample $s$ or not. Under the Poisson sampling design, the first-order inclusion probabilities of the individuals are equal to the target inclusion probabilities. A major drawback of the Poisson design is the randomness of the sample size, which has urged statisticians to develop sampling schemes providing fixed-size πps designs.
Conditional Poisson sampling (CP), also called rejective sampling or maximum entropy sampling, was first introduced by Hájek (1964). It is a fixed-size sampling design, without replacement, on a finite population, with unequal inclusion probabilities among the units of the population. It was called rejective sampling because Hájek's implementation amounts to drawing samples with the Poisson sampling design, which has a random size, until a sample of the desired size is obtained. In fact, one can also obtain the conditional Poisson design by drawing samples with replacement using a multinomial sampling design and rejecting the samples in which some unit of the population appears more than once.
Laitila & Olofsson (2011) proposed a new method to generate a sample with fixed size and inclusion probabilities proportional to size, viz. the 2Pπps design, based on a two-phase approach. Consider a population $U$ of $N$ units. For sample generation, let $n$ be the predetermined sample size and assume target inclusion probabilities $p_k$ proportional to a size variable $x_k$ known for all $k \in U$. The 2Pπps sampling scheme is as follows:
1. Draw a sample $s_0$ using a Poisson design with $p_{ak} \propto x_k$ as inclusion probabilities, with expected sample size $E(n_{s_0}) = \sum_U p_{ak} \ge n$.
2. If the size of $s_0$ is greater than or equal to $n$, then let $s_a = s_0$ and proceed to step 3. If not, repeat step 1.
3. From the sampled set $s_a$, draw a sample $s$ of size $n$ using an SRSWOR design.
It was shown that the first-order inclusion probabilities of the 2Pπps design are asymptotically equal to the target inclusion probabilities.But the 2Pπps design is still a rejective sampling design.
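The three steps of the 2Pπps scheme can be sketched as follows, which also makes the rejective step explicit. The function name `two_phase_pips` and the `rng` argument are assumptions for this illustration; the phase-one probabilities `p_a` are taken as given, and the loop only terminates if at least `n` of them are positive.

```python
import random

def two_phase_pips(p_a, n, rng=random):
    """Draw one 2Pπps sample; p_a holds the phase-one Poisson probabilities,
    proportional to size, with sum(p_a) >= n."""
    while True:
        # Step 1: Poisson sample with probabilities p_a.
        s0 = [i for i, prob in enumerate(p_a) if rng.random() < prob]
        # Step 2: accept s0 only if it holds at least n units, else redraw.
        if len(s0) >= n:
            # Step 3: SRSWOR of size n from the accepted set.
            return sorted(rng.sample(s0, n))
```

The `while` loop is the rejective part of the design: the number of Poisson draws needed before a sample is accepted is random.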
Pareto sampling was introduced by Rosén (1997a, 1997b). It is a simple method to obtain a fixed-size πps sample, though with inclusion probabilities only approximately as desired. It can be described as follows. First, independent random numbers $(U_1, \dots, U_N)$ from $U(0, 1)$ are generated, one value for each population unit ($i = 1, \dots, N$). Then the Pareto distributed ranking variables
$$Q_i = \frac{U_i/(1 - U_i)}{p_i/(1 - p_i)},$$
where $p_i$ is the target inclusion probability for unit $i$ and $\sum_i p_i = n$, are calculated. The $n$ units with the smallest $Q$-values are selected as a πps sample of fixed size $n$. Bondesson, Traat & Lundqvist (2006) obtained formulas for the first-order and second-order inclusion probabilities of the Pareto design. The true inclusion probabilities agree with the target inclusion probabilities only approximately.
Zaizai et al. (2013) presented an alternative πps design (AP), as described in Section 1. The AP design is a non-rejective sampling design.
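For comparison with the AP procedure, the Pareto selection rule described above can be sketched in a few lines. The ranking variable used here is the standard odds ratio of $U_i$ against $p_i$; the function name `pareto_sample` and the `rng` argument are illustrative assumptions, and every $p_i$ is assumed to lie strictly between 0 and 1.

```python
import random

def pareto_sample(p, n, rng=random):
    """Select the n units with the smallest Pareto ranking variables."""
    ranked = []
    for i, p_i in enumerate(p):
        u = rng.random()  # uniform on [0, 1), so 1 - u is never zero
        q = (u / (1.0 - u)) / (p_i / (1.0 - p_i))
        ranked.append((q, i))
    ranked.sort()  # ascending in the ranking variable
    return sorted(i for _, i in ranked[:n])
```

Like the AP design, this rule needs a single pass over the population and never rejects a draw.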

Examples
Since the Horvitz-Thompson estimators under the AP design, CP design and 2Pπps design are unbiased, their precision is measured by the variance. However, the ratio estimators mentioned by Kadilar & Cingi (2004) and the traditional ratio estimator are biased, so their precision is measured by the mean square error (MSE). In the following, the estimators and their variances (or MSEs) under the AP design, CP design, 2Pπps design and SRSWOR are studied using three data sets used earlier in the literature. In this paper the AP design and the other designs are applied to three populations in which the y-values are known, so these variances or MSEs can be calculated exactly. This is done only to show the performance of the various designs. In practice the y-values in a population of interest are unknown, so the variance or MSE of an estimator cannot be obtained but can be estimated from a sample; the precision is then measured by the estimated variance or MSE. For the Horvitz-Thompson estimators under the AP design, CP design and 2Pπps design, the Yates-Grundy variance estimator can be used to measure precision. It is an unbiased estimator of the true variance.
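For a fixed-size design, the Horvitz-Thompson estimator of the total and the Yates-Grundy variance estimator take their standard textbook forms, which can be sketched as below. The function names and the argument layout (values and probabilities of the sampled units only, in the same order) are assumptions made for this illustration; the second-order probabilities are assumed to be strictly positive.

```python
def ht_total(y, pi):
    """Horvitz-Thompson estimator of the population total from sampled units."""
    return sum(y_i / pi_i for y_i, pi_i in zip(y, pi))

def yates_grundy_variance(y, pi, pi2):
    """Yates-Grundy variance estimator for a fixed-size design.
    y, pi: values and first-order inclusion probabilities of the sampled units;
    pi2[i][j]: second-order inclusion probability of sampled units i and j."""
    n = len(y)
    v = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair of sampled units once
            weight = (pi[i] * pi[j] - pi2[i][j]) / pi2[i][j]
            v += weight * (y[i] / pi[i] - y[j] / pi[j]) ** 2
    return v
```

Under SRSWOR, where $\pi_i = n/N$ and $\pi_{ij} = n(n-1)/(N(N-1))$, the second function reduces to the familiar $N(N-n)s^2/n$, with $s^2$ the sample variance, which is a convenient check of an implementation.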
Example 1. We use the data of Kadilar & Cingi (2004) in this section. However, we consider only the data of the Aegean Region of Turkey, as we are interested here in unequal probability sampling with fixed sample size. We apply our proposed method and other unequal probability sampling methods, such as the 2Pπps sampling design and the CP sampling design, to the data on apple production amount (the variate of interest $y$) and number of apple trees (the auxiliary variate $x$) in 105 villages of the Aegean Region in 1999 (Source: Institute of Statistics, Republic of Turkey).
For a large population, we may divide the population into strata according to the size of $X_i$, and the AP design can be used to draw a sample of fixed size within each stratum independently. Let the population be stratified into 3 strata, where the population sizes and sample sizes are $(N_1, n_1) = (41, 8)$, $(N_2, n_2) = (41, 8)$ and $(N_3, n_3) = (23, 4)$, respectively. Finally, we use the stratified sampling technique to construct the estimator. The relative differences of the inclusion probabilities for the AP design, 2Pπps design and CP design with respect to the target inclusion probabilities can be calculated in each stratum.

The variances of the Horvitz-Thompson estimators under the AP, 2Pπps and CP designs are easily computed. As mentioned previously, it is of interest to compare the efficiency of alternative sampling schemes, for example the 2Pπps design, AP design, CP design and SRSWOR design. We conclude that the proposed method is more efficient than the 2Pπps design and SRSWOR design. The empirical comparisons are given in Table 1. The efficiency of the AP design is almost identical to that of the 2Pπps design, but it is significantly higher than that of the ratio estimators under the SRSWOR design mentioned by Kadilar & Cingi (2004). (Note: the MSEs here differ from those in the original literature because the original data set has 106 observations, one of which is invalid and was removed; this article uses 105 observations.) Although the CP design is more efficient than the AP design, the CP design is not easy to implement. The important advantages of the proposed sampling design are that its implementation is non-rejective and that its inclusion probabilities can be calculated recursively.
Note 3. The AP design is still not an exact πps design. The inclusion probabilities will be larger than the intended probabilities for units with small intended probabilities, and smaller than the intended probabilities for units with large intended probabilities. In the extreme case there is a risk of not selecting units that are intended to be taken with probability 1, and of selecting units with intended inclusion probability 0.
Example 2. To analyze the performance of the suggested method in comparison with the other methods considered in this paper, a natural population data set from the literature (Singh 1967) is considered. The variables of this population are described below.
y: Percentage of hives affected by disease.
x: January average temperature.
We consider drawing a sample according to the AP design developed above. The exact and desired first-order inclusion probabilities are listed in Table 2, and the second-order inclusion probabilities are given in Table 3. Once we have an AP sample, we can construct the estimator $\hat{Y}^{AP}_{HT}$ of the population mean $\bar{Y}$, and the variance of $\hat{Y}^{AP}_{HT}$ is easily computed. From Table 4, we see that the proposed method has a smaller variance than the CP design. Although the variance of the 2Pπps design is slightly smaller than that of the proposed method, the AP design is easy to implement and generally applicable. In general, the AP design is highly efficient, and its efficiency is significantly higher than that of the ratio estimators under the SRSWOR design mentioned by Kadilar & Cingi (2004).
Example 3. The data considered here are from 35 Scottish farms and are given in Table 5. Let the sample size $n$ be equal to 8. The variables of this population are described below (Asok & Sukhatme 1976, page 916).
y: Acreage under oats in 1957.
x: Recorded acreage of crops and grass for 1947.
The exact first-order and second-order inclusion probabilities for the AP design, 2Pπps design and CP design are calculated. In this example, the efficiencies of the AP design, CP design and 2Pπps design are compared. From the results in Table 6, we conclude that the AP design is more efficient than the CP design. Since the CP design and 2Pπps design are far more complex to implement than the AP design, the proposed design compares favourably with both, and its efficiency is significantly higher than that of the ratio estimators under the SRSWOR design mentioned by Kadilar & Cingi (2004).
A primary purpose of this paper is to extend the theory of finite population sampling with unequal probabilities. Although a study variable $y$ such as the one presented in Table 5 is usually unknown in the real world, the results do indicate that substantial reductions in variance can be obtained by using the AP design (Tables 1, 4 and 6). It is the opinion of the authors that the technique suggested in this paper may be of practical use when the study variable $y$ is unknown. Hence, the proposed method has potential application value.

Conclusions
We have shown that it is feasible to calculate the first-order and second-order inclusion probabilities of the AP design. Expressions for the third-order and fourth-order inclusion probabilities under the AP sampling design can also be obtained; the proofs are similar to that for $\pi_k$.
This study shows that the AP design has approximately the same efficiency as the CP design and the 2Pπps design. In addition, the AP design is a non-rejective sampling design and is very close to a strict πps design. The first- and second-order inclusion probabilities can be calculated accurately by using the formulas given in this paper. From the numerical illustrations, we find that there is a considerable gain in efficiency from using the Horvitz-Thompson estimator under the AP design over the ratio-type estimators mentioned.
National Natural Science Foundation of China (11161031,11361036), Natural Science Foundation of Inner Mongolia (2013MS0108) and Doctoral Fund of Ministry of Education of China (20131514110005).

Table 1: The variances of the AP design, 2Pπps design and CP design with n = 20, and MSE of SRSWOR ratio estimators in Example 1. Aegean Region data.

Table 2: The raw data and the first-order inclusion probabilities for the AP design, the 2Pπps design, the CP design and the Pareto design, N = 10, n = 4, in Example 2. Singh data.

Table 3: The second-order inclusion probabilities $\pi^{AP}_{ij}$ for the AP design, N = 10, n = 4, in Example 2. Singh data.

Table 4: The variances of the AP design, 2Pπps design, CP design and Pareto design with n = 4, and MSE of SRSWOR ratio estimators in Example 2. Singh data.

Table 5: Recorded acreage of crops and grass for 1947 and acreage under oats in 1957 for 35 farms in Orkney in Example 3. Scottish farms data.

Table 6: The variances of the AP design, 2Pπps design and CP design with n = 8, and MSE of SRSWOR ratio estimators in Example 3. Scottish farms data.