On the non‐recursive implementation of multistage sampling without replacement

Method details
In multistage sampling without replacement, a general method for unbiased variance estimation for the expansion estimator (i.e. the extension of the Horvitz-Thompson estimator) suitable for computer implementation was first proposed by Durbin [1], improved by Raj [2] and extended by Rao [3]. For instance, with two-stage sampling, this method consists in estimating the first- and second-stage variance components separately and combining them to form an unbiased estimator of the overall sampling variance (e.g. [4, sec. 4.3.2] or [5, sec. 7.2.2]). It follows that both the expansion estimator and the sampling variance estimator may be defined recursively, which means that the estimator symbol appears on both sides of its definition (e.g. [5, p. 158]).
Generally speaking, there is a dichotomy between the recursive definition of a calculation, which is implicit, simple and quick to implement on a computer, and its iterative counterpart, which is usually more complicated to work out. Owing to the recursive definition of the hierarchical nesting of sampling units and of the sampling variance, the simplest and most direct way to implement multistage sampling on a computer is to rely on a fully recursive implementation, based on a recursively defined data structure and on recursive programming (for recursive programming the reader is referred to Rohl [6] or Rubio-Sánchez [7]).
In imperative programming languages such as FORTRAN, Pascal, C/C++ and so on, the appropriate recursive data structure (a tree) is built from dynamically allocated pointers. Recursion is supported by allowing a function (or procedure) to call itself; recursive calls are then managed automatically on the call stack. One approach for turning a recursive function (or procedure) into an iterative one is to simulate recursion with a manual stack in lieu of the call stack. By using pointers for the tree representation and a manual stack for tree traversal, we obtain a mixed implementation: recursive for the data structure but iterative for the traversal (by simulating the recursion). This is precisely what has been proposed by Bellhouse [8,9] and Rylett and Bellhouse [10].
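To illustrate the distinction, here is a minimal sketch of the two styles in Python (ours, not taken from the implementations cited above; the `Node` class and function names are introduced purely for illustration): a recursive depth-first total over a pointer-like tree, and its iterative counterpart that simulates the recursion with an explicit stack.

```python
# Minimal sketch: recursion vs. manual-stack iteration over a tree.
# The Node class and function names are illustrative, not from the cited papers.

class Node:
    def __init__(self, value=0.0, children=None):
        self.value = value              # value carried by the node (e.g. a leaf y-value)
        self.children = children or []  # dynamically allocated subtrees

def total_recursive(node):
    """Depth-first total, letting the call stack manage the recursion."""
    if not node.children:
        return node.value
    return sum(total_recursive(child) for child in node.children)

def total_iterative(root):
    """Same traversal, simulating the recursion with a manual stack."""
    total, stack = 0.0, [root]
    while stack:
        node = stack.pop()              # manual stack replaces the call stack
        if node.children:
            stack.extend(node.children)
        else:
            total += node.value
    return total

tree = Node(children=[Node(1.0), Node(children=[Node(2.0), Node(3.0)])])
assert total_recursive(tree) == total_iterative(tree) == 6.0
```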
Multistage sampling is also implemented "recursively" in software used by some statistical agencies. This is the case, for instance, of POULPE, a SAS macro program in use at INSEE in France [11,12]. We surmise that the recursion in POULPE is simulated, in a way that has not, however, been detailed in the literature. The main R packages devoted to probability sampling are 'sampling' and 'survey' [13]. In the 'sampling' package [14], to our knowledge variance estimation for multistage sampling is not handled. Conversely, package 'survey' [15,16] estimates the sampling variance by relying on recursive programming (the function 'multistage' calls itself). As far as we know, this is also the case for the 'ReGenesees' package, which inherits from 'survey' regarding multistage sampling [17].
In this technical article we show how multistage sampling can be implemented in a fully iterative way, that is, without simulating recursion or using pointers, by relying on simple (flat) array data structures. Moreover, the solution we propose does not require keeping all the tree information in memory to perform the statistical computations.
The fully iterative computer implementation detailed here should be useful in particular for handling very large multistage sampling configurations and for Monte Carlo simulations. It should also be of some pedagogical interest for the in-depth teaching of multistage sampling in a graduate course in computational statistics (even if we have to recognize that this is a somewhat specific technical topic).
Hence, for multistage sampling without replacement, the aim of the present article is to provide (1) a complete recurrent formulation thanks to an appropriate notation and (2) a comprehensive set of algorithms for a fully iterative implementation, ensuring efficient statistical computations. To our knowledge, these two technical points have not been documented to date in the statistical computing literature. We focus here on total estimation based on the Horvitz-Thompson estimator and fixed-size designs at all stages. Other estimators can be used without questioning the general organization of the algorithms, and variable-size designs can be accommodated in the general case by referring to the Horvitz-Thompson variance instead of the Sen-Yates-Grundy variance (see for instance [18, sec. 7.3] or [4, sec. 4.3.2]).

Recurrence formulas
In this section, we introduce basic notation conventions that we think appropriate for writing recurrence formulas for multistage sampling. We then express multistage sampling and inclusion probabilities with our notations, and we provide the formulas for estimation: (i) in the general case, that is, possibly with unequal inclusion probabilities for all designs at all stages; (ii) in the particular case of equal probabilities for all designs at all stages; and finally (iii) for self-weighted sampling.

Notations
Throughout the formulas section, indices will be denoted by a lower-case letter, and the set they belong to by the same letter in upper case. For instance, we will have the set of indices $I = \{i\}_{i=1}^{N}$ and we will be able to describe the elements indexed by $I$ simply by using set membership, that is, $i \in I$. For the sake of notational simplicity, in what follows we will use $i < j \in I$ in lieu of $i < j,\; i, j \in I$.
At each level $0 \le \ell \le L$ of the tree corresponding to the nested structure of multistage sampling units, the population is denoted $U_\ell = \{u_{\ell,i}\}_{i \in I_\ell}$, where $I_\ell = \{1, \dots, N_\ell\}$ is a set of indices (i.e. the unit labels), as in Daffin et al. [19, p. 512]. The root is at level $\ell = 0$ in the tree and will be denoted as the singleton set $U_0 = \{u_{0,1}\}$. At levels $0 \le \ell < L$, sampling units are populations of units from level $\ell + 1$. At the last level $\ell = L$, the sampling units are elementary units (that is, units which are not further decomposed), which constitute a population of size $N_L = N$. Considering clusters of elementary units as ultimate sampling units merely amounts to introducing a level with a 100% sampling fraction within the selected clusters. We will denote in a recurrent way: $U_{\ell,i}$ the set of sampling units at level $\ell$ which constitute the population corresponding to the parent unit $u_{\ell-1,i}$, and $N_{\ell,i}$ the cardinality of $U_{\ell,i}$. The index sequence $I_\ell$ is partitioned into subsequences corresponding to the $N_{\ell-1}$ parent units, that is:

$$I_\ell = \bigcup_{i \in I_{\ell-1}} I_{\ell,i}, \qquad I_{\ell,i} \cap I_{\ell,j} = \emptyset \;\text{ for } i \ne j.$$

Multistage sampling design
At stage $\ell = 0$ there is no sampling since, by definition, the unit $u_{0,1}$ has probability $\pi_{0,1} = 1$ of being included, so $s_0 = \{1\}$. At stages $1 \le \ell \le L$, one can apply a sampling design $d_{\ell,i}$ to each label subsequence $I_{\ell,i}$ associated with the units of the sub-population $U_{\ell,i}$. This sampling will be denoted by referring to the function $s_{\ell,i} = d_{\ell,i}(I_{\ell,i})$, with $s_{\ell,i}$ a sample of size $n_{\ell,i}$ ($1 < n_{\ell,i} \le N_{\ell,i}$). The multistage sampling design may be written recurrently as:

$$s_\ell = \bigcup_{i \in s_{\ell-1}} d_{\ell,i}(I_{\ell,i}), \qquad \ell = 1, \dots, L. \tag{1}$$

Once the $L$-stage sampling design is applied, we obtain a sample $s_L = s$ including $n_L = n$ elementary units for gathering values of the variable of interest $y$.
The recurrent formulation of multistage sampling allows any design $d_{\ell,i}$ to be used. Theoretically, the definition of designs at level $\ell > 1$ may depend on the sample $s_{\ell-1}$. Moreover, samplings at level $\ell > 1$ among labels $I_{\ell,i}$ and $I_{\ell,j}$ for $i \ne j \in s_{\ell-1}$ are not necessarily independent. From this generality it follows that this class of sampling designs may be rather complex. In practice, only a sub-class of multistage sampling designs is used, by imposing both invariance and independence. The invariance property implies that the same design $d_{\ell,i}$ will be applied to $I_{\ell,i}$ whatever the sample $s_{\ell-1}$ which includes the parent unit $u_{\ell-1,i}$. The independence property implies that sampling inside each set of labels belonging to the same level will be carried out independently. For more details about invariance and independence, the reader is referred to Särndal et al. [4].

Inclusion probabilities
With each design $d_{\ell,i}$ is associated a vector $\pi_{\ell,i}$ containing the $N_{\ell,i}$ first-order inclusion probabilities $\pi_{\ell,j}$, $j \in I_{\ell,i}$, and a matrix $\Pi_{\ell,i}$ containing the $N_{\ell,i} \times N_{\ell,i}$ second-order (or joint) inclusion probabilities $\pi_{\ell,jk}$, $j, k \in I_{\ell,i}$ (with $\pi_{\ell,jj} = \pi_{\ell,j}$ on the diagonal).
For $U_\ell$ we may denote the first-order inclusion probabilities as the sequence of the probability vectors for all sampling designs at level $\ell$; the resulting vector is indexed by the indices from $I_\ell$. At level $\ell$, we denote $\pi^*_{\ell,i}$ the overall (or final) inclusion probability for unit $u_{\ell,i}$ resulting from the application of the sampling designs from level 1 to $\ell$. As a result of both invariance and independence of the sampling designs, we get the recurrence:

$$\pi^*_{\ell,j} = \pi^*_{\ell-1,i} \, \pi_{\ell,j} \quad \text{for } j \in I_{\ell,i}, \qquad \pi^*_{0,1} = 1. \tag{2}$$

In the literature devoted to multistage sampling, this relation is sometimes known as the selection equation [20,21].
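For instance (the figures are ours, not from the paper's example): in a two-stage design where a primary unit is included with probability $\pi^*_{1,i} = 1/5$ and an elementary unit within it has conditional inclusion probability $\pi_{2,j} = 1/10$, the selection equation gives

$$\pi^*_{2,j} = \pi^*_{1,i} \, \pi_{2,j} = \frac{1}{5} \times \frac{1}{10} = \frac{1}{50}.$$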

Unequal probability sampling designs at all stages
In this section we consider the recurrence formulation for the general case, allowing unequal inclusion probabilities at all stages, for all sampling designs.
The total may be estimated using an expansion estimator according to the following recurrence:

$$\hat{Y}_{\ell-1,i} = \sum_{j \in s_{\ell,i}} \frac{\hat{Y}_{\ell,j}}{\pi_{\ell,j}}, \qquad \hat{Y}_{L,j} = y_j. \tag{3}$$

The total estimator is then $\hat{Y}_{0,1}$. Its sampling variance may be written using the recurrence:

$$V_{\ell-1,i} = V^*_{\ell-1,i} + \sum_{j \in I_{\ell,i}} \frac{V_{\ell,j}}{\pi_{\ell,j}}, \qquad V_{L,j} = 0. \tag{4}$$

The sampling variance is then $V_{0,1}$. When joint inclusion probabilities can be managed (that is, both calculated and stored), and because we consider only fixed-size designs, the term $V^*_{\ell-1,i}$ may be the Sen-Yates-Grundy variance:

$$V^*_{\ell-1,i} = \frac{1}{2} \sum_{j \ne k \in I_{\ell,i}} \left( \pi_{\ell,j}\pi_{\ell,k} - \pi_{\ell,jk} \right) \left( \frac{Y_{\ell,j}}{\pi_{\ell,j}} - \frac{Y_{\ell,k}}{\pi_{\ell,k}} \right)^2. \tag{5}$$

The sampling variance estimator may be written using the recurrence:

$$\hat{V}_{\ell-1,i} = \hat{V}^*_{\ell-1,i} + \sum_{j \in s_{\ell,i}} \frac{\hat{V}_{\ell,j}}{\pi_{\ell,j}}, \qquad \hat{V}_{L,j} = 0. \tag{6}$$

The sampling variance estimator is then $\hat{V}_{0,1}$. The term $\hat{V}^*_{\ell-1,i}$ may be the Sen-Yates-Grundy unbiased variance estimator:

$$\hat{V}^*_{\ell-1,i} = \frac{1}{2} \sum_{j \ne k \in s_{\ell,i}} \frac{\pi_{\ell,j}\pi_{\ell,k} - \pi_{\ell,jk}}{\pi_{\ell,jk}} \left( \frac{\hat{Y}_{\ell,j}}{\pi_{\ell,j}} - \frac{\hat{Y}_{\ell,k}}{\pi_{\ell,k}} \right)^2, \tag{7}$$

with $\pi_{\ell,jk} > 0$. At least for high-entropy designs, when joint inclusion probabilities cannot be managed, a (biased) variance estimator may be used instead of (7) (see the references cited by Aubry [22]). One possibility is the approximation of [5,23], which requires only first-order inclusion probabilities (Eqs. (8) and (9)). When all sampling designs are PPSWOR (Probability Proportional to Size WithOut Replacement), the size is computed at all levels $1 \le \ell < L$ from the values defined for the population of elementary units. For $i \in I_L$, we denote $x_i$ the value of the size variable associated with the elementary unit $u_{L,i}$. The total of the $x_i$ can be decomposed by the following recurrence:

$$X_{\ell-1,i} = \sum_{j \in I_{\ell,i}} X_{\ell,j}, \qquad X_{L,j} = x_j. \tag{10}$$

The total is then $X_{0,1}$.
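To make the recurrences concrete, here is a minimal recursive reference sketch in Python. It is not the iterative implementation proposed in this article, and the dict-based node structure and all names are our own illustrative assumptions.

```python
# Minimal recursive reference sketch of recurrences (3), (6) and (7).
# The dict-based node structure and all names are ours, for illustration only:
# a node holds its sampled children, their first-order probabilities "pi",
# the joint probabilities "pij" (keyed by child positions), and "y" at leaves.

def ht_total(node):
    """Expansion (Horvitz-Thompson) estimator, recurrence (3)."""
    if not node["children"]:                 # elementary unit
        return node["y"]
    return sum(ht_total(c) / c["pi"] for c in node["children"])

def ht_var(node):
    """Unbiased variance estimator, recurrences (6) and (7)."""
    kids = node["children"]
    if not kids:
        return 0.0
    v = 0.0
    for a in range(len(kids)):               # Sen-Yates-Grundy term (7)
        for b in range(a + 1, len(kids)):
            pj, pk = kids[a]["pi"], kids[b]["pi"]
            pjk = node["pij"][(a, b)]
            v += ((pj * pk - pjk) / pjk
                  * (ht_total(kids[a]) / pj - ht_total(kids[b]) / pk) ** 2)
    for c in kids:                           # lower-stage components of (6)
        v += ht_var(c) / c["pi"]
    return v

# One stage, SRSWOR of n = 2 out of N = 3: pi = 2/3, pij = 1/3.
leaf = lambda y: {"y": y, "pi": 2 / 3, "children": [], "pij": {}}
root = {"pi": 1.0, "children": [leaf(3.0), leaf(5.0)], "pij": {(0, 1): 1 / 3}}
print(ht_total(root), ht_var(root))          # 12.0 and 3.0
```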

Equal probability sampling designs at all stages
When every population $U_{\ell,i}$ is sampled using SRSWOR (Simple Random Sampling WithOut Replacement) of size $n_{\ell,i}$ out of $N_{\ell,i}$ ($1 < n_{\ell,i} \le N_{\ell,i}$), the formulas become considerably simpler. Indeed, up to second order, inclusion probabilities are simply (for $j \ne k \in I_{\ell,i}$):

$$\pi_{\ell,j} = \frac{n_{\ell,i}}{N_{\ell,i}}, \qquad \pi_{\ell,jk} = \frac{n_{\ell,i}(n_{\ell,i}-1)}{N_{\ell,i}(N_{\ell,i}-1)}. \tag{11}$$

It follows that the recurrence for the total estimator is now written:

$$\hat{Y}_{\ell-1,i} = \frac{N_{\ell,i}}{n_{\ell,i}} \sum_{j \in s_{\ell,i}} \hat{Y}_{\ell,j}. \tag{12}$$

The recurrence for the sampling variance is now:

$$V_{\ell-1,i} = V^*_{\ell-1,i} + \frac{N_{\ell,i}}{n_{\ell,i}} \sum_{j \in I_{\ell,i}} V_{\ell,j}, \tag{13}$$

with

$$V^*_{\ell-1,i} = N_{\ell,i}^2 \, \frac{1 - n_{\ell,i}/N_{\ell,i}}{n_{\ell,i}} \, S^2_{\ell,i}, \qquad S^2_{\ell,i} = \frac{1}{N_{\ell,i}-1} \sum_{j \in I_{\ell,i}} \left( Y_{\ell,j} - \bar{Y}_{\ell,i} \right)^2,$$

where $\bar{Y}_{\ell,i}$ is the mean of the $Y_{\ell,j}$ over $I_{\ell,i}$; accordingly, the recurrence for the unbiased sampling variance estimator is:

$$\hat{V}_{\ell-1,i} = \hat{V}^*_{\ell-1,i} + \frac{N_{\ell,i}}{n_{\ell,i}} \sum_{j \in s_{\ell,i}} \hat{V}_{\ell,j}, \tag{14}$$

with

$$\hat{V}^*_{\ell-1,i} = N_{\ell,i}^2 \, \frac{1 - n_{\ell,i}/N_{\ell,i}}{n_{\ell,i}} \, s^2_{\ell,i}, \qquad s^2_{\ell,i} = \frac{1}{n_{\ell,i}-1} \sum_{j \in s_{\ell,i}} \left( \hat{Y}_{\ell,j} - \bar{\hat{Y}}_{\ell,i} \right)^2,$$

where $\bar{\hat{Y}}_{\ell,i}$ is the mean of the $\hat{Y}_{\ell,j}$ over $s_{\ell,i}$.
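As an illustration of recurrences (12) and (14) for two stages, here is a minimal Python sketch (the function and the data are ours, assuming SRSWOR at both stages):

```python
# Sketch of recurrences (12) and (14) for two-stage SRSWOR (names are ours).
from statistics import variance  # sample variance with denominator n - 1

def srswor_estimates(N, clusters):
    """clusters: list of (N_i, y_sample) pairs for the sampled PSUs, where
    y_sample holds the y-values observed within PSU i by SRSWOR.
    N is the number of PSUs in the population. Returns (total, variance)."""
    n = len(clusters)
    psu_totals, psu_vars = [], []
    for N_i, ys in clusters:
        n_i = len(ys)
        t_i = N_i / n_i * sum(ys)                            # recurrence (12), stage 2
        v_i = N_i**2 * (1 - n_i / N_i) / n_i * variance(ys)  # estimator (14), stage 2
        psu_totals.append(t_i)
        psu_vars.append(v_i)
    total = N / n * sum(psu_totals)                          # recurrence (12), stage 1
    v = (N**2 * (1 - n / N) / n * variance(psu_totals)       # first-stage component
         + N / n * sum(psu_vars))                            # second-stage component
    return total, v

print(srswor_estimates(10, [(8, [1.0, 2.0]), (6, [3.0, 1.0])]))
```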

Self-weighted multistage sampling
With our notations, the weight of an elementary unit $u_{L,i}$ is $w_{L,i} = 1/\pi^*_{L,i}$ for $i \in I_L$. A design is self-weighted when all elementary units have the same weight, that is, when $\pi^*_{L,i}$ is the same for all $i \in I_L$. For the sake of simplicity, this common inclusion probability will be denoted $\pi^*$.
Let $X_{\ell,i}$ be the size of the sampling unit $u_{\ell,i}$ in number of underlying elementary units. We have the following definition by recurrence:

$$X_{\ell-1,i} = \sum_{j \in I_{\ell,i}} X_{\ell,j}, \qquad X_{L,j} = 1. \tag{15}$$

Obtaining a self-weighted $L$-stage sampling design requires the selection of samples of constant size $m_\ell$ among the units of $U_{\ell,i}$ ($i \in I_{\ell-1}$) for $\ell = 1, \dots, L$, with probability proportional to their size, that is:

$$\pi_{\ell,j} = m_\ell \, \frac{X_{\ell,j}}{X_{\ell-1,i}} \quad \text{for } j \in I_{\ell,i}. \tag{16}$$

Applying recurrence (2), it follows that the constant overall inclusion probability for elementary units is equal to $\pi^* = n/N$, with:

$$n = \prod_{\ell=1}^{L} m_\ell. \tag{17}$$

At the last stage $L$, expression (16) shows that it is necessary to include elementary units with the following probability:

$$\pi_{L,j} = \frac{m_L}{X_{L-1,i}} \quad \text{for } j \in I_{L,i}, \tag{18}$$

which does not depend on $j$. In the most general case, for $\ell = 1, \dots, L$, the cardinalities $N_{\ell,i}$ ($i \in I_{\ell-1}$) differ, and therefore the sizes $X_{\ell,j}$ ($j \in I_{\ell,i}$) also differ. In this case, units at levels $1 \le \ell < L$ need to be selected using PPSWOR with inclusion probabilities (16), whilst elementary units have to be selected using SRSWOR with probabilities (18). Although the last stage is SRSWOR, we rely here on the general formulation.
In the simplest case, for $\ell = 1, \dots, L$ the cardinalities are equal, that is, $M_\ell = N_{\ell,i}$ ($i \in I_{\ell-1}$), and therefore the sizes $X_{\ell,j} = N/N_\ell$ ($j \in I_{\ell,i}$) are also equal, and the same arises for the first-order inclusion probabilities:

$$\pi_{\ell,j} = \frac{m_\ell}{M_\ell}. \tag{19}$$

Applying recurrence (2), it follows, as expected:

$$\pi^* = \prod_{\ell=1}^{L} \frac{m_\ell}{M_\ell} = \frac{n}{N}.$$

In this limiting case, sampling units are selected by SRSWOR at each stage with probabilities $\pi_{\ell,j} = m_\ell / M_\ell$.
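For illustration (the figures are ours, not from the paper's example): with $L = 2$, $M_1 = 100$ primary units each containing $M_2 = 10$ elementary units (so $N = 1000$), and constant sample sizes $m_1 = 10$ and $m_2 = 2$,

$$\pi^* = \frac{m_1}{M_1} \times \frac{m_2}{M_2} = \frac{10}{100} \times \frac{2}{10} = \frac{20}{1000} = \frac{n}{N}.$$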

Algorithms
In this section we provide algorithms for full-iterative computer implementation of multistage sampling. The principle that guided algorithm elaboration is that the tree corresponding to the hierarchical nesting of sampling units is represented by arrays of which only a part remains useful for statistical computations. Hence, the major part of the required computer memory is used only temporarily and can be released as soon as the preliminary steps are achieved. Moreover, no sparse arrays are used so that no memory places are wasted.
Although we do not provide a theoretical analysis of the time and space complexity of the proposed algorithms (a topic far beyond the scope of this paper), we are confident in their efficiency, especially when implemented in a compiled programming language usually used for statistical computing, such as C/C++ or FORTRAN (possibly for interoperating with the R programming environment).

Notations
Algorithms (procedures or functions) are given as pseudocode, with classical control structures. There are three possible passing modes for a procedure parameter depending on whether it is only on input ("in" mode; its value remains unchanged), only on output ("out" mode; its value is defined within the procedure) or both on input/output ("in-out" mode; its value is modified within the procedure). We put a down-arrow, an up-arrow or a down-up-arrow above a parameter's symbol for specifying in, out, and in-out passing modes, respectively. By definition, all parameters of a function are in input-only passing mode and therefore down-arrows will be omitted.

Hierarchical nesting of units
The hierarchical nesting of units leads to a tree whose nodes correspond to sampling units (primary sampling units, secondary sampling units and so forth, until elementary sampling units). Without loss of generality, we illustrate the tree structure involved in multistage sampling with a small example for L = 3 (Fig. 2).
In practice the tree information is given by the data. All the information needed is a label for each unit at levels $\ell = 1, \dots, L$, as well as the membership of each unit of level $\ell$ within a unit of level $\ell - 1$. The data corresponding to our example are illustrated in Fig. 3(a).
The same data can be stored as shown in Fig. 3 (b), with an access array M containing indices of arrays N and O for delimiting the levels, array N containing the unit labels, and array O containing the membership information, that is, the label of the parent unit.
For performing the statistical computations we deal with in this article, it may at first glance seem necessary to have in memory the tree path of every node. This is, for instance, the case for a recursive implementation such as that of Bellhouse [8,9] and Rylett and Bellhouse [10]. In this paper, not only do we not convert the data into a recursive data structure representing the corresponding tree (see the example in Fig. 2), but we moreover propose a solution where paths are used only in a preliminary step.
We manipulate a set of four arrays: (i) A containing the level of each node (between 0 for the root and L for the leafs); (ii) B containing the path of each node; (iii) C containing the number of children of each node; (iv) D for efficient node browsing at a given level. Sorting the arrays A, B and C is a prerequisite for the subsequent treatments.
The conversion from arrays M and O describing the dataset is ensured by Algorithm 1, which returns (unsorted) arrays A and B. Note that the 0-filling step at the end of the algorithm is irrelevant when the computer implementation is done in a programming language where arrays are automatically initialized with 0, or when a specific syntax exists for doing such an initialization in fewer instructions. Of course, it is possible to convert the (sorted) arrays A, B describing the tree into a dataset using the reverse conversion Algorithm 2, which returns arrays M, N and O.
Algorithm 1 Converting a dataset to the corresponding tree.

Algorithm 2 Converting a tree to the corresponding dataset.

Information contained in arrays A and B allows generating a reordering index T according to Algorithm 3. In the latter, note that the choice of the sorting algorithm is left to the discretion of the reader. Thanks to T, arrays A and B can be reordered in place using Algorithm 4. For this procedure, an upper bound (UB) lower than the lower bound (LB) for the number of columns indicates that the array to be reordered (G) is one-dimensional. Once the reordering of A and B is achieved, the access array D is built using Algorithm 5. Note that as soon as D is built, we have of course $N1 = D_{L,2}$ for the total number of nodes N1; hence, when D is a parameter, it is not necessary to also pass N1. Finally, the number of children of each node can be computed according to Algorithm 6. For our example, the set of arrays A, B, C and D is illustrated in Fig. 3(c). Note that arrays A and B are necessary for obtaining C in proper order and for building D according to the simple Algorithm 5, but later on they are no longer useful for statistical computations and the corresponding memory may be released.

For testing or Monte Carlo simulation purposes, it is convenient to be able to automatically generate the hierarchical nesting of units for any number of stages. This is what Algorithm 7 does; in this paper, it is the only algorithm which simulates the recursion associated with a tree. Given the number of stages L and an interval for random population sizes (MIN, MAX), it returns the total number of nodes in the tree (N1) and arrays A, B and C. Next, Algorithms 3 and 4 have to be used before executing Algorithm 5. After that, as previously, the memory for arrays A and B may be released.
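To make the array pipeline concrete, here is a minimal Python sketch (ours; the array names A, B, C, D, M, N, O follow the paper, but the construction details are our own assumptions, including that unit labels are unique within each level). It builds the levels A and paths B, sorts the nodes, then derives the per-level access ranges D and the children counts C for a small example.

```python
# Sketch of the flat-array tree representation (names follow the paper;
# the construction details are our own assumptions).

def build_tree_arrays(levels):
    """levels[l] is a list of (label, parent_label) pairs for level l+1.
    Returns (A, B, C, D): level, path, children count, per-level ranges."""
    nodes = [(0, (1,))]                      # root: level 0, path (1,)
    path_of = {(0, 1): (1,)}
    for l, units in enumerate(levels, start=1):
        for label, parent in units:
            path = path_of[(l - 1, parent)] + (label,)
            path_of[(l, label)] = path
            nodes.append((l, path))
    nodes.sort()                             # sort by (level, path): Algorithms 3-4
    A = [lv for lv, _ in nodes]
    B = [p for _, p in nodes]
    L = max(A)
    D = [(A.index(l) + 1, len(A) - A[::-1].index(l)) for l in range(L + 1)]
    C = [sum(1 for lv, p in nodes if lv == a + 1 and p[:-1] == b)
         for a, b in nodes]                  # children counts, as in Algorithm 6
    return A, B, C, D

# Two PSUs, each with two elementary units (labels are global here).
A, B, C, D = build_tree_arrays([[(1, 1), (2, 1)],
                                [(1, 1), (2, 1), (3, 2), (4, 2)]])
print(A)  # [0, 1, 1, 2, 2, 2, 2]
print(D)  # [(1, 1), (2, 3), (4, 7)]
print(C)  # [2, 2, 2, 0, 0, 0, 0]
```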

Multistage sampling without replacement
Let S be a one-dimensional array containing the sample sizes associated with the nodes at levels $1 \le \ell \le L$. Multistage sampling without replacement, as formalized by recurrence formula (1), may be implemented using Algorithm 8, which generates an indicator array I. As a great diversity of sampling designs may be used, details are left to the discretion of the reader. When PPSWOR is used, the array TX must be computed beforehand using Algorithm 9 (a mere translation of recurrence formula (10)), where X is a one-dimensional array corresponding to a size variable x.
Algorithm 8 Multistage sampling without replacement.

Algorithm 9 Computing the total for all nodes in the tree.
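In the same spirit as Algorithm 8, the following Python sketch (ours; SRSWOR is used at every node purely for illustration) walks the sorted node order level by level, drawing a within-node sample only when the node itself belongs to the sample.

```python
import random

# Sketch in the spirit of Algorithm 8 (ours): top-down multistage sampling
# over the flat arrays, using SRSWOR at every node purely for illustration.
# C[k] = number of children of node k; S[k] = sample size drawn at node k;
# D[l] = (1-based) range of node indices at level l; children of successive
# nodes at one level are laid out consecutively at the next level.

def multistage_sample(C, S, D, L):
    I = [False] * len(C)
    I[0] = True                          # the root is always included
    h = 1                                # index of the first child of node 0
    for level in range(L):               # going down the tree from the root
        lo, hi = D[level]
        for k in range(lo - 1, hi):      # browse the nodes at this level
            if I[k]:                     # node is in the sample: draw within it
                for j in random.sample(range(C[k]), S[k]):
                    I[h + j] = True
            h += C[k]                    # move past this node's children block
    return I

# Toy tree from the previous sketch: a root, 2 PSUs, 2 leaves per PSU.
C = [2, 2, 2, 0, 0, 0, 0]
D = [(1, 1), (2, 3), (4, 7)]
S = [1, 1, 1, 0, 0, 0, 0]                # select 1 PSU, then 1 unit within it
print(multistage_sample(C, S, D, L=2))
```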

Statistical computations for SRSWOR at all stages
Let Y be the array corresponding to the variable of interest y. In a Monte Carlo simulation, as y is known for all the elementary units, Algorithm 9 applied to Y returns TY. Then it is possible to apply recurrence formula (13), translated into Algorithm 10.
In practice, Y contains missing values for the elementary units not in the final sample. Recurrence formulas (12) and (14) translate into Algorithms 11 and 12, respectively, where the array I corresponds to the sample indicator. Note that, in Algorithms 10 and 12, the one-pass computation of the sum of squared deviations (ss) is performed using the numerically stable updating formula proposed by Youngs and Cramer [24] (see also [25, sec. 2.4.4]).
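For reference, the Youngs-Cramer updating formula keeps a running total $t_k$ and updates the sum of squared deviations as $ss_k = ss_{k-1} + (k\,y_k - t_k)^2 / (k(k-1))$. A minimal Python sketch (ours):

```python
# One-pass sum of squared deviations via the Youngs-Cramer updating formula.
def sum_squared_deviations(values):
    t, ss = 0.0, 0.0
    for k, y in enumerate(values, start=1):
        t += y                                   # running total
        if k > 1:                                # update is defined from k = 2 on
            ss += (k * y - t) ** 2 / (k * (k - 1))
    return ss

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(data) / len(data)
assert abs(sum_squared_deviations(data) - sum((y - mean) ** 2 for y in data)) < 1e-12
```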

Statistical computations for PPSWOR at all stages
Let X be the one-dimensional array corresponding to the size variable x, positively correlated with the variable of interest y. First, Algorithm 9 is applied to X, returning TX. Next, the unequal inclusion probabilities have to be stored. First-order inclusion probabilities are obviously stored in a one-dimensional array (P). For computing the overall first-order inclusion probabilities (P*), recurrence formula (2) translates into Algorithm 13. When joint inclusion probabilities can be managed, they have to be stored in an efficient way, that is, in a one-dimensional array. For the complete set of units involved in a multistage sampling situation, the matrix of joint inclusion probabilities is block diagonal. Owing to the symmetry of the joint inclusion probabilities and to the block-diagonal structure, it turns out that a very efficient storage requires only two one-dimensional arrays: (i) a small one (F) for indicating submatrix changes; (ii) a larger one (PP) to store all joint inclusion probabilities (without duplication). These arrays are generated using Algorithm 14. When joint inclusion probabilities cannot be managed, the F and PP arrays are irrelevant, and Algorithm 14 has to be simplified accordingly (this straightforward modification is left to the reader). Since a great diversity of sampling designs may be used, the parameters required for computing the matrices are omitted and details are left to the discretion of the reader. Finally, retrieving joint inclusion probabilities requires Algorithm 15 which, in turn, relies on Algorithm 16.
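The within-block offset arithmetic visible in Algorithm 16 amounts to packing the strict upper triangle of each symmetric block row by row. A minimal Python sketch of this indexing (ours, under the assumption of 1-based unit indices $i < j$ within a block of size $n$):

```python
# Packing the strict upper triangle of a symmetric n x n block into a
# one-dimensional array, row by row (1-based i < j, as in Algorithm 16).

def packed_index(i, j, n):
    """0-based position of pair (i, j), i < j, in the packed block."""
    e = 0
    for m in range(1, i):        # skip the full rows above row i
        e += n - m
    return e + (j - i) - 1       # offset within row i

n = 4
pairs = [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]
assert [packed_index(i, j, n) for i, j in pairs] == list(range(len(pairs)))
```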

Algorithm 14 Storage of first- and second-order inclusion probabilities.

When Monte Carlo simulations are performed, computing the theoretical sampling variance in the general case (that is, with unequal inclusion probabilities) requires recurrence formula (4), translated into Algorithm 17. Exact computation needs formula (5) (Sen-Yates-Grundy variance), which translates into Algorithm 18 and returns the array V required by Algorithm 17.

Algorithm 15 Returning the joint inclusion probability for $i \ne j$ in the subpopulation of size N corresponding to the node k.

Unbiased estimation of the population total is achieved using recurrence formula (3), translated into Algorithm 19. Estimating the sampling variance requires recurrence formula (6), translated into Algorithm 20. When joint inclusion probabilities can be managed, the node variance estimates can be computed using formula (7), translated into Algorithm 21, which returns V, a parameter of Algorithm 20. Otherwise, one possibility is to use approximation (8), translated into Algorithm 22, which only requires first-order inclusion probabilities.

Method validation
This section aims to provide both a numerical illustration and a check of the validity of the statistical computation algorithms proposed in this article.
Generating artificial multistage sampling configurations for any number of stages (Algorithm 7) is a very convenient way of applying and validating all the algorithms proposed in this article on very large examples. Nevertheless, there is no loss of generality in referring to the small example used throughout this paper, with L = 3 stages and between MIN = 3 and MAX = 4 children for each node at levels $1 \le \ell < L$.
A self-weighted multistage sampling design is a good candidate for testing purposes since it involves unequal probability sampling. Besides, the self-weighting property is very easy to check thanks to the selection equation (2); an error in this regard would immediately indicate an implementation error. As PPSWOR design we refer here to the Hanurav-Vijayan procedure, which is appropriate provided it is correctly implemented [22]. For the sake of completeness, we also provide the explicit expression of the theoretical sampling variance (20), which translates into Algorithm 23. This example shows how cumbersome the expression of the variance becomes when the number of stages increases. The terms $s_{1,1}$, $s_{1,2}$, $s_{2,1}$ and $s_{2,2}$ are indicated so that their counterparts in Algorithm 23 can easily be identified.
Algorithm 17 Theoretical sampling variance in the general case.

Algorithm 18 Computing node variances (Sen-Yates-Grundy).
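In the spirit of Algorithm 17, the theoretical variance recurrence (4) can be accumulated bottom-up over the flat arrays. A minimal Python sketch (ours, under the assumption that V holds the Sen-Yates-Grundy node variances from Algorithm 18 and P the first-order inclusion probabilities, with the array conventions of the earlier sketches):

```python
# Sketch in the spirit of Algorithm 17 (ours): recurrence (4) accumulated
# bottom-up over the flat arrays. V[k] is the node variance (5) of node k,
# P[i] the first-order inclusion probability of node i, C[k] its number of
# children, D the per-level (1-based) node ranges, L the number of stages.

def theoretical_variance(C, D, P, V, L):
    v = [0.0] * len(C)                     # v[k] accumulates recurrence (4)
    for level in range(L - 1, -1, -1):     # going up the tree from level L - 1
        lo, hi = D[level]
        h = D[level + 1][0] - 1            # first child index at the next level
        for k in range(lo - 1, hi):
            s = sum(v[i] / P[i] for i in range(h, h + C[k]))
            v[k] = V[k] + s
            h += C[k]
    return v[0]                            # the sampling variance sits at the root
```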

Theoretical first-order (P) and overall (P*) inclusion probabilities, as well as estimates of the latter ($\hat{P}^*$) obtained after $10^7$ replications of sampling [22, sec. 4.1], are given in Fig. 4(a). An example of a sample is given in column I (sample indicator), and the result of procedure 19 appears in column HTE (the total estimate is $\hat{Y} = 15.655$). The sampling variance computed by function 17 is $V(\hat{Y}) = 7.875$. As expected, this is also the value obtained using Algorithm 23. We can estimate the sampling variance using either the Sen-Yates-Grundy variance estimator (7) (case 1) or approximation (8) (case 2). In the former case, the computations require the second-order inclusion probabilities given in Fig. 4(b). The sampling variance estimates for the sample given in Fig. 4(a) are $\hat{V}(\hat{Y}) = 18.945$ for case 1 and $\hat{V}(\hat{Y}) = 13.931$ for case 2. As expected, the Monte Carlo estimate of the sampling variance is close to its theoretical value, since we obtain 7.870. Finally, the Monte Carlo estimate of the expectation of $\hat{V}(\hat{Y})$ is 7.873 for case 1 (again very close to the theoretical value) and 7.652 for case 2, which indicates a slight underestimation due to the approximation and illustrates that estimator (8) tends to be down-biased.

Additional information
In practice, two-stage and three-stage sampling designs are rather common, and four-stage sampling is not exceptional. Conversely, multistage sampling beyond five stages does not seem to be used.
When sampling geographical space, multistage sampling is very useful from a logistical point of view for reducing the cost of travelling between the elementary spatial sampling units. It follows that, in environmental sciences for instance, multistage sampling is part of the toolbox for designing large-scale natural resource surveys, often in combination with stratification. The multistage sampling of nested spatial sampling units also meets the concern of being able to study a phenomenon at several spatial scales.
The data collected on the elementary units are not always devoted only to estimating finite population parameters, and may rather be seen as all-purpose data. As such, their analysis and modeling should be easy, which implies that all units have the same weight, that is, the same overall inclusion probability. Otherwise, specific versions of the usual statistical methods should be used. It is always risky to communicate data that cannot be processed correctly using classical statistical methods. Hence, to avoid this risk, it is advisable that the multistage sampling design be self-weighting. It is worth noting that self-weighted multistage sampling also allows balancing the workload among the high-level units, which, again, may be interesting from a logistical point of view.

Algorithm 22
Computing approximate variance estimates.

Declaration of Competing Interest
The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.