Quasi-systematic sampling from a continuous population
Introduction
We propose to use a specific family of point processes to select samples for the purpose of estimating the mean or the integral of a function of a real variable. We draw a parallel with sampling designs, which are themselves point processes on finite spaces. Systematic sampling is widely used in finite population sampling. It was introduced by Madow and Madow (1944) and Madow (1949). It is easily implemented and, by spreading the sample over the population, it yields precise estimators of means and totals when the variable of interest takes similar values for neighboring units. The main drawback of systematic sampling is that most joint inclusion probabilities of pairs of units are zero, which makes it impossible to estimate the variance of the Horvitz–Thompson estimator without bias (see Horvitz and Thompson, 1952).
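The drawback of finite-population systematic sampling mentioned above can be checked by brute force on a toy population. The sketch below is only an illustration: the function names, the population size of 12 and the sample size of 3 are hypothetical choices, and it assumes the sample size divides the population size so that the sampling interval is an integer.

```python
from itertools import combinations


def systematic_samples(N, n):
    """List every systematic sample of size n from units 0..N-1 (assumes n divides N)."""
    if N % n != 0:
        raise ValueError("this sketch assumes that n divides N")
    a = N // n  # sampling interval, which is also the number of possible samples
    return [tuple(range(start, N, a)) for start in range(a)]


def joint_inclusion_probabilities(N, n):
    """Map each pair (k, l) with k < l to the probability that both units are selected."""
    samples = systematic_samples(N, n)
    prob = {pair: 0.0 for pair in combinations(range(N), 2)}
    for s in samples:
        for pair in combinations(s, 2):
            # The random start is uniform, so each sample has probability 1 / a.
            prob[pair] += 1.0 / len(samples)
    return prob


probs = joint_inclusion_probabilities(N=12, n=3)
zero_pairs = sum(1 for p in probs.values() if p == 0.0)
print(f"{zero_pairs} of {len(probs)} pairs have zero joint inclusion probability")
```

In this toy case only 12 of the 66 pairs can ever be selected together, so 54 pairs have zero joint inclusion probability and the usual unbiased variance estimators cannot be computed.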
The aim of this paper is to develop a method that is a compromise between a base point process, such as the Poisson process or the binomial process, and the systematic process for sample selection in a continuous population. A similar objective is pursued in Breidt (1995) in a finite population setting supported by a superpopulation model. Breidt (1995) considers one-per-stratum sampling designs from a population that is split into equal-sized strata of successive units, where the stratum size divides the population size. He introduces a class of sampling procedures that encompasses systematic sampling with constant rate and simple random sampling of one unit per stratum.
Point processes, which we refer to as sampling processes in the context of sampling, are the subject of a vast literature (see for example Daley and Vere-Jones, 2002, 2008, and references therein). Cordy (1993) and Deville (1989) independently introduced the continuous analogue of the Horvitz–Thompson estimator for infinite population sampling. Different communities have studied point processes: mathematical physicists, probabilists and statisticians. A detailed state of the art in the study and simulation of some complex point processes can be found in Møller and Waagepetersen (2003, 2007). Many simulation methods for point processes are implemented in the R package spatstat (Baddeley and Turner, 2005).
We introduce a new family of sampling methods that make it possible to tune continuously the distance between units in the sample. These processes yield small probabilities of jointly selecting neighboring units. They are particularly efficient when the function of interest is smooth. Moreover, their joint inclusion densities are positive, so the sampling variance can be estimated without bias.
The paper is organized as follows. In Section 2, we give a definition of sampling processes in continuous populations and we define the Poisson process, the binomial process and the systematic process. Important results of renewal process theory are recalled in Section 3. In Section 4, we define the systematic-Poisson and the systematic–binomial processes, which depend on a tuning parameter, and compute their joint densities. Section 5 contains proofs for the asymptotic processes obtained when the tuning parameter tends to infinity. Simulations are presented in Section 6 and our ideas on the choice of the tuning parameter in Section 7. Finally, we give a brief discussion of the method and its advantages in Section 8.
Sampling from a continuous population
Following Macchi (1975) (see also Moyal, 1962), a finite sample of a given size from a population, taken to be a bounded and open set, is a collection of that many units, without consideration for their order. This definition matches those commonly used in finite population sampling (see for example Cochran, 1977, for an introduction to finite population sampling theory). A sampling process is a probability distribution on the space of all such collections, for all sample sizes. Note that it is not directly a
Renewal processes
A renewal process, or renewal sequence, is a stochastic process defined on the positive real line. It is completely characterized by the distribution of its independent and identically distributed inter-arrival times. For example, the Poisson process is a renewal process with exponentially distributed inter-arrival times when its intensity is constant. The following definition can be found in Mitov and Omey (2014). Definition 3.1 Renewal Process A renewal process is any process with
Quasi-systematic sampling
Our aim is to propose new sampling processes that make it possible to control the selection probability of neighboring units by adjusting the joint inclusion density. Spreading the sample units over the population has some advantages when units close together are similar (e.g. when the function of interest has small variations).
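One way to picture a process that sits between the Poisson and the systematic process is sketched below. It is an illustration under assumptions, not necessarily the construction used in this paper. It keeps every `r`-th point of a Poisson process of intensity `r * lam`, starting from a uniformly random phase. This gives a stationary renewal process with gamma (Erlang) inter-arrival times and intensity `lam`. The names `quasi_systematic_sample`, `lam`, `r` and `length` are hypothetical.

```python
import numpy as np


def quasi_systematic_sample(lam, r, length, rng):
    """Sketch (assumption): every r-th point of a Poisson process of rate r*lam on (0, length)."""
    # A homogeneous Poisson process on (0, length): Poisson count, then sorted uniforms.
    n_fine = rng.poisson(r * lam * length)
    fine = np.sort(rng.uniform(0.0, length, size=n_fine))
    # A uniformly random phase in {0, ..., r-1} keeps the thinned process stationary.
    phase = rng.integers(r)
    return fine[phase::r]


rng = np.random.default_rng(0)
for r in (1, 4, 64):
    sample = quasi_systematic_sample(lam=50.0, r=r, length=1.0, rng=rng)
    gaps = np.diff(sample)
    # The coefficient of variation of the gaps shrinks as r grows (more regular spacing).
    print(f"r={r:3d}  n={sample.size:3d}  gap CV={gaps.std() / gaps.mean():.3f}")
```

With `r = 1` the sketch reduces to a Poisson process. As `r` grows, the gaps concentrate around `1 / lam` and the sample looks increasingly like a systematic sample with a random start. Neighboring points can still be selected together, but with small probability.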
The systematic sampling process selects samples that are very well spread. However, it does not possess a positive second-order inclusion density so that Cordy (1993)’s Horvitz–Thompson variance
Asymptotic results
The sampling processes introduced in Section 4 depend on a tuning parameter. When this parameter gets large, they look more and more like systematic sampling processes. Indeed, we will see that these processes converge in distribution to the systematic sampling process when the tuning parameter goes to infinity while the other parameter is held fixed. We first need Lemma 5.1. Lemma 5.1 A forward gamma random variable converges in distribution to a continuous uniform variable when one of its parameters tends to infinity and the other is fixed.
Proof It is easy to prove that, if
Simulations
Some simulations are useful to illustrate the properties of the systematic–binomial sampling process. We also ran simulations with the systematic-Poisson process and found that it behaves similarly but gives less accurate results than the systematic–binomial process with our test function. We considered a test function that is plotted in Fig. 5 (left). We aim to estimate its mean using the Horvitz–Thompson estimator on a sample selected
Choice of the tuning parameter
By choosing the tuning parameter, one can strike a compromise between an accurate estimation of the target parameter with a poor estimation of its precision, and a less accurate estimation of the target parameter with a reliable estimation of the estimator variance. Ideally, one would have at one's disposal a proxy for the function of interest and could run simulations to select a suitable value of the tuning parameter, that is to say a value that corresponds to one's preferred compromise.
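The simulation-based choice described above can be sketched as follows, under loudly stated assumptions. The sampler is a stand-in (every `r`-th point of a Poisson process of rate `r * lam`, with a random phase) rather than the exact process of Section 4, and the proxy function, the intensity `lam` and the grid of `r` values are hypothetical. The sketch only covers the accuracy side of the compromise, namely the Monte Carlo variance of the Horvitz–Thompson estimator of the integral. Assessing the reliability of a variance estimator would additionally require the joint inclusion densities.

```python
import numpy as np


def quasi_systematic_sample(lam, r, length, rng):
    """Stand-in sampler (assumption): every r-th point of a rate r*lam Poisson process."""
    n_fine = rng.poisson(r * lam * length)
    fine = np.sort(rng.uniform(0.0, length, size=n_fine))
    return fine[rng.integers(r)::r]  # uniformly random phase in {0, ..., r-1}


def simulate_ht(proxy, lam, r, length, n_rep, rng):
    """Monte Carlo mean and variance of the Horvitz-Thompson estimator of the integral."""
    estimates = np.empty(n_rep)
    for b in range(n_rep):
        sample = quasi_systematic_sample(lam, r, length, rng)
        # The first-order inclusion density is constant and equal to lam.
        estimates[b] = proxy(sample).sum() / lam
    return estimates.mean(), estimates.var(ddof=1)


def proxy(x):
    """Hypothetical smooth proxy for the function of interest on (0, 1)."""
    return np.sin(2.0 * np.pi * x) + x ** 2


rng = np.random.default_rng(1)
for r in (1, 2, 4, 8, 16):
    mean, var = simulate_ht(proxy, lam=20.0, r=r, length=1.0, n_rep=2000, rng=rng)
    print(f"r={r:2d}  mean estimate={mean:.4f}  variance={var:.5f}")
```

A table of this kind can be read alongside whatever diagnostic is used for the variance estimator, and the value of `r` picked where the two are judged acceptably balanced.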
When no useful proxy function is available, some
Conclusion and discussion
In this paper, we only worked on sampling processes with constant first-order inclusion density. It is however common in finite population survey sampling to choose different inclusion probabilities for different population units using available auxiliary information (e.g. the size of businesses or the approximate dispersion of the interest variable in a sub-population). Suppose we want to have a sampling process with first-order inclusion density proportional to a non-negative continuous
Acknowledgments
The authors are grateful to one associate editor and three reviewers for their insightful comments that helped considerably improve the quality of this paper. This work was supported in part by the Swiss Federal Statistical Office. The views expressed in this paper are solely those of the authors. M. W. was partially supported by a Doc.Mobility fellowship of the Swiss National Science Foundation (grant no. P1NEP2_162031).
References
Cordy (1993). An extension of the Horvitz-Thompson theorem to point sampling from a continuous universe. Statist. Probab. Lett.
Baddeley and Turner (2005). spatstat: an R package for analyzing spatial point patterns. J. Stat. Softw.
Breidt (1995). Markov chain designs for one-per-stratum sampling. Surv. Methodol.
Cochran (1977). Sampling Techniques.
Deville (1989). Une théorie simplifiée des sondages.
Horvitz and Thompson (1952). A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc.
Kotz et al. (2000). Continuous Multivariate Distributions.
Macchi (1975). The coincidence approach to stochastic point processes. Adv. Appl. Probab.