Quasi-systematic sampling from a continuous population

https://doi.org/10.1016/j.csda.2016.07.011

Abstract

A specific family of point processes is introduced for selecting samples to estimate the mean or the integral of a function of a real variable. These processes, called quasi-systematic processes, depend on a tuning parameter r > 0 that controls the likelihood of jointly selecting neighboring units in the same sample. When r is large, units that are close together tend not to be selected jointly and samples are well spread. When r tends to infinity, the sampling design approaches systematic sampling. For all r > 0, the first- and second-order unit inclusion densities are positive, which allows for unbiased estimators of variance. Algorithms to generate these sampling processes for any positive real value of r are presented. When r is large, the variance estimator is unstable; consequently, r must be chosen by the practitioner as a trade-off between an accurate estimation of the target parameter and an accurate estimation of the variance of the parameter estimator. The method’s advantages are illustrated with a set of simulations.

Introduction

We propose to use a specific family of point processes to select samples for the purpose of estimating the mean or the integral of a function of a real variable. We draw a parallel with sampling designs, which are themselves point processes on finite spaces. Systematic sampling is widely used in finite population sampling. It was introduced by Madow and Madow (1944) and Madow (1949). It is easily implemented and, by spreading the sample over the population, it yields precise mean and total estimators when the variable of interest is similar for neighboring units. The main drawback of systematic sampling is that most of the joint unit inclusion probabilities are zero, making it impossible to estimate the variance of the Horvitz–Thompson estimator without bias (see Horvitz and Thompson, 1952).
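To make this drawback concrete, here is a small illustrative sketch (not taken from the paper) of finite-population systematic sampling with a random start: units whose indices fall in different residue classes modulo the sampling interval are never selected together, so their joint inclusion probability is zero and no unbiased Horvitz–Thompson variance estimator exists. The population size and interval below are arbitrary choices.

```python
# Hypothetical illustration (not from the paper): systematic sampling from a finite
# population of N units with sampling interval a and a random start.
import numpy as np

N, a = 12, 4
rng = np.random.default_rng(0)

def systematic_sample(rng):
    start = rng.integers(a)           # random start in {0, ..., a-1}
    return np.arange(start, N, a)     # units start, start+a, start+2a, ...

# Estimate the joint inclusion probabilities by Monte Carlo.
pairs = np.zeros((N, N))
draws = 20_000
for _ in range(draws):
    s = systematic_sample(rng)
    pairs[np.ix_(s, s)] += 1
pairs /= draws

# Units in different "start classes" (i mod a != j mod a) are never selected
# together, so their joint inclusion probability is exactly zero.
print(pairs[0, 1])   # ~0.0  (units 0 and 1 never co-selected)
print(pairs[0, 4])   # ~0.25 (units 0 and 4 co-selected whenever start = 0)
```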

The aim of this paper is to develop a method that is a compromise between a base point process, such as the Poisson process or the binomial process, and the systematic process for sample selection in a continuous population. A similar objective is pursued by Breidt (1995) in a finite population setting supported by a superpopulation model. Breidt (1995) considers one-per-stratum sampling designs from a population that is split into strata of a successive units, where a divides the population size. He introduces a class of sampling procedures that encompasses both systematic sampling with constant rate 1/a and simple random sampling of one unit per stratum.
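The following sketch illustrates only the two endpoints that Breidt (1995)'s class interpolates between, not his Markov chain construction itself; the population and stratum sizes are arbitrary choices for illustration.

```python
# Hypothetical sketch of the two endpoint designs: systematic sampling with rate 1/a
# versus independent simple random sampling of one unit per stratum of size a.
import numpy as np

N, a = 12, 4                          # population of N units, strata of a consecutive units
strata = np.arange(N).reshape(-1, a)  # rows are the strata
rng = np.random.default_rng(1)

def systematic(rng):
    """One common random offset used in every stratum (systematic sampling, rate 1/a)."""
    k = rng.integers(a)
    return strata[:, k]

def one_per_stratum_srs(rng):
    """An independent random offset in each stratum (one-per-stratum SRS)."""
    ks = rng.integers(a, size=strata.shape[0])
    return strata[np.arange(strata.shape[0]), ks]

print(systematic(rng))            # one unit per stratum, all at the same offset
print(one_per_stratum_srs(rng))   # one unit per stratum, offsets chosen independently
```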

Point processes, which we refer to as sampling processes in the context of sampling, are the subject of a vast literature (see, for example, Daley and Vere-Jones, 2002, Daley and Vere-Jones, 2008, and references therein). Cordy (1993) and Deville (1989) independently introduced the continuous analogue of the Horvitz–Thompson estimator for infinite population sampling. Different communities have studied point processes: mathematical physicists, probabilists and statisticians. A detailed state of the art in the study and simulation of some complex point processes can be found in Møller and Waagepetersen, 2003, Møller and Waagepetersen, 2007. Many simulation methods for point processes are implemented in the R package spatstat (Baddeley and Turner, 2005).
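For reference, the continuous analogue of the Horvitz–Thompson estimator introduced by Cordy (1993) and Deville (1989) can be written as follows; the notation here is the standard one and may differ slightly from that of Section 2. For a sample $X$ selected with first-order inclusion density $\pi(x) > 0$ on $\Omega$, the integral of $z$ is estimated by
\[
\widehat{Z} \;=\; \sum_{x \in X} \frac{z(x)}{\pi(x)},
\qquad
\operatorname{E}\bigl(\widehat{Z}\bigr) \;=\; \int_{\Omega} z(x)\,\mathrm{d}x ,
\]
and unbiased estimation of its variance requires positive second-order inclusion densities, which is precisely what the systematic process lacks.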

We introduce a new family of sampling methods that makes it possible to continuously tune the distance between units in the sample. These processes yield small probabilities of jointly selecting neighboring units. They are particularly efficient when the function of interest is smooth. Moreover, the joint inclusion densities are positive, so the sampling variance can be estimated without bias.

The paper is organized as follows. In Section 2, we define sampling processes in continuous populations, as well as the Poisson process, the binomial process and the systematic process. Important results of renewal process theory are recalled in Section 3. In Section 4, we define the systematic–Poisson and systematic–binomial processes with tuning parameter r, and compute their joint densities. Section 5 contains proofs for the asymptotic processes when r tends to infinity. Simulations are presented in Section 6 and our recommendations on the choice of the tuning parameter in Section 7. Finally, we give a brief discussion of the method and its advantages in Section 8.

Section snippets

Sampling from a continuous population

Following Macchi (1975) (see also Moyal, 1962), a finite sample of size $n$ from a bounded and open subset $\Omega$ of $\mathbb{R}$ is a collection of units $X=\{x_1,\ldots,x_n\}$ without consideration for the order of the $x_i$’s. This definition matches those commonly used in finite population sampling (see, for example, Cochran, 1977, for an introduction to finite population sampling theory). A sampling process is a probability distribution on the space $S$ of all such collections, for all $n \in \mathbb{N}$. Note that it is not directly a
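As a reading aid for the processes named in this section, here is a minimal sketch assuming $\Omega = (0,1)$, an arbitrary choice for illustration; the generators below use the standard textbook definitions and may differ in detail from those of Section 2.

```python
# Minimal sketches of the three base sampling processes on (0, 1).
import numpy as np

rng = np.random.default_rng(2)
n, lam = 10, 10.0                     # target sample size / Poisson intensity

def binomial_process(rng, n):
    """n independent uniform points on (0, 1); the sample size is fixed."""
    return np.sort(rng.uniform(0.0, 1.0, size=n))

def poisson_process(rng, lam):
    """Homogeneous Poisson process: random size, then uniform locations."""
    k = rng.poisson(lam)
    return np.sort(rng.uniform(0.0, 1.0, size=k))

def systematic_process(rng, n):
    """Systematic process: one uniform start, then a regular grid of step 1/n."""
    u = rng.uniform(0.0, 1.0 / n)
    return u + np.arange(n) / n

for sample in (binomial_process(rng, n), poisson_process(rng, lam), systematic_process(rng, n)):
    print(np.round(sample, 3))
```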

Renewal processes

A renewal process, or renewal sequence, is a stochastic process defined on the positive real line. It is completely characterized by the distribution of its independent and identically distributed inter-arrival times. For example, the Poisson process is a renewal process with exponentially distributed inter-arrival times when its intensity λ is constant. The following definition can be found in Mitov and Omey (2014).

Definition 3.1 Renewal Process

A renewal process is any process $X=\{X_k,\ k=0,1,2,\ldots\}$ with
\[
X_k \;=\; X_0 + \sum_{i=1}^{k} J_i, \qquad k=1,2,\ldots
\]
where the $J_i$ are the independent and identically distributed inter-arrival times.
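A direct way to simulate a renewal sequence following Definition 3.1 is to accumulate i.i.d. inter-arrival times until the interval of interest is covered. In the sketch below the gamma inter-arrival law is an illustrative assumption, chosen because its mean can be fixed at $1/n$.

```python
# Generic renewal-sequence generator: X_k = X_0 + J_1 + ... + J_k with i.i.d. J_i.
import numpy as np

def renewal_points(rng, draw_interarrival, x0, t_max):
    """Return the renewal points X_0, X_1, ... that fall in [0, t_max]."""
    points, x = [], x0
    while x <= t_max:
        points.append(x)
        x += draw_interarrival(rng)          # i.i.d. inter-arrival time J_i
    return np.array(points)

rng = np.random.default_rng(3)
r, n = 5.0, 10                               # shape r, rate r*n  ->  mean gap 1/n
gamma_gap = lambda rng: rng.gamma(shape=r, scale=1.0 / (r * n))
print(np.round(renewal_points(rng, gamma_gap, x0=0.0, t_max=1.0), 3))
```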

Quasi-systematic sampling

Our aim is to propose new sampling processes that allow control over the probability of selecting neighboring units together by adjusting the joint inclusion density. Spreading the sample units over Ω has advantages when units that are close together are similar (e.g. when the function z has small variations).
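The exact construction of the quasi-systematic processes is given in Section 4. As a hedged sketch, consistent with the renewal framework of Section 3 and with the forward $\mathrm{ForG}(r, rn)$ variable of Lemma 5.1 but not a verbatim reproduction of the paper's definition, one can picture a stationary renewal process on $(0,1)$ with $\mathrm{Gamma}(r, rn)$ inter-arrival times: $r=1$ gives exponential gaps (Poisson-like behavior), while large $r$ forces all gaps close to $1/n$ (systematic-like behavior).

```python
# Hedged sketch (an assumption, not necessarily the paper's Section 4 definition):
# a stationary renewal process on (0, 1) with Gamma(shape r, rate r*n) inter-arrivals.
# The first point is drawn from the forward recurrence (equilibrium) distribution,
# simulated as U * J*, where J* is the length-biased gap, i.e. Gamma(r + 1, rate r*n).
import numpy as np

def quasi_systematic_sample(rng, n, r):
    scale = 1.0 / (r * n)                          # rate r*n  ->  mean gap 1/n
    x = rng.uniform() * rng.gamma(r + 1.0, scale)  # forward recurrence time
    points = []
    while x < 1.0:
        points.append(x)
        x += rng.gamma(r, scale)                   # i.i.d. Gamma(r, r*n) gaps
    return np.array(points)

rng = np.random.default_rng(4)
for r in (1.0, 10.0, 1000.0):                      # r = 1: Poisson-like; large r: systematic-like
    print(r, np.round(quasi_systematic_sample(rng, n=10, r=r), 3))
```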

The systematic sampling process selects samples that are very well spread. However, it does not possess a positive second-order inclusion density, so that Cordy (1993)’s Horvitz–Thompson variance

Asymptotic results

The sampling processes introduced in Section 4 depend on a parameter r. When r becomes large, they behave more and more like the systematic sampling process. Indeed, we will see that these processes converge in distribution to the systematic sampling process when n is fixed and r goes to infinity. We first need Lemma 5.1.

Lemma 5.1

A forward gamma random variable ForG(r,rn) converges in distribution to a continuous uniform variable U(0,1/n) when r tends to infinity and n is fixed.

Proof

It is easy to prove that, if ϕf
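Assuming that $\mathrm{ForG}(r, rn)$ denotes the forward recurrence time (equilibrium distribution) of a stationary renewal process with $\mathrm{Gamma}(r, rn)$ inter-arrival times, an assumption on our part, the limit in Lemma 5.1 can be seen from the standard equilibrium density:
\[
f_{\mathrm{ForG}(r,rn)}(x) \;=\; \frac{1-F_{\Gamma(r,rn)}(x)}{\mu}
\;=\; n\,\bigl(1-F_{\Gamma(r,rn)}(x)\bigr), \qquad x \ge 0,
\]
since the mean inter-arrival time is $\mu = r/(rn) = 1/n$. The variance of a $\Gamma(r,rn)$ variable is $r/(rn)^2 = 1/(rn^2) \to 0$ as $r \to \infty$, so the inter-arrival distribution concentrates at its mean $1/n$; hence $F_{\Gamma(r,rn)}(x) \to \mathbf{1}\{x > 1/n\}$ for $x \neq 1/n$, and the density above converges to $n\,\mathbf{1}\{0 < x < 1/n\}$, the density of $U(0,1/n)$.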

Simulations

Some simulations are useful to illustrate the properties of the systematic–binomial sampling process. We also ran simulations with the systematic–Poisson process and found that it behaves similarly but gives less accurate results than the systematic–binomial process for our test function. We considered the following test function: $h(x)=100\,\sin\!\left(\frac{3x^2}{2x^2+1}\right)\exp\!\left\{\left[\sin(4\pi x)\right]^2\right\}$, plotted in Fig. 5 (left). We aim to estimate its mean using the Horvitz–Thompson estimator on a sample selected
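A minimal Monte Carlo sketch of this set-up is given below. The function h follows the formula as reconstructed above (the exact expression may differ in the full text), and the sampler is the assumed gamma-renewal construction sketched earlier, not necessarily the paper's systematic–binomial process.

```python
# Hedged illustration: Horvitz-Thompson estimation of the mean of a test function on
# (0, 1), with a constant first-order inclusion density pi(x) = n.
import numpy as np

def h(x):
    # Reconstructed test function; the exact expression in the full text may differ.
    return 100.0 * np.sin(3.0 * x**2 / (2.0 * x**2 + 1.0)) * np.exp(np.sin(4.0 * np.pi * x) ** 2)

def quasi_systematic_sample(rng, n, r):
    # Assumed gamma-renewal construction (see the earlier sketch).
    scale = 1.0 / (r * n)
    x = rng.uniform() * rng.gamma(r + 1.0, scale)
    points = []
    while x < 1.0:
        points.append(x)
        x += rng.gamma(r, scale)
    return np.array(points)

def ht_mean(sample, n):
    # HT estimator of the mean over Omega = (0, 1): sum of h(x) / pi(x) with pi(x) = n.
    return np.sum(h(sample) / n)

rng = np.random.default_rng(5)
n, draws = 10, 5000
for r in (1.0, 5.0, 50.0):
    est = [ht_mean(quasi_systematic_sample(rng, n, r), n) for _ in range(draws)]
    print(f"r = {r:6.1f}   mean estimate = {np.mean(est):8.3f}   sd = {np.std(est):7.3f}")
```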

Choice of the tuning parameter

By choosing the tuning parameter r, one can strike a compromise between an accurate estimation of the target parameter with a poor estimation of its precision, and a less accurate estimation of the target parameter with a reliable estimation of the estimator variance. Ideally, one would have at one's disposal a proxy for the function of interest and could run simulations to select a suitable r, that is to say an r that corresponds to one's preferred compromise.

When no useful proxy function is available, some

Conclusion and discussion

In this paper, we only worked on sampling processes with a constant first-order inclusion density. It is, however, common in finite population survey sampling to assign different inclusion probabilities to different population units using available auxiliary information (e.g. the size of businesses or the approximate dispersion of the interest variable in a sub-population). Suppose we want to have a sampling process with first-order inclusion density proportional to a non-negative continuous
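One standard way to obtain such an unequal inclusion density, shown below as an assumption rather than the authors' actual proposal, is to generate a constant-density sample and push it through the inverse of the normalized cumulative intensity of the target function q.

```python
# Sketch: transform a constant-density sample on (0, 1) so that the first-order
# inclusion density becomes proportional to a non-negative function q.
import numpy as np

def unequal_density_sample(rng, n, q, grid_size=10_000):
    # Constant-density sample on (0, 1); systematic here, but any of the processes
    # discussed above could be used instead.
    u = rng.uniform(0.0, 1.0 / n)
    s = u + np.arange(n) / n
    # Numerical inverse of Q(x) = int_0^x q(t) dt / int_0^1 q(t) dt on a fine grid.
    x = np.linspace(0.0, 1.0, grid_size)
    Q = np.cumsum(q(x))
    Q /= Q[-1]
    return np.interp(s, Q, x)

rng = np.random.default_rng(6)
q = lambda x: 1.0 + 4.0 * x          # illustrative target: density proportional to 1 + 4x
print(np.round(unequal_density_sample(rng, n=10, q=q), 3))
```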

Acknowledgments

The authors are grateful to one associate editor and three reviewers for their insightful comments that helped considerably improve the quality of this paper. This work was supported in part by the Swiss Federal Statistical Office. The views expressed in this paper are solely those of the authors. M. W. was partially supported by a Doc.Mobility fellowship of the Swiss National Science Foundation (grant no. P1NEP2_162031).

References (20)

  • C.B. Cordy. An extension of the Horvitz-Thompson theorem to point sampling from a continuous universe. Statist. Probab. Lett. (1993)
  • A.J. Baddeley, R. Turner. spatstat: an R package for analyzing spatial point patterns. J. Stat. Softw. (2005)
  • F.J. Breidt. Markov chain designs for one-per-stratum sampling. Surv. Methodol. (1995)
  • W.G. Cochran. Sampling Techniques (1977)
  • D. Daley, D. Vere-Jones (2002)
  • D. Daley, D. Vere-Jones (2008)
  • J.-C. Deville. Une théorie simplifiée des sondages (1989)
  • D.G. Horvitz, D.J. Thompson. A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. (1952)
  • S. Kotz et al. Continuous Multivariate Distributions (2000)
  • O. Macchi. The coincidence approach to stochastic point processes. Adv. Appl. Probab. (1975)
