Confidence intervals from simulations based on 4-independent random variables

https://doi.org/10.1016/S0167-7152(99)00053-X

Abstract

Using an algorithm of Joffe (Ann. Probab. 2 (1974) 161–162) we generate a finite sequence of 4-independent random variables. Using this sequence we find a confidence interval with coverage probability exceeding a value close to 1−α for θ=E{g(U)}, where U=(U1,…,Us) with U1,…,Us independent and identically U(0,1) distributed random variables and g(u)∈{0,1} for all u∈Rs.

Introduction

Our aim is to calculate the probability θ=P{T∈A} where the statistic T is a continuous random p-vector and A⊂Rp. In the context of continuous data, many probabilities of statistical importance can be put in this form: P-values, Type I and II errors for hypothesis tests and coverage probabilities for confidence limits, just to name a few. Suppose that the calculation of θ is analytically intractable, so that we seek a numerical approximation to θ. Also suppose that T is such a complicated function of the data that its probability distribution (for given true parameter values) is not readily calculable. For example, when T is a signed root likelihood ratio statistic with Bartlett correction this will usually be the case. We suppose, however, that T can readily be expressed in the form T=a(U) where a: Rs → Rp and U=(U1,…,Us) with U1,…,Us independent and identically distributed U(0,1) random variables. For example, suppose that T is a statistic based on s independent and identically distributed continuous random variables each with distribution function F(·), where F−1(u) is defined for 0<u<1. Since F(·) is the distribution function of F−1(U1) for U1 a U(0,1) distributed random variable, T can readily be expressed in the form T=a(U).

To discuss the numerical approximation of θ it is convenient to express θ as follows. Define g(·)=χA(a(·)) where χA(·) is the indicator function of the set A, i.e. χA(u)=1 for u∈A and χA(u)=0 for u∉A. Thus

θ = E{g(U)} = ∫_0^1 ⋯ ∫_0^1 g(u) du.

The analysis in the paper will be based on the quasi-Monte Carlo approximation of θ by

λ = (1/m) Σ_{i=1}^{m} g(y^{(i)}),

where y^{(1)},…,y^{(m)} is a set of points in the unit cube in Rs. We suppose that m and y^{(1)},…,y^{(m)} have been chosen judiciously under the conflicting aims of (a) making λ a good approximation to θ and (b) making m as small as possible. A detailed discussion of how this choice is made is outside the scope of the paper. However, we make the following observations about that choice.
The fact that g(·) is discontinuous implies that to guarantee a given level of accuracy of approximation of θ by λ, m is necessarily very large by comparison with the case that g(·) is smooth in some sense, see e.g. Haber (1970). For the particular case that {u:uRs,g(u)=1} is a convex set, Theorem 1 of Schmidt (1975) provides some guidance as to how large m needs to be chosen to guarantee a given level of accuracy of approximation of θ by λ. Note that the approximator λ is very similar to the approximator obtained using what Fang and Wang (1994) call the “indicator function method” applied to the above multiple integral expression for θ.
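As a concrete illustration of the approximator λ, the following sketch (with a hypothetical indicator g and a simple midpoint grid standing in for a judiciously chosen point set; none of these choices come from the paper) evaluates λ = (1/m) Σ_{i=1}^{m} g(y^{(i)}) for s=2:

```python
def g(u):
    # Hypothetical indicator g(u) in {0,1}: 1 inside the quarter disc,
    # so that theta = E{g(U)} = pi/4 when s = 2.
    return 1 if u[0] ** 2 + u[1] ** 2 <= 1.0 else 0

def qmc_lambda(points):
    # lambda = (1/m) * sum_{i=1}^{m} g(y^(i)) over the point set.
    return sum(g(y) for y in points) / len(points)

# An m = k^2 midpoint grid in the unit square (a stand-in for a point
# set chosen under the conflicting aims (a) and (b) above).
k = 100
points = [((i + 0.5) / k, (j + 0.5) / k) for i in range(k) for j in range(k)]
lam = qmc_lambda(points)
```

Note that the set {u : g(u)=1} here is convex, so Schmidt's (1975) bound would apply to how large m must be for a prescribed accuracy.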

We suppose that the calculation of an approximation to θ is subject to the following constraint.

Constraint 1

The computational burden of evaluating g(u) for a single value of u is such that it is only practical to evaluate g(u) for far less than m values of u.

In statistical practice, values of s exceeding 30 and functions a(·) which are computationally burdensome to evaluate, arising for example from bootstrapping, are not uncommon. Thus it is not uncommon to encounter Constraint 1 in statistical practice.

Constraint 1 rules out the direct use of the approximator λ. We therefore look to Monte Carlo simulation to provide an estimate of λ which is then used as an estimate of θ. Since we are using simulation, our purpose is not only to find an estimate of θ but also to find a confidence interval for θ with coverage probability exceeding a value close to 1−α (where α has been specified beforehand, commonly α=0.05).

One method of simulation which, under Assumption 1 described below, fulfills that purpose is the following. Suppose that I1,…,In are independent and identically distributed discrete random variables each uniformly distributed over {1,…,m}. Throughout the paper we will suppose that n>4. We form

W1 = Σ_{i=1}^{n} g(y^{(I_i)}),     (1)

and estimate λ by Λ̃=W1/n. Note that W1 has a Binomial(n,λ) distribution and there are standard procedures for finding a confidence interval for λ with coverage probability ⩾1−α. It is straightforward to show that such a confidence interval for λ will also be a confidence interval for θ with coverage probability exceeding a value close to 1−α when the following assumption holds true.
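A minimal sketch of this binomial simulation, with Python's pseudorandom generator standing in for truly random draws of the I_i, and a normal-approximation interval standing in for the standard binomial confidence procedures (the point set and indicator below are hypothetical):

```python
import math
import random

def simulate_w1(points, g, n, rng):
    # Draw I_1,...,I_n uniformly over {1,...,m} and form
    # W1 = sum_{i=1}^{n} g(y^(I_i)).  Here rng is a pseudorandom
    # stand-in for a truly random source.
    m = len(points)
    return sum(g(points[rng.randrange(m)]) for _ in range(n))

def binomial_ci(w, n, z=1.96):
    # Normal-approximation confidence interval for lambda from
    # W1 ~ Binomial(n, lambda); one standard choice of procedure.
    lam = w / n
    half = z * math.sqrt(lam * (1.0 - lam) / n)
    return max(0.0, lam - half), min(1.0, lam + half)

# Hypothetical one-dimensional point set and indicator.
points = [(i / 1000.0,) for i in range(1000)]
g = lambda u: 1 if u[0] < 0.3 else 0
n = 500
w1 = simulate_w1(points, g, n, random.Random(0))
lo, hi = binomial_ci(w1, n)
```

Constraint 2 below is precisely what rules this procedure out: it consumes n truly random draws, not 4.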

Assumption 1

n≫1 and |λ−θ| ≪ standard deviation of Λ̃, i.e. |λ−θ| ≪ √(λ(1−λ)/n).

Now observations of truly random variables may be made using electronic noise, radioactive decay or other truly random sources, see e.g. Maddocks et al. (1972) and references therein. We make a sharp distinction between truly random and pseudorandom variables. For a given value of the seed, the latter are calculated using a deterministic formula. We suppose that the observation of truly random variables is subject to the following constraint.

Constraint 2

The making of observations of truly random variables has a cost associated with it, so that it is practical to make observations of I1,…,I4 only.

In computational practice this is not an uncommon constraint. Constraint 2 makes the method of simulation described above inapplicable, since we will typically want the summation (1) that defines W1 to involve more than just a handful of terms. We will come back to the confidence interval based on Λ̃ later, when we use it as a yardstick for the new confidence interval based on the new method of simulation.

In Section 2 we introduce a new method of simulation which is based on a sequence of random variables K1,…,Kn. These random variables are identically uniformly distributed over {1,…,m} and are 4-independent, i.e. any set of 4 of these random variables are independent. The basis of their generation is an algorithm of Joffe (1974). The new simulation is seeded by (I1−1,…,I4−1). We then form

W = Σ_{i=1}^{n} g(y^{(K_i)})

and estimate λ by Λ̂=W/n. As a consequence of the 4-independence of the Ki's, the following assumption is satisfied by W.

Assumption 2

E{W^k} = E{X^k} for k=1,…,4, where X ∼ Binomial(n,λ).

In Section 3 we describe a confidence interval for λ based on Λ̂ which has coverage probability ⩾1−α. As proved in Appendix B, under Assumption 1 this is also a confidence interval for θ with coverage probability exceeding a value close to 1−α.

In Section 4 we compare a confidence interval for λ based on Λ̃ (based, in turn, on W1 given by Eq. (1)) with the confidence interval for λ based on Λ̂ (calculated using the new method of simulation) but with n set equal to n′. We show that for large n and n′ these confidence intervals will have almost identical widths provided that n′/n takes a certain value which depends on α. This value is approximately 2 for α in the range [0.02,0.1]. In other words, for n and n′ large and α in this range, the confidence interval using the new method of simulation with n set to twice the value of n for the confidence interval based on Λ̃ will lead to approximately equally “accurate” confidence intervals. By adopting the former confidence interval we, in effect, avoid the generation of n−4 observations of the Ii's, but at the cost of requiring the evaluation of g(u) for about twice as many values of u as the latter.

We illustrate this comparison with an example. Suppose that m=10^30 and that evaluating g(u) for 10^5 values of u takes 1 h of computer time. Clearly, Constraint 1 applies in this case. Two procedures which lead to confidence intervals of approximately equal width are the following:


The first of these procedures requires us to wait 2 h instead of 1 h for the result. However, this procedure saves us the burden of making observations of approximately 10^5 independent truly random variables each uniformly distributed on {1,…,10^30}. For many computer users the access to observations of truly random variables made using electronic noise, radioactive decay or some other truly random source is so limited as to make this burden unacceptably large.


The new method of simulation

The new method of simulation uses the random variables X1,X2,…,Xp defined according to the following algorithm of Joffe (1974). Suppose that p is a prime and define X1,X2,X3,X4 to be independent random variables each uniformly distributed on {0,1,…,p−1}. Then let

X_l = X_1 ⊕ lX_2 ⊕ l^2 X_3 ⊕ l^3 X_4   for l = 5,…,p,

where ⊕ denotes addition modulo p. According to the theorem of Joffe (1974), the random variables X1,X2,…,Xp are identically distributed and are 4-independent. We view the truly random vector (X1,X2,X3,X4) …
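Joffe's construction is short enough to sketch directly. The helper below (the function name is mine, not the paper's) extends the four truly random seed values to a 4-independent sequence of length p:

```python
def joffe_sequence(p, seed):
    # Joffe (1974): given a prime p and four independent values
    # X1,...,X4 each uniform on {0,1,...,p-1}, set
    #   X_l = X1 + l*X2 + l^2*X3 + l^3*X4   (mod p)
    # for l = 5,...,p.  The full sequence X_1,...,X_p is then
    # 4-independent: any four of its members are independent.
    x1, x2, x3, x4 = seed
    xs = [x1, x2, x3, x4]
    for l in range(5, p + 1):
        xs.append((x1 + l * x2 + l ** 2 * x3 + l ** 3 * x4) % p)
    return xs
```

Identifying m with p (my reading of the setup, not stated in the snippet), the indices K_i = X_i + 1 are uniform on {1,…,m}, and Λ̂ = W/n is the average of g(y^{(K_i)}) over the sequence.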

The new confidence interval based on the new method of simulation

We now describe the new method of finding a confidence interval for λ, based on the new method of simulation described in Section 2, with coverage probability ⩾1−α. Define

a(φ,β) = 1/4 − β(λ − 1/2)^2 = βφ + (1/4)(1−β),

where β is a parameter satisfying β∈[0,1] and φ=λ(1−λ). Since λ∈[0,1], φ∈[0,1/4]. Define φ1=λ1(1−λ1). As shown in Appendix A, the following is a confidence interval for λ with coverage probability ⩾1−α:

H = {λ1 : n(Λ̂−λ1)^2 ⩽ c(0) a(φ1,β)},

where c(0) and β are given in that appendix.
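The two expressions for a(φ,β) agree by completing the square in φ = λ(1−λ):

```latex
\varphi = \lambda(1-\lambda) = \tfrac14 - \left(\lambda - \tfrac12\right)^2
\quad\Longrightarrow\quad
a(\varphi,\beta)
  = \tfrac14 - \beta\left(\lambda - \tfrac12\right)^2
  = \tfrac14 - \beta\left(\tfrac14 - \varphi\right)
  = \beta\varphi + \tfrac14(1-\beta).
```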

A comparison of the new confidence interval with one based on Λ̃

In this section we restrict the range of possible values of λ to [ε,1−ε], where ε is a small positive number. The confidence interval for λ based on Λ̃ (based, in turn, on W1 given by Eq. (1)) and described by Bickel and Doksum (1977, pp. 160–161) is

G = {λ1 : n(Λ̃−λ1)^2 ⩽ c̃ φ1},

where φ1=λ1(1−λ1) and P{R⩽c̃}=1−α for R ∼ χ1^2. For n large, this confidence interval for λ has coverage probability exceeding a value close to 1−α. We compare this confidence interval with the new confidence interval H for λ …
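The set G is an interval whose endpoints are the roots of a quadratic in λ1 (the familiar Wilson-score form). A sketch, with c̃ hard-coded to the 0.95 quantile of χ1^2 (i.e. α=0.05):

```python
import math

def interval_G(lam_tilde, n, c=3.841458821):
    # G = {l : n*(lam_tilde - l)^2 <= c * l*(1 - l)} rearranges to
    # (n + c)*l^2 - (2*n*lam_tilde + c)*l + n*lam_tilde^2 <= 0,
    # whose two roots are the interval's endpoints.  The default c is
    # the 0.95 quantile of the chi-squared distribution with 1 df.
    a = n + c
    b = 2.0 * n * lam_tilde + c
    disc = math.sqrt(b * b - 4.0 * a * n * lam_tilde ** 2)
    return (b - disc) / (2.0 * a), (b + disc) / (2.0 * a)

lo, hi = interval_G(0.5, 100)
```

For Λ̃ = 0.5 the interval is symmetric about 0.5, since the linear coefficient then equals the leading coefficient.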

References

  • Abramowitz, M., Stegun, I.A. (Eds.), 1972. Handbook of Mathematical Functions. Dover, New York.
  • Bickel, P.J., Doksum, K.A., 1977. Mathematical Statistics. Holden-Day, San Francisco.
  • Fang, K.-T., Wang, Y., 1994. Number-theoretic Methods in Statistics. Chapman & Hall, London.
  • Haber, S., 1970. Numerical evaluation of multiple integrals. SIAM Rev.