On the distribution of integration error by randomly-shifted lattice rules

: A lattice rule with a randomly-shifted lattice estimates a math- ematical expectation, written as an integral over the s -dimensional unit hypercube, by the average of n evaluations of the integrand, at the n points of the shifted lattice that lie inside the unit hypercube. This average provides an unbiased estimator of the integral and, under appropriate smoothness conditions on the integrand, it has been shown to converge faster as a function of n than the average at n independent random points (the standard Monte Carlo estimator). In this paper, we study the behavior of the estimation error as a function of the random shift, as well as its distribution for a random shift, under various settings. While it is well known that the Monte Carlo estimator obeys a central limit theorem when n → ∞ , the ran- domized lattice rule does not, due to the strong dependence between the function evaluations. We show that for the simple case of one-dimensional integrands, the limiting error distribution is uniform over a bounded interval if the integrand is non-periodic, and has a square root form over a bounded interval if the integrand is periodic. We ﬁnd that in higher dimensions, there is little hope to precisely characterize the limiting distribution in a useful way for computing conﬁdence intervals in the general case. We nevertheless examine how this error behaves as a function of the random shift from diﬀerent perspectives and on various examples. We also point out a situation where a classical central-limit theorem holds when the dimen- sion goes to inﬁnity, we provide guidelines on when the error distribution should not be too far from normal, and we examine how far from normal is the error distribution in examples inspired from real-life applications.

Abstract: A lattice rule with a randomly-shifted lattice estimates a mathematical expectation, written as an integral over the s-dimensional unit hypercube, by the average of n evaluations of the integrand, at the n points of the shifted lattice that lie inside the unit hypercube. This average provides an unbiased estimator of the integral and, under appropriate smoothness conditions on the integrand, it has been shown to converge faster as a function of n than the average at n independent random points (the standard Monte Carlo estimator). In this paper, we study the behavior of the estimation error as a function of the random shift, as well as its distribution for a random shift, under various settings. While it is well known that the Monte Carlo estimator obeys a central limit theorem when n → ∞, the randomized lattice rule does not, due to the strong dependence between the function evaluations. We show that for the simple case of one-dimensional integrands, the limiting error distribution is uniform over a bounded interval if the integrand is non-periodic, and has a square root form over a bounded interval if the integrand is periodic. We find that in higher dimensions, there is little hope to precisely characterize the limiting distribution in a useful way for computing confidence intervals in the general case. We nevertheless examine how this error behaves as a function of the random shift from different perspectives and on various examples. We also point out a situation where a classical central-limit theorem holds when the dimension goes to infinity, we provide guidelines on when the error distribution should not be too far from normal, and we examine how far from normal is the error distribution in examples inspired from real-life applications.

Introduction
Monte Carlo (MC) and quasi-Monte Carlo (QMC) methods estimate the integral of a function f over the s-dimensional unit hypercube [0, 1) s = {u = (u 1 , . . . , u s ) : 0 ≤ u j < 1 for all j}, by evaluating f at n points in this hypercube and taking the average. This integral often represents a mathematical expectation: (1) where f : [0, 1) s → R, U = (U 1 , . . . , U s ) ∼ U (0, 1) s , and the latter means that U is a random vector with the uniform distribution over (0, 1) s . This framework is very general, because the (pseudo)randomness in a simulation is typically obtained from a sequence of independent uniform (pseudo)random numbers over the interval (0, 1). We can take s as an upper bound on the number of calls to the generator and view f as representing all the transformations applied to those uniform random numbers to produce the estimator.
In the MC method, we sample n independent random points U 0 , . . . , U n−1 uniformly over (0, 1) s and we estimate µ bŷ Under standard assumptions,μ n obeys the classical central-limit theorem (CLT), so if n is large enough we can compute a confidence interval on µ by assuming that the error follows approximately a normal distribution. The variance can be estimated by the empirical variance, because the observations are independent. With randomized quasi-Monte Carlo (RQMC), the estimator has the same form:μ n,rqmc = 1 n and we still have U i ∼ U [0, 1) s for each i, but these points are no longer independent. They are constructed in a way that the set P n = {U 0 , . . . , U n−1 } ⊂ [0, 1) s covers the unit hypercube more uniformly than typical independent random points. This uniformity is quantified by figures of merit called discrepancies, which measure of discrepancy between the empirical distribution of the points and the uniform distribution over (0, 1) s . Two popular ways of constructing RQMC point sets are randomly-shifted lattices and digitally-shifted nets. For general background on quasi-Monte Carlo methods, RQMC, and discrepancies, the reader is referred to L'Ecuyer (2009); L'Ecuyer and Lemieux (2002); Lemieux (2009);Niederreiter (1992); Owen (1998); Sloan and Joe (1994) and the references given there. The RQMC estimator (3) has expectation µ and variance Var[μ n,rqmc ] = E[(μ n,rqmc − µ) 2 ].
This variance is more difficult to estimate than for MC, because of the dependence across observations. The usual way of estimating it and computing a confidence interval on µ is to obtain m independent realizations ofμ n,rqmc , say X 1 , . . . , X m , based on m independent randomizations of P n , and compute their sample mean X m and their sample variance and S 2 x,m . One has E[X m ] = µ and E[S 2 x,m ] = mVar[X m ] (L'Ecuyer and Lemieux, 2000). With a variance estimate in hand, one might be tempted to compute a confidence interval on µ by using a normal approximation for the distribution of X m , as for MC. This is common practice. Even ifμ n,rqmc is not normal, X m will eventually be approximately normal if m is large enough, because the CLT applies when m → ∞. But in typical RQMC settings, m is small, often no more than 10, and n is taken as large as possible given the available computing budget. Then it is unclear if a normal approximation makes sense.
This motivates the need for a better understanding of the distributional behavior ofμ n,rqmc in the asymptotic regime where m is fixed (and small) and n → ∞, and also for typical (finite) values of n. This is our aim in the present paper.
For certain RQMC methods that involve a sufficient amount of randomization of the dependent points, a CLT has been proved to hold for n → ∞. Two such special cases are Latin hypercube sampling (LHS) (Owen, 1992), for which a bound on the total variation convergence to the normal distribution is available, and digital nets with the full nested scrambling of Owen (Loh, 2003). See Loh (2005) for a survey. However, LHS is not one of the most powerful RQMC methods, because it ensures good uniformity only for the one-dimensional projections of the s-dimensional points. In two or more dimensions, its variance converges as O(n −1 ), the same as for MC. In one dimension, it becomes equivalent to a stratification scheme whose convergence is examined later in this paper. The nested scrambling of Owen (1997) is much more powerful in theory: its variance converges as O(n −3 ) when the integrand f has sufficient smoothness. However, it is not so popular in practice because it is very time-consuming.
In this paper, we focus on randomly-shifted lattice rules, a widely-used and effective RQMC technique (L'Ecuyer and Lemieux, 2000;Sloan and Joe, 1994), where the n evaluation points are those of a randomly shifted integration lattice of density n that lie in the unit hypercube, in s dimensions. We assume that the lattice is projection-regular, which means that each of its one-dimensional projections over the interval [0, 1) contains exactly n points. We explore and illustrate how the error behaves as a function of the random shift, and study its asymptotic distribution (for large n), in a variety of settings. We find that when s and m are fixed and n → ∞, the error does not obey a CLT, in the sense that it does not have a normal asymptotic distribution. We provide some insight on when the normal approximation is likely to be reasonable or not for finite (typical) n. We also have some results for an asymptotic regime where s → ∞.
Randomly shifting the lattice means that we generate a random shift U ∼ U (0, 1) s and add it to all the lattice points. This is equivalent to sampling U in one of the elementary parallelotopes determined by any s vectors that form a basis of the lattice. We show that this is also equivalent to generating it uniformly in [0, 1) s−1 × [0, 1/n), which is much simpler, or more generally in any elementary region of a partition of [0, 1) s into n pieces of volume 1/n if the partition satisfies certain tiling conditions. These properties are useful to understand how the error behaves as a function of the shift. We provide illustrations.
We say that f is c-periodic with respect to coordinate j if its periodic continuation with respect to coordinate j is a continuous function. This means that f is continuous in [0, 1) s and that for any (u 1 , . . . , u j−1 , u j+1 , . . . , u s ) ∈ [0, 1) s−1 , If it is c-periodic with respect to each coordinate j, f is simply called c-periodic. We will see that for continuous c-periodic functions f , the integration error as a function of the shift is a continuous function over the unit hypercube, and that if f is continuous but not c-periodic with respect to coordinate j, the error function has jumps whenever coordinate j of the shift crosses a multiple of 1/n.
For the simple case of a one-dimensional continuous function (s = 1), we develop a generalized Euler-MacLaurin expansion of the integration error, from which we obtain the asymptotic distribution of this error when n → ∞. As a special case, we find that when f is not c-periodic, the asymptotic distribution of n times the integration error is uniform over a bounded interval centered at 0. If f is c-periodic but its derivative is not, then n 2 times the error converges in distribution to the square of a uniform random variable centered at 0, whose cumulative distribution function (cdf) is a square root function. In general, if f has k − 2 c-periodic derivatives but its derivative of order k − 1 is not c-periodic, then n k times the error converges in distribution to a constant times B k (U ) where B k is the Bernoulli polynomial of degree k and U ∼ U (0, 1).
For more general s-dimensional functions, the analysis is more complicated because (among other things) the convergence behavior and even the convergence rate depend on the choice of lattice as a function of n. It is well known from the theory of lattice rules (Kuo and Joe, 2002;Niederreiter, 1992;Sinescu and L'Ecuyer, 2010;Sloan and Rezstov, 2002) that if all the mixed partial derivatives of order one of f are absolutely integrable over [0, 1) s , then for any ǫ > 0 there are rank-1 lattice rules for all n, such that the integration error converges as O(n −1+ǫ ) uniformly over the shift U. These rules are also not difficult to find and they can be constructed explicitly component by component (one coordinate at a time) (Kuo and Joe, 2002;Sloan and Rezstov, 2002). Of course, this convergence rate does not apply to an arbitrary lattice rule. One can construct examples of badly selected rules for a given function f where the integration error is the same for all n. On the other hand, this rate applies to the average over all reasonable rank-1 lattice rules, so it represents the rate for typical rules used in practice. Note that this convergence rate is just an upper bound and does not tell much about the error distribution.
To study this distribution, we will decompose the error as a sum of three terms, E 1 + E 2 + E 3 , where nE 1 is a linear combination of independent uniform random variables over (−1/2, 1/2), whose cdf is a spline of degree s that does not depend on the choice of lattice, E 2 has a discrete distribution over n s−1 values, and E 3 is O(n −2+ǫ ) when f is smooth enough, and it typically becomes negligible when n → ∞. Any of these terms could be zero in certain situations. The term E 1 is the integration error for the best approximation of f by a sum of one-dimensional functions. In situations where this term dominates the total error, the spline is a good approximation of its cdf. For E 2 , we do not have an expression or a characterization of the limiting distribution; it seems that the cdf of E 2 , and thus the cdf of the error when E 2 dominates, can be almost anything. We also look at the integration error via its Fourier expansion.
Our results for E 1 come from our study of the special case where f is a sum of one-dimensional functions. In practice, it is not unusual that E 1 dominates the error, especially in situations where RQMC works very well, and this is often achieved deliberately by transforming the integrand f via a change of variables Caflisch, Morokoff and Owen (1997); L'Ecuyer (2009); Sloan and Joe (1994). We show that for a sum of s non c-periodic one-dimensional functions, the cdf of n times the integration error converges to a spline (a piecewise-polynomial function) of degree s when n → ∞, and this limiting distribution has a bounded support (in contrast with the normal distribution). In this sense, the behavior is significantly different than when a classical CLT holds. We also show that when s → ∞, under certain conditions, a classical CLT holds for a properly standardized version of the error. We give illustrative examples.
Finally, we perform an empirical study of the error distributions for an example inspired from real-life: pricing an Asian option.
The remainder is organized as follows. In Section 2, we recall the basic definitions and results on randomly-shifted lattice rules, and introduce some notation. In Section 3, we show how the random shift can be equivalently generated over a smaller subset of the unit hypercube, and we provide several illustrations. In Section 4, we study the error distribution for one-dimensional integrals and compare it with the error distribution obtained by a stratification scheme. In Section 5, we look at linear combinations of one-dimensional functions and show that a classical CLT holds under certain conditions as the number of functions increases. In Section 6, we study the general s-dimensional case. A real-life example is examined in Section 7.
Preliminary results (mostly for the one-dimensional case) were published in the proceedings of the 2009 Winter Simulation Conference (L'Ecuyer and Tuffin, 2009).

Definitions
An integration lattice is a discrete vector space of the form where the basis vectors v 1 , . . . , v s ∈ R s are linearly independent over R and where L s contains Z s , the set of integer vectors. The dual lattice to L s is where "t" means "transposed" and the vectors are assumed to be column vectors whenever it matters (in the text we often write them as row vectors to simplify the notation). It is easy to see that Z s ⊆ L s if and only if L * s ⊆ Z s , i.e., if all coordinates of all vectors of L * s are integers. Let P n = L s ∩ [0, 1) s = {u 0 , . . . , u n−1 } be the set of lattice points that belong to the unit hypercube [0, 1) s , where n is the cardinality of this set P n . One can show that all coordinates of all vectors of L s must be multiples of 1/n. A (deterministic) lattice rule is a numerical integration method that approximates µ by the average of f (u 0 ), . . . , f (u n−1 ). Further details on integration lattices, lattice rules, and ways of measuring the uniformity of P n for an integration lattice, can be found in L'Ecuyer (2009); L'Ecuyer and Lemieux (2000); Sloan and Joe (1994); Tuffin (1998), for example. Most integration lattices used in practice are projection-regular, which means that each one-dimensional projection of P n is the set {0, 1, . . . , n− 1}, and we assume here that the lattice has this property. They are also usually of rank 1, which means that the basis can be selected so that v j = e j (the jth unit vector in s-dimensions) for j = 2, . . . , s, and we can write where a 1 = (a 1 , . . . , a s ) and v 1 = a 1 /n.
A lattice rule can be turned into an RQMC method by applying a random shift modulo 1 to P n , which consists in generating a single point U ∼ U (0, 1) s and adding it to each point of P n , modulo 1, coordinate-wise. That is, each point u i = (u i,1 , . . . , u i,s ) ∈ L s ∩ [0, 1) s is randomized into U i = (U i,1 , . . . , U i,s ) = (u i +U) mod 1. This was proposed by Cranley and Patterson (1976) and further studied by L'Ecuyer and Lemieux (2000), among others. This is the same as randomly shifting L s and then taking the intersection with [0, 1) s . Clearly, U i ∼ U [0, 1) s for each i, and the global uniformity of P n is preserved by the shift in the sense that the points have the same relationship with each other if we forget about the hypercube boundaries. Figure 1 gives a toy illustration with s = 2, n = 8, v 1 = (1/8, 3/8), v 2 = (0, 1), and U = (0.436, 0.233).

Integration error
The integration error for a lattice rule shifted by u is given by We are interested in the behavior of g(u) as a function of u, and the distribution of g n (U), the integration error for a randomly-shifted lattice rule, for large n.
Suppose that the integrand f has the Fourier expansion with Fourier coefficientŝ Then it is known that the Fourier coefficients of g n satisfŷ g n (h) =f (h) if 0 = h ∈ L * s , andĝ n (h) = 0 otherwise (Niederreiter, 1992;Sloan and Joe, 1994). Therefore, the Fourier expansion of g n can be written (when it exists) as By the Parseval equality, we also have whenever f is square integrable (L'Ecuyer and Lemieux, 2000). By making assumptions on how fast the Fourier coefficients converge when the size of h increases, one can obtain asymptotic bounds on the worst-case error sup u∈[0,1) s |g n (u)| or on the variance Var[g n (U)]. For example, let α > 1/2, take some non-negative constants γ 1 , . . . , γ s , and consider the class of functions f : [0, 1) s → R such that for all h = (h 1 , . . . , h s ) ∈ Z s , It is known that there exists a sequence of lattice rules, indexed by n, such that for any δ > 0, the worst-case error and the variance converge as O(n −α+δ ) and O(n −2α+δ ), respectively (Dick et al., 2006;Sloan and Joe, 1994). When α is an integer, the above condition can be written in terms of square integrability of a collection of partial derivatives: f satisfies the condition if for every subset of coordinates, the partial derivative of order α with respect to these coordinates is square integrable, and the partial derivatives of orders 0 to α − 2 are c-periodic (Dick et al., 2006;L'Ecuyer, 2009). Concrete applications where the variance is reduced drastically by randomlyshifted lattice rules can be found in L'Ecuyer (2009); L'Ecuyer and Lemieux (2000); Lemieux (2009), for example.

A general condition on tiling patterns
Instead of generating the random shift U ∼ U (0, 1) s , we can equivalently generate it over a smaller subregion R 0 of volume 1/n. The key observation is that shifting by u i + U (modulo 1) for any u i ∈ P n is equivalent to shifting by U, because shifting the lattice L s by any lattice point just gives the same lattice. The next proposition characterizes the regions R 0 for which this idea works. These regions determine a tiling pattern made of n shifted versions of R 0 that cover the unit hypercube.
Proposition 1. Let R 0 ⊂ [0, 1) s be a region such that the family {R i = (R 0 + u i ) mod 1, i = 0, . . . , n − 1} forms a partition of [0, 1) s in n regions of volume 1/n, where R i is R 0 shifted by u i modulo 1. Then, sampling the random shift uniformly in R i for any fixed i is equivalent to sampling it uniformly in [0, 1) s . Consequently, the error function g n (u) over any of those regions R i is the same as over R 0 , in the sense that g n (u) = g n ((u + u i ) mod 1) for all i.
Proof. Note that if a shifted lattice point u is in R k for some k, then for each i = k, another shifted lattice point (u i − u k + u) mod 1 must be in R i . This means that each R i must contain a single shifted lattice point. Consider now a fixed i and suppose (without loss of generality) that u i is the lattice point that belongs to R i before the shift. LetŨ i be the single lattice point that falls in R i after a shift by U ∈ [0, 1) s , and letŨ k = (u k − u i +Ũ i ) mod 1 for k = 0, . . . , n − 1. Then a random shift by any of theseŨ k is the same as a random shift byŨ i , and leads to exactly the same point in R i . If τ i denotes the transformation that maps the uniform random shift U ∈ [0, 1) s toŨ i ∈ R i , then the inverse image by τ i of any measurable set of volume ǫ > 0 in R i is the union of n sets of volume ǫ in [0, 1) s , and so has total volume nǫ. This means that the pointŨ i has the uniform density n over R i , and we can generate it directly from this density.
Next, we examine some choices of R 0 that satisfy this proposition.

Random shift in a parallelotope
Consider a set of basis vectors v 1 , . . . , v s that define the lattice. These vectors determine a parallelotope (an s-dimensional parallelepiped) of volume 1/n, with one vertex at the origin, and whose edges connected to that vertex are precisely v 1 , . . . , v s . With R 0 = P , the conditions of Proposition 1 are satisfied (Conway and Sloane, 1999). To sample U uniformly in P , we can generate C = (C 1 , . . . , C s ) uniformly over (0, 1) s and put U = C 1 v 1 +· · ·+C s v s . This mapping C → U is a linear transformation, so its Jacobian is a constant, and therefore it preserves uniformity.
Example 1. Figure 2 illustrates this property for the same example as in Figure 1, with the shiftŨ ∈ P that corresponds to the shift U shown in Figure 1. The parallelogram P is shaded in light gray in the figure. The figure also shows the parallelogram (P + u i ) mod 1 for u i = (0.875, 0.625), which is split into three pieces, shaded in dark green. Sampling uniformly in this parallelogram is easily achieved by sampling uniformly in P and adding u i modulo 1.
The original eight points and the two vectors v 1 and v 2 of a lattice basis (left) and the points shifted byŨ = (0.061, 0.108) modulo 1 (right).

Random shift in a rectangular slice
A simpler way of partitioning [0, 1) s is as follows. Fix an arbitrary coordinate j ∈ {1, . . . , s} and, for each This partitions the hypercube into n hyper-rectangles (or slices) R i of thickness (and volume) 1/n. The assumption of Proposition 1 holds, because coordinate j of the original points takes each value in {0, 1/n, . . . , (n−1)/n} exactly once when we run through all the points, and therefore there is always exactly one point in each R i . In one dimension (s = 1), this R i is the interval [i/n, (i + 1)/n), so we can generate the random shift over the interval [0, 1/n). Then the error function g n (u) for u ∈ [0, 1) is periodic with period 1/n (or a divisor of 1/n).
Example 2. In Example 1, we can generate the random shift U uniformly over the rectangle R 0 = [0, 1) × [0, 1/8) instead of over the parallelogram. This is equivalent and obviously simpler. Figure 3 (left) illustrates this situation, with a shift U ∈ R equivalent to that or Figure 1.
Proposition 1 works for other types of regions than rectangles and parallelotopes. The shaded region in Figure 3 (right) gives another illustration.

Discontinuity of g n for non-periodic functions
Shifting the points by u modulo 1 is equivalent to shifting the function f by −u modulo 1. If f is c-periodic, f ((u i + u) mod 1) varies continuously with u for each i, which means that the error function g n (u) is also continuous in u everywhere in [0, 1) s , and is c-periodic.
On the other hand, suppose f is not c-periodic with respect to the jth coordinate and let u = (u 1 , . . . , u s ) denote the shift. When u j increases while the other coordinates of u are fixed, each time u j crosses a multiple of 1/n, there is one shifted point whose jth coordinate jumps from 1 to 0, and this causes a jump in the error function g n (u). Thus, g n (u) is discontinuous in u j across the hyperplane u j = i/n for any integer i, and it is continuous in u j elsewhere, provided that f is continuous in (0, 1) s . Therefore, in the general non-c-periodic situation, g n (u) is continuous in each small hypercube of volume n −s determined by these hyperplanes, and is discontinuous across the boundaries of these small hypercubes. In the one-dimensional case, these small hypercubes become the intervals of the form [i/n, (i + 1)/n); the function g n (u) is the same over each of those intervals and it is discontinuous across them.
If f is a sum of one-dimensional functions, say f (u) = s j=1 f j (u j ), then we can analyze the error by examining each f j separately. The error function for f can be written as where g j,n is the error function for f j . This g j,n is periodic with period 1/n, and it is discontinuous at each multiple of 1/n if f j is not c-periodic. This implies that the function g n over any of the small hypercubes of the form In that case, the random shift U can be equivalently sampled uniformly in any of the small s-dimensional hypercubes of volume n −s .
The following examples illustrate these properties.
Example 3. Figure 4 shows the error function g n (u) for u ∈ [0, 1) 2 , for an example where s = 2, f (u 1 , u 2 ) = u 1 + u 2 − 3/2 = (u 1 − 1/2) + 2(u 2 − 1/2), n = 8, and the lattice points are the same as in Figure 1. This is a sum of two one-dimensional functions, so the error function has exactly the same shape in each small square of the form [i 1 /n, (i 1 + 1)/n) × [i 2 /n, (i 2 + 1)/n). It is also discontinuous across the boundaries of those squares, because f is not cperiodic. Here the error varies a lot within each square but it is the same across all squares.
Example 4. Figure 5 plots g n (u) for u ∈ [0, 1) 2 for s = 2, f (u 1 , u 2 ) = (u 1 − 1/2) (u 2 − 1/2), and the same points. In contrast with the previous example, the error varies a lot across the squares and very little within each square. The error function is symmetric with respect to each of the two diagonals. If we partition the unit square into four squares of area 1/4, the error function is also symmetric with respect to each of the two diagonal in each of those four squares. This high symmetry comes from the symmetry in f .
Example 5. Figure 6 plots g n (u) for u ∈ [0, 1) 2 for s = 2, f (u 1 , u 2 ) = u 2 1 (u 2 − 1) + 1/6, and the same points. The error has a behavior halfway between that of the previous two examples: It varies substantially both across the squares and within them. Also, it does not exhibit as much symmetry.

Limit theorems
One-dimensional integrals are rarely estimated by RQMC methods but studying the distribution of g n (U ) for a randomly-shifted lattice rule in one dimension (s = 1 and U ∼ U (0, 1)) is a first step toward understanding what may happen in general, and the result will be reused as a building block for the general case. It also provides simple counter-examples showing that the CLT does not apply. When s = 1, the random shift can be generated uniformly over (0, 1/n), so the randomly-shifted points can be written as {U/n, (1 + U )/n, . . . , (n − 1 + U )/n) where U ∼ U (0, 1) and the points are shifted by U/n. The corresponding total error is g n (U/n), and although it is not equal to g n (U ) for the same U , Proposition 1 tells us that it has exactly the same distribution. Thus, we could expand the error over each interval of length 1/n, and sum up these terms to get a handle on g n (U/n) and its distribution. This would yield the same result as in the next theorem, which provides an asymptotic expansion of of order m for g n (U/n) when f has at least m + 1 integrable derivatives. This theorem appears in a different form in (Abramowitz and Stegun, 1970, Section 23.1.32), without a proof. For completeness, we provide a proof in Appendix A. We use B k to denote the kth Bernoulli polynomial, defined by Abramowitz and Stegun, 1970). This gives B 0 (u) = 1, B 1 (u) = u−1/2, B 2 (u) = u 2 − u + 1/6, B 3 (u) = u 3 − (3/2)u 2 + (1/2)u, B 4 (u) = u 4 − 2u 3 + u 2 − 1/30, and so on.
Theorem 2 (Generalized Euler-MacLaurin expansion). If f has m + 1 integrable derivatives over [0, 1], then The last term is O(n −(m+1) ). This is the Euler-MacLaurin expansion of g n (U/n) of order m.
Proposition 3. Suppose that f has m + 1 integrable derivatives over [0, 1], and therefore Proof. Under the given assumptions, Theorem 2 gives from which the convergence result follows. For the last part, just take m → ∞ in Theorem 2.
We recall that g n (U/n) has the same distribution as g n (U ) where U ∼ U (0, 1). This corollary says that for non-c-periodic functions, n times the error converges in distribution to a uniform random variable, and for c-periodic functions whose derivative is not c-periodic, n 2 times the error converges in distribution to a random variable with a square root density over a finite interval.

Symmetric functions and the baker's transformation
With this, we can switch from case (i) to case (ii) in Corollary 4, i.e., from O(n −1 ) to O(n −2 ) convergence for g n (U ) (Hickernell, 2002). The resulting functionf is also symmetric with respect to u = 1/2. This transformation is equivalently achieved by keeping f unchanged and transforming the randomized . This is called the baker's transformation; it stretches the points U i by a factor of two and then folds back those that exceed 1. After applying this transformation, the lattice points become locally antithetic in each interval of the form [i/n, (i + 2)/n] if n and i are even, in the sense that they are at equal distance from the center of the interval, on each side. As a result, they integrate exactly any linear function over this interval. This holds for every such interval, so a piecewise-linear approximation which is linear over each interval is integrated exactly.

Comparison with stratification and randomized nets
We now examine the case where instead of using the same shift by U/n over each interval of length 1/n, we use independent random shifts, say by U i /n over the interval [i/n, (i + 1)/n), where U 0 , . . . , U n−1 are independent and have the U (0, 1) distribution. In one dimension, both Latin hypercube sampling and the scrambled nets of Owen (1997) are equivalent to this form of stratified sampling over the unit interval. The (unbiased) estimator becomeŝ Suppose that f has square integrable derivatives up to order 2. A Taylor expansion of f around x i = (i + 1/2)/n gives for some ξ i ∈ [i/n, (i + 1)/n] that depends on U i . The overall variance is then for large n. Moreover, unless f is a constant, the estimator obeys a CLT in the sense that 12n 3 1 0 (f ′ (u)) 2 du 1/2μ n,str converges in distribution to a standard normal random variable. Thus, the randomly-shifted lattice rule estimator converges at a slower rate than the stratified estimator if f is not c-periodic, and at a faster rate if f is c-periodic. Another popular RQMC method is a digital net with a digital random shift (L'Ecuyer, 2009;Owen, 2003). In one dimension, this turns out to be equivalent to a randomly-shifted lattice rule, and therefore all the same results apply.
For n = 1024, the shift by 1/6144 is too small to be visible to the eye. For the case of stratification, noting that 1 0 (f ′ (u)) 2 du = 4/3, we find that 3n 3/2μ n,str converges to a standard normal.
If n is odd, the middle term of the summation does not have a continuous derivative, so the theorem does not apply. But if we write n = 2p + 1 and define t p (u) = k + u n Thus, n 2 g n (U/n)/4+1/12 converges to U 2 if U ≤ 1/2 and to (1−U ) 2 otherwise, so it has the same limiting distribution as when n is even: For the stratified estimator, here we have 1 0 (f ′ (u)) 2 du = 2 1/2 0 (8u) 2 du = 16/3, from which we find that (3/2)n 3/2μ n,str converges to a standard normal. This is a slower convergence rate than for the randomly shifted lattice rule. Moreover, applying the baker's transformation multiplies the variance by 4.

Discontinuous and unbounded functions
The integrands f encountered in practice are often discontinuous, and sometimes they are also unbounded and have unbounded derivatives. The next examples give simple illustration of what can happen in these cases, in one dimension.
Example 11. Consider the step function f (u) = 0 for u < a and f (u) = 1 for u ≥ a, for some constant a ∈ (0, 1). For a given n, let δ(n) = ⌈na⌉/n − a. The integration error with a randomly shifted lattice rule is then g n (U/n) = −δ(n) if U/n < 1/n − δ(n), and g n (U/n) = 1/n − δ(n) if U/n ≥ 1/n − δ(n), where U is U (0, 1). Thus, the error is distributed over only two possible values, and the variance is O(n −2 ), unless a is a multiple of 1/n, in which case there is no error. Interestingly, with the stratification, the error has exactly the same behavior, because the error is determined by a single U i .
To generalize this, suppose that f is piecewise constant with a jump of size d j (positive or negative) at position a j , for j = 1, . . . , k, for some integer k > 0. As for the case of a single jump, changing the position of a jump by any multiple of 1/n does not change the distribution of g n (U/n), so we can assume that all the jumps are in the interval (0, 1/n), sorted in increasing order. Let a 0 = s 0 = 0 and s j = d 1 + · · · + d j , for j = 1, . . . , k, and a k+1 = 1/n. Then the average is µ = n k j=0 s j (a j+1 − a j ) and the error g n (U/n) is (s j − µ)/n with probability (a j+1 − a j )n. Thus, the error is distributed over k + 1 possible values only, regardless of n, and the variance is again O(n −2 ). However, these k + 1 values depend on n, and the distribution of n g n (U/n) may vary with n without converging to anything.
Even more generally, suppose f is twice continuously differentiable, except at k points a j where it has a jump. Then f can be written as f = f j + f c , where f j is a piecewise constant function as above and f c is a smooth function that obeys Proposition 3. The error g n (U/n) is then the sum of two terms: an O(1/n) error with a discrete distribution over k values as above, and another term obtained by applying Proposition 3 to f c .
Example 12. Consider the unbounded function f (u) = g(F −1 (u)) where F is the cdf of a random variable with infinite support [0, ∞). Let n 0 be a (fixed) large integer and let b = 1 − 1/n 0 . Suppose that n → ∞ while n remains a multiple of n 0 . The error for the integral over the interval [0, b] has a distribution that obeys a modified version of Proposition 3, with f (k−1) (1) replaced by f (k−1) (b), and n replaced by n − n/n 0 . Over the last interval [b, 1), the distribution is more complicated, and the error is unbounded. In fact, the error over the interval [(n − 1)/n, 1) is distributed as a random variable generated from the tail of the distribution F , so the total error will have a right tail that resembles the right tail of F . It will not be uniform.

Setting and relevance
To study the integration error in the general situation where s ≥ 1, we will decompose this error in the next section as a sum of three terms, where the first term E 1 is the limiting integration error for a sum of one-dimensional functions. This E 1 is sometimes the dominant term of the error. The present section is devoted to examining the behavior of the error for such a sum. That is, we suppose here that f has the form where f j : [0, 1) → R and a j ∈ R for each j. We will obtain the limiting distribution for this case by applying the results of Section 4 to each f j , and looking at the linear combination. Our error decomposition is in fact connected with the functional ANOVA decomposition of f (Liu and Owen, 2006;Owen, 1998), written as where the f u : (0, 1) s → R depend only on {u i , i ∈ u}, are defined recursively by f ∅ = µ (a constant function) and . . , s}, where the first integral is with respect to the coordinates of u whose indexes are not in u, denoted by uū. The f u 's integrate to zero and are orthogonal, and the variance σ 2 = [0,1) s (f (u) − µ) 2 du (assumed finite) decomposes as σ 2 = u⊆{1,...,s} σ 2 u where σ 2 u = Var[f u (U)] for U ∼ U (0, 1) s . We say that f has effective dimension d in proportion ρ in the superposition sense (Owen, 1998) When (10) holds for d = 1 and ρ close to 1, the approximation of f − µ by the sum of one-dimensional functions f {j} for j = 1, . . . , s, in the ANOVA decomposition, accounts for most of the variance (and mean square error). Then, the higher-order functions in the decomposition have a small impact and the distribution of the error g n (U) would be well approximated by the distribution of the sum in (8) with f j = f {j} and a j = 1. Note that low effective dimension can sometimes be achieved by redefining f via a change of variables, without changing the expectation µ, and this often makes RQMC much more effective (Caflisch, Morokoff and Owen, 1997;Glasserman, 2004;L'Ecuyer, 2009).

Convergence to a spline
Suppose that f has the form (8) and that each f j satisfies the assumptions of Proposition 3 for some m = m j ≥ 1. If m j = 1 for at least one index j, then the error will converge as O(1/n) and the f j 's for which m j > 1 will have no impact on the asymptotic distribution, because their error contribution converges at a faster rate. So to simplify the exposition, suppose that m j = 1 for all j (the other f j 's have been removed) and that each f j has an integrable second derivative. We can then apply Corollary 4(i) to each f j : It says that if g j,n (U j ) denotes the corresponding integration error, then n g j,n (U j /n) converges to [f j (1) − f j (0)](U j − 1/2). But the integration error function for f can be written as This means that the asymptotic distribution of n g n (U) is the same as the distribution of . This W is a linear combination of independent uniform random variables over the interval (−1/2, 1/2). Theorem 1 of Barrow and Smith (1979), restated in the next proposition, tells us that the (exact) cumulative distribution function of W is a non-decreasing spline of degree s, with s − 1 continuous derivatives, and its support is the finite interval [−b, b], where b = s j=1 |b j |/2. Proposition 5. The exact distribution function of W is given by for −b ≤ w ≤ b. Its density is given by a spline of degree s − 1: where 0 0 = 0 by convention.
Example 13. As an illustration, let s = 2 and f (u 1 , u 2 ) = 5u 1 + 3u 2 − 4 = 5B 1 (u 1 ) + 3B 1 (u 2 ). The constant −4 has no effect on the error. Here, we have b j = a j , b = 4, and the distribution function F w of W = ng n (U) = ng n (U 1 , U 2 ) (here we have equality) is a spline of degree 2 defined by The corresponding density is a "table mountain" function, defined by

Asymptotic distribution when s → ∞
If f has an additive form as in (8) and s → ∞ while n remains fixed, then g n (U) is a sum of s independent random variables and its properly standardized version may obey a CLT when s → ∞, under standard conditions on the variances of the terms of the sum. The following version of the Berry-Essen theorem provides simple sufficient conditions for the CLT to hold, together with an explicit bound on the approximation error for the distribution function. It is a direct adaptation of Petrov (1995, Theorem 5.7) reformulated for our setting.
The following condition is less general, but simpler. Proof. Because which we rewrite as If the right side vanishes when s → ∞, then w s /v s also vanishes. Proof. Because the contribution from f j is the same in every term in the sum, it can be factorized out of the sum, then just ignored because the right-hand side is zero. A similar argument holds for the supremum in the numerator.
Example 14. Suppose f j (u j ) = B mj (u j ), the Bernoulli polynomial of degree m j , so that (8) becomes Example 10 told us that the error function for the jth term is g j,n (u j ) = n −mj B mj (u j ), so the error function for the sum is g n (u) = s j=1 a j n −mj B mj (u j ).

We also have
where b k is the k-th Bernoulli number, which can be defined as Corollary 7 implies that if then the CLT holds for the standardized version of g n (U). It is known (Abramowitz and Stegun, 1970, page 805) that |b k | > 2k!/(2π) k for k = 2, 4, . . . and that |B k (u)| < 2k!/(2π) k (1 − 2 1−k ) < 4k!/(2π) k for u ∈ (0, 1) and k ≥ 2. The latter inequality also holds for k = 1, since |B 1 (u)| < 1/2 < 4/2π for u ∈ (0, 1). Using these inequalities, we can derive an upper bound for the left-hand side of (14), which implies that the CLT holds under the following sufficient condition (in which we have dropped the constant factors): For the special case where m j = k (a positive integer) for all j, (15) is equivalent to Corollary 8.
Example 15. As a special case of Example 14, let f be a linear combination of Bernoulli polynomials of degree 1: If a j = j k for all j, where k ≥ 0 is fixed, Corollary 8 applies, because If a j = ρ j for all j, where 0 < ρ < 1, then when s → ∞, where c(δ) = 3 1+δ/2 /(3 + δ), so Corollary 8 does not apply. One can show that the CLT does not hold in this case; the a j 's are converging to 0 too fast.

Normal P-P plots for numerical examples
We report a few representative illustrations of simulation experiments we made to assess the normality of the error. For these experiments, and all other numerical examples in the rest of the paper (unless otherwise indicated), we use a rank-1 lattice rule in 20 dimensions with n = 8192, v 1 = a 1 /n where (1,2431,3739,1689,3185,2609,4343,1525,71,6083,2585,1583,5385,3359,7759,3055,2627,4535,4475,1623), and v j = e j (the jth unit vector) for j = 2, . . . , 20. When the integrand has s < 20 dimensions, we just use the first s coordinates of each point.
In our experiments, we compute m independent replicates of the RQMC estimatorμ n,rqmc in (3) and we subtract the global average (or the exact mean µ when we know it) to each replicate to obtain estimates of the corresponding replicates of g n (U), say g n (U 1 ), . . . , g n (U m ). We then compute their empirical variance S 2 n,m and the standardized errors Z n,j = g n (U j )/S n,m for j = 1, . . . , m. Finally, we plot the empirical distribution of these standardized errors, defined byF against Φ(z), where Φ is the standard normal distribution function. Unless otherwise specified, we take m = 10 4 , in order to obtain good accuracy in the estimation of the error distribution.
Example 16. Figure 9 shows normal P-P plots of the error for a function f defined by (16) with a j = j 2 and with a j = 2 −j , as in Example 15. As expected, the P-P plots indicate a nearly normal distribution when s is large enough in the first case, for which the CLT holds, but not for the second case. We made a similar plot for a j = 1 (not shown), and the fit was already excellent for s = 2 and looked perfect (to the eye) for s = 10 and 20. We also performed similar experiments by replacing the first-degree Bernoulli polynomials by second-degree or by third-degree Bernoulli polynomials, and we observed similar behaviors for the same three choices of coefficients a j .  (17) with r = s − 1 (left) and r = 1 (right). The behavior of all the curves is somewhat similar to that in Figure 9, except that the non-normality on the right panel is more accentuated. a j = 1 for all j: for some r < s. Figure 10 shows normal P-P plots of error for functions f defined as in (17) in s = 2, 10 and 20 dimensions, for r = s − 1 and for r = 1. In both cases, the error g n (U) is dominated by the errors on the polynomials of degree 1. In the first case (r = s−1), when s = 10 or 20, this dominant part of the error is a sum of several independent terms and the normal approximation provides a very good fit (the P-P plots look perfect) for this reason. For s = 2, the normal approximation is not good, as expected. In the second case (r = 1), the P-P plots show a bad fit (and almost superpose each other) for all values of s. This can be explained by the fact that the dominant part of the error comes from a single polynomial of degree 1 for all s. By looking at this plot, one may conclude that the CLT does not hold here if r = 1 and s → ∞. But the expression in (15) in that case becomes so the CLT does apply, but the convergence in s is very slow and does not show up in Figure 10.
We say that f has effective dimension d in proportion ρ in the truncation sense (Owen, 1998) If (19) holds for small d and ρ close to 1, this means that f depends almost only on the first few variables (or random numbers) U j . A low effective dimension in the truncation sense is incompatible with meeting (even approximately) the conditions of Proposition 6. Various techniques have been designed to transform the integrand f to reduce its effective dimension in the truncation sense in order to improve the effectiveness of RQMC (L'Ecuyer, 2009). The PCA method in Section 7 is one such technique. One side effect of these techniques is that they tend to reduce significantly the normality of the RQMC estimator. On the other hand, a low effective dimension in the superposition sense is not incompatible with the conditions of Proposition 6. In fact, all the integrands in Examples 16 and 17 have effective dimension 1 in the superposition sense and yet the proposition applies.

A first-order error decomposition
For a general s-dimensional integrand f , we will separate the sum of its onedimensional ANOVA components from the rest, and apply the results of Section 5 to this sum. If f {j} is the jth one-dimensional ANOVA component of f and if each f {j} has an integrable second derivative, then the integration error for the sum is a random variable whose distribution can be approximated as in (11) by that of This provides a first-order approximation for the situation where the one-dimensional ANOVA components are not all c-periodic. The importance of the contribution of E 1 to the total error g n (U) will depend on f . On the other hand, it does not depend on the choice of lattice, because all integration lattices have the same one-dimensional projections. From a different perspective, we have seen that when we randomize the points, we can first choose a small hypercube [0, 1/n) s + K/n at random, where K is a random vector uniformly distributed in {0, . . . , n − 1} s , and then sample a random shift U/n uniformly inside that hypercube, where U ∼ U (0, 1) s . Equivalently, we can sample K uniformly over a restricted subset of n s−1 distinct hypercubes that satisfy the conditions of Proposition 1, instead of over all n s hypercubes. The corresponding error is and the expected error conditional on K is This E 2 depends only on the choice of K and represents the error due to the difference of means across the small hypercubes. In contrast, E 1 does not depend on K and depends only on the random shift U/n. Thus, E 1 and E 2 are two independent components of the error g n (U). An equivalent way of viewing this is to write the first terms of the ANOVA decomposition ofg n (K, U) after conditioning on K: where we have used the fact (easy to verify) that the one-dimensional ANOVA components conditional on K,g n,{j} (K, U), do not depend on K. Let be the remainder of the error. We will show in what follows that when f is smooth enough, E 3 becomes negligible compared with E 1 and E 2 when n increases, and E 2 is typically the dominant part of the error when n → ∞.
In the case of Latin hypercube sampling, for comparison, ifẼ 1 is the total integration error on the one-dimensional ANOVA components, then n 3/2Ẽ 1 is asymptotically normal as in the one-dimensional case, whereas in two or more dimensions the total integration error obeys an ordinary CLT, as shown by Stein (1987).

Convergence of the error terms
We see from its definition that |E 1 | converges as O(n −1 ) (unless it is zero), and that E 1 depends neither on K nor on the choice of lattice. Its variance is In contrast, it is difficult to obtain a simple formula for the variance of E 2 , because it depends on both f and the lattice. We saw earlier that if f is smooth enough, namely if all its mixed partial derivatives of order one are absolutely integrable, then for any ǫ > 0, it is easy to find rank-1 lattice rules for all n, such that for any shift U, the integration error g n (U) converges as O(n −1+ǫ ). Then, |E 2 | and |E 3 | will converge at that speed or faster. Proposition 10 below shows that under a slightly stronger smoothness assumption on f , there are lattice rules for which |E 3 | converges almost as O(n −2 ). In fact, this convergence holds on average over rank-1 lattice rules, so it represents the typical behavior as soon as we make some effort to select a good rule. This means that E 3 would typically become negligible compared with E 1 + E 2 when n → ∞. We then obtain |E 2 | = O(n −1+ǫ ) while |E 1 | = O(n −1 ), so E 2 should eventually dominate when n → ∞. For a typical (moderate) n, however, E 1 may still dominate, and the distribution of n(E 1 +E 2 ) can be taken as an approximation of the distribution of ng n (K, U). We will use the following lemma in the proof.
Lemma 9. Suppose that f has integrable mixed partial derivatives of second order over [0, 1) s . Then, for any ǫ > 0, there exist rank-1 lattice rules for which and this property actually holds on average over all rank-1 lattice rules.
Proof. Let f j denote the partial derivative of f with respect to its jth coordinate. The partial derivative ofg n (K, u) with respect to the jth coordinate of u can be expanded as From the theory of lattice rules, as we have seen earlier, if all the mixed partial derivatives of first order of f j are absolutely integrable, then for any ǫ > 0, and for all n, on average over all rank-1 lattice rules, the worst-case integration error on f j by the lattice rule converges as O(n −1+ǫ ) uniformly in the shift U. If we use a rule at least as good as the average, we obtain Proposition 10. If f has integrable mixed partial derivatives of second order over [0, 1) s , then for any ǫ > 0, there exist rank-1 lattice rules for which E 3 (K, U) = O(n −2+ǫ ) uniformly in K and U, and this holds on average over all rank-1 lattice rules.
Proof. Fix the components of u equal to those of U, except for the first one, and denote U ′ = (U 2 , . . . , U s ) and U ′′ = (U 3 , . . . , U s ). By applying Lemma 12 of the appendix with m = 1 to the function F (u 1 ) =g n (K, u 1 , U ′ ), given that this function has two integrable derivatives, we obtain that for all U 1 ∈ (0, 1), In the O(n −2 ) term here and in all the O(n −2 ) terms that follow, the implied constants can be taken independent of K and U. The error difference in the square brackets on the right can be approximated by where we have used, for the left side, the fact that the second-order partial derivatives ofg n are O(n −2 ), and Lemma 9 for the right side. Substituting this approximation into (21), we find Next, we let u 2 vary again and integrate with respect to u 2 : Applying (22) to the integral on the right-hand side of the above equation, but with u 2 taking the place of u 1 , we obtain 1 0 1 0g n (K, u 1 , u 2 , U ′′ ) du 1 du 2 =g n (K, U 1 , U 2 , U ′′ ) This procedure can be applied repeatedly up to the last coordinate and this yields the result.
We just saw that with well-chosen lattice rules, E 2 becomes the dominating part of the error when n → ∞. However, we do not have an explicit form for the limiting distribution of E 2 , and it seems unlikely that a practically useful form (for computing confidence intervals in general) can be obtained. In fact, it seems that the distribution of E 2 could be almost arbitrary. More specifically, for any pre-specified continuous cdf F with finite support, by taking n large enough and picking any lattice rule with n points, one can construct a smooth function f for which the cdf of E 2 = E 2 (K) is arbitrarily close to F . It suffices to specify f so that the average error E 2 (K) takes the values F −1 ((j + 1/2)n −s+1 ) for j = 0, . . . , n s−1 when K runs through its n s−1 possibilities, and then smooth the function across the boundaries of the small hypercubes. This means that for large n, the errorg n (K, U) can have (approximately) essentially any distribution. On the other hand, it is unclear if this can be done for a fixed function f with integrable mixed partial derivatives of order two, and independent of n.  Figure 11 compares the cdf of nE 2 (in red) and that of ng n (U) (in blue) for Examples 4 and 5, where n = 8. In these figures, the cdf of nE 2 was computed exactly whereas that of ng n (U) is an empirical version obtained from 10 4 independent replications. For Example 4, it turns out that E 1 (U) = 0 for all U, sog n (K, U)−E 2 (K) = O(n −2+ǫ ), which explains the good agreement between the two distributions plotted in Figure 11 (top).

Two-dimensional examples
For Example 5, we have E 1 (U) = −(U 1 − 1/2)/(2n) + (U 2 − 1/2)/(3n), whose possible values range from −5/(12n) to 5/(12n), and whose cdf is a spline of second degree, according to Proposition 5. That is, for any k ∈ {0, . . . , n − 1}, ng n (k, U) can depart from its average in the small square by up to 5/12 ≈ 0.417 (if we neglect the O(n −2+ǫ ) term), which is more than twice the largest distance between any two consecutive jumps of the cdf of nE 2 , which is 0.200 (the distance between the first and last jumps of that cdf is approximately 0.571). This explains the fact that the cdf of ng n (U) here is not well approximated by that of nE 2 . It is much smoother. If we increase n, the steps of nE 2 become smaller and its discrete distribution gets closer to the empirical error distribution, but the gap between these two distributions does not converge to zero very quickly.

The general case
We saw earlier that the Fourier expansion of g n (u), for an arbitrary shift u ∈ [0, 1) s , is Let w 1 , . . . , w s be a basis of the dual lattice L * s and let W be an s × s matrix with columns w 1 , . . . , w s . These vectors have only integer coordinates. Thus, L * s is the set of all integer vectors h that can be written as for z = (z 1 , . . . , z s ) t ∈ Z s . By summing over the vectors z instead of the vectors h (a change of variable), we can rewrite for some function v n whose Fourier coefficients arev n (z) =f (Wz). This function v n depends on n and on the selected lattice, via W.
If we can recover v n explicitly from its Fourier coefficients, this would give an explicit formula for g n (u), the error as a function of the shift. More generally, we may be able to approximate from this expression at least the most important Fourier coefficients in the expansion of g n , and eventually from this, the distribution of g n (U).
Note that there is a link between this function v n and the generation of the shift uniformly in the parallelotope P . Indeed, if U has the uniform distribution over P , then we can write U = VŨ whereŨ ∼ U (0, 1) s and V is the s×s matrix whose columns are the basis vectors v 1 , . . . , v s . Then, W t V = I, and therefore g n (U) = v n (W t U) = v n (W t VŨ) = v n (Ũ). This implies that if U ∼ U (0, 1) s , then g n (U) and v n (U) have the same distribution (but they are not the same function of U).

A few multidimensional examples
Example 18. In one dimension, the dual lattice contains all multiples of n. Thus, the matrix W is one by one and contains the single entry n. The Fourier coefficients of v n are thenv n (z) =f (nz), for z ∈ Z, and we have g n (u) = v n (nu).
As a special case, if f is the kth Bernoulli polynomial B k , then its Fourier expansion is for u ∈ (0, 1), and we havê We can then conclude that v n (u) = n −k B k (u) and g n (u) = v n (nu) = n −k B k (nu mod 1) for all u ∈ (0, 1). But if U is U (0, 1), then nU mod 1 is also U (0, 1), and therefore n k g n (U ) has exactly the same distribution as B k (U ). This agrees with Example 10.
Example 19. Suppose that f : [0, 1) s → R is a product of Bernoulli polynomials:

Its Fourier expansion is then
for u = (u 1 , . . . , u s ) t ∈ (0, 1) s , where u(h) = {j : h j = 0}. In this case, the Fourier coefficientv n (z) =f (Wz) can be written aŝ where w (j) is the j-th row of the matrix W. Here, (w (j) z) mj is a multivariate polynomial in z. Example 20 illustrates the specific case where m j = 1 for all j.
Example 20. Figure 12 shows a normal P-P plot of the error for The discrepancy from the normal distribution becomes more obvious for s = 20. For this special case, for s = 2, the Fourier coefficients of the error function for a rank-1 lattice rule can be written explicitly aŝ v n (z 1 , z 2 ) = −1 4π 2 nz 1 (−a 1 z 1 + z 2 ) . 6.5. Sampling the error in the dual space So far, we saw asymptotic distributions and convergence conditions for the error.
Here we point out that expansion formula in (24) can sometimes be used to estimate the error distribution. The idea is to compute a truncated version of this expansion,g whereZ = {z = (z 1 , . . . , z s ) : −z j,0 ≤ z j ≤ z j,0 } for some positive integers z j,1 , . . . , z j,s . If we know the Fourier coefficientsf (h) of f , then the error distribution can be estimated by the empirical distribution of m independent replicates ofg(U) instead of m independent replicates ofμ n,rqmc as explained in Subsection 5.4. This could be advantageous if n is large and |f (Wz)| converges quickly to zero as z increases. This convergence speed may depend on the choice of W, i.e., the choice of dual basis.
Example 21. Consider the two five-dimensional functions We take the same lattice with n = 8192 as in Subsection 5.4. For W, we computed a Minkowski-reduced basis of the dual lattice (Afflerbach and Grothe, 1985); this gives basis vectors which are as short as possible and of almost the same length. Then we approximated the error distribution with m = 10 5 independent realizations ofg(U) in (26), for both f 1 and f 2 . We did this with z 1,0 = z 2,0 = z 0 first with z 0 = 1 and then with z 0 = 4. The Fourier coefficients converge much faster for f 2 than for f 1 , so we expect a better approximation for f 2 . This is confirmed empirically in Figure 13, which shows a P-P plot of the empirical distributionF of the m independent replicates of ng n (U ) against the empirical distributionF of m replicates of ng n (U ) sampled as in Subsection 5.4. Note that here,g n (U ) is much less time-consuming to sample than g n (U ). We observe an excellent fit in the case of f 2 , even with z 0 = 1. For f 1 , the approximation with z 0 = 1 is poor, while that with z 0 = 4 is reasonable but the lack of fit is still visible.

Example: Pricing an Asian option
Here we take an application used as an example in many articles. We compare two ways of applying RQMC, which can be seen as two ways of defining the function f to estimate the same µ. We examine empirically the RQMC error distribution and see how far it is from normal. Based on the asymptotic results of Section 5.3, we expect the error distribution to be generally closer to normal when f depends more or less equally on several coordinates of u, and farther from normal in the opposite case. Our results agree with this intuition.
Consider the pricing of an Asian option based on a geometric Brownian motion (GBM) {S(t), t ≥ 0} observed at discrete time points 0 = t 0 < t 1 < · · · < t s = T . The GBM process can be written as S(t) = S(0) exp[X(t)] where X(t) = (r − σ 2 /2)t + σB(t), {B(t), t ≥ 0} is a standard Brownian motion, r is the risk-free interest rate, and σ is the volatility parameter. The option's net discounted payoff is Y = e −rT max(0, S − K) where K > 0 is a constant called the strike price and S = (S(t 1 ) + · · · + S(t s ))/d, and the price of the option is where the expectation is under the risk neutral measure (Glasserman, 2004;Hull, 2000).
To simulate S and Y we need to sample the vector X = (X(t 1 ), . . . , X(t s )) from an s-dimensional normal distribution with mean µ = (r − σ 2 /2)(t 1 , . . . , t s ) than 98 % of the variance is concentrated on the first coordinate of u, which means that f looks more like a single one-dimensional function rather than a sum of several independent one-dimensional functions.

Acknowledgments
This work has been supported by an NSERC-Canada Discovery Grant and a Canada Research Chair to the first author, the EuroNF Network of Excellence to the third author, and INRIA's associated team MOCQUASIN to all authors. This paper is an expanded version of L'Ecuyer and Tuffin (2009) and part of the material was also presented at the Sixth St. Petersburg Workshop on Simulation, in June 2009.

Appendix A: Derivation of the generalized Euler-MacLaurin expansion
The following is the derivation of the generalized Euler-MacLaurin expansion given in Theorem 2. Here we use { · } to denote the component-wise modulo-1 operator.
Integration by parts of the first and second terms on the right-hand side yields respectively, where we have used the identity B ′ k+1 (u) = (k +1)B k (u). Summing the latter two expressions and using B 1 (0) − B 1 (1) = 1, B k (0) − B k (1) = 0 for k ≥ 2, Lemma 11 is readily obtained.
Lemma 12. If F has m + 1 integrable derivatives over [0, 1], then for any U ∈ (0, 1), Proof. First note that 1 0 F (u)du = 1 0 F (0) (u) B 0 ({U − u}) du, then apply Lemma 11 recursively: Next, we take the term with k = m + 1 out of the sum and into the integral, which yields Lemma 12. Now, to prove Theorem 2, we first write the integral of f as The sum over the index i in the second term on the right-hand side is telescoping and reduces to f (k−1) (1) − f (k−1) (0). From here, obtaining Theorem 2 is straightforward.
Example of a stochastic activity network, taken from Elmaghraby (1977) Appendix B: An additional example: A stochastic activity network We consider the graph of Figure 16, where each arc j has a random length V j with cdf F j (·), for j = 1, . . . , 13, and these lengths are assumed independent. Let T be the length of the longest path from node 1 to node 9 in this graph. We are interested in estimating both E[T ] and q(x) = P[T > x] for a given constant x, by simulation. This type of graph may represent the precedence relationships between the activities of a project, with the arc lengths representing activity durations and T the (random) time needed to complete the project. This particular example is taken from Elmaghraby (1977), where the F j 's can be found; see also Avramidis and Wilson (1996) and L'Ecuyer and Lemieux (2000).
To estimate E[T ] and q(x) = P[T > x] by Monte Carlo, it suffices to generate the random variables V 1 , . . . , V 13 and compute T by a standard longest-path algorithm, repeat this n times to obtain n realizations T 1 , . . . , T n of T , and then estimate E[T ] by T n = (T 1 + · · · T n )/n and q(x) by Y n = (1/n) For RQMC, we generate the n replicates of (V 1 , . . . , V 13 ) by inversion from the n points of the lattice rule Y j = F −1 j (U j ) where U j is the jth coordinate of the RQMC point, and we repeat this for each point. Figures 17 and 18 (left) show the normal P-P plots of the error for m = 10 4 independent replicates of T n and Y n , respectively, with RQMC and n = 8192. We observe a surprisingly good agreement with the normal distribution for T n . This can be explained (intuitively) by the fact that the length of each path is a sum of independent random variables and some of these random variables already have the normal distribution. On the other hand, the empirical distribution of Y n appears to be concentrated on a small number of values, and that number becomes even smaller (the steps are larger) when q(x) gets closer to 0 or 1. Among the 10 4 independent replications with RQMC, we have observed exactly 40, 113 and 55 distinct values of Y n (these are the number of jumps in the empirical distribution) for x = 30, 64, 100, respectively. Here we have E[T ] ≈ 64, q(64) ≈ 0.420 (thus the mean value of T is below the median), q(30) ≈ 0.992 and q(100) ≈ 0.066 In the figure, the (horizontal) jumps inF Zn,m corresponds to the numbers (or counts) of observed values. The (vertical) jumps in Φ correspond to spacings between observed values, and since all observed values are equally spaced, the magnitude of a jump in Φ is the probability that a standard normal variable falls between the two consecutive (standardized) observed values. The biggest jumps inF Zn,m are in the same region (the middle of the plot) as the biggest jumps in Φ. This means that the discrete distributionF Zn,m behaves very much like a normal.
For this example, an estimator with lower variance than Y n can be constructed by conditional Monte Carlo (CMC) as follows; see Avramidis and Wilson (1998), Section 4.1, and references therein. Let L = {5, 6, 7, 9, 10} be the set of arcs that separate the nodes {1, 2, 3, 4, 5} from the nodes {6, 7, 8, 9}, and let B be the eight other arcs. For the CMC estimator, we generate only the arc lengths V j for j ∈ B and we estimate P[T > x] by the conditional expectation of I[T > x] given {V j , j ∈ B}, that is, by W = P[T > x | {V j , j ∈ B}]. Let W n be the average of the n replicates of W . A normal P-P plot for the error on this CMC estimator W n is given in Figure 18 (right). The error distribution is now smooth, and this comes from the fact that W is no longer an indicator function; now it has a density. But it still departs significantly from the normal distribution for x = 100.
We have examined the empirical error distribution for multiple good rank-1 lattices, with different values of n, with and without the baker's transformation. The shape of the distribution was found to be strongly dependent on the choice of lattice parameters. Figure 19 gives an idea of the range of observed distribution shapes by showing histograms of this empirical distribution, with and without the baker's transformation, using the lattice rule with n = 8192 described in Subsection 5.4, and with the baker's transformation for a another good rank-1 lattice rule with n = 8191.