Discrepancy bounds for deterministic acceptance-rejection samplers

Abstract: We consider an acceptance-rejection sampler based on a deterministic driver sequence. The deterministic sequence is chosen such that the discrepancy between the empirical distribution of the samples and the target distribution is small. We use quasi-Monte Carlo (QMC) point sets for this purpose. The empirical evidence shows convergence rates beyond the crude Monte Carlo rate of N^{-1/2}. We prove that the discrepancy of samples generated by the QMC acceptance-rejection sampler is bounded from above by N^{-1/s}. A lower bound shows that for any given driver sequence, there always exists a target density such that the star discrepancy is at least of order N^{-2/(s+1)}. For a general density, whose domain is the real state space R^{s-1}, the inverse Rosenblatt transformation can be used to convert samples from the (s − 1)-dimensional cube to R^{s-1}. We show that this transformation is measure preserving. This way, under certain conditions, we obtain the same convergence rate for a general target density defined on R^{s-1}. Moreover, we also consider a deterministic reduced acceptance-rejection algorithm recently introduced by Barekat and Caflisch [F. Barekat and R. Caflisch, Simulation with Fluctuation and Singular Rates. ArXiv:1310.4555 [math.NA], 2013].


Introduction
The Monte Carlo (MC) method is one of the most widely used numerical methods for simulating probability distributions. However, it is sometimes not possible to sample directly from a given target distribution. Markov chain Monte Carlo (MCMC) methods have been developed to address this problem: instead of sampling independent points directly, MCMC samples from a Markov chain whose limiting distribution is the target distribution. MCMC has widened the applications of MC in many different fields [5,22]. Another deficiency of MC algorithms is their slow convergence rate. Quasi-Monte Carlo (QMC) algorithms, on the other hand, can improve on the Monte Carlo convergence rate, which partially depends on generating samples with small discrepancy. For a survey of QMC we refer to [10,11]. Combining the QMC idea with MCMC is a natural way to improve the convergence rate and widen practical applications, and recently many results in this direction have been achieved [6,7,39,40]. With this paper we add another result in this direction by using a deterministic driver sequence in an acceptance-rejection algorithm. We prove discrepancy bounds of order N^{-1/s}, where the dimension of the state space is s − 1 and N is the number of samples. The discrepancy here generalizes the Kolmogorov-Smirnov statistic between the empirical distribution of the samples and the target distribution to higher dimensions. A more detailed description of our results is provided below.

Previous work on MCMC and QMC
In the following we describe some previous research on QMC and MCMC. L'Ecuyer studied the convergence behavior of randomized quasi-Monte Carlo for discrete-time Markov chains in [21], known as array-RQMC. The general idea is to obtain a better approximation of the target distribution than with the plain Monte Carlo method by using randomized QMC. In a different direction, Tribble [39] and Tribble and Owen [40] established conditions under which low-discrepancy sequences can be used for consistent MCMC estimation on finite state spaces. It was shown that replacing an IID sequence by a completely uniformly distributed sequence also yields consistent estimation on finite state spaces. A construction of weakly completely uniformly distributed sequences is also proposed in [40]. As a sequel to the work of Tribble, Chen in his thesis [6] and Chen, Dick and Owen [7] demonstrated that Markov chain quasi-Monte Carlo (MCQMC) algorithms using a completely uniformly distributed sequence as driver sequence give consistent results under certain assumptions on the update function and the Markov chain. Further, Chen [6] also showed that MCQMC can achieve a convergence rate of O(N^{-1+δ}) for any δ > 0 under certain conditions, but only the existence of such a driver sequence was shown.
In our recent work [12], done with Rudolf, we prove upper bounds on the discrepancy under the assumptions that the Markov chain is uniformly ergodic and that the driver sequence is deterministic rather than a sequence of independent uniformly distributed random variables. In particular, we show the existence of driver sequences for which the discrepancy of the Markov chain from the target distribution, with respect to certain test sets, converges at (almost) the usual Monte Carlo rate of N^{-1/2}. A drawback of this result is that we are currently not able to give an explicit construction of a driver sequence for which our discrepancy bounds hold for uniformly ergodic Markov chains. Gerber and Chopin [14] used low-discrepancy point sets in place of random numbers in sequential Monte Carlo (SMC), proposing a new algorithm named sequential quasi-Monte Carlo (SQMC). They established consistency and stochastic bounds for this algorithm based on randomized QMC point sets. It is an open problem to obtain deterministic bounds for SQMC. A more extensive literature review on applying QMC to MCMC problems can be found in [7, Section 1].

Acceptance-rejection algorithms
We now give a description of the algorithms in this paper. Let ψ : D → R_+ = [0, ∞) be our target density function, where D ⊆ R^{s-1}. We consider the cases D = [0, 1]^{s-1} and D = R^{s-1}. Assume that it is not possible to sample directly from the target distribution. One possible way to obtain samples from ψ is to choose a proposal density H from which we can sample and then use an acceptance-rejection algorithm. Assume there exists a constant L < ∞ such that ψ(x) < LH(x) for all x ∈ D. The following algorithm can be used to obtain samples with distribution ψ.
Algorithm 1 (Random acceptance-rejection (RAR) algorithm). Given a target density ψ : D → R_+ and a proposal density H : D → R_+, assume that there exists a constant L < ∞ such that ψ(x) < LH(x) for all x in the domain D. We introduce another random variable u having uniform distribution on the unit interval, i.e. u ∼ U([0, 1]). The acceptance-rejection algorithm then proceeds as follows:
1. Draw X ∼ H and u ∼ U([0, 1]).
2. Accept X as a sample of ψ if u ≤ ψ(X)/(L H(X)); otherwise go back to step 1.
See also [4,9,19] for a discussion of related algorithms. For a discussion on how to select proposal densities see for instance [4] and the references therein. The acceptance-rejection sampler thus produces samples of a target density, which cannot be sampled from directly, by means of a tractable proposal density.
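The two steps above can be sketched in a few lines of code; this is a minimal illustration with a toy target density of our own choosing, not the implementation used in the paper.

```python
import numpy as np

def random_ar(psi, h_sample, h_pdf, L, n_samples, seed=None):
    """Random acceptance-rejection: repeatedly draw X ~ H and
    u ~ U([0, 1]) and accept X when u <= psi(X) / (L * H(X))."""
    rng = np.random.default_rng(seed)
    accepted = []
    while len(accepted) < n_samples:
        x = h_sample(rng)
        u = rng.random()
        if u <= psi(x) / (L * h_pdf(x)):
            accepted.append(x)
    return np.array(accepted)

# Toy example: target psi(x) = 2x on [0, 1], uniform proposal H = 1, L = 2.
samples = random_ar(lambda x: 2.0 * x,
                    lambda rng: rng.random(),
                    lambda x: 1.0,
                    L=2.0, n_samples=1000, seed=0)
```

For the toy density 2x the sample mean should be close to 2/3; the acceptance rate is the ratio of the integrals of ψ and L·H, here 1/2.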
Acceptance-rejection sampling and importance sampling [34, Section 3] are similar in spirit: both distort samples from one distribution in order to sample from another. However, there is a difference in the role of the constant L > ψ(x)/H(x) for x in the domain D. The acceptance-rejection method does not work when sup_{x∈D} ψ(x)/H(x) = ∞, while importance sampling remains applicable [33]. In this paper, we only use the acceptance-rejection sampler to obtain samples of a given target density, since we are interested in obtaining discrepancy bounds for MCQMC. More information on general strategies for generating nonuniform random variables can be found in the monographs [9,19].
The acceptance-rejection algorithm with deterministic driver sequence is a special class of MCQMC. Given the superior distribution properties, in terms of discrepancy, of the Sobol sequence [37], one could expect an improvement in the discrepancy of the samples obtained from the acceptance-rejection algorithm based on the Sobol sequence. In one dimension the discrepancy we study is the Kolmogorov-Smirnov statistic between the target distribution and the empirical distribution of the sample points. For a given point set in the s-dimensional unit cube, the star discrepancy measures the difference between the proportion of points in a subinterval of [0, 1]^s and the Lebesgue measure of this subinterval.
We defer the precise definition of discrepancy to Section 3.
In this paper, we replace the IID initial samples with an explicit construction of the driver sequence using (t, m, s)-nets [11,27] (obtained from the Sobol sequence). Figure 1 shows a comparison between different driver sequences: deterministic points (Sobol points) and pseudo-random uniform points (both with 2^9 points). The acceptance-rejection sampler accepts only those points under the target density curve. The choice of driver sequence affects the samples obtained by the acceptance-rejection algorithm, and hence the distribution properties of the accepted points. The two right-hand plots in Fig. 1 show the histograms of the accepted points in both cases. Note that the deterministic samples estimate the density function better. Our interest in this paper is in entirely deterministic methods. However, one could also use randomized quasi-Monte Carlo point sets [30,31,32] and study a randomized setting.
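The deterministic variant can be sketched as follows; this is our own minimal sketch, using scipy's Sobol' generator as the driver sequence and assuming a point (x, y) of the driver sequence is accepted when L·y ≤ ψ(x). The one-dimensional toy density is our own choice, not the paper's example.

```python
import numpy as np
from scipy.stats import qmc

def deterministic_ar(psi, L, m, s):
    """Deterministic acceptance-rejection on [0, 1]^(s-1): take the
    first 2^m points of a Sobol' sequence in [0, 1]^s as driver
    sequence and accept a point (x, y) whenever L * y <= psi(x)."""
    pts = qmc.Sobol(d=s, scramble=False).random_base2(m=m)
    x, y = pts[:, :s - 1], pts[:, s - 1]
    return x[L * y <= psi(x)]

# Toy example with s = 2: target psi(x) = 2x on [0, 1], L = 2.
accepted = deterministic_ar(lambda x: 2.0 * x[:, 0], L=2.0, m=9, s=2)
```

Since the area below ψ/L is 1/2, roughly half of the 2^9 driver points are accepted, and the accepted first coordinates approximate the density 2x.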

Previous work on deterministic acceptance-rejection algorithms
The deterministic acceptance-rejection algorithm has also been discussed by Moskowitz and Caflisch [24] and Wang [41,42]. Therein a smoothing technique was introduced to improve the numerical performance of the acceptance-rejection algorithm. Wang [42] gave a heuristic argument indicating a convergence rate of order N^{-(s+2)/(2(s+1))}. This argument assumes that the points in elementary intervals are uniformly distributed, so the reasoning is not fully deterministic. Our lower bound on the discrepancy (Theorem 2) indicates that this reasoning does not apply in our case. The numerical experiments in [42] also indicate an improvement when a well-chosen deterministic driver sequence (in this case the so-called Halton sequence [17]) is used instead of a random driver sequence. Recently, Nguyen and Ökten [25] presented a consistency result for an acceptance-rejection algorithm with low-discrepancy sequences. This algorithm showed good numerical performance in terms of standard deviation and efficiency. However, proving an explicit convergence rate of the discrepancy for this algorithm is still an open problem. See also [23,25] for numerical experiments using quasi-Monte Carlo point sets for the related problem of integrating indicator functions.
It is worth noticing that the results in previous work rest on empirical evidence and that the discrepancy of the samples was not directly investigated. Our work focuses on the discrepancy properties of points produced by fully deterministic acceptance-rejection methods. We prove discrepancy bounds for deterministic acceptance-rejection algorithms, including an upper bound and a lower bound. The combination with the reduced acceptance-rejection sampler provides further evidence of the good performance of the deterministic method. Our algorithm may also be combined with similar algorithms such as the acceptance-complement method, see for instance [9, Section II.5].
Before presenting the theoretical background, we briefly describe the deterministic algorithms and some numerical results which compare the convergence rates obtained with Monte Carlo and quasi-Monte Carlo methods.

Construction of driver sequence
In this paper, we use low-discrepancy point sets given by (t, s)-sequences (see Definition 3 and Definition 4 below) in base b as driver sequences. The first b^m points of a (t, s)-sequence are a so-called (t, m, s)-net in base b. Explicit constructions of (t, s)-sequences have been found in base 2 by Sobol [37], in prime base b ≥ s by Faure [13] and in prime-power base b by Niederreiter [26]. In all these constructions t depends only on s but not on m. In practice, since digital nets (based on Sobol points) are included in the statistics toolbox of Matlab, this method is very easy to implement. For further discussion of construction methods we refer to [11, Chapters 4&8].
In the following we describe the algorithm and present some numerical results. Since in general it is computationally too expensive to compute the supremum in the definition of the star-discrepancy exactly, we use a so-called δ-cover to estimate this supremum. An introduction to δ-covers is provided in the appendix. In the numerical discussion, the driver sequence is generated by a (t, m, s)-net in base 2. Specifically, we always use a Sobol sequence [37] to generate (t, m, s)-nets for our experiments.

Deterministic algorithm for target densities defined on [0, 1]^{s−1}
We consider now the case where the target density is defined on [0, 1]^{s−1}. The following algorithm is a deterministic version of Algorithm 1. For the proofs later, we need the technical assumption that the target density is pseudo-convex; the definition of pseudo-convexity is discussed in Section 3. The following example shows a better convergence rate when using a low-discrepancy driver sequence rather than a random point set. In each example the reported discrepancy for the acceptance-rejection algorithm using a random driver sequence is the average of 10 independent runs, as is the case throughout all the numerical experiments.
Example 1. In this example we consider a non-product target density ψ on [0, 1]^4. Figure 2 shows the discrepancy obtained by using deterministic points and pseudo-random points as driver sequence. For the RAR algorithm, we observe a convergence rate of order N^{-0.482}, whereas the DAR algorithm shows a convergence rate of the discrepancy of order N^{-0.659}.

Deterministic algorithm for target densities defined on the real state space
Now we extend the domain of the target density ψ to R^{s−1} with s ≥ 2. Assume that there is a proposal density function H : R^{s−1} → R_+ from which we can sample. The inverse Rosenblatt transformation is used to generate samples from the proposal density on the real state space R^{s−1}. Let F be the joint CDF of H and let F_j(z_j | z_1, . . . , z_{j−1}) be the conditional CDF of the proposal density for j = 1, . . . , s − 1. The transformation T is used to generate points in R^{s−1} × R_+ from the unit cube [0, 1]^s, such that the projection of the points onto the first s − 1 coordinates has distribution H. More precisely, let T : [0, 1]^s → R^s be the transformation given by z = T(u), where z = (z_1, . . . , z_s), u = (u_1, . . . , u_s) and

z_j = F_j^{−1}(u_j | z_1, . . . , z_{j−1}) for j = 1, . . . , s − 1,   z_s = u_s.   (1)

The first s − 1 coordinates are produced by the inverse Rosenblatt transformation, which converts the points from the unit cube [0, 1]^{s−1} to R^{s−1}; the s-th coordinate is uniformly distributed on the line segment {(z_1, . . . , z_{s−1})} × [0, 1]. More details on the Rosenblatt transformation and extensions can be found in [8,29,35].
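For a product proposal H the conditional CDFs reduce to the marginals, so the transformation T is simple to implement. The sketch below is our own illustration (not the paper's code) and assumes a product of standard normal marginals for H, using scipy's `norm.ppf` as the inverse CDF; the first Sobol' point, the origin, is dropped since the inverse CDF is −∞ there.

```python
import numpy as np
from scipy.stats import norm, qmc

def inverse_rosenblatt(u):
    """Map driver points u in [0,1]^s to R^(s-1) x [0,1]: the first
    s-1 coordinates are pushed through the inverse marginal CDFs of
    the (product) proposal H, the last coordinate is kept as u_s.
    Here H is assumed to be a product of standard normal densities."""
    z = norm.ppf(u[:, :-1])            # coordinate-wise inversion
    return np.column_stack([z, u[:, -1]])

u = qmc.Sobol(d=3, scramble=False).random_base2(m=8)
z = inverse_rosenblatt(u[1:])          # drop the origin, where ppf(0) = -inf
```

The projection of z onto the first two coordinates then approximates the bivariate standard normal proposal, and the last coordinate stays uniform in [0, 1].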
Algorithm 3 (Deterministic acceptance-rejection algorithm in R^s). Let an unnormalized target density function ψ : R^{s−1} → R_+, where s ≥ 2, be given. Let H : R^{s−1} → R_+ be a proposal density such that there exists a constant L < ∞ with ψ(x) ≤ L H(x) for all x ∈ R^{s−1}. Suppose we aim to obtain approximately N samples from ψ.
Project the points accepted by the acceptance-rejection method onto the first s − 1 coordinates and denote the resulting point set by P_N^{(s−1)}. We provide an example to demonstrate the performance of Algorithm 3.

Example 2.
The proposal density function H, which we use in the acceptance-rejection step to generate samples of the target density ψ(x_1, x_2), is chosen such that its inverse CDF can be computed. For this choice of H, we use the transformation T defined in Equation (1) to obtain samples (x_{j,1}, x_{j,2}) from H. Note that (u_{j,1}, u_{j,2}) is the driver sequence given by a (t, m, 2)-net in base b.
The order of the star discrepancy is shown in Figure 3, where N is the number of accepted samples. The numerical experiments show that the star discrepancy converges at a rate of N^{-0.720} for this example when quasi-Monte Carlo points are used as driver sequence. The RAR algorithm converges with order N^{-0.390}. Again, the DAR sampler outperforms the RAR sampler.

A deterministic reduced acceptance-rejection sampler
In this subsection we consider an extension of the DAR sampler. The random version of this reduced method was recently introduced by Barekat and Caflisch [2]. For a target density function ψ, we carefully select H such that the inverse CDFs of both ψ − H and H can be computed. For the case ψ(x) > H(x), we write ψ = (ψ − H) + H and obtain samples according to ψ − H and H separately. Figure 4 illustrates this method. The samples of ψ can be divided into three subsets, R_{1,1}, R_{1,2} and R_{2,2}, where R_{1,2} and R_{2,2} can be generated directly by using the inverse CDFs of H and ψ − H on a certain range. The acceptance-rejection method is only used to obtain R_{1,1}. Compared with the ordinary acceptance-rejection sampler, one obvious merit of this method is that we do not require ψ(x) ≤ H(x) on the whole domain. Also, this method may give better convergence rates, since R_{1,2} and R_{2,2} are obtained via inversion and therefore have low discrepancy. Algorithm 4 gives a simple version of the improved method. A more general version is discussed in Section 5.
ii) Use the acceptance-rejection method with the target density ψ and the proposal density H on the domain S using {x_0, x_1, . . . , x_{M−1}} as driver sequence. Choose M such that N_1 points are accepted by the DAR algorithm. Return the point set R_N. Since the inverse transform is a measure-preserving transformation, it preserves the uniformity of the driver sequence. Thus R_{1,2} and R_{2,2} are low-discrepancy point sets. The following example illustrates the efficiency of the DRAR algorithm. A theoretical result about the discrepancy properties of samples obtained by this class of algorithms is provided in Theorem 5.
Example 3. Let ψ(x) = sin(4x) + x^2 be a density function defined on [0, 1]. Instead of seeking a proposal density H such that ψ(x) ≤ H(x), we notice that inversion can be applied to sin(4x) and x^2 individually, but not to their sum. Choose H(x) = x^2. We only perform deterministic acceptance-rejection with respect to the target density ψ and proposal density H on the subinterval S = (π/4, 1]. On the remaining range L = [0, π/4], we apply the inverse transformation to H and ψ − H to obtain samples based on a deterministic driver sequence.
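A sketch of this example (our own implementation, with scipy's Sobol' points assumed as the deterministic driver sequence): the inverse CDFs of H = x^2 and ψ − H = sin 4x on L = [0, π/4] are available in closed form, and acceptance-rejection with proposal H is used on S = (π/4, 1], where ψ ≤ H.

```python
import numpy as np
from scipy.stats import qmc

a = np.pi / 4
m_HL = a**3 / 3                                           # integral of x^2 over L
m_DL = 0.5                                                # integral of sin(4x) over L
m_S = (np.cos(4 * a) - np.cos(4.0)) / 4 + (1 - a**3) / 3  # integral of psi over S
total = m_HL + m_DL + m_S                                 # integral of psi over [0, 1]

def drar(n):
    n1 = round(n * m_HL / total)          # R_{1,2}: samples of H on L
    n2 = round(n * m_DL / total)          # R_{2,2}: samples of psi - H on L
    u1 = qmc.Sobol(d=1, scramble=False).random_base2(m=11)[1:n1 + 1, 0]
    u2 = qmc.Sobol(d=1, scramble=False).random_base2(m=11)[1:n2 + 1, 0]
    r12 = a * u1 ** (1 / 3)               # inverse CDF of x^2 on L
    r22 = np.arccos(1 - 2 * u2) / 4       # inverse CDF of sin(4x) on L
    # R_{1,1}: acceptance-rejection on S, target psi, proposal H (psi <= H on S)
    drv = qmc.Sobol(d=2, scramble=False).random_base2(m=12)
    x = (a**3 + drv[:, 0] * (1 - a**3)) ** (1 / 3)   # samples of H restricted to S
    keep = drv[:, 1] * x**2 <= np.sin(4 * x) + x**2
    r11 = x[keep][: n - n1 - n2]
    return np.concatenate([r12, r22, r11])

pts = drar(2000)
```

The mean of ψ, namely ∫ x ψ(x) dx / ∫ ψ(x) dx ≈ 0.490, should be reproduced closely by the combined sample.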
The discrepancy of the point set generated by Algorithm 4 converges at a rate of N^{-0.929}, which is significantly better than the N^{-0.501} convergence rate obtained with a random driver sequence, see Figure 5.

Background on discrepancy theory and (t, m, s)-nets
In this section we first establish some notation and useful definitions and then present the theoretical results. First we introduce the definitions of (t, m, s)-nets and (t, s)-sequences in base b (see [11]), which we use as driver sequences. The fundamental notions of elementary intervals and fair sets, given below, are used to define (t, m, s)-nets and (t, s)-sequences in base b. The concept of discrepancy was introduced in [16] to measure the deviation of a sequence from the uniform distribution. We now give the definition of the so-called star discrepancy, which allows us to distinguish the quality of point sets with respect to the uniform distribution.
Definition 5 (star discrepancy). Let P_N = {x_0, x_1, . . . , x_{N−1}} be a point set in [0, 1)^s. The star discrepancy D*_N is defined by

D*_N(P_N) = sup_{t ∈ [0,1]^s} | (1/N) Σ_{n=0}^{N−1} 1_{[0,t)}(x_n) − λ([0, t)) |,

where the supremum is taken over all intervals [0, t) = ∏_{j=1}^{s} [0, t_j) with t = (t_1, . . . , t_s) ∈ [0, 1]^s, 1_{[0,t)} denotes the indicator function of [0, t) and λ denotes the Lebesgue measure. See Figure 6 for an illustration of the concept of discrepancy in the unit square.
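In dimension one the star discrepancy with respect to the uniform distribution can be computed exactly from the sorted points via the classical formula; a minimal sketch (our own illustration):

```python
import numpy as np

def star_discrepancy_1d(points):
    """Exact star discrepancy of a point set in [0, 1) with respect
    to the uniform distribution: D*_N = max_i max(i/N - x_(i),
    x_(i) - (i-1)/N) over the sorted points x_(1) <= ... <= x_(N)."""
    x = np.sort(np.asarray(points, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    return float(max(np.max(i / n - x), np.max(x - (i - 1) / n)))

# The left-endpoint grid {0, 1/n, ..., (n-1)/n} has discrepancy exactly 1/n.
d = star_discrepancy_1d(np.arange(8) / 8)
```

In higher dimensions no comparably cheap exact formula is available, which is why the δ-cover approximation described in the appendix is used in the experiments.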
If we extend the supremum in Definition 5 to all convex subsets of [0, 1]^s, we obtain another interesting discrepancy, the so-called isotropic discrepancy. It measures the distribution properties of point sets with respect to convex sets. For further reading on the definition and properties of discrepancy, we refer for instance to [11,16].
For our purposes here we need the definition of pseudo-convex sets which we introduce in the following (see also [1, Definition 2] and Figure 7 for an example).
Then A is called a pseudo-convex set and A 1 , . . . , A p is an admissible convex covering for A with p parts and with q convex parts of A. Remark 1. For convenience, we call a nonnegative function pseudo-convex if and only if the region below its graph is a pseudo-convex set.
Next we present a bound on the isotropic discrepancy of points generated by (t, m, s)-nets. A detailed proof is given in Appendix B.1.
for some constant C_s > 0. These two inequalities therefore yield a convergence rate of order M^{−1/s}(log M)^{1−1/s}.
The following lemma will be used to get a discrepancy bound for a point set on a pseudo-convex set. It is an extension of [1, Lemma 5] to the unit cube.

Discrepancy investigation of the deterministic acceptance-rejection sampler
The first result is a discrepancy bound, with respect to the target density, for samples generated by the acceptance-rejection algorithm with deterministic driver sequences. The star discrepancy of points generated by the acceptance-rejection algorithm with respect to the target density converges at a rate of N^{−1/s}, where N is the number of accepted samples; see Theorem 1 for details. The proof uses a bound on the discrepancy of our driver sequence with respect to convex sets (the so-called isotropic discrepancy, see Definition 6 for details).

Upper bound
Let an unnormalized density function ψ : [0, 1]^{s−1} → R_+ be pseudo-convex and let

A = {(x, y) ∈ [0, 1]^{s−1} × [0, 1] : ψ(x) ≥ L y},   (2)

which is pseudo-convex in [0, 1]^s since ψ is a pseudo-convex function. Assume that there is an admissible convex covering of A with p parts and with q convex parts of A. Without loss of generality, let A_1, . . . , A_q be the convex subsets of A and let A_{q+1}, . . . , A_p be such that A'_j = A_j \ A is convex for q + 1 ≤ j ≤ p. The definition of the star discrepancy of a point set {y_0, y_1, . . . , y_{N−1}} with respect to a density function ψ is given as follows.
Remark 2. Note that ψ/C is a probability density function on [0, 1]^{s−1}, where C = ∫_{[0,1]^{s−1}} ψ(x) dx. Thus the discrepancy in Definition 8 measures the difference between the distribution with density ψ/C and the empirical distribution of the sample points with respect to the test sets [0, t) for t ∈ [0, 1]^{s−1}.
Theorem 1. Let the unnormalized density function ψ : [0, 1]^{s−1} → R_+, with s ≥ 2, be pseudo-convex. Assume that there is an admissible convex covering of A given by Equation (2) with p parts and with q convex parts of A. Then, for N large enough, the discrepancy of the point set {y_0, y_1, . . . , y_{N−1}} ⊆ [0, 1]^{s−1} generated by Algorithm 2 using a (t, m, s)-net in base b satisfies a bound of order N^{−1/s} (the explicit constant depends on the covering and on the net parameters). We postpone the proof of this theorem to Appendix B.1.

Lower bound
In this section, we provide a lower bound on the star discrepancy with respect to a concave density function. The general idea is to construct, for a given driver point set, a density function for which the discrepancy admits a lower bound of a certain order.
Theorem 2. Let P_M be an arbitrary point set in [0, 1]^s. Then there exists a concave density function ψ defined on [0, 1]^{s−1} such that, for N samples generated by the acceptance-rejection algorithm with respect to P_M and ψ, we have

D*_{N,ψ} ≥ c_s N^{−2/(s+1)},

where c_s > 0 is independent of N and P_M but depends on s.
A detailed proof is provided in Appendix B.2. We would like to point out that the lower bound also limits the convergence rates which we can obtain in our current approach via convex sets. Note that a concave function is also pseudo-convex as defined in Remark 1.
Additionally, note that [3] (in dimension s = 2) and [38] (for dimension s > 2) showed the existence of point sets whose discrepancy with respect to convex sets is bounded from above by N^{−2/(s+1)}(log N)^{c(s)} (where c(s) is a function of s only). This would yield an improvement of our results from N^{−1/s} to N^{−2/(s+1)}(log N)^{c(s)}; however, those constructions are not explicit and can therefore not be used in computation.

Generalization to real state space
We consider now the case where the target density is defined on R s−1 with s ≥ 2. The aim is to show a discrepancy bound on samples generated by the deterministic acceptance-rejection method. The discrepancy with respect to a given density function ψ : R s−1 → R + is defined as follows.
Definition 9. Let P_N = {z_0, z_1, . . . , z_{N−1}} be a point set in R^{s−1}. Let ψ : R^{s−1} → R_+ be an unnormalized probability density function. Then the star discrepancy D*_{N,ψ}(P_N) is defined by

D*_{N,ψ}(P_N) = sup_{t ∈ R^{s−1}} | (1/N) Σ_{n=0}^{N−1} 1_{(−∞,t)}(z_n) − (1/C) ∫_{(−∞,t)} ψ(x) dx |,

where C = ∫_{R^{s−1}} ψ(x) dx and the supremum is taken over all boxes (−∞, t) = ∏_{j=1}^{s−1} (−∞, t_j) with t = (t_1, . . . , t_{s−1}). We use the transformation T given in Equation (1). To prove a bound on the discrepancy of the samples generated by Algorithm 3, the following assumption is needed.

Assumption 1. Let ψ be the target density and let H be a product proposal density, chosen such that its inverse CDF can be computed. Let A = {z ∈ R^s : ψ(z_1, . . . , z_{s−1}) ≥ L z_s H(z_1, . . . , z_{s−1})} and let T^{−1} be the inverse of the transformation T. Then we assume that T^{−1}(A) is pseudo-convex.
As the mappings T and T^{−1} are measure preserving, and since the number of samples in an arbitrary subset D ⊆ [0, 1]^s equals the number of samples in the corresponding subset T(D) ⊆ R^{s−1} × R_+, we can consider the discrepancy in the unit cube instead of in R^{s−1} × R_+. By proof arguments similar to those for Theorem 1 and Theorem 2, we obtain the same discrepancy bounds, an upper bound and a lower bound, for a general density ψ defined on the real state space R^{s−1}.
Theorem 3. Let the unnormalized target density ψ : R^{s−1} → R_+ and the proposal density H : R^{s−1} → R_+ satisfy Assumption 1, and assume that ψ(x) ≤ L H(x) for all x ∈ R^{s−1}. Then the discrepancy of the point set P_N^{(s−1)} generated by Algorithm 3 using a (t, m, s)-net in base b satisfies a bound of order N^{−1/s}.

Theorem 4. Let P_M be an arbitrary point set in [0, 1]^s. Then there exists an unnormalized density function ψ defined on R^{s−1} satisfying the assumptions of Theorem 3 such that the star discrepancy of the points generated by the acceptance-rejection sampler with respect to ψ and H satisfies

D*_{N,ψ} ≥ c_s N^{−2/(s+1)},

where c_s is independent of N and P_M, but depends only on s.

Discrepancy properties of the deterministic reduced acceptance-rejection sampler
Algorithm 4 can be extended to a more general case. Consider the target density ψ(x) = Σ_{ℓ=1}^{k} H_ℓ(x). If it is possible to sample from each H_ℓ(x) individually and the expectations of the H_ℓ can be calculated or estimated at low cost, then we can use an embedded deterministic reduced acceptance-rejection sampler in each step. Let ψ_{k−ℓ+1}(x) = Σ_{i=ℓ}^{k} H_i(x) for ℓ = 1, . . . , k; in particular, ψ_k is the target density.
Suppose we aim to sample N points from the target density ψ. The sample set can be divided into two types, namely points generated from the sets S_ℓ and from the sets L_ℓ, respectively. We apply the deterministic acceptance-rejection method given in Algorithm 3 in each S_ℓ with respect to ψ_{k−ℓ+1} and H_ℓ. Note that we obtain ⌈N ∫_{S_ℓ} ψ_{k−ℓ+1}(x) dx / ∫_D ψ_k(x) dx⌉ points from S_ℓ for ℓ = 1, . . . , k − 1. The remaining samples come from applying the inverse transformation of H_ℓ on L_ℓ; this yields an additional ⌈N ∫_{L_ℓ} H_ℓ(x) dx / ∫_D ψ_k(x) dx⌉ points from L_ℓ for ℓ = 1, . . . , k. We proceed inductively until we obtain samples from H_k(x). We assume that ∫_{S_ℓ} ψ_{k−ℓ+1}(x) dx / ∫_D ψ_k(x) dx and ∫_{L_ℓ} H_ℓ(x) dx / ∫_D ψ_k(x) dx can be calculated or estimated.
The following algorithm is an extension of the DRAR algorithm, which summarizes the embedding idea.
Algorithm 5. Let ψ(x) = Σ_{ℓ=1}^{k} H_ℓ(x) be a target density we aim to sample from. Define ψ_{k−ℓ+1}(x) = Σ_{i=ℓ}^{k} H_i(x) for ℓ = 1, . . . , k. Define S_ℓ and L_ℓ as in Equation (3) and assume that ∫_{S_ℓ} ψ_{k−ℓ+1}(x) dx / ∫_D ψ(x) dx and ∫_{L_ℓ} H_ℓ(x) dx / ∫_D ψ(x) dx can be calculated or estimated. Further assume that we can sample from each H_ℓ individually by applying the transformations T_{H_ℓ,S_ℓ} and T_{H_ℓ,L_ℓ} given in Equation (1) on S_ℓ and L_ℓ respectively. Suppose we aim to generate N samples from ψ.

H. Zhu and J. Dick
For ℓ from 1 to k do:
ii) Compute T_{H_ℓ,S_ℓ}(x_n) for n = 0, 1, 2, . . .. Use the acceptance-rejection method with respect to ψ_{k−ℓ+1} and H_ℓ on the domain S_ℓ using {x_0, x_1, . . . , x_{M−1}} as driver sequence. Choose M such that N_{1,ℓ} points are accepted by the DAR algorithm. Let R_{1,ℓ} = {z_0, z_1, . . . , z_{N_{1,ℓ}−1}} be the accepted points.
iii) Compute T_{H_ℓ,L_ℓ}(x_n) for n = 0, 1, . . . and project the points onto the first s − 1 coordinates.
Return the set R_N^{(s−1)}.
Now we consider the discrepancy properties of the sample points produced by this algorithm. Note that the sample set of ψ = Σ_{ℓ=1}^{k} H_ℓ can be decomposed into several subsets with different star discrepancies.

Theorem 5. For a given target density ψ(x) = Σ_{ℓ=1}^{k} H_ℓ(x), where ψ and H_ℓ, 1 ≤ ℓ ≤ k, are piecewise continuous, let S_ℓ and L_ℓ be given by (3). Let R_N^{(s−1)} be the sample set generated by Algorithm 5, where N_{1,ℓ} denotes the number of points generated from S_ℓ and N_{2,ℓ} the number of points generated from L_ℓ, for ℓ = 1, . . . , k. Assume that N_{1,ℓ} and N_{2,ℓ} can be calculated or estimated for the given target density ψ and N. Then D*_{N,ψ}(R_N^{(s−1)}) is bounded in terms of the discrepancies D*_{S_ℓ,ψ_{k−ℓ+1}} and D*_{L_ℓ,H_ℓ} of the samples in S_ℓ and L_ℓ respectively.
The proof of Theorem 5 is given in Appendix B.3. Note that this method achieves an improved acceptance rate, since points are only rejected on a certain range; on the remaining domain, samples are obtained by applying the inverse transform. More precisely, all point sets from the L_ℓ have low discrepancy, since the inverse transformation is applied directly with respect to H_ℓ for ℓ = 1, . . . , k. We now consider the star discrepancy of the points generated from the S_ℓ.
The following result from [20] gives an improved upper bound on the star discrepancy of the first M terms of a (t, s)-sequence in base b with s ≥ 2.

Lemma 5. For some constant C_s > 0 depending only on s, the star discrepancy of the first M terms of a (t, s)-sequence P_M in base b with s ≥ 2 satisfies the following bound.
With the help of Lemma 2, we obtain a bound on the isotropic discrepancy of the first M points of a (t, s)-sequence.
for some constant C'_s > 0 depending only on s. Hence, for the star discrepancy of R_{1,ℓ} for 1 ≤ ℓ < k, using a (t, s)-sequence as driver sequence in the DAR algorithm, we obtain a convergence rate of order N_{1,ℓ}^{−1/s} log N_{1,ℓ}. We omit a detailed proof, since arguments similar to those in the proof of Theorem 1 can be used. The following corollary is obtained by substituting the corresponding upper bounds and expressing N_{1,ℓ} and N_{2,ℓ} in terms of N. Assume that H_ℓ is a product density on S_ℓ and L_ℓ for 1 ≤ ℓ ≤ k, and let R_N^{(s−1)} be the sample set generated by Algorithm 5. Then the stated bound holds, where C_{S_ℓ,ψ_{k−ℓ+1}} and C_{L_ℓ,H_ℓ} are constants associated with S_ℓ, ψ_{k−ℓ+1} and L_ℓ, H_ℓ respectively.

Conclusion and outlook
As is well known, the integration error of the Monte Carlo method converges at a rate of N^{−1/2}. The acceptance-rejection sampler with a deterministic driver sequence, a simple class of MCQMC methods, performs much better in our numerical experiments than the theoretical rate N^{−1/s} of Theorem 1 and Theorem 3 (for densities defined on [0, 1]^{s−1} and R^{s−1}, respectively) would suggest. The three examples demonstrate that it is even possible to achieve a better convergence rate than with standard Monte Carlo by using a well-chosen deterministic driver sequence.
The main drawback of the acceptance-rejection sampler is that it may reject many points in high dimensions. Some methods to improve the acceptance rate are discussed in [19]. For a special class of density functions given by a finite sum, we propose an embedded deterministic reduced acceptance-rejection algorithm. This algorithm achieves a better star discrepancy convergence rate in our numerical example.
Appendix A: δ-cover to approximate the star discrepancy

Since it is computationally too expensive to compute the supremum in the definition of the star discrepancy exactly for dimensions larger than one, we use a so-called δ-cover to estimate this supremum.
The concept of a δ-cover is motivated by the following result [15]. Assume that Γ_δ is a δ-cover of A with respect to the distribution ψ. Then, for every point set {z_0, z_1, . . . , z_{N−1}}, the star discrepancy is bounded by the maximum deviation over the test sets in Γ_δ plus δ. In the experiments we choose A to be the set of intervals [0, t), where t runs through all points in the domain. For densities defined on [0, 1]^{s−1}, we set Γ_δ = {∏_{j=1}^{s−1} [0, a_j 2^{−m}) : a_j ∈ Z, 0 ≤ a_j ≤ 2^m}, which means that the δ-cover becomes finer as the number of samples increases and thus yields a more accurate approximation of the star discrepancy. For densities defined on R^{s−1}, we choose δ-covers with respect to m via the inverse marginal CDFs of the proposal density H. Note that the approximation of the star discrepancy is computationally expensive, thus our experiments only go up to several thousand sample points. However, the generation of samples using a (t, m, s)-net is fast.
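A brute-force sketch of this approximation for the uniform target on [0, 1]^s (our own illustration, not the paper's code; the cost grows as 2^{ms}, so it is only feasible for small m and s):

```python
import numpy as np
from itertools import product

def star_discrepancy_delta_cover(points, m):
    """Approximate the star discrepancy of a point set in [0, 1]^s
    with respect to the uniform distribution by maximizing over the
    dyadic test boxes [0, a * 2^-m), a delta-cover with delta of
    order s * 2^-m."""
    pts = np.asarray(points)
    best = 0.0
    for a in product(range(1, 2**m + 1), repeat=pts.shape[1]):
        t = np.array(a) / 2**m
        frac = np.mean(np.all(pts < t, axis=1))          # empirical measure
        best = max(best, abs(frac - float(np.prod(t))))  # vs Lebesgue measure
    return best

rng = np.random.default_rng(0)
approx = star_discrepancy_delta_cover(rng.random((128, 2)), m=4)
```

The grid maximum underestimates the true supremum by at most roughly s·2^{-m}, which is why a finer grid is used as the number of samples grows.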

Appendix B: Proofs
Before giving the proofs, we need some preparation.

Consider the elementary intervals

W_k = ∏_{j=1}^{s} [c_j b^{−k}, (c_j + 1) b^{−k})

with integers 0 ≤ c_j < b^k for j = 1, . . . , s. The diagonal of W_k has length √s/b^k and its volume is b^{−sk}. Let J be an arbitrary convex set in [0, 1]^s. Let W°_k denote the union of the cubes W_k fully contained in J,

W°_k = ⋃_{W_k ⊆ J} W_k, (5)

and let W̄_k denote the union of the cubes W_k having non-empty intersection with J or its boundary ∂(J),

W̄_k = ⋃_{W_k ∩ (J ∪ ∂(J)) ≠ ∅} W_k. (6)

Lemma 7. Let k ∈ N and let J be an arbitrary convex set in [0, 1]^s. For the W°_k and W̄_k constructed by (5) and (6), we have

λ(W̄_k \ J) ≤ 2s b^{−k} and λ(J \ W°_k) ≤ 2s b^{−k}.

To illustrate the result, we provide the following simple argument, which yields a slightly weaker bound. By the construction of W̄_k, the diagonal of each cube has length √s/b^k. Hence

W̄_k \ J ⊆ B := { x ∈ [0, 1]^s : ‖x − y‖ ≤ √s/b^k for some y ∈ ∂(J) },

where ‖·‖ is the Euclidean norm, and therefore λ(W̄_k \ J) ≤ λ(B). Note that the outer surface area of a convex set in [0, 1]^s is bounded by the surface area of the unit cube [0, 1]^s, which is 2s. Thus the Lebesgue measure of the set B is bounded by the outer surface area times the diameter of the cubes, so that

λ(W̄_k \ J) ≤ λ(B) ≤ 2s √s b^{−k}.

The bound for λ(J \ W°_k) follows by a similar argument. Remark 3. Note that in [28] it was also shown that the constant 2s is best possible. We now extend the result of Lemma 7 to pseudo-convex sets.
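A small numerical check of this covering argument (not from the paper): the Python sketch below measures the union of dyadic elementary cubes that straddle the boundary of a disk in [0, 1]^2 and shows that this measure shrinks as k grows, in line with the b^{−k} behaviour in Lemma 7. The corner-based classification of cubes is a simplification that is adequate for a smooth set such as a disk.

```python
def boundary_cube_measure(indicator, b, k):
    """Measure of the elementary cubes of side b^-k in [0, 1]^2 that straddle
    the boundary of a set J, i.e. an approximation of lambda(W-bar_k \ W_k^o).

    A cube is classified by testing its 4 corners with the 0/1 indicator of J:
    it counts as a boundary cube if some but not all corners lie in J.
    (Cubes meeting J only along an edge interior are missed; this is a
    simplification that suffices for this illustration.)
    """
    n = b ** k
    h = 1.0 / n
    boundary = 0
    for i in range(n):
        for j in range(n):
            corners = [indicator(i * h + a * h, j * h + c * h)
                       for a in (0, 1) for c in (0, 1)]
            if any(corners) and not all(corners):
                boundary += 1
    return boundary * h ** 2

# J: disk of radius 0.3 centred at (0.5, 0.5) -- a convex set in [0, 1]^2.
in_disk = lambda x, y: (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.09
m3 = boundary_cube_measure(in_disk, 2, 3)   # coarse grid, k = 3
m6 = boundary_cube_measure(in_disk, 2, 6)   # finer grid, k = 6
```

Refining the grid from k = 3 to k = 6 shrinks the measure of the boundary cubes, as the lemma predicts.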
Corollary 2. Let J be an arbitrary pseudo-convex set in [0, 1]^s with an admissible convex covering consisting of p parts, q of which are convex subsets of J. For W°_k and W̄_k given by (5) and (6), we have

λ(W̄_k \ J) ≤ 2sp b^{−k} and λ(J \ W°_k) ≤ 2sp b^{−k}.

Proof. Let A_1, . . . , A_p be an admissible convex covering of J with p parts. Without loss of generality, let A_1, . . . , A_q be the convex subsets of J, and let A_{q+1}, . . . , A_p be such that A'_j = A_j \ J is convex for q + 1 ≤ j ≤ p. Applying the construction (5) and (6) to each part separately and writing B_j and B'_j for the resulting boundary regions, the sets B_j ∪ A_j for j = 1, . . . , q and B'_j ∪ A'_j for j = q + 1, . . . , p are convex. Lemma 7 therefore applies to each of the p parts, and summing the resulting bounds completes the proof.

For W°_k and W̄_k given by (5) and (6) we obviously have W°_k ⊆ J ⊆ W̄_k. The sets W°_k and W̄_k are fair with respect to the net, that is, the fraction of net points they contain equals their volume. Since the bound holds for arbitrary convex sets, the proof is complete.
Let t = (t_1, . . . , t_{s−1}) and A = {x ∈ [0, 1]^s : ψ(x_1, . . . , x_{s−1}) ≥ L x_s}, and let J*_t = {x ∈ A : x_j < t_j for j = 1, . . . , s − 1}. Since the y_n are the first s − 1 coordinates of the points z_n ∈ A for n = 0, . . . , N − 1, the local discrepancy at t can be rewritten in terms of the set J*_t. The right-hand side is then bounded using the estimate λ(J*_t) ≤ λ(A) and the fact that N = ∑_{n=0}^{M−1} 1_A(x_n). Since J*_t is also pseudo-convex, it follows from Lemma 3 that we can bound the resulting expression, which completes the proof.

B.2. Proof of lower bound
Let

S^{s−1} = { x ∈ R^s : ‖x − c‖ = s }, where c = (1/2, . . . , 1/2, 1 − s)

and ‖·‖ denotes the Euclidean norm. Let

Υ_{s−1} = S^{s−1} ∩ [0, 1]^s.

This set is the part of a sphere of radius s and centre c which lies in the cube [0, 1]^s. The radius has been chosen such that Υ_{s−1} is the graph of a concave function. The (closed) spherical cap C(y, ρ) ⊆ S^{s−1} with centre y ∈ S^{s−1} and angular radius ρ ∈ [0, π] is defined by

C(y, ρ) = { x ∈ S^{s−1} : (x − c) · (y − c) ≥ s² cos ρ }.
The packing of Υ_{s−1} considered here is constructed from identical non-overlapping spherical caps, that is, caps C(y_i, ρ), C(y_j, ρ) ⊆ S^{s−1} with i ≠ j touch at most at their boundaries, and each C(y_i, ρ) is fully contained in Υ_{s−1}. The corresponding packing lemma is essentially well known for spheres. The explicit proof is due to Wyner [43], and Hesse gives a summary in [18, Lemma 1]. A similar argument can be used for our case. We now give the proof of Theorem 2, which follows the argument of the proof of [36, Theorem 1].
Proof. We may suppose s ≥ 2. Let B be the intersection of the unit cube [0, 1]^s with the ball of radius s and centre c = (1/2, . . . , 1/2, 1 − s), that is,

B = { x ∈ [0, 1]^s : ‖x − c‖ ≤ s }.

Let C be a closed spherical cap on S^{s−1} with angular radius ρ. The convex hull C̄ of C is a solid spherical cap. For 0 < ρ < π/2, the Lebesgue measure λ(C̄) is a continuous function of ρ with

c_1 ρ^{s+1} < λ(C̄) < c_2 ρ^{s+1}. (8)
If N is sufficiently large, there is a positive real number ρ_0 such that a cap C of angular radius ρ_0 satisfies λ(C̄) = 1/(2N).
In view of (8), we have 0 < ρ_0 < c_3 N^{−1/(s+1)}. We now pick as many pairwise disjoint caps of radius ρ_0, fully contained in Υ_{s−1}, as possible, say C_1, . . . , C_M. The packing lemma yields

M ≥ c_4 ρ_0^{−(s−1)} ≥ c_5 N^{(s−1)/(s+1)}.

Write ∆_{P_N}(E) for the number of points of P_N in E minus N λ(E); since λ(C̄_i) = 1/(2N), we have ∆_{P_N}(C̄_i) = #(P_N ∩ C̄_i) − 1/2. Hence for every i, either ∆_{P_N}(C̄_i) ≥ 1/2 or ∆_{P_N}(C̄_i) = −1/2. Choose σ_i ∈ {−1, +1} such that σ_i ∆_{P_N}(C̄_i) ≥ 1/2 for 1 ≤ i ≤ M. Then

∑_{i=1}^{M} σ_i ∆_{P_N}(C̄_i) ≥ M/2 ≥ c_6 N^{(s−1)/(s+1)},

which yields a test set J whose local discrepancy is at least of order M/(2N) ≥ c_7 N^{−2/(s+1)}. We take the graph of ψ to be the boundary of J excluding the boundary of [0, 1]^s, which completes the proof.

B.3. Proof of Theorem 5
Proof. In what follows we restrict our investigations to the case k = 2 for simplicity; the general case can be proved by similar arguments. Let ψ = H_1 + H_2 be the target density function, and assume that we can apply the inverse CDF to H_1 and to H_2 to generate samples. Split the points into subsets S_{i,ℓ} containing N_{i,ℓ} points for i, ℓ = 1, 2, where the number N_{i,ℓ} of points in each subset is proportional to the weight of the corresponding component, up to rounding. Then there exist δ_i ∈ [0, 1), i = 1, 2, 3, accounting for these rounding errors, and the triangle inequality bounds D*_{S,ψ}, the star discrepancy of the sample points in S associated with ψ, in terms of the star discrepancies D*_{L,H_1} and D*_{L,H_2}, defined analogously. Since this bound holds for arbitrary t, the desired result follows immediately.
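To illustrate the splitting in this proof, here is a minimal Python sketch of a deterministic sampler for a two-component mixture (the k = 2 case). The component densities, the equal weights, and the van der Corput driver are illustrative assumptions, not the paper's exact construction; the floor in the allocation of points corresponds to the δ_i rounding terms above.

```python
import math

def van_der_corput(n, base=2):
    """Radical-inverse of n in the given base."""
    q, bk = 0.0, 1.0 / base
    while n > 0:
        q += (n % base) * bk
        n //= base
        bk /= base
    return q

def reduced_sampler(inv_cdfs, weights, N):
    """Deterministic sampler for a mixture psi = sum_i w_i * H_i.

    Component i receives floor(w_i * N) points; each component is sampled
    by applying its inverse CDF to a van der Corput sequence in base 2.
    """
    samples = []
    for inv_cdf, w in zip(inv_cdfs, weights):
        n_i = int(w * N)  # rounding down; the remainder is the delta_i term
        samples.extend(inv_cdf(van_der_corput(n)) for n in range(n_i))
    return samples

# Illustrative mixture: psi(x) = 0.5 * 2x + 0.5 * 2(1 - x), which equals the
# uniform density on [0, 1]. The component inverse CDFs are sqrt(u) and
# 1 - sqrt(1 - u).
samples = reduced_sampler(
    [lambda u: math.sqrt(u), lambda u: 1.0 - math.sqrt(1.0 - u)],
    [0.5, 0.5],
    1000,
)
```

Because each component is sampled by an exact inverse-CDF transform of a low-discrepancy sequence, the star discrepancy of the merged sample is controlled by the component discrepancies plus the O(1/N) rounding terms, as in the proof.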