Equidistribution, Uniform distribution: a probabilist's perspective

The theory of equidistribution is about a hundred years old, and has been developed primarily by number theorists and theoretical computer scientists. A motivated uninitiated peer could encounter difficulties perusing the literature, due to various synonyms and polysemes used by different schools. One purpose of this note is to provide a short introduction for probabilists. We proceed by recalling a perspective originating in a work of the second author from 2002. Using it, various new examples of completely uniformly distributed (mod 1) sequences, in the "metric" (meaning almost sure stochastic) sense, can be easily exhibited. In particular, we point out natural generalizations of the original $p$-multiply equidistributed sequence $k^p\, t$ mod 1, $k\geq 1$ (where $p\in \mathbb{N}$ and $t\in[0,1]$), due to Hermann Weyl in 1916. In passing, we also derive a Weyl-like criterion for weakly completely equidistributed (also known as WCUD) sequences, of substantial recent interest in MCMC simulations. The translation from number theory to probability language brings into focus a version of the strong law of large numbers for weakly correlated complex-valued random variables, the study of which was initiated by Weyl in the aforementioned manuscript, followed up by Davenport, Erd\"{o}s and LeVeque in 1963, and greatly extended by Russell Lyons in 1988. In this context, an application to $\infty$-distributed Koksma's numbers $t^k$ mod 1, $k\geq 1$ (where $t\in[1,a]$ for some $a>1$), and an important generalization by Niederreiter and Tichy from 1985 are discussed. The paper contains a negligible amount of new mathematics in the strict sense, but its perspective and the open questions included at the end could be of considerable interest to probabilists and statisticians, as well as certain computer scientists and number theorists.


Introduction
This is certainly neither the first nor the last time that equidistribution is viewed through a "probabilistic lens". A probability and number theory enthusiast will likely recall in this context the famous Erdös-Kac central limit theorem [EK] (see also [Du], Ch. 2 (4.9)), or the celebrated monograph by Kac [Ka1].
Unlike in [Ka1,Ka2,EK,Ke1,Ke2], our main concern here is the completely equidistributed sequences and their "generation". In the abstract we deliberately alternate between equidistribution (or equidistributed) and two of its synonyms. In particular, completely uniformly distributed (sometimes followed by mod 1) and ∞-distributed in the abstract, the keywords, and the references (see for example any of [Ho1,Ho2,DT,Kn1,Kn2,Kr5,KN,NT1,Lac,Lo,Le2,TO,OT,SP]) mean the same as completely equidistributed. Uniform distribution is a fundamental probability theory concept. To minimize confusion, in the rest of this note we shall: (i) always write the uniform law when referring to the distribution of a uniform random variable, and (ii) almost exclusively write equidistributed (to mean equidistributed, uniformly distributed, uniformly distributed mod 1 or ∞-distributed), usually preceded by one of the following attributes: simply, d-multiply or completely.
Let $D = [0,1]^d$ be the d-dimensional unit cube. Let $r \in \mathbb{N}$ and assume that a bounded domain $G$ is given in $\mathbb{R}^r$. Denote by $\lambda$ both the restriction of the d-dimensional Lebesgue measure to $D$ and the Lebesgue measure on $\mathbb{R}^r$. In particular $\lambda(G)$ is the r-dimensional volume of $G$. In most of our examples $r$ will equal 1, and $G$ will be an interval.
Throughout this note, a sequence of measurable (typically continuous) functions $(x_k)_{k\geq 1}$, where $x_k : G \to \mathbb{R}^d$, will define a sequence of points $\beta_k \in D$ as follows:
$$\beta_k := x_k \bmod \mathbf{1}, \quad k \geq 1, \qquad (1)$$
understanding that $\mathbf{1} = (1, \ldots, 1) \in \mathbb{R}^d$, and that the modulo operation is naturally extended to vectors in the component-wise sense. While each $\beta_k$ is a (measurable) function of $t$, this dependence is typically omitted from the notation. The sequence $(x_k)_k$ is called the generating sequence or simply the generator, and the elements $t$ of $G$ are called seeds. Any $A = \prod_{j=1}^d (a_j, b_j] \subset D$, with $0 \leq a_j < b_j \leq 1$, will be called a d-dimensional box, or just a box in $D$. As usual, we denote by $1_S$ the indicator of a set $S$. The sequence $(\beta_k)_{k\geq 1}$ is said to be (simply) equidistributed in $D$ if
$$\lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^N 1_A(\beta_k(t)) = \lambda(A), \qquad (2)$$
for each box $A$ in $D$, and almost every $t \in G$.
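As a rough numerical illustration of the definition of simple equidistribution, the following minimal Python sketch estimates the relative frequency with which the points k t mod 1 fall into a box (a, b] in the case d = r = 1. The generator x_k(t) = k t, the seed t = sqrt(2), the box (0.2, 0.5] and the function names are choices made here for illustration only. A floating-point seed is of course rational, so the experiment is only meaningful for moderate N.

```python
import math

def beta(k: int, t: float) -> float:
    """k-th point generated by x_k(t) = k * t, reduced mod 1."""
    return math.fmod(k * t, 1.0)

def box_frequency(t: float, a: float, b: float, n: int) -> float:
    """Fraction of beta_1, ..., beta_n that fall into the box (a, b]."""
    hits = sum(1 for k in range(1, n + 1) if a < beta(k, t) <= b)
    return hits / n

if __name__ == "__main__":
    seed = math.sqrt(2.0)  # illustrative seed (an assumption, not from the text)
    for n in (100, 1_000, 10_000):
        # The frequencies should approach lambda((0.2, 0.5]) = 0.3.
        print(n, box_frequency(seed, 0.2, 0.5, n))
```

The printed frequencies should settle near 0.3, the Lebesgue measure of the box.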

Notes on the literature
Our general setting is mostly inherited from [Li]. The few changes in the notation and the jargon aim to simplify reading of the present work by an interested probabilist or statistician. In [DT,Kn2,KN,Kr5,SP] and other standard references, equidistribution is defined as a property of a single (deterministic) sequence of real numbers. Nevertheless, in any concrete example discussed here (e.g. the sequences mentioned in the abstract, as well as all the examples given in the forthcoming sections), this property is verified only up to a null set over a certain parameter space. So there seems to be no loss of generality in integrating the almost everywhere/surely aspect in Definitions 1.1 and 1.2 below. There is one potential advantage: the Weyl criterion (viewed in a.s. sense) reduces to (countably many applications of) the strong law of large numbers (SLLN) for specially chosen sequences of dependent complex-valued random variables. The just made observation is the central theme of this note. Note in addition that the study of R4 (and R6) types of randomness (according to Knuth [Kn2], Section 3.5, see also Sections 2.4.1 and 3 below) makes sense only in the stochastic setting. Davenport, Erdös and LeVeque [DELV] are strongly motivated by the Weyl equidistribution analysis [We], yet they do not mention any connections to probability theory. Lyons [Ly] clearly refers to the main result of [DELV] as a SLLN criterion, but is not otherwise interested in equidistribution.
The Weyl variant of the SLLN (see Section 2.2) is central to the analysis of Koksma [Kok], who seems to ignore its probabilistic aspect. The breakthrough on the complete equidistribution of Koksma's numbers (and variations) by Franklin [Fr] relies on the main result in [Kok], but without (any need of) recalling the Weyl variant of the SLLN in the background (see the proof of [Fr], Theorems 14 and 15). On the surface, this line of research looks more and more distant from the probability theory. The classical mainstream number theory textbooks (e.g. [KN]), as well as modern references (e.g. [DT, SP]), corroborate this point of view of equidistribution in the "metric" (soon to be called "stochastic a.s.") sense. And yet, the Niederreiter and Tichy [NT1] metric theorem, considered by many as one of the highlights on this topic, consists of lengthy and clever (calculus based) covariance calculations, followed by an application of the SLLN from [DELV] (see Section 2.4.1 for more details).
To the best of our knowledge, [Li] is the first (and at present the only) study of complete (or other) equidistribution in the "metric" sense, which identifies the verification of Weyl's criterion as a stochastic problem (equivalent to countably many SLLN, recalled in Section 2), without any additional restriction on the nature of the generator $(x_k)_k$. In comparison, Holewijn [Ho1,Ho2] made an analogous connection (and in [Ho2] even applied the SLLN criterion of [DELV] to the sequence of rescaled Weyl's sums (14)), but only under particularly nice probabilistic assumptions, which are not satisfied in any of the examples discussed in the present survey. In addition, both Lacaze [Lac] and Loynes [Lo] use the Weyl variant (or a slight modification) of the SLLN, again under rather restrictive probabilistic hypotheses (to be recalled in Section 2.1.2).
Kesten in [Ke1,Ke2], as well as in his subsequent articles on similar probabilistic number theory topics, works in a setting analogous to that of [Ho1,Ho2]; however, his analysis is not directly connected to the Weyl criterion. On a related topic, in a pioneering study of (non-)normal numbers and their relation to equidistribution, Mendès France [MF] also applies the SLLN from [DELV] as a purely analytic result, though it is not clear that a probabilistic interpretation brings new insight in that setting. Interested readers are referred to [Kem, Kh] for probabilistic perspectives on normal numbers.
The study of complete equidistribution in the standard (deterministic) sense, and its close link to normal numbers, was initiated by Korobov [Kr1] in 1948. Korobov used Weyl's criterion [We] for (multiple and complete) equidistribution, and exhibited an explicit function $f$ (in the form of a power series) such that $(f(k) \bmod 1)_k$ is a completely equidistributed sequence (see also [Kr5], Theorem 28). Knuth was aware of [Fr], but apparently unaware of both [We] and [Kr1], when he exhibited in [Kn1] a (deterministic) completely equidistributed (there called "random") sequence of numbers in [0, 1], by extending the method of Champernowne that had previously served to find an explicit normal number. Knuth [Kn1,Kn2] uses his own (computer science inspired) criterion for complete equidistribution. All the examples given in [Kr1,Kr2] and [Kn1,Kn2], as well as those obtained later on by the "Korobov school" (see Levin [Le2], Korobov [Kr5], and also the historical notes in [KN] and [SP]), are practical to a varying degree. To find out more about the deterministic setting, readers are encouraged to use the pointers (to synonyms and references) given above as a guide to the literature.
We finally wish to make note of a recent surge of interest in completely equidistributed sequences, and their generalizations (definable only in the stochastic setting), in relation to Markov chain Monte Carlo simulations [CMNO, CDO, OT, TO]. Section 2.3 makes a brief digression in this direction. The importance of complete equidistribution for MCMC is not surprising, in view of the long list of empirical tests that these sequences satisfy (see [Fr,Kn2] and Remark 2(d,e)). The first author is convinced that anyone who regularly runs or even looks at pseudorandom simulations should benefit from reading a note of this kind.
Disclaimer and motivation
This review and tutorial is highly inclusive, but by no means exhaustive. The theory of equidistribution (or uniform distribution) is rich, complex and fast evolving, and it would be very difficult to point to a single book volume, let alone a survey paper, which covers all of its interesting aspects (for example, the list of references in the specialized survey [AB2] overlaps with ours in only four items). Even when focusing on complete equidistribution in the metric (stochastic a.s.) sense, it seems hard to find a single expository article aimed at specialists, let alone at the probability and statistics community at large. We hope to have accomplished here an "order of magnitude" effect, citing several dozen original research papers and surveys, textbooks or monographs, in a brief attempt to shed a probability-friendly light on the concepts and ideas presented, as well as to point out a number of natural and interesting open questions. We wish we had come across such a paper at our very first encounter with this important topic. The latter thought gave the impetus to our writing.

One-dimensional examples
Suppose that d = r = 1, and that G is an interval. We recall several well-known examples of functions $x_k : G \to \mathbb{R}$ that generate equidistributed sequences in [0, 1]. The first class of examples is as follows: for each $p \in \mathbb{N}$ define
$$x_k(t) = k^p\, t, \quad t \in G := (0, 1),\ k \geq 1, \qquad (3)$$
and $(\beta_k)_{k\geq 1} \in [0,1]^{\mathbb{N}}$ as in (1). The second class is generated multiplicatively: for an integer $M \geq 2$ define
$$x_k(t) = M^k\, t, \quad t \in G := (0, 1),\ k \geq 1, \qquad (4)$$
and $(\beta_k)_{k\geq 1} \in [0,1]^{\mathbb{N}}$ as in (1). Finally, if one defines for each $k$, and any $a > 1$,
$$x_k(t) = t^k, \quad t \in G := (1, a),\ k \geq 1, \qquad (5)$$
then $(\beta_k)_{k\geq 1}$ from (1) are known as Koksma's numbers. It is well-known that for each of the above three examples, the sequences $(\beta_k)_k$ are (at least simply) equidistributed for almost all seeds [We, Fr, Kok] (see also [KN, DT, SP]). In fact if p = 1, then the Weyl (Sierpinski, Bohl) equidistribution theorem is a stronger claim: $k \cdot t$ mod 1 is simply equidistributed for all irrational $t$. One could refine the notion of a seed, and call $t$ a seed only if $(\beta_k(t))_k$ is (sufficiently) equidistributed. The new set of seeds $T$ would then be well-defined up to a null-set. Note that (3) and (4) are both linear in $t$, which is not true for (5). The methodology developed by the second author in [Li] was motivated by the particularly simple analysis of the linear case, which can be extended (under certain hypotheses on $(x_k)_k$) to the non-linear setting.
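For readers who wish to experiment with these generators, here is a minimal Python sketch that evaluates the points mod 1 in exact rational arithmetic. The function names are hypothetical, and the multiplicative generator is written as M^k t, which is an assumption about the precise form of that example. Exact arithmetic forces rational seeds, which need not be good seeds, so the sketch illustrates the computation only and makes no claim about equidistribution. It also shows why naive floating point is unsuitable for exponentially growing generators: the fractional part is lost after a few dozen steps.

```python
from fractions import Fraction

def mod1(x: Fraction) -> Fraction:
    """Fractional part of a non-negative rational number."""
    return x - (x.numerator // x.denominator)

def weyl_point(k: int, t: Fraction, p: int) -> Fraction:
    """k^p * t mod 1 (polynomial generator)."""
    return mod1(Fraction(k) ** p * t)

def multiplicative_point(k: int, t: Fraction, M: int) -> Fraction:
    """M^k * t mod 1 (assumed form of the multiplicative generator)."""
    return mod1(Fraction(M) ** k * t)

def koksma_point(k: int, t: Fraction) -> Fraction:
    """t^k mod 1 for a seed t > 1 (Koksma's numbers)."""
    return mod1(t ** k)

if __name__ == "__main__":
    print(koksma_point(3, Fraction(3, 2)))                # (3/2)^3 = 27/8, so 3/8
    print(multiplicative_point(60, Fraction(1, 10), 2))   # exact fractional part
    # In double precision 2**60 * 0.1 is an integer-valued float,
    # so every fractional bit has been shifted out:
    print((2.0 ** 60 * 0.1) % 1.0)
```

In practice one would use arbitrary-precision arithmetic with a precision that grows with k, rather than rational seeds.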

Multiple equidistribution with examples
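Let d ≥ 2 be fixed. A sequence of real measurable functions $(x_k)_{k\geq 1}$ can be used to form sequences $(\mathbf{x}_k)_{k\geq 1}$ of d-dimensional vector-valued (measurable) functions, and therefore the corresponding sequences $(\beta_k)_{k\geq 1}$ (see (1)), in at least three different natural ways:
(a) The set of seeds could be equally d-dimensional (here r = d). More precisely, for $\mathbf{t} = (t_1, \ldots, t_d) \in G^d$ define $\mathbf{x}_k(\mathbf{t}) := (x_{k1}(\mathbf{t}), \ldots, x_{kd}(\mathbf{t}))$, where $x_{kj}(\mathbf{t}) := x_k(t_j)$, for all $k \in \mathbb{N}$, $j = 1, \ldots, d$. In this case
$$\beta_k := \mathbf{x}_k \bmod \mathbf{1} = (x_k(t_1) \bmod 1, \ldots, x_k(t_d) \bmod 1). \qquad (6)$$
(b) The set of seeds could be r-dimensional, and the successive $\mathbf{x}$ (and $\beta$) could be formed by shifting the window of observation by 1. More precisely, for $t \in G \subset \mathbb{R}^r$, define $\mathbf{x}_k(t) := (x_{k1}(t), x_{k2}(t), \ldots, x_{kd}(t))$, where $x_{kj}(t) := x_{k+j-1}(t)$, for all $k \in \mathbb{N}$, $j = 1, \ldots, d$. In this case
$$\beta^d_k := \mathbf{x}_k \bmod \mathbf{1} = (\beta_k, \beta_{k+1}, \ldots, \beta_{k+d-1}). \qquad (7)$$
(c) Let d ≥ 1. The set of seeds could be r-dimensional, and the successive $\mathbf{x}$ (and $\beta$) could be formed by shifting the window of observation by $h \in \mathbb{N}$. More precisely, for $t \in G \subset \mathbb{R}^r$, define $\mathbf{x}_k(t) := (x_{k1}(t), x_{k2}(t), \ldots, x_{kd}(t))$, where $x_{kj}(t) := x_{(k-1)h+j}(t)$, for all $k \in \mathbb{N}$, $j = 1, \ldots, d$. In this case
$$\beta^{d,h}_k := \mathbf{x}_k \bmod \mathbf{1} = (\beta_{(k-1)h+1}, \ldots, \beta_{(k-1)h+d}). \qquad (8)$$
Note that class (c) comprises class (b) (if h is set to 1), and that if h > d, then $(\beta^{d,h}_k)_k$ is formed from a strict subsequence of $(x_k)_k$. We shall consider the above definitions with d varying over $\mathbb{N}$. The sequences in (6) will be included in the analysis of Section 2.1.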
The equidistribution analysis is typically done on the sequences of vectors β d from (7). Yet (see Remark 1(c) below) the construction in (8) is particularly interesting from the perspective of comparison with (pseudo-)random simulations.
Simple equidistribution in D was defined in the paragraph preceding Section 1.1. Let again G be a bounded domain in R r for some r ≥ 1.
Definition 1.1 Assume that $(x_k)_{k\geq 1}$ is a sequence of real measurable functions on $G$. (i) The sequence $\{\beta_k : k \in \mathbb{N}\}$ defined in (1) is said to be d-multiply equidistributed in [0, 1] if there exists a measurable subset of seeds $T_d$, of full measure (or equivalently, $\lambda(T_d) = \lambda(G)$), such that the sequence of vectors $\beta^d_k$ in (7) is simply equidistributed in $[0,1]^d$ for all $t \in T_d$. (ii) The sequence $\{\beta_k : k \in \mathbb{N}\}$ is said to be completely equidistributed in [0, 1] if it is d-multiply equidistributed in [0, 1] for each d ≥ 1.
We shall often abbreviate completely equidistributed sequence in [0, 1] by c.e.s. in [0, 1]. Occasionally we may drop the descriptive "in [0, 1]". It is natural to also consider shifts of arbitrary fixed length.
Definition 1.2 If for some measurable subset of seeds T d,h , again of full measure, we have that the sequence of vectors β d,h k in (8) is simply equidistributed in [0, 1] d for all t ∈ T d,h then we say that {β k : k ∈ N} defined in (1) is d-multiply equidistributed in [0, 1] with respect to shift by h.
Naturally, we omit "with respect to shift..." if h = 1, and use the attribute simply instead of 1-multiply if d = 1 and h > 1.
Remark 1. (a) Fix some $h \in \mathbb{N}$ and $o \in \mathbb{N}_0$. It is easy to check that if $\{\beta_k : k \in \mathbb{N}\}$ is d-multiply equidistributed in [0, 1] with respect to shift by h, then the same property holds (on the same subset of seeds) for $\{\beta_{k+o} : k \in \mathbb{N}\}$. For this reason the definitions above are stated only in the case o = 0.
(b) Due to the just mentioned easy fact, one can quickly check (by averaging over h different offsets) that if $\{\beta_k : k \in \mathbb{N}\}$ is d-multiply equidistributed in [0, 1] with respect to shift by h ≥ 2, then it is also d-multiply equidistributed in [0, 1] (in the sense of Definition 1.1). The converse fails, as the following example shows: suppose $(\alpha_k)_k$ and $(\gamma_k)_k$ are (simply) equidistributed in [0, 1], so that $(\alpha_k/2)_k$ and $((1 + \gamma_k)/2)_k$ are equidistributed in [0, 1/2] and [1/2, 1], respectively. Then $(\beta_k)_k$ defined by $\beta_{2k-1} := \alpha_k/2$, $\beta_{2k} := (1+\gamma_k)/2$, $k \geq 1$, is (simply) equidistributed in [0, 1], but not even simply equidistributed in [0, 1] with respect to shift by 2. One could construct similar examples in the multiply equidistributed setting.
(c) As already noted, the interest of including shifts becomes apparent if one compares equidistribution with the law of large numbers (LLN), or with Monte Carlo simulations using pseudo-random numbers. If $X_1, X_2, \ldots$ is a sequence of i.i.d. uniform random variables, and $f : [0,1]^d \to \mathbb{R}$ a bounded measurable map, then $E f(X_1, \ldots, X_d)$ is the theoretical (LLN) limit (in the $L^p$ and in the almost sure sense) of
$$\frac{1}{N} \sum_{k=1}^N f(X_{(k-1)d+1}, \ldots, X_{kd}). \qquad (9)$$
Given a pseudo-random sequence $a = (a_k)_{k\geq 1}$, a direct analogue of (9) is
$$\frac{1}{N} \sum_{k=1}^N f(a_{(k-1)d+1}, \ldots, a_{kd}), \qquad (10)$$
and the shift by h = d is typical in doing simulations. While it is also true that for any shift $h \in \mathbb{N}$ and offset $o \in \mathbb{N}_0$
$$\lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^N f(X_{(k-1)h+1+o}, \ldots, X_{(k-1)h+d+o}) = E f(X_1, \ldots, X_d),$$
the variance (and therefore the theoretical error) of the approximation is the smallest if h ≥ d.
This variance (error) bound is constant over shifts h ≥ d and offsets o ≥ 1, hence (9) is an "economical" approximation (each element of (X k ) k is used once and only once), and (10) is its direct analog. (d) We can point the reader to at least two different derivations ([Kn2] Section 3.5, Theorem C and the final Note in [Kr5] Ch. III, §20) of the following important fact: if {β k : k ∈ N} is completely equidistributed, then it is also ("omega-by-omega", on the same set of seeds) d-multiply equidistributed in [0, 1] with respect to shift by h for each d ≥ 1 and each h ≥ 2. Due to this fact, and the observations made in (b), defining completely equidistributed with respect to shifts is superfluous.
(e) As a consequence of (d) and various other properties derived in [Fr], Knuth [Kn1,Kn2] concludes that any c.e.s. passes numerous empirical tests (see [Kn2], Section 3.5, the comment following Definition R1), and is therefore an exemplary pseudo-random sequence. The Weyl sequence (β k ) k≥1 from (3) is p-multiply equidistributed, but it is not p + 1-multiply equidistributed; the multiplicatively generated sequence from (4) is only simply equidistributed (see also Remark 2(c)). Section 2.1.3 serves to point out that already in the linear setting, numerous examples of c.e.s. that generalize the Weyl sequence in a natural way, can be easily constructed.
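The interleaving example of Remark 1(b) is easy to reproduce numerically. In the minimal Python sketch below, the odd-indexed terms are taken from (α_k/2)_k and the even-indexed terms from ((1 + γ_k)/2)_k. The choices α_k = k sqrt(2) mod 1 and γ_k = k sqrt(3) mod 1, as well as the function names, are assumptions made only for this illustration.

```python
import math

def interleaved(n_pairs: int) -> list:
    """Return beta_1, ..., beta_{2 n_pairs} with beta_{2k-1} = alpha_k / 2
    and beta_{2k} = (1 + gamma_k) / 2."""
    alpha_seed, gamma_seed = math.sqrt(2.0), math.sqrt(3.0)  # illustrative seeds
    out = []
    for k in range(1, n_pairs + 1):
        alpha_k = math.fmod(k * alpha_seed, 1.0)
        gamma_k = math.fmod(k * gamma_seed, 1.0)
        out.append(alpha_k / 2.0)          # lands in [0, 1/2)
        out.append((1.0 + gamma_k) / 2.0)  # lands in [1/2, 1)
    return out

def frequency(points, a: float, b: float) -> float:
    """Fraction of the given points that fall into (a, b]."""
    return sum(1 for x in points if a < x <= b) / len(points)

if __name__ == "__main__":
    beta = interleaved(5_000)
    print(frequency(beta, 0.25, 0.75))     # close to 0.5 for the full sequence
    print(frequency(beta[0::2], 0.5, 1.0)) # 0.0 along the shift-by-2 subsequence
```

The full sequence visits (0.25, 0.75] about half of the time, while the subsequence obtained with shift h = 2 never leaves [0, 1/2).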
Koksma's numbers from (5) generally serve (see e.g. [Kn2,KN,TO,SP]) as the prototype of a metric c.e.s. Most of the sequel is organized in connection to this example. In particular, Section 2.4 gives a page-long proof of their complete equidistribution, as an extension of the technique from Section 2.1 to the non-linear setting. The discussion in Section 2.4.1 serves to put this into perspective with respect to [Kok, Fr] and [NT1]. Section 3 recalls the main observations made in this and the next section, and discusses several natural open problems.

Equidistribution via probabilistic reasoning
The purpose of this section is to describe the approach of [Li], as well as to put it into perspective with respect to [Kok,Fr,NT1].

A lesson from the linear case
Consider the set of multi-indices $m = (m_1, m_2, \ldots, m_d) \in \mathbb{Z}^d$. Let $\{x_k : k \in \mathbb{N}\}$ be a sequence of functions (soon taken to be linear) as in Definitions 1.1-1.2, and let points $\beta_k \in D$ be defined by (6).
Define the sequence $(\nu_N)_N$ of purely atomic finite (probability) measures on $[0,1]^d$ via
$$\nu_N := \frac{1}{N} \sum_{k=1}^N \delta_{\beta_k}, \qquad (11)$$
where $\delta_x$ denotes the unit mass at $x$, and where the dependence on $t \in G$ is implicit. Then clearly, the (simple) equidistribution in $[0,1]^d$ (2) is equivalent to the weak convergence of $(\nu_N)_N$ to the uniform law on $[0,1]^d$ (denoted in (2) by $\lambda$), as N goes to ∞. This in turn is equivalent to saying that for any polynomial $p$ in d variables we have
$$\lim_{N\to\infty} \int_{[0,1]^d} p(s)\,\nu_N(ds) = \int_{[0,1]^d} p(s)\,\lambda(ds).$$
From an analyst's perspective, choosing the class of polynomials in the above characterization of weak convergence is suboptimal. Indeed, if d = 1, the class of complex exponentials $t \mapsto \exp(2\pi i\, m t)$, $m \in \mathbb{Z}$, is orthogonal (even orthonormal) in $L^2[0,1]$, and (due to the Stone-Weierstrass theorem) its linear span is dense in the (periodic) continuous functions on [0, 1], with respect to the usual sup-norm. The same is true in the d-multiple setting, this time with respect to the class of multi-dimensional complex exponentials $D \ni s \mapsto \exp(2\pi i\, m \cdot s)$, $m \in \mathbb{Z}^d$, where · is the dot (or scalar) product.
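In order to check that $\lim_N \nu_N = \lambda$ in law, it is necessary and sufficient that for each $m \in \mathbb{Z}^d \setminus \{0\}$
$$\lim_{N\to\infty} \int_{[0,1]^d} \exp(2\pi i\, m \cdot s)\,\nu_N(ds) = \int_{[0,1]^d} \exp(2\pi i\, m \cdot s)\,\lambda(ds) = 0.$$
The just obtained characterization of equidistribution of $(\beta_k)_k$ is the well-known Weyl criterion [We]: consider the quantities
$$e(m, x) := \exp(2\pi i\, m \cdot x), \quad m \in \mathbb{Z}^d,\ x \in \mathbb{R}^d.$$
Recalling that in our setting each $\beta_k$, and therefore $\nu_N$, is in fact a (measurable) function of $t$, the criterion reads
$$\lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^N e(m, \beta_k(t)) = 0, \ \text{ for each } m \in \mathbb{Z}^d \setminus \{0\} \text{ and almost every } t. \qquad (13)$$
Due to (1), and the periodicity of the complex exponential, we have the identity
$$\frac{1}{N} \sum_{k=1}^N e(m, \beta_k(t)) = \frac{1}{N} \sum_{k=1}^N e(m, x_k(t)). \qquad (14)$$
A crucial point is that if $(\mathbf{x}_k)_k$ is defined by (6), where the sequence of real (measurable) functions $(x_k)_k$ is given either in (3) or in (4), then the functions $G^d \ni \mathbf{t} \mapsto e(m, \mathbf{x}_k(\mathbf{t}))$ form a bounded orthogonal sequence in $L^2(G^d, \mathbb{C})$. Note that this is the first time that the linearity in $t$ (resp. $\mathbf{t}$) of the functions in (3,4) (resp. (6)) is being called for.
Unlike [Li], we continue the discussion using probabilistic wording and notation. Given $z \in \mathbb{C}$, denote by $\bar{z}$ its complex conjugate. For a fixed $m \in \mathbb{Z}^d \setminus \{0\}$ and each $k \in \mathbb{N}$, define $Y_k(m; t) := e(m, x_k(t))$, $t \in G$. Then $(Y_k(m))_k$ is a sequence of complex-valued random variables on the probability space $(G, \mathcal{B}, P)$, where $\mathcal{B}$ is the Borel σ-field on $G$, and $P(dt) = \lambda(dt)/\lambda(G)$, such that $E Y_k(m) = 0$ for all k ≥ 1, and
$$E\big(Y_k(m)\,\overline{Y_l(m)}\big) = 1_{\{k = l\}}, \quad k, l \geq 1.$$
The standard proof of the strong LLN for i.i.d. variables with finite second moment (see for example the exercise concluding [Du] Ch. I, Section 7) can clearly be extended to sequences of pairwise uncorrelated centered random variables with constant (or uniformly bounded) variance. Anticipating some readers with a non-probabilist background, we include a sketch of the argument in points LLN.(a)-LLN.(d) below. The above sequence $(Y_k(m))_k$ has the just stated properties, and therefore
$$\frac{S_N}{N} := \frac{1}{N} \sum_{k=1}^N Y_k(m) \to 0, \ \text{ almost surely.} \qquad (16)$$
Recalling (14) and the fact that $m \in \mathbb{Z}^d \setminus \{0\}$ was fixed but arbitrary gives (13) (λ-a.e. is the same as P-a.s.), and therefore the above stated (simple) equidistribution in $[0,1]^d$ of the sequence of vectors $(\beta_k)_k$.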
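The Weyl criterion lends itself to a quick numerical sanity check. The following minimal Python sketch evaluates the modulus of the rescaled Weyl sum for the generator x_k(t) = k t in dimension d = 1, using the periodicity of the complex exponential to skip the reduction mod 1. The function name and the seeds are assumptions made for illustration. A seed such as sqrt(2) gives averages that decay with N, while the rational seed t = 1/2 with m = 2 gives an average of modulus 1.

```python
import cmath
import math

def weyl_average(m: int, t: float, n: int) -> float:
    """|(1/n) * sum_{k=1}^{n} exp(2 pi i m k t)| for the generator x_k(t) = k t."""
    total = sum(cmath.exp(2j * math.pi * m * k * t) for k in range(1, n + 1))
    return abs(total) / n

if __name__ == "__main__":
    good_seed = math.sqrt(2.0)  # illustrative seed
    for m in range(1, 6):
        print(m, weyl_average(m, good_seed, 10_000))
    # For t = 1/2 and m = 2 every term equals 1, so the criterion fails.
    print(weyl_average(2, 0.5, 1_000))
```

This is only a finite-N experiment in floating point, not a verification of the almost sure statement.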
LLN.(a) Note that $E S_N = 0$ and $\mathrm{var}(S_N) = O(N)$.
LLN.(b) Use the Chebyshev (or the Markov) inequality, and the Borel-Cantelli lemma on the subsequence $N_n = n^2$, to conclude that for each ε > 0
$$P\big(|S_{N_n}| > \varepsilon N_n \text{ for infinitely many } n\big) = 0,$$
and therefore that $S_{N_n}/N_n \to 0$, almost surely.
LLN.(c) Let $D_n := \max_{n^2 \leq N < (n+1)^2} |S_N - S_{n^2}|$. Since $E D_n^2 = O(n^2)$, another application of the Markov inequality gives that, for any fixed ε > 0, with probability greater than $1 - O(1/(\varepsilon^2 n^2))$
$$D_n \leq \varepsilon n^2, \qquad (17)$$
and again by the Borel-Cantelli lemma, that (17) happens with probability 1 for all but finitely many n.
LLN.(d) Divide (17) by $n^2$, noting that $N = n^2 + O(n)$ whenever $n^2 \leq N < (n+1)^2$. Use the fact that ε > 0 was arbitrary, and the conclusion of (b).
Remark 2. (a) In our special setting the $|Y|$s are uniformly bounded (by 1), so (17) could have been replaced by a simpler estimate: with probability 1, $D_n \leq 2n$ for all n.
(b) If r = d = 1, then (6) and (7) coincide, leading to the conclusion that if $(x_k)_k$ is again either from (3) or from (4), then the corresponding $(\beta_k)_k$ is simply equidistributed.
The study of multiple equidistribution can be similarly set up (see also Section 2.1.3 below): here (11) has the same form, but we take d ≥ 2 and $\beta := (\beta^d_k)_k$ defined as in (7) with $(x_k)_k$ linear (as in (3,4)), and study the averaged Weyl sums of terms $Y_k(m; t) = e(m, \mathbf{x}_k(t))$, for $t \in G$, for each $m \in \mathbb{Z}^d \setminus \{0\}$.
(c) Recall the multiplicatively generated sequence from (4). To see that it is not 2-multiply equidistributed, take $m = (M, -1) \neq (0, 0)$ and note that with this choice of m the average in the corresponding criterion (13-14) converges nowhere to 0. Indeed, $m_1 x_k + m_2 x_{k+1} \equiv 0$ for any $k \in \mathbb{N}$, so that every term of the Weyl sum equals 1. Similarly, it is possible to find a non-trivial (p + 1)-dimensional integer vector m such that
$$\sum_{j=1}^{p+1} m_j\, (k + j - 1)^p \ \text{ does not depend on } k.$$
This amounts to solving a linear system Ax = 0, where A has p rows and p + 1 columns. The average in the corresponding criterion (13-14) applied to the Weyl numbers (3) will again converge nowhere to 0 (non-convergence to 0 on an event of positive probability would already be sufficient).
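One explicit vector m of the kind described in Remark 2(c) is given by the coefficients of the p-th forward difference, since the p-th difference of the polynomial k^p equals the constant p!. This particular choice is an assumption made here for illustration (the remark only asserts that a suitable m exists), and the function names are hypothetical. The minimal Python sketch below checks the identity in exact integer arithmetic.

```python
from math import comb, factorial

def difference_vector(p: int) -> list:
    """Coefficients m_1, ..., m_{p+1} of the p-th forward difference."""
    return [(-1) ** (p - i) * comb(p, i) for i in range(p + 1)]

def window_combination(m: list, k: int, p: int) -> int:
    """sum_j m_j * (k + j - 1)^p over the window of length p + 1 starting at k."""
    return sum(m_j * (k + j) ** p for j, m_j in enumerate(m))

if __name__ == "__main__":
    p = 3
    m = difference_vector(p)  # [-1, 3, -3, 1]
    # The combination is the same integer for every k, namely p! = 6.
    print(m, [window_combination(m, k, p) for k in range(1, 8)])
```

With this m, every term e(m, x_k(t)) built from the generator k^p t equals exp(2πi p! t), which has modulus 1 and does not depend on k, so the rescaled Weyl sums cannot converge to 0.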

Equivalent formulations of complete equidistribution
Let d ≥ 1 be fixed. The d-dimensional (or multiple) discrepancy at level N is defined by
$$D^d_N := \sup_{A \in \mathcal{J}} \left| \frac{1}{N} \sum_{k=1}^N 1_A(\beta^d_k) - \lambda(A) \right|,$$
where $\mathcal{J}$ is the family of boxes as in definition (2). The d-dimensional "star" discrepancy at level N is defined by
$$D^{d,*}_N := \sup_{A \in \mathcal{J}^*} \left| \frac{1}{N} \sum_{k=1}^N 1_A(\beta^d_k) - \lambda(A) \right|,$$
where $\mathcal{J}^*$ is the family of boxes in $[0,1]^d$ as above, with lower left corner equal to 0. Equivalently, the supremum above is taken over all boxes A of the form $\prod_{i=1}^d [0, b_i)$, where $0 \leq b_i \leq 1$, for each i. Note that we again deviate slightly from the standard definitions (see Kuipers and Niederreiter [KN]), where discrepancy sequences are defined for deterministic sets or sequences. Easy properties of measurability of functions make each $D^d_N$ and $D^{d,*}_N$ a random variable in the current (stochastic a.s./metric) setting.
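In dimension d = 1 the star discrepancy of a finite set of points can be computed exactly from the sorted sample, via the classical closed form max over i of max(i/N − x_(i), x_(i) − (i−1)/N). The minimal Python sketch below implements this formula; the function name and the test sequence k sqrt(2) mod 1 are choices made here for illustration.

```python
import math

def star_discrepancy_1d(points) -> float:
    """Star discrepancy of a finite set of points in [0, 1], for d = 1.

    Uses the closed form over the sorted points x_(1) <= ... <= x_(N):
    max_i max(i/N - x_(i), x_(i) - (i-1)/N).
    """
    xs = sorted(points)
    n = len(xs)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))

if __name__ == "__main__":
    # Equally spaced midpoints attain the minimal possible value 1/(2N).
    print(star_discrepancy_1d([0.125, 0.375, 0.625, 0.875]))
    seed = math.sqrt(2.0)  # illustrative seed
    for n in (100, 1_000, 10_000):
        pts = [math.fmod(k * seed, 1.0) for k in range(1, n + 1)]
        print(n, star_discrepancy_1d(pts))
```

For d ≥ 2 no such simple formula is available, and exact computation becomes expensive, so this sketch should not be read as a general-purpose discrepancy routine.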
Recall (11) with $\beta_k = \beta^d_k$, k ≥ 1. As any probabilist knows (think about cumulative distribution functions), the weak convergence of the sequence of (random) measures $\nu_N$ to the uniform law on $[0,1]^d$ is, omega-by-omega (where "omega" is typically denoted by t), equivalent to the convergence of $(D^d_N)_N$, and also of $(D^{d,*}_N)_N$, to 0.
Note that $I^\infty := [0,1]^{\mathbb{N}}$ is a product of compact spaces, and therefore compact itself. Let $\mathcal{B}_m$ be the Borel σ-field on $[0,1]^m$. Consider the cylinder sets $C \subset I^\infty$, such that $C = C_b \times [0,1]^{\mathbb{N}}$, for some $C_b \in \mathcal{B}_m$ and $m \in \mathbb{N}$. Let $\mathcal{B}_\infty$ be the σ-field on $I^\infty$ generated by the cylinders. Instead of d-dimensional vectors, one could consider straight away the "∞-dimensional" (random) vector sequence $\mathbf{x}^\infty_k := (x_k, x_{k+1}, \ldots)$, k ≥ 1, and its corresponding $\beta^\infty_k := \mathbf{x}^\infty_k \bmod \mathbf{1}$, where again the "modulo" operation is applied component-wise (a.s.). Let $\nu^\infty_N$ be as $\nu_N$ in (11), but with $\beta$ redefined as $\beta^\infty$. Another formulation of complete equidistribution can be read off from an "abstract fundamental theorem" (see e.g. [KN], Ch. 3, Theorem 1.2 and the remark following it) or easily proved by approximating all closed sets in $\mathcal{B}_\infty$ by closed cylinders: $(\beta_k)_k$ is a c.e.s. in [0,1] if and only if $(\nu^\infty_N)_N$ converges weakly to the uniform law on $I^\infty$. A realization Γ from this limiting uniform law (that is, the random object having that law) is also called the infinite statistical sample or the i.i.d. family of uniform [0, 1] random variables $(U_1, U_2, \ldots)$. The just made statement could also be reformulated as follows: $(\beta_k)_k$ is a c.e.s. in [0, 1] if and only if for any bounded and continuous function $f$ on $I^\infty$ (equipped with the product topology), we have
$$\lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^N f(\beta^\infty_k) = E f(U_1, U_2, \ldots), \ \text{ almost surely.}$$
As indicated in Section 1.1, another formulation/criterion of complete equidistribution for a (deterministic) sequence of real numbers in [0, 1] was derived by Knuth [Kn1,Kn2]. There is no doubt that this could also be turned into a stochastic (a.s.) formulation for a sequence (1); the details are left to an interested reader.
2.1.2 Why not simply take an i.i.d. family of uniforms?
The first author can guess that, especially on the first reading, a non-negligible fraction of fellow probabilists could be asking the above or a similar question. It is clear that an i.i.d. sequence Γ of uniform random variables is a c.e.s. in [0,1] in the sense of Definition 1.1, or any of its equivalent formulations described in the previous section. Studying complete equidistribution starting from i.i.d. (or similar) random families is precisely what the probability oriented works [Ho1,Ho2,Lac,Lo] did. However, to most non-probabilist mathematicians this will not mean much, especially due to the fact that rigorous probability theory is axiomatic, and the rigorous construction of Γ rather abstract.
More precisely, Γ is not presented in a neat (classical) functional format, like any of the sequences $(x_k)_k$ and their corresponding $(\beta_k)_k = (x_k \bmod 1)_k$ in (3,4,5). Instead we usually start with an abstract infinite product space (or more concretely, with $[0,1]^{\mathbb{N}}$), and the kth uniform random variable equals the identity map from the kth space to itself. This suffices for most purposes of modern probability theory, but may not seem very convincing to a non-probabilist who wishes to "see a concrete example of" Γ. A related wish to "see an explicit outcome from a concrete example of Γ" quickly leads to philosophical discussions around the question "What is a random sequence?" (the reader is referred to the section in [Kn2] carrying that very title).
The Kac [Ka1] approach to the i.i.d. discrete-valued sequence is also revolutionary from the point of view of the just mentioned drawback. Indeed, an infinite sequence of independent Bernoulli(1/2) random variables (also called the Bernoulli scheme) can be explicitly constructed on Ω = [0, 1], with F equal to the Borel σ-field B, and P equal to λ. This was done in [Ka1], by applying a simple transformation to the classical Rademacher system. We leave on purpose the link with (4), and other details, to interested readers, as well as the discovery of the related b-adic Rademacher system and its relation to the discrete uniform law on {0, 1, . . . , b − 1}.
Given an infinite sequence $X := (X_i)_{i=1}^\infty$, which has the Bernoulli scheme distribution (take for example the one from the above recalled construction by Kac), one can define Γ on the same $(\Omega, \mathcal{F}, P)$ by "redistributing the digits" via a triangular scheme, for example
$$U_1 := \frac{X_1}{2} + \frac{X_3}{2^2} + \frac{X_6}{2^3} + \frac{X_{10}}{2^4} + \cdots, \quad U_2 := \frac{X_2}{2} + \frac{X_5}{2^2} + \frac{X_9}{2^3} + \cdots, \quad U_3 := \frac{X_4}{2} + \frac{X_8}{2^2} + \cdots, \quad U_4 := \frac{X_7}{2} + \cdots,$$
and so on, and finally $\Gamma = (U_i)_i$. Is this asymptotic definition of Γ on $([0,1], \mathcal{B}, P)$ sufficiently explicit? Or are the examples like (5,20,21), all leading (after mod 1 application) to c.e.s. but none to i.i.d. uniforms, more reassuring to think about (more amenable to analysis)? The answer will likely vary from one peer to another, depending not only on their mathematical background and research interests, but also on their personal perception of randomness. Note that such and related questions challenged the very founders of probability theory (we refer the reader to the discussion [Lam] of Kollektivs (or collectives) of von Mises [Ms], the historical notes of Knuth [Kn2], Section 3.3.5, as well as to the monograph by Burdzy [Bu] for a modern perspective; and also to [Kol1,Kol2,Le1,ML] for yet another mathematically rigorous and well-studied notion of randomness).
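The triangular redistribution of digits can be made concrete with a few lines of code. In the minimal Python sketch below, the j-th binary digit of U_i is taken from position T(i + j − 1) − (i − 1) of the bit sequence, where T(n) = n(n + 1)/2 is the n-th triangular number. This index formula is inferred here from the displayed pattern (1, 3, 6, 10 for U_1; 2, 5, 9 for U_2; and so on), and the function names are hypothetical.

```python
from fractions import Fraction

def digit_index(i: int, j: int) -> int:
    """1-based position in X of the j-th binary digit of U_i."""
    n = i + j - 1
    return n * (n + 1) // 2 - (i - 1)

def truncated_uniform(bits, i: int, digits: int) -> Fraction:
    """First `digits` binary digits of U_i, where bits[0] stands for X_1."""
    return sum(
        Fraction(bits[digit_index(i, j) - 1], 2 ** j)
        for j in range(1, digits + 1)
    )

if __name__ == "__main__":
    print([digit_index(1, j) for j in range(1, 5)])  # positions used by U_1
    print([digit_index(2, j) for j in range(1, 4)])  # positions used by U_2
    bits = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]            # a hypothetical outcome X_1..X_10
    print(truncated_uniform(bits, 1, 4))             # 1/2 + 1/4 + 1/8 + 1/16
```

Every position of X is used by exactly one U_i, which is what makes the resulting variables independent when the bits are.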
We claim that (1) defined using either (20) or (21) is completely equidistributed in the sense of Definition 1.1. Let us take (21) for example. Simple equidistribution can be quickly deduced as in the previous section (see Remark 2(b) in particular). Now fix some d ≥ 2 and $m \in \mathbb{Z}^d \setminus \{0\}$. WLOG we can assume that the Weyl criterion (13) has been shown for the (d − 1)-multiply case. Moreover, due to symmetry, we can and will assume $m_d > 0$. With $Y_k = Y_k(m)$ formed as before from the d-dimensional window $\mathbf{x}_k$ of (7), it is clearly true that $E(Y_k) = 0$, and $E|Y_k|^2 = 1$, for each $k \in \mathbb{N}$. It is furthermore easy to see that whenever k − l > c(m) for some $c(m) \in \mathbb{N}$, then $\sum_{i=1}^d m_i\,[(k + i - 1)! - (l + i - 1)!]$ is a strictly positive integer, and due to the periodicity of sine and cosine, we have again $E(Y_k \overline{Y_l}) = E(\overline{Y_k}\, Y_l) = 0$ (or equivalently, $Y_k$ and $Y_l$ are uncorrelated). Moreover, the $|Y|$s are uniformly bounded by 1. Therefore $\mathrm{var}(S_N) = O(N)$, and the steps LLN.(a)-LLN.(d) apply as before.

The Weyl variant of the SLLN and generalizations
It is easy to see that the steps in LLN.(a)-LLN.(d) could be applied to show simple equidistribution in [0, 1] of (1), whenever $x_k(t) := a_k t$, k ≥ 1, and $(a_k)_k$ is a sequence of distinct integers, yielding an alternative derivation of [KN] I, Theorem 4.1. In the (analytic) number theory literature it is standard to apply instead, in such and in analogous situations (see e.g. [KN, MF, DT], as well as [Ho1], Theorem 2), the following variant of the SLLN, due to Weyl [We]: since
$$\sum_n E\,\frac{|S_{n^2}|^2}{n^4} = \sum_n \frac{1}{n^2} < \infty,$$
we have by Tonelli's theorem that $\sum_n |S_{n^2}|^2 / n^4 < \infty$ almost surely, and in particular $S_{n^2}/n^2 \to 0$, as n → ∞. Finally use Remark 2(a) to get (16).
The Weyl variant of the SLLN has the following important extension, due to Davenport, Erdös and LeVeque [DELV], that has since been used extensively in number theoretic studies (see e.g. [KN] I, Theorem 4.2): if
$$\sum_{N \geq 1} \frac{1}{N}\, E\left|\frac{S_N}{N}\right|^2 < \infty,$$
then $S_N/N \to 0$, almost surely. Its proof strongly relies on the uniform boundedness of the individual random variables Y, but not at all on their particular (complex exponential) form (see Lyons [Ly], Theorem 1). Lyons [Ly] moreover takes general complex-valued Ys, and derives different generalizations of the above SLLN criterion of [DELV], where the uniform boundedness condition is replaced by various bounded moment conditions.
Remark 3. In view of the SLLN stated here, the fact that the sequences of functions from Section 2.1.3 generate c.e.s. is again trivial. Just like with the usual SLLN (of LLN.(a)-(d) and Remark 2(a)), it is important here that all the covariances can be computed or adequately (uniformly) estimated.

Weyl-like criterion for weakly c.e.s.
Recall the (random) discrepancy sequences introduced in Section 2.1.1. Owen and Tribble (e.g. [OT] Definition 2 or [TO] Definition 5) define weak complete equidistribution (aka WCUD) in terms of the weak(er) convergence of the discrepancy sequence as follows: $(\beta_k)_k$ is WCUD if and only if $D^{*,d}_N \Rightarrow 0$, or equivalently (since the limit is deterministic), if $D^{*,d}_N \to 0$ in probability, for each $d \geq 1$. The following "a.s.-convergence along subsequences" characterization is well-known: a sequence of random variables $(X_n)_n$ converges in probability to $X$ if and only if for any subsequence $(n_k)_k$ one can find a further subsequence $(n_{k(j)})_j$ such that $X_{n_{k(j)}}$ converges to $X$ almost surely. Let us apply it to the discrepancy sequences, and conclude (due to the countability of $\mathbb{N}$, and the Arzelà-Ascoli diagonalization scheme) that for any subsequence $(n_k)_k$ one can find a further subsequence $(n_{k(j)})_j$ and a single event (Borel measurable set) $G_f$ of full probability such that $D^{*,d}_{n_{k(j)}}(t) \to 0$, for all $d \geq 1$, and all $t \in G_f \subset G$.
But now we recall (as in Section 2.1.1) that the above convergence is equivalent ($t$-by-$t$) to the simultaneous (for each $d \in \mathbb{N}$ and on the full probability event $G_f$) weak convergence of the random sequence $(\nu^d_{n_{k(j)}})_j$ to the corresponding uniform law $\lambda$ on $[0,1]^d$, where The last claim is in turn equivalent (recall the reasoning of Section 2.1 and definition (22)) to the statement This leads to the following conclusion: for each subsequence $(n_k)_k$ one can find a further subsequence $(n_{k(j)})_j$ such that for each $m \in \mathbb{Z}^d \setminus \{0\}$ the rescaled Weyl sums indexed by $m$ converge to 0 along that sub(sub)sequence, almost surely. In other words, for each $d$ and $m \in \mathbb{Z}^d \setminus \{0\}$, instead of (16) we arrived at Finally recalling that, in our special setting, $|S_N|/N$ is uniformly bounded by 1, the (adaptation of the) Lebesgue dominated convergence theorem says that (25) is equivalent to We record the above sequence of equivalences as
LEMMA 2.1 (Sufficiency and necessity for weak c.e.s. (aka WCUD)) Let $(x_k)_k$ be a sequence of random variables (or measurable functions on $G$), $Y_k(m)$ be defined by (22), and $(\beta_k)_k$ by (1). Then $(\beta_k)_k$ is weakly completely equidistributed in $[0,1]$ if and only if (26) is valid for each $d$ and $m \in \mathbb{Z}^d \setminus \{0\}$.
In view of Lemma 2.1, the claim from [TO] that WCUD sequences are hard to construct seems rather surprising.
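It may help to see the criterion fail on a familiar sequence. The sketch below rests on three assumptions about displays not reproduced here: that (7) forms overlapping blocks $\beta^d_k = (\beta_k, \ldots, \beta_{k+d-1})$, that (22) reads $Y_k(m) = e(m \cdot \beta^d_k)$, and that (26) asks for $E|S_N(m)|/N \to 0$. Under this reading, $x_k(t) = kt$ with $d = 2$ and $m = (1,-1)$ gives $Y_k(m) = e(-t)$ for every $k$, so $|S_N(m)|/N = 1$ and the criterion fails, although the same sequence is simply equidistributed for irrational $t$. The helper name `block_weyl_sum` is hypothetical.

```python
import cmath
import math


def block_weyl_sum(beta, m, N):
    """|S_N(m)| / N with Y_k(m) = e(m . (beta_k, ..., beta_{k+d-1})).

    `beta` is a function k -> beta_k in [0, 1).  The overlapping blocks are
    our reading (an assumption) of definition (7)."""
    d = len(m)
    total = 0j
    for k in range(1, N + 1):
        dot = sum(m[i] * beta(k + i) for i in range(d))
        total += cmath.exp(2j * math.pi * dot)
    return abs(total) / N


T = math.sqrt(2.0) - 1.0  # illustrative irrational t in [0, 1]


def beta(k):
    """beta_k = k t mod 1 for the fixed t above."""
    return (k * T) % 1.0


print(block_weyl_sum(beta, (1,), 2000))     # small: simple equidistribution
print(block_weyl_sum(beta, (1, -1), 2000))  # equals 1 up to rounding
```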

Extensions to the non-linear setting
As already noted, the reasoning of Section 2.1 is crucially connected to linearity only through (15). In particular, if we were to take (5) or some other sequence of functions, then apply (7) to obtain $d$-dimensional vector sequences, and plug them into (12,14,22), the criterion (13), to be verified through (16), would stay the same. Note however that the correlation structure may (and typically does) become much more complicated than that given in (15) or in Section 2.1.3. In the special case of Koksma's numbers, the probability $P$ is set to $\lambda$ renormalized by $(a-1)$ on $[1,a]$, or equivalently, $P$ is the uniform law on $[1,a]$. Before considering covariance, one could already note that the $Y$s are not (necessarily) even centered. Since $Y_k(t) \equiv Y_k(m; t) := e(m, x_k(t))$, $t \in [1,a]$, we have that The non-zero expectation does not matter much, if one could show that the long term average of the $Y$s is approximately centered, or equivalently, that The covariance analysis to be done below will imply an even stronger estimate $E(|S_N|) = O(N^{1-1/d})$, but already from Jensen's inequality we get that if $E(|S_N|^2) = o(N^2)$, then $E|S_N| = o(N)$, which would be sufficient. The argument LLN.(a)-LLN.(d) will continue to work for the sequence $(Y_k - EY_k)_k$, and will imply (16) for $(Y_k)_k$, even in the presence of correlation, provided that the total covariance $\sum_{k=1}^{N} \sum_{l=k+1}^{N} |E(Y_k \bar{Y}_l) + E(Y_l \bar{Y}_k)|$ grows sufficiently slowly in $N$. Indeed, a little thought is needed to see that if for some $c = c(a,m,d) < \infty$, $\delta = \delta(a,m,d) > 0$, then the conclusion (16) remains valid (this time choose $N_n = \lfloor n^{2/\delta} \rfloor$, and then apply the sandwiching argument). Alternatively, this can be immediately deduced from the SLLN criterion (23).
We record the observations just made in the form of a lemma.
LEMMA 2.2 (Sufficiency for complete equidistribution I) Let $(x_k)_k$ be a sequence of random variables (or measurable functions on $G$), $Y_k(m)$ be defined by (22), and $(\beta_k)_k$ by (1). If for each $m$, Deriving (28) is a longer calculus exercise, sketched here for the reader's benefit. Fix $d \geq 1$, $m \in \mathbb{Z}^d \setminus \{0\}$, and consider $\beta^d_k = (\beta_{k1}, \ldots, \beta_{kd})$ defined in (7). Recalling that One can assume WLOG that $k > l$, so that $g(t)$ is non-negative on $G$, with a single zero $t = 1$. Let $d^*$ be the maximal index $i$ such that $m_i \neq 0$. The polynomial $p$ does not depend on $k, l$. It has degree $d^* - 1$ and therefore at most $d^* - 1 \leq d - 1$ real zeros, each of which may fall into $G$. Therefore $p(\cdot)g(\cdot)$ takes value 0 at 1, and at most $d - 1$ other points in $G$. Let us denote these zeros by $(z_j)_{j=1}^{r}$, where $1 =: z_0 \leq z_1 < \ldots < z_r \leq a =: z_{r+1}$. For $j = 1, \ldots, r$, let $l_j$ be the multiplicity of $z_j$ for $p$. Extend this definition to $l_0 = 0$ if $z_1 > 1$, and $l_{r+1} = 0$ if $z_r < a$. Define $b_1 := \max |p''(t)|$.
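The product structure behind this sketch can be checked mechanically. Assuming Koksma's generators $x_k(t) = t^k$ and overlapping blocks in (7), the phase of $Y_k \bar{Y}_l$ is $m \cdot (x^d_k - x^d_l) = p(t)\,g(t)$ with $p(t) = \sum_{i=1}^{d} m_i t^{i-1}$ and $g(t) = t^k - t^l$. These explicit formulas are our reconstruction from the properties listed above ($g$ vanishes only at $t = 1$, and $p$ has degree $d^* - 1$ and does not depend on $k, l$), not a quotation. The function names are hypothetical.

```python
from fractions import Fraction


def phase_difference(m, k, l, t):
    """m . (x_k^d - x_l^d) for x_j(t) = t**j and d = len(m)."""
    return sum(mi * (t ** (k + i) - t ** (l + i)) for i, mi in enumerate(m))


def p(m, t):
    """p(t) = sum_i m_i t**(i-1), i = 1..d (the index below is 0-based)."""
    return sum(mi * t ** i for i, mi in enumerate(m))


def g(k, l, t):
    """g(t) = t**k - t**l."""
    return t ** k - t ** l


# Exact rational arithmetic, so the identity can be tested with ==.
t = Fraction(7, 5)
m, k, l = (2, -3, 0, 1), 9, 4
assert phase_difference(m, k, l, t) == p(m, t) * g(k, l, t)
```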
Keeping in mind that $r \leq d - 1$, we now let $I_i := [z_i, z_{i+1}]$, and estimate separately for each $i = 0, \ldots, r$. The non-zero polynomial $p$ does not change sign on $I_i$, so WLOG we can assume that it takes positive values in the interior of $I_i$. Moreover min The integral of $\cos(\cdot)$ over the first and the final piece is again trivially bounded above by $1/(k-l)^{1/d}$. For the middle piece, it suffices to show that: (a) $(pg)'\big(z_i + \frac{1}{(k-l)^{1/d}}\big) \geq \tilde{c}_i |k-l|^{1/d}$, where $\tilde{c}_i > 0$, and that (b) $(pg)''(t) = g''(t)p(t) + 2g'(t)p'(t) + g(t)p''(t) > 0$ on $I'_i$. Indeed, (a)-(b) would imply that $2\pi g(\cdot)p(\cdot)$ increases more and more rapidly on $I'_i$, and in turn that $\cos(2\pi g(t)p(t))$ makes shorter and shorter "excursions" of alternating sign, away from 0. This would yield an upper bound for the integral over To show (b), note that $g''(t) = k(k-1)t^{k-2} - l(l-1)t^{l-2}$ is greater than $(k^2 - l^2 + O(k+l))t^{k-2}$. We also have $g'(t) = kt^{k-1} - lt^{l-1} \leq ka \cdot t^{k-2}$ and $g(t) \leq a^2 \cdot t^{k-2}$. Due to (31) and the fact that $l_i + l_{i+1} \leq d^* - 1 \leq d - 1$, the leading term of $(pg)''(t)$ is positive and bounded below by $c_i |k-l|^{1/d}(k+l)t^{k-2}$ on $I'_i$, while the two other terms are bounded above by $b_1 k a t^{k-2}$ and $b_2 a^2 t^{k-2}$, respectively, yielding (b). We leave to the reader a similar (and easier) argument for (a).
Finally note that the derivation of the bound in (28) applies also to estimating both the real and the imaginary part of (27), and leads to an analogous bound $EY_k = O(k^{-1/d})$.
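The decay of the means can be observed numerically in the simplest case $d = 1$, where $Y_k(t) = e(m t^k)$ and $P$ is uniform on $[1,a]$, so that $EY_k = (a-1)^{-1} \int_1^a e(m t^k)\,dt$. The following is a rough sketch using the midpoint rule. The function name `mean_Y`, the value $a = 2$ and the grid size are illustrative assumptions, and the grid must resolve the oscillations of the integrand, which limits the sketch to moderate $k$.

```python
import cmath
import math


def mean_Y(k, m=1, a=2.0, n_grid=20000):
    """Midpoint-rule approximation of E Y_k = (a-1)^{-1} * int_1^a e(m t^k) dt
    for d = 1, with e(x) = exp(2*pi*i*x)."""
    h = (a - 1.0) / n_grid
    total = 0j
    for j in range(n_grid):
        t = 1.0 + (j + 0.5) * h
        total += cmath.exp(2j * math.pi * m * t ** k)
    return total * h / (a - 1.0)


for k in range(1, 7):
    print(k, abs(mean_Y(k)))
```

In this one-dimensional case, integration by parts (the first derivative test for oscillatory integrals) gives $|EY_k| \leq 3/(2\pi |m| k (a-1))$, which is consistent with the bound $O(k^{-1/d})$ at $d = 1$.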

Koksma's numbers have even stronger properties
Koksma's derivation [Kok] of the simple equidistribution of Koksma's numbers was similar to that included above (see also [KN] I, Theorem 4.3), and easier due to the fact that a simpler polynomial m · g replaces p · g in (29). As already mentioned in the introduction, Franklin [Fr] found a way (again similar to the calculus exercise above) of building on Koksma's analysis, without directly applying Weyl's criterion for multiple equidistribution.
Koksma [Kok] showed in addition a stronger type of equidistribution: suppose that $(a_k)_k$ is a sequence of distinct positive integers, and consider the reordering (with possible deletion) of Koksma's numbers obtained via (1) from $(x_{a_k})_{k \geq 1}$ defined as in (5). Then the obtained sequence is again simply equidistributed. Niederreiter and Tichy [NT1] proved that the above arbitrarily permuted (sub)sequence of Koksma's numbers is in fact completely equidistributed, thus confirming a guess of Knuth [Kn2], who called the above property pseudo-randomness of R4 type. As it turns out, the arguments in [NT1] are equally based on probabilistic reasoning (using the [DELV] variant (23) of the SLLN and covariance calculations). For the benefit of the reader we sketch them next.
Suppose initially that, for each (23) implies the required convergence in the Weyl criterion. We again record the observations just made in the form of a lemma.
The rest of the argument is exactly as described above (without the probabilistic rescaling by a − 1).
Further qualitative and quantitative improvements and extensions, to be recalled soon, were made in the late 1980s by Niederreiter and Tichy [NT2], Tichy [Ti], Drmota, Tichy and Winkler [DTW], and Goldstern [Go] (see also [DT], Section 1.6). And yet, interesting non-trivial open problems remain, as indicated in the next section.
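Readers who wish to experiment with (permuted) Koksma's numbers should note that floating point arithmetic cannot deliver $t^k$ mod 1 for large $k$, since the integer part of $t^k$ alone needs about $k \log_2 t$ bits. A minimal workaround, sketched below, is exact rational arithmetic with a random dyadic $t$. This is an illustration under loud assumptions: the rational $t$ only stands in for a "typical" real number, the almost sure results recalled above say nothing about any specific $t$, and the seed, the bit length and the function name `koksma_fractional_parts` are ours.

```python
import cmath
import math
import random
from fractions import Fraction


def koksma_fractional_parts(t, exponents):
    """Exact fractional parts of t**a for a in `exponents`, t a Fraction."""
    return [(t ** a) % 1 for a in exponents]


rng = random.Random(0)  # fixed seed, illustrative only
bits = 512
t = 1 + Fraction(rng.getrandbits(bits), 2 ** bits)  # rational t in [1, 2)

exponents = list(range(1, 201))
rng.shuffle(exponents)  # an arbitrary reordering (a_k) of the exponents
beta = koksma_fractional_parts(t, exponents)

# Normalized Weyl sum for d = 1, m = 1 (floats are fine at this stage).
S = sum(cmath.exp(2j * math.pi * float(b)) for b in beta)
print(abs(S) / len(beta))
```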

Concluding remarks with open problems
The main theorem of [Li] also yields novel generators of (stochastic a.s.) completely equidistributed sequences in the non-linear setting, of which the prototype is $x_k(t) = (t(\log t)^{r_k})^{w_k}$, $k \geq 1$, where $(r_k)_k$ (resp. $(w_k)_k$) are sequences of positive (resp. natural) numbers satisfying certain hypotheses. This may not be so interesting (even though there is no obvious reduction of such sequences to exponential sequences of Section 2.4.1) in view of the main theorem of Niederreiter and Tichy [NT1,NT2]. However, the approach to (complete) equidistribution, implicit in [Li] and made explicit in Section 2, is interesting. In particular, it leads to the realization that [NT1] is based on probability arguments.
A careful reader must have noticed that (23) is only a sufficient condition for the SLLN (16). Davenport, Erdös and LeVeque also state (in [DELV], main/only Theorem) "On the contrary...", however these counter-examples neither imply nor disprove the necessity of the assumption for the SLLN of the corresponding Weyl sums. Indeed, it could be that the $x_k$ are purely constant (random variables), and such that for some $m \in \mathbb{Z}^d \setminus \{0\}$ we have that is both $\Omega(1/\log n)$ and $o(1)$ (Korobov [Kr4] guarantees that such examples exist), so that the SLLN happens even though the series in the criterion diverges. However, here we would trivially have, for the same $d$ and $m \in \mathbb{Z}^d \setminus \{0\}$, that
$$\sum_n \frac{1}{n}\, \mathrm{var}\Big(\frac{|S_n(m)|}{n}\Big) < \infty, \quad \text{and} \quad \lim_n \frac{E|S_n(m)|^2}{n^2} = 0. \qquad (32)$$
While the above example is arguably contrived, it already illustrates the following straightforward consequence of (23) (or more precisely, of its generalization [Ly], Theorem 1): (32) is a (strictly) stronger sufficiency criterion for the SLLN of the corresponding (indexed by $m$) Weyl sums than that given in [DELV] (see (23)). Is it necessary? If not, is the whole $d$-dimensional collection of them (meaning that (32) is true for all $m \in \mathbb{Z}^d \setminus \{0\}$) necessary for $d$-multiple equidistribution of the corresponding $\beta$? If not, is the complete collection of them (meaning that (32) is true for all $m \in \mathbb{Z}^d \setminus \{0\}$ and all $d \in \mathbb{N}$) necessary for complete equidistribution of the corresponding $\beta$? If not, are there natural additional (easy to check) hypotheses on $(x_k)_k$ under which these families of criteria become necessary (they are always sufficient)?
As pointed out in Section 2.4.1, we know from [NT1] that Koksma's numbers are of R4 type and that the same is true, as shown in [NT2], for a large class of other exponentially generated sequences. A trivial example of a c.e.s. which is also R4 (and even R6 in Knuth's terminology) is the infinite statistical sample $\Gamma$, but as explained in Section 2.1.2, this is likely not considered sufficiently explicit from a non-probabilist's perspective. The examples from Section 2.1.3 must have been known to the modern analytic number theory community, in particular as examples of so-called lacunary sequences. We could not find them on the list of c.e.s. in any of the standard references, however (21) was studied by Korobov [Kr3] in connection to an explicit example of a simply equidistributed sequence. They seem amenable to analysis, and natural for continuing the investigation in the framework of Knuth, who anticipated the result of [NT1], but at the time knew only the work of Franklin and Koksma about Koksma's numbers (and variations thereof) [Kok, Fr]. Are linear c.e.s. from Section 2.1.3 also of R4 type? Is it possible that any sufficiently random c.e.s. (e.g. $\mathrm{var}(\beta_k \mathbf{1}_{[c,d]}) > 0$ for all $k$ and all $0 \leq c < d \leq 1$) is of R4 type? If not, is there a natural and easy to check characterization of when complete equidistribution (type R1) implies R4? By the way, note that if an infinite sequence of random variables $(\beta_k)_k$ on $([0,1], \mathcal{B}, \lambda)$ is both completely equidistributed in $[0,1]$ (that is, of type R1) and exchangeable (see e.g. [Du], Example 5.6.4), then it must be equal in law to $\Gamma$. Is there a natural exchangeability-like property (clearly stronger than R4 type), though weaker than exchangeability, that would jointly with c.e.s. still imply that the sequence $(\beta_k)_k$ has the law of $\Gamma$? Is there a natural condition (stronger than c.e.s. but weaker than i.i.d.) that would imply, jointly with R4 type, that the sequence $(\beta_k)_k$ has the law of $\Gamma$?
Once equidistribution is established, it is natural to ask about the rate of convergence in (2), or in analogous multi-dimensional expressions that correspond to Definitions 1.1-1.2. The discrepancy sequences (as recalled in Section 2.1.1) seem to be the principal object for this analysis, and they were already studied in the classical works of the founders of analytic number theory (see [KN], Chapter 2 and [DT] Sections 1.1-1.2). From the stochastic point of view, it seems more interesting to study discrepancy as a random variable. In particular, the well-known Erdös-Turán-Koksma inequality (see e.g. [DT], Theorem 1.21) can be restated in the present setting as follows: if $H$ is a positive integer and, for $m \in \mathbb{Z}^d$, we let $r(m) := \prod_{i=1}^{d} \max\{1, |m_i|\}$, then it is true ($t$-by-$t$) that Substantial work by the "Tichy school" has been done on multi-dimensional discrepancy estimation of Koksma's numbers and variations. In particular, Tichy [Ti] obtains, in the context of [NT2] (more precisely, $x_k(t) = t^{a_k}$, $t > 1$, for some $a_k \in \mathbb{R}$, and the minimal distance between different powers is bounded below by $\delta > 0$), bounds on $D^{*,d}_N$ of the form $O(N^{-1/2+\eta})$ for any $\eta > 0$, even if the multiplicity $d$ is not fixed but diverges as $\log N$ raised to a small power; and Goldstern [Go] extends these to the setting where the replication between the exponents $a_n$ is possible but infrequent, and the minimal distance between the exponents may slowly converge to 0 (see also notes on the literature in [DT], Section 1.6).
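For experimentation in the simplest case $d = 1$, the star discrepancy of a finite sample can be computed exactly from the sorted points. The sketch below assumes that Section 2.1.1 uses the usual normalization $D^*_N = \sup_{0 \leq c \leq 1} |\#\{k \leq N : \beta_k < c\}/N - c|$, and it also implements the weight $r(m)$ from the inequality above. The function names and the test sequence $\beta_k = k\sqrt{2}$ mod 1 are illustrative choices of ours.

```python
import math


def star_discrepancy_1d(points):
    """Star discrepancy D*_N of points in [0, 1], via the sorted-sample formula
    D*_N = max_i max(i/N - x_(i), x_(i) - (i-1)/N)."""
    xs = sorted(points)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def r(m):
    """r(m) = prod_i max(1, |m_i|), the weight in the Erdos-Turan-Koksma bound."""
    return math.prod(max(1, abs(mi)) for mi in m)


t = math.sqrt(2.0)  # illustrative choice of t
for N in (10, 100, 1000):
    print(N, star_discrepancy_1d([(k * t) % 1.0 for k in range(1, N + 1)]))
```

As a sanity check, the centered grid $(2i-1)/(2N)$, $i = 1, \ldots, N$, attains the minimal possible value $1/(2N)$.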
Recall that already in the setting of linear generators, the random variables $Y_k(m)$ are not mutually independent, however they do have other nice properties. For concreteness, one could take the two examples of completely equidistributed sequences from Section 2.1.3. Is there a central limit theorem, or another type of concentration result that would apply, and give good estimates on the $d$-dimensional (star) discrepancy of these sequences, with or without modifications related to R4 type of randomness? When is $\sqrt{N/\log\log N}\, D^{*,d}_N$ a tight sequence of random variables? The above LIL-type result for the simple (1-multiple) discrepancy sequences $(D^{*,1}_N)_N$ is well-known (see e.g. [DT] Section 1.6.2 or [AB2]) in the context of lacunary sequences, and in particular for (20,21), even if permuted (without possible deletion), as shown by Aistleitner et al. [ABT]. But as soon as one starts increasing the multiplicity $d$, no specific study of the corresponding LIL seems to exist.
In view of (19) and Remark 1(d), most probabilists and statisticians would likely ask if any of the following is true for any (or all) of the c.e.s. discussed here: given any $d \geq 1$ and any $f : \mathbb{R}^d \to \mathbb{R}$ a nice enough function, define $f(\beta^\infty_k) := f(\beta^d_k)$ and $f(\Gamma) := f(U_1, U_2, \ldots, U_d)$. Are the sequences of random variables (recall (8)) tight? Do they converge in law to a centered Gaussian random variable? These questions are similar in spirit to those of the preceding paragraph, but not quite the same. In particular, the well-known Koksma and Koksma-Hlawka inequalities (see e.g. [KN], Ch. 2, Section 5 or [DT], Theorem 1.14) serve to give universal bounds on the error in (MC) numerical integration in terms of the discrepancy of the sequence and (a multi-dimensional extension of) the bounded variation of the integrand. Though essential in various applications, they are too crude for studying weak convergence properties (33,34). In the one-dimensional setting, substantial progress has again been made for lacunary sequences, starting from the classical work of Salem and Zygmund [SZ], and ending in the recent studies by Aistleitner et al. [AB1,ABT] (see also [AB2]). Satisfying the CLT analogues (33,34) (with permutations and possible deletions permitted) would undoubtedly be strong "evidence of randomness". How is it (or is it) related to R6, the strongest of Knuth's [Kn2] types of pseudo-randomness?