University of Birmingham
A Gaussian limit process for optimal FIND algorithms

We consider versions of the FIND algorithm where the pivot element used is the median of a subset chosen uniformly at random from the data. For the median selection we assume that subsamples of size asymptotic to c·n^α are chosen, where 0 < α ≤ 1/2, c > 0 and n is the size of the data set to be split. We consider the complexity of FIND as a process in the rank to be selected, measured by the number of key comparisons required. After normalization we show weak convergence of the complexity to a centered Gaussian process as n → ∞, which depends only on α. The proof relies on a contraction argument for probability distributions on càdlàg functions. We also identify the covariance function of the Gaussian limit process and discuss path and tail properties.


Introduction
The FIND algorithm is a selection algorithm, also called Quickselect, to find an element of given rank ℓ in a set S of data, where the data set S is a subset of finite cardinality |S| of some ordered set. We have ℓ ∈ {1, 2, …, |S|} and assume that the data are distinct. The algorithm was introduced by Hoare [22].
FIND is a one-sided version of the well-known sorting algorithm Quicksort. It works recursively by first choosing one element p ∈ S, called the pivot element, and generating the two subsets S_< and S_>, where S_< := {s ∈ S | s < p} and S_> := {s ∈ S | s > p}. If ℓ = |S_<| + 1 then the pivot element is the element of rank ℓ to be selected and the algorithm stops. Otherwise, if ℓ ≤ |S_<| it is recursively applied to S_<; if ℓ ≥ |S_<| + 2 it is recursively applied to S_>, searching for rank ℓ − |S_<| − 1. This is called the 3-version of FIND. In the 2-version, the set is split into S_≤ := {s ∈ S | s ≤ p} and S_>, and the algorithm always recurses (into S_≤ if ℓ ≤ |S_≤|, otherwise into S_> with rank ℓ − |S_≤|) until it reaches a set of size one. One could also adapt the algorithm to the particular rank ℓ to be selected; this is a different task, and the corresponding literature is reviewed below.
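The recursive partitioning just described can be sketched as follows. This is an illustrative Python model of ours, not the paper's implementation: the function name, the parameters and the comparison accounting are our own choices; the pivot is the median of a random subsample of size roughly c·n^α as in the setting of the paper.

```python
import random

def find(data, rank, alpha=0.5, c=1.0, rng=random, stats=None):
    """3-version of median-of-k FIND: return the element of rank `rank`
    (1-based) among the distinct keys in `data`.  If `stats` is given,
    stats["comparisons"] accumulates the n - k key comparisons of each
    partitioning step (the cost of selecting the median inside the
    subsample is not counted here)."""
    n = len(data)
    if n == 1:
        return data[0]
    # Subsample size k(n) ~ c * n^alpha, forced odd so the median is unique.
    k = min(n, max(1, int(c * n ** alpha)) | 1)
    sample = rng.sample(data, k)
    pivot = sorted(sample)[k // 2]          # median of the subsample
    smaller = [x for x in data if x < pivot]
    larger = [x for x in data if x > pivot]
    if stats is not None:
        stats["comparisons"] += n - k      # subsample keys are already classified
    if rank == len(smaller) + 1:           # pivot has the requested rank
        return pivot
    if rank <= len(smaller):               # recurse into S_<
        return find(smaller, rank, alpha, c, rng, stats)
    # recurse into S_>, searching for rank - |S_<| - 1
    return find(larger, rank - len(smaller) - 1, alpha, c, rng, stats)

rng = random.Random(1)
data = rng.sample(range(10**6), 10_001)
stats = {"comparisons": 0}
print(find(data, 5_001, rng=rng, stats=stats) == sorted(data)[5_000],
      stats["comparisons"])
```

Here `stats["comparisons"]` models the toll term n − k of the recurrence discussed later; the list comprehensions themselves are not instrumented.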
For our subsequent probabilistic analysis we assume that the data are random variables in the unit interval [0, 1], which are independent and identically distributed, all with the uniform distribution on [0, 1]. Note that all our results also hold for any deterministic set of data as long as the subset used to select the pivot element in each step is chosen independently and uniformly from the set of data. In our probabilistic model we also assume that the subset for the pivot selection is chosen independently of the data.
As a measure for the complexity we consider the number of key comparisons required by the respective version of FIND. We denote by X_n^{(2)}(ℓ) and X_n^{(3)}(ℓ) the number of key comparisons required when starting with a set of size n and selecting the element of rank 1 ≤ ℓ ≤ n using the 2-version and the 3-version respectively. Note that the choice of c and α, as well as the particular choice of the median-selection algorithm used to find the pivot element within the subset, are suppressed in the notation. A median of a set can be found in time (i.e., number of key comparisons) linear in the size of the set. It will turn out later that our results are independent of the choice of the median-selection algorithm as long as mild assumptions are satisfied which are shared by standard median-selection algorithms (we could in fact use FIND itself in this step). We denote the number of key comparisons needed to find the pivot as the median of a subset of size k = k(n) by T_n and assume for any p ≥ 1 that we have

‖T_n‖_p = O(k(n)),  (n → ∞),   (1.1)

where ‖X‖_p := E[|X|^p]^{1/p} denotes the L_p-norm of a random variable X for 1 ≤ p < ∞.
The big-O notation as well as other Bachmann-Landau symbols are used here and later on.
Theorem 1.1. Consider the process X^{(2)}_n = (X^{(2)}_n(ℓ))_{1≤ℓ≤n} of the number of key comparisons needed by the 2-version of the median-of-k FIND algorithm with k = k(n) ~ cn^α, where c > 0 and α ∈ (0, 1/2], and condition (1.1) for the pivot selection in the partitioning step. Then we have, as n → ∞, the weak convergence

((X^{(2)}_n(⌊tn⌋ + 1) − 2n)/n^{1−α/2})_{t∈[0,1]} → Z   in (D[0,1], d_sk),

where Z = (Z_t)_{t∈[0,1]} is a centered Gaussian process depending only on α with covariance function specified in Theorem 2.4 below (and where we set by convention X^{(2)}_n(n+1) := X^{(2)}_n(n)).

Our main convergence result for the 3-version is the weak convergence of all finite dimensional marginals, denoted by →_fdd, for the analogously normalized process to the corresponding marginals of the Gaussian process of Theorem 1.1.
Theorem 1.2. Consider the process X^{(3)}_n = (X^{(3)}_n(ℓ))_{1≤ℓ≤n} of the number of key comparisons needed by the 3-version of the median-of-k FIND algorithm with k = k(n) ~ cn^α, where c > 0 and α ∈ (0, 1/2], and condition (1.1) for the pivot selection in the partitioning step. Then we have, as n → ∞, convergence of the finite dimensional marginals,

((X^{(3)}_n(⌊tn⌋ + 1) − 2n)/n^{1−α/2})_{t∈[0,1]} →_fdd Z,

where Z = (Z_t)_{t∈[0,1]} is the centered Gaussian process of Theorem 1.1 (and where we set by convention X^{(3)}_n(n+1) := X^{(3)}_n(n)).
Some additional related results are stated in Corollary 3.6.
As observed by Grübel [20], the worst case over ranks of any version of FIND requires asymptotically at least 2n key comparisons. Moreover, Grübel [20, Theorem 5] notes that n^{−1} X_n(ℓ) → 2 in probability for any median-of-k FIND variant with k = k(n) → ∞ and k = o(n/log n). Hence, the algorithms investigated in the present work are asymptotically optimal with respect to the worst-case behavior. The following theorem gives more precise information.
Theorem 1.3. As n → ∞, with convergence of all moments, we have

(sup_{1≤ℓ≤n} X^{(3)}_n(ℓ) − 2n)/n^{1−α/2} → sup_{t∈[0,1]} Z(t),

where Z(t) is the process of Theorem 1.1. The same result holds for the 2-version.
In the classical case of FIND (by classical we mean with a uniformly chosen pivot element) a process convergence result for the number of key comparisons (as in Theorems 1.1 and 1.2) has been obtained in the seminal paper of Grübel and Rösler [21]. Their limit process Z satisfies the distributional fixed-point equation

Z(t) = 1 + 1_{[0,U)}(t) U Z_0(t/U) + 1_{[U,1]}(t) (1 − U) Z_1((t − U)/(1 − U)),  t ∈ [0,1].   (1.2)

Here, Z_0 and Z_1 have the same distribution as Z, U is uniformly distributed on [0, 1], and Z_0, Z_1, U are independent. In [21] the difference between the 2-version and the 3-version is also discussed: weak convergence in (D[0,1], d_sk) holds for the 2-version, whereas for the 3-version such a convergence does not hold. A similar behavior appears for our FIND algorithm, as reflected in Theorems 1.1 and 1.2.
For the classical FIND, Paulsen [41] studied variances and higher moments in the quantile setting of [21]. Kodaj and Móri [31] investigated rates of convergence for the marginals of the process. Hwang and Tsai [23] considered the case t = 0, i.e. ranks of the form ℓ = o(n), and found (among other things) that here the limit distribution is the Dickman distribution. Note that this is the distribution of Z(0).
With respect to the one-dimensional marginals, Theorems 1.1 and 1.2 reveal that, asymptotically, the first and second order behavior of the considered complexities does not depend on t ∈ [0, 1]. This stands in sharp contrast to the results for classical FIND (and for median-of-k FIND with fixed k > 1, reviewed below), as the distribution of Z(t) in (1.2) depends on t.
Historically, the mathematical analysis of classical FIND was initiated with an average case analysis for fixed ranks by Knuth [30]. Variances were derived in Kirschenhofer and Prodinger [24].
For the mathematical analysis of median-of-k versions of FIND with fixed k not depending on the size of the input we refer to Anderson and Brown [2], Kirschenhofer, Martínez and Prodinger [25] and Grübel [20]. A broad survey, also covering median-of-k analysis, is given in Rösler [45].
A discussion of FIND versions with k = k(n) depending on the size n of the list to be split with respect to the worst-case behavior was given in Grübel [20]. Martínez and Roura [35] give an average case analysis, where optimal choices for the tradeoff between better balanced sublists and the additional cost for the median selection are discussed. Note that another idea to adapt the FIND algorithm is not to choose the median of a subsample, but an element that may depend on the rank searched for, such that the sublist on which the algorithm is recursively called may be small. This is investigated in Martínez, Panario and Viola [36]; see also Knof and Rösler [29].
In various contributions also the number of key exchanges is studied, which, taken together with the number of key comparisons, yields a more realistic measure of complexity. Corresponding limit distributions can be found in Hwang and Tsai [23], Knape and Neininger [26, Section 5], Mahmoud [32, 33] and Dadoun and Neininger [6].
Another model for the rank searched for is to consider a random rank chosen uniformly and independently of the data and the algorithm. So-called grand averages were considered for key comparisons in Mahmoud, Modarres and Smythe [34] and, for a different version of the partitioning stage using two pivot elements, in Wild, Nebel and Mahmoud [51]. For the number of key exchanges under grand averages see [33, 6]. Yet another complexity measure is the worst-case complexity, with the worst case taken over the possible ranks; see Devroye [9].
Tail bounds for the number of key comparisons for the classical FIND were studied in Devroye [7] and Grübel [19].
A fundamentally different cost measure arises when a key comparison is weighted by the number of bit comparisons needed to identify its result.The number of bit comparisons was studied by Vallée et al. [50] and Fill and Nakama [16,17], see also Grabner and Prodinger [18].
The techniques used to show convergence in Section 3 and to construct the limit process Z in Section 2.1 are in the spirit of the contraction method. (We refer to Rösler and Rüschendorf [46] and Neininger and Rüschendorf [38] for an introduction to and survey of the contraction method for univariate and finite-dimensional quantities.) In recent years several general approaches have been developed to show process convergence within the contraction method on different function spaces and in different topologies, see Eickmeyer and Rüschendorf [14], Drmota, Janson and Neininger [12], Knof and Rösler [29], Neininger and Sulzbach [40] and Ragab and Rösler [43], as well as the PhD theses of Knof [28], Ragab [42] and Sulzbach [48].
The construction of the limit process Z that we present in Section 2.1 builds upon ideas of Ragab and Rösler [43]. However, the convergence proof for Theorem 1.1 yields weak convergence in (D[0,1], d_sk), which has to be compared with the convergence of finite dimensional distributions shown for a related problem in [43]. Our approach to convergence is almost entirely based on contraction arguments on the level of the supremum norm of processes, and very little (deformation of time) is needed in addition to align jumps. Besides leading to comparatively strong results, we feel that the technique for convergence developed here is flexible and general enough to be easily applicable to related recursive problems.
A similarly modified version of the Quicksort algorithm chooses the pivot element in each step as the median of a random subsample of size k = k(n) ~ cn^α, with n the size of the list to be split. We conjecture that such a Quicksort algorithm admits a Gaussian limiting distribution for the normalized number of key comparisons. This would be in contrast to the well-known non-Gaussian limiting distribution for classical Quicksort, see [44].
Plan of the paper. The paper is organized as follows. In Section 2.1 the limit process Z is constructed, and in Section 2.2 it is identified as a centered Gaussian process with explicitly given covariance function. Section 3 contains the asymptotic analysis of the complexity of median-of-k FIND, leading to the proofs of Theorems 1.1 and 1.2. The organization of the proofs is outlined at the beginning of Section 3. In the final Section 4 we present properties of the limit process Z. In Subsections 4.2 and 4.3 path properties of Z are discussed; Subsection 4.1 has a characterization and a tail bound for the supremum of the limit process Z. The Appendix is devoted to the proofs of two technical lemmata. The first, Lemma 5.1, allows the transfer of the results for the 2-version in Theorem 1.1 to the 3-version in Theorem 1.2. The second, Lemma 4.3, is needed in the study of the path variation of the limit process Z.

Construction and characterization of the limit process
We first construct and characterize the limit process Z appearing in Theorems 1.1 and 1.2. In this and the following section we fix α ∈ (0, 1/2] and suppress the dependence on α in the notation.

Construction
We consider the rooted complete infinite binary tree, where the root is labeled by the empty word and the left and right children of a node labeled ϑ are labeled by the extended words ϑ0 and ϑ1 respectively. The set of labels is denoted by Θ := ∪_{k=0}^∞ {0,1}^k. The length |ϑ| of the label of a node is identical to the depth of the node in the rooted complete infinite binary tree.
For u ∈ (0, 1) we define linear operators A_u, B_u on D[0,1] as follows. For f ∈ D[0,1], the càdlàg functions A_u(f) and B_u(f) are defined as

A_u(f)(t) := 1_{[0,u)}(t) f(t/u)  and  B_u(f)(t) := 1_{[u,1]}(t) f((t − u)/(1 − u)),  t ∈ [0,1],

respectively. Furthermore, we define the step function sg := 1_{[1/2,1]} − 1_{[0,1/2)}. Hence, sg is a shifted version of the sign function, and it is in D[0,1].
For a given family {N_ϑ | ϑ ∈ Θ} of independent random variables, each with the standard normal distribution, we recursively define a family {Z^ϑ_n | ϑ ∈ Θ, n ∈ N_0} of random variables in (D[0,1], d_sk) as follows: We set Z^ϑ_0 := 0 for all ϑ ∈ Θ. Assume that the Z^ϑ_n are already defined for some n ≥ 0 and all ϑ ∈ Θ. Then for all ϑ ∈ Θ we set (2.1). We have the following asymptotic properties for the Z^ϑ_n:

Lemma 2.1. Let {Z^ϑ_n | ϑ ∈ Θ, n ∈ N_0} be a family as defined in (2.1). Then, for each ϑ ∈ Θ, the sequence (Z^ϑ_n)_{n≥0} converges almost surely uniformly, and in the L_p-norm for all p ∈ N, to a limiting càdlàg process Z^ϑ. For all ϑ ∈ Θ we have, almost surely, (2.2). The family {Z^ϑ | ϑ ∈ Θ} is identically distributed and all moments of the Z^ϑ are finite.
Proof. We first show by induction that, for all ϑ ∈ Θ and all n ∈ N_0, (2.3) holds. Clearly, (2.3) is satisfied for n = 0. Now, as induction hypothesis, assume that (2.3) is true for all ϑ ∈ Θ with n replaced by n − 1. Note that for a random variable with the standard normal distribution we have (2.4). From (2.3), using Markov's inequality, we infer that sup_{m≥n} ‖Z^ϑ_m − Z^ϑ_n‖ → 0 as n → ∞ in probability, and hence sup_{m,p≥n} ‖Z^ϑ_m − Z^ϑ_p‖ → 0 as n → ∞ in probability by a simple application of the triangle inequality. By monotonicity, the latter convergence is almost sure. In other words, for each ϑ ∈ Θ, the sequence (Z^ϑ_n)_{n≥0} is almost surely a Cauchy sequence with respect to the supremum norm ‖·‖. Since (D[0,1], ‖·‖) is complete, there is a limiting random process Z^ϑ such that we have convergence almost surely uniformly.
Since the operators A_{1/2} and B_{1/2} are continuous with respect to the supremum norm, we obtain (2.2) from (2.1) by letting n → ∞. By construction, {Z^ϑ_n | ϑ ∈ Θ} is a family of identically distributed random variables for each n ∈ N_0. Hence, the Z^ϑ are identically distributed. Finally, finiteness of the second moments follows from (2.3) and the triangle inequality for the L_2-norm; similar arguments apply for the higher moments.
Definition 2.2. We write Z := Z^∅, where ∅ denotes the empty word; hence Z is a random process distributed as the Z^ϑ in Lemma 2.1. We call Z the limit process and its distribution the limit distribution. Analogously, we define Z_n := Z^∅_n.
Let M denote the set of probability measures on (D[0,1], d_sk). We define the map T : M → M by (2.5), for µ ∈ M, where L(X_0) = L(X_1) = µ, N has the standard normal distribution and X_0, X_1, N are independent. For 1 ≤ p < ∞, we further denote by M_p ⊂ M the subset of probability measures with finite p-th moment. We have the following characterization of the limit distribution L(Z) of Z: To see that the restriction of T to M_p has a unique fixed point, let (X', Y') be a copy of (X, Y) such that N, (X, Y), (X', Y') are independent and N has the standard normal distribution. Then a calculation similar to (2.4) implies that the restriction of T is a strict contraction and hence has at most one fixed point. This implies the assertion.

Characterization of the limit process
For ϑ ∈ Θ let B_ϑ be the set of real numbers in [0,1] whose binary representation has prefix ϑ. Here, the binary expansion t = 0.t_1t_2… of t ∈ [0,1) is made unique by the convention that we always use expansions such that for all k ∈ N there exists ℓ > k with t_ℓ = 0. Note that we have the decomposition B_ϑ = B_{ϑ0} ∪ B_{ϑ1}. The construction in (2.1) with the N_ϑ there implies representations for Z and Z_n from Definition 2.2, for all t ∈ [0,1] and n ≥ 0. Thus, Z_n is constant on the intervals [i2^{−n}, (i+1)2^{−n}) for i = 0, …, 2^n − 1. The ϑ ∈ Θ with |ϑ| = n we denote in lexicographic order by w_0, w_1, …, w_{2^n−1}. For u, v ∈ [0,1] we denote their binary expansions by u = 0.u_1u_2… and v = 0.v_1v_2…, again with the convention introduced above. Then the length of the longest common prefix of u and v in their binary expansions is denoted by

j(u, v) := max{k ∈ N | u_i = v_i for all 1 ≤ i ≤ k},

with the conventions max ∅ := 0 and max N := ∞.
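The prefix length j(u, v) is easy to compute by peeling off binary digits one at a time. The following helper is our own illustration (not from the paper); it truncates expansions at a fixed number of bits, so j(u, u) = ∞ is reported as the cap `max_bits`.

```python
def prefix_length(u, v, max_bits=64):
    """Length of the longest common prefix of the binary expansions of
    u, v in [0, 1); a return value of max_bits stands for j(u, v) = infinity.
    Doubling and subtracting the leading digit is exact for float inputs."""
    length = 0
    for _ in range(max_bits):
        bu, bv = int(u >= 0.5), int(v >= 0.5)   # next binary digit of u and v
        if bu != bv:
            return length
        length += 1
        u, v = 2 * u - bu, 2 * v - bv           # shift both expansions left
    return length

# 0.25 = 0.0100..., 0.375 = 0.0110...: common prefix "01", so j = 2.
print(prefix_length(0.25, 0.375))
```

This convention automatically avoids expansions ending in all ones, matching the convention above for dyadic points (e.g. 1/2 is expanded as 0.1000…).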
Theorem 2.4. The limit process Z from Definition 2.2 is a centered Gaussian process with càdlàg paths. For its covariance function σ(s,t) we have (2.8), with the convention κ_∞ := 0. Equivalently, (2.9) holds.

Proof. By induction we find that (Z_n)_{n≥0} is a sequence of centered Gaussian processes. Hence, Lemma 2.1 implies that Z is a centered Gaussian process. It remains to compute the covariance function of Z. Comparing the left and right hand sides of equation (2.2) and using that, by construction, N_ϑ, Z^{ϑ0} and Z^{ϑ1} are independent, we find that, for s ≠ t, σ(s,t) = −κ_{j(s,t)}. By the theorem of dominated convergence, the right-continuity of Z and the fact that E[‖Z‖²] < ∞, it follows that, for any s ∈ [0,1], the map t → σ(s,t) is right-continuous. This finishes the proof of (2.8). The equivalence with (2.9) is obvious.
For i ∈ N let D_i := {k2^{−i} | k = 1, …, 2^i − 1} and D := ∪_{i≥1} D_i denote the dyadic rationals in (0,1). Then, as Z is almost surely càdlàg, the previous theorem also implies, for any t ∈ D, the jump distribution (2.10), where i is minimal with t ∈ D_i. Here and subsequently, N(µ, σ²) denotes the normal distribution with mean µ and variance σ².

Corollary 2.5. Almost surely, Z is continuous at t for all t ∉ D. On the contrary, for any t ∈ D, almost surely, Z is not continuous at t.
Proof. Let A be a set of measure one such that Z_n → Z uniformly on A. As Z_n is continuous at t for all n if t ∉ D, it follows that Z is continuous at t for all t ∉ D on A, thus almost surely. For t ∈ D, discontinuity follows immediately from (2.10).
More refined path properties are discussed in Sections 4.2 and 4.3. Simulations of realizations of Z_10 for α = 1/2 are presented below in Figure 1 to indicate the structure of the paths of the limit process Z.

Analysis of the Quickselect process
Our asymptotic analysis to prove the functional limit laws for the processes in Theorems 1.1 and 1.2 is organized as follows. In Section 3.1 we state the recurrence relation on which the whole analysis is based. To apply ideas from the contraction method we need to derive a distributional fixed-point equation for a potential limit of the normalized processes, as captured by the map T in (2.5). For this, in Section 3.2 first the asymptotic behavior of the size I_n of S_≤ is identified. Then in Section 3.3 a recurrence for the normalized processes appearing in Theorem 1.1 is derived. The random quantities are all embedded on one probability space and coupled in such a way that distances can be bounded pointwise (with respect to the randomness ω) in the supremum norm on D[0,1]. We keep the jumps of a couple of auxiliary processes exactly aligned with those of Y_n in order to be able to bound distances by contraction arguments. The necessary deformations in time to align with the jumps of the limit process Z are afterwards done in Proposition 3.5.

Preliminaries
Our analysis is based on a recurrence for the distributions of the processes X^{(i)}_n = (X^{(i)}_n(ℓ))_{1≤ℓ≤n}, i ∈ {2, 3}. Note that after the selection of the median from the subset, the k elements of the subset can already be assigned to the sets S_<, S_> and S_≤ respectively, so that only the n − k remaining elements need to be compared with the pivot element. We denote the rank of the pivot element chosen in the first step by I_n. We set the recurrence (3.1), where T_n, I_n, X̄^{(3)}_0, …, X̄^{(3)}_{n−1} are independent and X̄^{(3)}_j is distributed as X^{(3)}_j for 0 ≤ j ≤ n − 1. The stated independence is satisfied since in subsequent partitioning steps all choices of subsets are made independently. For the 2-version we have the same initial values as for the 3-version and, for all n ≥ 2, an analogous recurrence with conditions on independence and identical distributions as for the 3-version in (3.1).
Recall that T n is the number of key comparisons for the identification of the median within the random subset and that we assume condition (1.1).
We choose n 0 large enough such that k(n) ≥ 3 for all n ≥ n 0 .This ensures that I n < n for all n ≥ n 0 .

Asymptotics for the pivot and sublist sizes
For simplicity of presentation, we assume c = 1, i.e. k = k(n) ~ n^α with α ∈ (0, 1/2], for the remainder of the section. The elements of the presample of size k are chosen without replacement; thus the distribution of I_n is given by the mixed binomial representation

I_n = (k+1)/2 + Bin(n − k, M_n)  conditionally on M_n,

where, here and subsequently, for n ∈ N and p ∈ [0,1], Bin(n, p) denotes a random variable with the binomial distribution for n trials with success probability p. Moreover, for α, β > 0, Beta(α, β) denotes a random variable with the beta distribution with parameters α, β. Subsequently, let (M_n)_{n≥1} be a sequence of random variables with the beta distribution with parameters ((k+1)/2, (k+1)/2).

Lemma 3.1. We have E[I_n] = (n+1)/2 and Var(I_n) = (n − k)(n + 1)/(4(k + 2)), and, for n → ∞, n^{α/2}(M_n − 1/2) → N(0, 1/4) in distribution.

Proof. The expressions for mean and variance follow by straightforward calculations.
For the limit theorem note that for the beta and binomial distributions we have the identity P(Beta(a, b) ≤ x) = P(Bin(a + b − 1, x) ≥ a) for all a, b ∈ N and x ∈ (0, 1). Applying this to M_n and using the central limit theorem, e.g. in the version of de Moivre–Laplace, implies the assertion.
Lemma 3.2. For the size I_n of the left sublist generated in the first partitioning step we have, as n → ∞, (2I_n − n)/n^{1−α/2} → N(0, 1) in distribution.

Proof. The first two moments follow from Lemma 3.1. Given M_n, let X_n have the binomial distribution with parameters n − k and M_n, and set I_n = (k+1)/2 + X_n. By Skorokhod's representation theorem, we may assume the existence of a sequence (F_n), where F_n has the distribution of n^{α/2}(M_n − 1/2), such that F_n → N almost surely, where N has the normal N(0, 1/4) distribution. Let M'_n := F_n n^{−α/2} + 1/2 and construct X'_n and I'_n in the same way as X_n and I_n, but based on M'_n. A decomposition into three summands yields the following: By construction, the second summand tends to N almost surely. Moreover, the third summand tends to zero almost surely. By conditioning on (M'_n) and using the fact that M'_n → 1/2 almost surely, the first factor of the first summand converges to a standard normal distribution by the central limit theorem for sums of independent and uniformly bounded random variables. As the second factor of the first summand tends to zero almost surely, the first summand converges to zero in probability. This shows the assertion.
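The mixed binomial representation used in the proof (given M_n, I_n = (k+1)/2 + Bin(n − k, M_n)) makes the lemma easy to check by simulation. The sketch below is our own illustration; the rounding of k(n) to an odd integer is an assumption for the purpose of the example.

```python
import random, statistics

def sample_pivot_rank(n, alpha=0.5, rng=random):
    """Sample the rank I_n of the median of a uniform subsample of size
    k ~ n^alpha, via the mixed binomial representation: given
    M_n ~ Beta((k+1)/2, (k+1)/2), I_n = (k+1)/2 + Bin(n - k, M_n)."""
    k = max(3, int(round(n ** alpha)) | 1)       # odd subsample size >= 3
    m = rng.betavariate((k + 1) / 2, (k + 1) / 2)
    x = sum(rng.random() < m for _ in range(n - k))  # Bin(n - k, M_n)
    return (k + 1) // 2 + x

# Lemma 3.2: (2 I_n - n) / n^(1 - alpha/2) is asymptotically standard normal.
rng = random.Random(42)
n, alpha = 5_000, 0.5
vals = [(2 * sample_pivot_rank(n, alpha, rng) - n) / n ** (1 - alpha / 2)
        for _ in range(200)]
print(round(statistics.mean(vals), 2), round(statistics.stdev(vals), 2))
```

For moderate n the empirical mean is close to 0 and the empirical standard deviation close to 1, in line with the lemma.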
More refined information about the distribution of M n is given in the Appendix.

Proof of Theorems 1.1 and 1.2
We first discuss the 2-version of the process and recall the normalization from Theorem 1.1, which we denote by Y_0 := 0 and

Y_n(t) := (X^{(2)}_n(⌊tn⌋ + 1) − 2n)/n^{1−α/2},  t ∈ [0,1],

with the convention X^{(2)}_n(n+1) := X^{(2)}_n(n). The recurrence (3.1) induces a corresponding recurrence for (Y_n) on (D[0,1], d_sk) with conditions on independence and distributional copies as in (3.1). Now, we embed all the relevant random variables on one probability space such that we have appropriate almost sure convergences. Throughout, we use boldface characters to denote the embedded quantities. To be specific, by Skorokhod's representation theorem and Lemma 3.2, we can construct a set of independent and identically distributed random variables {(S^ϑ_n)_{n≥n_0}, N^ϑ | ϑ ∈ Θ} such that N^ϑ has the standard normal distribution, S^ϑ_n has the distribution of (2I_n − n)/n^{1−α/2} and S^ϑ_n → N^ϑ almost surely.
Moreover, by Lemma 3.2, we have the corresponding moment convergences. We can further augment this set of random variables by another set {T^ϑ_n | n ≥ n_0, ϑ ∈ Θ} of independent random variables, independent of the family {(S^ϑ_n)_{n≥n_0}, N^ϑ | ϑ ∈ Θ} defined above. For n ≥ n_0, we define the processes Y^ϑ_n recursively. By construction, we have L(Y^ϑ_n) = L(Y_n) for all n ∈ N, since the sequences (Y^ϑ_n)_{n≥0} and (Y_n)_{n≥0} satisfy the same distributional recurrence and have the same initial distributions for i = 0, …, n_0 − 1. Subsequently, we use the families {Z^ϑ_n | n ∈ N_0, ϑ ∈ Θ} and {Z^ϑ | ϑ ∈ Θ} as defined in (2.1) and Lemma 2.1, where the construction is executed using the particular set of random variables {N^ϑ | ϑ ∈ Θ}. We denote the resulting random variables by Z^ϑ_n, n ∈ N, ϑ ∈ Θ, and Z^ϑ, ϑ ∈ Θ. To start bounding distances between Y_n and Z_n we use two intermediate sequences of stochastic processes, Q^ϑ_n and R^ϑ_n, in (D[0,1], d_sk). First, let Q^ϑ_i := 0 for all ϑ ∈ Θ, i < n_0, and define Q^ϑ_n recursively for all n ≥ n_0. The proof of the functional limit law in Theorem 1.1 is organized by splitting the difference between Y_n and Z into several intermediate differences involving the terms defined above. As in Definition 2.2 we use the abbreviations Q_n := Q^∅_n, R_n := R^∅_n and Y_n := Y^∅_n; the convergences stated below hold together with convergence of all moments. The same is true for the 3-version X^{(3)}_n.
The rest of this section contains the proofs of our statements.
Proof of Proposition 3.3. By construction, we have a decomposition of Y_n − Q_n. We now take the supremum over t ∈ [0,1] and then the expectation on both sides. Then, by construction, the summands in lines (3.6) and (3.7) vanish as n → ∞. Using the Cauchy–Schwarz inequality for the product in (3.8), and furthermore ‖A_u‖ = ‖B_u‖ = 1, we obtain altogether a bound with error terms ε_n, ε'_n → 0. Now, the arguments to infer E[‖Y_n − Q_n‖²] → 0 are standard in the framework of the contraction method. In a first step, one shows that the sequence (∆_n) is bounded. To this end, assume that ∆_m ≤ C for all m < n with C ≥ 1. Then, the last display implies a corresponding bound for ∆_n. Moreover, we can assume n to be large enough to satisfy P( ≤ J_n ≤ n − ) ≥ 1 − δ.
Proof of Proposition 3.4. By definition, we have a decomposition. Let ε_n be the second moment of the first summand in the latter display. By construction, we have ‖R_n‖ ≤ ‖Z_n‖ for all n ∈ N. Thus, Lemma 2.1 implies that the sequence (E[‖R_n‖²]) is bounded. Using the Cauchy–Schwarz inequality, we infer that ε_n → 0 as n → ∞. Yet another application of the Cauchy–Schwarz inequality yields the remaining bound. The result now follows by an argument similar to the proof of Proposition 3.3.
Proof of Proposition 3.5. Let ε > 0. By Lemma 2.1 there exists an n_1 ∈ N with the required approximation property. Let n ≥ n_1. Applying the recurrence (3.5) for R_n iteratively n_1 times, we obtain a representation of R_n with at most 2^{n_1} summands. Each summand corresponds to one of the 2^{n_1} sublists (some possibly being empty) generated by the algorithm in the first n_1 recursive steps. Let A_n denote the event that each of these 2^{n_1} sublists has size at least n_0. On A_n the split into these first 2^{n_1} sublists causes 2^{n_1} − 1 points of discontinuity of R_n, which we denote by 0 < T^1_n < … < T^{2^{n_1}−1}_n < 1. R_n has additional points of discontinuity caused by splits when further unfolding the recurrence (3.5). Moreover, we denote the points of discontinuity of Z_{n_1} by τ^k_n = k/2^{n_1} for k = 1, …, 2^{n_1} − 1.
By Lemma 3.2 we have J^ϑ_n/n → 1/2 for each ϑ ∈ Θ almost surely, hence (3.10). To bound the Skorokhod distance between R_n and Z_{n_1} we define a deformation of time as follows: On A_n let λ_n : [0,1] → [0,1] be defined by λ_n(0) := 0, λ_n(1) := 1, λ_n(τ^k_n) := T^k_n for k = 1, …, 2^{n_1} − 1, and linear between these points. Then, with id the identity t → t on [0,1], we have on the event in (3.10) that ‖λ_n − id‖ < ε. This implies a bound for all n ≥ n_1. To see this, note that on the event on the left hand side the corresponding estimate holds. Thus, for all n sufficiently large, P(d_sk(R_n, Z_{n_1}) > ε) can be made arbitrarily small.

Proof of Theorem 1.3. Distributional convergence for the 2-version follows directly from Theorem 1.1. The proof of Theorem 1.1 has also revealed that ‖Z‖ has finite moments of all orders and that the sequences (‖Y_n‖)_{n≥1} and (‖Q_n‖)_{n≥1} are both bounded in L_p for any 1 ≤ p < ∞. This shows the claim of Theorem 1.3 for the 2-version. An alternative approach, which works for both the 2-version and the 3-version, relies on the contraction method for max-type recurrences. This is based on a distributional recurrence for W_n := sup_{1≤ℓ≤n} X^{(3)}_n(ℓ), where (W'_n)_{n≥0} is an independent copy of (W_n)_{n≥1}, both independent of (I_n, T_n). The latter recurrence allows us to deduce Theorem 1.3 straightforwardly from Theorem 4.6 in [47], together with the characterization of Z given in Corollary 4.1.
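The piecewise linear time change λ_n used in the proof of Proposition 3.5 is simple to realize. The helper below is our own sketch (names are illustrative): it interpolates λ(τ_k) = T_k and reports the deviation sup_t |λ(t) − t|, the quantity controlled in (3.10).

```python
def time_change(taus, ts):
    """Piecewise linear bijection lambda of [0, 1] with lambda(0) = 0,
    lambda(1) = 1 and lambda(taus[i]) = ts[i]; both node lists must be
    strictly increasing within (0, 1)."""
    xs = [0.0] + list(taus) + [1.0]
    ys = [0.0] + list(ts) + [1.0]

    def lam(t):
        # linear interpolation on the segment containing t
        for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
            if t <= x1:
                return y0 + (t - x0) * (y1 - y0) / (x1 - x0)
        return 1.0
    return lam

# Jumps of Z_{n_1} at dyadic points mapped onto slightly perturbed jump
# locations of R_n; for piecewise linear lambda the sup of |lambda - id|
# is attained at the nodes.
lam = time_change([0.25, 0.5, 0.75], [0.3, 0.45, 0.8])
dev = max(abs(lam(t) - t) for t in (0.0, 0.25, 0.5, 0.75, 1.0))
print(dev)
```

If all jump locations of R_n are within ε of the corresponding dyadic points, this deviation is below ε, which is exactly how the Skorokhod distance is bounded in the proof.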

Further properties of the limit process
In this section we first study the supremum of the limit process and derive tail bounds. Then path properties of the limit process Z are investigated: first, the variation of the limit process Z is studied; then we endow the unit interval with an alternative metric d_κ such that Z has continuous paths with respect to d_κ. This allows us to study the modulus of continuity and Hölder continuity properties. In Sections 4.1 and 4.3 we make use of general results about path continuity and the supremum of Gaussian processes, see, e.g., Adler's book [1], and of the explicit construction of the limit process.

The supremum of the limit process
Let S^ϑ_n := sup_{t∈[0,1]} Z^ϑ_n(t) and S^ϑ := sup_{t∈[0,1]} Z^ϑ(t). By the uniform convergence stated in Lemma 2.1 we have S^ϑ_n → S^ϑ almost surely. The first result concerns a max-type recurrence for S_n and characterizes the distribution of S as the solution of a stochastic fixed-point equation. To this end, let M(R) denote the set of probability measures on the real line, and let T* : M(R) → M(R) be defined, for µ ∈ M(R), as follows, where L(X_0) = L(X_1) = µ, N has the standard normal distribution, X_0, X_1, N are independent, and κ = 2^{α−2} (as above).
Corollary 4.1. Let ϑ ∈ Θ. We have the stated recurrence and almost sure identity. The distribution of S^ϑ is the unique fixed point of the restriction of T* to M_p(R) for any p > p_α, with p_α given in (2.6).

Proof. The recurrence for S^ϑ_n and the almost sure identity for S^ϑ follow by construction and Lemma 2.1. The characterization of L(S^ϑ) is a special case of Theorem 3.4 in [39].

It is a well-known phenomenon that the supremum of a Gaussian process resembles a Gaussian random variable. This explains the following proposition.

Proposition 4.2. For the supremum S = sup_{t∈[0,1]} Z(t) of the limit process Z from Definition 2.2 we have, for any t > 0, the tail bound (4.2). The same tail bounds are valid when S is replaced by S_n = sup_{t∈[0,1]} Z_n(t) for any n ∈ N.
The constant in the exponent on the right hand side of (4.2) is asymptotically optimal as t → ∞. Moreover, we have corresponding bounds for the mean E[S], since there the assumption of path continuity can be relaxed to regularity. Both results also apply to S_n for n ∈ N.
For the lower bound on E[S] note that there is a t_0 ∈ [0,1] at which the relevant terms are maximized, which gives the lower bound. For the upper bound on E[S] we take squares and expectations on the left and right hand sides of (4.1). This implies E[S²] ≤ 2/(1 − 2κ), and we obtain the bound from E[S] ≤ (E[S²])^{1/2}.

Variation of paths
We have already seen that the constant p_α defined in (2.6) is intimately linked to the limit process Z. In this section we will see that this connection extends to path properties of Z, more precisely to its path variation. To formalize the main results of the section we need some notation. For t ∈ (0,1], let Π(t) be the set of all finite decompositions of the interval [0,t]. Elements π ∈ Π(t) we write as π = {τ_1, τ_2, …, τ_k} with 0 = τ_1 < τ_2 < … < τ_k = t. We denote by |π| = k the size of π. Moreover, we abbreviate mesh(π) := max_{i=1,…,|π|−1} |τ_{i+1} − τ_i|. For a càdlàg function f and p > 0, t ∈ (0,1], we define

V_{p,t}(f) := sup_{π∈Π(t)} Σ_{i=1}^{|π|−1} |f(τ_{i+1}) − f(τ_i)|^p,

where V_p(f) := V_{p,1}(f). Let N_f be the set of discontinuity points of f. Then, we set [f]^{(p)}_t as the limit of the sums Σ_{i=1}^{|π|−1} |f(τ_{i+1}) − f(τ_i)|^p along decompositions π ∈ Π(t) with mesh(π) → 0, if the limit exists in R⁺_0 ∪ {∞}. The càdlàg property of f implies that, for any t ∈ (0,1], the set N_f ∩ [0,t] is countable, so that W_{p,t}(f) := Σ_{s∈N_f, s≤t} |Δf(s)|^p is well defined. The following lemma is well known in the case p = 1, q = 2; we did not find a proof for the general case in the literature and thus include one in the Appendix.

Lemma 4.3. Let f ∈ D[0,1], p > 0 and V_p(f) < ∞. Then, for any q > p, we have [f]^{(q)}_t = W_{q,t}(f). Additionally, the map t → [f]^{(q)}_t is nondecreasing.

The following theorem is the main result of this section. Recall the definition of p_α in (2.6) and γ = (4 − 2κ)/(1 − κ).
where the convergence in (4.3) with f = Z also holds with respect to all moments.
For the mean, we have a formula involving the constant γ. ii) Almost surely, for any t ∈ (0,1], W_{p_α,t}(Z) = ∞. The proof of the theorem makes use of a simple yet useful tool, well known, e.g., from Lévy's construction of Brownian motion.
Proof. We have a summable bound on the relevant probabilities; the Borel–Cantelli lemma implies the assertion.
Proof of Theorem 4.4. The main part of claim i) follows immediately from Lemma 4.3 upon establishing V_p(Z) < ∞ for p > p_α almost surely. To prove this, let A be a set of measure one and K = K(ω) for ω ∈ A such that the statement of Lemma 4.5 is satisfied with c = 2 there. Let π ∈ Π(1). Then, for fixed ω ∈ A, … We will show that both terms on the right-hand side can be bounded from above independently of the partition π. This shows the claim V_p(Z) < ∞. The first summand is easier. There are at most 2^ℓ pairs (τ_i, τ_{i+1}) such that j(τ_i, τ_{i+1}) = ℓ. Thus, … where we have abbreviated … Since 2κ^{p/2} < 1, the right-hand side of the latter display is finite. Combining the latter display and (4.8), we obtain the desired upper bound for (4.7). For the convergence of moments let m ∈ N. Then, for π ∈ Π(1), we have … The result follows as the last bound does not depend on π.
Regarding the mean of the p-variation, abbreviating D_0 = ∅, we have … which finishes the proof of i).
We move on to the proof of ii). Due to (4.4) it is sufficient to show that, for any t ∈ (0, 1], we have W_{p_α,t} = ∞ almost surely. Again, we restrict our presentation to the case t = 1. As a warm-up we first investigate the case p < p_α. Let X_n = Σ_{t∈D_n} |∆Z(t)|^p and X′_n = Σ_{t∈D_n\D_{n−1}} |∆Z(t)|^p. Then X′_n ≤ X_n ↑ Σ_{t∈D} |∆Z(t)|^p almost surely. The assertion Σ_{t∈D} |∆Z(t)|^p = ∞ almost surely now follows easily from Chebyshev's inequality and the facts that, as n → ∞, … Here, we have used that the random variables ∆Z(t), t ∈ D_n \ D_{n−1}, are independent. Note that this does not extend to all t ∈ D_n. The situation is more involved for p = p_α.
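The Chebyshev step alluded to here is the usual second-moment method; schematically (writing X′_n for the sums over D_n \ D_{n−1} as above, and assuming that the two displayed facts provide E[X′_n] → ∞ and Var(X′_n)/E[X′_n]^2 → 0):

```latex
% Second-moment (Chebyshev) sketch of the divergence argument.
\[
  \mathbb{P}\Big(X_n' \le \tfrac12\,\mathbb{E}[X_n']\Big)
  \;\le\;
  \mathbb{P}\Big(\big|X_n' - \mathbb{E}[X_n']\big| \ge \tfrac12\,\mathbb{E}[X_n']\Big)
  \;\le\; \frac{4\,\mathrm{Var}(X_n')}{\mathbb{E}[X_n']^2},
\]
% so X_n' -> infinity in probability; since X_n' <= X_n and X_n increases
% to the total jump sum, that sum must be infinite almost surely.
```

The independence of the jumps within one level D_n \ D_{n−1} is what makes the variance tractable here.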
Here, the sequence (E[X′_n]) is constant, which implies … where the right-hand side does not depend on n. For t ∈ D_i \ D_{i−1} and j > i, ∆Z(t) is independent of ∆Z(s) for all s ∈ D_j \ D_{j−1} except for its direct neighbors. Thus, we have … The assertion follows.

Binary topology and path continuity
Regarding path continuity of a Gaussian process X on the unit interval, the canonical choice of a metric is given by d(s, t) := (E[(X(t) − X(s))^2])^{1/2} for s, t ∈ [0, 1]. In our case, that is X = Z, identifying [0, 1] with {0, 1}^N via the binary representations, d induces the product topology on {0, 1}^N. A sequence (x^{(n)})_{n≥1} converges to x with respect to d if and only if x^{(n)} → x in the usual sense, where either x ∉ D, or x ∈ D and additionally x^{(n)} ≥ x for almost all n. The limit process Z as well as its p-variation for p > p_α are almost surely continuous with respect to d.
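The role of the binary expansions can be made concrete with a small sketch (illustrative only; the convention of taking the expansion ending in zeros at dyadic points is an assumption chosen to match the càdlàg identification). It shows why sequences must approach a dyadic point from the right to converge in the product topology: the left-sided approximants differ from the dyadic point already in an early digit.

```python
def bits(x, n):
    """First n binary digits of x in [0, 1), using the expansion that
    terminates in zeros at dyadic points (assumed convention)."""
    out = []
    for _ in range(n):
        x *= 2
        b = int(x)
        out.append(b)
        x -= b
    return out

def first_diff(s, t, n=40):
    """Index (1-based) of the first differing binary digit, or n + 1."""
    for i, (a, b) in enumerate(zip(bits(s, n), bits(t, n)), start=1):
        if a != b:
            return i
    return n + 1

# Approaching t = 1/2 from the right stays close in the product topology,
# while approaching from the left does not: 1/2 - eps starts 0.0111...,
# which differs from 1/2 = 0.1000... already in the first digit.
print(first_diff(0.5, 0.5 + 2 ** -20))  # 20: many shared leading digits
print(first_diff(0.5, 0.5 - 2 ** -20))  # 1: differs in the first digit
```

In the induced metric, points sharing a long common binary prefix are close; this matches the right-continuity of the paths of Z at its jump points.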
where the lim sup is taken over sequences h ↓ 0 with h = κ^n for some n ∈ N.
Proof. We start with the upper bound. First, let ε > 0 and K_1 be large enough such that … Lower bounds follow analogously to the case of Brownian motion.
By construction, for fixed n ∈ N, the family of events … This yields the assertion upon choosing h = κ^n for n sufficiently large (and random).
Moduli of continuity of the order …

Proof. The result for β < 1/2 follows immediately from the upper bound on the modulus of continuity. Thus, we consider the case β > 1/2. We only treat the interval [0, 1), the proof for t = 1 being easier. We adapt the proof of the corresponding statement for Brownian motion from [37], Theorem 1.30. As explained there, it is sufficient to show that, for any M > 0, the event … is a null event. We fix an integer L > 4 whose precise value will be specified later. For any n > 3L let … Assume that ω ∈ A and (t_0, ε_0) = (t_0(ω), ε_0(ω)) satisfies the statement in the event A.
Then, if k_n(t_0) ∈ R_n and n > 3L is large enough such that κ^{n−L} < ε_0, … for all 1 ≤ i ≤ L. Hence, ω ∈ S_{n,k_n(t_0)}. As k_n(t_0) ∈ R_n for infinitely many n, we can deduce that also ω ∈ S_n for infinitely many n, that is, A ⊆ lim sup S_n. We finish the proof by showing that P(lim sup S_n) = 0. For k ∈ R_n, we have … As the density of |N| is bounded by 2, we have P(S_{n,k}) ≤ (4M κ^{−L(β−1/2)} γ^{−1/2})^L κ^{nL(β−1/2)}.
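The final counting step is then the standard union bound plus Borel–Cantelli; schematically (the growth assumption |R_n| = O(κ^{−n}) is a placeholder for the paper's actual bookkeeping, not a claim from the text):

```latex
% Sketch of the step closing the proof; |R_n| = O(kappa^{-n}) is an
% assumed placeholder for the counting of admissible indices.
\[
  \mathbb{P}(S_n) \;\le\; \sum_{k \in R_n} \mathbb{P}(S_{n,k})
  \;\le\; C\,\kappa^{-n}
     \big(4M\kappa^{-L(\beta-1/2)}\gamma^{-1/2}\big)^{L}\,
     \kappa^{nL(\beta-1/2)},
\]
% which is summable in n as soon as L(beta - 1/2) > 1.  Choosing the
% integer L this large, the Borel--Cantelli lemma yields
% P(limsup_n S_n) = 0, so A is a null event.
```

This is the same scheme as in the Brownian case of [37], Theorem 1.30, where L is fixed large depending on β.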
The regularity of t → [f]^{(q)}_t and the characterization of its jumps follow immediately.
… in probability by Propositions 3.4 and 3.5. As Z is almost surely continuous at t, it follows that Y_n(t_n) → Z(t) in probability. Based on the uniform boundedness of the sequence (E[‖Y_n‖^2]), a simple induction relying on its recursive definition shows that sup_{n≥1} E[‖Y_n‖^m] < ∞ for all m ∈ N. This implies the result for the 2-version. The statement about the 3-version follows from this and Lemma 5.1.

(4.10)
Note again that d_κ and d depend on α via κ. Finally, working with d or, more generally, changing the base in (4.10) to any value lower than one will only affect absolute constants in the following results. The additive construction of Z somewhat resembles Lévy's construction of Brownian motion, which guides both intuition and proofs in the remainder of this section.
4.6 (Modulus of continuity). With γ as in (4.6) we have, almost surely, … An upper bound for the right-hand side in the latter display is … For any β < 1/2, almost surely, the paths of Z are Hölder continuous with exponent β with respect to d_κ. For any β > 1/2, almost surely, the paths of Z are nowhere pointwise Hölder continuous with exponent β with respect to d_κ.
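For intuition, the Brownian analogue of such a modulus-of-continuity statement is easy to probe numerically (an illustration of the guiding analogy only; the constant γ and the metric d_κ for Z itself differ). Lévy's modulus says sup_{|t−s|≤h} |B(t) − B(s)| behaves like √(2h log(1/h)) as h → 0, and a discretized path already shows a ratio of order one.

```python
import math
import random

# Simulate a Brownian path on a fine grid and compare its largest
# increment over windows of width h with the Levy modulus sqrt(2h log(1/h)).
random.seed(1)
n = 2 ** 15                      # grid size
dt = 1.0 / n
b = [0.0]
for _ in range(n):
    b.append(b[-1] + random.gauss(0.0, math.sqrt(dt)))

def osc(h_steps):
    """Largest increment over windows of h_steps grid points."""
    return max(abs(b[i + h_steps] - b[i]) for i in range(n - h_steps + 1))

h = 2 ** -10                     # window width
ratio = osc(int(h * n)) / math.sqrt(2 * h * math.log(1.0 / h))
# ratio is of order 1 for small h, in line with Levy's modulus
print(ratio)
```

For the process Z the statement above plays the same structural role, with h restricted to the scales h = κ^n dictated by the additive construction.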