Non asymptotic distributional bounds for the Dickman Approximation of the running time of the Quickselect algorithm

Given a non-negative random variable $W$ and $\theta>0$, let the generalized Dickman transformation map the distribution of $W$ to that of $$ W^*=_d U^{1/\theta}(W+1), $$ where $U \sim {\cal U}[0,1]$ is a uniformly distributed variable on the unit interval, independent of $W$, and where $=_d$ denotes equality in distribution. It is well known that $W^*$ and $W$ are equal in distribution if and only if $W$ has the generalized Dickman distribution ${\cal D}_\theta$. We demonstrate that the Wasserstein distance $d_1$ between $W$, a non-negative random variable with finite mean, and $D_\theta$ having distribution ${\cal D}_\theta$ obeys the inequality $$ d_1(W,D_\theta) \le (1+\theta)d_1(W,W^*). $$ The specialization of this bound to the case $\theta=1$ and coupling constructions yield $$ d_1(W_{n,1},D) \le \frac{8\log (n/2)+10}{n} \quad \mbox{for all $n \ge 1$, where} \quad W_{n,1}=\frac{1}{n}C_{n,1}-1, $$ and $C_{n,m}$ is the number of comparisons made by the Quickselect algorithm to find the $m^{th}$ smallest element of a list of $n$ distinct numbers. A similar bound holds for $m \ge 2$, and together these bounds recover the results of [12] that show distributional convergence of $W_{n,m}$ to the standard Dickman distribution in the asymptotic regime $m=o(n)$. By developing an exact expression for the expected running time $E[C_{n,m}]$, lower bounds are provided that show the rate is not improvable for all $m \not = 2$.


Introduction
For a given non-negative random variable $W$ and $\theta > 0$, let the generalized Dickman transformation map the distribution of $W$ to that of $$ W^*=_d U^{1/\theta}(W+1), \tag{1} $$ where $U$ has the uniform distribution ${\cal U}[0,1]$ on the unit interval and is independent of $W$, and where $=_d$ denotes equality in distribution. It is well known [6], [15] that the generalized Dickman distribution ${\cal D}_\theta$ is the unique fixed point of the transformation (1), that is, $$ W^* =_d W \quad \mbox{if and only if} \quad W \sim {\cal D}_\theta. \tag{2} $$ When (1) holds we will say that $W^*$ has the ${\cal D}_\theta$-bias distribution of $W$. In what follows, $D_\theta$ will denote a random variable with distribution ${\cal D}_\theta$. The case $\theta = 1$ corresponds to the (standard) Dickman distribution, for which we may drop the subscript $\theta$. The Dickman function $\rho$ first made its appearance in number theory [7] when counting the number of integers below a fixed threshold whose prime factors satisfy some given upper bound. Standardizing $\rho$ yields the density of the standard Dickman distribution, the canonical member of the family ${\cal D}_\theta$, $\theta > 0$, of generalized Dickman distributions, which also arise in the study of component counts of logarithmic combinatorial structures such as permutations and partitions [1], and more generally for the quasi-logarithmic class considered in [3]. See also the recent work [17], [2] and [4] in this area, which details some connections to probabilistic number theory.
Members from the generalized Dickman family have subsequently been noted to arise in a variety of other contexts, in particular for the sum of edge lengths of vertices connected to the origin in minimal directed spanning trees in [15], and for weighted sums of independent random variables in [16], [2] and [4]. Simulation of the Dickman distribution has been considered in [6].
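As an illustration of the fixed point property, iterating the relation $W =_d U^{1/\theta}(W+1)$ expresses $D_\theta$ as the series $\sum_{k \ge 1} \prod_{j \le k} U_j^{1/\theta}$ with $U_1, U_2, \ldots$ i.i.d. uniform on $[0,1]$. The following is a minimal sketch of an approximate sampler based on truncating that series; it is not the simulation method of [6]. The function name `sample_dickman` and the truncation tolerance `tol` are hypothetical choices made only for this example.

```python
import random


def sample_dickman(theta=1.0, tol=1e-12, rng=random):
    """Approximate draw from D_theta via the truncated series
    sum_{k>=1} prod_{j<=k} U_j**(1/theta).

    Illustrative sketch: the series is cut off once the running product
    drops below `tol`, so the result is slightly biased downward.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    total = 0.0
    prod = 1.0
    while prod > tol:
        # Multiply in one more factor U_k**(1/theta) and add the new term.
        prod *= rng.random() ** (1.0 / theta)
        total += prod
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_dickman(1.0, rng=rng) for _ in range(10000)]
    # Taking expectations in W =_d U**(1/theta) * (W + 1) gives E[D_theta] = theta.
    print(sum(draws) / len(draws))
```

Since $E[U^{1/\theta}] = \theta/(1+\theta)$, taking expectations on both sides of the fixed point relation gives $E[D_\theta] = \theta$, which provides a quick sanity check on the sample mean.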
Here we study the error incurred when using the standard Dickman distribution to approximate that of the (properly normalized) number of comparisons made by the Quickselect algorithm of Hoare [11] for locating the $m$th smallest element of a list of $n$ distinct numbers. One may visualize how Quickselect works in terms of a tree structure. First, a 'pivot' is chosen uniformly from the given list. The list is then divided into those numbers on the list that are strictly smaller, making up the left subtree, and those that are strictly larger, making up the right. If the left subtree is of size $m-1$ then the pivot is the desired $m$th smallest element, and the procedure terminates. Otherwise, the process continues recursively on the left subtree if it is of size $m$ or larger, and else on the right subtree. Letting $$ W_{n,m}=\frac{1}{n}C_{n,m}-1, \tag{3} $$ where $C_{n,m}$ is the number of comparisons made by Quickselect, the work of [12] showed that $W_{n,m}$ converges in distribution to the Dickman $D$ when $m = o(n)$. We note that in the case $m = 1$ Quickselect simplifies in that at each step of the recursion the procedure either stops or continues on the left subtree. As this case is simpler than for $m \ge 2$ we deal with it separately.
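The description of Quickselect above can be sketched in code as follows. This is a minimal illustration, not an optimized implementation: the names `quickselect_comparisons` and `normalized_cost` are hypothetical, and each stage on a list of size $k$ is charged $k-1$ comparisons, matching the counting convention of the text rather than the number of Python comparisons actually executed. The normalization $C/n - 1$ follows the form given in the abstract.

```python
import random


def quickselect_comparisons(items, m, rng=random):
    """Return (m-th smallest element, comparison count) for distinct items.

    Each stage on a list of size k is charged k - 1 comparisons
    (the pivot against every other element).
    """
    items = list(items)
    if not 1 <= m <= len(items):
        raise ValueError("need 1 <= m <= len(items)")
    comparisons = 0
    while True:
        pivot = items[rng.randrange(len(items))]  # uniformly chosen pivot
        comparisons += len(items) - 1
        left = [x for x in items if x < pivot]    # left subtree
        right = [x for x in items if x > pivot]   # right subtree
        if len(left) == m - 1:
            return pivot, comparisons             # pivot is the m-th smallest
        if len(left) >= m:
            items = left                          # continue on the left
        else:
            m -= len(left) + 1                    # rank within the right subtree
            items = right


def normalized_cost(n, m, rng=random):
    """One draw of C/n - 1 for a list of n distinct numbers."""
    _, c = quickselect_comparisons(range(n), m, rng)
    return c / n - 1


if __name__ == "__main__":
    rng = random.Random(1)
    print(quickselect_comparisons([7, 3, 9, 1, 5, 8], 2, rng))
    print(normalized_cost(1000, 1, rng))
```

Because only the relative order of the elements matters, running the sketch on `range(n)` loses no generality for the distribution of the comparison count.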
The following two theorems quantify and recover the results of [12] by providing non-asymptotic bounds in the Wasserstein distance $d_1$ between $W_{n,m}$ and $D$ that converge to zero in the $m = o(n)$ asymptotic regime. As the $m$th smallest number of a list of $n$ distinct numbers only exists when $n \ge m$, we need only consider this range of parameters in what follows.
Theorem 1.1 Let $C_{n,1}$ be the number of comparisons made by Quickselect to find the smallest of a list of $n$ distinct numbers, and let $W_{n,1}$ be given by (3). Then for all $n \ge 1$ $$ d_1(W_{n,1},D) \le \frac{8\log (n/2)+10}{n}. $$
Theorem 1.2 Let $m \ge 2$, let $C_{n,m}$ be the number of comparisons made by Quickselect to find the $m$th smallest element of a list of $n$ distinct numbers, and let $W_{n,m}$ be given by (3). Then for all $n \ge m$ $$ d_1(W_{n,m},D) \le \frac{(46m+8)\log (n/m)+54m+8}{n}. $$
That the bounds in Theorems 1.1 and 1.2 are tight in the $\log n/n$ order for $m \not= 2$ is a consequence of the following result; in the following, we let $h_n = \sum_{1 \le k \le n} 1/k$ for $n \ge 1$.
We note that in the case $m = 1$ the lower bound simplifies to $2\log n/n$. That our method, where we focus only on the expectation $E[C_{n,m}]$ to achieve our lower bound, does not succeed in the case $m = 2$ is explained by the lack of the term $h_n$ on the right hand side of (6). Theorem 1.3 is shown using the following exact expression for the expected running time of Quickselect; see also Section 6 of [9].
In particular, Theorems 1.1 and 1.2 are derived by applying Theorem 1.5, which quantifies the 'if' direction of the fixed point property (2) in the Wasserstein, or $d_1$, metric between two random variables $X$ and $Y$, given by $$ d_1(X,Y) = \sup_{h \in {\rm Lip}_1} |E[h(X)]-E[h(Y)]|, \tag{7} $$ where ${\rm Lip}_1$ denotes the class of functions $h$ satisfying $|h(x)-h(y)| \le |x-y|$. On the left hand side of (7) we have chosen to write $d_1(X,Y)$, rather than the technically correct expression $d_1({\cal L}(X),{\cal L}(Y))$, only for notational convenience.
Theorem 1.5 Let $W$ be a non-negative random variable with finite mean, let $\theta > 0$, and let the law of $W^*$ be given by (1). Then $$ d_1(W,D_\theta) \le (1+\theta)d_1(W,W^*). $$
As the Wasserstein distance also satisfies $$ d_1(X,Y) = \inf E|X-Y|, $$ where the infimum is over all couplings $(X,Y)$ having the given marginals, and is achieved here (see [18], for instance), Theorem 1.5 implies that $$ d_1(W,D_\theta) \le (1+\theta)E|W^*-W| \tag{9} $$ for any non-negative random variable $W$ with finite mean, and $W^*$ defined on a common space having the ${\cal D}_\theta$-bias distribution of $W$.
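The coupling characterization of $d_1$ suggests a simple numerical check. For two samples of equal size, the Wasserstein distance between their empirical measures is attained by pairing order statistics, so it equals the average absolute difference of the sorted values. The sketch below uses this to estimate $d_1(W, W^*)$ from samples; the names `empirical_w1` and `dickman_bias` are hypothetical, and an empirical estimate of this kind only approximates the population distance appearing in Theorem 1.5.

```python
import random


def empirical_w1(xs, ys):
    """Wasserstein-1 distance between the empirical measures of two
    equal-size samples, computed by pairing order statistics."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def dickman_bias(ws, theta=1.0, rng=random):
    """Apply the generalized Dickman transformation W -> U**(1/theta) * (W + 1)
    to each sample point, with an independent uniform U per point."""
    return [rng.random() ** (1.0 / theta) * (w + 1.0) for w in ws]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = 1.0
    # Hypothetical test case: W uniform on [0, 2], which has the same mean
    # as the standard Dickman distribution but a different law.
    ws = [2.0 * rng.random() for _ in range(20000)]
    estimate = empirical_w1(ws, dickman_bias(ws, theta, rng))
    print("estimated d_1(W, W*):", estimate)
    print("(1 + theta) * estimate:", (1 + theta) * estimate)
```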
In Section 2 we detail the workings of the Quickselect algorithm and prove Theorems 1.1 and 1.2 by applying Theorem 1.5, which is proved in Section 3. The proof of Theorem 1.3 appears in Section 4.
In related work, [8] considers the Quicksort method, which produces a fully sorted list, and [5] obtains distributional bounds for the running time of a variation of Quickselect to a non-Dickman approximand; compare its characterization in (1.4) there to (1) here.

The Quickselect Method and the Proofs of Theorems 1.1 and 1.2
In this section we apply Theorem 1.5 to obtain the bounds in Theorems 1.1 and 1.2 on the error of the Dickman approximation for the distribution of W n,m in (3), the properly normalized running time of the Quickselect algorithm for finding the m th smallest element of a list of n distinct numbers. When the value of m is clear from context, we will write C n for C n,m .

Quickselect: the case m = 1
In this section we prove Theorem 1.1 for the distribution of the number $C_n$ of comparisons that Quickselect requires to locate the smallest element of a list of $n$ distinct numbers. Clearly, a list of size zero requires no comparisons, hence $C_0 = 0$. For $n \ge 1$, the procedure requires the $n-1$ comparisons of the pivot to every other element at the first stage, followed by the cost of processing the left subtree, which may be empty. Since the pivot is chosen uniformly, we obtain the stochastic recursion $$ C_n =_d n-1+C_{V_1}, \tag{10} $$ where $V_1$, the size of the left subtree, is a discrete uniform variable on $\{0,\ldots,n-1\}$. From (10) we see that $C_1 = 0$ and $C_2 = 1$ a.s., and that non-trivial distributions arise for $n \ge 3$. Before proceeding to the proof of the theorem we describe how for all $n \ge 1$ we may write $C_n$ as a function $C(n;{\bf U}_1)$ with $$ {\bf U}_k = (U_k,U_{k+1},\ldots) $$ and $U_1, U_2, \ldots$ a sequence of i.i.d. uniform variables on $[0,1]$. Consider the initial list of size $V_0 = n$ as making up the left subtree at stage 0. At stage $k \ge 1$, given a non-null left subtree from the previous stage of size $V_{k-1}$, a new left subtree of size $$ V_k = \lfloor V_{k-1}U_k \rfloor \tag{11} $$ results by choosing a pivot uniformly from the current left subtree. In particular, the conditional distribution of $V_k$ given $V_{k-1}$ is uniform over $\{0,\ldots,V_{k-1}-1\}$. Rewriting (10) in this notation we have $$ C(n;{\bf U}_1) = n-1+C(\lfloor nU_1 \rfloor;{\bf U}_2). \tag{12} $$ As the size of each non-null left subtree decrements by at least one at each iteration, the value of $C_n$ will only depend on an initial subsequence of ${\bf U}_1$ of length at most $n$.
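The representation of $C_n$ as a function of a uniform sequence can be sketched directly: each stage on a non-null subtree of size $k$ costs $k-1$ comparisons, after which the subtree size becomes $\lfloor kU \rfloor$ for the next uniform $U$. The function names below are hypothetical, and the second function computes $E[C_n]$ exactly by averaging the recursion over the uniform subtree size, as an illustration rather than the closed form developed in the paper.

```python
import math
from fractions import Fraction


def comparisons_m1(n, uniforms):
    """Evaluate C(n; U) for m = 1 from a sequence of values in [0, 1].

    At most n uniforms are consumed, since the subtree size drops by at
    least one per stage.
    """
    it = iter(uniforms)
    total = 0
    size = n
    while size >= 1:
        total += size - 1
        # The guard keeps the new size below the old one even if
        # floating point rounding pushes size * u up to size.
        size = min(size - 1, math.floor(size * next(it)))
    return total


def expected_comparisons_m1(n_max):
    """Exact E[C_k], k = 0..n_max, from
    E[C_n] = n - 1 + (1/n) * sum_{v=0}^{n-1} E[C_v]."""
    expectations = [Fraction(0)]
    running = Fraction(0)  # holds sum of E[C_v] for v < n
    for n in range(1, n_max + 1):
        running += expectations[n - 1]
        expectations.append(n - 1 + running / n)
    return expectations


if __name__ == "__main__":
    print(comparisons_m1(4, [0.99, 0.99, 0.99, 0.99]))
    print(expected_comparisons_m1(5))
```

With all four uniforms close to one, the list of size 4 shrinks by one element per stage, giving $3 + 2 + 1 + 0 = 6$ comparisons, the worst case for $n = 4$.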
We pause to prove a lemma that is needed in this and the following section.
then $$ e_n \le c\log(en/q) \quad \mbox{for $n \ge q$.} \tag{14} $$
Proof: As (13) holds for $n = q$ we see that $e_q \le c$, verifying that the inequality in (14) holds at $q$. Assuming inequality (14) holds for $q \le u \le n-1$ for some $n \ge q+1$ we have completing the inductive step, and the proof.
We now prove Theorem 1.1. In the proof, we use Lemmas 2.2 and 2.4, which appear with their proofs at the end of this section.
Proof of Theorem 1.1: Take $n \ge 1$. With $V_k$ as in (11), by (12) the variable $W_n$ as given by (3) satisfies We now construct a variable with the $W_n^*$ distribution by first constructing $W_n'$ having the $W_n$ distribution. As ${\bf U}_1$ and ${\bf U}_2$ are equidistributed, and hence has the ${\cal D}$-bias distribution by (1). The difference hence consequence (9) of Theorem 1.5 with $\theta = 1$ yields We claim that When $\lfloor nU_1 \rfloor \ge 1$ this inequality follows from using the basic recursion (12) on both terms forming the difference that defines $e_n$, followed by applying the triangle inequality, and is easily verified to hold directly in the case $\lfloor nU_1 \rfloor = 0$ by applying (12) only on the first term of that difference, noting the second one in this case is zero. Now using that $|u(n-1)-\lfloor nu \rfloor+1| \le 2$ for all $u \in [0,1]$, we obtain For the final term, the inequality follows by applying Lemma 2.2, below, that shows that $|\lfloor U_1\lfloor nU_2 \rfloor\rfloor-\lfloor U_2\lfloor nU_1 \rfloor\rfloor| \le 1$ a.s., and Lemma 2.4, also below, that shows that E|C(p, Expanding the expectation in $Ee_{\lfloor nU_2 \rfloor}$ in (16), using the fact that $\lfloor nU_2 \rfloor$ is uniformly distributed over $\{0,\ldots,n-1\}$ and that $e_0 = e_1 = 0$ by virtue of $C_0 = C_1 = 0$, we obtain As $e_1 = 0$ inequality (15) shows that the claim of the theorem holds for $n = 1$. Applying Lemma 2.1 with $c = 4$ and $q = 2$ shows that $e_n \le 4\log(en/2)$ for $n \ge 2$, and substituting this bound into (15) and simplifying now completes the proof. We now prove Lemmas 2.2 and 2.4.
Taking the difference, As the difference between $u_1\lfloor nu_2 \rfloor$ and $u_2\lfloor nu_1 \rfloor$ is less than 1 in absolute value, their integer parts can differ by at most 1.
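The inequality $|\lfloor u_1\lfloor nu_2 \rfloor\rfloor-\lfloor u_2\lfloor nu_1 \rfloor\rfloor| \le 1$ used in the proof of Theorem 1.1 can be checked numerically. The sketch below evaluates it over a grid of rational points, using exact rational arithmetic so that floating point rounding cannot affect the floors. The function name `floor_gap` and the grid sizes are hypothetical choices, and such a check illustrates the claim without replacing its proof.

```python
from fractions import Fraction
from math import floor


def floor_gap(n, u1, u2):
    """|floor(u1 * floor(n * u2)) - floor(u2 * floor(n * u1))|."""
    return abs(floor(u1 * floor(n * u2)) - floor(u2 * floor(n * u1)))


def max_gap(n_max=15, denom_max=12):
    """Largest gap over n <= n_max and rationals i/d in [0, 1], d <= denom_max."""
    points = sorted({Fraction(i, d)
                     for d in range(1, denom_max + 1)
                     for i in range(d + 1)})
    return max(floor_gap(n, u1, u2)
               for n in range(1, n_max + 1)
               for u1 in points
               for u2 in points)


if __name__ == "__main__":
    print(max_gap())
```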
To prove Lemma 2.4, we will use the easily verified fact that and for u ∈ [0, 1] that We will also require the following inequality that can be shown directly using induction.

Case of m ≥ 2
In this section we prove Theorem 1.2 for the approximation of the distribution of the properly scaled value of the number C n,m of comparisons made by the Quickselect algorithm Q m to determine the m th smallest element of a list of n distinct numbers in the case m ≥ 2.
As the $m$th smallest element of the list does not exist when $n < m$, no comparisons are required and we may set $C_{n,m} = 0$ over this range. In the non-trivial case $n \ge m$, $Q_m$ begins as for $m = 1$ at the first stage by selecting a uniformly chosen pivot, giving rise, through $n-1$ comparisons to the pivot, to a left subtree of size $V_1$, uniformly distributed over $\{0,\ldots,n-1\}$, and a right subtree of size $n-1-V_1$. If $V_1 \ge m$ then the $m$th smallest element of the original list lies in the left subtree, and we may locate it by applying $Q_m$ to it. If $V_1 = m-1$ then the pivot is the $m$th smallest element and the process stops. Otherwise $V_1 < m-1$, and the $m$th smallest element is the $(m-V_1-1)$st smallest element in the right subtree, which we then locate by applying $Q_{m-V_1-1}$ to it. Hence, we obtain $C_{n,m} = 0$ for $0 \le n \le m-1$, and $$ C_{n,m} =_d n-1+C_{V_1,m}{\bf 1}(V_1 \ge m)+C_{n-1-V_1,m-V_1-1}{\bf 1}(V_1 < m-1) \quad \mbox{for $n \ge m$.} \tag{19} $$
We now develop a simple bound on the expectation $E[C_{n,m}]$. Proof: Recall that $h_n$ is the harmonic sum $\sum_{1 \le k \le n} 1/k$ for $n \ge 1$. The claim is trivial unless $n \ge m$, and is also easily seen to be true for $m = 1$ and $m = 2$ using (5) and (6). Hence, we take $n \ge m \ge 3$. For such $n$ and $m$, writing the difference between the two harmonic series below as a sum and separating out the last term for $j = m-2$, we have the inequality holding since each ratio is bounded by 1. Hence, using the expression given for $E[C_{n,m}]$ in Theorem 1.4 and applying (20) to yield the first inequality below, we obtain the upper bound
Note that the indicator on the first term on the right hand side of (19) may be dropped, due to the boundary condition there, on the line above. Now letting $C_m(n;{\bf U}_1)$ be defined by rewriting (19) as (12) was derived from (10), we obtain $C_m(n;{\bf U}_1) = 0$ for $0 \le n \le m-1$, and otherwise $$ C_m(n;{\bf U}_1) = n-1+C_m(\lfloor nU_1 \rfloor;{\bf U}_2)+C_{m-\lfloor nU_1 \rfloor-1}(n-1-\lfloor nU_1 \rfloor;{\bf U}_2){\bf 1}(\lfloor nU_1 \rfloor < m-1). \tag{21} $$ We next provide the following result that parallels Lemma 2.4 for the case $m = 1$.

Lemma 2.6
For all m ≥ 2 and p ≥ 1 Proof: As C m (p; U 1 ) = 0 for all 0 ≤ p ≤ m − 1 we may take p ≥ m. By the basic recursion (21) we have Applying the triangle inequality and taking expectation yields For the first expectation in (22), by (18) we have Now applying Lemma 2.5 on the first term of the remainder R, and using that ⌊pU 1 ⌋ ∼ U{0, . . . , p − 1}, yields and replacing p by p − 1 we see that the same bound holds for the expectation of the final term of R.
Substituting the bounds achieved into (22) we obtain kf k for all p ≥ m. (23) As f p = 0 for 1 ≤ p ≤ m − 1 inequality (23) holds for all p ≥ 2, and the conditions for invoking Lemma 2.3 with c = 1 + 8m are satisfied, yielding the desired conclusion.

Proof of Theorem 1.5
Theorem 1.5 was originally proven using Stein's method in [10], but [14] offered the following much simpler approach.