QuickSort: Improved right-tail asymptotics for the limiting distribution, and large deviations

We substantially refine asymptotic logarithmic upper bounds produced by Svante Janson (2015) on the right tail of the limiting QuickSort distribution function $F$ and by Fill and Hung (2018) on the right tails of the corresponding density $f$ and of the absolute derivatives of $f$ of each order. For example, we establish an upper bound on $\log[1 - F(x)]$ that matches conjectured asymptotics of Knessl and Szpankowski (1999) through terms of order $(\log x)^2$; the corresponding order for the Janson (2015) bound is the lead order, $x \log x$. Using the refined asymptotic bounds on $F$, we derive right-tail large deviation (LD) results for the distribution of the number of comparisons required by QuickSort that substantially sharpen the two-sided LD results of McDiarmid and Hayward (1996).


Introduction
To set the stage, and for the reader's convenience, we repeat here relevant portions of Section 1 of Fill and Hung [2]. Let $X_n$ denote the (random) number of comparisons when sorting $n$ distinct numbers using the algorithm QuickSort. Clearly $X_0 = 0$, and for $n \geq 1$ we have the recurrence relation
$$X_n \overset{\mathcal{L}}{=} X_{U_n - 1} + X^*_{n - U_n} + n - 1,$$
where $\overset{\mathcal{L}}{=}$ denotes equality in law (i.e., in distribution); $X_k \overset{\mathcal{L}}{=} X^*_k$; the random variable $U_n$ is uniformly distributed on $\{1, \ldots, n\}$; and $U_n, X_0, \ldots, X_{n-1}, X^*_0, \ldots, X^*_{n-1}$ are all independent. It is well known that $\mu_n := \mathbf{E} X_n = 2(n + 1)H_n - 4n$, where $H_n$ is the $n$th harmonic number $H_n := \sum_{k=1}^{n} k^{-1}$, and (from a simple exact expression) that $\operatorname{Var} X_n = (1 + o(1))\left(7 - \frac{2\pi^2}{3}\right)n^2$. To study distributional asymptotics, we first center and scale $X_n$ as follows:
$$Z_n := \frac{X_n - \mu_n}{n}. \tag{1.1}$$
Using the Wasserstein $d_2$-metric, Rösler [12] proved that $Z_n$ converges to $Z$ weakly as $n \to \infty$. Using a martingale argument, Régnier [11] proved that the slightly renormalized $\frac{n}{n+1} Z_n$ converges to $Z$ in $L^p$ for every finite $p$, and thus in distribution; equivalently, the same conclusions hold for $Z_n$. The random variable $Z$ has everywhere finite moment generating function, with $\mathbf{E} Z = 0$ and $\operatorname{Var} Z = 7 - 2\pi^2/3$. Moreover, $Z$ satisfies the distributional identity
$$Z \overset{\mathcal{L}}{=} UZ + (1 - U)Z^* + g(U). \tag{1.2}$$
On the right, $Z^* \overset{\mathcal{L}}{=} Z$; $U$ is uniformly distributed on $(0, 1)$; $U, Z, Z^*$ are independent; and $g(u) := 2u \ln u + 2(1 - u)\ln(1 - u) + 1$.
Further, the distributional identity together with the condition that $\mathbf{E} Z$ (exists and) vanishes characterizes the limiting QuickSort distribution; this was first shown by Rösler [12] under the additional condition that $\operatorname{Var} Z < \infty$, and later in full by Fill and Janson [4].
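As a quick numerical illustration of the recurrence, the mean formula $\mu_n = 2(n+1)H_n - 4n$, and the scaling (1.1), one can simulate QuickSort's comparison count directly. The following minimal sketch (ours, not from [2]; function names are illustrative) does so in Python:

```python
import random

def quicksort_comparisons(a):
    """Number of comparisons used by QuickSort with a uniformly random pivot."""
    if len(a) <= 1:
        return 0
    pivot = random.choice(a)
    smaller = [x for x in a if x < pivot]
    larger = [x for x in a if x > pivot]
    # n - 1 comparisons against the pivot, plus the two recursive calls
    return (len(a) - 1) + quicksort_comparisons(smaller) + quicksort_comparisons(larger)

def mu(n):
    """Exact mean mu_n = 2(n+1)H_n - 4n."""
    H = sum(1.0 / k for k in range(1, n + 1))
    return 2 * (n + 1) * H - 4 * n

n, reps = 500, 1000
samples = [quicksort_comparisons(list(range(n))) for _ in range(reps)]
print("empirical mean:", sum(samples) / reps, "  mu_n:", round(mu(n), 2))
# Z_n = (X_n - mu_n)/n from (1.1); its variance should be near 7 - 2*pi^2/3 ~ 0.4203
z = [(x - mu(n)) / n for x in samples]
print("empirical Var Z_n:", round(sum(v * v for v in z) / reps, 4), "  limit approx:", 0.4203)
```

The empirical variance of $Z_n$ matches $7 - 2\pi^2/3$ only up to $O(1/n)$ corrections and Monte Carlo error, but even modest values of $n$ and of the number of repetitions reproduce the constants to within a few percent.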
Fill and Janson [5] derived basic properties of the limiting QuickSort distribution $\mathcal{L}(Z)$. In particular, they proved that $\mathcal{L}(Z)$ has a (unique) continuous density $f$ which is everywhere positive and infinitely differentiable. Janson [7] studied logarithmic asymptotics in both tails for the corresponding distribution function $F$, and Fill and Hung [2] did the same for $f$ and each of its derivatives. For right tails, all these results can be summarized in the following theorem. We let $\overline{F}(x) := 1 - F(x)$.
(1.5)
As discussed in [7, Section 1] and in [2, Remark 1.3(b)], non-rigorous arguments of Knessl and Szpankowski [8] suggest very refined asymptotics, which to three logarithmic terms assert that for each $k \geq 0$ we have
(1.6)
as $x \to \infty$ (and hence that the same asymptotics hold for $F^{(k)}(x)$). Note that for $k = 0, 1$ these expansions match the lower bounds on $f$ and $F$ in Theorem 1.1 to two logarithmic terms.
In an earlier extended-abstract version [3] of this paper, we refined the upper bounds of Theorem 1.1 to match (1.6), and we were also able to improve the lower bound in (1.5) to match (1.6) to two terms. Here is the main theorem of [3].
Theorem 1.2.
(1.10)
In this paper we substantially refine the upper bound of Theorem 1.2(b) with $k = 0$; we also improve the upper bounds for $k \geq 1$, though not as dramatically. Let $J(t)$ be defined by (1.12). It is elementary using integration by parts that $J(t)$ has the (divergent) asymptotic expansion (1.13). Here is the main theorem of this paper.
Theorem 1.3. (a) As $x \to \infty$, the limiting QuickSort distribution function $F$ satisfies
(b) Given an integer $k \geq 1$, as $x \to \infty$ the $k$th derivative of the limiting QuickSort distribution function $F$ satisfies (1.14) for some (unspecified) constant $C$, with $\alpha := 2\ln 2 + 2\gamma - 1$, where $\gamma$ denotes the Euler-Mascheroni constant.
Hence the bound of Theorem 1.3(a) on $\ln \overline{F}(x)$ matches the conjectured asymptotics up to an additive error term.
(c) In their notation, the non-rigorously derived eq. (88) of [8] should read as follows, recalling $\alpha = 2\gamma + 2\ln 2 - 1$.
Ignoring the factor $1 - (1/w^*)$, which is $\sim 1$, this result in our notation is
We prove Theorem 1.3 in Section 2. In Section 3 we use our refined asymptotic bounds on $F$ to derive right-tail large deviation results for the distribution of the number of comparisons required by QuickSort that sharpen somewhat the two-sided large-deviation results of McDiarmid and Hayward [9].
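As a generic illustration of how integration by parts produces a divergent asymptotic expansion of this kind (a stand-in example of ours, not the actual definition (1.12) of $J$): for each fixed $m \geq 1$,
$$\int_1^t \frac{e^s}{s}\,ds \;=\; e^t \sum_{j=0}^{m-1} \frac{j!}{t^{j+1}} \;+\; m!\int_1^t \frac{e^s}{s^{m+1}}\,ds \;+\; O(1) \qquad (t \to \infty),$$
where the remainder integral is of order $e^t/t^{m+1}$. Because $j!/t^{j+1}$ eventually increases in $j$ for fixed $t$, the corresponding infinite series diverges; this is the sense in which such an expansion is divergent.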
We conclude this section by repeating from [3] an open problem concerning left-tail behavior.

2.1. A bound on the mgf of $Z$.
Let $\psi$ denote the mgf of $Z$. It was shown by Rösler [12] that $\psi$ is everywhere finite. In this subsection we establish a bound on $\psi(t)$ which (for large $t$) improves on that of [3, Lemma 2.1], which asserts that for every $\epsilon > 0$ there exists $a \equiv a(\epsilon) \geq 0$ such that the mgf $\psi$ of $Z$ satisfies
$$\psi(t) \leq \exp[(2 + \epsilon)t^{-1}e^t + at] \tag{2.1}$$
for every $t > 0$. The bound (2.1) in turn improved the one obtained in the proof of [7, Lemma 6.1], which likewise holds for some $a \geq 0$. Recalling the definition (1.12) of $J(t)$, we next state our bound on $\psi(t)$, which, according to (1.13), does indeed improve on (2.1) for large $t$.
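For later use, we record the integral equation for $\psi$ that follows from (1.2) by conditioning on $U$ and using the independence of $U$, $Z$, $Z^*$ (presumably this is the "standard integral equation for $\psi$" invoked in the proof of Proposition 2.1 below):
$$\psi(t) \;=\; \mathbf{E}\, e^{tZ} \;=\; \int_0^1 e^{g(u)t}\,\psi(ut)\,\psi((1 - u)t)\,du.$$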
Proposition 2.1. There exists a constant $a \geq 0$ such that the moment generating function $\psi$ of $Z$ satisfies $\psi(t) \leq \exp[J(t) + at]$ for every $t \geq 1$.
We postpone the proof of Proposition 2.1 for a preliminary remark.
Remark 2.2. Using non-rigorous methods, Knessl and Szpankowski [8] derive that as $t \to \infty$ the mgf $\psi$ satisfies (2.4).
The proof of Proposition 2.1 will require the following lemma. Recall from Remark 2.2 that $\alpha = 2\ln 2 + 2\gamma - 1$, and define
Lemma 2.3. For all sufficiently large $t$ we have the strict inequality
Proof. Call the left side of this inequality $\lambda(t)$. To handle $\lambda(t)$, we begin by changing the variable of integration from $u$ to $\eta$, where $u = \frac{1}{2}e^{-t}\eta$:
We next show that the contribution to $\int_{\eta=0}^{e^t}$ here from $\int_{\eta=e^{t/10}}^{e^t}$ is effectively quite negligible. To see this, we consider the integrand in two cases. Before breaking into cases, observe that the second argument of $\psi$ is at least $t/2$, which exceeds $1$ if (as we may suppose) $t > 2$. For the first case, suppose that the first argument of $\psi$ also exceeds $1$. In this case we need to treat the sum of the $J$-values at these arguments. But, using the increasingness of $2s^{-1}e^s$ for $s \geq 1$, we see that if $a, b \geq 1$ and $a + b = t$, then
and therefore
For the second case, suppose that the first argument of $\psi$ does not exceed $1$. In this case we need to treat $J(t - \frac{1}{2}te^{-t}\eta) \leq J(t - \frac{1}{2}te^{-9t/10})$. In this case, observe that
For the major contribution $\int_{\eta=0}^{e^{t/10}}$, we can use simple expansions for the first and third factors in the integrand, because $0 \leq \frac{1}{2}te^{-t}\eta \leq \frac{1}{2}te^{-9t/10} = o(1)$:
We also use an expansion for $J(t - \frac{1}{2}te^{-t}\eta)$ appearing in the second factor in the integrand:
We now use the following additional expansions:
Further, we can expand the factor $\exp[\cdot]$ appearing in $I(t)$ as $1 + \cdot + O(\cdot^2)$, because $\cdot = o(1)$ uniformly throughout the range of integration.
Calculus now gives
We conclude for sufficiently large $t$ that
Remark 2.4. If we change the factor $(1 - e^{-t/2})$ to $(1 + e^{-t/2})$ in the definition preceding Lemma 2.3, then a similar proof shows that the reverse strict inequality holds in Lemma 2.3. In fact, the proof becomes a bit simpler, since the minor contribution can simply be bounded below by $0$.
Proof of Proposition 2.1. We carry out the proof by showing that there exists $a' \geq 0$ such that (2.5) holds for every $t > 0$.
Let $t_2 > 1$ be such that the strict inequality in Lemma 2.3 holds for all $t \geq t_2$, and choose $a' \geq a''$ so that (2.5) holds for $t \in [t_1, t_2]$. Assuming for the sake of contradiction that (2.5) fails for some $t > 0$, let $T := \inf\{t > 0 : \text{(2.5) fails}\}$. Then $T \geq t_2$, and by continuity equality holds in (2.5) at $t = T$.
Further, if $0 < u < 1$, then (2.5) holds for $t = uT$ and $t = (1 - u)T$, and thus, using our standard integral equation for $\psi$, we obtain an upper bound on $\psi(T)$ which, by applying Lemma 2.3 with $t = T \geq t_2$, is strictly smaller than the right side of (2.5) at $t = T$. The resulting strict inequality contradicts the definition of $T$. Hence (2.5) holds for all $t \geq 0$.
(b) The extent to which we are able to make rigorous the claim (2.4), and thereby, in particular, identify the linear term in $\ln \psi(t)$, is the following. If
where we assume $K'(t) = O(t^{b_1})$ and $K''(t) = O(t^{b_2})$ for some $b_1$ and $b_2$ [just as we now know rigorously that $K(t) \sim -t^2 = O(t^2)$], then we must have
for some constant $C$, where $b := \max\{4, 2 + 2b_1, 2 + b_2\}$. (Aside: It is natural to assume further that $b_1 = 1$ and $b_2 = 0$, in which case $b = 4$.) The proof of this assertion is quite similar to the proof of Lemma 2.3 and is omitted.

Proof of improved asymptotic upper bound on $F$.
Proof of Theorem 1.3(a). Choose $t = w$, apply the Chernoff bound, and utilize Proposition 2.1 to establish Theorem 1.3(a).
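In outline (a minimal sketch of the standard Chernoff step, using Proposition 2.1 in the form $\psi(t) \leq \exp[J(t) + at]$): for every $t \geq 1$, Markov's inequality applied to $e^{tZ}$ gives
$$1 - F(x) \;=\; \mathbf{P}(Z > x) \;\leq\; e^{-tx}\,\mathbf{E}\,e^{tZ} \;=\; e^{-tx}\,\psi(t) \;\leq\; \exp[J(t) + at - tx],$$
and the bound of Theorem 1.3(a) results from substituting the particular choice $t = w$.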

Remark 2.7. (a) For large $x$, the optimal choice of $t$ for the Chernoff bound combined with (2.3) is not $t = w$, but rather the larger, $\widetilde{w} \equiv \widetilde{w}(x)$, of the two positive real solutions to
But the resulting improvement in the bound on $\ln \overline{F}(x)$ not only is subsumed by the error bound $O(\log x)$ but in fact is asymptotically equivalent to $2x^{-1}(\log x)^2 = o(1)$, and so is negligible even as concerns estimating $\overline{F}(x)$ to within a factor $1 + o(1)$.
Here is a proof. Use of $t = w$ vs. $t = \widetilde{w}$ gives the larger expression
Using Taylor's theorem, we write
where the intermediate point belongs to $(w, \widetilde{w})$, and we also note
It remains to estimate $\widetilde{w} - w$. We have
Write this as $\frac{w}{\widetilde{w}}\,e^{w - \widetilde{w}} = 1 + \frac{2}{w} - \frac{a}{x}$ and take logs. Note
Proof of Theorem 1.3(b). The bound (1.14) holds for $k = 0$ because it is cruder than the bound of Theorem 1.3(a). The bound (1.14) for general values of $k$ then follows inductively from the $\limsup$ bound provided by Proposition 6.1 of [2].

Large deviations for QuickSort
With some improvements, this section repeats Section 3 of [3].
McDiarmid and Hayward [9] study large deviations for the variant of QuickSort in which the pivot (that is, the initial partitioning key) is chosen as the median of $2t + 1$ keys chosen uniformly at random without replacement from among all the keys. The case $t = 0$ is the classical QuickSort algorithm, which is our sole focus in this paper. Restated equivalently in terms of the random variable $Z_n$ in (1.1) (as straightforward calculation reveals), the following is their main theorem for classical QuickSort. Let $x_n$ satisfy
$$\frac{\mu_n}{n \ln n} < x_n \leq \frac{\mu_n}{n}. \tag{3.1}$$
Then as $n \to \infty$ we have
(3.2)
Observe that (3.1) is roughly equivalent to the condition that $x_n$ lie between $2$ and $2\ln n$, and rather trivially the range can be extended to $1 < x_n \leq \mu_n/n$. But notice also that if $x_n = (\ln\ln n)^{c_n}$ with $c_n$ nondecreasing (say), then (3.2) provides a nontrivial upper bound on $\mathbf{P}(|Z_n| > x_n)$ if and only if $c_n \to \infty$.
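To see numerically why (3.1) amounts to roughly $2 < x_n \leq 2\ln n$, one can evaluate the two endpoints $\mu_n/(n\ln n)$ and $\mu_n/n$ directly from the exact formula for $\mu_n$ recalled in the Introduction. A small check of ours (in Python):

```python
import math

def mu(n):
    """Exact mean number of QuickSort comparisons: mu_n = 2(n+1)H_n - 4n."""
    H = sum(1.0 / k for k in range(1, n + 1))
    return 2 * (n + 1) * H - 4 * n

gamma = 0.5772156649015329  # Euler-Mascheroni constant
for n in (10**3, 10**5, 10**6):
    left = mu(n) / (n * math.log(n))   # left endpoint of (3.1), tends (slowly) to 2
    right = mu(n) / n                  # right endpoint of (3.1), ~ 2 ln n + 2*gamma - 4
    print(n, round(left, 3), round(right, 3), round(2 * math.log(n) + 2 * gamma - 4, 3))
```

The left endpoint converges to $2$ only at rate $1/\ln n$, which is why the equivalence is only a rough one.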
McDiarmid and Hayward require a fairly involved proof, utilizing primarily the method of bounded differences pioneered by McDiarmid [10], to establish the $\leq$ half of (3.2). The $\geq$ half is proven by establishing (by means of another substantial argument) the right-tail lower bound
(3.3)
We state next our right-tail large-deviations theorem for QuickSort. With the additional indicated restriction on the growth of $x_n$ (which allows for $x_n$ nearly as large as $\frac{1}{2}\frac{\ln n}{\ln\ln n}$), parts (b)-(c) strictly refine (3.3) and the asymptotic upper bound on $\mathbf{P}(Z_n > x_n)$ implied by (3.4). The left-hand endpoint of the interval $I_n$ in Theorem 3.3 is chosen as $c > 1$ simply to ensure that $\sup\{-\ln\ln x : x \in I_n\} < \infty$.
with
$$x_n \equiv \frac{1}{2}\,\frac{\ln n}{\ln\ln n}\left(1 - \frac{\omega_n}{\ln\ln n}\right);$$
this assertion decreases in strength as the choice of $\omega_n$ is increased, so we may assume that $\omega_n = o(\log\log n)$. Since, by Theorem 1.2(b), we have
it suffices to show that for any constant $C < \infty$ we have
$$-\tfrac{1}{2}\ln n + C(\ln n)^{1/2} + x_n \ln x_n + x_n \ln\ln x_n + C x_n \to -\infty.$$
But, writing $L$ for $\ln$ and $L_k$ for the $k$th iterate of $L$, and abbreviating $\alpha_n := 1 - \frac{\omega_n}{L_2 n}$, this follows from the observation that, for $n$ large,
$$x_n(L x_n + L_2 x_n + C) \leq \tfrac{1}{2} L n - (1 - o(1))\,\omega_n\,\frac{L n}{2 L_2 n}.$$
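The displayed expression $-\frac{1}{2}\ln n + C(\ln n)^{1/2} + x_n\ln x_n + x_n\ln\ln x_n + Cx_n$ tends to $-\infty$ only very slowly. The following sketch (ours; it works directly with $L = \ln n$, takes the illustrative choice $\omega_n = \ln\ln\ln n$, and fixes $C = 5$) evaluates it numerically:

```python
import math

def expr(L, C=5.0):
    """-0.5*ln n + C*sqrt(ln n) + x_n*ln(x_n) + x_n*ln(ln(x_n)) + C*x_n,
    computed directly from L = ln n, with the illustrative choice omega_n = ln ln ln n."""
    L2 = math.log(L)       # ln ln n
    L3 = math.log(L2)      # ln ln ln n
    omega = L3
    x = 0.5 * (L / L2) * (1.0 - omega / L2)
    return -0.5 * L + C * math.sqrt(L) + x * math.log(x) + x * math.log(math.log(x)) + C * x

for L in (1e5, 1e10, 1e20, 1e40, 1e80, 1e160):
    print(f"ln n = {L:.0e}: value = {expr(L):+.3e}")
```

The value is still positive for moderate $\ln n$ and turns negative (and then plunges) only for astronomically large $n$, illustrating how slowly these large-deviation estimates take effect.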
For completeness we next present a left-tail analogue of Theorem 3.3 [but, for brevity, only parts (b)-(c) thereof]. Theorem 3.4 follows in similar fashion using the case $k = 0$ of (1.20) in place of Theorem 1.2(b). No such left-tail large-deviation result is found in [9]. Recall $\Gamma := (2 - \frac{1}{\ln 2})^{-1}$ and the notation $L_k$ used in the proof of Theorem 3.3.
(resp., by $\exp[-e^{\gamma x + O(1)}]$) as $x \to \infty$; there is no restriction at all on how large $x$ can be in terms of $n$.
Here are examples of very large values of $x$ for which the tail probabilities are nonzero and the aforementioned bounds still match logarithmic asymptotics to lead order of magnitude, albeit not to lead-order term. Let $\lg$ denote binary log. The largest possible value of $X_n$ is $\binom{n}{2}$ (corresponding to any binary search tree which is a path), which occurs with probability $2^{n-1}/n!$. The smallest possible value (supposing, for simplicity, that $n = 2^k - 1$ for integer $k$) is $(k - 2)2^k + 2 = (n + 1)(\lg(n + 1) - 2) + 2$ (corresponding to the perfect tree, in the terminology of [1, Section 3]); according to [ and (rounded to seven decimal places) $s(1) = 0.9457553$.
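These extreme values are easy to confirm by brute force for small $n$, using the standard equivalence between QuickSort with a uniformly random pivot on $n$ distinct keys and QuickSort with first-element pivot applied to a uniformly random permutation. A small check of ours (in Python), for $n = 7 = 2^3 - 1$:

```python
from itertools import permutations
from math import factorial

def comparisons(a):
    """Comparisons used by QuickSort taking the first element as pivot."""
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x > pivot]
    return (len(a) - 1) + comparisons(smaller) + comparisons(larger)

n, k = 7, 3                                   # n = 2^k - 1
counts = [comparisons(list(p)) for p in permutations(range(n))]
print("max:", max(counts), "vs n(n-1)/2 =", n * (n - 1) // 2)
print("P(max):", counts.count(max(counts)) / factorial(n), "vs 2^(n-1)/n! =", 2 ** (n - 1) / factorial(n))
print("min:", min(counts), "vs (k-2)*2^k + 2 =", (k - 2) * 2 ** k + 2)
```

For $n = 7$ this reports a maximum of $21 = \binom{7}{2}$ attained with probability $64/5040$, and a minimum of $10 = (3 - 2)\cdot 2^3 + 2$.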