On the strange domain of attraction to generalized Dickman distributions for sums of independent random variables

Let $\{B_k\}_{k=1}^\infty, \{X_k\}_{k=1}^\infty$ all be independent random variables. Assume that $\{B_k\}_{k=1}^\infty$ are $\{0,1\}$-valued Bernoulli random variables satisfying $B_k\stackrel{\text{dist}}{=}\text{Ber}(p_k)$, with $\sum_{k=1}^\infty p_k=\infty$, and assume that $\{X_k\}_{k=1}^\infty$ satisfy: $X_k>0,\ \ \ \mu_k\equiv EX_k<\infty, \ \ \ \lim_{k\to\infty}\frac{X_k}{\mu_k}\stackrel{\text{dist}}{=}1$. Let $M_n=\sum_{k=1}^np_k\mu_k$, assume that $M_n\to\infty$ and define the normalized sum of independent random variables $W_n=\frac1{M_n}\sum_{k=1}^nB_kX_k$. We give a general condition under which $W_n\stackrel{\text{dist}}{\to}c$, for some $c\in[0,1]$, and a general condition under which $W_n$ converges in distribution to a generalized Dickman distribution GD$(\theta)$. In particular, we obtain the following concrete results, which reveal a strange domain of attraction to generalized Dickman distributions. Let $J_\mu,J_p$ be nonnegative integers, let $c_\mu,c_p>0$ and let $$ \begin{aligned}&\mu_n\sim c_\mu n^{a_0}\prod_{j=1}^{J_\mu}(\log^{(j)}n)^{a_j},&p_n\sim c_p\big({n^{b_0}\prod_{j=1}^{J_p}(\log^{(j)}n)^{b_j}}\big)^{-1}, \ b_{J_p}\neq0. \end{aligned} $$ If $$ \begin{aligned}&i.\ J_p\le J_\mu;&ii.\ b_j=1, \ 0\le j\le J_p;&iii.\ a_j=0, \ 0\le j\le J_p-1,\ \text{and}\ \ a_{J_p}>0, \end{aligned} $$ then $ \lim_{n\to\infty}W_n\stackrel{\text{dist}}{=}\frac1{\theta}\text{GD}(\theta),\ \text{where}\ \theta=\frac{c_p}{a_{J_p}}. $ Otherwise, $\lim_{n\to\infty}W_n\stackrel{\text{dist}}{=}c$, for some $c\in[0,1]$. We also give an application to the statistics of the number of inversions in certain shuffling schemes.


Introduction and Statement of Results
The Dickman function $\rho_1$ is the unique function, continuous on $(0,\infty)$, satisfying the differential-delay equation $$x\rho_1'(x)+\rho_1(x-1)=0,\ x>1;\qquad \rho_1(x)=1,\ 0<x\le1.\qquad(1.1)$$ This function has an interesting role in number theory and probability, which we describe briefly at the end of this section. With a little work, one can show that the Laplace transform of $\rho_1$ is given by $$\int_0^\infty e^{-\lambda x}\rho_1(x)\,dx=\exp\Big(\gamma+\int_0^1\frac{e^{-\lambda x}-1}{x}\,dx\Big),$$ where $\gamma$ is Euler's constant. From this it follows that $\int_0^\infty\rho_1(x)\,dx=e^\gamma$, and consequently, that $e^{-\gamma}\rho_1$ is a probability density on $[0,\infty)$. We will call this probability distribution the Dickman distribution.
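Equation (1.1) lends itself to the method of steps, since on each interval the delayed term $\rho_1(x-1)$ is already known. The following Python sketch (our illustration, not part of the formal development; the step size and truncation point are arbitrary choices) tabulates $\rho_1$ and checks numerically that $\int_0^\infty\rho_1(x)\,dx=e^\gamma$:

```python
import math

def dickman_rho(x_max=10.0, h=1e-3):
    """Tabulate rho_1 on [0, x_max]: rho_1 = 1 on [0,1], and
    x*rho_1'(x) = -rho_1(x-1) for x > 1 (method of steps, Heun scheme)."""
    n = int(round(x_max / h))
    lag = int(round(1.0 / h))             # grid offset corresponding to the delay x - 1
    rho = [1.0] * (n + 1)                 # rho_1 = 1 on [0, 1]
    for i in range(lag, n):
        x = i * h
        k1 = -rho[i - lag] / x            # rho'(x) = -rho(x-1)/x
        k2 = -rho[i + 1 - lag] / (x + h)  # rho'(x+h): delayed values already computed
        rho[i + 1] = rho[i] + 0.5 * h * (k1 + k2)
    return rho, h

rho, h = dickman_rho()
integral = h * (sum(rho) - 0.5 * (rho[0] + rho[-1]))  # trapezoid rule on [0, 10]
print(integral, math.exp(0.5772156649015329))         # both approx e^gamma = 1.78107...
print(rho[2000], 1 - math.log(2))                     # rho_1(2) = 1 - log 2 exactly
```

Truncating at $x=10$ is harmless because of the superexponential decay of $\rho_1$.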
More generally, for each $\theta>0$ there is a unique continuous function $\rho_\theta$ on $(0,\infty)$ solving the corresponding differential-delay equation, and a normalizing constant $c_\theta$ for which $p_\theta=c_\theta\rho_\theta$ is a probability density on $[0,\infty)$, with Laplace transform $\exp\big(\theta\int_0^1\frac{e^{-\lambda x}-1}{x}dx\big)$. We will call such distributions generalized Dickman distributions and denote them by GD$(\theta)$. We denote by $D_\theta$ a random variable with the GD$(\theta)$ distribution. Differentiating its Laplace transform at $\lambda=0$ shows that $ED_\theta=\theta$.
These distributions decay very rapidly; indeed, it is not hard to show that $p_\theta(x)\le\frac{C_\theta}{\Gamma(x+1)}$, $x\ge1$, for an appropriate constant $C_\theta$. A fundamental fact about these distributions is that $$D_\theta\stackrel{\text{dist}}{=}U^{\frac1\theta}(1+D_\theta),\qquad(1.2)$$ where $U$ is distributed according to the uniform distribution on $[0,1]$, and $U$ and $D_\theta$ on the right hand side above are independent. From (1.2) it is immediate that $$D_\theta\stackrel{\text{dist}}{=}\sum_{n=1}^\infty\prod_{k=1}^nU_k^{\frac1\theta},$$ where $\{U_n\}_{n=1}^\infty$ are IID random variables distributed according to the uniform distribution on $[0,1]$. It will follow from the proof of Theorem 1 below that $\exp\big(\theta\int_0^1\frac{e^{-\lambda x}-1}{x}dx\big)$ is the Laplace transform of a probability distribution. In section 5 we will prove that a random variable with such a distribution satisfies (1.2), and that if a random variable satisfies (1.2), then it has a density of the form $c_\theta\rho_\theta$, where $\rho_\theta$ satisfies (1.1). Thus, this paper is self-contained with regard to all the above noted facts, with the exception of the rate of decay and the value $\frac{e^{-\theta\gamma}}{\Gamma(\theta)}$ of the normalizing constant $c_\theta$ in $p_\theta$. For more on these distributions, including a derivation of the normalizing constant, see, for example, [1] and [8].
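The series representation above also gives a practical sampler for GD$(\theta)$. The following Python sketch (ours, for illustration; the truncation tolerance is an arbitrary choice) truncates the a.s.-convergent series $\sum_{n\ge1}\prod_{k\le n}U_k^{1/\theta}$ and checks the moment identity $ED_\theta=\theta$ empirically:

```python
import random

def sample_D(theta, rng, tol=1e-12):
    """Sample D_theta via D_theta = sum_{n>=1} prod_{k=1}^n U_k^(1/theta).
    The partial products decrease to 0 a.s., so truncation at tol is safe."""
    total, prod = 0.0, 1.0
    while prod > tol:
        prod *= rng.random() ** (1.0 / theta)
        total += prod
    return total

rng = random.Random(1)
theta = 2.0
N = 20000
mean = sum(sample_D(theta, rng) for _ in range(N)) / N
print(mean)   # should be close to E D_theta = theta = 2
```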
In fact, the scope of this paper leads us to consider a more general family of distributions than the generalized Dickman distributions. Let $X\ge0$ be a random variable satisfying $EX\le1$. Then, as we shall see, for $\theta>0$, there exists a distribution whose Laplace transform is $\exp\big(\theta\int_0^1\frac{Ee^{-\lambda xX}-1}{x}dx\big)$. We will denote this distribution by GD$^{(X)}(\theta)$ and we denote a random variable with this distribution by $D^{(X)}_\theta$. (When $X\equiv1$, we revert to the previous notation for generalized Dickman distributions.) Differentiating the Laplace transform at $\lambda=0$ shows that $ED^{(X)}_\theta=\theta EX$. Mimicking the proof of (1.2) that we give in section 5 shows that $$D^{(X)}_\theta\stackrel{\text{dist}}{=}U^{\frac1\theta}\big(X+D^{(X)}_\theta\big),\qquad(1.3)$$ where $U$ is distributed according to the uniform distribution on $[0,1]$, and $U$, $D^{(X)}_\theta$ and $X$ on the right hand side above are independent. From (1.3) it is immediate that $$D^{(X)}_\theta\stackrel{\text{dist}}{=}\sum_{n=1}^\infty X_n\prod_{k=1}^nU_k^{\frac1\theta},$$ where $\{U_n\}_{n=1}^\infty$ and $\{X_n\}_{n=1}^\infty$ are mutually independent sequences of IID random variables, with $U_1$ distributed according to the uniform distribution on $[0,1]$ and $X_1$ distributed according to the distribution of $X$.
It is known that the generalized Dickman distribution GD$(\theta)$ arises as the limiting distribution of $\frac1n\sum_{k=1}^nkY_k$, where the $\{Y_k\}_{k=1}^\infty$ are independent random variables with $Y_k$ distributed according to the Poisson distribution with parameter $\frac\theta k$ [1]. It is also known that the Dickman distribution GD$(1)$ arises as the limiting distribution of $\frac1n\sum_{k=1}^nkB_k$, where the $\{B_k\}_{k=1}^\infty$ are independent Bernoulli random variables with $B_k\stackrel{\text{dist}}{=}\text{Ber}(\frac1k)$. Such behavior is in distinct contrast to the law of large numbers behavior of a "well-behaved" sequence of independent random variables $\{Z_k\}_{k=1}^\infty$ with finite first moments; namely, that $\frac1{M_n}\sum_{k=1}^nZ_k$ converges in distribution to $1$ as $n\to\infty$, where $M_n=\sum_{k=1}^nEZ_k$. The purpose of this paper is to understand when the law of large numbers fails and a distribution from the family GD$^{(X)}(\theta)$ arises in its stead. From the above examples, we see that generalized Dickman distributions sometimes arise as limits of normalized sums from a sequence $\{V_k\}_{k=1}^\infty$ of independent random variables which are non-negative, which vanish with high probability, and which are on the order of the entire sum when they do not vanish. In light of the above discussion, we will consider the following setting.
Let $\{B_k\}_{k=1}^\infty$ and $\{X_k\}_{k=1}^\infty$ be mutually independent sequences of independent random variables. Assume that $\{B_k\}_{k=1}^\infty$ are Bernoulli random variables satisfying $B_k\stackrel{\text{dist}}{=}\text{Ber}(p_k)$, and assume that $\{X_k\}_{k=1}^\infty$ satisfy $X_k>0$ and $\mu_k\equiv EX_k<\infty$. Let $M_n=\sum_{k=1}^np_k\mu_k$ and define the normalized sum $W_n=\frac1{M_n}\sum_{k=1}^nB_kX_k$. We will be interested in the limiting behavior of $W_n$. In order to avoid trivialities, we will assume that $$\sum_{k=1}^\infty p_k=\infty,\qquad(1.8)$$ since otherwise $\sum_{k=1}^\infty B_kX_k$ is almost surely finite. Note that for the example brought with the Pois$(\frac\theta k)$-distribution, we have $p_k=1-e^{-\frac\theta k}$, $X_k$ distributed as $kY_k$ conditioned on $Y_k\ge1$, $\mu_k=\frac{\theta}{1-e^{-\theta/k}}\sim k$, and $M_n=n\theta$. And for the example with the Ber$(\frac1k)$-distribution, we have $p_k=\frac1k$, $X_k=k$ deterministically, $\mu_k=k$ and $M_n=n$. In the first of these two examples, $\frac{X_k}{\mu_k}\stackrel{\text{dist}}{\to}1$, and in the second, $\frac{X_k}{\mu_k}=1$ for all $k$. Our first theorem gives a general condition for $W_n\stackrel{\text{dist}}{\to}c$ (which is the law of large numbers if $c=1$), and a general condition for convergence to a limiting distribution from the family of distributions GD$^{(X)}(\theta)$. Using this theorem, we can prove our second theorem, which reveals the strange domain of attraction to generalized Dickman distributions. (Of course, we are using the term "domain of attraction" not in its classical sense, since our sequence of random variables, although independent, are not identically distributed.) Let $\delta_c$ denote the degenerate distribution at $c$.
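Both benchmark examples are easy to simulate. The following Python sketch (ours, for illustration; the sample sizes are arbitrary choices) simulates the Ber$(\frac1k)$ example, for which $EW_n=1$ exactly for every $n$, while the limit law is GD$(1)$ rather than $\delta_1$:

```python
import random

def W_bernoulli(n, rng):
    """W_n = (1/n) * sum_{k=1}^n k*B_k, with independent B_k ~ Ber(1/k);
    here M_n = n, and W_n converges in distribution to GD(1)."""
    return sum(k for k in range(1, n + 1) if rng.random() < 1.0 / k) / n

rng = random.Random(7)
n, reps = 2000, 3000
samples = [W_bernoulli(n, rng) for _ in range(reps)]
mean = sum(samples) / reps
var = sum((x - mean) ** 2 for x in samples) / reps
print(mean)   # E W_n = 1 exactly, for every n
print(var)    # tends to Var(D_1) = 1/2, not to 0: no law of large numbers
```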
ii. Assume that there exists a random variable $X$ such that $\frac{X_k}{\mu_k}\stackrel{\text{dist}}{\to}X$. Assume also that $\{\mu_k\}_{k=1}^\infty$ is increasing, that $\lim_{k\to\infty}p_k=0$ and that there exist $\theta,L\in(0,\infty)$ for which the growth conditions (1.13) and (1.14) hold. Then $W_n\stackrel{\text{dist}}{\to}LD^{(X)}_\theta$. In particular, if $\{p_k\}_{k=1}^\infty$ and $\{\mu_k\}_{k=1}^\infty$ satisfy the conditions of part (ii), and we choose $X_k=\mu_k$, then $W_n\stackrel{\text{dist}}{\to}LD_\theta$.
Remark 1. Since $EW_n=1$ and $ED_\theta=\theta$, it follows from Fatou's lemma that $L\le\frac1\theta$. In most cases of interest, one has $L=\frac1\theta$. Remark 2. By Fatou's lemma, the random variable $X$ in part (ii) must satisfy $EX\le1$.
Remark 4. In the case that $X_k=\mu_k$, or more generally, if $EX_k^2\le C\mu_k^2$, for all $k$ and some $C>0$, then $\sigma^2(W_n)\le\frac{C}{M_n^2}\sum_{k=1}^np_k\mu_k^2$. Thus, in this case part (i-a) follows directly from the second moment method.
Using Theorem 1, we can prove the following theorem that exhibits the strange domain of attraction to generalized Dickman distributions. Let $\log^{(j)}$ denote the $j$th iterate of the logarithm, and make the convention $\prod_{j=1}^0(\cdot)=1$.
where D θ is a random variable with the GD(θ) distribution.
Remark 1. Note that if one chooses $\mu_k=\mu(k)$ and $p_k=p(k)$, for functions $\mu$ and $p$ of the above asymptotic form, then the hypotheses of the theorem can be read off directly from the exponents $\{a_j\}$ and $\{b_j\}$. Remark 2. Theorem 2 shows that to obtain a generalized Dickman distribution, $\{p_k\}_{k=1}^\infty$ in particular must be set in a very restricted fashion. For some intuition regarding this phenomenon, take the situation where $X_k=\mu_k$, and consider the sequence $\{\sigma^2(W_n)\}_{n=1}^\infty$ of variances. This sequence converges to $0$ in the cases where $W_n$ converges to $1$, converges to $\infty$ in the cases where $W_n$ converges to $0$, and converges to a positive number in the cases where $W_n$ converges to a generalized Dickman distribution.
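When $X_k=\mu_k$ the variance is available in closed form, $\sigma^2(W_n)=\frac1{M_n^2}\sum_{k=1}^np_k(1-p_k)\mu_k^2$, so the trichotomy of Remark 2 can be observed numerically. In the following Python sketch (ours, for illustration; the first and third parameter regimes are our own example choices, while the middle one is the Dickman example above), the three variances tend to $0$, to $\frac12$, and to $\infty$, respectively:

```python
import math

def var_Wn(n, p, mu):
    """Exact variance of W_n when X_k = mu_k is deterministic:
    Var(W_n) = sum_k p_k (1 - p_k) mu_k^2 / M_n^2,  M_n = sum_k p_k mu_k."""
    M = sum(p(k) * mu(k) for k in range(1, n + 1))
    return sum(p(k) * (1 - p(k)) * mu(k) ** 2 for k in range(1, n + 1)) / M ** 2

n = 10**5
v_lln     = var_Wn(n, lambda k: 1.0 / k, lambda k: 1.0)       # W_n -> 1: variance -> 0
v_dickman = var_Wn(n, lambda k: 1.0 / k, lambda k: float(k))  # W_n -> GD(1): variance -> 1/2
# p_k ~ 1/(k log k) with mu_k = k violates condition iii of the theorem;
# the variance diverges slowly, like (log n)/2, matching the W_n -> 0 regime.
p3 = lambda k: 1.0 / (k * max(1.0, math.log(k)))
v_div_small = var_Wn(10**3, p3, lambda k: float(k))
v_div_big   = var_Wn(n,     p3, lambda k: float(k))
print(v_lln, v_dickman, v_div_small, v_div_big)
```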
We now state explicitly what Theorem 2 yields in the cases J p = 0, 1.
In order that (1.8) hold, we require either $b_0=0$ and $b_1>0$, or $0<b_0<1$, or $b_0=1$ and $b_1\le1$.

The organization of the rest of the paper is as follows. In section 2 we use Theorems 1 and 2 to investigate a question raised in [5] concerning the statistics of the number of inversions in certain random shuffling schemes.
In sections 3 and 4 respectively we prove Theorems 1 and 2. Finally, in section 5 we prove the basic facts about the Dickman distribution and its density, as was promised earlier in this section.
As mentioned above, we end this section with a little background concerning the Dickman function $\rho\equiv\rho_1$. The Dickman function arises in probabilistic number theory in the context of so-called smooth numbers; that is, numbers all of whose prime divisors are "small." Let $\Psi(x,y)$ denote the number of positive integers less than or equal to $x$ with no prime divisors greater than $y$. Numbers with no prime divisors greater than $y$ are called $y$-smooth numbers. Then for $s\ge1$, $\Psi(N,N^{\frac1s})\sim N\rho(s)$, as $N\to\infty$. This result was first proved by Dickman in 1930 [4], whence the name of the function, with later refinements by de Bruijn [2]. See also [6] or [9]. Let $p^+(j)$ denote the largest prime divisor of $j$. It is easy to see that an equivalent statement of Dickman's result is that the random variable $\frac{\log p^+(j)}{\log j}$, $j\in[n]$, on the probability space $[n]$ with the uniform distribution, converges in distribution as $n\to\infty$ to the distribution whose distribution function is $\rho(\frac1x)$, $x\in(0,1]$, and whose density is $-\frac1{x^2}\rho'(\frac1x)$. We note that the length of the longest cycle of a uniformly random permutation of $[n]$, normalized by dividing by $n$, also converges to a limiting distribution whose distribution function is $\rho(\frac1x)$. If instead of using the uniform measure on $S_n$, the set of permutations of $[n]$, one uses the Ewens sampling distribution on $S_n$, obtained by giving each permutation $\sigma\in S_n$ the probability proportional to $\theta^{C(\sigma)}$, where $C(\sigma)$ denotes the number of cycles in $\sigma$, then the length of the longest cycle of such a random permutation of $[n]$, normalized by dividing by $n$, converges to a limiting distribution whose distribution function is $\rho_\theta(\frac1x)$, $x\in(0,1]$. This distribution is also the distribution of the first coordinate of the Poisson-Dirichlet distribution PD$(\theta)$ (see [1]).
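Dickman's asymptotic $\Psi(N,N^{1/s})\sim N\rho(s)$ can be observed directly, although the convergence is only logarithmically fast. The following Python sketch (ours, for illustration; $N=10^5$ and $s=2$ are arbitrary choices) counts $\sqrt N$-smooth numbers with a largest-prime-factor sieve and compares the ratio with $\rho(2)=1-\log2\approx0.3069$; at this modest $N$ the agreement is only rough:

```python
import math

def count_smooth(N, y):
    """Count y-smooth integers in [1, N] via a largest-prime-factor sieve."""
    lpf = list(range(N + 1))            # lpf[m] ends up = largest prime factor of m
    for p in range(2, N + 1):
        if lpf[p] == p:                 # p is prime: never hit by a smaller prime
            for m in range(p, N + 1, p):
                lpf[m] = p              # primes processed in increasing order
    return 1 + sum(1 for m in range(2, N + 1) if lpf[m] <= y)  # m = 1 is y-smooth

N = 10**5
ratio = count_smooth(N, math.isqrt(N)) / N
print(ratio, 1 - math.log(2))   # Psi(N, N^(1/2))/N vs rho(2)
```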
The examples in the above paragraph lead to limiting distributions where the Dickman function arises as a distribution function, not as a density as is the case with the GD(θ) distributions discussed in this paper. The GD(θ) distribution arises as a normalized limit in the context of certain natural probability measures that one can place on N; see [3], [7].

An application to random permutations
We consider a setup that appeared in [5], and which in the terminology of this paper can be described as follows. Cards numbered $1$ to $n$ are placed one by one in a row: at step $k$, card number $k$ is inserted into one of the $k$ available positions, chosen uniformly at random; if $j$ cards end up to its right, this creates $j$ new inversions. Let $E_k=\{1,\ldots,k-1\}$, let $B_k$ indicate the event that card $k$ is not placed at the right end, so that $B_k\stackrel{\text{dist}}{=}\text{Ber}(\frac{|E_k|}{k})$, and given $B_k=1$, let $X_k$ be the number of cards to the right of card $k$, so that $X_k$ is distributed uniformly on $E_k$. Let $I_n=\sum_{k=1}^nB_kX_k$.
We allow $E_k=\emptyset$, in which case $B_k=0$ and $X_k$ is not defined. In such a case, we define $B_kX_k=0$ and $\mu_k=0$. We always have $E_1=\emptyset$. It is clear from the construction that the random variables $\{B_kX_k\}_{k=1}^n$ are independent. Thus, in the case $E_k=\{1,\ldots,k-1\}$, $I_n$ indeed gives the number of inversions in a uniformly random permutation from $S_n$. It is well-known that the law of large numbers and the central limit theorem hold for $I_n$ in this case.
Consider now the general case that $E_k\subset\{1,\ldots,k-1\}$. Then $I_n$ gives the number of inversions in a random permutation created by a shuffling procedure in the same spirit as the above one. At step $k$, with probability $1-\frac{|E_k|}{k}$, card number $k$ is inserted at the right end of the row, thereby creating no new inversions, and for each $j\in E_k$, with probability $\frac1k$ it is inserted in the position with $j$ cards to its right, thereby creating $j$ new inversions.
In particular, as a warmup consider the cases $E_k=\{1\}$ and $E_k=\{k-1\}$, $2\le k\le n$. In each of these two cases, at step $k$, $2\le k\le n$, card number $k$ is inserted at the right end of the row with probability $1-\frac1k$. In the first case, with probability $\frac1k$ card number $k$ is inserted immediately to the left of the rightmost card, thereby creating one new inversion, while in the second case, with probability $\frac1k$ card number $k$ is inserted at the left end of the row, thereby creating $k-1$ new inversions. In both cases $\frac{X_n}{\mu_n}\stackrel{\text{dist}}{=}1$ for all $n$, and in both cases, $p_k=\frac1k$. In the first case, $\mu_k=1$, while in the second case, $\mu_k=k-1$. Thus, in the first case, $M_n=\sum_{k=1}^np_k\mu_k\sim\log n$, and in the second case, $M_n\sim n$. Therefore, it follows from Theorem 1 or 2 that in the first case $\frac{I_n}{\log n}$ converges in distribution to $1$, while in the second case, $\frac{I_n}{n}$ converges in distribution to GD$(1)$.
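The two warmup cases can be simulated directly from the $B_kX_k$ representation. In the following Python sketch (ours, for illustration; the sample sizes are arbitrary choices), the normalized means of $I_n$ are close to $1$ in both cases, consistent with the $\delta_1$ and GD$(1)$ limits, both of which have mean $1$:

```python
import math, random

def inversions(n, j_of_k, rng):
    """I_n = sum_{k=2}^n B_k * j_of_k(k), with B_k ~ Ber(1/k): at step k,
    with probability 1/k card k is inserted with j_of_k(k) cards to its right."""
    return sum(j_of_k(k) for k in range(2, n + 1) if rng.random() < 1.0 / k)

rng = random.Random(3)
n, reps = 3000, 1500
m1 = sum(inversions(n, lambda k: 1, rng) for _ in range(reps)) / reps      # E_k = {1}
m2 = sum(inversions(n, lambda k: k - 1, rng) for _ in range(reps)) / reps  # E_k = {k-1}
r1, r2 = m1 / math.log(n), m2 / n
print(r1, r2)   # both normalized means are near 1
```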
The authors of [5] ask which choices of $\{E_k\}_{k=1}^\infty$ lead to the Dickman distribution and which choices lead to the central limit theorem. Of course, the law of large numbers is a prerequisite for the central limit theorem. The following theorem gives sufficient conditions for the law of large numbers to hold and sufficient conditions for convergence to a distribution from the family GD$^{(X)}(\theta)$. In order to avoid trivialities, we need to assume that (1.8) holds. Recalling that $\mu_k=0$ when $|E_k|=0$, and that $\mu_k\ge1$ otherwise, note that $M_n=\sum_{k=1}^np_k\mu_k\ge\sum_{k=1}^np_k=\sum_{k=1}^n\frac{|E_k|}{k}$. Thus, in the present context the requirement (1.8) is $\sum_{k=1}^\infty\frac{|E_k|}{k}=\infty$, which holds in particular if $E_k\neq\emptyset$ for all sufficiently large $k$.
i. Assume that at least one of the following conditions holds: ii. Assume that $|E_k|=N\ge1$, for all large $k$, and that $\frac{X_k}{\mu_k}\stackrel{\text{dist}}{\to}X$. Remark 1. The condition on $\{\mu_k\}$ in part (i-a) is just a very weak regularity requirement on its growth rate (recall that $1\le\mu_k\le k-1$). Note that the random variable $X$ in part (ii) takes on no more than $N$ distinct values.
Proof. Assume first that the condition in part (i-a) holds. We claim that since $\{\frac{\mu_n}{\sum_{k=1}^n\frac{\mu_k}{k}}\}_{n=1}^\infty$ is bounded, there exists a sequence of positive integers $\{\gamma_n\}_{n=1}^\infty$ satisfying $\lim_{n\to\infty}\gamma_n=\infty$ and such that $\{\frac{\mu_n}{\sum_{k=\gamma_n+1}^n\frac{\mu_k}{k}}\}_{n=1}^\infty$ is also bounded. Indeed, assume to the contrary. Then, in particular, $\{\mu_n\}_{n=1}^\infty$ is unbounded. Also, since $\mu_k<k$, we have $\sum_{k=1}^n\frac{\mu_k}{k}<\gamma_n+\sum_{k=\gamma_n+1}^n\frac{\mu_k}{k}$, and it would follow that $\{\frac{\mu_n}{\gamma_n}\}_{n=1}^\infty$ is bounded for all sequences $\{\gamma_n\}_{n=1}^\infty$ satisfying $\lim_{n\to\infty}\gamma_n=\infty$, which is a contradiction.
Let $\{\gamma_n\}_{n=1}^\infty$ be such a sequence. Thus, the condition in (i-a) guarantees that (1.9) holds. Now assume that the condition in part (i-b) holds. Since $M_n\ge\sum_{k=1}^n\frac{\mu_k}{k}$, it follows again that (1.9) holds.
Thus, assuming either (i-a) or (i-b), it follows from part (i-a) of Theorem 1 that $\lim_{n\to\infty}W_n\stackrel{\text{dist}}{=}1$. Now assume that the condition in part (ii) holds. Then $p_k=\frac Nk$, for large $k$, and $\mu_k\sim c_\mu k^{a_0}\prod_{j=1}^{J_\mu}(\log^{(j)}k)^{a_j}$, with $a_0>0$. Thus, the hypotheses of part (ii) of Theorem 1 are satisfied, and the result follows.

Proof of Theorem 1
Since the random variables $\{B_kX_k\}_{k=1}^n$ are independent, we have, for $\lambda>0$, $$Ee^{-\lambda W_n}=\prod_{k=1}^nEe^{-\frac{\lambda}{M_n}B_kX_k}=\prod_{k=1}^n\Big(1-p_k+p_kEe^{-\frac{\lambda}{M_n}X_k}\Big).\qquad(3.1)$$

Proof of part (i). Note that part (i-a) is the particular case of part (i-b) in which one can choose $K_n=n$, and then (1.12) holds with $c=1$. Thus, it suffices to consider part (i-b). For $\lambda>0$, it follows from assumption (1.10) that the contribution to (3.1) from $k\le K_n$ is negligible. Applying the mean value theorem to $Ee^{-\frac{\lambda}{M_n}X_k}$ as a function of $\lambda$, and recalling that $\mu_k=EX_k$, we obtain a bound on $1-Ee^{-\frac{\lambda}{M_n}X_k}$. In light of (1.11) and the assumption that $\{\frac{X_k}{\mu_k}\}_{k=1}^\infty$ is uniformly integrable, it follows that for all $\epsilon>0$, there exists an $n_\epsilon$ such that (3.4) holds. Thus, (3.3) and (3.4) yield (3.5). Since for any $\epsilon>0$, there exists an $x_\epsilon>0$ such that $-(1+\epsilon)x\le\log(1-x)\le-x$, for $0<x<x_\epsilon$, it follows from (3.5) and (1.11) that there exists an $n'_\epsilon$ such that (3.6) holds. From (3.6) we obtain the desired convergence; the statements concerning accumulation points follow in the same manner.
Proof of part (ii). From (3.1), we have (3.9). Since by assumption $\lim_{k\to\infty}p_k=0$, for any $\epsilon>0$ there exists a $k_\epsilon$ such that (3.10) holds. We now show that for any $\epsilon>0$ there exists a $k'_\epsilon$ such that (3.11) holds. By assumption (1.14) and the assumption that $\{\mu_n\}_{n=1}^\infty$ is increasing, there exists a $C$ such that $\frac{\mu_k}{M_n}\le C$, for $1\le k\le n$ and $n\ge1$. By assumption, $\frac{X_k}{\mu_k}\stackrel{\text{dist}}{\to}X$. Without loss of generality, we assume that all of these random variables are defined on the same space and that $\frac{X_k}{\mu_k}\to X$ a.s. For $\delta>0$, let $A_{k;\delta}=\{|\frac{X_j}{\mu_j}-X|\le\delta,\ \text{for all}\ j\ge k\}$. Then $A_{k;\delta}$ is increasing in $k$ and $\lim_{k\to\infty}P(A_{k;\delta})=1$. Now (3.11) follows from (3.12) and (3.13).

Proof of Theorem 2
We will assume that J p , J µ ≥ 1 so that we can use a uniform notation, leaving it to the reader to verify that the proof also goes through if J p or J µ is equal to zero.
First assume that (1.15) holds. Then by the assumptions in the theorem, $\{K_n\}_{n=1}^\infty$ can be defined so that (1.10) and (1.11) hold, and so that (1.12) holds with $c\in\{0,1\}$. We also have to show when $c=0$ and when $c=1$. Recall the definitions in (1.16). If $\{0\le j\le J_\mu:a_j\neq0\}$ is empty, or if it is not empty and $a_{\kappa_\mu}<0$, then $\{\mu_k\}_{k=1}^\infty$ is bounded. Therefore, (1.10) and (1.11) hold with $K_n=n$ and it follows from part (i-a) of Theorem 1 that $\lim_{n\to\infty}W_n\stackrel{\text{dist}}{=}1$. Thus, from now on we assume that $\{0\le j\le J_\mu:a_j\neq0\}$ is not empty and that $a_{\kappa_\mu}>0$. In order to use uniform notation, we will assume that $\kappa_\mu>0$, leaving the reader to verify that the proof goes through if $\kappa_\mu=0$. In order to simplify notation, for the rest of this proof, we will let $L_l(k)$ denote a positive constant multiplied by a product of powers (possibly of varying sign) of iterated logarithms $\log^{(j)}k$, where the smallest $j$ is strictly larger than $l$. The exact form of this expression may vary from line to line.
We now consider the case that $\{0\le j\le J_p:b_j\neq1\}$ is not empty. Then in order to fulfill the second condition in (1.8), we have $b_{\kappa_p}<1$. From (4.3) and (4.11) it follows that $M_n=\sum_{k=1}^np_k\mu_k$ satisfies (4.12), and from (4.11) it follows that (4.13) holds for any $K_n$ satisfying $K_n\to\infty$ and $K_n\le n$. From (4.3) and (4.12) we have (4.14). It is immediate from (4.3) and (4.14) that if $\kappa_\mu\ge\kappa_p$, then (1.10) and (1.11) hold by choosing $K_n=n$. (For the case $\kappa_\mu=\kappa_p$, recall that $b_{\kappa_p}\in(0,1)$.) Thus, from part (i-a) of Theorem 1, $\lim_{n\to\infty}W_n\stackrel{\text{dist}}{=}1$. Now consider the case $\kappa_\mu<\kappa_p$. For simplicity, we will assume that the higher order iterated logarithmic terms do not appear; that is, we will assume from (4.12)-(4.14) that $$\sum_{k=K_n}^np_k\sim\frac{c_p}{1-b_{\kappa_p}}\Big[\big(\log^{(\kappa_p)}n\big)^{1-b_{\kappa_p}}-\big(\log^{(\kappa_p)}K_n\big)^{1-b_{\kappa_p}}\Big];\qquad \frac{\mu_{K_n}}{M_n}\sim\Big(\frac{\log^{(\kappa_\mu)}K_n}{\log^{(\kappa_\mu)}n}\Big)^{a_{\kappa_\mu}};\qquad \frac{M_{K_n}}{M_n}\sim\Big(\frac{\log^{(\kappa_\mu)}K_n}{\log^{(\kappa_\mu)}n}\Big)^{a_{\kappa_\mu}}.\qquad(4.15)$$
The additional logarithmic terms can be dealt with similarly to the way they were dealt with for (4.9), as explained in the paragraph following (4.9).

Basic Facts Concerning Generalized Dickman Distributions
We proved in Theorem 1 that $\exp\big(\theta\int_0^1\frac{e^{-\lambda x}-1}{x}dx\big)$ is the Laplace transform of a probability distribution. In particular, if we let $X_k=\mu_k=k$ and $p_k=\frac\theta k$, in which case $M_n=\sum_{k=1}^np_k\mu_k=\theta n$, then it follows from Theorem 1 that $W_n=\frac1{\theta n}\sum_{k=1}^nkB_k$ converges in distribution to $\frac1\theta D_\theta$. Let $J^+_n=\max\{k\le n:B_k=1\}$, so that $$\sum_{k=1}^nkB_k=\sum_{k=1}^{J^+_n-1}kB_k+J^+_n,$$ where the first of the two summands on the right hand side above is interpreted as equal to $0$ if $J^+_n\le1$. We have $$P(J^+_n\le xn)=\prod_{k=\lfloor xn\rfloor+1}^n\Big(1-\frac\theta k\Big)\sim x^\theta,\ x\in(0,1).$$
Also, by the independence of $\{B_k\}_{k=1}^\infty$, we have (5.4). Letting $n\to\infty$ in (5.2) and using (5.1), (5.3) and (5.4), we conclude that (1.2) holds, where $U$ is distributed according to the uniform distribution on $[0,1]$, $D_\theta\stackrel{\text{dist}}{=}$ GD$(\theta)$ and $U$ and $D_\theta$ on the right hand side are independent.
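The distributional identity (1.2) can also be checked by simulation. The following Python sketch (ours, for illustration; $\theta=1.5$ and the sample sizes are arbitrary choices) samples $D_\theta$ via the series representation from section 1 and compares the first two moments of $D_\theta$ and of $U^{1/\theta}(1+D_\theta)$:

```python
import random

theta = 1.5
rng = random.Random(11)

def sample_D():
    """Sample D_theta as the a.s.-convergent series sum_n prod_{k<=n} U_k^(1/theta)."""
    total, prod = 0.0, 1.0
    while prod > 1e-12:
        prod *= rng.random() ** (1.0 / theta)
        total += prod
    return total

N = 30000
lhs = [sample_D() for _ in range(N)]                                          # D_theta
rhs = [rng.random() ** (1.0 / theta) * (1.0 + sample_D()) for _ in range(N)]  # U^(1/theta)(1 + D_theta)
mean_l, mean_r = sum(lhs) / N, sum(rhs) / N
m2_l, m2_r = sum(x * x for x in lhs) / N, sum(x * x for x in rhs) / N
print(mean_l, mean_r)   # both near E D_theta = theta = 1.5
print(m2_l, m2_r)       # matching second moments, as (1.2) predicts
```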
Let $F_\theta$ denote the distribution function of $D_\theta$. For $x>0$, making the change of variables $v=xy^{-\frac1\theta}-1$, we can rewrite (5.5) as (5.6). From (5.6) and the fact that $F_\theta(x)=0$, for $x\le0$, it follows that $F_\theta$ is continuous on $\mathbb{R}$. Also, since $F_\theta(x)=0$, for $x\le0$, we have