RANDOM TIME CHANGES FOR SOCK-SORTING AND OTHER STOCHASTIC PROCESS LIMIT THEOREMS

A common technique in the theory of stochastic processes is to replace a discrete time coordinate by a continuous randomized time, defined by an independent Poisson or other process. Once the analysis is complete on this Poissonized process, translating the results back to the original setting may be nontrivial. It is shown here that, under fairly general conditions, if the process $S_n$ and the time change $\phi_n$ both converge, when normalized by the same constant $\alpha_n$, to limit processes $\tilde S$ and $\tilde\Phi$, then the combined process $S_n \circ \phi_n$ converges to $\tilde S + \tilde\Phi \cdot \frac{d}{dt}\mathbb{E}S(t)$ when properly normalized. It is also shown that earlier results on the fine structure of the maxima are preserved by these time changes. The remainder of the paper then applies these simple results to processes which arise in a natural way from sorting procedures, and from random allocations. The first example is a generalization of "sock-sorting": Given a pile of $n$ mixed-up pairs of socks, we draw out one at a time, laying it on a table if its partner has not yet been drawn, and putting completed pairs away. The question is: What is the distribution of the maximum number of socks ever on the table, for large $n$? Similarly, when randomly throwing balls into $n$ (a large number) boxes, we examine the distribution of the maximum over all times of the number of boxes that have (for example) exactly one ball.


Introduction
The following home-economics problem has reappeared in various guises over the years ([Ber82], [Lut88], [Ste96], and [LP98]): $N$ pairs of freshly washed socks, each pair having a distinguishing color or pattern, lie thoroughly mixed in a bin. I draw them out one by one, at random, with the object of sorting them into pairs. As each new sock is drawn out in turn, I lay it on my sorting table; if the new sock matches one already on the table, I fold the pair neatly away into my capacious sock drawer. After $n$ draws, what is the expected number of socks on the table?
Since that question was seen off already in the mid-18th century by Daniel Bernoulli [Ber82], a more delicate question suggests itself: How much space do I need on the table? In other words, if $s_n$ is the number of unmatched socks after $n$ draws, what can I say about the distribution of $\max_{0 \le n \le 2N} s_n$? More generally, I will consider the sorting of $N$ types, each with a random number of elements.
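The sorting procedure is easy to simulate, which gives a feel for the quantity $\max_n s_n$. The following is a minimal sketch for the case of pairs only; the function names are illustrative, and the closed form in `expected_on_table` is the elementary count (each pair has exactly one sock among the first $n$ draws with probability $2n(2N-n)/(2N(2N-1))$), stated here as a check rather than quoted from the references.

```python
import random

def table_counts(num_pairs, rng):
    """Return [s_0, ..., s_{2N}]: unmatched socks on the table after each draw."""
    socks = [pair for pair in range(num_pairs) for _ in range(2)]
    rng.shuffle(socks)                  # uniform random drawing order
    on_table = set()
    counts = [0]
    for pair in socks:
        if pair in on_table:
            on_table.remove(pair)       # partner found: fold the pair away
        else:
            on_table.add(pair)          # first sock of its pair: lay it down
        counts.append(len(on_table))
    return counts

def expected_on_table(num_pairs, n):
    """Exact expectation of s_n for N pairs (elementary counting argument)."""
    total = 2 * num_pairs
    return n * (total - n) / (total - 1)

rng = random.Random(0)                  # fixed seed, for illustration only
N = 1000
counts = table_counts(N, rng)
print("max socks on table:", max(counts), "| N/2 =", N / 2)
print("s_N =", counts[N], "| E s_N =", expected_on_table(N, N))
```

The expectation peaks at $n = N$ with value close to $N/2$; the maximum of a single run typically exceeds that by a lower-order amount, which is the object of study below.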
The trick which I will apply here, hardly an uncommon one in probability theory, is a random time-change. (This approach to the sock-sorting problem first appeared in my dissertation [Ste96], but it has since been applied to the same problem by W. Li and G. Pritchard [LP98], apparently independently. For more examples of this sort of embedding, and further references, see [BH91], [Lan96], and [Kin93].) Concretely, $s_n$ may be represented as
$$s_n = \sum_{i=1}^{N} s^i_n,$$
where $s^i_n$ is 1 if precisely one sock of type $i$ is included in the first $n$ draws, and 0 otherwise. For convenience, rescale the whole process to occur on the time interval $[0, 1]$:
$$S_N(t) = s_{[2Nt]}.$$
Essentially, what this does is to let the $n$-th draw occur at time $n/2N$. This is only mindless rescaling, but it opens the way for an almost magical simplification when this deterministic time-change is replaced by a random one, where each sock is drawn at an independent random time, uniform on $[0, 1]$. This i.i.d. randomization of the time automatically generates a uniform permutation of the socks. Let $\phi_N(t)$ be the number of socks drawn up to time $t$, divided by $2N$; then
$$F_N(t) = S_N(\phi_N(t)) = \sum_{i=1}^{N} f_i(t),$$
where $f_i(t)$ is defined to be 1 if precisely one of the socks of type $i$ has been drawn by time $t$ and 0 otherwise. The advantage of this representation is that, since each sock pair is concerned only with its own clock, the random functions $f_i$ are independent. This opens up the problem to powerful empirical-process techniques, as described in [Ste96] and [Ste98], to study the limiting behavior of $F_N$.
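The randomized clock can be made concrete in a few lines. The sketch below, for pairs only and with illustrative names, gives every sock an independent uniform draw time, reads off the induced drawing order, and checks that the number of pairs with exactly one sock drawn by time $t$ coincides with the discrete process evaluated at the number of socks drawn by time $t$.

```python
import random
from bisect import bisect_right

def randomized_draws(num_pairs, rng):
    """Give every sock an independent Uniform[0,1] clock. Return the clocks
    grouped by pair, the sorted draw times, and the induced draw order."""
    clocks = [(rng.random(), rng.random()) for _ in range(num_pairs)]
    events = sorted((t, pair) for pair, ts in enumerate(clocks) for t in ts)
    draw_times = [t for t, _ in events]
    order = [pair for _, pair in events]
    return clocks, draw_times, order

def F(t, clocks):
    """Number of pairs with exactly one sock drawn by time t.
    Each summand looks only at its own pair's two clocks."""
    return sum((a <= t) != (b <= t) for a, b in clocks)

def s_sequence(order):
    """Discrete process s_0, ..., s_{2N} for a given draw order."""
    on_table, counts = set(), [0]
    for pair in order:
        on_table.symmetric_difference_update({pair})   # toggle membership
        counts.append(len(on_table))
    return counts

rng = random.Random(1)                  # fixed seed, for illustration only
N = 200
clocks, draw_times, order = randomized_draws(N, rng)
s = s_sequence(order)
for t in (0.1, 0.25, 0.5, 0.9):
    drawn = bisect_right(draw_times, t)  # socks drawn by time t
    print(t, F(t, clocks), s[drawn])     # the two counts agree
```

The function `F` never looks at the global order, only at each pair's own clocks, which is the source of the independence exploited in the text.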
It remains, though, to consider how the information that carries over from this earlier analysis is affected by the time change. One result in [Ste98] says that
$$\tilde F_N(t) = N^{-1/2}\bigl(F_N(t) - \mathbb{E}F_N(t)\bigr)$$
converges weakly to a Gaussian process. Since $\phi_N(t)$ converges uniformly in probability to the identity function, a functional central limit theorem for $\tilde S_N(t) = N^{-1/2}\bigl(s_{[2Nt]} - \mathbb{E}s_{[2Nt]}\bigr)$ would imply a similar theorem for
$$N^{-1/2}\bigl(S_N(\phi_N(t)) - \bar S_N(\phi_N(t))\bigr).$$
(Here, and occasionally elsewhere, when $X(t)$ is a random function the expectation function $\mathbb{E}X(t)$ is represented as $\bar X(t)$ in order to allow a distinction between $\mathbb{E}X(\phi_N(t))$ and $\bar X(\phi_N(t))$; that is, whether or not to integrate over the time change.) The problem here is, first, to move in the opposite direction, from a limit theorem for $F_N$ to a corresponding one for $S_N$ (where $F_N$ and $\phi_N$ are not independent), and second, to replace $S_N \circ \phi_N(t)$ by $S_N(t)$. Proposition 3.1 states that, asymptotically, $\tilde F_N(t)$ may be neatly decomposed into a sum of two independent Gaussian processes, one which is $\tilde S_N(t)$, and the other corresponding to the fluctuations in the time change itself.
[Figure: The smooth curve in the picture represents the expectation, $\mathbb{E}S_N(t) = 2Nt(1 - t)$.]
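For the special case of pairs, the variance bookkeeping behind such a decomposition can be checked numerically. The three closed forms below are elementary computations made for this illustration (they are assumptions of the sketch, not formulas quoted from the paper): the randomized-clock count is a sum of $N$ independent Bernoulli variables with success probability $2t(1-t)$; the fixed-draw count has limiting variance $4t^2(1-t)^2$ per pair; and the time-change term is the squared slope of $2t(1-t)$ times the variance $t(1-t)/2$ of the limit of $N^{1/2}(\phi_N(t) - t)$.

```python
import random

def var_randomized(t):
    """Limiting Var(F_N(t)) / N under independent uniform clocks."""
    p = 2 * t * (1 - t)
    return p * (1 - p)

def var_fixed(t):
    """Limiting Var(s_[2Nt]) / N when exactly [2Nt] socks have been drawn."""
    return 4 * t**2 * (1 - t)**2

def var_clock(t):
    """Contribution of the fluctuating time change: slope^2 * t(1-t)/2."""
    slope = 2 - 4 * t                    # derivative of 2t(1-t)
    return slope**2 * t * (1 - t) / 2

def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def simulate(num_pairs, t, trials, rng):
    """Monte Carlo estimates of the two per-pair variances at time t."""
    n = int(2 * num_pairs * t)
    socks = [i // 2 for i in range(2 * num_pairs)]
    fixed, randomized = [], []
    for _ in range(trials):
        rng.shuffle(socks)
        unmatched = set()
        for pair in socks[:n]:
            unmatched ^= {pair}          # toggle: on table / folded away
        fixed.append(len(unmatched))
        randomized.append(sum((rng.random() <= t) != (rng.random() <= t)
                              for _ in range(num_pairs)))
    return (sample_variance(fixed) / num_pairs,
            sample_variance(randomized) / num_pairs)

t = 0.25
est_fixed, est_randomized = simulate(200, t, 2000, random.Random(2))
print("fixed draws:      ", est_fixed, "vs", var_fixed(t))
print("randomized clocks:", est_randomized, "vs", var_randomized(t))
print("identity gap:", var_randomized(t) - var_fixed(t) - var_clock(t))
```

At $t = 1/2$ the slope vanishes, so the two variances agree there; this is the same extremum effect that later rescues the analysis of the maximum.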
Another result in the earlier paper says that the pair converges to the pair consisting of the maximum and the location of the maximum of $B_t - 2t^2$, where $B_t$ is a standard Brownian motion. * To put it another way, the maximum may be divided up on three different scales: where the terms in braces converge weakly to independent finite-valued random variables.
The distribution of the maximum is unaffected by the time change. Thus, it would certainly be true that satisfies the same limit law as (1). It would be pleasant to replace this by Since the empirical process $\phi_N(t) - t$ is uniformly $O_p(N^{-1/2})$, there is no problem in eliminating $\phi_N$ from the second term. On the other hand, $s_{[2N\phi_N(1/2)]}$ can differ from $s_N$ by as much as $2N(\phi_N(\tfrac12) - \tfrac12)$, which should be on the order of $N^{1/2}$, swamping the $N^{-1/3}$ normalization. What saves the situation, as I show in Proposition 3.2, is that it occurs at a relative extremum of the expectation. This means that, in these $O(N^{1/2})$ steps between $\tfrac12$ and $\phi_N(\tfrac12)$ the process moves like a symmetric random walk, and on average only fluctuates by about $N^{1/4}$, not $N^{1/2}$.
Section 4.1 applies these results to a generalized sorting problem. The procedure is the one described above for socks, only now instead of pairs I admit classes of arbitrary (random) numbers of objects. I show there that both limit theorems hold (the functional central limit theorem as well as the second-order approximation for the maximum, corresponding to (2)) as long as the class sizes have bounded fourth moments, and the distributions of the class sizes converge in $L^4$ when properly normalized.
* For definiteness, for $f$ a cadlag function defined on a real interval $I$, define $\arg\max f = \inf\{t : \sup_{s \le t} f(s) = \sup_{s \in I} f(s)\}$.
In section 4.2 I apply these same methods to a related question about random allocations. Here balls are being thrown successively at random into a large number N of boxes, and I consider the behavior of such functions as the total number of boxes which have exactly one ball, after n have been thrown.

Technical lemmas
In what follows, $I$ will be a subinterval of $\mathbb{R}$ and $D = D(I)$ will represent the space of cadlag (right-continuous with limits from the left) functions from $I$ to $\mathbb{R}$. I will have little to say concretely about the topology, but assume it to be furnished with the Skorokhod topology (cf. section 14 of [Bil68]), and all measurability questions will be referred to the corresponding Borel σ-algebra. The modulus of continuity of a function $x$ is denoted by
$$w_x(\delta) = \sup_{|s - t| < \delta} |x(s) - x(t)|.$$
For nondecreasing functions $\varphi \in D(I)$, a cadlag pseudoinverse function is defined by An elementary fact about pseudoinverses is where $R = \operatorname{Range}\phi \cap \operatorname{Range}\varphi$.
Applying $\phi^{-1}$ to the two points on the left side gives the result, by the definition of modulus of continuity.
We will also need some facts about uniform tightness which, while not original, do not seem to be explicitly stated in standard reference works. Let $(x_n)$ be a sequence of random functions in $D(I)$, for some finite interval $I$. I will say that $(x_n)$ is tight in the uniform modulus if for some $t_0 \in I$ the random variables $x_n(t_0)$ are uniformly tight, and for every positive $\epsilon$,
$$\lim_{\delta \to 0} \limsup_{n \to \infty} P\{w_{x_n}(\delta) > \epsilon\} = 0.$$
Lemma 2.2.
1. If $(x_n)$ converges weakly to a process that is almost surely continuous, then it is tight in the uniform modulus. Conversely, if $(x_n)$ is tight in the uniform modulus, then it is tight as a sequence of random functions in $D$, and if it converges weakly the limit law must be almost surely continuous.
2. If $(x_n)$ is tight in the uniform modulus then the sequence of random variables $\sup_{t \in I} |x_n(t)|$ is tight.
3. If $(x_n)$ and $(y_n)$ both are tight in the uniform modulus, then $(x_n + y_n)$ is as well.
4. If $(x_n)$ is tight in the uniform modulus and $y_n - x_n$ converges uniformly in probability to 0, then $(y_n)$ is tight in the uniform modulus as well.
Proof. The first assertion in 1 is a consequence of the Continuous Mapping Theorem (cf. Theorem IV.12 of [Pol84]), and the fact that the modulus of continuity, seen as a functional of cadlag functions x (with δ fixed), is continuous at continuous x. The second assertion appears as Theorem 15.5 of [Bil68].
For every positive K, which, by taking appropriate limits, proves assertion 2.
Assertion 3 follows from the subadditivity of the continuity modulus. Assertion 4 follows from assertion 3, since y n = x n + (y n − x n ), and y n − x n is tight in the uniform modulus, since it converges in probability to 0.
Lemma 2.3. Let $(\phi_n)$ be a sequence of random elements of $D(I)$, which are each almost surely nondecreasing. Let $\alpha_n$ be an increasing sequence of real numbers that goes to $\infty$ with $n$, and $\phi$ a continuous real-valued function on $I$, such that $\phi^{-1}$ is twice differentiable with bounded second derivative. Suppose that $\tilde\phi_n = \alpha_n(\phi_n - \phi)$ is tight in the uniform modulus. Then $\widetilde{\phi^{-1}_n} = \alpha_n(\phi_n^{-1} - \phi^{-1})$ also is tight in the uniform modulus.
Proof. Let $R_n$ represent the range of $\phi_n$, and let It is immediate from the definition of $\phi_n^{-1}$ that $\rho_n$ is bounded by the size of the largest jump by $\phi_n$. It follows that for every positive $\delta$, $w_{\tilde\phi_n}(\delta) \ge \alpha_n \rho_n$. The tightness condition for $\tilde\phi_n$ implies then that $\alpha_n \rho_n$ converges to 0 in probability, as $n$ goes to $\infty$. Define also $\sigma_n = \sup_{t \in I} |\phi_n(t) - \phi(t)|$. Since $\tilde\phi_n$ is tight in the uniform modulus, the sequence of random variables $(\alpha_n \sigma_n)$ is tight. By Lemma 2.1, for any fixed $t$, $\bigl(|\widetilde{\phi^{-1}_n}(t)|\bigr)_{n=1}^{\infty}$ is tight as well. Let $L_1$ and $L_2$ be upper bounds for the absolute values of the first and second derivatives of $\phi^{-1}$ respectively. Let $\delta > 0$ be given, and consider any $s, t \in R_n$ with $|s - t| < \delta$.
At the same time, using Lemma 2.1 and the fact that Thus for every positive $\epsilon$, Remember that $\alpha_n \rho_n \to_P 0$ as $n \to \infty$, while the sequence $(\alpha_n \sigma_n)$ is tight. This implies immediately that the second, third, and fourth terms on the right go to 0 as $n$ goes to $\infty$, and the fifth term goes to zero when at last $\delta$ goes to 0. The first term on the right, meanwhile, is 0, because of the assumption that $\tilde\phi_n$ is tight in the uniform modulus.
Note: Lemma 2.3 would not be true if the condition on the second derivative of $\phi^{-1}$ were removed. A particularly simple counterexample, with $\phi^{-1}$ only once differentiable, is obtained for $\alpha_n = n$ and $I = [-1, 1]$ by letting $\phi(t) = t\,|t|$ deterministically, and $\phi_n = \phi + \tfrac{1}{n}$.
I include here also two elementary lemmas which will be needed for negotiating between the discrete and continuous settings.

Lemma 2.4. Given positive integers k, and c
Iterating this b − a times proves (5).
If b < k, the left-hand side of (5) is zero, so the statement is trivial.
This completes the proof of (5).
To demonstrate (7), use Taylor's Theorem for the function $x^{-k}$ to see that from which iteration yields (6).
Lemma 2.5. If $a$, $b$, and $n$ are positive real numbers with $b \le n/2$ and $n > 1$,
Proof. In general, if $x$ and $y$ are positive real numbers, by the Theorem of the Mean This indicates as well that so an application of (8) completes the proof. (Note: I have not been able to find a source for this elementary argument, but it was brought to my attention, to replace messier computations, by Brad Mann.)

Theoretical results
The random functions $\phi_n : [0, T] \to [0, T']$ and $S_n : [0, T'] \to \mathbb{R}$ are taken to be in $D[0, T]$ or $D[0, T']$ respectively. These define cadlag processes $F_n = S_n \circ \phi_n$. The process $\phi_n$ is assumed to converge uniformly in probability to a continuous, strictly increasing nonrandom function $\phi$, with $\phi_n(0) = \phi(0) = 0$. The expectations of all the random functions exist pointwise, and the expectations $\mathbb{E}S_n(t)$ converge uniformly in $t$ to a twice continuously differentiable function $S(t)$. Also posit an increasing sequence of positive normalization constants $(\alpha_n)$ which go to infinity with $n$, and define the normalized processes
$$\tilde S_n = \alpha_n(S_n - \mathbb{E}S_n), \qquad \tilde\phi_n = \alpha_n(\phi_n - \phi), \qquad \tilde F_n = \alpha_n(F_n - \mathbb{E}F_n).$$
Following (fairly) standard nomenclature, I will call the sequences of random variables $\tilde S_n$ and $\tilde\phi_n$ defined on a common probability space asymptotically independent if for every bounded Suppose also that $\tilde\phi_n$ converges uniformly in probability to an almost-surely continuous process $\Phi$, and that $\tilde S_n$ and $\tilde\phi_n$ are asymptotically independent.
(i) If $\tilde S_n(t)$ converges weakly to an almost-surely continuous process $\tilde S(t)$, then $\tilde F_n(t)$ also converges weakly to an almost-surely continuous process $\tilde F(t)$, and where $S'(u) = \frac{dS}{dt}\big|_{t=u}$.
(ii) Suppose that $\phi_n$ and $S_n$ satisfy for all $n$, and that $\phi^{-1}$ is twice differentiable, with bounded first and second derivatives. If $\tilde\phi_n$ and $\tilde F_n$ converge to continuous Gaussian processes $\Phi$ and $\tilde F$ respectively, then $\tilde S_n$ converges to a continuous Gaussian process $\tilde S$, and the relation (11) is satisfied.
Note: The condition (12) is required to exclude pathological fluctuations which would destroy the tightness of S n , but which would not be reflected in F n , because they occur within gaps in the range of φ n . It is trivially satisfied in the examples which I am considering, since S n is constant between the times that are hit by φ n .
Proof. The function F n may be decomposed into where A n , B n , Γ n , ∆ n , E n are random functions, and Z n is a deterministic function. By assumption (10), E n and Z n converge uniformly to 0 as n → ∞.
By a Taylor approximation, The tightness of α n (φ n − φ), together with the fact that α n goes to infinity, shows that the first term converges to 0 in probability.
If $\tilde S_n$ is tight in the uniform modulus (as defined in Section 2), then the term $\Delta_n$ converges to 0 uniformly in probability, since for every positive $\delta$ The first piece is 0 for every $\delta$ by the assumption of uniform convergence in probability of $\phi_n$. The second piece converges to 0 as $\delta \to 0$ by the convergence of $\tilde S_n$ to an almost surely continuous process. Thus,
If assumption (i) holds, then $(\tilde S_n)$ is certainly tight in the uniform modulus, since it converges weakly to a continuous process. By Theorem 4.5 of [Bil68] (and the ensuing exercise 7), asymptotic independence and the convergence of $\tilde\phi_n$ in probability suffice to show that the pair $(\tilde\phi_n, \tilde S_n)$ converges weakly. It follows that any linear combination also converges; that is, that
Now suppose that the assumptions (ii) hold. By Skorokhod's Theorem (e.g., Theorem 3.2.1 of [Sko56]), the sequence of random functions must only be tight and have its finite-dimensional distributions converge. Unfortunately, the proof of the required relation (13) presupposed that $\tilde S_n$ is tight.
Tightness entered only into the proof that ∆ n (t) converges to 0 uniformly in probability. As a substitute, we may reverse the argument that led to (13). Define . Carrying out the same sort of decomposition yields terms The functions E * n and Z * n converge to 0 uniformly by assumption (10). The term ∆ * n converges uniformly to 0 in probability for essentially the same reason that ∆ n does, since F n forms a tight sequence, and by Lemma 2.1, where L 1 is the Lipschitz constant for φ −1 . This also shows that which means that Γ * n converges uniformly to 0 in probability. The assumption (12) implies that sup t | S * n (t) − S n (t)| → 0 as n goes to ∞. It follows that The sequence ( φ −1 n ) is tight in the uniform modulus by Lemma 2.3. By Lemma 2.2, this holds for S n as well.
The asymptotic characterization (13) implies the convergence of the characteristic functions. Together with the asymptotic independence and the multiplicativity of the characteristic functions of Gaussian variables, this establishes the convergence of the finite-dimensional distributions.
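The mechanism behind the limit relation can be summarized heuristically as a first-order expansion. The display below is only a sketch: it assumes the normalizations $\tilde S_n = \alpha_n(S_n - \bar S_n)$ and $\tilde\phi_n = \alpha_n(\phi_n - \phi)$ with $\bar S_n = \mathbb{E}S_n$, suppresses all error terms, and is not the decomposition into $A_n, B_n, \Gamma_n, \Delta_n, E_n, Z_n$ used in the proof.

```latex
\begin{align*}
\alpha_n\bigl(S_n(\phi_n(t)) - S(\phi(t))\bigr)
  &= \tilde S_n(\phi_n(t))
     + \alpha_n\bigl(\bar S_n - S\bigr)(\phi_n(t))
     + \alpha_n\bigl(S(\phi_n(t)) - S(\phi(t))\bigr) \\
  &\approx \tilde S_n(\phi(t)) + S'(\phi(t))\,\tilde\phi_n(t).
\end{align*}
```

The first term is moved from $\phi_n(t)$ to $\phi(t)$ using tightness in the uniform modulus, the second is negligible when the expectations approach $S$ fast enough relative to $\alpha_n$, and the third is a Taylor expansion of $S$. When the two surviving terms are asymptotically independent, their covariances add in the limit.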
For the second result, the probability space is assumed to admit a σ-algebra $\mathcal{R}_n$, such that $S_n$ and $\phi_n$ are independent conditioned on $\mathcal{R}_n$, and have regular conditional probabilities. (For an account of conditional independence, see section 7.3 of [CT78]. In this paper the existence of regular conditional probabilities will be trivial, since $\mathcal{R}_n$ will be generated by a countably-valued random variable.) The symbols $p$, $q$, $r$, $D$, $c$ and $C$ will all represent positive real constants, $q \ge \tfrac12$. $S(t)$ will generally represent a smooth approximation to the expectation of $S_n(t)$ such that and it will be assumed that $S$ has a local extremum at $t_0$, such that
Proposition 3.2. Suppose that $S_n$ and $\phi_n$ are independent when conditioned on $\mathcal{R}_n$, have regular conditional distributions, and that the expectation of the fluctuations in $S_n$ satisfies for all $n$ positive and $0 \le t \le 1$. Suppose, too, that there is a function $S$ satisfying conditions (16) and (17), and that Then for $\alpha < \min\{r + \tfrac14, \tfrac{q}{2}\}$ and $\alpha \le p$,
Proof. Define $\tau = \phi_n(t_0) - t_0$, and let $\mathcal{R}^*_n$ be the σ-algebra generated by $\mathcal{R}_n$ and the random variable $\phi_n$. Define, too, a random variable Conditional independence tells us that the regular conditional distribution for $(S_n, \tau)$ with respect to $\mathcal{R}_n$ is the product of the regular conditional distributions for $S_n$ and for $\tau$, which means that Fubini's theorem holds for the conditional expectations. That is, if one defines random functionals Finally, the last term is $o(n^{-p})$ by assumption (16).

Applications: socks and boxes
4.1. Sorting problems. Returning to the generalized sock-sorting problem described in the introduction, let $\mathbf{k} = (k_1, \dots, k_N)$ be an $N$-tuple of independent random integers, each $k_i \ge 2$, and set $K_i = k_1 + \cdots + k_i$ and $K = K_N$. Let Consider now the set of permutations of $K$ letters, viewed as bijections Given a map $\pi \in \mathcal{S}_{\mathbf{k}}$, let When there is no risk of confusion, the argument $\pi$ will be suppressed. Identifying this with the sequential choice of elements from a set of size $K$, which is divided up into $N$ classes of the form $\{(i, 1), \dots, (i, k_i)\}$, $s^i_n$ represents the number of elements from class $i$ among $\pi^{-1}(1), \dots, \pi^{-1}(n)$, unless the class is complete (that is, the number is $k_i$), in which case $s^i_n = 0$. Similarly, $s_n$ represents the total number of elements from incomplete classes among $\pi^{-1}(1), \dots, \pi^{-1}(n)$. The stipulation that $\pi$ be chosen uniformly from $\mathcal{S}_{\mathbf{k}}$ defines a stochastic process. The next step is to embed this in a continuous-time process, by defining The functions $S^i(t)$ are almost independent, as long as $N$ is much larger than any of the $k_i$; but not quite, because only one jump can occur at a time. They become independent when the clock which determines the jump times is randomized.
For each positive $K$, let $\{X(K, \ell) : \ell = 1, \dots, K\}$ be an array of i.i.d. random variables uniform on $[0, 1]$; without further mention, condition on the almost-sure event that the $X(K, \ell)$ are distinct. For $1 \le i \le N$ and $1 \le j \le k_i$ define $x(i, j) = X(K, K_{i-1} + j)$. The empirical process converges in probability to the identity function as $K \to \infty$, and that converges weakly to a Brownian bridge $\Phi$, with covariance $c_2(s, t) = s(1 - t)$ for $s \le t$. The almost-sure representation theorem (Theorem 1.10.3 of [vdVW96]) then allows the convergence to occur uniformly almost surely, on a suitably defined common probability space. This circuitous definition has the advantage of providing an autonomous limit process, independent of $\mathbf{k}$. As $N$ increases, $\tilde\phi_K$ will simply run through a subsequence of $(\tilde\phi_n)_{n=1}^{\infty}$, and necessarily will converge to the same limit.
Since the $x(i, j)$ are i.i.d. and distinct, they define a uniform permutation in $\mathcal{S}_{\mathbf{k}}$. Define Assume now that the distribution of $\mathbf{k}$ has a limit, in the sense that $p_k = \lim_{N \to \infty} p^{(N)}_k$ exists, and impose the following conditions: We then also have Since this is a convex combination of strictly concave functions, it has a unique maximum at an internal point $t_0$.
(ii) The triple where $M$ is a normal random variable with variance 1, independent of $G(u)$. $G(u)$ is distributed as $\sigma B_u - Du^2$, where $B_u$ is a standard two-sided Brownian motion, and
Proof. By Proposition 7.3 of [Ste98], the assumptions (21) and (22) suffice to prove that the conclusions hold with $S_N$ replaced by $F_N$, and with the covariance $c(s, t)$ replaced by
(i) We know that $\tilde F_N(t)$ converges weakly to a Gaussian process with covariance $c_1(s, t)$, and wish to extend this result by an application of Proposition 3.1. As I have already mentioned, the process $\tilde\phi_K(t)$ converges almost surely to the Brownian bridge $\Phi(t)$. By a slight abuse of indices, By (21) the variables $k_i$ have a fortiori bounded second moments, and so satisfy the strong law of large numbers [Fel71, Theorem VII.8.3]. Since every subsequence of a convergent sequence converges to the same limit, $\tilde\phi_N$ converges almost surely, hence also in probability, to $\mu^{-1/2}\Phi$. For any fixed $\mathbf{k}$, Integrating over $\mathbf{k}$, and using the trivial facts $K - k_i \ge 2N - 2$, $s^i_m \le k_i$ for all $m$ and $i$, and For the expectation of $S_N(t)$ this yields Since the normalization factor is $\alpha_N = N^{1/2}$, this proves (10).
It follows now by Proposition 3.1 that $\tilde S_N(t)$ converges weakly to a Gaussian process whose covariance is precisely where $c_2(s, t) = s(1 - t)$ is the covariance of the Brownian bridge. In principle it would be possible to compute this covariance directly, using combinatorics of the discrete process and passing to the limit as $N \to \infty$. For the case $k_i$ identically equal to 2 this has been done in [Ste96]; for larger $k$, though, the calculation becomes complicated and tedious.
(ii) By [Ste98], the triple converges in distribution to $\bigl(\sup_{u \in \mathbb{R}} G(u),\ \arg\max G,\ M\bigr)$.
The limit needs to be unchanged when all the F N are replaced by S N . This will be achieved if the difference created by this change goes to 0 in probability in the coupled version, where Since sup t F N (t) and sup t S N (t) have the same distribution, we know that the convergence of the third coordinate still holds if F N is replaced by S N . In the coupled version, which means that the second coordinate's convergence in distribution is also preserved when F N is replaced by S N .
The first coordinate only needs For this apply Proposition 3.2 with $q = 2$, $D$ and $t_0$ as already defined, and $\mathcal{R}_n$ the σ-algebra determined by $\mathbf{k}$. The condition (17) is obviously satisfied, and (16) has already been established for $p = \tfrac23$. Conditioned on any $\mathbf{k}$, $\phi_n$ is just an empirical process with $K \ge 2N$ points; thus To prove that (18) holds, the individual jumps in the process $S_N(t)$ need to be very nearly independent. Define $\xi_n = S_n - S_{n-1}$, and let $\tilde\xi_n$ be normalized to have mean 0: For $t > t_0$, define $m_1 = [t_0 K] + 1$ and $m_2 = [tK]$, and apply Lemma 4.2. Then for a constant $C$ (depending only on $\kappa$). Exactly the same is true if $t < t_0$. This completes the proof of condition (18) with the bound $\alpha < \tfrac34$. In the present case $\alpha$ is $\tfrac23$.
It remains only to prove
Proof. Begin by conditioning on a particular realization of $\mathbf{k}$. Let $A_{m,i}$ be the event that the draw at time $m$ comes from class number $i$. By straightforward combinatorics, for $i \ne j$ and $m < n$, An application of Lemma 2.4 then shows that for $i \ne j$ and $m \ne n$, This uses the crude approximation $K \ge 2N$ and This yields (for $m \ne n$) For $m = n$, direct calculation shows that Summing over $m$ and $n$ gives the estimate Taking the expectation with respect to $\mathbf{k}$, and using the crude approximations that $\kappa \ge 1$ and $t_2 - t_1 < 1$, yields (27).

Random allocations.
Begin with N empty boxes and an unlimited number of balls. Throw the balls one by one into boxes chosen uniformly at random. For positive integers k, let c k (n) be the number of boxes which contain exactly k balls, after n balls have been allocated.
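The allocation scheme is simple to simulate directly. The sketch below tracks $c_k(n)$ incrementally and records its maximum over $n$. The comparison value `q(k)` is taken to be $e^{-k}k^k/k!$, the peak value of the Poisson probability $e^{-t}t^k/k!$ at $t = k$; that reading of $q$ is an assumption of this sketch, inferred from the proof later in this section, and the function names are illustrative.

```python
import math
import random

def max_boxes_with_k(num_boxes, k, num_balls, rng):
    """Throw balls uniformly at random into boxes; return the maximum over
    n <= num_balls of c_k(n), the number of boxes holding exactly k balls
    (k is a positive integer)."""
    contents = [0] * num_boxes
    c_k = 0                              # no box holds k >= 1 balls at the start
    best = 0
    for _ in range(num_balls):
        box = rng.randrange(num_boxes)
        if contents[box] == k:
            c_k -= 1                     # this box is about to leave level k
        contents[box] += 1
        if contents[box] == k:
            c_k += 1                     # this box has just reached level k
        best = max(best, c_k)
    return best

def q(k):
    """Assumed centering constant: exp(-k) * k**k / k!, via log-gamma."""
    return math.exp(-k + k * math.log(k) - math.lgamma(k + 1))

rng = random.Random(5)                   # fixed seed, for illustration only
N, k = 2000, 1
c_star = max_boxes_with_k(N, k, 4 * k * N, rng)
print("c*_k =", c_star, "| N q(k) =", N * q(k))
```

Running past $n = kN$ by a generous margin (here $4kN$ balls) is enough in practice, since the expected count peaks near $n = kN$ and decays afterwards.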
What is the distribution of $c^*_k = \max_n c_k(n)$, when $N$ is large? Let The restriction of $\tilde S_N(t)$ to any compact interval $[0, T]$ converges to a Gaussian process with covariance for $s \le t$.
(ii) For $c^*_k$ defined as above, the variables $N^{-1/2}\bigl(c^*_k - Nq(k)\bigr)$ converge weakly to a multivariate normal distribution with mean 0 and covariance
The following lemma will be helpful, corresponding to Lemma 4.2 in the sock-sorting problem:
Proof. First, max so we may restrict our attention to $m$ and $n$ smaller than $N^2/2$.
Since ξ takes on only the values −1, 0, +1, Thus E ξ (m)ξ(n) is bounded by twice the maximum absolute value of the expressions in parentheses. Let A be the event that ball number n goes into the same box as ball number m; let B m (y) be the event that ξ(m) = y. Since n > m, A is independent of B m (y). Then so there will be no loss in restricting all events to A . This will be done without further comment. Further, by symmetry, nothing will be changed if one conditions on ball number m going into box 1 and ball number n going into box 2, just to facilitate the notation.
The conditional probability $P\{\xi(n) = +1 \mid \xi(m) = +1\}$ is just the chance that there are exactly $k - 1$ balls in box 2 at time $n - 1$, given that there are $k - 1$ balls in box 1 at time $m - 1$. Letting $r$ be the number of balls in box 2 at time $m - 1$ and summing over all possibilities, it follows that Note that the last step used Lemma 2.5, so assumes that $t \le N/2$, which is to say, that $n \le N^2/2$. Meanwhile, This yields where $G$ is a polynomial of degree $k - 1$. Since $e^{-t}G(t)$ is bounded in $t$, this is a bound of order $N^{-1}$ uniform in $m$, $n$ for the first of the four terms that we want to bound. But the other terms are bounded in exactly the same way: the condition $\xi(n) = -1$ simply means that box 1 has exactly $k + 1$ balls rather than $k$, and similarly for $\xi(m)$.
for s ≤ t.
I will say that $\mathbb{E}f_i$ has a unique quadratic maximum if there is an interior point $t_0 \in I$ such that for any positive $\epsilon$, $\sup_{|t - t_0| > \epsilon} \bar f_i(t) < \bar f_i(t_0)$, and for some positive $D$ I will say that "jumps are not clustered" if for every $t \in [S, T]$,
Lemma 4.5. Suppose that the expectation functions $\bar f_i(t)$ and $\bar v_i(t)$ are twice continuously differentiable, that $\bar f_i(t)$ has a unique quadratic maximum, that $v_i(T)$ has a finite fourth moment, and that jumps are not clustered.
Then $\tilde F_N(t)$ converges weakly on every compact interval to a Gaussian process with covariance $c(s, t)$; the processes converge weakly on every compact interval to the process and $D = -\bar f_i''(t_0)/2$; and the triple where $M$ is a normal variable independent of $G$ with mean 0 and the same variance as $f_i(t_0)$.
i , ($j = 1, \dots, k$) are random functions satisfying the above conditions separately for each $j$ (with $t_j$ taking the place of $t_0$), and if, in addition,
Proof of Theorem 4.3. (i) The time change for the allocations process derives from a Poisson process. Define $E_{i,j}$ for $1 \le i \le N$ and $1 \le j < \infty$ to be i.i.d. exponential variables with expectation 1, and for $t \ge 0$ let
$$f_i(t) = \begin{cases} 1 & \text{if } E_{i,1} + \cdots + E_{i,k} \le t < E_{i,1} + \cdots + E_{i,k+1}; \\ 0 & \text{otherwise.} \end{cases}$$
This is the indicator function of the event that a Poisson process with rate 1 has the value $k$ at time $t$, so $\bar f_i(t) = e^{-t}t^k/k!$. Similarly, $\bar v_i$ The maximum of $\mathbb{E}f_i$ occurs at $t_0 = k$, and By Stirling's formula, $q(k) \sim (2\pi k)^{-1/2}$ for large $k$. Also, for $s \le t$, by the "memorylessness" of exponential variables, The time change is
$$\phi_N(t) = (T + 1) \wedge \tfrac{1}{N}\,\#\{(i, j) : E_{i,1} + \cdots + E_{i,j} \le t\}.$$
This is just $\tfrac{1}{N}$ times a Poisson process with rate $N$, cut off at $T' = T + 1$ in order to match the formal conditions of Proposition 3.1. In fact, we may and shall ignore the truncation, since the probability of reaching it is on the order of $e^{-N}$.
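Two of the elementary facts used here are easy to check numerically: the Poisson probability $e^{-t}t^k/k!$ is maximized at $t = k$, and its peak value is asymptotic to $(2\pi k)^{-1/2}$. The following sketch takes $q(k)$ to mean that peak value $e^{-k}k^k/k!$, which is an assumption of the illustration; the function names are likewise illustrative.

```python
import math

def poisson_prob(k, t):
    """P(Poisson(t) = k) = exp(-t) * t**k / k!, computed on the log scale."""
    return math.exp(-t + k * math.log(t) - math.lgamma(k + 1))

def q(k):
    """Assumed meaning of q(k): the peak value of poisson_prob(k, .) at t = k."""
    return poisson_prob(k, float(k))

for k in (1, 5, 20, 50):
    near_peak = [poisson_prob(k, k + d) for d in (-0.1, 0.0, 0.1)]
    stirling_ratio = q(k) * math.sqrt(2 * math.pi * k)
    # the middle value (t = k) is the largest; the ratio approaches 1 from below
    print(k, near_peak, stirling_ratio)
```

The ratio differs from 1 by roughly $1/(12k)$, the first correction term in Stirling's series.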
The first step is to show that $\phi_N$ converges uniformly in probability to the identity function. Since $\lim_{N \to \infty} \mathbb{E}\bigl(\tilde\phi_N(t) - \tilde\phi_N(s)\bigr)^4 = 3|t - s|^2$, Billingsley's moment condition (Theorem 12.3 of [Bil68]) shows that $(\tilde\phi_N)$ is tight in the uniform modulus. In addition, $\mathbb{E}\tilde\phi_N(t) = 0$, $\mathbb{E}\tilde\phi_N(t)^2 = t$ (for $0 \le t \le T$), and $\tilde\phi_N$ has independent increments, so Theorem 19.2 of [Bil68] implies the weak convergence of $\tilde\phi_N$ to standard Brownian motion.
An application of Proposition 3.1 then implies (i), once the condition (10) has been established.
(Condition (12) is again trivial, since S N is constant on the intervals between points of the range of φ N .) These simply say that the expectations of the time-changed process F N and of the original process are sufficiently close to one another, and to S = lim n→∞ E S N . Since the f i are i.i.d., the latter condition is trivial.
For any positive integer n, by Lemma 2.5, n k where the big O terms are uniform in n between 0 and T N for any fixed T . It follows that where n = [Nt] completing the proof of (i).
(ii) The asymptotic second-order behavior goes essentially as in section 4.1. The only condition in Lemma 4.5 which is not trivial is (28); but this holds as well, since ≤ 4 lim δ→0 δ −1 P 2 jumps in (t, t + δ) As before, it remains only to show that S N (k) may be substituted for F N (k) in the limit, which will follow if Proposition 3.2 may be applied to show that in the joint probability space N 1/3 S N (k) − F N (k) converges to 0 in probability. In this case, the conditioning σ-algebra R n is trivial.
The condition (17) is satisfied for some $D$ with $q = 2$, because $S(t) = e^{-t}t^k/k!$ is smooth and has vanishing first derivative and negative second derivative at $t_0 = k$, while being bounded between 0 and $S(t_0)$. Condition (19) is simply an elementary statement about the variance of a Poisson distribution.
To establish condition (18), observe that for $t > k$ The calculation is then the same as (26), with Lemma 4.4 in place of Lemma 4.2; $t < k$ is nearly identical. It follows that $N^{-1/3}\bigl(c_k(kN) - N F_N(k)\bigr) \to_P 0$, and so that $N^{-1/3}\bigl(c^*_k - c_k(kN)\bigr)$ converges to the desired distribution. (Recall that $f_i(t)$ is 1 if $E_{i,1} + \cdots + E_{i,k} \le t < E_{i,1} + \cdots + E_{i,k+1}$, and 0 otherwise.)

Since the differences
N (k) converge individually in probability to 0 (as $N \to \infty$), any finite collection of them converges jointly to 0. On the scale of $\sqrt{N}$ the $c^*_k$ thus have the same joint distribution as the variables F converges in distribution to a multivariate normal distribution with mean 0 and covariance matrix $c(k_1, k_2) = q(k_1)\bigl(q(k_2 - k_1) - q(k_2)\bigr)$ for $k_1 \le k_2$.
Fix a real γ, and define for real u and positive integers k ≤ P A(k, u) ∩ A(k , u ) + P A(k, u) P A(k , u ) .
Each of these terms on the right is bounded above by P E ≤ γ(u ∨ u ) 2 , where E is an exponential variable with expectation 1. Thus, By Lemma 4.5, this is precisely the condition required for the corrections to c * k of order N 1/3 to be asymptotically independent of one another.
This example may readily be generalized by changing the parameters of the exponential variables.
For instance, if one wishes to have probability p i of throwing the ball into box i, simply assign the variables E i,j all to have expectation 1/p i , instead of 1. Another possibility is to change the probability of picking a given box according to its contents. For instance, letting E i,j be exponential with expectation 1/j will distribute the balls to the boxes according to Bose-Einstein statistics. In most imaginable variations, the nature of the limiting behavior is unchanged.
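These variations all fit one small simulation scheme: keep, for each box, the time at which its next ball arrives, and repeatedly serve the earliest arrival. The sketch below is one way to do this, with a hypothetical parameter `mean_wait(i, j)` giving the expectation of $E_{i,j}$, the waiting time for the $j$-th ball in box $i$; it is an illustration of the embedding, not code from the paper.

```python
import heapq
import random

def allocate_by_clocks(num_boxes, num_balls, mean_wait, rng):
    """Allocate balls in order of arrival, where box i receives its j-th ball
    after an exponential waiting time with expectation mean_wait(i, j)
    (j = 1, 2, ...). Returns the final contents of the boxes."""
    # rng.expovariate takes the rate, i.e. 1 / expectation
    heap = [(rng.expovariate(1.0 / mean_wait(i, 1)), i) for i in range(num_boxes)]
    heapq.heapify(heap)
    contents = [0] * num_boxes
    for _ in range(num_balls):
        t, i = heapq.heappop(heap)       # earliest pending arrival
        contents[i] += 1
        j = contents[i] + 1              # index of this box's next ball
        heapq.heappush(heap, (t + rng.expovariate(1.0 / mean_wait(i, j)), i))
    return contents

rng = random.Random(3)                   # fixed seed, for illustration only
uniform = allocate_by_clocks(50, 500, lambda i, j: 1.0, rng)
bose_einstein = allocate_by_clocks(50, 500, lambda i, j: 1.0 / j, rng)
p = [0.9, 0.1]
weighted = allocate_by_clocks(2, 2000, lambda i, j: 1.0 / p[i], rng)
print(max(uniform), max(bose_einstein), weighted)
```

With expectation $1/j$ the fullest boxes attract balls fastest, so the occupancy profile is far more uneven than under uniform throwing; with expectation $1/p_i$ the long-run share of box $i$ is $p_i$.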