Renewal theory for asymmetric $U$-statistics

We extend a functional limit theorem for symmetric $U$-statistics [Miller and Sen, 1972] to asymmetric $U$-statistics, and use this extension to prove several renewal theory results for asymmetric $U$-statistics. Some applications are given.


Introduction
Let $X, X_1, X_2, \dots$ be an i.i.d. sequence of random variables taking values in an arbitrary measurable space $S = (S, \mathcal{S})$. (In most cases, $S = \mathbb{R}$ or perhaps $\mathbb{R}^k$, or a Borel subset of one of these, but we can just as well consider the general case.) Furthermore, let $d \ge 1$ and let $f : S^d \to \mathbb{R}$ be a given measurable function. We then define the (real-valued) random variables
$$U_n = U_n(f) := \sum_{1 \le i_1 < \dots < i_d \le n} f(X_{i_1}, \dots, X_{i_d}), \qquad n \ge 0. \tag{1.1}$$
We call $U_n$ a $U$-statistic, following Hoeffding [14].
Remark 1.1. Many authors, including Hoeffding [14], normalize $U_n$ by dividing the sum in (1.1) by $\binom{n}{d}$, the number of terms in it; the traditional definition (which assumes $n \ge d$) is thus in our notation $U_n/\binom{n}{d}$. We find it more convenient for our purposes to use the unnormalized version above.
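To make the definition concrete, here is a minimal sketch (not from the paper) of the unnormalized statistic (1.1) and Hoeffding's normalized version; the helper names `u_statistic` and `u_statistic_normalized` are ours.

```python
from itertools import combinations
from math import comb

def u_statistic(xs, f, d):
    """Unnormalized U-statistic (1.1): sum f over all increasing d-tuples of indices."""
    return sum(f(*(xs[i] for i in idx))
               for idx in combinations(range(len(xs)), d))

def u_statistic_normalized(xs, f, d):
    """Hoeffding's version: divide by C(n, d), the number of terms in the sum."""
    return u_statistic(xs, f, d) / comb(len(xs), d)

# An asymmetric kernel with d = 2: f(x, y) = 1{x < y}, so U_n counts
# the non-inverted pairs of the sequence.
xs = [3, 1, 4, 1, 5]
f = lambda x, y: 1 if x < y else 0
print(u_statistic(xs, f, 2))             # -> 6
print(u_statistic_normalized(xs, f, 2))  # -> 0.6  (= 6 / C(5, 2))
```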
It is common, following Hoeffding [14], to assume that $f$ is a symmetric function of its $d$ variables. In this case, the order of the variables does not matter, and we can in (1.1) equivalently sum over all sequences $i_1, \dots, i_d$ of $d$ distinct elements of $\{1, \dots, n\}$, up to an obvious factor of $d!$. ([14] gives both versions.) Conversely, if we sum over all such sequences, we may without loss of generality assume that $f$ is symmetric. However, in the present paper we consider the general case of (1.1) without assuming symmetry; for emphasis we may call these asymmetric $U$-statistics. One of the purposes of this paper is to generalize a result by [24] on functional convergence from the symmetric case to the general, asymmetric case. We then use this result to derive some renewal theory results for the sequence $U_n$. One motivation for this is a set of applications to random restricted permutations, see Section 5.
Univariate limit results, i.e., limits in distribution of $U_n$ after suitable normalization, are well known also in the asymmetric case, see e.g. [18, Chapter 11.2]. The possibility of functional limits is briefly mentioned in [18, Remark 11.25], and a special case ($d = 2$ and $f$ antisymmetric) was studied in [22], see Example 5.1. However, we are not aware of functional limit theorems in the generality of the present paper.

Date: 14 April, 2018.
2010 Mathematics Subject Classification. 60F05; 60F17, 60K05.
Partly supported by the Knut and Alice Wallenberg Foundation.
The main results are stated in Section 3. The proofs are given in Section 4; they use standard methods, in particular the decomposition and projection method of Hoeffding [14], but some complications arise in the asymmetric case. Some examples and applications are discussed in Section 5; this includes the applications to random restricted permutations that gave the initial motivation to write the present paper. We end with some further comments and open problems in Section 6; this includes more comments on the relation between the symmetric and asymmetric cases.
The results in the present paper focus on the non-degenerate case, where the covariance matrix $\Sigma = (\sigma_{ij})$ defined by (3.2) below is non-zero. In the degenerate case when $\Sigma = 0$, the results still hold but are less interesting, since the obtained limits in e.g. Theorem 3.2 are degenerate. See Remark 6.3 for further comments on the degenerate case.

Some notation
We consider as in the introduction, unless otherwise said, some given i.i.d. random variables $X_i \in S$ and a given function $f : S^d \to \mathbb{R}$. In particular, $d \ge 1$ is fixed, and we therefore often omit it from the notation.
$\mathcal{F}_n$ is the $\sigma$-field generated by $X_1, \dots, X_n$. If we consider a limit as $n \to \infty$, and $a_n$ is a given sequence, then $o_{\mathrm{a.s.}}(a_n)$ denotes a sequence of random variables $R_n$ such that $R_n/a_n \xrightarrow{\mathrm{a.s.}} 0$. This extends to other limits such as $x \to \infty$, mutatis mutandis.
$C$ denotes positive constants that may change from one occurrence to the next; they may depend on $d$ (or $\tilde d$) but not on $f$ or $n$ or other variables.
Similarly, $C_f$ denotes constants that may depend on $f$, $C_p$ denotes constants that may depend on the parameter $p$ (and $d$), and so on.

Main results
3.1. Limit theorems. For completeness, we begin with the law of large numbers, extending the result by Hoeffding [15] to the asymmetric case.
Next we state a functional limit theorem, extending the theorem by Miller and Sen [24] for the symmetric case. We use the space $D[0,\infty)$ with the usual Skorohod topology, see e.g. [23, Appendix A2]; recall that convergence in $D[0,\infty)$ to a continuous limit is equivalent to uniform convergence on every compact interval $[0,T]$. We define the $d \times d$ matrix $\Sigma = (\sigma_{ij})$ by (3.2), with $f_i, f_j$ defined by (4.1) below. Let $\mathbf{W}(t) := (W_1(t), \dots, W_d(t))$, $t \ge 0$, be a continuous $d$-dimensional Gaussian process with $\mathbf{W}(0) = 0$ and stationary independent increments whose covariances are given by $\Sigma$. Note that each component $W_j$ is a standard Brownian motion up to a factor $\sigma_{jj}^{1/2}$, and that we can represent $\mathbf{W}$ as in (3.4). We extend $U_n$, defined by (1.1), to a function of a real variable by $U_x := U_{\lfloor x \rfloor}$, $x > 0$. (We tacitly do the same for other sequences later.) Theorem 3.2 then asserts the functional convergence (3.5), where $Z_t$ is a continuous centered Gaussian process that can be defined by (3.6); equivalently, $Z_t$ has the covariance function (3.7), for $0 \le s \le t$. Moreover, (3.5) holds jointly for several functions $f^{(k)}$, possibly with different $d^{(k)}$, with limits given by (3.6), where the corresponding $W_j^{(k)}$ together form a Gaussian process with stationary independent increments given by the covariances (3.8). The Itô integrals in (3.6) can by (3.4) be written as linear combinations of $t^k \int_0^t s^{d-1-k} \, dW_j(s)$ with $0 \le k \le d-j$; thus $Z_t$ is well defined and continuous for $t \ge 0$, with $Z_0 = 0$. These stochastic integrals can also, by integration by parts, be expressed as Riemann integrals of continuous stochastic processes, see (4.18).
Note that the final integral in (3.7) is elementary, for any given $i, j, d$, and that the covariance function in (3.7) is a homogeneous polynomial in $s$ and $t$ of degree $2d - 1$.

Example 3.3. In the case $d = 2$, we obtain the covariance (3.9) from (3.7), still for $0 \le s \le t$.

Remark 3.4. By (3.4) and the binomial theorem, (3.10) holds. In the symmetric case, all $f_i$ are equal and thus all $\sigma_{ij}$ are equal, see (3.2). Hence, (3.7) simplifies by (3.10) to (3.11). This recovers the result by Miller and Sen [24] for the symmetric case. Note that our general result Theorem 3.2 is similar to the symmetric case, with a continuous Gaussian limit process, but the covariance function is in general more complicated, as seen for $d = 2$ in (3.9), so the limit is in general not a Brownian motion.
By restricting attention to $t = 1$, we obtain the following univariate limit, shown in [18, Corollary 11.20]. We further define the first-passage times
$$N_-(x) := \sup\{n \ge 0 : U_n \le x\}, \tag{3.16}$$
$$N_+(x) := \inf\{n \ge 0 : U_n > x\}. \tag{3.17}$$
Note that if $f \ge 0$, then $N_+(x) = N_-(x) + 1$, but if $f$ attains negative values, then $N_-(x) > N_+(x)$ is possible. Most of our results apply to both $N_+$ and $N_-$; we then use $N_\pm$ to denote either of them.
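The first-passage times can be illustrated with a short sketch; the helper `u_stat` and the brute-force recomputation of $U_n$ from scratch are our illustrative choices, adequate only for small $n$.

```python
from itertools import combinations

def u_stat(xs, f, d):
    # Unnormalized U-statistic (1.1) on the first len(xs) variables.
    return sum(f(*(xs[i] for i in t))
               for t in combinations(range(len(xs)), d))

def n_plus(xs, f, d, x):
    # N_+(x) = inf{n >= 0 : U_n > x}, as in (3.17); None if never exceeded.
    for n in range(len(xs) + 1):
        if u_stat(xs[:n], f, d) > x:
            return n
    return None

# For a nonnegative kernel, U_n is nondecreasing in n, so the last n with
# U_n <= x is N_+(x) - 1.
xs = [1, 2, 3, 4, 5]
f = lambda a, b: a * b          # f >= 0, d = 2
print(n_plus(xs, f, 2, 10.0))   # -> 3, since U_2 = 2 and U_3 = 11 > 10
```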
The results above easily imply some renewal theorems for U -statistics generalizing well-known results for S n (i.e., the case d = 1). We begin with a law of large numbers.
A situation that is common in applications is to stop when one process (such as our $U_n$) reaches a threshold, and then look at the value of another process, say $\tilde U_n$, at that time. For standard renewal theory, i.e. the case $d = 1$ in our setting, this was studied in [12]; we extend the main result there to (asymmetric) $U$-statistics. We consider as above an i.i.d. sequence $X_1, X_2, \dots$ with values in $S$, but we now have two functions $f : S^d \to \mathbb{R}$ and $\tilde f : S^{\tilde d} \to \mathbb{R}$, where the numbers of variables $d$ and $\tilde d$ may be different. We use the notation above for both $f$ and $\tilde f$, with $\tilde{\ }$ denoting variables defined by $\tilde f$, for example $\tilde U_n := U_n(\tilde f)$ and $\tilde\mu := E \tilde f$; we furthermore assume that the Gaussian processes $W_i(t)$ and $\tilde W_j(t)$ have the joint distribution specified by (3.8) (with obvious notational changes), and thus (3.5) holds jointly for $f$ and $\tilde f$ with limits $Z_t$ and $\tilde Z_t$.

Theorem 3.9. (i) Suppose that $f(X_1, \dots, X_d) \in L^1$, $\tilde f(X_1, \dots, X_{\tilde d}) \in L^1$ and $\mu > 0$. Then (3.18) holds as $x \to \infty$.
(ii) Suppose that $f(X_1, \dots, X_d) \in L^2$, $\tilde f(X_1, \dots, X_{\tilde d}) \in L^2$ and $\mu > 0$. Then (3.19) holds as $x \to \infty$.
The asymptotic variance $\gamma^2$ in Theorem 3.9 can easily be calculated exactly using (3.6), (3.8) and (3.4), but a general formula seems more messy than illuminating, and we state only the special case $d = 1$. (In this case, $U_n$ is the standard partial sum $\sum_{i=1}^n f(X_i)$.)

Theorem 3.11. Suppose that $f(X) \in L^2$, $\tilde f(X_1, \dots, X_{\tilde d}) \in L^2$ and $\mu > 0$. Then (3.20) and (3.21) hold as $x \to \infty$.

Continue to assume that $d = 1$, and assume for simplicity that $Y := f(X) \ge 0$ a.s. Thus $U_n(f) = S_n(f) := \sum_{i=1}^n Y_i$ is a renewal process, and its overshoot (residual lifetime) is $R(x) := S_{N_+(x)} - x$. A classical result, see e.g. [10, Theorem 2.6.2], says that if $0 < \mu < \infty$, then $R(x)$ converges in distribution. Recall that (the distribution of) $Y$ has span $d > 0$ if $Y \in d\mathbb{Z}$ a.s. and $d$ is maximal with this property, and that (the distribution of) $Y$ is nonarithmetic if no such $d$ exists.
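For $d = 1$ the overshoot is easy to compute directly from a realization; a minimal sketch (the function name `overshoot` is ours):

```python
def overshoot(ys, x):
    """R(x) = S_{N_+(x)} - x for the renewal process S_n = y_1 + ... + y_n
    with nonnegative steps y_i (the d = 1 case)."""
    s = 0.0
    for y in ys:
        s += y
        if s > x:
            return s - x
    raise ValueError("threshold not reached; supply more steps")

# Lattice example: steps in 2*Z, i.e. span 2 in the terminology above.
print(overshoot([2, 2, 2, 2], 5))    # -> 1.0: S_3 = 6 overshoots x = 5 by 1
```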
This classical result may be combined with Theorem 3.11 as follows.
Theorem 3.13. Suppose in addition to the assumptions of Theorem 3.11 that f (X) 0 a.s. Let R ∞ be as in Proposition 3.12.
(We consider only $x$ such that we condition on an event of positive probability.)

3.3. Moment convergence. In Corollary 3.5, we have convergence of the second moment in (3.12), and trivially also of the first moment. We also have convergence of higher moments, provided we assume the corresponding integrability of $f$.

Theorem 3.15. Suppose that $f(X_1, \dots, X_d) \in L^p$ with $p \ge 2$. Then (3.12) holds with convergence of all moments and absolute moments of order $p$.
(i) Then, (3.26) holds with convergence of all moments and absolute moments.

Proofs
4.1. Limit theorems. The method used by Hoeffding [14] and many later papers is a decomposition, which in the asymmetric case is as follows. Assume that $f(X_1, \dots, X_d) \in L^2$ and define, recalling (2.1),
$$f_j(x) := E\, f(X_1, \dots, X_{j-1}, x, X_{j+1}, \dots, X_d) - \mu, \qquad 1 \le j \le d, \tag{4.1}$$
$$f^*(x_1, \dots, x_d) := f(x_1, \dots, x_d) - \mu - \sum_{j=1}^{d} f_j(x_j). \tag{4.2}$$
(In general, these are defined only a.e., but that is no problem.) Then, by the definition (1.1),
$$U_n = \binom{n}{d} \mu + \sum_{j=1}^{d} \sum_{i=1}^{n} a_{n,j}(i) f_j(X_i) + U_n(f^*), \tag{4.3}$$
where $a_{n,j}(i) := \binom{i-1}{j-1} \binom{n-i}{d-j}$ is the number of increasing $d$-tuples in $\{1, \dots, n\}$ whose $j$th element is $i$. We consider the three terms in (4.3) separately. The first is a constant, and we shall see that the third term is negligible, so the main term is the second term.
Remark 4.1. The decomposition (4.3) may be continued to higher terms by expanding f * further, see e.g. [14] for the symmetric case and [18, Chapter 11.2] in general; this is important when treating degenerate cases, see Remark 6.3, but for our purposes we have no need of this.
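The decomposition can be verified numerically in a small case. The sketch below assumes the standard forms $f_j(x) = E f(X_1, \dots, X_{j-1}, x, X_{j+1}, \dots, X_d) - \mu$ and $a_{n,j}(i) = \binom{i-1}{j-1}\binom{n-i}{d-j}$ (the number of increasing $d$-tuples whose $j$th entry is $i$); for the kernel $f(x_1, x_2) = x_1 x_2$ with Bernoulli(1/2) variables these are explicit, and the resulting three-term identity holds pathwise, not just in distribution.

```python
from itertools import combinations
from math import comb

# d = 2, f(x1, x2) = x1 * x2, X ~ Bernoulli(1/2): mu = 1/4, and (under the
# assumed definitions) f_1(x) = f_2(x) = x/2 - 1/4 and
# f_star(x1, x2) = (x1 - 1/2) * (x2 - 1/2).
mu = 0.25
f = lambda a, b: a * b
f_proj = lambda x: x / 2 - 0.25
f_star = lambda a, b: (a - 0.5) * (b - 0.5)

def u_stat(xs, kernel):
    return sum(kernel(xs[i], xs[j])
               for i, j in combinations(range(len(xs)), 2))

xs = [1, 0, 1, 1, 0, 1]          # a fixed realization
n, d = len(xs), 2
# a_{n,j}(i) with 0-based i: j = 1 gives C(i,0)*C(n-1-i,1); j = 2 gives C(i,1)*C(n-1-i,0).
linear = sum((comb(i, 0) * comb(n - 1 - i, 1) + comb(i, 1) * comb(n - 1 - i, 0))
             * f_proj(xs[i]) for i in range(n))
lhs = u_stat(xs, f)
rhs = comb(n, d) * mu + linear + u_stat(xs, f_star)
assert abs(lhs - rhs) < 1e-9     # the three-term decomposition is exact
```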
For the second term, we define for convenience, for $1 \le j \le d$ and $n \ge 1$, $\Delta a_{n,j}(i) := a_{n,j}(i+1) - a_{n,j}(i)$, $1 \le i < n$. Recall $\psi(s,t)$ defined in (3.4), and let $\psi'(s,t)$ denote $\frac{\partial}{\partial s} \psi(s,t)$.

Lemma 4.2. Uniformly for all $n, j, i$ such that the variables are defined, $a_{n,j}(i) = O(n^{d-1})$ and $\Delta a_{n,j}(i) = O(n^{d-2})$. Furthermore (for $d \ge 2$), any error term $O(n^{-1})$ or $O(n^{-2})$ here vanishes identically.
By the Skorohod coupling theorem [23, Theorem 4.30], we may assume that the convergence in (4.14) holds a.s., and thus as $n \to \infty$, uniformly for $t \in [0, T]$ and all $j$, for every fixed $T < \infty$. (Note that the error term here, $R_{n,j,t}$ say, is random; the uniformity means that $\sup_{j \le d,\, t \le T} |R_{n,j,t}| \xrightarrow{\mathrm{a.s.}} 0$ for every $T$.) Fix $T$, and let $m = \lfloor ns \rfloor$ with $s \le T$. Then, by (4.13), (4.15) and Lemma 4.2, (4.16) holds uniformly for $s \in [0, T]$. An integration by parts yields (4.18) (with stochastic integrals), and combining (4.16), (4.17) and (4.18) yields, using $\psi_j(m, m) = n^{d-1} \psi_j(s, s)$, the desired convergence uniformly for $0 \le s \le T$. Since $T$ is arbitrary, this yields (4.11), jointly for all $j$.
To show that the final term in (4.3) is negligible, we give another lemma. Cf. [30] for similar results in the symmetric case.
Proof. (i): We introduce another decomposition of $f$ and $U_n$, which unlike the one in (4.1)–(4.3) focuses on the order of the arguments. Let $F_0 := \mu$ and, for $1 \le k \le d$, define the kernels $F_k$; in other words, the decomposition is obtained using a summation by parts and the identity (4.27). Since the right-hand side is weakly increasing in $n$, the corresponding maximal bound follows. Each increment $\Delta U_n(F_k)$ is a sum of $\binom{n-1}{k-1}$ terms $F_k(X_{i_1}, \dots, X_{i_{k-1}}, X_n)$ that all have the same distribution, and thus Minkowski's inequality applies; moreover, $U_n(F_k)$, $n \ge 0$, is a martingale. Consequently, using (4.28), Doob's inequality yields the desired bound. Finally, (4.27), (4.30) and Minkowski's inequality complete the proof of (i).

(ii): Assume $f_k = 0$. It was seen in the proof of (i) that $\Delta U_n(F_k)$ is a sum of $\binom{n-1}{k-1}$ terms $F_k(X_{i_1}, \dots, X_{i_{k-1}}, X_n)$. It now follows from (4.33) that if $\{i_1, \dots, i_{k-1}\}$ and $\{j_1, \dots, j_{k-1}\}$ are two disjoint sets of indices, then the corresponding terms are orthogonal, by first conditioning on $X_n$.

Proof of Theorem 3.2. The second term in (4.3) is $\sum_{j=1}^{d} U_{\lfloor nt \rfloor, j}$, using the notation in (4.10), and we use Lemma 4.3; (4.11) shows that this term divided by $n^{d-1/2}$ converges in $D[0, \infty)$ to $Z_t$ defined in (3.6).
For the third term, we apply Lemma 4.4 to $f^*$. It follows from the definition (4.2) that $\mu^* := E f^*(X_1, \dots, X_d) = 0$ and, applying (4.1) to $f^*$, that $(f^*)_i = 0$ for every $i \le d$. Hence, Lemma 4.4(ii) applies to $f^*$ and yields the bound $C n^{2d-2} \|f\|_2^2$.

Proof of Theorem 3.1. We do this in several steps.
Step 2. Assume now $f \in L^1$ and $f \ge 0$. Define the truncation $f_M := f \wedge M$. Then $f_M \in L^2$, and Step 1 shows that for every $M < \infty$, a.s., (3.1) holds for $f_M$.

Step 3. Continue to assume $f \in L^1$ and $f \ge 0$. For every permutation $\pi \in S_d$, let $f_\pi(X_1, \dots, X_d) := f(X_{\pi(1)}, \dots, X_{\pi(d)})$, and let $F := \sum_{\pi \in S_d} f_\pi$ and $g := F - f = \sum_{\pi \ne \mathrm{id}} f_\pi$. Note that $f, g \in L^1$ with $f, g \ge 0$; thus Step 2 applies to both $f$ and $g$. Furthermore, $F = f + g$ is symmetric, so we have $U_n(F)/\binom{n}{d} \to E\, F(X_1, \dots, X_d)$ a.s. by the theorem of Hoeffding [15] for the symmetric case. (This case has a simple reverse martingale proof, see Remark 6.6.) Consequently, (4.40) holds a.s. Combined with Step 2, this shows (3.1) for every $f \in L^1$ with $f \ge 0$.
Step 4. The general case follows by linearity.
We used for convenience the known symmetric case in this proof. An alternative would be to use suitable truncations, similarly to the original proof of the symmetric case by Hoeffding [15].

Lemma 4.5. Suppose that $f(X_1, \dots, X_d) \in L^2$. Then, as $n \to \infty$, (4.41) holds, with $Z_1$ defined by (3.6).

Proof. We may assume $\mu = 0$. Then the variance can be computed exactly; consequently, by (3.7), it converges to $\operatorname{Var} Z_1$. Furthermore, this equals the sum in (4.41), as is seen by taking $s = t = 1$ in (3.7) and evaluating the resulting Beta integral.
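The symmetrization in Step 3 of the proof of Theorem 3.1 rests on the identity that summing the symmetrized kernel $F = \sum_{\pi} f_\pi$ over increasing index tuples equals summing $f$ over all ordered tuples of distinct indices. A quick numerical check (helper names are ours):

```python
from itertools import combinations, permutations

def u_stat(xs, f, d):
    # U_n(f): sum over increasing index d-tuples, as in (1.1).
    return sum(f(*(xs[i] for i in t))
               for t in combinations(range(len(xs)), d))

def u_stat_all_orders(xs, f, d):
    # Sum of f over ALL ordered d-tuples of distinct indices.
    return sum(f(*(xs[i] for i in t))
               for t in permutations(range(len(xs)), d))

xs = [0.3, 0.7, 0.1, 0.9]
f = lambda a, b: a * a * b              # an asymmetric kernel, d = 2
F = lambda a, b: f(a, b) + f(b, a)      # F = sum of f over both orderings
assert abs(u_stat(xs, F, 2) - u_stat_all_orders(xs, f, 2)) < 1e-12
```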
Remark 4.6. Similarly, it follows more generally that $\operatorname{Cov}(U_{\lfloor ns \rfloor}, U_{\lfloor nt \rfloor})/n^{2d-1} \to \operatorname{Cov}(Z_s, Z_t)$, given by (3.7), for any fixed $s, t \ge 0$. In other words, (3.5) holds with convergence of second moments.
Proof of Corollary 3.5. The functional limit (3.5) implies, since $Z_t$ is continuous, convergence (in distribution) for each fixed $t \ge 0$. Taking $t = 1$ we obtain (3.12) with $\sigma^2 = \operatorname{Var} Z_1$, which is evaluated by Lemma 4.5. By (3.7) and (3.2), this is expressed in terms of the covariances $\sigma_{ij} = \operatorname{Cov}(f_i(X), f_j(X))$.

4.2. Renewal theory.
Proof of Theorem 3.7. Consider first $N_-$. Note that Theorem 3.1 and $\mu > 0$ imply $U_n \to \infty$ a.s., and then $N_-(x) < \infty$ for every $x$.
Restart the process after $N_+(x_-)$ and continue until $N_+(x)$. Since $N_+(x_-)$ is a stopping time, this continuation is independent of what happened up to $N_+(x_-)$, and thus it can be regarded as a renewal process $S^*_n$ starting at 0 and running to $N_+(\Delta x)$; in particular, the overshoot $R^*(\Delta x)$ of this renewal process equals the overshoot $R(x)$ of the original one. Here $\Delta x$ is random, but independent of the renewal process $S^*_n$, and since $\Delta x \xrightarrow{\mathrm{p}} \infty$, the convergence in distribution of $R^*(\Delta x)$ still holds. Furthermore, this holds conditioned on any events $E(x_-)$ that depend on the original process up to $N_+(x_-)$, provided $\liminf_{x\to\infty} P(E(x_-)) > 0$.
If the span $d > 1$, then $R(x) = k$ implies $x + k = U_{N_+(x)} \equiv 0 \pmod{d}$ and thus $x \equiv -k \pmod{d}$, so we consider only $x \in -k + d\mathbb{Z}$. Let $k_0 := d\lceil k/d \rceil$ and $\Delta := k_0 - k \in [0, d-1]$. Then $x - \Delta \equiv x + k \equiv 0 \pmod{d}$, and thus, by (4.74), we may replace $x$ and $k$ by $x - \Delta$ and $k_0$; it then suffices to consider $x, k \in d\mathbb{Z}$, and we can reduce to the case $d = 1$ by replacing $f(X)$ by $f(X)/d$.

4.3. Moment convergence.
We turn to proving the theorems on moment convergence in Section 3.3, and begin by extending Lemma 4.4 to higher absolute moments.
Proof. We use the same decomposition as in the proof of Lemma 4.4. Note that, by Jensen's inequality, $\|F_k\|_p \le \|f\|_p$. Hence, Minkowski's inequality yields, as in (4.28), a bound on the increments. Consequently, the Burkholder inequalities [11, Theorem 10.9.5(i)] applied to the martingale $U_n(F_k)$ yield, using also Hölder's inequality, the corresponding $L^p$ bound (4.79). Finally, (4.27), (4.79) and Minkowski's inequality yield (4.75).
We shall also use the following standard result, stated in detail and proved for convenience and completeness.

Proof. If $\delta > 0$ and $E$ is any event with $P(E) \le \delta$, then, using Hölder's inequality, the contribution from $E$ can be estimated. Since $\varepsilon$ is arbitrary, this can be made arbitrarily small, uniformly in $\alpha$, by first choosing $\varepsilon$ and then $\delta$ small.

Proof of Theorem 3.15.
Denote the left-hand side of (3.12) by $V_n$. Then $E|V_n|^p$ is bounded, by Lemma 4.7. This implies convergence of all moments and absolute moments of order $< p$ in (3.12) by standard arguments, but it is not by itself enough to include moments of order $p$. Thus we use a truncation: let $M > 0$ and define the truncated versions in (4.82)–(4.83); we also use the analogous bound with $2p$ instead of $p$. We use another simple lemma.
Proof of Theorem 3.16. As usual, we consider for definiteness $N_-(x)$, defined in (3.16). Suppose throughout that $x \ge 1$, and recall $n(x)$ defined by (4.49). By (4.93) and Lemma 4.7, we obtain a tail bound for any $p > 0$ and any $A \ge 1$; hence, for any $p \ge 0$ and $q > 0$, a corresponding bound follows using (4.94). Choosing $q := 2dp$ and summing (4.94) with $A = 2$ and (4.95) with $A = 2^k$, $k = 1, 2, \dots$, we obtain, for every $p > 0$, that if $Y(x)$ denotes the left-hand side of (3.19), then $E|Y(x)|^p \le C_{p,f}$ for $x \ge 1$. By standard arguments [11], this implies uniform integrability of $|Y(x)|^r$ for any $r < p$, and thus by (3.19) convergence of moments of order $< p$. Since $p$ is arbitrary, convergence of arbitrary moments in (3.19) follows. Moment convergence in (3.18) is an immediate corollary. Alternatively, a direct bound for every fixed $p > 0$ implies moment convergence in (3.18) by the same uniform integrability argument.
It follows from (4.99) and (4.103)–(4.105) that the required moment bound holds. Since $p$ is arbitrary, this implies convergence of arbitrary moments in (3.21) by the same standard argument as in the proof of Theorem 3.16. Moment convergence in (3.20) is a corollary.
Proof of Theorem 3.18. (i): This is a special case of Theorem 3.17.
(ii): Denote the left-hand side of (3.26) by $V(x)$, for integers $x \ge 1$, and let $p > 0$. It follows from (i) that the family $|V(x)|^p$, $x \ge 1$, is uniformly integrable. This property is preserved by the conditioning, since we condition on a sequence of events $E_x$ with $\liminf_{x\to\infty} P(E_x) > 0$ by the proof of Theorem 3.13; hence the result follows from Theorem 3.13.
Example 5.2 (Substrings). Consider a random string $X_1 \cdots X_n$ of length $n$ from a finite alphabet $A$, with the letters $X_i$ i.i.d. with some distribution $P(X_i = a) = p_a$, $a \in A$. Fix a pattern $W = w_1 \cdots w_m$; this is an arbitrary string in $A^m$, for some $m \ge 1$. A substring of $X_1 \cdots X_n$ is any string $X_{i_1} \cdots X_{i_k}$ with $1 \le i_1 < \dots < i_k \le n$, and we let $N_n = N_W(X_1 \cdots X_n)$ be the number of substrings that equal the pattern $W$. Obviously, this is an asymmetric $U$-statistic as in (1.1) with $S = A$, $d = m$ and $f(x_1, \dots, x_m) = \mathbf{1}\{x_1 \cdots x_m = W\}$. Corollary 3.5 yields asymptotic normality of $N_n$ as $n \to \infty$, as shown by Flajolet, Szpankowski and Vallée [8].
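Although $N_n$ is a sum over $\binom{n}{m}$ index tuples, it can be computed in $O(nm)$ time by a standard dynamic program; a sketch (the function name is ours):

```python
def count_subsequence_occurrences(text, pattern):
    """Number of (not necessarily contiguous) substrings of `text` equal to
    `pattern`, i.e. N_n of Example 5.2, without enumerating index tuples."""
    m = len(pattern)
    dp = [0] * (m + 1)   # dp[k] = ways to realize the first k pattern letters
    dp[0] = 1
    for ch in text:
        # downwards in k, so each text position is used at most once per tuple
        for k in range(m, 0, -1):
            if pattern[k - 1] == ch:
                dp[k] += dp[k - 1]
    return dp[m]

print(count_subsequence_occurrences("abab", "ab"))   # -> 3: (1,2), (1,4), (3,4)
```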

Example 5.3 (Patterns in permutations). Let $\pi = \pi_1 \cdots \pi_n$ be a uniformly random permutation of length $n$, and let the pattern $\sigma = \sigma_1 \cdots \sigma_m$ be a fixed permutation of length $m$. The number of occurrences of $\sigma$ in $\pi$, denoted $N_n = N_\sigma(\pi)$, is the number of substrings (see Example 5.2) of $\pi$ that have the same relative order as $\sigma$.
We can generate the random permutation $\pi$ by taking i.i.d. random variables $X_1, \dots, X_n \sim U(0,1)$ and then replacing these numbers by their ranks. Then $N_n$ is the $U$-statistic with $d = m$ given by the function
$$f(x_1, \dots, x_m) = \mathbf{1}\{x_1 \cdots x_m \text{ have the same relative order as } \sigma_1 \cdots \sigma_m\}. \tag{5.4}$$
Corollary 3.5 shows that $N_n$ is asymptotically normal as $n \to \infty$. For details, including explicit variance calculations, see [21]; see also the earlier proof of asymptotic normality by Bóna [3, 4].
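For small instances, $N_\sigma(\pi)$ can be computed by brute force directly from the definition; a sketch (helper names are ours):

```python
from itertools import combinations

def rank_pattern(seq):
    # Relative-order pattern of distinct numbers, e.g. (4, 1, 9) -> (2, 1, 3).
    order = sorted(seq)
    return tuple(order.index(v) + 1 for v in seq)

def count_pattern(pi, sigma):
    """N_sigma(pi): substrings of pi with the same relative order as sigma,
    by enumerating all C(n, m) index sets (fine for small n only)."""
    m = len(sigma)
    return sum(1 for idx in combinations(range(len(pi)), m)
               if rank_pattern([pi[i] for i in idx]) == tuple(sigma))

pi = [3, 1, 4, 2]
print(count_pattern(pi, (2, 1)))   # -> 3 inversions: (3,1), (3,2), (4,2)
print(count_pattern(pi, (1, 2)))   # -> 3 = C(4,2) - 3 non-inverted pairs
```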
For example, taking $\sigma = 21$, $N_n$ is the number of inversions in $\pi$, and we obtain by simple calculations the well-known result; see e.g. [7, Section X.6].

Example 5.4 (Restricted permutations I). Fix a set $T$ of permutations, and consider only permutations $\pi$ of length $n$ that avoid $T$, in the sense that there is no occurrence of any $\tau \in T$ in $\pi$. Let $\pi$ be uniformly random from this set, for a given $n$. Several cases are studied in [20], and some of them yield asymmetric $U$-statistics, sometimes stopped or conditioned as in Theorem 3.11 or 3.13. We sketch two examples here and in the next example, and refer to [20] for details and further similar examples.
For example, taking $\sigma = 21$, so that $N_{21,n}$ is the number of inversions in $\pi$, we have $b = 1$ and, by a calculation, $\gamma^2 = 6$; hence
$$\frac{N_{21,n} - n}{n^{1/2}} \overset{d}{\longrightarrow} N(0, 6). \tag{5.9}$$
We here applied the conditional result in Theorem 3.13. Alternatively (since a geometric distribution has no memory), we may avoid the conditioning above and instead truncate the last element $L_B$ so that the sum becomes exactly $n$; using a simple approximation argument, we can then apply the unconditional Theorem 3.11.

Let $\sigma$ be a fixed permutation that avoids $\{231, 312, 321\}$, with block lengths $\ell_1, \dots, \ell_b \in \{1, 2\}$. Then the number $N_{\sigma,n} = N_\sigma(\pi)$ of occurrences of $\sigma$ in $\pi$ is given by a $U$-statistic based on $L_1, \dots, L_B$, with $d = b$ and the function $\tilde f$ in (5.7). Theorem 3.13(iii) applies and shows asymptotic normality, for some $\mu > 0$ and $\gamma^2 > 0$ depending on $\sigma$.
For example, taking $\sigma = 21$, so that $N_{21,n}$ is the number of inversions in $\pi$, we have $b = 1$ and, by calculations, see [20], $\mu = (3 - \sqrt{5})/2$ and $\gamma^2 = 5^{-3/2}$.

If we only want to conclude convergence of a specific moment, e.g. convergence of second moments in (3.19) or (3.21), the proofs above show that it suffices to assume existence of some specific moment for $f$ and $\tilde f$. However, we do not know the best possible moment conditions for this, and we leave it as an open problem to find optimal conditions. (The proofs above are not optimized; furthermore, the methods used there are not necessarily optimal.)

For $U$-statistics based on several samples, summing over $1 \le i_{j,1} < \dots < i_{j,d^{(j)}} \le n_j$ for every $j = 1, \dots, \ell$, a multidimensional functional limit theorem has been given by Sen [28] in the symmetric case (i.e., with $f$ symmetric in each of the $\ell$ sets of variables); see also e.g. [25], [13], [6]. We expect that this too can be extended to the asymmetric case, but we leave this to the interested reader.

Remark 6.5. There is a standard trick to convert an asymmetric $U$-statistic to a symmetric one, see e.g. [18]. Let $Y_i \sim U(0,1)$ be i.i.d. random variables, independent of $(X_j)_1^\infty$, let $Z_i := (X_i, Y_i) \in \tilde S := S \times \mathbb{R}$, and define $F : \tilde S^d \to \mathbb{R}$ by
$$F((x_1, y_1), \dots, (x_d, y_d)) := f(x_1, \dots, x_d)\, \mathbf{1}\{y_1 < \dots < y_d\} \tag{6.2}$$
and its symmetrized version (6.3). This trick often makes it possible to transfer results for symmetric $U$-statistics to the general, asymmetric case. However, it works only for a single $n$, and we do not know of any similar trick that can handle the process $(U_n)_{n=0}^\infty$. Hence this method does not seem useful for the results above.

Remark 6.6. In the symmetric case, it is easily seen that $U_n/\binom{n}{d}$, $n \ge d$, is a reverse martingale, which for example yields a simple proof of the law of large numbers; see [1] and e.g. [11, Chapter 10.16.2].
This does not hold in general; thus we instead used forward martingales above (in the proof of Lemma 4.4), similarly to [15].