An Erdős–Rényi law for nonconventional sums

We obtain the Erdős–Rényi type law of large numbers for "nonconventional" sums of the form $S_n=\sum^n_{m=1}F(X_m,X_{2m},\dots,X_{\ell m})$, where $X_1,X_2,\dots$ is a sequence of i.i.d. random variables and $F$ is a bounded Borel function. The proof relies on the nonconventional large deviations estimates obtained in [8].


Introduction
Let $X_1, X_2, \dots$ be a sequence of independent identically distributed (i.i.d.) random variables such that $EX_1 = 0$ and the moment generating function $\varphi(t) = Ee^{tX_1}$ exists. Denote by $I$ the Legendre transform of $\ln\varphi$ and set $S_n = \sum_{m=1}^n X_m$ for $n \ge 1$ and $S_0 = 0$. The Erdős–Rényi law of large numbers from [4] says that with probability one
$$\lim_{n\to\infty}\max_{0\le m\le n-b_n}\frac{S_{m+b_n}-S_m}{b_n}=\alpha,\qquad b_n=\Big[\frac{\ln n}{I(\alpha)}\Big],\tag{1.1}$$
for all $\alpha > 0$ in some neighborhood of zero. The nonconventional limit theorems, initiated in [5] and partially motivated by nonconventional ergodic theorems, study asymptotic behaviors of sums of the form $S_n = \sum_{m=1}^n F(X_m, X_{2m}, \dots, X_{\ell m})$ and more general ones, where $F$ is a Borel function. In this paper we will obtain an Erdős–Rényi law similar to (1.1) for such sums, where $X_1, X_2, \dots$ is again a sequence of i.i.d. random variables and $F$ is a bounded Borel function. Observe that the summands in nonconventional sums are long range dependent, so this result cannot be derived directly from the existing literature. On the other hand, as in most proofs of the Erdős–Rényi law, we will rely on large deviations estimates, which in the nonconventional setup were obtained in [8].
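As a purely illustrative numerical sketch (not part of the paper's argument), the nonconventional sum $S_n=\sum_{m=1}^n F(X_m,X_{2m},\dots,X_{\ell m})$ can be computed directly; the choices of $F$ and of the distribution of $X_1$ below are hypothetical examples satisfying the standing assumptions ($EX_1$-type centering, bounded $F$).

```python
# Illustration only: a nonconventional sum S_n = sum_{m=1}^n F(X_m, X_{2m}, ..., X_{lm})
# for i.i.d. X_k.  F and the law of X_1 are hypothetical choices, not from the paper.
import random

def nonconventional_sum(n, ell, F, sample):
    """Return S_n = sum_{m=1}^n F(X_m, X_{2m}, ..., X_{ell*m}).

    Note that S_n needs X_1, ..., X_{ell*n}, and distinct summands share
    variables (e.g. X_6 enters the summands m = 1, 2, 3, 6 when ell >= 2),
    which is the source of the long range dependence mentioned above.
    """
    xs = [sample() for _ in range(ell * n)]  # xs[k-1] plays the role of X_k
    return sum(F(*(xs[j * m - 1] for j in range(1, ell + 1)))
               for m in range(1, n + 1))

random.seed(0)
sample = lambda: random.choice([-1.0, 1.0])  # symmetric Bernoulli, EX_1 = 0
F = lambda x, y: x * y                       # bounded Borel function, ell = 2, M = 1
s = nonconventional_sum(1000, 2, F, sample)
```

Since $|F|\le M=1$ here, the sum always satisfies $|S_n|\le nM$, which is used repeatedly in the proofs below.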
The first condition in (2.1) is not a restriction since we can always consider $F - EF(X_1,\dots,X_\ell)$ in place of $F$, while the second condition there means that $F$ is not constant almost surely (a.s.) with respect to the $\ell$-product measure $\mu^{(\ell)} = \mu\times\mu\times\cdots\times\mu$ on $\mathbb{R}^\ell$, where $\mu$ is the distribution of $X_1$. Set also $M = \|F\|_\infty$ and $M_+ = \|F^+\|_\infty$, where $F^+(x_1,\dots,x_\ell) = \max(0, F(x_1,\dots,x_\ell))$ and the $L^\infty$ norm on $\mathbb{R}^\ell$ is taken with respect to the measure $\mu^{(\ell)}$. Introduce the moment generating function $\varphi(t) = E\exp(tF(X_1,X_2,\dots,X_\ell))$ and its Legendre transform
$$I(\alpha)=\sup_{t}\big(t\alpha-\ln\varphi(t)\big).\tag{2.2}$$
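The Legendre transform (2.2) can be approximated on a finite grid of $t$; the following sketch (my illustration, with the hypothetical example $F(x,y)=xy$ and symmetric Bernoulli $X_i$, for which $\varphi(t)=\cosh t$) is only meant to make the definition concrete.

```python
# Illustration: numerical Legendre transform I(alpha) = sup_t (t*alpha - ln phi(t))
# for the hypothetical choice phi(t) = cosh(t) (F = X_1 * X_2, X_i = +-1 fair coin).
import math

def legendre(log_phi, alpha, t_grid):
    """Approximate I(alpha) by maximizing t*alpha - log_phi(t) over a finite grid."""
    return max(t * alpha - log_phi(t) for t in t_grid)

log_phi = lambda t: math.log(math.cosh(t))      # ln phi(t) >= 0, as Jensen guarantees
grid = [k / 100.0 for k in range(-1000, 1001)]  # t in [-10, 10], step 0.01

I0 = legendre(log_phi, 0.0, grid)     # should be 0: the sup is attained at t = 0
I_half = legendre(log_phi, 0.5, grid)
```

Consistently with the monotonicity properties established below, the computed values increase with $\alpha\ge 0$ and vanish only at $\alpha=0$.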
Our proof of Theorem 2.1 will follow the scheme of [2], but we will also rely on the nonconventional large deviations results from [8]. As in some books and many papers on large deviations, we did not address explicitly in [8] the crucial question of when the rate function of large deviations is positive, without which the large deviations principle is meaningless, since it does not lead to any nontrivial estimates on the domains where the rate function is zero. We will rely on the following theorem, which further specifies the results of [8] and actually provides more information than we need for the proof of Theorem 2.1.

Theorem 2.2. The limit
$$Q(\lambda F)=\lim_{n\to\infty}\frac1n\ln E\exp(\lambda S_n)\tag{2.3}$$
exists, where $Q(\lambda F)$ is a $C^\infty$ function of $\lambda$ with bounded derivatives and $S_n$, $n\ge 1$, are the nonconventional sums from Theorem 2.1. The Legendre transform of $Q$,
$$J(u)=\sup_{\lambda}\big(\lambda u-Q(\lambda F)\big),\tag{2.4}$$
is a nonnegative, convex, lower semicontinuous function such that $J(u)=0$ if and only if $u=0$, and $J(u)$ is strictly increasing for $u\ge 0$ (writing for convenience $\infty>\infty$), while it is strictly decreasing for $u\le 0$. In addition, $J(u)=\infty$ if $u>M_+$. Furthermore, the sums $S_n$, $n\ge 1$, satisfy the large deviations principle in the form
$$\limsup_{n\to\infty}\frac1n\ln P\Big(\frac{S_n}{n}\in K\Big)\le-\inf_{u\in K}J(u)$$
for any closed set $K\subset\mathbb{R}$, while
$$\liminf_{n\to\infty}\frac1n\ln P\Big(\frac{S_n}{n}\in U\Big)\ge-\inf_{u\in U}J(u)$$
for any open set $U\subset\mathbb{R}$.

Remark 2.3. Theorem 2.1 shows that the Erdős–Rényi law for nonconventional sums has the same form as for sums of i.i.d. random variables having the same distribution as $F(X_1, X_2, \dots, X_\ell)$. This is similar to the nonconventional strong law of large numbers proved in [6]. On the other hand, the nonconventional central limit theorem and the nonconventional large deviations estimates are somewhat different from the corresponding results for sums of i.i.d. random variables. In particular, it is shown in [7] that the nonconventional functional central limit theorem may yield in the limit a process with dependent increments, while concerning large deviations it follows from [8] that the rate functions $I$ and $J$ above are, in general, different.

Proof of Theorem 2.1
Let $Y_1, Y_2, \dots$ be a sequence of i.i.d. random variables which have the same distribution as $F(X_1, X_2, \dots, X_\ell)$ and set $\Sigma_n = \sum_{m=1}^n Y_m$. We will need the classical Cramér large deviation estimates in the form (see, for instance, Section 2.2 in [3])
$$\limsup_{n\to\infty}\frac1n\ln P\Big(\frac{\Sigma_n}{n}\in K\Big)\le-\inf_{\alpha\in K}I(\alpha)\tag{3.1}$$
for any closed set $K\subset\mathbb{R}$, while
$$\liminf_{n\to\infty}\frac1n\ln P\Big(\frac{\Sigma_n}{n}\in U\Big)\ge-\inf_{\alpha\in U}I(\alpha)\tag{3.2}$$
for any open set $U\subset\mathbb{R}$, where $I$ is given by (2.2). It is essential to observe that $I(\alpha) > 0$ ($I(\alpha) = \infty$ is possible) unless $\alpha = 0$, which is well known and follows, in particular, from Theorem II.6.3 in [1] (which relies on general convex analysis results), but it also has a simple direct explanation in our case. Indeed, since $\ln\varphi(0) = (\ln\varphi(t))'|_{t=0} = 0$, we have $\ln\varphi(t) = o(t)$ for small $t$. Hence, if $\alpha \ne 0$ then $t\alpha > \ln\varphi(t)$ either for small positive or for small negative $t$, and so in view of (2.2), $I(\alpha) = 0$ only when $\alpha = 0$, and otherwise $I(\alpha)$ is positive. By (2.1) and the Jensen inequality $\ln\varphi(t) \ge tEF(X_1,\dots,X_\ell) = 0$, and so (see (2.2)) for $\alpha \ge 0$ the supremum in (2.2) can be restricted to $t \ge 0$. Therefore, for any $\Delta > 0$ and $\alpha > 0$,
$$I(\alpha+\Delta)\ge\Big(1+\frac\Delta\alpha\Big)I(\alpha),$$
which means that $I(\alpha)$ is strictly increasing for $\alpha \ge 0$. Similarly, $I(\alpha)$ is strictly decreasing for $\alpha \le 0$. Observe that, in fact, for any $\varepsilon > 0$, $\inf_{|\beta|\ge\varepsilon}I(\beta) > 0$. Similar arguments relying on explicit formulas from [8] yield Theorem 2.2, but for now we will take it for granted in order to prove Theorem 2.1.
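The direct argument above (that $\ln\varphi(t)=o(t)$ forces $I(\alpha)>0$ for $\alpha\ne 0$) is easy to check numerically; the sketch below is my illustration for the hypothetical case $\varphi(t)=\cosh t$ used in the earlier examples, where maximizing over small $t$ of the right sign already produces a positive lower bound.

```python
# Illustration: for alpha != 0, small t of the same sign as alpha already give
# t*alpha - ln phi(t) > 0, hence I(alpha) > 0.  Hypothetical phi(t) = cosh(t).
import math

def lower_bound_I(alpha, ts):
    """max over the given t of t*alpha - ln cosh(t): a lower bound for I(alpha)."""
    return max(t * alpha - math.log(math.cosh(t)) for t in ts)

small_ts = [0.001 * k for k in range(1, 101)]   # t in (0, 0.1]
lb_pos = lower_bound_I(0.3, small_ts)           # alpha = 0.3: positive lower bound
lb_neg = lower_bound_I(-0.3, [-t for t in small_ts])  # alpha = -0.3: negative t side
```

Both bounds come out strictly positive, reflecting that $\ln\cosh t\approx t^2/2=o(t)$ near $0$.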
On the other hand, if $m \le (\ell-1)b_n$ and $\ell > 1$, then we write the corresponding large deviations estimate, which holds provided $n$ is large enough, and we use that $J(\beta)$ is strictly increasing when $\beta \ge 0$.
Since $|S_m| \le mM$ a.s., we have $|S_m| \le \frac{\alpha}{2}b_n$ whenever $m \le \frac{\alpha}{2M}b_n$. Now assume that $(\ell-1)b_n \ge m \ge \frac{\alpha}{2M}b_n$ and $\ell > 1$. Observe that $-S_m = \sum_{k=1}^m\big(-F(X_k, X_{2k}, \dots, X_{\ell k})\big)$, and so we can apply the nonconventional large deviations estimates of Theorem 2.2 with $F$ replaced by $-F$ and a corresponding rate function $\hat J$ having the same properties as $J$. Then we obtain the required estimate provided $n$ is large enough, using that $\hat J(\beta)$ is strictly increasing when $\beta \ge 0$ (of course, if $\hat J\big(\frac{1}{2(\ell-1)}(\alpha+\varepsilon)\big) = \infty$ then any $\delta$ will do).
Taking into account that $S_{m+b_n} - S_m$ is a sum of i.i.d. random variables having the same distribution as $F(X_1, X_2, \dots, X_\ell)$ when $(1-\ell^{-1})n \le m \le n - b_n$ (for such $m$ and all large $n$ the index sets $\{k, 2k, \dots, \ell k\}$, $m < k \le m + b_n$, are pairwise disjoint), we obtain by Cramér's lower large deviations bound (3.2) the estimate (3.15), where we choose $\delta > 0$ so small that $(I(\alpha-\varepsilon)+\delta)/I(\alpha) < 1-\delta$, which is possible since $I(\beta)$ is strictly increasing for $\beta \ge 0$. Hence, if $n$ is sufficiently large, it follows that $\sum_{n=1}^\infty P(B_n(\varepsilon)) < \infty$, and by the Borel–Cantelli lemma, with probability one $B_n(\varepsilon)$ occurs only finitely often, which implies (3.17). Since $\varepsilon > 0$ can be chosen arbitrarily small, we obtain the assertion of Theorem 2.1 from (3.12) and (3.17).
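The running-maximum statistic over windows of length $b_n$ that appears throughout this section can be simulated directly; the sketch below is my illustration with a hypothetical fixed window length in place of $[\ln n / I(\alpha)]$ and i.i.d. increments $Y_m$ with $|Y_m|\le M=1$.

```python
# Illustration: the Erdos-Renyi statistic max_{0<=m<=n-b} (Sigma_{m+b} - Sigma_m)/b
# for an i.i.d. sum, with a hypothetical window length b standing in for b_n.
import random

def erdos_renyi_max(increments, b):
    """Largest average of the increments over a sliding window of length b."""
    n = len(increments)
    prefix = [0.0]
    for y in increments:
        prefix.append(prefix[-1] + y)
    return max((prefix[m + b] - prefix[m]) / b for m in range(n - b + 1))

random.seed(1)
ys = [random.choice([-1.0, 1.0]) for _ in range(2000)]  # |Y_m| <= M = 1, EY_1 = 0
m_val = erdos_renyi_max(ys, 10)
```

The maximum never exceeds $M=1$, and for windows of logarithmic length it stabilizes near a positive constant rather than near $EY_1=0$, which is the content of the Erdős–Rényi law.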

Proof of Theorem 2.2
Theorem 2.2 mostly follows from the results of [8] together with Theorem II.6.3 from [1], but for the reader's convenience we will give a direct argument here. First, we recall relevant notations and formulas from [8]. Let $r_1, \dots, r_m \ge 2$ be all the primes not exceeding $\ell$. Set $A_n = \{a \le n:\ a \text{ is coprime with } r_1, \dots, r_m\}$ and $B_n(a) = \{b \le n:\ b = ar_1^{d_1}r_2^{d_2}\cdots r_m^{d_m} \text{ for some nonnegative integers } d_1, \dots, d_m\}$. For any function $V$ on $\mathbb{R}^\ell$ we write
$$S_n(V)=\sum_{m=1}^n V(X_m, X_{2m}, \dots, X_{\ell m}),$$
observing that $S_n$ from Theorem 2.2 equals $S_n(F)$ here.
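The index decomposition recalled from [8] is completely elementary and can be checked by computer: the sets $B_n(a)$, $a\in A_n$, partition $\{1,\dots,n\}$. The sketch below (my illustration) verifies this for $\ell=3$, where the primes not exceeding $\ell$ are $2$ and $3$, and $n=30$.

```python
# Illustration of the decomposition from [8]: the blocks B_n(a), a in A_n,
# partition {1, ..., n}.  Shown for ell = 3 (primes 2 and 3) and n = 30.
from math import gcd

def A_n(n, primes):
    """a <= n coprime with every prime in `primes`."""
    return [a for a in range(1, n + 1)
            if all(gcd(a, r) == 1 for r in primes)]

def B_n(n, a, primes):
    """b <= n of the form a * r_1^d_1 * ... * r_m^d_m with d_i >= 0."""
    block, frontier = {a}, [a]
    while frontier:
        b = frontier.pop()
        for r in primes:
            if b * r <= n and b * r not in block:
                block.add(b * r)
                frontier.append(b * r)
    return sorted(block)

primes = [2, 3]   # all primes not exceeding ell = 3
n = 30
blocks = {a: B_n(n, a, primes) for a in A_n(n, primes)}
covered = sorted(b for bs in blocks.values() for b in bs)
# `covered` lists every integer 1..n exactly once.
```

For instance, $B_{30}(5)=\{5,10,15,20,30\}$, while $B_{30}(1)$ collects all integers up to $30$ of the form $2^{d_1}3^{d_2}$.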
The existence of the limit (2.3) was proved in [8]. Recall that convexity and lower semicontinuity of the Legendre transform $J(u)$ of $Q(\lambda F)$ follow from (2.3) and (2.4) automatically (see Theorem II.6.1 in [1]). Observe that by (2.1) and the Jensen inequality $Q(\lambda F) \ge 0$, and since $Q(\lambda F) \le |\lambda|M$ we have $J(u) = \infty$ when $u > M$. Note that Theorem 2.7 in [8] is formulated for continuous functions but, in fact, only boundedness of functions is used in the proof, so we can apply it to our setup where $\|F\|_\infty = M < \infty$.
In order to exhibit the explicit formula for $Q(\lambda F)$ obtained in [8], we introduce the required quantities from there, for which it was shown in [8] that the estimate (4.1) holds for any $l \ge 1$. Set $Z_{n,a}(\lambda F) = E\exp S_{n,a}(\lambda F)$. As was explained in [8], the distribution of $S_{n,a}(\lambda F)$ depends only on $|B_n(a)|$ (in addition to $\lambda F$, of course), and so $Z_{n,a}(\lambda F)$ is determined by $|B_n(a)|$. Hence, we can set $R_l(\lambda F) = Z_{n,a}(\lambda F)$ provided $|B_n(a)| = l$, and write the formula (4.2) for $Q$ obtained in [8]. The series in (4.2) converges absolutely in view of (4.1), taking into account that $\ln R_l(\lambda F) \le lM|\lambda|$; by (2.1) and the Jensen inequality we have also that $\ln R_l(\lambda F) \ge 0$. Now observe that for any $k \ge 1$ the $k$-th derivatives in $\lambda$ of $\ln R_l(\lambda F)$ are bounded by some $C_k > 0$ depending only on $k$. It follows that $Q(\lambda F)$ is $C^\infty$ in $\lambda$, with $k$-th derivative bounded by a series with terms proportional to $(e^{-\rho_{\min}(l)} - e^{-\rho_{\max}(l)})l^k$, and the latter series converges absolutely in view of (4.1). Together with (4.6) this already yields that $J(u)$ attains its infimum at the unique point $0$ and is positive when $|u| > 0$. As in Section 3, the direct argument proceeds as follows. Since $Q(0) = 0$, (4.6) implies that $Q(\lambda F) = o(\lambda)$ for small $\lambda$, and so $|\lambda u| > Q(\lambda F)$ when $|\lambda|$ is small, which together with (2.4) yields the assertion above. Taking into account that $Q(\lambda F) \ge 0$, we see that
$$J(u)=\sup_{\lambda\ge 0}\big(\lambda u-Q(\lambda F)\big)\ \text{ if } u\ge 0\quad\text{and}\quad J(u)=\sup_{\lambda\le 0}\big(\lambda u-Q(\lambda F)\big)\ \text{ if } u\le 0.\tag{4.7}$$