Nonconventional Random Matrix Products

Let $\xi_1,\xi_2,...$ be independent identically distributed random variables and $F:\bbR^\ell\to SL_d(\bbR)$ be a Borel measurable matrix-valued function. Set $X_n=F(\xi_{q_1(n)},\xi_{q_2(n)},...,\xi_{q_\ell(n)})$ where $0\leq q_1<q_2<...<q_\ell$ are increasing functions taking on integer values on integers. We study the asymptotic behavior as $N\to\infty$ of the singular values of the random matrix product $\Pi_N=X_N\cdots X_2X_1$ and show, in particular, that (under certain conditions) $\frac 1N\log\|\Pi_N\|$ converges with probability one as $N\to\infty$. We also obtain similar results for such products when $\xi_i$ form a Markov chain. The essential difference from the usual setting appears since the sequence $(X_n)$ is long-range dependent and nonstationary.


Introduction
Products $\Pi_N=X_N\cdots X_2X_1$ of random matrices $X_1,X_2,...$ have been studied extensively for more than half a century. In the pioneering work [7] it was shown that when $X_1,X_2,...$ form a stationary sequence with $E\ln^+\|X_1\|<\infty$, the limit $\gamma_1=\lim_{N\to\infty}\frac1N\ln\|\Pi_N\|$ exists with probability one. Later, the more general Kingman subadditive ergodic theorem became available and yielded the above result as a corollary. Applying it to the actions on exterior products, the result was extended to all the singular values of $\Pi_N$, thus leading to the Oseledets multiplicative ergodic theorem.
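This convergence is easy to observe numerically. The following sketch (our own illustration; the two generating matrices and the uniform choice between them are ad hoc assumptions, not taken from the paper) estimates $\gamma_1$ for a product of i.i.d. $SL_2(\bbR)$ matrices, renormalizing at each step so that $\ln\|\Pi_N\|$ is accumulated without overflow:

```python
import numpy as np

# Two SL_2(R) generators; X_n is drawn i.i.d. uniformly from {A, B}.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [1.0, 1.0]])

def top_lyapunov_estimate(N, seed):
    """Estimate gamma_1 = lim (1/N) ln ||Pi_N|| for Pi_N = X_N ... X_2 X_1."""
    rng = np.random.default_rng(seed)
    P = np.eye(2)
    log_norm = 0.0  # accumulates ln ||Pi_n|| exactly, via renormalization
    for _ in range(N):
        X = A if rng.random() < 0.5 else B
        P = X @ P
        s = np.linalg.norm(P, 2)  # Euclidean operator norm = s_1(P)
        log_norm += np.log(s)
        P /= s  # keep ||P|| = 1; the discarded factor lives in log_norm
    return log_norm / N
```

For this pair of matrices the estimates stabilize near a positive value as $N$ grows, in line with the Furstenberg-Kesten theorem.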
In this paper we study similar questions for products of certain nonstationary sequences of random matrices. Namely, we start with a sequence of i.i.d. random variables $\xi_1,\xi_2,...$ and a Borel measurable matrix-valued function $F:\bbR^\ell\to SL_d(\bbR)$ along with integer-valued functions $0\leq q_1<q_2<...<q_\ell$, and form the random matrices $X_n=F(\xi_{q_1(n)},\xi_{q_2(n)},...,\xi_{q_\ell(n)})$. In particular, we allow arithmetic progressions $q_i(n)=in$, $i=1,...,\ell$. The sequence $X_1,X_2,...$ is long-range dependent and not stationary, and so the asymptotic behavior as $N\to\infty$ of the product $\Pi_N=X_N\cdots X_2X_1$ is not described by the standard results mentioned above. Still, we show that $\lim_{N\to\infty}\frac1N\ln\|\Pi_N\|$ exists with probability one and, applying this to exterior products, we will obtain corresponding results for all the singular values of $\Pi_N$. Similar results are obtained also for such products when $X_n=F(\xi_n,\xi_{2n},...,\xi_{\ell n})$ and the $\xi_i$ form a Markov chain satisfying certain conditions of uniform geometric ergodicity type.
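To make the setup concrete, here is a small numerical sketch (our own illustration, with ad hoc choices of $F$, $\ell=2$ and Bernoulli $\xi_i$; none of these choices come from the paper) of the nonconventional product with arithmetic progressions $q_1(n)=n$, $q_2(n)=2n$. Note that $X_n$ and $X_{2n}$ share the variable $\xi_{2n}$, which is precisely the long-range dependence in question:

```python
import numpy as np

def F(x, y):
    # An SL_2(R)-valued function of two scalar arguments: det = 1 for all x, y.
    return np.array([[1.0, x], [y, 1.0 + x * y]])

def nonconventional_log_norm(N, seed):
    """Compute (1/N) ln ||Pi_N|| for Pi_N = X_N ... X_1, X_n = F(xi_n, xi_{2n})."""
    rng = np.random.default_rng(seed)
    xi = rng.integers(0, 2, size=2 * N + 1)  # i.i.d. Bernoulli(1/2), indices 0..2N
    P = np.eye(2)
    log_norm = 0.0
    for n in range(1, N + 1):
        P = F(xi[n], xi[2 * n]) @ P  # X_n depends on xi_n and xi_{2n}
        s = np.linalg.norm(P, 2)
        log_norm += np.log(s)
        P /= s  # renormalize to avoid overflow
    return log_norm / N
```

Despite the dependence between the $X_n$'s, the normalized logarithm settles down for large $N$, as the main theorem of the paper predicts.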
On the other hand, our motivation stems from the series of papers, originating in Furstenberg's proof of the Szemerédi theorem, on nonconventional ergodic and limit theorems which dealt with sums of the form $\sum_{n=1}^N\varphi(\xi_{q_1(n)},\xi_{q_2(n)},...,\xi_{q_\ell(n)})$ (see, for instance, [10] and references therein). Our results can be viewed as a counterpart of the nonconventional strong law of large numbers in the multiplicative setting.
(i) $G_\mu$ is strongly irreducible, i.e. there does not exist a finite union of proper subspaces of $\bbR^d$ that is preserved as a set by all matrices from $G_\mu$ (see [4]).
Recall that the singular values $s_1(g)\geq s_2(g)\geq...\geq s_d(g)\geq0$ of a $d\times d$ matrix $g$ are the square roots of the eigenvalues $s_i^2(g)$ of $g^*g$. The first singular value $s_1(g)$ is the Euclidean operator norm of $g$, i.e. $s_1(g)=\|g\|$. Since $F\equiv1$ if $d=1$ and the problems discussed here become trivial then, we assume without loss of generality that $d>1$. Let $Y_1,Y_2,...$ be an i.i.d. sequence of random matrices having the distribution $\mu$, and so satisfying (i) and (ii) of Assumption 2.1 with $Y_1$ in place of $X_1$. Hence (cf. [4, 3]), the limits
$$\gamma_i=\lim_{N\to\infty}\frac1N\ln s_i(Y_N\cdots Y_2Y_1),\quad i=1,...,d,\qquad(2.3)$$
exist with probability one; in particular, $\gamma_1=\lim_{N\to\infty}\frac1N\ln\|Y_N\cdots Y_2Y_1\|$. The following theorem asserts that a similar result holds true for $\Pi_N=X_N\cdots X_2X_1$.
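These facts about singular values can be verified numerically (an illustration of ours, not part of the paper; the matrix $g$ below is an arbitrary $SL_2(\bbR)$ example):

```python
import numpy as np

g = np.array([[2.0, 1.0], [0.0, 0.5]])  # an SL_2(R) matrix: det g = 1

# Singular values as square roots of the eigenvalues of g* g.
eigvals = np.linalg.eigvalsh(g.T @ g)            # ascending order
s_from_eigs = np.sqrt(eigvals)[::-1]             # reorder so s_1 >= s_2
s_from_svd = np.linalg.svd(g, compute_uv=False)  # s_1 >= s_2

# s_1(g) equals the Euclidean operator norm of g, and
# s_1 * s_2 = |det g| = 1 for SL_2 matrices.
```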
if this were true for $\Pi_N$ itself (see [4]). Hence, proving Theorem 2.2 for each $s_i(\wedge^i\Pi_N)$ we will obtain convergence which yields (2.4). The proof of Theorem 2.2, presented in Sections 3 and 4, is based on two main ingredients. The first one is a large deviations bound for products of random matrices, which was first proved by Le Page under an additional contraction assumption. We rely on a version of this result from Theorem 14.19 in [3] which does not require the contraction condition; in fact, the upper large deviations bound of Theorem 6.2 on p. 131 of [4] suffices for our purposes as well. The second ingredient, playing a decisive role in our proof of the lower bound below, is the avalanche principle, proved originally for two-dimensional matrices in [8] and extended (in a strengthened form) to the multidimensional case in [6]. It is not difficult to see that the convergence in Theorem 2.2 holds true also in mean, which requires no large deviations estimates but only a subadditivity argument together with the avalanche principle.

Markov case
Next, we discuss the case when $\xi_0,\xi_1,\xi_2,...$ form a Markov chain on a Polish space $E$ (to conform with the standard notation, we start the indices from 0), $F:E^\ell\to SL_d(\bbR)$ is a Borel measurable matrix function and $X_n=F(\xi_{q_1(n)},\xi_{q_2(n)},...,\xi_{q_\ell(n)})$ with $q_i(n)$, $i=1,...,\ell$, satisfying Assumption 2.1(iii). Let $P(n,x,\cdot)$, $x\in E$, be the $n$-step transition probability of the Markov chain above, $P(x,\cdot)=P(1,x,\cdot)$, and assume that there exists a probability measure $\nu$ on $E$ such that for some $R,\rho>0$, all $n\geq1$ and any bounded Borel function $f$ on $E$,
$$\Big|\int f(y)P(n,x,dy)-\int f\,d\nu\Big|\leq Re^{-\rho n}\sup|f|.\qquad(2.6)$$
This assumption will be satisfied for an aperiodic Markov chain if, for instance, a version of the Doeblin condition holds true (see, for instance, [5], Section 21.23). It follows that $\nu$ is the unique invariant measure of this Markov chain, i.e. the only measure $\nu$ satisfying $\int d\nu(x)P(x,\Gamma)=\nu(\Gamma)$ for any Borel set $\Gamma\subset E$, and so $\nu$ is ergodic. Taking $\nu$ as the initial distribution of the Markov chain, i.e. as the distribution of $\xi_0$, makes it a stationary ergodic process. Still, the condition (2.6) will enable us to obtain stronger results for the Markov chain starting at any initial point $x\in E$.
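A condition of the type (2.6) is easy to observe numerically. Below is a sketch of ours for a two-state chain (any finite aperiodic irreducible chain satisfies a Doeblin condition); the quantity $\sup_{|f|\leq1}|\int f(y)P(n,x,dy)-\int f\,d\nu|$ reduces to an $\ell^1$ distance of distribution vectors and decays geometrically in $n$:

```python
import numpy as np

# Transition matrix of a two-state aperiodic, irreducible Markov chain.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# Stationary distribution nu: the normalized left eigenvector, nu P = nu.
# For this chain nu = (2/3, 1/3).
nu = np.array([2.0, 1.0]) / 3.0

def sup_deviation(n, x):
    """sup over |f| <= 1 of |sum_y (P^n(x, y) - nu(y)) f(y)|, an l1 distance."""
    Pn = np.linalg.matrix_power(P, n)
    return np.abs(Pn[x] - nu).sum()
```

Here the second eigenvalue of $P$ is $0.7$, so the deviation decays like $0.7^n$, i.e. (2.6) holds with $e^{-\rho}=0.7$ and a suitable $R$.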
Let $\{\xi_n^{(i)},\,n\geq0\}$, $i=1,...,\ell$, be independent copies of the Markov chain $\{\xi_n,\,n\geq0\}$, which produces an $\ell$-component Markov chain $\Xi_n=(\xi_n^{(1)},...,\xi_n^{(\ell)})$. Here $E_{\bar x}$, $\bar x=(x_1,...,x_\ell)$, is the expectation with respect to the probability $P_{\bar x}$ of the Markov chain $\Xi_n$, $n\geq0$, starting at $\bar x$. Set $Y_n=F(\Xi_n)$ and $H_n=Y_n\cdots Y_2Y_1$. It follows from [2] (see also Section 5) that the limits
$$\gamma_i=\lim_{n\to\infty}\frac1n\ln s_i(H_n),\quad i=1,...,d,\qquad(2.8)$$
exist $P_{\bar x}$-almost surely (a.s.) for each $\bar x\in E^\ell$, where, again, $s_i(g)$ is the $i$-th singular value of a matrix $g$. Viewing (2.8) as a definition of the $\gamma_i$'s, we impose an additional assumption on them; sufficient conditions for it can be found in [1] and [12]. In addition, following [2] we assume quasi-irreducibility, which means that certain subspaces (see [2]) are trivial for almost all $\bar x=(x_1,...,x_\ell)$ with respect to the product measure $\bar\nu=\nu\times\cdots\times\nu$. Denote by $P_x$ the path space probability of the Markov chain $\xi_n$, $n\geq0$, provided that $\xi_0=x$. Theorem 2.3 below asserts that, for each $x\in E$, $P_x$-a.s. the singular values of $\Pi_N$ satisfy $\lim_{N\to\infty}\frac1N\ln s_i(\Pi_N)=\gamma_i$, $i=1,...,d$.
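The objects $\Xi_n$, $Y_n=F(\Xi_n)$ and $H_n=Y_n\cdots Y_1$ can be sketched numerically as follows (our illustration with $\ell=2$, an ad hoc two-state base chain and an ad hoc $F$; these choices are assumptions for the demo, not taken from the paper):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # transition matrix of the base chain on E = {0, 1}

def F(x1, x2):
    # SL_2(R)-valued function on E^2 (det = 1 identically).
    return np.array([[1.0, float(x1)], [float(x2), 1.0 + float(x1 * x2)]])

def markov_top_exponent(N, seed, x0=(0, 0)):
    """Estimate gamma_1 = lim (1/n) ln ||H_n||, H_n = Y_n ... Y_1, Y_n = F(Xi_n),
    where Xi_n = (xi_n^(1), xi_n^(2)) runs two independent copies of the chain."""
    rng = np.random.default_rng(seed)
    state = list(x0)
    H = np.eye(2)
    log_norm = 0.0
    for _ in range(N):
        # Advance each copy of the chain independently.
        state = [rng.choice(2, p=P[s]) for s in state]
        H = F(state[0], state[1]) @ H
        s1 = np.linalg.norm(H, 2)
        log_norm += np.log(s1)
        H /= s1  # renormalize; ln ||H_n|| accumulates in log_norm
    return log_norm / N
```

Since every $H_n$ lies in $SL_2(\bbR)$, $\ln\|H_n\|\geq0$, and for this chain the normalized logarithm stabilizes at a nonnegative value independent of the seed.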
The proof of this result will be given in Section 5, relying on the large deviations theorem for products of Markov-dependent random matrices from [2] and an additional argument enabling us to compare large deviations estimates for the products $H_m$ and for $\Pi_{n+m}\Pi_n^{-1}$, in spite of the fact that the latter is not a product of Markov-dependent random matrices.

Upper bound
There are two cases in the proof of Theorem 2.2: $\gamma_1=0$ and $\gamma_1>0$. The first case requires only the upper bound since $\ln\|A\|\geq0$ for any $A\in SL_d(\bbR)$ (indeed, $1=|\det A|\leq\|A\|^d$). The second case will require both a lower and an upper bound, so we will start with the latter, which will serve in both cases. In fact, by Furstenberg's theorem (see Theorem 6.3 on p. 66 in [4]), under the strong irreducibility condition $\gamma_1=0$ if and only if $G_\mu$ is contained in a compact subgroup; then each $X_n$ belongs to this subgroup too, and Theorem 2.2 follows in this case directly.

Lower bound
First, observe that without loss of generality we can assume here that $\gamma_1>\gamma_2$, where the $\gamma_i$'s were defined in (2.3). Indeed, either $\gamma_1=\gamma_2=...=\gamma_d$, and then $\gamma_i=0$ for all $i$ since all the matrices here have determinant equal to one, or $\gamma_1=...=\gamma_k>\gamma_{k+1}\geq...\geq\gamma_d$ for some $1\leq k<d$. Then we can prove the result for the first singular value of the $k$-th exterior power $\wedge^k\Pi_N$ of $\Pi_N$, obtaining that with probability one
$$\lim_{N\to\infty}\frac1N\ln s_1(\wedge^k\Pi_N)=\gamma_1+...+\gamma_k=k\gamma_1,$$
where the required gap holds for $\wedge^k\Pi_N$ since its top Lyapunov exponent $\gamma_1+...+\gamma_k$ exceeds the second one $\gamma_1+...+\gamma_{k-1}+\gamma_{k+1}$. Hence, we can and will assume here that $\gamma_1>\gamma_2$, $\gamma_1>0$, and start with another bound of large deviations for products of i.i.d. random matrices (see [3]) which, in the same notation as in Section 3, says that for any $\varepsilon>0$ there exist $\kappa(\varepsilon)>0$ and $n_1(\varepsilon)\geq1$ such that
$$P\Big\{\frac1n\ln\|Y_n\cdots Y_2Y_1\|<\gamma_1-\varepsilon\Big\}\leq e^{-\kappa(\varepsilon)n}\qquad(4.1)$$
for all $n\geq n_1(\varepsilon)$. Let $r(n)$ and $n_2(\varepsilon)$ be the same as in Section 3. Then, for all $n\geq n_2(\varepsilon)$ we obtain
$$P\Big\{\frac1{r(n)}\ln\|X_{n+r(n)-1}\cdots X_{n+1}X_n\|<\gamma_1-\varepsilon\Big\}\leq e^{-\kappa(\varepsilon)r(n)}.\qquad(4.2)$$
Since there exists no inequality similar to (3.5) to employ for a proof of the lower bound, we will need a more advanced argument in order to make use of the splitting of the product $X_N\cdots X_2X_1$ into appropriate products of i.i.d. matrices. Namely, we will rely on the avalanche principle, which appears for products of multidimensional matrices in [6]. Following [6], for each $g\in GL_d(\bbR)$ we set $\mathrm{gr}(g)=\frac{s_1(g)}{s_2(g)}$, which is called the gap of $g\in GL_d(\bbR)$. Now we have (see [6, §2.4]):

Theorem 4.1. (Avalanche Principle). There exist universal constants $c,C>0$ such that whenever $a\geq cb>c$ and $g_j\in GL_d(\bbR)$, $j=1,...,l$, satisfy
(i) $\mathrm{gr}(g_j)\geq a$, $j=1,...,l$, and
(ii) $\ln\|g_{j+1}g_j\|-\ln\|g_{j+1}\|-\ln\|g_j\|\geq-\frac12\ln b$, $j=1,...,l-1$,
then
$$\ln\|g_l\cdots g_2g_1\|+\sum_{j=2}^{l-1}\ln\|g_j\|\geq\sum_{j=1}^{l-1}\ln\|g_{j+1}g_j\|-Cl\frac ba.\qquad(4.3)$$

Observe that from (ii) and (4.3) we obtain
$$\ln\|g_l\cdots g_2g_1\|\geq\sum_{j=1}^l\ln\|g_j\|-\frac{l-1}2\ln b-Cl\frac ba.\qquad(4.4)$$
Let us take $g_j=X_{m_j+r(m_j)-1}\cdots X_{m_j+1}X_{m_j}$, where $m_1<m_2<...<m_{k(N)}$ are as in Section 3. This together with (4.4) will yield the required lower bound provided that we can obtain appropriate bounds on the parameters $a$ and $b$ in the avalanche principle above. Now, (4.2) together with the definition of $r(n)=r_\varepsilon(n)$ and the Borel-Cantelli lemma yields that there exists a random variable $M_1(\varepsilon)$, finite with probability one, such that for any $n\geq M_1(\varepsilon)$,
$$\ln\|X_{n+r(n)-1}\cdots X_{n+1}X_n\|\geq r(n)(\gamma_1-\varepsilon).$$
In particular, this holds with $n=m_i$ for each $i<k(N)$ such that $m_i\geq M_1(\varepsilon)$. As explained in Section 2, the condition (2.1) implies also that $D=E\|X_1^{-1}\|^\alpha<\infty$. Thus, in the same way as in (3.7), and then similarly to (3.10), we obtain a corresponding estimate on the event $\Gamma_N$. On the other hand, in (4.13) the first limit on the right hand side is zero since the sum there is a fixed random variable which is finite with probability one.
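The content of the avalanche principle can be checked numerically: for matrices with a large singular gap and well-aligned consecutive products, the logarithm of the norm of the full product nearly telescopes through the pairwise terms. Below is a sanity check of ours (not from [6]) with $g_j=R(\theta_j)\,\mathrm{diag}(100,1/100)\in SL_2(\bbR)$ and small random angles, so that $\mathrm{gr}(g_j)=10^4$ for every $j$:

```python
import numpy as np

def rot(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

D = np.diag([100.0, 0.01])  # singular gap gr(g_j) = s_1/s_2 = 10**4

rng = np.random.default_rng(0)
gs = [rot(rng.uniform(-0.3, 0.3)) @ D for _ in range(10)]  # g_1, ..., g_l

ln = lambda g: np.log(np.linalg.norm(g, 2))  # ln of the operator norm

# Full product g_l ... g_2 g_1 (norms here stay within double precision range).
prod = np.eye(2)
for g in gs:
    prod = g @ prod

lhs = ln(prod) + sum(ln(g) for g in gs[1:-1])        # ln||g_l...g_1|| + sum_{j=2}^{l-1} ln||g_j||
rhs = sum(ln(gs[j + 1] @ gs[j]) for j in range(len(gs) - 1))  # pairwise terms
# The avalanche principle predicts lhs ≈ rhs up to an error of order C*l*b/a.
```

With $a=10^4$ and mild misalignment $b$ close to one, the two sides agree to high accuracy, which is exactly the mechanism exploited in the lower bound.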
The second limit there is zero in view of (4.12) and the estimate on $m_{j_N}$. Applying the avalanche principle, we will show that in the above case the required lower bound holds with probability one. First, we estimate the avalanche principle parameters $a=a(\varepsilon,N)$ and $b=b(\varepsilon,N)$, which will depend on $\varepsilon$ and $N$. Set $g(n)=X_{n+r(n)-1}\cdots X_{n+1}X_n$, so that $g_j=g(m_j)$, and let $s_1(g(n))\geq s_2(g(n))\geq...\geq s_d(g(n))>0$ be the singular values of $g(n)$. The second exterior power $\wedge^2g(n)$ of $g(n)$, acting on the second exterior power $\wedge^2\bbR^d$ of $\bbR^d$, has the largest singular value equal to $s_1(g(n))s_2(g(n))$. Hence
$$\mathrm{gr}(g(n))=\frac{s_1(g(n))}{s_2(g(n))}=\frac{s_1^2(g(n))}{s_1(g(n))s_2(g(n))}=\frac{\|g(n)\|^2}{\|\wedge^2g(n)\|}\qquad(4.15)$$
where $\|\cdot\|$ is the Euclidean operator norm; recall that $\gamma_2<\gamma_1$. Applying the large deviations bounds to $H_n$ and to $\wedge^2H_n$, we obtain that for any $\varepsilon>0$ there exist $\kappa(\varepsilon)>0$ (which could be different from before, but we denote it by the same letter) and $n_3(\varepsilon)\geq1$ such that the corresponding estimates hold for all $n\geq n_3(\varepsilon)$. Hence, if $r(n)\geq n_3(\varepsilon)$ and $n\geq n_0(\frac2{\kappa(\varepsilon)})$, then together with (4.2) and (4.15) this yields
$$P\{\mathrm{gr}(g(n))<e^{(\gamma_1-\gamma_2-2\varepsilon)r(n)}\}\leq2e^{-\kappa(\varepsilon)r(n)}.\qquad(4.19)$$
Taking into account that $r(n)=[\frac2{\kappa(\varepsilon)}\ln n]$, we conclude from (4.19) and the Borel-Cantelli lemma that there exists a random variable $M_3(\varepsilon)$, finite with probability one, such that for any $n\geq M_3(\varepsilon)$,
$$\mathrm{gr}(g(n))\geq e^{(\gamma_1-\gamma_2-2\varepsilon)r(n)}.\qquad(4.20)$$
Next, we use that by our choice of $r(n)$ there exists $n_4(\varepsilon)\geq1$ such that if $n\geq n_4(\varepsilon)$, then $X_n,X_{n+1},...,X_{n+r(n)+r(n+r(n))-1}$ is an i.i.d. tuple having the same distribution as $Y_1,Y_2,...,Y_{r(n)+r(n+r(n))}$. Thus, similarly to the above, relying on the large deviations bound (4.1) together with the Borel-Cantelli lemma, we conclude that there exists a random variable $M_4(\varepsilon)$, finite with probability one, such that for any $n\geq M_4(\varepsilon)$,
$$\|X_{n+r(n)+r(n+r(n))-1}\cdots X_{n+1}X_n\|=\|g(n+r(n))g(n)\|\geq e^{(\gamma_1-\varepsilon)(r(n)+r(n+r(n)))}.\qquad(4.21)$$
The remaining part of the proof of Theorem 2.3 proceeds in the same way as in the i.i.d. case of Theorem 2.2, except for the arguments leading to (3.10), (4.12) and (4.13). Namely, we cannot use the Chebyshev inequality in order to obtain (3.7) and (4.10) since, in general, in the present situation $E_\nu\|X_n\|^\alpha$ and $E_\nu\|X_n^{-1}\|^\alpha$ may not be equal to $E_\nu\|X_1\|^\alpha$ and $E_\nu\|X_1^{-1}\|^\alpha$, respectively, and the latter expectations may not be equal to $E_{\bar\nu}\|Y_1\|^\alpha$ and $E_{\bar\nu}\|Y_1^{-1}\|^\alpha$, where $E_\nu$ and $E_{\bar\nu}$ are the expectations corresponding to the path space probabilities $P_\nu$ and $P_{\bar\nu}$ of the Markov chains $\xi_n$ and $\Xi_n$ having initial distributions $\nu$ and $\bar\nu=\nu\times\cdots\times\nu$, respectively. But applying Lemma 5.1 in the same way as in (5.7), we obtain by the Chebyshev inequality that for all $x\in E$,
$$P_x\{\ln\|X_n\|\geq\tfrac2\alpha\ln n\}\leq P_{\bar x}\{\ln\|Y_n\|\geq\tfrac2\alpha\ln n\}+8R\ell n^{-2}\leq n^{-2}E_{\bar x}\|Y_n\|^\alpha+8R\ell n^{-2}\leq n^{-2}\big(\sup_{\bar z}E_{\bar z}\|Y_1\|^\alpha+8R\ell\big),$$
since
$$E_{\bar x}\|Y_n\|^\alpha=\int P_\Xi(n,\bar x,d\bar y)\|F(\bar y)\|^\alpha\leq\sup_{\bar z}\int P_\Xi(\bar z,d\bar v)\|F(\bar v)\|^\alpha=\sup_{\bar z}E_{\bar z}\|Y_1\|^\alpha<\infty.$$