Universality for cokernels of random matrix products

For random integer matrices $M_1,\ldots,M_k \in \operatorname{Mat}_n(\mathbb{Z})$ with independent entries, we study the distribution of the cokernel $\operatorname{cok}(M_1 \cdots M_k)$ of their product. We show that this distribution converges to a universal one as $n \to \infty$ for a general class of matrix entry distributions, and more generally show universal limits for the joint distribution of $\operatorname{cok}(M_1),\operatorname{cok}(M_1M_2),\ldots,\operatorname{cok}(M_1 \cdots M_k)$. Furthermore, we characterize the universal distributions arising as marginals of a natural generalization of the Cohen-Lenstra measure to sequences of abelian groups with maps between them, which weights each sequence in inverse proportion to its number of automorphisms. The proofs develop an extension of the moment method of Wood to joint moments of multiple groups, and rely also on the connection to Hall-Littlewood polynomials and symmetric function identities. As a corollary we obtain an explicit universal distribution for coranks of random matrix products over $\mathbb{F}_p$ as the matrix size tends to infinity.


Introduction
Products of random matrices have been studied as far back as the works of Bellman [3] and Furstenberg-Kesten [32] around 1960, and many works since then have connected them to other problems in pure and applied mathematics and in physics; see e.g. [1,21,35,39]. Such products have two natural parameters to vary, namely the size $n$ of the matrices and the number $k$ of matrices in the product. Different limit regimes of $n, k$ yield different behaviors; at one extreme, [32] and later works consider the singular values of a product $M_1 \cdots M_k$ of $n \times n$ matrices over $\mathbb{R}$ or $\mathbb{C}$, for fixed $n$, as the number of matrices $k$ in the product goes to infinity. At the other, works such as [37] consider the singular values of such a product in the limit as $n \to \infty$ with $k$ fixed.
Another direction of random matrix theory, at first sight orthogonal, concerns asymptotics of random matrices over finite fields $\mathbb{F}_p$. Assume that $M_u = (m_{ij})_{1\le i,j\le n}$ is a random matrix of size $n$ whose entries are iid uniform over $\mathbb{F}_p$. Then for each given $n, d$ it is elementary to compute the probability that $M_u$ has corank $d$, from which one can show
$$\lim_{n\to\infty} \mathbb{P}\big(\operatorname{corank}(M_u) = d\big) = p^{-d^2}\, \frac{\prod_{i=d+1}^{\infty}(1 - p^{-i})}{\prod_{i=1}^{d}(1 - p^{-i})}. \qquad (1)$$
Quite interestingly, it turns out that the above statistics are universal: it has been shown in [49,56,57,73] that if the entries of $M$ are iid from some nonconstant distribution which is independent of $n$, then the rank (or corank) statistics of $M$ over $\mathbb{F}_p$ match those of the uniform model above.
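These limiting corank probabilities are easy to explore numerically. The following sketch (illustrative code, not from the paper; the prime, matrix size, and trial count are arbitrary choices) estimates corank frequencies of uniform matrices by Gaussian elimination mod $p$ and compares them with the limit in (1):

```python
import random

def corank_mod_p(M, p):
    """Corank of a square matrix over F_p, by Gaussian elimination mod p."""
    M = [row[:] for row in M]
    n = len(M)
    rank = 0
    for col in range(n):
        # find a row with a nonzero entry in this column
        piv = next((r for r in range(rank, n) if M[r][col] % p), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][col], -1, p)  # modular inverse (Python 3.8+)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for r in range(n):
            if r != rank and M[r][col] % p:
                c = M[r][col]
                M[r] = [(x - c * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
    return n - rank

def limit_prob(d, p, terms=200):
    """Right-hand side of (1): the n -> infinity probability of corank d."""
    num = 1.0
    for i in range(d + 1, terms):
        num *= 1 - p ** (-i)
    den = 1.0
    for i in range(1, d + 1):
        den *= 1 - p ** (-i)
    return p ** (-d * d) * num / den

p, n, trials = 3, 30, 500
random.seed(1)
freq = {}
for _ in range(trials):
    M = [[random.randrange(p) for _ in range(n)] for _ in range(n)]
    d = corank_mod_p(M, p)
    freq[d] = freq.get(d, 0) + 1

for d in sorted(freq):
    print(d, freq[d] / trials, round(limit_prob(d, p), 4))
```

Already at $n = 30$ the empirical frequencies sit close to the limiting values, reflecting the exponentially fast convergence in $n$.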
Returning to matrix products, our first goal is to understand the corank statistics; although these problems are basic, we could not find any references in the literature. Assume that $M_{u,1}, M_{u,2}$ are two independent random matrices of size $n$ whose entries are uniform over $\mathbb{F}_p$. It is clear that the product matrix $M_{u,1}M_{u,2}$ has a higher probability of being degenerate. More precisely, one can use (1) to compute the limiting corank probabilities of the product. In particular, the probability that $M_{u,1}M_{u,2}$ has corank $1$ is strictly greater than the probability that $M_{u,1}, M_{u,2}$ have coranks $0, 1$ or $1, 0$, which may be computed by (1). This is because even when both factors have corank $1$, the product still has $\operatorname{rank}(M_{u,1}M_{u,2}) = n - 1$ with non-negligible probability, which is computed in the proof of Theorem 1.4. Even in this simple example, the complexity of matrix products begins to manifest.
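The increased degeneracy of products can even be seen by exact enumeration in the toy case $n = 2$, $p = 2$ (an illustration, not a computation from the paper): since the determinant is multiplicative, the product is invertible precisely when both factors are.

```python
from itertools import product

def det_mod2(M):
    # determinant of a 2x2 matrix over F_2
    return (M[0][0] * M[1][1] - M[0][1] * M[1][0]) % 2

# all 16 matrices in Mat_2(F_2)
mats = [((a, b), (c, d)) for a, b, c, d in product(range(2), repeat=4)]
p_single = sum(det_mod2(M) == 0 for M in mats) / len(mats)
# det is multiplicative, so M1*M2 is singular unless both factors are invertible
p_product_singular = 1 - (1 - p_single) ** 2
print(p_single, p_product_singular)
```

Here $|\operatorname{GL}_2(\mathbb{F}_2)| = 6$, so a single uniform matrix is singular with probability $10/16$, while the product of two is singular with probability $1 - (6/16)^2 = 220/256$, noticeably larger.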
This complexity increases further in the more general setting of random matrices over $\mathbb{Z}$. Now the key object is not only the rank but the cokernel $\operatorname{Cok}(M) := \mathbb{Z}^n / M\mathbb{Z}^n$, an abelian group obtained by viewing $M \in \operatorname{Mat}_n(\mathbb{Z})$ as a linear map $\mathbb{Z}^n \to \mathbb{Z}^n$. By reducing modulo $p$, results on the cokernel naturally yield results on coranks of matrices over $\mathbb{F}_p$. For matrices with iid entries from the large class of "$\alpha$-balanced" distributions described shortly, the distribution of the $p$-Sylow subgroup $\operatorname{Cok}(M)[p^\infty]$ (often referred to equivalently as the $p^\infty$-torsion) is universal. Namely, it was shown in [73, Corollary 3.4] that it has the so-called Cohen-Lenstra distribution
$$\lim_{n\to\infty} \mathbb{P}\big(\operatorname{Cok}(M)[p^\infty] \cong G\big) = \frac{(1/p;\, 1/p)_\infty}{|\operatorname{Aut}(G)|} \qquad (2)$$
for any finite abelian $p$-group $G$, where here and later we use the $q$-Pochhammer notation $(a; q)_\infty := \prod_{i\ge 0} (1 - aq^i)$.
This universality result was motivated by the Cohen-Lenstra heuristics for the distribution of class groups of imaginary quadratic number fields [20,26].
In light of the above it is natural to ask about cokernels of random matrix products over $\mathbb{Z}$, but very little work has been done. To our knowledge cokernels of matrix products were first considered in [67], which studied the related setting of random matrices over the $p$-adic integers $\mathbb{Z}_p$ in the regime of fixed matrix size $n$ and growing number of products $k$. The present work considers random matrix products over $\mathbb{Z}$ in the opposite regime where $n$ grows and $k$ is fixed. We seek to answer the following natural questions for cokernels, together with their analogues for coranks over $\mathbb{F}_p$:
(Q1) What is the $n \to \infty$ limiting distribution of $\operatorname{Cok}(M_1 \cdots M_k)$, where $M_i \in \operatorname{Mat}_n(\mathbb{Z})$ are iid? Are the results universal, i.e. insensitive to the distribution of the entries of $M_i$?
(Q2) What is the $n \to \infty$ limiting joint distribution of $\operatorname{Cok}(M_1), \operatorname{Cok}(M_1M_2), \ldots, \operatorname{Cok}(M_1\cdots M_k)$?
1.1. Main results. The existing single-matrix universality results of [73] are proven for matrices with iid entries satisfying the following condition, which makes this class a natural candidate for probing universality for products as well.
Definition 1. Given a real number $\alpha \in (0, 1/2]$, we say a random integer $\xi$ is $\alpha$-balanced if for every prime $p$ we have $\max_{r \in \mathbb{Z}/p\mathbb{Z}} \mathbb{P}(\xi \equiv r \pmod p) \le 1 - \alpha$.
Our main results generalize [73, Corollary 3.4] to products of matrices, answering (Q1) and (Q2) above. We begin with the simpler (Q1). In what follows, for a finite set $P$ of primes we write $G[P^\infty] := \bigoplus_{p \in P} G[p^\infty]$, where we recall that $G[p^\infty]$ is the $p$-Sylow subgroup of $G$.
Theorem 1.1. Let $M_1, \ldots, M_k$ be $k$ independent random integral matrices with entries iid copies of an $\alpha$-balanced random integer $\xi$. Let $B$ be any finite abelian group, and let $P$ be a finite set of primes including all those that divide $|B|$. Then $\lim_{n\to\infty} \mathbb{P}\big(\operatorname{Cok}(M_1\cdots M_k)[P^\infty] \cong B\big)$ exists and is given by an explicit universal formula.
For (Q2), we note first that, since $M_1\cdots M_{j+1}\mathbb{Z}^n \subseteq M_1\cdots M_j\mathbb{Z}^n$, there are natural surjections $\operatorname{Cok}(M_1\cdots M_k) \twoheadrightarrow \cdots \twoheadrightarrow \operatorname{Cok}(M_1)$. We also define the notation $\operatorname{Sur}(G, H) := \{\varphi : G \to H \text{ surjective}\}$ and similarly for $\operatorname{Inj}(G, H)$.
Theorem 1.2. For matrices under the same assumptions as in Theorem 1.1, finite abelian groups $B_1, \ldots, B_k$, and $P$ a finite set of primes including all those which divide any of the $|B_i|$, $1 \le i \le k$, the joint distribution of $\big(\operatorname{Cok}(M_1\cdots M_j)[P^\infty]\big)_{1\le j\le k}$ converges as $n\to\infty$ to an explicit universal limit, where we take $B_0 = 0$.
Remark 1. Theorem 1.2 reduces to Theorem 1.1 by a simple computation. We also note that, since $\operatorname{Cok}(A) \cong \operatorname{Cok}(A^T)$ for any $A$ and $M_j = M_j^T$ in distribution for all $j$, it is immediate that Theorem 1.2 holds with the order of the matrices in the products reversed.
The $k = 1$ case of either result above yields [73, Corollary 3.4], and our results can be seen as a dynamical analogue of it. Within one matrix, the evolution of the cokernel after exposing each new row and column of the matrix was previously studied by the first author and Wood [57,58], while in the current model we study the cokernel evolution obtained by multiplying the matrices one by one. An interesting related body of work [8,15,16,43] studies the joint cokernel distributions of matrices obtained from different polynomials of a single random matrix, and it is natural in light of the above to consider joint cokernel distributions of more complicated multivariate polynomials in several random matrices. (While we state our results over $\mathbb{Z}$ in this section, we simultaneously obtain results on matrices over $\mathbb{Z}_p$; see Theorems 8.2 and 9.2 in the body of the paper.)
Recall that for cokernels of a single matrix, the distribution (2) features weights inversely proportional to the number of automorphisms. It is a general heuristic that distributions on algebraic objects occurring in these contexts should feature probabilities inversely proportional to the number of automorphisms, for the appropriate notion of automorphism, the reason essentially being the orbit-stabilizer theorem; see for instance [71, Section 5]. Our next result gives such an interpretation for the distributions appearing above.
As mentioned, the groups $\operatorname{Cok}(M_1\cdots M_j)$ come with the additional structure of a sequence of surjections between them. There is a natural notion of automorphism of such a sequence $G_k \twoheadrightarrow \cdots \twoheadrightarrow G_1$ of abelian groups with maps between them, namely an element of $\prod_{i=1}^{k} \operatorname{Aut}(G_i)$ for which the appropriate diagram commutes; see Section 10. This provides the right setting to interpret the distribution of Theorem 1.2, and hence Theorem 1.1 as well, as the following result shows.
Theorem 1.3. Let $k \in \mathbb{Z}_{\ge 1}$ and let $P$ be a finite set of primes. Then there is a well-defined probability measure on sequences $G_k \twoheadrightarrow \cdots \twoheadrightarrow G_1$ of finite abelian $P$-groups which assigns each sequence probability inversely proportional to its number of automorphisms.
Furthermore, under the above distribution, the joint distribution of the isomorphism types of $G_1, \ldots, G_k$ (after forgetting the data of the maps between them) is the limit distribution of Theorem 1.2.
We refer to Section 10 for more detail. The closest previous work we are aware of is [7], which considers certain random short exact sequences of abelian $p$-groups as models for the distribution of exact sequences relating Selmer and Tate-Shafarevich groups of elliptic curves. However, those distributions are supported on split short exact sequences, so the maps do not add information beyond the isomorphism types of the groups, in contrast to our case. Nonetheless, such heuristics for groups with maps between them motivate the general problem of developing technology to prove universality for joint distributions of multiple groups, and we hope that the present paper may conversely help to spur more work on heuristics for sequences of groups in number theory. It would certainly be natural and interesting to prove a generalization of the universality result Theorem 1.2 which incorporates the extra data of the sequence of maps (6), but we leave this question to the future as our methods are not currently adapted to it.
In another direction, by taking everything modulo $p$, Theorem 1.2 and a linear algebra computation imply the following corollary on the joint distribution of ranks of matrix products over $\mathbb{F}_p$.
Theorem 1.4. Let $p$ be a given prime. Let $\xi$ be a nonconstant random variable valued in $\mathbb{F}_p$, let $r_1, \ldots, r_k \in \mathbb{Z}_{\ge 0}$, and let $M_1, \ldots, M_k$ be independent random elements of $\operatorname{Mat}_n(\mathbb{F}_p)$ with entries iid copies of $\xi$. Then the joint corank probabilities $\mathbb{P}\big(\operatorname{corank}(M_1\cdots M_j) = r_j,\ 1 \le j \le k\big)$ converge as $n \to \infty$ to an explicit universal limit.
1.2. Parallels with complex random matrices. Most universality results for random matrices in the literature concern spectral distributions, most notably the Wigner semicircle law, the quarter-circle law, and the circular law. We recall below the last two laws for the model closely related to ours.
Theorem 1.5. Assume that $M = (m_{ij})$ is a random matrix where the $m_{ij}$ are iid copies of a real-valued random variable $\xi$ of mean zero and variance one.
Theorem 1.6. Assume that $M_1, \ldots, M_k$ are independent and their entries are iid copies of a real-valued random variable $\xi$ of mean zero, variance one, and bounded moments of sufficiently high order.
• (See [2,4,10,11,54,77] for the Gaussian case, and [37, Theorem 3.1] for the general case.) Let $x_1, \ldots, x_n$ be the singular values of $M := M_1\cdots M_k$; then the empirical measure of the (appropriately rescaled) singular values converges in probability to a deterministic limit given by a free multiplicative convolution, where we refer the reader to [2] for more discussion on the free multiplicative convolution.
• (See for instance [37,59,60].) If $z_1, \ldots, z_n$ are the eigenvalues of $M$, then the empirical measure of the (rescaled) eigenvalues converges in probability to a deterministic limiting measure.
The groups $\operatorname{Cok}(A)$ in the discrete setting are in fact structurally analogous to singular values in the continuous setting. In complex (or real) random matrix theory, the singular value decomposition says that for any $A \in \operatorname{Mat}_n(\mathbb{C})$ there exist unitary $U, V \in U(n)$ (or, if $A$ is real, orthogonal $U, V \in O(n)$) so that $UAV$ is diagonal with nonnegative reals, the singular values, on the diagonal. Analogously, for $A \in \operatorname{Mat}_n(\mathbb{Z})$ there exist $U, V \in \operatorname{GL}_n(\mathbb{Z})$ for which $UAV = \operatorname{diag}(a_1, \ldots, a_n)$ is diagonal with nonnegative integers on the diagonal. This result is known as the Smith normal form, and the diagonal entries furthermore determine the isomorphism type of the cokernel as $\operatorname{Cok}(A) \cong \bigoplus_{i=1}^{n} \mathbb{Z}/a_i\mathbb{Z}$.
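The reduction to Smith normal form just described is elementary to carry out; the following sketch (illustrative code, not from the paper) reduces an integer matrix by repeated division with remainder on rows and columns and returns the diagonal entries $a_i$, so that the cokernel is $\bigoplus_i \mathbb{Z}/a_i\mathbb{Z}$.

```python
def smith_diagonal(A):
    """Diagonal of the Smith normal form of an integer matrix A, computed by
    repeated division-with-remainder on rows and columns (illustrative)."""
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    res, t = [], 0
    while t < min(m, n):
        # choose a nonzero entry of minimal absolute value as the pivot
        best = None
        for i in range(t, m):
            for j in range(t, n):
                if A[i][j] and (best is None
                                or abs(A[i][j]) < abs(A[best[0]][best[1]])):
                    best = (i, j)
        if best is None:
            break  # remaining block is zero
        bi, bj = best
        A[t], A[bi] = A[bi], A[t]
        for row in A:
            row[t], row[bj] = row[bj], row[t]
        # clear the pivot row and column by division with remainder
        if any(A[i][t] for i in range(t + 1, m)) or any(A[t][j] for j in range(t + 1, n)):
            for i in range(t + 1, m):
                q = A[i][t] // A[t][t]
                for j in range(t, n):
                    A[i][j] -= q * A[t][j]
            for j in range(t + 1, n):
                q = A[t][j] // A[t][t]
                for i in range(t, m):
                    A[i][j] -= q * A[i][t]
            continue  # leftover remainders are smaller; re-pick the pivot
        # make the pivot divide every entry of the remaining block
        bad = next(((i, j) for i in range(t + 1, m) for j in range(t + 1, n)
                    if A[i][j] % A[t][t]), None)
        if bad is not None:
            for j in range(t, n):
                A[t][j] += A[bad[0]][j]  # mix in the offending row and redo
            continue
        res.append(abs(A[t][t]))
        t += 1
    return res
```

For instance the matrix $\begin{pmatrix}2&4\\6&8\end{pmatrix}$ has Smith diagonal $(2,4)$, so its cokernel is $\mathbb{Z}/2\mathbb{Z}\oplus\mathbb{Z}/4\mathbb{Z}$; the pivot-divisibility step is what distinguishes the invariant factors from an arbitrary diagonalization, e.g. $\operatorname{diag}(2,3)$ reduces to $(1,6)$.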
Theorem 1.1 can thus be viewed as a cokernel analogue of the above results concerning the singular values and eigenvalues of a product of $k$ independent iid matrices. However, an important difference between the two settings is that the random empirical spectral measures above converge (after scaling) to deterministic measures, while in the cokernel setting there is no rescaling and the limit object is a random group or collection of integers $a_i$.
A related setting in classical random matrix theory, where the limits are not deterministic, is that of local limits. Singular values of a random $A \in \operatorname{Mat}_n(\mathbb{C})$ form a random collection of points in $\mathbb{R}_{\ge 0}$, and by zooming in at the scale of individual singular values as $n \to \infty$, one may obtain a random collection of infinitely many points. The limit object differs depending on whether one zooms in close to the largest singular value (the soft edge), close to the smallest singular value (the hard edge), or in between the two (the bulk), and is a random collection of points with a rightmost point, a leftmost point, or infinitely many points in both directions, respectively. For singular values of a product of a fixed number $k$ of complex Gaussian matrices, these scaling limits were computed for the bulk and soft edge in [46]. Unlike our Theorem 1.1 and Theorem 1.6 above, the limit in [46] does not depend on the number of products, and matches the one for a single matrix. At the hard edge, however, the limit does depend on the number of products: the $n \to \infty$ limiting joint distribution of singular values of Gaussian products at the hard edge was computed in [42], and the hard edge limit for other explicit cases beyond the Gaussian was computed in [41,40]. The work [42] is probably the closest complex analogue of our Theorem 1.2.
1.3. Methods, moments and Hall-Littlewood polynomials. Previous works such as [57,58] and especially [72,73] show that the law of $\operatorname{Cok}(M)$ converges to some universal distribution by the following moment-method strategy:
(i) Compute the moments of the candidate limit distribution.
(ii) Compute the asymptotics of the moments $\mathbb{E}[\#\operatorname{Sur}(\operatorname{Cok}(M), H)]$ and check they agree.
(iii) Show that the moments determine the distribution.
For Theorem 1.1, we follow exactly this strategy: the computation (i) is in Section 7, the asymptotics (ii) are in Section 3, and for (iii) we slightly strengthen existing moment determinacy results of [72]; we combine these ingredients to prove the theorem in Section 8. For Theorem 1.2, however, we must introduce an appropriate notion of joint moments of a sequence of random groups. We are able to generalize (ii) and (iii) to joint moments; see Theorems 4.1 and 9.1. We find it helpful from an expository standpoint to prove Theorem 1.1 separately beforehand, as many ingredients are shared, and for this result (ii) and (iii) correspond to Theorems 3.2 and 8.1. In both cases, we rely heavily on an existing analytic result [72, Theorem 8.2] proven in a related context. It is also worth noting that a generalization of moment determinacy (iii) to joint moments of multiple groups as defined by (7) was carried out independently in [44], which appeared shortly after the first posting of the present paper, and applied to different joint distributions.
For (i), the candidate for the joint distribution of cokernels comes from previous work [67] (specifically Corollary 3.4 there) in the setting of random matrices over the $p$-adic integers $\mathbb{Z}_p$. The analogous cokernel joint distribution corresponds to the distribution in Theorem 1.2 when $P = \{p\}$. However, in [67] it was phrased in a nontrivially equivalent manner in terms of Hall-Littlewood polynomials, certain symmetric polynomials in $n$ variables which encode harmonic analysis on the groups $\operatorname{GL}_n(\mathbb{Z}_p) \subset \operatorname{GL}_n(\mathbb{Q}_p)$ and (equivalently) combinatorics of abelian $p$-groups; see [48, Chapters II, III, V]. Previous to [67], Hall-Littlewood polynomials had been connected to the Cohen-Lenstra measure in [45], following their connection to an essentially equivalent measure arising in random matrix theory over finite fields in [27] (see also [28]). Recent applications to $p$-adic random matrix theory include [23,29,31,67,68,69].
After [67, Corollary 3.4], the subsequent work [68, Theorem 1.4] further gave explicit elementary formulas for this distribution, not featuring Hall-Littlewood polynomials. However, a more structural interpretation of these formulas was still lacking. Such an interpretation is furnished by the explicit group-theoretic formulation afforded by our Theorem 1.3, finally placing the distribution in the context of similar "1/#Aut" distributions which have appeared previously in integer and $p$-adic random matrix theory.
We have phrased our results in the group-theoretic language above, but Hall-Littlewood tools continue to be useful in our computations for (i) of the moments of the limiting distributions, in Section 7. To this end, Section 6 states many basic results translating between Hall-Littlewood and group-theoretic notation, together with some purely group-theoretic results, all of which are not difficult to derive from [47] but many of which we are not aware of in the random matrix literature. (The universal limit depends on the class of $M$: symmetric matrices [72] and rectangular matrices [73] yield different limiting laws for the cokernels.) We hope that the dictionary we give there, between Hall-Littlewood formulas and moments of abelian groups and maps between them, will be useful in the field beyond our matrix product setting. We note also that the analogies between cokernels and singular values mentioned above are somewhat cleaner with cokernels of $p$-adic, rather than integral, matrices, with structurally identical formulas appearing in both settings in terms of either Hall-Littlewood polynomials or the analogous special functions on the complex side; see [67] and the references therein.
For (ii), as mentioned, our main contributions are Theorem 3.2 and Theorem 4.1. Our proofs of these results focus on the evolution of "code" and "non-code" vectors after the application of each random matrix $M_i$ in the product. Roughly speaking, for a code vector $v$, the vector $M_i v$ is close to being a uniformly random vector, and hence the main contribution to the moment computation comes from these vectors. For the non-code vectors $v$, the laws of $M_i v$ are intractable, but fortunately we can avoid this issue by relying on the fact that most vectors are codes. In a way, this approach is similar to [55], where similar dynamical aspects of "structured" and "non-structured" vectors were studied. The evolution in the joint distribution setting is more complicated, as one has to keep track of many code and non-code vectors at the same time. As such, for expository purposes we present the simpler case $k = 2$ first, and then use induction to proceed further.
Lastly, for (iii) our contributions are Theorem 8.1, Theorem 9.1 and Theorem 9.3. Although our approach mainly follows [72, Theorem 8.3], these results, especially Theorem 9.1 and Theorem 9.3, require some nontrivial modifications, as our focus is on the joint moments, where the growth rates are not straightforward to check. We hope that our joint moment comparison result, together with the developments in [72,73] (see also [74]), will provide useful tools to prove universality.
1.4. Plan of paper. In Section 2 we state many basic definitions and results from [73] pertaining to the moment method for abelian groups. In Sections 3 and 4 we compute the moments and joint moments of matrix products, needed for Theorems 1.1 and 1.2 respectively (while the latter theorem implies the former, for simplicity of exposition we usually prove needed results for the former first). General background on Hall-Littlewood polynomials and processes is in Section 5, and we relate it to abelian $p$-groups in Section 6. We use this to compute the moments and joint moments of the limit distributions of Theorems 1.1 and 1.2 in Section 7. In Sections 8 and 9 we combine these ingredients to prove Theorems 1.1 and 1.2 respectively, along with their analogues for $\mathbb{Z}_p$. In Section 10 we set up and prove Theorem 1.3. Finally, in Section 11 we reduce to $\mathbb{F}_p$ and prove Theorem 1.4.
1.5. Acknowledgements. We thank Melanie Matchett Wood for helpful discussions and for asking about interpretations of the distribution of [67, Corollary 3.4] in terms of automorphisms, and the anonymous referees for many helpful questions and comments. RVP also thanks Alexei Borodin for discussions and feedback, Alisa Knizel for asking the same question about automorphisms, and Oron Propp for helpful discussions on characterizing automorphism classes of sequences of modules. HN was supported by NSF CAREER grant DMS-1752345, and RVP was supported by an NSF Graduate Research Fellowship under grant #1745302.

Supporting lemmas
Throughout this section fix $a \in \mathbb{Z}_{>1}$ and set $R = \mathbb{Z}/a\mathbb{Z}$. Let $V = R^n$ with standard basis $v_i$, $1 \le i \le n$. For $\sigma \subset [n]$ we denote by $V_{\sigma^c}$ the submodule generated by $\{v_i : i \in \sigma^c\}$. Throughout the paper, to declutter notation we write $(x_1, \ldots, x_n) \in R^n$ for usual (column) vectors, and similarly for vectors in e.g. $G^n$ where $G$ is a group, rather than using the notation $(x_1, \ldots, x_n)^T$.
Definition 2. Given real $\alpha \in (0, 1/2]$, we say an $R$-valued random variable $\xi$ is $\alpha$-balanced if for every prime $p \mid a$ we have $\max_{r \in \mathbb{Z}/p\mathbb{Z}} \mathbb{P}(\xi \equiv r \pmod p) \le 1 - \alpha$.
Clearly if $\xi$ is a $\mathbb{Z}$-valued $\alpha$-balanced random variable as in Definition 1, then $\xi \pmod a$ is an $\alpha$-balanced $R$-valued random variable as in Definition 2. Hence the random matrices of Theorems 1.1 and 1.2, reduced modulo $a$, have iid $\alpha$-balanced entries in $R$. From this section through Section 4 we work in this setting, with abelian groups $G$ of exponent dividing $a$ (i.e. $R$-modules). Most of the results below are from [73].
2.1. Codes. Definition 3. Given $w \le n$, we say that $F \in \operatorname{Hom}(V, G)$ is a code of distance $w$ if for every $\sigma \subset [n]$ with $|\sigma| < w$ we have $F(V_{\sigma^c}) = G$.
Sometimes it is convenient to identify $F$ with the vector $(F(v_1), \ldots, F(v_n)) \in G^n$, and we will usually abuse notation and view $F$ as a vector rather than a map. In particular, if $X = (x_1, \ldots, x_n) \in R^n$ is a vector, we write $\langle F, X \rangle := \sum_{i=1}^{n} x_i F(v_i)$; note this is not a usual dot product because $(F(v_1), \ldots, F(v_n)) \in G^n$ and $(x_1, \ldots, x_n) \in R^n$ live in different spaces, though the formula is the same. If $M$ is an $n \times n$ matrix with entries in $R$, then for any $R$-module $G$, $M$ defines a linear map $G^n \to G^n$ by usual matrix multiplication, and we write $MF$ for the image of the vector $(F(v_1), \ldots, F(v_n)) \in G^n$ under this map.
It is convenient to work with codes because the random walk $S_k = \sum_{i=1}^{k} x_i F(v_i)$ (in discrete time indexed by $k = 1, 2, \ldots, n$) spreads out in $G$ very fast, as the following lemma shows.
Lemma 2.1. [73, Lemma 2.1] Assume that $x_i \in R$ are iid copies of $\xi$ satisfying (8). Then for any code $F$ of distance $\delta n$ and any $g \in G$,
$$\left| \mathbb{P}\big(\langle F, X \rangle = g\big) - \frac{1}{|G|} \right| \le \exp(-\alpha\delta n / a^2).$$
In what follows, if not specified otherwise, $X$ is always understood as the random vector $(x_1, \ldots, x_n)$ where the $x_i$ are iid copies of $\xi$ satisfying (8) as in Lemma 2.1.
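A quick simulation illustrates this spreading (the parameters are illustrative assumptions, not from the paper): take $R = G = \mathbb{Z}/4\mathbb{Z}$, let $F$ alternate between the generators $1$ and $3$ (so $F$ is a code of large distance), and let $\xi$ be uniform on $\{0,1\}$, which is $\alpha$-balanced with $\alpha = 1/2$.

```python
import random

# Toy parameters: R = G = Z/4Z, n = 60, F alternating between the
# generators 1 and 3 of Z/4Z, and xi uniform on {0, 1} (alpha = 1/2).
m, n, trials = 4, 60, 20000
F = [1 if i % 2 == 0 else 3 for i in range(n)]

random.seed(0)
counts = [0] * m
for _ in range(trials):
    # one sample of <F, X> = sum_i x_i F(v_i) in G
    s = sum(f * random.choice([0, 1]) for f in F) % m
    counts[s] += 1

# total variation distance of <F, X> from the uniform distribution on G
tv = 0.5 * sum(abs(c / trials - 1 / m) for c in counts)
print(counts, tv)
```

The empirical total variation distance from uniform is already tiny at $n = 60$, consistent with the exponential bound of Lemma 2.1.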
Using the above result, it is not hard to deduce the following matrix form.
Lemma 2.2. [73, Lemma 2.4] Assume that the entries of the $n \times n$ matrix $M$ are iid copies of $\xi$ satisfying (8). Then for any code $F$ of distance $\delta n$ and any vector $A \in G^n$,
$$\left| \mathbb{P}(MF = A) - \frac{1}{|G|^n} \right| \le \frac{K e^{-cn}}{|G|^n},$$
where $K, c$ depend on $a, G, \alpha$ and $\delta$.
Remark 2. In our applications, G will always be a fixed group (or perhaps summed over a finite collection of groups), so the dependence of the constants on G which we allow in Lemma 2.2 and similar results does not create any issue with our asymptotics.
We will also need the following useful result.
Lemma 2.3. Let $\delta$ be sufficiently small. Assume that $F \in \operatorname{Hom}(V, G)$ is a code of distance $\delta n$, and that the entries of the $n \times n$ matrix $M$ are iid copies of $\xi$ satisfying (8). Then for any $H \le G$,
$$\mathbb{P}\big(MF \text{ is a code of distance } \delta n \text{ in } H\big) \ge \big(1 - e^{-c'' n}\big)\frac{|H|^n}{|G|^n},$$
where $c''$ depends on $a, G, H, \delta, \alpha$.
Proof of Lemma 2.3. First, by Lemma 2.2, for each $A$ which is a code of distance $\delta n$ in $H$ we have $\mathbb{P}(MF = A) \ge (1 - Ke^{-cn})/|G|^n$. It remains to count the number of codes of distance $\delta n$ in $H$.
Claim 2.4. Let $C(H)$ be the number of codes (viewed as vectors in $H^n$) of distance $\delta n$ in $H$. We have $C(H) \ge (1 - K'e^{-c_\delta n})|H|^n$, where $K'$ depends on $H$ and $c_\delta$ depends on $\delta$.
Proof. Let $g_1, \ldots, g_n$ be chosen independently and uniformly from $H$. For each index set $I \subset [n]$ of size $n - \lfloor\delta n\rfloor$ and each proper subgroup $H'$ of $H$, let $E_{I,H'}$ be the event that $g_i \in H'$ for all $i \in I$. Then clearly $\mathbb{P}(E_{I,H'}) = (|H'|/|H|)^{|I|}$. Taking a union bound over the choices of $I$ and over $H' < H$, we obtain
$$\mathbb{P}\Big(\bigcup_{I, H'} E_{I,H'}\Big) \le \sum_{H' < H} \binom{n}{\lfloor\delta n\rfloor} \left(\frac{|H'|}{|H|}\right)^{n - \lfloor\delta n\rfloor}.$$
Since we assume that $\delta$ is sufficiently small, and $|H'|/|H| \le 1/2$ for every proper subgroup $H'$, the above is bounded by $K' \exp(-c_\delta n)$ for some $K'$ and $c_\delta$ as in the statement. □
To complete the proof of Lemma 2.3, we combine Claim 2.4 with the preceding estimate for each code $A$. □
2.2. Non-codes. Next, for non-code $F$, the random walk $\langle F, X \rangle$ does not converge quickly to the uniform distribution on $G$. However, it is likely to be uniform over the subgroup on which the restriction of $F$ is a code.
We remark that in all results introduced below, $F$ is not necessarily a surjection.
Definition 5. For a real $\delta > 0$, the $\delta$-depth of $F \in \operatorname{Hom}(V, G)$ is the maximal positive integer $D$ such that there exists $\sigma \subset [n]$ with $|\sigma| < \ell(D)\delta n$ such that $|G/F(V_{\sigma^c})| \ge D$.
So roughly speaking the $\delta$-depth measures the maximum of $|G/F(V_{\sigma^c})|$ over $\sigma$ of size significantly smaller than $\delta n$. The depth is large if there exists such a $\sigma$ for which $F(V_{\sigma^c})$ is a small subgroup of $G$. The reason for this definition of depth is the following lemma, which shows that depth encodes how much one has to restrict $F$ to obtain a code.
Proof. Suppose for the sake of contradiction that the conclusion fails for some $\sigma$. But this produces a $D'$ satisfying the condition in the definition of depth with $D' > D$, contradicting maximality, which completes the proof. □
Lemma 2.6. [73, Lemma 2.6] The number of $F \in \operatorname{Hom}(V, G)$ with depth $D > 1$ is at most $K$ times a factor exponentially smaller than $|G|^n$, where $K$ depends on $a$ and $G$; see [73, Lemma 2.6] for the precise form.
We remark that the assumption above is automatically true if $F$ is a surjection. This result differs from [73, Lemma 2.7] in that $g$ is an arbitrary element instead of just $0$.
Proof. We follow the proof of [73, Lemma 2.7]. For any fixed values of $x_i$, $i \in \sigma \setminus \{i_0\}$, using the randomness of $x_{i_0}$ we obtain the required single-coordinate bound. Furthermore, by Lemma 2.5, $F(V_{\sigma^c})$ is a code of distance $\delta n$ over $H$. Hence $\langle F, X \rangle$ restricted to these coordinates equidistributes over a coset of $H$, and putting these estimates together yields the claim.

□
Using this result, we can obtain a similar bound in matrix form, with the bound now measured against $|G|^n$ and with $K$ depending on $a, G, \alpha$ and $\delta$, in the same way that [73, Lemma 2.8] was deduced from [73, Lemma 2.7].
Proof. By Lemma 2.7, the claimed bound follows, where we have Taylor expanded the logarithm inside the first exponential and kept only the first term (the rest are also negative), and $K$ is the maximum over $n \ge 1$ of $e^{n(|G|/D)\exp(-\alpha\delta n/a^2)}$. □
To complete this section we introduce two more definitions that will be crucial to our work.
Definition 6. For a given $k \ge 0$ we let $n_k(G)$ denote the number of sequences of nested subgroups of $G$ of length $k$ ending at $G$.
Definition 7. For a given $k \ge 0$ and given finite abelian groups $G_1, \ldots, G_k$, we let $m_k(G_1, \ldots, G_k)$ denote the analogous count for the sequence $G_1, \ldots, G_k$.

3. Counting surjections for Theorem 1.1

Let $a, R, V$ be as in the previous section. Throughout the section we write $\operatorname{Hom}(A, B)$ and $\operatorname{Sur}(A, B)$ for the set of homomorphisms and surjective homomorphisms, respectively, from $A$ to $B$.
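If $n_k(G)$ indeed counts chains of nested subgroups $0 \le H_1 \le \cdots \le H_k = G$, as Definition 6 suggests, then for cyclic $G$ it can be computed from chains of divisors, since subgroups of $\mathbb{Z}/m\mathbb{Z}$ correspond to divisors of $m$. A hypothetical sketch under that assumption:

```python
def divisor_chains(m, k):
    """Number of chains d_1 | d_2 | ... | d_k = m of positive divisors of m.
    Since subgroups of the cyclic group Z/mZ correspond to divisors of m,
    this counts chains of nested subgroups H_1 <= ... <= H_k = Z/mZ
    (the assumed reading of n_k for cyclic G)."""
    if k == 1:
        return 1
    # sum over the choice of the next-to-last subgroup, i.e. a divisor of m
    return sum(divisor_chains(d, k - 1) for d in range(1, m + 1) if m % d == 0)
```

Under this reading, $n_1(G) = 1$ for every $G$ (consistent with the single-matrix moment $\mathbb{E}[\#\operatorname{Sur}(\operatorname{Cok}(M), G)] \to 1$), and $n_2(\mathbb{Z}/4\mathbb{Z}) = 3$, one chain for each of the three subgroups $H_1 \le \mathbb{Z}/4\mathbb{Z}$.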
3.1. Set-up. We know from [73] that to understand the distribution of $\operatorname{Cok}(M)$, it suffices to determine the "moments" of $\operatorname{Cok}(M)$, i.e. the quantities $\mathbb{E}[\#\operatorname{Sur}(\operatorname{Cok}(M), G)]$ for each finite abelian group $G$. To investigate each such moment, we note that each such surjection lifts to a surjection $V \to G$, where we view $F$ as a column vector $F = (F(v_1), \ldots, F(v_n)) \in G^n$. By the independence of the columns of $M$, the expectation factors as a sum over surjections $F$ of products of single-column probabilities, where $X_1, \ldots, X_n$ are the columns of $M$. So in the case of a single matrix, one must estimate these probabilities $\mathbb{P}(\langle F, X_j \rangle = 0)$, which give the desired moments. In our situation we have random matrices $M_1, M_2, \ldots, M_k$, and want to study the joint analogue. Recall $n_k(G)$ from Definition 6. Our key result in this section is a generalization of Lemma 2.2 and Lemma 2.8 (though in what comes later we will not use the result itself as stated below, but rather several intermediate steps of its proof).
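For orientation, the factorization just described can be written out as follows (a reconstruction consistent with the surrounding set-up, with $X_j$ denoting the $j$-th column of $M$): a surjection $\operatorname{Cok}(M) \to G$ is the same as a surjection $F : V \to G$ with $F(Mv_j) = \langle F, X_j \rangle = 0$ for every $j$, so that
$$\mathbb{E}\big[\#\operatorname{Sur}(\operatorname{Cok}(M), G)\big] \;=\; \sum_{F \in \operatorname{Sur}(V, G)} \mathbb{P}\big(\langle F, X_j \rangle = 0 \ \text{for all } j\big) \;=\; \sum_{F \in \operatorname{Sur}(V, G)} \prod_{j=1}^{n} \mathbb{P}\big(\langle F, X_j \rangle = 0\big).$$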
Proposition 3.1. With the same assumptions as in Theorem 1.1, the following holds for $\delta$ sufficiently small: there exist $c, K$ depending on $k, \alpha, G, a, \delta$ such that the bounds (i) and (ii) hold.
Proof. In what follows $K$ and $c$ may vary from line to line, and the implied constants in $O(\cdot)$ are allowed to depend on $k, \alpha, G, a$ and $\delta$.
We prove (i) and (ii) together by induction on $k$, assuming both (i) and (ii) hold for $k - 1$ as the inductive hypothesis. When $k = 1$, (i) and (ii) follow from Lemma 2.2 and Lemma 2.8 respectively. Next we consider $k \ge 2$.
Codes. We first prove (i), working with $F$ a code of distance $\delta n$.
Let $H_{k-1}$ be a subgroup of $H_k = G$. We consider the event (in the $\sigma$-algebra generated by $M_1, \ldots, M_{k-1}$) that the earlier product spans $H_{k-1}$. For the first case, we apply the induction hypothesis for (i); for the second case, we also apply the induction hypothesis for (ii). For the first sum, by Claim 2.4, and then by Lemma 2.3 and the inductive hypothesis for (i), we obtain the main term. For the second sum, for each $D_{k-1}$ we apply Lemma 2.6 and Lemma 2.2 to bound the count against $|G|^n$ (13), and apply the inductive hypothesis for (ii) to bound the remaining probability (14). Combining (13) with (14) yields the claimed estimate, where for the second line we recall that $\delta$ was chosen sufficiently small and $n$ is sufficiently large.

Summing over divisors $D_{k-1}$, and then over $H_{k-1} \le H_k$, we thus obtain the desired bound, completing the estimates for codes.
Non-codes. We next prove (ii), working with $F$ which is not a code of distance $\delta n$. Similarly to the previous part, we again compute the probability that $M_k F$ spans $H_{k-1}$ in the two possible ways:
(1) $M_k F$ is a code of distance $\delta n$;
(2) $M_k F$ is not a code of distance $\delta n$, and hence has some $\delta$-depth $D_{k-1} > 1$.
For the first case, the probability with respect to $M_k$ is bounded by bounding the number of codes by $|H_{k-1}|^n$ and applying Lemma 2.8. Hence, by induction and by the independence of $M_1, \ldots, M_k$, we may bound the total contribution. Summing over the subgroups $H_{k-1}$, we obtain the first estimate. For the second case (2), the probability with respect to $M_k$ is bounded by Lemma 2.6 and Lemma 2.8. Hence, by induction, this contribution is also acceptable, provided that $\delta$ was chosen sufficiently small and $n$ is sufficiently large.
Summing over $D_{k-1}$ a divisor of $|H_{k-1}|$, and then over the subgroups $H_{k-1}$, we obtain the claimed bound, proving our upper bound for non-code $F$. □
Using the proof of Proposition 3.1 above we obtain
Theorem 3.2 (Asymptotic moments of matrix products). Let $a \ge 2$ and $R = \mathbb{Z}/a\mathbb{Z}$, let $G$ be any finite abelian group whose exponent divides $a$, and let $M_1, \ldots, M_k$ be random matrices in $\operatorname{Mat}_n(R)$ with iid $\alpha$-balanced entries. Then
$$\left| \mathbb{E}\big[\#\operatorname{Sur}(\operatorname{Cok}(M_1\cdots M_k), G)\big] - n_k(G) \right| \le K e^{-cn}$$
for some $K, c$ depending on $k, \alpha, G, a$.
Proof of Theorem 3.2. By (12) it suffices to bound the sum over $F$. From (15) and Claim 2.4, we sum over $F$ which are codes of distance $\delta n$ in $G$ to obtain (19). From (16) and Lemma 2.6, for each $D_k$ a divisor of $|G|$, summing over non-codes gives (20), provided that $\delta$ was chosen sufficiently small and $n$ is sufficiently large.
Also, from (17) and Lemma 2.6, we obtain (21), again as $\delta$ is small and $n$ is sufficiently large. Summing (20), (21) over all $D_k \mid |G|$, together with (19), we obtain (18) as claimed. □
We will not use the following result later, but include it because it demonstrates how moment bounds for finite rings can imply them for infinite ones such as $\mathbb{Z}$ and $\mathbb{Z}_p$.
Corollary 3.3. Let $M_1, \ldots, M_k$ have iid entries in $\mathbb{Z}_p$ which are not constant modulo $p$. Then for any finite abelian $p$-group $G$ we have
$$\left| \mathbb{E}\big[\#\operatorname{Sur}(\operatorname{Cok}(M_1\cdots M_k), G)\big] - n_k(G) \right| \le K e^{-cn}$$
for some $K, c$ depending on $k, \alpha, G$, where $\alpha \in (0, 1/2]$ is such that the matrix entries modulo $p$ are $\alpha$-balanced.
Proof. Let $p^L$ be the exponent of $G$. First note that for any abelian $p$-group $H$, $\operatorname{Sur}(H, G) = \operatorname{Sur}(H/p^L H, G)$, as any surjection from $H$ to $G$ automatically annihilates $p^L H$. Note also, with the notation $\bar{M}_i$ for the reduction of $M_i$ modulo $p^L$, that $\operatorname{Cok}(M_1\cdots M_k)/p^L \cong \operatorname{Cok}(\bar{M}_1 \cdots \bar{M}_k)$. The result now follows from Theorem 3.2 applied with $R = \mathbb{Z}/p^L\mathbb{Z}$ and matrices $\bar{M}_1, \ldots, \bar{M}_k$, which are $\alpha$-balanced since the entries are not constant modulo $p$. □

4. Counting joint surjections for Theorem 1.2

Recall that $R = \mathbb{Z}/a\mathbb{Z}$ where $a$ is a positive integer. Let $G_1, \ldots, G_k$ be finite abelian groups whose exponents divide $a$. Recall $m_k(G_1, \ldots, G_k)$ from Definition 7. Our main goal for the proof of Theorem 1.2 is the following counting formula for the joint surjections.
Theorem 4.1 (Asymptotic joint moments of matrix products). Let $M_1, \ldots, M_k$ be independent random elements of $\operatorname{Mat}_n(R)$ with iid entries which are copies of some $\alpha$-balanced $\xi$. Let $G_1, \ldots, G_k$ be finite abelian groups whose exponents divide $a$. Then the joint moment differs from $m_k(G_1, \ldots, G_k)$ by at most $Ke^{-cn}$, for some $K, c$ depending on $k, \alpha, G_i, a, \delta$.
As before, this yields a corresponding moment result for $p$-adic matrices, with error at most $Ke^{-cn}$ for some $K, c$ depending on $k, \alpha, G_i$.
Proof. Argue as in Corollary 3.3 by letting $p^L$ be the maximum exponent of $G_1, \ldots, G_k$, reducing matrices modulo $p^L$, and applying Theorem 4.1. □
4.1. Multidimensional setting. In this part we give some preparation for the proof of Theorem 4.1.
Recall that we are interested in the event that several matrix equations hold simultaneously. Hence it is natural to consider the more general problem of determining, for some maps $F'_1, \ldots, F'_k$, the probability of the joint event $MF'_1 = 0, \ldots, MF'_k = 0$ for a matrix $M$ with iid $\alpha$-balanced entries. Definition 8. We say that $F_1 \in \operatorname{Hom}(V, H_1), \ldots, F_k \in \operatorname{Hom}(V, H_k)$ are a joint code of distance $\delta n$ with respect to a subgroup $H' \le \bigoplus_{i=1}^k H_i$ if the combined map $(F_1, \ldots, F_k) : V \to H'$ is a code of distance $\delta n$. Remark 3. To avoid a potential point of confusion: if $(F_1, \ldots, F_k)$ are a joint code, it is not in general true that the $F_i$ are individually codes with respect to $H_i$; they do not even have to be surjections, since the definition of joint code is with respect to some subgroup $H'$.
Our first result is Lemma 2.1 restated in the multidimensional setting. Lemma 4.3. Let $F_1, \ldots, F_k$ be a joint code of distance $\delta n$ with respect to $H' \le \bigoplus_{i=1}^k H_i$, and let $X$ be a random vector in $R^n$ with iid $\alpha$-balanced entries. Then for any $(h_1, \ldots, h_k) \in H'$, the corresponding equidistribution estimate holds. The reason we have this projection condition is that later $G$ is generated by the images in question. By assumption $F_i : V \to G_i$ is surjective, and hence the projection onto $G_i$ of the group above is $G_i$ itself.
We note that the sets $H_i$ from Definition 7 belong to $\mathcal{G}_{i;i+1,\ldots,k}$.
Remark 4. Since projections $\pi : G \to H$ appear frequently in this section, we will use the notation $\pi(F)$ for the vector in $H^n$ given by $(\pi(F_1), \ldots, \pi(F_n))$. We recall from Definition 5 that the $\delta$-depth of $F$ is the maximal $D$ such that there exist $\sigma$ and $H \le G$ as in that definition. If there is no such $D$, then the depth is 1. Lemma 2.6 applied to $G$ yields Lemma 4.4 (Number of tuples with given depth). The number of $(F_1, \ldots, F_k) \in \operatorname{Hom}(V, G)$ with depth $D$ is at most the bound of Lemma 2.6, where $K$ depends on $a, G$.
Lemma 4.6. With the same assumption as in Lemma 4.5, for any $A \in (G')^n$ and $n \times n$ matrix $M_1$ with iid $\alpha$-balanced entries, the analogous bound holds, where $K$ depends on $a, G, \alpha$ and $\delta$. Now we turn to Theorem 4.1. As in the previous section, in what follows the implied constants in $O(\cdot)$ are allowed to depend on $k, \alpha, G, a, \delta$. For expository purposes we focus on $k = 2$ first.
4.2. Proof of Theorem 4.1 for $k = 2$. We begin by expanding the joint moment as a sum over pairs of surjections. The remainder of the proof consists of analyzing this sum; let us now fix surjections $F_1, F_2$. For any fixed matrix $M_2$, the image of $(F_1, M_2F_2)$ is a subgroup $G' \le G_1 \oplus G_2$, and since $F_1$ is a surjection, we must have $\pi_1(G') = G_1$. We will consider the random map $(F_1, M_2F_2) : V \to G$ (recall the $F_i$ are fixed but $M_2$ is random), and we note that the events that the image equals $G'$ are disjoint. For each such $G'$, we further partition the event according to whether $(F_1, M_2F_2)$ is a code of distance $\delta n$. Here $\delta$ is a sufficiently small constant, to be fixed later. Let $S_1$ and $S_2$ be the corresponding contributions to the sum in (25). We now analyze these two contributions to (26), showing that the contribution $S_1$ is the main term while that of $S_2$ is asymptotically small. Here and below, we use $\sum_{G'}$ as shorthand for a sum over all subgroups $G' \le G_1 \oplus G_2$ with $\pi_1(G') = G_1$. a. Analysis of $S_1$. By Lemma 4.3 we have the pointwise estimate, where $c$ depends on $a, \delta, \alpha$, and $G$ (a priori $c$ depends on $G'$, but we simply take the worst constant $c$ over the finitely many choices of $G'$, which hence depends only on $G$). It remains to evaluate the probability with respect to $M_2$. For this we will sum over $F_1$ as well. We will divide into two cases.
(i) Main term: summation over codes $F_1$ and codes $F_2$. We first consider the case when $F_2$ is a code of distance $\delta n$ in $G_2$. Claim 2.4 applied to the group $G'$ immediately yields the following.
where the constants are allowed to depend on $\delta$. Now for each such joint code $F$ of distance $\delta n$, because $F_2$ is a code of distance $\delta n$ in $G_2$, Lemma 2.2 applies. Summing over $F_1$, noting that $\pi_1(F)$ is a code of distance $\delta n$ over $G_1$, we thus obtain the corresponding estimate. We then sum over codes $F$ of distance $\delta n$. Hence in total we have, for a fixed code $F_2$ of distance $\delta n$ over $G_2$, the stated bound. Now we sum over codes $F_2$ of distance $\delta n$ (using Claim 2.4), and then over $G'$. Before moving to the next estimate, we record some other useful results by summing (28) over codes $F_1, F_2$ of distance $\delta n$. (ii) Error term: summation over codes $F_1$ and non-codes $F_2$. Assume that $F_2$ is not a code of distance $\delta n$ over $G_2$. Then it has some $\delta$-depth $D_2 > 1$ with $D_2$ dividing $|G_2|$. We will shortly sum over all such $D_2$ and $F_2$, but for now note that for any fixed $D_2$ and $F_2$ of $\delta$-depth $D_2$, Lemma 2.8 applies. Summing over codes $F$, over $F_1$, and then over $F_2$, we obtain the required bound, where we used Lemma 2.6 to enumerate $F_2$, and in the last bound assumed that $n$ is large and $\delta$ is sufficiently small so that the exponential growth of the $\delta$-dependent factors is not too large.

Combining the above with (27) yields a bound on the sum over codes $F_1$ over $G_1$ and non-codes $F_2 \in \operatorname{Sur}(V, G_2)$. In summary, we have shown the following. For later use, we record here another useful result by putting (30) and (31) together. b. Analysis of $S_2$. Our treatment of $S_2$ is similar to part (ii) of a. We first partition $S_2$ into contributions corresponding to $(F_1, M_2F_2)$ of given depth. Since the first two sums in (33) are finite, for the contribution of $S_2$ it suffices to fix $G'$ and $D$ and show the corresponding decay. First, by using Lemma 4.6 we can bound the summand. (i) Summation over $F_1 \in \operatorname{Sur}(V, G_1)$ and codes $F_2$ of distance $\delta n$ over $G_2$. We first fix a code $F_2$ of distance $\delta n$, and $F'$ of depth $D$ over $G'$. By Lemma 2.2 we obtain a pointwise bound; summing over codes $F_2$ of distance $\delta n$ over $G_2$ using Claim 2.4 gives the estimate with $c'' = \min\{c, c'\}$.
As a consequence, by summing over the non-codes $F'$ of depth $D$ over $G'$ we obtain the first bound. More importantly, by summing over the non-codes $F'$ of depth $D$ over $G'$, by Lemma 4.4 and (34) we obtain the second, where again in the last bound we require that $\delta$ is sufficiently small and $n$ is sufficiently large.
(ii) Summation over $F_1 \in \operatorname{Sur}(V, G_1)$ and non-codes $F_2 \in \operatorname{Sur}(V, G_2)$. The treatment here is similar to part (ii) of a. Indeed, as $F_2$ is not a code of distance $\delta n$ over $G_2$, it has some $\delta$-depth $D_2 > 1$ with $D_2$ dividing $|G_2|$. Lemma 2.8 applies; as a consequence, summing over $F'$ of depth $D$, over $F_1$, and then over $F_2$ (by summing over $D_2$), we obtain a bound on the sum over $F_1 \in \operatorname{Sur}(V, G_1)$ and non-codes $F_2 \in \operatorname{Sur}(V, G_2)$, completing the treatment in this case.
Finally, we remark that (35) and (36) together yield a combined estimate.
4.3. Proof of Theorem 4.1 for general $k$. We will proceed as in the case $k = 2$. For each subgroup $G' \in \mathcal{G}_{1;2,\ldots,k}$, we consider two cases: (1) the maps form a joint code of distance $\delta n$; (2) they are not joint, hence have some $\delta$-depth $D > 1$.
Motivated by (32) and (37) in the proof of the k = 2 case of Theorem 4.1, we will show the following key result.
Proposition 4.8. With the same assumptions as in Theorem 1.1 and $\delta$ sufficiently small, for each subgroup $G' \in \mathcal{G}_{1;2,\ldots,k}$ the inequalities (38) and (39) hold with the randomness from $M_2, \ldots, M_k$, where $c$ and the implied constants are allowed to depend on $k, \alpha, \delta, G_1, \ldots, G_k$ and $a$.
We return to the proof of this result after proving Theorem 4.1.
Proof of Theorem 4.1, assuming Proposition 4.8. By the above discussion, it remains to establish the following. For any $G'$, we have by Lemma 4.3 (where "code" is understood as "code of distance $\delta n$") the main estimate, where the last equality follows from the first part of Proposition 4.8.
Summing over $G'$ and using (9) gives the main term. The sum over non-codes can be treated as follows: we used Lemma 4.6 in the first bound, the second part of Proposition 4.8 in the second bound, and assumed that $\delta$ is sufficiently small for the final step. □ Proof of Proposition 4.8. Similarly to the proof of Proposition 3.1 we will induct on $k$. The base case $k = 2$ is given by (32) and (37) in the proof of the $k = 2$ case of Theorem 4.1 in Subsection 4.2. We induct on both (38) and (39) simultaneously, i.e. we need the $k - 1$ case of both (38) and (39) to prove the $k$ case of each.
Proof of (38). As we are working with joint codes, since $\pi_1(G') = G_1$ by the definition of $\mathcal{G}_{1;2,\ldots,k}$, it follows automatically that $F_1$ is a code of distance $\delta n$ in $G_1$. Recall that $C(G')$ denotes the set of codes in $(G')^n$. By Claim 4.7 we know that this set has size $(1 + O(\exp(-cn)))|G'|^n$.
This motivates us to fix a group $G'' \in \mathcal{G}_{2;3,\ldots,k}$ with $G'' \ge \pi_{\{2,\ldots,k\}}(G')$, and consider the probability that the corresponding tuple spans $G''$ and is a code of distance $\delta n$ in $G''$. This probability is controlled by the inductive hypothesis for (38). For short, let $E_{3,\ldots,k,G''}$ be the event (depending on the $F_i$) just described. By Lemma 4.3, the conditional estimate holds. We therefore have the bound on the sum over $F_i \in \operatorname{Sur}(V, G_i)$, $2 \le i \le k$, where we used the induction hypothesis (41) in the very last estimate, and the fact that the remaining factors match. By Claim 2.4 there are $(1 + O(\exp(-cn)))|G'|^n$ codes $F'$. Summing the above over them, we obtain the main term. It remains to show that the remaining part of the LHS of (38), corresponding to the case when the tuple is not a code in its image, is small.
Proof of (39). For each spanning tuple we consider the corresponding probability. This probability again depends on whether $(F_2, M_3F_3, \ldots, M_3 \cdots M_kF_k)$ forms a code or not. Hence we will divide into two cases.
Our treatment is similar to case (i) in the proof of (38), except that the sum over $F'$ is small as the number of non-codes is small.
This motivates us to fix a group $G'' \in \mathcal{G}_{2;3,\ldots,k}$ with $G'' \ge \pi_{\{2,\ldots,k\}}(G')$, and consider the probability that the corresponding tuple spans $G''$ and is a code of distance $\delta n$ in $G''$, which is controlled by the inductive hypothesis. For short, let $E_{3,\ldots,k,G''}$ be the event (depending on the $F_i$) just described. By Lemma 4.3 the conditional estimate holds, and we therefore obtain the bound, where we used the induction hypothesis (41) in the very last estimate.
As there are at most $O(\exp(-cn))|G'|^n$ such non-codes, summing the above over them we obtain the required bound. It remains to treat the case when $(F_2, M_3F_3, \ldots, M_3 \cdots M_kF_k)$ is not a code in its image.
Summing over $D_2$ and over $G''$, and then over $F_2, \ldots, F_k$, we obtain the required bound, provided that $\delta$ is sufficiently small.
Finally, again as there are at most $O(\exp(-cn))|G'|^n$ such terms, together with (45) this estimate completes the proof of (39). □

5. Hall-Littlewood polynomial background
This section contains standard definitions and results on Hall-Littlewood polynomials. We also introduce the ring of symmetric functions, which may be thought of as a ring modeling symmetric polynomials in infinitely many variables, and which is needed to obtain measures such as the Cohen-Lenstra measure in the Hall-Littlewood process formalism. This material may be found in [48, Chapter III], and the setup of Hall-Littlewood processes for instance in [9]; much of the material below is quoted with little modification from [66, Section 2].
We denote by $\Lambda_n$ the ring $\mathbb{C}[x_1, \ldots, x_n]^{S_n}$ of symmetric polynomials in $n$ variables $x_1, \ldots, x_n$. It is a very classical fact that the power sum symmetric polynomials $p_k(x_1, \ldots, x_n) = \sum_{i=1}^n x_i^k$, $k = 1, \ldots, n$, are algebraically independent and algebraically generate $\Lambda_n$. For a symmetric polynomial $f$, we will often write $f(x)$ for $f(x_1, \ldots, x_n)$ when the number of variables is clear from context. One has a chain of maps $\cdots \to \Lambda_{n+1} \to \Lambda_n \to \cdots$, where the map $\Lambda_{n+1} \to \Lambda_n$ is given by setting $x_{n+1}$ to $0$. In fact, writing $\Lambda_n^{(d)}$ for the symmetric polynomials in $n$ variables of total degree $d$, one has $\cdots \to \Lambda_n^{(d)} \to \Lambda_{n-1}^{(d)} \to \cdots \to \Lambda_0^{(d)}$ with the same maps. The inverse limit $\Lambda^{(d)}$ of these systems may be viewed as symmetric polynomials of degree $d$ in infinitely many variables. From the ring structure on each $\Lambda_n$ one gets a natural ring structure on $\Lambda := \bigoplus_{d \ge 0} \Lambda^{(d)}$, and we call this the ring of symmetric functions. An equivalent definition is $\Lambda := \mathbb{C}[p_1, p_2, \ldots]$ where the $p_i$ are indeterminates; under the natural map $\Lambda \to \Lambda_n$ one has $p_i \mapsto p_i(x_1, \ldots, x_n)$.
Each ring $\Lambda_n$ has a natural basis $\{p_\lambda : \lambda_1 \le n\}$, where $p_\lambda := \prod_{i \ge 1} p_{\lambda_i}$. Another natural basis, with the same index set, is given by the Hall-Littlewood polynomials. Recall the $q$-Pochhammer symbol $(a; q)_n := \prod_{i=0}^{n-1}(1 - aq^i)$. Definition 10. The Hall-Littlewood polynomial indexed by $\lambda \in \mathbb{Y}_n$ is
$$P_\lambda(x_1, \ldots, x_n; t) = \frac{1}{v_\lambda(t)} \sum_{\sigma \in S_n} \sigma\left( x_1^{\lambda_1} \cdots x_n^{\lambda_n} \prod_{1 \le i < j \le n} \frac{x_i - t x_j}{x_i - x_j} \right), \qquad v_\lambda(t) = \prod_{i \ge 0} \frac{(t; t)_{m_i(\lambda)}}{(1-t)^{m_i(\lambda)}},$$
where $\sigma$ acts by permuting the variables and $m_i(\lambda)$ is the number of parts of $\lambda$ equal to $i$. We often drop the '$;t$' when clear from context.
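As a concrete instance (a standard computation from the definition in [48, Chapter III.1]), in two variables one finds:

```latex
P_{(1)}(x_1, x_2; t) = x_1 + x_2, \qquad
P_{(1,1)}(x_1, x_2; t) = x_1 x_2, \qquad
P_{(2)}(x_1, x_2; t) = x_1^2 + x_2^2 + (1-t)\, x_1 x_2.
```

At $t = 0$ these reduce to the Schur polynomials, and at $t = 1$ to the monomial symmetric polynomials.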
Definition 11. For $\lambda \in \mathbb{Y}$, we define the dual Hall-Littlewood polynomial by $Q_\lambda(x; t) := b_\lambda(t) P_\lambda(x; t)$, where $b_\lambda(t) := \prod_{i \ge 1} (t; t)_{m_i(\lambda)}$. These are similarly consistent under the maps $\Lambda_{n+1} \to \Lambda_n$ and hence define symmetric functions.
Because the $P_\lambda$ form a basis for the vector space of symmetric polynomials in $n$ variables, there exist symmetric polynomials $P_{\lambda/\mu}(x_1, \ldots, x_{n-k}; t) \in \Lambda_{n-k}$, indexed by $\lambda \in \mathbb{Y}_n$ and $\mu \in \mathbb{Y}_k$, defined by
$$P_\lambda(x_1, \ldots, x_n; t) = \sum_{\mu \in \mathbb{Y}_k} P_{\lambda/\mu}(x_{k+1}, \ldots, x_n; t)\, P_\mu(x_1, \ldots, x_k; t).$$
The definition of $Q_{\lambda/\mu}$ is exactly analogous. As with non-skew Hall-Littlewood polynomials, the skew versions are consistent under the maps $\Lambda_{n+1} \to \Lambda_n$ and hence define symmetric functions in $\Lambda$, which we also denote by $P_{\lambda/\mu}$ and $Q_{\lambda/\mu}$.
Hall-Littlewood polynomials and symmetric functions satisfy the skew Cauchy identity, upon which most probabilistic constructions rely. For polynomials in a finite number of variables, it reads
$$\sum_{\lambda \in \mathbb{Y}} P_{\lambda/\mu}(x; t)\, Q_{\lambda/\nu}(y; t) = \Pi_t(x; y) \sum_{\kappa \in \mathbb{Y}} Q_{\mu/\kappa}(y; t)\, P_{\nu/\kappa}(x; t). \tag{48}$$
For later convenience we set
$$\Pi_t(x; y) := \prod_{i,j} \frac{1 - t x_i y_j}{1 - x_i y_j} = \exp\left( \sum_{k \ge 1} \frac{1 - t^k}{k}\, p_k(x)\, p_k(y) \right). \tag{49}$$
The second equality in (49) is not immediate but is shown in [48]. The RHS of (49) makes sense as a formal series in a suitable completion of $\Lambda \otimes \Lambda$, and (48) generalizes straightforwardly with the skew $P$ and $Q$ functions in $x$ and $y$ replaced by corresponding elements of $\Lambda \otimes \Lambda$.
In particular, if $\mu$ and $\nu$ are both empty we recover the Cauchy identity $\sum_\lambda P_\lambda(x; t)\, Q_\lambda(y; t) = \Pi_t(x; y)$. Let us now take the parameter $t$ to be real. We would like to define probabilities by substituting real numbers for the variables of Hall-Littlewood polynomials. The analogue of 'specializing infinitely many variables' for a general symmetric function is captured as follows.
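As a one-variable sanity check on the Cauchy identity: with a single variable on each side, only one-row partitions contribute, $P_{(k)}(x; t) = x^k$ and $Q_{(k)}(x; t) = (1-t)x^k$ for $k \ge 1$, so

```latex
\sum_{\lambda} P_\lambda(x; t)\, Q_\lambda(y; t)
  = 1 + (1-t) \sum_{k \ge 1} (xy)^k
  = 1 + \frac{(1-t)\, xy}{1 - xy}
  = \frac{1 - t\, xy}{1 - xy} = \Pi_t(x; y),
```

as predicted.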
Definition 12. Given a sequence of real numbers $\alpha_1 \ge \alpha_2 \ge \cdots \ge 0$ with $\sum_i \alpha_i < \infty$, the pure alpha specialization with parameters $\alpha = \{\alpha_i\}_{i \ge 1}$ is the homomorphism $\Lambda \to \mathbb{C}$ defined on the generators $p_k$ by $p_k \mapsto \sum_{i \ge 1} \alpha_i^k$. For a general symmetric function $f \in \Lambda$, we write $f(\alpha)$ for the image of $f$ under this homomorphism.
When $\alpha$ has only finitely many (say, $k$) nonzero parameters, $f(\alpha)$ is just the corresponding symmetric polynomial in $\Lambda_k$ with $x_1 = \alpha_1, \ldots, x_k = \alpha_k$ plugged in for the variables. Extending the notation of (49), we write $\Pi_t(\alpha; \beta)$ for the result of applying the specializations $\alpha$ and $\beta$ in the two sets of variables. Of course, one can substitute any real or complex numbers for the variables, but choosing $\alpha_i \ge 0$ has the advantage for probability that when $t \in [0, 1]$, $P_\lambda(\alpha)$ and $Q_\lambda(\alpha)$ are nonnegative.
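For example, the geometric specialization $\alpha_i = t^i$, which appears repeatedly below in expressions such as $P_\lambda(t, t^2, \ldots)$, acts on the power sum generators by

```latex
p_k(t, t^2, t^3, \ldots) = \sum_{i \ge 1} t^{ik} = \frac{t^k}{1 - t^k}, \qquad k \ge 1,
```

which is finite for $|t| < 1$, so the specialization is well defined there.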
Remark 5. There are other homomorphisms $\Lambda \to \mathbb{C}$ with nonnegative values on the $P_\lambda$, of which the pure alpha specializations form one family. These were classified in [52].
One obtains probability measures on sequences of partitions using Proposition 5.1 as follows.
The $k = 1$ case of Definition 13 is a measure on partitions, referred to as a Hall-Littlewood measure. Two special cases will be relevant for our setting. Below and subsequently, we use the notation $x[k]$ in the arguments of Hall-Littlewood polynomials to denote the variable $x$ repeated $k$ times.

It follows from the branching rule (47) that the marginal distribution of $\lambda^{(k)}$ under $\mathbb{P}^{(k)}_{\infty;t}$ is $\mathbb{P}^{*(k)}_{\infty;t}$. When $t = 1/p$, the random sequence of partitions specified by $\mathbb{P}^{(k)}_{\infty;t}$ is related to the random sequence of groups in Theorem 1.2 with $P = \{p\}$, and similarly $\mathbb{P}^{*(k)}_{\infty;t}$ is related to Theorem 1.1. We discuss this in the next section. Remark 6. We use a slightly different setup for Hall-Littlewood polynomials than some previous works [67, 68] on Hall-Littlewood polynomials and $p$-adic random matrices. The reason is that those works considered random matrices over $\mathbb{Q}_p$, and consequently it was desirable to extend the indices of Hall-Littlewood polynomials to 'partitions' with negative parts allowed, which required modifying the standard notation of [48] slightly. Because we work only over $\mathbb{Z}_p$, there is no need to do this, and our notation follows that of [48].

6. Hall-Littlewood polynomials and abelian $p$-groups
The goal of this section is to prove several basic results giving formulas, in terms of Hall-Littlewood polynomials, for various counts of maps between abelian $p$-groups. All follow straightforwardly from the material in [48, Chapters II and III], but most do not seem to be present in the random matrix theory literature. We therefore hope this section will have some value in translating between the usual terminology of moments and the Hall-Littlewood notation used in e.g. [31, 67, 68, 69]. Here and in the next section, we will usually fix a prime $p$ and let $t = 1/p$ to declutter notation and keep with standard Hall-Littlewood usage.
Definition 16. For any partition $\lambda \in \mathbb{Y}$, we denote by $G_\lambda$ the abelian $p$-group $\bigoplus_{i \ge 1} \mathbb{Z}/p^{\lambda_i}\mathbb{Z}$ when $p$ is fixed and clear from context. The type of a finite abelian $p$-group $H$ is the partition $\lambda$ for which $H \simeq G_\lambda$. For partitions $\lambda, \mu \in \mathbb{Y}$, we denote by $\mathcal{G}_{\mu,\lambda}$ the set of subgroups of $G_\lambda$ which have type $\mu$. Lemma 6.1. With $t = 1/p$ as above, the count $\#\mathcal{G}_{\mu,\lambda}$ admits a Hall-Littlewood expression. Proof. The result follows by collecting a few facts from Chapters II and III of [48]. By their definition [48, Chapter II.2 (2.1)], the Hall algebra structure constants $g^\lambda_{\mu\nu}(p)$ count subgroups of $G_\lambda$ with prescribed sub- and quotient type. It is then shown in [48, Chapter III.3 (3.4)] that these are expressed through the Hall-Littlewood structure constants, where $n(\lambda) = \sum_{i \ge 1} (i-1)\lambda_i$ and the second equality follows by [48, Chapter III.3, Ex. 2(a)]. Here the $c^\lambda_{\mu,\nu}(t)$ are the multiplicative structure constants defined by $P_\mu P_\nu = \sum_\lambda c^\lambda_{\mu,\nu}(t) P_\lambda$, or equivalently the comultiplicative structure constants of the $Q$ polynomials. Hence the number of subgroups of $G_\lambda$ of type $\mu$ follows by summing over the quotient types. □ Proposition 6.2. For any $\lambda, \mu \in \mathbb{Y}$, we have (54) and more generally (55). Proof. The first part, (54), follows directly from [48, Chapter III.3, Ex. 2(a)]. This together with Lemma 6.1 yields the claim.

The fact that the RHS is equal to $\frac{P_{\lambda/\mu}(t, t^2, \ldots)}{P_\lambda(t, t^2, \ldots)}\, Q_\mu(1, t, \ldots)$ follows since $Q_\lambda$ is a constant multiple of $P_\lambda$ and is homogeneous. Since a surjection $G_\lambda \twoheadrightarrow G_\mu$ induces an injection $G_\mu^* \to G_\lambda^*$ and vice versa, and finite abelian groups are isomorphic to their dual groups, $\#\operatorname{Sur}(G_\lambda, G_\mu) = \#\operatorname{Inj}(G_\mu, G_\lambda)$; hence we have established (55). □ The following 'joint moment' result will be useful later. It is a natural generalization of the well-known fact that the moments of the Cohen-Lenstra distribution are 1, reducing to this fact when one of the groups is trivial. Proposition 6.3. Let $M, N$ be finite abelian $p$-groups. Then the corresponding weighted sum of $\#\operatorname{Sur}(K, M)\, \#\operatorname{Hom}(K, N)$, with normalizing constant $\frac{1}{\Pi_t(1, t, \ldots;\, t, t^2, \ldots)}$, admits a closed form, where the sum is over one representative $K$ from each isomorphism class of finite abelian $p$-groups.
Proof. Letting $\mu, \nu$ be the types of $M, N$ respectively, by Proposition 6.2 the LHS can be rewritten in Hall-Littlewood terms. By the skew Cauchy identity stated in Proposition 5.1, this can be summed in closed form, where we again used Proposition 6.2.
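For orientation, the well-known fact referenced above (that moments of the Cohen-Lenstra distribution are 1) reads explicitly as follows; note also that the computation $\Pi_t(1, t, \ldots;\, t, t^2, \ldots) = \prod_{j \ge 1}(1 - t^j)^{-1} = (t; t)_\infty^{-1}$ identifies the normalizing constant with the usual Cohen-Lenstra constant at $t = 1/p$:

```latex
\sum_{K} \frac{(p^{-1}; p^{-1})_\infty}{\#\operatorname{Aut}(K)}\, \#\operatorname{Sur}(K, H) = 1
\qquad \text{for every finite abelian } p\text{-group } H,
```

where the sum runs over one representative $K$ of each isomorphism class of finite abelian $p$-groups.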
Proof. We induct on $k$, the base case $k = 1$ being trivial. It follows from (54) and (55) that for fixed $G_\lambda$ the stated identity holds, where the last equality is by the branching rule. Since $Q_\lambda$ is a constant multiple of $P_\lambda$ and is homogeneous, the second equality of (57) follows. □

7. Moments and joint moments of the candidate limit distributions
In this section we compute the moments and joint moments of the limiting distributions appearing in Theorems 1.1 and 1.2, by relating them to the Hall-Littlewood framework of the last two sections. Below, for a finite set of primes $P$ we use the notation $A_P$ for the set of all abelian groups $G$ such that every prime factor of $|G|$ lies in $P$. We begin by defining notation for the probability measures appearing in Theorems 1.1 and 1.2; we shortly show that these expressions do indeed define probability measures.
Definition 17. For $k \in \mathbb{Z}_{\ge 1}$ and $P$ a finite set of primes, given abelian groups $B, B_1, \ldots, B_k \in A_P$, we let $\mathbb{P}^{*(k)}_P(B)$ and $\mathbb{P}^{(k)}_P(B_1, \ldots, B_k)$ be given by the corresponding products, where we take $B_0$ to be the trivial group in the product.
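For a single prime, the second of these measures takes the explicit product form appearing in Theorem 9.2 below (with $B_0$ the trivial group), in line with the description in the abstract of weighting sequences inversely proportionally to their number of automorphisms:

```latex
\mathbb{P}^{(k)}_{\{p\}}(B_1, \ldots, B_k)
  = (p^{-1}; p^{-1})_\infty^{\,k}\, \prod_{i=1}^{k} \frac{\#\operatorname{Sur}(B_i, B_{i-1})}{\#\operatorname{Aut}(B_i)}.
```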
The notation is meant to be suggestive of the Hall-Littlewood measures $\mathbb{P}^{*(k)}_{\infty;t}$ and $\mathbb{P}^{(k)}_{\infty;t}$, and we relate the two in this section. In the remainder of the section we use $P$ and $k$ as in Definition 17 without comment.
We begin with factorization properties which reduce the theorem to the case of a single prime.
Lemma 7.6. Let $L, M, N$ be finite abelian $p$-groups. Then the corresponding identity holds. We note that when $N$ is trivial, Lemma 7.6 reduces to Proposition 6.3.
Proof of Lemma 7.6. For any $K$, composing $\varphi \in \operatorname{Hom}(K, M \oplus N)$ with the projections onto the two factors yields a natural map $\operatorname{Hom}(K, M \oplus N) \to \operatorname{Hom}(K, M) \times \operatorname{Hom}(K, N)$, which is a bijection by the universal property of direct products. Hence summing over the possible images of $\varphi$ yields an expression for $\#\operatorname{Sur}(K, M)\, \#\operatorname{Hom}(K, N)$. By the definition of $\mathbb{P}_{\infty;1/p}$ together with the above discussion, the LHS of (62) can then be evaluated. The fact that $\mathbb{P}^{(k)}_{\{p\}}$ is a probability measure follows from Proposition 7.5, since Hall-Littlewood processes are probability measures.
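The counting consequence of this bijection can be checked by brute force for cyclic groups (function names below are our own); we use the standard fact $\#\operatorname{Hom}(\mathbb{Z}/a\mathbb{Z}, \mathbb{Z}/b\mathbb{Z}) = \gcd(a, b)$:

```python
from math import gcd

def brute_hom_count(a, m, n):
    # A homomorphism Z/aZ -> Z/mZ x Z/nZ is determined by the image (x, y)
    # of the generator, subject only to a*(x, y) = (0, 0).
    return sum(1 for x in range(m) for y in range(n)
               if (a * x) % m == 0 and (a * y) % n == 0)

# The bijection Hom(K, M + N) = Hom(K, M) x Hom(K, N) predicts the count
# factors as gcd(a, m) * gcd(a, n):
for a in range(1, 9):
    for m in range(1, 9):
        for n in range(1, 9):
            assert brute_hom_count(a, m, n) == gcd(a, m) * gcd(a, n)
```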

Hence $\sum_{H_1 \le \mu^{(1)} \oplus H_2 :\ \pi_1(H_1) = \mu^{(1)}} 1 = m_k(\mu^{(1)}, \ldots, \mu^{(k)})$, completing the proof. □ Theorem 8.1. Let $X_n$ and $Y_n$ be sequences of random finitely generated abelian groups. Let $a$ be a positive integer and $A_a$ the set of isomorphism classes of abelian groups with exponent dividing $a$. Suppose that for every $G \in A_a$ we have matching limiting moments $\lim_{n\to\infty} \mathbb{E}\,\#\operatorname{Sur}(X_n, G) = \lim_{n\to\infty} \mathbb{E}\,\#\operatorname{Sur}(Y_n, G) = S_G$, with $S_G$ of appropriate growth. Then for every $H \in A_a$, $\lim_{n\to\infty} \mathbb{P}(X_n \otimes (\mathbb{Z}/a\mathbb{Z}) \simeq H)$ exists. Applied with $a = \prod_{p \in P} p^{e_p + 1}$, this implies the desired convergence.

The proof is then completed by combining the preceding limit identities.
We note also that the analogue of Theorem 1.1 over Z p holds by the exact same proof, with P = {p}.
Theorem 8.2. Let $\xi$ be a $\mathbb{Z}_p$-valued random variable which is not constant modulo $p$, and for each $n$ let $M_1, \ldots, M_k$ be $k$ independent random matrices with iid $\xi$-distributed entries. Then the analogous limit holds for any finite abelian $p$-group $B$. It remains to prove Theorem 8.1. For this one follows the treatment of [72, Theorem 8.3], which roughly speaking can be summarized as follows: (i) under an appropriate growth condition on $S_G$, one shows that the limiting probabilities satisfy the expected Hom-weighted sum identities; (ii) under the same growth condition, one deduces that two sequences with the same moments have the same limiting sums; (iii) lastly, one shows the limits $\lim_{n\to\infty} \mathbb{P}(X_n \otimes (\mathbb{Z}/a\mathbb{Z}) \simeq H)$ exist for all $H \in A_a$ by contradiction, passing to subsequences where the limits exist for all $H$ and using (i) and (ii) above.
In our current situation, we just need to guarantee that the growth of $S_G = n_k(G)$ for each $p$-group $G$ of type $\lambda$ is appropriate so that we can apply (i)-(iii) outlined above. It is worth noting that in this section we refer to [72] for more details, while in the next section we give a more self-contained argument, because the extensions of [72] to the setting of joint moments are more substantial. We require the following strengthened (but more cumbersome to state) version of [72, Theorem 8.2], and recall that $\mathbb{Y}_n$ denotes the set of partitions with at most $n$ parts.
for some nonnegative reals $C_\lambda$. Suppose these satisfy a suitable growth bound; the only property of this bound which is needed in the proof is the convergence of the sum (69), hence the result holds in our more general setup. □ By viewing the conjugate partitions $(\lambda^j)'$ of each tuple $(\lambda^1, \ldots, \lambda^s) \in M$ as specifying an abelian $p_j$-group of exponent dividing $p_j^{m_j}$, we see that $M$ is in bijection with $A_a$ where $a = \prod_{j=1}^s p_j^{m_j}$. In applications of the above proposition, $x_\mu, y_\mu$ will be the limiting probabilities that certain random elements of $A_a$ have isomorphism type specified by $\mu$ in this manner, and the $C_\lambda$ are the so-called Hom-moments $\mathbb{E}_K[\#\operatorname{Hom}(K, G_\lambda)]$. In order to get good bounds on the Hom-moments in our setup, we first consider the usual (Sur-)moments. We require the following estimate, which is [72, Lemma 7.4]. Lemma 8.4. For $\mathcal{G}_{\mu,\lambda}$ as in Definition 16, the stated bound holds. Lemma 8.5. There exist positive constants $F_k$ and $0 < c_k < 1$ such that for any $p$, the corresponding moment bound holds. Proof. We will induct on $k$. We first show the case $k = 2$. Indeed, using the notation from [72, Section 7], we obtain a chain of inequalities, where the first inequality is Lemma 8.4. In the last inequality, we use the fact that the relevant factor is uniformly bounded for any $\lambda_i$, and absorb this bound into the constant $F_2$.
For the induction step, assume that for some $c_{j-1} > 0$ the bound holds at level $j-1$. We will show that the analogous bound then holds at level $j$ with some $c_j > 0$.
To see this, we proceed exactly as before.
Here again we use Lemma 8.4 in the first inequality and essentially the same bound in the last. So we can take the constants accordingly, completing the proof. □ We now explain in more detail how to adapt the proof in [72] to our setting.
for all $\lambda \in \mathbb{Y}_m$, where $f_{p,m}$ satisfies (69). The LHS is $n_{k+1}(G_{\lambda'})$, which by Lemma 8.5 is bounded above appropriately. Hence the summand in (69) decays fast enough that the sum converges. The remainder of the proof is identical to that of [72, Theorem 8.3]. □
9. Joint moments comparison and the proof of Theorem 1.2
The main goal of this section is the following analogue of Theorem 8.1, which informally says that if the limits of the joint moments are the same and not too large, then the joint distributions must be asymptotically the same.
Theorem 9.1. For each $n \ge 1$ let $(X_n^{(1)}, \ldots, X_n^{(k)})$ and $(Y_n^{(1)}, \ldots, Y_n^{(k)})$ be two sequences of random finitely generated abelian groups. Let $a$ be a positive integer and $A_a$ the set of isomorphism classes of abelian groups with exponent dividing $a$. Suppose that for every $G_1, \ldots, G_k \in A_a$ the corresponding joint moments converge to a common limit. Then for every $H_1, \ldots, H_k \in A_a$, the limit $\lim_{n\to\infty} \mathbb{P}(X_n^{(j)} \otimes (\mathbb{Z}/a\mathbb{Z}) \simeq H_j,\ 1 \le j \le k)$ exists. Furthermore, the corresponding limits for the $X$ and $Y$ sequences agree. To complete the proof of Theorem 1.2 one just needs to combine the above result together with Theorem 4.1 and Theorem 7.2.
Proof of Theorem 1.2, assuming Theorem 9.1. For any $P$, Theorem 4.1 shows that the joint moments of the random groups $\operatorname{Cok}(M_1 \cdots M_j)$, $1 \le j \le k$, asymptotically match those computed in Theorem 7.2. Theorem 9.1 then shows that the matching of these moments implies convergence of the joint distribution of $\operatorname{Cok}(M_1 \cdots M_j) \otimes \mathbb{Z}/a\mathbb{Z}$, $1 \le j \le k$, to $\mathbb{P}^{(k)}_P$, where we take $B_0 = 0$.
It remains to justify Theorem 9.1.For this one follows the three steps (i)-(iii) outlined above in the proof of [72,Theorem 8.3], which we do now.
Proof. Note that the group $G_1 \oplus \cdots \oplus G_k$ corresponds to the union of the partitions $\{\lambda(1), \ldots, \lambda(k)\}$; the conjugate parts of this partition are given by $\{\sum_\ell \lambda(\ell)'_i : i \ge 1\}$. We then use Lemma 8.5, and the claim follows by bounding the maximum above by the sum and letting $\widetilde{F}_k = \max(F_k, 1)$. □ Proof of Theorem 9.1. We closely follow the proof of [72, Theorem 8.3]. Let us first suppose that the limits $\lim_{n\to\infty} \mathbb{P}(X_n^{(1)} \otimes \mathbb{Z}/a\mathbb{Z} \simeq H_1 \wedge \cdots)$ exist; we must show the relevant sum converges. By factoring the sum into a product of $k$ sums and factoring each one over the primes $p_j$ dividing $a$, it suffices to show, when $a = p^e$ is a prime power, that for any $G \in A_a$ there exists $G' \in A_a$ such that $\sum_{H \in A_a} \frac{\#\operatorname{Hom}(H, G)}{\#\operatorname{Hom}(H, G')}$ converges. This follows as in [72] by letting $\lambda$ be the type of $G$ and taking $G'$ to have type $\pi$ with $\pi'_i = 2\lambda'_i + 1$ for $1 \le i \le e$. By [72, Lemma 7.1], $\#\operatorname{Hom}(G_\mu, G_\lambda) = p^{\sum_i \mu'_i \lambda'_i}$ (80), and the above convergence (and hence convergence of (79)) follows by a simple computation.
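The formula $\#\operatorname{Hom}(G_\mu, G_\lambda) = p^{\sum_i \mu'_i \lambda'_i}$ of [72, Lemma 7.1] is easy to verify computationally for small partitions; the sketch below (function names ours) uses the componentwise decomposition $\#\operatorname{Hom}(\mathbb{Z}/p^a, \mathbb{Z}/p^b) = p^{\min(a,b)}$:

```python
from math import prod

def conjugate(la):
    # Conjugate partition: la'_i = #{j : la_j >= i}.
    m = max(la, default=0)
    return [sum(1 for part in la if part >= i) for i in range(1, m + 1)]

def hom_count(mu, la, p):
    # Homs between direct sums of cyclic p-groups decompose componentwise,
    # and #Hom(Z/p^a, Z/p^b) = p^min(a, b).
    return prod(p ** min(a, b) for a in mu for b in la)

# Check #Hom(G_mu, G_la) = p^{sum_i mu'_i la'_i}:
for p in (2, 3):
    for mu in ([1], [2, 1], [3, 1, 1]):
        for la in ([2], [2, 2], [4, 1]):
            expo = sum(m * l for m, l in zip(conjugate(mu), conjugate(la)))
            assert hom_count(mu, la, p) == p ** expo
```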
We now show (72), still assuming without proof that the limits in that equation exist. Let $a = \prod_{j=1}^s p_j^{m_j}$ be the prime factorization of $a$, and let $M$ be as in Theorem 9.3, so that $M$ is in bijection with $A_a$ as in the previous section. Write $H_\mu \in A_a$ for the element corresponding to $\mu \in M$. For each $1 \le i \le k-1$, there is a surjection between the consecutive groups. Our tools do not currently allow us to prove universality of this random sequence of abelian $p$-groups with maps between them, but the perspective of this extra data is nonetheless useful for interpreting the limit distribution on the isomorphism types of this sequence of groups. In the example in question, all nontrivial maps are quotient or identity maps. Then $\varphi$ and $\varphi'$ are not equivalent, and to check this it suffices to observe that $p\ker\varphi \cap p^2G = \{0\}$ while $p\ker\varphi' = p^2G$, as these two facts are unchanged by any 2-sequence automorphism.
Clearly, ≃ is an equivalence relation on k-sequences.

Corollary 4.2. Let $M_1, \ldots, M_k$ have iid entries in $\mathbb{Z}_p$ which are not constant modulo $p$. Then for any finite abelian $p$-groups $G_1, \ldots, G_k$, the joint moment estimate of Theorem 4.1 holds with error at most $Ke^{-cn}$.

Claim 4.7. The number of joint codes $F$ of distance $\delta n$ in $(G')^n$ is $(1 + O(\exp(-cn)))|G'|^n$.

Theorem 7.1. The map $B \mapsto \mathbb{P}^{*(k)}_P(B)$ defines a probability measure on $A_P$, with moments $\mathbb{E}_{B \sim \mathbb{P}^{*(k)}_P}[\#\operatorname{Sur}(B, G)] = n_k(G)$ for any $G \in A_P$.
Theorem 7.2. The map $(B_1, \ldots, B_k) \mapsto \mathbb{P}^{(k)}_P(B_1, \ldots, B_k)$ defines a probability measure on $A_P^k$, with joint moments $\mathbb{E}_{(B_1, \ldots, B_k) \sim \mathbb{P}^{(k)}_P}\left[\prod_{i=1}^k \#\operatorname{Sur}(B_i, G_i)\right] = m_k(G_1, \ldots, G_k)$ for any $G_1, \ldots, G_k \in A_P$.
Proof of Theorem 7.1. The expectation can be written as a sum over all isomorphism classes of finite abelian $p$-groups, with $K$ a representative from each class. Interchanging the sums and applying Proposition 6.3 to (63) completes the proof. □
Proof of Theorem 7.2. By the factorization of Lemma 7.3 and the factorization of the number of surjections, it suffices to prove Theorem 7.2 in the case $P = \{p\}$.

8. Moment comparison and the proof of Theorem 1.1
Fix a finite set of primes $P$, and let $Y \sim \mathbb{P}^{*(k)}_P$ as defined in Definition 17. For any $a$ divisible only by primes in $P$, Theorem 3.2 and Theorem 7.1 imply that $Y$ and $\operatorname{Cok}(M_1 \cdots M_k)$ (in the setting of Theorem 1.1) have asymptotically matching moments with respect to all groups $G$ of exponent dividing $a$. To pass this information back to the distribution, we then use the following result on the moment problem for finite abelian groups, a direct analogue of [72, Theorem 8.3], which suffices to prove Theorem 1.1.

Furthermore, $\lim_{n\to\infty} \mathbb{P}(X_n \otimes (\mathbb{Z}/a\mathbb{Z}) \simeq H) = \lim_{n\to\infty} \mathbb{P}(Y_n \otimes (\mathbb{Z}/a\mathbb{Z}) \simeq H)$.
Proof of Theorem 1.1, assuming Theorem 8.1. Assume that the exponent of the group $B$ under consideration has prime factorization $\prod_{p \in P} p^{e_p}$. Theorem 8.1, applied to the sequences $X_n = \operatorname{Cok}(M_1 \cdots M_k)$ and $Y_n = Y \sim \mathbb{P}^{*(k)}_P$, then gives the claim.

□
Theorem 9.2. As before, the exact same proof with $a$ a power of $p$ shows the following $p$-adic analogue: for matrices $M_i \in \operatorname{Mat}_n(\mathbb{Z}_p)$ under the same assumptions as in Theorem 8.2 and finite abelian $p$-groups $B_1, \ldots, B_k$, one has
$$\lim_{n\to\infty} \mathbb{P}\left(\operatorname{Cok}(M_1 \cdots M_j) \simeq B_j,\ 1 \le j \le k\right) = (p^{-1}; p^{-1})_\infty^{\,k} \prod_{i=1}^{k} \frac{\#\operatorname{Sur}(B_i, B_{i-1})}{\#\operatorname{Aut}(B_i)},$$
where $B_0$ is the trivial group.
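In particular, for $k = 1$ the product collapses (since $\#\operatorname{Sur}(B_1, B_0) = 1$ with $B_0$ trivial), recovering the Cohen-Lenstra limit for the cokernel of a single random matrix:

```latex
\lim_{n\to\infty} \mathbb{P}\left(\operatorname{Cok}(M_1) \simeq B\right)
  = \frac{(p^{-1}; p^{-1})_\infty}{\#\operatorname{Aut}(B)}.
```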
[72, Theorem 8.2]: under some appropriate condition on the growth of $S_G$, and assuming that $\lim_{n\to\infty} \mathbb{P}(X_n \otimes (\mathbb{Z}/a\mathbb{Z}) \simeq H)$ exists, one can show that $\sum_{H \in A_a} \lim_{n\to\infty} \mathbb{P}(X_n \otimes (\mathbb{Z}/a\mathbb{Z}) \simeq H)\, \#\operatorname{Hom}(H, G) = \sum_{H \le G} S_H$ for all $G \in A_a$, from which, under an appropriate growth condition on $S_G$, one can deduce that these two limits are actually the same as desired (see [72, Theorems 8.2, 8.3], and also Theorem 9.3 below).