Rank deficiency in sparse random GF[2] matrices

Let $M$ be a random $m \times n$ matrix with binary entries and i.i.d. rows. The weight (i.e., number of ones) of a row has a specified probability distribution, with the row chosen uniformly at random given its weight. Let $N(n,m)$ denote the number of left null vectors in $\{0,1\}^m$ for $M$ (including the zero vector), where addition is mod 2. We take $n, m \to \infty$, with $m/n \to \alpha>0$, while the weight distribution may vary with $n$ but converges weakly to a limiting distribution on $\{3, 4, 5, \ldots\}$; let $W$ denote a variable with this limiting distribution. Identifying $M$ with a hypergraph on $n$ vertices, we define the 2-core of $M$ as the terminal state of an iterative algorithm that deletes every row incident to a column of degree 1. We identify two thresholds $\alpha^*$ and $\underline{\alpha}$, and describe them analytically in terms of the distribution of $W$. Threshold $\alpha^*$ marks the infimum of values of $\alpha$ at which $n^{-1} \log \mathbb{E}[N(n,m)]$ converges to a positive limit, while $\underline{\alpha}$ marks the infimum of values of $\alpha$ at which there is a 2-core of non-negligible size compared to $n$ having more rows than non-empty columns. We have $1/2 \leq \alpha^* \leq \underline{\alpha} \leq 1$, and typically these inequalities are strict; for example, when $W = 3$ almost surely, numerics give $\alpha^* = 0.88949\ldots$ and $\underline{\alpha} = 0.91793\ldots$ (previous work on this model has mainly been concerned with such cases, where $W$ is non-random). The threshold value of $\alpha$ above which $N(n,m) \geq 2$ with high probability lies in $[\alpha^*,\underline{\alpha}]$ and is conjectured to equal $\underline{\alpha}$. The random row-weight setting gives rise to interesting new phenomena not present in the non-random case that has been the focus of previous work.

Let $X_1, X_2, \ldots, X_m$ denote the vectors constituting the rows of $M$, and let $\sigma(n,m)$ denote the co-rank over GF[2], namely, with 'span' denoting the linear span over GF[2],
$$\sigma(n, m) := m - \dim \operatorname{span}\{X_1, X_2, \ldots, X_m\}. \quad (1.1)$$
Then the number of null vectors of $M$, including the zero vector, is
$$N(n, m) = 2^{\sigma(n,m)}, \quad (1.2)$$
which counts the number of distinct solutions $(a_1, \ldots, a_m) \in \{0,1\}^m$, including the zero solution, to
$$a_1 X_1 + \cdots + a_m X_m \equiv 0 \pmod 2. \quad (1.3)$$
Note that for a fixed $n$ and a given realization of the sequence of rows $X_1, X_2, \ldots$, the numbers $N(n,m)$ are nondecreasing as $m$ increases.
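The relation (1.2) between the co-rank and the number of null vectors is easy to confirm computationally. The sketch below (our illustration; the function names are ours, not the paper's) computes the GF(2) rank by Gaussian elimination and compares $2^{\sigma}$ with a brute-force count of null vectors for a small random matrix.

```python
import itertools
import random

def gf2_rank(rows):
    """Rank over GF(2) of a list of rows, each a sequence of 0/1 entries."""
    rows = [list(r) for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        # find a row with a 1 in this column, at or below the current pivot row
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def null_vector_count(rows):
    """Brute-force count of a in {0,1}^m with a_1 X_1 + ... + a_m X_m = 0 mod 2."""
    count = 0
    for a in itertools.product([0, 1], repeat=len(rows)):
        sums = [sum(ai * x for ai, x in zip(a, col)) % 2 for col in zip(*rows)]
        if all(s == 0 for s in sums):
            count += 1
    return count

rng = random.Random(1)
m, n = 6, 4
M = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(m)]
sigma = m - gf2_rank(M)                    # co-rank, as in (1.1)
assert null_vector_count(M) == 2 ** sigma  # identity (1.2)
```

The brute-force count is exponential in $m$, of course; the point is only that the linear-algebra identity can be sanity-checked on small instances.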
Suppose that $n, m \to \infty$, with $m/n \to \alpha > 0$. We study asymptotics of the expected size $\mathbb{E}[N(n, m)]$ and of the probability $\mathbb{P}[N(n, m) > 1]$ that the left null space of $M(n, m)$ is non-trivial, in terms of the asymptotic aspect ratio $\alpha$. In particular, we derive computable thresholds for $\alpha$ at which phase transitions occur. We describe the relevance of the 2-core construction to this question. We also study the rate of exponential decay of the probability that $\mathbf{1} := (1, 1, \ldots, 1)$ is a null vector.
In our probabilistic setting, the rows $X_1, X_2, \ldots$ are independent and identically distributed (i.i.d.) with the law of a random vector $X = X(n) \in \{0,1\}^n$. Our focus is the (very) sparse regime, in which the number of non-zero components of $X$ converges in law as $n \to \infty$ to some given weight distribution. The existing literature focuses on the simplest case, where the weight distribution degenerates to some constant $r$. Before describing our model in detail and presenting our main results (in Section 2), we make some remarks on motivation. Note that $\sigma(n, m) = 0$ if and only if $M$ has row rank $m$, which occurs if and only if $M$ has column rank $m$. Thus the absence of non-trivial left null vectors is equivalent to all column vectors in $\{0,1\}^m$ being expressible as a linear combination of the columns of $M$ (with addition modulo 2), or in other words, to there being a solution $x \in \{0,1\}^n$ to $Mx \equiv y \pmod 2$ for all column vectors $y \in \{0,1\}^m$. In the special case of $r = 2$, motivation for considering this question is discussed at the start of [16, Chapter 3]. The following interpretations help to motivate the general case.
A scheduling problem. A tennis club is organizing its annual schedule. There are $n$ playing days and $m$ potential players. Each player wants to play on a given subset of the days; if they cannot be offered a match on every one of these days, they refuse to pay the annual membership. So that nobody is left out, an even number of players is needed on each day. Each possible schedule satisfying these requirements is a left null vector mod 2; the one with the most ones admits the most players, and so achieves the maximal income for the club.
Randomized Lights Out. This is a variant of the game 'Lights Out' [22]. Each of $m$ lamps can be either on or off, and there are $n$ switches, each of which is incident to a subset of the lamps specified by the matrix $M$; Lamp $i$ and Switch $j$ are mutually incident if and only if the $(i, j)$ entry of $M$ is 1. If a switch is toggled, the status of every lamp incident to it is reversed. Equivalently, in the language of satisfiability, think of the switches as Boolean variables and each lamp $i$ as a parity clause over the variables incident to it. Given a vector $y \in \{0,1\}^m$, finding a solution $x$ to $Mx \equiv y \pmod 2$ corresponds to finding a truth-assignment for the Boolean variables so that each clause $i$ is true if $y_i = 1$ and false if $y_i = 0$. Thus the column rank is $m$ if and only if the problem is satisfiable for all possible choices of $y$.
A spin-glass model. The relationship between satisfiability problems and spin glasses has already been noted in [21]. In the present instance, consider the following variant of the well-known Sherrington-Kirkpatrick mean-field spin-glass model (see e.g. [23]).
There is a random collection of hyperedges on $n$ vertices, represented by the $m$ rows of $M$. Each hyperedge $i$ has a sign $g_i$, taking value $(-1)^{y_i}$. Each vertex $j$ is assigned a spin $\sigma_j \in \{-1, 1\}$. The (zero temperature) probability measure on the state-space is concentrated on states of minimal energy, i.e. with maximal value of $\sum_{i=1}^{m} g_i e_i$, where $e_i$ is the product of the spins at the vertices in hyperedge $i$. The existence of a configuration with all terms in the sum equal to $+1$ is equivalent to the existence of a solution to $Mx \equiv y \pmod 2$.
The Ehrenfest urn and random walk on the hypercube. In the Ehrenfest model of heat exchange (see e.g. [12, p. 121] or [20, §3.5]), a box contains n particles, each either red or blue. At each step, a particle is sampled uniformly from the box and changes its colour. In the case where X has a single unit entry, we may view each row of M as selecting which particle is to be changed at that step. Then 1 is a null vector for M if and only if the box returns to the initial state after m steps. This may be phrased in terms of a random walk on a discrete hypercube {0, 1} n : the event that 1 is null corresponds to the walker being back in the initial state after m steps.
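The equivalence described here is elementary to check by simulation: with single-unit rows, $\mathbf{1}$ is a left null vector precisely when every column sum of $M$ is even, i.e. when every particle has changed colour an even number of times. A minimal sketch (our code, for illustration only):

```python
import random

rng = random.Random(2)
n, m = 5, 8
# single-unit rows: step t toggles particle steps[t]
steps = [rng.randrange(n) for _ in range(m)]
rows = [[1 if j == s else 0 for j in range(n)] for s in steps]

# (a) 1 is a left null vector for M: every column sum is even
one_is_null = all(sum(r[j] for r in rows) % 2 == 0 for j in range(n))

# (b) the walk on the hypercube {0,1}^n returns to its starting point after m steps
state = [0] * n
for s in steps:
    state[s] ^= 1
walk_returns = all(x == 0 for x in state)

assert one_is_null == walk_returns
```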
The general case, allowing other weight distributions, corresponds to a generalization of the Ehrenfest model whereby multiple 'diffusions' are allowed, i.e. at each step several particles may change colour at once; cf [20,Chapter 10]. This can be similarly interpreted in terms of a walk on a version of the hypercube with additional edges.
There is a large body of work on random matrices and random linear equations over finite fields, including the surveys [16,Chapter 3] and [18,19]. Problems may also be formulated in terms of random hypergraphs: each row represents a hyperedge, and each column represents a vertex (see Section 2.3 below). Generally, such models can be described in the framework of random allocation or occupancy problems [16,20,17].
The null-vector problem in the fixed row-weight case has received several treatments in the literature. It is not easy to reconcile all the existing results, due to differences both in presentation and in the underlying probabilistic models. The present paper provides clarification, including a rigorous justification that the results are unchanged under small perturbations of the underlying model. Our main contribution, however, is the treatment of genuinely random row weights, which is new. We mention recent renewed interest in this area in several scientific communities: Alamino and Saad [1] give a statistical physics approach to the null-vector problem; Ibrahimi et al. [13] treat the related random XORSAT problem; Costello and Vu [6] study the rank of random symmetric matrices.

Description of the random matrix model
Given $n \in \mathbb{N}$, suppose that $X = X(n) \in \{0,1\}^n$ is a random row vector, selected according to a probability law of the following form. Let $W$ be an $\mathbb{N}$-valued random variable ($\mathbb{P}[W \ge 1] = 1$) whose law will be the (limiting) weight distribution of $X$. Let $W_1, W_2, \ldots$ be a sequence of random variables with $W_n \in [n]$ such that $W_n \xrightarrow{d} W$ as $n \to \infty$. Let $w(X)$, the weight of $X$, have the distribution of $W_n$, and for each $k \in [n]$ let the conditional distribution of $X$, given $w(X) = k$, be uniform over $\{x \in \{0,1\}^n : w(x) = k\}$.
Consider i.i.d. random vectors $X_1, X_2, \ldots$ with the same law as $X$. Let $M := M(n, m)$ be the $m \times n$ matrix whose rows are $X_1, X_2, \ldots, X_m$. Let $\rho(s) := \mathbb{E}[s^W]$ and $\rho_n(s) := \mathbb{E}[s^{W_n}]$ denote the probability generating functions of $W$ and $W_n$, respectively. We use $\mathbb{P}_{\rho_n}$ and $\mathbb{E}_{\rho_n}$ for probability and expectation for the random matrix model with $n$ columns and row-weight distribution specified by $\rho_n$. We say the $W_n$ are uniformly bounded if there is a finite constant $r_1$ such that $\mathbb{P}[W_n \le r_1] = 1$ for all $n$ (so $\mathbb{P}[W \le r_1] = 1$ too).
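For simulations, the row law is straightforward to sample: draw the weight $k$ from the distribution of $W_n$, then a uniformly random $k$-subset of the $n$ columns. A sketch (our code; the helper names and the example weight distribution are ours):

```python
import random

def sample_row(n, weight_dist, rng):
    """Sample X in {0,1}^n: draw k from the weight distribution, then a uniform k-subset."""
    k = rng.choices(list(weight_dist), weights=list(weight_dist.values()))[0]
    row = [0] * n
    for j in rng.sample(range(n), k):
        row[j] = 1
    return row

def sample_matrix(n, m, weight_dist, rng):
    """m i.i.d. rows with the given weight distribution (a dict: weight -> probability)."""
    return [sample_row(n, weight_dist, rng) for _ in range(m)]

rng = random.Random(0)
# illustrative choice: P[W_n = 3] = 0.7, P[W_n = 4] = 0.3
M = sample_matrix(n=10, m=6, weight_dist={3: 0.7, 4: 0.3}, rng=rng)
assert all(sum(row) in (3, 4) for row in M)
```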

Threshold results in the general setting
Given the probability generating function $\rho$, define the threshold $\alpha^*_\rho$; its basic properties are proved in Lemma 4.1 below. We now present our first main result, describing the threshold behaviour of the expected number of null vectors $N(n, m_n)$.
Moreover, if in addition there exist $r_0 \ge 3$ and $r_1 < \infty$ such that $\mathbb{P}[r_0 \le W_n \le r_1] = 1$ for all $n$, and $\alpha \in (0, \alpha^*_\rho)$, then a complementary asymptotic (stated in Theorem 2.1) holds as $n \to \infty$. We give the proof of Theorem 2.1 in Section 4.5. A key role in our proof is played by the event $A(n,m)$ that the row vector $\mathbf{1} = (1, \ldots, 1)$ is null for $M$, i.e.,
$$A(n, m) := \{X_1 + \cdots + X_m \equiv 0 \pmod 2\}. \quad (2.5)$$
Observe that $N(n,m)$ is the number of collections of rows of $M(n,m)$ which sum to $0$ (mod 2), and for each set of $\ell$ rows the probability that it sums to $0$ is $\mathbb{P}_{\rho_n}[A(n, \ell)]$. So
$$\mathbb{E}_{\rho_n}[N(n, m)] = \sum_{\ell=0}^{m} \binom{m}{\ell}\, \mathbb{P}_{\rho_n}[A(n, \ell)]. \quad (2.6)$$
The first step in our analysis is to study the asymptotics of $\mathbb{P}_{\rho_n}[A(n, m)]$, which is of interest in its own right in the context of random allocations; see Section 2.4. The starting point for this analysis is the novel exact formula (3.1) below for this probability in the special binomial model, which we show serves as a good approximation for the general case (details are in Section 3). The asymptotic analysis of (2.6) leads to the proof of Theorem 2.1. For a fixed $n$, the number of rows $m$ at which the first non-zero null vector appears,
$$T_n := \min\{m \in \mathbb{N} : X_m \in \operatorname{span}\{X_1, X_2, \ldots, X_{m-1}\}\},$$
is another random variable of interest. Standard linear algebra implies that $T_n \le n + 1$.
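The identity (2.6) can be verified by exhaustive enumeration in a tiny case. The sketch below (our code) takes $n = 4$ and $W \equiv 3$, so each row is uniform over the four weight-3 vectors, and checks that the enumerated value of $\mathbb{E}[N(n,m)]$ equals the binomial sum over $\mathbb{P}[A(n,\ell)]$, in exact rational arithmetic:

```python
import itertools
from fractions import Fraction
from math import comb

n, m = 4, 3
# all rows of length n = 4 and weight 3 (the case W = 3 a.s.)
rows_pool = [r for r in itertools.product([0, 1], repeat=n) if sum(r) == 3]

def is_zero_sum(rows):
    """Do the given rows sum to the zero vector mod 2?"""
    return all(sum(col) % 2 == 0 for col in zip(*rows))

def null_count(rows):
    """N: brute-force count of subsets of rows summing to 0 mod 2 (empty subset included)."""
    return sum(1 for a in itertools.product([0, 1], repeat=len(rows))
               if all(sum(ai * x for ai, x in zip(a, col)) % 2 == 0 for col in zip(*rows)))

# left-hand side of (2.6): E[N(n, m)] by enumeration over all equally likely matrices
lhs = Fraction(sum(null_count(mat) for mat in itertools.product(rows_pool, repeat=m)),
               len(rows_pool) ** m)

# right-hand side of (2.6): sum over ell of C(m, ell) * P[A(n, ell)]
def prob_A(ell):
    """P[A(n, ell)]: probability that ell i.i.d. rows sum to 0 mod 2."""
    if ell == 0:
        return Fraction(1)
    hits = sum(1 for rows in itertools.product(rows_pool, repeat=ell) if is_zero_sum(rows))
    return Fraction(hits, len(rows_pool) ** ell)

rhs = sum(comb(m, ell) * prob_A(ell) for ell in range(m + 1))
assert lhs == rhs
```

In this example both sides equal $7/4$: two weight-3 rows cancel only if they are identical (probability $1/4$), and an odd number of weight-3 rows can never cancel, since the total number of ones is odd.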
The relevance of $\underline{\alpha}_\rho$ for the null vector problem is shown by the next result.
Theorem 2.2. Suppose the $W_n$ are uniformly bounded and $\mathbb{P}[W_n \ge 3] = 1$ for all $n$. Then $\alpha^*_\rho \le \underline{\alpha}_\rho \le 1$, and for any $\varepsilon > 0$,
$$\lim_{n \to \infty} \mathbb{P}_{\rho_n}\big[(\alpha^*_\rho - \varepsilon) n \le T_n \le (\underline{\alpha}_\rho + \varepsilon) n\big] = 1.$$
[Figure 1: The left plot shows parts of the curves $y = h(x)$ (whole line) and $x = g^*(y)$ (solid line); it shows that $g^*(\alpha)$ has two discontinuities, one at $\alpha = \alpha_\rho \approx 0.908654$ and one at $\alpha \approx 0.938536$, the first corresponding to a jump from $g^* = 0$ to $g^* \approx 0.719682$ and the second to a jump from $g^* \approx 0.835696$ to $g^* \approx 0.964919$. The right plot shows parts of the curve $y = \psi(x)$ (whole line) and the locus of $(g^*(\alpha), \psi(g^*(\alpha)))$ (solid line), together with the single positive solution of $\psi$.] It is not a coincidence that the curves $h$ and $\psi$ seem to mirror each other: see Lemma 5.8 below.

2-cores and random hypergraphs
To describe the probabilistic interpretation of $\alpha_\rho$ and $\underline{\alpha}_\rho$ we need additional terminology. Given a set $V = \{v_1, \ldots, v_n\}$, whose elements we call vertices, a non-empty subset of $V$ is called a hyperedge. Given a collection $\mathcal{E} := (E_i)$ of $m$ hyperedges, we refer to the pair $(V, \mathcal{E})$ as a hypergraph. This hypergraph may be identified with an $m \times n$ incidence matrix of $\{0,1\}$ entries, having no zero rows, as follows: the $(i, j)$ entry is 1 if and only if $v_j \in E_i$, in which case we say row $i$ is incident to column $j$, and that hyperedge $E_i$ is incident to vertex $v_j$, and refer to $(E_i, v_j)$ as an incidence of the hypergraph.
The number of hyperedges incident to a vertex is its degree. Fix a hypergraph $(V, \mathcal{E})$. For $\mathcal{F} \subseteq \mathcal{E}$, the set $V(\mathcal{F}) \subseteq V$ of vertices incident to at least one of the hyperedges in $\mathcal{F}$ is the vertex span of $\mathcal{F}$. We identify the hypergraph $(V(\mathcal{F}), \mathcal{F})$ with the edge subset $\mathcal{F}$ that induces it, and call $\mathcal{F} \subseteq \mathcal{E}$ a partial hypergraph. A partial hypergraph $\mathcal{F} \neq \emptyset$ is a hypercycle if every vertex $v$ has even degree with respect to $\mathcal{F}$; this corresponds to a non-trivial left null vector for the incidence matrix of the hypergraph $(V, \mathcal{E})$.
Given a hypergraph (V, E), the 2-core is defined via the following algorithm: 1. If there exists no vertex of degree one, stop.
2. Otherwise, select an arbitrary vertex of degree one, and delete the unique incident hyperedge; then return to Step 1.
The algorithm terminates, because the partial hypergraphs are decreasing; the terminal partial hypergraph, which does not depend on the arbitrary choices made in Step 2 (see [8, pp. 127-128]), and which may be empty, is called the 2-core of $\mathcal{E}$, denoted $\mathrm{Core}(\mathcal{E})$. The next result, Theorem 2.4, describes the 2-core of our random matrix $M(n, m_n)$: specifically, whether the limiting aspect ratio of the 2-core is less than or greater than 1 depends on the sign of $\psi(g^*(\alpha))$, where $\psi$ and $g^*$ are defined at (2.8) and (2.11) respectively. (A related result appears in [5].) While of interest in its own right, Theorem 2.4 has importance for the rank deficiency problem in view of the following observation: if the 2-core has more rows than columns, then the corresponding hypergraph has a hypercycle (see Lemma 5.1 below for details). Theorem 2.4 is thus the basis for the appearance in Theorem 2.2 of $\underline{\alpha}_\rho$ as defined at (2.12) (we explain in detail in Section 5).
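The peeling algorithm above is straightforward to implement; the following sketch (our code) represents a hypergraph as a list of hyperedges, each a set of vertex indices, and returns the indices of the hyperedges surviving in the 2-core:

```python
def two_core(edges):
    """Iteratively delete a hyperedge incident to a degree-one vertex, until none remains.
    Returns the set of surviving hyperedge indices (the 2-core)."""
    alive = set(range(len(edges)))
    while True:
        degree = {}
        for i in alive:
            for v in edges[i]:
                degree[v] = degree.get(v, 0) + 1
        # find a surviving hyperedge containing a vertex of degree one
        doomed = None
        for i in alive:
            if any(degree[v] == 1 for v in edges[i]):
                doomed = i
                break
        if doomed is None:
            return alive
        alive.discard(doomed)

# example: edges 0,1,2 cover {0,1,2} evenly; edge 3 hangs off the degree-one vertex 5
edges = [{0, 1}, {1, 2}, {0, 2}, {2, 5}]
core = two_core(edges)
assert core == {0, 1, 2}   # the pendant edge {2,5} is peeled away
```

Recomputing degrees on every pass is quadratic, which suffices for illustration; the terminal state does not depend on which degree-one vertex is chosen first, consistent with the remark above.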
We note, under the hypotheses of Theorem 2.4: (i) for $\alpha > \alpha_\rho$, $g^*$ is positive and strictly increasing; and (ii) $g^*$ is right continuous, with a finite set of discontinuities $D_\rho \subset (0, \infty)$, where $\alpha_\rho = \min D_\rho$. These and other facts are proved in Lemma 5.5 below.
In the example in Figure 1, and also in the fixed-weight setting, $\psi(g^*(\alpha))$ changes sign only once, but in the general random-weight setting it may change sign multiple times, leading, via Theorem 2.4, to non-monotonic behaviour of the aspect ratio of the 2-core. Figure 2 shows an example where $\rho(s) = 0.9183 s^3 + 0.04 s^{19} + 0.0417 s^{41}$: as $\alpha$ increases from 0, the 2-core switches from having asymptotically more columns than rows to having more rows than columns not just once (at $\underline{\alpha}_\rho$), but twice, as $\psi(g^*(\alpha))$ changes sign. Proposition 5.7 below, and the subsequent discussion, explains some of the features in the figure. Thus the random-weight setting displays subtle new phenomena not present in the fixed-weight case that has been the focus of previous work.

Even occupancy in random allocations
One interpretation of the event A(n, m) defined at (2.5) is in terms of the random allocation model. Suppose we have n urns, and for each row of M we allocate a collection of balls to a set of urns determined by the unit entries of that row of M . Event A(n, m) is that all the urns end up with an even number of balls. Random allocations have been extensively studied; see e.g. [12, p. 101], and the monographs [16,17,20].
The following theorem, which we prove in Section 3, describes the exponential rate of decay for P ρn [A(n, m n )] where m n /n has a finite positive limit. The theorem excludes the case in which both W and m n only take odd values; if m is odd and W n is odd a.s., then P ρn [A(n, m)] = 0 since the total number of units in the matrix is odd.
A consequence of Theorem 2.5 of independent interest concerns the probability $\pi_n(m)$ that all the components $Y_1, \ldots, Y_n$ of a multinomial $(m; n^{-1}, \ldots, n^{-1})$ random vector are even. Here $Y_j$ can be interpreted as the occupancy of urn $j$ after $m$ balls are independently and uniformly distributed into $n$ distinct urns: see e.g. [20, p. 11]. Then
$$\pi_n(m) = 2^{-n} \sum_{j=0}^{n} \binom{n}{j} \left(1 - \frac{2j}{n}\right)^m;$$
this formula is known in the Ehrenfest urn literature [20, pp. 128-129] and can also be obtained from (3.1) below. If $m$ is odd, $\pi_n(m)$ must be zero.
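The formula $\pi_n(m) = 2^{-n} \sum_{j} \binom{n}{j} (1 - 2j/n)^m$ can be checked against direct enumeration of all $n^m$ equally likely allocations in small cases (our code; exact rational arithmetic avoids rounding questions):

```python
import itertools
from fractions import Fraction
from math import comb

def pi_exact(n, m):
    """P[all urns even] by enumerating all n^m allocations of m labelled balls."""
    good = sum(1 for alloc in itertools.product(range(n), repeat=m)
               if all(alloc.count(u) % 2 == 0 for u in range(n)))
    return Fraction(good, n ** m)

def pi_formula(n, m):
    """pi_n(m) = 2^{-n} * sum_j C(n, j) * (1 - 2j/n)^m, in exact arithmetic."""
    s = sum(comb(n, j) * (1 - Fraction(2 * j, n)) ** m for j in range(n + 1))
    return s / 2 ** n

for nn in (2, 3, 4):
    for mm in (0, 2, 4, 6):
        assert pi_exact(nn, mm) == pi_formula(nn, mm)
assert pi_formula(3, 5) == 0   # odd m forces pi_n(m) = 0, as noted above
```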
Proposition 2.6. Let $\pi_n(m_n)$ denote the probability that all $n$ components of a multinomial $(m_n; n^{-1}, \ldots, n^{-1})$ random vector are even. Suppose that $m_n$ is even for each $n$ and $m_n/n \to \alpha = \lambda \tanh\lambda \in (0, \infty)$ as $n \to \infty$. Then
$$\lim_{n \to \infty} n^{-1} \log \pi_n(m_n) = \log\cosh\lambda - (\lambda\tanh\lambda)(1 - \log\tanh\lambda). \quad (2.16)$$
We derive Proposition 2.6 from Theorem 2.5 in Section 3.5; it also follows from a result of Kolchin [15, Theorem 2, p. 141], and yet another proof is given in Section 3.6 of the first version of the present paper on arXiv.
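For a given aspect ratio $\alpha$, the rate in (2.16) is evaluated by first solving $\alpha = \lambda \tanh\lambda$ for $\lambda$; since $\lambda \mapsto \lambda \tanh\lambda$ is increasing on $[0, \infty)$, bisection suffices. A sketch (our code):

```python
import math

def rate(alpha):
    """Evaluate the limit in (2.16) after solving alpha = lambda * tanh(lambda) by bisection."""
    hi = 1.0
    while hi * math.tanh(hi) < alpha:   # lambda * tanh(lambda) is increasing on [0, inf)
        hi *= 2.0
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mid * math.tanh(mid) < alpha:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    return math.log(math.cosh(lam)) - alpha * (1.0 - math.log(math.tanh(lam)))

# the rate is negative here, consistent with exponential decay of pi_n(m_n)
assert rate(0.5) < 0.0
```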

Thresholds in the fixed-weight case
In this section we consider the case where $\mathbb{P}[W = r] = 1$ for fixed $r \in \mathbb{N}$, which is the focus of the existing literature (see the discussion in Section 2.6 below). In particular, we discuss numerical and asymptotic evaluation of the thresholds $\alpha^*_r$, $\alpha_r$, and $\underline{\alpha}_r$, defined to be the values of $\alpha^*_\rho$, $\alpha_\rho$, and $\underline{\alpha}_\rho$, respectively, in the case where $\rho(s) = s^r$.
Appropriate versions of Theorems 2.1 and 2.2 apply in this setting. We remark that in the case $r = 2$, $\alpha^*_2 = 1/2$, and the number of cycles $N(n, m_n)$ in an Erdős–Rényi graph with $m_n/n \to \alpha$ has a Poisson limit with finite expectation for $\alpha \in (0, 1/2)$, but the limiting expectation is infinite for $\alpha \ge 1/2$ (see e.g. [16, §2.3]); we could not find in the literature an explicit reference to the fact that the expectation blows up exponentially with $n$ for $\alpha > 1/2$, at the rate given by the appropriate case of Theorem 2.1. Table 1 shows values of $\alpha_r$, $\alpha^*_r$, and $\underline{\alpha}_r$ for $r \le 8$; we describe how these were computed in Appendix A, where we also review previous computations of these thresholds. Table 1: Fixed row-weight thresholds. Note that $\alpha_r$ is not defined when $r = 1$ or $2$.
As suggested by the numerical results, it can be shown that, for $r$ large enough, $\alpha_r < \alpha^*_r < \underline{\alpha}_r < 1$; this is a consequence of the following result. Proposition 2.7. As $r \to \infty$, the asymptotics stated in (2.17) hold. The $\alpha^*_r$ result in (2.17) is due to Calkin [3]; we prove the other two in Appendix A.

Previous results on threshold values
In the simplest case, $W_n = W^{\mathrm{hyp}}_n := r \wedge n$ a.s., for a fixed $r \in \mathbb{N}$; then $W = r$ a.s. This fixed row-weight 'hypergeometric' model is studied by Cooper [4]. A variation is the model in which $r$ units are assigned to the row independently and uniformly at random, with multiplicities reduced mod 2. The latter 'binomial' model corresponds to $W_n = W^{\mathrm{bin}}_n$ distributed as the number of odd components in a multinomial $(r; n^{-1}, \ldots, n^{-1})$ random vector; then $W_n \xrightarrow{d} r$ (see Lemma 3.3 below). The $r \ge 3$ binomial model is studied by Kolchin [15]. Note that in this model rows of all zeroes may appear, in which case they are ignored (in other words, empty hyperedges are discounted): this is a small effect, since $\mathbb{P}[W^{\mathrm{bin}}_n = 0] = O(n^{-r/2})$, so only a vanishing proportion of rows needs to be discarded.
Phase transitions in the null vector problem for random matrices over finite fields with fixed row weight $r \ge 3$ have been studied since the early 1990s. In the case of the binomial model, the threshold $\alpha^*_r$, $r \ge 3$, for $\mathbb{E}[N(n, m_n)]$, $m_n/n \to \alpha$, was described by Balakin et al. [2] and Kolchin [15,14]; in these results $\alpha^*_r$ is characterized by the fact that the expected number of non-trivial null vectors tends to $0$ ($\infty$) when $\alpha < \alpha^*_r$ ($\alpha > \alpha^*_r$), but the proofs show that the growth is in fact exponential for $\alpha > \alpha^*_r$. Calkin [3] and Cooper [4] also study $\alpha^*_r$, $r \ge 3$; in particular, Calkin [3] studies $\alpha^*_r$ as $r \to \infty$. Both [3] and [4] work in the case $W_n = r \wedge n$. Note that Cooper's [4] formulation of the matrix problem is transposed compared to ours. Even the special case $\mathbb{P}[W = r] = 1$ of our Theorem 2.1 represents a slight generalization of the results just mentioned, because it allows any sequence $W_n$ provided $W_n \to r$ in probability.
In these previous investigations, the analytic description of the threshold α * r varies, but these descriptions can be shown to be consistent with ours: see Appendix A.
A tabulation similar to our tabulation of $\underline{\alpha}_r$ is given by Cooper [5, pp. 370-371], who also gives an analytic description of $\underline{\alpha}_r$ equivalent to our (A.2); see also Dietzfelbinger et al. [10], which we discuss further in the next subsection. We note also that $\underline{\alpha}_r$ has received considerable attention in its own right: see e.g. [13] for its role in random XORSAT.

Between the two thresholds
The following problem arises in the XORSAT literature. Let r ∈ N with r ≥ 3. Let M be our m × n matrix, with m/n → α > 0, and suppose W n = r a.s. for all n ≥ r. Let N denote the number of column vectors x ∈ {0, 1} n such that M x ≡ ω, where ω ∈ {0, 1} m is chosen uniformly at random (independent of M ). Thus N is a random variable.
The proof of this in [11] is based on a second moment calculation. The analytical definition of $\hat{\alpha}_r$ in [11,10] is not obviously the same as our definition of $\underline{\alpha}_r$, but the definition in terms of cores (see [10, Proposition 3 and equation (4)]) seems to match our definition of $\underline{\alpha}_r$, and the numerical values in [10] are consistent with our $\underline{\alpha}_r$.
If we accept that $\hat{\alpha}_r = \underline{\alpha}_r$, this result implies that if $\alpha < \underline{\alpha}_r$ then, for $n$ large enough, there is no non-zero left null vector for $M$ (the argument, which considers a non-zero null vector $y$, is omitted here). We may then deduce that, in the case $W_n = n \wedge r$, our Theorem 2.2 may be strengthened to $n^{-1} T_n \to \underline{\alpha}_r$ in probability. This implies that for $\alpha$ in the interval $(\alpha^*_r, \underline{\alpha}_r)$ a curious dichotomy occurs: existence of any non-trivial left null vector is unlikely, but if there is one, there are lots of them.

Overview and terminology
In this section we work towards proving Theorem 2.5, in the context of the classical occupancy problems of random allocations of balls into urns.
We shall use the following terminology. Suppose $W$ is a random variable taking values in $\mathbb{Z}_+$, and $k \in \mathbb{N}$, and $p, p_1, p_2, \ldots, p_k$ are numbers in $[0,1]$ with $p_1 + \cdots + p_k = 1$. (In most of the rest of the paper we assume $W \ge 1$, but for this section we can allow $W$ to take the value 0.) Let us say the random variable $X$ has the $\mathrm{Bin}(W, p)$ distribution if for each $n \in \mathbb{Z}_+$ the conditional distribution of $X$, given that $W = n$, is binomial with parameters $(n, p)$. Let us say that a random vector $(Z_1, \ldots, Z_k)$ has the multinomial $(W; p_1, \ldots, p_k)$ distribution if for each $n \in \mathbb{Z}_+$ the conditional distribution of $(Z_1, \ldots, Z_k)$, given that $W = n$, is multinomial with parameters $(n; p_1, \ldots, p_k)$.
As in Section 2.1, we assume $W_n$ (having the distribution of the row weights for our matrix with $n$ columns) is chosen to converge in distribution to a limiting random variable $W$. An important special case is the so-called binomial model: in the binomial scheme, take $W_n = W^{\mathrm{bin}}_n$ to be distributed as the number of odd components in a multinomial $(W; n^{-1}, \ldots, n^{-1})$ random vector. Note that we may also generate the corresponding row by first sampling the given multinomial vector and then reducing its elements mod 2. By Lemma 3.3 below, $W^{\mathrm{bin}}_n \xrightarrow{d} W$ as $n \to \infty$, so this is indeed a special case. We write $\mathbb{P}^{\mathrm{bin}}_{\rho_n}$ for probability associated with the binomial allocation scheme. For the general model we write $\mathbb{P}_{\rho_n}$ as before.
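The binomial scheme is convenient to simulate directly (our sketch below): cast $W$ balls uniformly into the $n$ columns and reduce the occupancy counts mod 2; the realized row weight $W^{\mathrm{bin}}_n$ is then the number of odd counts, so in particular it is at most $W$ and has the same parity as $W$.

```python
import random

def binomial_scheme_row(n, w, rng):
    """Cast w balls uniformly into n columns; the row is the occupancy vector reduced mod 2."""
    counts = [0] * n
    for _ in range(w):
        counts[rng.randrange(n)] += 1
    return [c % 2 for c in counts]

rng = random.Random(3)
n, w = 50, 5
row = binomial_scheme_row(n, w, rng)
weight = sum(row)               # realized W_n^bin
assert weight <= w              # at most w columns receive an odd count
assert weight % 2 == w % 2      # parities agree, since the counts sum to w
```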

Exact formulae for the allocation problem
Fix $n$. Let $X_{ij}$ denote the $j$th component of $X_i$. Define the column sums $Y_j$ and partial row sums $S_{i,J}$ of the matrix $(X_{ij})$ as follows (with standard addition):
$$Y_j := \sum_{i=1}^{m} X_{ij}, \qquad S_{i,J} := \sum_{j \in J} X_{ij} \quad (J \subseteq [n]).$$
Recall from (2.5) that $A(n,m)$ denotes the event that $\mathbf{1}$ is a null (row) vector for $M$. Lemma 3.1 gives the exact formula (3.1) for this probability in the binomial allocation scheme, and the formula (3.3) in the general allocation scheme, in which $p^{(n)}_j := \sum_{r=0}^{n} p_{j,r}\, \mathbb{P}[W_n = r]$ and $p_{j,r}$ is given by (3.4). Consider first the binomial allocation scheme: there, we obtain (3.1) from (3.2). In the general scheme, conditional on $\sum_{j=1}^{n} X_{1j} = r$, the distribution of $S_{1,J}$ is hypergeometric with parameters $(n; |J|, r)$. Let $H \subseteq [n]$ denote the set of values of $j$ for which $X_{1j} = 1$. For $r \in [n]$, we write $\mathbb{E}_r$ for expectation in the case where $\mathbb{P}[W_n = r] = 1$. Instead of fixing $J \subseteq [n]$ and choosing $H$ as a uniform random $r$-subset, we obtain an exact formula for $\mathbb{E}_r[(-1)^{S_{1,J}}]$ by fixing $j = |J|$ and an $r$-subset $H$, and selecting $J$ uniformly from the $j$-subsets of $[n]$. The probability $p_{j,r}$ that $S_{1,J} := |H \cap J|$ is even is given by summing probabilities for $|H \cap J| \in \{0, 2, 4, \ldots\}$, giving the expression in (3.4). It follows that $\mathbb{E}_r[(-1)^{S_{1,J}}] = 2 p_{j,r} - 1$, and hence the formula for $p^{(n)}_j$. Substitution of this into (3.2) gives (3.3).
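Since the display for (3.1) is not reproduced above, we record the closed form suggested by the surrounding discussion (in particular, by the derivation of $\pi_n(m)$ from (3.1) noted in Section 2.4), namely $2^{-n} \sum_{j=0}^{n} \binom{n}{j}\, \rho(1 - 2j/n)^m$ for the binomial scheme. We stress that this is our reconstruction, stated here only as an assumption, not a quotation of the paper's formula. The sketch below (our code) verifies it against exhaustive enumeration in tiny cases, using the fact that in the binomial scheme the event $A(n,m)$ depends only on the parities of the pooled ball counts:

```python
import itertools
from fractions import Fraction
from math import comb

def prob_A_exact(n, m, weights):
    """P[A(n, m)] in the binomial scheme, with each row's weight uniform on `weights`,
    by pooling the balls of all rows and enumerating every placement."""
    total = Fraction(0)
    for ws in itertools.product(weights, repeat=m):
        balls = sum(ws)   # parity per urn depends only on the pooled placements
        good = sum(1 for alloc in itertools.product(range(n), repeat=balls)
                   if all(alloc.count(u) % 2 == 0 for u in range(n)))
        total += Fraction(good, n ** balls)
    return total / len(weights) ** m

def prob_A_candidate(n, m, weights):
    """Our reconstructed closed form: 2^{-n} sum_j C(n,j) rho(1-2j/n)^m, rho(s) = avg of s^w."""
    def rho(s):
        return sum(s ** w for w in weights) / len(weights)
    return sum(comb(n, j) * rho(1 - Fraction(2 * j, n)) ** m
               for j in range(n + 1)) / 2 ** n

assert prob_A_exact(3, 2, (2, 3)) == prob_A_candidate(3, 2, (2, 3))
assert prob_A_exact(2, 2, (2,)) == prob_A_candidate(2, 2, (2,))
```

The agreement in exact rational arithmetic, including for a genuinely random weight mixture, supports the reconstructed shape of the formula, though only the paper's own display is authoritative.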

Asymptotics in the binomial model
The remainder of Section 3 concerns asymptotic analysis of the quantities in Lemma 3.1. The first result enables us to work primarily with even m, which has technical advantages.
Proof. The hypotheses imply that there exist $\varepsilon > 0$ and $r \in 2\mathbb{Z}$ such that $\mathbb{P}[W_n = r] > \varepsilon$ for all $n$ large enough. For $m > 3$, suppose $A(n, m-3)$ occurs. Then $A(n, m)$ will occur if the 3 additional rows constitute a hypercycle. With probability at least $\varepsilon^3$, these new rows each have $r$ units, and given this, there is a probability at least $n^{-2r}$, say, that these units form a hypercycle.
Applying this inequality twice, once with m + 3 in place of m, gives the result.
Recall that in general we assume $W_n \xrightarrow{d} W$. Next we give an elementary lemma that confirms the binomial model's place in this framework. We will prove Theorem 2.5 (in Section 3.5) by first showing that (2.14) holds in the binomial setting, using (3.1) and the Stirling approximation as discussed in Appendix B. Then we will extend this to the general setting using an approximation argument described in Section 3.4. We start with a slightly more general statement than (2.14) in the binomial case, which we will also need later in the proof of Theorem 2.1.
For the second inequality, we have from (3.1) and (B.2) a bound valid for any integer $i_n \le n/2$, using the fact that $m_n$ is even; this bound can then be weakened using $m_n/n < \alpha_2$. Now use continuity of $g_\alpha$ to choose a sequence of integers $i_n \le n/2$, $n \in \mathbb{N}$, such that $g_{\alpha_2}(i_n/n) \to \sup_{\gamma \in [0,1/2]} g_{\alpha_2}(\gamma)$, with $i_n \to \infty$ and $n - i_n \to \infty$ as $n \to \infty$. The lower bound in (3.5) follows, for $m_n$ even.

Approximation by the binomial model
The exact formula (3.1) is simpler to work with than the more complicated exact formula (3.3), but intuition suggests that the asymptotics of any of the models in the class with $W_n \xrightarrow{d} W$ should be similar. The next result quantifies this intuition.

Hence, for the sum on the right-hand side of (3.13), there exists $\varepsilon_n$ with $|\varepsilon_n| < \varepsilon$ for which the corresponding identity holds, using the assumption that $m_n$ is even, so that all the terms in the sum are nonnegative.

Then, by (3.12) and a similar argument to (3.13), the last displayed quantity can be rewritten accordingly. By Lemma 3.4, we have that $\mathbb{P}^{\mathrm{bin}}_{\rho_n}[A(n, m_n)] \ge \exp\{-n R_\rho(\alpha_2) - \varepsilon n\}$ for all $n$ large enough. So we may take $\varepsilon > 0$ small enough that the final log term in the last display is $O(\exp\{-n\})$, say. Since $|\varepsilon_n| \le \varepsilon$ and $\varepsilon > 0$ was arbitrary, (3.8) follows in the case of even $m_n$. In the other case, Lemma 3.2 yields the same conclusion. The final statement in the lemma then follows from the final statement in Lemma 3.4.

Proofs of Theorem 2.5 and Proposition 2.6
Now we can complete the proofs of Theorem 2.5 and Proposition 2.6.
Proof of Theorem 2.5. The theorem is now a consequence of Lemmas 3.4 and 3.5.

Approximation by the binomial model
In Section 3.4 we showed (in Lemma 3.5) that $\mathbb{P}_{\rho_n}[A(n, m_n)]$ can be well approximated by $\mathbb{P}^{\mathrm{bin}}_{\rho_n}[A(n, m_n)]$ on the logarithmic scale, provided that $m_n/n \to \alpha$. The following result is an analogous approximation lemma for $\mathbb{E}_{\rho_n}[N(n, m_n)]$. One could obtain such a result from Lemma 3.5 applied to (2.6), with some work (including dealing separately with terms with $\ell = o(n)$: cf. Section 4.4 below). However, it is more convenient to proceed directly, albeit using similar ideas to the proof of Lemma 3.5; it is helpful that $\mathbb{E}_{\rho_n}[N(n, m)]$ possesses monotonicity properties absent for $\mathbb{P}_{\rho_n}[A(n, m)]$. Let $W(1), W(2), \ldots$ be independent copies of $W$. Using the Skorokhod representation theorem, we may take $W_{n,1}, W_{n,2}, \ldots$ to be independent copies of $W_n$, being the weights of the rows in the general model, such that $W_{n,i} \to W(i)$ almost surely. Also, take $W^{\mathrm{bin}}_{n,i}$ to be the number of odd components in a multinomial $(W(i); n^{-1}, \ldots, n^{-1})$ distribution, so that $W^{\mathrm{bin}}_{n,1}, W^{\mathrm{bin}}_{n,2}, \ldots$ are independent copies of $W^{\mathrm{bin}}_n$ and the weights of the rows in the binomial model. We also couple the row entries: if $W^{\mathrm{bin}}_{n,i} = W_{n,i}$, we generate a single row $i$ with the given weight to use in both models; otherwise, it suffices to generate the two rows independently given their (different) weights.
Take $\varepsilon > 0$. Let $A_n(i) := \{W_{n,i} \neq W^{\mathrm{bin}}_{n,i}\}$. Then for any $\delta > 0$, we may take $n$ large enough so that $\mathbb{P}[A_n(i)] \le \delta$, uniformly in $i$. Let $K(n, m) := \sum_{i=1}^{m} \mathbf{1}_{A_n(i)}$ denote the number of 'bad' rows. Then $K(n, m)$ is stochastically dominated by a $\mathrm{Bin}(m, \delta)$ variable. In particular, for any fixed $\varepsilon > 0$ and any $C < \infty$, standard binomial tail bounds imply that we may take $\delta$ small enough, and hence $n$ sufficiently large, so that $\mathbb{P}[K(n, m_n) \ge \varepsilon n] \le \mathbb{P}[\mathrm{Bin}(2\alpha n, \delta) \ge \varepsilon n] \le \exp\{-Cn\}$.
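The 'standard binomial tail bounds' invoked here are of Chernoff type: for $k/N > \delta$, $\mathbb{P}[\mathrm{Bin}(N, \delta) \ge k] \le \exp\{-N H(k/N \mid \delta)\}$, where $H(a \mid b)$ is the relative entropy between Bernoulli($a$) and Bernoulli($b$); fixing $\varepsilon$ and letting $\delta \downarrow 0$ drives the exponent below any $-C$. A numerical sanity check (our code, with illustrative parameter values):

```python
import math
from math import comb

def binom_tail(N, delta, k):
    """Exact P[Bin(N, delta) >= k]."""
    return sum(comb(N, i) * delta ** i * (1.0 - delta) ** (N - i) for i in range(k, N + 1))

def bern_kl(a, b):
    """Relative entropy H(a | b) between Bernoulli(a) and Bernoulli(b), for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

# think of N ~ 2*alpha*n trials and k = eps*n 'bad' rows, with delta small
N, delta, k = 200, 0.001, 40
bound = math.exp(-N * bern_kl(k / N, delta))
assert binom_tail(N, delta, k) <= bound   # the Chernoff bound holds
assert bound < math.exp(-100)             # shrinking delta drives the exponent down
```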

Null vectors consisting of few rows
In the asymptotics of $\mathbb{E}_{\rho_n}[N(n, m)]$, it turns out that null vectors of low weight play a distinct and important role. Recall (4.1). The main result of this section is the following lemma, which exhibits a polynomial growth rate for null vectors consisting of few rows. Proof of Lemma 4.3. Let $n, \ell \in \mathbb{N}$. Let $R = R(n, \ell)$ denote the 'column range' of $M(n, \ell)$, i.e., the number of columns of non-zero degree. We estimate $\mathbb{P}_{\rho_n}[A(n, \ell)]$ by considering separately the events $R \le k$ and $R > k$, where $k = k(\ell) \in [n]$ is to be chosen later.
We describe $M(n, \ell)$ in the language of allocations: for each row, a $W_n$-distributed collection of balls is distributed uniformly at random among $n$ urns (columns), at most one ball per urn. If $R \le k$ then there is some set of $k$ urns that contain all the balls. For each ball, the probability that it lands in one of the first $k$ urns, given that the other balls cast so far for that row all land in the first $k$ urns, is at most $k/n$. Hence, since for each of the $\ell$ rows at least $r_0$ balls are cast, the bound (4.8) follows. For $A(n, \ell)$ to occur, each of the columns in the range must have degree at least 2. Thus if $R > k$ and $A(n, \ell)$ occurs, there is a collection of $k+1$ urns such that each urn in the collection gets at least 2 balls. Let $B(i)$ be the event that urn $i$ gets at least 2 balls. The probability that a particular entry is 1, given the values of up to $k$ other entries in the same row, is at most $r_1/(n-k)$. Hence the union bound yields an estimate for $1 \le j \le k+1$, and then, provided $k \le n/2$, a further application of the union bound gives (4.9), where we put $c_1 = 2 r_1^2$. Combined with (4.8) this bounds $\mathbb{P}_{\rho_n}[A(n, \ell)]$. For all $n$ large enough, $m_n \le (1 + \alpha) n$, so that, for all $\ell$, and for $k \le n/2$, by (4.2), $\mathbb{E}_{\rho_n}[N(n, m_n; \ell)] = \binom{m_n}{\ell}\, \mathbb{P}_{\rho_n}[A(n, \ell)]$ satisfies the resulting estimate. Taking $k = \ell + r_0 - 2$, we obtain for each fixed $\ell$ that, for some constant $c(\ell)$, a bound holds which is $O(n^{2 - r_0})$ for any fixed $\ell \ge 2$.
Fix an integer $K \ge 2$, to be chosen later, and take $K \le \ell \le \delta n$. Now put $k = (r_0 - 1)\ell - \lfloor \ell/2 \rfloor$. Assume $\delta \le 1/(2 r_0)$; then for $\ell \le \delta n$ this choice of $k$ satisfies $k \le n/2$, so (4.9) remains valid. Also, since $r_0 \ge 3$, $k \ge \frac{3\ell}{2} - 1 \ge \ell$ provided $\ell \ge 2$. By the bound $\ell! \ge (\ell/e)^\ell$, and similarly for $k$, there are constants $c_2, c_3, c_4$ such that the first term on the right side of (4.9) (i.e. the product of the first factor with the first term in the second factor) is bounded accordingly; similarly, there are constants $c_5, c_6, c_7$ such that the second term on the right side of (4.9) is bounded accordingly. Combining with (4.10), since $r_0 \ge 3$ so $r_0 - 2 \ge 1$, and $\lfloor \ell/2 \rfloor \le \ell/2 \le \lfloor \ell/2 \rfloor + 1$, we can find a constant $c_8$ such that a bound holds for $2 \le \ell \le \delta n$, which is $O(n^{2 - r_0})$ provided we choose $K$ so that $K/2 \ge r_0 - 1$.

Hypercycles and 2-cores
Recall the definitions from Section 2.3. The connection between the 2-core and hypercycles was exploited by Cooper [5, p. 371], following an idea that he attributes to Molloy (see [4, p. 268]). The connection is demonstrated by the following result.
(i) Any hyperedge E ∉ C cannot belong to a hypercycle of (V, E).
(ii) Consequently, every hypercycle of (V, E) is a partial hypergraph of the 2-core C.
(iii) If |V(C)| < |C|, i.e. the 2-core has more hyperedges than vertices, then (V, E) has a non-empty hypercycle.
Proof. If there are s hyperedges not in the 2-core C, there exists a labelling of them as E_1, E_2, . . . , E_s with the property that, for every j, E_j has some vertex of degree one after hyperedges E_1, E_2, . . . , E_{j−1} are removed. Suppose (V, E) has some hypercycle F ≠ ∅. None of E_1, E_2, . . . , E_s can belong to F: otherwise, there would be some minimum j for which E_j ∈ F, and this E_j has some vertex v of degree one in the partial hypergraph from which E_1, E_2, . . . , E_{j−1} have been removed, which contains F; so v cannot have even degree in F, which is a contradiction. This proves (i), and (ii) follows. For (iii), say c := |V(C)| < |C| =: r. Then there are 2^r − 1 non-empty partial hypergraphs of C, but only 2^c < 2^r − 1 possible indicator vectors for a set of vertices of odd degree. By the pigeonhole principle, there must be two distinct partial hypergraphs F, F′ ⊆ C for which the sets of vertices of odd degree are the same. Then F △ F′ is a non-empty hypercycle.
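The pigeonhole argument in (iii) can be checked mechanically on a toy instance. The sketch below (ours, with a made-up 4 × 3 incidence matrix that is its own 2-core: every column has degree at least 2 and there are more rows than columns) finds two distinct subsets of rows with the same odd-degree profile; their symmetric difference is then a non-empty hypercycle, i.e. a non-empty set of rows summing to zero mod 2.

```python
from itertools import combinations

# Hypothetical 2-core with r = 4 hyperedges (rows) on c = 3 vertices (columns);
# every column has degree >= 2, and r > c.
rows = [
    (1, 1, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 1),
]

def odd_degree_profile(subset):
    """Mod-2 column sums: the indicator vector of the set of odd-degree vertices."""
    return tuple(sum(col) % 2 for col in zip(*subset))

# 2^4 - 1 = 15 non-empty subsets of rows but only 2^3 = 8 possible profiles,
# so two distinct subsets must collide; their symmetric difference is a
# hypercycle (every vertex has even degree).
seen = {}
hypercycle = None
for size in range(1, len(rows) + 1):
    for subset in combinations(range(len(rows)), size):
        profile = odd_degree_profile([rows[i] for i in subset])
        if profile in seen:
            hypercycle = sorted(set(subset) ^ set(seen[profile]))
            break
        seen[profile] = subset
    if hypercycle:
        break

print(hypercycle)  # a non-empty set of rows summing to zero mod 2
```

Here rows 0, 1, 2 sum to (0, 0, 0) mod 2, giving a null vector of the incidence matrix in the sense of (1.3).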

The 2-core in uniform random hypergraphs
In this section we consider a certain uniform random hypergraph model, which is different from (but related to) the hypergraph model induced by our random matrix M (n, m n ); in Section 5.3 we will connect the two models.
Recall from Section 2.3 that we may represent a hypergraph by an incidence matrix, and the degree of a vertex is the number of incidences (non-zero entries) in the corresponding column of the incidence matrix. The weight of a hyperedge is its number of incident vertices, i.e., the number of incidences in the corresponding row of the matrix. A natural probability model for a random hypergraph is to fix the multiset of vertex degrees and the multiset of hyperedge weights in advance (subject to a consistency condition), and sample uniformly from the hypergraphs with these collections of vertex degrees and hyperedge weights. This gives a uniform random hypergraph.
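For a concrete (hypothetical) example of these conventions: in the 3 × 5 incidence matrix below, hyperedge weights are row sums, vertex degrees are column sums, and the consistency condition is simply that the two multisets account for the same total number of incidences.

```python
# Rows = hyperedges, columns = vertices, of a small made-up hypergraph.
incidence = [
    [1, 0, 1, 0, 1],  # hyperedge of weight 3
    [0, 1, 1, 0, 0],  # hyperedge of weight 2
    [1, 1, 0, 1, 0],  # hyperedge of weight 3
]

weights = [sum(row) for row in incidence]        # row sums: hyperedge weights
degrees = [sum(col) for col in zip(*incidence)]  # column sums: vertex degrees

# Consistency condition: total weight = total degree = number of incidences.
assert sum(weights) == sum(degrees)
print(weights, degrees)
```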
Darling and Norris [9] analyse the statistical properties of the 2-cores for sequences of uniform random hypergraphs, assuming a uniform bound on the hyperedge weights and vertex degrees. In unpublished work of the same authors, the uniform bounds are replaced by third-moment assumptions. For the present paper, we require a more modest relaxation of the conditions of [9], to cover the case where the row weights remain uniformly bounded but the vertex degrees are approximately Poisson distributed.
For each n, define vectors of nonnegative integers d_n := (d_n(k) : k ∈ Z_+) and w_n := (w_n(k) : k ∈ N) with Σ_{k≥0} d_n(k) = n and m_n := Σ_{k≥1} w_n(k); we assume that d_n and w_n are compatible in the sense that Σ_{k≥1} k w_n(k) = Σ_{k≥0} k d_n(k) < ∞. We also assume that m_n → ∞. Suppose that for each i ∈ N and j ∈ Z_+,

lim_{n→∞} w_n(i)/m_n = ρ_i and lim_{n→∞} d_n(j)/n = ν_j. (5.1)

Define generating functions ρ(s) := Σ_{k≥1} ρ_k s^k and ν(s) := Σ_{k≥0} ν_k s^k. We assume that the weights are uniformly bounded and the degree distribution has all moments, so that the means ρ'(1) and ν'(1) corresponding to these distributions are finite. Consider a sequence of random hypergraphs with n vertices and m_n hyperedges, selected uniformly from those hypergraphs with edge weight multiplicities w_n and vertex degree multiplicities d_n. Let (E_n, v_n) be a random incidence, sampled uniformly at random from all incidences in the nth hypergraph. Denote the weight of E_n by 1 + S_n, and the degree of v_n by 1 + L_n; thus S_n counts the other vertices in this hyperedge, and L_n counts the other hyperedges incident to this vertex. Size bias occurs here: the probability that E_n has weight k is proportional to k times the number of rows of weight k, and similarly the probability that the degree of v_n is d is proportional to d times the number of degree-d vertices. The limiting distributions of S_n and L_n are described via the generating functions

σ(s) := Σ_w σ_w s^w and λ(s) := Σ_d λ_d s^d, (5.2)

where, due to the size biasing, the coefficients in (5.2) are given by σ_w = (w + 1)ρ_{w+1}/ρ'(1) and λ_d = (d + 1)ν_{d+1}/ν'(1). Hence the generating functions themselves become σ(s) = ρ'(s)/ρ'(1) and λ(s) = ν'(s)/ν'(1). To avoid triviality, we assume that σ_0 = 0 (equivalently, ρ_1 = 0), i.e., there are no 1-edges, and λ_0 ∉ {0, 1} (otherwise the 2-core is of no interest). Define ϕ(s) := 1 − λ(1 − σ(s)).
Now we can state the result on the 2-core that we shall use, which amounts to a variant of Theorem 7.1 of [9]. Theorem 5.2. Consider a sequence of uniform random hypergraphs associated with sequences w_n and d_n satisfying (5.1) with ρ_w = 0 for all w large enough and Σ_{d≥1} ν_d d^β < ∞ for all β > 0. Suppose that the corresponding pair (5.2) of random-incidence generating functions has σ_0 = 0, λ_0 ∉ {0, 1}, and is such that g*, given by (5.5), has either g* = 0 or g* satisfying (5.6). Then the following hold a.s. in the limit as n → ∞.
(i) If g * = 0, the proportion of hyperedges which survive in the 2-core converges to zero.
(ii) If g* > 0, then for any k ∈ Z_+ with ρ_k > 0, the proportion of weight-k hyperedges which survive in the 2-core is asymptotically (g*)^k; overall, a proportion ρ(g*) of hyperedges survive, and a proportion g*σ(g*) of incidences.
(iii) If g* > 0, then for any d, k ∈ N with 2 ≤ d ≤ k and ν_k > 0, the proportion of vertices of degree k whose degree in the 2-core is d converges to \binom{k}{d} σ(g*)^d (1 − σ(g*))^{k−d}.
(iv) If g* > 0, the 2-core is again a uniform random hypergraph, given its hyperedge weights and vertex degrees, whose distributions are determined by the previous assertions.
As mentioned above, in [9] all but finitely many coefficients of the generating functions (5.2) were taken to be zero, but the methods admit the modest extension of this section, and indeed can be extended to the case where λ''(1) and σ''(1) are finite, corresponding to finite third moments for the hyperedge weight and vertex degree distributions. Because of its proximity to the result in [9], we do not prove Theorem 5.2 here.

Application to the random matrix model
We return to the random matrix model used in the rest of the paper, so our random incidence matrix will be M (n, m n ) described in Section 2.1, i.e., with i.i.d. rows with W n -distributed weights, and corresponding generating function ρ n (s) having limit ρ(s).
To justify the application of Theorem 5.2 in this setting, we give the following strong laws of large numbers for the empirical distributions of the row and column weights of M: writing N_k(n) for the number of rows of M(n, m_n) of weight k, and Ñ_k(n) for the number of columns of degree k, almost surely

N_k(n)/m_n → P[W = k], (5.7)

and, with µ := αE[W],

Ñ_k(n)/n → e^{−µ} µ^k / k!. (5.8)

Proof. First note that m_n^{−1} E[N_k(n)] = P[W_n = k], which converges to P[W = k] by assumption. To deduce almost sure convergence from this convergence in means, we use the Azuma–Hoeffding inequality in a standard way, as follows. Fix n and for 1 ≤ i ≤ m_n let F_i be the σ-algebra generated by the rows X_1, . . . , X_i of M(n, m_n). Define the Doob martingale M_i := E[N_k(n) | F_i]. Since resampling a single row changes the number of rows of weight k by at most 1, we have for 1 ≤ i ≤ m_n that |M_i − M_{i−1}| ≤ 1, and (5.7) follows. The remaining two parts of the lemma use the assumption P[W ≤ r_1] = 1 for r_1 < ∞. To prove (5.8) note that the degree of the first column of M(n, m_n) is binomially distributed with parameters m_n and E[W_n]/n. Hence E[Ñ_k(n)/n] = P[Bin(m_n, E[W_n]/n) = k], which tends to e^{−µ} µ^k / k! as n → ∞ by binomial–Poisson convergence. Given convergence of means, we may prove (5.8) by a similar Azuma–Hoeffding argument to that for (5.7), since resampling a single row changes the number of columns of degree k by at most r_1.
For the final statement in the lemma, it follows that, conditionally on this sequence of empirical distributions, almost surely we have a sequence of random matrices satisfying the hypotheses of Section 5.2.
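The concentration step invoked twice in this proof is the standard bounded-differences bound, recorded here for completeness: with M_i := E[N_k(n) | F_i] the Doob martingale and increments bounded by a constant c (c = 1 for the rows, c = r_1 for the columns), Azuma–Hoeffding gives, for any ε > 0,

```latex
\[
  \mathbb{P}\bigl[\, |N_k(n) - \mathbb{E} N_k(n)| \ge \varepsilon m_n \,\bigr]
  \;\le\; 2 \exp\!\left( - \frac{(\varepsilon m_n)^2}{2 \sum_{i=1}^{m_n} c^2} \right)
  \;=\; 2 \exp\!\left( - \frac{\varepsilon^2 m_n}{2 c^2} \right),
\]
```

which is summable in n, so almost sure convergence follows from the Borel–Cantelli lemma.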
In the notation of Section 5.2, in this case ν(s) = e^{−µ(1−s)}, the Po(µ) generating function with µ := αE[W], and to emphasize the dependence on α we will use the notation ϕ_α for ϕ from now on. Recall from (5.5) that g* was defined as the largest s ∈ [0, 1) for which ϕ_α(s) = s. In order for the model of this section to fit into the setting discussed in Section 5.2, we need to assume that σ_0 = 0 and λ_0 ∉ {0, 1}. Here λ_0 = e^{−µ} = e^{−αE[W]} and σ_0 = P[W = 1]/E[W].
So it suffices to assume that α > 0, P[W ≥ 2] = 1, and E[W] < ∞; in this case the argument in Section 5.2 shows that g* is well defined. Note that g* depends both on ρ and on α; in this section we write g* = g*(α) to emphasize the dependence on α; we will show (see Lemma 5.5) that the present definition is equivalent to that at (2.11) given in Section 2.2. For any solution s ∈ [0, 1) to ϕ_α(s) = s, so in particular for s = g*(α), provided ρ'(s) ≠ 0, we have α = h(s) as given by (2.9).
Now assume also that P[W ≥ 3] = 1 and E[W 2 ] < ∞. Then the following hold.
(ii) The function g* is right-continuous, and there is a finite set D_ρ ⊂ (0, ∞), with α_ρ = inf D_ρ, such that g* is continuous apart from jumps at points of D_ρ. For each α ∈ D_ρ, α = h(g*(α)) is a local minimum value for h.
For part (i), under the extra assumption P[W ≥ 3] = 1 we have h going to infinity at 0 and at 1, and by continuity h attains its infimum on (0, 1), so using (2.10) and (2.11) we have that g * (α ρ ) is the supremum of a non-empty compact set contained in (0, 1), and so lies in (0,1). The last part of (i) also follows from the continuity of h.
For part (ii), under the extra assumption E[W 2 ] < ∞, note first that if 0 ≤ y < α ρ then g * (y) = 0. Hence g * is continuous at y for all y < α ρ . Now let y ≥ α ρ ; note that by (2.11) and continuity of h, we have h(g * (y)) = y. Take a monotonic sequence y n tending to y; set x n = g * (y n ).
Suppose first that y n ↓ y. Then x n is nonincreasing; denoting the limit by x ∞ we have h(x n ) = y n so h(x ∞ ) = y by continuity, and therefore x ∞ ≤ g * (y) by (2.9). Since also x n ≥ g * (y) by monotonicity we have x ∞ = g * (y); hence g * is right-continuous at y. Now suppose instead that y n ↑ y. Set x = g * (y). If h does not have a local minimum at x then lim inf g * (y n ) ≥ x, so that x n → x, and hence g * is left-continuous at y. Hence, if g * is discontinuous at y then h has a local minimum at g * (y).
The function h is analytic and non-constant on (0, 1), so the zeros of h′ do not accumulate except possibly at 0 or 1.
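These facts can be illustrated numerically in the fixed-weight case W ≡ 3, where ρ(s) = s³ and, with the Poisson degree limit of this section, the fixed-point map is ϕ_α(s) = 1 − e^{−3αs²} and h(s) = −log(1 − s)/(3s²). The sketch below (our function names; a plain grid-and-iteration scheme, not the paper's method) locates α_ρ = inf h and exhibits the jump of g* there.

```python
import math

# W = 3 a.s.: rho(s) = s^3, rho'(s) = 3 s^2, so h(s) = -log(1 - s) / (3 s^2)
# and phi_alpha(s) = 1 - exp(-3 * alpha * s^2).

def h(s):
    return -math.log1p(-s) / (3.0 * s * s)

def g_star(alpha, iters=2000):
    """Largest fixed point of phi_alpha in [0, 1): iterate down from s = 1."""
    s = 1.0
    for _ in range(iters):
        s = 1.0 - math.exp(-3.0 * alpha * s * s)
    return 0.0 if s < 1e-9 else s

# alpha_rho = inf of h over (0, 1), estimated on a fine grid.
alpha_rho = min(h(i / 10**6) for i in range(1, 10**6))

print(alpha_rho)    # about 0.81847
print(g_star(0.7))  # below the threshold: g* = 0
print(g_star(0.9))  # above the threshold: g* is bounded away from 0
```

The discontinuity of g* at α_ρ ≈ 0.8185 (from 0 to roughly 0.72) is the jump described in part (ii).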
By Theorem 5.2(ii), n^{−1} times the total number of incidences in the 2-core converges to

µ g* σ(g*) = α g* ρ'(g*). (5.10)

By Theorem 5.2(iii) and (5.8), for d ≥ 2 the proportion of original vertices whose degree in the 2-core is d is asymptotically e^{−µσ(g*)} (µσ(g*))^d / d!, the remainder having degree 0 in the 2-core (the algorithm of Section 5.1 never deletes any columns). In other words, the 2-core vertex degrees have the distribution of a random variable D 1{D ≥ 2}, where D ∼ Po(µσ(g*)); by (5.10), µσ(g*) = αρ'(g*). As a check on the previous calculation of the number of surviving incidences, n^{−1} times the total number of incidences in the 2-core should converge to the mean of the vertex-degree distribution, which is αρ'(g*)(1 − e^{−αρ'(g*)}) = αg*ρ'(g*), as in (5.10).
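The final equality here is just the fixed-point relation: since the limiting degree distribution in this section is Po(µ), the size-biased generating function is λ(s) = e^{−µ(1−s)}, and the map ϕ_α has a closed form,

```latex
\[
  \varphi_\alpha(s) \;=\; 1 - \lambda\bigl(1 - \sigma(s)\bigr)
  \;=\; 1 - e^{-\mu\sigma(s)}
  \;=\; 1 - e^{-\alpha\rho'(s)},
\]
\[
  \text{so at } s = g^*:\qquad
  g^* \;=\; 1 - e^{-\alpha\rho'(g^*)},
  \qquad\text{whence}\qquad
  \alpha\rho'(g^*)\bigl(1 - e^{-\alpha\rho'(g^*)}\bigr) \;=\; \alpha\, g^*\rho'(g^*),
\]
```

using µσ(s) = αE[W] · ρ'(s)/ρ'(1) = αρ'(s).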

Proofs of Theorems 2.2 and 2.4
We are now in a position to present the proof of Theorem 2.4.
Proof of Theorem 2.4. By Corollary 5.4, a.s. our sequence of random matrices satisfies the hypotheses of Theorem 5.2. If α < α_ρ, then g*(α) = 0, and Theorem 5.2 implies the 2-core has o(n) rows. From now on suppose α > α_ρ, so g* = g*(α) > 0 by Lemma 5.5. For statement (i), note that out of m_n ∼ αn rows, a proportion ρ(g*) survives, by Theorem 5.2(ii). For (ii), the discussion around (5.11) implies that the proportion of the n original vertices whose degree in the 2-core is non-zero is obtained by subtracting from 1 the mass that a Po(ν) random variable, with ν := αρ'(g*), places on {0, 1}.
Proof. We know from Lemma 5.5 that α_ρ ≤ 1, so if \underline{α}_ρ ≤ α_ρ there is nothing to prove. Hence we assume \underline{α}_ρ > α_ρ from now on. First we show that for any ε > 0 there exists α ∈ (\underline{α}_ρ − ε, \underline{α}_ρ) such that

ψ(g*(α)) > 0. (5.14)

By the definition (2.12) of \underline{α}_ρ, and the assumption \underline{α}_ρ > α_ρ, if (5.14) fails then there exists δ > 0 such that ψ ∘ g* is identically zero on the interval I := (\underline{α}_ρ − δ, \underline{α}_ρ), and by taking δ small enough we may assume the interval I contains no discontinuities of g*. But then the image J := g*(I) is also an open interval, because g* is continuous and strictly increasing on I. So we would then have ψ identically zero on J, which would contradict the fact that ψ is analytic and non-constant on (0, 1). Thus (5.14) must hold as asserted.
Observe next that every time the 2-core algorithm deletes a row, it has to create at least one column of degree zero, and possibly more. So the aspect ratio (i.e., number of rows divided by number of occupied columns) is nondecreasing at each step of the algorithm, provided the initial aspect ratio is at least 1. Hence the aspect ratio of a non-empty 2-core is at least as large as the aspect ratio of the original incidence matrix to which the algorithm is applied, provided the latter is at least 1.
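The monotonicity claim is an elementary inequality: if the matrix currently has r rows and c occupied columns with r ≥ c ≥ 2, then deleting one row and one occupied column cannot decrease the ratio, since

```latex
\[
  \frac{r-1}{c-1} - \frac{r}{c}
  \;=\; \frac{c(r-1) - r(c-1)}{c(c-1)}
  \;=\; \frac{r-c}{c(c-1)} \;\ge\; 0,
\]
```

and deleting further occupied columns in the same step only increases the ratio.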
So if m_n/n → α > 1, the aspect ratio of the original matrix exceeds 1 for all n large enough, and hence so does the aspect ratio of any non-empty 2-core. Suppose that \underline{α}_ρ > 1. By (5.14) and the finiteness of D_ρ, there exists α′ ∈ (1, \underline{α}_ρ) \ D_ρ such that ψ(g*(α′)) > 0. Then, by Theorem 2.4(iii), with m_n/n → α = α′, the 2-core has aspect ratio less than 1 for all n large enough, which contradicts the previous conclusion that α > 1 implies the 2-core has limiting aspect ratio greater than 1. Hence \underline{α}_ρ ≤ 1.
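In the fixed-weight case W ≡ 3 this balance of rows against occupied columns is explicit: by Theorem 2.4, the 2-core has asymptotically αg³n rows and (1 − (1 + ν)e^{−ν})n occupied columns, where g = g*(α) and ν = 3αg². A numerical sketch (function names ours) locates the crossing point, consistent with the value \underline{α} = 0.91793... quoted in the introduction for W = 3:

```python
import math

# W = 3 a.s.; for alpha above the core threshold, g*(alpha) is the largest
# fixed point of s -> 1 - exp(-3*alpha*s^2).  The 2-core then has about
# alpha*g^3*n rows and (1 - (1+nu)*exp(-nu))*n occupied columns, nu = 3*alpha*g^2.

def g_star(alpha, iters=2000):
    s = 1.0
    for _ in range(iters):
        s = 1.0 - math.exp(-3.0 * alpha * s * s)
    return s

def core_row_excess(alpha):
    """Limiting (rows - occupied columns) in the 2-core, per original vertex."""
    g = g_star(alpha)
    nu = 3.0 * alpha * g * g
    return alpha * g**3 - (1.0 - (1.0 + nu) * math.exp(-nu))

# Bisect for the sign change on (0.85, 1.0), safely above the core threshold.
lo, hi = 0.85, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if core_row_excess(mid) > 0.0:
        hi = mid
    else:
        lo = mid

print(hi)  # about 0.91793
```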
The situation in Theorem 2.4(iii) is clarified by the following facts on h and ψ.
The function ψ has at least one zero in (0, 1), and h has at least one local minimum in (0, 1). Suppose that the following condition holds: (a) h has a single local minimum x_ρ in (0, 1), with h(x_ρ) = inf_{x∈(0,1)} h(x). Then x_ρ is the location of the unique local maximum of ψ in (0, 1), ψ(x_ρ) > 0, and the interval (0, 1) contains exactly one zero of ψ, denoted x*_ρ, which satisfies x_ρ < x*_ρ.
Finally, in the fixed row-weight case where W = r ≥ 3 a.s., condition (a) holds, and the unique positive zero of ψ is x*_r ∈ ((r − 2)/(r − 1), 1).
An important observation that helps to explain the close connection between the functions h and ψ (apparent in Figure 1, for example) and will also form an ingredient in the proof of Proposition 5.7 is the following result.