Strong Consistency Guarantees for Clustering High-Dimensional Bipartite Graphs with the Spectral Method

In this work, we focus on the Bipartite Stochastic Block Model (BiSBM), a popular model for bipartite graphs with a community structure. We consider the high-dimensional setting where the number $n_1$ of type I nodes is far smaller than the number $n_2$ of type II nodes. The recent work of Braun and Tyagi (2022) established a necessary and sufficient condition on the sparsity level $p_{\max}$ of the bipartite graph for recovering the latent partition of type I nodes. They proposed an iterative method extending the one of Ndaoud et al. (2022) to achieve this goal. Their method requires a good enough initialization, usually obtained by a spectral method, but empirical results showed that the refinement algorithm does not improve much upon the performance of the spectral method. This suggests that the spectral method achieves exact recovery in the same regime as the refinement method. We show that this is indeed the case by providing new entrywise bounds on the eigenvectors of the similarity matrix used by the spectral method. Our analysis extends the framework of Lei (2019), which only applies to symmetric matrices with limited dependencies. As an important technical step, we also derive an improved concentration inequality for similarity matrices.


Introduction
Bipartite graphs are a convenient way to represent the relationships between objects of two different types. One can find examples of applications in many fields such as e-commerce with customers and products Huang et al. (2007), finance with investors and assets Squartini et al. (2017), and biology with plant-pollinator networks Young et al. (2021). These networks are often large and sparse. Moreover, the numbers of type I and type II nodes can be quite different.
To extract relevant information from these networks, one often relies on clustering methods. Amongst them, spectral clustering (SC) is one of the most popular approaches due to its efficiency in terms of computational complexity and statistical accuracy. However, the existing consistency guarantees for SC are often weak or require a sub-optimal sparsity level, and do not fully explain the performance of SC, as observed experimentally in Braun and Tyagi (2022) and Ndaoud et al. (2022).
In this work, we fill this gap by showing that SC achieves exact recovery under the BiSBM, an asymmetric extension of the Stochastic Block Model (SBM) commonly used to evaluate the performance of clustering algorithms for bipartite graphs. Besides, we show that SC is optimal in the sense that it achieves exact recovery whenever $n_1 n_2 p_{\max}^2 \gtrsim \log n_1$, the optimal sparsity regime. We leave the characterization of the precise constant necessary for exact recovery as future work.

Main contributions
Our main contributions are summarized below.
• We show that the spectral method achieves exact recovery of the row partition whenever $n_1 n_2 p_{\max}^2 \gtrsim \log n_1$ and is hence optimal. To do so, we extend to similarity matrices the entrywise concentration bounds for eigenvectors obtained by Lei (2019) for matrices with independent entries, or limited dependencies.
• Our analysis applies to rank-deficient connectivity matrices. It allows us to partially remove the "spectral gap condition" (a common condition in the analysis of spectral methods requiring that the matrices of interest satisfy some rank condition ensuring a spectral gap), as in the recent work of Löffler et al. (2021); Zhang and Zhou (2022).
• Central to our proof is an improved concentration bound for similarity matrices. We derive this result by adapting the combinatorial argument of Feige and Ofek (2005) used to show the concentration of adjacency matrices sampled from the generalized Erdős-Rényi model.

Related work
Bipartite graphs and spectral clustering. The recent work of Braun and Tyagi (2022) confirmed the conjecture of Ndaoud et al. (2022) that $n_1 n_2 p_{\max}^2 \gtrsim \log n_1$ is a necessary and sufficient condition for exact recovery of the row partition under the high-dimensional BiSBM where $n_1 \ll n_2$. This threshold can be achieved by the generalized power methods proposed in the aforementioned articles. However, existing strong consistency guarantees for SC require stronger assumptions. For example, when specialized to the setting of Ndaoud et al. (2022) (a special case of our more general model), the result of Cai et al. (2021) holds only when the sparsity level satisfies $n_1 n_2 p_{\max}^2 \gtrsim \log^2 n_2$. When $n_1 n_2 p_{\max}^2 \gtrsim \log n_1$, SC is only guaranteed to achieve weak consistency Braun and Tyagi (2022). The work of Florescu and Perkins (2016) also showed that when $n_1 n_2 p_{\max}^2 \gtrsim 1$, one can recover a proportion of the type I node labels by an SBM reduction, but this is the weakest existing recovery guarantee and we focus on exact recovery. The recent work of Zhang and Zhou (2022) also proposed an improved analysis of the spectral method for asymmetric matrices with independent entries, but their bound becomes trivial in the high-dimensional regime $n_1 \ll n_2$ we are interested in.
Entrywise concentration bounds for eigenvectors. In recent years, spectral algorithms have been shown to successfully achieve exact recovery in various community detection tasks under various settings such as, e.g., the SBM Abbe et al. (2020b), the Contextual SBM Abbe et al. (2020a), the Censored Block Model Dhara et al. (2022a), the Hierarchical SBM Lei et al. (2020) and the uniform Hypergraph SBM Gaudio and Joshi (2022). Spectral methods have also been used in other estimation problems such as group synchronization d'Aspremont et al. (2021), ranking Chen et al. (2019), or planted subgraph detection Dhara et al. (2022b). To prove these results, one generally needs to obtain entrywise eigenvector concentration bounds. In this work, we will follow the framework developed by Lei (2019) that combines techniques used to obtain deterministic perturbation bounds Fan et al. (2016); Cape et al. (2019b); Damle and Sun (2020) with techniques that rely on some stochastic properties of the noise Abbe et al. (2020b); Cape et al. (2019a); Eldridge et al. (2018).

Notations
We use lowercase letters ($\epsilon, a, b, \ldots$) to denote scalars and vectors, except for universal constants, which will be denoted by $c_1, c_2, \ldots$ for lower bounds and $C_1, C_2, \ldots$ for upper bounds and some random variables. We will sometimes use the notation $a \lesssim b$ (resp. $a \gtrsim b$) to indicate that $a \le Cb$ (resp. $a \ge cb$) for a universal constant; if the inequalities only hold for $n$ large enough, we will use the notation $a \lesssim_n b$ (resp. $a \gtrsim_n b$). Matrices will be denoted by uppercase letters. The $i$-th row of a matrix $A$ will be denoted by $A_{i:}$, the $j$-th column by $A_{:j}$, and the $(i,j)$-th entry by $A_{ij}$. The transpose of $A$ is denoted by $A^\top$, and $A^\top_{:j}$ corresponds to the $j$-th row of $A^\top$ by convention. $I_k$ denotes the $k \times k$ identity matrix. We use $\|\cdot\|$ and $\|\cdot\|_F$ to denote respectively the spectral norm (or the Euclidean norm in the case of vectors) and the Frobenius norm.

Model and algorithm description

The Bipartite Stochastic Block Model (BiSBM)
The BiSBM is a direct adaptation of the SBM Holland et al. (1983) to bipartite graphs. The model depends on the following parameters.
• A set of nodes of type I, $N_1 = [n_1]$, and a set of nodes of type II, $N_2 = [n_2]$.
• Membership matrices $Z_1 \in \mathcal{M}_{n_1,K}$ and $Z_2 \in \mathcal{M}_{n_2,L}$, where $\mathcal{M}_{n,K}$ denotes the class of membership matrices with $n$ nodes and $K$ communities. Each membership matrix $Z_1 \in \mathcal{M}_{n_1,K}$ (resp. $Z_2 \in \mathcal{M}_{n_2,L}$) can be associated bijectively with a partition function $z_1 : [n_1] \to [K]$ (resp. $z_2 : [n_2] \to [L]$) defined by $z_1(i) = k$, where $k$ is the unique column index satisfying $(Z_1)_{ik} = 1$ (resp. $(Z_2)_{ik} = 1$).
• A connectivity matrix $\Pi \in [0,1]^{K \times L}$ of probabilities between communities. Let us write $P = Z_1 \Pi Z_2^\top$. A graph $G$ is distributed according to BiSBM$(Z_1, Z_2, \Pi)$ if the entries of the corresponding bipartite adjacency matrix $A$ are generated independently by $A_{ij} \sim \mathcal{B}(P_{ij})$, where $\mathcal{B}(p)$ denotes a Bernoulli distribution with parameter $p$. Hence the probability that two nodes are connected depends only on their community memberships. The sparsity level of the graph is denoted by $p_{\max} = \max_{i,j} P_{ij}$. We make the following assumptions on the model.
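To make the generative process concrete, here is a minimal sampling sketch. It assumes community labels are encoded as integers starting at 0; the function name and encoding are ours, not from the paper.

```python
import numpy as np

def sample_bisbm(z1, z2, Pi, rng=None):
    """Sample a bipartite adjacency matrix A from BiSBM(Z1, Z2, Pi).

    z1 : (n1,) integer labels in {0, ..., K-1} for type I nodes
    z2 : (n2,) integer labels in {0, ..., L-1} for type II nodes
    Pi : (K, L) matrix of inter-community connection probabilities
    """
    rng = np.random.default_rng() if rng is None else rng
    P = Pi[np.ix_(z1, z2)]                 # P_ij = Pi[z1(i), z2(j)] = (Z1 Pi Z2^T)_ij
    return (rng.random(P.shape) < P).astype(np.uint8)
```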
Assumption A1 (Approximately balanced communities). The communities $C_1, \ldots, C_K$ (resp. $C'_1, \ldots, C'_L$) are approximately balanced, i.e., there exists a constant $\alpha \ge 1$ such that for all $k \in [K]$ and $l \in [L]$ we have $\frac{n_1}{\alpha K} \le |C_k| \le \frac{\alpha n_1}{K}$ and $\frac{n_2}{\alpha L} \le |C'_l| \le \frac{\alpha n_2}{L}$. We will consider throughout this work the parameters $\alpha$, $K$ and $L$ as constants, and we will not keep track of the dependencies on these parameters in the stated bounds.
We will rely on the following assumption to ensure that the communities are well separated.
Assumption A2 (Communities are well separated). Let $U\Lambda U^\top$ be the spectral decomposition of $PP^\top$. All the communities are well separated if the following assumptions are satisfied.
1. The smallest non-zero eigenvalue of $\Pi\Pi^\top$, denoted by $\lambda_{\min}(\Pi\Pi^\top)$, satisfies $\lambda_{\min}(\Pi\Pi^\top) \gtrsim p_{\max}^2$.

Remark 1. This assumption does not require that $\Pi\Pi^\top$ be full rank, contrary to classical assumptions used for analyzing spectral clustering. For example, consider the setting where $K = 2 = L$, the communities are exactly balanced, and $\Pi$ is rank deficient, where $p$ is the sparsity parameter and $c > 0$ is a constant. The matrix $W = \sqrt{2/n_1}\, Z_1$ has orthonormal columns, and the SVD of $\Pi\Pi^\top$ is given by $cp\, VV^\top$ with $V = \big(\tfrac{c}{\sqrt{1+c^2}}, \tfrac{1}{\sqrt{1+c^2}}\big)^\top$. Hence $U = WV$, and the rows $U_{i:}$ and $U_{j:}$ remain separated for $i \in C_1$ and $j \in C_2$.

The quality of the clustering is evaluated through the misclustering rate $r$ defined by
$$r(\hat{z}, z) = \min_{\pi \in \mathcal{S}} \frac{1}{n_1} \sum_{i \in [n_1]} 1\{\pi(\hat{z}(i)) \ne z(i)\},$$
where $\mathcal{S}$ denotes the set of permutations on $[K]$. We say that an estimator $\hat{z}$ achieves exact recovery if $r(\hat{z}, z) = 0$ with probability $1 - o(1)$ as $n$ tends to infinity. It achieves weak consistency (or almost full recovery) if $P(r(\hat{z}, z) = o(1)) = 1 - o(1)$ as $n$ tends to infinity. A more complete overview of the different types of consistency and the sparsity regimes where they occur can be found in Abbe (2018). A brute-force computation of $r$ for small $K$ is sketched below.
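For small $K$, the misclustering rate can be computed exactly by minimizing over the $K!$ label permutations; the following helper (our own, mirroring the definition above) does exactly that.

```python
import numpy as np
from itertools import permutations

def misclustering_rate(z_hat, z, K):
    """r(z_hat, z): minimum over label permutations pi of the
    fraction of nodes i with pi(z_hat(i)) != z(i)."""
    z_hat, z = np.asarray(z_hat), np.asarray(z)
    best = 1.0
    for pi in permutations(range(K)):      # K! permutations; fine since K is a constant
        relabeled = np.array(pi)[z_hat]    # apply pi to the estimated labels
        best = min(best, float(np.mean(relabeled != z)))
    return best
```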

Algorithm description
In the high-dimensional and sparse setting where $n_1 \ll n_2$ and $n_1 n_2 p_{\max}^2$ is of order $\log n_1$, there is no hope of recovering the column partition $Z_2$. So, it is natural to form the similarity matrix $AA^\top$ and compute the top-$K$ eigenspace of this similarity matrix. Unfortunately, the diagonal elements of $AA^\top$ create an important bias: $(AA^\top)_{ii}$ is typically of order $n_2 p_{\max}$, while the diagonal entries of the corresponding population similarity matrix are of order $n_2 p_{\max}^2$. To avoid this issue, one can remove the diagonal of $AA^\top$ and obtain a matrix $B$. In this work, we consider a slightly different variant of the spectral methods proposed by Braun and Tyagi (2022); Ndaoud et al. (2022); Florescu and Perkins (2016). See Algorithm 1 for a complete description of the method.
Algorithm 1 Spectral method on $H(AA^\top)$ (Spec)
Input: The number of communities $K$, the rank $r$ of $\Pi\Pi^\top$, and the adjacency matrix $A$.
1: Form the diagonal-hollowed Gram matrix $B := H(AA^\top)$ where $H(X) = X - \mathrm{diag}(X)$.
2: Compute the matrix $U \in \mathbb{R}^{n_1 \times r}$ whose columns correspond to the top $r$ eigenvectors of $B$.
3: Apply a $(1 + 2/e + \epsilon)$-approximate k-medians algorithm to the rows of $U$ and obtain a partition $z^{(0)}$ of $[n_1]$ into $K$ communities.
Output: A partition of the nodes $z^{(0)}$.
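A compact sketch of Algorithm 1 follows. The paper's rounding step is $(1 + 2/e + \epsilon)$-approximate k-medians; for simplicity, the sketch substitutes Lloyd's k-means (an alternative rounding the text explicitly allows, via Su et al. (2020)), so it should be read as an illustration rather than the exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans  # stands in for the approximate k-medians step

def spec(A, K, r, seed=0):
    """Sketch of Algorithm 1 (Spec) on the hollowed Gram matrix H(AA^T)."""
    A = np.asarray(A, dtype=float)
    B = A @ A.T
    np.fill_diagonal(B, 0.0)        # B = H(AA^T): drop the biased diagonal
    _, vecs = np.linalg.eigh(B)     # eigenvalues returned in ascending order
    U = vecs[:, -r:]                # top-r eigenvectors of B
    return KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(U)
```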
When the rank of ΠΠ ⊤ is not known, we propose AdaSpec (see Algorithm 2), an adaptive version of Algorithm 1.
Algorithm 2 Adaptive spectral method on $H(AA^\top)$ (AdaSpec)
Input: The number of communities $K$, a threshold $T > 0$, and the adjacency matrix $A$.
1: Form the diagonal-hollowed Gram matrix $B := H(AA^\top)$ where $H(X) = X - \mathrm{diag}(X)$.
2: Let $\hat{r} \in [K]$ be the largest index such that the difference between two consecutive eigenvalues is larger than the threshold $T$: $\hat{r} := \max\{r \in [K] : \lambda_r(B) - \lambda_{r+1}(B) \ge T\}$.
3: Compute the matrix $U \in \mathbb{R}^{n_1 \times \hat{r}}$ whose columns correspond to the top $\hat{r}$ eigenvectors of $B$.
4: Apply a $(1 + 2/e + \epsilon)$-approximate k-medians algorithm to the rows of $U$ and obtain a partition $z^{(0)}$ of $[n_1]$ into $K$ communities.
Output: A partition of the nodes $z^{(0)}$.
The $(1 + 2/e + \epsilon)$-approximate k-medians step can be implemented with the algorithm of Cohen-Addad et al. (2019). Here we used (approximate) k-medians because it can be linked easily with $\ell_{2\to\infty}$ perturbation bounds (see Lemma 5.1 in Lei (2019)). But we could also apply (approximate) k-means as a rounding step and use the results from Su et al. (2020), Section 2.4, for the analysis. Depending on the rounding step used, the dependencies on some model parameters, such as the number of communities $K$, can change. A sketch of the adaptive rank-selection step is given below.
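Since the selection rule in step 2 was partly garbled in our source, the following sketch implements the reconstruction $\hat{r} = \max\{r \in [K] : \lambda_r(B) - \lambda_{r+1}(B) \ge T\}$ stated above; the fallback to $\hat{r} = 1$ when no gap exceeds $T$ is our own choice.

```python
import numpy as np

def select_rank(B, K, T):
    """Largest r in [K] whose eigengap lambda_r(B) - lambda_{r+1}(B) exceeds T."""
    vals = np.sort(np.linalg.eigvalsh(B))[::-1]    # eigenvalues of B, descending
    gaps = vals[:K] - vals[1:K + 1]                # gap after each of the first K eigenvalues
    candidates = [r + 1 for r in range(K) if gaps[r] >= T]
    return max(candidates) if candidates else 1    # fallback choice is ours
```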

Main results
First, we derive a new concentration bound for the similarity matrix $B$. It improves the upper bound $\sqrt{n_1 n_2 p_{\max}^2} \vee \log n_1$ used in Braun and Tyagi (2022) to $\sqrt{n_1 n_2 p_{\max}^2}$ when $n_1 n_2 p_{\max}^2 \gtrsim \log n_1$. This improvement by a $\sqrt{\log n_1}$ factor is essential to show that Spec achieves exact recovery in the challenging parameter regime where $n_1 n_2 p_{\max}^2 \asymp \log n_1$.

Theorem 1. Assume that A1 holds and that $n_1 n_2 p_{\max}^2 \ge C \log n_1$ for a large enough constant $C > 0$. Then, with probability at least $1 - O(n_1^{-3})$, $\|B - B^*\| \lesssim \sqrt{n_1 n_2 p_{\max}^2}$.

Remark 2. By using this concentration inequality, one could improve the conditions of applicability of Proposition 1 and Theorem 2. For example, Proposition 1 requires that $n_1 n_2 p_{\max}^2 \ge C \log n_1$ for a constant $C > 0$ large enough. But by using the concentration inequality of Theorem 1, we would only require $n_1 n_2 p_{\max}^2 \ge c \log n_1$ for an arbitrary constant $c > 0$. See also Remark 8 in Braun and Tyagi (2022).

Finally, we show that Spec achieves exact recovery by proving the following $\ell_{2\to\infty}$ concentration bound for the top-$r$ eigenspace $U$ of $B$. Let us denote the $\ell_{2\to\infty}$ distance between two matrices of eigenvectors $U$ and $U^*$ by $d_{2\to\infty}(U, U^*) = \min_{Q \in \mathcal{O}_r} \|UQ - U^*\|_{2\to\infty}$, where $\mathcal{O}_r$ is the set of $r \times r$ orthogonal matrices.

Theorem 2. Assume that A1 and A2 hold, that $n_1 n_2 p_{\max}^2 \ge C \log n_1$ for a large enough constant $C > 0$, and that $n_2 p_{\max}^2 = o(1)$. Let $U\Lambda U^\top$ (resp. $U^*\Lambda^* U^{*\top}$) be the spectral decomposition of $B = H(AA^\top)$ (resp. $B^* = PP^\top$). Then there exists a constant $c > 0$ (that can be made arbitrarily small if $C$ is chosen large enough) such that with probability at least $1 - n_1^{-\Theta(1)}$, $d_{2\to\infty}(U, U^*) \le c/\sqrt{n_1}$.

Corollary 1. Under the same assumptions as in Theorem 2, Spec achieves exact recovery with probability at least $1 - n_1^{-\Theta(1)}$.
Corollary 2. Under the same assumptions as in Theorem 2, with the choice $T = n_1 n_2 p_{\max}^2 / \log\log n_1$, AdaSpec achieves exact recovery with probability at least $1 - n_1^{-\Theta(1)}$. A small end-to-end simulation in the spirit of these results is sketched below.
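As a quick numerical illustration of the regime $n_1 n_2 p_{\max}^2 \asymp \log n_1$, one can combine the sketches above (sample_bisbm, spec, misclustering_rate: all hypothetical helpers introduced in this rewrite, not from the paper); the parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, K = 200, 40_000, 2
p = np.sqrt(10 * np.log(n1) / (n1 * n2))      # so that n1 * n2 * p^2 = 10 * log(n1)
Pi = p * np.array([[1.0, 0.2],
                   [0.2, 1.0]])               # stronger within-community connectivity
z1 = rng.integers(K, size=n1)                 # approximately balanced labels
z2 = rng.integers(K, size=n2)
A = sample_bisbm(z1, z2, Pi, rng)
z_hat = spec(A, K, r=2)
print("misclustering rate:", misclustering_rate(z_hat, z1, K))
```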

Proof of Theorem 1
The proof strategy is based on the combinatorial argument developed by Feige and Ofek (2005).
Let us denote by $E$ the event on which the column sums of $A$ satisfy $\max_{l \in [n_2]} \sum_{i \in [n_1]} A_{il} \le C\sqrt{\log n_1}$. This event can be controlled by a Chernoff bound and a union bound over the $n_2$ columns.
By choosing $C$ large enough, we can ensure that $E$ occurs with probability at least $1 - n_1^{-3}$. From now on, we will condition on this event.
Step 1. A standard $\epsilon$-net argument with the Euclidean norm (see, e.g., Lemmas B.1 and B.2 in Lee et al. (2020)) shows that for all $0 < \epsilon < 1/2$ there exists an $\epsilon$-net $\mathcal{N}_\epsilon$ of the unit sphere $S^{n_1-1}$ with $|\mathcal{N}_\epsilon| \le (1 + 2/\epsilon)^{n_1}$ such that $\|B - B^*\| \le (1 - 2\epsilon)^{-1} \max_{x \in \mathcal{N}_\epsilon} |x^\top (B - B^*) x|$. In the following, we will fix $\epsilon = 1/4$.
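For completeness, the standard reduction behind Step 1 reads as follows (a routine computation for a symmetric matrix $M = B - B^*$, not specific to this paper):

```latex
% eps-net reduction: for x in S^{n_1 - 1} pick y in N_eps with ||x - y|| <= eps.
\[
|x^\top M x - y^\top M y|
  \le |(x - y)^\top M x| + |y^\top M (x - y)|
  \le 2\epsilon \|M\|,
\]
\[
\text{so}\quad
\|M\| = \sup_{x \in S^{n_1 - 1}} |x^\top M x|
  \le \max_{y \in \mathcal{N}_\epsilon} |y^\top M y| + 2\epsilon \|M\|
\quad\Longrightarrow\quad
\|M\| \le (1 - 2\epsilon)^{-1} \max_{y \in \mathcal{N}_\epsilon} |y^\top M y|.
\]
```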
Step 2. In order to bound the previous quantity, let us introduce for all $x \in S^{n_1-1}$ the set of "light pairs" $\mathcal{L}(x) = \{(i,j) : |x_i x_j| \le \sqrt{n_2 p_{\max}^2/n_1}\}$ and the set of "heavy pairs" $\overline{\mathcal{L}}(x) = \{(i,j) : |x_i x_j| > \sqrt{n_2 p_{\max}^2/n_1}\}$. When clear from the context, we will omit the dependency on $x$ in the notation of these sets. We have
$$x^\top (B - B^*) x = \underbrace{\sum_{(i,j) \in \mathcal{L}(x)} x_i x_j (B - B^*)_{ij}}_{(T1)} + \underbrace{\sum_{(i,j) \in \overline{\mathcal{L}}(x)} x_i x_j (B - B^*)_{ij}}_{(T2)}.$$
Step 3. We are going to bound (T1) w.h.p. Observe that (T1) can be split into a centered stochastic term (E1) and a deterministic term (E2).
It is easy to bound the deterministic quantity (E2). The upper bound on (E1) conditioned on $E$ follows from Lemma 6, which shows that it is $O(\sqrt{n_1 n_2}\, p_{\max})$. Combining this with Step 1, we obtain by a union bound argument that (T1) is $O(\sqrt{n_1 n_2}\, p_{\max})$ uniformly over the net. Step 4. We are now going to bound the term involving the heavy pairs (T2). First, one needs to control the sum of the entries of each row and column of $B$.
Lemma 1. There exists a constant $C_2 > 0$ such that with probability at least $1 - e^{-\Theta(n_1 n_2 p_{\max}^2)}$, $\max_{i \in [n_1]} \sum_{j \ne i} B_{ij} \le C_2\, n_1 n_2 p_{\max}^2$.
Proof. Fix $i \in [n_1]$. We have $S = \sum_j B_{ij} = \langle A_{i:}, \sum_{j \ne i} A_{j:} \rangle$. One can apply Lemma 7 (see the appendix) with the sets $I = \{i\}$ and $J = [n_1] \setminus I$. We conclude by using a union bound.
Then, we need to show that the matrix $B$ satisfies w.h.p. the discrepancy property defined below, with appropriate parameters (see Definition 1 in the appendix for the definition of $e_M(S,T)$). We say that $M$ obeys the discrepancy property $DP(\delta, \kappa_1, \kappa_2)$ with parameters $\delta > 0$, $\kappa_1 > 0$ and $\kappa_2 \ge 0$ if for all non-empty $S, T \subset [n]$, at least one of the following properties holds:
(i) $\frac{e_M(S,T)}{|S||T|} \le \kappa_1 \delta$;
(ii) $e_M(S,T) \log \frac{e_M(S,T)}{|S||T|\delta} \le \kappa_2 (|S| \vee |T|) \log \frac{n}{|S| \vee |T|}$.
(A brute-force check of this property on toy matrices is sketched after this proof.) If one can show that $B$ satisfies $DP(\delta, \kappa_1, \kappa_2)$ where $\kappa_1, \kappa_2 > 0$ are absolute constants and $\delta = n_2 p_{\max}^2$, then Lemma B.4 in Lee et al. (2020) implies that (T2) is $O(\sqrt{n_1 n_2}\, p_{\max})$. Otherwise, we can write $e_B(S,T) = \sum_{i,j} w_{ij} \langle A_{i:}, A_{j:} \rangle$ where $w_{ii} = 0$ and $w_{ij} = 1_{i \in S} 1_{j \in T}$. By Lemma 7 we have, for all $C > C^*$ where $C^* > 0$ is a large enough constant, $P(E \cap \{e_B(S,T) \ge C|S||T| n_2 p_{\max}^2\}) \le e^{-\Theta(C|S||T| n_2 p_{\max}^2)}$. We can now continue as in the proof of Theorem 5.2 in Lei and Rinaldo (2015). Step 5. We can conclude by summing all the terms that have been shown to be $O(\sqrt{n_1 n_2}\, p_{\max})$ w.h.p.
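The discrepancy property lends itself to a direct, if exponential, verification on toy matrices. The following brute-force checker (our own, based on the reconstruction of $DP(\delta, \kappa_1, \kappa_2)$ given above) is useful only for building intuition on small examples.

```python
import numpy as np
from itertools import chain, combinations

def satisfies_dp(M, delta, k1, k2):
    """Brute-force check of DP(delta, k1, k2) for a small non-negative matrix M.
    Exponential in n: illustration only."""
    n = M.shape[0]
    subsets = list(chain.from_iterable(
        combinations(range(n), s) for s in range(1, n + 1)))
    for S in subsets:
        for T in subsets:
            e = M[np.ix_(S, T)].sum()                       # e_M(S, T)
            if e / (len(S) * len(T)) <= k1 * delta:         # property (i)
                continue
            m = max(len(S), len(T))
            if e * np.log(e / (len(S) * len(T) * delta)) <= k2 * m * np.log(n / m):
                continue                                    # property (ii)
            return False
    return True
```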

Entrywise analysis of the spectral method
To show that the spectral method achieves exact recovery, we need to derive an $\ell_{2\to\infty}$ eigenspace perturbation bound. Unfortunately, existing results only apply to symmetric matrices with independent entries or weak dependencies (see Section 7 in Lei (2019)) and cannot be directly applied to our setting. We propose an extension of the main result of Lei (2019) to the hollowed Gram matrix $B$ considered in this work. We believe that our result can be extended to more general Gram matrices or kernel matrices.
The spectral decomposition of the matrices $B$ and $B^*$ is given by $B = \bar{U}\bar{\Lambda}\bar{U}^\top$ and $B^* = U^*\Lambda^* U^{*\top}$, where $\bar{U}$ is the full eigenspace matrix of $B$ and $\bar{\Lambda}$ is the diagonal matrix of non-zero eigenvalues of $B$ (resp. $\Lambda^* = \mathrm{diag}(\lambda^*_1, \ldots, \lambda^*_r)$). Writing $R = A - P$, the noise $E = B - B^*$ can be further decomposed as $E = \tilde{E} + E' + E''$ with $\tilde{E} = H(RR^\top)$, $E' = H(RP^\top + PR^\top)$ and $E'' = -\mathrm{diag}(PP^\top)$.
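The algebra behind this decomposition is elementary; under our reading of the (partly garbled) display, with $R = A - P$ and $H(X) = X - \mathrm{diag}(X)$:

```latex
\[
E = H(AA^\top) - PP^\top
  = H\big((P + R)(P + R)^\top\big) - PP^\top
  = H(RR^\top) + H(RP^\top + PR^\top) + H(PP^\top) - PP^\top,
\]
\[
\text{hence}\quad
E = \underbrace{H(RR^\top)}_{\tilde{E}}
  + \underbrace{H(RP^\top + PR^\top)}_{E'}
  \;\underbrace{-\operatorname{diag}(PP^\top)}_{E''}.
\]
```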
First, let us establish analogous results to the Conditions (A2) and (A3) in Lei (2019).
Lemma 2. Under the assumptions of Theorem 2, there is an absolute constant $C_1 > 0$ such that for any $W \in \mathbb{R}^{n \times K}$, the following inequalities hold with probability at least $1 - n_1^{-\Theta(1)}$, with $R(\delta) = \log(n_1/\delta) + K$ and $\delta = n_1^{-c}$ for some constant $c > 0$.
Proof. Recall that Theorem 1 implies that with probability at least $1 - n_1^{-\Theta(1)}$, $\|\tilde{E}\| \lesssim \sqrt{n_1 n_2 p_{\max}^2}$. Also, by definition, $\max_i \|P_{i:}\| \le \sqrt{n_2}\, p_{\max}$. By Lemma 5 in Braun and Tyagi (2022) we have $\|(A - P)Z_2\| \lesssim \sqrt{n_1 n_2 p_{\max}}$, hence by submultiplicativity of the norm $\|E'\| \lesssim \sqrt{n_1 n_2 p_{\max}} \cdot p_{\max} \cdot \sqrt{n_1} \lesssim \sqrt{n_1 n_2 p_{\max}^2}$, because $n_1 p_{\max} = O(1)$ by assumption. Consequently, the dominant error term is $\tilde{E}$ and we have shown that
$$\|E\| \lesssim \sqrt{n_1 n_2 p_{\max}^2}. \quad (5.1)$$
Proof of 1. This is a direct consequence of Weyl's inequality and (5.1).
Proof of 3. It is a direct consequence of the sub-multiplicativity of the norm and the fact that ∥U * ∥ ≤ 1.
Proof of 4. By Proposition 2.2 in Lei (2019), it is enough to show that for any $\delta \in (0,1)$ and vector $w \in \mathbb{R}^{n_1}$ there exist $a_\infty(\delta), a_2(\delta) > 0$ such that for each $i \in [n_1]$, $|E_{i:}w| \le a_\infty(\delta)\|w\|_\infty + a_2(\delta)\|w\|_2$ with probability at least $1 - \delta$. Let us denote $R = A - P$ and consider $S = \tilde{E}_{i:}w = \sum_{j \in [n_1] \setminus \{i\}} \langle R_i, R_j \rangle w_j$. Conditionally on $R_i$, this is a sum of independent and centered random variables. By using Lemma F.3 in Lei (2019) with weights $\bar{w}_{jl} = R_{il} w_j$, we obtain that conditionally on $R_i$ the desired bound holds with probability at least $1 - \delta$. Besides, with probability at least $1 - e^{-\Theta(n_2 p_{\max})}$, $\|R_{i:}\|^2 \le C n_2 p_{\max}$ by Hoeffding's inequality. Therefore the bound on $S$ holds with probability at least $1 - \delta - e^{-\Theta(n_2 p_{\max})}$. It remains to bound $S' = E'_{i:}w = \sum_{j \ne i} (\langle R_i, P_j \rangle + \langle P_i, R_j \rangle) w_j$ and $S'' = E''_{i:}w$. Observe that $\sum_{j \ne i} \langle P_i, R_j \rangle w_j = \sum_{j \ne i, l} R_{jl}\, w_j P_{il}$, so we can apply Lemma F.3 in Lei (2019) with weights $(w_j P_{il})_{j \ne i, l}$ and obtain the corresponding bound with probability at least $1 - \delta$. A similar result holds for $\sum_{j \ne i} \langle R_i, P_j \rangle w_j$: we can apply again Lemma F.3 in Lei (2019) with weights $(P_{jl} w_j)_{j \ne i, l}$. Also note that $n_2 p_{\max} \gtrsim \log n_1$ by assumption, so if we choose $\delta = n_1^{-c}$ for an appropriate constant $c > 0$, the term $e^{-\Theta(n_2 p_{\max})}$ will be negligible compared to $\delta$.

A new decoupling argument
The main difficulty in adapting Theorems 2.3 and 2.5 of Lei (2019) comes from the decoupling assumption (A1), which requires the existence of a matrix $B^{(i)}$ (typically obtained by replacing the $i$-th row and column of $B$ by zeros or by the expectation of the entries) such that a total variation proximity condition of the form (5.2) holds for any $\delta \in (0,1)$. If the matrix $B$ had independent entries, it would be straightforward to satisfy this condition, but in our setting it is not clear how to obtain such a general result. Consequently, we adopt a different approach that avoids bounding the total variation distance between two probability distributions.
Let us denote by $B^{(i)}$ the matrix obtained by removing the $i$-th row and column of $B$. We have $\|E^{(i)}\| \le \|E\|$ and $\|E_{i:}\| \le \|E\| \lesssim \sqrt{n_1 n_2 p_{\max}^2}$. Hence, because of Assumption A2, we obtain that the required inequalities hold w.h.p.
These inequalities correspond to the Condition (C0) used in the proof of Theorem 2.3 in Lei (2019).They are summarized in the following lemma.
Steps one and two of the proof of Theorem 2.3 in Lei (2019) are deterministic and still hold in our setting (see the discussion in Section 5.3). The only step that uses the decoupling argument is the third step, where one needs to bound $\|E_{i:}(U^{(i)} H^{(i)} - U^*)\|$, where $H^{(i)} \in \mathbb{R}^{r \times r}$ is the orthogonal matrix that best aligns $U^{(i)}$ and $U^*$.
Lemma 4. Let $W^{(i)} \in \mathbb{R}^{n_1 \times K}$ be a matrix that only depends on $B^{(i)}$. Under the assumptions of Theorem 2, the corresponding row-concentration bound on $E_{i:} W^{(i)}$ holds with probability at least $1 - O(n_1^{-c})$.
Proof.
Recall that $E = \tilde{E} + E' + E''$. By the triangular inequality, it suffices to bound the contribution of each term separately. We first handle the first term: let us denote $R = A - P$ and consider $S = \tilde{E}_{i:} w^{(i)}$, where $w^{(i)} \in \mathbb{R}^{n_1}$ depends only on $A_{-i}$.
Conditionally on $A_{-i}$, $S$ is a weighted sum of independent and centered Bernoulli random variables. Hence, by Lemma F.3 in Lei (2019) with $\delta = n_1^{-c}$ and weights $\bar{w}_{jl} = R_{jl} w^{(i)}_j$, we obtain the desired bound.
Fact. With probability at least $1 - e^{-\Theta(n_2 p_{\max})}$, $\max_j \sum_l R_{jl}^2 \le C n_2 p_{\max}$.
Proof of the Fact. We have $R_{jl}^2 \le 1$ and $\mathrm{Var}(\sum_l R_{jl}^2) \le 2 n_2 p_{\max}$. Hence, by Bernstein's inequality, $P(\sum_l R_{jl}^2 \ge C n_2 p_{\max}) \le e^{-\Theta(n_2 p_{\max})}$.
We can conclude by a union bound and the fact that $n_2 p_{\max} \gtrsim \log n_1$, which follows from the assumptions on the sparsity level $p_{\max}$ and $n_2 \gtrsim n_1 \log n_1$.
Let us denote by $\Omega_1$ the event under which the inequality of the previous fact holds. Note that this event only depends on $A_{-i}$, and $P(\Omega_1^c) \le e^{-\Theta(n_2 p_{\max})}$. Consequently, replacing the conditional law by its restriction to $\Omega_1$ only costs an additive $e^{-\Theta(n_2 p_{\max})}$ in probability, where $E_{\Omega_1}$ denotes the expectation over $A_{-i}$ conditioned on $\Omega_1$. The other terms $E'_{i:} w^{(i)}$ and $E''_{i:} w^{(i)}$ can be handled in a similar way. They are actually easier to treat because one does not need a conditioning argument, since $E'_{i:}$ and $E''_{i:}$ are independent of $A_{-i}$. We can decompose $E'_{i:} w^{(i)} = S_1 + S_2$ accordingly.
$S_1$ is a sum of $n_2$ weighted independent Bernoulli random variables with weights given by $w_l = \sum_{j \ne i} P_{jl} w^{(i)}_j$. Lemma F.3 in Lei (2019) gives the corresponding bound with probability at least $1 - n_1^{-c}$. By a similar argument, we can bound $S_2$ with probability at least $1 - n_1^{-c}$. Consequently, the desired bound holds with probability at least $1 - O(n_1^{-c})$. Then, by using Proposition 2.2 ($\epsilon$-net argument) in Lei (2019), we obtain that with probability at least $1 - O(n_1^{-c})$, the bound (5.3) on $\|E_{i:}(U^{(i)} H^{(i)} - U^*)\|$ holds. Once we have obtained this inequality, the proof of Step III is the same as in Lei (2019).

Proof of Theorem 2
First, we will extend Theorem 2.3 in Lei (2019). In order to make the adaptation easier, we will use the same notations as in Lei (2019). Let $\Delta^* = \lambda^*_{\min}$ be the effective eigengap (it corresponds to the definition in Lei (2019) with $s = 0$). In our setting, the condition number $\kappa$ only depends on $K$ and $L$ and is hence considered a constant. Also, observe that $U^*$ is the full eigenspace of $B^*$. We have shown in Sections 5.1 and 5.2 that the following conditions (partially matching the assumptions (A1)-(A4) in Lei (2019)) hold with $\delta = n_1^{-q}$ for some constant $q > 0$.
Condition C1. There exists a constant $C_1 > 0$ such that with probability at least $1 - O(n_1^{-q})$ the following conditions hold.
Condition C2. There exists a constant $C_2 > 0$ such that with probability at least $1 - O(n_1^{-q})$ the following inequalities hold.
Condition C3. For any $i \in [n_1]$ and fixed matrix $W \in \mathbb{R}^{n_1 \times r}$, the row-concentration bound of Lemma 2 holds with $b_\infty(\delta) \lesssim \frac{\log n_1}{\log\log n_1}$ and $b_2(\delta)$ as given there.
Theorem 3. Let $\delta = n_1^{-q}$ for some constant $q > 0$. Then under Conditions C1-C4 and the assumptions of Theorem 2, there exists a constant $C_3 > 0$ such that with probability at least $1 - O(n_1^{-q})$ the $\ell_{2\to\infty}$ bound stated in Theorem 2 holds.
Proof. We cannot directly apply Theorem 2.3 in Lei (2019) because Condition C1 does not include the condition stated in (A.1). But this condition is only used in Step III of the proof of Theorem 2.3, where one needs to control $\|E_{i:}(U^{(i)} H^{(i)} - U^*)\|$. We used a different argument to control this quantity in Section 5.2 and obtained the required bound in equation (5.3).

This concludes Step III of the proof of Theorem 2.3 in Lei (2019).
Corollary 3. Under the same assumptions as in Theorem 3, there is a constant $c > 0$ (possibly depending on $q$) such that with probability at least $1 - O(n_1^{-q})$, $d_{2\to\infty}(U, U^*) \le c/\sqrt{n_1}$.
Proof. By the triangular inequality, it suffices to control the term bounded in Theorem 3 together with $\|EU^*\|_{2\to\infty}$. We can bound $\|EU^*\|_{2\to\infty}$ by using the same proof technique as in Lemma 2, bullet 4, similarly to Lemma 3.3 in Lei (2019); the resulting bound holds with probability at least $1 - O(n_1^{-q})$. It is easy to check that the remaining terms are of lower order. Consequently, the triangular inequality and Theorem 3 imply that w.h.p. $d_{2\to\infty}(U, U^*) \le c/\sqrt{n_1}$, for a constant $c > 0$ that can be made small enough if the constant $C$ such that $n_1 n_2 p_{\max}^2 \ge C \log n_1$ is chosen large enough.

Proof of Corollary 1
The proof is standard, but we outline it for completeness. First, we need to relate the k-medians algorithm to the $\ell_{2\to\infty}$ perturbation bounds. This can be done via Lemma 5.1 in Lei (2019).

Proof of Corollary 2
It is sufficient to show that w.h.p. $\hat{r} = r$. But this is a straightforward consequence of Weyl's inequality and the fact that $\|B - B^*\| \lesssim \sqrt{n_1 n_2 p_{\max}^2}$.
This is not always the case, and one needs to carefully control the number of indices $l$ such that $\sum_{i \in [n_1]} A_{il}$ scales as $\sqrt{\log n_1}$. To this end, let us introduce the sets of columns $L_1$ and $L_2$, which partition the columns according to the magnitude of their sums, where $M > 0$ is a constant that will be defined later. Note that conditionally on $E$, $[n_2] \subset L_1 \cup L_2$. Using the fact that $w_{ij} = x_i x_j$, let $|L_1^c|$ denote the random variable corresponding to the size of $L_1^c$. By a Chernoff bound and $n_2 \gtrsim n_1 \log^2 n_1$, this size concentrates. Hence, by using Bernstein's inequality, we obtain that the corresponding bad event occurs with probability at most $e^{-\Omega(n_2)}$.

Control of the term $\sum_{l \in L_1} \big(\sum_{i \in \Lambda_\delta} A_{il} x_i\big)^2$. Let us define the corresponding truncation event; Bernstein's inequality then implies the desired bound.
Control of the term $\sum_{l \in L_2} \big(\sum_{i \in \Lambda_\delta} A_{il} x_i\big)^2$. Let us define the analogous event. It is easy to check that, under the lemma assumptions, $n_1 \ll \sqrt{n_1 n_2}\, e^{-2C\sqrt{\log n_1}}$.
Control of the term $E_{\Lambda^c} S_\delta - E_A S_\delta$. Let us define the corresponding event. We have, for all $t > 0$, $\log E\big(e^{E_{\Lambda^c} S_\delta - E_A S_\delta}\big) \lesssim e^{-c n_1}$.
By a union bound, all the previous events hold simultaneously with probability at least $1 - e^{-c' n_1}$ for a constant $c' > 0$. It follows that, conditioned on $E$, with probability at least $1 - e^{-c' n_1}$, the claimed upper bound on $S$ holds. The stated result of the lemma follows by symmetry of $S$ (the weights $w_{ij}$ can be negative). Note that the value of $c'$ depends only on the constants in the events we conditioned on. So, by choosing these constants large enough, we obtain $c' > 11$.
Lemma 7. Assume that the assumptions of Theorem 1 are satisfied. Let $I, J \subset [n_1]$ with $J \subset I^c$, and let $S = \sum_{i,j \in [n_1]} w_{ij} \langle A_{i:}, A_{j:} \rangle$, where $w_{ii} = 0$ for all $i$ and $w_{ij} = 1_{i \in I} 1_{j \in J}$. Then for $C > 0$ large enough we have $P\big(E \cap \{S \ge C |I||J| n_2 p_{\max}^2\}\big) \le e^{-\Theta(C |I||J| n_2 p_{\max}^2)}$.
We only highlight the main modifications, since the proof is similar to that of Lemma 6.
Control of the term $\sum_{l \in L_1} \big(\sum_{i \in \Lambda_\delta \cap I} A_{il}\big)^2$. Let us define the corresponding event as in the proof of Lemma 6.

Definition 1. Let $M$ be an $n \times n$ matrix with non-negative entries. For every $S, T \subset [n]$, let $e_M(S,T)$ denote the number of edges between $S$ and $T$: $e_M(S,T) = \sum_{i \in S} \sum_{j \in T} M_{ij}$.
Proof. Following the same calculation as in Lemma 6, one can show that $\log E_{\Lambda^c}\, e^{t(S_\delta - E_{\Lambda^c}(S_\delta))} \lesssim p_{\max} t^2$.