Matrix and discrepancy view of generalized random and quasirandom graphs

Abstract We discuss how graph-based matrices can be used to find classifications of the graph vertices with small within- and between-cluster discrepancies. The structural eigenvalues together with the corresponding spectral subspaces of the normalized modularity matrix are used to find a block structure in the graph. The notions are extended to rectangular arrays of nonnegative entries and to directed graphs. We also investigate relations between spectral properties, multiway discrepancies, and degree distribution of generalized random graphs. These properties are regarded as generalized quasirandom properties, and we conjecture and partly prove that they are also equivalent for certain deterministic graph sequences, irrespective of stochastic models.


Introduction
One may think of random graphs as very disordered. However, we will show that generalized random graphs have almost sure properties which are related to their spectra, discrepancies, and vertex-degrees, and that they exhibit regular patterns in expectation. The generalized random graph model, sometimes called the stochastic block-model, was first introduced in [24], and discussed later in [4,15,22,27,30,33]. This model is the generalization of the classical Erdős-Rényi random graph, the first random graph model in history, introduced in [23] and also discussed in [13], which corresponds to the one-cluster case.
The graph $G_n(P, P_k)$ on $n$ vertices is a generalized random graph with $k \times k$ symmetric probability matrix $P = (p_{uv})$ and proper $k$-partition $P_k = (C_1, \dots, C_k)$ of the vertices ($|C_u| = n_u$) if vertices of $C_u$ and $C_v$ are connected independently, with probability $p_{uv}$, $1 \le u < v \le k$; further, any pair of the vertices within $C_u$ is connected with probability $p_{uu}$ ($u = 1, \dots, k$). Therefore, the subgraph of $G_n(P, P_k)$ confined to the vertex set $C_u$ is an Erdős-Rényi type random graph $G_{n_u}(p_{uu})$, while the bipartite subgraphs connecting vertices of $C_u$ and $C_v$ ($u \ne v$) are random bipartite graphs of edge probability $p_{uv}$. Sometimes we refer to $P_k$ as clustering, where $C_1, \dots, C_k$ are the clusters.
In Chapter 3 of [8] we proved that for a given positive integer $k \le n$, there are almost surely $k$ outstanding, so-called structural eigenvalues in the adjacency spectrum, and $k - 1$ outstanding ones in the normalized modularity spectrum of the generalized random graph $G_n(P, P_k)$ as $n \to \infty$ under some balancing conditions on the cluster sizes. Under the same conditions, the $k$-variance of the vertex representatives, constructed from the eigen-subspaces corresponding to the structural eigenvalues, is $o(1)$. The $k$-way discrepancy of $G_n(P, P_k)$ also tends to 0, and the subgraphs and bipartite subgraphs defined on the vertex classes are asymptotically regular and biregular, respectively. These properties can be regarded as so-called generalized quasirandom properties, provided their equivalence can be proved for any graph sequence. More precisely, we focus on an expanding family of graphs such that, for them, any of the above properties implies the others, regardless of stochastic models. In the $k = 1$ case these are called quasirandom or pseudorandom graph sequences and were first discussed by Thomason [36], later by Chung, Graham and Wilson [19,20], and also by Lovász [29].
In the $k > 1$ case, the deterministic counterparts of the generalized random graphs were first defined in [28] as graph sequences converging to a vertex- and edge-weighted graph (vertex-weights correspond to the relative sizes of the partition members, whereas edge-weights to the probability matrix) in the sense of the homomorphism densities. Due to convergence facts on spectra [17], the generalized quasirandom graphs are spectrally equivalent to the generalized random graphs.
In the spirit of the Szemerédi regularity lemma [35], given a large graph, we look for a $k$-partition of its vertices such that the induced subgraphs and bipartite subgraphs are nearly quasirandom in terms of the discrepancy. For this purpose, we define the $k$-way discrepancy that can be related to spectra. Based on the multiway discrepancy and spectra together with spectral subspaces, we will formulate quasirandom properties and conjecture their equivalences, irrespective of stochastic models. Real-life expanding graph sequences asymptotically capturing one of these properties are random-like when confined to their subgraphs and bipartite subgraphs. The equivalences also suggest that spectral methods can find $k$-partitions of the vertices with small within- and between-cluster discrepancies; further, they help us to find the optimal $k$ based on gaps within the spectrum. The novel idea is that large, real-life graphs are instances of expanding graph sequences, and if there is a cluster structure behind them, then we are able to recover it by spectral techniques.
The scope of the paper is twofold: partly we want to establish the equivalence of generalized quasirandom properties based on former results of others [19,20] and Chapter 3 of [8], and partly to supply the missing links in the chain of implications. In Proposition 2, we also give a short proof of the Expander Mixing Lemma for irregular graphs, and in Theorem 1, we estimate the $k$-th largest singular value of the normalized matrix with the $k$-way discrepancy. We will also extend the notion of multiway discrepancy to rectangular arrays, of which undirected or directed, unweighted or weighted graphs are special cases. The results are supported by computer simulations and by processing migration data on a directed graph, on which the spectral relaxation technique is illustrated.
The organization of the paper is as follows. In Section 2, we introduce the notion of graph-based matrices, together with their spectra, spectral subspaces, and corresponding spectral clustering techniques. In Section 3, we discuss the generalized random and quasirandom graphs, together with properties related to spectra, discrepancies and vertex degrees. In Conjecture 1 we state the equivalence of these properties, which are partly known, partly proved in this paper; in fact, only the relation to the vertex-degrees is missing. Particularly, in Section 4, we prove a relation between the $k$-th largest singular value of the normalized matrix and the $k$-way discrepancy of this matrix, which is the key to proving an important implication between the quasirandom properties. In Section 5, we summarize the ideas of the paper.

Notation and graph based matrices
The notion of the modularity matrix was first introduced for simple graphs (see Newman [31] for an overview) to capture the so-called community structure in social networks. In [7] we extended this notion to weighted graphs as follows.
Let $G = (V, W)$ be an edge-weighted graph on the $n$-element vertex-set $V$ with the $n \times n$ symmetric weight-matrix $W$; the entries satisfy $w_{ij} = w_{ji} \ge 0$, $w_{ii} = 0$, and they are similarities between the vertex-pairs. The modularity matrix of $G$ is defined as $M = W - dd^T$, where the entries of $d$ are the generalized vertex-degrees $d_i = \sum_{j=1}^n w_{ij}$ ($i = 1, \dots, n$). Here $W$ is normalized in such a way that $\sum_{i=1}^n \sum_{j=1}^n w_{ij} = 1$, an assumption that does not hurt the generality, since the following normalized modularity matrix, to be mostly used, is not affected by the scaling of the entries of $W$:
$$M_D = D^{-1/2} M D^{-1/2},$$
where $D = \mathrm{diag}(d_1, \dots, d_n)$ is the diagonal degree-matrix. We will demonstrate that the modularity matrix can measure the discrepancy of the underlying graph, a notion which becomes important if we want to find homogeneous patterns in the graph. First we introduce some further notions.
An edge-weighted graph is called connected if its vertices cannot be divided into two disjoint subsets with all zero weights between them. This is equivalent to the weight matrix $W$ being irreducible, in which case the generalized vertex-degrees are all positive. The modularity matrix $M$ always has a zero eigenvalue with eigenvector $1_n = (1, \dots, 1)^T$, since its rows sum to zero. Because $\mathrm{tr}(M) < 0$, $M$ must have at least one negative eigenvalue, and it is usually indefinite. In [11] we proved that the modularity matrix of a simple graph is negative semidefinite if and only if the graph is complete multipartite. The same applies to the normalized modularity matrix, since it has the same inertia. In [8] we proved that the eigenvalues of $M_D$ are in the $[-1, 1]$ interval, and 1 cannot be an eigenvalue if $G$ is connected.
$M_D$ is closely related to the normalized Laplacian matrix. The normalized Laplacian of $G = (V, W)$ is defined as $L_D = I - D^{-1/2} W D^{-1/2}$, and the following relation can be established between the spectra of $L_D$ and $M_D$ when $G$ is connected. Let $0 = \lambda_0 < \lambda_1 \le \dots \le \lambda_{n-1} \le 2$ denote the eigenvalues of $L_D$. The zero is a single eigenvalue with corresponding unit-norm eigenvector $\sqrt{d} = (\sqrt{d_1}, \dots, \sqrt{d_n})^T$. The eigenvalues of $M_D$ are the numbers $1 - \lambda_i$ with the same eigenvectors ($i = 1, \dots, n - 1$); further, the zero with corresponding unit-norm eigenvector $\sqrt{d}$.
Let $1 < k < n$ be a fixed integer. Usual spectral clustering techniques use the $k$ bottom eigenvalues $\lambda_0, \dots, \lambda_{k-1}$ of $L_D$ together with the corresponding eigenvectors to find $k$ 'loosely connected' clusters of the vertices; about this so-called spectral relaxation of the minimum $k$-way normalized cut problem see, e.g., Chapter 2 of [8]. More generally, in the modularity based spectral clustering, we look for the proper $k$-partition $C_1, \dots, C_k$ of the vertices such that the within- and between-cluster discrepancies are minimized.
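The relation between the spectra of $L_D$ and $M_D$ can be checked numerically. The following is a minimal sketch, assuming the normalized modularity matrix is $M_D = D^{-1/2}(W - dd^T)D^{-1/2}$; the helper name `normalized_modularity` and the toy graph are hypothetical and serve only as an illustration.

```python
import numpy as np

def normalized_modularity(W):
    """Return (M_D, d) for a symmetric, nonnegative weight matrix W of a connected graph."""
    W = np.asarray(W, dtype=float)
    W = W / W.sum()                      # scale so that all entries sum to 1
    d = W.sum(axis=1)                    # generalized vertex-degrees
    s = np.sqrt(d)
    # D^{-1/2} W D^{-1/2} - sqrt(d) sqrt(d)^T, which equals D^{-1/2} (W - d d^T) D^{-1/2}
    M_D = W / np.outer(s, s) - np.outer(s, s)
    return M_D, d

# Hypothetical toy graph: a triangle {1, 2, 3} with a pendant vertex 0 attached to vertex 1.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

M_D, d = normalized_modularity(W)
s = np.sqrt(d)
L_D = np.eye(len(d)) - (W / W.sum()) / np.outer(s, s)   # normalized Laplacian

mu = np.sort(np.linalg.eigvalsh(M_D))    # spectrum of M_D, ascending
lam = np.sort(np.linalg.eigvalsh(L_D))   # lam[0] is the zero eigenvalue of L_D

# Spectrum predicted by the text: the numbers 1 - lam_i (i >= 1) together with 0.
expected = np.sort(np.append(1.0 - lam[1:], 0.0))
print(np.allclose(mu, expected))         # compares the two spectra up to rounding
print(np.allclose(M_D @ s, 0.0))         # checks that sqrt(d) is mapped to zero
```

Because the normalization by $D^{-1/2}$ cancels any rescaling of $W$, the same `M_D` is obtained whether or not the weights are scaled to sum to 1 beforehand.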
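To motivate the introduction of the exact discrepancy measure, observe that the $ij$ entry of $M$ is $w_{ij} - d_i d_j$, which is the difference between the actual connection of the vertices $i, j$ and the connection that is expected under independent attachment of them with probabilities $d_i$ and $d_j$, respectively. Consequently, the difference between the actual and the expected connectedness of the subsets $X, Y \subset V$ is
$$\sum_{i \in X} \sum_{j \in Y} (w_{ij} - d_i d_j) = w(X, Y) - \mathrm{Vol}(X)\mathrm{Vol}(Y),$$
where $w(X, Y) = \sum_{i \in X} \sum_{j \in Y} w_{ij}$ is the weighted cut between $X$ and $Y$, and $\mathrm{Vol}(X) = \sum_{i \in X} d_i$ is the volume of the vertex-subset $X$. Further, let $\rho(X, Y) := \frac{w(X, Y)}{\mathrm{Vol}(X)\mathrm{Vol}(Y)}$ be the density between $X$ and $Y$.
Definition 1. The multiway discrepancy of the edge-weighted graph $G = (V, W)$ in the clustering $C_1, \dots, C_k$ of its vertices is
$$\mathrm{disc}(G; C_1, \dots, C_k) = \max_{1 \le u \le v \le k} \ \max_{X \subset C_u,\, Y \subset C_v} \mathrm{disc}(X, Y; C_u, C_v),$$
where
$$\mathrm{disc}(X, Y; C_u, C_v) = \frac{|w(X, Y) - \rho(C_u, C_v)\mathrm{Vol}(X)\mathrm{Vol}(Y)|}{\sqrt{\mathrm{Vol}(X)\mathrm{Vol}(Y)}}.$$
The minimum $k$-way discrepancy of $G$ is $\mathrm{disc}_k(G) = \min_{(C_1, \dots, C_k)} \mathrm{disc}(G; C_1, \dots, C_k)$.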
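For very small graphs the multiway discrepancy of a given clustering can be evaluated by exhaustive search. The sketch below assumes that the deviation of a pair $X \subset C_u$, $Y \subset C_v$ is measured as $|w(X,Y) - \rho(C_u,C_v)\mathrm{Vol}(X)\mathrm{Vol}(Y)| / \sqrt{\mathrm{Vol}(X)\mathrm{Vol}(Y)}$ over nonempty subsets; the function names are hypothetical, and the cost is exponential in the cluster sizes, so this is an illustration rather than a practical algorithm.

```python
from itertools import combinations
from math import sqrt

def nonempty_subsets(items):
    """Yield all nonempty subsets of `items` as tuples."""
    items = list(items)
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)

def multiway_discrepancy(W, clusters):
    """Brute-force discrepancy of a clustering (assumed normalization, see lead-in)."""
    n = len(W)
    total = sum(sum(row) for row in W)
    w = [[W[i][j] / total for j in range(n)] for i in range(n)]   # weights sum to 1
    d = [sum(row) for row in w]                                   # generalized degrees

    def cut(X, Y):
        return sum(w[i][j] for i in X for j in Y)

    def vol(X):
        return sum(d[i] for i in X)

    worst = 0.0
    for a in range(len(clusters)):
        for b in range(a, len(clusters)):
            Cu, Cv = clusters[a], clusters[b]
            rho = cut(Cu, Cv) / (vol(Cu) * vol(Cv))   # density between the two clusters
            for X in nonempty_subsets(Cu):
                for Y in nonempty_subsets(Cv):
                    vx, vy = vol(X), vol(Y)
                    dev = abs(cut(X, Y) - rho * vx * vy) / sqrt(vx * vy)
                    worst = max(worst, dev)
    return worst

# Complete bipartite graph K_{2,2} with sides {0, 1} and {2, 3}.
K22 = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [1, 1, 0, 0],
       [1, 1, 0, 0]]

two_clusters = multiway_discrepancy(K22, [[0, 1], [2, 3]])
one_cluster = multiway_discrepancy(K22, [[0, 1, 2, 3]])
print(two_clusters, one_cluster)
```

On this example the clustering by the two sides has no deviation at all, whereas treating the whole vertex set as one cluster does not, which matches the intuition that the complete bipartite graph is a 'pure' two-cluster structure.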
Note that $\mathrm{disc}(G; C_1, \dots, C_k)$ is the smallest $\alpha$ such that for every $C_u, C_v$ pair and for every $X \subset C_u$, $Y \subset C_v$,
$$|w(X, Y) - \rho(C_u, C_v)\mathrm{Vol}(X)\mathrm{Vol}(Y)| \le \alpha \sqrt{\mathrm{Vol}(X)\mathrm{Vol}(Y)}$$
holds. Hence, in the $k$-partition of the vertices giving the minimum $k$-way discrepancy of $G$, every $C_u, C_v$ pair is so-called $\alpha$-volume regular (see [2]), and this is the smallest possible discrepancy that can be attained with proper $k$-partitions of the vertices of $G$. It resembles the notion of $\varepsilon$-regular pairs in the Szemerédi regularity lemma [35], albeit with a given number of vertex-clusters, which are usually not equitable; further, with volumes instead of cardinalities.
In Section 4, we will justify the following spectral approximation of the minimum $k$-way discrepancy problem. Let the eigenvalues of $M_D$, enumerated in decreasing absolute values, be $1 > |\mu_1| \ge |\mu_2| \ge \dots \ge |\mu_n| = 0$. Assume that $|\mu_{k-1}| > |\mu_k|$, and denote by $u_1, \dots, u_{k-1}$ the corresponding unit-norm, pairwise orthogonal eigenvectors. Let $r_1, \dots, r_n \in \mathbb{R}^{k-1}$ be the row vectors of the $n \times (k - 1)$ matrix of column vectors $D^{-1/2} u_1, \dots, D^{-1/2} u_{k-1}$; they are called $(k - 1)$-dimensional representatives of the vertices.
The weighted $k$-variance of these representatives is defined as
$$\tilde{S}_k^2 = \min_{(C_1, \dots, C_k)} \sum_{u=1}^k \sum_{j \in C_u} d_j \|r_j - c_u\|^2, \qquad (4)$$
where $c_u = \frac{1}{\mathrm{Vol}(C_u)} \sum_{j \in C_u} d_j r_j$ is the weighted center of the cluster $C_u$. It is the weighted $k$-means algorithm that gives this minimum, and the point is that the optimum $\tilde{S}_k$ is just the minimum distance between the eigensubspace corresponding to $\mu_0, \dots, \mu_{k-1}$ and the one of the suitably transformed step-vectors over the $k$-partitions of $V$. In Chapter 2 of [8] we also discussed that, in view of subspace perturbation theorems, the larger the gap between $|\mu_{k-1}|$ and $|\mu_k|$, the smaller $\tilde{S}_k$ is. In the $k$-partition which gives the minimum weighted $k$-variance of $G$, the $k$-way discrepancy of $G$ is also 'fairly small'. The exact relations are established in Section 4, and the message is that here the eigenvectors corresponding to the largest absolute value eigenvalues have to be used, unlike in usual spectral clustering techniques.
In Section 3, we will also need the plain $k$-variance of the representatives $r_1, \dots, r_n \in \mathbb{R}^k$ that are row-vectors of the matrix, the columns of which are the unit-norm, pairwise orthogonal eigenvectors corresponding to the $k$ largest absolute value eigenvalues of $W$. This $k$-variance is
$$S_k^2 = \min_{(C_1, \dots, C_k)} \sum_{u=1}^k \sum_{j \in C_u} \|r_j - c_u\|^2, \qquad (5)$$
where $c_u = \frac{1}{|C_u|} \sum_{j \in C_u} r_j$ is the center of the cluster $C_u$. It is the usual $k$-means algorithm that finds this minimum. In fact, under some conditions, there are variants of this algorithm which find a clustering 'close' to the optimal one in polynomial time. We will not discuss these algorithmic aspects, see, e.g., [26] for details.
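The vertex representatives and the weighted $k$-variance objective can be sketched as follows. This is an illustration under stated assumptions: the representatives are taken as the rows of $(D^{-1/2}u_1, \dots, D^{-1/2}u_{k-1})$ for the eigenvectors of $M_D$ with the largest absolute eigenvalues, and `weighted_k_variance` (a hypothetical helper) only evaluates the objective for one given partition; it does not run the weighted $k$-means minimization.

```python
import numpy as np

def vertex_representatives(W, k):
    """Rows are the (k-1)-dimensional representatives D^{-1/2} u_1, ..., D^{-1/2} u_{k-1}."""
    W = np.asarray(W, dtype=float)
    W = W / W.sum()
    d = W.sum(axis=1)
    s = np.sqrt(d)
    M_D = W / np.outer(s, s) - np.outer(s, s)
    vals, vecs = np.linalg.eigh(M_D)
    order = np.argsort(-np.abs(vals))[:k - 1]      # largest absolute value first
    return vecs[:, order] / s[:, None], d, vals[order]

def weighted_k_variance(R, d, clusters):
    """Weighted k-variance objective for ONE given partition (no minimization)."""
    total = 0.0
    for C in clusters:
        C = list(C)
        center = np.average(R[C], axis=0, weights=d[C])   # weighted cluster center c_u
        total += float(np.sum(d[C] * np.sum((R[C] - center) ** 2, axis=1)))
    return total

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

R, d, mu = vertex_representatives(W, k=2)
good = weighted_k_variance(R, d, [[0, 1, 2], [3, 4, 5]])   # the two triangles
bad = weighted_k_variance(R, d, [[0, 1, 3], [2, 4, 5]])    # a partition mixing them
print(mu, good, bad)
```

In this toy example the partition into the two triangles gives a much smaller objective than the mixed partition, which is the behavior the spectral relaxation relies on.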

Generalized random and quasirandom graphs
Generalized random and quasirandom graphs are specimens, where the 'large' spectral gap and the 'small' $k$-variance show up together with 'small' $k$-way discrepancy.
Definition 2. Let $n$ be a natural number and $k \le n$ be a positive integer. The graph $G_n(P, P_k)$ is a generalized random graph with probability matrix $P$ and proper $k$-partition $P_k = (C_1, \dots, C_k)$ of the vertices if it satisfies the following. The vertex set is $V$, $|V| = n$; the $k \times k$ symmetric matrix $P$ is such that its entries satisfy $0 \le p_{uv} \le 1$ ($1 \le u \le v \le k$). Then vertices of $C_u$ and $C_v$ are connected independently, with probability $p_{uv}$, $1 \le u < v \le k$; further, any pair of the vertices of $C_u$ is connected with probability $p_{uu}$ ($u = 1, \dots, k$).
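Definition 2 translates directly into a sampling procedure. The following is a minimal sketch; the function name `sample_generalized_random_graph` and the parameter values are hypothetical, and the vertices are simply numbered cluster by cluster.

```python
import numpy as np

def sample_generalized_random_graph(sizes, P, rng):
    """Sample the adjacency matrix of G_n(P, P_k) with cluster sizes `sizes`."""
    P = np.asarray(P, dtype=float)
    labels = np.repeat(np.arange(len(sizes)), sizes)   # cluster index of each vertex
    probs = P[labels][:, labels]                       # n x n matrix of edge probabilities
    n = labels.size
    # One independent coin flip per vertex pair i < j; the diagonal stays empty.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(int)                  # symmetrize
    return A, labels

rng = np.random.default_rng(0)
P = [[0.8, 0.1],
     [0.1, 0.6]]
A, labels = sample_generalized_random_graph([40, 60], P, rng)
print(A.shape, int(A.sum()) // 2)   # number of vertices and number of edges
```

With $p_{uu} = 0$ and $p_{uv} = 1$ ($u \ne v$) the same routine returns the complete $k$-partite graph over the given classes.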
With different notation, this definition can be found, e.g., in [4,22,27,30,33]. Sometimes it is called the stochastic block-model, which was first mentioned in [24], and discussed much later in [15] as a special case of an inhomogeneous random graph. Note that this model is the generalization of the classical Erdős-Rényi random graph, the first random graph model in history, introduced in [23] and also discussed in [13], which corresponds to the $k = 1$ case. In this case, the probability matrix boils down to the number $0 < p < 1$, whereas edges come into existence independently, with the same probability $p$; it is denoted by $G_n(p)$.
Note that Definition 2 makes sense if the probability matrix $P$ contains at least one non-zero entry. In many cases, one or more entries of $P$ are zeros. In particular, when $p_{uu} = 0$ ($u = 1, \dots, k$) and $p_{uv} = p \in (0, 1)$ ($u \ne v$), then the graph $G_n(P, P_k)$ has a so-called soft-core multipartite structure, defined in [11]. In the special case when $p = 1$, it is the complete $k$-partite graph $K_{n_1, \dots, n_k}$ over the independent vertex classes of $P_k$, where $n_i = |C_i|$ ($i = 1, \dots, k$).
If $k = n$ and $p_{ij} := \frac{d_i d_j}{\sum_{l=1}^n d_l}$, then the model gives the random graph with expected degree sequence $d_1, \dots, d_n$, first discussed in [21] on the condition that $\max_i d_i^2 \le \sum_{l=1}^n d_l$. This is a good model for capturing power law graphs in that the random power law graph, introduced in [3], is a special case of it.
However, the generalized random graph model can better be exploited in applications where $k$ is much less than $n$. Now, we keep $k$ and $P$ fixed, while $n \to \infty$ under some balancing conditions on the cluster sizes. In [5,6,8] we proved the following properties of a generalized random graph.
Proposition 1. Let $G_n(P, P_k)$ be a generalized random graph on $n$ vertices with vertex-classes $P_k = (C_1, \dots, C_k)$ of sizes $n_1, \dots, n_k$ and $k \times k$ symmetric probability matrix $P$. Let $k$ be a fixed positive integer and $n \to \infty$ in such a way that $\frac{n_u}{n} \ge c$ ($u = 1, \dots, k$) with some constant $0 < c \le \frac{1}{k}$ (called balancing condition). Then the following hold almost surely for the adjacency matrix $A_n$ and the normalized modularity matrix $M_{D,n}$ of $G_n(P, P_k)$.
1. $A_n$ has $k$ so-called structural eigenvalues that are $\Theta(n)$, while the remaining eigenvalues are $O(\sqrt{n})$ in absolute value. Further, the $k$-variance $S_{k,n}^2$ of the $k$-dimensional vertex representatives, based on the eigenvectors corresponding to the structural eigenvalues of $A_n$ (see (5)), is $O(\frac{1}{n})$.
2. There exists a positive constant $0 < \delta < 1$ independent of $n$ (it only depends on $k$) such that $M_{D,n}$ has exactly $k - 1$ structural eigenvalues of absolute value greater than $\delta$, while all the other eigenvalues are less than $n^{-\tau}$ in absolute value, for every $0 < \tau < \frac{1}{2}$. Further, the weighted $k$-variance $\tilde{S}_{k,n}^2$ of the $(k - 1)$-dimensional vertex representatives, based on the transformed eigenvectors corresponding to the structural eigenvalues of $M_{D,n}$ (see (4)), is $O(n^{-\tau})$.
3. There is a constant $0 < \theta < 1$ (independent of $n$) such that $\mathrm{disc}_1(G_n(P, P_k)) > \theta, \dots, \mathrm{disc}_{k-1}(G_n(P, P_k)) > \theta$, and the $k$-way discrepancy $\mathrm{disc}_k(G_n(P, P_k); C_1, \dots, C_k)$ […]
For the proofs of Properties 1-2 see Theorems 3.1.6, 3.1.8 and Propositions 3.1.10, 3.1.12 of [8]. The relation between Properties 2 and 3, that is, between spectra and discrepancy, will be discussed in Section 4, whereas the proof of Property 4 is as follows.
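Property 1 of Proposition 1 can be observed already at moderate sizes. The following simulation is a sketch only: the probability matrix, the cluster sizes and the seed are hypothetical choices, and a single sample illustrates the separation of the structural eigenvalues from the bulk but does not, of course, prove anything about the asymptotics.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [200, 200, 200]                      # balanced clusters, n = 600
P = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.3, 0.8]])
k = len(sizes)

labels = np.repeat(np.arange(k), sizes)
n = labels.size
# Independent edges with probability p_uv between clusters u and v (no loops).
upper = np.triu(rng.random((n, n)) < P[labels][:, labels], k=1)
A = (upper | upper.T).astype(float)

abs_eigs = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]   # decreasing absolute values
structural, bulk_edge = abs_eigs[:k], abs_eigs[k]

print("k largest |eigenvalues|:", np.round(structural, 1))
print("next |eigenvalue|      :", round(float(bulk_edge), 1))
print("2 * sqrt(n)            :", round(float(2 * np.sqrt(n)), 1))
```

The $k$ leading absolute eigenvalues are of the order of $n$ times the eigenvalues of the small matrix built from $P$ and the relative cluster sizes, while the remaining ones stay on the $\sqrt{n}$ scale.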
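We will use the following version of the Chernoff inequality.
Lemma 1 (Chernoff inequality for large deviations). Let $X_1, \dots, X_n$ be independent random variables with $|X_i - E(X_i)| \le K$ for every $i$, and let $X = \sum_{i=1}^n X_i$. Then for every $a > 0$:
$$P(|X - E(X)| > a) \le 2 \exp\left(-\frac{a^2}{2(\mathrm{Var}(X) + Ka/3)}\right).$$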
Proof of Property 4. Consider the generalized random graph sequence $G_n(P, P_k)$, the subgraphs and the bipartite subgraphs of which have the following expected degrees. We will drop the index $n$, and use the notation $A = (a_{ij})$ for the adjacency matrix. For every $C_u$, $C_v$ pair, each vertex in $C_u$ has the same expected number $n_v p_{uv}$ of neighbors in $C_v$. Observe that for $i \in C_u$, the sum $\sum_{j \in C_v} a_{ij}$ has binomial distribution with the above expectation and variance $n_v p_{uv}(1 - p_{uv})$. Therefore, by Lemma 1, the between-cluster average degrees are highly concentrated on their expectations as $n \to \infty$ under the balancing conditions $\frac{n_u}{n} \ge c$ ($u = 1, \dots, k$) for the cluster sizes. Indeed, for any $0 < \varepsilon < 1$: […] that tends to 0 even with the choice $\varepsilon = n^{-\tau}$, $0 < \tau < \frac{1}{2}$. Therefore, it holds almost surely that […]
As for every $1 \le u \le v \le k$, the number of common neighbors in $C_v$ of any $i, j \in C_u$ ($i \ne j$) pair has binomial distribution with expectation $n_v p_{uv}^2$ and variance $n_v p_{uv}^2(1 - p_{uv}^2)$, with the same calculations as above we obtain that […] almost surely. This finishes the proof.
Consequently, the subgraphs confined to the vertex-classes exhibit regular, while the induced bipartite subgraphs of a generalized random graph exhibit biregular structure asymptotically.
Now we will discuss similar properties of the generalized quasirandom graphs, which are the deterministic counterparts of the generalized random graphs and are spectrally equivalent to them.
Let us start with the $k = 1$ case. Quasirandom or pseudorandom graph sequences were first discussed by Thomason [36]. Later, Chung, Graham and Wilson [19] used the term quasirandom for simple graphs that satisfy any of some equivalent properties, where these properties are closely related to the properties of expander graphs, including the 'large' spectral gap. For a sampler of these quasirandom properties see also Lovász [29]. Chung and Graham [20] investigated quasirandom graphs with given degree sequences. Among others, they proved that 'small' discrepancy is caused by a 'large' spectral gap, which is $1 - \|M_D\|$. This relation is summarized in the following proposition that is a straightforward generalization of the Expander Mixing Lemma for irregular graphs.

Proposition 2.
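For the 1-way discrepancy of the edge-weighted graph $G = (V, W)$,
$$\mathrm{disc}_1(G) \le \|M_D\|,$$
where $\|M_D\|$ is the spectral norm of the normalized modularity matrix of $G$.
Though, with different notation, even a stronger version of this proposition is proved in [20], we give another short proof here.
Proof. Via separation theorems for singular values, the largest singular value $|\mu_1|$ of $M_D$ is the maximum of the bilinear form $v^T M_D u$ over unit-norm vectors $u, v$. Let $X, Y \subset V$ be arbitrary, and denote by $1_X, 1_Y \in \mathbb{R}^n$ their indicator vectors. Then
$$\|M_D\| \ge \frac{|\langle D^{1/2} 1_X, M_D D^{1/2} 1_Y \rangle|}{\|D^{1/2} 1_X\| \cdot \|D^{1/2} 1_Y\|} = \frac{|w(X, Y) - \mathrm{Vol}(X)\mathrm{Vol}(Y)|}{\sqrt{\mathrm{Vol}(X)\mathrm{Vol}(Y)}}.$$
Since $\rho(V, V) = 1$ under the normalization of $W$, taking the maxima on the right-hand side over subsets $X, Y \subset V$, the desired relation follows. Note that the estimate is also valid if we take maxima over disjoint $X, Y$ pairs only.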
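The Expander Mixing Lemma for irregular graphs can be checked on random subsets. The sketch below assumes the inequality in the form $|w(X,Y) - \mathrm{Vol}(X)\mathrm{Vol}(Y)| \le \|M_D\| \sqrt{\mathrm{Vol}(X)\mathrm{Vol}(Y)}$ with $W$ scaled to total weight 1; the random weight matrix, its size and the number of trials are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
W = np.triu(rng.random((n, n)), k=1)
W = W + W.T                       # symmetric, zero diagonal, positive off-diagonal weights
W /= W.sum()                      # total weight 1, so Vol(V) = 1
d = W.sum(axis=1)
s = np.sqrt(d)
M_D = W / np.outer(s, s) - np.outer(s, s)
norm_MD = np.linalg.norm(M_D, 2)  # spectral norm, i.e. largest singular value

worst = 0.0
for _ in range(2000):
    X = rng.random(n) < 0.5       # random vertex subsets as boolean masks
    Y = rng.random(n) < 0.5
    if not X.any() or not Y.any():
        continue
    vx, vy = d[X].sum(), d[Y].sum()
    cut = W[np.ix_(X, Y)].sum()   # weighted cut w(X, Y); overlapping X, Y are allowed
    worst = max(worst, abs(cut - vx * vy) / np.sqrt(vx * vy))

print(worst, norm_MD)             # the sampled deviation never exceeds the spectral norm
```

Random sampling only gives a lower estimate of $\mathrm{disc}_1(G)$; the point of the proposition is that the spectral norm bounds every such deviation without enumerating subsets.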
In [20], the authors also proved that in the case of dense enough graphs (the minimum degree is $cn$ for some constant $c$ and number of vertices $n$) the converse implication is also true. In view of the Expander Mixing Lemma, a 'large' spectral gap is an indication that the weighted cut between any two subsets of the graph is near to what is expected in a random graph, the vertices of which are connected independently, with probability proportional to their generalized degrees. The notion of discrepancy together with the Expander Mixing Lemma was first used for simple (sometimes regular) graphs, see, e.g., [1,25], and extended to Hermitian matrices in [14]. Historically, Thomason [36,37] was the first to prove equivalences between quasirandom properties, though with slightly different notions: he used the term jumbled graph and not discrepancy.
The multiclass extension of quasirandomness ($k > 1$) is discussed thoroughly in Lovász and Sós [28], where the generalized quasirandom graphs are defined. Here the clusters or cluster-pairs of small discrepancy behave like expanders or bipartite expanders. In fact, these are the deterministic counterparts of the generalized random graphs. Much earlier, in [34] the authors established valuable relations between quasirandomness and the partitions of the seminal Szemerédi regularity lemma [35]. For the definition, the notion of the convergence of edge- and vertex-weighted graph sequences is needed. Without going into details, we will use the notion of graph convergence as discussed in [16].
The sequence $(G_n)$ of edge- and possibly vertex-weighted graphs is said to be convergent if the sequence $t(F, G_n)$ of homomorphism densities converges for any simple graph $F$ as $n \to \infty$. The authors of [16] also define the limit object that is a symmetric, bounded, measurable function $W : [0, 1] \times [0, 1] \to \mathbb{R}$, called graphon. The stepfunction graphon $W_G$ is assigned to the weighted graph $G$ in the following way: the sides of the unit square are divided into intervals $I_1, \dots, I_n$ of lengths of the relative vertex-weights, and over the rectangle $I_i \times I_j$ the stepfunction takes on the value that is the edge-weight between vertices $i$ and $j$. The convergence of $(G_n)$ is also equivalent to the fact that the stepfunction graphon $W_{G_n}$ converges to the limiting graphon in the so-called cut-metric. Roughly speaking, the members of a convergent graph sequence become more and more similar in small details. In terms of the graph convergence, in Section 4 of [8] we proved the following.

Proposition 3.
Consider the generalized random graph sequence $G_n(P, P_k)$ with $P_k = (C_1, \dots, C_k)$, $|C_u| = n_u$ ($u = 1, \dots, k$). Let $n \to \infty$ in such a way that $\frac{n_u}{n} \to r_u$ with some $r_1, \dots, r_k > 0$, $\sum_{u=1}^k r_u = 1$. Then $G_n(P, P_k) \to W_H$ as $n \to \infty$, where $H$ is a vertex- and edge-weighted graph on $k$ vertices with vertex-weights $r_1, \dots, r_k$, the edge-weights are the entries of $P$, and $W_H$ is the step-function graphon corresponding to $H$.
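In [28] the following definition of a generalized quasirandom graph sequence was given.

Definition 3. Given a model graph $H$ on $k$ vertices with vertex-weights $r_1, \dots, r_k$ and edge-weights $p_{uv} = p_{vu}$ ($1 \le u \le v \le k$), the graph sequence $(G_n)$ is called generalized quasirandom with model graph $H$ if $G_n \to W_H$ as $n \to \infty$ in the sense of the convergence of homomorphism densities.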
The authors of [28] also proved that the vertex set $V$ of a generalized quasirandom graph $G_n$ can be partitioned into classes $C_1, \dots, C_k$ in such a way that $\frac{|C_u|}{|V|} \to r_u$ ($u = 1, \dots, k$) and the subgraph of $G_n$ induced by $C_u$ is the general term of a quasirandom graph sequence with edge-density tending to $p_{uu}$ ($u = 1, \dots, k$), whereas the bipartite subgraph between $C_u$ and $C_v$ is the general term of a quasirandom bipartite graph sequence with edge-density tending to $p_{uv}$ ($u \ne v$) as $n \to \infty$.
Because of the limit relation in the definition of the generalized quasirandom graphs, and the spectral equivalence of convergent graph sequences, the properties discussed in Proposition 1 are valid for the generalized quasirandom graphs as well. Actually, the authors in [17] proved that for any $k$, the $k$ largest absolute value normalized adjacency eigenvalues of a convergent graph sequence converge (to the corresponding eigenvalues of the limiting graphon). In [9] we proved the same for the normalized modularity spectra of convergent graph sequences.
How to construct a generalized quasirandom graph with given $k$, $P$, and vertex-weights of the model graph $H$? Consider the instance when there are $k$ clusters $C_1, \dots, C_k$ of the vertices of sizes $n_1, \dots, n_k$ such that $\frac{n_u}{n} = r_u$ ($u = 1, \dots, k$). Let us choose the independent irrational numbers $\alpha_{uv}$ ($1 \le u \le v \le k$). Then the subgraph on the vertex-set $C_u$ is constructed as follows: $i, j \in C_u$, $i < j$ are connected if and only if […], where $\{\cdot\}$ denotes the fractional part of a real number. The bipartite subgraph between $C_u$ and $C_v$ is constructed as follows: $i \in C_u$ and $j \in C_v$ are connected if and only if […]. Analytical number theoretical considerations (see, e.g., [13,32]) guarantee that, for any $1 \le u \le v \le k$, the sequence is well-distributed symmetrically in $[0, 1]^2$, uniformly in $i, j \in C_u$ ($i \ne j$). Therefore, with the considerations of [32], […] if $n \to \infty$ and $\frac{n_u}{n} \to r_u$ ($u = 1, \dots, k$). For more examples of quasirandom graphs in the $k = 1$ case see [12,13,36].
Therefore, a large random graph, constructed in this way, will be 'nearly' $k$-partite, $k$-regular, and its normalized modularity spectrum contains $k - 1$ structural eigenvalues, whereas all the other eigenvalues are $o(1)$, akin to the weighted $k$-variance of the optimal $(k - 1)$-dimensional representatives. In this case, the $k$-way discrepancy in the optimal spectral clustering is $o(1)$. As for the complete $k$-partite graph $K_{n_1, \dots, n_k}$ (pure case), its normalized modularity spectrum contains $k - 1$ structural negative eigenvalues and $n - k + 1$ zeros. Also, the above $k$-variance is zero; further, $\mathrm{disc}_k(K_{n_1, \dots, n_k}) = 0$ and $s_k = 0$. The properties of Proposition 1 can be regarded as generalized quasirandom properties provided their implications can be proved for any graph sequence. To make the idea more precise, we formulate the following conjecture.
Conjecture 1. […], is $o(1)$. The $k$-partition $P_k = (C_1, \dots, C_k)$ minimizing the above weighted $k$-variance is such that $\frac{n_u}{n} \ge c$ ($u = 1, \dots, k$) holds with some constant $c$, where $n_u = |C_u|$.
PIII. There are vertex-classes $P_k = (C_1, \dots, C_k)$ of sizes $n_1, \dots, n_k$, satisfying $\frac{n_u}{n} \ge c$ ($u = 1, \dots, k$), and a constant $0 < \theta < 1$ (independent of $n$) such that $\mathrm{disc}_1(G_n), \dots, \mathrm{disc}_{k-1}(G_n) > \theta$, and $\mathrm{disc}_k(G_n; C_1, \dots, C_k) = o(1)$.
PIV. There are vertex-classes $P_k = (C_1, \dots, C_k)$ of sizes $n_1, \dots, n_k$, satisfying $\frac{n_u}{n} \ge c$ ($u = 1, \dots, k$), and a $k \times k$ symmetric probability matrix $P = (p_{uv})$, such that every vertex of $C_u$ has asymptotically $n_v p_{uv}$ neighbors in $C_v$ for any $1 \le u \le v \le k$ pair. Further, for the codegrees (number of common neighbors) the following holds: every two different vertices $i, j \in C_u$ have asymptotically $p_{uv}^2 n_v$ common neighbors in $C_v$ for any $1 \le u \le v \le k$ pair. More exactly, for every $1 \le u \le v \le k$ and $i, j \in C_u$ ($i \ne j$):
$$\sum_{t \in C_v} a_{it} = p_{uv} n_v + o(n); \qquad \sum_{t \in C_v} a_{it} a_{jt} = p_{uv}^2 n_v + o(n).$$
The PI-PII implications follow from the statements of Chapter 3 of [8]. Particularly, statement (a) implies (b) by subspace perturbation theorems both in PI and PII. The PII→PIII implication is proved in [9], and discussed in Section 4. As for the PIII→PII implication, we will prove Theorem 1 in Section 4. Based on the results of [19,37] we conjecture that PIII implies PIV, and vice versa. With some transformation, theorems of [36,37] about $(p, \alpha)$-jumbled graphs may be applicable to the subgraphs and bipartite subgraphs, where $p$ is some $p_{uv}$ and $\alpha$ is related to the $k$-way discrepancy.

Discrepancy versus spectra
Here we extend the notion of discrepancy to rectangular matrices of nonnegative entries, like microarrays or contingency tables. Edge-weighted and directed graphs are special cases. Let $A = (a_{ij})$ be an $m \times n$ matrix with $a_{ij} \ge 0$. We assume that $AA^T$ (when $m \le n$) or $A^T A$ (when $m > n$) is irreducible. Consequently, the row-sums $d_{row,i} = \sum_{j=1}^n a_{ij}$ and column-sums $d_{col,j} = \sum_{i=1}^m a_{ij}$ of $A$ are strictly positive, and the diagonal matrices $D_{row} = \mathrm{diag}(d_{row,1}, \dots, d_{row,m})$ and $D_{col} = \mathrm{diag}(d_{col,1}, \dots, d_{col,n})$ are invertible. Without loss of generality, we mostly assume that $\sum_{i=1}^m \sum_{j=1}^n a_{ij} = 1$, since the normalized table
$$A_D = D_{row}^{-1/2} A D_{col}^{-1/2}$$
is not affected by the scaling of the entries of $A$. It is well known (see, e.g., [10]) that the singular values of $A_D$ are in the $[0, 1]$ interval. Enumerated in non-increasing order, the positive ones are the real numbers
$$1 = s_0 > s_1 \ge \dots \ge s_{r-1} > 0,$$
where $r = \mathrm{rank}(A)$. Under the above conditions, 1 is a single singular value, and it is denoted by $s_0$, since it belongs to the trivial singular vector pair. In [10] we estimated the multiway discrepancy, to be introduced, of $A$ by means of these singular values and the corresponding spectral subspaces.
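The stated properties of the normalized table are easy to verify numerically. The following is a minimal sketch, assuming the normalization $A_D = D_{row}^{-1/2} A D_{col}^{-1/2}$; the function name `normalized_table` and the random positive array are hypothetical.

```python
import numpy as np

def normalized_table(A):
    """Return A_D = D_row^{-1/2} A D_col^{-1/2} and the row and column sums of A."""
    A = np.asarray(A, dtype=float)
    d_row, d_col = A.sum(axis=1), A.sum(axis=0)
    return A / np.sqrt(np.outer(d_row, d_col)), d_row, d_col

rng = np.random.default_rng(0)
A = rng.random((5, 8)) + 0.1            # strictly positive entries, so A A^T is irreducible
A_D, d_row, d_col = normalized_table(A)

sv = np.linalg.svd(A_D, compute_uv=False)        # singular values, non-increasing
print(np.round(sv, 4))

# Trivial singular vector pair: A_D sqrt(d_col) = sqrt(d_row).
print(np.allclose(A_D @ np.sqrt(d_col), np.sqrt(d_row)))

# Scaling the entries of A leaves the normalized table unchanged.
print(np.allclose(normalized_table(7.0 * A)[0], A_D))
```

The vectors $\sqrt{d_{row}}$ and $\sqrt{d_{col}}$ have equal norms, so after a common normalization they form the unit-norm singular vector pair belonging to $s_0 = 1$.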

Definition 4.
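The multiway discrepancy of the rectangular array $A$ of nonnegative entries in the proper $k$-partition $R_1, \dots, R_k$ of its rows and $C_1, \dots, C_k$ of its columns is
$$\mathrm{disc}(A; R_1, \dots, R_k, C_1, \dots, C_k) = \max_{1 \le a, b \le k} \ \max_{X \subset R_a,\, Y \subset C_b} \mathrm{disc}(X, Y; R_a, C_b),$$
where
$$\mathrm{disc}(X, Y; R_a, C_b) = \frac{|a(X, Y) - \rho(R_a, C_b)\mathrm{Vol}(X)\mathrm{Vol}(Y)|}{\sqrt{\mathrm{Vol}(X)\mathrm{Vol}(Y)}}.$$
Here $a(X, Y) = \sum_{i \in X} \sum_{j \in Y} a_{ij}$ is the cut between $X \subset R_a$ and $Y \subset C_b$, $\mathrm{Vol}(X) = \sum_{i \in X} d_{row,i}$ is the volume of the row-subset $X$, $\mathrm{Vol}(Y) = \sum_{j \in Y} d_{col,j}$ is the volume of the column-subset $Y$, whereas $\rho(X, Y) = \frac{a(X, Y)}{\mathrm{Vol}(X)\mathrm{Vol}(Y)}$ denotes the density between $X$ and $Y$. The minimum $k$-way discrepancy of $A$ is
$$\mathrm{disc}_k(A) = \min_{R_1, \dots, R_k;\ C_1, \dots, C_k} \mathrm{disc}(A; R_1, \dots, R_k, C_1, \dots, C_k).$$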
In [10], we proved that given the $m \times n$ rectangular array $A$, the following spectral biclustering results in row-column cluster pairs of small discrepancy. The clusters $R_1, \dots, R_k$ of the rows and $C_1, \dots, C_k$ of the columns are obtained by applying the weighted $k$-means algorithm to the $(k - 1)$-dimensional row- and column-representatives, defined as the row vectors of the matrices of column vectors $(D_{row}^{-1/2} v_1, \dots, D_{row}^{-1/2} v_{k-1})$ and $(D_{col}^{-1/2} u_1, \dots, D_{col}^{-1/2} u_{k-1})$, respectively, where $v_i, u_i$ is the unit-norm singular vector pair corresponding to $s_i$ ($i = 1, \dots, k - 1$). Let $\tilde{S}_{k,row}^2$ and $\tilde{S}_{k,col}^2$ denote the weighted $k$-variances of these row- and column-representatives, in the sense introduced in (4). Then, under some balancing conditions for the margins and for the cluster sizes, we proved that $\mathrm{disc}_k(A) = O(\sqrt{2k}(\tilde{S}_{k,row} + \tilde{S}_{k,col}) + s_k)$.
In the special case when $m = n$ and $A$ is symmetric of zero diagonal, we have the edge-weight matrix of an undirected graph. In [9], we proved the following for the $k$-way discrepancy of the edge-weighted graph.
Proposition 4. […]
$$1 \ge |\mu_{1,n}| \ge \dots \ge |\mu_{k-1,n}| > \varepsilon \ge |\mu_{k,n}| \ge \dots \ge |\mu_{n,n}| = 0.$$
The partition $(C_1, \dots, C_k)$ of $V$ is defined so that it minimizes the weighted $k$-variance $\tilde{S}_k^2$ of the optimum vertex representatives obtained as row vectors of the $n \times (k - 1)$ matrix of column vectors $D_n^{-1/2} u_{i,n}$, where $u_{i,n}$ is the unit-norm eigenvector corresponding to $\mu_{i,n}$ ($i = 1, \dots, k - 1$). Assume that there is a constant $0 < K \le \frac{1}{k}$ such that $|C_i| \ge Kn$, $i = 1, \dots, k$. With the notation $s = \sqrt{\tilde{S}_k^2}$, the $(C_i, C_j)$ pairs are $O(\sqrt{2k}\, s + \varepsilon)$-volume regular ($i \ne j$) and for the clusters $C_i$ ($i = 1, \dots, k$) the following holds: for all […], where $\rho(C_i) = \frac{w(C_i, C_i)}{\mathrm{Vol}^2(C_i)}$ is the relative intra-cluster density of $C_i$.
Then, by Proposition 4, PII implies PIII, under some balancing conditions for the margins and for the cluster sizes. Conversely, we are able to estimate s k with the k-way discrepancy.
Theorem 1. With the above notation, for any positive integer k < rank(A).
For the proof we need the following lemmas. Lemma 3 of Bollobás and Nikiforov [14] states that to every $0 < \varepsilon < 1$ and vector $x \in \mathbb{C}^n$, $\|x\| = 1$, there exists a vector $y \in \mathbb{C}^n$ such that its coordinates take no more than $\lceil \frac{8\pi}{\varepsilon} \rceil \lceil \frac{4}{\varepsilon} \log \frac{2n}{\varepsilon} \rceil$ distinct values and $\|x - y\| \le \varepsilon$. Lemma 3 of Butler [18] can be traced back to this one. It states that to any vector $x \in \mathbb{C}^n$, $\|x\| = 1$, and diagonal matrix $D$ of positive real diagonal entries, one can construct a step-vector $y \in \mathbb{C}^n$ such that $\|x - Dy\| \le \frac{1}{3}$, $\|Dy\| \le 1$, and $y$ has at most $\Theta(\log n)$ distinct coordinates. We will also use the following lemma that we constructed just for this purpose.
Lemma 2. […] $R_1, \dots, R_k$ and $C_1, \dots, C_\ell$ be proper partitions of the rows and columns; further, $x \in \mathbb{C}^m$ and $y \in \mathbb{C}^n$ be stepwise constant vectors having equal coordinates over the index sets corresponding to the partition members of $R_1, \dots, R_k$ and $C_1, \dots, C_\ell$, respectively. The $k \times \ell$ real matrix $A' = (a'_{uv})$ is defined by […], where $\|A'\|$ denotes the spectral norm, that is the largest singular value of the real matrix $A'$, and the squared norm of a complex vector is the sum of the squares of the absolute values of its coordinates.
Note that here the row- and column-weights have nothing to do with the entries of $A$, and the volumes are usually not the ones defined in Section 2; this is why they are denoted by VOL instead of Vol.

Proof of Lemma 2. For the distinct coordinates of $x$ and $y$ we introduce [...] with $x'_a$ and $y'_b$ that are coordinates of $x' \in \mathbb{C}^k$ and $y' \in \mathbb{C}^\ell$. Obviously, [...] by the well-known extremal property of the largest singular value, which finishes the proof.
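A numerical sanity check may help to see what Lemma 2 asserts. The sketch below rests on an assumption that is not spelled out in the surviving text: the reduced matrix is taken to have entries $a(R_u, C_v) / \sqrt{\mathrm{VOL}(R_u)\,\mathrm{VOL}(C_v)}$, where $a(R_u, C_v)$ is the sum of the entries of $A$ over the block and VOL is the sum of the corresponding weights, and the lemma is read as $|\langle x, Ay \rangle| \le \|A'\| \cdot \|D_{\mathrm{row}}^{1/2} x\| \cdot \|D_{\mathrm{col}}^{1/2} y\|$. All variable names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k, l = 8, 10, 3, 4
A = rng.standard_normal((m, n))           # arbitrary real matrix
d_row = rng.random(m) + 0.5               # positive weights, unrelated to the entries of A
d_col = rng.random(n) + 0.5
row_lab = np.arange(m) % k                # partition R_1, ..., R_k of the rows
col_lab = np.arange(n) % l                # partition C_1, ..., C_l of the columns

# Complex step-vectors: constant on every partition member.
x_vals = rng.standard_normal(k) + 1j * rng.standard_normal(k)
y_vals = rng.standard_normal(l) + 1j * rng.standard_normal(l)
x, y = x_vals[row_lab], y_vals[col_lab]

# Assumed reduced matrix: block sums scaled by the square roots of the block volumes.
VOL_R = np.bincount(row_lab, weights=d_row, minlength=k)
VOL_C = np.bincount(col_lab, weights=d_col, minlength=l)
block_sums = np.array([[A[np.ix_(row_lab == a, col_lab == b)].sum()
                        for b in range(l)] for a in range(k)])
A_red = block_sums / np.sqrt(np.outer(VOL_R, VOL_C))

# Reduced vectors: distinct coordinates scaled by the square roots of the volumes.
x_red = np.sqrt(VOL_R) * x_vals
y_red = np.sqrt(VOL_C) * y_vals

lhs = abs(np.vdot(x, A @ y))
lhs_reduced = abs(np.vdot(x_red, A_red @ y_red))
rhs = (np.linalg.norm(A_red, 2)
       * np.linalg.norm(np.sqrt(d_row) * x)
       * np.linalg.norm(np.sqrt(d_col) * y))
print(lhs, lhs_reduced, rhs)
```

Under this reading the bilinear form is unchanged by the reduction, the weighted norms of $x$ and $y$ equal the Euclidean norms of the reduced vectors, and the inequality is the extremal property of the largest singular value of the reduced matrix.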
Proof of Theorem 1. Assume that $\alpha = \mathrm{disc}_k(A)$ is attained with the proper $k$-partition $R_1, \dots, R_k$ of the rows and $C_1, \dots, C_k$ of the columns of $A$; i.e., for every $R_a$, $C_b$ pair and $X \subset R_a$, $Y \subset C_b$ we have [...]. Introducing the $m \times n$ matrix $F = A - D_{\mathrm{row}} R D_{\mathrm{col}}$, where $R = (\rho(R_u, C_v))$ is the $m \times n$ block-matrix of $k \times k$ blocks with entries equal to $\rho(R_u, C_v)$ over the block $R_u \times C_v$, Equation (10) yields [...].

Since the rank of the matrix $D_{\mathrm{row}}^{1/2} R D_{\mathrm{col}}^{1/2}$ is at most $k$, by Theorem 3 of Thompson [38], describing the effect of rank $k$ perturbations for the singular values, we obtain the following upper estimate for $s_k$, that is, the $(k+1)$th largest (including the trivial 1) singular value of $A_D$: $s_k \le \|D_{\mathrm{row}}^{-1/2} F D_{\mathrm{col}}^{-1/2}\|$, where $\|\cdot\|$ denotes the spectral norm.

Let $v \in \mathbb{R}^m$ be the left and $u \in \mathbb{R}^n$ be the right unit-norm singular vector corresponding to the maximal singular value of $D_{\mathrm{row}}^{-1/2} F D_{\mathrm{col}}^{-1/2}$. In view of Butler [18], there are stepwise constant vectors $x \in \mathbb{C}^m$ and $y \in \mathbb{C}^n$ such that $\|v - D_{\mathrm{row}}^{1/2} x\| \le \frac{1}{3}$ and $\|u - D_{\mathrm{col}}^{1/2} y\| \le \frac{1}{3}$; further, $\|D_{\mathrm{row}}^{1/2} x\| \le 1$ and $\|D_{\mathrm{col}}^{1/2} y\| \le 1$. Then using the above discussed argument and Butler's results, with the matrix $F$ defined in (10) and the constructed step-vectors $x \in \mathbb{C}^m$, $y \in \mathbb{C}^n$, we have [...].

With the preliminary argument, $x$ takes on at most $r_1 = \Theta(\log m)$, and $y$ takes on at most $r_2 = \Theta(\log n)$ distinct values, which define the proper partitions $P_1, \dots, P_{r_1}$ of the rows and $Q_1, \dots, Q_{r_2}$ of the columns. Let us consider the subdivision of them with respect to $R_1, \dots, R_k$ and $C_1, \dots, C_k$. In this way, we obtain the proper partition $P'_1, \dots, P'_{\ell_1}$ of the rows and $Q'_1, \dots, Q'_{\ell_2}$ of the columns with at most $\ell_1 = k r_1$ and $\ell_2 = k r_2$ parts.

Now, we apply Lemma 2 to the matrix $F$ and to the step-vectors $x$ and $y$, which are also stepwise constant with respect to the above partitions. The row-weights and column-weights are the $d_{\mathrm{row},i}$'s and $d_{\mathrm{col},j}$'s, respectively. In view of the lemma, the entries of the $\ell_1 \times \ell_2$ matrix $F' = (f'_{uv})$ are [...] and $|\langle x, Fy \rangle| \le \|F'\| \cdot \|D_{\mathrm{row}}^{1/2} x\| \cdot \|D_{\mathrm{col}}^{1/2} y\| \le \|F'\|$. But by a well-known linear algebra fact we get that $\|F'\| \le \ell \cdot \max_{u,v} |f'_{uv}| \le \ell \cdot \mathrm{disc}(A; R_1, \dots, R_k, C_1, \dots, C_k)$, where $\ell = \sqrt{\ell_1 \ell_2}$ and we used Formula (7) for the discrepancy. Consequently, the estimate of Theorem 1 follows.

Note that, by Theorem 1, for the undirected, edge-weighted graph $G_n$, the relation $|\mu_k| = O(\log n)\,\mathrm{disc}_k(G_n)$ holds; therefore, if in addition $\mathrm{disc}_k(G_n) = O(n^{-\tau})$ with some $0 < \tau < \frac{1}{2}$, as in the case of generalized random and quasirandom graphs, then PIII implies PII.
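Two steps of the proof of Theorem 1 can be checked numerically on a small example. The Python sketch below is only an illustration under stated assumptions: the array, the partitions and all variable names are made up; the block densities are taken as $\rho(R_u, C_v) = a(R_u, C_v) / (\mathrm{Vol}(R_u)\mathrm{Vol}(C_v))$ by analogy with $\rho(C_i)$; the perturbation step is verified through the standard Weyl-type inequality $s_{k+1}(X) \le \|X - B\|$ for $\mathrm{rank}(B) \le k$, which is not claimed to be the exact form of Thompson's theorem; and the "well-known linear algebra fact" is read as the bound of the spectral norm by $\sqrt{\ell_1 \ell_2}$ times the largest absolute entry.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 12, 15, 3
A = rng.random((m, n))                    # nonnegative rectangular array
A /= A.sum()                              # total volume 1
d_row, d_col = A.sum(axis=1), A.sum(axis=0)
row_lab = np.arange(m) % k                # an arbitrary proper k-partition of the rows
col_lab = np.arange(n) % k                # and of the columns

# Block densities rho(R_u, C_v) and the m x n block-matrix R built from them.
rho = np.zeros((k, k))
for u in range(k):
    for v in range(k):
        block = A[np.ix_(row_lab == u, col_lab == v)].sum()
        rho[u, v] = block / (d_row[row_lab == u].sum() * d_col[col_lab == v].sum())
R = rho[np.ix_(row_lab, col_lab)]

scale = np.sqrt(np.outer(d_row, d_col))
A_D = A / scale                           # D_row^{-1/2} A D_col^{-1/2}
low_rank = scale * R                      # D_row^{1/2} R D_col^{1/2}, rank at most k
residual = A_D - low_rank                 # equals D_row^{-1/2} F D_col^{-1/2}

sing = np.linalg.svd(A_D, compute_uv=False)
s_k = sing[k]                             # (k+1)th largest; sing[0] is the trivial 1
perturbation_bound = np.linalg.norm(residual, 2)

# Spectral norm <= Frobenius norm <= sqrt(rows * cols) * largest absolute entry.
entry_bound = np.sqrt(m * n) * np.abs(residual).max()
print(s_k, perturbation_bound, entry_bound)
```

The printed values should be nondecreasing from left to right, mirroring the chain of inequalities in the proof: the singular value is bounded by the norm of the residual, which is in turn bounded through its entries.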
The discrepancy of a directed graph $G = (V, W)$ is a special case of that of a rectangular array in that its edge-weight matrix $W = (w_{ij})$ is square, but in general asymmetric: $w_{ij} \ge 0$ is the weight of the $i \to j$ edge ($i \ne j$) and $w_{ii} = 0$ ($i = 1, \dots, n$). We applied the spectral clustering algorithm to migration data between 34 countries. The row- and column-clusters are the out- and in-clusters, corresponding to countries exhibiting similar emigration and immigration patterns; $w_{ij}$ represents the number of persons in thousands who moved from country $i$ to country $j$ during the year 2011. Among the singular values of the normalized (asymmetric) $34 \times 34$ edge-weight matrix $W_D$, there was indeed a gap after $s_2$, so we found three clusters for both the rows and the columns.

The row-clusters (emigration trait clusters) were the following:
1. Australia, Austria, Canada, Chile, Czech Republic, Estonia, Greece, Hungary, Israel, Japan, Korea, Luxembourg, Mexico, New Zealand, Poland, Slovak Republic, Slovenia, Turkey, United States.
2. Belgium, France, Germany, Ireland, Italy, Netherlands, Portugal, Spain, Switzerland, United Kingdom.
3. Denmark, Finland, Iceland, Norway, Sweden.

The column-clusters (immigration trait clusters) were:
1. Australia, Austria, Belgium, France, Greece, Israel, Italy, Luxembourg, Poland, Portugal, Spain, Switzerland, United Kingdom.
2. Canada, Chile, Czech Republic, Germany, Hungary, Iceland, Ireland, Japan, Korea, Mexico, Netherlands, New Zealand, Slovak Republic, Slovenia, Turkey, United States.
3. Denmark, Estonia, Finland, Norway, Sweden.

Figure 4 shows the results, where we can spot regions of high and low edge-density within the subgraphs and bipartite subgraphs. For example, there is a high edge-density between out-cluster 2 and in-cluster 1, which indicates frequent migration between the countries of the European Union. Also, a high edge-density is shown between out-cluster 3 and in-cluster 3, i.e., between the countries of Northern Europe, in 2011. However, their separation is not so spectacular, since with $n = 34$ the asymptotic properties are not yet clearly in effect.
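The procedure used for the migration data can be sketched on synthetic data. The Python example below does not use the migration data set: it simulates a directed generalized random graph with made-up parameters and different out- and in-cluster memberships, forms $W_D = D_{\mathrm{out}}^{-1/2} W D_{\mathrm{in}}^{-1/2}$, and chooses the number of clusters from the largest gap among the non-trivial singular values. The gap rule, the variable names and the form of the out- and in-representatives (left and right singular vectors scaled by $D_{\mathrm{out}}^{-1/2}$ and $D_{\mathrm{in}}^{-1/2}$, by analogy with the undirected case) are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k_true = 300, 3
out_labels = np.repeat(np.arange(k_true), n // k_true)   # emigration-type clusters
in_labels = np.roll(out_labels, 50)                      # immigration-type clusters differ
P = np.array([[0.6, 0.1, 0.1],                           # P[u, v]: probability of an edge from
              [0.1, 0.1, 0.6],                           # out-cluster u to in-cluster v
              [0.1, 0.6, 0.1]])
W = (rng.random((n, n)) < P[np.ix_(out_labels, in_labels)]).astype(float)
np.fill_diagonal(W, 0.0)                                 # no loops: w_ii = 0

d_out, d_in = W.sum(axis=1), W.sum(axis=0)               # out- and in-degrees
W_D = W / np.sqrt(np.outer(d_out, d_in))                 # normalized edge-weight matrix

V, sing, Ut = np.linalg.svd(W_D)                         # sing[0] is the trivial singular value 1
gaps = sing[1:-1] - sing[2:]                             # gaps after s_1, s_2, ...
k_est = int(np.argmax(gaps)) + 2                         # a gap after s_{k-1} suggests k clusters

# Representatives of the vertices in their out- and in-roles, using s_1, ..., s_{k-1}.
out_reps = V[:, 1:k_est] / np.sqrt(d_out)[:, None]
in_reps = Ut[1:k_est, :].T / np.sqrt(d_in)[:, None]
print("leading singular values:", np.round(sing[:5], 3))
print("estimated number of clusters:", k_est)
```

The out- and in-clusters would then be obtained by running k-means separately on the rows of `out_reps` and `in_reps`. With only a few dozen vertices, as in the migration example, the gap is naturally less pronounced than in this simulation.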

Conclusion
We characterized spectra and discrepancies of generalized random and quasirandom graphs. Properties like a 'large' spectral gap, 'small' within-cluster variances of the vertex representatives, and 'small' within- and between-cluster discrepancies were formulated with graph-based matrices, for a given number of clusters. However, our theory also helps practitioners to find the optimal number of clusters.
Fig. 4. Asymmetric adjacency matrix of the migration graph, where darker spots mean larger numbers while lighter spots mean smaller ones. Rows and columns are ordered according to the out- and in-cluster memberships, respectively; further, the clusters are separated by red lines.
As a generalization of quasirandomness, which applies to the one-cluster situation, we also considered generalized quasirandom properties, and proved some implications between them, irrespective of stochastic models. Real-life expanding graph sequences asymptotically capturing one of these properties are random-like when confined to their subgraphs and bipartite subgraphs. The equivalences also suggest that spectral methods are capable of finding partitions of the vertices with 'small' multiway discrepancy. We extended these notions to rectangular arrays of nonnegative entries, of which directed graphs are special cases.