Community Detection by $L_0$-penalized Graph Laplacian

Community detection in network analysis aims at partitioning the nodes of a network into $K$ disjoint communities. Most currently available algorithms assume that $K$ is known, but choosing a correct $K$ is generally difficult for real networks. In addition, many real networks contain outlier nodes that do not belong to any community, yet very few algorithms can handle networks with outliers. In this paper, we propose a novel model-free tightness criterion and an efficient algorithm that maximizes this criterion for community detection. The tightness criterion is closely related to the graph Laplacian with an $L_0$ penalty. Unlike most community detection methods, our method does not require a known $K$ and can properly detect communities in networks with outliers. Both theoretical and numerical properties of the method are analyzed. The theoretical results guarantee that, under the degree-corrected stochastic block model, even for networks with outliers, the maximizer of the tightness criterion extracts communities with small misclassification rates, even when the number of communities grows to infinity as the network size grows. Simulation studies show that the proposed method recovers true communities more accurately than other methods. Applications to college football data and yeast protein-protein interaction data also show that the proposed method performs significantly better.


Introduction
Community detection has attracted tremendous research attention, initially in the physics and computer science communities [22,25,24] and more recently in the statistics community [3,4,34,13]. Consider an undirected network $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. Community detection seeks an "optimal" partition of the nodes $V = G_1 \cup \cdots \cup G_K$ such that nodes within each community $G_k$ ($k = 1, \cdots, K$) are more densely connected than nodes between communities.
One class of community detection algorithms detects communities by optimizing a heuristic global criterion over all possible partitions of the nodes. For example, modularity [25] has been very popular in community detection, and fast algorithms for maximizing modularity [23] have been developed and widely used. The well-known spectral clustering algorithms [13,2,6,27,14] can be traced back to continuous approximations of global criteria such as the ratio cut [10] or the normalized cut [28]. Spectral clustering methods are fast and easy to implement since they usually only require computing a few eigenvectors of the Laplacian matrix.
Probabilistic model-based methods are another class of community detection algorithms. They detect communities by fitting a probabilistic model [4,26,21,8] or by optimizing a criterion derived from a probabilistic model [3,15]. One of the most commonly used models is the stochastic block model (SBM) [12]. Given the adjacency matrix $A = (A_{ij})_{1 \le i,j \le n}$ of a network $G$ with $n$ nodes, the SBM assumes that the true node labels $c_i$ are independently sampled from a multinomial distribution with parameters $\pi = (\pi_1, ..., \pi_K)^T$, i.e. $\pi_k = P(c_i = k)$, $k = 1, \cdots, K$. Conditional on the community labels, the edges $A_{ij}$ ($i < j$) are independent Bernoulli random variables with $P(A_{ij} = 1 | c_i, c_j) = p_{c_i c_j}$. The SBM assumes that the expected degrees are the same for all nodes in the same community and thus cannot accommodate hubs in the network. To remove this constraint, the degree-corrected stochastic block model (DCSBM) [15] introduces a degree-correction variable $\theta_i$ for each node such that $P(A_{ij} = 1 | c_i, c_j, \theta_i, \theta_j) = \theta_i \theta_j p_{c_i c_j}$, where $\theta_i > 0$ and $E(\theta_i) = 1$.
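To make the DCSBM concrete, the edge-generation mechanism described above can be sketched as follows (an illustrative Python sketch; the function `sample_dcsbm` and its interface are ours, not part of the paper):

```python
import numpy as np

def sample_dcsbm(c, P, theta, rng=None):
    """Sample an undirected adjacency matrix from a (degree-corrected) SBM.

    c     : length-n array of community labels in {0, ..., K-1}
    P     : K x K matrix of connection probabilities p_{kl}
    theta : length-n degree-correction factors (all ones gives the plain SBM)
    """
    rng = np.random.default_rng(rng)
    n = len(c)
    # P(A_ij = 1 | c, theta) = theta_i * theta_j * p_{c_i c_j}
    prob = np.outer(theta, theta) * P[np.ix_(c, c)]
    upper = np.triu(rng.random((n, n)) < prob, k=1)  # independent draws, i < j
    A = upper.astype(int)
    return A + A.T                                   # symmetric, zero diagonal

# Toy example: two communities with strong within-community connectivity
c = np.array([0, 0, 0, 1, 1, 1])
P = np.array([[0.9, 0.1], [0.1, 0.9]])
A = sample_dcsbm(c, P, theta=np.ones(6), rng=0)
```

Setting `theta` to a vector with unequal entries produces hubs, which is exactly the flexibility the DCSBM adds over the SBM.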
Consistency results have been developed for a number of community detection algorithms, mostly based on the SBM or DCSBM. Under the assumption that the number of communities is fixed, Bickel and Chen [3] laid out a general theory under the SBM for checking the consistency of community detection criteria as the network size grows to infinity, and similar theories have been developed for the DCSBM [34,13]. With a fixed number of communities, community sizes must grow linearly with the number of nodes. However, this is not a realistic assumption, because real networks often have tight communities at small scales, even when they contain millions of nodes [20]. Recent research [27,7,5] generalized these consistency results by allowing the number of communities to grow to infinity. However, as far as we know, similar results for the DCSBM are not yet available.
Despite this progress, current algorithms implicitly assume that every node of the network belongs to a community. However, many real networks contain outlier nodes that do not belong to any community and connect only loosely to other nodes in the network. Ignoring these outlier nodes can significantly reduce the accuracy of community detection. In addition, real networks often do not have a known number of communities. Several methods have recently been proposed to estimate the number of communities [18,29], but these methods also assume that all nodes belong to a community. A few available algorithms [17,33] can detect communities in networks with outliers and unknown community numbers. However, there is no theoretical result guaranteeing the consistency of these methods when there are outliers in the network.
In this paper, we propose a novel model-free tightness criterion for community detection. Community detection based on this criterion iteratively extracts single communities, and no prior knowledge of the community number is needed. Maximizing this criterion is closely related to the $L_0$-penalized graph Laplacian. An efficient algorithm based on the alternating direction method of multipliers (ADMM) is developed to maximize this penalized Laplacian. A permutation-based test filters extracted communities that are likely to be outliers or false communities. Under the DCSBM and the DCSBM with outliers, we establish asymptotic consistency while allowing the community number $K$ to increase as the number of nodes grows. Simulations and real data analyses show that the proposed method efficiently recovers community structure with high resolution and accuracy. This paper is organized as follows. The model-free criterion and the ADMM algorithm are described in Section 2. Theoretical results are given in Section 3. Section 4 presents simulation comparisons with existing methods, and Section 5 contains the real data analyses. Proofs of the theorems are given in the Appendix.

Method and algorithm
Assume that the nodes of a graph $G = (V, E)$ are indexed by $\{1, 2, ..., n\}$ and that each node $i$ belongs to exactly one of $K$ non-overlapping communities denoted by a latent label $c_i \in \{1, ..., K\}$. Given a set $S \subset V$, its complement is denoted by $\bar{S}$ and the number of elements in $S$ is denoted by $|S|$. Define $W(S) = \sum_{i,j \in S} A_{ij}$, $B(S) = \sum_{i \in S, j \in \bar{S}} A_{ij}$ and $V(S) = W(S) + B(S)$. Then $W(S)$ is twice the number of edges among nodes in $S$, $B(S)$ is the number of edges between $S$ and $\bar{S}$, and $V(S)$ is the total degree of the nodes in $S$. Given a vector $u$, we denote by $\|u\|_0$ the number of nonzero elements of $u$ and by $\|u\|_2$ its $L_2$-norm.
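In matrix form these quantities are straightforward to compute; the following sketch (our own illustration) evaluates $W(S)$, $B(S)$ and $V(S)$ from an adjacency matrix:

```python
import numpy as np

def W_B_V(A, S):
    """W(S), B(S), V(S) for a node subset S of an undirected graph.

    A : n x n symmetric 0/1 adjacency matrix
    S : boolean mask of length n selecting the subset
    """
    S = np.asarray(S, dtype=bool)
    W = A[np.ix_(S, S)].sum()       # twice the number of edges inside S
    B = A[np.ix_(S, ~S)].sum()      # edges crossing between S and its complement
    return W, B, W + B              # V(S) = W(S) + B(S) = total degree in S

# Triangle {0,1,2} plus a pendant edge 2-3
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(W_B_V(A, [True, True, True, False]))  # (6, 1, 7)
```

Here the triangle contains 3 internal edges, so $W(S) = 6$, one edge crosses to node 3, so $B(S) = 1$, and the degrees of nodes 0, 1, 2 sum to $V(S) = 7$.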

A tightness criterion
Given a set $S \subset V$, if it is a true community, we expect most of its connections to lie within $S$ itself, so $W(S)/V(S)$ should be large. However, directly maximizing $W(S)/V(S)$ has the trivial solution $S = V$. We therefore introduce a penalty on the size of the community and consider the following tightness criterion,
$$\psi(S) = \frac{W(S)}{V(S)} - \eta |S|, \qquad (2.1)$$
where $\eta$ is a tuning parameter. In Section 3, we will show that with a proper choice of $\eta$, maximizing this tightness criterion yields consistent community detection. The quantity $B(S)$ is known as the cut between $S$ and $\bar{S}$ [10]. True communities should have a small cut value. However, the entire network $V$, as well as single nodes with no connections, has a zero cut value. To avoid these trivial solutions, the ratio cut minimizes $B(S)/(|S||\bar{S}|)$ for community detection [10]. The denominator $|S||\bar{S}|$ can be viewed as a penalty guarding against communities that are too large or too small. Similarly, the normalized cut minimizes $B(S)/V(S) + B(S)/V(\bar{S})$ [28]; the denominators $V(S)$ and $V(\bar{S})$ are penalties on the community size. The criterion proposed by Zhao et al. [33] also penalizes both $S$ and $\bar{S}$. Since these criteria penalize both $|S|$ and $|\bar{S}|$, they perform best when the community sizes are similar. In comparison, the tightness criterion (2.1) only penalizes $|S|$, which gives it high detection power for both large and small communities. However, penalizing only $|S|$ can also lead to small spurious communities, and we use a resampling procedure to remove these potential false communities in Section 2.3.
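Assuming the tightness criterion takes the form $\psi(S) = W(S)/V(S) - \eta|S|$ (our reconstruction from Proposition 2.1, not a quotation of the lost display), its behavior can be illustrated by brute force on a toy graph: with a suitable $\eta$, the maximizer is a planted tight subgraph rather than the whole network.

```python
import itertools
import numpy as np

def psi(A, S, eta):
    """Tightness criterion psi(S) = W(S)/V(S) - eta * |S| (our reconstruction)."""
    S = np.asarray(S, dtype=bool)
    W = A[np.ix_(S, S)].sum()
    V = W + A[np.ix_(S, ~S)].sum()
    return -np.inf if V == 0 else W / V - eta * S.sum()

# Toy network: a triangle {0,1,2} attached to a path 3-4-5
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1

# Exhaustive search over all nonempty subsets (feasible only for tiny n)
subsets = (S for S in itertools.product([False, True], repeat=6) if any(S))
best = max(subsets, key=lambda S: psi(A, S, eta=0.06))
community = [i for i, s in enumerate(best) if s]
print(community)  # [0, 1, 2]: the triangle beats the full node set
```

With $\eta = 0$ the full node set would win ($\psi(V) = 1$); the size penalty is what makes the tight triangle optimal.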
The tightness criterion (2.1) is closely related to a penalized graph Laplacian. More specifically, let $Q = D^{-1/2} A D^{-1/2}$ be the graph Laplacian, where $A$ is the adjacency matrix and $D = \mathrm{diag}\{d_1, \cdots, d_n\}$ is the nodal degree matrix with $d_i$ the degree of the $i$th node. Then we have the following proposition.

Proposition 2.1. Given a set $S \subset V$, define its membership vector $u_S$ by
$$u_S(i) = \sqrt{d_i / V(S)}\, I(i \in S), \quad i = 1, \cdots, n. \qquad (2.2)$$
Then $\psi(S) = u_S^t Q u_S - \eta \|u_S\|_0$ and $\|u_S\|_2 = 1$. Therefore, maximizing the tightness criterion (2.1) is equivalent to the optimization problem
$$\max_{S \subset V}\; u_S^t Q u_S - \eta \|u_S\|_0. \qquad (2.3)$$
Finding the global solution to (2.3) is difficult in general, because it requires searching over all possible subsets of $V$. In the next section, we develop an efficient algorithm based on the ADMM to find a local optimum.
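Proposition 2.1 can be checked numerically. The display (2.2) is reconstructed above as $u_S(i) = \sqrt{d_i/V(S)}$ for $i \in S$ and $0$ otherwise; under that reading, the identity $u_S^t Q u_S = W(S)/V(S)$ holds exactly:

```python
import numpy as np

# Small graph: triangle {0,1,2} plus the edge 2-3
A = np.zeros((4, 4))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1
d = A.sum(axis=1)
Q = A / np.sqrt(np.outer(d, d))           # Q = D^{-1/2} A D^{-1/2}

S = np.array([True, True, True, False])
V_S = d[S].sum()                          # V(S): total degree in S
u = np.where(S, np.sqrt(d / V_S), 0.0)    # assumed membership vector (2.2)

W_S = A[np.ix_(S, S)].sum()
print(np.isclose(u @ Q @ u, W_S / V_S))   # True: u_S^t Q u_S = W(S)/V(S)
print(np.isclose(u @ u, 1.0))             # True: ||u_S||_2 = 1
```

The cancellation is immediate: each term $A_{ij}/\sqrt{d_i d_j}$ in $Q$ is multiplied by $\sqrt{d_i d_j}/V(S)$, leaving $\sum_{i,j \in S} A_{ij}/V(S)$.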

Algorithm
Before introducing the algorithm, we first fix some notation. For any $u$ with $\|u\|_2 = 1$, we denote its nonzero index set by $S(u) = \{i : u_i \neq 0\}$. Conversely, given $S(u)$, we can define a new membership vector $u^d = u_{S(u)}$ using (2.2). The vector $u^d$ is obtained simply by reassigning the values of the nonzero elements of $u$ according to the degrees of the nodes in $S(u)$; note that $\|u^d\|_2 = 1$. Given $\lambda_1 \ge 0$, we consider the ADMM formulation (2.6). We alternately update $u$, $u^d$, $v$ and $v^d$ to solve (2.6). Given $u$ or $v$, we can directly obtain $u^d$ or $v^d$ using the $d$-operator defined above. Given the other variables, updating $u$ or $v$ reduces to a simple optimization problem (2.7) with an explicit solution, given by the following proposition.
Proposition 2.2. For a given vector $z = (z_1, ..., z_n)^t \in \mathbb{R}^n$, denote its $r$th largest absolute value by $|z|_r$, and let $z^h_r$ be the vector with $i$th element $z^h_r(i) = z_i I(|z_i| > |z|_{r+1})$. Then, for a constant $\rho > 0$, the solution to (2.7) is $z^h_r / \|z^h_r\|_2$, where $r$ is the smallest integer satisfying condition (2.8).
The proof of Proposition 2.2 is given in [16] and is omitted here. We summarize the steps for solving (2.6) in the following L0Lap algorithm (Algorithm 1).
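The hard-thresholding operator $z^h_r$ in Proposition 2.2 keeps only the $r$ entries of $z$ with the largest absolute values. A small sketch (ours; the rule for choosing $r$, which involves $\rho$ and appears in an omitted display, is not implemented here):

```python
import numpy as np

def hard_threshold(z, r):
    """Keep the r entries of z with largest absolute value, zero out the rest.

    Implements z^h_r with z^h_r(i) = z_i * I(|z_i| > |z|_{r+1}), where |z|_r
    denotes the r-th largest absolute value of the entries of z.
    """
    z = np.asarray(z, dtype=float)
    if r >= len(z):
        return z.copy()
    cutoff = np.sort(np.abs(z))[::-1][r]   # |z|_{r+1}
    return z * (np.abs(z) > cutoff)

z = np.array([0.1, -2.0, 0.5, 3.0, -0.2])
print(hard_threshold(z, 2))  # keeps only -2.0 and 3.0
```

Normalizing the thresholded vector to unit $L_2$-norm then gives the update of Proposition 2.2.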
In all simulations and real data analyses, we set the convergence tolerance to $10^{-4}$. The parameters $\lambda$ and $\lambda_1$ are penalty parameters in the augmented Lagrangian and can be fixed in advance [35]. Throughout the paper, we set $\lambda = 1/\sqrt{n}$. For $\lambda_1$, we first set $\lambda_1 = 0$ and run Algorithm 1 with the initial value $v_0 = (1/\sqrt{n}, ..., 1/\sqrt{n})$ to obtain $\tilde{v}_0$. Then we set $\lambda_1 = 1$ and run Algorithm 1 with the initial value $\tilde{v}_0$ to obtain the final solution. Although Algorithm 1 cannot guarantee a global maximum of (2.4), we find that this procedure is robust and performs well in our numerical analyses.
The parameter $\eta$ is the most important tuning parameter, and we introduce a criterion $\phi(S)$ to tune it. A large $\phi(S)$ implies that $S$ has more connections within itself and thus is more likely to be a community. From the theoretical results in Section 3, we know that the best $\eta$ is of order $O(1/n)$. Therefore, we run Algorithm 1 for $\eta = 0, 1/(10n), 2/(10n), \cdots, 10/(10n)$ and choose the $\eta$ whose resulting $S_\eta$ achieves the largest $\phi(S_\eta)$.

The permutation test
After a community is extracted by Algorithm 1, we remove it from the network and iteratively apply Algorithm 1 to the remaining network until no edge is left. However, this process may produce some small spurious communities, because even Erdős–Rényi (ER) networks can contain small community-like structures. To filter these spurious communities, we introduce the following permutation test. Suppose that $S_1, \cdots, S_c$ are all the identified communities with fewer than $M$ nodes, and let $G_0$ be the sub-network of $G$ composed of the nodes in $S_1 \cup \cdots \cup S_c$. If $G_0$ is an ER graph with connection probability $\hat{p}$, then for any $m$ nodes, the probability of observing no more than $E$ edges among these $m$ nodes is given by (2.10). Let $n_i$ and $E_i$ be the number of nodes and the number of edges in $S_i$ ($i = 1, \cdots, c$), respectively. Each detected community $S_i$ thus has an associated probability $p(n_i, E_i)$ from (2.10). We permute the edges in $G_0$ $N$ times to generate $N$ ER graphs and apply Algorithm 1 to each of them. The first community extracted from the $j$th ER graph also has a probability $p^{ER}_j$ from (2.10). Note that $p(n_i, E_i)$ should be smaller than most of the $p^{ER}_j$ if $S_i$ is a true tight community. We assign the permutation p-value of $S_i$ accordingly, as in (2.11). In our simulations and real data analyses, we set $M = 20$, $N = 100$ and $\alpha = 0.05$.
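Displays (2.10) and (2.11) are not reproduced above, so the following sketch encodes one plausible reading: $p(m, E)$ as an upper-tail binomial probability under the ER null (so that densely connected sets receive small values, matching the statement that $p(n_i, E_i)$ should fall below most $p^{ER}_j$), and the permutation p-value as the fraction of null values at least as extreme. Both choices are our assumptions, not quotations from the paper.

```python
import math

def er_tail(m, E, p):
    """P(X >= E) for X ~ Binomial(m*(m-1)/2, p): the chance that an ER graph
    on m nodes with edge probability p has at least E edges."""
    N = m * (m - 1) // 2
    return sum(math.comb(N, e) * p**e * (1 - p) ** (N - e) for e in range(E, N + 1))

def perm_pvalue(p_obs, p_null):
    """Fraction of null probabilities at least as extreme (small) as observed."""
    return sum(q <= p_obs for q in p_null) / len(p_null)

# Hypothetical numbers: a dense 10-node set with 30 edges, versus first
# communities extracted from permuted ER graphs having 8-12 edges each
p_obs = er_tail(10, 30, p=0.2)
p_null = [er_tail(10, E, p=0.2) for E in (8, 9, 10, 11, 12)]
print(perm_pvalue(p_obs, p_null))  # 0.0: the observed set is denser than every null draw
```

Communities with a permutation p-value above $\alpha$ would then be discarded as likely spurious.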

Theoretical properties
In this section, we present theoretical results for the estimator $\hat{S}$ that maximizes the tightness criterion (2.1) under the DCSBM and under the DCSBM with outliers. We first give the exact definition of the DCSBM.
Throughout this paper, we assume that $\alpha, \tau, \gamma, \delta$ are fixed constants such that $0 \le 2\delta < \alpha < 1/2$ and $0 < \tau < \gamma < \alpha - 2\delta$. The constant $\alpha$ controls the lower bound of the within-community connection probabilities, and the constant $\delta$ controls the smallest community size. The constant $\tau$ separates different communities through a feature that depends on the community sizes ($\pi_k$) and the within-community connection probabilities ($p_{kk}$). Given any two sets $S_1$ and $S_2$, we denote their symmetric difference by $S_1 \Delta S_2 = (S_1 \cup S_2) - (S_1 \cap S_2)$. For two nonnegative sequences $a_n$ and $b_n$, we write $a_n \gtrsim b_n$ if there exists a constant $C_0 > 0$ such that $a_n \ge C_0 b_n$. Define $\Gamma_\delta = \{S \subset V : |S|^2/K \gtrsim n^{2-2\delta}\}$. Similar to [34], we assume that $\theta$ takes $M$ distinct values $h_1, \cdots, h_M$ and that $\Pi$ is the $K \times M$ matrix representing the joint distribution of the community label $c$ and $\theta$. With the choice of $\eta$ in Theorem 3.1, suppose that $\hat{S} \subset V$ maximizes the tightness criterion (2.1) in $\Gamma_\delta$. Then, with probability at least $1 - 2Kn^{-2} - 2^{n+2}/n^n$, the misclassification bound (3.2) holds. This theorem says that, under a number of regularity conditions, if the tuning parameter is chosen properly, the detected community $\hat{S}$ is very close to the underlying true community $G_1$, which has the largest $\rho^d_k$. Remark 3.1. In Theorem 3.1, the maximum is taken over $S \in \Gamma_\delta$. This constraint is needed because small spurious communities could otherwise yield a larger value of the tightness criterion (2.1) than the true communities. However, we find it difficult to develop an efficient algorithm under this constraint, so it is not imposed in Algorithm 1; instead, we filter potential small spurious communities with a resampling procedure.
Remark 3.2. The condition $\pi^d_1/\pi_1 \ge \max_{2 \le k \le K} \pi^d_k/\pi_k$ is not as restrictive as it looks. For example, if $c$ and $\theta$ are independent, then $\pi^d_k = \pi_k$ for all $1 \le k \le K$ and the condition holds trivially. The SBM also satisfies this condition, since in that case $M = 1$ and $h_1 = 1$.
The condition involving $n^{-\tau}$ ensures that the first community is separable from the other communities. Consider the simple SBM case where $p_{kl} = p_0$ for all $k \neq l$; then $\rho^d_k = 1/\left((1 - p_0/p_{kk})\pi_k + p_0/p_{kk}\right)$. The ratio $\beta_k = p_0/p_{kk}$ can be viewed as the "out-in-ratio" defined in [8]. If all the $\pi_k$ are equal, the first extracted community $G_1$ is the community with the smallest out-in-ratio. If all out-in-ratios $\beta_k$ are equal, the first extracted community $G_1$ is the smallest community.
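These two special cases can be checked numerically (our own illustration of the formula $\rho^d_k = 1/((1 - \beta_k)\pi_k + \beta_k)$ with $\beta_k = p_0/p_{kk}$; the numbers are arbitrary):

```python
def rho(pi_k, beta_k):
    """rho_k = 1 / ((1 - beta_k) * pi_k + beta_k) in the equal-p_0 SBM case."""
    return 1.0 / ((1.0 - beta_k) * pi_k + beta_k)

# Equal community sizes: rho decreases in the out-in-ratio, so the community
# with the smallest out-in-ratio is extracted first.
print([round(rho(0.25, b), 3) for b in (0.05, 0.1, 0.2)])   # decreasing

# Equal out-in-ratios: rho decreases in pi_k, so the smallest community wins.
print([round(rho(p, 0.1), 3) for p in (0.2, 0.3, 0.5)])     # decreasing
```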
Since $p_- \gtrsim \log n/n^{1-2\alpha}$ and $\pi_- \gtrsim n^{-\delta/2}$, we have $p_-\, n^{1-2\alpha+\delta/2} \gtrsim K \log n$. Consider a special case where $K$ is finite and the community sizes are all $O(n)$. In this case, the lower bound on the within-community connection probability should satisfy $p_- \gtrsim \log n/n^{1-2\alpha+\delta/2}$, and thus $np_-/\log n \gtrsim n^{2\alpha-\delta/2}$. This condition is similar to the condition $np_-/\log n \to \infty$ in [34], especially when $\alpha$ is close to 0. If $p_- = O(1)$ and $\delta = 1/4 - 2\epsilon$ for some $\epsilon > 0$ very close to 0, then $n_{\min} = O(n^{7/8+\epsilon})$ and $K = O(n^{1/8-\epsilon})$; thus the upper bound on $K$ is essentially $O(n^{1/8})$. Consider the simplest case $K = 2$. Taking $\tau = 0$ and $\gamma = \alpha/2 - \delta$, the misclassification rate is of order $O_p(\log n/n^{\alpha/2-\delta})$ by inequality (3.2). This improves on the results in [27] and [19], where the misclassification rate was $O_p(1/\log n)$.
When there are outliers in the network, we have a consistency result similar to Theorem 3.1. We first give the definition of the DCSBM with outliers: all communities are well-defined communities except the $K$th, the outlier community. We again assume $p_- \gtrsim \log n/n^{1-2\alpha}$ and $\pi_- \gtrsim n^{-\delta/2}$ for the DCSBM with outliers, and we have the following theorem. It says that, as long as the outlier community is not too large, the first extracted community is very close to the community with the largest $\rho^d_k$.

Simulation study
In this section, we perform simulations to compare the proposed method with state-of-the-art algorithms, including SCORE [13], nPCA [28], OSLOM [17], Zhao [33] and PLH [29]. Since SCORE and nPCA require a known community number, we provide them with the true community number in the simulations. For the algorithm developed in this paper, we consider two versions, with and without the permutation test, to assess the effect of the permutation test on removing false communities; we call them L0LapT (with the permutation test) and L0Lap (without it). We evaluate the performance of the algorithms by the normalized mutual information (NMI) [31] between the detected and true communities. For methods that can automatically determine the community number, we also compare their estimated community numbers. In addition, we consider two further algorithms, NB and BH [18], when comparing the accuracy of community number estimation. Since NB and BH only estimate the community number, we do not evaluate them in terms of NMI. For OSLOM, we use the C++ implementation available at http://www.oslom.org/software.htm. The computer code for Zhao, NB, BH and PLH was provided by the original authors. The other methods were implemented in Matlab following their published descriptions. Our Matlab implementations of L0Lap and L0LapT are available at https://github.com/ChongC1990/L0Lap. Computationally, SCORE and nPCA are the most efficient methods: for a network with 1,000 nodes, they finish in less than 0.1 second. The proposed method finishes in 3 seconds on a 1,000-node network, while the Zhao method and OSLOM need 50 and 282 seconds, respectively. For a larger network with 10,000 nodes, SCORE and nPCA finish within a few seconds and the proposed method within 70 seconds. In comparison, the Zhao method requires more than 3,500 seconds and OSLOM fails to return a result.
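For reference, the NMI between two label vectors can be computed as below (a self-contained sketch; we use the $\sqrt{H(a)H(b)}$ normalization, one of several conventions in the literature, so the exact normalization of [31] may differ):

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information between two label vectors."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    ka, ia = np.unique(a, return_inverse=True)
    kb, ib = np.unique(b, return_inverse=True)
    joint = np.zeros((len(ka), len(kb)))
    for x, y in zip(ia, ib):                  # joint label distribution
        joint[x, y] += 1.0 / n
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    mi = (joint[mask] * np.log(joint[mask] / np.outer(pa, pb)[mask])).sum()
    ha, hb = -(pa * np.log(pa)).sum(), -(pb * np.log(pb)).sum()
    if ha == 0 or hb == 0:                    # degenerate constant labelings
        return float(ha == hb)
    return mi / np.sqrt(ha * hb)

print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 4))  # 1.0: same partition, relabeled
print(round(nmi([0, 0, 1, 1], [0, 1, 0, 1]), 4))  # 0.0: independent partitions
```

NMI is invariant to relabeling of the communities, which is why it is a standard comparison metric here.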

Simulation without Outliers
We perform simulations under both the SBM and the DCSBM. All simulated networks have $n = 1{,}000$ nodes and $K = 21$ communities of different sizes: 5 communities have 100 nodes, 6 have 50 nodes and 10 have 20 nodes. For the DCSBM, the degree parameters $\theta_i$ are drawn independently from $U[0.5, 1]$; for the SBM, all $\theta_i$ are set to 1. Similar to [1], the connection matrix $P$ is constructed from an "out-in-ratio" parameter $\beta$ [8]. Given $\beta$, we set the diagonal elements of a matrix $P^{(0)}$ to $\beta^{-1}$ and all off-diagonal elements to 1. Then, given an overall expected network degree $\Lambda$, we rescale $P^{(0)}$ to obtain the final $P$:
$$P = \frac{\Lambda}{(n-1)\left(\pi^T P^{(0)} \pi\right)(E\Theta)^2}\, P^{(0)},$$
where $\pi = (\pi_1, \pi_2, ..., \pi_{21})$ contains the proportions of nodes in each community. Conditional on the labels and $P$, the edges are generated as independent Bernoulli variables with parameters $\theta_i \theta_j P_{c_i c_j}$. The methods NB, BH and PLH require a candidate set for $K$; we provide all integers from 1 to 25. For L0LapT, OSLOM and Zhao, since some nodes may be left unclassified, we compute the NMI only over nodes that are assigned a community label.
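The simulation design above can be sketched end to end as follows. The rescaling formula for $P$ is reconstructed from a garbled display, and we plug the empirical mean of $\theta$ in for $E\Theta$; treat both as our reading of the setup rather than the authors' exact code:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [100] * 5 + [50] * 6 + [20] * 10             # 21 communities, n = 1000
n, K = sum(sizes), len(sizes)
c = np.repeat(np.arange(K), sizes)
pi = np.array(sizes) / n

beta, Lam = 0.1, 50.0                                # out-in-ratio, expected degree
P0 = np.ones((K, K))
np.fill_diagonal(P0, 1.0 / beta)

theta = rng.uniform(0.5, 1.0, size=n)                # DCSBM degree factors
# Rescale P0 so the expected network degree is Lam (reconstructed formula):
P = Lam / ((n - 1) * (pi @ P0 @ pi) * theta.mean() ** 2) * P0

prob = np.clip(np.outer(theta, theta) * P[np.ix_(c, c)], 0.0, 1.0)
A = np.triu(rng.random((n, n)) < prob, k=1).astype(int)
A = A + A.T
print(round(A.sum(axis=1).mean(), 1))                # average degree close to Lam
```

The rescaling makes the average degree roughly $\Lambda$ regardless of the mean of the degree factors, since $(E\Theta)^2$ appears in the denominator.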
We first fix $\Lambda = 50$ and vary the out-in-ratio parameter $\beta$ from 0.02 to 0.2. For each $\beta$, the mean NMI of each algorithm is computed over 100 repetitions (Figure 1). For all algorithms, the NMI tends to decrease as $\beta$ increases. Our algorithm clearly achieves the highest NMI among the compared methods. As expected, the permutation test significantly improves the NMI, because it successfully removes small false communities; the test is especially effective when $\beta$ is large. For example, under the SBM with $\beta = 0.2$, the NMI of L0Lap is similar to that of nPCA, but after applying the permutation test the NMI of L0Lap becomes close to 1. Furthermore, since nodes in the DCSBM are heterogeneous, all methods perform better under the SBM than under the DCSBM, as expected. In terms of the community number, L0LapT and PLH give comparable estimates and are usually better than the other algorithms (Figure 1, bottom panel). When the out-in-ratio $\beta$ is large, all methods other than Zhao tend to underestimate the community number; the Zhao method prefers to find one big community and many small ones, so it often overestimates the number of communities. We then fix $\beta = 0.1$ and vary $\Lambda$ from 2 to 100. The mean NMI of each algorithm is shown in Figure 2. Again, our algorithm generally performs better than the others. When $\Lambda$ is very small, OSLOM and Zhao often divide the network into many small connected subsets; they then tend to have much larger NMIs than other methods but also significantly overestimate the community number.

Simulation with outliers
In this section, we compare the performance of each method under the DCSBM with outliers. The simulated networks are similar to those in Section 4.1, except that 5 of the communities with 20 nodes are treated as outliers. The connection probability between outliers is the same as the between-community connection probability. For SCORE and nPCA, we set the community number to 17 in this simulation (16 communities and 1 outlier community). To compute reasonable NMIs, the outlier nodes are viewed as a 17th true community. Figure 3 shows the NMIs and the numbers of detected communities. Because there are outliers, even when $\beta$ is very small there remains a nonnegligible gap between the NMIs and their upper bound of 1. However, applying the permutation test significantly improves the NMI of L0Lap. Furthermore, the community number found by L0LapT is significantly more accurate than that of all other methods.

Real data analysis
We consider two real data sets in this section, the college football network data [30] and the protein-protein network data in yeast [32].

College football data
The college football network data come from the 2006 National Collegiate Athletic Association (NCAA) Football Bowl Subdivision (FBS) schedule [30]. The data set consists of 115 schools belonging to 11 conferences in the FBS, 4 independent schools and 61 lower-division schools. Schools within a conference play each other more often, so the 11 conferences form 11 communities. The four independent schools are hubs: they play against many schools in different conferences but do not belong to any conference. The 61 lower-division schools connect loosely to the rest of the network and are outliers. We apply all methods considered in the simulation study to this data set. The algorithms L0Lap, L0LapT, OSLOM, Zhao and PLH automatically estimate the community number. For SCORE and nPCA, we provide the true community number 12, comprising the 11 conferences and one outlier community that contains both the hub nodes and the outlier nodes. Table 1 shows the NMI and the detected community number (CN) of each algorithm. L0LapT clearly has the largest NMI of all methods. PLH also works well: its NMI of 0.929 ranks second best. In terms of outlier identification, although OSLOM and Zhao are designed to identify outliers, OSLOM fails to report any outlier and the Zhao method assigns most outlier nodes to its largest detected community. In comparison, L0LapT identifies 80 nodes as outliers, 62 of which are true outliers. Table 1: Performance comparison on the college football network data. CN is the detected or provided community number. The community number is set to 12 for SCORE and nPCA.

To look more closely at the communities detected by each algorithm, we examine the pairwise overlaps between detected and true communities. Specifically, given a detected community $C^D_i$ and a true community $C^T_j$, we calculate an overlapping score $o_{ij}$ between the two communities, yielding a matrix $O = (o_{ij})_{CN \times 12}$ for each algorithm, where $CN$ is the detected community number. Figure 4 shows heat maps of these matrices for L0LapT, PLH, OSLOM and Zhao. Since Zhao extracts too many communities, we only consider its 12 largest. As Figure 4 shows, all communities identified by L0LapT are highly similar to, or exactly the same as, the true communities, demonstrating that L0LapT produces high-quality communities. However, L0LapT fails to detect community 11, whose nodes are filtered out as outliers. For OSLOM, most of the diagonal overlapping scores are below 0.71 and the largest is only 0.86, showing that many communities detected by OSLOM contain a substantial number of nodes that do not belong to them. PLH performs well for most communities, but members of true communities 4, 7 and 11 are mixed up. Zhao performs poorly on these data: most of its detected communities are far from the true communities.

Protein-protein interaction data in yeast
In this section, we consider a protein-protein interaction (PPI) network in yeast [32]. After removing isolated nodes, we obtain a network with 1,540 nodes and 7,123 edges. Proteins often interact with each other to carry out biological functions, so the communities of the PPI network should represent different cellular functions. We apply all methods from the simulation study to this PPI network. L0LapT finds 22 communities with sizes ranging from 8 to 138 nodes. The Zhao method finds 15 communities ranging from 2 to 632 nodes. OSLOM finds 114 communities ranging from 3 to 103 nodes. For SCORE and nPCA, since the number of communities is unknown, we set it to 50, roughly the average of the numbers detected by L0LapT, Zhao and OSLOM. The candidate set for PLH is all integers between 20 and 50; PLH finds 29 communities ranging from 14 to 181 nodes. We further filter out communities with fewer than 5 nodes, since these are unlikely to be true communities.
There is no ground-truth community structure with which to evaluate the detected communities, so we instead use gene ontology (GO) enrichment analysis to compare the methods. We downloaded the yeast GO annotation database from http://www.yeastgenome.org/ and focus only on GO terms with at least 10 annotated genes. For each community, we calculate a p-value for every GO term by Fisher's exact test. If the detected communities are biologically meaningful, they should be highly significant for a number of GO terms. After a $\log_{10}$ transformation of these p-values, define
$$\text{ratio}_t = \frac{|\{-\log_{10} \text{p-value} > t\}|}{|\{-\log_{10} \text{p-value} > 0\}|}$$
for a threshold $t$. This ratio can be viewed as an indicator of the biological relatedness of the detected communities: at the same cutoff $t$, a larger ratio corresponds to more biologically meaningful communities. The ratio curves of the methods are shown in Figure 5, left panel. The curve of L0LapT lies largely above the other curves.
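The ratio curve is easy to compute from the per-community GO p-values; a small sketch (with illustrative p-values, not values from the study):

```python
import numpy as np

def go_ratio(pvalues, t, baseline=0.0):
    """ratio_t = #{-log10 p > t} / #{-log10 p > baseline}."""
    logs = -np.log10(np.asarray(pvalues, dtype=float))
    denom = (logs > baseline).sum()
    return (logs > t).sum() / denom if denom else 0.0

pvals = [1e-8, 1e-5, 1e-3, 0.02, 0.2, 0.6]   # illustrative p-values
print([round(go_ratio(pvals, t), 2) for t in (1, 2, 4)])  # [0.67, 0.5, 0.33]
```

The restricted ratio $r_t$ used below corresponds to passing `baseline=1`, i.e. keeping only p-values below 0.1 in the denominator.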
However, when $t$ is large, the differences are hard to see. Therefore, we further restrict attention to p-values less than 0.1 and define
$$r_t = \frac{|\{-\log_{10} \text{p-value} > t\}|}{|\{-\log_{10} \text{p-value} > 1\}|}$$
for any threshold $t \ge 1$. The new ratio curves are shown in Figure 5, right panel. We can now clearly see that the curve of L0LapT is always above those of the other methods.

Conclusion and discussion
In this paper, we propose a community detection method based on maximizing a tightness criterion. The method does not require a known community number, and it can detect communities in networks with outliers. We prove consistency results for the DCSBM with and without outliers. Simulation studies and real data applications show that the proposed method generally performs better than other available algorithms. One remaining problem is that, although the proposed method generally estimates the community number more accurately, it still cannot give a very accurate estimate when networks are very noisy or very sparse. In addition, the statistical test used in this paper is permutation-based; although simulations show that it works well in general for filtering false communities, we were unable to develop theoretical guarantees for this test.
The ADMM Algorithm 1 cannot guarantee a global maximum. Recently, a few papers have shown that global optimizers can be identified by local adjustments [36]. These methods could be generalized to our optimization problem (2.4) and deserve future research. If the community number $K$ is known, the tightness criterion (2.1) can be generalized to a partition of $V$.
True communities should have a large $\psi(G_1, ..., G_K)$, and we may detect communities by maximizing $\psi(G_1, ..., G_K)$ over all partitions of $V$. Similarly, this optimization problem can be approximated by a graph Laplacian problem.

Appendix

Then the sum $X = \sum_{i=1}^n X_i$ has expectation $E(X) = \sum_{i=1}^n p_i$, and we have the tail bound of Lemma 7.1. Proof. By Lemma 7.1 and the condition $p_- \gtrsim \log n/n^{1-2\alpha}$, together with the Cauchy–Schwarz inequality, we have $E(W(S)|c) \gtrsim n^{1+2(\alpha-\delta)} \log n$ if $S \in \Gamma_\delta$. Let $\lambda = 2\sqrt{n \log n\, E(W(S)|c)}$; then by Chernoff's inequality, since $E(W(S)|c) \gtrsim n^{1+2(\alpha-\delta)} \log n$, we have $\lambda/3 < E(W(S)|c)$ for sufficiently large $n$ and thus $P\left(W(S) - E(W(S)|c) > \lambda\right) < n^{-n}$.
For $V(S)$, we argue similarly: let $\tilde{\lambda} = 2\sqrt{n \log n\, E(V(S)|c)}$, and the corresponding tail bound follows. Combining these bounds, with probability at least $1 - 4/n^n$ the stated inequalities hold, and therefore, with probability at least $1 - 2^{n+2}/n^n$, the conclusion holds when $n$ is sufficiently large.
Case II: $y_1 > \min_{1 \le k \le K} y_k$. There exists $i \neq 1$ such that $y_i < y_1$, and then $z_+/y_1 < x_i/y_i$. If $\bar{y}(1 - t^*_1) \ge y_1 t^*_1$ and $z_+/y_1 \le \max_{2 \le k \le K} x_k/y_k$, then the claimed bound on $\max_k (x_k/y_k)$ follows.
This establishes the desired inequality. Based on the lemmas given above, we now give the proof of Theorem 3.1.