Revealing network communities through modularity maximization by a contraction–dilation method

Many real-world systems can be described by networks whose structures relate to functional properties. An important way to reveal topology–function correlations is to detect community structures, which can be well evaluated by graph modularity. By maximizing modularity, large networks can be divided into naturally separated groups. Here, we propose a contraction–dilation algorithm based on single-node-move operations and a perturbation strategy. Tests on artificial and real-world networks show that the algorithm discovers community structures with high modularity scores and accuracies at low cost in both time and memory.


Introduction
Scientifically interesting complex networks often show potential group structures, i.e. a whole network consists of loosely connected dense parts. Such structures usually have strong correlations to functionalities of the networks. For example, relatively independent groups of molecules in a metabolic network might imply functionally independent modules or pathways [1]; a densely linked group of web pages on the World Wide Web might correspond to related topics [2]. Finding loosely tied dense groups of nodes, also known as clusters or communities, is thus an important approach to analyzing complex networks.
Network clustering has been a hot topic in many research fields such as sociology, pattern recognition and bioinformatics for decades [3]–[6]. Among the various methods proposed for network clustering, modularity-based approaches have attracted much attention recently. The modularity proposed by Newman and Girvan [7] is an evaluation index of community structures: a large value of modularity corresponds to a good partitioning of the network. In this framework, discovering communities is realized by maximizing the modularity.
It should be noted that there are essentially unlimited possible measures for assessing network partitioning from different points of view. For example, normalized cut [8] is another popular criterion that is widely used in computer science [9,10]. No universal, operational standard is available for judging a criterion, and all known criteria encounter problems under certain conditions. A recently raised concern is the resolution limit of modularity, discovered by Fortunato and Barthelemy [11], who proved that small, tightly connected communities may be theoretically indistinguishable by maximizing modularity. Fortunately, this problem can be addressed within the framework of modularity maximization in a hierarchical manner, as shown by Arenas et al [12].
The nature of modularity optimization is to compare the modularity values among a large number of partitions of a network. This requires evaluating the modularity function many times and consequently demands much computational effort. Nonetheless, if a partition is obtained by changing the community index of just a single node in another partition, the difference of modularity between the two community structures can be calculated at very little expense. Moving a single node at a time is thus a very efficient method to drive the system into a local optimum and is often employed as a refinement technique. For example, Ruan and Zhang [13] adopted this technique in the refining stage of their two-stage procedure, and Schuetz and Caflisch [14] used it to improve the results obtained by their multi-step greedy algorithm. When incorporated into an appropriate global search strategy, however, single-node moves can play the key role in finding partitioning solutions. For example, Guimera and Amaral [15] and later Reichardt and Bornholdt [16] used the simulated annealing (SA) optimization method to prevent the single-node-move procedure from getting trapped in local optima.
Although significant advances in community detection methods have been made, problems of efficiency and memory requirements remain, and new algorithms are still needed to treat very large networks and obtain better results. For this purpose, we present here a simple but efficient method, called the contraction–dilation (CD) algorithm. Compared with existing algorithms [13,17,18], the proposed method finds better solutions with higher modularity in less CPU time.

The CD algorithm
According to Newman and Girvan [7], the modularity of a network with n nodes and m edges can be written as

Q = (1/2m) Σ_{ij} [A_{ij} − d_i d_j/(2m)] δ(c_i, c_j),   (1)

where A is the adjacency matrix of the network (the element A_{ij} = 1 if an edge is present between nodes i and j, and 0 otherwise), d_i is the degree of node i, and δ(x, y) = 1 if x = y and 0 otherwise. The community index of node i is denoted by c_i ∈ {1, 2, 3, . . . , c_max} (i = 1, 2, . . . , n), where c_max is a predefined maximal number of communities. In practice, the performance of the algorithm is not sensitive to c_max, which can simply be assigned a sufficiently large value.
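As a concrete illustration (our sketch, not the authors' code), the modularity (1) can be evaluated equivalently as Q = Σ_α [e(α)/m − (d(α)/2m)²], summing over communities, where e(α) is the number of intra-community edges and d(α) the total degree of community α:

```python
# Sketch: Newman-Girvan modularity of an undirected graph given as an
# edge list, using the community-wise form of eq. (1).

def modularity(n, edges, c):
    """n nodes (0..n-1), edges as (i, j) pairs, c[i] = community of node i."""
    m = len(edges)
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    intra = sum(1 for i, j in edges if c[i] == c[j])   # edges inside communities
    dsum = {}                                          # total degree d(alpha)
    for i in range(n):
        dsum[c[i]] = dsum.get(c[i], 0) + deg[i]
    return intra / m - sum((d / (2.0 * m)) ** 2 for d in dsum.values())

# Two triangles joined by one edge; the natural split scores Q = 5/14.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(6, edges, [0, 0, 0, 1, 1, 1]))   # ≈ 0.357
```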
The goal of a modularity-based optimization algorithm is to find an integer vector c that maximizes Q. The main operation of our algorithm is to move a single node, which can be described by a least-change operator, denoted by T_{iα}, acting on the community index vector c to yield a new community structure c′. Formally, moving node i to community α is represented as

c′ = T_{iα} c,   (2)

where

(T_{iα} c)_j = α if j = i, and c_j otherwise.   (3)

The difference of modularity caused by moving a single node can be deduced as

ΔQ_{iα} = [d_i(α) − d_i(c_i)]/m − d_i [d(α) − d(c_i) + d_i]/(2m²),   (4)

where d_i(α) is the number of node i's connecting partners in community α, and d(α) the total degree of the nodes in community α (so that d(c_i) includes node i itself). By using (4), the modularity difference caused by a single-node move can be computed at cost O(1); this is why our search strategy relies mainly on the single-node-move operations.
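The O(1) increment can be checked against a full recomputation of Q. The sketch below (ours, with the convention that d(c_i) includes node i's own degree) computes both and confirms they agree:

```python
# Sketch: the single-node-move modularity increment
#   dQ = [d_i(alpha) - d_i(beta)]/m - d_i*[d(alpha) - d(beta) + d_i]/(2*m*m),
# beta = c[i], verified against a brute-force recomputation of Q.

def full_q(n, edges, c):
    m = len(edges)
    deg = [0] * n
    dsum = {}
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    intra = sum(1 for i, j in edges if c[i] == c[j])
    for i in range(n):
        dsum[c[i]] = dsum.get(c[i], 0) + deg[i]
    return intra / m - sum((d / (2.0 * m)) ** 2 for d in dsum.values())

def delta_q(n, edges, c, i, alpha):
    m = len(edges)
    beta = c[i]
    if alpha == beta:
        return 0.0
    deg = [0] * n
    nbrs = []
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        if u == i:
            nbrs.append(v)
        if v == i:
            nbrs.append(u)
    d_i_alpha = sum(1 for v in nbrs if c[v] == alpha)   # links of i into alpha
    d_i_beta = sum(1 for v in nbrs if c[v] == beta)     # links of i into own community
    d_alpha = sum(deg[v] for v in range(n) if c[v] == alpha)
    d_beta = sum(deg[v] for v in range(n) if c[v] == beta)   # includes node i
    return ((d_i_alpha - d_i_beta) / m
            - deg[i] * (d_alpha - d_beta + deg[i]) / (2.0 * m * m))

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
c = [0, 0, 0, 1, 1, 1]
c2 = list(c)
c2[2] = 1   # move node 2 across the bridge
print(round(delta_q(6, edges, c, 2, 1), 6),
      round(full_q(6, edges, c2) - full_q(6, edges, c), 6))
# → -0.234694 -0.234694
```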
Our algorithm starts with all nodes randomly distributed among c_max communities. To improve this initial community partition, the algorithm repeatedly moves a single node to the community that maximizes the increment of modularity, until no further improvement can be obtained. Because the differences of modularity can be computed very efficiently, this adjustment of the nodes' communities proceeds fast. In the process, nodes gradually aggregate into a smaller number of communities, and more and more communities become empty. Since the number of effective (non-empty) communities decreases monotonically, we call this the contraction process.
After a contraction, all nodes are arranged in relatively appropriate communities with a much higher modularity value. This state is a local optimum with respect to the single-node-move operations, i.e. the modularity cannot be increased by changing the community of any single node. To find better community structures, other operations are thus needed to drive the system out of the local optimum. We use a simple method that migrates some of the nodes into randomly chosen communities, forcing the number of effective communities to increase suddenly; this procedure is called dilation.
In a dilation process, each node changes its community index with a given probability p, called the perturbation rate, which controls how strongly the output community structure of the contraction process is destroyed. When p = 0, the community structure remains unchanged; when p = 1, the community structure becomes totally dissimilar to the output of the contraction process. For p between 0 and 1, part of the previous community structure is retained, so the subsequent optimization can build on earlier achievements.
The whole procedure applies the contraction and dilation operations iteratively, as shown in the following algorithm:

(Initialization)
1. Settle all nodes randomly in c_max communities; record the community structure as c_best and the Q value as Q_best
2. repeat
   (Contraction)
3.   repeat
4.     for each node i
5.       move node i to community β = arg max_α ΔQ_{iα}
6.     endfor
7.   until Q cannot be increased any more
8.   if Q > Q_best, update c_best and Q_best
   (Dilation)
9.   for each node i
10.      with perturbation rate p, move node i to a randomly chosen community
11.  endfor
12. until NITER times

In the above algorithm, NITER is the given number of iterations of the contraction and dilation processes, and arg max_α ΔQ_{iα} denotes the value of the argument α that maximizes ΔQ_{iα}. As the contraction and dilation processes proceed, the modularity gradually reaches a high level, and further application of the procedures becomes unlikely to find better community structures; the whole procedure should stop when this happens. Experimental observations show that it takes about 30–70 iterations to reach this stage, usually with rather high Q values.
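The loop above can be sketched in runnable form. This is our illustration, not the authors' implementation: for brevity the community-degree bookkeeping is recomputed per node instead of being maintained incrementally (the incremental bookkeeping is what makes each candidate move O(1) in practice), and the values of cmax, niter, p and the example graph are arbitrary choices:

```python
import random

def contract(adj, deg, m, c, cmax):
    """Greedy single-node moves until no move increases Q."""
    improved = True
    while improved:
        improved = False
        for i in adj:
            d_i = {}                      # links from i into each community
            for j in adj[i]:
                d_i[c[j]] = d_i.get(c[j], 0) + 1
            dsum = {}                     # total degree of each community
            for v in adj:
                dsum[c[v]] = dsum.get(c[v], 0) + deg[v]
            beta, best, best_dq = c[i], c[i], 0.0
            for alpha in range(cmax):
                if alpha == beta:
                    continue
                dq = ((d_i.get(alpha, 0) - d_i.get(beta, 0)) / m
                      - deg[i] * (dsum.get(alpha, 0) - dsum[beta] + deg[i])
                        / (2.0 * m * m))
                if dq > best_dq + 1e-12:
                    best, best_dq = alpha, dq
            if best != beta:
                c[i] = best
                improved = True

def modularity(adj, deg, m, c):
    intra = sum(1 for i in adj for j in adj[i] if j > i and c[i] == c[j])
    dsum = {}
    for i in adj:
        dsum[c[i]] = dsum.get(c[i], 0) + deg[i]
    return intra / m - sum((d / (2.0 * m)) ** 2 for d in dsum.values())

def cd(adj, cmax=20, niter=30, p=0.6, seed=0):
    rng = random.Random(seed)
    deg = {i: len(adj[i]) for i in adj}
    m = sum(deg.values()) // 2
    c = {i: rng.randrange(cmax) for i in adj}   # random initial communities
    best_c, best_q = dict(c), -1.0
    for _ in range(niter):
        contract(adj, deg, m, c, cmax)          # contraction
        q = modularity(adj, deg, m, c)
        if q > best_q:
            best_q, best_c = q, dict(c)
        for i in adj:                           # dilation with rate p
            if rng.random() < p:
                c[i] = rng.randrange(cmax)
    return best_c, best_q

# Example: two triangles joined by a single edge.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels, q = cd(adj, cmax=6, niter=30, p=0.6, seed=1)
# {0,1,2} and {3,4,5} come out as the two communities (Q = 5/14).
```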

By the end of the contraction and dilation processes, most nodes have aggregated into tight groups, but a small proportion of nodes still form small, loosely connected communities. This is usually unfavorable to good partitioning in terms of modularity. To overcome this problem, we eliminate the unfavorable communities so as to reduce the inter-group links and increase the modularity. The elimination is realized by a greedy community-merging procedure, an effective technique also used by other authors [13,19]. Numerical experiments show that the final merging indeed raises the modularity values by small but appreciable amounts. Since community merging usually changes the states of only a small fraction of the nodes, it acts as a refinement process; for networks with relatively clear community structures, the contraction and dilation procedure alone is enough to achieve satisfactory results. To efficiently identify a good merging, we precompute the change in Q for each potential merging. Specifically, the difference of modularity caused by merging communities α and β is given by

ΔQ_{αβ} = e(α, β)/m − d(α) d(β)/(2m²),   (5)

where e(α, β) is the number of inter-community edges between α and β.
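A minimal sketch of the greedy merging (our illustration): repeatedly apply the merge with the largest positive gain e(α, β)/m − d(α)d(β)/(2m²) until none remains:

```python
# Sketch: greedy community merging. The pairwise counts are recomputed
# each round for clarity; a production version would update them in place.

def greedy_merge(edges, deg, c):
    """edges: (i, j) pairs; deg[i]: degree of i; c[i]: community of i."""
    m = len(edges)
    while True:
        e_ab, dsum = {}, {}
        for i, j in edges:                       # inter-community edge counts
            if c[i] != c[j]:
                key = (min(c[i], c[j]), max(c[i], c[j]))
                e_ab[key] = e_ab.get(key, 0) + 1
        for i, d in enumerate(deg):              # total community degrees
            dsum[c[i]] = dsum.get(c[i], 0) + d
        best, best_dq = None, 1e-12
        for (a, b), e in e_ab.items():
            dq = e / m - dsum[a] * dsum[b] / (2.0 * m * m)
            if dq > best_dq:
                best, best_dq = (a, b), dq
        if best is None:                         # no positive merge remains
            return c
        a, b = best
        c = [a if x == b else x for x in c]      # absorb b into a

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(greedy_merge(edges, [2, 2, 3, 3, 2, 2], [0, 0, 0, 1, 1, 2]))
# → [0, 0, 0, 1, 1, 1]: the stray singleton {5} is merged back.
```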

Results and discussion
To test the performance of our CD algorithm, we applied it to a family of computer-generated networks with heterogeneous community structures and to several real-world networks, and compared it with Newman's spectral-with-refinement algorithm (SR) [18], Ruan and Zhang's Qcut algorithm [13], and Lehmann and Hansen's mean-field (MF) optimization algorithm [17]. Finally, we tested our algorithm on a class of benchmark graphs suggested recently by Lancichinetti et al [20]. We performed all the tests on a server with two Intel X5460 CPUs and 16 GB RAM; only one CPU core was used because none of the programs is parallelized.

Computer-generated networks-Zipf block model
Network models with controllable, explicit structures are indispensable for testing and comparing community detection algorithms. The random block model with evenly sized communities has been extensively employed as a benchmark [7,21]. In such model networks, all intra- and inter-community node pairs are connected with probabilities p_in and p_out, respectively. Despite its simplicity and ease of use, this model is oversimplified. Considering the inhomogeneity of most realistic networks, we use a new block model with heterogeneous community structures controlled by an extra parameter γ. Community sizes in this model are designed to obey Zipf's law [22] with exponent γ. This is realized by assigning every node to the αth community with probability

P(α) = α^{−γ}/C,   (6)

where k is the number of designed communities and

C = Σ_{β=1}^{k} β^{−γ},   (7)

which guarantees normalization of the probabilities.
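Sampling the designed labels is straightforward; the sketch below (ours, with an arbitrary seed) uses the parameters reported later in this section and reproduces the stated community sizes of roughly 670 down to roughly 170 nodes:

```python
import random

# Sketch: sample Zipf-block-model community labels, node i joining
# community alpha (alpha = 1..k) with probability alpha**(-gamma) / C.

def zipf_assignments(n, k, gamma, seed=0):
    rng = random.Random(seed)
    weights = [alpha ** (-gamma) for alpha in range(1, k + 1)]
    C = sum(weights)                       # normalization constant
    return rng.choices(range(1, k + 1), weights=[w / C for w in weights], k=n)

labels = zipf_assignments(3000, 10, 0.6, seed=42)
sizes = [labels.count(a) for a in range(1, 11)]
# With n = 3000, k = 10 and gamma = 0.6 the expected sizes run from
# about 674 down to about 169, matching "about 670" / "about 170".
```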

The probability of a node establishing a link with a partner outside its community is set to p_out. Denoting the overall average node degree by d, the link probability of a pair of nodes belonging to the same community α can be written as

p_in(α) = [d − p_out (n − n_α)]/(n_α − 1),   (8)

where n_α and n are the numbers of nodes in community α and in the whole network, respectively. Obviously, the intra-community link densities differ across communities. Define the community uncertainty as

χ = p_out (n − 1)/d.   (9)

It can be seen from (8) that as χ goes from 0 to 1, the model moves from clear to ambiguous community structures. When χ = 1, the designed communities are totally undetectable because p_in(α) = p_out for every α. Let Q_design be the modularity value corresponding to the designed community structure. Deduced from (1) and (6)–(9), Q_design for a Zipf block model network turns out to be

Q_design = [1 − χ n/(n − 1)] [1 − Σ_α (n_α/n)²].   (10)

In this work, we design a family of Zipf block model networks. For each network, we set n = 3000, d = 50 and γ = 0.6. The nodes in each network are divided into 10 communities, so the biggest community has about 670 nodes whereas the smallest one has about 170 nodes. The community uncertainty χ is used as the control parameter to change the detectability of the community structures. We vary χ from 0.64 to 0.89; the two extreme cases correspond to, on average, 29 and 40 outside-community links per node, respectively. Link densities of this model for two different χ values are depicted in figure 1. Figures 2(a)–(c) show the results of the four community detection methods, CD, Qcut, SR and MF, applied to the model networks. We see from figure 2(a) that CD uniformly outperforms SR and MF in maximizing Q. When χ is small, CD and Qcut find almost the same Q values; as χ grows and the community structure becomes ambiguous, CD finds higher Q values than Qcut. It should be noted that the community structure is not particularly clear even at χ = 0.64, since a node then has on average more inter- than intra-community links.
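The model relations can be checked numerically. In the sketch below, the formulas p_out = χd/(n − 1), p_in(α) = [d − p_out(n − n_α)]/(n_α − 1) and Q_design = [1 − χn/(n − 1)][1 − Σ_α(n_α/n)²] are our reconstruction from the surrounding text, not taken verbatim from the paper:

```python
# Sketch: evaluate the reconstructed Zipf-block-model relations for the
# parameters used in this section (n = 3000, d = 50, gamma = 0.6, k = 10).

n, d, gamma, k = 3000, 50.0, 0.6, 10
C = sum(a ** (-gamma) for a in range(1, k + 1))
sizes = [n * a ** (-gamma) / C for a in range(1, k + 1)]   # ~674 .. ~169

def q_design(chi):
    return (1 - chi * n / (n - 1)) * (1 - sum((na / n) ** 2 for na in sizes))

for chi in (0.64, 0.89):
    p_out = chi * d / (n - 1)
    p_in = [(d - p_out * (n - na)) / (na - 1) for na in sizes]
    assert all(pi > p_out for pi in p_in)   # communities denser inside (chi < 1)
    print(round(q_design(chi), 3))
# prints 0.315 then 0.096: the designed modularity drops as chi grows.
```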
The Q values found by CD are higher than the designed ones for all χ values. Since the model community structure is known in advance, the fraction of nodes classified in accordance with the designed community indices can be computed. As shown in figure 2(b), when χ is less than 0.78, CD and Qcut successfully detect all communities with almost 100% coincidence. When χ is larger than 0.78, although CD still finds higher Q values than the designed ones, the distribution of nodes into communities is no longer identical to the designed community structure, and the coincidence therefore decreases dramatically. This low coincidence is caused by the merging of multiple designed communities into larger ones, in accordance with the predictions of Fortunato and Barthelemy concerning the resolution limit of community detection [11]. As shown in figure 2(c), for example, when χ = 0.89 CD finds on average seven of the ten designed communities.

Choice of perturbation rate p
We use a Zipf block model with χ = 0.84 to investigate the influence of the perturbation rate p on the overall performance of the algorithm. As seen from figure 3, the average CPU time increases monotonically with p. This is reasonable, because larger perturbation rates drive more nodes into random communities in the dilation procedure and thus require more effort to rearrange them into suitable locations. However, the longest CPU time does not yield the maximum Q value. For the same number of iterations, Q peaks at p = 0.6, where the corresponding CPU time is about 3 s. As p grows beyond 0.6, Q decreases; at p = 1 the found modularity value is the worst even though it takes the longest time to compute. Selecting an appropriate p can therefore dramatically improve the performance in terms of both CPU time and Q value. Our numerical experiments show that a choice of p between 0.4 and 0.8 is suitable for most tested cases.

Figure 1. The number of nodes is set to 3000 and the average degree of each node is fixed at 50. All nodes are assigned to 10 communities, whose sizes obey Zipf's law with exponent γ = 0.6. The blockwise link densities are determined by the community uncertainty χ; when χ approaches 1, the community structures become more and more ambiguous. In our numerical experiments, χ ranges from 0.64 to 0.89, corresponding to relatively clear through ambiguous community structures. The left and right parts show the link densities for the two extreme cases. Gray levels represent link densities: the darker the color, the denser the links. Note that when χ = 0.89, the difference between the intra- and inter-community link densities is very small, so the communities are very hard to detect.

Real-world networks
In order to further test our algorithm, we chose a set of networks of different sizes: the karate club network of Zachary [23], the network of American football games between Division IA colleges [24], a metabolic network of C. elegans [25], a network of email contacts at a university [26], two protein–protein interaction (PPI) networks, of Escherichia coli and Saccharomyces cerevisiae [27], and a coauthorship network of scientists working in condensed matter physics [28].
Performance on the real-world networks is summarized in table 1. CD clearly outperforms MF and SR in terms of both quality (higher Q values) and efficiency (less CPU time) for networks with more than 100 nodes. CD performs identically to Qcut on very small networks, but is faster than Qcut and simultaneously reaches higher Q values on larger ones. MF is sensitive to the number of initial communities, n_c0: larger n_c0 leads to smaller Q values. For instance, on the PPI network of S. cerevisiae, MF with n_c0 = 300 takes more than 4 min to reach its largest found modularity, Q_max = 0.4873, whereas with n_c0 = 20 it takes ∼0.7 min to reach a higher Q_max = 0.5459. This is a shortcoming of MF, because in practice it is not trivial to guess a good value of n_c0. For all tested networks, n_c0 for MF was tuned to its optimum value; in comparison, the other methods do not need n_c0 to be set with such care.
SA is a common global optimization technique that has been used in community detection [15,16]. The main difference between SA and CD is that SA avoids getting trapped in local optima by allowing some unfavorable node movements that decrease the modularity. The large number of unfavorable movements in SA lowers the search efficiency. Table 2 lists the test results of the SA method proposed by Guimera and Amaral [15], with the cooling factor set to 0.8, 0.95 and 0.995 and the other parameters fixed as recommended by the authors. It can be seen that SA yields comparably good Q values but takes much longer computational time.

Notes to table 1: (a) the modularity value found by SR is taken directly from [18]; (b) for the collaboration network, the MF and SR programs ran out of memory and produced no output.
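For contrast, SA's unfavorable moves are typically admitted by a Metropolis acceptance rule. The sketch below follows the generic scheme (not the exact code of [15]); the temperature and move size are illustrative numbers:

```python
import math
import random

# Sketch: Metropolis acceptance for SA-based modularity maximization.
# A move with modularity change dq is always accepted when dq >= 0 and
# accepted with probability exp(dq / T) when dq < 0; T is lowered by a
# cooling factor between sweeps.

def sa_accept(dq, T, rng):
    return dq >= 0 or rng.random() < math.exp(dq / T)

rng = random.Random(0)
T = 0.01
accepted = sum(sa_accept(-0.005, T, rng) for _ in range(10000))
# At T = 0.01, roughly exp(-0.5) ~ 61% of such unfavorable moves pass;
# CD replaces this stream of downhill moves with a single random
# perturbation (the dilation step) after each greedy contraction.
```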

A class of benchmark networks
Recently, Lancichinetti et al [20] introduced a model that generates networks with power-law distributed node degrees (with exponent γ) and community sizes (with exponent β), simulating the main features of a wide range of real-world networks. The ability of an algorithm to rediscover the designed community structures is of great interest for estimating its performance on real-world networks. We used the software of Lancichinetti et al (http://santo.fortunato.googlepages.com/benchmark.tgz) to test the accuracy of our algorithm in determining the designed community structures. To evaluate the accuracy, we employ the normalized mutual information, a measure borrowed from information theory and used in [20,21], to compute the similarity between the designed partitions and those found by the algorithm. The normalized mutual information ranges from 0 to 1, with 1 implying perfect agreement. The ambiguity of the community structures in Lancichinetti et al's model is controlled by a mixing parameter µ, defined as the expected proportion of each node's links that go outside its community. When µ > 0.5, most links of each node connect to outside partners, so the community structures are rather ambiguous. From figure 4 we see that even in the ambiguous region, the community structures found by CD agree strongly with the designed ones. On the same models, the methods tested in [20] gave significantly lower agreement.

Figure 5. Performance of CD on the collaboration network [28] and the S. cerevisiae protein–protein network [27] with different values of NITER. Each data point is the mean and standard deviation of 50 independent runs.
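A compact sketch of the similarity measure (ours, using the standard normalization 2I(X;Y)/(H(X)+H(Y)) as in [20,21]):

```python
import math
from collections import Counter

# Sketch: normalized mutual information between two partitions, given as
# label lists over the same nodes. Value 1 means identical partitions up
# to relabeling; 0 means the partitions are independent.

def nmi(x, y):
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    ixy = sum(c / n * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
              for (a, b), c in pxy.items())
    return 2 * ixy / (hx + hy) if hx + hy > 0 else 1.0

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # relabeled but identical -> 1.0
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions  -> 0.0
```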

Influence of NITER
In general, more runs of the contraction and dilation processes result in higher modularity values. However, as the optimization proceeds, better partitionings become gradually harder to find, i.e. the search efficiency decreases, so a tradeoff should be made between effort and gain. In practice, a moderate number of runs of the contraction and dilation processes is usually enough to produce good results, because the deficiency of inadequate optimization can be compensated by the final refinement. This can be seen from figure 5, where we present examples of the influence of different values of NITER on the performance of CD for the collaboration network [28] and the S. cerevisiae protein–protein network [27]. CD yields high-modularity solutions within a few iterations of the contraction and dilation processes, and satisfactory modularity values are reached when NITER is set to approximately 30. For the collaboration network, we tested nine different values of NITER. Already at NITER = 5, CD finds a modularity value larger than those found by Qcut and SR (table 1). As NITER varies from 30 to 70, the modularity increases slightly and reaches its maximum. The result for the S. cerevisiae PPI network is similar, and there CD needs fewer runs of the contraction and dilation processes to obtain the highest modularity value. Experimental observations show that a choice of NITER between 30 and 70 is suitable for most tested cases.

Figure 6. The optimal community structures of Zachary's karate club network at three resolution scales. Nodes belonging to the same community are depicted in identical colors and shapes. The case r = 0 corresponds to the best community structure of the original network.

Conclusions
To find communities by maximizing network modularity, we propose a simple and efficient method based on a contraction–dilation strategy. The contraction procedure optimizes community structures essentially through the single-node-move operation at very low computational expense; it rapidly decreases the number of initially assigned communities, hence the name contraction. To drive the system out of local optima, the dilation procedure partially destroys the community structures found by contraction and sharply increases the number of communities again. When the contraction and dilation procedures eventually finish, we apply a merging procedure to refine the community structure.
The proposed method outperforms existing algorithms, finding higher modularity values and accuracies in less CPU time on computer-generated networks as well as real-world examples. The memory required by CD is approximately O(n c_max), used mainly to store the counts d_i(α) (the number of node i's connecting partners in community α). This low memory demand is a further strength of the algorithm and makes it possible to analyze very large networks; in contrast, MF and SR could not be applied to the collaboration network with 27 519 nodes in our tests because of their high memory requirements.
It is also worth noting that our algorithm can be applied directly to weighted networks. Owing to this, it is easy to use CD within Arenas et al's approach [12] to cope with the resolution limit of modularity-based community detection. For example, the original Zachary karate club network is detected to have four communities, as shown in figure 6(a). Following Arenas et al's method, we rescale the network topology by adding a self-loop of weight r to every node, i.e. adding r to the diagonal entries of the adjacency matrix. When r = −2, the optimal partition has two communities (figure 6(b)); as r increases, the optimal communities become gradually smaller, as shown in figure 6(c) for r = 2. The best partitions in terms of modularity for different values of r reveal the community structures of the original network at multiple resolution levels.
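Only the evaluation of Q changes under this rescaling; any modularity maximizer can then be run unchanged at each resolution r. A minimal sketch (ours), treating the self-loops as weights so that node strengths become d_i + r and the total weight becomes 2m + nr:

```python
# Sketch: modularity of A + r*I (Arenas et al's multi-resolution
# rescaling). r = 0 reproduces the ordinary modularity; large positive r
# favors finer partitions, negative r favors coarser ones.

def modularity_r(n, edges, c, r):
    two_m = 2.0 * len(edges) + n * r       # total weight incl. self-loops
    deg = [float(r)] * n                   # strength = degree + r
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    # intra-community weight: every self-loop is intra by definition
    intra = n * r + 2.0 * sum(1 for i, j in edges if c[i] == c[j])
    dsum = {}
    for i in range(n):
        dsum[c[i]] = dsum.get(c[i], 0.0) + deg[i]
    return intra / two_m - sum((d / two_m) ** 2 for d in dsum.values())

# Two triangles joined by a bridge: the two-community split is best at
# r = 0, while singletons overtake it for large positive r.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split, singles = [0, 0, 0, 1, 1, 1], [0, 1, 2, 3, 4, 5]
```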