Heuristic Modularity Maximization Algorithms for Community Detection Rarely Return an Optimal Partition or Anything Similar

Aref, Samin; Mostajabdaveh, Mahdi; Chheda, Hriday

doi:10.1007/978-3-031-36027-5_48

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 10476))

Included in the following conference series:

International Conference on Computational Science

1213 Accesses
2 Citations
1 Altmetric

Abstract

Community detection is a fundamental problem in computational sciences with extensive applications in various fields. The most commonly used methods are the algorithms designed to maximize modularity over different partitions of the network nodes. Using 80 real and random networks from a wide range of contexts, we investigate the extent to which current heuristic modularity maximization algorithms succeed in returning maximum-modularity (optimal) partitions. We evaluate (1) the ratio of the algorithms’ output modularity to the maximum modularity for each input graph, and (2) the maximum similarity between their output partition and any optimal partition of that graph. We compare eight existing heuristic algorithms against an exact integer programming method that globally maximizes modularity. The average modularity-based heuristic algorithm returns optimal partitions for only 19.4% of the 80 graphs considered. Additionally, results on adjusted mutual information reveal substantial dissimilarity between the sub-optimal partitions and any optimal partition of the networks in our experiments. More importantly, our results show that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of commonly used modularity-based heuristics for discovering communities: they rarely produce an optimal partition or a partition resembling an optimal partition. If modularity is to be used for detecting communities, exact or approximate optimization algorithms are recommendable for a more methodologically sound usage of modularity within its applicability limits.

You have full access to this open access chapter, Download conference paper PDF

Keywords

1 Introduction

Community detection (CD), the process of inductively identifying communities within a network, is a core problem in computational sciences, particularly, in physics, computer science, biology, and computational social science [19, 50]. Among common approaches for CD are the algorithms which are designed to maximize a utility function, modularity [37], across all possible ways that the nodes of the input network can be partitioned into communities. Modularity measures the fraction of edges within communities minus the expected fraction if the edges were distributed randomly; with the random distribution of the edges being a null model that preserves the node degrees. Despite their name and design philosophy, current modularity maximization algorithms, which are used by no less than tens of thousands of peer-reviewed studies [28], are not guaranteed to maximize modularity [24, 35, 38]. This has led to uncertainty [20, 24] in the extent to which they succeed in returning a maximum-modularity (optimal) partition or something similar.

Modularity is among the first objective functions proposed for optimization-based detection of communities [18, 37]. Several limitations [16, 18, 21, 41] of modularity including the resolution limit [17] have led researchers to develop alternative CD methods using stochastic block modeling [23, 32, 40, 45], information theoretic approaches [43, 44], and alternative objective functions [2, 34, 36, 47]. Modularity-based algorithms are the most commonly used method for CD [19, 46]. Despite the widespread adoption of modularity-based heuristics, there is uncertainty [20, 24] in their success in maximizing modularity. This study aims to address this uncertainty by quantifying the extent to which eight commonly used heuristics [7, 8, 13, 30, 31, 46, 48, 50] succeed in returning an optimal partition or a partition resembling an optimal partition. After describing the methods and materials, we present the main results followed by a discussion of the methodological ramifications and future directions.

2 Methods and Materials

This study aims to investigate the extent to which eight commonly used heuristic modularity maximization algorithms [7, 8, 13, 30, 31, 46, 48, 50] succeed in returning an optimal partition or a partition similar to an optimal partition. To achieve this objective, we quantify the proximity of their results to the globally optimal partition(s), which we obtain using an exact Integer Programming (IP) model for maximizing modularity [1, 9, 15]. We do not claim that maximum-modularity partitions represent best partitions. Throughout the paper, we use the terms network and graph interchangeably.

2.1 Modularity

Consider the simple graph $G=(V,E)$ with $|V|=n$ nodes, $|E|=m$ edges, adjacency matrix entries $a_{ij}$, and a partition $X=\{V_1,V_2, \dots , V_k \}$ of the node set V into k communities. The modularity function $Q_{(G,X)}$ is computed [18, 37] according to Eq. (1)

$$\begin{aligned} Q_{(G,X)}= \frac{1}{2m} \sum \limits _{(i,j) \in V^2} \left( a_{ij} - \gamma \frac{d_id_j}{2m}\right) \delta (i,j) \end{aligned}$$

(1)

where $d_i$ represents the degree of node i, $\gamma $ is the resolution parameter^{Footnote 1}, and $\delta (i,j)$ is 1 if nodes i and j are in the same community otherwise 0. The term associated with each pair of nodes (i, j) is alternatively represented as $b_{ij}=a_{ij} -\gamma \frac{d_id_j}{2m}$ and referred to as the modularity matrix entry for (i, j).

2.2 Modularity Maximization

The modularity maximization problem for input graph $G=(V,E)$ involves finding a partition $X^*$ whose associated $Q_{(G,X^*)}$ is globally maximum over all possible partitions of the node set V.

2.3 Sparse IP Formulation of Modularity Maximization

Consider the simple graph $G=(V,E)$ with modularity matrix entries $b_{ij}$, obtained using the resolution parameter $\gamma $. We use the binary decision variable $x_{ij}$ for each pair of distinct nodes $(i,j),i<j$. Their community membership is either the same (represented by $x_{ij}=0$) or different (represented by $x_{ij}=1$). Accordingly, the problem of maximizing the modularity of input graph G can be formulated as an IP model [15] as in Eq. (2).

$$\begin{aligned} \begin{aligned}&\max _{x_{ij}} Q = \frac{1}{2m} \left( \sum \limits _{(i,j) \in V^2 , i< j} 2b_{ij}(1- x_{ij}) + \sum \limits _{(i,i) \in V^2} b_{ii} \right) \\&\text {s.t.} \quad x_{ik}+x_{jk} \ge x_{ij} \quad \forall (i,j) \in V^2 , i< j, k\in K(i,j) \\&\quad \quad x_{ij} \in \{0,1\} \quad \forall (i,j) \in V^2 , i< j \end{aligned} \end{aligned}$$

(2)

In Eq. (2), the optimal objective function value equals the maximum modularity for the input graph G. An optimal community assignment is characterized by the optimal values of the $x_{ij}$ variables. K(i, j) indicates a minimum-cardinality separating set [15] for the nodes i, j. Using K(i, j) in the IP model of this problem leads to a more efficient formulation with $\mathcal {O}(n^2)$ constraints [15] instead of $\mathcal {O}(n^3)$ constraints as in earlier IP formulations of the problem [1, 9]. Solving this optimization problem is NP-complete [9, 35]. We use the Gurobi solver (version 10.0) [22] to solve it for the small and mid-sized instances as outlined in Subsect. 2.6.

2.4 Reviewing Eight Heuristic Modularity Maximization Algorithms

We evaluate eight modularity maximization heuristics known as Clauset-Newman-Moore (CNM) [13], Louvain [7], Leicht-Newman (LN) [30], Combo [46], Belief [50], Paris [8], Leiden [48], and EdMot-Louvain [31]. We have used the Python implementations of these eight algorithms which are accessible in the Community Discovery library (CDlib) version 0.2.6 [42].

We briefly describe how these eight algorithms use modularity to discover communities. The CNM algorithm initializes each node as a community by itself. It then follows a greedy scheme of merging two communities that contribute the maximum positive value to modularity [13]. The Louvain algorithm involves two sets of iterative steps: (1) locally moving nodes for increasing modularity and (2) aggregating the communities from the first step [7]. Despite Louvain being the most commonly used modularity-based algorithm [28], it may sometimes lead to disconnected components in the same community [48]. The LN algorithm uses spectral optimization to maximize modularity which also supports directed graphs [30]. The Combo algorithm is a general optimization-based CD method which supports modularity maximization among other tasks. It involves two sets of iterative steps: (1) finding the best merger, split, or recombination of communities to maximize modularity and (2) performing a series of Kernighan-Lin bisections [26] on the communities as long as they increase modularity [46]. The Belief algorithm seeks the consensus of different high-modularity partitions through a message-passing algorithm [50] motivated by the premise that maximizing modularity can lead to many poorly correlated competing partitions. The Paris algorithm is suggested to be a modularity-maximization scheme with a sliding resolution [8]; that is, an algorithm capable of capturing the multi-scale community structure of real networks without a resolution parameter. It generates a hierarchical community structure based on a simple distance between communities using a nearest-neighbour chain [8]. The Leiden algorithm attempts to resolve a defect of the Louvain algorithm in returning badly connected communities. It is suggested to guarantee well-connected communities in which all subsets of all communities are locally optimally assigned [48]. The EdMot-Louvain algorithm (EdMot for short) is developed to overcome the hypergraph fragmentation issue observed in previous motif-based CD methods [31]. It first creates the graph of higher-order motifs (small dense subgraph patterns) and then partitions it using the Louvain method to heuristically maximize modularity using higher-order motifs [31].

To evaluate these eight modularity-based algorithms in maximizing modularity, we quantify (1) the ratio of their output modularity to the maximum modularity for each input graph and (2) the maximum similarity between their output partition and any optimal partition of that graph. We obtain optimal partitions by solving the IP model in Eq. (2) using the Gurobi solver (version 10.0) with a termination criterion ensuring global optimality [22].

2.5 Measures for Evaluating Heuristic Algorithms

For a quantitative measure of proximity to global optimality, we define and use the Global Optimality Percentage (GOP) as the fraction of the modularity returned by a heuristic method for a network divided by the globally maximum modularity for that network (obtained by solving the IP model in Eq. (2)). In all cases where the modularity returned by a heuristic method equals the maximum modularity for the input graph, we set GOP = 1. In cases where a heuristic algorithm returns a partition with a negative modularity value, we set GOP = 0 to facilitate easier interpretation of proximity to optimality based on non-negative GOP values.

We also use a quantitative measure for the similarity of a partition to an optimal partition. Normalized Adjusted Mutual Information (AMI) [49] is a measure of similarity between two partitions of the same network. Unlike normalized mutual information [49], AMI adjusts the measurement based on the similarity that two partitions may have by pure chance. AMI for a pair of identical partitions (or permutations of the same partition) equals 1. For two different partitions, however, AMI takes a smaller value (including 0 or negative values close to 0 for two extremely dissimilar partitions).

2.6 Data and Resources

For our computational experiments, we include 60 real networks^{Footnote 2} with no more than 2812 edges as well as 10 Erdős-Rényi graphs and 10 Barabási-Albert graphs with 125–153 edges. These instance sizes were chosen to ensure all algorithms terminate within a reasonable time. The computational experiments were implemented in Python 3.9 using a notebook computer with an Intel Core i7-11800H @ 2.30 GHz CPU and 64 GB of RAM running Windows 10.

3 Results

We present the main results from our experiments in the following four subsections. In Subsect. 3.1, we compare partitions from different algorithms on a single network. In Subsect. 3.2, we examine the multiplicity of optimal partitions and investigate the similarity between multiple optimal partitions of the same networks. In Subsect. 3.3, we evaluate the effectiveness of the heuristic algorithms on 80 networks by measuring the distance of sub-optimal partitions from an optimal partition. Finally, in Subsect. 3.4, we investigate the success rate of the heuristic algorithms in finding an optimal partition.

3.1 Comparing Partitions from Different Algorithms on One Network

Figure 1 shows one graph and its nine partitions returned by nine CD methods. This graph^{Footnote 3} represents an anonymized Facebook ego network^{Footnote 4}. Nodes represent Facebook users, and an edge exists between any pair of users who were friends on Facebook in April 2014 [33]. Communities are shown using node colors.

Panel 1a of Fig. 1 shows an optimal partition obtained by solving the IP model in Eq. (2) for the network facebook_friends. It involves $k=28$ communities, and a maximum modularity value of $Q^*=0.7157714$. The partitions from the eight heuristic modularity maximization algorithms are all sub-optimal as depicted in panels 1b–1i of Fig. 1. Compared to other algorithms, the two algorithms Combo and LN have more success in achieving proximity to an optimal partition. LN returns a partition with $k=28$ communities and a modularity of $Q=0.7139$ which has the highest AMI among all heuristics (0.971). The relative success of the Combo algorithm is in returning a high-modularity partition with $Q=0.7157709$, but with $k=13$ communities and a lower AMI (0.949) compared to LN. The sub-optimal partitions from the other six algorithms have more substantial variations in Q, AMI, and k (number of communities) as shown by the values in the corresponding subcaptions in Fig. 1.

3.2 Multiplicity of Optimal Partitions

While the partition which maximizes modularity is often unique, some graphs have multiple optimal partitions. For all networks considered in our analysis, we obtain all optimal partitions using the Gurobi solver by running it with a special configuration for finding all optimal partitions [22]. Figure 2 shows a protein network^{Footnote 5} and its four optimal partitions. In this network, nodes represent proteins and an edge represents a binding interaction between two proteins (PDZ-domain-mediated protein-protein binding interaction) [6]. All four optimal partitions have $Q^*=0.80267$ and $k=29$.

The differences between optimal partitions of this network are in the community assignments for two nodes indicated by red arrows in Fig. 2. The six pairwise AMI values for the optimal partitions are all >0.98 confirming the high level of similarity between the four optimal partitions in Fig. 2.

Obtaining all optimal partitions for all 80 networks, we observed that 89% of the graphs have unique optimal partitions and the multiplicity of optimal partitions is a relatively rare event. Given the possibility of multiple optimal partitions in some graphs, we calculated the AMI for the partition of each heuristic algorithm and each of the multiple optimal partitions of that graph. We then conservatively reported the maximum AMI of each heuristic for each graph to quantify the similarity between that partition and its closest optimal partition. Consequently, a low value of AMI for a partition obtained by a heuristic algorithm indicates its dissimilarity to any optimal partition.

Our results suggest that the rarely observed multiple optimal partitions of a graph often have a high degree of similarity (AMI values >0.9) because their differences are often only in the community assignments of a very few nodes (as in Fig. 2). Dissimilarity between multiple optimal partitions of a network seems to be exceptional, but it has been observed in one of our 80 networks: contiguous USA^{Footnote 6}, where nodes are US states and each edge indicates a land-based border between two states. The AMI of the two optimal partitions for this network is exceptionally low (0.34). Upon further investigation, we observed that one optimal partition combines five communities of the other optimal partition together. This makes the two partitions related in terms of belonging to a clustering hierarchy, while they are not similar according to an AMI definition of partition similarity. These exceptional cases are possible due to the mathematical symmetries resulted from the value of $\gamma $ used in Eq. (1) for defining modularity. Our results suggest that there is usually a distinct uniqueness to an optimal partition (or a group of similar optimal partitions) for a given network in comparison to sub-optimal partitions. This new perspective is contrary to the premise that maximizing modularity leads to many competing partitions with almost the same modularity [50] and no clear way of selecting between them [41]. It is the failure to actually maximize modularity that may lead to many poorly correlated competing partitions with unknown distances from the desired objective (both in modularity and in partition similarity). What remains to be analyzed is how different sub-optimal partitions are from an optimal partition and how often heuristic modularity maximization algorithms return sub-optimal partitions. We investigate these two questions in the next two subsections.

3.3 Evaluating Heuristic Algorithms on 80 Networks

For summarizing the results of eight heuristics on 80 networks, we present four scatter-plots of GOP and AMI. Figure 3 shows GOP on the y-axes and AMI on the x-axes for the combination of each network and algorithm. For each algorithm (color-coded), there are 60 data points for the 60 real networks and 2 data points representing the average of 10 Erdős-Rényi and the average of 10 Barabási-Albert graphs. The first three letters of the network names are indicated on each data point (magnify the figure on screen for the details). Four 45-degree lines are drawn to indicate the areas where the GOP and AMI are equal. In other words, the 45-degree lines show areas where the extent of sub-optimality ($1-\text {GOP}$) is associated with a dissimilarity ($1-\text {AMI}$) of the same size between the sub-optimal partition and any optimal partition.

Looking at the y-axes values in Fig. 3, we observe that there is a substantial variation in the values of GOP (i.e. the extent of sub-optimality) for the eight heuristic algorithms. The Belief algorithm returns partitions associated with negative modularity values for 45 of the 80 instances (leading to most of its data points having GOP = 0 and being concentrated at the bottom of the scatter-plot). The Paris algorithm returns partitions with modularity values substantially smaller than the maximum modularity values. Aside from a few exceptions, all data points for Leiden and LN have the same position indicating their identical performance on most of these instances. The two algorithms CNM and EdMot seem to have higher variation in GOP (compared to the other algorithms) for these instances. Overall, the four algorithms with highest and increasing performance in returning close-to-maximum modularity values are LN, Leiden, Louvain, and Combo respectively. Despite that these instance are graphs with no more than 2812 edges, they are, according to Fig. 3, challenging instances for these heuristic algorithms to optimize. Given that modularity maximization is an NP-complete problem [9, 35], one can argue that the performance of these heuristic methods in term of proximity to an optimal partition does not improve for larger networks.

The x-axes values in Fig. 3 show considerable dissimilarity between the sub-optimal partitions and an optimal partition for these 80 instances. Except for the Combo algorithm, a large number of the sub-optimal partitions obtained by these heuristic algorithms have AMI values smaller than 0.6. This indicates that their sub-optimal partitions are substantially different from any optimal partition. Even for data points concentrated at the top of the scatter-plots which have $0.95<\text {GOP}<1$, we see AMI values substantially smaller than 1. Compared to the other seven heuristics, Combo appears to consistently return partitions with large AMIs on a larger number of these 80 instances.

Focusing on the position of data points, we observe that they are mostly located above their corresponding 45-degree line. This indicates that sub-optimal partitions tend to be disproportionately dissimilar to any optimal partition (as foreshadowed in [14]). This result goes against the naive viewpoint that close-to-maximum modularity partitions are also close to an optimal partition. Our results are aligned with previous concerns that these heuristics may result in degenerate solutions far from the underlying community structure [20] and they have a high risk of algorithmic failure [24].

3.4 Success Rate of Heuristic Algorithms in Maximizing Modularity

Our GOP results for the eight heuristic algorithms allow us to answer a fundamental question about the heuristic modularity maximization algorithms: how often each algorithm returns an optimal (a maximum-modularity) partition? We report the fraction of networks (out of 80) for which a given algorithm returns an optimal partition. Combo [46] has the highest success rate, returning an optimal partition for $55\%$ of the networks. LN [30] and Leiden [48] maximize modularity for 36.2% of the networks considered. Louvain [7] has a success rate of 18.7%. The algorithms CNM [13], EdMot [31], Paris [8], and Belief [50] have success rates of 5%, 2.5%, 1.2%, and 0% respectively. These are arguably low success rates for what the name modularity maximization algorithm implies or the idea of discovering network communities through maximizing a function.

Earlier in Fig. 3, we observed that near-optimal partitions tend to be disproportionately dissimilar to any optimal partition. In other words, close-to-maximum modularity partitions are rarely close to any optimal partition. Taken together with the low success rates of heuristic algorithms in maximizing modularity, our results indicate a crucial mismatch between the design philosophy of modularity maximization algorithms for CD and their capabilities: heuristic modularity maximization algorithms rarely return an optimal partition or a partition resembling an optimal partition.

4 Discussions and Future Directions

Understanding modularity capabilities and limitations has been complicated by the under-studied sub-optimality of modularity-based heuristics and their methodological consequences. Previous methodological studies [11, 12, 29, 36, 41], which have shed light on other aspects, had rarely disentangled the heuristic aspect of these algorithms from the fundamental concept of modularity. Our study is a continuation of previous efforts [20] in separating the effects of sub-optimality (or the choice of using greedy algorithms [24]) from the effects of using modularity on the fundamental task of detecting communities.

We analyzed the effectiveness of eight heuristics in maximizing modularity. While our findings are limited to a few algorithms, their combined usage by tens of thousands of peer-reviewed studies [28] motivates the importance of this assessment. Most heuristic algorithms for modularity maximization tend to scale well for large networks [51]. They are widely used not only because of their scalability or ease of implementation [24], but also because their high risk of algorithmic failure is not well understood [24]. The scalability of these heuristics comes at a cost: their partitions have no guarantee of proximity to an optimal partition [20] and, as our results showed, they rarely return an optimal partition. Moreover, we showed that their sub-optimal partitions tend to be disproportionately dissimilar to any optimal partition.

Neither using modularity nor succeeding in maximizing it is required for CD at the big-picture level. A recent study suggests modularity maximization is the most problematic CD method and considers it harmful [41]. Another study shows that, given computational feasibility, exact maximization of multiresolution modularity outperforms other CD methods in accurate and stable retrieval of planted communities [4] suggesting the relevance of modularity for CD. For some applications and contexts, general CD algorithms [39] which scale to large instance sizes are needed. However, for a “narrow set of tasks” [39, pp.7], involving small and mid-sized networks, specialized algorithms which outperform general algorithms are useful.

Our findings suggest that if modularity is to be used for detecting communities, developing approximation [10, 14, 25] and exact [3, 4] algorithms are recommendable for a more methodologically sound usage of modularity within its applicability limits. Exact algorithms can also reveal the formal guarantees of performance [19] for accurate modularity-based algorithms.

A promising path forward could be using the advances in integer programming to develop a specialized accurate algorithm for solving the modularity maximization IP models [1, 9, 15] for networks of practical relevance within the limits of computational feasibility. New heuristic and approximation algorithms that strike a balance between accurate computations and scalability may also be useful particularly for large-scale networks.

Notes

1.
Without loss of generality, we set $\gamma =1$ for all the analysis in this paper.
2.
All networks are accessible from the Netzschleuder with the details in the Appendix.
3.
facebook_friends network [33] from the Netzschleuder repository.
4.
A network of one person’s social ties to other persons and their ties to each other.
5.
interactome_pdz network [6] from the Netzschleuder repository.
6.
contiguous_usa network [27] from the Netzschleuder repository.

References

Agarwal, G., Kempe, D.: Modularity-maximizing graph communities via mathematical programming. Eur. Phys. J. B 66(3), 409–418 (2008). https://doi.org/10.1140/epjb/e2008-00425-1
Article MathSciNet MATH Google Scholar
Aldecoa, R., Marín, I.: Deciphering network community structure by surprise. PLoS ONE 6(9), 1–8 (2011). https://doi.org/10.1371/journal.pone.0024195
Article Google Scholar
Aloise, D., Cafieri, S., Caporossi, G., Hansen, P., Perron, S., Liberti, L.: Column generation algorithms for exact modularity maximization in networks. Phys. Rev. E 82(4), 046112 (2010). https://doi.org/10.1103/PhysRevE.82.046112
Article Google Scholar
Aref, S., Chheda, H., Mostajabdaveh, M.: The Bayan algorithm: detecting communities in networks through exact and approximate optimization of modularity. arXiv preprint arXiv:2209.04562 (2022)
Aref, S., Chheda, H., Mostajabdaveh, M.: Dataset of networks used in accessing the Bayan algorithm for community detection (2023). https://doi.org/10.6084/m9.figshare.22442785
Beuming, T., Skrabanek, L., Niv, M.Y., Mukherjee, P., Weinstein, H.: PDZBase: a protein-protein interaction database for PDZ-domains. Bioinformatics 21(6), 827–828 (2005)
Article Google Scholar
Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008(10), P10008 (2008). https://doi.org/10.1088/1742-5468/2008/10/P10008
Article MATH Google Scholar
Bonald, T., Charpentier, B., Galland, A., Hollocou, A.: Hierarchical graph clustering using node pair sampling. In: MLG 2018–14th International Workshop on Mining and Learning with Graphs. London, UK (2018)
Google Scholar
Brandes, U., et al.: On modularity clustering. IEEE Trans. Knowl. Data Eng. 20(2), 172–188 (2007)
Article Google Scholar
Cafieri, S., Costa, A., Hansen, P.: Reformulation of a model for hierarchical divisive graph modularity maximization. Ann. Oper. Res. 222, 213–226 (2014)
Article MathSciNet MATH Google Scholar
Chen, S., et al.: Global vs local modularity for network community detection. PLoS ONE 13(10), 1–21 (2018). https://doi.org/10.1371/journal.pone.0205284
Article Google Scholar
Chen, T., Singh, P., Bassler, K.E.: Network community detection using modularity density measures. J. Stat. Mech. Theory Exp. 2018(5), 053406 (2018). https://doi.org/10.1088/1742-5468/aabfc8
Article MathSciNet MATH Google Scholar
Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004)
Article Google Scholar
Dinh, T.N., Li, X., Thai, M.T.: Network clustering via maximizing modularity: approximation algorithms and theoretical limits. In: 2015 IEEE International Conference on Data Mining, pp. 101–110 (2015). https://doi.org/10.1109/ICDM.2015.139
Dinh, T.N., Thai, M.T.: Toward optimal community detection: from trees to general weighted networks. Internet Math. 11(3), 181–200 (2015)
Article MathSciNet MATH Google Scholar
Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75–174 (2010)
Article MathSciNet Google Scholar
Fortunato, S., Barthélemy, M.: Resolution limit in community detection. Proc. Natl. Acad. Sci. 104(1), 36–41 (2007)
Article Google Scholar
Fortunato, S., Hric, D.: Community detection in networks: a user guide. Phys. Rep. 659, 1–44 (2016). https://doi.org/10.1016/j.physrep.2016.09.002
Article MathSciNet Google Scholar
Fortunato, S., Newman, M.E.: 20 years of network community detection. Nat. Phys. 18, 848–850 (2022)
Article Google Scholar
Good, B.H., De Montjoye, Y.A., Clauset, A.: Performance of modularity maximization in practical contexts. Phys. Rev. E 81(4), 046106 (2010)
Article MathSciNet Google Scholar
Guimerà, R., Sales-Pardo, M., Amaral, L.A.N.: Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E 70, 025101 (2004)
Article Google Scholar
Gurobi Optimization Inc.: Gurobi optimizer reference manual (2023). https://gurobi.com/documentation/10.0/refman/index.html. Accessed 16 Feb 2023
Karrer, B., Newman, M.E.J.: Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107 (2011)
Article MathSciNet Google Scholar
Kawamoto, T., Kabashima, Y.: Counting the number of metastable states in the modularity landscape: algorithmic detectability limit of greedy algorithms in community detection. Phys. Rev. E 99(1), 010301 (2019)
Article Google Scholar
Kawase, Y., Matsui, T., Miyauchi, A.: Additive approximation algorithms for modularity maximization. J. Comput. Syst. Sci. 117, 182–201 (2021). https://doi.org/10.1016/j.jcss.2020.11.005
Article MathSciNet MATH Google Scholar
Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–307 (1970)
Article MATH Google Scholar
Knuth, D.E.: The Stanford GraphBase: A Platform for Combinatorial Computing, vol. 1. ACM Press, New York (1993)
MATH Google Scholar
Kosowski, A., Saulpic, D., Mallmann-Trenn, F., Cohen-addad, V.P.: On the power of Louvain for graph clustering. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33 (NeurIPS’20) (2020)
Google Scholar
Lancichinetti, A., Fortunato, S.: Limits of modularity maximization in community detection. Phys. Rev. E 84(6), 066122 (2011). https://doi.org/10.1103/PhysRevE.84.066122
Article Google Scholar
Leicht, E.A., Newman, M.E.J.: Community structure in directed networks. Phys. Rev. Lett. 100(11), 118703 (2008). https://doi.org/10.1103/PhysRevLett.100.118703
Article Google Scholar
Li, P.Z., Huang, L., Wang, C.D., Lai, J.H.: EdMot: an edge enhancement approach for motif-aware community detection. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 479–487 (2019)
Google Scholar
Liu, X., et al.: A scalable redefined stochastic blockmodel. ACM Trans. Knowl. Discov. Data (TKDD) 15(3), 1–28 (2021)
Google Scholar
Maier, B.F., Brockmann, D.: Cover time for random walks on arbitrary complex networks. Phys. Rev. E 96(4), 042307 (2017)
Article Google Scholar
Marchese, E., Caldarelli, G., Squartini, T.: Detecting mesoscale structures by surprise. Commun. Phys. 5(1), 1–16 (2022)
Article Google Scholar
Meeks, K., Skerman, F.: The parameterised complexity of computing the maximum modularity of a graph. Algorithmica 82(8), 2174–2199 (2020)
Article MathSciNet MATH Google Scholar
Miasnikof, P., Shestopaloff, A.Y., Bonner, A.J., Lawryshyn, Y., Pardalos, P.M.: A density-based statistical analysis of graph clustering algorithm performance. J. Complex Netw. 8(3), 1–33 (2020)
Google Scholar
Newman, M.E.J.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. 103(23), 8577–8582 (2006). https://doi.org/10.1073/pnas.0601602103
Article Google Scholar
Newman, M.E.J.: Equivalence between modularity optimization and maximum likelihood methods for community detection. Phys. Rev. E 94(5), 052315 (2016). https://doi.org/10.1103/PhysRevE.94.052315
Article Google Scholar
Peel, L., Larremore, D.B., Clauset, A.: The ground truth about metadata and community detection in networks. Sci. Adv. 3(5), e1602548 (2017)
Article Google Scholar
Peixoto, T.P.: Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models. Phys. Rev. E 89(1), 012804 (2014)
Article Google Scholar
Peixoto, T.P.: Descriptive vs. Inferential Community Detection in Networks: Pitfalls, Myths and Half-Truths. Elements in the Structure and Dynamics of Complex Networks, Cambridge University Press, Cambridge (2023)
Google Scholar
Rossetti, G., Milli, L., Cazabet, R.: CDLIB: a Python library to extract, compare and evaluate communities from complex networks. Appl. Netw. Sci. 4(1), 1–26 (2019)
Article Google Scholar
Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. Proc. Natl. Acad. Sci. 104(18), 7327–7331 (2007). https://doi.org/10.1073/pnas.0611034104
Article Google Scholar
Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. 105(4), 1118–1123 (2008). https://doi.org/10.1073/pnas.0706851105
Article Google Scholar
Serrano, B., Vidal, T.: Community detection in the stochastic block model by mixed integer programming (2021)
Google Scholar
Sobolevsky, S., Campari, R., Belyi, A., Ratti, C.: General optimization technique for high-quality community detection in complex networks. Phys. Rev. E 90(1), 012811 (2014)
Article Google Scholar
Traag, V.A., Aldecoa, R., Delvenne, J.C.: Detecting communities using asymptotical surprise. Phys. Rev. E 92, 022816 (2015). https://doi.org/10.1103/PhysRevE.92.022816
Article Google Scholar
Traag, V.A., Waltman, L., van Eck, N.J.: From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9(1) (2019). https://doi.org/10.1038/s41598-019-41695-z
Vinh, N.X., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance. J. Mach. Learn. Res. 11(95), 2837–2854 (2010). http://jmlr.org/papers/v11/vinh10a.html
Zhang, P., Moore, C.: Scalable detection of statistically significant communities and hierarchies, using message passing for modularity. Proc. Natl. Acad. Sci. 111(51), 18144–18149 (2014)
Article Google Scholar
Zhao, X., Liang, J., Wang, J.: A community detection algorithm based on graph compression for large-scale social networks. Inf. Sci. 551, 358–372 (2021)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

We are thankful to the three anonymous referees and Ly Dinh for their helpful comments. We acknowledge Zachary P. Neal for pointing us to this problem and Santo Fortunato for the encouraging correspondence. Accessing CDlib, Netzschleuder, and ICON has been particularly helpful in this study for which we thank Giulio Rossetti, Tiago P. Peixoto, Aaron Clauset, and everyone contributing to these open science initiatives. This study has been supported by the Data Sciences Institute at the University of Toronto.

Author information

Authors and Affiliations

Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, M5S 3G8, Canada
Samin Aref
Huawei Technologies Canada, Burnaby, V5C 6S7, Canada
Mahdi Mostajabdaveh
Department of Computer Science, University of Toronto, Toronto, M5S 2E4, Canada
Hriday Chheda

Authors

Samin Aref
View author publications
You can also search for this author in PubMed Google Scholar
Mahdi Mostajabdaveh
View author publications
You can also search for this author in PubMed Google Scholar
Hriday Chheda
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceptualization (SA); data curation (SA, HC); formal analysis (SA, MM); funding acquisition (SA); investigation (SA); methodology (SA,MM); project administration (SA); resources (SA, MM); software (SA, HC, MM); supervision (SA, MM); validation (SA, MM); visualization (SA, MM); writing - original draft preparation (SA); writing - review & editing (SA, MM).

Corresponding author

Correspondence to Samin Aref .

Editor information

Editors and Affiliations

Czech Technical University in Prague, Prague, Czech Republic
Jiří Mikyška
University of Amsterdam, Amsterdam, The Netherlands
Clélia de Mulatier
AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M.A. Sloot

Appendix

The data on 20 random graphs used in this study are available in a FigShare data repository [5]. The 60 real networks are loaded as simple undirected graphs. They are available in the publicly accessible network repository Netzschleuder with the 60 names below:

dom, packet_delays, sa_companies, ambassador, florentine_families, rhesus_monkey, kangaroo, internet_top_pop, high_tech_company, moviegalaxies, november17, moreno_taro, sp_baboons, bison, dutch_school, zebras, cattle, moreno_sheep, 7th_graders, college_freshmen, hens, freshmen, karate, dutch_criticism, montreal, ceo_club, windsurfers, elite, macaque_neural, sp_kenyan_households, contiguous_usa, cs_department, dolphins, macaques, terrorists_911, train_terrorists, highschool, law_firm, baseball, blumenau_drug, lesmis, fresh_webs, sp_office, swingers, polbooks, game_thrones, football, football_tsevans, sp_high_school_new, foodweb_baywet, revolution, foodweb_little_rock, student_cooperation, jazz_collab, interactome_pdz, physician_trust, malaria_genes, marvel_partnerships, facebook_friends, netscience.

For more information on each network and its original source, one may check the Netzschleuder website by adding the network name at the end of the url: https://networks.skewed.de/net/. For example, https://networks.skewed.de/net/malaria_genes provides additional information for the malaria_genes network. In cases of multiple networks existing with the same name in Netzschleuder, we have used the first network (e.g. we have used the HVR_1 network from https://networks.skewed.de/net/malaria_genes).

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this paper

Cite this paper

Aref, S., Mostajabdaveh, M., Chheda, H. (2023). Heuristic Modularity Maximization Algorithms for Community Detection Rarely Return an Optimal Partition or Anything Similar. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 10476. Springer, Cham. https://doi.org/10.1007/978-3-031-36027-5_48

Download citation

DOI: https://doi.org/10.1007/978-3-031-36027-5_48
Published: 26 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36026-8
Online ISBN: 978-3-031-36027-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Heuristic Modularity Maximization Algorithms for Community Detection Rarely Return an Optimal Partition or Anything Similar

Abstract

Keywords

1 Introduction

2 Methods and Materials

2.1 Modularity

2.2 Modularity Maximization

2.3 Sparse IP Formulation of Modularity Maximization

2.4 Reviewing Eight Heuristic Modularity Maximization Algorithms

2.5 Measures for Evaluating Heuristic Algorithms

2.6 Data and Resources

3 Results

3.1 Comparing Partitions from Different Algorithms on One Network

3.2 Multiplicity of Optimal Partitions

3.3 Evaluating Heuristic Algorithms on 80 Networks

3.4 Success Rate of Heuristic Algorithms in Maximizing Modularity

4 Discussions and Future Directions

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Editor information

Editors and Affiliations

Appendix

Appendix

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation