
1 Introduction

Community detection (CD), the process of inductively identifying communities within a network, is a core problem in computational sciences, particularly in physics, computer science, biology, and computational social science [19, 50]. Common approaches for CD include algorithms designed to maximize a utility function, modularity [37], over all possible ways of partitioning the nodes of the input network into communities. Modularity measures the fraction of edges within communities minus the expected fraction if the edges were distributed randomly, where the random distribution of edges is a null model that preserves the node degrees. Despite their name and design philosophy, current modularity maximization algorithms, which are used by at least tens of thousands of peer-reviewed studies [28], are not guaranteed to maximize modularity [24, 35, 38]. This has led to uncertainty [20, 24] about the extent to which they succeed in returning a maximum-modularity (optimal) partition or something similar.

Modularity is among the first objective functions proposed for optimization-based detection of communities [18, 37]. Several limitations [16, 18, 21, 41] of modularity, including the resolution limit [17], have led researchers to develop alternative CD methods using stochastic block modeling [23, 32, 40, 45], information theoretic approaches [43, 44], and alternative objective functions [2, 34, 36, 47]. Nevertheless, modularity-based algorithms remain the most commonly used methods for CD [19, 46]. Despite the widespread adoption of modularity-based heuristics, there is uncertainty [20, 24] about their success in maximizing modularity. This study aims to address this uncertainty by quantifying the extent to which eight commonly used heuristics [7, 8, 13, 30, 31, 46, 48, 50] succeed in returning an optimal partition or a partition resembling an optimal partition. After describing the methods and materials, we present the main results followed by a discussion of the methodological ramifications and future directions.

2 Methods and Materials

This study aims to investigate the extent to which eight commonly used heuristic modularity maximization algorithms [7, 8, 13, 30, 31, 46, 48, 50] succeed in returning an optimal partition or a partition similar to an optimal partition. To achieve this objective, we quantify the proximity of their results to the globally optimal partition(s), which we obtain using an exact Integer Programming (IP) model for maximizing modularity [1, 9, 15]. We do not claim that maximum-modularity partitions represent the best partitions. Throughout the paper, we use the terms network and graph interchangeably.

2.1 Modularity

Consider the simple graph \(G=(V,E)\) with \(|V|=n\) nodes, \(|E|=m\) edges, adjacency matrix entries \(a_{ij}\), and a partition \(X=\{V_1,V_2, \dots , V_k \}\) of the node set V into k communities. The modularity function \(Q_{(G,X)}\) is computed [18, 37] according to Eq. (1)

$$\begin{aligned} Q_{(G,X)}= \frac{1}{2m} \sum \limits _{(i,j) \in V^2} \left( a_{ij} - \gamma \frac{d_id_j}{2m}\right) \delta (i,j) \end{aligned}$$
(1)

where \(d_i\) represents the degree of node i, \(\gamma \) is the resolution parameter, and \(\delta (i,j)\) is 1 if nodes i and j are in the same community and 0 otherwise. The term associated with each pair of nodes \((i,j)\) is alternatively represented as \(b_{ij}=a_{ij} -\gamma \frac{d_id_j}{2m}\) and referred to as the modularity matrix entry for \((i,j)\).
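As a minimal illustration of Eq. (1), the following Python sketch evaluates modularity directly from the double sum over node pairs. The function name, the node labelling 0..n-1, and the toy graph (two triangles joined by one edge) are assumptions made for illustration only; they are not part of the experiments reported in this paper.

```python
from itertools import product


def modularity(n, edges, labels, gamma=1.0):
    """Evaluate Eq. (1) for a simple undirected graph.

    n      -- number of nodes, labelled 0..n-1 (illustrative convention)
    edges  -- list of undirected edges (i, j) with i != j, no duplicates
    labels -- labels[i] is the community of node i
    gamma  -- resolution parameter
    """
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    deg = [sum(row) for row in adj]
    two_m = sum(deg)  # equals 2m
    total = 0.0
    # Sum over all ordered pairs (i, j) in V^2, including i == j.
    for i, j in product(range(n), repeat=2):
        if labels[i] == labels[j]:  # delta(i, j) = 1
            total += adj[i][j] - gamma * deg[i] * deg[j] / two_m
    return total / two_m


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
q_two = modularity(6, toy_edges, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_one = modularity(6, toy_edges, [0, 0, 0, 0, 0, 0])  # all nodes together
print(round(q_two, 4))  # 0.3571
```

With \(\gamma = 1\), placing all nodes in a single community gives a modularity of zero (up to rounding), whereas separating the two triangles gives \(5/14\).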

2.2 Modularity Maximization

The modularity maximization problem for input graph \(G=(V,E)\) involves finding a partition \(X^*\) whose associated \(Q_{(G,X^*)}\) is globally maximum over all possible partitions of the node set V.
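To make the problem statement concrete, the sketch below finds all maximum-modularity partitions of a very small graph by exhaustive enumeration. This is not the IP-based approach used in this study; it is a brute-force stand-in that is only feasible for a handful of nodes, since the number of partitions grows as the Bell numbers. All function names and the toy graph are illustrative assumptions.

```python
def modularity(n, edges, labels, gamma=1.0):
    """Modularity as a sum over communities (equivalent to Eq. (1))."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    two_m = 2 * len(edges)
    internal = {}  # community -> number of internal edges
    deg_sum = {}   # community -> sum of member degrees
    for i, j in edges:
        if labels[i] == labels[j]:
            internal[labels[i]] = internal.get(labels[i], 0) + 1
    for v in range(n):
        deg_sum[labels[v]] = deg_sum.get(labels[v], 0) + deg[v]
    return sum(2 * internal.get(c, 0) / two_m - gamma * (d / two_m) ** 2
               for c, d in deg_sum.items())


def set_partitions(n):
    """Yield each partition of nodes 0..n-1 once, as a restricted growth string."""
    def extend(labels, used):
        if len(labels) == n:
            yield tuple(labels)
            return
        for c in range(used + 1):  # reuse an existing label or open a new one
            labels.append(c)
            yield from extend(labels, max(used, c + 1))
            labels.pop()
    yield from extend([], 0)


def brute_force_optima(n, edges, gamma=1.0, tol=1e-12):
    """Return (Q*, list of all partitions whose modularity is within tol of Q*)."""
    scored = [(modularity(n, edges, p, gamma), p) for p in set_partitions(n)]
    q_star = max(q for q, _ in scored)
    return q_star, [p for q, p in scored if q_star - q <= tol]


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
q_star, optima = brute_force_optima(6, toy_edges)
print(round(q_star, 4), optima)
```

For this toy graph the enumeration visits 203 partitions; an exact method is needed precisely because this count becomes intractable for networks of realistic size.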

2.3 Sparse IP Formulation of Modularity Maximization

Consider the simple graph \(G=(V,E)\) with modularity matrix entries \(b_{ij}\), obtained using the resolution parameter \(\gamma \). We use the binary decision variable \(x_{ij}\) for each pair of distinct nodes \((i,j),i<j\). Their community membership is either the same (represented by \(x_{ij}=0\)) or different (represented by \(x_{ij}=1\)). Accordingly, the problem of maximizing the modularity of input graph G can be formulated as an IP model [15] as in Eq. (2).

$$\begin{aligned} \begin{aligned}&\max _{x_{ij}} Q = \frac{1}{2m} \left( \sum \limits _{(i,j) \in V^2 , i< j} 2b_{ij}(1- x_{ij}) + \sum \limits _{(i,i) \in V^2} b_{ii} \right) \\&\text {s.t.} \quad x_{ik}+x_{jk} \ge x_{ij} \quad \forall (i,j) \in V^2 , i< j, k\in K(i,j) \\&\quad \quad x_{ij} \in \{0,1\} \quad \forall (i,j) \in V^2 , i< j \end{aligned} \end{aligned}$$
(2)

In Eq. (2), the optimal objective function value equals the maximum modularity for the input graph G. An optimal community assignment is characterized by the optimal values of the \(x_{ij}\) variables. \(K(i,j)\) indicates a minimum-cardinality separating set [15] for the nodes i and j. Using \(K(i,j)\) in the IP model of this problem leads to a more efficient formulation with \(\mathcal {O}(n^2)\) constraints [15] instead of \(\mathcal {O}(n^3)\) constraints as in earlier IP formulations of the problem [1, 9]. Modularity maximization is NP-hard, and its decision version is NP-complete [9, 35]. We use the Gurobi solver (version 10.0) [22] to solve it for the small and mid-sized instances as outlined in Subsect. 2.6.
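The relationship between a partition and the variables of Eq. (2) can be checked with a short sketch. The code below builds \(b_{ij}\), converts a partition into \(x_{ij}\) values (0 for the same community, 1 for different communities), evaluates the objective of Eq. (2), and tests the triangle constraints. As a simplifying assumption, the constraints are checked for every third node k rather than only for \(k\in K(i,j)\), which corresponds to the denser earlier formulations; no solver is involved, and all names are illustrative.

```python
from itertools import combinations


def modularity_matrix(n, edges, gamma=1.0):
    """Return (b, 2m) with b[i][j] = a_ij - gamma * d_i * d_j / (2m)."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    deg = [sum(row) for row in adj]
    two_m = sum(deg)
    b = [[adj[i][j] - gamma * deg[i] * deg[j] / two_m for j in range(n)]
         for i in range(n)]
    return b, two_m


def partition_to_x(labels):
    """x[(i, j)] = 0 if i and j share a community, 1 otherwise (i < j)."""
    return {(i, j): int(labels[i] != labels[j])
            for i, j in combinations(range(len(labels)), 2)}


def ip_objective(b, two_m, x):
    """Objective of Eq. (2) for given x values."""
    n = len(b)
    off_diag = sum(2 * b[i][j] * (1 - x[(i, j)])
                   for i, j in combinations(range(n), 2))
    diag = sum(b[i][i] for i in range(n))
    return (off_diag + diag) / two_m


def satisfies_triangle_constraints(x, n):
    """Check x_ik + x_jk >= x_ij for all i < j and all k (assumption: all k)."""
    def val(i, j):
        return x[(i, j)] if i < j else x[(j, i)]
    return all(val(i, k) + val(j, k) >= val(i, j)
               for i, j in combinations(range(n), 2)
               for k in range(n) if k not in (i, j))


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
b, two_m = modularity_matrix(6, toy_edges)
x = partition_to_x([0, 0, 0, 1, 1, 1])
print(round(ip_objective(b, two_m, x), 4), satisfies_triangle_constraints(x, 6))

# An x vector that does not encode any partition: nodes 0 and 1 are declared
# "different" although both are declared "same" as node 2.
bad_x = partition_to_x([0, 0, 0, 0, 0, 0])
bad_x[(0, 1)] = 1
print(satisfies_triangle_constraints(bad_x, 6))
```

The constraints exist to rule out assignments such as `bad_x`, in which "same community" would fail to be transitive.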

2.4 Reviewing Eight Heuristic Modularity Maximization Algorithms

We evaluate eight modularity maximization heuristics known as Clauset-Newman-Moore (CNM) [13], Louvain [7], Leicht-Newman (LN) [30], Combo [46], Belief [50], Paris [8], Leiden [48], and EdMot-Louvain [31]. We have used the Python implementations of these eight algorithms which are accessible in the Community Discovery library (CDlib) version 0.2.6 [42].

We briefly describe how these eight algorithms use modularity to discover communities. The CNM algorithm initializes each node as a community by itself. It then follows a greedy scheme of merging two communities that contribute the maximum positive value to modularity [13]. The Louvain algorithm involves two sets of iterative steps: (1) locally moving nodes for increasing modularity and (2) aggregating the communities from the first step [7]. Despite Louvain being the most commonly used modularity-based algorithm [28], it may sometimes lead to disconnected components in the same community [48]. The LN algorithm uses spectral optimization to maximize modularity which also supports directed graphs [30]. The Combo algorithm is a general optimization-based CD method which supports modularity maximization among other tasks. It involves two sets of iterative steps: (1) finding the best merger, split, or recombination of communities to maximize modularity and (2) performing a series of Kernighan-Lin bisections [26] on the communities as long as they increase modularity [46]. The Belief algorithm seeks the consensus of different high-modularity partitions through a message-passing algorithm [50] motivated by the premise that maximizing modularity can lead to many poorly correlated competing partitions. The Paris algorithm is suggested to be a modularity-maximization scheme with a sliding resolution [8]; that is, an algorithm capable of capturing the multi-scale community structure of real networks without a resolution parameter. It generates a hierarchical community structure based on a simple distance between communities using a nearest-neighbour chain [8]. The Leiden algorithm attempts to resolve a defect of the Louvain algorithm in returning badly connected communities. It is suggested to guarantee well-connected communities in which all subsets of all communities are locally optimally assigned [48]. 
The EdMot-Louvain algorithm (EdMot for short) is developed to overcome the hypergraph fragmentation issue observed in previous motif-based CD methods [31]. It first creates the graph of higher-order motifs (small dense subgraph patterns) and then partitions it using the Louvain method to heuristically maximize modularity using higher-order motifs [31].
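The greedy merging idea described above for CNM can be sketched in a few lines. The code below is a naive illustration only: it recomputes all candidate merges in every round and stops when no merge increases modularity, whereas the published algorithm [13] and the CDlib implementation used in this study rely on more efficient data structures. The merge gain used here, \(\Delta Q = E_{AB}/m - \gamma D_A D_B/(2m^2)\), follows from Eq. (1), with \(E_{AB}\) the number of edges between communities A and B and \(D_A\), \(D_B\) their total degrees. Function names and the toy graph are assumptions.

```python
def greedy_merge(n, edges, gamma=1.0):
    """Naive agglomerative sketch: repeatedly merge the pair of communities
    with the largest positive modularity gain; stop when no gain is positive."""
    m = len(edges)
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    labels = list(range(n))  # every node starts as its own community
    while True:
        deg_sum = {}
        for v in range(n):
            deg_sum[labels[v]] = deg_sum.get(labels[v], 0) + deg[v]
        # Count edges between each pair of distinct communities. Pairs with
        # no connecting edge always have a negative gain and are skipped.
        between = {}
        for i, j in edges:
            a, b = labels[i], labels[j]
            if a != b:
                key = (min(a, b), max(a, b))
                between[key] = between.get(key, 0) + 1
        best_gain, best_pair = 0.0, None
        for (a, b), e_ab in sorted(between.items()):
            gain = e_ab / m - gamma * deg_sum[a] * deg_sum[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return labels
        a, b = best_pair
        labels = [a if c == b else c for c in labels]  # merge b into a


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
result = greedy_merge(6, toy_edges)
print(result)
```

On this toy graph the greedy scheme happens to recover the two triangles, but nothing in the procedure guarantees an optimal partition in general, which is the question examined in the remainder of this paper.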

To evaluate these eight modularity-based algorithms in maximizing modularity, we quantify (1) the ratio of their output modularity to the maximum modularity for each input graph and (2) the maximum similarity between their output partition and any optimal partition of that graph. We obtain optimal partitions by solving the IP model in Eq. (2) using the Gurobi solver (version 10.0) with a termination criterion ensuring global optimality [22].

2.5 Measures for Evaluating Heuristic Algorithms

For a quantitative measure of proximity to global optimality, we define and use the Global Optimality Percentage (GOP) as the ratio of the modularity returned by a heuristic method for a network to the globally maximum modularity for that network (obtained by solving the IP model in Eq. (2)). In all cases where the modularity returned by a heuristic method equals the maximum modularity for the input graph, we set GOP = 1. In cases where a heuristic algorithm returns a partition with a negative modularity value, we set GOP = 0 to facilitate easier interpretation of proximity to optimality based on non-negative GOP values.
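The definition of GOP, including its two special cases, can be written as a small function. This is a sketch under stated assumptions: the function name and the numerical tolerance used to decide equality are ours, and the example inputs are the modularity values reported for the facebook_friends network in Subsect. 3.1.

```python
def global_optimality_percentage(q_heuristic, q_max, tol=1e-9):
    """GOP: ratio of a heuristic's modularity to the maximum modularity.

    tol is an assumed tolerance for treating the two values as equal.
    """
    if q_max <= 0:
        raise ValueError("GOP is only defined here for a positive maximum modularity")
    if q_heuristic < 0:
        return 0.0  # negative modularity is mapped to GOP = 0
    if abs(q_heuristic - q_max) <= tol:
        return 1.0  # the heuristic reached the maximum
    return q_heuristic / q_max


q_star = 0.7157714  # maximum modularity reported in Subsect. 3.1
gop_ln = global_optimality_percentage(0.7139, q_star)
gop_combo = global_optimality_percentage(0.7157709, q_star)
print(gop_ln, gop_combo)
```

Both values are below 1, so neither partition counts as optimal, even though the Combo value differs from the maximum only in the seventh decimal place of Q.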

We also use a quantitative measure for the similarity of a partition to an optimal partition. Normalized Adjusted Mutual Information (AMI) [49] is a measure of similarity between two partitions of the same network. Unlike normalized mutual information [49], AMI adjusts the measurement based on the similarity that two partitions may have by pure chance. AMI for a pair of identical partitions (or permutations of the same partition) equals 1. For two different partitions, however, AMI takes a smaller value (including 0 or negative values close to 0 for two extremely dissimilar partitions).
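For readers who want to see how the chance adjustment works, the following sketch computes AMI from scratch using the hypergeometric expectation of mutual information. It assumes normalization by the arithmetic mean of the two partition entropies, which is one of several variants and may differ from the variant used in our experiments; function names are illustrative. The helper `closest_optimal_ami` mirrors the practice of comparing a partition against each of several optimal partitions and keeping the largest value.

```python
from collections import Counter
from math import exp, lgamma, log


def _entropy(counts, n):
    return -sum(c / n * log(c / n) for c in counts)


def _mutual_information(u, v):
    n = len(u)
    a, b = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum(nij / n * log(n * nij / (a[i] * b[j]))
               for (i, j), nij in joint.items())


def _expected_mutual_information(a_counts, b_counts, n):
    """Expected MI of two random partitions with the given community sizes."""
    def lfact(k):
        return lgamma(k + 1)  # log(k!)
    emi = 0.0
    for ai in a_counts:
        for bj in b_counts:
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_prob = (lfact(ai) + lfact(bj) + lfact(n - ai) + lfact(n - bj)
                            - lfact(n) - lfact(nij) - lfact(ai - nij)
                            - lfact(bj - nij) - lfact(n - ai - bj + nij))
                emi += nij / n * log(n * nij / (ai * bj)) * exp(log_prob)
    return emi


def adjusted_mutual_information(u, v):
    """AMI with arithmetic-mean normalization (an assumption of this sketch)."""
    n = len(u)
    a_counts = list(Counter(u).values())
    b_counts = list(Counter(v).values())
    mi = _mutual_information(u, v)
    emi = _expected_mutual_information(a_counts, b_counts, n)
    denom = (_entropy(a_counts, n) + _entropy(b_counts, n)) / 2 - emi
    if abs(denom) < 1e-15:
        return 1.0  # both partitions are trivial and identical
    return (mi - emi) / denom


def closest_optimal_ami(heuristic, optimal_partitions):
    """Largest AMI between one partition and any of several optimal partitions."""
    return max(adjusted_mutual_information(heuristic, opt)
               for opt in optimal_partitions)


same = adjusted_mutual_information([0, 0, 0, 1, 1, 1], [5, 5, 5, 2, 2, 2])
unrelated = adjusted_mutual_information([0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1])
print(same, unrelated)
```

In this example the relabelled partition scores 1 (up to rounding), while the unrelated partition yields a small negative value, consistent with the behaviour described above.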

2.6 Data and Resources

For our computational experiments, we include 60 real networks with no more than 2812 edges as well as 10 Erdős-Rényi graphs and 10 Barabási-Albert graphs with 125–153 edges. These instance sizes were chosen to ensure all algorithms terminate within a reasonable time. The computational experiments were implemented in Python 3.9 using a notebook computer with an Intel Core i7-11800H @ 2.30 GHz CPU and 64 GB of RAM running Windows 10.

3 Results

We present the main results from our experiments in the following four subsections. In Subsect. 3.1, we compare partitions from different algorithms on a single network. In Subsect. 3.2, we examine the multiplicity of optimal partitions and investigate the similarity between multiple optimal partitions of the same networks. In Subsect. 3.3, we evaluate the effectiveness of the heuristic algorithms on 80 networks by measuring the distance of sub-optimal partitions from an optimal partition. Finally, in Subsect. 3.4, we investigate the success rate of the heuristic algorithms in finding an optimal partition.

3.1 Comparing Partitions from Different Algorithms on One Network

Figure 1 shows one graph and the nine partitions of it returned by nine CD methods. This graph represents an anonymized Facebook ego network. Nodes represent Facebook users, and an edge exists between any pair of users who were friends on Facebook in April 2014 [33]. Communities are shown using node colors.

Fig. 1. Modularity maximization for one network using nine methods leading to one optimal partition (panel a) and eight sub-optimal partitions (panels b-i) with different Q, k, and AMI values. (Magnify the high-resolution color figure on screen for more details.)

Panel 1a of Fig. 1 shows an optimal partition obtained by solving the IP model in Eq. (2) for the network facebook_friends. It involves \(k=28\) communities, and a maximum modularity value of \(Q^*=0.7157714\). The partitions from the eight heuristic modularity maximization algorithms are all sub-optimal as depicted in panels 1b–1i of Fig. 1. Compared to other algorithms, the two algorithms Combo and LN have more success in achieving proximity to an optimal partition. LN returns a partition with \(k=28\) communities and a modularity of \(Q=0.7139\) which has the highest AMI among all heuristics (0.971). The relative success of the Combo algorithm is in returning a high-modularity partition with \(Q=0.7157709\), but with \(k=13\) communities and a lower AMI (0.949) compared to LN. The sub-optimal partitions from the other six algorithms have more substantial variations in Q, AMI, and k (number of communities) as shown by the values in the corresponding subcaptions in Fig. 1.

3.2 Multiplicity of Optimal Partitions

While the partition which maximizes modularity is often unique, some graphs have multiple optimal partitions. For all networks considered in our analysis, we obtain all optimal partitions using the Gurobi solver by running it with a special configuration for finding all optimal partitions [22]. Figure 2 shows a protein network and its four optimal partitions. In this network, nodes represent proteins and an edge represents a binding interaction between two proteins (PDZ-domain-mediated protein-protein binding interaction) [6]. All four optimal partitions have \(Q^*=0.80267\) and \(k=29\).

The differences between the optimal partitions of this network are in the community assignments for two nodes indicated by red arrows in Fig. 2. The six pairwise AMI values for the optimal partitions are all greater than 0.98, confirming the high level of similarity between the four optimal partitions in Fig. 2.

Fig. 2. A protein network and its four optimal partitions (panels a-d). The red arrows show the differences between optimal partitions. (Magnify the high-resolution color figure on screen for more details.)

Obtaining all optimal partitions for all 80 networks, we observed that 89% of the graphs have unique optimal partitions and the multiplicity of optimal partitions is a relatively rare event. Given the possibility of multiple optimal partitions in some graphs, we calculated the AMI for the partition of each heuristic algorithm and each of the multiple optimal partitions of that graph. We then conservatively reported the maximum AMI of each heuristic for each graph to quantify the similarity between that partition and its closest optimal partition. Consequently, a low value of AMI for a partition obtained by a heuristic algorithm indicates its dissimilarity to any optimal partition.

Our results suggest that the rarely observed multiple optimal partitions of a graph often have a high degree of similarity (AMI values >0.9) because their differences are often only in the community assignments of very few nodes (as in Fig. 2). Dissimilarity between multiple optimal partitions of a network seems to be exceptional, but it has been observed in one of our 80 networks: contiguous USA, where nodes are US states and each edge indicates a land-based border between two states. The AMI of the two optimal partitions for this network is exceptionally low (0.34). Upon further investigation, we observed that one optimal partition merges five communities of the other optimal partition. This makes the two partitions related in the sense of belonging to the same clustering hierarchy, even though they are not similar according to an AMI-based definition of partition similarity. These exceptional cases are possible due to the mathematical symmetries resulting from the value of \(\gamma \) used in Eq. (1) for defining modularity. Our results suggest that there is usually a distinct uniqueness to an optimal partition (or a group of similar optimal partitions) for a given network in comparison to sub-optimal partitions. This new perspective is contrary to the premise that maximizing modularity leads to many competing partitions with almost the same modularity [50] and no clear way of selecting between them [41]. It is the failure to actually maximize modularity that may lead to many poorly correlated competing partitions with unknown distances from the desired objective (both in modularity and in partition similarity). What remains to be analyzed is how different sub-optimal partitions are from an optimal partition and how often heuristic modularity maximization algorithms return sub-optimal partitions. We investigate these two questions in the next two subsections.

3.3 Evaluating Heuristic Algorithms on 80 Networks

For summarizing the results of eight heuristics on 80 networks, we present four scatter-plots of GOP and AMI. Figure 3 shows GOP on the y-axes and AMI on the x-axes for the combination of each network and algorithm. For each algorithm (color-coded), there are 60 data points for the 60 real networks and 2 data points representing the average of 10 Erdős-Rényi and the average of 10 Barabási-Albert graphs. The first three letters of the network names are indicated on each data point (magnify the figure on screen for the details). Four 45-degree lines are drawn to indicate the areas where the GOP and AMI are equal. In other words, the 45-degree lines show areas where the extent of sub-optimality (\(1-\text {GOP}\)) is associated with a dissimilarity (\(1-\text {AMI}\)) of the same size between the sub-optimal partition and any optimal partition.

Looking at the y-axes values in Fig. 3, we observe that there is a substantial variation in the values of GOP (i.e., the extent of sub-optimality) for the eight heuristic algorithms. The Belief algorithm returns partitions associated with negative modularity values for 45 of the 80 instances (leading to most of its data points having GOP = 0 and being concentrated at the bottom of the scatter-plot). The Paris algorithm returns partitions with modularity values substantially smaller than the maximum modularity values. Aside from a few exceptions, all data points for Leiden and LN have the same position, indicating their identical performance on most of these instances. The two algorithms CNM and EdMot seem to have higher variation in GOP (compared to the other algorithms) for these instances. Overall, the four best-performing algorithms in returning close-to-maximum modularity values are, in increasing order of performance, LN, Leiden, Louvain, and Combo. Although these instances are graphs with no more than 2812 edges, they are, according to Fig. 3, challenging instances for these heuristic algorithms to optimize. Given that modularity maximization is an NP-hard problem [9, 35], one can argue that the performance of these heuristic methods in terms of proximity to an optimal partition does not improve for larger networks.

The x-axes values in Fig. 3 show considerable dissimilarity between the sub-optimal partitions and an optimal partition for these 80 instances. Except for the Combo algorithm, a large number of the sub-optimal partitions obtained by these heuristic algorithms have AMI values smaller than 0.6. This indicates that their sub-optimal partitions are substantially different from any optimal partition. Even for data points concentrated at the top of the scatter-plots which have \(0.95<\text {GOP}<1\), we see AMI values substantially smaller than 1. Compared to the other seven heuristics, Combo appears to consistently return partitions with large AMIs on a larger number of these 80 instances.

Fig. 3. Global optimality percentage and normalized adjusted mutual information measured for eight modularity maximization heuristics in comparison with (all) globally optimal partitions. (Magnify the high-resolution figure on screen for more details.)

Focusing on the position of data points, we observe that they are mostly located above their corresponding 45-degree line. This indicates that sub-optimal partitions tend to be disproportionately dissimilar to any optimal partition (as foreshadowed in [14]). This result goes against the naive viewpoint that close-to-maximum modularity partitions are also close to an optimal partition. Our results are aligned with previous concerns that these heuristics may result in degenerate solutions far from the underlying community structure [20] and they have a high risk of algorithmic failure [24].

3.4 Success Rate of Heuristic Algorithms in Maximizing Modularity

Our GOP results for the eight heuristic algorithms allow us to answer a fundamental question about the heuristic modularity maximization algorithms: how often does each algorithm return an optimal (maximum-modularity) partition? We report the fraction of networks (out of 80) for which a given algorithm returns an optimal partition. Combo [46] has the highest success rate, returning an optimal partition for \(55\%\) of the networks. LN [30] and Leiden [48] maximize modularity for 36.2% of the networks considered. Louvain [7] has a success rate of 18.7%. The algorithms CNM [13], EdMot [31], Paris [8], and Belief [50] have success rates of 5%, 2.5%, 1.2%, and 0% respectively. These are arguably low success rates given what the name modularity maximization algorithm implies and given the idea of discovering network communities through maximizing a function.

Earlier in Fig. 3, we observed that near-optimal partitions tend to be disproportionately dissimilar to any optimal partition. In other words, close-to-maximum modularity partitions are rarely close to any optimal partition. Taken together with the low success rates of heuristic algorithms in maximizing modularity, our results indicate a crucial mismatch between the design philosophy of modularity maximization algorithms for CD and their capabilities: heuristic modularity maximization algorithms rarely return an optimal partition or a partition resembling an optimal partition.

4 Discussions and Future Directions

Understanding the capabilities and limitations of modularity has been complicated by the under-studied sub-optimality of modularity-based heuristics and its methodological consequences. Previous methodological studies [11, 12, 29, 36, 41], which have shed light on other aspects, have rarely disentangled the heuristic aspect of these algorithms from the fundamental concept of modularity. Our study is a continuation of previous efforts [20] in separating the effects of sub-optimality (or the choice of using greedy algorithms [24]) from the effects of using modularity on the fundamental task of detecting communities.

We analyzed the effectiveness of eight heuristics in maximizing modularity. While our findings are limited to a few algorithms, their combined usage by tens of thousands of peer-reviewed studies [28] motivates the importance of this assessment. Most heuristic algorithms for modularity maximization tend to scale well for large networks [51]. They are widely used not only because of their scalability or ease of implementation [24], but also because their high risk of algorithmic failure is not well understood [24]. The scalability of these heuristics comes at a cost: their partitions have no guarantee of proximity to an optimal partition [20] and, as our results showed, they rarely return an optimal partition. Moreover, we showed that their sub-optimal partitions tend to be disproportionately dissimilar to any optimal partition.

Neither using modularity nor succeeding in maximizing it is required for CD at the big-picture level. A recent study suggests modularity maximization is the most problematic CD method and considers it harmful [41]. Another study shows that, given computational feasibility, exact maximization of multiresolution modularity outperforms other CD methods in accurate and stable retrieval of planted communities [4], suggesting the relevance of modularity for CD. For some applications and contexts, general CD algorithms [39] which scale to large instance sizes are needed. However, for a “narrow set of tasks” [39, p. 7] involving small and mid-sized networks, specialized algorithms which outperform general algorithms are useful.

Our findings suggest that if modularity is to be used for detecting communities, developing approximation [10, 14, 25] and exact [3, 4] algorithms is advisable for a more methodologically sound usage of modularity within its applicability limits. Exact algorithms can also reveal the formal guarantees of performance [19] for accurate modularity-based algorithms.

A promising path forward could be using the advances in integer programming to develop a specialized accurate algorithm for solving the modularity maximization IP models [1, 9, 15] for networks of practical relevance within the limits of computational feasibility. New heuristic and approximation algorithms that strike a balance between accurate computations and scalability may also be useful particularly for large-scale networks.