Analyzing Modularity Maximization in Approximation, Heuristic, and Graph Neural Network Algorithms for Community Detection

Community detection, which involves partitioning nodes within a network, has widespread applications across computational sciences. Modularity-based algorithms identify communities by attempting to maximize the modularity function across network node partitions. Our study assesses the performance of various modularity-based algorithms in obtaining optimal partitions. Our analysis utilizes 104 networks, including both real-world instances from diverse contexts and modular graphs from two families of synthetic benchmarks. We analyze ten inexact modularity-based algorithms against the exact integer programming baseline that globally optimizes modularity. Our comparative analysis includes eight heuristics, two variants of a graph neural network algorithm, and nine variations of the Bayan approximation algorithm. Our findings reveal that the average modularity-based heuristic yields optimal partitions in only 43.9% of the 104 networks analyzed. Graph neural networks and approximate Bayan, on average, achieve optimality on 68.7% and 82.3% of the networks, respectively. Additionally, our analysis of partition similarity metrics exposes substantial dissimilarities between high-modularity sub-optimal partitions and any optimal partition of the networks. We observe that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of the commonly used modularity-based methods: they rarely produce an optimal partition or a partition resembling an optimal partition, even on networks with modular structures. If modularity is to be used for detecting communities, we recommend approximate optimization algorithms for a more methodologically sound usage of modularity within its applicability limits.


Introduction
Community detection (CD), the process of inductively identifying communities within a network, is a core problem in computational sciences, particularly in physics, computer science, biology, and computational social science [19]. Among common approaches for CD are algorithms designed to maximize a utility function, modularity [40], across all possible ways that the nodes of the input network can be partitioned into communities. Modularity measures the fraction of edges within communities minus the expected fraction if the edges were distributed randomly, with the random distribution of the edges being a null model that preserves the node degrees. Despite their name and design philosophy, current modularity maximization algorithms, which are used by no less than tens of thousands of peer-reviewed studies [31], are not guaranteed to maximize modularity [41,27,39].
Modularity is among the first objective functions proposed for an optimization-based detection of communities [40,18]. Several limitations [21,16,18,45] of modularity, including the resolution limit [17,52,30], have led researchers to develop alternative CD methods using stochastic block modeling [26,44,36,53], information-theoretic approaches [49,50], and alternative objective functions [58,2,23,38]. Nevertheless, modularity-based heuristics remain the most commonly used methods for CD [56,19]. Not only do modularity-based heuristics fail to guarantee optimality or proximity to an optimal partition, but there is also uncertainty [20,27] about the extent to which they succeed in returning a maximum-modularity (optimal) partition or something similar. Unlike heuristic methods for modularity maximization, approximation algorithms provide guarantees on the proximity to optimality. The Bayan algorithm [4] is one such example; it can be configured to be used as an approximation algorithm for modularity maximization.
In this paper, we evaluate eight modularity-based heuristics [13,7,34,56,60,8,57,35], two variations of a graph neural network algorithm [55], and several variations of an approximation algorithm [4]. We quantify the extent to which these ten algorithms succeed in returning an optimal partition or a partition resembling an optimal partition. After describing the methods and materials, we present the results in five subsections, followed by a discussion of the methodological ramifications and future directions.
Modularity

The modularity of a partition X of the graph G = (V, E) with m edges, adjacency entries a_{ij}, and node degrees d_i is defined as

Q_{(G,X)} = \frac{1}{2m} \sum_{(i,j) \in V^2} \left( a_{ij} - \gamma \frac{d_i d_j}{2m} \right) \delta(i,j).    (1)

In Eq. (1), d_i represents the degree of node i; γ is the resolution parameter (without loss of generality, we set γ = 1 for all the analysis in this paper); and the Kronecker delta, δ(i, j), is 1 if nodes i and j are in the same community and 0 otherwise. The term associated with each pair of nodes (i, j) is alternatively represented as b_{ij} = a_{ij} − γ d_i d_j / (2m) and referred to as the modularity matrix entry for (i, j).

Modularity maximization
The modularity maximization problem for the input graph G = (V, E) involves finding a partition X^*(G) whose modularity is maximum over all possible partitions:

X^*(G) = \arg\max_X Q_{(G,X)}.

Optimal and sub-optimal partitions
For graph G, we refer to any partition that satisfies the definition of X^*(G) = \arg\max_X Q_{(G,X)} as an optimal partition (i.e., a maximum-modularity partition). Any partition that is not an optimal partition is a sub-optimal partition. Different sub-optimal partitions X_1, X_2 of graph G are distinguished based on their corresponding modularity values Q_{(G,X_1)}, Q_{(G,X_2)} as well as by their (maximum) similarity to an optimal partition of G.

Sparse IP formulation of modularity maximization
Consider the simple graph G = (V, E) with modularity matrix entries b_{ij}, obtained using the resolution parameter γ. We use the binary decision variable x_{ij} for each pair of distinct nodes (i, j), i < j. Their community membership is either the same (represented by x_{ij} = 0) or different (represented by x_{ij} = 1). Accordingly, the problem of maximizing the modularity of input graph G can be formulated as an IP model [15] as in Eq. (2).
In Eq. (2), the optimal objective function value equals the maximum modularity for the input graph G. An optimal community assignment is characterized by the optimal values of the x_{ij} variables. K(i, j) indicates a minimum-cardinality separating set [15] for the nodes i, j. Using K(i, j) in the IP model of this problem leads to a more efficient formulation with O(n^2) constraints [15] instead of O(n^3) constraints as in earlier IP formulations of the problem [9,1]. Solving this optimization problem is NP-hard [9,39]. To obtain the baseline of our comparative analysis, we use the Gurobi solver (version 10.0) [22] to solve this NP-hard problem to global optimality for the small and mid-sized instances outlined in Subsection 2.7.
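To make the IP model concrete, the following is a minimal gurobipy sketch of modularity maximization posed as a clique partitioning problem over the x_{ij} variables. For readability it uses the dense O(n^3) triangle-constraint formulation of the earlier works [9,1]; the sparse formulation of Eq. (2) instead keeps only the constraints indexed by the separating sets K(i, j). The function name and structure below are illustrative assumptions, not the exact model solved in our experiments.

```python
# A minimal sketch of the modularity maximization IP (dense triangle-constraint version).
from itertools import combinations

import gurobipy as gp
import networkx as nx
from gurobipy import GRB


def solve_modularity_ip(G: nx.Graph, gamma: float = 1.0):
    nodes = list(G.nodes())
    pairs = list(combinations(nodes, 2))
    m = G.number_of_edges()
    deg = dict(G.degree())

    def b(i, j):  # modularity matrix entry b_ij = a_ij - gamma * d_i * d_j / (2m)
        return (1.0 if G.has_edge(i, j) else 0.0) - gamma * deg[i] * deg[j] / (2 * m)

    model = gp.Model("modularity_ip")
    # x[i, j] = 0 if nodes i and j are in the same community, 1 otherwise
    x = model.addVars(pairs, vtype=GRB.BINARY, name="x")

    # Q = (1/2m) * (constant self-pair terms + 2 * sum of b_ij over same-community pairs)
    diag = sum(b(i, i) for i in nodes)
    model.setObjective(
        (diag + gp.quicksum(2 * b(i, j) * (1 - x[i, j]) for i, j in pairs)) / (2 * m),
        GRB.MAXIMIZE,
    )

    # Transitivity: if i,j and j,k share a community, then so do i,k (and symmetrically)
    for i, j, k in combinations(nodes, 3):
        model.addConstr(x[i, k] <= x[i, j] + x[j, k])
        model.addConstr(x[i, j] <= x[i, k] + x[j, k])
        model.addConstr(x[j, k] <= x[i, j] + x[i, k])

    model.optimize()

    # Recover communities as connected components of the "same community" relation
    same = nx.Graph()
    same.add_nodes_from(nodes)
    same.add_edges_from((i, j) for i, j in pairs if x[i, j].X < 0.5)
    return model.ObjVal, [set(c) for c in nx.connected_components(same)]
```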

Reviewing ten modularity-based algorithms and their variations
We evaluate ten modularity-based algorithms known as Clauset-Newman-Moore (CNM) [13], Louvain [7], Reichardt-Bornholdt with the configuration model as the null model (LN) [46,34], Combo [56], Belief [60], Paris [8], Leiden [57], EdMot-Louvain [35], recurrent graph neural network (GNN) [55], and Bayan [4]. Except for Bayan and GNN, we use the Python implementations of the remaining eight algorithms (collectively referred to as heuristics) which are accessible in the Community Discovery library (CDlib) version 0.2.6 [48]. For Bayan, we use the bayanpy library (version 0.7.6) in Python, and we use the GNN as implemented in its public GitHub repository referenced in [55]. We briefly describe how these ten algorithms use modularity to discover communities.
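As an illustration of how the heuristics are invoked, the following is a minimal sketch using CDlib. The entry-point names (e.g., greedy_modularity for CNM, rb_pots for LN, pycombo for Combo) are assumed from CDlib 0.2.x and may differ across versions.

```python
# A minimal sketch of running several of the heuristics through CDlib.
import networkx as nx
from cdlib import algorithms, evaluation

G = nx.karate_club_graph()  # any simple undirected graph

partitions = {
    "CNM": algorithms.greedy_modularity(G),
    "Louvain": algorithms.louvain(G, resolution=1.0),
    "Leiden": algorithms.leiden(G),
    "LN": algorithms.rb_pots(G, resolution_parameter=1.0),
    "Combo": algorithms.pycombo(G),
    "Paris": algorithms.paris(G),
}
for name, clustering in partitions.items():
    q = evaluation.newman_girvan_modularity(G, clustering).score
    print(f"{name}: k = {len(clustering.communities)}, Q = {q:.4f}")
```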
CNM: The CNM algorithm initializes each node as a community by itself. It then follows a greedy scheme of merging two communities that contribute the maximum positive value to modularity [13].
Louvain: The Louvain algorithm involves two sets of iterative steps: (1) locally moving nodes for increasing modularity and (2) aggregating the communities from the first step [7]. Despite Louvain being the most commonly used modularity-based algorithm [31], it may sometimes lead to disconnected components in the same community [57].
Leiden: The Leiden algorithm attempts to resolve a defect of the Louvain algorithm, namely returning badly connected communities. It is suggested to guarantee well-connected communities in which all subsets of all communities are locally optimally assigned [57].
LN: The LN algorithm uses the same heuristic rules as the Leiden algorithm, but it supports weighted and directed graphs [34].
Combo: The Combo algorithm is a general optimization-based CD method which supports modularity maximization among other tasks. It involves two sets of iterative steps: (1) finding the best merger, split, or recombination of communities to maximize modularity and (2) performing a series of Kernighan-Lin bisections [29] on the communities as long as they increase modularity [56].
Belief: The Belief algorithm seeks the consensus of different high-modularity partitions through a message-passing algorithm [60] motivated by the premise that maximizing modularity can lead to many poorly correlated competing partitions.
Paris: The Paris algorithm is suggested to be a modularity-maximization scheme with a sliding resolution [8]; that is, an algorithm capable of capturing the multi-scale community structure of real networks without a resolution parameter. It generates a hierarchical community structure based on a simple distance between communities using a nearest-neighbour chain [8].
EdMot: The EdMot-Louvain algorithm (EdMot for short) is developed to overcome the hypergraph fragmentation issue observed in previous motif-based CD methods [35]. It first creates the graph of higher-order motifs (small dense subgraph patterns) and then partitions it using the Louvain method to heuristically maximize modularity using higher-order motifs [35].

GNN: The GNN algorithm uses a recurrent graph neural network for maximizing modularity [55]. It relies on a continuous optimization technique over node attachments: continuous variables representing the cluster membership of a given node in a given community. In this algorithm, the attachments of nodes are combined with the attachments of their neighbors. It starts with a random initial matrix of all attachments, which is then updated iteratively to increase the modularity function using a recurrent graph neural network architecture [55]. We have used two variations of the GNN algorithm: GNN-100 (suggested to be the fastest version) and GNN-25K (suggested to be a slow but very precise version) [55].
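The idea of iteratively updating continuous attachments to increase modularity can be illustrated with a generic gradient-ascent sketch over a soft node-community assignment matrix. This is not the recurrent architecture of [55]; it is only a minimal relaxation-based illustration, with the number of communities k and the optimizer settings as arbitrary assumptions.

```python
# A generic sketch: maximize a continuous relaxation of modularity over soft attachments.
import networkx as nx
import torch


def soft_modularity_partition(G: nx.Graph, k: int = 10, steps: int = 2000, lr: float = 0.05):
    A = torch.tensor(nx.to_numpy_array(G), dtype=torch.float32)
    d = A.sum(dim=1, keepdim=True)
    m = A.sum() / 2
    B = A - d @ d.T / (2 * m)                           # modularity matrix (gamma = 1)

    logits = torch.randn(A.shape[0], k, requires_grad=True)  # random initial attachments
    optimizer = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        C = torch.softmax(logits, dim=1)                # soft community memberships
        Q = torch.trace(C.T @ B @ C) / (2 * m)          # relaxed modularity
        optimizer.zero_grad()
        (-Q).backward()
        optimizer.step()
    return torch.softmax(logits, dim=1).argmax(dim=1)   # harden to a partition
```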
Approximate Bayan: Unlike the other algorithms discussed, the approximate Bayan algorithm (Bayan for short) is an approximation algorithm for modularity maximization. Bayan is theoretically grounded by an IP formulation of the modularity maximization problem [15]. For approximating an optimal solution to the IP problem, Bayan uses a branch-and-cut scheme [4] while accounting for the gap between the upper bound and lower bound of the optimization problem. When the two bounds reach the desired approximation threshold (set by the user), the Bayan algorithm returns the highest-modularity partition found alongside the maximum potential modularity gap (in percentage) that the obtained partition may have from a globally maximum-modularity partition.
For each network instance, we first obtain the optimal partitions by solving the IP model in Eq. (2) using the Gurobi solver (version 10.0) and a termination criterion that ensures global optimality [22]. While the partition which maximizes modularity is often unique [6], some graphs have multiple optimal partitions. We obtain all optimal partitions of the networks by running the Gurobi solver with a special configuration for finding all optimal solutions [22]. In the next step, we evaluate these ten modularity-based algorithms in maximizing modularity. On each network instance, we quantify the following for each algorithm (or algorithm variation): (1) the ratio of their output modularity to the maximum modularity for that network and (2) the maximum similarity between their output partition and any optimal partition of that network. We use two different measures of partition similarity.
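A sketch of the solution-pool configuration mentioned above, continuing from the gurobipy model sketch (the pool size is an illustrative assumption), is:

```python
# Collect multiple optimal solutions of the IP model via Gurobi's solution pool.
model.setParam(GRB.Param.PoolSearchMode, 2)   # systematically search for the best solutions
model.setParam(GRB.Param.PoolSolutions, 100)  # maximum number of solutions to store (assumed cap)
model.setParam(GRB.Param.PoolGap, 0.0)        # keep only solutions attaining the optimal objective
model.optimize()

optimal_solutions = []
for s in range(model.SolCount):
    model.setParam(GRB.Param.SolutionNumber, s)
    optimal_solutions.append({pair: var.Xn for pair, var in x.items()})  # s-th pool solution
```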

Measures for evaluating the algorithms
For a quantitative measure of proximity to global optimality, we define and use the Global Optimality Percentage (GOP) as the modularity returned by an algorithm for a network divided by the globally maximum modularity for that network (obtained by solving the IP model in Eq. (2)). In cases where an algorithm returns a partition with a negative modularity value, we set GOP = 0 to facilitate easier interpretation of proximity to optimality based on non-negative GOP values.
We use two measures to quantify the similarity of a partition to an optimal partition, AMI and RMI, which are shown to be reliable measures of partition similarity [24]. We use adjusted mutual information [59] and normalize it symmetrically [24]; the symmetrically normalized adjusted mutual information (AMI for short) [59] measures the similarity between two partitions of the same network. To ensure the reliability of our results on partition similarities, we also use reduced mutual information [42] and normalize it asymmetrically [24]; the asymmetrically normalized reduced mutual information (RMI for short) [42] likewise measures the similarity between two partitions of the same network.
Unlike the commonly used [47,54,23,30,51,52] yet problematic [42,24] normalized mutual information (NMI) [59], AMI and RMI adjust the measurement based on the similarity that the two partitions may have by pure chance. AMI and RMI for a pair of identical partitions (or permutations of the same partition) equal 1. For two extremely dissimilar partitions, however, AMI and RMI take values close to 0.
Given the possibility of multiple optimal partitions in some graphs, we calculated the AMI and RMI between the partition of each algorithm and each of the globally optimal partitions of that graph. We then conservatively report the maximum AMI and the maximum RMI of each algorithm for each graph to quantify the similarity between its partition and the closest optimal partition. Consequently, a low value of AMI or RMI for a partition obtained by an algorithm indicates its dissimilarity to any optimal partition of that network.
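A minimal sketch of computing GOP and the conservative maximum AMI per network is given below. It uses scikit-learn's adjusted_mutual_info_score, whose default arithmetic normalization is one symmetric choice and may differ from the normalization of [24]; RMI would require a separate implementation and is not shown.

```python
# Sketch of computing GOP and the maximum AMI against all optimal partitions of a graph.
# Partitions are given as flat per-node community labels in a fixed node order.
from sklearn.metrics import adjusted_mutual_info_score


def gop(q_algorithm: float, q_maximum: float) -> float:
    # GOP is set to 0 for partitions with negative modularity
    return 0.0 if q_algorithm < 0 else q_algorithm / q_maximum


def max_ami(labels_algorithm, optimal_labelings) -> float:
    # Conservative similarity: maximum AMI over all optimal partitions of the same graph
    return max(
        adjusted_mutual_info_score(labels_optimal, labels_algorithm)
        for labels_optimal in optimal_labelings
    )
```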

Data sources and computing resources
For our computational experiments, we consider 54 real networks from a wide range of contexts and domains including online and offline social relations, social affiliation, social collaboration, animal interactions, biological, neural, informational, technological, fictional, geographical, organizational, communications, and terrorism. We also use 50 structurally diverse synthetic networks that have modular structures.
To create synthetic graphs with modular structures, we use two benchmark graph generation models: Lancichinetti-Fortunato-Radicchi (LFR) benchmark graphs [33] and Artificial Benchmarks for Community Detection (ABCD) graphs [25]. These two families of synthetic benchmark graphs are designed for evaluating the performance of CD algorithms based on their success in retrieving a planted (ground-truth) partition (see [30,54,4] for the common use case). However, we deploy these two models for generating synthetic graphs with controllable modular structures and do not use the planted partition information. LFR and ABCD each have a distinct mixing parameter which controls the association between the structure and the planted communities. This association in turn impacts the strength of the modular structure (relatively higher density of intra-community edges and relatively lower density of inter-community edges). ABCD is the more recent alternative to the LFR model for generating CD benchmark graphs [33]. Being directly comparable to LFR, ABCD offers additional benefits including higher scalability and better control for adjusting the mixing parameter [25].
The 50 synthetic networks that we use alongside the 54 real networks include 20 LFR graphs [33] and 30 ABCD graphs [25] with up to 1000 edges. The real networks considered have at most 2812 edges. These instance sizes were chosen to ensure that globally optimal partitions can be obtained in a reasonable time.
The LFR benchmarks are randomly generated based on the following parameters: number of nodes (n) randomly chosen from the range [20, 300], maximum degree ⌊0.3n⌋, maximum community size ⌊0.5n⌋, power-law exponent for the degree distribution τ1 = 3, power-law exponent for the community size distribution τ2 = 1.5, and average degree of 4. The parameter µ (LFR mixing parameter) is chosen from the set {0.01, 0.1} (10 LFR graphs for each value of µ).
The ABCD benchmarks are randomly generated based on the following parameters: number of nodes (n) randomly chosen from the range [10, 500); minimum degree d_min and minimum community size k_min randomly chosen from the range [1, n/4); maximum community size chosen randomly from [k_min + 1, n); maximum degree chosen randomly from [d_min + 1, n); and power-law exponents for the degree distribution and community size distribution chosen randomly from (1, 8) and rounded to 2 decimal places. The parameter ξ (ABCD mixing parameter) is chosen from the set {0.01, 0.1, 0.3} (10 ABCD graphs for each value of ξ).
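A sketch of generating LFR instances with the parameter ranges above, using networkx's LFR generator, is shown below. The exact generation pipeline we used may differ, and the ABCD graphs come from the separate ABCD generator of [25], which is not shown here.

```python
# A sketch of generating LFR benchmark graphs with the stated parameter ranges.
import random

import networkx as nx


def make_lfr(mu: float, seed: int = 0) -> nx.Graph:
    rng = random.Random(seed)
    n = rng.randint(20, 300)                   # number of nodes
    G = nx.LFR_benchmark_graph(
        n,
        tau1=3,                                # power-law exponent of the degree distribution
        tau2=1.5,                              # power-law exponent of the community size distribution
        mu=mu,                                 # mixing parameter (0.01 or 0.1)
        average_degree=4,
        max_degree=int(0.3 * n),
        max_community=int(0.5 * n),
        seed=seed,
    )
    G.remove_edges_from(nx.selfloop_edges(G))  # keep the graph simple
    return G


# Generation may occasionally fail for a given seed and need to be retried.
graphs = [make_lfr(mu, seed=s) for mu in (0.01, 0.1) for s in range(10)]
```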
The computational experiments were implemented in Python 3.9 using a notebook computer with an Intel Core i7-11800H @ 2.30GHz CPU and 64 GB of RAM running Windows 10.

Results
We present the main results from our experiments in the following five subsections. In Subsection 3.1, we compare partitions from 12 modularity maximization methods on a single network to illustrate the differences between the results of solving the same underlying optimization problem using different methods (algorithms and their variations). In Subsection 3.2, we use AMI and RMI to investigate the cost of sub-optimality in terms of dissimilarity of partitions from an optimal partition. In Subsection 3.3, we provide the distributions of AMI and RMI for each algorithm on all 104 networks. In Subsection 3.4, we compare the solve times of all the algorithms and their variations. Finally, in Subsection 3.5, we investigate the success rate of all the algorithms and their variations in returning an optimal partition.

Comparing partitions from different algorithms on one network
Figures 1-2 show the largest connected component (the giant component) of one network partitioned by twelve modularity-based CD methods. This network dataset represents an anonymized Facebook egocentric network from which the ego node has been removed. Nodes represent Facebook users, and an edge exists between any pair of users who were friends on Facebook in April 2014 [37]. Partitions of nodes into communities are shown using node colors.
Comparing Figures 1-2, the partitions from the six algorithms in Figure 1 have more substantial differences from the optimal values in Q, AMI, and k (number of communities), as shown by the values in the corresponding subcaptions in Figure 1. Subfigure 2f shows an optimal partition of the network facebook_friends obtained using the Bayan approximate algorithm with an approximation tolerance of 1e-3. This optimal partition has k = 28 communities and a modularity value of Q* = 0.7157714 (the maximum modularity for this network). The partitions from all the other eleven methods are sub-optimal, as depicted in Figures 1-2. Compared to other heuristics, the two heuristics Combo and LN have more success in achieving proximity to an optimal partition. LN returns a partition with k = 28 communities and a modularity of Q = 0.7153755, which has the highest AMI among all heuristics (0.971). The relative success of the Combo algorithm is in returning a particularly high-modularity partition with Q = 0.7157709, but with k = 13 communities and a lower AMI (0.949) compared to LN. The two variations of the GNN algorithm return sub-optimal partitions with 19 and 16 communities. A similar observation can be made from the RMI values of these partitions, which are not reported in the interest of brevity.

The disproportionate cost of sub-optimality
We use two scatter-plots for GOP vs. AMI and GOP vs. RMI to investigate the cost of sub-optimality (in terms of dissimilarity from an optimal partition) for each of the algorithms based on all the 104 networks. Figure 3 has one panel for each algorithm which shows GOP on the y-axis and AMI on the x-axis. Each of the 104 data points in a panel of Figure 3 corresponds to one of the 104 networks. The 45-degree lines in Figure 3 indicate the areas where the GOP and AMI are equal. In other words, the 45-degree lines show areas where the extent of sub-optimality (1-GOP) is associated with a dissimilarity (1-AMI) of the same proportion between the sub-optimal partition and any optimal partition.

Fig. 3: Global optimality percentage (GOP, on the y-axis: the ratio of the modularity value reported by an algorithm to the maximum modularity) and normalized adjusted mutual information (AMI, on the x-axis: the similarity of the partition obtained by an algorithm to the closest optimal partition), measured for each algorithm by comparing its results with (all) globally optimal partitions. (Magnify the high-resolution figure on screen for more details.)

Looking at the y-axes values in Figure 3, we observe that there is a substantial variation in the values of GOP (i.e., variations in the extent of sub-optimality) for most of the eight heuristic algorithms. The Belief algorithm returns partitions associated with negative modularity values for 22 of the 104 instances (leading to the corresponding data points having GOP = 0 and being concentrated at the bottom of the scatter-plot). The Paris and EdMot algorithms return partitions with modularity values substantially smaller than the maximum modularity values. Among the eight heuristics, the four algorithms with the highest (and increasing) performance in returning close-to-maximum modularity values are LN, Leiden, Louvain, and Combo. Although these instances are graphs with no more than 2812 edges, they are, according to Figure 3, challenging instances for these heuristic algorithms to optimize. Given that modularity maximization is an NP-hard problem [9,39], one can argue that the performance of these heuristics in achieving proximity to an optimal partition does not improve for larger networks. The y-axes values for different variations of Bayan and GNN have much lower variability and are closer (than partitions of heuristics) to 1, which indicates that these two algorithms return partitions with modularity values closer to the maximum modularity values of these networks.
The x-axes values for the eight heuristics in Figure 3 show considerable dissimilarity (from an AMI perspective) between the sub-optimal partitions of these heuristics and any optimal partition for these 104 instances. Some sub-optimal partitions obtained by these heuristics have AMI values smaller than 0.5; they are substantially different from any optimal partition. Even for some data points concentrated at the top of the scatter-plots, which have 0.95 < GOP < 1, we see some substantially small AMI values. They indicate that some high-modularity partitions are particularly dissimilar from any optimal partition. Compared to the other seven heuristics, Combo appears to consistently return partitions with high AMIs on a larger number of these 104 instances. The twelve panels for Combo and the different variations of GNN and Bayan show fewer instances of low AMI values, indicating that these three algorithms are overall more successful at returning partitions highly similar to an optimal partition. The panels for Bayan show that decreasing the approximation tolerance (gradually from 0.9 to 1e-5) leads to a gradual increase in the resulting AMI values. Unlike heuristics, whose performance cannot be controlled, Bayan provides the user with the flexibility to obtain better approximations by reducing the tolerance (at the cost of additional computations).
The most important pattern in Figure 3 is observed when we focus on the positions of data points with respect to the 45-degree lines. We observe that they are mostly located above their corresponding 45-degree line. This indicates that, irrespective of the algorithm, sub-optimal partitions tend to be disproportionately dissimilar to any optimal partition (as foreshadowed in [14]). This result goes against the naive viewpoint that close-to-maximum modularity partitions are also close to an optimal partition. Our results are aligned with previous concerns that modularity maximization heuristics have a high risk of algorithmic failure [27] and may result in degenerate solutions far from the underlying community structure [20].

Fig. 4: Global optimality percentage (GOP, on the y-axis: the ratio of the modularity value reported by an algorithm to the maximum modularity) and normalized reduced mutual information (RMI, on the x-axis: the similarity of the partition obtained by an algorithm to the closest optimal partition), measured for each algorithm by comparing its results with (all) globally optimal partitions. (Magnify the high-resolution figure on screen for more details.)

To ensure that the observations made are not artefacts of using AMI, we also measure and report the RMI values of the same partitions. Figure 4 shows GOP on the y-axis and RMI on the x-axis. Similar observations can be made from the RMI values of these partitions: (1) x-axes values in the eight panels of the heuristics show substantial dissimilarity between the sub-optimal partitions of heuristics and any optimal partitions of these networks, with Combo suffering less than other heuristics from this issue; (2) GNN and Bayan return partitions with higher similarity (from an RMI perspective) to the optimal partitions; (3) most data points are above their corresponding 45-degree lines, indicating that sub-optimal partitions tend to be disproportionately dissimilar (from an RMI perspective) to any optimal partition.

Distribution of partition similarity measures for each algorithm
In the scatter-plots in Subsection 3.2, data points overlap and distributions are not visible. Figure 5 complements the observations made from Figures 3-4. Figure 5 illustrates, for each algorithm, the box plots of AMI values and RMI values obtained on all 104 networks. Each box in Figure 5 shows the first quartile (Q1), the median (Q2), and the third quartile (Q3) of the distribution for one similarity measure and one algorithm. The whiskers are drawn from the nearest hinge (Q1 or Q3) to the farthest data point within the 1.5 interquartile range (±1.5(Q3 − Q1)). The distributions for AMI (top panel) and RMI (bottom panel) are quite similar, indicating that the patterns observed in Subsection 3.2 hold irrespective of the choice between AMI and RMI. This is aligned with the results in [24], although that study recommends using RMI. Belief has the widest distributions of partition similarity measures, followed by EdMot and Paris. The medians of AMI and RMI for Paris are below 0.8. The medians for EdMot and Belief are slightly higher. For CNM, the medians are around 0.85. All the distributions are left-skewed, indicating higher variability among values below the median. Compared to the other heuristics, Louvain, Leiden, LN, and Combo have distributions with smaller ranges and higher medians. Both variations of the GNN algorithm have wider distributions than Combo. The nine variations of the Bayan algorithm have medians extremely close to 1, with some of them also having the smallest variability from 1 among all the algorithms considered. We observe again the expected pattern that reducing the approximation threshold of Bayan (from 0.9 to 1e-5) generally leads to better performance (higher similarity to an optimal partition with lower variation).

Empirical time comparison of the algorithms
All these algorithms attempt to solve the same optimization problem, maximizing modularity, but they explore the feasible space in widely different ways, leading to considerably different solve times for their specific computations. Figure 6 shows for each algorithm a box plot of its empirical solve times measured on the 104 networks. Note that the y-axis of Figure 6 is on a logarithmic scale. These solve times are empirical and depend on the computing resources used, but they are comparable to each other because the same hardware configuration is used for all these algorithms. The box plots of two algorithms stand out: GNN-25K has the longest solve times (with a median of 247 seconds), followed by the Belief algorithm (with a median of 8.4 seconds). The median solve time of GNN-100 (0.56 seconds) and those of five Bayan variations (those with approximation tolerances from 0.1 to 0.9) are similar. Slower variations of Bayan have a median solve time above one second, with the slowest variation (Bayan 1e-5) having a median of 1.97 seconds. The distributions of solve times for all variations of Bayan are left-skewed, with their Q1 solve time often being smaller than their median solve time by an order of magnitude. The widest boxes belong to Bayan, for which Q3 is often an order of magnitude larger than Q2. Except for Belief, the heuristic algorithms are 1-2 orders of magnitude faster than GNN and Bayan, with Leiden and LN being the fastest algorithms in our analysis.
The solve time distributions of all algorithms have outliers (values larger than the top whisker of the box plot). All 104 instances considered are from the limited range of networks with tens to a few thousands of edges. Therefore, the outliers existing for almost all algorithms indicate that solving some instances takes much longer than the typical solve time of that algorithm for that range of input size. This can be partially explained by the differences in graph structures. Some graphs have a structure close to the structure of a random graph (and far from a modular structure). Finding a high-modularity partition for them potentially takes orders of magnitude longer (than the typical time for networks of that size range) irrespective of the algorithm used.

Success rates of the algorithms in maximizing modularity
The GOP values allow us to answer a fundamental question about these modularity-based algorithms: how often does each algorithm return an optimal partition? We report the fraction of networks (out of 104) for which a given algorithm returns an optimal partition. A similar assessment of some of these algorithms on a different set of networks is provided in [6].
Figure 7 shows the success rates of all algorithms and their variations in achieving global optimality on the 104 networks considered. Among the eight heuristics, Combo has the highest success rate, returning an optimal partition for 90.4% of the networks. The average success rate of all eight heuristics combined is 43.9%. With the exception of Combo, the rates for these heuristics are arguably low for what the name modularity maximization algorithm implies or the idea of discovering network communities through maximizing a function. The two variations of GNN have markedly different success rates, and their average success rate is 68.7%. The least restricted version of approximate Bayan (Bayan 0.9) returns optimal partitions on 75% of networks and is more successful than the average heuristic and the average GNN variation. This indicates that in the context of solving this NP-hard graph optimization problem, a well-established optimization procedure (the branch and bound used in Bayan) can be more successful than most alternatives (e.g., heuristics and GNNs). After all, modularity maximization is a mathematical optimization task, and it should be safe to assume that guaranteed approximate optimization methods would likely be hard to outperform in optimality success rate (acknowledging that they most likely take longer and can only be used for small and mid-sized networks). Figure 7 also shows that using Bayan with a smaller approximation tolerance value often leads to a higher success rate. Earlier, in Figures 3-4, we observed that near-optimal partitions tend to be disproportionately dissimilar to any optimal partition. In other words, close-to-maximum modularity partitions are rarely close to any optimal partition. Taken together with the low success rates of the average heuristic in maximizing modularity, our results indicate a crucial mismatch between the design philosophy of these methods and their average capability: most heuristic modularity maximization algorithms rarely return an optimal partition or a partition resembling an optimal partition, even on graphs with modular structures.

Discussions and Future Directions
Understanding modularity's capabilities and limitations has been complicated by the under-studied sub-optimality of the modularity-based heuristics and their methodological consequences. Previous methodological studies [32,12,11,45], which have shed light on other aspects, have rarely disentangled the unguaranteed aspect of the heuristic optimization from the fundamental concept of modularity. Our study is a continuation of previous efforts [20] in separating the effects of sub-optimality (or the choice of using greedy algorithms [27]) from the effects of using modularity on the fundamental task of detecting communities.
We analyzed the effectiveness of eight heuristics [13,7,34,56,60,8,57,35], two variations of a GNN [55], and several variations of an approximation algorithm [4] in maximizing modularity. While our findings are limited to only a handful of algorithms, their combined usage by tens of thousands of peer-reviewed studies [31] motivates the importance of this assessment. Most heuristic algorithms for modularity maximization tend to scale well for large networks [61]. They are widely used not only because of their scalability or ease of implementation [27], but also because their high risk of algorithmic failure is not well understood [27]. The scalability of these heuristics comes at a cost: their partitions have no guarantee of proximity to an optimal partition [20] and, as our results showed, the average heuristic rarely returns an optimal partition even on modular graph structures. Moreover, we showed that their sub-optimal partitions tend to be disproportionately dissimilar to any optimal partition, irrespective of using AMI or RMI.
Neither using modularity nor succeeding in maximizing it is required for CD at the big-picture level. A recent study claims modularity maximization is the most problematic CD method and considers it harmful [45]. Another study shows that, given computational feasibility, exact maximization of modularity outperforms 30 other CD methods in accurate and stable retrieval of ground-truth communities in both LFR and ABCD benchmarks [4], suggesting the relevance of modularity for CD. For some applications and contexts, general CD algorithms [43] which scale to large instance sizes [23,57] are needed. However, for a "narrow set of tasks" [43, p. 7] involving small and mid-sized networks, specialized algorithms which outperform general algorithms are useful.
Our findings suggest that if modularity is to be used for detecting communities, using approximation [10,14,28,4] and exact [3,4] algorithms is recommended for a more methodologically sound usage of modularity within its applicability limits. Exact algorithms can also reveal the formal guarantees of performance [19] for accurate modularity-based algorithms.
A promising path forward could be using the advances in integer programming to develop better approximation algorithms (better than approximate Bayan) for solving the mathematical models of modularity maximization [9,1,15] for networks of practical relevance within the limits of computational feasibility. New heuristic algorithms that strike a balance between accurate computation and scalability (the same performance as Combo, but with higher scalability) may also be useful, particularly for large-scale networks.

Fig. 1: Modularity maximization for one network using six methods leading to six sub-optimal partitions (panels a-f) with increasing Q, different k, and different AMI values. Only the giant component is shown. (Magnify the high-resolution color figure on screen for more details.)

Fig. 2: Modularity maximization for one network using six methods leading to one optimal partition (panel f) and five sub-optimal partitions (panels a-e) with increasing Q, different k, and different AMI values. Only the giant component is shown. (Magnify the high-resolution color figure on screen for more details.)

Fig. 5: The box plots of similarity to an optimal partition (AMI in the top panel and RMI in the bottom panel) for each algorithm based on all 104 network instances. (Magnify the high-resolution figure on screen for more details.)

Fig. 6: Box plots representing the empirical solve time of the algorithms for the 104 networks

Fig. 7: Ten algorithms and their variations ranked based on their success rate in achieving global optimality on the 104 networks considered