Dynamical detection of network communities

A prominent feature of complex networks is the appearance of communities, also known as modular structures. Specifically, communities are groups of nodes that are densely connected among each other but connect sparsely with others. However, detecting communities in networks is so far a major challenge, in particular, when networks evolve in time. Here, we propose a change in the community detection approach. It underlies in defining an intrinsic dynamic for the nodes of the network as interacting particles (based on diffusive equations of motion and on the topological properties of the network) that results in a fast convergence of the particle system into clustered patterns. The resulting patterns correspond to the communities of the network. Since our detection of communities is constructed from a dynamical process, it is able to analyse time-varying networks straightforwardly. Moreover, for static networks, our numerical experiments show that our approach achieves similar results as the methodologies currently recognized as the most efficient ones. Also, since our approach defines an N-body problem, it allows for efficient numerical implementations using parallel computations that increase its speed performance.

where x(t) is the D-dimensional state vector ( x : R → R D ) at time t and F is the flow vector ( F : R D → R D , which we assume to be smooth). The orbit of the system, or trajectory, is a particular solution of Eq. (1) for a given initial condition, i.e., the set of values given by { x(t) ∈ R D ∀t ∈ I such that x(0) = x 0 }, x 0 being the initial condition for the state vector and I ⊂ R the time interval where the solution is valid. We note that the flow vector in Eq. (1), besides the dependence on x, may depend on particular parameters, namely, F( x; α), with α ∈ R P , P being the number of relevant parameters.
Given an ODE as in Eq. (1), a gradient system is such thaṫ ∇ being the gradient operator and V being an scalar function, which is defined on an open set ϒ containing x * with V : ϒ → R + and V ( x) > 0 for x = x * . Then, if V is a Lyapunov function, 1,2 so that x * is an isolated minimum of V , the solution of Eq. (2) is an asymptotically-stable equilibrium point. Thus, choosing a dynamical system for community detection that evolves according to Eq. (2), guarantees a globally attracting region. Moreover, each particular V creates a different ratio of convergence and a different equilibrium state. Consequently, although our approach for community detection is tested using a particular choice of V , it is unrestricted to that particular V and its convergence is improved when the steepness of V is increased, as is also done in different energy models. 3 In order to implement our approach to detect communities in networks, we choose a particular gradient function to define our particles' equations of motion, where each particle is associated with a node in the network and is embedded on a D-dimensional Euclidean space with position x i for i = 1, . . . , N. Moreover, its initial condition is randomly set. Thus, our particle system follows the equations of motion defined bẏ where ∇ i is the gradient operator with respect to the space coordinates of particle i, i.e., ∇ i = ( ∂ ∂ x i , ∂ ∂ y i , ∂ ∂ z i ), and the scalar potential function is divided into a sum of attracting, V (A) j , and repelling, V (R) j , non-negative scalar functions given by The reason for their non-negativity is because is the Euclidean norm, A jl ≥ 0 is the adjacency matrix of the network (i.e., A jl = 1 if there is a link connecting nodes j and l in the network or A jl = 0 otherwise), R jl ≡ 1 − δ jl − A jl ≥ 0 is the complement of the adjacency matrix, δ jl being the Kronecker delta (i.e., δ jl = 1 if j = l or δ jl = 0 otherwise), and k j ≡ ∑ l A jl is the node's degree. These equations, namely, Eqs. (3) and (4), result in the equation of motion framework for our particle system and network community detection algorithm.

Convexity of the attracting and repelling potential functions
We note that the arguments of V (A) j and V (R) j in Eq. (4) are mutually exclusive, namely, when one is positive the other one is null. The reason is that these functions depend on the network topology, namely, the symmetric adjacency matrix A jl = A l j , and the Euclidean distance between particles, namely, j ] depends on the existing [non-existing] links of the network of nodes, which are defined by A jl . Thus, when A jl = 1 (for j = l), R jl = 0.
A convex function, f : in Eq. (4) depend on the distance between the particles, convexity is fulfilled only in particular conditions, for example, for V Moreover, since the potentials are a sum of distances between particles, convexity can be violated under other conditions as well. Nevertheless, we highlight that our numerical findings show that by including the Markov probabilities, namely, A jl /k j and R jl /k j , in the arguments of the potential functions, the system still evolves towards an asymptotic state where cluster of particles are found for an attracting region close to the origin. In other words, although the convexity requirement for V being a Lyapunov function is violated in a general scenario, the evidence we find points to the fact that the inclusion of the random walk probabilities avoids falling on a local minima of the potential function V .

Differentiability of the potential function
The potential functions in Eq. (4) represent attractive and repulsive forces and the control parameters α and β are the global magnitudes for the attractive and repulsive forces. From Eq. (3), we have thaṫ which defines the attractive and repulsive forces, namely, F i . We observe that applying the x-coordinate of the gradient to the distance between two particles holds where, analogously, the remaining coordinates hold similar results. Hence, We note that these equations are differentiable as long as their arguments, x j − x l , are different, namely, as long as x j = x l . This happens as long as particles are non-overlapping in space. Taking into account this consideration, the dynamical model for the particle's motion that follows from Eqs. (3)-(4) is given by Dimensional analysis and relevant parameters Our interacting particle model requires the determination of 3 parameters: α, β , and γ. However, as we show here, there is only a single relevant parameter, namely, the ratio between the attractive and repulsive force strengths: η ≡ β /α. The basis for our deduction is Buckingham π theorem. We start by noting that Eq. (3) can be written as 2/5 where we have introduced a dimensionless timet ≡ α γ t. The dimensions of α and γ are those of x divided by time and the inverse of x, respectively. Then, introducing a dimensionless vector y i ≡ γ x i for all particles, where after the change of variables, our interaction forces are written as These equations justify the use of γ = 1 in our work, since other values of γ constitute a stretching or contraction of the unit of time,t. Moreover, these equations also show the reason for setting α = 1 and varying β , since this is effectively changing the setting to a single parameter, namely, going from the 2D (α, β )-parameter space to the 1D (1, η)-parameter space.

Equilibrium state
The equilibrium state is found when the system of particles achieves the steady-state, namely, when˙ x i = 0 for all particles. Hence, this is fulfilled when which corresponds to where x * i is the steady-state position for the i-th particle, corresponding to the potential V minimum. This is a transcendental equation 3 for the equilibrium state of the particle system. We note that the tuning of the parameters α and β affect the equilibrium state as well as the modifying the connectivity of the network being analysed. This implies that finding a set of parameters that allows an accurate community detection requires a parameter tuning which depends on each particular network. However, we find that fine tuning of the parameters is generally unnecessary, meaning that parameter choices are robust. This allows to do a coarse search for parameter optimization and still find the optimal steady-state solution of the particle system.

Numerical framework and algorithm performance
Choosing the β parameter Although the model contains several parameters, most of them can be hold constant by using the values mentioned in the manuscript The only remaining parameter that needs adjustment according to the network features and the desire community structure detection is the parameter controlling the repulsive force, namely, β , as it is also shown in Dimensional analysis. In general, by choosing small values for β , we find that the repulsion term [Eq. (14)] becomes small and the particle system tends to highlight macro-clusters, namely, giant communities, or even a single cluster grouping all particles. By increasing β , the repulsion becomes stronger and a division of the macro-clusters is observed. Interestingly, the sum of the error for the seeds during the centroid-seed method (i.e., E(n) = ∑ E sk (n)) in Eq. indicative that the particles represented by a seed are more separated than they should be, albeit insufficiently separated to split the cluster into more seeds.
In particular, Fig. 1 presents the analysis of E(n) as a function of β for a hierarchical network, which corresponds to the network shown in Fig. (3) of the manuscript. On the one hand, there are two clear minima in the E(n) curve, one near β = 0.05 and another one near β = 0.3 (curve with triangles in Fig. 1). On the other hand, the first maxima of the NMI, achieved for this network using our particle approach, coincides with those two choices of β . Namely, the the macro-community detection, which is the curve with squares in Fig. 1, and the micro-community detection, which is the curve with circles in Fig. 1. Thus, by evaluating the evolution of E(n), one can find target values for the β parameter where a reasonable division of the network into communities is achieved. The left vertical axes shows the normalized mutual information, NMI, as a measure of the efficiency of our approach to detect communities. In particular, there is a distinction in our algorithm's performance to detect macro-or micro-communities (curves with squares or circles, respectively). The right vertical axes shows the quadratic error, E(n), of the centroid-seed approach we use to distinguish between different clusters of particles.
Moreover, since our model is intrinsically a dynamical model, the β changes are taken into account naturally as a perturbation to the particle system. This means that, it is unnecessary to reapply the model from the beginning for each value of β . Instead, after the transient time for the particle system to settle in an equilibrium state for a given β value, the following changes in β can be observed as small perturbations to this equilibrium state for the particles and a new equilibrium state is reached only a few iterations afterwards. In other words, the changes in β act as a bifurcation diagram, where at each stage that a community is unfolded into smaller communities a bifurcation has occurred. 1 The former conclusions are valid for a broad range of networks. However, we note that for larger networks (i.e., N 10 4 ) the θ r parameter can also be increased to reduce the number of iterations that the algorithm takes to achieve the equilibrium state. Although, assigning a large value for θ r can stop the particles' evolution prematurely leading to an ill-formation of particle clusters. This trade-off needs empirical adjustment for different classes of large networks.

Computational time
The variation of the particle's positions is computed according to [Eq. (1) in the manuscript] where x i is the instantaneous position of the i-th particle associated to the network, with i = 1, . . . , N.
In order to compute the former variation we need to evaluate the distance between each pair of nodes. Thus, each iteration of the model demands O(N 2 ) steps. The equilibrium is reached after some iterations, namely, T when using an Euler scheme for the discretization of Eq. (14). Thus, the core computational-cost of our method is O(T × N 2 ). However, we note that the number of iterations, T , varies depending on the network structure. Specifically, the larger is the mixing between the communities (i.e., more inter-connected communities), the more iterations are needed to reach the equilibrium state.
On the other hand, we have the clustering method. This method consists in finding the number of representative seeds (or communities), after the equilibrium state for the particles has been reached. The running time for each step of the clustering 4/5 method [see lines X-Y in Alg. (1) in the manuscript] is O(S × N), where S is the number of seeds (or detected communities). From our experiments, we observe that the clustering method demands C steps, linearly related to S by C = κ S with κ O(1). Thus, the computational complexity of the clustering method is O(S 2 × N).
Consequently, the total complexity of our approach for community detection is O(S 2 × N + T × N 2 ), considering both the particle dynamics and the clustering method. Since in general N S, we have that the running time is O(T × N 2 ). Table 1, with its figure, depicts the CPU time needed to run the serial version of our code. The experiments were conducted in a 2.5 GHz Intel Core i5, with 8GB of RAM.  4 with network sizes N = 10 4 : 10 5 , parameters µ = 0.1, k = 50, maximum degree k = 100, and community sizes from 500 to 1000. The values of the computational time t (in seconds), the model iterations T , and the clustering method iterations n, are averages over 30 network realizations.