Modularity of Directed Networks: Cycle Decomposition Approach

The problem of decomposing networks into modules (or clusters) has gained much attention in recent years, as it can provide a coarse-grained description of complex systems, often revealing functional subunits of these systems. A variety of module detection algorithms have been proposed, mostly oriented towards finding hard partitionings of undirected networks. Despite the increasing number of fuzzy clustering methods for directed networks, many of these approaches tend to neglect important directional information. In this paper, we present a novel random-walk-based approach for finding fuzzy partitions of directed, weighted networks, where edge directions play a crucial role in defining how well nodes in a module are interconnected. We will show that cycle decomposition of a random walk process connects the notion of network modules and information transport in a network, leading to a new, symmetric measure of node communication. Finally, we will use this measure to introduce a communication graph, for which we will show that although being undirected it inherits all necessary information about modular structures from the original network.


1.
Introduction. In recent years complex networks have become widely recognized as a powerful abstraction of real-world systems [18]. The simple form of network representation allows an easy description of various systems, such as biological, technological, physical and social systems [19]. Thus, network analysis can be used as a tool to understand the structural and functional organization of real-world systems. Among the most important network substructures are modules (or clusters, community structures), which typically correspond to groups of nodes that are similar to each other in some sense. In the case of undirected networks, modules represent densely interconnected subgraphs having relatively few connections to the rest of the network [9]. Finding modules leads to a coarse-grained description of the network by smaller subunits, which in many cases have been found to correspond to functional units of real-world systems, such as protein complexes in biological networks [4]. Although a large number of different clustering algorithms [8,18] have been developed, most of them are oriented towards: (i) finding hard (or full, complete) partitions of networks into modules, where every node is assigned to exactly one module; (ii) analyzing undirected, unweighted networks. However, in many networks a fuzzy partition, where some nodes belong to several modules at once or to none at all, might be more natural. Moreover, many real-world networks are weighted and directed [3,15,16], such as metabolic networks, email networks, the World Wide Web, food webs etc. In such networks edge directions contain important information about the underlying system and as such should not be dismissed [13]. So far, several approaches for finding modules in directed networks have been developed [14,27,15,25], see [16] for an extensive overview. Among these, generalizations of modularity optimization became the most popular group of approaches [1,15,12].
We will refer to this family of methods as generalized Newman-Girvan (gNG) approaches. Despite their frequent application, modularity based methods are typically only sensitive to the density of links, but tend to neglect information encoded in the pattern of the directions [13,24]. This limitation of gNG methods (but also of one-step methods and methods based on network symmetrization) is illustrated with the following example. Using a gNG approach, the network shown in Figure 1a is partitioned into three modules C_1, C_2 and C_3 with high link density. However, C_3 is qualitatively different from C_1 and C_2: while in C_1 and C_2 edge directions are fairly symmetric, in C_3 they are completely asymmetric.

On a directed, weighted graph G = (V, E) we consider the standard random walk process with transition probabilities

p_xy = K(x, y) / K^+(x),     (1)

where K(x, y) denotes the weight of an edge (xy) ∈ E and K^+(x) = Σ_y K(x, y) the weighted out-degree of a node x. Since the networks we consider are strongly connected, the introduced random walk process is ergodic and has a unique stationary distribution π with π^T P = π^T. However, this process is not time-reversible and the detailed balance condition π_x p_xy = π_y p_yx, x, y ∈ V, does not hold. For this reason, most of the existing spectral methods fail to deal with finding modules in directed networks, as they are restricted to the analysis of reversible processes.
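For illustration, the transition matrix (1) and its stationary distribution can be computed numerically. The following sketch uses NumPy and a small 3-node weighted graph of our own choosing (not one of the paper's examples):

```python
import numpy as np

# Hypothetical 3-node strongly connected weighted graph: K[x, y] = K(x, y).
K = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 1.0],
              [2.0, 0.0, 0.0]])

out_deg = K.sum(axis=1)          # weighted out-degrees K^+(x)
P = K / out_deg[:, None]         # transition matrix p_xy = K(x,y)/K^+(x)

# stationary distribution pi with pi^T P = pi^T (dominant left eigenvector)
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()
```

For a strongly connected graph the eigenvalue 1 is simple, so the normalized left eigenvector is the unique stationary distribution.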

2.2.
Cycle decomposition of flows. In this section we will briefly introduce the theory of cycle decompositions of Markov processes on finite state spaces as developed in [11] and [10]. We will use this decomposition to describe the random walk process defined by (1) using cycles. Let us introduce a probability flow F : E → R, such that F_xy = π_x p_xy for every edge (xy) ∈ E. Then, the equation π^T P = π^T of stationarity for π directly yields conservation of the flow at every node x:

Σ_y F_xy = Σ_y F_yx.     (2)

Now we will generalize the notion of edge flows to cycle flows, where we define cycles in the following way.

Definition 2.1. An n-cycle (or n-loop) on G is defined as an ordered sequence of n connected nodes γ = (x_1, x_2, . . . , x_n), whose length we denote by |γ| = n. Cycles that do not have self-intersections are called simple cycles and we denote by C the collection of simple cycles in G.
Figure 2. Examples of two simple cycles α and β and one realization of the Markov chain (X_n)_n.
As a consequence of ergodicity and the flow conservation law (2), we can find a cycle decomposition of F into cycles. More precisely, there is a collection Γ ⊂ C of cycles γ with real positive weights F(γ) such that for every edge (xy) the flow decomposition formula

F_xy = Σ_{γ ⊃ (xy)} F(γ)     (3)

holds, where we write γ ⊃ (xy) if the edge (xy) is in γ. Different approaches for finding cycle decompositions of Markov processes [26,28,11,2] have been developed. In this paper we will refer to the stochastic decomposition approach introduced in [11,10], as this approach leads to a unique cycle decomposition. Let (X_n)_{1≤n≤T} be a realization of the Markov chain given by (1). We will say that (X_n)_{1≤n≤T} passes through the edge (xy) if there exists k < T such that X_k = x and X_{k+1} = y. The process (X_n)_{1≤n≤T} passes through a cycle γ if it passes through all edges of γ in the right order, but not necessarily consecutively, meaning that excursions through one or more full other cycles are allowed. This behavior is illustrated in Figure 2.
Let N_γ^T be the number of times (X_n)_{1≤n≤T} passes through a cycle γ up to time T. The limit

w(γ) := lim_{T→∞} N_γ^T / T     (4)

exists almost surely [10]. Then, if the following weights condition (WC)

F(γ) = w(γ) for all γ ∈ Γ     (5)

holds, we obtain a unique cycle decomposition (Γ, w). In terms of the random walk process, the weight w(γ) denotes the mean number of passages through a cycle γ of almost all sample paths (X_n)_{1≤n≤T}. We will refer to w(γ) as an importance weight for the cycle γ.
An explicit formula to calculate the weights w(γ) was given in [10], which for γ = (x_1, . . . , x_n) reads

w(γ) = π_{x_1} p_{x_1 x_2} · · · p_{x_{n-1} x_n} p_{x_n x_1} N_{x_1...x_n},     (6)

where the normalization factor N_{x_1...x_n} is a product of taboo Green's functions accounting for the excursions the random walk process can take while completing γ. It is hard to calculate w(γ) in general, so we will use (6) for illustration purposes only, see [10] for more details.
As an example we will consider the barbell graph presented in Figure 3. This is a weighted, directed graph consisting of two cycles α_l = (l_0, l_1, . . . , l_{n−1}) and α_r = (r_0, r_1, . . . , r_{n−1}) with n nodes each, where all edge weights are one. These two cycles are joined by an undirected edge α_c = (l_0, r_0) with edge weight K(l_0, r_0) = K(r_0, l_0) = ε < 1. Then, the probability of the standard random walk process (1) leaving one of the two cycles at l_0 or r_0 is p_{l_0 r_0} = ε/(1+ε) = p_{r_0 l_0}. The central nodes have the same stationary distribution π_{l_0} = π_{r_0} = π_c and for all other nodes we have π_{l_i} = π_{r_i} = π_l. Using the flow conservation property (2) at one of the central nodes we get the equation π_l + π_c ε/(1+ε) = π_c, from which we can easily calculate

π_l = 1/(2(n + ε)),  π_c = (1 + ε) π_l = (1 + ε)/(2(n + ε)).
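These closed-form expressions are easy to check numerically. The sketch below builds the barbell transition matrix for an arbitrary choice n = 5, ε = 0.1, with our own node ordering l_0, …, l_{n−1}, r_0, …, r_{n−1}, and compares the computed stationary distribution with the formulas above:

```python
import numpy as np

n, eps = 5, 0.1
N = 2 * n
K = np.zeros((N, N))
for i in range(n):
    K[i, (i + 1) % n] = 1.0              # left cycle alpha_l
    K[n + i, n + (i + 1) % n] = 1.0      # right cycle alpha_r
K[0, n] = K[n, 0] = eps                  # connecting edge (l_0, r_0)

P = K / K.sum(axis=1, keepdims=True)
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

pi_l = 1.0 / (2 * (n + eps))             # predicted value for non-central nodes
pi_c = (1 + eps) * pi_l                  # predicted value for l_0 and r_0
```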
Since every edge belongs to exactly one of the three loops α_c = (l_0, r_0), α_l = (l_0, l_1, . . . , l_{n−1}) and α_r = (r_0, r_1, . . . , r_{n−1}), the weights of these loops can be inferred directly from (3) as

w(α_l) = w(α_r) = 1/(2(n + ε)) =: w,  w(α_c) = εw.

2.3. Cycle-based random walks. In this section, we will use the stochastic cycle decomposition to introduce reversible processes that are a good approximation of the original non-reversible process, in the sense that they contain all the important directional information needed for the module identification. This will allow us to construct graph transformations which map a directed graph into an undirected graph, such that cycle-node connections (i.e. whether x, y ∈ α) and the cycle importance weights are encoded into the structure of the new graph. Then, we will be able to reduce the original module finding problem on directed networks to the well understood problem of finding module partitions of undirected networks. We start by reformulating the original dynamics (X_n)_n on vertices V in terms of a dynamics on the cycle space Γ. Namely, instead of the dynamics of the standard random walk process with jumps from one node to another node, we will look at a random walk process with dynamics of the type node-cycle-node. To this end, we introduce

Definition 2.2. The process (X̃_n)_n is a stochastic process on V generated by the following procedure:
1. If X̃_n = x, then draw the next cycle ξ_n according to the conditional probabilities

P(ξ_n = α | X̃_n = x) = b^{(x)}_α := w(α)/π_x if α ⊃ x, and b^{(x)}_α := 0 otherwise.

2. If ξ_n = α, then follow the cycle α one step to obtain X̃_{n+1} = y, (xy) ⊂ α.
In this process, jumps from a node x to a cycle α are probabilistic with probabilities b^{(x)}_α given by the cycle decomposition {Γ, w}. Using (3) it follows that Σ_{α⊃x} w(α) = π_x, so that the b^{(x)}_α indeed form conditional probabilities. Furthermore, (X̃_n)_n is a Markov process with transition matrix

P(X̃_{n+1} = y | X̃_n = x) = Σ_{α⊃(xy)} P(ξ_n = α | X̃_n = x) = Σ_{α⊃(xy)} w(α)/π_x = F_xy/π_x = p_xy.

Thus, (X̃_n)_n and (X_n)_n are statistically identical and we will refer to both as (X_n)_n in the rest of the paper. As a byproduct we obtained the associated chain (ξ_n)_n ⊂ Γ of cycles used by (X_n)_n. Notice that cycle-node jumps are deterministic and that (ξ_n)_n is a non-Markovian process. To see this, consider the barbell graph in Figure 3. We have P(ξ_{n+2} = α_r | ξ_{n+1} = α_r, ξ_n = α_c) = 1, since in this case X_{n+2} = r_1 with probability 1. But P(ξ_{n+2} = α_r | ξ_{n+1} = α_r) = 1 − (1/n)·ε/(1+ε), as in this case X_{n+2} = r_0 with probability 1/n, so that ξ_{n+2} = α_c is possible. If (X_n)_n is in equilibrium such that P(X_n = x) = π_x, then the chain (ξ_n)_n is also in equilibrium, with distribution

P(ξ_n = α) = μ(α) := |α| w(α).

Further assuming that (X_n)_n is in equilibrium, the one-step transition matrix of (ξ_n)_n is

Q_{αβ} = (1/|α|) Σ_{x∈α} b^{(x)}_β,     (7)

see Appendix for a detailed calculation. Moreover, Q is detailed-balanced:

μ(α) Q_{αβ} = μ(β) Q_{βα}.

Thus, given a non-reversible Markov chain (X_n)_n on V, we have constructed a reversible, but non-Markovian associated chain (ξ_n)_n on the cycle space Γ. We will now modify Definition 2.2 to obtain two Markov chains (Z_n)_n ⊂ V and (ζ_n)_n ⊂ Γ. It will turn out later that (ζ_n)_n is the Markov approximation of (ξ_n)_n.
Definition 2.3. The cycle-node-cycle process (ζ_n)_n ⊂ Γ and the node-cycle-node process (Z_n)_n ⊂ V are stochastic processes generated by the following procedure:
1. If Z_n = x, then draw the next cycle ζ_n according to the conditional probabilities b^{(x)}_α.
2. If ζ_n = α, then draw Z_{n+1} uniformly from the nodes of α.
Jumps node-cycle are again probabilistic, but now also jumps from a cycle to a node are probabilistic, because from a cycle α the process can jump with uniform probability to any of the nodes in this cycle. Both processes (Z_n)_n and (ζ_n)_n are Markov chains and we will now look at their transition properties. To simplify the notation we introduce the two stochastic matrices B and V:

B_{xα} := b^{(x)}_α,  V_{αy} := (1/|α|) δ(y ∈ α).     (9)

Let π_n(x) := P(Z_n = x) and μ_n(α) := P(ζ_n = α) be the probability distributions of (Z_n)_n and (ζ_n)_n. Then the above definition implies the following propagation scheme for π_n and μ_n:

μ_n = B^T π_n,  π_{n+1} = V^T μ_n.     (10)

The next lemma gives the explicit expressions for the transition matrices of (ζ_n)_n and (Z_n)_n in terms of B and V.

Lemma 2.4. Let P̂ be the transition matrix of (Z_n)_n and Q be the transition matrix of (ζ_n)_n. Then P̂ = BV and Q = VB, i.e.

P̂_xy = (1/π_x) Σ_{α ∋ x,y} w(α)/|α|,  Q_{αβ} = (1/|α|) Σ_{x∈α} b^{(x)}_β.

Proof. By the Kolmogorov forward equation π_{n+1} = P̂^T π_n, which in view of (10) immediately implies P̂ = BV, and similarly we get Q = VB. Since B and V are stochastic, P̂ and Q are stochastic matrices too.
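For the barbell graph, where the cycle decomposition is known in closed form, B, V and the products P̂ = BV and Q = VB can be written down explicitly. The following sketch (with our own indexing l_i ↦ i, r_i ↦ n + i, and the arbitrary choice n = 4, ε = 0.2) checks that all four matrices are stochastic, that π and μ are stationary, and that Q is detailed-balanced:

```python
import numpy as np

# Closed-form cycle decomposition of the barbell graph:
# Gamma = {alpha_l, alpha_r, alpha_c} with weights (w, w, eps*w).
n, eps = 4, 0.2
w = 1.0 / (2 * (n + eps))
cycles = [list(range(n)),                 # alpha_l: l_0..l_{n-1} -> 0..n-1
          list(range(n, 2 * n)),          # alpha_r: r_0..r_{n-1} -> n..2n-1
          [0, n]]                         # alpha_c: (l_0, r_0)
weights = np.array([w, w, eps * w])

N, m = 2 * n, len(cycles)
pi = np.full(N, w)
pi[0] = pi[n] = (1 + eps) * w             # stationary distribution of (X_n)_n

B = np.zeros((N, m))                      # node -> cycle jump probabilities
V = np.zeros((m, N))                      # cycle -> node jump probabilities
for a, cyc in enumerate(cycles):
    for x in cyc:
        B[x, a] = weights[a] / pi[x]      # b^(x)_alpha = w(alpha)/pi_x
        V[a, x] = 1.0 / len(cyc)          # uniform jump to a node of alpha

P_hat = B @ V                             # transition matrix of (Z_n)_n
Q = V @ B                                 # transition matrix of (zeta_n)_n
mu = np.array([len(c) for c in cycles]) * weights   # mu(alpha) = |alpha| w(alpha)
```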
Table 1. Properties of the introduced random walk processes.
By comparing with (7), we can see that the one-step transition matrices of (ζ_n)_n and (ξ_n)_n are the same, so that (ζ_n)_n is the Markov approximation of (ξ_n)_n. Thus, our construction produced a Markov process (Z_n)_n on the space of graph nodes V with transition matrix P̂ and a closely related Markov process (ζ_n)_n on the space of graph cycles Γ with transition matrix Q. Table 1 shows the main properties of the introduced random walk processes. To establish a more precise relation between these processes, we will examine the spectral properties of P̂ and Q. Let D_π = diag(π) and D_μ = diag(μ) with μ(α) = |α|w(α) as before. We equip R^V and R^Γ with the scalar products ⟨u, v⟩_π := u^T D_π v and ⟨u, v⟩_μ := u^T D_μ v. Then, Lemma 2.5 shows that (Z_n)_n and (ζ_n)_n are two time-reversible Markov processes with identical spectral properties.
Lemma 2.5. The transition matrices P̂ and Q of the processes (Z_n)_n and (ζ_n)_n have the following properties:
(i) π is the stationary distribution of P̂ and μ is the stationary distribution of Q.
(ii) (Z_n)_n and (ζ_n)_n are time-reversible processes, i.e. ⟨u, P̂v⟩_π = ⟨P̂u, v⟩_π and ⟨u, Qv⟩_μ = ⟨Qu, v⟩_μ.

Proof. See Appendix.

Figure 4 shows the spectrum of the transition matrices P and P̂ for the barbell graph example from Figure 3 with n = 40, ε = 0.1. The spectrum of the transition matrix P of the original non-reversible random walk process consists of complex eigenvalues which all lie almost on the unit circle, i.e. have modulus almost 1. On the other hand, the spectrum of P̂ (and therefore also of Q) is real valued and offers a clear spectral gap after the second eigenvalue, indicating the existence of two network modules. The benefit of the new random walk process in this example is that it preserves the dominant real valued eigenvalues related to the existence of network modules and projects out the complex eigenvalues.

2.4. Communication graph and cycle graph. Time-reversibility of (Z_n)_n and (ζ_n)_n directly implies that the weights I_xy := π_x P̂_xy and W_αβ := μ(α) Q_αβ are symmetric. Remarkably, both W_αβ and I_xy have an interpretation in terms of the original Markov process (X_n)_n. More precisely, the quantity

W_αβ = Σ_{x ∈ α∩β} w(α) b^{(x)}_β     (13)

is the total exchange of probability flux F between the cycles α and β per timestep. To see this, note that for x ∈ α, w(α) is the fraction of probability current arriving at x which is carried by α, and b^{(x)}_β is the probability to switch to β at x, so that w(α) b^{(x)}_β is the exchange of probability flux between α and β that happens at x. The quantity

I_xy = Σ_{α ∋ x,y} w(α)/|α|

is a measure for the intensity of communication between nodes x, y under (X_n)_n in terms of the cycles through which x and y are connected. Indeed, I_xy is large when x and y are connected by many cycles α which are important (i.e. w(α) large) and short (i.e. |α| small). This interpretation of I_xy as a communication intensity measure will be made clear in Section 2.5. We will use these properties of W_αβ and I_xy to introduce the following undirected graphs:

Definition 2.6.
Given the directed graph G = (V, E), we let H_G = (Γ, E_W) be the undirected graph with vertex set Γ which has an edge with weight W_αβ between α and β if W_αβ > 0, and we call H_G the cycle graph of G. We let K_G = (V, E_I) be the undirected graph with vertex set V which has an edge with weight I_xy between x and y if I_xy > 0, and we call K_G the communication graph of G.
Notice that H_G is simply the network representation of the cycle-node-cycle process (ζ_n)_n and K_G is the network representation of the node-cycle-node process (Z_n)_n. Thus, the standard random walk processes on H_G and K_G will have transition matrices Q and P̂, respectively. Since important module information from G is encoded in the structure of both H_G and K_G, we can apply some of the existing module finding approaches for undirected graphs to H_G and K_G and project the result to G. In this paper we will present an algorithm for module identification which uses only the communication graph (see Section 3), because H_G is usually much larger than K_G for modular directed networks G.
In Figure 5, the barbell graph is shown together with the corresponding cycle graph H_G and communication graph K_G. Since the cycle decomposition of the barbell graph produces three cycles α_l, α_c and α_r, the cycle graph has only three nodes. The edge weights (13) of the cycle graph are W_{α_l α_c} = εw/(1+ε) = W_{α_c α_r}. The communication graph of the barbell graph has the following edge weights:

I_xy = w/n if x, y ∈ α_l or x, y ∈ α_r;  I_xy = εw/2 if x ≠ y ∈ α_c;  I_xy = εw/2 + w/n if x = y ∈ α_c;  I_xy = 0 otherwise.     (14)

As observed in Figure 5, this results in the cycles α_l and α_r being mapped into two complete subgraphs with equally weighted edges and consequently the appearance of two modules M_1 and M_2 in K_G.
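The block structure (14) can be checked directly from the definition I_xy = Σ_{α ∋ x,y} w(α)/|α|, using the three known cycles of the barbell graph. The sketch below uses the arbitrary choice n = 4, ε = 0.2 and our own node indexing l_i ↦ i, r_i ↦ n + i:

```python
import numpy as np

n, eps = 4, 0.2
w = 1.0 / (2 * (n + eps))
cycles = [list(range(n)), list(range(n, 2 * n)), [0, n]]
weights = [w, w, eps * w]

# I_xy = sum over cycles containing both x and y of w(alpha)/|alpha|
N = 2 * n
I = np.zeros((N, N))
for cyc, wa in zip(cycles, weights):
    for x in cyc:
        for y in cyc:
            I[x, y] += wa / len(cyc)
```

The row sums of I recover the stationary distribution, since Σ_y I_xy = Σ_{α⊃x} w(α) = π_x.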

Remark 1.
(Line-graphs) Cycle graphs can be related to line-graphs (or edge-graphs), which are important objects in spectral graph theory [21,22]. If G = (V, E) is an undirected graph with |E| edges, its line-graph L(G) has |E| nodes, which are the edges of G, and two nodes in L(G) are adjacent if the corresponding edges in G are. So, a transformation of G into L(G) maps the graph's edges into nodes, and a transformation of G into H_G maps the graph's cycles into nodes. Furthermore, line-graphs can be used very efficiently for fuzzy clustering of undirected networks [7], by employing some of the full partitioning approaches for finding modules in line-graphs and then projecting these modules back to the original graph. Similarly, one can use cycle graphs for fuzzy partitioning of directed networks.
Remark 2. (Entropy production) The work of [2] introduces cycle graphs as a way to describe non-equilibrium Markov processes. Indeed, many non-equilibrium properties may very well be related to cycle decompositions of the type we discussed here. We give one example. The entropy production rate is believed to be a measure of the degree of irreversibility of a system. For Markov chains, it is given by

e_p = (1/2) Σ_{x,y} (π_x p_xy − π_y p_yx) log(π_x p_xy / (π_y p_yx)).

It has been shown in [10] that the entropy production rate can be expressed by means of the stochastic cycle decomposition (4) as

e_p = (1/2) Σ_{γ∈Γ} (w(γ) − w(γ⁻)) log(w(γ) / w(γ⁻)).

Here, γ⁻ denotes the cycle γ with reversed orientation. This shows that the entropy production rate is given in terms of the differences of the cycle weights for cycles γ and their reversed counterparts γ⁻. More precise relations between cycle decomposition and entropy production will be the topic of our research in the future.
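For a concrete chain, the standard edge-based form of e_p can be evaluated directly. The sketch below uses an arbitrary 3-state chain of our own choosing (not an example from the paper); since the chain violates the Kolmogorov cycle criterion, it is irreversible and has strictly positive entropy production:

```python
import numpy as np

# Arbitrary illustrative 3-state chain (irreversible by construction).
P = np.array([[0.0, 0.7, 0.3],
              [0.2, 0.0, 0.8],
              [0.6, 0.4, 0.0]])
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# e_p = (1/2) sum_{x,y} (pi_x p_xy - pi_y p_yx) log(pi_x p_xy / pi_y p_yx)
ep = 0.0
for x in range(3):
    for y in range(3):
        fxy, fyx = pi[x] * P[x, y], pi[y] * P[y, x]
        if fxy > 0 and fyx > 0:
            ep += 0.5 * (fxy - fyx) * np.log(fxy / fyx)
```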
2.5. Information transport. In this section we show that the node-cycle-node process (Z_n)_n in fact mimics a model for information transport in the network, which gives an interpretation to the quantity I_xy. Suppose that initially X_0 = x. Let Z̃_0 = x denote the initial location of one bit of information. We will now model how this bit is transported through the network by means of (X_n)_n:
1. Let τ be the first time (X_n)_n returns to x. Since we assumed (X_n)_n to be irreducible, τ is a.s. finite and the path p = (X_0, . . . , X_τ) is a loop, possibly with self-intersections. We can decompose p into simple cycles, exactly one of which contains x. Let this cycle be γ.

2. The bit of information is now distributed uniformly among all nodes on γ.
To model this, we will pick the new location Z̃_1 = y of the bit uniformly at random among the vertices of γ. We will then restart the process (X_n)_n at y and repeat. Now observe that the outcome of step 1 is the cycle γ exactly iff (X_n)_n passes through γ, as defined in Section 2.2. The probability of the outcome being γ is therefore the probability of (X_n)_n passing through γ conditioned on starting at x, which is δ(x ∈ γ) w(γ) / Σ_{γ'⊃x} w(γ') = B_xγ. On the other hand, if the outcome of step 1 is γ, then the probability of Z̃_1 = y in step 2 is (1/|γ|) δ(y ∈ γ) = V_γy. We thus have by Lemma 2.4:

P(Z̃_{n+1} = y | Z̃_n = x) = Σ_γ B_xγ V_γy = P̂_xy.

This implies that (Z̃_n)_n = (Z_n)_n, that is, the node-cycle-node process describes the sequence of locations of the bit of information which is transported along the network by (X_n)_n. Then

P̂_xy = I_xy / π_x

is the probability that the bit is delivered from x to y, and therefore measures how easy it is to share information between the nodes x and y. From the point of view of the network G, I_xy connects topological properties of G (how many cycles connect x and y, and how long they are) with dynamical properties of the process (X_n)_n (which gives the importance weights w(α)). As we established in Section 2.4, I_xy is symmetric since (Z_n)_n is reversible. This allows us to apply any clustering method designed for undirected, weighted networks to find a partition of G into modules based on node communication. The algorithmic aspect of this approach will be presented in the next section together with a view on its computational complexity.

3. Identification of modules.
Following the ideas developed in Section 2, we will say that a subset M ⊂ V is a module of the directed graph G = (V, E) iff it is a module in the usual sense of the undirected, weighted communication graph K_G = (V, E_I) defined in Section 2.4. This suggests the following procedure to identify modules in G:
1. Given the graph G = (V, E) with edge weights K(x, y), construct the transition matrix P according to (1).
2. Obtain the cycle decomposition {Γ, w} of the random walk given by P, where the weights w(γ) satisfy (4).
3. Construct the undirected weighted graph K_G and find modules in K_G.
To partition K_G in step 3, we will use Markov State Modeling (MSM) clustering [5,23], but any other method designed for clustering undirected, weighted graphs could be used instead [8,18]. MSM clustering returns m module cores M_1, . . . , M_m ⊂ V, which are disjoint and do not necessarily form a full partition of V. The number of modules m can be inferred from the spectrum of P̂, see [6] for more details. Furthermore, the MSM approach aims at finding fuzzy network partitions into modules, such that some nodes do not belong to only one of the modules but to several or to none at all. For nodes that do not deterministically belong to one of the modules, we will say that they belong to the transition region T. The next obvious question is how to characterize the affiliation of the nodes from the transition region to one of the modules. In the MSM approach, this affiliation is given in terms of the random walk process (Z_n)_n on K_G by the committor functions

q_i(x) := P(τ_x(M_i) < τ_x(M_j) for all j ≠ i),

where τ_x(A) is the first hitting time of the set A ⊂ V by the process (Z_n)_n started in node x, τ_x(A) = inf{n ≥ 0 : Z_n ∈ A, Z_0 = x}. Therefore, q_i(x) gives us the probability that the random walk process, if started in node x, will enter the module M_i earlier than any other module.
Committor functions can be computed very efficiently by solving sparse, symmetric and positive definite linear systems [17,6]. Furthermore, it is easy to see that the committor functions q_1, . . . , q_m form a partition of unity, i.e. Σ_{i=1}^m q_i(x) = 1 for every x ∈ V, such that we can interpret q_i(x) as the natural random walk based probability of affiliation of a node x to a module M_i.
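A minimal committor computation for two modules might look as follows. The 5-node weighted path graph, the module choice M_1 = {0}, M_2 = {4}, and all names are our own illustration, not the networks from the paper; the committor q_1 is obtained by solving the linear system (I − P)q = 0 on the transition region with boundary values q = 1 on M_1 and q = 0 on M_2:

```python
import numpy as np

# Arbitrary undirected weighted path graph 0-1-2-3-4 (edge weights 2,1,1,2).
W = np.array([[0, 2, 0, 0, 0],
              [2, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 2],
              [0, 0, 0, 2, 0]], dtype=float)
P = W / W.sum(axis=1, keepdims=True)

M1, M2 = [0], [4]
T = [i for i in range(5) if i not in M1 + M2]   # transition region

q = np.zeros(5)
q[M1] = 1.0                                     # boundary condition on M1
A = np.eye(len(T)) - P[np.ix_(T, T)]            # (I - P) restricted to T
b = P[np.ix_(T, M1)].sum(axis=1)                # one-step flow into M1
q[T] = np.linalg.solve(A, b)
```

By symmetry of this graph the middle node gets q_1 = 0.5, and q_2 = 1 − q_1 by the partition-of-unity property.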

3.2.
Obtaining the cycle decomposition. We now turn our attention to the problem of actually obtaining the cycle decomposition {Γ, w} with weights w(γ) satisfying (4). This is clearly a hard problem: while even finding and enumerating all cycles in G scales exponentially in the worst case, explicitly computing the weight w(γ) for γ = (x_1, . . . , x_l) according to (6) requires the computation of l taboo Green's functions of the form (I − P_{x_1,...,x_s})^{−1} for 1 ≤ s ≤ l, where P_{x_1,...,x_s} is P with the rows and columns indexed by {x_1, . . . , x_s} deleted [10]. This will not be feasible for large networks. We therefore resort to a sampling algorithm which computes the collection Γ of cycles and the counts N_γ^T from a realization (X_n)_{1≤n≤T} of the Markov chain (X_n)_n, see [10]. The algorithm uses an auxiliary chain η which is initialized at η = X_1; then the states X_i are added sequentially to η until η contains a cycle, which is then removed from η. In summary, the stochastic cycle decomposition algorithm proceeds as follows:
(i) Initialization: Set Γ = ∅ and η = X_1.
(ii) For i = 2, . . . , T: append X_i to η. If η now contains a cycle γ, remove γ from η; if γ ∈ Γ, increase N_γ^T by one, otherwise add γ to Γ and set N_γ^T = 1.
We then set w_T(γ) := N_γ^T / T. In [10] it has been shown that w_T(γ) → w(γ) a.s. as T → ∞; therefore the finite-time weights w_T returned by the sampling algorithm are an approximation of the true weights w. The computational complexity of this algorithm is at most quadratic in T. However, the number of cycles with significant weights w(γ) which have to be sampled accurately might scale unfavorably with the size of the network. In practice, one has to choose T such that convergence is observed. If the network is very large and very modular, such that (X_n)_n is very metastable, this might be prohibitively expensive.
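A minimal implementation of this sampling procedure might look as follows. This is a sketch: the sparse row representation `P_rows` and the canonical rotation used to identify a cycle independently of its starting node are our own choices.

```python
import random
from collections import defaultdict

def sample_cycle_weights(P_rows, x0, T, seed=0):
    """Sample cycle counts from a chain realization of length T.

    P_rows: dict node -> list of (neighbor, probability) pairs.
    Returns a dict cycle (canonical tuple) -> finite-time weight N_gamma^T / T.
    """
    rng = random.Random(seed)
    counts = defaultdict(int)
    eta = [x0]                              # auxiliary chain eta
    x = x0
    for _ in range(T):
        nbrs, probs = zip(*P_rows[x])
        x = rng.choices(nbrs, weights=probs)[0]
        if x in eta:                        # the walk closed a cycle
            i = eta.index(x)
            cycle = tuple(eta[i:])          # remove the cycle from eta
            eta = eta[:i]
            k = cycle.index(min(cycle))     # canonical rotation, e.g. (1,2,0) -> (0,1,2)
            counts[cycle[k:] + cycle[:k]] += 1
        eta.append(x)
    return {c: nc / T for c, nc in counts.items()}

# Sanity check on the deterministic 3-cycle 0 -> 1 -> 2 -> 0.
P_rows = {0: [(1, 1.0)], 1: [(2, 1.0)], 2: [(0, 1.0)]}
w_T = sample_cycle_weights(P_rows, 0, 300)
```

For the deterministic 3-cycle the only sampled cycle is (0, 1, 2) and w_T equals its exact weight 1/3.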
Remark 3 (Timeseries perspective). In many applications, the network G will be a representation of a finite timeseries of data (X n ) 1≤n≤T . In such situations it is natural to use the sampling algorithm directly on the data, which fixes T and removes any convergence issues. Our method then turns into a form of timeseries analysis via recurrence networks. We will address this point of view in more detail in further publications.
Remark 4 (Alternative algorithms). Various alternative algorithms exist to obtain a cycle decomposition {Γ̃, w̃} which satisfies only (3), but not (4). We quickly review two examples. (a) Deterministic iterative algorithm [11,10]: this algorithm repeatedly selects a cycle γ in the graph of the remaining flow, assigns to it the minimal flow w̃(γ) carried by its edges, and subtracts this cycle flow from F. The iteration stops when ‖F‖ < TOL is reached for a prescribed small TOL > 0. This algorithm has polynomial complexity in the number of vertices of G. (b) Cycle-basis construction [11]: the space C of cycle flows is a vector space of dimension d ≤ |E| − |V| + 1 and can be equipped with a scalar product. Then we may take Γ̃ to be a basis of C, which can be found e.g. via a minimal spanning tree [11]. The weights w̃ can be computed by solving a linear system involving the d × d matrix C_{γγ'} = ⟨χ_γ, χ_{γ'}⟩. If we use (a) or (b) instead of the sampling algorithm in step 2, then the resulting Ĩ_xy loses its interpretation as an information transport measure between x and y. Still, in many cases Ĩ_xy might be a good approximation to I_xy, and (a) and (b) are fast even for large networks. The approximation quality of Ĩ_xy obtained by (a) and (b) will be addressed in our future work.
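The peeling idea in (a) can be sketched as follows. The specific cycle-selection rule (follow the largest remaining out-flow) is our own simplification, not the ordering prescribed in [11,10]; the code assumes F satisfies the conservation law (2):

```python
import numpy as np

def peel_cycles(F, tol=1e-12):
    """Decompose a conserved flow matrix F into cycles with positive weights."""
    F = F.copy()
    decomp = []
    while F.max() > tol:
        # start at the node with the largest out-flow and follow
        # positive-flow edges greedily until a node repeats
        path = [int(np.argmax(F.max(axis=1)))]
        while True:
            x = path[-1]
            y = int(np.argmax(F[x]))
            if y in path:
                cyc = path[path.index(y):]       # the closed cycle y -> ... -> x -> y
                break
            path.append(y)
        # subtract the bottleneck weight of the cycle from F
        wmin = min(F[cyc[j], cyc[(j + 1) % len(cyc)]] for j in range(len(cyc)))
        for j in range(len(cyc)):
            F[cyc[j], cyc[(j + 1) % len(cyc)]] -= wmin
        decomp.append((cyc, wmin))
    return decomp

# Small conserved flow: two cycles (0,1) and (0,2) with weights 0.3 and 0.2.
F0 = np.array([[0.0, 0.3, 0.2],
               [0.3, 0.0, 0.0],
               [0.2, 0.0, 0.0]])
decomp = peel_cycles(F0)
```

Each peel zeroes at least one edge of the remaining flow, so the loop terminates after at most |E| iterations.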
3.3. Verification of modularity. We can use the well-known modularity function [20] to evaluate the resulting partition {χ_m}_{m=1}^M of K_G. We obtain

Q̃ = Σ_m Σ_{x,y} χ_m(x) [P(Z_n = x, Z_{n+1} = y) − P(Z_n = x) P(Z_{n+1} = y)] χ_m(y) = Σ_m Σ_{x,y} χ_m(x) [I_xy − π_x π_y] χ_m(y).     (16)

Notice that Q̃ is the modularity of the partitioning evaluated on the undirected graph K_G. Recently, extensions of modularity to directed networks have been discussed in the literature [15,12]. These approaches lead to the following modularity function Q of a partitioning on G:

Q = Σ_m Σ_{x,y} χ_m(x) [π_x p_xy − π_x π_y] χ_m(y).     (17)

However, approaches based on such Q have been shown to often neglect important directional information [13,24] and have the same outcome as when symmetrizing P. More precisely, Q is unchanged if π_x p_xy is replaced by the symmetrized version π_x p^s_xy = (1/2)(π_x p_xy + π_y p_yx), which ignores directions. In contrast, using Q̃ to evaluate the quality of a partitioning of K_G is unambiguous, and the directional information of G is directly built into I_xy. We will demonstrate on some examples in Section 4 that Q and Q̃ may differ significantly. This difference comes mainly from the fact that Q takes into account only one-step transitions p_xy, unlike Q̃ which considers multi-step transitions encoded in I_xy via cycle structures of G.
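Evaluating Q̃ for a given full partition of K_G only requires the matrix I and the stationary distribution. A sketch for the barbell communication graph, with the arbitrary choice n = 4, ε = 0.2 and our own node indexing l_i ↦ i, r_i ↦ n + i:

```python
import numpy as np

n, eps = 4, 0.2
w = 1.0 / (2 * (n + eps))
cycles = [list(range(n)), list(range(n, 2 * n)), [0, n]]
weights = [w, w, eps * w]

N = 2 * n
I = np.zeros((N, N))
for cyc, wa in zip(cycles, weights):
    for x in cyc:
        for y in cyc:
            I[x, y] += wa / len(cyc)
pi = I.sum(axis=1)                # row sums of I recover pi

def modularity(I, pi, modules):
    """Q~ = sum over modules of (I block sum - (pi block sum)^2)."""
    return sum(I[np.ix_(M, M)].sum() - pi[M].sum() ** 2 for M in modules)

# the natural two-module partition {alpha_l, alpha_r}
Q_two = modularity(I, pi, [list(range(n)), list(range(n, 2 * n))])
```

The trivial one-module partition always scores zero, while the two-module split scores 1/2 − εw for the barbell graph.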

4. Numerical experiments.
In this section, we will compare the results of three different clustering algorithms on a few example networks. The algorithms considered are:
1. The algorithm proposed in Section 3, i.e. construction of K_G via a realization (X_n)_{1≤n≤T} and then MSM clustering of K_G, which we will refer to as Cycle MSM (CMSM) clustering. This algorithm returns a fuzzy partition {χ_m}_{m=1}^M.
2. Construct K_G via a realization (X_n)_{1≤n≤T} and then optimize the K_G-based modularity function Q̃ (16) over all full partitions {χ_m}_{m=1}^M, χ_m(x) ∈ {0, 1}. We will refer to this as Q̃-maximization.
3. Maximize the G-based modularity function Q (17) over all full partitions {χ_m}_{m=1}^M, χ_m(x) ∈ {0, 1}. This direct approach does not need the construction of K_G. We will refer to it as Q-maximization.
4.1. Barbell graph. As discussed earlier, the communication graph K_G consists of two complete graphs with n nodes each, which are joined by the edge (l_0 r_0). CMSM clustering produces a full partition {χ_1, χ_2} where χ_1 is one on C_1 = α_l and zero on C_2 = α_r, see Figure 6. The scores assigned by Q and Q̃ to {χ_1, χ_2} are almost identical. Now we address the question whether {χ_1, χ_2} optimizes Q resp. Q̃. To do so, we compare {χ_1, χ_2} for even n with another partition into 3 sets C_{1,a}, C_{1,b} and C_2, where |C_{1,a}| = |C_{1,b}|, see Figure 6. Let ∆Q = Q(C_1, C_2) − Q(C_{1,a}, C_{1,b}, C_2) and ∆Q̃ similarly. One finds that ∆Q > 0 for n ≥ 8, while ∆Q̃ < 0 as long as n > ε. That is, as n grows, Q-maximization produces a partition into more and more subchains with fewer than 8 nodes, while Q̃-maximization always produces the partition {χ_1, χ_2}. Notice that the block structure (14) indicates that it is equally easy to transmit information between all nodes in C_1, while it is hard to transmit information between C_1 and C_2. From the point of view of the random walk process this is natural: the average time to go from C_1 to C_2 is (n + 1/2)(1+ε)/ε, while mixing within C_1, and in particular transitions between C_{1,a} and C_{1,b}, happens quickly.
To summarize, the barbell graph has a modular structure which is not based on link density and therefore cannot be detected by density based methods such as Q-maximization. Rather, this modular structure is information based and revealed by K_G. In the particular case here, using CMSM clustering or Q̃-optimization to partition K_G yields the same result.

4.2.
Wheel switch. Our next example is the 'wheel switch' graph shown in Figure 7a. It consists of a clockwise outer loop α_o and an anticlockwise inner loop α_i with n nodes each and some connections between the two loops. At each connection, the probability to switch between α_o and α_i is p, and the connections are such that two smaller loops β_1 and β_2 of length 4 are formed (coloured in Figure 7a).
The community structure of this graph depends on p. If p > 1/2, completing the small loops β_1 and β_2 is more likely than completing α_i and α_o, so we expect a community structure where β_1 and β_2 dominate. Indeed, CMSM clustering produces the partition shown in Figure 7a, which identifies β_1 and β_2 as modules and gives all other nodes an affiliation of 0.5 to both modules (indicated by the white colour). The modularity values achieved by this partitioning depend on p and n, but typical values are Q̃ ≈ 0.2.

Figure 6. The two modules C_1 and C_2 produced by CMSM clustering. Q-maximization gives a partitioning into subchains of the type C_{1,a}, C_{1,b}.
If p ≤ 1/2, transitions between α_i and α_o are unlikely and we expect to find the community structure shown in Figure 7b. Indeed, CMSM clustering produces exactly this partition. As in the previous example, for p ≤ 1/2, Q̃-maximization gives the same result. On the other hand, and again as in the previous example, Q-maximization divides the red and green modules in Figure 7b further into many small chains of length less than 8.
Also for 1/2 < p ≤ 0.63, Q̃-maximization produces the partition shown in Figure 7b, while for p > 0.63 it gives the one shown in Figure 7c. Note that the difference between Figure 7a and Figure 7c is only that the white nodes are now fully assigned to one of the two modules, in such a way that as much of α_i and α_o as possible is kept intact. Q-maximization again produces many small chains of length less than 8 as modules; how many depends on n. Even if we restrict Q-maximization to partitions with only two modules, the optimum is degenerate: any cut S that produces modules of equal size and leaves β_1 and β_2 intact, as shown in Figure 7d, will optimize Q.
In summary, CMSM clustering exactly reproduces the expected community structure for all values of p and n. Q-maximization destroys the community structure for large n and produces many small chains instead. Q̃-maximization cannot produce the fuzzy partition of Figure 7a, but instead produces the best possible full partition, independent of n. However, the p-threshold between outcomes 7a and 7b is p = 0.5 for CMSM clustering, while it is p = 0.63 between outcomes 7b and 7c for Q̃-maximization.

Figure 7. Graph of the wheel switch with n = 10, and p being the probability to switch between the outer and inner loop. The coloring: (a) shows the modules found by CMSM clustering for p > 1/2; (b) shows the clustering found by CMSM for p ≤ 1/2; (c) shows the resulting clustering of Q̃-maximization for p ≥ 0.63; and (d) shows one example of a clustering obtained by Q-maximization for p ≥ 1/2.

5. Conclusion. In this paper we discussed the problem of finding fuzzy partitions of weighted, directed networks. The main difficulty compared to the well studied case of analyzing undirected networks is that considering asymmetric edge directions often leads to more complicated network substructures. For this reason most of the clustering approaches for undirected networks cannot be generalized to directed networks in a straightforward way. Although different new methods have been proposed, most of them fail to capture all directional information important for module finding, as they consider only one-step, one-directional transitions, which can lead to detecting false modules (where nodes are closely connected only in one direction) and over-partitioning of the network. To this end, we introduced herein a novel measure of communication between nodes which is based on using multi-step, bidirectional transitions encoded by a cycle decomposition of the probability flow.
We have shown that this measure is symmetric and that it reflects the information transport in the network. Furthermore, this enabled us to construct an undirected communication graph that captures information flow of the original graph and apply clustering methods designed for undirected graphs. We applied our new method to several examples, showing how it overcomes the essential limitation of common methods. This indicates that our method can be used to circumvent the problems of one-step methods, like modularity based methods. Further generalizations of modularity function and consideration of different null models will be the topic of our future research. An important aspect of our method is its role in describing non-equilibrium Markov processes and in particular the connection with entropy production rate. Our future research will be oriented towards these problems.