Spectral Complexity of Directed Graphs and Application to Structural Decomposition

We introduce a new measure of complexity (called spectral complexity) for directed graphs. We start with splitting of the directed graph into its recurrent and non-recurrent parts. We define the spectral complexity metric in terms of the spectrum of the recurrence matrix (associated with the reccurent part of the graph) and the Wasserstein distance. We show that the total complexity of the graph can then be defined in terms of the spectral complexity, complexities of individual components and edge weights. The essential property of the spectral complexity metric is that it accounts for directed cycles in the graph. In engineered and software systems, such cycles give rise to sub-system interdependencies and increase risk for unintended consequences through positive feedback loops, instabilities, and infinite execution loops in software. In addition, we present a structural decomposition technique that identifies such cycles using a spectral technique. We show that this decomposition complements the well-known spectral decomposition analysis based on the Fiedler vector. We provide several examples of computation of spectral and total complexities, including the demonstration that the complexity increases monotonically with the average degree of a random graph. We also provide an example of spectral complexity computation for the architecture of a realistic fixed wing aircraft system.


Introduction
Given that complex engineering systems are constructed by composing various subsystems and components that interact with one another, it is common practice in modern engineering design to consider the directed interconnectivity graph as a representation of the underlying system [1]. Thus, the question of inferring complexity of a given system from the resulting graph arises naturally; the idea being that higher complexity graphs imply higher complexity of system design and testing procedures [2]. System complexity is particularly important in the context of complex aerospace systems and leads to frequent budget overruns and project delays [2,3]. Thus, early identification of complexity levels can enable early intervention and system redesign to mitigate risk.
A graph can be analyzed using either combinatorial graph-theoretic methods or by matrix representations such as the adjacency matrix. In the latter case, algebraic methods for analysis are available. In particular, the spectrum of the matrix associated with an undirected graph can be related to its structural properties [4,5]. Previously, the graph spectrum has been used to compute properties such as clusters [6,7] and isomorphisms [8]. Unfortunately, such relationships are not readily available in the case of directed graphs that arise frequently in typical engineering applications (and in various social network settings) due to the directionality of flow information or energy.
In directed graph theory, a common source of complexity is the existence of directed cycles in the graph. This led Thomas J. McCabe in 1976 to measure the complexity of a computer program [9,10], using the so-called cyclomatic complexity -which counts the number of linearly independent cycles in the program. A good survey on software system complexity metrics can be found in [11,12]. We argue that these cycles are particularly important in the context of engineering systems. In particular, they are important drivers of complexity. For example, these cycles can give rise to positive feedback loops [13] which lead to system instabilities. Cycles in engineering systems also make design and analysis challenging from a simulation convergence perspective [14,15].
Inspired by the above argument, we develop a class of complexity metrics based on the algebraic properties of a matrix that represents the underlying directed graph. Although there exists extensive literature on graph complexity measures of information-theoretic and energy type [16,17], such measures can either directly or indirectly be related to the moduli of eigenvalues of the underlying graph matrices. Our approach is based on ideas that are fundamentally different from the underlying concept present in the above works. Namely, we start with the postulate that the complexity of a system should be a measure of the distance from the least complex system of the same size. We assume that the least complex system is the one where every component is isolated, not interacting with any other component (thus lacks any interdependencies). We develop our "spectral complexity" metric by using a Wasserstein-type distance on spectral distribution of the recurrence matrix of the directed graph (for an application of such an approach to measure uncertainty, see [18]).
Based on the above spectral complexity approach, we then develop a novel graph decomposition technique that is based on cyclic interaction between subsystems and does not resort to symmetrization of the underlying matrices. This approach facilitates the identification of strongly interacting subsystems that can be used for design and analysis of complex systems. In particular, our goal is to group subsystems that should be co-designed or co-analyzed. Our methodology can be viewed as a complementary approach to Fiedler based methods and can also be used to provide graph sparsification [19].
The problem of structural decomposition, clustering or partitioning graphs (or data) into disjoint groups, is a problem that arises in numerous and diverse applications such as, social anthropology [20], gene networks [21], protein sequences [22], sensor networks [23], computer graphics [24] and Internet routing algorithms [25]. In general, the problem of clustering requires one to group a set of objects such that each partition contains similar objects, or objects that are "close" to one another with respect to an appropriate metric. Alternatively, graph partitioning can be mathematically posed as the set of approaches that minimize the number of edges that cross from one subgroup of nodes to another while maintaining a balanced decomposition [6].
Graph clustering is a well studied topic and spectral clustering has emerged as a very popular approach for decomposing graphs [6]. These methods for clustering graphs use the eigenvalues and eigenvectors of the graph Laplacian matrix to assign nodes to clusters [6]. The theoretical justification for these methods was given by M. Fiedler (see [26,27]). In spectral graph partitioning, one computes the eigenvector corresponding to the smallest non-zero eigenvalue of the Laplacian matrix. This eigenvector is known as the Fiedler vector [6] and is related to the minimum cut in undirected graphs [26,27]. The signs of the components of the Fiedler vector can be used to determine the cluster assignment for the nodes in the graph [6].
The drawback of spectral clustering and other traditional partitioning methods is that they are restricted to undirected graphs [6] (they assume that the adjacency matrix is symmetric). The problem of clustering undirected graphs has been well studied (we refer the reader to [5,7,28,29,30,31,32]). However, for many applications, the adjacency matrix resulting from the underlying graph representation is not symmetric. Examples include, engineering systems [33], social networks, citation networks, Internet communications, and the World Wide Web to name a few [34].
The theory for spectral partitioning of directed graphs has not been developed as extensively as that for undirected graphs [35]. In [36], the graph Laplacian for directed graphs is defined and its properties are analyzed. The Cheeger inequality for directed graphs is also derived in [36]. In [37] the author extends the work in [36] to partition directed graphs. A method for clustering directed weighted graphs, based on correlations between the elements of the eigenvectors is given in [38]. In [39], spectral clustering for directed graphs is formulated as an optimization problem. Here the objective function for minimization is the weighted cut of the directed graph. In [40], communities or modules in directed networks are found by maximizing the modularity function over all possible divisions of a network. The heuristic for this maximization is based on the eigenvectors of the corresponding modularity matrix. Recently, in [41], the authors develop a fast local approach to decompose graphs using network motifs. There are recent papers that consider complex eigenvalues of the graph transition matrix to achieve clustering [42,43]. While [43] concentrates on 3-cycles in a directed graph, our methods enable detection of more general, almost-cyclic structures. The spectral decomposition that we develop in this paper looks beyond the Fiedler vector for partitioning. We utilize complex eigenvalues of the graph transition matrix to identify underlying cycling behavior. The methods of [42] are closer to ours. 1 The paper is organized as follows. In section 2 we introduce the idea of spectral complexity of a directed graph. We compare the new measure of complexity to the standard graph energy complexity metric used in literature. In section 3 we propose an approach for partitioning directed graphs that groups nodes into clusters that tend to map into one another (i.e. form "almost cycles"). In section 4 we give examples and compare our results with existing methods.

Spectral Complexity
The key idea underlying our methodology is that every digraph G = (V, A, B), where V is a set of nodes, A is a set of directed edges, and B is a set of weights, can be represented using a multi-valued (one-to-many) map T : V → V that maps node i to a set of nodes j ∈ V i , with the associated probabilities p ij = β ij / j β ij , β ij being weights. If a node is a sink, and has no edges, we set p ii = 1. We consider the weighted adjacency matrix U whose i, j element is p ij . This matrix is analogous to the Koopman operator in dynamical systems [46,47].
We can decompose the state-space of one-to-many maps into the recurrent set V r and non-recurrent set V nr . We define the recurrent set as the set of all the points k ∈ V such that every orbit that starts at k lands in k some time later. The rest of V is the transient (non-recurrent) set. Obviously, the row-stochastic matrix U has its restriction R to the recurrent set, where R is obtained from U by deleting the rows and columns corresponding to transient set nodes. Note that R is also row-stochastic, since nodes in the recurrent set have 0 probability of transitioning to the transient set. We call T irreducible if we can get from any initial state k to any final state l, i.e. l ∈ T n (k) for some n and every k, l. V can be split into irreducible components. It can be shown that on each irreducible component, every state has the same period where in case the period is the greatest common divisor of all n such that k ∈ T n (k) [48]. We identify complex vectors with elements v j , j ∈ V with functions f : the function has a constant value on C. In the following, we will use the notion of period p = k/j, where k, j are integer and k ≥ j to mean p = k if k/j is not an integer, and k/j otherwise. We have the following theorem: Theorem 1 Let T be irreducible of period d. Then: 1. λ 1 = 1 is an eigenvalue of U and R. The eigenspace of λ 1 is onedimensional and consists of constant functions.
2. λ jd = e i2πj/d is an eigenvalue of U and R, where j = 1, ..., d. The 1 The paper [42] appeared in print and on arXiv after our submission. Also, the clustering methodology we provide was first disclosed in an internal report to DARPA [44]. We provide a different algorithm for clustering, and provide a more general theoretical justification for the method based on the work in [45]. eigenspaces associated with each of these consist of vectors whose level sets define an invariant partition of period that is equal to d/j.
4. If there is a pure source node, then 0 is in the spectrum of U .
Proof. Items 1. and 3. are a simple consequence of the Perron-Frobenius theorem [49]. Item 2. follows from the observation [48] that a Markov chain with period d possesses eigenvalues λ jd = e i2πj/d , and from the fact that T is a Discrete Random Dynamical System [47]. Then, Theorem 15 implies 2 that the associated eigenfunction f j/d is a deterministic factor map of T . The theorem also implies that the state space V splits into sets on which f j/d has constant value. The number of such sets is d provided d/j is not an integer, and d/j if it is. The last statement follows from the fact that if i is a source node, then a vector that is 1 on i and 0 on all other nodes gets mapped to 0 by U .
If T is not irreducible, it can always be split into irreducible components, and then Theorem 1 can be applied on each component. In Theorem 1 the cycle of order d is identified and its eigenvectors serve to partition the graph by using their level sets. The lower order cycles are also associated with an eigenvalue and an associated partition: Theorem 2 If λ jd = e i2πj/n is an eigenvalue of U or R, where n ≤ d then the eigenspace associated with it consist of vectors whose level sets define an invariant partition of period that is equal to n/j.
Proof. This also follows from Theorem 15 in [47] if we take the state space to be a discrete space V of n nodes, and T as a random dynamical system on it.
Theorems 1 and 2 give us motivation to define a measure of complexity based on the structure of recurrent (i.e. cycle-containing) and non-recurrent sets. Here are the postulates that we use for defining complexity, that is based on the properties of T : 1. Any graph that consists of disconnected single nodes has complexity equal to the sum of complexities of the nodes.
2. Any linear chain has complexity equal to the sum of the sum of complexity of the nodes and weights of the edges.
Note that in the definition of spectral complexity we use the notion of distance on the unit disk. There are a variety of choices that can be made, just like the choice of L 1 norm or L 2 norm on Lebesgue spaces. We now describe definitions and algorithms for computation of complexity, with a specific choice of distance based on the Wasserstein metric.

Definition of Spectral and Total Complexity of a Directed Graph
In this section, we propose an algorithm for calculating the complexity of directed graphs using the spectral properties of the matrix R. To construct the matrix for a graph, we start by removing all the sources and their corresponding edges until no sources are left. This is motivated by the notion that sources are elements that contribute to complexity in a linear manner, and will be included in the complexity metric through the edge weights. We note that, a source is a node with only outgoing edges (a disconnected node is not a source). In matrix terms, every source contributes to a zero (generalized) eigenvalue. We then construct the edge weighted adjacency matrix for the new graph that effectively captures the dynamics of the multivalued map T (a random walk on the graph). Thus, the rows of this adjacency matrix are normalized, such that the sum of the elements in any given row is 1. This is achieved by dividing each element in the row by the sum of all the row elements. If the row contains only zeros (the given node is a sink), we put a 1 on the diagonal element in that row, i.e. we add a self loop in a standard manner of associating a Markov chain with a graph. This changes the zero eigenvalue associated with that row to 1. Note that the eigenvector associated with this eigenvalue is constant on the connected component, and all the other eigenvalues and eigenvectors remain unchanged. We call the resulting matrix R the recurrence matrix. As a corollary of Theorem 1 we have that this matrix always has an eigenvalue 1 associated with a constant vector, and all of the remaining eigenvalues are distributed on the unit disk. We now define a complexity measure on the class of recurrence matrices. For a K × K recurrence matrix, we will define the least complex matrix to be the identity matrix (this matrix corresponds to a graph with no edges). This corresponds to the graph in which each node only has a pure self-loop. We define complexity as the distance of the eigenvalue distribution of R from the eigenvalue distribution of the identity matrix. Distance on distributions can be measured in different ways. Here we adopt an approach based on the Wasserstein distance. For this, we first need to define a distance on the unit disk. We do this using polar coordinates r and θ, considering the unit disk as the product space I × S 1 , where I = [0, 1]. The distance on I is the usual one d(r 1 , r 2 ) = |r 1 − r 2 |, while on S 1 we impose the discrete metric: Now, the normalized Wasserstein distance between the least complex eigenvalue distribution and the one with eigenvalues λ i = (r i , θ i ), i = 1, ..., n is, where K is the number of non-zero eigenvalues of the recurrence matrix R and 1 {θ =0} is the indicator function on the set {θ = 0}. The following fact on the graph with least spectral complexity is obvious: Fact: The graph with K disconnected nodes has complexity 0.
The first term in the spectral complexity function (1) is a measure of the amount of "leakage" in the graph. If one is performing a random walk on the graph then the leakage is a measure of the probability of transition between nodes [50]. This term takes values between 0 (no leakage) and 1 (probability of transition is 1). In other words, the first term captures the decay in probability of a random walk and the second term captures the cycles. According to the above definition, the maximally complex graph in some class should maximize both terms separately. Namely, the eigenvalues of such a graph would be radially as close to zero as the class definition allows, and would have the maximal number of eigenvalues off the positive real line inside the unit disc, thus maximizing the second term. The following result indicates how the maximum spectral complexity of a graph is achieved if the graph family is not restricted: Theorem 3 Let R be a K × K recurrence matrix of a K-node graph. Then maximal spectral complexity F is achieved for a matrix with constant entries.

Proof.
A recurrence matrix R with constant entries has K − 1 zero eigenvalues corresponding to eigenvectors that have −1 at j-th component and 1/(K −1) for all other components. Note that these are counted as θ = 0 eigenvalues. Since 1 is always an eigenvalue, the resulting eigenvalues maximize both the first and the second sum in F , making it 2(K − 1)/K.
From the above theorem, it is clear that graphs with a large number of nodes have maximal complexity very close to 2. This is evident in the example we present in the next section.
If the entries of R are such that it forms a random Markov matrix [51], then, as we prove next, the complexity increases to maximal complexity as the size of the matrix increases.
Theorem 4 Let β jk , j, l ≥ 1 be i.i.d random variables with bounded density, mean m and finite positive variance σ 2 . Every realization of β jk , j, l ≤ K gives a weighted directed graph. Let R be a K ×K recurrence matrix of such a K-node graph. Then F (R) → 2 as K → ∞, with probability 1.
Proof. The recurrence matrix R is a random Markov transition matrix [51] with the underlying Markov chain irreducible with probability 1. Let be the empirical measure supported on the location of eigenvalues of the matrix µ √ KR , where δ λj is the Dirac delta function centered at eigenvalue λ j . Then, Theorem 1.3 in [51] implies that µ √ KR converges to the uniform measure on the disk U σ/m = {z ∈ C|z ≤ σ/m}. This, in turn, implies that the modulus of eigenvalues of R goes to zero as K → ∞, and that Also noting that lim The above result is interesting in the context of numerical tests that we do in section 2.3, which show random graphs of increasing size whose complexity converges to 2, and in the section 4.2, where most of the eigenvalue distributions for several web-based networks are within a disk in the complex plane, but a small proportion is not, indicating the non-random nature (and lower complexity) of these networks.
The use of the "counting" of eigenvaues with θ j = 0 in the second term of F makes the spectral complexity measure have some features of discrete metrics, as the following example shows: Example 5 (Spectral complexity in a class of recurrent 2-graphs) We consider graphs with 2 elements that have both a self loop and an edge connecting them to the other element, with uniform probabilities as shown in figure 1. Such a system has R of the form: where p ∈ [0, 1]. The eigenvalues λ of R satisfy the equation One solution comes from, and the other from, For p < 1/2 the self loop is weaker than the edge connecting to the other node, and for p > 1/2 the opposite is true. The spectral complexity is Spectral complexity of this class of graphs distinguishes between graphs that have stronger self-interaction than interaction between the nodes, characterized by p > 1/2 and the graphs in which the interaction between the nodes is stronger than the self-interaction. Note that spectral complexity is discontinuous at p = 1/2. This is in line with the behavior of the underlying Markov chain: for p > 1/2 any initial probability distribution on the chain will decay exponentially and monotonically to the uniform distribution. For p < 1/2, the decay of the distribution assumes oscillatory manner, thus representing a qualitative, discontinuous change in behavior. Note that for p = 1/2 the complexity measure shows features of a discrete metric. Thus, the discontinuity in the complexity metric accurately captures the transition from the more complex oscillatory evolution of the distribution to the invariant measure (for p ≤ 1/2) versus the less complex monotonic convergence to the invariant measure for p > 1/2. We note that the oscillatory nature of the distribution, in the more complex case, corresponds to strong interaction between nodes (since p ≤ 1/2). This is in contrast with the weak interactions between nodes in the p > 1/2 case, whereby the graph interactions are less important when compared to the self interaction of nodes.

Physical intuition for complexity metric and meaning of eigenfunctions of the recurrence matrix for the network behavior
Spectral objects associated with undirected graphs -such as the Fiedler eigenvalue, that is associated with speed of mixing of the associated Markov chain and reflects connectivity of the underlying graph, and the Fiedler vector, whose components indicate subgraphs that have strong internal connectivity but weak interconnectivity -often have impact on the physical understanding of the network. The same is true for the eigenvalues and eigenvectors of the matrix R.
They have strong correlation with the structural properties of the underlying graph. For example, existence of a real eigenvalue 0 > λ ≥ −1 indicates that the network can be split into two subnetworks that have weak internal connectivity but strong interconnectivity between two subnetworks (see Example 5). Also, the associated eigenvector values can be clustered into two separate sets that indicate the mentioned subgraphs. Both the simple example 5, and the large graph Wikipedia example in the section 4.2 provide evidence for this statement. Analogously, an eigenvalue set λ 1 , λ 2 .λ 3 , whose arguments are close to (0, π/3, 2π/3) indicates that the graph possesses 3 subgraphs with weak in-ternal and strong connectivity between the 3 subgraphs. An example of this is shown in the section 4.2 for the Gnutella network. The complexity metric has the above spectral elements as part of the metric. It speaks to the structural complexity of the graph, but it has a physical meaning for the behavior of the network as well. As a simple example, consider the case of spring mass system illustrated in figure 2. If we set k 1 = k 2 , the weight matrix is The associated recurrence matrix is then where p = k 1 /(k 1 + k 12 ). Now assume p = 1. This indicates a decoupled system, and the complexity of such system is clearly the smallest among all considered systems. For p slightly smaller then 1, the complexity is small, as the system is "almost decoupled". For p = 0, the system has one eigenvalue at −1, indicating that the 2 masses interact strongly, while there is no self-interaction for either mass. It is physically intuitive that the highest complexity occurs for p = 1/2, in which case the effects of both the spring attached to only one of the masses and the spring attached to both masses have equal influence on the individual mass motion. It is also intuitive that the situation with p = 0 is less complex -for example, in design considerations we do not need to take into account the properties of two of the springs. This intuition carries over to other examples. If we take three masses with no self interaction, but connected by springs, there is a double eigenvalue at −1 and thus its complexity is larger than that of the 2-mass system. The more balanced the self connectivity is with the connectivity to other nodes the more complex tasks like engineering design will become. It is sometimes argued that networks with full connectivity are simpler to analyze, but this comes from a statistical mechanics approach to the problem. In a design engineer or maintenance engineer world, adding an edge in the device or network design always increases the complexity of the resulting system.
The above discussion introduces a way of measuring the complexity of the recurrent part of a directed graph, and points to the intuitive aspects of the definition. But, complexity of the graph is not solely a function of the recurrence and cycles. Namely, more components in a graph, and more edges between nonrecurrent nodes contribute to complexity as well -and we assume they do so in a linear fashion. Thus, if for a particular application we need to take into account the weights of nodes and the weights of the removed edges while removing sources, the total complexity C can be formulated in the following way: where W is the user-defined weighting parameter for the spectral complexity in the total complexity metric that can take any value from [0, ∞). N is the initial total number of nodes, α i 's are the complexity of the individual nodes, 3 M is the number of edges removed while removing source nodes and β j 's are the weights of the edges that were excluded in the source nodes removal step. F is given by equation (1) and γ is the scaling factor that arises due to the fact that the terms β j and F might have vastly different numerical values. One choice for γ is the following: Note that the expectation is taken over various configurations of the system, and thus the probability distribution on a collection of graphs must be given. An alternative choice is to replace the E operator with the nonlinear max operator in equation (12).

Comparison with Graph Energy
In this section, we compare the spectral complexity introduced in this paper to graph energy. The notion of graph energy [52,53] emerged from molecular and quantum chemistry, where it has found use in ranking proteins on the basis of the level of folding [54]. It has also been used as a metric for complexity of graphs. The graph energy complexity, interestingly, does not peak for graphs with maximum possible connections (the rank of the adjacency matrix for a complete graph is not maximum). Instead, statistically the most complex graphs are those with ≈ 2/3 possible connections [55]. Note that this complexity metric fails to capture directed cycles in the graph since one is forced to either work only with undirected or symmetrized directed graphs, as demonstrated below.
The algorithm for calculating graph energy is as follows. At first, for a given graph, we construct the adjacency matrix M : 1, for all edges (i, j), i = j of the graph; 0, otherwise.
The graph energy C is calculated by using the following formula: where b |A| are edge weights, |A| is the number of edges in the graph, SVD(M ) is a vector of singular values of matrix M . Equation (14) can be used with symmetrized adjacency matrix M In the following, we present Figures 4 and 5 to highlight the difference between the complexity introduced in this paper and the graph energy. Random graphs were probabilistically constructed using the following formula: the probability with which a node is connected to another node is given by p = Average Degree Number of Nodes .
All graphs considered have 1000 nodes. The average degree was varied from 1 to 20 in increments of 1 and then from 50 to 1000 in increments of 50. The degree is defined as the number of outgoing edges from each node. All weights of the edges are equal to 1. Each realization was repeated 10 times. The spectral complexity increases fast with the average degree, reaching values of about 1.8 (out of the maximum possible value of 2) at an average degree of about 20/1000 of the total number of nodes; it then continues to increase monotonically, but less rapidly, with the average degree.
In the case of the graph energy, as shown in Figure 5, the maximum energy is reached when the average degree is at about 50% of the total number of nodes; then the graph energy starts to decrease.     (14) as a function of the average degree of the graph. In the case when the matrix was symmetrized, the average degree relates to the initial non-symmetrized matrix. Each graph has 1000 nodes.
This difference can be understood from the following argument. For simplicity, we take graphs with edge weights all equal to 1. As shown in Theorem 4, random graphs with large average degree will statistically have eigenvalues with modulus close to zero. Since graph energy is equal in this case to the sum of moduli of eigenvalues, the graph energy will be small. In other words, small graph energy is in fact indicative of a large number of connections in the graph, and thus large, not small, complexity. Namely, the key to decrease of energy of random graphs is the decrease in the moduli of the eigenvalues. In contrast, the metric F counts the number of complex eigenvalues, that will, in the case of a random graph with large average degree tend to increase with the average degree.
In the case of graphs corresponding to engineered systems, there is no reason why the complexity should decrease with increasing the number of connections (interdependencies) in the graph. Thus, we believe that the complexity measure introduced in this paper is more appropriate for engineering and physical systems. We note that the graph energy metric is more appropriate from an information theory standpoint.
We additionally note that in [56], the authors develop a complexity measure that is based on the entanglement of cycles in directed graphs. They compute this metric using a game theoretic approach (using a cops-and-robbers game). This reachability approach is similar in spirit to our spectral cyclomatic complexity measure. However, we note that, in general, computing k-entanglements scales as O(n k+1 ), whereas our approach in general scales as O(n 3 ), and much faster than that for sparse graphs. The spectral complexity captures the "entanglements" at all scales of the graph (for all k). Moreover, unlike the approach in [56], our methodology leads to natural clustering of the graph that is discussed in the next section.

Clustering of Directed Graphs
The clustering of undirected graphs is a well-developed area [5,6,7] with several decades of research behind it. The area of clustering of directed graphs is far less developed. However, the analysis and clustering of directed graphs is slowly coming in vogue [57,36,58,59]. In [36], the author generalizes random walk based Cheeger bounds to directed graphs. These bounds are related to the spectral cuts often used for graph partitioning [5]. In [57], the authors generalize Laplacian dynamics to directed graphs, resulting in a modularity (quality) cost function for optimal splitting.
An alternative approach has focused on block modeling [58,59]. Under this methodology, nodes are grouped into classes that exist in an image graph. This assignment is performed based on node connectivity and neighbor properties. This approach assumes that a template image graph and roles (for the nodes) are supplied a priori. The graph is then fit onto the image graph using an optimization scheme [58]. Although the approach extends to directed graphs, such image graphs are not always available in engineering or social systems.
In the following, we introduce a new graph clustering approach that complements standard spectral methods for decomposing graphs. In particular, we construct a new algorithm that is based on computing the underlying cycles in the graph by computing the corresponding generating eigenvalues and eigenvectors. In particular, by decomposing the graph into these cycles, we aim to identify strongly interacting components in a directed graph. The method is compared to Cheeger and Laplacian dynamic based methods [36,57].
From the discussion leading to Theorem 1, we recognize that cycling in a directed graph is associated with its recurrent part. Thus, we can use spectral properties -and in particular complex eigenvalue pairs -of the recurrence matrix R in order to recognize cycles in a directed graph. Note that, according to Theorem 1, a set of complex eigenvalues with unit modulus always has a generator e i2π/d . We extend this idea to eigenvalues off the unit circle and search for such generating eigenvalues.
In our algorithm, we seek the dominant cycle in a graph by identifying an eigenvalue (the generating eigenvalue) that is closest to a pure cycle on the unit circle. The algorithm is as follows: we compute nonzero eigenvalues λ j of R. We then compute the angles α j of the calculated eigenvalues in the complex plane and set where K = 2, . . . , N , N is the number of nonzero eigenvalues, S is the set of eigenvalues for which (2π/K) × (t − 1.5) ≤ α j ≤ (2π/K) × (t − 0.5). If the set S is empty, then the minimum in equation (15) is 1. We denote the number of clusters corresponding to the dominant cycle as K min . Then we find the generating eigenvalue(s) and the corresponding eigenvector(s). We choose j such that π/K min ≤ α j ≤ 3π/K min . We want the generating eigenvalue to be close to the case of a pure cycle of size K min , when the generating eigenvalue is at 2π/K min . We find the index of the first generating eigenvector as argmin j |λ j − exp( 2πi Kmin )|. Other generating eigenvalues are those that are within a predefined threshold (we use 10 −4 in our work) of the first generating eigenvalue.
For each generating eigenvector v j we compute angles φ i in the range [0, 2π] for each element 1 ≤ i ≤ N . Then we obtain graph clusters by partitioning coordinates of v j into K min groups by splitting the unit circle into K min equal parts. Disconnected nodes and sinks are placed in separate clusters.
For example, the 7 node graph (see Fig. 6 (left)) with 6 non-zero eigenvalues of the recurrence matrix (red points in Fig. 6 (right)) has K min = 3 clusters. The sector of the unit circle, that contains the generating eigenvalue is between π/K min and 3π/K min and is colored with green in Fig. 6 (right). The generating eigenvalue is the non-zero eigenvalue which is closest to the eigenvalue of the pure cycle of size K min .
In previous work [60,61] a method for identifying coarse-grained dynamics using aggregation of variables or states in linear dynamical systems was developed. The condition for aggregation is expressed as a permutation symmetry of a set of dual eigenvectors of the matrix that defines the dynamics. It is based on the fact that the n × k aggregation matrix Π reduces a (transition) matrix P describing a linear dynamical system if and only if there exists a set of k linearly independent vectors invariant under P T , e.g. (left) eigenvectors, that respect the same permutation symmetry group as Π. It is straightforward to identify permutation symmetries in the invariant vectors of P T . A permutation symmetry is realized through identical elements in the vectors. Thus, by identifying the above permutation symmetries, one can group elements in a complex (directed) graph. In other words, the algorithm that we introduced above leads to a natural method for graph sparsification [19].

Fixed Wing Aircraft Example
To test both our clustering approach and the complexity metric, we consider the architecture of a fixed wing aeroplane system [33]. This is a particularly important and relevant example since recent analysis by the RAND Corporation concluded that the increase in cost of fixed wing aircraft is primarily due to increased complexity [62]. An example of the impact of the complexity of fixed wing aircraft is the recent cost overruns of the F-35 platform [3].
The aerospace system considered in this work, consists of the following functional subsystems: aircraft engine, fuel system, electrical power system (EPS), environmental control system (ECS), auxiliary power unit (APU), ram cooler, and actuation systems. These subsystems may be connected to one another through various means. For example, the engine may provide shaft power to the fuel system, the EPS and actuation system. Similarly, the APU may be connected to the engine as it may be required to provide start up pneumatic power. Note that the interconnections need not be electrical or mechanical in nature. Since the fuel system can be designed to absorb heat from the actuation system and EPS, the dependencies of the subsystems may also be thermal. For a discussion on these systems we refer the reader to [33]. An example architecture depicting the subsystems and their interconnections is shown in Fig. 7. Figure 7: Example architecture of a fixed-wing aeroplane system. Traditionally, aerospace system architectures are specified by subsystems (such as EPS, ECS etc) and their interconnections. The exploration of design space for these aerospace systems can be a particularly daunting and challenging task. One possible approach to this problem has been to enumerate all feasible architectures and then pick the most desirable one [33]. It would appear that the exponential size of the design space would make this enumeration task intractable. However, the feasible set is typically very sparse and generative filters can be used to enumerate all the possible system designs [33].
In generative filters, one starts by defining the functional subsystems and then listing their interconnection rules. Based on these rules one can efficiently identify all possible architectures [33]. Using generative filtering on the fixed wing aircraft system gives 27, 225 feasible architectures (significantly less than the 2 42 possible combinations of subsystem interconnection). One can now analyze and rank the resulting architectures based on complexity and interdependencies.
After analyzing 27,225 configurations of a system, we show the most complex one and the least complex one, from the definition of metrics in equation (1) and equation (11) with W = ∞. We compare results obtained by using our spectral complexity with those obtained by using graph energy.
We compare our clustering results, with those obtained by using the Fiedler method, Cheeger bounds [36], and modularity maximization [57]. Our approach for the Fiedler method is as follows: at first for a given graph we construct the adjacency matrix M according to equation (13). Then we symmetrize the obtained matrix as M (sym) ij = M ij ∨M ji , where ∨ is the logical OR operator. After that we find the Laplacian matrix L = D − M (sym) , where D is the degree matrix. In this matrix, rows sum to zero. The Fiedler approach is based on the second smallest eigenvalue and the corresponding eigenvector of the symmetric matrix L. In particular, the signs of the components of the corresponding eigenvector are used to partition the graph in two parts.
In the following, N1 will correspond to the engine, N2 to the fuel system, N3 to the EPS, N4 to the ECS, N5 to the APU, N6 to the ram cooler, and N7 to the actuation system.

High Complexity Architecture
After analyzing all 27,255 configurations, the architecture number 26, 940 in Fig. 8 was found to be the most complex. The eigenvalues for the graph are displayed in Fig. 9. Figure 8: Graph configuration 26,940. Edge weights are shown next to the edges. Node 1 has weight 20, Node 2 has weight 8, Node 3 has weight 10, Node 4 has weight 10, Node 5 has weight 15, Node 6 has weight 4.
The complexity for this graph by using equation (1) and W = ∞ in equation (11) is equal to 1.4043. The complexity for the random graph with the same number of nodes and average degree by using equation (1) and W = ∞ in equation (11) is equal to 0.9237. The complexity predicted by equation (1) for the high complexity graph is about 152% of the value of complexity predicted in expectation by the same equation for a random graph. The complexity for this graph by using equation (1) and W = 1 in equation (11) is equal to 1.2012.
The above complexity can be motivated from a "system cycle" standpoint. In particular, in Fig. 8  These cycles capture energy, fuel, and data flows and interactions. We note that increased interactions amongst aircraft subsystems can be related to reduced efficiencies and failures [63]. Thus, multiple intersecting cycles with several nodes give rise to higher complexity systems since failure in single subsystems would propagate through the cycles and across thereby requiring additional redundancies for safety.
The nodes form the following clusters: cluster 1 contains Nodes 1 (engine), 4 (ECS), and 6 (ram cooler); cluster 2 is Node 2 (fuel system), cluster 3 is Node 3 (EPS); and cluster 4 is Node 5 (APU). Here we note that the single node clusters are ones that co-occur in multiple cycles. By visual inspection one can see the "leaky" (in the sense that eigenvalues corresponding to it are at a large distance from the unit circle) 4-cycle composed of the clusters; the system cycles through the 4-cycle giving rise to high complexity. This leakiness naturally arises due to the interactions of the various cycles (enumerated above) at common nodes such as Fuel System, APU etc.
The energy for this graph by using equation (14) is equal to 28.3401 (sum of singular values is equal to 7.9352). If the matrix is symmetrized, then the energy for this graph by using equation (14) is equal to 33.9041 (sum of singular values is equal to 9.4931).
By using the Fiedler method the graph is divided into the following clusters: cluster 1 contains Nodes 2 (fuel system), 3 (EPS) and 6 (ram cooler); cluster 2 contains Nodes 1 (engine), 4 (ECS) and 5 (APU), which neither captures strongly connected components nor critical nodes that co-occur in multiple cycles.
Using a Cheeger bound approach [36], we find that the above graph is split into two groups. Cluster 1 contains nodes [1,2] and cluster 2 contains nodes [3,4,5,6]. The spectral approach for modularity maximization (by analyzing the leading eigenvector) yields a clustering where nodes [1,2,4,6] are in the first cluster and nodes [3,5] lie in cluster 2. Neither of these methods capture the visually evident cycling behavior. We now contrast this architecture with one of low complexity as identified by our approach.

Low Complexity Architecture
After analyzing all 27,255 configurations as above, the architecture number 1, 160 in Fig. 10 was found to be the least complex, not counting very simple graphs containing mostly disjoint nodes after removing sources. The eigenvalues for the graph are displayed in Fig. 11.  Figure 10: Graph configuration 1,160. Edge weights are shown next to the edges. Node 1 has weight 20, Node 2 has weight 8, Node 3 has weight 10, Node 4 has weight 10, Node 5 has weight 15, Node 6 has weight 4, Node 7 has weight 8.
The complexity by using equation (1) and W = ∞ in equation (11) is equal to 0.5847. The complexity for the random graph with the same number of nodes and average degree by using equation (1) and W = ∞ in equation (11) is equal to 0.8136. The complexity predicted by equation (1) for the low complexity graph is about 71% of the value of complexity predicted in expectation by the same equation for a random graph. The complexity by using equation (1) and W = 1 in equation (11) is equal to 0.8195.
As in the previous case, the complexity can again be motivated from a "system cycle" standpoint. In particular, in Fig. 8  Compared to the architecture with higher complexity, we see that this example has only 5 cycles versus 6 in the previous one. Additionally, the cycles in the higher complexity architecture have more nodes (hops) when compared to the low complexity architecture. Thus, the previous architecture had a higher complexity when compared to the current one despite the fact that the current example has one additional node (7 nodes) when compared to the previous one (6 nodes).
The nodes form the following clusters: cluster 1 contains Nodes 1 (engine) and 5 (APU); cluster 2 contains Nodes 2 (fuel system) and 3 (EPS). It is easy to check that these nodes generate the cycles in the graph. The eigenvalues indicate a "leaky" two cycle with these two clusters. Nodes 4 (ECS), 6 (ram cooler) and 7 (actuation systems) are sinks. These unidirectional connections lower the complexity of the system.
The energy for this graph by using equation (14) is equal to 25.6040 (sum of SVDs is equal to 7.2359). If the matrix is symmetrized, then the energy for this graph by using equation (14) is equal to 34.8340 (sum of SVDs is equal to 9.8444). Thus, in contrast to spectral complexity, they are not much different in values obtained for the high complexity architecture. Note that the self loop of node 2 is not included in the energy calculation.
Using a Cheeger bound approach [36], we find that the clustering approach finds no partition. The spectral approach for modularity maximization (by analyzing the leading eigenvector) and Fiedler method both yield a clustering where nodes [1,2,3,5] are in the first cluster and nodes [4,6,7] lie in cluster 2. Once again, these methods do not capture the cycling behavior.

Large Network Examples
In this subsection we provide examples of calculating complexity and clustering for some large graphs.

Wikipedia who-votes-on-whom network
At first we consider the Wikipedia who-votes-on-whom network with 7, 115 nodes ( [34]). Nodes in the network represent Wikipedia users and a directed edge from node i to node j represents that user i voted for user j. After removing sources, the network has 2,372 nodes. This is to be expected since most nodes are simply voters that do not compete in elections (making them sources with no incoming edges). In Figure 12, we show nonzero elements of the recurrence matrix. The multiplicity of λ i = 0 is 82 and the multiplicity of λ i = 1 is 1005, which corresponds to 42.4% of the total number of nodes. In Figure 13, we show all non-zero eigenvalues of the recurrence matrix. Complexity The complexity obtained from equation (1) is equal to 1.0418 (0.4938+0.5480). The complexity for the random graph with the same number of nodes and average degree by using equation (1) is equal to 1.8171 (0.8215+0.9956). The complexity predicted by equation (1) for the Wikipedia who-votes-on-whom graph is about 57% of the value of complexity predicted by the same equation for the random graph, indicating an internal structure to the graph. Looking at the eigenvalue distribution shown in figure 13, we see that it has the structure of randomly distributed eigenvalues inside a disk of small radius. We know from theorem 3 that such distributions of eigenvalues yield high spectral complexity. There is also a set of eigenvalues away from that disk on positive and negative real line inside the unit disc. We next show, using clustering, that there is internal structure corresponding to a low period -namely period 2-cycle that contributes to an eigenvalue on the negative real line that lowers complexity over the maximally complex graph, or even a random graph. Clustering There are 56 disjoint single nodes for Wikipedia who-votes-on-whom network which are not considered for clustering. The graph contains 1,016 sinks. The clustering was done for the strongly connected component. We obtained cluster  In Figure 14, we plot the ratio of the number of edges going from cluster X to cluster Y to the number of edges inside cluster X depending on the percentage of nodes in all clusters. The percentage of nodes in all clusters is calculated as follows: we first sort the generating eigenvector in the ascending order. We then compute the fraction of nodes to keep such that the sum of the ratios is the maximum.  Figure 14: The ratio of the number of edges going from cluster X to cluster Y to the number of edges inside cluster X depending on the percentage of nodes in all clusters for Wikipedia who-votes-on-whom network. Cluster X can be cluster C1 or cluster C2 and cluster Y can be cluster C1 or cluster C2. The asterisk shows the point where the sum of two ratios is the maximum.
In the following, we select such percentage of nodes in all clusters so that the sum of two ratios, plotted as solid lines in Figure 14, is the maximum. We mark the found point of 2.9% with * on Figure 14. The obtained graph is shown in Figure 15, where nodes' numbers are numbers in the graph before removing sources. The number of nodes in each cluster and the ratio of the number of edges between clusters or inside the cluster to the number of nodes in the cluster are shown in Table 1. The average degree of this graph is 1.3243, calculated as the ratio of the total number of outgoing edges from each cluster and edges inside each cluster to the total number of nodes in clusters. In the table, the number in parenthesis shows the number of nodes in the corresponding cluster. Other numbers show the ratio of the number of edges from X to Y to the number of nodes in X, where X can be cluster C1 and cluster C2 and Y can be cluster C1 and cluster C2. As it can be seen from the table the biggest ratio is for C1 to C2 and for C2 to C1.
The number of nodes in each cluster and the ratio of the number of edges between clusters or inside the cluster to the number of nodes in the cluster in the case of 100% of initial number of nodes in all clusters are shown in Table  2. The average degree is 30.3508. As it can be seen from the table the biggest ratio is for C1 to C2.  Figure 15: Clustering for Wikipedia who-votes-on-whom network with 2.9% of initial number of nodes in both cluster C1 and cluster C2. Nodes labels are nodes numbers in the network before removing sources. The nodes from cluster C1 are situated on light red background. The nodes from cluster C2 are situated on light green background. The edges going from cluster C1 to cluster C2 are red, the edges going from cluster C2 to cluster C1 are green, the edges inside clusters are black.  We also performed clustering for the strongly connected component by using the Fiedler method. We obtained cluster 1 of 1,280 nodes and cluster 2 of 20 nodes (a highly unbalanced cut). The table for the number of nodes in each cluster and the ratio of the number of edges between clusters or inside the cluster to the number of nodes in the cluster are shown in Table 3. The number of edges between and inside clusters is calculated for the directed graph before the symmetrization of the adjacency matrix. The smallest ratio is for C1 to C2, what reveals the weak connection from C1 to C2. We see that the method is not capable of uncovering any strong internal structure in this directed graph.  (20) 29.600 2.1500

Gnutella peer to peer network
In the following, we consider the Gnutella peer to peer network with 6, 301 nodes ( [34]). Nodes represent hosts in the Gnutella network topology and edges represent connections between the Gnutella hosts. After removing sources the network has 6,179 nodes. In Figure 16, we show nonzero elements of the recurrence matrix. There are 674 zero eigenvalues and 3,836 one eigenvalues, which are 62.0% of the total number of nodes. In Figure 17, we show all non-zero eigenvalues of the matrix. We again see the structure similar to the Wikipedia network, but with even stronger indication of complexity indicated by the concentration of eigenvalues inside the disk of small radius. The eigenvector corresponding to the eigenvalue of about 0.5 has zero components for sinks and the same sign nonzero components for nodes that are not sinks. Complexity The complexity by using equation (1) is equal to 0.5638(0.2661 + 0.2977). The complexity for the random graph with the same number of nodes and average degree by using equation (1) is equal to 1.5522 (0.5976+0.9546). Thus, the complexity predicted by equation (1) for the Gnutella graph is about 36% of the value of complexity predicted by the same equation for the random graph, again indicating structure induced by a low-period cycle that we uncover next. Clustering There are 151 disjoint single nodes in the Gnutella graph which are not considered for clustering. The graph contains 3,960 sinks. The clustering algorithm found the generating eigenvalue −0.1054572 + 0.2470956i (see the circled eigenvalue in the figure 17). From the associated generating eigenvector we obtained three clusters: cluster C1 of 659 nodes, cluster C2 of 675 nodes, cluster C3 of 734 nodes. In Figure 18 we plot the ratio of the number of edges going from cluster X to cluster Y to the number of edges inside cluster X depending on the percentage of nodes in all clusters.
In the following, we select such percentage of nodes in all clusters so that the sum of three ratios, plotted as solid lines in Figure 18, is the maximum. We mark the found point of 6.0% with * on Figure 18. After removal of nodes that become disjoint when the clusters were reduced in size this percentage is 4.6. Ratio of C1−>C2 edges to C1−>C1 edges Ratio of C2−>C3 edges to C2−>C2 edges Ratio of C3−>C1 edges to C3−>C3 edges Ratio of C1−>C3 edges to C1−>C1 edges Ratio of C2−>C1 edges to C2−>C2 edges Ratio of C3−>C2 edges to C3−>C3 edges Figure 18: The ratio of the number of edges going from cluster X to cluster Y to the number of edges inside cluster X depending on the percentage of nodes in all clusters for Gnutella network. Cluster X can be cluster C1 or cluster C2 or cluster C3 and cluster Y can be cluster C1 or cluster C2 or cluster C3. The asterisk shows the point where the sum of three ratios plotted as solid lines is the maximum.
The obtained graph is shown in Figure 19, where nodes' numbers are numbers in the graph before removing sources. The number of nodes in each cluster and the ratio of the number of edges between clusters or inside the cluster to the number of nodes in the cluster are shown in Table 4. The average degree of this graph is 0.9263. As it can be seen from the table the biggest ratios are for C1 → C2, C2 → C3, C3 → C1.  The number of nodes in each cluster and the ratio of the number of edges between clusters or inside the cluster to the number of nodes in the cluster in the case of 100% are shown in Table 5. The average degree of this graph is 4.5034. As it can be seen from the table the biggest ratios are for C1 → C2, C2 → C3, C3 → C1, but the ratio between them and other elements of the matrix is smaller than in the 6% case.  Figure 19: Clustering for Gnutella peer to peer network with 4.6% of initial number of nodes in clusters C1, C2, C3. Nodes labels are nodes numbers in the network before removing sources. The nodes from cluster C1 are situated on light red background. The nodes from cluster C2 are situated on light green background. The nodes from cluster C3 are situated on light blue background. The edges going from cluster C1 are red, the edges going from cluster C2 are green, the edges going from cluster C3 are blue, the edges inside clusters are black. The clustering of the strongly connected component by using the Fiedler method gives cluster 1 of 1,878 nodes and cluster 2 of 190 nodes. The number of nodes in each cluster and the ratio of the number of edges between clusters or inside the cluster to the number of nodes in the cluster are shown in Table 6. The number of edges between and inside clusters is calculated for the directed graph before the symmetrization of the adjacency matrix. As it can be seen from the table the smallest ratio is for C1 to C2, what reveals the weak connection from C1 to C2. Again, the method fails to uncover the internal structure in the graph because the structure is of cycling type, and not of separate subgraph type. Table 6: The number of nodes in each cluster and the ratio of the number of edges between clusters or inside the cluster to the number of nodes in the cluster in Gnutella network (Fiedler method

Conclusions
In this work we proposed a new, spectral, measure of complexity of systems and an associated spectral clustering algorithm. This complexity measure (that we call spectral complexity) is based on the spectrum of the underlying interconnection graph of the subcomponents in the system. Spectral complexity is a natural engineering extension to software complexity measures developed in [9]. We find that compared to competing complexity measures (such as graph energy), spectral complexity is more appropriate for engineering systems. For example, one of its features is that the complexity monotonically increases with the average node degree. In addition, it properly accounts for structure and complexity features induced by cycles in a directed graph. Using the spectral complexity measure, comparison of complex engineered systems is enabled, potentially providing significant savings in design and testing. Spectral complexity also provides an intuitive approach for clustering directed graphs. It partitions the graph into subgroups that map into one another. Our partitioning shows a strong cycling structure even for complex networks such as Wikipedia and Gnutella, that the standard methodologies like the Fiedler vector partitioning do not provide.
Our methods are demonstrated on engineering systems, random graphs, Wikipedia and Gnutella examples. We find that the high and low spectral complexity architectures uncovered by our methods correspond to an engineer's intuition of a highly complex vs. a low complexity architecture. Namely, the low complexity of the engineered architecture is related to more layers in its horizontal-vertical decomposition [45,64] i.e. with a graph structure closer to acyclic.
It is of interest to note that the methods introduced here have proven to be of strong use in data-driven analysis of dynamical systems [65], which should make it possible to combine the introduced measure of complexity with measure of dynamic complexity for dynamical systems on networks.