Tree decompositions of real-world networks from simulated annealing

Decompositions of networks are useful not only for structural exploration. They also have implications and use in analysis and computational solution of processes (such as the Ising model, percolation, SIR model) running on a given network. Tree and branch decompositions considered here directly represent network structure as trees for recursive computation of network properties. Unlike coarse-graining approximations in terms of community structure or metapopulations, tree decompositions of sufficiently small width allow for exact results on equilibrium processes. Here we use simulated annealing to find tree decompositions of narrow width for a set of medium-size empirical networks. Rather than optimizing tree decompositions directly, we employ a search space constituted by so-called elimination orders being permutations on the network's node set. For each in a database of empirical networks with up to 1000 edges, we find a tree decomposition of low width.


Introduction
The analysis and modeling of complex systems involves complex computational tasks. In comparing empirical network structures with network models, for instance, one asks if they share common macroscopic behaviour under a percolation process or spin kinetics.
While theory of networks in complex systems has been enriched to include adaptive [1], temporally evolving [2,3], and multi-layer [4,5] structures, open questions remain also for processes on simple static structures. The Ising model in the macrocanonical ensemble, for instance, is fully captured by its density of states. Computing the exact density of states is not feasible in general. A subproblem of this task is to decide if there is a state at or above a given energy . Even this modest subproblem is equivalent to the NP-complete maximum cut problem on the given network [6,7].
Computational problems on networks are made more feasible when effective size is reduced by coarse-graining, e.g. [8]. Several nodes with similar neighborhoods forming a community [9] or block [10] are lumped together into one representative node. Similar coarse-graining is applied in metapopulation models for epidemic spreading [11,12] and opinion formation [13] where the nodes of the network are geographic locations rather than individuals. Coarse-graining amounts to lossy compression [14], however, so precision of results decreases with the amount of coarse-graining.
Here we advocate the study and development of exact methods [15,16] rather than approximations and heuristics. In parallel to the development of exact methods, their applicability is to be explored by finding the networks and classes of networks that allow for efficient computation.
Tree networks allow for particularly simple recursive methods [17] where the network itself is used as the recursion tree. For non-tree networks, the same recursion equations are applied as a tree-like heuristic [18,19,20,21]. Recent research generalizes message passing to account for cycles to a certain extent. Radicchi and Castellano introduce corrections due to triangles only [22]. Cantwell and Newman consider expansion in cycle length and use Monte-Carlo sampling [23] to deal with combinatorial explosion in the number of terms in the recursion [24].
Rather different generalizations of the tree property, termed tree decompositions and branch decompositions have been known in the area of combinatorial optimization and graph algorithms [25,26,27,28]. A network has tree-width k if it can be split into smaller subnetworks with overlaps of at most k nodes, where this property holds again for the subnetworks; the tree representation of this split recursion is then called a tree decomposition of width k. See Section 2 for the strict definition. Computations with running time O(f (n)) for a naive (e.g. exhaustively enumerative) algorithm on system size n, typically reduce to running time O(nf (k)) when exploiting tree-width k. For the density of states of the Ising model, this is a reduction from O(2 n ) to O(n2 k ), making exact computation feasible as long as there is a tree decomposition of sufficiently low width k.
This suggests a two-step process as a general modus operandi for solving a computational problem on a given network. (i) Find a tree decomposition of possibly low width. (ii) Solve the original problem with a method efficiently exploiting the low tree-width.
Step (i) may fail, either because the given network does not have a tree decomposition of sufficiently low width or because we are unable to find it. Finding tree decompositions of optimally low width is in itself an NP-hard problem [25]. Thus we resort to a heuristic such as a greedy algorithm or stochastic optimization in step (i) [29,27].
Here we demonstrate this two-step process for a collection of small and medium-size networks frequently employed as test structures. In step (i), we find tree decompositions of low width by simulated annealing [30]. Demonstrating step (ii), we use the tree decomposition and find the exact values of the networks' maximum Ising energy.

Tree decompositions
Let G = (V, E) be a graph with node set V and edge set E. Let B a family of non-empty subsets of V . Each such B ∈ B is called a bag, being B ⊆ V . Furthermore, let T be a set of edges (unordered tuples) on B, so that (B, T ) is a tree. Now this tree with node set B and edge set T is called a tree decomposition of G if additionally the following two conditions hold.
(i) For each edge {v, w} ∈ E, there is a bag B ∈ B so that x ∈ B and y ∈ B.
(ii) The set of bags containing v induces a connected subgraph of (B, T ).
In the literature (e.g. [27]), an additional requirement is that each node v ∈ V is contained in at least one bag. This makes a difference only when there are isolated nodes; a non-isolated node must anyway appear in a bag due to condition (i).
Each graph G = (V, E) trivially has a tree decomposition: simply take B = {V }, a single bag containing all nodes, and T = ∅. Tree decompositions of use for recursive computation, however, need to achieve small bag size. The width w ∞ of a tree decomposition (B, T ) is the size of a largest bag, reduced by 1, The tree-width of a graph G is the minimum of width over all tree decompositions of G. If G itself is a tree, G has a tree decomposition with a bag B = {v, w} for each edge {v, w} ∈ E. Thus a tree has tree-width 1.

Elimination order
Finding a tree decomposition of low width for a given graph is a hard problem in itself. Stochastic search methods like annealing and genetic algorithms are useful for this task. The application of these methods is made easier by encoding the search space [32] rather than operating on the tree decompositions themselves. Elimination order is one such encoding of tree decomposition, with the theory well explored [27]. Let us define the operation of elimination Λ. For graph G = (V, E) and node v ∈ V , elimination of v from G generates a graph Λ(G; v) by (i) adding all missing edges between neighbours of v, thus turning the neighbourhood of v into a clique and (ii) removing v and all its incident edges. Now let π = π 1 , π 2 , . . . , π n be an ordering (permutation) of the nodes of the An elimination order π for a graph G = (V, E) gives rise to a tree decomposition (B, T ) of G in the following way. The set of bags is B = {B i : i ∈ {1, . . . n}}. Now for each i, bag B i contains π i and its neighbours in G (i) . Note that π i may have neighbours in G (i) it does not have in G itself. For indices i < j, we have a tree edge {i, j} ∈ T if and only if j = min{k > i : π k ∈ B i }. This means that, for each i = n, bag B i is connected to exactly one bag with higher index, being the bag of the earliest (w.r.t. π) neighbour. Figure 1(b) illustrates elimination and generation of a tree decomposition from elimination order for a subgraph of the Karate club network (Figure 1(a)). (a) Karate club network [31] to illustrate (b) elimination order and the tree decomposition generated, and (c) Initial steps of merging and removal in the computation of maximum cut size. In (b), nodes are eliminated in the order (17,5,6,7,11,1) considering only the subgraph spanned by these six nodes (upper right part of panel (a)). Thick node outline and thick edges indicate part of the network removed in each step. Note that elimination of node 5 generates and extra edge {7, 11} due to clique completion.
Some model networks grow incrementally by attaching a new node to an existing clique of given size m [33,34,35]. In this case considering the nodes in reverse order of addition yields a perfect elimination order whose application does not involve edge addition. In each elimination step, the neighborhood of the eliminated vertex is a clique already by construction. Graphs with a perfect elimination order are chordal, exhibiting further useful features [36].

Simulated annealing
We use a standard setting for simulated annealing [30] as a Metropolis Markov chain [23] with a slow decrease of temperature. We initialize the elimination order π with a permutation drawn from the uniform distribution of permutations on the node set. In order to generate a proposal π from π, select an index i ∈ {1, . . . , n − 1} uniformly at random and swap elements at positions i and i + 1: The proposal is accepted with probability where the cost function r assigns each elimination order a real number, see section .
The inverse temperature β depends on time t as β(t) = ct with cooling speed c as parameter. Time is measured in sweeps, so the clock advances by ∆t = 1/(n − 1) in every step of proposal and acceptance.

Cost functions
A large part of work on tree decompositions focuses on minimization of tree width [27], which translates to stochastic optimization with a cost function w ∞ (π) on elimination order π. We know of the following two problems that motivate the use of cost functions different from w ∞ (π).
(i) As a maximum over all bags, w ∞ (π) only varies at moves that involve the largest bags of the tree decomposition. This renders the cost of a solution equal to that of most of its neighbouring solutions, thus lacking local gradient information in the search space.
(ii) While the time needed for a computational task on a network has an upper bound in terms of tree width, the time actually depends on all bag sizes s 1 , s 2 , . . . s n of the tree decomposition, typically as n i=1 2 s i for spin systems. Clautiaux et al. [37] propose a solution to problem 1 above in terms of a modified cost function The additional sum accounts for the sizes of all bags while the dominating term remains w ∞ . We introduce the cost function with a parameter η > 1. This is motivated by both items 1 and 2 above. For spin systems with computation time proportional to 2 s i for bag i, the parameter value eta = 2 is to be chosen. Note that w η has the tree-width w ∞ as a limit η → ∞ if there is a unique maximum bag size.

Solving maximum cut using an elimination order
Given a network G = (V, E), the maximum cut problem asks to find the largest bipartite subgraph of G. In other words, we are looking for a partition of the node set V into disjoint subsets V 1 and V 2 to maximize the number of edges running between V 1 and V 2 . Assigning set membership in V 1 or V 2 is equivalent to choosing a spin value σ v ∈ {−1/2, +1/2} for each node v ∈ V . Then a solution of the max-cut problem is a spin configuration (σ v ) v∈V that maximizes the number of edges between nodes with unequal spins, This amounts to finding the ground state of the spin glass with Hamiltonian H where each edge is an antiferromagnetic bond of unit weight. Equivalently we look for a maximum energy state of the Ising (all ferromagnetic) model on the same network.
The maximum cut problem is NP-hard [7]. Therefore we do not know a general solver working in time polynomial in the size of the network.
In the following, we describe an algorithm to solve the maximum cut problem using an elimination order π. During each phase of the computation, F contains a set of functions storing partial results on cut sizes. Each function f ∈ F is based on a subset V f of the node set. The domain of the function is the set {−1/2, +1/2} V f of all spin configurations on V f . Each such spin configuration is assigned the maximum cut size possible.
Initially, F contains one function f for each edge {v, w} ∈ V of the network. There are 2 × 2 = 4 spin configurations on the nodes v and w. If σ v = σ w , then f (σ v , σ w ) = 1; otherwise f (σ v , σ w ) = 0. Now the computation proceeds in a loop, running loop index i from 1 to n = |V | and using the given elimination order π = (pi 1 , pi 2 , . . . , π n ). In each round of the loop (i) Find all functions f ∈ F with π i ∈ V f and replace them by the merged function f .
See below for details of merging.
(ii) Remove node π i from the function f generated at step 1. See below for details of node removal.
Note that merging in step 1 and removal in step 2 do not commute. Before removing a node r, all functions containing r need to be merged into a single one.
Merging functions. Two functions f, g ∈ F are merged into function h having the joint This describes the merging of two functions. For merging more than two functions, as happens in general at step 1 above, merging is applied several times. The operation is commutative, i.e. the result does not depend on the order of merging.  Removing nodes from functions. For a function f ∈ F and node r ∈ V f , the removal of r from f generates a function f ↓r with V f ↓r = V f \ {r}. For each spin configuration σ = (σ v ) v∈V f ↓r , we have Independence from the variable σ r is thus obtained by taking the maximum of the edge count with respect to the two spin orientations at node r. Figure 1(c) illustrates the steps of merging and removal.

Network data
We use the set of networks compiled by Radicchi and Castellano [22] restricting it to those 15 with less than 1000 edges. See Table 1.

Results
As a first step in exploring elimination orders by simulated annealing, we compare the effects of the choice of cost function, see also Section 2.4. Figure 2 is a scatter plot of Table 1.
Networks considered and results obtained. Next to network name and reference, columns w 2 and w ∞ give the lowest values found by . simulated annealing. For each network, minima of w 2 and w ∞ are taken from 10 6 sweeps of 80 independent runs, 10 runs for each of cooling speed value c ∈ {10 −5 , 3 × 10 −4 , 10 −4 , . . . , 3 × 10 −2 }. The rightmost column is the exact number b of edges in a maximum cut (edge-maximal bipartite subgraph) on the network. Networks have |V | nodes and |E| edges.

Network
ref. minimal w 2 values obtained in simulated annealing runs. Better performance is obtained with optimization under cost function w 2 as compared to cost function m. The following optimization runs on all networks in the data set are thus performed with cost function w 2 only. Table 1 is the overview of results for all 15 networks considered. For 13 out of these, we find tree decompositions with w 2 < 10 and w ∞ ≤ 13. For the networks College football and David Copperfield, the best tree decompositions found are wider.
The tree decompositions obtained by optimization enable us to calculate the exact maximum cut size for 14 out of the 15 networks. For the network College football with the tree decomposition of width 34, the computation runs out of memory on the machine currently used. The fraction of edges b/|E| contained in a maximum cut is between 0.75 and 0.80 for most of the networks. Exceptions are the electronic circuits S 208, S 420, and S 838 with b/|E| ≈ 0.888, confirming these networks to be close to bipartite [40].

Discussion
Originating in computer science and discrete mathematics, the concept of tree decompositions helps to exploit sparseness and tree-like structure in the analysis of network systems. The purpose of this contribution is an exposition of tree decompositions from the perspective of complex systems in physics. There are three results. (i) A simple method for finding tree decompositions by simulated annealing, tested on real networks; (ii) the resulting tree decompositions, encoded as node elimination orders, see the supplement [46]; and (iii) an example of an exact computation (maximum cut) made feasible by the use of tree decompositions.
Resulting from simulated annealing, the best tree decompositions we found are not necessarily optimal ones yet. In fact, the annealing method may be improved to become more suitable for the particular search space, e.g. with an adaptive cooling schedule [47]. Sometimes pictured as simulated annealing with ongoing restart, parallel tempering [48] may lead to better tree decompositions. Another ingredient for the method is to also compute a lower bound on tree-width [28]. Then the search for a good tree decomposition can be stopped once we are sufficiently close to the lower bound.
Tree decompositions enable us to efficiently analyze results for networks with respect to processes running on them. The method for the maximum-cut size and maximum Ising energy presented here is the basis for obtaining the exact partition function and complete density of states for Ising and Potts models [49]. For models of epidemic spreading and percolation [50], tree decompositions support the computation of cluster size distributions and epidemic thresholds as well [51,52].