Density decompositions of networks

We introduce a new topological descriptor of a network called the density decomposition, which is a partition of the nodes of a network into regions of uniform density. The decomposition we define is unique in the sense that a given network has exactly one density decomposition. The number of nodes in each part defines a density distribution, which we find to be measurably similar to the degree distribution of real networks (social, internet, etc.) and measurably dissimilar to it in synthetic networks (preferential attachment, small world, etc.). We also show how to build networks having given density distributions, which gives us further insight into the structure of real networks.


Introduction
A better understanding of the topological properties of real networks can be advantageous for two major reasons. First, knowing that a network has certain properties, e.g., bounded degree or planarity, can sometimes allow for the design of more efficient algorithms for extracting information about the network or of more efficient distributed protocols to run on the network. Second, it can lead to methods for synthesizing artificial networks that more closely match the properties of real networks, thus allowing for more accurate predictions of the future growth of the network and more accurate simulations of distributed protocols running on such a network.
We show that networks decompose naturally into regions of uniform density, a density decomposition. The decomposition we define is unique in the sense that a given network has exactly one density decomposition. The number of nodes in each region defines a distribution of the nodes according to the density of the region to which they belong, that is, a density distribution (Section 2). Although density is closely related to degree, we find that the density distribution of a particular network is not necessarily similar to the degree distribution of that network. For example, in many synthetic networks, such as those generated by popular network models (e.g. preferential attachment and small worlds), the density distribution is very different from the degree distribution (Section 3.1). On the other hand, for all of the real networks (social, internet, etc.) in our data set, the density and degree distributions are measurably similar (Section 3). Similar conclusions can be drawn using the notion of k-cores [30], but this suffers from some drawbacks which we discuss in Section 2.3.

Related work
We obtain the density decomposition of a given undirected network by first orienting the edges of this network in an egalitarian manner. Then we partition the nodes based on their indegree and connectivity in this orientation.
Fair orientations have been studied extensively and are motivated by a variety of problems. One such problem arises in telecommunications networks: source-sink pairs $(s_i, t_i)$ are linked by a directed $s_i$-to-$t_i$ path $c_i$ (called a circuit). When an edge of the network fails, all circuits using that edge fail and must be rerouted. For each failed circuit, the responsibility for finding an alternate path is assigned to either the source or the sink of that circuit. To limit the rerouting load of any vertex, it is desirable to minimize the maximum number of circuits for which any vertex is responsible. Venkateswaran models this problem with an undirected graph whose vertices are the sources and sinks and whose edges are the circuits. He assigns the responsibility for a circuit's potential failure by orienting the corresponding edge toward either its source or its sink. Minimizing the maximum number of circuits for which any vertex is responsible is thus equivalent to finding an orientation that minimizes the maximum indegree of any vertex. Venkateswaran shows how to find such an orientation [32]. Asahiro, Miyano, Ono, and Zenmyo consider the edge-weighted version of this problem [3]. They give a combinatorial $\min\{w_{\max}/w_{\min},\, 2-\epsilon\}$-approximation algorithm, where $w_{\max}$ and $w_{\min}$ are the maximum and minimum edge weights, respectively, and $\epsilon$ is a constant that depends on the input [3]. Klostermeyer considers the problem of reorienting edges (rather than whole paths) so as to create graphs with given properties, such as strongly connected graphs and acyclic graphs [19]. De Fraysseix and de Mendez show that they can find an indegree assignment of the vertices with particular properties [13].
Biedl, Chan, Ganjali, Hajiaghayi, and Wood give a $13/8$-approximation algorithm for finding an ordering of the vertices such that, for each vertex v, the neighbors of v are as evenly distributed to the right and left of v as possible [8]. For the purpose of deadlock prevention [34], Wittorff describes a heuristic for finding an acyclic orientation that minimizes $\sum_v \binom{\delta(v)}{2}$, where δ(v) is the indegree of vertex v; this objective function is motivated by the problem of resolving deadlocks in communication networks [35].
In our work we show that the density decomposition can isolate the densest subgraph. The densest subgraph problem has been studied extensively. Goldberg gives a polynomial-time algorithm for finding the densest subgraph using network flow techniques [16], and there is a linear-time 2-approximation algorithm [10]. As a consequence of our decomposition, we find a subgraph whose density is at least the density of the densest subgraph minus one. There are also algorithms for finding dense subgraphs in the streaming model [4,15] and for finding all densest subgraphs of a graph (there may be many such subgraphs) [29].
We consider many varied real networks in our study of the density decomposition. We find our results to be consistent across biological, technical, and social networks.

The density decomposition
In order to obtain the density decomposition of a given undirected network we first orient the edges of this network in an egalitarian manner. Then we partition the nodes based on their indegree and connectivity in this orientation.
The following procedure, the Path-Reversal algorithm, finds an egalitarian orientation [9]. Let δ(x) denote the indegree of a node x. A reversible path is a directed path from a node u to a node v whose endpoint indegrees differ by at least two: δ(v) > δ(u) + 1. Reversing such a path increases δ(u) by one, decreases δ(v) by one, and leaves all other indegrees unchanged. The procedure is: arbitrarily orient the edges of the network; then, while there is a reversible path, reverse it.
Each reversal moves one unit of indegree from a node to a node whose indegree is smaller by at least two, so it strictly decreases the sum of the squared indegrees. Hence this procedure terminates. The running time of this algorithm is quadratic in the number of edges [9]. The orientation resulting from this procedure suggests a hierarchical decomposition of its nodes. Let k be the maximum indegree in an egalitarian orientation.
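The following is a minimal Python sketch of Path-Reversal. It assumes nodes are labelled 0..n−1 and that edges are given as pairs whose listed order serves as the arbitrary initial orientation. A reversible path is a directed path whose last node has indegree at least two more than its first node. Searching for one by breadth-first search is our own choice; [9] may search differently, and this naive version is slower than the quadratic bound cited above. The names `reversible_path`, `path_reversal`, and `indegrees` are hypothetical.

```python
from collections import deque

def reversible_path(n, arcs):
    """Return arc indices of a directed path u -> ... -> v with
    indeg(v) > indeg(u) + 1, or None if no such path exists."""
    indeg = [0] * n
    out = [[] for _ in range(n)]          # out[t]: indices of arcs leaving t
    for j, (t, h) in enumerate(arcs):
        indeg[h] += 1
        out[t].append(j)
    for u in range(n):
        parent = {u: None}                # node -> index of the arc used to reach it
        queue = deque([u])
        while queue:                      # breadth-first search from u
            x = queue.popleft()
            if indeg[x] > indeg[u] + 1:
                path = []
                while parent[x] is not None:  # walk back from x to u
                    path.append(parent[x])
                    x = arcs[parent[x]][0]
                return path
            for j in out[x]:
                y = arcs[j][1]
                if y not in parent:
                    parent[y] = j
                    queue.append(y)
    return None

def path_reversal(n, edges):
    """Orient each edge (a, b) initially as a -> b, then reverse reversible paths."""
    arcs = [list(e) for e in edges]
    while (path := reversible_path(n, arcs)) is not None:
        for j in path:
            arcs[j].reverse()             # swap tail and head
    return [tuple(a) for a in arcs]

def indegrees(n, arcs):
    indeg = [0] * n
    for _, h in arcs:
        indeg[h] += 1
    return indeg

# Star K_{1,4} with every edge initially pointing at the center (node 0).
star = [(leaf, 0) for leaf in range(1, 5)]
print(indegrees(5, path_reversal(5, star)))   # -> [1, 1, 1, 1, 0]
```

On the star, the center ends with indegree 1 and one leaf keeps indegree 0. The orientation in which the center has indegree 0 has the same multiset of indegrees.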
Ring k ($R_k$) contains all nodes of indegree k and all nodes that reach them. Iteratively, given $R_k, R_{k-1}, \ldots, R_{i+1}$, ring $R_i$ contains all the remaining nodes of indegree i along with all the remaining nodes that reach them.
A node that reaches a node of indegree k has indegree at least k − 1, since otherwise the connecting path would be reversible. Hence, by the termination condition of the above procedure, only nodes of indegree k or k − 1 are in $R_k$; similarly, nodes in $R_i$ have indegree i or i − 1. By this definition, an edge between a node in $R_i$ and a node in $R_j$ with i > j is directed from $R_i$ to $R_j$, and all isolated nodes are in $R_0$. The time needed to compute this decomposition is dominated by the time to find an egalitarian orientation, $O(|E|^2)$.
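Given an egalitarian orientation, the rings can be peeled off from the top by a backward search from the nodes of each indegree. The sketch below assumes the orientation is supplied as (tail, head) pairs on nodes 0..n−1. The example orientation was checked by hand to be egalitarian: its indegrees are 2, 2, 1, 1, 1, 0, and the only node of indegree 0 is isolated, so no reversible path exists. The function name `density_ranks` is hypothetical.

```python
from collections import deque

def density_ranks(n, arcs):
    """Ring index of every node, given an egalitarian orientation `arcs`
    of (tail, head) pairs on nodes 0..n-1."""
    indeg = [0] * n
    into = [[] for _ in range(n)]        # into[h]: tails of arcs entering h
    for t, h in arcs:
        indeg[h] += 1
        into[h].append(t)
    rank = [None] * n
    for i in range(max(indeg, default=0), -1, -1):
        # Seeds of R_i: remaining nodes of indegree i.
        seeds = [v for v in range(n) if rank[v] is None and indeg[v] == i]
        for v in seeds:
            rank[v] = i
        queue = deque(seeds)
        while queue:                     # backward search: remaining nodes reaching a seed
            x = queue.popleft()
            for t in into[x]:
                if rank[t] is None:
                    rank[t] = i
                    queue.append(t)
    return rank

# K4 on nodes 0-3 (indegrees 2, 2, 1, 1), a pendant arc 3 -> 4, isolated node 5.
arcs = [(1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (0, 3), (3, 4)]
ranks = density_ranks(6, arcs)
print(ranks)                             # -> [2, 2, 2, 2, 1, 0]
# Every arc points from a ring to the same or a lower ring.
assert all(ranks[t] >= ranks[h] for t, h in arcs)
```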
Density can be defined in two ways: either as the ratio of the number of edges to the number of nodes, $|E|/|V|$, or as the ratio of the number of edges to the total number of possible edges, $2|E|/(|V|(|V|-1))$. In this discussion we use the former definition. This definition of density is closely related to node degree (the number of edges incident to a given node): the density of a network is equal to half its average degree.
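As a quick arithmetic check of the two definitions on the clique $K_4$ (4 nodes, 6 edges, every degree 3), here is a small sketch. The function names are hypothetical.

```python
def density(num_nodes, num_edges):
    """|E| / |V|, the definition used in this paper."""
    return num_edges / num_nodes

def edge_fraction(num_nodes, num_edges):
    """2|E| / (|V|(|V| - 1)): the fraction of possible edges present."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

degrees = [3, 3, 3, 3]                   # K4
print(density(4, 6))                     # 1.5
print(edge_fraction(4, 6))               # 1.0
print(sum(degrees) / len(degrees) / 2)   # 1.5, half the average degree
```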
To identify a set S of nodes in a graph, we merge all the nodes in S into a single node s and remove any self-loops (corresponding to edges of the graph with both endpoints in S). Our partition $R_k, R_{k-1}, \ldots, R_0$ induces regions of uniform density in the following sense:
Density Property. For any $i = 0, \ldots, k$, identifying the nodes in $\bigcup_{j>i} R_j$ and deleting the nodes in $\bigcup_{j<i} R_j$ leaves a network $G_i$ whose density is in the range $(i-1, i]$ (for $|R_i|$ sufficiently large).
In particular, $R_k$ isolates a densest region in the network. Consider the network $G_i$ formed by identifying the nodes in $\bigcup_{j>i} R_j$ and deleting the nodes in $\bigcup_{j<i} R_j$. Since every edge between two rings is directed from the higher ring to the lower one, this network has one node (resulting from the identification) of indegree 0 and $|R_i|$ nodes of indegree i or i − 1, at least one of which has indegree i. The number of edges of $G_i$ equals the sum of its indegrees, and $G_i$ has $|R_i| + 1$ nodes. Therefore, for any i, the density of $G_i$ is at most i and at least $\frac{(i-1)|R_i| + 1}{|R_i| + 1}$. In Section 2.1, we observe that the relationship between density and this decomposition is much stronger.
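The construction of $G_i$ can be made concrete with a small sketch. It assumes the ring index of every node is already known; here we use the ranks of a small example graph (a $K_4$ on nodes 0 to 3, a pendant edge 3–4, and an isolated node 5), which were checked by hand from an egalitarian orientation. Parallel edges created by the identification are kept and only self-loops are dropped, as in the definition above. The function name is hypothetical.

```python
def identified_density(edges, rank, i):
    """Density |E|/|V| of G_i: nodes of rank > i are identified into one
    node, nodes of rank < i are deleted, and self-loops are dropped."""
    ring = [v for v, r in enumerate(rank) if r == i]
    merged = any(r > i for r in rank)
    num_nodes = len(ring) + (1 if merged else 0)
    num_edges = 0
    for a, b in edges:
        if rank[a] < i or rank[b] < i:
            continue          # an endpoint was deleted
        if rank[a] > i and rank[b] > i:
            continue          # both endpoints identified: the edge becomes a self-loop
        num_edges += 1        # parallel edges to the identified node are kept
    return num_edges / num_nodes

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
rank = [2, 2, 2, 2, 1, 0]
for i in (2, 1, 0):
    print(i, identified_density(edges, rank, i))   # 1.5, 0.5, 0.0
```

Each value falls in the range (i − 1, i] stated by the Density Property.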

Density and the Density Decomposition
In this section we discuss the following three properties:
Property D1. The density of a densest subnetwork is at most k. That is, there is no denser region $R_j$ for j > k.
Property D2. The density decomposition of a network is unique and does not depend on the starting orientation.
Property D3. Every densest subnetwork contains only nodes of $R_k$.
These properties allow us to unequivocally describe the density structure of a network. We summarize the density decomposition by the density distribution: i.e. the number of nodes in each region of uniform density. We will refer to a node in R i as having density rank i.
The subnetwork of a network G induced by a subset S of the nodes of G consists of the nodes in S and the edges of G whose endpoints are both in S; we denote it by G[S]. First, we note that both the densest subnetwork and the subnetwork induced by the nodes of highest rank have density between k − 1 and k. Recall that k is the maximum indegree of a node in an egalitarian orientation of G and that $R_i$ is the set of nodes in the i-th ring of the density decomposition. We refer to $R_k$ as the densest ring.
We use the following two lemmas to prove Property D1.

Lemma 1. The density of the subnetwork induced by the nodes in
We could prove Lemma 1 directly with a simple counting argument on the indegrees of the nodes in $R_k$, or by using a network flow construction similar to Goldberg's together with the max-flow min-cut theorem [16]. The upper bound given in Lemma 2 may be proven directly by a counting argument on the indegrees of vertices in an egalitarian orientation of the densest subnetwork, or by using the relationship between the density of the maximum-density subgraph and the pseudoarboricity [20].
This upper bound proves Property D1 of the density decomposition. Property D1 has also been proven in another context: it follows from a theorem of Frank and Gyárfás [12] that the density d of a network is at most the maximum outdegree of an orientation that minimizes the maximum outdegree.
Corollary 1. The subgraph induced by the nodes of $R_k$ has density at least the density of the densest subgraph minus one.
Note that the partition into rings does not depend on the initial orientation; more strongly, nodes are uniquely partitioned into rings, giving Property D2.

Theorem 1. The density decomposition is unique.
We can prove this by noting that the maximum indegree of two egalitarian orientations for a given network is the same [9,3,32]. For a contradiction, we consider two different egalitarian orientations of the same graph that yield two distinct density decompositions. We then compare corresponding rings in each orientation and find that they are in fact the same.
The following theorem relies on the fact that the density decomposition is unique and proves Property D3.
Theorem 2. The densest subnetwork of a network G is induced by a subset of the nodes in the densest ring of G.
We could prove Theorem 2 directly by comparing the density of the densest subgraph with the density of the subgraph induced by the intersection of its vertex set with $R_k$. Alternatively, we could use integer parameterized max-flow techniques [3,14].
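Properties D1 to D3 can be spot-checked by brute force on small graphs. The following Python sketch is an illustration under stated assumptions, not a proof. It runs Path-Reversal from several random initial orientations and checks that the rings agree (Theorem 1, Property D2); the breadth-first search for reversible paths is our own choice. It then enumerates every node subset to find the densest subnetworks exactly, and checks two things: that their density lies in (k − 1, k] (Property D1 together with the lower bound), and that each densest subnetwork lies inside $R_k$ (Property D3). All function names and the random test graphs are hypothetical. The enumeration is exponential, so only tiny graphs are feasible.

```python
import random
from collections import deque
from fractions import Fraction
from itertools import combinations

def reversible_path(n, arcs):
    """Arc indices of a directed path u -> ... -> v with indeg(v) > indeg(u) + 1, or None."""
    indeg, out = [0] * n, [[] for _ in range(n)]
    for j, (t, h) in enumerate(arcs):
        indeg[h] += 1
        out[t].append(j)
    for u in range(n):
        parent, queue = {u: None}, deque([u])
        while queue:
            x = queue.popleft()
            if indeg[x] > indeg[u] + 1:
                path = []
                while parent[x] is not None:
                    path.append(parent[x])
                    x = arcs[parent[x]][0]
                return path
            for j in out[x]:
                if arcs[j][1] not in parent:
                    parent[arcs[j][1]] = j
                    queue.append(arcs[j][1])
    return None

def path_reversal(n, arcs):
    arcs = [list(a) for a in arcs]
    while (path := reversible_path(n, arcs)) is not None:
        for j in path:
            arcs[j].reverse()
    return [tuple(a) for a in arcs]

def density_ranks(n, arcs):
    indeg, into = [0] * n, [[] for _ in range(n)]
    for t, h in arcs:
        indeg[h] += 1
        into[h].append(t)
    rank = [None] * n
    for i in range(max(indeg, default=0), -1, -1):
        queue = deque(v for v in range(n) if rank[v] is None and indeg[v] == i)
        for v in queue:
            rank[v] = i
        while queue:
            x = queue.popleft()
            for t in into[x]:
                if rank[t] is None:
                    rank[t] = i
                    queue.append(t)
    return rank

def check_properties(n, edges, rng, trials=5):
    """True if D1, D2 and D3 hold for this graph (they should always hold)."""
    ranks = None
    for _ in range(trials):
        # D2: start from a random initial orientation each time.
        start = [(a, b) if rng.random() < 0.5 else (b, a) for a, b in edges]
        r = density_ranks(n, path_reversal(n, start))
        if ranks is not None and r != ranks:
            return False
        ranks = r
    k = max(ranks)
    # Exact density of every nonempty node subset (exponential: small n only).
    dens = {S: Fraction(sum(a in S and b in S for a, b in edges), len(S))
            for size in range(1, n + 1)
            for S in map(frozenset, combinations(range(n), size))}
    best = max(dens.values())
    if not k - 1 < best <= k:            # D1, plus the lower bound k - 1
        return False
    top = {v for v in range(n) if ranks[v] == k}
    return all(S <= top for S, d in dens.items() if d == best)   # D3

rng = random.Random(0)
for _ in range(20):
    edges = [e for e in combinations(range(7), 2) if rng.random() < 0.45]
    assert check_properties(7, edges, rng)
```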
Note that there are indeed cases where the densest subgraph is induced by a strict subset of the nodes in the top ring. For example, consider the graph G consisting of $K_3$ and $K_4$ with a single edge connecting the two cliques. $K_4$, with density 6/4 = 3/2, is the densest subgraph of G (G itself has density 10/7); however, all of G is contained in the top ring ($R_2$).
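The $K_3$ plus $K_4$ example can be checked with a hand-built orientation. In the orientation below, the triangle is a directed cycle, the $K_4$ has indegrees 2, 2, 1, 1, and the bridge points into the triangle. Every indegree is 1 or 2, so no path can be reversible and the orientation is egalitarian. The ring computation is a sketch; the helper name `density_ranks` is hypothetical.

```python
from collections import deque

def density_ranks(n, arcs):
    """Ring index of every node for an egalitarian orientation given as (tail, head) pairs."""
    indeg, into = [0] * n, [[] for _ in range(n)]
    for t, h in arcs:
        indeg[h] += 1
        into[h].append(t)
    rank = [None] * n
    for i in range(max(indeg, default=0), -1, -1):
        seeds = [v for v in range(n) if rank[v] is None and indeg[v] == i]
        for v in seeds:
            rank[v] = i
        queue = deque(seeds)
        while queue:                       # remaining nodes that reach a seed join R_i
            x = queue.popleft()
            for t in into[x]:
                if rank[t] is None:
                    rank[t] = i
                    queue.append(t)
    return rank

# K3 on nodes 0-2, K4 on nodes 3-6, bridge oriented 3 -> 2.
arcs = [(0, 1), (1, 2), (2, 0),                          # K3 as a directed cycle
        (4, 3), (5, 3), (5, 4), (6, 4), (6, 5), (3, 6),  # K4 with indegrees 2, 2, 1, 1
        (3, 2)]                                          # the bridge
indeg = [sum(h == v for _, h in arcs) for v in range(7)]
print(indeg)                      # -> [1, 1, 2, 2, 2, 1, 1]
print(density_ranks(7, arcs))     # -> [2, 2, 2, 2, 2, 2, 2]
print(6 / 4, 10 / 7)              # density of K4 versus density of G
```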

Interpretation of density rank
We can interpret orientations as assigning responsibility: if an edge is oriented from node a to node b, we can view node b as being responsible for that connection. Indeed, several allocation problems are modelled this way [9,2,32,3,17]. Put another way, we can view each node as wishing to shirk as many of its duties (modelled by incident edges) as possible by assigning them to its neighbors (by orienting the linking edges away from itself). However, the topology of the network may prevent a node from shirking too many of its duties. The egalitarian orientation is the assignment in which every node simultaneously shirks as many duties as the topology of the network allows. For example, consider two situations in which a node has degree 7. In the first, node a is the center of a star network with 8 nodes; in the second, node b belongs to a clique on 8 nodes. Although a and b both have degree 7, in the star network a can shirk all of its duties, whereas in the clique network b can shirk only about half of them. This difference is captured by the density ranks of a and b (1 and 4, respectively) but not by their degrees. For example, if these were co-authorship networks, the star network might represent a network in which author a co-authors papers only with authors who never work with anyone else, whereas the clique network shows that author b co-authors with authors who also collaborate with others. One may surmise that the work of author b is more reliable or respected than that of author a.
Theorem 3. For a clique on n nodes, there is an orientation in which each node has indegree either $\lceil n/2 \rceil$ or $\lceil n/2 \rceil - 1$.
A proof for Theorem 3 can be given by construction of such an egalitarian orientation or by using a non-linear programming approach [26].
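One possible construction for Theorem 3 is a circulant (rotational) orientation, sketched below. This is our own choice of construction and may differ from the one intended in the text or in [26]. Each node i points to the next ⌊(n − 1)/2⌋ nodes modulo n. When n is even, the remaining antipodal pairs {i, i + n/2} are oriented from the lower index to the higher one. Every node then has indegree ⌈n/2⌉ or ⌈n/2⌉ − 1, which is n/2 or n/2 − 1 when n is even. The function names are hypothetical.

```python
def clique_orientation(n):
    """Arcs (tail, head) orienting K_n so every indegree is ceil(n/2) or ceil(n/2) - 1."""
    arcs = []
    half = (n - 1) // 2
    for i in range(n):
        for d in range(1, half + 1):      # orient i -> i + d (mod n)
            arcs.append((i, (i + d) % n))
    if n % 2 == 0:
        for i in range(n // 2):           # antipodal pairs, oriented i -> i + n/2
            arcs.append((i, i + n // 2))
    return arcs

def indegrees(n, arcs):
    indeg = [0] * n
    for _, h in arcs:
        indeg[h] += 1
    return indeg

# Clique on 8 nodes: every node has degree 7 and keeps only 3 or 4 duties.
print(indegrees(8, clique_orientation(8)))   # -> [3, 3, 3, 3, 4, 4, 4, 4]
```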

Relationship to k-cores
A k-core of a network is the maximal subnetwork in which every node has degree at least k within the subnetwork [30]. A k-core is found by repeatedly deleting nodes of degree less than k while possible. For increasing values of k, the k-cores form a nested hierarchy (akin to our density decomposition) of subnetworks $H_0 \supseteq H_1 \supseteq \cdots \supseteq H_p$, where $H_i$ is the i-core and p is the smallest integer such that G has an empty (p + 1)-core. For networks generated by the $G_{n,p}$ model, most nodes are in the p-core [23,27]. For the preferential attachment model, all nodes except the initial nodes belong to the c-core, where c is the number of edges connecting each new node to the network [1].
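The repeated-deletion procedure can be sketched as the usual peeling that records each node's core number, which is the largest k for which the node belongs to the k-core. This quadratic version is written for clarity; bucket-based implementations run in linear time. The function names are hypothetical. On the $K_3$ plus $K_4$ graph, the top core is the $K_4$.

```python
def core_numbers(n, edges):
    """Core number of each node: the largest k such that the node lies in the k-core."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    deg = [len(s) for s in adj]
    removed = [False] * n
    core = [0] * n
    k = 0
    for _ in range(n):
        # Delete a remaining node of minimum remaining degree.
        v = min((u for u in range(n) if not removed[u]), key=deg.__getitem__)
        k = max(k, deg[v])          # the peeling threshold never decreases
        core[v] = k
        removed[v] = True
        for w in adj[v]:
            if not removed[w]:
                deg[w] -= 1
    return core

def k_core(n, edges, k):
    """Node set of the k-core (possibly empty)."""
    return {v for v, c in enumerate(core_numbers(n, edges)) if c >= k}

# K3 on nodes 0-2, K4 on nodes 3-6, joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3),
         (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)]
print(core_numbers(7, edges))   # -> [2, 2, 2, 3, 3, 3, 3]
print(k_core(7, edges, 3))      # -> {3, 4, 5, 6}
```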
These observations are similar to those we find for the density distribution (Section 3) and many of the observations we make regarding the similarity of the degree and density distributions of real networks also hold for k-core decompositions [24]. However, the local definition of cores (depending only on the degree of a node) provides a much looser connection to density than the density decomposition, as we make formal in Lemma 3.
The density of the top core may be less than the density of the top ring. Also, there are graphs for which the densest subgraph is not contained in the top core.

Figure 1. … 421 nodes and 175,692 edges). In the DBLP network, nodes represent computer scientists, and two computer scientists are connected if they have co-authored at least one paper [37] (317,080 nodes and 1,049,866 edges). The (truncated) normalized density and degree distributions are displayed. The degree distributions have long diminishing tails. AS 2013 has 67 non-empty rings, but rings 31 through 66 contain less than 1.5% of the nodes; ring 67 contains 0.75% of the nodes. DBLP has 4 non-empty rings denser than ring 30 that are disconnected; rings 32, 40, 52 and 58 contain 0.02%, 0.01%, 0.03% and 0.04% of the nodes, respectively.

The similarity of degree and density distributions
The normalized density distributions ρ and degree distributions δ of three networks (AS 2013, PHYS 2005, and DBLP) are given in Figure 1, illustrating the similarity of the distributions. We quantify the similarity between the density and degree distributions of these networks using the Bhattacharyya coefficient β [7]. For two normalized distributions p and q, the Bhattacharyya coefficient is
$$\beta(p, q) = \sum_i \sqrt{p_i\, q_i}.$$
We have β(p, q) ∈ [0, 1] for normalized, nonnegative distributions; β(p, q) = 0 if and only if p and q have disjoint supports; and β(p, q) = 1 if and only if p = q. We denote the Bhattacharyya coefficient comparing the normalized density and degree distributions of a network G by $\beta_{\rho\delta}(G)$. Specifically,
$$\beta_{\rho\delta}(G) = \sum_i \sqrt{\rho_i\, \delta_i},$$
where $\rho_i$ is the fraction of nodes in the i-th ring of the density decomposition of G and $\delta_i$ is the fraction of nodes of total degree i in G; we take $\rho_i = 0$ for i > k, where k is the maximum ring index. Refer to Figure 2. For all the networks in our data set, $\beta_{\rho\delta} > 0.78$; if we exclude the Gnutella and Amazon networks, $\beta_{\rho\delta} > 0.9$. We point out that the other networks are self-determining, in that each relationship is determined by at least one of the parties involved. On the other hand, the Gnutella network is highly structured and designed, and the Amazon network is a one-mode projection of the buyer-product network (which is in turn self-determining).
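Computing $\beta_{\rho\delta}$ from per-node data takes only a few lines. The sketch assumes each node's density rank and total degree are already available as lists. Indices missing from either histogram count as probability 0, which matches the convention $\rho_i = 0$ for i > k. The helper names are hypothetical. In the example, every node of the star $K_{1,4}$ has rank 1, while four nodes have degree 1 and one has degree 4.

```python
import math
from collections import Counter

def normalized_histogram(values):
    """Map each value to the fraction of entries equal to it."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}

def bhattacharyya(p, q):
    """beta(p, q) = sum_i sqrt(p_i * q_i); missing indices count as 0."""
    return sum(math.sqrt(p.get(i, 0.0) * q.get(i, 0.0)) for i in set(p) | set(q))

def beta_rho_delta(ranks, degrees):
    return bhattacharyya(normalized_histogram(ranks), normalized_histogram(degrees))

# Star K_{1,4}: all ranks are 1; degrees are 4 (center) and 1 (leaves).
print(beta_rho_delta([1] * 5, [4, 1, 1, 1, 1]))   # sqrt(0.8), about 0.894
```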
Perhaps this is not surprising, given the close relationship between density and degree; one may posit that the density distribution ρ simply bins the degree distribution δ. However, note that a node's degree is its total degree in the undirected graph, whereas a node's rank is within one of its indegree in an egalitarian orientation. Since the total indegree to be shared amongst all the nodes is half the total degree of the network, we might assume that, if the density distribution is a binning of the degree distribution, the density rank of a node of degree d would be roughly d/2. That is, we may expect the density distribution to be halved in range and doubled in magnitude ($\rho_i \approx 2\delta_{2i}$). If this is the case, then
$$\beta_{\rho\delta} \approx \sum_i \sqrt{2\,\delta_{2i}\,\delta_i}.$$
If we additionally assume that our network has a power-law degree distribution such as $\delta_x \propto 1/x^3$, then $\beta_{\rho\delta} \approx 1/2$ (after normalizing the distributions and using a continuous approximation of β). Even with these idealized assumptions, this does not come close to explaining $\beta_{\rho\delta}$ being in excess of 0.78 for the networks in our data set. Further, for many synthetic networks $\beta_{\rho\delta}$ is small, as we discuss in the next section. We note that this separation between the similarities of density and degree distributions for the empirical networks and for the synthetic networks can be illustrated with almost any divergence or similarity measure for a pair of distributions.
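Under the idealized assumptions $\rho_i \approx 2\delta_{2i}$ and $\delta_x \propto 1/x^3$, a continuous approximation gives $\beta_{\rho\delta} \approx 1/2$. As a sketch, introduce a hypothetical minimum degree a > 0 purely for normalization. Both densities vanish outside the stated ranges, so they overlap only for y ≥ a, and the result does not depend on a:

```latex
\begin{aligned}
\delta(x) &= \frac{2a^{2}}{x^{3}} \quad (x \ge a), &\qquad \int_a^\infty \delta(x)\,dx &= 1,\\
\rho(y) &= 2\,\delta(2y) = \frac{a^{2}}{2y^{3}} \quad (y \ge a/2), &\qquad \int_{a/2}^\infty \rho(y)\,dy &= 1,\\
\beta_{\rho\delta} &\approx \int_a^\infty \sqrt{\rho(y)\,\delta(y)}\,dy = \int_a^\infty \frac{a^{2}}{y^{3}}\,dy = \frac{1}{2}.
\end{aligned}
```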

The dissimilarity of degree and density distributions of random networks
In contrast to the measurably similar degree and density distributions of real networks, the degree and density distributions are measurably dissimilar for networks produced by many common random network models, including the preferential attachment (PA) model of Barabási and Albert [6] and the small-world (SW) model of Watts and Strogatz [33]. We use $\bar{\beta}_{\rho\delta}(M)$ to denote the Bhattacharyya coefficient comparing the expected degree and density distributions of a network generated by a model M.
Preferential attachment networks. In the PA model, a small number $n_0$ of nodes seed the network, and nodes are added iteratively, each attaching to a fixed number c of existing nodes. Consider the orientation in which each added edge is directed toward the newly added node; in the resulting orientation, all but the $n_0$ seed nodes have indegree c, and the maximum indegree is c. At most $cn_0$ path reversals will make this orientation egalitarian, and, since $cn_0$ is typically very small compared to n (the total number of nodes), most of the nodes will remain in the densest ring $R_c$. Therefore, PA networks have nearly trivial density distributions: $\rho_c \approx 1$. On the other hand, the expected fraction of nodes of degree c is 2/(c + 2) [5]. Therefore, $\bar{\beta}_{\rho\delta}(\mathrm{PA}) \approx \sqrt{\rho_c\,\delta_c} \approx \sqrt{2/(c+2)}$.
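The PA estimate can be evaluated numerically. The sketch assumes the standard large-n degree distribution of the Barabási–Albert model, 2c(c+1)/(k(k+1)(k+2)) for degrees k ≥ c, which equals 2/(c + 2) at k = c. It also assumes the density distribution is entirely concentrated on rank c ($\rho_c = 1$). Under that assumption β reduces to $\sqrt{\delta_c}$. The function names are hypothetical.

```python
import math

def pa_degree_fraction(k, c):
    """Large-n fraction of nodes with degree k >= c in the PA model,
    using the standard formula 2c(c+1) / (k(k+1)(k+2))."""
    return 2 * c * (c + 1) / (k * (k + 1) * (k + 2))

def beta_pa(c):
    """Bhattacharyya coefficient when the density distribution is rho_c = 1."""
    return math.sqrt(1.0 * pa_degree_fraction(c, c))

for c in (1, 2, 3, 5, 10):
    print(c, round(pa_degree_fraction(c, c), 4), round(beta_pa(c), 4))
```

The coefficient shrinks as c grows, so PA networks with more edges per new node look increasingly unlike their degree distribution.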
Small-world networks. A small-world network is generated from a d-regular network by reconnecting (uniformly at random) at least one endpoint of every edge with some probability. For probabilities close to 0, a network generated in this way is close to d-regular; for probabilities close to 1, it approaches one generated by the random-network model $G_{n,p}$ of Erdős and Rényi [11]. In the first extreme, $\bar{\beta}_{\rho\delta}(\mathrm{SW}) = 0$ (Lemma 4 below), because all the nodes have the same degree d and the same rank (roughly d/2). As the reconnection probability increases, nodes are not very likely to change rank, while the degree distribution spreads slightly. In the second extreme, where c denotes the expected degree, the highest rank of a node is c/2 + 1 [36], and, using an observation of the expected size of the densest subnetwork, with high probability nearly all the nodes have this rank. It follows that $\bar{\beta}_{\rho\delta}(G_{n,p}) \approx \sqrt{c^{c/2} e^{-c} / (c/2)!}$, which approaches 0 very quickly as c grows. We verified this experimentally, finding that $\bar{\beta}_{\rho\delta}(G_{n,p}) < 0.5$ for c ≥ 5. We can prove Lemma 4 by showing that $\rho_d = 0$, since $\delta_d = 1$ for regular networks.
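Both extremes can be evaluated with a short sketch. For a d-regular network, with d assumed even, every node has degree d and rank d/2, so the two distributions have disjoint supports and β = 0. For $G_{n,p}$, the sketch follows the approximation in the text. It assumes nearly all nodes share a single rank and that the relevant degree fraction is the Poisson(c) probability at c/2, with c assumed even so that c/2 is an integer; β is then the square root of that probability. The function names are hypothetical.

```python
import math

def beta(p, q):
    """Bhattacharyya coefficient of two distributions given as dicts."""
    return sum(math.sqrt(p.get(i, 0.0) * q.get(i, 0.0)) for i in set(p) | set(q))

def beta_regular(d):
    """d-regular network (d even): every node has degree d and rank d/2."""
    return beta({d // 2: 1.0}, {d: 1.0})

def beta_gnp(c):
    """Approximation for G_{n,p} with expected degree c (c even): the density
    distribution is concentrated on one rank, and the matching degree fraction
    is taken as the Poisson(c) probability at c/2."""
    j = c // 2
    return math.sqrt(math.exp(-c) * c ** j / math.factorial(j))

print(beta_regular(4))                       # 0.0
for c in (4, 6, 10, 20):
    print(c, round(beta_gnp(c), 3))
```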