A fast and robust kernel optimization method for core–periphery detection in directed and weighted graphs

Many graph mining tasks can be viewed as classification problems on high dimensional data. Within this class we consider the issue of discovering core-periphery structure, which has wide applications in the economic and social sciences. In contrast to many current approaches, we allow for weighted and directed edges and we do not assume that the overall network is connected. Our approach extends recent work on a relevant relaxed nonlinear optimization problem. In the directed, weighted setting, we derive and analyze a globally convergent iterative algorithm. We also relate the algorithm to a maximum likelihood reordering problem on an appropriate core-periphery random graph model. We illustrate the effectiveness of the new algorithm on a large scale directed email network.


Introduction
Graph theory gives a common framework for formulating and tackling a range of problems arising in data science. Many such tasks can be viewed in terms of categorizing nodes or discovering hidden substructures that relate them. Clustering, or community detection, is perhaps the most widely studied problem, and it forms the basis of many classification algorithms [1]. In this work we study the different, but closely related, issue of identifying core-periphery structure; we seek a set of nodes that are highly connected internally and with the rest of the network, forming the core, and a set of peripheral nodes that are strongly connected to the core but have only sparse internal connections. This kind of structure is important for a number of reasons. For example, identifying core-periphery structures can help in identifying and categorizing hubs, i.e., well-connected nodes. As noted in [2], such nodes often occur in real-world networks. This is an issue for some community detection methods, as hubs tend to be connected to many different communities and, thus, can be awkward to classify. Moreover, the set of core nodes can be used to identify internally cohesive subgraphs of highly central nodes. In fact, even though all core nodes typically have high centrality score, not all nodes with high centrality measures belong to the core and it is possible to find sparsely connected subgraphs of central nodes not belonging to the core [3].
The concept of the network core-periphery is closely related to the idea of richclubs, nested networks and onion network structures [4,5,6]. In particular, a number of core-defining algorithms have been proposed in recent years, e.g., [7,8,9], following the seminal work by Borgatti and Everett [3]. Core-periphery structure has been detected and interpreted in many complex systems, including protein-protein interaction networks [10], metabolic and gene regulatory networks [11], social networks [3,12], engineered networks (such as the Internet, power-grids or transportation networks) [9], and economic networks [13]. See also the review [14].
From a computational perspective, several recent works provide algorithms that apply to undirected networks. In particular, we have introduced in [9] a scalable nonlinear optimization method with global quality guarantees for core-periphery detection in binary, undirected and connected graphs. This method exploits an intriguing connection between optimization and nonlinear eigenproblems and allows for a fast and easily implementable iteration which guarantees to compute the global maximum of a highly nonconvex core-score quality function.
In this work we consider the core-periphery concept in the more general setting of directed, weighted and possibly disconnected networks and we extend the results of [9], both in terms of the algorithms and of the theoretical analysis, to this more challenging case.
In our directed case, we use the concept that a set of nodes forms a core if there are many core-to-core, core-to-periphery and periphery-to-core edges, with few periphery-to-periphery edges. Although the ideal core-periphery subdivision defines two well distinguished sets of nodes, in practice one often looks for a corescore vector u ≥ 0 such that a smaller value u i indicates that node i is more peripheral. Such an assignment may be viewed as a type of node centrality measure [15]. Indeed the classic cases of degree centrality and eigenvector centrality have been proposed and tested in this context [3,8,9,16], and, as we explain in Section 4, the approach we propose here may be viewed as a nonlinear generalization of both these cases.
The manuscript is organized as follows. Section 2 introduces some relevant notation. In Section 3 we express the core-periphery detection problem in terms of kernel-based optimization and in Section 4 we connect this idea with classical node centrality measures. Section 5 shows that another viewpoint is also relevant; the approach may be viewed as maximum likelhood reordering under a new random graph model that generates directed core-periphery structure. In Section 6 we study the nonlinear optimization problem and show that it may be solved via an inexpensive and globally convergent iteration. Section 7 illustrates the performance of the algorithm on a large scale email dataset and Section 8 gives some conclusions.

Notation
We consider directed and possibly weighted graphs G = (V, E) with node set V = {1, . . . , n} and adjacency matrix A = (A ij ).
If node i does not point to node j then the entry a ij is zero. Otherwise, a ij takes a positive value, accounting for the strength of the directional tie from i to j.
We let 1 denote the column vector in R n with all values equal to one, and define the in and out degree vectors as d in = A1 and d out = A T 1, respectively. Operations on and between vectors are to be interpreted in a componentwise sense, so that, for example, x p−1 has ith component given by x p−1 i and x p−1 y has ith component given by x p−1 i y i . Inequalities involving vectors and matrices are also to be interpreted componentwise, so that, for example, A ≥ 0 means A ij ≥ 0 for all i and j.

Core-periphery via functional kernel optimization
To search for the presence of a core and periphery we define a core-score vector, that is, a nonnegative vector u quantifying the coreness of the nodes, where u i > u j indicates that node i is closer to the core than node j. We define our core-score vector as the solution to the following nonconvex and constrained core-periphery quality function maximization problem where, for some fixed real number α ∈ R to be chosen, f α is the core-quality function Note that, since only relative values are important, a constraint of the form x = 1 is very natural. However, there is no reason at this stage to prefer a particular norm over another. Therefore, we assume for now that · is any vector norm and consider the problem (1) in this general setting.
For x, y ∈ R, the kernel κ α (x, y) is the generalized (or Binomial) mean of the two nonnegative numbers |x| and |y|. The case α → ∞ is particularly well-suited for core-periphery purposes. If x is a nonnegative vector, we have and thus any nonnegative vector x for which f ∞ (x) is large assumes a necessarily large value on the entries involving the nodes in the core and smaller values within the periphery. In fact, when · denotes a p-norm, any vector x ≥ 0, x = 1 such that f ∞ (x) is large assigns to each node a value x i between zero and one so that each connection between two nodes i, j in the graph or, equivalently, each nonzero in the weight matrix A, involves at least one node such that x i is large. We note that the relevance of f ∞ (x) as a core-periphery quality function is highlighted for example in [8], and the relaxed version (1) involving α was considered in [9] for undirected graphs.

Connection with degree and eigenvector centralities
In the undirected case, A = A T , it has been argued that both the degree vector and the eigenvector (or Bonacich) centrality vector carry interesting core-periphery information and are good candidates for core score vectors [3,8,9,16]. In this section we show that when α = 1 or α = 0 the problem (1) admits an explicit solution that, even when the graph is directed, boils down to the degree and the eigenvector centrality, respectively.
When α = 1 the function f α (x) is linear, taking the form of the scalar product As both the matrix A + A T and the vector x have nonnegative entries, using the Cauchy-Schwarz inequality, we have and the inequality is always strict unless x is a multiple of (A Therefore, if we choose the norm constraint in (1) to be x 2 = 1, we have that This shows that the solution of (1) reduces to the degree vector when the graph is undirected and coincides with the sum of the incoming and outgoing degree vectors in the general case. It is well-known that when α → 0, κ α (x, y) converges to the geometric mean of |x| and |y|. Thus, when α = 0 and x ≥ 0, we have As x → √ x is bijective on the set of vectors with nonnegative entries, we can change variable y = √ x in (1) and recast the problem (1) as max y≥0 ij A ij y i y j = y T Ay, subject to y 2 = 1.
Again, if we consider the 1-norm, we can write the constraint y 2 1 = 1 as y 2 1 = y T y = 1 and problem (1) becomes By the Perron-Frobenius theorem [17], the Rayleigh quotient y T Ay/y T y has a unique nonnegative maximizer c, which coincides with the Perron eigenvector of the nonnegative matrix A + A T . In other words, the core score that maximizes f 0 is the vector u = c 2 , where c is the eigenvector centrality of the symmetrized network with weight matrix A + A T .
In Section 6 we show that both these two cases are actually a special case of a more general setting. We prove that for any α ≥ 0 the solution of (1) is the Perron eigenvector of a nonlinear core-periphery operator. In particular, this implies that the solution u to (1) is unique for any α ≥ 0, with an appropriate normalization. Moreover, using nonlinear Perron-Frobenius theory, this further allows us to introduce an iterative algorithm that computes u, with global convergence guarantees.

Logistic core-periphery random model for directed graphs
We introduced in [9] a random graph model for undirected and unweighted graphs that can be used to artificially generate networks with a planted core-periphery structure. This model, unlike more classical block-based versions, is based on the logistic sigmoid function 1/(1 + e −x ) rather than a Heaviside step function and allows a smooth transition between the set of core nodes and the set of peripheral ones. We refer to it as the logistic core-periphery random model. We notice that similar logistic function based random models have been considered in [18,19,20].
Here we extend the model to the case of directed graphs and we prove that the method of maximum likelihood applied to this random model coincides with the core-periphery quality function maximization problem (1), which provides the core-periphery analogue of a known phenomenon for stochastic block models in the community detection case [21].
Consider a core-ranking assignment, that is, a nonnegative permutation vector π that assigns a distinct integer π i between 1 and n to each vertex i. The closer π i is to 1, the higher the rank of i as a member of the core. For convenience, we shiftand-scale the core-ranking vectors via the affine transform u → 1 − π/n. Hence, we consider the set so that, similarly to a core-score assignment, u ∈ CR(n) has values in [0, 1] and larger values of u correspond to higher positions in the core ranking. Now, given u ∈ CR(n), the logistic core-periphery random model generates an edge from node i to node j with independent probability given by Note that for α 2 ≥ α 1 ≥ 0 we have Thus, for any α ≥ 0, the probability p ij (u) tends to be large if at least one of the nodes i and j has a high core rank and this effect increases as α grows, as shown by Figure 1. Suppose we are given a network with the nodes in arbitrary order and wish to find the best core ranking assignment based on the logistic random model (3). From a maximum likelihood perspective, this corresponds to maximizing the log-likelihood among all possible u ∈ CR(n). In other words, assuming that the given network is a sample from the logistic core-periphery random model (3) with the node labels shuffled arbitrarily, this is most likely to be the correct reordering. A key observation here is that, for ranking vectors u ∈ CR(n), this maximum likelihood approach is equivalent to assigning a core-score to the nodes which maximizes a logistic core quality function f α . In fact, the following extension of Theorem 3.1 in [9] holds. This equivalence provides further justification for the kernel optimization approach. It also suggests that the logistic core-periphery random model (3) is a useful resource for testing core-periphery detection algorithms in this directed setting.
We also note that a closely related generative random graph model for coreperiphery networks was proposed in [20]. That work focused on the undirected case and aimed to incorporate additionally available spatial information.
6 Core-periphery nonlinear operator A study of the Hessian of f α reveals that f α is neither convex nor concave in general. This makes the solution of (1) particularly challenging. However, here we show that this optimization problem can be re-cast in terms of the Perron eigenvector of a nonlinear operator. We then show how its solution is always achievable via a generalization of the classical power method from numerical linear algebra.
Given α ≥ 1, consider the nonlinear core-periphery operator Φ α : R n → R n , entrywise defined as follows Given p > 1, we consider the following nonlinear eigenvalue problem for Φ α One easily realizes that Φ α is linear if and only if α = 1 in which case that operator degenerates into the map such that Φ 1 (x) = d in +d out , for any nonnegative vector x ≥ 0. In this setting it is easily seen that the only nonnegative solution of (5) is x = d in + d out with λ = 1 and p = 2. Combined with the discussion of Section 4, this shows that for the case α = 1 the unique nonnegative solution of the eigenvalue problem (5) coincides with the maximizer of (1). We can retrieve the same analogy for α → 0 and p = 1. In that case we have lim Arguing as in Section 4, again, we deduce that for α = 0 and p = 1 a nonnegative solution of the eigenvalue problem (5) coincides with a maximizer of (1). When α = 0, 1 the question of existence and uniqueness of a solution to (5) is less trivial. The following theorem gives a full answer and shows that the same one-to-one correspondence between (5) and (1) holds.
Theorem 2 Let α ≥ 0 and p > max{1, α}. Then the eigenvalue equation (5) has a unique nonnegative solution u ≥ 0 such that u p := (|u 1 | p + · · · + |u n | p ) 1/p = 1 which is also the unique solution of (1), provided that · = · p . Moreover u is positive if and only if the network has no isolated nodes, i.e., all nodes have at least one outgoing or one incoming edge.
Note that, as no assumption on the connectedness of the graph is made, the eigenvector centrality, i.e., the nonnegative solution of (5) for α = 0 and p = 1, is not uniquely defined. Instead, Theorem 2 shows that the core-score assignment is always unique when p > max{1, α} and α ≥ 0. The relevance of Theorem 2 is not only theoretical. In fact, it comes together with the following corollary which shows the global convergence to u of a simple iterative scheme.
Note that on sparse networks the method scales linearly with the number of nodes. In fact, each iteration requires O(|E|) floating point operations, where |E| is the number of edges in the graph. Moreover, the free parameter p allows us to tune the overall number of iterations k = O(ln ε/ ln α−1 p−1 ) required to achieve the precision u k − u = O(ε). We will refer to the iteration in Corollary 3 as the Nonlinear Spectral Method (NSM).
From Corollary 3 the choice p α appears to be attractive, since it leads an extremely rapid (linear) convergence rate. However, in choosing values for p and α, we must take account of two further issues. 1. As we argued in Section 3, a larger value of α gives a kernel that more closely matches the ideal of max{x i , x j }. 2. A larger value of p produces a relaxed problem that is less likely to distinguish between the nodes. (Note that in the extreme case of p = ∞, the constraint x ∞ = 1 allows for the obvious solution x = 1, which assigns the same score to all nodes.) Combining points 1 and 2 with Corollary 3, we must compromise between a large parameter α in the kernel and a not-too-large value p for the vector norm, while keeping p > α to maintain convergence. In practice, we found that changing the value of p did not significantly affect the core-periphery structure output of the algorithm, which was instead governed by the value of α. In our experiments we chose p = 2α and α = 10, as this produced good results with guaranteed fast convergence. Moreover, we observed that larger values of α did not produce a noticeable change in the core-periphery structure identified.

Enron dataset
The Enron email network consists of 1,148,072 emails sent between 87,273 employees of Enron between 1999 and 2003. Nodes in the network are individual employees and weighted directed edges, with weights ranging from 1 to 3,904, count the number of emails sent from one employee to another. It is possible to send an email to oneself, and thus this network contains self-loops. Note that this network is not strongly (or even weakly) connected. The data has been collected from [22].
Three plots in Figure 2 display the network by means of colored adjacency sparsity plots. Here, each nonzero entry in the adjacency matrix is shown with an intensity that corresponds to the edge weight (the darker the dot the larger the weight on the corresponding edge). These plots correspond to three different node labelings: the first one (top-left corner) is the original node labeling; the second plot (topright corner) is the labeling, somewhat corresponding to a rich-club paradigm, obtained by re-ordering the nodes according to decreasing values of the overall degree d in + d out ; the third one (bottom-left corner) is the labeling corresponding to decreasing values of the core-score computed with the NSM using parameters α = 10 and p = 20. This latter figure clearly shows that the Enron email dataset contains a strong core-periphery structure, which was less prevalent initially. This is further confirmed by the core-periphery profile in the bottom-right plot, which shows the behavior of against k, where S k is the set of k most peripheral nodes corresponding to a corescore assignment. As k varies from 1 to n = 87, 273, γ(S k ) varies from 0 to 1 and measures the ratio of periphery-periphery links to periphery-all links, if S k were to be chosen to be the periphery set. Thus a network has a strong core-periphery structure revealed by a core-score vector if the corresponding profile γ(S k ) takes small values as k increases from zero and then grows dramatically as k crosses some threshold value.
For undirected networks, the profile γ(S k ) was proposed in [23] as a means to visualize core-periphery structure. In this case, γ(S k ) coincides with the persistence probability of the set S k , i.e., the probability that a random walker who is currently in any of the nodes of S k remains in S k at the next time step. For directed strongly connected networks, the persistence probability of S k would instead be given by ij∈S k y i P ij / j∈S k y j , where y is the stationary distribution of the random walk with transition matrix P ij = A ij /d out i . However, as the Enron dataset we are considering is not connected, y is not well defined, and we compute γ(S k ) in its place.
Finally, in order to show how the parameter α affects the core-periphery assignment obtained with NSM on this dataset, we show in Figure 3 the coreperiphery structure and core-periphery profile using three different values of α, namely α ∈ {1.5, 3, 10}. While changing the value of p does not effect the coreperiphery structure output of the algorithm, small values of α show a weaker coreperiphery structure, which is consistent with the fact that our model ideally works best when α → ∞. However, in practice α = 10 performs well and have observed that larger values of α do not result in any significant change.
Since the network is not strongly connected, we do not show plots corresponding to the eigenvector centrality-this is not uniquely defined and, in our tests, different runs of Julia's Arpack.eigs gave rise to very different reorderings.
For this network the NSM with parameters α = 10 and p = 20 computed the solution to 9 digits of precision in less than 5 seconds on a standard i7 single core laptop, using Julia 1.0. Our code in both Matlab and Julia is available online at the address https://github.com/ftudisco/nonlinear-core-periphery.

Conclusion
Our main aim in this work was to show that the attractive properties of the nonlinear spectral method proposed in [9] can almost completely be transferred to the directed, weighted and unconnected setting. In particular we show that for the coreperiphery kernel quality function (1), proposed for example in [8,9], there is always a unique solution for α ≥ 0 and p > max{1, α}, and this solution can be computed via a nonlinear spectral method whenever it is feasible to form matrix-vector products based on the network weight matrix. The proposed method, which exploits an intriguing connection between optimization and eigenproblems, generalizes the classical power method in order to compute the global maximum of a highly nonconvex function; thus it may also be of interest in other machine learning contexts.
Let us analyze the two terms S 1 and S 2 individually. If a = 1 1+e −b , we have i.e., S 1 (u) = f α (u). Now note that, if u, v ∈ CR(n) then there exists a permutation σ of {1, . . . , n} such that u i = v σ(i) for all i. Therefore, which implies that S 2 is constant on CR(n). Thus u maximizes L α (u) if and only if it maximizes S 1 (u) and the proof is complete.
Proof of Theorem 2 and Corollary 3. This proof is based on the proof of Theorem 4.5 in [9] and the lemmas therein proved. For convenience, let us denote by R n + the cone of vectors with nonnegative entries. Since we are interested in a nonnegative maximizer of f α (x) constrained on the sphere x p = 1, we can equivalently look for a maximizer of f α (x/ x p ) on the whole cone of nonnegative vectors R n + . Now, notice that f α is positively 1-homogeneous, that is f α (ax) = af α (x) holds for any real number a ≥ 0. Therefore we can further change our problem into the global maximum on R n + of g(x) = f α (x)/ x p , without losing any generality. The critical point condition for g implies the equivalence with the eigenvalue problem (5), i.e., x is a stationary point for g if and only if it is such that Φ α (x) = λx p−1 . As p > 1, we can equivalently write Φ(x) = µx, with µ = λ 1 p−1 and Φ(x) = Φ α (x) 1 p−1 . We now show that there can only be one nonnegative x such that x p = 1 and Φ(x) = µx.
To this end, note that Φ(x) ≥ 0 for any x ≥ 0. Thus if x ≥ 0 and x p = 1, then µ > 0 and we have where q is such that 1/p + 1/q = 1. Therefore any x ≥ 0, x p = 1 solution of (5) is a fixed point of the map Let τ (x, y) = ln x − ln y ∞ . Lemma 4.4 of [9] implies that for any x, y ∈ R n + . As τ is a complete metric on the cone R n + (see, for example, [24]), this shows that Ψ is a contraction and thus it has a unique fixed point.
Next we prove that u has a zero component i if and only if i is an isolated node, i.e., it has no incoming nor outgoing links. To this end, let Ω + A ⊆ R n + be the set of vectors Ω + A = {x ≥ 0 : x i = 0 if and only if i is isolated} .
Note that, equivalently, x i = 0 for x ∈ Ω + A if and only if A ij + A ji = 0 for all j = 1, . . . , n. Now note that if x ∈ Ω + A then Φ α (x) ∈ Ω + A . In fact, from its definition we see that, if i is isolated, x i = 0 and thus Φ α (x) i = 0, whereas if x i > 0, then Φ α (x) i > 0 as κ α (x i , x j ) ≥ x i > 0 and there exists at least one j * such that A i,j * + A j * ,i > 0, which implies Φ α (x) i ≥ x α−1 i (A i,j * + A j * ,i )κ α (x i , x j * ) 1−α > 0. Note that the same conclusion holds for any initial positive vector; that is, x > 0 implies Φ α (x) ∈ Ω + A . Therefore the iterative method of Corollary 3 converges to a vector in Ω + A for any starting point, or, equivalently, any nonnegative solution of (5) must be in Ω + A . Since there exists only one such solution, the proof is complete.