Core-periphery structure requires something else in the network

A network with core-periphery structure consists of core nodes that are densely interconnected. In contrast to a community structure, which is a different meso-scale structure of networks, core nodes can be connected to peripheral nodes and peripheral nodes are not densely interconnected. Although core-periphery structure sounds reasonable, we argue that it is merely accounted for by heterogeneous degree distributions, if one partitions a network into a single core block and a single periphery block, which the famous Borgatti–Everett algorithm and many succeeding algorithms assume. In other words, there is a strong tendency that high-degree and low-degree nodes are judged to be core and peripheral nodes, respectively. To discuss core-periphery structure beyond the expectation of the node’s degree (as described by the configuration model), we propose that one needs to assume at least one block of nodes apart from the focal core-periphery structure, such as a different core-periphery pair, community or nodes not belonging to any meso-scale structure. We propose a scalable algorithm to detect pairs of core and periphery in networks, controlling for the effect of the node’s degree. We illustrate our algorithm using various empirical networks.


I. INTRODUCTION
Many complex systems, biological, physical or social, can be represented by networks [1,2].
A network consists of a set of nodes and edges, where nodes represent objects (e.g., people, web pages) and edges represent pairwise relationships between objects (e.g., friendships, hyperlinks). A consistent observation across different types of networks is that they are often composed of communities, i.e., groups of densely interconnected nodes [3]. A community is often associated with a group of nodes sharing a role or similarity such as a circle of friends in social networks [4], a set of web pages discussing the same topic [5,6] and a functional group of proteins [7].
Given that block structure of networks, or equivalently, hard partitioning of the nodes into groups, has spurred many studies such as community detection [3,40] and the inference of stochastic block models (SBM) [6,41], as well as its appeal to intuition, we focus on the discrete version of core-periphery structure based on edge density in the present paper. If a network has such core-periphery structure, the core block should have more intra-block edges and the periphery block should have fewer intra-block edges than a reference. We argue that the core-periphery structure that Borgatti and Everett proposed (Fig. 1), on which much of the subsequent work is based, is impossible if we use the configuration model [42] as the null model and there are just one core and one periphery. The configuration model is a common class of random graph models that preserve the degree of each node or its mean value. Therefore, our claim implies that there is no core-periphery structure a la mode de Borgatti and Everett beyond the expectation from the degree of each node (i.e., hubs are core nodes), which is, in fact, consistent with some previous observations [16,28,30].
Then, we are led to a question: what is a core-periphery structure? To answer this question, let us look at the status of the configuration model in other measurements of networks. We have a plethora of centrality measures for nodes because the degree is often not a useful measure of the importance of nodes [1]. In other words, different centrality measures provide rank orders of nodes in the given network that are not expected from the configuration model. In network motif analysis, where one looks for small subnetworks that are abundant in a given network, we discount the frequency of subnetworks that are merely explained by the degree of the nodes (i.e., configuration model) [43]. In community detection, it is conventional to use the configuration model as the null model against which one assesses the significance of community structure [3,4,6,41]. To solve the conundrum that one does not discover core-periphery structure using the configuration model as the null model, we propose that one must add at least one different block apart from a core block and the corresponding periphery block for a network to have core-periphery structure that is consistent with Fig. 1. Such blocks may be a community, sparsely connected part, a different core-periphery pair [10,19,20,32], a core that shares the periphery with the focal core-periphery pair [34] and so forth. Then, we propose a scalable algorithm to partition a network into multiple core-periphery pairs including community detection as special cases, aiming to detect core-periphery structure that is not merely explained by the degree of each node. Crucially, we use the configuration model as the null model, which is different from our previous algorithm [19].

II. CORE-PERIPHERY STRUCTURE NEEDS AT LEAST THREE BLOCKS
Consider an unweighted network composed of $N$ nodes and $M$ edges. The $N \times N$ adjacency matrix of the network is denoted by $A = (A_{ij})$, where $A_{ij} = 1$ if nodes $i$ and $j$ ($\neq i$) are adjacent and $A_{ij} = 0$ otherwise. We assume that the network is undirected (i.e., $A_{ij} = A_{ji}$ for all $i \neq j$) and has no self-loops (i.e., $A_{ii} = 0$ for all $i$). As the null model of networks, we use the configuration model, i.e., a random network model preserving the degree of each node. For the configuration model, we allow multi-edges (i.e., multiple edges between nodes) and self-loops for computational ease. In fact, multi-edges and self-loops change our quality function for finding core-periphery structure only by a quantity of the order of $1/N$, which is negligible if $N$ is large. We denote by $E[\cdot]$ the expectation with respect to the configuration model.
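As an illustration of this null model, a randomised network with a prescribed degree sequence can be drawn by stub matching, which naturally produces the multi-edges and self-loops allowed above. The following Python sketch is a generic textbook construction and is not taken from the authors' code; the function name `configuration_model` is hypothetical.

```python
import random

def configuration_model(degree, rng):
    """Draw one random multigraph with the given degree sequence by stub matching.

    Node i contributes degree[i] stubs; the stubs are paired uniformly at
    random, so multi-edges and self-loops may occur.
    """
    if sum(degree) % 2 != 0:
        raise ValueError("the sum of degrees must be even")
    stubs = [node for node, d in enumerate(degree) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list are joined into edges.
    return list(zip(stubs[0::2], stubs[1::2]))

degree = [3, 2, 2, 1]
edges = configuration_model(degree, random.Random(1))
realised = [0] * len(degree)
for i, j in edges:
    realised[i] += 1
    realised[j] += 1  # a self-loop (i == j) contributes 2 to the degree
print(realised == degree)
```

Every realisation preserves the degree of each node exactly, which is the property used throughout this section.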
Consider a partition of the set of N nodes into B blocks (i.e., groups). Let N u be the number of nodes in block u and m uv be the number of edges between blocks u and v. For notational convenience, we define m uu as twice the number of self-loops in block u (in the case of the configuration model) plus twice the number of edges between different nodes within block u. Suppose a network composed of B = 2 blocks ( Fig. 2(a)). There are potentially six types of block structure of networks represented by two blocks. In Figs. 2(b)-2(g), a filled block has more edges than that for the configuration model (i.e., m uv > E[m uv ]), and a blank block has fewer edges than that for the configuration model (i.e., m uv < E[m uv ]).
Here we have ignored the case $m_{uv} = E[m_{uv}]$ because it is unlikely in practice. The entire network would be dense if there are many intra- and inter-block edges (Fig. 2(b)). In contrast, the network would be sparse if there are relatively few intra- and inter-block edges (Fig. 2(c)). The network has community structure if there are many intra-block edges and relatively few inter-block edges (Fig. 2(d)). A contrasting case is a structure close to a bipartite network, where there are relatively few intra-block edges and many inter-block edges (Fig. 2(e)). Core-periphery structure would correspond to the case in which there are many edges within one block and few edges within the other block. With core-periphery structure, inter-block edges may be abundant (Fig. 2(f)) [8,10,12,15,19,20,24] or not (Fig. 2(g)) [8,10,16,17,25,26,31,33–35].
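The comparison between observed and expected block edge counts can be sketched in a few lines of Python. The sketch assumes the usual configuration-model expectation, in which the expected number of edges between nodes $i$ and $j$ is $d_i d_j / 2M$, so that $E[m_{uv}] = D_u D_v / 2M$ with $D_u$ the total degree of block $u$; the function name `block_edge_counts` and the toy network are hypothetical.

```python
def block_edge_counts(edges, labels, num_blocks):
    """Return (m, expected): observed and configuration-model block edge counts.

    m[u][v] is the number of edges between blocks u and v; m[u][u] is twice
    the number of intra-block edges, following the convention in the text.
    """
    m = [[0] * num_blocks for _ in range(num_blocks)]
    degree_sum = [0] * num_blocks  # D_u: total degree of block u
    for i, j in edges:
        u, v = labels[i], labels[j]
        m[u][v] += 1
        m[v][u] += 1  # when u == v the two increments add 2, as required
        degree_sum[u] += 1
        degree_sum[v] += 1
    two_m = 2 * len(edges)
    expected = [[degree_sum[u] * degree_sum[v] / two_m for v in range(num_blocks)]
                for u in range(num_blocks)]
    return m, expected

# Toy network: a triangle (nodes 0-2) with one pendant node attached to each corner.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5)]
labels = [0, 0, 0, 1, 1, 1]  # block 0: triangle, block 1: pendant nodes
m, expected = block_edge_counts(edges, labels, 2)
for u in range(2):
    for v in range(u, 2):
        kind = "filled" if m[u][v] > expected[u][v] else "blank"
        print(u, v, m[u][v], expected[u][v], kind)
```

In this toy network the triangle is a fully connected "core", yet $m_{11} = 6$ is smaller than $E[m_{11}] = 6.75$, so relative to the configuration model both diagonal blocks are blank and the structure is of the bipartite-like type of Fig. 2(e).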
Many algorithms for finding discrete versions of core-periphery structure seek a partition of nodes into one core block and one periphery block (Figs. 2(f) or 2(g)). Let us consider the karate club network [44], which has been demonstrated to have core-periphery structure [10,13,19,20,26,32,35]. The Borgatti-Everett (BE) algorithm partitions the N = 34 nodes into a core and a periphery as shown in Fig. 3. The detected blocks seem to suggest core-periphery structure because the core nodes are densely interconnected, whereas the peripheral nodes are sparsely interconnected. However, relative to the configuration model, the network is closer to a bipartite network than to core-periphery structure; there are fewer edges within both core and periphery blocks (i.e., $m_{11} < E[m_{11}]$ and $m_{22} < E[m_{22}]$).
Equation (1) with B = 2 yields [...] If a network has a core-periphery structure, the core block should have more intra-block edges [...] In Fig. 4(f), blocks 1 and 2 constitute a core-periphery pair, and block 3 constitutes a community. In Fig. 4(g), blocks 1 and 2 constitute a core-periphery pair, and blocks 2 and 3 constitute a bipartite-like subnetwork.
With B = 4 blocks, 49 types of block structure are consistent with Eq. (1). Four of them are shown in Figs. 4(i)-4(l) for illustration (see Fig. 18 for the others). The network shown in Fig. 4(i) is composed of two non-overlapping core-periphery pairs [19,20,24,32]. The network shown in Fig. 4(j) consists of one core-periphery pair (i.e., blocks 1 and 2) and one bipartite-like subnetwork (i.e., blocks 3 and 4). The network shown in Fig. 4(k) consists of one core-periphery pair (i.e., blocks 1 and 2), one bipartite-like subnetwork (i.e., blocks 2 and 3) and a community (i.e., block 4), in which the core-periphery pair and bipartite-like subnetwork overlap. The network shown in Fig. 4(l) has three overlapping communities, i.e., a community composed of blocks 1 and 2, one composed of blocks 2 and 3, and one composed of blocks 3 and 4.
To conclude, the core-periphery structure a la mode de Borgatti and Everett [8] relative to the configuration model can exist only when we have at least three blocks. In other words, a core-periphery pair requires a different substructure of the network that coexists in the same network, e.g., a community, bipartite-like structure, or another core-periphery pair that may overlap with the first one.
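The reason why two blocks cannot suffice can be sketched in a short derivation. It assumes that Eq. (1) expresses the fact that the configuration model preserves the total degree of each block; this form is inferred from the definitions of $m_{uv}$ and $E[\cdot]$ given above and is not quoted from the text.

```latex
% Assumed form of Eq. (1): the total degree of each block u is preserved.
\begin{align}
  \sum_{v=1}^{B} m_{uv} &= \sum_{v=1}^{B} E[m_{uv}], \qquad 1 \le u \le B. \\
\intertext{With $B = 2$ this gives}
  m_{11} + m_{12} &= E[m_{11}] + E[m_{12}], \qquad
  m_{21} + m_{22} = E[m_{21}] + E[m_{22}]. \\
\intertext{A core (block 1) and a periphery (block 2) would require
$m_{11} > E[m_{11}]$ and $m_{22} < E[m_{22}]$, which implies}
  m_{12} &< E[m_{12}] \quad\text{and}\quad m_{21} > E[m_{21}].
\end{align}
% The network is undirected, so m_{12} = m_{21} and E[m_{12}] = E[m_{21}];
% the two inequalities contradict each other. With B >= 3 the surplus or
% deficit of a row can be absorbed by a third block, so no contradiction arises.
```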

III. METHODS
In this section, we first describe a new algorithm for detecting core-periphery structure, which we refer to as KM-config, based on the observations made in Section II. MATLAB and C++ codes of KM-config are available at https://github.com/skojaku/km_config/. Then, we explain other methods and data used in Section IV.

Objective function
We propose an algorithm, KM-config, to detect discrete versions of core-periphery structure in networks. In contrast to our previous algorithm that uses the Erdős-Rényi random graph as the null model [19], which we refer to as KM-ER, here we use the configuration model as the null model. This is because we are interested in the structure that is not merely explained by the node's degree.
We assume that a network consists of C non-overlapping core-periphery pairs, each of which is composed of one core block and one periphery block, e.g., Fig. 4(i). Each core-periphery pair should have (i) many intra-core edges, (ii) many edges between the core and the corresponding periphery (i.e., core-periphery edges), (iii) few intra-periphery edges and (iv) few edges to other core-periphery pairs (i.e., inter-pair edges). Although some previous studies do not assume property (ii) [8, 25, 27-29, 31-33, 35], we require it because otherwise one cannot relate a periphery with a particular core.
We define idealised core-periphery pairs satisfying properties (i)-(iv) [19,24] by
$$A^*_{ij} = (x_i + x_j - x_i x_j)\,\delta_{c_i, c_j}, \tag{3}$$
where $x_i = 1$ or $x_i = 0$ if node $i$ is a core node or a peripheral node, respectively, $c_i$ is the index of the core-periphery pair to which node $i$ belongs, and $\delta$ is the Kronecker delta. Within each idealised core-periphery pair, every core node is adjacent to every other core node (property (i)) and also adjacent to all the corresponding peripheral nodes (property (ii)), and every peripheral node is not adjacent to any other peripheral nodes (property (iii)). Furthermore, there are no edges between different idealised core-periphery pairs (property (iv)).
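We seek $c_i$ and $x_i$ ($1 \le i \le N$) that maximise the similarity between $A$ and $A^*$ as defined by
$$Q^{\rm cp}_{\rm config} = \frac{1}{2M}\sum_{i=1}^{N}\sum_{j=1}^{N} A_{ij} A^*_{ij} - \frac{1}{2M}\sum_{i=1}^{N}\sum_{j=1}^{N} E[A_{ij}] A^*_{ij}. \tag{4}$$
The first term of the right-hand side of Eq. (4) is the fraction of intra-core and core-periphery edges (i.e., $A_{ij} = A^*_{ij} = 1$), corresponding to properties (i) and (ii). The second term is the counterpart for the configuration model. The factor $1/2M$ in the first and second terms normalises $Q^{\rm cp}_{\rm config}$ to range in $[-1, 1]$. The remaining two properties (iii) and (iv) are also consistent with the maximisation of $Q^{\rm cp}_{\rm config}$. To show this, we rewrite $Q^{\rm cp}_{\rm config}$ as
$$Q^{\rm cp}_{\rm config} = -\frac{1}{2M}\sum_{i=1}^{N}\sum_{j=1}^{N} \left(A_{ij} - E[A_{ij}]\right)\left(1 - A^*_{ij}\right), \tag{5}$$
which follows from $\sum_{i,j} A_{ij} = \sum_{i,j} E[A_{ij}] = 2M$. Because $\frac{1}{2}\sum_{i,j} A_{ij}(1 - A^*_{ij})$ is the sum of the number of intra-periphery edges and that of inter-pair edges, the maximisation of $Q^{\rm cp}_{\rm config}$ minimises the two types of edges associated with properties (iii) and (iv).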
In the configuration model, the expected number of edges between nodes $i$ and $j$ is given by
$$E[A_{ij}] = \frac{d_i d_j}{2M}, \tag{6}$$
where $d_i$ is the degree of node $i$. If we restrict all nodes to be core nodes (i.e., $x_i = 1$ for $i = 1, 2, \ldots, N$), $Q^{\rm cp}_{\rm config}$ is equivalent to the modularity [4,46], which is used for finding communities in networks.
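The objective function can be evaluated directly from its definition. The Python sketch below assumes the idealised matrix $A^*_{ij} = (x_i + x_j - x_i x_j)\,\delta_{c_i,c_j}$ and the expectation $E[A_{ij}] = d_i d_j / 2M$, summed over all ordered pairs including $i = j$; the function name `q_cp_config` is hypothetical and the quadratic-time loop is chosen for readability, not speed.

```python
def q_cp_config(edges, c, x):
    """Evaluate Q^cp_config for an undirected simple graph given as an edge list.

    c[i] is the core-periphery pair of node i; x[i] is 1 (core) or 0 (periphery).
    """
    n = len(c)
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    two_m = 2 * len(edges)

    def a_star(i, j):
        # Idealised adjacency: same pair and at least one endpoint in the core.
        return (x[i] + x[j] - x[i] * x[j]) if c[i] == c[j] else 0

    # Each undirected edge appears as (i, j) and (j, i) in the double sum.
    observed = 2 * sum(a_star(i, j) for i, j in edges)
    expected = sum(degree[i] * degree[j] / two_m * a_star(i, j)
                   for i in range(n) for j in range(n))
    return (observed - expected) / two_m

# Two triangles joined by one edge; every node is a core node.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(q_cp_config(edges, [0, 0, 0, 1, 1, 1], [1] * 6))
```

With all nodes in the core, the value coincides with the modularity of the two-triangle partition, $2 \times [3/7 - (7/14)^2] = 5/14$, which illustrates the equivalence stated above.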

Relationship to Markov stability
We can relate $Q^{\rm cp}_{\rm config}$ to discrete-time random walks, similar to the case of the Markov stability formalism for community detection [47–50]. Consider a random walker that moves from a node to one of the neighbouring nodes selected uniformly at random in each discrete time step. Let $T_{(c,x)(c',x')} \equiv m_{(c,x)(c',x')} / D_{(c,x)}$ be the transition probability from block $(c, x)$ to block $(c', x')$, where $D_{(c,x)}$ is the sum of the degrees of the nodes in block $(c, x)$. Let $\pi_{(c,x)} \equiv D_{(c,x)} / 2M$ be the stationary probability with which the random walker visits block $(c, x)$. Then, one can rewrite $Q^{\rm cp}_{\rm config}$ as
$$Q^{\rm cp}_{\rm config} = \sum_{c=1}^{C}\left[\pi_{(c,1)} T_{(c,1)(c,1)} + 2\pi_{(c,0)} T_{(c,0)(c,1)} - \pi_{(c,1)}\pi_{(c,1)} - 2\pi_{(c,0)}\pi_{(c,1)}\right]. \tag{7}$$
Now, imagine a random walker starting from a node $i$ selected randomly according to the stationary density $d_i / 2M$ ($1 \le i \le N$), at time $t = 0$. The probability that the random walker is in block $(c, x)$ at time $t = 0$ and block $(c', x')$ at time $t = 1$ is given by $\pi_{(c,x)} T_{(c,x)(c',x')}$, which is accounted for by the first and second terms of the right-hand side of Eq. (7). The corresponding probability for the configuration model is given by $\pi_{(c,x)} \pi_{(c',x')}$, which is accounted for by the third and fourth terms. Therefore, $Q^{\rm cp}_{\rm config}$ measures how likely a random walker moves to the core of the currently visited node in one step relative to the probability expected for the configuration model. This observation is exploited in a different algorithm to detect core-periphery structure of networks [13].
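The random-walk reading can be checked numerically from block-level quantities only. The Python sketch below assumes the four-term form $\sum_c [\pi_{(c,1)}T_{(c,1)(c,1)} + 2\pi_{(c,0)}T_{(c,0)(c,1)} - \pi_{(c,1)}^2 - 2\pi_{(c,0)}\pi_{(c,1)}]$ for the objective function, which is a reconstruction from the definitions of $T$ and $\pi$ above; the function name `q_cp_random_walk` is hypothetical.

```python
from collections import defaultdict

def q_cp_random_walk(edges, c, x):
    """Evaluate the objective from block degrees D and block edge counts m."""
    two_m = 2 * len(edges)
    D = defaultdict(int)  # D[(c, x)]: total degree of block (c, x)
    m = defaultdict(int)  # m[(b, b')]: edges between blocks, doubled when b == b'
    for i, j in edges:
        bi, bj = (c[i], x[i]), (c[j], x[j])
        D[bi] += 1
        D[bj] += 1
        m[(bi, bj)] += 1
        m[(bj, bi)] += 1
    q = 0.0
    for pair in set(c):
        core, peri = (pair, 1), (pair, 0)
        pi_core, pi_peri = D[core] / two_m, D[peri] / two_m
        # pi_b * T_{b b'} equals m_{b b'} / 2M; using this product directly
        # avoids dividing by the degree of an empty block.
        flow_core_core = m[(core, core)] / two_m
        flow_peri_core = m[(peri, core)] / two_m
        q += (flow_core_core + 2 * flow_peri_core
              - pi_core ** 2 - 2 * pi_peri * pi_core)
    return q

# Triangle core (nodes 0-2) with one pendant peripheral node per core node.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5)]
print(q_cp_random_walk(edges, [0] * 6, [1, 1, 1, 0, 0, 0]))
```

For this toy network the one-step flows into the core sum to $0.5 + 0.5$, the configuration-model counterpart is $0.5625 + 0.375$, and the difference is $0.0625$.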

Maximisation of the objective function
We maximise $Q^{\rm cp}_{\rm config}$ using a label switching heuristic [51,52], which we have employed in our previous algorithm, KM-ER, that uses the Erdős-Rényi random graph as the null model [19]. First, we initialise the labels by $c_i = i$ and $x_i = 1$ ($1 \le i \le N$). Then, we update the label of each node as follows. Suppose that node $i$ has a neighbour in a core-periphery pair $c'$. We tentatively assign node $i$ to the core (i.e., $(c_i, x_i) = (c', 1)$) and compute the new value of $Q^{\rm cp}_{\rm config}$. We also tentatively assign node $i$ to the periphery (i.e., $(c_i, x_i) = (c', 0)$) and compute $Q^{\rm cp}_{\rm config}$. We perform the tentative assignments for all the core-periphery pairs to which any neighbour of node $i$ belongs. If none of the tentative assignments raises $Q^{\rm cp}_{\rm config}$, we do not update $(c_i, x_i)$. Otherwise, we update $(c_i, x_i)$ to the tentative label (i.e., $(c', 0)$ or $(c', 1)$) giving the largest increment in $Q^{\rm cp}_{\rm config}$.
We inspect each node in a random order. If no node has changed its label during the inspection of all the $N$ nodes, we stop updating the labels. Otherwise, we draw a new random order and inspect each node according to the new random order. We run this algorithm ten times starting from the same initial condition and adopt the node labelling that realises the largest value of $Q^{\rm cp}_{\rm config}$.
The increment in $Q^{\rm cp}_{\rm config}$ caused by updating node $i$'s label from $(c, x)$ to $(c', x')$ is given by Eq. (8), which depends on the number of edges connecting node $i$ and each block $(c, x)$. When inspecting node $i$, we calculate Eq. (8) at most $2d_i$ times. Therefore, the time complexity of inspecting node $i$ is $O(d_i)$, and that of the entire algorithm is $O(M \times (\text{the number of inspections over the } N \text{ nodes}))$.
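The label switching procedure can be sketched as follows. This is a minimal Python illustration, not the released KM-config code: it recomputes the objective from scratch for every tentative label instead of using the incremental update of Eq. (8), so it is only suitable for small networks. The names `q_cp` and `km_config_sketch` are hypothetical, and the block-level formula for the objective assumes $E[A_{ij}] = d_i d_j / 2M$.

```python
import random
from collections import defaultdict

def q_cp(edges, c, x):
    """Objective function computed from block degrees (assumed null model: d_i d_j / 2M)."""
    two_m = 2 * len(edges)
    deg = defaultdict(int)     # total degree of each block (pair, role)
    inside = defaultdict(int)  # per pair: intra-core plus core-periphery edges
    for i, j in edges:
        deg[(c[i], x[i])] += 1
        deg[(c[j], x[j])] += 1
        if c[i] == c[j] and (x[i] or x[j]):
            inside[c[i]] += 1
    q = 0.0
    for pair in set(c):
        d_core, d_peri = deg[(pair, 1)], deg[(pair, 0)]
        q += 2 * inside[pair] / two_m - (d_core ** 2 + 2 * d_core * d_peri) / two_m ** 2
    return q

def km_config_sketch(n, edges, runs=10, seed=0):
    """Return (Q, c, x) of the best labelling found over several runs."""
    rng = random.Random(seed)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    best = None
    for _ in range(runs):
        c, x = list(range(n)), [1] * n  # every node starts as its own core
        changed = True
        while changed:
            changed = False
            order = list(range(n))
            rng.shuffle(order)  # a new random order for each sweep
            for i in order:
                old = (c[i], x[i])
                best_label, best_q = old, q_cp(edges, c, x)
                # Try the core and the periphery of every neighbouring pair.
                for pair in {c[j] for j in neighbours[i]}:
                    for role in (1, 0):
                        c[i], x[i] = pair, role
                        q = q_cp(edges, c, x)
                        if q > best_q + 1e-12:  # accept strict improvements only
                            best_label, best_q = (pair, role), q
                c[i], x[i] = best_label
                changed = changed or best_label != old
        final_q = q_cp(edges, c, x)
        if best is None or final_q > best[0]:
            best = (final_q, c[:], x[:])
    return best

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(km_config_sketch(6, edges, runs=3, seed=1))
```

Because a label is changed only when the objective strictly increases, each run terminates at a labelling that no single-node move can improve.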

Statistical test
We define the quality $q$ of a core-periphery pair $c$ by its contribution to $Q^{\rm cp}_{\rm config}$, i.e.,
$$q = \frac{1}{2M}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(A_{ij} - E[A_{ij}]\right) A^*_{ij}\,\delta_{c_i, c}.$$
One may deem that a core-periphery pair is significant if its $q$ is statistically larger than the value expected for the configuration model. However, $q$ may depend on the size (i.e., the number of nodes) $n$ of the core-periphery pair, as is the case for the modularity [53].
Inspired by these considerations, we carry out a statistical test of the detected core-periphery pairs as follows. We generate 500 randomised networks for the given network using the configuration model. Then, we detect core-periphery pairs in each randomised network. We compute the quality $\hat{q}$ and size $\hat{n}$ of each core-periphery pair detected in the randomised network. On the basis of the samples of $\hat{q}$ and $\hat{n}$, we infer the joint probability distribution $P(\hat{q}, \hat{n})$ using the Gaussian kernel density estimator [54,55]. Finally, we regard the core-periphery pair detected in the original network with a quality value of $q$ to be significant if $q$ is statistically larger than that of the core-periphery pair of the same size $n$ detected in the randomised networks, i.e., if $P(\hat{q} \ge q \mid n) \le \alpha$, where $P$ is the probability and $\alpha$ is a significance level. (See Appendix B for the computation of $P(\hat{q} \ge q \mid n)$.) We refer to the nodes that do not belong to any significant core-periphery pair as residual nodes.
Because we carry out the test for each core-periphery pair in the original network, we have to correct the significance level to suppress false positives due to multiple comparisons. To this end, we adopt the Šidák correction [56], with which we test each core-periphery pair in the original network at a significance level of $\alpha = 1 - (1 - \alpha')^{1/C}$, where $\alpha'$ is the targeted significance level and $C$ is the number of core-periphery pairs detected in the original network.
Empirical networks often have core-periphery pairs that are substantially larger than any of those detected in the 500 randomised networks (Section IV A). It is unlikely that one finds core-periphery pairs of the same size in randomised networks even if more samples of randomised networks are generated. The kernel density estimator enables us to infer $P(\hat{q} \ge q \mid n)$ for large core-periphery pairs in the original network based on the quality and size of smaller core-periphery pairs detected in randomised networks.
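The two ingredients of the test can be sketched in Python. The Šidák level follows the standard formula. The conditional tail probability is estimated here with a product Gaussian kernel and user-supplied bandwidths `h_q` and `h_n`; this estimator and both function names are assumptions made for illustration and are not the procedure of Appendix B.

```python
import math

def sidak_level(alpha_target, num_pairs):
    """Per-test significance level so that the family-wise level is alpha_target."""
    return 1.0 - (1.0 - alpha_target) ** (1.0 / num_pairs)

def tail_probability(q, n, samples, h_q, h_n):
    """Kernel estimate of P(q_hat >= q | n) from (q_k, n_k) pairs of the null model.

    Each sample is weighted by a Gaussian kernel in the size direction and
    contributes the upper tail of a Gaussian kernel in the quality direction.
    """
    num = den = 0.0
    for q_k, n_k in samples:
        weight = math.exp(-0.5 * ((n - n_k) / h_n) ** 2)
        upper = 0.5 * math.erfc((q - q_k) / (h_q * math.sqrt(2.0)))
        num += weight * upper
        den += weight
    return num / den  # den can underflow to 0 if n is far from every n_k

# Hypothetical null samples (quality, size) and one detected pair.
null_samples = [(0.010, 4), (0.015, 5), (0.020, 6), (0.012, 5), (0.018, 6)]
alpha = sidak_level(0.05, num_pairs=3)
p = tail_probability(q=0.05, n=6, samples=null_samples, h_q=0.005, h_n=1.0)
print(alpha, p, p <= alpha)
```

A detected pair is declared significant when the estimated tail probability does not exceed the corrected level.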
Quality $q$ may be significantly large for bipartite-like pairs of blocks (Fig. 2(e)). Therefore, if our algorithm detects bipartite-like pairs of blocks, we manually mark them and distinguish them from the core-periphery pairs. Specifically, we regard a detected pair of blocks as bipartite-like if it has fewer intra-core edges than expected for the configuration model (i.e., $m_{(c,1)(c,1)} < E[m_{(c,1)(c,1)}]$). Otherwise, we regard it as a core-periphery pair. Our algorithm did not find other types of block pairs (i.e., those shown in Figs. 2(b), 2(c) and 2(g)) for the networks examined in the following sections.

B. Other algorithms for comparison
We compare the present algorithm, KM-config, with three algorithms for finding a single core-periphery pair, i.e., the BE [8], MINRES [25,33] and SBM [16] algorithms, and three algorithms for finding multiple core-periphery pairs, i.e., Xiang [32], Divisive [19] and KM-ER algorithms [19]. We ran the Tunç-Verma [24] algorithm but do not show the results because the Tunç-Verma algorithm did not find significant core-periphery pairs or did not terminate within 48 hours on our computer (Intel 2.6GHz Sandy Bridge processors and 4GB of memory).
It should be noted that none of these algorithms uses the configuration model as the null model.
The BE, Divisive and KM-ER algorithms intend to produce many core-periphery edges (i.e., edges connecting a core node and a peripheral node) within each core-periphery pair [...]
We set the parameters of these algorithms as follows. For the SBM algorithm, we set γ k , [...] The Xiang algorithm has a parameter, denoted by β ∈ [0, 1] in Ref. [32], to tune the number of core-periphery pairs. We set β = 1. The Xiang algorithm uses a centrality measure to find core-periphery pairs; we adopt the degree centrality. Note that the authors of Ref. [32] claim that the choice of the centrality measure does not considerably affect the results. With the Xiang algorithm, each node may belong to multiple core-periphery pairs. Therefore, if a node belongs to multiple core-periphery pairs, we assign the node to the core-periphery pair to which the extent of belonging is the largest. If a node belongs to multiple core-periphery pairs to the same extent, then we assign the node to one of the core-periphery pairs selected with equal probability. The other algorithms do not have parameters. As is the case of KM-config, the BE, SBM, Divisive and KM-ER algorithms are stochastic. Therefore, we run each of the BE, SBM, Divisive and KM-ER algorithms ten times and use the best core-periphery pairs in terms of the algorithm-specific quality function.
For the core-periphery pairs detected by the six previous algorithms, we carry out our previously proposed statistical test [19] that adopts the Erdős-Rényi random graph model as the null model. The statistical test runs as follows. Suppose that a network is composed of a single core-periphery pair. We generate 500 randomised networks using the Erdős-Rényi random graph with the same number of edges as the original network. Then, we detect a single core-periphery pair in each of the randomised networks using the BE algorithm and compute its quality with respect to the idealised core-periphery structure $A^*$ given by Eq. (3). If the quality of the core-periphery pair detected in the original network is larger than a fraction $1 - \alpha$ of those detected in the randomised networks, then we regard the core-periphery pair in the original network as significant. It should be noted that this test is not applicable when the null model is the configuration model. If we use the configuration model as the null model, any core-periphery pair detected in the original network will be judged to be insignificant because no network is partitioned into a single core-periphery pair whose $q$ value is larger than that for the configuration model.
If we detect multiple core-periphery pairs in the original networks, we apply the same statistical test for each of them [19]. Specifically, for each core-periphery pair, we construct a subnetwork composed of the nodes and edges within the focal core-periphery pair. Then, we apply the statistical test to the subnetwork. We correct the significance level using the Šidák correction [56]; we test each core-periphery pair in the original network at a significance level of $\alpha = 1 - (1 - \alpha')^{1/C}$, where $\alpha' = 0.05$ and $C$ is the number of core-periphery pairs detected in the original network.

C. Data
We analyse the 12 empirical networks listed in Table I. We discard the direction and weight of the edges.
In the karate club network, each node represents a member of a university's karate club [44]. Two members are defined to be adjacent if they frequently interact outside the club activities. The club experienced a fissure as a result of a conflict between the instructor and the president. Based on their self-reports, each node has a label indicating either the instructor's side (15 members), president's side (16 members) or neutral (3 members).
In the dolphin social network, each node represents a dolphin living near Doubtful Sound in New Zealand [57]. An edge between two dolphins indicates that they were frequently observed in the same school between 1994 and 2001. Each dolphin has a label indicating the sex, i.e., female (25 dolphins), male (33 dolphins) and unknown (4 dolphins).
In the network of the novel Les Misérables, each node is a character of the book [58]. Two characters are defined to be adjacent if they appear in the same chapter. The book consists of 365 chapters, most of which are a few pages long.
In the Enron email network, each node is an email account of the staff of Enron Inc [59].
An edge indicates that an email was sent from one account to another account during the observation period.
In the jazz network, each node represents a jazz musician [60]. Two jazz musicians are defined to be adjacent if they have played in the same band.
In the co-authorship network, each node represents a researcher in network science [46].
An edge indicates that two researchers have a joint paper. The nodes and edges were retrieved from all the references cited by two influential review papers on network science. Then, the author of Ref. [46] manually added some nodes and edges and excluded those not belonging to the largest connected component.
In the blog network, each node represents a blog on the United States presidential election in 2004 [5]. Each edge indicates that one blog has a hyperlink to the other blog on its top page. The blogs and their labels were collected from several blog directories [5]. If a blog was unlabeled or had conflicting labels, the authors of Ref. [5] manually determined the label.
There are 586 liberal blogs and 636 conservative blogs.
In the worldwide airport network, each node is an airport [61,62]. An edge represents a direct commercial flight between two airports. We use the network provided in Ref. [62].
In the protein-protein interaction network, each node is a human protein [63,64]. An edge indicates the presence of physical interaction between two proteins.
In the network of chess players, each node represents a chess player [65]. Two players are adjacent if they have played before.
In the co-authorship network of the arXiv astro-ph section, each node is a researcher [66].
An edge indicates that two researchers have a joint paper in the arXiv's astro-ph section.
In the network of the Internet, a node is an autonomous system (AS), i.e., a set of routers (or IP routing prefixes) managed by a network operator [65]. An edge indicates a logical peering relationship between two ASes.

IV. RESULTS

A. Quality and size of detected core-periphery pairs
The circles in Fig. 5 represent the quality and size (defined as the number of nodes) of core-periphery pairs detected by KM-config in the 12 empirical networks. A larger core-periphery pair tends to have a larger quality, q. This is also the case for the randomised networks (crosses in Fig. 5). Some core-periphery pairs detected in the empirical networks have a significantly larger q value than those of the same size detected in the randomised networks. Our statistical test suggests that these core-periphery pairs are significant (circles outside the shaded regions in Fig. 5). We find bipartite-like pairs in 7 out of the 12 networks (squares in Fig. 5), some of which are significant in 2 out of the 7 networks (Figs. 5(i) and 5(l)). In 2 out of the 12 networks, we find significant core-periphery pairs that are larger than any of those detected in the corresponding randomised networks (Figs. 5(g) and 5(l)).

B. Core nodes are not necessarily hub nodes
With KM-config, whether the node belongs to a core or periphery is not strongly associated with the node's degree. To show this, we carry out a receiver operating characteristic (ROC) analysis (Fig. 6). Let us regard θN nodes (θ ∈ {0, 1/N, 2/N, . . . , 1}) with the largest degree as hub nodes and the remaining nodes as non-hub nodes. The ROC curves show the relationship between the fraction of hub nodes in the set of significant core nodes (i.e., true positive rate) and that in the set of significant peripheral nodes (i.e., false positive rate) when one varies the threshold θ. If all core nodes have a larger degree than all peripheral nodes, the ROC curve passes through (0, 1) of the unit square (Fig. 6). If the degree of core nodes and that of peripheral nodes obey similar distributions, then the ROC curve is close to the diagonal line for the entire range of θ.
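The area under such an ROC curve can be computed without tracing the curve, because it equals the probability that a randomly chosen core node has a larger degree than a randomly chosen peripheral node. The Python sketch below uses this rank-based identity; counting ties as one half is an assumption about how equal degrees are handled, and the function name `degree_auc` is hypothetical.

```python
def degree_auc(core_degrees, periphery_degrees):
    """AUC of degree as a predictor of core membership (ties count as 1/2)."""
    wins = 0.0
    for d_core in core_degrees:
        for d_peri in periphery_degrees:
            if d_core > d_peri:
                wins += 1.0
            elif d_core == d_peri:
                wins += 0.5
    return wins / (len(core_degrees) * len(periphery_degrees))

# Every core node has a larger degree than every peripheral node.
print(degree_auc([3, 3, 3], [1, 1, 1]))
# Identical degree distributions in the core and the periphery.
print(degree_auc([1, 2], [1, 2]))
```

A value close to 1 indicates that core nodes are essentially the hubs, whereas a value close to 0.5 indicates that the degree carries little information about core membership.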
The area under the curve (AUC) of each ROC curve is shown in Table II. [...] to be more frequently connected to the core nodes within the same core-periphery pair than the core nodes do (Fig. 8). For some core nodes with a large degree, $d^{\rm core}_i$ is equal to zero, which happens when a core-periphery pair has only one core node and forms a star.

C. A core-periphery pair is a community?
We compare the core-periphery pairs identified by KM-config and communities in networks.
Here we determine communities by modularity maximisation using the Louvain algorithm [52]. We run it ten times and adopt the node partition that realises the largest modularity value. Table III reports the modularity values for the node partition identified by the Louvain algorithm and that determined by KM-config, with the insignificant core-periphery pairs being included. The modularity value for the node partitioning into core-periphery pairs is close to that obtained by the modularity maximisation for most of the empirical networks.
Therefore, the detected core-periphery pairs may be similar to communities.
This result poses the question of whether a core-periphery pair is a community in the traditional sense, and if so, whether the KM-config algorithm effectively classifies the nodes in each community into a core and a periphery according to the composition of intra- and inter-community edges that each node owns. To examine this point, we analyse the role of each node using a cartographic representation of networks [67,68]. With the cartographic representation, the role of each node $i$ in a network is characterised by the standardised within-module degree $z_i \in (-\infty, \infty)$ and the participation coefficient $p_i \in [0, 1]$ [67,68]. They are defined by
$$z_i = \frac{\tilde{d}_{i,c_i} - \bar{d}_{c_i}}{\sigma_{c_i}}, \qquad p_i = 1 - \sum_{c=1}^{C}\left(\frac{\tilde{d}_{i,c}}{d_i}\right)^2,$$
where $\tilde{d}_{i,c}$ is the number of neighbours of node $i$ in the $c$th core-periphery pair ($1 \le c \le C$), $c_i$ is the core-periphery pair to which node $i$ belongs, $\bar{d}_{c_i}$ is the average of $\tilde{d}_{j,c_i}$ over the nodes $j$ in the $c_i$th core-periphery pair including the case $j = i$, and $\sigma_{c_i}$ is the unbiased estimation of the standard deviation of $\tilde{d}_{j,c_i}$ over the nodes $j$ in the $c_i$th core-periphery pair. A large $z_i$ value indicates that node $i$ has relatively many neighbours within the same core-periphery pair. The $p_i$ value is the smallest if node $i$ is adjacent only to the nodes in a single core-periphery pair and largest if node $i$ is adjacent to an equal number of nodes across all core-periphery pairs. In the cartographic representation of networks, each node $i$ is classified according to the position $(z_i, p_i)$ of the node in the $z$-$p$ space. The nodes are categorised into seven roles [67,68]. Here we do not use this categorisation rule but examine the distributions of the core and peripheral nodes in the $z$-$p$ space. Figure 9 shows $z_i$ and $p_i$ of each node for the 12 empirical networks. KM-config classifies the nodes having very large $z_i$ as core nodes. However, for the other nodes, the values of $z_i$ and $p_i$ are not predictive of whether a node is in the core or periphery.
Therefore, the core and periphery that we propose are distinct from the roles of nodes identified by the cartographic analysis.
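The two cartographic quantities can be computed with a short Python sketch. It assumes the standard definitions of the within-module degree and participation coefficient [67,68], with the standard deviation taken as the square root of the unbiased sample variance; setting $z_i = 0$ when all nodes of a pair have the same within-pair degree is a convention chosen here, and the function name `cartography` is hypothetical.

```python
import math

def cartography(n, edges, c):
    """Return (z, p): within-module degree and participation coefficient per node."""
    pairs = sorted(set(c))
    # within[i][k]: number of neighbours of node i in core-periphery pair k
    within = [dict.fromkeys(pairs, 0) for _ in range(n)]
    for i, j in edges:
        within[i][c[j]] += 1
        within[j][c[i]] += 1
    z, p = [0.0] * n, [0.0] * n
    for pair in pairs:
        nodes = [i for i in range(n) if c[i] == pair]
        own = [within[i][pair] for i in nodes]
        mean = sum(own) / len(own)
        # Unbiased sample variance (divides by the group size minus one).
        var = sum((v - mean) ** 2 for v in own) / (len(own) - 1) if len(own) > 1 else 0.0
        sd = math.sqrt(var)
        for i, v in zip(nodes, own):
            z[i] = (v - mean) / sd if sd > 0 else 0.0  # convention when sd == 0
            d_i = sum(within[i].values())
            if d_i > 0:
                p[i] = 1.0 - sum((k / d_i) ** 2 for k in within[i].values())
    return z, p

# A star in a single pair: the hub stands out in z, and p is zero everywhere.
print(cartography(4, [(0, 1), (0, 2), (0, 3)], [0, 0, 0, 0]))
```

For the star, the within-pair degrees are 3, 1, 1 and 1, so the hub has $z = 1.5$ and each leaf has $z = -0.5$.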
We find different results for the Divisive algorithm, which first divides the network into communities and then estimates the role of each node (i.e., core or periphery). With Divisive, the core nodes have larger z i than most of the peripheral nodes ( Fig. 25 in Appendix C), indicating that the core nodes detected by Divisive largely correspond to the hub nodes as identified by the cartographic analysis. This is because Divisive uses the BE algorithm to partition each community into a core and a periphery. As shown in Fig. 6, the BE algorithm classifies nodes into a core and a periphery by the degree of each node to a large extent.
Therefore, Divisive regards the nodes with a large z i as core nodes.

D. Case studies
In this section, we present case studies of some of the empirical networks analysed in the previous sections.
The core-periphery pairs in the karate club network are shown in Fig. 10. KM-config detected two significant core-periphery pairs and ten residual nodes. A majority of the members on the president side (12 members; 75%), including the president (node 34), belong to core-periphery pair 1. A majority of the members on the instructor side (11 members; 73%), including the instructor (node 1), belong to core-periphery pair 2. These results are consistent with the social conflict of the club. The residual consists of four members on the instructor side, five members on the president side and one neutral member. The significant core-periphery pairs are similar to those detected by our previous algorithm, KM-ER [19].
The core-periphery pairs in the dolphin social network are shown in Fig. 11. KM-config detected three significant core-periphery pairs and 14 residual nodes. Each core-periphery pair mostly consists of the dolphins of the same sex; there are two male-dominant core-periphery pairs (pairs 1 and 3) and one female-dominant core-periphery pair (pair 2). A previous study identified five communities in the dolphin network by modularity maximisation [4], three of which are similar to the present core-periphery pairs 1, 2 and 3.
For the network of Les Misérables, KM-config identified four significant core-periphery pairs and 40 residual nodes (Fig. 12). A majority of nodes belonging to the significant core-periphery pairs are core nodes, suggesting that each core-periphery pair resembles a community. In fact, a previous study used modularity maximisation to identify 11 communities in the same network [4], four of which are similar to our core-periphery pairs 1-4. The significant core-periphery pairs are consistent with the plot of the story; the characters in core-periphery pairs 1, 2, 3 and 4 are, respectively, the members of a revolutionary student club; the Thénardier family and a street gang; Fantine's relatives and her friends; and characters involved in Champmathieu's trial. The main characters, e.g., Valjean, Javert and Cosette, are classified as residual nodes (arrows in Fig. 12). Although they have a large degree, they are regarded as residual nodes because they belong to insignificant core-periphery pairs.
For the co-authorship network, KM-config detected 28 significant core-periphery pairs and 133 residual nodes (Fig. 13(a)). The detailed structure of core-periphery pairs 1-10 is shown in Fig. 13(b). Five core-periphery pairs (pairs 1, 5, 8, 9 and 10) have relatively many intra-core edges, many core-periphery edges and no intra-periphery edges, indicating a strong core-periphery structure. Core-periphery pair 8 contains only one peripheral node, implying a structure close to a community. Some core researchers collaborate with most of the researchers in the same core-periphery pair, e.g., A. Barabási, H. Jeong and Z. Oltvai in core-periphery pair 1, A. Vázquez and A. Vespignani in core-periphery pair 2 and S.
For the blog network, KM-config identified two core-periphery pairs and 79 residual nodes (Fig. 14). A majority of the conservative-leaning blogs belong to core-periphery pair 1, and a majority of the liberal-leaning blogs belong to core-periphery pair 2. These core-periphery pairs are similar to those identified by KM-ER [19].
For the airport network, KM-config identified 23 significant core-periphery pairs and 983 residual nodes (Fig. 15). Each core-periphery pair mainly consists of the airports in the same geographical region, which agrees with the previous results [19,68,69]. Our previous algorithm, KM-ER, detected ten core-periphery pairs, of which the three largest core-periphery pairs based in Europe, East Asia and the USA are similar to core-periphery pairs 1, 3 and 2 detected by KM-config, respectively [19]. Properties of core-periphery pairs 1-8 are shown in Table IV. Among the representative airports (i.e., the airports having the largest degree in each core-periphery pair), some peripheral airports have a larger degree than core airports, e.g., MUC (Munich) in core-periphery pair 1, SVO (Moscow) in core-periphery pair 5 and NBO (Nairobi) in core-periphery pair 8, showing that hub nodes are not always classified as core nodes.

E. Synthetic networks
The results in the previous sections suggest that KM-config tends to detect core-periphery pairs without using the node's degree as a main criterion but produces node partitioning consistent with the concept of core-periphery structure based on edge density. To confirm this point further, in this section we test the algorithms on model networks with a planted core-periphery structure composed of two core-periphery pairs (Fig. 16).
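The discrepancy between the degree distribution of core nodes and that of peripheral nodes … where λ is a mixing parameter, m rand …

We evaluate the performance of the algorithms by the difference between the planted and detected core-periphery structures. To quantify the difference, we use the variation of information (VI) [70], given by

$$\mathrm{VI} = -\sum_{(c,x)} \sum_{(\hat{c},\hat{x})} R(c,x;\hat{c},\hat{x}) \left[ \log \frac{R(c,x;\hat{c},\hat{x})}{\sum_{(\hat{c}',\hat{x}')} R(c,x;\hat{c}',\hat{x}')} + \log \frac{R(c,x;\hat{c},\hat{x})}{\sum_{(c',x')} R(c',x';\hat{c},\hat{x})} \right],$$

where $R(c,x;\hat{c},\hat{x})$ is the fraction of nodes having true label $(c,x)$ and inferred label $(\hat{c},\hat{x})$, and terms with $R(c,x;\hat{c},\hat{x}) = 0$ contribute zero.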
The VI value is the smallest (i.e., zero) if and only if the partitioning of nodes by the true labels and that by the inferred labels are identical. In the computation of the VI values, we regard the set of residual nodes as a block; technically, we set $(\hat{c}_i, \hat{x}_i) = (C + 1, 0)$ for the residual nodes. We generate 30 synthetic networks and average the VI values over the 30 generated networks.
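The VI between a planted and a detected labelling can be evaluated from label counts alone. The sketch below assumes the standard, unnormalised form of the VI with natural logarithms, i.e., the sum of the two conditional entropies; whether the values in Fig. 17 use a different logarithm base or a normalisation is not stated here. Each node label is a tuple (pair index, core flag), and residual nodes carry the label `(C + 1, 0)` as described above. The function name is hypothetical.

```python
import math
from collections import Counter

def variation_of_information(true_labels, inferred_labels):
    """Unnormalised VI (natural log) between two labellings of the same nodes.

    Each label is any hashable object, e.g. a tuple (pair index, core flag).
    """
    if len(true_labels) != len(inferred_labels):
        raise ValueError("both labellings must cover the same nodes")
    n = len(true_labels)
    joint = Counter(zip(true_labels, inferred_labels))  # counts behind R
    true_counts = Counter(true_labels)                  # marginal over true labels
    inferred_counts = Counter(inferred_labels)          # marginal over inferred labels

    vi = 0.0
    for (t, s), count in joint.items():
        r = count / n
        # Only pairs of labels that co-occur appear here, so r > 0.
        vi -= r * (math.log(r / (true_counts[t] / n))
                   + math.log(r / (inferred_counts[s] / n)))
    return vi

# Two planted pairs (C = 2); the detected labels are a renaming of the planted
# ones, except that the last node is detected as residual, i.e. (C + 1, 0).
planted = [(1, 1), (1, 0), (1, 0), (2, 1), (2, 0), (2, 0)]
detected_same = [(2, 1), (2, 0), (2, 0), (1, 1), (1, 0), (1, 0)]
detected_residual = detected_same[:-1] + [(3, 0)]

vi_same = variation_of_information(planted, detected_same)
vi_residual = variation_of_information(planted, detected_residual)
```

Because the VI compares partitions rather than label names, `vi_same` is zero even though the pair indices are swapped, whereas moving one node to the residual block makes `vi_residual` positive.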
The VI values for the Xiang, Divisive, KM-ER and KM-config algorithms are shown in Fig. 17. We do not show the results for the other three algorithms because they do not find multiple core-periphery pairs by definition. When λ is large, the VI values are large because the network is close to the configuration model and has weak core-periphery structure. The VI values for the Xiang algorithm are large in the entire λ-µ parameter space (Fig. 17(a)). This is because the Xiang algorithm did not find significant core-periphery pairs in any of the generated networks (i.e., all nodes are residual nodes). The VI values for the Divisive and KM-ER algorithms are relatively large for most λ values if some planted core nodes are non-hub nodes, i.e., µ ≥ 0.15 (Figs. 17(b) and 17(c)). The VI values for the KM-config algorithm are the smallest in most of the λ-µ parameter space (Fig. 17(d)), including the case of bipartite-like structure (µ ≥ 0.4). Therefore, KM-config, but not the other algorithms, is capable of detecting core-periphery structure even when a substantial fraction of core nodes are non-hubs and peripheral nodes are hubs.

V. DISCUSSION
We have studied core-periphery structure using the configuration model as the null model.
We have shown that discrete versions of a single core-periphery pair determined based on edge density, which many studies assume, can never be significant relative to the configuration model. Core-periphery structure beyond what one expects from the configuration model must be accompanied by other meso-scale network structure coexisting in the given network, such as another core-periphery pair, communities or bipartite-like subnetworks. This claim is consistent with the studies [19,28,30] reporting the absence of core-periphery structure when the configuration model is used as the null model. We have then presented a scalable algorithm to find core-periphery structure in networks and applied it to various networks.
Our argument does not apply to continuous versions of core-periphery structure [8,10,14,17,25], in which each node belongs to the core to a different extent. A possible extension of our present algorithm (i.e., KM-config) to the case of continuous core-periphery structure is to replace the idealised core-periphery structure defined in Eq. (3) with a continuous version of idealised core-periphery structure, such as those proposed in Refs. [8,10]. This line of investigation may reveal relationships between continuous versions of core-periphery structure, multiple core-periphery pairs and the configuration model.
Null models for networks do not have to be limited to the Erdős-Rényi random graph and the configuration model. Other null models incorporate different properties of networks such as the weight of edges [71], the sign of edge weights [72], correlations [73], bipartiteness [74] and spatial properties [75]. It is probably possible to incorporate such null models into our algorithm by modifying the null-model term of $Q^{\rm cp}_{\rm config}$ (i.e., the second term on the right-hand side of Eq. (6)).
Akin to the modularity, our quality function $Q^{\rm cp}_{\rm config}$ allows an interpretation in terms of random walks on networks. With a core-periphery structure, random walkers are likely to move from any node to a core node within the same core-periphery pair in a discrete time step. In a core-periphery structure on a small scale, the random walkers would reach the core in a small number of steps. In contrast, they would need a large number of steps to reach the core on a large scale. By regarding the number of steps as a resolution parameter, we may be able to identify core-periphery structure across different scales, as in the case of the Markov stability, where modularity maximisation with different values of the time resolution parameter provides information about hierarchical organisation of communities in networks [47-50].
Our quality function $Q^{\rm cp}_{\rm config}$ shares shortcomings with the modularity, such as the inability to find small communities [76] and to distinguish random from non-random structure [77]. Remedies for these problems include multi-resolution approaches [78] and statistical tests [79,80]. Another approach is statistical inference based on SBMs, which has been used for finding communities [6,16,30,41] and core-periphery structure [16,24,30]. Investigation of core-periphery structure with SBMs may be a topic for future study.
Finally, we have restricted ourselves to undirected and unweighted networks. It is straightforward to incorporate the weight of edges by replacing $A_{ij}$ on the right-hand side of Eq. (6) by the weight of the edge between nodes $i$ and $j$. In contrast, it is nontrivial to incorporate the direction of edges. It seems that the direction of edges can be incorporated into $Q^{\rm cp}_{\rm config}$ by allowing the adjacency matrix to be asymmetric, as in the case of modularity [81,82]. However, for modularity, this extension raises a problem [83], which may also hold true for $Q^{\rm cp}_{\rm config}$.

Appendix A: Network structure with four blocks
All possible network structures with four blocks that are consistent with Eq. (1) are shown in Fig. 18.

Appendix B: Estimating statistical significance of core-periphery structure
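Let $S$ be the total number of core-periphery pairs detected in the 500 randomised networks. Let $\tilde{q}^{(s)}$ and $\tilde{n}^{(s)}$ ($1 \le s \le S$) be the quality and size of the $s$th core-periphery pair in the randomised networks, respectively. We use the Gaussian kernel density estimator [54,55] to infer the joint probability distribution $P(q, n)$, which gives

$$P(q, n) = \frac{1}{S h^2 \sigma_q \sigma_n} \sum_{s=1}^{S} f\left( \frac{q - \tilde{q}^{(s)}}{h \sigma_q}, \frac{n - \tilde{n}^{(s)}}{h \sigma_n} \right), \tag{B1}$$

where $\sigma_q$ and $\sigma_n$ are the standard deviations of $\{\tilde{q}^{(s)}\}$ and $\{\tilde{n}^{(s)}\}$ ($1 \le s \le S$), respectively. Standard deviations $\sigma_q$ and $\sigma_n$ are defined as

$$\sigma_q = \sqrt{\frac{1}{S-1} \sum_{s=1}^{S} \left( \tilde{q}^{(s)} - \bar{q} \right)^2}, \tag{B2}$$

$$\sigma_n = \sqrt{\frac{1}{S-1} \sum_{s=1}^{S} \left( \tilde{n}^{(s)} - \bar{n} \right)^2}, \tag{B3}$$

where $\bar{q}$ and $\bar{n}$ are the average values of $\{\tilde{q}^{(s)}\}$ and $\{\tilde{n}^{(s)}\}$ ($1 \le s \le S$), respectively. In Eq. (B1), $f$ is the probability density function of the standard bivariate normal distribution given by

$$f(x, y) = \frac{1}{2\pi \sqrt{1 - \gamma^2}} \exp\left[ -\frac{x^2 - 2\gamma x y + y^2}{2 (1 - \gamma^2)} \right], \tag{B4}$$

where $\gamma$ is the Pearson correlation coefficient between $\{\tilde{q}^{(s)}\}$ and $\{\tilde{n}^{(s)}\}$, i.e.,

$$\gamma = \frac{1}{(S-1)\, \sigma_q \sigma_n} \sum_{s=1}^{S} \left( \tilde{q}^{(s)} - \bar{q} \right) \left( \tilde{n}^{(s)} - \bar{n} \right). \tag{B5}$$

In Eq. (B1), $h$ is a parameter specifying the width of the Gaussian kernel density estimator.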
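The estimator can be checked numerically with a few lines of code. The sketch below evaluates a Gaussian kernel density of the form of Eq. (B1) and the conditional probability that a pair of a given size reaches at least a given quality. Several choices in it are assumptions rather than statements about our implementation: the kernel width is set by Scott's rule for two dimensions, $h = S^{-1/6}$, the standard deviations use the unbiased estimate, and the closed form in `tail_probability` is obtained by integrating the bivariate normal kernel over the quality axis. All function names are hypothetical.

```python
import math
import numpy as np

def fit_kde(q_samples, n_samples):
    """Collect the quantities that parameterise the kernel density estimator."""
    q_s = np.asarray(q_samples, dtype=float)
    n_s = np.asarray(n_samples, dtype=float)
    return {
        "q": q_s,
        "n": n_s,
        "sigma_q": q_s.std(ddof=1),            # ASSUMPTION: unbiased estimate
        "sigma_n": n_s.std(ddof=1),
        "gamma": np.corrcoef(q_s, n_s)[0, 1],  # Pearson correlation
        "h": len(q_s) ** (-1.0 / 6.0),         # ASSUMPTION: Scott's rule, 2 dimensions
    }

def _scaled(kde, q, n):
    """Distances from (q, n) to every sample, in units of the kernel width."""
    x = (q - kde["q"]) / (kde["h"] * kde["sigma_q"])
    y = (n - kde["n"]) / (kde["h"] * kde["sigma_n"])
    return x, y

def density(kde, q, n):
    """Joint density P(q, n): average of correlated bivariate normal kernels."""
    x, y = _scaled(kde, q, n)
    g = kde["gamma"]
    kernel = np.exp(-(x**2 - 2 * g * x * y + y**2) / (2 * (1 - g**2)))
    kernel /= 2 * math.pi * math.sqrt(1 - g**2)
    return kernel.sum() / (len(x) * kde["h"] ** 2 * kde["sigma_q"] * kde["sigma_n"])

def tail_probability(kde, q, n):
    """Probability that the quality is >= q, conditioned on size n."""
    x, y = _scaled(kde, q, n)
    g = kde["gamma"]
    weights = np.exp(-0.5 * y**2)  # marginal kernel weight at size n
    arg = (x - g * y) / math.sqrt(1 - g**2)
    cdf = np.array([0.5 * (1 + math.erf(a / math.sqrt(2))) for a in arg])
    return float((weights * (1 - cdf)).sum() / weights.sum())
```

A pair detected in the original network would be judged against `tail_probability(kde, q, n)` evaluated at its own quality and size. If the size lies far from every sample, the weights underflow to zero and the ratio is undefined, so a practical implementation would need to guard against that case.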
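The probability that a core-periphery pair of size $n$ has a quality value greater than or equal to $q$ in randomised networks is computed as

$$P(\tilde{q} \ge q \mid n) = \frac{\int_{q}^{\infty} P(q', n)\, dq'}{\int_{-\infty}^{\infty} P(q', n)\, dq'}. \tag{B6}$$

The integral of each kernel over $q'$ is given by

$$\int_{q}^{\infty} f\left( \frac{q' - \tilde{q}^{(s)}}{h \sigma_q}, \frac{n - \tilde{n}^{(s)}}{h \sigma_n} \right) dq' = \frac{h \sigma_q}{\sqrt{2\pi}} \exp\left[ -\frac{(n - \tilde{n}^{(s)})^2}{2 h^2 \sigma_n^2} \right] \left[ 1 - \Phi\left( \frac{\sigma_n (q - \tilde{q}^{(s)}) - \gamma \sigma_q (n - \tilde{n}^{(s)})}{h \sigma_q \sigma_n \sqrt{1 - \gamma^2}} \right) \right], \tag{B7}$$

where $\Phi(y) = (2\pi)^{-1/2} \int_{-\infty}^{y} \exp(-z^2/2)\, dz$ is the cumulative distribution function of the standard normal distribution. Substitution of Eq. (B7) into Eq. (B6) yields

$$P(\tilde{q} \ge q \mid n) = \frac{\sum_{s=1}^{S} \exp\left[ -\frac{(n - \tilde{n}^{(s)})^2}{2 h^2 \sigma_n^2} \right] \left[ 1 - \Phi\left( \frac{\sigma_n (q - \tilde{q}^{(s)}) - \gamma \sigma_q (n - \tilde{n}^{(s)})}{h \sigma_q \sigma_n \sqrt{1 - \gamma^2}} \right) \right]}{\sum_{s=1}^{S} \exp\left[ -\frac{(n - \tilde{n}^{(s)})^2}{2 h^2 \sigma_n^2} \right]}. \tag{B8}$$

… [57], network of characters in Les Misérables [58], Enron email network [59], network of jazz musicians [60], co-authorship network in network science [46], political blog network [5], worldwide airport network [61,62], protein-protein interaction network [63,64], network of chess players [65], co-authorship network in the arXiv astro-ph section [66] and the Internet at the level of AS [65]. We exclude isolated nodes and self-loops from the networks. We count the multi-edges between a pair of nodes as a single edge.

TABLE IV: Property of the eight largest significant core-periphery pairs in the airport network. The representative airports of each core-periphery pair are defined as four core and four peripheral airports having the largest degree. The territory is defined as the country where the airport is located. If the airport is located in a sovereign state, we instead show the name of the state. IATA is a three-letter code of an airport assigned by the International Air Transport Association.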