Cluster-based topological features of nodes in a multiplex network—from a network of networks perspective

In aggregated multiplex systems, nodes inherit part of the topological features of their original system. In the world economic system, for example, national policies and resource distribution give domestic industries both industry advantages and resource advantages. Although they are among the important original topological features of nodes, inherited topological features have received little attention and have hardly been analyzed by existing network models. In our research, we define the inherited topological features of nodes as ‘cluster-based topological features’. To accurately calculate the cluster-based topological features of nodes in multiplex networks, we first provide a network model that summarizes a multiplex network as a calculable network of networks (NoN). Based on this model, we propose a series of algorithms for calculating industries’ cluster-based topological features. The calculation process contains two steps: ‘abstracting’ the NoN into a calculable one-layer network, and ‘inheriting’ the subnetworks’ topological features into their inner nodes. We apply the model and the algorithms to a series of theoretical and empirical multiplex networks. The results not only confirm the realizability of our model but also produce several interesting findings, the most important of which is that some unremarkable nodes in a multiplex network may have a very high contributory value from the NoN perspective.


Introduction
The statistical mechanics of networks, including topological features and clusters, is one of the most critical issues in both theoretical and real-world networks and has attracted much attention from researchers [1][2][3][4][5][6][7][8][9][10][11][12]. Calculating the statistical features of nodes more precisely is one of the main challenges in this research field [3,13]. Recently, with the development of economics and technology, real-world networks have become much more complex: multiple kinds of independent social and economic networks are connected and evolve into multiplex social or economic networks (where multiple nodes and edges coexist in one network system) [14][15][16][17][18][19][20][21][22][23][24][25][26][27]. To calculate nodes' statistical features precisely, previous studies calculated topological features (such as centrality indexes, PageRank and LeaderRank) [1,3], separating processes (such as k-core and k-shell) and h-indexes [5,28]. Other studies calculated cluster characteristics based on 'communities', which are defined from the network structure [4,29]. However, previous studies insufficiently address the multiplex network environment, which exists widely in social and economic systems.
Under this circumstance, to calculate the topological features of nodes more precisely, we propose 'cluster-based' topological features. The idea is that, as independent clusters gather into one system, nodes inherit part of the topological features of their original clusters. For example, in global economic multiplex networks, countries contain various types of nodes, such as companies and industries. In other words, countries are the original clusters of the industries and companies. Thus, companies and industries benefit from their nation's policies and inherit parts of their countries' development strategies. Moreover, this 'inheriting from cluster' phenomenon exists widely in social and natural systems, such as researchers' cooperation networks and ants' cooperation networks. Researchers inherit ideas and experiences from their clusters, or research groups, just as ants inherit pheromones from their clusters, or colonies. Because the 'inheriting' phenomenon is so widespread in real-world systems, precisely calculating inherited topological features helps in better understanding nodes' functions and contributions in multiplex networks and real-world systems. Therefore, our primary research goal is to quantify how clusters' topological features are inherited and how much nodes inherit from their clusters in multiplex network environments.
To turn the multiplex network into a calculable network model, we propose a network of networks (NoN) model. The basic idea of the NoN is to abstract the clusters or subnetworks (such as countries) into nodes and to represent the relationships between clusters as edges, as shown in figure 1. In this way, the multiplex network is abstracted into a calculable one-layer network in which the topological features of each cluster can be calculated. These features are then allocated to the cluster's members, its inner nodes, according to their contributions. This allocation process also recovers the information lost when the multiplex network is abstracted into a one-layer network. Based on this 'abstracting-allocating' calculation process, the cluster-based statistical features (such as cluster-based topological features) of each node in the multiplex network can be effectively calculated.
In this article, we provide two calculation processes for cluster-based statistical features, from both theoretical and mathematical approaches: an abstracting process (abstracting the multiplex network into a one-layer network and calculating the statistical features of each cluster) and an allocating process (allocating the statistical features of each cluster to the cluster's inner nodes based on the nodes' contributions). To verify the realizability and computability of our model, we applied our calculation algorithm to five types of theoretical NoN and compared the results with a traditional one-layer network analysis model. Moreover, for the empirical analysis of our model, we calculated the cluster-based topological features of each industry based on the global input-output table and compared the results with the theoretical NoN. We chose this network as our empirical case study because it is a typical NoN: the interactions within each country differ from the interactions between countries. Our approach not only extends some single-layer network metrics to the NoN but also computes the topological features that are inherited from the subnetworks. These inherited features can often affect the development of a node to a large extent. For example, in the collaboration network of scientists, different research organizations and academic schools have formed several relatively close 'academic collaboration subnetworks'. Scholars located in different subnetworks continue to learn and inherit the academic resources, academic ideas and experience of other scholars in their subnetworks. The main research goal of our model is to calculate these inherited topological features. In the following sections, we describe these steps in detail and provide the mathematical formulation of each step.

Simulating multiplex network into NoN
In this section, we provide the construction definition and algorithm of the NoN model. This model will be helpful in abstracting the relationships among multiple real-world social groups or multiplex social systems into a calculable, integrated network model.
In the mathematical approach, the algorithm for the NoN construction can be described in matrix manipulations. The matrix of all large networks can be described as:
$$\begin{pmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nn} \end{pmatrix}$$
where $w_{jk}$ represents the weight of the edge between node j (which is the subnetwork J in the multiplex network) and node k (which is the subnetwork K in the multiplex network), and n is the total number of nodes in the NoN. Based on the matrix of the NoN, the topological properties of each node (subnetwork) can be calculated.
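The abstraction step described above can be illustrated with a minimal Python sketch. It assumes that the value carried by each intersubnetwork edge is simply summed into the corresponding NoN entry. The function name `build_non_matrix`, the `membership` mapping and the toy data are hypothetical names introduced only for this illustration; the per-edge value could be a raw weight or the connecting ability defined in the next subsection.

```python
def build_non_matrix(membership, edges):
    """Aggregate node-level edges into a subnetwork-level (NoN) matrix.

    membership: dict mapping each original node to its subnetwork label.
    edges: iterable of (source, target, value) for directed node-level edges.
    Returns (labels, matrix); matrix[j][k] is the summed value of all edges
    that run from subnetwork labels[j] to subnetwork labels[k].
    Assumption: summation is used as the merging rule.
    """
    labels = sorted(set(membership.values()))
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for source, target, value in edges:
        j = index[membership[source]]
        k = index[membership[target]]
        if j != k:  # edges inside one subnetwork do not become NoN edges
            matrix[j][k] += value
    return labels, matrix


# Toy multiplex system: three subnetworks A, B and C (hypothetical data).
membership = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
edges = [
    ("a1", "a2", 5.0),  # inside A, ignored at the NoN level
    ("a1", "b1", 2.0),
    ("a2", "b1", 1.5),
    ("b1", "c1", 4.0),
    ("c1", "a2", 0.5),
]
labels, W = build_non_matrix(membership, edges)
for label, row in zip(labels, W):
    print(label, row)
```

Each row of the resulting matrix lists the outgoing NoN weights of one subnetwork, so the one-layer indexes of the subnetworks can be computed directly on it.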

The calculation algorithm of edges' weights in NoN
The edge between nodes j and k in the NoN is formed by merging the intersubnetwork edges between subnetworks J and K, as shown in figure 2. Briefly, the edges in the NoN represent the interactions among different subnetworks. The weights of these edges measure the tightness or information flow among different clusters. Thus, the weight of the edge between nodes j and k should contain all the connecting ability of the internetwork edges between subnetworks J and K.
In a relational network, the weights of internetwork relationships usually represent the 'interaction frequency' rather than the 'ability'. In social networks, for example, the propagation path of a piece of information does not necessarily follow the highest-weighted edges but follows the shortest pathway, because effective information is usually the information received first. Therefore, we introduce an index to calculate the 'ability' of internetwork relational edges: the internetwork betweenness centrality (BC) of edges, which is based on the shortest pathways that pass through edge (l, k). We believe that the shortest pathways reflect information transmission channels. Thus, internetwork BC reflects the transmission topological features of internetwork edges: a larger internetwork BC of an edge means that more information is transmitted through that edge. Based on this index, we multiply the edge's internetwork BC by the edge's weight to represent the connecting ability of the edge. The weights of the edges in the NoN are therefore calculated from these products, where $w_{lk}^{NoRNs}$ represents the weight of the edge between nodes l and k in the NoN, which respectively simulate networks L and K, and $w_{lk}$ represents the weight of the edge between nodes l and k in the original network. Based on the weights of the edges in the NoN, the network matrix of the NoN can be calculated as shown by formula (2).
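As a rough illustration of this weighting rule, the following Python sketch multiplies each intersubnetwork edge's betweenness by its weight and sums the products per pair of subnetworks. Several points are assumptions rather than statements from the text: edge betweenness is computed over the whole directed multiplex network with hop-count shortest paths (a Brandes-style accumulation), all source-target pairs are counted, and summation is used as the merging rule. The function names and the toy network are hypothetical, and the graph is assumed to have no duplicate edges.

```python
from collections import deque


def edge_betweenness(nodes, edges):
    """Edge betweenness of a directed, unweighted graph.

    For every edge, sums over all ordered pairs (s, t) the fraction of
    shortest s-t paths that use the edge (Brandes-style accumulation).
    """
    succ = {v: [] for v in nodes}
    for u, v in edges:
        succ[u].append(v)
    bc = {(u, v): 0.0 for u, v in edges}
    for s in nodes:
        dist = {s: 0}
        sigma = {v: 0.0 for v in nodes}  # number of shortest paths from s
        sigma[s] = 1.0
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in succ[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in nodes}
        for v in reversed(order):  # farthest nodes first
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                bc[(u, v)] += share
                delta[u] += share
    return bc


def non_edge_weights(membership, weighted_edges):
    """NoN weights as the sum of (edge betweenness x weight) over
    intersubnetwork edges; summation is an assumption of this sketch."""
    bc = edge_betweenness(list(membership), [(u, v) for u, v, _ in weighted_edges])
    weights = {}
    for u, v, w in weighted_edges:
        J, K = membership[u], membership[v]
        if J != K:
            weights[(J, K)] = weights.get((J, K), 0.0) + bc[(u, v)] * w
    return weights


# Toy example with two subnetworks (hypothetical data).
membership = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
weighted_edges = [
    ("a1", "a2", 1.0),
    ("a2", "b1", 2.0),
    ("a1", "b2", 3.0),
    ("b1", "b2", 1.0),
]
print(non_edge_weights(membership, weighted_edges))
```

In this toy network the edge a2-b1 lies on three shortest pathways and the edge a1-b2 on one, so the lower-weight edge a2-b1 contributes more connecting ability (3 x 2.0) than the higher-weight edge a1-b2 (1 x 3.0).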

Topology indexes of clusters
In this section, several essential indexes for calculating the topological features of nodes in the NoN are provided using both theoretical and mathematical approaches. These indexes include In-Degree, Out-Degree, In-Strength, Out-Strength, BC and closeness centrality (CC), respectively calculated by the following formulas.
where $ID_r$, $OD_r$, $IS_r$, $OS_r$, $BC_r$ and $CC_r$ respectively represent the In-Degree, Out-Degree, In-Strength, Out-Strength, BC and CC of node r in the NoN; $a_{ir}$ represents the connecting condition, which is the element of the adjacency matrix; $w_{ir}$ represents the weights of the edges connected with node r; $d_{ri}$ represents the distance between nodes r and i; $g_{pq}$ represents all of the shortest pathways between nodes p and q; $g_{pq}(r)$ represents the shortest pathways that pass through node r; and n represents the total number of nodes in the NoN.
These indexes measure three kinds of information transmission 'ability' of the nodes in the NoN: information impacting ability (Degree and Strength), information intervening ability (BC), and information anti-intervening ability (CC). A higher Strength indicates a higher frequency of information exchange and a wider scope of information spreading, which leads to a broader and stronger impacting ability. A higher BC means that more information transmission channels pass through the node, so the node can intervene more in the information transmission process. A higher CC means that information spreading from the node passes through fewer other nodes, so other nodes intervene less in the transmission process. Based on these indexes, we can calculate the essential characteristics of each small primary network within the whole system. Our next step is to pass these topological features on to the inner nodes of each subnetwork, to achieve the main purpose of this research: calculating the topological properties of individuals in the NoN.
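For orientation, the six indexes can be sketched in Python using their commonly used textbook definitions, which match the symbols defined above. These definitions are assumptions of the sketch, not a reproduction of the formulas: degree and strength are column and row sums of the NoN matrix, distances are hop counts along directed edges, CC is normalized as (n - 1) divided by the sum of distances, and BC is the unnormalized sum of shortest-path fractions. All function names are hypothetical.

```python
from collections import deque


def degree_and_strength(W):
    """In/out degree and in/out strength of every node of a weight matrix W,
    where W[i][r] > 0 means a directed edge from node i to node r."""
    n = len(W)
    in_degree = [sum(1 for i in range(n) if W[i][r] > 0) for r in range(n)]
    out_degree = [sum(1 for i in range(n) if W[r][i] > 0) for r in range(n)]
    in_strength = [sum(W[i][r] for i in range(n)) for r in range(n)]
    out_strength = [sum(W[r][i] for i in range(n)) for r in range(n)]
    return in_degree, out_degree, in_strength, out_strength


def closeness(W, r):
    """(n - 1) / sum of hop distances from r; 0.0 if a node is unreachable."""
    n = len(W)
    dist = {r: 0}
    queue = deque([r])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if W[u][v] > 0 and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if n < 2 or len(dist) < n:
        return 0.0
    return (n - 1) / sum(dist.values())


def betweenness(W):
    """Unnormalized node betweenness on hop-count shortest paths (Brandes)."""
    n = len(W)
    bc = [0.0] * n
    for s in range(n):
        dist = {s: 0}
        sigma = [0.0] * n  # number of shortest paths from s
        sigma[s] = 1.0
        preds = [[] for _ in range(n)]
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in range(n):
                if W[u][v] > 0:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        queue.append(v)
                    if dist[v] == dist[u] + 1:
                        sigma[v] += sigma[u]
                        preds[v].append(u)
        delta = [0.0] * n
        for v in reversed(order):  # farthest nodes first
            for u in preds[v]:
                delta[u] += sigma[u] / sigma[v] * (1.0 + delta[v])
            if v != s:
                bc[v] += delta[v]
    return bc


# Toy NoN: a directed cycle 0 -> 1 -> 2 -> 0 (hypothetical data).
W = [[0.0, 2.0, 0.0], [0.0, 0.0, 3.0], [1.0, 0.0, 0.0]]
print(degree_and_strength(W))
print([closeness(W, r) for r in range(3)])
print(betweenness(W))
```

If the distances $d_{ri}$ are meant to be weighted rather than hop counts, the breadth-first searches would need to be replaced by a weighted shortest-path routine.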

Allocating algorithms of each index
Based on the topological indexes, three topological features of each subnetwork in the whole system can be calculated. However, the abstracting process loses some information inside the subnetworks: the topological features of the inner nodes cannot be calculated. In this section, we provide an algorithm for 'allocating' the subnetworks' topological features to their inner nodes. Because the indexes reflect different information transmission features, the allocating algorithm differs for each index.
Degree and Strength represent the influencing scope and intensity of nodes, mainly reflecting the information diffusibility of the entire subnetwork. Inside the subnetwork, the intercluster connecting nodes (which have connections with other subnetworks) contribute the outward-diffusing channels of the entire group, and the internal individuals also contribute information sources and information processing to the subnetwork. Thus, the allocation of Degree and Strength should evaluate the information dominance of each node and its neighbors. To achieve this purpose, we propose an index called self-eigenvector-centrality (SEC), which can be described as follows: where $SEC_p^J$ represents the SEC of node p in network J, $\mathbf{x}$ represents the eigenvector of the network matrix that corresponds to its largest eigenvalue, $x_q$ and $x_p$ respectively represent the entries of nodes q and p in $\mathbf{x}$, and $\lambda$ represents the eigenvalue of the matrix. Taking the SEC as the evaluation index of the contribution to information diffusibility, the Degree and Strength of the group can be passed on to every node in the group. The A-ID (inherited In-Degree), A-OD (inherited Out-Degree), A-IS (inherited In-Strength) and A-OS (inherited Out-Strength) of each node are described as follows: where $AID_p$, $AOD_p$, $AIS_p$ and $AOS_p$ respectively represent the A-ID, A-OD, A-IS and A-OS of node p, $\sum_j SEC_j^J$ represents the sum of the SEC of all nodes in network J, $S_J$ is the strength of network J, and $AS_p$ reflects the information diffusibility of each individual in the whole system.
BC and CC represent the betweenness capability and centered degree of information transmission, mainly reflecting the influence on information transmitting channels or, in other words, the influence on the shortest pathways. We believe that information is transmitted through the shortest pathways not only in the NoN but also inside each small group.
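The allocation of Degree and Strength described above can be sketched in Python under one assumption: each node receives a share of its subnetwork's NoN-level value that is proportional to its SEC divided by the sum of the SEC of all nodes in the subnetwork. The function `allocate` and the scores below are hypothetical; the scores stand in for SEC values and are not computed from the SEC definition.

```python
def allocate(cluster_value, contributions):
    """Split a subnetwork-level value among its inner nodes in proportion
    to their contribution scores (placeholders for SEC values)."""
    total = sum(contributions.values())
    if total == 0:
        raise ValueError("contribution scores must not all be zero")
    return {node: cluster_value * score / total
            for node, score in contributions.items()}


# Hypothetical contribution scores of three inner nodes of subnetwork J,
# and a hypothetical NoN-level strength S_J = 8.0 for that subnetwork.
sec_scores = {"p1": 0.5, "p2": 0.25, "p3": 0.25}
inherited_strength = allocate(8.0, sec_scores)
print(inherited_strength)
```

Because the shares sum to one, the inherited values of the inner nodes add up to the value of their subnetwork, so nothing is lost or double-counted when moving from the NoN back to the node level.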
Thus, nodes' contributions to BC and CC are evaluated based on their ability to influence the shortest pathways. The A-BC (inherited betweenness centrality) and A-CC (inherited closeness centrality) of the nodes in small groups can be respectively described as: where $ABC_p^J$ and $ACC_p^J$ respectively represent the A-BC and A-CC of node p in network J, $BC_p^J$ and $CC_p^J$ respectively represent the BC and CC of node p in network J, and $BC_J$ and $CC_J$ respectively represent the BC and CC of network J in the NoN. Based on the indexes A-Str, A-BC and A-CC, the topological centrality of nodes in an agglomerated social and economic system can be calculated from the network-of-networks perspective. To verify the effectiveness and validity of our model, we set up five groups of theoretical NoN models containing scale-free networks, small-world networks and random networks. Additionally, to highlight the specific information calculated by our model, we also analyzed the topological centralities of nodes in the theoretical models from a traditional network analysis perspective, as a contrast. All results are analyzed and discussed in the following section.

Results

Theoretical results
To verify the effectiveness and validity of our model, and to draw general theoretical conclusions from it, we apply the model to five groups of theoretical NoN: SF-SF (scale-free and scale-free), SF-SW (scale-free and small-world), SW-SF (small-world and scale-free), SW-SW (small-world and small-world) and RD-RD (random and random). SF-SF contains 100 scale-free networks connected with scale-free characteristics. SF-SW contains 100 scale-free networks connected with small-world characteristics. SW-SF contains 100 small-world networks connected with scale-free characteristics. SW-SW contains 100 small-world networks connected with small-world characteristics. RD-RD contains 100 random networks connected randomly. The schematic figures of these NoN models are shown in figure 3. For these theoretical NoN models, we calculate the influence capability, betweenness capability and centered degree of nodes based on the indexes A-Str, A-BC and A-CC. The distributions of these indexes are shown in figure 4.
The distributions of A-Str, A-BC and A-CC differ, which means that the influence capability (A-Str), betweenness capability (A-BC) and centered degree (A-CC) of nodes are distributed differently in different NoN. Overall, the distributions of betweenness capability have a stronger scale-free characteristic (figure 4(A)), whereas the centered degree is distributed randomly (figure 4(C)). The distribution of the SW-SW network model is similar to that of the RD-RD model, which means that the randomness of SW-SW is relatively strong (almost as strong as that of the random network). To compare the distributions of the NoN, all curves are fitted with a mathematical distribution.
The fitting results are listed in table 1. Based on the results of the mathematical fitting, almost all distribution curves of A-Str and A-CC (except those of SF-SF) fit a normal distribution. The distribution curves of A-BC for the NoN with a power-law character (SF-SF, SF-SW and SW-SF) fit a power-law distribution. Additionally, we calculate the Mode and Divergence of the normal distribution curves. The mode reflects the majority value (of A-Str, A-BC or A-CC), where a higher mode indicates a lower concentration (fewer nodes have a high value). Divergence reflects the randomization of the distribution, where a higher divergence indicates higher randomization. The highest mode and the highest divergence of the distributions of A-Str, A-BC and A-CC belong to RD-RD and SW-SW (and once to SW-SF). This indicates that the influence capability and
(2) In the highly random SW-SW, the correlation of the two indicators is more random, which means when the randomness increases, the two kinds of indicators become more correlated.

Case study: The global input-output NoN
To examine our theoretical model and method and to identify the highest contribution industries, we construct a global input-output network of networks and calculate the contribution topological features (BC, CC, degree and strength) of each industry. However, the high-contributory industries identified by statistical data and the traditional one-layer complex network are the exterior ones, i.e. the fastest-developing industries or the greatest export industries. These results ignore implied contributory industries, i.e. supporting industries and educational industries. The contributions of the Public Administration and the Education, Health and Other Services industries cannot be ignored, despite their low statistical values. Under this circumstance, we applied our
To find the most intervening industries of the system, we also calculate the highest contribution industries for target monitoring and target controlling. The results are shown in S-table 1. The highest contribution industries for betweenness capability are Financial, Retail, Recycling and, surprisingly, Education and Health and Public Administration. This means that some support industries do make a large contribution to the betweenness capability. Because the distribution of centered degree is highly random, the corresponding results are irregular, and the highest contribution industries are mainly from relatively small and underdeveloped countries.
The results for the highest contribution industries of diffusion capacity are much more regular. Most of these industries are from relatively large or developed countries, such as Germany, the United States, Japan, China, Belgium, France, Spain and the United Kingdom. However, the highest contribution industries in these countries are not Financial and Manufacturing: Education, Health and Public Administration contribute more than some of the major industries. Combined with the results for betweenness capability, we find that the contribution of supporting industries (such as Education, Health and Public Administration) is greater than that of some major industries (such as Manufacturing and Financial). In this circumstance, supporting industries may be more important than major industries.

Discussion
To calculate topological features that carry the characteristics of the original clusters in a multiplex network environment, we provide a calculation model from the perspective of the NoN (network of networks). The calculation process of the model contains four steps: (1) NoN construction; (2) weight calculation of the edges; (3) definition of the topology indexes of the subnetworks in the NoN; and (4) definition of the allocation process of each index to the nodes inside the subnetworks. Based on this calculation process, we can calculate three inherited topological centralities: A-Str (inherited strength), A-BC (inherited betweenness centrality) and A-CC (inherited closeness centrality). The three inherited topological centralities respectively represent the influence capability (A-Str), betweenness capability (A-BC) and centered degree (A-CC) of the nodes in the NoN. To verify the effectiveness and validity of our model and to develop general theoretical conclusions, we apply the model to a series of theoretical NoN constructed from SF (scale-free) networks and SW (small-world) networks and to the empirical global input-output network of networks.
Based on the theoretical experiments, we reach the following conclusions: (1) Our model is achievable and calculable in theoretical NoN. The results for the three new topological centralities are valid and follow definite mathematical distributions. (2) The concentration of node A-BC is higher, and the randomness of A-CC is higher. (3) A scale-free character promotes a stronger concentration of the NoN, and a small-world character causes strong randomness. (4) Under high concentration, the distributions of the inherited topological centralities (A-BC and A-Str) and the distributions of the one-layer network topological centralities are negatively correlated. As the randomness increases, the distributions of the inherited topological centralities and the one-layer topological centralities become positively correlated.
The results of the empirical application verify that our model is achievable and calculable in real-world systems, and they also produce exciting findings regarding the highest contribution industries. The highest contribution industries are not only major industries (such as Financial and Manufacturing); some support industries (such as Education, Health and Public Administration) also contribute substantially. From the contribution perspective, these supporting industries are even more critical than major industries. Note that the computational complexity increases considerably when this calculation method is applied to huge networks, especially for detecting shortest pathways. Thus, when applying our model to huge real-world networks, we use K-path centrality (for both nodes and edges) instead of shortest-pathway-related centralities [30][31][32][33].
To extend the application scope of our calculation model, our next study will apply the model to other real-world social and economic systems. To achieve more accurate and abundant results, we will also explore more inherited topological centralities in future research.