Nonlinear Structural Fusion for Multiplex Network

Many real-world complex systems have multiple types of relations between their components, and they are popularly modeled as multiplex networks with each type of relation as one layer. Since the fusion analysis of multiplex networks can provide comprehensive insight, the structural information fusion of multiplex networks has become a crucial issue. However, most existing data fusion methods cannot be applied directly to complex network analysis. The feature-based fusion methods ignore the sharing and complementarity of interlayer structural information. To tackle this problem, we propose a multiplex network structural fusion (MNSF) model, which can construct a network with comprehensive information. It is composed of two modules: the network feature extraction (NFE) module and the network structural fusion (NSF) module. (1) In NFE, MNSF first extracts a low-dimensional vector representation of each node from each layer. Then, we construct a node similarity network based on the embedding matrices and the K-D tree algorithm. (2) In NSF, we present a nonlinear enhanced iterative fusion (EIF) strategy. EIF can strengthen high-weight edges present in one (i.e., complementary information) or more (i.e., shared information) networks and weaken low-weight edges (i.e., redundant information). The retention of low-weight edges shared by all layers depends on the tightness of connections of their K-order proximity. The usage of higher-order proximity in EIF alleviates the dependence on the quality of node embedding. Besides, the fused network can be easily exploited by traditional single-layer network analysis methods. Experiments on real-world networks demonstrate that MNSF outperforms the state-of-the-art methods in the tasks of link prediction and shared community detection.


Introduction
The abundant relation data between entities can be collected from various sources or scenarios, allowing a slew of problems to be better solved in different application domains, e.g., information retrieval, cross-media computing, science and technology management, business intelligence, biomedicine, and ecology [1,2]. Taken together, these types and sources of data may give a more accurate and nuanced picture of network structure than any individual network alone [3].
We believe that the joint analysis of multiple sources/types of network data can provide a more accurate and comprehensive perspective. (1) In relation extraction tasks of multisource and multimodal data [4][5][6], networks can be extracted from video, text, and audio, respectively. Each network only reflects the connectivity among nodes in a single view. Therefore, data analysis results can be easily misinterpreted if we rely only on data from a single source or modality. (2) In the process of knowledge graph fusion [7], to obtain a more informative knowledge graph, we need to integrate the existing knowledge graph with other specialized knowledge graphs. (3) In social network data analysis [8,9], many new online social networks have emerged and started to provide services, and the information available for the users in these emerging networks tends to be limited. The abundant information available in mature networks can be quite useful for link prediction and community detection in the emerging networks. (4) In biological multiomic data research [10,11], by using the individual's expression in each omic, researchers can construct networks of different omics and then fuse these networks into one comprehensive network to achieve more accurate prediction and analysis. In short, the information fusion of multiplex networks in different application scenarios is a crucial issue that deserves more attention. We represent such networks as multiplex networks [12]. A multiplex network is a type of multilayer network [13] in which the same set of nodes is connected by different types of relationships. Multiplex networks can not only present the intralayer links between nodes but also model the interlayer dependencies and interactions well, whereas the latter are ignored by heterogeneous information network models [14].
On the left side of Figure 1, the three layers of the multiplex network are derived from three types of data, and each layer has the same number of nodes. Nodes connected by dotted lines represent the same entity or user.
In order to fuse multiple network data, inspired by the idea of data fusion, we divide the existing methods into four categories: (1) network structural (data-level) fusion methods [15,16]; (2) network feature (feature-level) fusion methods [17,18]; (3) network analysis model (strategy-level) fusion methods [19,20]; and (4) hybrid fusion methods. In this paper, we mainly focus on the first two categories. Network feature fusion mainly utilizes a network embedding method for multiplex networks to fuse the multiple features of each node into a comprehensive feature/tensor. Network structural fusion mainly aims to preserve the shared and complementary information of the network and fuse the structural information of multiplex networks. Although the forceful aggregation of multiplex networks may result in loss of information [21], if the lost information is shared by some layers, the loss will not affect the actual network information because of redundancy. If the lost information is complementary, the aggregation genuinely loses information.
The existing feature-level fusion methods based on multiplex network embedding cannot clearly distinguish the shared and complementary information of network structure. Similarity network fusion (SNF) [22] can fuse the structural information of networks. Nevertheless, SNF operates on feature matrices, which are easy to compute for structured data but are not directly applicable to the graph domain. Furthermore, the existing structural fusion methods based on SNF only consider adjacency information and ignore higher-order proximity information. Overall, these problems severely limit the effectiveness of conjoint analysis and mining of multiple network data with heterogeneous information. Solving them raises the following challenges. (1) How to alleviate the dramatic increase in computational complexity as the number of network layers grows when filtering redundant or uninformative information of multiplex networks? (2) How to effectively fuse the information of different layers, taking advantage of the complementarity in the network data? (3) How to reduce the sensitivity and dependence of the model on parameters and network embedding quality?
Aiming at these challenges, we model multiple network data as multiplex networks. We propose a multiplex network deep structural fusion model (MNSF) to construct a single-layer network with comprehensive information. The model can not only extract structural features but also filter redundant information and fuse complementary information. We present a method to construct different higher-order proximity networks based on network embedding matrices. The network constructed by our method can provide more abundant information and improve the robustness of MNSF. We also design a nonlinear enhanced iterative fusion (EIF) strategy. To make each layer more similar to the others, EIF utilizes higher-order proximity and message-passing theory to iteratively update the similarity matrix of each layer. We perform link prediction and shared community detection tasks on a variety of real-world multiplex network datasets. The results show that MNSF outperforms other state-of-the-art algorithms in terms of performance and efficiency. According to [23], the feature vectors of the benchmark datasets already provide much useful information, and the graph structure only provides a means of data denoising. Therefore, MNSF achieves noise reduction of network features by using the similarity network constructed from the node embeddings to refine the features of the original networks. The rest of the paper is organized as follows. Section 2 presents some feature-level and structure-level fusion methods. Section 3 introduces related definitions of the data model we use and problem formulations. Section 4 presents our model and core algorithm. Section 5 shows the experiment results. Finally, the summary and outlook are described in Section 6.

Related Work
Network fusion is the process of integrating multiple network structures and additional information to produce more comprehensive, accurate, and useful information than that provided by any individual network data. We mainly focus on two categories of network fusion methods: (1) network feature fusion and (2) network structural fusion. In this section, we introduce and summarize the related work from these two aspects.

Network Feature Fusion.
The idea of network feature fusion is mainly based on network representation learning, aiming at mapping the multiple features of nodes in different layers into low-dimensional representation spaces. The goal of network feature fusion methods is to fuse the information of multiple networks' features; these methods can be divided into coordinated representation fusion and joint representation fusion [24], as shown in Figure 2. Zhang et al. [27] proposed a scalable multiplex network embedding method, which assumes that the same node in multiple networks preserves certain common features and unique features of each layer. Thus, the common and unique embeddings of nodes in each layer are learned by the DeepWalk algorithm separately. Ma et al. [28] implemented node embedding for multidimensional networks with a hierarchical structure. The method adds up the node embeddings in multiple dimensions as the fusion feature of nodes in multiple networks. Based on the simultaneous modeling of two properties of multiview networks identified in the real world (preservation and cooperation), Shi et al. [29] proposed a feasible multiview network embedding algorithm, MVN2VEC. Matsuno et al. [30] presented a multilayer network embedding method that captures and characterizes each layer's connectivity. The method utilizes the overall structure to consider the sharing or complementarity of the layer structure. The fusion feature of nodes in a multiplex network is obtained by combining the node embedding in each layer with layer vectors. In order to improve the performance of the existing embedding algorithms, Al-Sayouri et al. [31] proposed a tensor-based node embedding method, which constructs an explicit view based on a connection matrix and an implicit view through the nearest neighbors.
For the multinetwork embedding task, DMNE [32] is very flexible and can embed networks of different scales, weighted or unweighted and directed or undirected, into a low-dimensional space. DMNE embeds each network independently and then uses a joint regularization method to achieve the fusion of multiple features. Although the methods mentioned above implement the fusion of network information by shared/common embedding methods, the final output is the embedding of nodes in different layers, rather than a fusion representation (i.e., a fusion vector/tensor).

Joint Representation Fusion.
The OhmNet framework was used to learn features of proteins in different tissues in [15]. The authors represented each tissue as a network, where nodes represent proteins. Individual tissue networks act as layers in a multilayer network, and a hierarchy models the dependencies between the layers (i.e., tissues). Recently, Liu et al. [16] extended standard graph mining to multilayer networks. The proposed methods ("PMNE(n)," "PMNE(r)," and "PMNE(c)") can project a multilayer network onto a continuous vector space. On the one hand, without leveraging interactions among layers, "PMNE(n)" and "PMNE(r)" apply the standard network embedding method to the merged graph or to each layer to find a vector space for the multilayer network. On the other hand, in order to consider the influence of interactions among layers, "PMNE(c)" extends an arbitrary single-layer network embedding method to a multilayer network.

Network Structural Fusion.
The main idea of network structural fusion is to follow the principle of sharing and complementarity when fusing the structure of the network. Hristova et al. [33] studied the geo-social properties of multiplex links spanning more than one social network. They applied structural and interaction features to the problem of link prediction across social networking services. Deng et al. [34] observed that social networks are multisource and heterogeneous. Each network represents a specific relationship, and each node has different roles in different relationships. Therefore, they proposed an optimal linear combination method for learning multiple relationships and extracted the connectivity of multisource social networks. Michele et al. [35] proposed a method that preserves the information of the network in each dimension as much as possible and maps the network into a single-layer network. Tang et al. [36] proposed a multidimensional network fusion method based on structural features and conducted community detection in multisource networks. They also analyzed four possible network integration strategies: network integration, benefit integration, feature integration, and partition integration.
Wang et al. [22] proposed an important similarity network fusion method. Based on information propagation theory, the method iteratively updates each network so that the multiple networks become as similar as possible. The resulting network contains the structural information of multiple networks. Manlio et al. [37] presented a dimension reduction method for reducing the number of layers in multilayer networks based on quantum theory. This method can maximize the resolution of original and aggregated networks. It quantifies the information loss as a consequence of dimensionality reduction and calculates a threshold for the fusion. Xu et al. [38] proposed a weighted similarity network fusion method for the identification of cancer subtypes. Ma et al. [18] presented ANF, an upgraded version of similarity network fusion. ANF can reduce the computational complexity of similarity network fusion while preserving the structural information of multiplex networks well. Cowen et al. [39] reviewed methods for the fusion analysis of biological data based on network propagation. Ruan et al. [17] proposed an enhanced similarity network fusion method (ab-SNF) for associated signal annotation. The ab-SNF method adds feature-level correlation signal annotation as a weight when constructing a topic similarity network, aiming to amplify signal features and suppress noise features in order to improve the performance of disease subtyping. Pai et al. [40] reviewed some of the latest approaches for patient similarity networks and looked forward to the widespread usage of network fusion-based approaches in medical and genomic data. Subsequently, they proposed a novel supervised classification framework (netDx) [41] for patient classification problems. The framework has high accuracy and scalability, can integrate multiple types of data, and handles sparse data well. Most of these methods target multisource Euclidean data fusion and are difficult to extend to non-Euclidean graph data fusion. Therefore, the study of this paper fills the gap. Ghavasieh et al. [42] introduced a framework for functional reducibility which allows enhancing transport phenomena in multilayer systems by coupling layers together with respect to dynamics rather than structure.

Data Model and Problem Formulation
In this section, we describe related symbols, concepts, and definitions in detail. We first describe a multiplex network to show the basic concepts. Next, we give the complementary structural information concepts of multiplex networks and define the problem of network feature fusion based on network embedding of multiplex networks. Finally, we formalize a generalized structural fusion problem of multiplex networks.

Data Model.
For network data of multiple types and sources, it is more appropriate to represent such networks as multiplex networks. As shown in Figure 3, three layers of this multiplex network are derived from three modal data, such as a co-author network, a semantic relation network, and a social network. Multiplex networks can not only express intralayer links but also model the dependencies and interactions between networks well. The detailed definitions of multiplex networks are as follows.
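Definition 1 (multiplex networks). Consider an $L$-layer multiplex network of $N$ nodes, in which each node can interact with the others through $L$ kinds of relationships. In an aligned multiplex network, the intralayer entry indexed by (Bob, Brian) in layer $\alpha_1$ denotes that $\mathrm{Bob}^{\alpha_1}$ and $\mathrm{Brian}^{\alpha_1}$ are connected in layer $\alpha_1$. The interlayer entry indexed by (Danny, Adam) across layers $\alpha_2$ and $\alpha_3$ denotes that $\mathrm{Danny}^{\alpha_2}$ and $\mathrm{Adam}^{\alpha_3}$ are connected through the duplicate of $\mathrm{Danny}^{\alpha_2}$ across layers $\alpha_2$ and $\alpha_3$.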

Problem Formulation
Definition 2 (shared and complementary structural information of multiplex networks). Given a two-layer multiplex network with layers $G_1$ and $G_2$, the structural information of layer $G_1$ can be denoted as $I_1$ and the structural information of layer $G_2$ can be denoted as $I_2$. Then, $I_s = I_1 \cap I_2$ denotes shared information, also known as consistency information. The information outside $I_s$, which is held by only one of the two layers, denotes complementary information, also known as unique information.
It is worth noting that shared information does not refer only to the edges shared by multiple networks. On the mesoscale and macroscale, a group of nodes may always belong to the same community in different layers, while the edges among these nodes are likely to differ in each layer. Inspired by the idea in [33], we assume that the similarity (dependence) of different layers quantifies the consistency information between layers. Furthermore, as shown in Figure 4, we visualize the distribution of shared and complementary information among all layers in the datasets of this paper using a layer similarity measure, where $s$ and $t$ denote the source and target layers, respectively, $V$ is the set of nodes of the multiplex network, and $\mathrm{ego}(s, v_i)$ denotes the set of nodes of the ego network of node $v_i$ in the global network $s$. The detailed information of these datasets is presented in Section 5. It can be seen from Figure 4 that the similarity differs markedly between pairs of layers. The lighter the color of a block, the greater the similarity of local structure between the corresponding pair of layers and the more information they share. Conversely, the darker the color, the more complementary information there is between the corresponding pair of layers.
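One plausible way to compute such a layer similarity is sketched below. It assumes that the similarity of layers s and t is the Jaccard overlap of ego(s, v_i) and ego(t, v_i), averaged over all nodes in V, and that an ego network contains the node and its direct neighbors. The function names `ego` and `layer_similarity`, the dictionary-of-sets layer format, and the choice of Jaccard overlap are illustrative assumptions, not the paper's stated definition.

```python
def ego(layer, v):
    # Ego network of v in one layer: v together with its direct neighbors.
    # `layer` maps each node to the set of its neighbors (assumed format).
    return {v} | layer.get(v, set())


def layer_similarity(s, t, nodes):
    # Average Jaccard overlap of per-node ego networks in layers s and t.
    # Jaccard overlap is an assumption made for this sketch.
    total = 0.0
    for v in nodes:
        a, b = ego(s, v), ego(t, v)
        total += len(a & b) / len(a | b)
    return total / len(nodes)


# Toy example with three aligned nodes and two layers.
nodes = [1, 2, 3]
layer_s = {1: {2}, 2: {1}, 3: set()}
layer_t = {1: {2, 3}, 2: {1}, 3: {1}}
similarity = layer_similarity(layer_s, layer_t, nodes)
```

Under this assumption, a value close to 1 corresponds to a light block (mostly shared local structure) and a value close to 0 to a dark block (mostly complementary structure).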
Definition 3 (network feature fusion; multiplex network embedding). Suppose the methods make use of a real-valued superadjacency matrix $A \in \mathbb{R}^{(N \times L) \times (N \times L)}$ (e.g., representing text or metadata associated with nodes). Node embedding aims at learning a map function $f$ that takes as input the group of vectors of node $v_i$ in the superadjacency matrix of $G$; the superadjacency matrix can also be understood as being composed of the adjacency matrices of multiple layers [43].
Notice that all definitions above can be easily extended to the case of weighted networks. We embody this definition and related symbols in Figure 5.
Definition 4 (network structural fusion). Consider a multiplex network $G = \{G_1, \ldots, G_L\}$, where each layer has the same set of nodes $V$ and the number of nodes in the multiplex network is $N = |V|$. Network structural fusion aims to find a joint structural representation that can describe the same node in a multiplex network based on the consistency and complementary information between multiple layers.
This joint representation can reflect the properties of each node in different layers. We define the fusion function as $F: G \rightarrow g$, where $G$ represents a multiplex network and $g$ represents a single-layer network after fusion. The number of nodes in the network $g$ is $N$.
Figure 3: The architecture of MNSF for three networks as an example. Network representation learning is a node embedding process. The embedded matrices contain vectors that represent the latent representations of all nodes. Enhanced iterative fusion is a process of multiplex network fusion.
In this study, the structural information of networks can be divided into three types: (1) macroscale information, such as the small-world property and power-law distribution; (2) mesoscale information, such as communities and motifs; and (3) microscale information, such as node adjacency information, node degree, and the centrality of nodes. Therefore, some network properties can be captured by network fusion methods, such as the rich-club property, edge betweenness, the $k$-core index, and the synchronization analysis of complex networks. However, these properties cannot be captured by network feature fusion methods. The intuitive understanding of this definition and related mathematical symbols are shown in Figure 5.
In summary, the fusion of multiplex networks can provide more abundant information than a single-layer network, and this abundant information is reflected by the complementary information. It is worth noting that the fusion of multiplex networks needs to consider not only the edges of networks (microscale) but also the community structure (mesoscale) of the network and the degree distribution of nodes (macroscale). To preserve the nodes' adjacency information, we can simply integrate the edges existing in a multiplex network; we name the result MerNet. MerNet does not guarantee the preservation of mesoscale and macroscale information [13], which is also verified in our subsequent experiments. In addition, as shown in Figure 5, the result of network structural fusion takes the form of Boolean vectors of dimension $N$, whereas the result of network feature fusion takes the form of dense vectors of dimension $d$. Therefore, there is an essential difference between structural fusion and feature fusion of networks.
Figure 5 labels: (1) $[a_1, \ldots, a_n]$ is a Boolean vector; (2) $[a_{f1}, \ldots, a_{fn}]$ is a dense vector.
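The MerNet baseline described above amounts to a plain union of edges over all layers. The following minimal sketch illustrates it for undirected, unweighted layers given as edge lists; the names `mernet` and `boolean_row` and the edge-list input format are assumptions made for illustration. The second helper shows the Boolean, $N$-dimensional node representation that a structurally fused network provides.

```python
def mernet(layers):
    # Union of undirected edges over all layers; each edge is stored once
    # as a sorted tuple so that (u, v) and (v, u) coincide.
    merged = set()
    for edges in layers:
        for u, v in edges:
            merged.add(tuple(sorted((u, v))))
    return merged


def boolean_row(merged, nodes, v):
    # N-dimensional Boolean adjacency vector of node v in the merged network.
    return [int(tuple(sorted((v, u))) in merged) for u in nodes]


# Toy example: two layers over the same four nodes.
nodes = ["a", "b", "c", "d"]
layers = [[("a", "b"), ("b", "c")], [("b", "a"), ("c", "d")]]
merged = mernet(layers)
row_b = boolean_row(merged, nodes, "b")
```

Because the union keeps every edge of every layer, noisy edges are kept too, which is one intuition for why such a merge may not preserve community structure or degree distribution.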

Multiplex Network Structural Fusion
In this section, we introduce a deep structural fusion model for multiplex networks (MNSF). MNSF incorporates network feature extraction (NFE) and network structural fusion (NSF). Figure 3 illustrates the steps of MNSF using three networks as an example. The gray sections show the names of the modules. The right side of the figure visualizes the process of enhanced iterative fusion. The first module is the network feature extraction module, which performs representation learning for each layer. In the process of acquiring the network embedding, the structural information of the original network is comprehensively exploited. The second module constructs the node similarity matrices and takes the result of the network representation learning module as input. For each layer of a multiplex network, the similarities between each node and its neighbors are calculated based on the K-D tree algorithm. The third module is the network structural fusion module. During the iterations, the similarities between nodes and their neighbors are used to update the global similarity between nodes, so that similarity information propagates among layers. Once the algorithm converges after a few iterations, the similarity matrices of all layers are averaged to construct the final network. We mainly focus on two modules: network feature extraction and enhanced iterative fusion.

Network Feature Extraction.
In the network feature extraction module, we conduct a biased random walk according to [44] to generate sequences of nodes. Then, we perform Skip-gram over the sequences to learn the node embedding with a given dimension $d$. For a node $v_j$ that appears at position $j$ of a sequence, we define $C(v_j) = \{v_{j-c}, \ldots, v_{j+c}\}$ as the context of $v_j$, where $2c$ is the window size. We optimize an objective function that maximizes the log-probability of the neighborhood of a node conditioned on its feature representation. Hence, we need to minimize the following equation:
$$-\log P\big(C(v_j) \mid v_j\big) = -\sum_{v_u \in C(v_j)} \log P(v_u \mid v_j).$$
For each $P(v_u \mid v_j)$, a softmax function is used to define the probability:
$$P(v_u \mid v_j) = \frac{\exp(X_{v_u} \cdot X_{v_j})}{\sum_{v_s \in V} \exp(X_{v_s} \cdot X_{v_j})},$$
where $X_{v_j}$ denotes the embedding of node $v_j$ and $X_{v_u}$ and $X_{v_s}$ denote the local context vectors and global node vectors. To reduce the cost of computing the normalization term, we adopt the negative sampling approach proposed in [45], which samples multiple negative edges according to some noise distribution for each edge. We replace each $\log P(v_u \mid v_j)$ with
$$\log \sigma(X_{v_u} \cdot X_{v_j}) + \sum_{v_n \in C_n} \log \sigma(-X_{v_n} \cdot X_{v_j}),$$
where $\sigma(x) = 1/(1 + \exp(-x))$ is the sigmoid function and $C_n$ is the negative sample node set. We employ stochastic gradient descent (SGD) to optimize the resulting objective function, equation (5).
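To make the Skip-gram step concrete, the sketch below extracts the context window of a node from a walk and evaluates a negative-sampling loss for one (node, context) pair. It assumes the standard word2vec form of negative sampling, that the context excludes the center node itself, and plain Python lists as embedding vectors. The function names are hypothetical, and a real implementation would rely on an embedding library rather than this hand-written loss.

```python
import math


def context(walk, j, c):
    # Nodes within c positions of walk[j]; the center node itself is
    # excluded (an assumption of this sketch).
    lo, hi = max(0, j - c), min(len(walk), j + c + 1)
    return walk[lo:j] + walk[j + 1:hi]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def neg_sampling_loss(x_center, x_context, x_negatives):
    # Negative log-likelihood of one positive pair plus sampled negatives,
    # following the standard word2vec negative-sampling objective.
    loss = -math.log(sigmoid(dot(x_context, x_center)))
    for x_n in x_negatives:
        loss -= math.log(sigmoid(-dot(x_n, x_center)))
    return loss


# Toy example: a walk of five nodes and 2-dimensional embeddings.
walk = ["a", "b", "c", "d", "e"]
window = context(walk, 2, 1)
loss = neg_sampling_loss([0.0, 0.0], [1.0, 1.0], [[1.0, 0.0]])
```

With a zero center vector every sigmoid evaluates to 1/2, so the toy loss is the sum of two $\log 2$ terms; SGD would then move the center vector toward the context vector and away from the negative sample.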

Network Structural Fusion.
The network structural fusion module takes the result of network representation learning as input. A similarity network $g = (V, E)$ has $N = |V|$ nodes. The distance between two nodes can be measured from their embeddings. Using the squared Euclidean distance as the measure, $\rho(v_i, v_j)$ represents the distance between nodes $v_i$ and $v_j$. The similarity matrix $W$ is constructed from the node embeddings, where $W(v_i, v_j)$ represents the similarity between nodes $v_i$ and $v_j$; its calculation also involves $\rho(\cdot, N_i)$, the average distance between node $v_i$ and the set of its neighbors $N_i$. In this step, to find nearest neighbors efficiently, we use the K-D tree algorithm, whose queries take $O(\log N)$ time on average, to calculate $\rho(\cdot, \cdot)$. In order to eliminate the influence of node autocorrelation on the similarity matrix (that is, the elements on the main diagonal of the similarity matrix have larger values), the similarity matrix $W$ is normalized to obtain the matrix $P$. Let $N_i$ denote the neighbors of node $v_i$. For the given network $G$, we use the $K$ nearest neighbors to measure the local affinity of node $v_i$. It is generally believed that the affinity between a node and its neighbors is higher than that between the node and more distant nodes, so this construction yields a local affinity matrix $S$. The matrix $P$ contains the global similarities between a node and all other nodes in the fusion network, while the matrix $S$ contains only the similarities between a node and its $K$-order nearest neighbors. Furthermore, the correlation between $P$ and $S$ is nonlinear. Based on information propagation theory, the local affinity matrix of one layer of the multiplex network is exchanged with the global similarity matrix of another layer, and the structural information fusion of the multiplex network is completed by iterative updating.
In the update rule of equation (9), $l$ denotes the current layer and $L$ denotes the layer index of the multiplex network. By constructing the affinity network, this module reconstructs the direct and indirect relations between nodes. Through the selection of nearest neighbors, the $K$ nodes with the strongest correlation are used as direct neighbors to achieve information purification. For example, if a certain 2-order neighbor $v_j$ of node $v_i$ has higher similarity than some direct neighbors, $v_j$ can serve as a direct neighbor of $v_i$ with high probability in the affinity network $S$. In the iterative fusion process of computing $P$, we consider the weights of interlayer and intralayer edges at the same time. Take edge $e(v_i, v_j)$ in affinity network $S^{(l)}$ as an example. Through the interaction of $e(v_i, v_j)$ and $P^{(k)}(v_i, v_j)$ in another network $k$, we can determine the complementarity and consistency of $e(v_i, v_j)$ in the whole multiplex network. To summarize, the result of this module is that (1) weak similarities (low-weight edges) disappear, which helps to reduce noise; (2) strong similarities (high-weight edges) in one or more networks are added to the others; and (3) low-weight edges supported by all networks are retained depending on how tightly connected their neighborhoods are across networks, which reduces redundant information.
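The construction of $W$, $P$, and $S$ and the cross-layer update can be sketched as follows. This is a minimal illustration that assumes the formulas of the original SNF method [22]: a scaled exponential kernel for $W$ with a hypothetical scaling parameter `mu`, a normalization that fixes the diagonal of $P$ at 1/2, a K-nearest-neighbor normalization for $S$, and the update $P^{(l)} \leftarrow S^{(l)} \times \mathrm{mean}_{k \neq l} P^{(k)} \times (S^{(l)})^{T}$. A brute-force neighbor search stands in for the K-D tree, and all function names are illustrative rather than taken from the paper.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(X, i, k):
    # Brute-force stand-in for a K-D tree query: k nearest other nodes to i.
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: sq_dist(X[i], X[j]))[:k]


def similarity_matrix(X, k, mu=0.5):
    # Assumed SNF-style kernel: W_ij = exp(-rho^2 / (mu * eps_ij)), where
    # eps_ij averages the mean neighbor distances of i and j and rho_ij.
    n = len(X)
    nbrs = [knn(X, i, k) for i in range(n)]
    avg = [sum(math.sqrt(sq_dist(X[i], X[j])) for j in nbrs[i]) / k
           for i in range(n)]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sq_dist(X[i], X[j])
            eps = (avg[i] + avg[j] + math.sqrt(d2)) / 3.0
            W[i][j] = math.exp(-d2 / (mu * eps)) if eps > 0 else 1.0
    return W, nbrs


def global_similarity(W):
    # P: diagonal fixed at 1/2, off-diagonal part of each row sums to 1/2.
    n = len(W)
    P = []
    for i in range(n):
        off = sum(W[i][j] for j in range(n) if j != i)
        P.append([0.5 if j == i else W[i][j] / (2.0 * off) for j in range(n)])
    return P


def local_affinity(W, nbrs):
    # S: keep only the k nearest neighbors of each node; rows sum to 1.
    S = [[0.0] * len(W) for _ in W]
    for i, row in enumerate(nbrs):
        total = sum(W[i][j] for j in row)
        for j in row:
            S[i][j] = W[i][j] / total
    return S


def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]


def mean_matrices(mats):
    n = len(mats[0])
    return [[sum(M[i][j] for M in mats) / len(mats) for j in range(n)]
            for i in range(n)]


def fuse(P_list, S_list, iterations=10):
    # Assumed update: P_l <- S_l * mean(P_k for k != l) * S_l^T (needs >= 2 layers).
    P = P_list
    for _ in range(iterations):
        P = [matmul(matmul(S, mean_matrices(P[:l] + P[l + 1:])),
                    [list(c) for c in zip(*S)])
             for l, S in enumerate(S_list)]
    return mean_matrices(P)


# Toy example: two layers, four aligned nodes, 2-dimensional embeddings.
X1 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
X2 = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [5.0, 5.0]]
built = [similarity_matrix(X, k=2) for X in (X1, X2)]
P_list = [global_similarity(W) for W, _ in built]
S_list = [local_affinity(W, nbrs) for W, nbrs in built]
fused = fuse(P_list, S_list, iterations=5)
```

Because each $P^{(l)}$ is rebuilt from the other layers' global similarities and filtered through its own sparse $S^{(l)}$, an edge that is strong in one layer can propagate to the others, while weak edges outside every K-nearest-neighbor set fade out.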

Higher-Order Proximity.
In equation (9), the iteration may lose higher-order proximity information and converge slowly if only the $S$ matrix with first-order proximity information is considered in the iterative update. Moreover, higher-order proximity can avoid the model's strong dependence on the accuracy of similarity calculation based on embedding vectors and alleviate the loss of structural information [46]. By introducing higher-order proximity, we propose a nonlinear enhanced iterative fusion strategy to improve equation (9). Taking a pair of networks as an example, equation (10) gives the iteration using higher-order proximity, where $\mathrm{Tr}^{(1)}_n$ denotes the measure of $n$-order proximity in layer 1 and $t$ denotes the current iteration. Equation (11) gives the final formula of enhanced iterative fusion, where $\alpha \in [0, 1]$ is a parameter that controls the importance of each proximity, with $\alpha_1 + \alpha_2 + \cdots + \alpha_k = 1$ and $l, k = 1, 2, \ldots, n$. The global similarity matrices $P$ of all layers are averaged to obtain the similarity matrix of the fused network as the final output of the algorithm. It should be noted that the studies in [47] have empirically demonstrated that network features extracted from three-hop neighborhoods contain the most useful information, so a larger $K$ in $\mathrm{Tr}_n$ is not necessarily better. Therefore, in this paper, we use the 1st-, 2nd-, and 3rd-order proximity to implement the enhanced iterative fusion module, and we have verified through many experiments that setting the weights of these proximities to $\alpha_1 = 0.3$, $\alpha_2 = 0.4$, and $\alpha_3 = 0.3$ gives a trade-off between precision and efficiency. We use the nearest neighbors to measure local affinity. Then, the corresponding localized network can be constructed from the original weighted network using equation (12), which can be understood as constructing a new network based only on second-order proximity.
As shown in Figure 6, to avoid recalculating low-order proximity when constructing a high-order affinity network, we need to remove from the high-order proximity networks the edges that already exist in low-order networks. In fact, equation (12) is a second-order instance of $\mathrm{Tr}^{(1)}_n$ in equation (10).
According to equation (13), through the network structure $S$ of one layer in the iterative process, the average of the similarity matrices $P$ of the remaining networks is iteratively updated, so that the layers of the multiplex network gradually converge to each other after each layer is updated. This similarity network contains the shared and complementary information of the layers and finally achieves the fusion of multiplex networks. Taking a node pair (1, 2) as an example, we illustrate in Figure 7 how equation (13) computes the proximity between nodes 1 and 2. The parameter $\alpha$ controls the contribution of different proximities to the result. Equation (13) is an implementation of equation (11) based on 1st-, 2nd-, and 3rd-order proximity.
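The idea of combining several orders of proximity while discarding edges already covered by lower orders can be illustrated as follows. This sketch assumes that the $n$-order proximity network is obtained from the $n$-th power of a first-order affinity matrix, with the diagonal and every entry already present at a lower order set to zero, and that the orders are combined as an $\alpha$-weighted sum. These choices and the function names are illustrative assumptions and may differ from the exact form of $\mathrm{Tr}_n$ in equations (10)-(13).

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(r, c)) for c in zip(*B)] for r in A]


def strip_lower(M, lower_orders):
    # Zero the diagonal and every entry that already has an edge in a
    # lower-order proximity network, so that no edge is counted twice.
    n = len(M)
    return [[0.0 if i == j or any(T[i][j] > 0 for T in lower_orders)
             else M[i][j]
             for j in range(n)] for i in range(n)]


def higher_order(S, alphas=(0.3, 0.4, 0.3)):
    # Weighted sum of 1st-, 2nd-, ... order proximity networks built from S.
    orders = [S]
    power = S
    for _ in alphas[1:]:
        power = matmul(power, S)
        orders.append(strip_lower(power, orders))
    n = len(S)
    return [[sum(a * T[i][j] for a, T in zip(alphas, orders))
             for j in range(n)] for i in range(n)]


# Toy example: a path 0 - 1 - 2 - 3 with unit first-order affinities.
S = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
combined = higher_order(S)
```

On the path example, node 0 receives weight $\alpha_1$ toward its direct neighbor, $\alpha_2$ toward its two-hop neighbor, and $\alpha_3$ toward its three-hop neighbor, which shows how each $\alpha$ scales the contribution of one order.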
This module can strengthen high-weight edges present in one (i.e., complementary information) or more (i.e., shared information) networks and weaken low-weight edges (i.e., redundant information). The retention of low-weight edges shared by all layers depends on the tightness of connections of their K-order proximity. The pseudocode of MNSF is shown in Algorithm 1. In this pseudocode, lines 2-4 correspond to the network feature extraction module. Lines 5-11 implement the construction of the node similarity matrices. Lines 12-15 correspond to the network structural fusion module, in which line 13 calculates the high-order proximity and line 14 implements enhanced iterative fusion. Line 16 yields the matrix of the average global similarity network. The resulting network graph is calculated by the K-D tree algorithm, which we use to construct the neighbors of each node.
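The last step, turning the averaged global similarity matrix into a graph, can be sketched as a K-nearest-neighbor graph construction. The sketch below ranks candidates directly by similarity with a brute-force sort instead of a K-D tree, and it treats the output as an undirected edge set; both choices, as well as the function name `knn_graph`, are assumptions made for illustration.

```python
def knn_graph(P, k):
    # Connect every node to its k most similar other nodes according to
    # the fused similarity matrix P; edges are stored as (low, high) pairs.
    n = len(P)
    edges = set()
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: P[i][j], reverse=True)
        for j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy example: a fused similarity matrix over three nodes.
P = [[0.5, 0.4, 0.1],
     [0.4, 0.5, 0.2],
     [0.1, 0.2, 0.5]]
fused_edges = knn_graph(P, k=1)
```

The resulting single-layer edge set can then be passed to any traditional single-layer analysis method, for example a link prediction or community detection routine.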

Experiment Analysis
In this section, we study the performance of MNSF with high-order proximity iteration in different real-world datasets. We use link prediction and shared community detection tasks to verify the performance of MNSF.

Datasets.
For our experiments, we run MNSF and the baseline methods on each of the following multiplex networks. These datasets fall into two categories: public datasets and a private dataset. The public datasets are five benchmark multiplex network datasets from the social, biological, genetic, and transportation domains. The specific information on these public datasets is shown in Table 1. The private dataset is a semantic network dataset that we construct. It consists of a network of acknowledgment relationships extracted from the acknowledgment part of dissertation data and the collaboration network of the corresponding entities from AMiner.

Public Datasets.
VICKERS classroom social multiplex networks: this dataset was collected by Vickers from 29 seventh grade students in a school in Victoria, Australia. Students were asked to nominate their classmates on a number of relations.
CS-Aarhus social multiplex networks [48]: this dataset consists of five kinds of online and offline relationships (Facebook, Leisure, Work, Co-authorship, and Lunch) between the employees of the Computer Science department at Aarhus. These variables cover different types of relations between the actors based on their interactions.
CKM physicians innovation multiplex network [37]: this dataset was collected by Coleman, Katz, and Menzel on medical innovation, considering physicians in four towns in Illinois: Peoria, Bloomington, Quincy, and Galesburg. They were concerned with the impact of network ties on the physicians' adoption of a new drug, tetracycline.
London multiplex transport network [49]: this dataset was collected in 2013 from the official website of Transport for London and was manually cross-checked. Nodes are train stations in London and edges encode existing routes between stations. Underground, overground, and DLR stations are considered.
Celegans multiplex gpi network [37]: this dataset considered different types of genetic interactions for organisms in the Biological General Repository for Interaction Datasets (BioGRID, thebiogrid.org), a public database that archives and disseminates genetic and protein interaction data from humans and model organisms.
These networks have been used as benchmark datasets for evaluating multiplex network analysis methods. In addition, the CKM dataset has ground-truth information about the community labels of nodes. Therefore, we test link prediction on all datasets and shared community detection on the CKM dataset.

Private Dataset.
We first introduce the acknowledgment data from the dissertations. The acknowledgment chapter is the most emotional part of a dissertation and can truly reflect the author's research and social interaction. There are many entities in this chapter, including the author's mentor, teachers, fellow students in the laboratory, classmates, and family members. We construct the acknowledgment network from the acknowledgments of all dissertations from 1997 to 2015 at several Chinese universities. In addition, we construct the co-authorship network from an open academic graph, ArnetMiner. We also perform algorithm verification on this dataset.

Figure 6: Construction of first- and second-order weight proximity networks. (a) First-order proximity network (original network). (b) Second-order proximity network. The latter is constructed on the basis of the former network. In (b), blue edges denote edges common to first-order networks, and yellow edges denote edges unique to second-order networks.

Figure 7: A toy example illustration. The green part is an iterative process that directly uses adjacent information. The purple part is an iterative process that uses first-order neighbor similarity information. The blue part is an iterative process that uses second-order neighbor similarity information. The final summation is the approximate similarity between nodes 1 and 2.

Algorithm 1. Input: Graph $G = \langle V, E, L \rangle$; K is the number of nearest neighbors, μ is a hyperparameter, and α is a parameter in equation (11).

Experimental Setup.
To implement the network feature extraction module, we use node representation learning to extract the features of each layer. We set p = 2 and q = 1 as the default parameters of the biased sampling process in the Node2Vec method. We set the number of walks to 20, the walk length to 30, and the dimension of the vectors to 128. In MNSF, the parameters of the enhanced iterative fusion process are set to K = 20, μ = 0.4, and ($\alpha_1$, $\alpha_2$, $\alpha_3$) = (0.3, 0.4, 0.3). When constructing the fused network, the K value of the K-D tree is also 20. These settings were chosen through extensive experiments as a performance trade-off for our model across the different datasets.

Baseline Method.
In these experiments, we test 11 baseline methods with the same parameters and dimensions. Some of these methods can be used for both tasks, while others suit only one of the two. Details of the baseline methods are as follows:
(i) CN (common neighbor) captures the notion that two nodes with a common neighbor may be introduced to each other by that neighbor. It has the effect of "closing a triangle" in the graph and resembles a common mechanism in real life.
(ii) JC (Jaccard coefficient) is a measure used for gauging the similarity and diversity of sample sets and is defined as the size of the intersection divided by the size of the union of the sample sets.
(iii) AA (Adamic/Adar) is a link prediction measure based on the neighbors shared by two nodes. It is defined as the sum of the inverse logarithmic degree centrality of the neighbors shared by the two nodes.
(iv) AAMT [33] is a link prediction method for multiplex networks based on the Adamic/Adar neighbor similarity coefficient, which considers the intensity and structural overlap of multiplex links simultaneously.
(v) Node2Vec [44] adds a pair of parameters that realize BFS-like and DFS-like sampling on a single-layer network, which makes it better at capturing the roles of nodes, such as hubs or tail users.
(vi) Ohmnet [15] is a node embedding method for multiplex networks, where hierarchy information is used to model dependencies between the layers.
(vii) PMNE [16] includes three node embedding methods, each of which generates a common embedding for each node by merging multiple networks. We compare all three models with the other baseline methods and denote the "network aggregation," "results aggregation," and "co-analysis" models as PMNE(n), PMNE(r), and PMNE(c), respectively.
(viii) MNE [27] is a scalable multiplex network embedding method. It contains one high-dimensional common embedding and a low-dimensional additional embedding for each type of relation, so that multiple relations can be learned jointly in a unified network embedding model.
(ix) MELL [30] is an embedding method for multiplex networks that incorporates the idea of a layer vector, which captures and characterizes each layer's connectivity. This method exploits the overall structure effectively and embeds both directed and undirected multiplex networks, whether their layer structures are similar or complementary.
(x) GraphSAGE [50] is a graph neural network framework for inductive representation learning on large graphs. GraphSAGE generates low-dimensional vector representations for nodes and is especially useful for graphs that have rich node attribute information. We use an unsupervised version of GraphSAGE as a baseline method for the link prediction task.
(xi) GenLouvain [51] is a modularity-based multiplex network community detection algorithm. The algorithm considers the modularity both within and between layers and completes the community detection task by maximizing the modularity metric. We only use this algorithm as a baseline method for the node clustering task.
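The three neighborhood-based baselines (CN, JC, and AA) follow directly from the definitions above and can be sketched in a few lines of Python. The adjacency representation (a dict of neighbor sets), the function names, and the toy graph are assumptions for illustration; the graph is taken to be undirected and unweighted.

```python
import math


def common_neighbors(adj, u, v):
    """CN: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    """JC: size of the intersection divided by size of the union of neighbor sets."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """AA: sum of 1 / log(degree) over shared neighbors.

    A neighbor shared by two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


# Hypothetical toy graph: nodes 1 and 4 are not linked but share neighbors 2 and 3.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
print(common_neighbors(adj, 1, 4), jaccard(adj, 1, 4), adamic_adar(adj, 1, 4))
```

For the pair (1, 4), CN is 2 and JC is 1.0 because the two neighbor sets coincide. AA equals 2 / ln 3 because both shared neighbors have degree 3.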
In this paper, we apply CN, JC, AA, Node2Vec, and GraphSAGE to the link prediction task only on the single layer where the test edge is located. For the Ohmnet algorithm, we randomly construct a hierarchy describing the relationships between different layers. We regard the common embedding in the MNE algorithm as the global node embedding. AAMT uses the multiplexity property of nodes (interlayer information) and the similarity between nodes (intralayer information) to predict the probability of a link. For the baselines, the walk length, number of walks, and embedding dimension are set to the same values as in MNSF, and the remaining parameters of the baseline methods, such as those of PMNE and MELL, are left at their defaults.

Link Prediction.
In this section, we perform the link prediction task on these multiplex networks, following the experimental settings for multiplex networks in [52]. For the link prediction task, we remove 20% of the edges of each layer in the original network and use area under the curve (AUC) scores to evaluate how well the algorithms predict the missing edges in each layer. That is, we use the remaining 80% of the edges of each layer for training, and the 20% of edges randomly selected from each layer for testing. The node pairs in the test edge set are regarded as positive examples. Then, we randomly sample an equal number of node pairs that are not connected by an edge to serve as negative examples. AUC is the area under the receiver operating characteristic (ROC) curve. The AUC of a classifier equals the probability that the classifier ranks a randomly chosen positive example higher than a randomly chosen negative example, and it is computed from the scores of the positive examples (Pos) and negative examples (Neg). For MNSF, we calculate the similarity between nodes with the AA (Adamic/Adar) metric on the fused network. For node embedding methods, we use the cosine similarity of the vectors as the similarity metric. The larger the similarity score, the more likely a link exists between the two nodes. For the single-layer network methods, we train a separate embedding for each relation type of the network to predict links on the corresponding relation type, which means they have no information from the other relation types of the network.
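The evaluation protocol above can be sketched in Python for a single layer. This is an illustrative sketch, not the authors' evaluation code: the function names are hypothetical, negatives are drawn from all unconnected node pairs of the layer, and AUC is computed by direct pairwise comparison with ties counted as one half, which matches the probabilistic definition given above.

```python
import random


def split_edges(edges, test_ratio=0.2, seed=0):
    """Randomly hold out a fraction of the edges; return (train, test)."""
    rng = random.Random(seed)
    shuffled = sorted(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def sample_negatives(nodes, edges, count, seed=0):
    """Sample node pairs that are not connected by any edge of the layer."""
    rng = random.Random(seed)
    existing = {frozenset(e) for e in edges}
    candidates = [
        (u, v)
        for i, u in enumerate(nodes)
        for v in nodes[i + 1:]
        if frozenset((u, v)) not in existing
    ]
    return rng.sample(candidates, count)


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical toy layer: a path over 11 nodes (10 edges).
edges = [(i, i + 1) for i in range(10)]
train, test = split_edges(edges)
negatives = sample_negatives(list(range(11)), edges, len(test))
print(len(train), len(test), negatives)
print(auc([0.9, 0.8], [0.1, 0.8]))  # 0.875
```

In a full experiment, the positive and negative scores passed to `auc` would be the similarity values (for example AA on the fused network, or cosine similarity of embeddings) assigned to the test edges and to the sampled non-edges.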
Table 2 shows that MNSF performs significantly better than the other comparison algorithms. Our model outperforms single-layer methods such as CN, JC, AA, Node2Vec, and GraphSAGE on the multiplex network datasets, which directly shows that structural information fusion can improve the accuracy of link prediction. We run the Node2Vec method on a single target layer of the multiplex networks. Because this baseline shares the same network feature extraction module, the comparison also indirectly demonstrates the effectiveness of the enhanced iterative fusion module. Similarity network construction can refine the original network's information and filter out noisy and redundant information.
The iterative fusion process can capture structural information from other networks. This result also supports the view of [23]. We regard Ohmnet, PMNE, MNE, MELL, and MNSF as comparative experimental groups. The first four algorithms are the latest multiplex network representation learning methods that realize network feature fusion. Ohmnet and PMNE are extensions of the traditional single-layer network embedding method (Node2Vec), but they consider neither directly nor indirectly the interlayer correlation and dependency information during fusion. This leads to an inevitable loss of information in the fusion process, so the structural information fusion of multiplex networks cannot be well realized. Both MNE and MELL transfer information between nodes in a layer from the perspective of consistency information (shared information) and complementary information (unique information). In these two algorithms, the common (or layer) embedding is considered, but these embedding methods ignore redundant and uninformative information in the network. Interlayer node embedding based on common vectors can therefore lead to distortion and inaccuracy of information. On the Celegans dataset, AAMT obtains outstanding results. We think that nodes have strong shared information in all layers except layer 2. In the iteration process, MNSF makes the structural information of the layers as similar as possible, so the specificity of layer 2 leads to differences between the fused network and the other layers. In fact, almost all the layers except the second are very similar, which is the reason for the unsatisfactory performance of our model. AAMT comprehensively considers the Adamic/Adar index and the multiplexity of nodes in each layer, so it is less affected by this than MNSF.

Shared Community Detection.
Community detection aims to group similar nodes so that nodes in the same group are more similar to each other than to those in different groups. In the CKM dataset, nodes have a global community label. In other words, each node in the multiplex network participates in different relation types but belongs to only one community. For this dataset, the task is usually called shared community detection, which is a significant mining task in multiplex network analysis. We therefore use the CKM dataset as the benchmark for the shared community detection task. After the multiplex network is fused, traditional community detection algorithms can be applied to the fused network and treated as comparison methods. Since the GN algorithm [53] readily yields node partitions with different numbers of communities, experimental comparisons can be conducted under the same number of communities. We therefore use the GN algorithm to conduct the shared community detection task in this paper.

Evaluation Metrics.
Given the ground-truth communities in the real-world datasets, we use normalized mutual information (NMI) [54] to evaluate the performance of the methods. NMI is computed from two partitions X and Y of the network and from H(X | Y), the normalized conditional entropy of partition X with respect to Y, where |C| denotes the number of communities. The larger the NMI, the better the result. NMI ranges from 0 to 1: it equals 1 when the two partitions match perfectly and 0 in the opposite case. In node clustering, a second widely used measure is the adjusted Rand index (ARI), the chance-corrected version of the Rand index. It is known to be less sensitive to the number of parts. Two elements (y, y′) of Y are said to be paired in a partition if they belong to the same cluster of that partition. Let Q and U be two partitions of the object set Y. The Hubert-Arabie formulation of the adjusted Rand index is

$$\mathrm{ARI}(Q, U) = \frac{\binom{|Y|}{2}(a + d) - \left[(a + b)(a + c) + (c + d)(b + d)\right]}{\binom{|Y|}{2}^{2} - \left[(a + b)(a + c) + (c + d)(b + d)\right]},$$

where a is the number of pairs (y, y′) ∈ Y that are paired in Q and in U; b is the number of pairs (y, y′) ∈ Y that are paired in Q but not paired in U; c is the number of pairs (y, y′) ∈ Y that are not paired in Q but paired in U; and d is the number of pairs (y, y′) ∈ Y that are neither paired in Q nor paired in U. This index has an upper bound of 1 and takes the value 0 when the Rand index is equal to its expected value.

Result Analysis.
As shown in Figure 8, MNSF also shows excellent performance in the shared community detection task in general. Among the compared methods, MNSF obtains the largest NMI and ARI scores. The main reason is that there are isolated nodes in the network, and the shared community detection result on the whole network reaches 0.9835. Compared with the other algorithms, the shared community detection task verifies that our model preserves the global mesoscale information of the multiplex network more effectively. In particular, there is strong complementary information between layers in the CKM dataset. MNSF further validates that the network fusion method can more fully consider the consistency and complementarity between networks. As for the other methods, MELL learns a representation of each node separately in each layer. We sum the representations of a node in the different layers as its global embedding and compare the result with our model. The performance of MNE and MELL in this task shows that this kind of algorithm cannot preserve the shared community information of nodes well. The Ohmnet and GenLouvain methods show competitive performance. They detect the shared community of the network from a global perspective.
In general, the results of the link prediction and shared community detection tasks demonstrate the effectiveness of our model. By considering the 1st, 2nd, and 3rd order proximity in the enhanced iteration process, MNSF can effectively fuse the shared and complementary structural information between layers and preserve more abundant network structural information, such as microscale and mesoscale information.
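For readers who wish to check ARI scores of the kind reported in this section, the index can be computed directly from the pair counts a, b, c, and d defined under Evaluation Metrics. The Python sketch below uses the standard Hubert-Arabie expression in terms of these counts; the function names and the toy label vectors are hypothetical, and returning 1.0 when the denominator vanishes (both partitions trivial) is a convention assumed here.

```python
from itertools import combinations


def pair_counts(q, u):
    """Count object pairs by whether they are paired in partition q and in u."""
    a = b = c = d = 0
    for i, j in combinations(range(len(q)), 2):
        same_q, same_u = q[i] == q[j], u[i] == u[j]
        if same_q and same_u:
            a += 1  # paired in both partitions
        elif same_q:
            b += 1  # paired in q only
        elif same_u:
            c += 1  # paired in u only
        else:
            d += 1  # paired in neither
    return a, b, c, d


def adjusted_rand_index(q, u):
    """Hubert-Arabie ARI from pair counts; q and u are label lists of equal length."""
    a, b, c, d = pair_counts(q, u)
    total = a + b + c + d  # number of object pairs
    expected = (a + b) * (a + c) + (c + d) * (b + d)
    denom = total * total - expected
    return (total * (a + d) - expected) / denom if denom else 1.0


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same partition, relabeled)
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5
```

The second call shows that the index can be negative when two partitions agree less than expected by chance, which is why it is described as chance-corrected.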

Parameter Sensitivity.
In this section, we test the parameter sensitivity of MNSF for the link prediction task. Based on the above experimental setup, we use the variable-controlling (adjust one parameter and fix other parameters) strategy and the CS-Aarhus dataset to study parameter sensitivity.
The parameters studied are (1) the nearest neighbor parameter K in proximity network construction and the iteration process and (2) the hyperparameter μ used in calculating the weight matrix. As shown in Figure 9(a), as K gradually increases, the AUC score changes significantly: it first rises and then declines. This shows that the performance of our model (blue line) depends on the choice of the nearest neighbor parameter in the iterative fusion process. In the fused network construction process, K directly affects the quality of the fused network and the performance of downstream tasks. As K increases, dissimilar node pairs are selected as nearest neighbors and connected by edges, which introduces noise. However, SNF is more sensitive to the choice of K than MNSF. Compared with the original SNF model (orange line) in [22], the proposed enhanced iterative fusion strategy alleviates this sensitivity. According to Figure 9(b), a value of μ around 0.4 is suitable on CS-Aarhus and gives the best performance on the test task. This is consistent with the recommended range of values given in [22].

Compared Methods.
We use three comparison algorithms to verify the validity of each module of MNSF.
(i) MerNet: this method integrates the edges of a multiplex network directly. If a node pair is connected by an edge in any layer of the multiplex network, then there is an edge between that node pair in the merged network.
(ii) Net4Mnsf constructs a network by making each layer of multiplex networks participate directly in the iteration fusion process of the MNSF method.
(iii) MNSF(DW) replaces the Node2Vec method in the network feature extraction module of MNSF with DeepWalk, another node embedding method.

As shown in Figure 10, the fused network obtained by MNSF is superior to MerNet, which is obtained by directly collecting the edges of the multiplex network (the flattening of a multiplex network). This shows that our model enhances the mesoscale structure of the network through nonlinear enhanced iterative fusion instead of linearly merging the edges between nodes from a microscale perspective. Moreover, MNSF outperforms the Net4Mnsf method, which feeds the network directly to the MNSF iterative fusion process. This indicates that the network feature extraction module (representation learning and similarity network construction) in our model can filter some noise, so the module is effective and meaningful. MNSF(DW) and MNSF differ in the network feature extraction module, and both achieve the best ARI and NMI scores, which also verifies that the high-order similarity used in the enhanced iterative fusion process can effectively alleviate the dependence on the quality of the obtained node embedding.

From Figure 11, we can see that MNSF is generally computationally efficient. Our model is based on the K-D tree algorithm, whose average search time complexity is O(log N). Therefore, MNSF has lower computational complexity than the original SNF model (based on KNN). Besides, in the network feature extraction module, we use the Node2Vec algorithm, which has better scalability. The MNE and MELL algorithms incur higher computational cost because of feature learning and backpropagation to optimize the model. For the Ohmnet and PMNE algorithms, random walks are required at each layer of the multiplex network, which reduces their computational efficiency. Therefore, MNSF also has satisfactory computational performance on real-world datasets of different scales.

Conclusions
In this paper, we propose a deep structural fusion framework for multiplex networks, named MNSF, which is based on network representation learning and enhanced iterative fusion (EIF). MNSF utilizes a network embedding method to generate a low-dimensional vector representation of the nodes in each layer. Based on the node embedding matrices, MNSF constructs a node similarity network for each layer of a multiplex network. Considering the sharing and complementarity of multiplex networks, we also propose a nonlinear enhanced iterative fusion strategy to fuse these similarity networks into a comprehensive single-layer network. Moreover, in the iteration process, EIF alleviates the dependence on the quality of node embedding and provides more abundant information through higher-order proximity. We evaluate MNSF on link prediction and shared community detection tasks using real datasets from different domains. The experimental results verify that our model outperforms the baseline methods in general, which indicates that MNSF can fuse the structural information of multiplex networks more effectively than the existing methods. The structural fusion of multiplex networks has promising prospects. In future work, we will investigate the impact of different network representation learning methods on our model. Besides, we will try to apply it to other applications such as cross-domain retrieval and cross-network information propagation.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.