Finding Missing Links in Complex Networks: A Multiple-Attribute Decision-Making Method

Link prediction, which aims to forecast potential or missing links in a complex network based on currently observed information, has drawn growing attention from researchers. To date, a host of similarity-based methods have been put forward. Usually, a method assumes that one similarity measure is applicable to all networks, and its performance therefore fluctuates across networks. In this paper, we propose a novel method to solve this issue by regarding link prediction as a multiple-attribute decision-making (MADM) problem. In the proposed method, we take the RA, LP, and CAR indices as the attributes of node pairs. The technique for order preference by similarity to ideal solution (TOPSIS) is adopted to aggregate the attributes and rank node pairs. The proposed method is not limited to a single similarity measure but takes several measures into account, since different networks may have different topological structures. Experimental results on 10 real-world networks show that the proposed method outperforms state-of-the-art methods.


Introduction
In recent years, link prediction in complex networks has attracted much attention from researchers in various disciplines [1], not only because many available real-world networks are incomplete [2,3] but also because link prediction is closely related to many other problems [4,5]. It offers one possible way to understand the evolution of networks [6-8] and helps to find potential interactions between proteins in biological networks [9,10]. Link prediction has also been applied to friendship suggestion in social networks [11,12], product recommendation in e-commerce systems [13,14], collaboration prediction in coauthorship networks [15,16], and so on.
As a fundamental research topic in complex network analysis, link prediction can uncover missing or latent links and point out spurious links in a network based on the observed information [9,17,18]. In this paper, we focus on finding missing links. Generally speaking, two unconnected nodes with a high similarity score are deemed likely to have a missing link [5,17]. That is the basic hypothesis of the so-called similarity-based approaches. In a similarity-based approach, similarity scores of nonexisting links are estimated first, and then the links at the top of the list sorted by descending score are predicted as missing ones [17]. To date, great efforts have been devoted to link prediction based on observed network structure information [1,4,17], such as common neighbors [19-21], local paths [22,23], and triangle structures [24,25]. Along this line, a plethora of similarity-based indices and methods have been proposed. These approaches usually focus on only one similarity measure and assume that it is applicable to all networks. However, different networks often have distinct inner structural features [26,27]. Thus, the prediction performance of these approaches is not stable across networks.
On the other hand, likelihood-based algorithms aim at identifying the most likely generative model for a network and then estimate the connection likelihood of any two nodes according to the considered model [28,29]. These algorithms usually assume that a network has a known structure and build a model, such as the hierarchical structure model [29] or the stochastic block model [9], to fit the structure and estimate the model parameters using statistical methods [4]. However, the structure of a network is not always known, and no single model is suitable for all networks.
In this paper, we regard link prediction as a multiple-attribute decision-making (MADM) (also called multicriteria decision-making (MCDM)) problem. MADM is an approach designed to select a preferred alternative, classify alternatives into a small number of categories, and/or rank alternatives in a subjective preference order [30,31]. It is a widely used tool in various fields [32,33]. Among the numerous MADM methods put forward to solve real-world decision problems, the technique for order preference by similarity to ideal solution (TOPSIS) continues to work satisfactorily across diverse application areas [31,34]. TOPSIS was originally proposed to help determine the best alternative with a finite number of criteria [35]. TOPSIS makes full use of attribute information, affords a cardinal ranking of alternatives, and does not require attribute preferences to be independent [36]. As a well-known classical MADM method, TOPSIS has received considerable interest from researchers and practitioners [31]. In complex network analysis, TOPSIS has been used to identify influential nodes [34,37,38]. In this paper, we apply TOPSIS to reveal missing links. The similarity scores given by different similarity indices are treated as the attributes on which the ranking of nonexisting links is decided.
The work in [39] also adopted TOPSIS in link prediction; however, it is entirely different from our work. In [39], TOPSIS is only used to evaluate the local centralities of common neighbors. The similarity score between two nodes is computed based on the local centralities of their common neighbors. In our method, TOPSIS is employed to evaluate the degrees of similarity of node pairs and determine the missing links. Three well-known similarity indices, that is, RA [21], LP [21,23], and CAR [24], are chosen as the attributes of TOPSIS. The reason for selecting these indices is that they are designed based on different but prominent structural features. Since each attribute is associated with a weight in TOPSIS, we present a new algorithm based on the known information about individual (micro) nodes to determine the weights. To verify the performance of the proposed method, we conduct experiments on 10 real-world networks from various fields. The experimental results demonstrate the stability and robustness of our method.
The rest of the paper is structured as follows. In Section 2, we describe the link prediction problem and the metrics for evaluating the accuracy of link prediction algorithms. Section 3 lists the baselines, and Section 4 introduces the proposed method. In Section 5, the experimental results and performance analysis of the proposed method are presented. Finally, Section 6 concludes this work.

Problem Description and Metric
Consider an undirected and unweighted network $G(V, E)$, in which $V$ and $E$ are the node set and link set, respectively. Multilinks and self-loops are not allowed in this study. For a network containing $N$ nodes, the set that contains all $N(N-1)/2$ possible links is denoted by $U$, and the set of unconnected node pairs is then $U - E$. Each node pair in $U - E$ is assigned a similarity score according to a given similarity method. All unconnected node pairs are sorted in descending order according to their scores, and the node pairs at the top are most likely to have missing links [17].
In practice, we do not have the ground truth, that is, the missing links are not known. Therefore, to test the accuracy of a link prediction method, the link set $E$ is randomly divided into two parts: a training set $E^{tr}$ and a testing set $E^{ts}$, such that $E = E^{tr} \cup E^{ts}$ and $E^{tr} \cap E^{ts} = \emptyset$. Two standard metrics are employed to quantify the accuracy of link prediction algorithms: AUC [17] and Precision [40]. In this setting, the AUC value can be interpreted as the probability that a randomly selected missing link (i.e., a link in $E^{ts}$) is assigned a higher similarity score than a randomly selected nonexistent link (i.e., a link in $U - E$). In implementation, we perform $n$ independent comparisons. If the missing link has the higher score $n_1$ times and the two links have the same score $n_2$ times, the AUC value is computed as
$$\mathrm{AUC} = \frac{n_1 + 0.5\, n_2}{n}. \tag{1}$$
Precision characterizes the ratio of correctly predicted links within a given prediction list. If we take the top-$L$ links as the prediction list, among which $m$ links are correctly predicted, then Precision is
$$\mathrm{Precision} = \frac{m}{L}. \tag{2}$$
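The two metrics can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the dictionary `scores` (mapping a node pair to its similarity score, with unlisted pairs scored 0), and the default of 10,000 sampled comparisons are assumptions introduced here.

```python
import random

def auc_score(scores, test_links, nonexistent_links, n=10000, seed=0):
    """Sampled AUC = (n1 + 0.5 * n2) / n over n random comparisons."""
    rng = random.Random(seed)
    n1 = n2 = 0
    for _ in range(n):
        s_missing = scores.get(rng.choice(test_links), 0.0)
        s_nonexistent = scores.get(rng.choice(nonexistent_links), 0.0)
        if s_missing > s_nonexistent:
            n1 += 1          # missing link scored higher
        elif s_missing == s_nonexistent:
            n2 += 1          # tie
    return (n1 + 0.5 * n2) / n

def precision_at_L(scores, test_links, candidates, L):
    """Fraction of the top-L ranked candidate pairs that are test links."""
    ranked = sorted(candidates, key=lambda pair: scores.get(pair, 0.0), reverse=True)
    test = set(test_links)
    m = sum(1 for pair in ranked[:L] if pair in test)
    return m / L

# Toy data (hypothetical): two held-out links and two nonexistent links.
test_links = [(0, 1), (2, 3)]
nonexistent_links = [(0, 2), (1, 3)]
scores = {(0, 1): 0.9, (2, 3): 0.8, (0, 2): 0.1, (1, 3): 0.2}
auc = auc_score(scores, test_links, nonexistent_links)
prec = precision_at_L(scores, test_links, test_links + nonexistent_links, L=2)
```

Here `candidates` is meant to hold every node pair that is not a training link, so it contains both the held-out test links and the truly nonexistent links.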

Baseline Prediction Methods
To date, many link prediction methods have been proposed [1,4,17]. Here, we list the state-of-the-art approaches used in this paper.
(1) Resource allocation (RA) index [21]. This index models the resource allocation between two nodes through their shared neighbors. The amount of resource that one node receives from another through their common neighbors is defined as their similarity, which is
$$s^{RA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z}, \tag{3}$$
where $\Gamma(x)$ denotes the neighbor set of node $x$, and $k_z$ is the degree of node $z$.
(2) Adaptive degree penalization (ADP) index [27]. This method is proposed to automatically adapt to the network structure. It tries to estimate the best-performing degree penalization from the network clustering coefficient. The formal definition is
$$s^{ADP}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} k_z^{-\beta C}, \tag{4}$$
where $\beta$ is a constant and $C$ is the average clustering coefficient of the network. $\beta$ is set to 2.5 in this paper according to the authors' suggestion.
(3) CAR index [24]. This index stems from both node-based and link-based perspectives, and it suggests that two nodes are more likely to link together if their common neighbors are members of a local community. The similarity function defined by this index can be computed as
$$s^{CAR}_{xy} = \mathrm{CN}(x, y) \cdot \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{|\gamma(z)|}{2}, \tag{5}$$
where $\gamma(z)$ is the subset of neighbors of $z$ that are also common neighbors of $x$ and $y$, and $\mathrm{CN}(x, y)$ is the number of common neighbors between $x$ and $y$, which is
$$\mathrm{CN}(x, y) = |\Gamma(x) \cap \Gamma(y)|. \tag{6}$$
(4) Clustering coefficient for link prediction (CCLP) index [25]. This index is inspired by the idea that triangle information is very useful in estimating similarities of nodes. It computes the similarity between two nodes by employing the clustering coefficient of shared neighbors:
$$s^{CCLP}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} CC_z, \tag{7}$$
where $CC_z$ denotes the clustering coefficient of node $z$, which is
$$CC_z = \frac{t_z}{k_z (k_z - 1)/2}, \tag{8}$$
in which $t_z$ is the number of triangles passing through node $z$.
(5) Local path (LP) index [21,23]. This metric extends the horizon of common neighbors to three-hop paths and estimates the similarity of two nodes as
$$s^{LP}_{xy} = P_2(x, y) + \lambda P_3(x, y), \tag{9}$$
where $P_i(x, y)$ is the number of paths between $x$ and $y$ with length $i$, and $\lambda$ is an adjustable parameter. Generally, $\lambda$ is a very small number, and we set $\lambda = 0.001$ in this paper.
(6) Mutual information (MI) index [41]. This method defines nodes' similarities from the perspective of information theory by computing the conditional self-information of the existence of a link between two unconnected nodes given their common neighbors. The similarity between two nodes is estimated as
$$s^{MI}_{xy} = -I\left(L^1_{xy} \mid \Gamma(x) \cap \Gamma(y)\right) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} I(L^1_{xy}; z) - I(L^1_{xy}),$$
where $I(L^1_{xy})$ is the self-information of $x$ and $y$ being connected, and $I(L^1_{xy}; z)$ is the mutual information of the existence of a link between $x$ and $y$ and the shared neighbor $z$. Both quantities are calculated as described in [41].
(7) Adaptive fusion model based on logistic regression (LR index) [26]. The method was proposed based on the observations that (i) the roles of different structural features in a network are utterly different, and (ii) the role of a structural feature in different modules is also different [26]. The LR index defines the connection probability of node pair $(x, y)$ in terms of $P_{M_k}(x, y)$, the connection probability of $(x, y)$ in module $M_k$, which in turn is computed from $S^{F_l}_{M_k}(x, y)$, the similarity score induced by feature $F_l$ for $(x, y)$ in module $M_k$ (see [26] for the exact expressions). Three scenarios of modules were considered in [26]; correspondingly, the indices based on these scenarios were denoted as $\mathrm{LR}_1$, $\mathrm{LR}_2$, and $\mathrm{LR}_m$. $\mathrm{LR}_m$ is a tradeoff between $\mathrm{LR}_1$ and $\mathrm{LR}_2$; hence, we use $\mathrm{LR}_m$ in our experiments. As in [26], three features, that is, CN (see (6)), PA (see (14)), and DD (see (15)), are incorporated for $\mathrm{LR}_m$ in this paper.
(8) Local naive Bayes (LNB) method [42]. This method calculates the connection likelihood between two nodes based on the LNB model. The likelihood score of node pair $(x, y)$ is defined as
$$r_{xy} = s^{-1} \prod_{z \in \Gamma(x) \cap \Gamma(y)} s R_z, \tag{16}$$
where $s = |U|/|E^{tr}| - 1$, which is a constant for a network, and $R_z = (N_{\Delta z} + 1)/(N_{\wedge z} + 1)$, in which $N_{\Delta z}$ and $N_{\wedge z}$ are, respectively, the number of connected and unconnected node pairs whose common neighbors include $z$.
In [42], an exponent function $f(k_z)$, which is a function of the degree of node $z$, is added to the term $sR_z$ in (16). Taking the logarithm of both sides and neglecting the constant $s^{-1}$, a linear formula of connection likelihood is obtained, which is
$$s_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} f(k_z) \log(s R_z). \tag{17}$$
In this paper, we use $f(k_z) = 1$; the corresponding method is named LNB-CN [42], which is
$$s^{LNB\text{-}CN}_{xy} = \mathrm{CN}(x, y) \log s + \sum_{z \in \Gamma(x) \cap \Gamma(y)} \log R_z. \tag{18}$$
(9) Maximization entropy (MaxE) method [28]. This method is a likelihood-based algorithm based on a series of results concerning constrained entropy maximization [43]. MaxE uses the observed portion of a network as the constraints of a maximization procedure defined within the exponential random graph (ERG) framework [43]. In the case of undirected and unweighted networks, the ERG framework maximizes the likelihood function $\mathcal{L} = \ln P(A^o)$, where $A^o$ is the adjacency matrix of the observed portion, that is, the training set, and $P(A^o)$ is defined as
$$P(A^o) = \prod_{i<j} p_{ij}^{a_{ij}} (1 - p_{ij})^{1 - a_{ij}}, \tag{19}$$
where $a_{ij} \in A^o$ and $p_{ij} = x_i x_j / (1 + x_i x_j)$. The likelihood is maximized by solving the system of equations
$$k_i(A^o) = \sum_{j \neq i} \frac{x_i x_j}{1 + x_i x_j}, \quad i = 1, 2, \ldots, N, \tag{20}$$
where $k_i(A^o)$ is the degree of node $i$ in the observed network.
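As a rough illustration of how three of the indices listed above (RA, LP, and CAR) can be computed from an adjacency matrix, consider the following NumPy sketch. The function names and the toy graph are assumptions made here. The CAR function uses the usual formulation of that index (number of common neighbors times the number of links among them), which should be checked against [24].

```python
import numpy as np

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected, unweighted graph."""
    A = np.zeros((n, n), dtype=float)
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

def ra_matrix(A):
    """RA scores for all pairs: sum over common neighbors z of 1 / k_z."""
    k = A.sum(axis=1)
    inv_k = np.divide(1.0, k, out=np.zeros_like(k), where=k > 0)
    return (A * inv_k) @ A        # entry (x, y) = sum_z A[x,z] * A[z,y] / k_z

def lp_matrix(A, lam=0.001):
    """LP scores: A^2 + lam * A^3. For unconnected pairs, walks of
    length 2 and 3 coincide with paths of length 2 and 3."""
    A2 = A @ A
    return A2 + lam * (A2 @ A)

def car_score(A, x, y):
    """CAR score of one pair: CN(x, y) times the links among common neighbors."""
    common = np.flatnonzero((A[x] > 0) & (A[y] > 0))
    links_among_common = A[np.ix_(common, common)].sum() / 2.0
    return len(common) * links_among_common

# Hypothetical toy graph: nodes 0 and 1 share the linked neighbors 2 and 3.
A = adjacency(5, [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)])
ra_01 = ra_matrix(A)[0, 1]    # 1/3 + 1/4
lp_01 = lp_matrix(A)[0, 1]    # 2 paths of length 2, 2 paths of length 3
car_01 = car_score(A, 0, 1)   # 2 common neighbors, 1 link between them
```

Dense matrix products are convenient for small graphs; for the larger networks used in experiments, sparse matrices or neighbor-set loops would be the more practical choice.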

Methodology
A multitude of similarity indices have been proposed for link prediction in complex networks. In general, a similarity index employs only one or two structural features and assumes that they are suitable for all networks. However, different networks usually have different structural features, so several similarity indices need to be taken into account.
To address this issue, a novel method is proposed in this paper to forecast missing links by considering link prediction as a multiple-attribute decision-making (MADM) problem.
MADM is an approach to rank alternatives in a subjective preference order [30,31]. In this paper, each potential missing link is viewed as an alternative, and each similarity index is considered as an attribute. In implementation, we adopt TOPSIS [35], a well-known classical MADM method, to uncover the missing links. Three classical similarity indices, that is, RA, LP, and CAR, are chosen as the attributes in our method. For convenience, our method is named $\mathrm{LP}_{\mathrm{TOPSIS}}$.
In the remainder of this section, we first introduce the TOPSIS method in Section 4.1. Then, the proposed method is presented in Section 4.2. Finally, a simple example is given in Section 4.3 to explain how the proposed method works.

TOPSIS Method.
Technique for order preference by similarity to ideal solution (TOPSIS) is a simple but effective ranking method in conception and application [35].The standard TOPSIS method attempts to determine the best alternative that simultaneously has the shortest distance from the positive ideal solution and the farthest distance from the negative ideal solution.The ranking of the alternatives is calculated according to the relative closeness to the ideal solution.
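Given a decision matrix $X = (x_{ij})_{m \times n}$ of $m$ alternatives and $n$ criteria (decision attributes) in
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}, \tag{21}$$
where $x_{ij}$ denotes the value of the $i$th alternative under the $j$th criterion in matrix $X$. The procedure of the TOPSIS method is as follows:
Step 1. Normalize the decision matrix $X$ by using the vector normalization technique:
$$y_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^2}}, \tag{22}$$
where $y_{ij}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) is the normalized value of the $i$th alternative under the $j$th criterion.
Step 2. Calculate the weighted normalized decision matrix $V = (v_{ij})_{m \times n}$ by using the associated weights:
$$v_{ij} = w_j y_{ij}, \tag{23}$$
where $w_j$ ($j = 1, 2, \ldots, n$) is the weight of the $j$th criterion.
Step 3. Determine the positive ideal solution $S^+$ and the negative ideal (anti-ideal) solution $S^-$, respectively:
$$S^+ = \{v_1^+, v_2^+, \ldots, v_n^+\}, \tag{24}$$
$$S^- = \{v_1^-, v_2^-, \ldots, v_n^-\}. \tag{25}$$
For the benefit criteria $K_b$:
$$v_j^+ = \max_i v_{ij}, \quad v_j^- = \min_i v_{ij}, \quad j \in K_b, \tag{26}$$
and for the cost criteria $K_c$:
$$v_j^+ = \min_i v_{ij}, \quad v_j^- = \max_i v_{ij}, \quad j \in K_c. \tag{27}$$
Step 4. Obtain the Euclidean distance of each alternative from the positive ideal and the negative ideal solutions, respectively:
$$d_i^+ = \sqrt{\sum_{j=1}^{n} (v_{ij} - v_j^+)^2}, \tag{28}$$
$$d_i^- = \sqrt{\sum_{j=1}^{n} (v_{ij} - v_j^-)^2}, \tag{29}$$
where $d_i^+$ and $d_i^-$ are the Euclidean distances of the $i$th alternative from the positive ideal and the negative ideal solutions, respectively.
Step 5. Compute the relative closeness of each alternative to the ideal solution:
$$C_i = \frac{d_i^-}{d_i^+ + d_i^-}, \tag{30}$$
where $C_i \in [0, 1]$ for $i = 1, 2, \ldots, m$. An alternative with a higher $C_i$ is considered a better solution, and vice versa.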
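The five TOPSIS steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function name `topsis`, the `benefit` mask, and the handling of all-zero columns and of identical ideal and anti-ideal solutions are choices made here.

```python
import numpy as np

def topsis(X, weights, benefit=None):
    """Return the relative closeness C_i of each row (alternative) of X."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    m, n = X.shape
    # By default every criterion is a benefit criterion (larger is better).
    benefit = np.ones(n, dtype=bool) if benefit is None else np.asarray(benefit, dtype=bool)

    # Step 1: vector normalization of each column.
    norms = np.sqrt((X ** 2).sum(axis=0))
    norms[norms == 0] = 1.0            # assumption: an all-zero column stays zero
    Y = X / norms

    # Step 2: weighted normalized matrix.
    V = Y * w

    # Step 3: positive ideal and negative ideal solutions.
    s_pos = np.where(benefit, V.max(axis=0), V.min(axis=0))
    s_neg = np.where(benefit, V.min(axis=0), V.max(axis=0))

    # Step 4: Euclidean distances to both solutions.
    d_pos = np.sqrt(((V - s_pos) ** 2).sum(axis=1))
    d_neg = np.sqrt(((V - s_neg) ** 2).sum(axis=1))

    # Step 5: relative closeness (0 if both distances vanish).
    denom = d_pos + d_neg
    return np.divide(d_neg, denom, out=np.zeros(m), where=denom > 0)

# Hypothetical scores of three alternatives under two benefit criteria.
closeness = topsis([[3.0, 3.0], [1.0, 1.0], [2.0, 2.0]], weights=[1.0, 1.0])
ranking = np.argsort(-closeness)       # best alternative first
```

In the link prediction setting described in this paper, each row would be an unconnected node pair and the three columns would hold its RA, LP, and CAR scores, all treated as benefit criteria.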
4.2. The Proposed Method.
Figure 1 presents the flow chart of the proposed method. The detailed description of the proposed method is outlined as follows.
Step 1. Determine the weights of different similarity indices.
In TOPSIS, each attribute is associated with a weight. In this work, we employ similarity indices as the attributes in the TOPSIS application for link prediction. Hence, we need to determine the weight of each index. To this end, we adapt the AUC to micro nodes and define $\mathrm{AUC}_{v_i}$ for node $v_i$ [44]. For the node $v_i$, let $E_{v_i}$ denote the set of existing links between $v_i$ and other nodes and $\bar{E}_{v_i}$ represent the set of nonexisting links between $v_i$ and other nodes. Let $I$ be a similarity index; the similarity scores of all links in $E_{v_i}$ and $\bar{E}_{v_i}$ are computed based on $I$. Then, the value of $\mathrm{AUC}_{v_i}$ is defined as
$$\mathrm{AUC}_{v_i} = \frac{n'_{v_i} + 0.5\, n''_{v_i}}{n_{v_i}}, \tag{31}$$
where $n_{v_i}$ is the total number of independent comparisons, $n'_{v_i}$ denotes the number of comparisons in which the link from $E_{v_i}$ has the higher similarity score, and $n''_{v_i}$ denotes the number of comparisons in which the two links have the same score.
To define the weight of similarity index $I$, we randomly select $p\%$ of the nodes from the network and compute the value of $\mathrm{AUC}_{v_i}$ for each selected node. Let $\rho$ be the set of selected nodes; we define
$$\mathrm{AUC}_\rho = \frac{1}{|\rho|} \sum_{v_i \in \rho} \mathrm{AUC}_{v_i}. \tag{32}$$
The weight of similarity index $I$, denoted as $w_I$, is determined by the ratio of $\mathrm{AUC}_\rho$ to 0.5 [44]:
$$w_I = \frac{\mathrm{AUC}_\rho}{0.5}, \tag{33}$$
where 0.5 represents the result of random prediction.
Algorithm 1 lists the procedure for determining the weight of a similarity index.In implementation, we randomly select 10% nodes from the networks.
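The weight computation can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' implementation: the adjacency-dictionary input, the callable `score_fn(u, v)`, the number of sampled comparisons per node, and the fallback to the random-prediction value 0.5 when no selected node has both existing and nonexisting links are all assumptions introduced here.

```python
import random

def node_auc(existing, nonexisting, n, rng):
    """AUC of one node: compare scores of its existing vs. nonexisting links."""
    n1 = n2 = 0
    for _ in range(n):
        s_existing = rng.choice(existing)
        s_nonexisting = rng.choice(nonexisting)
        if s_existing > s_nonexisting:
            n1 += 1
        elif s_existing == s_nonexisting:
            n2 += 1
    return (n1 + 0.5 * n2) / n

def index_weight(adj, score_fn, p=0.10, n=1000, seed=0):
    """Weight of a similarity index: mean node-level AUC divided by 0.5.

    adj      -- dict mapping each node to the set of its neighbors (training graph)
    score_fn -- score_fn(u, v) returns the similarity of u and v under the index
    p        -- fraction of nodes to sample (0.10 corresponds to 10%)
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    rho = rng.sample(nodes, max(1, round(p * len(nodes))))
    aucs = []
    for v in rho:
        existing = [score_fn(v, u) for u in nodes if u in adj[v]]
        nonexisting = [score_fn(v, u) for u in nodes if u != v and u not in adj[v]]
        if existing and nonexisting:      # skip nodes where no comparison is possible
            aucs.append(node_auc(existing, nonexisting, n, rng))
    auc_rho = sum(aucs) / len(aucs) if aucs else 0.5   # assumed fallback
    return auc_rho / 0.5

# Hypothetical path graph 0-1-2-3 and two artificial score functions.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
w_perfect = index_weight(adj, lambda u, v: 1.0 if v in adj[u] else 0.0, p=1.0)
w_random = index_weight(adj, lambda u, v: 0.0, p=1.0)
```

An index that separates existing from nonexisting links perfectly receives weight 2, while an index that is no better than chance receives weight 1, so better-performing indices contribute more to the weighted decision matrix.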
Step 2. Calculate the similarity score for each potential missing link by different similarity indices. As mentioned above, the RA, LP, and CAR indices are used in $\mathrm{LP}_{\mathrm{TOPSIS}}$ because they are designed based on different but prominent structural features. Suppose that $m$ unconnected node pairs are assigned similarity scores; the decision matrix is represented as follows:
$$X = \begin{pmatrix} RA_1 & LP_1 & CAR_1 \\ RA_2 & LP_2 & CAR_2 \\ \vdots & \vdots & \vdots \\ RA_m & LP_m & CAR_m \end{pmatrix}. \tag{34}$$
Step 3. Compute the normalized and weighted decision matrix. Since the values of different similarity indices are on different scales, we normalize matrix $X$ based on (22). Then, we compute the weighted normalized matrix using (23). The associated weights are obtained in Step 1.
Step 4. Determine the positive ideal and negative ideal solutions. According to the weighted normalized matrix obtained in Step 3, the positive ideal solution $S^+$ and negative ideal solution $S^-$ are determined using (24) and (25), respectively.
Step 5. Calculate the separation measures of potential missing links from the positive ideal and the negative ideal solutions.The separation measures between potential missing links and the positive ideal solution are calculated using (28), and the separation measures between potential missing links and the negative ideal solution are calculated using (29).
Step 6. Obtain the relative closeness of each potential missing link to the ideal solution.The relative closeness is obtained by (30).Sort potential missing links in descending order according to their closeness scores, and the links at the top are most likely to be missing ones.

Example Explanation.
This section uses an example to explain how the proposed method works. Figure 2 shows the example network. In this example, we simply assume that each similarity index has the same weight. Thus, we do not need to weight the normalized matrix.
First, calculate the values of RA, LP, and CAR for unconnected node pairs of the network. Thirteen, twenty, and three unconnected node pairs are assigned values by RA, LP, and CAR, respectively. The results are listed in Figure 3(a). Because the LP index considers both 2-hop and 3-hop paths, it gives values for most unconnected node pairs. The CAR index requires links between the common neighbors. Since the example network is very simple, this requirement cannot always be satisfied. Therefore, CAR assigns a score of zero to most node pairs. Second, the decision matrix $X$ and the normalized decision matrix $Y$ are obtained according to Figure 3(a) and (22); they are presented in Figures 3(b) and 3(c), respectively.
Next, determine the positive ideal solution $S^+$ and the negative ideal solution $S^-$; here $S^+ = (0.66699, 0.52204, 0.72761)$. Then, compute $d_i^+$ and $d_i^-$ for each potential missing link. The results are outlined in Table 1.
Finally, calculate the relative closeness of all potential missing links and rank them accordingly. The results are shown in Table 1. As mentioned above, the potential missing links are ranked according to their relative closeness to the ideal solution, and the links at the top ranks are assumed to be the real missing ones.

[Figure 1 flow chart: compute the weights using Algorithm 1 for RA, LP, and CAR (Step 1), followed by Steps 2 to 6.]
(1) C. elegans (CE): the neural network of the worm Caenorhabditis elegans [45].
(2) Email: a network of email interchanges between members of a university [46].
(4) Football (FTB): the network of American football games between Division IA colleges during regular season Fall 2000 [48].
(5) HEP: the coauthorship network of scientists who posted preprints on the high-energy theory archive from 1995 to 1999 [49].
(7) Jazz: a network of jazz musicians [51].
(10) USAir: a network of the US air transportation system [17].
In this work, all networks are treated as undirected and unweighted networks, and only the giant component of each network is used.Table 2 lists the basic statistics of the giant components of these networks.

Friedman Test.
To further analyze the statistical significance of the proposed method, the Friedman test [56] is introduced in our experiments.This test is a nonparametric statistical hypothesis test used to compare multiple methods on a group of datasets [57].It ranks the methods for each dataset separately according to their accuracy; the best performing method getting rank 1, the second best rank 2, and so on.In case of ties, average ranks are assigned.
Given $k$ methods and $N$ datasets, $r_{i,j}$ denotes the rank of the $i$th method on the $j$th dataset, and $R_i$ is the average rank of the $i$th method, $R_i = \frac{1}{N} \sum_j r_{i,j}$. The null hypothesis in the Friedman test is that all the methods are equivalent, so their average ranks $R_i$ should be equal. The Friedman statistic is
$$\chi_F^2 = \frac{12N}{k(k+1)} \left[ \sum_{i=1}^{k} R_i^2 - \frac{k(k+1)^2}{4} \right],$$
which is distributed according to $\chi^2$ with $k - 1$ degrees of freedom. Later, Iman and Davenport found that the Friedman statistic is undesirably conservative and presented a better statistic [58], which is
$$F_F = \frac{(N-1)\chi_F^2}{N(k-1) - \chi_F^2}.$$
This statistic is distributed according to the F-distribution with $k - 1$ and $(k-1)(N-1)$ degrees of freedom. If $F_F$ exceeds the critical value of the F-distribution at the chosen significance level, the null hypothesis is rejected [57]; in that case, there are significant differences between the approaches.
If the null hypothesis is rejected, a post-hoc test is carried out to analyze the significant differences. The critical difference is defined as
$$CD = q_\alpha \sqrt{\frac{k(k+1)}{6N}},$$
where $q_\alpha$ is a critical value for the post-hoc test [57]. The performance of two approaches is significantly different if their average ranks differ by at least the critical difference.
paths introduced by LP can make the similarities much more distinguishable and thus enhance the accuracy. Although $\mathrm{LP}_{\mathrm{TOPSIS}}$ integrates the advantage of the LP index, the CAR index, which is not suitable for very sparse networks, weakens its performance. On Jazz and USAir, the proposed method gets ranks 4 and 3, respectively. This results from the poor accuracy of the LP index on both networks. In this paper, we uniformly set the parameter of the LP index to 0.001; however, the optimal value of the parameter varies across networks [21,23], and finding the optimal value is very time-consuming. In addition, $\mathrm{LR}_m$ yields fairly good predictions, obtaining three best and three second-best results. Generally speaking, $\mathrm{LP}_{\mathrm{TOPSIS}}$ performs the best and $\mathrm{LR}_m$ the second best. The reason is that both methods aggregate several structural features. However, the other baselines do not obtain satisfactory results. Take the LP index as an example: it is ranked first on HEP but seventh on USAir and eighth on Jazz. In a nutshell, the proposed method is more stable across networks than the baselines. The MaxE method, which uncovers missing links by reconstructing the network at hand based on the exponential random graph framework, performs relatively poorly on networks with well-defined communities, such as FBK, FTB, and Email. The reason is that the configuration model used in MaxE to reconstruct the network may not be the optimal generative model for those networks. Determining the optimal generative model is, in fact, very difficult. Nevertheless, on networks with a core-periphery structure, for example, PB and USAir, MaxE achieves good accuracy under the metric of AUC. The corresponding values are 0.9009 and 0.8887, which are acceptable. The configuration model of MaxE takes just the node degrees as input, and networks with a core-periphery structure seem to be largely explained by their degree sequences [28]. Thus, MaxE is suitable for networks with a core-periphery structure.

Results and Analysis.
Next, we apply the Friedman test [57] to the above results to analyze the significant differences between the baselines and $\mathrm{LP}_{\mathrm{TOPSIS}}$. Based on Table 3, we get $\chi_F^2 = 72.115$ and $F_F = 36.288$. In this paper, there are 10 methods and 10 networks, so $F_F$ is distributed according to the F-distribution with $10 - 1 = 9$ and $(10 - 1) \times (10 - 1) = 81$ degrees of freedom. The critical value of $F(9, 81)$ for $\alpha = 0.05$ is 1.998. Since $F_F = 36.288 > 1.998$, we reject the null hypothesis, which states that all the methods are equivalent.
Since the null hypothesis is rejected, we proceed with a post-hoc test. In our experiments, the Bonferroni-Dunn test [59], in which all methods are compared only to a control method and not between themselves [57], is employed to estimate the significant differences between $\mathrm{LP}_{\mathrm{TOPSIS}}$ and the baselines. The critical difference is $CD = 2.773 \times \sqrt{10 \times (10 + 1)/(6 \times 10)} = 3.75$ for $\alpha = 0.05$. The results are graphically presented in Figure 4. On the axis, the best rank is on the left side. Figure 4 shows that $\mathrm{LP}_{\mathrm{TOPSIS}}$ is significantly better than LNB-CN, CCLP, MI, CAR, and MaxE. Although $\mathrm{LR}_m$, ADP, RA, and LP do not differ significantly from $\mathrm{LP}_{\mathrm{TOPSIS}}$, the average rank of $\mathrm{LP}_{\mathrm{TOPSIS}}$ is better than theirs.
Notes to Table 2: … [45] and assortative coefficient [54], respectively. $H$ is the degree heterogeneity [17]; $e$ is the network efficiency [55].
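The test statistics quoted in this section can be re-derived with a short Python check. The sketch below assumes the standard forms of the Friedman statistic, the Iman-Davenport correction, and the critical difference with $k$ methods and $N$ datasets; the function names are introduced here for illustration only.

```python
import math

def friedman_chi2(avg_ranks, n_datasets):
    """Friedman statistic from the average ranks R_i of k methods on N datasets."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12.0 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)

def iman_davenport(chi2, k, n_datasets):
    """Iman-Davenport statistic F_F derived from the Friedman statistic."""
    return (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)

def critical_difference(q_alpha, k, n_datasets):
    """Critical difference of average ranks for the post-hoc test."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

# Values reported in the text: chi2_F = 72.115, q_alpha = 2.773, k = N = 10.
f_f = iman_davenport(72.115, k=10, n_datasets=10)
cd = critical_difference(2.773, k=10, n_datasets=10)
print(round(f_f, 2), round(cd, 2))
```

With the rounded value $\chi_F^2 = 72.115$, the computed $F_F$ agrees with the reported 36.288 to within rounding, and the critical difference rounds to the reported 3.75.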
Furthermore, Figure 5 describes the changes in AUC of all prediction methods for varying proportions of the training set $E^{tr}$ in $E$ (from 0.7 to 0.9). Clearly, AUC scores show an upward trend when the proportion increases from 0.7 to 0.9 in Figure 5. The reason is evident: the larger the proportion of the training set $E^{tr}$, the more information is available for training. Conversely, low proportions of $E^{tr}$ increase the difficulty of link prediction. Therefore, we do not conduct experiments with lower proportions of $E^{tr}$, for example, 0.6 and 0.5. According to Figure 5, the AUC values of the proposed method, with varying sizes of the training set, are either the best or close to the best. Figure 5(k) exhibits the average ranks of different methods for varying proportions of $E^{tr}$. It can be observed that the proposed method always obtains the best average rank. The significance analysis for $|E^{tr}|/|E| = 0.9$ (that is, $|E^{ts}|/|E| = 0.1$) is already presented above. Now we give the analysis for $|E^{tr}|/|E| = 0.8$ and 0.7, respectively. The corresponding values of $F_F$ are 25.415 and 31.898, respectively. Both values are greater than 1.998 (the critical value of $F(9, 81)$). That means the null hypothesis, which states that all these methods are equivalent, is rejected. The results of the Bonferroni-Dunn test [59] are depicted in Figure 6. From Figure 6, $\mathrm{LP}_{\mathrm{TOPSIS}}$ ranks first and is significantly better than LNB-CN, CCLP, MI, CAR, and MaxE. These results are similar to those in Figure 4.
On the other hand, the metric of Precision focuses on the top-$L$ predicted links. Figure 7 shows the prediction results under the metric of Precision on the 10 networks with different sizes of $L$. These results demonstrate that $\mathrm{LP}_{\mathrm{TOPSIS}}$ is consistently among the top methods on most networks, whereas the baselines fluctuate widely across networks. For instance, the RA index achieves the best result on NS but falls to second-to-last on PB. In addition, for most methods, Precision shows a slightly downward trend when $L$ increases. The reason is that, as $L$ increases, the probability of uncovering relevant items decreases, and the value of Precision drops accordingly. Figure 7(k) presents the average rank of each method over the 10 networks with respect to different sizes of $L$. Clearly, $\mathrm{LP}_{\mathrm{TOPSIS}}$ outperforms the others except when $L = 10$ and 20. The MI index ranks first when $L = 10$ and 20 but falls behind $\mathrm{LP}_{\mathrm{TOPSIS}}$ for other values of $L$. The MI index is superior to the other baselines under Precision; however, its AUC values are not satisfactory. On the whole, $\mathrm{LR}_m$ performs only moderately in terms of Precision, in spite of its quite good performance under the metric of AUC.
Finally, we depict the changes in Precision of all prediction methods for varying proportions of the training set $E^{tr}$ in $E$ (from 0.7 to 0.9) in Figure 8. In this experiment, we set $L = |E^{ts}|$ for all networks. It can be seen from Figure 8 that Precision exhibits the opposite trend in comparison with AUC, that is, Precision scores show a downward trend when the proportion increases from 0.7 to 0.9. This phenomenon has also been observed in [5]. The main reason is that a smaller training set $E^{tr}$ leads to a smaller $n_1$ and a larger $n_2$ in the definition of AUC (see (1)) and thus lowers the value of AUC [5]. Conversely, as the testing set $E^{ts}$ grows, the probability of obtaining relevant items increases, which makes it easier to uncover the missing links [5]. Therefore, combining both metrics when evaluating the accuracy of a prediction method is necessary in practical applications. Figure 8(k) shows the average ranks of different methods for $|E^{tr}|/|E| = 0.7$, 0.8, and 0.9 in terms of Precision. In general, the proposed method is in second place when $|E^{tr}|/|E| = 0.7$ and ranks first when $|E^{tr}|/|E| = 0.8$ and 0.9. This indicates the stable performance of the proposed method under the metric of Precision.
From the aforementioned results, we can conclude that the proposed method outperforms the compared indices and is applicable to more networks. The striking characteristic of the proposed method is that it aggregates several structural features of a network by combining the RA, LP, and CAR indices using TOPSIS. Thus, the proposed method can automatically adapt to diverse networks and performs stably across them. Although the ADP index is claimed to adapt automatically to the structure of a network by adaptively penalizing the degrees of common neighbors, it still focuses, in effect, on one structure in a network. Therefore, there are gaps between the accuracy of ADP and that of $\mathrm{LP}_{\mathrm{TOPSIS}}$. Similar to $\mathrm{LP}_{\mathrm{TOPSIS}}$, $\mathrm{LR}_m$ aggregates three structural features, in its case with logistic regression. This method achieves the second-best results under AUC, whereas its accuracy under Precision is only moderate.

Conclusion
Link prediction aims at finding the missing links and predicting future links in a network. As an important research topic in complex network analysis, it has drawn increasing attention from disparate scientific communities. Among various categories of approaches, similarity-based methods have become the mainstream due to their low complexity and high interpretability. In general, a similarity-based method assumes that its similarity measure is applicable to diverse networks. However, different networks often have different inner topological structures, which results in the unstable performance of similarity-based methods. Inspired by the applications of multiple-attribute decision-making (MADM), we proposed in this paper a novel link prediction method. The proposed method employs three classical similarity indices, that is, RA, LP, and CAR, as the attributes and aggregates their scores by means of TOPSIS, a well-known MADM method, to rank unconnected node pairs. The accuracy of the proposed method is experimentally evaluated on 10 real-world networks with the metrics of AUC and Precision. The experimental results indicate that the proposed method is both more effective and more stable than the competing methods. The comparisons of the proposed method with the baselines for varying sizes of training sets indicate the robustness of the proposed method. The results of our experiments demonstrate that MADM is an effective way to solve the link prediction problem. In future work, we will further study the application of MADM methods in link prediction.

(6) Infectious (INF): a network of face-to-face contacts among people at the exhibition "Infectious: Stay Away", held in 2009 at the Science Gallery in Dublin [50].

Figure 2: An example network with 8 nodes and 13 links.

Figure 3: The values of RA, LP, and CAR for unconnected node pairs in the example network.

Figure 5: AUC results on 10 networks with different proportions of the training set $E^{tr}$. The results are the average of 50 independent implementations.

Figure 8: Precision results on 10 networks with different proportions of the training set. The results are the average of 50 independent implementations. In this figure, $L = |E^{ts}|$ for all networks.
Weight vector: $[w_{RA}, w_{LP}, w_{CAR}]$. Decision matrix: one row $(RA_i, LP_i, CAR_i)$ per node pair, $i = 1, \dots, m$.

To evaluate the performance of the proposed method, we use 10 real-world networks collected from various fields in this work (all data are downloaded from http://deim.urv.cat/~alexandre.arenas/data/welcome.htm, http://www-personal.umich.edu/~mejn/netdata/, http://vlado.fmf.uni-lj.si/pub/networks/data/, and http://noesis.ikor.org/datasets/link-prediction). Brief descriptions of these networks are given as follows:

Figure 1: The flow chart of the proposed method.

Input: $E^{tr}$: training graph; $I$: similarity index; $p\%$: percentage of nodes
Output: weight of $I$
1: $\rho \leftarrow$ randomly selected $p\%$ nodes from $E^{tr}$;
2: for $v_i \in \rho$ do
3:   Calculate similarity scores for existing links between $v_i$ and other nodes based on $I$;
4:   Calculate similarity scores for non-existing links between $v_i$ and other nodes based on $I$;

Table 3 lists the predicted results of different methods under the AUC metric on the 10 networks. The numbers in round brackets are the ranks. In case of ties, average ranks are assigned. These results are the average of 50 independent realizations for each network. In each realization, we randomly split a network into a training set and a testing set, which contain 90% and 10% of the links, respectively. The best value for each network is highlighted in boldface. It is evident from Table 3 that $LP_{TOPSIS}$ achieves the best accuracy in terms of AUC on Email, FBK, FTB, INF, and NS and obtains the second best on CE and PB. These results are fairly decent. On HEP, LP outperforms the others because, as noted in [23], such a network possesses a high average shortest distance and a small average degree and network efficiency. In other words, the HEP network is very sparse (see Table 2), so the local methods, such as RA, CAR, and CCLP, obtain lower accuracy than LP. As stated in [23], in a relatively sparse network, common neighbor-based methods are less distinguishable. Whereas, the additional information provided by the 3-hop

Table 1: Relative closeness and rank of each node pair.

Table 2: The basic statistics of the giant components of the 10 networks. $|V|$ and $|E|$ are the total numbers of nodes and edges, respectively. $\langle k \rangle$ and $\langle d \rangle$ denote the average degree and the average shortest distance, respectively. $C$ and $r$ indicate the clustering coefficient

Table 3: AUC values of different methods on 10 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized in boldface. The numbers in round brackets are the ranks. In case of ties, average ranks are assigned.
Comparison of $LP_{TOPSIS}$ against the others with the Bonferroni-Dunn test. This comparison is based on the results in Table 3. All methods with ranks outside the marked interval are significantly different from $LP_{TOPSIS}$.
Figure 6: The Bonferroni-Dunn test for $|E^{tr}|/|E| = 0.8$ and $0.7$. All methods with ranks outside the marked interval are significantly different from $LP_{TOPSIS}$.
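For reference, the "marked interval" in such a test is usually built from the critical difference of average ranks. The expression below is the standard textbook form, given as an assumption about how the interval is obtained; the symbols $k$ (number of compared methods), $N$ (number of networks, 10 here), and $q_\alpha$ (critical value at significance level $\alpha$) are introduced for illustration and do not appear in this section of the paper.

```latex
% Critical difference (CD) for the Bonferroni-Dunn test:
% a method differs significantly from the control method when
% their average ranks differ by at least CD.
CD = q_{\alpha} \sqrt{\frac{k(k+1)}{6N}}
```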
Figure 7: Precision results on 10 networks with different values of $L$. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The size of $E^{ts}$ for FTB is 61, so the maximum $L$ selected is 60. Similarly, the maximum $L$ selected for NS is 90.