Introduction

Network science is playing an increasingly significant role in many domains, including physics, sociology, engineering, biology, and management1. Because of the heterogeneous nature of real networks2, the overall connectivity of a network may depend on a small set of nodes, usually referred to as hub nodes. Taking the Internet as an example, deliberate attacks on a few vital nodes may lead to the collapse of the whole network3. Therefore, an efficient algorithm to identify the nodes that are critical for network connectivity can help to prevent catastrophic outages in power grids or the Internet3,4,5,6, maintain the connectivity of, or design efficient attacking strategies for, communication networks7, improve urban transportation capacity at low cost8, enhance the robustness of financial networks9, and so on.

So far, the majority of known methods for identifying nodes that are vital to network connectivity rely solely on structural information10. Typical representatives include degree centrality (DC)11, H-index12, k-shell index (KS)13, PageRank (PR)14, LeaderRank15, closeness centrality (CC)16, betweenness centrality (BC)17, and so on. Recently, Morone and Makse18 proposed a novel index called collective influence (CI), which is based on site percolation theory and finds a minimal set of nodes that are crucial for global connectivity. CI performs remarkably better than many known methods in identifying nodes' importance for network connectivity18,19. Furthermore, some more sophisticated methods with even better performance have been reported, such as the belief-propagation-guided decimation method20,21 (BPD), the two-core based algorithm22 (CoreHD) and the explosive immunization method23 (EI).

This paper proposes a novel method named the reverse greedy (RG) method. The first word refers to the process of adding nodes one by one to an empty network, which is the inverse of the usual process of removing nodes from the original network. The second word emphasizes that each added node is chosen to minimize the size of the largest component. Empirical analyses on eighteen real networks show that RG performs remarkably better than well-known state-of-the-art methods.

Algorithms

The core of the RG algorithm is the reverse process, which adds nodes one by one to an empty network while minimizing a cost function, until all nodes of the considered network have been added. Nodes are then ranked in the inverse order of addition, that is, the later a node is added, the more important it is for maintaining network connectivity. Denote by \(G(V,E)\) the original network under consideration, where \(V\) and \(E\) are the sets of nodes and edges, respectively. This paper focuses on simple networks, where the weights and directions of edges are ignored and self-loops are not allowed. The reverse process starts from an empty network \({G}_{0}({V}_{0},{E}_{0})\), where \({V}_{0}=\varnothing \) and \({E}_{0}=\varnothing \). At the \((n+1)\)th time step, one node from the remaining set \(V-{V}_{n}\) is selected and added to the current network \({G}_{n}({V}_{n},{E}_{n})\) to form a new network of \((n+1)\) nodes, denoted \({G}_{n+1}({V}_{n+1},{E}_{n+1})\). Note that all intermediate networks \({G}_{n}\) (\(n=0,1,2,\cdots ,N\), with \(N\) being the size of the original network \(G\)) are induced subgraphs of \(G\); that is, \({G}_{n}\) contains every edge of \(G\) whose two endpoints both belong to \({V}_{n}\). According to the greedy strategy, the selected node \(i\) should minimize the size of the largest component in \({G}_{n+1}\). If multiple nodes satisfy this condition, we break the tie using another structural feature of node \(i\) in \(G\) (e.g., degree, betweenness, and so on). Therefore, the cost function is defined as

$$cost(i,n+1)={G}_{n+1}^{\max }(i)+\epsilon f(i),$$
(1)

where \({G}_{n+1}^{\max }(i)\) is the size of the largest component after adding node \(i\) into \({G}_{n}\), \(f(i)\) is a certain structural feature of node \(i\) in \(G\), and \(\epsilon \) is a very small positive parameter that has no effect on the results of RG as long as \(\epsilon \cdot \mathop{\max }\limits_{i}f(i) < 1\) is guaranteed. The parameter \(\epsilon \) matters only when \({G}_{n+1}^{\max }(\cdot )\) is identical for multiple nodes. At each time step, we add to the network the node that minimizes the cost function; if several nodes still share the minimum cost, we select one of them at random. This process stops after \(N\) time steps, namely when all nodes have been added and \({G}_{N}\equiv G\). An illustration of this process on a small network is shown in Fig. 1.
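To make the reverse process concrete, the following is a minimal Python sketch of RG, assuming the network is given as an adjacency list (a dict mapping each node to the set of its neighbours) and using degree as the feature \(f\). A union-find structure tracks component sizes so that \({G}_{n+1}^{\max }(i)\) can be evaluated for each candidate; for brevity, residual ties are broken by iteration order rather than at random, and the quadratic scan over candidates is left unoptimized.

def reverse_greedy(adj, eps=1e-7):
    # adj: dict node -> set of neighbours; eps must satisfy eps * max f(i) < 1
    nodes = list(adj)
    f = {i: len(adj[i]) for i in nodes}          # structural feature f(i): degree in G
    parent, size = {}, {}                        # union-find over already-added nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]        # path halving
            x = parent[x]
        return x

    added, order, largest = set(), [], 0
    while len(added) < len(nodes):
        best, best_cost = None, float("inf")
        for i in nodes:
            if i in added:
                continue
            roots = {find(j) for j in adj[i] if j in added}
            merged = 1 + sum(size[r] for r in roots)     # component containing i after addition
            cost = max(merged, largest) + eps * f[i]     # Eq. (1): G_{n+1}^max(i) + eps * f(i)
            if cost < best_cost:
                best, best_cost = i, cost
        # add the selected node and merge the components of its added neighbours
        parent[best], size[best] = best, 1
        for j in adj[best]:
            if j in added:
                ri, rj = find(best), find(j)
                if ri != rj:
                    parent[rj] = ri
                    size[ri] += size[rj]
        added.add(best)
        order.append(best)
        largest = max(largest, size[find(best)])
    return order[::-1]                           # later-added nodes are ranked as more important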

Figure 1

The process of RG in a network with six nodes. Here we use degree as the feature \(f\) and, for convenience in the description, set \(\epsilon =0.01\) (note that \(\epsilon \) should be positive yet small enough; it is set to \(1{0}^{-7}\) in the later experiments). Initially, the network is empty. At the first time step, \({G}_{1}^{\max }({v}_{1})={G}_{1}^{\max }({v}_{2})={G}_{1}^{\max }({v}_{3})={G}_{1}^{\max }({v}_{4})={G}_{1}^{\max }({v}_{5})={G}_{1}^{\max }({v}_{6})=1\), and \(cost({v}_{1},1)=1.04\), \(cost({v}_{2},1)=1.03\), \(cost({v}_{3},1)=1.03\), \(cost({v}_{4},1)=1.03\), \(cost({v}_{5},1)=1.03\), \(cost({v}_{6},1)=1.02\). Therefore, we add node \({v}_{6}\) into the network because \(cost({v}_{6},1)\) is the smallest. At the second time step, \({G}_{2}^{\max }({v}_{1})={G}_{2}^{\max }({v}_{2})={G}_{2}^{\max }({v}_{3})=1\) and \({G}_{2}^{\max }({v}_{4})={G}_{2}^{\max }({v}_{5})=2\), so we only compare the three candidates \({v}_{1}\), \({v}_{2}\) and \({v}_{3}\). Since \(cost({v}_{1},2)=1.04\), \(cost({v}_{2},2)=1.03\) and \(cost({v}_{3},2)=1.03\), we randomly select a node from \(\{{v}_{2},{v}_{3}\}\); here we choose \({v}_{2}\) for example. This process is repeated until all nodes have been added into the network. Finally, we obtain the ranking of nodes \(\{{v}_{4},{v}_{5},{v}_{1},{v}_{3},{v}_{2},{v}_{6}\}\), in the inverse order of addition. The symbol \(n\) at the bottom of each plot indicates the corresponding time step.

Data

In this paper, eighteen real networks from disparate fields are used to test the performance of RG, including four collaboration networks (Jazz, NS, Ca-AstroPh and Ca-CondMat), two communication networks (Email-Univ and Email-EuAll), four social networks (PB, Sex, Facebook and Loc-Gowalla), one transportation network (USAir), one infrastructure network (Power), one citation network (Cit-HepPh), one road network (RoadNet-TX), one web graph (Web-Google) and three autonomous systems graphs (Router, AS-733 and AS-Skitter). Jazz24 is a collaboration network of jazz musicians. NS25 is a co-authorship network of scientists working on network science. Ca-AstroPh26 is a collaboration network of the arXiv Astro Physics category. Ca-CondMat26 is a collaboration network of the arXiv Condensed Matter category. Email-Univ27 describes email interchanges among users, including faculty, researchers, technicians, managers, administrators, and graduate students, of the Rovira i Virgili University. Email-EuAll26 is an email network of a large European research institution. PB28 is a network of US political blogs. Sex29 is a bipartite network in which nodes are females (sex sellers) and males (sex buyers) and edges between them are established when males write posts indicating sexual encounters with females. Facebook30 is a sample of the friendship network of Facebook users. Loc-Gowalla31 describes the friendships of Gowalla users. USAir32 is the US air transportation network. Power33 is a power grid of the western United States. Cit-HepPh34 is a citation network of high energy physics phenomenology. RoadNet-TX35 is a road network of Texas. Web-Google35 is a web graph from the Google programming contest in 2002. Router36 is a symmetrized snapshot of the structure of the Internet at the level of autonomous systems. AS-73337 contains the daily instances of autonomous systems from November 8, 1997 to January 2, 2000. AS-Skitter37 describes the autonomous systems found by traceroutes run daily in 2005 by Skitter. These networks' topological features (including the number of nodes, the number of edges, the average degree, the clustering coefficient33, the assortative coefficient38 and the degree heterogeneity39) are shown in Table 1.

Table 1 The basic topological features of the eighteen real networks. \(N\) and \(M\) are the number of nodes and edges, \(\langle k\rangle \) is the average degree, \(C\) is the clustering coefficient, \(r\) is the assortative coefficient and \(H\) is the degree heterogeneity.

Besides real-world networks, we also consider Erdős–Rényi (ER) random networks40 with different densities. We generate four ER random networks with \(N=100000\) and \(\langle k\rangle =6,9,12,15\), denoted ER-6, ER-9, ER-12 and ER-15, respectively.
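The paper does not specify the generator used; as an illustration, the following sketch (assuming networkx) produces comparable ER networks by choosing the connection probability \(p\) so that the expected average degree \(p(N-1)\) matches the target \(\langle k\rangle \).

import networkx as nx

N = 100000
er_networks = {}
for k_avg in (6, 9, 12, 15):
    p = k_avg / (N - 1)                              # expected average degree p * (N - 1) = <k>
    er_networks["ER-%d" % k_avg] = nx.fast_gnp_random_graph(N, p, seed=42)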

Results

We apply two widely used metrics to evaluate the algorithms' performance. The first is the robustness \(R\)41. Given a network, we remove one node at each time step and calculate the size of the largest component of the remaining network, until the remaining network is empty. The robustness \(R\) is defined as41

$$R=\frac{1}{N}\mathop{\sum }\limits_{Q=1}^{N}S(Q),$$
(2)

where \(S(Q)\) is the number of nodes in the largest component, divided by \(N\), after removing \(Q\) nodes. The normalization factor \(1/N\) ensures that the values of \(R\) for networks of different sizes can be compared. Obviously, a smaller \(R\) means a quicker collapse and thus a better performance. The second metric is the number of nodes that must be removed to make the size of the largest component of the remaining network no more than \(0.01N\), denoted by \({\rho }_{min}\). Obviously, a smaller \({\rho }_{min}\) means a better performance.
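As an illustration, both metrics can be computed from a full node ranking as in the following sketch, which assumes an undirected networkx graph G and a list ranking ordered from the most to the least important node.

import networkx as nx

def robustness_and_rho_min(G, ranking, threshold=0.01):
    N = G.number_of_nodes()
    H = G.copy()
    R, rho_min = 0.0, None
    for Q, node in enumerate(ranking, start=1):
        H.remove_node(node)                                            # remove one node per time step
        if H.number_of_nodes() > 0:
            S = max(len(c) for c in nx.connected_components(H)) / N    # S(Q)
        else:
            S = 0.0
        R += S / N                                                     # Eq. (2): R = (1/N) * sum_Q S(Q)
        if rho_min is None and S <= threshold:
            rho_min = Q                                                # smallest Q with largest component <= 0.01 N
    return R, rho_min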

Here we use BC, CC, DC, H-index, KS, PR, CI, CI with reinsertion (CI+ for short), BPD, CoreHD and EI as the benchmark algorithms (see Methods for details about these benchmarks). Tables 2 and 3 compare \(R\) and \({\rho }_{min}\) of RG and the other benchmark algorithms, respectively. Notice that we use random removal (Random) as the background benchmark in order to show the improvement achieved by each method. Both BPD and CoreHD need a refinement process that inserts back some removed nodes: in each step, it calculates the increase of the largest component size caused by the insertion of each node and selects the node that gives the smallest increase, and the process stops when the largest component reaches \(0.01N\). These two methods do not provide a ranking of all nodes, and thus we cannot obtain \(R\) for them. For EI, following23, we set \(K=6\) and the number of candidates to 2000, so EI cannot be applied to networks with fewer than 2000 nodes (those cases are marked by N/A in Tables 2–4). Each result of EI is obtained by averaging over 20 independent realizations. The radii of CI and CI+ are set to 3, except for AS-Skitter, whose radius is set to 2 because its size is too large and a radius of 3 would lead to too much computation.
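For completeness, the following is a simplified, unoptimized sketch of such a reinsertion step under the stated rule, assuming an undirected networkx graph G and a list removed of deleted nodes whose remaining subgraph has only small components; the published implementations differ in details and are far more efficient.

import networkx as nx

def reinsertion(G, removed, threshold=0.01):
    # Insert removed nodes back, least harmful first, and stop once the next
    # insertion would push the largest kept component above threshold * N.
    # The nodes still absent at that point form the refined dismantling set.
    N = G.number_of_nodes()
    kept = set(G) - set(removed)
    absent = set(removed)
    while absent:
        sub = G.subgraph(kept)
        comp, size = {}, {}
        for cid, c in enumerate(nx.connected_components(sub)):
            size[cid] = len(c)
            for v in c:
                comp[v] = cid
        def merged(v):
            # size of the component that v would belong to after insertion
            cids = {comp[u] for u in G[v] if u in kept}
            return 1 + sum(size[c] for c in cids)
        best = min(absent, key=merged)
        if merged(best) > threshold * N:
            break
        kept.add(best)
        absent.remove(best)
    return absent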

Table 2 The robustness \(R\) for RG and the other benchmarks. The best-performing method for each network is emphasized in bold.
Table 3 The minimum number of nodes \({\rho }_{min}\) for RG and the other benchmarks. The best-performing method for each network is emphasized in bold.
Table 4 Running time of the twelve methods (seconds).

As shown in Table 2, subject to \(R\), CI, CI+, EI and RG perform better than the classical centralities (e.g., BC, CC, DC, H-index, KS and PR) in almost all networks, and RG is overall the best method subject to \(R\) for real networks. Figure 2 shows the collapsing processes of four representative networks resulting from node removal by RG and the other benchmark algorithms. Obviously, RG leads to a faster collapse than all other algorithms, with CI+ being the second-best algorithm. As shown in Table 3, subject to \({\rho }_{min}\), BPD, CoreHD, EI and RG perform better than the classical centralities and CI/CI+ in almost all networks. For real networks, RG and BPD perform remarkably better than the other methods. For random networks, RG is among the best and is much better than the classical centralities, but it is slightly worse than CI+ subject to \(R\) and than BPD subject to \({\rho }_{min}\). In random networks, the topological differences among nodes are relatively small, and thus RG may mistakenly treat some influential nodes as unimportant at an early stage, leading to less satisfactory results. Table 4 compares the CPU times of the twelve methods under consideration, from which one can see that RG is slower than DC, H-index, KS, PR and CoreHD, comparable to CC and BPD, and faster than BC, CI/CI+ and EI. Generally speaking, RG is an efficient method.

Figure 2

Comparison of the performance of the background benchmark (random removal, blue circles), RG (red stars) and the other benchmark algorithms (black symbols). The \(x\)-axis is the fraction of nodes removed (i.e., \(Q/N\)), and the \(y\)-axis is the number of nodes in the largest component divided by \(N\) (i.e., \(S(Q)\)). The four selected networks are (a) AS-733, (b) Sex, (c) Facebook, and (d) Cit-HepPh. Other networks exhibit similar results.

Discussion

To our knowledge, most previous methods identify critical nodes directly by looking at the effects of their removal10. In contrast, our method first seeks the least important nodes, so that the remaining ones are the critical nodes. To our surprise, such a simple idea results in an efficient algorithm that outperforms many well-known benchmark algorithms. Beyond the percolation process considered in this paper, the reverse method provides a novel perspective that may find successful applications in other network-based optimization problems related to rankings of nodes or edges.

The performance of RG can be further improved by introducing more sophisticated techniques. For example, instead of degree, \(f(i)\) can be chosen differently for different networks, or set in a more elaborate way, to improve the algorithm's performance. In addition, the simple greedy strategy may get trapped in local optima. This shortcoming can be overcome to some extent by applying beam search42, which looks for the best set of \(m\) nodes to add to the network, i.e., the set that optimizes the cost function; the present algorithm is the special case \(m=1\). Although beam search is still a kind of greedy strategy, it usually performs much better when \(m\) is larger. At the same time, beam search with large \(m\) is costly in both time and space, so finding a good tradeoff remains an open challenge in practice.

Methods

Benchmark centralities

DC11 of node \(i\) is defined as

$$DC(i)=\sum _{j}{a}_{ij},$$
(3)

where \(A=\{{a}_{ij}\}\) is the adjacency matrix, that is, \({a}_{ij}\) = 1 if \(i\) and \(j\) are directly connected and 0 otherwise.

H-index12 of node \(i\), denoted by \(H(i)\), is defined as the maximal integer such that node \(i\) has at least \(H(i)\) neighbors whose degrees are all no less than \(H(i)\). This index is an extension of the famous H-index in scientific evaluation43 to network analysis.
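For instance, a direct computation under the assumption that the network is stored as an adjacency list adj (dict: node -> set of neighbours):

def h_index(adj, i):
    # degrees of i's neighbours, sorted from largest to smallest
    degs = sorted((len(adj[j]) for j in adj[i]), reverse=True)
    h = 0
    for rank, d in enumerate(degs, start=1):
        if d >= rank:
            h = rank          # at least `rank` neighbours have degree >= rank
        else:
            break
    return h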

KS13 is computed by the following steps. Firstly, remove all nodes with degree one, and keep removing nodes whose degrees drop to one or below until all remaining nodes have degree larger than one; all of the removed nodes are assigned to the 1-shell. Then recursively remove the nodes with degree no larger than two and assign them to the 2-shell. This procedure continues until all higher-layer shells have been identified and all network nodes have been removed.
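The k-shell index coincides with the core number of each node, so it can be obtained either by the pruning procedure above or directly from networkx, e.g.:

import networkx as nx

ks = nx.core_number(G)      # dict: node -> k-shell (core) index; assumes G has no self-loops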

PR14 of node \(i\) is defined as the steady-state solution of the iterative equations

$$P{R}_{i}(t)=s\mathop{\sum }\limits_{j=1}^{N}{a}_{ji}\frac{P{R}_{j}(t-1)}{{k}_{j}}+(1-s)\frac{1}{N},$$
(4)

where \({k}_{j}\) is the degree of node \(j\) and \(s\) is a free parameter controlling the probability of a random jump. In this paper, \(s\) is set to \(0.85\).
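A minimal power-iteration sketch of Eq. (4) for an undirected graph without isolated nodes (so every \({k}_{j} > 0\)), assuming a networkx graph G; networkx also provides nx.pagerank for production use.

def pagerank_iteration(G, s=0.85, n_iter=100):
    N = G.number_of_nodes()
    pr = {v: 1.0 / N for v in G}                  # uniform initial values PR_i(0) = 1/N
    for _ in range(n_iter):
        # Eq. (4): PR_i(t) = s * sum_j a_ji PR_j(t-1)/k_j + (1 - s)/N
        pr = {i: s * sum(pr[j] / G.degree(j) for j in G[i]) + (1 - s) / N
              for i in G}
    return pr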

CC16 of node \(i\) is defined as

$$CC(i)=\frac{N-1}{{\sum }_{j\ne i}{d}_{ij}},$$
(5)

where \({d}_{ij}\) is the shortest distance between nodes \(i\) and \(j\).

BC17 of node \(i\) is defined as

$$BC(i)=\sum _{s\ne i,s\ne t,i\ne t}\frac{{g}_{st}(i)}{{g}_{st}},$$
(6)

where \({g}_{st}\) is the number of shortest paths between nodes \(s\) and \(t\), and \({g}_{st}(i)\) is the number of shortest paths between nodes \(s\) and \(t\) that pass through node \(i\).
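In practice these classical centralities need not be implemented from scratch; for an undirected networkx graph G they are available directly, as in the usage sketch below (note that networkx rescales closeness per component when G is disconnected, and counts each unordered pair once in the unnormalized betweenness).

import networkx as nx

dc = dict(G.degree())                                # Eq. (3): degree centrality
cc = nx.closeness_centrality(G)                      # Eq. (5), with networkx's correction for disconnected graphs
bc = nx.betweenness_centrality(G, normalized=False)  # Eq. (6), up to the pair-counting convention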

CI18 of node \(i\) is defined as

$$CI(i)=({k}_{i}-1)\sum _{j\in \partial ball(i,\ell )}({k}_{j}-1),$$
(7)

where \(ball(i,\ell )\) is the set of nodes inside a ball of radius \(\ell \), consisting of all nodes at distance no more than \(\ell \) from node \(i\), and \(\partial ball(i,\ell )\) is the frontier of this ball, i.e., the set of nodes at distance exactly \(\ell \) from node \(i\). CI+ is the version of CI with reinsertion.
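A straightforward sketch of Eq. (7), assuming an undirected networkx graph G and taking the frontier as the set of nodes at distance exactly \(\ell \) from \(i\).

import networkx as nx

def collective_influence(G, i, ell=3):
    # shortest-path distances from i, truncated at radius ell
    dist = nx.single_source_shortest_path_length(G, i, cutoff=ell)
    frontier = [j for j, d in dist.items() if d == ell]           # boundary of ball(i, ell)
    return (G.degree(i) - 1) * sum(G.degree(j) - 1 for j in frontier)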

BPD21 is rooted in the spin glass model for the feedback vertex set problem44. At time \(t\) of the iteration process, the algorithm estimates, for every node \(i\) of the remaining network \(G(t)\), the probability \({q}_{i}^{0}(t)\) that node \(i\) is suitable to be deleted. The explicit formula for this probability is

$${q}_{i}^{0}=\frac{1}{1+{e}^{x}\left[1+{\sum }_{k\in \partial i(t)}\frac{1-{q}_{k\to i}^{0}}{{q}_{k\to i}^{0}+{q}_{k\to i}^{k}}\right]{\prod }_{j\in \partial i(t)}[{q}_{j\to i}^{0}+{q}_{j\to i}^{j}]},$$
(8)

where \(x\) is an adjustable reweighting parameter, and \(\partial i(t)\) denotes the set of neighbors of node \(i\) at time \(t\). The quantity \({q}_{j\to i}^{0}(t)\) is the probability that the neighboring node \(j\) is suitable to be deleted if node \(i\) is absent from the network \(G(t)\), while \({q}_{j\to i}^{j}(t)\) is the probability that node \(j\) is suitable to be the root node of a tree component in the absence of node \(i\). The node with the highest probability of being suitable for deletion is deleted from network \(G(t)\) along with all its edges. This node deletion process stops once all loops in the network have been destroyed. Then the size of each tree component in the remaining network is checked: if a tree component is too large (which occurs only rarely), a node of this tree chosen to achieve a maximal decrease in the tree size is deleted. This deletion is repeated until all tree components are sufficiently small.

CoreHD22 simply removes the highest-degree nodes from the 2-core in an adaptive way (updating node degrees as the 2-core shrinks), until the remaining network becomes a forest. CoreHD then breaks the trees into small components. In case the original network has many small loops, a refined dismantling set is obtained after reinserting nodes that do not increase (much) the size of the largest component.
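A simplified sketch of the adaptive removal stage of CoreHD, assuming an undirected networkx graph G; tree breaking and the reinsertion refinement are omitted, and the 2-core is recomputed at every step rather than updated incrementally as in the original implementation.

import networkx as nx

def corehd_removals(G):
    H = G.copy()
    removed = []
    core = nx.k_core(H, 2)                                # current 2-core
    while core.number_of_nodes() > 0:
        v = max(core.degree, key=lambda nd: nd[1])[0]     # highest-degree node in the 2-core
        H.remove_node(v)
        removed.append(v)
        core = nx.k_core(H, 2)
    return removed                                        # the remaining network H is a forest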

EI23 selects \(m\) candidate nodes from the set of absent nodes at each step and calculates the score \({\sigma }_{i}\) of each of them using the following kernel

$${\sigma }_{i}=\sum _{j\in {N}_{i}}(\sqrt{| | {C}_{j}| | }-1)+{k}_{i}^{(eff)},$$
(9)

where \({N}_{i}\) represents the set of all connected components linked to node \(i\), \(||{C}_{j}||\) denotes the size of component \({C}_{j}\), and \({k}_{i}^{(eff)}\) is an effective degree attributed to each node (see23 for details). Then the nodes with the lowest scores are added to the network. This procedure continues until the size of the giant connected component exceeds a predefined threshold. The minus-one term in Eq. (9) excludes the leaves connected to node \(i\), since they do not contribute to the formation of the giant connected component and should be ignored.
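A hedged sketch of the score of Eq. (9), assuming an undirected networkx graph G, a set present of currently occupied nodes, dictionaries comp_of and comp_size describing the connected components of the occupied subgraph, and a callable k_eff standing in as a placeholder for the effective degree \({k}_{i}^{(eff)}\), whose full definition (see23) is omitted here.

import math

def ei_score(G, i, present, comp_of, comp_size, k_eff):
    comps = {comp_of[j] for j in G[i] if j in present}    # distinct components linked to node i
    # Eq. (9): sum over linked components of (sqrt(size) - 1), plus the effective degree
    return sum(math.sqrt(comp_size[c]) - 1 for c in comps) + k_eff(i)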