Identifying Influential Rumor Spreader in Social Network

It is of great significance to identify influential rumor spreaders for preventing and controlling rumor propagation. In this paper, on four real social networks, based on the classical rumor model combined with a one-to-many mode of propagation, we investigate rumor propagation by Monte Carlo simulations when the spreading rate is small. First, we layer the network nodes according to network characteristics. If the assortative coefficient is positive, we layer the network nodes by degree centrality, and the nodes with large degree are in high layers. If the assortative coefficient is negative, we layer the network nodes by the K-Shell method, and the nodes with large $K_s$ value are in high layers. Then the performance of nodes in different layers as origins of rumors and as informed nodes is investigated. We find that the propagation size is larger and the peak prevalence of the rumor is reached in a shorter time when nodes in higher layers act as the origin. Moreover, when nodes in higher layers are not the origin of the rumor, they are more likely to be informed, they are informed faster, and they terminate propagation faster. That is, their participation benefits the propagation size, the peak prevalence, and the arrival time of the peak prevalence. These conclusions can provide theoretical support for controlling rumor propagation or enhancing information transmission.


Introduction
Rumor propagation is a very common phenomenon in the real world. In the age when Internet media was underdeveloped, the main channel for rumor spread was word of mouth. With the development of science and technology, especially the emergence of Web 2.0, social networking has become a new tool for people to communicate and for news to spread. Thus rumors spread quickly and widely, and they have great destructive power. In all kinds of emergencies, rumors can not only cause social panic but may also trigger unexpected mass incidents and affect social stability. Therefore, for preventing and controlling rumor propagation, it is of great theoretical and practical significance to determine whether influential spreaders exist in the rumor propagation process and to identify who they are.
Generally, to identify influential spreaders in propagation dynamics, the nodes are first ranked by structural measures. To date, a variety of node ranking methods have been proposed according to the specific problems studied. The most direct measure is degree centrality: the larger the degree of a node, the more important it is. When the degree distribution of a network is very wide, degree centrality can reflect the influence of nodes [1][2][3]. Betweenness centrality can measure the importance of bridge nodes connecting several communities in social, transportation, communication, and other networks [4,5]. Eigenvector centrality [6] holds that the importance of a node depends both on the number of its neighbors and on the importance of each neighbor. In 2010, Kitsak et al. [7] proposed the K-Shell decomposition method to determine the core and the border of the network. Zeng An and Zhang Chengjun [8] considered the residual degree in K-Shell decomposition and proposed an improved strategy, Mixed Degree Decomposition (MDD). The PageRank algorithm [9] is a method for rating Web pages objectively and mechanically. Duanbing Chen et al. [10] proposed a semilocal centrality measure as a tradeoff between the low-relevance degree centrality and other time-consuming measures. Ahmad Zareie et al. [11] introduced two new influential node ranking algorithms that use the diversity of the neighbors of each node to obtain its ranking value. More ranking methods can be seen in [12] and the references therein. These measures behave differently on different networks [13].
Based on SIS, SI, or SIR epidemic models, and depending on the network structure, influential epidemic spreaders can be identified in the single-source case if an appropriate measure is selected to rank the nodes [1-3, 7, 8, 10, 14]. However, few studies have considered the case of rumor propagation.
Modeling rumor propagation is similar to modeling epidemic spread. In the process of epidemic spread, each person is in one of the following three states: susceptible, infected, and removed. When contacting an infected person, a susceptible person becomes infected with a certain probability. An infected person is cured and becomes removed with a certain probability. Similarly, in the spread of rumors, every person is in one of the following three states: ignorant, spreader, and stifler. When contacting a spreader, an ignorant becomes a spreader with a certain probability. When contacting a spreader or a stifler, a spreader loses interest in spreading and becomes a stifler with a certain probability. The removal mechanisms of the two propagation processes are different. An infected person can be removed automatically after treatment, but a rumor spreader usually loses interest and stops spreading only after meeting another spreader or a stifler [15,16].
In recent years, in order to explore whether the K-Shell decomposition method can identify influential rumor spreaders, Borge-Holthoefer and Moreno [17] studied rumor spread in two different ways and under the condition that the spreading rate is higher than the stifling rate. They found that no matter where the rumor originated, the number of people who eventually heard the rumor (the propagation size) was almost the same. This conclusion differs from that of Kitsak et al.
Borge-Holthoefer and Moreno [17] considered two kinds of contact processes when simulating rumor propagation. In the first, each spreader randomly selects one neighbor to contact at each time step. In the second, each spreader can reach all of his neighbors at each time step, but he contacts them one by one, and once he becomes a stifler, he stops contacting his neighbors. Such contact processes can describe the interpersonal communication scene of rumor propagation. There are other situations in the real world. For example, in an e-mail network one can send bulk mail, in which case a spreader contacts multiple neighbors at the same time.
In addition, we note that in the rumor process assumed by Borge-Holthoefer and Moreno, the spreading rate is always 1. Their experiments did not use different spreading rates to validate the conclusions. In fact, the rumor propagation rate varies. It is affected by the type of rumor, the audience of the rumor, and other factors. Moreover, as Kitsak et al. [7] pointed out, if the spreading rate is large, the propagation ability of each node is very strong and any node will soon inform the entire network. Thus, it is difficult to distinguish the importance of an individual node.
Therefore, based on the classical rumor propagation model, we assume that each individual contacts all his neighbors at each time step and that the propagation rate takes different values. First, because the network structures differ, different node ranking methods are adopted to rank and layer the network nodes. Then, Monte Carlo simulation is used to simulate rumor propagation, and the propagation performance of nodes in different layers as source nodes and as informed nodes is analyzed. Finally, influential rumor spreaders in a social network are identified.

Data Set and Data Set Processing
We will simulate rumor spread on the following 4 real social networks.

Email-URV.
Email-URV is an email network based on communications of members of the University Rovira i Virgili (Tarragona) [18]. It was extracted in 2003. The nodes in the network represent the email accounts. There are 1700 nodes altogether in the network. We consider only the largest connected subnetwork in the network. The subnetwork consists of 1133 nodes with an average degree of about 9.6.
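The restriction to the largest connected subnetwork can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the adjacency-dictionary representation and the function names `largest_connected_component` and `induced_subgraph` are assumptions made here, and the toy graph is invented data.

```python
from collections import deque

def largest_connected_component(adj):
    """Return the node set of the largest connected component.

    adj: dict mapping each node to the set of its neighbours (undirected).
    """
    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # A breadth-first search from an unvisited node collects one component.
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

def induced_subgraph(adj, nodes):
    """Keep only the given nodes and the links among them."""
    return {u: adj[u] & nodes for u in nodes}

# Toy graph (hypothetical data): a triangle 0-1-2 and a separate link 3-4.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}
giant = largest_connected_component(toy)
sub = induced_subgraph(toy, giant)
average_degree = sum(len(nbrs) for nbrs in sub.values()) / len(sub)
```

The same two calls would apply to each of the four networks once its edge list has been loaded into such a dictionary.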

Netsci.
Netsci is a coauthorship network of scientists working on network theory and experiments. It was compiled by Mark Newman in May 2006 [19]. It is an undirected network composed of all authors of the articles cited in [20,21]. Nodes in the network represent scientists from different fields. There are 1589 nodes in the network. Each link between two scientists indicates that they are coauthors of an article. We consider only the largest connected subnetwork containing 379 nodes.

PolBlogs.
PolBlogs [22] is a network of relationships between American political blogs in 2005. We consider only the largest connected subnetwork containing 1222 nodes.

Ca-GrQc.
Ca-GrQc is an academic collaboration network from the e-print arXiv and covers scientific collaborations between authors of papers submitted to the General Relativity and Quantum Cosmology category from Jan. 1993 to Apr. 2003 [23]. The network contains 5242 nodes, of which the largest connected subnetwork contains 4158 nodes.
The structure properties of these four real social networks are shown in Table 1.
In the experiment, we choose the two most intuitive ranking methods for network nodes. One is degree centrality and the other is the K-Shell decomposition method. We decide which ranking method to use based on the assortative coefficient of the network, because the assortative behavior in a network can influence the extent to which hubs appear in the periphery or in the core of a network [7]. When the assortative coefficient is positive, i.e., the network is assortative, high-degree nodes tend to attach to other high-degree nodes, while low-degree nodes tend to connect with low-degree nodes. So the degree centrality of nodes is appropriate for measuring the propagation ability of nodes in an assortative network. When the assortative coefficient is negative, the hubs tend to be in the periphery, due to their tendency to connect to low-degree nodes. So for a disassortative network, it is more appropriate to use K-Shell to measure the propagation ability of nodes.
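As an illustration of this selection rule, the sketch below computes the assortative coefficient as the Pearson correlation between the degrees found at the two ends of each link, which is the usual definition of degree assortativity and is assumed here to be the one intended. The function names and toy graphs are hypothetical, and the handling of a zero or undefined coefficient is not specified in the text, so the sketch returns `None` in that case.

```python
import math

def degree_assortativity(adj):
    """Pearson correlation of the degrees at the two ends of every link."""
    ends_a, ends_b = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            # Each undirected link is visited once per direction, so the
            # two lists contain the same multiset of degrees.
            ends_a.append(len(adj[u]))
            ends_b.append(len(adj[v]))
    if not ends_a:
        return math.nan
    mean = sum(ends_a) / len(ends_a)  # equals the mean of ends_b by symmetry
    cov = sum((a - mean) * (b - mean) for a, b in zip(ends_a, ends_b))
    var = sum((a - mean) ** 2 for a in ends_a)
    # The coefficient is undefined when every node has the same degree.
    return cov / var if var > 0 else math.nan

def choose_ranking(adj):
    """Positive coefficient: degree centrality. Negative: K-Shell."""
    r = degree_assortativity(adj)
    if r > 0:
        return "degree"
    if r < 0:
        return "k-shell"
    return None  # zero or undefined: not covered by the rule in the text

# Toy graphs (hypothetical data).
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}              # hub with leaves
cliques = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
           "d": {"e"}, "e": {"d"}}                          # like joins like
```

In the star every link joins the hub to a leaf, so the coefficient is negative and K-Shell is selected; in the second graph every link joins nodes of equal degree, so the coefficient is positive and degree centrality is selected.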
Email-URV and Ca-GrQc are assortative. Based on the degree centrality of nodes, we divide the nodes of Email-URV into 12 layers and the nodes of Ca-GrQc into 15 layers. The degrees of the first-layer nodes range from 1 to 5, the degrees of the second-layer nodes range from 6 to 10, and so on. The assortative coefficients of Netsci and PolBlogs are negative. The nodes in Netsci are divided into 8 k-shells, and the nodes of PolBlogs are divided into 36 k-shells.
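Both layering schemes can be sketched in a few lines. The degree layering below uses a fixed bin width of 5, matching the first two layers described above; how the highest-degree nodes are grouped into the final layers is not stated in the text, so the uniform width is an assumption. The K-Shell routine is the standard pruning procedure. All function names and the toy graphs are hypothetical.

```python
def degree_layers(adj, width=5):
    """Layer 1 holds degrees 1..width, layer 2 holds width+1..2*width, etc."""
    return {u: (len(nbrs) - 1) // width + 1
            for u, nbrs in adj.items() if nbrs}

def k_shell(adj):
    """K-Shell index of every node, obtained by iterative pruning."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    while remaining:
        # The next shell index is the smallest degree still present.
        k = min(degree[u] for u in remaining)
        stack = [u for u in remaining if degree[u] <= k]
        while stack:
            u = stack.pop()
            if u not in remaining:
                continue  # already pruned through another neighbour
            remaining.discard(u)
            shell[u] = k
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    # Pruning u may push v down into the current shell.
                    if degree[v] <= k:
                        stack.append(v)
    return shell

# Toy graphs (hypothetical data).
star6 = {0: {1, 2, 3, 4, 5, 6}, **{i: {0} for i in range(1, 7)}}
triangle_tail = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

layers = degree_layers(star6)       # hub has degree 6, leaves have degree 1
shells = k_shell(triangle_tail)     # pendant node 3 is pruned first
```

In the second toy graph node 0 has the largest degree but shares its shell with nodes 1 and 2, which shows how the two rankings can group nodes differently.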

Rumor Propagation Model
We consider a system made up of $N$ individuals. Each node in the network plays one of three roles: I (ignorant), S (spreader), and R (stifler). An ignorant is one who has not heard the rumor. A spreader actively spreads the rumor, and a stifler is an individual who knows the rumor but has lost interest in spreading it. At the beginning, only one node is a spreader and all the others are ignorant. At each time step, each spreader contacts all his (her) neighbors. When an ignorant meets a spreader, the ignorant turns into a spreader with probability $\lambda$ (the spreading rate); when a spreader encounters another spreader or a stifler, the spreader loses interest in spreading the rumor and becomes a stifler with probability $\alpha$ (the decay rate). The propagation process terminates when no spreader remains.
To investigate the propagation ability of nodes as the sources of rumor propagation, we mainly focus on three quantities: the propagation size (the final density of stiflers), the peak prevalence (the maximum density of spreaders), and the arrival time of the peak prevalence. In the case of a single source, if the peak prevalence is large, the arrival time is short, and the propagation size is large, then the propagation ability of the source node is strong. That is, the source node is an influential spreader. Otherwise, the propagation ability of the source node is weak, and it has little influence as a source node.
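One run of this one-to-many process can be sketched as below. The text does not fix the order in which spreaders act within a time step, so the sketch makes the following assumptions: the spreaders present at the start of a step act one after another in random order, a node informed during a step starts spreading only at the next step, and a spreader stops contacting neighbors as soon as it becomes a stifler. The names `simulate_rumor`, `lam` (spreading probability), `alpha` (stifling probability), and `max_steps` (a safety cap) are hypothetical.

```python
import random

IGNORANT, SPREADER, STIFLER = "I", "S", "R"

def simulate_rumor(adj, source, lam, alpha=1.0, rng=None, max_steps=10_000):
    """Return (spreader density, stifler density) per time step, from t = 0."""
    rng = rng or random.Random()
    n = len(adj)
    state = {u: IGNORANT for u in adj}
    state[source] = SPREADER
    spreader_density = [1 / n]
    stifler_density = [0.0]
    for _ in range(max_steps):
        spreaders = [u for u in adj if state[u] == SPREADER]
        if not spreaders:
            break  # the process ends when no spreader remains
        rng.shuffle(spreaders)
        for u in spreaders:
            for v in adj[u]:
                if state[v] == IGNORANT:
                    # An ignorant neighbour is informed with probability lam.
                    if rng.random() < lam:
                        state[v] = SPREADER
                elif rng.random() < alpha:
                    # Meeting a spreader or stifler ends u's interest.
                    state[u] = STIFLER
                    break
        spreader_density.append(sum(s == SPREADER for s in state.values()) / n)
        stifler_density.append(sum(s == STIFLER for s in state.values()) / n)
    return spreader_density, stifler_density

# Toy run (hypothetical data): a hub 0 linked to 1, 2, 3, plus the link 1-2.
toy = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
s_t, r_t = simulate_rumor(toy, source=0, lam=0.2, alpha=1.0,
                          rng=random.Random(1))
propagation_size = r_t[-1]
peak_prevalence = max(s_t)
```

Passing a seeded `random.Random` makes a run reproducible; repeating the call with different seeds gives the independent realizations needed for Monte Carlo averaging.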
We perform a large number of Monte Carlo simulations on each network. For each selected source node $i$, we perform 1000 simulations unless otherwise stated. At time $t$, the density of spreaders obtained in the $m$th simulation is $s_{i,m}(t)$. We average over the results of the 1000 simulations and denote the average by $s_i(t)$, i.e.,

$$s_i(t) = \frac{1}{1000} \sum_{m=1}^{1000} s_{i,m}(t).$$

At time $t$, the density of stiflers obtained in the $m$th simulation is $r_{i,m}(t)$. We average over the results of the 1000 simulations and denote the average by $r_i(t)$, i.e.,

$$r_i(t) = \frac{1}{1000} \sum_{m=1}^{1000} r_{i,m}(t).$$

The final density of stiflers obtained in the $m$th simulation is $r_{i,m}^{\infty}$. We average over the results of the 1000 simulations and denote the average by $r_i^{\infty}$, i.e.,

$$r_i^{\infty} = \frac{1}{1000} \sum_{m=1}^{1000} r_{i,m}^{\infty}.$$

In this paper, we divide the social network nodes into multiple layers by appropriate measures, so that each node belongs to exactly one layer. Whichever decomposition method is used, we classify nodes with the same layer value $k$ as the same kind of nodes and denote this node set by $\Omega_k$. The number of nodes in $\Omega_k$ is $N_k$. The average prevalence for the rumor propagation originating from the nodes in $\Omega_k$ is

$$S_k(t) = \frac{1}{N_k} \sum_{i \in \Omega_k} s_i(t).$$

The average peak prevalence for the rumor propagation originating from the nodes in $\Omega_k$ is

$$S_k^{\max} = \max_{t} S_k(t).$$

The average propagation size for the rumor propagation originating from the nodes in $\Omega_k$ is

$$R_k^{\infty} = \frac{1}{N_k} \sum_{i \in \Omega_k} r_i^{\infty}.$$

We denote the arrival time of the average peak prevalence for the rumor propagation originating from the nodes in $\Omega_k$ by $T_k$, i.e., the time $t$ at which $S_k(t)$ attains $S_k^{\max}$.
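The averaging over simulations and over the nodes of a layer can be sketched as follows. This is one reading of the procedure, under stated assumptions: runs of different length are aligned by holding each series at its last value (zero for spreaders once a run has ended, the final density for stiflers), and the peak of a layer is taken on the layer-averaged spreader curve, with its arrival time being the first step at which that maximum occurs. The function names and the toy numbers are hypothetical.

```python
def average_series(runs):
    """Element-wise mean of time series that may differ in length."""
    horizon = max(len(r) for r in runs)
    averaged = []
    for t in range(horizon):
        # A finished run keeps its last value for all later time steps.
        total = sum(r[t] if t < len(r) else r[-1] for r in runs)
        averaged.append(total / len(runs))
    return averaged

def peak_and_arrival(spreader_curve):
    """Maximum of a spreader-density curve and the first step reaching it."""
    peak = max(spreader_curve)
    return peak, spreader_curve.index(peak)

def layer_summary(avg_spreaders, final_sizes, layer_of):
    """Per layer: mean propagation size, peak prevalence, arrival time.

    avg_spreaders: source node -> run-averaged spreader density over time
    final_sizes:   source node -> run-averaged final stifler density
    layer_of:      source node -> layer value
    """
    members = {}
    for node, k in layer_of.items():
        members.setdefault(k, []).append(node)
    summary = {}
    for k, nodes in members.items():
        curve = average_series([avg_spreaders[i] for i in nodes])
        peak, arrival = peak_and_arrival(curve)
        size = sum(final_sizes[i] for i in nodes) / len(nodes)
        summary[k] = {"size": size, "peak": peak, "arrival": arrival}
    return summary

# Toy input (hypothetical numbers): two source nodes in the same layer.
curves = {"a": [0.25, 0.5, 0.0], "b": [0.25, 0.75, 0.25, 0.0]}
sizes = {"a": 0.5, "b": 1.0}
result = layer_summary(curves, sizes, {"a": 1, "b": 1})
```

`average_series` serves both levels of averaging: first over the repeated simulations of one source node, then over the source nodes of one layer.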

Simulations and Results
We investigate the propagation ability of the nodes from two aspects. One is the performance of nodes as the source nodes of rumor propagation. The other is the performance of nodes as the informed ones.
In the experiment, we always set the decay rate $\alpha = 1$ and let the spreading rate $\lambda$ take different values, so that the average final propagation size is limited to the range from 1% to 20% [7].
Figure 1: The time evolution of (a) the average density of spreaders and (b) the average density of stiflers for rumor propagation with the spreading rate $\lambda = 0.2$ and the decay rate $\alpha = 1$. From bottom to top, the spreading originates from a node with degree 1, a node with degree 16, and a node with degree 71, respectively.

The Performance of Nodes as the Propagation Sources.
First, we select three nodes with different degrees from Email-URV. The first node has degree 1 and is labeled 1. The second node has degree 16 and is labeled 2. The third node has the largest degree, 71, and is labeled 3. Figure 1 shows the peak prevalence and the propagation size of the rumor propagation originating from each of the three nodes when the spreading rate $\lambda = 0.2$ and the decay rate $\alpha = 1$. As we can see from Figure 1, the higher the degree of the node, the greater its propagation size, the higher the peak prevalence, and the shorter the time the propagation takes. That is, the propagation abilities of these three nodes with different degrees are very different. This is not an isolated phenomenon in rumor spreading; it is general.
Figures 2-5 display three quantities of rumor propagation in Email-URV, Netsci, PolBlogs, and Ca-GrQc, respectively: (a) the average propagation size, (b) the average peak prevalence, and (c) the arrival time of the average peak prevalence when the spreading rate $\lambda$ takes different values and the decay rate $\alpha = 1$. In these figures, the abscissa represents the layer $k$ to which a node belongs. We can see that when the layer value $k$ is large, the average propagation size and the average peak prevalence are generally large, and they almost always increase as $k$ increases. When $k$ is large, the arrival time of the average peak prevalence is generally small, and it almost always decreases as $k$ increases. This shows that the nodes in high layers can set off rumors in a relatively short time and eventually inform many people.
Figures 2-5 tell us that the high layer nodes in the network are influential rumor spreaders; namely, there are influential source spreaders in the rumor propagation dynamics. Because we consider the rules of rumor propagation on real social networks, the simulation results indicate the existence of influential rumor spreaders in real social networks. This conclusion is contrary to the conclusion of [17], but it is consistent with the conclusions obtained by Kitsak et al. based on the SIR epidemic model in [7].

The Performance of Nodes as the Informed Ones.
When we investigate the performance of a node as an informed node, we let each node in turn act as the source node to propagate a rumor, then we take the average of the propagation results, and finally we examine four quantities: (1) the average probability of each node being informed; (2) the average waiting time spent by each node from the ignorant state to the informed state, i.e., the time when each node is informed; (3) the time when each node terminates propagation; (4) the propagation duration of each node (the length of time that each node takes to propagate the rumor), denoted by PD.
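The four per-node quantities can be sketched as an aggregation over recorded runs. The record layout is an assumption made here: each run stores its source node and, for every node that was informed, the pair (time informed, time it became a stifler). Runs in which a node is itself the source are left out of that node's statistics, and the times are averaged only over the runs in which the node was actually informed; the text does not spell out either convention. All names and the toy records are hypothetical.

```python
def informed_statistics(runs, nodes):
    """Per node: probability informed, mean waiting time, mean stop time, PD."""
    stats = {}
    for u in nodes:
        # Runs where u is the source say nothing about u being informed.
        eligible = [run for run in runs if run["source"] != u]
        hits = [run["times"][u] for run in eligible if u in run["times"]]
        if not hits:
            stats[u] = {"p_informed": 0.0, "wait": None,
                        "stop": None, "duration": None}
            continue
        n = len(hits)
        stats[u] = {
            "p_informed": n / len(eligible),
            "wait": sum(t_in for t_in, _ in hits) / n,
            "stop": sum(t_out for _, t_out in hits) / n,
            # Propagation duration: time spent in the spreader state.
            "duration": sum(t_out - t_in for t_in, t_out in hits) / n,
        }
    return stats

# Toy records (hypothetical data): three runs on a three-node network.
records = [
    {"source": 0, "times": {1: (1, 2), 2: (2, 4)}},
    {"source": 1, "times": {0: (1, 3)}},
    {"source": 2, "times": {0: (2, 3), 1: (1, 2)}},
]
summary = informed_statistics(records, nodes=[0, 1, 2])
```

Sorting the nodes by degree or by K-Shell value before plotting these statistics would give curves of the kind the per-node figures describe.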
In Figures 6-8, the abscissas of subgraphs (a) and (d) mark the sequence number of the nodes sorted in ascending order of degree centrality, and the abscissas of subgraphs (b) and (c) mark the sequence number of the nodes sorted in ascending order of K-Shell value.
Figure 6 shows the average probability of each node being informed on the four networks. From subgraphs (a) and (d), we can see that the greater the degree of a node, the larger the probability that the node is informed. From subgraphs (b) and (c), we can see that the greater the K-Shell value of a node, the larger the probability that the node is informed. From Figure 6, we know that rumors are more likely to reach higher layer nodes. This also reflects the great impact of high layer nodes on rumor propagation. Figure 7 displays the waiting time spent by each node from the ignorant state to the informed state on the four real social networks, and Figure 8 displays the time when each node terminates propagation. We find that the higher layer nodes terminate propagation and become stiflers earlier. This is consistent with the conclusions obtained in [17]. It can be seen from subgraphs (a) and (d) of Figure 9 that the larger the degree of a node, the shorter its propagation duration. It can also be seen from subgraphs (b) and (c) of Figure 9 that the larger the K-Shell value of a node, the shorter its propagation duration. That is, the high layer nodes have the effect of shortening the propagation process. This is consistent with the conclusions obtained in [17].
As we can see in Figures 6-9, nodes with large degrees or large K-Shell values have a high probability of being informed. Once they are informed, because of their advantage (having many neighbors), they stimulate rumor propagation on the one hand. On the other hand, precisely because of this advantage, they turn their spreader neighbors into stiflers, suppress the spread of rumors, and play the role of a firewall [17]. Which of these two contradictory roles is more powerful? To determine whether the firewall role of high layer nodes counteracts or even exceeds their propagation role, we simulate rumor propagation on each network after deleting several high layer nodes and compare the results before and after the deletion. While ensuring the connectivity of the networks, we delete the first 7 nodes in Email-URV, where the nodes are ranked in descending order of degree; the first 4 nodes in Netsci, where the nodes are ranked in descending order of K-Shell value; the first 12 nodes in PolBlogs, where the nodes are ranked in descending order of K-Shell value; and the first 10 nodes in Ca-GrQc, where the nodes are ranked in descending order of degree.
From Figures 10-13, we can see that after deleting high layer nodes from the networks, the propagation size, the peak prevalence, and the arrival time of the peak prevalence are almost always worse than before the deletion. This shows that although the high layer nodes act as a firewall, their high probability of being informed itself contributes to the promotion of rumor propagation. In addition, the high layer nodes can shorten the propagation process [17]. The combination of these two advantages shows that the high layer nodes perform outstandingly as informed nodes in rumor propagation.
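The deletion step can be sketched as below. How connectivity was ensured is not described in the text, so this sketch only removes the top-ranked nodes and reports whether the remaining network is still connected; the caller can then adjust the number of deleted nodes. The function names, the `score` dictionary (degree or K-Shell value), and the toy graphs are hypothetical.

```python
from collections import deque

def is_connected(adj):
    """True if every node can be reached from an arbitrary start node."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)

def delete_top_nodes(adj, score, count):
    """Remove the `count` highest-scoring nodes and their links.

    Returns the pruned adjacency dict and whether it is still connected.
    """
    ranked = sorted(adj, key=lambda u: score[u], reverse=True)
    removed = set(ranked[:count])
    pruned = {u: {v for v in nbrs if v not in removed}
              for u, nbrs in adj.items() if u not in removed}
    return pruned, is_connected(pruned)

# Toy graph (hypothetical data): node 3 is the hub and the only link to node 4.
toy = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3}}
degree = {u: len(nbrs) for u, nbrs in toy.items()}
pruned, still_connected = delete_top_nodes(toy, degree, count=1)
```

Here deleting the hub isolates node 4, so the check fails; on the real networks such a check would indicate when a deletion breaks the network apart.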

Conclusions
Identification of influential spreaders is a core issue of propagation and control on social networks. The influence of a node in the process of rumor propagation is related not only to its topological characteristics but also to the propagation parameters of the rumor model [24]. On a few real social networks, Borge-Holthoefer and Moreno [17] studied rumor spread in two different ways and under the condition that the spreading rate is higher than the stifling rate. They found that no matter where the rumor originated, the number of people who eventually heard the rumor (the propagation size) was almost the same.
In this paper, we consider the case in which the spreading rate is smaller than the stifling rate together with a one-to-many mode of propagation. According to the different characteristics of the networks, the network nodes are ranked by different node ranking methods. We then use the Monte Carlo method to simulate rumor propagation and investigate the propagation performance of nodes in different layers as source nodes and as informed nodes. From the simulations, we draw the following conclusions:
(1) As propagation sources, nodes in a higher layer can set off rumors in a shorter time and eventually reach more people.
(2) As informed ones, nodes in a higher layer are informed with higher probability and shorten the propagation process.
These two aspects show that, in the process of rumor spreading, the high layer nodes are influential spreaders; i.e., there exist influential rumor spreaders in social networks. This conclusion is contrary to the one in [17]. It supplements the study in [17] and points the way toward controlling rumor propagation or promoting information dissemination.
In our work, based only on the assortative coefficients of the networks, we choose the two most intuitive ranking methods, degree centrality and the K-Shell decomposition method, to rank the network nodes roughly. This is a limitation. For networks with different structures, there may be other, more suitable ranking methods, and a specific network needs to be analyzed by a specific method. Even so, our results help in understanding spreading behaviors and in designing effective strategies to control rumor propagation or promote information transmission.

Figure 2: Three quantities of rumor propagation in Email-URV: (a) the average propagation size, (b) the average peak prevalence, and (c) the arrival time of the average peak prevalence when the spreading rate takes the values $\lambda$ = 0.1, 0.125, 0.15, 0.175, and 0.2 (distinguished by marker in the plot) and the decay rate $\alpha = 1$. The performance of nodes in the higher layers is better.

Figure 4: Three quantities of rumor propagation in PolBlogs: (a) the average propagation size, (b) the average peak prevalence, and (c) the arrival time of the average peak prevalence when the spreading rate takes the values $\lambda$ = 0.1, 0.125, 0.15, 0.175, and 0.2 (distinguished by marker in the plot) and the decay rate $\alpha = 1$. The performance of nodes in the higher layers is better.

Figure 9: The propagation duration of each node in (a) Email-URV, (b) Netsci, (c) PolBlogs, and (d) Ca-GrQc, respectively. The high layer nodes have the effect of shortening the propagation process.

Figure 12: Comparison of three quantities of rumor propagation before and after deleting 12 nodes in PolBlogs: (a) the average propagation size, (b) the average peak prevalence, and (c) the arrival time of the average peak prevalence (red circles represent the situation before deletion and blue stars represent the situation after deletion).

Table 1: Structure properties of the real social networks. Structure properties include the number of nodes $N$, the number of edges, the average degree $\langle k \rangle$, the assortative coefficient, and the average shortest path length.