Spreading to localized targets in complex networks

As an important type of dynamics on complex networks, spreading is widely used to model many real processes such as epidemic contagion and information propagation. One of the most significant research questions in spreading is how to rank the spreading ability of nodes in the network. To this end, substantial effort has been made and a variety of effective methods have been proposed. These methods usually define the spreading ability of a node as the number of nodes eventually infected when the spreading starts from that node. However, in many real cases such as advertising and news propagation, the spreading only aims to cover a specific group of nodes. Therefore, it is necessary to study the spreading ability of nodes towards localized targets in complex networks. In this paper, we propose a reversed local path algorithm for this problem. Simulation results show that our method outperforms the existing methods in identifying the influential nodes with respect to these localized targets. Moreover, the influential spreaders identified by our method can effectively avoid infecting the non-target nodes in the spreading process.

[...] outperforms other centrality methods, especially when the infection probability is near the critical infection probability λ_c in these artificial and real networks.

Supplementary Note 3: effect of ε and path lengths on the ranking accuracy
We computed the accuracy τ of the Reversed Local Path (RLP) method for different values of ε, as shown in Fig. S3. One can see that τ reaches a maximum when ε is set to an optimal value. The optimal ε varies from one network to another, and the setting used in the paper (ε = 0.1) is not the optimum. Nevertheless, this setting yields rather satisfactory ranking accuracy, since ε = 0.1 is near the optimal ε in many networks.
In Fig. S3, we also mark the results for ε = λ = ⟨k⟩/(⟨k²⟩ − ⟨k⟩), which is the infection probability we used for the SIR model. This setting of ε seems to perform better than ε = 0.1. However, since the true infection probability of a spreading process is usually unknown in real cases, we present the results for ε = 0.1 in the paper.
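As a small illustration of the infection probability mentioned above, the following sketch evaluates the expression from a degree sequence. It reads the expression as ⟨k⟩/(⟨k²⟩ − ⟨k⟩), i.e. with averages over the node degrees, which is an assumption about the notation; the function name and the toy degree sequences are hypothetical.

```python
def sir_infection_probability(degrees):
    """Return <k> / (<k^2> - <k>) for a degree sequence.

    Assumption: k and k^2 in the expression denote averages over nodes.
    """
    n = len(degrees)
    if n == 0:
        raise ValueError("degree sequence is empty")
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    denom = mean_k2 - mean_k
    if denom <= 0:
        raise ValueError("<k^2> - <k> must be positive")
    return mean_k / denom


# Hypothetical toy inputs: a star with four leaves and a 4-regular network.
star_degrees = [4, 1, 1, 1, 1]
regular_degrees = [4] * 10
lam_star = sir_infection_probability(star_degrees)
lam_regular = sir_infection_probability(regular_degrees)
```

For a network in which every node has the same degree k, the expression reduces to 1/(k − 1), so heterogeneous degree sequences with the same mean degree give a smaller value.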
In addition, we study the dependence of the accuracy τ on the length l of the paths used in the RLP method, as shown in Fig. S4. One can see in the figure that the accuracy already reaches a plateau at l = 3, which is why we only consider paths of length l = 3 in our method.

[...] number of links per node connecting to nodes outside its community. The larger k_out is, the less pronounced the community structure is. The performance of different algorithms in this network is compared in Fig. S6, where we consider both the random target scheme and the local target scheme. In the random target scheme, 10% of the nodes (i.e. 12 nodes) are randomly selected as target nodes. In the local target scheme, 12 nodes within a community are randomly selected as target nodes. In Fig. S6, one can see that as k_out increases, the traditional centrality indices (e.g. degree, betweenness, k-core) tend to become more accurate, while the accuracy of the RLP method tends to decrease. In addition, we find that the RLP method generally performs better in the local scheme than in the random scheme. These results indicate that the target spreading problem in general becomes more challenging, and the advantage of RLP is bigger, when the network diameter is large (e.g. when k_out is small in the GN benchmark).

Supplementary Note 6: spreading with local and global targets in BA networks
We compute ρ_i of each node in Barabási-Albert (BA) networks [2] with size N = 500 and mean degree ⟨k⟩ = 4. The dependence of ρ_i on the spreaders' degree in BA networks is shown in Fig. S7(a) for the globalized target case and in Fig. S7(b) for the localized target case.
In Fig. S7(a), i.e. the globalized target case, one can see that ρ_i strongly correlates with the spreaders' degree k_i. However, in the localized target case, the correlation between ρ and k is much weaker, as shown in Fig. S7(b). For a fixed degree, there is a wide spread of ρ values, which indicates that degree is no longer a good predictor of nodes' spreading ability.
In Fig. S7(b), the color of each point represents the mean shortest path length d_i from the spreader i to the target nodes. One can see that the nodes with small d_i and large k_i tend to have high ρ_i.
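The quantity ρ_i can be estimated by simulation. The sketch below is a minimal illustration, not the simulation code behind Fig. S7. It assumes a discrete-time SIR process in which each infected node infects each susceptible neighbor with probability λ and then recovers after one step, and it reads ρ_i as the fraction of target nodes eventually infected when the spreading starts from node i. The function name, the adjacency-dictionary format and the number of runs are hypothetical choices.

```python
import random


def infected_target_fraction(adj, seed, targets, lam, runs=200, rng=None):
    """Estimate rho for one seed node by averaging over SIR runs.

    adj     : dict mapping each node to a list of its neighbors
    seed    : node where the spreading starts
    targets : iterable of target nodes
    lam     : infection probability per infected-susceptible contact
    Assumption: infected nodes recover exactly one step after infection.
    """
    rng = rng or random.Random(0)
    targets = set(targets)
    total = 0.0
    for _ in range(runs):
        infected = {seed}
        recovered = set()
        while infected:
            newly_infected = set()
            for u in infected:
                for v in adj[u]:
                    susceptible = v not in infected and v not in recovered
                    if susceptible and v not in newly_infected and rng.random() < lam:
                        newly_infected.add(v)
            recovered |= infected          # current spreaders recover
            infected = newly_infected      # next generation of spreaders
        total += len(recovered & targets) / len(targets)
    return total / runs


# Hypothetical toy network: a path 0-1-2-3 with targets at the far end.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rho_from_0 = infected_target_fraction(path, seed=0, targets=[2, 3], lam=0.5)
rho_from_1 = infected_target_fraction(path, seed=1, targets=[2, 3], lam=0.5)
```

Comparing the estimates for seeds at different distances from the targets gives a feel for why a small d_i tends to go with a high ρ_i in the localized case.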
To further understand the above observations, we investigate the effect of the location of the targets in Fig. S7(c) and (d). We fix the number of target nodes at 30 and consider two scenarios: the targets are either randomly located in the network or located in a small area. To realize the second scenario, we first randomly pick a node and set it as the center of this small area. The rest of the targets are placed on nodes whose shortest path length to the central node is not larger than 2. We compare the fraction of infected target nodes ρ as a function of the infection probability λ in these two scenarios. As a benchmark, we also plot ρ versus λ for the globalized targets in both Fig. S7(c) and (d). One can see that if the 30 targets are distributed randomly, the curve overlaps well with the curve of the globalized target case. However, when the targets are localized within a two-step distance, the ρ curve is slightly higher than in the two cases above. This is because when the targets are selected within distance L = 2 of a central target node, large-degree nodes are more likely to be selected. As these nodes are more easily infected in the spreading process, the fraction of infected target nodes in this scheme is higher than in the random/global scheme for the same infection probability λ. These results also indicate that localizing the targets makes the spreading properties differ significantly from the traditional case.

Supplementary Note 7: the case when target nodes can be chosen as seeds
To get a more complete picture, we also consider some real cases where the target nodes can be chosen as seeds. The results are similar to those presented in the paper. The RLP method can also be extended to this situation. Therefore, for the target nodes, the formula for RLP reads [...] where f is a 1 × N vector in which the components corresponding to the target nodes are 1, and 0 otherwise. A is the N × N adjacency matrix of the network, with A_ij = 1 indicating that node i connects to node j and A_ij = 0 otherwise.
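To make the roles of f, A and ε concrete, the sketch below computes a local-path-style score. The RLP equation itself is not reproduced in the text here, so the form s = f(A + εA² + ε²A³) used in the code is an assumption, chosen only because the notes describe paths up to l = 3 weighted by ε. The function names and the toy network are hypothetical.

```python
def vec_mat(vec, mat):
    """Multiply a 1 x N row vector by an N x N matrix (plain lists)."""
    n = len(mat)
    return [sum(vec[i] * mat[i][j] for i in range(n)) for j in range(n)]


def local_path_scores(adj_matrix, targets, eps=0.1):
    """Score every node by weighted path counts to the target set.

    Assumed form (not taken from the text): s = f (A + eps A^2 + eps^2 A^3),
    where f is the 1 x N indicator vector of the target nodes.
    """
    n = len(adj_matrix)
    target_set = set(targets)
    f = [1.0 if i in target_set else 0.0 for i in range(n)]
    one_step = vec_mat(f, adj_matrix)            # f A
    two_step = vec_mat(one_step, adj_matrix)     # f A^2
    three_step = vec_mat(two_step, adj_matrix)   # f A^3
    return [
        one_step[j] + eps * two_step[j] + eps ** 2 * three_step[j]
        for j in range(n)
    ]


# Hypothetical toy network: a path 0-1-2-3 with node 3 as the only target.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = local_path_scores(A, targets=[3], eps=0.1)
```

In this toy case the non-target nodes are ranked by their closeness to the target (node 2, then node 1, then node 0), which is the qualitative behavior one would expect from a path-counting score started at the targets.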
In [...] is obtained by averaging over 5000 independent realizations. The procedure is as follows: we first take a realization of a network, examine many target node sets in order to compute τ, and then average τ over many network realizations. However, for each of the real network cases (Fig. S8(c) and (d)), there is only one network, so we simply average the results over different target node sets. One immediate observation in Fig. S8 is that the RLP method has a much higher accuracy τ than the other methods, especially when λ is small. However, when λ is too large and far exceeds the critical infection probability λ_c (marked by the orange vertical dashed lines in the figure), the spreading originating from each node may cover nearly the whole network, including the target nodes. In this case, the final spreading coverage can no longer reflect the true spreading ability of nodes. Therefore, the τ value of RLP is similar to that of the other three methods when λ is large. Compared with Fig. 4 in the paper, where the target nodes cannot be chosen as seeds, one can clearly see that the results are consistent, indicating the advantage of the RLP method over the existing methods.
Supplementary Note 8: the local degree method considering neighbors up to different distances l
We investigate the ranking accuracy τ of the spreading ability under different m and L in four networks. Here, the LD_1 method considers only the degree of the nodes that are neighbors of the target nodes, while the LD_3 method includes the degree of nodes within distance l = 3 from the target nodes (i.e. the LD method in the main text). We then compare the performance of the RLP method with these two kinds of LD methods in Fig. S9. The way we place the target nodes is the same as in Fig. 2(b). We first select a node in the network as the so-called central node. There are m targets in the network, and the other m − 1 targets are randomly located on nodes within a maximum distance L (measured by the shortest path length) of the central node. When L is infinitely large, these m nodes are distributed randomly in the network. The smaller L is, the more localized the targets are. In Fig. S9, one can see that LD with l = 3 indeed outperforms LD with l = 1 in the WS, Netsci and Y2H networks. However, LD with l = 3 has a lower accuracy than LD with l = 1 in BA networks. This is because the hub nodes in a BA network make the neighbors of the target nodes up to distance l = 3 cover almost all the nodes in the network. In this case, LD with l = 3 becomes similar to the traditional degree method and thus has a low accuracy.
We also notice that the accuracy of LD with l = 3 is rather stable under different m and L, unlike that of LD with l = 1. For LD with l = 1, the accuracy tends to increase with m and L. This is because the number of neighbors of the target nodes increases with m and L. However, in LD with l = 3, the neighbors of the target nodes up to distance 3 already cover most of the nodes in the local area. In this case, increasing m and L will not further increase the number of neighbors considered. Therefore, the accuracy of LD with l = 3 is insensitive to the parameters m and L. However, both LD methods have lower accuracy than our RLP method.
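The target placement used in this note (a central node plus m − 1 further targets within shortest-path distance L of it) can be sketched as follows. This is a minimal illustration under stated assumptions: the central node is drawn uniformly at random, the central node itself counts as one of the m targets, and the function names and adjacency-dictionary format are hypothetical.

```python
import random
from collections import deque


def nodes_within(adj, center, max_dist):
    """Breadth-first search returning {node: distance} for distance <= max_dist."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == max_dist:
            continue  # do not expand beyond the distance limit
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def place_targets(adj, m, max_dist, rng=None):
    """Pick a central node and m - 1 further targets within max_dist of it.

    Assumptions: the center is chosen uniformly at random and is itself
    one of the m targets.
    """
    rng = rng or random.Random(0)
    center = rng.choice(sorted(adj))
    candidates = sorted(v for v in nodes_within(adj, center, max_dist) if v != center)
    if len(candidates) < m - 1:
        raise ValueError("not enough nodes within the requested distance")
    return center, [center] + rng.sample(candidates, m - 1)


# Hypothetical toy network: a path of six nodes, three targets, L = 2.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
center, targets = place_targets(path, m=3, max_dist=2)
```

Choosing max_dist at least as large as the network diameter makes every node a candidate, which corresponds to the randomly distributed targets described above for very large L.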
Supplementary Note 9: the relation between best spreaders and the highest rank in the RLP method
In order to study the relation between the best spreaders and the values obtained from the RLP method, we investigate two cases like Fig. 2(c) [...]

The difference between (a) and (b) is that the center has k = 27 in (a) while k = 8 in (b). In these two sub-figures, the network is Netsci with N = 379 and ⟨k⟩ = 4.8.