Efficient network immunization strategy based on generalized Herfindahl–Hirschman index

The topic of finding effective strategies to restrain epidemic spreading in complex networks is of current interest. A widely used approach for epidemic containment is the fragmentation of the contact networks through immunization. However, due to the limitation of immune resources, we cannot always fragment the contact network completely. In this study, based on the size distribution of connected components for the network, we designed a risk indicator of epidemic outbreaks, the generalized Herfindahl–Hirschman index (GHI), which measures the upper bound of the expected infection’s prevalence (the fraction of infected nodes) in random outbreaks. An immunization approach based on minimizing GHI is developed to reduce the infection risk for individuals in the network. Experimental results show that our immunization strategy could effectively decrease the infection’s prevalence as compared to other existing strategies, especially against infectious diseases with higher infection rates or lower recovery rates. The findings provide an efficient and practicable strategy for immunization against epidemic diseases.


Introduction
The spreading phenomenon is a pervasive process in nature that describes many essential activities in society, such as infectious disease outbreaks, information dissemination, and viral marketing [1][2][3][4]. Nowadays, an infectious disease can reach every city in the world within a few days, which allows a local outbreak to evolve into a global pandemic. This is what happened with COVID-19 [5][6][7][8]. It is thus urgent and essential to design efficient mechanisms for restraining epidemic spreading.
Complex networks have been proven to be a powerful analytical tool for predicting and controlling epidemic spreading in real-world scenarios [9][10][11]. Infectious diseases are transmitted in a population through the network of contacts between individuals. One of the critical problems is how to best distribute limited treatment and vaccination resources to suppress disease outbreaks [12,13]. Many heuristic rankings [14][15][16][17] have been proposed to identify influential nodes in networks for vaccination or quarantine. Some local strategies, such as acquaintance immunization [18] and random-walk immunization [19], have been introduced for cases in which complete knowledge of all individuals is not available. The sampling method was also considered for the immunization strategy of hidden populations [20]. Moreover, recent studies [21,22] seeking immunization strategies have applied message passing techniques, which consider both the network topology and the epidemic dynamics. In fact, the immunization problem is similar to the network disintegration problem [23][24][25], which focuses on the destruction of harmful networks through targeted attacks.
The network disintegration problem focuses on determining a set of vertices or links whose removal would collapse the giant connected component (GCC). The most traditional solutions to this problem are node ranking methods [26][27][28], which identify the sequence of nodes that will maximize the damage to the network's connectivity. The importance of nodes is often represented by node degree, betweenness, k-shell centrality, etc. [29]. Significantly, the adaptive centrality strategy, which recalculates the centrality of the undismantled nodes at each step, can dramatically improve the effect of dismantling [30]. Recently, several practical algorithms have been proposed for dismantling a network based on collective influence (CI) [31], decycling and tree breaking [32,33], optimal partitioning of graphs [23,34], explosive percolation [24], or articulation points [35]. Moreover, combinatorial optimization-based approaches, including tabu search [36,37], evolutionary algorithms [25], and a deep reinforcement learning framework [38], have been presented to search for the optimal disintegration strategy.
Figure 1. (c) An infectious disease hits one node (red) at random, and the nodes in its component are at risk of being infected. The expectation of the risk region's size in random outbreaks starting from a single random infected node is the HHI. (d) The infectious disease has randomly infected three nodes (red) in the network. The expectation of the total size of the risk region in random outbreaks starting from multiple random nodes is the GHI.
Connectivity is necessary for a network to maintain its function, so the complete fragmentation of a network is the common goal of network disintegration and immunization. However, it is infeasible to fragment the network entirely in many real cases, e.g., when vaccine resources are limited, or when maintaining the normal operation of society requires a certain level of mobility under travel restrictions [39,40]. That is to say, a number of connected components of various sizes remain in the network after immunization. In this context, not all possible spreading occurs in the GCC, and other connected components also contribute to the spread of infectious diseases. Therefore, a new index is required to evaluate the spreading risk of infectious diseases in a network and to design effective immunization strategies.
In this paper, we propose an indicator based on the size distribution of connected components to measure the connectivity of the network. By minimizing this index through the removal of a set of nodes or links, we obtain an efficient immunization strategy that minimizes the infection risk of individuals in networks. The remainder of this paper is organized as follows. In section 2, we present an index named the generalized Herfindahl-Hirschman index (GHI), and in section 3 we give a fast method to approximate it. In section 4, the GHI-based optimization model is proposed to design an immunization strategy, and the effects and characteristics of the strategy are discussed through experiments in different networks and spreading models. Finally, the conclusion and discussion are presented in section 5.

The definition of generalized Herfindahl-Hirschman index
Complex networks have long been acknowledged as a key ingredient of epidemic modeling [41,42], which describes how individuals interact with one another. A complex network can be described as an undirected graph G = (V, E), where V is the set of nodes, and E ⊆ V × V is the set of edges. N = |V| and M = |E| are the number of nodes and edges in the network, respectively. The spreading process in the network depends on the network connectivity. The essence of immunization is to fragment the transmission network into small connected components, the largest of which is the GCC, and the GCC's size is a common network connectivity measure. However, when we evaluate the potential risk of infectious diseases, the size of the GCC cannot reflect the infection risk of other components, in which infectious diseases can also break out. Therefore, the infection risk of individuals in a network cannot be judged directly by the GCC. For example, with the two networks presented in figure 1, it is not clear whether the network in figure 1(a) has a lower risk of infection, even though it has a smaller GCC than the network in figure 1(b).
As shown in figure 1(c), the network contains four connected components after immunization. We assume that a disease hits a random node in the network, and therefore all the nodes in the connected component containing the infected node are at risk of being infected. The infection risk of the nodes in the network is defined as the expected fraction of nodes at risk of infection in random outbreaks. In an epidemic model, the nodes can be divided into different states, such as susceptible (S), infected (I), or recovered (R), while the links allow contagion between the nodes. The susceptible-infectious (SI) epidemic spreading model [1,43] represents an infectious disease spread in which infected individuals never recover and keep propagating the disease forever. In the SI model, all the nodes of a connected component will be infected if one node of the component becomes infected. Therefore, the infection risk of an individual in the network is described as the expectation of the infection's prevalence (the fraction of infected nodes) in random outbreaks in the SI model, which can be approximated by simulation. Under this mechanism, we derive an exact expression for the infection risk of the nodes. Let $n_i$ be the size of component $C_i$, where $i = 1, \ldots, L$ is the serial number of the component, and let $p_i = n_i/N$ be the proportion of nodes in component $C_i$. Accordingly, the probability that a random outbreak starts in component $C_i$ is equal to $p_i$, and the average number of nodes at risk of infection under the infection of a random node is
$$\sum_{i=1}^{L} p_i n_i = \frac{1}{N}\sum_{i=1}^{L} n_i^2 .$$
After normalization by $N$, the expression of the infection risk of the nodes is equivalent to the Herfindahl-Hirschman index (HHI), denoted by $\varphi$:
$$\varphi = \sum_{i=1}^{L} p_i^2 .$$
The HHI [44] is a commonly accepted measure of market concentration in economics. It is calculated by squaring each firm's market share competing in the market and then summing the resulting numbers.
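As a minimal illustration, the HHI of a fragmented network can be computed directly from its component sizes. The function name `hhi` and the optional `n_total` argument are hypothetical and only serve this sketch; by default the sizes are normalized by their sum.

```python
from typing import Optional, Sequence


def hhi(component_sizes: Sequence[int], n_total: Optional[int] = None) -> float:
    """Sum of squared component shares p_i = n_i / N.

    n_total is an assumed convenience argument: pass the network size N
    explicitly if it differs from the sum of the listed component sizes.
    """
    total = sum(component_sizes) if n_total is None else n_total
    if total <= 0:
        raise ValueError("the network must contain at least one node")
    return sum((n / total) ** 2 for n in component_sizes)


# A connected network has the maximal value 1; four isolated nodes give 1/4.
print(hhi([10]), hhi([1, 1, 1, 1]))
```

A single random seed lands in component $C_i$ with probability $p_i$ and then puts a fraction $p_i$ of the nodes at risk, which is why the squared shares appear.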
In epidemic outbreaks, the spreading usually starts from multiple infected nodes. Therefore, we generalize the HHI to the GHI to measure the infection risk of the nodes under a multi-source infection. The distribution of the infection sources over all $L$ connected components is denoted by $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_L)$, where $0 \le \alpha_k \le n_k$, and $\sum_{k=1}^{L} \alpha_k = \Omega$ is the number of initial infection sources. Calculating the probability of $\alpha$ can be regarded as the problem of placing different sources of infection on different nodes of the network. The problem is divided into two steps: the first step determines which connected component each infection source corresponds to, and the second step assigns the $\Omega$ infection sources to different nodes in the corresponding components. The infection source distribution $\alpha$ contains all the results obtained by sampling the infection sources according to the group division $(\alpha_1, \alpha_2, \ldots, \alpha_L)$, and the total number of all sampling results is
$$\frac{\Omega!}{\prod_{k=1}^{L} \alpha_k!} .$$
Then, the probability of the distribution of the infection sources $\alpha$ is
$$P(\alpha) = \frac{\Omega!}{\prod_{k=1}^{L} \alpha_k!} \cdot \frac{\prod_{k=1}^{L} n_k!/(n_k - \alpha_k)!}{N!/(N - \Omega)!} . \qquad (4)$$
For a certain placement such as $1 \to C_1$, $2 \to C_1$, $3 \to C_2$, the probability of putting infection source 1 in $C_1$ is $n_1/N$. After node 1 is infected, subsequent sources cannot re-infect node 1, so the probability of then putting infection source 2 in $C_1$ is $(n_1 - 1)/(N - 1)$, and that of putting infection source 3 in $C_2$ is $n_2/(N - 2)$. The GHI, denoted by $\varphi_\Omega$, is the expected proportion of nodes at risk over all source distributions:
$$\varphi_\Omega = \sum_{\alpha \in A} P(\alpha) \sum_{1 \le i \le L} I(\alpha_i)\, p_i , \qquad (5)$$
where $A = \{\alpha \mid 0 \le \alpha_k \le n_k,\ \sum_{k=1}^{L} \alpha_k = \Omega,\ k = 1, \ldots, L\}$ is the set of admissible source distributions,
$$I(\alpha_i) = \begin{cases} 1, & \alpha_i > 0, \\ 0, & \alpha_i = 0, \end{cases}$$
is the characteristic function of $\alpha_i$, and $\sum_{1 \le i \le L} I(\alpha_i)\, p_i$ is the total proportion of nodes at risk of infection (nodes belonging to infected components). The GHI is equal to $\Omega/N$ when the network consists of many small components of relatively equal size. In this case, the GHI approaches 0 if the initial number of infected nodes $\Omega$ is very small compared to $N$ ($\varphi_\Omega = \Omega/N$, $\Omega \ll N$, $\varphi_\Omega \to 0$). In addition, the GHI reaches its maximum of 1 when the network is connected ($\varphi_\Omega = \sum_{\alpha \in A} P(\alpha) = 1$). The GHI is formally equivalent to the HHI when $\Omega = 1$.
Using equation (5), the infection risk of the two networks can be calculated as $\varphi_1 = 0.2531$, $\varphi_3 = 0.607$ in figure 1(a) and $\varphi_1 = 0.2407$, $\varphi_3 = 0.5673$ in figure 1(b), respectively. The network in figure 1(a) has a greater risk of infection than the network in figure 1(b), although it has a smaller GCC.
It is notably complicated to use equation (5) to calculate $\varphi_\Omega$ because there are numerous combinations of $\alpha$. The number of possible situations increases exponentially with the number of initial infections and connected components.
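The combinatorial cost can be seen in a brute-force sketch of the exact GHI. It assumes that the $\Omega$ sources occupy distinct nodes chosen uniformly at random, so that the probability of a source distribution $\alpha$ is multivariate hypergeometric; the function name `ghi_exact` is hypothetical. The enumeration over all $\alpha$ is only feasible for a handful of components and sources.

```python
from itertools import product
from math import comb
from typing import Sequence


def ghi_exact(sizes: Sequence[int], omega: int) -> float:
    """Exact expected at-risk fraction for omega distinct random seed nodes."""
    n_total = sum(sizes)
    if not 0 < omega <= n_total:
        raise ValueError("omega must satisfy 0 < omega <= N")
    total_ways = comb(n_total, omega)  # ways to choose omega seed nodes
    risk = 0.0
    # Enumerate every alpha with 0 <= alpha_k <= n_k and sum(alpha) == omega.
    for alpha in product(*(range(min(n, omega) + 1) for n in sizes)):
        if sum(alpha) != omega:
            continue
        ways = 1
        for n, a in zip(sizes, alpha):
            ways *= comb(n, a)  # seed placements inside component k
        # Fraction of nodes lying in components that received a seed.
        at_risk = sum(n for n, a in zip(sizes, alpha) if a > 0) / n_total
        risk += (ways / total_ways) * at_risk
    return risk


print(ghi_exact([3, 2, 1], 1))  # reduces to the HHI for a single source
print(ghi_exact([3, 2, 1], 2))
```

The loop visits on the order of $\prod_k (\min(n_k, \Omega) + 1)$ tuples, which illustrates why an approximation is needed for realistic networks.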

Approximation of the GHI
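In this section, an approximate expression of the GHI is considered for fast calculation. The actual initial number of infection sources is much smaller than the total number of individuals (i.e., $\Omega \ll N$), so we obtain
$$\frac{N!}{(N - \Omega)!} \approx N^{\Omega}$$
and
$$\frac{n_k!}{(n_k - \alpha_k)!} \approx n_k^{\alpha_k} ,$$
and equation (4) is simplified to
$$P(\alpha) \approx \frac{\Omega!}{\prod_{k=1}^{L} \alpha_k!} \prod_{k=1}^{L} p_k^{\alpha_k} . \qquad (9)$$
Meanwhile, we can relax the restriction $\alpha_k \le n_k$ in $A$ when $\Omega \ll N$, i.e., we allow cases for which $\alpha_k > n_k$, and obtain $\tilde{A} = \{\alpha \mid \alpha_k \ge 0,\ \sum_{k=1}^{L} \alpha_k = \Omega,\ k = 1, \ldots, L\}$. For the set $\tilde{A}$, $\alpha_k \le n_k$ holds for most connected components when $\Omega \ll N$. Therefore, replacing $A$ with $\tilde{A}$ has little effect on the result of calculating the GHI.
For $\alpha_i = 0$, we can use the multinomial theorem to simplify equation (9): summing equation (9) over all $\alpha \in \tilde{A}$ with $\alpha_i = 0$ gives $(1 - p_i)^{\Omega}$, the probability that no source falls in $C_i$. Equation (5) is then approximated as
$$\tilde{\varphi}_\Omega = \sum_{i=1}^{L} p_i \left[ 1 - (1 - p_i)^{\Omega} \right] , \qquad (10)$$
in which $1 - (1 - p_i)^{\Omega}$ is the probability of infection in the connected component $C_i$. Hence, the physical interpretation of equation (10) is the sum of the expected infection risk of each connected component. The discrepancy between equation (10) and the exact form, equation (5), arises because equation (10) allows different sources to infect the same node in the network repeatedly. However, the probability of a repeated infection is small when $\Omega \ll N$, so it is reasonable to use $\tilde{\varphi}_\Omega$ as an approximation of $\varphi_\Omega$. Equation (10) is also equivalent to the exact form, equation (5), when $\Omega = 1$.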
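The approximation can be sketched in a few lines. The function name `ghi_approx` and the optional `n_total` argument are hypothetical; the body evaluates the sum over components of $p_i$ times the probability $1 - (1 - p_i)^{\Omega}$ that at least one source falls in $C_i$.

```python
from typing import Optional, Sequence


def ghi_approx(sizes: Sequence[int], omega: float,
               n_total: Optional[int] = None) -> float:
    """Approximate GHI: sum_i p_i * (1 - (1 - p_i) ** omega).

    Sources are treated as independent draws, so two sources may hit the
    same node; this is only accurate when omega is much smaller than N.
    """
    total = sum(sizes) if n_total is None else n_total
    if total <= 0 or omega <= 0:
        raise ValueError("need a non-empty network and omega > 0")
    return sum((n / total) * (1.0 - (1.0 - n / total) ** omega) for n in sizes)


print(ghi_approx([3, 2, 1], 1))  # 14/36, identical to the HHI
print(ghi_approx([3, 2, 1], 2))  # 11/18
```

For component sizes (3, 2, 1) and $\Omega = 2$, the approximation gives 11/18 ≈ 0.611, whereas direct enumeration of equation (5) gives 59/90 ≈ 0.656. The gap is large here only because $\Omega$ is not small compared with $N = 6$; the cost is linear in the number of components.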
Next, we compare the computation times of $\varphi_\Omega$ and $\tilde{\varphi}_\Omega$ in the Erdős-Rényi (ER) network with different numbers of initially infected nodes. As shown in figure 2(a), the computation time of $\varphi_\Omega$ increases exponentially with the number of infected nodes, which makes it impractical even with few initial infection sources. The running time is considerably reduced by the approximate calculation $\tilde{\varphi}_\Omega$. Owing to the high time complexity of $\varphi_\Omega$, we use $\rho_I$ instead of $\varphi_\Omega$ to verify the effectiveness of $\tilde{\varphi}_\Omega$ in subsequent experiments, where $\rho_I$ is the average infection prevalence (the fraction of infected nodes) obtained from simulations of the SI model. In the simulation, each node is initially infected with probability $I_0$, and the SI process is then iterated with synchronous updating. After the system reaches the steady state, $\rho_I$ is obtained. To verify the effectiveness of $\varphi_\Omega$ and $\tilde{\varphi}_\Omega$, we present a comparison between $\varphi_\Omega$, $\tilde{\varphi}_\Omega$ and $\rho_I$. In figure 2(b), we can see that the deviations $\Delta = |\varphi_\Omega - \rho_I|$ and $\tilde{\Delta} = |\tilde{\varphi}_\Omega - \rho_I|$ are so small that they have little effect on the infection risk analysis. Notably, the results of $\varphi_\Omega$ are almost consistent with our simulation results, which supports the correctness of equation (5). Overall, we can conclude that $\tilde{\varphi}_\Omega$ provides an accurate estimate of the GHI at acceptable running times when $\Omega \ll N$. In general, the number of initial infection sources $\Omega$ is unknown, but the initial infection proportion of the epidemic in the population, denoted by $I_0$, can generally be estimated from statistical sampling or clinical data. When $I_0$ is known, equation (10) can be extended to
$$\varphi_{I_0} = \sum_{i=1}^{L} p_i \left[ 1 - (1 - p_i)^{N I_0} \right] . \qquad (11)$$
Different from the precise number of infection sources $\Omega$ in a single outbreak, $N \times I_0$ represents the expected number of infection sources when each node is initially infected with probability $I_0$.
To further verify that $\varphi_{I_0}$ can effectively estimate the GHI, $\varphi_{I_0}$ is compared with the infection prevalence $\rho_I$ of an epidemic under the SI model for networks with different component distributions. In figure 3(a), we generate disconnected ER networks with different numbers and sizes of connected components by adjusting the average degree $\langle k \rangle$. Moreover, we randomly remove edges (figure 3(b)) or nodes (figure 3(c)) in the ER network to create different component distributions. The results in figure 3 show that the simulations (symbols) are in excellent agreement with the theoretical results (lines). These results indicate that $\varphi_{I_0}$ can reasonably estimate the GHI, especially for small initial infection proportions ($I_0 \le 0.1$). Given its low time complexity and effective estimates of the GHI, an efficient immunization strategy aims to reduce the indicator $\varphi_{I_0}$ of the network after immunization.
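The comparison between the indicator and SI simulations can be imitated with a small Monte Carlo sketch. It relies on the observation stated earlier that, in the SI model, a component ends up fully infected as soon as it contains one seed, so only component sizes are needed. The names `phi_i0` and `si_prevalence`, the component sizes, and the number of runs are illustrative assumptions, not the experimental setup of the paper.

```python
import random
from typing import Sequence


def phi_i0(sizes: Sequence[int], i0: float) -> float:
    """Indicator with N * I0 expected sources: sum_i p_i (1 - (1 - p_i)^(N I0))."""
    n_total = sum(sizes)
    omega = n_total * i0
    return sum((n / n_total) * (1.0 - (1.0 - n / n_total) ** omega) for n in sizes)


def si_prevalence(sizes: Sequence[int], i0: float,
                  runs: int = 1000, seed: int = 0) -> float:
    """Average final SI prevalence when each node is seeded with probability i0."""
    rng = random.Random(seed)
    n_total = sum(sizes)
    acc = 0.0
    for _ in range(runs):
        at_risk = 0
        for n in sizes:
            # The whole component is eventually infected if any node is a seed.
            if any(rng.random() < i0 for _ in range(n)):
                at_risk += n
        acc += at_risk / n_total
    return acc / runs


sizes = [400, 300, 200, 100]  # hypothetical component sizes
print(phi_i0(sizes, 0.01), si_prevalence(sizes, 0.01))
```

With these sizes and $I_0 = 1\%$, the indicator evaluates to about 0.933 and the simulated prevalence should come out close to 0.915, so the indicator sits slightly above the simulation in this toy case.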

The optimization model of the immunization strategy
Vaccination is one of the most effective ways to prevent or suppress the spread of an epidemic. From the viewpoint of vaccination, immunization corresponds to an attack that destroys the network on which the epidemic could spread. This paper considers node immunization approaches and assumes that the attached spreading edges are removed if a node is immunized. The set of immunized nodes is denoted by $V_{immu}$. The number of immunized nodes is denoted by $n$, and $p = n/N$ is the immunized proportion of the nodes. An immunization strategy is defined by $X = (x_1, x_2, \ldots, x_N)$ with
$$x_j = \begin{cases} 0, & v_j \in V_{immu}, \\ 1, & v_j \notin V_{immu}. \end{cases}$$
Thus, we obtain the number of immunized nodes $n = N - \sum_{j=1}^{N} x_j$. The goal of our immunization method is to identify the optimal solution $X^*$ that minimizes the GHI of the network after immunization. With the knowledge of $I_0$, we define $\Phi_{GHI}(X)$ as the $\varphi_{I_0}$ of the network immunized by strategy $X$. We introduce an optimization model to solve for the immunization strategy, which can be described as
$$\min_{X} \ \Phi_{GHI}(X) \quad \text{s.t.} \quad \sum_{j=1}^{N} x_j = N - n, \quad x_j \in \{0, 1\} , \qquad (12)$$
where $j = 1, \ldots, N$ represents the serial number of nodes, and $\Phi_{GHI}(X)$ is used as the objective function of the optimization model to measure the effect of $X$. The solution of the optimization model determines the optimal GHI strategy.
For comparison, the optimal GCC strategy replaces the objective function $\Phi_{GHI}(X)$ in equation (12) with $\Phi_{GCC}(X)$ to minimize the GCC of the network, where $\Phi_{GCC}(X)$ is the size of the GCC in the network after immunization. We also compare the optimal GHI strategy with mainstream strategies, including the high-degree adaptive (HDA) strategy and the CI strategy [31]. The HDA strategy removes nodes one at a time according to their degree, which is recomputed after each removal. The CI strategy iteratively calculates the CI value of the nodes and removes the node with the highest CI value. The CI value is an extension of the degree centrality that takes into account the neighbors of node $v_j$ at a distance of $\ell$, which was set to $\ell = 2$ in this paper.
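The HDA baseline is simple enough to sketch directly. The function name `hda_order` and the adjacency-dictionary representation are assumptions made for illustration; ties between nodes of equal degree are broken by dictionary order, which the text does not specify.

```python
from typing import Dict, Hashable, List, Set


def hda_order(adj: Dict[Hashable, Set[Hashable]], n_remove: int) -> List[Hashable]:
    """Return the nodes immunized by adaptive high-degree removal.

    adj maps each node to the set of its neighbors (undirected, symmetric).
    """
    graph = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    order = []
    for _ in range(min(n_remove, len(graph))):
        # Degrees are re-evaluated on the current, already reduced graph.
        u = max(graph, key=lambda v: len(graph[v]))
        for v in graph.pop(u):
            graph[v].discard(u)  # drop the edges attached to the removed node
        order.append(u)
    return order


toy = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}
print(hda_order(toy, 2))  # hub 0 first, then node 1
```

Recomputing the degree after every removal is what distinguishes the adaptive variant from a static degree ranking.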

Experimental design

Tabu search algorithm
The tabu search algorithm [36,45] has been proved to be an effective method for solving similar problems in networks and is therefore applied here to seek the optimal solution of the above optimization model. The basic principle of the tabu search is to continue the search whenever a local optimum is encountered by allowing non-improving moves. Cycling back to previously visited solutions is prevented by using memories, called tabu lists, that record the recent history of the search. The procedure of the algorithm is described below.
Step 1: initialization. We set the length of the tabu list $L_{tabu} = 100$, the number of candidates $n_{can} = 500$, the maximum total iteration number $T_{max} = 30\,000$, and the maximum iteration number without improvement of the solution $n_{max} = 5000$. The algorithm terminates when the present iteration step $T_{iter}$ reaches $T_{max}$ or when the number of iterations for which the optimal solution is not updated, $n_{iter}$, exceeds $n_{max}$.
Step 2: generate the initial solution $X_0$. $X_0$ can either be given randomly or by another strategy with a better performance. Let the current best solution $X_{opt} = X_0$. Calculate $\Phi(X_{opt})$.
Step 3: check the termination condition. If $T_{iter} > T_{max}$ or $n_{iter} > n_{max}$, the process stops and outputs $X_{opt}$ as the result; otherwise, continue to step 4.
Step 4: generate candidate solutions. Generate $n_{can}$ new candidate solutions $X_{can}$ by randomly swapping the states of two nodes (one immunized and one non-immunized). Since $\Phi$ is to be minimized, determine the current solution as the best candidate, $X_{cur} = \arg\min \Phi(X_{can})$.
Step 5: update the tabu list. Determine whether $X_{cur} \notin T_{list}$ or $\Phi(X_{cur}) < \Phi(X_{opt})$ (aspiration criterion). If satisfied, add $X_{cur}$ to $T_{list}$. If not satisfied, select as $X_{cur}$ the best remaining candidate, i.e., $X_{cur} = \arg\min \Phi(X_{can})$ subject to $X_{cur} \notin T_{list}$, and then add $X_{cur}$ to $T_{list}$. Notably, every element of the tabu list is released after $L_{tabu}$ iterations.
Step 6: update the current best solution $X_{opt}$. Determine whether $\Phi(X_{cur}) < \Phi(X_{opt})$. If satisfied, set $X_{opt} = X_{cur}$ and $T_{list} = \mathrm{NULL}$. Then return to step 3.
After obtaining the approximate optimal solution, a set of nodes is identified whose removal from the network minimizes $\Phi(X)$. The optimal GHI and the optimal GCC strategies are obtained by using the objective functions $\Phi_{GHI}(X)$ and $\Phi_{GCC}(X)$, respectively.
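The steps above can be sketched compactly as follows. This is an illustrative reading of the procedure under several assumptions rather than the implementation used in the experiments: solutions are stored as sets of immunized nodes, a swap exchanges one immunized node with one non-immunized node, the component proportions are normalized by the original network size, and the default parameters are far smaller than those listed in step 1. All function names are hypothetical.

```python
import random
from collections import deque


def component_sizes(adj, removed):
    """Sizes of the connected components after deleting the nodes in removed."""
    seen, sizes = set(removed), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sizes


def phi_ghi(adj, removed, i0):
    """Objective: approximate GHI with N * I0 expected sources (N = original size)."""
    n_total = len(adj)
    omega = n_total * i0
    return sum((n / n_total) * (1.0 - (1.0 - n / n_total) ** omega)
               for n in component_sizes(adj, removed))


def tabu_search(adj, n_immunize, i0, l_tabu=100, n_can=20,
                t_max=500, n_max=100, seed=0):
    nodes = list(adj)
    if not 0 < n_immunize < len(nodes):
        raise ValueError("need 0 < n_immunize < number of nodes")
    rng = random.Random(seed)
    current = frozenset(rng.sample(nodes, n_immunize))      # step 2
    best, best_val = current, phi_ghi(adj, current, i0)
    tabu = deque([current], maxlen=l_tabu)                   # bounded memory
    stall = 0
    for _ in range(t_max):                                   # step 3
        if stall > n_max:
            break
        candidates = []
        for _ in range(n_can):                               # step 4: random swaps
            out_node = rng.choice(sorted(current))
            in_node = rng.choice([v for v in nodes if v not in current])
            cand = (current - {out_node}) | {in_node}
            candidates.append((phi_ghi(adj, cand, i0), cand))
        candidates.sort(key=lambda item: item[0])
        # Step 5: best candidate that is not tabu or meets the aspiration criterion.
        chosen = next(((val, cand) for val, cand in candidates
                       if cand not in tabu or val < best_val), None)
        if chosen is None:
            stall += 1
            continue
        val, current = chosen
        tabu.append(current)
        if val < best_val:                                   # step 6
            best, best_val, stall = current, val, 0
            tabu.clear()
        else:
            stall += 1
    return best, best_val


path = {i: set() for i in range(7)}
for i in range(6):
    path[i].add(i + 1)
    path[i + 1].add(i)
print(tabu_search(path, 1, 0.2, n_can=5, t_max=200, n_max=50))
```

On the seven-node path, immunizing the middle node splits the network into two components of three nodes each, which gives the lowest objective value among all single-node choices.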

Networks
Many social networks conform to the typical characteristics of small-world, scale-free (SF), or community structures. Hence, we analyze the case of three basic model networks, the Watts-Strogatz (WS) network [46], the SF network [47], and the KOSKK network [48,49].
The WS model starts from a ring of $N = 1000$ vertices, each of which symmetrically connects to its four nearest neighbors. Then, a fraction of the edges in the network are rewired by visiting all the clockwise edges of each vertex and reconnecting them, with probability $p_{re} = 0.5$, to a randomly chosen node.
The SF network is generated using preferential attachment [50], which signifies that the more connected a node is, the more likely it is to receive new links. The preferential attachment model is initiated with a small nucleus of $m_0 = 5$ fully connected nodes. Then, at every time step, a new node is added with $m = 4$ links, each of which connects to an old node $v_j$ of degree $k_j$ with probability $k_j / \sum_{l} k_l$.
The KOSKK model is a dynamic network evolution model [48,49] that can generate networks with typical features of social networks by utilizing network link weights. The network is initiated with $N$ nodes and zero edges and then evolves through three mechanisms: local attachment, global attachment, and node deletion. The network is obtained after $10^7$ time steps of evolution, and the parameters are set as $N = 1000$, $\omega_0 = 1$, $p_r = 0.005$, $p_d = 0.001$, $p_\Delta = 0.25$, and $\delta = 0.6$.
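A minimal sketch of such a preferential attachment generator is given below. The function name is hypothetical, and the sketch assumes that the $m$ targets of a new node are distinct (no multi-edges), which the description above does not state explicitly. Degree-proportional sampling is done by drawing uniformly from a list in which every node appears once per incident edge.

```python
import random
from typing import Dict, Set


def preferential_attachment(n: int, m0: int = 5, m: int = 4,
                            seed: int = 0) -> Dict[int, Set[int]]:
    """Grow a network from an m0-clique; each new node attaches to m old nodes."""
    if not 1 <= m <= m0 <= n:
        raise ValueError("need 1 <= m <= m0 <= n")
    rng = random.Random(seed)
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for u in range(m0):                      # fully connected nucleus
        for v in range(u + 1, m0):
            adj[u].add(v)
            adj[v].add(u)
    # Each node is listed once per unit of degree.
    stubs = [u for u in range(m0) for _ in range(m0 - 1)]
    for new in range(m0, n):
        targets: Set[int] = set()
        while len(targets) < m:              # sample m distinct old nodes
            targets.add(rng.choice(stubs))   # probability proportional to degree
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            stubs.extend((new, t))
    return adj


g = preferential_attachment(1000)
print(sum(len(nb) for nb in g.values()) // 2)  # number of edges
```

With $m_0 = 5$ and $m = 4$, a network of 1000 nodes contains 10 nucleus edges plus 4 edges for each of the 995 added nodes.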

Simulations of epidemic spread
The SI model is somewhat of an oversimplification that is valid only in cases where the time scale of recovery is much longer than the time scale of infection. More realistic models, such as the susceptible-infectious-susceptible (SIS) and the susceptible-infectious-recovered (SIR) epidemiological models [1,43], have been proposed to better accommodate the biological properties of real diseases. To study the optimal GHI immunization strategy, we compared its efficiency with that of other strategies in the SIS and SIR models. The comparison results are given in figures 4 and 5. The SIS and SIR models are widely used to simulate the spread of epidemics in a network. In the SIS and SIR models, each node of the network represents an individual, and each edge is a connection through which the infection can spread. In the simulations of this paper, the SIS and SIR spreading processes are implemented with synchronous updating. Namely, at each time step, each susceptible node that is connected to one or more infected nodes is infected by an infected neighbor with probability $\beta$ (infection rate). At the same time, all infected nodes recover with probability $\mu$ (recovery rate). The dynamical process terminates when the system reaches a steady state. The SIR model assumes that an infectious individual who recovers from the disease has acquired permanent immunity. In the SIR model, the infection will eventually die out. Conversely, the SIS model assumes that the disease does not confer immunity, so that individuals can be infected over and over again. Under SIS, the disease can reach a steady state in which a certain fraction of the population remains infected. Considering this difference, the result of the SIR model is measured by the fraction of individuals who have ever caught the disease, denoted by $\rho_R$. For the SIS model, it is the fraction of infected nodes persisting in the steady state, denoted by $\rho_I$.
$p$ is defined as the fraction of immunized nodes. In the simulation, each node is initially infected with probability $I_0 = 5\%$ (independent of the other nodes), and the spreading model is run with the parameters $\beta = 0.25$ and $\mu = 0.1$, averaging over 10 000 independent runs.
Table 1. The basic topological features of the networks. $N$ and $W$ are the numbers of nodes and links, and $\beta_{th} = \langle k \rangle / \langle k^2 \rangle$ is the epidemic threshold of a network, where $\langle k \rangle$ and $\langle k^2 \rangle$ are the mean degree and second-order mean degree of a network, respectively. $C$ is the average clustering coefficient of a network.
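The synchronous updating scheme can be sketched as follows. Two points are interpretations rather than statements of the text: each infected neighbor is assumed to transmit independently with probability $\beta$, so a susceptible node with $k$ infected neighbors is infected with probability $1 - (1 - \beta)^k$, and the SIS run is simply stopped after a fixed number of steps instead of testing for stationarity. The function name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(adj, beta, mu, i0, model="SIR", max_steps=1000, seed=0):
    """Synchronous SIS/SIR dynamics on an adjacency dict {node: set_of_neighbors}.

    Returns rho_I (infected fraction at the end) for SIS and rho_R
    (fraction ever infected) for SIR.
    """
    if model not in ("SIS", "SIR"):
        raise ValueError("model must be 'SIS' or 'SIR'")
    rng = random.Random(seed)
    state = {u: ("I" if rng.random() < i0 else "S") for u in adj}
    for _ in range(max_steps):
        if "I" not in state.values():
            break  # the infection has died out
        nxt = dict(state)  # all updates are based on the previous time step
        for u, s in state.items():
            if s == "S":
                k_inf = sum(1 for v in adj[u] if state[v] == "I")
                if k_inf and rng.random() < 1.0 - (1.0 - beta) ** k_inf:
                    nxt[u] = "I"
            elif s == "I" and rng.random() < mu:
                nxt[u] = "S" if model == "SIS" else "R"
        state = nxt
    n = len(state)
    infected = sum(1 for s in state.values() if s == "I")
    if model == "SIS":
        return infected / n
    recovered = sum(1 for s in state.values() if s == "R")
    return (infected + recovered) / n


k5 = {u: {v for v in range(5) if v != u} for u in range(5)}
print(simulate(k5, beta=0.25, mu=0.1, i0=0.05, model="SIR"))
```

In practice the returned value would be averaged over many independent runs with different seeds, and for SIS it may be preferable to average the infected fraction over a window of late time steps.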

Results in synthetic and real networks
To study the efficiency of the optimal GHI strategy, we focused on the fraction of infected nodes $\rho_I$ (steady state) in the SI and SIS models, and the fraction of recovered nodes $\rho_R$ (steady state) in the SIR model. A smaller $\rho_I$ or $\rho_R$ indicates a higher efficiency of a strategy. In the simulations, we examine $\rho_I$ or $\rho_R$ in the stationary regime (steady state) as a function of the fraction of immunized nodes $p$. We implement the optimal GHI strategy and other strategies to immunize a proportion $p$ of the individuals in the networks. Then, each node is initially infected with probability $I_0 = 5\%$, and the SI, SIS, and SIR infection processes are iterated with synchronous updating. The SI, SIS, and SIR processes are implemented with a fixed infection rate $\beta = 0.25$, and the SIS and SIR processes use a fixed recovery rate $\mu = 0.1$. After the system reaches the steady state, $\rho_I$ or $\rho_R$ is obtained.
The results for the model networks shown in figure 4 reveal that the optimal GHI strategy performs better than the other strategies, especially in the WS and KOSKK networks. Meanwhile, the advantage of the optimal GHI strategy is not evident in the SF network for the SIR model, where the effects of the HDA and CI strategies are close to that of the optimal GHI strategy. The temporal evolution of the spreading process in the networks after immunization is given in the insets. For all networks, the infected fraction is significantly lower when using the optimal GHI strategy as compared to other strategies with the same fraction of immunization doses.
The model networks cannot fully describe the characteristics of real systems. Therefore, we also implemented the selected strategies for three real-world networks through which epidemics can spread: the political blog network [51], the Arenas email network [52], and the US air transportation network [53]. The details of these networks are given in table 1. Some conclusions obtained from the model networks also hold in the real networks. The experiments demonstrate that the optimal GHI strategy exhibits a clear advantage, requiring fewer immunized nodes to achieve the same immunization effect than other targeted strategies (figure 5). In addition, the fraction of infected individuals for the optimal GHI strategy is significantly lower than those for other strategies with the same fraction of immunization doses. These results show that the optimal GHI strategy remains effective for real networks with SF characteristics, and that reducing the size of the GCC ($P_\infty$) is not as effective as reducing the GHI in the network.
The network is fragmented into many connected components of different sizes by immunization. For these networks, the size distribution of the connected components plays a more significant role than the topology within the components. Immunization strategies with different mechanisms lead to different distributions of connected components after immunization. The GHI evaluates the infection risk of the network after immunization based on this distribution. Therefore, the optimal GHI strategy, which minimizes the GHI by immunizing nodes, shows great advantages over other strategies for immunization in different networks.

Robustness of the optimal GHI strategy
So far, we have focused on the performance of the optimal GHI strategy in different networks. These results suggest that, although the GHI cannot accurately quantify the expected infection prevalence under the SIS and SIR models, it reflects the structural connectivity of the network and quantifies the impact of the distribution of connected components on the spreading. However, the parameters of the dynamic models are also factors that affect the simulation results. In the previous experiments, the parameters of the SIS and SIR models were fixed. Next, we need to verify whether the strategy is effective under different parameters of the epidemiological model. In this section, we move our focus to the robustness of the optimal GHI strategy and define the robustness in two ways. On the one hand, we consider the robustness of the strategy in terms of sensitivity to the infectious disease model parameters. On the other hand, we also evaluate the robustness against the deviation of prior information, in the sense of how well the optimal GHI strategy performs even when the estimated value we obtained does not strictly agree with the precise $I_0$.
To further clarify what types of infectious diseases the optimal GHI strategy is suitable for, the performance of different strategies is compared under different infectious disease model parameters on the US air transportation network. There, an infected airport implies that sick people arrive at or depart from it. Consequently, immunization means that all people at an airport are screened, flights are canceled, or the entire airport is shut down. In figure 6(a), we immunized a fraction $p = 0.1$ of the nodes and compared the impact of different infection rates $\beta$ on the efficiency of the optimal GHI strategy at a fixed recovery rate $\mu = 0.1$. Similar experiments were done under different recovery rates $\mu$ at a fixed infection rate $\beta = 0.25$ (figure 6(b)). When $\beta$ is high or $\mu$ is low, the proportion of infections under the optimal GHI strategy remains at a relatively low level. As $\beta$ increases or $\mu$ decreases, the infection prevalence in the simulation tends to the value of the GHI in the network, and the advantages of the optimal GHI strategy increase significantly. The results show that the effect of the optimal GHI strategy is pronounced for infectious diseases that are highly contagious and difficult to recover from. Moreover, the simulation results in figures 6(a) and (b) show that the optimal GHI strategy, which is effective in the SI model, is similarly effective for the SIS and SIR models. The effective parameter range of the optimal GHI strategy covers the infection and recovery rates of many real infectious diseases, e.g., SARS and COVID-19.
In addition, we tested the effect of the optimal GHI strategy under different initial infection proportions $I_0$, as shown in figure 6(c). It can be seen that the optimal GHI strategy reduces the prevalence much more than the other strategies under different $I_0$.
In a real-world situation, there is typically no access to the precise initial infection proportion (denoted by $I_0$), and estimated values of $I_0$ (denoted by $\hat{I}_0$) are generally used to guide decisions. Moreover, the initial sources of infection are not randomly generated and exhibit correlation and aggregation. These factors imply that the $\hat{I}_0$ referred to for making decisions deviates to some extent from $I_0$. Thus far, we have conducted simulations under the assumption that we know the precise $I_0$, which is the ideal case of $\hat{I}_0 = I_0$. Now, we consider how robust the optimal GHI strategy is against the noise of $I_0$. In the experiment, $\hat{I}_0$ is the initial infection proportion used to formulate the optimal GHI strategy, and $I_0$ is the initial infection proportion used in the simulations of the SIS and SIR models. Without knowledge of the real $I_0$, we formulate an optimal GHI strategy based on the estimated value $\hat{I}_0$. To determine whether this strategy is still valid in simulations with the real $I_0$, we test the effectiveness of the strategy in simulation experiments with different $I_0$. As shown in figure 7(a), we take $\hat{I}_0 = 5\%$ to obtain the optimal GHI strategy and study its effects in simulations under different $I_0$. We find that the epidemic can still be controlled more effectively by the optimal GHI strategy than by the others under the same conditions. Moreover, to further test the impact of the estimation accuracy of $I_0$ on the effect of the optimal GHI strategy, we fixed the $I_0$ used in the simulation and tested the effect of strategies obtained with different estimated values $\hat{I}_0$. Figure 7(b) shows the effects of the optimal GHI strategy obtained with different $\hat{I}_0$ for the simulation under $I_0 = 5\%$. It suggests that our optimal GHI strategy still maintains an effective performance even if there is a certain deviation between the estimated $\hat{I}_0$ and the actual value $I_0$.
As the estimated $\hat{I}_0$ moves closer to the actual $I_0$, the effect of the optimal GHI strategy becomes increasingly evident.
Based on these experiments, we conclude that the optimal GHI strategy exhibits a low sensitivity to the epidemiological model parameters and a certain robustness against the noise of $I_0$.

Conclusion and discussion
In this paper, we proposed an indicator named the GHI to measure the infection risk of individuals in a network according to the number of infection sources, along with a computationally efficient method to approximate it. We set our immunization goal as minimizing the GHI and established an optimization model to search for the immunization strategy. Our method can immunize or quarantine the population against possible multi-regional outbreaks based on the initially infected proportion. We conducted extensive experiments on both synthetic and real-world networks using SIS and SIR simulations. The results show that the optimal GHI method is significantly more efficient at preventing disease spreading than other basic immunization methods, especially for diseases with high infection rates or low recovery rates. Moreover, our strategy shows a certain robustness to deviations in the prior information on the initially infected proportion, which makes the method suitable for the requirements of practical applications.
GHI measures the upper bound of the expected fraction of infected nodes, which is based on the strong assumption that all nodes in the connected components containing infected nodes are at risk of infection. There are many different methods to minimize GHI besides immunization of nodes, such as immunization of links and community isolation. Further research may consider the following two aspects. On the one hand, GHI can be introduced to more application scenarios, and its properties need to be explored further. On the other hand, the tabu search algorithm we utilized in this study is computationally expensive and complex in the face of certain large-scale networks. Therefore, it is necessary to explore heuristics to reduce the computational complexity.