Infectious Disease Containment Based on a Wireless Sensor System

Infectious diseases pose a serious threat to public health due to their high infectivity and potentially high mortality. One of the most effective ways to protect people from being infected by these diseases is through vaccination. However, due to various resource constraints, vaccinating all the people in a community is not practical. Therefore, targeted vaccination, which vaccinates a small group of people, is an alternative approach to contain infectious diseases. Since many infectious diseases spread among people by droplet transmission within a certain range, we deploy a wireless sensor system in a high school to collect contacts that happen within the disease transmission distance. Based on the collected traces, a graph is constructed to model the disease propagation, and a new metric (called connectivity centrality) is presented to find the important nodes in the constructed graph for disease containment. Connectivity centrality considers both a node's local and global effects to measure its importance in disease propagation. Centrality based algorithms are presented and further enhanced by exploiting the information of the known infected nodes, which can be detected during targeted vaccination. Simulation results show that our algorithms can effectively contain infectious diseases and outperform other schemes under various conditions.


I. INTRODUCTION
Infectious diseases pose a serious threat to public health due to their high infectivity and potentially high mortality. Over the last few decades, infectious diseases have caused several regional and worldwide pandemics, resulting in many infections and deaths. For example, during the H1N1 pandemic in 2009, more than 600,000 cases were lab-confirmed and more than 14,000 deaths were reported worldwide [1]. According to the World Health Organization (WHO), the Ebola epidemic in 2014 caused significant mortality in many West African countries, with a reported case fatality rate of 70% [2]. Even seasonal influenza is estimated to affect 5% to 15% of the global population and cause 3 to 5 million cases of severe infection and 250,000 to 500,000 deaths worldwide each year [3].
To prevent infectious diseases, one of the most effective ways is to vaccinate the susceptible individuals. However, due to resource constraints such as the limited vaccine supply, in many cases it may not be practical to vaccinate all the susceptible individuals, especially when a new infectious disease breaks out. Therefore, targeted vaccination, which vaccinates a small group of people in a community, is an alternative approach to contain infectious diseases. The challenge is how to find the group of people whose vaccination will, on average, result in the maximum reduction of disease spread.
Targeted vaccination has been studied in some previous works [4]-[7]. However, these works are limited to theoretical analysis based on synthetic networks such as random, homogeneous or scale-free networks, which may not reflect the real contact patterns among people in different scenarios. For example, based on the contact traces collected from high school students, where each student carries a sensor node to record the contacts with others by sending and receiving packets (details shown in Section III), we show in Figure 1 that the distributions of the number of nodes' contacts (i.e., the number of packets received from other nodes) and neighbors (i.e., the number of nodes from which packets are received) are different from power-law distributions. Since students in the high school spend most of their time in classes and students in the same class may have contacts with each other, most of the nodes in the network have a similar number of neighbors and contacts. As observed from our collected trace, the number of nodes' neighbors and the number of nodes' contacts are more likely to follow normal distributions than power-law distributions.
The problem of targeted vaccination has some similarity to virus (worm) containment in computer networks, such as cellular networks [8]-[10] or online social networks [11]-[14]. Based on cluster partition and community detection, different schemes have been proposed for virus (worm) containment [8], [11]. By using various techniques to divide the network into different partitions, these schemes contain the viruses (worms) within the infected partition before they spread out. More specifically, the nodes that separate network partitions are vaccinated in [8] and the neighbors of overlapped nodes between two communities are vaccinated in [11]. However, these works mainly focus on containing worms (viruses) in cellular networks or online social networks, which have different propagation patterns from infectious diseases. An infectious disease is transmitted with some probability when two people come into contact with each other, whereas in a cellular network or online social network, a node is infected immediately once it contacts an infected node. In addition, they implicitly assume that all nodes in the network are eligible for vaccination, which is not true in disease containment (e.g., we cannot vaccinate an infected node). Thus, we cannot directly apply these schemes to targeted vaccination.
Different from the aforementioned works, we deploy a wireless sensor system in a high school to collect contacts among students. Since the wireless signal strength degrades as the communication distance increases [15], we can measure the wireless signal strength and then infer when and where students meet with each other. This information is important for modeling the propagation of infectious disease. Many respiratory infectious diseases (e.g., influenza) spread among people by droplet transmission, requiring an infected and a susceptible person to be in close physical contact at a short maximum distance [16]- [18]. With our wireless sensor system, we can find student contacts within such distance, and construct a disease propagation graph to model the infectious disease propagation. Then, targeted vaccination becomes a problem of selecting important nodes in a graph to contain infectious disease.
Based on the disease propagation graph, node centrality can be used to measure a node's importance during disease propagation. Although there are some existing centrality measures [19] such as degree centrality, betweenness centrality and closeness centrality, they all have disadvantages when applied to disease containment. For example, degree centrality only considers the connection between a node and its neighbors, and thus is limited by its local effect. Betweenness centrality measures the global effect, but it does not describe the difference between a node's influence on its neighbors and that on nodes far away. Closeness centrality does not work when the graph is disconnected. In disease propagation, an infected node will infect closer nodes with much higher probability than those far away. Thus, we propose a new metric called connectivity centrality, which takes into account a node's influence on all others and considers closer nodes more important. Based on the proposed centrality measure, we design a centrality based algorithm for targeted vaccination. In infectious disease containment, not all nodes are eligible for vaccination. For example, vaccinating a node which has already been infected will not be effective. Some of these infected nodes can be detected during vaccination. With this information, we enhance the centrality based algorithm by considering both a node's infecting capability and its possibility of being infected. We evaluate the centrality based algorithm and the enhanced algorithm, and compare them with other schemes. The trace driven simulation results show that our algorithms can significantly reduce the infection rate. Although our algorithms are illustrated based on the contact trace collected from a high school, they are trace-independent and can work in other networks.
The rest of this paper is organized as follows. Section II reviews related work and Section III describes our trace collection. We propose the centrality based algorithms and the enhanced algorithm in Section IV and evaluate their performance in Section V. Section VI concludes the paper. A preliminary version of this work has been published in [20].

II. RELATED WORK
A rich body of work has focused on infectious disease containment. Various disease propagation patterns have been studied in [21] and [22]. Based on the disease propagation model, Prakash et al. [21] derived the epidemic threshold for a given network under which an epidemic will not happen and above which an epidemic will happen. Moreover, they have designed a greedy strategy, which vaccinates the node that causes the largest drop in the eigenvalue of the system matrix. Cohen et al. [22] proposed a mathematical model and an immunization policy based on a small fraction of random acquaintances, and analytically studied the critical threshold for complete immunization. However, these techniques mainly focus on how to avoid the spreading of infectious disease becoming an epidemic, without considering how to decrease the number of infected individuals in a community.
Using the SIR (Susceptible-Infected-Removed) epidemiological model, Madar et al. [23] studied the epidemic spreading behavior in scale-free networks and proposed different immunization strategies. In [24], Hayashi et al. investigated the spread of viruses in growing scale-free networks with new users joining, and compared the performance of targeted vaccination and random vaccination under such network models. However, these works assume that the graph is scale-free and that the connections between nodes follow a power-law distribution.
By analyzing a real cellular network trace, Zhu et al. [8] constructed a graph to describe the social relationships between mobile phones, and proposed two algorithms (balanced partitioning and cluster partitioning) to contain mobile worms at the early stage. Nguyen et al. [11] utilized community structures to contain viruses in online social networks. They presented community detection algorithms to find the overlapping communities and patched the nodes in the overlapped areas to prevent worms spreading from one community to another. With the community structure, Lu et al. [12] calculated the intra-centrality (within community) and inter-centrality (between community) and combined them together to select nodes for vaccination. The Facebook trace in New Orleans regional network was used in [11] and [12]. However, these works mainly focus on the worms (viruses) in cellular networks or online social networks, which have different propagation patterns from infectious diseases. In addition, they implicitly assume that any node in the network can be selected as vaccinated node, while in disease containment, this is not true (e.g., we cannot vaccinate a node that has already been infected).
This paper extends the preliminary version of our algorithm that appeared in [20]. In [20], we proposed connectivity centrality and designed an algorithm based on the proposed centrality. In this paper, we enhance the centrality based algorithm by exploiting information about the known infected nodes, which can be detected during targeted vaccination. Given a set of known infected nodes, we measure a node's infecting capability and its possibility of being infected, and consider these two factors to select nodes to be vaccinated.

III. TRACE COLLECTION
Most infectious diseases spread among people through virus, which is transmitted by airborne infectious particles or small respiratory droplets when two people contact within a certain distance [16]- [18]. Besides, the activity of many infectious viruses (e.g., influenza virus) varies in indoor and outdoor environment because of the different ambient airflow patterns [25], [26]. Therefore, collecting the contacts among people within the disease transmission distance and indicating whether a contact happened indoor or outdoor are important for modeling disease propagation and designing disease containment algorithms. However, most of the existing traces do not consider these two factors, and thus we deploy a wireless sensor system in a high school and collect our own traces.

A. SYSTEM OVERVIEW
Due to the frequent and close contacts among students every day, schools are regarded as playing a major role in the spread of infectious diseases into the community [27], [28]. Therefore, we deployed our trace collection system in a high school which has about 800 students. The Crossbow TelosB mote, which has a low-power microcontroller, an IEEE 802.15.4 radio and extended memory, is used to collect student contacts. Since the wireless signal strength degrades as the communication distance increases, we can measure the wireless signal strength and then infer when and where students meet with each other. In the wireless sensor system, we have two types of motes: mobile motes and stationary motes. Mobile motes are carried by students to collect their contacts. As shown in Figure 2, each mobile mote is placed in a pouch attached to a lanyard and worn around the student's neck. During a school day, the mobile motes are carried by students and each of them is labeled with a unique ID. Each mobile mote broadcasts a beacon every 20 seconds and keeps listening to the wireless channel to record beacons from other motes. The beacon includes the mote type, the mote ID, and a local sequence number, which is initialized to 0 and increased by one after each beacon broadcast. Stationary motes are deployed at fixed places (e.g., classrooms, dining halls and restrooms) to indicate the contact locations. Each stationary mote is also assigned a unique ID and broadcasts beacons with its mote type, ID and sequence number at an interval of 20 seconds. Its sequence number starts at 0 when the mote is powered on and is increased by one after each broadcast. During trace collection, all the motes keep broadcasting beacons periodically and only mobile motes record beacons from others. Beacons from other mobile motes are recorded as contact information and beacons from stationary motes are recorded to infer whether the contacts happen indoor or outdoor.
The wireless sensor system was deployed during a flu season in 2012. On each school day, the mobile motes were distributed to students around 7 am and collected around 4 pm. In order not to disturb students' activities, the stationary motes were deployed at night before the trace collection and their starting times were recorded manually. The experiment was conducted across two weeks in March 2012. On average, 3.4 million contacts were collected between mobile motes each day.

B. DESIGN CONSIDERATIONS

1) DISEASE TRANSMISSION DISTANCE
According to [17], [18], and [29], the airborne droplets can only transmit from one person to another when their contact distance is less than 9 feet. Thus, 9 feet is a critical distance for disease propagation and we only need to collect contacts within this distance. By using received signal strength indicator (RSSI), which reflects the distance between the sending and receiving nodes, we can determine if a contact happens within a specific range by checking if the corresponding RSSI is above certain threshold for a given transmission power.
Different from many existing sensor network applications [30]- [33], where wireless nodes are supposed to communicate with each other with the highest transmission power to achieve higher data delivery rate and reach larger coverage area, we choose a lower transmission power in our system to save energy. Since a TelosB mote only has two AA batteries as its power supply, if it keeps working at the highest transmission power level, its batteries will die very quickly. According to our preliminary experimental results, the transmission power of −16.9 dBm (power level 6 for TelosB mote) is strong enough to ensure a high data delivery rate within a distance of 9 feet. Under this transmission power, the RSSI of the packet received within 9 feet is larger than −80 dBm. Therefore, in our implementation, the transmission power of the mobile mote is set to −16.9 dBm and a beacon from the mobile mote is recorded only if its RSSI is larger than −80 dBm.
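The recording rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the tuple layout `(sender_id, seq, rssi_dbm)` are hypothetical and are not taken from the mote firmware; only the −80 dBm threshold comes from the text.

```python
# Threshold from the text: at -16.9 dBm transmission power, beacons received
# within 9 feet have an RSSI larger than -80 dBm.
RSSI_THRESHOLD_DBM = -80.0


def is_close_contact(rssi_dbm, threshold_dbm=RSSI_THRESHOLD_DBM):
    """Return True if a received beacon is strong enough to be recorded."""
    return rssi_dbm > threshold_dbm


def filter_contacts(beacons):
    """Keep only beacons whose RSSI exceeds the threshold.

    `beacons` is an iterable of (sender_id, seq, rssi_dbm) tuples; this
    layout is an assumption made for the example.
    """
    return [b for b in beacons if is_close_contact(b[2])]
```

For instance, a beacon received at −75 dBm would be kept, while one received at exactly −80 dBm would be dropped because the rule requires the RSSI to be strictly larger than the threshold.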

2) INDOOR OR OUTDOOR
Since the infecting capability of infectious disease varies in indoor and outdoor environment due to the different ambient airflow patterns [25], [26], stationary motes are deployed to provide location information for inferring where a contact happens. In our system, stationary motes periodically broadcast beacons with transmission power of −11 dBm (power level 10 for TelosB mote) and they are carefully deployed to cover the entire buildings in the school. Thus, if a mote is indoor at some time, it will receive beacons from at least one stationary mote at that time. Further, if a beacon is received from a mobile mote and at the same time both the sender and receiver have recorded beacons from some stationary motes, we can infer that this contact happens indoor; otherwise, it happens outdoor. Therefore, we can discern whether a contact happens indoor or outdoor by checking beacons received from the stationary motes.
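The inference rule above can be illustrated with a small sketch. It assumes that logs from all motes have been merged offline and that "at the same time" means the same 20-second time slot; the `stationary_log` structure and the function name are hypothetical.

```python
def classify_contact(t, sender, receiver, stationary_log):
    """Label a mobile-to-mobile contact at time slot t as indoor or outdoor.

    stationary_log maps a mobile mote ID to the set of time slots in which
    that mote recorded at least one beacon from a stationary mote
    (an assumed post-processed representation of the trace).
    """
    sender_indoor = t in stationary_log.get(sender, set())
    receiver_indoor = t in stationary_log.get(receiver, set())
    # The contact is indoor only if BOTH endpoints heard a stationary mote.
    return "indoor" if sender_indoor and receiver_indoor else "outdoor"
```

A mote that never heard a stationary beacon simply has no entry in the log, so every contact involving it is classified as outdoor.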

IV. TARGETED VACCINATION
In this section, we first construct a graph to model disease propagation based on the collected traces. Then, we propose centrality based algorithm for disease containment, and further enhance the solution by exploiting the knowledge of the infected nodes which have been detected during vaccination.

A. DISEASE PROPAGATION GRAPH
If there is a contact between two students, there is some probability for the infectious disease to be transmitted between them. Therefore, we can construct a graph (called the disease propagation graph) to model disease propagation based on the collected human contacts. The disease propagation graph is represented by G = (V, E), where V is the set of vertices and E is the set of edges. In graph G, each node u ∈ V represents a participant and an edge e = (u, v) ∈ E exists when there is a contact between u and v. Since the infectious disease is transmitted bidirectionally, G is an undirected graph.
In the disease propagation graph, we assign each edge (u, v) a weight w(u, v) to describe the disease propagation probability between these two nodes. Two factors should be considered when assigning the edge weight: contact frequency and contact location. For two nodes that contact each other frequently (i.e., they spend a lot of time together), if one node gets an infectious disease, the other one is very likely to be infected. Thus, the more frequently two nodes encounter each other, the larger the weight that should be assigned to the corresponding edge. Another factor that affects the probability of infection is contact location. According to [25] and [26], an infectious disease such as influenza is more likely to spread quickly in an indoor environment than in an outdoor environment. Thus, contacts that happen indoor should be assigned more weight than contacts that happen outdoor.
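Considering both contact frequency and contact location, the edge weight w(u, v) is calculated as:
\[ w(u, v) = \frac{1}{T} \sum_{t=1}^{T} r(u, v, t) \, \eta(u, v, t), \]
where
\[ r(u, v, t) = \begin{cases} 1 & \text{if there is a contact between } u \text{ and } v \text{ at time } t; \\ 0 & \text{otherwise,} \end{cases} \qquad \eta(u, v, t) = \begin{cases} 1 & \text{if the contact between } u \text{ and } v \text{ at time } t \text{ happens indoor;} \\ \eta_0 & \text{otherwise,} \end{cases} \]
and T is the time period of the trace used for constructing the graph.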
In our trace collection system, each mote (either mobile mote or stationary mote) periodically broadcasts a beacon whose local sequence number is initialized to 0 and increased by one after each broadcast. Since there are many stationary motes whose starting times are manually recorded, beacons received from these motes can be used to synchronize local sequence numbers in the beacons received from mobile motes. Therefore, we use the synchronized global sequence number to represent time t. r(u, v, t) is set to 1 when u receives a beacon from v at t or v receives a beacon from u at t. η(u, v, t) is set to $\eta_0$ if the contact happens outdoor. Since infectious disease is relatively inactive in an outdoor environment, $0 < \eta_0 < 1$ and its value depends on the characteristics of the specific disease.
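As a rough illustration of the weight assignment, the sketch below assumes that the weight is the sum of r(u, v, t) · η(u, v, t) over the trace divided by T, and that the trace has already been reduced to records of the form `(u, v, t, indoor)`. The record layout, the function name and the default value of `eta0` (0.5, the value used later in the simulation) are assumptions of this example.

```python
from collections import defaultdict


def edge_weights(contacts, T, eta0=0.5):
    """Compute w(u, v) for every pair that has at least one contact.

    contacts: iterable of (u, v, t, indoor) records, where t is the
              synchronized global sequence number and indoor is a bool.
    T:        number of time units in the trace period.
    """
    # r(u, v, t) is 1 whether u heard v, v heard u, or both, so each
    # unordered pair is counted at most once per time unit.
    seen = {}
    for u, v, t, indoor in contacts:
        key = (min(u, v), max(u, v), t)
        seen.setdefault(key, indoor)

    totals = defaultdict(float)
    for (u, v, _t), indoor in seen.items():
        totals[(u, v)] += 1.0 if indoor else eta0  # eta(u, v, t)
    return {edge: total / T for edge, total in totals.items()}
```

With one indoor contact and one outdoor contact between two nodes over T = 4 time units, the weight would be (1 + 0.5) / 4 = 0.375.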

B. CENTRALITY BASED TARGETED VACCINATION
The disease propagation graph shows how each node contacts with others and how disease propagates among them. In the graph, each node has different influence on others and thus plays a different role during disease propagation. Since the importance of each node on disease propagation can be measured by centrality, we propose centrality based algorithm for targeted vaccination.
In literature, there are some well known centrality metrics [19] such as degree, betweenness and closeness centrality.
Degree centrality measures how well a node is connected with its neighbors and it is defined as:
\[ C_{deg}(u) = \sum_{v \in N(u)} w(u, v), \]
where N(u) is the set of u's neighboring nodes.
Betweenness centrality measures to what extent a node can connect two other nodes through a shortest path and it is defined as:
\[ C_{bet}(u) = \sum_{s \neq u \neq t} \frac{\sigma_{st}(u)}{\sigma_{st}}, \]
where $\sigma_{st}$ is the total number of shortest paths from node s to t and $\sigma_{st}(u)$ is the total number of shortest paths from node s to t that go through node u.
Closeness centrality measures how close a node is to others and it is defined as:
\[ C_{clo}(u) = \frac{|V| - 1}{\sum_{v \in V \setminus \{u\}} d(u, v)}, \]
where |V| is the cardinality of V and d(u, v) is the shortest path distance between u and v.
Although these centralities can be used to measure a node's importance in a graph, they are not suitable for describing a node's influence on others during disease propagation. For example, Figure 3 shows an example of using different centralities to remove one node to contain the disease. Both betweenness and closeness centralities are distance based, so the weight between any two neighboring nodes should represent their distance. However, in the disease propagation graph, the edge weight is assigned based on the disease propagation probability. The larger the edge weight is, the closer the nodes are and the smaller their distance is. Therefore, when calculating distance based centralities, 1/w(u, v) is used as the distance between two neighboring nodes u and v. As shown in Figure 3, both nodes i and j have the highest degree centrality; node d has the highest betweenness centrality; all the nodes have the same closeness centrality of 0. However, none of these centralities returns the optimal vaccinated node e, whose removal will not only separate the graph into different parts, but also remove edges with large edge weights.
This is because degree centrality only considers the connection between a node and its neighbors, and thus is limited by its local effect; betweenness centrality considers the global effect, but it does not describe the difference between a node's influence on its neighbors and that on nodes far away; by considering the distance between two nodes, closeness centrality treats a node's influence on others differently, but its value is dominated by the paths with longer distances since all the distances are simply added together, and it does not work in a disconnected graph.
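The effect of a disconnected graph on closeness centrality can be seen in a short sketch. It assumes closeness is (|V| − 1) divided by the sum of shortest-path distances, with 1/w(u, v) as the edge length as described above; the adjacency-dictionary layout and function names are choices made for this example only.

```python
import heapq
import math


def shortest_distances(graph, source):
    """Dijkstra over edge lengths 1 / w, where graph[u][v] = w(u, v) > 0."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + 1.0 / w  # larger weight means shorter distance
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closeness(graph, u):
    """Closeness under the assumed definition (|V| - 1) / sum of distances."""
    dist = shortest_distances(graph, u)
    total = sum(d for v, d in dist.items() if v != u)
    if total == 0:
        return 0.0  # single-node graph
    return (len(graph) - 1) / total  # an unreachable node makes this 0.0
```

In a graph with even one unreachable node the distance sum is infinite, so every node gets closeness 0, which matches the observation that closeness cannot rank nodes in a disconnected graph.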
In the disease propagation graph, an infectious disease is more likely to be transmitted to closer nodes than to nodes further away. Thus, we propose a new centrality metric called connectivity centrality, C_con(u), to measure how contagious an infected node u is to others. Its definition uses a pairwise term between u and each other node v, which contributes only if there is a path from u to v and is 0 otherwise, and in which h(u, v) denotes the number of hops between u and v along the shortest path. Connectivity centrality takes into account a node's effect on all others and considers closer nodes more important, and thus it is better than other centrality metrics for measuring a node's influence on disease propagation. For example, in Figure 3, the optimal vaccinated node e is the node with the highest connectivity centrality.
Based on node centrality, we can propose a straightforward algorithm. To find k vaccinated nodes, we sort all the nodes based on their centrality values and choose the top k nodes. However, in disease containment, not all nodes are eligible for vaccination. For example, vaccinating a node which has already been infected will not be effective. In addition, some nodes may refuse to get vaccinated because of their concerns on potential side effects [34]. Therefore, in the centrality based algorithm, after sorting, we select the first k nodes which are eligible for vaccination as the targeted nodes.
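The selection step can be sketched as follows. The function name and the representation of eligibility as a set are assumptions of this example; any centrality metric can supply the values.

```python
def select_top_k(centrality, eligible, k):
    """Return the first k eligible nodes in decreasing order of centrality.

    centrality: dict mapping node -> centrality value
    eligible:   set of nodes that can be vaccinated (not known to be
                infected and willing to be vaccinated)
    """
    ranked = sorted(centrality, key=lambda node: centrality[node], reverse=True)
    return [node for node in ranked if node in eligible][:k]
```

If fewer than k nodes are eligible, the sketch simply returns all of them.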

C. ENHANCED TARGETED VACCINATION
The centrality based algorithm selects the nodes with the highest influence to be vaccinated, because once these nodes are infected, they are able to infect more nodes due to their close connections. However, it only considers the infecting capability of a node, without considering how likely the node is to be infected. Both factors should be taken into account for vaccination. For example, as shown in Figure 4, each edge has the same weight and a is known to be an infected node. By using the centrality based algorithm, both d and e can be selected as candidate nodes due to their high centrality. However, since a has already been infected, d is a better choice because it is more likely to be infected soon, and vaccinating d will potentially protect more nodes from being infected. In order to describe both a node's infecting capability and its possibility of being infected, we propose an infecting score and an infected score by exploiting the information about the infected nodes which can be detected during vaccination, and combine these two scores to determine which nodes should be vaccinated.

1) INFECTING SCORE
When centrality is used to measure a node's influence on others, it implicitly assumes that all the other nodes are uninfected. However, if there are some known infected nodes in the graph, the calculated centrality may not accurately measure a node's importance. For example, as shown in Figure 5, the black nodes are infected nodes. With the centrality based algorithm, node e should be chosen to be vaccinated first since e has the highest centrality no matter which centrality metric is used. However, considering that nodes d, f, h and j have already been infected, node b is a better choice since its removal will separate nodes a and c from the infected nodes, while e's removal will still leave infected nodes in all partitions. Let I denote the set of known infected nodes; e.g., I = {d, f, h, j} in Figure 5. To measure node b's influence under the infected node set I, nodes in I should not be considered since these nodes have already been infected and b has no influence on them. Also, b has no influence on the nodes that are closer to an infected node than to b. For example, even if node b is infected, it will not affect node i's infection status, which only depends on the connection between i and f. As a result, node b's influence should be different with the knowledge of the infected nodes (f in this example). Thus, we define the infecting score to measure the infecting capability of a node under infected node set I. As illustrated in Section IV-B, connectivity centrality describes the local and global effects of a node, which is better than other centralities in measuring its importance during disease propagation. Node u's infecting score under infected node set I, denoted ϕ(u, I), is therefore defined on top of connectivity centrality. By this definition, ϕ(u, I) = C_con(u) when I = ∅, i.e., connectivity centrality is a special case of the infecting score when no node is infected.

2) INFECTED SCORE
To measure the importance of a node more accurately, besides the infecting score, an infected score is introduced to measure the possibility for the node to be infected. It is calculated by Equation 1, which has a separate case for u ∉ I.
Figure 6 illustrates how to apply Equation 1 to a simple graph, which contains three nodes a, b and c. Suppose I = {a}; i.e., node a has been detected as an infected node. Applying Equation 1 yields a set of linear equations (Equation 2). In the disease propagation graph, for two nodes u and v, we have w(u, v) = w(v, u). Thus, Equation 2 can be easily solved.
FIGURE 6. Infected score calculation in a simple graph.
For a disease propagation graph G = (V, E) with edge weight w(u, v), we can calculate the infected score for each node u ∈ V by applying Equation 1. In this way, a |V| × |V| linear equation system can be generated.
It can be easily transformed into a linear equation system in matrix form (Equation 3), where $I_{|I| \times |I|}$ is an identity matrix and $O_{|I| \times |V \setminus I|}$ is a zero matrix.

Theorem 1: The system of linear equations given in Equation 3 has a unique solution.
The proof of Theorem 1 can be found in Appendix A. As long as a linear equation system shown in Equation 3 can be obtained, some well known methods such as Gaussian Elimination, Cramer's Rule, etc., can be used to solve it and we can get the infected score of each node under a certain infected node set I .

3) COMBINED SCORE
The infecting score measures how strongly a node infects others once it is infected, while the infected score evaluates how likely this node is to be infected given the current knowledge of the infected set I. Both factors should be taken into account when selecting the vaccinated nodes. Therefore, we combine them to obtain node u's combined score.
At each round, the node with the highest combined score is selected as the candidate node. If this node is eligible for vaccination, it is removed from the graph and the combined score is recalculated based on the updated graph; if it has already been infected, it is added into set I and the node with the highest combined score based on the updated I is chosen as the next candidate. This process is repeated until k nodes are selected for vaccination. Compared with the adaptive algorithm in [20], our enhanced algorithm exploits the information about known infected nodes during vaccination and combines a node's infecting score and infected score to evaluate its influence in disease propagation. The pseudo code of the enhanced algorithm is shown in Algorithm 1.
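The round-based selection described above can be sketched in Python. This is not Algorithm 1 itself: the combined score is passed in as a hypothetical callable `combined_score(graph, node, infected)`, and `is_infected(node)` stands in for whatever check reveals an infection when a candidate is examined. Both names are assumptions of this example.

```python
def enhanced_selection(graph, known_infected, k, combined_score, is_infected):
    """Select up to k nodes to vaccinate, updating I as infections are found.

    graph:          dict node -> dict neighbor -> weight (undirected)
    known_infected: initial set I of known infected nodes
    combined_score: hypothetical callable (graph, node, infected) -> float
    is_infected:    hypothetical callable node -> bool
    """
    g = {u: dict(nbrs) for u, nbrs in graph.items()}  # work on a copy
    infected = set(known_infected)
    vaccinated = []
    while len(vaccinated) < k:
        candidates = [u for u in g if u not in infected]
        if not candidates:
            break
        # Scores are recomputed every round on the current graph and I.
        best = max(candidates, key=lambda u: combined_score(g, u, infected))
        if is_infected(best):
            infected.add(best)  # newly detected infection: update I
        else:
            vaccinated.append(best)
            for v in g.pop(best):  # remove the node and its edges
                g[v].pop(best, None)
    return vaccinated, infected
```

Passing the score as a callable keeps the loop independent of how the infecting and infected scores are combined, so the same skeleton could be reused with plain centrality for comparison.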

V. PERFORMANCE EVALUATIONS
In this section, we evaluate the performance of our centrality based algorithm and enhanced algorithm.

A. SIMULATION SETUP
The performance of our algorithms is evaluated based on the trace collected in the high school. The trace is divided into two halves based on the time when it was collected. We first use one half of the trace as the training data to build the disease propagation graph, and use the other half for performance evaluation. Then we exchange the two halves and run the training and testing again for cross validation.
At the very beginning, we randomly choose a small group of nodes (1%) as the seed set of infection sources to initiate the infection process. The trace is executed in time units. At each time unit (20 seconds), the SIR model [35] is used to simulate the infection process. In SIR, each node is in one of three states: S (Susceptible), I (Infected) or R (Recovered). A node which is initially in state S will be infected by contacting an infected node with probability β (called the transmission probability) indoor and $\eta_0 \beta$ ($\eta_0$ is set to 0.5) outdoor. Once the node is infected, it moves into state I. An infected node may recover with probability δ (δ is set to 0.0003) at each time unit and move to state R. Nodes in state R will not get infected again since they have acquired immunity.
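One time unit of this simulation can be sketched as below. The function name, the `(u, v, indoor)` contact layout, and the synchronous update (a node infected in this time unit cannot infect others until the next one) are assumptions of this example; the parameter defaults are the values given in the text.

```python
import random


def sir_step(state, contacts, beta, eta0=0.5, delta=0.0003, rng=random):
    """Advance the SIR process by one 20-second time unit.

    state:    dict node -> "S", "I" or "R"
    contacts: iterable of (u, v, indoor) contacts in this time unit
    beta:     indoor transmission probability; outdoor uses eta0 * beta
    """
    new_state = dict(state)
    for u, v, indoor in contacts:
        p = beta if indoor else eta0 * beta
        for src, dst in ((u, v), (v, u)):  # transmission is bidirectional
            if state[src] == "I" and state[dst] == "S" and rng.random() < p:
                new_state[dst] = "I"
    for node, s in state.items():
        if s == "I" and rng.random() < delta:
            new_state[node] = "R"  # recovered nodes stay immune
    return new_state
```

Vaccinated nodes could be modeled in this sketch by placing them in state R before the simulation continues, which is an assumption rather than something stated in the text.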
Although some vaccination strategies have been proposed for certain diseases, e.g., ring vaccination for smallpox and targeted mop-up campaigns for polio, these strategies are based on the knowledge of infected nodes, which is not known in many scenarios. Therefore, instead of comparing with these strategies, we compare our centrality based algorithms and enhanced algorithm (Enhanced) with the community based scheme (AFOCS) [11] and the cluster based scheme (Cluster) [8]. Degree centrality, betweenness centrality and connectivity centrality are used to implement centrality based algorithm (denoted as Degree, Betweenness and Connectivity respectively). Closeness centrality is not used here since it does not work when the graph is disconnected.
Vaccinating threshold α controls when the targeted vaccination starts. It is measured as the percentage of infected nodes in the network, and represents the delay between the time the infectious disease starts propagating and the time a vaccine is generated. Once the percentage of infected nodes reaches the threshold α, we start to distribute vaccines to the selected nodes.

B. COMPARISONS OF INFECTION RATES
Figure 7 shows how the infection rate changes as the percentage of vaccinated nodes increases, with α = 2.5% and 10%, respectively. As shown in the figure, no matter which scheme is used, the number of infected nodes decreases as more vaccines are distributed. Enhanced achieves better performance than the other schemes under both values of α. When α = 2.5% and 20% of nodes are vaccinated, the infection rate of Enhanced is about 45%, while the infection rates of the other schemes are higher than 50%. For the centrality based algorithms, under different α, Degree performs better than Betweenness since a disease is transmitted more easily from infected nodes to their neighbors than to nodes far away. Connectivity performs better than both Betweenness and Degree, verifying that connectivity centrality is a better measure of a node's importance for disease propagation.
Comparing Figure 7a with Figure 7b, we can see that AFOCS and Cluster perform worse than Connectivity when α = 2.5%, but better when α = 10%. The reason is as follows. If more nodes are infected before vaccination (i.e., α is larger), these infected nodes are more likely to be clustered together around the infected nodes. Since AFOCS and Cluster contain the disease by isolating infected communities or clusters, they can perform better when α is larger. However, their infection rate is still much higher than Enhanced.
C. INFECTION RATE VS. TIME
Figure 8 shows how the infection rate changes over time with α = 2.5% and 10%, respectively. The spread of the disease can be divided into three phases. At the beginning, the disease spreads slowly from the infection sources. Then, it propagates widely and the infection rate increases quickly. Finally, no more nodes get infected and the infection rate stabilizes. Compared with the other schemes, Enhanced performs better, as its infection rate increases more slowly and is bounded at a much lower level.
Figure 9 shows how the disease transmission probability β affects the spread of the disease. As can be seen, Enhanced outperforms the other schemes under different β. For the centrality based algorithms, Connectivity achieves better performance than Degree and Betweenness. Comparing the centrality based algorithms with Cluster, both Connectivity and Degree perform better than Cluster when β = 0.002, but Cluster performs better than Connectivity and Degree when β = 0.004. The reason is as follows. Generally speaking, if the infected nodes are uniformly distributed, centrality based algorithms perform better; if the infected nodes are clustered together, Cluster performs better. With a lower β, nodes are infected more randomly, and their distribution looks more uniform. With a higher β, nodes with close connections are infected more easily, and thus the infected nodes are more likely to be clustered together.

E. EFFECT OF NODE WILLINGNESS
Because of concerns about potential side effects, not all nodes are willing to be vaccinated even when vaccination is strongly recommended. Thus, we assume that each node is willing to be vaccinated with the same probability (called willingness). Compared with the other schemes, Enhanced is much more robust when node willingness varies. This is because the vaccinated nodes are adaptively chosen at each round in Enhanced. Even if a node is not willing to be vaccinated, its influence on the disease propagation is considered when selecting the next nodes to vaccinate. However, in the other schemes, the vaccinated nodes are calculated beforehand and node willingness is not considered. For AFOCS and Cluster, if certain bridge nodes (the nodes that connect different communities or clusters) are unwilling to be vaccinated, the goal of isolating the infected communities or clusters may not be achieved.
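The effect of willingness on a scheme that fixes its targets beforehand can be sketched as below. This is only an illustration under stated assumptions: the function name `vaccinate_precomputed`, the `budget` parameter, and the rule that a refusing node is not replaced by another node are hypothetical choices for the example, and the adaptive selection used by Enhanced is not reproduced here.

```python
import random

def vaccinate_precomputed(ranked_nodes, budget, willingness, rng=random):
    """Offer vaccines to the first `budget` nodes of a precomputed ranking.

    Each targeted node accepts independently with probability `willingness`.
    Assumption: refusals are not replaced, mirroring schemes that compute
    their targets beforehand and ignore willingness.
    """
    targets = ranked_nodes[:budget]
    return [n for n in targets if rng.random() < willingness]
```

Under this assumption, with willingness 0.5 roughly half of the precomputed targets end up vaccinated, and any bridge node among the refusals stays unprotected.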

F. PERFORMANCE IN SCALE-FREE NETWORKS
In some social networks, the number of neighbors of a node follows a power-law distribution and the networks are scale-free. In order to evaluate the performance of our algorithms in these networks, we generate a synthetic scale-free graph using the Barabási-Albert model [36]. Then we generate contacts over 1400 time steps based on the topology of this graph. At each time step, a node generates a contact with each neighbor with probability p, where p is set to 0.3 in our simulation. With the synthetic contact trace, we can initiate the infection process and compare our enhanced algorithm with the other algorithms. As shown in Figure 11, even in the scale-free network, Enhanced achieves better performance than the other schemes. Compared with AFOCS and Cluster, the centrality based algorithms perform better because they vaccinate the nodes with more neighbors first, and these nodes are more likely to be infected in scale-free networks.
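The synthetic trace generation described above can be sketched in plain Python as follows. The function names, the number of nodes `n`, the attachment parameter `m`, and the simplification of drawing one Bernoulli trial per edge per time step (rather than one per endpoint) are assumptions for illustration; only the 1400 time steps and p = 0.3 come from the text.

```python
import random

def barabasi_albert_edges(n, m, rng):
    """Grow a graph by preferential attachment (requires n > m >= 1).

    Each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    edges = []
    targets = list(range(m))  # the first new node links to the m initial nodes
    repeated = []             # each node appears once per incident edge
    for new in range(m, n):
        edges.extend((new, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # degree-proportional sampling without repeats
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges

def generate_contacts(edges, steps=1400, p=0.3, rng=random):
    """Yield (t, u, v) whenever neighbors u and v are in contact at step t."""
    for t in range(steps):
        for u, v in edges:
            if rng.random() < p:
                yield (t, u, v)
```

For example, `list(generate_contacts(barabasi_albert_edges(200, 2, random.Random(0))))` produces a contact trace over a 200-node graph that can then be fed to an infection simulation.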

VI. CONCLUSION
In this paper, we deployed a wireless sensor system in a high school to collect the contacts that occur between students when they are within the disease transmission distance. Based on the collected traces, we constructed a disease propagation graph to model the disease propagation, and proposed a new metric called connectivity centrality to find the important nodes in the constructed graph for disease containment. Different from centrality measures such as degree, betweenness or closeness centrality, connectivity centrality considers both a node's local and global effect to measure its importance in disease propagation. Centrality based algorithms were presented and further enhanced by exploiting the information of the known infected nodes, which can be detected during vaccination. We evaluated our algorithms and compared them with other schemes based on real and synthetic traces. Simulation results show that our algorithms can contain infectious diseases effectively and outperform other schemes under various conditions.

Proof of Theorem 1:
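If we replace $w(u_{q_i}, u_{q_j})$ with $a_{ij}$ for simplicity, Equation 3 can be rewritten as

$$A_1 X_1 = b_1 \qquad (4)$$

$$A_2 X_2 = b_2 \qquad (5)$$

where $A_1 = I_{|I| \times |I|}$, $b_1$ and $b_2$ are the constant vectors of the two subsystems, and

$$A_2 = \begin{bmatrix} 1 & -a_{21} & -a_{31} & \cdots & -a_{|V \setminus I|1} \\ -a_{12} & 1 & -a_{32} & \cdots & -a_{|V \setminus I|2} \\ -a_{13} & -a_{23} & 1 & \cdots & -a_{|V \setminus I|3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -a_{1|V \setminus I|} & -a_{2|V \setminus I|} & -a_{3|V \setminus I|} & \cdots & 1 \end{bmatrix}.$$

Thus, to prove that the system of linear equations given in Equation 3 has a unique solution, we only need to prove that Equation 5 has a unique solution. Let $z = [z_1, z_2, \ldots, z_{|V \setminus I|}]$ denote a non-zero column vector of $|V \setminus I|$ real numbers and let $z^T$ denote the transpose of $z$. Then we have

$$z^T A_2 z = z_1 (z_1 - a_{21} z_2 - a_{31} z_3 - \cdots - a_{|V \setminus I|1} z_{|V \setminus I|}) + z_2 (-a_{12} z_1 + z_2 - a_{32} z_3 - \cdots - a_{|V \setminus I|2} z_{|V \setminus I|}) + \cdots + z_{|V \setminus I|} (-a_{1|V \setminus I|} z_1 - a_{2|V \setminus I|} z_2 - \cdots + z_{|V \setminus I|}).$$

Considering that $a_{ij} = a_{ji}$, the above equation can be rewritten as

$$z^T A_2 z = \sum_{i=1}^{|V \setminus I|} z_i^2 - \sum_{i < j} 2 a_{ij} z_i z_j.$$

Since $(\sqrt{a_{ij}} z_i - \sqrt{a_{ij}} z_j)^2 \ge 0$, i.e., $2 a_{ij} z_i z_j \le a_{ij} z_i^2 + a_{ij} z_j^2$, we have

$$z^T A_2 z \ge \sum_{i=1}^{|V \setminus I|} z_i^2 - \sum_{i < j} \left( a_{ij} z_i^2 + a_{ij} z_j^2 \right) = \sum_{i=1}^{|V \setminus I|} \Big( 1 - \sum_{j=1, j \ne i}^{|V \setminus I|} a_{ij} \Big) z_i^2,$$

and equality holds only when $z_1 = z_2 = \cdots = z_{|V \setminus I|} = 0$.
Since $a_{ij}$ is used to denote $w(u_{q_i}, u_{q_j})$, we have $\sum_{j=1, j \ne i}^{|V \setminus I|} a_{ij} \le 1$, and equality cannot hold for every $i$. Thus, for any non-zero column vector $z$, we have $z^T A_2 z > 0$. Therefore, the symmetric matrix $A_2$ is positive definite and $X_2$ in Equation 5 can be uniquely solved. Since $X_1$ in Equation 4 can also be uniquely solved, we prove that the system of linear equations given in Equation 3 has a unique solution.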
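As a small sanity check of the argument, consider the hypothetical case of two uninfected nodes with a single symmetric weight $a = a_{12} = a_{21}$, where $0 \le a < 1$. The specific value $a = 0.5$ and the vector $z$ below are illustrative assumptions and are not taken from the trace.

```latex
A_2 = \begin{bmatrix} 1 & -a \\ -a & 1 \end{bmatrix}, \qquad
z^{T} A_2 z = z_1^2 - 2 a z_1 z_2 + z_2^2
\;\ge\; (1 - a)\,(z_1^2 + z_2^2) \;>\; 0
\quad \text{for } z \ne 0 .

% Numerical instance: a = 0.5, z = (1, -1)
z^{T} A_2 z = 1 - 2(0.5)(1)(-1) + 1 = 3
\;\ge\; (1 - 0.5)(1 + 1) = 1 .
```

In this two-node case the lower bound is strictly positive because the off-diagonal weight is below 1, which is the role played by the row-sum condition in the general proof.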
MARCEL SALATHÉ received the Ph.D. degree from ETH Zürich, Switzerland. He is currently an Associate Professor with the Schools of Life Sciences, and Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Switzerland. His field of expertise is in digital epidemiology, where he uses sensor data, social media data, and other novel digital data sources to prevent and mitigate the spread of diseases.
GUOHONG CAO (F'11) received the B.S. degree in computer science from Xi'an Jiaotong University and the Ph.D. degree in computer science from Ohio State University, in 1999. Since then, he has been with the Department of Computer Science and Engineering, The Pennsylvania State University, where he is currently a Professor. His research interests include wireless networks, mobile systems, security and privacy, vehicular networks, and cyber-physical systems. He was a recipient of the NSF CAREER Award in 2001. He has served on the Editorial Board of the IEEE Transactions on Mobile Computing, the IEEE Transactions on Wireless Communications, and the IEEE Transactions on Vehicular Technology, and has served on the organizing and technical program committees of many conferences, including the TPC Chair/Co-Chair of the IEEE SRDS'2009, MASS'2010, and INFOCOM'2013. VOLUME 4, 2016