Spatial and Temporal Correlations-Based Routing Algorithm in Intermittent Connectivity Human Social Network

The social network formed by people is one of the key applications of Delay-Tolerant Network (DTN). Owing to its intermittent connectivity and unique human mobility patterns, how to transmit data in an effective way is a challenging problem for the social network. In this paper, we propose the idea of Trip History Model (THM) which establishes a model on a single person's mobility, and then a Spatial and Temporal Correlations-Based Routing Algorithm (STC) is proposed. In STC, the node delivery probability is calculated according to both a node's current moving prediction and its history record to give guidance for message transmission. Our simulation results show that, compared with LABEL and PROPHET algorithms, STC effectively improves the routing performance of the network.


Introduction
The application of wireless network adapters in portable communication devices, such as mobile telephone, personal digital assistants, and laptops, can provide more chances to access global information resources. These portable devices can be connected to dynamic networks. By such networks, people can transfer information to each other. The movement of people and the limited communication range make the network unable to always guarantee its own connectivity. To make communication possible, DTNs (delay tolerant networks) [1] were proposed, which overcome the problems associated with intermittent connectivity, long or variable delay, asymmetric data rates, and high error rates by using store-and-forward message switching. PSN (pocket switched network) is a kind of DTNs, which is composed of mobile communication devices and has broad application in the future.
Because of the dynamic topology and the intermittent connectivity of PSN, people can only communicate by using store-and-forward message switching when they meet. Currently, many proposed PSN routing algorithms are based on historical records, and data is transmitted to the node which can constantly communicate with destination node. Since these algorithms fail to predict the trajectories of mobile nodes, they cannot guarantee the reasonableness of data forwarding. In addition, [2] shows that many people will move according to some certain rules. For instance, they like moving to specific sites periodically; in this case, regular space and time rules can help to guide routing in PSN.
Based on human movement discipline, this paper proposes Trip History Model (THM) to automatically predict the destination of people by studying their trip history recorded by portable communication devices. Meanwhile, according to the spatial and temporal characteristics of human movement, STC (Spatial and Temporal Correlations Routing Algorithm) is designed to forward the message to next node according to the prediction of the current node. Simulation results show that STC effectively improves the routing performance.

Related Work
In recent years, there has been extensive research on routing protocols in delay-tolerant networks. Vahdat and Becker [3] present epidemic routing protocol, which is the representative flooding routing protocol. It attempts to give all nodes a copy of every message through the communication 2 International Journal of Distributed Sensor Networks with them to achieve high delivery ratios. Flooding routing protocol requires no knowledge about the sensor network, but it also has some defects. For instance, since there exists huge amounts of copies of messages, a large proportion of network bandwidth and node buffer resources will be occupied. Meanwhile, more energy of the sensor nodes will have to be consumed to send and receive these copies messages. Based on the research on human society, some scholars proposed the community-based intermittent connectivity routing algorithm.
Literature [4] proposes a Replication-based Efficient Data Delivery (RED) scheme, which consists of two key components for data transmission and message management, respectively. Data transmission scheme decides when and where to transmit data messages based on the delivery probability, and message management scheme makes decisions on the optimal erasure coding parameters according to its current delivery probability. However, the calculation of erasure coding parameters is not a refined one, and a lot of data blocks split by RED will consume lots of energy of the sensor nodes.
Literature [5] presents a LABEL routing scheme, which assumes every node belongs to a community and marks nodes in the same community with the same label. When a node needs to send a message, it will only select the nodes within the communication range which belong to the destination node community as the next hop node. However, when the source node is far from the destination node community, the message cannot be transferred. For this reason, Hui and Crowcroft [5] propose a BUBBLE routing scheme, which takes the node activity into account. Each node has a global level and a local level; the higher its global level is, the more active the node will be and thus having the stronger ability to transfer messages.
In addition, Daly and Haahr [6] present a delaytolerant network routing technique (SimBet) based on the small-world social phenomenon, which comes from the observation that individuals are often linked by a short chain of acquaintances. The betweenness centrality and the social similarity value are jointly decide the Simbet value of each node. When two nodes meet in the network, the message will be transferred to the node with larger Simbet value in order to improve the delivery performance. Simulations using three real trace data sets demonstrate that the achieved delivery performance may be closed to that of Epidemic Routing but the overhead can be significantly reduced.

Analysis
In recent years, great progress has been made in the research of human complex social systems. Based on the tracking data of the wireless mobile network users, Hsu et al. [7] sum up two characteristics of people's movement, that is, preferring specific locations and going to the same location periodically. Based on the tracking data of thousands of anonymous mobile phone users, González et al. [8] propose people's activities have regular space and time rules. These research results prove the movement of people is nonrandom and it has regular space and time rules. In addition, a project named Mobidrive [9] recorded the movement of people in six weeks and found that human movement between 4 to 6 major locations may occupy more than 70% of all the movements. Therefore, the movement of people can be described as a spatial one between regular locations and thus the conclusion that the movement of many persons can be predicted.
The current human movement models fall into the following categories: firstly, the random movement model, including random waypoint, random walk, and so forth.
In the random movement model, node randomly select the destination, direction, and thinking times, and so forth. The movements of all nodes in the models mentioned before follow the same rule. These models have simple structure and can be clearly defined, but they cannot reflect the movement features of people in real world for its lack of authenticity. Second, modeling and analyzing the movement features of people are done based on the collection of the real trajectory. Chaintreau [10] considers intercontact time of portable wireless devices nearly obeys the power-law distribution rather than the original exponential distribution. For such models, the main difficulty lies not in the model itself, but in how to organize data and obtain the implied movement mode through a series of mathematical methods. Third, the movement model is established based on community. Literature [11] divides people into different groups based on the social relations between them. These groups are randomly placed in the simulation area and the nodes select their destination and move according to certain rules. Literature [12] introduces agenda-based routing protocol (ARP), which considers the relationship between human social activities and movement locations based on the survey data of US Transportation Department, but it does not consider the two characteristics of human movement, that is, spatial and temporal. The models above have their defects, and therefore we propose Trip History Model that takes into account the two characteristics of human movement to predict the human activities, which can be easily saved and processed.

Trip History
Model. This paper assumes that the sensor nodes are distributed in a two-dimensional M × M square region A. All the sensor nodes have the same transmission radius R. Network can be described as an undirected graph G = (V , L); V is the set of all nodes in the network, and L is the set of connections in the network. For any two nodes V i and V j , if the distance between them is less than R, there exists a connection between them and let neighbor (v i ) stand for the neighbor of V i .
In addition, the movements of all the sensor nodes follow the Trip History Model. The power of portable devices can be replenished in time, so we assume that the energy of sensor nodes is infinite.
We consider the human activities have certain rules and can be described as a series of data including time and location. All the records of time and location of a person's activities constitute his travel history. Since a person's activities include some repetition every day, we save a person's trip International Journal of Distributed Sensor Networks 3 in a day as a record cycle. Meanwhile, a day can be divided into two discrete sets, that is, Day (Mon., Tues., Wed., Thurs., Fri., Sat., Sun.) and Time (<8, 8-9, 9-11, 11-12, 12-13, 13-15, 15-17, 17-19, >19). For instance, the trip history of a person can be described as follows: (1) Mon. 7:40 move from home to workplace along a certain way. The forth trip record presents a stopover, and such behavior is true in real life. The trip history above can be described as in Table 1.
In addition, for getting easy computing and saving, we adopt some methods to change continuous location into discrete value. Home or Office, or Supermarket, and so forth is not a real GPS location, but a neighborhood GPS location set. People probably play and move near home and office, and the value of GPS may be some measurement errors, so we can consider the positions within a certain range (100 meters) as the same position.
When a person starts moving, the portable device records the current time and position until he reaches the destination. After a trip is finished, the portable device compares the trip with the shortest way according to the map. If the trip is the shortest way, the portable device will save the time, source, and destination and delete other trip history data. Otherwise, it records the time, source, destination, and the midpoints which the person passed by.

Decision Tree.
We use a decision tree to save Trip History Model and help make some decisions on transferring messages in order to improve the delivery performance. As Figure 1 shows, THM can be presented as the branches and leaves of the decision tree. P kq is the probability that q is destination and can be calculated as where f kq is the number of trips according with specific time, midway point, and destination choices at leaf node kq. Since category kq gives a final destination choice, P kq represents the probability of moving toward the destination in history.
Since the car knows current Source, Day, and Time values, it can find a Destination choice with a maximum P kq value in the tree as an initiate prediction. During the movement of people, the portable device will continue to check the current position and identify, if people are still on the way, the predicted midpoint, or destination. If the position disagrees with the predicted route, the new destination prediction will be calculated as ifq is infeasible,  where f k is the total number of the rest of the trips after removing the infeasible ones.

Spatial and Temporal Correlations Routing Algorithm (STC)
The trip history data is collected by the Trip History Model, including spatial and temporal information. To make use of the history data and to predict the movement of the people (nodes) in near future, we introduce spatial and temporal correlations routing algorithm (STC). Table. Assume each node n saves the encounter information including ID, encounter frequency, and predicted destination, as in Table 2. Assume the node V i 's predicted destination is changed, so V i will send a special message to all its neighbor nodes. This message includes the ID of the node V i and the new predicted destination. And the message disseminates as epidemiclike approaches, until all nodes in the network receive this message. When node V j receives this message, it will do the following.

The Encounter Information
(1) If V i 's information has be one included in V i , V j updates V i 's predicted destination.
(2) If no, V j adds V i 's information in its encounter information table, and sets the Encounter Frequency is zero.

Transmission Probability.
Let P io stand for the transmission probability of node V i transferring the message to node V o :  where θ is the angle between two tangents of the predicted destination of node V i to the transmission radius of predicted destination of node V o as Figure 2. According to trigonometric formulas, we can get where R is the transmission radius of node V o , and d is the distance between the predicted destination of node V i and the predicted destination of node V o .

Message Transmission.
Suppose node V i needs to send message to node V o . Firstly, V i would know V o 's predicted destination. If there is not the V o 's information in V i 's encounter information table, V i sends inquiring message to its neighbor nodes. If its neighbor nodes also have not the V o 's information, it will inquire to more nodes, until it gets the V o 's information. Second, V i gets its neighbor's predicted destination by shake hands; if its neighbor nodes' predicted destination is same as V o , it will send the message to these neighbor nodes, if not, V i will send the message to the neighbor node which has the higher transmission probability of V o than V i .

THM Performance.
First, the performance of trip history model is evaluated through the investigation of human trip. In order to get human real trip data, we conducted a 9-week investigation. Invite 5 campus staff members to participate in the survey, and these volunteers have smart handheld devices. In the survey, the drivers need to record the start time, start position, stop position, and midpoint position. The collected trip history data is shown in Table 3. We input these data into the electric map of Chengdu and establish trip history model for each volunteer. Finally, we divide the trip history in Table 3 into two parts. The trip history data in 1-8 weeks regarded as the training set is used to generate the decision tree, and the trip history data in the 9th week  considered as the target set is used to evaluate the predicted result. When the vehicles began to travel, completed a quarter of the trip, half of the trip, three quarters of the trip, and all the trip, the hitting rate compared the predicted destinations of the decision tree with the real destinations, and the results are shown in Figure 3. The average starting accuracy rate is 65.83%, and with the movement of vehicles, the average prediction accuracy rate increases gradually, from 77.56% to 82.77%, to 91.42%, and finally to 97.04%. Although individuals' daily activities are highly regular, the trip in 1-8 weeks cannot fully cover the schedule in the 9th week. Unpredicted circumstances cannot be avoided, so the final average prediction accuracy cannot reach 100%.

STC Performance.
To compare STC to other algorithms, we simulate STC, LABEL, and PROPHET algorithms. In PROPHET algorithm, each node calculates the transmission probability for different destination nodes and only transmits the message to the node which has larger transmission probability for the destination nodes.  consist of 20 "homes", 15 schools, 15 offices, and 10 shops or restaurants. These sites are randomly distributed in the simulation area. In addition, we define three types of nodes, namely, students, workers, and other personnel. For students, the first everyday activity is going to school as it is going to office for workers. Other personnel's first everyday activity is going to other sites except schools and offices (the first activity is randomly selected between 6:00 AM to 8:00 AM). All the people will go home and rest before 12:00 PM Assume that the network bandwidth is 8 Kbps. The message queue length of the nodes is 500, and the survival time of the message is 300 seconds. The simulation duration is 3 days, and the simulation results are as follows.  Figure 4 are different from those in Figure 5, and this shows nodes have already moved. The positions of the nodes are randomly distributed at different times because the sites are randomly distributed in the simulation. In addition, we also find that the density of nodes in the central region is bigger than that in the edge region, and the nodes tend to move to the central region. Considering the actual situation, there will be an uneven distribution of nodes, for example, more nodes should be at "home" in the evening.

Performance Comparison.
The following is the performance comparison of three algorithms under default parameters in Table 4. STC's delivery ratio is bigger than PROPHET's delivery ratio, but the average number of the copies of each message in STC is lower than that in  PROPHET, and this shows the former's average transmission energy consumption is smaller. In addition, we do not consider the energy consumption of exchanging routing tables between two nodes in PROPHET. If the network has more nodes, the energy consumption will be enormous. In STC, when two nodes meet, they do not need to exchange the routing information and much energy will be saved. Table 4 shows the average delivery ratio of LABEL is the lowest and the average delay of LABEL is larger than that of STC, because nodes only transmit collected data to the node with same predicted destination with them in LABEL, and so message transmitted slowly and transmission delay will be longer.

TTL's Impact on Performance.
We change TTL (Time To Live) of messages from 50 s to 300 s, and the simulation results are shown in Figures 6(a)-6(c). In the figures, with the increase of TTL, the average delivery ratio of these three algorithms will increase. There is little difference between STC and PROPHET in terms of the delivery ratio, but compared with LABEL's delivery ratio, theirs are much higher. It shows that the longer survival time the messages have, the more opportunities they get to be transferred to the destination node. Figure 6(b) shows the average number of the copies of each message will increase with the growth of TTL. Although there is little difference between STC and PROPHET in terms of the delivery ratio, the average number of the copies of each message in STC is much lower than that in PROPHET, which implies that STC has less energy consumption. Figure 6(c) shows that average delay increases with the growth of TTL, and the average delay in STC is slightly lower than that in PROPHET.

Node
Density's Impact on Performance. The total number of nodes is increased from 100 to 300 to change the node density, and the simulation result is shown in Figures 7(a)-7(c). Figure 7(a) shows that in all the three algorithms the delivery ratio will increase with the growth of node density.
The increase in the number of nodes will cause a change in the transmission range of each node; as a result, there will be more next hop nodes satisfying the transmission conditions, and thus increasing the delivery ratio. Figure 7(b) shows the average number of the copies of each message will increase with the growth of node density. Although there is little difference between the delivery ratio in STC and that in PROPHET, the average number of the copies of each message in STC is much smaller than that in PROPHET. Figure 7(c) shows the average delay in STC is much lower than LABEL with the number of nodes increasing. With the increase in the number of nodes, there will be more message copies in the network, and it will reduce the number of hops between the source node and the destination node.

Conclusions
This paper explores the trip history model (THM) of people's activities and proposes spatial and temporal correlations routing algorithm. In view of the observation that many people have some regularities in their daily movements, we create a model (THM) to describe the mobility characteristics of people who may be treated as mobile nodes and further make motion prediction. Investigation and experiment results have shown that THM can more accurately predict the moving destination of individuals.
In order to make better use of THM, we propose spatial and temporal correlations routing algorithm (STC). Messages route based on the prediction of node movements in THM, and data will always be transferred to the node which has higher transmission probability of meeting the destination node. Combined with judgments of human's trajectory, STC effectively avoid the blindness in data forwarding caused by the random movement model. According to simulation results, STC can obtain a higher data delivery ratio with lower energy consumption and transmission delay.