A Data Item Selection Mechanism for Mobile Opportunistic Networks

The nonexistence of an end-to-end path poses great challenges to directly adapting the traditional routing algorithms for the Internet and mobile ad hoc networks (MANETs) to mobile opportunistic networks (MONs). In this paper, we try to improve the routing performance by resorting to an efficient data item selection mechanism that takes the bandwidth and connection duration time into consideration. For the purpose of evaluation, a specific data item selection mechanism for a probability-based routing protocol is devised; the selection problem is formally defined as an optimal decision-making problem and solved by dynamic programming. The simulation results show that our data item selection mechanism greatly reduces the number of aborted transmissions, thus enhancing the routing performance in terms of delivery probability, average latency, and overhead ratio.


Introduction
Recent years have seen the rapid development of mobile devices such as smartphones, laptops, and tablet PCs, which makes it easier and cheaper for people to contact each other and share data. Many researchers use the term "mobile opportunistic networks (MONs)" to describe this special kind of delay-tolerant network (DTN), in which mobile users move around and communicate with each other via the short-range wireless communication devices they carry. Since MONs experience intermittent connectivity caused by the mobility of users, routing is a major and challenging problem [1]. Traditional data networks such as the Internet usually rely on some assumptions about the network model, for example, the existence of at least one end-to-end path between each source-destination pair. Any link connecting two nodes is assumed to be bidirectional, supporting symmetric data rates with low error probability and latency. In addition, the power of each node is considered to be sufficient and thus irrelevant to the node throughput. Messages are buffered in intermediate nodes (e.g., routers) and are either forwarded to the next-hop relay or received by the destination. In this case, each message is not expected to occupy the buffer of a node for a long period of time. However, all the above assumptions usually fail in the context of DTNs. Some applications in DTNs stress the delivery ratio while tolerating an acceptable end-to-end latency, which is known as the "delay-tolerant" property. To further popularize these kinds of applications, we have to reconsider the widely used network architecture so as to relax the assumption underlying traditional TCP/IP that end-to-end connectivity always exists [2].
There are many research achievements on the routing problem in DTNs, which greatly improve the performance in network scenarios like MONs. Most of them focus on strategies for choosing relay node(s) during the routing process [3][4][5], while few studies [6][7][8] consider the effect of data item selection on the routing performance. However, the combination of long-term storage and the message replication performed by many DTN routing protocols imposes a high bandwidth and storage overhead on wireless nodes [9]. Moreover, the data units disseminated in this context, called bundles, are self-contained, and the application-level data units can often be large [10]. As a result, the buffers of the nodes usually work at full load. Similarly, the available bandwidth of a given connection is likely to be insufficient to forward all the buffered messages. Consequently, regardless of the specific routing algorithm used, it is important to have efficient scheduling policies to decide which message(s) should be chosen to exchange with an encountered node when bandwidth and connection duration are limited.

International Journal of Distributed Sensor Networks
In this paper, we try to improve the classical probabilistic routing protocol PRoPHET [11] from this point of view. By defining the concept of "Transmission Profit," we introduce the "Optimal Throughput-Aware Probabilistic Routing," which is modeled as an optimal decision-making problem and solved by dynamic programming. Based on this model, a data item selection mechanism for PRoPHET is devised. The mechanism applies to network scenarios with the following two characteristics.
(1) The average throughput of the connection between each pair of nodes is far smaller than the size of the messages in their buffers.
(2) The energy consumption for each transmission may not be ignored, which highlights the importance of every successful relay operation.
For example, the transmission of a message fails when the available throughput of the connection is smaller than the size of the message. A solution to this problem is to send a message whose size does not exceed the connection throughput, thus avoiding the waste of the transmission opportunity. Moreover, even if the throughput of the connection is sufficient for sending any one of the messages, the profit gained from each transmission (e.g., in delivery probability or end-to-end latency) differs depending on which message is selected. From this perspective, an efficient data item selection mechanism is needed in challenged network scenarios.
The rest of this paper is organized as follows. In Section 2 we introduce the system model and the routing model. In Section 3 we formally define the key problem. The improved protocol with the data item selection is given in Section 4. In Section 5 we analyze the simulation results. Section 6 reports on previous similar work. We conclude the paper and discuss future work in Section 7.

Preliminary
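The mathematical notations are listed in Table 1. In our model, we use a discrete timeline that is divided into many small time slots, each of which has the length of one unit time. We denote the set of all nodes as $N = \{n_1, n_2, \ldots, n_{|N|}\}$. There is a message time-to-live value $TTL(m_k)$ for each generated message $m_k$. When a message is generated by a node, $TTL(m_k)$ is preassigned by the corresponding application. Messages are dropped when their TTLs are exhausted. The size of each message $m_k$ is denoted by $S(m_k)$. We assume that each node $n_i$ knows its own bandwidth $B(n_i)$. This assumption is acceptable because the network interface of each device usually does not change. Even if the bandwidth of a node is not stable, we can easily measure its average value over a sliding window. For any node $n_j$ encountered by $n_i$, we let $n_i$ record two values $t_s^{(i,j)}$ and $t_e^{(i,j)}$, the start and the end time of the most recent contact, respectively. Then $t_e^{(i,j)} - t_s^{(i,j)}$ is the duration time of this contact, and we will simply write $t_s$ and $t_e$ when there is no ambiguity in the context.

The PRoPHET routing algorithm records the history of encounters and transitivity, and its utility metric is based on an encounter probability with transitivity. PRoPHET estimates a probabilistic metric called delivery predictability, $P_{(i,j)}$, at every node $n_i$ for each known destination $n_j$. This indicates how likely it is that this node will be able to deliver a message to that destination. The calculation process is as follows:

$$P_{(i,j)} = P_{(i,j)}^{\mathrm{old}} + \left(1 - P_{(i,j)}^{\mathrm{old}}\right) \cdot P_{\mathrm{init}}, \tag{1}$$

$$P_{(i,j)} = P_{(i,j)}^{\mathrm{old}} \cdot \gamma^{k}, \tag{2}$$

$$P_{(i,d)} = P_{(i,d)}^{\mathrm{old}} + \left(1 - P_{(i,d)}^{\mathrm{old}}\right) \cdot P_{(i,j)} \cdot P_{(j,d)} \cdot \beta, \tag{3}$$

where $P_{(i,j)}$ denotes the delivery predictability of reaching $n_j$ from $n_i$, $n_d$ is a destination known to $n_j$, $k$ is the number of time units that have elapsed since the metric was last aged, and $P_{\mathrm{init}}$, $\beta$, and $\gamma$ are initialization constants chosen from the range $[0, 1]$. Each node $n_i$ maintains a $1 \times |N|$ vector, with $|N|$ representing the number of nodes, where the $j$th element records the delivery predictability between $n_i$ and $n_j$.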
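The three PRoPHET updates (on encounter, aging, and transitivity) can be illustrated with a few lines of Python. This is a minimal sketch rather than the authors' implementation: the function names are hypothetical, and the default constants (`p_init = 0.75`, `gamma = 0.98`, `beta = 0.25`) are illustrative values that are not taken from this paper.

```python
def on_encounter(p_old: float, p_init: float = 0.75) -> float:
    """Raise the predictability for a node that has just been met."""
    return p_old + (1.0 - p_old) * p_init


def age(p_old: float, gamma: float = 0.98, k: int = 1) -> float:
    """Decay the predictability after k time units without an update."""
    return p_old * gamma ** k


def transitive(p_id_old: float, p_ij: float, p_jd: float,
               beta: float = 0.25) -> float:
    """Update the predictability i -> d through an encountered node j."""
    return p_id_old + (1.0 - p_id_old) * p_ij * p_jd * beta


if __name__ == "__main__":
    p = on_encounter(0.0)   # first meeting
    p = age(p, k=10)        # ten idle time units
    print(round(p, 4))
```

All three functions keep the result inside [0, 1] as long as the inputs and constants are in that range.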

Problem Formalization
In this section, we give the details of our proposed data item selection scheme.

3.1. Objective. The objective is to maximize the delivery probability of each message to its destination. In MONs, each node routes messages in a "store-carry-forward" way. We could choose to always forward a message to the node with a higher probability of meeting its destination. However, this simple strategy does not take the throughput into consideration, which is closely related to the bandwidth and the connection duration time. Since the bandwidth and the connection duration time between each pair of nodes are limited, the forwarding order of the messages in the queue has a great effect on the routing performance. Assume that $n_i$ has three messages $m_a$, $m_b$, and $m_c$ for transmission to $n_j$, whose sizes are 150 k, 200 k, and 100 k, respectively. However, the current connection can carry at most 120 k in total. Thus neither $m_a$ nor $m_b$ can be successfully relayed from $n_i$ to $n_j$ due to the highly constrained throughput of the connection. Nevertheless, it is not rational to simply let node $n_i$ forward the message with the smallest size to $n_j$. The reason is that, without an explicit optimization objective, such a rule might take away the relay opportunity from messages that are a little larger but whose delivery probability would improve greatly after being replicated to $n_j$. Consequently, it is necessary to choose an explicit optimization objective for our selection. In this paper, we primarily focus on how to efficiently enhance the delivery performance. In the following, we analyze how to maximize the profit on delivery probability.
If the message $m_k$, whose destination is $n_d$, is forwarded from $n_i$ to $n_j$, then the delivery fails only if both $n_i$ and $n_j$ fail to deliver the message, and the delivery probability for this case can be computed by the following equation:

$$P'_{(i,d)} = 1 - \left(1 - P_{(i,d)}\right)\left(1 - P_{(j,d)}\right). \tag{4}$$

Thus we have the following definition.

Definition 1 (transmission profit). The transmission profit $TP(m_k)$ is the magnitude of improvement on the delivery probability for message $m_k$, where

$$TP(m_k) = P'_{(i,d)} - P_{(i,d)} = \left(1 - P_{(i,d)}\right) \cdot P_{(j,d)}. \tag{5}$$

The value of $TP(m_k)$ reflects the improvement of the delivery probability for message $m_k$. From this point, we define our "Optimal Throughput-Aware Probabilistic Routing" as follows.
Definition 2 (optimal throughput-aware probabilistic routing). The optimal probabilistic routing always tries to maximally improve the delivery probability of each message to its destination by taking the estimated bandwidth of the current connection into consideration; that is, each node $n_i$ forwards to a node $n_j$ a set of selected messages with the highest $TP$ values that makes the most of the estimated available throughput of the current connection.
For example, as shown in Table 2, there are five messages in the buffer of $n_i$. For simplicity, we number the five messages $m_1$ to $m_5$, and the destination node of $m_k$ is represented as $d_k$. For any message $m_k$ ($1 \le k \le 5$), we have $P_{(j,d_k)} > P_{(i,d_k)}$, which indicates that, when $n_i$ meets $n_j$, all five messages should be forwarded from $n_i$ to $n_j$. By using (5) we can get the $TP$ value for each message. In the next part of this section, we first show the method to estimate the available connection throughput, and then we give the formal expression of our problem.
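The transmission profit can be computed directly from the two delivery predictabilities involved. The sketch below assumes that the profit is the difference between the delivery probability after replication, 1 - (1 - P_i)(1 - P_j), and the probability P_i before replication; the function names are hypothetical.

```python
def delivery_prob_after_forward(p_i: float, p_j: float) -> float:
    """Delivery fails only if both the carrier and the relay fail."""
    return 1.0 - (1.0 - p_i) * (1.0 - p_j)


def transmission_profit(p_i: float, p_j: float) -> float:
    """Gain in delivery probability from replicating a message to the relay.

    Algebraically equal to (1 - p_i) * p_j.
    """
    return delivery_prob_after_forward(p_i, p_j) - p_i


if __name__ == "__main__":
    # Carrier predictability 0.2, relay predictability 0.5 (made-up values).
    print(transmission_profit(0.2, 0.5))
```

Under this assumption the profit is always nonnegative, and it is largest for messages whose current carrier is unlikely to reach the destination while the encountered node is likely to.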

3.2. Estimating the Available Throughput. Consider the following definition.
Definition 3 (throughput of connection). Given a connection duration time $T$ between two nodes and the bandwidth $B$ (KB/unit) of the connection between them, the throughput $W$ of the connection is

$$W = B \cdot T. \tag{6}$$

Since we assume that the bandwidth $B$ of the connection is known in advance, the remaining task of estimating the throughput is to get the connection duration time $T$. We can use the following equation to estimate the duration time of the next upcoming connection between $n_i$ and $n_j$, where $\alpha \in [0, 1]$ is the scaling constant and $t_e - t_s$ is the duration time of the most recent connection, whose impact is controlled by the parameter $\alpha$. When $\alpha$ is set to be relatively large, the impact of the second term of the following equation is enhanced, and vice versa:

$$T_{(i,j)} = (1 - \alpha) \cdot T_{(i,j)}^{\mathrm{old}} + \alpha \cdot (t_e - t_s). \tag{7}$$

When $n_i$ has not met $n_j$ for a while, we use the following equation to update $T_{(i,j)}$, where $\gamma \in [0, 1)$ is the same aging constant as in (2) and $k$ is the number of time units that have elapsed since the last time the metric was aged:

$$T_{(i,j)} = T_{(i,j)}^{\mathrm{old}} \cdot \gamma^{k}. \tag{8}$$

In each time unit, the node checks the status of the connection between $n_i$ and $n_j$. The updating process for $T_{(i,j)}$ is shown in Algorithm 1. Then, according to Definition 3, the throughput of the connection between $n_i$ and $n_j$ can be estimated by the following equation:

$$W_{(i,j)} = B_{(i,j)} \cdot T_{(i,j)}. \tag{9}$$

An example is shown in Figure 1. The connection between $n_i$ and $n_j$ has shown up 3 times in the time interval $[0, 11]$. The variable $k$ is reset to 1 in the black squares and increases by 1 in the white squares. At the starting time 0, we set $T = 0$ and $k = 1$. In the red squares, the variable $T$ is updated by (8), while in the green squares $T$ is updated by (7). The whole computing process follows Algorithm 1.
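The estimation steps described above can be sketched as three small Python functions. This is only an illustration under stated assumptions: the duration estimate is assumed to be an exponentially weighted average of the previous estimate and the latest contact duration, the aging step is assumed to multiply the estimate by the aging constant raised to the number of elapsed time units, and all function and parameter names are hypothetical.

```python
def update_on_contact(t_old: float, t_start: float, t_end: float,
                      alpha: float) -> float:
    """Blend the latest contact duration into the estimate (assumed form).

    A larger alpha gives more weight to the most recent contact.
    """
    return (1.0 - alpha) * t_old + alpha * (t_end - t_start)


def age_duration(t_old: float, gamma: float, k: int) -> float:
    """Decay the estimate after k time units without a contact."""
    return t_old * gamma ** k


def estimate_throughput(bandwidth_kb_per_unit: float,
                        duration_units: float) -> float:
    """Throughput of a connection = bandwidth x estimated duration."""
    return bandwidth_kb_per_unit * duration_units


if __name__ == "__main__":
    # Values from the worked example in the text: 6.8 KB/unit and 1.62 units.
    print(estimate_throughput(6.8, 1.62))  # roughly 11 KB
```

With the numbers used in the text, a bandwidth of 6.8 KB/unit and an estimated duration of 1.62 time units give a throughput of roughly 11 KB.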
Finally we obtain the estimated value $T_{(i,j)} = 1.62$. Assuming the bandwidth of the connection is 6.8 KB/unit (this numerical value simplifies the discussion below), the throughput of the connection can be estimated by (9) as

$$W_{(i,j)} = B_{(i,j)} \cdot T_{(i,j)} = 6.8 \times 1.62\ \text{KB} \approx 11\ \text{KB}. \tag{10}$$

3.3. Formalization. We first show that the routing problem can be viewed as a 0-1 knapsack problem, and then we give the formalization of our routing problem.

Theorem 4. The optimal bandwidth-aware probabilistic routing problem can be formalized as an optimal decision-making problem and, furthermore, can be viewed as a 0-1 knapsack problem.
Proof. View the estimated throughput $W_{(i,j)}$ as the maximum weight that the knapsack can carry, each message $m_k$ as an item of the knapsack problem, the transmission profit $TP(m_k)$ as the value of the $k$th item, and the message size $S(m_k)$ as the weight of the $k$th item. Then the routing problem is equivalent to the corresponding 0-1 knapsack problem, where a decision is made on each item to achieve the maximal total profit.
Theorem 4 shows that the routing problem is an optimal decision-making problem. The problem is formally defined as follows.
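Definition 5 (formalization of routing problem). Assume that the optimal bandwidth-aware probabilistic routing problem is an optimal decision-making problem; that is, for the $n$ candidate messages $m_1, \ldots, m_n$ in the buffer of $n_i$,

$$\max \sum_{k=1}^{n} TP(m_k) \cdot x_k \quad \text{subject to} \quad \sum_{k=1}^{n} S(m_k) \cdot x_k \le W_{(i,j)}, \quad x_k \in \{0, 1\}, \tag{11}$$

where $x_k = 1$ if message $m_k$ is selected for transmission and $x_k = 0$ otherwise.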

Improved Routing Protocol with Data Item Selection
In this section, we give the details of the improved routing protocol. The key problem of routing is formalized in Section 3. We first apply dynamic programming to the data item selection problem. Then we illustrate how to maintain the needed information during the entire routing process. Finally we show the entire procedure of our improved routing protocol.

4.1. Solving the Optimal Decision Problem. The scheme to solve the optimal decision-making problem is stated in Algorithm 2. The calculation process is shown in Lines 1-5, and the process of solution construction is stated in Lines 6-19.
Returning to the running example of this paper, the $TP$ values and the sizes of the messages are shown in Table 2. In Section 3.2 the estimated throughput of the connection between $n_i$ and $n_j$, that is, $W_{(i,j)}$, is calculated. Based on these values, the calculation process of the example is shown in Table 3.
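The two phases described above (filling the table, then constructing the solution) correspond to the textbook dynamic program for the 0-1 knapsack problem. The following Python sketch shows that standard technique; it is not a transcription of Algorithm 2, the function name is hypothetical, message sizes are assumed to be integers (e.g., whole KB), and the sample values are made up because Table 2 is not reproduced here.

```python
from typing import List, Sequence, Tuple


def select_messages(sizes: Sequence[int], profits: Sequence[float],
                    capacity: int) -> Tuple[float, List[int]]:
    """Return (best total profit, indices of the selected messages)."""
    n = len(sizes)
    # best[i][w]: maximum profit using the first i messages within size w.
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            best[i][w] = best[i - 1][w]          # skip message i-1
            if sizes[i - 1] <= w:                # or take it if it fits
                cand = best[i - 1][w - sizes[i - 1]] + profits[i - 1]
                if cand > best[i][w]:
                    best[i][w] = cand
    # Walk the table backwards to recover which messages were taken.
    chosen: List[int] = []
    w = capacity
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= sizes[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


if __name__ == "__main__":
    sizes = [5, 4, 6, 3]                 # KB, made-up values
    profits = [0.10, 0.40, 0.30, 0.50]   # transmission profits, made up
    print(select_messages(sizes, profits, capacity=11))
```

The table has (n + 1) x (capacity + 1) entries, so the running time grows with the estimated throughput expressed in size units; rounding sizes to a coarser unit keeps the table small.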

4.2. Protocol Description. Now we focus on the routing protocol. As in PRoPHET, we first need to maintain a table recording the meeting probabilities. Besides, since the estimation of connection throughput is based on the connection duration time, each node also needs to record the variables $t_s$ and $t_e$ and the corresponding estimate $T$ for the most recent connection. Finally, we need to record, for each connection, the number of time units that have elapsed since the last time the metric was aged, that is, $k$. We denote the routing information table of $n_i$ by $R[i]$, which is shown in Table 4; the space complexity of $R[i]$ is $O(|N|)$.
There are two parts of our entire protocol.The information exchange protocol is shown in Algorithm 3 and the data transmission protocol is shown in Algorithm 4.
In Algorithm 3, the primary task is to update the information needed for timely routing and then to exchange it with neighbor nodes. The updating process is stated in Lines 1-4, where the equations of PRoPHET and our updating algorithm are used. In Line 5 the request for $R$ is broadcast to all the neighbor nodes of the current node $n_i$. As shown in Lines 6-8, if the current node $n_i$ receives the request from any other node, $R[i]$ is transmitted to that node. In Lines 9-11, when $n_i$ receives $R[j]$ from any node $n_j$, the data transmission protocol is called. In other words, the reception of $R[j]$ is the triggering condition of the data transmission protocol.
In Algorithm 4, the current node $n_i$ scans its buffer and adds to the message list every message $m_k$ for which $P_{(j,d_k)} > P_{(i,d_k)}$ holds. In our scheme, the forwarding strategy is the same as in PRoPHET: a message is transferred from node $a$ to node $b$ only if the contact predictability of $b$ to the destination node is higher than that of $a$. However, the throughput of each connection is also taken into consideration, as shown in the remaining part of the algorithm. We first estimate the throughput of the connection between $n_i$ and $n_j$ in Line 7. Then the transmission profit $TP(m_k)$ is calculated for each message $m_k$ by (5). Finally we employ Algorithm 2 to obtain the selected list, all of whose elements are extracted from the message list. We sort the messages in the selected list in ascending order of TTL to give expiring messages a higher priority for transmission, so as to lower the number of dropped messages. Then in Line 13 all the messages in the selected list are transmitted by $n_i$ to $n_j$ in ascending order of message TTL.
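The data transmission step described above (filter by predictability, compute profits, solve the knapsack, sort by TTL) can be sketched end to end in Python. This is an illustration, not a transcription of Algorithm 4: the `Message` type, the function names, and the sample values are hypothetical, the profit is assumed to be (1 - P_i) * P_j, and sizes and the estimated throughput are assumed to be integers in KB.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Message:
    msg_id: str
    dest: str
    size_kb: int
    ttl: int


def knapsack(sizes: Sequence[int], profits: Sequence[float],
             capacity: int) -> List[int]:
    """Standard 0-1 knapsack; returns the indices of the chosen items."""
    best = [0.0] * (capacity + 1)
    take = [[False] * (capacity + 1) for _ in sizes]
    for i, (s, p) in enumerate(zip(sizes, profits)):
        for w in range(capacity, s - 1, -1):
            if best[w - s] + p > best[w]:
                best[w] = best[w - s] + p
                take[i][w] = True
    chosen, w = [], capacity
    for i in range(len(sizes) - 1, -1, -1):
        if take[i][w]:
            chosen.append(i)
            w -= sizes[i]
    return chosen[::-1]


def plan_transmission(buffer: Sequence[Message], p_self: Dict[str, float],
                      p_peer: Dict[str, float],
                      capacity_kb: int) -> List[Message]:
    """Pick the messages to send to the peer, most urgent (lowest TTL) first."""
    # PRoPHET rule: only messages for which the peer is the better carrier.
    candidates = [m for m in buffer
                  if p_peer.get(m.dest, 0.0) > p_self.get(m.dest, 0.0)]
    profits = [(1.0 - p_self.get(m.dest, 0.0)) * p_peer[m.dest]
               for m in candidates]
    picked = knapsack([m.size_kb for m in candidates], profits, capacity_kb)
    return sorted((candidates[i] for i in picked), key=lambda m: m.ttl)


if __name__ == "__main__":
    buf = [Message("m1", "d1", 5, 30), Message("m2", "d2", 4, 10),
           Message("m3", "d3", 6, 5), Message("m4", "d4", 7, 20)]
    mine = {"d1": 0.2, "d2": 0.5, "d3": 0.1, "d4": 0.3}
    peer = {"d1": 0.6, "d2": 0.4, "d3": 0.5, "d4": 0.9}
    print([m.msg_id for m in plan_transmission(buf, mine, peer, 11)])
```

Sorting happens only after the selection, so the TTL order affects which message is sent first within the chosen set but not which messages are chosen.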

Simulation
The results are evaluated by the ONE simulator [12]. We first adopt the real experimental trace of the Cambridge-iMote dataset, since it is one of the most extensive and widely exploited data traces. This trace includes Bluetooth sightings by groups of users carrying small devices (iMotes) for two months in various locations that we expected many people to visit. Mobile users in this experiment mainly consisted of students from Cambridge University who were asked to carry these iMotes with them at all times for the duration of the experiment. Then we perform the evaluation based on the Helsinki City Scenario, a widely used synthetic

5.1. Simulation in Cambridge-iMote Real Trace. We conduct the simulations by generating 3,300 messages for randomly selected source nodes and by executing the above-mentioned algorithms to forward these messages to their destinations, while recording the delivery ratio, average latency, and average hop count. In the simulations evaluating these metrics, we set the simulation time to 25%, 50%, 75%, and 100% of the entire duration of the dataset. Figure 2 shows the simulation results for varying buffer size, with the message TTL fixed at 300 minutes. In Figure 3, we show the simulation results for varying message TTL, with the node buffer size fixed at 50 M. The other settings of the simulation are listed in Table 5.

Regarding Figure 2, the results show that our improved throughput-aware routing significantly outperforms PRoPHET and Epidemic routing in delivery ratio and average latency and reduces the overhead to some extent. Although the delivery performance of MaxProp and our scheme differs only slightly, our proposed routing outperforms MaxProp in both of the remaining two criteria. More specifically, from Figures 2(a), 2(d), 2(g), and 2(j), we can see that the improvement in delivery performance increases as the simulation time is prolonged. As shown in the second column of Figure 2, the improved throughput-aware routing has a comparatively lower latency than PRoPHET. As shown in the third column of Figure 2, the improvement of our proposed routing in overhead ratio grows as the simulation time is set longer. From all the subfigures, we can see that the throughput-aware routing has the best overall performance among the five protocols.
Regarding Figure 3, the results show that our proposed routing significantly outperforms the other two protocols in delivery and overhead, while having a slight improvement in average latency. As the simulation time is prolonged, the improvement brought by our proposed scheme grows on all three metrics. However, we do not see much improvement in the average latency. From all the subfigures in Figure 3, our proposed routing has a relatively better performance than Epidemic, PRoPHET, and EncounterBased routing. Compared with MaxProp, our proposed routing performs much better when the whole simulation time is short, which indicates that the proposed routing reaches its best state more quickly in real network scenarios.
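For readers who want to reproduce such comparisons, the three reported metrics can be computed from simple per-run counters. The sketch below uses definitions that are common in DTN simulation studies (in particular, overhead ratio as relayed copies minus delivered messages, divided by delivered messages); these definitions are an assumption, since the text does not spell them out, and the function names and sample numbers are hypothetical.

```python
from typing import Sequence


def delivery_ratio(created: int, delivered: int) -> float:
    """Fraction of generated messages that reached their destination."""
    return delivered / created if created else 0.0


def overhead_ratio(relayed: int, delivered: int) -> float:
    """Extra relay transmissions spent per delivered message (assumed form)."""
    if delivered == 0:
        return float("nan")
    return (relayed - delivered) / delivered


def average_latency(latencies: Sequence[float]) -> float:
    """Mean creation-to-delivery delay over the delivered messages."""
    return sum(latencies) / len(latencies) if latencies else float("nan")


if __name__ == "__main__":
    # Made-up counters for one hypothetical run.
    print(delivery_ratio(created=3300, delivered=825))
    print(overhead_ratio(relayed=5000, delivered=1000))
    print(average_latency([10.0, 20.0, 30.0]))
```

Under these definitions, a mechanism that avoids aborted transmissions lowers the relayed count without lowering the delivered count, which is how it would show up as a smaller overhead ratio.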

5.2. Simulation in Helsinki City Scenario. In the Helsinki City Scenario [12], the nodes are assumed to be users with mobile phones or similar devices, using a Bluetooth interface with 250 KBps bandwidth and 10 m transmission range. In this case, the initial free buffer size of each node is set to be small, ranging from 5 M to 55 M. There are six trams following predefined routes, and there is an extra high-speed interface [...] with the number of pedestrians/cars. As shown in the second column of Figure 4, the improved throughput-aware routing has almost the same latency as PRoPHET. As shown in the third column of Figure 4, our proposed routing achieves a greater improvement in overhead ratio. From all the subfigures, we can see that the throughput-aware routing has the best overall performance when the number of nodes in the network is relatively small. Figure 5 shows a result similar to that of Figure 4: when the number of nodes is relatively small, our proposed routing outperforms the other four protocols in delivery and overhead and has a slight improvement in average latency. From all the subfigures in Figure 3, our proposed routing performs better than its original version PRoPHET, which shows that the same relay selection strategy combined with a different data item selection mechanism can perform very differently. In all, when the nodes are relatively abundant, that is, when the density of nodes in the network is high, we prefer to choose MaxProp. On the other hand, our proposed scheme achieves a considerable improvement over PRoPHET routing, making it suitable for networks with a relatively low density of nodes.

Related Works
In [8], Zhu et al. proposed a routing algorithm taking full advantage of predicted probabilistic vehicular trajectories, by which the packet delivery probability was theoretically derived. That work demonstrates that predicted trajectories do help data delivery in vehicular networks. One of the most classical probabilistic routing schemes is the probabilistic routing protocol using history of encounters and transitivity (PRoPHET) [11]. In PRoPHET, the utility metric is based on an encounter probability with the transitivity property. For example, given that node A most likely encounters node B and, in a similar manner, node B encounters node C, then node C may be a good candidate node for node A even if their direct encounter is least likely. Therefore, messages carried by node A would also be replicated to node C, in addition to node B, alleviating the buffer space exhaustion at node B. In particular, an aging factor is also taken into account to handle outdated information.
Reference [13] presents two multicopy forwarding protocols, called optimal opportunistic forwarding (OOF) and OOF-, which maximize the expected delivery rate and minimize the expected delay, respectively, while requiring that the number of forwarding operations per message does not exceed a certain threshold. Reference [14] applies evolutionary games to noncooperative forwarding control in MDTNs, with the main focus on mechanisms to rule the participation of the relays in the delivery of messages in DTNs. Reference [15] provides a reliable data delivery scheme for mobile sensor networks with an enhanced delaying technique. Nodes estimate their connectivity and the expected inter-encounter time with sink nodes. Connectivity is estimated based on the ratio of past and present connections. When the connectivity is unreliable, nodes delay the transmission for the remaining inter-encounter duration or per-hop lifetime. Reference [16] theoretically proves that considering both contact frequency and contact duration leads to higher throughput than considering only contact frequency. To fully exploit a social network for high throughput and low routing delay, the authors propose a social network oriented routing protocol for DTNs, in which a duration utility-based metric is used to evaluate the most suitable relay node for each message.
In [4], the authors find that it is wise to wait until much better opportunities arise in order to minimize the communication cost without degrading the delivery ratio and latency. Consequently a universal scheme, named E-Scheme, is proposed to improve routing on the delivery probability metric. In [3], the authors propose a distributed optimal community-aware opportunistic routing (CAOR) algorithm, where a reverse Dijkstra algorithm is devised to compute the minimum expected delivery delays of nodes, thus achieving the optimal opportunistic routing performance. By proposing a home-aware community model, which turns an MON into a network that only includes community homes, the computational cost and the maintenance cost of contact information are greatly reduced.

Conclusion
In this paper, we try to improve the routing performance by resorting to an efficient data item selection mechanism in MONs. Our motivation is that, because the bandwidth and contact duration time of each connection are highly constrained, a routing protocol performs very differently under different data selection strategies. By defining the concept of "Transmission Profit," the concept of "Optimal Throughput-Aware Probabilistic Routing" is introduced, which is modeled as an optimal decision-making problem and solved by dynamic programming. Then a data item selection algorithm for PRoPHET is devised. The simulation results show that our data item selection mechanism greatly reduces the number of aborted transmissions, thus enhancing the routing performance in terms of delivery probability, average latency, and overhead ratio. Besides, it is possible to apply the proposed scheme to improve other metrics by redefining the "Transmission Profit." Our future work will focus on evaluating the improvement brought by the data item selection mechanism to various routing protocols.

Figure 1: An example of the bandwidth estimation process.

Figure 2: [Cambridge-iMote] Buffer size versus overhead with different percentage of simulation time.

Figure 4: [Helsinki City Scenario] Buffer size versus overhead with different number of nodes.

Figure 5: [Helsinki City Scenario] Message time-to-live versus overhead with different number of nodes.

Table 1: Mathematical notations.

Table 3: Calculation process using dynamic programming.

Table 4: The routing information table.

Table 6: Simulation settings of Helsinki City Scenario.