Optimal Routing Control in Disconnected Machine-to-Machine Networks

Machine-to-machine (M2M) networks could be connected by a wide range of wireless technologies (e.g., Bluetooth, WiFi, RFID). Due to some factors (e.g., mobility of machines, limited communication range), it is hard to maintain the connectivity of the network, that is, the network is disconnected and it is a specific application of delay tolerant networks (DTN). Communication in such network often needs nodes working in a cooperative way. However, due to selfishness, nodes have no incentive to help others. Therefore, if the source requests help from other nodes, it needs to pay certain fees to them. The main goal of this paper is to explore efficient ways for the source to maximize the average delivery ratio when the total fees are limited. First, we mathematically characterize the average delivery ratio under different policies. Then we get the optimal policy through Pontryagin's maximum principle, and prove that the optimal policy conforms to the threshold form when the fees that other nodes require satisfy certain conditions. Simulations based on both synthetic and real motion traces show the accuracy of our model. Through extensive numerical results, we demonstrate that the optimal policy obtained by our model is the best.


Introduction
At present, Machine-to-machine (M2M) networks have emerged as an innovative topic and are undergoing rapid development [1]. In particular, M2M networks could be connected by a wide range of wireless technologies (e.g., Bluetooth, WiFi). Due to the limited communication range, mobility of machines or other factors, it is hard to maintain the connectivity of the M2M networks. On the other hand, the concept of delay tolerant networks (DTN) has been proposed to support many new networking applications, where the end-to-end connectivity cannot be assumed [2]. Therefore, M2M networks can be seen as a specific application of DTN. In order to provide communication services in such disconnected networks, nodes in DTN communicate through a store-carry-forward way. In particular, when the next hop is not immediately available for the current node, some relay nodes will store the message in their buffer, carry the message along their movements, and forward the message to other nodes when a new communication opportunity is occurring [2].
By abuse of language, we will use node and machine alternately in this paper. It is easily to see that the storecarry-forward communication mode needs nodes working in a cooperative way. In particular, the source (one specific machine) may need other nodes serving as relays to forward message to the destination (another machine). In the real world, machines may be vehicles, smart phones, and so on. Though nodes in M2M networks can communicate with each other without human intervention, they can be manipulated by human, too. Therefore, peoples' social behavior (e.g., selfishness) can have certain impact on communication. Due to the selfish nature, nodes have no incentive to help others [3,4]. To make these nodes be cooperative, some incentive policies are necessary. This paper adopts the fees-based incentive policy, that is, the source has to pay certain fees to other nodes if it requests help from them. The fees may be real or virtual currency, by which nodes can buy certain services (e.g., downloading files) from others. Moreover, these fees may be varying with time. In fact, the buffer space or the forwarding ability can be seen as goods. The event that the source requests help from other 2 International Journal of Distributed Sensor Networks nodes can be seen as that the source buys certain goods from them, so the message transmission process can be seen as the commodities trading process, and nodes want to maximize its income in this process. Therefore, nodes may adjust the price of their goods according to the market state. For example, if the remaining lifetime of message is shorter, nodes may guess that the source is eager to transmit the message as soon as possible, so they may guess that their goods (e.g., buffer space, the forwarding ability) are important for the source. In this case, they may increase the price. In addition, if the remaining lifetime of the message is long, nodes may guess that the source may be not eager to transmit the message quickly and is not willing to pay more fees. In this case, they may help the source with only a fewer fees. Therefore, the fees that the ordinary nodes require may be varying with time. If the source has enough fees and can make every node be cooperative, the message will be transmitted according to the classic epidemic routing (ER) algorithm [5], but it has to pay more fees. As the buyers, the source may be not willing to pay more fees, for example, it may think that it is not economical [6]. Therefore, when the total fees are limited, how to maximize the probability that the destination gets message before the message deadline is a very important problem.
Note that our work is similar to the packet purse model (PPM) of credit-based incentive policy, in which nodes can get certain nuggets (denoted by fees in our work) from the source if they help the source [6]. Therefore, the fees in our work can be seen as nuggets in the PPM model. However, state of the art works about the PPM model have some differences with our work. First, most of them do not consider the case that the fees that nodes require may be varying with time. In addition, they do not study how to use the limited nuggets efficiently. For example, supposing that node j is willing to help the source by charging mnuggets at time t1, but it only requires n nuggets at time t2. If m > n and t1 < t2, the source can pay less nuggets when it requires help from node j at time t2, but this may decrease the probability that node j forwards message to the destination, so it may be not good for the source. On the other hand, because the source uses fewer nuggets to make node j be cooperative, the remaining nuggets are more, so the source may have enough nuggets to make more nodes be cooperative and this is good for the source. Therefore, how to use the limited nuggets of the source is not a simple problem, and this will be our main contribution. At last, most of existing works assume that once a node obtains certain nuggets, they will be cooperative all the time. However, the cost (e.g., energy) of nodes in the routing process is closely related with the forwarding times [7], so nodes may ask for nuggets according to the forwarding times. In other words, once they forward message to one node, the source has to pay certain fees to them.
On the other hand, our work is similar to the optimal control problem of ER algorithm in DTN, which is a popular topic recently, and there are some good works in this field, such as [7,8]. However, they do not consider the selfishness nature of nodes. For the selfishness behavior, there are some works, which evaluate its impact on the routing performance [3,4]. However, none of these works considers the optimal control problem. In addition, the selfishness behavior in those works is not varying with time and is denoted by a simple way. In particular, they use the probabilistic way to denote the selfishness behavior. For example, node i is willing to help node j with probability q.
The main contribution of this paper is to study the optimal routing problem in such complex application. In particular, we propose a unifying framework through a continuous time Markov process, which can be used to evaluate the total fees that the source pays. Then based on the framework, we formulate an optimization problem. Through Pontryagin's maximum principle, we explore the stochastic control problem, and we prove that the optimal policy conforms to the threshold form in some cases. By comparing the simulation results with the theoretical results, we show that our theoretical framework is very accurate. We compare the performance of the optimal policy with other policies through extensive numerical results and find that the optimal policy obtained by Pontryagin's maximum principle is the best.
The rest of this paper is organized as follows. In Section 2, we briefly present some related works, and we describe the network model in Section 3. In Section 4, we first present the theoretical framework, and then formulate the optimal control problem. In Section 5, we introduce the simulation and numerical results. Finally, we conclude our main work in Section 6.

Related Work
In the past few years, many routing protocols have been proposed in DTN, but most of them need certain prior knowledge of the network, or they can obtain such knowledge through some online learning processes, such as the works in [9][10][11]. In some applications, there may be no enough time to learn, so these algorithms have certain shortage. On the other hand, ER algorithm does not need any prior knowledge about the network and can be used in many environments. For this reason, this algorithm is still a very popular topic. However, it works in a flooding way, so it wastes much energy and suffers from poor scalability in large networks. At present, many policies have been proposed to reduce its overhead. Among them, there are probability forwarding policy [12], hop-based forwarding policy [13], and so forth. These algorithms try to achieve big delivery ratio and relatively low transmission cost. Generally speaking, big delivery ratio is obtained at the expense of more cost. Therefore, how to accurately evaluate both strengths and limitations of these algorithms is very important. Some works use the simulation manner [14], but recently, theoretical manner is more popular. The performance of ER algorithm based on the sparsely exponential graph is studied in [15] and then the problem is explored again with heterogeneous nodes in [16]. The authors in [17] get the generic theoretical upper bounds for the information propagation speed in large-scale mobile and intermittently connected network, and then the work in [18] explores the information propagation speed in bidirectional vehicular delay tolerant network. The work in [19] studies the performance of two-hop relay routing under limited packet lifetime. Performance of the ER routing when the contention exists is explored in [20].
Most routing algorithms in DTN need nodes working in a cooperative way, so the selfishness behavior can have important impact on the performance. Panagakis et al. evaluate the impact of selfishness through simulation in [21]. There are also some works, which study the impact of nodes' selfishness behavior by theoretical method, such as [3,4]. The work in [22] is the first to propose the social selfishness behavior in DTN and proposes a user-centered routing policy, which is adaptive to the selfishness nature. Specially, the social selfishness behavior means that nodes are more willing to help the one with whom they have certain social ties (e.g., citation relation). The work in [23] studies the impact of the social selfishness behavior on ER algorithm and finds that ER algorithm is very robust to the social selfishness behavior. Then they study its impact on multicasting in DTN [24].
Because ER algorithm has certain shortage, there are some works, which begin to consider the optimal control problem. The optimal control problem of two-hop routing algorithm based on discrete time model is studied in [7], and this work proves that the optimal forwarding policy conforms to the threshold form. Then the work in [8] explores the problem again with a continuous time Markov process. The work in [25] proposes an optimal activation and transmission policy, and then the work in [26] proposes an energy-efficient optimal beaconing policy. Above works try to maximize certain objective function under some constraints, but the work in [27] study the trade-off between the delivery delay and energy consumption in DTN that uses two-hop relaying method. This work is different from our work in certain aspects. First, the work in [27] is an optimization problem without constraint. Second, the energy consumption within given time is fixed. In particular, it is a special case of our work when PR(t) (see the definition in next section) is a constant. Third, that work uses the two-hop routing method, which is too simple. There are also some works, which consider the optimal control of ER algorithm with SIRD model, in which nodes carrying message (e.g., virus) may exist from the network [28,29].

Network Model
In this paper, we assume that there are one source node S, N ordinary nodes and one destination node D, so the network totally has N + 2 nodes. At time 0, only S is carrying message and it tries to make the destination D obtain message before the deadline T.
To make the destination get message quickly, S requests help from the ordinary nodes. If one ordinary node (e.g., j) gets message, it can forward message to other node. Due to the selfishness nature, S has to pay certain fees to j every time j forward message to other node. Therefore, considering the cost constraint, S needs to control the behavior of itself and the ordinary nodes. In this paper, we assume that both S and the ordinary nodes have the same forwarding policy. In particular, any node carrying message forwards message Ordinary node without message The destination D Ordinary node with message to other ordinary nodes with probability p(t) at time t. However, they forward message to the destination D with probability 1. That is to say, once a node carrying message encounters with D, D can get message immediately. This may need the third organization to make the ordinary nodes abide by the rules. On the other hand, if an ordinary node (e.g., j) which is carrying message forwards message to another node at time t, the fees that S pays to j is PR(t). The goal of this paper is to explore the optimal value of p(t) at any time t to maximize the average delivery ratio when the total fees that the source can pay is limited. The snapshot at time t1 and t2 can be seen as in Figure 1. At time t1, ordinary node O1 encounters with the source, and the source will forward message to O1 with probability p(t1). In addition, node O5 encounters with the ordinary node O3, which is carrying message at time t1. Because ordinary nodes have the same forwarding policy as the source, node O3 will forward message to O5 with probability p(t1), too. If the ordinary O3 successfully forwards message to O5, the source has to pay PR(t1) fees to O3. Similarly, at time t2, node O4 encounters with the ordinary node O2, and node O2 will forward message to O2 with probability p(t2). Nodes in the network can communicate with each other only when they come into the transmission range of each other, which means a communication contact, so the mobility rule of nodes is critical. In this paper, we assume that the occurrence of contacts between two nodes follows a Poisson distribution. This assumption has been used in wireless communications many years. At present, some works show that this assumption is only an approximation to the message propagation process, and they reveal that nodes encounter with each other according to the power law distribution [30]. However, they also find that if you consider long traces, the tail of the distribution is exponential. In addition, the work in [31] shows that individual inter-meeting time can be shaped to be exponential by choosing an approximate domain size with respect to given time scale. Moreover, there are also some works, which describe the intermeeting time of human or vehicles by Poisson process and validate their model experimentally on real motion traces [32,33]. According to these descriptions, the exponential intermeeting time is rational in some applications, and we assume that the intermeeting time between two nodes follows an exponential distribution with parameter λ. Simulations based on both 4 International Journal of Distributed Sensor Networks The number of ordinary nodes carrying message at time t The delivery ratio at time t The total fees that the source pays till time t synthetic and real motion traces show that our theoretical framework based on such assumption is very accurate. The list of commonly used variables can be seen as in Table 1.

Theoretical Framework.
Let X(t) denote the number of ordinary nodes carrying message at time t. Only the source has message at time 0, so we have X(0) = 0. In this paper, we assume that nodes carrying message do not receive the same message anymore. Given a small time interval Δt, we can obtain Symbol {Y (t)} denotes the set of ordinary nodes without message at time t, so the set has N − X(t) elements. ϕ j (t, t + Δt) denotes the event whether the ordinary node j gets message in time interval [t, t + Δt]. If ϕ j (t, t + Δt) = 1, we can say that node j gets message, but if ϕ j (t, t + Δt) = 0, node j does not get message. As shown above, two nodes encounter with each other according to an exponential distribution with parameter λ. Therefore, node j encounters with a specific node (e.g., k) in interval [t, t + Δt] with probability 1 − e −λΔt . Moreover, if node k has message and j encounters with k at time t +t1, node j receives message with probability Δt is very small, we can say that such probability equals to p(t)(1 − e −λΔt ). In addition, there are 1 + X(t) nodes which have message at time t, so we have Therefore, we can get the expectation of ϕ j (t, t + Δt) as follows: Base on (1) and (3), we have Note that E(X(t)) denotes the expectation of X(t). One main metric of routing algorithm in DTN is the delivery ratio, which denotes the probability that the destination D obtains message within given time. Let F(t) denote the delivery ratio when the given time is t. Before getting its value, we first give another symbol W(t) = 1 − F(t), which denotes the probability that D does not obtain message before time t. Moreover, let W(t, t + Δt) denote the probability that D does not get message in time interval [t, t + Δt]. Therefore, we have There are 1 + X(t) nodes which may forward message to D at time t, so we can get the following equation easily: Further, we can obtain Let U(t) denote the total fees that the source pays till time t, we have Let σ j (t, t + Δt) denotes that whether the ordinary node j gets message from another ordinary node in interval [t, t + Δt]. If this event happens (e.g., at time t + ρ j , 0 ≤ ρ j ≤ Δt), one ordinary node carrying message forward message to another node, so S has to pay certain fees denoted by PR(t + ρ j ). In fact, if the destination D gets message from an ordinary node, the source has to pay certain fees to this ordinary node, too. However, the network only has one destination. Therefore, we can ignore the cost in this paper. Note that ϕ j (t, t +Δt) denotes the event whether the ordinary node j gets message in time interval [t, t + Δt], so we have p σ j (t, t + Δt) = 1 = p ϕ j (t, t + Δt) = 1 X(t) 1 + X(t) .
International Journal of Distributed Sensor Networks 5 Based on (8) and (9), we have lim Then, we have

Optimal Control.
Our object is to solve the following optimization problem: Symbol C denotes the maximal fees that the source can pay, and T is the maximal lifetime of the message. The controlling variable is p(t). Let ((X, F), p) be an optimal solution. In particular, at time t, X denotes the value of E(X(t)), and F denotes the value of E(F(t)). Similarly, p denotes the value of p(t). In addition, U denotes the value of E(U(t)) at time t. Consider the Hamiltonian H, and costate or adjoint functions λ X , λ F and λ U defined as follows: Note that at time t, PR is a simple expression of PR(t).
The transversality conditions are shown as follows: Then according to Pontryagin's maximum principle ([34, Page 109, Theorem 3.14]), there exist continuous or piecewise continuously differentiable state and costate functions, which satisfy This equation between the optimal control parameter p and the Hamiltonian H allows us to express p as a function of the state (X, F, U) and costate (λ X , λ F , λ U ), resulting in a system of differential equations involving only the state and costate functions, and not the control function. In fact, this equation means that maximize the value of E(F(T)) equals to maximize the corresponding Hamiltonian H. In particular, at given time t, the state (X, F, U) and costate (λ X , λ F , λ U ) can be seen as constants, and p(t) can maximize H at this time. Therefore, according to (14), we can obtain the optimal policy as follows: The total number of ordinary nodes is N, so we have X ≤ N. When X = N, every ordinary node is carrying message, and the nodes cannot forward message to the ordinary node any more, so the value of p cannot have any impact and it may be any value. If X < N, we have N − X > 0, so if λ X (1 + X) + λ U PRX > 0, we can obtain (N − X)(λ X (1 + X) + λ U PRX) > 0. Therefore, H is increasing with p at present, and the optimal value of p is 1. The optimal value of p can be obtained easily in other cases in similar way, and we have (17). For simplicity, we only consider the case X < N next in this paper, that is, at least one ordinary node is not carrying message.
Below, we will prove that when the function of PR(t) satisfies certain conditions, the optimal policy has a simple structure. The conditions are: PR(t) is nondecreasing with time t; it is differentiable; it is nonnegative. In fact, the maximal lifetime (T) of the message is fixed, so if the value of t is bigger, the remaining lifetime (T − t) is shorter. In this case, the ordinary nodes may guess that the source may be eager to transmit message to D quickly, so they may ask for more fees from the source. That is to say, if the value of t is bigger, the value of PR(t) may be bigger. Therefore, the condition that PR(t) is increasing is rational in some environments. In this case, the optimal policy has at most one jump, and it conforms to the threshold form. Then we have the following Theorem. Theorem 1. If the PR(t) satisfies the above conditions, the optimal policy satisfies: Proof. This theorem means that the source requests help with probability 1 before time h, but it stops requesting help after time h. We simply use X(t) or X to denote E(X(t)) in the proving process, and we only consider the case that X < N. Now, we define a new function as follow: Therefore, we havė 6 International Journal of Distributed Sensor Networks If f (t) < 0, we know that p(t) = 0 according to (17), so we haveḟ (t) =λ X (t)(1 + X(t)) + λ U (t)Ṗ R(t)X(t). (20) Then based on (14) and p(t) = 0, we know thaṫ In fact, we have 1 + λ F > 0 at any time. Otherwise, if 1 + λ F ≤ 0, we have From (22), we can see that λ F cannot be increasing, and it cannot be 0 at time T. This is contradiction with (15). Therefore, we have 1 + λ F > 0 at any time. Based on (20) and (21) and the fact that PR is nondecreasing with t and λ U is nonpositive, we can see that f is not increasing at time t. Because f (t) < 0, we have f (s) < 0, s > t. Further, we have p(s) = 0, s > t. Therefore, once the forwarding probability equals to 0, it remains unchanged. Now, we assume that f (t) = λ X (t)(1 + X(t)) + λ U (t)PR(t)X(t) = 0, and then we havė Combined with (19), we have, Note that we have proved that 1 + λ F > 0 at any time. Based on (14) and (15), we know that λ U is a nonpositive constant. If it is smaller than 0, we can easily know that f (t) is decreasing at time t. If λ U equals to 0, we havė If E(F(t)) = 1, the objective function reaches to the maximal value, and it is not necessary to forward anymore, so we only consider the case E(F(t)) < 1. In this case, based on (7), we haveḞ (t) > 0, Because f (t) = 0, we have f (s) < 0, s > t. Further, we have p(s) = 0, s > t. This means that there is at most one time t at which p can be any value, and then p equals to 0 after time t.
In fact, Theorem 1 means that if PR(t) satisfies certain conditions, the optimal policy has a bang-bang structure. In particular, the source will require help from others with probability 1 before certain threshold, and then stop. In addition, the value of the threshold is denoted by h.

Model Validation.
In this section, we will check the accuracy of our framework by comparing the theoretical results obtained by our model with the simulation results. We run several simulations using the opportunistic network environment (ONE) [35] based on both synthetic mobility model and realworld-based scenarios. For the synthetic mobility trace, we use the famous random waypoint (RWP) mobility model [36], which is commonly used in many mobile wireless networks. There are totally 500 nodes, and all nodes move according to the RWP model within a 10000 m × 1000 m terrain with a scale speed chosen from a uniform distribution from 4 m/s to 10 m/s. The communication range is 10 m. Moreover, the source and destination nodes are randomly selected among these nodes. For the realworld-based scenarios, we use a real motion traces from about 2100 operational taxis for about one month in Shanghai city collected by global position system (GPS) [37]. The location information of the taxis is recorded at every 40 seconds with the area of 102 km 2 . We randomly pick 500 nodes from this trace, and the source and destination nodes are randomly selected among these nodes, too.
The first metric is the total fees that the source pays to the ordinary nodes under different forwarding policies. The source may forward message with any probability, that is, the value of p(t) may be any value between 0 and 1 at time t. Because our main goal is to check the accuracy of our theoretical framework, we only consider two special cases. Case 1: p(t) = 1, t ≥ 0; Case 2: p(t) = 0.5, t ≥ 0. The first case means that the source requests help all the time and message is propagated according to ER algorithm, but in the second case, nodes forward message to other ordinary nodes with probability 0.5. The function of PR(t) may be any form. For simplicity, we define: PR(t) = 3(1−e −t/10000 ). In this section, the total fees is not limited. Let the maximal message lifetime increase from 0 s to 10000 s, we can get the results in Figures 2 and 3, for the RWP model and Shanghai city motion trace, respectively.
From these results, we can see that the average deviation between the theoretical and the simulation results is very small. For example, the average deviation is about 2.92% for the RWP mobility model, and 4.01% for the Shanghai city motion trace. In addition, Figure 3 shows that the source has to pay more fees if it adopts the policy of Case 2. In fact, when it adopts the policy of Case 1, nodes can get message timely, when the fees that the ordinary nodes required is less. However, if it adopts the policy of Case 2, many nodes can get message when the time t is bigger, so the source has to pay more fees to them.
Then based on the same settings, we explore the average delivery ratio in Figures 4 and 5, respectively. These results also show that the average deviation between the theoretical and the simulation results is very small.
To further check the accuracy of the model, we want to explore the performance when the number of nodes is different. In particular, we assume that there are 300 and 600 nodes, respectively. For simplicity, we only consider Case 1, that is, the value of p(t) is 1 all the time. Other settings remain unchanged, and we can obtain Figures 6  and 7, respectively. Both results show that our theoretical framework is very accurate.
All of the above results demonstrate the accuracy of our theoretical framework. For this reason, we can use the numerical results obtained by our theoretical framework to evaluate the performance of different policies.

Performance Analysis with Numerical Results.
In this section, all of the numerical results are obtained by our theoretical framework based on the best fitting for the Shanghai city motion trace.  First, we evaluate the performance of the optimal policy, which is the threshold form. For comparison, we consider three other cases. Case 1: p(t) = p, t ≥ 0; Case 2: No Constraint; Case 3: random. Case 1 means that nodes forward message with the same probability all the time. The policy No constraint means that the total fees C is not limited, and p(t) equals to 1 all the time. The random policy means that the value of p(t) is randomly selected from the interval [0, 1] at time t. In addition, we assume that total fees C equals to 150. Other settings are the same as that in simulation, and then we can obtain Figure 8. The result shows that the optimal policy is better than other policies when the total fees is limited. On the other hand, when total fees C equals to 150, the average delivery ratio is only a little smaller than that when the total fees is not limited. Now, we let the maximal message lifetime T be 10000 s, and the value of C increase from 1 to 200. Other settings remain unchanged, and we can obtain Figure 9. This result shows that the optimal policy is best when the total fees are limited, too. On the other hand, when the total fees reach about 100, the average delivery ratio is about 1. This means that the source only needs to pay limited fees. The number of ordinary nodes may have certain impact on the routing performance. At present, we only consider the optimal policy obtained from (17). We let the value of C be 50, and the maximal message lifetime increase from 0 s to 10000 s. Other settings remain unchanged. Based on these settings, we can obtain Figure 10 when N equals to 200, 600, and 1000, respectively.
The result in Figure 10 shows that if the number of ordinary nodes is bigger, the performance is better. As shown above, the message transmission process can be seen as the commodities trading process. The event that the source requests help from other nodes can be seen as that the source buys certain goods from them. The price of the goods is increasing with time. Therefore, it is good for the source to buy more goods early. However, when the number of ordinary nodes is smaller, the goods are limited, so the source cannot buy many goods early. Therefore, if the number of ordinary nodes is bigger, the performance is better.
On the other hand, as shown in Theorem 1, the source will stop requesting help at certain time. In particular, the optimal policy satisfies: p(t) = 1, t < h, and p(t) = 0, t > h, 0 ≤ h ≤ T. Therefore, the forwarding probability equals to 0 after time h. When the number of nodes is bigger, the value of h is smaller, so the source stops requesting help earlier.
We can see the result in Figure 11 more clearly, when there are 200, 600, and 1000 nodes, respectively. For example, the value of h equals to about 2160s when there are 600 nodes.
In the above simulation and numerical results, we define: PR(t) = 3(1−e −t/10000 ). However, there may be many different forms for the function. Here, we study another special case, that is, we define: PR(t) = b * (t + 1) y , b, y > 0. Other settings are the same as that in above numerical result. However, we let the value of N be 600. The performance of the optimal policy is shown in Figure 12. This result shows that the performance is different when the parameters are different.

Conclusions
Due to the mobility of nodes and many other factors, it is hard to maintain the connectivity in M2M networks. Therefore, the store-carry-forward communication mode is an important message propagation way. Such communication mode needs nodes working in a cooperative way. Due to the selfishness nature, nodes may ask for some fees (denoted by PR(t)) from the source, which may be varying with time. For example, when the message stays in the network a long time, the ordinary nodes may think that its remaining lifetime is shorter, so they may ask for more fees from the source. In this paper, we propose a unifying framework to evaluate the total fees that the source pays under different forwarding policies. Then based on the framework, we study the optimal control problem. In particular, we prove that the optimal forwarding policy conforms to the threshold form when PR(t) satisfies certain conditions. Simulations based on both synthetic and real motion traces show the accuracy of our theoretical framework. Numerical results show that the optimal forwarding policy obtained by (17) is the best one.