Energy-Efficient and Delay Sensitive Routing Paths Using Mobility Prediction in Mobile WSN: Mathematical Optimization, Markov Chains, and Deep Learning Approaches

In Mobile Wireless Sensor Networks, there may be scenarios in which all network nodes, including the base station, are mobile. In such scenarios, finding a communication path between a sensor node and the base station becomes a very hard task, because many network variables change at every moment. In addition, delay-sensitive applications require communication paths to be established as quickly as possible to avoid poor end-to-end delay performance, while also reducing the energy consumption of the network. For this reason, we propose a multiobjective mathematical optimization model that finds the optimal communication path between a source node and a sink (base station) in hard scenarios where all network nodes are mobile, minimizing both end-to-end delay and energy consumption. This model offers a significant advantage for evaluating new algorithms, because it tells us how far an algorithm's results are from the optimal values given by the model. We also propose a distributed prediction routing algorithm based on Markov Chains that takes network mobility into account in order to find, as quickly as possible, a communication path between a source node and a sink with minimal energy consumption. In addition, we propose a deep learning approach that predicts future distances between nodes in a mobile network to determine whether future node movements will disrupt communication paths. Significant findings were obtained when the Markov Chains and Deep Learning approaches were compared in terms of predicting node mobility and reducing delay and energy consumption in the network. The performance of both prediction algorithms (Markov Chains and Deep Learning) is evaluated against the mathematical model to determine how close they come to the optimum.
Finally, to analyze our prediction algorithms in realistic online scenarios, we compared them against typical routing algorithms, obtaining promising results in terms of delay and energy consumption in all mobile node scenarios.


I. INTRODUCTION
Mobile Wireless Sensor Networks (MWSN) are a particular case of WSN in which some or all network nodes are mobile because they are attached to entities such as mobile objects, animals, or humans. Because these entities move continually within a specific area, long delays are likely when establishing a path between a sensor node and a sink (base station), which degrades end-to-end delay performance in delay-sensitive applications such as military or healthcare monitoring [3]. In addition, these sensors have limited energy, so delay requirements must be met while energy consumption is minimized [1], [2]. Since a sink should not experience undesired delays when it receives the information collected by sensors, mathematical models and algorithms are needed that build communication paths as quickly as possible with minimal energy consumption [3], [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Juan Liu.
Given the scenario described above, novel routing algorithms such as [11]-[13] are emerging to solve these problems. However, these works, and many others mentioned in Section II-B, do not present a mathematical optimization model, and only the sinks are mobile in their networks; that is, not all network nodes are mobile. Formulating mathematical optimization models for networks in which node positions change over time (time-varying graphs) is not an easy task. Mathematical optimization models for time-varying graphs do exist in other technological fields, such as wireless, cellular, and vehicular networks. Nonetheless, such models are scarce for scenarios in which all nodes are mobile, and they are not formulated for the particular requirements of mobile wireless sensor networks, such as constraints on energy consumption and on computational and memory resources. In this work, we therefore propose a mathematical optimization model for time-varying graphs in the context of a mobile wireless sensor network that finds the optimal communication path between a source node and a sink (destination node), assuming all network nodes are mobile, and minimizes two objective functions: end-to-end delay and energy consumption. The results obtained by the model can be compared against algorithms to evaluate their performance in terms of delay and energy consumption. In other words, the optimal values given by the mathematical model can serve as reference values to judge how good the results of a new MWSN algorithm are, provided the algorithm pursues the same goals and uses the same parameters as the model. This is a great advantage when evaluating new algorithms, because we know how far or close their results are from the optimal values given by the mathematical model.
However, we do not intend the mathematical model to represent every detail involved in designing an MWSN algorithm; rather, the comparison between a proposed algorithm and the model can reveal whether the algorithm is heading in the right or wrong direction. We also propose distributed prediction routing algorithms, which are compared against the mathematical model to evaluate their performance and their potential application in real mobile wireless sensor network scenarios. The first prediction algorithm is based on Markov Chains and accounts for network mobility with the aim of finding, as quickly as possible, a communication path between a source node and a sink with minimal energy consumption. The second prediction algorithm is based on Deep Learning and has the same purpose as the Markov Chains method. Both methods are compared in the Results Section (IV) to identify their differences and similarities. Finally, this paper extends our previous work [15] with the following improvements:
• In the mathematical model, we consider two objective functions, delay and energy, to address delay-sensitive applications with minimal energy consumption. Both functions are minimized jointly to obtain optimal solutions that account for network mobility.
• In the mobility prediction algorithm based on Markov Chains, we consider not only the RSSI levels of nodes but also their current energy consumption level when selecting a forwarding node to build a path between a sensor source node and a sink.
• We describe the movement model of the sensor nodes in more detail, including the precise theoretical details of the selected movement model.
• The energy consumption model applied to the sensor nodes is described in detail and implemented with realistic parameters in order to obtain results adjusted to real scenarios.
• We propose a new prediction algorithm based on deep learning to select the best forwarding node when building a path between a sensor source node and a sink. In addition, we propose a dataset that our deep learning approach uses to select the best forwarding node.
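As a rough illustration of the forwarding-node criterion in the list above (RSSI combined with the neighbour's energy level), the following Python sketch scores each neighbour with a weighted sum of its normalized RSSI and residual energy. The normalization bounds, the weight `w_rssi`, and the names `Neighbor` and `select_forwarder` are hypothetical assumptions for illustration, not necessarily the rule used by the proposed algorithm.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    node_id: int
    rssi_dbm: float   # RSSI of the neighbour's last control packet (dBm)
    energy_j: float   # residual energy reported by the neighbour (J)


def select_forwarder(neighbors, rssi_min=-95.0, rssi_max=-40.0,
                     energy_max=2.0, w_rssi=0.5):
    """Return the neighbour with the highest combined RSSI/energy score.

    Hypothetical scoring rule: RSSI and residual energy are each scaled
    linearly to [0, 1] and combined with weight w_rssi. Returns None if
    there are no neighbours.
    """
    def clamp01(x):
        return max(0.0, min(1.0, x))

    def score(nb):
        rssi_term = clamp01((nb.rssi_dbm - rssi_min) / (rssi_max - rssi_min))
        energy_term = clamp01(nb.energy_j / energy_max)
        return w_rssi * rssi_term + (1.0 - w_rssi) * energy_term

    return max(neighbors, key=score, default=None)
```

For instance, with equal weights a neighbour at -70 dBm holding 1.8 J outranks one at -50 dBm holding only 0.2 J; setting `w_rssi=1.0` reverts to pure signal-strength selection.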

A. PAPER ORGANIZATION
The remainder of this paper is organized as follows. The general problem statement is described in Section II-A; the mathematical optimization model for the problem is presented in Section III-A; and the mobility prediction algorithm based on Markov Chains is introduced in Section III-B. Section III-C describes the mobility prediction algorithm based on Deep Learning. Section IV presents the results obtained, and Section V presents the conclusions.

II. PROBLEM STATEMENT AND RELATED WORKS
A. PROBLEM STATEMENT
Figure 1.a illustrates our problem. Suppose we have an MWSN in which, at time $t_1$, there is a communication path between the source node $n_1$ and a destination node (squared node). However, at time $t_2$, node $n_2$ moves away from node $n_3$, disrupting the transmission of information from $n_1$ to the destination node. Once $n_3$ detects this problem at time $t_3$, it has to perform routing corrections to reestablish the communication path between $n_1$ and the destination node. Routing techniques can reestablish this path, but at the expense of introducing undesired delays. In some applications, these delays can be ignored because they do not affect the application goals, but in others, such as delay-sensitive health monitoring applications, they may result in poor end-to-end delay performance. To address this problem, we propose using the prediction technique described in Figure 1.b [7], [8], [18]. It represents the same situation shown in Figure 1.a, but in this case node $n_3$ receives information at time $t_1$ indicating that node $n_2$ will soon move out of its communication range at time $t_2$. Given this information, $n_3$ also analyzes at time $t_1$ a candidate node that could replace $n_2$ if $n_2$ fails at a future time. Indeed, if the link to $n_2$ fails at time $t_2$ because $n_2$ has moved away from $n_3$, node $n_3$ can promptly restore the communication path between $n_1$ and the destination node at time $t_2$, reducing the delay described in Figure 1.a.
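The proactive behaviour of node $n_3$ in Figure 1.b can be sketched as follows: a relaying node keeps a short history of distance estimates to each neighbour, predicts the next distance, and switches to a pre-selected candidate before the current next hop leaves its communication range. The linear extrapolation used here is only a placeholder assumption standing in for the Markov Chains and Deep Learning predictors described later, and `predict_next` and `plan_next_hop` are hypothetical names.

```python
def predict_next(history):
    """Placeholder predictor: extrapolate the last change in distance."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])


def plan_next_hop(current, histories, comm_range):
    """Return (next_hop, backup) for a node relaying traffic on a path.

    current    -- neighbour currently used on the path (e.g. n2 in Figure 1)
    histories  -- dict: neighbour id -> recent distance estimates, oldest first
    comm_range -- communication radius, in the same unit as the distances
    """
    predicted = {nid: predict_next(h) for nid, h in histories.items()}
    # Other neighbours expected to remain in range, closest first.
    candidates = sorted(
        (dist, nid) for nid, dist in predicted.items()
        if nid != current and dist <= comm_range
    )
    backup = candidates[0][1] if candidates else None
    if predicted.get(current, float("inf")) > comm_range and backup is not None:
        # Switch proactively, before the link to `current` actually breaks.
        return backup, None
    return current, backup
```

With distance histories `{2: [10.0, 14.0], 4: [12.0, 11.0], 5: [20.0, 18.0]}` and a range of 17, neighbour 2 is predicted at 18 (out of range), so a node currently using neighbour 2 switches to neighbour 4 ahead of time, mirroring the reaction of $n_3$ in Figure 1.b.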
Formulating mathematical optimization models for networks in which node positions change over time (time-varying graphs) is not an easy task. Although many such models exist for time-varying graphs in other technological fields, such as wireless, cellular, and vehicular networks, they are scarce for mobile wireless sensor networks, because this kind of wireless network has particular constraints on energy consumption and on computational and memory resources. Our work therefore aims to propose a mathematical optimization model for time-varying graphs in the context of a mobile wireless sensor network.

B. RELATED WORKS
Because we propose three different methods to solve the problem, namely a mathematical optimization approach, a Markov Chains approach, and a Deep Learning approach, we divide the related works into three corresponding parts, described as follows.

1) MATHEMATICAL OPTIMIZATION RELATED WORKS
Given the problem statement above, novel routing algorithms are emerging to solve these problems. Specifically, in [11] the authors propose an algorithm that guarantees each source sensor node single-hop access to a backbone node in order to reach a mobile sink. However, they do not present a mathematical optimization model, and only the sinks are mobile in the network; that is, not all network nodes are mobile.
The authors of [12] present a multi-objective particle swarm optimization for finding the optimal path in a wireless sensor network with a mobile sink for data collection. This evolutionary approach cannot guarantee the optimal solution, and not all network nodes are mobile. In other words, they do not propose a mathematical optimization model that yields optimal solutions.
In [16], the authors present a minimally invasive veneer tree using the particle optimization algorithm for routing in wireless sensor networks with a moving sink. This algorithm is population-based, and population members try to find a tree with lower energy and latency by sharing routing information. The proposed algorithm was compared with previous algorithms in terms of energy consumption, distance, and number of steps. However, this work does not present a mathematical optimization model to obtain optimal values; instead, it presents a metaheuristic that does not guarantee optimal values. In addition, only the sink is mobile, not the rest of the sensor nodes.
In [17], the authors propose mobile sink path planning for collecting information from static nodes located at the bottom of the sea. They develop a Cluster Head Selection Algorithm (CHSA) and a particle swarm optimization (PSO) algorithm to optimize the selection of cluster heads in an Underwater Heterogeneous Sensor Network (UHSN). Their method balances and reduces the nodes' energy consumption while shortening the moving path of the mobile sink. However, this work does not present a mathematical optimization model to obtain optimal values; instead, it presents a metaheuristic that does not guarantee optimal values. In addition, only the sink is mobile, not the rest of the sensor nodes.
In [18], the authors propose a Stochastic Optimal Routing Algorithm (SORA) for high-loss WSNs. In SORA, the energy conservation and transmission delay reduction problems are first transformed into a stochastic optimization problem. However, the scenario they evaluate is completely static; that is, none of the network nodes are mobile.
In [19], the authors propose a routing algorithm that incorporates an ant colony optimization (ACO) algorithm. It uses an endocrine cooperative particle swarm optimization algorithm (ECPSOA) to improve several WSN routing metrics, such as end-to-end delay, power consumption, and communication cost. The algorithm is evaluated in a completely mobile network scenario, which makes it very similar to our proposal. However, it is a metaheuristic that, unlike our mathematical optimization model, does not guarantee optimal values.
In [20], the authors propose a mobility- and energy-aware cross-layer searching routing algorithm. It provides routes for nodes that forward data to each other either directly in a single hop or indirectly through multiple hops via neighbouring nodes. Because the algorithm works in both static and mobile scenarios, it is very similar to our proposal. However, it is a heuristic that, unlike our mathematical optimization model, does not guarantee optimal values. Likewise, in [13] the authors present a complete summary of algorithms for MWSN; however, no mathematical optimization model for a mobile wireless sensor network is proposed.
A summary of these related works is presented in Table 1. To interpret this table, consider the following observations:
• Static field refers to scenarios where no network node changes its position over time.
• Mobile field refers to scenarios where all network nodes can change their positions over time.
• Delay-sensitive field refers to scenarios where the delay is reduced as much as possible.
• Optimal values field indicates whether the work proposes a method that guarantees optimal values.
• Prediction field indicates whether the work proposes a method to predict future positions of network nodes.
According to Table 1, our proposal is the only one that satisfies all fields.
On the other hand, recall that we propose two prediction algorithms, one based on Markov Chains and another based on Deep Learning. Next, we describe the related works for these two types of algorithms. Note that our prediction algorithms use RSSI levels to estimate the distance between nodes, instead of obtaining accurate node positions through GPS devices, in order to reduce the energy consumption of the network and thus extend the network lifetime.
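A minimal sketch of how a distance estimate can be derived from a few RSSI samples, assuming the common log-distance path-loss model $d = d_0 \cdot 10^{(P_0 - \overline{RSSI})/(10 n)}$. The reference power `p0_dbm` measured at `d0_m`, the path-loss exponent `n`, and the function name are illustrative assumptions; the text does not specify which RSSI-to-distance mapping the prediction algorithms use.

```python
def rssi_to_distance(rssi_samples_dbm, p0_dbm=-40.0, d0_m=1.0, n=2.5):
    """Estimate the distance (m) to a neighbour from a few RSSI samples (dBm).

    Assumes a log-distance path-loss model with reference power p0_dbm at
    distance d0_m and path-loss exponent n. Averaging in dBm is a
    simplification that keeps the estimate cheap to compute on a sensor.
    """
    if not rssi_samples_dbm:
        raise ValueError("need at least one RSSI sample")
    mean_rssi = sum(rssi_samples_dbm) / len(rssi_samples_dbm)
    return d0_m * 10 ** ((p0_dbm - mean_rssi) / (10.0 * n))
```

With these parameters, two samples of -64 and -66 dBm give an estimate of 10 m. Averaging only a handful of samples reflects the idea of using few control packets, at the cost of a noisier estimate than dense RSSI or GPS localization would provide.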

2) MARKOV CHAINS RELATED WORKS
Our prediction algorithm based on Markov Chains uses the distance to neighbour nodes to determine the best forwarding node in terms of delay and energy consumption requirements. That is, our Markov Chains design does not collect the exact position of neighbour nodes (through GPS modules or a large number of RSSI measurements), because determining the exact position of each neighbour would waste too much energy and strongly increase the energy consumption of the network. Ideally, knowing the exact position of neighbour nodes would be best, because it would reveal the movement pattern of each neighbour with high certainty and thus allow the best forwarding node to be selected. However, locating a node accurately through RSSI measurements requires sending many control packets, which represents a high energy cost in wireless sensor networks. In other words, our algorithm uses only a few control packets carrying RSSI information to obtain a sense of the distance between pairs of nodes, instead of many control packets or GPS modules to obtain accurate node positions. In conclusion, to minimize the energy consumption of the network, our prediction algorithm based on Markov Chains determines the best forwarding node using only distances, without knowing the exact positions of nodes. For this reason, it cannot be compared against Markov Chains methods that use the exact positions of neighbour nodes, because the methods would not be comparable. Next, we describe several related works based on Markov Chains.
In [23], the authors propose a multiuser multivariate multiorder Markov model and a multimodal user mobility pattern prediction approach, which performs precise mobility pattern prediction based on a real-world GPS trajectory dataset. In other words, this method relies on GPS positions to predict trajectories and, for the reasons explained above, is not comparable with our work.
In [24], the authors propose integrating an attention technique into the Markov model to predict future locations. However, they predict locations from a GPS dataset containing user coordinates. As with [23], this method relies on GPS positions and is therefore not comparable with our work.
In [25], the authors propose a weighted Markov prediction model based on mobile user classification. A user's trajectory information is first extracted by analyzing real mobile communication data, and the complexity of the trajectory is measured using locations given by cellular base stations. That is, accurate positions are provided by cellular networks, a technology that is neither assumed in nor common to our wireless sensor network scenario. For this reason, this work is not comparable with ours either.
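To illustrate the distance-only idea described at the beginning of this subsection, the following sketch discretizes consecutive distance estimates to one neighbour into three hypothetical movement states (`closer`, `stable`, `farther`), estimates a first-order transition matrix by counting transitions, and predicts the most likely next state. The state definition, the tolerance `tol`, and all function names are assumptions for illustration; they are not the exact design of our Markov Chains algorithm.

```python
from collections import Counter, defaultdict

STATES = ("closer", "stable", "farther")


def to_states(distances, tol=0.5):
    """Map consecutive distance estimates to movement states."""
    states = []
    for prev, cur in zip(distances, distances[1:]):
        delta = cur - prev
        if delta < -tol:
            states.append("closer")
        elif delta > tol:
            states.append("farther")
        else:
            states.append("stable")
    return states


def transition_matrix(states):
    """Maximum-likelihood estimate P[a][b] = count(a -> b) / count(a -> *).

    Rows never observed fall back to a uniform distribution.
    """
    counts = defaultdict(Counter)
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = {}
    for a in STATES:
        total = sum(counts[a].values())
        matrix[a] = {b: counts[a][b] / total if total else 1.0 / len(STATES)
                     for b in STATES}
    return matrix


def predict_next_state(distances, tol=0.5):
    """Most likely next movement state of a neighbour, or None if unknown."""
    states = to_states(distances, tol)
    if len(states) < 2:
        # Too few observations to estimate transitions: assume persistence.
        return states[-1] if states else None
    row = transition_matrix(states)[states[-1]]
    return max(STATES, key=lambda s: row[s])
```

For the distance sequence 10, 11, 12, 12, 13, 14 the observed states are farther, farther, stable, farther, farther; from `farther` the chain moved to `farther` twice and to `stable` once, so the predicted next state is `farther`, suggesting that this neighbour is a poor forwarding candidate.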

3) DEEP LEARNING RELATED WORKS
Our prediction algorithm based on Deep Learning makes the same assumptions as the prediction algorithm based on Markov Chains; that is, it uses the distance to neighbour nodes to determine the best forwarding node in terms of delay and energy consumption. The difference is that the best forwarding node is selected by a prediction model based on a deep learning technique instead of a Markov Chains method. Likewise, for the reasons described above for the Markov Chains algorithm, our deep learning approach does not use the exact positions of neighbour nodes. In other words, both prediction algorithms, the Markov Chains approach and the Deep Learning approach, rely on distances instead of exact node positions, which makes them comparable. Next, we describe several related works that apply Deep Learning techniques to mobility and location prediction.
VOLUME 9, 2021
In [26], the authors propose ML techniques to learn the mobility of mobile mmWave (millimeter-wave communication) users and predict their moving directions. They use advanced MIMO antenna systems in cellular base stations to obtain exact user positions and, based on this accurate information, apply an ML approach to predict user mobility. We cannot compare this ML approach against our prediction algorithm based on Markov Chains for two reasons: first, it assumes exact node positions, which our Markov Chains approach does not; second, it relies on advanced antenna techniques suited to cellular networks, which are very difficult to deploy in wireless sensor networks.
In [27], the authors propose a neural network framework to analyze citywide human mobility based on raw GPS data. The learning model is built from exact user positions. For this reason, this ML approach is not comparable with our prediction algorithm based on Markov Chains.
In [28], the authors propose a deep learning approach to predict the exact future location of a node from RSSI measurements in a dynamic environment, using many RSSI control packets to determine the exact position of just one node. We cannot compare this ML approach against our prediction algorithm based on Markov Chains for two reasons: first, it determines exact node positions, which our Markov Chains approach does not; second, using many RSSI control packets to determine exact node positions would cause excessive energy consumption in our wireless sensor network.
In [29], the authors propose a deep learning approach to determine the exact location of a device through RSSI measurements and interference detection. As in [28], many RSSI control packets are used to establish the exact device position. We cannot compare this ML approach against our prediction algorithm based on Markov Chains for two reasons: first, it determines the exact device position, which our Markov Chains approach does not; second, using many RSSI control packets would cause excessive energy consumption in our wireless sensor network. Finally, we were unable to find works that build machine learning models based only on node distances instead of exact node positions. The methods used in these works require high energy consumption or technologies (advanced MIMO antenna systems) that are not feasible in wireless sensor networks. For this reason, we also propose a deep learning approach based on node distances, so that it can be compared against our Markov Chains approach under the same assumptions.
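Because the deep learning approach relies only on node distances, its training data can be built from distance time series. The sketch below shows one plausible way to do so, turning a series of distance estimates into sliding-window samples whose target is either the future distance or a 0/1 "out of range" label. The window length, the horizon, and the name `make_windows` are hypothetical; the actual dataset format proposed in this work may differ, and any regression or classification network could then be trained on these pairs.

```python
def make_windows(distances, window=3, horizon=1, comm_range=None):
    """Turn one distance time series into supervised (features, target) pairs.

    features -- the `window` most recent distance samples
    target   -- the distance `horizon` steps after the window or, if
                comm_range is given, 1 if that distance exceeds the range
                (the link would break) and 0 otherwise
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    samples = []
    last_start = len(distances) - window - horizon
    for start in range(last_start + 1):
        features = distances[start:start + window]
        future = distances[start + window + horizon - 1]
        target = future if comm_range is None else int(future > comm_range)
        samples.append((features, target))
    return samples
```

For example, the series 5, 6, 7, 8, 9 yields the pairs `([5, 6, 7], 8)` and `([6, 7, 8], 9)`; with a communication range of 8.5 the targets become 0 and 1, i.e., the second window predicts a disruption.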

C. CONTRIBUTIONS
• We propose a mathematical optimization model for MWSN in which all network nodes are mobile. Many works present a mixed scenario in which some nodes are static and some are mobile, and the works that do assume all network nodes are mobile propose algorithms rather than mathematical optimization models. For these reasons, we consider our proposal novel in MWSN.
• The results obtained by the mathematical optimization model can serve as a reference for evaluating the delay and energy consumption performance of new MWSN algorithms. In other words, the optimal values given by the model can be used to judge how good the results of a new algorithm for MWSN applications are, provided it pursues the same goals and uses the same parameters as the model.
• We propose an algorithm based on Markov Chains that predicts whether neighbouring nodes will move farther away or closer in order to avoid future interruptions of communication paths. This strategy allowed us to establish more reliable communication paths, which considerably reduced the end-to-end delay in scenarios with few nodes. In addition, to contrast the Markov Chains results, we propose a Deep Learning approach that selects the best forwarding node using only distances between nodes instead of exact node positions. This feature makes our approach distinctive, because most works use exact node positions to predict node mobility. In summary, with only a sense of the distance between nodes, our approaches are able to predict node mobility while minimizing delay and energy consumption in the network.
• While many algorithms are based on accurate positions given by GPS devices, our prediction algorithms use RSSI levels to avoid using GPS devices with the aim of reducing energy consumption in the network and, thus, extending the network lifetime.

A. MATHEMATICAL MODEL FORMULATION
In this section, we propose a multi-objective mathematical optimization model to build a path from a source node to the sink while minimizing the delay and energy consumption of the network. Because the model takes a long time to provide a solution, it is neither affordable nor scalable for real mobile wireless sensor network applications, which require solutions to be obtained as quickly as possible. Nevertheless, despite its long solution time, the model is used as an offline method with global information about the network, which allows us to obtain the best possible solution, that is, the optimal solution value for a specific network scenario. Consequently, the mathematical model always obtains the best results for all the metrics evaluated in Section IV, and its optimal solutions can be used as reference values to evaluate how close the performance of the algorithms proposed later comes to the optimum.
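Since the model's optimal values serve as references, a natural way to report how close an algorithm comes to them is the relative optimality gap. The helper below is a minimal sketch under that assumption (a minimization objective with a positive optimum); the function name and the small tolerance `tol` are illustrative choices, not part of the original evaluation.

```python
def optimality_gap(algorithm_value, optimal_value, tol=1e-9):
    """Relative gap (%) of a minimization result with respect to the optimum.

    Raises ValueError if the optimum is not positive or if the algorithm
    appears to beat the optimum (which signals a modelling mismatch).
    """
    if optimal_value <= 0:
        raise ValueError("optimal value must be positive")
    if algorithm_value < optimal_value - tol:
        raise ValueError("a feasible solution cannot beat the optimum")
    return 100.0 * max(0.0, algorithm_value - optimal_value) / optimal_value
```

For example, an algorithm whose path costs 5 when the model's optimum is 4 lies 25% above the optimum; a gap of 0% means the algorithm matched the model on that scenario.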

1) PROBLEM DESCRIPTION AND ASSUMPTIONS
In this section, we state and describe our problem in detail and present some assumptions that simplify our mathematical optimization model. The problem is described based on Figure 2:
• Mobile Network: Assume a mobile network in which the positions of network nodes change across time periods. Consequently, the link costs between network nodes also change across time periods, so at each time period the network has particular link costs that reflect the network state at that period.
In this sense, the network at a given time period is called a Network State. For instance, the network at time period 1 is called Network State 1, the network at time period 2 is called Network State 2, and so on. According to Figure 2.a, we have an initial network (Network State 1) composed of four nodes. Because these nodes form a network, there are interrelations between them, which we call Links. These links have a cost, which can represent, for example, the distance, the delay, or the energy consumption. In this work, we consider two types of cost: delay cost and energy consumption cost. In the next time period, the link costs of Network State 1 change, and the new interrelations between the nodes form Network State 2. As time advances, the network at Network State 2 becomes the network at Network State 3, which in turn becomes the network at Network State 4, and so on. For example, Figure 3.a shows a mobile network of 10 nodes at a given time, that is, at network state i. Because the nodes move in the next time frame, the link costs change, generating a new network state i + 1. As the network moves in each time frame, a new network state is generated to represent the network movement over time, as shown generically in Figure 2. The same logic applies to bigger scenarios, for example, the 20-node network in Figure 3.b.
• Nodes: Each node is denoted as $n_{it}$, where $i$ is the node number and $t$ is the network state of the node. Depending on the communication range, a node can communicate with another node in the direction described in Figure 2. For example, $n_{11}$ can communicate with $n_{21}$ and $n_{31}$.
• Buffers: In telecommunication networks, a router or a sensor (a node) can decide not to send its message and instead store it in a buffer until it is appropriate to send it to another node. In our model, this situation is represented as a link between $n_{11}$ and $n_{12}$, meaning that $n_{11}$ can store its message in its buffer, that is, in node $n_{12}$.
• Costs: As mentioned before, each link has a cost. For example, the cost of sending a message from $n_{11}$ to $n_{21}$ is $C_{11}^{21l}$. In general, $C_{it}^{jul}$ denotes the cost of carrying a message from node $i$ at state $t$ to node $j$ at state $u$ at Network State $l$. As mentioned previously, we have two types of costs, delay cost and energy consumption cost; $C_{it}^{jul}$ is a general form, used for illustrative purposes, that represents any cost in our network.
• Directed graph: In Figure 2, our goal is to carry a message from node 1 to nodes 2 and 4. Thus, the Source node is node 1, and the Destination nodes are nodes 2 and 4. A directed graph is therefore constructed from the Source node to the Destination nodes, with links pointing toward the Destination nodes.
• Goal: Our goal is to carry a message from a Source node to a Destination node using neighbouring nodes as forwarding nodes and, if necessary, buffers to wait for an appropriate moment to send the message. That is, we must find the minimum cost path between a Source node and a Destination node while the network changes over time, as represented by the Network States. Additionally, for simplicity, we assume that only one link can be selected for sending the message in each Network State. This means that if a message is at node $n_{11}$, this node can, at Network State 1, either send the message to only one neighbour, $n_{21}$ or $n_{31}$, or store it in its buffer, that is, $n_{12}$.
• Example Result: According to the example in Figure 2.b and based on the link costs, the minimum cost path from the Source node $n_{11}$ to Destination node 4 is composed of the highlighted links $n_{11} \to n_{31}$, $n_{32} \to n_{33}$, $n_{33} \to n_{34}$, and $n_{34} \to n_{44}$. In other words, $X_{11}^{3114} = 1$, $X_{32}^{3324} = 1$, $X_{33}^{3434} = 1$, and $X_{34}^{4444} = 1$. Likewise, the minimum cost path from the Source node $n_{11}$ to Destination node 2 is composed of the highlighted links $n_{11} \to n_{12}$ and $n_{12} \to n_{22}$; that is, $X_{11}^{1212} = 1$ and $X_{12}^{2222} = 1$. Notice that the solution is not the direct link $n_{11} \to n_{21}$, because its cost, 6, is higher than that of the optimal solution provided by our proposal, 4.
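The time-expanded view of Figure 2, where each node $n_{it}$ is a copy of node $i$ at Network State $t$ and buffer links connect $n_{it}$ to $n_{i(t+1)}$, can be illustrated with a plain shortest-path search. This sketch is a single-objective Dijkstra search, not our multi-objective optimization model, and it does not enforce the one-link-per-Network-State restriction. The individual link costs in `example_edges` are hypothetical and were chosen only so that the totals match those stated above (6 for the direct link $n_{11} \to n_{21}$, 4 for the buffered path).

```python
import heapq


def min_cost_path(edges, source, destination):
    """Dijkstra search on a time-expanded graph.

    edges       -- dict: (node, state) -> list of ((node, state), cost)
    source      -- (node, state) where the message starts, e.g. (1, 1) for n11
    destination -- node id; reaching any copy (destination, t) ends the search
    Returns (total_cost, path) or (float("inf"), []) if unreachable.
    """
    dist = {source: 0.0}
    parent = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u[0] == destination:
            path = [u]
            while path[-1] in parent:
                path.append(parent[path[-1]])
            return d, path[::-1]
        for v, cost in edges.get(u, []):
            new_d = d + cost
            if new_d < dist.get(v, float("inf")):
                dist[v] = new_d
                parent[v] = u
                heapq.heappush(heap, (new_d, v))
    return float("inf"), []


# Hypothetical costs: only the totals (6 direct, 4 via the buffer) follow the text.
example_edges = {
    (1, 1): [((2, 1), 6.0), ((3, 1), 2.0), ((1, 2), 1.0)],  # (1, 2) is n11's buffer
    (1, 2): [((2, 2), 3.0)],
}
```

Calling `min_cost_path(example_edges, (1, 1), 2)` returns cost 4 with the path $n_{11} \to n_{12} \to n_{22}$, i.e., waiting one Network State in the buffer beats the direct link of cost 6.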

2) SETS, PARAMETERS AND VARIABLES
Next, the sets, parameters, and decision variables of the mathematical optimization model are summarized in Table 2.

3) OBJECTIVE FUNCTIONS
The objective function of the mathematical optimization model (Equation 1) combines the delay and energy consumption requirements in a single expression. If only one function were considered, for example the delay function, the solution would minimize the delay without taking the energy consumption into account. This is very disadvantageous in wireless sensor networks, whose limited energy must be preserved as much as possible. Likewise, if only the energy consumption function were considered, the solution would ignore the appearance of long delays, which would have a negative impact on delay-sensitive wireless sensor networks.
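As a hedged sketch only, a weighted-sum objective consistent with this description can be written over the decision variables $X_{it}^{juld}$. This is not necessarily the exact form of Equation 1. It is consistent with the weights $w_1$ and $w_2$ used in Section IV. The labels $C_{it}^{jul,\mathrm{delay}}$ and $C_{it}^{jul,\mathrm{energy}}$ for the two cost types are our own, as are the conditions on the weights:

$$\min \; Z = w_1 \sum_{d}\sum_{i,t}\sum_{j,u,l} C_{it}^{jul,\mathrm{delay}}\, X_{it}^{juld} \;+\; w_2 \sum_{d}\sum_{i,t}\sum_{j,u,l} C_{it}^{jul,\mathrm{energy}}\, X_{it}^{juld}, \qquad w_1 + w_2 = 1,\; w_1, w_2 \ge 0$$

With $w_1 = 1$ this form reduces to pure delay minimization, and with $w_2 = 1$ to pure energy minimization. These are the two single-objective cases the paragraph above argues against.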

4) CONSTRAINTS
Equation 1 is the general objective function, which is composed of two objective functions: one based on the delay cost and the other on the energy consumption cost. The constraints to which it is subject are explained in the following items:
• Destination State Constraints (Expressions 2 to 10): The following expressions refer to the Destination State, that is, the network state at which a Destination node is found at the minimum possible cost.
- Defining $D_{jl}^{d}$: $D_{jl}^{d}$ identifies the Destination State l at which a Destination node j is found at the minimum possible cost. Expression 2 prevents $D_{jl}^{d}$ from being one at the first state. Equation 3 prevents $D_{jl}^{d}$ from being one for nodes other than the destination node d.
- Defining $DS_{l}^{d}$: $DS_{l}^{d}$ determines the Destination State l at which Destination node d has been found at minimal cost. Expression 4 identifies the state l at which $D_{jl}^{d}$ was selected. Equation 5 ensures that only one destination state is possible. Expression 6 states that the destination state cannot be the first state.
- Selecting forwarding nodes: A forwarding node is the node selected at each state for building the minimum cost path. Expressions 7 and 8 limit to one the number of $Y_{jl}^{d}$ selected in each state before the Destination State. Equation 9 sets to zero all $Y_{jl}^{d}$ in states after the Destination State. Expression 10 allows only one link to the Destination node across all states; that is, only one state is selected, and for the rest of the states the link must be zero.

Because Expression 6 excludes the first state as a Destination State, a special case must be handled separately: the minimum cost path to reach a destination node d could lie at the first network state. For this reason, a post-processing step is applied. If there is a direct link between the source node and destination node d whose cost is less than the cost of the solution found by the mathematical optimization model, the optimal solution is at the first network state. Otherwise, the solution is at the future network state calculated by the mathematical optimization model. Notice that our multi-objective mathematical optimization model was designed to find paths from a single source node to multiple sinks (destination nodes). However, as shown in the Results Section (IV), the network scenario simulation only considers one source node and one sink, so our mathematical model is configured to find a path between one source node and one sink.
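The post-processing rule above is a single comparison. The following sketch shows it under the assumption that the direct first-state links are available as a dictionary of weighted costs and that paths are lists of `(node, state)` pairs. The function name and input format are hypothetical.

```python
def post_process(first_state_links, source, dest, model_cost, model_path):
    """Keep the model's solution unless a direct first-state link is cheaper.

    first_state_links maps (i, j) -> weighted cost of a direct link i -> j at
    Network State 1 (hypothetical input format). Paths are [(node, state), ...].
    """
    direct = first_state_links.get((source, dest))
    if direct is not None and direct < model_cost:
        return direct, [(source, 1), (dest, 1)]   # optimum lies at the first state
    return model_cost, model_path                  # optimum lies at a later state


model = (4.0, [(1, 1), (1, 2), (2, 2)])
print(post_process({(1, 2): 6.0}, 1, 2, *model))  # model solution is kept
print(post_process({(1, 2): 3.0}, 1, 2, *model))  # (3.0, [(1, 1), (2, 1)])
```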

B. MOBILITY PREDICTION ALGORITHM BASED ON MARKOV CHAINS
In this section, we propose a mobility prediction algorithm based on Markov Chains to estimate future distances between nodes in a mobile network. Forecasting future distances helps us determine whether future movements of nodes will cause communication disruptions in paths. Managing the information provided by the mobility prediction algorithm can help decrease communication disruptions and, therefore, reduce the end-to-end delay in the mobile network. The reasons why mobility prediction can reduce the end-to-end delay of the network were described in Section II-A, specifically in Figures 1.a and 1.b. To be aware of network mobility, we use RSSI (Received Signal Strength Indicator) measurements, which provide an approximate measure of the distance between a pair of nodes. Specifically, our mobility prediction method allows each network node to estimate future distances (RSSI measurements) of neighbouring nodes to determine whether they will be farther or closer at a future time. In summary, we estimate future RSSI measurements with Markov Chains to determine whether a node will cause a communication disruption at a future time, and we use this information to minimize the delay experienced in the network. How we manage this information is explained as follows.

As shown in Figure 4.a), suppose we have a network that consists of two nodes, $n_k$ and $n_l$, where $n_l$ is a neighbouring node of $n_k$. The network evolves across two times, $t_1$ and $t_2$. At time $t_1$, node $n_l$ is located at a certain distance from $n_k$. We want to predict whether, at time $t_2$, $n_l$ will be farther from $n_k$, closer to it, or at the same distance as at $t_1$.
Additionally, according to Figure 4.b), there is a minimum and a maximum distance at which $n_l$ can be located to establish a communication link with $n_k$. At the minimum distance, $n_l$ will have a maximum RSSI, $RSSI_{max}$, and at the maximum distance, $n_l$ will have a minimum RSSI, $RSSI_{min}$. At $t_2$, $n_l$ could be located at any distance between $RSSI_{min}$ and $RSSI_{max}$. Our goal is to estimate the location between $RSSI_{min}$ and $RSSI_{max}$ at which $n_l$ will be at a future time (in this case, $t_2$). Theoretically, there are infinitely many locations between $RSSI_{min}$ and $RSSI_{max}$. In our model, however, we assume discrete, equally spaced locations, which we call states. Thus, at a future time $t_2$, $n_l$ could be at $S_1, S_2, \dots, S_r, \dots, S_G$, where G is the maximum number of states. The initial probability of $n_l$ being at any state $S_i$ is $1/G$; this is called the Initial Probability Distribution of set S ($\pi$):

$$\pi = \left(\tfrac{1}{G}, \tfrac{1}{G}, \dots, \tfrac{1}{G}\right)$$

According to Figure 5.a), suppose we want to know the probability of going from state $S_2$ to state $S_4$. This probability is calculated from $N(S_i, S_j)$, the number of times that state $S_j$ follows state $S_i$, and the same expression applies to the rest of the transition probabilities. In this way, we obtain the probability $P_{ij}$ of going from any state $S_i$ to any state $S_j$. These probabilities can be arranged in a matrix called the Transition Matrix:

$$T = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1G} \\ P_{21} & P_{22} & \cdots & P_{2G} \\ \vdots & \vdots & \ddots & \vdots \\ P_{G1} & P_{G2} & \cdots & P_{GG} \end{pmatrix}$$

As shown in Figure 5.b), suppose that at the current time $t_1$, $n_l$ is at state $S_3$, and we want to estimate its state at a future time $t_p$. For this purpose, the Transition Matrix is used to obtain the probabilities $P_{s_1}, P_{s_2}, \dots, P_{s_G}$ of $n_l$ being at each state at $t_p$, and the most probable state is selected:

$$S_p = \arg\max_{S_r \in \{S_1, \dots, S_G\}} P_{s_r} \qquad (20)$$

According to Expression 20, $n_k$ obtains the most probable state at which $n_l$ will be at time $t_p$. This information can then be used for routing decisions that reduce the delay caused by communication disruptions in paths. More precisely, these routing decisions are made by our routing algorithm, which, together with our mobility prediction algorithm, allows us to reduce the end-to-end delay and minimize the energy consumption of network nodes.
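The following Python sketch illustrates the model above under stated assumptions. RSSI readings are quantized into G equally spaced states. The transition probabilities are estimated with the usual count ratio $P_{ij} = N(S_i,S_j) / \sum_k N(S_i,S_k)$. The state distribution p steps ahead is taken as a row of $T^p$. The count-ratio estimator, the matrix-power step, the uniform fallback for never-observed rows (mirroring $\pi$), the RSSI range and all function names are our assumptions, not details given in the text.

```python
import numpy as np


def rssi_to_state(rssi, rssi_min, rssi_max, G):
    """Map an RSSI reading to one of G equally spaced states (0-based: S_1 -> 0)."""
    rssi = min(max(rssi, rssi_min), rssi_max)          # clip to the link range
    width = (rssi_max - rssi_min) / G
    return min(int((rssi - rssi_min) / width), G - 1)   # RSSI_max falls in S_G


def transition_matrix(states, G):
    """Estimate P_ij = N(S_i, S_j) / sum_k N(S_i, S_k) from a state sequence."""
    counts = np.zeros((G, G))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1                                # S_j followed S_i once more
    T = np.full((G, G), 1.0 / G)                         # rows never observed stay uniform
    seen = counts.sum(axis=1) > 0
    T[seen] = counts[seen] / counts[seen].sum(axis=1, keepdims=True)
    return T


def predict_state(T, current, p):
    """Most probable state after p steps, starting from state `current`."""
    dist = np.zeros(T.shape[0])
    dist[current] = 1.0
    dist = dist @ np.linalg.matrix_power(T, p)           # (P_s1, ..., P_sG)
    return int(np.argmax(dist)), dist


# n_k observes the RSSI of neighbour n_l over time (hypothetical readings, dBm)
readings = [-88, -78, -68, -55, -52, -51]
states = [rssi_to_state(r, -90, -50, 4) for r in readings]   # [0, 1, 2, 3, 3, 3]
T = transition_matrix(states, 4)
print(predict_state(T, states[1], 1)[0])   # 2: a higher-RSSI state, so n_l moves closer
```

Higher state indices correspond to higher RSSI, that is, shorter distances. A predicted move to a higher state therefore suggests that the neighbour is approaching.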
Based on the stochastic model described above, we present our mobility prediction algorithm, whose pseudocode is given in Algorithm 2.
This algorithm aims to find the forwarding node with the highest probability of being near the current node, in order to find the paths between a source node and multiple sinks as fast as possible. The Transition Matrix is recalculated for each node at each time period t, so that the most appropriate forwarding node in terms of delay and energy consumption can be selected. Lines 1 to 3 initialize the sensor nodes and the sink. Line 4 indicates that, for each time period, the algorithm checks where the data packet is, so that it can be sent through several forwarding nodes until it reaches the sink (line 5). If the data packet is at node i, the list of neighbours V is obtained (line 6). From that list, the best forwarding node is obtained through the Transition Matrix (lines 7 and 8). If node i has no neighbour nodes, it stores the data packet until the next time period (lines 9 and 20-23). If the sink is among the neighbour nodes, the data packet is sent to the sink, and the path has been built (lines 10 to 13). If a connected node is among the neighbour nodes, the data packet is sent to it, because it is more likely to reach the sink quickly (lines 14 to 16). Finally, if neither the sink nor a connected node is among the neighbour nodes, the data packet is sent to the best forwarding node obtained in lines 7 and 8. This process is repeated in the next time period until the sink is reached. Once the sink is reached, the path between the source node and the sink has been built, and the algorithm finishes. In terms of computational complexity, we use Big-O notation to express the time complexity of our Markov Chains approach.
According to Algorithm 2, the time complexity of our prediction algorithm based on Markov Chains is $O(T \cdot V \cdot G)$. Here, T is the total number of time periods (line 4), V is the total number of neighbour nodes of a node $N_i$ (line 6), and G is the number of discrete states used to calculate the Transition Matrix (line 7).
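The forwarding priorities of Algorithm 2 can be sketched as a hop-by-hop loop: one hop per time period, buffering when there is no usable neighbour, the sink first, then a connected node, then the neighbour with the best predicted score. In this sketch, `score` is a hypothetical callback standing in for the prediction, for example the predicted RSSI state from the Transition Matrix. The function name, the callback interface and the rule that the packet is never returned to a node already on the path (a simple loop guard) are our assumptions. The sketch also keeps forwarding hop by hop rather than following a connected node's stored route.

```python
def route_packet(source, sink, neighbours_at, connected_at, score, max_periods):
    """Hop-by-hop forwarding in the spirit of Algorithm 2 (hypothetical interface).

    neighbours_at(t, i) -> iterable of neighbours of node i in period t
    connected_at(t)     -> set of nodes that currently know a path to the sink
    score(t, i, j)      -> predicted suitability of j as forwarder for i
                           (e.g. predicted RSSI state; higher is better)
    Returns the list of visited nodes, or None if max_periods elapse.
    """
    holder, path = source, [source]
    for t in range(max_periods):
        i = holder
        # Loop guard (assumption): never hand the packet back to a visited node.
        candidates = [j for j in neighbours_at(t, i) if j not in path]
        if not candidates:
            continue                              # keep the packet in the buffer
        if sink in candidates:
            path.append(sink)                     # the path has been built
            return path
        connected = [j for j in candidates if j in connected_at(t)]
        if connected:
            holder = connected[0]                 # a connected node knows the way
        else:
            holder = max(candidates, key=lambda j: score(t, i, j))
        path.append(holder)
    return None


adj = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(route_packet(1, 4, lambda t, i: adj[i], lambda t: set(), lambda t, i, j: -j, 5))  # [1, 2, 4]
print(route_packet(1, 4, lambda t, i: adj[i], lambda t: {3}, lambda t, i, j: -j, 5))    # [1, 3, 4]
```

Because the predictor only enters through `score`, the same loop structure applies to any method that ranks neighbours.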

C. MOBILITY PREDICTION ALGORITHM BASED ON DEEP LEARNING
In this section, we also propose a deep learning approach to predict future distances between nodes in a mobile network. Predicting future distances helps us determine whether future movements of nodes will cause communication disruptions in paths. Handling the information provided by the mobility prediction algorithm can therefore decrease communication disruptions and reduce the end-to-end delay in the mobile network. The details of this approach are described in the following items.
VOLUME 9, 2021

1) DEEP LEARNING APPROACH
We propose a supervised deep multi-layer perceptron (DMLP) neural network for predicting future node positions in a mobile network, under the same considerations as the Markov Chains approach. These considerations are described as follows:
• To be aware of network mobility, we use RSSI (Received Signal Strength Indicator) measurements, which provide an approximate measure of the distance between a pair of nodes. Specifically, our deep learning method allows each network node to estimate future distance measurements (RSSI levels) of neighbouring nodes to determine whether they will be farther or closer at a future time. This helps us determine whether a node will cause a communication disruption at a future time, and handling this information helps minimize the delay and the energy consumption experienced in the network.
• To minimize the delay and the energy consumption in the network, this deep learning approach aims to find the forwarding node with the best chances of being near the current node (the node that holds a packet and has to send it to a neighbour node), in order to build a path between a source node and a sink as fast as possible.
Taking into account the previous considerations, Figure 6 shows a diagram that summarizes the operation of our deep learning approach.
This diagram (Figure 6) is divided into two phases, an Offline Phase and an Online Phase, which are described in detail as follows:
• Offline Phase: The offline phase builds a deep learning model that classifies the best forwarding node to be selected in terms of minimum distance and energy consumption. A dataset composed of many samples feeds a training phase to obtain a learning model for this classification task. The dataset and the classification proposal are explained in detail in the Dataset section (III-C2). This offline phase is performed outside the network simulator described in Section III-D.
• Online Phase: The online phase predicts in real time the best forwarding node to be selected in terms of minimum distance and energy consumption. In detail, each node i runs an internal prediction phase to determine the best forwarding node whenever it is required, that is, when node i has a packet that must be sent to a forwarding node in order to build a path to the sink.

2) DATASET
A dataset composed of many samples feeds a training phase to classify the best forwarding node to be selected in terms of minimum distance and energy consumption. The dataset represents a node i that has several neighbour nodes, each at a certain distance (RSSI level) from node i and with a certain energy consumption level. These two values, distance and energy consumption, must be recorded across time in order to predict the best forwarding node in the future. In detail, each neighbour node j must describe its distance (with respect to a node i that holds a message packet) and its energy consumption level at different times, in order to know which neighbour node will be farther or closer in the future (see Figure 7). The samples of this dataset are shown in Figure 8. The dataset is described in detail as follows:
• $s_1$ to $s_k$ are all the samples from which the deep learning model learns. Notice that this learning process corresponds to the offline phase.
• Each $s_i$ has two types of values: distance values and energy consumption values.
• In each $s_k$, distance values are assigned to each neighbour node j (in this case, node 1, node 2, …, node m).
These values are assigned manually to fill out the training data. In addition, each distance value of a neighbour node j is associated with a specific time $t_p$. This represents the fact that the neighbour node is moving and, therefore, that its distance changes across time. Each neighbour node j has distance values from $t_1$ to $t_n$ that represent its movement.
• Similarly to the distance values, in each $s_k$ energy consumption values are also assigned to each neighbour node j. These values are also assigned manually for the training phase. Each energy consumption value of a neighbour node j is associated with a specific time $t_p$, to represent the fact that this value can change across time. Each neighbour node j may have different energy consumption values from $t_1$ to $t_n$.
• For each $s_k$, the best forwarding node j in terms of distance and energy consumption is labeled. In other words, one of the nodes 1 to m is selected as the best forwarding node. This is therefore a multiclass classification problem, in which one of many categories (neighbour nodes) must be labeled as the correct one.
• In a previous item, we said that each node i has several neighbour nodes, from node 1 to node m. Initially, m was assumed to be N − 1, where N is the total number of network nodes; 1 is subtracted to exclude node i itself. However, after several tests with a network of at most 50 nodes, a communication radius of 20 m and the Gauss-Markov mobility model, we found that, on average, the maximum number of neighbour nodes of a node i was 8. This significantly reduced the number of features, from N − 1 to only 8.
• In a previous item, we said that each neighbour node j has a distance and an energy consumption value for each time from $t_1$ to $t_n$. Three values of n were considered: 5, 10, and 15. n = 5 means that the last five values of distance and energy consumption are used to determine the best forwarding node. Likewise, n = 10 and n = 15 use the last ten and fifteen values, respectively. Several values of n must be tested to determine which gives the best results in terms of delay and energy consumption in the network simulator described later.
• Considering the previous items, with m = 8, n at most 15, and both distance and energy consumption values, each sample $s_k$ has up to 240 columns (8 × 15 × 2). As mentioned previously, an additional column is added to each $s_k$ to label it, indicating the best forwarding node j in terms of distance and energy consumption.
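The following sketch shows how one sample $s_k$ could be flattened into the 8 × 15 × 2 = 240 feature columns described above, under assumptions. The column order (neighbour, then time, then distance or energy), zero-padding for absent neighbours, the constant names and the function `build_sample` are hypothetical. The label is kept as a separate value.

```python
import numpy as np

M_NEIGHBOURS = 8   # m: neighbour slots per sample
N_STEPS = 15       # n: history length per neighbour (5, 10 or 15 in the text)


def build_sample(distance, energy):
    """Flatten one sample s_k into m * n * 2 features (240 for m = 8, n = 15).

    distance[j] and energy[j] are the time series of neighbour j; each must
    hold at least N_STEPS values. Missing neighbour slots are zero-padded
    (assumption). Layout: neighbour-major, then time, then [distance, energy].
    """
    if len(distance) != len(energy) or len(distance) > M_NEIGHBOURS:
        raise ValueError("need matching series for at most M_NEIGHBOURS neighbours")
    x = np.zeros((M_NEIGHBOURS, N_STEPS, 2))
    for j, (d_row, e_row) in enumerate(zip(distance, energy)):
        x[j, :, 0] = d_row[-N_STEPS:]      # last n distance (RSSI) values
        x[j, :, 1] = e_row[-N_STEPS:]      # last n energy consumption values
    return x.reshape(-1)


dist = [[float(t) for t in range(20)] for _ in range(3)]    # 3 neighbours, 20 steps
ener = [[100.0 + t for t in range(20)] for _ in range(3)]
features = build_sample(dist, ener)
label = 0   # index of the best forwarding node, chosen by hand as in the text
print(features.shape)   # (240,)
```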
Once each sample $s_k$ is labeled, a training phase is launched to create a learning model from this dataset. The dataset was divided into two groups: one for the training phase and another for the prediction (testing) phase. Table 3 summarizes the parameters and results for the training and testing phases. For this classification problem, we used the Deep Learning Toolbox in MATLAB. In detail, we configured a five-layer neural network with a softmax activation function. For each sample $s_k$, once the model has been trained, the softmax activation function gives the probability of each neighbour node being the best forwarding node. In other words, if node i has eight neighbour nodes with certain values of distance and energy consumption across time, the softmax activation function outputs a list $\{P_1, P_2, \dots, P_8\}$, where $P_i$ is the probability of neighbour node i being the best forwarding node and $\sum_{i=1}^{8} P_i = 1$. The best forwarding node is therefore the neighbour node with the highest probability, that is, $P_{best} = \max\{P_1, P_2, \dots, P_8\}$. The training and testing results are shown in Table 3.
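The following sketch illustrates the final selection step, applied to the output scores (logits) of a trained network: a softmax followed by picking $P_{best}$. It does not reproduce the MATLAB model. The logits are hypothetical. Masking empty neighbour slots with $-\infty$ before the softmax, so that they get probability zero and the remaining probabilities still sum to one, is our assumption.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax; entries equal to -inf get probability 0."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()


def best_forwarder(logits, present):
    """Return (index of P_best, probabilities) over neighbour slots that exist."""
    masked = np.where(present, logits, -np.inf)   # empty slots cannot be chosen
    p = softmax(masked)
    return int(np.argmax(p)), p


logits = [2.0, 1.0, 3.0, 0.5, 0.0, 0.0, 0.0, 0.0]   # hypothetical network outputs
present = [True, True, False, True, False, False, False, False]
idx, p = best_forwarder(logits, present)
print(idx)   # 0: slot 2 has the largest logit but is empty
```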
From Table 3, the following details should be noted:
• The deep learning model obtained in the training phase is later applied in the network simulator to select the best forwarding node for a node i when it has a packet that must be sent to one of its neighbour nodes j.
• The higher n is, the better the accuracy, precision, recall, and F1 values. In other words, a higher value of n gives the learning model more distance information with which to predict the distances of its neighbours.
Based on the description of our Deep Learning approach, we present the pseudocode showing how this approach is incorporated into our network simulator (described later) to select the best forwarding neighbour node in terms of delay and energy consumption. The pseudocode is shown in Algorithm 3.
This algorithm aims to find the best forwarding node in terms of delay and energy consumption, following the explanation in Section III-C2. Lines 1 to 3 initialize the sensor nodes and the sink. Line 4 indicates that, for each time period, the algorithm checks where the data packet is, so that it can be sent through several forwarding nodes until it reaches the sink (line 5). If the data packet is at node i, the list of neighbours V is obtained (line 6). From that list, the best forwarding node is obtained through the Prediction Phase of the Deep Learning approach (lines 7 and 8). If node i has no neighbour nodes, it stores the data packet until the next time period (lines 9 and 20-23). If the sink is among the neighbour nodes, the data packet is sent to the sink, and the path has been built (lines 10 to 13). If a connected node is among the neighbour nodes, the data packet is sent to it, because it is more likely to reach the sink quickly (lines 14 to 16). Finally, if neither the sink nor a connected node is among the neighbour nodes, the data packet is sent to the best forwarding node obtained in lines 7 and 8. This process is repeated in the next time period until the sink is reached. Once the sink is reached, the path between the source node and the sink has been built, and the algorithm finishes. In terms of computational complexity, according to Algorithm 3, our prediction algorithm based on Deep Learning is $O(T \cdot V)$, where T is the total number of time periods (line 4) and V is the total number of neighbour nodes of a node $N_i$ (line 7). This bound treats each prediction with the already trained network as a constant-time step, since training is performed offline. The complexity of the Deep Learning approach, $O(T \cdot V)$, is therefore lower than that of the Markov Chains method, $O(T \cdot V \cdot G)$. For this reason, in terms of computational complexity, the Deep Learning approach is recommended for our problem.

D. NETWORK SIMULATOR
To test our prediction distributed routing algorithms based on Markov Chains and Deep Learning, we designed a Mobile Wireless Sensor Network Simulator in MATLAB, which has the following basic network components:
• Destination node: The final node that receives a data message. In our simulations, this node is always the last network node.
• Source node: This node has a data message that must arrive at the destination node. In our simulations, this node is always the first network node.
• Connected node: If a message arrives at this node, the node knows the path to the destination node. This technique helps find the sink faster when the data packet is close to it.
• Forwarding node selection: When a node has a data message, this process selects a neighbour node as the forwarding node according to the following priorities. If the destination node is among the neighbour nodes, the forwarding node is the destination node. If the destination node is not among the neighbour nodes but a connected node is, the forwarding node is the connected node. If neither the destination node nor a connected node is among the neighbour nodes, the forwarding node is the node obtained by the selected prediction method, either the Markov Chains approach or the Deep Learning approach.
• Sink refreshing: This process determines which nodes are connected nodes in each period. Refreshing is required because, due to network mobility, nodes that were connected in a previous period may no longer be connected in the next period.
• Loop detection: A very important process that prevents a packet from remaining in a loop.
• Prediction at each k-state with the Markov Chains approach: At each network state, the Transition Matrix (T) is calculated for all network nodes except the destination node. Recall that this Transition Matrix stores the probability of each node being at a certain distance level with respect to its neighbour nodes.
• Prediction at each k-state with the Deep Learning approach: At each network state and for each node i, the last n distance and energy consumption values are used to determine the best forwarding node.
• Prediction for selecting a forwarding node: As mentioned before, if neither the destination node nor a connected node is among the neighbour nodes, the forwarding node is the node given by the prediction method. In the Markov Chains approach, this forwarding node is selected based on the information given by the Transition Matrix (see lines 7 and 8 in Algorithm 2). In the Deep Learning approach, it is selected based on the information given by the Prediction Phase (see lines 7 and 8 in Algorithm 3).
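One plausible way to implement sink refreshing is to rebuild the connectivity graph from the current positions and run a breadth-first search outward from the sink. Under this assumption, every node the search reaches is a connected node and stores its next hop toward the sink. Neighbourhood is modelled as a unit disk of radius $r_c$ (20 m, as in Table 4). The function names and the optional `max_hops` limit are hypothetical. Following the stored next-hop pointers within one period always moves one hop closer to the sink, so it cannot loop.

```python
import math
from collections import deque


def neighbour_lists(positions, rc):
    """Unit-disk graph: i and j are neighbours if their distance is at most rc."""
    return {i: [j for j in positions if j != i and math.dist(positions[i], positions[j]) <= rc]
            for i in positions}


def refresh_connected(positions, sink, rc=20.0, max_hops=None):
    """Sink refreshing sketch: breadth-first search outward from the sink.

    Returns {node: next_hop_towards_sink}; every key except the sink is a
    connected node for the current period. max_hops (hypothetical parameter)
    optionally limits how far from the sink a node may be and still count.
    """
    adj = neighbour_lists(positions, rc)
    next_hop, depth = {sink: None}, {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        if max_hops is not None and depth[u] >= max_hops:
            continue
        for v in adj[u]:
            if v not in next_hop:            # first visit = fewest hops to the sink
                next_hop[v], depth[v] = u, depth[u] + 1
                queue.append(v)
    return next_hop


positions = {1: (0.0, 0.0), 2: (15.0, 0.0), 3: (30.0, 0.0), 4: (80.0, 80.0)}
print(refresh_connected(positions, sink=3))   # {3: None, 2: 3, 1: 2}
```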

E. ENERGY MODEL
The energy consumption model must take some special considerations into account. If a node has to send a data packet of K bits to another node located at distance D, the following expressions give the energy consumption in the transmitter node and in the receiver node. In the transmitter node, the consumption is $E_{elec} + E_{amp}$, where $E_{elec}$ is the energy consumed for codification, modulation and filtering, and $E_{amp}$ is the energy consumed by the transmitter power amplifier. In the receiver node, the consumption corresponds to $E_{amp}$. The corresponding expressions for the transmitter (Expression 21) and the receiver (Expression 22) follow [19]. Expression 21 yields a higher energy consumption than Expression 22, because transmission requires an extra consumption for codification, modulation and filtering ($E_{elec}$) in addition to the energy consumed to amplify the signal ($E_{amp}$). In the context of this work, two types of packets can be sent: data packets and control packets. A data packet carries information collected by a source sensor node, which must then build a path to send this data packet to the sink. When a data packet is sent to a neighbour node, Expression 21 is applied to the sending node and Expression 22 to the receiving node. The data packet is thus forwarded several times through different network nodes until it reaches the sink. The data packet size is the value indicated in Table 4. A control packet, on the other hand, is used to collect information about the distance of neighbour nodes in order to build a path according to our prediction algorithms, that is, the prediction algorithm based on Markov Chains and the prediction algorithm based on Deep Learning.
The control packet size is the value indicated in Table 4.
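Expressions 21 and 22 are not reproduced in this text. As a hedged illustration of how per-hop energy accumulates along a path, the sketch below uses the widely cited first-order radio model. In that model, transmitting k bits over distance d costs $E_{elec} k + \varepsilon_{amp} k d^2$ and receiving k bits costs $E_{elec} k$. This common formulation charges the receiver the electronics energy, so it may not match Expression 22 as described above. The constants are illustrative values often used with this model, not the values of Table 4, and the function names are hypothetical.

```python
E_ELEC = 50e-9      # J/bit, electronics energy (illustrative, not from Table 4)
EPS_AMP = 100e-12   # J/bit/m^2, amplifier energy (illustrative, not from Table 4)


def tx_energy(k_bits, d):
    """First-order radio model, transmitter: electronics + amplifier (d^2 loss)."""
    return E_ELEC * k_bits + EPS_AMP * k_bits * d ** 2


def rx_energy(k_bits):
    """First-order radio model, receiver: electronics energy only."""
    return E_ELEC * k_bits


def path_energy(hop_distances, k_bits):
    """Energy to forward one k-bit packet over a multi-hop path (Tx + Rx per hop)."""
    return sum(tx_energy(k_bits, d) + rx_energy(k_bits) for d in hop_distances)


# a hypothetical 4000-bit data packet forwarded over three hops (distances in m)
print(path_energy([10.0, 15.0, 20.0], 4000))
```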

F. MOBILITY MODEL
Regarding node movement, the present approach was evaluated with a Gauss-Markov mobility model [9], [20]. The model was configured so that network mobility is not totally random and is therefore predictable; if sensor movements were totally random, there would be no reason to apply a prediction method. In other words, network mobility must be predictable to some extent, since we are dealing with sensors attached to objects, animals, or humans, whose movements are not totally random but follow a certain movement pattern.
In this model, the speed and movement direction of a node at each instant are calculated only from their values at the previous instant (Expressions 23 and 24). In these expressions, $v_n$ and $\theta_n$ denote the new speed and direction of the mobile node at time interval n, respectively. $0 \le \alpha \le 1$ is the tuning parameter that varies the degree of randomness. $\bar{v}$ and $\bar{\theta}$ are the expected values of the speed and direction of the Gauss-Markov random process, respectively. $v_{x_{n-1}}$ and $\theta_{x_{n-1}}$ are Gaussian-distributed random variables with zero mean and unit variance, independent of $v_n$ and $\theta_n$. The positions provided by the mobility model were used to calculate distances, which in turn were used to determine RSSI values through Equation 25 [14]. In Equation 25, n, $d_0$ and $RSSI_{d_0}$ are given values configured for outdoor environments [14]: n is the path loss coefficient, $d_0$ is a known reference distance, and $RSSI_{d_0}$ is the RSSI level at the reference distance $d_0$. The RSSI level for a specific distance d is then calculated from these values.
As presented in previous sections, these RSSI values were used as an indirect way (since we assume nodes are not equipped with GPS devices) to know how far apart two nodes are.
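Expressions 23 to 25 are referenced above but not shown. The sketch below assumes the standard Gauss-Markov update, $v_n = \alpha v_{n-1} + (1-\alpha)\bar{v} + \sqrt{1-\alpha^2}\,v_{x_{n-1}}$ (with the same form for $\theta_n$), and the log-distance path-loss model, $RSSI(d) = RSSI_{d_0} - 10\,n\log_{10}(d/d_0)$. The following are our assumptions: the function names, the 1 m default reference distance, the $RSSI_{d_0}$ and path-loss values in the usage lines, the 0.1 m floor on distance, and moving each node with its newly drawn speed and direction.

```python
import math
import random


def gauss_markov_step(v, theta, v_mean, theta_mean, alpha, rng):
    """One Gauss-Markov update of speed and direction (assumed standard form)."""
    s = math.sqrt(1.0 - alpha ** 2)              # weight of the Gaussian term
    v_new = alpha * v + (1 - alpha) * v_mean + s * rng.gauss(0.0, 1.0)
    theta_new = alpha * theta + (1 - alpha) * theta_mean + s * rng.gauss(0.0, 1.0)
    return v_new, theta_new


def simulate(steps, alpha, v_mean, theta_mean, seed=0):
    """Track of one node starting at the origin; alpha = 1 gives straight-line motion."""
    rng = random.Random(seed)
    x = y = 0.0
    v, theta = v_mean, theta_mean
    track = [(x, y)]
    for _ in range(steps):
        v, theta = gauss_markov_step(v, theta, v_mean, theta_mean, alpha, rng)
        x += v * math.cos(theta)
        y += v * math.sin(theta)
        track.append((x, y))
    return track


def rssi_from_distance(d, rssi_d0, path_loss_n, d0=1.0):
    """Log-distance model: RSSI(d) = RSSI(d0) - 10 * n * log10(d / d0)."""
    return rssi_d0 - 10.0 * path_loss_n * math.log10(d / d0)


# two hypothetical nodes moving apart; RSSI series seen by one about the other
a = simulate(50, alpha=0.85, v_mean=1.0, theta_mean=0.0, seed=1)
b = simulate(50, alpha=0.85, v_mean=1.0, theta_mean=math.pi, seed=2)
rssi = [rssi_from_distance(max(math.dist(p, q), 0.1), -40.0, 2.7) for p, q in zip(a, b)]
```

With $\alpha$ close to 1 the motion is smooth and predictable, and with $\alpha = 0$ it is memoryless. This is the knob that keeps mobility "not totally random" in the sense described above.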

IV. IMPLEMENTATION AND RESULTS
Our mathematical optimization model was implemented in GAMS, and MATLAB was used to implement the rest of the approaches. To properly understand the results, each label presented in the figures is described as follows:
• Mathematical Model: The multiobjective mathematical optimization model proposed in Section III-A, implemented in GAMS. For this model, $w_1 = 0.5$ and $w_2 = 0.5$, giving the same weight to the delay function and the energy consumption function.
• PAMC: It corresponds to the Prediction Algorithm based on Markov Chains proposed in section III-B.
• PADL for n = 15: The Prediction Algorithm based on Deep Learning proposed in Section III-C, with n = 15. Recall that n = 15 means that the last fifteen values of distance and energy consumption of neighbour nodes are used to determine the best forwarding node.
• PADL for n = 10: The Prediction Algorithm based on Deep Learning proposed in Section III-C, with n = 10. Recall that n = 10 means that the last ten values of distance and energy consumption of neighbour nodes are used to determine the best forwarding node.
• PADL for n = 5: The Prediction Algorithm based on Deep Learning proposed in Section III-C, with n = 5. Recall that n = 5 means that the last five values of distance and energy consumption of neighbour nodes are used to determine the best forwarding node.
• AODV for MWSN: The traditional AODV algorithm enhanced for mobile wireless sensor networks [21]. Since this algorithm finds routes based on distances, we want to know whether our prediction algorithms can surpass it.
• Random Algorithm: This algorithm builds a path by selecting a random neighbour node as the forwarding node.
All approaches except the Mathematical Model were tested 10000 times for each network size to obtain significant results; in other words, 10000 different scenarios were generated for each network size to evaluate each approach. The mathematical optimization model was tested 100 times for each network size instead of 10000, because it is an offline solution that requires too much time. In addition, the maximum number of network states (network movements) for each network size, that is, the number of times that each network node changed its position, was 10000. This value was enough to find a path from the source node to the sink for each approach.
In summary, we have proposed prediction distributed routing algorithms that take network mobility into account to build a path between a source node and a sink as fast as possible. We compare their performance against the optimal solution given by the mathematical optimization model and against the AODV for MWSN and Random Algorithm baselines described previously. Table 4 summarizes the most important parameters assumed in the simulations.
As shown in Table 4, the nodes are deployed in a 100 × 100 m² area with a communication radius ($r_c$) of 20 m. We also assume a single source node and a single sink, so that a path is built from this source node to the sink. The remaining parameters are supported by the references given in the table.
The metrics used to evaluate the performance of each approach are the following:
• Delay: the time needed to carry a data packet from the source node to the sink.
• Energy Consumption: the energy consumed by the whole network from the moment a data packet is transmitted by the source node until it is received by the sink.
• Hops: the number of hops taken by a data packet from the moment it is transmitted by the source node until it is received by the sink.
• Overhead: the number of control packets required to build a path to carry a data packet from the source node to the sink.
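The four metrics above can be computed per scenario and then averaged over all scenarios of a given network size. The following Python sketch is a hedged illustration only: the per-packet energy costs `E_TX` and `E_RX` are hypothetical placeholders (not the Table 4 values), the `ScenarioTrace` record is an assumed log format, and energy is modelled simply as one transmission plus one reception per data hop plus the cost of every control-packet transmission and reception.

```python
from dataclasses import dataclass

# Hypothetical per-packet energy costs in joules (assumptions, not the Table 4 values).
E_TX = 0.5e-3   # energy to transmit one packet
E_RX = 0.3e-3   # energy to receive one packet


@dataclass
class ScenarioTrace:
    send_time: float          # time the source transmits the data packet (s)
    receive_time: float       # time the sink receives the data packet (s)
    data_hops: int            # data-packet forwards from source to sink
    control_tx: int           # control packets transmitted while building the path
    control_rx: int           # control-packet receptions summed over all nodes


def metrics(t: ScenarioTrace) -> dict:
    """Delay, energy consumption, hops and overhead for one scenario."""
    energy = (t.data_hops * (E_TX + E_RX)     # each hop: one transmission + one reception
              + t.control_tx * E_TX
              + t.control_rx * E_RX)
    return {
        "delay": t.receive_time - t.send_time,
        "energy": energy,
        "hops": t.data_hops,
        "overhead": t.control_tx,
    }


def average(traces) -> dict:
    """Mean of each metric over many independent scenarios of the same network size."""
    rows = [metrics(t) for t in traces]
    return {key: sum(r[key] for r in rows) / len(rows) for key in rows[0]}
```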
The most important results are described and analyzed in the following items:
• Figures 10a, 10c, 10e and 10g present the performance of each approach in terms of delay, energy consumption, hops and overhead, respectively. Figures 10b, 10d, 10f and 10h are zoomed-in versions of Figures 10a, 10c, 10e and 10g, respectively, which show the performance for 30 to 50 nodes in detail.
• In all figures, the mathematical model obtains the best result for every network size and every metric. This is expected: in this work, the mathematical optimization model is used as an offline method with global information about the network, which allows it to obtain the best possible solution, that is, the optimal solution value for a specific network scenario. However, because the mathematical model needs a long time to provide a solution, it is not practical for real mobile wireless sensor network applications, which require solutions to be obtained as fast as possible. As a consequence, the optimal solutions offered by the mathematical model are used as reference values to evaluate how close the performance of our prediction algorithms (PAMC and PADLs), AODV for MWSN, and the Random Algorithm comes to the optimum.
• Figures 10a and 10b show the delay performance of all approaches for each network size (10 to 50 nodes). Here, after the mathematical model, our prediction algorithms (PAMC and PADLs) performed better than AODV for MWSN and the Random Algorithm for every network size. These results confirm that mobility prediction is very useful for quickly establishing a path from a source node to the sink. Regarding the Deep Learning approach, the model using n = 15 (PADL for n = 15) outperformed the Markov Chains approach (PAMC); that is, after the mathematical model, PADL for n = 15 obtained the best delay results. This suggests that a high value of n (15) allowed PADL to predict the movement pattern of neighbour nodes more precisely and, thus, to select a better forwarding node: a higher value of n gives the learning model more distance information with which to predict the distances to its neighbours. Recall that a better forwarding node is the neighbour that will be closer in the future to node i, where node i is the node that holds a data packet and is deciding which neighbour to select as the forwarding node to build a path between a source node and the sink. Selecting forwarding nodes in this way reduces the delay caused by communication disruptions.
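The forwarding criterion recalled above (choose the neighbour predicted to be closest to node i at the next network state, using the last n distance samples) can be illustrated with the minimal Python sketch below. It is not PADL: the paper's deep learning predictor is replaced here by a least-squares linear extrapolation of the last n distances, purely to show how predicted distances drive the choice. The function names and the rule of discarding neighbours predicted to move beyond r_c are assumptions.

```python
def predict_next_distance(history):
    """Stand-in predictor (NOT the paper's deep learning model): fit a least-squares
    line to the distance samples and extrapolate it one network state ahead."""
    n = len(history)
    if n == 0:
        raise ValueError("history must contain at least one distance sample")
    if n == 1:
        return history[0]
    x_mean = (n - 1) / 2                     # samples are taken at states 0, 1, ..., n-1
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = sxy / sxx
    return y_mean + slope * (n - x_mean)     # value of the fitted line at state n


def choose_forwarder(distance_histories, n=15, rc=20.0):
    """Return the neighbour predicted to be closest to node i at the next state.

    distance_histories maps neighbour id -> past distances (metres) to node i.
    Only the last n samples are used; neighbours predicted to move beyond rc
    (hypothetical rule) are discarded. Returns None if no neighbour qualifies.
    """
    best, best_distance = None, float("inf")
    for node, history in distance_histories.items():
        if not history:
            continue
        predicted = predict_next_distance(history[-n:])
        if predicted <= rc and predicted < best_distance:
            best, best_distance = node, predicted
    return best
```

For example, `choose_forwarder({"a": [5.0, 7.0, 9.0, 11.0], "b": [18.0, 16.0, 14.0, 12.0]})` returns `"b"`: neighbour "a" is currently closer (11 m versus 12 m) but is moving away and is predicted at 13 m, whereas "b" is approaching and is predicted at 10 m. A purely distance-based rule would have picked "a".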
As the network size decreases, the delay advantage of our prediction algorithms (PAMC and PADLs) over AODV for MWSN and the Random Algorithm grows. This means that in a network with few nodes, our prediction algorithms gain a large advantage over the other algorithms (AODV for MWSN and the Random Algorithm) in reaching the destination node. With few nodes, the probability of finding a neighbour node is lower, and it is therefore more difficult to establish a path to the sink. However, our prediction algorithms are able to find reliable forwarding neighbour nodes, which allows them to establish a communication path to the sink quickly. Finding reliable forwarding nodes means selecting neighbours that are less likely to suffer an interruption, that is, neighbours that, in the near future, will be closer to the node that currently holds the packet to be sent. Conversely, as the network size increases, the delay performance of AODV for MWSN and the Random Algorithm approaches that of our prediction algorithms. With many nodes, the probability of finding a neighbour node is higher, so it is easier for AODV for MWSN and the Random Algorithm to establish a path to the sink, and there is less chance of suffering an interruption.
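The effect of network size on neighbour availability can be made concrete with a back-of-the-envelope estimate. Assuming, as a simplification, that the N nodes are independently and uniformly distributed over the area A = 100 × 100 m² and ignoring border effects (both are assumptions; the mobility model used in the simulations may not preserve uniformity), each of the other N − 1 nodes lies within r_c = 20 m of a given node with probability p = πr_c²/A:

```latex
\begin{aligned}
p &= \frac{\pi r_c^2}{A} = \frac{\pi \cdot 20^2}{100 \cdot 100} \approx 0.126,\\
\mathbb{E}[\deg] &\approx (N-1)\,p, \qquad P(\deg = 0) = (1-p)^{N-1},\\
N = 10:&\quad \mathbb{E}[\deg] \approx 1.13, \quad P(\deg = 0) \approx 0.30,\\
N = 50:&\quad \mathbb{E}[\deg] \approx 6.16, \quad P(\deg = 0) \approx 0.0014.
\end{aligned}
```

Border effects lower the real neighbour counts, so these figures are optimistic, but the contrast (roughly one neighbour versus six, and about a 30% versus 0.14% chance of having none) is consistent with the larger benefit of careful forwarder selection in small networks.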
In summary, the prediction capability is more effective as the network size decreases. Therefore, in terms of delay, we recommend using our prediction algorithms in small networks (10 to 40 nodes). In contrast, as the network size increases, the prediction capability becomes increasingly irrelevant compared with traditional solutions such as AODV for MWSN and the Random Algorithm.
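Because the optimal values of the mathematical model serve as the reference in all of these comparisons, one common way to express how close an algorithm comes to the optimum is the relative optimality gap. The Python sketch below is a hedged illustration, not a metric reported in this work; it assumes a metric to be minimised (delay, energy, hops or overhead) and compares mean values per network size, since the model was run on 100 scenarios and the algorithms on 10000. The function names are hypothetical.

```python
def optimality_gap(algorithm_value: float, optimal_value: float) -> float:
    """Relative gap (%) between an algorithm's mean metric and the model's optimum.

    Assumes a metric to be minimised (delay, energy, hops or overhead) and a
    strictly positive optimal value.
    """
    if optimal_value <= 0:
        raise ValueError("optimal_value must be positive")
    return 100.0 * (algorithm_value - optimal_value) / optimal_value


def gaps_by_size(results: dict, optimum: dict) -> dict:
    """results: {algorithm: {network_size: mean metric}}; optimum: {network_size: mean optimum}."""
    return {
        algorithm: {size: optimality_gap(value, optimum[size]) for size, value in per_size.items()}
        for algorithm, per_size in results.items()
    }
```

A gap of 0% means the algorithm matched the optimum for that network size; any values passed to these functions in an example are illustrative, not results read from Figure 10.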
• Figures 10e and 10f show the hops performance of all approaches for each network size (10 to 50 nodes). There is a clear proportionality between the delay performance and the hops performance, because the more hops a packet needs to traverse the network and reach the sink, the greater the delay. Consequently, each approach behaves in the hops evaluation as it does in the delay evaluation, and the analysis given for delay also applies to hops.
• Figures 10g and 10h show the overhead performance of all approaches for each network size (10 to 50 nodes). According to these figures, our prediction algorithms (PAMC and PADLs) require many control packets to build a path between a source node and the sink as fast as possible. The more nodes the network has, the more control packets our prediction algorithms need, because each network node continually collects distance information from its neighbour nodes, as described in Sections III-B and III-C.
• Figures 10c and 10d show the energy consumption performance of all approaches for each network size (10 to 50 nodes). According to these figures, for networks of 10 to approximately 15 nodes, our prediction algorithms (PAMC and PADLs) consume less energy than AODV for MWSN and the Random Algorithm. In particular, PADL for n = 15 obtained the best energy consumption results in this range. PADL for n = 15 consumed less energy than the other algorithms (excluding the mathematical model) because it required fewer hops and control packets to build a path between a source node and the sink: the fewer hops and control packets are needed to build a path, the fewer energy-consuming transmissions and receptions take place. Conversely, as the network size increases, the energy consumption of our prediction algorithms (PAMC and PADLs) increases, because the more nodes the network has, the more control packets are needed to build a path. Recall that, in our prediction algorithms, all nodes continually collect distance information from their neighbour nodes to build a path to the sink. As a result, in larger networks AODV for MWSN and the Random Algorithm consume less energy than our prediction algorithms, because they do not need as many control packets.
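The growth of control traffic with network size described in the last two items can be estimated roughly. The Python sketch below assumes a hypothetical reporting rule (at every network state, each node sends one distance report to each current neighbour), uniformly distributed nodes, no border effects, and the Table 4 values r_c = 20 m and a 100 × 100 m² area; the actual exchange described in Sections III-B and III-C may differ. Under these assumptions the expected number of control packets is N(N − 1)πr_c²/A per state, so it grows roughly quadratically with the number of nodes N.

```python
import math


def expected_control_packets(num_nodes: int, states: int,
                             rc: float = 20.0, area: float = 100.0 * 100.0) -> float:
    """Rough expected number of distance-report packets over `states` network states.

    Hypothetical rule: each node sends one report per neighbour per state.
    """
    p_link = math.pi * rc ** 2 / area            # probability two given nodes are within rc
    expected_degree = (num_nodes - 1) * p_link   # ignores border effects
    return num_nodes * expected_degree * states
```

Under this model, doubling the network from 10 to 20 nodes multiplies the estimated control traffic by 380/90 ≈ 4.2, which is consistent in direction with the rising overhead and energy consumption of the prediction algorithms in larger networks.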
In summary, the performance of our prediction algorithms (PAMC and PADLs) in small networks (10 to 40 nodes) is very beneficial, since they allow us to quickly find a path between a source node and a sink. However, as the network size increases, the prediction capability becomes increasingly irrelevant compared with traditional algorithms such as AODV for MWSN and the Random Algorithm. Moreover, when delay and energy consumption are considered together, our prediction algorithms are recommended for small networks, that is, from approximately 10 to 15 nodes. Nevertheless, our prediction algorithms still offer the best delay results (after the mathematical model) as the network size increases, especially PADL for n = 15, but at the expense of higher energy consumption in the network. Thus, the choice of algorithm depends on the requirements of the application. For example, AODV for MWSN or the Random Algorithm can simply be used for applications that have no delay requirements but need to minimize energy consumption. If the application has delay requirements but does not need to minimize energy consumption, our prediction algorithms can be used, especially PADL for n = 15. Finally, if the application has requirements in terms of both delay and energy consumption, our prediction algorithms can be used in small networks; in larger networks, they can still be used, but at the expense of higher energy consumption.
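The application-driven guidance in the previous paragraph can be condensed into a small decision helper. This Python sketch is only an illustrative encoding of that text: the function name, the returned labels, and the threshold of 15 nodes (taken from the "from approximately 10 to 15 nodes" recommendation) are assumptions, and the case where the application has neither delay nor energy requirements, which the text does not discuss, falls back to the traditional algorithms.

```python
SMALL_NETWORK_MAX_NODES = 15  # approximate bound from the joint delay/energy analysis


def recommend_routing(delay_sensitive: bool, energy_constrained: bool, num_nodes: int) -> str:
    """Suggest an algorithm family from application requirements (illustrative only)."""
    if not delay_sensitive:
        # No delay requirement: the text recommends the traditional algorithms.
        return "AODV for MWSN or Random Algorithm"
    if not energy_constrained:
        # Delay matters but energy does not: PADL with n = 15 gave the lowest delay.
        return "PADL (n = 15)"
    if num_nodes <= SMALL_NETWORK_MAX_NODES:
        # Both requirements in a small network: prediction wins on delay and energy.
        return "PAMC or PADL"
    # Both requirements in a larger network: prediction still helps delay, costs energy.
    return "PAMC or PADL (at the cost of higher energy consumption)"
```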

V. CONCLUSION
We proposed a multi-objective optimization model and prediction distributed routing algorithms based on Markov Chains and a Deep Learning approach for finding the minimum-cost path between a source node and a gateway node (destination node) when all nodes are mobile. The results obtained by the mathematical optimization model served as a reference to evaluate our prediction algorithms and other traditional algorithms in terms of delay and energy consumption in MWSN. In other words, the optimal values given by the mathematical model were used to determine how good the results obtained by the algorithms were. Additionally, we implemented typical distributed routing algorithms to benchmark the performance of the prediction distributed routing algorithms. As expected, our mobility prediction algorithms outperformed the approaches that do not use prediction (AODV for MWSN and the Random Algorithm) in terms of delay and, in small networks, also in terms of energy consumption, being more effective as the network size decreases.
In detail, our mobility prediction algorithms allowed us to establish the most reliable path to the sink and, at the same time, to obtain a better path to the sink than traditional algorithms in terms of delay and, in small networks, energy consumption. The reliability offered by the mobility prediction algorithms allowed us to select the most stable forwarding nodes in terms of network connectivity. As a result, a data message was less likely to end up in isolated network zones and more likely to reach the sink. For this reason, when the number of network nodes was small, the delay performance of the mobility prediction algorithms was much better than that of the other algorithms. We proposed applying our prediction algorithms to networks of 10 to 40 nodes because we considered prediction methods most valuable in sparse networks, where the number of neighbours is very limited and, for this reason, the probability of finding a sink is much lower than in large networks. In other words, when a network has few nodes and, as a consequence, it is more difficult to find a path to a sink, our prediction algorithms obtained a large advantage in terms of delay over traditional algorithms in finding the sink.
In terms of energy consumption, the energy performance of our proposal, together with its delay performance, makes the mobility prediction algorithms well suited to sparse networks, that is, to mobile wireless sensor network applications where the number of nodes is small and data messages must arrive at the sink as soon as possible.
In summary, the performance of our prediction algorithms in small networks (10 to 40 nodes) is very beneficial, since they allow us to quickly find a path between a source node and a sink while keeping energy consumption low. However, as the network size increases, the prediction capability becomes increasingly irrelevant compared with traditional algorithms such as AODV for MWSN and the Random Algorithm.
As future work, we plan to evaluate our dataset in other network simulators such as OMNeT++ and EDGF [30]. In addition, our network simulator was developed considering the network layer, and we are considering extending our proposal to incorporate the MAC layer.