A Deep Learning-Based Routing Approach for Wireless Mesh Backbone Networks

Optimal routing decisions are key in communication networks to minimize bottlenecks such as traffic congestion and limited bandwidth. This study focuses on routing in wireless mesh backbone networks, which are popular for providing broadband connectivity to a large number of users accessing and transmitting multimedia data and are therefore susceptible to communication-oriented bottlenecks. Existing routing solutions are mainly opportunistic approaches. They depend mainly on link states, distance and hop counts, which presents a routing generalization bottleneck, especially in large wireless mesh networks, because it is difficult to obtain the entire footprint of the network. Simply put, it is very difficult to determine an optimal route in a large wireless mesh network (WMN) with dynamic network conditions. Since deep learning has a strong generalization ability, this paper proposes a deep learning-based routing approach with the goal of ensuring a defined optimal quality of service (QoS) in a WMN. To this end, a wireless mesh network simulation environment is built and a network data feature set is orchestrated to generate a data set used to train a Long Short-Term Memory (LSTM)-based deep learning model that estimates the route with the optimal QoS. The generated data set is validated by training other learning models, including the Multilayer Perceptron (MLP), Logistic Regression (LR) and Random Forest (RF). Our results show that the routes selected by the LSTM-based model provide the best packet delivery ratio (PDR) and throughput. Our results further show that the other learning models (MLP, LR, RF) also provide better PDR and throughput than the traditional Ad-hoc On-demand Distance Vector (AODV) routing protocol.


I. INTRODUCTION
Wireless traffic is growing tremendously due to the large number and heavy usage of wireless devices available today. These devices generate large volumes of data that are routed along communication channels, yet the traffic always encounters difficulties in coping with the unpredictable and unreliable wireless medium [1]. As a result, challenges such as transmission delays and packet loss arise. Routing solutions are one way to deal with these challenges in communication networks in order to realize better quality of service (QoS). The following are potential metrics for route optimization: node-to-node distance, minimum number of hops, interference, delay, error rates, power consumption, maximum data rates and route stability, use of multiple routes to the same gateway, and use of multiple gateways.
The associate editor coordinating the review of this manuscript and approving it for publication was Jiankang Zhang.
Routing algorithms can be classified as adaptive and non-adaptive. Adaptive algorithms change routing decisions based on dynamic topology or link state conditions, while non-adaptive algorithms do not change the routes once they have been set. Adaptive routing algorithms are further classified as isolated, centralized and distributed. In the isolated approach, routing decisions are made by each node without any information about other nodes. The problem with this approach is that packets may be sent through a congested link, which introduces large end-to-end delays. The centralized approach uses one major node which knows all link state information and uses that information to determine the best route. The problem with this approach is a single point of failure, which can bring the whole network down. In the distributed approach, nodes share information with each other and each node uses the collected network information to make routing decisions. The problem with the distributed approach is that it introduces overhead on the network, as each node tends to broadcast or flood probing packets to its neighbors at defined intervals. Additionally, the absence of node information introduces further delay before a route to a desired destination can be fixed.
Opportunistic routing algorithms are counted among the adaptive routing algorithms. Most traditional routing solutions for wireless networks, such as AODV [2], DSR [3], network slicing [4], traffic shaping [5] and load balancing [6], are mainly opportunistic algorithms, yet these present major design challenges which must be minimized in order to attain efficient routing. These include (1) forwarding node selection, (2) avoidance of duplicate transmissions, (3) packet loss recovery and (4) transmission rate control. These design challenges need to be addressed in order to minimize transmission overheads, which hinder proper utilisation of the available bandwidth.
The design challenges mentioned above can be summarized as a result of the difficulty in acquiring an overall footprint of the network in terms of the active links, broken links, number of nodes, etc. This implies that existing routing solutions do not work well when generalized to the entire network structure and its conditions. Therefore, conventional opportunistic routing approaches are still limited. Deep learning approaches have the ability to produce more generalized optimal systems. This study therefore uses deep learning as a suitable approach to orchestrate a more generalizable model which can be used for route estimation in WMNs.
Deep learning systems are a new data-driven solution technology and have proved to provide both accurate and optimal solutions in systems estimation, prediction and also network routing [7]. For deep learning systems to effectively perform routing functionality, network data should be collected for training at proper time intervals to cater for dynamic network conditions. This study therefore adopts deep learning as the approach to experiment with and improve on WMN routing, and defines clear data collection and learning model criteria. A few studies such as [8], [9], [10], [11], and [12] have already utilized deep learning systems to manage network traffic in large-scale heterogeneous networks and also to perform route selection in real time. Other studies such as [13] have used computational intelligence approaches such as ant colony optimization for routing decision making.
We use LSTM as a deep learning model to estimate the best routes. The LSTM model is constructed based on a dataset made of the following features: sent packets, received packets, packet delivery ratio, throughput and jitter. Other features such as delay and packet loss rates that have been implemented in previous studies [14], [15] are also used in this study. In particular, the contributions of this paper are as follows:
• A QoS metric-based dataset for selection of the best route in wireless mesh networks is constructed. The features of the data set are the following: sent packets, received packets, packet loss ratio, packet delivery ratio, throughput, delay and jitter.
• A fine-tuned LSTM model used for selection of the best QoS routes in a wireless mesh network.
• A performance evaluation of three conventional machine learning models and AODV together with the LSTM model in the determination of the best routes.

II. RELATED WORKS
Subsection II-A reviews conventional studies on network traffic control using traditional routing approaches and Subsection II-B reviews deep learning methods.
A. TRADITIONAL ROUTING APPROACHES
Gupta et al. [14] present the WMN routing challenge as an optimization problem for both static and dynamic network conditions. The authors in [14] contend that even though various dynamic algorithms such as [16] and [15] have been orchestrated, they lack a theoretical foundation to analyze how well the network performs globally. Tang et al. [15] specifically provide a bandwidth allocation strategy as a means to optimize routing performance. Gupta et al. [14] consider that most approaches of this kind point to optimal bandwidth utilization but do not consider how much traffic overhead there is at the transmitting nodes, and hence neglect network resource demand. It is for this reason that they proposed a solution based on an optimization problem for both dynamic and static network condition states. The objective of their work was to maximize the ratio between flow throughput and its demand, subject to schedulability and fairness constraints. Our approach in this study is deep learning-based. It has been presented in [17] that optimal deep learning solutions provide near-optimal to very optimal generalizations for the problems they address. This implies that our approach to the routing problem in WMNs is a suitable candidate for solving the generalization bottleneck that exists in conventional WMN routing solutions.
Ke et al. [18] present a multicast algorithm for WMNs. The authors in [18] note that the problem of identifying a suitable multicast tree in a multi-constrained WMN is NP-hard [19]. Because of this, most solutions to NP-hard problems are heuristic approaches, of which genetic algorithms have proved to be efficient. Their study therefore proposed to use the CHC genetic algorithm to optimise multiple QoS parameters for routing purposes. Ke et al. [18] further note that using genetic algorithms requires a rich set of global information, which is a major challenge to obtain in WMNs. In their work they proposed to use a powerful mesh router to acquire edge state information. The data used in our approach is also QoS data acquired at each node, but our approach to attaining the optimal paths is based on a deep learning model. Additionally, our deep learning model uses more QoS parameters, compared to only delay in the approach of Ke et al. [18].
Liu et al. [20] present a cross-layered approach to routing in order to achieve high performance in dynamic WMN environments. High-performance QoS routing cannot be easily guaranteed in dynamic wireless environments, especially because of the multiple constraints which make the problem NP-complete [19]. They contend that existing approaches ignore the interaction between the medium access control (MAC) layer and the network layer in the orchestration of their routing protocols. This leads to low QoS, which is a bottleneck to mission-critical wireless communications.
DSR, AODV, Destination Sequenced Distance Vector (DSDV) and Link Quality Source Routing (LQSR) [4], [5] are the most popular traditional routing protocols. They all utilize the hop count metric, such that the route with the least hop count is used. Opportunistic routing approaches [21] are popular as well and mainly depend on using the node closest to the target to forward packets. Some opportunistic routing approaches depend on broadcasting and node-to-node distance information. The remaining paragraphs of this subsection review opportunistic routing as well as other approaches.
Rozner et al. [1] implemented a Simple Opportunistic Adaptive Routing Protocol (SOAR) in WMNs to manage wireless traffic. It utilized four algorithms, namely: (1) adaptive forwarding path selection to leverage path diversity while minimizing duplicate transmissions, (2) priority timer-based forwarding to let only the best forwarding node forward the packets, (3) local loss recovery to efficiently detect and re-transmit lost packets, and (4) adaptive rate control to determine an appropriate sending rate according to the current network conditions. Simulation results in NS2 showed that SOAR performs significantly better at reducing loss rates than traditional wireless routing protocols. The improvement ranges from 18.37% to 578.62%. Opportunistic routing in general achieves 2.5 times the throughput of traditional wireless protocols [21].
Zhang et al. [6] applied a traffic control scheme with load balancing based on the Path Computation Element (PCE) architecture to a network congestion scenario on China Telecom's network traffic model and topology. The traffic in this scenario arose from the classic routing policy that aims to find one best path for traffic. A PCE is an entity capable of determining a suitable route through a network for conveying data between a source and a destination. The PCE architecture employs two algorithms to determine the optimal path: an end-to-end path selection algorithm [22] and a global routing optimization algorithm. The end-to-end path selection algorithm selects main and backup paths with high quality of service, taking hop count and path utilization into consideration. The global routing optimization algorithm adjusts the path for the traffic by considering congestion feedback, adjustment hop count and the expectation of load balance. The measurable data parameters used in [6] were hop count, path utilization and congestion feedback; the approach provided a considerable improvement in network congestion.
Network traffic prediction algorithms have been exploited in a quest to ensure proper control of traffic in WMNs. A study on efficient prediction of network traffic for real-time applications [23] was done to address the problem of network congestion due to high, unexpected and mismanaged network traffic in real-time applications. Network traffic was predicted using a double exponential smoothing predictor that has a higher accuracy in network traffic prediction and a low cost overhead.
In this method, exponentially lower weights were assigned to older observations, with consideration of a trend in the data [23]. Trend means that the average value of the time series increases or decreases with time. A new metric called Error_Energy Score (EE_Score), which combines both the accuracy and the energy consumption metrics, was considered. The receiving router with the least energy consumption and the link with the least traffic are considered as the optimal processing router and path for forwarding packets respectively. The high accuracy and the consideration of the energy consumption of forwarding routers in prediction make better optimal path choices and better provisioning of the network traffic possible, thus reducing congestion and providing a better quality of service.
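As an illustration of the idea of double exponential smoothing, the following is a minimal Python sketch of the textbook (Holt) formulation, in which a level and a trend are each smoothed exponentially. It is not the predictor of [23]: the function name, the initialisation and the parameters `alpha` and `beta` are assumptions made here for illustration only.

```python
from typing import Sequence


def forecast_next(series: Sequence[float], alpha: float, beta: float) -> float:
    """Forecast the next value of a traffic series (illustrative sketch).

    alpha smooths the level and beta smooths the trend; both must lie
    in (0, 1]. Larger values give more weight to recent observations.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must be in (0, 1]")

    # Assumed initialisation: level = first sample, trend = first difference.
    level = series[0]
    trend = series[1] - series[0]

    for observed in series[1:]:
        previous_level = level
        # Older observations receive exponentially decaying weight.
        level = alpha * observed + (1 - alpha) * (level + trend)
        # The trend tracks how the level rises or falls over time.
        trend = beta * (level - previous_level) + (1 - beta) * trend

    # One-step-ahead forecast: current level plus current trend.
    return level + trend
```

For a perfectly linear series such as 10, 12, 14, 16 the sketch continues the line, while a series with random variations is only followed with a lag, which is the limitation noted for this class of predictor.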
Performance results indicated the following: network traffic is generally predictable, the choice of the predictor depends on the characteristics of the network, and the same predictor performs consistently well for all the traces from the same source. Despite the findings, the following gaps remained open for future research. (a) Random variations of traffic on a network are not considered in double exponential smoothing prediction. In the event of a random variation, the traffic will flood the routers and links, causing congestion, delays, packet drops and negligible throughput. (b) In addition to network traffic prediction, it is important to train a model that aids proper forwarding of packets by learning from previous congestion occurrences and from router and link information, so as to achieve better QoS and a high throughput.
Some wireless mesh networking protocols have been complemented with edge routing to manage traffic and ensure a sustained throughput [24]. Edge computing is a distributed computing technology in which computation, data storage and data processing happen at the edge of the network, closer to the source of information.
Two protocols were evaluated: the Hybrid Wireless Mesh Protocol (HWMP) [25] and Greedy Perimeter Stateless Routing (GPSR) [26]. Burchard et al. [24] proposed GPSR over HWMP on the grounds that HWMP incurs an overhead to learn the current state of a network and therefore ends up congesting the network and degrading the overall throughput. GPSR was recommended to ensure a sustained throughput due to its statelessness. Each mesh point only needs to know the state of its current one-hop neighbors, and thus there is no overhead since control messages do not have to be transmitted over the network. GPSR also uses a purely reactive method of path selection; for this reason, it is extremely fault tolerant, since a failure in a mesh point does not increase the overhead in the network.
The above approaches aimed to control traffic on the network. The deep learning approaches below have been exploited to factor in the aspect of learning from past traffic congestion scenarios.

B. LEARNING-BASED ROUTING APPROACHES
Stampa et al. [8] conducted a study that adjusts a routing strategy in software-defined networks using a deep reinforcement learning approach. The reinforcement learning agent implemented an off-policy, actor-critic, deterministic policy gradient algorithm [27] that interacts with the network through the exchange of three signals: state, action and reward. This approach registered a level of success that cannot be achieved by traditional table-based reinforcement learning agents.
Similarly, Yu et al. [28] proposed a Deep Deterministic Policy Gradient (DDPG) mechanism to optimize routing in SDN, called the DDPG Routing Optimization Mechanism (DROM). DROM uses link weight adjustments for route selection in a reinforcement manner by using a reward system. Sun et al. [29] proposed an intelligent network control architecture based on deep reinforcement learning that can dynamically optimize routing strategies in an SDN network without human experience. The architecture is called TIDE (Time-relevant Deep reinforcement learning for routing optimization). In TIDE, an AI plane is introduced in addition to the conventional software-defined network (SDN) layers. The AI plane is composed of a smart agent which outputs weights that are mapped to links for purposes of route making. TIDE also operates in a reinforcement learning manner with a reward function implemented.
Casas-Velasco et al. [30] introduced a model-free DRL-based algorithm called Deep Reinforcement Learning and Software-Defined Networking Intelligent Routing (DRSIR). DRSIR uses path-state metrics and the global view and control offered by SDN to compute and install optimal routes proactively in forwarding devices, thus allowing adaptation to dynamic traffic changes without prior knowledge of the underlying network. Using path-state metrics reduces the knowledge abstraction needed by the routing agent, since this approach directly explores different path options instead of link state information. Other studies which use path metrics in a reinforcement learning approach for routing decisions include [31] and [32].
Our approach also follows the use of path metrics as in [30], [31], and [32]. However, we do not follow a reinforcement or reward approach for learning the best routes to use in the network. Our approach uses a trained LSTM model at each router or switch to estimate the next 2-hop route based on previous knowledge-base of PDR, delay, jitter, packet loss rate and throughput. In order to cater for dynamic network conditions, the LSTM model at each node is trained after accumulation of a defined number of new network QoS data during transmissions. The retraining of the LSTM model at such time is done to improve on its route estimation as network conditions may keep changing.
Fengxiao et al. [33] applied deep Convolutional Neural Networks (CNN) to construct the deep learning model, using delay and packet loss as parameters. The algorithm was implemented in four phases: initial, running, updating and training. Path selection and routing judgement were completed in the real-time updating phase. Their results showed that routers only choose the first path combinations, that congestion occurrences reduce to an almost negligible level by 500 seconds of training, and that the model avoids congestion by 98.7%. Our deep learning model is developed based on a data set that includes network performance metrics such as Packet Delivery Ratio (PDR), delay, Packet Loss Ratio (PLR), jitter and throughput. The learning model used is LSTM. Table 1 shows a comparison of existing deep learning-based routing approaches and traditional routing algorithms considering the metrics used in the route estimation. For the deep learning models, the metrics are those which exist in the data set composition for routing decision making.
TABLE 1. Comparison of learning-based routing approaches [28], [33], [29], [30] and traditional routing approaches [1], [18].
From Table 1, our deep learning-based routing approach uses all the metrics mentioned, while the other conventional models use one, two or three QoS metrics for route selection. This implies that our routing solution has a greater level of generalization based on the data used for route decision making.

III. TECHNICAL PRELIMINARIES
A brief overview of the technical machinery used to achieve the objective of this study is presented in this section. This is mainly the LSTM. LSTM networks are an extension of recurrent neural networks (RNNs) that address the vanishing gradient problem in RNNs through their capability to learn long-term dependencies. The memory is referred to as a gated cell. The input information can be deleted or preserved; this is determined by the weight value assigned to the information in the training process. An LSTM model is made up of three gates: the forget, input and output gates.
49512 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
The forget gate decides which existing information to preserve or remove. It outputs f_t, a value between 0 and 1, where 0 means discard the learned value and 1 means preserve the learned value. The value f_t is computed as shown in the equation below, where b_f is a constant bias value.
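In the standard LSTM formulation, which the description above matches, the forget gate is written as follows. The symbols other than f_t and b_f are assumptions introduced here: σ is the logistic sigmoid, W_f is the forget-gate weight matrix, h_{t-1} is the previous hidden state and x_t is the current input.

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```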
The input gate decides which information is loaded into the LSTM memory. It consists of two layers: a sigmoid layer and a tanh layer. The sigmoid layer decides which values need to be updated and the tanh layer creates a vector of new candidate values to be loaded into the LSTM memory. The outputs of the sigmoid layer, i_t, and of the tanh layer, c̃_t, are calculated as shown below.
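Assuming the standard LSTM formulation, the two input-gate layers can be written as follows. The weight matrices W_i and W_c and the biases b_i and b_c are symbols assumed here by analogy with b_f, and the output of the tanh layer is written with a tilde to distinguish it from the memory c_t.

```latex
\begin{align}
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right)
\end{align}
```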
The combination of the two updates the LSTM memory: the old memory value c_{t-1} is multiplied by the forget gate output, so that part of it is forgotten, and the new value i_t * c̃_t is then added, as shown in the equation below.
The output gate utilizes a sigmoid layer to decide the content of the LSTM memory that contributes to the output. It then applies a nonlinear tanh function to map the memory values to the range between -1 and 1. The result is then multiplied by the output of the sigmoid layer, as expressed in the equation below, where o_t is the output of the sigmoid layer and h_t is the resulting value between -1 and 1.
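Assuming the standard LSTM formulation, the memory update and the output described above can be written as follows, where ⊙ denotes element-wise multiplication, the tilde marks the output of the tanh layer of the input gate, and W_o and b_o are symbols assumed here for the output-gate weights and bias.

```latex
\begin{align}
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{align}
```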

IV. PROPOSED APPROACHES
A. DATA GENERATION AND LEARNING MODEL TRAINING
Since our approach uses QoS link data, which can be obtained on every point-to-point connection, it is possible to collect and aggregate multi-hop link data for our purpose. However, because we are interested in finding the best route, the node at which a routing decision is made must not be many hops away from the previous decision point. This is why, in this study, a 2-hop approach to QoS link data collection and routing decisions was adopted, to minimize the possible routing errors or delays. Additionally, all nodes in the mesh network are involved in a 2-hop connection, hence presenting an end-to-end coverage of the network.
Using the 2-hop approach of data collection, a total of more than 5242 records were collected. For purposes of our experiment, we chose to fix a maximum of 5242 records to be used for training in each router. However, the training takes place after every 50 records are added in a router for purposes of keeping track of the dynamic network environment.
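The record-keeping policy of this subsection (a cap of 5242 records per router, training after every 50 added records, and first-in-first-out replacement of the oldest records once the cap is reached) can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the class `TwoHopRecordStore` and the `train` callback are hypothetical names, and the initial minimum data set size is left out.

```python
from collections import deque
from typing import Callable, List, Sequence

MAX_RECORDS = 5242      # cap on stored records per router (from the text)
RETRAIN_INTERVAL = 50   # number of new records between training runs (from the text)


class TwoHopRecordStore:
    """Bounded per-router store of two-hop QoS records (hypothetical helper)."""

    def __init__(
        self,
        train: Callable[[List[Sequence[float]]], None],
        max_records: int = MAX_RECORDS,
        retrain_interval: int = RETRAIN_INTERVAL,
    ) -> None:
        # A deque with maxlen drops its oldest entry whenever a new one
        # is appended to a full store, which gives FIFO replacement.
        self._records: deque = deque(maxlen=max_records)
        self._train = train
        self._retrain_interval = retrain_interval
        self._since_training = 0

    def add(self, record: Sequence[float]) -> bool:
        """Store one record; return True if this triggered a training run."""
        self._records.append(record)
        self._since_training += 1
        if self._since_training >= self._retrain_interval:
            # Train on a snapshot of everything currently stored.
            self._train(list(self._records))
            self._since_training = 0
            return True
        return False

    def __len__(self) -> int:
        return len(self._records)
```

The deque evicts records one at a time rather than in blocks of 50, but at each training point the stored set is the same as with block replacement, because training only runs after 50 new records have arrived.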
Additionally, when the maximum record size is reached, the oldest 50 records in the data set are replaced by 50 new QoS records in a First-in-First-Out (FIFO) manner. Figure 1 elaborates the data formulation and collection process and Figure 2 illustrates the mesh network topology which was used in the experiment for this study. Table 2 shows the list of acronyms used in the description of the proposed approaches. The pseudocode is as follows:
1. Broadcast dummy packets in the entire network and capture performance data on all links at each transmitting node. The performance data between two routers i and j includes s_{i,j}, r_{i,j}, l_{i,j}, D_{i,j}, J_{i,j} and t_{i,j}. A data set record at the transmitting router therefore takes the form R_{i,j} = {s_{i,j}, r_{i,j}, l_{i,j}, D_{i,j}, J_{i,j}, t_{i,j}}.
2. Store the data values s_{i,j}, r_{i,j}, l_{i,j}, D_{i,j}, J_{i,j}, t_{i,j} of each link at the sending node by backward mapping in their routing tables.
3. Compose a two-hop data set and store the performance data values {s_1, r_1, l_1, D_1, J_1, t_1, s_2, r_2, l_2, D_2, J_2, t_2} on the initial node by backward mapping, where the subscript 1 on each data value represents the first hop and 2 represents the second hop.
4. Populate the two-hop data set by repeating step 3 using performance data from dummy data transmissions and real traffic data until an initial minimum data set size of Min records is reached.
5. The class of each record is the two-hop link that provides the performance data record during each data capture moment.
6. If the initial minimum data set size has been reached, train the deep learning model at each router using the two-hop data set.
7. After every interval of 50 individual router transmissions, if the maximum data set size of Max records has not yet been reached, continue to populate the two-hop data set using performance data from dummy data transmissions and real traffic data, and then go back to step 6.
Table 3 shows an example of some of the two-hop data set records stored at a router. Please note that some data features, such as s and r, are not shown in the table because of space constraints and for clarity. The target class for each record is defined by R_{i,j,k}, where i, j and k are the routers forming the 2-hop links. The target class R_{1,3,5} appears twice in the records, implying that the data values were captured at different times and the network performance was not the same.

B. TRANSMISSION MODEL
1. Initially, at the transmitting node, select from the data set the best QoS record. The best two-hop QoS record is the one which has the best average of d, t, D and J.
2. Use the link defined as the class of the best QoS record for the initial 2-hop transmission.
3. If the destination router is not reached, feed the QoS data obtained in the transmission into the trained model at the router to estimate the 2-hop link which will offer a QoS that matches the best known QoS link.
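Step 1 of the transmission model can be sketched as follows. Two assumptions are made here and are not taken from the text: d is read as the packet delivery ratio, and the "best average" is computed by min-max normalising each metric, inverting delay and jitter (for which lower is better), and averaging the four scores. The names `TwoHopQoS` and `best_record` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TwoHopQoS:
    route: str          # class label of the two-hop link, e.g. "R1,3,5"
    pdr: float          # d, assumed to be the delivery ratio (higher is better)
    throughput: float   # t (higher is better)
    delay: float        # D (lower is better)
    jitter: float       # J (lower is better)


def _normalise(values: Sequence[float], higher_is_better: bool) -> List[float]:
    """Min-max scale to [0, 1] so that 1 is always the best score."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # all records tie on this metric
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def best_record(records: Sequence[TwoHopQoS]) -> TwoHopQoS:
    """Return the record with the highest average normalised score."""
    if not records:
        raise ValueError("no QoS records available")
    columns = [
        _normalise([r.pdr for r in records], higher_is_better=True),
        _normalise([r.throughput for r in records], higher_is_better=True),
        _normalise([r.delay for r in records], higher_is_better=False),
        _normalise([r.jitter for r in records], higher_is_better=False),
    ]
    scores = [
        sum(column[i] for column in columns) / len(columns)
        for i in range(len(records))
    ]
    best_index = max(range(len(records)), key=scores.__getitem__)
    return records[best_index]
```

The `route` field of the returned record would then identify the link to use for the initial 2-hop transmission in step 2. Other ways of combining the four metrics, such as weighted averages, would fit the description equally well.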

C. ASSUMPTIONS
To assure utility of the algorithm, the following realistic assumptions were made.
• Every router is programmed and configured to collect link QoS data which is used for the routing procedure.
• Every router has the capacity and is programmed to train a Deep learning model without compromising on network performance.
• Every router has the capacity and is configured to execute the trained model to estimate the best QoS route.

D. NETWORK TOPOLOGY AND LEARNING MODEL SETUP
For our experiment, the network simulator Ns3 was used to simulate the wireless mesh network from which the QoS data were extracted. The deep learning studio (DLS) was used to train and test the LSTM model using the QoS data extracted from Ns3. Ns3 was then used for testing the performance of the estimated routes.
Table 7 shows the configurations and setup of the simulated wireless mesh network. Table 5 shows the setup of the LSTM deep learning model. Configurations such as the learning rate and dropout rate were selected arbitrarily, yet are within the conventionally acceptable ranges. Different batch sizes were also selected arbitrarily and tried on the LSTM model to determine the best-performing LSTM model to use for the best route estimation. Batch sizes of 200, 300, 375 and 450 provided accuracy results of 89.1%, 91.1%, 95.8% and 93.4% respectively. The batch size of 375 provides the best LSTM model performance. It was therefore taken as the optimized LSTM model used in the further experiments to analyze network performance in the results section.

V. PERFORMANCE EVALUATION METRICS
The developed deep learning model was evaluated based on accuracy and cross-entropy loss. To evaluate the efficiency of utilizing the deep learning model in routing decisions, network performance metrics including throughput, PDR, PLR, jitter, delay, the number of received packets and the number of lost packets were used.
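For reference, the ratio-based metrics above can be computed from per-flow counters as in the following minimal sketch. The function names are hypothetical, and the jitter definition used here (the mean absolute difference between consecutive packet delays) is an assumption, since the text does not state which definition was used.

```python
from typing import Sequence


def packet_delivery_ratio(sent: int, received: int) -> float:
    """PDR: fraction of sent packets that reached the destination."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return received / sent


def packet_loss_ratio(sent: int, received: int) -> float:
    """PLR: fraction of sent packets that were lost (equals 1 - PDR)."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - received) / sent


def throughput_mbps(received_bytes: int, duration_s: float) -> float:
    """Throughput in Mbps: received bits divided by the observation time."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return received_bytes * 8 / duration_s / 1e6


def mean_jitter(delays: Sequence[float]) -> float:
    """Assumed jitter definition: mean absolute change between consecutive delays."""
    if len(delays) < 2:
        raise ValueError("need at least two delay samples")
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return sum(diffs) / len(diffs)
```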

VI. RESULTS AND DISCUSSIONS
The average training time for the LSTM model is 15 minutes on GPU-equipped computers. It is considered in this study that training is done at intervals determined by when the minimum number of QoS link data records has been stored at the router. The testing time of the trained LSTM model is on average 50 ms, which is good considering that routing decisions must be made quickly.
In order to understand the performance of the proposed algorithm, we explore and analyze results for throughput and packet delivery ratio. We applied various data rates to both the traditional AODV and the machine learning-based routing approaches on a simulated wireless mesh network. For a clear presentation of the results, Figures 3 and 4 are based on one source and destination pair to analyze the performance of the different algorithms. We then select four other source and destination pairs in Figures 5 and 6 to show the performance of LSTM. Note that class(x, y) in Figure 5 represents a source and destination pair, where x and y can be any number 1, ..., n as long as x and y are not the same. Figure 3 shows that initially, as the data rate increases, there is an increase in throughput. A throughput peak value is reached as the data rate increases, after which the throughput begins to reduce and tends to converge as the data rate is increased further. This phenomenon is true for all the routing protocols being tested. The technical reason behind this performance trend is that an increase in the data rate increases congestion in the communication channel, which may lead to packet dropping or even packet collision. This in turn reduces the throughput.
AODV offers the worst performance compared to all machine learning based approaches. LSTM provides the best performance. It is therefore clear that using routes determined by machine learning approaches provides better throughput in the wireless mesh network.
In Figure 4, packet delivery ratio results are shown. LSTM provides the best PDR and AODV provides the worst performance as the data rate increases. The fundamental reason why the PDR reduces with increased data rate is mainly packet collision in the channel. We note that LSTM seems to converge to a steady PDR faster than all the other algorithms, at 25 Mbps. AODV, MLP, RF and LR start to converge at 30 Mbps and 35 Mbps.
It is clear in Figure 5 that for different source and destination pairs, LSTM provides peak throughput at a data rate of 25 Mbps. The peak throughput values attained by the approaches other than LSTM in Figure 3 are reached at data rates below 25 Mbps. This shows that LSTM is superior to all the other approaches given different source and destination pairs. Figure 6, on PDR using LSTM for different source and destination pairs, generally shows that LSTM converges at a higher value than the other routing approaches seen in Figure 4.
The determination of the best next hop using the machine learning model approaches provided accuracies of 0.97, 0.72, 0.61 and 0.5 for the LSTM model, MLP, RF and LR respectively.
For each evaluated scenario in Figures 3 to 6, 25 independent replications of the configuration were run. The numerical values in Figures 3 to 6 are the mean values of the measured PDR and throughput. The steady-state confidence intervals in Tables 6 and 7 are based on a 95% confidence level. The confidence intervals for both PDR and throughput obtained with LSTM are the best compared to all the other approaches used in our experiments.
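The mean and 95% confidence interval over 25 replications can be sketched as follows. This assumes a Student's t interval on the sample mean, which is a common choice for a small number of replications; the exact procedure behind Tables 6 and 7 may differ. The constant 2.064 is the two-sided 95% critical value of the t distribution with 24 degrees of freedom, so the helper is only valid for exactly 25 samples.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 24 degrees of freedom
# (assumption: 25 replications, t-based interval).
T_CRIT_95_DF24 = 2.064


def mean_ci95(samples):
    """Return (mean, lower, upper) for exactly 25 replication results."""
    n = len(samples)
    if n != 25:
        raise ValueError("critical value is hard-coded for n = 25")
    mean = statistics.fmean(samples)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    half_width = T_CRIT_95_DF24 * sem
    return mean, mean - half_width, mean + half_width
```

A narrower interval indicates less variation across replications, which is one way to read "best" when comparing intervals between approaches.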
Figures 7a-7d and 8a-8d are boxplots showing the significance of the PDR and throughput results, respectively. A two-tailed t-test at a 5% significance level was used to determine whether the different pairs of algorithms differ in PDR and throughput, based on the mean PDR and throughput values for each group. In Figure 7a, the proposed LSTM-based results are the most competitive compared to all the other approaches. All pairs in Figure 7a have very strong significance apart from three pairs: LSTM-RF, MLP-RF and MLP-AODV. LSTM-RF and MLP-RF both have strong significance, while MLP-AODV shows no significance, implying that the data do not indicate any causal effect between them. The no-significance result for MLP-AODV at 20 Mbps does not hold at the other data rates tested, which implies that the causal effect exists as the data rate increases. Figures 7b-7d present no non-significant results between the algorithm pairs, apart from two pairs with weak evidence (MLP-LR) and moderate evidence (LSTM-RF). It should also be noted that these pairs do not exhibit the same significance at the other data rates, where they exhibit strong and very strong significance. This implies that there is a causal effect in the results which cannot be refuted.
The no-significance results for throughput in Figures 8b and 8c likewise do not persist at the other data rates; we therefore deduce that there is a causal effect and that the data were not a result of chance.
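The pairwise two-tailed test described above can be sketched with the Python standard library alone. The sketch assumes Welch's unequal-variance form of the t-test, which is our choice for illustration; the text does not state which variant was used. The p-value is obtained by numerically integrating the Student's t density with Simpson's rule, and all function names are hypothetical.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat: float, df: float, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-tailed p-value).

    Assumes at least two samples per group and non-zero variance overall.
    """
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t_stat = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df, two_tailed_p(t_stat, df)
```

Given the 25 replication results of two algorithms at one data rate, the difference between the pair is declared significant at the 5% level when the returned p-value is below 0.05.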
The means for PDR and throughput in all the boxplots are represented by the horizontal line between the upper and lower quartiles. Overall, the average values of PDR and throughput in all the boxplots in Figures 7a-7d and 8a-8d show that the proposed LSTM-based routing approach offers the highest average PDR and throughput compared to all the other routing approaches at varying data rates.

VII. CONCLUSION AND FUTURE WORK
In this paper, we explained why route selection is important in wireless mesh networks, focusing in particular on network performance metrics such as packet delivery ratio, throughput, delay and jitter. Traditional routing algorithms were analyzed in order to compare them with machine learning-based algorithms that attempt to solve the same challenge of network route selection.
Using a simulated network environment, we developed an algorithm to collect and generate a two-hop QoS-based dataset based on network performance metrics. Using the developed dataset, learning models including MLP, RF, LR and LSTM were trained to estimate the best next 2-hop link to be used. The performance of the learning-based algorithms was compared with that of AODV, a traditional routing algorithm. The results in terms of packet delivery ratio and throughput showed that LSTM performs best and also attains its peak throughput at the highest data rate. The LSTM model also provided the best accuracy in the test and estimation of the best qualified next 2-hop links for transmission.
To the best of our knowledge, this work is one of the few studies that apply deep learning to route selection and, in particular, provide a process flow that formulates a network QoS-based dataset.
For future work, reinforcement learning and Deep Q-Network models can be tested on data derived from our data collection pipeline and the resulting network performance checked. Additionally, our learning model can be further tested with multi-hop data to determine the hop size that provides the best network performance.