Effective Routing in Vehicular Ad hoc Network (VANET) using a Bio-inspired Algorithm: Enhanced Deep Reinforcement Learning (EDRL) for Secure Wireless Communication

To improve the performance of city-wide road networks through optimized signal control, we propose a framework for Vehicular Ad hoc Networks (VANETs). Within this framework, a node that drastically reduces traffic efficiency is identified as a critical node. A tripartite graph built from vehicle trajectories is used to identify critical nodes from a network-wide viewpoint. An Enhanced Deep Reinforcement Learning (EDRL) method is introduced to control the traffic signal and to decide how data are routed from a Road Side Unit (RSU) to an intermediate or destination node. Experiments with the proposed model show considerable improvements in the delay and travelling time of nodes in the VANET.


Introduction
In VANETs, optimizing control signals for dynamic and effective routing is a challenging problem, and various strategies have been proposed to improve traffic efficiency. A reinforcement learning mechanism can dynamically learn the state of the network and of the traffic, and make routing decisions accordingly [1]. To reduce the occurrence of congestion, an adaptive, real-time signal control method optimizes the path using reinforcement learning [2]. In [3], vehicular traffic is optimized by considering parameters such as planning and earmarking; the agent issues rewards, and transition probabilities are used. Modelling the relationship between the control signal (the agent) and its consequences (the effect) improves efficiency compared with conventional schemes for network traffic [4]. To reduce the travelling time and delay experienced by vehicles, a multi-agent architecture is introduced in [5], in which sensors collect historical traffic data from all junctions and the delay is computed from parameters such as a threshold value and weights. To handle the closed-loop optimization problem in an adaptively signal-controlled VANET, a reinforcement learning method is introduced in [6] that is more efficient than a predetermined controller. Performance measures such as total round-trip time, average time and waiting time were optimized through consistent signal configuration at all intersections of the VANET using a multi-agent system.
The same work also increased the data flow rate, helped avoid accidents in residential areas, and made nodes move at moderate velocities to minimize fuel consumption [7]. A modified reinforcement learning algorithm with a mathematical function is introduced in [8] to improve performance in terms of the average objective function and the average number of learning episodes across various dimensions. For long-distance travel, travel-time prediction is an effective tool for traffic management; in [9], a Gradient Boosting (GB) method is introduced for this purpose. GB had previously been used for short-distance travel-time prediction; here, additional parameters such as time, day, week and tolls are considered, and Fourier filtering is applied to suppress noise so that the method can be used over longer distances. By integrating regression trees with boosting, the method addresses both regression and classification problems and minimizes the loss function without altering the input factors. It also indicates how far the predicted value deviates from the actual value [10].

Related Works
In a VANET whose nodes are connected in an urban area, vehicle locations, velocities and signal strengths can be collected easily. In [11], a genetic-algorithm-based adaptive signal controller is used for dynamic traffic signal optimization. Driver-operated and automated vehicles differ with respect to safety: in the former case, the operator knows the pre-defined information, whereas for an automated vehicle, injecting high-definition historical data is a demanding task. This is handled with location-based decision making [12]. In [13], vehicles and Road Side Units (RSUs) are integrated in real time based on a virtual path. When a vehicle dynamically changes lanes, merges or weaves, dynamic merging assistance is provided, and the gaps between nodes are closed using the relationship between each node and its virtual trajectory. In heterogeneous VANET environments, delay is minimized by combining machine learning with software-defined networking (SDN), and vehicle mobility is predicted using a centralized forwarding method assisted by the SDN controller [14]. Identifying congestion and extracting its correlations in VANET traffic is an important concern for optimized routing; in [15] this is done with the floating car data method, which considers two main parameters: the speed limit of the road and the current vehicle velocity.
This method improves the understanding of urban traffic flow characteristics and congestion propagation, and it supports the prediction of node travel speed, the estimation of travel time and the analysis of data reliability. In congested urban traffic, the vehicles currently supporting data transmission are selected and their performance is estimated using the hysteresis phenomenon [16]. In [17], seven categories of factors that affect the travel time of nodes in urban areas are analysed and extracted from contextual data, and a semi-supervised learning method is used to train the architecture. In [18], traffic video is analysed using an artificial intelligence scheme: large amounts of video data are received, analysed and handled without loss using a directed transmission scheme, and increases in travel time are measured in vehicular traffic. In [19], driver behaviour is collected using an on-board unit and a Global Positioning System (GPS) receiver fixed in the vehicle; six comparable curvatures, velocity, acceleration and the driver's side view are grouped into clusters and optimized using a Hidden Markov Model. In [20], a deep reinforcement learning algorithm is used for traffic signal timing; it learns from sampled data using a neural network, and the proper signal timing is modelled from the system conditions.

Deep Reinforcement Learning (DRL)
Machine learning is commonly classified into supervised, unsupervised and reinforcement learning (RL). In RL, an agent receives information from the environment that surrounds it. RL is particularly suited to complex problems that must be addressed dynamically. Unlike supervised and semi-supervised learning, RL does not use pre-processed data for decision making; instead, the environment gives the agent a penalty or a reward for every action. Through repeated actions, the agent learns to improve its performance so as to collect more reward while reducing its penalties. Incorporating a tripartite graph to identify critical nodes at network intersections provides additional help in reaching an optimal solution; this combination is EDRL. Because critical nodes are identified in advance, decision-making time at the RSU is reduced. When many nodes participate in the network, reinforcement learning takes longer to make decisions; EDRL overcomes this drawback.
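The reward-and-penalty loop described above can be illustrated with tabular Q-learning on a toy packet-forwarding problem. Q-learning is a stand-in here: the text does not specify EDRL's update rule or network architecture, and the topology, reward values (+10 on delivery, -1 for every other hop) and hyperparameters below are all hypothetical.

```python
import random

# Hypothetical toy topology: node -> neighbours reachable in one hop.
GRAPH = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: []}
DEST = 4  # hypothetical destination node


def reward(next_node: int) -> float:
    """Reward for delivering to DEST, small penalty for every other hop."""
    return 10.0 if next_node == DEST else -1.0


def train(episodes: int = 500, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.3, seed: int = 0) -> dict:
    """Tabular Q-learning: q[(s, a)] is the learned value of forwarding from s to a."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, nbrs in GRAPH.items() for a in nbrs}
    for _ in range(episodes):
        s = rng.choice([0, 1, 2, 3])  # random non-destination start node
        for _ in range(20):  # cap episode length so loops cannot run forever
            if s == DEST:
                break
            nbrs = GRAPH[s]
            if rng.random() < epsilon:
                a = rng.choice(nbrs)  # explore
            else:
                a = max(nbrs, key=lambda n: q[(s, n)])  # exploit current estimate
            # Bootstrapped target; DEST has no neighbours, so its future value is 0.
            future = max((q[(a, n)] for n in GRAPH[a]), default=0.0)
            q[(s, a)] += alpha * (reward(a) + gamma * future - q[(s, a)])
            s = a
    return q


def greedy_route(q: dict, src: int) -> list:
    """Follow the highest-valued next hop from src until DEST (or a hop limit)."""
    path = [src]
    while path[-1] != DEST and len(path) <= len(GRAPH):
        s = path[-1]
        path.append(max(GRAPH[s], key=lambda n: q[(s, n)]))
    return path
```

With these values, `greedy_route(train(), 0)` should return the two-hop route `[0, 2, 4]` rather than the three-hop `[0, 1, 3, 4]`: the per-hop penalty, discounted by `gamma`, makes shorter routes more valuable.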
The transition probability that a node moves from state $s$ to state $s'$ under action $a$ is

$$P(s' \mid s, a) = \Pr[s_{t+1} = s' \mid s_t = s,\ a_t = a] \tag{1}$$

where $P$ represents the probability distribution. In (2), $R(s, a)$ is the experience, which may be a reward or a penalty, obtained by taking action $a$ in the current state $s$:

$$R(s, a) = \mathbb{E}[r_t \mid s_t = s,\ a_t = a] \tag{2}$$

$Q(s, a)$ is an array that accumulates these experiences and is used for decision making. The state policy can be expressed as

$$\pi(a \mid s) = \Pr[a_t = a \mid s_t = s] \tag{3}$$

Here, $\pi(a \mid s)$ is the state policy: it gives the probability of performing action $a$ in state $s$. Taking the discount factor into account, the long-term return of the state is

$$M = \sum_{t=0}^{\infty} \gamma^{t} r_t \tag{4}$$

where $M$ represents the long-term return, $\gamma \in [0, 1)$ denotes the discount factor and $t$ denotes time.
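Equations (1), (3) and (4) can be made concrete with a few helper functions. This is a sketch under stated assumptions: transition probabilities are estimated by simple counting, the policy is assumed to be a softmax over Q-values (the text does not fix its form), the infinite sum in (4) is truncated to a finite episode, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def estimate_transitions(samples):
    """Empirical P(s' | s, a), Eq. (1), from observed (s, a, s') triples."""
    counts = defaultdict(Counter)
    for s, a, s_next in samples:
        counts[(s, a)][s_next] += 1
    return {
        sa: {s_next: n / sum(ctr.values()) for s_next, n in ctr.items()}
        for sa, ctr in counts.items()
    }


def softmax_policy(q_row, temperature=1.0):
    """Map the Q(s, a) values of one state to probabilities pi(a | s), Eq. (3).

    The softmax (Boltzmann) form is an assumption made for illustration.
    """
    m = max(q_row.values())  # subtract the max for numerical stability
    weights = {a: math.exp((q - m) / temperature) for a, q in q_row.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


def discounted_return(rewards, gamma):
    """Long-term return M = sum_t gamma**t * r_t, Eq. (4), over a finite episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

For example, `discounted_return([-1.0, -1.0, 10.0], 0.9)` evaluates to approximately 6.2, that is, -1 - 0.9 + 0.81 × 10.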

Critical node discovery
Intersection Road Rank (IRR) is used for critical node discovery. Compared with other available methods, IRR takes the geographical information of the VANET into account and collects all possible node trajectories through the Global Positioning System (GPS) receivers mounted on the vehicles. Because of external interference (noise), a GPS fix may deviate from its expected road segment; such points are mapped back to the corresponding road segment by map matching. In addition, Source and Destination pair (SD-pair) information is obtained by partitioning the city map into fine-grained road regions using an area divider. A tripartite graph is then used to build journey statistics for all nodes from these regions and from the processed trajectories of all participating nodes; it contains three types of vertices: SD-pairs, trajectories and junctures.
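The map-matching step can be sketched as snapping each GPS fix to its nearest road segment. This nearest-segment rule is a simplification assumed for illustration, since the text does not say which matcher IRR uses; coordinates are treated as planar (for example, metres in a local projection), and the segment IDs are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the closest point on segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: a single point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def match_trajectory(points: List[Point], segments: Dict[str, Segment]) -> List[str]:
    """Snap every noisy GPS fix to the id of its nearest road segment."""
    return [
        min(segments, key=lambda sid: point_segment_distance(p, *segments[sid]))
        for p in points
    ]
```

For two parallel roads `{"A": ((0, 0), (10, 0)), "B": ((0, 5), (10, 5))}`, the noisy fixes `(3, 1)` and `(7, 4.2)` are matched to `["A", "B"]`.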

Fig 2. Tripartite graph
Fig. 2 shows the tripartite graph for a scenario in which n nodes participate in a VANET at a large road junction. It illustrates how the critical road and a suitable intermediate node for communication between the source and destination nodes are identified.
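The tripartite structure can be sketched with plain adjacency sets: SD-pairs link to the trajectories that serve them, and trajectories link to the junctures they pass through. Ranking junctures by how many distinct SD-pairs depend on them is a hypothetical proxy for criticality chosen for illustration; it is not the IRR scoring rule, which the text does not give. All identifiers are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

SDPair = Tuple[str, str]
Trip = Tuple[SDPair, str, List[str]]  # (SD-pair, trajectory id, junctures visited)


def build_tripartite(trips: Iterable[Trip]):
    """Return the SD-pair -> trajectory and trajectory -> juncture edge sets."""
    sd_to_traj: Dict[SDPair, Set[str]] = defaultdict(set)
    traj_to_junc: Dict[str, Set[str]] = defaultdict(set)
    for sd, traj, junctures in trips:
        sd_to_traj[sd].add(traj)
        traj_to_junc[traj].update(junctures)
    return sd_to_traj, traj_to_junc


def rank_junctures(sd_to_traj, traj_to_junc) -> List[Tuple[str, int]]:
    """Rank junctures by the number of distinct SD-pairs whose trajectories cross them."""
    served: Dict[str, Set[SDPair]] = defaultdict(set)
    for sd, trajs in sd_to_traj.items():
        for traj in trajs:
            for j in traj_to_junc[traj]:
                served[j].add(sd)
    # Most-shared juncture first; ties broken by juncture id for determinism.
    return sorted(((j, len(sds)) for j, sds in served.items()), key=lambda x: (-x[1], x[0]))
```

The juncture at the head of the returned list is the candidate critical node under this proxy, because removing or congesting it affects the largest number of SD-pairs.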

Problem Definition
Consider a situation in which N nodes {1, 2, 3, …, N} participate in an urban-area VANET, where the inflow and outflow of nodes are dynamic. Because the nodes are dynamic, communicating data between them is highly complex. To predict node stability in this dynamic environment, RL is used: $P(s' \mid s, a)$ gives the transition probability, each state's policy is estimated with $\pi(a \mid s)$, and the long-term return is evaluated with $M = \sum_{t=0}^{\infty} \gamma^{t} r_t$. In addition, critical nodes are identified in the congested environment that arises when many nodes participate in the network. The tripartite graph identifies critical nodes by considering three parameters: {SD-pair, trajectory, juncture}. With every increase in the number of participating nodes, routing overhead also increases gradually; however, compared with other existing protocols, the proposed EDRL protocol yields reduced routing overhead. The vehicle velocity was also increased gradually, and the routing overhead of all protocols was estimated. The magnitude of a node's speed is equal to or less than the speed of light. The proposed algorithm achieves lower latency than the prevailing protocols.
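One way the identified critical nodes could feed into the RSU's routing decision is to exclude them when choosing the next hop from learned Q-values, falling back to the best overall neighbour only when every candidate is critical. This is a hypothetical sketch: `select_next_hop`, the node names and the fallback rule are assumptions, not the authors' stated procedure.

```python
from typing import Dict, Optional, Set


def select_next_hop(q_row: Dict[str, float], critical: Set[str]) -> Optional[str]:
    """Pick the neighbour with the highest Q-value, skipping critical nodes when possible.

    q_row maps each candidate neighbour to its learned Q(s, a) value.
    """
    if not q_row:
        return None  # no neighbour in range
    safe = {n: q for n, q in q_row.items() if n not in critical}
    pool = safe if safe else q_row  # fall back if every neighbour is critical
    return max(pool, key=pool.get)
```

With `q_row = {"v1": 5.0, "v2": 8.0, "v3": 6.5}` and `v2` marked critical, the function returns `"v3"`, trading a slightly lower Q-value for avoiding the congested node.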

Conclusions
In a VANET, identifying an efficient path for transmitting data to the destination with optimized decision making is a complex problem, because node mobility is unpredictable. A DRL algorithm can handle such dynamic scenarios, but as the number of participating nodes grows, its path identification becomes less effective. To handle this, EDRL incorporates a tripartite graph into DRL to identify critical nodes, which can then be avoided. In this way, EDRL selects an optimal path to the destination through effective decision making. We compared the routing algorithms QL, DRL and DQRN with the proposed EDRL algorithm, which showed improved performance in three parameters: routing overhead, packet delivery ratio and latency.

Conflicts of interest/Competing interests
The authors declare no conflicts of interest.

Availability of data and material
Not