Connectivity Enhancement of E-VANET Based on QL-mRSU Self-Learning Energy-Saving Algorithm

With the development of smart cities and smart electric vehicles (EVs), the problem of improving the performance of Vehicular Ad-hoc Networks (VANETs) is gradually being emphasized. To improve the network performance of VANETs, some scholars have considered parked vehicles as roadside units, but have not paid attention to the energy consumption characteristics of vehicles, especially electric vehicles. Therefore, in this paper, we propose a QL-mRSU series artificial intelligence energy saving method to optimize the energy consumption of parked electric vehicles during communication. The method is based on electric vehicle self-organizing networks (E-VANETs), which dynamically cluster electric vehicles parked in parking lots by parameters such as traffic flow, number of service demands, and charging index in reinforcement learning, select the most suitable vehicles as mobile roadside units (mRSUs), and adjust the working mode according to environmental changes such as the number of service demands to achieve the effects of self-learning and energy saving. The simulation experimental results show that compared with other energy-based routing algorithms, the method is able to make optimal choices through self-learning with guaranteed communication quality and is more adaptable to traffic flow changes on the road, thus ensuring the stability of energy-saving efficiency. In addition, the method significantly improves the energy structure of electric vehicle parking clusters.


I. INTRODUCTION
Currently, with global electrification, the development of electric vehicles is gradually being emphasized [1]. However, the development of electric vehicles faces many problems, the most important of which is energy consumption. Many factors affect the energy consumption of vehicles, such as battery type and capacity, body weight, and size [2]. Therefore, some scholars have proposed a VANET-based electric vehicular ad-hoc network (E-VANET) that accounts for the characteristics of electric vehicles, such as energy management, communication interfaces, and energy consumption [3].
There are two major communication technology paths in the VANET. One is Cellular Vehicle-to-Everything (C-V2X) technology, based on cellular network communication; the other is Dedicated Short Range Communication (DSRC) technology, based on improvements to Wi-Fi. The same holds in the E-VANET. However, as the number of smart electric vehicles grows, it is clear that DSRC technology cannot meet the communication needs of VANETs. Compared to DSRC, C-V2X has the advantages of higher safety, wider coverage, and more reliable communication connections, but it also has a higher cost and power consumption [4]. The C-V2X technology includes two types of interfaces: PC5 and Uu. The former is a direct communication interface, which enables direct communication between vehicles, people, and road infrastructure over short distances; the latter is a cellular network communication interface, where terminals communicate with each other through a base station, which enables reliable communication over longer distances and a wider range.
The associate editor coordinating the review of this manuscript and approving it for publication was Peng-Yong Kong.
In addition, the Wireless Sensor Network (WSN) is also widely used in VANETs. A WSN deploys sensor nodes that sense environmental information in real time and perform intelligent data collection and processing; it is the key link that connects the Internet of Things (IoT) sensing layer to the real world. Therefore, the intelligent routing algorithm of the WSN is also very important. In this paper, we design a WSN intelligent routing algorithm based on clusters and cluster heads, and use parameters such as remaining power and charging index as evaluation indicators to select the most suitable cluster heads.
In recent years, with the continuous development of intelligent transportation systems (ITS), the number of roadside units (RSUs) and on-board units (OBUs) in cities has exploded, and the demand for VANET services has increased [5]. To relieve the construction pressure of RSUs and to increase the connectivity of VANETs, many researchers have chosen to use idle vehicles parked in parking lots as mobile RSUs.
Some studies have shown that during VANET communication, the battery of a fuel vehicle is sufficient to maintain continuous communication with other communication devices for up to 5 days without charging [6]. Therefore, some scholars believe that the energy consumption of parked cars is negligible. However, electric vehicles carry far more electronic devices than fuel vehicles, so they still consume a lot of power while parked. According to the Chinese national standard GB/T-31484-2015, an electric vehicle power battery must complete more than 1000 test cycles by the time its capacity decays to 80% of the initial value, or more than 500 test cycles by the time its capacity decays to 90% of the initial value. Electric vehicles are therefore subject to more stringent standards on battery wear, so their energy consumption deserves attention [7].
In this paper, we study the energy consumption of E-VANET-based electric vehicle parking clusters. The uncertainty of the external environment and various parameters of EVs may bias the results of this study. Therefore, this paper only considers the scenarios of parking in surface parking lots with charging facilities (CF-Parking Lots) and on-street parking lots without charging facilities (NCF-Parking Lots).
In addition, self-learning has an important place in the algorithms of this paper. The self-learning method selected in this paper is reinforcement learning (RL). Reinforcement learning is a machine learning paradigm that describes and addresses the problem of an agent learning a strategy, through interaction with its environment, that maximizes rewards or achieves specific goals. Learning is reward-guided: the agent learns by "trial and error" while interacting with its environment, with the goal of maximizing rewards. Since the external environment provides little information, the reinforcement learning system (RLS) must learn through its own experience. In this way, the RLS learns how the environment evaluates its actions and improves its action plan to adapt to the environment.
In fact, building RSUs in large numbers is not cost-effective. Therefore, using parked cars in parking lots as mobile RSUs can greatly increase the connectivity of VANETs. However, the power consumption of EVs increases with the length of time that parked vehicles are involved in communication. When the remaining power of an electric vehicle is too low, the vehicle may not be able to start. Therefore, we propose a method that achieves self-learning dynamic energy saving with guaranteed quality of service (QoS), so that parking clusters can work longer. The main contributions of this paper are as follows:
• The problem under study is modeled using a Markov model based on parameters such as vehicles, roads, parking lots, energy consumption, and traffic flow.
• To prevent the electric vehicle battery from being so depleted that the vehicle cannot start, this paper defines a battery-charge threshold below which a vehicle does not participate in the algorithm selection.
• A reinforcement learning approach is incorporated to design separate energy-saving algorithms for parking clusters of EVs parked in CF-Parking Lots and NCF-Parking Lots.
• Based on external environmental changes, the reinforcement learning algorithm QL-mRSU-Mode dynamically adjusts the state of the mRSU to adapt to the dynamic changes of the E-VANET environment.
This paper is organized as follows. Section II discusses related work on VANET. Section III describes the problem of energy saving in parking clusters in detail. Section IV presents the formulation and design of the three proposed algorithms: QL-mRSU-Num, QL-mRSU-Choose, and QL-mRSU-Mode. Section V performs simulations and analyzes the simulation results. Section VI concludes the paper and discusses related issues.

II. RELATED WORK
According to parking statistics from two surveys in the downtown area of Montreal, Canada [8], 61,000 vehicles go in search of a parking space every day in a survey area of almost 5,500 square kilometers. The survey report shows that people are much more likely to choose outdoor parking than indoor parking (including garages and underground parking). Among them, the probability of parking in on-street parking is 69.2%; the probability of parking in outdoor street parking is 27.1%; and the probability of parking in indoor parking is only 3.7%. According to another survey report [9], the parking time of some special vehicles can even reach 23 hours per day. The proportions of vehicles parked in parking lots and of vehicles on the road were 95.83% and 4.17%, respectively.
In addition, the data for off-street parking lots in Ann Arbor, Michigan, USA are shown in Table 1 [10]. The two sites A and B represent two areas of different vehicle density. The time periods represent peak, off-peak, and all-day periods, respectively. According to Table 1, the average occupancy rate of parking spaces at site A is 93.5% throughout the day; during the peak period, the occupancy rate of parking spaces at both sites A and B is nearly 100%; during the off-peak period, the occupancy rate of parking spaces at site A is still as high as 79.0%. In other words, a large number of vehicles in the city are parked in parking lots rather than on the road.
As early as 2011, Liu et al. [11] pioneered the idea of parked vehicle assist (PVA). The core idea is to use parked vehicles as static nodes in the VANET and to improve the network connectivity of the VANET by having them communicate with other nodes through radio devices, using their batteries as energy sources.
VOLUME 11, 2023
Inspired by Liu et al., Sargento et al. designed a parking cluster algorithm based on multiple aspects such as signal strength as a scoring criterion to switch roles between vehicles [12], [13]. However, the algorithm does not detail the decision process regarding energy usage. Some scholars have also considered buses or cabs as mobile RSUs responsible for forwarding messages from other nodes in the VANET, but the non-stop moving nature of buses and cabs may lead to unstable network connections to other nodes in the VANET [14], [15].
Zhao et al. [16] propose a parking space-based VANET data distribution scheme that enables the data cache to be stored on the parked vehicles at the roadside, providing continuous data distribution services to the passing vehicles. Some scholars have also proposed the idea of parking edge computing, where parked vehicles or surrounding vehicles are used to assist the edge server in offloading task processing, and relevant task scheduling algorithms are designed to select the allocation of resources [17], [18], [19], [20], [21]. However, not many of the above schemes and algorithms have been studied in terms of energy consumption.
In terms of smart routing for WSNs, Hu et al. [22] proposed a customized network topology based on intelligent transportation systems that allows users to obtain traffic and road information via wireless sensor networks directly from local sources within their wireless range rather than remote ITS data centers. Agarwal et al. [23] proposed a scheme for smart data routing in WSNs based on mobile sinks and validated it in terms of stability period and network performance metrics such as network lifetime and throughput. Patel et al. [24] proposed a cross-layer variant of AODV based on energy conservation, replacing the hop count metric with link quality and conflict counting. However, these schemes are not applicable to E-VANETs.
A number of scholars have designed routing algorithms to enhance the connectivity of the network by using parking clusters or mobile vehicles as relay nodes [25], [26], [27], [28]. Liu et al. [29] proposed a parking area-assisted spider web routing protocol (PASRP) based on parked vehicles as relay nodes, selecting the path with the least delay as the transmission path. Zhu et al. [30] designed a new messaging scheme to efficiently transmit messages to target vehicles through the proposed virtual overlay network. Chang et al. [31] proposed an energy-efficient geographic routing algorithm that exploits the direction, density, and distance between nodes in a cross-roads routing strategy to improve link stability. An improved multicast-based energy-efficient opportunity data scheduling algorithm has also been proposed to serve selected groups of users by multicasting services at optimal data rates [32]. All these schemes enhance the connectivity of the network well, but they pay little attention to energy consumption.
Moreover, many scholars have done other research on VANET. For example, aerial drones have been used as relay nodes for VANET to reduce the obstruction caused by buildings [33], [34], [35], but this tends to have some impact on the environment. Zhu et al. [36] proposed a reputation-based cooperative content delivery mechanism that formulates the relationship between mobile vehicles, roadside units, and parked vehicles as a two-level auction game model to improve the efficiency and security of content delivery. However, the authors did not address the issue of limited energy.
In fact, many scholars have done a lot of research in the field of energy management for VANET. Some scholars have proposed to install solar panels on the top of electric vehicles or on top of buildings to collect solar energy and convert it into electricity to be stored in batteries [37], [38], [39], [40]. Some scholars have also implanted renewable energy sources into RSUs to reduce their power consumption and extend the life of their batteries [41]. Liu et al. [42] proposed a two-layer optimization model with a hybrid genetic algorithm/particle swarm optimization (GA/PSO) algorithm. Sun et al. [43] presented an optimal approach to enable parked vehicles to provide services in the most energy-efficient way, dynamically exploiting external environmental factors to achieve energy savings and emission reductions. All of the above solutions have energy consumption studies, but are not applicable to E-VANET.

III. PROBLEM STATEMENT AND MOTIVATION
In Section II, several related proposals are summarized and referenced. This paper is based on one of the proposals to use parked vehicles in parking clusters as mobile RSUs to reduce RSU construction pressure and enhance E-VANET network connectivity, as shown in Figure 1.
Electric vehicles parked in parking lots are mainly divided into three categories: the first category is vehicles that are charging or not charging in CF-Parking Lots; the second category is vehicles parked in NCF-Parking Lots; and the third category is vehicles parked in indoor parking lots. Since there are fewer vehicles parked in indoor parking lots and the signal is unstable, it has little effect on the connectivity enhancement of E-VANET, so the research background of this paper is based on the first two cases only, as shown in Figure 2.
Electric vehicles are parked in parking lots in both charging and non-charging states. According to Patt et al. [44], the percentage of vehicles in the CF-Parking Lot with more than 80% remaining energy is about 10.6%; between 60% and 80% is 19.3%; between 40% and 60% is 24.7%; between 15% and 40% is 25.5%; and below 15% is 19.9%. In the NCF-Parking Lot, the proportion of vehicles with more than 80% remaining energy is 16.5%; between 60% and 80% is 37.9%; between 40% and 60% is 30.1%; between 15% and 40% is 11.5%; and below 15% is 4.0%, as shown in Figure 3 and Figure 4. The survey results also show that about 60% of the EVs in the CF-Parking Lot choose to charge, and the lower the remaining charge, the higher the percentage of EVs that choose to charge. Selecting some of the electric vehicles parked in a parking lot to act as mRSUs is therefore a worthwhile research problem. Unlike fuel vehicles, whose batteries receive no energy after parking, a portion of electric vehicles can recharge in the parking lot, which makes the parking energy-saving algorithms proposed by some scholars inapplicable to electric vehicles [46]. In this paper, we design separate algorithms for parking clusters in surface parking lots with charging facilities (CF-Parking Clusters) and in on-street parking lots without charging facilities (NCF-Parking Clusters) from the perspective of energy saving. These algorithms improve energy-saving efficiency and reduce battery wear while ensuring the quality of communication and continuously optimizing the mRSU through reinforcement learning.
In addition, a vehicle will not be in communication all the time after it is selected, especially at night when there are few moving vehicles, because communicating continuously would waste the power of the parked car. The vehicle executes its own algorithm to continuously optimize its energy structure as the environment changes, at regular intervals, or when the parking cluster structure changes.

IV. FORMULATION AND ALGORITHM DESIGN
In this section, the corresponding algorithms are designed for the number of mRSUs, the selection of mRSUs, and the selection of operating modes, respectively. The core ideas of this paper mainly include the following points:
1) The algorithms all use reinforcement learning, allowing the parking clusters to learn on their own and make better choices based on the changing environment.
2) Based on the traffic flow, the status of the parking cluster in the parking lot, and similar factors, we determine whether a parking cluster is suitable as a relay node and select the appropriate number of mRSUs.
3) The electric vehicles in the selected parking clusters are scored and compared based on their condition, and the better vehicle is selected as the mRSU.
4) The selected vehicle can dynamically adjust its working mode according to changes in the environment to achieve an energy-saving effect.
In reinforcement learning, one of the most classic and widely used algorithms is the Q-Learning algorithm, which is a value-based algorithm. Q(s, a) is the expected gain from taking action a (a ∈ A) in state s (s ∈ S) at a given moment, and the environment returns the corresponding reward in response to the agent's action. The main idea of the algorithm is to construct a Q-table of states and actions to store Q-values, and then select the action that obtains the maximum gain according to the Q-values. Specifically, the Q-value is updated by the following formula:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right] \tag{1}$$
where Q(s, a) represents the current Q-value of state s when action a is chosen, and R is the reward. In state s, the agent selects an action a, finds the maximum possible Q-value in the next state s′, and updates the current Q-value under the assumption that a is taken. The discount factor γ (γ ∈ (0, 1)) represents the decay value: the larger the value of γ, the more the Q-Learning algorithm focuses on future rewards; conversely, the smaller the value of γ, the more it considers immediate benefits. The learning rate α (α ∈ (0, 1)) determines how much of the current error is learned in each update.
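The Q-table update described above can be illustrated with a minimal sketch. It assumes the standard tabular Q-Learning rule and an epsilon-greedy policy (the algorithms in this paper list an epsilon parameter, but the exploration policy shown here is an assumption); the state names, action labels, and default parameter values are hypothetical.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular update:
    Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


def choose_action(q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice (assumed policy): explore with probability
    epsilon, otherwise pick the action with the largest Q-value."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Hypothetical example: two actions and two abstract states.
q_table = defaultdict(float)      # unseen (state, action) pairs start at 0.0
actions = ["work", "rest"]        # illustrative labels only
new_q = q_update(q_table, "s0", "work", reward=1.0,
                 next_state="s1", actions=actions)
```

With an empty table, the first update moves Q("s0", "work") a fraction α of the way toward the observed reward, which is the behavior the learning-rate description above implies.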
The core of Q-Learning is to select the optimal solution through continuous learning, and the purpose of this paper is to let parked vehicles select the most suitable mRSU by themselves to achieve energy saving, so the Q-Learning algorithm is well suited to this study. Therefore, the three algorithms proposed in this paper are designed based on the Q-Learning algorithm and are named QL-mRSU-Num, QL-mRSU-Choose, and QL-mRSU-Mode, respectively.

A. QL-mRSU-NUM
In this paper, it is necessary to select the appropriate mRSU according to the environment and the condition of the vehicle itself. Before that, however, the number of electric vehicles acting as mRSUs in each parking cluster should be selected based on the main factors, which are as follows:
1) Traffic flow: the traffic flow differs between roads and between times of the day. Therefore, the traffic flow is an evaluation indicator for the selection of mRSUs.
2) Service demand quantity: in general, the service demand quantity is proportional to the traffic flow, as shown in Figure 5. Not every vehicle on the road needs to access an mRSU, so the number of mRSU connections should be selected according to the service requirements of the vehicles.
3) Charging facilities in the parking lot: mRSUs in NCF-Parking Lots can only consume their own power and cannot replenish it. However, in CF-Parking Lots, mRSUs can choose to use charging facilities, so the charging situation is also a very important factor.
According to the previous description, it is first assumed that the moving vehicles on the road follow a Poisson distribution and that their number is M_v; the vehicles parked in the parking lot and at the roadside are randomly distributed; the length of the experimentally designed road is D; and the maximum communication coverage radius is R. From this, the distance between two adjacent vehicles is exponentially distributed with parameter 1/λ.
Because moving vehicles on the road follow the Poisson distribution [43], [44], the probability of having x moving vehicles on the road can be calculated by formula (2).
According to the law of Poisson distribution, the average number of vehicles traveling on a road section of length D can be calculated using formula (3).
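As an illustration of the Poisson traffic model, the sketch below computes the probability of x vehicles on a road of length D and the expected vehicle count. It assumes that λ denotes the mean number of moving vehicles per unit road length, so that the count on the road has mean λD and the mean spacing is 1/λ; this parameterization, the function names, and the numeric values are assumptions made for illustration.

```python
import math


def poisson_pmf(x, lam, road_length):
    """Probability of exactly x moving vehicles on a road of the given length,
    assuming a Poisson count with mean lam * road_length."""
    mean = lam * road_length
    return math.exp(-mean) * mean ** x / math.factorial(x)


def expected_vehicles(lam, road_length):
    """Mean number of moving vehicles on the road under the same assumption."""
    return lam * road_length


# Hypothetical values: 0.02 vehicles per metre on a 500 m road.
lam = 0.02
D = 500.0
mean_vehicles = expected_vehicles(lam, D)
# The probabilities over all counts should sum to (approximately) one.
total = sum(poisson_pmf(x, lam, D) for x in range(100))
```

Under these assumed values, the road carries ten moving vehicles on average, and the probability mass is concentrated around that count.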
Assume that the average energy-saving efficiency is ω; P_avg represents the average power of all parked vehicles; P_all represents the power when all parked vehicles are communicating, except for parked vehicles with remaining power below the threshold; P_on represents the power of a parked vehicle while communicating; and P_off represents the power of a parked vehicle at rest. M_work is the number of parked vehicles with communication equipment turned on; M_rest is the number of parked vehicles in the rest state; and M_close is the number of parked vehicles whose remaining power is below the threshold and which have entered the close state. P_avg, P_all, and M_rest can be expressed by formulas (4), (5), and (6), where P_a denotes the probability that a driving vehicle communicates directly without the intervention of the mRSU, and P_b denotes the probability that a driving vehicle needs to communicate with the mRSU. Since the experiments assume that the driving vehicle communicates with the mRSU directly, P_a = 0 and P_b = 1 in this paper, from which the average energy-saving efficiency can be expressed by formulas (7) and (8).
Furthermore, formula (1) shows that the reward R plays an important role. The average traffic flow V_veh, the traffic flow veh, the service demand quantity demand, the number of charging facilities in the parking lot N_fac, and the number of charging facilities in use N_use are the main components of R in this section, as represented by formula (9), where a, b, and c are weighting factors with a + b + c = 1. The flowchart of the QL-mRSU-Num algorithm is shown in Figure 6. The specific algorithm is shown in Algorithm 1, and the key parameters are shown in Table 2.
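The eligibility bookkeeping in Algorithm 1 (counting vehicles at or below the energy threshold W and ignoring a cluster that cannot supply the requested number of mRSUs) can be sketched as follows. The Q-Learning step that proposes the number of mRSUs is not shown; the function names, the use of None to mark an ignored cluster, and the sample energy values are hypothetical.

```python
def count_closed(energies, threshold):
    """Count parked vehicles whose remaining energy is at or below the
    threshold W; these vehicles do not take part in communication."""
    return sum(1 for energy in energies if energy <= threshold)


def feasible_mrsu_count(n_mrsu, energies, threshold):
    """Return n_mrsu if the cluster has enough eligible vehicles,
    otherwise None to signal that the cluster is ignored."""
    n_park = len(energies)
    n_close = count_closed(energies, threshold)
    if n_mrsu > n_park - n_close:
        return None
    return n_mrsu


# Hypothetical cluster: remaining energy as a fraction of capacity.
energies = [0.90, 0.10, 0.50, 0.12, 0.70]
W = 0.15
print(feasible_mrsu_count(2, energies, W))   # enough eligible vehicles
print(feasible_mrsu_count(4, energies, W))   # too few, cluster ignored
```

Keeping this check separate from the learning step makes it easy to see that a cluster is skipped only when the proposed number of mRSUs exceeds the vehicles that remain above the threshold.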

Algorithm 1 QL-mRSU-Num
Input: (1) Parameters for reinforcement learning α, γ, epsilon; (2) Traffic flow veh, parking quantity N_park, service demand quantity N_demand, daily average traffic flow V_veh, parking lot charging facility quantity N_fac, number of vehicles not participating in communication N_close, number of charging facilities in use N_use, threshold W; (3) The weights a, b, c.
Output: Number of vehicles in this parking cluster that need to act as mRSUs, N_mRSU.
1: for each parking cluster do
2:   N_close = 0, N_mRSU = 0;
3:   for each parked car do
4:     Collect remaining electricity Energy;
5:     if Energy ≤ W then
6:       N_close = N_close + 1;
7:     end if
8:   end for
9:   Calculate R according to formula (9);
10:   Run Q-Learning;
11:   Select the optimal number of mRSUs, N_mRSU;
12:   if N_mRSU > N_park - N_close then
13:     Ignore this parking cluster;
14:   end if
15: end for
16: return N_mRSU.

Table 2: The key parameters of QL-mRSU-Num.

B. QL-mRSU-CHOOSE
Scenario analysis: based on a survey and study of relevant parking lots, this paper mainly considers parking clusters in two types of parking lots.
In the first scenario, the parking lot is a temporary parking lot located on the side of the road. In this type of parking lot, electric vehicles cannot replenish energy because there are no charging facilities. This type of parking cluster has the following main features:
1) Short parking time. According to a survey study [44], 57% of the parking time in on-street parking lots is less than 40 minutes; 70% of the parking time is less than one hour; only 9% of the parking time is more than 3 hours; and the average parking time is 42 minutes.
2) The initial average power of parked electric vehicles is generally high. Figure 4 shows that 54.4% of the EVs parked on the roadside have a remaining energy of more than 60%, while only 4% of the EVs have a remaining energy of less than 15%.
3) The parking space utilization rate is high. According to a research study by Adiv et al. [10], the utilization rate of parking spaces in all participating on-street parking lots was 100% during peak hours. Even during off-peak hours, the utilization rate of parking spaces can reach about 80%.
Thus, the incentive mechanism for the first type of parking cluster can be represented by formula (10).
where a, b, and c represent the weighting factors, with a + b + c = 1. Energy represents the remaining energy of the electric vehicle; V_Energy represents the average remaining energy of this parking cluster; veh represents the traffic flow within the communication range of this parking cluster; Demand represents the service demand quantity; N_mRSU represents the number of mRSUs needed for this parking cluster as derived from Algorithm 1; N_park represents the total number of parked cars in the parking cluster; and N_close represents the number of vehicles in the close state, whose remaining energy is below the threshold.
In the second scenario, the parking lot is a surface parking lot with charging facilities. The main differences and characteristics of this type of parking lot compared with the previous type are as follows:
1) Charging facilities are available. CF-Parking Lots are more concentrated, and most of them are near large commercial facilities, making it convenient to build charging facilities.
2) Long parking time. According to the survey report [47], 19% of the parking time in the parking lot is less than one hour; 48% of the parking time is less than three hours; 40% of the parking time is more than four hours; and the average parking time is 327 minutes.
3) The initial average power of parked vehicles is low. From Figure 3, the percentage of EVs with more than 60% remaining energy is 29.9%, while the percentage of vehicles with less than 15% remaining energy is 19.9%.
Thus, according to the difference between CF-Parking Lots and NCF-Parking Lots, the reward mechanism of the second type of parking cluster is represented by formula (11).
where a, b, c, and d denote the weighting factors, with a + b + c + d = 1, and I_charge (I_charge ∈ (0, 1)) is the charging index.
In summary, different reward mechanisms need to be designed according to the different features in order to select a more suitable mRSU. The algorithm flow is shown in Figure 7, the specific algorithm is shown in Algorithm 2, and the key parameters are shown in Table 3.

Algorithm 2 QL-mRSU-Choose
Input: (1) Parameters for reinforcement learning α, γ, epsilon; (2) Traffic flow veh, cycle time T, parking quantity N_park, service demand quantity N_demand, daily average traffic flow V_veh, number of charging facilities in the parking lot N_fac, charging index I_charge, number of charging facilities in use in the parking lot N_use, number of mRSUs in the parking cluster N_mRSU, number of vehicles not participating in communication N_close, threshold W; (3) The weights a, b, c, d.
Output: Status of parked vehicles.
1: while (parking clusters change) or (go to the next cycle T) do
2:   for each parked vehicle do
3:     Vehicle initialization;
4:     if N_mRSU ≥ 1 then

14:           Turn on the communication device and become new mRSU;
15:           Remove the vehicle from the next algorithm selection;
16:           continue;
17:         end if
18:       end if
19:     end if
20:     if N_mRSU = 0 then
21:       No more mRSUs are added to this parking cluster;
22:       break;
23:     end if
24:   end for
25: end while
26: return Status of parked vehicles.

C. QL-mRSU-MODE
In a parking cluster, this paper defines the following three states for the mRSU within a cycle:
1) "Work" mode: after the parked vehicle is selected as an mRSU, the communication device is turned on to process or forward the information from other vehicles.
2) "Rest" mode: the parked vehicle turns off the communication function and only turns on the listening function, listening to the information from moving vehicles, and operates with low power consumption.
3) "Close" mode: the vehicle turns off all functions except necessary ones such as the alarm function, in order to reduce battery consumption.
During the period selected in the previous section, the selected parked vehicles do not need to be in "Work" mode for the whole cycle, but can switch between "Work" mode and "Rest" mode according to changes in the external environment. Once the remaining energy of the vehicle falls below the threshold, the vehicle directly enters "Close" mode and all communication devices are turned off. The vehicle stays in this mode until it replenishes its energy to a level above the threshold, at which point it can again participate in the mRSU selection.
Therefore, according to the previous description, each cycle T can be expressed by formula (12).
$$T = t_{work} + t_{rest} \tag{12}$$
where t_work indicates the time that the vehicle is in "Work" mode during a cycle and t_rest indicates the time that the vehicle is in "Rest" mode during a cycle. The energy consumed in a cycle T, W_t, can be expressed using formula (13).
$$W_t = P_{on} t_{work} + P_{off} t_{rest} \tag{13}$$
where P_on indicates the power when the parked vehicle turns on the communication device and P_off indicates the power when the parked vehicle is on standby.
The duty cycle f can be expressed by formula (14).
Therefore, the duty cycle energy efficiency formula can be derived and expressed by formula (15).
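A small numeric sketch may help to see how the cycle quantities relate. The energy per cycle follows formula (13). The duty cycle is assumed here to be the fraction of the cycle spent in "Work" mode, and the saving is assumed to be measured against a vehicle that stays in "Work" mode for the whole cycle; both definitions, the function names, and the power and time values are illustrative assumptions rather than the definitions used in formulas (14) and (15).

```python
def cycle_energy(p_on, p_off, t_work, t_rest):
    """Energy consumed in one cycle: W_t = P_on * t_work + P_off * t_rest."""
    return p_on * t_work + p_off * t_rest


def duty_cycle(t_work, t_rest):
    """Assumed definition: share of the cycle T = t_work + t_rest
    that is spent in Work mode."""
    return t_work / (t_work + t_rest)


def saving_vs_always_on(p_on, p_off, t_work, t_rest):
    """Assumed metric: relative energy saved compared with staying in
    Work mode for the whole cycle."""
    always_on = p_on * (t_work + t_rest)
    return 1.0 - cycle_energy(p_on, p_off, t_work, t_rest) / always_on


# Hypothetical values: 10 W while communicating, 1 W while listening,
# 20 s of work and 40 s of rest in a 60 s cycle.
w_t = cycle_energy(10.0, 1.0, 20.0, 40.0)          # 240 J
f = duty_cycle(20.0, 40.0)                         # one third of the cycle
saving = saving_vs_always_on(10.0, 1.0, 20.0, 40.0)
```

Under these assumed values, working one third of the cycle uses 240 J instead of 600 J, so shortening the "Work" share of the cycle translates directly into saved energy.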
Based on the communication scenario described earlier, combined with the Q-Learning algorithm, this paper designs the action selection of mRSUs to be determined by the number of time slots within a frame. Each mRSU stores a set of Q-values, and each Q-value is coupled to a specific time slot within each frame. The Q-value represents the reward obtained by the mRSU when it is in "Work" mode. During the communication process, specific events occur in the same time slot of each frame, and the Q-value is updated as these events occur. In addition, the rewards obtained by the mRSU are related to the state information of neighboring nodes. Specifically, the Q-value update for each mRSU can be represented by formula (16).
where Q i s (f ) ∈ [0, 1] denotes the current Q-value associated with the slot s on a given frame f . Q i s (f + 1) denotes the Q-value of the next frame f +1 of frame f , i.e., it is the updated Q-value associated with the same slot s, and α ∈ (0, 1) is the learning rate.R i s (f ) is the reward obtained in association with a slot s on a certain frame f .
In this paper, we consider setting the discount factor γ in formula (1) to 0, i.e., no future rewards are considered, only the most recent ones. This approach requires a suitable reward function that can consider both mRSU as well as the communication of its domain nodes. Thus, after considering the traffic load aspect, R i s (f ) can be represented by formula (17).
where $RES$ denotes the number of received but unsent packets, $TOT$ denotes the total number of packets received by the mRSU in the time slot of a certain frame $f$, $P_i$ represents the number of packets to be broadcast by the mRSU in the time slot, $S_j$ represents the number of packets sent by neighboring node $j$ to the mRSU in the time slot, and $N_j$ represents the set of neighbors of node $j$. $a$, $b$, and $c$ are the weights of the corresponding terms of the reward function.
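With the discount factor set to 0, a standard Q-learning update reduces to an exponential moving average of the reward. The Python sketch below assumes exactly that form, $Q(f+1) = (1-\alpha)\,Q(f) + \alpha\,R(f)$, for a single slot; it is an illustration under this assumption rather than a transcription of formula (16). The reward is passed in by the caller, the function names are hypothetical, and the all-zero reward sequence is invented to show how an idle slot decays from the initial Q-value of 1.

```python
def update_q(q: float, reward: float, alpha: float) -> float:
    """One per-slot Q-value update with discount factor 0.

    ASSUMPTION: Q(f+1) = (1 - alpha) * Q(f) + alpha * R(f), the form the
    standard Q-learning rule takes when gamma = 0.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return (1.0 - alpha) * q + alpha * reward


def run_slot(rewards: list[float], alpha: float, q0: float = 1.0) -> list[float]:
    """Track one slot's Q-value over successive frames, starting from q0."""
    history = [q0]
    for r in rewards:
        history.append(update_q(history[-1], r, alpha))
    return history


# Hypothetical rewards for one slot over four frames (idle slot: reward 0).
trace = run_slot([0.0, 0.0, 0.0, 0.0], alpha=0.5)
print(trace)  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

A larger learning rate makes the Q-value follow recent rewards more quickly, so an idle slot drops below a given threshold in fewer frames.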

Algorithm 3 QL-mRSU-Mode
Input: (1) Parameters for reinforcement learning α, γ, epsilon; (2) the number of packets heard by the mRSU, RES; the total number of packets received by the mRSU in the frame, TOT; the number of packets to be broadcast by the mRSU during time slot s, P_i; the number of packets sent by neighboring nodes to the mRSU during time slot s, S_j; the set of neighbors of the mRSU, N_j; the remaining energy threshold, W; the open communication threshold, threshold; (3) the weights a, b, ...
...
 7:     if Energy > W then
 8:         Initialize the Q-value of mRSU to 1;
 9:         Calculate R according to formula (17);
10:         Run Q-Learning;
11:         if Q^i_s(f) ≥ threshold then
12:             Keep the mRSU in ''Work'' mode;
13:         else
14:             Change the mRSU to ''Rest'' mode;
15:         end if
16:     end if
17: end for
18: return Operating state of mRSU.
After the duty cycle reward function of the mRSU is set, the Q-value of each mRSU node is initialized to 1, i.e., all mRSUs keep the communication device on for the whole frame. In the subsequent learning process, the Q-value changes with the reward; once it falls below a certain threshold, the mRSU changes from ''Work'' mode to ''Rest'' mode and goes to sleep. The specific process can be expressed by formula (18).
where threshold represents the threshold of Q-value, Energy represents the remaining energy of mRSU, and W represents the energy threshold.
According to this formula, if the Q-value at a time slot s is below the threshold, the mRSU switches to ''Rest'' mode for the entire time slot. Otherwise, the mRSU stays in ''Work'' mode during the whole time slot. The algorithm flow chart is shown in Figure 8, the specific algorithm can be found in Algorithm 3, and the key parameters are shown in Table 4.
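The mode decision described in this subsection can be sketched as a small Python function. It combines the Q-value test with the energy test: a vehicle at or below the energy threshold goes to ''Close'' mode, and otherwise the Q-value decides between ''Work'' and ''Rest''. The enum, function name, and argument names are hypothetical, and the numeric values in the usage lines are invented for illustration.

```python
from enum import Enum


class Mode(Enum):
    WORK = "Work"
    REST = "Rest"
    CLOSE = "Close"


def select_mode(q_value: float, energy: float,
                q_threshold: float, energy_threshold: float) -> Mode:
    """Pick the operating mode of an mRSU for one time slot."""
    # Energy check first: a depleted vehicle shuts communication down.
    if energy <= energy_threshold:
        return Mode.CLOSE
    # Enough energy: the slot's Q-value decides between Work and Rest.
    if q_value >= q_threshold:
        return Mode.WORK
    return Mode.REST


# Hypothetical values: Q-threshold 0.5, energy threshold 20 (arbitrary units).
print(select_mode(0.8, 60.0, 0.5, 20.0))  # Mode.WORK
print(select_mode(0.3, 60.0, 0.5, 20.0))  # Mode.REST
print(select_mode(0.8, 15.0, 0.5, 20.0))  # Mode.CLOSE
```

Checking the energy first mirrors the order used in Algorithm 3, where the Q-value is only evaluated for vehicles whose remaining energy exceeds W.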

D. SUMMARY OF QL-mRSU SERIES ALGORITHM
In the above three subsections, we describe the three algorithms of QL-mRSU in detail and give the relevant parameters, algorithm flowcharts, and pseudo-code for each algorithm. In this subsection, we link the three algorithms into one that can be run for every parked car, and the specific algorithm can be found in Algorithm 4.
In terms of computational complexity, the execution time of the algorithm is proportional to the number of executions per line of code, which can be expressed by formula (19).
where T(n) denotes the total execution time of the algorithm, f(n) denotes the total number of times each line of code is executed, and n denotes the size of the input data. We analyzed the three algorithms, QL-mRSU-Num, QL-mRSU-Choose, and QL-mRSU-Mode, and verified that the Q-Learning algorithm needs to be executed only once in each of them, so the time complexity is O(n) in all cases.
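The selection stage of the combined procedure can be sketched in Python as follows. The sketch covers only the bookkeeping of Algorithm 4: counting vehicles at or below the energy threshold, checking whether the cluster can supply the requested number of mRSUs, and picking the eligible vehicles with the largest Q-values. The requested number of mRSUs and the per-vehicle Q-values are taken as given inputs here, whereas in the paper they are produced by Q-Learning; the function name and all numeric values are hypothetical.

```python
def choose_mrsus(energies: list[float], q_values: list[float],
                 energy_threshold: float, n_mrsu: int) -> list[int]:
    """Return indices of parked vehicles chosen as mRSUs, best Q-value first.

    Returns an empty list when the cluster cannot supply n_mrsu vehicles
    (the "ignore this parking cluster" branch).
    """
    # Vehicles above the energy threshold may serve; the rest count as closed.
    eligible = [i for i, e in enumerate(energies) if e > energy_threshold]
    n_close = len(energies) - len(eligible)
    # Feasibility check: N_mRSU <= N_Park - N_close
    if n_mrsu > len(energies) - n_close:
        return []
    # Repeatedly taking the max-Q vehicle is equivalent to ranking by Q.
    ranked = sorted(eligible, key=lambda i: q_values[i], reverse=True)
    return ranked[:n_mrsu]


# Hypothetical cluster of four parked vehicles.
energies = [80.0, 15.0, 60.0, 40.0]
q_values = [0.4, 0.9, 0.7, 0.2]
print(choose_mrsus(energies, q_values, energy_threshold=20.0, n_mrsu=2))  # [2, 0]
```

Vehicle 1 has the highest Q-value in this example but is excluded because its energy is below the threshold, so vehicles 2 and 0 are chosen instead.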

V. SIMULATIONS AND ANALYSIS
The simulations presented in this paper are based on the energy-saving scheme from the previous section, with the relevant parameters substituted in. The simulation results are mainly compared against the method proposed by Sun et al. [43]. The simulation experiment is divided into three main parts: the first part simulates the number of mRSUs needed for the parking cluster as a function of parameters such as the service demand quantity; the second part selects the corresponding mRSUs and simulates the energy-saving efficiency as a function of parameters such as running time, density of parked vehicles, and service demand quantity; the third part switches the working mode according to the external environment and compares the various working modes. To keep the experiment stable and reproducible, this paper sets key parameters such as road section length $D$, communication radius $R$, communication diameter $L$, communication power consumption $P_{on}$, sleep power consumption $P_{off}$, maximum connection number $C_{max}$, stable connection number $C_{sta}$, maximum electric vehicle energy $Energy$, remaining energy threshold ratio $W$, and charging power $W_{charge}$, as shown in Table 5.
The battery capacity of electric vehicles decreases with years of use. According to the Chinese national standard GB/T-31484-2015, when the battery capacity is lower than 80% of the original capacity, the battery has reached the end of its service life. Therefore, in the simulation, the relevant data also needs to consider battery depreciation.

Algorithm 4 Summary of QL-mRSU Series Algorithm
Input: QL-mRSU series algorithm parameters
Output: Operating state of mRSU.
 1: for each parking cluster do
 2:     while (parking clusters change) or (go to the next cycle T) do
 3:         N_close = 0, N_mRSU = 0;
 4:         for each parked car do
 5:             Collect remaining electricity Energy;
 6:             if Energy ≤ W then
 7:                 N_close = N_close + 1;
 8:             end if
 9:         end for
10:         Calculate R according to formula (9);
11:         Run Q-Learning and select the optimal number of mRSUs, N_mRSU;
12:         if N_mRSU ≤ N_Park - N_close then
13:             for each parked car do
14:                 Vehicle initialization;
15:                 if N_mRSU ≥ 1 then
16:                     Collect remaining power Energy;
17:                     if Energy ≤ W then
18:                         Turn off the communication equipment;
19:                     else
20:                         Calculate R according to formula (10) or (11);
21:                         Run Q-Learning;
22:                         if Q-value = max Q-value then
23:                             N_mRSU = N_mRSU - 1;
24:                             Set the vehicle to mRSU and remove it from the next algorithm selection;
25:                             Initialize the Q-value of mRSU to 1;
26:                             Calculate R according to formula (17);
27:                             Run Q-Learning;
28:                             if Q^i_s(f) ≥ threshold then
29:                                 Keep the mRSU in ''Work'' mode;
30:                             else
31:                                 Change the mRSU to ''Rest'' mode;
32:                             end if
33:                         end if
34:                     end if
35:                 end if
36:                 if N_mRSU = 0 then
37:                     No more mRSUs are added to this parking cluster;
38:                 end if
39:             end for
40:         else
41:             Ignore this parking cluster;
42:         end if
43:     end while
44: end for
45: return Operating state of mRSU.

A. THE NUMBER OF mRSUs
In Figure 9, the horizontal coordinate is the service demand quantity, and the vertical coordinate is the number of mRSUs selected for a particular parking cluster. The algorithm is able to stably select 1, 2, 3, and 4 parked electric vehicles to act as mRSUs when the service demand quantity ...
Figure 10 shows how the algorithm's choice of the number of mRSUs changes as the service demand quantity changes. The differently colored curves represent the probability of selecting each number of mRSUs. From the figure, it can be seen that as the number of service demands increases, the probability of selecting a smaller number of mRSUs for this parking cluster decreases, while the probability of selecting a larger number of mRSUs increases. These results show that the algorithm steadily increases the number of mRSUs as service demand grows, which ensures communication quality.

B. THE SELECTION OF mRSUs
In this section, the algorithms of the CF-Parking Cluster and the NCF-Parking Cluster are simulated and analyzed, respectively, while keeping the parameters unchanged. Figure 11 and Figure 12 show the comparison of the running time and the average remaining energy for the CF-Parking Cluster and the NCF-Parking Cluster, respectively. The horizontal and vertical coordinates are the running time and the average remaining energy, respectively. It can be seen from Figure 11 and Figure 12 that the average remaining energy is higher for the dynamic operation than for the continuous and full operations, which indicates that the algorithm has a better energy-saving effect for the CF-Parking Cluster and the NCF-Parking Cluster.
Figure 13 shows the relationship between simulation time and energy-saving efficiency during the peak period. The horizontal coordinate is the simulation time, and the vertical coordinate is the energy efficiency. With a parking cluster density of 0.02 veh./m, the energy efficiency of the CF-Parking Cluster is about 82%, and that of the NCF-Parking Cluster is about 55%; with a parking cluster density of 0.05 veh./m, the energy efficiency is about 96% for the CF-Parking Cluster and 86% for the NCF-Parking Cluster. From the simulation results, it can be seen that the energy efficiency of the CF-Parking Cluster is generally higher than that of the NCF-Parking Cluster at both low and high densities. In the same parking lot, the higher the density of the parking clusters, the higher the energy efficiency. In addition, as the simulation time increases, the corresponding energy-saving efficiency remains essentially unchanged, which also verifies the stability and robustness of the algorithm.
Figure 14 shows the effect of the density of the two types of parking clusters on the energy efficiency during the peak period. From the simulation results, the parking cluster cannot be selected as a communication node until the parking cluster density reaches 0.002 veh./m. Between the densities of 0.002 veh./m and 0.01 veh./m, the energy efficiency of dynamic operation increases with the parking cluster density, while after 0.01 veh./m, the energy efficiency curve no longer increases with the parking cluster density. This indicates that when the parking cluster density reaches 0.01 veh./m, the parked vehicles acting as mRSUs begin to saturate, and the energy efficiency will not change significantly. In addition, regardless of the type of parking cluster, the energy efficiency is higher in dynamic operation than in continuous operation.
The simulations shown in Figure 15 and Figure 16 are designed to demonstrate the effect of the service demand quantity on the energy efficiency at different parking cluster densities. The simulation results show that the energy efficiency of both types of parking clusters does not increase with the service demand quantity in the dynamically running algorithm. By comparing the two simulation results, it can be determined that the higher the density of parking clusters in both operating conditions, the higher the energy efficiency will be, and the energy efficiency of the dynamic operation algorithm is always higher than the energy efficiency of the continuous operation.

C. THE OPERATING MODE OF mRSUs
This section simulates the effect of the external environment on the operating mode of the mRSUs after the corresponding mRSUs are selected by the algorithm in the previous section.
In Figure 17, the variation in the duty cycle of the mRSU is simulated for two types of parking clusters over a 24-hour period. The horizontal coordinate is the time and the vertical coordinate is the duty cycle. The simulation results show that the duty cycle of the two types of parking clusters is almost the same, which also indicates that the differences in the average remaining energy and the structure of the remaining energy of the two types of parking clusters are not directly related to the duty cycle.
As mentioned earlier, in a dynamically operating model, the choice between ''Work'' mode and ''Rest'' mode is mainly determined by the external environment. Therefore, in this section, external environment variables such as the service demand quantity, parking cluster density, and cycle time are simulated to examine their influence on the choice of mode. Figure 18 shows the effect of the service demand quantity on the duty cycle of the parking cluster for different cycle times. As the service demand quantity increases, the duty cycle of the parking cluster increases approximately linearly until it reaches 1.0. It can be concluded that the mRSU raises its duty cycle as the service demand quantity increases. The simulation results also show that the longer the cycle time, which also means that fewer vehicles can become mRSUs, the longer the mRSU stays in ''Work'' mode, which leads to a higher duty cycle.
In Figure 19 and Figure 20, the relationship between the density of parking clusters and the duty cycle is simulated for different cycle times at a certain time of the day and of the night, respectively. It can be seen from the figures that, for each cycle time and at most parking densities, the duty cycle of the parking clusters during the daytime is larger than during the nighttime. There is no data for this parking cluster until the parking cluster density reaches 0.002 veh./m, which indicates that below this value the density of parked vehicles is too low for the parking cluster to be selected as a relay node. At a certain time of the day, when the parking cluster density is between 0.002 veh./m and 0.004 veh./m, the duty cycle is 1.0 and the mRSU is always in ''Work'' mode, while at a certain time of the night, at the same parking cluster density of 0.002 veh./m, the mRSU is not always in ''Work'' mode. When the parking cluster density is between 0.004 veh./m and 0.016 veh./m, the duty cycle decreases, which indicates that as the parking cluster density increases, the number of vehicles that can take over increases and the mRSU does not need to be in operation all the time. After the parking cluster density reaches 0.016 veh./m, the duty cycle no longer decreases as the parking cluster density increases, which means that the parking cluster is saturated and additional parked vehicles will not reduce the duty cycle of the mRSU any further.

D. NETWORK PERFORMANCE SIMULATION
In this section, we perform network performance simulations for the QL-mRSU series of energy-efficient algorithms. SUMO is used as the road, building, and vehicle simulator to realistically simulate the local road network. OMNET++ is used as the network simulator for vehicle networking. VEINS is used as the infrastructure, and the parameters of radio propagation, application layer nodes, MAC layer nodes, and mobile modules are modified on this infrastructure to make it closer to the experimental environment of this paper. The running interface is shown in Figure 21. The simulation proceeds in the following steps. Firstly, the road network information is built using SUMO and connected with OMNET++, and the relevant network parameters of OMNET++ are adjusted. Secondly, the network performance indexes under full operation, continuous operation, and dynamic operation are tested, respectively, according to the results of algorithm selection. Finally, the delay and packet loss rate under the three operation modes are compared, and the results are shown in Figure 22 and Figure 23.
As can be seen in Figure 22 and Figure 23, there is no significant difference in time delay or packet loss rate in any of the three modes of operation. In terms of packet loss rate, the dynamic operation mode only increases slightly at the beginning of the communication and then starts to decrease. In terms of delay, dynamic operation increases slightly compared to the other two modes, but the increase is small and even negligible compared to the energy savings. The experiment verifies that the algorithm proposed in this paper does not have a significant impact on communication.

VI. DISCUSSION AND CONCLUSION
This paper focuses on the continuous optimization of mRSUs by self-learning in different types of parking clusters for smart electric vehicles to reduce the communication energy consumption in E-VANET. The energy-saving scheme has three main steps. First, the smart EVs parked in the parking lot are clustered, and an algorithm is used to determine whether the parking cluster is suitable as a communication node; the number of mRSUs in that parking cluster is then determined by the QL-mRSU-Num algorithm. Second, after a parking cluster is selected as a communication node, the QL-mRSU-Choose algorithm selects the most suitable mRSU based on the external environment and its own parameters, and the rest of the vehicles enter standby listening mode to save energy. Third, the QL-mRSU-Mode algorithm decides the working mode of the parked car by listening to packets. The research results show that dynamic operation leads to longer service times for the parking clusters as well as less battery loss compared to the full operation and continuous operation approaches. Compared with the energy-saving algorithm proposed by Sun et al. [43], the algorithm in this paper integrates reinforcement learning with some electrical characteristics of electric vehicles, which makes it not only more applicable to E-VANETs but also more intelligent, and it reduces the communication energy consumption of E-VANETs more effectively while ensuring the communication quality.
In the next step, we will continue to optimize the algorithm for rapid changes in the service demand quantity and other extreme cases, and improve the optimization speed of the algorithm to achieve high and stable energy efficiency in the face of most road conditions.