Energy Consumption Optimization of Edge Computing Based on Reinforcement Learning

Cloud computing suffers from over-centralization: users are often far from cloud servers, so communication between them incurs long latency and high computational energy consumption. Edge computing, now a hot research topic in academia, sinks cloud computing nodes to the network edge, allowing users to offload tasks directly to nearby edge servers. Compared with cloud computing, edge computing is closer to the user side, so communication between users and edge servers has lower transmission latency and consumes less energy. To better exploit these features and promote the development of edge computing, this paper first explores a basic architecture for edge computing. We then propose a scheme, based on reinforcement learning, that optimizes the energy consumption of edge computing. Finally, we verify the effectiveness of the proposed scheme through simulation experiments that compare it with other schemes.


The background of edge computing
The number of IoT devices is increasing dramatically worldwide and is expected to reach approximately 24.6 billion connected devices by 2025 [1]. This rapid growth poses a severe challenge for data processing capabilities. In the past, cloud computing was the usual solution: data, information, and tasks were uploaded to cloud servers, and users waited for those servers to process their requests, facing long waiting times, unstable connections, and frequent privacy and security breaches. An idea now attracting attention in academia is to bring servers down to the edge, that is, closer to the user; this is usually called edge computing. It allows users to offload services that are difficult to process locally to a nearby edge server. Because the transmission distance is shortened, the communication latency between user and server drops drastically, meeting the low-latency requirements of IoT devices. In addition, edge computing keeps data near the user and avoids transmission to centralized servers, so the user's data security is better guaranteed [2].

High energy consumption of 5G
2022 8th International Symposium on Sensors, Mechatronics and Automation System, Journal of Physics: Conference Series 2246 (2022) 012076, IOP Publishing, doi:10.1088/1742-6596/2246/1/012076
The advent of 5th generation mobile networks (5G) brings higher rates, greater bandwidth, and lower latency to data communication [3]. Moreover, 5G services are based on a service-oriented architecture, so they can invoke service requirements flexibly and allow flexible, dynamic access to edge servers. We can expect that, in the future, various types of edge computing will be integrated directly into the 5G network architecture [4], which will bring broader prospects for edge computing.
However, the reality is not always as rosy as we imagine. Due to the high-frequency nature of 5G, the energy consumed by data during transmission is extremely high. On top of that, once a user's tasks are offloaded to an edge server, the computation and processing of data on the server cause additional energy consumption. High energy consumption means high cost, which would confine edge computing to papers and make it difficult to spread across industries. According to [5], global CO2 emissions reached about 34.04 billion tons in 2018. To save energy and reduce costs, we must find a feasible way to optimize the energy consumption of edge computing.

Contributions
In this paper, our contributions are mainly as follows: (1) We build a four-tier edge computing architecture to facilitate computation offloading by users. (2) We propose an Energy Consumption Optimization Algorithm (ECOA) based on the Q-learning algorithm to reduce the energy consumption caused by edge computing. (3) The effectiveness of the method is verified by comparing it with several algorithms. The remainder of the paper is organized as follows. Section 2 briefly introduces related work. Section 3 describes the architecture of edge computing. Section 4 introduces the energy consumption model. Section 5 presents the Markov decision process and the Q-learning-based ECOA. Section 6 demonstrates the effectiveness of the method through simulation experiments. Finally, section 7 concludes and summarizes the paper.

Edge computing architectures
A number of scholars have proposed edge computing architectures, usually presented as layered designs in which each layer is easier to separate and decouple. For example, Li et al. describe a simple two-layer edge computing architecture [6]. The bottom layer is the User Equipment (UE) layer, consisting of a variety of mobile devices responsible for data collection. The upper layer is the edge computing layer, which consists of a large number of edge servers responsible for task processing and computation; in other words, the UEs offload their tasks to the edge servers for computing. An alternative design is described by Abbas et al., who add a cloud data center layer to the two-tier architecture, responsible for persistently storing the data in the edge computing layer [7]. These existing architectures can complete the work of edge computing, but they are not enough: the detailed scheduling scheme of the edge servers is completely ignored, and there is no mention of how to use the data obtained from edge computing, even though the data is stored persistently. Therefore, we need a better edge computing architecture that solves the scheduling problem of edge servers and applies the data obtained from edge computing.

Energy consumption optimization algorithms
After establishing the edge computing architecture, the energy consumption of edge computing needs to be optimized. Many scholars have offered their own solutions to this problem. Wei et al. [8] constructed a greedy algorithm that selects the edge server with the lowest energy consumption for each offloading. Bi et al. [9] divided the energy consumption of edge computing into upstream transmission, downstream transmission, and computation on the edge servers, modeled each part, and then used Particle Swarm Optimization (PSO) to optimize these energy consumptions. Wang et al. [10] used a deep learning approach, constructing a Deep Neural Network (DNN) trained by gradient descent to perform the optimization.
However, each approach has drawbacks. Although the greedy algorithm can find the optimal solution by global traversal on small data sets, the efficiency of global search degrades rapidly as the number of edge servers grows, making it hard to apply in real business scenarios. Heuristic algorithms such as PSO improve efficiency over the greedy algorithm to some degree, but most of their time is wasted on useless search because of the large solution space; moreover, heuristic algorithms usually lack rigorous convergence proofs, so their stability is a concern. As for DNN methods, on the one hand they must partition the data into training and test sets; on the other hand they are generally designed for classification and regression, so their outputs require further processing to yield offloading decisions. In conclusion, the existing energy consumption optimization schemes for edge computing need improvement, and we propose a new solution.
As shown in Fig. 1, we propose a generic edge computing architecture, which consists of four main components: user layer, scheduler layer, edge computing layer, and data application layer.

User layer
The user layer mainly consists of various user terminal devices and sensors that are responsible for collecting data and transmitting tasks. For example, wearable devices can collect various physiological data and communicate with the edge computing server by sending heartbeat data packets, enabling monitoring of the user's life and health. The layer also includes sensors placed on vehicles for speed, position, and direction, which monitor driving data in real time and thus enable assisted driving and intelligent navigation. It further includes users' smartphones and tablets, which offload local tasks such as gaming and video processing to the edge by installing and deploying computational offloading services. Such devices send the collected information to the edge computing scheduler in real time to feed the edge servers for analysis and computation.

Scheduler layer
The main task of the scheduler layer is to schedule the edge servers distributed around the users, which is defined as a queue model. In this layer, the data from the computational offloading is encapsulated as a service and added to the request processing queue. After that, each element of the queue leaves the queue one by one and the edge server with lower energy consumption is selected for task processing according to our proposed ECOA.
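The scheduler layer's queue model can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the service names, the `server_energy` estimates, and the greedy stand-in for ECOA's server selection are all hypothetical.

```python
from collections import deque

def schedule(requests, server_energy):
    """Sketch of the scheduler-layer queue model: offloading requests are
    encapsulated as services and enqueued, then leave the queue one by one;
    server selection here is a greedy stand-in for the proposed ECOA."""
    queue = deque(requests)            # request processing queue
    assignments = {}
    while queue:
        service = queue.popleft()      # each element leaves the queue in order
        # stand-in for ECOA: pick the server with the lowest energy estimate
        best = min(server_energy, key=server_energy.get)
        assignments[service] = best
    return assignments

print(schedule(["s1", "s2"], {"m1": 0.8, "m2": 0.5}))  # both go to "m2"
```

In the full system, the `min` step would be replaced by the Q-learning-driven decision described in section 5.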

Edge computing layer
The edge computing layer consists of a series of edge servers deployed around the users, responsible for the computational processing of tasks from the scheduler. On the one hand, since the edge servers are deployed in the vicinity of the user, the energy consumption and computational latency of user data transmission are reduced. On the other hand, because the data avoids being transferred to the cloud server, the user's privacy and security can be guaranteed to a certain extent.

Data application layer
The main significance of the data application layer is to fully utilize the value of the data. Its main task is to present sensitive content from the data processed by the edge computing layer to researchers and analysts, for example by performing data statistics and data analysis, and by providing data sets for training machine learning models.

Model
The energy consumption of local computing can be expressed as

$$E_s^{\text{local}} = k_0 C_s f_l^2,$$

where $C_s$ is the number of CPU cycles required by service $s$, $f_l$ is the local CPU clock frequency, and $k_0$ is a coefficient related to the specific architecture and design of the chip, typically set as $k_0 = 1.0 \times 10^{-26}$ [11]. Usually, to save as much energy as possible, the CPU clock frequency is adjusted in real time through voltage and frequency scaling, as in Intel's Turbo Boost and AMD's Turbo Core technologies.
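As a numerical sanity check, the common dynamic-power model for local computing, $E = k_0 C f^2$ with $k_0 = 10^{-26}$, can be evaluated directly. The cycle count and frequency below are illustrative values, not figures from the paper.

```python
def local_energy(cycles, freq, k0=1.0e-26):
    """Local computing energy E = k0 * C * f^2: per-cycle power k0*f^2
    times C total CPU cycles; k0 depends on the chip architecture."""
    return k0 * cycles * freq ** 2

# Illustrative: a task of 1e9 CPU cycles executed at 1 GHz
print(local_energy(1e9, 1e9))  # 10.0 joules
```

Raising the clock frequency finishes the task sooner but, because energy scales with $f^2$, costs quadratically more energy, which is exactly the trade-off DVFS exploits.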

Edge computing
When the user's data is offloaded to an edge server for computing, the energy consumption mainly includes the energy consumed during task transmission and while the task runs on the edge server. Analogously to local computation, the latency for service $s$ to be computed on edge server $m$ is

$$t_{s,m}^{\text{comp}} = \frac{C_s}{f_{s,m}},$$

where $f_{s,m}$ is the CPU clock frequency of edge server $m$ executing service $s$. Correspondingly, the energy consumption of edge server $m$ executing task $s$ is

$$E_{s,m}^{\text{comp}} = k_0 C_s f_{s,m}^2.$$

According to Shannon's theorem, the data transfer rate from the user's computational offload of task $s$ to edge server $m$ over subcarrier $n$ can be defined as

$$r_{s,m,n} = B \log_2\!\left(1 + \frac{p_s g_{s,m,n}}{\sigma^2}\right),$$

where $B$ is the subcarrier bandwidth, $p_s$ is the transmission power, $g_{s,m,n}$ is the channel gain, and $\sigma^2$ is the noise power, so the transmission delay is $t_{s,m}^{\text{tx}} = D_s / r_{s,m,n}$ for a task of data size $D_s$. The total delay of the computational offloading of task $s$ is the sum of the transmission delay and the computation delay:

$$t_{s,m} = t_{s,m}^{\text{tx}} + t_{s,m}^{\text{comp}}.$$

The overall energy consumption of offloading task $s$ can then be expressed as

$$E_{s,m} = p_s t_{s,m}^{\text{tx}} + E_{s,m}^{\text{comp}}.$$

Ultimately, the problem of optimizing energy consumption and resource allocation for the overall computational offload can be modeled as minimizing the total energy subject to constraints $C_1$ through $C_7$: $C_1$ indicates whether task $s$ is offloaded to edge server $m$; $C_2$ ensures that each edge computing server accepts no more than one task at a time; $C_3$ indicates whether service $s$ is transmitted to edge server $m$ over subcarrier $n$; $C_4$ ensures that each subcarrier is assigned exclusively to one user; $C_5$ specifies that the CPU clock frequency of each edge server must be an integer no greater than a given value; similarly, $C_6$ specifies that the channel transmission gain must be an integer no greater than a given value; and $C_7$ ensures that the completion time of each offloaded task does not exceed its deadline. It is easy to see that this problem is a mixed integer nonlinear program (MINLP), whose complexity grows exponentially as the number of UEs increases.
Traditional methods generally decompose it into several subproblems and then deal with each subproblem separately, which is complex and inefficient. Next, we explore a reinforcement learning-based solution.
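The per-task offloading cost described above can be sketched numerically. This is an illustrative evaluation of the standard model (Shannon-rate transmission plus edge computation); all parameter values are assumptions, not the paper's simulation settings.

```python
import math

def offload_metrics(data_bits, cycles, f_m, B, p, g, sigma2, k0=1.0e-26):
    """Delay and energy of offloading one task to edge server m:
    rate r = B*log2(1 + p*g/sigma2) (Shannon), transmission delay D/r,
    computation delay C/f_m, transmit energy p*t_tx, compute energy k0*C*f_m^2."""
    rate = B * math.log2(1 + p * g / sigma2)   # achievable uplink rate (bit/s)
    t_tx = data_bits / rate                    # transmission delay
    t_comp = cycles / f_m                      # computation delay on server m
    e_tx = p * t_tx                            # energy spent transmitting
    e_comp = k0 * cycles * f_m ** 2            # energy spent computing
    return t_tx + t_comp, e_tx + e_comp        # (total delay, total energy)

# Illustrative numbers: 1 Mbit task, 1e9 cycles, 1 GHz server, 1 MHz subcarrier
delay, energy = offload_metrics(1e6, 1e9, 1e9, B=1e6, p=0.1, g=1.0, sigma2=0.1)
```

Such a function would supply the per-action energy $E(s_t, a_t)$ that the reinforcement learning agent later uses as its (negated) reward.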

Markov decision process
We usually use a Markov decision process (MDP) to describe the learning process of an agent in reinforcement learning. It is usually represented as a five-tuple $E = \langle S, A, R, P, \gamma \rangle$, where $S$ is the set of states the agent observes from the environment, $A$ is the set of actions the agent can take on the environment, $R$ is the reward the agent obtains from the environment after taking an action, $P$ is the state transition probability, and $\gamma$ is the decay rate of the agent's cumulative reward. Specifically, through continuous interaction with the complex environment, the agent observes the state $s_t \in S$ and takes an action $a_t \in A$; it then acquires a new state from the environment according to the state transition probability, and the system cycles through this series of operations. We use $U(t)$ to record the cumulative reward of the agent, in which future rewards are weakened by the decay rate:

$$U(t) = \sum_{k=0}^{\infty} \gamma^k r_{t+k}.$$

We can then define the state-action value function:

$$Q(s_t, a_t) = \mathbb{E}\left[\,U(t) \mid s_t, a_t\,\right].$$

In this paper, we assume that the agent is the decision scheduler for edge computing offloading. According to the specifics of edge computing, the actions and rewards are defined as follows. Actions: offloading service $s$ over subcarrier $n$ to edge server $m$, denoted $a_t = (m, n)$. Reward: in general, the reward an agent obtains from the environment is related to the objective function, so we define the reward of taking action $a_t$ in state $s_t$ as $-E(s_t, a_t)$, where $E(s_t, a_t)$ evaluates the energy consumption in the current state.
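The energy-based reward above can be expressed as a small factory function. This is a sketch: `energy_of` is a hypothetical per-action energy estimator, and the action encoding as a (server, subcarrier) pair follows our reading of the text.

```python
def make_reward(energy_of):
    """Build the agent's reward function: the reward is tied to the
    objective, so lower energy consumption yields a higher reward.
    `energy_of(state, server, subcarrier)` is an assumed estimator."""
    def reward(state, action):
        server, subcarrier = action    # offload to server m over subcarrier n
        return -energy_of(state, server, subcarrier)
    return reward

# Illustrative: server 0 is assumed costlier than server 1
r = make_reward(lambda s, m, n: 2.0 if m == 0 else 1.0)
```

With this sign convention, maximizing cumulative reward is equivalent to minimizing cumulative energy consumption.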

Q-learning based solution
Q-learning is a typical reinforcement learning method that learns automatically in the edge computing system according to the MDP parameters defined above. In each step, an action-state pair is obtained under the guidance of the reward function, and the value $Q(s_t, a_t)$ is updated as

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[\, r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \,\right], \quad (16)$$

where $\alpha$ is the learning rate, $0 < \alpha < 1$, and $\gamma$ is the decay rate, indicating the degree of acceptance of future rewards, $0 < \gamma < 1$. For action selection we usually have two strategies: exploration and exploitation. Exploration refers to randomly selecting an action in state $s_t$, while exploitation makes full use of the existing Q values to select the action with the optimal Q value. We use $\varepsilon$ to denote the probability of exploration, and to make it adapt dynamically to different stages of training we define it as a variable:

$$\varepsilon = 1 - \frac{k}{T}, \quad (17)$$

where $k$ is the index of the current cycle and $T$ is the total number of cycles. When a random draw is lower than $\varepsilon$, the agent chooses its action by exploration; when it is higher than $\varepsilon$, the agent chooses by exploitation. Thus, at the beginning of training the agent is more inclined to learn new knowledge, and as the number of cycles rises the opportunity to exploit gradually increases, which helps the algorithm converge.
We propose ECOA based on the Q-learning algorithm. It works as follows. First, the Q-table is initialized. Then the training cycles are executed: in each cycle, $\varepsilon$ is determined according to (17) to choose the corresponding policy, an action is selected and executed, and the environment feeds back the next state and reward according to the reward function. Finally, the Q-table is updated from these states according to (16). The specific execution process is shown in Algorithm 1. Among the benchmark algorithms is the Random Offloading Algorithm (ROA), in which each UE randomly selects an edge computing server to offload its services.
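The training loop can be sketched as follows. This is a simplified stand-in, not the paper's implementation: the energy function, the uniformly random state transition, the linearly decaying exploration probability, and all hyperparameter values are assumptions made for illustration.

```python
import random

def ecoa(n_states, n_actions, energy, T=2000, alpha=0.1, gamma=0.9, seed=0):
    """Sketch of the ECOA loop: initialize the Q-table, then for each of T
    cycles pick explore/exploit via a decaying epsilon (our reading of the
    schedule), act, observe reward -energy(s, a), and apply the Q update."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for k in range(T):
        eps = 1.0 - k / T                      # explore early, exploit late
        if rng.random() < eps:
            a = rng.randrange(n_actions)       # exploration: random action
        else:
            a = max(range(n_actions), key=lambda x: Q[s][x])   # exploitation
        r = -energy(s, a)                      # reward tied to energy objective
        s2 = rng.randrange(n_states)           # assumed state transition
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])  # Q update
        s = s2
    return Q

# Illustrative energy per action (action 1 is cheapest), identical in all states
Q = ecoa(4, 3, lambda s, a: [3.0, 1.0, 2.0][a])
```

After training, the greedy policy derived from `Q` should favor the low-energy offloading action in every state.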

Energy consumption with the increase of services
In Figure 3, we compare the energy consumption of each algorithm as the number of UEs increases. The total energy consumption of edge computing tends to grow with the number of UEs, but our proposed ECOA has the lowest energy consumption. ECOA can make computational offloading decisions dynamically according to environmental conditions, and its dynamically changing greediness also lets the algorithm adapt to the environment to a great extent, so the model tends to choose the edge server with lower energy consumption for computational offloading.

Energy consumption with the increase of edge computing servers
In Figure 4, we compare the energy consumption of each algorithm as the number of edge computing servers increases while the number of services stays constant. Compared with the other algorithms, the overall energy consumption of ECOA remains stable as the number of edge servers increases, which demonstrates the effectiveness of the algorithm.

Conclusion
In this paper, we propose a four-layer edge computing architecture and the ECOA to optimize the energy consumption of edge computing. We compare ECOA with three benchmark algorithms to prove its effectiveness. In the future, we will explore more solutions for optimizing the energy consumption of edge computing and try to apply and deploy ECOA in more realistic environments.
Table 1. Simulation settings