A Novel Video Transmission Optimization Mechanism Based on Reinforcement Learning and Edge Computing

Video traffic already constitutes about 60% of Internet downlink traffic, so optimizing video transmission efficiency has become an important challenge for the network. This paper designs a video transmission optimization strategy based on reinforcement learning and edge computing (TORE) to improve video transmission efficiency and quality of experience. Specifically, we first design a popularity prediction model for video requests based on reinforcement learning (RL) and introduce an adaptive video encoding method that optimizes the distribution of computing resources. Second, we design a video caching strategy that adopts edge computing (EC) to reduce redundant video transmission. Finally, simulations are conducted, and the experimental results demonstrate improvements in video quality and response time.


Introduction
In 2019, video transmission traffic made up 60.6% of overall Internet downlink traffic [1]. With the rapid development of 4K/8K, AR/VR, holographic communication, smart cities, intelligent transportation, and other technologies, the demand for and traffic of network video transmission will grow further. In addition, the number of Internet video users has kept growing rapidly, not only because of the rapid improvement of traditional network bandwidth but also because the quick expansion of the mobile Internet has further stimulated the video transmission market. The video transmission service industry is growing explosively, and competition among video transmission service providers is fierce. First, the traditional film and television industry (including commercial television providers and public service broadcasters) has gradually begun to promote online video media libraries. Second, video portals (YouTube, Youku, Tencent Video, etc.) have developed rapidly. In addition, the service scale of professional video streaming providers (such as Netflix and Maxdome) is also growing explosively. Therefore, competition among Internet video transmission services has become increasingly fierce. As of Q3 2018, Netflix's global membership had reached 137 million, far ahead of more than 30 other streaming media companies in the United States. In October, Sandvine, a provider of intelligent bandwidth management services, reported that Netflix accounted for 15% of global network traffic (excluding China and India). Although Netflix holds the leading position, its competitive pressure is increasing. Domestic competition is also fierce: video portals such as Youku, iQIYI, and Tencent Video are all stepping up their efforts. Although these three video portals still occupy the first echelon of the domestic video transmission business, Bilibili, Mango TV, and other video platforms are also developing rapidly, as are domestic P2P live video, short video, and other services.
Both domestic and foreign video transmission service providers are committed to optimizing the user experience so as to provide higher-quality video transmission services. Therefore, from the perspective of both users and video service providers, research on video transmission mechanisms and algorithms that achieve the best user experience has significant industrial and academic value. In short, the objective of video transmission optimization is to deliver videos to users with maximum resolution and minimum stalling. Other factors must also be considered, such as how frequently the video bitrate switches. These factors are summarized into a quality of experience (QoE) score, which is used to evaluate the quality of video transmission.
At present, academia mainly studies video transmission optimization from three angles: transmission architecture, scheduling algorithm, and evaluation mechanism. In terms of architecture, there is an inherent conflict between the best-effort transmission mode of the traditional TCP/IP-based Internet and the deterministic quality assurance that video transmission requires. Therefore, some researchers have put forward novel network architectures, such as information-centric networking (ICN), to improve network transmission efficiency and thereby video transmission efficiency; some have proposed the content distribution network (CDN), which works at the application layer on top of the current Internet architecture, to compensate for the low efficiency of Internet transmission; and some have proposed intelligent video transmission strategies using edge computing, aiming to break through the limitations of end-to-end transmission and achieve a joint "end-edge-cloud" intelligent video transmission mechanism. In terms of scheduling algorithms, researchers have explored adaptive bitrate adjustment on the end side, intelligent video compression on the server side, and intelligent video caching and scheduling based on edge computing. In terms of evaluation mechanisms, researchers have studied both subjective and objective evaluation extensively in order to evaluate video quality accurately, design a more accurate QoE model, and finally match algorithm objectives to user QoE.
In this paper, we propose a video transmission optimization mechanism based on RL and EC. The main contributions are summarized as follows: (i) we design a popularity prediction model for video requests based on RL and introduce an adaptive video encoding method that optimizes the distribution of computing resources; and (ii) we design a video caching strategy that adopts EC to reduce redundant video transmission. The rest of the paper is structured as follows. Section 2 reviews the related work. In Section 3, a novel video transmission optimization mechanism based on RL and EC is proposed. The experimental results are reported in Section 4, and Section 5 concludes the paper.

Optimization of Transmission Architecture
To improve network transmission efficiency, researchers have proposed many transmission architecture optimization schemes. Their objective is to analyze the key problems of the Internet's traditional best-effort packet forwarding mode and to improve transmission efficiency by eliminating transmission redundancy, enabling cooperation between the end side and the network side, and so on, so as to reach a general-purpose video transmission optimization scheme.

Content-Centric Network
The content-centric network (CCN) [2] is a novel Internet architecture proposed by Lv in 2009; it is content-centric and is a typical representative of ICN [3,4]. ICN interconnects information in the network through an information-centric communication mode. It abandons the IP protocol of the traditional Internet and the corresponding packet switching mechanism, and it achieves efficient and secure content distribution by redesigning the network architecture and introducing content identification and in-network caching. Through content labeling and in-network caching, ICN greatly reduces network redundancy and thus improves the transmission efficiency of network videos. ICN has evolved into many forms that differ in their naming, routing, distribution, and caching mechanisms, among which the most representative scheme is CCN [2].
One of the key technologies of CCN is name-based content routing. Its implementation includes two main modules, the forwarding information base (FIB) and the pending interest table (PIT), which are shown in Figure 1. The FIB forwards interest messages toward nodes that may cache the corresponding data, while the PIT records interests that have been forwarded but not yet satisfied, so that returning data can follow the reverse path to the requesters. Another key technology of CCN is in-network caching. A CCN router can implement a content store (as shown in Figure 1), which is similar to the buffer space of an IP router but can apply different content replacement strategies.
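The interplay of the content store, the PIT, and the FIB can be illustrated with a minimal Python sketch. It is a simplified model under stated assumptions, not the CCN reference implementation: the class `CcnNode`, the face labels, and the example names are hypothetical, and details such as PIT entry lifetimes and cache replacement are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class CcnNode:
    """Toy CCN forwarder (hypothetical model) with a content store, PIT, and FIB."""
    content_store: dict = field(default_factory=dict)  # name -> data
    pit: dict = field(default_factory=dict)            # name -> set of waiting faces
    fib: dict = field(default_factory=dict)            # name prefix -> outgoing face

    def longest_prefix_face(self, name):
        """Return the FIB face whose prefix matches the most name components."""
        parts = name.strip("/").split("/")
        for end in range(len(parts), 0, -1):
            prefix = "/" + "/".join(parts[:end])
            if prefix in self.fib:
                return self.fib[prefix]
        return None

    def on_interest(self, name, in_face):
        """Process an interest: content store hit, PIT aggregation, or FIB forwarding."""
        if name in self.content_store:
            return ("data", in_face, self.content_store[name])
        if name in self.pit:
            self.pit[name].add(in_face)   # already pending: do not forward again
            return ("aggregated", None, None)
        out_face = self.longest_prefix_face(name)
        if out_face is None:
            return ("drop", None, None)   # no route for this name
        self.pit[name] = {in_face}
        return ("forward", out_face, None)

    def on_data(self, name, data):
        """Cache solicited data and return the faces waiting for it in the PIT."""
        faces = self.pit.pop(name, set())
        if faces:                         # unsolicited data is neither cached nor sent
            self.content_store[name] = data
        return sorted(faces)


node = CcnNode(fib={"/video": "face-upstream"})
print(node.on_interest("/video/a/seg1", "face-1"))
print(node.on_interest("/video/a/seg1", "face-2"))
print(node.on_data("/video/a/seg1", b"chunk"))
print(node.on_interest("/video/a/seg1", "face-3"))
```

In this sketch, a second interest for the same name is aggregated in the PIT instead of being forwarded again, and a later interest is answered from the content store, which is how redundant upstream transmission is suppressed.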

Content Distribution Network
ICN-related schemes can improve the efficiency of network video transmission, but they are difficult to deploy in the current Internet in the short term. Therefore, an application-layer content distribution scheme has emerged in the current Internet, namely the content distribution network (CDN). Many content service providers have developed at home and abroad, including Akamai in the United States and Netresidence Technology and Blue Flood in China. In addition, Alibaba Cloud and Tencent both provide CDN content acceleration services.
As platforms for accelerated content distribution, CDN service providers offer content acceleration services to content providers (CPs). They deploy multiple CDN cloud platforms and CDN points of presence (PoPs) across a country or even around the world and push the content to be distributed (such as high-volume video) to the PoPs in advance. A user request is redirected to the nearest CDN PoP through a DNS canonical name (CNAME), and the requested content is obtained from that PoP. As shown in Figure 2, from the perspective of content transmission, CDN can greatly decrease the bandwidth requirements of the backbone network and eliminate a large amount of redundant transmission between CDN cloud platforms and CDN PoPs, thereby improving network transmission efficiency.
CDN has been widely deployed in today's Internet content transmission and has achieved obvious performance improvements. However, it also faces some problems and challenges. First, the cost of node deployment is high; as an asset-heavy platform, CDN is difficult to deploy widely. The number of CDN PoPs is therefore often limited, and it is difficult to achieve extensive coverage for massive numbers of users. As a result, the transmission efficiency between CDN PoPs is still low. Second, the content request process in CDN is complex, which adds request delay and degrades the user experience, especially in terms of latency. In addition, CDN is closed and independent at the application layer: content service providers and network service providers cannot participate in optimizing content distribution, so communication and joint optimization between the network side and the end side cannot be established.

Video Transmission Optimization Algorithm
Adaptive Bitrate Adjustment
Some researchers have proposed adaptive bitrate (ABR) algorithms [5][6][7]. Given dynamically varying available bandwidth, the objective of ABR is to adjust the bitrate adaptively on the client side and avoid stalls, thereby improving the QoE of users watching videos. Detailed comparisons and analyses of ABR algorithms are provided in [8,9]. Specifically, the comparison in [9] found that the configuration of various parameters has a significant impact on ABR performance. Therefore, in practice, dynamically tuning ABR parameters according to network state characteristics, user system characteristics, and other factors is a huge challenge. To solve this problem, a research team at MIT proposed a reinforcement learning-based intelligent dynamic bitrate adjustment scheme called Pensieve [10]. As shown in Figure 3, the scheme adjusts the video bitrate intelligently and dynamically through reinforcement learning, which effectively addresses the complex parameter configuration of ABR and performs well in video transmission. However, client-based ABR strategies still have limitations: each video client adjusts its request policy according to its own network state, so the decision is only locally optimal and can hardly ensure globally optimal utilization of network bandwidth resources.
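To make the parameter sensitivity of client-side ABR concrete, the following Python sketch shows a simple rate-based selection rule. It is an illustrative heuristic only, not Pensieve and not one of the algorithms in [5][6][7]; the function name, the safety factor, the low-buffer threshold, and the bitrate ladder are assumptions chosen for the example.

```python
def choose_bitrate(ladder_kbps, throughput_samples_kbps, buffer_s,
                   safety=0.8, low_buffer_s=5.0):
    """Pick the highest ladder bitrate that fits a conservative throughput estimate.

    All parameters are illustrative assumptions; throughput samples must be positive.
    """
    if not throughput_samples_kbps:
        return min(ladder_kbps)           # no measurements yet: start at the lowest rate

    # The harmonic mean damps the effect of short throughput spikes.
    n = len(throughput_samples_kbps)
    estimate = n / sum(1.0 / x for x in throughput_samples_kbps)

    budget = estimate * safety
    if buffer_s < low_buffer_s:
        budget *= 0.5                     # be more cautious when the buffer is nearly empty

    candidates = [b for b in ladder_kbps if b <= budget]
    return max(candidates) if candidates else min(ladder_kbps)


ladder = [300, 750, 1200, 2850, 4300]
print(choose_bitrate(ladder, [2000, 2000, 2000], buffer_s=20.0))
print(choose_bitrate(ladder, [2000, 2000, 2000], buffer_s=2.0))
```

Even in this small rule, the outcome depends on `safety` and `low_buffer_s`, which is the kind of hand-tuned configuration that learning-based schemes aim to avoid. Each client also decides from its own measurements only, so the rule is local in the sense discussed above.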

Intelligent Video Transmission Based on Edge Computing
A scheme that optimizes video user QoE should have the following characteristics: (1) it can optimize video transmission globally for users sharing a bottleneck bandwidth, rather than only making decisions locally; (2) it can reduce the redundancy of video transmission and ensure efficient utilization of network bandwidth; and (3) it can obtain the network state in real time and counter dynamic network jitter through corresponding mechanisms, ensuring a smooth viewing experience for video users.
Recently, edge computing has emerged as a novel technology that can meet these demands of video transmission scheduling. First, the edge computing platform is close to the terminal users and can optimize video transmission globally for all of them. Second, the edge computing platform has strong sensing, storage, and computing capabilities, which can effectively compensate for insufficient network transmission capacity. Therefore, video transmission based on edge computing can improve the utilization of network bandwidth and the transmission efficiency of massive videos, and it plays an important role in realizing the joint optimization of user QoE.
In [10,11], the authors proposed joint bitrate optimization mechanisms based on edge computing. These schemes make intelligent joint bitrate decisions through deep learning. Compared with traditional end-based QoE optimization mechanisms, optimization schemes based on edge computing have prominent advantages in terms of total QoE.

Cloud-Based Intelligent Video Coding Mechanism.
Video is increasingly popular as a core experience of people's online activities. On Facebook alone, more than 8 billion videos are viewed every day [12]. The client downloads videos from the video provider's cloud server using ABR [13,14]. The ABR algorithm dynamically selects the highest bitrate that the network bandwidth can support and avoids stalls during playback. A higher bitrate provides higher video quality but also requires more data to be transmitted, so the client needs an end-to-end connection with higher bandwidth.
When original videos are uploaded, several basic bitrate versions of each video are generated [15], which consumes huge computing resources. In network video transmission, there are more than 100 video resolutions, and a single resolution can correspond to multiple video bitrates, so the number of potential output bitrate versions is large. By default, FFmpeg is used to encode each uploaded video into a small number of standard versions. More computation can improve the viewing experience either by improving coding performance (decreasing the amount of transmitted data for the same video quality) or by increasing the coding selection (providing finer-grained bitrate choices that adapt to dynamic network bandwidth). However, the computing power available for video coding in the cloud is limited, and it is impossible to generate enough coding versions for all videos. Therefore, dynamically allocating cloud coding power among different videos to achieve the best global user experience is one of the problems to be solved in network video transmission.
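As an illustration of the default encoding step, the following Python sketch builds FFmpeg command lines for a small set of standard versions. The function name, the output file naming, and the bitrate values are assumptions made for illustration; only widely used FFmpeg options (`-i`, `-vf scale`, `-c:v`, `-b:v`, `-c:a`) appear, and the commands are constructed but not executed.

```python
def ffmpeg_ladder_commands(src, renditions):
    """Build one ffmpeg argument list per (height, video_kbps) rendition.

    The renditions and the output naming scheme are illustrative assumptions.
    """
    stem = src.rsplit(".", 1)[0]
    commands = []
    for height, kbps in renditions:
        commands.append([
            "ffmpeg", "-y", "-i", src,
            "-vf", f"scale=-2:{height}",    # keep the aspect ratio, force an even width
            "-c:v", "libx264", "-b:v", f"{kbps}k",
            "-c:a", "aac",
            f"{stem}_{height}p.mp4",
        ])
    return commands


# Two default versions; the bitrates are assumed values.
for cmd in ffmpeg_ladder_commands("raw.mp4", [(480, 1000), (1080, 4500)]):
    print(" ".join(cmd))
```

Each additional rendition is one more full encode of the source, which is why the number of versions that can be produced for every upload is bounded by the available computing power.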
In this paper, we propose a cloud-based intelligent video coding mechanism that takes popularity into account, assigning computing power and encoded bitrate versions to videos according to their popularity. In practice, video popularity is extremely imbalanced: less than 1% of videos account for more than 80% of viewing time. This feature is of great value for allocating computing power in cloud dynamic coding. In the cloud, the highest-quality encodings or more customized bitrate versions are produced on demand for a small number of the most popular videos, so that the overall viewing quality can be significantly improved with only a small amount of computing power.

Prediction of Video Popularity Based on Reinforcement Learning
Because video watching is highly concentrated, targeted cloud coding requires the analysis and prediction of video popularity. In our scheme, the request processing and logging module records the sequence of user video requests, including the video ID, requested bitrate, request time, and terminal parameters (such as resolution). Popularity prediction should have the following characteristics: first, it should be quick, so as to decrease the number of missed video requests; second, it should be accurate, so that computing power is spent on the most valuable videos; and third, it should be scalable, so that it can analyze and predict over massive request records. The popularity prediction methods proposed in [16,17] mainly target analysis and prediction at the day level. These methods incur a large prediction delay, whereas the goal of this paper is to predict popularity quickly at the minute level, so it is very important to design a fast incremental popularity prediction algorithm. To further maintain stability and adaptability to network dynamics, we use reinforcement learning to predict the popularity of videos.
Video requests that occurred at a past time $t$ affect the popularity at a future moment $T$, and this effect is represented by $f(T - t)$. Here $f$ is a probability density function defined on $[0, +\infty)$ that is generally monotonically decreasing. In principle, therefore, the more recent a visit, the greater its effect on popularity, and the effect of a particular visit gradually converges to zero over time. For a video, let $t_i$ denote the time of visit $i$; the total number of times the video will be watched at the future time $T$ can then be calculated as $\sum_i f(T - t_i)$.
The key problem is to choose the kernel density $f$ so that incremental updates become possible, which accelerates video popularity prediction. Previous works [18][19][20] used a power law distribution as the probability density function. However, that method requires a complete recalculation every time the popularity is evaluated, which greatly slows prediction and affects the timeliness of popularity feedback. In this paper, we use an exponential distribution as the probability density function, which largely reduces the computation needed for popularity prediction. Its parameter $w$ indicates the range of the time window for future impact and mainly serves to remove visits made long ago, which have minimal effect on the accuracy of popularity prediction and can be ignored. For a video, suppose $T_2$ is the time of the present request, which triggers the present popularity update, and $T_1$ is the time of the previous request. The future popularity at the current time $T_2$ can then be calculated incrementally from the popularity stored at $T_1$.
Reinforcement learning is a field of machine learning in which actions are selected based on the environment so as to maximize the expected benefit. In reinforcement learning, the agent chooses an action to perform in the environment [21][22][23].
After the environment receives the action, its state changes and it generates a reward according to the quality of the action, and the reward is returned to the agent. The agent selects the next action according to the reward and the current state of the environment, which forms a positive feedback mechanism and increases the probability of choosing the optimal action in each state [24,25]. In this paper, we apply reinforcement learning to popularity prediction and design a reinforcement learning-based prediction strategy that better supports dynamic networks and improves prediction accuracy. The process is shown in Figure 4. The video requests in the past time period $t$ are regarded as the state, the predicted popularity at time $T$ is regarded as the action, and the network performance of video transmission is regarded as the reward. The agent chooses the predicted popularity as its action according to the reward. The proposed method chooses the popularity that yields the highest video transmission performance as the predicted popularity, which keeps the prediction accurate under dynamic network conditions [26][27][28].
Specifically, we adopt the Q-learning method to predict the request popularity of videos. We take the video popularity over the past time $t$ as the state $s$ and the video request popularity at moment $T$ as the action $a$. The Q value $Q(s, a)$ is then updated by the Q-learning rule, in which $\alpha$ represents the learning step, $\gamma$ represents the discount factor for rewards, and $Q(s', a')$ is the maximum Q value over the state $s'$ and action $a'$ at the next moment. Furthermore, $Q(s, a)$ is driven by the performance of the video transmission corresponding to that state and action. In this paper, the performance $P$ is set as a function of the request delay, in which $k$ is the coefficient of the impact of time delay on performance. The action corresponding to the maximum Q value is then selected as the predicted video request popularity at moment $T$, which is expressed as $\arg\max_{a} Q(s, a)$.
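The two ingredients described above, an incrementally updated popularity score and a Q-learning selection of the predicted popularity, can be sketched in Python as follows. This is a minimal illustration under explicit assumptions rather than the exact formulation of the paper: the kernel is assumed to be exp(-(T - t) / w), the performance is assumed to be 1 / (1 + k * delay), the update is the standard tabular Q-learning rule, and all class, function, state, and action names are hypothetical.

```python
import math
from collections import defaultdict


class DecayedPopularity:
    """Score sum_i exp(-(T - t_i) / w), maintained incrementally (assumed kernel)."""

    def __init__(self, window):
        self.window = window      # w: assumed time scale of the exponential kernel
        self.score = 0.0
        self.last_time = None     # T1: time of the previous request

    def on_request(self, now):
        """Fold a request at time `now` (T2) into the score in constant time."""
        if self.last_time is not None:
            # Decay the stored score from T1 to T2; older requests are not revisited.
            self.score *= math.exp(-(now - self.last_time) / self.window)
        self.score += 1.0         # the new request contributes exp(0) = 1
        self.last_time = now
        return self.score


def performance(delay, k=1.0):
    """Assumed reward: a lower request delay gives a higher performance."""
    return 1.0 / (1.0 + k * delay)


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update with learning step alpha and discount gamma."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def predict(q, state, actions):
    """Return the action (popularity level) with the largest Q value."""
    return max(actions, key=lambda a: q[(state, a)])


pop = DecayedPopularity(window=60.0)
for t in (0.0, 30.0, 60.0):
    pop.on_request(t)

actions = ("low", "medium", "high")   # hypothetical discretized popularity levels
q = defaultdict(float)
q_update(q, "warm", "high", performance(delay=0.2), "warm", actions)
q_update(q, "warm", "low", performance(delay=1.5), "warm", actions)
print(round(pop.score, 4), predict(q, "warm", actions))
```

With an exponential kernel, the score at the new request time depends only on the stored score and the elapsed time since the previous request, which is what makes a constant-time update possible; a power law kernel has no such property and requires summing over all past requests again.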

Adaptive Computing Power Allocation for Video Coding
The computing power distribution management module is responsible for accepting both regular encoding requests for raw videos and popularity-sensitive on-demand custom encoding requests. It dynamically allocates and balances CPU resources at core-level granularity according to the workloads of the two request types.
Based on the above popularity prediction, a set of popularity-sensitive customized coding tasks is obtained. When the bitrate requested by a user does not exist, popularity prediction for the video is triggered, and tasks in the on-demand coding set are generated with different priorities. In our mechanism, even videos with the same popularity should have different priorities, because the overall improvement in user QoE obtained from the same computing power may differ. For example, suppose the requested bitrate of video A is 720p, while only 180p, 480p, and 1080p exist in the video caching module. Because the user's requested bitrate reflects a bandwidth limit, the closest usable version is 480p (1080p may cause severe stalls due to insufficient bandwidth). If the requested bitrate of video B is also 720p, and only 180p and 1080p exist in the video caching module, the served bitrate is 180p. In this case, although A and B have the same predicted popularity, B should be given priority for on-demand coding to maximize the QoE gain. Therefore, we introduce a QoE increment factor that depends on $x$, the ratio of the requested bitrate to the response bitrate. For example, when the requested and response bitrates are 720p and 480p, respectively, $x$ takes the value 1.5.
The computing power distribution management module receives conventional encoding requests for original videos; for example, a regular encoding request encodes the raw video in both 480p and 1080p by default. The amount of conventional coding can be increased or decreased according to the computing power of the cloud video encoding platform. The remaining potential encoding options, including 180p, 360p, and 720p, trigger on-demand specialized encoding services according to the popularity of user video requests, thus providing intelligent and targeted encoding. In an actual video cloud platform, far more coding types are involved than those mentioned in this paper. The cloud transcoding platform can dynamically allocate computing power according to the computing power actually available and the conventional transcoding requirements of original videos.
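The example with videos A and B can be reproduced with a short Python sketch. The selection of the served version and the ratio x follow the description above; the priority rule, popularity multiplied by x, is only an assumed stand-in for the QoE increment factor, and the function and field names are hypothetical.

```python
def served_version(requested_p, cached_p):
    """Highest cached version that does not exceed the request, or None."""
    usable = [v for v in cached_p if v <= requested_p]
    return max(usable) if usable else None


def bitrate_ratio(requested_p, cached_p):
    """x: requested bitrate divided by the bitrate that is actually served."""
    served = served_version(requested_p, cached_p)
    if served is None:
        raise ValueError("no cached version at or below the requested bitrate")
    return requested_p / served


def order_tasks(tasks):
    """Sort on-demand encoding tasks by the assumed priority popularity * x."""
    return sorted(
        tasks,
        key=lambda t: t["popularity"] * bitrate_ratio(t["requested"], t["cached"]),
        reverse=True,
    )


tasks = [
    {"video": "A", "popularity": 10.0, "requested": 720, "cached": [180, 480, 1080]},
    {"video": "B", "popularity": 10.0, "requested": 720, "cached": [180, 1080]},
]
for task in order_tasks(tasks):
    x = bitrate_ratio(task["requested"], task["cached"])
    print(task["video"], served_version(task["requested"], task["cached"]), x)
```

For equal popularity, video B is ordered first because its ratio x is 4 while that of video A is 1.5, which matches the priority argued for in the text.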

Popularity-Based Intelligent Edge Caching Mechanism.
Video transmission generates much redundant traffic, and requests show strong locality in time and space; that is, a small number of videos are requested many times by users in the same area within a short time.
Therefore, as shown in Figure 5, the mechanism introduces an edge computing platform, which breaks the limitation of traditional end-to-end video transmission and achieves an intelligent end-edge cooperative video transmission mechanism through edge caching. The process of the mechanism is as follows:
Step 1. The edge computing platform receives video requests from all users within its coverage area.
Step 2. The edge computing platform searches its local cache space: (1) if the corresponding video is cached and the bitrate matches exactly, the cached video is used directly to respond to the user request; (2) if the corresponding video is cached but the bitrate does not match exactly, and no better choice is found in the cloud, the cached video is used to respond to the user request, and the cloud is informed of the request so that it can count and predict the video popularity; (3) if the corresponding video is cached but the bitrate does not match exactly, and a better option can be found in the cloud, the request is forwarded to the cloud for that user; (4) if no corresponding video is cached, the request is forwarded directly to the cloud.
Step 3. For videos returned by the cloud, the edge computing platform caches them according to the predicted popularity within the edge coverage area, and videos with lower popularity are replaced first.
The proposed TORE is compared with existing schemes, including the scheme in [29] and joint rate control and buffer management (JRCBM) [30], under different numbers of requests, and the results are analyzed in terms of video relative quality and video lag degree.
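The request handling in Steps 1 to 3 of the edge caching mechanism can be sketched in Python as follows. The sketch is a simplified reading of those steps under stated assumptions: a version is treated as usable only if it does not exceed the requested bitrate, a cloud option is "better" if it is a higher usable version than the best cached one, the cloud is assumed to hold at least one version of every video, and the class and method names are hypothetical.

```python
class EdgeCache:
    """Toy edge cache of (video_id, bitrate) entries with popularity-based eviction."""

    def __init__(self, capacity, popularity):
        self.capacity = capacity        # maximum number of cached entries
        self.popularity = popularity    # video_id -> predicted popularity
        self.store = set()

    def handle(self, video_id, bitrate, cloud_bitrates):
        """Return (source, served_bitrate) following Step 2, cases (1) to (4)."""
        if (video_id, bitrate) in self.store:
            return ("edge", bitrate)                         # case (1): exact match
        edge_fit = [b for (v, b) in self.store if v == video_id and b <= bitrate]
        cloud_fit = [b for b in cloud_bitrates if b <= bitrate]
        if edge_fit and (not cloud_fit or max(cloud_fit) <= max(edge_fit)):
            return ("edge+notify", max(edge_fit))            # case (2): cloud is no better
        # Cases (3) and (4): forward to the cloud, then cache the response (Step 3).
        served = max(cloud_fit) if cloud_fit else min(cloud_bitrates)
        self.admit(video_id, served)
        return ("cloud", served)

    def admit(self, video_id, bitrate):
        """Cache a cloud response, replacing the least popular entry when full."""
        if (video_id, bitrate) in self.store:
            return
        if len(self.store) >= self.capacity:
            victim = min(self.store, key=lambda e: self.popularity.get(e[0], 0.0))
            if self.popularity.get(victim[0], 0.0) >= self.popularity.get(video_id, 0.0):
                return                  # the newcomer is not more popular: keep the cache
            self.store.remove(victim)
        self.store.add((video_id, bitrate))


cache = EdgeCache(capacity=2, popularity={"A": 9.0, "B": 5.0, "C": 1.0})
print(cache.handle("C", 480, [480, 1080]))   # miss: served by the cloud and cached
print(cache.handle("C", 480, [480, 1080]))   # exact hit at the edge
print(cache.handle("C", 720, [480, 1080]))   # closest version is already at the edge
print(cache.handle("C", 720, [480, 720]))    # the cloud has a better version
print(cache.handle("A", 480, [480]))         # popular video replaces a less popular entry
```

Eviction here compares only the predicted popularity of whole videos; a deployed cache would also need to account for entry sizes and for how quickly the popularity estimates change.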

Video Relative Quality.
The comparison simulations of video relative quality under different numbers of requests are shown in Figure 6. The proposed algorithm performs best across all tested numbers of requests because, when the basic bitrates do not match the user request, a coding task can be customized to ensure the relative quality of the video.

Video Lag Degree.
To analyze and compare the video lag degree of different algorithms, the relative smoothness index of video viewing is used as the metric. It is calculated from $t_{wch}$, the duration of video viewing, and $t_{wit}$, the duration of waiting during video viewing, which includes the buffering time at startup. Comparison simulations of the video lag degree under different numbers of requests are shown in Figure 7, and the proposed TORE performs best across all tested numbers of requests. The advantages of the proposed approach can be explained in two aspects. On the one hand, the EC-based intelligent caching strategy adaptively allocates computing power and tasks to edge-side nodes, which decreases the transmission latency of requests. On the other hand, popularity-based intelligent edge caching reduces redundant transmission in the network. As a result, paths do not become congested, which ensures stable transmission of large volumes of network video.
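One form of the relative smoothness index that is consistent with the definitions of $t_{wch}$ and $t_{wit}$ is the fraction of the session spent playing rather than waiting. It is given here as an assumption, not as the authors' exact expression, and the symbol $S$ is introduced only for this illustration:

```latex
S = \frac{t_{wch}}{t_{wch} + t_{wit}}, \qquad 0 < S \le 1 .
```

Under this assumed form, a session with 285 s of playback and 15 s of waiting gives $S = 285 / 300 = 0.95$, and a session without any waiting gives $S = 1$.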

Video Response Time.
As can be seen from Figure 8, the proposed TORE performs well in terms of response time.
The intelligent caching method is implemented on the EC platform according to regional popularity characteristics and is combined with video forwarding to minimize network transmission redundancy and maximize video transmission efficiency. The proposed scheme is therefore of significant value for optimizing video response time, and it improves network transmission efficiency and user QoE.

Conclusions
In this paper, we propose a dynamic computing power allocation mechanism based on intelligent popularity prediction for video distribution to users. The proposed mechanism takes into account both conventional encoding demand and dynamic on-demand customized encoding demand, and it makes full and reasonable use of the limited computing power in the cloud by adaptively allocating it to each server, which reduces the response latency of requests and thus improves QoE. At the same time, the scheme introduces an edge computing architecture and a reinforcement learning method to predict video popularity, which in turn enables intelligent caching based on video popularity. The experiments demonstrate that the proposed method can optimize the efficiency of video transmission and reduce network latency.
The focus of the proposed optimization mechanism is to improve the video quality and response time that users experience when watching videos. However, compression and decoding in video transmission optimization are not analyzed. In the future, we can try to optimize the video content by using different bitrates to encode the parts of a video stream that users are and are not interested in, so as to directly reduce redundant traffic in video transmission.
To help readers follow this paper, the commonly used abbreviations are listed below.

ABR: Adaptive bitrate
AR: Augmented reality
CCN: Content-centric network
CDN: Content distribution network
CP: Content provider
EC: Edge computing
FIB: Forwarding information base
ICN: Information-centric network
JRCBM: Joint rate control and buffer management
PIT: Pending interest table

Data Availability
All the data used to support the findings of the study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.