Joint Optimization of Service Migration and Resource Allocation in Mobile Edge–Cloud Computing

Abstract: In the rapidly evolving domain of mobile edge–cloud computing (MECC), the proliferation of Internet of Things (IoT) devices and mobile applications poses significant challenges, particularly in dynamically managing computational demands and user mobility. Current research has partially addressed aspects of service migration and resource allocation, yet it often falls short in thoroughly examining the nuanced interdependencies between migration strategies and resource allocation, the consequential impacts of migration delays, and the intricacies of handling incomplete tasks during migration. This study advances the discourse by introducing a sophisticated framework optimized through a deep reinforcement learning (DRL) strategy, underpinned by a Markov decision process (MDP) that dynamically adapts service migration and resource allocation strategies. This refined approach facilitates continuous system monitoring, adept decision making, and iterative policy refinement, significantly enhancing operational efficiency and reducing response times in MECC environments. By meticulously addressing these previously overlooked complexities, our research not only fills critical gaps in the literature but also enhances the practical deployment of edge computing technologies, contributing profoundly to both theoretical insights and practical implementations in contemporary digital ecosystems.


Introduction

Motivation
In recent years, the rapid expansion of Internet of Things (IoT) devices and mobile applications has catalyzed the development of mobile edge-cloud computing (MECC). This innovative paradigm combines the extensive computational resources of cloud computing with the immediacy of edge computing to meet essential requirements for low latency, high reliability, and superior quality of service (QoS) [1]. By harnessing the robust data handling and computational capabilities of cloud data centers, MECC efficiently manages large datasets and executes complex computations beyond the processing power of edge devices alone [2,3]. Additionally, MECC significantly reduces latency by processing data closer to its source, enhancing performance for latency-sensitive applications such as autonomous driving, augmented reality (AR), virtual reality (VR), and real-time gaming [4–8].
MECC also improves system reliability by enabling edge nodes to autonomously manage critical operations, ensuring uninterrupted service even during network disruptions. Furthermore, the architecture of MECC supports dynamic and scalable resource allocation, optimizing cloud resources for computationally intensive tasks while delegating real-time processing to edge servers (ESs) [9]. This adaptability makes MECC highly suitable for modern digital ecosystems, capable of efficiently handling variable network loads and diverse application demands while seamlessly integrating smart technologies.
Despite its advancements, MECC faces significant challenges, primarily stemming from user mobility and the dynamic computational demands of contemporary mobile applications, which can lead to delays and instabilities that adversely impact QoS [10,11]. A common issue arises when users move to locations farther from their current ES, so that tasks must be offloaded back to the original server to maintain service continuity. This often results in increased communication delays and can substantially degrade user experience, particularly in latency-sensitive applications. While recent advancements in dynamic computational offloading and resource allocation have significantly mitigated issues related to user mobility by optimizing offloading and allocation decisions to minimize response times, there remains a notable gap in fully addressing the dependencies of task execution on the underlying environments and user contexts at ESs.
Current research has responded to these operational challenges by prioritizing service migration strategies that relocate task execution environments and user contexts to ESs closer to the user's real-time location, thereby maintaining or enhancing QoS [12]. However, these studies often overlook the reduction in available computation time caused by the migration process itself, which can be detrimental to the performance of time-sensitive services. Furthermore, the interplay between service migration decisions and resource allocation strategies remains underexplored. Effective strategies should not only minimize data transfer delays by considering user proximity but also assess the computational capacities of various ESs to ensure that, after migration, the resources are adequate to handle the relocated services without causing service timeouts. Although some studies have acknowledged the complex interdependence between service migration and resource allocation, they often fail to account for scenarios where tasks are not completed before migration. This oversight can lead to data loss, increased delays, and significantly compromised QoS.
In summary, while significant progress has been made in addressing various aspects of migrating services and allocating resources within MECC environments, critical gaps remain. Developing comprehensive strategies that simultaneously enhance the efficiency of service migration and resource distribution, considering unfinished tasks and their contexts, is crucial. Addressing these gaps will significantly enhance the efficiency and effectiveness of MECC environments, aligning their capabilities with the evolving demands of contemporary digital ecosystems.

Our Contributions
In the rapidly evolving domain of MECC, the critical interplay between service migration and resource allocation commands increasing attention. This paper explores these intricacies by establishing robust service migration and computational models, enhanced by a reinforcement learning-based strategy aimed at minimizing service response times. Addressing gaps identified in previous research, particularly the overlooked aspects of user context migration and the inadequate attention to computational resource allocation, our research introduces an innovative framework. It adeptly manages user mobility and the need for uninterrupted service by migrating unfinished user data and context. Moreover, the framework dynamically adjusts service provisions and resource allocations in real time, significantly enhancing QoS within strict time constraints. This approach not only ensures service continuity but also improves responsiveness across MECC environments.
The main contributions of this paper are summarized as follows:

• Comprehensive optimization of service migration and resource allocation: We provide a thorough problem formulation that simultaneously optimizes service migration and resource allocation within MECC frameworks. Addressing the challenges of heterogeneous ES environments, our study tackles the intertwined issues of user mobility and fluctuating computational demands. The optimization aims to minimize average response times, substantially enhancing QoS while meeting rigorous temporal constraints.

• Strategic transformation into a Markov decision process (MDP): Moving from theoretical models, this paper transforms the joint optimization challenge into an MDP. We introduce a novel deep reinforcement learning (DRL)-based algorithm to tackle this MDP, autonomously adapting migration and resource allocation strategies without relying on prior knowledge of system states.

• Rigorous evaluation through simulation: The efficacy and robustness of our proposed DRL-based dynamic migration and resource allocation strategy are tested through comprehensive simulations. Task failure rate and average task response delay serve as the performance metrics. The results demonstrate that our DRL-based approach sustains high service quality and markedly reduces average response delays, outperforming the baseline methods evaluated in this paper.
The remainder of the paper is organized as follows: Section 2 summarizes the related work. Section 3 describes the system model in detail, which includes the migration model, computational model, and problem description. Section 4 provides a detailed description of the proposed A2C-based approach. The results of the simulation experiments are discussed in Section 5. Finally, Section 6 summarizes the paper and outlines future research directions.

Related Works
In this section, we review existing research on the joint optimization of dynamic computation offloading and resource allocation, alongside the optimization of service migration policies in MECC environments. We then highlight the distinctions between our study and the prior works, elucidating the unique contributions and advancements our research offers in this domain.

Joint Optimization of Dynamic Computation Offloading and Resource Allocation
This section reviews existing research on strategies for task offloading and resource management in dynamic network environments.
Liu et al. [13] considered the mobility characteristics of user equipment (UE) and proposed a dual time-scale framework that resolves user-server association problems by incorporating long-term channel interference, workload, and server computational constraints with short-term dynamic task offloading and resource allocation. Wang et al. [14] developed a decentralized offloading framework, accommodating mobile users dynamically entering or exiting an MEC system, and adapting to their varying offloading demands. Yang et al. [15] introduced a priority-driven multi-agent (PDMA) cooperative task offloading algorithm to address the dynamic characteristics of task arrivals, mobility of devices, and load imbalances across ESs. Liang et al. [16] investigated joint task cache placement and offloading in mobile edge computing systems characterized by dynamic task arrivals. Fang et al. [17] devised a dynamic task offloading algorithm using DRL, considering dependency relationships among user tasks to optimize task offloading and resource allocation decisions effectively. This algorithm seeks to minimize task completion time and reduce device energy consumption amid channel variations. Zhu et al. [18] proposed a scheduling algorithm that integrates communication and dynamic tasks by considering the vehicle's mobility patterns and task sizes in vehicular networks supported by an intelligent reflecting surface (IRS) for MEC services. Ma et al. [19] based their joint offloading strategy on vehicle mobility patterns, aiming to minimize the weighted sum of execution time and computation costs, considering both response delay and economic factors. Dang et al. [20] introduced a task offloading cost model for scenarios involving multiple vehicles and MEC servers, utilizing the DDPG algorithm to make decisions that minimized the overall system's task processing costs. Liao et al. [21] explored task execution queues and priorities within a multi-MEC server environment, dividing computation offloading into power scheduling and task offloading phases. In the power scheduling phase, the focus is on minimizing energy consumption through optimal transmission power and CPU frequency settings, while the task offloading phase aims to reduce execution latency through strategic offloading decisions. Huang et al. [22] addressed a dynamic Internet of Vehicles (IoV) architecture that supports both MEC and cloud computing, employing a DRL-based algorithm to optimize task offloading and resource allocation for self-driving vehicles on straight roads. This algorithm aims to minimize processing costs by optimizing computational offloading and bandwidth allocation, adhering to processing delay and transmission rate constraints. Xu et al. [23] proposed a vehicular edge computing architecture utilizing non-orthogonal multiple access (NOMA), focusing on cooperative resource optimization among ESs to maximize the service ratio through game theory and convex optimization methods.
While recent advancements in dynamic computational offloading and resource allocation have significantly mitigated issues related to user mobility by optimizing offloading and allocation decisions to minimize response times, there remains a notable gap when it comes to fully addressing the dependencies of task execution on the underlying environments and user contexts at ESs. In scenarios where user mobility is rapid, optimizing task offloading alone proves insufficient, because task execution often depends on specific characteristics of the ESs and user contexts. Consequently, integrating service migration strategies that dynamically adapt to these environmental and contextual dependencies is crucial. Such strategies not only complement the offloading process but are also essential to effectively reduce delays caused by task routing and result returns, thereby enhancing overall service quality.

Optimization of Service Migration Strategy
Numerous studies have focused on optimizing service migration policies to ensure QoS.
Liu et al. [24] aimed to efficiently manage the allocation of diverse heterogeneous resources and user tasks to maximize system utility in the context of vehicular edge computing. Their hybrid computing offloading strategy, incorporating both vehicle-to-infrastructure and vehicle-to-vehicle communications, allows for service migration to other ESs when an ES's computing capacity is insufficient, thus achieving load balancing. Liang et al. [25] addressed mobility management across different time scales, proposing a framework that integrates service migration with transmission power adjustments. This strategy makes service migration decisions at a broader time scale while adjusting transmission power at a finer scale to support task offloading, aiming to minimize long-term energy consumption and ensure reliable computational offloading. Researchers in [26] introduced a digital twin edge network architecture, distinguishing between latency-sensitive and latency-insensitive tasks. By utilizing real-time and historical data to predict future user movements, they tailored service migration decisions to reduce costs while maintaining QoS. Peng et al. [27] considered the comprehensive costs associated with service migration, integrating both computing and communication costs. They employed reinforcement learning combined with transfer learning to derive effective migration strategies within a dynamic vehicular edge computing environment. Xu et al. [28] tackled user mobility by modeling the service scheduling problem in an MEC environment, proposing a service management method using a probabilistic approach to effectively reduce service delay and migration costs. Wang et al. [29] emphasized the importance of balancing benefits and service costs during migration, proposing a dynamic service migration algorithm based on DRL aimed at minimizing the weighted sum of service delay and migration costs. Li et al. [30] introduced an edge caching strategy balancing energy and latency, utilizing a deep neural network for predicting future request content, followed by determining an optimal caching placement with the branch-and-bound algorithm and refining service migration strategies using a DQN algorithm to reduce service latency. Chen et al. [31] used a DRL algorithm to optimize service migration decisions in ESs, focusing on minimizing user-perceived latency and system energy consumption, addressing service interruptions caused by user mobility. Additionally, Chen et al. [32] developed a service migration optimization algorithm based on deep recursive Q-learning, aiming to minimize both user latency and system energy consumption by considering user mobility and the coverage range of ESs.
Current research on service migration predominantly focuses on cost reduction but often overlooks the significant reduction in available computation time during migration, which can adversely impact the performance of time-sensitive services. Moreover, the dynamics between service migration strategies and resource allocation decisions have not been extensively studied. Effective strategies must not only minimize data transfer delays by considering user proximity but also rigorously evaluate the computational capacities of various ESs. This ensures that the resources available post-migration are adequate to support the services without causing timeouts. Joint consideration of service migration and resource allocation is crucial for improving system performance and user experience.

Joint Optimization of Service Migration and Resource Allocation
This section explores the challenges associated with optimizing service migration and resource allocation in MECC environments, a topic that has received limited attention in existing research.
Liang et al. [33] addressed user mobility in cellular networks to ensure seamless task migration between base stations without compromising resource efficiency or link reliability. Their research optimized migration and handover policies by jointly managing computational and radio resources. The policy framework integrated virtualization, I/O interference between virtual machines, and challenges associated with wireless multi-access. They developed a solution based on relaxation and rounding that includes an optimal iterative algorithm and a novel integer-recovery design. This approach surpassed traditional rounding methods by leveraging derived problem properties and applying matching theory. Additionally, their study included "hotspot mitigation", aiming to redistribute the load from overloaded to idle servers or base stations. Simulation results validated the effectiveness of their policies in multi-cell MECC networks, demonstrating near-optimal performance in managing joint service migration and base station handover. Building on this, Liu et al. [34] proposed a method to reduce access latency for IoT users in MECC by jointly optimizing service migration and resource allocation. They introduced a Service Migration and Resource Allocation (SMRA) algorithm based on DRL, which accounts for the mobility of IoT users. This algorithm determines whether to migrate services, identifies optimal migration destinations, and allocates resources using the long short-term memory (LSTM) and the parameterized deep Q-network (PDQN) algorithms.
Despite the recognition of the interdependencies between service migration and resource allocation, existing studies often overlook scenarios where tasks are incomplete before migration. Such oversight can result in data loss and significant deterioration in QoS. Addressing these critical aspects is essential for developing more effective migration and resource allocation strategies, ultimately enhancing both system performance and user experience.
To underscore the novelty and uniqueness of our work, we compare our study with existing research in the field and identify several key distinctions:

• Acknowledgment of migration delays and impacts: Unlike existing studies that often overlook the reduction in available computation time caused by migration processes, our research takes these factors into account. We analyze the direct consequences of migration processes on the operational efficiency of systems, ensuring a more comprehensive understanding of the migration dynamics.

• Exploration of service migration and resource allocation interplay: The interaction between service migration strategies and resource allocation decisions has not been thoroughly examined in prior research. Our study delves into this interplay, aiming to establish a balanced approach that optimizes both elements to improve overall system performance.

• Consideration of incomplete tasks during migration: While a few studies have begun to address the interdependencies between service migration strategies and resource allocation decisions, they rarely consider scenarios where tasks are not completed before migration. This oversight can lead to significant challenges, such as the need to migrate unfinished task data and service contexts together, which can further complicate resource allocation strategies. Our work addresses this gap by incorporating these scenarios into our optimization model, aiming to minimize disruptions and enhance service quality.
By addressing these aspects, our study not only contributes to the academic understanding of service migration dynamics but also offers practical insights that can be applied to improve the responsiveness and efficiency of IoT applications in MECC environments.

System Model and Problem Definition
As illustrated in Figure 1, we examine an edge-cloud collaboration system comprising ESs and a single cloud server. In this system, users continuously move and offload tasks generated by their mobile devices to the ESs for execution. The ESs then send the task information and users' location data to a centralized scheduling and resource allocation system, which is responsible for making decisions regarding migration and resource allocation. By transmitting only the necessary information to the centralized system for decision making, this approach effectively reduces network load. At the same time, it allows for easier access to global network information, enabling more efficient and reasonable resource allocation strategies [30].

System Model
We assume that each ES in the edge-cloud collaboration system is associated with a corresponding radio access network (RAN) node. Let $S = \{s_1, \ldots, s_i, \ldots, s_n\}$ denote the set of servers accessible to users via these RAN nodes. The required service functions (SFs) are deployed on the ESs to support emerging IoT applications. When users offload specific tasks to the ESs for processing, the corresponding SF creates a dedicated service instance (SI) to execute the task. The ESs support $\Lambda$ types of application tasks, and their allocatable computational resources are represented as $\{p_1, \ldots, p_i, \ldots, p_n\}$, measured in GHz.
In the system, $m$ users, denoted as $U = \{u_1, \ldots, u_j, \ldots, u_m\}$, are continuously active in the environment. Each user has installed $\Lambda$ types of intelligent applications on their device. It is assumed that ESs periodically update the information regarding the users they serve. Time is segmented into discrete slots $\mathcal{T} = \{1, \ldots, t, \ldots, T\}$, each with a duration of $\tau$. After an ES updates the information of the users it serves in slot $t$, the user selects one of the $\Lambda$ application types to generate a task for offloading to the ES. Then, the ES creates an SI for the offloaded task, uploads the task information for this time slot, and inserts it into the task table $Task$ maintained by the control system, where $Task = \{Task_1, \ldots, Task_t, \ldots, Task_T\}$. Here, $Task_t = \{task_t^1, \ldots, task_t^k, \ldots, task_t^{K_t}\}$ is the list of task information in time slot $t$, and $|Task_t| = K_t$ is the number of tasks in the system in time slot $t$. Each entry is a tuple $task_t^k = \{T_W, Rst_t^k, V_t^k, W, loc_t^k, X_t^k, Z_t^k\}$, which contains, in order, the response time constraint $T_W$ of the task type $W$, the remaining time $Rst_t^k$ of the task in time slot $t$, the remaining size $V_t^k$ of the task (measured in MB), the type $W$ of the task, the ES $loc_t^k$ associated with the user who offloaded the task, the location $X_t^k$ of the task, and the completion flag $Z_t^k$, where $Z_t^k = 1$ indicates that the task is completed in time slot $t$; otherwise, it remains unfinished.
When service migration is required, the corresponding SI, including task and context information, must be migrated from the origin server to the destination server. The model and assumptions for service migration are elaborated below.
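As an illustration of the bookkeeping described above, the task tuple and the per-slot task table can be sketched as plain Python data structures. The class and field names below are hypothetical and chosen only for readability; the comments map each field to the symbol it stands for.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRecord:
    """One entry task_t^k of the task table (field names are illustrative)."""
    deadline_s: float         # T_W: response time constraint of the task type
    remaining_time_s: float   # Rst_t^k: remaining time in slot t
    remaining_size_mb: float  # V_t^k: remaining task size in MB
    task_type: int            # W: application type
    user_es: int              # loc_t^k: ES associated with the offloading user
    host_es: int              # X_t^k: ES where the task is currently located
    done: bool = False        # Z_t^k: True once the task is completed


class TaskTable:
    """Per-slot lists of task records, as kept by the control system."""

    def __init__(self) -> None:
        self._slots: Dict[int, List[TaskRecord]] = {}

    def insert(self, t: int, record: TaskRecord) -> None:
        # Append the record to the list Task_t for slot t.
        self._slots.setdefault(t, []).append(record)

    def count(self, t: int) -> int:
        # K_t: number of tasks recorded in slot t.
        return len(self._slots.get(t, []))

    def unfinished(self, t: int) -> List[TaskRecord]:
        # Tasks in slot t whose completion flag is still unset.
        return [r for r in self._slots.get(t, []) if not r.done]
```

Keeping the user's ES and the hosting ES as separate fields mirrors the distinction the model draws between where the user is attached and where the SI actually runs.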

Migration Model
For each task $task_t^k$, we use the placement variable $X_t^k$ to record the ES on which the task is located in time slot $t$; each task is allocated to exactly one ES. The index of the ES hosting $task_t^k$ is denoted $\Psi_t^k$, and $\Psi_{t+1}^k$ denotes the location of the server where the task is located in the next time slot. Migration occurs if $|\Psi_{t+1}^k - \Psi_t^k| > 0$.
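One way to write this placement rule explicitly is sketched below. It assumes a binary indicator $x_{t,i}^k$, which is introduced here only for illustration and may differ from the authors' notation.

```latex
% Assumed notation: x_{t,i}^k = 1 if task k is hosted on ES s_i in slot t, else 0.
\begin{align}
  & x_{t,i}^k \in \{0, 1\}, \qquad \sum_{i=1}^{n} x_{t,i}^k = 1, \\
  & \Psi_t^k = \sum_{i=1}^{n} i \, x_{t,i}^k, \\
  & \text{migration in slot } t \iff \Psi_{t+1}^k \neq \Psi_t^k .
\end{align}
```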
Whenever a migration occurs, the SI of $task_t^k$ must be migrated via the communication link of the RANs from the current ES to a new ES for continued processing. It is crucial to transfer the SI's state context during migration, which includes user-specific information, intermediate processing results, and more. Before resuming the task, the new ES must synchronize the SI's state context and restore the task's progress. As depicted in Figure 2, the migration process from suspension to restoration involves three delays: the service suspension delay, the synchronization delay, and the service restoration delay. Thus, the total migration delay of the SI is the sum $h_t^k + w_t^k + r_{t+1}^k$, where $h_t^k$ represents the service suspension delay associated with the SI context of $task_t^k$, $w_t^k$ denotes the synchronization delay as the SI migrates from the source ES to the target ES, and $r_{t+1}^k$ is the service restoration delay for restoring the task to its state before suspension.
During the service suspension and restoration processes, both delays are contingent upon the remaining size of the task, the processing intensity required for either suspension or restoration, and the computing resources allocated to the task's SI. The service suspension delay $h_t^k$ is expressed in terms of $\rho_W^{sp}$, the suspension processing intensity of the application type $W$ to which $task_t^k$ belongs when the SI is paused (measured in CPU cycles per bit of state context); $V_W^m$, the context size of the application type $W$ (measured in MB); and $p_{\Psi_t^k}$, the computational resources allocated to the SI on $s_{\Psi_t^k}$ (measured in cycles per second). Likewise, the service restoration delay $r_{t+1}^k$ is expressed in terms of $\sigma_W^{sp}$, the restoration processing intensity of the application type to which $task_t^k$ belongs when the SI is restored, and $p_{\Psi_{t+1}^k}$, the computational resources allocated to the SI on $s_{\Psi_{t+1}^k}$ in the next time slot. The term $V_{t+1}^k$ is further discussed in Equation (14).
The synchronization delay $w_t^k$ is the transmission delay determined by the remaining data size of the task and the link bandwidth between the source server $s_{\Psi_t^k}$ and the target server $s_{\Psi_{t+1}^k}$.
In time slot $t$, we consider the migration of SIs into or out of ES $s_i$, where both service suspension and restoration demand computational resources. To avoid conflicts in resource usage, a period $\phi_{t,i}$ within each time slot is dedicated exclusively to these processes. It is expressed in terms of $C_{t,i}$, the average delay for SIs migrating into target ES $s_i$, and $V_{t,i}$, the average delay for SIs migrating out of the original ES $s_i$.
When the remaining time $Rst_t^k < 0$ for $task_t^k$ and the task is incomplete, the SI of $task_t^k$ must migrate to the cloud server. Given the cloud server's distance from the ES, the synchronization delay for this migration additionally includes $\mu$, the propagation delay. It is assumed that the cloud server possesses ample computational capacity to process the tasks and can return the results to the user in the subsequent time slot.
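To make the three delay components concrete, the following sketch computes a total migration delay under simple assumed forms: each processing delay is taken as intensity times context size divided by allocated resources, and the synchronization delay as remaining data divided by link bandwidth, plus a propagation term for cloud migration. These forms, the unit conversion, and all names are assumptions made for illustration and are not necessarily the exact expressions of the model.

```python
from dataclasses import dataclass

BITS_PER_MB = 8 * 10**6  # assumption: 1 MB = 10^6 bytes


@dataclass
class MigrationInputs:
    """Hypothetical container; field names are illustrative only."""
    context_mb: float            # V_W^m: context size of the application type
    remaining_mb: float          # remaining task data to synchronize
    rho_cycles_per_bit: float    # suspension processing intensity
    sigma_cycles_per_bit: float  # restoration processing intensity
    src_cycles_per_s: float      # resources allocated to the SI on the source ES
    dst_cycles_per_s: float      # resources allocated to the SI on the target ES
    link_bits_per_s: float       # bandwidth between source and target server
    propagation_s: float = 0.0   # mu: only non-zero when migrating to the cloud


def migration_delay(m: MigrationInputs) -> dict:
    """Return suspension, synchronization, restoration, and total delay in seconds."""
    context_bits = m.context_mb * BITS_PER_MB
    suspend = m.rho_cycles_per_bit * context_bits / m.src_cycles_per_s
    restore = m.sigma_cycles_per_bit * context_bits / m.dst_cycles_per_s
    sync = m.remaining_mb * BITS_PER_MB / m.link_bits_per_s + m.propagation_s
    return {
        "suspend": suspend,
        "sync": sync,
        "restore": restore,
        "total": suspend + sync + restore,
    }
```

The sketch makes visible why migration and allocation interact: a target ES that is closer to the user but grants fewer cycles per second lengthens the restoration delay.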

Computation and Communication Model
At the start of each time slot $t$, a user offloads a task of a specific application type to the associated ES $s_i$, where $loc_t^k = i$. By the Shannon formula, the maximum transmission rate for offloading the task is determined by $B$, the wireless channel bandwidth between the user and the RAN; $p_t^j$, the transmission power of user $u_j$ or the associated RAN node; $g_t$, the free-space path loss; and $N_0$, the noise power spectral density.
The uplink transmission delay of $task_t^k$ consists of two components: the delay for offloading the task to the nearest RAN node and the delay from this RAN node to ES $s_i$, which hosts the SI. Upon completion of $task_t^k$ after $r_t^k$ time slots, we define the resultant data size as $V^{R,k} = \omega_W^{ret} V_W$, where $V_W$ denotes the average task data size for the application type $W$ of $task_t^k$, and $\omega_W^{ret}$ represents the ratio of the resultant data size to $V_W$. The return delay of the task result is the sum of $T_{t+r_t^k}^{back,k}$, the delay for returning the task result to the user's associated ES, and $T_{t+r_t^k}^{down,k}$, the delay for transmitting the result to the user over the wireless channel. Results returned from the cloud are treated as a special case.
The computational delay of a task on an ES depends on the task size, the computational intensity required for processing tasks, and the allocated computational resources. After $task_t^k$ has been executed for a period of time in time slot $t$, its remaining size is updated according to $\kappa_W$, the computational intensity required to process $task_t^k$ of type $W$ (measured in CPU cycles per bit), and $p_{\Psi_t^k}$, the computational resources allocated to $task_t^k$ on $s_{\Psi_t^k}$ in time slot $t$. The remaining time of $task_t^k$ after the execution of the task in time slot $t$ is updated as $Rst_{t+1}^k = Rst_t^k - \tau$. The sum of the resources allocated to all tasks on $s_i$ must not exceed its maximum allocatable resources $p_i$ (constraint C2).
All mathematical symbols employed in this paper up to this point are systematically organized and presented in Table 1, following their order of introduction in the text.

$C_{t,i}$: The average migration delay for services migrating to server $s_i$
$V_{t,i}$: The average migration delay for services migrating out of server $s_i$
$\phi_{t,i}$: The average migration delay of server $s_i$
$\mu$: Propagation delay between ESs and cloud server
$\kappa_W$: The computational intensity of the task generated by application $W$
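The rate and remaining-size computations of this subsection can be sketched as follows. The rate uses the standard AWGN Shannon capacity with noise power taken as $N_0 B$, and the remaining-size update assumes that the work done in a slot equals the allocated cycles over the usable part of the slot divided by the computational intensity. Both forms and the function names are assumptions for illustration, not necessarily the exact equations of the model.

```python
import math


def shannon_rate(bandwidth_hz: float, tx_power_w: float,
                 channel_gain: float, noise_psd_w_per_hz: float) -> float:
    """Maximum rate in bit/s for an AWGN channel: B * log2(1 + p*g / (N0*B))."""
    snr = tx_power_w * channel_gain / (noise_psd_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def remaining_bits_after_slot(remaining_bits: float, cycles_per_s: float,
                              cycles_per_bit: float, slot_s: float,
                              reserved_s: float = 0.0) -> float:
    """Assumed update of the remaining task size after one slot.

    reserved_s stands for the part of the slot set aside for suspension and
    restoration (phi_{t,i}); only the rest of the slot is used for execution.
    """
    usable_s = max(slot_s - reserved_s, 0.0)
    processed_bits = cycles_per_s * usable_s / cycles_per_bit
    return max(remaining_bits - processed_bits, 0.0)
```

For example, with a 1 s slot of which 0.1 s is reserved for migration handling, a task allocated 2 GHz at 1000 cycles per bit would have 1.8 Mbit processed in that slot under these assumptions.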

Problem Formulation
With the above notation and modeling, we can obtain the response delay of each task. In this paper, we minimize the average response delay of tasks by optimizing the service migration and resource allocation policies. The combined challenge of optimizing service migration and resource allocation in the MECC environment is formulated as problem P.
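The problem statement above can be sketched in the following generic form. The symbols $T^{k}$ (response delay of task $k$), $N$ (total number of tasks), and $c_t^k$ (resources allocated to task $k$ in slot $t$) are placeholders introduced here for illustration, so this should be read as an assumed outline rather than the authors' exact formulation.

```latex
% Sketch only: T^{k}, N, and c_t^k are placeholder symbols.
\begin{align}
  \mathbf{P}: \quad & \min_{\{\Psi_t^k\},\, \{c_t^k\}} \; \frac{1}{N} \sum_{k=1}^{N} T^{k} \\
  \text{s.t.} \quad & \text{each task is hosted on exactly one ES in every slot}, \\
  & \sum_{k:\, \Psi_t^k = i} c_t^k \le p_i, \qquad \forall i, \; \forall t \quad \text{(C2)} .
\end{align}
```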

Proposed A2C-Based Algorithm
In this section, we detail our proposed Advantage Actor-Critic (A2C)-based dynamic migration and resource allocation approach. We discuss the transformation of problem P into an MDP and describe the A2C-based algorithm.

Problem Transformation
In MECC systems, the ultimate goal is to improve the long-term QoS for users by implementing service migration and resource allocation policies. Given that tasks in time slot $t + 1$ depend on the execution status of tasks in time slot $t$, and considering the dynamic nature, heterogeneity, and complexity due to cross-time-slot execution characteristics, we utilize reinforcement learning methods to address this problem.
Prior to applying reinforcement learning, we first transform the problem into an MDP, represented by the tuple $M = \{S, A, P, R, \gamma\}$. Here, $S$ denotes the state space, $A$ represents the action space, $P$ is the state transition probability, $R$ is the reward function, and $\gamma$ signifies the discount factor. Within this MDP framework, the agent perceives the state $s_t$ of the environment at time slot $t$, selects an action $a_t$ from the action space to execute migration and resource allocation, and receives a reward $r_t$. The environment then transitions to the next state $s_{t+1}$ based on $P$ [35]. Further details of the MDP are as follows.
State: For any time slot $t$, the system's state is represented by an array reflecting the real-time status of tasks in the MECC system. This array includes the remaining time for tasks, remaining task size, task type, the ES associated with the user, and the ES where the task is located, i.e., $s_t = [Rst_t^k, V_t^k, W, loc_t^k, \Psi_t^k]_{k=1}^{K_t}$. Each row represents the state of one task in the system, with a total of $K_t$ rows.
Action: In time slot $t$, the action for state $s_t$ is represented by an array that includes the migration decisions and the resource allocation strategy, i.e., $a_t = [\Psi_t^k, p_{\Psi_t^k}]_{k=1}^{K_t}$. Each row indicates the migration target server $s_{\Psi_t^k}$ for the task $task_t^k$ and the computing resources assigned to $task_t^k$ on that server.
Reward: Reinforcement learning methods are typically employed to maximize the long-term reward of the system. By converting the objective of minimizing the average delay of all tasks into minimizing their cumulative response times, we define the reward for an action $a_t$ as the negative value of the delay. Additionally, we assign specific negative rewards for tasks that fail to complete their upload to the cloud within the given time constraints.
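As a minimal sketch of how such a state array and reward could be assembled, the snippet below builds one row per task and computes a reward as the negative cumulative delay minus a fixed penalty per failed task. The penalty value, the function names, and the exact reward form are assumptions for illustration only.

```python
from typing import List, NamedTuple


class TaskState(NamedTuple):
    """One row of the state array (names are illustrative)."""
    remaining_time: float  # Rst_t^k
    remaining_size: float  # V_t^k
    task_type: int         # W
    user_es: int           # loc_t^k
    host_es: int           # Psi_t^k


def build_state(tasks: List[TaskState]) -> List[List[float]]:
    """Stack per-task rows into a K_t x 5 array of floats."""
    return [[float(v) for v in task] for task in tasks]


def step_reward(delays_s: List[float], n_failed: int,
                failure_penalty: float = 10.0) -> float:
    """Negative cumulative delay, minus an assumed fixed penalty per failed task."""
    return -sum(delays_s) - failure_penalty * n_failed
```

Because the number of rows $K_t$ changes from slot to slot, a practical implementation would typically pad or truncate the array to a fixed size before feeding it to a neural network.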

Dynamic Migration and Resource Allocation Algorithm Based on A2C
Reinforcement learning is a robust approach for addressing dynamic programming challenges. Recent advancements in this field integrate deep neural networks to represent both policy and value functions effectively. In this paper, we utilize the A2C method to optimize our objective. As illustrated in Figure 3, A2C not only refines the policy to select the optimal action for each state but also develops a value function that supports policy optimization. In DRL, for a given state $s_t$, the expected reward obtained by selecting action $a_t$ according to policy $\pi$ is defined as the action value $Q^{\pi}(s_t, a_t)$, expressed in terms of $R_t$, the expected sum of rewards under the policy $\pi$. $R_t$ is a discounted sum in which $\gamma$ is the discount factor used to discount future rewards. According to Bellman's equation, the action value $Q^{\pi}(s_t, a_t)$ can be re-expressed in terms of $V^{\pi}(s_{t+1})$, the expected reward obtained by following policy $\pi$, called the state-value function.
In the A2C framework, there are two primary components: the actor and the critic. The actor is responsible for maintaining a policy $\pi$, guiding action selection, and interacting with the environment. The critic, on the other hand, learns the state-value function based on rewards derived from the interactions between the actor and the environment, and aids the actor in policy updates.
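For reference, one standard way to write the quantities described above is sketched below. The exact notation, the infinite-horizon sum, and the conditioning are assumptions based on the usual actor-critic formulation, not a reproduction of the paper's own equations.

```latex
\begin{align}
Q^{\pi}(s_t, a_t) &= \mathbb{E}_{\pi}\left[ R_t \mid s_t, a_t \right], \\
R_t &= \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}, \\
Q^{\pi}(s_t, a_t) &= \mathbb{E}_{\pi}\left[ r_t + \gamma V^{\pi}(s_{t+1}) \mid s_t, a_t \right], \\
V^{\pi}(s_t) &= \mathbb{E}_{\pi}\left[ R_t \mid s_t \right].
\end{align}
```

The third line follows from the first two by splitting off the immediate reward $r_t$ and recognizing the remaining discounted sum as $\gamma R_{t+1}$.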
The actor employs a neural network to model the policy function, generating action probabilities from the observed state. Its objective is to identify the optimal policy that maximizes the expected reward in the environment, which defines the objective function $J(\pi)$. We use $\theta_a$ to denote the parameters of the actor's network and $\theta_c$ to denote the parameters of the critic's network. The function $J(\pi)$ can be rewritten in terms of the advantage function $A(s_t, a_t)$, which indicates whether the reward obtained by choosing action $a_t$ is higher than the average reward in $s_t$. The advantage function is defined in (27), the loss function of the actor's network in (28), and the resulting update of $\theta_a$ in (29), where $l_a$ is the learning rate of the actor network.
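A common A2C formulation consistent with these definitions is sketched below, with the last three lines playing the roles of (27), (28), and (29). Estimating the advantage with a one-step temporal-difference error and treating it as a constant when differentiating the actor loss are assumptions.

```latex
\begin{align}
\nabla_{\theta_a} J(\pi) &= \mathbb{E}_{\pi}\left[ \nabla_{\theta_a} \log \pi_{\theta_a}(a_t \mid s_t)\, A(s_t, a_t) \right], \\
A(s_t, a_t) &= Q^{\pi}(s_t, a_t) - V^{\pi}(s_t) \approx r_t + \gamma V_{\theta_c}(s_{t+1}) - V_{\theta_c}(s_t), \\
L(\theta_a) &= -\log \pi_{\theta_a}(a_t \mid s_t)\, A(s_t, a_t), \\
\theta_a &\leftarrow \theta_a - l_a \nabla_{\theta_a} L(\theta_a).
\end{align}
```

Minimizing $L(\theta_a)$ by gradient descent is equivalent to ascending the policy gradient in the first line, which is why the loss carries a negative sign.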
As with the actor network, we use the critic network to estimate the state-value function $V_{\theta_c}(s_t)$. Its loss function is given in (30) and the corresponding update of $\theta_c$ in (31), where $l_c$ is the learning rate of the critic network.
Algorithm 1 outlines the A2C-based approach for migrating services and allocating resources described in this paper. First, we initialize the algorithm parameters and the actor and critic networks, and set up the MECC system based on the input values (lines 1-4). At the beginning of each episode, the system's state is reset and transmitted to the agent (lines 6-8). The agent then interacts with the environment: it inputs the observed state into the actor network, which returns the corresponding action distribution (line 9). The agent samples an action from this distribution (line 10), executes it, and receives the resulting reward and the next state (lines 11-12). This process repeats until the predefined update frequency is met. When it is time to update the networks, the observed state at time step $t+1$ is fed into the critic network to obtain its estimate of the expected reward for that state (line 16). We then backtrack to time step $t - t_g + 1$, compute the advantage function using (27), and update the parameters of the actor and critic networks according to (29) and (31) (lines 18-21). The interaction with the environment continues until the algorithm terminates. Ultimately, the agent consistently produces actions that yield favorable rewards, indicating that the A2C-based approach has optimized the migration and resource allocation strategies in the MECC system. The system can then execute the trained policy for efficient management.
Input: The number of ESs $m$, the number of users $n$
Output: Trained policy with parameter $\theta_a$
1 Randomly set the initial weights for both the Actor and Critic networks;
2 Initialize the MECC system;
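The backtracking step of the update (lines 16-21 of Algorithm 1) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the bootstrapped multi-step return is one common way to compute the advantage over a window of $t_g$ steps, and the squared-error critic loss is an assumption about the form of (30).

```python
from typing import List, Sequence, Tuple


def backtrack_advantages(rewards: Sequence[float], values: Sequence[float],
                         bootstrap_value: float,
                         gamma: float = 0.9) -> Tuple[List[float], List[float]]:
    """Compute discounted returns and advantages for one update window.

    rewards[i] and values[i] hold r and V(s) for steps t - t_g + 1 ... t
    (oldest first); bootstrap_value is the critic's estimate V(s_{t+1}).
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk backwards from step t to step t - t_g + 1.
    for i in range(len(rewards) - 1, -1, -1):
        running = rewards[i] + gamma * running
        returns[i] = running
    advantages = [ret - val for ret, val in zip(returns, values)]
    return returns, advantages


def actor_loss(log_probs: Sequence[float], advantages: Sequence[float]) -> float:
    """Mean of -log pi(a_t | s_t) * A(s_t, a_t) over the window."""
    terms = [-lp * adv for lp, adv in zip(log_probs, advantages)]
    return sum(terms) / len(terms)


def critic_loss(advantages: Sequence[float]) -> float:
    """Mean squared advantage (assumed squared-error critic loss)."""
    return sum(adv * adv for adv in advantages) / len(advantages)
```

In a full training loop, these two losses would be differentiated with respect to $\theta_a$ and $\theta_c$ by an automatic differentiation library and applied with learning rates $l_a$ and $l_c$.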

Performance Evaluation
In this section, we conduct a series of simulations to evaluate the performance of the A2C algorithm against four alternative schemes across various environments. The results indicate that the proposed algorithm outperforms the comparative approaches in both average response delay and task failure rate.

Simulation Settings
In this simulation, we configure a randomly generated MECC system where the computing power and link bandwidth of each ES are randomly determined within specified parameter ranges. The system supports five types of applications, with parameters for each also randomly generated. The distance from the user to the RAN nodes is fixed at 10 m. Users can randomly select one of the five application types to generate tasks for offloading. The wireless communication bandwidth between the user and the RANs is set to $B = 10$ MHz, the transmission power to $p_t^j = 10$ dBm, the channel power at the reference distance of 1 m to −50 dB, and the Gaussian noise power spectral density to $N_0 = -170$ dBm/Hz.
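To give a feel for these settings, the following sketch estimates the uplink rate they imply. It assumes a Shannon-capacity rate and a distance-based path loss with exponent 2, neither of which is stated in this section; the function names and the `path_loss_exponent` parameter are hypothetical.

```python
import math


def dbm_to_watt(dbm: float) -> float:
    """Convert a power level in dBm to watts."""
    return 10 ** ((dbm - 30) / 10)


def db_to_linear(db: float) -> float:
    """Convert a gain in dB to a linear ratio."""
    return 10 ** (db / 10)


def uplink_rate_bps(bandwidth_hz: float = 10e6,
                    tx_power_dbm: float = 10.0,
                    ref_gain_db: float = -50.0,
                    distance_m: float = 10.0,
                    noise_psd_dbm_hz: float = -170.0,
                    path_loss_exponent: float = 2.0) -> float:
    """Shannon rate B * log2(1 + SNR) under an assumed path-loss model."""
    # Channel gain: reference gain at 1 m scaled by distance^(-exponent).
    gain = db_to_linear(ref_gain_db) * distance_m ** (-path_loss_exponent)
    signal_w = dbm_to_watt(tx_power_dbm) * gain
    # Total noise power is the spectral density times the bandwidth.
    noise_w = dbm_to_watt(noise_psd_dbm_hz) * bandwidth_hz
    return bandwidth_hz * math.log2(1 + signal_w / noise_w)


rate = uplink_rate_bps()
print(f"{rate / 1e6:.1f} Mbit/s")
```

Under these assumptions the signal-to-noise ratio is about 40 dB, so the uplink rate is on the order of 130 Mbit/s; a different path-loss exponent would change this figure.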
In our simulations, all deep neural networks (DNNs) are structured as four-layer fully connected networks, consisting of an input layer, two hidden layers, and an output layer. Each hidden layer in both the actor and critic networks contains 256 neurons. The learning rate for the networks is set to 0.001. As the objective is to minimize the average response delay across all tasks over a prolonged period, we set the discount factor for reinforcement learning to 0.9. Additional details on the experimental simulation parameters are provided in Table 2.
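As a rough illustration of how the size of such a network depends on the problem scale, the sketch below counts the trainable parameters of a fully connected network with two 256-neuron hidden layers. The function name and the example input and output dimensions are hypothetical, since the actual dimensions depend on how states and actions are encoded.

```python
def mlp_param_count(input_dim: int, output_dim: int, hidden: int = 256) -> int:
    """Count weights and biases of input -> hidden -> hidden -> output."""
    first = input_dim * hidden + hidden    # input layer to first hidden layer
    second = hidden * hidden + hidden      # first to second hidden layer
    last = hidden * output_dim + output_dim  # second hidden layer to output
    return first + second + last


# Hypothetical dimensions: a 50-value state vector and 20 action outputs.
print(mlp_param_count(50, 20))
```

Only the first and last terms grow with the number of users and ESs, while the hidden-to-hidden block stays fixed at 65,792 parameters.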

Comparison Experiments
To evaluate our solution in a dynamic migration and resource allocation environment, we benchmarked it against four distinct schemes.

• Follow-Avg scheme: This scheme targets the user's current location for migration if a task remains incomplete and there is residual time. It then allocates computing resources equally among all tasks on the same server.
• PSO scheme: In this scheme, migration targets and resource allocation decisions are treated as particles within a particle swarm optimization (PSO) algorithm, using average response delay as the fitness function. Decisions are made for each time slot state.
• PPO scheme: Proximal policy optimization (PPO), an online reinforcement learning method within DRL, is employed to determine service migration and resource allocation.
• DDPG scheme: Deep deterministic policy gradient (DDPG) utilizes the actor-critic framework of DRL to derive migration and resource allocation strategies.
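Of these baselines, Follow-Avg is simple enough to sketch directly. The code below is a minimal interpretation of its description, not the authors' implementation: the `Task` fields, the `follow_avg` function, and the `capacity` mapping from ES index to total computing resources are all hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Task:
    task_id: int
    remaining_size: float  # data still to be processed
    remaining_slots: int   # time slots left before the deadline
    user_es: int           # ES at the user's current location


def follow_avg(tasks: Sequence[Task],
               capacity: Dict[int, float]) -> Dict[int, Tuple[int, float]]:
    """Return {task_id: (target ES, allocated resources)} for active tasks."""
    # Migrate every incomplete task with residual time to the user's ES.
    targets = {
        task.task_id: task.user_es
        for task in tasks
        if task.remaining_size > 0 and task.remaining_slots > 0
    }
    # Split each server's capacity equally among the tasks placed on it.
    load = Counter(targets.values())
    return {tid: (es, capacity[es] / load[es]) for tid, es in targets.items()}
```

Because the equal split ignores how much work each task still needs, a nearly finished task receives the same share as a large one.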

Simulation Results
In all simulations, the parameters for the ESs were randomly generated within the ranges specified in Table 2. At each simulation step, new values were randomly selected from these ranges to update the system state. To assess performance, we calculated the average response delay and the average service failure rate across 10,000 episodes. Each episode consisted of 100 steps, during which we recorded the average service delay, the number of tasks generated by users, and the number of service timeouts.
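The two reported metrics can be aggregated from the per-step records as in the sketch below. The function name `summarize`, the record layout, and the choice to weight each step's average delay by its task count are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def summarize(records: Iterable[Tuple[float, int, int]]) -> Tuple[float, float]:
    """Aggregate (average delay, task count, timeout count) step records.

    Returns the task-weighted average response delay and the failure rate,
    defined here as timed-out tasks divided by generated tasks.
    """
    total_delay = 0.0
    total_tasks = 0
    total_timeouts = 0
    for avg_delay, num_tasks, num_timeouts in records:
        total_delay += avg_delay * num_tasks
        total_tasks += num_tasks
        total_timeouts += num_timeouts
    if total_tasks == 0:
        return 0.0, 0.0
    return total_delay / total_tasks, total_timeouts / total_tasks
```

For instance, two steps with four tasks each, average delays of 0.5 s and 1.0 s, and one timeout per step would be summarized by `summarize([(0.5, 4, 1), (1.0, 4, 1)])`.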
Impact of the number of ESs:
In this section, we investigate the effects of varying the number of ESs on the average service response delay and task failure rate. We fixed the number of users at 10, the duration of each time slot at 0.8 s, and the size of tasks generated by applications between 2 and 3 MB. Figure 4 shows the reduction in average response delay as the number of ESs increases from 4 to 8, incrementing by one each time. As the number of ESs grows, more computational resources become available, facilitating better task resource allocation and reducing service response times. The Follow-Avg, PPO, and DDPG methods exhibit only minor differences in response delays across the various numbers of ESs. Figure 5 demonstrates that our A2C-based approach achieves the lowest task failure rate, outperforming PSO, while PPO and DDPG record the highest failure rates. Under conditions of abundant resources, other methods experience failure rates as high as 20%, whereas our approach maintains a rate below 10%. Analyzing Figures 4 and 5 together, with six or more ESs present, the Follow-Avg scheme shows a lower failure rate than PSO but suffers from longer response delays. This difference stems from the system's dual focus on migration and resource allocation decisions. While abundant resources can mitigate response delays through efficient migration, leading to reduced failure rates, the generic resource allocation strategy of the Follow-Avg scheme, which does not account for task-specific requirements, results in prolonged average response delays. We conclude that the A2C-based algorithm outperforms the other four schemes, offering service migration and resource allocation policies that reduce both the average service response delay and the failure rate, particularly in unpredictable environments.

Impact of time constraints:
In our system, the time constraints for applications are defined by the number of time slots, so adjusting the duration of these slots changes the time constraints for each application. For this analysis, we fixed the number of ESs at five while keeping the number of users and task sizes consistent with the previous comparison. Figure 6 illustrates the average service response delays for time slot durations from 0.6 s to 1.0 s, in increments of 0.1 s. As shown in Figure 6, apart from PSO and our method, the differences in average response delay among the other three methods are minimal. Our method consistently achieves the lowest average service response delay. As the time constraints are relaxed under a constant computational workload, the average service response delay decreases. Shorter time slot durations impose stricter constraints on each task, increasing the number of tasks that fail to complete within the designated time. These tasks are then offloaded to the cloud for continuation, which introduces significant propagation delays during transmission and increases the average service response delay. According to Figure 7, as time constraints become less stringent, the task failure rate diminishes. Notably, under tighter time constraints, simple follow-migration methods exhibit higher failure rates, whereas our method consistently outperforms the others. With more lenient time constraints, our method maintains a task failure rate below 10%. These observations highlight the importance of performing dynamic migration and resource allocation according to the current state in dynamic environments.

Impact of the number of users:
The number of users in the system directly impacts the quantity of tasks, influencing the QoS for users. In this comparison, the number of ESs, task sizes, and time slot durations are consistent with previous experiments. We increment the number of users from 8 to 12, adjusting by one each time. As demonstrated in Figure 8, the average service response delay for all methods increases as the number of users rises, due to more intense competition for resources among tasks. Notably, when the user count reaches 11, the average response delays for our method and the PSO approach converge. However, our approach consistently maintains a lower average response delay across various scenarios compared to the other benchmarks. Furthermore, as shown in Figure 9, the PSO method requires significantly more time to make migration and resource allocation decisions than the other methods, which may be impractical in real-world settings. Our method, by contrast, reaches its decisions more efficiently. Figure 10 illustrates that task failure rates increase with the number of users. Despite this, our method outperforms the others even under intense resource competition, achieving failure rates approximately 10% lower than PSO and 30% lower than the Follow-Avg method.

Impact of task data size:
In this section, we compare the impact of different task sizes on the results by varying the task sizes generated by the applications while keeping the number of ESs, time slot duration, and number of users consistent with the previous experiments. We vary the application data size range from 2 ± 0.5 MB to 3 ± 0.5 MB, in increments of 0.25 MB. Figure 11 demonstrates the impact of different application data sizes on the decisions made by the various methods, with average response delay as the metric. As the task data size increases, the average response delay gradually rises. One reason is that the task's data size affects the migration delay, thereby reducing the time available for computation. Additionally, because the computational resources in the system remain unchanged, an increase in data size also contributes to an increase in average response delay. Figure 12 illustrates the variation in task failure rates under different data sizes. As the data size increases, the increase in failure rate becomes more pronounced for the Follow-Avg scheme. In the scenario with the largest data size, the failure rate of the Follow-Avg scheme exceeds that of the PPO and PSO schemes, while our approach ensures the lowest failure rate in all scenarios. Overall, our method achieves a lower response delay while maintaining a low failure rate.

Impact of network scale expansion:
To evaluate the effectiveness of the algorithm in a real-world scenario, we increased the number of users to 40 and the number of ESs to 20. As shown in Figure 13a, our method achieved the best performance in terms of average response delay, reducing it by 0.1 s compared to the other strategies. Additionally, Figure 13b shows that the average failure rate was reduced by 10% compared to the other methods. Overall, in a large-scale network environment, our method outperforms the other approaches in dynamic migration and resource allocation, maintaining lower average response delays and failure rates.

Evaluation of Algorithm Overhead
In our simulation, we measured the overhead of the algorithm under different network scales, including memory usage, the number of iterations, and the training time per iteration.
In the experiments, we used a 13th Gen Intel® Core™ i9-13900K CPU without utilizing a GPU. For network scales involving 10 and 40 users, the CPU usage for a single training run did not exceed 10%. As shown in Table 3, the differences in the number of training iterations and the time required for each iteration across different network scales were minimal. However, when the number of users increased to 40, memory usage rose by approximately 30 MB compared to when there were 10 users. This indicates that as the problem scale increases, the overhead of the algorithm increases only slightly. We also observed that the duration of each iteration increased from 0.24 s to 0.938 s, due to the expansion of the neural network size caused by the increase in users and edge servers, which in turn increased the computational load. To address this growth, GPU acceleration could be considered in practical applications to reduce the training time.

Conclusions and Future Work
In this paper, we investigate the dynamic migration and resource allocation issues in MECC systems. We emphasize the importance of service migration in networks characterized by dynamic features. Our investigation covers both the migration process and its performance, and we conduct a modeling analysis of computational performance post-migration. To tackle the challenges posed by dynamic computational demands and user mobility, we propose a method based on the Advantage Actor-Critic (A2C) framework. This method determines migration and resource allocation operations for each time slot, based on observed states, aiming to minimize the average task response delay. The simulation results demonstrate that our A2C-based approach consistently reduces the average task response delay across various scenarios and ensures the lowest task failure rate compared to the benchmark methods.
However, several aspects require further research. While our primary focus has been on the impact of migration on average response delay, the migration process also leads to additional network effects, including migration costs. Future research should explore strategies that simultaneously reduce migration costs and average response times. Additionally, our study mainly examines the impact of migration and computing resource allocation decisions. In real-world scenarios, however, the task offloading strategy significantly influences these migration and resource allocation strategies because communication conditions vary with user mobility. This interplay between the offloading strategy and system performance in realistic settings demands more detailed investigation to optimize both cost and efficiency. Another important issue is that the centralized decision-making process may pose a risk of user data leakage. Ensuring data security while minimizing the impact of privacy protection mechanisms on system decisions is therefore a critical challenge.

Figure 1. An example of an MECC environment.

Figure 2. An example of the migration process.

Wireless transmission rate of user j in time slot t
V W : Average task data size for application W
T tran,k t : The uplink transmission delay of task k t
ω W ret : The ratio of the result data size to the V W
T back,k t+r k : The delay in returning task results to the ES s loc t+r k t
T down,k t+r k : The delay in downlinking the task results to the user
κ W

Figure 3. Training of A2C-based dynamic migration and resource allocation algorithm.

Algorithm 1: Training of A2C-based dynamic migration and resource allocation algorithm.
18 Calculate the loss of the Actor by (28);
19 Update $\theta_a$ of the Actor network according to (29);
20 Calculate the loss of the Critic by (30);
21 Update $\theta_c$ of the Critic network according to (31);

Figure 4. The impact of the number of ESs on average response delay.

Figure 5. The impact of the number of ESs on failure rate.

Figure 6. The impact of the time constraint on average response delay.

Figure 7. The impact of the time constraint on failure rate.

Figure 8. The impact of the number of users on average response delay.

Figure 9.

Figure 10. The impact of the number of users on failure rate.

Figure 11. The impact of data size on average response delay.

Figure 12. The impact of data size on average failure rate.

Figure 13. The impact of network scale expansion in an environment with 40 users and 20 ESs. (a) Average response delay. (b) Average failure rate.
$[x^{1}_{k,t}, \ldots, x^{n}_{k,t}]$ to denote the allocation of task $k_t$ on ESs in time slot $t$, where $x^{n}_{k,t} = 1$ indicates that task $k_t$ is allocated to server $n$; otherwise, it signifies that task $k_t$ is not on server $n$. In time slot $t$, task $k_t$ needs to satisfy

Table 1. List of defined notation.

3 Set global counter $i$ and step counter $t = 1$;
4 Set $t_g$, $\gamma$, EP_num, $T$, learning rates $l_a$, $l_c$;
5 for episode $i \leftarrow 1$ to EP_num do
9 The Actor network takes $s_t$ as input and outputs the policy distribution of action $\pi(a_t \mid s_t)$;
10 Agent samples the action $a_t$ according to $\pi(a_t \mid s_t)$;
11 Execute the action $a_t$;
12 Obtain the reward $r_t$, get new state $s_{t+1}$;
13 if $t \,\%\, t_g == 0$ then // $t_g$ is the update frequency of the neural networks

Table 3. Evaluation of algorithm overhead.