Energy-Efficient Task Offloading in Wireless-Powered MEC: A Dynamic and Cooperative Approach

Abstract: Mobile Edge Computing (MEC) integrated with Wireless Power Transfer (WPT) is emerging as a promising solution to reduce task delays and extend the battery life of Mobile Devices (MDs). However, maximizing the long-term energy efficiency (EE) of a user-cooperative WPT-MEC system presents significant challenges due to uncertain load dynamics at the edge MD and the time-varying state of the wireless channel. In this paper, we propose an online control algorithm to maximize the long-term EE of a WPT-MEC system by making decisions on the time allocations and transmission powers of the MDs in a three-node network. We formulate a stochastic programming problem that accounts for the stability of the network queues and the time-coupled battery levels. By leveraging Dinkelbach's method, we transform the fractional optimization problem into a more manageable form and then use the Lyapunov optimization technique to decouple it into a deterministic optimization problem for each time slot. For the sub-problem in each time slot, we use variable substitution and convex optimization theory to convert the non-convex problem into a convex one, which can be solved efficiently. Extensive simulation results demonstrate that our proposed algorithm outperforms baseline algorithms, achieving a 20% improvement in energy efficiency. Moreover, our algorithm achieves an [O(1/V), O(V)] trade-off between EE and network queue stability.


Introduction
With the continuous evolution of communication technologies and the Internet of Things (IoT), a growing diversity of wireless device applications is observed across various sectors. These include autonomous navigation, virtual reality (VR), intelligent urban planning, and telemedicine procedures [1]. Such applications demand substantial computational resources and are highly sensitive to latency [2,3]. However, they are often constrained by the limited processing capabilities and finite battery life of mobile devices [4]. Mobile Edge Computing (MEC), a distributed computing paradigm that enhances the computational capabilities of networks, has emerged as a promising solution to address these constraints [5,6]. By decentralizing computational and storage resources to the edge of the network, in close proximity to end-users, MEC facilitates the offloading of computation-intensive tasks from wireless devices to nearby servers [7]. This approach not only conserves energy on mobile devices but also significantly reduces execution latency, thereby enhancing the performance and user experience of latency-sensitive applications [8].
However, the constraint of limited battery capacity in mobile devices poses a substantial challenge, particularly given the logistical difficulty of regularly replacing batteries in a vast number of devices. To address this challenge, Wireless Power Transfer (WPT) has been proposed as a sustainable energy solution [9]. In a Wireless Powered Mobile Edge Computing (WPT-MEC) network, a Hybrid Access Point (HAP) serves as the conduit for broadcasting Radio Frequency (RF) energy to wireless devices. Leveraging Energy Harvesting (EH) technology, edge devices can transduce the received RF signals into usable energy to recharge their batteries [10]. This harvested energy then enables the devices to accomplish computation tasks either locally or by offloading them to MEC servers. WPT not only extends the battery lifespan of devices but also significantly boosts their computational capabilities [11].
In a WPT-MEC network, Energy Efficiency (EE) is a critical performance metric, defined as the ratio of data processed to the energy consumed by the system [12,13]. For instance, Zhou and Hu [13] introduced two iterative optimization algorithms aiming to maximize the computational efficiency of the MEC system, considering both partial and binary offloading modes. Fei et al. [12] tackled the challenge of optimizing long-term energy efficiency in IRS-assisted multiuser wireless-powered Mobile Edge Computing (MEC) systems. They proposed an online task offloading algorithm that leverages the Lyapunov optimization method and the Penalty Dual Decomposition (PDD) technique. Sun et al. [14] leveraged the stochastic network optimization technique to design a task offloading scheme that optimizes the network EE in a device-to-device (D2D)-aided WPT-MEC system. However, these studies have not considered the double-near-far effect, which can significantly impact edge mobile nodes situated far from the HAP. When a mobile device (MD) is placed at a considerable distance from the HAP, the increased communication distance degrades the channel conditions and reduces the harvested energy. This degradation can in turn cause inefficient data transmission due to signal interference between MDs in close proximity and those farther away.
Cooperative computing schemes have been introduced to mitigate the impact of the double-near-far effect [4,9,15-17]. These schemes leverage relay technology, where devices closer to the AP act as relays that forward signals for devices situated farther away, thereby enhancing data rates under unfavorable channel conditions. For example, Li et al. [15] proposed a novel multi-user cooperation scheme to maximize the weighted sum computation rate. He et al. [4] tackled the user-cooperation problem of maximizing the EE for a WPT-MEC system integrated with backscatter communication. However, these works mainly focused on immediate network performance and usually assume that the load level at the edge node can be obtained a priori, neglecting the dynamics of task arrivals, battery levels, and time-varying wireless channels. In a volatile network environment, the dynamic allocation of resources and task offloading for edge and auxiliary nodes become significantly more challenging.
This study focuses on the problem of maximizing energy efficiency in a user-cooperation WPT-MEC network by jointly considering stochastic task arrivals and dynamic wireless channel variations. Furthermore, we equip both the MD and the helper node with a battery to store the harvested energy. The problem presents significant challenges in two main aspects. First, the unpredictability of task arrivals and the randomness of channel states on both the data transmission and wireless charging channels result in a stochastic optimization problem. Second, the time coupling among the wireless charging duration, task offloading at the edge node, and task processing at the helper node poses a great challenge in finding the optimal solution. To tackle this problem, we formulate it as a stochastic programming problem. By leveraging Dinkelbach's method [18] and the Lyapunov optimization technique [19], we transform the stochastic optimization problem into a deterministic problem for each slot. This problem, while non-convex, is then converted into a convex problem using variable substitution and convex optimization techniques. We propose an efficient online control algorithm, the Dynamic Offloading for User Cooperation Algorithm (DOUCA), which can be easily implemented and operates without prior knowledge of future system information.

Related Work
WPT-MEC has gained significant attention from researchers since it helps alleviate the energy limitations of IoT nodes while ensuring real-time performance of mobile applications [13,20,21]. Maraqa et al. [20] designed an energy-saving scheme for a multi-user NOMA-assisted cooperative terahertz single-input multiple-output (SIMO) MEC system that aims to maximize the total user computation energy efficiency (CEE). Ernest and Madhukumar [21] introduced an energy efficiency maximization algorithm based on multi-agent deep reinforcement learning. This algorithm enhances the computation offloading strategy to achieve maximum energy efficiency in MEC-supported vehicular networks. Shi et al. [9] addressed the practical nonlinear energy harvesting (EH) model by jointly optimizing factors such as the computation frequency, the execution time of MEC servers and IoT devices, the offloading time, the EH time, and the transmission power of each IoT device and power beacon (PB). Additionally, Park and Lim [22] considered a distributed sleep control method, which autonomously decides whether to enter sleep mode, thereby reducing energy consumption and improving energy efficiency.
To mitigate the double-near-far effect and fully utilize available resources, many researchers focus on user-cooperation-assisted WPT-MEC networks [4,11,16,23]. Su et al. [23] proposed an algorithm to maximize computational efficiency for a user cooperation (UC) and non-orthogonal multiple access (NOMA) WPT-MEC network, taking into account a nonlinear energy harvesting model. He et al. [4] demonstrated an innovative UC scheme integrating BackCom and AC to maximize user energy efficiency by leveraging a helper node as a relay between the source node and the HAP. Wang et al. [24] considered a user cooperation scheme for a WPT-assisted NOMA-MEC network to minimize the total energy consumption of the system, using the Lagrangian method to convert the non-convex optimization problem into a convex one. Wu et al. [25] proposed a novel multi-user cooperation scheme to maximize the weighted sum computation rate by considering partial task offloading and orthogonal frequency-division multiple access (OFDMA) communication technology. Due to the constraints of time-varying network environments, achieving a long-term stable optimal solution remains a significant challenge.
In volatile network environments, more and more research focuses on achieving long-term average system performance [1,17,26,27]. Zeng et al. [26] designed an online algorithm to minimize energy consumption based on the Lyapunov optimization framework and meta-heuristic methods. Luo et al. [27] proposed a deep reinforcement learning (DRL) algorithm to minimize long-term energy consumption and employed a concave-convex procedure (CCCP) algorithm to solve the computation and communication resource sub-problem for an MEC system with non-complete overlapping non-orthogonal multiple access (NCO-NOMA) technology. Liu et al. [28] used a dynamic optimization scheme based on queuing theory for a 5G MEC heterogeneous network with multiple energy-harvesting MDs, aiming to minimize the average execution delay of the system. Sun et al. [29] proposed a multi-agent reinforcement learning algorithm that combines federated learning and adopts a fine-grained training strategy to accelerate convergence in a dynamic community-based MEC environment. Ensuring Quality of Service (QoS) and adhering to Service Level Agreements (SLAs) are crucial for the performance and reliability of IoT applications. Seifhosseini et al. [5] proposed a multi-objective optimization genetic algorithm (MOGA) to address the inefficient deployment of IoT applications on fog infrastructure, aiming to minimize bandwidth wastage and power consumption while ensuring service reliability and QoS. They also introduced a multi-objective cost-aware discrete Grey Wolf Optimization-based algorithm (MoDGWA) for efficient task scheduling in IoT applications under the fog computing paradigm, addressing the challenge of balancing execution time, cost, and reliability to meet user SLAs and budget constraints.
Unlike the aforementioned studies, this paper addresses the challenges of task offloading and user cooperation in dynamic WPT-MEC network environments. Uncertainty in load dynamics and fluctuations in wireless channel conditions, factors often overlooked in previous research, significantly impact the energy efficiency and stability of WPT-MEC systems. Poor channel conditions can increase the energy consumption of IoT devices, and excessive task offloading during periods of good channel conditions may lead to latency [30]. Moreover, sudden surges in load can overload the system and degrade service quality. Concurrently, real-time networks are affected by changes in device location and network congestion [31,32], with time-varying channel states and the unpredictable task arrival process adding to the system's complexity. Therefore, we consider the dynamic arrival of tasks, time-varying wireless channel conditions, and the time-slot coupling of battery levels. Additionally, the time coupling between user cooperative communication and wireless charging, as well as the data offloading coupling in cooperative communication, further complicate the problem.

Motivations and Contributions
In this paper, we address the problem of long-term energy efficiency optimization in a user-cooperation WPT-MEC network by taking into account the uncertain load dynamics at the edge node and the time-varying wireless channel state, which have not been extensively investigated in the literature. The main contributions of our work are summarized as follows:
• We propose a dynamic task offloading model to optimize the energy efficiency of a WPT-MEC system with user cooperation. This model considers the randomness of task arrivals, fluctuating wireless channels, and dynamic battery levels. By extending the methodologies presented in [4,33] to accommodate volatile network environments, our model effectively balances system energy efficiency and queue stability, making it highly applicable to real-world scenarios.

• We design a low-complexity algorithm for long-term network energy efficiency maximization based on Lyapunov optimization theory. Leveraging the drift-plus-penalty technique, we decouple the stochastic programming problem into a non-convex deterministic optimization sub-problem for each time slot. By utilizing variable substitution and convex optimization theory, we transform the sub-problem into a convex problem with a small number of variables, enabling efficient solutions.

• We conduct extensive simulations to evaluate the effectiveness and practicality of our proposed algorithm, focusing on the impact of the control parameter V, network bandwidth, task arrival rate, and geographical distance on energy efficiency and network stability. The results demonstrate that our algorithm achieves 20% higher energy efficiency than baseline algorithms and exhibits a clear [O(1/V), O(V)] energy efficiency-stability trade-off.
The rest of the paper is organized as follows. Section 2 presents the system model of the user-cooperation WPT-MEC network and formulates a stochastic programming problem. In Section 3, we employ the Lyapunov optimization technique to solve the problem and propose an efficient online algorithm, accompanied by a theoretical performance analysis. In Section 4, simulation results are presented to evaluate the proposed algorithm. Finally, Section 5 concludes our work and discusses future directions.

System Model
As illustrated in Figure 1, the WPT-MEC system comprises two MDs and a HAP. One MD, situated at a considerable distance from the HAP, is burdened with a substantial computational workload. The other MD, in proximity to the HAP and in an idle state, acts as a helper. Both MDs operate on the same frequency band and are equipped with integrated batteries for energy storage. The HAP is fitted with an RF energy transmitter and an MEC server, which provide wireless energy and computation offloading services to edge nodes within the base station's coverage. To mitigate mutual interference, each MD employs a Time-Division Duplexing (TDD) [34] approach to alternate between communication and energy harvesting operations. We adopt a discrete time-slot model over a time horizon divided into T time blocks, each of duration τ. At the beginning of each time slot, both nodes harvest energy from the RF signals emitted by the HAP, which is then stored in their batteries to facilitate subsequent data transmission or local task execution. A partial offloading strategy is implemented, allowing for the flexible offloading of a portion or the entirety of the computational data to a remote device. Due to poor channel conditions between the distant MD and the HAP, exacerbated by the double near-far effect, direct offloading to the MEC server is infeasible. Consequently, the MD offloads computation data to the helper, which then relays it to the HAP. The helper processes the offloaded tasks or further offloads a segment to the HAP. Upon completion, the HAP returns the computation results to the MD, facilitated by the helper. The key notation and definitions are listed in Table 1.

Table 1. Notation and definitions.

Notation | Definition
P_0, P_m^t, P_h^t | The transmit power of the HAP, MD, and helper
d_m^loc(t) | The amount of tasks processed locally at the MD in slot t
d_m^off(t) | The amount of tasks offloaded to the helper by the MD in slot t
d_h^loc(t) | The amount of tasks processed locally at the helper in slot t
d_h^off(t) | The amount of tasks offloaded to the HAP by the helper in slot t
e_m^loc(t) | The energy consumed by processing tasks at the MD in slot t
e_m^off(t) | The energy consumed by offloading tasks at the MD in slot t
e_h^loc(t) | The energy consumed by processing tasks at the helper in slot t
e_h^off(t) | The energy consumed by offloading tasks at the helper in slot t
d_m(t) | The amount of tasks processed in slot t
e_m(t) | The energy consumed at the MD in slot t
e_h(t) | The energy consumed at the helper in slot t
f_m, f_h | The local CPU frequency at the MD and helper
ϕ_m, ϕ_h | The CPU cycles required to compute one bit of task data at the MD and helper
b_m^max, b_h^max | The maximum battery capacity
µ | The energy conversion efficiency
κ | The computing energy efficiency
W | The channel bandwidth
σ² | The additive white Gaussian noise power

Note that although our model considers only a network with two user nodes, it can be easily extended to scenarios with multiple user nodes. For instance, in such scenarios, a matching algorithm can be used to pair distant nodes with nearby nodes, effectively transforming the problem into multiple two-node user cooperation models.

Wireless Powered Model
The HAP is equipped with a reliable power source and is responsible for transmitting RF energy to the wireless devices dispersed within its service area. In the first part of each time slot, the HAP broadcasts wireless energy to the MD and the helper for an amount of time τ_0^t. Let e_m^eh(t) and e_h^eh(t) denote the energy harvested from the HAP by the MD and the helper at time slot t, respectively. We then have [35]

e_m^eh(t) = µ h_m^t τ_0^t P_0 (1)

and

e_h^eh(t) = µ h_h^t τ_0^t P_0 (2)

where 0 < µ < 1 represents the energy conversion efficiency [35], and P_0 denotes the RF energy transmit power of the HAP. h_m^t and h_h^t denote the channel gains between the MD and the HAP and between the helper and the HAP, respectively, which remain constant within a time slot and vary across time slots.
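As a quick illustration, the linear EH model in (1) and (2) can be sketched in a few lines of Python. The numeric values below (µ, channel gain, τ_0, P_0) are illustrative assumptions, not parameters from the paper.

```python
def harvested_energy(mu: float, h: float, tau0: float, p0: float) -> float:
    """Linear EH model of Eqs. (1)-(2): e_eh = mu * h * tau0 * P0."""
    return mu * h * tau0 * p0

# Illustrative values: mu = 0.8, channel gain 1e-3, tau0 = 0.3 s, P0 = 3 W
print(harvested_energy(0.8, 1e-3, 0.3, 3.0))  # 7.2e-4 J
```

Doubling either the harvesting duration τ_0 or the channel gain doubles the harvested energy, reflecting the linearity of this model.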

Task Offloading Model
As depicted in Figure 1, the task data arriving at the MD in the t-th time slot are denoted as A(t) ∈ [A_min, A_max]. It is assumed that A(t) follows an independent and identically distributed (i.i.d.) process across time slots, with an exponential distribution of mean λ, i.e., E[A(t)] = λ [35]. The computation task generated at time slot t is placed in the data queue Q at the MD, waiting to be processed in a First Come First Served (FCFS) manner. Let Q(t) denote the backlog of the data queue at slot t. The data queue update [36] can then be written as

Q(t + 1) = max[Q(t) − d_m(t), 0] + A(t) (3)

where d_m(t) = d_m^loc + d_m^off(t) denotes the total data processed at the MD in slot t, d_m^loc represents the task data executed by the MD locally, and d_m^off(t) denotes the task data offloaded to the helper by wireless transmission.
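The queue dynamics described above can be sketched as a one-line update; the example values are arbitrary.

```python
def queue_update(q: float, d_m: float, a: float) -> float:
    """Queue evolution: Q(t+1) = max[Q(t) - d_m(t), 0] + A(t)."""
    return max(q - d_m, 0.0) + a

# The backlog never goes negative, even when service exceeds the backlog.
print(queue_update(10.0, 4.0, 2.0))  # 8.0
print(queue_update(3.0, 5.0, 2.0))   # 2.0
```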
Let f_m denote the local CPU frequency at the MD, a constant value, and ϕ_m represent the CPU cycles required to compute one bit of task data at the MD. The raw data (in bits) processed locally at the MD in slot t is [37]

d_m^loc(t) = τ f_m / ϕ_m (4)

Note that d_m^loc(t) is a constant value, so we rewrite d_m^loc(t) as d_m^loc. Meanwhile, the corresponding energy consumed [38] for local computing in slot t is [37]

e_m^loc(t) = κ ϕ_m d_m^loc f_m² = κ τ f_m³ (5)

where κ > 0 denotes the computing energy efficiency parameter, which depends on the circuit structure. Here, we adopt the partial task offloading strategy, which means a portion of the task data is offloaded to the helper. Let P_m^t denote the transmit power of the MD, which is constrained by the maximum power P_m^t ≤ P_m^max, and let τ_1^t be the amount of offloading time for the MD. Thus, according to Shannon's theorem, the data offloaded to the helper can be expressed as [37]

d_m^off(t) = τ_1^t W log_2(1 + P_m^t g_s^t / σ²) (6)

where W denotes the channel bandwidth, g_s^t denotes the channel gain from the MD to the helper in slot t, and σ² is the additive white Gaussian noise power. Here, d_m^off(t) has an upper bound of τ_1^t W log_2(1 + P_m^max g_s^t / σ²). The corresponding energy consumption for task offloading is

e_m^off(t) = P_m^t τ_1^t (7)

The MD maintains an energy queue to store the energy harvested from the HAP for local computation and task offloading; the energy queue B_m evolves as follows [39]:

B_m(t + 1) = min[B_m(t) − e_m(t) + e_m^eh(t), b_m^max] (8)

where e_m(t) = e_m^loc(t) + e_m^off(t) represents the total energy consumption of the MD in slot t, and b_m^max represents the maximum battery capacity of the MD. Here, we adopt a minimum battery level of 0 for the sake of simplicity, but our model can easily be extended to handle a specified minimum battery threshold by changing the battery update formula.
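A minimal sketch of the MD-side model above, assuming the κτf³ local-energy form and the Shannon-rate offloading model; all numeric values in the example are illustrative, not the paper's parameters.

```python
import math

def local_data(tau, f_m, phi_m):
    """Bits processed locally in one slot of length tau: tau * f_m / phi_m."""
    return tau * f_m / phi_m

def local_energy(kappa, tau, f_m):
    """Local-computing energy, assuming the kappa * tau * f_m**3 form."""
    return kappa * tau * f_m ** 3

def offload_data(tau1, W, p_m, g_s, sigma2):
    """Shannon-rate offloading: tau1 * W * log2(1 + p*g/sigma2)."""
    return tau1 * W * math.log2(1.0 + p_m * g_s / sigma2)

def battery_update(b, e_used, e_harvested, b_max):
    """Battery evolution: B(t+1) = min[B(t) - e(t) + e_eh(t), b_max]."""
    return min(b - e_used + e_harvested, b_max)

# A higher transmit power never reduces the offloaded volume,
# and the battery level is capped at b_max.
print(offload_data(0.3, 1e6, 0.1, 1e-3, 1e-9) > offload_data(0.3, 1e6, 0.05, 1e-3, 1e-9))
print(battery_update(5.0, 1.0, 10.0, 8.0))  # 8.0 (capped)
```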

User Helper Model
We assume that the helper also adopts the partial offloading mode, meaning the helper can process the tasks offloaded from the MD while offloading part of the computation to the edge server. After the initial (τ_0^t + τ_1^t) time, when the offloaded task has arrived at the helper, the helper determines the transmission power P_h^t for offloading data to the edge server. Similar to the MD, the amount of locally computed task data and the corresponding energy consumption at the helper in slot t can be derived as [37]

d_h^loc(t) = τ f_h / ϕ_h (9)

e_h^loc(t) = κ τ f_h³ (10)

where f_h denotes the local CPU frequency at the helper, and ϕ_h represents the CPU cycles required to compute one bit of task data at the helper. The amount of data offloaded to the edge server and the corresponding energy consumption are [37]

d_h^off(t) = τ_2^t W log_2(1 + P_h^t g_h^t / σ²) (11)

e_h^off(t) = P_h^t τ_2^t (12)

where P_h^t ≤ P_h^max denotes the transmit power of the helper and g_h^t represents the channel gain between the helper and the HAP in slot t. Note that the helper must process the total task data offloaded from the MD in each slot t, so there is no data queue at the helper, and we have the following constraint:

d_m^off(t) = d_h^loc(t) + d_h^off(t) (13)

Similar to the MD, the helper also maintains an energy queue B_h to store the energy harvested from the HAP to support local computing and task offloading; the battery level of the helper updates as [37]

B_h(t + 1) = min[B_h(t) − e_h(t) + e_h^eh(t), b_h^max] (14)

where e_h(t) = e_h^loc(t) + e_h^off(t) represents the energy consumption of the helper in slot t, and b_h^max represents the maximum energy that can be stored in the battery of the helper.
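The helper's per-slot balance (relayed data must be fully computed locally or forwarded within the slot) can be checked with a small helper function; the tolerance parameter is our own addition for floating-point comparison, since the paper treats the constraint as an exact equality.

```python
def helper_split_feasible(d_off_m, d_loc_h, d_off_h, tol=1e-9):
    """Check the helper's balance constraint: the data offloaded by the MD
    must equal the helper's local computation plus its offload to the HAP."""
    return abs(d_off_m - (d_loc_h + d_off_h)) <= tol

print(helper_split_feasible(5.0, 2.0, 3.0))  # True
print(helper_split_feasible(5.0, 1.0, 3.0))  # False
```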

Network Stability and Utility
Definition 1 (Queue Stability). The task data queue is strongly stable [19] if

lim sup_{T→∞} (1/T) Σ_{t=0}^{T−1} E[Q(t)] < ∞ (15)

The length of the task queue in our problem reflects the processing latency of the tasks. A stable queue ensures that all tasks can be completed within finite time slots, thereby guaranteeing the quality of service (QoS) of the system. The method of evaluating QoS using queues is also utilized in [40]. The network utility here is defined as the ratio of the total achieved computation data to the total energy consumption. The total accomplished computation data and the total energy consumption of the user-assisted network in slot t can be expressed, respectively, as

D_tot(t) = d_m(t) = d_m^loc + d_m^off(t) (16)

E_tot(t) = e_m(t) + e_h(t) (17)

The EE of the network is defined as the time-average computation data achieved per unit of energy consumption, i.e., the ratio of the long-term processed data to the total energy consumption, as follows [14]:

η_EE = lim_{T→∞} [Σ_{t=0}^{T−1} E[D_tot(t)]] / [Σ_{t=0}^{T−1} E[E_tot(t)]]
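A minimal sketch of the EE metric over a finite trace: the ratio of cumulative processed data to cumulative consumed energy. The sample values are arbitrary.

```python
def time_average_ee(d_hist, e_hist):
    """Long-term EE estimate: cumulative processed bits over cumulative energy."""
    return sum(d_hist) / sum(e_hist)

# Two slots processing 2 and 4 units of data at 1 unit of energy each.
print(time_average_ee([2.0, 4.0], [1.0, 1.0]))  # 3.0
```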

Problem Formulation
In this paper, our objective is to design a dynamic control algorithm that maximizes the time-average network EE for a user-assisted WPT-MEC system under the constraint of network stability. In each time slot, we make decisions regarding the time allocation for WPT, the task offloading time to the helper, the task processing time at the helper, and the transmit powers of the MD and helper, without knowing future channels and data arrivals. Let τ(t) = [τ_0^t, τ_1^t, τ_2^t] denote the time allocation in slot t and P(t) = [P_m^t, P_h^t] denote the transmit powers of the MD and helper. The problem can be formulated as a multi-stage stochastic optimization problem as follows:

P0: max_{τ(t), P(t)} η_EE (18a)
s.t. τ_0^t + τ_1^t + τ_2^t ≤ τ, ∀t (18b)
e_m(t) ≤ B_m(t), ∀t (18c)
e_h(t) ≤ B_h(t), ∀t (18d)
lim sup_{T→∞} (1/T) Σ_{t=0}^{T−1} E[Q(t)] < ∞ (18e)
d_m(t) ≤ Q(t), ∀t (18f)
d_m^off(t) = d_h^loc(t) + d_h^off(t), ∀t (18g)

Constraint (18b) represents the time allocation constraint. Constraints (18c) and (18d) correspond to the battery energy constraints of the MD and helper, respectively, indicating that the battery energy level must remain non-negative. Constraint (18e) ensures the network stability of the system. Constraint (18f) defines the upper bound on the data processed in slot t, meaning that the amount of data processed cannot exceed the current length of the data queue Q. Constraint (18g) ensures that the data offloaded to the helper are completely processed within the current time slot.
Problem P0 is a fractional optimization problem, which is typically non-convex. To handle this, we first utilize the Dinkelbach method [41] to transform P0 into a more manageable form, similar to [42]. We denote the optimal value of η_EE as η_EE^opt.

Theorem 1. Problem P0 achieves its optimal value η_EE^opt if and only if

max_{τ(t), P(t)} lim_{T→∞} (1/T) Σ_{t=0}^{T−1} E[D_tot(t) − η_EE^opt E_tot(t)] = 0 (19)

Proof. For brevity, we omit the proof details here; see Proposition 3.1 of [43].
Since η_EE^opt is unknown during the solution process, (18) is still infeasible to tackle directly. In accordance with the methodology employed in [44], we introduce a new parameter u(t), defined as the EE achieved up to slot t:

u(t) = [Σ_{i=0}^{t−1} D_tot(i)] / [Σ_{i=0}^{t−1} E_tot(i)] (20)

We set u(0) = 0 at the beginning of the problem. Replacing η_EE in (18) with u(t), the problem can be transformed into

P1: max_{τ(t), P(t)} lim_{T→∞} (1/T) Σ_{t=0}^{T−1} E[D_tot(t) − u(t) E_tot(t)]
s.t. (18b)-(18g)

where u(t) is a given parameter that is updated during the resolution process.
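The running update of u(t) in (20) can be sketched as a small accumulator; the class name and sample values below are our own, not the paper's.

```python
class RunningEE:
    """Tracks u(t): the EE achieved up to (but not including) slot t, u(0) = 0."""
    def __init__(self):
        self.d_sum = 0.0  # cumulative processed data
        self.e_sum = 0.0  # cumulative consumed energy

    def u(self):
        # u(0) = 0 by convention; afterwards the running ratio.
        return self.d_sum / self.e_sum if self.e_sum > 0 else 0.0

    def record(self, d_tot, e_tot):
        self.d_sum += d_tot
        self.e_sum += e_tot

ee = RunningEE()
print(ee.u())        # 0.0
ee.record(4.0, 2.0)  # one slot: 4 units of data, 2 units of energy
print(ee.u())        # 2.0
```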
It should be noted that u(t) obtained by (20) approaches η_EE^opt as time goes by [14]. Therefore, this transformation is reasonable and has the same optimal solution as P0. Although problem P1 is easier to solve than problem P0, it still presents the following challenges: (1) Constraints (18c) and (18d), together with Equations (8) and (14), couple the battery levels across different time slots over the entire period, meaning that current energy consumption affects future battery levels; (2) the stochastic arrival of user tasks and the dynamic variations in channel states are difficult to predict accurately, which introduces temporal coupling into decision-making. These challenges significantly complicate the problem-solving process.

Algorithm Design
For problem P1, even with full knowledge of system information for all future time slots, solving it using dynamic programming or similar algorithms would be infeasible due to the curse of dimensionality arising from the vast decision space. Therefore, this paper seeks a dynamic control method that does not require predicting future system information. Our proposed algorithm makes decisions based solely on the current time slot's system state, without needing any prior system information. First, to decouple the battery energy levels across time slots and ensure the stability of the task queue, we leverage the Lyapunov network optimization technique to transform the long-term average problem into a deterministic optimization problem for each time slot.

Lyapunov Optimization Formulation
To simplify the battery energy queues at the MD and the helper, we introduce two virtual queues derived from B_m(t) and B_h(t). Following the Lyapunov optimization framework, we define a combined queue vector Θ(t) ≜ [Q(t), B_m(t), B_h(t)] and the quadratic Lyapunov function

L(Θ(t)) = (1/2)[Q(t)² + B_m(t)² + B_h(t)²]

Then, we obtain the one-slot conditional Lyapunov drift [45] as

∆(Θ(t)) = E[L(Θ(t + 1)) − L(Θ(t)) | Θ(t)]

Note that L(Θ(t)) represents the congestion of all queues Q(t), B_m(t), and B_h(t). According to Lyapunov optimization theory [45], we derive the one-slot drift-plus-penalty expression as

∆_V(Θ(t)) = ∆(Θ(t)) − V E[D_tot(t) − u(t) E_tot(t) | Θ(t)]

where the control parameter V is a positive value used to balance the trade-off between network EE and network stability. In effect, V acts as a weighting factor on cost optimality in the drift-plus-penalty expression. Increasing the value of V causes the algorithm to focus more on network EE, which may also result in a larger backlog of the task queue Q. We derive an upper bound on ∆_V(Θ(t)) in Lemma 1.
Lemma 1. For any control strategy τ(t), P(t) at each time slot t, the one-slot Lyapunov drift-plus-penalty ∆_V(Θ(t)) is bounded by the following inequality:

∆_V(Θ(t)) ≤ B + E[Q(t)(A(t) − d_m(t)) | Θ(t)] + E[B_m(t)(e_m^eh(t) − e_m(t)) | Θ(t)] + E[B_h(t)(e_h^eh(t) − e_h(t)) | Θ(t)] − V E[D_tot(t) − u(t) E_tot(t) | Θ(t)] (25)

where B is a constant that satisfies the following ∀t:

B ≥ (1/2) E[A(t)² + d_m(t)² + (e_m^eh(t) − e_m(t))² + (e_h^eh(t) − e_h(t))² | Θ(t)] (26)

Proof. By using the inequality (max[a − b, 0] + c)² ≤ a² + b² + c² + 2a(c − b), ∀a, b, c ≥ 0, and combining Equation (3), we have

Q(t + 1)² − Q(t)² ≤ d_m(t)² + A(t)² + 2Q(t)(A(t) − d_m(t)) (27)

Based on the definitions of the battery energy queues B_m(t) and B_h(t), we have

B_m(t + 1)² − B_m(t)² ≤ (e_m^eh(t) − e_m(t))² + 2B_m(t)(e_m^eh(t) − e_m(t)) (28)

B_h(t + 1)² − B_h(t)² ≤ (e_h^eh(t) − e_h(t))² + 2B_h(t)(e_h^eh(t) − e_h(t)) (29)

Combining inequalities (27)-(29), we obtain the upper bound of the Lyapunov drift-plus-penalty.
According to the drift-plus-penalty technique in Lyapunov optimization theory [45], we greedily minimize the upper bound of ∆_V(Θ(t)) at each time slot t, which yields a close-to-optimal solution of problem P2. We therefore transform problem P2 into a minimization of the RHS (right-hand side) of (25). Note that A(t), Q(t), B_m(t), and B_h(t) can be observed at the beginning of each slot t, so the optimization problem can be solved slot by slot; we denote the resulting one-slot problem as P2.1.

Problem P2.1 is non-convex and cannot easily be solved by classic convex optimization methods, because the transmit powers and time allocations are coupled in the bilinear terms P_m^t τ_1^t and P_h^t τ_2^t. To address this issue, we first introduce the auxiliary variables φ_1 = P_m^t τ_1^t and φ_2 = P_h^t τ_2^t, which simplifies P2.1 into problem P2.2. Problem P2.2 is still non-convex due to the non-convex constraint (31f): both sides of the constraint are concave, which does not satisfy the conditions for convex constraints. We introduce an auxiliary variable ψ to replace the concave function s_1 in (31f). Bringing in ψ and constraint (32), problem P2.2 can be transformed into problem P3; ψ equals s_1 when P3 reaches its optimal solution, which is consistent with P2. P3 is a convex optimization problem, which can be solved efficiently by convex optimization tools such as CVX [46].
Step 2. In problem P3, the objective function (33a) is linear with respect to all variables. Constraints (33b)-(33d) and (33g) are all linear inequality constraints. Moreover, for constraint (33e), s_2 = τ_2^t W log_2(1 + φ_2 g_h^t / (τ_2^t σ²)) is the perspective of a concave function; since the perspective operation preserves convexity [47], s_2 is concave with respect to φ_2 and τ_2^t. It is obvious that a_4 τ_2^t and ψ are linear functions. Thus, (33e) is a convex constraint. For the same reason, s_1 = τ_1^t W log_2(1 + φ_1 g_s^t / (τ_1^t σ²)) is concave with respect to φ_1 and τ_1^t, so (33f) is a convex constraint. Thus, P3 is proved to be convex.
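The perspective-function concavity used in this step can be spot-checked numerically. The function below is a sketch of s_2 (and, symmetrically, s_1) with the unit parameters W = g = σ² = 1 chosen purely for illustration.

```python
import math

def s(phi, tau, W=1.0, g=1.0, sigma2=1.0):
    """Perspective-form rate tau * W * log2(1 + phi*g/(tau*sigma2)),
    jointly concave in (phi, tau) for tau > 0."""
    return tau * W * math.log2(1.0 + phi * g / (tau * sigma2))

# Concavity spot-check: the midpoint value dominates the chord average.
p, q = (0.2, 0.5), (2.0, 1.5)
mid = s((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
avg = (s(*p) + s(*q)) / 2
print(mid >= avg)  # True
```

A single midpoint check is of course not a proof, but it is a useful sanity test when modeling such constraints in a DCP-based solver.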
According to Lemma 2, the original problem is transformed into a convex problem, which achieves the global optimal solution. Thus, at each time slot, we only need to solve a convex problem, P3, which contains a small number of variables. By doing so, we can achieve the optimal long-term average EE, even without knowledge of future system information. Our proposed algorithm, the Dynamic Offloading for User Cooperation Algorithm (DOUCA), is summarized as Algorithm 1.
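To give intuition for the greedy per-slot rule that DOUCA applies, the toy single-queue simulation below substitutes a made-up quadratic cost for the paper's model; it only illustrates how per-slot drift-plus-penalty minimization keeps the backlog bounded, and none of its numbers come from the paper.

```python
import random

# Toy drift-plus-penalty controller: each slot, pick the service rate d
# minimizing -Q(t)*d + (V/4)*d**2 - V*d (an illustrative penalty), whose
# closed-form minimizer over [0, 3] is d = min(3, 2*(Q + V)/V).
random.seed(0)
V, q = 10.0, 0.0
for t in range(1000):
    a = random.uniform(0.0, 2.0)               # stochastic task arrivals
    d = min(3.0, max(0.0, 2.0 * (q + V) / V))  # greedy per-slot decision
    q = max(q - d, 0.0) + a                    # queue update
print(q < 50.0)  # True: the greedy rule keeps the backlog bounded
```

Since the chosen service rate always exceeds the maximum arrival rate here, the backlog stays small regardless of the random arrivals, mirroring the stability guarantee of the drift-plus-penalty framework.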

Algorithm Complexity Analysis
At each time slot, we are required to solve a simple convex optimization problem, P3, which contains only five decision variables. This can be solved efficiently using mature methods such as the interior-point method, whose computational complexity is approximately O(n^3.5 log(1/ε)), where n is the number of decision variables. In our case, we solve P3 efficiently using CVX.

Algorithm Performance Analysis
In this section, we show that the proposed scheme achieves the optimal long-term time-average solution. First, we assume that the pathwise time averages of the processed data and the consumed energy converge with probability 1:

lim_{T→∞} (1/T) Σ_{t=0}^{T−1} D_tot(t) = D̄, lim_{T→∞} (1/T) Σ_{t=0}^{T−1} E_tot(t) = Ē (35), (36)

Then, the corresponding expectations converge to the same limits:

lim_{T→∞} (1/T) Σ_{t=0}^{T−1} E[D_tot(t)] = D̄, lim_{T→∞} (1/T) Σ_{t=0}^{T−1} E[E_tot(t)] = Ē (37)

Lemma 3. Based on (35)-(37), we have lim_{t→∞} u(t) = D̄/Ē.

To start with, we give the existence of an optimal stationary policy that is independent of the current queue status.

Lemma 4. If problem P1 is feasible, there exists a policy {τ(t), P(t)}* that satisfies the following conditions ∀t, for some ε > 0:

E[d_m*(t)] ≥ E[A(t)] + ε (42)

E[e_m*(t)] ≤ E[e_m^eh,*(t)] (43)

E[e_h*(t)] ≤ E[e_h^eh,*(t)] (44)

where * represents the value under the optimal solution.
Theorem 2. The long-term average utility obtained for P1 is bounded below by a constant that is independent of the time horizon. Moreover, under the proposed algorithm, all queues Q(t), B_m(t), and B_h(t) are mean rate stable; thus, the corresponding constraints are satisfied.
Proof. For any ε > 0, consider the policy and queue states in (42)-(44). Since the resulting values are independent of the queue status Θ(t), substituting them into (25) and letting ε → 0 bounds the drift-plus-penalty term by a constant that is independent of the current queue status Θ(t). Taking iterated expectations and summing the resulting inequality over t ∈ {0, 1, ..., T − 1}, then dividing both sides of (63) by VT and applying Jensen's inequality together with the fact that E{L[Θ(T)]} ≥ 0, and finally letting T → ∞, we obtain the claimed lower bound. □

Proof of Theorem 3. Taking iterated expectations and using a telescoping sum over t ∈ {0, 1, ..., T − 1}, then dividing both sides of (69) by Tε, letting T → ∞, and rearranging terms yields the stated queue-length bound. □

Theorems 2 and 3 provide a rigorous performance analysis of our proposed algorithm: the time-average η_EE approaches the optimum at rate O(1/V), while the queue length grows at rate O(V). The system EE η_EE can therefore be improved by increasing V; however, the time-average task queue Q also grows with V, so we can tune V to achieve an [O(1/V), O(V)] trade-off between EE and task queue length. By Little's law [44], latency is proportional to the time-average task queue length, which implies that our algorithm also achieves an EE-latency trade-off. This balance is critical in real-world applications where both efficiency and response time matter.
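In generic form, the two bounds behind the [O(1/V), O(V)] trade-off read as follows, where C and ε are constants from the drift analysis and η_EE^min is a lower bound on the achievable EE; the paper's exact constants follow from (35)-(37) and (42)-(44):

```latex
% Generic Lyapunov drift-plus-penalty bounds (constants illustrative):
\eta_{EE}^{\mathrm{opt}} - \bar{\eta}_{EE} \;\le\; \frac{C}{V},
\qquad
\limsup_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\{Q(t)\}
\;\le\; \frac{C + V\left(\eta_{EE}^{\mathrm{opt}} - \eta_{EE}^{\min}\right)}{\varepsilon}.
```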

Simulation Results
In this section, we conduct numerical simulations to evaluate the performance of our proposed algorithm. The experiments were executed on a platform equipped with an Intel(R) Xeon(R) Silver 4116 CPU (Intel Corporation, Santa Clara, CA, USA) operating at 2.10 GHz with 48 cores, supplemented by four GeForce RTX 2080 Ti GPUs (NVIDIA Corporation, Santa Clara, CA, USA). We used Python 3.12 with the convex optimization library CVXPY 1.5, running on Windows 10 in PyCharm Professional Edition 2024.1.4. We employ a free-space path-loss channel model for the averaged channel gain h̄. For simplicity, we assume a^t = [1.0, 1.0, 1.0, 1.0] and that the channel gains remain constant within a single slot. The interval between task arrivals follows an exponential distribution with a constant average rate λ. The remaining parameters are set similarly to [35] and listed in Table 2.
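For reference, a minimal sketch of the free-space path-loss computation, using the stated carrier frequency f_c = 915 MHz and path-loss exponent d_e = 3; the antenna gain value A_d = 4.11 and the 140 m distance are illustrative assumptions, not values from the paper:

```python
import math

# Averaged free-space path-loss channel gain:
#   h_bar = A_d * (c / (4 * pi * f_c * d_i)) ** d_e
# A_d = 4.11 is an assumed illustrative antenna gain, not taken from the paper.
def avg_channel_gain(d_i, f_c=915e6, d_e=3, A_d=4.11, c=3e8):
    return A_d * (c / (4 * math.pi * f_c * d_i)) ** d_e

h_bar = avg_channel_gain(d_i=140.0)  # e.g., an MD-helper distance of 140 m
h_t = 1.0 * h_bar                    # with a_t = 1.0, the slot gain equals h_bar
print(f"{h_bar:.3e}")
```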
As shown in Figure 2, the EE exhibits significant fluctuations during the initial phase, but the curve gradually stabilizes and converges as time goes on. Meanwhile, the task queue length Q decreases as time slot t increases and becomes stable over time, demonstrating the effectiveness of our proposed algorithm. Furthermore, we observe that a larger control parameter V results in higher EE; however, the average queue length also increases accordingly, which is consistent with the theoretical analysis of the algorithm. Figure 3 demonstrates the influence of the control parameter V on EE and average task queue length. As V increases from 10 to 100, the EE rises from 1.95 × 10^7 bits/joule to 2.28 × 10^7 bits/joule, while the queue backlog Q grows from 0.5 × 10^8 bits to 2 × 10^8 bits. This trend signifies that both EE and queue length grow with increasing V: with a larger V, our algorithm focuses more on optimizing EE and pays less attention to network queue stability. Here, V acts as a knob to balance the trade-off between EE and network queue stability.
In Figure 4, we evaluate the impact of network bandwidth W on system performance under V = 50. As the network bandwidth W increases from 0.85 × 10^6 Hz to 1.35 × 10^6 Hz, the EE rises from 1.7 × 10^7 bits/joule to 2.8 × 10^7 bits/joule, while the task queue length Q decreases from 3.6 × 10^8 bits to 0.25 × 10^8 bits. This is because the increased bandwidth speeds up task-data upload, enabling more tasks to be offloaded to the helper and HAP. Consequently, the total amount of task data processed also increases, raising energy efficiency, while the higher data rate reduces the task queue length. Moreover, Figure 4 shows that when the network bandwidth is below 1 MHz, variations in bandwidth have a more significant impact on the task queue length than on EE; conversely, above 1 MHz, bandwidth variations have a greater impact on EE. This analysis indicates that appropriately increasing the network bandwidth can significantly enhance system performance.
In Figure 5, we evaluate the impact of task arrival rate λ on system performance when V = 50. As observed in Figure 5, an increase in task load corresponds to a decrease in EE, while the task queue length Q exhibits an increasing trend, which is consistent with real-world expectations. The reason is that as the task arrival load increases, the processing capacity of both the MD and the helper remains unchanged, causing task data to accumulate in the MD's task queue. Furthermore, Figure 5 indicates that when λ exceeds 2.3 × 10^6 bits/s, the queue length Q increases rapidly, while energy efficiency continues to decrease linearly. This implies that excessive load can negatively impact system performance; consequently, it is crucial to either expand the bandwidth or improve the processing capacity of the MD.
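A back-of-the-envelope check via Little's law connects these queue lengths to latency; the operating point below (queue backlog at V = 100 and λ = 2.2 Mbps) is read off the reported curves purely for illustration:

```python
# Little's law: average delay = average queue length / average arrival rate.
# Values are illustrative, taken from the reported operating points.
avg_queue_bits = 2.0e8   # time-average backlog at V = 100 (from Figure 3)
arrival_rate = 2.2e6     # task arrival rate lambda, in bits/s
avg_delay_s = avg_queue_bits / arrival_rate
print(f"average task delay ~ {avg_delay_s:.1f} s")
```

This also makes the EE-latency trade-off tangible: halving V roughly halves the backlog and hence the delay, at the cost of some EE.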

Comparing with Baseline Algorithms
To evaluate the performance of our algorithm, we consider the following three representative benchmarks:
1. Edge computing scheme: the MD performs no local computation and offloads all tasks to the helper and HAP.
2. Random offloading scheme: the MD randomly selects part of the tasks to offload to the helper and HAP.
3. Equalized time allocation scheme: task offloading time is allocated equally to the MD and helper, i.e., τ_1 = τ_2 in our model.

For fairness, network queue stability must be maintained across all methods; therefore, the three baseline approaches are implemented within the Lyapunov optimization framework.
In Figure 6, we evaluate the performance of our algorithm and the three baseline algorithms over a total of 3000 time slots under V = 50. All algorithms converge after 1000 time slots. Our algorithm achieves the best EE, followed by the random offloading approach, with the equalized time allocation scheme ranking third and the edge computing method performing worst; our algorithm outperforms the other three by 10%, 10%, and 20%, respectively. This superior performance can be attributed to our algorithm's joint consideration of the relationship among charging time, offloading time, and the helper's cooperative time, as well as its use of both the local computing resources of the MD and the computing resources of the edge server. The edge computing method, which offloads all tasks to the edge server, considers only edge computing resources and overlooks the computing resources of the MD, resulting in inferior performance. The random offloading algorithm, which utilizes both edge-server and local resources, ranks second. The equalized time allocation method ignores the priorities of the MD and helper when offloading tasks, leading to an inefficient offloading process and thus reduced overall system performance.

Figure 7 shows the impact of the network bandwidth, ranging from 0.85 × 10^6 Hz to 1.35 × 10^6 Hz, on network performance across different algorithms. As can be seen from the figure, the EE achieved by all schemes increases with the network bandwidth. This is because all schemes utilize edge computing resources for task computation, and a larger bandwidth means more tasks can be uploaded to the edge server, demonstrating the significant influence of network bandwidth on mobile edge networks. Furthermore, our algorithm consistently achieves the best EE across all bandwidths, and its superiority is most evident when the network bandwidth is around 1.0 MHz, where it far exceeds the other baseline algorithms.

In Figure 8, we evaluate the system performance under varying distances between the MD and helper for all algorithms, with the distance ranging from 130 m to 148 m. The EE of all algorithms decreases as the distance increases, and our algorithm achieves the best EE across all distances. When the distance d = 148 m, our algorithm improves performance by 17%, 17%, and 31% compared to the other three algorithms, demonstrating that it utilizes network and edge computing resources more effectively. Additionally, Figure 8 shows that as the distance increases, the EE advantage of our algorithm tends to decline. This suggests that in practical deployments, the distance between edge node devices and relays should not be too large, as it can lead to a rapid decline in network performance.

Conclusions and Future Direction
The joint optimization of computation offloading and resource allocation in WPT-MEC systems poses a significant challenge due to time-varying network environments and the time-coupling nature of battery charging and discharging. In this study, we concentrate on maximizing the long-term EE of a WPT-MEC system through user collaboration. We formulated an EE maximization problem that accounts for the uncertainty of load dynamics and the time-varying wireless channel; this problem is difficult because of the coupling of multiple parameters. To address it, we propose an efficient online control algorithm, termed DOUCA, which employs Dinkelbach's method and Lyapunov optimization theory to decouple the sequential decision problem into a deterministic optimization problem for each time slot. Extensive simulation results validate the effectiveness of our proposed algorithm, which achieves up to a 20% improvement in energy efficiency over benchmark methods while striking a balance between EE and system stability.
A limitation of our work is that decision-making is based solely on the system state of the current time slot, neglecting historical data and system dynamics that could improve decision quality. Moving forward, we intend to explore a deep learning-based strategy capable of harnessing historical data to forecast system behavior, guiding more informed decisions and thus enhancing overall system performance.

Figure 1 .
Figure 1. Architecture of WPT-MEC network with user assistance.

Theorem 1 .
Denote the optimal EE as η_EE^opt. The optimal η_EE^opt is achieved if and only if the maximum of the transformed objective, i.e., the long-term throughput minus η_EE^opt times the long-term energy consumption, equals zero (Dinkelbach's condition).

Theorem 3 .
Let e_m^upper be the upper bound of e_m(t). Then the time-average sum of the queue lengths is bounded.

Figure 2 .
Figure 2. Convergence performance of energy efficiency EE and task queue Q over time slots.

Figure 3. Energy efficiency EE and task queue Q with different control parameter V.
Figure 4. Energy efficiency EE and task queue Q with different bandwidth W.
Figure 5. Energy efficiency EE and task queue Q with different task arrival rate λ.

Figure 6 .
Figure 6. Energy efficiency EE in different schemes over time slots.

Figure 7 .
Figure 7. Energy efficiency EE in different schemes with different bandwidth W.

Figure 8 .
Figure 8. Energy efficiency EE in different schemes with different distances between MD and helper.

Table 1 .
Key notations and definitions.
Here, the antenna gain, the carrier frequency f_c = 915 MHz, the path-loss exponent d_e = 3, and the distance d_i in meters between two nodes determine the averaged channel gain h̄; the time-varying WPT and task-offloading channel gains are given by h^t = a^t h̄.

Table 2 .
Simulation parameters.

Figure 2 illustrates the variation curves of EE and the average task queue length Q over 5000 time slots under different control parameters V = 30 and 50, with task arrival rate λ = 2.2 Mbps.