Provision for Energy: A Resource Allocation Problem in Federated Learning for Edge Systems

This article explores an energy-efficient method for allocating transmission and computation resources for federated learning (FL) over wireless communication networks. In the considered model, each user trains a local FL model using its limited local computing resources and the data it has collected. These local models are then transmitted to a base station, where they are aggregated and broadcast back to all users. The learning accuracy, as well as the computation and communication latency, is determined by the exchange of models between the users and the base station. Throughout the FL process, the energy consumed by both local computation and transmission must be taken into account. Given the limited energy resources of wireless users, the resource allocation problem is formulated as an optimization problem whose goal is to minimize the overall system energy consumption while meeting a latency requirement. To address this problem, we propose an iterative algorithm that jointly accounts for bandwidth, power, and computational resources. Numerical simulations show that the proposed algorithm reduces energy consumption by up to 51% compared to traditional FL methods.


Introduction
There has been significant growth in mobile data in recent years, much of it generated in real time and distributed across edge devices such as smartphones and sensors [1] [2]. Artificial intelligence (AI) technology is widely used to process this mobile data and support various services, such as computer vision and the internet of vehicles [3]. A common practice is to train AI models using elastic cloud computing, which allows operators to achieve optimal performance by accessing large-scale datasets. However, this process poses challenges due to privacy concerns [4], network congestion [5], and service latency [6]. Federated learning (FL) in the edge framework offers a solution to these issues. FL implements distributed machine learning at the network edge, where clients, or edge devices, train local models with their private data and only share parameters such as model weights [7] [8] [9]. An FL server aggregates these models into a global model and broadcasts updates to each edge device. After several iterations, the target accuracy is achieved and the training process is completed. FL avoids the need for data uploads and enables rapid access to real-time data, thus reducing pressure on communication resources and lowering service latency. It is a promising distributed learning algorithm that is likely to be applied in future internet of things systems [10,11,12,13,14,15].
Wireless devices, such as those that communicate through cellular networks, benefit greatly from distributed learning frameworks. These frameworks allow a shared learning model to be trained on locally collected data [16,17,18]. However, edge devices such as smartphones have limited computation resources, and the spectrum resource is scarce. Only a few edge devices are able to upload their trained local models in each round. Additionally, the limited battery life of these devices is a growing concern, and the energy consumption caused by communications is increasing.
In addition, wireless devices can cooperate and execute a learning task by uploading only their local learning models to the base station (BS) instead of sharing the full training data [19]. One approach to improve this process is a gradient quantization-based digital transmission scheme [20]. The coverage area of wireless devices has also been taken into consideration to reduce the number of edge devices needed [21]. However, because wireless resources such as time and bandwidth are limited and wireless devices must transmit their local training results over wireless links [22], the accuracy of the FL framework in edge systems can be affected [23]. Furthermore, the limited energy of wireless devices makes energy efficiency optimization crucial for the successful deployment of FL [24].
To address the challenges discussed above, we propose a framework that balances energy consumption and learning accuracy. Specifically, we model the energy consumption of computation and communication in the FL framework for edge systems. The novelty of this work is that it takes both learning accuracy and resource constraints into consideration for FL. The main contribution of this paper is the construction of an energy consumption minimization problem and the provision of a solution to it. Our key contributions include:
• We investigate the performance of FL algorithms over wireless communication networks for a scenario in which each user locally computes its model under a given learning accuracy, while the BS broadcasts the aggregated model to all users. The convergence rate of FL is first determined for the considered algorithm.
• An optimization problem is formulated to minimize the total energy consumption of both local computation and wireless transmission.
• A low-complexity iterative algorithm is proposed to solve this problem. This algorithm includes new closed-form solutions for time allocation, bandwidth allocation, power control, computation frequency, and learning accuracy.
• The FL completion time minimization problem is established to obtain a feasible solution to the total energy minimization problem. Our theoretical analysis shows that the completion time depends on the learning accuracy. To minimize the FL completion time, we propose a bisection-based algorithm built on this theoretical result.

Related works
Artificial intelligence (AI) technology is used to process mobile data and support various services [3]. Elastic cloud computing is commonly used to train AI models, since it can access large-scale datasets and achieve optimal performance [4,5,6]. However, this process presents challenges such as privacy concerns, network congestion, and service latency. To address these issues, federated learning (FL) in the edge framework implements distributed machine learning at the network edge [7,8,9]. In FL, clients or edge devices train local models with their private data and only share parameters such as model weights.
Several studies have examined the challenges of federated learning (FL) over wireless networks. For example, in [25], a broadband analog aggregation scheme was used to minimize communication latency in multi-access channels. In [26], the authors explored a minimization problem for FL in cell-free multiple-input multiple-output (MIMO) systems, and in [27], an energy-aware user scheduling policy was proposed to maximize the number of scheduled users in FL with redundant data. In [28], a novel sparse and low-rank model was developed to improve statistical learning performance for on-device distributed training, and in [29], an energy-efficient bandwidth allocation scheme was proposed with constraints on learning performance. However, these works tend to focus on the trade-off between completion time and energy in wireless transmission and do not take into account the trade-off between learning and transmission. Recent works such as [30], [31], and [32] considered both local learning and communication energy but did not take into account computation delays on local FL models, and it was not feasible for all users to send their learning models synchronously.

System model
This paper presents a framework for a federated edge learning system that includes a set of edge devices and one edge server (BS), as shown in Figure 1. Applications of federated learning include self-driving vehicles and users' augmented or virtual reality headsets. More generally, AI-enabled edge applications can also be deployed in federated learning frameworks. The set of edge devices is denoted as $K = \{1, 2, \dots, K\}$, and each device $k$ has its own local data sample $D_k$. Additionally, we use $x_i$ and $y_i$ to represent the input and output of data sample $i$, respectively. The edge server and the users jointly execute the learning model, with the learning algorithm trained on both the user side (called the local model) and the server side (called the global model). The communication between the users and the server is the main process that consumes cost and energy. In Eq. (1), $C_k$ is the number of CPU cycles required per data sample at user $k$ and $\beta_k$ is the fraction of the data samples executed locally.
The energy consumption is affected by the CPU frequency and the CPU chip. The energy consumed by local computation for the workload $\beta_k D_k C_k$ is denoted $E_k^c$ (Eq. (2)), where $\alpha$ is the coefficient of the chip architecture at edge device $k$.
(2) Global Communication: After local computation, the users upload their training results to the BS through the wireless access. The rate of user $k$ is given by Eq. (3), where $p_k$ is the transmit power, $N_0$ is the Gaussian channel noise, and $h_k$ denotes the channel gain between user $k$ and the base station. The Shannon capacity gives an upper bound on the transmission rate. Because the size of the transmitted data is $\beta_k D_k$, the rate can also be calculated as $\beta_k D_k / t_k$ (Eq. (4)), where $t_k$ is the transmission time. The energy consumed in uploading data to the BS is then $E_k^t = p_k t_k$ (Eq. (5)). The overall consumption comprises the cost of both computation, $E_k^c$, and transmission, $E_k^t$, so that $E_k = E_k^c + E_k^t$ (Eq. (6)). The completion time of the FL algorithm is the main concern of edge systems. The completion time of user $k$, $T_k$, is the sum of its computation time and its communication time (Eq. (7)), while the completion time of the algorithm is the maximum of the completion times $T_k$ over all users.
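As a rough illustration of the cost model described above, the following sketch evaluates the per-user computation time, energy, and completion time. The closed forms used here are assumptions that follow the conventional model in the FL-over-wireless literature and are not taken from this section: computation time $\beta_k D_k C_k / f_k$, computation energy $\alpha \beta_k D_k C_k f_k^2$, and a Shannon-type rate $b_k \log_2(1 + p_k h_k / (N_0 b_k))$. The per-user bandwidth `b`, the units, and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class User:
    beta: float  # fraction of the data samples executed locally
    D: float     # local data size in bits (assumed unit)
    C: float     # CPU cycles needed per bit (assumed unit)
    f: float     # CPU frequency in cycles per second
    p: float     # transmit power in watts
    h: float     # channel gain, linear scale
    b: float     # bandwidth assigned to the user in Hz (assumed variable)


def computation_time(u: User) -> float:
    """Assumed form: cycles to execute divided by the CPU frequency."""
    return u.beta * u.D * u.C / u.f


def computation_energy(u: User, alpha: float) -> float:
    """Assumed form: alpha * cycles * f^2 (a common CMOS energy model)."""
    return alpha * u.beta * u.D * u.C * u.f ** 2


def rate(u: User, n0: float) -> float:
    """Assumed Shannon-type rate in bits per second."""
    return u.b * math.log2(1.0 + u.p * u.h / (n0 * u.b))


def transmission_time(u: User, n0: float) -> float:
    """Time needed to upload beta * D bits at the achievable rate."""
    return u.beta * u.D / rate(u, n0)


def transmission_energy(u: User, n0: float) -> float:
    """Transmit power multiplied by the transmission time."""
    return u.p * transmission_time(u, n0)


def system_cost(users: Iterable[User], alpha: float, n0: float) -> Tuple[float, float]:
    """Return (total energy of all users, completion time of the slowest user)."""
    users = list(users)
    total_energy = sum(
        computation_energy(u, alpha) + transmission_energy(u, n0) for u in users
    )
    completion = max(
        computation_time(u) + transmission_time(u, n0) for u in users
    )
    return total_energy, completion
```

The sketch makes the trade-off in the model visible: under these assumed forms, raising `f` shortens the computation time but increases the computation energy quadratically, while the overall completion time is set by the slowest user.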

Federated learning model
In federated edge learning, we use $\theta$ to represent the training parameters of the global model, and $D_k$ denotes the data sample of user $k$. We define the loss function $f_i(\theta) = (y_i - \theta^T x_i)^2$, which measures the difference between the observed output $y_i$ and the model prediction $\theta^T x_i$ for input $x_i$. We aim to minimize this loss function to find the optimal value of $\theta$. The reason we adopt this expression is that in most federated learning applications the training sample is large enough that the difference between the prediction and the output can be assumed to follow a normal distribution with mean 0 and standard deviation $\sigma$. Therefore, for a specific sample $x_i$ and a given $\theta$, the conditional probability of $y_i$ is Gaussian, as expressed in Eq. (8). We assume that all samples are independent and identically distributed. To find the optimal $\theta$, we define the likelihood function in Eq. (9). Our goal is to make the observed outputs as probable as possible, so we maximize the product of the likelihoods of the individual samples.
We then convert Eq. (9) into a log-likelihood function. Because the logarithm is a strictly increasing function, maximizing the likelihood is equivalent to maximizing the log-likelihood. We remove the terms that do not depend on $\theta$ and convert the remaining term into a negative log-likelihood. By setting $\sigma = 1$, we recover the original loss function.
Essentially, maximizing the likelihood is equivalent to minimizing the loss function. Thus, we can obtain the optimal $\theta$ by minimizing the loss function.
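The derivation above can be written out explicitly. The following is a sketch under the stated assumption of zero-mean Gaussian residuals with standard deviation $\sigma$ over $n$ independent samples; the symbols $n$ and $\ell$ are introduced here for illustration only, and the numbering of the original equations is not reproduced.

```latex
\begin{align*}
p(y_i \mid x_i; \theta)
  &= \frac{1}{\sqrt{2\pi}\,\sigma}
     \exp\!\left(-\frac{(y_i - \theta^{T} x_i)^2}{2\sigma^2}\right), \\
L(\theta)
  &= \prod_{i=1}^{n} p(y_i \mid x_i; \theta), \\
\ell(\theta) = \log L(\theta)
  &= -n \log\!\left(\sqrt{2\pi}\,\sigma\right)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \theta^{T} x_i)^2, \\
\arg\max_{\theta}\, \ell(\theta)
  &= \arg\min_{\theta} \sum_{i=1}^{n} (y_i - \theta^{T} x_i)^2
   = \arg\min_{\theta} \sum_{i=1}^{n} f_i(\theta).
\end{align*}
```

The first term of $\ell(\theta)$ does not depend on $\theta$, so it can be dropped. With $\sigma = 1$ the negative log-likelihood reduces to $\tfrac{1}{2}\sum_i f_i(\theta)$, which has the same minimizer as the squared-error loss.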
The objective of federated learning is to find a set of parameters $\theta$ that minimizes the value of $F_k(\theta)$ across users, as formulated in Eq. (12), where $D = \sum_{k} D_k$ is the total data size of all users in the system. The BS collects the partially trained models from all users, updates the global model based on the collected models, and then distributes the updated model to all users. The process involves several local iterations and global iterations. The number of efficient computations is smaller than the number of local updates. From the system's perspective, it is crucial to schedule as many users as possible under the limitations of wireless resources while providing good performance in terms of completion time and energy consumption.
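The aggregation step described above can be sketched as a data-size-weighted average of the local parameter vectors, with each user weighted by its share $D_k / D$ of the total data. This is a minimal illustration that assumes FedAvg-style averaging and plain Python lists for the model parameters; the section does not spell out the exact aggregation rule, and the function name `aggregate` is hypothetical.

```python
from typing import List, Sequence


def aggregate(
    local_models: Sequence[Sequence[float]],
    data_sizes: Sequence[float],
) -> List[float]:
    """Weighted average of local parameter vectors, weights D_k / D."""
    if not local_models or len(local_models) != len(data_sizes):
        raise ValueError("need one data size per local model")
    total = sum(data_sizes)  # D, the total data size of all users
    if total <= 0:
        raise ValueError("total data size must be positive")

    dim = len(local_models[0])
    global_model = [0.0] * dim
    for theta_k, d_k in zip(local_models, data_sizes):
        if len(theta_k) != dim:
            raise ValueError("all local models must have the same dimension")
        weight = d_k / total
        for j, value in enumerate(theta_k):
            global_model[j] += weight * value
    return global_model
```

In one global iteration the BS would call `aggregate` on the uploaded models and broadcast the result, after which each user resumes local training from the returned parameters.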

Problem formulation
In this section, we formulate the problem of minimizing the energy consumption of all devices under a latency constraint. The objective of our proposed model is to minimize the overall energy consumption of the $K$ edge devices. The energy-oriented problem is formulated in Eqs. (13) to (18), where $f_k^{\max}$ and $p_k^{\max}$ denote the maximum computation capacity and the maximum transmit power of user $k$, respectively. This is the original problem P1, whose objective is to minimize the total energy consumption of all users, including the computation and communication phases. Constraint Eq. (14) is based on the definition of $\beta_k$, the resource allocated to all $K$ users. Constraint Eq. (15) indicates that the communication time is sufficient to transmit the data sample to the BS. The completion time is limited by constraint Eq. (16), constraint Eq. (17) describes the limits on the computation frequency and the transmit power, and constraint Eq. (18) requires the remaining variables to be non-negative for all $k \in K$.
Lemma 1. The objective of problem P1 is a non-increasing function of $t_k$ and $\beta_k$, $\forall k \in K$. The derivative of the P1 objective shows that it is non-increasing. Thus, the optimal solution is found by optimizing the transmission time of each device, which is independent of the other parameter $\beta_k$.
Then, by applying the KKT conditions [33], the optimal solution of P1 can be obtained as follows.
The optimal solution of bandwidth allocation is denoted $\gamma_k^*$.
Proof: From Eq. (4.3.19), $T_k$ is used to replace $t_k$ in P1, so the original problem P1 can be rewritten with the same objective subject to Eqs. (14), (16), (17), and (18); the rewritten problem is labeled Eq. (22). As mentioned before, this is a convex problem. Lagrange multipliers are applied, with $u^*$ denoting the multiplier, so that the KKT conditions can be written down. Solving them yields an expression for $\gamma_k^*$ in terms of $W(\cdot)$, the Lambert W function, and the value of the multiplier $u^*$ can be calculated by solving the resulting equation. The argument $x$ of the Lambert W function involves the term $B N_0 e$, and it is substituted into $\gamma_k^*$.
To analyze the solution, an auxiliary variable $y$ is used. It is clear that $y$ is non-decreasing in $W(x)$, so $\gamma_k^*$ is also non-decreasing in $T_k$. From the expression for $x$, $h_k^2$ can be expressed in terms of $B N_0 e T_k u^*$. With a second auxiliary variable $z$ defined accordingly, it is easy to see that $z$ is non-increasing in $W(x)$. Since $W(x)$ is non-decreasing in $x$ and in $h_k^2$, $\gamma_k^*$ is also non-increasing in $h_k^2$. It can be seen that more bandwidth resources should be allocated to devices with weak computation capacities. This is because the main factor affecting the minimization of energy consumption is the synchronous updating and execution of tasks. Put simply, weak devices require a larger bandwidth to complete the entire process of computation and communication. In addition, more bandwidth should be allocated to weak channels, since weak channels have a lower transmission rate and therefore require a larger bandwidth.
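The closed-form bandwidth expression relies on the Lambert W function, which is defined implicitly by $W(x) e^{W(x)} = x$. As a minimal sketch, the principal branch for $x \ge 0$ can be evaluated with a few Newton steps; in practice a library routine such as `scipy.special.lambertw` would normally be used instead. The function name `lambert_w0` is hypothetical, and the sketch does not reproduce the paper's expression for $\gamma_k^*$.

```python
import math


def lambert_w0(x: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Principal branch of the Lambert W function for x >= 0.

    Solves w * exp(w) = x with Newton's method.
    """
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")

    # log(1 + x) is never below W(x) for x >= 0, so Newton's method
    # approaches the root monotonically from the right.
    w = math.log1p(x)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w
```

Because $W$ is increasing on $x \ge 0$, evaluating `lambert_w0` at increasing arguments gives increasing values, which is the property the monotonicity argument in the proof relies on.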

Methodology
In this section, an efficient algorithm is proposed to solve problem P1, which was formulated in the previous section. The objective of solving problem P1 is to determine whether the devices can finish their tasks within the completion time $T$. Therefore, P1 can be equivalently transformed into problem P2, in which $\lambda$ is defined as a pre-trained parameter. This problem is difficult to solve due to the integer constraint. Therefore, we relax the constraint $\beta_k \in \{0, 1\}$ to $0 \le \beta_k \le 1$, so that the integer problem can be approached through the relaxed problem. $\beta_k$ can be regarded as the importance of device $k$, which covers both bandwidth allocation and sequence scheduling. In detail, the bandwidth problem can be solved by P1, and the second part is to decide the importance of the devices, defined as problem P3, subject to Eqs. (15), (16), (17), and (18). Based on the definition of $\beta = [\beta_1, \beta_2, \dots, \beta_K]$ and the form of P3, another function $G(\beta)$ is introduced. Setting the partial derivative of $G(\beta)$ with respect to $\beta_k$ to 0 gives the stationary point $\beta_k'$. The result for $\beta_k'$ under the constraints of P3 is as follows: if $\beta_k' \le 0$, the minimizing value is at $\beta_k = 0$; if $0 < \beta_k' < 1$, the minimizing value is at $\beta_k = \beta_k'$; and if $\beta_k' \ge 1$, the minimizing value is at $\beta_k = 1$. In conclusion, the optimal value is $\beta_k^* = \min\{\max\{\beta_k', 0\}, 1\}$. From this result, it can be inferred that the importance of devices with strong computation capacity and bandwidth is higher than that of others.
Based on the result for $\beta_k$ and Eq. (39), it can be concluded that the importance of device $k$ is related to the transmission time $t_k$ and the channel condition $h_k$. According to Eq. (39), the effect of $t_k$ is larger than that of $h_k$.
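The case analysis for the relaxed importance variable amounts to projecting the stationary point $\beta_k'$ onto the interval $[0, 1]$. A minimal sketch, assuming the stationary points have already been computed from the derivative condition (their closed form is not reproduced here); the function names are hypothetical.

```python
from typing import Iterable, List


def clip_importance(beta_prime: float) -> float:
    """Project a stationary point onto the feasible interval [0, 1]."""
    return min(max(beta_prime, 0.0), 1.0)


def optimal_importance(stationary_points: Iterable[float]) -> List[float]:
    """Apply the projection to every device's stationary point."""
    return [clip_importance(b) for b in stationary_points]
```

A stationary point below 0 maps to 0, one above 1 maps to 1, and any value in between is kept unchanged, which matches the three cases of the relaxed problem.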

Algorithm 1. Allocation method
The complexity of Algorithm 1 is determined by the number of iterations of Eqs. (5), (6), and (7). The optimal solution of Eq. (11) is obtained by the bisection method, which has a complexity of $O(K \log_2 n)$, where $n$ is the interval of $\beta_k$. In this algorithm, the bandwidth allocation is fixed during each time slot, which results in a certain energy consumption at each iteration.
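The bisection step mentioned above can be sketched as follows. This is a generic routine, assuming a continuous function that changes sign on the search interval; the function `g` passed in stands for whatever condition is being solved for $\beta_k$ and is a placeholder, not the paper's expression.

```python
from typing import Callable


def bisect(
    g: Callable[[float], float],
    lo: float,
    hi: float,
    tol: float = 1e-9,
    max_iter: int = 200,
) -> float:
    """Find a root of g in [lo, hi] by repeatedly halving the interval."""
    g_lo, g_hi = g(lo), g(hi)
    if g_lo == 0.0:
        return lo
    if g_hi == 0.0:
        return hi
    if g_lo * g_hi > 0.0:
        raise ValueError("g must change sign on [lo, hi]")

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0 or 0.5 * (hi - lo) <= tol:
            return mid
        if g_lo * g_mid < 0.0:
            hi = mid              # the root lies in the left half
        else:
            lo, g_lo = mid, g_mid  # the root lies in the right half
    return 0.5 * (lo + hi)
```

Each iteration halves the interval at the cost of one function evaluation, so reaching a tolerance `tol` on a unit interval takes roughly $\log_2(1/\text{tol})$ evaluations per device, which is where the logarithmic factor in the stated complexity comes from.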

Results and discussion
In the simulation setting, the number of devices is set to K = 50, and the devices are uniformly distributed in an area of 500 m × 500 m. In the edge system, the bandwidth is set to B = 2 MHz. The transmit power is set to 10 dB, and the computation frequency is 2 GHz. The power gain that represents noise is set to -30 dB. The machine learning model is a CNN trained on the MNIST data set, with 500 data samples in total. Most applications in edge systems require a low completion time, so we compare the proposed EF method with the FDMA and TDMA methods (Tran et al., 2019). Figure 3 shows how the completion time varies as the average transmit power increases. The completion time decreases as the transmit power increases, because a higher transmit power reduces the communication time between the end user and the BS. The proposed EF method demonstrates a 19.8% and 13.2% reduction in energy consumption compared to the other methods. The method takes into account both the energy and latency factors of tasks in the edge system. FDMA performs better than TDMA because TDMA divides time slots among different users, and a user's task may not be ready when its turn to use the time slot arrives. Thus, the completion time of the TDMA method is the worst in this scenario.

Figure 4. Computation and communication completion time comparison with various transmit power
In addition, it is necessary to examine the relationship between computation and communication in both methods. Figure 4 illustrates how changes in the maximum average transmit power of each user affect the time needed for communication and computation. Both the communication time and the computation time decrease as the maximum average transmit power of each user grows. The computation time is always greater than the communication time, and the communication time decreases faster than the computation time. Energy consumption is an important concern for edge users. Figure 5 illustrates how energy consumption changes as the transmit power increases over a period of T = 150 s. The figure shows that the proposed EF method outperforms TDMA and FDMA. Additionally, energy consumption decreases as the transmit power increases, as a result of the reduced completion time shown in Figure 3. The TDMA method allocates specific time slots to users, and the users respond periodically, which leads to wasted energy. When the transmit power is larger than 18 dB, TDMA outperforms the other methods in computation energy consumption.

Figure 7. Energy consumption comparison with various completion time
The relationship between completion time and energy consumption is a crucial aspect of the edge system, as shown in Figure 7. It can be observed that FDMA performs better when the completion time is low, since it schedules all data to the BS in a global iteration, whereas in TDMA only a fraction of users send data to the BS at one time. On the other hand, the performance of TDMA is similar to that of the other methods when the completion time is larger. The proposed EF method can significantly reduce energy consumption, by up to 51% and 27% compared to TDMA and FDMA, respectively.

Conclusion
We have studied the problem of energy-efficient computation and resource allocation for FL over wireless networks. Based on the convergence rate, we developed time and energy consumption models for FL and formulated a joint learning and communication problem with the objective of minimizing the total computation and transmission energy of the network. We have derived closed-form solutions for computation and transmission resources at each iteration and proposed a low-complexity iterative algorithm to solve this problem. The proposed scheme is efficient in terms of energy consumption and outperforms the conventional TDMA and FDMA schemes, with reductions of up to 51% and 27%, respectively, especially when the maximum average transmit power is low.

Figure 1. Federated learning framework
Since FL is designed to share a model among users, data communication of various data samples is crucial in the FL problem. Therefore, the FL training problem can be formulated as in Eq. (12).

We use $W$ to denote the Lambert W function [34]. The transmission time $T_k$ is strictly constrained by the transmission time of the device $k$ with the largest completion time, when $u^*$ and $e$ are used as Lagrange multipliers.

Figure 2. The training accuracy versus the number of global iterations with different numbers of local iterations

Figure 3. Completion time comparison with various transmit power

Figure 5. Energy consumption comparison with various transmit power within T = 150 s