Joint Accuracy and Latency Optimization for Quantized Federated Learning in Vehicular Networks

Abstract—Vehicular networks have emerged as a key enabling technology for enhancing traffic efficiency and safety in transportation systems. As the amount of onboard data increases and data privacy concerns grow, federated learning (FL) has gained popularity for harnessing this data for intelligent transportation operations. To satisfy the strict latency criteria of vehicular networks, a quantization scheme is employed within FL to reduce the size of the local models before uplink transmission. In this article, considering the high mobility of vehicles, we aim to optimize the learning performance and the latency simultaneously by jointly considering the communication resource budget and the quantization strategy. Specifically, we first analyse the convergence of quantized FL, which reveals the effects of both the quantization error and the number of clients on the convergence rate. Then, we formulate a multiobjective optimization problem (MOP) to maximize the number of participating clients and minimize the overall latency by jointly optimizing the quantization levels, wireless resource allocation, and client selection. To deal with the MOP, we decompose it into a set of scalar optimization subproblems, each formulated as a Markov decision process (MDP). To solve the MDP in highly mobile vehicular networks, we propose a novel deep reinforcement learning-based vehicle heterogeneous quantization FL (DRL-VQFL) method, which leverages a DRL framework built upon the proximal policy optimization algorithm. A parameter transfer strategy is then employed to solve neighboring subproblems efficiently. Our extensive simulations demonstrate the effectiveness and efficiency of the DRL-VQFL approach, showcasing its superiority over benchmark methods.


I. INTRODUCTION
A. Background

Vehicular networks play a pivotal role in supporting the evolution of intelligent transportation systems (ITSs) [1], enabling seamless wireless connections among vehicles, roadside units (RSUs), and pedestrians [2]. Equipped with vast arrays of sensors, vehicles generate ever-increasing onboard data that can be delivered through these vehicular networks [3]. To pave the way for information exchange, vehicle-to-vehicle and vehicle-to-infrastructure communication have emerged as the two primary communication types in vehicular networks [4].
With the remarkable growth in vehicular data and the expansion of vehicular computing and storage capabilities, machine learning (ML) has been widely employed to create safer and more convenient driving conditions [5]. By leveraging the wealth of collected data, ML provides intelligent predictions and adaptive decision making to facilitate a wide range of applications, including adaptive resource allocation policies, smart traffic operations, rapid attack identification, and autonomous driving [6].
To learn a shared ML model, large amounts of data are transmitted from the local vehicles to a central server, which raises data privacy concerns and fails to meet the stringent communication delay requirements of vehicular networks. Nowadays, federated learning (FL) [7] has been regarded as a promising approach to tackle these ongoing challenges. In FL, the clients learn their local models by leveraging the data stored on their respective devices and subsequently upload the models to the server. Once the server collects these local models, it aggregates them into an updated global model, which is then broadcast to all the clients for the next round of training. By utilizing the distributed computing and storage facilities of the clients, FL can expedite the training process without compromising data privacy [8].

B. Related Work

1) FL in Vehicular Networks:
Assisted by various techniques, vehicular networks play a crucial role in advancing autonomous vehicles, traffic management systems, and the development of smart cities. Raza et al. [9] introduced the potential of vehicular edge computing to support real-time vehicular interactions by enabling computation close to the vehicular data source. Moreover, Tang et al. [10] investigated the pivotal role of ML in enhancing vehicular networks, focusing on applications such as radio resource allocation, transmit scheduling, communication power control, traffic offloading, and dynamic routing. To preserve the clients' privacy in a communication-efficient manner within vehicular networks, FL is regarded as a promising technique that pushes the computation to vehicles and employs ML models to analyse vast amounts of data [11].
During the deployment of FL in vehicular networks, several challenges emerge due to communication bottlenecks, system heterogeneity, and heterogeneous data. To tackle these problems, various techniques have been proposed, including client selection, resource allocation, and incentive design. Zhao et al. [12] maximized the number of participating vehicles while considering dynamic wireless channels and varying computation capacities, ensuring tasks are completed within specified latency constraints. An over-the-air computation (AirComp)-based method was proposed in [13] to improve communication efficiency in FL by leveraging wireless channel superposition. To further minimize the communication overhead in practical AirComp-based FL systems, a convergence analysis was conducted in [14] to reveal the impact of the aggregation error caused by channel fading and noise. Zeng et al. [15] analysed how the mobility of vehicles and the nonindependent and identically distributed (non-iid) nature of data impact FL convergence, and proposed a contract theory-based incentive mechanism to accelerate the FL convergence speed. To strive for the lowest cost in the worst-case scenario of FL, Xiao et al. [16] considered the vehicles' positions and velocities, and presented a joint vehicle selection and resource allocation optimization method to minimize the overall system cost.
2) Quantization in FL: Although FL exchanges model parameters instead of raw data, communication efficiency remains a bottleneck due to the increasing size and complexity of neural networks (NNs) as FL tasks become more complex. The quantization scheme emerges as an effective solution to mitigate this issue, whereby the trained models are compressed before the uplink transmission [17], [18], [19], [20], [21], [22], [23]. The work in [17] highlights the profound impacts of the physical-layer quantization and transmission options on the convergence of FL. To enhance energy efficiency and reduce the number of global rounds, [18] introduces a multiobjective optimization problem (MOP) to optimize the quantization level. Moreover, [19] analyses how to minimize the service delay of FL by controlling the local updates, the weight quantization level, and the gradient quantization level.
The devices in FL systems often exhibit heterogeneity in communication and computing capabilities, but prior works overlooked these distinctions by assuming a homogeneous quantization scheme across all devices. To address this, Chen et al. [24] proposed FEDHQ, which assigns different aggregation weights and sets heterogeneous quantization levels for the devices. Considering the tradeoff between the quantization error and the transmission outage under a restricted transmission delay, Wang et al. [25] presented FedTOE to enhance model accuracy through a theoretical convergence analysis. Moreover, to deal with statistical and device heterogeneity, QuPeD [26] allows clients to learn compressed and personalized models with varying quantization levels and model dimensions. The wireless resources and the personalized quantization bits are jointly optimized in [27] to mitigate the communication bottleneck. Given that changes in a client's location introduce variations in the wireless channels, Lan et al. [28] investigated heterogeneous quantization under varying channels in wireless FL. In vehicular networks, system heterogeneity is a prevalent feature, prompting the adoption of a heterogeneous quantization method in this article.
While various heterogeneous quantization schemes have been investigated, the aforementioned studies primarily focus on general wireless FL systems, overlooking the distinctive challenges posed by vehicular networks, including stringent real-time communication requirements and precise learning performance demands. Li et al. [29] implemented heterogeneous quantization precisions in vehicular networks: to improve communication efficiency and minimize the aggregated squared quantization errors within vehicular ad hoc networks, their FEDDO scheme assigns different aggregation weights to local models with individual quantization precisions. It is worth noting that the quantization scheme reduces the training latency at the expense of the FL learning performance. Hence, it is pivotal to design an appropriate quantization strategy that balances the FL accuracy and the training latency.

C. Motivation and Contributions
These existing works, however, disregard a pivotal characteristic of vehicular networks, namely, the high mobility of vehicles, when applying the heterogeneous quantization scheme. Vehicles moving at high speeds may leave the coverage range of RSUs, exacerbating the straggler effect. Furthermore, the vehicles' mobility results in varying wireless conditions across different FL global rounds, further impacting the system's communication efficiency. Given the limited wireless resources in vehicular networks, it is critical to jointly optimize the wireless resource allocation and the quantization levels so that they adapt to the dynamic mobility of vehicles.
Motivated by the aforementioned observations, this article studies the performance of FL under the quantization scheme and aims to address these difficulties in vehicular networks. Specifically, we first provide a convergence rate analysis for FL under heterogeneous quantization. Then, to improve the FL accuracy and reduce the latency, we formulate an MOP that optimizes the bandwidth allocation, client selection, and quantization levels. To solve the resulting mixed-integer nonlinear programming problem efficiently, we propose the deep reinforcement learning-based vehicle heterogeneous quantization FL (DRL-VQFL) algorithm. The main contributions of our work are as follows.
1) We investigate FL implementation in vehicular networks characterized by highly mobile vehicles. To alleviate the straggler effect arising from the mobility of vehicles, we propose a vehicle selection mechanism. Then, to effectively account for the dynamic wireless environment incurred by the vehicles' high mobility, we model the FL process as a Markov decision process (MDP) and derive the dynamic wireless channel conditions required for calculating the total latency.

2) We leverage the heterogeneous quantization scheme to reduce the communication delay and conduct a convergence rate analysis to study how quantization errors and the number of participating users impact the performance of FL. To ensure FL accuracy, we aim to increase the number of participating vehicles while effectively managing the quantization error introduced by the quantization method. Following that, we formulate an MOP to maximize the number of participating vehicles and minimize the overall latency. To accomplish this, we explore the tradeoff between these two objectives by jointly optimizing quantization levels, bandwidth allocation, and vehicle selection indices.

3) To obtain a feasible solution to the MOP, we propose the DRL-VQFL algorithm. We break down the MOP into multiple subproblems and model each of them with an NN. We treat each subproblem as an MDP and apply a proximal policy optimization (PPO)-based approach to solve it effectively.

4) Finally, to validate the effectiveness of our proposed scheme, we evaluate the performance of DRL-VQFL through comprehensive simulations utilizing open data sets and models. The simulation results demonstrate the superiority of our proposed method over other baselines, particularly in terms of communication efficiency and model accuracy.

The remainder of this article is organized as follows. In Section II, we introduce the system model of quantized FL in vehicular networks. The convergence rate of heterogeneous quantization FL and the vehicles' mobility are investigated in Section III. Section IV formulates the optimization problem. In Section V, the DRL-VQFL algorithm is developed to solve the formulated MOP. Simulation results are presented in Section VI. Finally, Section VII concludes this article.

II. SYSTEM MODEL
In this section, we first present the FL workflow in the vehicular network. Then, we introduce a quantization scheme to reduce communication delays. Next, we elaborate on the latency costs. As depicted in Fig. 1, we consider an urban road where one RSU and a set $\mathcal{K} = \{1, 2, \ldots, K\}$ of vehicles collaboratively learn the FL model.

A. FL Model
We define the set of global communication rounds as $\mathcal{M} = \{1, 2, \ldots, M\}$ and the set of local training rounds as $\mathcal{I} = \{1, 2, \ldots, I\}$. Given the constraints of limited wireless resources and the dynamic mobility of vehicles, only a subset $\mathcal{N}_m \subseteq \mathcal{K}$ of vehicles can participate in the FL task, with $|\mathcal{N}_m| = N$. Let $w_m$ denote the global model at global round $m$ and $w^i_{k,m}$ the local model of vehicle $k$ in local round $i$ at global round $m$. With a variety of onboard sensors, each vehicle $k$ holds a local data set $\mathcal{D}_k$. By employing the local stochastic gradient descent (SGD) method, vehicle $k$ updates its local model parameters at global round $m$ as

$$w^{i}_{k,m} = w^{i-1}_{k,m} - \eta_m \nabla F_k\big(w^{i-1}_{k,m}, \xi^{i-1}_{k,m}\big)$$

where $\eta_m$ corresponds to the learning rate (lr) at global round $m$. Additionally, $\xi^{i-1}_{k,m}$ represents a mini-batch uniformly sampled from the local data set $\mathcal{D}_k$.
To reduce the data size for transmission, the vehicles can upload a weight differential $\Delta w_{k,m} = w^{I}_{k,m} - w_m$, as proposed in [25] and [27], where $w^{I}_{k,m}$ is the local model obtained after $I$ local rounds and serves as the updated result for aggregation at the RSU. We adopt $w_{k,m}$ to represent $w^{I}_{k,m}$ to simplify the notation in the subsequent analysis.

Then, by aggregating all the local weight differentials, the RSU obtains the global model parameters

$$w_{m+1} = w_m + \sum_{k \in \mathcal{N}_m} \rho_k \, \Delta w_{k,m} \qquad (4)$$

where $\rho_k$ is the aggregation weight of vehicle $k$.

B. Quantization Scheme
To mitigate transmission latency, the quantization scheme is leveraged to compress the NN parameters to a smaller number of bits. Let $Q(\Delta w_{k,m})$ represent the quantized weight differential of vehicle $k$ at global round $m$ for transmission; then (4) becomes

$$w_{m+1} = w_m + \sum_{k \in \mathcal{N}_m} \rho_k \, Q(\Delta w_{k,m}). \qquad (5)$$

We denote $w^{\max}_{k,m}$ and $w^{\min}_{k,m}$ as the upper and lower bounds of the absolute values of the parameters $\Delta w_{k,m,j}$ ($j \in \{1, 2, \ldots, J\}$) in the $J$-dimensional vector $\Delta w_{k,m}$, such that $w^{\min}_{k,m} \le |\Delta w_{k,m,j}| \le w^{\max}_{k,m}$. We employ stochastic rounding [30] in the quantization method, denoting $q_{k,m}$ as the number of quantization bits of vehicle $k$ at global round $m$. Then, there are $2^{q_{k,m}}$ quantization knobs, and $[w^{\min}_{k,m}, w^{\max}_{k,m}]$ is divided into intervals $V_\phi = [z_\phi, z_{\phi+1}]$, where $\phi \in \{0, 1, 2, \ldots, 2^{q_{k,m}} - 2\}$. The value of the knob $z_\phi$ is given by

$$z_\phi = w^{\min}_{k,m} + \phi \, \frac{w^{\max}_{k,m} - w^{\min}_{k,m}}{2^{q_{k,m}} - 1}.$$

When the value of $|\Delta w_{k,m,j}|$ falls within the interval $V_\phi$, the quantization function can be written as

$$Q(\Delta w_{k,m,j}) = \mathrm{sign}(\Delta w_{k,m,j}) \cdot \begin{cases} z_\phi, & \text{w.p. } \iota \\ z_{\phi+1}, & \text{w.p. } 1 - \iota \end{cases} \qquad (6)$$

where $\iota = (z_{\phi+1} - |\Delta w_{k,m,j}|)/(z_{\phi+1} - z_\phi)$, the sign function $\mathrm{sign}(\cdot)$ yields the sign of $\Delta w_{k,m,j}$ as $-1$ or $1$, and "w.p." is short for "with probability." The size of the quantized weight differential $Q(\Delta w_{k,m})$ can be calculated as

$$J q_{k,m} + \alpha \qquad (7)$$

where $\alpha$ is the sum of the number of bits used to represent the signs and the sizes of $w^{\max}_{k,m}$ and $w^{\min}_{k,m}$. Utilizing the quantization function defined above, we obtain the following lemma, as in [25] and [27].
Lemma 1: The quantized parameter $Q(\Delta w_{k,m})$ remains unbiased with respect to $\Delta w_{k,m}$, i.e., $\mathbb{E}[Q(\Delta w_{k,m})] = \Delta w_{k,m}$. Moreover, let $\delta^2_{k,m} = \frac{J}{4} \big( \frac{w^{\max}_{k,m} - w^{\min}_{k,m}}{2^{q_{k,m}} - 1} \big)^2$; then the quantization error can be bounded as

$$\mathbb{E}\big[\|Q(\Delta w_{k,m}) - \Delta w_{k,m}\|^2\big] \le \delta^2_{k,m}.$$

Proof: Both results can be obtained by taking the expectation under the quantization method in (6). Further details of this derivation can be found in [25].
Lemma 1 reveals that a smaller quantization bit $q_{k,m}$ results in a higher quantization error, because fewer bits yield wider quantization intervals and thus lower precision of the transmitted model.
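To make the quantization procedure concrete, the following Python sketch implements the stochastic rounding of (6), assuming $w^{\min}_{k,m}$ and $w^{\max}_{k,m}$ are taken over the whole tensor of magnitudes; the helper name and the final unbiasedness check are ours, not the authors' released code.

```python
import torch

def stochastic_quantize(dw: torch.Tensor, q_bits: int) -> torch.Tensor:
    """Sketch of the stochastic rounding quantizer of Section II-B.

    Quantizes the magnitudes of the weight differential `dw` onto
    2**q_bits evenly spaced knobs between w_min and w_max; the signs
    are carried separately, as in (6). Assumes dw is not constant.
    """
    w_abs = dw.abs()
    w_min, w_max = w_abs.min(), w_abs.max()
    intervals = 2 ** q_bits - 1                   # number of intervals V_phi
    step = (w_max - w_min) / intervals            # z_{phi+1} - z_phi
    # Index phi of the interval V_phi containing each |dw_j|.
    phi = torch.clamp(((w_abs - w_min) / step).floor(), 0, intervals - 1)
    lower = w_min + phi * step                    # z_phi
    upper = lower + step                          # z_{phi+1}
    # Round down w.p. iota = (z_{phi+1} - |dw_j|)/step, up otherwise,
    # which makes the quantizer unbiased (Lemma 1).
    iota = (upper - w_abs) / step
    rounded = torch.where(torch.rand_like(w_abs) < iota, lower, upper)
    return dw.sign() * rounded

# Unbiasedness check: averaging many independent draws recovers dw.
dw = torch.randn(1000)
est = torch.stack([stochastic_quantize(dw, 4) for _ in range(2000)]).mean(0)
print(torch.allclose(est, dw, atol=0.05))
```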

C. Computation and Communication Model
1) Local Computation: Nowadays, vehicles can be equipped with GPUs and possess abundant computational resources [31]. Unlike CPUs, which operate in a serial mode, GPUs function in parallel. When the training batch size (bs) is small, a GPU can process all the data simultaneously, resulting in a constant computation latency. However, once the bs surpasses the maximum number of samples that can be processed at once, the latency grows. According to [32], the computation time $T^{cp}_{k,m}$ with bs $\varphi$ can be defined as

$$T^{cp}_{k,m} = \begin{cases} T^{th}_k, & \varphi \le \varphi^{th}_k \\ T^{th}_k + o_k \big(\varphi - \varphi^{th}_k\big), & \varphi > \varphi^{th}_k \end{cases} \qquad (9)$$

where $T^{th}_k$ and $o_k$ are parameters related to the NN model and GPU configuration, and $\varphi^{th}_k$ is the threshold bs of vehicle $k$.

2) Uplink Communication: We consider orthogonal frequency-division multiple access (OFDMA) as the transmission scheme. The channel gain between vehicle $k$ and the RSU at global round $m$ can be represented as [33]

$$g_{k,m} = h_{k,m} \, d_{k,m}^{-\nu} \qquad (10)$$

where $h_{k,m}$ represents the power gain of the channel at a reference distance of 1 m, $d_{k,m}$ is the distance between vehicle $k$ and the RSU, and $\nu$ is the path-loss exponent. Let $p_{k,m}$ represent the uplink power. Then, the achievable uplink transmission rate is given by [34]

$$r_{k,m} = b_{k,m} B \log_2 \left( 1 + \frac{p_{k,m} \, g_{k,m}}{\sigma^2} \right) \qquad (11)$$

where $B$ and $\sigma^2$ represent the total uplink bandwidth and the variance of the complex white Gaussian channel noise, respectively. Besides, the parameter $b_{k,m} \in (0, 1)$ denotes the allocated bandwidth fraction of vehicle $k$ at global round $m$, with $\sum_{k} b_{k,m} = 1$.

Before uplink transmission, the model of vehicle $k$ at global round $m$ is quantized with $q_{k,m}$ bits. Let $e_o$ denote the original model size with $q_{\max}$ bits. Since $\alpha$ is much smaller than $J q_{k,m}$, from (7) we can approximate the size of the quantized data with $q_{k,m}$ bits as [18]

$$e_{k,m} = e_o \, \frac{q_{k,m}}{q_{\max}}. \qquad (12)$$

Then, the uplink communication time $T^{cm}_{k,m}$ can be expressed as

$$T^{cm}_{k,m} = \frac{e_{k,m}}{r_{k,m}}. \qquad (13)$$

3) Aggregation and Downlink Transmission: Once it has collected the local models, the RSU aggregates them into a global model and then broadcasts it to all the vehicles. The aggregation delay is typically negligible, as assumed in [35], because the basic averaging operation can be performed rapidly with the RSU's high computational capacity.
Furthermore, the downlink delay is considered insignificant in comparison to the uplink delay, as presented in [36], due to the RSU's significantly higher downlink power and the allocation of a larger downlink bandwidth for data distribution.
4) Total Latency: Overall, due to the straggler effect, the total latency at global round $m$ is determined by the slowest participating vehicle

$$T_m = \max_{k \in \mathcal{K}} \; c_{k,m} \left( T^{cp}_{k,m} + T^{cm}_{k,m} \right) \qquad (14)$$

where $c_{k,m} \in \{0, 1\}$ is the vehicle selection index defined in Section III-B.
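The latency model of (9)-(14) can be illustrated with a short sketch; all numerical parameters below (bandwidth, noise variance, transmit power, reference channel gain, model size) are assumed for illustration and do not reproduce the paper's Table I.

```python
import math

# Illustrative parameters; the paper's Table I values are not reproduced here.
B = 1e6             # total uplink bandwidth (Hz), assumed
sigma2 = 1e-13      # noise variance sigma^2 (W), assumed
p = 0.1             # uplink transmit power p_{k,m} (W), assumed
h0, nu = 1e-4, 2.0  # reference channel power gain and path-loss exponent, assumed
e_o, q_max = 8e6, 32  # original model size ~1 MB in bits, full-precision bits

def uplink_time(d, b, q):
    """Quantized upload time T^cm_{k,m} following (10)-(13)."""
    g = h0 * d ** (-nu)                        # channel gain (10)
    r = b * B * math.log2(1 + p * g / sigma2)  # achievable rate (11)
    e = e_o * q / q_max                        # quantized model size (12)
    return e / r                               # upload latency (13)

def round_latency(vehicles):
    """Straggler-limited round latency T_m in (14); `vehicles` holds
    tuples (c_km, T_cp, d_km, b_km, q_km), one per vehicle."""
    return max(c * (t_cp + uplink_time(d, b, q))
               for c, t_cp, d, b, q in vehicles)

# Two selected vehicles with different distances and quantization bits.
print(round_latency([(1, 0.05, 120.0, 0.5, 8), (1, 0.06, 300.0, 0.5, 4)]))
```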

III. CONVERGENCE AND VEHICLES' MOBILITY ANALYSIS
In this section, we first derive the convergence bound of FL and analyse the factors that impact its performance. Next, we model the dynamics of the wireless network caused by the vehicles' mobility and propose a vehicle selection mechanism.

A. Convergence Analysis
Before we discuss the convergence of FL, we consider the general assumptions on the loss function, which are similar to those in [17], [18], and [37].
Assumption 1: $F_k(w)$ is $L$-smooth and lower bounded for all $k \in \mathcal{K}$; for all $w_1$ and $w_2$:

$$F_k(w_1) \le F_k(w_2) + (w_1 - w_2)^T \nabla F_k(w_2) + \frac{L}{2} \|w_1 - w_2\|^2.$$

Assumption 2: $F_k(w)$ is $\mu$-strongly convex for all $k \in \mathcal{K}$; for all $w_1$ and $w_2$:

$$F_k(w_1) \ge F_k(w_2) + (w_1 - w_2)^T \nabla F_k(w_2) + \frac{\mu}{2} \|w_1 - w_2\|^2.$$

Assumption 3: For all $k \in \mathcal{K}$, the variance of the stochastic gradients over the randomly sampled data $\xi_{k,m}$ of vehicle $k$ is bounded as

$$\mathbb{E}\big[\|\nabla F_k(w, \xi_{k,m}) - \nabla F_k(w)\|^2\big] \le \beta^2_{k,m}.$$

Assumption 4: The expected squared norm of the stochastic gradients is uniformly bounded for all $k \in \mathcal{K}$.

Let $\Gamma = F(w^*) - \sum_{k} \rho_k F^*_k$ quantify the degree of non-iid data, where $F(w^*)$ and $F^*_k$ are the minimal values of $F$ and $F_k$, respectively. The upper bound of $\mathbb{E}[F(w_M)] - F(w^*)$ is then presented in Theorem 1 to display the convergence result.

Theorem 1: Let Assumptions 1 to 4 hold, and let $\Delta_0 = \mathbb{E}\|w_0 - w^*\|^2$. If $\tau > \max\{2, 2/\mu, L/\mu\}$ and the lr follows a diminishing schedule $\eta_m = \chi/(m + \tau)$ with $\chi = 2/\mu$, then $\mathbb{E}[F(w_M)] - F(w^*)$ admits an upper bound that decreases in the quantization bits $q_{k,m}$ and in the number of participants $N$.

Proof: Refer to the detailed proof in the Appendix.

It can be observed that as the quantization bit $q_{k,m}$ increases, the upper bound of $\mathbb{E}[F(w_M)] - F(w^*)$ decreases, resulting in improved convergence of FL. This observation aligns with the understanding that a smaller quantization bit discards more information, thus potentially reducing the model's accuracy. Furthermore, Theorem 1 also shows that an increase in the number of participants $N$ reduces the above upper bound, thereby enhancing the FL performance, which is consistent with the conclusions drawn in [12] and [34].

B. Vehicles' Mobility and Selection Mechanism
1) Vehicles' Mobility: Due to the mobility of vehicles, the path loss in the uplink transmission varies and is determined by the distance between the RSU and the vehicle, as depicted in Fig. 2. Let $u_{k,m-1}$ denote the distance vehicle $k$ has traveled by global round $m-1$ and let $v$ be the velocity of the vehicles. The traveled distance at global round $m$ can be represented as

$$u_{k,m} = u_{k,m-1} + T_m v$$

and the traveled distance when the vehicle starts to upload its local parameters can be represented as

$$\hat{u}_{k,m} = u_{k,m-1} + T^{cp}_{k,m} v.$$

Let $R$ represent the coverage radius of the RSU and $H$ denote the vertical distance between the RSU and the road. Then, the distance between vehicle $k$ and the RSU can be calculated as

$$d_{k,m} = \sqrt{H^2 + \left(\sqrt{R^2 - H^2} - \hat{u}_{k,m}\right)^2}.$$

2) Vehicle Selection Mechanism: The vehicles on the road travel at varying speeds, and the wireless conditions are highly dynamic. Therefore, the diverse remaining time of the vehicles within the RSU's coverage has a great impact on the FL training accuracy. Let $c_{k,m} \in \{0, 1\}$ represent the client selection index of vehicle $k$ at global round $m$: $c_{k,m} = 1$ if vehicle $k$ is selected and $c_{k,m} = 0$ otherwise. To ensure that a selected vehicle can complete the task before leaving the RSU's coverage, we require

$$c_{k,m} \left( u_{k,m-1} + \left( T^{cp}_{k,m} + T^{cm}_{k,m} \right) v \right) \le 2\sqrt{R^2 - H^2}.$$

To avoid the straggler effect, we set a latency tolerance $T_{\max}$, represented as

$$c_{k,m} \left( T^{cp}_{k,m} + T^{cm}_{k,m} \right) \le T_{\max}.$$
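As a sketch of the geometry and the selection test above, the following snippet assumes the RSU projects onto the midpoint of the covered road segment and uses an assumed speed v; it is illustrative, not the authors' implementation.

```python
import math

R, H, v = 500.0, 10.0, 20.0   # coverage radius (m), RSU offset (m); v (m/s) assumed
chord = 2 * math.sqrt(R**2 - H**2)   # road length inside the RSU's coverage

def rsu_distance(u_hat):
    """Distance d_{k,m} between a vehicle and the RSU, measuring the
    traveled distance u_hat from the coverage entrance (Section III-B)."""
    return math.sqrt(H**2 + (chord / 2 - u_hat) ** 2)

def selectable(u_prev, t_cp, t_cm, t_max):
    """Selection test of Section III-B: finish within the latency
    tolerance T_max and stay in coverage until the upload completes."""
    round_time = t_cp + t_cm
    return round_time <= t_max and u_prev + round_time * v <= chord

print(rsu_distance(100.0), selectable(100.0, 0.05, 0.2, 0.4))
```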
IV. PROBLEM FORMULATION

In this section, we first discuss the two optimization objectives and then formulate an MOP to simultaneously enhance the FL performance and reduce the total latency in the vehicular network.

Inspired by the conclusion after Theorem 1, we propose the following metric to measure the FL performance by counting the participants:

$$G_m = \log \left( 1 + \sum_{k=1}^{K} c_{k,m} \right)$$

where $c_{k,m}$ represents the selection index of vehicle $k$ at global round $m$. When more vehicles engage in the training, $G_m$ becomes larger, indicating improved FL performance. It is noted that when no vehicles are selected, $\sum_{k=1}^{K} c_{k,m} = 0$, resulting in $G_m = 0$. Besides, as a concave increasing function, the logarithm better captures the relationship between the FL performance and the number of participants: as the number of participating vehicles increases, the marginal gain in FL performance diminishes [38].
Next, we denote the vehicle selection indices, quantization levels, and bandwidth allocations of all the vehicles at global round $m$ as $C_m$, $Q_m$, and $B_m$, respectively. To improve the FL performance, the problem of maximizing the accumulated number of participants can be expressed as

$$\max_{C, Q, B} \; \sum_{m=1}^{M} G_m \qquad (22)$$

where $C = [C_1, \ldots, C_M]$, $Q = [Q_1, \ldots, Q_M]$, and $B = [B_1, \ldots, B_M]$ are the matrices of decision vectors $C_m$, $Q_m$, and $B_m$ for $m \in \mathcal{M}$. In addition, real-time feedback is critical in vehicular networks for safety reasons, so another problem of minimizing the total latency can be described as

$$\min_{C, Q, B} \; \sum_{m=1}^{M} T_m. \qquad (23)$$

Based on the above analysis, we formulate an MOP to maximize the accumulated number of participants in FL and minimize the vehicles' total latency by jointly optimizing the selection indices $C$, quantization levels $Q$, and bandwidth allocation $B$:

$$\max_{C, Q, B} \; \left( \sum_{m=1}^{M} G_m, \; -\sum_{m=1}^{M} T_m \right) \quad \text{s.t. } C_1\text{--}C_8 \qquad (24)$$

where constraint $C_1$ limits the quantization error by an upper bound $\varepsilon$ to ensure the FL performance based on the above convergence analysis; $C_2$ ensures the sum of all allocated bandwidth fractions equals 1; $C_3$, $C_4$, and $C_5$ limit the ranges of the three variables; $C_6$ ensures the selected vehicles stay within the coverage area of the RSU; $C_7$ requires the latency at each round not to exceed the latency tolerance, so as to avoid the straggler effect; and $C_8$ guarantees that at least one client participates in each round to facilitate the FL training process.

Now, we have formulated an MOP involving vehicle selection, quantization level assignment, and bandwidth allocation. Nevertheless, several challenges hinder deriving the optimal solution. First, the three decision variables are coupled with each other, mixing integer and continuous variables, which complicates the optimization. Second, the mobility of vehicles introduces uncertainty and high dynamics into the vehicular network. Third, the two objectives conflict with each other under limited wireless resources and a predefined latency tolerance. To deal with these challenges, we propose the DRL-VQFL algorithm in the following section.

V. PROPOSED DRL-VQFL ALGORITHM
In this section, we propose DRL-VQFL to tackle the above MOP. First, we apply the decomposition strategy to divide the problem in (24) into several scalar optimization subproblems. Then, each subproblem is optimized in sequence efficiently by the parameter transfer strategy [39], as displayed in Fig. 3.

A. DRL-VQFL Framework
To solve the MOP effectively, decomposition is the essential technique. In DRL-VQFL, we first apply the min-max scaling method adopted in [40] and [41] to normalize the two objectives with different scales, yielding $f_1$ and $f_2$. Then, we leverage the weighted-sum method [42] to convert the bi-objective problem into $T$ scalar subproblems. We denote the set of weight vectors as $\lambda^1, \ldots, \lambda^t, \ldots, \lambda^T$, where $\lambda^t = (\lambda^t_1, \lambda^t_2)$ and $\lambda^t_1 + \lambda^t_2 = 1$. In our bi-objective problem, the set can be given as $(1, 0), (0.9, 0.1), \ldots, (0, 1)$ [39]. Hence, the objective of the $t$th scalar optimization problem can be represented with the weight vector $\lambda^t = (\lambda^t_1, \lambda^t_2)$ as [43]

$$\max_{C, Q, B} \; \lambda^t_1 f_1 - \lambda^t_2 f_2.$$

By applying different $\lambda^t$ and solving all $T$ subproblems, we obtain the Pareto-optimal solutions, which together form the Pareto front (PF). It is noted that neighboring subproblems share adjacent weight vectors, which implies that they have similar solutions. Hence, the solution to a subproblem can be obtained by transferring knowledge gained from its neighboring subproblems, as depicted in Fig. 3. Therefore, the neighborhood-based parameter transfer strategy [39] can be employed to transfer the model parameters from the previous subproblem to the current one, facilitating the collaborative and efficient resolution of all the scalar subproblems, as demonstrated in [44] and [45].
To deal with each subproblem, we model it with an NN and employ the DRL-based method to derive the optimal solutions. Specifically, we define the parameters of the network model of the $t$th subproblem as $\omega_{\lambda^t}$, with the optimal parameters $\omega^*_{\lambda^t}$. Then, $\omega^*_{\lambda^t}$ can be regarded as the initial parameters for training the network model of the $(t+1)$th subproblem. The decomposition and neighborhood-based parameter transfer strategies thus allow for efficient training of the DRL-VQFL model, since all the subproblems can be solved in sequence by transferring the network weights. The general framework of DRL-VQFL is shown in Algorithm 1.
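A minimal sketch of the outer loop of Algorithm 1 follows; `train_subproblem` is a hypothetical stand-in for the PPO training routine of Section V-C, and the toy usage at the end only demonstrates the warm-start flow.

```python
import copy

def solve_mop(T, train_subproblem, init_params):
    """Sketch of the DRL-VQFL outer loop (Algorithm 1): decompose the
    bi-objective problem into T weighted-sum subproblems and warm-start
    each subproblem's model from its neighbor's optimum."""
    pareto_front, params = [], init_params
    for t in range(T):
        lam1 = 1.0 - t / (T - 1)          # weight vectors (1,0), (0.9,0.1), ..., (0,1)
        lam = (lam1, 1.0 - lam1)
        # Neighboring weight vectors have similar optima, so transferring
        # the previous parameters accelerates convergence.
        params = train_subproblem(lam, copy.deepcopy(params))
        pareto_front.append((lam, params))
    return pareto_front

# Toy stand-in: pretend "training" nudges a scalar parameter.
front = solve_mop(11, lambda lam, p: p + lam[0], 0.0)
print([tuple(round(x, 1) for x in lam) for lam, _ in front][:3])
```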

B. Markov Decision Process
To solve the decomposed scalar subproblems, each subproblem is modeled as an MDP $\langle S, A, P, R \rangle$, where $S$ is the state set, $A$ denotes the action set, $P$ represents the transition probabilities, and $R$ is the reward function. The RSU is regarded as the agent that observes the environmental information and conducts the calculations.

Algorithm 1: General Framework of DRL-VQFL
1: Initialize the set of weight vectors $\lambda^1, \ldots, \lambda^T$.
2: for $t = 1$ to $T$ do
3:   if $t = 1$ then
4:     Apply the PPO algorithm to learn the first subproblem model with the initial parameters $\omega_{\lambda^1}$; the optimal model parameters $\omega^*_{\lambda^1}$ can be obtained.
5:   else
6:     Set the initial parameters $\omega_{\lambda^t} = \omega^*_{\lambda^{t-1}}$ via the parameter transfer strategy.
7:     Continue to learn subproblem $t$ with the initial parameters $\omega_{\lambda^t}$; the optimal model parameters $\omega^*_{\lambda^t}$ can be obtained.
8:   end if
9: end for

1) State Space S:
The state space is the collection of all observable environmental states in the system, capturing the dynamic changes occurring during each global round. At global round $m$, the state $s_m$ includes the vehicles' traveled distances $u_{k,m}$, their distances $d_{k,m}$ to the RSU, the communication latencies $[T^{cm}_{1,m}, T^{cm}_{2,m}, \ldots, T^{cm}_{K,m}]$ for parameter uploads, and $T_m$, the longest communication latency in the system.
2) Action Space A: The actions determine the strategy for vehicle selection, quantization level assignment, and bandwidth allocation. At global round $m$, the agent's action is

$$a_m = \big[ c_{1,m}, \ldots, c_{K,m}, \; q_{1,m}, \ldots, q_{K,m}, \; b_{1,m}, \ldots, b_{K,m} \big].$$

This action induces the uploading times $T^{cm}_{k,m}$ through the allocated bandwidth $b_{k,m}$ and quantization bits $q_{k,m}$ of each selected vehicle satisfying $c_{k,m} = 1$. Then, $T_m$ is updated according to (14) at global round $m$. During this period, the vehicles traverse new distances $u_{k,m}$, each corresponding to a distance $d_{k,m}$ from the RSU. The varying positions lead to different channel gains $g_{k,m}$, which subsequently impact the uploading times at the next round.
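Because the paper does not fix how the joint discrete-continuous state and action are encoded, the following sketch shows one plausible encoding: selection by thresholding, bandwidth fractions via a softmax so that constraint $C_2$ holds, and quantization bits by rounding. All of these choices are assumptions for illustration.

```python
import numpy as np

K = 10  # number of vehicles

def build_state(u, d, t_cm, t_m):
    """Assemble one possible flat observation s_m from the traveled
    distances, RSU distances, upload latencies, and round latency.
    The ordering is a hypothetical choice, not fixed by the paper."""
    return np.concatenate([u, d, t_cm, [t_m]]).astype(np.float32)

def split_action(a):
    """Decode a flat action vector into selection c, bandwidth b, and
    quantization bits q for the K vehicles (assumed encoding)."""
    c = (a[:K] > 0.5).astype(int)                   # binary selection
    w = np.exp(a[K:2 * K])
    b = w / w.sum()                                 # softmax -> sums to 1 (C2)
    q = np.clip(np.rint(a[2 * K:3 * K]), 1, 32).astype(int)
    return c, b, q

s = build_state(np.zeros(K), np.full(K, 400.0), np.full(K, 0.2), 0.2)
c, b, q = split_action(np.random.randn(3 * K))
print(s.shape, c.sum(), round(b.sum(), 6), q.min(), q.max())
```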
3) Transition Probabilities P: The transition function p(s m+1 |s m , a m ) is the transition probability from the current state to the next state when the agent takes action a m .
4) Reward R: Our objective is to optimize the model performance by maximizing a reward.Designing the reward Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
has a pivotal impact on the training performance. The reward function is based on the formulation in (24), which maximizes the number of accumulated participants and minimizes the total latency. We include the two normalized objectives in the objective reward

$$r^{obj}_m = \lambda^t_1 f_1 - \lambda^t_2 f_2.$$

To maximize the reward $r^{obj}_m$, reducing the quantization bits decreases the transmitted model size and consequently the latency. However, a small quantization bit enlarges the quantization error, which is restricted within $C_1$ to maintain the FL performance. Therefore, to enforce the quantization error constraint, we propose a penalty $r^{pen}_m$ that is incurred whenever the quantization error exceeds the budget $\varepsilon$ in $C_1$, where $\kappa$ is a dynamic hyperparameter that enforces the constraint and balances the objectives against the constraint in the reward. The reward is then the sum of the objective reward and the penalty

$$r_m = r^{obj}_m + r^{pen}_m.$$

Then, the cumulative discounted reward can be expressed as $R_m = \sum_{\ell=0}^{\infty} \gamma^{\ell} r_{m+\ell}$ with the discount factor $\gamma$.
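A hedged sketch of one possible reward computation follows: it combines the min-max-scaled objectives with the weight vector $\lambda^t$ and adds a penalty proportional to the violation of the quantization error budget $\varepsilon$. The normalization bounds and the exact penalty shape are assumptions, since the paper leaves them unspecified here.

```python
import math

def reward(lam, n_selected, t_m, delta_sq, eps, kappa,
           f1_bounds, f2_bounds):
    """One plausible reward r_m: weighted normalized objectives plus a
    hinge penalty on the quantization-error constraint C1. The bounds
    used for min-max scaling and the hinge form are assumptions."""
    f1 = math.log(1 + n_selected)                       # participation metric G_m
    f1 = (f1 - f1_bounds[0]) / (f1_bounds[1] - f1_bounds[0])  # min-max scale
    f2 = (t_m - f2_bounds[0]) / (f2_bounds[1] - f2_bounds[0])
    r_obj = lam[0] * f1 - lam[1] * f2                   # maximize f1, minimize f2
    r_pen = -kappa * max(0.0, delta_sq - eps)           # C1 violation penalty
    return r_obj + r_pen

print(reward((0.5, 0.5), 6, 0.3, 0.02, 0.05, 10.0,
             (0.0, math.log(11)), (0.0, 0.5)))
```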

C. DRL-Based Subproblem Solution
To address the optimization problem involving both discrete and continuous variables, we introduce a PPO-based framework [46]. PPO is a model-free, on-policy, policy-gradient-based RL algorithm developed by OpenAI [47]. PPO balances ease of implementation, sample efficiency, and tuning simplicity while enforcing small policy deviations via a clipping technique.
Let $\pi_\theta$ denote the policy determined by the parameter $\theta$, and let $\pi_\theta(a_m|s_m) = P(a_m|s_m; \theta)$ describe the probability of taking action $a_m$ given the state $s_m$. To solve the optimization problem, the agent interacts with the environment to seek the optimal policy $\pi^*$. The value network is denoted as $V_\psi$, determined by the parameter $\psi$. We denote by $\hat{A}_m$ the advantage function

$$\hat{A}_m = R_m - V_\psi(s_m).$$

The probability ratio of the current policy to the old policy is defined as

$$\rho_m(\theta) = \frac{\pi_\theta(a_m|s_m)}{\pi_{\theta'}(a_m|s_m)}$$

where $\theta'$ is the parameter from the final step of the previous update. To obtain the optimal policy while mitigating excessively large policy deviations, the policy network maximizes a clipped surrogate objective

$$L^{CLIP}(\theta) = \mathbb{E}\Big[ \min\big( \rho_m(\theta) \hat{A}_m, \; \mathrm{clip}\big(\rho_m(\theta), 1 - \upsilon, 1 + \upsilon\big) \hat{A}_m \big) \Big]. \qquad (32)$$

The value network aims to minimize the mean-squared error loss

$$L(\psi) = \mathbb{E}\big[ \big(V_\psi(s_m) - R_m\big)^2 \big]. \qquad (33)$$

Algorithm 2: PPO-Based Subproblem Solution
1: Initialize the policy network parameters $\theta$ and the value network parameters $\psi$.
2: for each episode do
3:   Observe the vehicles' positions and latencies to form the state.
4:   Select actions according to the policy $\pi_\theta$ for vehicle selection, resource allocation, and quantization level assignment.
5:   Apply policy $\pi_\theta$ in the environment and collect the set of trajectories.
6:   Compute the reward $R_m$.
7:   Compute the advantage function $\hat{A}_m$.
8:   Update the parameter $\theta$ of the policy network by maximizing (32) based on the parameters $\theta$, $\theta'$.
9:   Update the parameter $\psi$ of the value network by minimizing (33).
10: end for

The proposed PPO-based method is outlined in Algorithm 2. It initializes the NN parameters $\theta$ and $\psi$ and interacts with the environment to observe vehicle positions and latencies during each episode. Actions are selected based on the policy, determining vehicle selection, resource allocation, and quantization level assignment. Through this interaction, PPO collects samples, computes rewards, and obtains the advantage function. Owing to the clipping technique, the magnitude of each policy update is kept within a reasonable range determined by the threshold $\upsilon$.
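The clipped surrogate (32) and the value loss (33) can be written compactly in PyTorch; the snippet below is a minimal sketch of the two loss terms only, not the full Algorithm 2 training loop.

```python
import torch
import torch.nn.functional as F

def ppo_losses(ratio, advantage, values, returns, upsilon=0.2):
    """Clipped surrogate (32) and value loss (33). `ratio` is
    pi_theta(a|s) / pi_theta'(a|s); upsilon is the clip threshold."""
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - upsilon, 1 + upsilon) * advantage
    policy_loss = -torch.min(unclipped, clipped).mean()  # minimize -> maximize (32)
    value_loss = F.mse_loss(values, returns)             # mean-squared error (33)
    return policy_loss, value_loss

ratio = torch.tensor([0.8, 1.1, 1.5])
adv = torch.tensor([1.0, -0.5, 2.0])
print(ppo_losses(ratio, adv, torch.tensor([0.2, 0.4, 0.1]),
                 torch.tensor([0.3, 0.2, 0.5])))
```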
The complete procedure of quantized FL in vehicular networks is given in Algorithm 3. The major complexity of Algorithm 3 lies in optimizing $C$, $Q$, and $B$ in problem (24). To obtain the optimal results, the key complexity lies in executing Algorithm 1 with a loop of length $T$, where the outcome $\omega^*_{\lambda^t}$ of each iteration is determined by Algorithm 2. Obtaining $\omega^*_{\lambda^t}$ in Algorithm 2 involves the PPO algorithm, with complexity $O(|\theta| + |\psi|)$, where $|\theta|$ and $|\psi|$ denote the numbers of parameters in the policy and value function networks [48]. Hence, the total complexity of the proposed Algorithm 3 is $O(T(|\theta| + |\psi|))$.

VI. SIMULATION RESULTS

A. Simulation Settings
To demonstrate the effectiveness of our proposed algorithm, we conduct a series of simulations. We assume the RSU is deployed $H = 10$ m away from the road with a coverage radius of 500 m. There are $K = 10$ vehicles uniformly distributed between the entrance of the coverage area and 100 m away from the entrance. From (9), the computation time depends on the system capacity when the bss are uniform across all vehicles, and we model the computation time of the vehicles as a uniform process with a range of 10 ms for each local round. In addition, we consider $T = 11$ subproblems in the decomposition procedure. The remaining simulation parameters are outlined in Table I.
The goal of the FL task is to solve an image classification problem using the CIFAR-10 and CIFAR-100 data sets [49]. In both data sets, each vehicle is allocated the same amount of data, so all vehicles share equal weight $\rho_k$. We leverage Python 3.8 with PyTorch as the software platform. For the NN model, we apply ResNet-20 [50], which consists of 19 convolutional layers and 1 fully connected layer, with an original size of approximately 1 MB. We utilize the Adam optimizer, and the number of local epochs is set to 5. For simplicity of analysis, we assume that the value of $\beta_{k,m}$ is similar across all vehicles, as in [25].
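As a sketch of this setting, the following snippet partitions CIFAR-10 into $K = 10$ equal shards, one per vehicle, matching the equal-weight $\rho_k$ assumption; the data path, seed, and transform are illustrative choices.

```python
import torch
from torchvision import datasets, transforms

K = 10  # vehicles

# Equal-sized shards, one per vehicle, so every vehicle has the same
# aggregation weight rho_k; "./data" and the seed are illustrative.
train_set = datasets.CIFAR10("./data", train=True, download=True,
                             transform=transforms.ToTensor())
shard_len = len(train_set) // K
shards = torch.utils.data.random_split(
    train_set, [shard_len] * K, generator=torch.Generator().manual_seed(0))
loaders = [torch.utils.data.DataLoader(s, batch_size=64, shuffle=True)
           for s in shards]
print(len(loaders), len(shards[0]))
```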
Additionally, we conduct simulations on two benchmarks, which are as follows.
1) Random Bandwidth Allocation (RBA): Each vehicle's bandwidth is randomly allocated, while the other variables are optimized using the DRL-VQFL algorithm.

2) Random Quantization Assignment (RQA): Quantization levels are randomly assigned, while the other variables follow the DRL-VQFL algorithm.

B. Performance Evaluation and Analysis
Fig. 4 depicts the accumulated rewards with confidence bands for the different methods as the episodes vary. It can be found that our proposed DRL-VQFL consistently outperforms the RBA and RQA methods due to the joint optimization of client selection, bandwidth allocation, and quantization levels, which increases vehicle participation while reducing latency. In RBA, inefficient bandwidth allocation can result in the straggler effect, extending the overall training time. In RQA, vehicles fail to strike a balance between the quantization error budget, the minimum latency requirement, and the number of participants. Some vehicles may employ a small quantization bit, which leads to fewer vehicles participating owing to the quantization error constraint. Furthermore, we can see that replacing the PPO framework with Advantage Actor-Critic (A2C) yields lower rewards when solving (24). PPO's advantage over A2C can be attributed to its clipped surrogate objective function, which effectively limits the policy update magnitudes, thus enhancing algorithmic stability and convergence.
Fig. 5 illustrates the performance of the three methods under varying bss. It can be seen that all curves exhibit convergence except for the curve corresponding to RQA with a bs of 128. This deviation is attributed to RQA's random quantization allocation policy, under which the quantization error constraint may be violated even in the final stages of training, resulting in a substantial penalty. We can find that the RBA method yields similar rewards across various bss, which are lower than those achieved by DRL-VQFL. This is because the random allocation of bandwidth in RBA leads to an inefficient system with longer latency. Moreover, regardless of the chosen bs, DRL-VQFL consistently outperforms the other two baselines in terms of the reward. We observe that DRL-VQFL achieves the highest reward when utilizing a bs of 64, so we set the bs to 64 for the subsequent simulations.
Fig. 6 shows the convergence performance with different learning rates (lrs). In this figure, we can observe that DRL-VQFL outperforms the RBA and RQA methods across all lrs. Specifically, all three methods obtain the largest reward when the lr is set to 0.01. Furthermore, it can be seen that DRL-VQFL converges at approximately 5000 episodes when both the policy network and the value network have an lr of 0.01. As the lr decreases to 0.001, DRL-VQFL converges to a lower reward value at around 7500 episodes. Moreover, a further decrease of the lr to 0.0001 results in a slower convergence rate, stabilizing at a lower reward value at around 10 000 episodes. Hence, to demonstrate the efficiency of DRL-VQFL, we set the lr to 0.01 in the following simulations.
Fig. 7 depicts the reward as a function of the latency tolerance $T_{\max}$ at one global round. DRL-VQFL outperforms the other two baselines across all $T_{\max}$ values, highlighting its overall superiority. In DRL-VQFL, the reward first increases with larger $T_{\max}$, because a larger $T_{\max}$ allows each upload to use less bandwidth, so more vehicles can participate. Then, the reward stabilizes approximately when $T_{\max} > 0.6$ s, as enough vehicles can engage, and the number of participants and the latency remain consistent. In RBA, the reward initially benefits from the optimal quantization bit allocation but subsequently decreases as the vehicles fail to acquire sufficient bandwidth to complete the tasks within the allowed latency. In RQA, the reward remains low, because incorrect quantization allocation triggers the quantization error penalty, causing the agent to prioritize constraint adherence over reward maximization. Additionally, a sharp decline is observed in RQA when $T_{\max} = 0.35$ s, which can be attributed to fewer vehicles participating in the FL training, making it easier to meet the quantization error constraint.
Fig. 8 plots the accumulated number of selected vehicles for the different algorithms under various quantization error constraints. In DRL-VQFL (loose), RBA, and RQA, the same quantization error constraint is applied, while a smaller quantization error constraint is utilized in DRL-VQFL (tight). We can find that the accumulated number of selected vehicles in all curves increases as the latency tolerance $T_{\max}$ becomes larger. This is because a larger $T_{\max}$ allows more vehicles to utilize the allocated bandwidth to upload their parameters within the allowed latency. We can also observe that the value in DRL-VQFL (loose) exceeds those in RBA and RQA. This difference arises because RBA can lead to the straggler effect, reducing the actual number of participants, while random quantization allocation can result in a larger quantization error, leading to fewer vehicles being selected. We also notice that DRL-VQFL (tight) achieves lower values than DRL-VQFL (loose) when $T_{\max}$ is small, because the tighter quantization error constraint limits the number of selected vehicles that can satisfy the desired latency requirement. However, as $T_{\max}$ becomes larger, DRL-VQFL (tight) and DRL-VQFL (loose) obtain similar values.
Fig. 9 displays the latency under various latency tolerance $T_{\max}$ constraints. We can find that as $T_{\max}$ increases, the latency in each FL iteration also increases, and the differences between the curves diminish. Besides, it can be observed that the latency for $T_{\max} = 0.3$ and $T_{\max} = 0.35$ keeps decreasing, indicating that the latency decreases at each round as the vehicles approach the RSU with improved wireless channel conditions. When $T_{\max} \ge 0.4$, the latency curves exhibit convexity. This is because as the latency tolerance increases, the vehicles travel a longer distance, resulting in initially decreasing and then increasing distances between the vehicles and the RSU. Fig. 10(c) shows the training loss and the testing accuracy on the CIFAR-10 and CIFAR-100 data sets as the latency increases in seconds. We compare the FL performance under four different quantization error constraints $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$, and $\varepsilon_4$ ($\varepsilon_1 < \varepsilon_2 < \varepsilon_3 < \varepsilon_4$) with the lossless scheme (MaxBits), where all the vehicles upload the local models with the maximum quantization bit $q_{\max}$. It can be found that a tighter quantization error constraint achieves a lower loss and a higher testing accuracy, with the tightest constraint approximating the performance of the lossless scheme. In addition, as the quantization error constraint becomes looser, the convergence rate accelerates. This is because a looser constraint allows for smaller quantization bits, resulting in a smaller size of the transmitted model.
In Fig. 11(c), the training losses and testing accuracy under the different optimization algorithms for the CIFAR-10 and CIFAR-100 data sets are depicted. Across all the subfigures, it can be observed that DRL-VQFL achieves comparable loss and accuracy to RBA with significantly reduced latency, underscoring the superiority of DRL-VQFL in terms of convergence rate. This is because the efficient bandwidth allocation in DRL-VQFL avoids the straggler effect and minimizes unexpected latency. Moreover, even though DRL-VQFL exhibits a slightly longer latency than RQA on both data sets, it consistently and significantly outperforms RQA at equivalent latency levels in terms of both training loss and testing accuracy. This can be attributed to the optimized quantization allocation in DRL-VQFL, which facilitates greater participant involvement while satisfying the quantization error constraint and reducing the total latency.

VII. CONCLUSION
In this article, we have investigated the performance and efficiency of FL within vehicular networks by incorporating the quantization scheme into the local models before uplink transmission. Through a theoretical derivation of the convergence rate of the convex loss function in FL, we have gained insights into how the quantization error and the number of participating clients impact the FL performance. Following that, we have formulated an MOP intended to maximize the participation of clients while minimizing the overall latency via the joint design of quantization levels, bandwidth allocation, and client selection. To tackle this MOP, we have employed the decomposition strategy to break down the original problem into several scalar subproblems. We have also leveraged the parameter transfer strategy to establish the initial parameters of each subproblem. We have converted each subproblem to an MDP, accounting for the vehicles' high mobility and the dynamic wireless channel conditions, and proposed the DRL-VQFL approach to solve it. Through extensive numerical results and simulations, our proposed scheme has consistently demonstrated superior performance and efficiency in terms of learning accuracy and total training latency. Therefore, DRL-VQFL effectively harnesses the potential of vehicular networks while addressing the challenges posed by quantization errors and dynamic conditions. In the future, we will investigate split learning (SL), which divides the deep models into submodels trained collaboratively by the clients and the server.

APPENDIX
PROOF OF THEOREM 1 (SKETCH)

Let $\chi = 2/\mu$ and choose $\nu \le \max\{\chi^2 (W_1 + W_2)/(\chi \mu - 1), \tau_0\}$, where the terms involve $\sum_{k \in \mathcal{N}_i} \rho_k O_{k,i}$. With these derivations and the induction method also applied in [37], $\mathbb{E}[F(w_m)] - F(w^*)$ can be bounded, which completes the proof of Theorem 1.

Fig. 1. Workflow of the quantized FL in vehicular networks.

Algorithm 3: Quantized FL in Vehicular Networks
1: The RSU initializes the global model parameters $w_0$ and sends the global model to all vehicles.
2: The RSU determines the optimal selection indices $C$, quantization bit assignment $Q$, and bandwidth allocation $B$ by solving the MOP in (24) based on Algorithm 1.
3: for $m = 1$ to $M$ do
4:   for each selected client $k$ in $\mathcal{K}$ do
5:     for $i = 1$ to $I$ do
6:       Update the local model by local SGD.
7:     end for
8:     Compute the weight differential $\Delta w_{k,m}$.
9:     Calculate $w^{\max}_{k,m}$ and $w^{\min}_{k,m}$ across all parameters in $\Delta w_{k,m}$.
10:    for each parameter $j$ in $\Delta w_{k,m}$ with a total of $J$ parameters do
11:      Calculate the quantization intervals based on $q_{k,m}$.
12:      Calculate the index of the interval containing parameter $j$.
13:      Obtain the lower and upper bounds from the index.
14:      With probability $\iota$, the quantized value equals its lower bound; otherwise, the quantized value equals its upper bound.
15:      The sign of the quantized value is the same as that of the original parameter.
16:    end for
17:    Upload $Q(\Delta w_{k,m})$ with $b_{k,m} B$ bandwidth to the RSU.
18:  end for
19:  The RSU aggregates the quantized updates into the global model $w_m$ and sends it to all vehicles.
20: end for
21: Return the final global model $w_M$.

Fig. 4. Value of the reward as the number of episodes varies for different algorithms.
Fig. 5. Value of the reward with different bss.
Fig. 6. Value of the reward under different lrs.
Fig. 8. Accumulated number of selected vehicles for different algorithms under various quantization error constraints.
Fig. 9. Time delay for different latency tolerances versus the FL iteration numbers.