Privacy amplification for wireless federated learning with Rényi differential privacy and subsampling

A key issue in current federated learning research is how to improve the performance of federated learning algorithms by reducing communication overhead and computing costs while ensuring data privacy. This paper proposes an efficient wireless transmission scheme termed the subsampling privacy-enabled RDP wireless transmission system (SS-RDP-WTS), which not only reduces the communication and computing overhead of the learning process but also enhances the privacy protection of federated learning. We prove our scheme's convergence, analyze its privacy guarantee, and demonstrate its performance on the Modified National Institute of Standards and Technology database (MNIST) and the Canadian Institute for Advanced Research 10-class dataset (CIFAR10).


Introduction
The proliferation of advanced technologies, including artificial intelligence (AI), the Internet of Things (IoT), and cloud computing, has led to the emergence of federated learning (FL) as a promising distributed machine learning approach [1]. In particular, edge networks can benefit FL by speeding up learning updates and responses. At the same time, FL faces costly communication and time-sensitive training when deployed at the network edge. Hence, wireless multiple access channels were introduced in reference [2] as a means to transmit parameter or gradient information, thereby mitigating communication overhead. Furthermore, multi-channel communication in complex wireless networks can conserve resources, provided the transmitted information is effectively distributed across channels [3]. A natural approach is therefore to incorporate gradient compression into FL to enhance communication efficiency. Gradient compression strategies typically encompass sparsification and quantization. Sparsification compresses the local model before transmission to the parameter server (PS), significantly reducing communication costs [4]; however, it may slow convergence. Quantization reduces the amount of data sent in a given communication round before gradients reach the PS, potentially accelerating convergence [5].
Although FL addresses certain challenges in safeguarding data privacy, it remains vulnerable in terms of security [6][7][8]. A primary cause is insufficient protection of the model parameters exchanged in FL. When servers and clients are semi-honest, sensitive information can be inferred from these parameters, leading to privacy breaches. In this context, current approaches integrate FL with privacy-preserving technologies, including differential privacy (DP) [9][10][11], homomorphic encryption [12], secure multi-party computation [13], and blockchain [14], among others. DP stands out among these methods for its rigorous mathematical formulation and modest computational requirements [15].
Additionally, DP can precisely measure data privacy and privacy loss [16,17]. A prominent line of work in DP explores alternative divergences for developing DP variants. In this regard, Rényi differential privacy (RDP) has been shown to yield tight privacy accounting for sequences of random mechanisms accessing device datasets [18][19][20]. Applying DP can enhance local data privacy; however, combining encryption procedures with DP may increase computational overhead and reduce training accuracy, as stated in reference [21]. Optimizing FL performance by minimizing communication overhead and computing costs, while simultaneously upholding data privacy and security, therefore emerges as a pivotal concern.
While DP effectively safeguards private data by injecting noise, it also introduces potential challenges, such as increased computing overhead and reduced training accuracy [21]. Motivated by these issues, we present a novel framework, SS-RDP-WTS. The proposed scheme uses the multiple access channels of a wireless network to provide both data privacy protection and efficient communication for FL. Gradient compression is implemented through multi-level stochastic quantization within an FL framework over wireless multiple access channels; combined with client subsampling, this reduces communication and processing costs. Furthermore, exploiting the favorable composition properties of RDP, we strengthen the privacy protection of the scheme: model parameters are protected with discrete Gaussian noise during client updates, and clients' privacy is further amplified by uniform subsampling. Our paper provides a theoretical formulation of a general distributed mean estimation problem. Empirical findings demonstrate that the subsampled private wireless transmission scheme efficiently balances model utility and communication while offering enhanced privacy protection.
The remainder of this paper is organized as follows. Section two describes the relevant prerequisites, including basic definitions and theorems related to DP. Section three introduces the proposed system model with a subsampled private wireless transmission scheme. Section four analyzes the system model's privacy and convergence rate, section five evaluates its performance, and section six presents the conclusions.

Background
In this section, we present the system model and give RDP and its related definitions. The commonly used notations are listed in Table 1.
Table 1. Summary of the main notations.

Symbol | Description
ε | Privacy budget
δ | Upper bound on the probability of differential privacy failure
α | Order of the ratio of two probability distributions
σ_{k,t} | Noise level of client k in iteration t
w_t | Parameter vector in iteration t
ℓ(w) | Loss function
∆f | Sensitivity of the function f
K | Number of clients
v_i^{(k)} | Label corresponding to data point i of client k

In a wireless network environment, given rapidly changing data and the complexity of edge devices, wireless FL can not only update the model in real time but also improve the learning effect. Moreover, it enables FL to train over a broader range of data with more powerful computing resources, improving model accuracy and performance. One potential approach to addressing the FL objective while safeguarding client privacy is to train the model on a modified, distorted representation of the data, so that the actual data are never exposed directly during training. To address this privacy issue, Du et al. [22] implemented a machine learning strategy for smart edges using DP. Seif et al. [23] studied FL over a wireless channel, modeled by a Gaussian multiple access channel (MAC) and subject to local differential privacy constraints. Liu et al. [24] demonstrated that as long as the privacy constraint level, measured via DP, is below a threshold that decreases with the signal-to-noise ratio, uncoded transmission achieves privacy "for free", i.e., without affecting learning performance. During the learning process, model updates using local private samples and large-scale parameter exchanges among agents impose severe privacy concerns and a communication bottleneck; to address these problems, Ding et al. [25] proposed two DP and communication-efficient algorithms. With a similar goal, Wei et al. [26] minimized FL training delay over wireless channels, constrained by overall training performance as well as each client's DP requirement.

Privacy definitions
It is known that DP has put private data analysis on a firm theoretical foundation [27,28]. We present the definitions of DP below.

Definition 1. (ε-DP [29]). For ε > 0, a randomized algorithm M is said to be ε-differentially private if, and only if, for any two adjacent sets D₁, D₂ and any S ⊆ Range(M), it holds that

Pr[M(D₁) ∈ S] ≤ e^ε Pr[M(D₂) ∈ S],     (2.1)

where ε is the privacy budget.
To attain privacy, the algorithm M must have a randomized component. A smaller privacy budget implies that the adversary has less capacity to discern the presence of any individual record from the output.

Definition 2. ((ε, δ)-DP [9]). For ε, δ > 0, a randomized algorithm M is said to be (ε, δ)-differentially private if, and only if, for any two adjacent sets D₁, D₂ and any S ⊆ Range(M), it satisfies

Pr[M(D₁) ∈ S] ≤ e^ε Pr[M(D₂) ∈ S] + δ.     (2.2)

Definition 3. (RDP [19]). For ε > 0 and α > 1, a randomized mechanism M satisfies (α, ε)-RDP if, and only if, for any two adjacent sets D₁, D₂ ∈ S, it holds that

D_α(M(D₁) ∥ M(D₂)) = (1/(α − 1)) log E_{x∼M(D₂)} [(M(D₁)(x)/M(D₂)(x))^α] ≤ ε.

RDP is strictly stronger than (ε, δ)-DP for δ > 0, and it enables more precise bounds for the composition of the Gaussian mechanism. The following result is employed to transform an RDP guarantee into a central DP guarantee.
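The RDP-to-DP conversion just mentioned can be sketched numerically. The snippet below is a minimal illustration (function names are our own), assuming the standard conversion ε = ε_RDP + log(1/δ)/(α − 1) from Mironov's RDP paper [19]:

```python
import math

def rdp_to_dp(alpha: float, eps_rdp: float, delta: float) -> float:
    """Convert an (alpha, eps_rdp)-RDP guarantee into (eps, delta)-DP
    via the standard conversion eps = eps_rdp + log(1/delta)/(alpha - 1)."""
    assert alpha > 1 and 0 < delta < 1
    return eps_rdp + math.log(1.0 / delta) / (alpha - 1.0)

# Example: a Gaussian mechanism with sensitivity 1 and sigma = 2 satisfies
# (alpha, alpha / (2 * sigma^2))-RDP for every alpha > 1; we minimize the
# converted epsilon over a grid of orders.
sigma = 2.0
delta = 1e-5
best = min(rdp_to_dp(a, a / (2 * sigma**2), delta) for a in range(2, 65))
print(f"best (eps, {delta})-DP over alpha in [2, 64]: eps = {best:.3f}")
```

Because the converted ε trades off the RDP term (growing in α) against log(1/δ)/(α − 1) (shrinking in α), minimizing over a grid of orders is the usual accounting step.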

The advantages of RDP over DP
DP employs the concept of sensitivity to quantify the level of data privacy exposure. Sensitivity refers to the maximum extent to which the outcome of a database query can be influenced by altering a single individual's data. RDP additionally offers enhanced flexibility and a wider range of choices through the incorporation of the order parameter α in its privacy measure.
To begin, RDP enables the selection of a suitable order α based on the application scenario, improving the adaptability of privacy protection to a range of data processing requirements. As α → 1, the Rényi divergence reduces to the KL divergence, while as α → ∞, RDP recovers pure ε-DP; larger orders are appropriate for instances with stricter privacy requirements and offer stronger privacy guarantees.
Conversely, smaller orders α correspond to looser privacy measures and therefore permit notably higher query accuracy.
Furthermore, compared to DP, RDP demonstrates superior efficacy in safeguarding privacy under certain attack models. In scenarios such as reconstruction attacks, or when numerous queries are allowed, plain DP accounting may not offer adequate privacy. By strategically selecting suitable α values, RDP yields tighter composed guarantees and hence better privacy protection.
Furthermore, RDP exhibits superior composability: the privacy parameters of composed mechanisms simply add at a fixed order α. This enables diverse privacy strategies to be integrated while preserving a tight overall privacy guarantee, which is of significant importance for advanced data analysis and secure data exchange.

Discrete Gaussian mechanism via subsampling
We state the definition of the discrete Gaussian mechanism and establish the relevant privacy guarantees. We first introduce the discrete Gaussian distribution.

Definition 4. (Discrete Gaussian distribution [31]). Let σ ∈ R with σ > 0. The discrete Gaussian distribution is a probability distribution on a discrete additive subgroup L. The discrete Gaussian distribution with scale σ is denoted N_L(σ); for x ∈ L, the probability mass on x is proportional to e^{−x²/(2σ²)}.
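Definition 4 can be made concrete with a toy sampler over the integers. This is only an illustrative sketch (exact samplers such as Canonne et al.'s avoid the truncation used here): we normalize the masses e^{−x²/(2σ²)} over a symmetric window wide enough that the discarded tail is negligible.

```python
import math
import random

def sample_discrete_gaussian(sigma: float, tail: int = 10) -> int:
    """Draw one sample from the discrete Gaussian N_Z(sigma):
    P[X = x] is proportional to exp(-x^2 / (2 sigma^2)) for integer x.
    Truncating the support at +/- tail*sigma discards negligible mass."""
    bound = int(math.ceil(tail * sigma))
    support = list(range(-bound, bound + 1))
    weights = [math.exp(-x * x / (2.0 * sigma * sigma)) for x in support]
    return random.choices(support, weights=weights, k=1)[0]

random.seed(0)
samples = [sample_discrete_gaussian(2.0) for _ in range(10_000)]
mean = sum(samples) / len(samples)
print(f"empirical mean (should be near 0): {mean:.3f}")
```

Because the distribution is symmetric around 0, the empirical mean of many samples concentrates near zero, and every sample is an integer, which is exactly what allows the noisy quantized gradients to stay in a discrete alphabet.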
To determine the magnitude of the introduced noise, it is necessary to bound the ℓ₂-sensitivity of the gradient aggregation. Within the context of DP, the amount of noise added is calibrated to the sensitivity of the function, defined as follows.
Definition 5. (ℓ₂-sensitivity [32]). Suppose that f : D → R^d is a function and D₁, D₂ are two adjacent datasets. The ℓ₂-sensitivity of f is defined as

∆f = max_{D₁,D₂} ∥ f(D₁) − f(D₂) ∥₂.

Lemma 2. (RDP for the discrete Gaussian mechanism [18]). A mechanism M that perturbs the output of another algorithm f with range(f) ⊆ L by adding discrete Gaussian noise, i.e., M(D) = f(D) + N_L(σ), satisfies (α, α(∆f)²/(2σ²))-RDP for all α > 1.

The discrete Gaussian mechanism is realized by adding discrete Gaussian noise to the output of a function evaluated on a sensitive dataset. Canonne et al. (2020) demonstrated concentrated DP for the discrete Gaussian mechanism and provided a thorough analysis of its privacy and utility properties [30]. For FL, Wang et al. (2021) proposed a discrete Gaussian-based RDP analysis with an analytical moments accountant [20].
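The ℓ₂-sensitivity of the gradient aggregation is controlled in practice by clipping each client's gradient. A minimal sketch of such a clipping step (our own helper, not the paper's code) is:

```python
import math

def clip_l2(vec, bound):
    """Rescale vec so that its l2 norm is at most `bound`.
    After clipping, replacing one client's gradient changes the sum by a
    vector of norm at most 2*bound, which bounds the l2-sensitivity."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, bound / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]

g = [3.0, 4.0]               # ||g||_2 = 5
clipped = clip_l2(g, 1.0)    # rescaled onto the l2 ball of radius 1
print(clipped)
```

Vectors already inside the ball are left untouched, so clipping biases only the largest gradients; the resulting bound is what calibrates the discrete Gaussian scale σ in Lemma 2.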
It is also important to exploit the randomness in subsampling. Wang et al. (2018) remarked that the discrete Gaussian exhibits a tight privacy amplification bound under subsampling, and indicated that RDP enables the discrete Gaussian mechanism to be composed tightly with an analytical moments accountant, which saves privacy budget in multi-round FL [33].

Our proposed SS-RDP-WTS scheme
This section presents our proposed SS-RDP-WTS, as shown in Figure 1.
The proposed system performs an iterative process in which the following steps are repeated: 1) The PS broadcasts the current model to a randomly selected subset of local clients, and each client performs local stochastic gradient descent on its own data to obtain an updated local gradient. 2) To tackle the communication bottleneck, the clients selected through subsampling compress their local gradients using multi-level stochastic quantization.

SS-RDP-WTS:
We study an FL system with a central server, N clients, and a wireless communication scheme. The clients communicate with the parameter server over a wireless Gaussian multiple access channel (GMAC). The pseudocode of the algorithm is shown in Algorithm 1.
Each client k has a local dataset D_k consisting of data points u_i^{(k)}, each associated with a label v_i^{(k)}. Its size is denoted |D_k|, so the total dataset participating in training is D_total = ∪_{k=1}^{K_t} D_k with |D_total| = Σ_k |D_k|. The clients communicate with the PS through the GMAC to train the model. Training minimizes the loss function

ℓ(w) = Σ_k (|D_k| / |D_total|) ℓ_k(w),

where w ∈ R^d is the parameter vector and ℓ_k(·) is the loss function of client k. The loss ℓ(w) is minimized iteratively by the stochastic gradient descent (SGD) algorithm. In training iteration t, the PS broadcasts the global parameter vector w_t to the clients, and each client k computes its local gradient g_k^t on its local dataset. The clients then stochastically quantize the gradient values into a discrete domain and apply the discrete Gaussian mechanism to ensure privacy. During quantization, the quantization parameters are optimized with respect to the capacity of the wireless GMAC. To streamline the presentation, we omit the iteration index t where it is unambiguous. In each iteration, each client quantizes its local vector g_k^t into b_k^t discrete levels, ensuring that the quantized value lies within the range [−g_k^max, g_k^max], where g_k^max is the clipping bound L.
For every integer φ in the range [0, b_k^t), the quantization boundaries are B_k(φ) = −g_k^max + φ · s_k/(b_k^t − 1), where b_k^t ≥ 2 is the quantization level of client k, it is usual to choose s_k = 2g_k^max, and B_k(φ) is the φ-th level. If g_{k,j} ∈ [B_k(φ), B_k(φ + 1)) for client k, the quantizer rounds g_{k,j} randomly to one of the two neighboring levels, with probabilities proportional to the distances, so that q_{k,j} is an unbiased estimator of g_{k,j}, i.e., E[q_{k,j}^t] = g_{k,j}^t. Additionally, the per-coordinate variance can be bounded in terms of the level spacing. After quantizing the complete gradient vector, the discrete Gaussian mechanism is employed; the continuous Gaussian mechanism is not suited for transmitting quantized local gradients, and incorporating discrete Gaussian noise is the alternative that preserves the discrete alphabet. If a noisy local value falls outside the quantization range, a preprocessing (truncation) step replaces it.
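The stochastic quantizer described above can be sketched as follows. This is a QSGD-style illustration under our stated assumptions (evenly spaced levels over [−g_max, g_max], rounding probability proportional to distance), not the paper's exact implementation:

```python
import random

def stochastic_quantize(g, g_max, b):
    """Unbiased stochastic quantization of each coordinate of g onto b
    evenly spaced levels in [-g_max, g_max].  E[q_j] = g_j (after
    clipping), and the per-coordinate variance is at most (step/2)^2
    with step = 2*g_max / (b - 1)."""
    step = 2.0 * g_max / (b - 1)
    out = []
    for gj in g:
        gj = max(-g_max, min(g_max, gj))     # clip into the range
        lo = (gj + g_max) // step            # index of the lower level
        frac = (gj + g_max) / step - lo      # position between the levels
        level = lo + (1 if random.random() < frac else 0)
        out.append(-g_max + level * step)
    return out

random.seed(0)
# Quantize 0.3 many times onto the 5 levels {-1, -0.5, 0, 0.5, 1}.
q = [stochastic_quantize([0.3], g_max=1.0, b=5)[0] for _ in range(20_000)]
print(f"E[q] should be near 0.3: {sum(q) / len(q):.3f}")
```

Each draw lands on one of the two levels bracketing the input (0.0 or 0.5 here), and averaging many draws recovers the input, which is the unbiasedness property E[q_{k,j}] = g_{k,j} used in the convergence analysis.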
Subsequently, client k applies a preprocessing function to obtain the codeword x_k^t = f_k^t(q_k^t) and transmits it to the PS. In our paper, we consider a random set of participants K_t obtained by uniform subsampling: client k participates in the training process during iteration t with probability P, and when |K_t| = N all clients are engaged in training. The signal generated by client k during iteration t is perturbed by noise n_{k,t} ∼ N_L(0, σ²_{k,t} I_d).

Parameter Server: When clients transmit x_k^t to the PS through the wireless GMAC, a total of d log₂ b_k^t bits are required to represent the quantized gradient vector, so the transmission rate of client k is r_k^t = d log₂ b_k^t, which must adhere to the GMAC capacity region [30], i.e., Σ_{k∈K_t} r_k^t ≤ C_{K_t}, where C_{K_t} is the sum capacity of the wireless GMAC over the subset K_t. Suppose that the channel inputs during each iteration adhere to the average power constraint ∥x_k^t∥₂² ≤ N P_k, ∀k. Furthermore, C_{K_t} = 0.5 log(1 + Σ_{k∈K_t} p_k / N₀). The input-output relationship over the GMAC can be expressed as

y_t = Σ_{k∈K_t} x_k^t + m_t,

where x_k^t ∈ R^d is the signal transmitted by client k at iteration t and m_t ∼ N_L(0, N₀) is the independent and identically distributed (IID) channel noise.
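The capacity constraint on the quantization level can be checked numerically. The sketch below uses log base 2 so that capacity is in bits per channel use (a convention we assume; the helper names are ours), with the power values p₁ = 320, p₂ = 80 used later in the experiments:

```python
import math

def gmac_sum_capacity(powers, n0):
    """Sum capacity of a Gaussian MAC with per-client powers p_k and
    noise power N0: C = 0.5 * log2(1 + sum(p_k) / N0) bits/channel use."""
    return 0.5 * math.log2(1.0 + sum(powers) / n0)

def max_quant_level(d, n_channel_uses, capacity):
    """Largest quantization level b such that the d*log2(b) bits of a
    quantized gradient fit into n_channel_uses uses of the channel."""
    bits_budget = n_channel_uses * capacity
    return int(2 ** (bits_budget / d))

C = gmac_sum_capacity([320, 80], n0=1.0)
print(f"sum capacity: {C:.2f} bits/use; "
      f"max b for d=1000 over 5000 uses: {max_quant_level(1000, 5000, C)}")
```

The ratio of channel uses to model dimension (s = 5d in the experiments) thus directly caps how fine the quantization grid can be.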
After the PS receives the signal y_t, its objective is to recover the average of the true gradients by applying a post-processing function h_t(·); this average can be represented by ĝ_t = Σ_{k∈K_t} y_k / (µ|K_t|). Nevertheless, due to the preprocessing and post-processing steps, as well as the GMAC capacity constraints, the PS can only decode and recover the perturbed local gradients.

Theoretical analysis
In this section, we present the primary results of our proposed approach, analyzing the RDP properties of the subsampled private wireless transmission system. Specifically, we discuss the privacy guarantee outlined in Theorem 1, and we introduce the privacy-convergence trade-off of the scheme in Theorem 2.

Privacy analysis under SS-RDP-WTS
The following theorem elucidates the RDP guarantee associated with the discrete Gaussian mechanism within the subsampling privacy model. First, a subset of K_t ≤ N clients is selected from the larger set of clients. This subsampling is performed independently and uniformly, with each client having an equal likelihood of being selected; the probability of selection is P = K_t/N. Subsequently, within the chosen subset, each client k quantizes its local gradient and then applies the discrete Gaussian mechanism.
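The amplification effect of the sampling probability P can be illustrated with the classical bound for pure ε-DP. Note this is a simplified stand-in: it is not the tighter subsampled-RDP bound of Wang et al. [33] used by the accountant, just a sketch of why smaller P helps.

```python
import math

def amplified_eps(eps: float, p: float) -> float:
    """Classical privacy amplification by subsampling for pure eps-DP:
    running an eps-DP mechanism on a uniformly subsampled fraction p of
    the data satisfies log(1 + p*(e^eps - 1))-DP.  (The subsampled RDP
    bounds of Wang et al. are sharper but considerably more involved.)"""
    return math.log(1.0 + p * (math.exp(eps) - 1.0))

for p in (1.0, 0.1, 0.01):
    print(f"P = {p:>4}: eps 1.0 -> {amplified_eps(1.0, p):.4f}")
```

For small P the amplified budget behaves like P·ε, which matches the experimental observation that lowering the subsampling probability strengthens the guarantee.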
Theorem 1. Let the clipping bound be L, the noise scale be σ, and the quantization level be b_k with b_k ≥ b_min. The proposed scheme satisfies (α, ε′(α))-RDP.

Proof. According to Lemma 2, the discrete Gaussian mechanism adheres to (α, α(∆f)²/(2σ²))-RDP. Next, we bound the sensitivity for client k. Note that y_t = Σ_{k∈K_t} g_k^t + Z_t, where Z_t is the effective aggregated Gaussian noise, and the local gradients are clipped so that ∥g_k^t∥₂ ≤ L, where L is the Lipschitz parameter. According to [20], the quantized ℓ₂-sensitivity admits an upper bound ∆f determined by L and the quantization step. Thus, it suffices to show that the composition of quantization and the discrete Gaussian mechanism is (α, α(∆f)²/(2σ²))-RDP. Applying the subsampling amplification result of Theorem 9 in [15] with subsampling probability P, our scheme satisfies (α, ε′(α))-RDP. This completes the proof of Theorem 1.
Remark: Our proposed scheme offers a more robust privacy guarantee than prior research. This can be illustrated by the following factors: 1) The proposed scheme adheres to RDP principles, which offer enhanced privacy measures and tighter composition, as shown in [19].
2) As a result of implementing privacy amplification through subsampling and the addition of discrete Gaussian noise, our scheme maintains a reduced noise level.

The convergence rate on SS-RDP-WTS
Theorem 2. Assume the loss function ℓ(·) is λ-strongly convex and µ-smooth with an L-Lipschitz gradient, and the learning rate is η_t = 1/(λt). Then the convergence rate satisfies the bound stated below.

Proof. In iteration t, the PS averages the received K_t gradients to form ĝ_t; after post-processing, we obtain g̃_t = f(ĝ_t). Note that |K_t| is a binomial random variable, with each client selected independently with subsampling probability P = K_t/N. We then bound the second moment of the gradient estimate, where (a) follows from the fact that for any two vectors m, n ∈ R^d and ρ > 0, 2⟨m, n⟩ ≤ ρ∥m∥₂² + ρ⁻¹∥n∥₂², and (b) follows from the local gradients satisfying the Lipschitz condition, ∥g_k^t∥₂ ≤ L, ∀k.
According to the variance expression of the gradient estimate and properties of expectation, we obtain the stated bound, where (c) follows from the variance bound on E[(q_{k,j}^t − g_{k,j}^t)²]. This completes the proof of Theorem 2.
Remark: The convergence rate of our scheme is obtained by applying established convergence results for SGD, as in references [34,35], and then computing the parameters necessary to ensure convergence. Furthermore, we demonstrate that the convergence rate can be improved by adjusting the quantization level, subsampling probability, and noise parameters within a given privacy budget and communication constraint. The per-round convergence of SS-RDP-WTS improves thanks to the tighter structure of subsampled RDP at a given noise level. Simultaneously, discrete Gaussian noise offers more robust privacy assurances at equivalent noise levels.
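The step-size schedule η_t = 1/(λt) from Theorem 2 can be exercised on a toy λ-strongly convex objective. This sketch (our own toy, with a made-up noisy gradient oracle) only illustrates the schedule's behavior, not the paper's full system:

```python
import random

def sgd_strongly_convex(grad, w0, lam, steps):
    """SGD with the step size eta_t = 1 / (lam * t) used in Theorem 2's
    setting (lam-strongly convex objective); returns the final iterate."""
    w = w0
    for t in range(1, steps + 1):
        w -= (1.0 / (lam * t)) * grad(w)
    return w

# Toy objective f(w) = 0.5 * lam * (w - 3)^2 with a noisy gradient oracle.
random.seed(1)
lam = 1.0
noisy_grad = lambda w: lam * (w - 3.0) + random.gauss(0.0, 0.1)
w = sgd_strongly_convex(noisy_grad, w0=10.0, lam=lam, steps=5000)
print(f"iterate after 5000 steps (optimum is 3.0): {w:.3f}")
```

With this schedule the gradient noise is averaged down at rate 1/t, which is the mechanism behind the O(1/t)-type rates cited from [34,35]; quantization and privacy noise enter the bound through the variance term.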
To characterize the convergence of our scheme, the optimal quantization level and noise parameters maximizing the convergence rate in Theorem 2 are the solutions to an optimization problem; however, that problem is an instance of constrained integer nonlinear programming (INLP). To simplify the computation, we fix the noise parameters, so the optimization reduces to a search over quantization levels subject to the capacity constraint.

Computation cost analysis of SS-RDP-WTS: Our approach requires each client to compute its gradient, whose cost depends on the client's dataset size and the model complexity. Typically, gradient computation on the client side is a reasonably low-cost process, as it only entails a pass over local data and updates to model parameters. The discretization operation partitions the gradient values into distinct levels, enhancing the privacy preservation of the gradient; it is generally inexpensive because it mainly involves mapping gradient values to discrete levels. To safeguard individual privacy, noise is then introduced by perturbing the discretized gradient, a commonly employed and cost-effective procedure. Our scheme achieves this by directly injecting noise into the gradient.
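With the noise parameters fixed, the remaining integer search is small enough to brute-force. The sketch below is a toy stand-in for the constrained INLP (the objective is our illustrative variance proxy, not the paper's exact expression): minimize quantization variance plus a fixed noise term subject to the rate constraint d·log₂(b) ≤ s·C.

```python
import math

def best_quant_level(d, s, capacity, g_max, sigma, b_max=4096):
    """Brute-force the integer quantization level b: minimize a
    quantization-variance proxy (step/2)^2 plus fixed noise variance,
    subject to the rate constraint d * log2(b) <= s * capacity."""
    best_b, best_obj = None, float("inf")
    for b in range(2, b_max + 1):
        if d * math.log2(b) > s * capacity:
            break                                   # rate budget exhausted
        obj = (2.0 * g_max / (b - 1)) ** 2 / 4 + sigma ** 2
        if obj < best_obj:
            best_b, best_obj = b, obj
    return best_b

# d*log2(b) <= 2000 * 3 = 6000 bits with d = 1000 caps b at 64.
print(best_quant_level(d=1000, s=2000, capacity=3.0, g_max=1.0, sigma=1.0))
```

Since the variance proxy is decreasing in b while the rate constraint is increasing, the optimum sits at the largest feasible level; with richer objectives (e.g., privacy-dependent noise) the same loop still works because the feasible set is tiny.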
Communication cost analysis of SS-RDP-WTS: The first step is subsampling: each client's participation in training is determined according to the probability P, and the participation decision is communicated to the server. The communication overhead of this step is minimal, as it entails transmitting a single binary decision. The second component is gradient transport, wherein each participant communicates its discretized, noise-injected gradient to the server over the wireless GMAC; this overhead depends on the dimensionality and precision of the gradient. Lastly, the server receives the gradients from all participating clients and computes the global average gradient; this cost depends on the number of clients and the size of the gradient data.
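The per-round uplink cost described above can be tallied directly. This is a rough accounting sketch under our own simplifying assumptions (a 1-bit participation flag per client, d·⌈log₂ b⌉ bits per transmitted gradient), using the MNIST dimensionality d = 44,426 from the experiments:

```python
import math

def round_comm_bits(n_clients, p, d, b):
    """Rough per-round uplink cost: every client sends a 1-bit
    participation flag; each sampled client additionally sends one
    quantized d-dimensional gradient of d * ceil(log2(b)) bits."""
    sampled = round(n_clients * p)
    return n_clients * 1 + sampled * d * math.ceil(math.log2(b))

print(round_comm_bits(n_clients=10, p=0.2, d=44_426, b=64))
```

The flag term is negligible next to the gradient term, so the dominant knobs are exactly the ones the scheme tunes: the sampling probability P and the quantization level b.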

Performance evaluation
In this section, we evaluate the efficacy of our proposed strategy. We examine two learning tasks with convolutional neural networks trained on the CIFAR10 and MNIST datasets, employing a cross-entropy loss function. The model dimensionality is d = 62,006 on CIFAR10 and d = 44,426 on MNIST. For CIFAR10 training, we use IID partitions over 10 clients, while for MNIST we employ non-IID partitions over 10 clients.
Furthermore, a subsampling probability must be fixed. We set δ = 10⁻⁵ for the different subsampling probabilities P; in particular, P = 1 indicates that all clients are involved in the training process. Let σ_{k,t} = 1, ∀t ∈ [T], be the Gaussian noise scale. Each client clips its local gradient using an empirically determined Lipschitz constant L = 1. RDP accounting is performed with Google's DP library. In the experiments, we tested α in the range of two to 64, along with the values 128, 256, and 512, identifying the minimum resulting ε. The minimum quantization level is b_min = 101 in Figure 2, while b_min = 64 in Figures 3 and 4. To facilitate comparison of different bounds, the RDP bounds are translated into the (ε, δ)-DP framework. Figure 2 shows that the values of ε corresponding to P = 1 and P = 0.1 are relatively high, indicating weaker privacy protection; reducing the subsampling probability improves the privacy guarantee. In a real wireless network environment, a large number of clients participate in training, so it is rational to choose a small subsampling probability. Figure 3 examines the relationship between model accuracy and noise level while keeping the sampling probability constant. Here, we select two clients for participation in training, with average transmit power limits of p₁ = 320 and p₂ = 80 for the respective datasets.
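The α-grid search used in the experiments can be sketched as follows. This is a simplified accountant (subsampling amplification and the discrete-Gaussian specifics are omitted; we assume the standard Gaussian-mechanism RDP ε(α) = α∆²/(2σ²) composed additively over T rounds, then converted to (ε, δ)-DP):

```python
import math

ALPHAS = list(range(2, 65)) + [128, 256, 512]   # grid from the experiments

def min_eps(T, sigma, sensitivity, delta, alphas=ALPHAS):
    """Compose T rounds of a Gaussian mechanism in RDP,
    eps(alpha) = T * alpha * sensitivity^2 / (2 * sigma^2), convert each
    to (eps, delta)-DP, and return the minimum over the alpha grid."""
    return min(T * a * sensitivity ** 2 / (2.0 * sigma ** 2)
               + math.log(1.0 / delta) / (a - 1)
               for a in alphas)

print(f"{min_eps(T=100, sigma=4.0, sensitivity=1.0, delta=1e-5):.3f}")
```

Sweeping the grid rather than fixing one order is what lets the reported ε stay tight as T grows: the minimizing α shifts with the number of composed rounds.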
The number of channel uses in the MAC is set to s = 5d for each iteration, and the minimum quantization level is b_min = 64. To preserve the quantized attributes, the private local model is truncated whenever it exceeds the truncation threshold q_max + 3σ_{k,t}, where q_max denotes the largest quantized gradient value. The findings illustrate that the magnitude of the discrete Gaussian noise affects the convergence rate outlined in Theorem 2 once the model accuracy has converged; as the noise magnitude decreases, precision increases. Under the various privacy budgets and the same subsampling probability, the accuracy of our scheme gradually approaches that of the scheme without privacy. Furthermore, the post-convergence accuracy improvement of our method is larger on the MNIST dataset than on CIFAR10, and the model converges with a smaller privacy budget on the CIFAR10 dataset.

Conclusions
In this paper, we presented a novel SS-RDP-WTS that aims to minimize communication costs and enhance the privacy of FL. In the FL framework with a MAC, the model gradient is compressed using multi-level stochastic gradient quantization to address the constraint of limited communication resources; together with subsampling, this compression minimizes communication costs and improves communication efficiency. Additionally, the scheme introduces discrete Gaussian noise and leverages the privacy amplification effect of subsampling to strengthen privacy protection, exploiting the tight composition properties of RDP. The theoretical analysis of the subsampled private wireless transmission scheme covered its convergence and privacy bounds. The empirical findings showed that the scheme enhances the efficacy of the FL algorithm while mitigating communication burdens and safeguarding data confidentiality.

Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
3) To further enhance the algorithm's privacy protection, discrete Gaussian noise is added to the quantized, discretized gradient. 4) The noisy model parameters are transmitted through the wireless Gaussian multiple access channel (GMAC) to improve the communication efficiency of the model. 5) The PS averages the uploaded model parameters to obtain a new global model.

Figure 2 .
Figure 2. Privacy measured by ε for different subsampling probabilities across iterations.

Figure 3 .
Figure 3. Model accuracy measured for different noise levels across iterations.

Figure 4 .
Figure 4. Model accuracy measured for different privacy budgets across iterations.

Figure 4 shows the model's accuracy under the various privacy budgets, with Gaussian noise σ_{k,t} = 1, ∀t ∈ [T].

Algorithm 1: SS-RDP-WTS
Input: N, the total number of clients; η_t, the learning rate; n_{k,t}, the noise of client k; m_t, the channel noise; b_k^t, the quantization level.
1: Subsample a subset of clients K_t ⊂ [N] with |K_t| = PN and broadcast w_{t−1} and g^max
2: Train the local model w_k^t with w_t as initialization
3: w_k^t ← ClientUpdate(k, w_{t−1}) /* Local model updating */