Efficient Verifiable Protocol for Privacy-Preserving Aggregation in Federated Learning

Federated learning has gained extensive interest in recent years owing to its ability to update model parameters without obtaining raw data from users, which makes it a viable privacy-preserving machine learning model for collaborative distributed learning among various devices. However, because adversaries can track and deduce private information about users from shared gradients, federated learning is vulnerable to numerous security and privacy threats. In this work, a communication-efficient protocol for secure aggregation of model parameters in a federated learning setting is proposed, where training is done on user devices while the aggregated trained model is constructed on the server side without revealing the raw data of users. The proposed protocol is robust against user dropouts, and it enables each user to independently validate the aggregated result supplied by the server. The suggested protocol is secure in an honest-but-curious environment, and privacy is maintained even if the majority of parties are in collusion. A practical scenario for the proposed setting is discussed. Additionally, a simulation of the protocol is evaluated, and the results demonstrate that it outperforms one of the state-of-the-art protocols, especially when the number of dropouts increases.

I. INTRODUCTION
Federated learning has been actively researched in the last five years [1], [2] as a collaborative way to perform machine learning tasks among many clients, possibly mobile devices, without the data leaving the clients, which preserves their privacy. The service provider in this setting only orchestrates the clients, receives the local models' parameters, updates the global aggregated model, and ensures its validity, whether it is a deep-learning model [1], a tree-based model [3], [4], or any other model type.
Federated learning (FL) faces many challenges [2], such as device heterogeneity, limited resources, availability, and communication overhead. In addition, federated learning faces security and privacy threats from external malicious actors that might perform model update poisoning or traditional data poisoning. A wide range of threats and attacks are reviewed in [5] and [6]. Mitigating these kinds of poisoning attacks is challenging because it is difficult to distinguish between honest and malicious updates [7], [8]. An adversarial server also represents a threat to a federated learning setting, as a curious actor at the server side might reverse-engineer the local model parameters received at the server to reveal private data.
In order to do this aggregation in a secure manner, researchers studied numerous possibilities [9], including using homomorphic encryption, differential privacy, secure multiparty computation protocols [10], and trusted execution environments [2].
Each of these techniques has its own pros and cons. While homomorphic encryption provides a private solution by aggregating encrypted data from clients, prohibiting the server from reverse-engineering the model's weights or discovering training data, it is computationally expensive and impractical for the majority of applications [11].
It also does not address the problem of dropouts, which occur when some devices leave the network due to connectivity problems or battery depletion. Using differential privacy alone does not address dropouts either and faces the privacy-utility trade-off challenge, but it can operate in the shuffled model, where a trusted third party shuffles the noisy client updates before forwarding them to the server [12].
Research on secure multi-party computation protocols and their applications started long ago [10], [13]. However, state-of-the-art secure multi-party computation protocols based on secret sharing for federated learning [14], [15], [16] still face challenges with the communication overhead incurred in the aggregation and verification process. They also cannot accommodate the cross-silo setting where client devices do the local training, but these clients belong to different organizations that are required to ensure their clients' privacy. Examples of this scenario include healthcare organizations, different banks, or multiple operating vehicular ad hoc networks (VANETs) [17]. The proposed protocol addresses these two issues for secure multiparty computation of the gradient vector. In the next subsection, the contributions of our research work are emphasized.

A. Our Contributions
Our protocol primarily addresses the fundamental security challenges in federated learning: the confidentiality of local gradients and the verifiability of aggregation. It does so by incorporating auxiliary nodes that represent organizations such as hospitals, banks, or VANETs. These auxiliary nodes participate in the protocol to help keep the client nodes' data private; further explanation is provided in Section V. The contributions of this research work to the area of privacy-preserving and verifiable aggregation based on secure multi-party computation can be summarized as follows:
• An efficient, verifiable privacy-preserving aggregation algorithm is proposed. It relies on lightweight primitives.
• To ensure the confidentiality of the user's local gradients, a single-masking protocol is used in our scheme instead of the double-masking protocol used in most of the literature.
• For the verification of the aggregated result, we use the concept of double aggregation, which is very lightweight in computation compared to the cryptographic primitives used in most of the research work in the literature.
• The algorithm is compared in terms of communication, computation, and storage complexities to existing algorithms.
• Performance evaluation and analysis of the proposed algorithm under varying numbers of auxiliary nodes and dropout percentages is presented.
We mainly focus on the setting where several organizations cooperate, and each organization has several users. The organizations need to collaborate to train a global model on all users' private data without the privacy of the users' data being violated, either by the organization they are affiliated with or by other organizations. Our scheme allows each organization to participate in the protocol as an auxiliary node, which lets each organization guarantee its users' privacy.

B. Organization
The rest of the paper is organized as follows. Section II presents related work in the literature. In Section III, the background needed for the approach in the paper is briefly discussed. Section IV presents an application scenario for the proposed protocol in the healthcare domain and illustrates the system architecture of the proposed scheme and its threat model. Section V explains the protocol steps and the handling of dropouts. It also addresses the verifiability of the aggregated result. Security analysis for the protocol is presented in Section VI. Evaluation of the performance of the proposed algorithm is discussed in Section VII. Finally, the paper is concluded in Section VIII. To make the paper easy to follow, we summarize the mathematical symbols and notations used in the paper in Table I.

II. RELATED WORK
Our research contributes to two areas: secure aggregation and verification of server work in federated learning. In this section, we briefly review recent related research work in these two areas.

A. Secure Aggregation
Secure aggregation in federated learning (FL) refers to the aggregate computation of the sum of local models' parameters updates in a secure way without learning any information about the personal private data that produced these parameters. This has been done in the literature in various ways that differ in terms of computation complexity, communication latency, and how they deal with the problem of dropout nodes which is a common problem in federated learning settings. This research area has been actively researched in the last five years. In this section, only a limited number of examples of the research work using differential privacy, homomorphic encryption, secret sharing, and other secure multi-party computation techniques are reviewed.
1) Using Differential Privacy (DP): The authors in [18] proposed using a local differential privacy mechanism to update the local weights of a deep neural network adapting to the varying ranges of weights at different layers. They used parameter shuffling aggregation to bypass the curse of dimensionality to avoid privacy budget explosion. In a similar way, the authors in [19] used local differential privacy to add noise to the local models' parameters before aggregation. They analyzed the compromise between convergence performance and privacy protection levels. They showed that increasing the number of users participating in FL can increase the model convergence and emphasized the trade-off between the model convergence and the privacy-protection level. Applying local differential privacy at the local models has the advantage of less communication time needed as only the differentially private local model parameters are exchanged. However, this approach requires a large number of participating users, and it isn't evaluated for the dropouts effect. To alleviate some of these problems, Kairouz et al. [20] proposed adding discrete Gaussian noise before performing secure aggregation and after discretization of the user model updates. The authors in [21] depended on a distributed Laplace perturbation mechanism which is more efficient in terms of noise generation time. A problem with the approach in [20] is that privacy guarantees degrade as the dropped-out users increase. In [22], the authors combine the addition of Gaussian noise with a learning with errors (LWE)-based masking protocol that substantially reduces the communication complexity required to add large vectors. The authors in [23] also achieved low communication overhead with a training mechanism that requires flexible participation of clients. In [19], the authors used differential privacy to protect privacy by adding artificial noise to parameters at the client's side before aggregation. 
The study explored the relationship between convergence performance and levels of privacy protection. In [24], a comparison was made between FL and local differential privacy in terms of efficiency and privacy loss. However, the performance of applying local differential privacy to FL was not investigated. The work in [25] introduced a local differential privacy FL framework for industrial-grade text mining, demonstrating that it could provide data privacy and model accuracy. In [26], the authors describe a hybrid approach that combines differential privacy and SMC to achieve a balance between accuracy and vulnerability to inference attacks. The goal is to address the potential for low accuracy when using differential privacy and the vulnerability to inference associated with SMC.
2) Using Homomorphic Encryption: Homomorphic encryption (HE) has been actively researched for use in multiparty computation for deep learning tasks and then in federated learning [27], [28], [29], [30], [31], especially after it succeeded in supporting approximate arithmetic over encrypted data [32], which means users can send their gradients encrypted to be added while keeping them private.
Phong et al. [33] used additively homomorphic encryption in asynchronous stochastic gradient descent training for a neural network. Truex et al. [27] combined additively homomorphic encryption with DP, but their approach cannot handle client dropouts. Using HE results in a significant runtime overhead, which can be seen as impractical for real-world FL. Using a batch encryption technique, BatchCrypt [28] reduces the encryption and transmission overhead of HE-based aggregation and only requires a single round of communication. To safeguard model parameters, the authors in [29] proposed using an HE approach that can directly execute arithmetic operations on ciphertexts without decryption. Based on a lightweight symmetric homomorphic encryption, the authors in [30] proposed an efficient and verifiable cipher-based matrix multiplication algorithm to ensure training security in a completely decentralized framework. In [34], the authors proposed a federated learning approach that prioritizes privacy using a multi-key homomorphic encryption protocol. The approach encrypts model updates with an aggregated public key before aggregating them on the server. Decryption requires collaboration from all participating devices, preventing unauthorized access to the participants' data. The authors of [35] combined ternary gradients federated learning with secret sharing and homomorphic encryption techniques to develop privacy-preserving protocols to protect against semi-honest adversaries. However, the computational burden of HE renders it inapplicable for real-world training with FL and negatively affects scalability.
3) Using Secret Sharing: Bonawitz et al. [36] presented FL's secure aggregation. Their protocol can withstand client dropouts. To prevent access to local models, they employed blinding with random values, Shamir's Secret Sharing (SSS), and symmetric encryption. However, their aggregation needs at least four communication cycles every iteration between each client and the aggregator. This imposes a severe burden on clients with limited resources and WAN connections. VerifyNet [15], and VeriFL [14] modified the protocol of Bonawitz et al. [36]. Authors in [15] added verifiability on top of the protocol in [36] to guarantee the correctness of the aggregation, and in [14], the authors reported optimization of the communication and computation overhead in case of a large number of dropouts as it is always the case in a federated learning setting. However, these protocols rely on a trusted party to generate public/private key pairs for all clients. SAFE-Learn [37] introduced a generic design for efficient private aggregation for FL to overcome the aforementioned problems since their proposed protocol needs only two communication rounds in each iteration, it does not rely on expensive cryptographic primitives on client devices, and there is no need to trust a third party. The authors in [38] mixed masking using random keys while supporting quantization-based model compression to boost communication efficiency. They relied on hardware-assisted trusted execution environments (TEE) for verification which requires extra costs. Blockchain technology can be used to secure federated learning and introduce device and model trust as demonstrated in [39] and [40]. In [41], the authors proposed a secure aggregation protocol that is robust to client dropouts using a novel multi-secret sharing scheme based on Fast Fourier Transform (FFT). 
A new framework for secure aggregation was introduced in [42], which uses a multigroup circular strategy and additive secret sharing for model aggregation.
Our proposed protocol belongs to this category, but it uses lightweight primitives and a single-masking protocol, as will be discussed in detail in Section V, without depending on a TEE as in [38], trusting a third party as in [14] and [36], or using expensive cryptographic primitives as in [36].

B. Verification
As the service provider may return incorrect results to the users either deliberately or due to unexpected situations, it is recommended that client devices have the ability to verify the aggregated model parameters sent by the service provider. The authors of VerifyNet [15] proposed that the server send the aggregated result together with a proof to each client device. They utilized a homomorphic hash function and pseudorandom generation to provide verifiability for each user. Modifications to this technique were made in [14] and [16] to decrease communication overhead and computational complexity, respectively. However, these techniques [14], [15] were analyzed in a recent publication [43], and it was pointed out that they still face some security vulnerabilities if the server colludes with a malicious user. In [43], the authors used a linear homomorphic hash and a digital signature to achieve traceable verification of the aggregation results and to identify the epoch at which the results went wrong, but at the cost of increased communication overhead. Luo et al. [44] used a basic signature method for the problem of verification, where each client only needs to verify an aggregated signature whose size is independent of the number of clients. Each client then unmasks the aggregated gradient, updates the parameters of its local model, and proceeds to the next iteration. It was claimed in [30] that integrity verification is guaranteed for every model training step using their aggregation method. In contrast, SafetyNets [45] used interactive proof techniques to verify the accuracy of the aggregated result supplied by the server. In [46], a verifiable system is offered to perform verification, similar to [38], based on trusted hardware such as SGX, TrustZone, and Sanctum. However, these techniques support only a limited number of activation functions or demand additional hardware.

III. PRELIMINARIES
To make the article easy to follow, we explain some cryptographic primitives used in our approach, which should facilitate understanding the proposed protocol.

A. Key Agreement
A key agreement algorithm allows any party $u$ to combine its private key $s^{SK}_u$ with the public key $s^{PK}_v$ of any party $v$ to obtain a private shared key $s_{u,v}$ between $u$ and $v$. We use the Diffie-Hellman key agreement in our protocol to generate the shared key (seed) between each user and each auxiliary node. Specifically, given a group $G$ of prime order $q$ with generator $g$, each user can agree with each auxiliary node on a shared secret as follows:
• Each user chooses a secret key $U_{SK}$, generates its public key as $g^{U_{SK}} \bmod p$, and shares the public key with the server.
• Each auxiliary node chooses a secret key $A_{SK}$, generates its public key as $g^{A_{SK}} \bmod p$, and shares the public key with the server.
• The server broadcasts the public keys to the parties.
• The shared key is now computed as $key = (g^{A_{SK}})^{U_{SK}} \bmod p = (g^{U_{SK}})^{A_{SK}} \bmod p$.
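The key agreement steps can be illustrated with a minimal Python sketch. The modulus `P`, the base `G`, and the helper names `keygen` and `agree` are illustrative assumptions, not parameters taken from the paper; a real deployment would use a standardized group or an elliptic curve.

```python
import secrets

# Toy parameters (assumption): a Mersenne prime modulus and a small base.
P = 2**127 - 1
G = 3

def keygen():
    """Return a (secret, public) pair with public = G^secret mod P."""
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def agree(own_sk, other_pk):
    """Derive the shared key from one's own secret and the peer's public key."""
    return pow(other_pk, own_sk, P)

user_sk, user_pk = keygen()  # plays the role of U_SK and g^U_SK mod p
aux_sk, aux_pk = keygen()    # plays the role of A_SK and g^A_SK mod p

# Only the public keys travel through the server; secrets stay with their owners.
key_at_user = agree(user_sk, aux_pk)
key_at_aux = agree(aux_sk, user_pk)
```

Both sides end up with the same value, which can then serve as the seed shared between that user and that auxiliary node.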

B. Symmetric Encryption
Symmetric encryption uses a single key for both encryption and decryption. Given the key $SK$ and the information $x$ to be encrypted, the encrypted information is obtained by the algorithm $AE.enc(SK, x) \rightarrow X$. The ciphertext $X$ can be decrypted by the algorithm $AE.dec(SK, X) \rightarrow x$. In our model, we use symmetric encryption to exchange messages between auxiliary nodes and users through the server without the server being able to violate the confidentiality of the messages. We rely on this technique to avoid establishing private channels between each user and each auxiliary node. The messages are encrypted and sent to the server, which forwards them to the users.

C. Pseudo-Random Generator
We employ a secure pseudo-random generator (PRG) that takes a seed and produces a random number as an output. The PRG has to preserve two properties:
• The output must be computationally indistinguishable from a uniform element sampled from the output space as long as the seed is hidden from the distinguisher.
• The exact same output is generated using the same seed.
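The second property (determinism under a fixed seed) can be illustrated with a minimal Python sketch. Using SHA-256 over the seed and round number as the generator, and the output space `MODULUS`, are assumptions made for illustration; the paper does not specify a concrete PRG.

```python
import hashlib

MODULUS = 2**32  # assumed output space for the generated numbers

def prg(seed: int, round_no: int) -> int:
    """Derive a round-specific number from a seed (hash-based stand-in for a PRG)."""
    digest = hashlib.sha256(f"{seed}|{round_no}".encode()).digest()
    return int.from_bytes(digest, "big") % MODULUS

# Two parties holding the same seed obtain the same value in the same round.
value_at_user = prg(123456789, 1)
value_at_aux = prg(123456789, 1)
```

Because the round number is part of the input, the same seed can be reused across rounds while still yielding a fresh value per round.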

IV. SYSTEM ARCHITECTURE

A. General System Architecture and Threat Model
As shown in Figure 1, our system model consists of three entities: auxiliary nodes, users, and the server.
• Auxiliary Nodes: These nodes are a set of nodes that can't all collude together and are keen on the privacy of the data of the users so that the server can't reveal the machine learning local model parameters and analyze them to learn about the users' training data. These auxiliary nodes can represent organizations such as hospitals or medical entities running research in the healthcare domain or banks in the banking domain. They are robust and don't participate in any training process. Their main job is to agree with the users on shared keys used as seeds for generating the random numbers used in masking the gradients. In each protocol round, each auxiliary node agrees with each user in the system on a distinct secret random key. This key agreement would typically happen without directly communicating with users. Each auxiliary node would then compute the sum of all the random numbers at its disposal and send the summation to the server.
• User: Each user sends its local gradients to be aggregated securely at the server without revealing these gradients, as they can be used in a reverse engineering setting to trace back the users' data. The users will use the random numbers shared with the auxiliary nodes to mask their private gradients. Finally, each client verifies that the server has computed the correct aggregation.
• Server: The cloud server aggregates the masked gradients uploaded by all online users and the summation of all the random numbers uploaded by the auxiliary nodes. As a result, the server will aggregate all the local gradients without revealing each user's gradient.
The following are the assumptions in our threat model:
• All participants will follow the protocol steps, but they may try to infer other users' private data.
• The server could collude with up to $|N| - 2$ users.
• The server could collude with up to $|M| - 1$ auxiliary nodes.
• The server could return a modified version of the aggregated result to the users.
The protocol aims to protect the confidentiality of users' local gradients while enabling each user to verify the aggregated result returned by the server. Using our protocol, for the server to violate a user's privacy, it has to collude with all the auxiliary nodes or collude with $|N| - 1$ users, where $|N|$ is the number of users.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

B. Healthcare Application Scenario
In the healthcare domain, AI applications that are provided by different service providers are facing challenges and receiving critiques on the privacy of patients' or users' data [47], which is stored and kept at the service provider to learn from. Federated learning [48] was proposed to solve this problem to help learn a model from users' data without the data leaving the users' devices to assure its privacy. Since then, research has been going to improve issues with the security of the federated learning approach, such as the problem discussed in this paper of having a server that will aggregate the local model updates and send an updated global model but may be curious to analyze the gradients and parameters received from the clients to learn more about their personal data. The proposed scheme presented in section V sends these gradients with certain random perturbations to the service provider with the help of auxiliary nodes as will be specified. Additionally, verifying the aggregated model by the server is a challenge as the server may return incorrect results to reduce its costs or to have an edge over other competitors.
An example scenario in the healthcare domain where the proposed scheme can fit is shown in Figure 2. An application service provider for an AI application/study, in the middle of the figure, orchestrates the federated learning process for learning from private health data such as different biosignals, personal attributes, and possibly medications at the client devices. In this paper, we use the terms users and client devices interchangeably. All participants (service providers, hospitals engaged in the study/application, and users through their client devices, i.e., cellular phones or wearable devices) follow the protocol correctly. However, there are some threats that the proposed protocol can handle: the service provider may reverse-engineer the model updates to infer users' private data, it may provide wrong information about dropout users, and it may attempt to manipulate the aggregated result.
In this scenario, it is assumed that some hospitals can collude with each other, e.g., they could belong to the same entity (university/research institute/project), but collusion cannot involve all. Similarly, the service provider may collude with some hospitals participating in a funded research project, for example, and with some users/client devices but not with all.
Each hospital will run one or more auxiliary nodes to exchange the seeds and generate random numbers for all the client devices through the server. At step 0, the auxiliary nodes, as well as the client devices, send their public keys to the service provider, as shown in Figure 2, where the step numbers of the exchanged messages follow the round numbers in which each message is sent in the proposed protocol (Protocol I). The service provider, in turn, broadcasts the keys of the clients to all auxiliary nodes and the keys of all auxiliary nodes to all the clients in step 1. The client devices mask the model parameters by adding random numbers to them to keep them private. The masked model parameters are sent to the service provider in round 2. The service provider, in turn, requests the auxiliary nodes to send the sum of random numbers for the participating client devices. Each auxiliary node then sends the sum of the random numbers it generated to the service provider. The service provider aggregates the resulting model parameters by adding all the data received from client devices and subtracting all the numbers received from the auxiliary nodes (unmasking) in round 3. The client devices then verify the aggregated results in round 4.

V. PROPOSED SCHEME
In this section, we present the technical details of the proposed protocol. From a high-level view, the protocol aims to solve three problems that exist in the federated learning process by:
• Protecting the user's privacy that may be leaked from the user's local gradients.
• Eliminating the effect of the dropout of users during the training process.
• Enabling users to verify the result computed by the server.
The process starts after each user trains the model locally on their private dataset. Each user has to upload its local gradients to be aggregated by the server. The aggregation has to be done in a secure and private manner such that the local gradients cannot be revealed to any party, even to the aggregator. In this protocol, the idea of masking is adopted to hide the local gradients of the users. Each user adds a set of random numbers to its gradients before sending them to the server. For generating and handling these random numbers, the protocol relies on a set of nodes called auxiliary nodes. These auxiliary nodes have two main jobs:
• Each auxiliary node agrees with each user on a shared key, which is used as a seed for generating synchronized random numbers. Therefore, starting with this seed, in the same round, both parties generate the same random values.
• Each auxiliary node helps the server aggregate the gradients by providing the server with the masks required to cancel all the random values that have been added to the gradients.
The complete protocol steps are listed in Protocol I, and an illustration of the data kept at each participating party and how it is aggregated at the server is shown in Figure 3. In the beginning, each user and each auxiliary node generate three key pairs, $\{(pk^1_n, sk^1_n), (pk^2_n, sk^2_n), (pk^3_n, sk^3_n)\}$ and $\{(pk^1_m, sk^1_m), (pk^2_m, sk^2_m), (pk^3_m, sk^3_m)\}$ respectively, and send their public keys to the server. With these keys, each user can agree with each auxiliary node on three shared keys. The first key is used in encryption and decryption, the second is used as a common seed for generating synchronized random numbers, and the third is used in the process of verification. Each user uses the second key as a seed to generate random values to mask its local gradients. Each auxiliary node holds only a part of the mask that each user adds to its local gradients. The set of keys can be viewed as a matrix where each auxiliary node creates a column in this matrix, and each user takes a row from that matrix. Therefore, no single auxiliary node can reveal the mask of any user. Each user uploads its masked gradients to the server, and each auxiliary node uploads the summation of its generated random numbers. By aggregating all these values at the server, all the masks are canceled, and the server obtains the correct aggregated result of the actual local gradients of the users.

A. Protecting the User's Local Gradients
Assume that the number of users is $|N|$, the number of auxiliary nodes is $|M|$, and the number of online users that participate in the current round is $|U|$, where each user $n \in U$ has a unique ID known to both the server and the auxiliary nodes. Each user $n \in U$ holds a private gradient $x_n$ and needs to hide it from all other parties. Each auxiliary node $m \in M$ agrees on a shared key $s_{n,m}$ with each user (that is, $s_{n,m}$ is the shared key between auxiliary node $m$ and client $n$). With these seeds, in every round $i$, each client $n$ and each auxiliary node $m$ generate an agreed random number denoted as $PRG(s_{n,m}, round(i))$. Hence, each user can mask its local gradient as follows.
$$\hat{x}_n = x_n + \sum_{m \in M} PRG(s_{n,m}, round(i)) \quad (1)$$
Also, each auxiliary node sums all the random numbers at its disposal as follows.
$$P_m = \sum_{n \in U} PRG(s_{n,m}, round(i)) \quad (2)$$
Then, each user submits its masked gradient $\hat{x}_n$ to the server, and each auxiliary node submits the sum of its random numbers $P_m$ to the server. The server can aggregate all the local gradients $X = \sum_{n \in U} x_n$ as follows.
$$X = \sum_{n \in U} \hat{x}_n - \sum_{m \in M} P_m \quad (3)$$
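The masking and unmasking flow can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: gradients are single integers quantized modulo `Q`, the seeds are random stand-ins for the Diffie-Hellman shared keys, and `prg` is a hash-based placeholder for the PRG; none of these choices are taken from the paper.

```python
import hashlib
import random

Q = 2**32  # assumed modulus; gradients are assumed quantized to integers mod Q

def prg(seed, round_no):
    """Hash-based stand-in for PRG(s_{n,m}, round(i))."""
    digest = hashlib.sha256(f"{seed}|{round_no}".encode()).digest()
    return int.from_bytes(digest, "big") % Q

users, aux_nodes, round_no = range(4), range(3), 1
rng = random.Random(0)  # fixed seed only to make the sketch reproducible

# s[n][m]: seed shared between user n and auxiliary node m (the key "matrix").
s = [[rng.getrandbits(64) for _ in aux_nodes] for _ in users]
x = [rng.randrange(1000) for _ in users]  # private gradients (scalars here)

# Each user adds one pad per auxiliary node to its gradient (row of the matrix).
x_hat = [(x[n] + sum(prg(s[n][m], round_no) for m in aux_nodes)) % Q
         for n in users]

# Each auxiliary node sums the pads it shares with the users (column of the matrix).
P = [sum(prg(s[n][m], round_no) for n in users) % Q for m in aux_nodes]

# The server adds the masked gradients and subtracts the pad sums.
X = (sum(x_hat) - sum(P)) % Q
```

The server only ever sees `x_hat` and `P`; every pad appears once in a user's upload and once in an auxiliary node's sum, so all pads cancel in `X`.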

B. Handling the Dropouts
Our protocol handles dropouts by design. First, the server receives the gradients from the users. Then, after a specific time, the server asks the auxiliary nodes to upload the randomness corresponding to the users who sent their gradients. Even if some users upload their gradients late, the server cannot learn their private data, as it is still masked with the random values, but those users are excluded from participating in this round.
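The dropout handling can be sketched in Python as follows. As before, this is a hedged illustration: the integer modulus `Q`, the hash-based `prg`, the random seeds, and the choice of which users drop out are all assumptions made for the example.

```python
import hashlib
import random

Q = 2**32  # assumed modulus for quantized gradients

def prg(seed, round_no):
    """Hash-based stand-in for the shared PRG."""
    digest = hashlib.sha256(f"{seed}|{round_no}".encode()).digest()
    return int.from_bytes(digest, "big") % Q

rng = random.Random(7)
all_users, aux_nodes, rnd = [0, 1, 2, 3, 4], [0, 1, 2], 1
seeds = {(n, m): rng.getrandbits(64) for n in all_users for m in aux_nodes}
grads = {n: rng.randrange(1000) for n in all_users}

def mask(n):
    """Masked gradient that user n would upload in this round."""
    return (grads[n] + sum(prg(seeds[n, m], rnd) for m in aux_nodes)) % Q

# Users 1 and 4 drop out or upload too late, so the server ignores them.
online = [0, 2, 3]
uploads = {n: mask(n) for n in online}

# The server tells the auxiliary nodes which users it heard from; each node
# sums the pads of exactly those users, so no recovery step is needed.
pad_sums = [sum(prg(seeds[n, m], rnd) for n in online) % Q for m in aux_nodes]

X = (sum(uploads.values()) - sum(pad_sums)) % Q
```

Since the pads of the excluded users are never sent to the server, a late upload from user 1 or 4 would remain masked.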

C. Verifiability
To enable each user to verify the result returned by the server, we rely on the concept of double aggregation. The first aggregation is used to compute the aggregated gradient, whereas the second is used to demonstrate the correctness of the first.
Each user agrees with each auxiliary node on a shared key $K_{n,m} \leftarrow KA.agree(sk_n^3, pk_m^3)$ and sums these keys to generate its own key, denoted $K_n \leftarrow \sum_{m \in M} K_{n,m}$. Likewise, each auxiliary node generates a shared key with each user, $K_{n,m} \leftarrow KA.agree(sk_m^3, pk_n^3)$, and sums these keys to generate its key, denoted $K_m \leftarrow \sum_{n=1}^{N} K_{n,m}$. Because both sums run over the same pairwise keys, $\sum_{n=1}^{N} K_n = K$ is equal to $\sum_{m=1}^{M} K_m = K$. Each auxiliary node has to share the value $K_m$ with all the users. Each auxiliary node also samples a random number $\alpha_m$ and shares it with all users. The sum $\sum_{m=1}^{M} \alpha_m = \alpha$ is used as a universal key. Both $\alpha_m$ and $K_m$ are shared through the server, as there are no channels between the users and the auxiliary nodes. Each auxiliary node encrypts the values $\alpha_m$ and $K_m$ for each user under a shared agreed key, $ct_{n,m} \leftarrow AE.enc(KA.agree(sk_m^1, pk_n^1), \alpha_m \| K_m)$, and sends the ciphertexts to the server. The server forwards the ciphertexts to the corresponding users. Each user decrypts the received ciphertexts as $\alpha_m \| K_m \leftarrow AE.dec(KA.agree(sk_n^1, pk_m^1), ct_{n,m})$. Each user then computes $K$ and $\alpha$ as follows:

$$K = \sum_{m=1}^{M} K_m \tag{4}$$

and

$$\alpha = \sum_{m=1}^{M} \alpha_m \tag{5}$$

Then, each user computes its MAC as follows:

$$MAC_n = K_n + \alpha \times x_n \pmod{R} \tag{6}$$

Besides the final result $X$, the server aggregates all the MACs and sends the result $MAC = \sum_{n=1}^{N} MAC_n$ to each user. Each user $n$ can then validate the result by verifying the equation below:

$$MAC = K + \alpha \times X \pmod{R} \tag{7}$$

Each user can easily verify equation (7), as it receives the universal key $\alpha$ and the final key $K$ from the auxiliary nodes and receives the result $X$ and the aggregated $MAC$ from the server.
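The double aggregation can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the pairwise keys $K_{n,m}$ are sampled directly instead of being derived with KA.agree, encryption of $\alpha_m \| K_m$ is omitted, gradients are scalars, and the modulus is assumed to be the prime $2^{61} - 1$. The function name `verify_round` and the `tamper` parameter are hypothetical.

```python
import secrets

R = 2**61 - 1  # assumed prime modulus

def verify_round(gradients, num_nodes, tamper=0):
    """Run one verification round; `tamper` is added to X by a cheating server."""
    users = range(len(gradients))
    # Pairwise verification keys K_{n,m} and per-node random values alpha_m.
    k = [[secrets.randbelow(R) for _ in range(num_nodes)] for _ in users]
    alpha_m = [secrets.randbelow(R) for _ in range(num_nodes)]
    # Row sums are the user keys K_n, column sums are the node keys K_m.
    K_n = [sum(row) % R for row in k]
    K_m = [sum(k[n][m] for n in users) % R for m in range(num_nodes)]
    # Values every user reconstructs from the auxiliary nodes.
    K = sum(K_m) % R
    alpha = sum(alpha_m) % R
    # User-side tags, then server-side aggregation of gradients and tags.
    macs = [(K_n[n] + alpha * gradients[n]) % R for n in users]
    X = (sum(gradients) + tamper) % R
    MAC = sum(macs) % R
    # Check performed by each user: MAC = K + alpha * X (mod R).
    return MAC == (K + alpha * X) % R

honest = verify_round([3, 4, 5], num_nodes=2)
cheated = verify_round([3, 4, 5], num_nodes=2, tamper=1)
```

With an honest server the check passes; if the server alters $X$ while reporting the true aggregated MAC, the check fails unless $\alpha$ happens to be zero modulo $R$.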

VI. SECURITY ANALYSIS
In this section, the security of the proposed protocol is analyzed to show how it preserves the privacy of each user's local gradients. The protocol security is ensured in the presence of at least one auxiliary node and two honest clients. As shown above, each user masks its local gradient as

$$\hat{x}_n = x_n + \sum_{m \in M} PRG(s_{n,m}, round(i)) \pmod{R}$$

The security of the above equation relies on a lemma stating that if uniformly random numbers are added to users' values, the resulting values look uniformly random. To prove that $\hat{x}_n$ provides a sufficient level of security, we first introduce some notation. We consider a server $S$ that interacts with a set $U$ of users, where the underlying cryptographic primitives are instantiated with the security parameter $k$. We use the symbol $U_i$ to denote the set of users that successfully upload their local gradients in round $i - 1$, such that $U_4 \subseteq U_3 \subseteq U_2 \subseteq U_1 \subseteq U$, as users may drop out at any point during the execution.
Given a subset $W \subseteq U \cup S$ of parties, the joint view of all parties in $W$ can be seen as a random variable $REAL^{U,k}_{W}$, where $k$ indicates the security parameter used in the protocol. The view of a party during the protocol execution consists of its input, its randomness, and all messages it received from other parties. Once a party aborts and stops receiving messages, its view ends with the last message received. We now discuss the security of our protocol under collusion between the parties. First, we introduce a theorem showing that any collusion between the users and the auxiliary nodes does not violate the privacy of other users' private data.
Theorem 1 (Defense Against Joint Attacks From Multiple Users and Multiple Auxiliary Nodes): For all $k$, $W \subseteq U$, $x_U$, and $U_4 \subseteq U_3 \subseteq U_2 \subseteq U_1 \subseteq U$, there is a Probabilistic-Polynomial-Time (PPT) simulator $SIM$ whose output is indistinguishable from the output of $REAL^{U,k}_{W}$.
The joint view of the parties in $W$ does not depend on the inputs of users that are not in $W$, because the server's view is removed (users do not share their inputs with auxiliary nodes). One way to achieve a perfect simulation is to let the simulator run the honest-but-curious users on their genuine inputs and the rest on fake inputs. We emphasize that the simulated view of the users in $W$ is identical to the real view. For the honest users (not in $W$), the simulator uses random values instead of the gradients to compute the masked value $\hat{x}_n$. The parties in $W$ are not able to identify which values have been used by the other parties. The server only sends the list of online users participating in the masking round to the auxiliary nodes and the final aggregation to the users. Therefore, the simulated view of the parties in $W$ is indistinguishable from the real view $REAL^{U,k}_{W}$.

Theorem 2 (Defense Against Joint Attacks From the Cloud Server, Multiple Users, and Multiple Auxiliary Nodes): For all $k$, $W \subseteq U \cup S$, $x_U$, and $U_4 \subseteq U_3 \subseteq U_2 \subseteq U_1 \subseteq U$, there is a PPT simulator $SIM$ whose output is indistinguishable from the output of $REAL^{U,k}_{W}$.
Proof: To prove the above theorem, we use a standard hybrid argument. The idea behind this approach is to start from the real view and apply a series of modifications such that any two subsequent random variables are computationally indistinguishable, which ultimately yields a simulated view that is indistinguishable from the real view.

Hyb0: This random variable represents the joint view of the parties in $W$ in a real execution of the protocol.

Hyb1: In this hybrid, we fix a specific user $\acute{n} \in U_3 \setminus W$. For the honest auxiliary nodes $m \in U_3 \setminus W$, the simulator replaces the operation of generating the shared key between each such auxiliary node $m$ and $\acute{n}$ with a uniformly random number. Specifically, a random value $\acute{r}_{\acute{n},m}$ is selected for auxiliary node $m \in U_3 \setminus W$ and $\acute{n}$. Instead of sending

$$\hat{x}_{\acute{n}} = x_{\acute{n}} + \sum_{m \in M} PRG(s_{\acute{n},m}) \pmod{R}$$

the simulator submits

$$\hat{x}_{\acute{n}} = x_{\acute{n}} + \sum_{m \in M} PRG(\acute{r}_{\acute{n},m}) \pmod{R}$$

The DDH assumption ensures that this hybrid is computationally indistinguishable from the previous one.

Hyb2: In this hybrid, for the same specific user $\acute{n} \in U_3 \setminus W$, instead of using $PRG(\acute{r}_{\acute{n},m})$ the simulator uses a uniformly random number $r_m$ of appropriate size. Note that the only change in this hybrid is to substitute the output of a PRG with a uniformly random value. Therefore, by the security of the PRG, this hybrid is computationally indistinguishable from the previous one.

Hyb3: In this hybrid, each user $n \in U_2 \setminus W$, instead of sending its gradient $x_n$, uses a random value $R_n$ selected by the simulator, conditioned on

$$\sum_{n \in U_2 \setminus W} R_n = \sum_{n \in U_2 \setminus W} x_n$$

Therefore, the simulator has completed the proof, since $SIM$ simulates $REAL$ without knowing $x_n$ for all the users $n \in U_3 \setminus W$, and the output of $SIM$ is computationally indistinguishable from the output of $REAL$.

TABLE II: Comparison of computation, communication, and storage complexity of the protocol at the client side.
TABLE III: Comparison
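The masking lemma used in the security analysis above (adding a uniformly random value makes the result uniformly random) can be checked exhaustively for a small modulus. The following Python sketch is an illustration only; the modulus 8 and the name `masked_distribution` are hypothetical choices made so that every case can be enumerated.

```python
from collections import Counter

R = 8  # tiny modulus so that all cases can be enumerated

def masked_distribution(x):
    """Distribution of (x + r) mod R when r ranges uniformly over 0..R-1."""
    return Counter((x + r) % R for r in range(R))

# One distribution per possible private value x.
distributions = [masked_distribution(x) for x in range(R)]
```

Every private value produces the same flat distribution over the masked outputs, so a masked value alone carries no information about the value it hides.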

VII. PERFORMANCE EVALUATION
In this section, we compare our proposed protocol with two well-known secure aggregation protocols used in [36] and [15]. While several other secure aggregation protocols exist in the literature, we specifically chose to compare our protocol with these two because they are the most relevant state-of-the-art protocols. One key reason for this selection is that they employ the same technology as our proposed protocol for secure aggregation. This allows for meaningful and useful comparisons of performance. Other protocols that adopt a different approach to secure aggregation may have different performance parameters, making it difficult to compare them in a meaningful manner. Both of the protocols we chose to compare against, as well as our proposed protocol, address the challenge of user dropouts, which is a significant issue in federated learning. Furthermore, these two protocols are widely used as benchmarks in the literature.
All complexity calculations presented below assume a single server, $M$ auxiliary nodes, and $N$ users, where each user has a model parameters vector of size $V$. The cost of the public key infrastructure and all signatures is ignored, as it does not change any of the asymptotic complexities in $M$, $N$, and $V$. The results of comparing our protocol to the two aforementioned protocols in terms of computation, communication, and storage, on the client side and the server side, are reported in Table II and Table III respectively.

3) Storage Cost: Besides the data stored for its own key generation, the user must store $3M$ keys corresponding to the auxiliary nodes, the values of $\alpha$ and $K$, and the data vector for the model parameters (which it can mask in place), which has a size of $V$. The overall storage cost at the client sums up to $O(M + V)$.

2) Communication Cost: The communication cost at each auxiliary node is attributed to (1) sending its 3 public keys to the server, (2) receiving $3N$ public keys and sending $N$ encrypted secret values to the server, (3) receiving a list of online users from the server, and (4) sending the computed sum of the random vectors at the auxiliary node. Thus, the overall communication cost at the auxiliary nodes is $O(N)$.

3) Storage Cost: Besides the data stored for its own key generation, the auxiliary node has to store the users' public keys received from the server ($3N$), $2N$ values for the shared keys $K_{n,m}$ and $\alpha_m$, and the list of online users ($N$). Therefore, the total storage cost is $O(N)$.

Protocol I: Verifiable Secure Aggregation Protocol Using Auxiliary Nodes

-Setup
- All parties agree on the security parameter $\lambda$ and on honestly generated public parameters $pp \leftarrow KA.gen(\lambda)$.
- All users have a private authenticated channel with the server.
- Collect the keys from all auxiliary nodes $m \in M$, where $M$ is the set of all auxiliary nodes.
- Broadcast to all the users in $U_1$ the list $\{(pk_m^1, pk_m^2, pk_m^3)\}_{m \in M}$.
- Broadcast to all the auxiliary nodes in $M$ the list $\{(pk_n^1, pk_n^2, pk_n^3)\}_{n \in U_1}$.

-Round 1 (Key Sharing)
Auxiliary node m:
- Receive the list of keys $\{(pk_n^1, pk_n^2, pk_n^3)\}_{n \in U_1}$ from the cloud server.
- For each user $n \in U_1$, compute $K_{n,m} \leftarrow KA.agree(sk_m^3, pk_n^3)$.
- Sample a random element $\alpha_m \leftarrow F$.
- For each user $n \in U_1$, compute $ct_{n,m} \leftarrow AE.enc(KA.agree(sk_m^1, pk_n^1), \alpha_m \| K_m)$.
- Send all the ciphertexts $\{ct_{n,m}\}_{n \in U_1}$ to the server.
User n:
- Receive the list of keys $\{(pk_m^1, pk_m^2, pk_m^3)\}_{m \in M}$ from the cloud server.
- Receive the set of ciphertexts $\{ct_{n,m}\}_{m \in M}$.
- Compute $\alpha \| K \leftarrow \sum_{m \in M} AE.dec(KA.agree(sk_n^1, pk_m^1), ct_{n,m})$.
- Compute $K_n \leftarrow \sum_{m \in M} KA.agree(sk_n^3, pk_m^3)$.
Server Side:
- Collect the ciphertexts $\{ct_{n,m}\}_{n \in U_1}$ sent by each auxiliary node.
- Send the set of ciphertexts $\{ct_{n,m}\}_{m \in M}$ generated by the auxiliary nodes $m \in M$ to each user $n \in U_1$.

-Round 2 (Masking Input)
User n:
- Calculate the shared key with every auxiliary node $m$ as $s_{n,m} \leftarrow KA.agree(sk_n^2, pk_m^2)$, expand this value using a PRG, and calculate the masked input vector $\hat{x}_n \leftarrow x_n + \sum_{m \in M} PRG(s_{n,m}) \pmod{R}$.
- Calculate the MAC of the input vector as $MAC_n = K_n + \alpha \times x_n \pmod{R}$.
- Send to the server the masked parameters vector $\hat{x}_n$ and the MAC vector $MAC_n$.
Server Side:
- Receive messages (masked parameters vectors $\hat{x}_n$ and $MAC_n$) from the online users (represented as $U_2 \subseteq U_1$).
- Broadcast the list $U_2$ to each auxiliary node $m \in M$.

-Round 3 (Unmasking Input)
Auxiliary node m:
- Receive the list $U_2$ that represents the online users.
- Calculate the shared key with each user $n \in U_2$ as $s_{n,m} \leftarrow KA.agree(sk_m^2, pk_n^2)$, and expand this value using a PRG into a random vector $P_{n,m} \leftarrow PRG(s_{n,m})$.
- Calculate $P_m \leftarrow \sum_{n \in U_2} P_{n,m}$.
- Send the value $P_m$ to the server.
Server Side:
- Receive the values $P_m$ from the auxiliary nodes.
- Calculate the aggregated gradients for all users $n \in U_2$ as $X = \sum_{n \in U_2} \hat{x}_n - \sum_{m \in M} P_m \pmod{R}$.

2) Communication Cost: The server acts as the interface between the users and the auxiliary nodes, and it participates in all the communication between them. In round 0, it collects the keys from users and auxiliary nodes and broadcasts the users' keys to all auxiliary nodes and the auxiliary nodes' keys to all users, with a communication overhead of complexity $O(N + M)$. In round 1, the server collects the ciphertexts of the encrypted values sent by the auxiliary nodes and forwards them to each user, which is $O(M + N)$. In round 2, the server receives a masked parameters vector of length $V$ and a MAC vector of length $V$ from each online user, with a maximum of $N$ users when there are no dropouts, for an overall complexity of $O(NV)$. In round 3, the server receives the sums of random values from the auxiliary nodes, $O(M)$, and broadcasts the aggregated parameters vector ($V$) and the aggregated MAC ($V$) to the online users (at most $N$), with a communication complexity of $O(NV)$. Therefore, the total communication complexity sums up to $O(NV + M + N)$.
3) Storage Cost: The server has to store the public keys of the users (3N ) and the public keys of the auxiliary nodes (3M) in the first round. It stores the auxiliary nodes' ciphertexts (M) in the second round. The list of online users (N in case of no dropouts) is stored in the third round. In the last round, the server stores the aggregated parameters vector (V ) and the aggregated MAC vector (V ) for a total storage complexity O(N + M + V ).
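The storage items enumerated in this section can be tallied with a small Python sketch. It only adds up the item counts stated in the text (it measures nothing and ignores each party's own key material), and the function names are hypothetical.

```python
def client_storage(M, V):
    """3M shared keys, the two values alpha and K, and the parameter vector."""
    return 3 * M + 2 + V

def auxiliary_storage(N):
    """3N user public keys, 2N values for K_{n,m} and alpha_m, and the online list."""
    return 3 * N + 2 * N + N

def server_storage(N, M, V):
    """Public keys (3N + 3M), ciphertexts (M), online list (N), and two V-vectors."""
    return 3 * N + 3 * M + M + N + 2 * V

# Hypothetical sizes: 1000 users, 10 auxiliary nodes, 10,000 parameters.
N, M, V = 1000, 10, 10_000
totals = {
    "client": client_storage(M, V),
    "auxiliary": auxiliary_storage(N),
    "server": server_storage(N, M, V),
}
```

The client tally does not contain $N$, which reflects the stated $O(M + V)$ bound: the client's storage does not grow with the number of users.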

D. Prototype Implementation and Setup
We developed a Python prototype on a desktop machine with a 2.60 GHz Intel Core i7-6700HQ CPU and 7.5 GB RAM. The prototype included the following cryptographic primitives:
• For the key agreement, we used the elliptic-curve Diffie-Hellman protocol.
• For secret sharing, we used t-out-of-n Shamir secret sharing.
• For encryption, we used the Advanced Encryption Standard (AES) with a 128-bit key in counter mode.
• For the pseudorandom number generator, we used SHA-256.
To evaluate the performance, we measured the execution time of four phases: key sharing, masking gradients, aggregation, and validation.
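One way to build a pseudorandom generator from SHA-256, as listed among the primitives above, is to hash the seed together with a counter. The sketch below is an assumption about how such a generator could look, not the prototype's actual code; the name `prg_vector` and its parameters are hypothetical.

```python
import hashlib

def prg_vector(seed: bytes, length: int, bits: int = 64):
    """Expand a seed into `length` integers of `bits` bits each,
    using SHA-256 over (seed || counter) as the source of bytes."""
    nbytes = bits // 8
    out = []
    buffer = b""
    counter = 0
    while len(out) < length:
        # Refill the byte buffer with the next 32-byte digest when it runs low.
        if len(buffer) < nbytes:
            buffer += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
            counter += 1
        out.append(int.from_bytes(buffer[:nbytes], "big"))
        buffer = buffer[nbytes:]
    return out

# A 10K-entry mask vector with 64-bit entries from a hypothetical shared seed.
mask = prg_vector(b"shared-seed", 10_000)
```

Two parties that hold the same seed obtain the same vector, which is what allows a user's mask to be regenerated by an auxiliary node.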

E. Experimental Results
To conduct our experiments, we used randomly generated 10K-entry vectors with 64-bit entries as the users' local gradients. We varied the number of users and the user dropout ratio to understand how the two factors affect the performance of the four phases. Table IV compares the overall performance of the PSA model, the VerifyNet model, and our model with varying user numbers and dropout ratios. During the key sharing phase, our model has a lower cost than PSA and VerifyNet because only the auxiliary nodes have to share the values of $\alpha_m$ and $K_m$ with the users, whereas in the other two models each user must create shares of both its secret key and its private value and share them with every other user. During the masking input phase, we did not observe a significant performance gap between PSA and our model. VerifyNet, on the other hand, incurs enormous overhead, mostly because of its extensive use of group operations to achieve bilinear pairing. During the unmasking input and aggregation phase, when there are no dropouts, the costs of PSA and our model are comparable. Nevertheless, when dropouts occur, the computation cost in PSA increases exponentially, while our model maintains a constant computation cost. This is expected since, in PSA, for each dropped user the server must remove that user's pairwise masks for each surviving client. This requires the server to contact a certain number of clients to recover the dropped user's secret key and then recompute all of its masks. In our model, by contrast, the auxiliary nodes simply exclude dropped users from the computation. VerifyNet's costs are much higher than those of PSA and our model for the same reason mentioned for the masking input phase. Lastly, during the verification phase, our model is considerably lighter than VerifyNet since it predominantly relies on computationally lightweight PRG operations. VerifyNet, on the other hand, employs computationally intensive cryptographic operations such as bilinear pairing and homomorphic hash functions, which are far more expensive than a PRG.

Figure 4 illustrates the impact of increasing the number of users on the number of messages associated with the exchange of keys between users and the server for various auxiliary node ratios. As the number of users increases, the number of messages changes only slightly, and the lowest auxiliary node ratio yields the fewest messages.

F. Accuracy
We evaluated the performance of two different neural network architectures on two popular datasets, MNIST and CIFAR100:
• MNIST [49] is a dataset consisting of grayscale images of handwritten digits from 0 to 9, comprising 60,000 training and 10,000 testing images, each of size 28 × 28 × 1 and labeled with one of the ten classes.
• CIFAR100 [50] is a dataset that contains RGB images of 100 classes, with 500 training images and 100 testing images per class, each of size 32 × 32 × 3.
For the MNIST dataset, we used a three-layer network with two hidden, fully connected layers of 256 neurons and rectified linear units. The output layer was fully connected with 10 output neurons and used softmax activation. We used the stochastic gradient descent optimizer with a learning rate of 0.001. For the CIFAR100 dataset, we employed a convolutional neural network (CNN) with seven convolutional layers, each consisting of 3 × 3 filters with a stride of 1. Each convolutional layer was followed by rectified linear units and 2 × 2 max pooling with a stride of 2. The fully connected layer used softmax activation. We used the Adam optimizer with a learning rate of 0.001.
We evaluated the accuracy of Federated Learning (FL) models against a default model. The default model used ResNet as a pre-trained model and trained only the fully connected layer while keeping the convolutional layer parameters constant. Fig. 5 shows the classification accuracy and number of rounds for different numbers of clients. The figure illustrates that as the number of clients increases, more rounds are needed to reach a specific accuracy. Fig. 6 compares the accuracy of the FL model with that of the default model. The results show that the FL model achieves comparable accuracy, albeit slightly lower than the accuracy attained by the default model.

VIII. CONCLUSION
In this research, we propose a secure aggregation protocol that can be employed in a federated learning setting. The protocol relies on auxiliary nodes that, in practice, cannot all collude. At the same time, the use of auxiliary nodes reduces the communication, computation, and storage costs both on the clients' low-resource devices and at the service provider. These auxiliary nodes can represent hospitals in a healthcare scenario, banks in a financial application, and so on. The analysis of the protocol showed a reduction in the computation, communication, and storage costs at the client nodes compared to state-of-the-art protocols. In the proposed protocol, the cost of computation, communication, and storage on the low-resource devices of client nodes (e.g., mobile phones or wearable devices) depends only on the number of auxiliary nodes and the length of the weight parameters vector, and is independent of the number of users joining or leaving the training process in each round. Additionally, the verification step in the proposed protocol relies on lightweight computations, which reduces power consumption on client devices.