VerSA: Verifiable Secure Aggregation for Cross-Device Federated Learning

In privacy-preserving cross-device federated learning, users train a global model on their local data and submit encrypted local models, while an untrusted central server aggregates the encrypted models to obtain an updated global model. Prior work has demonstrated how to verify the correctness of aggregation in such a setting. However, such verification relies on strong assumptions, such as a trusted setup among all users under unreliable network conditions, or it suffers from expensive cryptographic operations, such as bilinear pairing. In this paper, we scrutinize the verification mechanism of prior work and propose a model recovery attack, demonstrating that most local models can be leaked within a reasonable time (e.g., 98% of encrypted local models are recovered within 21 h). Then, we propose VerSA, a verifiable secure aggregation protocol for cross-device federated learning. VerSA does not require any trusted setup for verification between users while minimizing the verification cost by enabling both the central server and users to utilize only a lightweight pseudorandom generator to prove and verify the correctness of model aggregation. We experimentally confirm the efficiency of VerSA under diverse datasets, demonstrating that VerSA is orders of magnitude faster than verification in prior work.


INTRODUCTION
With the explosive growth in data, neural networks have demonstrated promise and usefulness in solving real-world problems in diverse domains, such as computer vision and speech recognition [1], [2], [3]. Due to the ever-increasing need to develop robust and accurate deep learning models in unpredictable environments, building a joint dataset across organizations and individuals is incredibly important for quality training of the learning model and widespread deployment of its technologies. However, it raises daunting challenges regarding data privacy, especially when the data are shared across multiple users [4].
As an effective countermeasure to handle such a privacy concern, Google proposed federated learning (FL) [5], [6], in which a federation of users individually train a shared global model maintained by a central coordinating server. In FL, each user has a training dataset on the local device, and only computes an update to the current model. The central server aggregates the locally updated models from the users to obtain a global model. Although the training data never leave the users' local storage, many studies have shown that diverse privacy attacks are possible on the local model updates to reconstruct the private training data and infer the presence of a specific data item in the training dataset [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]. Manifold counter approaches to resolving the problem have leveraged various security primitives, such as differential privacy [18], [19], [20], homomorphic encryption [21], [22], post-quantum secure gadgets [23], and multi-party computation [24], [25]. By properly applying these techniques, the central server can perform model aggregation without compromising confidentiality of the local model updates.
Alongside the model privacy, ensuring correctness of model aggregation is another important security requirement in FL. Indeed, the central server may have a financial incentive if it performs an exceedingly fast but somewhat incorrect computation for the sake of saving computational resources. One straightforward method to guarantee correctness is applying multiclient verifiable computation [26], [27], [28], which supports the verification of computations over a series of joint inputs from multiple users while preserving privacy of individual input. However, multiclient verifiable computation allows only one user to learn the computational result, which does not fit into FL because all users in FL require this result (i.e., the aggregated model).
Several verifiable FL schemes were recently proposed to address the problem [29], [30], [31], [32]. Among them, VerifyNet [30] was designed specifically for cross-device FL. Cross-device FL refers to a distributed machine learning setting in which numerous users equipped with potentially lightweight devices and unreliable network connections participate in the training phase [33]. In this setting, many pragmatic constraints are considered, such as user dropouts, the joining of never-before-seen users at any time, and the survival of only a fraction of users at the end of each training epoch. These constraints significantly complicate some security assumptions, such as a private key sharing across all participating users. However, VerifyNet depends on reliable secret distribution among all users, where the shared secret is a key to achieving verifiability [30].
In this regard, we raise two questions. First, is the trusted setup adopted by prior work secure in the context of privacy-preserving cross-device FL? Second, is it possible to construct a verifiable FL scheme for privacy-preserving cross-device settings without relying on any trusted setup?
In search of answers, we first scrutinize the verification mechanism of VerifyNet and demonstrate that an attacker can uncover encrypted local model updates from a victim within a reasonable time. Specifically, this attack is realized by our twofold observations. First, the model updates are highly biased, forming bell-shaped distributions [34]. Second, users in VerifyNet share a predefined secret key for a keyed homomorphic hash [35], [36], not only to verify the correctness of model aggregation but also to encode the local model updates. Consequently, this creates a potential vulnerability allowing attackers to deliver brute-force attacks on a victim's hashes (images) and recover the corresponding local model updates (preimages). In our experiment on the MNIST training dataset run by VerifyNet, an attacker can recover about 980 out of 1,000 local model updates from their hashes within one day (when model updates have 16-bit precision) to 90 days (32-bit precision).
Then, we propose VERSA, a verifiable secure aggregation (SA) scheme for cross-device FL, which is designed to run on top of SA [37], a lightweight model aggregation protocol that preserves the privacy of individual model updates in the cross-device FL environment. Considering the unreliability and resource constraints of users in FL, VERSA has the following two design goals. First, verifiability should be achieved without relying on any trusted setup among all users, and second, the verification process should be computationally lightweight.
VERSA satisfies these requirements by employing a secret expansion through a pseudorandom generator (PRG). Specifically, users in SA collectively generate a set of pairwise shared secrets for local model encryption. In VERSA, the shared secret is used as a master key to derive three session keys through PRG. One session key is used to encode the local model update into a model verification code. This code is particularly useful for verifying model aggregation because an aggregation of individual codes encodes an aggregation of individual local model updates. Unfortunately, users cannot send their model verification codes in plaintext to the central server because they are vulnerable to the brute-force attacks described earlier. To address this problem, the other two session keys are used to protect the local model update itself and the corresponding model verification code via the encryption of SA. Under the security of SA, an attacker can only obtain an aggregated verification code and cannot recover any individual model verification code from it. Meanwhile, every user employs the aggregated verification code to verify the correctness of the corresponding aggregated model update.
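The additive structure that makes such a verification code useful can be illustrated with a minimal sketch. This is not VERSA's exact construction: the encoding code_u = a * x_u + b_u, the PRG instantiation, and all seed labels below are hypothetical. The point is only that a linear code aggregates homomorphically, so the sum of individual codes is a valid code for the sum of individual gradients.

```python
# Illustrative sketch of an additively homomorphic model verification code.
# NOT the exact VERSA construction: the encoding and all seeds are hypothetical.
import hashlib

def prg(seed: bytes, n: int, R: int) -> list[int]:
    """Expand a seed into an n-dimensional vector over Z_R (counter-mode SHA-256)."""
    return [int.from_bytes(hashlib.sha256(seed + i.to_bytes(4, "big")).digest(), "big") % R
            for i in range(n)]

R, n = 2**32, 4
a = prg(b"master|mult-key", n, R)                               # common multiplier key
grads = {u: prg(b"toy-gradient|%d" % u, n, 1000) for u in range(3)}
b = {u: prg(b"master|offset|%d" % u, n, R) for u in grads}      # per-user offset keys

# Each user encodes its gradient into a verification code.
codes = {u: [(a[j] * grads[u][j] + b[u][j]) % R for j in range(n)] for u in grads}

# The server aggregates gradients and codes entry-wise.
agg_x = [sum(grads[u][j] for u in grads) % R for j in range(n)]
agg_code = [sum(codes[u][j] for u in grads) % R for j in range(n)]

# A verifier holding a and the aggregate offset re-encodes agg_x and
# compares it against the aggregated code: n multiplications and additions.
agg_b = [sum(b[u][j] for u in grads) % R for j in range(n)]
assert agg_code == [(a[j] * agg_x[j] + agg_b[j]) % R for j in range(n)]
```

Note that the per-entry check costs only one integer multiplication and addition, which is the source of the lightweight verification described above.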
VERSA achieves the verifiability of the model aggregation via the same lightweight cryptographic primitive as the PRG that SA utilizes. This property is extremely useful when a model update is a high-dimensional vector [38], [39]. To verify an aggregated model that consists of an n-dimensional vector, each user in VERSA performs n executions of integer multiplication and addition. In contrast, users in VerifyNet perform 4n pairing and n exponentiation operations in a cyclic group of prime order, both of which are computationally expensive group operations. To demonstrate our scheme's efficacy, we implement VERSA under three datasets (MNIST [40], SVHN [41], and CIFAR100 [42]). SA [37] is the baseline privacy-preserving model aggregation protocol on which VERSA is built; thus, we demonstrate the cost of achieving verifiability by comparing VERSA with SA. As VerifyNet [30] was also built upon SA, we present the cost-effectiveness of our scheme compared with VerifyNet. According to the experimental results, the user-side time for verifying a 1,000-dimensional vector in VERSA is approximately 15 ms, whereas the cost is 20 min for VerifyNet.

Contribution
Our work makes the following contributions:
-We propose a model recovery attack on VerifyNet [30] and demonstrate that an attacker can recover a victim's model updates within a reasonable time.
-We propose VERSA, a verifiable and privacy-preserving model aggregation scheme. VERSA achieves verifiability of model aggregation via a lightweight primitive, the pseudorandom generator, best supporting cross-device FL.
-We experimentally confirm the model accuracy of VERSA under three datasets (MNIST, SVHN, and CIFAR100), revealing that VERSA achieves both privacy and verifiability of model updates without degrading accuracy.
-We conduct a comparative performance analysis of VERSA over SA and VerifyNet. The evaluation results demonstrate that the additional cost to run VERSA on top of SA is remarkably small and orders of magnitude faster than VerifyNet.

Organization
The rest of this paper is structured as follows. We introduce the background of FL, SA, and verifiability in the FL context in Section 2. Then, we describe the system and threat model of VERSA in Section 3. In Section 4, we explain the model recovery attack and its result. In Section 5, we present the construction of VERSA in detail, followed by the security and performance analyses in Sections 6 and 7, respectively. We provide related work in Section 8, and conclude the paper in Section 9.

BACKGROUND
In this section, we provide an overview of FL and SA and their security issues. Then, we describe verifiability in the context of FL.

Federated Learning
Federated learning (FL) is a decentralized machine learning technique where a centralized model is trained by aggregating locally computed models, while the training data are distributed to each user and never shared with others [5]. A central server orchestrates the training process by repeating a series of rounds, each consisting of the following processes:
-User selection: The server selects a set of users and provides the current model parameters to them. These users may survive throughout the round or drop out before the current round is completed.
-Local training: Each user locally computes an update to the model. The update typically involves a variant of a mini-batch stochastic gradient descent rule [39], which returns a gradient as a local training result.
-Aggregation: The server collects the set of gradients sent from the surviving users and computes a model aggregation.
-Global model update: The server updates the current global model using the aggregated gradient. This new global model is published for the subsequent round.
Although the training data of users never leave each user's local storage, the locally computed model parameters can be exploited to infer the training data [7], [10]. Therefore, it is highly desirable to keep the local model parameters confidential while correctly enabling their aggregation.
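The round structure above can be sketched as a toy FedAvg-style loop. All names and the quadratic toy loss are illustrative, not a real training pipeline:

```python
# Toy sketch of one FL round: users compute local gradients, the server
# averages them and applies a global update. The quadratic loss is illustrative.
def local_update(global_w, data):
    # one gradient step of the toy loss sum((w - d)^2) on local data
    return [2 * (w - d) for w, d in zip(global_w, data)]

def aggregate(grads):
    # server: average the surviving users' gradients entry-wise
    k = len(grads)
    return [sum(g[j] for g in grads) / k for j in range(len(grads[0]))]

w = [0.0, 0.0]                                   # current global model
users_data = [[1.0, 2.0], [3.0, 4.0]]            # private, never leaves users
grads = [local_update(w, d) for d in users_data] # local training
agg = aggregate(grads)                           # aggregation
w = [wi - 0.1 * gi for wi, gi in zip(w, agg)]    # global model update
```

Note that in plain FL the server sees every individual gradient in `grads`; SA, introduced next, hides them while still allowing `aggregate` to be computed.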
FL is widely applicable to two distinctive scenarios: the cross-device and cross-silo settings. In the cross-device setting, numerous resource-constrained users participate in the learning, and the network condition is highly unreliable [43], [44]. In contrast, the cross-silo setting encompasses more stable environments where a small number of organizations equipped with rich computing resources participate as users, and the network channel is reliable [45], [46]. We focus on the cross-device setting in this paper.

Secure Aggregation
Secure aggregation (SA) [37] can effectively deal with the privacy concern of cross-device FL. SA enables a set of users, who are unreliable and do not have direct communication channels with each other, to collectively update global models via a central server while hiding individual gradients. Precisely, at the user selection phase, the server broadcasts a list of selected users to request their participation in the current round. The server and users then conduct the following processes sequentially:
-Advertising and sharing key: This corresponds to the user selection phase in FL. Each selected user is notified of the list of the other participating users and their public keys. The user derives a noise from the secret key. This secret key is further split into n shares via a t-out-of-n secret sharing algorithm [47], where n and t refer to the number of participating users and the quota for reconstructing the secret key, respectively. Each share is encrypted using public keys and sent to the corresponding users individually.
-Masking: This corresponds to the local training phase in FL. Each user trains locally and masks the gradient using the noise he computed in the previous phase.
-Unmasking: This corresponds to the aggregation phase in FL. The server collects the set of masked gradients sent from the users in the previous phase and computes a model aggregation by aggregating all masked gradients. If all masked gradients are submitted successfully without any dropouts, then their aggregation leads to the correct aggregation of all gradients. Otherwise, the aggregated gradient contains noises, where each noise is contributed by a dropped-out user. The server handles this problem by making t surviving users reconstruct the secret keys of dropped-out users. Then, the server calculates the corresponding noise from the reconstructed secret keys, removing the noise from the noisy aggregation.
Due to the usage of t-out-of-n secret sharing [47], the privacy of the individual gradient of each user is preserved as long as the maximum number of users the server may corrupt does not exceed t. In addition, SA uses a lightweight cryptographic primitive (e.g., a PRG) to preserve privacy. Thus, SA is applicable to cross-device FL where user communication and computational resources are limited. A technical overview of SA is given in Section 5.2.
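The mask-cancellation idea behind SA can be sketched as follows. This toy assumes no dropouts; the PRG instantiation, seeds, and modulus are illustrative:

```python
# Toy sketch of SA's pairwise masking: each pair (u, v) shares a seed; the
# PRG noise added by the smaller-id user is subtracted by the larger-id user,
# so all pairwise masks cancel in the aggregate (no-dropout case).
import hashlib

def prg(seed: bytes, n: int, R: int) -> list[int]:
    return [int.from_bytes(hashlib.sha256(seed + i.to_bytes(4, "big")).digest(), "big") % R
            for i in range(n)]

R, n = 2**16, 5
grads = {1: [3] * n, 2: [5] * n, 3: [7] * n}          # toy gradients
seed = lambda u, v: b"s|%d|%d" % (min(u, v), max(u, v))  # pairwise shared secret

def mask(u, x):
    y = list(x)
    for v in grads:
        if v == u:
            continue
        noise = prg(seed(u, v), n, R)
        for j in range(n):
            y[j] = (y[j] + noise[j]) % R if u < v else (y[j] - noise[j]) % R
    return y

# The server only sees masked gradients, yet their sum is the true sum.
agg = [sum(mask(u, grads[u])[j] for u in grads) % R for j in range(n)]
assert agg == [15] * n                                # 3 + 5 + 7 per entry
```

Handling dropouts (reconstructing a dropped user's seed via secret sharing to strip its residual noise) is omitted here for brevity.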

Verifiable Federated Learning
Verifiability is one of the important security requirements in FL [33]. In FL, achieving verifiability is considered in two different scenarios. The first scenario is that the server plays the role of a prover, attesting to its execution of faithfully aggregating the local model parameters. Users in this scenario are individual verifiers. In the second scenario, the users play the role of provers attesting that they are not deviating from the protocol (e.g., individual local model parameters are correctly generated), whereas the server is the verifier. In this paper, we focus on the first scenario.
Xu et al. [30] recently showed how to achieve verifiability on top of SA. Their scheme has limitations in terms of performance and security. The performance issue arises from using a homomorphic hash [36], which is computationally costly as it heavily relies on the bilinear pairing [48]. The usage of the bilinear pairing is a dominant performance bottleneck when the gradients are high-dimensional vectors because the pairing operations must run on each entry in a vector.
The security concern is significantly more alarming than the performance issue because it can lead to privacy violations. More precisely, we conduct an in-depth analysis of the scheme by Xu et al. [30] and find that collusion between the server and user can enable brute-force attacks on the hashed values submitted by a victim such that most entries of the input vectors can be recovered within a reasonable amount of time. In Section 4, we describe how the homomorphic hash is used to verify the model aggregation and demonstrate that their scheme is vulnerable to our model recovery attack.

SYSTEM AND THREAT MODEL
In this section, we describe the system and threat model of VERSA. Fig. 1 depicts the system model of VERSA compared with that of FL. In Fig. 1a, FL is run by n participants and a server. All participants send their local model updates to the server (➀). The server aggregates the local model updates (➁) and sends the result back to each participant (➂). Lastly, all participants apply the global model update to their model (➃). By applying SA to FL, the local model updates are encrypted prior to being sent to the server (➀), while enabling the server to retrieve the aggregated result in plaintext without decryption (➁). In Fig. 1b, VERSA runs on top of the SA-enabled FL, where the server generates a proof attesting to the correctness of its execution (➌) and broadcasts the result (➍). Finally, each participant verifies the global model update (➎).

System Model
VERSA enables every surviving user to verify the correctness of the aggregated model and runs on top of SA, achieving verifiability using the same lightweight cryptographic primitive as SA. Each surviving user runs the following sequential processes:
-Advertising and sharing key: This is the same as the advertising and sharing key phase in SA.
-Masking: This corresponds to the masking phase in SA. In addition to the gradient masking, the user derives a public evaluation key and private verification key, where the two keys are used in the subsequent phases by the server and user, respectively.
-Unmasking and generating proof of model aggregation: This corresponds to the unmasking phase in SA. In addition to the model aggregation, the server generates a proof attesting to the correctness of the aggregated gradient using the evaluation key. The server broadcasts the proof and aggregated gradient.
-Verifying model aggregation: The user verifies the correctness of the model aggregation using the verification key. The aggregated gradient is accepted if the verification succeeds; otherwise, it is rejected.
Technical details of VERSA are provided in Section 5.

Threat Model
In VERSA, we assume that all users agree upon a model aggregation orchestrated by the central server. The users consent to release the final result of the model aggregation to every user. These users have a common interest in soundness (i.e., retrieving correct global model updates from the untrusted central server) and privacy (i.e., hiding their local model updates from each other and the server).
The central server should be reliable because its role is important in FL, i.e., it orchestrates the training process. Nevertheless, the central server also potentially represents a single point of failure, and a trusted server may not always be available or desirable in many collaborative learning scenarios [33].
In accordance with the aforementioned assumption, we suppose the server is malicious in terms of soundness. The server may deviate arbitrarily from the protocol by supplying incorrect messages or executing a different computation than expected to return an incorrect result to all users. Such an assumption captures the adversarial behaviors in multiclient verifiable computation [26], where the goal of the server is to deceive the users into accepting a wrong result.
In terms of privacy, we assume that the server is malicious and users are semi-honest. The server and some users may collude to obtain the best offensive capabilities possible against the other victims' local model updates. The maximum number of users the server may corrupt does not exceed t, the threshold of the t-out-of-n secret sharing algorithm [47]. Specifically, given the set of users 𝒰 and any colluding subset 𝒞 ⊆ 𝒰 ∪ {S}, where S is the server, it is required that |𝒞 \ {S}| < t.
Out-of-Scope Attacks. In this paper, we do not consider user-server collusion attacks to bypass the verifying model aggregation phase. This exception is due to the impossibility result [27], in which multiclient verifiable computation cannot achieve verifiability in the presence of the users colluding with the server. In addition, we consider neither membership inference attacks [13] that infer the training data from the model itself nor poisoning attacks [12] in which malicious users deliberately affect their training data to manipulate the global model. In the poisoning attack, these malicious attackers may submit distorted local model updates (e.g., outside the bounds of their expected range). Such bad inputs could be handled by employing noninteractive zero-knowledge [49] under a trusted setup among all users. However, it is costly in the cross-device FL setting because users may frequently drop out and do not have direct communication channels with each other.

MODEL RECOVERY ATTACK
We describe the model recovery attack on VerifyNet [30], a homomorphic hash-based verifiable SA, and demonstrate the feasibility of our attack. To better understand the attack procedure, we briefly explain the general methodology for achieving verifiability over aggregated gradients using homomorphic hashes.

Homomorphic Hash
A homomorphic hash H [35], [36] allows evaluating an arithmetic function f over a set of hash values H(m_1), ..., H(m_t) such that an evaluation algorithm returns H(f(m_1, ..., m_t)). More precisely, a family of one-way keyed homomorphic hashes H consists of three algorithms, as follows:
-k ← H.gen(): This is a private key generation algorithm for a keyed hash function H_k.
-H_k(m) ← H.hash(k, m): This is a hash computation algorithm that returns H_k(m) on input m.
-H_k(f(m_1, ..., m_t)) ← H.eval(f, H_k(m_1), ..., H_k(m_t)): This is an evaluation algorithm that homomorphically applies f to the given hash values.
The security of a one-way keyed hash function H guarantees that it is practically infeasible to invert H_k(m) to recover m.
Application to SA. An aggregated model update is the sum of all gradients in FL, each generated by an individual user. As H is one-way and supports computations of arithmetic circuits, one can easily apply a homomorphic hash to SA to verify the correctness of the sum while preserving privacy against the server, by performing the following process sequentially:
-All users u ∈ 𝒰 share the secret key k for a homomorphic hash H.
-Each user u computes H_k(x_u), a hash of the gradient x_u. Each user u submits the pair (SA.mask(x_u), H_k(x_u)) to the server, where SA.mask(x_u) refers to the masked gradient of x_u.
-The server aggregates all masked gradients {SA.mask(x_u)}_{u∈𝒰}, which returns z. The server evaluates all hash values received from u ∈ 𝒰, which returns ẑ = H_k(Σ_{u∈𝒰} x_u). The server broadcasts the pair (z, ẑ).
-In this phase, users verify whether z =? Σ_{u∈𝒰} x_u with the assistance of ẑ. To do this, each user computes a hash of z. Then, the user accepts z as a correctly aggregated model update if and only if H_k(z) = ẑ, and aborts otherwise.
This process does not intend to enable the server to learn the individual gradient from H_k(x_u), as H is a one-way keyed hash function. VerifyNet [30] uses the above approach to allow users to check the correctness of aggregated results while preserving privacy of individual gradients.
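A toy instance of the additive homomorphism exploited above is the discrete-log-style keyed hash H_k(m) = k^m mod p. The parameters below are illustrative and far too small to be secure; they only demonstrate that multiplying hashes corresponds to adding preimages:

```python
# Toy discrete-log-style keyed homomorphic hash: H_k(m) = k^m mod p.
# Additive homomorphism: H_k(m1) * H_k(m2) mod p == H_k(m1 + m2).
# Hypothetical toy parameters, far too small for real security.
p = 2**61 - 1      # Mersenne prime modulus
k = 1234567       # shared secret hash key

def H(m: int) -> int:
    return pow(k, m, p)

x1, x2 = 42, 58   # two users' toy gradient entries
# The server multiplies the hashes; the result is the hash of the sum,
# so users can check an aggregate without the server seeing x1 or x2.
assert (H(x1) * H(x2)) % p == H(x1 + x2)
```

One-wayness here rests on the hardness of the discrete logarithm; the attack in Section 4 does not break this hardness but instead brute-forces the small, biased preimage space.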

Our Attack
Our model recovery attack on VerifyNet [30] leverages the following two features we observed. First, the distribution of model parameters (i.e., gradients) is highly biased, such that the entries of a gradient generated by an SGD rule form bell-shaped distributions around zero [34]. Without this observation, we would have no choice but to launch a naïve brute-force attack over a very large range like (−∞, ∞), which is computationally infeasible. Second, to encode gradients and verify the aggregated gradient in VerifyNet, all users must share the same secret parameters used for a homomorphic hash. These observations lead us to hypothesize that VerifyNet is vulnerable to brute-force attacks launched by malicious users colluding with the server such that the encoded gradients can be recovered from a brute-force attack on a victim's homomorphic hash outputs. Such user-server collusion is the topmost security concern for preserving confidentiality of the individual gradients [30], [37]. Overall, these two observations are strong clues to balance recovery ratios and the time taken to recover the models, i.e., recovering as many model parameters as possible within a relatively short period.

Attack Scenario
We simulated a cross-device FL setting to test our hypothesis, where we ran TensorFlow [50] on the MNIST training dataset [40] on each user's local device using the stochastic gradient descent rule. Then, we encoded the gradients using a homomorphic hash and sent the result to the server. By colluding with a malicious user, the server obtained all homomorphic hash parameters and ran a brute-force attack on the homomorphic hash outputs submitted by the victim.

Implementation Setup
To implement the homomorphic hash, we used two hash algorithms: the DRV hash [36] and the KFM hash [35]. The former was chosen because it is used in VerifyNet, whereas the latter was chosen to demonstrate that the feasibility of our attack does not depend on any specific homomorphic hash used to encode the gradients. For the DRV hash, we used MNT224, a Type-III pairing curve implemented in the pairing-based cryptography library [51]. For the KFM hash, we used 256-bit, 512-bit, and 1,024-bit prime numbers to instantiate three independent KFM hashes written in Java. These three KFM hashes allowed us to measure the feasibility of our attack over various bit-length prime numbers.
Our attacks were run on two Ubuntu 18.04 LTS servers. First, the DRV hash attack was conducted on a system equipped with 32 GB RAM and an AMD Ryzen 3950X 16-core processor running at a 3.5 GHz base clock with a 16-thread OpenMP setup. Next, to attack the KFM hashes, we used a system with an Intel Core i9-9900K processor (3.6 GHz base clock) and 64 GB RAM. We used only a single core to launch our attack on each hashed gradient.

Processing Float-Type Gradient
Because both the DRV and KFM hashes take integer values as input, the floating-point parameters obtained from training need to be scaled into the range {−(p−1)/2, ..., 0, ..., (p−1)/2}, where p is a large prime number used for the hashes. Following the float-to-integer conversion method [52], [53], we quantized a floating-point value v into an integer ⌊v · a⌋ with a scaling factor a, where a larger value of a leads to lower quantization error. We set a = 2^prec, where prec refers to the bit-precision a neural net employs to represent the gradient values.
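The quantization step can be sketched directly from the definition above (the helper names are ours):

```python
# Float-to-integer quantization used before hashing: v -> floor(v * a),
# with scaling factor a = 2^prec. Larger prec means lower quantization error.
import math

def quantize(v: float, prec: int) -> int:
    return math.floor(v * (2 ** prec))

def dequantize(q: int, prec: int) -> float:
    return q / (2 ** prec)

q = quantize(0.1337, prec=16)
# Round-trip error is bounded by the quantization step 2^-prec.
assert abs(dequantize(q, 16) - 0.1337) < 2 ** -16
```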
Recent studies have demonstrated that, instead of traditional single precision (32-bit), 16-bit precision can be sufficient to train a neural net without affecting model accuracy, in favor of energy efficiency and reduced-precision representations [54], [55]. Thus, we used 32-bit precision and 16-bit precision for the MNIST training dataset to obtain two separate gradient sets. Our attack on the latter set measures the effect of reduced precision on brute-force attacks against homomorphic hash-based verifiable SA.

Attack Process
Each of the two gradient sets contains 750 gradients, where each gradient forms a vector of 10 entries. As the distribution is expected to be highly biased around zero, most of the parameters would reside within a short symmetric interval, e.g., (−x, x). Fig. 2 plots the distributions of those entries, indicating that they form narrow bell-shaped curves centered on zero. The figure also indicates that most parameters would be recovered even if we set x to a small value, e.g., x = 0.2. Lastly, we obtained four hash values for each entry using the DRV, KFM-256, KFM-512, and KFM-1024 hash functions.
Let h(⌊v · a⌋) be the hashed counterpart of a float-type gradient entry v. Let 0 < r < 1 be a float-type value such that there exists an integer n satisfying ⌊v · a⌋ = ⌊r · a⌋ · n. While exhaustively increasing (or decreasing) i from zero, we check whether h(⌊v · a⌋) = h(⌊r · a⌋ · i). This loop ends when ⌊r · a⌋ · i > ⌊x · a⌋ (or ⌊r · a⌋ · i < ⌊−x · a⌋), indicating that the loop has moved beyond the given range (−x, x), where x is a real number.
The choice of (r, x) affects the attack performance. For example, the number of recovered gradients would increase when we set r and x as small and large as possible, respectively, at the cost of increased time to complete the attack. In this regard, we tried diverse pairs of (r, x) to empirically balance the number of recovered gradients and the required time. In our experiment, we set r to 10^−7 and 10^−9 for 16-bit and 32-bit precision, respectively, with x ∈ {0.2, 0.3, 0.4}.
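The search loop described above can be sketched as follows. Here `H` is a stand-in one-way hash (not the DRV or KFM hash), and r is chosen to put the victim's entry on the search grid; both are illustrative:

```python
# Sketch of the brute-force recovery loop: step through candidates
# floor(r*a) * i inside (-x, x) and compare hashes against the target.
import hashlib, math

def H(m: int) -> bytes:                         # stand-in one-way hash
    return hashlib.sha256(m.to_bytes(16, "big", signed=True)).digest()

def recover(target: bytes, r: float, x: float, prec: int):
    a = 2 ** prec
    step, bound = math.floor(r * a), math.floor(x * a)
    i = 0
    while step * i <= bound:
        for cand in (step * i, -step * i):      # increase and decrease from zero
            if H(cand) == target:
                return cand / a                 # recovered float-type entry
        i += 1
    return None                                 # not found within (-x, x)

secret = math.floor(0.0421 * 2**16)             # victim's quantized gradient entry
assert recover(H(secret), r=2**-16, x=0.2, prec=16) == secret / 2**16
```

In the real attack each candidate comparison costs a homomorphic hash evaluation (a pairing or a modular exponentiation), which is why the bit precision and the range (−x, x) dominate the running time.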
For each hashed gradient set, we chose 100 hashed gradients uniformly at random and launched our attack on each hashed gradient entry to measure the time taken to recover its counterpart (i.e., the float-type gradient entry). We aborted the attack on a hashed gradient entry when its counterpart was not found within the given range (−x, x). Table 1 summarizes the attack results measured over three months, while Fig. 3 presents in detail how fast we recovered gradient entries within the given time period. The attack results are evaluated regarding accuracy and efficiency. Accuracy refers to the ratio of the number of recovered gradient entries over the total number of gradient entries. Efficiency refers to the time taken to complete the attack.

Attack Result
First, we explain the attack results in terms of accuracy. Within a small range (−0.2, 0.2), our attack has an accuracy of 0.875 on average in both 16-bit and 32-bit precision. The accuracy increases as we expand the range, as shown in Table 1. For example, it reaches an accuracy of 0.936 and 0.978 when we launch the attack using DRV for 16-bit precision within the ranges (−0.3, 0.3) and (−0.4, 0.4), respectively. Lastly, our attack shows no significant accuracy difference across the homomorphic hashes. This result is alarming because, regardless of the type of underlying homomorphic hash, our attack can recover gradients with high accuracy.
Next, we discuss our attack results in terms of efficiency. As the table and figures depict, one of the significant factors that affect efficiency is the bit precision. For example, in 16-bit precision, all of our attacks complete within a day, e.g., the most time-consuming attack is on KFM-1024, which takes 20.44 hours. However, when we used 32-bit precision, the attack time grew to between 9.56 and 87.03 days. Such time is still considered extremely short. Moreover, utilizing multi-core capabilities could drastically reduce the time; recall that we used only a single core to launch the attack. Eventually, the attack confirms that the gradients encoded into homomorphic hashes can be recovered within a short period with high accuracy.

VERSA CONSTRUCTION
VERSA is designed to support the verification capability over the SA protocol from Bonawitz et al. [37] using the same lightweight cryptographic gadgets that SA relies on. Accordingly, we first recapitulate the cryptographic primitives of SA (and, accordingly, VERSA). We provide a technical overview of SA and describe the core technique of VERSA for achieving verifiable computation on top of SA. Lastly, we provide details on the design and implementation of VERSA.

Key Agreement
A key agreement (KA) protocol [56] allows two users to generate a shared secret key over a public channel.
The KA consists of the following algorithms:
-pp ← KA.param(λ): This algorithm takes a security parameter λ as input and returns a public parameter pp.
-(pk_u, sk_u) ← KA.gen(pp): This algorithm takes pp as input and returns a public and secret key pair (pk_u, sk_u) for user u.
-s_{u,v} ← KA.agree(sk_u, pk_v): This algorithm takes sk_u of user u and pk_v of user v as input and returns a shared key s_{u,v}.
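As an illustration of the KA interface only, the algorithms above can be sketched with a textbook Diffie-Hellman exchange; the toy modulus and generator below are for demonstration and are not the elliptic-curve parameters used in our implementation.

```python
import secrets

P = 2**127 - 1   # toy prime modulus (illustrative, not secure parameters)
G = 5            # toy generator

def ka_gen():
    """(pk_u, sk_u) <- KA.gen(pp): sample a secret exponent, publish G^sk."""
    sk = secrets.randbelow(P - 2) + 1
    return pow(G, sk, P), sk

def ka_agree(sk_u, pk_v):
    """s_{u,v} <- KA.agree(sk_u, pk_v): shared secret pk_v^{sk_u} mod P."""
    return pow(pk_v, sk_u, P)

pk_u, sk_u = ka_gen()
pk_v, sk_v = ka_gen()
assert ka_agree(sk_u, pk_v) == ka_agree(sk_v, pk_u)   # s_{u,v} = s_{v,u}
```

The symmetry s_{u,v} = s_{v,u} is exactly what the masking in SA relies on.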

Secret Sharing
Shamir's t-out-of-n secret sharing (SS) protocol [47] allows a user to split a secret into n shares such that at least t ≤ n shares are required to recover the secret; fewer than t shares leak no information about the secret. The SS consists of the following algorithms:
-{s_{u,v}}_{v∈U} ← SS.share(s_u, t, U): This algorithm takes a secret value s_u of user u, a threshold t, and a user group U as input, where t ≤ |U|. It returns a set of shares {s_{u,v}}_{v∈U}.
-s_u ← SS.recon({s_{u,v}}_{v∈V}, t): This algorithm takes a set of shares {s_{u,v}}_{v∈V} and the threshold t as input. The protocol aborts if t > |V|; otherwise, it reconstructs s_u.
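A minimal sketch of the SS interface over a toy prime field, using Lagrange interpolation at zero for reconstruction; the field size and share encoding are illustrative.

```python
import secrets

P = 2**61 - 1  # toy prime field for the shares (illustrative)

def ss_share(s, t, users):
    """Split secret s into shares for `users`: evaluate a random degree-(t-1)
    polynomial with constant term s at each user's identity."""
    coeffs = [s] + [secrets.randbelow(P) for _ in range(t - 1)]
    return {u: sum(c * pow(u, i, P) for i, c in enumerate(coeffs)) % P
            for u in users}

def ss_recon(shares, t):
    """Recover s by Lagrange interpolation at x = 0; abort below threshold."""
    if len(shares) < t:
        raise ValueError("fewer than t shares: abort")
    pts = list(shares.items())[:t]
    s = 0
    for i, (xi, yi) in enumerate(pts):
        num = den = 1
        for j, (xj, _) in enumerate(pts):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        s = (s + yi * num * pow(den, P - 2, P)) % P  # den^{-1} via Fermat
    return s

shares = ss_share(123456789, 3, [1, 2, 3, 4, 5])
assert ss_recon({u: shares[u] for u in (2, 4, 5)}, 3) == 123456789
```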

Authenticated Encryption
Authenticated encryption (AE) [57] guarantees the confidentiality and integrity of messages. In this paper, we use symmetric AE (i.e., the encryption and decryption keys are identical). The AE consists of the following algorithms:
-ct ← AE.enc(k, m): This algorithm takes a symmetric key k and a message m as input and returns a ciphertext ct.
-m ← AE.dec(k, ct): This algorithm takes k and ct as input and returns the message m, or ⊥ if the integrity check fails.

Pseudorandom Generator
A pseudorandom generator (PRG) [58] maps an input seed to a pseudorandom output sequence and guarantees that the output distribution on a uniformly chosen seed is computationally indistinguishable from a uniform distribution. In this paper, we use PRG: {0, 1}* → Z_R^n, which expands an input value into an n-dimensional output vector, where R is a large integer.
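Such a PRG can be sketched via counter-mode hashing, in line with our later use of SHA-256 to instantiate the PRG; the modulus R and the byte encoding below are illustrative.

```python
import hashlib

R = 2**64  # modulus of the output space Z_R (illustrative)

def prg(seed: bytes, n: int):
    """Expand a seed into an n-dimensional pseudorandom vector over Z_R
    by hashing seed || counter for each coordinate."""
    out = []
    for ctr in range(n):
        block = hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        out.append(int.from_bytes(block[:8], "big") % R)
    return out

v = prg(b"shared-secret", 5)
assert v == prg(b"shared-secret", 5)   # deterministic in the seed
```

Determinism in the seed is what lets two users holding the same s_{u,v} derive identical mask vectors.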

Public Key Infrastructure
The public key infrastructure [59] binds public keys to the respective identities of users, establishing trust in the ownership of public keys. A user with identity u has a signing and public key pair (sk_u, pk_u) issued by a trusted third party. In VERSA, we use the public key infrastructure when users advertise their keys to join the SA protocol.

Digital Signature
With the public key infrastructure, a digital signature (DS) provides an authentication mechanism that securely binds users to the messages of their choice. The DS consists of the following algorithms:
-σ ← DS.sign(sk, m): This algorithm takes a signing key sk and a message m as input and returns a signature σ.
-{0, 1} ← DS.vrfy(pk, m, σ): This algorithm takes pk, m, and σ as input and returns a verification result of 0 or 1, indicating verification failure and success, respectively.
The SA can be either DS-enabled or DS-disabled; the former guarantees security in the active adversary model and the latter in the honest-but-curious adversary model. In this paper, we build VERSA on top of the DS-disabled SA for ease of presentation. Nevertheless, one can easily extend VERSA using the DS-enabled SA.

Technical Overview of SA
The SA uses KA to allow every pair of users u, v, whose secret keys are sk_u and sk_v, to jointly generate a shared secret value. We denote this value by s_{u,v} when held by user u and by s_{v,u} when held by user v; note that s_{u,v} = s_{v,u}. User u uses s_{u,v} as a seed for the PRG to derive a random vector p_{u,v} = PRG(s_{u,v}) to encrypt the gradient. More precisely, given a set U of n users with logical identities [1, ..., n], user u masks gradient x_u as follows:

y_u = x_u + Σ_{v∈U: u<v} PRG(s_{u,v}) − Σ_{v∈U: u>v} PRG(s_{v,u}) (mod R),

where y_u ∈ Z_R^n. Consider the aggregation of two masked gradients y_u and y_v, where u < v. The two random vectors PRG(s_{u,v}) and PRG(s_{v,u}), generated by u and v respectively, cancel each other out. Consequently, if all users u ∈ U submit y_u successfully, the server obtains the aggregated gradient Σ_{u∈U} x_u = Σ_{u∈U} y_u (mod R). The SA is further elaborated to address user dropouts and late responses.
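The pairwise cancellation can be sketched as follows, with hypothetical fixed seeds standing in for the KA-derived secrets s_{u,v} and toy gradients of small dimension.

```python
import hashlib

R = 2**32   # masks live in Z_R (illustrative modulus)
N = 4       # gradient dimension (illustrative)

def prg(seed: bytes, n: int = N):
    h = hashlib.sha256(seed).digest()
    return [int.from_bytes(h[4*i:4*i + 4], "big") % R for i in range(n)]

users = [1, 2, 3]
x = {1: [1] * N, 2: [2] * N, 3: [3] * N}   # toy gradients x_u
# hypothetical pairwise secrets: the seed for (u, v) equals that for (v, u)
seed = {frozenset({u, v}): f"s-{min(u, v)}-{max(u, v)}".encode()
        for u in users for v in users if u != v}

def mask(u):
    """y_u = x_u + sum_{u<v} PRG(s_{u,v}) - sum_{u>v} PRG(s_{v,u}) (mod R)."""
    y = list(x[u])
    for v in users:
        if v == u:
            continue
        sign = 1 if u < v else -1
        y = [(yi + sign * pi) % R
             for yi, pi in zip(y, prg(seed[frozenset({u, v})]))]
    return y

agg = [0] * N
for u in users:
    agg = [(ai + yi) % R for ai, yi in zip(agg, mask(u))]
assert agg == [6] * N   # pairwise masks cancel; only the sum of x_u remains
```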

User Dropouts
Consider the case where user u drops out before sending his masked gradient. Then all random vectors PRG(s_{v,u}), for v ∈ U, remain uncancelled in Σ_{u∈U} y_u. The SA addresses this concern as follows. Before submitting y_u, user u splits the secret key sk_u into n shares using SS and distributes them so that each user v ∈ U holds exactly one share. The server then asks the surviving users to submit their shares of the dropped-out user's secret key sk_u, so the server recovers the key, computes each PRG(s_{u,v}), and removes all instances of PRG(s_{v,u}) from Σ_{u∈U} y_u.

Late Response
Consider the case where users fail to communicate with the server promptly. Specifically, user u may transfer the gradient late, so the server collects the shares of sk_u. This clearly raises a security issue: the server can recover sk_u from the shares and leak x_u from the late-arriving y_u by deriving and removing all {s_{u,v}}_{v∈U} used for masking x_u. The SA solves this problem by having u mask x_u twice: u selects a random seed b_u and computes

y_u = x_u + PRG(b_u) + Σ_{v∈U: u<v} PRG(s_{u,v}) − Σ_{v∈U: u>v} PRG(s_{v,u}) (mod R).

The server must also remove PRG(b_u) to obtain the aggregated gradient. To this end, every user u splits b_u and sends the shares to all users v ∈ U beforehand. During the unmasking phase, the server then makes an explicit choice for each user: for all dropped-out users u ∈ U, the server asks the surviving users to submit the shares of sk_u; for all surviving users u ∈ U, it asks them to submit the shares of b_u. Consequently, the server can recover the aggregated gradient in plaintext by removing all PRG(s_{v,u}) and PRG(b_u) terms from Σ_{u∈U} y_u.
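The double-masking idea can be sketched as follows for two surviving users, with the share distribution of b_u elided; seeds and dimensions are illustrative.

```python
import hashlib

R = 2**32
N = 4

def prg(seed: bytes, n: int = N):
    h = hashlib.sha256(seed).digest()
    return [int.from_bytes(h[4*i:4*i + 4], "big") % R for i in range(n)]

users = [1, 2]
x = {1: [5] * N, 2: [7] * N}                          # toy gradients
b_seed = {u: f"{u}-personal-seed".encode() for u in users}   # b_u
s_pair = {frozenset({1, 2}): b"pairwise-secret"}      # s_{1,2} = s_{2,1}

def double_mask(u):
    """y_u = x_u + PRG(b_u) +/- pairwise masks (mod R)."""
    y = [(xi + pi) % R for xi, pi in zip(x[u], prg(b_seed[u]))]
    for v in users:
        if v == u:
            continue
        sign = 1 if u < v else -1
        y = [(yi + sign * pi) % R
             for yi, pi in zip(y, prg(s_pair[frozenset({u, v})]))]
    return y

z = [sum(col) % R for col in zip(*(double_mask(u) for u in users))]
# for surviving users, the server reconstructs b_u from shares (elided here)
# and strips PRG(b_u) from the aggregate
for u in users:
    z = [(zi - pi) % R for zi, pi in zip(z, prg(b_seed[u]))]
assert z == [12] * N   # pairwise masks cancelled, personal masks removed
```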

Proposed Scheme
A complete construction of VERSA is provided in Fig. 4, described following the terminology of SA [37] for consistency. Hereafter, we ignore user dropouts and late responses for simplicity; that is, we assume that all pairs (y_u, ȳ_u), u ∈ U, are derived from Eqn. (1) and arrive at the server in time. We believe this assumption helps readers quickly grasp the key ideas behind VERSA.
Intuitively, VERSA achieves verifiability of the aggregated gradient through double aggregation. The first aggregation computes the aggregated gradient itself, whereas the second proves the correctness of the first. In VERSA, each user u submits y_u and ȳ_u. Specifically, ȳ_u encrypts a model verification code F(x_u) = a ⊙ x_u + b, where ⊙ denotes the Hadamard product. The two vectors (a, b) are secrets hidden from the server's view, yet all users u ∈ U can compute the same pair (a, b). Below we explain (i) how the model verification code is used for verifiable computation and (ii) how to share (a, b) without requiring additional communication between users.
The server performs aggregation twice such that it obtains z = Σ_{u∈U} y_u and z̄ = Σ_{u∈U} ȳ_u. Eventually, the users verify z by checking whether the following condition holds:

z̄ = a ⊙ z + n · b (mod R).

Intuitively, the verifiability of z is preserved for the following reason: z̄ encapsulates z using the two vectors (a, b), which are hidden from the server. Thus, the server's probability of forging Σ_{u∈U} x_u (i.e., the aggregated result obtained from z) reduces to the probability of recovering (a, b), which is infeasible due to the one-wayness of the PRG. Moreover, the server cannot recover F(x_u) from ȳ_u as long as the privacy guarantee of SA is preserved; that is, we mask F(x_u) using SA's masking method. Thus, even if (a, b) were revealed to the server through user-server collusion, the server's probability of recovering x_u from ȳ_u reduces to the probability of breaking SA.
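The double aggregation and the user-side check can be sketched as follows, assuming (as in the simplification above) that the SA masks have already cancelled; the vectors (a, b) and the gradients are toy values.

```python
R = 2**32          # arithmetic over Z_R
n_users = 3
a = [3, 5, 7, 11]  # secret pair (a, b): hidden from the server,
b = [1, 2, 3, 4]   # derivable by every user

xs = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # toy gradients x_u

def F(x):
    """Model verification code F(x_u) = a (Hadamard) x_u + b."""
    return [(ai * xi + bi) % R for ai, xi, bi in zip(a, x, b)]

# the server aggregates twice
z    = [sum(col) % R for col in zip(*xs)]
zbar = [sum(col) % R for col in zip(*(F(x) for x in xs))]

def check(z_claim, zbar_claim):
    """User-side check: zbar == a (Hadamard) z + n * b (mod R)."""
    return zbar_claim == [(ai * zi + n_users * bi) % R
                          for ai, zi, bi in zip(a, z_claim, b)]

assert check(z, zbar)                         # honest aggregation passes
assert not check([zi + 1 for zi in z], zbar)  # a forged z is rejected
```

The forged aggregate fails because each entry of the right-hand side shifts by a_i, which the server cannot compensate for without knowing (a, b).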
This simplified description ignores a technical hurdle: all surviving users must hold the pair of vectors (a, b) in advance. One straightforward approach to realizing this assumption is to establish secure channels between the users and share the pair over them. However, this method is impractical in cross-device FL settings. Alternatively, a multiparty computation protocol [60] could allow all users to compute the pair jointly; however, it requires a number of communication rounds linear in the number of participating entities. Although recent advances in multiparty computation achieve a constant number of communication rounds (see [61] for more details), the computation cost is still overwhelming due to heavy cryptographic primitives such as fully homomorphic encryption [62].
We address this issue using secret expansion through the PRG. Specifically, recall that each user u in SA runs KA locally with the public keys of all surviving users v ∈ U to compute a set of secret values {s_{u,v}}_{v∈U} for masking his gradient. Our approach allows every surviving user to derive another secret value from {s_{u,v}}_{v∈U} through the PRG. First, user u computes α ← Σ_{v∈U} s_{u,v} (mod R). Next, u expands α into the two vectors (a, b) through the PRG. Due to the one-wayness and pseudorandomness of the PRG, the server can recover neither α nor (a, b) from (z, z̄). Meanwhile, every surviving user u can derive α from {s_{u,v}}_{v∈U} and generate (a, b), thus verifying z.
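A two-user sketch of the secret expansion: both holders of the pairwise secret s_{u,v} = s_{v,u} derive the same (a, b) locally through the PRG, with no extra communication rounds. The domain-separation tags and byte encodings are illustrative.

```python
import hashlib

R = 2**61 - 1   # modulus for the secret space (illustrative)
N = 4           # gradient dimension (illustrative)

def prg(seed: bytes, n: int):
    """Counter-mode hash expansion standing in for the PRG."""
    return [int.from_bytes(hashlib.sha256(seed + bytes([i])).digest()[:8],
                           "big") % R for i in range(n)]

def expand_ab(pairwise_secrets):
    """Expand a user's KA secrets {s_{u,v}} into the shared pair (a, b)."""
    alpha = sum(pairwise_secrets) % R          # alpha <- sum of s_{u,v} mod R
    seed = alpha.to_bytes(8, "big")
    return prg(seed + b"a", N), prg(seed + b"b", N)

# both users hold the same single pairwise secret, so both derive (a, b)
s_uv = 123456789
assert expand_ab([s_uv]) == expand_ab([s_uv])
```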

Correctness
The correctness of the sums (i.e., Σ_{u∈U} x_u and Σ_{u∈U} F(x_u)) in VERSA reduces to the correctness of SA, and this reduction holds even when some users drop out. For ease of presentation, we assume that the server receives all pairs (y_u, ȳ_u), u ∈ U, and performs SA correctly. In the case of aggregating y_u, the following condition holds:

z = Σ_{u∈U} y_u = Σ_{u∈U} x_u (mod R),

where all PRG(s_{u,v}) and PRG(s_{v,u}) generated by users u and v cancel each other out.
In the case of aggregating ȳ_u, the following condition holds:

z̄ = Σ_{u∈U} ȳ_u = Σ_{u∈U} F(x_u) = a ⊙ Σ_{u∈U} x_u + n · b (mod R).

Soundness
Intuitively, VERSA guarantees soundness when the server can convince users if and only if it returns a correct aggregated gradient; the server cannot generate a valid proof (i.e., z̄) without running VERSA correctly. To formally capture the soundness property of VERSA, we use a game-based security model in which an adversary (who tries to break the scheme) interacts with a challenger (who runs the scheme). The adversary is the server, and the challenger is an entity representing the (surviving) users. Assuming that the PRG is secure, we demonstrate that VERSA is secure. To do so, we define a soundness game in which the adversary is given a set of pairs {(y_u, ȳ_u)}_{u∈U} by the challenger. The adversary returns (z, z̄) attesting that the aggregation was done correctly; its goal is to induce the challenger to accept a false pair (z*, z̄*). In this setting, we restrict the challenger to verify (z*, z̄*) by running the Validating Output phase of VERSA, not by aggregating {(y_u, ȳ_u)}_{u∈U} from scratch. This restriction is reasonable because, in a cross-device FL setting, no entity except the server (the adversary in the game) accesses all pairs {(y_u, ȳ_u)}_{u∈U}. The soundness game is as follows:
Soundness Game:
-Setup: The challenger C runs the Setup to ensure users agree on the cryptographic gadgets, security parameter λ, number of users n, threshold t, and public parameter pp ← KA.param(λ). On behalf of the users, C runs the Advertising Key and Sharing Key phases with the server; specifically, all users use C as a proxy that relays messages from and to users. In addition, C returns all public parameters and output values of the Advertising Key and Sharing Key phases to A.
-Query: A adaptively makes Masking Input queries. It sends a user set U* of its choice to C, where t ≤ |U*|. C then runs the Masking Input phase with all users in U* to obtain a set of masked input vectors {(y_u, ȳ_u)}_{u∈U*} and returns this set to A. Next, A continues to query C for the sets of masked input vectors corresponding to user sets U* of its choice. A and C record all sets of masked input vectors and the user sets in order.
-Challenge: C chooses a user set U* and requests an aggregation of all masked input vectors corresponding to U*.
-Forge: A computes and returns a pair of vectors (z*, z̄*). If C accepts z* after running the Validating Output phase, then A wins the game; otherwise, A loses the game. We then have the following definition.

SECURITY ANALYSIS
In this section, we analyze the security of VERSA with regard to gradient privacy and soundness. In terms of gradient privacy, any attempt to leak gradients in VERSA reduces to breaking the underlying gradient encryption protocol, i.e., SA, because VERSA runs on top of SA to encrypt gradients. Because SA hides all information about users' individual gradients except their sum, and because a user never decrypts ciphertexts outside his local storage throughout the protocol, the privacy of each individual gradient is preserved in the presence of the server and any subset U' ⊂ U of users.
In terms of soundness, to prove that the server can convince users if and only if it performs aggregation correctly, we rely on the security of the PRG. More precisely, the PRG guarantees that its output on a seed chosen uniformly at random is computationally indistinguishable from an element of the output space sampled uniformly at random, as long as the seed is hidden from the distinguisher.
Theorem 1. Provided that the PRG is secure, VERSA guarantees soundness in the random oracle model.
Proof. Throughout the proof, we reduce the security of the proposed scheme to the security of the PRG. Specifically, we assume that adversary A wins the Soundness Game with non-negligible probability ε. Then, we show how another adversary B uses A to break the security of the PRG. Without loss of generality, we assume that dropouts do not occur, for simplicity of the proof; however, the proof extends easily to dropouts by employing the SS.recon algorithm. We model an instance of the PRG as an oracle O_P and a random oracle O_R, such that O_P and O_R answer oracle queries with PRG outputs and truly random sequences, respectively. Note that B does not communicate with the oracles directly. Instead, we model C as an oracle proxy that receives oracle queries from B, flips a coin c ∈ {P, R}, and relays the queries to O_c. Then, B interacts with A as follows:
Setup: B chooses a KA protocol, SS protocol, and AE and sets the security parameter λ, number of users n, and threshold t. Lastly, B obtains pp ← KA.param(λ) to ensure users agree on the cryptographic gadgets and λ, n, t, and pp.
Query: A adaptively makes Masking Input queries: A selects a set of vectors of the same dimension F = {x_1, ..., x_n} and sends F to B. Then, B encrypts F and sends {(y_u, ȳ_u)}_{u∈U} back to A, where each pair (y_u, ȳ_u) corresponds to x_u. Further, A continues to query B for vector sets F' of its choice. More precisely, B responds to A's queries as follows:
-If A makes a query for an F that has not been made before, then B makes oracle queries to C: B requests vectors of the same dimension from C, which retrieves the recorded tuple (c, U, V, {V_u}_{u∈U}) and returns (V, V_u), where the vectors are derived from O_c.
-For each user u ∈ U, B computes F(x_u) = a ⊙ x_u + b and computes (y_u, ȳ_u) accordingly. Lastly, B sends {(y_u, ȳ_u)}_{u∈U} to A.
Challenge: B requests from A the aggregation of a certain vector set F that B previously received from A.
Forge: A generates and sends a proof (z*, z̄*) to B. Then, B checks the validity of (z*, z̄*). Because we assume that A wins the Soundness Game with non-negligible probability ε, (z*, z̄*) is a valid proof with the same probability. If z̄* = a ⊙ z* + n · b (mod R), then B returns P, indicating that the vectors B received are derived from the instance of the PRG; otherwise, B returns R, indicating that the vectors are derived in a truly random manner.
The proof further proceeds as follows. If the vectors are derived from the instance of the PRG, then B has correctly simulated the Masking Input queries of A. As the view of A in the simulation is identical to its view in the Soundness Game, the probability of B distinguishing between the outputs of the PRG and a truly random generator is reduced to the probability ε of A winning the Soundness Game; specifically, the probability of A winning reduces to the probability of A recovering (a, b), which is derived from the instance of the PRG. Even when the vectors are not derived from the instance of the PRG, B has still simulated the Masking Input queries of A correctly. In this case, however, B cannot exploit the advantage of A, because the probability of A winning reduces to the probability of A recovering (a, b) derived in a truly random manner; thus, the probability of B distinguishing between the outputs of the PRG and a truly random generator is no better than flipping a coin. Therefore, B distinguishes the outputs of the PRG from a truly random generator with non-negligible advantage, contradicting the security of the PRG. This concludes the proof. □

EVALUATION
In this section, we evaluate VERSA compared to SA [37] and VerifyNet [30], which are the most relevant state-of-the-art proposals for privacy-preserving cross-device FL. We use these two schemes as a baseline to illustrate how efficiency can be improved while achieving a higher level of security.

Implementation Setup
We measured the computation time for four phases: the sharing key, masking input, unmasking input and returning output, and validating output. We ignored nondominant costs, such as running the setup and advertising key, which are considered one-time costs. The experiment was conducted using Java on a desktop machine with a 3.00 GHz Intel Core i7-9700 processor and 16 GB RAM. We used elliptic-curve Diffie-Hellman, t-out-of-n Shamir secret sharing, the Advanced Encryption Standard in Galois/Counter Mode with a 128-bit private key, and SHA-256 to implement the KA, SS, AE, and PRG, respectively. We set t = 10 for SS. We used randomly generated 10K-entry vectors with 64-bit entries, while varying the number of users and the user dropout ratio, to gain a general perspective on how the two factors affect the performance of the four phases. To implement VerifyNet, we utilized a pairing-based cryptography library written in Java [63]. The experiment was performed on two user groups, consisting of 500 and 1,000 users, respectively.

Experimental Result
We provide the comparative experimental result in Table 2.
The table presents the overall performance of the three schemes while varying the number of users and the user dropout ratio. In the sharing key phase, we did not observe any noticeable performance gap among the three schemes. In the masking input phase, SA exhibits the best efficiency over VerifyNet and VERSA because SA only supports the privacy preservation of gradients, whereas the others support verification of aggregated gradients as a supplementary functionality. The cost of VERSA is approximately double that of SA because VERSA performs vector expansion through the PRG twice as often as SA, and vector expansion is the computationally dominant operation in both schemes. In contrast, VerifyNet incurs significant costs compared with both SA and VERSA because of its extensive usage of group operations, whose computation cost overwhelms that of vector expansion through the PRG. In the unmasking input and returning output phase, VERSA and SA show a similar cost when no dropouts occur. However, the two schemes exhibit a noticeable cost difference under dropouts because the server reconstructs twice as many vectors as in SA. The cost of VerifyNet overwhelms the costs of SA and VERSA for the same reason as in the masking input phase. Lastly, in the validating output phase, VerifyNet and VERSA incur a constant cost irrespective of the number of users and the user dropout ratio. However, the cost of VERSA is orders of magnitude smaller than that of VerifyNet. The main reason for this performance gap is that VERSA uses only computationally lightweight PRG operations, whereas VerifyNet performs pairing operations, which are much more computationally intensive than the PRG.

Dataset
We conducted an evaluation using the following three datasets:
-MNIST [40] is a dataset of grayscale images of handwritten digits from 0 to 9, consisting of 60,000 training and 10,000 testing images. Each image has a size of 28 × 28 × 1 and one of 10 labels.
-SVHN [41] is a dataset of RGB images of house numbers obtained from Google Street View. Similar to MNIST, it contains images of small, cropped digits but incorporates much more labeled data from significantly more challenging, unresolved real-world problems. It is divided into 73,257 digits for training and 26,032 digits for testing. This dataset has 10 classes, one for each digit, and each image has a size of 32 × 32 × 3.
-CIFAR100 [42] is a collection of RGB photos of 100 object classes, containing 500 training images and 100 testing images per class, each of size 32 × 32 × 3.
The characteristics of the above three datasets are summarized in Table 3.

Neural Networks
Each dataset was trained using a different model. For the MNIST dataset, we used a three-layer network with two hidden, fully connected layers of 256 neurons each and rectified linear units. The output layer is fully connected with 10 output neurons and softmax activation. For the SVHN and CIFAR100 datasets, we used a convolutional neural network consisting of seven convolutional layers with 3 × 3 filters and a stride of 1. Each convolutional layer was followed by rectified linear units and 2 × 2 max pooling with a stride of 2. The fully connected layer used softmax activation.
For the three datasets, we also used ResNet [64], a model used in real-world environments. ResNet consists of 50 convolutional layers with rectified linear units and one fully connected layer with 2,048 neurons. Because ResNet was originally designed for ImageNet, we reduced the number of output neurons from 1K to the number of labels in each dataset.
The stochastic gradient descent optimizer was used for the MNIST dataset with a learning rate of 0.001, and the Adam optimizer was used for the simple convolutional neural networks with a learning rate of 0.001. For ResNet, we again used the stochastic gradient descent optimizer, following the training method in [65], i.e., training at a different learning rate per epoch.
The gradient size of each dataset's fully connected output layer was calculated by multiplying the input and output sizes and adding the bias size, as listed in Table 4.

Default Model Accuracy
In the experiment, N models were trained, where each model was trained by an individual user on 1,000 randomly selected samples from each dataset. Each user received a pre-trained model and trained only the fully connected layer while freezing the parameters of the convolutional layers. The accuracy was then evaluated on each test set, where N ∈ {500, 1000}. The minimum and maximum accuracy values of the N models are displayed in Fig. 5, which plots the model accuracy evaluated by each user. The minimum/maximum accuracy difference becomes slightly greater as the number of trained models increases from 500 to 1,000. We provide the model accuracy results in these figures as a baseline against which to measure the FL accuracy.

FL Accuracy
We compared the accuracy of the N default models with that of FL in Fig. 6. From the experimental results, we did not observe any substantial accuracy difference between the two groups of models; nevertheless, the accuracy of FL is consistently slightly higher than that of the default models. Although the accuracy was measured only on the four types of models, there is still room to argue that FL performs slightly better than the default models.

Accuracy Comparison
The gradient entries must be integers to encrypt the gradient of the model. Because all gradient entries are decimal numbers, we transformed them into integers following the float-to-integer conversion method [52], [53]. We quantized a floating-point value v into an integer ⌊v · a⌋ with a scaling factor a, where larger values of a lead to lower quantization errors. In the experiment, we set a = 10^x, where x is called a walk and each walk is a positive integer. This transformation entails a trade-off between computational overhead and accuracy: a larger walk increases computation time, whereas a smaller walk reduces the accuracy of the FL model. Fig. 7 demonstrates the accuracy difference between the floating-point FL models and the float-to-integer transformed FL models. The accuracy of the floating-point FL models is always greater than or equal to that of the integer FL models. When the walk is less than 3, the accuracy of the floating-point FL model is always higher than that of the float-to-integer transformed FL model. However, as we increased the walk, we observed no significant difference between the two types of models; when the walk is larger than 6, no accuracy difference occurred at all. In Fig. 8, we plot the effect of the walk on accuracy in detail. With a smaller walk (e.g., walk = 1), the accuracy decreased significantly compared with the floating-point FL models. However, this accuracy gap was drastically reduced when walk ≥ 3.
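The conversion can be sketched as follows; `quantize`/`dequantize` are illustrative helper names, and truncation toward zero coincides with ⌊v · a⌋ for non-negative v.

```python
def quantize(v: float, walk: int) -> int:
    """Map a float to an integer with scaling factor a = 10^walk."""
    a = 10 ** walk
    return int(v * a)   # truncation; equals floor(v * a) for v >= 0

def dequantize(q: int, walk: int) -> float:
    """Map a quantized integer back to a float."""
    return q / 10 ** walk

g = 0.123456
for walk in (1, 3, 6):
    err = abs(g - dequantize(quantize(g, walk), walk))
    assert err < 10 ** -walk   # larger walk -> smaller quantization error
```

The bound err < 10^-walk makes the trade-off explicit: each additional walk shrinks the worst-case quantization error tenfold while enlarging the integers the protocol must process.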

RELATED WORK
We describe previous studies on secure FL along two main lines: privacy preservation and verifiable computation.

Privacy-Preserving Deep Learning
Training a machine learning model through a third-party provider, such as machine learning as a service, may leak sensitive training data despite its promise. For example, neural network parameters trained locally on the user side can be exploited to leak sensitive training examples [7], [8], [10]. Some proposals apply differential privacy techniques to the gradients obtained by the stochastic gradient descent rule to protect against training data leakage attacks [9], [20]. However, these schemes rely on a trusted party who can access the training data to add noise.
Bonawitz et al. [37] proposed an SA protocol for cross-device federated learning settings. In SA, the local model parameters are encrypted, while a central server can derive a global, aggregated model parameter from the encrypted local model parameters without decryption. The SA uses a PRG to mask model parameters, making it best suited for encrypting high-dimensional model parameter vectors on resource-constrained user devices. Moreover, the SA is resilient to user dropouts, which occur frequently in a cross-device setting. Some prior work has relied on homomorphic encryption for privacy preservation during training [21], [22], but at an impractical performance cost.
Sav et al. [23] recently proposed POSEIDON, a privacy-preserving FL protocol. Their scheme relies heavily on a multiparty variant of homomorphic encryption, and the user dropout issue remains untouched; we consider both crucial in cross-device FL. Therefore, we designed VERSA specifically on top of SA, to inherit all of its beneficial properties.

Verifiability in Federated Learning
Only a few studies have addressed verifiability in the context of FL. Recently, Xu et al. [30] proposed VerifyNet, an SA protocol with verifiable computation. VerifyNet is similar to VERSA for two reasons. First, both schemes are designed to run on top of SA for cross-device FL settings [37]. Second, both aim to provide verifiability of the global model parameters in the training phase. In VerifyNet, all users must share a secret value that is used to generate an evaluation key for a parameter. Each user runs a local training model and encodes the output parameter into an evaluation key. The server receives the encrypted local model parameters and evaluation keys and returns a proof by aggregating all evaluation keys and computing the sum of all parameters. The user verifies the correctness of the sum using the proof.
VerifyNet has two limitations. First, the user-side computational cost is too extensive to process high-dimensional data. Specifically, for each entry in a gradient, VerifyNet requires six exponentiations in a cyclic group G to generate an evaluation key, and four pairings and one exponentiation to verify the result. Second, VerifyNet could allow malicious users to learn a victim's local model parameters. As we show in Section 4, the attack results on VerifyNet demonstrate that an adversary can recover most of a victim's local model parameters in a reasonable time.
Guo et al. [66] recently proposed VERIFL to reduce the communication cost of VerifyNet when processing high-dimensional gradient vectors. To achieve verifiability in FL, VERIFL employs a homomorphic hash in the same way as VerifyNet, i.e., all users in VERIFL share the secret key of the homomorphic hash. Thus, because our model recovery attack on VerifyNet exploits the shared key, VERIFL is straightforwardly vulnerable to the same attack. Fu et al. [32] proposed VFL, a verifiable FL protocol for cross-device settings. However, similar to VerifyNet and VERIFL, VFL relies on the assumption that all users share a predefined secret parameter in advance to verify the global model parameters, which is not practical in cross-device federated learning settings.
While VERSA and prior work [30], [32], [66] aim at achieving the verifiability of aggregated models under the untrusted central server, several works addressed it under different threat models. Specifically, Zhang et al. [67] showed how to guarantee the correctness of aggregated models from misbehaving users who may not execute training tasks as intended. Peng et al. [68] addressed the setting where the role of the single central server is distributed to multiple parties. They then achieved the verifiability of aggregated models, provided that a majority of parties are honest.

Verifiability in Various Settings
In addition to the verifiable FL schemes described above, several proposals have introduced different verifiable machine learning approaches. Ghodsi et al. [52] proposed SafetyNets, a neural network framework for verifying the correctness of inference tasks in a single client-server setting. Xu et al. [69] demonstrated how to support verifiability for inference tasks using homomorphic encryption. Tramèr et al. [70] proposed Slalom to verify machine learning tasks using a trusted execution environment. Slalom is faster than computationally expensive cryptographic technique-based approaches like [69], but it relies heavily on trust assumptions concerning chip vendors, such as Intel's Software Guard Extensions [71]. Niu et al. [72] proposed a verifiable inference protocol for a typical client-server setting, but its computational workload is extensive due to the heavy usage of bilinear pairings.

CONCLUSION
In this paper, we studied the problem of verifying the model aggregation of a neural network in the presence of a federation of users and an untrusted central server. We designed a novel attack against prior work and demonstrated that local model parameters are revealed within a reasonable time (e.g., 98% of the encrypted local model parameters sent to the central server during the training phase are recovered within 21 h), violating privacy. Then, we proposed VERSA, a verifiable secure aggregation protocol for cross-device federated learning. VERSA effectively supports privacy-preserving model aggregation and verifiability, even when the model parameters are high-dimensional vectors, in a communication-efficient and failure-robust manner. VERSA does not require any trusted setup between users while enabling both the central server and users to use only a PRG to prove and verify the correctness of model aggregation. We experimentally confirmed the efficiency of VERSA on three datasets (MNIST, SVHN, and CIFAR100), demonstrating that VERSA is orders of magnitude faster than the verification in prior work.