Privacy Preserving Collaborative Machine Learning

Collaborative machine learning is a promising paradigm that allows multiple participants to jointly train a machine learning model without exposing their private datasets to other parties. Although collaborative machine learning is more privacy-friendly than conventional machine learning methods, the intermediate model parameters exchanged among different participants in the training process may still reveal sensitive information about participants’ local datasets. In this paper, we introduce a novel privacy-preserving collaborative machine learning mechanism by utilizing two non-colluding servers to perform secure aggregation of the intermediate parameters from participants. Compared with other existing solutions, our solution can achieve the same level of accuracy while incurring significantly lower computational cost.


Introduction
Collaborative machine learning is a promising paradigm for training models from datasets hosted by distributed parties. In contrast to conventional centralized machine learning, in which a central server with access to all the training data trains a machine learning model locally, collaborative machine learning allows multiple parties, each with a local dataset, to jointly train a global model over the whole dataset without revealing any party's local dataset to others. Collaborative machine learning is particularly attractive when local datasets involve highly sensitive information such as health records.
Unfortunately, even though the local training dataset of each participant is kept secret from other parties during the training process, intermediate model parameters exchanged among different participants during training may still reveal some information about the local dataset, which may be used to infer or even recover the local dataset of a target participant [1][2][3][4]. In particular, recent studies have shown that it is possible to reconstruct local input data from gradient information in collaborative learning [5]. Therefore, it is important to design a sound mechanism to prevent such privacy leakage in collaborative machine learning.
Existing solutions for protecting participants' data privacy in collaborative machine learning can be broadly divided into two categories. The first category [6][7][8][9][10][11] uses cryptographic techniques to encrypt the intermediate model parameters while still allowing global model updates. Although these solutions can ensure the confidentiality of local model parameters during the training process, encryption and decryption operations usually incur high computation and communication costs, which may even be infeasible for resource-constrained mobile devices. The second category [12][13][14][15][16][17] adopts the Differential Privacy (DP) paradigm and has each party randomly perturb its intermediate model parameters to prevent others from inferring its local dataset while still allowing a reasonably accurate model to be trained. In comparison with the encryption-based solutions, DP-based collaborative machine learning methods are easier to deploy and incur much lower computation cost. However, there is an inherent trade-off between the level of privacy guarantee and the accuracy of the trained model. In addition, the random noise introduced in every iteration may make the training process converge much more slowly.
In this paper, we tackle this challenge by introducing a novel privacy-preserving collaborative machine learning mechanism. Our mechanism exploits two non-colluding servers and efficient cryptographic techniques to realize secure aggregation of local model parameters during the training process and to protect the privacy of the local dataset of each participant. Compared with existing solutions, our solution can train a global model with the same accuracy as standard machine learning methods while incurring very low computation and communication overheads. Our contributions in this paper can be summarized as follows.
• We propose a novel privacy-preserving collaborative machine learning mechanism that exploits two non-colluding servers and efficient cryptographic primitives.
• Our mechanism allows multiple parties to train an accurate global model without revealing any party's local dataset to each other.
• We confirm the efficacy and efficiency of the proposed mechanism via detailed simulation studies.
The rest of this paper is structured as follows. Section 2 discusses the related work. Section 3 formulates the problem to be solved and presents the system model and adversary model. Section 4 presents our proposed privacy-preserving mechanism. Section 5 describes the experiments we conduct to evaluate the performance of our proposed mechanism. Section 6 concludes this paper.

Related Work
Various solutions have been proposed to protect the data privacy of participants in collaborative machine learning, which can be broadly classified into two categories: encryption-based solutions and Differential Privacy (DP)-based solutions.
Encryption-based solutions protect the intermediate model updates through Secure Multi-Party Computation (SMPC). SMPC was developed for scenarios where multiple parties wish to jointly evaluate a function over their private data without revealing any party's data to other parties. Different SMPC techniques have been used to realize privacy-preserving collaborative learning, including Yao's garbled circuit protocol [6,7], homomorphic encryption [8,9], and secure aggregation methods [10,11]. These encryption-based mechanisms do not harm the prediction accuracy of the trained model because they do not change the model parameters. On the other hand, since these techniques commonly involve expensive public key operations, they usually incur high computation and communication costs.
DP-based solutions protect participants' data privacy by having each participant randomly perturb its local model parameters. Different DP-based solutions add random noise to different quantities, including local model parameters [12][13][14], local objective functions [15,16], and local training datasets [17]. DP-based collaborative machine learning methods provide a tunable balance between data privacy and model utility. The additional computation cost introduced by the perturbation is also small, making DP-based solutions more efficient than encryption-based methods. However, the noise introduced during the training process decreases the accuracy of the trained model.

Problem Formulation
In this section, we introduce the system and adversary models along with our design goals.

System Model
We consider a system in which a central server and N participants collaboratively train a global model. We use the Alternating Direction Method of Multipliers (ADMM) [18] as our machine learning method. ADMM is a promising machine learning framework that has attracted a lot of attention in recent years due to its ability to support a wide range of objective functions under mild constraints such as weak convexity.
In ADMM, participants minimize their local loss functions based on their local datasets and reach consensus with others to train a global model. The constrained collaborative optimization problem can be formulated as follows:

$$\min_{x_1, \ldots, x_N, z} \; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \quad x_i = z, \quad i = 1, \ldots, N,$$

where $x_i \in \mathbb{R}^D$ is participant i's local copy of the model parameter to learn, and $f_i$ is participant i's local loss function. A consensus over $x_1, \ldots, x_N$ needs to be reached with the global copy of the model $z$ to complete the training process.
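For reference, the textbook (unscaled) form of consensus ADMM with penalty parameter $\rho > 0$ and dual variables $\lambda_i$ is sketched below. This is the standard formulation and is stated here as an assumption; the exact updates used in this paper may differ in detail. The z-update shown is the one that applies when no regularizer is placed on $z$; with an $\ell_1$ regularizer, as in Lasso, the average is additionally passed through a soft-thresholding operator.

```latex
\begin{align*}
L_\rho\big(\{x_i\}, z, \{\lambda_i\}\big)
  &= \sum_{i=1}^{N} \Big( f_i(x_i) + \lambda_i^{\top}(x_i - z)
     + \frac{\rho}{2}\,\lVert x_i - z \rVert_2^2 \Big) \\
x_i^{k+1}
  &= \arg\min_{x_i} \; f_i(x_i) + (\lambda_i^{k})^{\top}(x_i - z^{k})
     + \frac{\rho}{2}\,\lVert x_i - z^{k} \rVert_2^2 \\
z^{k+1}
  &= \frac{1}{N} \sum_{i=1}^{N} \Big( x_i^{k+1} + \frac{1}{\rho}\,\lambda_i^{k} \Big) \\
\lambda_i^{k+1}
  &= \lambda_i^{k} + \rho\,\big( x_i^{k+1} - z^{k+1} \big)
\end{align*}
```

In this form the z-update depends on the participants only through a sum of per-participant terms, which is the property a SUM-based secure aggregation can exploit.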
The high-level system structure and the training process are illustrated in Fig. 1. Participants are responsible for updating and maintaining their own $x$ values, while the server aggregates the locally updated $x$ values from all participants and updates the global $z$ value. The system works in a synchronous fashion. In the $(k+1)$th iteration, each participant i uses the current global model parameter $z^k$ to calculate the new value $x_i^{k+1}$ according to the x-update step in Eq. (3) and then sends $x_i^{k+1}$ and $\lambda_i^k$ to the server. The central server gathers the received $x$ and $\lambda$ values from the N participants, calculates the new global parameter $z^{k+1}$ based on the z-update step in Eq. (3), and sends it to all participants. After receiving the new global parameter $z^{k+1}$, each participant i calculates the new $\lambda$ value based on the λ-update step in Eq. (3). This alternating, iterative parameter updating process terminates when the change in $z$ between two adjacent iterations and the maximum difference between $x_i$ and $z$ are both smaller than predefined thresholds.
In this collaborative learning mechanism, the server only requires participants to send their updated $x$ and $\lambda$ values to train the final model, so participants never expose their local datasets to other parties.
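The message flow described above can be illustrated with a minimal sketch of consensus ADMM in its textbook form. To stay short and dependency-free, it assumes a one-dimensional least-squares model without the Lasso regularizer, so the x-update has a closed form; the function names, the toy data, and the values of `rho` and `iters` are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (feature u, label v) for a 1-D linear model v ~ x * u


def x_update(data: List[Sample], lam: float, z: float, rho: float) -> float:
    """Participant step: minimise
    0.5 * sum((u*x - v)^2) + lam*(x - z) + (rho/2)*(x - z)^2 in closed form."""
    suu = sum(u * u for u, _ in data)
    suv = sum(u * v for u, v in data)
    return (suv - lam + rho * z) / (suu + rho)


def consensus_admm(parts: List[List[Sample]], rho: float = 10.0,
                   iters: int = 500) -> float:
    """Run synchronous consensus ADMM and return the global parameter z."""
    n = len(parts)
    z = 0.0               # global copy kept by the server
    lams = [0.0] * n      # one dual variable per participant
    for _ in range(iters):
        # Each participant updates x_i from the current z and its own lambda_i.
        xs = [x_update(d, lam, z, rho) for d, lam in zip(parts, lams)]
        # Server: z-update is an average of per-participant terms.
        z = sum(x + lam / rho for x, lam in zip(xs, lams)) / n
        # Each participant updates lambda_i from the new z.
        lams = [lam + rho * (x - z) for x, lam in zip(xs, lams)]
    return z


# Toy local datasets for three participants (illustrative values only).
parts = [
    [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)],
    [(1.0, 1.8), (2.0, 4.1)],
    [(2.0, 4.3), (4.0, 7.9)],
]
z_final = consensus_admm(parts)
```

Only the per-participant values entering the z-update leave each participant; the raw samples in `parts` are never sent to the server in this flow.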

Adversary Model
We assume that N participants and a central server jointly train a machine learning model using the ADMM-based collaborative learning algorithm. We assume that the adversary is either an honest-but-curious central server or a malicious participant that engages in the collaborative training and eavesdrops on the updated parameters from the victim. The adversary's goal is to infer a victim participant's private dataset via inference attacks.
We assume that the adversary can observe the communication between the server and the victim participant. Specifically, the adversary knows the $z$ values sent by the server, and the $x$ and $\lambda$ values sent by the victim participant in each iteration. However, we assume that the adversary does not have direct access to the victim's local dataset.
We stress that if the accurate global model is published by the end of training, the information leakage of a participant's local dataset caused by the final model itself is inevitable.Therefore, we only seek to prevent the information leakage caused by the intermediate parameters in the training process alone.
Inspired by the secure aggregation mechanisms [10,11], we observe that the iterative updates of the global model parameter can be considered as a SUM aggregation of local model parameters. Therefore, we propose a lightweight privacy-preserving collaborative learning method that exploits two non-colluding servers to allow participants to train an accurate model efficiently without revealing information about their private datasets. Such two non-colluding servers have also been used in other secure multiparty computation applications such as [19].
We assume there are two non-colluding servers: a primary server $S_1$ and an auxiliary server $S_2$. Server $S_1$ is responsible for the global model updates, and server $S_2$ is used only for privacy provision. We assume that each participant i shares a secret key $K_i$ with $S_2$ via a secure channel. The proposed privacy-preserving collaborative machine learning algorithm works as follows.
Before the first iteration, server $S_1$ sends the IDs of all the participants to $S_2$, then randomly selects its initial value $z^0$ and broadcasts it to all N participants. Each participant i selects its initial $\lambda_i^0$.
In the $(k+1)$th iteration ($k \geq 0$), each participant i first executes the x-step according to Eq. (3) to obtain $x_i^{k+1}$ and computes its local input to the z-update from $x_i^{k+1}$ and $\lambda_i^k$.
Let $H : \{0,1\}^* \rightarrow \mathbb{R}^D$ be a cryptographic hash function that maps any input into a D-dimensional vector. In particular, suppose that each component of $\lambda_i^k$ and $x_i^k$ is represented as a q-bit binary number, i.e., takes a value in $\{0, \ldots, 2^q - 1\}$. Given a cryptographic hash function that maps any input to a digest of $h$ bits, where $h \geq qD$, we can realize $H(\cdot)$ by deriving each of the D q-bit components of the output vector from the h-bit digest.
Participant i then generates a random noise $r_i^{k+1} = H(K_i \| k)$, where $K_i$ is the secret key shared with $S_2$. It then masks its local input with $r_i^{k+1}$ to obtain $y_i^{k+1}$ and sends $y_i^{k+1}$ to $S_1$. After receiving $y_1^{k+1}, \ldots, y_N^{k+1}$, server $S_1$ informs $S_2$ that it has received the local model parameters from all participants. $S_2$ then computes the aggregated noise $r^{k+1}$ from the keys it shares with the participants and sends $r^{k+1}$ to $S_1$. On receiving $r^{k+1}$, server $S_1$ removes the aggregated noise from the aggregate of the received $y$ values, computes $z^{k+1}$, and sends it to all participants.
After receiving the new global parameter $z^{k+1}$, each participant i calculates the new $\lambda$ value based on the λ-update step in Eq. (3), and the $(k+1)$th iteration ends. Fig. 2 illustrates the proposed privacy-preserving collaborative learning system.
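The masking and unmasking steps of one iteration can be sketched as follows. This is a minimal illustration under stated assumptions rather than the paper's exact construction: each participant's input is assumed to be already encoded as a vector of non-negative q-bit integers, noise is added modulo $2^q$, the digest is split into D blocks of q bits, and `K_i || k` is encoded as the key bytes followed by `b"||"` and the decimal iteration index. The names `Q_BITS`, `hash_to_vector`, `mask`, `aggregate_noise`, and `unmask_sum` are hypothetical.

```python
import hashlib
from typing import List

Q_BITS = 32           # assumed bit width q of each encoded parameter
MOD = 1 << Q_BITS     # arithmetic is done modulo 2^q (an assumption)
D = 8                 # SHA-256 yields h = 256 bits, so h >= q * D holds


def hash_to_vector(key: bytes, k: int) -> List[int]:
    """Derive a D-dimensional noise vector from SHA-256(key || k)."""
    digest = hashlib.sha256(key + b"||" + str(k).encode()).digest()
    step = Q_BITS // 8
    return [int.from_bytes(digest[j * step:(j + 1) * step], "big")
            for j in range(D)]


def mask(w: List[int], key: bytes, k: int) -> List[int]:
    """Participant: add the key-derived noise to its local input."""
    r = hash_to_vector(key, k)
    return [(wj + rj) % MOD for wj, rj in zip(w, r)]


def aggregate_noise(keys: List[bytes], k: int) -> List[int]:
    """Auxiliary server S2: sum the noise of all participants."""
    total = [0] * D
    for key in keys:
        total = [(t + rj) % MOD for t, rj in zip(total, hash_to_vector(key, k))]
    return total


def unmask_sum(ys: List[List[int]], r: List[int]) -> List[int]:
    """Primary server S1: sum the masked inputs, then remove the aggregated noise."""
    total = [0] * D
    for y in ys:
        total = [(t + yj) % MOD for t, yj in zip(total, y)]
    return [(t - rj) % MOD for t, rj in zip(total, r)]


keys = [b"key-1", b"key-2", b"key-3"]                 # shared with S2 only
ws = [[10 * i + j for j in range(D)] for i in range(3)]  # encoded local inputs
ys = [mask(w, key, 0) for w, key in zip(ws, keys)]    # sent to S1
recovered = unmask_sum(ys, aggregate_noise(keys, 0))  # exact sum of ws
```

In this sketch $S_1$ sees only the masked vectors and the aggregated noise, while $S_2$ sees no participant input at all; the sum is recovered exactly as long as it stays below $2^q$ in every component.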
As we can see from Eq. (7) and Eq. (8), $S_1$ receives only the masked local parameters and the aggregated noise, so it obtains the aggregate of the local model updates without learning any individual participant's local model update. We can also see from Eq. (8) that $S_1$ can still compute an accurate global model parameter $z^k$ in each iteration, so the utility of the final model is not harmed. In contrast to existing encryption-based solutions that involve expensive public key operations, the cryptographic hash function that we use incurs a much lower computation cost. Compared with the non-privacy-preserving version, only one extra simple computation is required from $S_1$ and from each participant, and only one more round of message exchange between $S_1$ and $S_2$ is needed in each iteration. Therefore, our design goals are fulfilled: high model accuracy, strong privacy protection of participants' data, and low computation cost.

Performance Evaluation
In this section, we report our experiment results.

Experiment Setup
We implemented our Privacy Preservation ADMM-based Collaborative Learning algorithm (PPADMM) for the Least Absolute Shrinkage and Selection Operator (Lasso), as it can be easily extended to a wide variety of statistical models. In addition, we choose SHA-256 as the cryptographic hash function in our algorithm. We implemented our solution using Python 3. Our training and testing datasets are generated by the built-in regression dataset generator in sklearn.datasets, a Python 3 module developed for machine learning purposes. sklearn.datasets can generate a dataset that follows a Gaussian distribution with tunable mean values and noise. The software is run on a laptop with a 2.60 GHz Intel i7-6700HQ CPU (8 cores) and 16 GB of RAM.

Collaborative Learning System. The collaborative machine learning system consists of 10 participants and 2 non-colluding servers $S_1$ and $S_2$. $S_1$ serves as the primary server, which gathers local parameters and updates the global model, and $S_2$ is responsible for privacy provision. $S_1$, $S_2$, and the participants update the global model and local models iteratively following the protocol described in Section 4.
Dataset and System Parameters. We generate 22000 synthesized samples following a Gaussian distribution, each with 10 features, and use them to train and test our Lasso predictor. The size of the training set is 20000, and the size of the test set is 2000. The standard deviation of the Gaussian noise of the dataset is 20. We set the learning rate α to 0.001 and the regularization parameter ρ, which helps avoid overfitting, to 1. These are common parameter settings for Lasso regression.
Performance Metrics. We evaluate the performance of our algorithm with the following metrics.
Accuracy: We use two metrics to measure the accuracy of our algorithm. The first one is the Root Mean Square Error (RMSE), defined as follows.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \tag{9}$$

where $y_i$ is the true label of sample i, $\hat{y}_i$ is the label predicted by the model, and N is the total number of samples. RMSE is a common measure of the prediction error of a regression-based model. It reflects the differences between the true values and the predicted values of samples. The lower the RMSE is, the more accurate the model is.
The second metric is the Model Parameter Euclidean Distance (MPED), which is defined as follows.

$$\mathrm{MPED} = \sqrt{\sum_{i=1}^{m} \left( w_i - \hat{w}_i \right)^2} \tag{10}$$

where $W$ is the true regression function that generates the training dataset, and $\hat{W}$ is the predictor trained with our algorithm. $w_i$ and $\hat{w}_i$ are the i-th parameters of $W$ and $\hat{W}$ respectively, and m is the total number of parameters. MPED represents the difference between the parameters of the true regression function and those of the predicted regression function. A smaller MPED indicates that the predicted regression is closer to the true distribution of the data in the dataset.
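The two accuracy metrics can be computed with a few lines of Python. The sketch below follows the definitions in the text (root of the mean squared prediction error, and Euclidean distance between parameter vectors); the function names `rmse` and `mped` are hypothetical.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error between true and predicted labels."""
    n = len(y_true)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / n)


def mped(w_true: Sequence[float], w_hat: Sequence[float]) -> float:
    """Euclidean distance between true and learned model parameters."""
    return math.sqrt(sum((w - wh) ** 2 for w, wh in zip(w_true, w_hat)))
```

RMSE is evaluated on test samples, whereas MPED requires the ground-truth coefficients, which are available here because the dataset is synthetic.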
Computation Cost: We measure the computation cost of our algorithm when it is run for a given number of iterations. We take the actual running time of the algorithm as the computation cost in this experiment.

Experimental Results
We compare three algorithms: the original ADMM-Lasso, our algorithm PPADMM, and a classic DP-based privacy-preserving collaborative learning algorithm, DSSGD [12], with respect to model accuracy and computational efficiency using the metrics in Section 5.2. We measure both RMSE and MPED to evaluate the accuracy of the regression models trained by these three algorithms with different privacy parameters ε = 0.01, 0.1, 1, 10, respectively.
The RMSE result is shown in Fig. 3. We can see that PPADMM has the same RMSE as the original ADMM-Lasso algorithm. This is because, during the secure aggregation process of PPADMM, the primary server $S_1$ removes the aggregated noise with the help of the auxiliary server $S_2$ to obtain the exact original global model parameters, so the final models produced by PPADMM and ADMM-Lasso are identical. Since the participants in DSSGD perturb their local parameters with Laplace noise in every iteration, DSSGD produces a model that predicts labels with a much higher RMSE than PPADMM and ADMM-Lasso. For example, the RMSE of DSSGD can still be larger than 100 when ε = 0.1 after it converges, while PPADMM and ADMM-Lasso have an RMSE close to 20, which is the standard deviation of the Gaussian noise of the dataset. According to Fig. 3, PPADMM and ADMM-Lasso also converge faster than DSSGD. We can also see that there is an inherent trade-off between the level of privacy guarantee and the accuracy of the model trained by DSSGD.
We also measure how the Model Parameter Euclidean Distance (MPED) of the three algorithms (see Eq. (10)) changes with the iteration number as the other criterion for model accuracy. The result is shown in Fig. 4. Similar to the RMSE measurement, PPADMM and ADMM-Lasso yield a smaller MPED than DSSGD under all privacy parameters, which means our algorithm trains a model closer to the true distribution of the original dataset.
Fig. 5 shows the computation time of the three algorithms. The x-axis represents the number of iterations that the algorithm runs, and the y-axis represents the wall-clock time that the algorithm spends on that number of iterations. We can see that PPADMM has a higher computation cost than ADMM-Lasso and DSSGD. The main reason is that, although SHA-256 is more efficient than cryptographic techniques such as Yao's garbled circuit and homomorphic encryption used in other cryptography-based privacy-preserving collaborative algorithms, it is still more time-consuming than the generation of Laplace noise in DSSGD. Although our algorithm has a higher computation cost than DSSGD, we can see that the difference between the two is not significant. Compared with existing encryption-based solutions [6,8,11], our algorithm results in a linear-growth computational cost. In our implementation, participants and $S_2$ apply SHA-256 to generate secure hash digests at each iteration, which allows $S_2$ to change the secret key shared with a participant during the training process. A more computationally efficient approach is for participants and $S_2$ to pre-compute the hash digests before the collaborative learning starts, provided they agree not to change the shared secret key in the middle of the training.
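The pre-computation idea can be sketched as follows, assuming a fixed shared key and a number of training iterations known in advance. The function name `precompute_digests` and the byte encoding of `K_i || k` (key bytes, then `b"||"`, then the decimal iteration index) are hypothetical choices for illustration.

```python
import hashlib
from typing import List


def precompute_digests(key: bytes, num_iters: int) -> List[bytes]:
    """Return SHA-256(key || k) for k = 0, ..., num_iters - 1."""
    return [hashlib.sha256(key + b"||" + str(k).encode()).digest()
            for k in range(num_iters)]


# A participant and S2 each build the same table from the shared key
# before training starts, then look up entry k during iteration k + 1.
table = precompute_digests(b"shared-secret", 100)
```

This moves all hashing out of the training loop, at the price of storing one digest per iteration and giving up the option to change the key mid-training.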
Since the information leakage of participants' local datasets caused by the intermediates exchanged in collaborative learning is fairly complex, and it varies with the type of inference attack that the adversary launches, there is not yet a universal criterion that quantitatively measures such privacy leakage in a collaborative learning system. We argue that our algorithm provides a stronger privacy guarantee than DP-based mechanisms, since the intermediate local parameters exchanged in PPADMM are masked with cryptographic hash digests derived from a secret key. It is computationally infeasible for the adversary to recover the original local parameters if it does not know the secret key shared between the participant and $S_2$, due to the one-way and collision-free properties of the cryptographic hash function. On the other hand, DP-based mechanisms provide privacy protection by adding noise generated from a certain distribution, such as the Laplace distribution. Such a distribution can be estimated through multiple rounds of observation of the perturbed intermediates, and the estimate could be used to reduce the impact of the perturbation and infer the distribution of the original intermediates.

Conclusions and Future Work
In this paper, we introduce a novel privacy-preserving collaborative learning mechanism based on secure SUM aggregation via two non-colluding servers. Our solution allows the server to receive accurate aggregated local model updates in each iteration without learning any individual participant's local model update, and it achieves the same level of accuracy as standard collaborative learning mechanisms. Built upon efficient cryptographic primitives, the computation cost of our mechanism is also orders of magnitude lower than that of existing encryption-based solutions. We have confirmed the efficacy and efficiency of our mechanism through experimental studies.

Figure 1. ADMM-based collaborative machine learning model.

Figure 3. Relationship between RMSE and the number of iterations of ADMM, PPADMM, and DSSGD.

Figure 5. Comparison of the computation time of ADMM, PPADMM, and DSSGD.