Distributed Outsourced Privacy-Preserving Gradient Descent Methods among Multiple Parties

The Internet of Things (IoT) is one of the latest internet evolutions. Cloud computing is an important technique that meets the computational demand of widely distributed IoT devices and sensors by employing various machine learning models. Gradient descent methods are widely employed to find the optimal coefficients of a machine learning model in cloud computing. Commonly, the data are distributed among multiple data owners, whereas the target function is held by the model owner. The model owner can train its model over the data owners' data and provide predictions. However, the confidentiality of the dataset or the target function may not be preserved during computation. Thus, security threats and privacy risks arise. To address the data and model privacy issues mentioned above, we present two new outsourced privacy-preserving gradient descent (OPPGD) schemes over horizontally or vertically partitioned data among multiple parties, respectively. Compared with previously proposed solutions, our methods are more comprehensive and apply to a more general setting. Data privacy and model privacy are preserved during the whole learning and prediction procedures. In addition, the performance evaluation demonstrates that our schemes can help the model owner optimize its target function and provide exact predictions with high efficiency and accuracy.


Introduction
The Internet of Things (IoT) is the latest internet evolution, which provides a wide range of novel digital and smart services and products by integrating abundant devices into networks [1]. It enables communication between the physical world and cyberspace [2]. An IoT system contains radio-frequency identification, wireless sensor networks, and cloud computing [3]. Cloud computing meets the computational demand of large-scale distributed IoT devices or sensors through various machine learning methods. Since IoT devices have very little memory, the collected data must be stored and managed by cloud servers [3][4][5]. Data can be downloaded from the cloud for different purposes such as machine learning. However, the collected data may include sensitive data such as physiological data, location data, and other data closely related to personal information [6], which exposes the data to security breaches. Therefore, IoT not only provides convenience but also brings about security and privacy issues [7]. How to deal with security, privacy, and trust has been one of the main barriers to developing IoT in the real world [8,9]. Most of the existing work on the protection of sensitive data is based on secure communication channels and authorization [10]. In our paper, we focus on the protection of sensitive data in machine learning or deep learning. The data can be protected during the transmission phase, the computation phase, and the prediction phase. Furthermore, the privacy of the computation and prediction results can also be preserved.
In machine learning or deep learning, the prediction function is usually called the decision model. The quality of the model coefficients determines the accuracy of the model. In order to minimize the error of the model, the optimal coefficients are indispensable. The process of finding them is called model learning. Gradient descent methods are effective methods to find the optimal coefficients of decision models such as linear regression, hyperplane decision classification, and neural networks. Gradient descent methods include four types: the classical gradient descent method (GD), the stochastic gradient descent method (SGD), the minibatch stochastic gradient descent method (minibatch SGD), and momentum. Through these methods, the optimal prediction function can be obtained after several iterations.
In cloud computing, the cloud server offers huge storage and computing capacity. The model owner initializes the prediction function, and the training data are distributed among different data owners who hope to obtain the desired results from these data with the help of cloud servers without exposing their privacy. These data form an enormous training dataset which is divided into disjoint subsets held by different data owners. The dataset partition can be horizontal or vertical. The number of data owners can be two or more. Transmission channels are not secure in practice. In addition, the data owners, the model owner, and the cloud server do not trust each other. When they train a decision model together, each worries that other participants may obtain information from its own data. So, they encrypt their training data or the decision model with their own public keys, or blind their data, to preserve confidentiality before delivering them to the cloud server. The training data and the decision model can thus be kept confidential during the whole cloud computation. With the help of the cloud server, the model owner learns the model securely based on the training dataset. After training, the clients can get predictions for their request data from the cloud server according to this decision model.
At present, although many researchers focus on data privacy protection or model privacy protection when gradient descent methods are utilized to optimize machine learning models, few schemes provide both data privacy and model privacy at the same time. Beyond that, some privacy-preserving gradient descent schemes can protect data owners' privacy, but they do not apply to outsourced computation. In addition, the dataset is usually partitioned horizontally or vertically in a distributed system, yet few previous schemes can be applied to both kinds of partition. Besides, in these schemes both the training data and the decision model are held only by data owners rather than by the model owner. In fact, it is more practical for the models to be held by the model owner rather than the data owner. Motivated by the above, we construct two novel outsourced gradient descent methods to solve these problems.
Generally speaking, it is necessary to preserve the privacy of the training data, the decision model, and the request data during model training. Assume that there exists a training dataset $X$, and the corresponding label vector is $y$. Each row of the dataset represents one sample $x$ with a set of attributes. By $f(x)$, we denote the prediction function which maps the sample $x_i$ to its corresponding category label $y_i$. According to the partition of the dataset, each data owner has part of the data samples or part of the attributes. The model owner holds the coefficients of the prediction function $f(x)$. The target of the data owners and the model owner is to minimize the error of the prediction function and ultimately obtain the optimal coefficients through gradient descent methods. Thus, the model owner holds the optimal decision model. Then, it can provide the client with accurate predictions. In this paper, we focus on outsourced gradient descent methods over distributed data among multiple parties, which include the data owners, the model owner, the cloud server, and the client. Both horizontal and vertical partitions of the dataset are discussed. For the horizontally partitioned dataset, two or more data owners hold different samples with the same attributes, whereas for the vertically partitioned dataset, two or more data owners hold the same samples but with different sets of attributes.

Contributions.
To address privacy when multiple parties perform gradient descent methods via cloud computing, we propose two OPPGD schemes over horizontally or vertically distributed data. The main contributions of this paper are summarized as follows:
(1) We design an outsourced privacy-preserving scalar product (OPPSP) algorithm. The cloud server securely computes the inner product of two vectors encrypted under different keys. For example, one data owner and the model owner each hold one vector. Both parties first encrypt their own vector with their own key and send the encrypted vector to the cloud server. Then, the cloud server computes the scalar product of these two encrypted vectors.
(2) We propose two secure and comprehensive schemes to perform OPPGD over horizontally or vertically distributed datasets, respectively. The number of data owners can be two or more. The prediction functions are linear regression or neural networks. The OPPGD schemes apply to classical GD, SGD, minibatch SGD, and momentum. Our schemes therefore have wider applicability and practicability than other schemes.
(3) We demonstrate that our OPPGD schemes are privacy-preserving. The computational cost and communication complexity are discussed. The analyses show that our OPPGD schemes achieve high efficiency and accuracy.

Organization.
The remainder of this paper is organized as follows. In Section 2, we discuss the related works on privacy-preserving gradient descent methods. In Section 3, we briefly introduce some preliminaries, including the Elgamal homomorphic cryptosystem [11] and gradient descent methods. In Section 4, we describe the system model, problem statements, the threat model, and the system requirements. We present two OPPGD schemes and prove their correctness, security, and complexity in Section 5. The performance evaluation of the schemes is analyzed in Section 6. Section 7 concludes the paper.

Related Works
In this section, we review works on privacy-preserving gradient descent methods among parties. According to the existence or absence of cloud servers, the existing works can be classified into two categories.

The Absence of Cloud Servers.
Wan et al. [12] presented the first privacy-preserving scheme for gradient descent methods. They proposed a generic formulation of gradient descent methods by defining the prediction function f(x) as a composition g ∘ h(x). The formulation is used to perform the specific iteration-based algorithm in linear regression or neural networks. In our paper, we also use this formulation. However, their scheme [12] only considers vertically partitioned datasets. Han et al. [13] extended the scheme [12] to horizontally distributed datasets and proposed a least squares approach to perform gradient descent methods. Both schemes [12,13] utilize a secure scalar product to achieve privacy preservation, but they cannot be applied to the outsourced model. Gabor Danner and Jelasity [14] designed a novel fully distributed privacy-preserving minibatch SGD that avoids collecting any personal data centrally. Their scheme does not require the precise sum of gradients. A tree topology and homomorphic encryption are employed to produce a "quick and dirty" partial sum. The protocol can resist collusion attacks. Hegedus and Jelasity [15] adopted differential privacy to solve privacy-preserving stochastic distributed gradient descent. Mehnaz et al. [16] designed two secure gradient descent schemes over horizontally partitioned data and vertically partitioned data via a secure sum protocol. Later, they designed a secure gradient descent scheme [17] without Yao's circuits over arbitrarily partitioned datasets. Based on output perturbation, Wu et al. [18] devised a novel "bolt-on" differentially private algorithm for stochastic gradient descent.

The Existence of Cloud Servers.
Liu et al. [19] designed an encrypted gradient descent method. Both data owners and the cloud server perform operations collaboratively to learn the target function without leaking any private data. They extended their scheme to the outsourced model by utilizing the BGN cryptosystem. However, their protocol is only suitable for a two-party scenario. Shokri and Shmatikov [20] learned an accurate neural network model without sharing input datasets by using the stochastic gradient descent method. After the parameter server initializes the parameter vector, it updates the parameters with the help of the cloud server without leaking any private information. Kim et al. [21] provided a practical framework for mainstream learning models such as logistic regression. They calculated the gradient descent method securely by using homomorphic encryption, but this is inefficient. Since the required bit length of the ciphertext modulus per iteration is very long, it also takes up too much space. Francisco-Javier et al. [22] realized the training of supervised machine learning models over ciphertext. Through the gradient descent method, the server optimizes the predicted training model without exposing the data or the training model. Mohassel and Zhang [23] used the stochastic gradient descent method to construct new and efficient privacy-preserving machine learning protocols for linear regression, logistic regression, and neural networks. Their protocol involves a two-server model. Data providers distribute their private data between two noncolluding servers, while the servers train models on the joint data through secure two-party computation techniques. Li et al. [24] also presented a multikey privacy-preserving deep learning scheme in the cloud computing environment. Their protocols realize outsourced multilayer backpropagation network learning via gradient descent methods. Ma et al. [25] took advantage of a framework with two noncolluding servers to build a new outsourced model of the privacy-preserving neural network. However, the model owner can only make predictions rather than learn the model itself.

The Other Works on Privacy Preservation for Machine Learning.
Aside from the above privacy-preserving gradient descent methods, there are also plenty of other works on privacy-preserving computation over distributed data among multiple parties in the cloud environment. Liu et al. [26] constructed an efficient privacy-preserving method to compute over outsourced data. They [27] also proposed a privacy-preserving outsourced calculation toolkit, which allows data owners to securely outsource their data to the cloud for storage and calculation. Rady et al. [28] designed a new architecture that achieves the confidentiality and integrity of query results of the outsourced database. Yu et al. [29] devised a verifiable outsourced computation scheme over encrypted data by employing fully homomorphic encryption and a polynomial factorization algorithm. Chamikara et al. [30] presented an efficient and scalable nonreversible perturbation algorithm for data mining that avoids leaking the privacy of big data via optimal geometric transformations. Li et al. [31] proposed a novel outsourced privacy-preserving classification scheme based on homomorphic encryption. In their scheme, multiple parties securely outsource their sensitive data to an untrusted evaluator for storing and processing. Li et al. [32] devised a novel scheme for a classifier owner to provide users with a privacy-preserving classification service by delegating to a cloud server. However, they focus on two concrete secure classification protocols: the naive Bayes classifier and the hyperplane decision classifier. Park et al. [33] described a privacy-preserving naive Bayes protocol. No intermediate interactions are required between the server and the clients. Hence, their protocols can alleviate the heavy computational cost of fully homomorphic encryption. Li et al. [34] proposed an outsourced privacy-preserving C4.5 decision tree algorithm over both horizontally and vertically partitioned datasets. They used the BCP cryptosystem to present an outsourced privacy-preserving weighted average protocol. Rong et al. [35] presented a series of building blocks for verifiable and privacy-preserving association rule mining in the hybrid cloud environment. Li et al. [36] used an efficient homomorphic encryption with multiple keys to design an outsourced privacy-preserving ID3 data mining solution. Xue et al. [37] built a differential privacy-based privacy-preserving classification system for secure edge computing. Yang et al. [38] realized privacy-preserving medical record sharing in the cloud computing environment. Kaur et al. [39] devised an efficient privacy-preserving collaborative filtering scheme for healthcare recommender systems over arbitrarily distributed data. In our work, we aim at designing outsourced privacy-preserving gradient descent methods among multiple parties. To the best of our knowledge, no existing work addresses this issue comprehensively.

Preliminaries
In this section, we introduce some preliminaries for our outsourced privacy-preserving gradient descent schemes.

The Elgamal Homomorphic Cryptosystem.
The Elgamal cryptosystem [11] comprises the following algorithms: preparation, key generation, encryption, and decryption.
Preparation ($\lambda$): given a security parameter $\lambda$, the system generates the public parameter $PP$ as follows. The system first chooses a large prime number $N$ and a random number $g$ less than $N$. It then publishes the multiplicative cyclic group $G$ modulo the prime $N$ with the generator $g$. The public parameter is $PP = (N, g)$.
KeyGen ($PP$): taking $PP$ as the input, each party $P_i$ randomly selects a number $sk_i$ less than $N$ as its private key and computes $pk_i = g^{sk_i} \bmod N$ as its public key.
Enc$_{pk_i}(M_i)$: $P_i$ selects a random integer $r_i$ which is coprime to $(N - 1)$ and encrypts its plaintext $M_i$ with the public key $pk_i$ to generate the ciphertext
$$C_i = \left(pk_i^{r_i} M_i \bmod N,\; g^{r_i} \bmod N\right).$$
Dec$_{sk_i}(C_i)$: each party $P_i$ decrypts $C_i = (A_i, B_i)$ with its secret key $sk_i$ and obtains the plaintext $M_i$. The decryption process is
$$M_i = A_i \cdot \left(B_i^{sk_i}\right)^{-1} \bmod N.$$
Its correctness is well established.
The semantic security of the Elgamal cryptosystem is based on the hardness assumption of discrete logarithm problems over finite fields.
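The four algorithms above can be illustrated with a minimal Python sketch. It assumes toy parameters ($N = 467$, $g = 2$) that are far too small for real use, and the function names `keygen`, `random_exponent`, `encrypt`, and `decrypt` are hypothetical names chosen for this illustration rather than part of the scheme.

```python
import math
import secrets

# Toy public parameters PP = (N, g); illustrative only, far too small for real use.
N = 467  # prime modulus
G = 2    # generator of the multiplicative group modulo N


def keygen():
    """KeyGen: pick a private key sk < N and derive pk = g^sk mod N."""
    sk = secrets.randbelow(N - 2) + 1
    return pow(G, sk, N), sk


def random_exponent():
    """Draw a random r coprime to N - 1, as the encryption step requires."""
    while True:
        r = secrets.randbelow(N - 2) + 1
        if math.gcd(r, N - 1) == 1:
            return r


def encrypt(pk, m):
    """Enc: return the ciphertext (pk^r * m mod N, g^r mod N)."""
    r = random_exponent()
    return (pow(pk, r, N) * m) % N, pow(G, r, N)


def decrypt(sk, ciphertext):
    """Dec: strip the mask (g^r)^sk from the first ciphertext component."""
    a, b = ciphertext
    return (a * pow(pow(b, sk, N), -1, N)) % N


pk, sk = keygen()
message = 123
assert decrypt(sk, encrypt(pk, message)) == message
```

The modular inverse is computed with the built-in `pow(x, -1, N)`, which requires Python 3.8 or later.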

The Key Conversion System.
As for secure outsourced computation over a dataset held by multiple parties, the essential difficulty is how to handle ciphertexts that multiple parties encrypt under different keys. Based on Gentry's fully homomorphic encryption [40], we transform ciphertexts under different keys into ciphertexts under the same key. Take two parties, Alice and Bob, as an example. Assume that their respective key pairs are $(pk_a, sk_a)$ and $(pk_b, sk_b)$. For a plaintext $m$, its ciphertext encrypted under key $pk_a$ is $[m]_{pk_a}$. The goal is to switch $[m]_{pk_a}$ into a new ciphertext $[m]_{pk_b}$ which is encrypted under the public key $pk_b$. The conversion can be divided into the following steps:
Rekey generation $(pk_b, sk_a)$: taking $pk_b$ and $sk_a$ as the input, it outputs the rekey $\overline{sk}_a = \{\overline{sk}_{a,i}\}$.
Evaluation algorithm $(pk_b, D_\Pi, p_i, \overline{sk}_{a,i})$: taking the public key $pk_b$, the rekey $\overline{sk}_a$, the ciphertext $[m]_{pk_a}$, and a decryption circuit $D_\Pi$ as the input, it outputs the ciphertext $[m]_{pk_b}$.

Gradient Descent Methods.
In the training dataset, $x_i$ represents the $i$-th sample's $m$ attributes and $y_i$ denotes the target attribute. The goal is to determine a prediction function $f(x)$ such that $f(x_i)$ is as close to $y_i$ as possible. Thus, when one makes predictions on test data, the basic strategy is to make the prediction function produce the smallest possible error. Gradient descent methods are commonly applied to search for the optimal coefficients of $f(x)$, which minimize the prediction error. The whole process can be described as follows. At the beginning, one determines the loss function $L(x)$, randomly initializes a coefficient vector of $f(x)$, and calculates the current error on the learning dataset. If the current error is not acceptable, one takes the derivative of $L(x)$ with respect to the coefficient vector and updates the coefficient vector, and hence $f(x)$, based on the derivative. Then, one recalculates the loss and repeats the optimization until the error reaches its minimum. In this way, the optimal value is generated through several iterations.
There are four main gradient descent methods: classical GD, SGD, minibatch SGD, and momentum. In classical GD, the loss function is determined by all samples in each iteration, which leads to high computational complexity. In SGD, the loss function is determined by one random sample in every iteration, which reduces the computing overhead. However, this method has one weakness: the final coefficient vector may sometimes be a local optimum rather than the global optimum. When the loss function is determined by a batch of random samples in every iteration, the method is called minibatch SGD. Minibatch SGD combines the advantages of classical GD and SGD and overcomes their weaknesses. So far, SGD is the most widely applied in machine learning. Momentum is a more recent gradient descent method which greatly improves the accuracy and speed of the prediction. Besides the learning rate $\eta$, the coefficient update $\omega = c\omega - \eta\nabla\omega$ in momentum contains a new parameter $c$, the attenuation rate. Our schemes can be applied to all four of these gradient descent methods.
An error function is defined for every sample, and the loss function $L$ is defined over $l$ arbitrary samples. The prediction function $f(x)$ is a composition of two functions $g(z)$ and $z = h(x)$, where $g(z)$ is any differentiable function, $h(x)$ is a linearly separable function, and $\omega$ is the coefficient vector of the prediction function. When $l = 1$, the method is SGD; when $1 < l < N$, the method is minibatch SGD; and when $l = N$, the method is GD, where $N$ here denotes the total number of samples. Subsequently, we update the coefficient vector as $\omega = \omega - \eta\nabla\omega$, where $\nabla\omega = \partial L / \partial \omega$ and $\eta$ is a constant parameter called the learning rate. When the update is $\omega = c\omega - \eta\nabla\omega$, where $c$ is a constant parameter called the attenuation rate, the method is momentum. As the function $f(x)$ changes, $\nabla\omega$ also changes. Here, we discuss two specific functions used in linear regression and neural networks.
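The role of the batch size $l$ can be illustrated on plaintext data with a short Python sketch. It assumes a linear prediction function and a squared-error loss averaged over the batch, so that each sample contributes $x_i(f(x_i) - y_i)$ to the gradient; the loss, the toy dataset `DATA`, and the function names are assumptions made for this illustration, and no encryption is involved.

```python
import random


def predict(w, x):
    """Linear prediction f(x) = sum_j w_j * x_j."""
    return sum(wj * xj for wj, xj in zip(w, x))


def gradient(w, batch):
    """Average of x_i * (f(x_i) - y_i) over the batch (assumed squared-error loss)."""
    grad = [0.0] * len(w)
    for x, y in batch:
        residual = predict(w, x) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj
    return [gj / len(batch) for gj in grad]


def train(data, l, eta=0.1, iterations=2000, seed=0):
    """Run gradient descent with l random samples per iteration.

    l == 1 gives SGD, 1 < l < len(data) gives minibatch SGD,
    and l == len(data) gives classical GD.
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for _ in range(iterations):
        batch = rng.sample(data, l)
        grad = gradient(w, batch)
        w = [wj - eta * gj for wj, gj in zip(w, grad)]  # w = w - eta * grad
    return w


# Toy dataset generated by y = 2 * x_1 + 3 * x_2 (hypothetical values).
DATA = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0), ([2.0, 1.0], 7.0)]

w_sgd = train(DATA, l=1)
w_minibatch = train(DATA, l=2)
w_gd = train(DATA, l=len(DATA))
```

Because the toy labels are exactly linear in the attributes, all three settings approach the same coefficients; on noisy data the variants would differ in cost per iteration and in how smoothly they converge.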
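In linear regression, the prediction function for an arbitrary sample $x_i$ is $f(x_i) = \sum_{j=1}^{m} \omega_j x_{ij}$, and the corresponding update term is $\nabla\omega = x_i \sum_{j=1}^{m} \omega_j x_{ij} - x_i y_i$. In neural networks, $f(x)$ is also called the activation function, which here is the sigmoid function $g(z) = 1/(1 + e^{-z})$. Through the Taylor expansion formula, the function $f(x)$ can be expanded into a polynomial $T(a)$, from which the corresponding $\nabla\omega$ for the neural network is obtained.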
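To give a feel for the polynomial approximation of the sigmoid function, the following Python sketch compares the sigmoid with its degree-3 Taylor polynomial around zero, $1/2 + z/4 - z^3/48$. The degree is an assumption made for this illustration; the expansion order used for $T(a)$ is not stated here.

```python
import math


def sigmoid(z):
    """Sigmoid activation g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_taylor(z):
    """Degree-3 Taylor polynomial of the sigmoid around z = 0 (assumed degree)."""
    return 0.5 + z / 4.0 - z ** 3 / 48.0


# Compare both functions on a few inputs near zero.
for z in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.6f}  taylor={sigmoid_taylor(z):.6f}")
```

A polynomial form is convenient because it only needs additions and multiplications, which are the operations that can be carried out over encrypted or blinded values. The approximation is accurate only near zero, so inputs would need to be scaled into a small range.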

Models and Requirements
System Model.
As shown in Figure 1, the system comprises five entities: data owners (DOs), a model owner (MO), a cloud server (CS), a key conversion server (KCS), and a trusty decryption server (TDS). Each entity is described as follows:
Data owner (DO): after receiving the public parameter $PP$, each DO generates its own key pair and encrypts its data. Then, the DOs send their respective ciphertexts to the cloud server, depicted as Step 1 in Figure 1.
In our system model, each entity is semihonest except TDS. All the entities have some background knowledge of the attribute names, class names, and the number of their attributes. Each data owner has a part of the complete dataset, which can be partitioned horizontally or vertically.
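The two kinds of partition can be sketched in Python as follows. The helper names `split_horizontal` and `split_vertical` and the toy dataset are hypothetical and serve only to illustrate which part of the dataset each data owner holds.

```python
def split_horizontal(X, y, counts):
    """Give each data owner a block of rows (samples) with all attributes and labels."""
    parts, start = [], 0
    for count in counts:
        parts.append((X[start:start + count], y[start:start + count]))
        start += count
    return parts


def split_vertical(X, y, widths):
    """Give each data owner a block of columns (attributes) of all samples.

    Every data owner also receives the full class value vector.
    """
    parts, start = [], 0
    for width in widths:
        columns = [row[start:start + width] for row in X]
        parts.append((columns, list(y)))
        start += width
    return parts


# Toy dataset with n = 4 samples and m = 3 attributes (hypothetical values).
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
y = [0, 1, 0, 1]

horizontal_parts = split_horizontal(X, y, counts=[1, 3])
vertical_parts = split_vertical(X, y, widths=[2, 1])
```

In the horizontal case the data owners' row counts sum to $n$, and in the vertical case their attribute counts sum to $m$.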

Security and Communication Networks
When the dataset is distributed vertically, all data owners have the class value vector. The complete attribute dataset $X$ is of size $n \times m$, and the dataset and the target vector $y$ are represented as follows:
$$X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,m} \end{pmatrix}, \qquad y = (y_1, y_2, \ldots, y_n)^{T},$$
where $y_i$ is $x_i$'s corresponding class value. For the horizontally partitioned dataset, each data owner has $n_i$ samples with all the attributes and the corresponding class values, as described in Figure 2. For the vertically partitioned dataset, each data owner has $m_i$ attributes of all the samples and the corresponding class values. The data of data owner $P_i$ can be depicted as in Figure 3.
Figure 1: The system model.
Figure 2: Horizontally partitioned dataset.
The scheme consists of the preparation phase, the training phase, and the prediction phase. An overview of the scheme can be described as follows:
Preparation phase: according to the public parameter $PP$, DOs and MO generate their respective key pairs. DOs and MO also share a secret value $k$ in advance. Then, DOs encrypt their datasets with their respective keys, while MO encrypts the coefficient vector of its model with its public key. Then, DOs and MO send their ciphertexts to CS, respectively.
Training phase: our goal is to train the MO's target function with the DOs' datasets. MO needs to get $\nabla\omega$, computed over MO's coefficients and DOs' datasets, to optimize the coefficients of the target function $f(x)$. We discuss two kinds of machine learning methods: linear regression and neural networks. For linear regression, the MO's task is to obtain the encrypted $x_i(y_i - f(x_i))$ of every sample $x_i$ with the help of the CS. For the neural network, the MO's task is to obtain the corresponding encrypted term for every sample $x_i$. After getting the results $\nabla\omega$, MO chooses one gradient descent method to refresh its coefficients.
Prediction phase: in the end, MO can provide accurate prediction services for queries through its optimal target function.
Since each $P_i$ encrypts $D_i$ with its public key $pk_i$ and MO encrypts its coefficients with its public key $pk_M$, CS performs computations only over encrypted data. TDS performs the decryption algorithm on the final results, while DO and MO share a secret value $k$. This prevents the TDS from obtaining information about the coefficients.

Threat Model.
Assume that all the entities except TDS are semihonest, i.e., honest-but-curious. In other words, these entities follow the protocol, but they may try to obtain as much secret information as possible from the messages which they receive.
Consider two kinds of adversaries in this model: an external adversary and an internal adversary. An external adversary may obtain some information, i.e., encrypted data or encrypted results, during every iteration via public channels. An internal adversary could be a malicious data owner DO, the model owner MO, the cloud server CS, or the key conversion server KCS. The goal of a malicious DO is to extract the coefficients of the target function $f(x)$. An internal adversary KCS tries to extract the intermediate results and the MO's coefficient vector, while the goal of an adversary MO is to reveal information about each DO's partitioned dataset. In addition, if the CS is an internal adversary, it tries to acquire MO's coefficients or DOs' datasets.

Privacy Requirements.
In the outsourced gradient descent schemes, privacy preservation is essential. In our model, we assume that the cloud server is semihonest. In order to measure the extent of privacy preservation, we define two privacy preservation levels.
Definition 1. Explicit privacy leakage means that privacy may be exposed during the computation of the cloud server or during message transmission over public channels. If an outsourced computation scheme can prevent explicit privacy leakage, we say that it achieves level-1 privacy.
Definition 2. Implicit privacy leakage means that one's privacy may be leaked by deduction from the results of the cloud server. If an outsourced computation scheme can prevent implicit privacy leakage, we say that it achieves level-2 privacy.
In our OPPGD schemes, DOs' data and MO's coefficient vector are uploaded to the cloud server in ciphertext form. Explicit privacy leakage means that DOs' data, MO's coefficient vector, or the final desired results are leaked during the scheme. Preventing implicit privacy leakage means that it is impossible to deduce DOs' data or MO's coefficient vector from intermediate results. Our OPPGD schemes can realize level-1 privacy or level-2 privacy.

Two OPPGD Schemes
In this section, we present two outsourced privacy-preserving gradient descent schemes over horizontally partitioned data or vertically partitioned data. For simplicity, we make the following assumptions. When data are horizontally partitioned, each DO has only one record with all the attributes and the class value. When data are vertically partitioned, each DO has one attribute of all the samples and the corresponding class vector. An outsourced privacy-preserving gradient descent scheme is composed of the preparation phase, the training phase, and the prediction phase. Now, we first describe the OPPGD scheme over horizontally partitioned data.

Preparation Phase.
This phase involves several essential algorithms: parameter generation, key pair generation, and encryption.
Step 1: the system runs Algorithm 1 to generate $PP = (g, N)$ and $SP = k$.
Step 2: after receiving $PP$, DOs, MO, and TDS run Algorithm 2 to obtain their own key pairs $(pk_i, sk_i)$, $(pk_M, sk_M)$, and $(pk^*, sk^*)$.
Step 3: DO encrypts its $x_i$ and $y_i$ to be $[x_i]_{pk_i} = pk_i^{r_i} x_i \bmod N$ and $[y_i]_{pk_i} = pk_i^{r_i} y_i \bmod N$. Then, MO encrypts its coefficient vector $\omega$ to be $[\omega]_{pk_M} = pk_M^{r_M} \omega \bmod N$.

Training Phase
Step 4: each DO sends their encrypted [x i ] pk i and [y i ] pk i to the CS, and MO sends [ω] pk M to the CS.
Step 5: CS runs Algorithm 4 and obtains the encrypted scalar product vector $S$ after receiving $[x_i]_{pk_i}$, $[y_i]_{pk_i}$, and $[\omega]_{pk_M}$ from DOs and MO, where $s_i = [x_i]_{pk_i} \cdot [\omega]_{pk_M}$. In addition, CS also performs some other computations over components of $\nabla\omega$. To be specific, CS computes $I_{i1}$ and $I_{i2}$ in the linear regression model or computes $I_{i3}$, $I_{i4}$, $I_{i5}$, and $I_{i6}$ in the neural network model.

Input: the security parameter $\lambda$
Output: the public parameter $PP = (g, N)$, a secret value $k$
(1) generate a prime $N$, choose a primitive element $g$ in $\mathbb{Z}_N^*$
(2) generate a secret value $k$
(3) end
ALGORITHM 1: Parameter generation.

Input: the public parameter $PP = (g, N)$ and a secret value $k$
Output: the key pair $(pk, sk)$
(1) choose $sk < N$
(2) compute $pk = g^{sk} \bmod N$
(3) end
(4) return $(pk, sk)$
ALGORITHM 2: Key pair generation.

Input: the key pair $(pk, sk)$, a message $m$, and a random integer $r$ which is coprime to $N - 1$
Output: the encrypted message $[m]_{pk}$
(1) choose a random integer $r$ which is coprime to $N - 1$
(2) compute $[m]_{pk} = (pk^{r} m \bmod N,\; g^{r} \bmod N)$
ALGORITHM 3: Encryption.

Step 6: CS sends the above encrypted results to the DO. After receiving the encrypted scalar product $S$, DO performs the decryption operation. The TDS and the MO perform decryption as shown in Algorithm 5.
Step 7: once DOs receive the encrypted results $I_{i1}$, $I_{i2}$ or $I_{i3}$, $I_{i4}$, $I_{i5}$, $I_{i6}$ from CS, DO runs Algorithm 5 to get the new ciphertexts $I_1'$, $I_2'$ or $I_3'$, $I_4'$, $I_5'$, $I_6'$.
Step 8: DO blinds the above ciphertexts with the shared secret value $k$ to be $kI_1'$, $kI_2'$ in the linear regression model or $kI_3'$, $kI_4'$, $kI_5'$, $kI_6'$ in the neural network model.
Step 9: DO sends these blinded encrypted results to the KCS.
Step 10: KCS runs Algorithm 6 to convert the blinded encrypted results $kI_1'$, $kI_2'$ or $kI_3'$, $kI_4'$, $kI_5'$, $kI_6'$ into $kI_1''$, $kI_2''$ or $kI_3''$, $kI_4''$, $kI_5''$, $kI_6''$, which are all encrypted under the TDS's key $pk^*$.
Step 11: subsequently, the KCS sends the above intermediate results to the TDS.
Step 12: TDS runs Algorithm 5 and gets $kI_1^*$, $kI_2^*$ or $kI_3^*$, $kI_4^*$, $kI_5^*$, $kI_6^*$, and then TDS makes some simple computations, $kl_1 = kI_2^* - kI_1^*$ in the linear regression model or $kl_2 = kI_3^* + kI_4^* - kI_5^* - kI_6^*$ in the neural network model, to get the final results $kl_1$ or $kl_2$.
Step 13: TDS sends $kl_1$ or $kl_2$ to the MO.

Prediction Phase
In this phase, DO requests prediction with the help of the CS and MO.
Step 14: MO receives $kl_1$ or $kl_2$ and removes the shared secret value $k$ to obtain the $\nabla\omega$ of each sample $x_i$.
Step 15: then, the MO chooses one gradient descent method and optimizes its coefficient vector through Algorithm 7.
Step 16: each DO encrypts a query feature vector as $[q_i]_{pk_i}$, and the MO encrypts its optimal coefficient vector as $[\omega]_{pk_M}$.
Step 17: each DO and the MO send $[q_i]_{pk_i}$ and $[\omega]_{pk_M}$ to the CS, respectively.
Step 18: finally, MO, CS, and DO operate together to help the DO to extract the prediction results by operating subprotocol prediction (Algorithm 8).
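The effect of blinding with the shared secret value $k$ (Step 8, Step 12, and Step 14) can be sketched in Python as follows. The sketch assumes multiplicative blinding modulo a toy prime and applies it directly to plaintext-level values instead of ciphertexts; the names `blind` and `unblind` and all numeric values are hypothetical.

```python
N = 467  # toy prime modulus, illustrative only


def blind(value, k):
    """DO side: multiply a value by the shared secret k modulo N."""
    return (k * value) % N


def unblind(value, k):
    """MO side: remove k by multiplying with its modular inverse."""
    return (value * pow(k, -1, N)) % N


k = 29           # secret shared by DO and MO, unknown to the TDS
I1, I2 = 40, 65  # hypothetical intermediate values in linear regression

# TDS side: it only sees the blinded values and combines them linearly.
kl1 = (blind(I2, k) - blind(I1, k)) % N

# MO side: removing k recovers the difference I2 - I1.
assert unblind(kl1, k) == (I2 - I1) % N
```

Because the TDS's computation is linear, the factor $k$ carries through to the result, so the TDS never handles the unblinded values while the MO still recovers the intended quantity. The sketch works modulo $N$, so negative differences would appear as their residues.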

OPPGD Scheme over Vertically Partitioned Data.
The OPPGD scheme over vertically partitioned data differs slightly from the OPPGD scheme over horizontally partitioned data. After receiving $[x_i]_{pk_i}$, $[y_i]_{pk_i}$, and $[\omega]_{pk}$, CS executes Algorithm 4 $n$ times in the first scheme, whereas CS executes Algorithm 4 $nm$ times in the second scheme. This is because one record's $m$ attributes are sent to the CS by their respective DOs. In addition, when the KCS receives the blinded encrypted results, it needs to add the $m$ blinded encrypted results together to get the inner product of a record and the coefficient vector. For simplicity, we omit the steps of the OPPGD scheme over vertically partitioned data that are the same as those of the OPPGD scheme over horizontally partitioned data.

Input: the ciphertext $[ms]_{pk_o}$ of the message $ms$ under the original public key $pk_o$, the decryption circuit $D_{\Pi_o}$ of the original encryption, and the target key $pk^*$
Output: the ciphertext $[ms]_{pk^*}$ of the message $ms$

Input: the update information $\nabla\omega$, the coefficient vector $\omega$, the learning rate $\eta$, and the attenuation rate $c$
Output: the renewed coefficient vector
ALGORITHM 7: Renewing the coefficient vector.

Target: prediction result $pr'$
Step 1: DO and MO send $[q_i]_{pk_i}$ and $[\omega]_{pk_M}$ to the CS, respectively.
Step 2: CS computes $pr$, where $pr = \sum_{j=1}^{m} [q_{ij}]_{pk_i} [\omega_{ij}]_{pk_M} = \sum_{j=1}^{m} g^{sk_i r_i + sk_M r_M} q_{ij} \cdot \omega_{ij} \bmod N$.
Step 3: CS sends $pr$ to the MO.
Step 4: MO runs Algorithm 5, decrypts $pr$ with its key pair $(pk_M, sk_M)$, and obtains $pr = \sum_{j=1}^{m} g^{sk_i r_i} q_{ij} \omega_{ij} \bmod N$.
Step 5: MO sends $pr$ to each DO.
Step 6: DO runs Algorithm 5 to decrypt $pr$ with its key pair $(pk_i, sk_i)$ and gets access to the desired prediction result: $pr' = \sum_{j=1}^{m} q_{ij} \cdot \omega_{ij} \bmod N$.
ALGORITHM 8: Subprotocol prediction.
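Algorithm 7 takes $\nabla\omega$, $\omega$, $\eta$, and $c$ as inputs. One possible reading is sketched below for linear regression with SGD. The decay rule $\eta / (1 + c \cdot \text{epoch})$ is an assumption, since the exact role of the attenuation rate is not spelled out here, and the function names and sample data are hypothetical. The per-sample gradient follows the form $x_i \sum_{j} \omega_j x_{ij} - x_i y_i$ used in the correctness proof.

```python
def renew_coefficients(omega, grad, eta, c, epoch):
    """One SGD-style update with a decayed learning rate.

    Assumption: the attenuation rate c shrinks eta as eta / (1 + c * epoch).
    """
    step = eta / (1.0 + c * epoch)
    return [w - step * g for w, g in zip(omega, grad)]


def linear_gradient(omega, x, y):
    """Gradient for one sample: x_i * (sum_j omega_j * x_ij - y_i)."""
    residual = sum(w * xj for w, xj in zip(omega, x)) - y
    return [xj * residual for xj in x]


# Toy data generated from the coefficient vector [1, 2].
samples = [([1.0, 2.0], 5.0), ([2.0, 1.0], 4.0), ([1.0, 1.0], 3.0)]

omega = [0.0, 0.0]
for epoch in range(200):
    for x, y in samples:
        grad = linear_gradient(omega, x, y)
        omega = renew_coefficients(omega, grad, eta=0.1, c=0.01, epoch=epoch)
```

In the scheme itself the MO would receive each $\nabla\omega$ from the protocol after removing $k$, rather than computing it from plaintext samples as this sketch does.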

Scheme
Correctness. Now, we prove the correctness of our proposed OPPGD scheme over horizontally partitioned data. The correctness of the other scheme can be verified in a similar manner.
Theorem 1. MO can correctly obtain ∇ω to update its coefficient vector.
Proof. After receiving $[x_i]_{pk_i}$, $[y_i]_{pk_i}$, and $[\omega]_{pk}$, CS computes an encrypted scalar product $S$, where $s_i = \sum_{j=1}^{m} g^{sk_i r_i + sk_M r} \omega_j x_{ij} \bmod N$. For linear regression, CS calculates $I_1$ and $I_2$, whereas for the neural network, CS calculates $I_3$, $I_4$, $I_5$, and $I_6$. After receiving the encrypted results from the CS, each DO decrypts the message sent from the CS and obtains $I'_1$ and $I'_2$ in linear regression or $I'_3$, $I'_4$, $I'_5$, and $I'_6$ in the neural network. Then, it blinds these encrypted results with $k$ to obtain $kI'_1$ and $kI'_2$ or $kI'_3$, $kI'_4$, $kI'_5$, and $kI'_6$ and sends them to the KCS. Consequently, KCS converts the ciphertexts into $kI''_1$ and $kI''_2$ or $kI''_3$, $kI''_4$, $kI''_5$, and $kI''_6$ under the key $pk^*$ of the TDS. TDS decrypts these intermediate results through Algorithm 5 to produce $kI^*_1$ and $kI^*_2$ or $kI^*_3$, $kI^*_4$, $kI^*_5$, and $kI^*_6$. Then, it computes $kl_1 = kI^*_2 - kI^*_1$ or $kl_2 = kI^*_3 + kI^*_4 - kI^*_5 - kI^*_6$ and generates the final result $kl_1$ or $kl_2$ for linear regression or the neural network, respectively. Ultimately, after the MO receives them, it removes the security parameter $k$ and obtains $\nabla\omega = x_i \sum_{j=1}^{m} \omega_j x_{ij} - x_i y_i$ in linear regression or the corresponding $\nabla\omega$ in the neural network, which are equal to equation (3) or equation (5), respectively. Thus, MO obtains the accurate $\nabla\omega$. □
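The arithmetic at the end of the proof can be checked on plaintext integers. The sketch below omits encryption and key conversion entirely. It assumes, based on the form of the final gradient, that $I_1$ corresponds to $x_i y_i$ and $I_2$ to $x_i \sum_j \omega_j x_{ij}$. The function names and sample values are hypothetical.

```python
def blinded_terms(x, y, omega, k):
    """Return (k*I1, k*I2) for one sample.

    Assumption: I1 = x_i * y_i and I2 = x_i * sum_j(omega_j * x_ij).
    """
    inner = sum(w * xj for w, xj in zip(omega, x))
    k_i1 = [k * xj * y for xj in x]
    k_i2 = [k * xj * inner for xj in x]
    return k_i1, k_i2


def tds_combine(k_i1, k_i2):
    """TDS step: kl_1 = k*I2 - k*I1, still blinded by k."""
    return [b - a for a, b in zip(k_i1, k_i2)]


def mo_unblind(kl_1, k):
    """MO step: divide out k to recover the gradient."""
    return [v // k for v in kl_1]


x, y = [2, 3], 4
omega = [1, 2]
k = 17

k_i1, k_i2 = blinded_terms(x, y, omega, k)
kl_1 = tds_combine(k_i1, k_i2)
grad = mo_unblind(kl_1, k)

# Direct evaluation of x_i * (sum_j omega_j * x_ij - y_i) for comparison.
inner = sum(w * xj for w, xj in zip(omega, x))
expected = [xj * (inner - y) for xj in x]
```

The TDS only ever handles values scaled by $k$, so the subtraction commutes with the blinding and the MO recovers exactly the unblinded gradient.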

Privacy and Complexity Analysis
We will analyze the privacy, computational cost, and communication overhead of the OPPGD scheme over horizontally partitioned data. We can perform analysis of the OPPGD scheme over vertically partitioned data in terms of the privacy, computational cost, and communication overhead in almost the same way. For simplicity, we omit the latter.

Privacy Analysis.
According to the definitions of two different privacy levels in Section 4.4, we conduct the privacy analysis of our proposed OPPGD scheme over horizontally partitioned data.
Theorem 2. Upon the hardness assumption of the Diffie-Hellman problem, our proposed OPPGD schemes achieve level-1 privacy against any probabilistic polynomial-time adversary.
Proof. Now, we show that our scheme can preserve MO's model privacy and DO's data privacy.
In Step 3 of Algorithm 3, MO and DO hide their inputs via ElGamal encryption. After receiving $[x_i]_{pk_i}$, $[y_i]_{pk_i}$, and $[\omega]_{pk}$, the CS runs Algorithm 4 and obtains the encrypted scalar product $S$. Specifically, MO's and every DO's encrypted inputs are $g^{sk_M r_M} \omega \bmod N$ and $\{g^{sk_i r_i} x_i \bmod N,\ g^{sk_i r_i} y_i \bmod N\}_{i=1}^{n}$. Upon the hardness assumption of the Diffie-Hellman problem, although CS knows MO's and DO's public keys $g^{sk_M} \bmod N$ and $g^{sk_i} \bmod N$, it is still infeasible for CS to acquire their secret keys $sk_M$ and $sk_i$. Since the randomness $r_i$ and $r_M$ is chosen by DO and MO, respectively, any adversary who attempts to solve $\{g^{sk_i r_i} y_i \bmod N,\ g^{sk_M r_M} \omega \bmod N\}$ from the public values $\{g^{r_i} \bmod N,\ g^{r_M} \bmod N\}$ faces two instances of the Diffie-Hellman problem. Thus, DO's $x_i$ and $y_i$ and MO's $\omega$ will not be exposed to other parties.
When the KCS performs Algorithm 6 to convert the encrypted results $\{I'_1, I'_2, I'_3, I'_4, I'_5, I'_6\}$, it receives MO's and DO's secret keys encrypted under the TDS's public key. However, TDS is a trusty decryption server, so KCS cannot obtain TDS's secret key, which means KCS learns nothing about MO's and DO's secret keys or their private values. So, the encrypted results $\{kI'_1, kI'_2, kI'_3, kI'_4, kI'_5, kI'_6\}$ cannot leak any secret information. Next, TDS runs Algorithm 5 and obtains the blinded $\nabla\omega$. However, without the secret value $k$, TDS cannot obtain $\nabla\omega$. Hence, MO's model parameters will not be exposed.
Since MO's coefficient vector, the gradient $\nabla\omega$, and DO's data are not exposed, our OPPGD schemes can provide level-1 privacy. □
Theorem 3. Upon the hardness assumption of knapsack problems, our OPPGD schemes can provide level-2 privacy against any probabilistic polynomial-time adversary.
Proof. After receiving the encrypted results from the CS, DOs run Algorithm 5 to generate new encrypted results under MO's key. For linear regression, DO knows $\{I'_2,\ g^{sk_M r} \bmod N,\ x_i,\ y_i\}$. For neural networks, DO knows $\{I'_3, I'_4, I'_5, I'_6,\ g^{sk_M r} \bmod N,\ x_i,\ y_i\}$. However, even with this information, it is still infeasible to acquire $\omega$. This is because the knapsack problem is assumed to be difficult: given a scalar product $z$ and a vector $a$, it is hard to find the vector $b$ that satisfies $z = a \cdot b$. Consequently, MO's coefficient vector and gradient results $\nabla\omega$ cannot be deduced from the intermediate results anywhere in the scheme. Therefore, we conclude that our schemes can achieve level-2 privacy. □

Theoretical Efficiency Analysis.
Now, we carry out the theoretical efficiency analysis of the schemes. We consider the situation for linear regression. Assume that the MO chooses the SGD method to update its coefficient vector within one epoch. In essence, MO optimizes its coefficients within several epochs. In the following, we analyze the feasibility of our proposed schemes in detail in terms of

Performance Evaluation
In this section, we evaluate the efficiency of the OPPGD scheme over horizontally partitioned data by using a custom simulator built in Java. The running time of the OPPGD scheme over vertically partitioned data can be evaluated in a similar way. The scenario we focus on in our paper is one in which the data are partitioned among multiple data owners and the target function is owned by the model owner. The model owner can not only train its model over the data owners' data but also provide users with predictions. To the best of our knowledge, no prior work in the literature discusses this scenario. So, we present a detailed performance evaluation of our schemes rather than comparing them to previous works.
There are five entities in the scheme: the model owner MO, the data owner DO, the cloud server CS, the key conversion server KCS, and the trusty decryption server TDS.
We run the data owners DOs and the model owner MO on a laptop with an Intel Xeon(R) E5-1620 3.50 GHz CPU and 16 GB of RAM. The cloud server CS, the key conversion server KCS, and the trusty decryption server TDS are run on a computer with an Intel(R) Core(TM) i7-4770 3.40 GHz CPU and 16 GB of RAM.
In our experiments, DO's data $X$ are represented as one $n \times m$ matrix, where $n$ ranges from 1000 to 6000 and $m = 50$.
We evaluate the computational efficiency of our OPPGD schemes without considering communication latency. We simulate four stages: the KeyGen algorithm, the encryption algorithm, the training phase, and the prediction phase. The time cost of each stage varies with the data size $n$. When the key bit-length is 2048 bits, the running time of each stage for different numbers of data tuples is given in Table 2. The computation of the OPPGD scheme is concentrated in the training stage, while the cost of the remaining stages is very low. We use a histogram to present the running time of the KeyGen algorithm and the encryption algorithm. The running time of the KeyGen algorithm, the encryption algorithm, and the training phase is shown in Figure 6. In addition, when the number of data tuples is 6000, the running times of the KeyGen algorithm, the evaluation algorithm, and the training phase vary with the key bit-length. So, we simulate these stages and report the running time in Table 3 and Figure 7. When the key bit-length is 2048 bits, the total running time of each entity in our OPPGD scheme is shown in Table 4. The running time of each party as the number of tuples or the key bit-length varies is shown in Figure 8.
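One reason running time depends on the key bit-length is that the scheme relies on modular exponentiation, whose cost grows with operand size. The following sketch times that single operation at several bit-lengths. It is not the Java simulator used in the experiments: the function name `time_modexp`, the trial count, and the use of a random odd modulus (rather than a properly generated key) are assumptions made purely for illustration.

```python
import secrets
import time


def time_modexp(bits, trials=20):
    """Average seconds for one modular exponentiation at a given bit-length."""
    # Random odd modulus with the top bit set; not a real key, timing only.
    modulus = secrets.randbits(bits) | (1 << (bits - 1)) | 1
    base = secrets.randbelow(modulus - 2) + 2
    exponent = secrets.randbits(bits) | 1
    start = time.perf_counter()
    for _ in range(trials):
        pow(base, exponent, modulus)
    return (time.perf_counter() - start) / trials


timings = {bits: time_modexp(bits) for bits in (512, 1024, 2048)}
for bits, seconds in timings.items():
    print(f"{bits}-bit modexp: {seconds * 1e3:.3f} ms")
```

Absolute numbers depend on the machine and interpreter, so the output is useful only for comparing bit-lengths against each other on the same host.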

Conclusion
Much of the existing work on protecting the sensitive data of IoT devices is based on secure communication channels and authorization. In our paper, we focus on protecting the data that are collected by IoT devices and then stored and processed in the cloud, as well as the privacy of the machine learning model held by the MO. Gradient descent methods are widely employed to train machine learning models in the cloud computing environment. In order to preserve data privacy and model privacy during cloud computing, we propose two secure schemes that perform outsourced privacy-preserving gradient descent over a horizontally or vertically distributed dataset. The proposed schemes enable the model owner (MO) to train its learning model and obtain the optimal coefficient vector based on the dataset owned by the DO with the help of CS, TDS, and KCS. After the MO improves its model, it can offer a prediction service to the DO. The privacy of both the MO's model and the DO's dataset is protected. Complexity analysis and performance evaluation are also given in detail. In future work, we will try to optimize our system to reduce the number of entities.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.