An Efficient Outsourced Privacy Preserving Machine Learning Scheme With Public Verifiability

Cloud computing has been widely applied in numerous applications for storage and data analytics tasks. However, cloud servers operated by a third party cannot be fully trusted by multiple data users. Thus, security and privacy concerns become the main obstacles to using machine learning services, especially with multiple data providers. Several recent outsourced machine learning schemes have been proposed to preserve the privacy of data providers. Yet, these schemes do not satisfy the property of public verifiability. In this paper, we present an efficient privacy-preserving machine learning scheme for multiple data providers. The proposed scheme allows all participants in the system model to publicly verify the correctness of the encrypted data. Furthermore, a unidirectional proxy re-encryption (UPRE) scheme is employed to reduce the high computational costs that arise with multiple data providers. The cloud server embeds noise in the encrypted data, allowing the analyst to apply machine learning techniques while preserving the privacy of the data providers' information. The experimental results demonstrate that the proposed scheme reduces computational costs and communication overheads.


I. INTRODUCTION
Cloud computing, with its high data processing capabilities, is important to applications with high processing costs, such as data processing and machine learning [1]. Nonetheless, it is not appropriate to trust a third-party cloud system, especially for storing sensitive data. Cloud computing provides accessible computing services using on-demand, elastic, and easy-to-use techniques. It offers many resources but also raises crucial security issues, which remain highly debated topics. Third-party storage introduces different potential risks, especially concerning data security [2].
The data storage paradigm in the cloud brings several challenges that have a huge influence on the security of the system [3], [4]. Data integrity verification at untrusted servers is one of the main issues with cloud systems. The existing schemes fall into two categories: private verifiability and public verifiability. Private verifiability can deliver higher system efficiency, while public verifiability allows anyone, not just the data providers, to challenge the cloud server on the correctness of the data without holding private information. Cloud systems attempt to employ distinct security techniques. Yet, most of these systems cannot guarantee either users' privacy or data confidentiality without using multi-layer cryptography techniques [5]-[7]. In this case, encryption is the primary technique used to ensure data security, where data are encrypted and then stored in the cloud [8], [9]. However, computing on encrypted data is extremely difficult because of its high complexity.
Homomorphic encryption plays a promising role in cloud computing by improving the privacy of data providers. It makes it possible to run several services directly on encrypted data. Accordingly, companies can store encrypted data in a public cloud and use the cloud provider's analytic services on the encrypted data. Homomorphic encryption has properties that make encrypted data useful for companies, such as random self-reducibility, re-randomizable encryption, and verifiable encryption [10]. Nevertheless, homomorphic encryption still suffers from high computational costs. Researchers have presented several variants, such as Fully Homomorphic Encryption (FHE) [11], [12], Partially Homomorphic Encryption (PHE) [13], [14], and Somewhat Homomorphic Encryption (SWHE) [15], [16]. Herein, differential privacy is employed to guarantee the privacy of the users' data in the cloud. Differential privacy allows companies to collect and share aggregate information about their customers while maintaining the privacy of individual customers. Differential privacy is defined probabilistically: the output distribution of a differentially private algorithm changes little when a single record in the dataset changes, so the output does not compromise the privacy of an individual's data [17].
The foremost issue with cloud systems concerns the security and privacy of data stored in the cloud hosting system. Most companies question the security of their data and whether they can trust third-party services to store sensitive data outside their own off-line databases [18]. To this end, cryptography techniques have been proposed using various approaches that guarantee data security and users' privacy. On the user side, data encryption should be sufficiently strong and is considered a standard form of defense that provides a high level of security in the cloud system [19].
To overcome the above issues, it is important to propose an effective privacy scheme based on machine learning for multiple data providers. Any proposed solution should reduce the cost of implementation and maintain the privacy of the participants [20]-[22]. Furthermore, it is important to address the problems that arise with multiple data providers. For example, Li et al. [23] proposed a privacy-preserving machine learning framework dealing with multiple data providers. However, this solution came with a high computational cost due to the dependence on integer factorization in their proposed framework. Additionally, none of the participating components in their proposed model can publicly verify the correctness of the outsourced data. This issue increases the overhead for all parties. Furthermore, the analyst must initiate the transaction with the data providers through the cloud system and remain online during communication.
This paper aims to overcome the above-mentioned issues by proposing an efficient privacy-preserving machine learning scheme for data from multiple providers in the cloud system. We propose a privacy-preserving framework using an additive homomorphic encryption scheme. First, the data providers encrypt their sensitive data using the unidirectional proxy re-encryption scheme and upload the ciphertexts to the proxy cloud server. Then, the cloud re-encrypts the received ciphertexts and adds generated noise using the partially homomorphic Hashed-ElGamal scheme. Finally, the cloud sends the noisy ciphertext to the analyst, who applies machine learning techniques for predictive analytics. All the components of the proposed framework can publicly check the correctness of ciphertexts before performing any operation, which ensures public verifiability. The proposed framework guarantees a high level of security and preserves the privacy of users.
The main contributions are highlighted as follows.
• This paper proposes an efficient privacy-preserving machine learning scheme with public verifiability.
• The unidirectional proxy re-encryption allows different data providers to delegate their data using the same public key.
• All the participating parties in our scheme can check the validity of the ciphertext before any operations are performed. This feature reduces the time spent on invalid ciphertexts.
• This scheme can protect the privacy of the providers' data in the cloud and of the data analyst.
• This scheme uses ε-differential privacy (ε-DP), which improves the accuracy of applying machine learning techniques.
This paper is organized as follows. The related works are discussed in Section II. The preliminaries are given in Section III. The proposed scheme is explained in Section IV, while the results and discussion are presented in Section V. The security analysis of our protocol is discussed in Section VI. Finally, the conclusions are given in Section VII.

II. RELATED WORK
Most companies currently believe that machine learning will be a key customer expectation [24]. In this regard, machine learning plays an important role in technology generally, and especially in cloud computing. The extensive developments of the machine learning community have reduced the network overhead. However, these developments affect the computational cost, as most applications have high computational costs [1], [25]. Many machine learning techniques rely on complex mathematical computations that are carried out automatically, and they therefore suffer from high computational costs.
Recently, machine learning over encrypted data has become an important topic in industry and academia. Various approaches have been introduced to overcome these challenges, such as traditional partially homomorphic encryption schemes. Unfortunately, these approaches are inefficient because they suffer from high computational costs, high network overhead, and certain security issues. Several protocols have been proposed by researchers, such as [19], [26]-[28]. For instance, Chen et al. [29] introduced a two-party distributed algorithm to preserve privacy for backpropagation neural networks (BPNNs). Their scheme allows the two parties to train on their data while ensuring that the data are secure. They only considered training datasets that are vertically partitioned. To improve on this work, Bansal et al. [30] proposed another algorithm that can be applied when the dataset is arbitrarily partitioned. Both Chen et al. [29] and Bansal et al. [30] used homomorphic properties to protect the privacy of the two parties.
The aforementioned works do not work properly in multiparty environments because of the high communication overhead. Samet and Miri [31] presented a protocol to protect the privacy of both the input data and the created learning model in a BPNN and extreme learning machine. Their protocol can be applied when the dataset is vertically or horizontally partitioned. Graepel et al. [32] proposed a scheme to ensure the confidentiality of the encrypted data during machine learning over these data in the training and test phases. Their algorithm can be applied to two types of classification algorithms: linear means and Fisher's linear discriminant [32]. Liu et al. [33] worked on preserving the users' privacy in social networks and keeping their sensitive information secure, considering the identity disclosure problem in weighted social graphs.
Nowadays, with the development of cloud computing and outsourcing, several techniques have been proposed to guarantee users' privacy. For example, Wei et al. [34] presented practical outsourcing algorithms for exponentiation computation based on homomorphic mapping. Gao et al. [35] computed the social contiguity between users to identify potential friends while keeping their privacy, based on a proxy re-encryption scheme with additive homomorphism.
The notion of differential privacy has been introduced to address privacy and accuracy issues [36]. Dwork [37] proposed the ε-DP approach, which quantifies how much information about the data can be revealed. Consequently, researchers have introduced various approaches based on the formal definition of DP, for instance, computational differential privacy [36] as well as the differential privacy consensus algorithm [38], as discussed herein. Recently, Li et al. [23] proposed a privacy-preserving machine learning framework that is suitable for multiple data providers. They employed a double decryption public-key encryption scheme to ensure security and user privacy. However, their framework has high computational costs and communication overhead. Additionally, the framework wastes time processing invalid ciphertexts.

III. PRELIMINARIES
In this section, we introduce some related notations and concepts. First, the computational Diffie-Hellman assumption is described. We also introduce the divisible computational Diffie-Hellman assumption, which is related to the difficulty of computing discrete logarithms in cyclic groups. Then, we present unidirectional proxy re-encryption, which is the basis of the proposed protocol implementation. We describe the partial homomorphic hashed-ElGamal encryption scheme used for predictive analytics. Finally, we present a brief introduction to differential privacy.

A. COMPUTATIONAL DIFFIE-HELLMAN (CDH)
Assume that $g, g^a, g^b \in G$ are given, where $a, b \in Z_q^*$ and $G$ is a cyclic multiplicative group of prime order $q$. The CDH assumption states that computing $g^{ab} \in G$ is computationally infeasible.

B. DECISION DIFFIE-HELLMAN (DDH)
Consider two distributions $A = (g^x, g^y, g^{xy})$ and $B = (g^x, g^y, g^z)$ for uniformly random $x, y, z \leftarrow Z_q$. The DDH assumption states that it is computationally infeasible to distinguish $A$ from $B$ [39].

C. DIVISIBLE COMPUTATIONAL DIFFIE-HELLMAN (DCDH)
Assume that $g, g^a, g^b \in G$ are given, where $a, b \in Z_q^*$. The DCDH assumption states that computing $g^{b/a} \in G$ is computationally infeasible. The DCDH and CDH assumptions are equivalent in the same group [40].

D. UNIDIRECTIONAL PROXY RE-ENCRYPTION (UPRE) SCHEME
The unidirectional PRE scheme [41] is composed of six algorithms, described as follows:
• Initialization($k$): This algorithm takes a security parameter $k$ as input and returns the public parameters $param$, which include a description of the message space $M$.
• KeyGen(): This algorithm generates a user's public and private key pair $(pk_{u_i}, sk_{u_i})$.
• ReKeyGen($sk_{u_i}$, $pk_a$): This algorithm takes the private key of the delegator $sk_{u_i}$ and the public key of the delegatee $pk_a$. Then, the algorithm returns a re-encryption key $rk_{u_i \to a}$.
• Enc($pk_{u_i}$, $m$): This algorithm takes the public key of the delegator $pk_{u_i}$ and a message $m \in M$. Then, the algorithm returns a ciphertext $C_i$ under $pk_{u_i}$.
• ReEnc($rk_{u_i \to a}$, $C_i$, $pk_{u_i}$, $pk_a$): This algorithm applies $rk_{u_i \to a}$ to $C_i$. Then, the algorithm returns a ciphertext $C_a$ under the public key $pk_a$.
• Dec($sk$, $C$): This algorithm takes $sk$ and $C$. If the ciphertext is valid, the algorithm returns the plain message $m \in M$; otherwise, the algorithm returns an error. This algorithm is run by the delegator as User Dec() and by the delegatee as Analyst Dec().

E. PARTIAL HOMOMORPHIC HASHED-ELGAMAL ENCRYPTION SCHEME
Hashed ElGamal is partially homomorphic with respect to the XOR operator only. Given the encryption of a message $m$, anyone can compute the encryption of a message $m' = m \oplus K$ for any chosen mask $K$, a bit string of the same length as $m$. Let $C = (c_1, c_2)$ with $c_1 = g^r$ and $c_2 = m \oplus h(y^r)$, where $y$ is the public key. Then $C' = (c_1, c_2 \oplus K) = (g^r, m' \oplus h(y^r))$.
This is the hashed-ElGamal encryption of $m'$. The partial homomorphism of hashed-ElGamal encryption therefore supports adding a mask to the ciphertext. Furthermore, this encryption is known to be one-way under the computational Diffie-Hellman assumption, and indistinguishability holds under the DDH assumption [39].
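The masking property can be illustrated with a minimal Python sketch. The group parameters (p = 2039, q = 1019, g = 4), the use of SHA-256 for h, and all function names are assumptions chosen for illustration; the parameters are far too small for real use and are not taken from the scheme's implementation.

```python
import hashlib
import secrets

# Toy group (assumption, insecure sizes): p = 2q + 1 and g generates the order-q subgroup.
P, Q, G = 2039, 1019, 4

def h(elem: int, n: int) -> bytes:
    """Hash a group element to an n-byte mask (n <= 32); stands in for h(.)."""
    return hashlib.sha256(elem.to_bytes(2, "big")).digest()[:n]

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def keygen():
    """Return (x, y) with private key x and public key y = g^x mod p."""
    x = 1 + secrets.randbelow(Q - 1)
    return x, pow(G, x, P)

def encrypt(y: int, m: bytes):
    """Return C = (c1, c2) = (g^r, m XOR h(y^r))."""
    r = 1 + secrets.randbelow(Q - 1)
    return pow(G, r, P), xor(m, h(pow(y, r, P), len(m)))

def add_mask(c, k: bytes):
    """Turn an encryption of m into an encryption of m XOR k without any key."""
    c1, c2 = c
    return c1, xor(c2, k)

def decrypt(x: int, c) -> bytes:
    """Recover the plaintext as c2 XOR h(c1^x)."""
    c1, c2 = c
    return xor(c2, h(pow(c1, x, P), len(c2)))
```

Decrypting the output of `add_mask(c, k)` yields `m XOR k`, which is the property the cloud relies on when it perturbs ciphertexts it cannot read.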

F. DIFFERENTIAL PRIVACY
Differential privacy is a probabilistic notion: the output distribution of a differentially private algorithm changes little when a single record in the dataset changes, so the output does not compromise the privacy of an individual's data.
Let $d_1$ and $d_2$ be two data sets; $d_1$ and $d_2$ are neighboring if they differ in only one record [17]. In particular, DP guarantees the security of the distributed data by adding calibrated perturbations to the ciphertext using the principle of homomorphic encryption [42].
VOLUME 4, 2019
Definition 1 (ε-DP). A randomized mechanism $R$ is ε-differentially private if, for any pair of neighboring data sets $d_1$ and $d_2$ and any $K \subseteq Range(R)$, the following inequality holds:
$$\Pr[R(d_1) \in K] \le e^{\epsilon} \cdot \Pr[R(d_2) \in K]$$
Note that a smaller ε implies more stringent privacy. This randomized algorithm ensures differential privacy during the analytic process.
Definition 2 (Sensitivity). Let $f : D \to R^d$ be a function defined on the input space $D$ of data sets. The sensitivity of $f$ over neighboring data sets $d_1$ and $d_2$ is given as follows:
$$\Delta f = \max_{d_1, d_2} \| f(d_1) - f(d_2) \|_1$$
In this equation, the maximum is taken over all pairs $d_1$ and $d_2$ differing in at most one element, and $\| \cdot \|_1$ denotes the $\ell_1$ norm.
Definition 3 (The Laplace mechanism). The Laplace mechanism adds noise drawn from the Laplace distribution, whose probability density function satisfies $noise(y) \propto \exp(-|y|/\lambda)$, with mean zero and standard deviation $\sqrt{2}\lambda$. In our work, we use the Laplace mechanism, which is given as follows:
$$M_L(x, f, \epsilon) = f(x) + (Y_1, \ldots, Y_k)$$
where $f : N^{|x|} \to R^k$ and the $Y_i$ are independent random variables drawn from the Laplace distribution with scale $\lambda = \Delta f / \epsilon$. The Laplace mechanism relies on adding calibrated noise to the ciphertext that we want to compute on.
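Definitions 2 and 3 can be made concrete with a short Python sketch that works on plaintext values. The function names are hypothetical, and sampling Laplace noise as the difference of two exponential variates is one standard technique chosen here for illustration; it is not claimed to be how the scheme's implementation draws its noise.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Lap(0, scale): the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(query_result, sensitivity: float, epsilon: float,
                      rng: random.Random):
    """Return f(x) + (Y_1, ..., Y_k) with each Y_i drawn from Lap(sensitivity / epsilon)."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    scale = sensitivity / epsilon
    return [value + laplace_noise(scale, rng) for value in query_result]

# Example: a counting query has sensitivity 1, because changing one record
# changes the count by at most 1.
rng = random.Random(2019)
true_counts = [120.0, 45.0, 7.0]
noisy_counts = laplace_mechanism(true_counts, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller ε gives a larger scale Δf/ε and therefore noisier answers, which matches the remark that a smaller ε implies more stringent privacy.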

G. SYSTEM MODEL
The considered system model consists of data providers, proxy servers, and data analysts, as shown in Figure 1. The communication between these components is explained as follows:
1) Data providers $u_i \in \{u_1, u_2, u_3, \ldots, u_n\}$ provide the system with data from different sources. Each $u_i$ encrypts its sensitive data set $d_i$, uploads the result to the proxy server, and delegates the encrypted sensitive data set $[d_i]$ to the analyst.
2) The proxy cloud server ($P_S$) is responsible for redirecting the data sets encrypted by the users to the analyst. $P_S$ is a semi-honest cloud with high computational power.
3) The analyst $D_A$ receives the encrypted data set and trains the machine learning model on it. The analyst can perform training on these ciphertexts without compromising the privacy of the data providers.

H. SECURITY MODEL
Suppose that the data providers $u_i$, $P_S$, and $D_A$ are semi-honest but untrusted. Furthermore, it is assumed that there is no collusion between the parties participating in the system model. Algorithm $B$ answers the adversary's queries according to our scheme. Adversary $A$ has the following capabilities for attacking the plaintext of the users:
1) $A$ can collude with $u_i$ to obtain the plaintexts of all encrypted data downloaded from the cloud.
2) $A$ can attack $P_S$ to estimate the plaintexts of all ciphertexts outsourced to $P_S$ by the users $u_i$ and all data sent from $D_A$.
3) $A$ may corrupt some data from $u_i$ to produce the plaintext from other users' ciphertexts.

IV. THE PROPOSED FRAMEWORK
The following subsections explain the proposed framework structure which contains an efficient privacy-preserving machine learning scheme with public verifiability.

A. OVERVIEW OF THE PROPOSED SCHEME
In the proposed framework, the data providers encrypt their data sets using the UPRE scheme and upload them to the proxy cloud server. In this step, the data providers can assess their data sets and decide which data are sensitive enough to be encrypted. Then, the cloud re-encrypts the ciphertexts received from the data providers under the public key of the analyst. Next, the server adds encrypted noise to the data providers' ciphertexts using the partially homomorphic Hashed-ElGamal scheme. After adding the encrypted noise, the cloud forwards the noisy ciphertext to the data analyst. In this step, the data analyst decrypts the noisy ciphertext to get the noisy data set. Then, the analyst runs machine learning algorithms for predictive analytics. For instance, the analyst could use a k-nearest neighbor classifier, a support vector machine classifier, a naive Bayes classifier, and so on.
The data analyst first decrypts the noisy ciphertext with his private key to get the noisy data set. Then, the data analyst chooses one of the above-mentioned classifiers and runs it on the noisy data set, which satisfies ε-DP, without compromising the privacy of the individual users.
In our proposed framework, all the components of the model can publicly check the correctness of ciphertexts before performing any operation, which ensures public verifiability. This feature is a significant contribution of this work because it reduces the time wasted on invalid ciphertexts.

B. STRUCTURE OF THE PROPOSED SCHEME
Our protocol consists of the following phases:
• Initialization Phase: Here, the data providers prepare their public and private keys. Then, they compute the re-encryption key used for redirecting their ciphertexts to the data analyst. To do that, the data providers need to prepare the public parameters. $p$ and $q$ are selected as primes such that $q \mid p - 1$ and the bit length of $q$ is the security parameter $k$. $g$ is a generator of the group $G$, which is a subgroup of $Z_p^*$ with order $q$. Four hash functions $H_1$, $H_2$, $H_3$, and $H_4$ are also selected. The following algorithms represent this phase:
1) KeyGen(): Pick $sk_{u_i} = (sk_{u_i,1} \leftarrow Z_q^*, sk_{u_i,2} \leftarrow Z_q^*)$ and set $pk_{u_i} = (pk_{u_i,1}, pk_{u_i,2}) = (g^{sk_{u_i,1}}, g^{sk_{u_i,2}})$.
2) ReKeyGen($sk_{u_i}$, $pk_a$): On input the private key of the user $sk_{u_i} = (sk_{u_i,1}, sk_{u_i,2})$ and the public key of the analyst $pk_a = (pk_{a,1}, pk_{a,2})$, this algorithm generates the re-encryption key $rk_{u_i \to a}$.
• Upload Phase: In this phase, the data providers upload their encrypted data to the cloud. The data provider's data set $d_i$ consists of records $(x_i, y_i)$, where $x_i$ denotes the data vector and $y_i \in Y := \{0, 1\}$ denotes the associated binary label. First, the data providers encrypt their sensitive data $x_i$ using the Enc($pk_{u_i}$, $x_i$) algorithm with their public key $pk_{u_i} = (pk_{u_i,1}, pk_{u_i,2})$. Algorithm 1 details how the users encrypt $x_i \in M$.
2: Pick $w \leftarrow Z_q^*$ and compute $r_i = H_1(x_i, w)$.
3: Compute $E_i = (pk_{u_i,1}^{H_4(pk_{u_i,2})} \cdot pk_{u_i,2})^{r_i}$ and $F_i = H_2(g^{r_i}) \oplus (x_i \| w)$.
4: Compute $\mu_i = \beta + r_i \cdot H_3(\phi, E_i, F_i) \bmod q$.
The final result of the Enc($pk_{u_i}$, $x_i$) algorithm can also be represented as the tuple $[x_i] = (E_i, F_i)$. The encrypted data set is denoted $[d_i]$, where $(\phi, \mu_i)$ serves as a signature to confirm the correctness of the ciphertext. Second, the data providers determine the sensitivity level of their query function $\Delta f_i$ and the privacy level $\epsilon_i$ for $d_i$. Finally, they send the encrypted data set $[d_i]$, $\Delta f_i$, $\epsilon_i$, and the re-encryption key $rk_{u_i \to a}$ to $P_S$.
• Download Phase: This phase illustrates how the data providers can download their ciphertexts from the cloud. The data providers use the Dec($sk_{u_i}$, $C_i$) algorithm to retrieve their outsourced encrypted data from $P_S$. The Dec($sk_{u_i}$, $C_i$) algorithm takes $sk_{u_i} = (sk_{u_i,1}, sk_{u_i,2})$ and the ciphertext $C_i$ as input and returns the corresponding data set. Data providers can verify the correctness of $C_i$ using $(\phi, \mu_i)$ before accepting it as a valid ciphertext. This step was used for the public verifiability experiments. Algorithm 2 describes how the data providers decrypt their encrypted data and apply the public verification.

Algorithm 2 User Decryption Algorithm
Input:
- $sk_{u_i} = (sk_{u_i,1}, sk_{u_i,2})$: the private key
- $C_i = (E_i, F_i, \phi, \mu_i)$: the ciphertext

• Re-encryption Phase: Here, the cloud uses the re-encryption algorithm and transfers the generated ciphertext to the analyst. The cloud receives $[d_i]$, $\Delta f_i$, and $\epsilon_i$ from the data providers. Algorithm 3 details how $P_S$ re-encrypts the ciphertexts $C_i$, which are encrypted under multiple public keys, to the public key of the analyst. First, $P_S$ checks the correctness of the ciphertext using $(\phi, \mu_i)$. Second, $P_S$ runs the ReEnc($rk_{u_i \to a}$, $C_i$, $pk_{u_i}$, $pk_a$) algorithm, which takes $rk_{u_i \to a} = (rk^{1}_{u_i \to a}, V, W)$ and $(E_i, F_i)$ under the public key of the data provider $pk_{u_i} = (pk_{u_i,1}, pk_{u_i,2})$ as inputs. Then, ReEnc($rk_{u_i \to a}$, $C_i$, $pk_{u_i}$, $pk_a$) returns a new ciphertext $[x_i]_a$ under the public key of the analyst $pk_a = (pk_{a,1}, pk_{a,2})$. In other words, the ciphertexts $C_i$ under multiple public keys are transformed into ciphertexts under the single public key of the analyst. The cloud then adds noise to the re-encrypted ciphertexts. In this step, the cloud uses partial homomorphic hashed-ElGamal encryption [43]. Finally, the cloud sends $[d_i]$ to the data analyst.

Algorithm 3 Re-encryption Algorithm

• Learning Phase: Here, the analyst obtains the noisy ciphertexts from the cloud. Algorithm 4 describes this phase in detail. First, $D_A$ checks the validity of $[d_i]$ using $V$ and $W$. Second, the analyst decrypts $E_i$ with the associated public keys of the data providers. Finally, the data analyst obtains a noisy data set and applies his machine learning classification algorithms.
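The encryption steps of Algorithm 1 and the public check with $(\phi, \mu_i)$ can be sketched in Python as follows. Several parts are assumptions that the text does not state: the toy group (p = 2039, q = 1019, g = 4), the SHA-256 instantiation of $H_1$ to $H_4$, the step that sets $\phi$ to the base raised to $\beta$, the Schnorr-style verification equation, and the decryption exponent. The sketch only shows how a signature of the form $\mu_i = \beta + r_i \cdot H_3(\phi, E_i, F_i) \bmod q$ lets anyone validate a ciphertext without a private key.

```python
import hashlib
import secrets

P, Q, G = 2039, 1019, 4        # toy group (assumption): p = 2q + 1, g of order q
MSG_LEN = 16                   # bytes reserved for x_i in the string x_i || w

def _b(n: int) -> bytes:
    return n.to_bytes(2, "big")            # every toy value fits in two bytes

def _digest(tag: bytes, *parts: bytes) -> bytes:
    return hashlib.sha256(tag + b"|" + b"|".join(parts)).digest()

def h_zq(tag: bytes, *parts: bytes) -> int:
    """Hash into Z_q^*; stands in for H1, H3 and H4."""
    return 1 + int.from_bytes(_digest(tag, *parts), "big") % (Q - 1)

def h2(elem: int) -> bytes:
    """Hash a group element to a mask for x_i || w; stands in for H2."""
    return _digest(b"H2", _b(elem))[: MSG_LEN + 2]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def base(pk) -> int:
    """pk_1^{H4(pk_2)} * pk_2, the element that E_i and phi are powers of."""
    return pow(pk[0], h_zq(b"H4", _b(pk[1])), P) * pk[1] % P

def keygen():
    while True:
        sk = (1 + secrets.randbelow(Q - 1), 1 + secrets.randbelow(Q - 1))
        pk = (pow(G, sk[0], P), pow(G, sk[1], P))
        # Retry if the decryption exponent would not be invertible mod q.
        if (sk[0] * h_zq(b"H4", _b(pk[1])) + sk[1]) % Q != 0:
            return sk, pk

def encrypt(pk, x: bytes):
    beta = 1 + secrets.randbelow(Q - 1)
    phi = pow(base(pk), beta, P)                       # assumed step 1
    w = _b(1 + secrets.randbelow(Q - 1))               # step 2
    r = h_zq(b"H1", x, w)
    E = pow(base(pk), r, P)                            # step 3
    F = xor(h2(pow(G, r, P)), x + w)
    mu = (beta + r * h_zq(b"H3", _b(phi), _b(E), F)) % Q   # step 4
    return E, F, phi, mu

def verify(pk, C) -> bool:
    """Public check (assumed form): base^mu == phi * E^{H3(phi, E, F)}."""
    E, F, phi, mu = C
    c = h_zq(b"H3", _b(phi), _b(E), F)
    return pow(base(pk), mu, P) == phi * pow(E, c, P) % P

def decrypt(sk, pk, C) -> bytes:
    if not verify(pk, C):
        raise ValueError("invalid ciphertext")
    E, F, _, _ = C
    inv = pow(sk[0] * h_zq(b"H4", _b(pk[1])) + sk[1], -1, Q)
    x_w = xor(F, h2(pow(E, inv, P)))                   # E^inv equals g^r
    x, w = x_w[:MSG_LEN], x_w[MSG_LEN:]
    if pow(base(pk), h_zq(b"H1", x, w), P) != E:
        raise ValueError("inconsistent ciphertext")
    return x
```

Because `verify` uses only public values, the proxy server and the analyst can run the same check and discard an invalid ciphertext before spending any exponentiations on re-encryption or decryption.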

V. RESULT AND DISCUSSION
In this section, the implementation of the proposed scheme is presented with different perspectives in terms of results, performances, and discussions. The following subsections describe the implementation setup, the experimental results, and the discussions.

A. EXPERIMENT SETUP
To support the theoretical analysis of the computational complexity of our scheme, we denote by $T_{exp}$ the computational cost of an exponentiation in $G$. Then, the encryption time is $3T_{exp}$, the re-encryption time is $2.5T_{exp}$, the user decryption time is $3.5T_{exp}$, and the analyst decryption time is $4T_{exp}$.
To evaluate the computational costs of the proposed scheme, the experiment is conducted using the Java Pairing-Based Cryptography (JPBC) library [44]. In this work, we employ a personal computer with a dual-core Intel Core i7-3537U CPU (2.00 GHz and 2.50 GHz) and 12 GB of RAM. Additionally, we use the curve $y^2 = x^3 + x$ over the field $F_q$ to obtain type A pairings for $q \equiv 3 \pmod 4$. The experiments employ security levels equivalent to 80-bit, 112-bit, and 128-bit AES keys, as shown in Table 1.

B. COMPUTATIONAL COST
This part illustrates the computational cost of our proposed framework. The processing costs of the proposed scheme are given according to the computational time. Figure 2 shows the computational times of our protocol phases using different security levels 80-bit, 112-bit, and 128-bit, respectively.

C. CIPHERTEXTS SIZE
The proposed scheme computes the ciphertexts with different security levels. For instance, we use the 80-bit security level. Then, the elliptic curve with $|Z_q| = 160/8$ bytes is employed. The size of $G_1$ is given as 1024 bits, and an element of $G_1$ can be represented with 65 bytes according to the related work [47]. As a result, the size of a data provider's ciphertext uploaded to the cloud is $3|G| + |Z_q| = 3 \times 65 + 160/8 = 215$ bytes. The size of the re-encrypted ciphertext transferred to the analyst is $2|G| + 2|Z_q| = 2 \times 65 + 2 \times 160/8 = 170$ bytes.
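The size arithmetic above can be reproduced with a few lines of Python. The constant and function names are hypothetical, and the per-record total is an added illustration that assumes one ciphertext per record.

```python
G_BYTES = 65            # |G|: bytes per group element, as stated in the text
ZQ_BYTES = 160 // 8     # |Z_q|: bytes per exponent at the 80-bit security level

def uploaded_ciphertext_bytes() -> int:
    """Size of one ciphertext sent by a data provider: 3|G| + |Z_q|."""
    return 3 * G_BYTES + ZQ_BYTES

def reencrypted_ciphertext_bytes() -> int:
    """Size of one re-encrypted ciphertext sent to the analyst: 2|G| + 2|Z_q|."""
    return 2 * G_BYTES + 2 * ZQ_BYTES

def total_traffic_bytes(n_records: int) -> int:
    """Upload plus forwarding traffic for n_records (assumes one ciphertext per record)."""
    return n_records * (uploaded_ciphertext_bytes() + reencrypted_ciphertext_bytes())

print(uploaded_ciphertext_bytes(), reencrypted_ciphertext_bytes())  # prints 215 170
```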

D. PERFORMANCE OF THE ε-DP
The proposed scheme transforms the data encrypted under multiple public keys into noisy ciphertexts. Accordingly, the proposed scheme improves the efficiency and accuracy of data processing. The analyst runs the chosen machine learning algorithm on the noisy data set, which satisfies ε-DP, without revealing any information about the users. With the partial homomorphic hashed-ElGamal encryption scheme, the performance evaluation shows excellent results in terms of accuracy and efficiency, in line with several related works [23], [43]. The analyst can use different machine learning algorithms, such as a naive Bayes classifier, a support vector machine classifier, or a k-nearest neighbor classifier.
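As a rough illustration of the learning step, the following Python sketch trains a k-nearest neighbor classifier on a data set perturbed with Laplace noise. It works on plaintext vectors and skips the encryption layer entirely; the synthetic data, the parameter values, and the function names are assumptions made for illustration and do not reproduce the experiments.

```python
import random
from collections import Counter

def laplace(scale: float, rng: random.Random) -> float:
    """Sample Lap(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def add_noise(records, sensitivity: float, epsilon: float, rng: random.Random):
    """Perturb every feature with Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return [([v + laplace(scale, rng) for v in x], y) for x, y in records]

def knn_predict(train, query, k: int = 3) -> int:
    """Majority label among the k training records closest to the query."""
    by_distance = sorted(
        train, key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], query))
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Two well-separated synthetic classes standing in for the providers' records.
rng = random.Random(0)
clean = [([0.0 + i, 0.0 - i], 0) for i in range(5)]
clean += [([100.0 + i, 100.0 - i], 1) for i in range(5)]
noisy = add_noise(clean, sensitivity=1.0, epsilon=0.5, rng=rng)

label_near_origin = knn_predict(noisy, [2.0, 1.0])
label_far = knn_predict(noisy, [98.0, 101.0])
```

With classes this far apart the noise does not change the predictions; on realistic data, the choice of ε trades classification accuracy against privacy.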

E. PUBLIC VERIFIABILITY
The public verifiability is an important property, enabling a third party system to verify the integrity of the ciphertext stored in the cloud on behalf of data providers. Hence, the goal of our work is to guarantee data integrity with public verifiability and availability.
In the proposed protocol, all the components of the proposed model can publicly check the correctness or validity of every ciphertext before performing any operation, a property called public verifiability. This feature reduces the time spent working on invalid ciphertexts.

F. COMPARATIVE STUDY
In this part, we provide a comparative study between our proposed scheme and some relevant state-of-the-art schemes [23], [35], [45], [46]. Table 2 describes several points in our comparison, including the variations and contributions of these schemes. The first column lists the state-of-the-art schemes [23], [35], [45], [46]. The second column shows whether a scheme needs pre-shared information to perform the computations between the data providers and the cloud. The third column indicates whether all the actors in the system must be online during the operations.
The fourth column presents the type of homomorphic encryption used by each scheme. In this column, we denote "Fully Homomorphic Encryption" by "FHE" and "Additive Homomorphic Encryption" by "AHE". The fifth column indicates whether each scheme uses the differential privacy technique. The sixth column compares the ability to work with multiple data providers. The seventh column indicates whether the delegator in these schemes can redirect the ciphertext to another actor in the system. The eighth column shows the resistance of the schemes against the collusion attack. Finally, the last column indicates whether all the components in the system model of these schemes [23], [35], [45], [46] can publicly check the correctness of the data providers' ciphertexts.

VI. SECURITY ANALYSIS
This section presents the security analysis, covering the ciphertext scenarios and the information available to the adversary. In detail, we adopt two security definitions: (1) the adversary attacks the original ciphertext, and (2) the adversary attacks the re-encrypted ciphertext.
Theorem. The UPRE scheme is indistinguishable under chosen-ciphertext attacks (IND-PRE-CCA secure) in the random oracle model, provided that the CDH problem is hard in the group $G$.
Assume that there is an adversary $A$ that breaks the IND-PRE-CCA security of the proposed scheme. Then, there exists an algorithm $B$ that solves the CDH problem.
Here, we consider two types of adversaries. The first type attacks the original ciphertext of the users (the encrypted data uploaded to the cloud). We denote this type by $A_{or}$. The second type attacks the transformed ciphertext sent from the proxy cloud to the data analyst. We denote this type by $A_{tr}$. In both cases, algorithm $B$ answers the adversary's queries.
Setup. Assume that $B$ submits the public parameters $(q, G, g, H_1, H_2, H_3, H_4, e_0, e_1)$ to $A$. $B$ simulates the random oracles $H_1$, $H_2$, and $H_3$ with lists $\{L_{H_1}, L_{H_2}, L_{H_3}\}$, respectively, to avoid collisions and ensure consistency. $B$ also prepares two lists: $L_K$ for the public and private keys and $L_R$ for the re-encryption keys. $B$ starts by generating the original keys and the corrupted keys.
Lemma 1. Consider $A_{or}$ communicating with $B$ according to the IND-PRE-CCA game.
Phase 1. $A_{or}$ issues a series of queries, and $B$ answers $A_{or}$ as in the proposed scheme.
Challenge. $A_{or}$ challenges $B$ and returns $(pk_{u_i^*,1}, pk_{u_i^*,2})$ as well as two messages of the same length $m_0, m_1 \in \{0, 1\}^{e_0}$. Then, $B$ responds to $A_{or}$ with the challenge ciphertext $C^* = (E^*, F^*, \phi^*, \mu^*)$, which embeds the instance of the DCDH problem through $H_1(m_\varsigma, w^*) = ab$ and $H_2(g^{ab}) \oplus (m_\varsigma \| w^*) = F^*$ for $\varsigma \in \{0, 1\}$.
Phase 2. $A_{or}$ continues to issue queries as in Phase 1 with the restrictions described in the IND-PRE-CCA game. Algorithm $B$ responds to these queries as in Phase 1.
Guess. Eventually, $A_{or}$ responds with a guess $\varsigma'$ and sends it to $B$. Finally, $B$ returns the solution of the DCDH instance.
Lemma 2. Consider $A_{tr}$ communicating with $B$ according to the IND-PRE-CCA game.
Phase 1. $A_{tr}$ issues a series of queries, and $B$ answers $A_{tr}$ as in the proposed scheme.
Challenge. $A_{tr}$ challenges $B$ and returns $(pk_{u_i^*,1}, pk_{u_i^*,2})$ as well as two messages of the same length $m_0, m_1 \in \{0, 1\}^{e_0}$. Then, $B$ responds to $A_{tr}$ with the challenge ciphertext $C^* = (E^*, F^*, V^*, W^*)$, which embeds the instance of the DCDH problem through $H_1(m_\varsigma, w^*) = r^* = (b/a) \cdot t / (rk_{i \to i^*}(x_{i^*,1} H_4(pk_{i,2}) + x_{i,2}))$ and $H_2((g^{b/a})^{t / (rk_{i \to i^*}(x_{i^*,1} H_4(pk_{i,2}) + x_{i,2}))}) \oplus (m_\varsigma \| w^*) = F^*$ for $\varsigma \in \{0, 1\}$.
Phase 2. $A_{tr}$ continues to issue queries as in Phase 1 with the restrictions described in the IND-PRE-CCA game. Algorithm $B$ responds to these queries for $A_{tr}$ as in Phase 1.
Guess. $A_{tr}$ responds with a guess $\varsigma'$ and sends it to $B$. Finally, $B$ returns the solution of the DCDH instance. Hence, the UPRE scheme is IND-PRE-CCA secure in the random oracle model.
If $A$ has corrupted $D_A$ or $P_S$ to get the outsourced data, $A$ still cannot obtain the plaintext because of the IND-PRE-CCA security of our scheme. Additionally, if $A$ obtains access to some data providers and the re-encryption key does not provide access to the plaintext of the data providers, our scheme achieves ε-DP. Our scheme is secure under the DCDH assumption in the random oracle model.

VII. CONCLUSION
Cloud computing security is still considered a major issue, especially with regard to preserving the privacy of data providers from third-party systems. This paper presents an efficient privacy-preserving machine learning scheme for multiple providers with the collaboration of a third-party system. In this regard, the proposed protocol employs a unidirectional proxy re-encryption protocol to protect cloud data sets. All parties can publicly verify the encrypted data sets, which reduces the computational cost and network overhead. The proposed scheme is secure under the CDH assumption in the random oracle model. The proxy server, rather than the data providers, adds noise to the ciphertext using ε-DP to facilitate data analytics tasks. Our proposed protocol guarantees secure multi-party computation and privacy-preserving classification based on partial homomorphic encryption.
For future work, we plan to study parallel algorithms for secure multiparty computation on blockchain techniques. We also aim to investigate the use of more complex computations, such as partial differential equations, over encrypted data.
PING LI received the M.S. and Ph.D. degree in applied mathematics from Sun Yat-sen University in 2011 and 2016, respectively. Later, she worked at Guangzhou University as a postdoc from 2016 to 2018. Currently, she is a researcher with the School of Computer Science, South China Normal University. Her current research interests include cryptography, privacy-preserving and cloud computing.