Secure Privacy-Preserving Association Rule Mining With Single Cloud Server

To preserve the privacy of data uploaded to the cloud, it is widely accepted practice to encrypt the data before uploading it. This makes data analysis, and association rule mining in particular, challenging when data privacy must be protected. As one solution, homomorphic encryption allows encrypted data to be processed without decryption. In particular, the twin-cloud structure is frequently applied in privacy-preserving association rule mining schemes based on asymmetric homomorphic encryption, which contradicts the reality that most practical applications use a single cloud server. However, the existing single-cloud-server schemes suffer from privacy leakage problems. To fill this gap in the literature, in this paper, we first present a universal secure multiplication protocol with a single cloud server using the garbled circuit and additive homomorphic encryption. Based on this multiplication protocol, we construct the inner product protocol, comparison protocol, frequent itemset protocol, and the final association rule mining protocol, which is secure against privacy leakage. Finally, we give a theoretical security analysis of the proposed protocols and analyze their performance.


I. INTRODUCTION
In the information era, people's lives are recorded as data and stored on the network, including shopping preferences, traffic routes, and hobbies. This has led industry to focus on data mining techniques for the great economic value that mining results can bring. In particular, association rule mining, which searches for dependencies between items in a given dataset, is one of the directions receiving attention. For example, it can analyze customers' shopping baskets so that a supermarket can adjust the placement of commodity shelves accordingly to obtain more benefits. Association rule mining has been widely applied in many fields, such as business [1], health analysis [2], and website analysis [3]. With the increasing amount of data to be analyzed, the cloud, which provides strong computing and storage power, is widely accepted and applied [4]. As a natural extension, outsourced association rule mining allows the cloud server to perform the data mining work on the uploaded database. However, a cloud platform run by a company that is not fully trusted may lead to privacy leakage or even economic loss in practical applications. Various solutions have therefore been proposed to preserve privacy against a malicious cloud server in outsourced association rule mining, such as data perturbation [5], [6], anonymization [7], [8], differential privacy [9], [10], and cryptography-based methods [11], [12].
The associate editor coordinating the review of this manuscript and approving it for publication was Mansoor Ahmed.
Data perturbation [5], [6] adds noise to the data to create perturbed data, where the noise is large enough to mask the record's specific attribute value. When performing computing tasks, the perturbed data is iteratively reconstructed to obtain an approximate original data distribution. It is easy to lose the original data and decrease the effectiveness of the results. As a classic technique of anonymization [7], k-anonymity [8] helps the data to be invisible by constructing the equivalence class which contains at least k pieces of indistinguishable data. This method may cause privacy disclosure since the sensitive attributes in the equivalence class are not constrained. Anonymization relies too much on background knowledge assumptions, and association attacks, link attacks, and multi-source attacks may break its security. Differential privacy [9], [10] allows the cloud server to analyze the entirety of a data set without revealing individual privacy, which does not care about the background knowledge possessed by the adversary. Even if the adversary has obtained the information of all the records except a certain record, it cannot get the privacy of this record. However, when the dimension of the data is high, the noise of the differential privacy processing result will be very large, resulting in the low usability of the result.
Compared with the abovementioned techniques, cryptography-based methods can usually obtain an accurate and effective mining result at a high security level: the data is encrypted before being uploaded to the cloud, which protects data privacy against a curious or malicious cloud server. The universal system model in outsourced association rule mining contains the cloud server, the data owner, and the rule miner. Existing schemes usually design their system model around two non-colluding cloud servers, where the cloud servers are responsible for the computations generated during the mining process and the data owner and miner can be offline. Such schemes guarantee a high level of privacy for each participant in the entire mining process, but they rest on the strong assumption that the two cloud servers cannot collude with each other. In addition, one of the cloud servers in twin-cloud schemes usually owns the decryption ability, which introduces hidden security risks. The single-cloud-server setting [13], [14] avoids these problems, but most of the existing schemes suffer from privacy leakage. To fill this gap in the literature, we propose a secure privacy-preserving association rule mining scheme based on a single-cloud-server setting.

A. RELATED WORK
As a classical topic, privacy preservation of data has received a lot of attention [15]-[18]. In particular, various research has been conducted on privacy-preserving outsourced association rule mining to protect data privacy against a curious or malicious cloud server while correctly performing the association rule mining process in cloud computing. Vaidya and Kantarcioglu [19], [20] first considered the privacy-preserving requirements in data mining and proposed the idea of defining and measuring privacy. They presented privacy-preserving association rule mining schemes in vertically and horizontally partitioned databases, respectively.
Data Perturbation and Anonymization: To resist background attacks in the mining process, Wong et al. [21] proposed one-to-many item mapping, which can randomly convert transactions. The core idea of the scheme is to add noise to the transaction database. In 2010, Tai et al. [5] proposed the concept of k-support anonymity, in which k-1 fake items with similar support are added to protect the privacy of given items. Giannotti et al. [5] improved the security of this primitive to k-privacy, where a converted item set is indistinguishable from at least k-1 other item sets by adding fake transactions to the database. This scheme trades off privacy against computational efficiency.
Outsourced Privacy-Preserving Association Rule Mining: Lai et al. [22] proposed the first outsourced privacy-preserving association rule mining scheme with semantic security based on predicate encryption, but its efficiency is still undesirable in practice. Yi et al. [23] proposed a privacy-preserving association rule mining protocol based on the ElGamal homomorphic encryption algorithm. In their protocol, a data miner can outsource all computations to cloud servers, and data owners only perform some encryption operations. However, this protocol requires n servers to perform distributed homomorphic computations, which incurs a lot of communication between cloud servers.
Outsourced Privacy-Preserving Association Rule Mining With Twin-Cloud Setting: In 2017, Qiu et al. [24] proposed a twin-cloud structure, which is currently a commonly used system model framework. They constructed a scheme for frequent itemset mining with three privacy levels using Paillier [25] and [26] homomorphic encryption. However, none of the three privacy levels can guarantee the security of mining results. Ma et al. [27] improved the mining efficiency of this work [24] and designed a block algorithm for frequent itemset mining, where encrypted transactions can be divided into multiple blocks based on the sparse attributes of the transaction database. Kim et al. [28] proposed a privacy-preserving association rule mining scheme on the twin-cloud structure without adding fake transactions. Besides, Liu et al. [29] proposed a privacy-preserving association rule mining scheme for outsourced cloud data in a multi-key environment based on the BCP homomorphic encryption [30], [31]. Pang and Wang [32] presented a privacy-preserving association rule mining scheme in a multi-key environment under the twin-cloud setting.
Outsourced Privacy-Preserving Association Rule Mining With Single-Cloud-Server Setting: In 2016, Li et al. [13] constructed a symmetric fully homomorphic encryption algorithm and, based on it, designed a privacy-preserving association rule mining scheme on a vertically partitioned database. The system model of this solution uses a single-cloud-server structure. Although the scheme is very efficient, the original data may be leaked. Wu et al. [14] proposed an association rule mining scheme called SecEDMO under the single-cloud-server structure. They constructed a secure symmetric homomorphic encryption algorithm. However, if two data owners collude, the sensitive information of others may be disclosed.

B. OUR CONTRIBUTION
To remove the strong assumption that two cloud servers will not collude with each other in the twin-cloud setting, and to fill the gap that existing single-cloud schemes lead to privacy disclosure, we propose a secure privacy-preserving association rule mining scheme based on a single-cloud-server setting.
Based on the garbled circuit and additive homomorphic encryption, we construct a universal secure multiplication protocol that can securely compute the product of the secret bits held by the data owner and the data miner in the single-cloud setting. To facilitate understanding, we instantiate it as a concrete construction by using the Paillier cryptosystem as the additive homomorphic encryption. Based on this protocol, we construct our inner product protocol, with which the inner product of the mining query vector and the transaction record vector can be securely computed. By additionally proposing a comparison protocol, we construct the frequent itemset mining protocol and the final association rule mining protocol. Finally, we give a theoretical security analysis proving that the proposed protocols are secure in the defined security models, and we present a practical performance analysis.

C. ORGANIZATION
The rest of this paper is organized as follows. Section 2 introduces the preliminary knowledge for designing our protocols. In Section 3, we define the system model and attack model. In Section 4, we give a concrete construction including a multiplication protocol, inner product protocol, comparison protocol, frequent itemset protocol, and association rule mining protocol. In Section 5, we give the security proof of our protocols. In Section 6, we show the performance analysis. Finally, in Section 7, we give the conclusion.

II. PRELIMINARIES
In this section, we recall the definitions of some concepts, including frequent itemset mining and association rule mining, Paillier cryptosystem, and garbled circuit.
To begin with, we list the basic notations used throughout this paper in Table 1. It is worth noting that both the transaction records $t_i \in T$, $i \in \{1, 2, \cdots, m\}$, and the mining query $q$ are represented as binary bit strings of bitlength $l$, where $l$ denotes the number of items in the database.

A. FREQUENT ITEMSET MINING
Frequent itemset mining [33] finds frequently occurring item sets, sequences, or substructures from the database and provides some support for possible decisions. It is also the basic concept of association rule mining.
We take an example to explain the definition of frequent itemset mining with the given item set $I = \{item_1, item_2, \cdots, item_l\}$ and transaction record set $T = \{t_1, t_2, \cdots, t_m\}$. For a mining query $q$ and a transaction record $t_i$, if $q \cdot t_i = \|q\|_1$ holds, the transaction record $t_i$ is said to support the mining query $q$, which means the items in the mining query $q$ are also contained in the record $t_i$, where $\|q\|_1$ denotes the number of 1s in $q$. The total number of transaction records in $T$ supporting the mining query $q$ is called the support of $q$, namely $supp(q)$. Given the support threshold $supp_{min}$, if $supp(q) \ge supp_{min}$, the mining query $q$ is defined as a frequent itemset in $T$.

B. ASSOCIATION RULE MINING
Association rule mining finds currently unknown and valuable relationships between items in the database on the basis of frequent itemset mining. The association rule between the itemsets $X$ and $Y$ is defined as $X \Rightarrow Y$, where $X \subseteq I$, $Y \subseteq I$, $X, Y \neq \emptyset$, and $X \cap Y = \emptyset$. The index that measures the strength of the association rule $X \Rightarrow Y$ is called confidence, namely $conf(X \Rightarrow Y)$, and we have
$$conf(X \Rightarrow Y) = \frac{supp(X \cup Y)}{supp(X)}.$$
It can also be used to describe the proportion of $Y$ in the presence of $X$.
In particular, taking a confidence threshold $conf_{min}$ into consideration, $X \Rightarrow Y$ is called a strong association rule if $conf(X \Rightarrow Y) \ge conf_{min}$ and $X \cup Y$ is a frequent itemset. In practical applications, industry concentrates on finding strong association rules to pursue more economic benefits. Our association rule mining protocol, therefore, is designed to find strong association rules.
We again take an example to illustrate this definition. Table 2 lists people's location records in daily life. Each location record can be represented as a binary vector, as can the mining query. If a location is included in a person's location record, it is represented by 1; otherwise, it is represented by 0. Assuming that the mining query is $q = (0, 1, 1, 0, 0)$, to determine whether the location record $t_i$ supports $q$, we perform the following.
• Compute the inner product of the two vectors $t_i$, $q$ to obtain $p_i$.
• Check whether $p_i = \|q\|_1$ holds or not.
--If $p_i = \|q\|_1$, $t_i$ supports $q$.
--Otherwise, $t_i$ does not support $q$.
Therefore, the support for mining query $q$ is calculated as
$$supp(q) = \left|\{\, i \in \{1, \cdots, m\} : q \cdot t_i = \|q\|_1 \,\}\right|,$$
where $q \cdot t_i$ denotes the inner product of $q$ and $t_i$. For a rule $X \Rightarrow Y$, to find whether it is a strong association rule, we perform the following.
• Check whether $supp(X \cup Y) \ge supp_{min}$ and $conf(X \Rightarrow Y) \ge conf_{min}$ hold or not.
--If both hold, $X \Rightarrow Y$ is a strong association rule.
--Otherwise, it is not.
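The two checks above can be sketched in plaintext Python. This is only an illustration of the definitions, not the privacy-preserving protocol: the records below are hypothetical (Table 2 is not reproduced here), and the function names are chosen for this sketch.

```python
from fractions import Fraction

def supports(t, q):
    """Record t supports query q when their inner product equals the number of 1s in q."""
    inner = sum(ti * qi for ti, qi in zip(t, q))
    return inner == sum(q)

def supp(q, records):
    """Number of records that support q."""
    return sum(1 for t in records if supports(t, q))

def conf(x, y, records):
    """conf(X => Y) = supp(X u Y) / supp(X); assumes supp(X) > 0."""
    xy = tuple(a | b for a, b in zip(x, y))
    return Fraction(supp(xy, records), supp(x, records))

def is_strong_rule(x, y, records, supp_min, conf_min):
    """X => Y is strong when X u Y is frequent and the confidence reaches conf_min."""
    xy = tuple(a | b for a, b in zip(x, y))
    return supp(xy, records) >= supp_min and conf(x, y, records) >= conf_min

# Hypothetical location records over five locations (assumed data).
records = [
    (1, 1, 1, 0, 0),
    (0, 1, 1, 0, 1),
    (1, 0, 1, 1, 0),
    (0, 1, 1, 1, 0),
]
q = (0, 1, 1, 0, 0)
X = (0, 0, 1, 0, 0)
Y = (0, 1, 0, 0, 0)

if __name__ == "__main__":
    print("supp(q) =", supp(q, records))
    print("conf(X => Y) =", conf(X, Y, records))
    print("strong rule:", is_strong_rule(X, Y, records, 3, Fraction(2, 3)))
```

Exact rationals are used for the confidence so that the threshold comparison is not affected by floating-point rounding.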

C. PAILLIER CRYPTOSYSTEM
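The Paillier homomorphic encryption [25] is one of the most practical additively homomorphic encryption algorithms. We recall its definition as follows.
• KeyGen: Select two large primes $p$, $q$ and compute $N = pq$ and $\lambda = \mathrm{lcm}(p - 1, q - 1)$. Select a generator $g \in \mathbb{Z}^*_{N^2}$. The public key is $pk = (N, g)$ and the secret key is $sk = \lambda$.
• Enc(pk, M): Taking as input the public key $pk$ and a message $M \in \mathbb{Z}_N$, the encryption algorithm randomly selects $r \in \mathbb{Z}^*_N$ and computes
$$C = g^M r^N \bmod N^2.$$
• Dec(sk, C): Taking as input a secret key $sk = \lambda$ and a ciphertext $C$, the decryption algorithm computes
$$M = \frac{L(C^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N, \quad \text{where } L(u) = \frac{u - 1}{N}.$$
In terms of the homomorphism, Paillier encryption owns several interesting features as follows.
• Given a number $s \in \mathbb{Z}_N$ and a ciphertext $C = g^M r^N \bmod N^2$, we can obtain $C^s = g^{sM} (r^s)^N \bmod N^2$, which is an encryption of $sM$.
• Given a random number $r' \in \mathbb{Z}^*_N$ and a ciphertext $C = g^M r^N \bmod N^2$, we can obtain $C \cdot r'^N = g^M (r r')^N \bmod N^2$, which is still an encryption of $M$.
• Given a ciphertext $C = g^M r^N \bmod N^2$, we can obtain $C^{N-1} = g^{(N-1)M} (r^{N-1})^N \bmod N^2$, which is an encryption of $-M$.
• Given two ciphertexts $C_1 = g^{M_1} r_1^N \bmod N^2$ and $C_2 = g^{M_2} r_2^N \bmod N^2$, we can compute $C_1 \cdot C_2 = g^{M_1 + M_2} (r_1 r_2)^N \bmod N^2$, which is an encryption of $M_1 + M_2$.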
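The homomorphic features listed above can be checked with a toy implementation. The following is a minimal sketch, assuming tiny fixed primes and the common generator choice g = N + 1; both are illustrative assumptions and far too small to be secure, and the function names are not taken from the paper.

```python
import math
import secrets

def keygen(p=1789, q=1861):
    """Toy key pair from two small primes (assumed values, insecure by design)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1  # assumed generator; a standard valid choice
    return (n, g), lam

def _rand_unit(n):
    """Draw a random r in Z_N^* by retrying until gcd(r, N) = 1."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r

def enc(pk, m):
    """C = g^M * r^N mod N^2."""
    n, g = pk
    n2 = n * n
    return pow(g, m, n2) * pow(_rand_unit(n), n, n2) % n2

def dec(pk, sk, c):
    """M = L(C^lambda mod N^2) / L(g^lambda mod N^2) mod N, with L(u) = (u - 1) / N."""
    n, g = pk
    n2 = n * n
    L = lambda u: (u - 1) // n
    mu = pow(L(pow(g, sk, n2)), -1, n)  # modular inverse (Python 3.8+)
    return L(pow(c, sk, n2)) * mu % n

def scalar_mul(pk, c, s):
    """C^s encrypts s * M."""
    return pow(c, s, pk[0] ** 2)

def rerandomize(pk, c):
    """C * r'^N still encrypts M."""
    n = pk[0]
    return c * pow(_rand_unit(n), n, n * n) % (n * n)

def negate(pk, c):
    """C^(N-1) encrypts -M mod N."""
    n = pk[0]
    return pow(c, n - 1, n * n)

def add(pk, c1, c2):
    """C1 * C2 encrypts M1 + M2."""
    return c1 * c2 % (pk[0] ** 2)

if __name__ == "__main__":
    pk, sk = keygen()
    c1, c2 = enc(pk, 5), enc(pk, 7)
    print(dec(pk, sk, add(pk, c1, c2)), dec(pk, sk, scalar_mul(pk, c1, 3)))
```

Note that $-M$ is represented modulo $N$, so decrypting the negated ciphertext yields $N - M$ for $M > 0$.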

D. GARBLED CIRCUIT
Garbled circuit [34] enables two-party secure computation, where two mistrusting parties can jointly evaluate a function over their private inputs and the function is described as a Boolean circuit.
VOLUME 9, 2021
Below, we take the scenario in Fig. 1 as an example to illustrate the workflow of the garbled circuit, where Alice, owning bit $u \in \{0, 1\}$, and Bob, owning bit $v \in \{0, 1\}$, intend to privately perform the logical AND on $u$, $v$ using the garbled circuit. The AND gate has two input lines $x$, $y$ and one output line $z$, and each line has two possible values, 0 and 1.
• Alice sets two random key-values for each line representing 0 and 1 respectively, namely $K_{0,i}$, $K_{1,i}$ for line $i \in \{x, y, z\}$.
• Alice uses the key-values to encrypt the truth table as in Table 3. She then sends $K_{u,x}$, $K_{0,y}$, $K_{1,y}$ and the disordered encrypted truth table to Bob.
• Bob applies $K_{u,x}$, $K_{v,y}$ to decrypt each row of the disordered truth table and returns the decryption results $K_w$, $w \in \{1, 2, 3, 4\}$, to Alice.
• Since only one row can be successfully decrypted, there will be only one $w \in \{1, 2, 3, 4\}$ satisfying $K_w = K_{k,z}$, $k \in \{0, 1\}$. The logical AND of $u$ and $v$ is, therefore, $k$. Since the received key-values are all randomly chosen numbers from the point of view of Bob, he will not obtain any additional information.
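The four steps above can be sketched for a single AND gate. This is a minimal illustration under stated assumptions: each table row is encrypted by XOR-ing the output key with a SHA-256 pad derived from the two input keys, which is a choice made for this sketch and not necessarily the row encryption of Table 3; all function names are hypothetical.

```python
import hashlib
import secrets

KEY_LEN = 16  # key-value length in bytes (assumed)

def _pad(kx, ky):
    """One-time pad derived from the two input key-values (assumed row encryption)."""
    return hashlib.sha256(kx + ky).digest()[:KEY_LEN]

def _xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def garble_and_gate():
    """Alice: pick two random key-values per line and build the shuffled encrypted truth table."""
    keys = {line: (secrets.token_bytes(KEY_LEN), secrets.token_bytes(KEY_LEN))
            for line in "xyz"}
    table = [_xor(_pad(keys["x"][a], keys["y"][b]), keys["z"][a & b])
             for a in (0, 1) for b in (0, 1)]
    secrets.SystemRandom().shuffle(table)
    return keys, table

def evaluate(table, kx, ky):
    """Bob: decrypt every row with the two key-values he holds."""
    pad = _pad(kx, ky)
    return [_xor(pad, row) for row in table]

def decode(keys, candidates):
    """Alice: the candidate equal to K_{k,z} reveals k = u AND v."""
    for k in (0, 1):
        if keys["z"][k] in candidates:
            return k
    raise ValueError("no row decrypted to a valid output key")

def and_via_garbled_gate(u, v):
    keys, table = garble_and_gate()
    # Alice sends K_{u,x}, both y key-values and the table; Bob selects K_{v,y}.
    candidates = evaluate(table, keys["x"][u], keys["y"][v])
    return decode(keys, candidates)

if __name__ == "__main__":
    for u in (0, 1):
        for v in (0, 1):
            print(u, v, and_via_garbled_gate(u, v))
```

Rows decrypted with the wrong key pair yield values that match an output key only with negligible probability, which is what lets Alice identify the single valid row.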

III. SYSTEM MODEL AND ATTACK MODEL

A. SYSTEM MODEL
We are committed to achieving privacy-preserving association rule mining with the single-cloud setting in the scenario where a miner submits the mining query to the cloud server that has collected a large set of encrypted transaction records from data owners.
As shown in Fig. 2, the system model consists of a cloud service provider (CSP), a data owner (DO), and a data miner (DM). Specifically, the CSP in our system is considered to be honest but curious, namely the semi-honest model.
• Data Owner (DO): The data owner uploads its encrypted transaction records to the cloud server.
• Data Miner (DM): The data miner intends to mine potentially unknown association rules of the items by outsourcing mining queries to the cloud server.
• Cloud Service Provider (CSP): The cloud server receives the mining query from data miner, performs association rule mining on the encrypted transaction records uploaded by the data owner, and sends the mining result back to the data miner.

B. ATTACK MODEL
We mainly consider external attackers and internal attackers. For external attackers, the active attack method is relatively expensive, while the passive attack cost is extremely low and does not leave traces. Therefore, the main attack method of the external attacker we consider is the passive attack. Besides, internal attackers can collect intermediate results in the computation process. The ability of attacker A is defined as follows:
• A can obtain communication data between all entities by eavesdropping on the public channel.
• A can corrupt the CSP to try to obtain privacy or mining results of DO and DM.
• A can corrupt the DM and the CSP at the same time to try to obtain the privacy of DO.
Note that the CSP is not assumed to collude with the DO since this would directly lead to the privacy disclosure of DO.
To resist the attacks defined in the above attack models, our protocol is supposed to protect the mining query and mining result privacy of DM and data privacy of DO.

IV. PROTOCOLS
In this section, we construct a privacy-preserving association rule mining protocol under the single-cloud setting by invoking four protocols, namely the multiplication protocol, inner product protocol, comparison protocol, and frequent itemset mining protocol. In particular, the proposed multiplication protocol can be regarded as a general construction based on the garbled circuit and any additive homomorphic encryption system. To facilitate understanding, we instantiate it with the Paillier cryptosystem. In the following discussion, encryption and decryption refer to the Paillier cryptosystem unless specified otherwise. Besides, to clarify how the four invoked protocols support the final association rule mining protocol, we illustrate their relations in Fig. 3.
Next, we give a rough overview of the protocols. Following the definition of association rule mining given in Sec. II, to evaluate the association rule X ⇒ Y in the Association Rule Mining Protocol, we must compute supp(X ∪ Y) and supp(X), which requires the Frequent Itemset Mining Protocol, and compare their ratio with the given threshold conf min, which requires the Comparison Protocol. Likewise, according to the definition of frequent itemset mining given in Sec. II, to compute the support of a mining query in the transactions, we need to compute the inner product of this query and each transaction record, which requires the Inner Product Protocol, and to compare the resulting support with the support threshold supp min, which requires the Comparison Protocol. As defined, the transaction records and mining queries involved in the computation are represented as binary vectors. By splitting the inner product operation on vectors into multiplications and additions, the Inner Product Protocol can invoke the Multiplication Protocol to compute the inner product value.
Suppose that the DM runs the Paillier homomorphic encryption to generate its public/secret key pair (pk, sk) and publishes the public key pk.

A. MULTIPLICATION PROTOCOL
To clarify the applied symbols, we summarize the symbols used in the multiplication protocol to Table 4.
Suppose that the DM owns bit $x \in \{0, 1\}$ and the DO owns bit $y \in \{0, 1\}$. They interact with each other and the CSP to obtain the privacy-preserving multiplication result of $x$ and $y$. Since $x \cdot y$ equals the logical AND of $x$ and $y$, we combine the AND gate in the garbled circuit with homomorphic encryption to design our multiplication protocol as follows.
• Initialization of DM. With the public key $pk$ generated using Paillier homomorphic encryption, the DM generates three ciphertexts $C_1$, $C_2$, $C_3$ of message 0 and one ciphertext $C_4$ of message 1, representing its and the DO's choices of 0 and 1 respectively, and chooses a random secret number $r_A \in \mathbb{Z}_N$ and a public cryptographic hash function $H : \{0, 1\}^* \to \{0, 1\}^N$. The DM computes $C'_i = C_i \times r_A \bmod N^2$ for $i \in \{1, 2, 3, 4\}$. It then generates the secret truth table as Table 5 and shuffles the order of the four secret values to obtain the disordered secret truth values $T_1$, $T_2$, $T_3$, $T_4$.
• DM→DO. With its bit $x \in \{0, 1\}$, the DM selects $k^x_a$, computes
• CSP→DM. The CSP randomly chooses a number $R \in \mathbb{Z}_N$, generates four ciphertexts of $R$ as $C_{R,i} = [R]$ for $i \in \{1, 2, 3, 4\}$, and computes $\tilde{T}_i = T_i \times C_{R,i}$.
• CSP. The CSP computes and reserves $W$ as the output of the multiplication protocol, which means the underlying bit of $W$ is the multiplication result of $x$ and $y$, namely $W = [x \times y]$. Note that the output of the multiplication protocol, $W$, usually will not be returned to the DM, since $W$ is just an intermediate result and returning it would enable the DM to obtain $y$ by decrypting $W$ with $sk$.

B. INNER PRODUCT PROTOCOL
The symbols used in the inner product protocol are summarized in Table 6.
By invoking the above multiplication protocol, we construct the following inner product protocol, where the DM sends the ciphertext of a mining query,
• The CSP utilizes the additive homomorphism of the Paillier ciphertext to compute $[q \cdot t_i] = \prod_{j=1}^{l} [q_j \times t_{i,j}] \bmod N^2$, where · denotes the inner product operation.
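As a worked check of the aggregation step, the product of the per-coordinate multiplication outputs can be expanded with the Paillier encryption formula from Sec. II. The names $W_{i,j}$ (the multiplication protocol output for coordinate $j$ of record $t_i$) and $r_{i,j}$ (its encryption randomness) are notation assumed for this sketch.

```latex
\begin{align*}
\prod_{j=1}^{l} W_{i,j} \bmod N^2
  &= \prod_{j=1}^{l} g^{\,q_j t_{i,j}}\, r_{i,j}^{N} \bmod N^2 \\
  &= g^{\sum_{j=1}^{l} q_j t_{i,j}} \Bigl(\prod_{j=1}^{l} r_{i,j}\Bigr)^{N} \bmod N^2 \\
  &= [\, q \cdot t_i \,].
\end{align*}
```

The result is a valid Paillier ciphertext of $q \cdot t_i$ as long as $q \cdot t_i < N$, which holds here since $q \cdot t_i \le l$.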

C. COMPARISON PROTOCOL
The symbols used in the comparison protocol are summarized in Table 7.
Since the association rule mining consists of the comparison operation, such as between supp(q) and supp min , conf (X ⇒ Y ) and conf min , we construct the comparison protocol as follows.
Suppose that the CSP owns two ciphertexts [s], [t] and intends to output the privacy-preserving comparison result of s and t without decryption.

D. FREQUENT ITEMSET MINING PROTOCOL
The symbols used in the frequent itemset mining protocol are summarized in Table 8.
Given a transaction record database $T = \{t_1, \cdots, t_m\}$ with $t_i = \{t_{i,1}, \cdots, t_{i,l}\}$, where $m$ is the total number of location records in the database and $l$ is the length of the location record, the DM generates and encrypts a mining query $q$ and a support threshold $supp_{min}$ and sends them to the CSP, which will output $supp(q)$ and determine whether $q$ is a frequent itemset.
• DM→CSP. The DM encrypts a mining query q = (q 1 , · · · , q l ) and a support threshold supp min as
The CSP computes n dummy ciphertexts encrypted with messages chosen from [0, l] and concatenates them to the inner product ciphertext to obtain
For any i ∈ [1, m + n], the CSP randomly selects d i ∈ Z N , computes
It then chooses a secret permutation ψ to shuffle the order of W to and sends W to the DM. In particular, we have
The CSP computes the ciphertext of the mining query's support degree as
--Invoke the Comparison Protocol on supp(q) and supp min to obtain the comparison result B and send it to the DM.
• DM. The DM decrypts B and obtains the comparison result.
--If the result is 1, q is a frequent itemset.
--Otherwise, q is not a frequent itemset.

E. ASSOCIATION RULE MINING PROTOCOL
We construct the association rule mining protocol by invoking the four protocols proposed above. Given a transaction record database $T = \{t_1, \cdots, t_m\}$ with $t_i = \{t_{i,1}, \cdots, t_{i,l}\}$, where $m$ is the total number of location records in the database and $l$ is the length of the location record, suppose that the DM intends to determine whether a rule $X \Rightarrow Y$ is a strong association rule or not. It interacts with the CSP and DO to perform the following steps. Note that, taking practical applications into consideration, this mining process is based on the fact that $X$ is a frequent itemset and its support degree $supp(X)$ is known.
• The CSP interacts with the DM to invoke the Frequent Itemset Mining Protocol on $X \cup Y$ and each $t_i$ for $i \in \{1, 2, \cdots, m\}$ to obtain the ciphertext of the support degree of $X \cup Y$, namely $[supp(X \cup Y)]$. The frequent itemset mining result is sent to the DM.
• DM↔CSP. The DM generates the minimum confidence threshold $conf_{min}$ and uploads it to the CSP.
• CSP. The CSP converts the comparison between $conf(X \Rightarrow Y) = supp(X \cup Y)/supp(X)$ and $conf_{min} = \alpha/\beta$ into that between $\beta \times supp(X \cup Y)$ and $\alpha \times supp(X)$, which avoids division on ciphertexts, and invokes the Comparison Protocol on these two values. The comparison result is returned to the DM.
This completes our association rule mining protocol.
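The conversion from a ratio test to a cross-multiplied integer test can be checked on plaintext values. The sketch below assumes $supp(X) > 0$ and $\beta > 0$ and uses a hypothetical function name; it only illustrates the arithmetic, not the encrypted comparison.

```python
from fractions import Fraction

def rule_meets_confidence(supp_xy, supp_x, alpha, beta):
    """conf(X => Y) >= alpha / beta, tested without division.

    Equivalent to supp_xy / supp_x >= alpha / beta when supp_x > 0 and beta > 0,
    because both sides are multiplied by the positive quantity beta * supp_x.
    """
    return beta * supp_xy >= alpha * supp_x

if __name__ == "__main__":
    # Example: supp(X u Y) = 3, supp(X) = 4, conf_min = 2/3.
    print(rule_meets_confidence(3, 4, 2, 3))
    print(Fraction(3, 4) >= Fraction(2, 3))
```

Working with the two integer products is what makes the test compatible with an additively homomorphic scheme, where multiplying an encrypted value by a known constant is available but dividing two encrypted values is not.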

V. SECURITY ANALYSIS
In this section, we give the theoretical security analysis of the proposed protocols based on the attack model defined in Section III-B. Since our protocol is constructed under the semi-honest model, we first recall the definition of security in the semi-honest model. Given a protocol $\pi$, let $a_i$ be the input of participant $P_i$ and $b_i$ be the output of $P_i$; let $REAL_i(\pi)$ be the view of $P_i$ in the real-world execution of the protocol $\pi$; and let $IDEAL_i(\pi)$ be the view of $P_i$, which is simulated from $a_i$ and $b_i$, in the ideal-world execution of the protocol $\pi$. If $REAL_i(\pi)$ is computationally indistinguishable from $IDEAL_i(\pi)$, the protocol $\pi$ is said to be secure in the semi-honest model.
Below, we give the security proof of the proposed multiplication protocol, inner product protocol, comparison protocol, frequent itemset mining protocol, and association rule mining protocol in Lemma 1, 2, 3, 4, and 5 separately.
Lemma 1: The proposed multiplication protocol is secure under the semi-honest model.
Proof: The execution view of the data miner $W$ in the real world is $REAL_W(\{\tilde{T}_i\}_{i \in \{1,2,3,4\}})$, where $\tilde{T}_i = T_i \times C_{R,i}$ and $C_{R,i} = [R]$ for $R \in \mathbb{Z}_N$ randomly chosen by the cloud service provider. We assume that $IDEAL_{S,W}(\{\hat{T}_i\}_{i \in \{1,2,3,4\}})$ is the execution view of a simulated data miner in an ideal world, where $\hat{T}_i$ is a random number in $\mathbb{Z}_{N^2}$. Since the data miner cannot extract the information of the random number $R$, $\tilde{T}_i$ and $\hat{T}_i$ are indistinguishable from its point of view. We, therefore, have that $REAL_W(\{\tilde{T}_i, h_i\}_{i \in \{1,2,3,4\}})$ and $IDEAL_{S,W}(\{\hat{T}_i, h_i\}_{i \in \{1,2,3,4\}})$ are computationally indistinguishable.
For the data owner, we represent its execution view in the real world as are random numbers defined by the data miner and therefore it is also random from the point of view of the data owner. Since $C'_i = C_i \times r_A \bmod N^2$ with random number $r_A$ chosen by the data miner and $\{C_i\}_{i \in \{1,2,3\}} = [0]$, $C_4 = [1]$ are encrypted with the Paillier encryption scheme, we have that $C_i$ and $C'_i$ are random from the point of view of the data owner for $i \in \{1, 2, 3, 4\}$. Since 2,3,4} are also random for the data owner. We assume that IDEAL S,
Proofs about the cloud service provider are similar to those of the data miner and data owner. Combining these three results, we have that the proposed multiplication protocol is secure under the semi-honest model.
Lemma 2: The proposed inner product protocol is secure under the semi-honest model.
Proof: Since the proposed inner product protocol comes directly from the constructed multiplication protocol and additive homomorphic encryption, whose security has been proved in Lemma 1 and [25] respectively, we have that our proposed inner product protocol is secure under the semi-honest model.
Lemma 3: The proposed comparison protocol is secure under the semi-honest model.
The security proof of the cloud service provider is similar to that of the data miner. Combining these two results, we have that the proposed comparison protocol is secure under the semi-honest model.
Lemma 4: The proposed frequent itemset mining protocol is secure under the semi-honest model.
Proof: Note that, the invoked inner product and comparison protocols in the proposed frequent itemset mining protocol have been proved to be secure in the above lemmas. Besides, we show the detailed security analysis of the frequent itemset mining protocol except for these two protocols as below.
According to the setting of the proposed frequent itemset mining protocol, we have the execution view of the data miner in the real world is REAL W (W , B) [1], · · · , [l]} are dummy ciphertexts computed by the cloud service provider, and d i ∈ Z N are randomly chosen by the cloud service provider, we can have that although the data miner can decrypt W to {w 1 , · · · , w m+n }, this vector is indistinguishable from a random vector from {0, 1, · · · , l} m+n . Since B ∈ {[0], [1]} is the encryption of the comparison result, the data miner cannot extract the additional information from B except the comparison result. Based on the above analysis, we have that REAL W (W , B) and IDEAL S,W (Ŵ , B ) are computationally indistinguishable. The security proof of cloud service provider is similar to that of the data miner.
Next, we discuss data confidentiality and query privacy against an external adversary A. Assume that A can eavesdrop on the transmission link between each pair of entities such that it can obtain all the uploaded data. Since the uploaded data is encrypted by the Paillier cryptosystem, whose security has been proved in [25], A cannot extract any information about the original data. Even if A can compromise all the data owners, it still cannot obtain the miner's query since all the intermediate results are ciphertexts. As long as the cloud service provider does not cooperate with the data miner, data confidentiality and query privacy will be protected. We then claim that our frequent itemset mining protocol is secure and can also preserve data confidentiality and query privacy against the external adversary.
Lemma 5: The proposed association rule mining protocol is secure under the semi-honest model.
Proof: In the proposed association rule mining protocol, the frequent itemset mining and comparison protocols are invoked, and these two protocols have been proved to be secure under the semi-honest model above. Apart from the intermediate information generated by these two protocols, the data miner in the association rule mining protocol only receives two comparison results, of supp(X ∪ Y ), supp min and of β × conf (X ⇒ Y ), conf min . The data miner cannot extract any information except the comparison results. The cloud service provider obtains an additional conf min , which does not affect the security of the protocol given the security of the comparison protocol, frequent itemset protocol, and Paillier cryptosystem.
For the external attacker A, the proof method of this protocol is similar to that of the secure frequent itemset mining protocol. We omit it here. This completes the security proof of the proposed association rule mining protocol under the semi-honest model.

VI. PERFORMANCE ANALYSIS
In this section, we analyze the proposed protocols in terms of theoretical and experimental aspects as below.

A. THEORETICAL ANALYSIS
1) ENCRYPTION OVERHEAD
The security of the Paillier encryption algorithm comes from the difficulty of factoring large integers. We assume that the modulus N = pq used in this scheme has a bitlength of 2k. Next, we conduct a theoretical analysis of the encryption's performance. The public key of the Paillier encryption is (N , g) and the secret key is λ = lcm(p − 1, q − 1). The bitlength of g is approximately equal to the bitlength of N 2 , that is, 4k. The bitlength of the public key is about 6k, and the bitlength of the secret key λ is slightly less than 2k. The size of the plaintext space is N , and the bitlength of a ciphertext is about 4k, so the ciphertext/plaintext length expansion ratio is 2.
The time cost of the key generation algorithm mainly comes from finding two large prime numbers. The encryption algorithm mainly consists of two modular exponentiations, each with computational complexity O(k^3). Similarly, the computational complexity of the decryption algorithm is O(k^3). The additive homomorphic operation is a single modular multiplication.
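To make these operation counts concrete, a toy Paillier sketch in Python follows. It is not the implementation evaluated in this paper: it assumes the common simplification g = N + 1, uses tiny primes that offer no security, and all function names are hypothetical.

```python
import math
import secrets


def keygen(p, q):
    """Toy key generation from two given primes (assumes g = N + 1)."""
    n = p * q
    g = n + 1
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # mu = (L(g^lam mod N^2))^-1 mod N, with L(u) = (u - 1) // N
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pk, m):
    """c = g^m * r^N mod N^2: two modular exponentiations."""
    n, g = pk
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pk, sk, c):
    """m = L(c^lam mod N^2) * mu mod N: one modular exponentiation."""
    n, _ = pk
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add(pk, c1, c2):
    """Homomorphic addition: one modular multiplication of ciphertexts."""
    n, _ = pk
    return (c1 * c2) % (n * n)


# Tiny primes for illustration only.
pk, sk = keygen(1009, 1013)
c_sum = add(pk, encrypt(pk, 20), encrypt(pk, 22))
print(decrypt(pk, sk, c_sum))
```

Multiplying the two ciphertexts and decrypting the product recovers the sum of the two plaintexts, which is the additive homomorphic property used throughout the protocols.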

2) COMMUNICATION OVERHEAD
Based on the above assumptions, the modulus N used in the scheme is 2k bits long. In the following, we estimate the communication overhead of the sub-protocols. The multiplication protocol requires communication among three parties: the traffic between the data miner and the data owner is about 40k bits, between the data miner and the cloud service provider about 32k bits, and between the data owner and the cloud service provider about 16k bits. The inner product protocol is built on the secure multiplication protocol, so its communication traffic depends on the length of the position information record vector and on the traffic of the secure multiplication protocol. The comparison protocol is executed by the data miner and the cloud service provider. The data miner uploads about 4k bits of data to the cloud service provider, which sends about 4k bits of data back to the data miner, so the total communication between them is about 8k bits.
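The estimates above can be turned into concrete numbers for a chosen k. The following Python sketch does this; the function names are illustrative, 1 KB is taken as 1024 bytes, and k = 512 is an assumed example value corresponding to a 1024-bit modulus.

```python
def multiplication_traffic_bits(k):
    """Per-link traffic of one secure multiplication, in bits."""
    return {
        "miner_owner": 40 * k,
        "miner_csp": 32 * k,
        "owner_csp": 16 * k,
    }


def comparison_traffic_bits(k):
    """Traffic of one comparison between data miner and CSP, in bits."""
    return {"miner_to_csp": 4 * k, "csp_to_miner": 4 * k}


def to_kb(bits):
    """Convert bits to KB, assuming 1 KB = 1024 bytes."""
    return bits / 8 / 1024


k = 512
mult = multiplication_traffic_bits(k)
comp = comparison_traffic_bits(k)
mult_kb = {link: to_kb(bits) for link, bits in mult.items()}
comp_total_kb = to_kb(sum(comp.values()))
print(mult_kb)
print(comp_total_kb)
```

For k = 512 this yields about 2.5 KB, 2 KB, and 1 KB on the three links of the multiplication protocol, and about 0.5 KB in total for the comparison protocol.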

B. EXPERIMENT ANALYSIS
The scheme was simulated on a notebook computer running Windows 10 with an Intel Core i7-8750H 2.20GHz CPU and 8GB RAM. We implemented the Paillier encryption algorithm with a 1024-bit modulus N in C++ in Visual Studio using the MIRACL library, and tested the proposed scheme with this implementation. In our experiments, we first tested the performance of the encryption scheme and the efficiency of the sub-protocols, including time cost and communication cost. Then, we conducted simulation experiments on the secure frequent itemset mining protocol on an original database provided by Roberto Bayardo. The original database contains 3196 location information records, and the record vector length is 75.
To facilitate the experiment, we selected some transaction records from the original database to form a test database. In addition, we analyze the performance of the scheme under different parameters. It can be seen from the protocol that encrypting the data accounts for the main part of the participants' time overhead, while in the communication overhead, the bit length of the ciphertext data is greater than that of the plaintext data. In general, however, the computing and communication resources of data miners and data owners are limited. Therefore, the performance of the encryption scheme is an important factor affecting the performance of this scheme.
We use 512-bit primes p and q and a 1024-bit modulus N, and run 1000 simulation tests of the Paillier encryption. As listed in Table 9, the key generation time cost is about 89.49ms, the encryption time cost is about 11.62ms, and the decryption time cost is about 18.94ms. The homomorphic operation is only one modular multiplication, and its time cost is about 0.01ms. The average length of the ciphertext is about 0.2495KB.
When examining the performance of the sub-protocols in this scheme, the randomly generated mining query vector contains 15 ones. Using the above encryption, we ran each of the three sub-protocols 200 times and list their average time costs and average communication costs in Tables 10 and 11. In the multiplication protocol, the data miner takes about 48.22ms to complete its computation and sends about 3.496KB of data to the other two parties, of which about 0.999KB goes to the cloud service provider and about 2.497KB to the data owner. The data owner takes only about 23.24µs and sends about 0.999KB to the cloud service provider. The cloud service provider takes about 48.91ms and sends about 0.999KB to the data miner. In the inner product protocol, the computation and communication costs of the parties are about 75 times those in the multiplication protocol, since the record vector length is 75. The computational time costs of the data miner, cloud service provider, and data owner are about 3.63s, 3.68s, and 1.75ms, respectively. Note that the computation cost of the data owner is much lower than that of the data miner. The communication costs of the data miner, data owner, and cloud service provider are about 262.191KB, 74.952KB, and 74.952KB, respectively. In the comparison protocol, the participants are the data miner, with a computational time cost of 30.76ms and a communication cost of 0.249KB, and the cloud service provider, with 46.43ms and 0.249KB.
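The factor of about 75 can be checked against the reported multiplication figures with a few lines of Python. This is only a consistency check under the assumption that the inner product runs one secure multiplication per vector position; the variable and function names are illustrative.

```python
VECTOR_LENGTH = 75  # record vector length of the test database

# Reported per-party costs of one secure multiplication.
mult_time_s = {"data_miner": 48.22e-3, "data_owner": 23.24e-6, "csp": 48.91e-3}
mult_sent_kb = {"data_miner": 3.496, "data_owner": 0.999, "csp": 0.999}


def scale(costs, factor):
    """Scale every party's cost by the same factor."""
    return {party: value * factor for party, value in costs.items()}


# Assumption: one multiplication per vector position.
inner_time_s = scale(mult_time_s, VECTOR_LENGTH)
inner_sent_kb = scale(mult_sent_kb, VECTOR_LENGTH)

for party in mult_time_s:
    print(party, round(inner_time_s[party], 5), "s,",
          round(inner_sent_kb[party], 3), "KB")
```

The scaled values come out at roughly 3.62s and 3.67s for the data miner and the cloud service provider, about 1.74ms for the data owner, and roughly 262KB and 75KB of traffic, close to the measured inner product costs.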
To better observe the total computation and communication costs of the frequent itemset mining protocol, we simulated the protocol with different parameters. Let m be the number of records in the database when the protocol is executed, and let n be the number of dummy ciphertexts added by the cloud service provider. After randomly generating the mining query vector, we ran the secure frequent itemset mining protocol for this query vector on the test database and measured the total computation time cost and total communication cost of the protocol in three cases: (m, n) = (1000, 500), (2000, 1000), and (3000, 1500).
As listed in Table 12, the total computation time costs for the three cases are 154.48min, 315.73min, and 461.32min, respectively, and the total communication costs are 366.23MB, 732.37MB, and 1098.70MB. Our analysis shows that the main factors affecting the total communication cost of the secure frequent itemset mining protocol are the number of data records in the database and the length of each record vector. In the association rule mining protocol, which is built on the frequent itemset mining protocol, the number of 1s in the mining query vector determines how many times the frequent itemset mining protocol is performed.
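Dividing the reported totals by the number of records makes the roughly linear growth visible. The Python sketch below does this for the three reported cases; the names are illustrative, and since n grows in step with m in these cases, the per-record figures include the share of the dummy ciphertexts.

```python
# (m, n, total time in minutes, total communication in MB), as reported.
runs = [
    (1000, 500, 154.48, 366.23),
    (2000, 1000, 315.73, 732.37),
    (3000, 1500, 461.32, 1098.70),
]


def per_record(rows):
    """Return (m, seconds per record, MB per record) for each run."""
    return [(m, minutes * 60 / m, mb / m) for m, _n, minutes, mb in rows]


costs = per_record(runs)
for m, seconds, mb in costs:
    print(m, round(seconds, 2), "s/record,", round(mb, 4), "MB/record")
```

The per-record time stays near 9.2 to 9.5 seconds and the per-record traffic near 0.366MB in all three cases, which is consistent with costs that grow linearly in the number of records.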
It follows that the total computation time and total communication cost of the secure association rule mining protocol are approximately proportional to those of the secure frequent itemset mining protocol, and grow exponentially with the number of 1s in the mining query vector. Note that, to show the performance more intuitively, we did not perform the mining job in a distributed way, which would effectively reduce the time cost.
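One way to see where an exponential factor can come from is the standard plaintext rule-generation step, in which a t-item set is split into a non-empty antecedent X and a non-empty consequent Y. The Python sketch below counts and enumerates these splits. It is an assumption made for illustration: the exact number of protocol invocations in the proposed scheme is not restated here and may differ, and the function names are hypothetical.

```python
from itertools import combinations


def candidate_rule_count(t):
    """Number of splits of a t-item set into non-empty X and Y."""
    return 2 ** t - 2 if t >= 2 else 0


def enumerate_rules(items):
    """Yield every candidate rule (X, Y) with X and Y non-empty and disjoint."""
    items = tuple(items)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


rules = list(enumerate_rules(("a", "b", "c")))
print(len(rules), candidate_rule_count(3))
print(candidate_rule_count(15))
```

Under this assumption, each additional 1 in the query vector roughly doubles the number of candidate rules to be checked.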

VII. CONCLUSION
In this paper, we propose a privacy-preserving association rule mining protocol on encrypted cloud data under a single cloud server. To achieve security in the single-cloud setting, we construct a multiplication protocol, which is the basis of the association rule mining protocol, using a garbled circuit and an additive homomorphic encryption system. Compared with state-of-the-art works, our protocol reduces the number of cloud servers and enhances the flexibility of cloud-outsourced association rule mining. In future work, we will focus on further improving the efficiency of our scheme.
ZHILI ZHANG received the Ph.D. degree in computer applied technology from the South China University of Technology, in 2006. He is currently a Professor with the School of Information Engineering, Xuchang University. His research interests include information security, the Internet of Things, and machine learning.
PU DUAN received the Ph.D. degree from Texas A&M University, in 2011. He is a 20-year veteran in cryptography, information security, and networking security. He joined Cisco after obtaining his Ph.D. degree. At Cisco, he led the research and development of new cryptographic algorithms for TLS 1.3 on Cisco firewall products. He is currently working with the Secure Collaborative Intelligence Laboratory (SCI Lab), Ant Group, as a Senior Staff Engineer, leading the team on research and implementation of privacy-preserving technologies. He has published more than 20 papers in the areas of cryptography, networking security, and system security.
BENYU ZHANG has been engaged in AI research and development for two decades. Since he went into industry, he has initiated and led the R&D of a number of core AI systems in advertising, search, and recommendation at Google and FB. He is currently leading the Secure Collaborative Intelligence Laboratory (SCI Lab), Ant Group, to build the next generation privacy-preserving data mining and AI platform. The vision is to form the ''data internet'' and enable data mining and AI applications on data from multiple parties securely, efficiently, and effectively. He has published 49 peer-reviewed papers with more than 11,000 citations, and has 70 U.S. patents approved and 84 U.S. patents pending.
ZHEN ZHAO received the M.S. and Ph.D. degrees in cryptography from Xidian University, China, in 2016 and 2020, respectively. She is currently a Lecturer with the School of Cyber Engineering, Xidian University. Her research interests include public-key cryptography, in particular security proofs, signatures, and encryption schemes.