Abstract

Decision tree models are widely used for classification tasks in data mining. However, privacy becomes a significant concern when training data contain sensitive information from different parties. This paper proposes a novel framework for secure two-party decision tree classification that enables collaborative training and evaluation without leaking sensitive data. The critical techniques employed include homomorphic encryption, function secret sharing (FSS), and a custom secure comparison protocol. Homomorphic encryption allows computations on ciphertexts, enabling parties to evaluate an encrypted decision tree model jointly. FSS splits functions into secret shares to hide sensitive intermediate values. The comparison protocol leverages FSS to securely compare attribute values to node thresholds for tree traversal, reducing overhead through efficient cryptographic techniques. Our framework divides computation between two servers holding private data. A privacy-preserving protocol lets them jointly construct a decision tree classifier without revealing their respective inputs. The servers encrypt their data and exchange function secret shares to traverse the tree and obtain the classification result. Rigorous security proofs demonstrate that the protocol protects data confidentiality in a semihonest model. Experiments on benchmark datasets confirm that the approach achieves high accuracy with reasonable computation and communication costs. The techniques minimize accuracy loss and latency compared to prior protocols. Overall, the paper delivers an efficient, modular framework for practical two-party secure decision tree evaluation that advances the capability of privacy-preserving machine learning.

1. Introduction

A machine learning process consists of two stages. In the first stage, also known as the learning phase, a model or classifier is developed using a potentially vast collection of training data. The model is then used to classify new data. In many fields, including healthcare, finance, spam filtering, intrusion detection, and remote diagnostics, machine learning (ML) classifiers are useful tools [1]. These classifiers frequently need access to highly sensitive personal information, such as medical or financial records, to perform their tasks. Investigating systems that guarantee data privacy while reaping the rewards of ML is therefore essential. On the one hand, the ML model itself could include private information. For instance, a bank that utilizes a decision tree to evaluate the creditworthiness of its clients would wish to keep the model private. On the other hand, the model may have been created using private information. So-called model inversion attacks are widely known; such attacks, which are facilitated by white-box and, even worse, mere black-box access to ML models, can jeopardize the confidentiality of the training data [24]. Therefore, publishing the ML model can conflict with the privacy of the training data.

Private decision tree evaluation can be implemented using general secure two-party computation techniques [5–7] such as secret sharing and garbled circuits. The goal is to protect the decision tree algorithm so that it can be evaluated without disclosing any private information. Frameworks such as ObliVM [8] and CBMC-GC [9] can transform plaintext programs written in high-level programming languages into oblivious programs suitable for secure computation. Their straightforward application to decision tree algorithms unquestionably improves performance compared to a manually created architecture. Nonetheless, the size of the resulting oblivious program is still proportional to the size of the tree. Generic methods are therefore typically impractical, especially when the tree is large.

2. Overview of Our Construction

We make use of homomorphic encryption and function secret sharing to implement a server-to-server secure two-party decision tree classification evaluation protocol without a trusted third party.

The basic idea is as follows. (1) Without relying on a trusted third party, two different servers share their data sources and combine their common data to perform secure decision tree training on ciphertexts. (2) In this study, we address the problem of private decision tree training on confidential data from different data sources. The characteristics of the data held by the two servers are public, but each server needs to combine the other party's data to classify a private attribute vector. The goal of the computation is to determine the classification while keeping the user input and the decision tree confidential. Once the computation is complete, only the classification models and their respective training results are shared in secret; neither party learns anything else. One could use any general secure multiparty computation protocol to solve this problem, but there are also specialized solutions that integrate multiple methodologies and leverage subject matter expertise to create efficient protocols.

In this study, we provide a 2PC-based (two-party computation) framework for decision tree training and inference that is faster and more precise. We provide several new building blocks based on the comparison protocol [10–19], support and implement secret-shared comparison in the 2PC setting, and establish a new preprocessing protocol for mask creation. The experimental findings demonstrate that our approach is more accurate and time-efficient than the majority of existing frameworks.

When several parties are involved, it is more challenging to prevent collusion among participants, and the deployment itself raises practical issues. Although the present 3PC (three-party computation) or multiparty security architectures must ensure an honest majority, this condition is difficult to meet in the real world. Collusion between parties can only be easily ruled out when the protocol is deployed on cloud servers owned by different businesses. 2PC, by contrast, can fulfill this requirement.

Our intent is to design and implement a two-server protocol that gives both parties access to a complete classification model while maintaining the privacy of their own data and an acceptable level of performance. The plan is to evaluate ciphertext trees encrypted under a server public key using fully or somewhat homomorphic encryption (FHE/SHE); consequently, the evaluating server learns no intermediate or final computation results. Existing fully homomorphic encryption schemes have high computational overhead and data transmission costs. To address this, we use an efficient data representation and algorithmic improvements. Even so, plain fully homomorphic encryption still incurs substantial overhead compared to our approach.

We summarize the key differences between our method and prior work by De Cock et al. [20] and Lu et al. [21] in Table 1. In this work, we have introduced a novel framework for secure two-party decision tree classification that provides substantial improvements over prior art. As evidenced by the table, our approach achieves higher accuracy, lower communication overhead, and reasonable computation complexity compared to De Cock et al. [20] and Lu et al. [21]. Our innovations in computing decision bits and combining secret sharing with homomorphic encryption lead to a highly performant and accurate framework with demonstrable gains. The empirical results substantiate the concrete efficiency and accuracy advantages of our proposed techniques over existing methods.

In this paper, we present secure two-party decision tree classification for training and inference over different data sources. The two-party setting is reasonable for real-world applications [22] and has been widely employed in privacy-preserving machine learning [23–26]. First, exploiting an advanced cryptographic primitive, function secret sharing (FSS) [27], we present an efficient comparison protocol for the choice of the best split. The main challenge is that directly using the general FSS scheme [28] leads to a high evaluation overhead, since it requires two FSS invocations to handle the wrap-around problem illustrated in Section 3. We address this by providing a novel theoretical analysis, which shows that with appropriate parameter settings the probability of incurring the wrap-around problem is negligible even though we invoke only one FSS evaluation. This considerably reduces the online runtime compared to the most efficient FSS scheme [28], at the cost of a slight accuracy loss in the training of trees. For communication, our protocol requires only one communication round with two ring elements. Moreover, the computational workload can be parallelized, thereby reducing computation time, even though the computational overhead may still end up being larger than that of existing protocols. By providing encrypted input and returning only encrypted output, we are able to provide a noninteractive protocol that enables clients to outsource evaluation to servers. Furthermore, it is possible to make existing systems unilaterally simulatable and secure in the semihonest model by employing techniques that may double computation and communication costs.

The scope of our research pertains to private function evaluation (PFE) [29, 30], specifically privacy-preserving decision tree evaluation [10–16], as a component of secure multiparty computation [5–7, 31–35]. In this section, we give a brief review, while a more comprehensive analysis can be found in the literature.

Brickell et al. [12] introduced the first private decision tree evaluation protocol by utilizing a novel combination of homomorphic encryption (HE) and garbled circuits (GC). The server translates the decision tree into a GC, which the client subsequently evaluates. The protocol combines homomorphic encryption with oblivious transfer (OT), which enables the client to obtain its garbled input keys. While the evaluation time is sublinear in the tree size, the technique is less practical for large trees because the size of the secure program and the communication cost grow linearly with the tree. Barni et al. [10] improved upon this technique by removing the leaf nodes from the secure program, thereby reducing computation costs by a constant factor.

Bost et al. [11] have modelled the decision tree as a multivariate polynomial in which the constants signify the classification labels and the variables signify the outcomes of the Boolean conditions at the decision nodes. To compute the values of the Boolean conditions obliviously, each threshold is compared with the corresponding attribute value encrypted under the client's public key. Subsequently, the client receives the result once the server homomorphically evaluates the polynomial.

In their study, Wu et al. [4] have employed methods that require only additive homomorphic encryption (AHE). They have used the comparison protocol from [36] and send the comparison bits, encrypted under the client's public key, to the server. Upon evaluating the tree, the server obtains the index of the matching classification label, and the outcome is conveyed to the client via an OT. Tai et al. [15] have likewise implemented the comparison methodology of [36] together with AHE. They have assigned costs b and 1 − b to the left and right edges of each node, respectively, where b is the result of the comparison at that specific node. Ultimately, the costs are summed along each tree path, and the classification label corresponds to the path whose total cost is zero.

Tueno et al. [16] have represented the tree as an array and, at each level of the tree, performed comparisons using small garbled circuits to obtain secret shares of the index of the next node along the evaluation path. They have also introduced a novel primitive called oblivious array indexing to enable selecting the next node obliviously. Using a modular approach, Kiss et al. [14] have decomposed the task into subfunctionalities such as attribute selection, integer comparison, and path evaluation. For securely computing these subfunctionalities, they have thoroughly examined the trade-offs and performance of various combinations of the underlying protocols.

De Cock et al. [20] utilized an approach similar to earlier methods by initially carrying out the comparisons. To minimize interaction, they have implemented secret sharing-based secure multiparty computation (SMC) and commodity-based cryptography [37] in the information-theoretic model; in contrast, ours and the other protocols discussed here are secure in the computational model. Lu et al. [21] have proposed XCMP, a noninteractive comparison protocol based on the BGV homomorphic scheme [38] with a polynomial encoding of the inputs. They have further employed the output-expressive XCMP to instantiate the private decision tree protocol suggested by Tai et al., thereby maintaining additive homomorphism.

The resulting decision tree protocol is efficient and noninteractive due to its small multiplicative depth. However, it has limitations and is not universal, as it works best with small inputs and depends on BGV-type homomorphic encryption (HE) schemes. Furthermore, unlike XCMP itself, it is not output expressive, and it cannot support SIMD operations, which makes it unsuitable for extension to more complex protocols such as random forests [39] while preserving its noninteractive nature. Additionally, the output length of the technique is exponential in the depth of the tree. In contrast, our binary instantiation has an output length that is only linear, and the integer instantiation can reduce it further by utilizing SIMD.

Like most privacy-preserving techniques based on FSS [28, 40–42], these schemes derive correlated randomness through a third party. However, the role of the third party can be jointly simulated by the two parties using either generic two-party secure protocols such as garbled circuits (GCs) [43] and GMW [44], or specific techniques [45]. Specifically, (1) one can use generic GC- or GMW-style protocols to produce the required correlated randomness during the offline phase; although versatile, these protocols necessitate a private evaluation of the underlying pseudorandom generators (PRGs) during the FSS key generation phase. (2) As a customized approach, Doerner and Shelat [45] proposed a solution that offers significant efficiency advantages, as the PRG evaluation takes place locally without the need for secure simulation; however, it is suitable only for moderate domain sizes and is difficult to extend to larger and more general settings.

4. Preliminaries

This section provides essential definitions and notation for our system, serving as background for the rest of the study. Fully or somewhat homomorphic encryption is the fundamental concept; we simplify the mathematical details to ease the reader's comprehension. In this work, we adopt the terminology of [46] to introduce several foundational concepts. The relevant literature on homomorphic encryption [12, 46–52] is recommended for further reading.

5. Decision Tree Classifier

Machine learning relies heavily on decision trees for data classification and regression. This study considers two parties that provide their distinct input variables and independently reconstruct the tree model, with data from one party kept confidential from the other. The decision tree classification’s primary objective, given an input query x, is to follow the tree model and compare the input entries to node-specific thresholds for each decision node. The left or right child node is chosen as the next node depending on the comparison result. The classification model eventually ends in a specific leaf node, giving the input query a unique classification label.

Let $x = (x_1, \dots, x_n)$ be a feature vector. A decision tree $\mathcal{T}$ implements a function over the $n$-dimensional feature space. Let the input query of the party be an $n$-dimensional positive integer vector. A Boolean function $f_v(x) = [x_{i_v} \geq t_v]$ is associated with each decision node $v$ of the tree, where $i_v$ is the index into the feature vector and $t_v$ is the threshold.

Then, the decision tree evaluation on input $x$ is given by $\mathcal{T}(x) \in \{c_1, \dots, c_m\}$, in which $m$ is the number of leaf nodes. This function starts from the root node and performs a comparison at each decision node. Let $v$ be the index of a decision node and $f$ be the function mapping the decision node index $v$ to the corresponding input index $f(v)$. Besides, let $t_v$ be the threshold value of decision node $v$. Then, if $x_{f(v)} \geq t_v$ holds for node $v$, the right child is chosen as the next decision node; otherwise, the left child node is chosen. At the end, the function outputs the classification label of the final leaf node.

A decision tree (DT) is a function $\mathcal{T}$ that maps an attribute vector $x$ to a finite set of classification labels. The tree consists of
(i) nodes that contain a test condition, called internal or decision nodes;
(ii) nodes that contain a classification label, called leaf nodes.

The decision tree model comprises a decision tree $\mathcal{T}$ and the functions outlined below:
(i) a function thr that assigns a threshold value thr($v$) to each decision node $v$;
(ii) a function att that assigns an attribute index att($v$) to each decision node $v$;
(iii) a labeling function lab that assigns a classification label lab($\ell$) to each leaf node $\ell$.

At each decision node $v$, a “greater-than” comparison is made between the assigned threshold and the selected attribute value, i.e., the decision at node $v$ is $b_v = [x_{\mathrm{att}(v)} \geq \mathrm{thr}(v)]$.

Node Indices. Given a decision tree, the index of a node can be determined by a breadth-first search (BFS) traversal, starting at the root with index $1$. When the tree is complete, a node with index $v$ has a left child with index $2v$ and a right child with index $2v + 1$.
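
To make the traversal and the BFS indexing concrete, the following minimal plaintext Python sketch evaluates a complete tree stored in index-addressed arrays; the helper name evaluate_plain and the toy model are illustrative assumptions, not part of the protocol.

# Minimal plaintext sketch of the traversal described above, assuming a complete
# binary tree whose nodes are addressed by BFS indices (root at index 1).
def evaluate_plain(thr, att, label, x, depth):
    """thr[v], att[v]: threshold and attribute index of decision node v;
    label[v]: classification label of leaf node v."""
    v = 1                                   # root has index 1
    for _ in range(depth):                  # one comparison per level
        b = 1 if x[att[v]] >= thr[v] else 0
        v = 2 * v + b                       # left child 2v, right child 2v + 1
    return label[v]

# Toy example: a depth-2 tree over two attributes.
thr = {1: 5, 2: 3, 3: 7}                    # thresholds of decision nodes 1..3
att = {1: 0, 2: 1, 3: 1}                    # attribute indices of decision nodes
label = {4: "A", 5: "B", 6: "C", 7: "D"}    # labels of leaf nodes 4..7
print(evaluate_plain(thr, att, label, x=[6, 9], depth=2))  # prints "D"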

6. Homomorphic Encryption

This article focuses on lattice-based homomorphic encryption methods that allow computations on ciphertexts by generating an encrypted output that corresponds to the result of a function applied to the plaintexts. Such encryption schemes facilitate several chained additions and multiplications on plaintexts in a homomorphic manner.

Definition 1. Consider the plaintext space defined as a ring $R_t = \mathbb{Z}_t[X]/(X^N + 1)$, where $t$ is a prime number and $N$ can be expressed as a power of two. The homomorphic encryption (HE) scheme under consideration includes the following algorithms:
(i) $\mathrm{KeyGen}(1^\lambda)$: a probabilistic algorithm that generates the private key $sk$, the public key $pk$, and the evaluation key $ek$. It employs a security parameter $\lambda$ to ensure the randomness and security of the generated keys.
(ii) $\mathrm{Enc}(pk, m)$: a probabilistic encryption algorithm that produces a ciphertext from a given message $m$ and public key $pk$. We denote the resulting encryption of $m$ by $[m]$.
(iii) $\mathrm{Eval}(ek, f, c_1, \dots, c_n)$: a probabilistic algorithm that generates a ciphertext from the evaluation key $ek$, an $n$-ary function $f$, and ciphertexts $c_1, \dots, c_n$.
(iv) $\mathrm{Dec}(sk, c)$: a deterministic algorithm that recovers a message $m$ from a given ciphertext $c$ and private key $sk$.
When using the encoding method in homomorphic encryption (HE), the ciphertext is modified by introducing “noise,” which can increase during homomorphic evaluation. While the noise level grows exponentially under multiplication, adding ciphertexts results in only a linear increase. If the noise level becomes too high, decryption of the ciphertext becomes impossible. To avoid this problem, either a refresh algorithm can be employed or the multiplicative depth of the circuit for the function $f$ can be kept sufficiently low. Refresh techniques include key-switching and bootstrapping procedures that convert a ciphertext encrypted under one key into a ciphertext of the same message encrypted under another key with a specified amount of noise [46].
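
As a purely illustrative companion to Definition 1, the sketch below instantiates the KeyGen/Enc/Eval/Dec interface with textbook Paillier over tiny, insecure parameters. It demonstrates only homomorphic addition; the lattice-based FHE/SHE schemes assumed in this paper additionally support multiplications and exhibit the noise behavior discussed above.

# Toy additively homomorphic scheme (textbook Paillier, insecure demo primes)
# used only to illustrate the four-algorithm interface of Definition 1.
import random
from math import gcd

def keygen(p=1789, q=1867):                  # KeyGen: insecure demo primes
    n = p * q
    lam = (p - 1) * (q - 1)
    mu = pow(lam, -1, n)                     # valid because g = n + 1 is used below
    return (n,), (lam, mu, n)                # (public key), (private key)

def enc(pk, m):                              # Enc: probabilistic encryption
    (n,) = pk
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m, n * n) * pow(r, n, n * n)) % (n * n)

def eval_add(pk, c1, c2):                    # Eval for the 2-ary function f(a, b) = a + b
    (n,) = pk
    return (c1 * c2) % (n * n)

def dec(sk, c):                              # Dec: deterministic decryption
    lam, mu, n = sk
    return (((pow(c, lam, n * n) - 1) // n) * mu) % n

pk, sk = keygen()
c = eval_add(pk, enc(pk, 17), enc(pk, 25))
assert dec(sk, c) == 42                      # addition carried out on ciphertexts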

7. Function Secret Sharing

Function secret sharing (FSS) works by splitting a function $f$ into two succinct function shares $f_0$ and $f_1$ such that each share reveals nothing about $f$, but when the evaluations are combined at some point $x$, the result is $f_0(x) + f_1(x) = f(x)$.

Formally, an FSS scheme is a pair of algorithms $(\mathrm{Gen}, \mathrm{Eval})$ with the following syntax. We identify two FSS constructions [28, 40] as a natural fit for our scheme: (1) the distributed point function (DPF) for the point function $f_{\alpha,\beta}$, which is $\beta$ at $x = \alpha$ and $0$ otherwise, and (2) the distributed comparison function (DCF) for the comparison function $f^{<}_{\alpha,\beta}$, which is $\beta$ for $x < \alpha$ and $0$ otherwise.

Definition 2 (function secret sharing [27, 53]). A two-party function secret sharing (FSS) scheme is a pair of algorithms $(\mathrm{Gen}, \mathrm{Eval})$ such that
(1) $\mathrm{Gen}(1^\lambda, \hat{f})$ is a probabilistic polynomial-time (PPT) key generation algorithm that, given a security parameter $1^\lambda$ and a description $\hat{f}$ of a function $f$, outputs a pair of keys $(k_0, k_1)$. We assume that $\hat{f}$ explicitly contains descriptions of the input and output groups $\mathbb{G}^{\mathrm{in}}, \mathbb{G}^{\mathrm{out}}$.
(2) $\mathrm{Eval}(b, k_b, x)$ is a polynomial-time evaluation algorithm that, given a party index $b \in \{0, 1\}$ and the key $k_b$ defining that party's share of $f$, takes an input $x \in \mathbb{G}^{\mathrm{in}}$ of the function and outputs a group element $y_b \in \mathbb{G}^{\mathrm{out}}$.
When the number of parties is omitted, it is understood to be $2$. When the parties are indexed by $b \in \{0, 1\}$, we sometimes index them by $\{1, 2\}$ instead.

Definition 3 (correctness and security [27, 53]). Let $\mathcal{F}$ be a function family and $\mathrm{Leak}$ be a function specifying the allowable leakage about $\hat{f}$. When $\mathrm{Leak}$ is omitted, it is understood to output only the input and output groups $\mathbb{G}^{\mathrm{in}}, \mathbb{G}^{\mathrm{out}}$. We say that $(\mathrm{Gen}, \mathrm{Eval})$ as in Definition 2 is an FSS scheme for $\mathcal{F}$ (with respect to leakage $\mathrm{Leak}$) if it satisfies the following requirements.
(3) Correctness: for all $\hat{f} \in \mathcal{F}$ describing $f: \mathbb{G}^{\mathrm{in}} \rightarrow \mathbb{G}^{\mathrm{out}}$ and every $x \in \mathbb{G}^{\mathrm{in}}$, if $(k_0, k_1) \leftarrow \mathrm{Gen}(1^\lambda, \hat{f})$, then $\Pr[\mathrm{Eval}(0, k_0, x) + \mathrm{Eval}(1, k_1, x) = f(x)] = 1$.
(4) Security: for each $b \in \{0, 1\}$ there is a PPT algorithm $\mathrm{Sim}_b$ (simulator), such that for every sequence of polynomial-size function descriptions from $\mathcal{F}$ and every polynomial-size input sequence for $\mathrm{Sim}_b$, the outputs of the following experiments $\mathrm{Real}_b$ and $\mathrm{Ideal}_b$ are computationally indistinguishable:
(i) $\mathrm{Real}_b$: $(k_0, k_1) \leftarrow \mathrm{Gen}(1^\lambda, \hat{f})$; output $k_b$.
(ii) $\mathrm{Ideal}_b$: output $\mathrm{Sim}_b(1^\lambda, \mathrm{Leak}(\hat{f}))$.
A central building block for many of our constructions is an FSS scheme for a special interval function, referred to as a distributed comparison function (DCF). We formalize it below.

Definition 4 (distributed comparison function). A special interval function $f^{<}_{\alpha,\beta}$, also referred to as a comparison function, outputs $\beta$ if $x < \alpha$ and $0$ otherwise. We refer to an FSS scheme for comparison functions as a DCF. Analogously, the function $f^{\leq}_{\alpha,\beta}$ outputs $\beta$ if $x \leq \alpha$ and $0$ otherwise. In all of these cases, we allow the default leakage $\mathrm{Leak}(\hat{f}) = (\mathbb{G}^{\mathrm{in}}, \mathbb{G}^{\mathrm{out}})$.
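
The toy Python sketch below illustrates only the correctness property of a DCF, namely that the two evaluations add up to $f^{<}_{\alpha,\beta}(x)$ in the output group. It shares the entire truth table, so it is neither succinct nor secure, unlike the PRG-based constructions in [28, 40]; the ring size is an arbitrary demo choice.

# Toy, insecure "DCF": Gen additively shares the whole truth table of
# f^<_{alpha,beta} so that Eval(0, k0, x) + Eval(1, k1, x) = f(x) in Z_{2**ELL}.
import random

ELL = 8
MOD = 2 ** ELL

def gen(alpha, beta):
    k0 = [random.randrange(MOD) for _ in range(MOD)]
    k1 = [((beta if x < alpha else 0) - k0[x]) % MOD for x in range(MOD)]
    return k0, k1

def evaluate(b, kb, x):                      # party b evaluates its key share at x
    return kb[x] % MOD

alpha, beta = 100, 1
k0, k1 = gen(alpha, beta)
for x in (3, 99, 100, 250):
    assert (evaluate(0, k0, x) + evaluate(1, k1, x)) % MOD == (beta if x < alpha else 0)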

Theorem 5 (concrete cost of DCF). Given a PRG, there exists a DCF for $f^{<}_{\alpha,\beta}: \{0,1\}^n \rightarrow \mathbb{G}$ with per-party key size linear in the input bit length $n$ and in the security parameter $\lambda$. For such parameters, the key generation algorithm $\mathrm{Gen}$ invokes the PRG $O(n)$ times and the $\mathrm{Eval}$ algorithm likewise invokes it $O(n)$ times.

We use $\mathbf{DCF}_{n,\mathbb{G}}$ to denote the total key size, i.e., $|k_0| + |k_1|$, of a DCF key with input length $n$ and output group $\mathbb{G}$. On the other hand, we use $\mathrm{DCF}_{n,\mathbb{G}}$ (nonbold) to denote the key size per party, i.e., $|k_b|$; this captures the key size used in the $\mathrm{Eval}$ algorithm. In the rest of the paper, we use the nonbold notation to count the number of invocations/evaluations as well as the key size per evaluator $b$, $b \in \{0, 1\}$.

8. Our Construction

This section outlines a modular description of our base protocol for secure two-party decision tree classification. We first introduce the data structures used in the protocol. By employing this structured representation of data, we can ensure that each party has access to necessary information while preserving the privacy of sensitive data. This enhances the security of our protocol, making it suitable for real-world applications requiring secure data analysis.

Finally, we describe the honest-but-curious adversarial model assumed in our protocol and the cryptographic primitives, such as function secret sharing, homomorphic encryption, and secure comparison, used to prevent leakage of sensitive data to either party. Overall, the modular design of our base protocol enables us to address specific security concerns by considering data structures and access control mechanisms. In subsequent sections, we describe the key components of the protocol in more detail, including the cryptographic primitives employed and the communication protocol used to facilitate secure multiparty computation.

9. Data Structure


Definition 6. For a decision tree model, each node $v$ of the tree stores the following fields:
(i) $v$.threshold: the threshold of node $v$, denoted by thr($v$).
(ii) $v$.aIndex: the associated attribute index, denoted by att($v$).
(iii) $v$.parent: a pointer to the parent node; for the root node, this pointer is null.
(iv) $v$.left: a pointer to the left child node; for leaf nodes, this pointer is null.
(v) $v$.right: a pointer to the right child node; for leaf nodes, this pointer is null.
(vi) $v$.cmp: the comparison bit computed during tree evaluation; if $v$ is a right child, it stores $b$; otherwise, it stores $1 - b$.
(vii) $v$.cLabel: the classification label if $v$ is a leaf node; otherwise, an empty string.
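
A possible in-memory representation of this node record, written as a Python dataclass, is sketched below. The field names follow Definition 6, while the Ciphertext placeholder type and the is_leaf helper are illustrative additions of ours.

# Sketch of the node record from Definition 6.
from dataclasses import dataclass
from typing import Optional, Any

Ciphertext = Any  # stand-in for an HE ciphertext or a secret share

@dataclass
class Node:
    threshold: Optional[int] = None         # thr(v); None for leaf nodes
    aIndex: Optional[int] = None            # att(v); None for leaf nodes
    parent: Optional["Node"] = None         # null for the root
    left: Optional["Node"] = None           # null for leaf nodes
    right: Optional["Node"] = None          # null for leaf nodes
    cmp: Optional[Ciphertext] = None        # comparison bit set during evaluation
    cLabel: str = ""                        # classification label; "" for decision nodes

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None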

Definition 7 (classification function). Let the attribute vector be $x = (x_1, \dots, x_n)$ and the decision tree model be $\mathcal{M}$. We define the classification function as $\mathcal{T}(x) = \mathrm{tr}(x, \mathrm{root})$, where root is the root node and $\mathrm{tr}$ is the traverse function defined as
$\mathrm{tr}(x, v) = \mathrm{tr}(x, v.\mathrm{left})$ if $v$ is a decision node and $x_{\mathrm{att}(v)} < \mathrm{thr}(v)$;
$\mathrm{tr}(x, v) = \mathrm{tr}(x, v.\mathrm{right})$ if $v$ is a decision node and $x_{\mathrm{att}(v)} \geq \mathrm{thr}(v)$;
$\mathrm{tr}(x, v) = v.\mathrm{cLabel}$ if $v$ is a leaf node.
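
For reference, the following self-contained Python sketch implements the traverse function tr on plaintext data; the compact node record and the toy one-node tree are illustrative assumptions rather than the protocol's actual data layout.

# Plaintext reference implementation of the traverse function tr of Definition 7.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlainNode:
    threshold: int = 0
    aIndex: int = 0
    left: Optional["PlainNode"] = None
    right: Optional["PlainNode"] = None
    cLabel: str = ""

def tr(x, v):
    """Recurse left if x[att(v)] < thr(v), right otherwise; stop at a leaf."""
    if v.left is None and v.right is None:   # leaf node
        return v.cLabel
    if x[v.aIndex] >= v.threshold:
        return tr(x, v.right)
    return tr(x, v.left)

def classify(x, root):                       # T(x) = tr(x, root)
    return tr(x, root)

root = PlainNode(threshold=5, aIndex=0,
                 left=PlainNode(cLabel="low"), right=PlainNode(cLabel="high"))
assert classify([7], root) == "high"
assert classify([2], root) == "low"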

10. Building Blocks

Our protocol is built from four building blocks: initialization, computing the decision bits, aggregating the decision bits along each root-to-leaf path, and finalizing the classification result. Throughout, we write [m] for an encryption of a message m, and ⊞, ⊟, and ⊡ for homomorphic addition, subtraction, and multiplication of ciphertexts, respectively. We describe each building block below.


(1) function EVALPATHS(T)
(2)  let Q be a queue
(3)  Q.enqueue(root)
(4)  while Q is not empty do
(5)   v ← Q.dequeue()
(6)   v.left.cmp ← v.left.cmp ⊡ v.cmp
(7)   v.right.cmp ← v.right.cmp ⊡ v.cmp
(8)   if v.left is not a leaf then
(9)    Q.enqueue(v.left)
(10)  if v.right is not a leaf then
(11)   Q.enqueue(v.right)
10.1. Initialization

The initialization consists of a one-time key generation. One server, $P_0$, generates an appropriate triple $(pk, sk, ek)$ of public, private, and evaluation keys for a homomorphic encryption scheme. Then, $P_0$ sends $(pk, ek)$ to the other server, $P_1$. For each input classification, $P_0$ simply encrypts its input and sends it to $P_1$. To reduce the communication cost of sending the input of $P_0$, it can use a trusted randomizer that does not take part in the real protocol and is not allowed to collaborate with $P_1$. The trusted randomizer generates a list of random strings $r$ and sends the encrypted strings [r] to $P_1$ and the list of $r$'s to $P_0$. For an input $x$, $P_0$ then sends $x - r$ to $P_1$ in the real protocol, and $P_1$ recovers [x] by computing [r] ⊞ (x - r). This technique is similar to commodity-based cryptography, with the difference that $P_0$ can play the role of the randomizer itself and send the list of [r]'s (when the network is not too busy) before the protocol starts.

10.2. Computing Decision Bits

The server starts by computing, for each decision node $v$, the encrypted comparison bit $[b]$ with $b = [x_{\mathrm{att}(v)} \geq \mathrm{thr}(v)]$, and stores $[b]$ at the right child node and $[1 - b]$ at the left child node. This is illustrated in Algorithm 2.

(1) function EVALDNODE(T, [x])
(2)  for each decision node v in T do
(3)   [b] ← [x_att(v) ≥ thr(v)]
(4)   v.right.cmp ← [b]
(5)   v.left.cmp ← [1] ⊟ [b]
10.3. Aggregating Decision Bits

Then, for each leaf node $\ell$, the server aggregates the comparison bits along the path from the root to $\ell$. We implement this using a queue, traversing the tree in BFS order, as illustrated in Algorithm 1.

10.4. Finalizing

After aggregating the decision bits along the paths to the leaf nodes, each leaf node stores either $[0]$ or $[1]$ in its cmp field. Then, the server aggregates the decision bits at the leaves by computing for each leaf the value cmp ⊡ cLabel and summing all the results homomorphically. This is illustrated in Algorithm 3.

(1) function FINALIZE(T)
(2)  [result] ← [0]
(3)  for each leaf node ℓ in T do
(4)   [result] ← [result] ⊞ (ℓ.cmp ⊡ ℓ.cLabel)
(5)  return [result]
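
The following Python sketch traces the dataflow of Algorithms 1-3 on a toy tree. Plain integers stand in for ciphertexts, so the comparisons, products, and final sum are computed in the clear here; in the protocol these are homomorphic operations on encrypted values. Initializing the root's cmp field to one (an encryption of one in the protocol) is an assumption we make so that the product along each path is well defined.

# Dataflow of EvalDnode (Algorithm 2), EvalPaths (Algorithm 1), and Finalize
# (Algorithm 3) with plain integers standing in for ciphertexts.
from collections import deque

def eval_dnode(root, x):
    stack = [root]
    while stack:
        v = stack.pop()
        if v.get("cLabel") is not None:          # leaf node: nothing to compute
            continue
        b = 1 if x[v["aIndex"]] >= v["threshold"] else 0
        v["right"]["cmp"] = b                    # store [b] at the right child
        v["left"]["cmp"] = 1 - b                 # store [1 - b] at the left child
        stack += [v["left"], v["right"]]

def eval_paths(root):
    root["cmp"] = 1                              # assumed initialization (see above)
    q = deque([root])
    while q:
        v = q.popleft()
        for child in (v["left"], v["right"]):
            child["cmp"] = child["cmp"] * v["cmp"]   # homomorphic multiply in the protocol
            if child.get("cLabel") is None:          # only decision nodes are enqueued
                q.append(child)

def finalize(leaves):
    return sum(leaf["cmp"] * leaf["cLabel"] for leaf in leaves)   # homomorphic sum

# Depth-1 toy tree with numeric labels 10 (left) and 20 (right).
left = {"cLabel": 10, "cmp": 0}
right = {"cLabel": 20, "cmp": 0}
root = {"threshold": 5, "aIndex": 0, "left": left, "right": right, "cLabel": None}
eval_dnode(root, x=[7])
eval_paths(root)
assert finalize([left, right]) == 20             # x[0] = 7 >= 5, so the right leaf wins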

The comparison operation is used to select the split with the maximum Gini impurity gain. Algorithm 4 gives a specific comparison protocol based on FSS, which outputs additive shares of the comparison bit. Note that the comparison protocol is executed over secret-shared inputs rather than public values, which must be supported by our designed FSS scheme. The key idea is therefore to construct the FSS scheme for the offset (comparison) function $f^{<}_{r,1}$, where the mask $r$ is randomly selected from the ring and secret-shared between $P_0$ and $P_1$. In this way, $P_0$ and $P_1$ first reconstruct the masked difference $z = x - y + r$ and then evaluate $f^{<}_{r,1}(z)$, which is exactly equivalent to evaluating $[x < y]$. Note that the offset function fails if $x - y + r$ wraps around the ring. Our protocol invokes only one DCF and requires a single round of communication in the setup phase.

(1) function COMPARE(⟨x⟩, ⟨y⟩)
(2)  in the setup phase, generate the DCF keys (k_0, k_1) for the offset function f^<_{r,1} using PRFs with a shared seed
(3)  the setup samples the mask shares ⟨r⟩_0 and ⟨r⟩_1 and gives ⟨r⟩_b together with k_b to P_b
(4)  P_0 evaluates z_0 ← ⟨x⟩_0 − ⟨y⟩_0 + ⟨r⟩_0 and sends z_0 to P_1
(5)  P_1 sends z_1 ← ⟨x⟩_1 − ⟨y⟩_1 + ⟨r⟩_1 to P_0, and both parties reconstruct z ← z_0 + z_1
(6)  P_b evaluates ⟨b⟩_b ← Eval(b, k_b, z)
(7)  return ⟨b⟩_0, ⟨b⟩_1
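
To illustrate the masking idea behind Algorithm 4, the Python sketch below uses an insecure table-based stand-in for the DCF and a toy dealer for the setup. For the demo, the mask r is drawn from a restricted range so that x - y + r never wraps, whereas the paper instead argues that the wrap-around probability is negligible for a uniformly random r when the inputs are much smaller than the ring; all names and parameters here are illustrative.

# Masked comparison: reveal only z = x - y + r, then one DCF-style evaluation per
# party yields additive shares of [x < y].
import random

ELL = 16
MOD = 2 ** ELL
B = 2 ** 12                                   # assumed bound on |x - y| for the demo

def dealer_setup():
    r = random.randrange(B, MOD - B)          # restricted so x - y + r cannot wrap
    r0 = random.randrange(MOD)
    r1 = (r - r0) % MOD
    k0 = [random.randrange(MOD) for _ in range(MOD)]                # toy, non-succinct "keys"
    k1 = [((1 if z < r else 0) - k0[z]) % MOD for z in range(MOD)]
    return (r0, k0), (r1, k1)

def share(v):
    s0 = random.randrange(MOD)
    return s0, (v - s0) % MOD

def compare(x_shares, y_shares):
    (r0, k0), (r1, k1) = dealer_setup()
    z0 = (x_shares[0] - y_shares[0] + r0) % MOD   # local masking by P0
    z1 = (x_shares[1] - y_shares[1] + r1) % MOD   # local masking by P1
    z = (z0 + z1) % MOD                           # the only opened value
    return k0[z], k1[z]                           # additive shares of [x < y]

x, y = 1200, 3400
b0, b1 = compare(share(x), share(y))
assert (b0 + b1) % MOD == int(x < y)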

11. Secure Two-Party Decision Tree Classification

In this section, we present our secure two-party decision tree classification protocol, which caters to scenarios where two counterpart parties provide private information and both parties can own a decision tree model (see Figure 1). The proposed protocol ensures that both servers learn the classification result but nothing beyond their own individual inputs. Our protocol is designed to be secure against “honest-but-curious” parties.

The parties first establish the necessary functionality for the tree array and the feature array and perform the required setup work for function secret sharing. Additionally, $P_0$ shares the root node with $P_1$, which serves as the starting evaluation node. Our secure two-party decision tree classification protocol provides an effective solution for secure data analysis while maintaining data privacy. It enables both parties to access the classification results without compromising sensitive information, thereby ensuring transparency in the data analysis process.

In each iteration, the evaluation process starts with a call to the FSS functionality on the current node, which initiates the sharing of the node's threshold and attribute value among the parties. The parties then perform a secure comparison between the shared attribute value and the shared threshold, the purpose of which is to obtain a comparison result, denoted as $b$. Subsequently, the MUX computation determines which child becomes the next evaluation node. The computation of this decision applies the XOR operator to the shares of the two child indices: the next index is obtained as left ⊕ (b · (left ⊕ right)). In the case where $b$ equals $0$, the next node becomes the left child; conversely, if $b$ equals $1$, the next node becomes the right child. Thus, determining the next evaluation node during each iteration depends on the outcome of the secure comparison and the MUX computation.
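
The selection rule computed by the MUX step can be summarized by the small Python sketch below; in the protocol, it is evaluated on secret shares of the comparison bit and of the child indices, so neither party learns which branch is taken.

# Next-node selection: b = 0 keeps the left child, b = 1 switches to the right child.
def mux(b, left, right):
    return left ^ (b * (left ^ right))

assert mux(0, left=2, right=3) == 2
assert mux(1, left=2, right=3) == 3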

From the shared index of the next node, the parties invoke the FSS functionality to share the corresponding node entry; the shared threshold, attribute index, child indices, and label are then updated correspondingly. Besides, the label share is stored in the variable in which the final classification label will reside. Note that we encode a self-loop for each leaf node, and thus this variable will always hold the correct classification label once the evaluation reaches a leaf node. Moreover, it is easy to hide the depth information: $P_0$ and $P_1$ simply run a fixed number of evaluation iterations. In the end, $P_1$ sends its share to $P_0$, and $P_0$ recovers the classification result. The protocol thus runs in a fixed number of iterations, each consisting of one secure comparison and one MUX operation, where the comparison protocol is executed over the shared attribute values and thresholds and the function secret sharing protocol over the shared node indices.

Lemma 8 (correctness). Assuming the evaluation correctness of the underlying FSS scheme and of our comparison algorithm, the above construction is a dual-server private decision tree classification protocol that outputs the correct classification label.

Proof. By the correctness of the FSS scheme and of the comparison protocol, the reconstructed comparison bit at the current node $v$ equals $1$ if and only if $x_{\mathrm{att}(v)} \geq \mathrm{thr}(v)$; hence, each iteration selects the same child as the plaintext traversal does, and the recovered result equals $\mathcal{T}(x)$.

Lemma 9 (security). The algorithm securely realizes the ideal decision tree classification functionality in the semihonest model, assuming the existence of secure protocols for the FSS procedures.

Proof. We prove the security for each corruption case. $P_0$ receives no private information of $P_1$, and hence the protocol is trivially secure against an honest-but-curious $P_0$. We now prove security against corruption of $P_1$, which receives the masked value $z$ and the FSS key $k_1$. Given the security of the PRFs, the mask $r$ is a random value unknown to $P_1$; thus, the distribution of $z$ is uniformly random from the view of $P_1$. Then, given the security of the FSS scheme, the information learned by $P_1$ can be perfectly simulated. Hence, our protocol is secure against honest-but-curious corruption of $P_1$.

12. Security Analysis

We now present a formal security proof of our secure two-party decision tree classification protocol described in this section. We show that the protocol satisfies computational semihonest security by proving the existence of probabilistic polynomial-time (PPT) simulators whose output is computationally indistinguishable from the real view of each party during the protocol execution.

Let $\mathrm{view}_0^{\pi}(x_0, x_1)$ denote the view of party $P_0$ during an execution of the protocol $\pi$ on inputs $(x_0, x_1)$, consisting of its input $x_0$, its internal random coins, and the received messages. Similarly, $\mathrm{view}_1^{\pi}(x_0, x_1)$ denotes the view of party $P_1$. We construct the following PPT simulators $\mathrm{Sim}_0$ and $\mathrm{Sim}_1$ (Algorithms 5 and 6).

(1) Generate random coins for the PRF evaluation
(2) Generate the SETUP message to the FSS functionality
(3) Run the protocol execution locally on input $x_0$ and an arbitrary input $\tilde{x}_1$, outputting the transcript
(4) Use the generated coins as randomness and the arbitrary value $\tilde{x}_1$ as the other party's input
(5) Output the view consisting of $x_0$, the random coins, and the transcript

We now show that the output of each simulator is computationally indistinguishable from the real view.

(1) Generate random coins for the PRF evaluation
(2) Generate the SETUP message to the FSS functionality
(3) Run the protocol execution locally on input $x_1$ and an arbitrary input $\tilde{x}_0$, outputting the transcript
(4) Use the generated coins as randomness and the arbitrary value $\tilde{x}_0$ as the other party's input
(5) Output the view consisting of $x_1$, the random coins, and the transcript

Theorem 10. The secure two-party decision tree classification protocol satisfies computational semihonest security. Formally, there exist PPT simulators $\mathrm{Sim}_0$ and $\mathrm{Sim}_1$ such that
$\mathrm{Sim}_0(x_0, c) \overset{c}{\equiv} \mathrm{view}_0^{\pi}(x_0, x_1)$ and $\mathrm{Sim}_1(x_1, c) \overset{c}{\equiv} \mathrm{view}_1^{\pi}(x_0, x_1)$,
where $c$ denotes the classification output and $\overset{c}{\equiv}$ denotes computational indistinguishability.

The SETUP message and the PRF randomness generated by $\mathrm{Sim}_1$ are identically distributed to those in the real protocol execution. The simulated transcript consists of
(1) encrypted inputs computed on the arbitrary input;
(2) FSS keys generated independently of the inputs;
(3) encrypted outputs that encrypt the results of the simulated execution.

These are all computationally indistinguishable from the real transcript owing to the IND-CPA security of the encryption scheme and the security of the FSS scheme. Therefore, the output of $\mathrm{Sim}_1$ is computationally indistinguishable from $\mathrm{view}_1^{\pi}(x_0, x_1)$. By a similar argument, we can show the same for $\mathrm{Sim}_0$ and $\mathrm{view}_0^{\pi}(x_0, x_1)$. Since PPT simulators $\mathrm{Sim}_0$ and $\mathrm{Sim}_1$ exist whose outputs are computationally indistinguishable from the real view of each party, the protocol satisfies computational semihonest security.

This security proof demonstrates that our protocol protects the privacy of each party’s inputs and decision tree model during the secure two-party computation. By simulating the views using arbitrary inputs, we have shown that the views leak no additional information beyond the intended output. Therefore, our protocol provides provable security guarantees for practical applications requiring privacy-preserving decision tree classification.

13. Experiment

We present experimental results evaluating the performance of our secure two-party decision tree classification protocol on the MNIST dataset [54]. Specifically, we analyze the impact on accuracy of varying the number of training epochs. We also benchmark the runtime of training and inference under different model configurations. Finally, we compare our approach to prior frameworks from related works regarding efficiency and accuracy.

14. Experimental Setup

In our study, we implement the secure two-party decision tree training algorithm in Python. To facilitate communication between the parties, we utilize the communication backend of the Porthos framework in EzPC [55]. We employ a pseudorandom function (PRF) based on the AES block cipher using the OpenSSL AES library [56, 57], while the function secret sharing (FSS) schemes are implemented using the LibFSS library. The implementation is executed on two terminals with an Intel(R) Core(TM) i7-6700 CPU and 16 GB of RAM running the Ubuntu 18.04 operating system, with each terminal representing one party ($P_0$ and $P_1$). The reported communication overhead includes the communication between the two parties, while the runtime incorporates both the cost of local computation within each party and the communication latency between them. For experiments conducted over a local area network (LAN), we assume a bandwidth of 2 Gbps and a low echo latency. We use secret-sharing protocols over the ring $\mathbb{Z}_{2^\ell}$ following existing works [23, 58]. We encode the inputs using a fixed-point representation with a fixed number of fractional bits.
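
For concreteness, the sketch below shows one common way to realize such a fixed-point encoding over a power-of-two ring; the ring size ELL and the fractional precision F are illustrative values, not the exact parameters used in our experiments.

# Fixed-point encoding into the ring Z_{2**ELL} with F fractional bits.
ELL, F = 64, 13
MOD = 2 ** ELL

def encode(v: float) -> int:
    return round(v * (1 << F)) % MOD             # scale and reduce into the ring

def decode(a: int) -> float:
    if a >= MOD // 2:                            # interpret the upper half as negatives
        a -= MOD
    return a / (1 << F)

x, y = encode(3.25), encode(-1.5)
assert abs(decode((x + y) % MOD) - 1.75) < 2 ** -F   # additions of encodings stay exact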

Our implementation demonstrates the practical viability of secure two-party decision tree training for data analysis applications prioritising privacy. By leveraging commonly available resources such as Python and the Porthos framework in EzPC, we provide a simple yet effective solution that can be quickly adopted for particular data analysis tasks. In summary, our study presents an efficient and practical approach to implementing secure two-party decision tree training, providing insights into designing secure data analysis systems for real-world applications.

In this section, we give the accuracy of secure two-party decision tree classification. Our study aims to evaluate the effectiveness of secure two-party decision tree classification by conducting several epochs of training on a decision tree classifier. Specifically, we train the model for increasing numbers of epochs and record the corresponding accuracies obtained in each case.

Upon analyzing the results, we present the findings in Table 2. Due to space constraints, we report only the plaintext training results and the corresponding secure training results. The table shows that the trend in secure training accuracy closely follows that of plaintext training accuracy, with no discernible fluctuations. Furthermore, the difference between the accuracy obtained from secure training and from plaintext training is marginal. These results suggest that secure two-party decision tree classification effectively achieves high accuracy while preserving data privacy. To sum up, the experimental results demonstrate that secure two-party decision tree classification can achieve performance comparable to plaintext training, with only a negligible difference in accuracy, making it a promising method for secure data analysis.

In our study, we investigate the impact of varying the number of training samples and the maximum tree depth on the communication overhead of secure two-party decision tree classification. First, we examine the relationship between the number of training samples and the communication cost. Figure 2 shows that as the number of training samples increases, both the amount of data and the communication cost grow roughly linearly. This is because the secure two-party decision tree training phase requires more multiplication operations to compute the impurity gain for a larger number of training samples. Second, we explore the impact of varying the maximum tree depth on the communication overhead. As a well-trained tree tends towards a complete binary tree, approximately $2^d$ internal nodes are constructed for a given depth $d$. Therefore, as shown in Figure 3, the communication overhead increases with the tree depth, because deeper trees require more computation and more communication between the parties.

Our results show that the communication overhead is influenced by critical factors such as the number of training samples and the maximum tree depth. As such, it is essential to carefully consider these factors when designing secure two-party decision tree classification systems to ensure optimal performance while maintaining data privacy. In conclusion, our study highlights the need for efficient and secure methods for decision tree classification, especially in situations where data privacy is of utmost importance.

The study [59] presents an initial GPU-based implementation of function secret sharing, although further optimizations could reduce the memory footprint of the cryptographic keys toward the theoretical minimum bounds. Moreover, the small gap between LAN and WAN runtimes suggests that computational overhead, rather than communication, dominates for sufficiently large networks. Thus, optimizing GPU-centric computation offers the potential to markedly improve the overall efficiency of both inference and training.

15. Discussion

We have demonstrated the utility of function secret sharing for private training and evaluation of decision trees. Compared to related works, our protocols are highly competitive and achieve negligible failure rates for ML applications.

Numerous opportunities remain to improve performance further and expand the applicability of private ML via function secret sharing. Running experiments at 16 bit precision versus 32 bit could be another promising improvement, as major ML frameworks now support 16 bit encoding on CPU. Reducing key sizes, leveraging lower precision, and GPU optimizations can help overcome scaling bottlenecks. Testing new model architectures and data modalities will be essential to gauge general viability. Overall, there is tremendous promise in employing function secret sharing primitives to enable practical secure computation for diverse machine learning pipelines.

Moreover, a core technical challenge in applying homomorphic encryption (HE) to secure machine learning is managing the noise growth inherent in lattice-based cryptosystems. Our scheme introduces randomness into the ciphertext to ensure security when applying HE. However, each subsequent homomorphic operation (addition or multiplication) also accumulates and amplifies this noise. Excessive noise during HE evaluation inhibits correct decryption and reduces arithmetic fidelity. In the context of secure decision tree protocols, imprecise calculations may propagate errors when calculating attribute thresholds and reduce model accuracy. While multiplication noise worsens exponentially, even repeated additions can produce considerable noise.

We employ techniques including optimized circuits, modular design, and regular ciphertext refreshing to suppress noise. However, some accuracy loss may still occur for deep trees and large datasets. In future work, we will empirically quantify the potential degradation in accuracy due to noise; analyzing the impact on real data will better reveal the actual degradation. If noise-induced inaccuracies prove unacceptable, an alternative homomorphic encryption scheme with slower noise growth would be a better choice, although such schemes usually require more computational overhead. Developing protocols that remain robust under large amounts of noise is an open problem when applying HE to machine learning.

The inherent stochasticity of lattice-based HE affects model accuracy when noise accumulates across multiple operations. While this paper mitigates noise growth through multiple strategies, more empirical analysis is needed to determine the extent of this problem in practice. Managing noise growth remains an active research challenge in building efficient and accurate protocols for secure computation on encrypted data.

16. Conclusions

In conclusion, our research has introduced a two-party secure decision tree classification protocol that offers low communication and computational costs and minimal client interaction. Our approach enhances the practicality of the solution by reducing the multiplicative depth of the tree evaluation circuit and improving the efficiency of the underlying general FSS solution. Notably, we have each participant add a relatively small amount of blurring noise during threshold decryption, resulting in a considerable reduction in the overall computational cost and ciphertext size of FSS. Together, these contributions enable the application of our protocol with lower computational overhead while maintaining a high level of security.

Data Availability

No underlying data were collected or produced in this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under grant no. 12171114.