1 Introduction

Secure multi-party computation (SMC for short) protocols allow participants to compute a public function jointly while keeping their inputs private. A Private Set Intersection (PSI for short) protocol is one of the SMC protocols, and allows participants to compute the intersection of their sets without revealing them to each other. Since PSI protocols have many real-world applications (see for example [18, Sect. 2.6]), PSI protocols have extensively been studied in the last decade.

The first PSI protocol was proposed by Freedman et al. in [12] based on polynomial interpolation and an additive homomorphic encryption scheme. Later, Kisser et al. in [17] generalized the first PSI protocol in [12] to the multi-party setting. Le et al. in [19] proposed a two-party PSI protocol with an untrusted third party. There exist several techniques for constructing PSI protocols, such as public-key techniques [7, 12, 13, 17, 22], the Map-To-Prime technique [16], and the oblivious transfer technique [25].

1.1 Related work

Ion et al. in [13] proposed the Private Intersection-Sum Protocol (PI-Sum for short). The PI-Sum is a two-party PSI protocol. In the PI-Sum, two parties (say Alice and Bob) hold the private sets A and B, respectively. Moreover, Bob additionally holds a rational integer associated with each element of B. The PI-Sum allows Bob to obtain the sum of the rational integers associated with the elements of \(A \cap B\). The PI-Sum ([13, Figure 2] or Algorithm 1) is constructed from Precomputation Algorithm (Algorithm 2), Intersection Set Algorithm (Algorithm 3), and Intersection Sum Algorithm (Algorithm 4). Precomputation Algorithm (Algorithm 2) encrypts all the elements in A and B, respectively. Intersection Set Algorithm (Algorithm 3) computes \(A \cap B\) without revealing the private sets A and B to each other. Intersection Sum Algorithm (Algorithm 4) computes the sum of the rational integers associated with the elements of \(A \cap B\) without revealing the private sets A and B to each other. The PI-Sum is based on Diffie-Hellman key exchange [9]. To obtain its output, the PI-Sum needs to compute the cardinality \(\sharp (A \cap B)\) of the intersection \(A \cap B\) of the input sets A and B. Ion et al. proved in [13] that the PI-Sum is secure in the honest-but-curious model under the DDH (Decisional Diffie-Hellman) assumption. Participants in the honest-but-curious model follow the steps of the cryptographic protocol honestly, but they try to obtain the private information of the other participant. For the input sets A and B, the PI-Sum needs to compute \(2(\sharp A + \sharp B)\) exponentiations to obtain the output. When computing online ad revenue, the input set A is the set of users who saw an online ad, so one may assume that the cardinalities of A and B range from several hundred thousand to a million. Therefore, the efficiency of the PI-Sum needs to be improved for computing online ad revenue. Other PI-Sum protocols can be found in [14, 21].

1.2 Our contribution

This paper proposes efficiency improvement techniques for the PI-Sum ([13, Figure 2]). Our contributions are as follows:

  • We propose a new PI-Sum protocol, say the proposed protocol 1 (Algorithm 7). The proposed protocol 1 is constructed from Precomputation Algorithm (Algorithm 2), the proposed algorithm for computing the private set intersection (Algorithm 8), and Intersection Sum Algorithm (Algorithm 4). Thus, the only difference between the PI-Sum ([13, Figure 2] or Algorithm 1) and the proposed protocol 1 (Algorithm 7) is the phase that computes the private set intersection (Algorithm 3 and Algorithm 8). The proposed protocol 1 uses a Bloom filter for computing the cardinality of the intersection of the input sets A and B. Since a Bloom filter answers membership queries with a small number of hash computations, at the price of a small false-positive probability, the computational cost of Algorithm 8 is smaller than that of Algorithm 3. Thus, the proposed protocol 1 is more efficient than the PI-Sum (see Eqs. (9) and (10)). We assume that \(\sharp A = \sharp B\), and let us define \(T := \sharp A\) (\(=\sharp B\)). Then the computational cost of the proposed protocol 1 is about 1% smaller than that of the PI-Sum if \(T = 2^{16}\), and about 50% smaller than that of the PI-Sum if \(T=2^{22}\).

  • This paper also proposes the proposed protocol 2 (Algorithm 10) and the proposed protocol 3 (Algorithm 11) for preventing the security problem described in Sect. 5. By adding an aborting functionality to the proposed protocol 2 and the proposed protocol 3, we can avoid this security problem. The computational costs of the proposed protocol 2 and the proposed protocol 3 are also discussed (see Eqs. (14) and (15)).

The rest of this paper is organized as follows: In Sect. 2, we describe the notations in this paper, and we review the PI-Sum [13] and a Bloom filter [4]. Section 3 proposes the proposed protocol 1 (Algorithm 7), which applies a Bloom filter in order to improve the efficiency of computing the intersection \(A \cap B\), and shows the correctness of the proposed protocol 1. In Sect. 4, we estimate the computational costs of the PI-Sum and the proposed protocol 1, and compare them. We show that the proposed protocol 1 is more efficient than the PI-Sum [13, Figure 2]. In Sect. 5, we propose the proposed protocol 2 and the proposed protocol 3 for preventing the security problem mentioned above. In Sect. 6, we discuss the efficiency of the proposed protocols. Section 7 concludes the paper.

2 Mathematical preliminaries

In this section we describe the notations that will be used in this paper. For more details, we refer the reader to [20].

For a finite set X, we denote by \(\sharp X\) the cardinality of X. We use the symbol \(\mathrm{Inj}(X, Y)\) to represent the set of all the injective maps from a set X to a set Y, and use the symbol \(\mathrm{Bij}(X)\) to represent the set of all the bijective maps on X. Let \({\mathbb {Z}}\) be the rational integer ring. We denote by \({\mathbb {N}}_{+}\) the set \(\{1, 2, \ldots \}\). For a positive integer \(m \ge 2\), we set \({\mathbb {Z}}_{m} := \{0, 1, \ldots , m-1\}\) (the residue class ring modulo m). For a prime p, \({\mathbb {F}}_{p} := \{0, 1, \ldots , p-1\}\) denotes the finite field with p elements. We define the map \({int}: \{0, 1\}^{256} \rightarrow {\mathbb {Z}}_{2^{256}}\), \((x_1, \ldots , x_{256}) \mapsto \sum _{i=0}^{255} {x}_{256-i} {2}^{i}\), which is a bijective map from \(\{0, 1\}^{256}\) to \({\mathbb {Z}}_{2^{256}}\). Set \([k] := \{1, \ldots , k\}\) for each \(k \in {\mathbb {N}}_{+}\). For \(n \ge 1\), \(\{0, 1\}^n\) denotes the set of all the n-bit strings, and for \(n=0\), \(\{0, 1\}^n\) denotes the empty set \(\emptyset\). In addition, we define \(\{0, 1\}^{*} := \bigcup _{n \ge 0} \{0, 1\}^{n}\) as the set of all the finite bit strings. We let “\(\perp\)” denote an abort symbol.

Let us denote by \(({\mathcal {X}}, {P}_{{\mathcal {X}}})\) the probability space, where \({\mathcal {X}}\) is a finite set and \({P}_{{\mathcal {X}}}\) is the map from \({\mathcal {X}}\) to the real closed interval [0, 1]. For each \(x \in {\mathcal {X}}\), we have \(0 \le {P}_{{\mathcal {X}}}(x) \le 1\) and \({P}_{{\mathcal {X}}}({\mathcal {X}})=1\). A subset \({\mathcal {A}} \subseteq {\mathcal {X}}\) is said to be an event, and we denote by \(\mathrm{Pr}[{\mathcal {A}}] := \sum _{x \in {\mathcal {A}}} {P}_{{\mathcal {X}}}(x)\) the probability of the event \({\mathcal {A}}\).

For a public key encryption scheme, we denote Alice’s public key and secret key (resp. Bob’s public key and secret key) by \(pk_{A}\) and \(sk_{A}\) (resp. \(pk_{B}\) and \(sk_{B}\)). Let \(\lambda\) be a security parameter. We use the symbol \(\mathrm{HomEnc}_{pk_B}(m)\) to represent the ciphertext of a plaintext m under Bob’s public key \(pk_{B}\). We assume that an Abelian group \({\mathbb {G}}\) is parameterized by the security parameter \(\lambda\).

2.1 Private intersection-sum Protocol

Before describing the PI-Sum, we need some additional notations. Alice and Bob share the security parameter \(\lambda\). The set of user identifiers held by Alice is \(A = \left\{ u_i \right\} _{i \in [I]}\), and the set held by Bob, consisting of pairs of a user identifier and an integer associated with that user identifier, is \(B = \left\{ \left( v_{j}, t_{j}\right) \right\} _{j \in [J]}\). Here, \(u_{i \in [I]} \in U\), \({v}_{j \in [J]} \in U\), and \(t_{j \in [J]} \in {\mathbb {N}}_{+}\) hold, where U is the set of user identifiers. The private exponent held by Alice (resp. Bob) is \(k_{1}\) (resp. \(k_{2}\)). In the PI-Sum, the Paillier encryption scheme [24] is used as the additive homomorphic encryption scheme. Bob generates the pair of the public key \(pk_{B}\) and the secret key \(sk_{B}\) using the key generation algorithm of the Paillier encryption scheme. Bob shares the public key \(pk_{B}\) with Alice, and privately holds the secret key \(sk_{B}\).

The PI-Sum uses the random oracle \(\mathrm{RO}: U \rightarrow {\mathbb {G}}\) from the set of user identifiers U to the Abelian group \({\mathbb {G}}\). We will describe the details of the Abelian group \({\mathbb {G}}\) in Sect. 4. The map from the set of user identifiers U to the Abelian group \({\mathbb {G}}\) is composed of the cryptographic hash function SHA-256 and the Montgomery curve Curve25519 [1]. By [1, Theorem 2.1], we can regard each element of \({\mathbb {F}}_{p}\) as the x-coordinate of the corresponding point on Curve25519. For each \(x \in \{0, 1\}^{256}\), we can treat \(int(x)\in {\mathbb {F}}_{p}\) as the x-coordinate of a point on Curve25519. Therefore, we can represent each 256-bit string as a point on Curve25519. We compute the hash value of each user identifier using SHA-256, and represent the resulting 256-bit string as a point on Curve25519 via the map int. In this way, we obtain a map from the user identifier space U to the Abelian group \({\mathbb {G}}\). In [13], this map is assumed to be a random oracle in order to prove the security of the PI-Sum in the honest-but-curious model. For this reason, we also assume that the map from the user identifier space U to the Abelian group \({\mathbb {G}}\) is the random oracle \(\mathrm{RO}: U \rightarrow {\mathbb {G}}\).
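The first half of this map (SHA-256 followed by int) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are ours, the user identifier is assumed to be a UTF-8 string, and the final reduction modulo \(p = 2^{255} - 19\) is an assumption of this sketch that keeps the value inside \({\mathbb {F}}_{p}\); the curve arithmetic itself is not shown.

```python
import hashlib

P = 2**255 - 19  # field prime of Curve25519


def int_map(bits):
    """The map int: (x_1, ..., x_256) -> sum_{i=0}^{255} x_{256-i} * 2^i.

    bits[0] plays the role of x_1, i.e. the most significant bit.
    """
    assert len(bits) == 256
    return sum(bits[255 - i] * 2**i for i in range(256))


def sha256_bits(user_id: str):
    """SHA-256 digest of the identifier as a list of 256 bits (MSB first)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return [(byte >> (7 - j)) & 1 for byte in digest for j in range(8)]


def hash_to_x_coordinate(user_id: str) -> int:
    """Candidate x-coordinate for the identifier.

    Assumption of this sketch: the integer is reduced modulo P so that
    the result is always a valid element of F_p.
    """
    return int_map(sha256_bits(user_id)) % P
```

The map int coincides with reading the digest as a big-endian integer, so `int.from_bytes(digest, "big")` would be the idiomatic shortcut.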

We describe the PI-Sum [13] in Algorithm 1.

Algorithm 1 The PI-Sum (figure a)

One can see that the PI-Sum ([13, Figure 2] or Algorithm 1) is constructed from Precomputation Algorithm (Algorithm 2), Intersection Set Algorithm (Algorithm 3), and Intersection Sum Algorithm (Algorithm 4). Next, we explain each of these three algorithms in turn.

We describe Precomputation Algorithm in Algorithm 2. Precomputation Algorithm encrypts all the elements in A and B, respectively. Precomputation Algorithm outputs \(\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) and \(\{(\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}, \mathrm{HomEnc}_{pk_B}(t_{{\phi }_{3}(j)}))\}_{j \in [J]}\) from Alice’s input set \(A = \{u_{i}\}_{i \in [I]}\) and Bob’s input set \(B = \{(v_{j}, t_{j})\}_{j \in [J]}\).

Algorithm 2 Precomputation Algorithm (figure b)

We describe Intersection Set Algorithm in Algorithm 3. Intersection Set Algorithm computes \(A \cap B\) without revealing the private sets A and B to each other. Intersection Set Algorithm outputs \({\mathcal {J}}\) from Alice’s inputs \(\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) and \(\{\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}\}_{j \in [J]}\). The set \({\mathcal {J}}\) is defined by the following equation:

$$\begin{aligned} {\mathcal {J}} = \left\{ {\phi }_{3}(j) \in [J]\ \big |\ \mathrm{RO}\left( v_{{\phi }_{3}(j)}\right) ^{k_1k_2} \in \left\{ \mathrm{RO}\left( u_{\phi _2\left( \phi _1\left( i\right) \right) }\right) ^{k_{1}k_{2}}\right\} _{i \in [I]}\right\} . \end{aligned}$$

In this way, Alice obtains \({\mathcal {J}}\).

Algorithm 3 Intersection Set Algorithm (figure c)
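The matching step relies on the fact that exponentiation by \(k_{1}\) and by \(k_{2}\) commutes, so both parties arrive at the same value \(\mathrm{RO}(\cdot )^{k_{1}k_{2}}\) for a common identifier. The following Python sketch illustrates only this idea. It is a toy under loud assumptions: the multiplicative group modulo the prime \(2^{255}-19\) stands in for the Curve25519 group, `ro` is a plain SHA-256 reduction rather than a random oracle, the permutations \({\phi }_{1}, {\phi }_{2}, {\phi }_{3}\) are omitted, and all names are ours. It is not secure and is not the implementation of [13].

```python
import hashlib
import math
import secrets

P = 2**255 - 19  # a known prime; Z_P^* is a toy stand-in for the group G


def random_exponent() -> int:
    """Pick k with gcd(k, P - 1) = 1, so that x -> x^k is a bijection on Z_P^*."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # k in {2, ..., P - 2}
        if math.gcd(k, P - 1) == 1:
            return k


def ro(user_id: str) -> int:
    """Toy stand-in for RO: hash an identifier to an element of {1, ..., P - 1}."""
    h = int.from_bytes(hashlib.sha256(user_id.encode("utf-8")).digest(), "big")
    return h % (P - 1) + 1


def intersection_indices(alice_ids, bob_ids, k1: int, k2: int):
    """Indices j of Bob's identifiers whose double encryption matches one of Alice's."""
    # Alice's elements: first raised to k1 (Alice), then to k2 (Bob).
    alice_double = {pow(pow(ro(u), k1, P), k2, P) for u in alice_ids}
    # Bob's elements: first raised to k2 (Bob), then to k1 (Alice).
    bob_double = [pow(pow(ro(v), k2, P), k1, P) for v in bob_ids]
    return {j for j, d in enumerate(bob_double) if d in alice_double}
```

For example, with `alice_ids = ["a", "b", "c"]` and `bob_ids = ["c", "x", "a"]`, the matching indices are 0 and 2, whatever exponents `random_exponent()` returns.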

We describe Intersection Sum Algorithm in Algorithm 4. Intersection Sum Algorithm computes the sum of the rational integers associated with the elements of \(A \cap B\) without revealing the private sets A and B to each other. Intersection Sum Algorithm outputs the intersection sum \(\sum _{{\phi }_{3}(j) \in {\mathcal {J}}} t_{{\phi }_{3}(j)}\) from Alice’s input \(\{\mathrm{HomEnc}_{pk_{B}}(t_{{\phi }_{3}(j)})\}_{j \in [J]}\), the intersection set \({\mathcal {J}}\), and Bob’s input \(sk_{B}\).

Algorithm 4 Intersection Sum Algorithm (figure d)

Assumption 1 below and the additive homomorphic property of the Paillier encryption scheme are needed in steps 3–8 of Algorithm 4. More precisely, Assumption 1 guarantees that the sum (not modular addition) of the rational integers associated with the elements of \(A \cap B\) does not exceed the modulus N.

Assumption 1

Set \({\mathcal {M}} := \{m \in {\mathbb {Z}}_{N} \mid m \ll N\} \subset {\mathbb {Z}}_N\). We assume that \(t_{j \in [J]} \in {\mathcal {M}}\) for each \(j \in [J]\), and we also assume the following inequality condition:

$$\begin{aligned} \sum _{m\in {\mathcal {M}}} m < N. \end{aligned}$$

Note that by Assumption 1, we always have \(\mathrm{Sum} < N\) in step 6 of Algorithm 4. In this way, Bob obtains the intersection sum \(\sum _{{\phi }_{3}(j) \in {\mathcal {J}}}t_{{\phi }_{3}(j)}\), which Bob outputs in step 4 of the PI-Sum.
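The additive homomorphic property used here can be illustrated with a toy Paillier instance in Python: multiplying ciphertexts modulo \(N^{2}\) adds the plaintexts modulo N, so the decrypted value equals the true integer sum as long as that sum stays below N. The sketch below is only an illustration under stated assumptions: the modulus is the product of the two Mersenne primes \(2^{31}-1\) and \(2^{61}-1\), which is far too small to be secure; the generator is fixed to \(N + 1\); and the function names are ours.

```python
import math
import secrets

# Toy parameters (NOT secure): two Mersenne primes chosen only for illustration.
P_PRIME = 2**31 - 1
Q_PRIME = 2**61 - 1
N = P_PRIME * Q_PRIME
N2 = N * N
# lambda = lcm(p - 1, q - 1); mu = lambda^{-1} mod N is valid for generator g = N + 1.
LAMBDA = (P_PRIME - 1) * (Q_PRIME - 1) // math.gcd(P_PRIME - 1, Q_PRIME - 1)
MU = pow(LAMBDA, -1, N)


def enc(m: int) -> int:
    """Paillier encryption with g = N + 1: c = (1 + N)^m * r^N mod N^2."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2


def dec(c: int) -> int:
    """Paillier decryption: m = L(c^lambda mod N^2) * mu mod N, with L(x) = (x - 1) / N."""
    x = pow(c, LAMBDA, N2)
    return ((x - 1) // N * MU) % N


def add_ciphertexts(ciphertexts) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum of plaintexts."""
    total = enc(0)
    for c in ciphertexts:
        total = (total * c) % N2
    return total
```

For instance, `dec(add_ciphertexts([enc(5), enc(17), enc(20)]))` recovers 42 without any individual value being decrypted, which mirrors how Alice can combine the ciphertexts indexed by \({\mathcal {J}}\) before Bob decrypts.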

Thus, by the PI-Sum (Algorithm 1), Alice obtains \({\mathcal {J}}\) (the set of all indices of the elements \({v}_{{\phi }_{3}(j)} \in A \cap B\)), and Bob obtains the intersection sum \(\sum _{{\phi }_{3} (j) \in {\mathcal {J}}} {t}_{{\phi }_{3}(j)}\), where \({\phi }_{3} \in \mathrm{Bij}([J])\) is generated in step 16 of Algorithm 2.

2.2 Bloom filter

A Bloom filter is a probabilistic data structure that uses k hash functions \(H_{\mu \in [k]}: \{0, 1\}^{*} \rightarrow {\mathbb {Z}}_{l}\). If there exists \(x \in \{0, 1\}^{*}\) such that \(H_{\mu }(x)\ne H_{\mu ^{\prime }}(x)\) for all \(\mu , \mu ^{\prime } \in [k]\) with \(\mu \ne \mu ^{\prime }\), then we call the k hash functions independent. We use k independent hash functions \(H_{\mu \in [k]}: \{0, 1\}^{*} \rightarrow {\mathbb {Z}}_{l}\) for a Bloom filter, and we instantiate them with k appropriate cryptographic hash functions. For an element \(x \in X\), we can check whether \(x \in Y\) or not using a Bloom filter. A Bloom filter is constructed using Algorithm 5 and Algorithm 6. Algorithm 5 stores a set Y in a boolean array \(\mathrm{BF}[l]\) of length l, and Algorithm 6 checks whether \(x \in Y\) or not. We use [22, Algorithm 5] (resp. [22, Algorithm 6]) for Algorithm 5 (resp. Algorithm 6). We denote by \(\mathrm{BF}(Y)\) the boolean array \(\mathrm{BF}[l]\) of length l that stores a set Y using Algorithm 5.

Algorithm 5 Const Algorithm (figure e)
Algorithm 6 ElementCheck Algorithm (figure f)
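The construction and membership check can be sketched in Python as follows. This is a minimal sketch, not a transcription of [22, Algorithm 5] and [22, Algorithm 6]: the function names are ours, and deriving the k hash functions as SHA-256 of a 4-byte index followed by the input, reduced modulo l, is an illustrative assumption.

```python
import hashlib


def bloom_positions(x: bytes, k: int, l: int):
    """Positions H_1(x), ..., H_k(x) in Z_l.

    Assumption of this sketch: H_mu(x) = SHA-256(mu || x) mod l.
    """
    return [
        int.from_bytes(hashlib.sha256(mu.to_bytes(4, "big") + x).digest(), "big") % l
        for mu in range(1, k + 1)
    ]


def const(Y, k: int, l: int):
    """Store the set Y in a boolean array BF of length l."""
    bf = [0] * l
    for y in Y:
        for pos in bloom_positions(y, k, l):
            bf[pos] = 1
    return bf


def element_check(bf, x: bytes, k: int) -> int:
    """Return 1 if all k positions of x are set (x is probably in Y), else 0."""
    positions = bloom_positions(x, k, len(bf))
    return 1 if all(bf[pos] for pos in positions) else 0
```

An element that was stored always yields 1, so there are no false negatives; an element that was not stored yields 1 only if all of its k positions happen to be set by other elements.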

A Bloom filter may cause a false positive. If \(\sharp Y = w\) then the probability of a false positive \(\epsilon\) is given by the following equation:

$$\begin{aligned} \epsilon = \left\{ 1 - \left( 1- \frac{1}{l}\right) ^{wk}\right\} ^{k}. \end{aligned}$$
(1)

The probability of a false positive \(\epsilon\) is negligible for \(l \ge 128w\) [4, Section 2.2]. It is well known that for given l and w, the probability \(\epsilon\) is minimized when \(k = \frac{l}{w}{\log }_{e} 2\) [4, Section 2.1]. We refer the reader to [2, 4] for details.
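Equation (1) and the optimal number of hash functions are easy to evaluate numerically. The following Python sketch does so for illustrative parameters of our own choosing (\(l = 10000\), \(w = 1000\)); the function names are ours.

```python
import math


def false_positive_rate(l: int, w: int, k: int) -> float:
    """Eq. (1): epsilon = (1 - (1 - 1/l)^(w*k))^k."""
    return (1.0 - (1.0 - 1.0 / l) ** (w * k)) ** k


def optimal_k(l: int, w: int) -> float:
    """Real-valued k minimizing epsilon for given l and w: (l / w) * ln 2."""
    return (l / w) * math.log(2)


if __name__ == "__main__":
    l, w = 10_000, 1_000
    print(optimal_k(l, w))               # about 6.93, so k = 7 in practice
    for k in (3, 7, 14):
        print(k, false_positive_rate(l, w, k))
```

With these parameters the optimum is close to \(k = 7\), where \(\epsilon\) is roughly 0.008; both smaller and larger values of k, such as 3 or 14, give a higher false-positive probability.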

3 Proposed protocol 1

In this section, we propose the proposed protocol 1, which applies a Bloom filter in order to improve the efficiency of the PI-Sum (Algorithm 1). Inspired by the technique using a Bloom filter proposed by Miyaji, Nakasho, and Nishida in [22], we improve the efficiency of Intersection Set Algorithm (Algorithm 3) by using a Bloom filter.

3.1 Description of proposed protocol 1

The proposed protocol 1 is described as Algorithm 7.

Algorithm 7 The proposed protocol 1 (figure g)

The PI-Sum is constructed from Algorithm 2, Algorithm 3, and Algorithm 4, whereas the proposed protocol 1 is constructed from Algorithm 2, Algorithm 8, and Algorithm 4. Thus, the only difference between the PI-Sum (Algorithm 1) and the proposed protocol 1 (Algorithm 7) is the phase that computes the private set intersection (Algorithm 3 and Algorithm 8). Algorithm 8 (the proposed algorithm) is used for this phase in the proposed protocol 1, and is based on a Bloom filter. In the proposed protocol 1, Alice obtains \({\mathcal {J}}\) (the set of all indices of the elements \(v_{{\phi }_{3}(j)} \in A \cap B\)), and Bob obtains the intersection sum \(\sum _{{\phi }_{3}(j) \in {\mathcal {J}}} t_{{\phi }_{3}(j)}\). Let \(\mathrm{RO}: U \rightarrow {\mathbb {G}}\) be defined as in Sect. 2. Alice and Bob share a security parameter \(\lambda\), and perform the setup phase in exactly the same manner as in the PI-Sum. Since the proposed protocol 1 uses a Bloom filter, the proposed protocol 1 requires Alice to generate \(\psi \in \mathrm{Inj}({\mathbb {G}}, \{0, 1\}^{*})\) and share \(\psi\) with Bob.

3.2 Main idea of proposed protocol 1

In this section, we explain the details and the main idea of the proposed protocol 1. In the proposed protocol 1, Alice and Bob perform Precomputation Algorithm (Algorithm 2) in step 1 of the proposed protocol 1 in exactly the same manner as in the PI-Sum, and Alice obtains \(\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) and \(\{(\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}, \mathrm{HomEnc}_{pk_{B}}(t_{{\phi }_{3}(j)}))\}_{j \in [J]}\) by this procedure. Alice and Bob also perform Intersection Sum Algorithm (Algorithm 4) in step 3 of the proposed protocol 1 in exactly the same manner as in the PI-Sum, and Bob obtains the intersection sum \(\sum _{{\phi }_{3}(j) \in {\mathcal {J}}}t_{{\phi }_{3}(j)}\) by this procedure. On the other hand, Alice obtains \({\mathcal {J}}\) in step 2 of the proposed protocol 1 by performing Algorithm 8 (the proposed algorithm), whereas Intersection Set Algorithm (Algorithm 3) is used in the PI-Sum. The proposed algorithm is described as Algorithm 8. We refer the reader to Sect. 3.1 for the definition of \(\psi \in \mathrm{Inj}({\mathbb {G}}, \{0, 1\}^{*})\) in Algorithm 8.

Algorithm 8 The proposed algorithm for computing the private set intersection (figure h)

As we have seen above, the only difference between the PI-Sum and the proposed protocol 1 is the phase that computes the private set intersection (Algorithm 3 and Algorithm 8). The proposed algorithm (Algorithm 8) outputs \({\mathcal {J}}\) from Alice’s inputs \(\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) and \(\{\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}\}_{j \in [J]}\) using a Bloom filter. In Intersection Set Algorithm (Algorithm 3), Alice computes \({\mathcal {J}}\) by checking whether \({\delta }_{1} = {\delta }_{2}\) or not for each \(({\delta }_{1}, {\delta }_{2}) \in \{\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}\}_{j \in [J]} \times \{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\), so Alice obtains \({\mathcal {J}}\) exactly. However, the computational cost of Algorithm 3 is relatively large, because it requires \(I \times J\) such equality checks.

Inspired by the technique using a Bloom filter proposed by Miyaji, Nakasho, and Nishida in [22], we use a Bloom filter in the proposed algorithm (Algorithm 8) in order to improve the efficiency of Intersection Set Algorithm (Algorithm 3). More precisely, Algorithm 8 requires one execution of Const Algorithm (Algorithm 5) and J executions of ElementCheck Algorithm (Algorithm 6). Since Algorithm 5 and Algorithm 6 essentially consist of hash computations, each of which is relatively cheap, the computational cost of the proposed algorithm (Algorithm 8) is smaller than that of Algorithm 3. This is the main observation that allows us to improve the efficiency of Intersection Set Algorithm (Algorithm 3).

On the other hand, since a Bloom filter may cause a false positive, \({\phi }_{3}(j) \in [J]\) satisfying \(\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}} \not \in \{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) might be included in \({\mathcal {J}}\) in Algorithm 8. We will discuss this point in Sect. 3.3 (Lemma 1).
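The contrast between the two ways of computing \({\mathcal {J}}\) can be sketched in Python. The sketch takes the doubly encrypted values as plain integers below \(2^{256}\) and is only an illustration under stated assumptions: `psi` encodes an integer as 32 big-endian bytes (one possible injective map), the k hash functions are derived from SHA-256 with a one-byte index (so \(k \le 256\)), and all names are ours rather than those of Algorithm 8.

```python
import hashlib


def psi(g: int) -> bytes:
    """Illustrative injective encoding of a group element (an integer < 2^256)."""
    return g.to_bytes(32, "big")


def positions(x: bytes, k: int, l: int):
    """k positions in Z_l; assumes k <= 256 (one-byte hash index)."""
    return [
        int.from_bytes(hashlib.sha256(bytes([mu]) + x).digest(), "big") % l
        for mu in range(k)
    ]


def bloom_intersection(alice_enc, bob_enc, k: int, l: int):
    """Bloom-filter variant: one construction pass, then J membership checks."""
    bf = bytearray(l)
    for a in alice_enc:  # store Alice's I values
        for pos in positions(psi(a), k, l):
            bf[pos] = 1
    return {
        j
        for j, b in enumerate(bob_enc)  # one check per value of Bob
        if all(bf[pos] for pos in positions(psi(b), k, l))
    }


def exact_intersection(alice_enc, bob_enc):
    """Pairwise variant: up to I * J equality checks, never a false positive."""
    return {j for j, b in enumerate(bob_enc) if any(b == a for a in alice_enc)}
```

The Bloom-filter result always contains the exact index set; it can be strictly larger only when a false positive occurs, which happens with the probability given by Eq. (1) with \(w = I\).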

Proposition 1

Step 1 of Algorithm 7 (the proposed protocol 1) corresponds exactly to step 1 of Algorithm 1 (the PI-Sum). Steps 3–4 of Algorithm 7 (the proposed protocol 1) also correspond exactly to steps 3–4 of Algorithm 1 (the PI-Sum). On the other hand, only step 2 of Algorithm 7 (the proposed protocol 1) differs from step 2 of Algorithm 1 (the PI-Sum).

Remark 1

Let \(\mathrm{Enc}_{b}: \{0, 1\}^{b} \times \{0, 1\}^{\lambda } \rightarrow \{0, 1\}^{b}\) be a pseudorandom permutation. If \(\lambda = 128\) and \(b = 128\), then we can easily construct k independent hash functions \(H_{\mu \in [k]}: \{0, 1\}^{*} \rightarrow {\mathbb {Z}}_{2^{256}}\) by using SHA-256 \(H: \{0, 1\}^* \rightarrow \{0, 1\}^{256}\) and \(\mathrm{Enc}_{128, {\mathcal {K}}}: \{0, 1\}^{128} \times \{{\mathcal {K}}\} = \{0, 1\}^{128} \rightarrow \{0, 1\}^{128}\), where \({\mathcal {K}} \in \{0, 1\}^{128}\). For \(x \in \{0, 1\}^{*}\), we first compute \(H(x) = H_{\mathrm{left}}(x) \; \vert \vert \; H_{\mathrm{right}}(x) \in \{0, 1\}^{256}\). Here, \(H_\mathrm{left}(x) \in \{0, 1\}^{128}\) (resp. \(H_{\mathrm{right}}(x) \in \{0, 1\}^{128}\)) denotes the leftmost (resp. rightmost) 128 bits of H(x). Since \(H_{\mathrm{left}}(x) \in \{0, 1\}^{128}\) and \(H_\mathrm{right}(x) \in \{0, 1\}^{128}\), we have \(\mathrm{Enc}_{128, {\mathcal {K}}}(H_{\mathrm{left}}(x)), \mathrm{Enc}_{128, {\mathcal {K}}}(H_\mathrm{right}(x)) \in \{0, 1\}^{128}\). Thus, we can construct a hash function from \(\{0, 1\}^{*}\) to \({\mathbb {Z}}_{2^{256}}\) by taking

$$\begin{aligned} int(\mathrm{Enc}_{128, {\mathcal {K}}}(H_{\mathrm{left}}(x)) \; \vert \vert \; \mathrm{Enc}_{128, {\mathcal {K}}}(H_{\mathrm{right}}(x))) \in {\mathbb {Z}}_{2^{256}} \end{aligned}$$

for each \(x \in \{0, 1\}^{*}\). We can construct \(\mathrm{Enc}_{128, {\mathcal {K}}}\) using 128-bit AES [8], and we can construct k independent hash functions by changing the key \({\mathcal {K}} \in \{0, 1\}^{128}\).

3.3 Correctness of proposed protocol 1

In this section, we prove the correctness of the proposed protocol 1 in Theorem 1. The correctness of the PI-Sum was proven by Ion et al. in [13, Section 3]. Since the only difference between Algorithm 1 and Algorithm 7 is the phase that computes the private set intersection, it suffices to show the following lemma (Lemma 1) in order to prove the correctness of Algorithm 7.

Lemma 1

Let \({\mathcal {A}}\) denote the event that in an execution of Algorithm 7, it is determined in step 4 of Algorithm 8 that the number \({{\phi }_{3}(j)} \in [J]\) satisfying \(\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}} \not \in \{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) belongs to \({\mathcal {J}}\). Let \({\mathcal {B}}\) denote the event that Alice inputs \({\psi }(\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}})\) satisfying \(\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}} \not \in \{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) and \(\mathrm{BF}(\psi (\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})\}_{i \in [I]}))\) in Algorithm 5, and obtains 1 in step 4 of Algorithm 8. Namely, the event \({\mathcal {B}}\) is that the Bloom filter determines that the number \(\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}} \in \{\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}\}_{j \in [J]}\) belongs to \(\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) although \(\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}} \not \in \{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\). Then, we have the following inequality:

$$\begin{aligned} \mathrm{Pr}[{\mathcal {A}}] \le \mathrm{Pr}[{\mathcal {B}}] = \left\{ 1 - \left( 1- \frac{1}{l}\right) ^{Ik}\right\} ^{k}. \end{aligned}$$

In other words, the probability \(\mathrm{Pr}[{\mathcal {A}}]\) is negligible for sufficiently large l.

Proof

If the event \({\mathcal {A}}\) occurs, then the event \({\mathcal {B}}\) also occurs in step 4 of Algorithm 8. From this, we have \(\mathrm{Pr}[{\mathcal {A}}] \le \mathrm{Pr}[{\mathcal {B}}]\). A Bloom filter may cause a false positive. If the event \({\mathcal {B}}\) occurs, then a false positive occurs in Algorithm 6. Conversely, if a false positive occurs in Algorithm 6, then the event \({\mathcal {B}}\) occurs. Thus, by Eq. (1) with \(w = I\), the following equation holds:

$$\begin{aligned} \mathrm{Pr}[{\mathcal {B}}] = \left\{ 1 - \left( 1- \frac{1}{l}\right) ^{Ik}\right\} ^{k}. \end{aligned}$$
(2)

By Inequality \(\mathrm{Pr}[{\mathcal {A}}] \le \mathrm{Pr}[{\mathcal {B}}]\) and by Eq. (2), the following inequality holds:

$$\begin{aligned} \mathrm{Pr}[{\mathcal {A}}] \le \mathrm{Pr}[{\mathcal {B}}] = \left\{ 1 - \left( 1- \frac{1}{l}\right) ^{Ik}\right\} ^{k}. \end{aligned}$$
(3)

By setting \(l=128I\) and \(k = 128{\log }_{e} 2 \approx 88.7\), we have \((1 - 1/l)^{Ik} \approx e^{-Ik/l} = 1/2\), and hence \(\mathrm{Pr}[{\mathcal {B}}] = \{1 - (1 - 1/l)^{Ik}\}^{k} \approx 2^{-k} \approx 2^{-88.7}\) (see [4, section 2.2]). Therefore, the probability \(\mathrm{Pr}[{\mathcal {B}}] = \{1 - (1 - 1/l)^{Ik}\}^{k}\) is negligible. Thus, by Inequality (3), the probability \(\mathrm{Pr}[{\mathcal {A}}]\) is also negligible. This concludes the proof of Lemma 1. \(\square\)

By Lemma 1, the following theorem (Theorem 1) holds.

Theorem 1

The proposed protocol 1 (Algorithm 7) satisfies the correctness.

3.4 Security analysis of proposed protocol 1

In this section, we briefly discuss the security of the proposed protocol 1. Our analysis is similar to that of the PI-Sum [13]. The security of the PI-Sum was proven by Ion et al. in [13, Section 3.1]. More precisely, the authors prove the security of the PI-Sum in the honest-but-curious model in [13, Theorem 1] and [13, Theorem 2]. Since the only difference between the PI-Sum ( [13, Figure 2] or Algorithm 1) and the proposed protocol 1 (Algorithm 7) is the phase of the computation of private set intersection (Algorithm 3 and Algorithm 8), it suffices to consider the phase of the computation of private set intersection (Algorithm 8) for the security of the proposed protocol 1. The following theorem holds for the security of the proposed protocol 1.

Theorem 2

Alice cannot obtain any information about Bob’s secret in Algorithm 8. Moreover, the proposed protocol 1 is secure in the honest-but-curious model.

Proof

We first prove that Alice cannot obtain any information about Bob’s secret \(B = \{(v_{j}, t_{j})\}_{j \in [J]}\). Alice performs Const Algorithm (Algorithm 5) in step 1 of Algorithm 8 and ElementCheck Algorithm (Algorithm 6) in step 4 of Algorithm 8. Since Alice inputs the sets of ciphertexts \(\psi \left( \left\{ \mathrm{RO}\left( u_{\phi _2\left( \phi _1\left( i\right) \right) }\right) ^{k_1k_2}\right\} _{i\in [I]} \right) , \psi \left( \left\{ \mathrm{RO}\left( v_{\phi _3\left( j\right) }\right) ^{k_1k_2} \right\} _{j\in [J]} \right)\) in these algorithms, it is infeasible for Alice to obtain Bob’s secrets \(v_{\phi _3 (j)} \in B, k_2\). In addition, the index \(i\in [I]\) of \(u_{i} \in A\) is not the same as the index \({\phi }_{2}({\phi }_{1}(i)) \in [I]\) of the ciphertext \(\mathrm{RO}\left( u_{\phi _2\left( \phi _1\left( i\right) \right) }\right) ^{k_1k_2}\) of \(u_{i} \in A\). Similarly, the index \(j \in [J]\) of \(v_{j}, t_{j} \in B\) is not the same as the index \({\phi }_{3} (j)\in [J]\) of the ciphertext \(\left( \mathrm{RO}\left( v_{\phi _3\left( j\right) }\right) ^{k_1k_2}, \mathrm{HomEnc}_{pk_B}\left( t_{\phi _3\left( j\right) }\right) \right)\) of \(v_{j}, t_{j} \in B\). Thus, it is infeasible for Alice to obtain any element in \(A \cap B\) by comparing the index \({\phi }_{3} (j)\in {\mathcal {J}}\) and the index \(i\in [I]\). The latter part follows from the former part. \(\square\)

Remark 2

Ion et al. discuss the security of PI-Sum [13] in the honest-but-curious model. More precisely, they discuss that Alice cannot obtain any information about Bob’s secret in [13, Theorem 1] and Bob cannot obtain any information about Alice’s secret in [13, Theorem 2]. On the other hand, Bob does not input its own secret and does not obtain the output of Algorithm 8. In addition, Bob does not carry out any computation in Algorithm 8. Therefore, we only discuss the security of the phase of the computation of private set intersection (Algorithm 8) for the security of the proposed protocol 1 in Theorem 2.

4 Efficiency estimation of proposed protocol 1

In this section, we estimate the computational costs of the PI-Sum (Algorithm 1) and the proposed protocol 1 (Algorithm 7), and we compare the computational cost of the proposed protocol 1 with that of the PI-Sum. We use Curve25519 as the Abelian group \({\mathbb {G}}\). Since \(p = 2^{255} - 19\) and \(\log _{2} p \approx 255\), we assume that \(\lambda = 128\) and \(\lceil \log _2 N \rceil = 3072\). As mentioned in Sect. 2.1, we can regard each \(x \in {\mathbb {F}}_{p}\) as the x-coordinate of a point on Curve25519 [1, Theorem 2.1]. Since a scalar multiplication on Curve25519 does not require the y-coordinate of a point, the computational cost of the random oracle RO is negligibly small compared to the other steps in the PI-Sum and the proposed protocol 1. Therefore, we can ignore the computational cost of the random oracle RO. By a similar argument, we can also ignore the computational costs of the mappings \({\phi }_{1}, {\phi }_{2}, {\phi }_{3}\), and \({\psi }\). We use the notation in Table 1 to estimate the computational costs of the PI-Sum and the proposed protocol 1.

Table 1 Notation for the estimation

Here, we estimate the computational costs of the PI-Sum and the proposed protocol 1 by counting the number of \({\mathbf {M}}^{(p)}\), \({\mathbf {S}}\), \(\mathbf {ECDBL}\), \(\mathbf {ECADD}\), \(\mathbf {Enc}\), \(\mathbf {Dec}\), \(\mathbf {Exp}\), and \({\mathbf {M}}^{(N^2)}\), respectively. We denote the computational cost of Precomputation Algorithm (Algorithm 2) (resp. Intersection Sum Algorithm (Algorithm 4)) by \(\mathrm{Cost_{Precomp}}\) (resp. \(\mathrm{Cost_{Inter\text {-}Sum}}\)). We define

$$\begin{aligned} \mathrm{Cost_{Common}} := \mathrm{Cost_{Precomp}} + \mathrm{Cost_{Inter\text {-}Sum}}. \end{aligned}$$

We first estimate the computational cost \(\text {Cost}_{\text {Common}}\). Table 2 shows the number of computations in Precomputation Algorithm and Intersection Sum Algorithm.

Table 2 The number of computations in Algorithm 2 and Algorithm 4

From [5, Section 3], we have \(0.8 {\mathbf {M}}^{(p)} \le {\mathbf {S}} \le {\mathbf {M}}^{(p)}\). Here, we assume that \({\mathbf {S}}= {\mathbf {M}}^{(p)}\). From [1, Section 5], we have \(\mathbf {ECDBL} = 4 {\mathbf {S}} + 1 {\mathbf {M}}^{(p)}=5 {\mathbf {M}}^{(p)}\) and \(\mathbf {ECADD}=5 {\mathbf {M}}^{(p)}\). These equations yield \(\mathbf {ECDBL} = \mathbf {ECADD} = 5{\mathbf {M}}^{(p)}\). Since we can use precomputing techniques in the Paillier encryption scheme by applying the technique of [24, Section 7], we assume that \(1 \mathbf {Enc}=1 \mathbf {Dec}=1 \mathbf {Exp}\). By [20, Algorithm 14.76], we have \(1 \mathbf {Exp} = \frac{3}{2} \lceil \log _2{N} \rceil {\mathbf {M}}^{(N^2)}\), namely,

$$\begin{aligned} 1 \mathbf {Enc} = 1 \mathbf {Dec} = 1 \mathbf {Exp} = \frac{3}{2} \lceil {\log }_{2} N \rceil {\mathbf {M}}^{(N^2)}. \end{aligned}$$
(4)

For the computational cost of one scalar multiplication \(\mathbf {ScalarMult}\), the following equation holds by [6]:

$$\begin{aligned} \begin{aligned} 1 \mathbf {ScalarMult}&= {\log }_{2} p \times \mathbf {ECDBL} + \frac{{\log }_{2} p}{2} \times \mathbf {ECADD} \\&= \frac{15}{2}({\log }_{2} p) {\mathbf {M}}^{(p)}. \end{aligned} \end{aligned}$$
(5)

By Eqs. (4) and (5), the computational cost \(\text {Cost}_{\text {Precomp}}\) satisfies the following equation:

$$\begin{aligned} \begin{aligned} \mathrm{Cost_{Precomp}}&= 2(I+J) \times \mathbf {ScalarMult} + J \times \mathbf {Enc} \\&= 15({\log }_{2} p)(I + J) {\mathbf {M}}^{(p)} + \frac{3}{2} \lceil {\log }_{2} N \rceil J {\mathbf {M}}^{(N^2)}. \end{aligned} \end{aligned}$$
(6)

By a similar argument as above, the computational cost \(\text {Cost}_{\text {Inter-Sum}}\) satisfies the following equation:

$$\begin{aligned} \begin{aligned} \mathrm{Cost_{Inter\text {-}Sum}}&= J \times {\mathbf {M}}^{(N^2)} + 1 \mathbf {Enc} + 1\mathbf {Dec} \\&= (J + 3 \lceil {\log }_{2} N \rceil ){\mathbf {M}}^{(N^2)}. \end{aligned} \end{aligned}$$
(7)

By Eqs. (6) and (7), the computational cost \(\text {Cost}_{\text {Common}}\) satisfies the following equation:

$$\begin{aligned} \begin{aligned} \mathrm{Cost_{Common}}&= \mathrm{Cost_{Precomp}} + \mathrm{Cost_{Inter\text {-}Sum}} \\&= 15({\log }_{2} p)(I + J) {\mathbf {M}}^{(p)} \\& + \left( \left( \frac{3}{2} \lceil \log _2{N} \rceil + 1\right) J + 3 \lceil \log _2{N} \rceil \right) {\mathbf {M}}^{(N^2)} \\&= 3825(I + J) {\mathbf {M}}^{(p)} + (4609J + 9216) {\mathbf {M}}^{(N^2)}. \end{aligned} \end{aligned}$$
(8)
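As a cross-check of Eqs. (4) to (8), the following minimal Python sketch recomputes the coefficients of \(\mathrm{Cost_{Common}}\) from the unit costs assumed in this section. The constant and function names (`LOG2_P`, `LOG2_N`, `cost_common`, and so on) are illustrative assumptions and do not appear in the protocol description.

```python
from fractions import Fraction

LOG2_P = 255    # log2 p for Curve25519, as assumed in this section
LOG2_N = 3072   # ceil(log2 N) for the Paillier modulus, as assumed in this section

# Curve operations counted in field multiplications M^(p), with S = M^(p).
ECDBL = 5       # 4 S + 1 M^(p)
ECADD = 5

# Eq. (5): log2(p) doublings plus log2(p)/2 additions per scalar multiplication.
SCALAR_MULT = LOG2_P * ECDBL + Fraction(LOG2_P, 2) * ECADD

# Eq. (4): Enc = Dec = Exp, counted in multiplications M^(N^2).
EXP = Fraction(3, 2) * LOG2_N


def cost_common(I, J):
    """Return (number of M^(p), number of M^(N^2)) in Cost_Common, Eq. (8)."""
    m_p = 2 * (I + J) * SCALAR_MULT    # Eq. (6): 2(I + J) scalar multiplications
    m_n2 = J * EXP + (J + 2 * EXP)     # Eq. (6): J Enc; Eq. (7): J mults, 1 Enc, 1 Dec
    return m_p, m_n2


if __name__ == "__main__":
    print(cost_common(1, 1))
```

With \(I = J = 1\), the two returned counts are \(3825 \times 2\) and \(4609 + 9216\), which matches the last line of Eq. (8).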

Finally, we estimate the computational costs of the PI-Sum and the proposed protocol 1. Let \(\text {Cost}_{\text {PI-Sum}}\) (resp. \(\text {Cost}_{\text {Proposed1}}\)) be the computational cost of Algorithm 1 (resp. Algorithm 7). In Intersection Set Algorithm, we compute \({\mathcal {J}}\) (the set of all indices of the elements \(v_{{\phi }_{3}(j)} \in A \cap B\)) by checking whether \({\delta }_{1} = {\delta }_{2}\) or not for each \(({\delta }_{1}, {\delta }_{2}) \in \{\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}\}_{j \in [J]} \times \{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\). Hence, the computational cost \(\text {Cost}_{\text {PI-Sum}}\) satisfies

$$\begin{aligned} \begin{aligned} \mathrm{Cost_{PI\text {-}Sum}}&= \mathrm{Cost_{Common}} + IJ{\lambda } \\&= 3825(I + J) {\mathbf {M}}^{(p)} + (4609J + 9216) {\mathbf {M}}^{(N^2)} + IJ{\lambda }. \end{aligned} \end{aligned}$$
(9)

In the proposed protocol 1, we input \({\psi }(\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}) \subset \{0, 1\}^{*}\) into Const Algorithm. Const Algorithm iterates steps 4–11 of Algorithm 5 I times. Thus, the computational cost of the whole process of Const Algorithm is \(I\lambda\). We input \({\psi }(\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}})\) and \(\text {BF}({\psi }(\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}))\) into ElementCheck Algorithm. Performing Algorithm 6 J times costs \(J\lambda\). Hence, the computational cost \(\text {Cost}_{\text {Proposed1}}\) satisfies

$$\begin{aligned} \begin{aligned} \mathrm{Cost_{Proposed1}}&= \mathrm{Cost_{Common}} + (I+J)\lambda \\&= 3825(I + J) {\mathbf {M}}^{(p)} + (4609J + 9216) {\mathbf {M}}^{(N^2)} + (I+J)\lambda . \end{aligned} \end{aligned}$$
(10)
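The difference between the \(IJ\lambda\) term in Eq. (9) and the \((I+J)\lambda\) term in Eq. (10) can be illustrated with a small Python sketch. It uses a generic Bloom filter rather than Algorithm 5 or Algorithm 6 themselves: the hash construction (SHA-256 with a one-byte counter prefix), the filter length `l`, the number of hash functions `k`, and all function names are assumptions made for illustration only.

```python
import hashlib


def bf_positions(item: bytes, l: int, k: int):
    """Derive k bit positions in [0, l) for item (assumed hash construction)."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + item).digest(), "big") % l
        for i in range(k)
    ]


def build_filter(items, l, k):
    """Insert every item into an l-bit Bloom filter: one pass over I items."""
    bits = [0] * l
    for item in items:
        for pos in bf_positions(item, l, k):
            bits[pos] = 1
    return bits


def maybe_member(bits, item, k):
    """Bloom filter query: no false negatives, false positives are possible."""
    return all(bits[pos] for pos in bf_positions(item, len(bits), k))


def intersect_pairwise(a_items, b_items):
    """Baseline: compare every pair, i.e. I * J equality checks."""
    checks, hits = 0, []
    for j, b in enumerate(b_items):
        found = False
        for a in a_items:
            checks += 1
            found = found or (a == b)
        if found:
            hits.append(j)
    return hits, checks


def intersect_bloom(a_items, b_items, l=1 << 16, k=4):
    """Bloom filter variant: I insertions followed by J queries."""
    bits = build_filter(a_items, l, k)
    hits = [j for j, b in enumerate(b_items) if maybe_member(bits, b, k)]
    return hits, len(a_items) + len(b_items)
```

The pairwise routine performs \(I \cdot J\) comparisons, whereas the filter-based routine performs \(I\) insertions and \(J\) queries. The filter may report false positives, so `l` and `k` would have to be chosen so that the false positive probability is acceptable for the application.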

5 Countermeasures against repeatedly executing the PI-Sum

As explained in [13, Section 3], it may be possible that “repeatedly executing the PI-Sum in sequence will leak information due to correlated inputs in different sessions, and one strategy for resolving this issue would be to compose differential privacy techniques [11] with the cryptographic protocol, by adding appropriately sampled noise to the inputs”. In this section, we propose two protocols (Algorithm 10 and Algorithm 11) that are resistant to the above security problem. These proposed protocols are modifications of the “Reverse” Private Intersection-Sum protocol (RePI-Sum for short) in [13, Figure 4].

5.1 Description of proposed protocol 2 and proposed protocol 3

In this section, we describe the proposed protocol 2 (Algorithm 10) and the proposed protocol 3 (Algorithm 11). These protocols allow Bob to obtain the intersection sum. Both are based on the proposed estimation algorithm for the intersection size (Algorithm 9) and the proposed protocol 1. Algorithm 9 is based on Bloom filter Bootstrap [15, Algorithm 2] and is used to prevent the security problem in the RePI-Sum ( [13, Figure 4]). We refer the reader to Sect. 3.1 for the definition of \(\psi \in \mathrm{Inj}({\mathbb {G}}, \{0, 1\}^{*})\) in Algorithm 9.

[Algorithm 9: the proposed estimation algorithm for the intersection size]

In Algorithm 9, Alice has the private set X and Bob has the private set Y. Algorithm 9 allows Alice and Bob to obtain an estimate \({\hat{x}} \approx \sharp (X \cap Y)\) of the intersection size. Kikuchi and Sakuma [15, Section 5.1] show that the following inequality holds

$$\begin{aligned} \vert \lceil {\hat{x}} \rceil - \sharp (X \cap Y) \vert \le 1 \end{aligned}$$

in Bloom filter Bootstrap [15, Algorithm 2]. We therefore assume that the above inequality also holds in Algorithm 9. In Algorithm 9, we use [10, Protocol 2] as the Secure Scalar Product Protocol (SSP for short), and the input S is the number of executions of SSP. In steps 1–6 of Algorithm 9, Alice obtains \(\text {BF}({\psi }(X))\) and Bob obtains \(\text {BF}({\psi }(Y))\) in the d-th execution of SSP, where \(1 \le d \le S\). In steps 4–5 of Algorithm 9, \(y_{d}\) satisfies the following equation:

$$\begin{aligned} y_{d} = y_{a, d} + y_{b, d} = \sharp \{\alpha \in {\mathbb {Z}}_l \mid \text {BF}({\psi }(X))[\alpha ] = \text {BF}({\psi }(Y))[\alpha ] = 1\}. \end{aligned}$$
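The quantity \(y_{d}\) above is the inner product of the two Bloom filter bit vectors. The following Python sketch computes it in the clear and then splits it into two additive shares, only to illustrate the output format \(y_{d} = y_{a, d} + y_{b, d}\). It does not implement the secure protocol of [10, Protocol 2]; the function names, the share modulus, and the use of a seeded random generator are assumptions for illustration.

```python
import random


def overlap_count(bf_x, bf_y):
    """Count positions alpha where both filters hold a 1 (the value y_d)."""
    if len(bf_x) != len(bf_y):
        raise ValueError("both Bloom filters must have the same length l")
    return sum(bx & by for bx, by in zip(bf_x, bf_y))


def additive_shares(y, modulus, rng):
    """Split y into (y_a, y_b) with y_a + y_b = y modulo `modulus`.

    This only mimics the shape of the SSP output; in the real protocol
    neither party would learn y before the shares are combined.
    """
    y_a = rng.randrange(modulus)
    y_b = (y - y_a) % modulus
    return y_a, y_b


if __name__ == "__main__":
    bf_x = [1, 0, 1, 1, 0]
    bf_y = [1, 1, 0, 1, 0]
    y = overlap_count(bf_x, bf_y)
    print(y, additive_shares(y, 101, random.Random(0)))
```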

For \(\alpha \in {\mathbb {Z}}_{l}\), let us denote by \(\theta\) the probability of the event \(\text {BF}({\psi }(X))[\alpha ] = \text {BF}({\psi }(Y))[\alpha ] = 1\). In step 7 of Algorithm 9, \({\hat{\theta }}\) is the Bayesian estimate of the probability \(\theta\). We call the phase of computing \(y_{d}\) using SSP (steps 1–6 of Algorithm 9) the secure scalar product phase, and the phase of computing the output using \({\hat{\theta }}\) (steps 7–8 of Algorithm 9) the intersection size phase. Next, we describe the proposed protocol 2 and the proposed protocol 3. Their detailed descriptions are given in Algorithm 10 and Algorithm 11, respectively.

[Algorithm 10: the proposed protocol 2]

[Algorithm 11: the proposed protocol 3]

In Algorithm 10 (resp. Algorithm 11), Alice and Bob perform Algorithm 9 just before step 1 (resp. just after step 1) of Algorithm 7. In Algorithm 10 and Algorithm 11, Alice and Bob share the security parameter \(\lambda\), and perform the setup phase in exactly the same manner as in Algorithm 7. Algorithm 10 and Algorithm 11 require Alice and Bob to share the threshold T with respect to the cardinality of the intersection set of the input sets A and B.

We give an overview of Algorithm 10. Alice and Bob perform Algorithm 9 in step 1 of Algorithm 10. Alice (resp. Bob) inputs the private set \(\{u_{i}\}_{i \in [I]}\) (resp. \(\{v_{j}\}_{j \in [J]}\)) into Algorithm 9. Thus Alice and Bob obtain an estimate \({\hat{x}} \approx \sharp (\{u_{i}\}_{i \in [I]} \cap \{v_{j}\}_{j \in [J]})\) of the intersection size. Alice (or Bob) checks whether \({\hat{x}} \le T\) in step 2 of Algorithm 10. If \({\hat{x}} \le T\), then Alice (or Bob) outputs “\(\perp\)” and aborts Algorithm 10. If Bob aborts Algorithm 10, then Alice and Bob add a new element to the input sets A and B, and repeat Algorithm 10. Similarly to steps 1–4 of Algorithm 7, Bob computes and outputs the intersection sum in steps 6–9 of Algorithm 10.

Next, we give an overview of Algorithm 11. Similarly to step 1 of Algorithm 7, Algorithm 11 encrypts \(A = \{u_{i}\}_{i \in [I]}\) and \(B = \{(v_{j}, t_{j})\}_{j \in [J]}\), and outputs \(\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) and \(\{(\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}, \mathrm{HomEnc}_{pk_{B}}(t_{{\phi }_{3}(j)}))\}_{j \in [J]}\) in step 1 of Algorithm 11. Alice holds \(\{\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}\}_{j \in [J]}\) just after step 1 of Algorithm 11. Bob also holds \(\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) in step 10 of Algorithm 2. Alice and Bob perform Algorithm 9 in step 2 of Algorithm 11. Alice inputs \(\{\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}\}_{j \in [J]}\) and Bob inputs \(\{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}\) into Algorithm 9. Thus Alice and Bob obtain the following estimate of the intersection size:

$$\begin{aligned} {\hat{x}} \approx \sharp (\{\mathrm{RO}(v_{{\phi }_{3}(j)})^{k_{1}k_{2}}\}_{j \in [J]} \cap \{\mathrm{RO}(u_{{\phi }_{2}({\phi }_{1}(i))})^{k_{1}k_{2}}\}_{i \in [I]}). \end{aligned}$$

Similarly to steps 2–5 of Algorithm 10, Algorithm 11 checks whether \({\hat{x}} \le T\) in steps 3–6 of Algorithm 11. If \({\hat{x}} \le T\), then Alice (or Bob) outputs “\(\perp\)” and aborts Algorithm 11. Similarly to steps 1–4 of Algorithm 7, Algorithm 11 outputs the intersection sum in steps 7–9 of Algorithm 11. In summary, Algorithm 10 (resp. Algorithm 11) performs Algorithm 9 just before (resp. just after) encrypting the input sets A and B.

In Algorithm 10 and Algorithm 11, Bob can obtain an estimate \({\hat{x}} \approx \sharp (A \cap B)\) of the intersection size. Bob can compute the intersection sum if and only if \(T \le \sharp (A \cap B)\) holds. If the intersection sum could reveal an element \(u_{i} \in A\) (\(i \in [I]\)) of Alice’s private set to Bob, Bob can prevent this leakage by aborting at step 4 of Algorithm 10 or step 5 of Algorithm 11 without computing the output. If the protocol does not abort, Bob obtains the intersection sum in Algorithm 10 and Algorithm 11.
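The threshold check shared by Algorithm 10 and Algorithm 11 can be sketched in Python as follows. This is only an illustration of the control flow described above: the function name `gated_intersection_sum`, the callback `compute_sum` (standing in for the remaining steps that compute the intersection sum), and the use of `None` for the abort symbol are assumptions.

```python
from typing import Callable, Optional


def gated_intersection_sum(
    x_hat: float,
    threshold: int,
    compute_sum: Callable[[], int],
) -> Optional[int]:
    """Abort when the estimated intersection size does not exceed T.

    Returns None (standing in for the abort symbol) if x_hat <= threshold,
    and otherwise runs compute_sum() and returns its result.
    """
    if x_hat <= threshold:
        return None          # abort before any intersection sum is computed
    return compute_sum()


if __name__ == "__main__":
    print(gated_intersection_sum(3.0, 5, lambda: 42))   # aborts
    print(gated_intersection_sum(7.2, 5, lambda: 42))   # proceeds
```

Because the check runs before `compute_sum` is called, an aborted run never performs the comparatively expensive intersection-sum steps.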

5.2 Efficiency estimation of proposed protocol 2 and proposed protocol 3

In this section, we estimate the computational costs of the proposed protocol 2 (Algorithm 10) and the proposed protocol 3 (Algorithm 11). Let us denote by \({\mathbf {M}}^{(p^{\prime })}\) the computational cost of one multiplication on \({\mathbb {F}}_{p^{\prime }}\). Here, \(p^{\prime }\) is a prime satisfying \(p^{\prime } > l\) and \(p^{\prime } \approx l\), and is used for step 4 of Algorithm 9 ( [10, Protocol 2]). By [15, Section 4.4], we assume that \(S = 10\) and \(k = 1\). We define

$$\begin{aligned} \mathrm{tmp} := {\hat{\theta }} - 1 + (1 - 1/l)^{k \sharp X} + (1 - 1/l)^{k \sharp Y}. \end{aligned}$$

Note that \(\mathrm{tmp}\) appears in the equation of step 8 of Algorithm 9. We denote the computational cost of the proposed estimation algorithm for the intersection size (Algorithm 9) by \(\mathrm{Cost_{Estimation}}\). Similarly, we denote the computational costs of the secure scalar product phase (steps 1–6 of Algorithm 9) and the intersection size phase (steps 7–8 of Algorithm 9) by \(\mathrm{Cost_{SSP}}\) and \(\mathrm{Cost_{Size}}\), respectively. As in Sect. 4, we use the symbol \(\mathrm{Cost_{Proposed2}}\) (resp. \(\mathrm{Cost_{Proposed3}}\)) to represent the computational cost of the proposed protocol 2 (resp. the proposed protocol 3). Since the only difference between the proposed protocol 2 and the proposed protocol 3 is the timing of performing Algorithm 9, we have

$$\begin{aligned} \mathrm{Cost_{Proposed2}} = \mathrm{Cost_{Proposed3}}. \end{aligned}$$
(11)

Therefore, we only estimate \(\mathrm{Cost_{Proposed2}}\) by counting the number of \({\mathbf {M}}^{(p^{\prime })}\). We first estimate the computational cost \(\mathrm{Cost_{Estimation}}\). The computational cost of step 2 of Algorithm 9 (resp. step 3 of Algorithm 9) is \({\lambda } \sharp X\) (resp. \({\lambda } \sharp Y\)). By [10, Section 4.3], the computational cost of step 4 of Algorithm 9 is \(\frac{l^{2}}{2}{\mathbf {M}}^{(p^{\prime })}\). Thus \(\mathrm{Cost_{SSP}}\) satisfies the following equation:

$$\begin{aligned} \begin{aligned} \mathrm{Cost_{SSP}}&= S\left( {\lambda }\sharp X + {\lambda }\sharp Y + \frac{l^{2}}{2}{\mathbf {M}}^{(p^{\prime })}\right) \\&= 1280(\sharp X + \sharp Y) + 5l^{2}{\mathbf {M}}^{(p^{\prime })}. \end{aligned} \end{aligned}$$
(12)
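Eq. (12) can be evaluated for concrete parameters with a short Python sketch. The function name `cost_ssp` and its return convention (the \(\lambda\)-weighted term and the number of \({\mathbf {M}}^{(p^{\prime })}\) as separate values) are assumptions for illustration; the defaults \(S = 10\) and \(\lambda = 128\) follow the assumptions stated in this section.

```python
def cost_ssp(size_x, size_y, l, S=10, lam=128):
    """Evaluate Eq. (12).

    Returns a pair:
      - the lambda-weighted term S * lam * (#X + #Y) from steps 2 and 3,
      - the number of multiplications M^(p') from step 4, S * l^2 / 2.
    """
    filter_term = S * lam * (size_x + size_y)
    mults = S * l * l / 2
    return filter_term, mults


if __name__ == "__main__":
    # Example: two sets of 1000 elements each and a filter of l = 4096 bits.
    print(cost_ssp(1000, 1000, 4096))
```

The second component grows quadratically in the filter length \(l\), so in this cost model it dominates the first component once \(l^{2}\) is large relative to \(\sharp X + \sharp Y\).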

In step 8 of Algorithm 9, we need to compute \({\hat{x}} = \sharp X + \sharp Y - (\log _{1 - 1/l} \mathrm{tmp})/k\). Since

$$\begin{aligned} {\log }_{1 - 1/l} \mathrm{tmp} = {\log }_{e} \mathrm{tmp}/{\log }_{e} (1 - 1/l), \end{aligned}$$

we can precompute \(1/{\log }_{e} (1 - 1/l)\). Therefore, we can ignore the computational cost of computing \(1/{\log }_{e} (1 - 1/l)\). By [20, Table 2.1] and [3, Section 9], \(\mathrm{Cost_{Size}}\) satisfies the following equation:

$$\begin{aligned} \begin{aligned} \mathrm{Cost_{Size}}&\approx ({\log }_{2} Sl)^{2} + ({\log }_{2} \mathrm{tmp})^{3} + ({\log }_{2} ({\log }_{e} \mathrm{tmp}))^{2} \\&\approx {\mathbf {M}}^{(p^{\prime })} + ({\log }_{2} \mathrm{tmp})^{3} + ({\log }_{2} ({\log }_{e} \mathrm{tmp}))^{2}. \end{aligned} \end{aligned}$$
(13)
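The computation in step 8 of Algorithm 9 can be sketched in Python as follows, using the definition of \(\mathrm{tmp}\) and the change of base given above. The function names are illustrative assumptions. The helper `expected_theta` is also an assumption: it uses the standard independence approximation for Bloom filters to produce a value of \(\theta\) for a known intersection size, and serves only to check that the estimator inverts it.

```python
import math


def estimate_intersection_size(theta_hat, size_x, size_y, l, k=1):
    """Return x_hat = #X + #Y - log_{1 - 1/l}(tmp) / k (step 8 of Algorithm 9)."""
    q = 1.0 - 1.0 / l
    tmp = theta_hat - 1.0 + q ** (k * size_x) + q ** (k * size_y)
    if tmp <= 0.0:
        raise ValueError("tmp must be positive for the logarithm to exist")
    inv_log_q = 1.0 / math.log(q)    # depends only on l, so it can be precomputed
    return size_x + size_y - (math.log(tmp) * inv_log_q) / k


def expected_theta(size_x, size_y, size_inter, l, k=1):
    """Assumed model: probability that a given bit is 1 in both filters."""
    q = 1.0 - 1.0 / l
    union = size_x + size_y - size_inter
    return 1.0 - q ** (k * size_x) - q ** (k * size_y) + q ** (k * union)


if __name__ == "__main__":
    theta = expected_theta(100, 80, 30, l=1024)
    print(estimate_intersection_size(theta, 100, 80, l=1024))
```

Under the assumed model, \(\mathrm{tmp} = (1 - 1/l)^{k(\sharp X + \sharp Y - \sharp (X \cap Y))}\), so the logarithm recovers the size of the union and the subtraction yields the intersection size.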

By Eqs. (12) and (13), \(\mathrm{Cost_{Estimation}}\) satisfies the following equation:

$$\begin{aligned} \begin{aligned} \mathrm{Cost_{Estimation}}&= \mathrm{Cost_{SSP}} + \mathrm{Cost_{Size}} \\&\approx (5l^{2} + 1){\mathbf {M}}^{(p^{\prime })} + 1280(\sharp X + \sharp Y) \\& + ({\log }_{2} \mathrm{tmp})^{3} + ({\log }_{2} ({\log }_{e} \mathrm{tmp}))^{2}. \end{aligned} \end{aligned}$$

Hence, \(\mathrm{Cost_{Proposed2}}\) satisfies the following equation:

$$\begin{aligned} \mathrm{Cost_{Proposed2}} = \mathrm{Cost_{Proposed1}} + \mathrm{Cost_{Estimation}}. \end{aligned}$$
(14)

By Eqs. (11) and (14), we have

$$\begin{aligned} \mathrm{Cost_{Proposed3}}= \mathrm{Cost_{Proposed1}} + \mathrm{Cost_{Estimation}}. \end{aligned}$$
(15)

6 Results and discussion

In Sect. 3, we proposed Algorithm 7, which is an efficiency improvement technique for the PI-Sum (Algorithm 1). By Table 2 and Eq. (9), the computational complexity of Algorithm 1 is \(O((I + J){\lambda }^{3})\). Similarly, by Table 2 and Eq. (10), the computational complexity of Algorithm 7 is also \(O((I + J){\lambda }^{3})\). Thus, the computational complexity of Algorithm 1 is the same as that of Algorithm 7. However, we claim that if I and J are relatively large for a fixed security parameter \({\lambda }\), then the proposed protocol 1 (Algorithm 7) is more efficient than the PI-Sum ( [13, Figure 2] or Algorithm 1). We now explain why this claim holds.

Figure 1 shows the computational costs \(\text {Cost}_{\text {PI-Sum}}\) and \(\text {Cost}_{\text {Proposed1}}\). For simplicity, we assume that \(I = J = 2^{x}\) and that the security parameter is \(\lambda = 128\). Fig. 1 plots the graph of \(y = f(x)\) (resp. \(y = g(x)\)), where \(\text {Cost}_{\text {PI-Sum}} = 2^y\) (resp. \(\text {Cost}_{\text {Proposed1}} = 2^y\)). By [20, Table 2.5] and [1, Section 1], the multiplication cost \({\mathbf {M}}^{(p)}\) is \(O((\lceil \log _2 p \rceil )^{2})\). Since \(p = 2^{255} - 19\), by omitting the constant factor in the multiplication cost \({\mathbf {M}}^{(p)}\), we may assume that

$$\begin{aligned} {\mathbf {M}}^{(p)} \approx (\lceil \log _2 p \rceil )^{2} \approx (\lceil \log _2 (2^{255} - 19) \rceil )^{2} \approx (255)^2. \end{aligned}$$
(16)

By [20, Table 2.5] and [24, Section 4], the multiplication cost \({\mathbf {M}}^{(N^2)}\) is \(O((\lceil 2 \log _2 N \rceil )^{2})\). It is known that if \(\lambda = 128\) then \(\lceil \log _{2} N \rceil \approx 3072\) (see [23]). By omitting the constant factor in the multiplication cost \({\mathbf {M}}^{(N^2)}\), we may assume that

$$\begin{aligned} {\mathbf {M}}^{(N^2)} \approx (\lceil 2 \log _2 N \rceil )^{2} \approx (2 \times 3072)^{2} \approx (6144)^{2}. \end{aligned}$$
(17)

In this situation, we have

$$\begin{aligned} {\mathbf {M}}^{(N^2)} \approx \left( \frac{6144}{255}\right) ^{2} {\mathbf {M}}^{(p)} \approx 580.5 {\mathbf {M}}^{(p)}. \end{aligned}$$

Substituting Eqs. (16) and (17) into Eq. (9) yields

$$\begin{aligned} \mathrm{Cost_{PI\text {-}Sum}} = 128 \times 2^{2x} + 174481365474 \times 2^x + 347892350976. \end{aligned}$$
(18)

Similarly, substituting Eqs. (16) and (17) into Eq. (10) yields

$$\begin{aligned} \mathrm{Cost_{Proposed1}} = 174481365730 \times 2^x + 347892350976. \end{aligned}$$
(19)
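The numerical coefficients in Eqs. (18) and (19) can be reproduced by substituting Eqs. (16) and (17) into Eqs. (9) and (10) with \(I = J = 2^{x}\) and \(\lambda = 128\). The following Python sketch does this with exact integer arithmetic; the constant and function names are illustrative assumptions.

```python
M_P = 255 ** 2            # Eq. (16): assumed cost of one M^(p)
M_N2 = (2 * 3072) ** 2    # Eq. (17): assumed cost of one M^(N^2)
LAMBDA = 128              # security parameter


def cost_common(n):
    """Eq. (8) with I = J = n, expressed in the units of Eqs. (16) and (17)."""
    return 3825 * (n + n) * M_P + (4609 * n + 9216) * M_N2


def cost_pi_sum(x):
    """Eq. (9) with I = J = 2^x, which should agree with Eq. (18)."""
    n = 2 ** x
    return cost_common(n) + n * n * LAMBDA


def cost_proposed1(x):
    """Eq. (10) with I = J = 2^x, which should agree with Eq. (19)."""
    n = 2 ** x
    return cost_common(n) + (n + n) * LAMBDA


if __name__ == "__main__":
    # Coefficient of 2^x in Eq. (19) and the constant term.
    print(cost_proposed1(1) - cost_proposed1(0))
    print(cost_common(0))
```

The difference `cost_proposed1(1) - cost_proposed1(0)` is the coefficient of \(2^{x}\) in Eq. (19), and `cost_common(0)` is the constant term shared by both equations.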

By Eqs. (18) and (19), if \(x=16\) then the computational cost of the proposed protocol 1 is about 1% smaller than that of the PI-Sum, namely,

$$\begin{aligned} \text {Cost}_{\text {Proposed1}} \approx \frac{99}{100} \times \text {Cost}_{\text {PI-Sum}}. \end{aligned}$$

If \(x = 22\) then the computational cost of the proposed protocol 1 is about 50% smaller than that of the PI-Sum, namely,

$$\begin{aligned} \text {Cost}_{\text {Proposed1}} \approx \frac{1}{2} \times \text {Cost}_{\text {PI-Sum}}. \end{aligned}$$

Fig. 1 Comparison of the computational costs \(\text {Cost}_{\text {PI-Sum}}\) and \(\text {Cost}_{\text {Proposed1}}\)

Thus, the proposed protocol 1 (Algorithm 7) is faster than the PI-Sum (Algorithm 1).

In Sect. 5, we propose other efficiency improvement techniques for the PI-Sum. As explained in Sect. 5, repeatedly executing the PI-Sum in sequence will leak information. The proposed protocol 2 (Algorithm 10) and the proposed protocol 3 (Algorithm 11) are modifications of the RePI-Sum [13, Figure 4], and can prevent the above security problem. Moreover, by adding an aborting functionality in step 4 of Algorithm 10 and step 5 of Algorithm 11, Algorithm 10 and Algorithm 11 are faster than the RePI-Sum [13, Figure 4].

7 Conclusion

In this paper, we proposed efficiency improvement techniques for the private intersection-sum protocol (PI-Sum) of Ion et al. The proposed protocol 1 is more efficient than the PI-Sum. The computational cost of the proposed protocol 1 is about 1% smaller than that of the PI-Sum if the base-2 logarithm of the cardinality of each private set is 16, and about 50% smaller if it is 22. To the best of our knowledge, there exist only two private intersection-sum protocols. Thus, the proposed protocol 1 is the fastest private intersection-sum protocol. We also proposed the proposed protocol 2 and the proposed protocol 3 to prevent the security problem described in Sect. 5. By adding an aborting functionality, the proposed protocol 2 and the proposed protocol 3 can be more efficient than the “Reverse” Private Intersection-Sum protocol proposed by Ion et al. Implementation of our proposed protocols is left for future work.