1 Introduction

There are many situations in which computations are performed on sensitive numerical data. A computation on numbers can usually be expressed as a sequence of arithmetic operations such as addition, subtraction, and multiplication.Footnote 1

In cases where the sensitive data is distributed among multiple parties, this calls for secure arithmetic computation, namely secure computation of functions defined by arithmetic operations. It is convenient to represent such a function by an arithmetic circuit, which is similar to a standard boolean circuit except that gates are labeled by addition, subtraction, or multiplication. It is typically sufficient to consider such circuits that evaluate the operations over a large finite field \(\mathbb {F}\), since arithmetic computations over the integers or (bounded precision) reals can be reduced to this case. Computing over finite fields (as opposed to integers or reals) can also be a feature, as it is useful for applications in threshold cryptography (see, e.g., [15, 26]). In the present work we are mainly interested in the case of secure arithmetic two-party computation in the presence of semi-honest adversaries.Footnote 2 From here on, the term “secure computation” will refer specifically to this case.

Oblivious Linear-Function Evaluation. A natural complete primitive for secure arithmetic computation is Oblivious Linear-function Evaluation (OLE). OLE is a two-party functionality that receives a field element \(x\in \mathbb {F}\) from Alice and field elements \(a,b\in \mathbb {F}\) from Bob and delivers \(ax+b\) to Alice. OLE can be viewed as the arithmetic analogue of 1-out-of-2 Oblivious Transfer of bits (bit-OT) [22]. In the binary case, every boolean circuit C can be securely evaluated with perfect security by using O(|C|) invocations of an ideal bit-OT oracle via the “GMW protocol” [27, 30]. A simple generalization of this protocol can be used to evaluate any arithmetic circuit C over \(\mathbb {F}\) using O(|C|) invocations of OLE and O(|C|) field operations [35].

The Complexity of Secure Arithmetic Computation. The goal of this work is to minimize the complexity of secure arithmetic computation. In light of the above, this reduces to efficiently realizing multiple instances of OLE. We start by surveying known approaches. The most obvious is a straightforward reduction to standard secure computation methods by emulating field operations using bit operations. This approach is quite expensive both asymptotically and in terms of concrete efficiency. In particular, it typically requires many “cryptographic” operations for securely emulating each field operation.

A more direct approach is via homomorphic encryption. Since OLE is a degree-1 function, it can be directly realized by using “linear-homomorphic” encryption schemes (that support addition and scalar multiplication). This approach can be instantiated using Paillier encryption [18, 26, 48] or using encryption schemes based on (ring)-LWE [19, 43]. While these techniques can be optimized to achieve good communication complexity, their concrete computational cost is quite high. In asymptotic terms, the best instantiations of this approach have computational overhead that grows polylogarithmically with the security parameter k. That is, the computational complexity of such secure computation protocols (in any standard computational model) is bigger than the computational complexity of the insecure computation by at least a polylogarithmic factor in k.

Another approach, first suggested by Gilboa [26] and recently implemented by Keller et al. [37], is to use a simple information-theoretic reduction of OLE to string-OT. By using a bit-decomposition of Alice’s input x, an OLE over a field \(\mathbb {F}\) with \(\ell \)-bit elements can be perfectly reduced to \(\ell \) instances of OT, where in each OT one of two field elements is being transferred from Bob to Alice. Using fast methods for OT extension [12, 31], the OTs can be implemented quite efficiently. However, even when neglecting the cost of OTs, the communication involves \(2\ell \) field elements and the computation involves \(O(\ell )\) field operations per OLE. This overhead can be quite large for big fields \(\mathbb {F}\) that are often useful in applications.
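As a concrete illustration, the following minimal Python sketch (our own toy example, not the implementation of [26, 37]) runs Gilboa's bit-decomposition reduction over a Mersenne-prime field, with the \(\ell \) OTs modeled by an ideal function:

```python
# Toy sketch of Gilboa's reduction of one OLE to ell string-OTs; the field, the ideal OT
# and all parameter names are illustrative assumptions, not taken from the paper.
import random

P = 2**61 - 1              # toy prime field; ELL = bit length of Alice's input
ELL = P.bit_length()

def ot(m0, m1, choice):    # ideal 1-out-of-2 OT of field elements (trusted-party stand-in)
    return m1 if choice else m0

def gilboa_ole(x, a, b):
    """Alice holds x, Bob holds (a, b); Alice learns a*x + b from ELL OTs."""
    r = [random.randrange(P) for _ in range(ELL - 1)]
    r.append((b - sum(r)) % P)                          # Bob's masks, chosen to sum to b
    total = 0
    for i in range(ELL):
        m0, m1 = r[i], (r[i] + a * (1 << i)) % P        # Bob offers r_i or r_i + a*2^i
        total = (total + ot(m0, m1, (x >> i) & 1)) % P  # Alice selects with the i-th bit of x
    return total

x, a, b = (random.randrange(P) for _ in range(3))
assert gilboa_ole(x, a, b) == (a * x + b) % P
```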

A final approach, which is the most relevant to our work, uses a computationally secure reduction from OLE to string-OT that assumes the pseudorandomness of noisy random codewords in a linear code. This approach was first suggested by Naor and Pinkas [46] and was further developed by Ishai et al. [35]. The most efficient instantiation of these protocols relies on the assumption that a noisy random codeword of a Reed-Solomon code is pseudorandom, provided that the noise level is sufficiently high to defeat known list-decoding algorithms. In the best-case scenario, this approach has polylogarithmic computational overhead (using asymptotically fast FFT-based algorithms for encoding and decoding Reed-Solomon codes). See Sect. 1.3 for a more detailed overview of existing approaches and [35] for further discussion of secure arithmetic computation and its applications.

The above state of affairs leaves the following question open:

Is it possible to realize secure arithmetic computation with constant computational overhead?

To be a bit more precise, by “constant computational overhead” we mean that there is a protocol which can securely evaluate any arithmetic circuit C over any finite field \(\mathbb {F}\), with a computational cost (on a RAM machine) that is only a constant times bigger than the cost of performing |C| field operations with no security at all. Here we make the standard convention of viewing the size of C also as a security parameter, namely the view of any adversary running in time \(\mathrm {poly}(|C|)\) can be simulated up to a negligible error (in |C|). In the boolean case, Ishai et al. [34] showed that secure computation with constant computational overhead can be based on the conjectured existence of a local polynomial-stretch pseudo-random generator (PRG). In contrast, in all known protocols for the arithmetic case the computational overhead either grows linearly with \(\log |\mathbb {F}|\) or polylogarithmically with the security parameter.

1.1 Our Contribution

We improve both the asymptotic and the concrete efficiency of secure arithmetic computation. On the asymptotic efficiency front, we settle the above open question in the affirmative under plausible cryptographic assumptions. More concretely, our main result is a protocol that securely evaluates any arithmetic circuit C over \(\mathbb {F}\) using only O(|C|) field operations, independently of the size of \(\mathbb {F}\). The protocol uses the underlying field \(\mathbb {F}\) as a black box, where the number of field operations depends only on the security parameter and not on the field size.Footnote 3 The security of the protocol is based on arithmetic analogues of well-studied cryptographic assumptions: concretely, an arithmetic version of an assumption due to Alekhnovich [1] (or similar kinds of “LPN-style” assumptions) and an arithmetic version of a local polynomial-stretch PRG [4, 11, 34].Footnote 4

On the concrete efficiency front, our approach is particularly appealing for a useful subclass of arithmetic computations that efficiently reduce to a multi-output extension of OLE that we call vector-OLE. A vector-OLE of width w is a two-party functionality that receives a field element \(x\in \mathbb {F}\) from Alice and a pair of vectors \({\varvec{a}},{\varvec{b}}\in \mathbb {F}^w\) from Bob and delivers \({\varvec{a}}x+{\varvec{b}}\) to Alice. We obtain a secure protocol for vector-OLE with constant computational overhead and with an asymptotic communication rate of 1/3 (i.e., the communication is dominated by sending roughly 3w elements of \(\mathbb {F}\)). Our implementation of this protocol suggests that it outperforms competing approaches even for relatively small fields \(\mathbb {F}\) and over fast networks. The protocol is also based on more conservative assumptions, namely it can be based only on the first of the two assumptions on which our more general result is based. This assumption is arguably more conservative than the assumption on noisy Reed-Solomon codes used in [35, 46], since the underlying codes do not have an algebraic structure that gives rise to efficient (list-)decoding algorithms.

Vector-OLE can be viewed as an arithmetic analogue of string-OT. Similarly to the usefulness of string-OT for garbling schemes [54], vector-OLE is useful for arithmetic garbling [5, 10] (see Sect. 4). Moreover, there are several natural secure computation tasks that can be directly and efficiently realized using vector-OLE. One class of such tasks is in the domain of secure linear algebra [17]. As a simple example, the secure multiplication of an \(n\times n\) matrix by a length-n vector easily reduces to n instances of vector-OLE of width n. Another class of applications is in the domain of securely searching for nearest neighbors, e.g., in the context of secure face recognition [21]. The goal is to find in a database of n vectors of dimension d the vector which is closest in Euclidean distance to a given target vector. This task admits a simple reduction to d instances of width-n vector OLE, followed by non-arithmetic secure computation of a simple function (minimum) of n integers whose size is independent of d. The cost of such a protocol is dominated by the cost of vector-OLE. See Sect. 5 for a more detailed discussion of these applications.

1.2 Overview of Techniques

Our constant-overhead protocol for a general circuit C is obtained in three steps. The first step is a reduction of the secure computation of C to \(n=O(|C|)\) instances of OLE via an arithmetic version of the GMW protocol.

The second step is a reduction of n instances of OLE to roughly \(\sqrt{n}\) instances of vector-OLE of width \(w=O(\sqrt{n})\). This step mimics the approach for constant-overhead secure computation of boolean circuits taken in [34], which combines a local polynomial-stretch PRG with an information-theoretic garbling scheme [32, 54]. To extend this approach from the boolean to the arithmetic case, two changes are made. First, the information-theoretic garbling scheme for NC\(^0\) is replaced by an arithmetic analogue [10]. More interestingly, the polynomial-stretch PRG in NC\(^0\) needs to be replaced by an arithmetic analogue. We propose candidates for such arithmetic PRGs that generalize the boolean candidates from [11, 28] and can be shown to resist known classes of attacks. While the security of these PRG candidates remains to be further studied, there are no apparent weaknesses that arise from increasing the field size.

The final, and most interesting, step is a constant-overhead protocol for vector-OLE. As noted above, the protocol obtained in this step is independently useful for applications, and our implementation of this protocol beats competing approaches not only asymptotically but also in terms of its concrete efficiency.

Our starting point is the code-based OLE protocol from [35, 46]. This protocol can be based on any randomized linear encoding scheme E over \(\mathbb {F}\) that has the following “LPN-style” pseudorandomness property: If we encode a message \(x\in \mathbb {F}\) and replace a small random subset of the symbols by uniformly random field elements, the resulting noisy codeword is pseudorandom. For most linear encoding schemes this appears to be a conservative assumption, since there are very few linear codes for which efficient decoding techniques are known. The OLE protocol proceeds by having Alice compute a random encoding \({\varvec{y}}=E(x)\) and send a noisy version \({\varvec{y}}'\) of \({\varvec{y}}\) to Bob. Bob returns \({\varvec{c}'} = a{\varvec{y}}'+{\varvec{b}}\) to Alice, where \({\varvec{b}}=E(b)\) is a random linear encoding of b. Knowing the noise locations, Alice can decode \(c=ax+b\) from \({\varvec{c}'}\) via erasure-decoding in the linear code defined by E. If we ignore the noise coordinates, \({\varvec{c}'}\) does not reveal to Alice any additional information about (a, b) except the output \(ax+b\). However, the noise coordinates can reveal more information. To prevent this information from being leaked, Alice uses oblivious transfer (OT) to selectively learn only the non-noisy coordinates of \({\varvec{c}'}\).

An attempt to extend the above protocol to the case of vector-OLE immediately encounters a syntactic difficulty. If the single value a is replaced by a vector \(\varvec{a}\), it is not clear how to “multiply” \({\varvec{y}}'\) by \(\varvec{a}\). A workaround taken in [35] is to use a “multiplicative” encoding scheme E based on Reed-Solomon codes. The encoding and decoding of these codes incurs a polylogarithmic computational overhead, and the high noise level required for defeating known list-decoding algorithms results in a poor concrete communication rate. The algebraic nature of the codes also makes the underlying intractability assumption look quite strong. It is therefore desirable to base a similar protocol on other types of linear codes.

Our first idea for using general linear codes is to apply “vector-OLE reversal.” Concretely, we apply a simple protocol for reducing vector-OLE to the computation of \({\varvec{a}}x+{\varvec{b}}\) where \({\varvec{a}}\) is the input of Bob, x and \({\varvec{b}}\) are the inputs of Alice, and the output is delivered to Bob. Now a general linear encoding E can be used by Bob to encode its input \({\varvec{a}}\), and since x is a scalar Alice can multiply the encoding by x and add an encoding of \({\varvec{b}}\). If we base E on a linear-time encodable and decodable code, such as the code of Spielman [51], the protocol can be implemented using only O(w) field operations. The problem with this approach is that the pseudorandomness assumption looks questionable in light of the existence of an efficient decoding algorithm for E. Even if the noise level can be chosen in a way that still respects linear-time erasure-decoding but makes error-correction intractable (which is not at all clear), this would require a high noise rate and hurt the concrete efficiency.

Our final idea, which may be of independent interest, is that instead of requiring a single encoding E to simultaneously satisfy both “hardness” and “easiness” properties, we can combine two types of encoding to enjoy the best of both worlds. Concretely, we present a general way to combine any linear code \(C_1\) that has a fast encoder and a cryptographic (“LPN-style”) pseudorandomness property with another linear code \(C_2\) that supports fast encoding and erasure-decoding (but has no useful hardness properties) to get a randomized linear encoding E that inherits the pseudorandomness feature from \(C_1\) and the efficiency features from \(C_2\). This is achieved by using a noisy output of \(C_1\) to mask the output of \(C_2\), which we pad with a sufficient number of 0s. Given the knowledge of the noise locations in the padding zone, the entire \(C_1\) component can be recovered in a “brute-force” way via Gaussian elimination, and one can then compute and decode the output of \(C_2\). When the expansion of E is sufficiently large, the Gaussian elimination is only applied to a short part of the encoding length and hence does not form an efficiency bottleneck. Using these ideas, we obtain a constant-overhead vector-OLE protocol under a seemingly conservative assumption, namely a natural arithmetic analogue of an assumption due to Alekhnovich [1] or a similar assumption for other linear-time encodable codes that do not have the special structure required for fast erasure-decoding.

1.3 Related Work

We first give an overview of known techniques for OLE (with semi-honest security) and compare to what can be obtained using our approach.

First, the work of Gilboa [26] (see also [37]) implements OLE in a field with n-bit elements using n oblivious transfers of field elements. The asymptotic communication complexity of this approach is larger than ours by a factor \(\varOmega (n)\).

In particular, if the goal is to implement vector-OLE, we can say something more concrete. Our vector-OLE implementation sends n/r bits to do one OLE on n-bit field elements, where r is the rate, which is between 1/5 and 1/10 for our implementation. The OT-based approach needs to send at least \(n^2\) bits to do the same. So in cases where network bandwidth is the bottleneck, we can expect to be faster than the OT-based approach by a factor nr. Our experiments indicate that this happens for network speeds around 20–50 Mbits/sec. In fact, even at high network speeds, our vector-OLE implementation outperforms the OT-based approach: the latest timings for semi-honest string OT on the type of architecture we used (2 desktop computers connected by a LAN) are from [12] (see also [37]) and indicate that one OT can be done in an amortised time of about 0.2 \(\upmu \)s, so that 0.2n \(\upmu \)s would be an estimate for the time needed for one OLE. In contrast, our times (for 100-bit security) are much smaller: even for the smallest case we considered (\(n=32\)) we have 0.7 \(\upmu \)s amortised time per OLE. For larger fields, the picture is similar; for instance, for \(n=1024\), we obtain 19.5 \(\upmu \)s per OLE, where the estimate for the OT-based technique is about 200 \(\upmu \)s.

A second class of OLE protocols can be obtained from homomorphic encryption schemes: one party encrypts his input under his own key and sends the ciphertext to the other party. He can now multiply his input “into the ciphertext” and randomize it appropriately before returning it for decryption. This will work based on Paillier encryption (see, e.g., [21] for an application of this) but will be very slow because exponentiation is required for the plaintext multiplication. A more efficient approach is to use (ring)-LWE based schemes, as worked out in [19] by Damgård et al. Here the asymptotic communication overhead is worse than ours by a poly-logarithmic factor, at least for prime fields if one uses the so-called SIMD variant where the plaintext is a vector of field elements. However, the approach becomes very problematic for extension fields of large degree because key generation requires that we find a cyclotomic polynomial that splits in a very specific way modulo the characteristic, and one needs to go to very large keys before this happens. Quantifying this precisely is non-trivial and was not done in [19], but as an example, the overhead in ciphertext size is a factor of about 7 for a 64-bit prime field, but is 1850 for \(\mathbb {F}_{2^8}\). Also, the computational overhead for ring-LWE based schemes is much higher than ours: even if we pack as many field elements, say \(\lambda \), into a ciphertext as possible, the overhead involved in encryption and decryption is superlinear in \(\lambda \). Furthermore, \(\lambda \) needs to grow with the field size; again, the asymptotic growth is hard to quantify exactly, but it is definitely superlogarithmic. In more concrete terms, the computational overhead of homomorphic encryption makes these protocols slower in practice than the pure OT-based approach (see [37]), which is in turn generally slower than our approach for the case of vector-OLE.

A final class of protocols is more closely related to ours, namely the code-based approach of Naor and Pinkas [46] and its generalizations from [35]. The most efficient instantiation of these protocols is based on an assumption on the pseudorandomness of noisy Reed-Solomon codewords, whereas we use codes generated from sparse matrices. Because encoding and decoding Reed-Solomon codes is not known to be in linear time, these protocols are asymptotically slower than ours by a poly-logarithmic factor. As for the communication, we obtain an asymptotic rate of 1/3 and can obtain a practical rate of around 1/4. The rate of the protocol from [35] is also constant but much smaller: one loses a factor of 2 because the protocol involves point-wise multiplication of codewords, so codewords need to be long enough to allow decoding of a Reed-Solomon code based on polynomials of double degree. Even more significantly, on top of the above, the distance needs to be increased (so the rate decreases) to protect against attacks that rely on efficient list-decoding algorithms for Reed-Solomon codes. This class of attacks does not apply to our approach, since it does not require the code for which the pseudorandomness assumption holds to have any algebraic structure.

2 Preliminaries

2.1 The Arithmetic Setting

Our formalization of secure arithmetic computation follows the one from [35], but simplifies it to account for the simpler setting of security against semi-honest adversaries. We also refine the computational model to allow for a more concrete complexity analysis. We refer the reader to [35] for more details.

Functionalities. We represent the functionalities that we want to realize securely via a multi-party variant of arithmetic circuits.

Definition 1

(Arithmetic circuits). An arithmetic circuit C has the same syntax as a standard boolean circuit, except that the gates are labeled by ‘+’ (addition), ‘−’ (subtraction) or ‘*’ (multiplication). Each input wire can be labeled by an input variable \(x_i\) or a constant \(c\in \{0,1\}\). Given a finite field \(\mathbb {F}\), an arithmetic circuit C with n inputs and m outputs naturally defines a function \(C^\mathbb {F}:\mathbb {F}^n\rightarrow \mathbb {F}^m\). An arithmetic functionality circuit is an arithmetic circuit whose inputs and outputs are labeled by party identities. In the two-party case, such a circuit C naturally defines a two-party functionality \(C^\mathbb {F}:\mathbb {F}^{n_1}\times \mathbb {F}^{n_2}\rightarrow \mathbb {F}^{m_1}\times \mathbb {F}^{m_2}\). We denote by \(C^\mathbb {F}(x_1,x_2)_P\) the output of Party P on inputs \((x_1,x_2)\).

Protocols and Complexity. To allow for a concrete complexity analysis, we view a protocol as a finite object that is generated by a protocol compiler (defined below). We assume that field elements have an adversarially chosen representation by \(\ell \)-bit strings, where the protocol can depend on \(\ell \) (but not on the representation). The representation is needed for realizing our protocols in the plain model. When considered as protocols in the OT-hybrid model, our protocols can be cast in the more restrictive arithmetic model of Applebaum et al. [5], where the parties do not have access to the bit-length of field elements or their representation, but can still perform field operations and communicate field elements over the OT channel. Protocols in this model have the feature that the number of field operations is independent of the field size.

By default, we model a protocol by a RAM program.Footnote 5 The choice of computational model does not change the number of field operations, which anyway dominates the overall cost as the field grows. In our theorem statements we will only refer to the number of field operations T, with the implicit understanding that all other computations can be implemented using \(O(T\ell )\) bit operations. (Note that \(T\ell \) bit operations are needed just for writing the outputs of T field operations.)

Protocol Compiler. A protocol compiler \(\mathcal P\) takes a security parameter \(1^k\), an arithmetic (two-party) functionality circuit C and bit-length parameter \(\ell \) as inputs, and outputs a protocol \(\varPi \) that realizes C given an oracle to any field \(\mathbb {F}\) whose elements are represented by \(\ell \)-bit strings. It should satisfy the following correctness and security requirements.

  • Correctness: For every choice of \(k,C,\mathbb {F},\ell \), any representation of elements of \(\mathbb {F}\) by \(\ell \)-bit strings, and every possible pair of inputs \((x_1,x_2)\) for C, the execution of \(\varPi \) on \((x_1,x_2)\) ends with the parties outputting \(C(x_1,x_2)\), except with negligible probability in k.

  • Security: For every polynomial-size non-uniform \(\mathcal {A}\) there is a negligible function \(\epsilon \) such that the success probability of \(\mathcal {A}\) in the following game is bounded by \(1/2+\epsilon (k)\):

    • On input \(1^k\), the adversary \(\mathcal {A}\) picks a functionality circuit C, positive integer \(\ell \) and field \(\mathbb {F}\) whose elements are represented by \(\ell \)-bit strings. The representation of field elements and field operations are implemented by a circuit \(\mathcal F\) output by \(\mathcal A\). (Note that all of the above parameters, including the complexity of the field operations, are effectively restricted to be polynomial in k.)

    • Let \(\varPi ^{\mathcal F}\) be the protocol returned by the compiler \(\mathcal P\) on \(1^k,C,\ell \), instantiating the field oracle \(\mathbb {F}\) using \(\mathcal F\).

    • \(\mathcal {A}\) picks a corrupted party \(P\in \{1,2\}\) and two input pairs \(x^0=(x^0_1,x^0_2)\), \(x^1=(x^1_1,x^1_2)\) such that \(C^\mathbb {F}(x^0)_P=C^\mathbb {F}(x^1)_P\).

    • Challenger picks a random bit b.

    • \(\mathcal {A}\) is given the view of Party P in \(\varPi ^{\mathcal F}(x^b)\) and outputs a guess for b.

OLE and Vector OLE. We will be particularly interested in the following two arithmetic computations: an OLE takes an input \(x\in \mathbb {F}\) from Alice and a pair \(a,b\in \mathbb {F}\) from Bob and delivers \(ax+b\) to Alice. Vector OLE of width w is similar, except that the input of Bob is a pair of vectors \({\varvec{a}},{\varvec{b}}\in \mathbb {F}^w\) and the output is \({\varvec{a}}x+{\varvec{b}}\). OLE and vector OLE can be viewed as arithmetic analogues of bit-OT and string-OT, respectively. Indeed, in the case \(\mathbb {F}=\mathbb {F}_2\), the OLE functionalities coincide with the corresponding OT functionalities up to a local relabeling of the inputs. An arithmetic generalization of the standard “GMW Protocol” [30, 35] compiles any arithmetic circuit functionality C into a perfectly secure protocol that makes \(O(s_\times )\) calls to an ideal OLE functionality, where \(s_\times \) is the number of multiplication gates, and O(|C|) field operations. Hence, to securely compute C with O(|C|) field operations in the plain model it suffices to realize n instances of OLE using O(n) field operations.
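For concreteness, the following toy sketch (our illustration; the ideal OLE is modeled as a plain function) shows the core step of the arithmetic GMW approach: from additive shares of x and y, the parties obtain additive shares of xy using two OLE calls, while addition and subtraction gates are handled locally on the shares.

```python
# Minimal sketch of the arithmetic-GMW multiplication gate over a toy prime field.
import random

P = 2**61 - 1   # toy prime field

def ole(x, a, b):                  # ideal OLE: Alice inputs x, Bob inputs (a, b), Alice learns a*x + b
    return (a * x + b) % P

def share(v):                      # additive secret sharing of v over F_P
    v1 = random.randrange(P)
    return v1, (v - v1) % P

def mult_gate(x1, y1, x2, y2):
    """Alice holds shares (x1, y1), Bob holds (x2, y2); output shares of (x1+x2)*(y1+y2)."""
    r1, r2 = random.randrange(P), random.randrange(P)    # Bob's fresh masks
    t1 = ole(x1, y2, r1)           # Alice learns x1*y2 + r1
    t2 = ole(y1, x2, r2)           # Alice learns x2*y1 + r2
    z1 = (x1 * y1 + t1 + t2) % P   # Alice's output share
    z2 = (x2 * y2 - r1 - r2) % P   # Bob's output share
    return z1, z2

x1, x2 = share(7)
y1, y2 = share(11)
z1, z2 = mult_gate(x1, y1, x2, y2)
assert (z1 + z2) % P == 7 * 11
```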

2.2 Decomposable Affine Randomized Encoding (DARE)

Let \(f: \mathbb {F}^{\ell }\rightarrow \mathbb {F}^{m}\) where \(\mathbb {F}\) is some finite field.Footnote 6 We say that a function \(\hat{f}: \mathbb {F}^{\ell } \times \mathbb {F}^{\rho } \rightarrow \mathbb {F}^{m}\) is a perfect randomized encoding [8, 32] of f if for every input \(x\in \mathbb {F}^{\ell }\), the distribution \(\hat{f}(x;r)\) induced by a uniform choice of \(r\in \mathbb {F}^{\rho }\), “encodes” the string f(x) in the following sense:

  1. 1.

    (Correctness) There exists a decoding algorithm \(\mathsf {Dec}\) such that for every \(x\in \mathbb {F}^{\ell }\) and every \(r\in \mathbb {F}^{\rho }\) it holds that \(\mathsf {Dec}(\hat{f}(x;r))=f(x)\).

  2. 2.

    (Privacy) There exists a randomized algorithm \(\mathcal {S}\) such that for every \(x\in \mathbb {F}^{\ell }\) and a uniformly chosen \(r\in \mathbb {F}^{\rho }\) it holds that

    $$\begin{aligned} \mathcal {S}(f(x)) \quad \text {is distributed identically to } \quad \hat{f}(x;r). \end{aligned}$$

We say that \(\hat{f}(x;r)\) is decomposable and affine if \(\hat{f}\) can be written as \(\hat{f}(x;r)=(\hat{f}_0(r),\hat{f}_1(x_1;r),\ldots ,\hat{f}_{\ell }(x_{\ell };r))\) where \(\hat{f}_i\) is affine in \(x_i\), i.e., it can be written as \(\varvec{a}_i x_i+\varvec{b}_i\) where the vectors \(\varvec{a}_i\) and \(\varvec{b}_i\) arbitrarily depend on the randomness r.
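As a standard worked example (our illustration; it is not spelled out in this section), the single multiplication \(f(x,y)=xy\) over a prime field admits the following perfect DARE with four components, each affine in a single input; the sketch also shows the decoder and the perfect simulator.

```python
# Perfect DARE for f(x, y) = x*y over a toy prime field (a classical construction).
import random

P = 2**61 - 1   # toy prime field

def encode(x, y, r1, r2, r3):
    # four components, each affine in exactly one of the inputs x, y
    return ((x + r1) % P,
            (y + r2) % P,
            (r2 * x + r3) % P,
            (r1 * y + r1 * r2 - r3) % P)

def decode(c1, c2, c3, c4):
    return (c1 * c2 - c3 - c4) % P          # equals x*y for every choice of r1, r2, r3

def simulate(xy):
    # privacy: the encoding can be sampled from the output x*y alone
    c1, c2, c3 = (random.randrange(P) for _ in range(3))
    return c1, c2, c3, (c1 * c2 - c3 - xy) % P

x, y = 12345, 67890
r = [random.randrange(P) for _ in range(3)]
assert decode(*encode(x, y, *r)) == (x * y) % P
```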

It follows from [33] (cf. [10]) that every single-output function \(f:\mathbb {F}^d \rightarrow \mathbb {F}\) which can be computed by a constant-depth circuit (aka an \(\mathbf {{NC^0}}\) function) admits a decomposable encoding which can be encoded and decoded by an arithmetic circuit of finite complexity D which depends only on the circuit depth. Note that any multi-output function can be encoded by concatenating independent randomized encodings of the functions defined by its outputs. Thus, we have the following:

Fact 1

Let \(f:\mathbb {F}^{\ell }\rightarrow \mathbb {F}^{m}\) be an \(\mathbf {{NC^0}}\) function. Then, f has a DARE \(\hat{f}\) which can be encoded, decoded and simulated by an arithmetic circuit of size O(m) where the constant in the big-O notation depends on the circuit depth.Footnote 7

We mention that the circuits for the encoder, decoder, and simulator can all be constructed efficiently given the circuit for f.

3 Vector OLE of Large Width

In this section, our goal is to construct a semi-honest secure protocol for Vector OLE of width w over the field \(\mathbb {F}=\mathbb {F}_p\) for parties Alice and Bob.

As a stepping stone, we will first implement a “reversed” version of this functionality that can easily be turned into what we actually want: in the Reverse vector OLE functionality, Bob has input \(\varvec{a}\in \mathbb {F}^w\), while Alice has input \(x\in \mathbb {F}, \varvec{b}\in \mathbb {F}^w\), and the functionality outputs nothing to Alice and \(\varvec{a}\cdot x + \varvec{b}\) to Bob. The latter will be based on a special gadget (referred to as a fast hard/easy code) that allows fast encoding and decoding under erasures but semantically hides the encoded messages in the presence of noise. We first describe this gadget and then give the protocol.

3.1 Ingredients

The main ingredient we need is a public matrix M over \(\mathbb {F}\) with the following pseudorandomness property: If we take a random vector \(\varvec{y}\) in the image of M, and perturb it with “noise”, the resulting vector \(\hat{\varvec{y}}\) is computationally indistinguishable from a truly random vector over \(\mathbb {F}\). Our noise distribution corresponds to the p-ary symmetric channel with crossover probability \(\mu \), that is, \(\hat{\varvec{y}}=\varvec{y}+\varvec{e}\) where each coordinate of \(\varvec{e}\) is independently assigned the value zero with probability \(1-\mu \) and a uniformly chosen non-zero element from \(\mathbb {F}\) with probability \(\mu \). We let \({\mathcal D}(\mathbb {F})_{\mu }^{t}\) denote the corresponding noise distribution for vectors of length t (and occasionally omit the parameters \(\mathbb {F},\mu \) and t when they are clear from the context). For concreteness, the reader may think of \(\mu \) as 1/4. The properties needed for our protocol are summarized in the following assumption, which will be discussed in Sect. 7.
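A few lines of Python make the noise distribution concrete (a toy sketch; the prime p = 101 and the vector length are illustrative choices of ours):

```python
# Sampling from D(F)_mu^t, the p-ary symmetric channel described above (toy parameters).
import random

P, MU = 101, 0.25          # illustrative prime field and noise rate

def noise(t, mu=MU):
    """Each coordinate is 0 w.p. 1 - mu and a uniform nonzero field element w.p. mu."""
    return [0 if random.random() < 1 - mu else random.randrange(1, P) for _ in range(t)]

e = noise(20)              # e.g. a noisy codeword would then be (M @ r + e) mod P
```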

Assumption 2

(Fast pseudorandom matrix). There exists a constant \(\mu <1/2\) and an efficient randomized algorithm \({\mathcal M}\) that given a security parameter \(1^k\) and a field representation \(\mathbb {F}\), samples a \(k^3 \times k\) matrix M over \(\mathbb {F}\) such that the following holds:

  1. 1.

    (Linear-time computation) The mapping \(f_M:\varvec{x}\mapsto M \varvec{x}\) can be computed in linear-time in the output length. Formally, we assume that the sampler outputs a description of an \(O(k^3)\)-size arithmetic circuit over \(\mathbb {F}\) for computing \(f_M\).

  2. 2.

    (Pseudorandomness) The following ensembles (indexed by k) are computationally indistinguishable for \(\mathrm {poly}(k)\) adversaries

    $$\begin{aligned} (M, M\varvec{r} + \varvec{e}) \qquad \text {and} \qquad (M, \varvec{z}), \end{aligned}$$

    where \(\varvec{r}\) is uniform over \(\mathbb {F}^{k}\), \(\varvec{e}\) is drawn from \({\mathcal D}(\mathbb {F})_{\mu }^{k^3}\), and \(\varvec{z}\) is uniform over \(\mathbb {F}^{k^3}\).

  3. 3.

    (Linear independence) If we sample M as above and keep each of the first \(k \log ^2 k\) rows independently with probability \(\mu \) (and remove all other rows), then, except with negligible probability in k, the resulting matrix has full rank.

We will also need a linear error correcting code \(\mathsf {Ecc}\) over \(\mathbb {F}\) with constant rate R and linear time encoding and decoding, where we only need decoding from a constant fraction of erasures \(\mu '\) which is slightly larger than the noise rate \(\mu \). (For \(\mu =1/4\) we can take \(\mu '=1/3\).) Such codes are known to exist (cf. [51]) and can be efficiently constructed given a black-box access to \(\mathbb {F}\).

Fast Hard/Easy Code. We combine the “fast code” \(\mathsf {Ecc}\) and the “fast pseudorandom code” \({\mathcal M}\) into a single gadget that provides fast encoding and decoding under erasures, but hides the encoded message when delivered through a noisy channel. The gadget supports messages of length \(w=\varTheta (k^3)\). Our gadget is initialized by sampling a \(k^3 \times k\) matrix M over \(\mathbb {F}\) using the randomized algorithm \({\mathcal M}\) promised in Assumption 2. We view the matrix M as being composed of two matrices \(M^{\mathsf {top}}\) with \(u=2k \log ^2 k\) rows and k columns, placed above \(M^{\mathsf {bottom}}\) which has \(v=k^3-u\) rows and k columns. Let \(w=Rv=\varTheta (k^3)\) be a message length parameter (corresponding to the width of the vector-OLE). Note that our \(\mathsf {Ecc}\) encodes vectors of length w into vectors of length v.

For a message \(\varvec{a}\in \mathbb {F}^{w}\), and random vector \(\varvec{r}\in \mathbb {F}^k\), define the mapping

$$\begin{aligned} E_{\varvec{r}}(\varvec{a})= M\varvec{r} + (0^u \circ \mathsf {Ecc}(\varvec{a})), \end{aligned}$$

where \(\circ \) denotes concatenation (and so \((0^u \circ \mathsf {Ecc}(\varvec{a}))\) is a vector of length \(u+v\)). We will make use of the following useful properties of E:

  1. 1.

    (Fast and Linear) The mapping \(E_{\varvec{r}}(\varvec{a})\) can be computed by making only \(O(k^3)\) arithmetic operations. Moreover, it is a linear function of \((\varvec{r},\varvec{a})\) and so \(E_{\varvec{r}}(\varvec{a})+E_{\varvec{r'}}(\varvec{a'})=E_{\varvec{r}+\varvec{r'}}(\varvec{a}+\varvec{a'})\).

  2. 2.

    (Hiding under errors) For any message \(\varvec{a}\), a uniformly random \(\varvec{r}\in \mathbb {F}^k\), and noise \(\varvec{e}\) drawn from \({\mathcal D}(\mathbb {F})^{u+v}_{\mu }\), the vector \(E_{\varvec{r}}(\varvec{a}) + \varvec{e}\) is pseudorandom and, in particular, it computationally hides \(\varvec{a}\).

  3. 3.

    (Fast decoding under erasures) Given a random \((1-\mu )\)-subset I of the coordinates of \(\varvec{z}=E_{\varvec{r}}(\varvec{a})\) (i.e., each coordinate is erased independently with probability \(\mu \)) it is possible to recover the vector \(\varvec{a}\), with negligible error probability, by making only \(O(|\varvec{z}|)=O(k^3)\) arithmetic operations. Indeed, letting \(I_0\) (resp., \(I_1\)) denote the coordinates received from the u-prefix of \(\varvec{z}\) (resp., the v-suffix of \(\varvec{z}\)), we first recover \(\varvec{r}\) by solving the linear system \(\varvec{z}_{I_0}=(M^{\mathsf {top}} \varvec{r})_{I_0}\) via Gaussian elimination in \(O(k^3)\) arithmetic operations. By Assumption 2 (property 3) the system is likely to have a unique solution. Then we compute \((M^{\mathsf {bottom}}\varvec{r})_{I_1}\) in time \(O(k^3)\), subtract it from \((E_{\varvec{r}}(\varvec{a}))_{I_1}\) to get \(\mathsf {Ecc}(\varvec{a})_{I_1}\), from which \(\varvec{a}\) can be recovered by erasure decoding in time \(O(k^3)\).

3.2 From Fast Hard/Easy Code to Reverse Vector-OLE

Our protocol uses the gadget E to implement a reversed vector-OLE. In the following we assume that the parties have access to a variant of Oblivious Transfer of field elements, which we assume (for now) is given as an ideal functionality. To be precise, the variant we need is one where Alice sends a field element f, and Bob chooses either to receive f or to receive nothing, while Alice learns nothing new.

We describe the protocol under the assumption that the width w is taken to be \(\varTheta (k^3)\). A general value of w will be treated either by padding or by partitioning into smaller blocks of size \(O(k^3)\) each. (See the proof of Theorem 3.)

Construction 1

(Reverse Vector OLE protocol). To initialize the protocol, one of the parties samples the matrix M using the algorithm \({\mathcal M}(1^k,\mathbb {F})\) from Assumption 2 and publishes it. The gadget E (and the parameters u, v and w) are now defined based on M and k as described above.

  1. 1.

    Bob has input \(\varvec{a}\in \mathbb {F}^w\). He selects random \(\varvec{r}\in \mathbb {F}^k\), chooses \(\varvec{e}\) according to \({\mathcal D}(\mathbb {F})^{u+v}_{\mu }\) and sends to Alice the vector \(\varvec{c} = E_{\varvec{r}}(\varvec{a}) + \varvec{e} \).

  2. 2.

    Alice has input \(x, \varvec{b}\). She chooses \(\varvec{r'}\in \mathbb {F}^k\) at random and computes \(\varvec{d} = x\cdot \varvec{c} + E_{\varvec{r'}}(\varvec{b}) \).

  3. 3.

    Let I be the index set that contains those indices i for which \(\varvec{e}_i =0\). These are called the noise-free positions in the following. The parties now execute, for each entry i in \(\varvec{d}\), an OT where Alice sends \(\varvec{d}_i\). If \(i\in I\), Bob chooses to receive \(\varvec{d}_i\), otherwise he chooses to receive nothing.

  4. 4.

    Notice that, since the function E is linear, we have

    $$\varvec{d} = E_{x\varvec{r} +\varvec{r'}}(x\varvec{a} + \varvec{b}) + x\varvec{e}.$$

    Using subscript-I to denote restriction to the noise-free positions, what Bob has just received is

    $$\begin{aligned} \varvec{d}_I= (E_{\varvec{s}}(x\varvec{a}+\varvec{b}))_I, \end{aligned}$$

    where \(\varvec{s}= x\varvec{r} +\varvec{r'}\). Using the fast-decoding property of E (property 3), Bob recovers the vector \(x\varvec{a}+\varvec{b}\) (by making \(O(k^3)\) arithmetic operations) and outputs \(x\varvec{a} + \varvec{b}\).
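The following self-contained toy sketch (our illustration, with small made-up parameters; random matrices plus generic Gaussian-elimination decoding stand in for the linear-time codes M and \(\mathsf {Ecc}\), and the OTs are modeled as an ideal selection of the noise-free coordinates) walks through the gadget E and the message flow of Construction 1.

```python
# Toy sketch of the gadget E and of Construction 1 over a small prime field.
import random

P = 101                       # toy prime field F_p
K, U, V, W = 8, 32, 60, 30    # toy analogues of k, u, v, w (paper: u + v = k^3, w = R*v)
MU = 0.25                     # noise rate
rng = random.Random(1)

def rand_matrix(rows, cols):
    return [[rng.randrange(P) for _ in range(cols)] for _ in range(rows)]

M = rand_matrix(U + V, K)     # stand-in for the pseudorandom matrix of Assumption 2
G_ecc = rand_matrix(V, W)     # stand-in generator matrix for Ecc

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) % P for row in A]

def encode(msg, r):
    """E_r(msg) = M*r + (0^u || Ecc(msg))."""
    ecc = matvec(G_ecc, msg)
    return [(m + (ecc[i - U] if i >= U else 0)) % P for i, m in enumerate(matvec(M, r))]

def solve(A, b):
    """Solve A*x = b mod P by Gaussian elimination (A assumed to have full column rank)."""
    rows = [row[:] + [bi] for row, bi in zip(A, b)]
    n = len(A[0])
    for col in range(n):
        piv = next(i for i in range(col, len(rows)) if rows[i][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = pow(rows[col][col], P - 2, P)
        rows[col] = [v * inv % P for v in rows[col]]
        for i in range(len(rows)):
            if i != col and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(v - f * w) % P for v, w in zip(rows[i], rows[col])]
    return [rows[i][n] for i in range(n)]

def decode(z_I, I):
    """Erasure-decode E from its noise-free coordinates (property 3 of the gadget)."""
    top = [i for i in I if i < U]
    r = solve([M[i] for i in top], [z_I[i] for i in top])             # recover r from the top part
    bot = [i for i in I if i >= U]
    ecc_syms = [(z_I[i] - matvec([M[i]], r)[0]) % P for i in bot]     # strip M^bottom * r
    return solve([G_ecc[i - U] for i in bot], ecc_syms)               # erasure-decode Ecc (generic here)

# ---- Construction 1 (reverse vector-OLE): Bob holds a, Alice holds (x, b); Bob learns x*a + b
a = [rng.randrange(P) for _ in range(W)]                              # Bob's input
x, b = rng.randrange(P), [rng.randrange(P) for _ in range(W)]         # Alice's input

r = [rng.randrange(P) for _ in range(K)]                              # step 1 (Bob)
e = [0 if rng.random() < 1 - MU else rng.randrange(1, P) for _ in range(U + V)]
c = [(ci + ei) % P for ci, ei in zip(encode(a, r), e)]                # Bob -> Alice

r2 = [rng.randrange(P) for _ in range(K)]                             # step 2 (Alice)
d = [(x * ci + di) % P for ci, di in zip(c, encode(b, r2))]

I = [i for i in range(U + V) if e[i] == 0]                            # step 3: via the OTs, Bob
d_received = {i: d[i] for i in I}                                     # receives d_i only for i in I

out = decode(d_received, I)                                           # step 4 (Bob)
assert out == [(x * ai + bi) % P for ai, bi in zip(a, b)]             # holds except w.neg. probability
```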

We are now ready to show that the reverse vector OLE protocol works:

Lemma 1

Suppose that Assumption 2 holds. Then Construction 1 implements the Reverse Vector-OLE functionality of width \(w=\varTheta (k^3)\) over \(\mathbb {F}\) with semi-honest and computational security in the OT-hybrid model. Furthermore, ignoring the cost of initialization, the arithmetic complexity of the protocol is O(w).

Proof

The running time follows easily by inspection of the protocol. We prove correctness. By Assumption 2 (property 3), except with negligible probability Bob recovers the vector \(\varvec{s}\) correctly. Also, by a Chernoff bound, except with negligible probability the v-suffix of the error vector \(\varvec{e}\) contains at most \(\mu ' v\) non-zero coordinates. Therefore, the decoding procedure of the error-correcting code succeeds.

As for privacy, consider first the case where Alice is corrupt. We can then simulate Bob’s message with a random vector in \(\mathbb {F}^{u+v}\) which will be computationally indistinguishable by Assumption 2. If Bob is corrupt, we can simulate what Bob receives in OTs given Bob’s output \(x\varvec{a} + \varvec{b}\), namely we compute \(\varvec{f}= E_{\varvec{s}}(x\varvec{a} + \varvec{b})\) for a random \(\varvec{s}\) and sample a set I as in the protocol (each coordinate \(i\in [k^3]\) is chosen with probability \(1-\mu \)). Then for the OT in position i, we let Bob receive \(\varvec{f}_i\) if \(i\in I\) and nothing otherwise. This simulates Bob’s view perfectly, since in the real protocol \(\varvec{s}= x\varvec{r} +\varvec{r'}\) is indeed uniformly random, and the received values for positions in I do not depend on x or \(\varvec{e}\), only on \(\varvec{s}\) and Bob’s output.   \(\square \)

3.3 From Reverse Vector-OLE to Vector-OLE

Finally, to get a protocol for the vector OLE Functionality, note that we can get such a protocol from the Reverse vector OLE functionality:

Construction 2

(vector-OLE Protocol). Given an input \(\varvec{a}, \varvec{b}\in \mathbb {F}^w\) for Bob, and \(x\in \mathbb {F}\) for Alice, the parties do the following:

  1. 1.

    Call the Reverse Vector-OLE functionality, where Bob uses input \(\varvec{a}\) and Alice uses input x and a randomly chosen \(\varvec{b'}\in \mathbb {F}^w\). As a result, Bob will receive \(x\varvec{a} + \varvec{b'}\).

  2. 2.

    Bob sends \(\varvec{b} + (x\varvec{a} + \varvec{b'})\) to Alice. Now, Alice outputs \(\varvec{b} + (x\varvec{a} + \varvec{b'}) - \varvec{b'} = x\varvec{a} + \varvec{b}\).

It is trivial to show that this implements the vector-OLE functionality with perfect security. Combining the above with Lemma 1, we derive the following theorem.

Theorem 3

Suppose that Assumption 2 holds. Then, there exists a protocol that implements the vector-OLE functionality of width w over \(\mathbb {F}\) with semi-honest computational security in the OT-hybrid model with arithmetic complexity of \(O(w)+\mathrm {poly}(k)\).

Proof

For \(w<k^3\), the theorem follows directly from Construction 2 and Lemma 1 (together with the standard composition theorem for secure computation). The more general case (where w is larger) follows by reducing a long w-vector OLE to t calls to \(w_0\)-vector OLE for \(w_0=\varTheta (k^3)\) and \(t=w/w_0\). Since initialization is only performed once (with a one-time \(\mathrm {poly}(k)\) cost) and M is re-used, the overall complexity is \(\mathrm {poly}(k)+O(tw_0)=\mathrm {poly}(k)+O(w)\) as claimed.   \(\square \)

Remark 1

(Implementing the OTs). First, note that the OT variant we need can be implemented efficiently for large fields as follows: Alice chooses a short seed for a PRG and, to send a field element f, she sends \(f \oplus PRG(seed)\) and then does an OT where she offers Bob seed and a random value. If Bob wants to receive f, he chooses to get seed; otherwise he chooses the random value.

Our protocol employs O(w) such OTs on field elements, or equivalently, on strings of length \(\log |\mathbb {F}|\) bits. For sufficiently long strings (i.e., \(w=\mathrm {poly}(k)\) for a sufficiently large polynomial) one can get these OTs very cheaply, both practically and theoretically.

Indeed, the implementation we described (which is similar to an observation from [34]) can be done with an optimal asymptotic complexity of \(O(w \log |\mathbb {F}|)\) bit operations assuming the existence of a linear-stretch pseudorandom generator \(G:\{0,1\}^k \rightarrow \{0,1\}^{2k}\) which is computable in linear time O(k). Moreover, such a generator can be based on the binary version of Assumption 2, as follows from [9]. In practice, we can get the OTs very efficiently via OT extension and perhaps (for very large fields) using a PRG based on AES, which is extremely efficient on modern Intel CPUs.
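A minimal sketch of the OT variant from Remark 1 (our assumed illustration, not the paper's implementation): to transfer a long field element, Alice masks it with a PRG output and the parties run a standard 1-out-of-2 OT on the short seed. The SHA-256 counter-mode PRG and the ideal OT below are stand-ins.

```python
# Receive-or-nothing transfer of a long field element from a short-seed 1-out-of-2 OT.
import hashlib, random

def prg(seed, nbytes):                # toy counter-mode PRG stand-in
    out, ctr = b"", 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:nbytes]

def ot(m0, m1, choice):               # ideal 1-out-of-2 OT on short strings
    return m1 if choice else m0

def transfer(f, receive):
    """Alice sends field element f (as bytes); Bob gets f iff receive is True."""
    seed, dummy = random.randbytes(16), random.randbytes(16)
    ciphertext = bytes(a ^ b for a, b in zip(f, prg(seed, len(f))))   # Alice -> Bob
    got = ot(seed, dummy, 0 if receive else 1)                        # Bob picks the seed iff he wants f
    return bytes(a ^ b for a, b in zip(ciphertext, prg(got, len(f)))) if receive else None

elem = random.randbytes(128)          # a 1024-bit field element
assert transfer(elem, True) == elem and transfer(elem, False) is None
```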

Remark 2

(On the achievable rate). Note that the full vector OLE protocol communicates \(u+v\) field elements, does \(u+v\) OTs and finally sends w field elements. The rate is defined as the size of the output (w) divided by the communication complexity. Now, asymptotically, we can ignore u since it is o(v). Furthermore, v is the length of the code \(\mathsf {Ecc}\), which needs to be about \(w/(1-\mu )\) to allow for erasure decoding w values from a fraction of \(\mu \) random erasures. By the previous remark, an OT can be done at rate 1, so it counts as 1 field element. So we find that the rate asymptotically at best approaches \((1-\mu )/(3-\mu )\) (i.e., \(3/11\approx 1/4\) for \(\mu =1/4\)). If we are willing to believe that Assumption 2 holds for any constant error rate (and large enough code length k) then we can obtain rate approaching \(1/3-\epsilon \) for any constant \(\epsilon > 0\).
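In symbols, ignoring the o(v) communication of the u-part and counting each rate-1 OT as one transmitted field element, the total communication for an output of w field elements is roughly \(2v+w\) field elements, so that

$$\begin{aligned} \text {rate} \approx \frac{w}{2v+w} = \frac{w}{\frac{2w}{1-\mu }+w} = \frac{1-\mu }{3-\mu }, \end{aligned}$$

which indeed gives \(3/11 \approx 1/4\) for \(\mu =1/4\).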

4 Batch-OLEs

In this section we implement n copies of OLE (of width 1) with constant computational overhead based on vector-OLE with constant computational overhead and a polynomial-stretch arithmetic pseudorandom generator of constant depth. The transformation is similar to the one described in [34] for the binary setting, and is based on a combination of Beaver's OT extension [13] with a decomposable randomized encoding.

4.1 From Vector-OLE to \(\mathbf {{NC^0}}\) Functionalities

We begin by observing that local functionalities can be reduced to vector-OLE with constant computational overhead. This follows from an arithmetic variant of Yao's protocol [54] where the garbled circuit is replaced with a fully-decomposable randomized encoding. For simplicity, we restrict our attention to functionalities in which only the first party, Alice, gets the output.

Lemma 2

Let \(\mathbb {F}\) be a finite field and let f be a two-party \(\mathbf {{NC^0}}\) functionality that takes \(\ell _1\) field elements from the sender, \(\ell _2\) field elements from the receiver, and delivers m field elements to the receiver. Then, we can securely compute f with information-theoretic security in the semi-honest model with arithmetic complexity of O(m) and by making \(O(\ell _2)\) calls to an ideal \(O(m/\ell _2)\)-width OLE.

The constant in the big-O notation depends on the circuit depth of f.

Proof

View f as a function over \(\mathbb {F}^{\ell }\) where \(\ell =\ell _1+\ell _2\). By Fact 1, there exists a DARE \(\hat{f}\) which can be encoded and decoded by an O(m)-size arithmetic circuit. Recall that

$$\begin{aligned} \hat{f}(x;r)=(\hat{f}_0(r),(\hat{f}_i(x_i;r))_{i\in [\ell ]}), \quad \text {where }\ \hat{f}_i(x_i;r)=x_i\varvec{a}_i(r)+\varvec{b_i}(r). \end{aligned}$$

Since the encoding is computable by an O(m)-size circuit, it is also possible to take r and collectively compute \((\varvec{a}_i(r),\varvec{b}_i(r))_{i\in [\ell ]}\) by O(m) arithmetic operations. Also, the total length of these vectors is O(m).

Let us denote by \(A \cup B\) the partition of \([\ell ]\) into the inputs given to Alice and the inputs given to Bob, so that \(|A|=\ell _2\) and \(|B|=\ell _1\). Let \(w=m/\ell _2\) and assume an ideal vector OLE of width w. Given an input \(x_A\) for Alice and \(x_B\) for Bob, the parties use Yao's garbled-circuit protocol to compute f as follows:

  • Bob selects randomness for the encoding and sends \(\hat{f}_0(r)\) together with \((\hat{f}_{i}(x_i;r))_{i \in B}\).

  • For every \(i\in A\) the parties invoke width-w OLE where Alice uses \(x_i\) as her input and Bob uses \((\varvec{a}_i(r), \varvec{b}_i(r))\) as his inputs. If the length \(W_i\) of \(\varvec{a}_i(r)\) and \(\varvec{b}_i(r)\) is larger than w, the vectors are partitioned into w-size blocks and the parties use \(\lceil W_i/w \rceil \) calls to w-width OLE. (In the j-th call Bob uses the j-th block of \((\varvec{a}_i(r), \varvec{b}_i(r))\) as his input and Alice uses \(x_i\) as her input.)

  • Finally, Alice aggregates the encoding \(\hat{f}(x;r)\), applies the decoder and recovers the output f(x).

It is not hard to verify that both parties can be implemented by making at most O(m) arithmetic operations. (In fact, they can be implemented by O(m)-size arithmetic circuits.) Moreover, the number of calls to the width-w vector-OLE is \(\sum _{i\in A} \lceil W_i/w \rceil =O(m/w)=O(\ell _2)\). The correctness of the protocol follows from the correctness of the DARE. Assuming perfect OLE, the protocol provides perfect security for Bob (who gets no message during the protocol) and for Alice (whose view can be trivially simulated using the perfect simulator of the DARE).   \(\square \)

4.2 From Pseudorandom-OLE to OLE

The following lemma is an arithmetic variant of Beaver’s reduction from batch-OT to OT with “pseudorandom” selection bits.

Lemma 3

Let \(G:\mathbb {F}^k \rightarrow \mathbb {F}^n\) be a pseudorandom generator. Consider the two-party functionality g that takes a seed \(\varvec{s}\in \mathbb {F}^k\) from Alice and n pairs of field elements \((a_i,b_i), i\in [n]\), from Bob and delivers to Alice the values \(y_ia_i+b_i\), \(i\in [n]\), where \(\varvec{y}=G(\varvec{s})\). Then, in the g-hybrid model it is possible to securely compute n copies of OLE of width 1 with semi-honest computational security and complexity of O(n) arithmetic operations and a single call to g.

Proof

Let \(\varvec{x}=(x_i)_{i\in [n]}\) be Alice’s input and let \((a_i,b_i), i\in [n]\) be Bob’s input.

  1. 1.

    Alice and Bob call the protocol for g where Alice uses a random seed \(\varvec{s}\in \mathbb {F}^k\) as her input and Bob uses the pairs \((a_i,c_i), i\in [n]\), where the \(c_i\in \mathbb {F}\) are chosen uniformly at random. Alice gets back the values \(u_i=y_i a_i+c_i\) for \(i\in [n]\).

  2. 2.

    Alice sends to Bob the values \(\varDelta _i=x_i-y_i\), for every \(i\in [n]\).

  3. 3.

    Bob responds with \(v_i=\varDelta _i a_i+(b_i-c_i)\) for every \(i\in [n]\).

  4. 4.

    Alice outputs \(z_i=u_i+v_i\) for every \(i\in [n]\).

It is not hard to verify that correctness holds, i.e., \(z_i=x_i a_i+b_i\). To prove security, observe that Alice's view, which consists of \((\varvec{x},\varvec{s},\varvec{u},\varvec{v})\), can be perfectly simulated. Indeed, given an input \(\varvec{x}\) and an output \(\varvec{z}\): sample a uniformly random seed \(\varvec{s}\) together with a uniformly random \(\varvec{u}\in \mathbb {F}^n\), and set \(\varvec{v}=\varvec{z}-\varvec{u}\). As for Bob, his view consists of \(\varvec{a},\varvec{b},\varvec{c}\) and a pseudorandom string \(\varvec{\varDelta }\). We can therefore simulate Bob's view by sampling \(\varvec{\varDelta }\) (and \(\varvec{c}\)) uniformly at random.    \(\square \)
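A minimal sketch of this reduction (our illustration, with a toy, insecure PRG stand-in and the functionality g modeled as an ideal function) for a small batch:

```python
# Beaver-style reduction of n OLEs to one call to the pseudorandom-OLE functionality g.
import random

P = 2**61 - 1
N, K = 4, 2                       # toy batch size n and seed length k

def prg(seed):                    # toy stand-in for G: F^k -> F^n (NOT a secure PRG)
    rng = random.Random(str(seed))
    return [rng.randrange(P) for _ in range(N)]

def g(seed, pairs):               # ideal functionality: Alice learns y_i*a_i + c_i with y = G(seed)
    y = prg(seed)
    return [(y[i] * a + c) % P for i, (a, c) in enumerate(pairs)]

x = [random.randrange(P) for _ in range(N)]          # Alice's OLE inputs
a = [random.randrange(P) for _ in range(N)]          # Bob's OLE inputs
b = [random.randrange(P) for _ in range(N)]

seed = [random.randrange(P) for _ in range(K)]       # step 1: Alice's random seed
c = [random.randrange(P) for _ in range(N)]          #         Bob's random masks c_i
u = g(seed, list(zip(a, c)))                         #         u_i = y_i*a_i + c_i
y = prg(seed)                                        #         Alice expands her seed locally
delta = [(x[i] - y[i]) % P for i in range(N)]        # step 2: Alice -> Bob
v = [(delta[i] * a[i] + b[i] - c[i]) % P for i in range(N)]   # step 3: Bob -> Alice
z = [(u[i] + v[i]) % P for i in range(N)]            # step 4: Alice's outputs

assert z == [(x[i] * a[i] + b[i]) % P for i in range(N)]
```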

4.3 From \(\mathbf {{NC^0}}\) PRG to Batch-OLE

To get our final result, we need a polynomial-stretch \(\mathbf {{NC^0}}\) arithmetic pseudorandom generator. In fact, it suffices to have a collection of such PRGs.

Assumption 4

(polynomial-stretch \(\varvec{\mathbf {{NC^0}}}\) PRG (arithmetic version)). There exists a polynomial-time algorithm that given \(1^k\) and a field representation \(\mathbb {F}\) samples an \(\mathbf {{NC^0}}\) mapping \(G:\mathbb {F}^k\rightarrow \mathbb {F}^{k^2}\) (represented by a circuit) such that with all but negligible probability G is a pseudorandom generator against \(\mathrm {poly}(k)\) adversaries.

Assumption 4 is discussed in Sect. 7. For now, let us mention that similar assumptions were made in the binary setting and known binary candidates have natural arithmetic variants.

Combining Lemmas 2 and 3, we get the following theorem.

Theorem 5

Suppose that Assumption 4 holds. Then, it is possible to securely compute n copies of OLE over \(\mathbb {F}\) in the semi-honest model by making O(n / k) calls to ideal O(k)-width OLE and \(O(n)+\mathrm {poly}(k)\) additional arithmetic operations.

Proof

Let \(t=n/k^2\). Implement the OLEs using t batches, each of size \(k^2\). By Lemmas 2 and 3, each such batch can be implemented by making k calls to ideal O(k)-width OLE and \(O(k^2)\) additional arithmetic operations. Since the initialization of the pseudorandom generator has a one-time \(\mathrm {poly}(k)\) cost, we get the desired complexity.    \(\square \)

Combining Theorems 3 and 5, together with an optimal OT implementation (which, by Remark 1, follows from standard OT), and plugging in the standard composition theorem for secure computation, we derive the following corollary.

Corollary 1

(main result). Suppose that Assumptions 2 and 4 hold, and a standard binary OT exists. Then, there exists a protocol for securely computing n copies of OLE over \(\mathbb {F}\) with semi-honest computational security, and arithmetic complexity of \(O(n)+\mathrm {poly}(k)\).

5 Applications of Vector-OLE

In the previous section we used vector-OLE only as a tool to obtain OLE. However, there are applications where vector-OLE is precisely what we need.

First, it is easy to see that a secure multiplication of an \(n\times n\) matrix by a length-n vector reduces to n instances of width-n vector-OLE. Therefore, using our implementation of vector-OLE, it is straightforward to multiply a matrix by a vector with \(O(n^2)\) field operations, which is asymptotically optimal, and with a small concrete overhead. This can be used as a building block for other natural secure computation tasks, such as matrix multiplication and other instances of secure linear algebra; see [17, 44] for other examples and motivating applications.
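A toy sketch of this reduction (our illustration; the ideal vector-OLE is modeled as a plain function, and we use the natural variant in which the parties end up with additive shares of the product):

```python
# Securely multiplying Bob's n x n matrix A by Alice's vector x using n width-n vector-OLEs.
import random

P = 2**61 - 1
N = 3

def vector_ole(x, a, b):          # ideal width-n vector-OLE: Alice gets a*x + b (componentwise)
    return [(ai * x + bi) % P for ai, bi in zip(a, b)]

A = [[random.randrange(P) for _ in range(N)] for _ in range(N)]   # Bob's matrix
x = [random.randrange(P) for _ in range(N)]                       # Alice's vector

alice_share, bob_share = [0] * N, [0] * N
for j in range(N):                # one vector-OLE per column of A (i.e., per entry of x)
    r = [random.randrange(P) for _ in range(N)]                   # Bob's fresh mask
    col = [A[i][j] for i in range(N)]
    t = vector_ole(x[j], col, r)                                  # Alice gets A[:,j]*x_j + r
    alice_share = [(s + ti) % P for s, ti in zip(alice_share, t)]
    bob_share = [(s - ri) % P for s, ri in zip(bob_share, r)]

expected = [sum(A[i][j] * x[j] for j in range(N)) % P for i in range(N)]
assert [(u + v) % P for u, v in zip(alice_share, bob_share)] == expected
```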

Another class of applications is where a party holds some object that needs to be compared to entries in a database held by another party. The characteristic property is that the input of one party is fixed whereas the input of the other party varies (as we run through the database). A good example is secure face recognition, where a face has been measured in some location and we now want to securely test if the measurement is close to an object in a database – containing, say, suspects of some kind. This reduces to computing the Euclidean distance from one point in a space of dimension m (say) to n points in the same space, and then comparing these distances, perhaps to some threshold. It is clearly sufficient to compute the squared distance, so what we need to compute are numbers of the form

$$\sum _i (x_i-y^j_i)^2 = \sum _i x_i^2 + (y^j_i)^2 - 2x_iy^j_i,$$

where \((x_1,...,x_m)\) is the point held by the client, and \((y_1^j,...,y_m^j)\) is the j-th point in the database. Clearly, additive shares of \(x_i^2\) and \((y^j_i)^2\) can be computed locally, while additive shares of \(2x_iy^j_i\) can be computed using vector-OLE, namely we fix i and compute \(2x_i \cdot (y^1_i,..., y^n_i)\).

Once we have additive shares of the square distances, the comparisons can be done using standard Yao-garbling. Since this only requires small circuits whose size is independent of the dimension m, this can be expected to add negligible overhead.
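The cross-term computation can be sketched as follows (again with an ideal vector-OLE stand-in and made-up toy dimensions); the squared terms are shared locally, the cross terms absorb the sign, and the final comparison step is omitted.

```python
# Additive shares of the squared distances via one width-n vector-OLE per coordinate i.
import random

P = 2**61 - 1
M_DIM, N_DB = 2, 3                      # dimension m, database size n

def vector_ole(x, a, b):                # ideal width-n vector-OLE
    return [(ai * x + bi) % P for ai, bi in zip(a, b)]

x = [random.randrange(1000) for _ in range(M_DIM)]                        # client's point
Y = [[random.randrange(1000) for _ in range(M_DIM)] for _ in range(N_DB)] # database points

client_share = [sum(xi * xi for xi in x) % P] * N_DB
server_share = [sum(yi * yi for yi in Y[j]) % P for j in range(N_DB)]
for i in range(M_DIM):
    r = [random.randrange(P) for _ in range(N_DB)]
    col = [(-2 * Y[j][i]) % P for j in range(N_DB)]                       # -2*y_i^j, j = 1..n
    t = vector_ole(x[i], col, r)                                          # client gets -2*x_i*y_i^j + r_j
    client_share = [(c + tj) % P for c, tj in zip(client_share, t)]
    server_share = [(s - rj) % P for s, rj in zip(server_share, r)]

dists = [sum((xi - yi) ** 2 for xi, yi in zip(x, Y[j])) % P for j in range(N_DB)]
assert [(c + s) % P for c, s in zip(client_share, server_share)] == dists
```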

We note that the secure face recognition problem was considered in [21], where a solution based on Paillier encryption was proposed (see [50] for optimizations). This adds a very large computational overhead compared to our solution, since an exponentiation is required for each product \(2x_iy^j_i\).

Similar applications of vector-OLE can apply in many other contexts of securely computing on numerical data that involve computations of low-degree polynomials. See, e.g., [16, 24] and references therein for some recent relevant works in the context of secure machine learning.

6 Implementation

We have implemented the vector-OLE protocol. This is the most practical of our constructions and, as we explained in the previous section, it has applications of its own, even without the conversion to OLEs of width 1.

6.1 Choice of the Matrix M

For the vector OLE protocol, we need a fast pseudorandom matrix M (see Assumption 2). For this, we have chosen to use a random d-sparse matrix for a suitable constant d. This means we are basing ourselves on Assumption 6 from Sect. 7, which essentially just says that a random d-sparse matrix is likely to satisfy a good “expansion” property which leads to pseudorandomness (i.e., to satisfy Assumption 2). In particular, to get b bits of security, we select the size of M such that, except with tiny probability, every set S of at most b rows has joint support larger than |S| (i.e., S is non-shrinking). This level of expansion is somewhat optimistic, but still seems to defend against known attacks. (See the discussion in Sect. 7.)

In the earlier theory sections we have assumed that the number of rows in M is \(\varTheta (k^3)\). This was because we wanted to amortize away the \(O(k^3)\) amount of work needed to do Gaussian elimination using the top part of the matrix. However, to achieve this number of rows in the concrete security analysis we would need to go to rather large values of k, and this would create some issues with memory management. Hence, to get a more practical version with a relatively small footprint, we chose to settle for \(O(k^2)\) rows. Then, for 80-bit security and \(d=10\) it turns out that we will need approximately \(k=182\) columns and \(k^2\) rows, while for 100-bit security we need \(k=240\).

Note that once the number of rows and columns is fixed, this also fixes the parameters u, v from the vector OLE protocol.

6.2 ECC: Using Luby Transform Codes

It remains to consider the erasure correcting code \(\mathsf {Ecc}\). For this, we want to use Luby Transform (LT) codes [42]. LT codes have extremely simple and efficient encoding and decoding algorithms, using only field additions and subtractions; no multiplications or inversions are needed. On the other hand, LT codes were designed for a streaming scenario, where one continues the stream until the receiver has enough data to decode. In our case, we must stop at some finite codeword size, and this means we will have a non-negligible probability that decoding fails. In practice, one can think of this as a small but constant error probability, say 1%. On the other hand, this can be detected, and the event that decoding fails only depends on the concrete choice of LT code and the choice of the noiseless positions.

Since the player A knows the LT code to be used and is also the one who chooses the noise pattern, he can simply choose a random noise pattern subject to the condition that decoding succeeds.

The protocol will then always terminate successfully, but we need to make a slightly stronger computational assumption to show that the protocol is secure: the pseudorandomness condition for the matrix M must hold even if we exclude, say 1% of the possible noise patterns. It turns out that, given the known attacks, excluding any 1% of the noise patterns makes no significant difference.Footnote 8

More concretely, we instantiate the encoding function \(\mathsf {Ecc}: \mathbb {F}^w \rightarrow \mathbb {F}^v\) using the Robust Soliton distribution, also defined in [42]. One generates an output symbol by sampling a degree dec from that distribution and defining the symbol as the sum of dec input symbols chosen uniformly among all the input symbols. This distribution is defined over two constant parameters \(\delta \) and c. Here \(\delta \) denotes the probability of failed decoding, which, together with c, adds extra weight to the probability of low-degree encoding symbols. The two parameters also determine a constant \(\beta \) for which \(v=w\beta \), but since v and w are fixed in our construction, \(\beta \) is also fixed, and we have one degree of freedom less. Thus we instantiate the distribution with parameters w, v and \(\delta \) and let those determine c such that \(\beta =v/w\).

Note that \(\delta \) may deviate from the actual probability of failed decoding \(\lambda \), depending on the concrete code. We estimate \(\lambda \) by testing our code on 50,000 random codewords. Note that we fixed the value of v earlier, as a result of choosing M. Given this, we tested different combinations of w and \(\delta \) to obtain a code that decodes w/4 erasures with failure probability \(\lambda \). Our concrete parameters are shown in Table 1, which presents different choices of w and \(\delta \) and shows how one may trade width for failure probability. In the implementation we use the codes corresponding to \(\delta =0.01\) for both security parameters.

Table 1. Implementation parameters
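
For concreteness, the following is a sketch of the standard peeling decoder used with LT codes: it resolves degree-1 symbols and subtracts recovered inputs from the remaining symbols. Running it on many random codewords with random erasure patterns, as in the 50,000-codeword experiment above, yields an estimate of \(\lambda \). The field arithmetic and data layout are simplified placeholders, and the loop below is quadratic rather than the usual ripple-based linear-time variant.

```cpp
// Sketch: LT peeling decoding over erasures.  Each received output symbol
// stores its value and the set of input positions it sums; the decoder
// repeatedly resolves degree-1 symbols and peels recovered inputs off the
// remaining symbols (only additions/subtractions mod p are used).
#include <cstddef>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

struct OutSymbol { u64 value; std::vector<std::size_t> nbrs; };

// Returns true iff all w input symbols are recovered from the given symbols.
bool lt_peel(std::vector<OutSymbol> syms, std::size_t w, u64 p,
             std::vector<u64>& recovered) {
    recovered.assign(w, 0);
    std::vector<bool> known(w, false);
    std::size_t solved = 0;
    bool progress = true;
    while (progress && solved < w) {
        progress = false;
        for (auto& s : syms) {
            std::vector<std::size_t> rest;       // neighbors that are still unknown
            for (std::size_t j : s.nbrs) {
                if (known[j]) s.value = (s.value + p - recovered[j]) % p;  // peel off
                else rest.push_back(j);
            }
            s.nbrs.swap(rest);
            if (s.nbrs.size() == 1) {            // degree 1: releases one input symbol
                std::size_t j = s.nbrs[0];
                if (!known[j]) { known[j] = true; recovered[j] = s.value; ++solved; progress = true; }
                s.nbrs.clear();
            }
        }
    }
    return solved == w;
}
```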

6.3 Doing Oblivious Transfers

In the vector OLE protocol we need 1 OT for each row of M. It is natural to implement this via OT extension, which can be done very efficiently in a situation like ours where we need a substantial number of OTs. For instance, in [12, 36], an amortised time of about 0.2 \(\upmu \)s per semi-honestly secure string OT was obtained when generating enough of them in one go. Note that in the protocol specification, we required a special OT variant where one message is sent and the receiver chooses whether to get it or not. But this can of course be implemented using standard 1–2 string OT, where the sender offers the message in question and a dummy.

In order not to require a specific relation between the number of OTs produced by one run of an OT extension and the number our protocol requires, we have assumed that we precompute a number of random OTs, which we then adjust to the actual values using standard techniques. The adjustment requires one message in each direction, where the first one can be sent in parallel with the message in the vector OLE protocol, so we get a protocol with a total of 3 messages.
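
One common instantiation of these standard techniques is the derandomization step sketched below, shown here for field-element messages; the exact message format of the implementation may differ. The function and struct names are illustrative only.

```cpp
// Sketch: turning a precomputed random OT into a chosen-input 1-out-of-2 OT on
// field elements, with one message in each direction.
#include <cstdint>

using u64 = std::uint64_t;

struct RandomOTSender   { u64 r0, r1; };                     // sender's random pads
struct RandomOTReceiver { unsigned cprime; u64 r_cprime; };  // random choice + pad

// Receiver -> sender: flip bit e = c XOR c' for the real choice bit c.
unsigned receiver_flip(const RandomOTReceiver& pre, unsigned c) { return c ^ pre.cprime; }

// Sender -> receiver: the real messages masked by the pads, permuted by e.
struct Correction { u64 y0, y1; };
Correction sender_correct(const RandomOTSender& pre, unsigned e, u64 m0, u64 m1, u64 p) {
    u64 pad0 = e ? pre.r1 : pre.r0;                 // r_e
    u64 pad1 = e ? pre.r0 : pre.r1;                 // r_{1-e}
    return { (m0 + pad0) % p, (m1 + pad1) % p };
}

// Receiver output: m_c = y_c - r_{c'} (mod p).
u64 receiver_output(const RandomOTReceiver& pre, const Correction& cor, unsigned c, u64 p) {
    u64 yc = c ? cor.y1 : cor.y0;
    return (yc + p - pre.r_cprime) % p;
}
```

Correctness follows since \(y_c = m_c + r_{c\oplus e} = m_c + r_{c'}\), so the receiver recovers exactly \(m_c\) while the pad for the other message stays hidden.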

We have not implemented the OT extension itself; instead, we simulate the data and communication needed when using the preprocessed OTs. The hypothesis is that the time required to create the random OTs in the first place is insignificant compared to the rest of the computation required. We discuss below the extent to which this turned out to be true.

6.4 Communication Overhead

Having fixed the parameter choices, we can already compute the communication we will need: we can ignore the communication relating to the top part of the matrix M, as this is responsible for less than 1% of the communication. Then, by simple inspection of the protocol, one sees that we need to send \(v+w\) field elements and do v OTs. We implement the OTs directly from 1–2 OT, which means an OT costs communication of 2 field elements and 1 bit. So we get a total of \(3v +w\) field elements (plus v bits, which we can ignore when the field is large). With our choice of LT code, v is roughly 3w, so we have 10w field elements to send. Hence the rate is indeed constant, as expected, namely 1/10. Accepting a larger failure probability for LT decoding, we could get a rate of roughly 1/7. As explained in Remark 2, the best we can hope for asymptotically is about 1/4 when the noise rate is 1/4.

There are two reasons why we do not reach this goal: first, we chose to use LT codes for erasure correction to optimize the computational overhead, but this comes at the price of a suboptimal rate. Second, we implement the OTs at rate 1/2. As explained in Remark 1, rate (almost) 1 is possible, but only for large fields. So for fields of size 1000 bits or more, we believe the rate of our implementation can be pushed to about 1/5 without significantly affecting its concrete computational overhead.

6.5 Test Set-Up and Results

Our set-up consists of two identical machines, each with 32 GB RAM and a 64-bit i7-3770K CPU running at 3.5 GHz. The machines are connected on a 1 GbE network with 0.15 ms delay.

A b-bit field is instantiated by choosing \(\mathbb {F}_p\) for the largest prime \(p<2^b\). All matrix operations are optimized for sparse matrices, except for the Gaussian elimination, where we construct an augmented matrix and do standard row reduction. All parameters are loaded into memory prior to the protocol execution, including the matrix M, the LT code, and a finite set of test vectors.
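
The row reduction step can be sketched as follows. Here A stands for the (dense) top part of M, p is the field prime, and the Fermat-based inverse is a placeholder for whatever inversion routine the field implementation provides.

```cpp
// Sketch: solve A x = y over F_p by Gauss-Jordan elimination on the augmented
// matrix [A | y].  Entries are assumed reduced modulo the prime p.
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;
using Mat = std::vector<std::vector<u64>>;   // row-major, entries in [0, p)

static u64 mulmod(u64 a, u64 b, u64 p) { return (u64)((unsigned __int128)a * b % p); }

// Modular inverse via Fermat's little theorem (p prime, a != 0).
static u64 invmod(u64 a, u64 p) {
    u64 r = 1, e = p - 2;
    while (e) { if (e & 1) r = mulmod(r, a, p); a = mulmod(a, a, p); e >>= 1; }
    return r;
}

// Reduces [A | y] in place; returns false if A is singular.
bool solve(Mat& A, std::vector<u64>& y, u64 p, std::vector<u64>& x) {
    const std::size_t n = A.size();
    for (std::size_t col = 0; col < n; ++col) {
        std::size_t piv = col;                              // find a non-zero pivot
        while (piv < n && A[piv][col] == 0) ++piv;
        if (piv == n) return false;
        std::swap(A[piv], A[col]); std::swap(y[piv], y[col]);
        u64 inv = invmod(A[col][col], p);                   // scale pivot row to 1
        for (std::size_t j = col; j < n; ++j) A[col][j] = mulmod(A[col][j], inv, p);
        y[col] = mulmod(y[col], inv, p);
        for (std::size_t i = 0; i < n; ++i) {               // clear the rest of the column
            if (i == col || A[i][col] == 0) continue;
            u64 f = A[i][col];
            for (std::size_t j = col; j < n; ++j)
                A[i][j] = (A[i][j] + p - mulmod(f, A[col][j], p)) % p;
            y[i] = (y[i] + p - mulmod(f, y[col], p)) % p;
        }
    }
    x = y;                                                  // [I | x] remains
    return true;
}
```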

First, a version is implemented using the GNU Multiple Precision Arithmetic Library (GMP) for finite field arithmetic. We benchmark this version with b-bit fields for \(b\in \{32,64,\dots ,2048\}\). In this setting we allocate 2b bits for each element once, so that we never have to allocate more, e.g., during a multiplication operation, which consists of a GMP mul call followed by a mod call. We further replace the mod call after addition and subtraction with a conditional sum.

Since most of the computation in the protocol consists of field operations, we optimized the finite field arithmetic in dedicated 32-bit and 64-bit versions. Here the 32-bit version only uses half of the machine’s word size, but offers a fast modulo operation after a multiplication via the div instruction. The 64-bit version utilizes the full word size, but relies on the compiler’s implementation of the modulo operation for uint128_t, as supported in GCC-based compilers. For random number generation, we use the SFMT variant of the Mersenne Twister instead of GMP.
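
The two reduction strategies can be sketched as follows (operands are assumed to be already reduced modulo p); the actual implementation contains further low-level optimizations.

```cpp
// Sketch of the specialized reductions described above.
#include <cstdint>

// 32-bit field: the product fits in one 64-bit word and the % compiles to a
// single division-based reduction.
static inline std::uint32_t mul32(std::uint32_t a, std::uint32_t b, std::uint32_t p) {
    return (std::uint32_t)(((std::uint64_t)a * b) % p);
}

// 64-bit field: use GCC/Clang's unsigned __int128 for the 128-bit product and
// let the compiler implement the 128-by-64-bit modulo.
static inline std::uint64_t mul64(std::uint64_t a, std::uint64_t b, std::uint64_t p) {
    return (std::uint64_t)(((unsigned __int128)a * b) % p);
}

// Addition and subtraction with a conditional correction instead of a general mod.
static inline std::uint64_t add_mod(std::uint64_t a, std::uint64_t b, std::uint64_t p) {
    std::uint64_t s = a + b;                 // may wrap when p is close to 2^64
    return (s < a || s >= p) ? s - p : s;
}
static inline std::uint64_t sub_mod(std::uint64_t a, std::uint64_t b, std::uint64_t p) {
    std::uint64_t d = a - b;
    return (a < b) ? d + p : d;
}
```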

Tables 2 and 3 show how the GMP version and the optimized version compare for \(k=182\) and \(k=240\), respectively. Here, we measure the amortized time per single OLE; more precisely, since the protocol securely computes the multiplication of a scalar by a vector of length w, we divide the time for this by w to get the time per oblivious multiplication. We obtain these times by having as many threads as possible run the protocol in a loop and counting only successful executions. These amortized timings are also depicted in Fig. 1. Afterwards we run the protocol sequentially in a single thread and measure how fast we can execute one instance of the protocol. This indicates the latency, i.e., the time from when the protocol starts until the data is ready. Finally, since we use much less network bandwidth than is available, we present the bandwidth we actually use, as this may become a limiting factor in low-bandwidth networks. The reason why the optimized versions use more bandwidth than the corresponding GMP versions is that they are computationally faster, so the network has to handle the same amount of communication in a shorter time. For larger fields, bandwidth usage increases because larger field elements need to be sent, but for the largest field size (2048 bits) we see a decrease, because computation has slowed down to the extent that there is more than twice the time available to send field elements of double the size (compared to 1024 bits).

Table 2. Benchmark of the vector-OLE protocol for \(k=182\)
Table 3. Benchmark of the vector-OLE protocol for \(k=240\)
Fig. 1. Amortized time per OLE compared to field size

We note that the protocol latency for 100-bit security is about 2–3 times that of 80-bit security. For the amortized times, however, the increase in security parameter comes cheaply, because we double w in going from 80-bit to 100-bit security.

In our setup, we need to execute between 2 and 3 OTs per single OLE. Given the results from [12], which were obtained on an architecture similar to ours, we can expect these to take an amortised time of 0.6 \(\upmu \)s; as expected, this becomes insignificant as the field size grows, but it cannot be ignored for the optimized version on smaller fields.

As computation is the bottleneck compared to network bandwidth, we identify which part of the computation is the most expensive. We test the optimized 32-bit version for \(k=182\) and focus on the Gaussian elimination, the Luby encoding and decoding, and the matrix-vector product \(c=M\cdot r\). This is presented in Table 4 as indexed values, where the Gaussian elimination acts as the base value; it takes \(45\%\) of the total protocol time, including communication.

Table 4. Timing of computation

Since the Gaussian elimination costs more than the other parts of the protocol, one would need to increase w for the amortization to work. However, one could replace this step with any algorithm for solving linear systems, in particular algorithms taking advantage of matrix sparsity, such as [53]. Finally, one may take advantage of specific constructions of finite fields allowing for even faster arithmetic operations.

7 About the Assumptions

Our results rely on two types of assumptions, both of which can be viewed as natural arithmetic analogues of assumptions that have been studied in the boolean case. We discuss our instantiations of these assumptions below. In Sect. 7.1 we discuss the assumption we use for instantiating our constant-overhead vector-OLE protocol, whereas in Sect. 7.2 we discuss the additional assumption used for obtaining a constant-overhead protocol for general arithmetic computations.

7.1 Instantiating Assumption 2 (Fast Pseudorandom Matrix)

A distribution ensemble \({\mathcal M}=\{ {\mathcal M}_k \}\) over \(m(k)\times k\) matrices is pseudorandom for noise rate \(\mu \) if it satisfies property 2 of Assumption 2. It is natural to assume that, for every \(m=\mathrm {poly}(k)\), a random \(m\times k\) matrix is pseudorandom over any finite field. (This is the arithmetic analogue of the Decisional-Learning-Parity-with-Noise assumption [14, 29, 49].) However, Assumption 2 requires the corresponding linear map to be computable in O(m) arithmetic operations (together with an additional linear-independence condition). We suggest two possible instantiations for this assumption.

The Druk-Ishai Ensemble. Druk and Ishai [20] constructed, for any finite field \(\mathbb {F}\) and any code length \(m\in \mathrm {poly}(k)\), an ensemble \({\mathcal M}\) of linear-time computable (m, k) error-correcting codes over \(\mathbb {F}\) whose distance approaches the Gilbert-Varshamov bound [25, 52] with overwhelming probability. It was further conjectured that, over the binary field, the ensemble is pseudorandom for arbitrary polynomial m(k).Footnote 9 The assumption seems to hold for arbitrary finite fields as well. Moreover, the ensemble satisfies Condition 3 of Assumption 2 since, by [20, Theorem 5], every subset of \(m'=\omega (k)\) rows of the code generates, except with negligible probability, a code of relative distance \(1-1/|\mathbb {F}|-o(1)\).

Alekhnovich’s Ensemble. Alekhnovich [1, Remark 1] conjectured that sparse binary matrices which are “well expanding” are pseudorandom for a constant noise rate. We will use the arithmetic version of this assumption. For this we will need the following definition.

Definition 2

Let \(G=(S_1,\ldots ,S_m)\) be a d-uniform hypergraph with m hyperedges over k nodes (hereafter referred to as a (k, m, d)-hypergraph). We say that G is expanding with threshold r and expansion factor c (in short, G is (r, c)-expanding) if the union of every set of \(\ell \le r\) hyperedges \(S_{i_1},\ldots , S_{i_{\ell }}\) contains at least \(c \ell \) nodes. For a field \(\mathbb {F}\) and a (k, m, d)-hypergraph G we define a probability distribution \({\mathcal M}(G,\mathbb {F})\) over \(m\times k\) matrices as follows: take \(M_{i,j}\) to be a fresh random non-zero field element if j appears in the i-th hyperedge of G; otherwise, set \(M_{i,j}\) to zero.

Assumption 6

(Arithmetic version of Alekhnovich’s assumption). For every constant \(d>3\), \(m=\mathrm {poly}(k)\), real \(\mu \in (0,1/2)\) and finite field \(\mathbb {F}\), the following holds for all sufficiently large k’s. If G is a (k, m, d)-hypergraph which is (t, 2d/3)-expanding, then any circuit of size \(T=\exp (\varOmega (t))\) cannot distinguish with advantage better than 1/T between \((M, \varvec{v})\) and \((M,M\varvec{r}+\varvec{e})\), where M is drawn from \({\mathcal M}(G,\mathbb {F})\), \(\varvec{v}\) is uniform over \(\mathbb {F}^m\), \(\varvec{r}\) is uniform over \(\mathbb {F}^k\), and each entry of the noise vector \(\varvec{e}\in \mathbb {F}^m\) is independently chosen to be a uniformly random field element with probability \(\mu \) and zero otherwise.
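
For concreteness, the following sketch samples the two distributions that the assumption asks to be indistinguishable, using the noise model described above (each coordinate of \(\varvec{e}\) is uniformly random with probability \(\mu \) and zero otherwise). A dense matrix representation is used here purely for readability.

```cpp
// Sketch: the two distributions of Assumption 6 over F_p.
#include <cstdint>
#include <random>
#include <vector>

using u64 = std::uint64_t;
using Mat = std::vector<std::vector<u64>>;          // m x k, entries in [0, p)

// Sample (the second component of) (M, M*r + e).
std::vector<u64> planted_sample(const Mat& M, double mu, u64 p, std::mt19937_64& rng) {
    std::uniform_int_distribution<u64> field(0, p - 1);
    std::bernoulli_distribution noisy(mu);
    std::vector<u64> r(M[0].size());
    for (auto& x : r) x = field(rng);               // r <- F^k
    std::vector<u64> out(M.size());
    for (std::size_t i = 0; i < M.size(); ++i) {
        unsigned __int128 acc = 0;
        for (std::size_t j = 0; j < r.size(); ++j)
            acc = (acc + (unsigned __int128)M[i][j] * r[j]) % p;     // (M*r)_i
        u64 e = noisy(rng) ? field(rng) : 0;        // i.i.d. noise of rate mu
        out[i] = (u64)((acc + e) % p);
    }
    return out;
}

// Sample (the second component of) (M, v) with v uniform over F^m.
std::vector<u64> uniform_sample(std::size_t m, u64 p, std::mt19937_64& rng) {
    std::uniform_int_distribution<u64> field(0, p - 1);
    std::vector<u64> v(m);
    for (auto& x : v) x = field(rng);
    return v;
}
```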

Remarks:

  1. (Expansion vs. Security) The assumption says that the level of security is exponential in the size of the smallest expanding set. In particular, an expansion threshold of \(k^{\epsilon }\) guarantees sub-exponential hardness.Footnote 10 This bound is consistent with the best known attacks, and, over the binary field, can be analytically established for a large family of algorithms including myopic algorithms, semi-definite programs, linear tests, low-degree polynomials, and constant-depth circuits (see [6] and references therein). Many of these results can be established for the arithmetic setting as well. The constant 2d/3 (and the hidden constant in the \(\varOmega \)-notation) determines the exact relation between expansion and security. The choice of 2d/3 is somewhat arbitrary, and it may be the case that an absolute expansion factor (which does not grow with d) suffices. For our practical implementation, we take an “optimistic” estimate and require an expansion factor slightly larger than 1, which guarantees that the support of sets of size at most r does not shrink.

  2. (Variants) One may conjecture that the assumption holds with probability 1 over the choice of M. That is, any matrix (including a 0–1 matrix) whose underlying graph is expanding is pseudorandom.

  3. (Efficiency) Observe that since G is a (k, m, d)-hypergraph, any matrix in the support of \({\mathcal M}(G,\mathbb {F}_p)\) is d-sparse in the sense that each of its rows has exactly d non-zero elements. The linear mapping \(f_M:\varvec{x}\mapsto M\varvec{x}\) can therefore be computed by performing \(O(d m)=O(m)\) arithmetic operations.

  4. (Linear Independency) Recall that Assumption 2 requires that a random subset of \(k \log ^2 k\) of the rows of M has, except with negligible probability, full rank. In Lemma 5 we show that this condition holds as long as G is semi-regular in the sense that each of its nodes participates in at least \(\varOmega (m/k)\) hyperedges.

  5. (Different noise distributions) The choice of i.i.d. noise is somewhat arbitrary, and it seems likely that other noise distributions can be used. In fact, it seems plausible that one can use any noise distribution which has high entropy and cannot be approximated by a low-degree function of a few fresh variables (and thus is not subject to linearization attacks).

Given the above discussion, Assumption 2 now follows from Assumption 6 and the existence of an explicit family of expanders. The latter point is discussed in Sect. 7.2.

7.2 Instantiating Assumption 4 (\(\mathbf {{NC^0}}\) Polynomial-Stretch PRG)

In the binary setting, the existence of locally-computable polynomial-stretch PRGs was extensively studied in the last decade. (See [4] and references therein.) Let \(f:\mathbb {F}^k\rightarrow \mathbb {F}^m\) be a d-local function which maps a k-long vector x into an m-long vector \((P_1(x_{S_1}),\ldots ,P_m(x_{S_m}))\), where \(S_i\in [k]^d\) is a d-tuple and \(P_i\) is a d-variate multi-linear polynomial. Over the binary field, it is conjectured that as long as the (k, m, d)-hypergraph \(G=(S_1,\ldots ,S_m)\) is expanding and the \(P_i\)’s are sufficiently “non-degenerate”, the function forms a good pseudorandom generator. (This is an extension of Goldreich’s original one-wayness conjecture [28].) In fact, this is conjectured to be the case even if all the polynomials \(P_1,\ldots ,P_m\) are taken to be the same polynomial P. We denote the resulting function by \(f_{G,P}\) and make the analogous arithmetic assumption. In the following we say that a function \(f:\mathbb {F}^k\rightarrow \mathbb {F}^m\) is T-pseudorandom if every circuit of size at most T cannot distinguish \(f(\varvec{x})\), for a uniformly random \(\varvec{x}\in \mathbb {F}^k\), from a uniformly random vector in \(\mathbb {F}^m\) with advantage better than 1/T.
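
The following sketch shows how \(f_{G,P}\) is evaluated. The concrete polynomial P used here is only an illustrative placeholder (an arithmetic analogue of a common binary predicate); the assumption below requires some sufficiently non-degenerate multilinear P but does not single one out.

```cpp
// Sketch: evaluating f_{G,P} over F_p with the placeholder predicate
// P(z1,...,z5) = z1 + z2 + z3 + z4*z5.
#include <array>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;
constexpr unsigned D = 5;                            // locality d (placeholder)
using Hyperedge = std::array<std::size_t, D>;        // S_i: d indices into x

static u64 P(const std::array<u64, D>& z, u64 p) {
    unsigned __int128 t = (unsigned __int128)z[3] * z[4] % p;       // multilinear term
    return (u64)(((unsigned __int128)z[0] + z[1] + z[2] + (u64)t) % p);
}

// f_{G,P}(x) = (P(x_{S_1}), ..., P(x_{S_m})) for a (k, m, d)-hypergraph G.
std::vector<u64> f_GP(const std::vector<Hyperedge>& G, const std::vector<u64>& x, u64 p) {
    std::vector<u64> out(G.size());
    for (std::size_t i = 0; i < G.size(); ++i) {
        std::array<u64, D> z;
        for (unsigned j = 0; j < D; ++j) z[j] = x[G[i][j]];         // restrict x to S_i
        out[i] = P(z, p);
    }
    return out;
}
```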

Assumption 7

For every finite field \(\mathbb {F}\) and every polynomial m(k) there exists a constant d and a d-variate multi-linear polynomial \(P:\mathbb {F}^d\rightarrow \mathbb {F}\) such that for every (k, m, d)-hypergraph G which is (t, 2d/3)-expanding, the function \(f_{G,P}:\mathbb {F}^k\rightarrow \mathbb {F}^m\) is \(\exp (\varOmega (t))\)-pseudorandom over \(\mathbb {F}\).

As in the case of Alekhnovich’s assumption, the constant 2/3 is somewhat arbitrary and a smaller constant may suffice. (A lower bound of 1/d can be established.) In the binary setting, security was reduced to a one-wayness assumption [3] and was analytically established for a large family of algorithms including myopic algorithms, linear tests, statistical algorithms, semi-definite programs, and algebraic attacks [6, 7, 11, 23, 38, 47]. Some of these results can be extended to the arithmetic setting as well.

On Explicit Unbalanced Constant-Degree Expanders. In order to employ Assumptions 6 and 7, one needs an explicit family of \((k,m=k^{1+\delta },d=O(1))\) hypergraphs which are \((k^{\epsilon },(1+\varOmega (1))d)\)-expanding.Footnote 11 This assumption is known to be necessary for the existence of a d-local (binary) PRG that stretches k bits to m bits [9], and so it was used (either explicitly or implicitly) in previous works that employed such a local PRG (e.g., [2, 34, 39, 40, 41]).

While recent advances in the theory of pseudorandomness have come close to generating such explicit highly-expanding hypergraphs, in our regime of parameters (\(m=\omega (k)\) and \(d=O(1)\)) an explicit provable construction is still unknown. It is important to mention that, by a standard calculation (cf. [45]), a uniformly chosen hypergraph G (i.e., each hyperedge contains a random d-subset of the nodes) is likely to be \((r=\mathrm {poly}(k),2d/3)\)-expanding except with some inverse polynomial failure probability \(\epsilon (k)\). Moreover, we can reduce the failure probability to \(1/k^c\) for an arbitrary (predetermined) constant c at the expense of increasing the sampling complexity to \(k^{b_c}\), where the constant \(b_c\) grows with c. (This can be done by rejecting hypergraphs which fail to expand for sets of size at most \(b_c\), and re-sampling the hypergraph if needed.) As a result, one gets a protocol that fails with a “tunable” inverse polynomial probability which is independent of the running time of the adversary. Moreover, the failure event is restricted to a one-time setup phase, and its probability does not increase with the number of times the protocol is executed. Such a guarantee seems to be satisfactory in most practical scenarios. Finally, we mention that there are several heuristic approaches to constructing unbalanced constant-degree expanding hypergraphs, for example, using some fixed sequence of bits (e.g., the binary expansion of \(\pi \)) and interpreting it as a (k, m, d)-hypergraph via some fixed translation. Assuming such a heuristic gives an explicit construction can be viewed as a conservative “combinatorial” assumption, in the spirit of standard cryptographic assumptions.
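
The rejection-sampling procedure just described can be sketched as follows, with d = 10 as in Sect. 6 and a small bound B standing in for \(b_c\). The brute-force expansion check costs \(O(m^{B})\) and is shown only to make the procedure explicit; it is not meant as an efficient verifier.

```cpp
// Sketch: draw a uniformly random (k, m, d)-hypergraph and reject it unless
// every set of at most B hyperedges expands by a factor of at least 2d/3.
#include <array>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

constexpr unsigned D = 10;                              // d = 10 as in Sect. 6
using Hyperedge = std::array<std::size_t, D>;

// DFS over all subsets of at most B hyperedges, starting from index `from`;
// returns false as soon as some subset of size ell has union smaller than (2d/3)*ell.
static bool expands_up_to(const std::vector<Hyperedge>& G, std::size_t from,
                          std::vector<std::size_t>& chosen, unsigned B) {
    if (!chosen.empty()) {
        std::set<std::size_t> support;                  // union of the chosen edges
        for (std::size_t i : chosen)
            support.insert(G[i].begin(), G[i].end());
        if (3 * support.size() < 2 * D * chosen.size()) // |union| < (2d/3)*ell
            return false;
    }
    if (chosen.size() == B) return true;
    for (std::size_t i = from; i < G.size(); ++i) {
        chosen.push_back(i);
        if (!expands_up_to(G, i + 1, chosen, B)) { chosen.pop_back(); return false; }
        chosen.pop_back();
    }
    return true;
}

std::vector<Hyperedge> sample_expanding_hypergraph(std::size_t k, std::size_t m,
                                                   unsigned B, std::mt19937_64& rng) {
    std::uniform_int_distribution<std::size_t> node(0, k - 1);
    for (;;) {                                          // resample until the check passes
        std::vector<Hyperedge> G(m);
        for (auto& e : G)
            for (std::size_t j = 0; j < D; ++j) {
                bool fresh;
                do {                                    // d distinct nodes per hyperedge
                    e[j] = node(rng);
                    fresh = true;
                    for (std::size_t t = 0; t < j; ++t) if (e[t] == e[j]) fresh = false;
                } while (!fresh);
            }
        std::vector<std::size_t> chosen;
        if (expands_up_to(G, 0, chosen, B)) return G;
    }
}
```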