1 Introduction

1.1 Background

In the setting of secure computation, a set of parties with private inputs wish to compute a joint function of their inputs without revealing anything but the output. Protocols for secure computation guarantee privacy (meaning that the protocol reveals nothing but the output), correctness (meaning that the correct function is computed), and independence of inputs (meaning that parties are not able to make their inputs depend on the other parties’ inputs). These security guarantees must be provided in the presence of adversarial behavior. Two classic adversary models are typically considered: semi-honest (where the adversary follows the protocol specification but may try to learn more than is allowed from the protocol transcript) and malicious (where the adversary can run any arbitrary polynomial-time strategy in its attempt to breach security).

Garbled Circuits One of the central tools in the construction of secure two-party protocols is Yao’s garbled circuit [18, 22]. The basic idea behind Yao’s protocol is to provide a method of computing a circuit so that the values obtained on all wires other than the circuit output wires are never revealed. For every wire in the circuit, two random or garbled values are specified, such that one value represents 0 and the other represents 1. For example, let i be the label of some wire. Then, two values \(k_i^0\) and \(k_i^1\) are chosen, where \(k_i^b\) represents the bit b. An important observation here is that even if one of the parties knows the value \(k_i^b\) obtained on wire i, this does not help it to determine whether \(b = 0\) or \(b = 1\) (because \(k_i^0\) and \(k_i^1\) are identically distributed). Of course, the difficulty with such an idea is that it seems to make computation of the circuit impossible. That is, let g be a gate with incoming wires i and j and output wire \(\ell \). Then, given two random values \(k_i^b\) and \(k_j^c\), it does not seem possible to compute the gate because b and c are unknown. We therefore need a method of computing the value on the output wire of a gate (also a random value \(k_\ell ^0\) or \(k_\ell ^1\)) given the values on the two input wires to that gate.

In short, this method involves providing “garbled computation tables” that map the random input values to random output values. However, this mapping should have the property that given two input values, it is only possible to learn the output value that corresponds to the output of the gate (the other output value must be kept secret). This is accomplished by viewing the four possible inputs to the gate, \(k_i^0,k_i^1,k_j^0\), and \(k_j^1\), as encryption keys. Then, the output values \(k_\ell ^0\) and \(k_\ell ^1\), which are also keys, are encrypted under the appropriate keys from the incoming wires. For example, let g be an OR gate. Then, the key \(k_\ell ^1\) is encrypted under the pairs of keys associated with the input values (1, 1), (1, 0), and (0, 1). In contrast, the key \(k_\ell ^0\) is encrypted under the pair of keys associated with (0, 0).
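
For this OR gate, the garbled table thus consists of the four double encryptions

$$\begin{aligned} E_{k_i^0}\left( E_{k_j^0}\left( k_\ell ^0\right) \right) ,\quad E_{k_i^0}\left( E_{k_j^1}\left( k_\ell ^1\right) \right) ,\quad E_{k_i^1}\left( E_{k_j^0}\left( k_\ell ^1\right) \right) ,\quad E_{k_i^1}\left( E_{k_j^1}\left( k_\ell ^1\right) \right) , \end{aligned}$$

stored in a randomly permuted order. An evaluator holding one key for each input wire can decrypt exactly one of these entries and thus learns exactly one output key.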

Fast Garbling and Assumptions Today, secure computation is fast enough to solve numerous problems in practice. This has been achieved due to multiple significant efficiency improvements, both at the protocol level and in garbled circuits themselves. Many of the optimizations to garbled circuits (described below) come at the price of making strong assumptions on the security of the cryptographic primitives being used. For example, the free-XOR technique requires assuming circular security as well as a type of correlation robustness [7]; the use of fixed-key AES requires assuming that AES with a fixed key behaves like a public random permutation [5]; and reducing the number of encryption operations from 2 to 1 per entry in the garbled gate requires correlation robustness (when a hash function is used) or a related-key assumption (when AES is used).

Typically, the use of less standard cryptographic assumptions is accepted where necessary, especially in areas like secure computation where the costs are in general very high. In practice, however, solid cryptographic engineering dictates a more conservative approach to assumptions: new types of elliptic curve groups are not adopted quickly, nonstandard uses of block ciphers are avoided, and so on. This is based on sound principles and on the understanding that deployed solutions are very hard to change if vulnerabilities are discovered. In the field of secure computation, the willingness to adopt any assumption that enables a faster implementation stands in stark contrast to standard cryptographic practice. In this paper, we propose to pause, take a step back, and ask how much nonstandard assumptions really gain us and whether they are justified. We remark, for just one example, that practitioners have warned against assuming that AES is an ideal cipher, due to related-key weaknesses that have been found; see, e.g., [4, 6]. Furthermore, the security of AES with a known key was studied in [13], and the results show that the security margin for using AES in this way is arguably not as high as we would like. In particular, [13] presents an algorithm that distinguishes 7-round AES with a fixed key from a public random permutation in time \(2^{56}\) and with little memory. As in most situations, if the benefit is huge, then more flexibility with respect to the assumptions may be justified, whereas if the gains are smaller, then a more cautious approach should be taken.

The focus of this paper is to study how much is really gained by relying on nonstandard assumptions and to provide optimizations that require assuming nothing more than that AES behaves like a pseudorandom function.

1.2 Known Garbled Circuit Optimizations

Before proceeding to describe our work, we present an overview of the most important efficiency improvements to garbled circuits:

  • Point and permute [20]: In order to prevent the garbled circuit evaluator from knowing what it is evaluating, the original construction randomly permuted the ciphertexts in each garbled gate. Then, when computing the garbled circuit, the evaluator tries each ciphertext in the gate until one correctly decrypts (this requires an additional mechanism to ensure that only one ciphertext decrypts to a valid value). On average, this means that 2.5 entries need to be decrypted per gate (where each costs 2 decryptions). The point-and-permute method assigns a random permutation or signal bit to each wire that determines the order of the garbled gate. Then, the encryption of a garbled value includes the bit needed to enable direct access to the appropriate entry in the garbled table (given two garbled values and the two associated bits). This reduces the number of entries to decrypt to 1 (and thus 2 actual decryptions).

  • Free-XOR [15]: The garbled circuit construction involves carrying out encryptions at every gate in the circuit and storing 4 ciphertexts. The free-XOR method enables the computation of XOR gates for free (garbling and evaluating a XOR gate require only 1–2 XOR operations, and no ciphertexts need be stored). This is achieved by choosing a fixed random mask \(\Delta \) and making the garbled values on every wire have fixed difference \(\Delta \) (i.e., for every i, the garbled values are \(k_i^{0}\) and \(k_{i}^{1} = k_i^{0}\oplus \Delta \), where \(k_i^{0}\) is random). In many circuits, the number of XOR gates is very large, and so this significantly reduces the cost (e.g., in the AES circuit there are approximately 7000 AND gates and 25,000 XOR gates; in a \(32\times 32\) bit multiplier circuit there are approximately 6000 AND gates and 1000 XOR gates [1]).

    We remark that the free-XOR method is patented, and as such, its use is restricted [16].

  • Reductions in garbled circuit size [14, 20, 21, 23]: Historically, the most expensive part of any secure protocol was the cryptographic operations. However, significant algorithmic improvements to secure protocols together with much faster implementations of cryptographic primitives (e.g., due to better hardware) have considerably changed the equation. In many cases, communication can be the bottleneck, and thus, reducing the size of the garbled circuit is of great importance. In [20], a method for reducing the number of garbled entries in a table from 4 to 3 was introduced; this is referred to as 4-to-3 garbled row reduction (or 4–3 GRR). This improvement is achieved by “forcing” the first ciphertext to be 0 (by setting the appropriate garbled value on the output wire so that the ciphertext becomes 0). In [21], polynomial interpolation was used to further reduce the number of ciphertexts to just 2; this is referred to as 4-to-2 garbled row reduction (or 4–2 GRR).

    Importantly, 4–3 GRR is compatible with free-XOR since only one output garbled value needs to be taken as a function of the input values (and the other garbled value can be set according to the fixed \(\Delta \)). In contrast, 4–2 GRR is not compatible with free-XOR. In a work called FleXOR [14], it was shown that in some cases it is possible to use a combination of 4–2 GRR and 4–3 GRR together with free-XOR, and obtain an overall cost that is lower than that of 4–3 GRR alone.

    The state of the art today is a new method called half-gates [23], which reduces the number of ciphertexts in AND gates from 4 to 2, while maintaining compatibility with free-XOR (in fact, half-gates only work with free-XOR).

  • Number of encryptions [19]: Classically, each entry in a garbled gate contains the encryption of one of the output garbled values under two input garbled values and thus requires two encryptions. In [19], it was proposed to use a hash function as a type of key derivation function and to encrypt by hashing both input garbled values together and XORing the result with the output garbled value. This is secure in the random oracle model, or under a “correlation robustness” assumption [12]. This reduces the number of operations from 2 to 1. (Note, however, that two AES operations are typically much faster than a single hash operation, especially when utilizing the AES-NI instruction.)

  • Fixed-key AES and use of AES-NI [5]: AES-NI is a set of CPU instructions that are now part of the Intel architecture. They allow AES computations to be carried out at incredibly fast rates, especially in modes of operation that can be highly pipelined. AES-NI offers instructions for encryption/decryption and for the AES key expansion.

    However, since typical AES usage encrypts many blocks under a single key, the key expansion instructions are not highly optimized, and the key schedule generation routine is relatively expensive (compared to encryption/decryption). More importantly, pipelining cannot be carried out between different keys. When computing garbled circuits, 4 different keys are used in every gate, requiring many key schedules to be computed and preventing the use of pipelining.

    In light of this, Bellare et al. [5] proposed a method of using AES that is secure in the public random permutation model. The method uses a fixed key for AES, applies AES to a combination of the input garbled values, and XORs the result with the appropriate output garbled value. This reduces the number of AES computations to 4 per gate. Furthermore, since a fixed key is used, only one key schedule needs to be computed for the entire circuit, and the encryptions within a gate can be fully pipelined. This led to an extraordinary speedup in the computation of garbled circuits, as demonstrated in the JustGarble implementation [5].

We stress that there have been a very large number of works that have provided highly significant efficiency improvements to protocols that use garbled circuits. However, our focus here is on improvements to garbled circuits themselves.

1.3 Our Results

We construct fast garbling methods solely under the assumption that AES behaves like a pseudorandom function. In particular, we do not use fixed-key AES, and we require two AES encryptions for generating each ciphertext in the garbled gates (since known techniques using just one encryption require some sort of related-key security assumption). In addition, we do not use free-XOR (since this requires circularity); on the other hand, forgoing free-XOR enables us to use 4-to-2 row reduction. In brief, we construct the following:

  • Fast AES-NI without fixing the key: We show that, in addition to pipelining encryptions, it is also possible to pipeline the key schedule of AES-NI, in order to achieve very fast garbling times without using fixed-key AES or any other nonstandard AES variant. Namely, the key schedule processing of different keys can be pipelined together, so that the amortized effect of key scheduling on Yao garbling is greatly reduced. Our experiments (described below) show that, with this and other optimizations, AES operations have become so fast that the benefit of using fixed-key AES is almost insignificant. Thus, in contrast to current popular belief, in most cases fixed-key AES is not necessary for achieving extremely fast garbling.

  • Low-communication XOR gates: Over the past years, it has become apparent that in secure protocols, communication is far more problematic than computation. The free-XOR technique is so attractive precisely because it requires essentially no computation and, more importantly, no communication for XOR gates. We provide a new garbling method for XOR gates that requires storing only a single ciphertext per XOR gate; our technique is inspired by the work of Kolesnikov et al. [14]. The computational cost is 3 AES computations for garbling the gate and 1–2 AES computations for evaluating it. (This overhead is for an optimized garbling method that we show; we first present a basic scheme requiring 4 AES computations for garbling and 2 computations for evaluation.)

  • Fast 4–2 row reduction: As we have mentioned, once we no longer use the free-XOR technique, we are able to apply 4–2 GRR to the non-XOR gates. However, the method of Pinkas et al. [21], which uses polynomial interpolation, is rather complex to implement (requiring finite field operations and precomputation of special constants to make it fast). In addition, even when working in \(\textit{GF}(2^{n})\) and using the PCLMULQDQ Intel instruction, the cost is still approximately half an AES computation. We present a new method for 4–2 row reduction that uses only a few XOR operations and is trivial to implement.

Table 1 shows the communication cost that our technique incurs, compared to the other optimizations mentioned above, together with the security assumption that each technique is based upon.

Table 1 Summary of garbled circuit size optimizations compared to our scheme

We implemented these optimizations and compared them to JustGarble [5]. There is no doubt that the cost of garbling and evaluation is higher using our method, since we have to run AES key schedules, and we pay for computing XOR gates. However, we show that within protocol executions, the difference is insignificant. We demonstrate this by running Yao’s protocol for semi-honest adversaries, which consists of oblivious transfer (for which we use the fast OT extensions of [2]), garbled circuit computation and evaluation, and communication.Footnote 1

Experimental Results We ran Yao’s protocol for semi-honest adversaries inside Amazon EC2. The details of the results can be found in Sect. 6. The results show that removing the public random permutation assumption does not noticeably affect the performance of the protocol. Furthermore, in many scenarios, such as small circuits, large inputs, or relatively slow communication channels, garbling under the most conservative assumption (the existence of PRFs) performs on par with the most efficient garbling methods.

Patent-Free Garbled Circuits Another considerable advantage of using our method for computing XOR gates with low communication is that it does not rely on the free-XOR technique and thus is not patented. Since patents in cryptography are typically an obstacle to adoption, we believe that the search for efficient garbling techniques that are not patented is of great importance.

Garbling Under a Weaker Yet Nonstandard Assumption Our work focuses on the comparison between garbling under a variety of strong assumptions (i.e., circularity, public random permutation) and garbling under a standard pseudorandom function assumption only. However, there are also garbling schemes that have been proven secure under a related-key assumption, but without circularity [14]. In order to provide a more complete picture of the trade-off between efficiency and security, we continue, in Sect. 5, the direction introduced by Kolesnikov et al. [14]. We present two new heuristics for solving the algorithmic problem presented in [14] and show that a related-key assumption-based garbling scheme (using either of the suggested heuristics) improves garbling and computation time, but fails to significantly reduce the communication overhead of the protocol.

2 Fixed-Key AES Versus Regular AES

2.1 Background

Pipelined Garbling The standard way of garbling a gate uses double encryption. Specifically, given 4 keys \(k_i^0,k_i^1,k_j^0,k_j^1\) for the input wires and 2 keys \(k_\ell ^0,k_\ell ^1\) for the output wire, four computations of the type \(E_{k_i^a}(E_{k_j^b}(k_\ell ^c))\) are made, for varying values of \(a,b,c\in \{0,1\}\). Observe that since \(E_{k_j^b}(k_\ell ^c)\) must be known before encrypting again with \(k_i^a\), the encryptions must be computed sequentially and not in parallel. This makes a huge difference when using the AES-NI instructions, since the cost of 8 pipelined encryptions is only slightly more than the cost of a single non-pipelined encryption.Footnote 2 We therefore garble an AND gate in a way that enables pipelining. This is easily achieved by using the method of Lindell et al. [19] and applying a pseudorandom function F (which will be instantiated with AES) to the gate index and the appropriate signal/permutation bits. This ensures independence between all values. For example, an AND gate where both signal bits are 0 can be garbled as follows:

$$\begin{aligned} F_{k_i^0}(g \Vert 00) \oplus F_{k_j^0}(g\Vert 00) \oplus k_\ell ^0&F_{k_i^0}(g \Vert 01) \oplus F_{k_j^1}(g\Vert 01) \oplus k_\ell ^0\\ F_{k_i^1}(g \Vert 10) \oplus F_{k_j^0}(g\Vert 10) \oplus k_\ell ^0&F_{k_i^1}(g \Vert 11) \oplus F_{k_j^1}(g\Vert 11) \oplus k_\ell ^1 \end{aligned}$$

(One way of looking at this is simply double encryption in “counter mode”; intuitively, this is therefore secure.) Needless to say, 4-to-3 GRR can also be carried out by setting \(k_\ell ^0=F_{k_i^0}(g \Vert 00) \oplus F_{k_j^0}(g\Vert 00)\), meaning that the first ciphertext equals 0 and so need not be stored. Observe here that there are 8 encryptions. However, all inputs are known in advance, and therefore, it is possible to pipeline these computations.
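
As a minimal illustration, the following C sketch garbles such an AND gate with 4-to-3 GRR. It assumes a hypothetical function prf(key, g, a, b) implementing \(F_{key}(g \Vert ab)\) (e.g., with AES-128) and represents garbled values as __m128i blocks; the eight prf invocations are mutually independent and can therefore be pipelined.

    #include <stdint.h>
    #include <emmintrin.h>   /* __m128i and _mm_xor_si128 */

    /* Hypothetical PRF: F_key(g || ab), e.g., AES-128 under 'key'. */
    extern __m128i prf(__m128i key, uint32_t g, int a, int b);

    /* Garble AND gate g (both signal bits 0) with 4-to-3 row reduction:
       the entry for inputs (0,0) is forced to zero and is not stored. */
    void garble_and(uint32_t g,
                    __m128i ki0, __m128i ki1,   /* keys on input wire i */
                    __m128i kj0, __m128i kj1,   /* keys on input wire j */
                    __m128i kl1,                /* random key for 1 on output wire */
                    __m128i *kl0,               /* derived key for 0 on output wire */
                    __m128i T[3])               /* the 3 stored ciphertexts */
    {
        /* Eight independent PRF calls; all inputs are known in advance,
           so these computations can be pipelined. */
        __m128i a00 = prf(ki0, g, 0, 0), b00 = prf(kj0, g, 0, 0);
        __m128i a01 = prf(ki0, g, 0, 1), b01 = prf(kj1, g, 0, 1);
        __m128i a10 = prf(ki1, g, 1, 0), b10 = prf(kj0, g, 1, 0);
        __m128i a11 = prf(ki1, g, 1, 1), b11 = prf(kj1, g, 1, 1);

        /* 4-to-3 GRR: choose the 0-key on the output wire so that the
           ciphertext for input combination (0,0) equals 0. */
        *kl0 = _mm_xor_si128(a00, b00);

        T[0] = _mm_xor_si128(_mm_xor_si128(a01, b01), *kl0); /* (0,1) -> k_l^0 */
        T[1] = _mm_xor_si128(_mm_xor_si128(a10, b10), *kl0); /* (1,0) -> k_l^0 */
        T[2] = _mm_xor_si128(_mm_xor_si128(a11, b11), kl1);  /* (1,1) -> k_l^1 */
    }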

Note that it is essential to take both signal bits as part of the input of F. Otherwise, the scheme is not secure. To understand this, assume that the gate was garbled as in the example above, but without using signal bits (e.g., the value \(F_{k_i^1}(g)\) is used instead of \(F_{k_i^1}(g \Vert 10)\)), and assume that the evaluator holds the keys \(k_i^0,k_j^0\). The evaluator will compute \(k_\ell ^0\), but then it will also be able to compute \(F_{k_i^1}(g)\) and \(F_{k_j^1}(g)\) using the second and the third garbled entries (without learning the values of \(k_j^1\) or \(k_i^1\)). Now, the evaluator would be able to compute \(k_\ell ^1\) as well, using the fourth garbled entry. Taking both signal bits as part of F’s input prevents this from happening, as the evaluator cannot learn \(F_{k_i^1}(g \Vert 11)\) and \(F_{k_j^1}(g\Vert 11)\).

The Fixed-Key AES Approach Although the approach described above enables pipelining the encryptions, it still requires running four key schedules for garbling a gate and two key schedules for evaluating a gate. This is very expensive, and so Bellare et al. [5] introduced the use of fixed-key AES in garbling schemes and implemented the JustGarble library. This significantly speeds up garbling since the AES key schedule (which is quite expensive) need not be computed at every gate. In addition, JustGarble utilizes the AES-NI instruction set and pipelining, significantly reducing the cost of the AES computations.

Despite its elegance, the use of fixed-key AES requires the assumption that AES with a fixed and known key behaves like a random permutation. This is a very strong assumption, and one that has been brought into question for AES specifically by the block cipher research community; see, for example, [4, 6, 13]. Clearly, the acceptance of this assumption in the context of secure computation and garbling is due to the perceived very high cost of garbling in any other way. However, the comparisons to prior work carried out in [5] are to Kreuter et al. [17], who use AES-256 with AES-NI but without pipelining, and to Huang et al. [11], who use a hash function only. Thus, it is unclear how much of the impressive speedup achieved by Bellare et al. [5] is due to the savings obtained by using fixed-key AES, and how much is due to the other elements that they included (pipelining of the AES computations in each gate, optimizations to the circuit representation, and more).

2.2 Pipelining Key Schedule and Encryption

In this section, we show that it is possible to achieve fast garbling without using fixed-key AES and thus without resorting to the assumption that AES with a fixed key behaves like a public random permutation. We stress that some penalty will of course be incurred since the AES key schedule is expensive. Nevertheless, we show that when properly implemented, in many cases the penalty is not significant and it suffices to use regular AES. The goal is to make the performance depend on the throughput (which is excellent when pipelining is used) and not on the latency of a single computation. This goal can be achieved rather easily for the AES encryption alone, but we also achieve the more challenging task of pipelining the key schedule as well as the encryption.

The computations that are needed for garbling and evaluating garbled circuits are as follows:

  • KS4_ENC8: This consists of the computation of 4 AES key schedules from 4 different keys. The resulting keys are then used to encrypt 8 blocks (each key is used for encrypting 2 blocks). This is used for garbling AND (and other non-XOR) gates.

  • KS2_ENC2: This consists of the computation of 2 AES key schedules from 2 different keys. The resulting keys are then used to encrypt 2 blocks (each key is used for encrypting 1 block). This is used for evaluating all gates.

  • KS4_ENC4: This consists of the computation of 4 AES key schedules from 4 different keys. The resulting keys are then used to encrypt 4 blocks (each key is used for encrypting 1 block). This is used for garbling XOR gates according to our new XOR gate garbling scheme described in Sect. 3.2.

A naïve software implementation approach for these computations would use the appropriate sequence of calls to a “key expansion” function and to a “block encryption” function. To estimate the performance of that approach, we use, as a comparison baseline, the OpenSSL (1.0.2) library, running on the Haswell architecture.Footnote 3

Software running on this processor can use the AES hardware support, known as AES-NI (see [8, 9] for details). On this platform, a call (using the OpenSSL library) to the AES key expansion consumes 149 CPU cycles. A call to an (ECB) encryption function to encrypt 2/4/8 blocks consumes approximately \(70+\) cycles (an explanation is provided below). However, OpenSSL’s API does not support ECB encryption with multiple key schedules. For example, this implies that KS4_ENC4 would require 4 calls to the key expansion function, followed by 4 calls to an ECB encryption, each one applied to a single (16-byte) block. The resulting performance of KS4_ENC4, KS4_ENC8, and KS2_ENC2 obtained by calling OpenSSL’s functions (namely “aesni_set_encrypt_key” and “aesni_ecb_encrypt”) is summarized in the middle column of Table 2 at the end of this section.
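
For illustration, the naive call sequence for KS4_ENC4 has the following structure; the sketch below uses OpenSSL’s public AES interface (the measurements above call the internal routines named in the text), with one key expansion and one single-block encryption per key.

    #include <openssl/aes.h>

    /* Naive KS4_ENC4: expand 4 independent 128-bit keys and encrypt one
       16-byte block under each, using separate library calls. */
    void ks4_enc4_naive(const unsigned char key[4][16],
                        const unsigned char in[4][16],
                        unsigned char out[4][16])
    {
        AES_KEY ks;
        for (int i = 0; i < 4; i++) {
            AES_set_encrypt_key(key[i], 128, &ks);  /* key expansion        */
            AES_encrypt(in[i], out[i], &ks);        /* single-block encrypt */
        }
    }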

Our goal is to optimize the computations of KS4_ENC4, KS4_ENC8, KS2_ENC2 and alleviate the overhead imposed by the frequent key replacements. We achieve our optimization by: (a) interleaving the encryption of independent blocks; (b) optimizing the key expansion; (c) aggressive interleaving of the operations; (d) building an API that allows for encrypting with multiple key schedules. The details are as follows.

Interleaved Encryption AES encryption on a modern processor is accelerated by using the AES-NI instructions (see [8, 9]). Assuming that the cipher key is expanded to a key schedule of 11 round keys, RK[j], \(j=0, \ldots , 10\), AES encryption of a 16-byte block X is achieved by the code sequence sketched below.

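The following C-intrinsics sketch (our illustration of the flow just described) applies the whitening XOR with RK[0], nine AESENC rounds, and a final AESENCLAST:

    #include <wmmintrin.h>   /* AES-NI intrinsics */

    /* Encrypt one 16-byte block X under the expanded key schedule RK[0..10]. */
    static inline __m128i aes128_encrypt_block(__m128i X, const __m128i RK[11])
    {
        X = _mm_xor_si128(X, RK[0]);                /* whitening step */
        for (int j = 1; j <= 9; j++)
            X = _mm_aesenc_si128(X, RK[j]);         /* rounds 1..9    */
        return _mm_aesenclast_si128(X, RK[10]);     /* last round     */
    }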

If the latency of the AESENC/AESENCLAST instructions is L cycles, then the above flow can be completed in \(1+10L\) cycles. However, if the throughput of AESENC/AESENCLAST is 1 (i.e., pipelining can be used and the processor can dispatch an AESENC/AESENCLAST every cycle, if the data is available), and the computation encrypts more than one block, then the software can interleave the AESENC/AESENCLAST invocations. This achieves a higher computational throughput than single-block encryption. Furthermore, the AESENC/AESENCLAST instructions can be applied with any round key, even round keys generated by different key schedules. For example, 2 blocks X and Y can be encrypted with 2 different key schedules KS1 and KS2 by the code sequence sketched below.

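A corresponding C-intrinsics sketch (again ours): the two streams are independent, so consecutive AESENC invocations can be dispatched back to back.

    #include <wmmintrin.h>

    /* Encrypt block X under key schedule KS1[0..10] and block Y under
       KS2[0..10], interleaving the round computations of the two streams. */
    static inline void aes128_encrypt2(__m128i *X, __m128i *Y,
                                       const __m128i KS1[11],
                                       const __m128i KS2[11])
    {
        __m128i x = _mm_xor_si128(*X, KS1[0]);      /* the two whitening XORs */
        __m128i y = _mm_xor_si128(*Y, KS2[0]);      /* execute in one cycle   */
        for (int j = 1; j <= 9; j++) {
            x = _mm_aesenc_si128(x, KS1[j]);        /* independent rounds are */
            y = _mm_aesenc_si128(y, KS2[j]);        /* pipelined by the CPU   */
        }
        *X = _mm_aesenclast_si128(x, KS1[10]);
        *Y = _mm_aesenclast_si128(y, KS2[10]);
    }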

These computations can be completed within \(10L+1\) cycles (the 2 XORs of the whitening step can be executed in one cycle). Similarly, encrypting 4/8 blocks with an interleaved software flow could (theoretically) terminate after \((2 + 10L + 3)\) / \((4 + 10L + 7)\) cycles, respectively. (This idealized estimate assumes that the round keys are fetched from the processor’s cache and ignores the cost of loading/storing the input/output blocks. We point out that our code indeed closely approaches the theoretical performance under these assumptions.) These computations are dominated only by the throughput of AESENC/AESENCLAST. We note that \(L=7\) on Haswell, and the AESENC/AESENCLAST throughput is 1.

Optimized Key Expansion We were able to optimize the computation of the AES key expansion so that it computes (and stores) an AES128 key schedule in 96 cycles on Haswell, which is 1.55 times faster than the code used by OpenSSL on the same platform. The details of this optimization are quite low level, and we provide only a high-level description here. A full set of key expansion code options was contributed to the NSS open source library and can be found in [10].

The AES-NI instruction set includes instructions that facilitate key expansion. For the encryption key schedule, the relevant instruction is AESKEYGENASSIST. However, this instruction does not provide a throughput of 1 and is significantly slower than the AESENC and AESENCLAST operations (the reason being that key schedules are typically computed only once, and so the cost involved in optimizing this instruction was not justified). We observe that the key schedule consists of S-box substitutions together with rotation and XOR operations. Likewise, the last round of AES consists of S-box substitutions together with ShiftRows (and key mixing, which can be effectively canceled by using a round key of all zeros, since XORing with zero has no effect on the result). Thus, the use of AESKEYGENASSIST can be replaced by a combination of a shuffle followed by an AESENCLAST invocation, to isolate the S-box transformation.Footnote 4 The shuffle is carried out efficiently using the PSHUFB instruction, which also has a throughput of 1. We therefore obtain that the key schedule can be “simulated” using much faster instructions. Additional optimizations can be obtained by judicious usage of the available instructions to generate efficient sequences. We give one example. Consider the following portion of the AES key schedule flow, sketched below (where \(\mathtt {RCON = Rcon[i/4]}\)).

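In scalar form, this portion of the key expansion can be sketched as follows (our rendering, using the standard notation in which SubWord applies the AES S-box to each byte of a 32-bit word and RotWord rotates the word by one byte):

    #include <stdint.h>

    /* The AES S-box applied to each byte of a 32-bit word (table omitted). */
    extern uint32_t SubWord(uint32_t w);

    /* Rotate the four bytes of a word cyclically by one byte position. */
    static inline uint32_t RotWord(uint32_t w) { return (w << 8) | (w >> 24); }

    /* One step of the AES-128 key expansion (i a multiple of 4): w[] holds
       the 32-bit key schedule words and RCON = Rcon[i/4]. */
    void key_expansion_step(uint32_t w[], int i, uint32_t RCON)
    {
        w[i]   = w[i-4] ^ SubWord(RotWord(w[i-1])) ^ RCON; /* S-box, rotation, RCON */
        w[i+1] = w[i-3] ^ w[i];                            /* arrangement and       */
        w[i+2] = w[i-2] ^ w[i+1];                          /* XOR-ing of the        */
        w[i+3] = w[i-1] ^ w[i+2];                          /* remaining words       */
    }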

As explained above, the S-box substitution can be isolated by a shuffle followed by AESENCLAST, and if we place (duplicated) RCON in the second operand of AESENCLAST, the addition of RCON is also done by AESENCLAST. The arrangement and XOR-ing of the “words” can be implemented by the following straightforward flow:

[Straightforward flow: 3 shuffles and 3 XORs]

However, the same functionality can be achieved by a shorter flow of 4 instructions, as follows:

[Optimized 4-instruction flow]

In this way, the 3 shuffles and 3 XORs of the straightforward flow are replaced by a shorter and faster sequence of 1 shift, 1 shuffle, and 2 XORs. With our optimizations, we were able to write key expansion code that computes and stores an AES128 key schedule in 96 cycles on Haswell (i.e., 1.55 times faster than OpenSSL).

Multiple Aggressive Interleaving A higher degree of optimization can be achieved by interleaving the computations of multiple key expansions. This helps to partially alleviate the key expansion’s dependency on the latency of AESENC. For example, our code for expanding 2 key schedules consumes 124 cycles (on Haswell), which is significantly less than the \(2 \times 96\) cycles required by two independent (non-interleaved) key schedules. We applied this technique to obtain optimized KS4_ENC4 and KS4_ENC8 implementations. For KS2_ENC2, optimization is achieved by “mixed interleaving” of the key expansion and the encryptions.

The performance of the optimized KS4_ENC4, KS4_ENC8, and KS2_ENC2 is summarized in the right column of Table 2.

Table 2 Performance (in cycles) of KS4_ENC4, KS4_ENC8 and KS2_ENC2, measured on the Haswell architecture
Table 3 Garbling and evaluation times for the AES circuit 1000 times (in milliseconds)

2.3 Experimental Results

The results in Table 3 show the garbling and evaluation times for 1000 AES circuits, using the free-XOR technique and 4-to-3 row reduction (as used by JustGarble, in order to make a fair comparison). All methods use pipelining of the encryptions (the last two entries do not use a fixed key and therefore use the encryption pipelining method described in Sect. 2.1). The last entry also uses the key schedule pipelining method described in Sect. 2.2. The table shows the results for garbling and evaluating the circuit (with the garbling time first, followed by the evaluation time). We stress that the times in Table 3 are for 1000 computations; thus, a single garbling of the AES circuit using our pipelined key schedule takes only 0.74 ms.

The results were achieved on the Amazon EC2 c4.large Linux instance (with a 2.59 GHz Intel Xeon E5-2666 v3 Haswell processor, a single thread, and 3.75 GiB of RAM).

The results show that pipelining the key schedule as well as the encryptions (3rd row) reduces the time by more than 50% relative to pipelining the encryptions only (2nd row). Fixed-key AES (1st row) does provide a significant improvement and the best performance. However, the gain from using fixed-key AES is not overwhelming since, as we will show later on, in many settings the main cost of secure computation is no longer the garbling itself. Namely, although garbling the AES circuit takes 86% more time without a fixed key, the absolute difference is just 0.344 ms. Thus, when run in a protocol that includes communication, this additional time makes almost no difference. We demonstrate this in our experiments described in Sect. 6.

3 Garbling Under a Pseudorandom Function Assumption Only

3.1 Background

The free-XOR technique [15] is one of the most significant optimizations of garbling. When using this technique, the garbling and evaluation of XOR gates are essentially for free, requiring only two XOR operations for garbling and one for evaluating. In addition, no garbled table is used, thereby significantly reducing communication. However, the free-XOR technique also requires nonstandard assumptions. Specifically, when using this method, there is a global offset \(\Delta \), and on every wire a single random \(k_i^0\) is chosen and the other key is always set to \(k_i^1= k_i^0 \oplus \Delta \). This is secure in the random oracle model [15] or under a circular-secure correlation robustness or circular-secure related-key assumption [7] (correlation robustness is formalized for hash functions, whereas related-key security is for encryption or pseudorandom functions). The need for this assumption is due to the fact that when a global offset is used, multiple encryptions are made under related keys \(k_a,k_a\oplus \Delta \), \(k_b,k_b\oplus \Delta \), and so on. In addition, since these keys are used to encrypt the values \(k_c\) and \(k_c\oplus \Delta \), the encrypted values are related to the secret keys, which is exactly a circularity. We remark that at some additional cost, the circularity assumption can be removed using the FleXOR technique [14]. However, the correlation robustness/related-key assumption remains.Footnote 5
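
To see the circularity concretely, note that with a global offset \(\Delta \) a garbled-gate entry can, for example, have the form

$$\begin{aligned} F_{k_a\oplus \Delta }(x) \oplus F_{k_b\oplus \Delta }(x) \oplus \left( k_c\oplus \Delta \right) , \end{aligned}$$

where x is the appropriate gate-specific input. Here \(\Delta \) appears both in the keys under which F is computed and in the value being masked, and standard pseudorandomness of F is not known to suffice for such key-dependent encryption.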

We next show that it is possible to efficiently garble a circuit using a pseudorandom function only. We first show a basic version of our garbling scheme, in which the garbled table for a XOR gate contains a single ciphertext and requires 4 pseudorandom function operations for garbling (instead of 8 for an AND gate) and 2 for evaluation. We then show an optimized version that reduces the number of PRF invocations to 3 calls for garbling and 1–2 calls for evaluation. The overhead of these schemes is certainly higher than that of the free-XOR technique. However, as we will show, the techniques are a considerable improvement over the naive method of garbling a XOR gate like an AND gate, they enable the use of 4–2 garbled row reduction (4–2 GRR), and within protocols (where communication and other factors become the bottleneck) they perform well.

3.2 Garbled XOR With a Single Ciphertext

In order to prove security solely under the assumption that the primitive used is a pseudorandom function, all the garbled values on all wires should be independently chosen. Thus, for all pairs of wires i and j, the keys \(k_i^0,k_i^1,k_j^0,k_j^1\) should be independent and either uniformly distributed or pseudorandom. It will be useful to equivalently write the keys as \(k_i^0\) and \(k_i^1=k_i^0\oplus \Delta _i\), and \(k_j^0\) and \(k_j^1=k_j^0\oplus \Delta _j\) where \(\Delta _i,\Delta _j\) are random independent strings.

We use the point-and-permute method, described briefly in the introduction. In order to avoid confusion, we will call the bit used to determine the order of the ciphertexts in the garbling phase the permutation bit (since it determines the random order), and we call the bit that is viewed by the evaluator when it evaluates the circuit the signal bit (since it signals which ciphertext is to be decrypted). We denote the permutation bit on wire i by \(\pi _i\), and we denote the signal bit on wire i by \(\lambda _i\). Observe that if the value on wire i is \(v_i\) (a value that the evaluator does not know), then it holds that \(\lambda _i=\pi _i\oplus v_i\). Thus, if \(\pi _i=0\), then the evaluator will see \(\lambda _i=v_i\), and if \(\pi _i=1\), then the evaluator will see \(\lambda _i=\overline{v}_i\) (its complement). Since \(\pi _i\) is random, this reveals nothing about \(v_i\) whatsoever.

We now describe the basic XOR gate garbling method that uses just a single ciphertext. The method requires 4 calls to a pseudorandom function for garbling, but as we have seen, this is inexpensive using AES-NI. (We remark that AND gates are garbled in the standard way, independently of this method.) Denote the input wires to the gate by i and j, and denote the output wire from the gate by \(\ell \). We therefore have input keys \(k_i^0,k_i^0\oplus \Delta _i\) and \(k_j^0,k_j^0\oplus \Delta _j\). According to the above, we denote by \(\pi _i,\pi _j\) the permutation bits on wires i and j, respectively. As we will see, the keys on the output wire will be determined as a result of the garbling method. The method for garbling a XOR gate with index g is as follows:

  • Step 1—translate input keys on wire i: We first translate the input keys on wire i into new keys \(\tilde{k}_i^0,\tilde{k}_i^1\) by applying a pseudorandom function to the gate index. That is, we compute \(\tilde{k}_i^0 = F_{k_i^0}(g)\) and \(\tilde{k}_i^1 = F_{k_i^1}(g)\), where g is the gate index.

  • Step 2—set offset of wire \(\ell \): The offset of wire \(\ell \) (the output wire) is set to be the offset of the translated values on wire i, namely \(\Delta _\ell =\tilde{k}_i^0 \oplus \tilde{k}_i^1\). (Observe that if the same wires are input to multiple gates, independent values will be obtained since the pseudorandom function is applied to the gate index.)

  • Step 3—translate input keys on wire j: Next, we translate the input keys on wire j so that they too have the offset \(\Delta _\ell \) (this will enable the output key to be computed by XORing the translated input keys, as in the free-XOR technique). Thus, we set \(\tilde{k}_j^{\pi _j} = F_{k_j^{\pi _j}}(g)\) and \(\tilde{k}_j^{\bar{\pi }_j} = \tilde{k}_j^{\pi _j} \oplus \Delta _\ell \), where \(\pi _j\) is the random permutation bit that is associated with the bit 0 on wire j.

  • Step 4—compute output keys on wire \(\ell \): Since \(\tilde{k}_i^0 \oplus \tilde{k}_i^1 = \tilde{k}_j^0 \oplus \tilde{k}_j^1 =\Delta _\ell \), we can now use the free-XOR technique and can define \(k_\ell ^0= \tilde{k}_i^0 \oplus \tilde{k}_j^0\) and \(k_\ell ^1 = k_\ell ^0 \oplus \Delta _\ell \). (Observe that \(\tilde{k}_i^1 \oplus \tilde{k}_j^1 = k_\ell ^0\) as required, since \(\tilde{k}_i^0 \oplus \tilde{k}_i^1 = \tilde{k}_j^0 \oplus \tilde{k}_j^1\) implies that \(\tilde{k}_i^0 \oplus \tilde{k}_j^0 = \tilde{k}_i^1 \oplus \tilde{k}_j^1\). In addition, \(\tilde{k}_i^0 \oplus \tilde{k}_j^1 = \tilde{k}_i^1 \oplus \tilde{k}_j^0 = k_\ell ^1\) as required, since in both cases the result of the XOR is \(\tilde{k}_i^0 \oplus \tilde{k}_j^0 \oplus \Delta _\ell = k_\ell ^0 \oplus \Delta _\ell = k_\ell ^1\).)

  • Step 5—set the ciphertext: Given \(k_i^a\) for any \(a\in \{0,1\}\), the evaluator can easily compute \(\tilde{k}_i^a\). In addition, if it has \(k_j^{\pi _j}\) (as we show, this can be implicitly determined from the signal bit \(\lambda _i\)), then it can compute \(\tilde{k}_j^{\pi _j}\). The only problem is that it cannot compute \(\tilde{k}_j^{\bar{\pi }_j}\) since it does not know \(\Delta _\ell \) (and furthermore \(\Delta _\ell \) cannot be revealed). Thus, the ciphertext for the gate is set to \(T = F_{k_j^{\bar{\pi }_j}}(g) \oplus \tilde{k}_j^{\bar{\pi }_j}\). Now, given \(k_j^{\bar{\pi }_j}\) it is possible to compute \(\tilde{k}_j^{\bar{\pi }_j}\) as well (but without \(k_j^{\bar{\pi }_j}\) the value remains hidden since it is masked by a pseudorandom function keyed by \(k_j^{\bar{\pi }_j}\)).

In order to evaluate a XOR gate g with ciphertext T, given a key \(k_i\) on wire i and a key \(k_j\) on wire j, the evaluator simply needs to compute \(\tilde{k}_i=F_{k_i}(g)\) and either \(\tilde{k}_j = F_{k_j}(g)\), if its signal bit on wire j is 0, or \(\tilde{k}_j = F_{k_j}(g) \oplus T\), if its signal bit on wire j is 1. Then, the key on the output wire is obtained by computing \(k_\ell = \tilde{k}_i \oplus \tilde{k}_j\).
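
The following C sketch (an illustration only) summarizes Steps 1–5 and the evaluation procedure. It assumes a hypothetical function prf(key, g) implementing \(F_{key}(g)\) and represents keys as __m128i values; in the full scheme of Sect. 3.4, the PRF input additionally contains a permutation bit.

    #include <stdint.h>
    #include <emmintrin.h>   /* __m128i and _mm_xor_si128 */

    /* Hypothetical PRF: F_key(g), e.g., AES-128 under 'key' applied to the
       gate index. */
    extern __m128i prf(__m128i key, uint32_t g);

    /* Garble XOR gate g with a single ciphertext T (Steps 1-5 above).
       pj is the permutation bit on wire j, so k_j^{pj} is the key whose
       signal bit is 0. */
    void garble_xor(uint32_t g, int pj,
                    __m128i ki0, __m128i ki1,    /* keys for 0/1 on wire i */
                    __m128i kj0, __m128i kj1,    /* keys for 0/1 on wire j */
                    __m128i *kl0, __m128i *kl1,  /* output wire keys       */
                    __m128i *T)
    {
        __m128i ti0   = prf(ki0, g);                    /* Step 1 */
        __m128i ti1   = prf(ki1, g);
        __m128i delta = _mm_xor_si128(ti0, ti1);        /* Step 2 */
        __m128i tj_p  = prf(pj ? kj1 : kj0, g);         /* Step 3 */
        __m128i tj_np = _mm_xor_si128(tj_p, delta);
        __m128i tj0   = pj ? tj_np : tj_p;              /* translated key for 0 */
        *kl0 = _mm_xor_si128(ti0, tj0);                 /* Step 4 */
        *kl1 = _mm_xor_si128(*kl0, delta);
        *T   = _mm_xor_si128(prf(pj ? kj0 : kj1, g), tj_np);  /* Step 5 */
    }

    /* Evaluate XOR gate g given the held keys ki, kj and the signal bit
       lambda_j observed on wire j. */
    __m128i eval_xor(uint32_t g, __m128i ki, __m128i kj, int lambda_j, __m128i T)
    {
        __m128i ti = prf(ki, g);
        __m128i tj = prf(kj, g);
        if (lambda_j) tj = _mm_xor_si128(tj, T);
        return _mm_xor_si128(ti, tj);
    }

With the optimization described next, the call prf(pj ? kj1 : kj0, g) in Step 3 is dropped and the key \(k_j^{\pi _j}\) is used directly, so that garbling requires 3 PRF calls and evaluation 1–2.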

The computational cost of garbling the gate is 4 pseudorandom function computations, and the computational cost of evaluating the gate is 2 pseudorandom function computations. Most significantly, the gate table includes only a single ciphertext.

Reducing the Number of PRF Calls to 3 Observe that the pseudorandom function is used to ensure independence of the \(\Delta \) values between different gates. If we were to just take \(\Delta _\ell = k_i^0 \oplus k_i^1\), then the output \(\Delta \) from two different gates with the same input wire i would be the same, and once again correlation robustness or a related-key assumption would be needed. Thus, it is necessary to compute \(\tilde{k}_i^0 = F_{k_i^0}(g)\) and \(\tilde{k}_i^1 = F_{k_i^1}(g)\). In contrast, \(\tilde{k}_j^{\pi _j}\) can be taken to simply be \(k_j^{\pi _j}\) and the pseudorandom function computation is not needed. This is because \(\Delta _\ell \) is fixed independently of wire j. Using this method, we can reduce the computational cost of garbling the XOR gate from 4 pseudorandom function computations to 3 pseudorandom function computations (and the computational cost of evaluating the gate is decreased from 2 to either 1 or 2 PRF computations). The proof of security with this optimization is somewhat more involved, and we therefore prove it separately from the basic scheme.Footnote 6

Garbling NOT Gates When using free-XOR, it is possible to efficiently garble NOT gates by simply defining them to be XOR with a fixed wire that is always given value 1. Since the XOR gates are free, this is highly efficient. However, since we are not using free-XOR, a different method needs to be found. Fortunately, NOT gates can still be computed for free and with no additional assumption. In order to see this, let g be a NOT gate with input wire i and output wire j, and let \(k_i^0,k_i^1\) be the garbled values on wire i. Then, we simply define \(k_j^0 := k_i^1\) and \(k_j^1 := k_i^0\). During the garbling of the circuit, any gate receiving wire j as input will use these “reversed” values. Furthermore, when evaluating the circuit, if the value \(k_i^0\) is given on wire i, then the result of the NOT gate is \(k_j^1\), which equals \(k_i^0\). Thus, nothing needs to be done. This trivially preserves security since no additional information is provided in the garbled circuit.

3.3 Garbling Scheme Definitions

We use the notation of Bellare et al. [3] in which a garbling scheme consists of 4 algorithms:

  • \(\mathsf{Garble}(1^n,c)\rightarrow (C,e,d)\) is an algorithm that takes as input a security parameter \(1^n\) and a description of a boolean circuit c and returns a triple (C, e, d), where C represents a garbled circuit, e represents input encoding information (i.e., all the keys on the input wires), and d represents output decoding information (i.e., all the keys on the output wires).

  • \(\mathsf{Encode}(e,x)\rightarrow X\) is a function that takes as input encoding information e and input x and returns garbled input (i.e., the keys on the input wires that are associated with the concrete input x).

  • \(\mathsf{Eval}(C,X)\rightarrow Y\) is a function that takes as input a garbled circuit C and garbled input X and returns garbled output Y (i.e., the keys on the output wires that are associated with the concrete output \(y=c(x)\)).

  • \(\mathsf{Decode}(Y,d)\rightarrow y\) is a function that takes as input decoding information d and garbled output Y and returns either the real output y of the circuit or \(\bot \).

A secure garbling scheme should satisfy three security requirements:

  • Privacy The triple (C, X, d) should not reveal any information about x that cannot be learned directly from c(x). More formally, there exists a simulator \({\mathcal {S}}\) that receives input \((1^n, c, c(x))\) and outputs a simulated garbled circuit with garbled input and decoding information that is indistinguishable from (C, X, d) generated using the real garbling functions \(\mathsf{Garble}(1^n,c)\) and \(\mathsf{Encode}(e,x)\). Observe that \({\mathcal {S}}\) knows the output c(x) and does not know the input x.

  • Obliviousness (C, X) should not reveal any information about x. More formally, there exists a simulator \({\mathcal {S}}\) that receives input \((1^n,c)\) and outputs a simulated garbled circuit with garbled input that is indistinguishable from (C, X) generated using the real garbling functions \(\mathsf{Garble}(1^n,c)\) and \(\mathsf{Encode}(e,x)\). Observe that \({\mathcal {S}}\) here is not even given the output.

  • Authenticity Given (C, X) as input, no adversary should be able to produce a garbled output \(\tilde{Y}\) that decodes to a value other than c(x) or \(\bot \), except with negligible probability.

For each security definition, we define an experiment that formalizes the adversary’s task. In the following, G denotes a garbling scheme that consists of the 4 algorithms stated above, and \({\mathcal {S}}\) denotes a simulator.

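As an illustration, the privacy experiment can be outlined as follows (our sketch, consistent with the privacy requirement above and with Definition 3.1; the obliviousness and authenticity experiments are analogous, using (C, X) and a forged garbled output \(\tilde{Y}\), respectively). In \(\mathsf{Expt}^\mathrm{priv}_{G,\mathcal{A},{\mathcal {S}}}(n)\):

  • \(\mathcal{A}(1^n)\) outputs a circuit c and an input x.

  • A bit \(b\leftarrow \{0,1\}\) is chosen uniformly. If \(b=0\), then \((C,e,d)\leftarrow \mathsf{Garble}(1^n,c)\) and \(X=\mathsf{Encode}(e,x)\) are computed; if \(b=1\), then \((C,X,d)\leftarrow {\mathcal {S}}(1^n,c,c(x))\).

  • \(\mathcal{A}\) is given (C, X, d) and outputs a bit \(b'\); the experiment outputs 1 if and only if \(b'=b\).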

The basic non-triviality requirement for a garbling scheme, called correctness, is that for every circuit c and input \(x\in \{0,1\}^{\mathrm{poly}(n)}\), it holds that \(\mathsf{Decode}(\mathsf{Eval}(C,\mathsf{Encode} (e,x)),d) = c(x)\) except with negligible probability, where \((C,e,d)\leftarrow \mathsf{Garble}(1^n,c)\).

Definition 3.1

(Garbled Circuit Security) A garbling scheme is secure if it is correct, and achieves privacy, obliviousness, and authenticity as follows:

  1.

    A garbling scheme G achieves privacy if for every probabilistic polynomial time adversary \(\mathcal{A}\) there exists a probabilistic polynomial time simulator \({\mathcal {S}}\) and a negligible function \(\mu \) such that for every \(n\in {\mathbb {N}}\):

    $$\begin{aligned} \mathrm{Pr}\left[ \mathsf{Expt}^\mathrm{priv}_{G,\mathcal{A},{\mathcal {S}}}(n)=1\right] \le \frac{1}{2}+\mu (n). \end{aligned}$$
  2.

    A garbling scheme G achieves obliviousness if for every probabilistic polynomial time adversary \(\mathcal{A}\) there exists a probabilistic polynomial time simulator \({\mathcal {S}}\) and a negligible function \(\mu \) such that for every \(n\in {\mathbb {N}}\):

    $$\begin{aligned} \mathrm{Pr}\left[ \mathsf{Expt}^\mathrm{oblv}_{G,\mathcal{A},{\mathcal {S}}}(n)=1\right] \le \frac{1}{2}+\mu (n). \end{aligned}$$
  3.

    A garbling scheme G achieves authenticity if for every probabilistic polynomial time adversary \(\mathcal{A}\) there exists a negligible function \(\mu \) such that for every \(n\in {\mathbb {N}}\):

    $$\begin{aligned} \mathrm{Pr}\left[ \mathsf{Expt}^\mathrm{auth}_{G,\mathcal{A}}(n)=1\right] <\mu (n) \end{aligned}$$

3.4 Our Garbling Scheme in Detail

In this section, we provide a full specification of our garbling scheme. In this description, we use the standard 4–3 row reduction technique. In later sections, we will incorporate our new 4–2 row reduction scheme. Our garbling scheme uses a pseudorandom function that takes an n-bit key, and has input and output of length \(n+1\). That is, \(F:\{0,1\}^n \times \{0,1\}^{n+1} \rightarrow \{0,1\}^{n+1}\) (formally, we consider a family of functions, where for every \(n\in {\mathbb {N}}\) the function is of this type). We denote by \(F_k(x)[1..n]\) the first n bits of the output of \(F_k(x)\), and we denote by \(x\Vert y\) the concatenation of x with y. We begin by defining the method for garbling XOR and AND gates in Figs. 1 and 2 (for simplicity we only consider XOR, AND, and NOT gates; the AND gate method can be extended to any gate type) and then proceed to the high-level garbling algorithm in Fig. 3. Finally, we describe the encoding, evaluation, and decoding algorithms.

Fig. 1 Garbling XOR gates

Fig. 2 Garbling AND gates

Fig. 3 Full garbling algorithm

We now proceed to describe the encoding, evaluation, and decoding algorithms. The encoding and decoding algorithms are straightforward and consist merely of mapping the plaintext bit to the garbled value and vice versa. Observe that in the evaluation algorithm we refer to the signal bit \(\lambda _i\) on wire i. The difference between \(\lambda _i\) here and \(\pi _i\) used in the garbling is that \(\lambda _i\) is the “public” signal bit that the evaluator sees. The invariant over this value is that \(\lambda _i\) always equals the XOR of \(\pi _i\) and the actual value on the wire (associated with the encoding X) (Figs. 4, 5, 6).

Correctness We begin by demonstrating correctness. This is immediate for AND and NOT gates; we therefore show that it also holds for XOR gates. Observe that the ciphertext in a XOR gate with input wires i, j and output wire \(\ell \) equals \(C[g]=F_{k_j^{\overline{\pi }_j}}(g\Vert 1)[1..n]\oplus \tilde{k}_j^{\overline{\pi }_j}\). However, \(\tilde{k}_j^{\overline{\pi }_j} = \tilde{k}_j^{\pi _j}\oplus \Delta _\ell =F_{k_j^{\pi _j}}(g\Vert 0)[1..n]\oplus \Delta _\ell \) and \(\Delta _\ell =\tilde{k}_i^{\pi _i}\oplus \tilde{k}_i^{\overline{\pi }_i}=F_{k_i^{\pi _i}}(g\Vert 0)[1..n]\oplus F_{k_i^{\overline{\pi }_i}}(g\Vert 1)[1..n]\). Thus,

$$\begin{aligned} C[g] = F_{k_i^{\pi _i}}(g\Vert 0)[1..n]\oplus F_{k_i^{\overline{\pi }_i}}(g\Vert 1)[1..n] \oplus F_{k_j^{\pi _j}}(g\Vert 0)[1..n] \oplus F_{k_j^{\overline{\pi }_j}}(g\Vert 1)[1..n]\qquad \end{aligned}$$
(1)
Fig. 4 Encoding algorithm

Fig. 5 Evaluation algorithm

Fig. 6 Decoding algorithm

where \(\pi _i,\pi _j\) are the permutation bits that are associated with the bit 0 on wires i and j, respectively. Now, assume that the evaluator holds the keys \(k_i^{v_i}\) and \(k_j^{v_j}\) that are associated with the (plain) bits \(v_i,v_j\). Then, according to procedure \(\mathsf{Eval}\), it computes: \(F_{{k}_i^{v_i}}(g\Vert \lambda _i)[1..n]\oplus F_{{k}_j^{v_j}}(g\Vert \lambda _j)[1..n]\oplus \lambda _j C[g]\). Thus, if \(\lambda _j=0\), then it computes

$$\begin{aligned} F_{{k}_i^{v_i}}(g\Vert \lambda _i)[1..n]\oplus F_{{k}_j^{v_j}}(g\Vert 0)[1..n] \end{aligned}$$
(2)

and if \(\lambda _j=1\), then it computes

$$\begin{aligned}&F_{{k}_i^{v_i}}(g\Vert \lambda _i)[1..n]\oplus F_{{k}_j^{v_j}}(g\Vert 1)[1..n] \oplus F_{k_i^{\pi _i}}(g\Vert 0)[1..n]\oplus F_{k_i^{\overline{\pi }_i}}(g\Vert 1)[1..n] \nonumber \\&\quad \oplus F_{k_j^{\pi _j}}(g\Vert 0)[1..n] \oplus F_{k_j^{\overline{\pi }_j}}(g\Vert 1)[1..n]. \end{aligned}$$
(3)

Recall that \(\lambda _j=v_j \oplus \pi _j\). Thus, if \(\lambda _j=1\), then \(v_j = \pi _j \oplus 1 = \bar{\pi }_j\), and if \(\lambda _j=0\), then \(v_j = \pi _j\), likewise for \(\lambda _i\), \(v_i\), and \(\pi _i\).

We first consider the case that \(\lambda _j=0\). Note that on wire i, we have that \(\tilde{k}_i^{v_i} = F_{k_i^{v_i}}(g\Vert \pi _i \oplus v_i)[1..n]\) (see Step 2 in Procedure \(\mathsf{GbXOR}\)). Thus, by the above relation between \(\lambda _i\), \(v_i\) and \(\pi _i\), it follows that \(\tilde{k}_i^{v_i} = F_{k_i^{v_i}}(g\Vert \lambda _i)[1..n]\). Furthermore, by Step 4 in Procedure \(\mathsf{GbXOR}\), we have that \(\tilde{k}_j^{\pi _j} = F_{k_j^{\pi _j}}(g\Vert 0)[1..n]\). In this case of \(\lambda _j=0\), we have that \(v_j=\pi _j\), and thus, \(\tilde{k}_j^{v_j} = F_{k_j^{v_j}}(g\Vert 0)[1..n]\). Combining this with (2), we conclude that when \(\lambda _j=0\), the evaluator computes

$$\begin{aligned} F_{{k}_i^{v_i}}(g\Vert \lambda _i)[1..n]\oplus F_{{k}_j^{v_j}}(g\Vert 0)[1..n] = \tilde{k}_i^{v_i} \oplus \tilde{k}_j^{v_j}. \end{aligned}$$

Now consider \(\lambda _j=1\). Observe that \(F_{{k}_i^{v_i}}(g\Vert \lambda _i)[1..n] \in \Big \{F_{k_i^{\pi _i}}(g\Vert 0)[1..n], F_{k_i^{\overline{\pi }_i}}(g\Vert 1)[1..n]\Big \}\) and that if \(\lambda _i=0\), then \(v_i=\pi _i\), and otherwise, \(v_i=\bar{\pi }_i\). Thus, \(F_{{k}_i^{v_i}}(g\Vert \lambda _i)[1..n]\) cancels out and

$$\begin{aligned} F_{{k}_i^{v_i}}(g\Vert \lambda _i)[1..n] \oplus F_{k_i^{\pi _i}}(g\Vert 0)[1..n] \oplus F_{k_i^{\overline{\pi }_i}}(g\Vert 1)[1..n] = F_{{k}_i^{\bar{v}_i}}(g\Vert \bar{\lambda }_i)[1..n]. \end{aligned}$$

If \(v_i=0\) then \(\lambda _i=\pi _i\) and we have \(F_{{k}_i^{\bar{v}_i}}(g\Vert \bar{\lambda }_i)[1..n] = F_{{k}_i^1}(g\Vert \bar{\pi }_i)[1..n]\), which is exactly \(\tilde{k}_i^1\) according to Step 2 of Procedure \(\mathsf{GbXOR}\). If \(v_i=1\) then \(\lambda _i=\overline{\pi }_i\) and we have \(F_{{k}_i^{\bar{v}_i}}(g\Vert \bar{\lambda }_i)[1..n] = F_{{k}_i^0}(g\Vert \pi _i)[1..n]\), which is exactly \(\tilde{k}_i^0\). In both cases, we receive \(\tilde{k}_i^{\overline{v}_i}\). Likewise \(F_{{k}_j^{v_j}}(g\Vert 1)[1..n] = F_{k_j^{\overline{\pi }_j}}(g\Vert 1)[1..n]\) because \(\lambda _j=1\) and so \(v_j=\overline{\pi }_j\). Thus, this element cancels out and

$$\begin{aligned} F_{{k}_j^{v_j}}(g\Vert 1)[1..n] \oplus F_{k_j^{\pi _j}}(g\Vert 0)[1..n] \oplus F_{k_j^{\overline{\pi }_j}}(g\Vert 1)[1..n] = F_{{k}_j^{\pi _j}}(g\Vert 0)[1..n] = \tilde{k}_j^{\pi _j} = \tilde{k}_j^{\overline{v}_j}, \end{aligned}$$

where the second-to-last equality is from Step 4 in Procedure \(\mathsf{GbXOR}\). We conclude that when \(\lambda _j=1\), the evaluator receives \(\tilde{k}_i^{\overline{v}_i} \oplus \tilde{k}_j^{\overline{v}_j}\).

Since \(\tilde{k}_i^0\oplus \tilde{k}_i^1 = \tilde{k}_j^0 \oplus \tilde{k}_j^1\), we conclude that the output equals \(\tilde{k}_i^{v_i} \oplus \tilde{k}_j^{v_j}\) for both values of \(\lambda _j\). The fact that this yields the correct output is immediate from the way the output wire values are chosen for the gate.

Intuition for Security As just explained, the ciphertext in a XOR gate is the result of XORing the four outputs of the pseudorandom function:

$$\begin{aligned} C[g] = F_{k_i^{\pi _i}}(g\Vert 0)[1..n]\oplus F_{k_i^{\overline{\pi }_i}}(g\Vert 1)[1..n] \oplus F_{k_j^{\pi _j}}(g\Vert 0)[1..n]\oplus F_{k_j^{\overline{\pi }_j}}(g\Vert 1)[1..n] \end{aligned}$$

Each one of these four computations uses a different key, from which only two keys are known to the evaluator. Since we use the gate index as an input to the function, we are guaranteed that when a wire enters multiple gates, the pseudorandom values we compute will be different in each of the gates. Thus, the ciphertext looks like a random string to the evaluator. In addition, the output wire key values are determined by the result of the pseudorandom function computation as well. Thus, they are new keys that do not appear elsewhere in the circuit. We stress that the four values in the equation above are not the four new translated keys. If that was the case, then XORing them would yield 0, because the same offset is used in both wires after the translation. Instead, the first three values are the translated keys, but the last value is just a pseudorandom string that is used to mask them in a “one-time pad”-like encryption.

A similar argument applies to AND gates. Since the evaluator can compute only two of the eight PRF values using the two keys it holds, and since the values used in computing the garbled table are unique and do not appear elsewhere in the circuit (again, this is ensured by using the gate index and the permutation bits as input to each pseudorandom function computation), the gate ciphertexts that are not associated with the keys known to the evaluator look random to the evaluator.

3.5 Proof of Security

3.5.1 Preliminaries

We begin by defining an experiment based on pseudorandom functions that will be convenient for proving security of the garbling scheme. As we have mentioned, we consider a family of functions \({\mathcal {F}}=\{F_n\}_{n\in {\mathbb {N}}}\) where for every n it holds that \(F_n:\{0,1\}^n\times \{0,1\}^{n+1}\rightarrow \{0,1\}^{n+1}\). For clarity, we drop the subscript and write \(F_k(x)\) where \(k\in \{0,1\}^n\) instead of \(F_n(k,x)\).

We now define the experiment, called 2PRF. In this experiment, the distinguisher/adversary is given access to four oracles, divided into two pairs. The second and fourth oracles are always the pseudorandom functions \(F_{k_1}\) and \(F_{k_2}\), respectively. In contrast, the first and third oracles are either the same pseudorandom functions \(F_{k_1}\) and \(F_{k_2}\), respectively, or independent truly random functions \(f^1\) and \(f^2\). Clearly, if \(\mathcal{A}\) can make the same query to the first and second oracles, or to the third and fourth oracles, then it can easily distinguish the two cases. The security requirement is that as long as it does not make such queries, it cannot distinguish the cases. We prove that this property holds for any pseudorandom function. The experiment is formally defined in Fig. 7, and 2PRF security is formalized in Definition 3.2.

Fig. 7 2PRF experiment
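
In outline, and using oracle labels \({\mathcal {O}}_1,\ldots ,{\mathcal {O}}_4\) of our own, \(\mathsf{Expt}_{{\mathcal {F}},\mathcal{A}}^{2PRF}(n,b)\) proceeds as follows (this restates the description above; the formal definition is given in Fig. 7):

  • Keys \(k_1,k_2\leftarrow \{0,1\}^n\) are chosen uniformly, and \(f^1,f^2\) are independent truly random functions from \(\{0,1\}^{n+1}\) to \(\{0,1\}^{n+1}\).

  • If \(b=0\), then \(({\mathcal {O}}_1,{\mathcal {O}}_3)=(F_{k_1},F_{k_2})\); if \(b=1\), then \(({\mathcal {O}}_1,{\mathcal {O}}_3)=(f^1,f^2)\). In both cases, \({\mathcal {O}}_2=F_{k_1}\) and \({\mathcal {O}}_4=F_{k_2}\).

  • \(\mathcal{A}^{{\mathcal {O}}_1,{\mathcal {O}}_2,{\mathcal {O}}_3,{\mathcal {O}}_4}(1^n)\) is run, under the restriction that \(\mathcal{A}\) never queries the same value to both \({\mathcal {O}}_1\) and \({\mathcal {O}}_2\), nor to both \({\mathcal {O}}_3\) and \({\mathcal {O}}_4\). The experiment outputs whatever bit \(\mathcal{A}\) outputs.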

Definition 3.2

Let \({\mathcal {F}}=\{F_n\}_{n\in {\mathbb {N}}}\) be an efficient family of functions where for every n, \(F_n:\{0,1\}^n\times \{0,1\}^{n+1}\rightarrow \{0,1\}^{n+1}\). Family \({\mathcal {F}}\) is a 2PRF if for every probabilistic polynomial time adversary \(\mathcal{A}\) there exists a negligible function \(\mu \) such that for every n,

$$\begin{aligned} \left| \mathrm{Pr}\left[ \mathsf{Expt}_{{\mathcal {F}},\mathcal{A}}^{2PRF}(n,1)=1\right] -\mathrm{Pr}\left[ \mathsf{Expt}_{{\mathcal {F}},\mathcal{A}}^{2PRF}(n,0)=1\right] \right| <\mu (n) \end{aligned}$$

The following lemma shows that pseudorandomness of \(F_k\) is sufficient for it to be 2PRF as well.

Lemma 3.3

If \({\mathcal {F}}\) is a family of pseudorandom functions, then it is a 2PRF.

Proof

Assume that F is a PRF. Denote by \(\mathsf{Expt}_\mathcal{A}^{g_1,g_2,g_3,g_4}(n)\) the experiment where \(\mathcal{A}\) is given oracle access to functions \(g_1,g_2,g_3,g_4\) (under the input limitations outlined in the experiment). Using this notation, we have that \(\mathsf{Expt}_{{\mathcal {F}},\mathcal{A}}^{2PRF}(n,0)=\mathsf{Expt}_\mathcal{A}^{F_{k_1},F_{k_1},F_{k_2},F_{k_2}}(n)\) and \(\mathsf{Expt}_{{\mathcal {F}},\mathcal{A}}^{2PRF}(n,1)=\mathsf{Expt}_\mathcal{A}^{f^1,F_{k_1},f^2,F_{k_2}}(n)\).

First, a straightforward reduction to the security of the pseudorandom function (with a hybrid for two pseudorandom functions) yields that for every probabilistic polynomial time adversary \(\mathcal{A}\) there exists a negligible function \(\mu \) such that for every n,

$$\begin{aligned} \left| \mathrm{Pr}\left[ \mathsf{Expt}_\mathcal{A}^{F_{k_1},F_{k_1},F_{k_2},F_{k_2}}(n) =1\right] - \mathrm{Pr}\left[ \mathsf{Expt}_\mathcal{A}^{f^1,f^1,f^2,f^2}(n) =1\right] \right| \le \mu (n). \end{aligned}$$

Note that oracle access to the same random function twice or to two different random functions is identical when there is a constraint that the same input cannot be supplied to both oracles. Thus, for every adversary \(\mathcal{A}\) and for every n,

$$\begin{aligned} \mathrm{Pr}\left[ \mathsf{Expt}_\mathcal{A}^{f^1,f^1,f^2,f^2}(n) =1\right] = \mathrm{Pr}\left[ \mathsf{Expt}_\mathcal{A}^{f^1,f^3,f^2,f^4}(n) =1\right] . \end{aligned}$$

Next, we claim that for every probabilistic polynomial time adversary \(\mathcal{A}\) there exists a negligible function \(\mu \) such that for every n,

$$\begin{aligned} \left| \mathrm{Pr}\left[ \mathsf{Expt}_\mathcal{A}^{f^1,f^3,f^2,f^4}(n) =1\right] - \mathrm{Pr}\left[ \mathsf{Expt}_\mathcal{A}^{f^1,F_{k_1},f^2,f^4}(n) =1\right] \right| \le \mu (n). \end{aligned}$$

This follows from a direct reduction to the pseudorandomness of F (the reduction simulates \(f^1,f^2,f^4\) itself and uses its own oracle in place of the second oracle, which is thus either \(f^3\) or \(F_{k_1}\)). Likewise, a direct reduction yields that for every probabilistic polynomial time adversary \(\mathcal{A}\) there exists a negligible function \(\mu \) such that for every n,

$$\begin{aligned} \left| \mathrm{Pr}\left[ \mathsf{Expt}_\mathcal{A}^{f^1,F_{k_1},f^2,f^4}(n) =1\right] - \mathrm{Pr}\left[ \mathsf{Expt}_\mathcal{A}^{f^1,F_{k_1},f^2,F_{k_2}}(n) =1\right] \right| \le \mu (n). \end{aligned}$$

Here the reduction simulates \(f^1,F_{k_1},f^2\) itself. Combining all of the above, we conclude that for every adversary \(\mathcal{A}\) there exists a negligible function \(\mu \) such that for every n,

$$\begin{aligned}&\left| \mathrm{Pr}\left[ \mathsf{Expt}_{{\mathcal {F}},\mathcal{A}}^{2PRF}(n,0)=1\right] - \mathrm{Pr}\left[ \mathsf{Expt}_{{\mathcal {F}},\mathcal{A}}^{2PRF}(n,1)=1\right] \right| \\&\quad = \left| \mathrm{Pr}\left[ \mathsf{Expt}_\mathcal{A}^{F_{k_1},F_{k_1},F_{k_2},F_{k_2}}(n) =1\right] - \mathrm{Pr}\left[ \mathsf{Expt}_\mathcal{A}^{f^1,F_{k_1},f^2,F_{k_2}}(n) =1\right] \right| \le \mu (n). \end{aligned}$$

\(\square \)

3.5.2 The Proof of Security of Our Garbling Scheme

We begin by proving that our garbling scheme achieves privacy. Let G denote our garbling scheme. Our proof follows the high-level structure of Lindell and Pinkas [18], with modifications as needed for our garbling scheme.

Theorem 3.4

If \({\mathcal {F}}\) is a family of pseudorandom functions, then the garbling scheme G achieves privacy.

Proof

We begin by describing a simulator \({\mathcal {S}}\) for the \(\mathsf{Expt}^\mathrm{priv}\) privacy experiment. S is invoked with input \((1^n, c, c(x))\) and works as follows. As we will show, \({\mathcal {S}}\) will define an active key on every wire. This key will be the one that is “obtained” in the evaluation procedure. The other key is not active and is actually never explicitly defined. Rather, all the ciphertexts in the gates that are not “decrypted” in the evaluation are chosen at random.

  1. For each input wire j in circuit c:

     (a) Choose an active key: \(k_j\leftarrow \{0,1\}^n\)

     (b) Choose an active signal bit \(\lambda _j\leftarrow \{0,1\}\)

     (c) Prepare the garbled input data: \(X[j]=k_j\Vert \lambda _j\)

  2. In topological order, for each gate g in c:

     (a) If g is a XOR gate with input wires i, j and output wire \(\ell \):

         (i) Compute the active output wire signal bit: \(\lambda _\ell :=\lambda _i\oplus \lambda _j\)

         (ii) Compute a translated new key for wire i: \(\tilde{k}_i:=F_{k_i}(g\Vert \lambda _i)[1..n]\)

         (iii) Compute a translated new key for wire j, and the ciphertext for this gate:

              A. If \(\lambda _j=0\), set \(\tilde{k}_j:=F_{k_j}(g\Vert 0)[1..n]\) and \(C[g]\leftarrow \{0,1\}^n\) (in this case, the translated key is obtained by computing F and so is correctly computed, but the ciphertext is not used and so is random)

              B. If \(\lambda _j=1\), set \(\tilde{k}_j\leftarrow \{0,1\}^n\) and \(C[g]:=F_{k_j}(g\Vert 1)[1..n]\oplus \tilde{k}_j\) (in this case, the translated key is obtained via the ciphertext, and so the ciphertext is correctly computed, but using a random key)

         (iv) Compute the output wire active key: \(k_\ell :=\tilde{k}_i\oplus \tilde{k}_j\)

     (b) If g is an AND gate with input wires i, j and output wire \(\ell \):

         (i) Set the output wire active key and signal bit:

              A. If \(\lambda _i=\lambda _j=0\), set \(k_\ell \Vert \lambda _\ell :=F_{k_i}(g\Vert 00)\oplus F_{k_j}(g\Vert 00)\) (in this case, the output key is computed via F and so must be set in this way)

              B. Else, set \(k_\ell \Vert \lambda _\ell \leftarrow \{0,1\}^{n+1}\) (in this case, the output key is computed from the ciphertexts and is chosen at random)

         (ii) Compute the gate’s ciphertexts (they are random except for the one that is opened according to the active signal bits):

              A. If \(2\lambda _i+\lambda _j\ne 0\), then \(T_{2\lambda _i+\lambda _j}:=F_{k_i}(g\Vert \lambda _i\lambda _j)\oplus F_{k_j}(g\Vert \lambda _i\lambda _j)\oplus k_\ell \Vert \lambda _\ell \)

              B. For \(\alpha \in \{1,2,3\}\setminus \{2\lambda _i+\lambda _j\}:\) \(T_{\alpha }\leftarrow \{0,1\}^{n+1}\)

              C. \(C[g]\leftarrow \{T_1,T_2,T_3\}\)

     (c) If g is a NOT gate with input wire i and output wire \(\ell \): set \(k_\ell \Vert \lambda _\ell =k_i\Vert \lambda _i\)

  3. For each output wire j in c:

     (a) Prepare the decoding information: \(d[j,c(x)_j]:=F_{k_j}(\mathrm{out}\Vert \lambda _j)\) and \(d[j,\overline{c(x)}_j]\leftarrow \{0,1\}^n\)

  4. Return (C, X, d)

Note that the garbled tables in the simulator-generated garbled circuit consist of random strings, except for the ciphertexts used in the evaluation itself. Specifically, in an AND gate all ciphertexts are random in the case that \(\lambda _i=\lambda _j=0\) since none are used in evaluation; in all other cases, the single ciphertext which is decrypted is constructed “correctly,” whereas all the others are random. Likewise, in a XOR gate where \(\lambda _j=0\) the ciphertext is random since in this case the ciphertext is not used in evaluation.

We now show that the simulated garbled circuit is indistinguishable from a real garbled circuit by reduction to the 2PRF experiment, which by Lemma 3.3 follows merely from the fact that F is a pseudorandom function. Let \(\mathcal{A}\) be a probabilistic polynomial time adversary for \(\mathsf{Expt}^\mathrm{priv}\), and let m denote the number of gates in the circuit. We define a hybrid distribution \(H_i(c,x)\) with \(0\le i \le m\) as the triple (C, X, d) generated in the following way (note that the procedure for generating \(H_i(c,x)\) is given the circuit c and the real input x):

  • Garbling of gates The garbled circuit C is generated by garbling the first i gates in the topological order using the simulator garbling procedure, while gates \(i+1,\ldots ,m\) are garbled using the real garbling scheme. Observe that the simulator generates only a single key per wire; specifically, it generates the active key \(k_j^{\lambda _j}\). The first step in the hybrid is therefore to choose an additional key \(k_j^{1-\lambda _j}\) for every wire that enters or exits a gate that is garbled according to the real scheme.

  • Encoding information X For each circuit input wire j that enters a gate g, if g is garbled using the real scheme (i.e., the gate’s index \(> i\)), then X[j] is the garbled value that was chosen to represent the jth bit of the input (recall that in experiment \(\mathsf{Expt}^\mathrm{priv}\), the adversary knows the input string x and so can choose the correct encoding for x). Else, if g is garbled using the simulator procedure (i.e., the gate’s index \(\le i\)), then X[j] is the garbled value that was chosen for the active key of that wire.

  • Decoding information d For each output wire j that exits from a gate g, if g was garbled using the real scheme then there are two garbled values on wire j, and \(d[j,\cdot ]\) is generated exactly as in the \(\mathsf{Garble}\) procedure. Else, if g is garbled using the simulator instructions, then there is only one garbled value on j and \(d[j,\cdot ]\) is generated exactly as in the simulator procedure.

Note that the hybrid \(H_0(x)\) is a real garbled circuit (and is distributed as (C, X, d) in \(\mathsf{Expt}^\mathrm{priv}_{G,\mathcal{A},{\mathcal {S}}}\) in the case that \(\beta =0\)), while \(H_m(x)\) is the output of the simulator \({\mathcal {S}}\) (and is distributed as (C, X, d) in \(\mathsf{Expt}^\mathrm{priv}_{G,\mathcal{A},{\mathcal {S}}}\) in the case that \(\beta =1\)). Next, for each \(0\le i\le m\), we define \(\mathcal{A}_i\) to be a probabilistic polynomial time adversary for the \(\mathsf{Expt}_{{\mathcal {F}},\mathcal{A}_i}^{2PRF}(n,\sigma )\) experiment. \(\mathcal{A}_i\) is given access to four oracles:

$$\begin{aligned} ({\mathcal {O}}^{(1)}(\cdot ),{\mathcal {O}}^{(2)}(\cdot ),{\mathcal {O}}^{(3)}(\cdot ),{\mathcal {O}}^{(4)}(\cdot ))= (f^1(\cdot )\ or\ F_{k_1}(\cdot ), F_{k_1}(\cdot ), f^2(\cdot )\ or\ F_{k_2}(\cdot ), F_{k_2}(\cdot )). \end{aligned}$$

Adversary \(\mathcal{A}_i\) runs \(\mathsf{Expt}^\mathrm{priv}\) with adversary \(\mathcal{A}\). First, it invokes \(\mathcal{A}\) and receives (c, x). Then, as we will see, it constructs a garbled circuit which will either be distributed according to \(H_{i-1}\) or \(H_{i}\), depending on the oracles it received. Thus, as we will show, if \(\mathcal{A}\) can succeed in \(\mathsf{Expt}^\mathrm{priv}\) with probability that is non-negligibly greater than 1/2, then \(\mathcal{A}_i\) will distinguish in the 2PRF experiment with non-negligible probability.

Formally, adversary \(\mathcal{A}_i\) constructs a garbled circuit by generating the first \(i-1\) gates in topological order using the simulator procedure, and generating the gates indexed by \(i+1,\ldots ,m\) using the real \(\mathsf{Garble}\) instructions (with subroutines \(\mathsf{GbXOR}\) and \(\mathsf{GbAND}\)). However, for the ith gate, \(\mathcal{A}_i\) will use its oracles to generate a garbled table that is garbled as in the real scheme or as in the simulator code, depending on whether it received oracle access to pseudorandom or to random functions. Assume the input wires of the ith gate are a and b and the output wire is c. In addition, assume that the active keys on the input wires are associated with the bits \(v_a,v_b\) (recall that \(\mathcal{A}_i\) knows the input to the circuit and thus \(v_a,v_b\) are known to it). Knowing \(k_a^{v_a}\) and \(k_b^{v_b}\), adversary \(\mathcal{A}_i\) will (implicitly) use the secrets \(k_1,k_2\) that were chosen for the pseudorandom function in \(\mathsf{Expt}^{2PRF}\) as \(k_a^{\overline{v}_a}\) and \(k_b^{\overline{v}_b}\), respectively. Thus, whenever \(\mathcal{A}_i\) needs to compute \(F_{k_a^{\overline{v}_a}}(x)\) or \(F_{k_b^{\overline{v}_b}}(x)\) for some x, it will send x to its oracles \({\mathcal {O}}^{(1)}\) or \({\mathcal {O}}^{(3)}\), respectively (recall that these are either also \(F_{k_1},F_{k_2}\) or are random functions \(f^1,f^2\)). We remark that \({\mathcal {O}}^{(2)}\) and \({\mathcal {O}}^{(4)}\) are used to garble gates \(\ell >i\) that use wires a and b as well; this will be described after we present the method for garbling the ith gate. We separately consider the case that the ith gate is a XOR gate and the case that it is an AND gate.

Case 1—the ith gate is a XOR gate: the keys on the input wires to this gate were generated using the simulator procedure. Thus, \(\mathcal{A}_i\) holds one key on each input wire a and b, denoted \(k_a\) and \(k_b\), respectively. \(\mathcal{A}_i\) sets these keys to be \(k_a^{v_a}\) and \(k_b^{v_b}\), respectively. In addition, \(\mathcal{A}_i\) has signal bits \(\lambda _a,\lambda _b\) that were determined on these wires. \(\mathcal{A}_i\) constructs the gate as follows:

  1. \(\mathcal{A}_i\) computes the permutation bit for the output wire c: \(\pi _c:=\pi _a\oplus \pi _b=(\lambda _a\oplus v_a)\oplus (\lambda _b\oplus v_b)\)

  2. \(\mathcal{A}_i\) computes new translated keys for wire a: \(\tilde{k}_a^{v_a}:=F_{k_a^{v_a}}(g\Vert \lambda _a)[1..n]\) and \(\tilde{k}_a^{\overline{v}_a}:={\mathcal {O}}^{(1)}(g\Vert \overline{\lambda }_a)[1..n]\) (observe that if \({\mathcal {O}}^{(1)}\) is pseudorandom then this is a “real” key value, whereas if it is a random function then this is an independent random key)

  3. \(\mathcal{A}_i\) computes the offset of the output wire: \(\Delta _c:=\tilde{k}_a^{v_a}\oplus \tilde{k}_a^{\overline{v}_a}\)

  4. \(\mathcal{A}_i\) computes new translated keys for wire b and the ciphertext:

     (a) If the signal bit of \(k_b^{v_b}\) is 0 (i.e., if \(\lambda _b=0\)), set \(\tilde{k}_b^{v_b}:=F_{k_b^{v_b}}(g\Vert 0)[1..n]\) and \(\tilde{k}_b^{\overline{v}_b}:=\tilde{k}_b^{v_b}\oplus \Delta _c\), and define \(C[g]:={\mathcal {O}}^{(3)}(g\Vert 1)[1..n]\oplus \tilde{k}_b^{\overline{v}_b}\)

     (b) Else, if the signal bit of \(k_b^{v_b}\) is 1 (i.e., if \(\lambda _b=1\)), set \(\tilde{k}_b^{\overline{v}_b}:={\mathcal {O}}^{(3)}(g\Vert 0)[1..n]\) and \(\tilde{k}_b^{v_b}:=\tilde{k}_b^{\overline{v}_b}\oplus \Delta _c\), and define \(C[g]:=F_{k_b^{v_b}}(g\Vert 1)[1..n]\oplus \tilde{k}_b^{v_b}\)

  5. \(\mathcal{A}_i\) computes the output wire keys: \(k_c^0:=\tilde{k}_a^0\oplus \tilde{k}_b^0\) and \(k_c^1:=k_c^0\oplus \Delta _c\)

It is easy to see that when \(\sigma =0\) (and the oracle answers are pseudorandom strings), the code is identical to the real garbling scheme. In contrast, when \(\sigma =1\) (and the oracle answers are random strings), the result is exactly according to the simulator instructions. In order to see this, observe that if \(\lambda _b=0\) then C[g] is random, exactly as in Step 2(a)iiiA of the simulator. This is because \({\mathcal {O}}^{(3)}\) is random and so the XOR with \(\tilde{k}_b^{\overline{v}_b}\) makes no difference. Likewise, if \(\lambda _b=1\) then the active key \(\tilde{k}_b^{v_b}\) is random since it is the XOR of the output of \({\mathcal {O}}^{(3)}\) with another value, and C[g] is the XOR of this key with the appropriate output from \(F_{k_b^{v_b}}\). Thus, this is also exactly as in Step 2(a)iiiB of the simulator.

Case 2—the ith gate is an AND gate: As before, for wires a and b, \(\mathcal{A}_i\) has two keys \(k_a^{v_a},k_b^{v_b}\), two signal bits \(\lambda _a,\lambda _b\) and the bits \(v_a,v_b\) that are on the wires. Then, it does the following:

  1. Compute the values \(K_0,\ldots ,K_3\):

     $$\begin{aligned} K_{2\lambda _a+\lambda _b}&:= F_{k_a^{v_a}}(g\Vert \lambda _a\lambda _b)\oplus F_{k_b^{v_b}}(g\Vert \lambda _a\lambda _b)\\ K_{2\lambda _a+\overline{\lambda }_b}&:= F_{k_a^{v_a}}(g\Vert \lambda _a\overline{\lambda }_b)\oplus {\mathcal {O}}^{(3)}(g\Vert \lambda _a\overline{\lambda }_b)\\ K_{2\overline{\lambda }_a+\lambda _b}&:= {\mathcal {O}}^{(1)}(g\Vert \overline{\lambda }_a\lambda _b)\oplus F_{k_b^{v_b}}(g\Vert \overline{\lambda }_a\lambda _b)\\ K_{2\overline{\lambda }_a+\overline{\lambda }_b}&:= {\mathcal {O}}^{(1)}(g\Vert \overline{\lambda }_a\overline{\lambda }_b)\oplus {\mathcal {O}}^{(3)}(g\Vert \overline{\lambda }_a\overline{\lambda }_b) \end{aligned}$$

  2. Set the output wire keys and permutation bits:

     (a) Compute: \(\pi _a=v_a\oplus \lambda _a\) and \(\pi _b=v_b\oplus \lambda _b\)

     (b) If \(\pi _a=\pi _b= 1\), set \(k_c^0\Vert \pi _c\leftarrow \{0,1\}^{n+1}\) and \(k_c^1:=K_0[1..n]\)

     (c) Else, set \(k_c^0\Vert \pi _c:=K_0\) and \(k_c^1\leftarrow \{0,1\}^n\)

     Denote \(K_c^0:= k_c^0\Vert \pi _c\) and \(K_c^1:=k_c^1\Vert \overline{\pi }_c\)

  3. Compute the ciphertexts: For \(\alpha \in \{1,2,3\}\),

     (a) If \(\alpha =2\overline{\pi }_a+\overline{\pi }_b\), then \(T_{\alpha }:=K_{\alpha }\oplus K_c^1\)

     (b) Else: \(T_{\alpha }:=K_{\alpha }\oplus K_c^0 \)

     Set \(C[g]\leftarrow \{T_1,T_2,T_3\}\)

As in the previous case, when \(\sigma =0\), the code is identical to the real garbling scheme. When \(\sigma =1\), the answers of the oracles are random strings, and therefore all the rows in the garbled table are random as well, except for the row that is pointed to by the signal bits of the active keys (the row \(T_{2\lambda _a+\lambda _b}\) where the adversary computes the value of \(K_{2\lambda _a+\lambda _b}\) directly using the keys it holds). Thus, the gate is garbled as in the simulation.

We conclude that when \(\sigma =0\), the ith gate is garbled as in the real garbling scheme, while when \(\sigma =1\) the ith gate is garbled as in the simulator procedure. However, to complete the construction of the garbled circuit, \(\mathcal{A}_i\) needs to construct all the gates \(\ell >i\). For a gate \(\ell >i\) whose input wires exit gates of index i or greater, \(\mathcal{A}_i\) has both keys on the wires and so can compute the gate just like in the real garbling procedure. If a gate \(\ell >i\) has an input wire that is output from a gate \(j<i\) that does not equal a or b, then \(\mathcal{A}_i\) simply chooses the (inactive) key at random, like in the hybrid definition. Finally, for a gate that has an input wire a or b, the gate is constructed using oracles \({\mathcal {O}}^{(2)}\) and \({\mathcal {O}}^{(4)}\) and the same code as for gate i (except with these oracles instead of \({\mathcal {O}}^{(1)}\) and \({\mathcal {O}}^{(3)}\)). Since these oracles always use the pseudorandom functions, it follows that the computation of the gate is always according to the real garbling method. (Note that when garbling the \(\ell \)th gate, each of the queries to these oracles includes the gate’s number. Thus we are guaranteed that these queries were not sent to \({\mathcal {O}}^{(1)}\) and \({\mathcal {O}}^{(3)}\) when \(\mathcal{A}_i\) garbled the ith gate, as required in the \(\mathsf{Expt}_{{\mathcal {F}},\mathcal{A}}^{2PRF}(n,\sigma )\) experiment.)

Concluding the proof, when \(\sigma =0\), \(\mathcal{A}_i\) constructs the hybrid \(H_{i-1}(x)\), while when \(\sigma =1\), \(\mathcal{A}_i\) constructs the hybrid \(H_{i}(x)\). We therefore construct a single adversary \(\mathcal{A}'\) for 2PRF who chooses a random i and then runs \(\mathcal{A}_i\) with adversary \(\mathcal{A}\). By a standard hybrid argument, if \(\mathcal{A}\) succeeds with non-negligible probability in \(\mathsf{Expt}^\mathrm{priv}\) then \(\mathcal{A}'\) distinguishes between \(\mathsf{Expt}_{{\mathcal {F}},\mathcal{A}'}^{2PRF}(n,0)\) and \(\mathsf{Expt}_{{\mathcal {F}},\mathcal{A}'}^{2PRF}(n,1)\), with non-negligible probability. This contradicts the assumption that \({\mathcal {F}}\) is a family of pseudorandom functions. \(\square \)

Achieving Obliviousness and Authenticity In order to satisfy the obliviousness requirement, we need to construct a simulator that outputs (C, X) given only c as an input. Note that the simulator \({\mathcal {S}}\) constructed above for the privacy requirement outputs the triple (C, X, d). However, \({\mathcal {S}}\) uses c only for generating (C, X), and in particular the output c(x) is used only for generating d. Thus, we can simply remove the generation of the decoding information from \({\mathcal {S}}\)’s instructions, and we obtain a simulator that generates only (C, X) as required. Proving that this simulator’s output is indistinguishable from (C, X) generated by the real scheme is the same as in the proof of privacy (see Footnote 7).

Regarding authenticity, we need to show that a probabilistic polynomial time adversary \(\mathcal{A}\) that is given (C, X) as input can output \(\tilde{Y}\) such that \(\mathsf{Decode}(\tilde{Y},d)\notin \{c(x),\perp \}\) with at most negligible probability. Note that if we give \(\mathcal{A}\) the pair (C, X) generated by our simulator, it can succeed only with probability at most \(2^{-n}\). This is due to the fact that in the simulated garbled circuit, for each output wire j corresponding to the jth output bit, \(d[j,\overline{c(x)}_j]\) is a random string. Now, if given the real (C, X), the adversary can output such a \(\tilde{Y}\) with non-negligible probability, then it could be used by an adversary given (C, X, d) to break the privacy property, in contradiction to Theorem 3.4. Observe that since the adversary in the privacy experiment is given all of the decoding information d, it can efficiently verify if \(\mathcal{A}\) output a \(\tilde{Y}\) with the property that \(\mathsf{Decode}(\tilde{Y},d)\notin \{c(x),\bot \}\).

3.6 XOR Gates with Only Three PRF Computations

Our garbling method requires four calls to the pseudorandom function for garbling XOR gates, where each call uses a different key. In this section we show that it is possible to remove one of these calls by leaving one of the input keys unchanged. Recall that our ciphertext for a XOR gate g with input wires i, j and output wire \(\ell \) is:

$$\begin{aligned} C[g] = F_{k_i^{\pi _i}}(g\Vert 0)[1..n]\oplus F_{k_i^{\overline{\pi }_i}}(g\Vert 1)[1..n] \oplus F_{k_j^{\pi _j}}(g\Vert 0)[1..n]\oplus F_{k_j^{\overline{\pi }_j}}(g\Vert 1)[1..n]\quad \end{aligned}$$
(4)

Assume the evaluator has the keys \(k_i^{v_i},k_j^{v_j}\) and the signal bits \(\lambda _i,\lambda _j\) when computing the gate. Then, using the ciphertext C[g] it computes \(C[g]\oplus F_{k_i^{v_i}}(g\Vert \lambda _i)[1..n]\oplus F_{k_j^{v_j}}(g\Vert \lambda _j)[1..n]\) and obtains \(F_{k_i^{\overline{v}_i}}(g\Vert \overline{\lambda }_i)[1..n]\oplus F_{k_j^{\overline{v}_j}}(g\Vert \overline{\lambda }_j)[1..n]\), which is the XOR of two pseudorandom values. If we leave, for example, the value of \(k_j^{\overline{v}_j}\) unchanged—i.e., use it in Eq. (4) instead of \(F_{k_j^{\overline{v}_j}}(g\Vert \overline{\lambda }_j)[1..n]\)—the evaluator will be able to compute \(F_{k_i^{\overline{v}_i}}(g\Vert \overline{\lambda }_i)[1..n]\oplus k_j^{\overline{v}_j}\). Observe that the evaluator still cannot learn anything since one of the two values is a new pseudorandom value that does not appear anywhere else in the circuit (taking the gate index g as an input to F ensures that if a wire i or j enters multiple gates, then we compute a different value for each gate). Therefore, the ciphertext is pseudorandom as required. In addition, since the two keys on wire i are still translated to new keys, the output wire keys, generated in the same way as before, are guaranteed to obtain new fresh values. (See Footnote 6 as to why we cannot use the same method to remove one of the pseudorandom function calls on wire i as well.)

The Modified Garbling Scheme Denote the modified scheme where only three pseudorandom function calls are made by \(G'\). Figure 8 presents the modifications in \(G'\) compared to our base scheme; only the items in \(\mathsf{GbXOR}\) and \(\mathsf{Eval}\) that were changed appear, and the actual changes appear in bold. The procedure \(\mathsf{GbXOR}\) is changed by not changing the key on wire j that represents the bit \(v_j\) when \(v_j\oplus \pi _j=0\); i.e., \(\tilde{k}_j^0\) is set to \(k_j^0\) instead of \(F_{k_j^0}(g\Vert 0)[1..n]\). Consequently, in the \(\mathsf{Eval}\) procedure, the evaluator uses the signal bit it holds to decide whether to translate the key on wire j into a new key or not.

Fig. 8: Improved garbling scheme \(G'\)
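Since Fig. 8 lists only the changed items, the following sketch (in the same illustrative style and with the same caveats as the earlier XOR-gate sketch: HMAC-SHA256 as a stand-in PRF and byte-aligned key lengths) shows how that code changes in \(G'\): the signal-bit-0 key on wire j is left untranslated, saving one PRF call during garbling and, when \(\lambda _j=0\), one call during evaluation.

```python
import hashlib
import hmac

N = 16

def F(key: bytes, data: bytes) -> bytes:
    # Stand-in PRF for this sketch.
    return hmac.new(key, data, hashlib.sha256).digest()[:N]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def garble_xor_3prf(g: int, k_i, pi_i: int, k_j, pi_j: int):
    """XOR-gate garbling in G': only three PRF calls instead of four."""
    gid = g.to_bytes(4, 'big')
    kt_i = [F(k_i[b], gid + bytes([b ^ pi_i])) for b in (0, 1)]
    delta = xor(kt_i[0], kt_i[1])
    kt_j = [None, None]
    kt_j[pi_j] = k_j[pi_j]                  # left unchanged (previously F(k_j[pi_j], g||0))
    kt_j[1 - pi_j] = xor(kt_j[pi_j], delta)
    C = xor(F(k_j[1 - pi_j], gid + bytes([1])), kt_j[1 - pi_j])
    k_out0 = xor(kt_i[0], kt_j[0])
    return C, (k_out0, xor(k_out0, delta)), pi_i ^ pi_j

def eval_xor_3prf(g: int, C: bytes, key_i: bytes, lam_i: int, key_j: bytes, lam_j: int):
    """Evaluation: the signal bit decides whether the wire-j key is translated."""
    gid = g.to_bytes(4, 'big')
    kt_i = F(key_i, gid + bytes([lam_i]))
    kt_j = key_j if lam_j == 0 else xor(C, F(key_j, gid + bytes([1])))
    return xor(kt_i, kt_j), lam_i ^ lam_j
```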

Security Proof We now prove security of the modified scheme.

Theorem 3.5

If \({\mathcal {F}}\) is a family of pseudorandom functions, then the garbling scheme \(G'\) achieves privacy.

Proof Sketch

The proof is very similar to the proof of Theorem 3.4 for G. We describe the main changes that are needed in order to make the proof valid for our modified scheme \(G'\). First, let \({\mathcal {S}}'\) be a simulator that is identical to the simulator \({\mathcal {S}}\) from the proof of Theorem 3.4, except that when simulating XOR gates in step 2.(a).iii, when \(\lambda _j=0\), it sets \(\tilde{k}_j=k_j\) (instead of as \(F_{k_j}(g\Vert 0)[1..n]\)). In order to prove that the output of \({\mathcal {S}}'\) is indistinguishable from (C, X, d) generated by the garbling scheme \(G'\), we reduce the security to \(\mathsf{Expt}^{2PRF}\), as in Theorem 3.4. Specifically, the same hybrid distribution \(H_i(x)\) that was defined in the proof of Theorem 3.4 is used here. Then, we define a probabilistic polynomial time adversary \(\mathcal{A}_i\) for \(\mathsf{Expt}^{2PRF}\).

\(\mathcal{A}_i\) garbles the first \(i-1\) gates using the instructions of simulator \({\mathcal {S}}'\). When \(\mathcal{A}_i\) needs to construct the ith gate with input wires a, b and output wire c, \(\mathcal{A}_i\) holds two active keys \(k_a^{v_a},k_b^{v_b}\), two signal bits \(\lambda _a,\lambda _b\) and the actual bits \(v_a,v_b\) that are on the wires. If g is an AND gate, then \(\mathcal{A}_i\) proceeds exactly as in the proof of Theorem 3.4. If g is a XOR gate, then \(\mathcal{A}_i\) proceeds differently, depending on the value of \(\lambda _b\). If \(\lambda _b=0\) (i.e., the key that \(\mathcal{A}_i\) holds on wire b has the signal bit ‘0’), then in order to generate the ciphertext C[g], the adversary \(\mathcal{A}_i\) needs to use its oracle (see Step 4a of \(\mathcal{A}_i\) in the case that the ith gate is a XOR gate). In this case, it sets \(\tilde{k}_b^{v_b}:=k_b^{v_b}\) and sets the other key \(\tilde{k}_b^{\overline{v}_b}\) and the ciphertext C[g] exactly as in the proof of Theorem 3.4. However, if \(\lambda _b=1\), then \(\mathcal{A}_i\) does not use the oracle anymore in order to generate \(\tilde{k}_b^{\overline{v}_b}\) (see Step 4b of \(\mathcal{A}_i\) in the case that the ith gate is a XOR gate) since this value is not translated. Thus, \(\mathcal{A}_i\) just chooses \(k_b^{\overline{v}_b}\) randomly and uses this instead of the call to \({\mathcal {O}}^{(3)}\). This is the only change to \(\mathcal{A}_i\) (note that when constructing gates for \(\ell >i\) where b is an input wire, \(\mathcal{A}_i\) does not use \({\mathcal {O}}^{(3)}\) or \({\mathcal {O}}^{(4)}\), but rather uses \(\tilde{k}_b^{\overline{v}_b}\) as chosen above).

The remainder of the proof is the same. \(\square \)

4 Simple and Fast 4–2 GRR for Non-XOR gates

4.1 Overview

Abstractly, gate garbling typically works by generating four pseudorandom masks \(K_{0},K_{1},K_{2},K_{3}\), corresponding to the four possible input combinations (in some permuted order). In the notation we have used so far—see Procedure \(\mathsf{GbAND}\)—we have that \(K_0 = F_{k_i^{\pi _i}}(g\Vert 00)\oplus F_{k_j^{\pi _j}}(g\Vert 00)\), \(K_1= F_{k_i^{\pi _i}}(g\Vert 01)\oplus F_{k_j^{\bar{\pi }_j}}(g\Vert 01)\), and so on (note that \(K_1\) equals the value used to mask the output key \(k_\ell ^{g(\pi _i,\bar{\pi }_j)}\) in \(T_1\)). The evaluator of the circuit is able to compute one of these four masks, and can also use the signal bits to identify the index of that mask. Namely, it computes a pair \((i,K_{i})\) (but is unable to identify the real input combination corresponding to the value that it computed).

In our base scheme described in the previous section, we garbled non-XOR gates with three ciphertexts for each garbled gate. One of the ciphertexts was “removed” by setting one of the keys on the output wire to actually be \(K_0\) rather than using \(K_0\) to mask the key (this is called garbled row reduction, or GRR for short). In this section, we improve on this by applying a 4–2 row reduction technique on these gates in order to remove an additional ciphertext. There are two known such techniques: the 4-to-2 row reduction technique of Pinkas et al. [21] and the newer “half-gates” approach of Zahur et al. [23]. The “half-gates” technique was designed to be compatible with the free-XOR technique and actually requires free-XOR; as such, it is based on the circularity assumption and so is not suitable for this paper. In contrast, the 4–2 GRR technique of Pinkas et al. [21] does not require free-XOR; it has been proven secure relying on a standard assumption only and can be incorporated into our scheme. However, in this technique, the generation of the garbled table by the circuit garbler, as well as the computation of the output wire key given two ciphertexts of the gate table and the K value, is carried out by interpolating a degree-2 polynomial. We describe here a different 4-to-2 garbling method where the garbling and evaluation of the gate use only simple XOR operations. This is preferable for two major reasons:

  • Efficiency Polynomial interpolation uses three finite field multiplications and two additions (after the Lagrange coefficients are precomputed). The overhead of computing the multiplications is rather high, even when implemented in \(\textit{GF}(2^{128})\). For example, our implementation of this task, which used the PCLMULQDQ Intel instruction, needed about half as many cycles as AES encryption.

  • Simpler coding Efficient implementation of polynomial interpolation, especially over \(\textit{GF}(2^{128})\), and using machine instructions rather than calling a software library, requires some expertise and is significantly harder to code than a few XOR operations.

Gate Evaluation We first describe the process of evaluating a gate. We will then describe the garbling procedure which enables this gate evaluation procedure. Although this is somewhat reversed (as one would expect a description of how garbling is computed first), we present it this way as we find it clearer.

The gate evaluator receives as input a gate table with two entries \([T_{1},T_{2}]\), an index \(i\in \{0,1,2,3\}\), and a value \(K_{i}\) computed from the two garbled values of the input wires (note that \(T_{1},T_{2},K_{i}\) are all 128-bit strings). It computes the garbled output wire key \(k_{\mathrm {out}}\) in the following way:

  • If \(i=0\) then \(k_{\mathrm {out}} = K_{0}\)

  • If \(i=1\) then \(k_{\mathrm {out}} = K_{1} \oplus T_{1}\)

  • If \(i=2\) then \(k_{\mathrm {out}} = K_{2} \oplus T_{2}\)

  • If \(i=3\) then \(k_{\mathrm {out}} = K_{3} \oplus T_{1} \oplus T_{2}\)
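In code, the whole evaluation step is a handful of XORs; the following minimal sketch (with plain byte strings standing in for the 128-bit values, and with a hypothetical function name) makes the four cases explicit.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def eval_42grr_gate(i: int, K_i: bytes, T1: bytes, T2: bytes) -> bytes:
    """Evaluate a 4-2 GRR gate: i is the index derived from the signal bits,
    K_i the single mask the evaluator can compute, [T1, T2] the garbled table."""
    if i == 0:
        return K_i
    if i == 1:
        return xor(K_i, T1)
    if i == 2:
        return xor(K_i, T2)
    return xor(K_i, xor(T1, T2))   # i == 3: the missing entry T3 equals T1 xor T2
```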

Garbling We now show how to garble AND gates so that the evaluation described above provides correct evaluation. Due to the random permutation applied to the rows (via the permutation bit), the single output bit “1” of these gates might correspond to any of the masks \(K_{0},K_{1},K_{2},K_{3}\). Denote the index of that mask as \(s\in \{0,1,2,3\}\), and denote by \(k_{\mathrm {out}}^{0}, k_{\mathrm {out}}^{1}\) the output wire keys. We need to design a method for computing the garbled output key from the garbled table of this gate and the \(K_{i}\) values, such that

  • The method applied to \(K_{s}\) outputs \(k_{\mathrm {out}}^1\), and when applied to any other K value it outputs \(k_{\mathrm {out}}^0\).

  • Given \(K_{s}\) and the gate table, the value \(k_{\mathrm {out}}^0\) is pseudorandom. Similarly, given any other K value and the gate table, \(k_{\mathrm {out}}^1\) value is pseudorandom.

Our starting point is the basic garbled gate procedure without row reduction, and so with a gate table of four entries \([T_{0},T_{1},T_{2},T_{3}]\). We denote the output garbled value associated with the ith entry of the table by \(k[T_i]\), meaning that if \(T_i\) is the one “decrypted,” then the key obtained is \(k[T_i]\). It holds that one \(k[T_i]\) value is equal to \(k_{\mathrm {out}}^{1}\), and the other three \(k[T_i]\) values are equal to \(k_{\mathrm {out}}^{0}\). The table contains the four entries \(T_{i} = K_{i}\oplus k[T_i]\).

In the 4-to-3 row reduction method, the garbled gate entry \(T_{0}\) is always 0, and therefore, (1) there is no need to store and communicate that entry, and (2) it always holds that \(k[T_0]= K_{0}\). If \(k[T_0]= k_{\mathrm {out}}^{0}\), then \(k_{\mathrm {out}}^{1}\) can be defined arbitrarily, whereas if \(k[T_0]= k_{\mathrm {out}}^{1}\) then \(k_{\mathrm {out}}^{0}\) can be defined arbitrarily.

In our new garbling method, we use the freedom in choosing the second output wire key to always set it to \(K_{1}\oplus K_{2}\oplus K_{3}\). As a result, and as will be explained below, the garbled table will have the property that entry \(T_{3}\) of the table satisfies \(T_{3} = T_{1}\oplus T_{2}\). Therefore, \(T_{3}\) can be computed in run time by the evaluator and need not be stored or sent. In summary, garbling is carried out as follows:

  • If \(k[T_0]=k_{\mathrm {out}}^0\) then \(k_{\mathrm {out}}^0=K_0\) and \(k_{\mathrm {out}}^1=K_1 \oplus K_2 \oplus K_3\)

  • Else, \(k_{\mathrm {out}}^1=K_0\) and \(k_{\mathrm {out}}^0=K_1 \oplus K_2 \oplus K_3\)

This fully defines the garbled table, as follows:

  • If \(k[T_1]=k_{\mathrm {out}}^0=K_0\), then \(T_1 = K_0 \oplus K_1\) (since \(K_0=k_{\mathrm {out}}^0=K_1\oplus T_1\)). Else, we have \(k[T_1]=K_1\oplus K_2 \oplus K_3\), implying that \(T_1 = K_2 \oplus K_3\).

  • If \(k[T_2]=k_{\mathrm {out}}^0=K_0\), then \(T_2 = K_0 \oplus K_2\) (since \(K_0=k_{\mathrm {out}}^0=K_2\oplus T_2\)). Else, we have \(k[T_2]=K_1\oplus K_2 \oplus K_3\), implying that \(T_2 = K_1 \oplus K_3\).

See Table 4 for the full definition of the garbled table \([T_1,T_2]\) and the definition of the output wires, depending on the permutation (recall that s is the index such that \(K_s=k_{\mathrm {out}}^1\)). It is easy to verify correctness by tracing the computation in each case according to the table.

Table 4 Garbling the gate table
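Since the entries of Table 4 follow mechanically from the two rules above, a short sketch may be the easiest way to see the whole picture. The code below (again with plain byte strings; the distinction between n and n+1 bit values and the permutation-bit encoding described further below are omitted) garbles one gate and checks that the evaluation rule given earlier recovers the correct output key in every row. The function names are illustrative, not taken from the paper.

```python
import os

N = 16

def xor(*vals: bytes) -> bytes:
    out = bytes(N)
    for v in vals:
        out = bytes(x ^ y for x, y in zip(out, v))
    return out

def garble_42grr_gate(K, s: int):
    """4-2 GRR garbling of one gate.
    K = [K0, K1, K2, K3] are the four masks; s is the row whose mask hides the
    1-key (determined by the permutation bits).  Returns the two output keys
    and the two-entry table [T1, T2]; T0 = 0 and T3 = T1 xor T2 are implicit."""
    other = xor(K[1], K[2], K[3])            # the "free" choice for the second output key
    k_out0, k_out1 = (other, K[0]) if s == 0 else (K[0], other)
    T1 = xor(K[1], k_out1 if s == 1 else k_out0)
    T2 = xor(K[2], k_out1 if s == 2 else k_out0)
    return k_out0, k_out1, [T1, T2]

def eval_42grr_gate(i: int, K_i: bytes, T1: bytes, T2: bytes) -> bytes:
    return [K_i, xor(K_i, T1), xor(K_i, T2), xor(K_i, T1, T2)][i]

if __name__ == "__main__":
    for s in range(4):
        K = [os.urandom(N) for _ in range(4)]
        k0, k1, (T1, T2) = garble_42grr_gate(K, s)
        # row s must decrypt to the 1-key, every other row to the 0-key
        assert all(eval_42grr_gate(i, K[i], T1, T2) == (k1 if i == s else k0)
                   for i in range(4))
```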

An alternative way to verify that the new scheme is correct is to observe that the output wire key computed for \(K_{3}\) is always

$$\begin{aligned} k[T_3]= & {} K_{3}\oplus T_{1}\oplus T_{2} \\= & {} K_{3} \oplus (k[T_1] \oplus K_{1}) \oplus (k[T_2] \oplus K_{2}) \\= & {} K_{1}\oplus K_{2} \oplus K_{3} \oplus k[T_1] \oplus k[T_2] \end{aligned}$$

If \(k[T_1] \ne k[T_2]\) then \(k[T_1] \oplus k[T_2] = K_{0} \oplus (K_{1}\oplus K_{2}\oplus K_{3})\). In this case, \(k[T_3]\) should equal \(K_0\) (since one of \(k[T_1],k[T_2]\) equals \(k_{\mathrm {out}}^1\) and thus \(k[T_3]=k_{\mathrm {out}}^0\)), and this indeed follows from the equation.

If \(k[T_1] = k[T_2]\), then by the equation we have that \(k[T_3]= K_{1}\oplus K_{2}\oplus K_{3}\). If \(k[T_3]=k_{\mathrm {out}}^1\) then this is correct since \(k_{\mathrm {out}}^0=K_0\). Furthermore, if \(k[T_3]=k_{\mathrm {out}}^0\) then since \(k[T_1] = k[T_2]\) they both also equal \(k_{\mathrm {out}}^0\). This implies that \(k[T_0]=k_{\mathrm {out}}^1=K_0\) and so \(k[T_3]=K_1\oplus K_2 \oplus K_3\), as required. Intuitively, the first case (where \(k[T_3]=k_{\mathrm {out}}^1\)) corresponds to the case that the 0-key is \(K_0\) and the 1-key is \(K_1\oplus K_2 \oplus K_3\), whereas the second case (where \(k[T_3]=k_{\mathrm {out}}^0\)) corresponds to the case that the 0-key is \(K_1\oplus K_2\oplus K_3\) and the 1-key is \(K_0\).

Encoding the Permutation Bits The permutation bits can be encoded in a similar way to that suggested in [21]. Two changes are applied to the basic garbling scheme:

  • The garbled values are only n bits long, whereas the values \(K_{i}\) are still \(n+1\) bits long (concretely here, we use \(n=127\)). Therefore, the function used for generating the \(K_{i}\) values has n-bit inputs and an \((n+1)\)-bit output. We denote the least significant bit of \(K_{i}\) by \(m_{i}\). Only n bits of \(K_{i}\) are used for computing the garbled key of the output wire, using the procedure described above. Consequently, the values \(T_{1},T_{2}\) of the garbled table are also only n bits long.

  • We add 4 bits to the table. The ith of these bits is the XOR of \(m_i\) with the permutation bit of the corresponding output value.

The total length of a gate table is now \(2n+4\) bits (concretely, 258 bits). The evaluation of a gate is performed by computing \(K_{i}\); using its most significant n bits for computing the corresponding garbled output value; and using its least significant bit \(m_{i}\) for computing the corresponding signal bit.

As for security, note that the \(m_{i}\) bits are pseudorandom, and are used only for the encryption of the permutation/signal values.

Intuition for Security Recall that the 4-to-3 garbled row reduction scheme enables an arbitrary choice of the output wire key that is not \(k[T_0]\). The new 4-to-2 garbled row reduction scheme that we present is a special case, where we define that output wire key to be equal to \(K_{1}\oplus K_{2}\oplus K_{3}\). Note that the evaluator can compute one of the \(K_i\) values using the two keys it holds, and can obtain two of the other three using \(T_1,T_2\). However, in order to learn the other output wire key it needs the one \(K_i\) value that it cannot compute. Thus, from the point of view of the evaluator, the other output wire key is a random string, as required.

4.2 The Garbling Scheme

The changes that need to be made to the \(\mathsf{Garble}\) and \(\mathsf{Eval}\) procedures in order to incorporate our 4–2 GRR technique are presented in Fig. 9. We denote the improved scheme by \(G''\).

4.3 Proof of Security

Next, we prove that \(G''\) satisfies the privacy requirement. As before, we present the modifications needed to the proof of Theorem 3.4.

Theorem 4.1

If \({\mathcal {F}}\) is a family of pseudorandom functions, then the garbling scheme \(G''\) achieves privacy.

Fig. 9: Improved garbling scheme \(G''\)

Proof Sketch

Let \({\mathcal {S}}''\) be a simulator that is identical to the simulator \({\mathcal {S}}\) from the proof of Theorem 3.4 (or to \({\mathcal {S}}'\) from Theorem 3.5 if the XOR gates are computed as in \(G'\)), except that when \({\mathcal {S}}''\) needs to simulate the garbling of an AND gate, holding the active keys \(k_i,k_j\) and signal bits \(\lambda _i,\lambda _j\), it does the following:

  1. \({\mathcal {S}}''\) computes \(K||m:=F_{k_i}(g||\lambda _i\lambda _j)\oplus F_{k_j}(g||\lambda _i\lambda _j)\)

  2. \({\mathcal {S}}''\) sets the output wire active key and signal bit:

     (a) \(\lambda _{\ell }\leftarrow \{0,1\}\)

     (b) If \(2\lambda _i+\lambda _j=0\), then set \(k_{\ell }:=K\)

     (c) Else, set \(k_{\ell }\leftarrow \{0,1\}^n\)

  3. Set \(T_1,T_2\):

     (a) If \(2\lambda _i+\lambda _j=0\), set \(T_1,T_2\leftarrow \{0,1\}^n\)

     (b) If \(2\lambda _i+\lambda _j=1\), set \(T_1:=K\oplus k_{\ell }\) and \(T_2\leftarrow \{0,1\}^n\)

     (c) If \(2\lambda _i+\lambda _j=2\), set \(T_1\leftarrow \{0,1\}^n\) and \(T_2:=K\oplus k_{\ell }\)

     (d) If \(2\lambda _i+\lambda _j=3\), set \(T_1\leftarrow \{0,1\}^n\) and \(T_2:=K\oplus k_{\ell } \oplus T_1\)

  4. Compute the additional 4 bits: set \(t_{2\lambda _i+\lambda _j}:=m\oplus \lambda _{\ell }\), and for \(\alpha \in \{0,1,2,3\}\setminus \{2\lambda _i+\lambda _j\}\) set \(t_{\alpha }\leftarrow \{0,1\}\)

  5. Set \(C[g]\leftarrow T_1,T_2,t_0,t_1,t_2,t_3\)

Note that in the AND gates generated by \({\mathcal {S}}''\), the ciphertexts are computed so that the result of \(\mathsf{Eval}\) will always be \(k_\ell \). (For example, according to \(\mathsf{Eval}\), if \(2\lambda _i+\lambda _j=1\) then \(k_\ell \) is computed as \(K\oplus T_1\). In such a case, \({\mathcal {S}}''\) sets \(T_1 := K \oplus k_\ell \) and thus indeed \(K\oplus T_1=k_\ell \).) Beyond this constraint, the values are uniformly random. In particular, if \(2\lambda _i+\lambda _j=0\) then both ciphertexts are random, and otherwise, the single ciphertext not used in \(\mathsf{Eval}\) is random. In addition, the 4 bits that mask the output wire permutation bits are chosen randomly except for the bit that is pointed to by the input wires’ active signal bits.

We now define a hybrid \(H_i\) as in the proof of Theorem 3.4 and construct an adversary \(\mathcal{A}_i\) for the experiment \(\mathsf{Expt}^{2PRF}\). Adversary \(\mathcal{A}_i\) garbles the first \(i-1\) gates in topological order using the instructions of \({\mathcal {S}}''\). When \(\mathcal{A}_i\) reaches the ith gate with input wires a, b and output wire c, it holds two active keys \(k_a^{v_a},k_b^{v_b}\), two signal bits \(\lambda _a,\lambda _b\) and the actual bits \(v_a,v_b\) that are on the wires. Now, if g is a XOR gate, then \(\mathcal{A}_i\) garbles the gate exactly as in the proof of Theorem 3.4 (or Theorem 3.5). If g is an AND gate, then \(\mathcal{A}_i\) works as follows:

  1. \(\mathcal{A}_i\) computes:

     $$\begin{aligned} K_{2\lambda _a+\lambda _b}||m_{2\lambda _a+\lambda _b}&:=F_{k_a^{v_a}}(g||\lambda _a\lambda _b)\oplus F_{k_b^{v_b}}(g||\lambda _a\lambda _b)\\ K_{2\lambda _a+\overline{\lambda }_b}||m_{2\lambda _a+\overline{\lambda }_b}&:=F_{k_a^{v_a}}(g||\lambda _a\overline{\lambda }_b)\oplus {\mathcal {O}}^{(3)}(g||\lambda _a\overline{\lambda }_b)\\ K_{2\overline{\lambda }_a+\lambda _b}||m_{2\overline{\lambda }_a+\lambda _b}&:={\mathcal {O}}^{(1)}(g||\overline{\lambda }_a\lambda _b)\oplus F_{k_b^{v_b}}(g||\overline{\lambda }_a\lambda _b)\\ K_{2\overline{\lambda }_a+\overline{\lambda }_b}||m_{2\overline{\lambda }_a+\overline{\lambda }_b}&:={\mathcal {O}}^{(1)}(g||\overline{\lambda }_a\overline{\lambda }_b)\oplus {\mathcal {O}}^{(3)}(g||\overline{\lambda }_a\overline{\lambda }_b) \end{aligned}$$

  2. \(\mathcal{A}_i\) computes the location of ‘1’ in the truth table: \(s:=2\overline{\pi }_a+\overline{\pi }_b=2(\overline{v_a\oplus \lambda _a})+(\overline{v_b\oplus \lambda _b})\)

  3. \(\mathcal{A}_i\) runs steps (3)–(5) from procedure \(\mathsf{GbAND}\) in \(G''\)

  4. \(\mathcal{A}_i\) outputs the garbled table \(C[g]\leftarrow T_1,T_2,t_0,t_1,t_2,t_3\)

It is clear that when \(\sigma =0\) in \(\mathsf{Expt}^{2PRF}\), the result is identical to the real scheme \(G''\). In contrast, when \(\sigma =1\), the answers of the oracles are random strings, and we have that all of the K values are independent random strings except for the value of \(K_{2\lambda _a+\lambda _b}\) which \(\mathcal{A}_i\) computes by itself using the keys it holds. (Observe that in each of the K values except for \(K_{2\lambda _a+\lambda _b}\), oracles \({\mathcal {O}}^{(3)}\) and \({\mathcal {O}}^{(1)}\) are invoked on different inputs, resulting in independent random outputs.)

The output from the garbling of the gate by \({\mathcal {S}}''\) is \(k_c,\lambda _c\) (the active key used in garbling gates with wire c along with its signal) and the garbled table \(T_1,T_2,t_0,t_1,t_2,t_3\). In contrast, the output of \(\mathcal{A}_i\) is \(k_c^0,k_c^1,\pi _c\) along with \(T_1,T_2,t_0,t_1,t_2,t_3\). Since the actual value \(v_c\) on the output wire is given, we can compute the actual signal bit \(\lambda _c\) (which equals \(\pi _c\oplus v_c\)) and the active key \(k_c^{v_c}\). Thus, we need to show that the joint distribution over \((k_c,\lambda _c,T_1,T_2,t_0,t_1,t_2,t_3)\) generated by \({\mathcal {S}}''\) in this case of \(\sigma =1\) is identical to the joint distribution over \((k_c^{v_c},\lambda _c,T_1,T_2,t_0,t_1,t_2,t_3)\) generated by \(\mathcal{A}_i\). In order to understand the following, we remark that given \(2\lambda _a+\lambda _b\) (the row pointed to by the signal bits) and s (the row in which the ‘1’-key is “encrypted”), the active key on the output wire can be determined. This is because if \(s=2\lambda _a+\lambda _b\) then the active key on the output wire is \(k_c^1\) (since the signal bits point to the 1-key), and otherwise, it is \(k_c^0\) (since the signal bits point to the 0-key).

We consider four cases:

  1. Case 1 (\(2\lambda _a+\lambda _b=0\)): In this case, \(\mathcal{A}_i\) computes \(K_0\) using keys \(k_a^{v_a},k_b^{v_b}\), whereas \(K_1,K_2,K_3\) are independent random strings. Now, in this case, \({\mathcal {S}}''\) sets \(k_c := K\) where K is computed exactly like \(K_0\) by \(\mathcal{A}_i\). In addition, \({\mathcal {S}}''\) chooses \(T_1,T_2\leftarrow \{0,1\}^n\) at random. Since \(K_1,K_2,K_3\) are independent and random, it follows that all four ways of setting \(T_1,T_2\) depending on s that are described in Step 4 of \(\mathsf{GbAND}\) of \(G''\) yield two independent keys. Thus, \(K_0,T_1,T_2\) generated by \(\mathcal{A}_i\) are distributed identically to \(K_0,T_1,T_2\) generated by \({\mathcal {S}}''\). Now, if \(s\ne 0\), then \(\mathcal{A}_i\) sets \(k_c^0=K_0\) while if \(s=0\) then \(\mathcal{A}_i\) sets \(k_c^1=K_0\). Since \(2\lambda _a+\lambda _b=0\) it follows that if \(s\ne 0\) then the active key is \(k_c^0\) and if \(s=0\) then the active key is \(k_c^1\). Thus, in both cases the active output key is \(K_0\), exactly like \({\mathcal {S}}''\).

  2. Case 2 (\(2\lambda _a+\lambda _b=1\)): In this case, \(\mathcal{A}_i\) computes \(K_1\) using keys \(k_a^{v_a},k_b^{v_b}\), whereas \(K_0,K_2,K_3\) are independent random strings. When \(s\in \{2,3\}\), the active key \(k_c^0\) on the output wire is set by \(\mathcal{A}_i\) in Step 3 to be equal to \(K_0\), and \(T_1=K_0\oplus K_1\). When \(s\in \{0,1\}\) the active key on the output wire equals \(K_1\oplus K_2\oplus K_3\) (because when \(s=0\), the active key is \(k_c^0:=K_1\oplus K_2\oplus K_3\) while when \(s=1\), the active key is \(k_c^1:=K_1\oplus K_2\oplus K_3\)), and \(T_1=K_2\oplus K_3\). In both cases, we have that \(K_1\oplus T_1\) equals the active key on the output wire. In addition, in all cases \(T_2\) is computed by \(\mathcal{A}_i\) by XORing two strings, of which at least one of them is random and independent of \(T_1\) and \(K_1\). (To be exact, \(T_2\) is actually the XOR of one of the output wire keys with \(K_2\). However, since \(K_2\) is random, and since there is at least one random string that appears in \(T_1\) or \(T_2\), but not in both, we have that \(T_2\) is completely independent of all other values.) In summary, \(k_c,T_1,T_2\) are all random strings under the constraint that \(k_c=T_1\oplus K_1\).

    In contrast, \({\mathcal {S}}''\) sets \(k_c\) to be random, sets \(T_1 = K\oplus k_c\) and \(T_2\) to be random. Thus, \(K\oplus T_1\) equals the active key on the output wire, and we have that \(k_c,T_1,T_2\) are also all random under the constraint that \(k_c=T_1\oplus K\). Thus, the distributions are identical.

  3. Case 3 (\(2\lambda _a+\lambda _b=2\)): In this case, \(\mathcal{A}_i\) computes \(K_2\) using keys \(k_a^{v_a},k_b^{v_b}\), whereas \(K_0,K_1,K_3\) are independent random strings. Using the same analysis as the previous case, we obtain that when \(s\in \{0,2\}\) the active key on the output wire is set by \(\mathcal{A}_i\) to be \(K_1\oplus K_2\oplus K_3\) (because when \(s=2\), the active key is \(k_c^1\), while when \(s=0\) the active key is \(k_c^0\); in both cases it equals \(K_1\oplus K_2\oplus K_3\)), and \(T_2=K_1\oplus K_3\). In contrast, when \(s\in \{1,3\}\), the active key \(k_c^0\) on the output wire is set to be \(K_0\), and \(T_2=K_0 \oplus K_2\). Denoting the active key by \(k_c\) in all cases, we have that \(k_c\) and \(T_2\) are random under the constraint that \(k_c\oplus T_2=K_2\). In all cases, as in the previous case with \(T_2\), ciphertext \(T_1\) is random and independent of \(k_c,T_2\) since it involves an independent random value each time (\(K_0\), \(K_1\) or \(K_3\)). Thus, \(k_c,T_1,T_2\) are independent random strings, under the constraint that \(k_c\oplus T_2=K_2\).

    Regarding \({\mathcal {S}}''\), it chooses \(k_c\) and \(T_1\) uniformly at random and sets \(T_2=K\oplus k_c\). Thus, \(k_c,T_1,T_2\) have exactly the same distribution as that generated by \(\mathcal{A}_i\).

  4. Case 4 (\(2\lambda _a+\lambda _b=3\)): In this case, \(\mathcal{A}_i\) computes \(K_3\) using keys \(k_a^{v_a},k_b^{v_b}\), whereas \(K_0,K_1,K_2\) are independent random strings. In this case, if \(s\in \{0,3\}\) then the active key \(k_c\) on the output wire is set by \(\mathcal{A}_i\) to be \(K_1\oplus K_2\oplus K_3\) (since if \(s=0\) the active key is \(k_c^0\), whereas if \(s=3\) the active key is \(k_c^1\)), and \(T_1\oplus T_2=K_1\oplus K_2\) (see Step 4 in \(\mathsf{GbAND}\)). Furthermore, if \(s\in \{1,2\}\) then the active key \(k_c\) on the output wire is \(K_0\) and \(T_1\oplus T_2=K_0\oplus K_3\). In both cases, \(k_c \oplus T_1 \oplus T_2 = K_3\). Apart from this constraint, the values are random. Thus, we have that \(k_c,T_1,T_2\) are random under the constraint that \(k_c \oplus T_1 \oplus T_2 = K_3\).

    Regarding \({\mathcal {S}}''\), in this case it chooses \(k_c\) and \(T_1\) independently at random and sets \(T_2=K\oplus k_c \oplus T_1\). Thus, as above, \(k_c,T_1,T_2\) are random under the constraint that \(k_c\oplus T_1\oplus T_2 = K_3\).

We conclude that \(k_c,T_1,T_2\) is identically distributed when generated by the adversary \(\mathcal{A}_i\) in the case of \(\sigma =1\) and when generated by the simulator \({\mathcal {S}}''\) (note that \(K_{2\lambda _a+\lambda _b}\) is always the exact same value since it is fixed by the incoming keys). In addition, in \(\mathcal{A}_i\)’s code all of the m values, except for the value of \(m_{2\lambda _a+\lambda _b}\), are random. Thus, the bits \(t_0,t_1,t_2,t_3\) are random except for \(t_{2\lambda _a+\lambda _b}\), and the distribution over their values is the same as when they are generated by \({\mathcal {S}}''\). We conclude that when \(\sigma =1\), adversary \(\mathcal{A}_i\) constructs gate i exactly according to \({\mathcal {S}}''\).

The remaining gates \(j>i\) are garbled using the real garbling scheme \(G''\), with \(\mathcal{A}_i\) using its oracles \({\mathcal {O}}^{(2)},{\mathcal {O}}^{(4)}\) to garble the other gates that wires a and b enter. We conclude that when \(\sigma =0\), adversary \(\mathcal{A}_i\) constructs the \(H_{i-1}\) hybrid, while when \(\sigma =1\) it constructs the \(H_i\) hybrid. The rest of the proof is the same as the proof of Theorem 3.4. \(\square \)

5 Garbling With Related-Key Security

5.1 Background

When using the free-XOR technique, a constant difference is used between the garbled values on every wire (i.e., there exists a random \(\Delta \) such that for every wire i, \(k_i^0\oplus k_i^1=\Delta \)). As a result, the keys used for encryption in non-XOR gates are correlated with each other, and also with the plaintext that they encrypt (observe that \(\Delta \) appears in the garbled values on both the input and output wires). Thus, a strong circularity related-key assumption is needed for proving that the technique is secure. As we have seen, if we want to rely on a pseudorandom function assumption only, then the keys on the wires have to be uniformly and independently chosen. In this section, we consider garbling schemes that rely on related keys, but do not require the stronger circularity assumption. In order to achieve this, keys on the input wires of each gate are allowed to be related, but no relation is allowed between input wire keys and the output wire keys they encrypt. This relaxation allows us to garble some of the XOR gates for free, and yields results that are better than when garbling under a pseudorandom function assumption only, but worse than garbling all the XOR gates for free which requires circularity. The work in this section builds strongly on the fleXOR technique of Kolesnikov et al. [14], and provides a more complete picture regarding the trade-off between efficiency and the security assumptions used in circuit garbling. (Specifically, we consider the cost of garbling under the hierarchy of assumptions, from public random permutation to circular related-key security to related-key security to a pseudorandom function assumption).

Kolesnikov et al. [14] showed that in order to avoid circularity it suffices to apply a monotone rule on the wire ordering of the circuit. This monotone rule states that when a certain difference value \(\Delta \) is used on the input wires to a non-XOR gate, then the \(\Delta \) on the output wire must be different (actually, it has to be a \(\Delta \) that has not appeared previously in the garbling of gates that are in the path to the current gate). Denote L different difference values by \(\Delta _1,\ldots ,\Delta _L\). Then, a wire ordering is defined to be a function \(\phi \) that takes a wire as its input and returns an element of the set \(\{1,\ldots ,L\}\) (with the interpretation that on wire i, the difference between the garbled values is \(\Delta _{\phi (i)}\)). We formally define a monotone ordering as follows.

Definition 5.1

Let C be a garbled circuit, and let I be the set of circuit wires. A wire ordering function \(\phi :I\rightarrow \{1,\ldots ,L\}\) is called monotone if:

  1. For every non-XOR gate with input wires i, j and output wire \(\ell \): \(\phi (\ell )>\max (\phi (i),\phi (j))\)

  2. For every XOR gate with input wires i, j and output wire \(\ell \): \(\phi (\ell )\ge \max (\phi (i),\phi (j))\)

Now, assume that a wire ordering was fixed, and consider a XOR gate g. If \(\phi (\ell )=\phi (i)=\phi (j)\) then the gate is garbled and computed using the free-XOR technique. However, if \(\phi (\ell )\ne \phi (i)\) (or likewise if \(\phi (\ell )\ne \phi (j)\)) then wire i’s keys are translated into new keys \(\tilde{k}_i^0,\tilde{k}_i^1\) such that \(\tilde{k}_i^0=F_{k_i^0}(g)\) and \(\tilde{k}_i^1=\tilde{k}_i^0\oplus \Delta _{\phi (\ell )}\), yielding a garbled gate entry \(F_{k_i^1}(g)\oplus \tilde{k}_i^1\) (to be more exact, the way of computing \(\tilde{k}_i^0,\tilde{k}_i^1\) can be reversed, depending on the permutation bit). Once the input and output wires all have difference \(\Delta _{\phi (\ell )}\), the free-XOR technique can once again be used. It follows that a translation from \(\Delta _{\mathrm {input}\,\mathrm {wire}}\) to \(\Delta _{\mathrm {output}\,\mathrm {wire}}\) can be carried out with one ciphertext and two encryptions. Thus XOR gates can be garbled using 0, 1 or 2 ciphertexts and using 0, 2 or 4 encryptions (for the cases where no translation is needed, where one translation is needed and where two translations are needed, respectively), depending on the wire ordering that was chosen for the circuit. This is a “flexible approach” since many different wire orderings can be chosen, and hence its name “fleXOR” (see Footnote 8). Since the specific ordering determines the cost, this introduces a new algorithmic goal which is to find a monotone wire ordering that is optimal; i.e., that minimizes the size of the circuit while satisfying the monotone property.
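As a small illustration of the translation step, the following sketch (HMAC-SHA256 standing in for F, byte-aligned lengths, function names chosen for illustration, and the permutation-bit bookkeeping omitted) shows the one-ciphertext translation of a wire's keys to a new offset, and how the evaluator recovers the translated key from whichever key it holds.

```python
import hashlib
import hmac

N = 16

def F(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()[:N]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def translate_wire(g: int, k0: bytes, k1: bytes, delta_out: bytes):
    """Translate wire i's keys so that their offset becomes delta_out = Delta_{phi(l)}.
    Returns the new key pair and the single translation ciphertext."""
    gid = g.to_bytes(4, 'big')
    kt0 = F(k0, gid)                 # new 0-key, computable directly from k0
    kt1 = xor(kt0, delta_out)        # new 1-key, at the required offset
    T = xor(F(k1, gid), kt1)         # lets the holder of k1 recover kt1
    return (kt0, kt1), T

def eval_translate(g: int, key: bytes, use_ciphertext: bool, T: bytes) -> bytes:
    """Evaluator side; in the scheme, use_ciphertext is decided by the wire's
    permutation/signal bit rather than by knowing which key is held."""
    gid = g.to_bytes(4, 'big')
    return xor(F(key, gid), T) if use_ciphertext else F(key, gid)
```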

Unfortunately, it is NP-hard to find an optimal monotone wire ordering [14]. Thus, Kolesnikov et al. [14] described heuristic techniques for finding a good monotone ordering. Briefly, their heuristic is based on the observation that only non-XOR gates increase the wire ordering number. They therefore define the non-XOR-depth of a wire i to be the maximum number of non-XOR gates on all directed paths from i to an output wire. Then, they set the wire ordering so that \(\phi (i)+\mathsf{non-XOR-depth}(i)\) is constant for all wires. Algorithmically, they set the wire ordering value of each XOR gate’s output wire to be equal to the maximal ordering value of its input wires, and they set the wire ordering value of each AND gate’s output wire to a value that maintains the constant. For more details, see [14].

5.2 Safe and Monotone Wire Orderings

The goal of constructing a good monotone wire ordering is to assign, whenever possible, the same wire ordering number to input wires and output wires of XOR gates, so that the communication and computation cost at XOR gates will be minimized. However, such a strategy is not compatible with 4–2 row reduction techniques (the techniques in this paper and in [21] require that both output wire values can be chosen arbitrarily, which is not the case here, and the half-gates method of Zahur et al. [23] works only under a circularity assumption, which is exactly what we are trying to avoid here). Thus, an optimized monotone wire ordering may result in most AND gates being garbled with 3 ciphertexts (4–2 row reduction could be used in AND gates where the difference on the output wire is “new”). In circuits with many XOR gates relative to AND gates, such a strategy may be worthwhile. However, in circuits where there are more AND gates than XOR gates (like the SHA256 circuit), the result may be a larger circuit than that obtained by using our scheme based on pseudorandom functions alone that costs 2 ciphertexts per AND gate and 1 ciphertext per XOR gate.

This motivates the search for wire orderings that enable 4–2 row reduction in AND gates. Such a wire ordering is called safe, and was defined by Kolesnikov et al. [14] for this purpose; intuitively, a wire ordering is safe if the values on the output wires of AND gates can be determined arbitrarily. Formally, we require that the \(\Delta \) on the output wire of a non-XOR gate be different from those of all previous gates, implying that it is not yet determined:

Definition 5.2

Let C be a garbled circuit, and let I be the set of circuit wires. A wire ordering function \(\phi :I\rightarrow \{1,\ldots ,L\}\) is called safe if for every non-XOR gate g with output wire \(\ell \), and for every wire i that precedes g in the topological order of the circuit, it holds that \(\phi (i)<\phi (\ell )\).
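For concreteness, the definition translates into the following direct check, using the same assumed circuit representation as in the sketch above.

```python
# Direct check of Definition 5.2: every wire fixed before a non-XOR gate
# must have a strictly smaller ordering number than the gate's output wire.

def is_safe(gates, input_wires, phi):
    preceding = set(input_wires)            # wires preceding the current gate
    for gtype, inputs, ell in gates:        # gates in topological order
        if gtype != "XOR" and any(phi[w] >= phi[ell] for w in preceding):
            return False
        preceding.add(ell)
    return True
```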

Note that a wire ordering that is safe does not necessarily avoid circularity; thus, free-XOR together with half-gates will always be preferable (the notion of a safe ordering was introduced before the half-gates construction was discovered, and the latter made it redundant on its own). Nevertheless, in order to both avoid circularity and potentially reduce the number of ciphertexts in AND gates, we are interested in wire orderings that are simultaneously safe and monotone. Such wire orderings were not considered in the work of Kolesnikov et al. [14], and we dedicate the rest of this section to introducing two simple heuristics that satisfy these two properties.

Safe and Monotone Heuristics Our goal is to find a “good” safe and monotone wire ordering heuristic that allows us to garble AND gates using our 4–2 row reduction technique and XOR gates using the fleXOR approach. By a “good” heuristic, we mean one that minimizes the average number of ciphertexts per XOR gate. Recall that in the fleXOR approach, XOR gates are garbled using 0, 1 or 2 ciphertexts. Thus, a wire ordering is only reasonable if the average number of ciphertexts per XOR gate is lower than 1; otherwise, it is better to use the scheme presented earlier, which garbles XOR gates with 1 ciphertext under a pseudorandom function assumption only.
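A small helper for this sanity check might look as follows (same assumed circuit representation as above); under fleXOR, a XOR gate costs one ciphertext for each input wire whose ordering number differs from that of its output wire.

```python
# Average number of fleXOR ciphertexts per XOR gate under an ordering phi;
# an ordering is only worthwhile here if this comes out below 1.

def avg_xor_ciphertexts(gates, phi):
    costs = [sum(phi[w] != phi[ell] for w in inputs)
             for gtype, inputs, ell in gates if gtype == "XOR"]
    return sum(costs) / len(costs) if costs else 0.0
```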

Observe that in a safe and monotone wire ordering, if there are L non-XOR gates in the circuit, then there are \(L+1\) different delta values \(\{\Delta _0,\Delta _1,\ldots ,\Delta _L\}\), where \(\Delta _0\) is a random value that is set at the beginning of the garbling process and assigned to the input wires of the circuit, and the remaining \(\Delta \) values are assigned to the non-XOR gates in topological order, as determined by the row reduction method used to garble the associated gate. We define two variables \(\phi _i^{min},\phi _i^{max}\) for every wire i in the circuit, where \(\phi _i^{min}\) (resp., \(\phi _i^{max}\)) is the minimal (resp., maximal) value that \(\phi (i)\) can have in any safe and monotone wire ordering of the circuit. In Fig. 10, we present an algorithm that computes the exact values of \(\phi _i^{min}\) and \(\phi _i^{max}\) for each wire.

Fig. 10 Initialization algorithm for the safe and monotone wire ordering heuristic

To understand why the algorithm computes \(\phi _i^{min}\) and \(\phi _i^{max}\) correctly, recall that for each AND gate the wire ordering number of the output wire is fixed in a safe wire ordering, and equals the gate’s index in the order of AND gates in the circuit. Thus, \(\phi _{\ell }^{min}=\phi _{\ell }^{max}:=AND\_index\), as set in step 3(a). For XOR gates, note that our algorithm sets \(\phi _{\ell }^{min}\) for each XOR gate’s output wire to the maximum \(\phi ^{min}\) value of its inputs, which means that it is increased only by AND gates that lie on a path from a circuit input wire to the gate. Therefore, if there were a wire ordering that assigns \(\ell \) a smaller value than our algorithm does, it would assign it a wire ordering number that is smaller than the wire ordering number of an AND gate’s output wire on a path leading to it, thereby breaking monotonicity (as in Definition 5.1). In addition, since we cannot assign to a wire a \(\Delta \) whose value will only be determined at a later stage (due to the safe condition), the highest possible wire ordering value that a XOR gate’s output wire \(\ell \) can have equals the number of AND gates that appear before it in the topological order of the circuit. This maximum value is exactly the value of the variable \(AND\_index\) in the algorithm, which is assigned to \(\phi _{\ell }^{max}\) in step 3(b).
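The following sketch reconstructs the initialization from this description; circuit input wires are assigned ordering value 0 (matching \(\Delta _0\)), the data structures are the same illustrative assumptions as above, and the exact pseudocode of Fig. 10 may differ.

```python
# Reconstruction of the Fig. 10 initialization from the description above.

def init_min_max(gates, input_wires):
    phi_min = {w: 0 for w in input_wires}   # Delta_0 on the circuit inputs
    phi_max = {w: 0 for w in input_wires}
    and_index = 0                           # number of non-XOR gates seen so far
    for gtype, inputs, ell in gates:        # topological order
        if gtype != "XOR":                  # step 3(a): the ordering is forced
            and_index += 1
            phi_min[ell] = phi_max[ell] = and_index
        else:                               # step 3(b)
            phi_min[ell] = max(phi_min[w] for w in inputs)
            phi_max[ell] = and_index        # cannot use a Delta defined later
    return phi_min, phi_max
```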

Next, observe that a heuristic that assigns each wire i the value \(\phi _i^{min}\), as well as a heuristic that assigns each wire the value \(\phi _i^{max}\), both yield valid safe and monotone wire orderings. Moreover, taking \(\phi _i^{min}\) at every gate ensures that each XOR gate requires at most one ciphertext (because the output wire value equals at least one of the input wire values). Thus, this heuristic—that we call the pure min-heuristic—guarantees that the average number of ciphertexts per XOR gate is at most 1. This means that, not surprisingly, heuristics that yield a more efficient garbling scheme than our scheme based only on a pseudorandom function assumption do exist. However, our aim is to do better than the pure min-heuristic, and we suggest two heuristics that use \(\phi _i^{min}\) and \(\phi _i^{max}\) as initialization values for the wire ordering, and then improve upon them by traversing the circuit from the outputs to the inputs in reverse topological order and setting the value of each wire based on the values assigned so far.

The idea behind both heuristics is that, starting from the output wires and going backwards, we try to group as many wires as possible under the same wire ordering number. We do this by trying to assign a wire i the same value as one of the output wires of the gates that it enters (specifically, we try to give it the minimal value among all the values on those output wires; taking the minimum ensures monotonicity). When this fails—detected by the fact that it would yield a value not between \(\phi _i^{min}\) and \(\phi _i^{max}\)—we set its wire ordering number to \(\phi _i^{min}\), which, for the output wire of a XOR gate, equals the maximum of the \(\phi ^{min}\) values of the gate’s input wires.

In the first heuristic, called SafeMon1, each wire i is given the initial ordering value \(\phi (i)=\phi _i^{min}\). Then, starting with the circuit’s output wires and going backwards in reverse topological order, we compute for every wire i that is not a circuit output wire:

$$\begin{aligned} \overline{\phi }_i:=\min \left\{ \phi (k) \mid \exists \text { gate }g\text { with input wire }i\text { and output wire }k\right\} \end{aligned}$$

This value is in fact the maximal value that \(\phi (i)\) can have without breaking monotonicity, since an input wire i of a gate cannot have a higher value than the gate’s output wire. Then, we set \(\phi (i)=\overline{\phi }_i\) if \(\overline{\phi }_i\le \phi _i^{max}\), and set \(\phi (i)=\phi _i^{min}\) if \(\overline{\phi }_i > \phi _i^{max}\) (the latter ensures that we do not break the safe property).

The second heuristic, called SafeMon2, works in the same way, except that the wires are initialized with \(\phi _i^{max}\) instead of \(\phi _i^{min}\). (Observe that in the initialization, \(\phi _\ell ^{min}=\max (\phi _i^{min},\phi _j^{min})\), and thus in SafeMon1 setting \(\phi _\ell =\phi _\ell ^{min}\) is the same as setting it to \(\max \{\phi _i^{min},\phi _j^{min}\}\), as in SafeMon2.) The full description of the heuristics appears in Fig. 11.

Fig. 11 Heuristics for finding safe and monotone wire orderings
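The following sketch captures the two heuristics as described in the text, reusing init_min_max and the circuit representation from the earlier sketches. One detail is hedged: we adopt \(\overline{\phi }_i\) only when it lies between \(\phi _i^{min}\) and \(\phi _i^{max}\), so that the forced orderings at non-XOR outputs are never lowered; the precise rule in Fig. 11 may handle this differently.

```python
# Sketch of SafeMon1 (variant=1) and SafeMon2 (variant=2), following the
# textual description; the precise pseudocode of Fig. 11 may differ.

def safe_mon(gates, input_wires, output_wires, variant=1):
    phi_min, phi_max = init_min_max(gates, input_wires)
    phi = dict(phi_min if variant == 1 else phi_max)   # initialization

    # For every wire, the output wires of the gates it enters.
    feeds = {}
    for _, inputs, ell in gates:
        for w in inputs:
            feeds.setdefault(w, []).append(ell)

    # Wires in reverse topological order: gate outputs first, then circuit inputs.
    for i in [ell for _, _, ell in reversed(gates)] + list(input_wires):
        if i in output_wires or i not in feeds:
            continue
        bar = min(phi[k] for k in feeds[i])   # largest value keeping monotonicity
        phi[i] = bar if phi_min[i] <= bar <= phi_max[i] else phi_min[i]
    return phi
```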

5.3 Choosing the Best Heuristic

We have described three heuristics for generating wire orderings that yield garbled circuits that are secure based on a related-key assumption and without circularity (the monotone heuristic from [14] and our two new heuristics SafeMon1 and SafeMon2). We ran these heuristics on three different circuits and compared the results among them and with our garbling scheme based only on pseudorandom functions. The circuits we tested the heuristics on are AES, SHA-256 and Min-Cut 250,000; they have 6800, 90,825 and 999,960 AND gates, respectively, and 25,124, 42,029 and 2,524,920 XOR gates, respectively [1]. The performance of each heuristic was measured by the size of the garbled circuit it yields. Table 5 shows the results of the comparison. It can be seen that the monotone heuristic of Kolesnikov et al. [14] gives the best result for the AES circuit, while the SafeMon2 heuristic yields the smallest garbled circuit for the SHA-256 and Min-Cut circuits. Observe that in the SHA-256 circuit, which has a high percentage of AND gates, all the heuristics fail to significantly reduce the size of the garbled circuit relative to the size of the circuit constructed under the pseudorandom function assumption only. This is due to the fact that in such circuits the large number of AND gates imposes many constraints on the wire ordering, so we are forced to use many different deltas that are “spread” among a small number of XOR gates. In such cases, not much is gained by using a related-key assumption rather than pseudorandom functions only.

As described above, safe and monotone heuristics are expected to beat the pure monotone heuristic of Kolesnikov et al. [14] only when there are more AND gates than XOR gates. Indeed, if we measure only the effect on XOR gates (see the numbers in parentheses in Table 5), then the monotone-only heuristic always results in fewer ciphertexts on average for the XOR gates. This is because it imposes fewer constraints on the wire ordering and focuses only on the XOR gates. In the AES circuit, where there are only 6800 AND gates and 25,124 XOR gates, the average number of ciphertexts per XOR gate is only 0.15 (which is very impressive). Thus, even though the AND gates require 3 ciphertexts each, the overall result is the best. The safe and monotone heuristics that we present here also take into consideration the cost of AND gates (at the expense of XOR gates), and so achieve better results when there is a higher percentage of AND gates.

Table 5 Comparison of the size of the garbled circuit that each heuristic generated (including a baseline comparison to the cost under a pseudorandom function assumption only)

In Table 6, we show the computation cost of garbling XOR gates (the number of pseudorandom function computations) when using the wire orderings that each heuristic generated. We only consider XOR gates in this computation, since all the methods used for garbling AND gates require the same computational work. Recall that in the fleXOR method, garbling a XOR gate may require 0, 2 or 4 calls to the pseudorandom function. Thus, the computation cost is measured by the average number of calls to the pseudorandom function per XOR gate. Observe that, for this measure, all heuristics are considerably better than the scheme relying only on a pseudorandom function assumption. This is because in the fleXOR approach only two calls to the pseudorandom function are needed when garbling a XOR gate with one ciphertext, in contrast to three in the scheme based only on pseudorandom functions. However, we remark that when using AES-NI and pipelining, this actually makes little difference to the overall time. Also, observe that the monotone heuristic is better than the other heuristics when comparing computational cost. As explained before, this is because the monotone heuristic focuses solely on minimizing the cost at XOR gates, rather than minimizing the cost for the entire circuit, and thus achieves better results when measuring the effect on XOR gates only.

Table 6 Comparison of the average number of calls to the pseudorandom function per XOR gate, for each heuristic

We conclude that in circuits with many XOR gates relative to AND gates, the use of a related-key assumption yields an improvement over the scheme relying on pseudorandom functions only. For example, in the AES circuit the smallest garbled circuit obtained is 24% smaller, and in the Min-Cut circuit it is approximately 16% smaller.

Optimal Algorithms We stress that we have not proven anything regarding the optimality of the heuristics described above. Indeed, adding the requirement that the ordering be safe is just a way to force the heuristic to take AND gates into account, and it is possible that a better result can be achieved with an ordering that is not safe. However, as we have mentioned, even finding an optimal monotone ordering is NP-hard. Thus, finding better heuristics or optimization algorithms is left for future work.

6 Experimental Results and Discussion

In the previous sections, we presented four tools that can optimize the performance of garbled circuits without relying on any additional cryptographic assumption beyond the existence of pseudorandom functions: (1) pipelined garbling; (2) pipelined key scheduling; (3) XOR gates with one ciphertext and three encryptions; and (4) improved 4–2 GRR for AND gates. In this section, we present the results of an experimental evaluation of these methods, both together and separately, and compare their performance to that of other garbling methods.

Table 7 shows the time it takes to run the full Yao semi-honest protocol [18, 22] on three different circuits of interest: AES, SHA-256 and Min-Cut 250,000. The circuits have 6800, 90,825 and 999,960 AND gates, respectively, and 25,124, 42,029 and 2,524,920 XOR gates, respectively. The numbers of input bits for which OTs are performed are 128, 256 and 250,000, respectively [1]. We remark that our implementation of the semi-honest protocol of Yao utilizes the highly optimized OT extension protocol of [2].

Table 7 Summary of experimental results (times are for a full semi-honest execution in milliseconds)
Table 8 Summary of garbled circuit size in Megabytes, according to scheme

We examined eight different schemes, described using the following notation: [pipe-garble] for the pipelined garbling method; [pipe-garble+KS] for pipelined garbling and pipelined key scheduling; [fixed key] where all PRF evaluations were performed using the fixed-key technique described in [5]; [XOR-3] where XOR gates were garbled using a simple 4–3 GRR method; [XOR-1] where XOR gates were garbled using our method of garbling with one ciphertext; [free-XOR] where the free-XOR technique was used; [AND-3] where AND gates were garbled using simple 4–3 GRR; [AND-2] where our 4–2 GRR method was used to garble AND gates; and finally, [AND-half-gates] where the “half-gates” technique of Zahur et al. [23] was used to garble AND gates. Note that the half-gates method is used only in conjunction with free-XOR, since free-XOR is a prerequisite for it.

The first scheme in Table 7 is the most “naïve”: a simple 4–3 GRR was used for both AND and XOR gates, and the garbling was pipelined but not the key scheduling. In contrast, the last scheme is the most efficient, as it uses fast fixed-key encryption and the half-gates approach to achieve two ciphertexts per AND gate and none per XOR gate. However, this scheme is based on the strongest assumption, namely that fixed-key AES behaves like a public random permutation. The third scheme in the table uses all of our optimizations together, and is thus the most efficient scheme that is based on a standard PRF assumption. The sixth scheme in the table shows the best that can be achieved while assuming circularity and related-key security, but without resorting to a public random permutation. Table 8 shows the size of the garbled circuit under each of the schemes of Table 7.

The experiments were performed on Amazon’s c4.8xlarge compute-optimized machines (with Intel Xeon E5-2666 v3 Haswell processors) running Windows. The measurements include the time it takes to garble the circuit, send it to the evaluator, and compute the output. Since communication is also involved, this captures improvements both in the encryption technique and in the size of the circuit. Each scheme was tested on the three circuits in two different settings: the Virginia–Virginia (VA–VA) setting, where the two parties running the protocol are located in the same data center, and the Virginia–Ireland (VA–IRE) setting, where the physical distance between the parties is large. (We omitted the results of running the large Min-Cut circuit in the VA–IRE setting, as they were not consistent and had high variability.) Each number in the table is an average of 20 executions of the indicated scenario.

The table rows marked in boldface highlight the best schemes under each set of assumptions. Looking at the results, we derive the following observations:

  • Best efficiency As predicted, the fixed-key + half-gates implementation (8) is the fastest and most efficient in all scenarios. (This seems trivial, but when using fixed-key AES with half-gates, the \(\mathsf{Eval}\) procedure at AND gates requires one more encryption than in a simple 4–3 GRR. Thus, this confirms the hypothesis that the communication saved is far more significant than an additional encryption, which is in any case pipelined.)

  • Small circuits In small circuits (e.g., AES), the running time is almost identical across all schemes and in both communication settings. In particular, using our optimizations (3) yields the same performance as the most efficient scheme (8), in both the VA–VA and VA–IRE settings. This is due to the fact that in small circuits, running the OT protocol is the bottleneck of the protocol (even if, as in our experiments, optimized OT extension [2] is used). This means that for small circuits there is no reason to rely on a nonstandard cryptographic assumption.

  • Medium circuits In the larger SHA-256 circuit, where the majority of the gates are AND gates, there was a difference between the results in the two communication settings. In the VA–VA setting, the best scheme based on a PRF alone (3) has performance that is closer to that of the naïve scheme (1) than to that of the schemes based on the circularity or public random permutation assumptions (schemes 6 and 8). In contrast, in the VA–IRE setting the PRF-based scheme performs close to schemes 6 and 8. This is explained by observing that when the parties are closely located, communication is less dominant and garbling becomes a bigger factor; thus, garbling XOR gates for free improves the performance of the protocol. In contrast, when the parties are far from each other, communication becomes the bottleneck, and thus the PRF-based scheme (3) yields a significant improvement over the naïve scheme (1) and its performance is not much worse than that of the best fixed-key-based scheme (and since there are fewer XOR gates than AND gates, the overhead of an additional ciphertext per XOR gate is reasonable).

  • Large circuits In the large Min-Cut circuit, the running time of our best PRF-based scheme (3) is closer to the best result (8) than to the naïve result (1). This is explained by the fact that the circuit is very large, and so bandwidth is very significant. This is especially true since the majority of gates are XOR gates, so the reduction from 3 ciphertexts to 1 ciphertext per XOR gate has a big influence. (Observe that the size of the garbled circuit sent in (8) is 30.5 MB and in (3) is 69 MB, while the size of the garbled circuit sent in (1) is 161.4 MB; see the back-of-the-envelope check after this list.) Observe that schemes (6) and (8) have the same bandwidth; the difference in cost is therefore due to the additional cost of the AES key schedules and encryptions. Note, however, that despite the fact that there are roughly 1,000,000 AND gates, the difference between their running times is 15%, which is not negligible, but also not overwhelming.

  • Removing the public random permutation assumption Comparing scheme (8), which is the most efficient, to scheme (6), which is the most efficient scheme that does not depend on the public random permutation assumption, shows that in all scenarios removing the fixed-key technique causes only a minor increase in running time.
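As a sanity check on the bandwidth figures quoted in the large-circuits item above, the reported Min-Cut sizes are consistent with 16-byte (128-bit) ciphertexts and MB meaning \(2^{20}\) bytes; both conventions are our assumptions, while the gate counts and per-gate ciphertext counts are taken from the text.

```python
# Back-of-the-envelope check of the Min-Cut garbled-circuit sizes quoted
# above, assuming 16-byte ciphertexts and MB = 2**20 bytes.
AND_GATES, XOR_GATES = 999_960, 2_524_920

def size_mb(ct_per_and, ct_per_xor):
    return (ct_per_and * AND_GATES + ct_per_xor * XOR_GATES) * 16 / 2**20

print(round(size_mb(3, 3), 1))  # scheme (1), 4-3 GRR everywhere: ~161.4
print(round(size_mb(2, 1), 1))  # scheme (3), PRF-only:           ~69.0
print(round(size_mb(2, 0), 1))  # schemes (6)/(8), half-gates:    ~30.5
```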

We conclude that strengthening security by removing the public random permutation assumption does not noticeably affect the performance of the protocol. Thus, in many cases, two-party secure computation protocols do not need to use the fixed-key method. Further strengthening security by not relying on a circularity assumption (i.e., “paying” for XOR gates) does come with a cost. Yet, in scenarios where garbling time is not the bottleneck (e.g., small circuits, large inputs, or communication constraints), one should consider using the more conservative approach suggested in this work. In any case, we believe that our ideas should encourage future research on achieving faster and more efficient secure two-party computation based on standard cryptographic assumptions.