Improved homomorphic evaluation for hash function based on TFHE

Homomorphic evaluation of hash functions offers a solution to the challenge of data integrity authentication in the context of homomorphic encryption. The earliest attempt at homomorphic evaluation of the SHA-256 hash function was proposed by Mella and Susella (in: Cryptography and coding, 14th IMA international conference, IMACC 2013. Lecture notes in computer science, vol 8308. Springer, Heidelberg, pp 28–44, 2013. https://doi.org/10.1007/978-3-642-45239-0_3) based on the BGV scheme. Unfortunately, their construction has an exceedingly high multiplicative depth, which renders it impractical. Recently, a homomorphic implementation of SHA-256 based on the TFHE scheme (Homomorphic evaluation of SHA-256. https://github.com/zama-ai/tfhe-rs/tree/main/tfhe/examples/sha256_bool) brought it from theory to reality; however, its efficiency remains insufficient. In this paper, we revisit the homomorphic evaluation of the SHA-256 hash function in the context of TFHE, further reducing the reliance on gate bootstrapping and lowering the evaluation latency. Specifically, we primarily use ternary gates to reduce the number of gate bootstrappings required for the logic functions in message expansion and for addition modulo $2^{32}$ in iterative compression. Furthermore, we demonstrate that our optimization techniques are applicable to the Chinese commercial cryptographic hash SM3. Finally, we give concrete comparative implementations based on the TFHE-rs library. Experiments demonstrate that our optimization techniques lead to an improvement of approximately 35–50% over the state-of-the-art result for different numbers of CPU cores.


Introduction
Fully homomorphic encryption (FHE) is a cryptographic technique that allows arbitrary functions to be evaluated on ciphertexts without decryption. This remarkable property makes FHE an ideal solution for addressing security concerns in various domains such as machine learning, cloud computing, medical diagnostics and financial data analysis. Since Gentry (2009) proposed the ingenious bootstrapping technique to construct the first true fully homomorphic encryption scheme, extensive research spanning over a decade has resulted in significant advances in both the theoretical understanding and the practical implementation of FHE. Some representative works include BGV (Brakerski et al. 2012), BFV (Brakerski 2012; Fan and Vercauteren 2012), CKKS (Cheon et al. 2017, 2018), FHEW (Ducas and Micciancio 2015), TFHE (Chillotti et al. 2020) and FINAL (Bonte et al. 2022).
One of the major challenges in FHE is the significant expansion in ciphertext size, which is generally three to six orders of magnitude larger than the plaintext size. Transciphering (Naehrig et al. 2011), which combines FHE with a symmetric encryption scheme, tackles this expansion and thereby reduces the communication cost between the client and the cloud. Specifically, instead of encrypting the data with a fully homomorphic encryption scheme, the client encrypts the data with a traditional symmetric encryption scheme. The encrypted data, in the form of symmetric ciphertexts, is then transmitted to the cloud. In this way, the ciphertext expansion ratio of the data (i.e., the ciphertext size divided by the plaintext size) is only 1. Some additional operations need to be performed on the server side: the symmetric ciphertext is converted to a homomorphic ciphertext by homomorphically evaluating the decryption circuit of the symmetric encryption scheme. Once the conversion is complete, the cloud can proceed to evaluate the desired function homomorphically. Therefore, optimizing the multiplicative depth of the decryption circuit is vital for efficient execution within the transciphering framework.
Homomorphic evaluation of symmetric encryption schemes, including block ciphers and stream ciphers, has garnered significant attention in recent years. As early as 2012, Gentry et al. (2012) presented a homomorphic evaluation of AES-128 encryption using the BGV scheme; they obtained an execution time of more than 4 min in the leveled mode and a latency of 18 min in the bootstrapped mode (in the updated version of their paper). Since then, optimized evaluations of AES have been developed, and a recent work (Trama et al. 2023) claimed to reduce the evaluation time of an AES block to 30 s. In addition to optimizing AES, researchers have explored the use of lightweight block ciphers to achieve lower evaluation latency. Researchers have also investigated specialized FHE-friendly block ciphers (Albrecht et al. 2015) and stream ciphers (Dobraunig et al. 2018; Cid et al. 2022) with lower multiplicative depth and complexity.

Motivation for homomorphic evaluation of hash function
Fully homomorphic encryption in combination with symmetric encryption solves the ciphertext size expansion problem. What about hash functions? A direct application is to verify the integrity of data in a homomorphic sense. The earliest evaluation of a hash function can be traced back to Mella and Susella (2013), who presented a homomorphic evaluation of the SHA-256 hash algorithm based on the BGV scheme. However, the main challenge encountered in evaluating SHA-256 homomorphically is the extremely high multiplicative depth caused by its large number of iteration rounds, and the authors did not report a practical implementation time. Compared with the BGV scheme, TFHE has the advantage of not being limited by circuit depth; e.g., Lou and Jiang (2019) evaluated deep neural networks by means of TFHE. Recently, Bendoukha et al. (2022) evaluated hash functions constructed from lightweight block ciphers such as PRINCE, SIMON and LowMC using the TFHE scheme. They also proposed several intriguing application scenarios for homomorphic evaluation of hash functions, such as Homomorphic Data Integrity Check, Single Secret Leader Election, Homomorphic Database Querying and Oblivious Authenticated (Homomorphic) Calculation, which highlight the need for homomorphic evaluation of hash functions. However, their homomorphic evaluation of hash functions is derived directly from previous evaluations of lightweight block ciphers, and the resulting hash functions are not standardized, making them difficult to deploy in industry. In this paper, we focus on the well-studied and standardized hash algorithm SHA-256 and the Chinese commercial cryptographic hash SM3 (https://oscca.gov.cn/sca/xxgk/2010-12/17/1002389/files/302a3ada057c4a73830536d03e683110.pdf). We note that a homomorphic implementation of SHA-256 (Homomorphic evaluation 2023) has been proposed based on the TFHE scheme, but there is still significant room for optimization.

Our contributions
In this paper, we revisit the evaluation of SHA-256 in the context of TFHE homomorphic encryption and concentrate on improving the latency of SHA-256 evaluation. We first discuss modifications to the SHA-256 code that make it more friendly to the TFHE scheme. One significant improvement is the use of ternary gates, which effectively reduces the number of gate bootstrappings required for evaluating SHA-256. Specifically, the logic functions $\sigma_0$, $\sigma_1$, $s_0$, $s_1$ and $\mathrm{Maj}$ required in message expansion and iterative compression can each be evaluated with only a single bootstrapping per output bit. For the expensive addition modulo $2^{32}$, we present a number of optimization techniques to further minimize the number of required gate bootstrappings. Moreover, we show that our optimization techniques are also applicable to the evaluation of the SM3 hash algorithm. Finally, we provide a concrete implementation based on the TFHE-rs library. Our experimental results show that our optimization techniques achieve efficiency gains of about 35%-50% over the state of the art for different numbers of CPU cores.

Related works
The transciphering framework was initially proposed in Naehrig et al. (2011), and early works mainly focused on popular symmetric ciphers, such as AES (Gentry et al. 2012), SIMON (Lepoint and Naehrig 2014), SPECK (Togan et al. 2015) and PRINCE (Doröz et al. 2016). However, their evaluation efficiency is not satisfactory due to the high multiplicative depth. Two recent works (Stracovsky et al. 2022; Trama et al. 2023) based on TFHE's programmable bootstrapping technique greatly improve the evaluation latency of AES.
There has been significant research on designing FHE-friendly symmetric cryptographic primitives, aiming to achieve lower multiplicative complexity and depth. LowMC (Albrecht et al. 2015) is the first FHE-friendly cipher; however, it has been found to be vulnerable to algebraic attacks (Dinur et al. 2015; Dobraunig et al. 2015; Rechberger et al. 2018). In 2022, an FHE-friendly block cipher with lower multiplicative depth called Chaghri (Ashur et al. 2022) was proposed, which is 63% faster than the evaluation of AES using the BGV scheme. Another line of research focuses on FHE-friendly stream cipher designs, which allow some expensive computations to be performed offline because their encryption and decryption are simple XORs. Canteaut et al. (2016) first evaluated the Trivium algorithm from the eSTREAM project and proposed Kreyvium with a 128-bit security level. Since then, numerous FHE-friendly stream cipher designs have emerged, such as FLIP-like (Méaux et al. 2016, 2019; Hoffmann et al. 2020; Cosseron et al. 2022) and Rasta-like (Dobraunig et al. 2018; Ha et al. 2020; Hebborn and Leander 2020; Dobraunig et al. 2023; Cid et al. 2022) ciphers. Mandal and Gong (2021) studied the gate complexity of the boolean circuits of the NIST lightweight cryptography (LWC) round 2 candidates and gave their evaluation latency based on the TFHE scheme. Moreover, Cho et al. (2021) proposed a transciphering framework for approximate homomorphic encryption, called RtF, which consists of a stream cipher over a modular domain and a transformation from BFV to CKKS. They also proposed the stream cipher HERA as a building block of the RtF framework. Ha et al. (2022) proposed the faster cipher Rubato, which is suitable for the RtF framework and has lower multiplicative depth.
The first SHA-256 evaluation based on the BGV scheme was given by Mella and Susella (2013). The required multiplicative depths for the word-sliced, packed and bit-sliced implementations are 2762.5, 3310.5 and 2634, respectively. Due to these very high multiplicative depths, it is not possible to give a practical implementation of SHA-256 in this way. Bendoukha et al. (2022) homomorphically evaluated hash functions constructed from "FHE-friendly" block ciphers such as PRINCE, LowMC and SIMON. In Homomorphic evaluation (2023), the authors presented a practical implementation of SHA-256 based on the TFHE scheme combined with a number of optimization techniques.

Paper organization
The paper is organized as follows. In the "Preliminaries" section, we review the preliminary knowledge required for this paper, in particular the TFHE cryptosystem. The "Specifications of SHA-256 and SM3" section introduces the NIST standard hash SHA-256 and the Chinese commercial cryptographic hash SM3. The "Hash goes to homomorphic" section details how to convert these two hash algorithms into efficient homomorphic computations. The "Implementation and experimental results" section presents the implementation and its performance. We conclude this paper in the "Conclusion" section.

Notations
Let $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ be the real torus, i.e., the additive group of real numbers modulo 1. We use $\mathbb{T}_N[X]^k$ to denote the set of vectors of $k$ polynomials with coefficients in $\mathbb{T}$ modulo $X^N + 1$, where $N$ is usually a power of 2. $\mathbb{B}_N[X]$ denotes the polynomials with binary coefficients modulo $X^N + 1$. $\langle \cdot, \cdot \rangle$ denotes the inner product. We use $\ggg$ to denote right-rotation and $\gg$ to denote right-shift, where $x \gg n$ discards the rightmost $n$ bits of $x$ and then adds $n$ zeros on the left.

Hash function
A hash function maps a message (data) of arbitrary length to a hash value of fixed length (also known as a message digest). Hash functions are widely used in cryptography, typically for signatures, encryption, message authentication codes and other authentication mechanisms. A hash function needs to satisfy the following security properties:
• Collision resistance: Finding two messages with the same hash value is computationally difficult.
• Pre-image resistance: Given a value $h$ that is the output of some hash function $H$, finding a message $m$ such that $H(m) = h$ is computationally difficult.

The TFHE cryptosystem
TFHE (Chillotti et al. 2020), which builds on the FHEW scheme (Ducas and Micciancio 2015), is currently the fastest scheme to achieve bootstrapping. Three types of ciphertexts are defined in the TFHE scheme, and they play different roles in fast bootstrapping.

TFHE ciphertexts
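• TLWE: $c = (\mathbf{a}, b) \in \mathbb{T}^{n+1}$ with $b = \langle \mathbf{a}, \mathbf{s} \rangle + m + e$, where $\mathbf{a}$ is uniformly sampled from $\mathbb{T}^n$, $m$ is the encoded message, the secret key $\mathbf{s}$ is uniformly sampled from $\mathbb{B}^n$, and the error $e \in \mathbb{T}$ is sampled from a Gaussian distribution with mean 0 and standard deviation $\sigma$.
• TRLWE: $c = (\mathbf{a}(X), b(X)) \in \mathbb{T}_N[X]^{k+1}$ with $b(X) = \langle \mathbf{a}(X), \mathbf{s}(X) \rangle + m(X) + e(X)$, where $\mathbf{a}(X)$ is uniformly sampled from $\mathbb{T}_N[X]^k$, $m(X)$ is the encoded phase polynomial, the secret key $\mathbf{s}(X)$ is uniformly sampled from $\mathbb{B}_N[X]^k$, and the error $e(X) \in \mathbb{T}_N[X]$ is a polynomial whose coefficients are sampled from a Gaussian distribution with mean 0 and standard deviation $\sigma$. Generally, $k = 1$.
• TRGSW: $2\ell_{PBS}$ fresh TRLWE samples. In detail, TRGSW encrypts the message $m \in \mathbb{B}$ into $C$ as follows:
$$C = \begin{pmatrix} a_1(X) & b_1(X) \\ \vdots & \vdots \\ a_{2\ell_{PBS}}(X) & b_{2\ell_{PBS}}(X) \end{pmatrix} + m \cdot G, \qquad G = \begin{pmatrix} \beta_{PBS}^{-1} & 0 \\ \vdots & \vdots \\ \beta_{PBS}^{-\ell_{PBS}} & 0 \\ 0 & \beta_{PBS}^{-1} \\ \vdots & \vdots \\ 0 & \beta_{PBS}^{-\ell_{PBS}} \end{pmatrix},$$
where $(a_i(X), b_i(X))$, for $1 \le i \le 2\ell_{PBS}$, are TRLWE ciphertexts encrypting 0 under the same secret key, $\beta_{PBS}$ denotes the basis of the gadget decomposition and $\ell_{PBS}$ is the length of the gadget decomposition.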

Remark
In TFHE's bootstrapping, the TLWE ciphertext is the input to be bootstrapped, while the TRLWE ciphertext encodes the test polynomial and serves as the intermediate ciphertext during bootstrapping. Each bit of the TLWE secret key is encrypted as a TRGSW ciphertext to form the bootstrapping key, which can be precomputed.

TFHE bootstrapping
Bootstrapping refreshes a ciphertext with large noise so that it can support further homomorphic computation. The most important feature of the TFHE scheme is its efficient bootstrapping, which consists of three core algorithms: blind rotation, sample extraction and key switching, as shown in Algorithm 2.

Key Switching
Two kinds of key switching are proposed by Chillotti et al. (2020). The first is Public Functional KeySwitching, which allows packing TLWE samples into a TRLWE sample or switching the secret key. It can also evaluate a public linear function $f$ on the input TLWE samples. The second is Private Functional KeySwitching, which evaluates a private linear function on the input TLWE samples by encoding the secret $f$ into the KeySwitching key.

Blind Rotation
Blind rotation, as the name implies, rotates a polynomial encrypted as a TRLWE ciphertext by an encrypted index; it is the core operation in bootstrapping. The blind rotation is mainly built from successive external products. Algorithm 1 presents the detailed blind rotation operation.

Sample Extraction
This operation extracts a TLWE ciphertext encrypting any coefficient $m_i$ from a TRLWE ciphertext encrypting the message polynomial $m(X)$; bootstrapping extracts the ciphertext that encrypts the constant coefficient $m_0$. Its correctness follows directly from the decryption of TRLWE.

Specifications of SHA-256 and SM3
SHA-256 (Science 2012) is a hash function developed by the NSA and published by NIST in 2001, while SM3 (https://oscca.gov.cn/sca/xxgk/2010-12/17/1002389/files/302a3ada057c4a73830536d03e683110.pdf) is a Chinese commercial cryptographic hash algorithm standard published by the Chinese National Cryptography Administration in 2010. Both follow the Merkle-Damgård construction, processing the input message in 512-bit blocks and returning a 256-bit hash value. SHA-256 and SM3 operate on 32-bit variables, combining NOT, XOR, OR, AND, rotation and addition modulo $2^{32}$.

Message padding
Assume that the message $m$ has a length of $\ell$ bits. First append the bit "1" to the end of the message, followed by $k$ zero bits, where $k$ is the smallest non-negative integer such that $\ell + k + 1 \equiv 448 \pmod{512}$. Then append a 64-bit string equal to the binary representation of $\ell$. The bit length of the padded message $M$ is a multiple of 512.
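The padding rule above can be sketched in a few lines of Python. This is a minimal illustration for byte-aligned messages only; the function name `pad_message` is ours and does not come from the paper or from TFHE-rs.

```python
def pad_message(message: bytes) -> bytes:
    """Pad a byte-aligned message as described above (same rule for SHA-256 and SM3)."""
    bit_len = 8 * len(message)
    # Append the single "1" bit; for byte-aligned input this is the byte 0x80.
    padded = message + b"\x80"
    # Append zero bytes until the length is 56 mod 64 bytes (448 mod 512 bits).
    padded += b"\x00" * ((56 - len(padded)) % 64)
    # Append the original bit length as a 64-bit big-endian integer.
    return padded + bit_len.to_bytes(8, "big")


if __name__ == "__main__":
    for n in (0, 3, 55, 56, 64):
        print(n, "bytes ->", len(pad_message(b"a" * n)), "bytes after padding")
```

A 55-byte message still fits in one block, whereas a 56-byte message needs a second block because the "1" bit and the length field no longer fit.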

Some useful logical functions
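The following logical functions, each operating on 32-bit words, are used in the message schedule and the iterative compression function of SHA-256 (here $\ggg$ denotes right-rotation and $\gg$ right-shift):
$$\mathrm{Ch}(x, y, z) = (x \wedge y) \oplus (\neg x \wedge z),$$
$$\mathrm{Maj}(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z),$$
$$s_0(x) = (x \ggg 2) \oplus (x \ggg 13) \oplus (x \ggg 22), \qquad s_1(x) = (x \ggg 6) \oplus (x \ggg 11) \oplus (x \ggg 25),$$
$$\sigma_0(x) = (x \ggg 7) \oplus (x \ggg 18) \oplus (x \gg 3), \qquad \sigma_1(x) = (x \ggg 17) \oplus (x \ggg 19) \oplus (x \gg 10).$$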

SHA-256 hash computation
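Each message block $M_1, M_2, \ldots, M_N$ is then processed using the following four steps, for $i$ from 1 to $N$, where all additions are modulo $2^{32}$:
(1) Message schedule: with $M_{i,t}$ the $t$-th 32-bit word of $M_i$,
$$w_t = \begin{cases} M_{i,t}, & 0 \le t \le 15, \\ \sigma_1(w_{t-2}) + w_{t-7} + \sigma_0(w_{t-15}) + w_{t-16}, & 16 \le t \le 63. \end{cases}$$
(2) Initialization:
$$(a, b, c, d, e, f, g, h) = (H_0^{(i-1)}, H_1^{(i-1)}, \ldots, H_7^{(i-1)}).$$
(3) Iterative compression: for $t = 0$ to $63$:
$$T_1 = h + s_1(e) + \mathrm{Ch}(e, f, g) + K_t + w_t, \qquad T_2 = s_0(a) + \mathrm{Maj}(a, b, c),$$
$$h = g, \quad g = f, \quad f = e, \quad e = d + T_1, \quad d = c, \quad c = b, \quad b = a, \quad a = T_1 + T_2.$$
(4) Intermediate hash value:
$$H_0^{(i)} = a + H_0^{(i-1)}, \quad H_1^{(i)} = b + H_1^{(i-1)}, \quad \ldots, \quad H_7^{(i)} = h + H_7^{(i-1)}.$$
After repeating steps one through four a total of $N$ times, the resulting 256-bit message digest is $H_0^{(N)} \,\|\, H_1^{(N)} \,\|\, \cdots \,\|\, H_7^{(N)}$. Fig. 1 illustrates the state update step of SHA-256.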
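For readers who want to check the four steps against a known implementation, the following plaintext Python sketch follows the standard SHA-256 specification and compares its output with `hashlib`. The function names (`s0`, `s1`, `sigma0`, `sigma1`, `icbrt`, and so on) are ours; the round constants $K_t$ and the initial state are derived from the cube and square roots of the first primes, as in the standard, rather than copied from a table.

```python
import hashlib
import math

MASK = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def ch(x, y, z):     return (x & y) ^ (~x & z & MASK)
def maj(x, y, z):    return (x & y) ^ (x & z) ^ (y & z)
def s0(x):           return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)
def s1(x):           return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)
def sigma0(x):       return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)
def sigma1(x):       return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def icbrt(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def first_primes(count):
    primes, c = [], 2
    while len(primes) < count:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

PRIMES = first_primes(64)
# First 32 fractional bits of the cube roots / square roots of the primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]

def sha256(message: bytes) -> bytes:
    bit_len = 8 * len(message)
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data)) % 64) + bit_len.to_bytes(8, "big")
    state = list(H0)
    for off in range(0, len(data), 64):
        # (1) Message schedule.
        w = [int.from_bytes(data[off + 4 * t: off + 4 * t + 4], "big") for t in range(16)]
        for t in range(16, 64):
            w.append((sigma1(w[t - 2]) + w[t - 7] + sigma0(w[t - 15]) + w[t - 16]) & MASK)
        # (2) Initialization.
        a, b, c, d, e, f, g, h = state
        # (3) Iterative compression.
        for t in range(64):
            t1 = (h + s1(e) + ch(e, f, g) + K[t] + w[t]) & MASK
            t2 = (s0(a) + maj(a, b, c)) & MASK
            a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
        # (4) Intermediate hash value.
        state = [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]
    return b"".join(x.to_bytes(4, "big") for x in state)


if __name__ == "__main__":
    print(sha256(b"abc").hex())
    print(sha256(b"abc") == hashlib.sha256(b"abc").digest())
```

In the homomorphic setting each 32-bit word becomes 32 TLWE ciphertexts, so every `^`, `&` and `+` above turns into gate bootstrappings, whereas the rotations and shifts only reorder ciphertexts.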

Recall on SM3 hash function
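SM3 consists of two parts: message expansion and the state update transformation, which we describe below. The auxiliary functions $P_0$ and $P_1$, which operate on 32-bit words, are defined as follows, where $\lll$ denotes left-rotation:
$$P_0(X) = X \oplus (X \lll 9) \oplus (X \lll 17), \qquad P_1(X) = X \oplus (X \lll 15) \oplus (X \lll 23).$$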

Message expansion
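The input is the 512-bit message block, split into 16 32-bit words $W_0, \ldots, W_{15}$, which are then expanded to 68 32-bit words $W_i$,
$$W_i = P_1\big(W_{i-16} \oplus W_{i-9} \oplus (W_{i-3} \lll 15)\big) \oplus (W_{i-13} \lll 7) \oplus W_{i-6}, \quad \text{for } 16 \le i < 68,$$
and 64 expanded words
$$W'_i = W_i \oplus W_{i+4}, \quad \text{for } 0 \le i < 64,$$
where $\lll$ denotes left-rotation.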

State update transformation
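In SM3, the state update transformation starts with fixed initial values of eight 32-bit words and updates them in 64 rounds. Let $A, B, C, D, E, F, G$ and $H$ denote the inner state registers; the $j$-th round transformation is given by
$$SS_1 = \big((A \lll 12) + E + (T_j \lll (j \bmod 32))\big) \lll 7, \qquad SS_2 = SS_1 \oplus (A \lll 12),$$
$$TT_1 = FF_j(A, B, C) + D + SS_2 + W'_j, \qquad TT_2 = GG_j(E, F, G) + H + SS_1 + W_j,$$
$$D = C, \quad C = B \lll 9, \quad B = A, \quad A = TT_1, \quad H = G, \quad G = F \lll 19, \quad F = E, \quad E = P_0(TT_2),$$
where $\lll$ denotes left-rotation, additions are modulo $2^{32}$, and the bitwise boolean functions $FF_j$ and $GG_j$ are defined by
$$FF_j(X, Y, Z) = \begin{cases} X \oplus Y \oplus Z, & 0 \le j \le 15, \\ (X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z), & 16 \le j \le 63, \end{cases} \qquad GG_j(X, Y, Z) = \begin{cases} X \oplus Y \oplus Z, & 0 \le j \le 15, \\ (X \wedge Y) \vee (\neg X \wedge Z), & 16 \le j \le 63. \end{cases}$$
Note that $T_j = \texttt{0x79cc4519}$ for $0 \le j \le 15$ and $T_j = \texttt{0x7a879d8a}$ for $16 \le j \le 63$. After the last round of the state update transformation, the initial values are combined with the output values of the last round by bitwise XOR (where SHA-256 uses addition modulo $2^{32}$). As in SHA-256, the result is either the final hash value or the initial value for the next message block.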
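The message expansion and state update can be cross-checked with a plaintext Python sketch that follows the published SM3 standard. The initial value `IV` and the helper names are not given in the text above; they are taken from the standard as we recall it, and the digest of "abc" is compared against the standard's first example.

```python
MASK = 0xFFFFFFFF
# Initial value from the SM3 standard (not stated in the text above).
IV = [0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E]

def rotl(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

def p0(x): return x ^ rotl(x, 9) ^ rotl(x, 17)
def p1(x): return x ^ rotl(x, 15) ^ rotl(x, 23)

def ff(j, x, y, z):
    return x ^ y ^ z if j < 16 else (x & y) | (x & z) | (y & z)

def gg(j, x, y, z):
    return x ^ y ^ z if j < 16 else (x & y) | (~x & z & MASK)

def t_const(j):
    return 0x79CC4519 if j < 16 else 0x7A879D8A

def sm3(message: bytes) -> bytes:
    bit_len = 8 * len(message)
    data = message + b"\x80"
    data += b"\x00" * ((56 - len(data)) % 64) + bit_len.to_bytes(8, "big")
    v = list(IV)
    for off in range(0, len(data), 64):
        # Message expansion: 68 words W and 64 words W'.
        w = [int.from_bytes(data[off + 4 * i: off + 4 * i + 4], "big") for i in range(16)]
        for i in range(16, 68):
            w.append(p1(w[i - 16] ^ w[i - 9] ^ rotl(w[i - 3], 15))
                     ^ rotl(w[i - 13], 7) ^ w[i - 6])
        w2 = [w[i] ^ w[i + 4] for i in range(64)]
        # State update: 64 rounds.
        a, b, c, d, e, f, g, h = v
        for j in range(64):
            ss1 = rotl((rotl(a, 12) + e + rotl(t_const(j), j)) & MASK, 7)
            ss2 = ss1 ^ rotl(a, 12)
            tt1 = (ff(j, a, b, c) + d + ss2 + w2[j]) & MASK
            tt2 = (gg(j, e, f, g) + h + ss1 + w[j]) & MASK
            a, b, c, d, e, f, g, h = tt1, a, rotl(b, 9), c, p0(tt2), e, rotl(f, 19), g
        # Feed-forward by XOR.
        v = [x ^ y for x, y in zip(v, (a, b, c, d, e, f, g, h))]
    return b"".join(x.to_bytes(4, "big") for x in v)


if __name__ == "__main__":
    print(sm3(b"abc").hex())
```

Each round contains only two chains of modular additions (for $TT_1$ and $TT_2$, plus the one inside $SS_1$), which is where the homomorphic cost concentrates.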

Hash goes to homomorphic
When designing hash functions, it is crucial to ensure efficient computation on software platforms. As shown in the "Specifications of SHA-256 and SM3" section, the core computation units of hash functions typically involve basic instructions such as AND, OR, NOT and ROTATION. The TFHE scheme offers efficient gate bootstrapping, so the evaluation of functions built from gates is more flexible in this scheme and, unlike in the BGV or BFV scheme, is not limited by the circuit depth. Therefore we present the homomorphic computation of SHA-256 and SM3 by means of TFHE.

Fig. 1 The state update function of SHA-256

Gate bootstrapping is computationally demanding when gates are used as the basic computational unit in the encrypted domain. To improve the overall computational performance, minimizing the number of gates consumed by the circuit becomes a crucial consideration. In SHA-256 and SM3, the basic operations mainly consist of functions composed of logic gates and addition modulo $2^{32}$. In the following, we present our circuit optimization.
A short reminder of gate bootstrapping.
For ease of representation, in gate bootstrapping the binary messages 0 and 1 are encoded as $-1/8$ and $1/8$ over the torus, respectively. Given two TLWE ciphertexts $c_1$ and $c_2$, each basic binary homomorphic gate is evaluated by applying one gate bootstrapping to a linear combination of $c_1$ and $c_2$, whereas HomMUX is evaluated using two gate bootstrappings and a public key switching.
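To make the encoding concrete, the following Python sketch models gate bootstrapping on noise-free phases only: there are no ciphertexts, keys or noise, and `bootstrap` simply maps the positive half of the torus to $1/8$ and the negative half to $-1/8$. The linear combinations are the usual ones for the $\pm 1/8$ encoding as we recall them from the TFHE construction; the function names are ours and are not TFHE-rs APIs.

```python
from fractions import Fraction
from itertools import product

EIGHTH = Fraction(1, 8)

def encode(bit):
    """Encode 0 as -1/8 and 1 as 1/8 on the torus."""
    return EIGHTH if bit else -EIGHTH

def decode(phase):
    """A phase in the positive half of the torus decodes to 1, otherwise to 0."""
    return int(0 < phase % 1 < Fraction(1, 2))

def bootstrap(phase):
    """Noise-free stand-in for gate bootstrapping: re-encode the sign of the phase."""
    return encode(decode(phase))

def hom_not(c):       return -c                                   # no bootstrapping
def hom_and(c1, c2):  return bootstrap(-EIGHTH + c1 + c2)
def hom_or(c1, c2):   return bootstrap(EIGHTH + c1 + c2)
def hom_nand(c1, c2): return bootstrap(EIGHTH - c1 - c2)
def hom_xor(c1, c2):  return bootstrap(Fraction(1, 4) + 2 * (c1 + c2))

def hom_mux(c, c1, c2):
    """c ? c1 : c2, using two bootstrappings followed by one addition."""
    u1 = bootstrap(-EIGHTH + c + c1)    # c AND c1
    u2 = bootstrap(-EIGHTH - c + c2)    # (NOT c) AND c2
    return EIGHTH + u1 + u2             # a key switching would follow in TFHE


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        ca, cb = encode(a), encode(b)
        print(a, b, decode(hom_and(ca, cb)), decode(hom_or(ca, cb)), decode(hom_xor(ca, cb)))
```

In a real ciphertext the phase also carries noise, so the margin between the encoded values and the boundaries at $0$ and $1/2$ determines the decryption error probability.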

Trivial gate reduction in SHA-256
In Homomorphic evaluation (2023), the authors proposed optimizations that reduce the number of logic gates in the Ch and Maj functions of the SHA-256 algorithm, thereby reducing the number of gate bootstrappings required. Specifically, for the function $\mathrm{Ch}(x, y, z) = (x \wedge y) \oplus (\neg x \wedge z)$, it is easy to see that the result is $y$ when $x = 1$ and $z$ when $x = 0$, so it behaves like a bitwise multiplexer. In this way, the 4 gates in the Ch function can be replaced with a single HomMUX gate in the encrypted domain. The function $\mathrm{Maj}(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z)$ can, thanks to the boolean distributive law, be simplified as $\mathrm{Maj}(x, y, z) = (x \wedge (y \oplus z)) \oplus (y \wedge z)$. As a result, the number of gates required by Maj is reduced from 5 to 4. While these optimizations do improve the overall evaluation efficiency of the SHA-256 hash, they are still not sufficient for achieving optimal efficiency within the TFHE scheme.
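Both rewrites are plain boolean identities, so they can be checked exhaustively on single bits and spot-checked on 32-bit words. The short Python sketch below does this in the clear; the function names are ours.

```python
from itertools import product

MASK = 0xFFFFFFFF

def ch(x, y, z):
    """Ch as specified: 4 binary gates (AND, NOT, AND, XOR)."""
    return (x & y) ^ (~x & z & MASK)

def mux(x, y, z):
    """Bitwise multiplexer: bit i of the result is y_i if x_i = 1, else z_i."""
    return sum((((y if (x >> i) & 1 else z) >> i) & 1) << i for i in range(32))

def maj_5_gates(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)

def maj_4_gates(x, y, z):
    return (x & (y ^ z)) ^ (y & z)


if __name__ == "__main__":
    for x, y, z in product((0, 1), repeat=3):
        print(x, y, z, ch(x, y, z), mux(x, y, z), maj_5_gates(x, y, z), maj_4_gates(x, y, z))
```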

Further gate reduction of function in SHA-256
In this subsection, we further reduce the number of gates needed to evaluate SHA-256 in the encrypted domain. We observe that the $\sigma_0$, $\sigma_1$, $s_0$ and $s_1$ functions involve different rotations or shifts of a 32-bit word, followed by two consecutive XOR operations. The rotation and shift operations are free thanks to bit-wise encryption, and we next explain how to implement the XOR of 3 inputs using one gate bootstrapping (i.e., one blind rotation). Moreover, the Maj function can also be implemented with only one gate bootstrapping.
Ternary gates were introduced into the TFHE scheme by Matsuoka et al. (2021); they include the XOR3 and 2OF3 gates, where XOR3 is the XOR of 3 inputs and the 2OF3 gate outputs true if at least two inputs are true.
The ternary gates are implemented in the encrypted domain as follows:
$$\mathrm{HomXOR3}(c_1, c_2, c_3) = \mathrm{Bootstrap}\big(-2 \cdot (c_1 + c_2 + c_3)\big), \qquad \mathrm{Hom2OF3}(c_1, c_2, c_3) = \mathrm{Bootstrap}(c_1 + c_2 + c_3).$$
We now give a high-level explanation of their correctness. Recall that the (negacyclic) test polynomial in gate bootstrapping is set so that a phase in the positive half of the torus is mapped to $1/8$ and a phase in the negative half to $-1/8$. For the XOR3 function, the result is equal to the least significant bit of the sum of the 3 inputs. As shown in Fig. 2, when the three plaintext inputs are 0||0||0 or 1||1||0 (in any order), i.e., the phase sum of their encodings is $-3/8$ or $1/8$, the desired result is 0, i.e., $-1/8$ on the torus; when the inputs are 1||0||0 or 1||1||1 (in any order), i.e., the phase sum is $-1/8$ or $3/8$, the desired result is 1, i.e., $1/8$ on the torus. Therefore, to match the test polynomial, we simply multiply the sum by $-2$: the phases $-3/8$ and $1/8$ become $3/4 \equiv -1/4$ and $-1/4$, both in the negative half of the torus, while $-1/8$ and $3/8$ become $1/4$ and $-3/4 \equiv 1/4$, both in the positive half. For 2OF3, the result is the most significant bit of the sum of the three inputs, which exactly matches the setting of the test polynomial.
In this way, the $\sigma_0$, $\sigma_1$, $s_0$, $s_1$ and Maj functions can each be computed homomorphically with just one expensive blind rotation per output bit, while the Ch function needs to be implemented in the encrypted domain using HomMUX at the cost of about two blind rotations. Note that a ternary gate operates on the sum of 3 inputs, which accumulates more noise than a binary gate, so larger parameters are preferable in order not to affect the correctness of decryption. In the experiments, we show that the chosen parameter sets satisfy this requirement.
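The phase argument above can be replayed with exact fractions. The following Python sketch is a noise-free model (no ciphertexts or keys) in which `bootstrap` stands for the sign-extracting gate bootstrapping; it checks that multiplying the phase sum by $-2$ yields XOR3 and that the plain sum yields 2OF3. All names are ours.

```python
from fractions import Fraction
from itertools import product

EIGHTH = Fraction(1, 8)

def encode(bit):
    return EIGHTH if bit else -EIGHTH

def decode(phase):
    # Positive half of the torus, (0, 1/2), decodes to 1; the rest decodes to 0.
    return int(0 < phase % 1 < Fraction(1, 2))

def bootstrap(phase):
    """Noise-free stand-in for one gate bootstrapping."""
    return encode(decode(phase))

def hom_xor3(c1, c2, c3):
    return bootstrap(-2 * (c1 + c2 + c3))

def hom_2of3(c1, c2, c3):
    return bootstrap(c1 + c2 + c3)


if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        c = [encode(b) for b in bits]
        print(bits, "sum =", sum(c),
              "XOR3 =", decode(hom_xor3(*c)), "2OF3 =", decode(hom_2of3(*c)))
```

For XOR3 the noise of the summed inputs is also scaled by 2, which is one reason the text recommends checking the parameter sets.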

Addition modulo $2^{32}$
In addition to the logical functions, arithmetic addition modulo $2^{32}$ is widely used in SHA-256 and is the most time-consuming operation. Integer arithmetic can be implemented directly in second-generation FHE schemes such as BGV and BFV, but the bootstrapping of these schemes currently performs poorly, which makes them unfriendly to deep circuits. As mentioned in the previous section, we choose the efficient TFHE scheme to implement the hash function homomorphically. A natural question is how to efficiently evaluate the required homomorphic addition modulo $2^{32}$ via TFHE.
For the addition of two $n$-bit integers, a naive method is to use a Ripple Carry Adder (RCA), which is constructed by cascading full adders, as illustrated in Fig. 3. An $n$-bit adder requires $n$ full adders. The outputs of each full adder are given by
$$s_i = a_i \oplus b_i \oplus c_i, \qquad c_{i+1} = (a_i \wedge b_i) \vee \big(c_i \wedge (a_i \oplus b_i)\big), \qquad (*)$$
or, equivalently, since the two terms of the carry are never both 1,
$$c_{i+1} = (a_i \wedge b_i) \oplus \big(c_i \wedge (a_i \oplus b_i)\big). \qquad (**)$$
With binary gates, a full adder costs five gate bootstrappings. With ternary gates, the sum bit is $\mathrm{XOR3}(a_i, b_i, c_i)$ and the carry bit is $\mathrm{2OF3}(a_i, b_i, c_i)$, i.e., two gate bootstrappings per full adder. Klemsa and Önen (2022) also apply this to integer addition. Therefore, by using ternary gates we only need $32 \cdot 2 - 1 = 63$ instead of $32 \cdot 5 - 3 = 157$ gate bootstrappings to evaluate addition modulo $2^{32}$.
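The gate count can be reproduced with a plaintext model of the ripple-carry adder in which every XOR3 or 2OF3 call is counted as one gate bootstrapping. The Python sketch below is only an illustration in the clear; it assumes a public carry-in of 0 and skips the last carry, and the names are ours.

```python
MASK = 0xFFFFFFFF

def xor3(a, b, c):
    return a ^ b ^ c

def two_of_3(a, b, c):
    return int(a + b + c >= 2)

def ripple_carry_add(a, b, n=32):
    """Add two n-bit integers modulo 2**n; return (sum, number of ternary gates used)."""
    gates = 0
    carry = 0            # public carry-in, no gate needed to produce it
    result = 0
    for i in range(n):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        result |= xor3(a_i, b_i, carry) << i       # sum bit: one XOR3 gate
        gates += 1
        if i < n - 1:                              # the last carry is discarded
            carry = two_of_3(a_i, b_i, carry)      # carry bit: one 2OF3 gate
            gates += 1
    return result, gates


if __name__ == "__main__":
    total, gates = ripple_carry_add(0xFFFFFFFF, 0x00000002)
    print(hex(total), gates)
```

The critical path is still sequential: each 2OF3 gate needs the carry produced by the previous one, which motivates the parallel adders discussed later.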
Optimization of sequential addition
Note that we have $w_t = \sigma_1(w_{t-2}) + w_{t-7} + \sigma_0(w_{t-15}) + w_{t-16}$ in the message schedule and $T_1 = h + s_1(e) + \mathrm{Ch}(e, f, g) + K_t + w_t$ in the iterative compression. These two functions involve successive addition operations, which can be optimized using the Carry Save Adder (CSA).
The CSA has a very small carry propagation delay when adding multiple numbers. The idea behind it is that the sum of three inputs is reduced to the sum of two inputs: the carry word $C$ and the sum word $S$ are computed separately for each bit, with no carry propagating between bit positions, which makes it faster.
Since the carry save adder is built from full adders, the optimizations introduced previously for the full adder extend to the CSA as well.

Fig. 2 The mapping relationship required for the sum of the three inputs, for XOR3 and 2OF3 respectively

Fig. 3 Bitwise addition of two $n$-bit numbers $a$ and $b$, where $a_i$, $b_i$, $c_i$, $s_i$ are the $i$th bits of $a$, $b$, the carry and the result, respectively. Due to the reduction modulo $2^{32}$, the last carry bit $c_n$ is discarded; in other words, we do not need to compute it
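A plaintext sketch of the carry-save idea is given below: each CSA layer is a bitwise XOR3 for the sum word and a bitwise 2OF3, shifted left by one position, for the carry word, so no carry ripples inside a layer. Only the final two-operand addition needs a carry-propagating adder. The function names are ours, and the word-level operators stand in for 32 independent bit-level gates.

```python
MASK = 0xFFFFFFFF

def csa(x, y, z):
    """Reduce three 32-bit words to a (sum, carry) pair with the same total modulo 2**32."""
    s = x ^ y ^ z                                        # bitwise XOR3
    c = (((x & y) | (x & z) | (y & z)) << 1) & MASK      # bitwise 2OF3, moved up one bit
    return s, c

def add_many(operands):
    """Add several words modulo 2**32 using CSA layers and one final addition."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = csa(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, c]
    return sum(ops) & MASK     # stands for the single carry-propagating adder


if __name__ == "__main__":
    # Five operands, as in T1 = h + s1(e) + Ch(e, f, g) + K_t + w_t.
    words = [0xDEADBEEF, 0x01234567, 0x89ABCDEF, 0xFFFFFFFF, 0x0F0F0F0F]
    print(hex(add_many(words)), hex(sum(words) & MASK))
```

With five operands, three CSA layers replace three of the four carry-propagating additions.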

Parallel implementation
The disadvantage of the RCA is that the carry-in bit of each full adder is derived from the carry-out bit of the previous full adder in the cascade, so the critical path of the adder circuit grows with the bit length of the input. The Carry LookAhead Adder (CLA) reduces the depth of the critical path through parallel computation. The CLA computes one or more carry bits before the sum, which reduces the time spent waiting for carry bits, so it appears well suited to the BGV scheme. Mella and Susella (2013) were the first to use the CLA for homomorphic computation of SHA-256 based on the BGV scheme; for 32-bit addition they estimated a multiplicative depth of 10. Togan et al. (2015) further reduced the multiplicative depth of the CLA for 32-bit addition from 10 to 5. The idea is to use Equation (**) instead of Equation (*) to compute the carry bit, which eliminates the evaluation of the OR function. Specifically, let $P_i = a_i \oplus b_i$ and $G_i = a_i \wedge b_i$; then $P_i$ and $G_i$ are independent of the carry bits and can be precomputed in parallel when more CPUs are available. In this way, the carry bits of the 32-bit adder can be rewritten as
$$C_{i+1} = G_i \oplus (P_i \wedge C_i) = G_i \oplus (P_i \wedge G_{i-1}) \oplus (P_i \wedge P_{i-1} \wedge G_{i-2}) \oplus \cdots \oplus (P_i \wedge P_{i-1} \wedge \cdots \wedge P_0 \wedge C_0).$$
Thus, the result of the 32-bit adder is $S_i = P_i \oplus C_i$, for $0 \le i \le 31$.

Table 1 The truth table of the function $FF_j$, if $16 \le j < 64$

Mella and Susella (2013) and Togan et al. (2015) exploited the batch packing capability of the BGV scheme. However, it is hard to give a practical time for homomorphic computation of SHA-256 and the SPECK cipher based on the BGV scheme, because the parameters of the leveled BGV scheme depend on the multiplicative depth of the circuit and the bootstrapping is not efficient enough.
In the context of the TFHE scheme, we similarly use Equation (**) rather than Equation (*), as in Togan et al. (2015). The reason is that successive XORs leave considerable room for optimization: for every carry bit $C_i$ with $2 \le i \le 31$, we can still use the HomXOR3 gate to reduce the number of gates required in the encrypted domain.
Unlike the BGV scheme, the TFHE scheme does not support batch processing. Hence a natural solution for the TFHE scheme is to parallelize over multiple CPUs, which is reasonable for a cloud server with a large number of CPUs. Several advanced Parallel Prefix Adders (Payal et al. 2015) for CLA structures, such as the Brent-Kung adder, the Kogge-Stone adder and the Ladner-Fischer adder, have been proposed for high-performance arithmetic structures in industry. Homomorphic evaluation (2023) uses the Brent-Kung and the Ladner-Fischer adders for optimization. See "Appendix A" for a more detailed description. For a fair experimental comparison, we also exploit these two optimization techniques.
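The XOR form of the carry recurrence, and its use inside a parallel-prefix network, can be illustrated in the clear. The Python sketch below evaluates $C_{i+1} = G_i \oplus (P_i \wedge C_i)$ bit by bit, and then with a Kogge-Stone-style prefix network on whole words. The Kogge-Stone layout is chosen here only because it is the shortest to write down; it is not the Brent-Kung or Ladner-Fischer network used in the experiments, and all names are ours.

```python
MASK = 0xFFFFFFFF

def cla_add_serial(a, b):
    """Evaluate C_{i+1} = G_i XOR (P_i AND C_i) and S_i = P_i XOR C_i with C_0 = 0."""
    p, g = a ^ b, a & b
    carry, result = 0, 0
    for i in range(32):
        p_i, g_i = (p >> i) & 1, (g >> i) & 1
        result |= (p_i ^ carry) << i
        carry = g_i ^ (p_i & carry)     # XOR replaces OR: G_i = 1 forces P_i = 0
    return result

def prefix_add(a, b):
    """Same addition with a Kogge-Stone-style prefix network (log2(32) = 5 levels)."""
    p, g = a ^ b, a & b
    group_g, group_p = g, p
    d = 1
    while d < 32:
        # Combine each position with the group ending d positions below it.
        group_g = (group_g ^ (group_p & (group_g << d))) & MASK
        group_p = (group_p & (group_p << d)) & MASK
        d *= 2
    carries = (group_g << 1) & MASK     # carry into bit i is the group generate of bit i-1
    return p ^ carries


if __name__ == "__main__":
    print(hex(cla_add_serial(0xFFFFFFFF, 1)), hex(prefix_add(0x7FFFFFFF, 1)))
```

Within one level, all 32 positions are independent, which is what makes this structure attractive when many CPU cores are available for gate bootstrapping.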

Analysis of functions in SM3
In this subsection, we analyze the homomorphic evaluation of the hash algorithm SM3. The interesting observation is that, for $16 \le j < 64$, the result of the $GG_j$ function is $z$ if $x = 0$ and $y$ otherwise, which is equivalent to the Ch function, i.e., the mux gate. For the $FF_j$ function, it can be seen from Table 1 that it implements the same function as Maj for $16 \le j < 64$. Thus, the $FF$, $GG$, $P_0$ and $P_1$ functions can be implemented using only one bootstrapping. For addition modulo $2^{32}$, SM3 uses fewer consecutive modular additions than SHA-256 in the iterative compression function, which gives it a lower evaluation latency. For the evaluation of SM3 we use the methods described in the previous subsections; the implementation results are given in the next section.
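The two equivalences can be confirmed by enumerating the eight input combinations. The Python sketch below works on single bits in the clear and prints the truth table; the function names are ours, and the definitions of $FF_j$ and $GG_j$ for $16 \le j < 64$ follow the SM3 standard.

```python
from itertools import product

def ff_high(x, y, z):
    """FF_j for 16 <= j < 64."""
    return (x & y) | (x & z) | (y & z)

def gg_high(x, y, z):
    """GG_j for 16 <= j < 64 (NOT x written as 1 - x for single bits)."""
    return (x & y) | ((1 - x) & z)

def two_of_three(x, y, z):
    return int(x + y + z >= 2)

def mux(x, y, z):
    return y if x else z


if __name__ == "__main__":
    print("x y z FF GG")
    for x, y, z in product((0, 1), repeat=3):
        print(x, y, z, ff_high(x, y, z), gg_high(x, y, z))
```

For $0 \le j < 16$, both $FF_j$ and $GG_j$ are the three-input XOR, which maps directly to the XOR3 gate.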

Implementation and experimental results
In this section we provide a detailed explanation of our implementation for evaluating the hash functions SHA-256 and SM3 based on the TFHE scheme. To the best of our knowledge, the TFHE-rs library is the fastest public implementation of the TFHE scheme among the homomorphic cryptographic libraries (https://www.zama.ai/post/announcing-tfhe-rs). Therefore, we implement our evaluation method in the TFHE-rs library. All tests were conducted on a 12th Gen Intel(R) Core(TM) i5-12500 × 12 with 15.3 GB RAM, running the Ubuntu 20.04 operating system.

Experimental parameter setting
We now present our parameter settings for the TFHE scheme. We use two parameter sets from the TFHE-rs library, shown in Table 2, both of which provide at least 128 bits of security. "DEFAULT_PARAMS" guarantees an error probability bound of $2^{-40}$, and "TFHE_LIB_PARAMS" provides a lower decryption error rate of $2^{-165}$; the two sets suit different scenario requirements.

Performance result
In this subsection, we present a comparison of our experimental results. A straightforward implementation of SHA-256 based on the TFHE-rs library is publicly available from Homomorphic evaluation (2023). For a fair experimental comparison, we run their code on our machine. Note that, in addition to bit-wise encryption, a TFHE-rs implementation based on two-bit encryption is available on GitHub. However, that implementation takes up to 23 min because this encryption is not suitable for rotation operations, resulting in huge latency even when multiple CPUs are used. Therefore, we did not consider further optimization of that implementation.
As in their experiments, we use Rayon, a multithreading crate for the Rust programming language, to parallelize the implementation when CPUs are available. Specifically, we control the number of CPUs used by calling the interface rayon::ThreadPoolBuilder::new().num_threads().build_global().unwrap(). We present the comparison of the homomorphic evaluation of SHA-256 and SM3 based on the parameter sets "DEFAULT_PARAMS" and "TFHE_LIB_PARAMS" for different numbers of CPU cores in Figs. 4 and 5, respectively. For more detailed data, please refer to Table 3 in "Appendix B".
Experimental results show that for the SHA-256 and SM3 algorithms we achieve an efficiency improvement of about 35%-50% over the state-of-the-art work, reaching up to 50% when only one CPU is used. We observed that the Brent-Kung adder outperforms the Ladner-Fischer adder, particularly when fewer CPUs are used. The overall SM3 evaluation latency is lower than that of SHA-256 because SM3 uses fewer additions. When using the "TFHE_LIB_PARAMS" parameter set, the evaluation latency tends to be higher. However, this parameter set offers a lower decryption error rate and therefore higher reliability of the evaluation results.

Conclusion
In this paper, we explore the application of ternary gates to the various logic functions required by hash functions and further reduce the number of gate bootstrappings required by SHA-256 and SM3 in the context of TFHE, achieving an improvement in efficiency. This advance holds significant potential for various applications, including data integrity checking and private database retrieval, where hash functions play a vital role.
Further optimization directions for hash function evaluation include using the fully homomorphic encryption scheme FINAL (Bonte et al. 2022), which is built on NTRU and achieves faster gate bootstrapping than TFHE. We believe this can directly reduce the overall runtime latency. Lower latency can also be obtained when a large number of cores is available, for example on a GPU.

Fig. 4 Comparison of implementations of SHA-256 and SM3 based on parameter set "DEFAULT_PARAMS" under different CPU cores

Table 2
Parameter sets of the TFHE scheme

Table 3
The latency (s) of homomorphic evaluation of SHA-256 and SM3 based on different parameter sets using different numbers of CPUs. The numbers in bold indicate the results of our implementation, to distinguish them from previous results