
Fast Falcon Signature Generation and Verification Using ARMv8 NEON Instructions

  • Conference paper
Progress in Cryptology - AFRICACRYPT 2023 (AFRICACRYPT 2023)

Abstract

We present speed records for Falcon signature generation and verification on the ARMv8-A architecture. Our implementations are benchmarked on the Apple M1 ‘Firestorm’, Raspberry Pi 4 Cortex-A72, and Jetson AGX Xavier. Our optimized signature generation is \(2\times \) slower, but our signature verification is 3–3.9\(\times \) faster, than the state-of-the-art CRYSTALS-Dilithium implementation on the same platforms. Faster signature verification may be particularly useful for the client side on constrained devices. Our Falcon implementation outperforms the previous work targeting Jetson AGX Xavier by factors of \(1.48\times \) for signing in both falcon512 and falcon1024, \(1.52\times \) for verifying in falcon512, and \(1.70\times \) for verifying in falcon1024. We improve Falcon signature generation by supporting a larger subset of possible parameter values for FFT-related functions and by applying our compressed twiddle-factor table to reduce memory usage. We also demonstrate that the recently proposed signature scheme Hawk, which shares optimized functionality with Falcon, has \(3.3\times \) faster signature generation and 1.6–1.9\(\times \) slower signature verification when implemented on the same ARMv8 processors as Falcon.
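The abstract does not say how the twiddle-factor table is compressed. As a rough illustration only, the sketch below shows one generic symmetry-based approach for a plain complex FFT: store the first quarter of the roots of unity and derive the second quarter from the identity \(w_{k+n/4} = -i\,w_k\). The function names `build_quarter_table` and `twiddle` are hypothetical, and this is not claimed to be the authors' scheme or Falcon's table layout.

```python
import cmath

def build_quarter_table(n):
    """Store only w_k = exp(-2*pi*i*k/n) for k in [0, n/4)."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 4)]

def twiddle(table, n, k):
    """Return w_k for 0 <= k < n/2 using the quarter-size table.

    For k >= n/4 we use w_k = -i * w_(k - n/4), and
    -i * (a + b*i) = b - a*i, so only a swap and a sign flip are needed.
    """
    q = n // 4
    if k < q:
        return table[k]
    w = table[k - q]
    return complex(w.imag, -w.real)

n = 16
table = build_quarter_table(n)
# The n/2 twiddles a radix-2 FFT needs are served from n/4 stored entries.
recovered = [twiddle(table, n, k) for k in range(n // 2)]
```

The trade-off in a sketch like this is a swap and a negation per derived twiddle in exchange for halving the stored table.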


Notes

  1. https://github.com/diaxen/fft-garden

  2. https://godbolt.org/z/esP78P33b

  3. https://godbolt.org/z/613vvzh3Y

  4. https://godbolt.org/z/zPr94YjYr

  5. https://github.com/ludopulles/hawk-sign/

  6. https://github.com/GMUCERG/xmssfs

  7. https://github.com/sphincs/sphincsplus

  8. mitigations=off https://make-linux-fast-again.com/

  9. https://github.com/mupq/pqax/tree/main/enable_ccr

  10. https://github.com/dougallj

  11. https://github.com/GMUCERG/PQC_NEON/blob/main/neon/kyber/m1cycles.c


Acknowledgments

This work has been partially supported by the National Science Foundation under Grant No.: CNS-1801512 and by the US Department of Commerce (NIST) under Grant No.: 70NANB18H218.

Author information


Corresponding author

Correspondence to Duc Tri Nguyen.


A Visualizing Complex Point Multiplication


Fig. 2. Single-pair complex multiplication using fmul and fcmla. Real and imaginary parts are stored adjacently.
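Since the figure itself is not reproduced here, the following is a minimal scalar sketch of how such a sequence can work, assuming a by-element `fmul` followed by one `fcmla` with a 90-degree rotation (FCMLA belongs to the Armv8.3-A complex-number extension). The Python helpers `fmul_lane`, `fcmla_rot90`, and `cmul_interleaved` are hypothetical names that only emulate the lane arithmetic; the exact instruction ordering in the paper's figure may differ.

```python
def fmul_lane(v, scalar):
    """Emulate by-element FMUL: every lane of v times one scalar lane."""
    return [x * scalar for x in v]

def fcmla_rot90(acc, n, m):
    """Emulate FCMLA with rotation #90 on one interleaved [re, im] pair.

    acc.re -= n.im * m.im
    acc.im += n.im * m.re
    (Hardware fuses each multiply-add; this emulation rounds twice.)
    """
    return [acc[0] - n[1] * m[1], acc[1] + n[1] * m[0]]

def cmul_interleaved(a, b):
    """Multiply complex a by complex b, each stored as [re, im]."""
    # Step 1 (fmul):  t = [a.re * b.re, a.re * b.im]
    t = fmul_lane(b, a[0])
    # Step 2 (fcmla #90): t = [t.re - a.im * b.im, t.im + a.im * b.re]
    return fcmla_rot90(t, a, b)

# (1 + 2i) * (3 + 4i) = -5 + 10i
product = cmul_interleaved([1.0, 2.0], [3.0, 4.0])
```

With the interleaved layout, a single 128-bit register of doubles holds exactly one complex value, which is why this variant handles one pair per instruction.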

Fig. 3. Two-pair complex multiplication using fmul, fmls, and fmla. Real and imaginary parts are stored separately.
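As a hedged illustration of the split layout named in the caption, the sketch below emulates two-lane vectors in plain Python: one vector holds two real parts, another holds two imaginary parts, and the product is assembled with one `fmul` plus one `fmls` for the real parts and one `fmul` plus one `fmla` for the imaginary parts. The helper names and the instruction ordering are assumptions, not taken from the figure.

```python
def fmul(x, y):
    """Lane-wise multiply."""
    return [a * b for a, b in zip(x, y)]

def fmla(acc, x, y):
    """Lane-wise multiply-add: acc + x * y (fused on hardware, not here)."""
    return [c + a * b for c, a, b in zip(acc, x, y)]

def fmls(acc, x, y):
    """Lane-wise multiply-subtract: acc - x * y (fused on hardware, not here)."""
    return [c - a * b for c, a, b in zip(acc, x, y)]

def cmul_split(a_re, a_im, b_re, b_im):
    """Multiply two complex pairs at once; each argument holds two lanes."""
    re = fmul(a_re, b_re)        # a.re * b.re
    re = fmls(re, a_im, b_im)    # ... - a.im * b.im
    im = fmul(a_re, b_im)        # a.re * b.im
    im = fmla(im, a_im, b_re)    # ... + a.im * b.re
    return re, im

# Lane 0: (1 + 2i) * (3 + 4i) = -5 + 10i
# Lane 1: (0 + 1i) * (0 + 1i) = -1 + 0i
re, im = cmul_split([1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0])
```

Storing real and imaginary parts in separate registers lets every lane do useful work, so four instructions produce two complex products, and the sequence needs only baseline ARMv8-A floating-point instructions.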


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Nguyen, D.T., Gaj, K. (2023). Fast Falcon Signature Generation and Verification Using ARMv8 NEON Instructions. In: El Mrabet, N., De Feo, L., Duquesne, S. (eds) Progress in Cryptology - AFRICACRYPT 2023. AFRICACRYPT 2023. Lecture Notes in Computer Science, vol 14064. Springer, Cham. https://doi.org/10.1007/978-3-031-37679-5_18


  • DOI: https://doi.org/10.1007/978-3-031-37679-5_18


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-37678-8

  • Online ISBN: 978-3-031-37679-5

  • eBook Packages: Computer Science, Computer Science (R0)
