ABSTRACT
We report on a multi-threaded implementation of Fast Fourier Transforms (FFTs) over generalized Fermat prime fields. This work extends a previous study, carried out on graphics processing units, to multi-core processors. In this new setting, where hardware resources can be controlled less finely than on GPUs, we compensate by using FFT in turn to support multiplication in those fields. Our parallel implementation achieves speedup factors over the serial implementation, for the overall application, of up to 6.9x on a 6-core, 12-thread node and 4.3x on a 4-core, 8-thread node. These gains come from the low memory footprint and the sharp control of arithmetic instructions of our implementation of generalized Fermat prime fields.
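The following Python sketch illustrates the field property that generalized Fermat prime FFTs typically exploit. It is a toy illustration, not the authors' implementation. It assumes the usual definition p = r^k + 1 with k a power of 2, so that r^k ≡ -1 (mod p) and r is a primitive 2k-th root of unity. If an element is written in radix r as k digits, multiplying it by a power of r reduces to a negacyclic shift of those digits. A DFT of length 2k with root ω = r therefore needs only digit shifts, additions and subtractions, plus one final normalization per output. The toy parameters r = 16 and k = 4, which give the Fermat prime p = 65537, and all function names are hypothetical. Practical implementations instead take r on the order of a machine word, so that p fits on k machine words.

```python
# Toy generalized Fermat prime p = R**K + 1 (hypothetical parameters: R = 16, K = 4).
R = 16
K = 4
P = R**K + 1  # 65537


def to_digits(x):
    """Radix-R expansion of 0 <= x <= R**K as K digits, least significant first.

    The top digit absorbs any remainder, so it equals R only when x == R**K.
    """
    digits = []
    for _ in range(K - 1):
        digits.append(x % R)
        x //= R
    digits.append(x)
    return digits


def from_digits(digits):
    """Evaluate sum(d_i * R**i) and reduce modulo P (digits may be negative)."""
    return sum(d * R**i for i, d in enumerate(digits)) % P


def mul_by_power_of_r(digits, s):
    """Multiply by R**s using R**K == -1 (mod P): a negacyclic digit shift."""
    s %= 2 * K  # R has order 2K modulo P
    sign = 1
    if s >= K:  # R**s == -R**(s - K)
        s -= K
        sign = -1
    out = [0] * K
    for i, d in enumerate(digits):
        j = i + s
        if j < K:
            out[j] = sign * d
        else:  # wrapping past R**K picks up a factor of -1
            out[j - K] = -sign * d
    return out


def dft_2k(a):
    """Naive length-2K DFT with omega = R, using only shifts and digit additions."""
    n = 2 * K
    result = []
    for j in range(n):
        acc = [0] * K  # digit-wise accumulator, no multiplications
        for i in range(n):
            term = mul_by_power_of_r(to_digits(a[i]), i * j)
            acc = [x + y for x, y in zip(acc, term)]
        result.append(from_digits(acc))  # single normalization per output
    return result


if __name__ == "__main__":
    a = list(range(1, 2 * K + 1))
    print(dft_2k(a))
```

In a larger FFT over such a field, a transform like `dft_2k` would typically serve as a base case inside a Cooley-Tukey style decomposition, with the remaining twiddle-factor multiplications requiring genuine field multiplication. That outer structure and its multi-threaded scheduling are not shown here.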
Index Terms
- Big Prime Field FFT on Multi-core Processors