Skip to main content
Log in

MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs

  • Regular Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Fourier methods have revolutionized many fields of science and engineering, such as astronomy, medical imaging, seismology and spectroscopy, and the fast Fourier transform (FFT) is a computationally efficient method of generating a Fourier transform. The emerging class of high performance computing architectures, such as GPU, seeks to achieve much higher performance and efficiency by exposing a hierarchy of distinct memories to software. However, the complexity of GPU programming poses a significant challenge to developers. In this paper, we propose an automatic performance tuning framework for FFT on various OpenCL GPUs, and implement a high performance library named MPFFT based on this framework. For power-of-two length FFTs, our library substantially outperforms the clAmdFft library on AMD GPUs and achieves comparable performance as the CUFFT library on NVIDIA GPUs. Furthermore, our library also supports non-power-of-two size. For 3D non-power-of-two FFTs, our library delivers 1.5x to 28x faster than FFTW with 4 threads and 20.01x average speedup over CUFFT 4.0 on Tesla C2050.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Duhamel P, Vetterli M. Fast fourier transforms: A tutorial review and a state of the art. Signal Processing, 1990, 9(14): 259-299.

    Article  MathSciNet  Google Scholar 

  2. Govindaraju N K, Lloyd B, Dotsenko Y, Smith B, Manferdelli J. High performance discrete Fourier transforms on graphics processors. In Proc. SC, Nov. 2008, Article No.2.

  3. Nukada A, Matsuoka S. Auto-tuning 3-D FFT library for CUDA GPUs. In Proc. SC, Nov. 2009, Article No.30.

  4. Dotsenko Y, Baghsorkhi S S, Lloyd B, Govindaraju N K. Auto-tuning of fast Fourier transform on graphics processors. In Proc PPoPP, Feb. 2011, pp.257-266.

  5. Gu L, Li X M, Siegel J. An empirically tuned 2D and 3D FFT library on CUDA GPU. In Proc. the 24th ICS, June 2010, pp.305-314.

  6. Gaster B, Howes L, Kaeli D R, Mistry P, Schaa D. Heterogeneous Computing with OpenCL. San Fransisco, USA: Morgan Kaufmann, 2011.

    Google Scholar 

  7. Munshi A, Gaster B, Mattson T G, Fung J, Ginsburg D. OpenCL Programming Guide. Boston, USA: Addison-Wesley Professional, 2011.

    Google Scholar 

  8. Zhang E Z, Jiang Y L, Guo Z Y, Shen X P. Streamlining GPU applications on the fly: Thread divergence elimination through runtime thread-data remapping. In Proc. the 24th ICS, June 2010, pp.115-126.

  9. Yang Y, Xiang P, Kong J F, Zhou H Y. A GPGPU compiler for memory optimization and parallelism management. In Proc. PLDI, June 2010, pp.86-97.

  10. Cooley J W, Tukey J W. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 1965, 19: 297-301.

    Article  MathSciNet  MATH  Google Scholar 

  11. Van Loan C. Computational Frameworks for the Fast Fourier Transform. Philadelphia, USA: SIAM, 1992.

    Book  MATH  Google Scholar 

  12. Johnson J, Johnson R W, Rodriguez D, Tolimieri R. A methodology for designing, modifying, and implementing Fourier transform algorithms on various architectures. Circuits, Systems and Signal Processing, 1990, 9(4): 449-500.

    Article  MathSciNet  MATH  Google Scholar 

  13. Franchetti F, PÄuschel M, Voronenko Y, Chellappa S, Moura J M F. Discrete Fourier transform on multicore. IEEE Signal Processing Magazine, 2009, 26(6): 90-102.

    Article  Google Scholar 

  14. Tolimieri R, An M, Lu C. Algorithms for Discrete Fourier Transforms and Convolution. Berlin: Springer-Verlag, 1989.

    Google Scholar 

  15. NVIDIA Corporation. NVIDIA CUDA compute unified device architecture − Programming guide version 4.2. 2008, http://developer.nvidia.com/cuda/nvidia-gpu-computing-documentation, Sept. 2012.

  16. Ji F, Ma X S. Using shared memory to accelerate MapReduce on graphics processing units. In Proc. IPDPS, May 2011, pp.805-816.

  17. Jiao Y, Lin H, Balaji P, Feng W. Power and performance characterization of computational kernels on the GPU. In Proc. GreenCom-CPSCOM, Dec. 2010, pp.221-228.

  18. Baghsorkhi S S, Delahaye M, Patel S J, Gropp W D, Hwu W W. An adaptive performance modeling tool for GPU architectures. In Proc. the 15th PPoPP, May 2010, pp.105-114.

  19. Schwarztrauber P N. Multiprocessor FFTs. Parallel Computing, 1987, 5: 197-210.

    Article  Google Scholar 

  20. Frigo M, Johnson S G. The design and implementation of FFTW3. In Proceedings of the IEEE, 2005, 93(2): 216-231.

  21. Frigo M. A fast Fourier transform compiler. In Proc. PLDI, May 1999, pp.169-180.

  22. Mesmay F, Franchetti F, Voronenko Y. Encyclopedia of Parallel Computing. Berlin: Springer, 2011.

    Google Scholar 

  23. de Mesmay F, Voronenko Y, PÄuschel M. Offline library adaptation using automatically generated heuristics. In Proc. IPDPS, Apr. 2010, pp.1-10.

  24. Kirk D B, Hwu W W. Programming Massively Parallel Processors: A Hands-on Approach. San Fransisco, USA: Morgan Kaufmann, 2010.

    Google Scholar 

  25. Purnomo B, Rubin N, Houston M. ATI stream profiler: A tool to optimize an OpenCL kernel on ATI radeon GPUs. In Proc. SIGGRAPH, July 2011, pp.26-30.

  26. Mirkovic D, Johnsson S L. Automatic performance tuning in the UHFFT library. In Proc. ICCS, May 2001, pp.71-80.

  27. Ali A, Johnsson L, Mirkovic D. Empirical auto-tuning code generator for FFT and trigonometric transforms. In Proc. ODES, Mar. 2007.

  28. Mirkovic D, Mahasoom R, Johnsson L. An adaptive software library for fast Fourier transforms. In Proc. the 14th ICS, May 2000, pp.215-224.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yun-Quan Zhang.

Additional information

This work is supported in partial by the National Natural Science Foundation of China under Grant Nos. 61133005, 61272136, 61100073, 61100066, the National High Technology Research and Development 863 Program of China under Grant Nos. 2012AA010902, 2012AA010903, and the Chinese Academy of Sciences Special Grant for Postgraduate Research, Innovation and Practice.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(DOC 28 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Li, Y., Zhang, YQ., Liu, YQ. et al. MPFFT: An Auto-Tuning FFT Library for OpenCL GPUs. J. Comput. Sci. Technol. 28, 90–105 (2013). https://doi.org/10.1007/s11390-013-1314-8

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-013-1314-8

Keywords

Navigation