DOI: 10.1145/301618.301661

A fast Fourier transform compiler

Published: 01 May 1999

ABSTRACT

The FFTW library for computing the discrete Fourier transform (DFT) has gained wide acceptance in both academia and industry because it provides excellent performance on a variety of machines, competitive with or even faster than equivalent vendor-supplied libraries. In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft "discovered" algorithms that were previously unknown, and it was able to reduce the arithmetic complexity of some existing algorithms. This paper describes the internals of this special-purpose compiler in some detail, and it argues that a specialized compiler is a valuable tool.
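To make the idea of a program that writes DFT programs concrete, the following is a minimal sketch of a toy generator. It is not genfft's algorithm: genfft is written in Objective Caml and performs far more sophisticated simplification, whereas this sketch is in Python, starts from the naive O(n²) DFT definition, and only drops zero coefficients and folds multiplications by ±1. All names (`dft_terms`, `format_sum`, `gen_dft_c`, `evaluate`, `naive_dft`) and the emitted C signature are assumptions made for illustration.

```python
import cmath
import math


def dft_terms(n):
    """For each output k of a size-n forward DFT, list the (coefficient,
    input-name) terms of its real and imaginary parts, dropping zeros."""
    outputs = []
    for k in range(n):
        re_terms, im_terms = [], []
        for j in range(n):
            angle = 2.0 * math.pi * ((j * k) % n) / n
            # Round so that values like cos(pi/2) become exactly 0.
            c = round(math.cos(angle), 15)
            s = round(math.sin(angle), 15)
            # (a + ib)(c - is) = (a*c + b*s) + i(b*c - a*s)
            for coef, name, dest in ((c, f"ri[{j}]", re_terms),
                                     (s, f"ii[{j}]", re_terms),
                                     (c, f"ii[{j}]", im_terms),
                                     (-s, f"ri[{j}]", im_terms)):
                if coef != 0.0:
                    dest.append((coef, name))
        outputs.append((re_terms, im_terms))
    return outputs


def format_sum(terms):
    """Render a list of terms as a C expression, folding +1 and -1."""
    if not terms:
        return "0.0"
    parts = []
    for coef, name in terms:
        if coef == 1.0:
            parts.append(f"+ {name}")
        elif coef == -1.0:
            parts.append(f"- {name}")
        else:
            sign = "+" if coef > 0 else "-"
            parts.append(f"{sign} {abs(coef)!r} * {name}")
    expr = " ".join(parts)
    return expr[2:] if expr.startswith("+ ") else expr


def gen_dft_c(n):
    """Emit straight-line C source for a size-n DFT (hypothetical signature)."""
    lines = [f"void dft_{n}(const double *ri, const double *ii, "
             "double *ro, double *io)", "{"]
    for k, (re_terms, im_terms) in enumerate(dft_terms(n)):
        lines.append(f"    ro[{k}] = {format_sum(re_terms)};")
        lines.append(f"    io[{k}] = {format_sum(im_terms)};")
    lines.append("}")
    return "\n".join(lines)


def evaluate(n, xs):
    """Evaluate the generated term lists directly in Python, as a check."""
    env = {}
    for j, x in enumerate(xs):
        env[f"ri[{j}]"] = x.real
        env[f"ii[{j}]"] = x.imag
    return [complex(sum(c * env[v] for c, v in re_t),
                    sum(c * env[v] for c, v in im_t))
            for re_t, im_t in dft_terms(n)]


def naive_dft(xs):
    """Reference DFT computed straight from the definition."""
    n = len(xs)
    return [sum(x * cmath.exp(-2j * cmath.pi * j * k / n)
                for j, x in enumerate(xs)) for k in range(n)]
```

Calling `print(gen_dft_c(2))` emits a two-point routine whose outputs are plain sums and differences of the inputs, with no multiplications left. For real input data every `ii[j]` is zero, so half of the terms could be dropped in the same way, which hints at why specializing for real inputs pays off.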


Published in

PLDI '99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
May 1999
304 pages
ISBN: 1581130945
DOI: 10.1145/301618

Copyright © 1999 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rates: PLDI '99 accepted 26 of 130 submissions (20%). Overall acceptance rate: 406 of 2,067 submissions (20%).
