ABSTRACT
The FFTW library for computing the discrete Fourier transform (DFT) has gained wide acceptance in both academia and industry because it provides excellent performance on a variety of machines (competitive with, or even faster than, equivalent vendor-supplied libraries). In FFTW, most of the performance-critical code was generated automatically by a special-purpose compiler, called genfft, that outputs C code. Written in Objective Caml, genfft can produce DFT programs for any input length, and it can specialize the DFT program for the common case where the input data are real instead of complex. Unexpectedly, genfft "discovered" algorithms that were previously unknown, and it reduced the arithmetic complexity of some existing algorithms. This paper describes the internals of this special-purpose compiler in some detail and argues that a specialized compiler is a valuable tool.
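The core idea of generating a specialized, straight-line DFT program for one fixed input length, with an optional real-input specialization, can be illustrated with a toy generator. What follows is a minimal Python sketch under stated assumptions, not genfft: genfft itself is written in Objective Caml, starts from fast algorithms such as Cooley-Tukey, and simplifies far more aggressively. This sketch only expands the naive O(n²) DFT for a fixed `n`, folds twiddle factors that are exactly 0 or ±1, and drops the imaginary inputs when the data are known to be real. The function names (`dft_terms`, `emit_sum`, `emit_c`, `count_muls`) and the emitted C signature are hypothetical.

```python
import math

TOL = 1e-12  # coefficients this close to 0, 1 or -1 are treated as exact


def snap(c):
    """Replace numerically exact twiddle values (0, 1, -1) by exact floats."""
    for exact in (0.0, 1.0, -1.0):
        if abs(c - exact) < TOL:
            return exact
    return c


def dft_terms(n, real_input=False):
    """Expand the naive size-n DFT into (coef, (array, index)) terms.

    Returns one (re_terms, im_terms) pair per output y[j]; zero coefficients
    are dropped, so they never reach the generated code.
    """
    outputs = []
    for j in range(n):
        re_terms, im_terms = [], []
        for k in range(n):
            angle = -2.0 * math.pi * j * k / n  # twiddle w = c + i*s = exp(i*angle)
            c, s = snap(math.cos(angle)), snap(math.sin(angle))
            # (xr + i*xi) * (c + i*s) = (c*xr - s*xi) + i*(s*xr + c*xi)
            re_cand = [(c, ("xr", k)), (-s, ("xi", k))]
            im_cand = [(s, ("xr", k)), (c, ("xi", k))]
            if real_input:  # xi is known to be zero: drop its terms entirely
                re_cand, im_cand = re_cand[:1], im_cand[:1]
            re_terms += [t for t in re_cand if t[0] != 0.0]
            im_terms += [t for t in im_cand if t[0] != 0.0]
        outputs.append((re_terms, im_terms))
    return outputs


def emit_sum(terms):
    """Render a list of terms as a C expression; multiplications by +-1 vanish."""
    if not terms:
        return "0.0"
    out = ""
    for coef, (arr, k) in terms:
        var = f"{arr}[{k}]"
        body = var if abs(coef) == 1.0 else f"{abs(coef)!r} * {var}"
        sign = "-" if coef < 0 else "+"
        if not out:
            out = body if sign == "+" else "-" + body
        else:
            out += f" {sign} {body}"
    return out


def emit_c(n, real_input=False):
    """Emit a straight-line C function computing the size-n DFT."""
    inputs = "const double *xr" if real_input else "const double *xr, const double *xi"
    lines = [f"void dft_{n}({inputs}, double *yr, double *yi)", "{"]
    for j, (re_terms, im_terms) in enumerate(dft_terms(n, real_input)):
        lines.append(f"    yr[{j}] = {emit_sum(re_terms)};")
        lines.append(f"    yi[{j}] = {emit_sum(im_terms)};")
    lines.append("}")
    return "\n".join(lines)


def count_muls(n, real_input=False):
    """Count the real multiplications left after constant folding."""
    return sum(1 for re_t, im_t in dft_terms(n, real_input)
               for coef, _ in re_t + im_t if abs(coef) != 1.0)


if __name__ == "__main__":
    print(emit_c(2))
    print(count_muls(8), count_muls(8, real_input=True))
```

For `n = 4` every twiddle factor folds to 0 or ±1, so the generated size-4 function contains no multiplications, and `real_input=True` removes exactly half of the terms. Because the sketch performs no FFT factorization and no common-subexpression elimination, its operation count still grows as O(n²); it only shows the specialization step, not how genfft reaches its low arithmetic counts.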