This manual documents version 2.1.5 of FFTW, the *Fastest
Fourier Transform in the West*. FFTW is a comprehensive collection of
fast C routines for computing the discrete Fourier transform (DFT) in
one or more dimensions, of both real and complex data, and of arbitrary
input size. FFTW also includes parallel transforms for both shared- and
distributed-memory systems. We assume herein that the reader is already
familiar with the properties and uses of the DFT that are relevant to
her application. Otherwise, see e.g. The Fast Fourier Transform
by E. O. Brigham (Prentice-Hall, Englewood Cliffs, NJ, 1974).
Our web page also has links to
FFT-related information online.

FFTW is usually faster (and sometimes much faster) than all other
freely-available Fourier transform programs found on the Net. For
transforms whose size is a power of two, it compares favorably with the
FFT codes in Sun's Performance Library and IBM's ESSL library, which are
targeted at specific machines. Moreover, FFTW's performance is
*portable*. Indeed, FFTW is unique in that it automatically adapts
itself to your machine, your cache, the size of your memory, the number
of registers, and all the other factors that normally make it impossible
to optimize a program for more than one machine. An extensive
comparison of FFTW's performance with that of other Fourier transform
codes has been made. The results are available on the Web at
the benchFFT home page.

In order to use FFTW effectively, you need to understand one basic
concept of FFTW's internal structure. FFTW does not used a fixed
algorithm for computing the transform, but it can adapt the DFT
algorithm to details of the underlying hardware in order to achieve best
performance. Hence, the computation of the transform is split into two
phases. First, FFTW's *planner* is called, which "learns" the
fastest way to compute the transform on your machine. The planner
produces a data structure called a *plan* that contains this
information. Subsequently, the plan is passed to FFTW's *executor*,
along with an array of input data. The executor computes the actual
transform, as dictated by the plan. The plan can be reused as many
times as needed. In typical high-performance applications, many
transforms of the same size are computed, and consequently a
relatively-expensive initialization of this sort is acceptable. On the
other hand, if you need a single transform of a given size, the one-time
cost of the planner becomes significant. For this case, FFTW provides
fast planners based on heuristics or on previously computed plans.

The pattern of planning/execution applies to all four operation modes of FFTW, that is, I) one-dimensional complex transforms (FFTW), II) multi-dimensional complex transforms (FFTWND), III) one-dimensional transforms of real data (RFFTW), IV) multi-dimensional transforms of real data (RFFTWND). Each mode comes with its own planner and executor.

Besides the automatic performance adaptation performed by the planner,
it is also possible for advanced users to customize FFTW for their
special needs. As distributed, FFTW works most efficiently for arrays
whose size can be factored into small primes (2, 3,
5, and 7), and uses a slower general-purpose routine for
other factors. FFTW, however, comes with a code generator that can
produce fast C programs for any particular array size you may care
about.
For example, if you need transforms of size
513 = 19*3^{3},
you can customize FFTW to support the factor 19 efficiently.

FFTW can exploit multiple processors if you have them. FFTW comes with
a shared-memory implementation on top of POSIX (and similar) threads, as
well as a distributed-memory implementation based on MPI.
We also provide an experimental parallel implementation written in Cilk,
*the superior programming tool of choice for discriminating
hackers* (Olin Shivers). (See the Cilk home page.)

For more information regarding FFTW, see the paper, "The Fastest
Fourier Transform in the West," by M. Frigo and S. G. Johnson, which is
the technical report MIT-LCS-TR-728 (Sep. '97). See also, "FFTW: An
Adaptive Software Architecture for the FFT," by M. Frigo and
S. G. Johnson, which appeared in the 23rd International Conference on
Acoustics, Speech, and Signal Processing (Proc. ICASSP 1998
**3**, p. 1381). The code generator is described in the paper "A Fast
Fourier Transform Compiler",
by M. Frigo, to appear in the Proceedings of the 1999 ACM SIGPLAN
Conference on Programming Language Design and Implementation (PLDI),
Atlanta, Georgia, May 1999. These papers, along with the latest
version of FFTW, the FAQ, benchmarks, and other links, are available at
the FFTW home page. The current
version of FFTW incorporates many good ideas from the past thirty years
of FFT literature. In one way or another, FFTW uses the Cooley-Tukey
algorithm, the Prime Factor algorithm, Rader's algorithm for prime
sizes, and the split-radix algorithm (with a variation due to Dan
Bernstein). Our code generator also produces new algorithms that we do
not yet completely understand.
The reader is referred to the cited papers for the appropriate
references.

The rest of this manual is organized as follows. We first discuss the sequential (one-processor) implementation. We start by describing the basic features of FFTW in Section Tutorial. This discussion includes the storage scheme of multi-dimensional arrays (Section Multi-dimensional Array Format) and FFTW's mechanisms for storing plans on disk (Section Words of Wisdom). Next, Section FFTW Reference provides comprehensive documentation of all FFTW's features. Parallel transforms are discussed in their own chapter Section Parallel FFTW. Fortran programmers can also use FFTW, as described in Section Calling FFTW from Fortran. Section Installation and Customization explains how to install FFTW in your computer system and how to adapt FFTW to your needs. License and copyright information is given in Section License and Copyright. Finally, we thank all the people who helped us in Section Acknowledgments.

Go to the first, previous, next, last section, table of contents.