A 32x32x32, spatially distributed 3D FFT in four microseconds on Anton
Abstract
Anton, a massively parallel special-purpose machine for molecular dynamics simulations, performs a 32x32x32 FFT in 3.7 microseconds and a 64x64x64 FFT in 13.3 microseconds on a configuration with 512 nodes, an order of magnitude faster than any other FFT implementation of which we are aware. Achieving this FFT performance requires a coordinated combination of computation and communication techniques that leverage Anton's underlying hardware mechanisms. Most significantly, Anton's communication subsystem provides over 300 gigabits per second of bandwidth per node, message latency in the hundreds of nanoseconds, and support for word-level writes and single-ended communication. In addition, Anton's general-purpose computation system incorporates primitives that support the efficient parallelization of small 1D FFTs. Although Anton was designed specifically for molecular dynamics simulations, a number of the hardware primitives and software implementation techniques described in this paper may also be applicable to the acceleration of FFTs on general-purpose high-performance machines.
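To make the role of small 1D FFTs concrete, the following is a minimal single-process sketch of the standard way a 3D FFT separates into passes of 1D FFTs along each axis. It is an illustration only, not Anton's implementation: the abstract does not describe the decomposition or data layout, and the names `fft1d`, `fft3d`, and `work_split` are hypothetical. The helper `work_split` simply does the arithmetic for an even split of grid points across nodes, which is an assumption about how the work might be divided.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 FFT of a sequence whose length is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft3d(grid):
    """3D FFT of a cubic n*n*n nested list via 1D FFTs along z, then y, then x."""
    n = len(grid)
    g = [[[complex(v) for v in row] for row in plane] for plane in grid]
    # Pass 1: transform every line along the last (z) axis.
    for i in range(n):
        for j in range(n):
            g[i][j] = fft1d(g[i][j])
    # Pass 2: transform every line along the middle (y) axis.
    for i in range(n):
        for k in range(n):
            line = fft1d([g[i][j][k] for j in range(n)])
            for j in range(n):
                g[i][j][k] = line[j]
    # Pass 3: transform every line along the first (x) axis.
    for j in range(n):
        for k in range(n):
            line = fft1d([g[i][j][k] for i in range(n)])
            for i in range(n):
                g[i][j][k] = line[i]
    return g


def work_split(n, nodes):
    """Return (grid points per node, 1D FFTs per axis pass), assuming an even split."""
    return n ** 3 // nodes, n * n
```

Under the even-split assumption, `work_split(32, 512)` gives 64 grid points per node and 1024 length-32 FFTs per pass. Each node therefore holds very little data, which is consistent with the abstract's emphasis on low message latency and fine-grained, word-level communication.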