ABSTRACT
We present an error-controlled, highly scalable FMM implementation for long-range interactions in particle systems with open, 1D, 2D, and 3D periodic boundary conditions. We highlight three aspects of fast summation codes that are not fully addressed in most articles: memory consumption, error control, and runtime minimization. This poster aims to contribute to all three points in the context of modern large-scale parallel machines. In particular, we discuss the underlying data structures, the parallelization approach, and the precision-dependent parameter optimization.
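To make the idea of precision-dependent parameter optimization concrete, the sketch below picks the smallest multipole expansion order p that satisfies a requested error tolerance, using the classical Greengard–Rokhlin truncation bound for a well-separated box. This is an illustration only; the function name, its parameters, and the specific bound are assumptions and not the error-control scheme actually used in the poster's code.

```python
def min_expansion_order(ratio, q_total, r_minus_a, tol):
    """Smallest multipole order p such that the classical truncation bound
    q_total / (r - a) * (a/r)**(p+1) drops below tol.

    ratio     -- a/r, box radius over evaluation distance (must be < 1)
    q_total   -- sum of absolute source strengths in the box
    r_minus_a -- separation margin r - a
    tol       -- requested absolute error bound

    Hypothetical helper for illustration; not the poster's actual scheme.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio a/r must lie in (0, 1)")
    p = 0
    while q_total / r_minus_a * ratio ** (p + 1) > tol:
        p += 1
    return p

# Example: a/r = 0.5, unit total charge, unit margin, 1e-6 tolerance
print(min_expansion_order(0.5, 1.0, 1.0, 1e-6))  # -> 19
```

Tightening the tolerance by a factor of ten raises p by roughly log(10)/log(r/a) terms, which is why the cost of a requested precision depends so strongly on the separation criterion.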
The current code computes all mutual long-range interactions of more than three trillion particles on 294,912 BG/P cores within a few minutes for an expansion up to quadrupoles. The maximum memory footprint of such a computation has been reduced to less than 45 bytes per particle. The code employs a one-sided, non-blocking parallelization approach with small communication overhead.
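The 45-bytes-per-particle figure can be put in perspective with a back-of-the-envelope budget. The layout below (double-precision position and charge) is an assumption for illustration, not the poster's actual data structure:

```python
# Hypothetical per-particle memory budget under the stated 45-byte cap.
# Assumed layout: double-precision coordinates and source strength.
coords_bytes = 3 * 8          # x, y, z as 64-bit floats
charge_bytes = 1 * 8          # source strength
budget_bytes = 45             # stated maximum footprint per particle

remainder = budget_bytes - coords_bytes - charge_bytes
print(remainder)  # -> 13 bytes left per particle
```

Under this assumed layout, only about 13 bytes per particle remain for everything else (tree keys, multipole coefficients amortized over box populations, and communication buffers), which illustrates how tight the stated budget is.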
Poster: Passing the three trillion particle limit with an error-controlled fast multipole method