Improved GROMACS Scaling on Ethernet Switched Clusters
Journal
Recent Advances in Parallel Virtual Machine and Message Passing Interface
ISSN
0302-9743
Date Issued
2006
Author(s)
Kutzner, Carsten
Van der Spoel, David
Fechner, Martin
Lindahl, Erik
Schmitt, Udo W.
Editor(s)
Träff, Jesper Larsson
Mohr, Bernd
Worringen, Joachim
Dongarra, Jack
DOI
10.1007/11846802_57
Abstract
We investigated the prerequisites for decent scaling of the GROMACS 3.3 molecular dynamics (MD) code [1] on Ethernet Beowulf clusters. The code uses the MPI standard for communication between the processors and scales well on shared memory supercomputers like the IBM p690 (Regatta) and on Linux clusters with a high-bandwidth/low-latency network. On Ethernet switched clusters, however, the scaling typically breaks down as soon as more than two computational nodes are involved. For an 80k atom MD test system, exemplary speedups Sp_N on N CPUs are Sp_8 = 6.2 and Sp_16 = 10 on a Myrinet dual-CPU 3 GHz Xeon cluster, Sp_16 = 11 on an Infiniband dual-CPU 2.2 GHz Opteron cluster, and Sp_32 = 21 on one Regatta node. However, the maximum speedup we could initially reach on our Gbit Ethernet 2 GHz Opteron cluster was Sp_4 = 3 using two dual-CPU nodes. Employing more CPUs only led to slower execution (Table 1).
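The speedups quoted above follow the usual definition Sp_N = T_1 / T_N, i.e. the wall-clock time on one CPU divided by the wall-clock time on N CPUs. As a minimal illustration (not taken from the paper), the sketch below shows how such timings could be obtained for an MPI program; the work() routine is a placeholder for the actual MD time steps, and the speedup would be computed afterwards from separate runs with 1 and N processes.

/* Minimal MPI wall-clock timing sketch; work() is a hypothetical placeholder. */
#include <stdio.h>
#include <mpi.h>

static void work(void) { /* placeholder for the MD time steps */ }

int main(int argc, char **argv)
{
    int rank, nprocs;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Barrier(MPI_COMM_WORLD);   /* synchronize all ranks before timing */
    t0 = MPI_Wtime();
    work();
    MPI_Barrier(MPI_COMM_WORLD);   /* ensure all ranks have finished */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("N = %d CPUs, wall-clock time T_N = %.3f s\n", nprocs, t1 - t0);

    MPI_Finalize();
    return 0;
}

For example, with the numbers reported in the abstract, a speedup of Sp_16 = 10 on 16 CPUs corresponds to a parallel efficiency of 10/16 ≈ 63%.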