Improved GROMACS Scaling on Ethernet Switched Clusters
Journal
Recent Advances in Parallel Virtual Machine and Message Passing Interface
ISSN
0302-9743
Date Issued
2006
Author(s)
Kutzner, Carsten
Van der Spoel, David
Fechner, Martin
Lindahl, Erik
Schmitt, Udo W.
Editor(s)
Träff, Jesper Larsson
Mohr, Bernd
Worringen, Joachim
Dongarra, Jack
DOI
10.1007/11846802_57
Abstract
We investigated the prerequisites for decent scaling of the GROMACS 3.3 molecular dynamics (MD) code [1] on Ethernet Beowulf clusters. The code uses the MPI standard for communication between the processors and scales well on shared memory supercomputers like the IBM p690 (Regatta) and on Linux clusters with a high-bandwidth/low-latency network. On Ethernet switched clusters, however, the scaling typically breaks down as soon as more than two computational nodes are involved. For an 80k atom MD test system, exemplary speedups Sp_N on N CPUs are Sp_8 = 6.2 and Sp_16 = 10 on a Myrinet dual-CPU 3 GHz Xeon cluster, Sp_16 = 11 on an Infiniband dual-CPU 2.2 GHz Opteron cluster, and Sp_32 = 21 on one Regatta node. However, the maximum speedup we could initially reach on our Gbit Ethernet 2 GHz Opteron cluster was Sp_4 = 3 using two dual-CPU nodes. Employing more CPUs only led to slower execution (Table 1).
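The speedups quoted above follow the usual definition Sp_N = T_1 / T_N, i.e. the wall-clock time on one CPU divided by the wall-clock time on N CPUs. As a minimal illustration (not taken from the paper), the sketch below shows how such timings could be obtained for an MPI program; the work() routine is a placeholder for the actual MD time steps, and the speedup would be computed afterwards from separate runs with 1 and N processes.

/* Minimal MPI wall-clock timing sketch; work() is a hypothetical placeholder. */
#include <stdio.h>
#include <mpi.h>

static void work(void) { /* placeholder for the MD time steps */ }

int main(int argc, char **argv)
{
    int rank, nprocs;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Barrier(MPI_COMM_WORLD);   /* synchronize all ranks before timing */
    t0 = MPI_Wtime();
    work();
    MPI_Barrier(MPI_COMM_WORLD);   /* ensure all ranks have finished */
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("N = %d CPUs, wall-clock time T_N = %.3f s\n", nprocs, t1 - t0);

    MPI_Finalize();
    return 0;
}

For example, with the numbers reported in the abstract, a speedup of Sp_16 = 10 on 16 CPUs corresponds to a parallel efficiency of 10/16 ≈ 63%.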