[MITgcm-support] Optimal System/CPUtype for MITgcm?
Martin.Losch at awi.de
Wed Nov 19 09:09:33 EST 2008
I have been meaning to reply to this email for a long time, but never
got around to it. As far as I can see there has been no feedback
whatsoever, so I'll be the first, although I am by no means a
This is my experience:
The MITgcm code has (as I recently learned) a relatively low
"computational intensity" (FLOPS/memory access); incidentally, that
seems to be true for any fluid dynamics code, and is to some extent
related to the type of equations that we are solving. It is intended
for both parallel and vector architecture machines.
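To make the "computational intensity" point concrete, here is a minimal sketch of a roofline-style bound. The roofline model is not something from my measurements, and the function name and all hardware numbers below are hypothetical, chosen only to show how a code with low FLOPS per byte stops speeding up once the cores on a chip have used up the shared memory bandwidth.

```python
def attainable_gflops(cores, peak_gflops_per_core, chip_bandwidth_gbs, flops_per_byte):
    """Upper bound on chip performance when all cores share one memory bus.

    The bound is the smaller of what the cores can compute and what the
    memory system can feed them (bandwidth times computational intensity).
    """
    compute_bound = cores * peak_gflops_per_core
    memory_bound = chip_bandwidth_gbs * flops_per_byte
    return min(compute_bound, memory_bound)


# Hypothetical chip: 4 GFLOPS per core, 16 GB/s shared memory bandwidth.
# Hypothetical code: 0.5 FLOP per byte moved (low computational intensity).
for cores in (1, 2, 4):
    bound = attainable_gflops(cores, 4.0, 16.0, 0.5)
    print(f"{cores} core(s): at most {bound:.1f} GFLOPS")
```

With these made-up numbers the bound doubles from one to two cores and then stays flat, because the memory term (16 GB/s times 0.5 FLOP/byte) becomes the limit.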
For good performance you obviously need fast CPUs (with fast memory
access) and reasonable bandwidth between CPUs.
- A machine where this was realized to a high degree is our Cray XD1
with AMD (single-core) Opteron CPUs. There I could not notice any
overhead for applications ranging from 1 to 64 CPUs (our machine does
not have more).
- IBM p690, P4 work very well, too, as long as the individual nodes
have a fast connection. We have a small IBM where jobs scale linearly
up to 8 CPUs on one node, but do not scale at all across nodes. For very
MPI-intensive jobs (many passive tracers) I had to modify the way the
exchanges are handled for ptracers even on the best configuration.
- I get by far the best throughput on our NEC-SX8-R (vector)
computer. My best results are on the order of 7.5 GFLOPS on 1 CPU and
5 GFLOPS/CPU on 24 CPUs (3 nodes), so 120 GFLOPS in total (I am sure
that Jens-Olaf will correct me on these numbers).
- Recently, we've had a bad experience on an Altix with Intel
quad-core CPUs (see this thread:
<http://forge.csail.mit.edu/pipermail/mitgcm-support/2008-October/005731.html>
and the following postings).
In the end I never closed the thread, but these are my conclusions: On
that particular machine there were two problems: 1. the connection
between nodes was slow (not sure whether this is related to hardware or
to the MPI implementation). 2. the memory bandwidth of the quad-core
chips is not sufficient to handle 4 cores, so that the memory-bandwidth-
intensive MITgcm (see "computational intensity" above) does not scale
when you use more than 2 cores per chip. This observation is very similar to
what Jeff Blundell found: <http://forge.csail.mit.edu/pipermail/
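As a quick check on the NEC numbers in the list above, the following sketch works out the aggregate rate and the per-CPU efficiency of the 24-CPU run relative to the 1-CPU run. The function names are my own, and "efficiency" here is assumed to mean the per-CPU rate in the parallel run divided by the single-CPU rate; the input figures are the approximate ones quoted above.

```python
def aggregate_gflops(per_cpu_gflops, n_cpus):
    """Total sustained rate when each of n_cpus delivers per_cpu_gflops."""
    return per_cpu_gflops * n_cpus


def parallel_efficiency(single_cpu_gflops, per_cpu_gflops_parallel):
    """Per-CPU rate in the parallel run relative to the single-CPU run."""
    return per_cpu_gflops_parallel / single_cpu_gflops


# Approximate figures quoted for the NEC SX-8R runs.
single_cpu = 7.5   # GFLOPS on 1 CPU
per_cpu_24 = 5.0   # GFLOPS per CPU on 24 CPUs (3 nodes)

total = aggregate_gflops(per_cpu_24, 24)
efficiency = parallel_efficiency(single_cpu, per_cpu_24)
print(f"aggregate: {total:.0f} GFLOPS, efficiency: {efficiency:.0%}")
```

So the 120 GFLOPS on 24 CPUs corresponds to roughly two thirds of the single-CPU rate per CPU.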
My conclusions are: If I were ever responsible for buying a large
computer to run something like the MITgcm on (not very likely that
someone will trust me with that), I would try to avoid multicore
chips, and take care that the connection between CPUs is fast.
On 5 Nov 2008, at 19:59, m. r. schaferkotter wrote:
> of late we've been less than enthused about the MITgcm performance
> on Cray XT4 Opteron quadcore system.
> given the opportunity of choice of system/cputype pair, i would be
> most interested in community views regarding
> the model performance on the system/cputype pairs below:
> which system/cputype would give best performance (speed)?
> which system/cputype should be avoided?
> SYSTEM                      CPUtype
> HP XC                       Opteron
> SGI Altix 3700              Itanium 2
> SGI Altix 4700              Dual-Core Intel 2
> IBM Cluster 1350            Opteron
> Linux Networx ATC           Intel Dempsey
> Linux Networx ATC           Intel Woodcrest
> Linux Networx Evolocity II  Intel Xeon EMT64T
> Linux Networx LS-V          Opteron
> Sun Fire X2200              Opteron
> Sun Fire X4600              Opteron
> Cray XT3                    Opteron
> Cray XT4                    Opteron
> Dell PowerEdge 1955         Intel Woodcrest
> Cray XD1                    Opteron
> IBM P5+                     Power5+ Opteron
> Cray XT5                    Opteron
> IBM Power6                  Power6
> IBM P575+                   Power5+
> m. r. schaferkotter