[MITgcm-support] dual core Opteron
Constantinos Evangelinos
ce107 at ocean.mit.edu
Tue Mar 6 11:49:26 EST 2007
On Tue 06 Mar 2007 09:37, Michael A. Spall wrote:
> I am currently running MITgcm on a Microway cluster with
> Opteron 248s (2.2 GHz, 1MB Cache, 2Gb RAM). I am thinking
> of adding some nodes using dual core Opteron. Does anyone have
> any experience on the scaling of MITgcm from single core to dual
> core processors? I have only Gigabit ethernet. I tend to use only
> 4 processors / job and run several jobs at once since it does not
> scale very well beyond 4 processors due to the Gigabit Ethernet
> limitation.
In my testing on an Opteron dual-core quad-socket platform (a total of 8
cores per system) I could see a speed advantage from dual core. Specifically,
I went from running on 4 CPUs bound to different sockets (so that each process
suffers no contention reading/writing its local memory, where most if not all
of the arrays it uses should be located) to using all 8 CPUs (where the two
cores on each socket compete for local memory bandwidth). That speed advantage
however was not a doubling but more like going from 1.85 secs/timestep to
1.4 secs/timestep (on a 1x1 degree ECCO forward run setup).
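If you want to quantify how much of that comes from plain memory contention
(independent of MPI), one crude test is to run one copy of a memory-bandwidth
microbenchmark per core and watch the per-copy bandwidth drop when both cores
of a socket are busy. Something along the lines of the STREAM triad would do;
a minimal C sketch (the array size and repetition count are just placeholders
I picked to be well past your 1MB caches):

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/time.h>

  #define N      (8*1024*1024)   /* 64 MB per array in doubles */
  #define NTRIES 10

  static double wtime(void)
  {
      struct timeval tv;
      gettimeofday(&tv, NULL);
      return tv.tv_sec + 1.0e-6*tv.tv_usec;
  }

  int main(void)
  {
      double *a = malloc(N*sizeof(double));
      double *b = malloc(N*sizeof(double));
      double *c = malloc(N*sizeof(double));
      double t, best = 1.0e30;
      int i, k;

      for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

      for (k = 0; k < NTRIES; k++) {
          t = wtime();
          for (i = 0; i < N; i++)
              a[i] = b[i] + 3.0*c[i];   /* triad: 2 reads + 1 write per element */
          t = wtime() - t;
          if (t < best) best = t;
      }
      /* 3 arrays of N doubles streamed per pass */
      printf("best bandwidth: %.0f MB/s\n",
             3.0*N*sizeof(double)/best/1.0e6);
      free(a); free(b); free(c);
      return 0;
  }

Run one copy pinned to each core (taskset or numactl can do the pinning) and
compare the per-copy numbers with one core per socket busy versus two.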
Now, in the special case of wanting to run 4-processor jobs, using dual-core
dual-socket systems would mean that all communication takes place through
shared memory within the node, so there would be a speed advantage for
communications (vis-a-vis Gigabit Ethernet) to offset the memory contention
issues. Which effect would win I can't tell without more testing. On the other
hand, if you start running 8-processor jobs, using 2 nodes and all 4 cores on
each node, I would expect you to fare better than with your current setup.
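The communications side of that question is easy to measure with a two-process
ping-pong: run it once with both ranks on the same node (shared memory) and
once across two nodes (Gigabit Ethernet), with whichever MPI you are using.
A rough sketch, using nothing beyond standard MPI calls:

  #include <stdio.h>
  #include <mpi.h>

  #define NREPS   1000
  #define MSGSIZE 8           /* small message, to expose latency */

  int main(int argc, char **argv)
  {
      int rank, i;
      char buf[MSGSIZE] = {0};
      double t0, t1;
      MPI_Status st;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      for (i = 0; i < NREPS; i++) {
          if (rank == 0) {
              MPI_Send(buf, MSGSIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(buf, MSGSIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
          } else if (rank == 1) {
              MPI_Recv(buf, MSGSIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
              MPI_Send(buf, MSGSIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
          }
      }
      t1 = MPI_Wtime();

      if (rank == 0)   /* half the round-trip time is the one-way latency */
          printf("one-way latency: %.1f usec\n",
                 (t1 - t0)/(2.0*NREPS)*1.0e6);

      MPI_Finalize();
      return 0;
  }

Repeating the measurement at the message sizes your halo exchanges actually
use would tell you more than the 8-byte number alone.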
This may all be a moot point, since soon there won't be any single-core AMD
(or any Intel) systems to buy anyway.
What I would definitely suggest, to improve your Gigabit Ethernet performance,
is to invest (time and/or money) in a low-latency MPI implementation for
Gigabit Ethernet.
You have a choice between the free options:
a) GAMMA (very low latency for E1000 Intel Gigabit cards, less so for
Broadcom Tigon3 chipsets) http://www.disi.unige.it/project/gamma/
This requires a dedicated Gigabit interface for GAMMA, so you need to have
two networks per node.
b) SCore (http://www.pccluster.org), again works better with E1000 and Tigon3
chipsets
c) Parastation4 (http://www.cluster-competence-center.com/), works better with
the same chipsets as the others; it is offered for free with no support, and
support is offered on a contract basis.
and the for-pay options:
d) Scali MPI Connect (http://www.scali.com/)
I'm not sure whether the other for-pay options you could consider also have a
low-level O/S bypass or other similar mechanism for better performance:
e) Intel MPI
f) Verari MPI/Pro and ChaMPIon
g) HP MPI
Finally, if you don't feel adventurous at all (for the free solutions) or
don't want to pay any money, you can at least try building the latest MPICH2
(using the nemesis device), LAM or OpenMPI, which may provide you with some
marginal latency advantages.
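For MPICH2, if I remember the configure option correctly, picking the nemesis
device at build time looks something like

  ./configure --with-device=ch3:nemesis
  make && make install

but do check the README of whichever release you grab, since nemesis was still
labelled experimental in some versions.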
Constantinos
--
Dr. Constantinos Evangelinos
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology