[MITgcm-support] dual core Opteron
Constantinos Evangelinos
ce107 at ocean.mit.edu
Tue Mar 6 11:49:26 EST 2007
On Tue 06 Mar 2007 09:37, Michael A. Spall wrote:
> I am currently running MITgcm on a Microway cluster with
> Opteron 248s (2.2 GHz, 1MB Cache, 2Gb RAM). I am thinking
> of adding some nodes using dual core Opteron. Does anyone have
> any experience on the scaling of MITgcm from single core to dual
> core processors? I have only Gigabit ethernet. I tend to use only
> 4 processors / job and run several jobs at once since it does not
> scale very well beyond 4 processors due to the Gigabit Ethernet
> limitation.
In my testing on an Opteron dual-core quad-socket platform (a total of 8
cores per system) I could see a speed advantage from dual core. Specifically,
I went from running on 4 CPUs bound to different sockets (so that each process
suffers no contention reading/writing its local memory, where most if not all
of the arrays it uses should be located) to using all 8 CPUs (where the two
cores on each socket compete for local memory bandwidth). That speed advantage
however was not a doubling but more like going from 1.85 secs/timestep to
1.4 secs/timestep (on a 1x1 degree ECCO forward run setup).
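If you want to quantify how much of that comes from plain memory contention
(independent of MPI), one crude test is to run one copy of a memory-bandwidth
microbenchmark per core and watch the per-copy bandwidth drop when both cores
of a socket are busy. Something along the lines of the STREAM triad would do;
a minimal C sketch (the array size and repetition count are just placeholders
I picked to be well past your 1MB caches):

  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/time.h>

  #define N      (8*1024*1024)   /* 64 MB per array in doubles */
  #define NTRIES 10

  static double wtime(void)
  {
      struct timeval tv;
      gettimeofday(&tv, NULL);
      return tv.tv_sec + 1.0e-6*tv.tv_usec;
  }

  int main(void)
  {
      double *a = malloc(N*sizeof(double));
      double *b = malloc(N*sizeof(double));
      double *c = malloc(N*sizeof(double));
      double t, best = 1.0e30;
      int i, k;

      for (i = 0; i < N; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

      for (k = 0; k < NTRIES; k++) {
          t = wtime();
          for (i = 0; i < N; i++)
              a[i] = b[i] + 3.0*c[i];   /* triad: 2 reads + 1 write per element */
          t = wtime() - t;
          if (t < best) best = t;
      }
      /* 3 arrays of N doubles streamed per pass */
      printf("best bandwidth: %.0f MB/s\n",
             3.0*N*sizeof(double)/best/1.0e6);
      free(a); free(b); free(c);
      return 0;
  }

Run one copy pinned to each core (taskset or numactl can do the pinning) and
compare the per-copy numbers with one core per socket busy versus two.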
Now, in the special case of wanting to run 4-processor jobs, using dual-core
dual-socket systems would mean that all communication takes place through
shared memory within the node, so there would be a speed advantage for
communications (vis-a-vis Gigabit Ethernet) to offset the memory contention
issues. Which effect would win I can't tell without more testing. On the other
hand, if you start running 8-processor jobs, using 2 nodes and all 4 cores on
each node, I would expect you to fare better than with your current setup.
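The communications side of that question is easy to measure with a two-process
ping-pong: run it once with both ranks on the same node (shared memory) and
once across two nodes (Gigabit Ethernet), with whichever MPI you are using.
A rough sketch, using nothing beyond standard MPI calls:

  #include <stdio.h>
  #include <mpi.h>

  #define NREPS   1000
  #define MSGSIZE 8           /* small message, to expose latency */

  int main(int argc, char **argv)
  {
      int rank, i;
      char buf[MSGSIZE] = {0};
      double t0, t1;
      MPI_Status st;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      MPI_Barrier(MPI_COMM_WORLD);
      t0 = MPI_Wtime();
      for (i = 0; i < NREPS; i++) {
          if (rank == 0) {
              MPI_Send(buf, MSGSIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
              MPI_Recv(buf, MSGSIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
          } else if (rank == 1) {
              MPI_Recv(buf, MSGSIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
              MPI_Send(buf, MSGSIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
          }
      }
      t1 = MPI_Wtime();

      if (rank == 0)   /* half the round-trip time is the one-way latency */
          printf("one-way latency: %.1f usec\n",
                 (t1 - t0)/(2.0*NREPS)*1.0e6);

      MPI_Finalize();
      return 0;
  }

Repeating the measurement at the message sizes your halo exchanges actually
use would tell you more than the 8-byte number alone.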
This may all be a moot point, since soon there won't be any single-core AMD
(or any Intel) systems to buy anyway.
What I would definitely suggest, to improve your Gigabit Ethernet performance,
is to invest (time and/or money) in a low-latency MPI implementation for
Gigabit Ethernet.
You have a choice between the free options:
a) GAMMA (very low latency for E1000 Intel Gigabit cards, less so for
Broadcom Tigon3 chipsets) http://www.disi.unige.it/project/gamma/
This requires a dedicated Gigabit interface for GAMMA, so you need to have
two networks per node.
b) SCore (http://www.pccluster.org), again works better with E1000 and Tigon3
chipsets
c) Parastation4 (http://www.cluster-competence-center.com/), works better with
the same chipsets as the others; it is offered for free with no support, and
support is offered on a contract basis.
and the for-pay options:
d) Scali MPI Connect (http://www.scali.com/)
I'm not sure whether the other for-pay options you could consider also have a
low-level O/S bypass or other similar mechanism for better performance:
e) Intel MPI
f) Verari MPI/Pro and ChaMPIon
g) HP MPI
Finally, if you don't feel adventurous at all (for the free solutions) or
don't want to pay any money, you can at least try building the latest MPICH2
(using the nemesis device), LAM or OpenMPI, which may provide you with some
marginal latency advantages.
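For MPICH2, if I remember the configure option correctly, picking the nemesis
device at build time looks something like

  ./configure --with-device=ch3:nemesis
  make && make install

but do check the README of whichever release you grab, since nemesis was still
labelled experimental in some versions.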
Constantinos
--
Dr. Constantinos Evangelinos
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology