[MITgcm-support] MITGCM on dual-dual-core Opteron

Jason Goodman jgoodman at whoi.edu
Thu Jun 15 17:00:33 EDT 2006


Hey, MITGCM gang!

My group here at WHOI has a 28-node cluster with dual Opteron 248  
nodes.  We're going to be replacing some of them with dual-core  
Opteron 275s (same clock rate, but dual-core), and I've been doing  
some benchmarking on a trial node.

MITGCM ocean model build 20050721
3-d convection code (verification/exp5)
domain = 120x120x20

Dual-core AMD 275:

Cores           Usr Time        Rate
------------------------------------
1               133.3           1
2               120.0, 116.5    1.11, 1.14
4               77.4            1.72

Single-core AMD 248:

Cores           Usr Time        Rate
------------------------------------
1               132.2           1
2               87.1            1.52
4 (2 nodes)     55.6            2.38
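
For reference, the Rate column in the two tables above appears to be the 1-core time divided by the measured time. The sketch below reproduces that arithmetic from the timings in the tables. The function names are illustrative only, and the per-core efficiency figure is an added convenience that does not appear in the original numbers.

```python
def speedup(baseline_time, time):
    """Rate as used in the tables: baseline time / measured time."""
    return baseline_time / time


def efficiency(baseline_time, time, cores, baseline_cores=1):
    """Speedup per added core (1.0 would be perfect scaling).

    Not in the original tables; added here for illustration.
    """
    return speedup(baseline_time, time) * baseline_cores / cores


# Usr Time values copied from the tables, keyed by core count.
# For the 2-core run on the 275, the faster of the two runs (116.5) is used.
opteron_275 = {1: 133.3, 2: 116.5, 4: 77.4}
opteron_248 = {1: 132.2, 2: 87.1, 4: 55.6}

for name, runs in (("275", opteron_275), ("248", opteron_248)):
    for cores, t in sorted(runs.items()):
        print(name, cores,
              round(speedup(runs[1], t), 2),
              round(efficiency(runs[1], t, cores), 2))
```

On these numbers the 248 gets more out of each extra core than the 275 does at every core count above 1.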

Now, I don't expect MITGCM to work well with dual-core chips.  I've  
been told that MITGCM is memory-bandwidth-limited, and so when two  
cores on the same chip try to access their local memory at the same  
time, there's a traffic jam on the memory bus.  Compare with NCAR  
CAM, an atmospheric model which is CPU-number-crunching limited:

Dual-core AMD 275:

Cores           Wallclock time  Rate
------------------------------------
2               797             1
4               394             2.0


Single-core AMD 248:

Cores           Wallclock time  Rate
------------------------------------
2               775             1
4 (2 nodes)     410             1.9

I'd like to be able to run just two processes on a four-core node
sometimes (one per chip), but MITGCM's performance using 2 cores per
node is terrible.  The 275 chip has the same clock speed as the 248,
so you would think that each MITGCM process would run on a separate
chip, each with its separate memory, and performance would be equal
to the 248.  That's exactly what happens with CAM.  But with MITGCM
on my system, the processes hop around from CPU to CPU.  Sometimes
both processes will be on the same chip and sometimes they'll be on
different chips; the scheduler moves them around like crazy.  And of
course, when they're on the same chip, you get a memory bottleneck.
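
One thing that might be worth trying is restricting each process to a single core on a different chip, so the scheduler cannot move both onto the same chip. The sketch below is a minimal illustration, not a tested fix for this cluster. It assumes a Linux node, where Python exposes os.sched_setaffinity. The core-to-chip layout shown is hypothetical and would need to be checked on the real hardware, and the helper names are made up for this example.

```python
import os


def one_core_per_chip(core_to_chip):
    """Pick the lowest-numbered core on each chip.

    core_to_chip maps a core id to the id of the chip (socket) it sits on.
    """
    chosen = {}
    for core in sorted(core_to_chip):
        chosen.setdefault(core_to_chip[core], core)
    return [chosen[chip] for chip in sorted(chosen)]


def pin_to_core(pid, core):
    """Restrict process `pid` to one core. Returns False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):  # Linux-only call
        return False
    os.sched_setaffinity(pid, {core})
    return True


# Hypothetical layout for a node with two dual-core chips:
# cores 0 and 1 on chip 0, cores 2 and 3 on chip 1.
layout = {0: 0, 1: 0, 2: 1, 3: 1}
print(one_core_per_chip(layout))  # [0, 2]
```

Each of the two model processes would then be passed to pin_to_core with a different entry from that list. Whether this recovers the 248's two-process timing is something only a benchmark on the trial node could show.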

Why does this happen?  Why MITGCM and not CAM?  Is there anything I  
can change in my OS configuration or MITGCM configuration to stop it?

Jason



