[MITgcm-support] Scalability

Stefano Querin squerin at ogs.trieste.it
Wed Aug 3 09:27:13 EDT 2005


Hi,
I'm wondering whether there is something wrong with my model configuration.
I'm running on a small AMD Opteron 64 cluster (12 CPUs: 4 SunFire V20z + 
1 SunFire V40z). I made some runs (1, 2, 4 and 8 CPUs) to check the 
scalability of the code: the results are in the file "Scalability.txt". I 
also attach the file SIZE.h (taken from the 8-CPU run).
The domain is 88 x 128 x 28 points with a horizontal resolution of 250 m, 
doubly periodic. Levels are 0.5 m (upper 6) and 1.0 m (other 22) thick. 
The simulation consists of 4320 steps; the time step is 10 s and the dump 
interval is 1800 s. The model (checkpoint57j_post) is forced with surface 
heat fluxes and wind stress.
It seems that the model doesn't scale very well. User time scales almost 
linearly; wall clock time, by contrast, gets worse (especially going from 
4 to 8 CPUs).
In the column "sea+land" I put the elapsed time for exactly the same 
simulation run on a different domain: in this case the southern half of 
the domain consists of land points (i.e. running on 8 CPUs, 4 handle only 
sea points and the other 4 handle only land points). These simulations are 
not much faster (do land points require only slightly less computational 
effort?). Scalability is similar to the "all sea" experiment.
My question is: why does the gap between user+system time and wall clock 
time grow so much as I increase the number of CPUs?
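
To make the comparison concrete, speedup, parallel efficiency and the 
fraction of wall clock time not accounted for by user+system time can be 
computed as in the sketch below. The function names are hypothetical, the 
timings are placeholders (NOT the values from Scalability.txt), and the 
sketch assumes that user and system times are reported per process.

```python
def scaling_table(wall_times):
    """Map n_cpus -> (speedup, efficiency), relative to the smallest CPU count.

    wall_times: dict {n_cpus: wall clock seconds}.
    """
    base_n = min(wall_times)
    base_t = wall_times[base_n]
    table = {}
    for n, t in sorted(wall_times.items()):
        speedup = base_t / t
        # Ideal speedup is n / base_n; efficiency is the achieved fraction.
        efficiency = speedup / (n / base_n)
        table[n] = (speedup, efficiency)
    return table


def wait_fraction(user, system, wall):
    """Fraction of wall clock time not spent in user or system mode.

    Assumes all three values refer to the same single process. A large
    value suggests the process is mostly waiting (e.g. on communication).
    """
    return max(0.0, (wall - user - system) / wall)


# Placeholder timings in seconds, for illustration only.
example_wall = {1: 1000.0, 2: 520.0, 4: 290.0, 8: 250.0}

for n, (s, e) in scaling_table(example_wall).items():
    print(f"{n:2d} CPUs: speedup {s:5.2f}, efficiency {e:5.2f}")

print(f"wait fraction: {wait_fraction(user=120.0, system=10.0, wall=250.0):.2f}")
```

An efficiency that drops sharply between 4 and 8 CPUs, together with a 
growing wait fraction, would be consistent with time being spent outside 
the computation itself.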
I know that the model was designed to be highly scalable on much larger 
computational platforms, so I'm probably making a mistake somewhere!
I'm sure it is not an I/O problem: the DO_THE_MODEL_IO routine takes only a 
few seconds, since I turned GlobalFiles off (I also don't use the NFS 
filesystem anymore).
Most of the time (almost all of it) is "lost" in the SOLVE_FOR_PRESSURE 
and BLOCKING_EXCHANGES routines, which (I think) involve the whole domain. 
Could it be a communication problem with our cluster (network, switches...)?
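
As a rough way to see how much exchange traffic each tile carries relative 
to its computation, the sketch below lists the ways of splitting the 
88 x 128 horizontal domain over 8 processes and the ratio of halo cells to 
interior cells per tile. The function name is hypothetical and the overlap 
width of 3 cells is an assumption; the actual tile sizes and overlaps are 
the ones set in SIZE.h.

```python
def halo_ratio(nx, ny, npx, npy, overlap):
    """Return (tile_nx, tile_ny, halo/interior) for an npx x npy decomposition.

    The halo is the frame of `overlap` cells surrounding each tile, which
    has to be filled from neighbouring tiles at every exchange.
    """
    if nx % npx or ny % npy:
        raise ValueError("process grid must divide the domain evenly")
    tile_nx, tile_ny = nx // npx, ny // npy
    interior = tile_nx * tile_ny
    halo = (tile_nx + 2 * overlap) * (tile_ny + 2 * overlap) - interior
    return tile_nx, tile_ny, halo / interior


NX, NY = 88, 128   # horizontal domain size from the message
N_CPUS = 8
OVERLAP = 3        # assumed overlap width, not taken from SIZE.h

for npx in range(1, N_CPUS + 1):
    if N_CPUS % npx:
        continue
    npy = N_CPUS // npx
    if NX % npx or NY % npy:
        continue
    tile_nx, tile_ny, ratio = halo_ratio(NX, NY, npx, npy, OVERLAP)
    print(f"{npx} x {npy} processes: tile {tile_nx} x {tile_ny}, "
          f"halo/interior = {ratio:.2f}")
```

With tiles this small the halo is a sizeable fraction of the interior, so 
the time spent in exchanges would be expected to depend strongly on the 
interconnect.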
Any other ideas?
Thank you very much!

Regards,

Stefano 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Scalability.txt
URL: <http://mitgcm.org/pipermail/mitgcm-support/attachments/20050803/af83d886/attachment.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SIZE.h
Type: application/octet-stream
Size: 2258 bytes
Desc: not available
URL: <http://mitgcm.org/pipermail/mitgcm-support/attachments/20050803/af83d886/attachment.obj>

