[MITgcm-support] Scalability
Stefano Querin
squerin at ogs.trieste.it
Wed Aug 3 09:27:13 EDT 2005
Hi,
I'm wondering whether there is something wrong with my model configuration.
I'm running on a small AMD Opteron 64 cluster (12 CPUs: 4 SunFire V20z +
1 SunFire V40z). I made some runs (on 1, 2, 4 and 8 CPUs) to check the
scalability of the code: the results are in the attached file
Scalability.txt. I also attach the file SIZE.h (taken from the 8-CPU run).
The domain is 88 x 128 x 28 points with horizontal resolution of 250 m,
double periodic. Levels are 0.5 (upper 6) and 1.0 (other 22) meters thick.
The simulation consists of 4320 steps; the timestep is 10 s and the dump
interval is 1800 s.
The model (checkpoint57j_post) is forced with surface heat fluxes and wind
stress.
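
As a quick sanity check on the figures above, here is a minimal Python sketch that derives the problem size and run length from the stated numbers. The variable names are illustrative only (they are not MITgcm parameter names), and the dump count assumes one dump per full 1800 s interval, without counting any initial dump.

```python
# All values below are taken from the message; names are illustrative.
NX, NY, NZ = 88, 128, 28      # grid points in x, y and the vertical
DX_M = 250.0                  # horizontal resolution in meters
N_STEPS = 4320                # number of time steps
DT_S = 10                     # time step in seconds
DUMP_S = 1800                 # dump interval in seconds

# Upper 6 levels are 0.5 m thick, the other 22 are 1.0 m thick.
layer_thickness_m = [0.5] * 6 + [1.0] * 22

grid_points = NX * NY * NZ                 # total 3D grid points
simulated_s = N_STEPS * DT_S               # simulated time in seconds
simulated_h = simulated_s / 3600           # simulated time in hours
dump_intervals = simulated_s // DUMP_S     # full dump intervals in the run
total_depth_m = sum(layer_thickness_m)     # depth of the water column
extent_km = (NX * DX_M / 1000, NY * DX_M / 1000)  # horizontal extent

print(f"grid points     : {grid_points}")
print(f"simulated time  : {simulated_s} s ({simulated_h} h)")
print(f"dump intervals  : {dump_intervals}")
print(f"total depth     : {total_depth_m} m")
print(f"domain extent   : {extent_km[0]} km x {extent_km[1]} km")
```

So each run advances about 3.2e5 grid points through 12 simulated hours, which is a fairly small problem to split across 8 processes.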
It seems that the model doesn't scale very well. User time scales almost
linearly; wall clock time, on the contrary, gets worse (especially when
going from 4 to 8 CPUs).
In the sea+land column I put the elapsed time for exactly the same
simulation run on a different domain: in this case, the southern half of
the domain consists of land points (i.e. when running on 8 CPUs, 4 handle
only sea points and the other 4 handle only land points). These simulations
are not much faster (do land points require only slightly less
computational effort?). Scalability is similar to the "all sea" experiment.
My question is: why does the gap between user+system time and wall clock
time grow so much as I increase the number of CPUs?
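
One way to quantify this gap is to compute speedup, parallel efficiency and the fraction of wall clock time not accounted for by CPU time. The sketch below is only an illustration: the function name is hypothetical, it assumes the reported times are per-process values, and the timings in it are made-up placeholders, NOT the values from Scalability.txt.

```python
def scaling_table(timings):
    """Summarize scaling from {n_cpus: (cpu_seconds, wall_seconds)}.

    cpu_seconds is user+system time and wall_seconds is elapsed time,
    both assumed to be measured on a single process of the run.
    """
    base_wall = timings[1][1]  # wall clock time of the 1-CPU run
    rows = []
    for n in sorted(timings):
        cpu_s, wall_s = timings[n]
        speedup = base_wall / wall_s
        rows.append({
            "n_cpus": n,
            "speedup": speedup,
            "efficiency": speedup / n,
            # Share of elapsed time the process was not computing,
            # e.g. waiting on communication or synchronization.
            "idle_fraction": (wall_s - cpu_s) / wall_s,
        })
    return rows


# Placeholder numbers for illustration only; replace with measured values.
PLACEHOLDER_TIMINGS = {
    1: (1000.0, 1000.0),
    2: (500.0, 625.0),
    4: (250.0, 500.0),
    8: (125.0, 500.0),
}

for row in scaling_table(PLACEHOLDER_TIMINGS):
    print(f"{row['n_cpus']:2d} CPUs  speedup {row['speedup']:.2f}  "
          f"efficiency {row['efficiency']:.2f}  "
          f"idle {row['idle_fraction']:.2f}")
```

An idle fraction that rises with the number of CPUs while CPU time keeps shrinking would be consistent with processes waiting on each other rather than computing.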
I know that the model was created to be highly scalable on much larger
computational platforms, so I'm probably making some mistakes somewhere!
I'm sure it is not an I/O problem: the DO_THE_MODEL_IO routine takes only a
few seconds, since I turned GlobalFiles off (I also don't use the NFS
filesystem anymore).
Most of the time (almost all the time) is "lost" in the SOLVE_FOR_PRESSURE
and BLOCKING_EXCHANGES routines, which (I think) involve the whole domain.
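
To get a rough feel for how much data the exchanges move relative to the work each process does, the sketch below compares halo cells to interior cells per tile for the 88 x 128 domain. It uses the SIZE.h names sNx, sNy, nPx, nPy, OLx and OLy, but the decompositions and the overlap width of 3 are assumptions (the attached SIZE.h is not reproduced here), and it assumes one tile per process. It is an illustration, not a diagnosis of the timings.

```python
NX, NY = 88, 128  # horizontal grid size from the message


def tile_halo_stats(nPx, nPy, OLx=3, OLy=3):
    """Halo vs. interior cell counts for one tile, per horizontal level.

    nPx, nPy: processes in x and y (one tile per process is assumed).
    OLx, OLy: overlap (halo) width; the default of 3 is an assumption.
    """
    if NX % nPx or NY % nPy:
        raise ValueError("domain must divide evenly among processes")
    sNx, sNy = NX // nPx, NY // nPy          # interior tile size
    interior = sNx * sNy
    # The domain is doubly periodic, so every tile has halos on all sides.
    with_halo = (sNx + 2 * OLx) * (sNy + 2 * OLy)
    halo = with_halo - interior
    return {
        "sNx": sNx,
        "sNy": sNy,
        "interior_cells": interior,
        "halo_cells": halo,
        "halo_to_interior": halo / interior,
    }


# Hypothetical decompositions for 1, 2, 4 and 8 processes.
for nPx, nPy in [(1, 1), (1, 2), (2, 2), (2, 4)]:
    s = tile_halo_stats(nPx, nPy)
    print(f"{nPx * nPy} procs: tile {s['sNx']} x {s['sNy']}, "
          f"halo/interior = {s['halo_to_interior']:.2f}")
```

As tiles shrink, the halo grows relative to the interior, so the amount of exchanged data per unit of computation rises. For a 3D field the per-level counts are multiplied by the 28 vertical levels.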
Could it be a communication problem in our cluster (network, switches...)?
Any other ideas?
Thank you very much!
Regards,
Stefano
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Scalability.txt
URL: <http://mitgcm.org/pipermail/mitgcm-support/attachments/20050803/af83d886/attachment.txt>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SIZE.h
Type: application/octet-stream
Size: 2258 bytes
Desc: not available
URL: <http://mitgcm.org/pipermail/mitgcm-support/attachments/20050803/af83d886/attachment.obj>