[MITgcm-support] Parallel performance
Stefano Querin
squerin at ogs.trieste.it
Thu Jun 16 09:17:49 EDT 2005
Hi Kevin,
I had similar problems when I ported the code from an IBM SP4 to a Linux
cluster that seems similar to yours (Opteron 64, Rocks OS, Sun Grid Engine,
Ethernet network). Basically, the poor performance when using multiple
processors was due to I/O problems. The SP4 has a very fast scratch
filesystem, so I could make the model read and write files on that disk
without problems (also using the GlobalFiles=.TRUE. option in PARM01 in the
data file). The same configuration on the new cluster, which uses an NFS
filesystem, gave very slow simulations. I solved the problem simply by
splitting the run across the CPUs: before the beginning of the simulation I
copy the input files and the executable to each node's local disk (via scp),
then I start the run with GlobalFiles=.FALSE. and, at the end of the
simulation, I copy the output back to the front-end node.
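
The copy-in / copy-out steps above can be sketched roughly as follows. This is only an illustration, not the script I actually use: the node names, directories and file names (including the executable name) are hypothetical placeholders, and launching the model itself is left out because it depends on your MPI and queue setup. The helper functions only build the ssh/scp command lines; nothing is executed unless dry_run is set to False.

```python
import shlex
import subprocess

# All values below are hypothetical placeholders; adapt them to your cluster.
NODES = ["compute-0-0", "compute-0-1"]
INPUT_FILES = ["data", "mitgcmuv"]  # namelist file and executable (assumed names)
FRONTEND_DIR = "/home/user/run"     # run directory on the front-end node (assumed)
LOCAL_DIR = "/scratch/user/run"     # directory on each node's local disk (assumed)


def stage_in_commands(nodes, files, src_dir, dst_dir):
    """Return one mkdir and one scp command per node to copy inputs to its local disk."""
    cmds = []
    for node in nodes:
        cmds.append(["ssh", node, "mkdir", "-p", dst_dir])
        sources = [f"{src_dir}/{name}" for name in files]
        cmds.append(["scp", *sources, f"{node}:{dst_dir}/"])
    return cmds


def stage_out_commands(nodes, src_dir, dst_dir):
    """Return one scp command per node to copy that node's run directory back."""
    return [["scp", "-r", f"{node}:{src_dir}", f"{dst_dir}/{node}"] for node in nodes]


def run_all(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: shows what would be copied before and after the simulation.
    run_all(stage_in_commands(NODES, INPUT_FILES, FRONTEND_DIR, LOCAL_DIR))
    run_all(stage_out_commands(NODES, LOCAL_DIR, FRONTEND_DIR))
```

With GlobalFiles=.FALSE. each process writes its own files locally, so the copy-back step collects one directory per node on the front end.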
Are you using the GlobalFiles=.TRUE. option in PARM01 in the data file?
Maybe this is not the solution for your problem but a check on the I/O could
be useful!
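
One rough way to do such a check, sketched here under assumptions: time a synced write of the same amount of data to the NFS run directory and to a node-local disk, then compare. The two paths are hypothetical placeholders, and a single writer does not reproduce the contention of many processes writing at once, so treat the numbers only as an indication.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64, chunk_mb=4):
    """Write size_mb of zeros to a temporary file in `directory` and return MB/s."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    n_chunks = size_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # push the data out before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the temporary file
    return (n_chunks * chunk_mb) / elapsed


if __name__ == "__main__":
    # Hypothetical paths: replace with your NFS run directory and a local disk.
    for label, directory in [("nfs", "/home/user/run"), ("local", "/tmp")]:
        if os.path.isdir(directory):
            rate = write_throughput_mb_s(directory)
            print(f"{label:6s} {directory}: {rate:.1f} MB/s")
```

If the NFS figure is much lower than the local one, running with local files as described above is worth trying.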
Good luck!
Stefano
P.S.: updating the code to a more recent version is not so difficult and VERY
useful. I suggest you try a newer checkpoint (lots of new features and bug
fixes)!
----- Original Message -----
From: "Kevin Oliver" <K.Oliver at uea.ac.uk>
To: <mitgcm-support at mitgcm.org>
Sent: Thursday, June 16, 2005 2:01 PM
Subject: [MITgcm-support] Parallel performance
> Hello,
>
> I wonder if anyone can help me with a problem I have running the model in
> parallel. We have just started running the MIT model on a new cluster:
> Opteron 64-bit, SuSE 9.1, Sun Grid Engine, Linux platform, Myrinet network
> (running release 1 patch 8; this was the version I could get to work on an
> older system). Presumably because not everything is set up optimally, we
> get disappointing performance results running the MIT model in parallel.
>
> The experiment I have done uses a 120x120x40 domain. OLx and OLy are both 3
> and all diffusion is Laplacian. I've run it with 1x1 and 4x4 subgrids (and
> several other combinations in between). Throughput increases more or less
> monotonically with the number of processors used, but the 4x4 experiment is
> only 20% faster than the 1x1 experiment. Delays due to competition in the
> queue are not an issue.
>
> Has anyone looked at how the performance scales with multiple processors on
> a similar setup, so we know what we should be aiming for? Also, is there
> anything I need to look out for in the code (e.g. switches) which could
> affect performance?
>
> Many thanks for your time,
>
> Kevin
>
> _________________________
> Dr Kevin Oliver
> Senior Research Associate
> School of Environmental Sciences
> University of East Anglia
> Norwich, NR4 7TJ
> United Kingdom
> ________________________