[MITgcm-support] Parallel performance

Thu Jun 16 09:17:49 EDT 2005

Hi Kevin,
I had similar problems when I ported the code from an IBM SP4 to a Linux 
cluster that seems similar to yours (Opteron 64, Rocks OS, Sun Grid Engine, 
Ethernet network). Basically, the poor performance when using multiple 
processors was due to I/O problems. The SP4 has a very fast scratch 
filesystem, so I could make the model read and write files on that disk 
without problems (also using the GlobalFiles=.TRUE. option in PARM01 in the 
data file). The same configuration on the new cluster, using an NFS 
filesystem, gave very slow simulations. I solved the problem simply by 
splitting the run across the CPUs: before the beginning of the simulation I 
copy the input files and the executable to each local disk (via scp), then I 
start the run with GlobalFiles=.FALSE. and, at the end of the simulation, I 
copy the output back to the front-end node.
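
As a rough illustration of the staging steps above, here is a minimal Python 
sketch that only builds the ssh/scp command lines (it does not run them). 
The node names, the scratch path, the input file names and the executable 
name are all hypothetical placeholders, not taken from any real setup.

```python
import shlex


def staging_commands(nodes, run_dir, inputs, executable, results_dir):
    """Build (stage_in, stage_out) lists of command argument lists.

    Nothing is executed here; the caller decides how to run the commands
    (for example with subprocess.run(cmd, check=True)).
    """
    stage_in = []
    stage_out = []
    for node in nodes:
        # Make sure the run directory exists on the node's local disk.
        stage_in.append(["ssh", node, "mkdir", "-p", run_dir])
        # Copy the executable and every input file to that directory.
        stage_in.append(["scp", executable, *inputs, f"{node}:{run_dir}/"])
        # After the run, pull the node's run directory back to the front end.
        stage_out.append(
            ["scp", "-r", f"{node}:{run_dir}", f"{results_dir}/{node}"]
        )
    return stage_in, stage_out


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    nodes = ["compute-0-0", "compute-0-1"]
    stage_in, stage_out = staging_commands(
        nodes,
        run_dir="/scratch/mitgcm_run",
        inputs=["data", "eedata"],
        executable="mitgcmuv",
        results_dir="results",
    )
    for cmd in stage_in + stage_out:
        print(shlex.join(cmd))
```

The run itself (with GlobalFiles=.FALSE.) would be started between the 
stage-in and stage-out commands, through whatever the queue system provides.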
Are you using the GlobalFiles=.TRUE. option in PARM01 in the data file?
This may not be the solution to your problem, but checking the I/O could 
be useful!
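
One quick way to check the I/O is to time a plain file write on the NFS 
directory and on a node-local disk and compare the two. The sketch below is 
only one possible approach: the function name and the directories are 
hypothetical, and it measures raw sequential writes rather than the model's 
real I/O pattern.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64, chunk_mb=1):
    """Write a temporary file in `directory` and return the rate in MB/s."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    n_chunks = size_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            # Force the data out of the OS cache so the disk is really timed.
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the temporary file.
        os.remove(path)
    return (n_chunks * chunk_mb) / elapsed


if __name__ == "__main__":
    # Hypothetical directories: replace with the NFS run directory and a
    # node-local scratch directory on your own cluster.
    for label, directory in [("default temp dir", tempfile.gettempdir())]:
        rate = write_throughput_mb_s(directory, size_mb=16)
        print(f"{label}: {rate:.1f} MB/s")
```

A large gap between the two directories would suggest that the filesystem, 
not the model, is what limits the parallel run.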
Good luck!

Stefano

P.S.: updating the code to a more recent version is not so difficult and VERY 
useful. I suggest you try a newer checkpoint (lots of new features and 
bug fixes)!

----- Original Message ----- 
From: "Kevin Oliver" <K.Oliver at uea.ac.uk>
To: <mitgcm-support at mitgcm.org>
Sent: Thursday, June 16, 2005 2:01 PM
Subject: [MITgcm-support] Parallel performance

> Hello,
>
> I wonder if anyone can help me with a problem I have regarding running the
> model in parallel. We have just started running the MIT model on a new
> cluster: Opteron 64 bit SuSE 9.1, Sun Grid Engine, Linux platform, Myrinet
> network (running release 1 patch 8 - this was the version I could get to
> work on an older system). Presumably because not everything is set up
> optimally, we get disappointing performance results running the MIT model
> in parallel.
>
> The experiment I have done uses a 120x120x40 domain. OLx and OLy are both
> 3 and all diffusion is Laplacian. I've run it with 1x1 and 4x4 subgrids
> (and several other combinations in between). Throughput increases
> more-or-less monotonically with the number of processors used, but the 4x4
> experiment is only 20% faster than the 1x1 experiment. Delays due to
> competition in the queue are not an issue.
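
To put the figures quoted above in perspective, here is a small worked 
calculation. It reads "20% faster" as a speedup of 1.2 on 16 processors, 
which is an assumption about how the number was measured. It also computes 
what share of each tile array consists of overlap (halo) points for the 
quoted domain with OLx = OLy = 3. The function names are illustrative only.

```python
def parallel_efficiency(speedup, n_procs):
    """Speedup divided by processor count (1.0 would be ideal scaling)."""
    return speedup / n_procs


def halo_fraction(nx, ny, npx, npy, olx, oly):
    """Fraction of a tile's horizontal array that is overlap points."""
    snx, sny = nx // npx, ny // npy            # interior tile size
    interior = snx * sny
    with_halo = (snx + 2 * olx) * (sny + 2 * oly)
    return (with_halo - interior) / with_halo


if __name__ == "__main__":
    # 4x4 decomposition of the 120x120 domain, assumed speedup of 1.2.
    print(f"efficiency: {parallel_efficiency(1.2, 16):.3f}")
    print(f"halo share, 1x1: {halo_fraction(120, 120, 1, 1, 3, 3):.3f}")
    print(f"halo share, 4x4: {halo_fraction(120, 120, 4, 4, 3, 3):.3f}")
```

Under that reading the efficiency is 1.2/16 = 0.075, i.e. 7.5%. With 4x4 
subgrids each tile is 30x30 interior points inside a 36x36 array, so 396 of 
1296 points are overlap. This does not by itself identify the cause of the 
poor scaling.
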
>
> Has anyone looked at how the performance scales with multiple processors
> on a similar setup, so we know what we should be aiming for? Also, is
> there anything I need to look out for in the code (e.g. switches) which
> could affect performance?
>
> Many thanks for your time,
>
> Kevin
>
> _________________________
> Dr Kevin Oliver
> Senior Research Associate
> School of Environmental Sciences
> University of East Anglia
> Norwich, NR4 7TJ
> United Kingdom
> ________________________
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
>