[MITgcm-support] Parallel performance

eyulaeva at ucsd.edu
Fri Jun 17 19:14:22 EDT 2005


Hello:
I am running a 57b_post version of the MIT model (forward run) on a Linux
cluster and have the same problem of slow I/O.
Is it possible to specify the path to a local (per-node) directory, so
that I/O files are written/read locally without modifying the Fortran code?

Thanks
Elena

> Hi Kevin,
> I had similar problems when I ported the code from an IBM SP4 to a Linux
> cluster that seems similar to yours (Opteron 64, Rocks OS, Sun Grid
> Engine, Ethernet network). Basically, the poor performance when using
> multiple procs was due to I/O problems. The SP4 has a very fast scratch
> filesystem for I/O, so I could make the model read and write files on
> that disk without problems (also using the GlobalFiles=.TRUE. option in
> PARM01 in the data file). The same configuration on the new cluster,
> using an NFS filesystem, gave rise to very slow simulations. I solved the
> problem simply by splitting the I/O across the nodes: before the
> beginning of the simulation I copy the input files and the executable to
> each node's local disk (via scp), then I start the run with
> GlobalFiles=.FALSE. and, at the end of the simulation, I copy the output
> back to the front-end node.
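>
> For reference, a minimal sketch of the relevant entry in the PARM01
> namelist of the data file (the GlobalFiles flag is the one discussed
> here; the surrounding lines just stand in for whatever else your PARM01
> contains):
>
>  &PARM01
>   GlobalFiles=.FALSE.,
>  &
>
> With GlobalFiles=.FALSE. each process reads and writes its own per-tile
> files, which is what makes the per-node local-disk trick above work.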
> Are you using the GlobalFiles=.TRUE. option in PARM01 in the data file?
> Maybe this is not the solution for your problem but a check on the I/O
> could be useful!
> Good luck!
>
> Stefano
>
> P.S.: updating the code to a more recent version is not so difficult and
> VERY useful; I suggest you try a newer checkpoint (lots of new features
> and bug fixes)!
>
>
> ----- Original Message -----
> From: "Kevin Oliver" <K.Oliver at uea.ac.uk>
> To: <mitgcm-support at mitgcm.org>
> Sent: Thursday, June 16, 2005 2:01 PM
> Subject: [MITgcm-support] Parallel performance
>
>
>> Hello,
>>
>> I wonder if anyone can help me with a problem I have regarding running
>> the model in parallel. We have just started running the MIT model on a
>> new cluster: Opteron 64-bit, SuSE 9.1, Sun Grid Engine, Linux platform,
>> Myrinet network (running release 1 patch 8 - this was the version I
>> could get to work on an older system). Presumably because not everything
>> is set up optimally, we get disappointing performance results running
>> the MIT model in parallel.
>>
>> The experiment I have done uses a 120x120x40 domain. OLx and OLy are
>> both 3 and all diffusion is Laplacian. I've run it with 1x1 and 4x4
>> subgrids (and several other combinations in between). Throughput
>> increases more-or-less monotonically with the number of processors used,
>> but the 4x4 experiment is only 20% faster than the 1x1 experiment.
>> Delays due to competition in the queue are not an issue.
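>>
>> (For concreteness, a sketch of what the SIZE.h settings look like for
>> the 4x4 case, assuming one tile per process; the 120x120x40 domain and
>> OLx=OLy=3 are as described above, and the tile sizes follow from the
>> decomposition:)
>>
>>       PARAMETER (
>>      &           sNx =  30,
>>      &           sNy =  30,
>>      &           OLx =   3,
>>      &           OLy =   3,
>>      &           nSx =   1,
>>      &           nSy =   1,
>>      &           nPx =   4,
>>      &           nPy =   4,
>>      &           Nx  = sNx*nSx*nPx,
>>      &           Ny  = sNy*nSy*nPy,
>>      &           Nr  =  40)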
>>
>> Has anyone looked at how the performance scales with multiple processors
>> on a similar setup, so we know what we should be aiming for? Also, is
>> there anything I need to look out for in the code (e.g. switches) which
>> could affect performance?
>>
>> Many thanks for your time,
>>
>> Kevin
>>
>> _________________________
>> Dr Kevin Oliver
>> Senior Research Associate
>> School of Environmental Sciences
>> University of East Anglia
>> Norwich, NR4 7TJ
>> United Kingdom
>> ________________________
>>
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>>
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
>



