[MITgcm-devel] sx8 latest testing
Martin Losch
Martin.Losch at awi.de
Tue Jan 12 10:05:26 EST 2010
Me again,
unrelated to the recent problem, but lab_sea/run and the two
offline_exf_seaice experiments fail for the same reason: too many
netcdf files are opened. For some reason, opening a netcdf file
requires a lot of RAM on this machine, and I am already at the limit
of 32 GB of RAM for my test jobs (more RAM would mean a different
queue and unnecessary waiting).
Why are there so many netcdf files? Because in data.diagnostics there
is a separate file for every variable (times 4 tiles!). While this
makes sense if useMNC=.false., it is not useful for netcdf output. I
suggest to
1. turn off the mnc package in lab_sea/input/data.pkg (as it is turned
off for the other sub-experiments)
2. change "data.diagnostics" for offline_exf_seaice so that the
diagnostics package opens only one or two netcdf files (see the sketch
below).
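A minimal sketch of what I mean for data.diagnostics (the field names,
file name and output frequency are placeholders only, not what the
experiment actually uses):
 &DIAGNOSTICS_LIST
# several fields share one output file instead of one file per field
  fields(1:3,1) = 'ETAN    ','SIheff  ','SIarea  ',
  fileName(1)   = 'diags2D',
  frequency(1)  = 43200.,
 &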
Any objection?
Martin
On Jan 12, 2010, at 3:34 PM, Martin Losch wrote:
> Hi Jean-Michel,
> compared to Dec 06, 2009, these are the extra fails (they compiled
> but did not run):
>
> deep_anelastic
> fizhi-gridalt-hs
> flt_example
> global_with_exf (2)
> hs94.cs-32x32x5
>
> everything else looks pretty much the same.
>
> The reason for this is unclear but probably unrelated to the model.
> All of these experiments finished normally and only the comparison is
> missing. (BTW, some of the other missing experiments: dome,
> global_ocean.cs32x15.icedyn and thsice, completed as well, so it's
> the same problem there; the other two fizhi experiments fail as
> usual, probably with a seg-fault, and the lab_sea and
> offline_exf_seaice experiments have a different problem that I have
> not yet tracked down.)
>
> Because of cross-compiling I need to run testreport on the head node
> and run the models with individual qsub commands. qsub on this
> machine does not have a flag that returns control to the calling
> shell only after the job has completed, so I have to make testreport
> wait for a specific output file to appear before it continues (see
> also the testreport call sketched below). This is my job script:
>> x8::scripts> less runit_sxf90
>> #!/bin/sh
>> # submit the job
>> qsub -q sx8-r /home/sx8/mlosch/scripts/job_sxf90
>>
>> sleep 10
>> stillruns=`qstat -n -u mlosch | grep testsx8`
>> # wait until the job is finished; do this by waiting for output.txt
>> # to appear
>> while [ ! -e output.txt ]
>> do
>>   sleep 10
>>   stillruns=`qstat -n -u mlosch | grep testsx8`
>>   echo "output of qstat "${stillruns}x
>>   if [ "${stillruns}"x = x ] ; then
>>     exit
>>   fi
>> done
>> #
> and in job_sxf90 I do this:
>> #PBS -q sx8-r                       # job queue, not necessary so far
>> #PBS -N testsx8                     # give the job a name
>> #PBS -l cpunum_job=2                # cpus per node
>> #PBS -l cputim_job=2:00:00          # time limit
>> #PBS -l memsz_job=32gb              # max accumulated memory; we need
>>                                     # this much because of the many
>>                                     # netcdf files
>> #PBS -j o                           # join i/o
>> #PBS -S /bin/sh
>> #PBS -o /home/sx8/mlosch/out_sxf90  # where to write output
>> #
>>
>> cd ${PBS_O_WORKDIR}
>> (mpirun -np 2 ./mitgcmuv && cp STDOUT.0000 output.txt && \
>>  echo "NORMAL END" >> run.log) || cp STDOUT.0000 output.txt
>
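> On the head node, this wrapper then plugs into testreport more or
> less like this (only a sketch; the exact flags may differ):
>> cd MITgcm/verification
>> ./testreport -mpi -command '/home/sx8/mlosch/scripts/runit_sxf90'
>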
> So it's not pretty, and I assume that for some runs it simply does
> not work. To be honest, I don't feel like hunting down the problem:
> it has nothing to do with the model, and I have already tried to fix
> it with the help of the system administrator, without success.
>
> BTW, the edvir machine has been retired completely and replaced by
> something called iblade (IBM P6). If we need this platform in our
> tests, please let me know and I'll try to set something up there.
>
> Martin
>
> On Jan 11, 2010, at 6:50 PM, Jean-Michel Campin wrote:
>
>> Hi Martin,
>>
>> looks like the last testing on sx8 has more "fail"s than usual.
>> Do you know why? Something wrong in the code for this
>> platform?
>>
>> Thanks,
>> Jean-Michel
>>
>