[MITgcm-devel] sx8 latest testing
Martin Losch
Martin.Losch at awi.de
Tue Jan 12 09:34:28 EST 2010
Hi Jean-Michel,
Compared to Dec 06, 2009, these are the extra fails (compiled but did
not run):
deep_anelastic
fizhi-gridalt-hs
flt_example
global_with_exf (2)
hs94.cs-32x32x5
Everything else looks pretty much the same.
The reason for this is unclear but probably unrelated to the model.
All experiments finished regularly and only the comparison is
missing. (BTW, some of the other experiments that are missing: dome,
global_ocean.cs32x15.icedyn and thsice, are complete, too, so it's the
same problem there; the other two fizhi experiments fail as usual,
probably seg-fault, and the lab_sea and offline_exf_seaice experiments
have a different problem that I have not yet found).
Because of cross-compiling, I need to run testreport on the head node
and run the models with individual qsub commands. The qsub on this
machine has no flag that makes it return control to the calling
shell only after the job completes, so I have to make
testreport wait for a specific output file to appear before it
continues. This is my job script:
> x8::scripts> less runit_sxf90
> #!/bin/sh
> # submit the job
> qsub -q sx8-r /home/sx8/mlosch/scripts/job_sxf90
>
> sleep 10
> stillruns=`qstat -n -u mlosch | grep testsx8`
> # wait until the job is finished; do this by waiting for output.txt to appear
> while [ ! -e output.txt ]
> do
> sleep 10
> stillruns=`qstat -n -u mlosch | grep testsx8`
> echo "output of qstat "${stillruns}x
> if [ "${stillruns}"x = x ] ; then
> exit
> fi
> done
> #
and in job_sxf90 I do this:
> #PBS -q sx8-r # job queue, not necessary so far
> #PBS -N testsx8 # give the job a name
> #PBS -l cpunum_job=2 # cpus per node
> #PBS -l cputim_job=2:00:00 # time limit
> #PBS -l memsz_job=32gb # max accumulated memory, we need this much because of many netcdf files
> #PBS -j o # join stdout and stderr
> #PBS -S /bin/sh
> #PBS -o /home/sx8/mlosch/out_sxf90 # where to write output
> #
>
> cd ${PBS_O_WORKDIR}
> (mpirun -np 2 ./mitgcmuv && cp STDOUT.0000 output.txt && echo "NORMAL END" >> run.log) || cp STDOUT.0000 output.txt
So it's not pretty and I assume that for some runs it just does not
work. To be honest, I don't feel like finding the problem, because it
does not have anything to do with the model and I already tried to fix
it with the help of the system administrator, but we were not
successful.
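For reference, the wait loop described above could be made a bit more defensive. The following is only a sketch, not what runs on the sx8: the function name wait_for_file, its arguments and its return codes are made up for illustration. It assumes that a time limit is wanted and that the output file may show up on the head node slightly after the job has left the queue, which is a guess and not a diagnosed cause of the missing comparisons.

```shell
#!/bin/sh
# Hypothetical helper (not part of testreport or the scripts above).
# Usage: wait_for_file FILE TIMEOUT_SECONDS INTERVAL_SECONDS [JOB_ALIVE_CMD]
# Returns 0 if FILE exists,
#         1 if TIMEOUT_SECONDS passed without FILE appearing,
#         2 if JOB_ALIVE_CMD failed (job no longer queued) and FILE is
#           still missing after one extra INTERVAL_SECONDS of grace.
wait_for_file() {
    file=$1
    timeout=$2
    interval=$3
    job_alive_cmd=${4:-true}   # default: assume the job is always alive
    waited=0
    while [ ! -e "$file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        if ! sh -c "$job_alive_cmd" >/dev/null 2>&1; then
            # The job has left the queue; look once more after a short
            # pause in case the file becomes visible with a delay.
            sleep "$interval"
            [ -e "$file" ] && return 0
            return 2
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

With the user and job name from the scripts above, a call could look like `wait_for_file output.txt 7200 10 'qstat -n -u mlosch | grep -q testsx8'`. The distinct return codes would let the caller tell a timeout from a job that disappeared without writing output.txt.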
BTW, the edvir machine is completely gone and has been replaced by
something called iblade (IBM P6). If we need this platform in our
tests, please let me know and I'll try to do something there.
Martin
On Jan 11, 2010, at 6:50 PM, Jean-Michel Campin wrote:
> Hi Martin,
>
> looks like the last testing on sx8 has more "fail" than usually.
> Do you know why ? Something wrong in the code for this
> platform ?
>
> Thanks,
> Jean-Michel
>
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel