[MITgcm-devel] problem with my last checkin
Jean-Michel Campin
jmc at ocean.mit.edu
Tue Feb 5 18:48:45 EST 2008
Hi Ed and others,
Thanks for your answer. I knew I was doing something wrong
but did not know what, and you pointed to the right thing
(I forgot to purge mpich/intel, but compiled with the g77+mpi_aces
optfile!).
I've just repeated a clean test, and
a) it seems that useSETRLSTK=.TRUE. is "working" on aces with g77+mpi_aces
(I don't get a seg fault anymore)
b) fizhi-cs-aqualev20 now stops in MON_SOLUTION with temperature
out of bounds:
(PID.TID 0000.0001) SOLUTION IS HEADING OUT OF BOUNDS: tMin,tMax= -INF 7.550E+02
(PID.TID 0000.0001) MON_SOLUTION: STOPPING CALCULATION
c) fizhi-cs-32x32x40 hangs indefinitely
In conclusion, even if useSETRLSTK works for both mpi and non-mpi
runs, it cannot be used with fizhi-cs-32x32x40, since that would prevent
the current g77 test on aces from running.
A solution would be not to test those 2 experiments on aces with g77
(testreport -skipdir 'fizhi-cs-aqualev20 fizhi-cs-32x32x40').
Should we do that?
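For the hang in (c), an alternative to skipping the experiment would be to run the executable under a wall-clock limit from outside testreport. What follows is only a minimal sketch of that idea in Python, not something testreport provides: the helper name run_with_timeout, the time limit, and the example command are all hypothetical. Note also that killing the launcher process does not necessarily stop every MPI rank it started, so this may need extra cleanup on a cluster.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome.

    Returns a (status, returncode) tuple where status is
    "ok", "failed" or "timeout" (hypothetical helper, for illustration).
    """
    try:
        # subprocess.run kills the child and waits for it if the
        # timeout expires, then raises TimeoutExpired.
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    status = "ok" if proc.returncode == 0 else "failed"
    return (status, proc.returncode)


# Example call (assumed command line and limit, adjust to the local setup):
#   run_with_timeout(["mpirun", "-np", "6", "./mitgcmuv"], 1800)
```

A run that comes back as "timeout" could then be reported as a failure instead of consuming the rest of the PBS job.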
Jean-Michel
On Tue, Feb 05, 2008 at 04:06:22PM -0500, Ed Hill wrote:
> On Tue, 5 Feb 2008 12:51:10 -0500 Jean-Michel Campin wrote:
> >
> > But things are not completely clear, because I get a different error:
> > > forrtl: severe (36): attempt to access non-existent record, unit 9,
> > > file /home/jmc/gcm_current/verification/fizhi-cs-32x32x40/run/dxC1_dXYa.face001.bin
> > > Image PC Routine Line Source
> > > mitgcmuv 083E380C Unknown Unknown Unknown
> >
> > which is not what it used to be:
> > > /usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1: 29506
> > > Segmentation fault
> > Will see.
>
>
> Hi Jean-Michel and Chris,
>
> I removed useSETRLSTK because, as I remember it, one or two of the
> fizhi verification experiments compiled with g77 would go off into
> "lala land" on the ACES cluster. By that, I mean they would not die
> (e.g., due to a seg-fault), they would not stop running, and they would
> not produce any results. They would consume all of the time remaining
> for the PBS job and would eventually be killed by the PBS job-cleanup
> scripts. And, unfortunately, I could not think of any way to detect
> and deal with that situation within testreport.
>
> Also, the "forrtl" above means you're using the Intel Fortran Run Time
> Library. I could be mistaken but I think the fizhi/setrlstk/"won't
> die" problem only happened with g77.
>
> Anyway, good luck! :-)
>
> Ed
>
> --
> Edward H. Hill III, PhD | ed at eh3.com | http://eh3.com/
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel