[MITgcm-devel] problem with my last checkin

Jean-Michel Campin jmc at ocean.mit.edu
Tue Feb 5 18:48:45 EST 2008


Hi Ed and others,

Thanks for your answer. I knew I was doing something wrong
but did not know what, and you pointed to the right thing
(I forgot to purge mpich/intel but compiled with the g77+mpi_aces
optfile!).
I've just repeated a clean test, and
a) it seems that useSETRLSTK=.TRUE. is "working" on aces with g77+mpi_aces
 (I no longer get a seg fault)
b) fizhi-cs-aqualev20 now stops in MON_SOLUTION with temperature
out of bounds:
(PID.TID 0000.0001) SOLUTION IS HEADING OUT OF BOUNDS: tMin,tMax= -INF  7.550E+02
(PID.TID 0000.0001) MON_SOLUTION: STOPPING CALCULATION
c) fizhi-cs-32x32x40 hangs indefinitely

In conclusion, even though useSETRLSTK works for both mpi and non-mpi
runs, it cannot be used for fizhi-cs-32x32x40, since it would prevent
the current g77 test on aces from running.
One solution would be to skip these 2 experiments on aces with g77
(testreport -skipdir 'fizhi-cs-aqualev20 fizhi-cs-32x32x40').
Should we do that?

Jean-Michel

On Tue, Feb 05, 2008 at 04:06:22PM -0500, Ed Hill wrote:
> On Tue, 5 Feb 2008 12:51:10 -0500 Jean-Michel Campin wrote:
> > 
> > But things are not completely clear, because I get a different error:
> > > forrtl: severe (36): attempt to access non-existent record, unit 9,
> > > file /home/jmc/gcm_current/verification/fizhi-cs-32x32x40/run/dxC1_dXYa.face001.bin
> > > Image              PC        Routine            Line        Source
> > > mitgcmuv           083E380C  Unknown               Unknown  Unknown
> > 
> > which is not what it used to be:
> > > /usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1: 29506
> > > Segmentation fault 
> > Will see.
> 
> 
> Hi Jean-Michel and Chris,
> 
> I removed useSETRLSTK because, as I remember it, one or two of the
> fizhi verification experiments compiled with g77 would go off into
> "lala land" on the ACES cluster.  By that, I mean they would not die
> (e.g., due to a seg-fault), they would not stop running, and they would
> not produce any results.  They would consume all of the time remaining
> for the PBS job and would eventually be killed by the PBS job-cleanup
> scripts.  And, unfortunately, I could not think of any way to detect
> and deal with that situation within testreport.
> 
> Also, the "forrtl" above means you're using the Intel Fortran Run Time
> Library.  I could be mistaken but I think the fizhi/setrlstk/"won't
> die" problem only happened with g77.
> 
> Anyway, good luck!  :-)
> 
> Ed
> 
> -- 
> Edward H. Hill III, PhD  |  ed at eh3.com  |  http://eh3.com/



> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel



