[MITgcm-devel] problem with my last checkin
Jean-Michel Campin
jmc at ocean.mit.edu
Tue Feb 5 12:51:10 EST 2008
Hi Chris,
Here is what I found in MITgcm/verification/fizhi-cs-aqualev20/input/eedata :
# unlimit the stack size for the FIZHI rad code
#EH3 useSETRLSTK is commented out (default: false) since, with the g77
#EH3 compiler, it causes the model to hang *without* returning -- thus
#EH3 killing all our automated g77 testing on, for instance, the ACES
#EH3 cluster.
#EH3 useSETRLSTK=.TRUE.,
I've just tried to run this test with g77+mpi on aces, with useSETRLSTK=.TRUE.,
and it seems to work (it does not fix the problem with g77+mpi on aces,
but it does not hang and finishes with the same error as without useSETRLSTK=.TRUE.).
But things are not completely clear, because I get a different error:
> forrtl: severe (36): attempt to access non-existent record, unit 9, file /home/jmc/gcm_current/verification/fizhi-cs-32x32x40/run/dxC1_dXYa.face001.bin
> Image PC Routine Line Source
> mitgcmuv 083E380C Unknown Unknown Unknown
which is not what it used to be:
> /usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1: 29506 Segmentation fault
Will see.
Jean-Michel
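
For reference, the mechanism discussed in this thread (unlimiting the stack
from inside the program through POSIX setrlimit(), instead of relying on
"ulimit -s unlimited" in the calling shell) can be sketched in C as below.
This is only a minimal illustration under stated assumptions, not the actual
MITgcm code: the function name raise_stack_limit is hypothetical, and the
sketch raises the soft stack limit only as far as the hard limit, which is
the most an unprivileged process is allowed to do.

```c
#include <sys/resource.h>

/* Hypothetical helper (not part of MITgcm): raise the soft stack
 * limit to the hard limit using POSIX getrlimit()/setrlimit().
 * Returns 0 on success, -1 if either call fails. */
int raise_stack_limit(void)
{
    struct rlimit rl;

    /* Read the current soft (rlim_cur) and hard (rlim_max) limits. */
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    /* The soft limit may be raised up to the hard limit without
     * special privileges; if the hard limit is RLIM_INFINITY this
     * amounts to "unlimit". */
    rl.rlim_cur = rl.rlim_max;

    if (setrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;

    return 0;
}
```

A caller would invoke raise_stack_limit() once, early at start-up and before
any routine with large local arrays is entered, and would treat a return
value of -1 as "limit unchanged" rather than as a fatal error.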
On Tue, Feb 05, 2008 at 10:59:44AM -0500, chris hill wrote:
> Hi All,
>
> In theory if you have
> useSETRLSTK=.TRUE.,
> in "eedata" the stack will get automatically unlimited
> on Linux systems (and other systems that support Posix setrlimit() ).
> genmake tests for setrlimit(), so that where it doesn't exist
> it should be #ifdef'd out.
> I remember there were some problems with this in the past, but I just
> took a look and the code looks OK to me, so maybe we should try
> activating this again?
>
> Chris
>
> Jean-Michel Campin wrote:
> >Hi Martin,
> >
> >On Tue, Feb 05, 2008 at 03:49:17PM +0100, Martin Losch wrote:
> >>No, I didn't, stupid me. The seg-fault goes away with "unlimit", but
> >>still I don't see how my changes lead to a stack overflow.
> >>
> >>Also, is the "unlimit" taken care of with the automated tests?
> >Yes and No: it's among the first commands at the top of the script:
> >MITgcm/tools/example_scripts/faulks/test_csail: lines 19 & 20;
> >># Turn off stack limit for FIZHI
> >>ulimit -s unlimited
> >
> >Jean-Michel
> >
> >>Martin
> >>On 5 Feb 2008, at 15:33, Patrick Heimbach wrote:
> >>
> >>>Did you try the "usual" seg fault candidate first:
> >>>
> >>>unlimit
> >>>
> >>>-p.
> >>>
> >>>On Feb 5, 2008, at 9:06 AM, Martin Losch wrote:
> >>>
> >>>>Hi there,
> >>>>
> >>>>I have just checked in some functionality that is handy for
> >>>>rotated spherical grids (basically a few new scalar variables in
> >>>>PARAMS.h and a new subroutine that recomputes XC/YC/XG/YG in one
> >>>>special case). The new code does not change the verification
> >>>>experiments on my linux_ia32_g77 machine. But now I am rerunning
> >>>>testreport on hugo.csail.mit.edu with the same build_options_file
> >>>>and I get segmentation faults (linux_ia32_g77) for
> >>>>fizhi-cs-aqualev and fizhi-cs-32x32x40. Everything else is OK.
> >>>>
> >>>>I have tried to find the problem, but the segmentation fault
> >>>>happens somewhere within fizhi; here's the debugger output:
> >>>>>Program received signal SIGSEGV, Segmentation fault.
> >>>>>0x0805aca8 in solir_ (m=0xbf897574, n=0x8581468, ndim=0x8581468,
> >>>>>np=0x85810e0, wh=0xbf114b50, taucld=0xbf24cde0,
> >>>>>tauclb=0xbf1c8b80, tauclf=0xbf18cb70, reff=0xbf2c4df0,
> >>>>>ict=0xbf897598, icb=0xbf89759c,
> >>>>> fcld=0xbf574f40, cc=0xbf204b90, taual=0xbf210dd0,
> >>>>>csm=0xbefdcad0, rsirbm=0xbf47aec0, rsirdf=0xbf478eb0,
> >>>>>flx=0xbf42ee40, flc=0xbf3f0e30, fdirir=0xbf472e80,
> >>>>>fdifir=0xbf470e70) at fizhi_swrad.f:1732
> >>>>>1732 ssaclt(i,j)=1.0
> >>>>>Current language: auto; currently fortran
> >>>>>
> >>>>I have no idea what's going on, and I can't even run g77
> >>>>-fbounds-check, because in fizhi there are so many assignments
> >>>>where this array bound check chokes, e.g. variable mndy(12,4) is
> >>>>accessed via DO I=1,48; mnc(I,1)=...; ENDDO, which is technically
> >>>>correct but makes it impossible to debug these files. What am I
> >>>>to do? Remove my changes again?
> >>>>
> >>>>Martin
> >>>>
> >>>>_______________________________________________
> >>>>MITgcm-devel mailing list
> >>>>MITgcm-devel at mitgcm.org
> >>>>http://mitgcm.org/mailman/listinfo/mitgcm-devel
> >>>---
> >>>Patrick Heimbach | heimbach at mit.edu | http://www.mit.edu/~heimbach
> >>>MIT | EAPS 54-1518 | 77 Massachusetts Ave | Cambridge MA 02139 USA
> >>>FON +1-617-253-5259 | FAX +1-617-253-4464 | SKYPE patrick.heimbach