[MITgcm-devel] good news for FIZHI !!!

Jean-Michel Campin jmc at ocean.mit.edu
Sat Jul 30 16:40:47 EDT 2005


Hello Ed,

It's interesting.

I ran some tests a few months ago with a very simple program
that requires dynamic allocation. With ifort on faulks, setting
the stack size to unlimited did not change anything: I was still
getting the segmentation fault. With pgf77 and g77, my small test
works fine.

Maybe it's the ifort compiler on faulks, or the options
that I was using?

Jean-Michel

On Sat, Jul 30, 2005 at 12:24:09PM -0400, Ed Hill wrote:
> 
> Hi Andrea, Jean-Michel, and Chris
> 
> I have some excellent FIZHI news but first let me tell you how I found
> it since it might be helpful in the future.
> 
> Despite Andrea's (good!) efforts to remove Fortran "save" and any other
> not-so-portable idioms, FIZHI still only ran with the PGI compiler
> although it compiled without any real trouble on the others.  So in an
> effort to get it running on Columbia, I spent a few hours on it last
> night and this morning trying to get it working with g77 and ifort on my
> laptop.  The key thing I discovered was that both ifort and g77
> segfaulted at exactly the same place in the irrad() routine:
> 
>   $ gdb ./mitgcmuv
> 
>     [...lots of output...]
> 
>   Program received signal SIGSEGV, Segmentation fault.
>   0x080b52c6 in irrad_ ()
> 
>   (gdb) bt
>   #0  0x080b52c6 in irrad_ ()
>   #1  0x080b223f in lwrio_ ()
>   #2  0x0809ddc4 in fizhi_driver__ ()
>   #3  0x08091374 in do_fizhi__ ()
>   #4  0x0810ac48 in fizhi_wrapper__ ()
>   #5  0x081ccfe7 in do_atmospheric_phys__ ()
>   #6  0x081d6746 in forward_step__ ()
>   #7  0x0820835c in the_main_loop__ ()
>   #8  0x082084b5 in the_model_main__ ()
>   #9  0x081acee3 in MAIN__ ()
>   #10 0x0820e4c5 in main ()
>   (gdb) 
> 
> So, I then wasted hours trying to pare down irrad() to figure out the
> exact line(s) causing the segfault.  With both g77 and ifort, the
> segfault happened before the first statement in irrad() was executed.
> That's the tip-off.  It wasn't the irrad() code or any of the many DATA
> statements it contains.  It was irrad() simply using too much stack
> space with its many local variables!
> 
> So, I set the stack size to unlimited in my shell:
> 
>   $ ulimit -s unlimited
> 
> and now you can see on our testing pages that the FIZHI experiment works
> reasonably well on the "ernie" machine (my laptop) which has ifort v8.1
> and g77 v3.4.4 installed (Fedora Core 3):
> 
>   http://mitgcm.org/testing.html
> 
> I'll look into our testing scripts next and try to figure out how to do
> the ulimit command within MPI runs.
> 
> Ed
> 
> ps - This means the PGI compiler is almost certainly creating Fortran 
>      local variables on the heap instead of the stack.  Yeah, there's
>      some useful trivia.
> 
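
One possible way to apply the ulimit setting within MPI runs, as Ed mentions above, is a small wrapper script that raises the stack limit and then starts the executable. This is only a sketch, not what the testing scripts actually do: the script name run_with_stack.sh and the raise_stack_limit function are hypothetical, and it assumes bash is available on the compute nodes.

```shell
#!/bin/bash
# Hypothetical wrapper (call it run_with_stack.sh): raise the stack limit,
# then run whatever command was passed as arguments.

raise_stack_limit() {
    # Ask for an unlimited stack; if the hard limit forbids that,
    # fall back to the largest value the hard limit allows.
    ulimit -s unlimited 2>/dev/null || ulimit -s "$(ulimit -H -s)"
}

raise_stack_limit
echo "stack limit now: $(ulimit -s)" >&2

# Replace this shell with the real program so it inherits the new limit.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Each MPI process would then be started through the wrapper, for example with something like "mpirun -np 4 ./run_with_stack.sh ./mitgcmuv" (the launcher name and its flags depend on the MPI installation and are assumed here). Printing the resulting limit to stderr makes it easy to see whether a given node really accepted the new value.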


