[MITgcm-devel] good news for FIZHI !!!

Ed Hill ed at eh3.com
Sat Jul 30 12:24:09 EDT 2005


Hi Andrea, Jean-Michel, and Chris

I have some excellent FIZHI news but first let me tell you how I found
it since it might be helpful in the future.

Despite Andrea's (good!) efforts to remove Fortran "save" and any other
not-so-portable idioms, FIZHI still only ran with the PGI compiler
although it compiled without any real trouble on the others.  So in an
effort to get it running on Columbia, I spent a few hours on it last
night and this morning trying to get it working with g77 and ifort on my
laptop.  The key thing I discovered was that both ifort and g77
segfaulted at exactly the same place in the irrad() routine:

  $ gdb ./mitgcmuv

    [...lots of output...]

  Program received signal SIGSEGV, Segmentation fault.
  0x080b52c6 in irrad_ ()

  (gdb) bt
  #0  0x080b52c6 in irrad_ ()
  #1  0x080b223f in lwrio_ ()
  #2  0x0809ddc4 in fizhi_driver__ ()
  #3  0x08091374 in do_fizhi__ ()
  #4  0x0810ac48 in fizhi_wrapper__ ()
  #5  0x081ccfe7 in do_atmospheric_phys__ ()
  #6  0x081d6746 in forward_step__ ()
  #7  0x0820835c in the_main_loop__ ()
  #8  0x082084b5 in the_model_main__ ()
  #9  0x081acee3 in MAIN__ ()
  #10 0x0820e4c5 in main ()
  (gdb) 

So, I then wasted hours trying to pare down irrad() to figure out the
exact line(s) causing the segfault.  With both g77 and ifort, the
segfault happened before the first statement in irrad() was executed.
That's the tip-off.  It wasn't irrad() code or any of the many DATA
statements it contains.  It was irrad() simply using too much stack
space with its many local variables!
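A quick way to check how much stack a shell hands to the programs it
starts is to print the soft and hard limits.  This is only a minimal
sketch assuming bash; the function name stack_limits is made up for
illustration and is not part of MITgcm or its testing scripts:

```shell
#!/bin/bash
# stack_limits: hypothetical helper (not part of MITgcm).
# Prints the soft and hard stack limits of the current shell.
# Values are reported in KiB, or as the word "unlimited".
stack_limits() {
    local soft hard
    soft=$(ulimit -S -s)   # limit actually enforced on new processes
    hard=$(ulimit -H -s)   # ceiling the soft limit may be raised to
    echo "soft=${soft} hard=${hard}"
}

stack_limits
```

If the soft value is a few thousand KiB, a routine with many large
local arrays can plausibly exhaust it on entry, before any of its
statements run.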

So, I set the stack size to unlimited in my shell:

  $ ulimit -s unlimited

and now you can see on our testing pages that the FIZHI experiment works
reasonably well on the "ernie" machine (my laptop) which has ifort v8.1
and g77 v3.4.4 installed (Fedora Core 3):

  http://mitgcm.org/testing.html

I'll look into our testing scripts next and try to figure out how to do
the ulimit command within MPI runs.
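One possible approach for MPI runs, sketched here under assumptions
rather than taken from the MITgcm testing scripts, is to start each
rank through a small wrapper that raises the limit and then runs the
model.  The name run_with_stack is hypothetical, and whether a given
MPI launcher preserves the limit this way depends on the MPI
implementation and the system configuration:

```shell
#!/bin/bash
# run_with_stack: hypothetical helper (not part of MITgcm).
# Raises the soft stack limit to the hard limit inside a subshell,
# then replaces that subshell with the requested command, so the
# calling shell's own limits are left untouched.
run_with_stack() {
    (
        ulimit -S -s "$(ulimit -H -s)" || exit 1
        exec "$@"
    )
}

# Demonstration: the child process sees the raised soft limit.
run_with_stack bash -c 'echo "child soft stack limit: $(ulimit -S -s)"'
```

Saved as a standalone script that ends in exec "$@", the same logic
could sit between the launcher and the executable, for example
"mpirun -np 2 ./stack_wrapper.sh ./mitgcmuv", where stack_wrapper.sh
is again an assumed name.  If the hard limit itself is too small, this
will not help and the limit has to be raised by the administrator.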

Ed

ps - This means the PGI compiler is almost certainly creating Fortran
     local variables on the heap instead of the stack.  Yeah, there's
     some useful trivia.

-- 
Edward H. Hill III, PhD
office:  MIT Dept. of EAPS;  Rm 54-1424;  77 Massachusetts Ave.
             Cambridge, MA 02139-4307
emails:  eh3 at mit.edu                ed at eh3.com
URLs:    http://web.mit.edu/eh3/    http://eh3.com/
phone:   617-253-0098
fax:     617-253-4464



