[MITgcm-support] adjoint for MITgcm before checkpoint61 and staf flag -nonew_arg

Patrick Heimbach heimbach at MIT.EDU
Mon Jan 25 23:36:48 EST 2010


Santha,

I guess you are using an older MITgcm version.
Since checkpoint61 we had to change some subroutine argument lists
in the adjoint support package because Fastopt no longer
supports the TAF flag "-nonew_arg".

However, for older MITgcm versions this flag is still necessary.
To be able to adjoint older versions you have to make the following
modifications:

1) change your Makefile:
    add the flag "-version 1.9.22" to the line starting with
    "AD_TAF_FLAGS" (but keep the "-nonew_arg" flag!):

AD_TAF_FLAGS         = ...

2) add one line to the latest staf script
    (Fastopt removed it from its latest script version):

        -new|-nonew_arg)         OPTIONS="$OPTIONS $1"; shift;;

    So the block where this line needs to go should look,
    with the added line, something like:
         -openmp|-openmp1)        OPTIONS="$OPTIONS $1"; shift;;
         -ompjac|-ignore-omp)     OPTIONS="$OPTIONS $1"; shift;;
         -mpi|-craypointer)       OPTIONS="$OPTIONS $1"; shift;;
         -new|-nonew_arg)         OPTIONS="$OPTIONS $1"; shift;;
         -free)                   OPTIONS="$OPTIONS $1"; FREE=1; shift;;
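
To see what the added line does, here is a minimal sketch of the same kind of case loop. It is a stand-in for illustration only, not the real staf script: the function name collect_options, the surrounding while loop and the catch-all branch are assumptions of mine; only the three case lines are taken from the block above.

```shell
# Stand-in for the option loop; NOT the real staf script.
# collect_options is a hypothetical name used only for this sketch.
collect_options() {
  OPTIONS=""
  FREE=0
  while [ $# -gt 0 ]; do
    case "$1" in
      -mpi|-craypointer)   OPTIONS="$OPTIONS $1"; shift;;
      -new|-nonew_arg)     OPTIONS="$OPTIONS $1"; shift;;   # the re-added line
      -free)               OPTIONS="$OPTIONS $1"; FREE=1; shift;;
      # Assumed fallback: how the real script treats unknown flags is not
      # stated in this message.
      *)                   echo "unhandled: $1" >&2; shift;;
    esac
  done
  # Print the flags that would be passed on.
  echo "$OPTIONS"
}

collect_options -mpi -nonew_arg -free
```

With the "-new|-nonew_arg)" branch present, "-nonew_arg" ends up in OPTIONS like the other flags; without it, the flag would fall through to whatever the script does with arguments it does not recognise.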

The way this backward compatibility is (not) handled by Fastopt
is not helpful. I'll ask Fastopt to change this, so that a consistent
set of old MITgcm code, older TAF version, and a backward-compatible
staf script can be used together.
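
Until then, a quick check like the following may help confirm that both modifications above are in place before running the adjoint build. This is only a sketch: the function name check_old_taf_setup is made up, and the paths to the Makefile and to the staf script are whatever applies on your system.

```shell
# Hypothetical helper: returns 0 only if both modifications appear to be in place.
# $1 = path to the Makefile, $2 = path to the staf script (supplied by the caller).
check_old_taf_setup() {
  makefile=$1
  staf=$2
  # Step 1: the AD_TAF_FLAGS line must carry both flags.
  flags=$(grep '^AD_TAF_FLAGS' "$makefile") || { echo "no AD_TAF_FLAGS line"; return 1; }
  case "$flags" in
    *-nonew_arg*) ;;
    *) echo "AD_TAF_FLAGS lacks -nonew_arg"; return 1;;
  esac
  case "$flags" in
    *"-version 1.9.22"*) ;;
    *) echo "AD_TAF_FLAGS lacks -version 1.9.22"; return 1;;
  esac
  # Step 2: the staf script must contain the re-added case label (fixed-string match).
  grep -F -q -e '-new|-nonew_arg)' "$staf" || { echo "staf lacks the -nonew_arg case"; return 1; }
  echo "ok"
}
```

Call it as check_old_taf_setup followed by the path to your Makefile and the path to your staf script; it prints "ok" when both changes are found and otherwise names the first one that is missing.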

-Patrick



On Jan 25, 2010, at 10:49 PM, Santha Akella wrote:

> Patrick,
>
> I hit another problem now.
> After computing the cost, my code dies and produces the
> following error message in the PBS output (for all the CPUs). I guess
> it has something to do with the adjoint and TAF. Everything seemed
> to work fine when I was using -version 1.9.48 in AD_TAF_FLAGS. Now
> TAF no longer accepts 1.9.48. I
> checked with staf -show versions and got: 1.9.22, 1.9.63, 1.9.65,
> 1.9.66. I tried 1.9.22, and I am having these problems... Thanks
> for the help! Santha
>
>
> !!!!!!! PANIC !!!!!!! CATASTROPHIC ERROR
>  !!!!!!! PANIC !!!!!!! in S/R BARRIER  myThid =            0   nThreads =            1
>
>
> On Mon, Jan 25, 2010 at 9:31 PM, Santha Akella  
> <santha.akella at gmail.com> wrote:
> Hello Patrick,
> Thanks so much !
>
> It worked!
> Santha
>
>
> On Mon, Jan 25, 2010 at 9:22 PM, Patrick Heimbach  
> <heimbach at mit.edu> wrote:
>
> Hi Santha,
>
> this error message indicates that at link time the compiler "sees"
> that the executable it generates exceeds the available memory.
>
> This is consistent with the fact that Pleiades (unfortunately) only  
> has 512MB per
> processor (I think) compared to, e.g. Columbia or the GFDL Altix  
> (which both had ~2GB).
> So you'll need to lower your number for your inner-most checkpoint
> which you've currently set to
>      integer    nchklev_1
>      parameter( nchklev_1      =    16 )
>
> Please give that a try.
> -Patrick
>
>
>
>
> On Jan 25, 2010, at 9:15 PM, Santha Akella wrote:
>
> Dear MITgcm Users,
>
> I am trying to build my adjoint model on NASA Pleiades. Everything
> seems to go well, but at the end of the build I get the following
> error message. Please help! I am attaching my Makefile, SIZE.h and
> tamc.h files. Thank you,
> Santha
> -------r_stats.o  ad_taf_output.o -L/nasa/sgi/mpt/1.23try08/lib64 -lmpi -L/nasa/netcdf/3.6.0/intel/lib -lnetcdf
> /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o): In function `for_waitid':
> for_aio.c:(.text+0x13a8): relocation truncated to fit: R_X86_64_32S against symbol `for__aio_lub_table' defined in COMMON section in /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o)
> /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o): In function `for_asynchronous':
> for_aio.c:(.text+0x2cd7): relocation truncated to fit: R_X86_64_32S against symbol `for__aio_lub_table' defined in COMMON section in /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o)
> /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o): In function `for__aio_release_lun':
> for_aio.c:(.text+0x3633): relocation truncated to fit: R_X86_64_32S against symbol `for__aio_lub_table' defined in COMMON section in /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o)
> for_aio.c:(.text+0x36f7): relocation truncated to fit: R_X86_64_32S against symbol `for__aio_lub_table' defined in COMMON section in /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o)
> for_aio.c:(.text+0x3712): relocation truncated to fit: R_X86_64_32S against symbol `for__aio_lub_table' defined in COMMON section in /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o)
> /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o): In function `for__aio_acquire_lun':
> for_aio.c:(.text+0x3c27): relocation truncated to fit: R_X86_64_32S against symbol `for__aio_lub_table' defined in COMMON section in /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o)
> for_aio.c:(.text+0x3c8f): relocation truncated to fit: R_X86_64_32S against symbol `for__aio_lub_table' defined in COMMON section in /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o)
> for_aio.c:(.text+0x3dd9): relocation truncated to fit: R_X86_64_32S against symbol `for__aio_lub_table' defined in COMMON section in /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o)
> for_aio.c:(.text+0x40c0): relocation truncated to fit: R_X86_64_32S against symbol `for__aio_lub_table' defined in COMMON section in /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o)
> /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o): In function `for__aio_acquire_lun_fname':
> for_aio.c:(.text+0x4369): relocation truncated to fit: R_X86_64_32S against symbol `for__aio_lub_table' defined in COMMON section in /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o)
> /nasa/intel/Compiler/11.0/083/lib/intel64/libifcore.a(for_aio.o): In function `for__aio_error_handling':
> for_aio.c:(.text+0x4b4b): additional relocation overflows omitted from the output
> make: *** [mitgcmuv_ad] Error 1
> -------------------------
>
> ---
> Patrick Heimbach | heimbach at mit.edu | http://www.mit.edu/~heimbach
> MIT | EAPS 54-1518 | 77 Massachusetts Ave | Cambridge MA 02139 USA
> FON +1-617-253-5259 | FAX +1-617-253-4464 | SKYPE patrick.heimbach
>
>
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
>

---
Patrick Heimbach | heimbach at mit.edu | http://www.mit.edu/~heimbach
MIT | EAPS 54-1518 | 77 Massachusetts Ave | Cambridge MA 02139 USA
FON +1-617-253-5259 | FAX +1-617-253-4464 | SKYPE patrick.heimbach




