[MITgcm-support] adjoint compilation on 64bit system

Thu Feb 24 11:05:23 EST 2005

On Thursday 24 February 2005 05:11, Martin Losch wrote:

> I am trying to compile the adjoint MITgcm on an AMD 64bit opteron
> machine with the Portland Group fortran compiler. I have no problems
> with my build options file linux_amd64_pgf77_ocl (that I am responsible
> for anyway) as long as I don't ALLOW_ECCO_ADJOINT_RUN (in
> ECCO_CPPOPTIONS.h), but use ALLOW_ECCO_FORWARD_RUN. But as soon as
> there is the adjoint model involved (ad_taf_output.f) I get funny error
> messages at the link step:
> > ad_taf_output.o: In function `adconvective_adjustment_':
> > ad_taf_output.o(.text+0xf018): relocation truncated to fit:
> > R_X86_64_32S cadtheta_
> > ad_taf_output.o(.text+0xf038): relocation truncated to fit:
> > R_X86_64_32S cadtheta_
> > ad_taf_output.o(.text+0xf4e0): relocation truncated to fit:
> > R_X86_64_32S cadthetb_
> > ad_taf_output.o(.text+0xf501): relocation truncated to fit:
> > R_X86_64_32S cadthetb_
>
> (most of them concern common blocks defined in ad_taf_output.f for the
> tapes) but also:
> > /var/tmp.shared/pgi/linux86-64/5.2/lib/libpgc.a(barrier.o): In
> > function `_mp_get_parpar':
> > barrier.o(.text+0x69): relocation truncated to fit: R_X86_64_32S
> > _mp_parpar
> > /var/tmp.shared/pgi/linux86-64/5.2/lib/libpgc.a(barrier.o): In
> > function `_mp_lcpu2':
> > barrier.o(.text+0x34e): relocation truncated to fit: R_X86_64_32S
> > _mp_parpar
>
> which doesn't have anything to do with ad_taf_output.f
> These errors go away when I use an additional option: -fpic, which
> according to the man pages "(Linux only) Instructs the compiler to
> generate position-independent code which can be used to create shared
> object files ..."
>
> I don't see why the TAMC/TAF generated code should need this option
> while the remaining part of the code doesn't need it. Any ideas?

Hi Martin. 

The problem arises from the sizes of the arrays that you've specified through 
your choices in tamc.h. 

Unfortunately, Linux on AMD64 does not have a single 64-bit code memory model
(unlike the nicer, IMHO, models in Linux for IA-64 and Alpha) but four! The
small model, which is the default compilation target, expects the program's
code and static data to fit together in the lower 2GB of the address space.
This provides clear speed advantages. The medium model allows unlimited sizes
for data objects but still requires the code to be smaller than 2GB; it can be
selected with -mcmodel=medium for the GNU, PGI and Pathscale compilers. There
is also a large model, currently not implemented by any available compiler,
that allows an unlimited code size in addition to unlimited data size. The
fourth model is the kernel model for kernel compilation.
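
To get a feel for how quickly statically allocated tape arrays reach the 2GB
limit of the small model, here is a rough Python sketch of the arithmetic. The
grid dimensions and the record count below are made-up illustration values,
not taken from any actual SIZE.h or tamc.h, and the function names are
hypothetical. Keep in mind that the limit applies to code and static data
taken together, so a single array under 2GB is no guarantee.

```python
# Hypothetical sizes for illustration only; substitute the values that
# follow from your own SIZE.h and tamc.h.
SMALL_MODEL_LIMIT = 2**31  # bytes reachable with a signed 32-bit relocation


def tape_bytes(nx, ny, nr, n_records, bytes_per_real=8):
    """Bytes used by one static tape array holding n_records 3-D fields."""
    return nx * ny * nr * n_records * bytes_per_real


def exceeds_small_model(total_static_bytes):
    """True if this much static data alone cannot fit in the small model."""
    return total_static_bytes >= SMALL_MODEL_LIMIT


if __name__ == "__main__":
    size = tape_bytes(nx=360, ny=160, nr=23, n_records=240)
    print(f"{size / 2**30:.2f} GiB, too big: {exceeds_small_model(size)}")
```

Summing tape_bytes over all the tapes declared in the generated adjoint code
would give a lower bound on the static data the linker has to place.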

From the GCC man page:
-------------------------------------------------------------------------------
       -mcmodel=small
           Generate code for the small code model: the program and its sym-
           bols must be linked in the lower 2 GB of the address space.
           Pointers are 64 bits.  Programs can be statically or dynamically
           linked.  This is the default code model.

       -mcmodel=kernel
           Generate code for the kernel code model.  The kernel runs in the
           negative 2 GB of the address space.  This model has to be used for
           Linux kernel code.

       -mcmodel=medium
           Generate code for the medium model: The program is linked in the
           lower 2 GB of the address space but symbols can be located any-
           where in the address space.  Programs can be statically or dynami-
           cally linked, but building of shared libraries are not supported
           with the medium model.

       -mcmodel=large
           Generate code for the large model: This model makes no assumptions
           about addresses and sizes of sections.  Currently GCC does not
           implement this model.
-------------------------------------------------------------------------------

The ugly thing about this is that to use -mcmodel=medium you need to recompile
your MPI libraries as well (which, in the case of supercomputer centers, is
not something you get to do yourself). For the PGI compilers and really large
arrays it may also be a good idea to couple this option with -Mlarge_arrays,
which makes array index arithmetic 64 bit instead of 32 bit so that the arrays
can be addressed properly.
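
A small Python sketch of why 32-bit index arithmetic is not enough for such
arrays. It only mimics two's-complement wraparound of a byte offset; how the
PGI compiler actually computes offsets without -Mlarge_arrays is an assumption
here, and the helper names are hypothetical.

```python
def wrap_int32(value):
    """Return the value a signed 32-bit integer would hold after overflow."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


def byte_offset(element_index, bytes_per_element=8):
    """Exact byte offset of an element from the start of the array."""
    return element_index * bytes_per_element


if __name__ == "__main__":
    index = 300_000_000            # element index, itself well within 32 bits
    exact = byte_offset(index)     # 2_400_000_000 bytes, beyond 2**31 - 1
    print("64-bit arithmetic:", exact)
    print("32-bit arithmetic:", wrap_int32(exact))
```

The element index fits comfortably in 32 bits, but the byte offset of an
8-byte real at that index does not, so the 32-bit result comes out negative.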

From the PGI site at:
https://www.pgroup.com/userforum/viewtopic.php?t=18&sid=fa1b206eb0fceb46c0e8806513d1fe20
-------------------------------------------------------------------------------
The -mcmodel=medium and -Mlarge_arrays compiler and linker options are
supported under 64-bit linux environments (they are not supported under 
32-bit linux environments).

The -mcmodel=medium option must be used to compile/link a program whose data 
and .bss sections exceed 2GB. In order for the program to use these large 
data sections, additional addressing instructions that support 64-bit offsets 
need to be generated. The effect this option has on performance is a function 
of the amount of data-use in the application. Therefore, this option should 
be used only when the aggregate data size exceeds 2GB.

The -Mlarge_arrays option tells the compiler that you have at least one single 
static data section (array) larger than 2GB. In this case, array accesses 
require 64-bit index arithmetic. This option must be used in conjunction with 
-mcmodel=medium.

A telltale sign that you might need -mcmodel=medium occurs when you get
warnings from the linker that mention "relocation truncated to fit".

There are other limitations to -mcmodel=medium (w.r.t. -fpic or 
position-independent code, shared libraries, etc.). Refer to the release 
notes (page 13) for more information:

http://www.pgroup.com/doc/pgiwsrn.pdf 
-------------------------------------------------------------------------------
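
To see which symbols are involved, the linker output can be scanned for that
telltale message. The following Python sketch is only an illustration: the
regular expression is an assumption that matches the message layout quoted in
Martin's mail (other linker versions may word it differently), and the
function name is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "relocation truncated to fit: R_X86_64_32S cadtheta_", also
# when the mail client has wrapped the message onto a second line.
PATTERN = re.compile(r"relocation truncated to fit:\s+(R_X86_64_\w+)\s+(\S+)")

SAMPLE = """\
ad_taf_output.o(.text+0xf018): relocation truncated to fit:
R_X86_64_32S cadtheta_
ad_taf_output.o(.text+0xf038): relocation truncated to fit:
R_X86_64_32S cadtheta_
ad_taf_output.o(.text+0xf4e0): relocation truncated to fit:
R_X86_64_32S cadthetb_
ad_taf_output.o(.text+0xf501): relocation truncated to fit:
R_X86_64_32S cadthetb_
barrier.o(.text+0x69): relocation truncated to fit: R_X86_64_32S
_mp_parpar
barrier.o(.text+0x34e): relocation truncated to fit: R_X86_64_32S
_mp_parpar
"""


def truncated_symbols(linker_output):
    """Count how often each symbol appears in a truncated relocation."""
    return Counter(symbol for _reloc, symbol in PATTERN.findall(linker_output))


if __name__ == "__main__":
    for symbol, count in truncated_symbols(SAMPLE).most_common():
        print(f"{symbol}: {count}")
```

Symbols that belong to the tape common blocks point at the array sizes;
symbols from the compiler's own runtime library, such as _mp_parpar, just
happen to be placed beyond reach once the static data has grown past 2GB.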

Why does -fPIC work then?
A nicely written answer can be found at:

http://developers.sun.com/tools/cc/articles/about_amd64_abi.html#space

-------------------------------------------------------------------------------
1. Using the -Kpic option. This creates a position independent code. But the
compiler will generate 64-bit memory reference by using register indirection 
via the Global Offset Table with the R_AMD64_GOTPCREL relocatable type. This 
will work fine as long as the difference between the current code location 
and the location in the Global Offset Table for the corresponding data object 
is less than 32 bits.

2. Allocate all static data objects in heap. Then reference the objects via 
pointer indirection.

Note the workaround may have a small performance degradation in memory access 
due to reference indirection.
-------------------------------------------------------------------------------

You also need to keep in mind that the GNU assembler used to limit individual
common blocks to less than 2GB in size. This is no longer the case with
binutils 2.14 and later.

These issues have been coming up in my runs at NCAR for some time now, but I
foolishly did not think of letting others know about them, as I did not
realise others were also doing adjoint runs.

Constantinos
-- 
Dr. Constantinos Evangelinos
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology