[MITgcm-support] adjoint compilation on 64bit system
Constantinos Evangelinos
ce107 at ocean.mit.edu
Thu Feb 24 11:05:23 EST 2005
On Thursday 24 February 2005 05:11, Martin Losch wrote:
> I am trying to compile the adjoint MITgcm on an AMD 64bit opteron
> machine with the Portland Group fortran compiler. I have no problems
> with my build options file linux_amd64_pgf77_ocl (that I am responsible
> for anyway) as long as I don't ALLOW_ECCO_ADJOINT_RUN (in
> ECCO_CPPOPTIONS.h), but use ALLOW_ECCO_FORWARD_RUN. But as soon as
> there is the adjoint model involved (ad_taf_output.f) I get funny error
> messages at the link step:
> > ad_taf_output.o: In function `adconvective_adjustment_':
> > ad_taf_output.o(.text+0xf018): relocation truncated to fit:
> > R_X86_64_32S cadtheta_
> > ad_taf_output.o(.text+0xf038): relocation truncated to fit:
> > R_X86_64_32S cadtheta_
> > ad_taf_output.o(.text+0xf4e0): relocation truncated to fit:
> > R_X86_64_32S cadthetb_
> > ad_taf_output.o(.text+0xf501): relocation truncated to fit:
> > R_X86_64_32S cadthetb_
>
> (most of them concern common blocks defined in ad_taf_output.f for the
> tapes) but also:
> > /var/tmp.shared/pgi/linux86-64/5.2/lib/libpgc.a(barrier.o): In
> > function `_mp_get_parpar':
> > barrier.o(.text+0x69): relocation truncated to fit: R_X86_64_32S
> > _mp_parpar
> > /var/tmp.shared/pgi/linux86-64/5.2/lib/libpgc.a(barrier.o): In
> > function `_mp_lcpu2':
> > barrier.o(.text+0x34e): relocation truncated to fit: R_X86_64_32S
> > _mp_parpar
>
> which doesn't have anything to do with ad_taf_output.f
> These errors go away when I use an additional option: -fpic, which
> according to the man pages "(Linux only) Instructs the compiler to
> generate position-independent code which can be used to create shared
> object files ..."
>
> I don't see why the TAMC/TAF generated code should need this option
> while the remaining part of the code doesn't need it. Any ideas?
Hi Martin.
The problem arises from the sizes of the arrays that you've specified through
your choices in tamc.h.
Unfortunately there is not one 64-bit code memory model in Linux/AMD64 (unlike
the nice, IMHO, models in Linux for IA-64 and Alpha) but 4! The small model,
which is the default compilation target, expects the program's code and
statically allocated data to fit in the lower 2GB of the address space. This
provides clear speed advantages. The medium model allows unlimited sizes for
data objects but requires code sizes smaller than 2GB, and can be selected with
-mcmodel=medium for the GNU, PGI and Pathscale compilers. There is also a large
model (currently not implemented by any available compiler) that allows an
unlimited code size in addition to unlimited data size. The fourth model is the
kernel model for kernel compilation.
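As a rough illustration of the size argument, the following Python sketch estimates the static storage of one tape common block and compares it with the 2GB limit of the small model. The array dimensions, the number of stored records and the helper names are hypothetical and are not taken from any actual tamc.h.

```python
# Rough estimate of the static storage taken by a TAF tape common block.
# All dimensions below are made-up illustration values, not from a real tamc.h.

SMALL_MODEL_LIMIT = 2**31  # bytes: the small model keeps code and static data below 2 GB


def common_block_bytes(dims, bytes_per_element=8):
    """Size in bytes of an array with the given dimensions (REAL*8 by default)."""
    size = bytes_per_element
    for n in dims:
        size *= n
    return size


def fits_small_model(block_sizes):
    """True if the blocks together stay below 2 GB.

    Code and all other static data are ignored, so this check is optimistic.
    """
    return sum(block_sizes) < SMALL_MODEL_LIMIT


# Hypothetical tape: a 180 x 80 x 23 field stored for 1000 records.
tape = common_block_bytes((180, 80, 23, 1000))
print(tape / 2**30)              # size in GiB
print(fits_small_model([tape]))  # whether the small model could hold it
```

Doubling the number of stored records in tamc.h doubles the size of the corresponding tape, which is why a configuration that links fine as a forward run can exceed the limit once the adjoint tapes are included.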
From the GCC man page:
-------------------------------------------------------------------------------
-mcmodel=small
Generate code for the small code model: the program and its symbols
must be linked in the lower 2 GB of the address space. Pointers are
64 bits. Programs can be statically or dynamically linked. This is
the default code model.
-mcmodel=kernel
Generate code for the kernel code model. The kernel runs in the
negative 2 GB of the address space. This model has to be used for
Linux kernel code.
-mcmodel=medium
Generate code for the medium model: The program is linked in the
lower 2 GB of the address space but symbols can be located anywhere
in the address space. Programs can be statically or dynamically
linked, but building of shared libraries are not supported with the
medium model.
-mcmodel=large
Generate code for the large model: This model makes no assumptions
about addresses and sizes of sections. Currently GCC does not
implement this model.
-------------------------------------------------------------------------------
The ugly thing about this is that to use -mcmodel=medium you need to recompile
your MPI libraries as well (which in the case of supercomputer centers is not
something you can do yourself). For the PGI compilers and really large arrays
it may also be a good idea to couple this option with -Mlarge_arrays, which
makes array index arithmetic 64 bit instead of 32 bit, to make sure the arrays
can be addressed properly.
From the PGI site at:
https://www.pgroup.com/userforum/viewtopic.php?t=18&sid=fa1b206eb0fceb46c0e8806513d1fe20
-------------------------------------------------------------------------------
The -mcmodel=medium and -Mlarge_arrays compiler and linker options are
supported under 64-bit linux environments (they are not supported under
32-bit linux environments).
The -mcmodel=medium option must be used to compile/link a program whose data
and .bss sections exceed 2GB. In order for the program to use these large
data sections, additional addressing instructions that support 64-bit offsets
need to be generated. The effect this option has on performance is a function
of the amount of data-use in the application. Therefore, this option should
be used only when the aggregate data size exceeds 2GB.
The -Mlarge_arrays option tells the compiler that you have at least one single
static data section (array) larger than 2GB. In this case, array accesses
require 64-bit index arithmetic. This option must be used in conjunction with
-mcmodel=medium.
A telltale sign that you might need -mcmodel=medium occurs when you get
warnings from the linker that mention "relocation truncated to fit".
There are other limitations to -mcmodel=medium (w.r.t. -fpic or
position-independent code, shared libraries, etc.). Refer to the release
notes (page 13) for more information:
http://www.pgroup.com/doc/pgiwsrn.pdf
-------------------------------------------------------------------------------
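The rule in the PGI note above can be written down as a small decision function. This is only a sketch of the quoted rule, not anything provided by PGI: the function name and the idea of passing in a list of static object sizes are assumptions made for illustration.

```python
# Sketch of the flag selection rule quoted from the PGI note.
# pgi_memory_flags is a hypothetical helper, not part of any PGI tool.

TWO_GB = 2**31  # bytes


def pgi_memory_flags(static_sizes):
    """Return the extra flags suggested for a list of static data sizes in bytes."""
    flags = []
    if sum(static_sizes) > TWO_GB:
        # Aggregate data and .bss exceed 2 GB: the medium model is required.
        flags.append("-mcmodel=medium")
        if max(static_sizes) > TWO_GB:
            # At least one single array is larger than 2 GB, so array
            # indexing needs 64-bit arithmetic as well.
            flags.append("-Mlarge_arrays")
    return flags


GB = 2**30
print(pgi_memory_flags([1 * GB]))          # small model is enough
print(pgi_memory_flags([1 * GB] * 3))      # many blocks, each below 2 GB
print(pgi_memory_flags([3 * GB, 1 * GB]))  # one block above 2 GB
```

Because -Mlarge_arrays must be used together with -mcmodel=medium, the second test is nested inside the first rather than evaluated independently.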
Why does -fPIC work then?
A nicely written answer can be found at:
http://developers.sun.com/tools/cc/articles/about_amd64_abi.html#space
-------------------------------------------------------------------------------
1. Using the -Kpic option. This creates a position independent code. But the
compiler will generate 64-bit memory reference by using register indirection
via the Global Offset Table with the R_AMD64_GOTPCREL relocatable type. This
will work fine as long as the difference between the current code location
and the location in the Global Offset Table for the corresponding data object
is less than 32 bits.
2. Allocate all static data objects in heap. Then reference the objects via
pointer indirection.
Note the workaround may have a small performance degradation in memory access
due to reference indirection.
-------------------------------------------------------------------------------
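To make the first workaround concrete, here is a toy Python model of the two kinds of reference. It is a simplification, assuming that a direct small-model reference stores the absolute data address in a signed 32-bit field (the R_X86_64_32S relocation in the linker messages) while a position-independent reference only stores the distance from the code to a Global Offset Table entry. All addresses are invented for illustration.

```python
# Toy model of why -fpic avoids "relocation truncated to fit".
# The addresses are hypothetical; real layouts depend on the linker.

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_signed_32(value):
    """True if value can be stored in a signed 32-bit relocation field."""
    return INT32_MIN <= value <= INT32_MAX


def direct_reference_ok(data_addr):
    """Small model: the absolute address of the data must fit in 32 bits."""
    return fits_signed_32(data_addr)


def got_reference_ok(code_addr, got_entry_addr):
    """PIC: only the code-to-GOT distance must fit in 32 bits.

    The GOT entry itself holds the full 64-bit address of the data.
    """
    return fits_signed_32(got_entry_addr - code_addr)


code_addr = 0x0040_0000    # instruction making the reference
got_entry = 0x0060_0000    # GOT slot, placed close to the code
data_addr = 0x1_2000_0000  # common block that ended up above 2 GB

print(direct_reference_ok(data_addr))          # direct 32-bit reference
print(got_reference_ok(code_addr, got_entry))  # reference through the GOT
```

In this model the data can sit anywhere in the 64-bit address space as long as the GOT stays within reach of the code, at the cost of one extra memory load per reference.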
You also need to keep in mind that older versions of the GNU assembler
limited individual common blocks to less than 2GB in size. This is no longer
the case with binutils 2.14 and later.
These issues have been coming up in my runs at NCAR for some time now, but I
foolishly did not think of letting others know about them, as I did not
realise others were also doing adjoint runs.
Constantinos
--
Dr. Constantinos Evangelinos
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology