[MITgcm-support] adjoint compilation on 64bit system

Thu Feb 24 11:47:32 EST 2005

On Thursday 24 February 2005 11:34, Martin Losch wrote:

> hey, thanks for the long and knowledgeable answer. I have to read it
> again tonight with a little more time on my hands. From a quick glance
> I think that it is possible to boil it down to the following cookbook:
> I have output (ecco_c*, OPWARM*) generated with a linux_ia32_ifc type
> build options file (I can't do anything about that, short of repeating
> the whole optimization). In order to be able to read these files with an
> executable that has been generated with a linux_amd64_pgf77 type build
> options file, I have to build that executable with the
> -mcmodel=medium -Mlarge_arrays options added to the FFLAGS line.
>
> Is that correct?

Actually my answer dealt only with the linking and -fPIC parts. You appear to
have an inconsistent set of options between the two builds. linux_ia32_ifc
uses -D_BYTESWAPIO, so MDSIO output is byte-swapped, but any sequential
Fortran output outside of MDSIO gets done in little-endian order.
linux_amd64_pgf77_ocl/linux_amd64_pgf77+mpi_ocl does not use -D_BYTESWAPIO
but compiles with -byteswapio, which means that not only MDS output but also
sequential Fortran output outside of MDSIO gets done in big-endian order. So
to fix things you might want to compile with -D_BYTESWAPIO but without
-byteswapio. That, however, may not be the only source of your problems.
Patrick is also e-mailing you with more ideas.
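
To check which byte order a given sequential unformatted file was actually written in, one can look at its record markers. The following is a minimal Python sketch, not part of MITgcm. It assumes the common layout in which every record is framed by a 4-byte length marker before and after the payload; that layout is an assumption about the compiler, and the helper names guess_byte_order and make_record are made up for illustration. It does not apply to MDSIO direct-access files, which have no record markers.

```python
import struct

def guess_byte_order(data: bytes) -> str:
    """Guess the byte order of a Fortran sequential unformatted file.

    Assumes each record is framed by a 4-byte length marker before and
    after the payload (an assumption about the compiler's record layout).
    Returns "little", "big", or "unknown" if neither or both orders fit.
    """
    matches = []
    for name, fmt in (("little", "<i"), ("big", ">i")):
        if len(data) < 8:
            break
        (nbytes,) = struct.unpack(fmt, data[:4])
        end = 4 + nbytes
        # The leading and trailing markers must agree and fit in the data.
        if nbytes >= 0 and end + 4 <= len(data):
            (trailer,) = struct.unpack(fmt, data[end:end + 4])
            if trailer == nbytes:
                matches.append(name)
    return matches[0] if len(matches) == 1 else "unknown"

def make_record(payload: bytes, order: str) -> bytes:
    """Build one framed record for testing (hypothetical helper)."""
    fmt = "<i" if order == "little" else ">i"
    marker = struct.pack(fmt, len(payload))
    return marker + payload + marker
```

In practice one would read the first few kilobytes of the file in binary mode and pass them to guess_byte_order. A result of "unknown" means the first record did not fit either interpretation unambiguously.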

Constantinos


>
> Martin
>
> On Feb 24, 2005, at 5:05 PM, Constantinos Evangelinos wrote:
> > On Thursday 24 February 2005 05:11, Martin Losch wrote:
> >> I am trying to compile the adjoint MITgcm on an AMD 64bit opteron
> >> machine with the Portland Group fortran compiler. I have no problems
> >> with my build options file linux_amd64_pgf77_ocl (that I am
> >> responsible for anyway) as long as I don't ALLOW_ECCO_ADJOINT_RUN (in
> >> ECCO_CPPOPTIONS.h), but use ALLOW_ECCO_FORWARD_RUN. But as soon as
> >> there is the adjoint model involved (ad_taf_output.f) I get funny
> >> error messages at the link step:
> >>> ad_taf_output.o: In function `adconvective_adjustment_':
> >>> ad_taf_output.o(.text+0xf018): relocation truncated to fit: R_X86_64_32S cadtheta_
> >>> ad_taf_output.o(.text+0xf038): relocation truncated to fit: R_X86_64_32S cadtheta_
> >>> ad_taf_output.o(.text+0xf4e0): relocation truncated to fit: R_X86_64_32S cadthetb_
> >>> ad_taf_output.o(.text+0xf501): relocation truncated to fit: R_X86_64_32S cadthetb_
> >>
> >> (most of them concern common blocks defined in ad_taf_output.f for the
> >> tapes) but also:
> >>> /var/tmp.shared/pgi/linux86-64/5.2/lib/libpgc.a(barrier.o): In function `_mp_get_parpar':
> >>> barrier.o(.text+0x69): relocation truncated to fit: R_X86_64_32S _mp_parpar
> >>> /var/tmp.shared/pgi/linux86-64/5.2/lib/libpgc.a(barrier.o): In function `_mp_lcpu2':
> >>> barrier.o(.text+0x34e): relocation truncated to fit: R_X86_64_32S _mp_parpar
> >>
> >> which doesn't have anything to do with ad_taf_output.f
> >> These errors go away when I use an additional option: -fpic, which
> >> according to the man pages "(Linux only) Instructs the compiler to
> >> generate position-independent code which can be used to create shared
> >> object files ..."
> >>
> >> I don't see why the TAMC/TAF generated code should need this option
> >> while the remaining part of the code doesn't need it. Any ideas?
> >
> > Hi Martin.
> >
> > The problem arises from the sizes of the arrays that you've specified
> > through your choices in tamc.h.
> >
> > Unfortunately there is not one 64-bit code memory model in Linux/AMD64
> > (as in the nicer, IMHO, models in Linux for IA-64 and Alpha) but 4! The
> > small model, which is the default compilation target, expects all
> > individual objects to be smaller than 2GB. This provides clear speed
> > advantages. The medium model allows unlimited sizes for data objects but
> > code sizes smaller than 2GB, and can be selected with -mcmodel=medium for
> > the GNU, PGI and Pathscale compilers. There is also a large model
> > (currently not implemented by any available compiler) that allows an
> > unlimited code size in addition to unlimited data size. The fourth model
> > is the kernel model for kernel compilation.
> >
> > From the GCC man page:
> >
> > -------------------------------------------------------------------------------
> >        -mcmodel=small
> >            Generate code for the small code model: the program and its
> >            symbols must be linked in the lower 2 GB of the address space.
> >            Pointers are 64 bits.  Programs can be statically or
> >            dynamically linked.  This is the default code model.
> >
> >        -mcmodel=kernel
> >            Generate code for the kernel code model.  The kernel runs in
> >            the negative 2 GB of the address space.  This model has to be
> >            used for Linux kernel code.
> >
> >        -mcmodel=medium
> >            Generate code for the medium model: The program is linked in
> >            the lower 2 GB of the address space but symbols can be located
> >            anywhere in the address space.  Programs can be statically or
> >            dynamically linked, but building of shared libraries are not
> >            supported with the medium model.
> >
> >        -mcmodel=large
> >            Generate code for the large model: This model makes no
> >            assumptions about addresses and sizes of sections.  Currently
> >            GCC does not implement this model.
> > -------------------------------------------------------------------------------
> >
> > The ugly thing about this is that to use -mcmodel=medium you need to
> > recompile your MPI libraries as well (which in the case of supercomputer
> > centers is not something you do). For the PGI compilers and really large
> > arrays it may also be a good idea to couple this option with
> > -Mlarge_arrays that makes array index arithmetic 64 bit instead of 32 bit
> > to make sure the arrays can be addressed properly.
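
As a rough illustration of when these two flags come into play, here is a small Python sketch that adds up the static array sizes and applies the 2GB thresholds described above. The array names and dimensions in the usage example are purely hypothetical (they are not taken from any actual tamc.h), the function name suggest_pgi_flags is made up, and the estimate ignores code size and any other static data, so treat the result as a hint only.

```python
from math import prod

TWO_GB = 2**31  # bytes; threshold used for both checks in this sketch

def suggest_pgi_flags(arrays, bytes_per_element=8):
    """Suggest PGI flags from static array shapes.

    arrays maps a (hypothetical) array name to its tuple of dimensions.
    Returns a list of flags; an empty list means the small model may do.
    """
    sizes = {name: prod(dims) * bytes_per_element
             for name, dims in arrays.items()}
    flags = []
    # Aggregate static data above 2GB: the medium model is needed.
    if sum(sizes.values()) > TWO_GB:
        flags.append("-mcmodel=medium")
        # A single array above 2GB additionally needs 64-bit indexing.
        if max(sizes.values()) > TWO_GB:
            flags.append("-Mlarge_arrays")
    return flags

# Hypothetical tape buffers: 360 x 160 x 23 grid, 300 stored levels, real*8.
print(suggest_pgi_flags({"tape_a": (360, 160, 23, 300)}))
```

With two hypothetical buffers of 120 levels each, no single array exceeds 2GB but the total does, so only -mcmodel=medium would be suggested.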
> >
> > From the PGI site at:
> >
> > https://www.pgroup.com/userforum/viewtopic.php?t=18&sid=fa1b206eb0fceb46c0e8806513d1fe20
> > -------------------------------------------------------------------------------
> > The -mcmodel=medium and -Mlarge_arrays compiler and linker options are
> > supported under 64-bit linux environments (they are not supported under
> > 32-bit linux environments).
> >
> > The -mcmodel=medium option must be used to compile/link a program whose
> > data and .bss sections exceed 2GB. In order for the program to use these
> > large data sections, additional addressing instructions that support
> > 64-bit offsets need to be generated. The effect this option has on
> > performance is a function of the amount of data-use in the application.
> > Therefore, this option should be used only when the aggregate data size
> > exceeds 2GB.
> >
> > The -Mlarge_arrays option tells the compiler that you have at least one
> > single static data section (array) larger than 2GB. In this case, array
> > accesses require 64-bit index arithmetic. This option must be used in
> > conjunction with -mcmodel=medium.
> >
> > A telltale sign that you might need -mcmodel=medium occurs when you get
> > warnings from the linker that mention "relocation truncated to fit".
> >
> > There are other limitations to -mcmodel=medium (w.r.t. -fpic or
> > position-independent code, shared libraries, etc.). Refer to the release
> > notes (page 13) for more information:
> >
> > http://www.pgroup.com/doc/pgiwsrn.pdf
> > -------------------------------------------------------------------------------
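
To see at a glance which symbols trigger the "relocation truncated to fit" messages in a long link log, something like the following Python sketch may help. It only assumes the message format shown in the linker output quoted earlier in this thread; the function name truncated_symbols is made up for illustration.

```python
import re

# Matches e.g. "relocation truncated to fit: R_X86_64_32S cadtheta_".
# \s+ also spans line breaks, so wrapped log lines are handled too.
PATTERN = re.compile(r"relocation truncated to fit:\s+(R_\w+)\s+(\S+)")

def truncated_symbols(link_log: str):
    """Return the sorted, de-duplicated symbols named in truncation errors."""
    return sorted({m.group(2) for m in PATTERN.finditer(link_log)})

SAMPLE_LOG = """\
ad_taf_output.o(.text+0xf018): relocation truncated to fit: R_X86_64_32S cadtheta_
ad_taf_output.o(.text+0xf4e0): relocation truncated to fit: R_X86_64_32S cadthetb_
barrier.o(.text+0x69): relocation truncated to fit: R_X86_64_32S _mp_parpar
"""

for symbol in truncated_symbols(SAMPLE_LOG):
    print(symbol)
```

Symbols that correspond to large common blocks (here the tape buffers from ad_taf_output.f) are the ones whose sizes are worth checking against the 2GB limit.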
> >
> > Why does -fPIC work then?
> > A nicely written answer can be found at:
> >
> > http://developers.sun.com/tools/cc/articles/about_amd64_abi.html#space
> >
> > -------------------------------------------------------------------------------
> > 1. Using the -Kpic option. This creates a position independent code. But
> > the compiler will generate 64-bit memory reference by using register
> > indirection via the Global Offset Table with the R_AMD64_GOTPCREL
> > relocatable type. This will work fine as long as the difference between
> > the current code location and the location in the Global Offset Table for
> > the corresponding data object is less than 32 bits.
> >
> > 2. Allocate all static data objects in heap. Then reference the objects
> > via pointer indirection.
> >
> > Note the workaround may have a small performance degradation in memory
> > access due to reference indirection.
> > -------------------------------------------------------------------------------
> >
> > You also need to keep in mind that there was a limitation in the GNU
> > assembler that limited individual common blocks to being less than 2GB in
> > size. This is not the case with binutils 2.14 and later.
> >
> > These issues have been coming up in my runs at NCAR for some time now, but
> > I foolishly did not think of letting others know about them as I did not
> > realise others were also doing adjoint runs.
> >
> > Constantinos
> > --
> > Dr. Constantinos Evangelinos
> > Department of Earth, Atmospheric and Planetary Sciences
> > Massachusetts Institute of Technology
> >
> >
> > _______________________________________________
> > MITgcm-support mailing list
> > MITgcm-support at mitgcm.org
> > http://dev.mitgcm.org/mailman/listinfo/mitgcm-support
>

-- 
Dr. Constantinos Evangelinos
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology