[MITgcm-support] Compiler options for ifort on x86_64
Constantinos Evangelinos
ce107 at ocean.mit.edu
Thu Oct 20 10:52:34 EDT 2005
On Thursday 20 October 2005 06:31, Lucas Merckelbach wrote:
> Hi *,
>
> In an attempt to compile the mitgcm model for a 512x512x32 grid on
> x86_64 machines (EM64T and AMD64), the linking stage fails with
>
> impldiff.o: In function `impldiff_':
> impldiff.f:(.text+0x697): relocation truncated to fit: R_X86_64_PC32
> against `.bss'
>
> which apparently is caused by arrays that are too big, as a 256x256x32
> grid compiles just fine. A very similar "relocation truncated to fit"
> error is obtained with the following test program:
>
> program main
> real*8 a(600000000)
> a(1) = 10.0
> stop
> end
>
> when compiled without any specific options
>
> /nerc/packages/intel_compilers/intel_fce_8.1/lib/libifcore.a(for_init.o)(.text+0x20):
> In function `for_rtl_init_':
> : relocation truncated to fit: R_X86_64_PC32 .bss
>
> However, adding the option -i_dynamic to the linker (also ifort) cures
> this. Unfortunately, it makes no difference for the mitgcm model when I add
> the option to the linker (in the Makefile). Adding -fpic to FFLAGS, as
> suggested in the mitgcm archives, didn't help either.
>
> The Makefile is generated from
> genmake2 -fc=ifort -of [..]/build_options/linux_amd64_ifort
> Then the Makefile is tweaked to have the option -i_dynamic during linking.
>
> Using g77 as fortran compiler fails as well, also generating "relocation
> truncated to fit" errors.
>
> Then I thought, inspired by what google came up with, that it may be
> related to the linker 'ld'.
>
> $ ld --version
> GNU ld version 2.15.92.0.2 20040927
>
> Installing the newest version of binutils:
> $ ld --version
> GNU ld version 2.16
> didn't make a difference, though.
>
> Using the build-options file "linux_ia64_ifort" on an ia64 machine,
> everything *does* work.
>
> Does anyone have any experience with this issue or any suggestions?
>
> Cheers,
>
> Lucas
A few notes, as the person who wrote that optfile. For small problems it
obviously works fine...
Solutions:
1) Compile everything (including MPI/NetCDF libraries) with -fPIC (-fpic etc.
should be the same) and link with -i_dynamic. This should work in most if not
all cases.
2) Compile everything (apparently and unfortunately including the MPI/NetCDF
libraries) with -mcmodel=medium. This should work in all cases with a small
performance penalty. However, it requires that the Intel Fortran runtime libs
are also compiled with -mcmodel=medium, and that is currently not the case.
Adding -i_dynamic solves that problem, but the executable then requires the
runtime libraries at run time.
3) Compile everything (apparently and unfortunately including the MPI/NetCDF
libraries) with -mcmodel=large. This should work in all cases with a slightly
larger performance penalty than (2), but the only compiler that will actually
accept the -mcmodel=large flag is the Intel one, and I have no idea whether
it actually does anything with it, because I'm unsure whether the rest of the
GNU toolchain knows what to do about it. Furthermore, it also requires that
the Intel Fortran runtime libs are compiled with -mcmodel=large, and that is
currently not the case. Adding -i_dynamic solves that problem, but the
executable then requires the runtime libraries at run time.
4) In my experience solution (2) works for the GNU and PGI compilers without
requiring dynamic linking of the final executable (that is, a static
executable is possible). For solution (1) only a dynamically linked
executable is possible.
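As a rough sketch, the three approaches above translate into flag sets like
the following (the source file name mytest.f is illustrative; in practice
these flags go into the FFLAGS/LDFLAGS of your genmake2-generated Makefile,
and every library you link against would need the same treatment):

# (1) Position-independent code everywhere, dynamically linked executable:
ifort -fPIC -c mytest.f
ifort -i_dynamic -o mytest mytest.o

# (2) Medium memory model (static data objects may exceed 2GB):
ifort -mcmodel=medium -c mytest.f
ifort -mcmodel=medium -i_dynamic -o mytest mytest.o

# (3) Large memory model (code and data may exceed 2GB; Intel compiler only):
ifort -mcmodel=large -c mytest.f
ifort -mcmodel=large -i_dynamic -o mytest mytest.o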
Explanation:
In its effort to extend the world of x86 to 64 bits in x86_64/AMD64 (and the
compatible EM64T from Intel), AMD chose not to follow a clean 64-bit memory
model such as the one in IA64 or the Alphas. Instead there are three (3)
memory models, aptly named small (the default), medium and large (which is
largely left unimplemented). The difference is that in the small memory model
no code or data object can individually be larger than the 2GB that a 32-bit
pointer can address. In the medium memory model a data object can be larger
than 2GB, and in the large one both code and data can be larger. Since
relocatable data (such as in shared libraries) are addressed indirectly (with
an extra pointer dereference for the relocation), this restriction should not
apply to code compiled with -fPIC.
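The small-model limit can be reproduced with a few shell commands (this is a
sketch assuming gcc on an x86_64 Linux box; the file name big.c and the
~2.8GB array size are just illustrative choices that exceed the reach of a
32-bit PC-relative relocation):

cat > big.c <<'EOF'
/* A static object larger than 2GB in .bss, beyond what a 32-bit
   PC-relative relocation can address in the small memory model. */
static char big[3000000000UL];
int main(void) { big[0] = 1; return 0; }
EOF

gcc big.c -o big
# expected to fail: "relocation truncated to fit: R_X86_64_PC32"
gcc -mcmodel=medium big.c -o big
# should link, since the medium model allows data objects over 2GB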
Illustration:
I changed your code to the following (so that the optimizer would not
eliminate the array accesses) and compiled it with g77 -O3 -funroll-loops
and ifort -O3 -xW:

      program main
      integer*8 i, size
      parameter (size=1000000000)
      real*8 a(size)
      a(1) = 1d-9
      do i = 2, size
         a(i) = 1d-9 + a(i-1)
      enddo
      print *, a(size)
      stop
      end
batsi:/data4/ce107% time testlarge.g77
0.999999993
5.367u 11.211s 0:16.62 99.6% 0+0k 0+0io 0pf+0w
batsi:/data4/ce107% time testlarge.g77medium
0.999999993
10.664u 11.111s 0:21.86 99.5% 0+0k 0+0io 0pf+0w
batsi:/data4/ce107% time testlarge.ifort
0.999999992539933
4.509u 11.439s 0:15.99 99.6% 0+0k 0+0io 0pf+0w
batsi:/data4/ce107% time testlarge.ifortmedium
0.999999992539933
4.575u 11.408s 0:16.01 99.7% 0+0k 0+0io 0pf+0w
batsi:/data4/ce107% time testlarge.ifortlarge
0.999999992539933
4.812u 11.330s 0:16.20 99.6% 0+0k 0+0io 0pf+0w
You will notice that system time (the time for the O/S to allocate the pages
for the 8GB of RAM the array needs) remains essentially the same, but user
time increases as one moves from the small memory model to medium to large.
--
Dr. Constantinos Evangelinos
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology