[MITgcm-support] Compiler options for ifort on x86_64
Constantinos Evangelinos
ce107 at ocean.mit.edu
Thu Oct 20 10:52:34 EDT 2005
On Thursday 20 October 2005 06:31, Lucas Merckelbach wrote:
> Hi *,
>
> In an attempt to compile the mitgcm model for a 512x512x32 grid on
> x86_64 machines (EM64T and AMD64), the linking stage fails with
>
> impldiff.o: In function `impldiff_':
> impldiff.f:(.text+0x697): relocation truncated to fit: R_X86_64_PC32
> against `.bss'
>
> which apparently is caused by arrays that are too big, as a 256x256x32
> grid compiles just fine. A very similar "relocation truncated to fit"
> error is obtained with the following test program:
>
> program main
> real*8 a(600000000)
> a(1) = 10.0
> stop
> end
>
> when compiled without any specific options
>
> /nerc/packages/intel_compilers/intel_fce_8.1/lib/libifcore.a(for_init.o)(.text+0x20):
> In function `for_rtl_init_':
> : relocation truncated to fit: R_X86_64_PC32 .bss
>
> However, adding the option -i_dynamic to the linker (also ifort) cures
> this. Unfortunately, it makes no difference for the mitgcm model when I add
> the option to the linker (in the Makefile). Adding -fpic to FFLAGS, as
> suggested in the mitgcm archives, didn't help either.
>
> The Makefile is generated from
> genmake2 -fc=ifort -of [..]/build_options/linux_amd64_ifort
> Then the Makefile is tweaked to have the option -i_dynamic during linking.
>
> Using g77 as fortran compiler fails as well, also generating "relocation
> truncated to fit" errors.
>
> Then I thought, inspired by what google came up with, that it may be
> related to the linker 'ld'.
>
> $ ld --version
> GNU ld version 2.15.92.0.2 20040927
>
> Installing the newest version of binutils:
> $ ld --version
> GNU ld version 2.16
> didn't make a difference, though.
>
> Using the build-options file "linux_ia64_ifort" on an ia64 machine,
> everything *does* work.
>
> Does anyone have any experience with this issue or any suggestions?
>
> Cheers,
>
> Lucas
A few notes, as the person who wrote that optfile. For small problems it
obviously works fine...
Solutions:
1) Compile everything (including MPI/NetCDF libraries) with -fPIC (-fpic etc.
should be the same) and link with -i_dynamic. This should work in most if not
all cases.
2) Compile everything (apparently and unfortunately including the MPI/NetCDF
libraries) with -mcmodel=medium. This should work in all cases with a small
performance penalty. However, it requires that the Intel Fortran runtime libs
are also compiled with -mcmodel=medium, and that is currently not the case.
Adding -i_dynamic solves that problem, but the executable then requires the
runtime libraries at run time.
3) Compile everything (apparently and unfortunately including the MPI/NetCDF
libraries) with -mcmodel=large. This should work in all cases with a slightly
larger performance penalty than (2), but the only compiler that will actually
accept the -mcmodel=large flag is the Intel one, and I have no idea whether
it actually does anything with it, because I'm unsure whether the rest of the
GNU toolchain knows what to do about it. Furthermore, it also requires that
the Intel Fortran runtime libs are compiled with -mcmodel=large, and that is
currently not the case. Adding -i_dynamic solves that problem, but the
executable then requires the runtime libraries at run time.
4) In my experience solution (2) works for the GNU and PGI compilers without
requiring dynamic linking of the final executable (that is, a static
executable is possible). For solution (1) only a dynamically linked
executable is possible.
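As a rough sketch, the three approaches above translate into flag sets like
the following (the source file name mytest.f is illustrative; in practice
these flags go into the FFLAGS/LDFLAGS of your genmake2-generated Makefile,
and every library you link against would need the same treatment):

# (1) Position-independent code everywhere, dynamically linked executable:
ifort -fPIC -c mytest.f
ifort -i_dynamic -o mytest mytest.o

# (2) Medium memory model (static data objects may exceed 2GB):
ifort -mcmodel=medium -c mytest.f
ifort -mcmodel=medium -i_dynamic -o mytest mytest.o

# (3) Large memory model (code and data may exceed 2GB; Intel compiler only):
ifort -mcmodel=large -c mytest.f
ifort -mcmodel=large -i_dynamic -o mytest mytest.o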
Explanation:
In its effort to extend the world of x86 to 64 bits in x86_64/AMD64 (and the
compatible EM64T from Intel), AMD chose not to follow a clean 64-bit memory
model such as the one in IA64 or the Alphas. Instead there are three (3)
memory models, aptly named small (the default), medium and large (which is
largely left unimplemented). The difference is that in the small memory model
no code or data object can individually be larger than the 2GB that a 32-bit
pointer can address. In the medium memory model a data object can be larger
than 2GB, and in the large one both code and data can be larger. Since
relocatable data (such as in shared libraries) are addressed indirectly (with
an extra pointer dereference for the relocation), this restriction should not
apply to code compiled with -fPIC.
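The small-model limit can be reproduced with a few shell commands (this is a
sketch assuming gcc on an x86_64 Linux box; the file name big.c and the
~2.8GB array size are just illustrative choices that exceed the reach of a
32-bit PC-relative relocation):

cat > big.c <<'EOF'
/* A static object larger than 2GB in .bss, beyond what a 32-bit
   PC-relative relocation can address in the small memory model. */
static char big[3000000000UL];
int main(void) { big[0] = 1; return 0; }
EOF

gcc big.c -o big
# expected to fail: "relocation truncated to fit: R_X86_64_PC32"
gcc -mcmodel=medium big.c -o big
# should link, since the medium model allows data objects over 2GB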
Illustration:
I changed your code to the following (so that the optimizer would not
eliminate the array accesses) and compiled it with g77 -O3 -funroll-loops
and ifort -O3 -xW:

      program main
      integer*8 i, size
      parameter (size=1000000000)
      real*8 a(size)
      a(1) = 1d-9
      do i = 2, size
         a(i) = 1d-9 + a(i-1)
      enddo
      print *, a(size)
      stop
      end
batsi:/data4/ce107% time testlarge.g77
0.999999993
5.367u 11.211s 0:16.62 99.6% 0+0k 0+0io 0pf+0w
batsi:/data4/ce107% time testlarge.g77medium
0.999999993
10.664u 11.111s 0:21.86 99.5% 0+0k 0+0io 0pf+0w
batsi:/data4/ce107% time testlarge.ifort
0.999999992539933
4.509u 11.439s 0:15.99 99.6% 0+0k 0+0io 0pf+0w
batsi:/data4/ce107% time testlarge.ifortmedium
0.999999992539933
4.575u 11.408s 0:16.01 99.7% 0+0k 0+0io 0pf+0w
batsi:/data4/ce107% time testlarge.ifortlarge
0.999999992539933
4.812u 11.330s 0:16.20 99.6% 0+0k 0+0io 0pf+0w
You will notice that system time (the time for the O/S to allocate the pages
for the 8GB of RAM the array needs) remains essentially the same, but user
time increases as one moves from the small memory model to medium to large.
--
Dr. Constantinos Evangelinos
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology