[MITgcm-support] Fwd: MITgcm with PGI and Ubuntu

Stefano Querin squerin at ogs.trieste.it
Thu Jul 21 11:30:31 EDT 2011


Hi everybody,

I sent this e-mail directly to Jean-Michel and Constantinos some days
ago to avoid overloading the list with too many (and too long...)
mails, but it most likely ended up in spam... I don't know whether
anybody else is interested in this kind of issue.

In short, the model runs VERY slowly on an SGI H2106 node, and we
think some setting is still wrong.
After receiving advice from other colleagues, we think that enabling
processor affinity and process-to-core binding could solve our problem
(on this kind of architecture).
Some people using the SGI H2106 experienced performance issues that
were solved with process placement tools such as numactl. They
observed that the Numatools package (the numactl command) can bind
processes to processors, improving runtime performance.
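A minimal sketch of the kind of binding described above, not a tested recipe. It assumes Linux with numactl installed, 24 cores numbered 0-23 with 6 consecutive cores per NUMA node (the usual layout for two 12-core Opteron 6172 packages, but check with `numactl --hardware`), and an MPI launcher that exports the rank as `OMPI_COMM_WORLD_LOCAL_RANK` (Open MPI) or `PMI_RANK` (MPICH2's Hydra launcher; on a single node the global rank is the local rank). The wrapper name `bind_rank.sh` and the function `numa_prefix` are hypothetical. The MPICH-1 `mpirun` used later in this thread may not export any rank variable, in which case the wrapper simply runs the program unbound.

```shell
#!/bin/bash
# bind_rank.sh (hypothetical name): run one MPI rank pinned to one core,
# with its memory allocated on that core's NUMA node.
# ASSUMPTIONS: cores 0-23 with 6 consecutive cores per NUMA node (check
# with `numactl --hardware`), and a launcher that exports the local rank
# as OMPI_COMM_WORLD_LOCAL_RANK (Open MPI) or PMI_RANK (MPICH2/Hydra).

CORES_PER_NODE=6

# Print the numactl prefix for one local rank.
numa_prefix() {
  local rank=$1
  local node=$(( rank / CORES_PER_NODE ))
  echo "numactl --physcpubind=${rank} --membind=${node}"
}

if [ "$#" -gt 0 ]; then
  rank=${OMPI_COMM_WORLD_LOCAL_RANK:-${PMI_RANK:-}}
  if [ -z "$rank" ]; then
    # No rank variable: run unbound instead of stacking every rank on core 0.
    exec "$@"
  fi
  # The prefix is deliberately unquoted so it splits into separate words.
  exec $(numa_prefix "$rank") "$@"
fi
```

With a launcher that sets one of those variables, usage would look like `mpirun -np 24 -machinefile machines ./bind_rank.sh ./mitgcmuv_24pG4_ADRI_sgi2106e`. `--membind` keeps each rank's memory on its local node, which targets the memory-bandwidth suspicion raised further down the thread; `--localalloc` would be a softer alternative.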

I'm carrying out further tests but any advice would be very useful.

Could these tools also improve performance on other kinds of
platforms (maybe someone else is interested)?
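On that question, a rough first check (a sketch, not a benchmark) is whether a machine has more than one NUMA node at all: binding mainly pays off when some memory is remote from some cores. The hypothetical function below counts the node directories Linux exposes under `/sys/devices/system/node`; the directory argument exists only so it can be pointed at a test tree. For two Opteron 6172 packages one would expect more than one node, but that is an expectation to verify, not a measurement from this thread.

```shell
#!/bin/bash
# Count NUMA nodes as exposed by Linux sysfs (hypothetical helper).
# An optional directory argument replaces the default sysfs path.
count_numa_nodes() {
  local dir=${1:-/sys/devices/system/node}
  local n=0 d
  for d in "$dir"/node[0-9]*; do
    # If the glob matches nothing it stays literal, and -d is false.
    [ -d "$d" ] && n=$((n + 1))
  done
  echo "$n"
}
```

For example, `[ "$(count_numa_nodes)" -gt 1 ] && echo "NUMA machine: process binding may matter"`.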

Thank you!

Cheers,

S.

P.S.: I'm sending a second part with some more statistics...


Begin forwarded message:

> From: Stefano Querin <squerin at ogs.trieste.it>
> Date: 06 July 2011 17:01:07 GMT+02:00
> To: Jean-Michel Campin <jmc at ocean.mit.edu>
> Subject: Re: [MITgcm-support] MITgcm with PGI and Ubuntu
>
> Hi Jean-Michel,
>
> thanks for your answer!
>
>> Hi Stefano,
>>
>> I don't know much about this performance issue.
>> But regarding the genmake2 part on Ubuntu, I don't remember
>> having problems when I tried.
>
> OK, I just wanted to be sure about this, so the problem must be  
> elsewhere...
>
>> What command do you type, and is it the latest version
>> of MITgcm/tools that you are using?
>
> I just updated the compilation and execution-environment related  
> stuff:
>
> /home/squerin/MITgcm/tools
> /home/squerin/MITgcm/eesupp
> /home/squerin/MITgcm/pkg/exch2
> /home/squerin/MITgcm/pkg/regrid
>
>
> For compiling, I launch this script:
>
> #!/bin/bash
> MYPRJ=VECTOR
> MYCODE=codeG4_24p
> MYPKG=DARWIN
> echo "launching MITgcm genmake2 with project $MYPRJ and code $MYCODE ..."
> MITGCM_ROOT=/home/squerin/MITgcm
> MITGCM_BLDOPT=linux_amd64_pgi+mpich+nobyteswap_sgi2106e
> MITGCM_GNMK=${MITGCM_ROOT}/tools/genmake2
> MITGCM_OF=/home/squerin/MIT_home/${MYPRJ}/build_options/${MITGCM_BLDOPT}
> MITGCM_CODE=/home/squerin/MIT_home/${MYPRJ}/${MYCODE}
> ${MITGCM_GNMK} -of=${MITGCM_OF} -rootdir=${MITGCM_ROOT} -mods=${MITGCM_CODE}
>
>
> With build_options:
>
> #!/bin/bash
> #
> #  $Header: /u/gcmpack/MITgcm/tools/build_options/linux_amd64_g77,v 1.1 2004/01/03 05:31:36 edhill Exp $
> #  $Name: checkpoint56 $
> FC=/opt/pgi/linux86-64/2011/mpi/mpich/bin/mpif77
> CC=/opt/pgi/linux86-64/2011/mpi/mpich/bin/mpicc
> LINK=/opt/pgi/linux86-64/2011/mpi/mpich/bin/mpif77
> DEFINES='-DWORDLENGTH=4 -DALLOW_USE_MPI -DALWAYS_USE_MPI'
> CPP='cpp  -traditional -P'
> NOOPTFLAGS='-O0'
> MAKEDEPEND=/usr/bin/makedepend
> INCLUDES='-I/usr/include -I/opt/pgi/linux86-64/2011/mpi/mpich/include'
> #  For IEEE, use the "-ffloat-store" option
> if test "x$IEEE" = x ; then
>    FFLAGS='-r8 -Mnodclchk -Mextend -Ktrap=fp'
>    FOPTIM='-tp k8-64e -pc=64 -fastsse -O3 -Msmart -Mvect=cachesize:1048576,transform'
> else
>    FFLAGS='-r8 -Mnodclchk -Mextend -Ktrap=fp'
>    FOPTIM='-tp k8-64e -pc=64 -fastsse -O3 -Msmart -Kieee -Mvect=cachesize:1048576,transform'
> fi
>
>
> and pkg groups:
>
> # $Header: /u/gcmpack/MITgcm/pkg/pkg_groups,v 1.12 2010/12/19 18:06:04 heimbach Exp $
> # $Name:  $
> #-------------------------------
> #  This file contains:
> #  a) the package "groups" definition where groups of packages
> #     are defined so they can be conveniently substituted.
> #  b) the default package list (defined as a pkg group "default_pkg_list")
> #     that genmake2 will, by default, add to MITgcm.
> #  Note: genmake2 will use this default package list when no customized package
> #   list file "packages.conf" can be found
> #        a) where MITgcm is compiled
> #     or b) in the path of modified sources (genmake2, argument: -mods)
> #-------------------------------
> #  package "groups" definition (including default "default_pkg_list"):
> default_pkg_list : DARWIN
> gfd : mom_common mom_fluxform mom_vecinv generic_advdiff debug mdsio rw monitor
> oceanic : gfd gmredi kpp
> atmospheric : gfd shap_filt
> adjoint : autodiff cost ctrl grdchk
> DARWIN : gfd gmredi kpp timeave obcs exf cal diagnostics ptracers gchem darwin
>
>
> After fixing some small issues, mostly related to the ee-files
> (EEPARAMS.h, eeset_params.F, ...), the code compiles successfully,
> although genmake2 still reports this error:
>
> ...
> Can we register a signal handler using /opt/pgi/linux86-64/2011/mpi/mpich/bin/mpif77...  no
> ...
>
> genmake.log (attached):
>
> ...
> running: check_HAVE_SIGREG()
> /opt/pgi/linux86-64/2011/mpi/mpich/bin/mpicc -c genmake_tc_1.c
> NOTE: your trial license will expire in 7 days, 10.6 hours.
> PGC-F-0206-Can't find include file asm/errno.h (/usr/include/errno.h: 4)
> PGC/x86-64 Linux 11.6-0: compilation aborted
>      program hello
>      integer anint
>      common /iv/ anint
>      external sigreg
>      call sigreg(anint)
>      end
> /opt/pgi/linux86-64/2011/mpi/mpich/bin/mpif77 -r8 -Mnodclchk -Mextend -Ktrap=fp -o genmake_tc genmake_tc_2.f genmake_tc_1.o
> genmake_tc_2.f:
> NOTE: your trial license will expire in 7 days, 10.6 hours.
> /usr/bin/ld: cannot find genmake_tc_1.o: No such file or directory
> --> set HAVE_SIGREG=''
> ...
>
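In the log above, the C test aborts at `asm/errno.h`, and the link step then fails only because `genmake_tc_1.o` was never produced. A quick way to see where (or whether) that header exists on the system is sketched below; the function name `find_asm_include_dirs` is hypothetical. One assumption, not verified on this machine: on multiarch Ubuntu releases the kernel `asm` headers can live under an architecture-specific directory such as `/usr/include/x86_64-linux-gnu`, which a compiler that only searches `/usr/include` will not find.

```shell
#!/bin/bash
# Hypothetical helper: list directories under ROOT (default /usr/include)
# that contain asm/errno.h. Each result is a candidate "-I" directory.
find_asm_include_dirs() {
  local root=${1:-/usr/include}
  find "$root" -path '*/asm/errno.h' 2>/dev/null \
    | sed 's|/asm/errno\.h$||' \
    | sort
}
```

If the only hit is something like `/usr/include/x86_64-linux-gnu`, making that directory visible to the PGI C compiler (for example with an extra `-I` flag) is one thing to try. Note that the log shows `mpicc` called with no extra flags for this test, and which optfile variable genmake2 passes to it is not checked here.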
>
> Then I launch the executable with this command line:
>
> /opt/pgi/linux86/2011/mpi/mpich/bin/mpirun -leave_pg -np 24 -machinefile machines ./mitgcmuv_24pG4_ADRI_sgi2106e
>
> But the model is EXTREMELY slow...
>
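To see whether the 24 running ranks are actually bound to cores (and to compare before and after trying numactl), one option is to read each process's allowed-CPU list from `/proc`. A sketch, assuming Linux and procps `pgrep`; the script name `show_affinity.sh` and the function `cpus_allowed` are hypothetical:

```shell
#!/bin/bash
# Print the CPU list a process may run on, read from /proc/PID/status
# (Linux only).
cpus_allowed() {
  awk '/^Cpus_allowed_list:/ {print $2}' "/proc/$1/status"
}

# Usage: ./show_affinity.sh NAME  (prints "PID<TAB>cpus" per matching process)
if [ "$#" -gt 0 ]; then
  for pid in $(pgrep "$1"); do
    printf '%s\t%s\n' "$pid" "$(cpus_allowed "$pid")"
  done
fi
```

For example, run `./show_affinity.sh mitgcmuv` while the model is running. An unbound rank typically shows the whole range (0-23 on this node), while a rank started through `numactl --physcpubind` shows a single core.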
>> (f90mkdepend was rewritten in bash some time ago).
>
> Now it is up to date.
>
>> Cheers,
>> Jean-Michel
>
> So the problem still seems to be there somewhere...
> The strange thing is that I ran the same model, with the same
> configuration, on another machine with excellent performance. I mean
> that the code was exactly the same (also with an old version of
> tools, genmake2, etc...), with the same ICs and BCs. The only changes
> were, of course, the hardware and the compiler (and related options).
>
> Any ideas?
>
> Thank you very much!
>
> Cheers,
>
> SQ
>
>
>
>> On Tue, Jul 05, 2011 at 05:45:24PM +0200, Stefano Querin wrote:
>>> Dear MITgcmers,
>>>
>>> we are still trying to understand what's wrong with the new SGI node
>>> (H2106-G7, 2 Opteron 6172 with 12 cores each, 2.1GHz, 12MB L3 cache,
>>> RAM 15.68 GB) on our cluster. We are experiencing very low
>>> performance (including poor scalability: see the previous
>>> "[MITgcm-support] Scalability on a new Sgi node" thread). Most likely
>>> (as Constantinos told us), there is a problem with memory
>>> access/bandwidth.
>>> Anyway, we extracted the node from the cluster and used it as a
>>> stand-alone machine in order to isolate the problem: in fact, the
>>> cluster has older CPUs and an older compiler version (PGI 6.1)... We
>>> installed Ubuntu 11.04 (DISTRIB_CODENAME=natty) on the node with a
>>> trial version of the up-to-date PGI compiler (11.6, linux86-64).
>>> There are errors and warnings during the compilation; in particular,
>>> when launching genmake2 we get:
>>>
>>>> launching MITgcm genmake2 with project VECTOR and code codeG4_24p ...
>>>>
>>>> GENMAKE :
>>>>
>>>> A program for GENerating MAKEfiles for the MITgcm project.  For a
>>>> quick list of options, use "genmake -h" or for more detail see:
>>>>
>>>> http://mitgcm.org/devel_HOWTO/
>>>>
>>>> ===  Processing options files and arguments  ===
>>>> getting local config information:  none found
>>>> grep: write error: Broken pipe
>>>
>>> I don't know why...
>>>
>>>> getting OPTFILE information:
>>>>  using OPTFILE="/home/squerin/MIT_home/VECTOR/build_options/linux_amd64_pgi+mpich+nobyteswap_sgi2106e"
>>>> getting AD_OPTFILE information:
>>>>  using AD_OPTFILE="/home/squerin/MITgcm/tools/adjoint_options/adjoint_default"
>>>>
>>>> ===  Checking system libraries  ===
>>>> Do we have the system() command using /opt/pgi/linux86-64/2011/mpi/mpich/bin/mpif77...  yes
>>>> Do we have the fdate() command using /opt/pgi/linux86-64/2011/mpi/mpich/bin/mpif77...  yes
>>>> Do we have the etime() command using /opt/pgi/linux86-64/2011/mpi/mpich/bin/mpif77...  yes
>>>> Can we call simple C routines (here, "cloc()") using /opt/pgi/linux86-64/2011/mpi/mpich/bin/mpif77...  yes
>>>> Can we unlimit the stack size using /opt/pgi/linux86-64/2011/mpi/mpich/bin/mpif77...  yes
>>>> Can we register a signal handler using /opt/pgi/linux86-64/2011/mpi/mpich/bin/mpif77...  no
>>>
>>> Usually this check was OK...
>>>
>>>> Can we use stat() through C calls...  yes
>>>> Can we create NetCDF-enabled binaries...  no
>>>
>>> This is OK since we don't use NetCDF.
>>>
>>>> ===  Setting defaults  ===
>>>> Adding MODS directories: /home/squerin/MIT_home/VECTOR/codeG4_24p
>>>> Making source files in eesupp from templates
>>>> Making source files in pkg/exch2 from templates
>>>> Making source files in pkg/regrid from templates
>>>>
>>>> ===  Determining package settings  ===
>>>> getting package dependency info from  /home/squerin/MITgcm/pkg/pkg_depend
>>>> checking default package list:
>>>>  using PDEFAULT="/home/squerin/MIT_home/VECTOR/pkg/pkg_default_DARWIN"
>>>>  before group expansion packages are:  DARWIN
>>>>  replacing "DARWIN" with:   gfd gmredi kpp timeave obcs exf cal
>>>> diagnostics ptracers gchem darwin
>>>>  replacing "gfd" with:   mom_common mom_fluxform mom_vecinv
>>>> generic_advdiff debug mdsio rw monitor
>>>>  after group expansion packages are:   mom_common mom_fluxform
>>>> mom_vecinv generic_advdiff debug mdsio rw monitor gmredi kpp
>>>> timeave obcs exf cal diagnostics ptracers gchem darwin
>>>> applying DISABLE settings
>>>> applying ENABLE settings
>>>>  packages are:   cal darwin debug diagnostics exf gchem
>>>> generic_advdiff gmredi kpp mdsio mom_common mom_fluxform
>>>> mom_vecinv monitor obcs ptracers rw timeave
>>>> applying package dependency rules
>>>>  packages are:   cal darwin debug diagnostics exf gchem
>>>> generic_advdiff gmredi kpp mdsio mom_common mom_fluxform
>>>> mom_vecinv monitor obcs ptracers rw timeave
>>>> Adding STANDARDDIRS
>>>> Searching for *OPTIONS.h files in order to warn about the presence
>>>>  of "#define "-type statements that are no longer allowed:
>>>>  found CPP_OPTIONS="/home/squerin/MIT_home/VECTOR/codeG4_24p/CPP_OPTIONS.h"
>>>>  found CPP_EEOPTIONS="/home/squerin/MITgcm/eesupp/inc/CPP_EEOPTIONS.h"
>>>> Creating the list of files for the adjoint compiler.
>>>>
>>>> ===  Creating the Makefile  ===
>>>> setting INCLUDES
>>>> Determining the list of source and include files
>>>> Writing makefile: Makefile
>>>> Add the source list for AD code generation
>>>> Making list of "exceptions" that need ".p" files
>>>> Making list of NOOPTFILES
>>>> Add rules for links
>>>> Adding makedepend marker
>>>>
>>>> ===  Done  ===
>>>
>>> I also attach the "genmake_warnings" and "genmake_state" files.
>>>
>>> When launching "make depend" we get this (at the end):
>>>
>>>> /home/squerin/MITgcm/tools/f90mkdepend >> Makefile
>>>> /bin/sh: /home/squerin/MITgcm/tools/f90mkdepend: not found
>>>> make: *** [depend] Error 127
>>>
>>> but we specified: -rootdir=/home/squerin/MITgcm
>>>
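The `make depend` failure above reports `not found` (exit status 127) for a path that may well exist. Under `/bin/sh`, a script whose first (`#!`) line names an interpreter that is missing, or that ends in a DOS carriage return, can fail with that same message, as can a file that is simply absent. A small check, offered as a sketch; the function name `check_script` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: report why a script might fail with "not found".
check_script() {
  local f=$1 first rest interp
  if [ ! -e "$f" ]; then echo "missing"; return 1; fi
  if [ ! -x "$f" ]; then echo "not executable"; return 1; fi
  IFS= read -r first < "$f" || true
  case $first in
    '#!'*)
      rest=${first#??}                 # drop the leading "#!"
      read -r interp _ <<< "$rest"     # first word = interpreter path
      # A trailing "\r" (DOS line ending) stays in $interp and also fails here.
      if [ ! -x "$interp" ]; then
        echo "interpreter not found: $interp"
        return 1
      fi
      ;;
  esac
  echo "ok"
}
```

For example, `check_script /home/squerin/MITgcm/tools/f90mkdepend`.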
>>> Then "mitgcmuv" is built without warnings, but the executable is
>>> extremely slow...
>>> We never experienced these warnings/errors in the past, even on
>>> different HPC systems.
>>> This looks like a system libraries/environment problem, but I'm not
>>> a computer scientist, so it could be something else (totally
>>> different)...
>>> Has anybody tested Ubuntu 11.04? Should we try an older OS version
>>> (8 or 9)? I'm getting stuck...
>>>
>>> Thanks for any suggestion!
>>>
>>> Cheers,
>>>
>>> Stefano
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: genmake.log
Type: application/octet-stream
Size: 5104 bytes
Desc: not available
URL: <http://mitgcm.org/pipermail/mitgcm-support/attachments/20110721/15356a3f/attachment.obj>