[MITgcm-support] Out of memory: optfile for IBM AIX with compiler xlf90 (michael schaferkotter)

michael schaferkotter schaferk at bellsouth.net
Fri Sep 5 07:49:32 EDT 2014


Jean-Michel has helped focus the discussion.

to test the conjecture about the 'burden' of the diagnostics package (which i 'hinted' at rather than stated explicitly), remove the diagnostics package from the compile: comment it out in code/packages.conf, recompile, and run for a few time steps. setting useDiagnostics=.FALSE. in data.pkg is not enough, because the package is still compiled into the executable. removing it from the compile takes diagnostics out of the picture, in case it is the culprit.

if it is not the problem, then memory will need to be looked at as jm outlines.
if it is the problem, then i have given suggestions for still using diagnostics while reducing the size of the executable.

#
# $Header: /u/gcmpack/MITgcm/verification/plume_on_slope/code/packages.conf,v 1.3 2003/10/10 00:35:37 jmc Exp $
# $Name:  $

# DISABLE="aim gmredi zonal_filt"
# ENABLE="kpp shap_filt obcs timeave"

debug
generic_advdiff
kpp
mdsio
mom_fluxform
mom_vecinv
monitor
obcs
rw
timeave
cal
exf
####diagnostics              <---------  
#ecco
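For anyone who wants to script the comment-out step, here is a minimal sketch, assuming bash and a packages.conf that lists one package name per line as above. The helper name `comment_out_pkg` is hypothetical. It writes through a temporary file instead of using `sed -i`, which is a GNU extension and may not be available in AIX sed.

```bash
#!/bin/bash
# comment_out_pkg (hypothetical helper): prefix the line naming a package
# in packages.conf with '#', so genmake2 no longer compiles that package.
# Usage: comment_out_pkg <packages.conf> <package-name>
comment_out_pkg() {
  local conf="$1" pkg="$2" tmp
  tmp="${conf}.tmp.$$"
  # Match the package name alone on a line (optionally padded by blanks).
  # Lines that already start with '#' do not match and are left untouched.
  sed "s/^[[:space:]]*${pkg}[[:space:]]*\$/#${pkg}/" "$conf" > "$tmp" &&
    cp "$conf" "${conf}.bak" &&   # keep a backup of the original file
    mv "$tmp" "$conf"
}
```

For example, `comment_out_pkg code/packages.conf diagnostics`, then rebuild in the build directory with the same genmake2 invocation as before, followed by `make depend` and `make`.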


On Sep 4, 2014, at 11:11 PM, Jean-Michel Campin wrote:

> Dear Wangg,
> 
> First of all, I am sorry that I am not able to write your first name
> correctly, so I ended up using your username in this reply.
> 
> I would like to make a few suggestions:
> 1) from the error you get from the system, it seems that the executable
>  is asking for too much memory.
>  what is the memory limit of the platform/nodes you are trying to
>  run on? and what is the size of your executable (which you can get
>  from "size mitgcmuv")?
> 2) if the first one (memory of the node) is significantly larger than
>  the second, then it might have something to do with:
>  a) queue limitations?
>  b) memory model? I know that on some platforms and with some compilers,
>  I need to set the compiler flag "-mcmodel=medium" or sometimes
>  "-fPIC" to run executables larger than 2 GB.
>  c) something else specific to this computer (and I would recommend
>  asking your sys-admin for help).
> 3) if the first one (memory of the node) is smaller than the second
>  (size of the executable),
>  you could start by removing some packages (from packages.conf) that
>  you know you are not using, re-compile, and see what happens.
>  For instance (although neither uses much memory), you are likely
>  to use only one of the 2 momentum pkgs (pkg/mom_fluxform by default, or
>  pkg/mom_vecinv if you set vectorInvariantMomentum=.TRUE. in "data"),
>  so you should be able to leave the other one out of the compile.
>  The content of your "packages.conf" might help us to advise where
>  to save some memory.
> 4) Regarding Michael's suggestion: turning off "useDiagnostics" (a run-time parameter)
>  is not going to fix the problem, which is a compile-step issue.
>  If you want to keep using pkg/diagnostics but would like
>  to reduce the memory footprint, I would suggest only playing with
>  the 2 parameters mentioned in DIAGNOSTICS_SIZE.h:
>> C Note : may need to increase "numDiags" when using several 2D/3D diagnostics,
>> C  and "diagSt_size" (statistics-diags) since values here are deliberately small.
>  since the other parameters, at their default values, are not responsible for
>  large memory usage, and it might not always be safe to change them.
> 
> Cheers,
> Jean-Michel
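To put numbers on point 4, here is a rough sketch of the memory taken per MPI process by the main diagnostics accumulation array. It assumes that array is declared in pkg/diagnostics/DIAGNOSTICS.h as `_RL qdiag(1-OLx:sNx+OLx,1-OLy:sNy+OLy,numDiags,nSx,nSy)` with 8-byte `_RL`; please check the declaration in your own checkout. The helper name `qdiag_bytes` is hypothetical, and the inputs are the SIZE.h values quoted further down in this thread.

```bash
#!/bin/bash
# qdiag_bytes (hypothetical helper): approximate bytes per MPI process used
# by the diagnostics accumulation array, ASSUMING the declaration
#   _RL qdiag(1-OLx:sNx+OLx,1-OLy:sNy+OLy,numDiags,nSx,nSy)
# and 8-byte _RL. Arguments: sNx sNy OLx OLy nSx nSy numDiags
qdiag_bytes() {
  local sNx=$1 sNy=$2 OLx=$3 OLy=$4 nSx=$5 nSy=$6 numDiags=$7
  echo $(( (sNx + 2*OLx) * (sNy + 2*OLy) * numDiags * nSx * nSy * 8 ))
}

# SIZE.h values from this thread (Nr = 50); compare three numDiags settings.
Nr=50
for n in $((1*Nr)) $((10*Nr)) $((60*Nr)); do
  b=$(qdiag_bytes 250 80 3 3 1 1 "$n")
  # Round to the nearest MiB for display.
  echo "numDiags=$n: $b bytes (~$(( (b + 524288) / 1048576 )) MiB)"
done
```

Under these assumptions, numDiags = 10*Nr (as in the DIAGNOSTICS_SIZE.h quoted below) gives roughly 84 MiB per process, the 60*Nr value in Michael's diff roughly 504 MiB, and 1*Nr roughly 8 MiB. With 8 tasks on one node (tasks_per_node=8 in the job script), the per-node total is 8 times these figures.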
> 
> On Fri, Sep 05, 2014 at 09:39:45AM +0800, 王刚 wrote:
>> Dear Michael,
>> Thank you for your advice. I disabled the diagnostics by setting useDiagnostics=.FALSE. in the data.pkg file. However, the problem is still there. I think the size of my domain is not very large. Here is my SIZE.h:
>>  PARAMETER (
>>     &           sNx =  250,
>>     &           sNy =  80,
>>     &           OLx =   3,
>>     &           OLy =   3,
>>     &           nSx =   1,
>>     &           nSy =   1,
>>     &           nPx =   8,
>>     &           nPy =   1,
>>     &           Nx  = sNx*nSx*nPx,
>>     &           Ny  = sNy*nSy*nPy,
>>     &           Nr  =  50)
>> 
>> and DIAGNOSTICS_SIZE.h:
>> 
>>      PARAMETER( ndiagMax = 500 )
>>      PARAMETER( numlists = 10, numperlist = 50, numLevels=2*Nr )
>>      PARAMETER( numdiags = 10*Nr )
>>      PARAMETER( nRegions = 0 , sizRegMsk = 1 , nStats = 4 )
>>      PARAMETER( diagSt_size = 10*Nr )
>> 
>> Did I misunderstand your suggestion? The job runs fine on an HP machine when compiled with ifort.
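As a quick check of whether this domain is large, the global grid follows from Nx = sNx*nSx*nPx and Ny = sNy*nSy*nPy as defined in the SIZE.h above. The sketch below also estimates the memory of a single 3D field on one process including halo cells, assuming 8-byte reals; the helper name `grid_summary` is hypothetical.

```bash
#!/bin/bash
# grid_summary (hypothetical helper): print grid sizes implied by SIZE.h.
# Arguments: sNx sNy OLx OLy nSx nSy nPx nPy Nr
grid_summary() {
  local sNx=$1 sNy=$2 OLx=$3 OLy=$4 nSx=$5 nSy=$6 nPx=$7 nPy=$8 Nr=$9
  local Nx=$(( sNx * nSx * nPx )) Ny=$(( sNy * nSy * nPy ))
  # Points in one 3D field on one process, including halo (overlap) cells.
  local per_proc=$(( (sNx + 2*OLx) * (sNy + 2*OLy) * nSx * nSy * Nr ))
  echo "Nx=$Nx Ny=$Ny global_points=$(( Nx * Ny * Nr ))"
  # ASSUMPTION: 8-byte reals (_RL as real*8).
  echo "per_process_points=$per_proc bytes_per_3D_field=$(( per_proc * 8 ))"
}

grid_summary 250 80 3 3 1 1 8 1 50
```

For these values it reports Nx=2000, Ny=80 and 8,000,000 grid points in total (far fewer than the 330-million-point grid Michael mentions), and about 8.8 MB per 3D field per process, so arrays sized by the grid alone look modest.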
>> 
>>> 
>>> Date: Thu, 4 Sep 2014 10:43:30 -0500
>>> From: michael schaferkotter <schaferk at bellsouth.net>
>>> To: mitgcm-support at mitgcm.org
>>> Subject: Re: [MITgcm-support] Out of memory: optfile for IBM AIX with
>>> 	compiler xlf90
>>> Message-ID: <879FDD18-7922-4F82-A30D-6B19F17509A6 at bellsouth.net>
>>> Content-Type: text/plain; charset=utf-8
>>> 
>>> greetings;
>>> 
>>> the problem may or may not be the optfile (probably not).
>>> 
>>> 
>>> 
>>> what i would like to see are the code/SIZE.h and code/DIAGNOSTICS_SIZE.h files.
>>> 
>>> using the standard distribution code/DIAGNOSTICS_SIZE.h file caused similar problems for me with a largish domain (a 330-million-point computational grid).
>>> 
>>> re: diagnostics pkg.
>>> 
>>> though very useful in many cases, the use of diagnostics can make the executable large.
>>> 
>>> here is how code/DIAGNOSTICS_SIZE.h was altered to make the executable smaller:
>>> 
>>> diff DIAGNOSTICS_SIZE.h 
>>> 1c1
>>> < C $Header: /u/gcmpack/MITgcm/pkg/diagnostics/DIAGNOSTICS_SIZE.h,v 1.5 2008/02/05 15:31:19 jmc Exp $
>>> ---
>>>> C $Header: /u/gcmpack/MITgcm/pkg/diagnostics/DIAGNOSTICS_SIZE.h,v 1.4 2006/01/23 22:24:28 jmc Exp $
>>> 24,25c24,25
>>> <       PARAMETER( numlists = 10, numperlist = 50, numLevels=2*Nr )
>>> <       PARAMETER( numDiags = 1*Nr )
>>> ---
>>>>      PARAMETER( numlists = 6, numperlist = 10, numLevels=Nr )
>>>>      PARAMETER( numDiags = 60*Nr )
>>> 27c27
>>> <       PARAMETER( diagSt_size = 10*Nr )
>>> ---
>>>>      PARAMETER( diagSt_size = 60*Nr )
>>> 
>>> recompile and run again.
>>> 
>>> you can disable the use of diagnostics at run time in data.pkg:
>>> 
>>> [mach:DOMAIN/expt_num/run] me% more data.pkg
>>> # Packages
>>> &PACKAGES
>>> useOBCS=.TRUE.,
>>> #useDiagnostics=.TRUE.,
>>> useMNC=.FALSE.,
>>> useEXF=.TRUE.,
>>> #useEcco=.TRUE.,
>>> &
>>> 
>>> The estimated time to complete the test is approximately 10 minutes, plus time waiting in the batch queue.
>>> 
>>> 
>>> On Sep 3, 2014, at 7:59 PM, 王刚 wrote:
>>> 
>>>> Dear all,
>>>> 
>>>> Can someone help me check my optfile for IBM AIX 6.1 with the xlf90 compiler? The compilation succeeds and produces the executable mitgcmuv. However, the job ends after a short run with an error message like:
>>>> exec(): 0509-036 Cannot load program ./mitgcmuv because of the following errors:
>>>>        0509-026 System errors: There is not enough memory available now
>>>> 
>>>> My job uses 8 CPUs, but the machine still has at least 100 CPUs left! I think the problem is due to wrong parameters in the optfile script. My optfile looks like:
>>>> 
>>>> #!/bin/bash
>>>> #
>>>> # $Name: checkpoint65b $
>>>> #  using the following invocation:
>>>> #    ../../../tools/genmake2 -mpi -mods=../code -of=../../../tools/build_options/IBM_AIX_xlf90+mpi
>>>> 
>>>> S64='$(TOOLSDIR)/set64bitConst.sh'
>>>> MAKEDEPEND=makedepend
>>>> DEFINES='-DTARGET_AIX -DALLOW_USE_MPI -DALWAYS_USE_MPI  -DWORDLENGTH=4'
>>>> 
>>>> INCLUDES='-I/usr/local64/include -I/usr/lpp/ppe.poe/include/thread64'
>>>> CPP='/usr/lib/cpp -P'
>>>> CC='mpcc -q64'
>>>> FC='mpxlf90 -q64'
>>>> LINK='mpxlf90 -q64'
>>>> MPI='true'
>>>> LIBS="-L/usr/lib64 -L/usr/local64/lib -L/usr/lpp/ppe.poe/lib64 -lmpi  -lnetcdf"
>>>> FFLAGS='-qfixed=132'
>>>> if test "x$IEEE" = x ; then
>>>>    #  No need for IEEE-754
>>>>    FOPTIM='-O3 -qarch=pwr7 -qtune=pwr7 -qhot'
>>>>    #CFLAGS='-O3 -Q -qarch=auto -qtune=auto -qcache=auto -qmaxmem=-1'
>>>> else
>>>>    #  Try to follow IEEE-754
>>>>    FOPTIM='-O3 -qstrict -Q -qarch=auto -qtune=auto -qcache=auto -qmaxmem=-1'
>>>>    #CFLAGS='-O3 -qstrict -Q -qarch=auto -qtune=auto -qcache=auto -qmaxmem=-1'
>>>> fi
>>>> FC_NAMEMANGLE="#define FC_NAMEMANGLE(X) X"
>>>> 
>>>> 
>>>> I submit my task using another script: 
>>>> 
>>>> #!/usr/bin/ksh
>>>> #@job_type=parallel
>>>> #@job_name=task1
>>>> #@ class = normal
>>>> #@ group = group2
>>>> #@node    =1
>>>> #@tasks_per_node=8
>>>> #@output=$(job_name).out
>>>> #@error=$(job_name).err
>>>> #@queue
>>>> poe ./mitgcmuv 
>>>> 
>>>> I would greatly appreciate your help!
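Regarding point 2a in Jean-Michel's reply (queue limitations), one low-cost check is to have the job print the resource limits it actually runs under, together with the executable size, before launching the model. A sketch, assuming the job shell supports `ulimit -d`, `ulimit -s` and `ulimit -a` (bash and ksh both do); the helper name `log_limits` is hypothetical, and the limits seen by the tasks that poe starts may differ from those of the job shell.

```bash
#!/bin/bash
# log_limits (hypothetical helper): record the per-process limits the batch
# job inherits, so they can be compared with the size of the executable.
log_limits() {
  echo "data segment limit (KB or 'unlimited'): $(ulimit -d)"
  echo "stack limit (KB or 'unlimited'):        $(ulimit -s)"
  echo "all limits:"
  ulimit -a
}

# In the job script, just before the launch line, one might add:
#   log_limits > limits.txt
#   size ./mitgcmuv >> limits.txt
#   poe ./mitgcmuv
```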
>>>> 
>>>> 
>>>> _______________________________________________
>>>> MITgcm-support mailing list
>>>> MITgcm-support at mitgcm.org
>>>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>>> 
>>> 
>>> 
>>> 



