[MITgcm-support] optim_m1qn3 on ARCHER supercomputer

Daniel Goldberg dngoldberg at gmail.com
Thu Apr 11 09:58:11 EDT 2019


Hi Martin

I think I might be in business! I updated the Makefile and now get the
following in m1qn3_output.txt after the first iteration (below), i.e. the file
is read, the headers are correct, and there are no NaNs in the analysis.

Thanks very much for your help on this!

Best
Dan

 M1QN3 (Version 3.3, October 2009): entry point
     dimension of the problem (n):        385208
     absolute precision on x (dxmin):          1.00E-06
     expected decrease for f (df1):            2.96E+01
     relative precision on g (epsg):           1.00E-06 (two-norm)
     maximal number of iterations (niter):  1000
     maximal number of simulations (nsim):  5000
     printing level (impres):                 10
     reverse communication

 m1qn3: Diagonal Initial Scaling mode

     allocated memory (ndz) :  9245002
     used memory :             9245002
     number of updates :            10
     (y,s) pairs are stored in core memory

 m1qn3: cold start

     f             =  2.29644015E+02
     two-norm of g =  1.09602268E+00

 m1qn3a: descent direction -g: precon =  0.494E+02

 -------------------------------------------------------------------------------

 m1qn3: iter 1, simul 1, f= 2.29644015E+02, h'(0)=-5.92880E+01

 m1qn3: line search

     mlis3       fpn=-5.929E+01 d2= 2.93E+03  tmin= 2.96E-08 tmax= 1.00E+20
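The memory lines in this log are internally consistent. Our reading of the m1qn3 source (an assumption, not stated in this thread) is that in DIS mode the work array holds 4n reals plus 2n+1 reals per stored (y,s) pair, so ndz = 4n + m(2n+1). The small Python check below uses hypothetical helper names and recovers `number of updates : 10` from the logged n = 385208 and ndz = 9245002.

```python
def m1qn3_updates(n: int, ndz: int, diagonal_scaling: bool = True) -> int:
    """Number of (y,s) pairs that fit in a work array of size ndz.

    Assumes ndz = 4n + m(2n+1) in DIS mode and ndz = 3n + m(2n+1) in SIS
    mode (our reading of m1qn3, not taken from this thread).
    """
    fixed = 4 * n if diagonal_scaling else 3 * n
    return (ndz - fixed) // (2 * n + 1)


def m1qn3_ndz(n: int, m: int, diagonal_scaling: bool = True) -> int:
    """Work-array size needed to store m updates (same assumption)."""
    fixed = 4 * n if diagonal_scaling else 3 * n
    return fixed + m * (2 * n + 1)


if __name__ == "__main__":
    n, ndz = 385208, 9245002          # values from m1qn3_output.txt above
    print(m1qn3_updates(n, ndz))      # 10
    print(m1qn3_ndz(n, 10) == ndz)    # True
```

Since 4 × 385208 + 10 × (2 × 385208 + 1) = 1540832 + 7704170 = 9245002 exactly, "allocated memory" and "used memory" coincide here.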


On Thu, Apr 11, 2019 at 10:37 AM Martin Losch <Martin.Losch at awi.de> wrote:

> Hi Dan,
>
> I have found and fixed the initialization bug that led to NaNs in my
> output, and it’s all checked into GitHub. With that change you don’t even
> need to use the DYNAMIC flag (but there’s no harm in using it; maybe it
> should be the default?).
>
> Martin
>
> > On 11. Apr 2019, at 11:08, GOLDBERG Daniel <Dan.Goldberg at ed.ac.uk> wrote:
> >
> > Hi Martin
> >
> > Thanks for this. I have not had a chance to do anything yet, but --
> >
> > 1) -D_BYTESWAPIO: yes, this was in my optfile. I notice it is a CPP flag, whereas I used "-h byteswapio" as a Fortran flag in the optim_m1qn3 Makefile, and I hope this changes things. So I will try again with more consistent flags! And thanks, that makes more sense re: MDSIO!
> >
> > 2) Thanks again, I will try with the DYNAMIC directive!
> >
> > best
> > dan
> >
> > On Thu, Apr 11, 2019 at 9:50 AM Martin Losch <Martin.Losch at awi.de> wrote:
> > Hi Dan,
> >
> > this may solve your problem:
> >
> > include
> > #define DYNAMIC
> > somewhere in optim_sub.F, or add -DDYNAMIC to the CPPFLAGS in the Makefile.
> >
> > I'm not sure why this fixes things, but the NaNs are gone.
> >
> > Martin
> >
> > > On 11. Apr 2019, at 10:17, Martin Losch <Martin.Losch at awi.de> wrote:
> > >
> > > Hi Dan,
> > >
> > > I assume that you are using linux_ia64_cray_archer?
> > >
> > > In that file you set -D_BYTESWAPIO.
> > > This option enables code that does the byte-swapping within the mdsio package. The code is there for the (by now) rare case of a compiler that does not have an option to do this internally.
> > > In “my” optfile, linux_ia64_cray_ollie, I use the compiler option "-h byteswapio" instead. You could do that too, but what’s really important is that you use the same compiler options for compiling mitgcmuv and optim.x (although that does not help me either). The code that writes "ecco_cost_MIT_CE_000.optXXX" never used mdsio, so -D_BYTESWAPIO is not in effect there; but if you use -h byteswapio for one executable, you need to use it for the other as well.
> > >
> > > Martin
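Why the flags have to match is easiest to see on a single real*8 value. The Python sketch below (standard library only; `reinterpret_real8` is a hypothetical helper) packs a value big-endian, which is what an executable built with `-h byteswapio` is assumed to write on little-endian x86 hardware, and then decodes the same eight bytes with both byte orders.

```python
import struct


def reinterpret_real8(value: float) -> tuple:
    """Write `value` as a big-endian real*8, then decode it two ways.

    Returns (matched, mismatched): the value decoded with the byte order
    it was written in, and the value decoded as little-endian.
    """
    raw = struct.pack(">d", value)             # 8 bytes, big-endian
    matched = struct.unpack(">d", raw)[0]      # reader agrees with writer
    mismatched = struct.unpack("<d", raw)[0]   # reader assumes little-endian
    return matched, mismatched


if __name__ == "__main__":
    matched, mismatched = reinterpret_real8(1.0)
    print(matched)     # 1.0
    print(mismatched)  # a tiny subnormal number, about 3.04e-319
```

Decoding with the matching order returns 1.0; decoding with the wrong order turns the same bytes into a tiny subnormal number. Other bit patterns can just as easily come out huge, negative or NaN, so a flag mismatch between the two executables can yield silently wrong data rather than an obvious error.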
> > >
> > >> On 10. Apr 2019, at 22:23, Daniel Goldberg <dngoldberg at gmail.com> wrote:
> > >>
> > >> Hi Martin
> > >>
> > >> I did some investigation of the pkg/ctrl output: the gradient portion of the cost file (ecco_cost_MIT_CE_000.optXXX) is being written in reverse byte order when using the Cray compiler with the settings I've given. Possibly this is happening with the header information as well. I'm not sure why, since the mdsio package seems to read and write files using the right byte ordering, but I don't understand the mdsio or ctrl packages well enough to know the difference.
> > >>
> > >> So aside from the numerical problems with optim.x that you mention below (the NaNs), I'll need to find some way to control this, but again, guidance would be very welcome.
> > >>
> > >> Many thanks again
> > >> Dan
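One way to check which byte order a file such as ecco_cost_MIT_CE_000.optXXX was written in is to inspect the record-length markers of a Fortran sequential unformatted file. The Python sketch below assumes the file is sequential unformatted with a 4-byte length marker before and after each record (the usual default for gfortran and ifort; check your compiler's documentation). `record_byte_order` and `file_byte_order` are hypothetical helpers, not part of optim_m1qn3.

```python
import struct
from pathlib import Path
from typing import Optional


def record_byte_order(data: bytes) -> Optional[str]:
    """Guess the byte order of the first record of a Fortran sequential
    unformatted file.

    Assumes 4-byte record-length markers before and after each record.
    Returns "big", "little", or None if neither interpretation yields a
    leading marker that matches the corresponding trailing marker.
    """
    if len(data) < 8:
        return None
    for order, fmt in (("big", ">i"), ("little", "<i")):
        (length,) = struct.unpack(fmt, data[:4])
        end = 4 + length
        # A consistent marker points at a trailing marker with the same value.
        if length >= 0 and end + 4 <= len(data):
            (trailer,) = struct.unpack(fmt, data[end:end + 4])
            if trailer == length:
                return order
    return None


def file_byte_order(path: str) -> Optional[str]:
    """Convenience wrapper: read the whole file and inspect its first record."""
    return record_byte_order(Path(path).read_bytes())
```

For example, `file_byte_order("ecco_cost_MIT_CE_000.opt0000")` would report "big" or "little" for the first record, or None if the 4-byte-marker assumption does not hold. Running it on files written by each executable is a quick way to confirm a mismatch.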
> > >>
> > >> On Wed, Apr 10, 2019 at 5:35 PM Martin Losch <Martin.Losch at awi.de> wrote:
> > >> Hi Dan,
> > >>
> > >> I was wrong: the problem is already there after reading xx with optim_readdata. It must have something to do with the header information, where a lot of integer data is written.
> > >>
> > >> I also noticed that the Cray compiler complains about bi/bj being used/written before they are defined, and about some other old features in the code. Fixing that doesn’t change anything (although it needs to be fixed). More tomorrow.
> > >>
> > >> Martin
> > >>
> > >>> On 10. Apr 2019, at 18:02, Martin Losch <Martin.Losch at awi.de> wrote:
> > >>>
> > >>> Hi Dan,
> > >>>
> > >>> I don’t have access to ARCHER, but I have a Cray CS400 with a somewhat dysfunctional Cray compiler. I followed the instructions in the README.md at https://github.com/mjlosch/optim_m1qn3 and was able to run tutorial_global_oce_optim for the zeroth iteration (so I can assume that in this case my compiler is not totally off). I could also compile and run optim.x with these flags:
> > >>>
> > >>>
> > >>> CPPFLAGS = -DREAL_BYTE=4              \
> > >>>      -DMAX_INDEPEND=1000000          \
> > >>>      -D_RL='real*8'  \
> > >>>      -D_RS='real*4'  \
> > >>>      -D_d='d'
> > >>>
> > >>> #                -DMAX_INDEPEND=293570968        \
> > >>> # FORTRAN compiler and its flags copied from the opt file, or rather the Makefile of tutorial_global_oce_optim
> > >>> FC              = ftn
> > >>> FFLAGS     =  -h byteswapio -hnoomp -O0 -hfp0
> > >>> Everything looks good until m1qn3_offline is called, unfortunately.
> > >>> After m1qn3_offline has been called, I get many wrong numbers in xx (somethingE+/-317, and even NaN), but also some useful ones. This looks like something more severe. I’ll look into that, but if you can get this far, that would be good.
> > >>>
> > >>> Please use the github version.
> > >>>
> > >>> Martin
> > >>>
> > >>>
> > >>>
> > >>>> On 10. Apr 2019, at 16:30, GOLDBERG Daniel <Dan.Goldberg at ed.ac.uk> wrote:
> > >>>>
> > >>>> Hello Martin (or anyone who has used optim_m1qn3 on ARCHER)
> > >>>>
> > >>>> I have used optim_m1qn3 previously, but not on the UK's ARCHER supercomputer (a Cray architecture). The setup I am using (making use of STREAMICE/OpenAD/optim_m1qn3) has been working well on the MIT Engaging cluster, but I am now trying to run it on ARCHER. Following the recommendations of others, I have built MITgcm using the Cray compilers, and I have modified mjlosch/optim_m1qn3/Makefile only to point to my build directory.
> > >>>>
> > >>>> The first call to optim.x yields the error
> > >>>>
> > >>>> ============================================================
> > >>>> OPTIM_READDATA: opened file ecco_cost_MIT_CE_000.opt0000
> > >>>> At line 1295 of file optim_readdata.f (unit = 20, file = 'ecco_cost_MIT_CE_000.opt0000')
> > >>>> Fortran runtime error: End of file
> > >>>> ============================================================
> > >>>>
> > >>>> which suggests that the binary file is written in a format that optim_m1qn3 does not expect?
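One plausible mechanism for an End of file error like this (an assumption, not confirmed in this thread) involves the record-length markers of a Fortran sequential unformatted file: if the file was written with big-endian 4-byte markers and the reader decodes them as little-endian, a small record length turns into an enormous one and the read runs past the end of the file. A minimal Python sketch of that arithmetic, with a hypothetical helper name:

```python
def misread_record_length(true_length: int) -> int:
    """Length a little-endian reader sees for a 4-byte big-endian marker."""
    marker = true_length.to_bytes(4, "big")   # as written by the producer
    return int.from_bytes(marker, "little")   # as decoded by the consumer


if __name__ == "__main__":
    # An 8-byte record (one real*8) is misread as a 128 MiB record.
    print(misread_record_length(8))           # 134217728
```

A reader that trusts the misread marker would try to consume 134217728 bytes for a record that actually holds 8, which would typically exceed the size of the file.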
> > >>>>
> > >>>> Other tests I did:
> > >>>>
> > >>>> 1) Ran the same experiment (i.e. same code and input, but with GNU compilers) on the Engaging cluster. Ran fine.
> > >>>> 2) Called optim.x (compiled on ARCHER) with the ecco_cost_MIT_CE_000.opt0000 produced on Engaging. Ran fine.
> > >>>> 3) Called optim.x (compiled on Engaging) with the ecco_cost_MIT_CE_000.opt0000 produced on ARCHER.
> > >>>>
> > >>>> Hence, either mitgcmuv_ad, when built with a Cray compiler, is writing a corrupted file, or the Makefile of optim_m1qn3 should be modified to reflect that its input files are being produced by a Cray-compiled executable, but I do not know how to do this. I am now attempting to build and run MITgcm/OAD using ifort, but may run into trouble for different reasons.
> > >>>>
> > >>>> Any guidance you could give on this topic would be much appreciated.
> > >>>>
> > >>>> Best
> > >>>> Dan
> > >>>>
> > >>>> --
> > >>>>
> > >>>> Daniel Goldberg, PhD
> > >>>> Sr. Lecturer in Glaciology
> > >>>> School of Geosciences, University of Edinburgh
> > >>>> Geography Building, Drummond Street, Edinburgh EH8 9XP
> > >>>>
> > >>>>
> > >>>> em: dan.goldberg at ed.ac.uk
> > >>>> web: https://www.geos.ed.ac.uk/homes/dgoldber
> > >>>> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
> > >>>> _______________________________________________
> > >>>> MITgcm-support mailing list
> > >>>> MITgcm-support at mitgcm.org
> > >>>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
> > >>>
> > >>
> > >
> >
> >
> >
> > --
> >
> > Daniel Goldberg, PhD
> > Sr. Lecturer in Glaciology
> > School of Geosciences, University of Edinburgh
> > Geography Building, Drummond Street, Edinburgh EH8 9XP
> >
> >
> > em: dan.goldberg at ed.ac.uk
> > web: https://www.geos.ed.ac.uk/homes/dgoldber
>
>