[MITgcm-support] optim_m1qn3 on ARCHER supercomputer

Daniel Goldberg dngoldberg at gmail.com
Wed Apr 10 16:23:27 EDT 2019


Hi Martin

I did some investigation of the pkg/ctrl output -- the gradient portion of
the cost file (ecco_cost_MIT_CE_000.optXXX) is being written in reverse
byte order when using the Cray compiler with the settings I've given.
Possibly this is happening with the header information as well. I'm not sure
why, since the mdsio package seems to read and write files using the right
byte ordering, but I don't understand the mdsio or ctrl packages well
enough to know the difference.
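
One way to check the byte order of a file like this without going through
optim.x is to look at the Fortran record markers. The sketch below is only
illustrative. It assumes the file is a sequential unformatted Fortran file
whose records are framed by 4-byte length markers (the gfortran default; the
Cray runtime may use a different layout), and the function name
guess_record_byte_order is hypothetical, not part of MITgcm or optim_m1qn3.

```python
import struct


def guess_record_byte_order(raw: bytes) -> str:
    """Guess the byte order of the first Fortran record in `raw`.

    Assumption: each record is stored as
        [4-byte length n][n bytes of data][4-byte length n]
    The guess is the byte order for which the leading marker gives a
    plausible length and the trailing marker repeats it.
    """
    if len(raw) < 8:
        return "unknown"
    for name, fmt in (("little", "<i"), ("big", ">i")):
        (n,) = struct.unpack(fmt, raw[:4])
        # The record plus its two markers must fit inside the data we have.
        if 0 < n <= len(raw) - 8:
            (tail,) = struct.unpack(fmt, raw[4 + n:8 + n])
            if tail == n:
                return name
    return "unknown"


# Synthetic 12-byte records, written once in each byte order.
payload = b"\x00" * 12
big_record = struct.pack(">i", 12) + payload + struct.pack(">i", 12)
little_record = struct.pack("<i", 12) + payload + struct.pack("<i", 12)
print(guess_record_byte_order(big_record))
print(guess_record_byte_order(little_record))
```

Passing the contents of ecco_cost_MIT_CE_000.opt0000 from each machine (for
example open(path, "rb").read()) would show whether the two files disagree on
byte order, provided the 4-byte-marker assumption holds for both compilers.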

So aside from the numerical problems you reference below with optim.x (the
NaNs), I'll need to find some way to control this, but again, guidance would
be very welcome.
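
For reference, reading an 8-byte real in the wrong byte order typically turns
ordinary values into numbers with extreme exponents, and occasionally into
NaN. The sketch below only illustrates that effect; it is not a diagnosis of
the optim.x problem, and the helper name read_with_swapped_bytes is
hypothetical.

```python
import struct


def read_with_swapped_bytes(x: float) -> float:
    """Return the value seen when the 8 bytes of x are read in reverse order."""
    # Write x as a big-endian double, then interpret the same bytes as
    # little-endian, which is what a byte-order mismatch does.
    return struct.unpack("<d", struct.pack(">d", x))[0]


for x in (1.0, 2.0, -0.5):
    print(x, "->", read_with_swapped_bytes(x))
```

Ordinary inputs such as 1.0 come out as tiny denormal numbers, so seeing many
values with exponents in the hundreds is at least consistent with a
byte-order mismatch somewhere in the read path.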

Many thanks again
Dan

On Wed, Apr 10, 2019 at 5:35 PM Martin Losch <Martin.Losch at awi.de> wrote:

> Hi Dan,
>
> I was wrong: the problem is already there after reading xx with
> optim_readdata. It has got something to do with the header information,
> where a lot of integer stuff is written.
>
> I also noticed that the cray compiler complains about bi/bj being
> used/written before defined and some other old features in the code. Fixing
> that doesn’t change anything (although it needs to be fixed). More tomorrow.
>
> Martin
>
> > On 10. Apr 2019, at 18:02, Martin Losch <Martin.Losch at awi.de> wrote:
> >
> > Hi Dan,
> >
> > I don’t have access to ARCHER, but I have a Cray CS400 with a somewhat
> > dysfunctional Cray compiler. I followed the instructions in the README.md
> > on https://github.com/mjlosch/optim_m1qn3 and was able to run
> > tutorial_global_oce_optim for the zeroth iteration (so I can assume that in
> > this case my compiler is not totally off). I could also compile and run
> > optim.x with these flags:
> >
> >
> > CPPFLAGS = -DREAL_BYTE=4              \
> >       -DMAX_INDEPEND=1000000          \
> >       -D_RL='real*8'  \
> >       -D_RS='real*4'  \
> >       -D_d='d'
> >
> > #                -DMAX_INDEPEND=293570968        \
> > # FORTRAN compiler and its flags copied from the opt file, or rather the
> > # Makefile of tutorial_global_oce_optim
> > FC              = ftn
> > FFLAGS     =  -h byteswapio -hnoomp -O0 -hfp0
> >
> > Everything looks good until m1qn3_offline is called, unfortunately.
> > I get many wrong numbers (somethingE+/-317, and even NaN), but also
> > useful numbers in xx after m1qn3_offline has been called. This looks like
> > something more severe. I’ll look into that, but if you can get this far,
> > that would be good.
> >
> > Please use the github version.
> >
> > Martin
> >
> >
> >
> >> On 10. Apr 2019, at 16:30, GOLDBERG Daniel <Dan.Goldberg at ed.ac.uk>
> wrote:
> >>
> >> Hello Martin (or anyone who has used optim_m1qn3 on ARCHER)
> >>
> >> I have used optim_m1qn3 previously but not on the ARCHER UK
> >> supercomputer (a Cray architecture). The setup I am using (making use of
> >> STREAMICE/OpenAD/optim_m1qn3) has been working well on the MIT engaging
> >> cluster, but I am now trying to run it on ARCHER. Following the
> >> recommendations of others I have built MITgcm using Cray compilers; and I
> >> modified mlosch/m1qn3_optim/Makefile, save to point to my build directory.
> >>
> >> The first call to optim.x yields the error
> >>
> >> ============================================================
> >>  OPTIM_READDATA: opened file ecco_cost_MIT_CE_000.opt0000
>
> >> At line 1295 of file optim_readdata.f (unit = 20, file =
> 'ecco_cost_MIT_CE_000.opt0000')
> >> Fortran runtime error: End of file
> >> ============================================================
> >>
> >> which suggests the binary file is written in a format that optim_m1qn3
> is not expecting?
> >>
> >> Other tests I did:
> >>
> >> 1) Ran the same experiment (i.e. same code and input, but with gnu
> compilers) on the engaging cluster. Ran fine.
> >> 2) Called optim.x (compiled on Archer) with the
> ecco_cost_MIT_CE_000.opt0000 produced on engaging. Ran fine.
> >> 3) Called optim.x (compiled on engaging) with the
> ecco_cost_MIT_CE_000.opt0000 produced on ARCHER.
> >>
> >> Hence, either mitgcmuv_ad is writing a corrupted executable when built
> with a cray compiler, or the Makefile of m1qn3_optim should be modified to
> reflect that its input files are being produced by a cray-compiled
> executable -- but I do not know how to do this. I am attempting now to
> build and run MITgcm/OAD using iFort, but may run into trouble for
> different reasons.
> >>
> >> Any guidance you could give on this topic would be much appreciated.
> >>
> >> Best
> >> Dan
> >>
> >> --
> >>
> >> Daniel Goldberg, PhD
> >> Sr. Lecturer in Glaciology
> >> School of Geosciences, University of Edinburgh
> >> Geography Building, Drummond Street, Edinburgh EH8 9XP
> >>
> >>
> >> em: dan.goldberg at ed.ac.uk
> >> web: https://www.geos.ed.ac.uk/homes/dgoldber
> >> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> >> _______________________________________________
> >> MITgcm-support mailing list
> >> MITgcm-support at mitgcm.org
> >> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support