[MITgcm-support] optim_m1qn3 on ARCHER supercomputer

Martin Losch Martin.Losch at awi.de
Thu Apr 11 04:49:30 EDT 2019


Hi Dan,

this may solve your problem:

include somewhere in optim_sub.F
#define DYNAMIC
or in the Makefile -DDYNAMIC

Not sure why this fixes things, but the NaNs are gone.

Martin

> On 11. Apr 2019, at 10:17, Martin Losch <Martin.Losch at awi.de> wrote:
> 
> Hi Dan,
> 
> I assume that you are using linux_ia64_cray_archer?
> 
> In that file you set -D_BYTESWAPIO 
> This option enables code that does the byteswapping within the mdsio package. The code is there for the (by now) rare case of a compiler that does not have an option to do this internally.
> In "my" opt file linux_ia64_cray_ollie, I use the compiler option "-h byteswapio" instead. You could do that too, but what’s really important is that you use the same compiler options for compiling the mitgcmuv and optim.x (although that does not help me either). The code that writes "ecco_cost_MIT_CE_000.optXXX" never used mdsio and so the -D_BYTESWAPIO is not in effect here, but if you use -h byteswapio for one executable, you need to do it also for the other.
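Why the two executables need matching byte-order settings can be illustrated with a minimal Python sketch. It is not MITgcm or optim_m1qn3 code; the function names are hypothetical, and it only assumes 8-byte IEEE reals (the real*8 used in this thread). A value written in one byte order and read back in the other does not fail loudly, it simply turns into a different number:

```python
import struct


def roundtrip_same_order(value: float) -> float:
    """Write a float64 big-endian and read it back big-endian."""
    raw = struct.pack(">d", value)
    return struct.unpack(">d", raw)[0]


def reinterpret_swapped(value: float) -> float:
    """Write a float64 big-endian, then read the same bytes as little-endian.

    This mimics a writer and a reader that disagree about byte order.
    """
    raw = struct.pack(">d", value)
    return struct.unpack("<d", raw)[0]


if __name__ == "__main__":
    for v in (1.0, 0.5, -2.0):
        # Third column shows what a mismatched reader would see.
        print(v, roundtrip_same_order(v), reinterpret_swapped(v))
```

Ordinary values such as 1.0 come back as tiny subnormal numbers when the byte order is mismatched, so a file that "reads fine" can still contain meaningless data.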
> 
> Martin
> 
>> On 10. Apr 2019, at 22:23, Daniel Goldberg <dngoldberg at gmail.com> wrote:
>> 
>> Hi Martin
>> 
>> Did some investigation of the pkg/ctrl output -- the gradient portion of the cost file (ecco_cost_MIT_CE_000.optXXX) is being written in reverse byte-order when using the Cray compiler with the settings I've given. Possibly this is happening with the header information as well. I'm not sure why, since the mdsio package seems to read and write files using the right byte ordering, but I don't understand the mdsio or ctrl packages well enough to know the difference.
>> 
>> So aside from the numerical problems you reference below with optim.x (the NaNs), I'll need to find some way to control this, but again, guidance would be very welcome.
>> 
>> Many thanks again
>> Dan
>> 
>> On Wed, Apr 10, 2019 at 5:35 PM Martin Losch <Martin.Losch at awi.de> wrote:
>> Hi Dan,
>> 
>> I was wrong, the problem is already there after reading xx with optim_readdata. It has something to do with the header information, where a lot of integer stuff is written.
>> 
>> I also noticed that the cray compiler complains about bi/bj being used/written before defined and some other old features in the code. Fixing that doesn’t change anything (although it needs to be fixed). More tomorrow.
>> 
>> Martin
>> 
>>> On 10. Apr 2019, at 18:02, Martin Losch <Martin.Losch at awi.de> wrote:
>>> 
>>> Hi Dan,
>>> 
>>> I don’t have access to ARCHER but I have a Cray CS400 with a somewhat dysfunctional Cray compiler. I followed the instructions in the README.md on https://github.com/mjlosch/optim_m1qn3 and I was able to run the tutorial_global_oce_optim for the zeroth iteration (so I can assume that in this case my compiler is not totally off). I could also compile and run optim.x with these flags:
>>> 
>>> 
>>> CPPFLAGS = -DREAL_BYTE=4              \
>>>      -DMAX_INDEPEND=1000000          \
>>>      -D_RL='real*8'  \
>>>      -D_RS='real*4'  \
>>>      -D_d='d'
>>> 
>>> #                -DMAX_INDEPEND=293570968        \
>>> # FORTRAN compiler and its flags copied from the opt file, or rather the Makefile of tutorial_global_oce_optim
>>> FC              = ftn
>>> FFLAGS     =  -h byteswapio -hnoomp -O0 -hfp0
>>> 
>>> Everything looks good until m1qn3_offline is called, unfortunately.
>>> I get many wrong numbers (somethingE+/-317, and even NaN), but also useful numbers in xx after m1qn3_offline has been called. This looks like something more severe. I’ll look into that, but if you can get this far, that would be good.
>>> 
>>> Please use the github version.
>>> 
>>> Martin
>>> 
>>> 
>>> 
>>>> On 10. Apr 2019, at 16:30, GOLDBERG Daniel <Dan.Goldberg at ed.ac.uk> wrote:
>>>> 
>>>> Hello Martin (or anyone who has used optim_m1qn3 on ARCHER)
>>>> 
>>>> I have used optim_m1qn3 previously but not on the ARCHER UK supercomputer (a Cray architecture). The setup I am using (making use of STREAMICE/OpenAD/optim_m1qn3) has been working well on the MIT engaging cluster, but I am now trying to run on ARCHER. Following the recommendations of others I have built MITgcm using Cray compilers; and I modified mlosch/m1qn3_optim/Makefile, save to point to my build directory.
>>>> 
>>>> The first call to optim.x yields the error
>>>> 
>>>> ============================================================
>>>> OPTIM_READDATA: opened file ecco_cost_MIT_CE_000.opt0000                                                                
>>>> At line 1295 of file optim_readdata.f (unit = 20, file = 'ecco_cost_MIT_CE_000.opt0000')
>>>> Fortran runtime error: End of file
>>>> ============================================================ 
>>>> 
>>>> which suggests the binary file is written in a format that optim_m1qn3 is not expecting? 
>>>> 
>>>> Other tests I did:
>>>> 
>>>> 1) Ran the same experiment (i.e. same code and input, but with gnu compilers) on the engaging cluster. Ran fine.
>>>> 2) Called optim.x (compiled on Archer) with the ecco_cost_MIT_CE_000.opt0000 produced on engaging. Ran fine.
>>>> 3) Called optim.x (compiled on engaging) with the ecco_cost_MIT_CE_000.opt0000 produced on ARCHER. 
>>>> 
>>>> Hence, either mitgcmuv_ad is writing a corrupted file when built with a Cray compiler, or the Makefile of m1qn3_optim should be modified to reflect that its input files are being produced by a Cray-compiled executable -- but I do not know how to do this. I am attempting now to build and run MITgcm/OAD using iFort, but may run into trouble for different reasons.
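One way to test the byte-order hypothesis without either executable is to look at the raw bytes of the file. The Python sketch below is an illustration under stated assumptions, not part of optim_m1qn3: it assumes the file is Fortran sequential unformatted output with 4-byte record-length markers before and after each record (a common default, but not confirmed for the compilers in this thread), and the function names are hypothetical. It reads the first marker in both byte orders and keeps the one whose trailing marker matches.

```python
import struct


def guess_record_byte_order(data: bytes) -> str:
    """Guess the byte order of the first Fortran unformatted record.

    Assumes 4-byte length markers framing the record. Returns "little",
    "big", or "unknown" (too short, no match, or ambiguous).
    """
    if len(data) < 8:
        return "unknown"
    matches = []
    for name, fmt in (("little", "<i"), ("big", ">i")):
        (length,) = struct.unpack(fmt, data[:4])
        end = 4 + length
        # The trailing marker must exist and repeat the leading length.
        if length >= 0 and end + 4 <= len(data):
            (tail,) = struct.unpack(fmt, data[end:end + 4])
            if tail == length:
                matches.append(name)
    return matches[0] if len(matches) == 1 else "unknown"


def make_record(payload: bytes, fmt: str) -> bytes:
    """Build a synthetic record (marker, payload, marker) for demonstration."""
    marker = struct.pack(fmt, len(payload))
    return marker + payload + marker


if __name__ == "__main__":
    payload = struct.pack("<3i", 1, 2, 3)
    print(guess_record_byte_order(make_record(payload, "<i")))
    print(guess_record_byte_order(make_record(payload, ">i")))
```

To apply this to a real file, pass in its contents read in binary mode, for example `open(path, "rb").read()`. A result that differs between the file produced on one machine and the file produced on the other would point to a byte-order mismatch rather than a corrupted write.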
>>>> 
>>>> Any guidance you could give on this topic would be much appreciated.
>>>> 
>>>> Best
>>>> Dan
>>>> 
>>>> -- 
>>>> 
>>>> Daniel Goldberg, PhD
>>>> Sr. Lecturer in Glaciology
>>>> School of Geosciences, University of Edinburgh
>>>> Geography Building, Drummond Street, Edinburgh EH8 9XP
>>>> 
>>>> 
>>>> em: dan.goldberg at ed.ac.uk
>>>> web: https://www.geos.ed.ac.uk/homes/dgoldber
>>>> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
>>>> _______________________________________________
>>>> MITgcm-support mailing list
>>>> MITgcm-support at mitgcm.org
>>>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
>>> 
>> 
> 


