[MITgcm-support] error while writing pickup files with Cray compilers

Jody Klymak jklymak at uvic.ca
Tue Apr 11 17:58:32 EDT 2017


Hopefully people who really understand the compiler issues will pipe up.  I assume you are running w/ MPI - maybe the parallel writing of the mds files is failing somehow? But if you have a compiler issue:

- Did you try running w/ the optimizations turned off?  (edit the `linux_ia64_cray_archer` file)
- In `data`, you should set `debugLevel=5` or something large like that and see if there are clues in the output.
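
For the second point, here is a minimal sketch of the relevant `data` entries, assuming the usual namelist layout (`debugLevel` in `PARM01`, checkpoint settings in `PARM03`). The `nTimeSteps` and `pChkptFreq` numbers are placeholders for a short test, not values from this run. `pChkptFreq` is in seconds and should be a multiple of the model time step:

```
# Sketch only: merge these entries into the existing namelists.
 &PARM01
# more verbose diagnostics in the standard output files
 debugLevel=5,
 &

 &PARM03
# placeholder values: choose them so that at least one
# pChkptFreq interval fits inside the short test run
 nTimeSteps=20,
 pChkptFreq=18000.,
 &
```

That way the permanent pickup write is attempted within minutes of starting, rather than after 100 y of model time.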

Good luck!  Jody


> On 11 Apr 2017, at  13:39 PM, Laura Cimoli <laura.cimoli at physics.ox.ac.uk> wrote:
> 
> Hi Jody,
> 
> yes, it does start writing the pickup file. 
> I also made a few other tests (of course much shorter than 100 y!), and I always got the same error. Also, the configuration works with the gnu compiler, but if the Cray compiler is really 10x faster it would be nice to use it!
> 
> I wonder if the pickup file is overwritten or if it is appending to the file...? Maybe it is doing something funny when trying to append to it?
> 
> Thanks,
> Laura
> 
>  
> From: Jody Klymak [jklymak at uvic.ca]
> Sent: 11 April 2017 21:27
> To: mitgcm-support at mitgcm.org
> Subject: Re: [MITgcm-support] error while writing pickup files with Cray compilers
> 
> Hi Laura,
> 
> Are you sure the mitgcm can write to the directory it is trying to write to?  Does it *start* to write the pickup file?  
> 
> These are just dumb questions.  Maybe it truly is a compiler issue, but it seems more likely it is a configuration issue.   Obviously, for testing I’d suggest writing a pickup file well before 100 y has passed.
> 
> Good luck, 
> 
> Jody
> 
>> On 11 Apr 2017, at  11:21 AM, Laura Cimoli <laura.cimoli at physics.ox.ac.uk> wrote:
>> 
>> Hello Jody,
>> 
>> sorry I forgot to mention that all my other outputs are in netcdf format, and they look fine.
>> The data file is attached.
>> 
>> Thanks, 
>> Laura
>> 
>> From: Jody Klymak [jklymak at uvic.ca]
>> Sent: 11 April 2017 19:01
>> To: mitgcm-support at mitgcm.org
>> Subject: Re: [MITgcm-support] error while writing pickup files with Cray compilers
>> 
>> Are you able to write any mds files?  i.e. did the T.000000000000.data file write?  Can you supply your `data` file?
>> 
>> Cheers,   Jody
>> 
>> 
>>> On 11 Apr 2017, at  10:54 AM, Laura Cimoli <laura.cimoli at physics.ox.ac.uk> wrote:
>>> 
>>> Hello,
>>> 
>>> this question is relevant mainly for Archer users, but of course any help is appreciated!
>>> 
>>> I have recently tried to use Cray instead of gnu compilers, since the model should run much faster according to what is stated here <http://www.archer.ac.uk/community/eCSE/eCSE03-09/eCSE03-09_White_Paper.pdf>. I have to admit I have not read that report in detail, but I hope that there are no particular constraints on the use of Cray compilers on Archer.
>>> 
>>> I used the linux_ia64_cray_archer optfile, as indicated in the report.
>>> 
>>> At first glance, the model compiles without any odd warnings and seems to run without any problem, but it crashes when writing the pickup file. This is the message I got (the whole error file is attached):
>>> 
>>> lib-5058 : UNRECOVERABLE library error
>>> A read system call read less data than expected.
>>> 
>>> Encountered during a direct access unformatted WRITE to unit 9
>>> Fortran unit 9 is connected to a direct unformatted unblocked file:
>>> "pickup.0001752000.data"
>>> 
>>> _pmiu_daemon(SIGCHLD): [NID 02940] [c7-1c0s15n0] [Tue Apr 11 08:49:37 2017] PE RANK 69 exit signal Aborted
>>> [NID 02940] 2017-04-11 08:49:37 Apid 26123498: initiated application termination
>>> 
>>> 
>>> I am writing the permanent pickup file, and I don't have any temporary pickup file.
>>> 
>>> The only weird warning I have noticed in the genmake.log file (attached) is below, but I don't know whether it is related to the problem reported above:
>>> 
>>> running: check_HAVE_SIGREG() 
>>> cc -c genmake_tc_1.c 
>>> CC-513 craycc: WARNING File = genmake_tc_1.c, Line = 22
>>> A value of type "void *" cannot be assigned to an entity of type
>>> "void (*)(int, siginfo_t *, void *)".
>>> s.sa_sigaction = (void *)killhandler;
>>> ^
>>> Total warnings detected in genmake_tc_1.c: 1
>>> program hello
>>> integer anint
>>> common /iv/ anint
>>> external sigreg
>>> call sigreg(anint)
>>> end
>>> ftn -o genmake_tc genmake_tc_2.f genmake_tc_1.o
>>> --> set HAVE_SIGREG='t'
>>> 
>>> 
>>> Does anyone know why the Cray compilers return this error while writing the output binary file?
>>> 
>>> Many thanks,
>>> Laura
>>> <genmake.log><output_000.e4441213>
>>> _______________________________________________
>>> MITgcm-support mailing list
>>> MITgcm-support at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>> --
>> Jody Klymak    
>> http://web.uvic.ca/~jklymak/
>> 
>> <data>
--
Jody Klymak    
http://web.uvic.ca/~jklymak/




