[MITgcm-support] segmentation fault

Andreas Klocker andreas.klocker at utas.edu.au
Thu May 31 21:55:54 EDT 2018


Hi guys,

After becoming really desperate to get this going without success I have 
tried different openmpi (1.6.3) and intel-fc (12.1.9.293) versions and I 
finally managed to get an MITgcm error message before the segmentation 
fault (including a beautiful copy/paste spelling mistake ;)).

mitgcm.err says:

ABNROMAL END: S/R BARRIER
ABNROMAL END: S/R BARRIER
ABNROMAL END: S/R BARRIER
ABNROMAL END: S/R BARRIER
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine Line        Source
libirc.so          00002B73C04B32C9  Unknown Unknown  Unknown
libirc.so          00002B73C04B1B9E  Unknown Unknown  Unknown
libifcoremt.so.5   00002B73C2CEB13C  Unknown Unknown  Unknown
libifcoremt.so.5   00002B73C2C5A2A2  Unknown Unknown  Unknown
libifcoremt.so.5   00002B73C2C6B0F0  Unknown Unknown  Unknown
libpthread.so.0    00002B73C36157E0  Unknown Unknown  Unknown
.                  00000000004E009D  Unknown Unknown  Unknown
.                  000000000041E2D3  Unknown Unknown  Unknown
.                  00000000005B5658  Unknown Unknown  Unknown
ABNROMAL END: S/R BARRIER
ABNROMAL END: S/R BARRIER
ABNROMAL END: S/R BARRIER
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine Line        Source
libirc.so          00002AE9EBD362C9  Unknown Unknown  Unknown
libirc.so          00002AE9EBD34B9E  Unknown Unknown  Unknown
libifcoremt.so.5   00002AE9EE56E13C  Unknown Unknown  Unknown
libifcoremt.so.5   00002AE9EE4DD2A2  Unknown Unknown  Unknown
libifcoremt.so.5   00002AE9EE4EE0F0  Unknown Unknown  Unknown
libpthread.so.0    00002AE9EEE987E0  Unknown Unknown  Unknown
.                  00000000004C5ADB  Unknown Unknown  Unknown
.                  000000000041C165  Unknown Unknown  Unknown
.                  00000000005B5658  Unknown Unknown  Unknown

And mitgcm.out:

bash-4.1$ more mitgcm.out
  !!!!!!! PANIC !!!!!!! CATASTROPHIC ERROR
  !!!!!!! PANIC !!!!!!! in S/R BARRIER  myThid =            0 nThreads =
            1
  !!!!!!! PANIC !!!!!!! CATASTROPHIC ERROR
  !!!!!!! PANIC !!!!!!! in S/R BARRIER  myThid =            0 nThreads =
            1
  !!!!!!! PANIC !!!!!!! CATASTROPHIC ERROR
  !!!!!!! PANIC !!!!!!! in S/R BARRIER  myThid =            0 nThreads =
            1
  !!!!!!! PANIC !!!!!!! CATASTROPHIC ERROR
  !!!!!!! PANIC !!!!!!! in S/R BARRIER  myThid =            0 nThreads =
            1
  !!!!!!! PANIC !!!!!!! CATASTROPHIC ERROR
  !!!!!!! PANIC !!!!!!! in S/R BARRIER  myThid =            0 nThreads =
            1
  !!!!!!! PANIC !!!!!!! CATASTROPHIC ERROR
  !!!!!!! PANIC !!!!!!! in S/R BARRIER  myThid =            0 nThreads =
            1
  !!!!!!! PANIC !!!!!!! CATASTROPHIC ERROR
  !!!!!!! PANIC !!!!!!! in S/R BARRIER  myThid =            0 nThreads =
            1
  !!!!!!! PANIC !!!!!!! CATASTROPHIC ERROR
  !!!!!!! PANIC !!!!!!! in S/R BARRIER  myThid =            0 nThreads =

I'm struggling to figure out what this means, though, since that part of 
the code is far beyond my understanding... but I'm worried about the 
amount of "PANIC" in there!

Has anyone got any suggestions?

cheers,

Andreas




On 19/05/18 01:37, Patrick Heimbach wrote:
> Hi Andreas,
>
> a small chance (and a bit of a guess) that one of the following might do the trick
> (we have memory-related issues when running the adjoint):
>
> In your shell script or batch job, add
> ulimit -s unlimited
>
> If compiling with ifort, you could try -mcmodel=medium
>
> See mitgcm-support thread, e.g.
> http://mailman.mitgcm.org/pipermail/mitgcm-support/2005-October/003505.html
> (you'll need to scroll down to input provided by Constantinos Evangelinos).
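[Editor's note: the two suggestions above can be sketched together as a batch-script fragment, assuming a bash job script and an ifort/genmake2 build; the scheduler, core count, executable name, and optfile variable are placeholders, not from this thread.]

```shell
#!/bin/bash
# Lift the per-process stack limit before launching the run; ifort places
# large automatic and temporary arrays on the stack, and a small default
# limit can segfault in exactly this way.
ulimit -s unlimited

# For the build side, the medium memory model is added to the Fortran flags
# inside your genmake2 optfile (variable name assumed), so that static data
# larger than 2 GB is addressable:
#   FFLAGS="$FFLAGS -mcmodel=medium -shared-intel"

mpirun -np 64 ./mitgcmuv
```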
>
> p.
>
>> On May 18, 2018, at 8:04 AM, Dimitris Menemenlis <menemenlis at jpl.nasa.gov> wrote:
>>
>> Andreas, I have done something similar quite a few times (i.e., increased horizontal and/or vertical resolution in regional domains with obcs cut-outs from a global set-up) and did not have the same issue.  If helpful, I can dig out and commit to contrib some examples that you can compare your set-up against.  Actually, you remind me that I already promised to do this for Gael but it fell off the bottom of my todo list :-(
>>
>> Do you have any custom routines in your "code" directory?  Have you tried compiling and linking with array-bound checks turned on?
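[Editor's note: one way to do the bound-check rebuild suggested above, assuming a standard MITgcm source layout; the optfile name is a guess at a stock ifort optfile, and appending to FFLAGS works because optfiles are sourced as shell scripts.]

```shell
#!/bin/sh
# Copy a stock optfile and append Intel run-time array-bound checking
# (standard ifort flags), then rebuild from scratch. File and directory
# names below are placeholders for this set-up.
cp ../tools/build_options/linux_amd64_ifort ./my_optfile
echo 'FFLAGS="$FFLAGS -g -check bounds -traceback"' >> ./my_optfile
../tools/genmake2 -mods ../code -optfile ./my_optfile
make CLEAN
make depend
make
```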
>>
>> Dimitris Menemenlis
>> On 05/17/2018 11:45 PM, Andreas Klocker wrote:
>>> Matt,
>>> I cut all the unnecessary packages and still have the same issue.
>>> I also checked 'size mitgcmuv' and compared the result to runs that work fine - it asks for about half the memory of those (same machine, same queue, same compile options, etc.).
>>> The tiles are already down to 32x32 grid points, and I'm happily running configurations with a tile size almost twice as big and the same number of vertical layers.
>>> I will try some different tile sizes, but I think the problem must be somewhere else...
>>> Andreas
>>> On 18/05/18 00:35, Matthew Mazloff wrote:
>>>> Sounds like a memory issue. I think your executable has become too big for your machine. You will need to reduce the tile size or do something else (e.g. reduce the number of diagnostics or cut a package)
>>>>
>>>> and check
>>>> size mitgcmuv
>>>> to get a ballpark idea of how much memory you are requesting
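[Editor's note: to turn the `size` output into a single number, something like the sketch below works; it is pointed at /bin/ls only so it runs anywhere, and you would substitute ./mitgcmuv. Note `size` reports only the static text/data/bss segments, not run-time allocations.]

```shell
#!/bin/sh
# Sum the static data and bss segments reported by `size` (Berkeley format:
# values on the second line, data in column 2, bss in column 3) and print
# an approximate per-process footprint in KiB. /bin/ls is a stand-in binary.
binary=/bin/ls
data=$(size "$binary" | awk 'NR==2 {print $2}')
bss=$(size "$binary" | awk 'NR==2 {print $3}')
echo "static data+bss: $(( (data + bss) / 1024 )) KiB per process"
```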
>>>>
>>>> Matt
>>>>
>>>>
>>>>
>>>>> On May 16, 2018, at 11:27 PM, Andreas Klocker <andreas.klocker at utas.edu.au> wrote:
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> I've taken a working 1/24 degree nested simulation (of Drake Passage)
>>>>> with 42 vertical layers and tried to increase the vertical layers to 150
>>>>> (without changing anything else apart from obviously my boundary files
>>>>> for OBCS and recompiling with 150 vertical layers). Suddenly I get the
>>>>> following error message:
>>>>>
>>>>> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>>>>> Image              PC                Routine Line        Source
>>>>> libirc.so          00002BA1704BC2C9  Unknown Unknown  Unknown
>>>>> libirc.so          00002BA1704BAB9E  Unknown Unknown  Unknown
>>>>> libifcore.so.5     00002BA1722B5F3F  Unknown Unknown  Unknown
>>>>> libifcore.so.5     00002BA17221DD7F  Unknown Unknown  Unknown
>>>>> libifcore.so.5     00002BA17222EF43  Unknown Unknown  Unknown
>>>>> libpthread.so.0    00002BA1733B27E0  Unknown Unknown  Unknown
>>>>> mitgcmuv_drake24_  00000000004E61BC  mom_calc_visc_ 3345 mom_calc_visc.f
>>>>> mitgcmuv_drake24_  0000000000415127  mom_vecinv_ 3453 mom_vecinv.f
>>>>> mitgcmuv_drake24_  0000000000601C33  dynamics_ 3426  dynamics.f
>>>>> mitgcmuv_drake24_  0000000000613C2B  forward_step_ 2229 forward_step.f
>>>>> mitgcmuv_drake24_  000000000064581E  main_do_loop_ 1886 main_do_loop.f
>>>>> mitgcmuv_drake24_  000000000065E500  the_main_loop_ 1904 the_main_loop.f
>>>>> mitgcmuv_drake24_  000000000065E6AE  the_model_main_ 2394 the_model_main.f
>>>>> mitgcmuv_drake24_  00000000005C6439  MAIN__ 3870  main.f
>>>>> mitgcmuv_drake24_  0000000000406776  Unknown Unknown  Unknown
>>>>> libc.so.6          00002BA1737E2D1D  Unknown Unknown  Unknown
>>>>> mitgcmuv_drake24_  0000000000406669  Unknown Unknown  Unknown
>>>>>
>>>>> At first this error pointed to a line in mom_calc_visc.f where the
>>>>> Leith viscosity is calculated. As a test I then used a Smagorinsky
>>>>> viscosity instead, and now it crashes with the same error, but
>>>>> pointing to a line where the Smagorinsky calculations are done. I
>>>>> assume I must be chasing a much more fundamental problem than one
>>>>> related to these two viscosity choices... but I'm not sure what it
>>>>> might be.
>>>>>
>>>>> Has anyone got any idea of what could be going wrong here?
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> Andreas
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> MITgcm-support mailing list
>>>>> MITgcm-support at mitgcm.org
>>>>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support

-- 
===============================================================
Dr. Andreas Klocker
Physical Oceanographer

ARC Centre of Excellence for Climate System Science
&
Institute for Marine and Antarctic Studies
University of Tasmania
20 Castray Esplanade
Battery Point, TAS
7004 Australia

M:     +61 437 870 182
W:     http://www.utas.edu.au/profiles/staff/imas/andreas-klocker
skype: andiklocker
===============================================================
