[MITgcm-support] MITgcm-support Digest, Vol 237, Issue 1
Matthew Mazloff
mmazloff at ucsd.edu
Fri Mar 3 19:04:35 EST 2023
It may be something to do with reading the data* files. I have had issues with this in the past when using many cores, though it has been a long time since I have seen this issue arise.
To try something other than the default, you can try defining
USE_FORTRAN_SCRATCH_FILES
I'm not sure exactly how it works, but it only affects this part of the code, so it is definitely safe to try.
Alternatively, you can try defining
SINGLE_DISK_IO
but this disables I/O from all other cores and will therefore suppress their error messages. That said, if multiple cores trying to process the data* files at once is your issue, this will resolve it.
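Both options are C-preprocessor flags. As far as I know, in recent MITgcm checkpoints they appear as "#undef USE_FORTRAN_SCRATCH_FILES" and "#undef SINGLE_DISK_IO" in eesupp/inc/CPP_EEOPTIONS.h. The usual practice is to copy that header into your setup's code/ directory, change the #undef to #define, and rebuild. A one-line edit by hand is all that is needed. The Python sketch below only illustrates that change, and the script and function names are hypothetical.

```python
import re
import sys


def enable_cpp_flag(header_text: str, flag: str) -> str:
    """Return header_text with the line '#undef FLAG' switched to '#define FLAG'.

    Only lines consisting of exactly '#undef FLAG' (plus optional trailing
    blanks) are changed; Fortran-style 'C' comment lines are left untouched.
    """
    undef = re.compile(r"^#undef[ \t]+" + re.escape(flag) + r"[ \t]*$", re.MULTILINE)
    new_text, count = undef.subn("#define " + flag, header_text)
    if count == 0:
        # Nothing to switch: accept a header where the flag is already defined.
        already = re.compile(r"^#define[ \t]+" + re.escape(flag) + r"\b", re.MULTILINE)
        if not already.search(header_text):
            raise ValueError(f"no '#undef {flag}' line found")
    return new_text


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python enable_flag.py code/CPP_EEOPTIONS.h USE_FORTRAN_SCRATCH_FILES
    path, flag = sys.argv[1], sys.argv[2]
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(enable_cpp_flag(text, flag))
```

After the edit, rebuild (genmake2, make depend, make) so the flag takes effect. The same helper would switch SINGLE_DISK_IO, with the caveat above about suppressed error messages.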
Matt
> On Mar 3, 2023, at 2:14 PM, mario wrk <wrkmario at gmail.com> wrote:
>
> Thanks for pointing that out! The scratch files point to data.exch2 and data.ctrl. I excluded some packages one by one, but I still have similar issues.
> In the end there is a segmentation fault (srun: error: l10551: task 256: Segmentation fault), so now I strongly suspect a compiler issue: the verification examples ran fine, but my own configuration uses a substantially higher resolution and I want to compile and run it on many processors.
> I also noticed there was a similar discussion before: http://mailman.mitgcm.org/pipermail/mitgcm-support/2018-June/011593.html
>
> Best,
> Mario
>
> 255: forrtl: severe (28): CLOSE error, unit 11, file "Unknown"
> 255: Image PC Routine Line Source
> 255: libifcoremt.so.5 00001555553EFBDE for__exit_handler Unknown Unknown
> 255: libifcoremt.so.5 00001555553FC78E for__signal_handl Unknown Unknown
> 255: libpthread-2.28.s 0000155550C3BC20 Unknown Unknown Unknown
> 255: libc-2.28.so 000015555095316B unlink Unknown Unknown
> 255: libifcoremt.so.5 00001555553E18B1 for__close_proc Unknown Unknown
> 255: libifcoremt.so.5 00001555553E0EB0 for_close Unknown Unknown
> 255: mitgcmuv_ad 0000000000902A3F Unknown Unknown Unknown
> 255: mitgcmuv_ad 00000000009E2F4A Unknown Unknown Unknown
> 255: mitgcmuv_ad 00000000009DB0E5 Unknown Unknown Unknown
> 255: mitgcmuv_ad 00000000009F7D38 Unknown Unknown Unknown
> 255: mitgcmuv_ad 000000000097DDEA Unknown Unknown Unknown
> 714: mitgcmuv_ad 0000000000403852 Unknown Unknown Unknown
> 714: libc-2.28.so 0000155550887493 __libc_start_main Unknown Unknown
> 714: mitgcmuv_ad 000000000040375E Unknown Unknown Unknown
> 255: mitgcmuv_ad 0000000000403852 Unknown Unknown Unknown
> 255: libc-2.28.so 0000155550887493 __libc_start_main Unknown Unknown
> 255: mitgcmuv_ad 000000000040375E Unknown Unknown Unknown
> 1481: forrtl: error (78): process killed (SIGTERM)
> 1481: Image PC Routine Line Source
> 1481: libifcoremt.so.5 00001555553FC76C for__signal_handl Unknown Unknown
> 1481: libpthread-2.28.s 0000155550C3BC20 Unknown Unknown Unknown
> 1481: libpthread-2.28.s 0000155550C3B1D6 __open64 Unknown Unknown
> 1481: libifcoremt.so.5 00001555554911B1 for__open_proc Unknown Unknown
> 1481: libifcoremt.so.5 000015555540CCBE for_open Unknown Unknown
> 1481: mitgcmuv_ad 000000000097F1F5 Unknown Unknown Unknown
> 1481: mitgcmuv_ad 000000000090122F Unknown Unknown Unknown
> 1481: mitgcmuv_ad 00000000009E2F4A Unknown Unknown Unknown
> 1481: mitgcmuv_ad 00000000009DB0E5 Unknown Unknown Unknown
> 1481: mitgcmuv_ad 00000000009F7D38 Unknown Unknown Unknown
> 1481: mitgcmuv_ad 000000000097DDEA Unknown Unknown Unknown
>
> On Fri, Mar 3, 2023 at 6:35 PM <mitgcm-support-request at mitgcm.org> wrote:
>>
>> Today's Topics:
>>
>> 1. too many values for NAMELIST variable (mario wrk)
>> 2. Re: too many values for NAMELIST variable
>> (Menemenlis, Dimitris (US 329B))
>> 3. Re: too many values for NAMELIST variable
>> (Carroll, Dustin (US 329C-Affiliate))
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 3 Mar 2023 17:48:03 +0300
>> From: mario wrk <wrkmario at gmail.com>
>> To: mitgcm-support at mitgcm.org
>> Subject: [MITgcm-support] too many values for NAMELIST variable
>> Message-ID:
>> <CAAfDP0dnjOKcAO682mC1+z1BDGmAYXdOaC3-iLQfJa0E8kinjA at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Dear MITgcm community,
>> I was running a high resolution model in parallel with OpenMPI and got the
>> error below.
>> Does anyone have a clue?
>> Best,
>> Mario
>>
>>
>> 799: forrtl: severe (18): too many values for NAMELIST variable, unit 11, file .........run_ad/scratch1.000000799, line 3315, position 7
>> 799: Image PC Routine Line Source
>> 799: libifcoremt.so.5 00001555553E6E79 for__io_return Unknown Unknown
>> 799: libifcoremt.so.5 000015555542F3F5 for_read_seq_nml Unknown Unknown
>> 799: mitgcmuv_ad 0000000000786AC8 Unknown Unknown Unknown
>> 799: mitgcmuv_ad 000000000077F9A4 Unknown Unknown Unknown
>> 799: mitgcmuv_ad 0000000000862732 Unknown Unknown Unknown
>> 799: mitgcmuv_ad 00000000008BD780 Unknown Unknown Unknown
>> 799: mitgcmuv_ad 00000000004037C2 Unknown Unknown Unknown
>> 799: libc-2.28.so 0000155550887493 __libc_start_main Unknown Unknown
>> 799: mitgcmuv_ad 00000000004036CE Unknown Unknown Unknown
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL: <http://mailman.mitgcm.org/pipermail/mitgcm-support/attachments/20230303/14bbf386/attachment-0001.html>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 3 Mar 2023 15:11:44 +0000
>> From: "Menemenlis, Dimitris (US 329B)"
>> <dimitris.menemenlis at jpl.nasa.gov>
>> To: MITgcm Support <mitgcm-support at mitgcm.org>
>> Subject: Re: [MITgcm-support] too many values for NAMELIST variable
>> Message-ID: <541236B8-2F43-46B3-A2BE-67CC43AD6D94 at jpl.nasa.gov>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> There is a problem with one of your runtime parameter files (data or data.*) in your runtime directory run_ad.
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Fri, 3 Mar 2023 15:35:12 +0000
>> From: "Carroll, Dustin (US 329C-Affiliate)"
>> <dustin.carroll at jpl.nasa.gov>
>> To: "mitgcm-support at mitgcm.org" <mitgcm-support at mitgcm.org>
>> Subject: Re: [MITgcm-support] too many values for NAMELIST variable
>> Message-ID:
>> <SJ0PR09MB90307EB60388A8CD177F8130BDB39 at SJ0PR09MB9030.namprd09.prod.outlook.com>
>>
>> Content-Type: text/plain; charset="windows-1252"
>>
>> To follow up on Dimitris' comment: if you open the file "scratch1.000000799" in your run_ad directory
>> and look at line 3315, position 7, this will tell you where the syntax error or incorrect parameter value
>> occurred in your data.* file.
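Carroll's suggestion is easy to script. The minimal Python sketch below (script and function names are hypothetical) prints a few lines around the location forrtl reports, with a caret under the reported column. It assumes the scratch file is still present in run_ad after the crash. As far as I know, MITgcm writes each scratch file as a copy of a data* file with comment lines stripped, so line 3315 of the scratch file need not be line 3315 of the original data.* file.

```python
import sys
from typing import Iterable


def show_location(lines: Iterable[str], line_no: int, position: int, context: int = 2) -> str:
    """Return the lines around line_no (1-based), with a caret under column position (1-based)."""
    prefix_width = 8  # width of the "NNNNNN: " prefix written below
    out = []
    for n, text in enumerate(lines, start=1):
        if n < line_no - context:
            continue  # before the window of interest
        if n > line_no + context:
            break  # past the window; stop reading the (possibly large) file
        text = text.rstrip("\r\n")
        out.append(f"{n:6d}: {text}")
        if n == line_no:
            # Caret under the reported column, shifted right by the prefix width.
            out.append(" " * (prefix_width + position - 1) + "^")
    return "\n".join(out)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python show_nml_error.py run_ad/scratch1.000000799 3315 7
    path, line_no, position = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    with open(path, errors="replace") as f:
        print(show_location(f, line_no, position))
```

Intel's runtime reports error 18 when a namelist READ tries to assign more values to a variable than it can hold, so the caret usually lands on an array assignment with one or more extra entries, for instance a list longer than the array's compiled dimension.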
>>
>> ------------------------------
>>
>> Subject: Digest Footer
>>
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
>>
>>
>> ------------------------------
>>
>> End of MITgcm-support Digest, Vol 237, Issue 1
>> **********************************************
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.mitgcm.org/pipermail/mitgcm-support/attachments/20230303/f6bc228a/attachment-0001.html>