[MITgcm-support] jobs died suddenly

Yangxin He y67he at uwaterloo.ca
Wed Mar 25 15:56:04 EDT 2020


Hi Martin,


It seems like I have deleted the files for this run.

However, I have attached files from another run that died, perhaps for a different reason.


Thanks


Yangxin

________________________________
From: MITgcm-support <mitgcm-support-bounces at mitgcm.org> on behalf of Martin Losch <Martin.Losch at awi.de>
Sent: Wednesday, March 25, 2020 3:43:54 PM
To: MITgcm Support
Subject: Re: [MITgcm-support] jobs died suddenly

Hi Yangxin,

something is happening in S/R MDS_READ_SEC_YZ (in pkg/mdsio/mdsio_read_section.F).
The error message can have at least 5 different causes, but we cannot know which one applies, because you didn’t provide that information (it is probably somewhere in STDOUT.* or STDERR.*).
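
As a rough illustration of how one might look for that information, here is a minimal Python sketch that scans the STDOUT.* and STDERR.* files in a run directory and prints every line containing an error marker. The function name find_error_lines and the keyword list are assumptions made for this example, not part of MITgcm; adjust the keywords to whatever your run actually prints.

```python
from pathlib import Path

# Assumed markers for illustration; adjust to what your run actually prints.
KEYWORDS = ("ERROR", "ABNORMAL END")


def find_error_lines(run_dir=".", patterns=("STDOUT.*", "STDERR.*"),
                     keywords=KEYWORDS):
    """Return (file name, line number, text) for each line with a keyword."""
    hits = []
    for pattern in patterns:
        # Sort so that per-process files are reported in a stable order.
        for path in sorted(Path(run_dir).glob(pattern)):
            if not path.is_file():
                continue
            # errors="replace" keeps the scan going if a file has odd bytes.
            with path.open(errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if any(k in line for k in keywords):
                        hits.append((path.name, lineno, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Scan the current directory, i.e. run this from the run directory.
    for name, lineno, text in find_error_lines("."):
        print(f"{name}:{lineno}: {text}")
```

Run from the run directory, this prints one line per match in the form file:line: text, which is usually enough to see which process reported the problem first.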

Martin

> On 25. Mar 2020, at 20:24, Yangxin He <y67he at uwaterloo.ca> wrote:
>
> Hello there,
>
> Recently several of my jobs died for no apparent reason. The error message is:
> [y67he at gra-login1 b6]$ more sim-29315632.err
> ABNORMAL END: S/R MDS_READ_SEC_YZ
> srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
> slurmstepd: error: *** JOB 29315632 ON gra228 CANCELLED AT 2020-03-25T08:30:08 DUE TO TIME LIMIT ***
> slurmstepd: error: *** STEP 29315632.0 ON gra228 CANCELLED AT 2020-03-25T08:30:08 DUE TO TIME LIMIT ***
> The time limit was not the problem. The code simply stopped producing new results, although it was still running.
> This is confusing, because I have been using the same setup for a while and this only started to happen in the past few weeks.
>
> I ran my code on Graham at Compute Canada, and the people there suggested that the problem may be in the code.
> Can anyone shed any light on this?
>
> Thanks
>
> Yangxin
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sim-29508690.err
Type: application/octet-stream
Size: 32679 bytes
Desc: sim-29508690.err
URL: <http://mailman.mitgcm.org/pipermail/mitgcm-support/attachments/20200325/cff38763/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: STDERR.0000
Type: application/octet-stream
Size: 772 bytes
Desc: STDERR.0000
URL: <http://mailman.mitgcm.org/pipermail/mitgcm-support/attachments/20200325/cff38763/attachment-0003.obj>

