<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Helvetica,sans-serif;" dir="ltr">
<p>Hello there,</p>
<p><br>
</p>
<p>Recently, several of my jobs have died for no apparent reason. The error message is:</p>
<p style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Courier; color: rgb(255, 238, 149); background-color: rgb(21, 102, 47);">
<span style="font-variant-ligatures: no-common-ligatures">[y67he@gra-login1 b6]$ more sim-29315632.err </span></p>
<p style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Courier; color: rgb(255, 238, 149); background-color: rgb(21, 102, 47);">
<span style="font-variant-ligatures: no-common-ligatures">ABNORMAL END: S/R MDS_READ_SEC_YZ</span></p>
<p style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Courier; color: rgb(255, 238, 149); background-color: rgb(21, 102, 47);">
<span style="font-variant-ligatures: no-common-ligatures">srun: Job step aborted: Waiting up to 62 seconds for job step to finish.</span></p>
<p style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Courier; color: rgb(255, 238, 149); background-color: rgb(21, 102, 47);">
<span style="font-variant-ligatures: no-common-ligatures">slurmstepd: error: *** JOB 29315632 ON gra228 CANCELLED AT 2020-03-25T08:30:08 DUE TO TIME LIMIT ***</span></p>
<p style="margin-right: 0px; margin-left: 0px; font-stretch: normal; font-size: 12px; line-height: normal; font-family: Courier; color: rgb(255, 238, 149); background-color: rgb(21, 102, 47);">
<span style="font-variant-ligatures: no-common-ligatures">slurmstepd: error: *** STEP 29315632.0 ON gra228 CANCELLED AT 2020-03-25T08:30:08 DUE TO TIME LIMIT ***</span></p>
<p>The time limit was not the underlying problem. The code simply stopped producing new results, although it was still running.</p>
<p>This is confusing because I have been using the same setup for a while, and this only started happening in the past few weeks.</p>
<p><br>
</p>
<p>I ran my code on Graham at Compute Canada, and the people there suggested that the problem may be in the code.</p>
<p>Can anyone shed any light on this?</p>
<p><br>
</p>
<p>Thanks</p>
<p><br>
</p>
<p>Yangxin</p>
</div>
</body>
</html>