<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Do you have a STDOUT from a run that worked to compare with a STDOUT from a run that crashed?<div class="">If so, you can send them to me at <a href="mailto:mmazloff@ucsd.edu" class="">mmazloff@ucsd.edu</a> and I can try to find what is different.</div><div class=""><br class=""></div><div class="">Matt</div><div class=""><br class=""></div><div class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Mar 26, 2020, at 12:19 PM, Yangxin He <<a href="mailto:y67he@uwaterloo.ca" class="">y67he@uwaterloo.ca</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><div id="x_divtagdefaultwrapper" dir="ltr" style="font-size: 12pt; font-family: Calibri, Helvetica, sans-serif;" class=""><div style="margin-top: 0px; margin-bottom: 0px;" class="">Hi Jean-Michel,</div><div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class="">I previously contacted the "graham" staff; their response was that everything seemed normal on their end. 
That's why I am here to see if I can have any luck.</div><div style="margin-top: 0px; margin-bottom: 0px;" class="">A few more of my runs have died, and I have not received any reply from them yet.</div><div style="margin-top: 0px; margin-bottom: 0px;" class=""><br class=""></div><div style="margin-top: 0px; margin-bottom: 0px;" class="">Yangxin</div></div><hr tabindex="-1" style="display: inline-block; width: 857.5px;" class=""><div id="x_divRplyFwdMsg" dir="ltr" class=""><font face="Calibri, sans-serif" style="font-size: 11pt;" class=""><b class="">From:</b><span class="Apple-converted-space"> </span>MITgcm-support <<a href="mailto:mitgcm-support-bounces@mitgcm.org" class="">mitgcm-support-bounces@mitgcm.org</a>> on behalf of Jean-Michel Campin <<a href="mailto:jmc@mit.edu" class="">jmc@mit.edu</a>><br class=""><b class="">Sent:</b><span class="Apple-converted-space"> </span>Wednesday, March 25, 2020 10:20:46 PM<br class=""><b class="">To:</b><span class="Apple-converted-space"> </span><a href="mailto:mitgcm-support@mitgcm.org" class="">mitgcm-support@mitgcm.org</a><br class=""><b class="">Subject:</b><span class="Apple-converted-space"> </span>Re: [MITgcm-support] jobs died suddenly</font><div class=""> </div></div></div><font size="2" style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="font-size: 10pt;" class=""><div class="PlainText">Hi Yangxin,<br class=""><br class="">It could be due to the disk system on this computer not allowing the model<span class="Apple-converted-space"> </span><br class="">to access this file from one of the nodes you were using. 
To check this, I would suggest<span class="Apple-converted-space"> </span><br class="">re-running it without changing anything, and keeping a record of which node your<br class="">job was assigned to. This way you might be able to contact the "graham" admin staff<br class="">with more precise information.<br class=""><br class="">Cheers,<br class="">Jean-Michel<br class=""><br class="">On Thu, Mar 26, 2020 at 02:10:57AM +0000, Yangxin He wrote:<br class="">> Hi Matt,<br class="">><span class="Apple-converted-space"> </span><br class="">><span class="Apple-converted-space"> </span><br class="">> So for the sudden crash of the run, do you think the problem is in the code, or might it be something with graham (Compute Canada)?<br class="">><span class="Apple-converted-space"> </span><br class="">><span class="Apple-converted-space"> </span><br class="">> Yangxin<br class="">><span class="Apple-converted-space"> </span><br class="">> ________________________________<br class="">> From: MITgcm-support <<a href="mailto:mitgcm-support-bounces@mitgcm.org" class="">mitgcm-support-bounces@mitgcm.org</a>> on behalf of Matthew Mazloff <<a href="mailto:mmazloff@ucsd.edu" class="">mmazloff@ucsd.edu</a>><br class="">> Sent: Wednesday, March 25, 2020 4:28:30 PM<br class="">> To: <a href="mailto:mitgcm-support@mitgcm.org" class="">mitgcm-support@mitgcm.org</a><br class="">> Subject: Re: [MITgcm-support] jobs died suddenly<br class="">><span class="Apple-converted-space"> </span><br class="">> That is a separate issue. The model crashed but the HPC didn't stop the job. I don't know how to remedy that and the HPC support should be able to help with that.<br class="">> The model was not running. The executable had stopped and that is your primary issue. 
I am not sure why the model crashed, but my first guess is it happened while trying to read an OBW or OBE file.<br class="">><span class="Apple-converted-space"> </span><br class="">> Matt<br class="">><span class="Apple-converted-space"> </span><br class="">><span class="Apple-converted-space"> </span><br class="">> On Mar 25, 2020, at 1:22 PM, Yangxin He <<a href="mailto:y67he@uwaterloo.ca" class="">y67he@uwaterloo.ca</a><<a href="mailto:y67he@uwaterloo.ca" class="">mailto:y67he@uwaterloo.ca</a>>> wrote:<br class="">><span class="Apple-converted-space"> </span><br class="">> Hi Matt,<br class="">><span class="Apple-converted-space"> </span><br class="">> Yep. This is part of my data file:<br class="">> #obcs forcing<br class="">> periodicExternalForcing=.TRUE.,<br class="">> externForcingPeriod=36.,<br class="">> externForcingCycle=86148.,<br class="">><span class="Apple-converted-space"> </span><br class="">> &<br class="">><span class="Apple-converted-space"> </span><br class="">> Apart from this, Enrico has the same problem of a run not producing files but still running.<br class="">> I submitted a ticket to graham (Compute Canada); they did not know why and suggested that I try here.<br class="">><span class="Apple-converted-space"> </span><br class="">> Yangxin<br class="">> ________________________________<br class="">> From: MITgcm-support <<a href="mailto:mitgcm-support-bounces@mitgcm.org" class="">mitgcm-support-bounces@mitgcm.org</a><<a href="mailto:mitgcm-support-bounces@mitgcm.org" class="">mailto:mitgcm-support-bounces@mitgcm.org</a>>> on behalf of Matthew Mazloff <<a href="mailto:mmazloff@ucsd.edu" class="">mmazloff@ucsd.edu</a><<a href="mailto:mmazloff@ucsd.edu" class="">mailto:mmazloff@ucsd.edu</a>>><br class="">> Sent: Wednesday, March 25, 2020 4:17:53 PM<br class="">> To: <a href="mailto:mitgcm-support@mitgcm.org" class="">mitgcm-support@mitgcm.org</a><<a href="mailto:mitgcm-support@mitgcm.org" class="">mailto:mitgcm-support@mitgcm.org</a>><br 
class="">> Subject: Re: [MITgcm-support] jobs died suddenly<br class="">><span class="Apple-converted-space"> </span><br class="">> Well it definitely died while trying to read something for the obcs:<br class="">> ABNORMAL END: S/R MDS_READ_SEC_YZ<br class="">><span class="Apple-converted-space"> </span><br class="">> Do you also give boundary files that have a start time and period given in data.exf?<br class="">> E.g.:<br class="">> obcsWstartdate1 = 20081216,<br class="">> obcsWstartdate2 = 00000,<br class="">> obcsWperiod = 2629800,<br class="">><span class="Apple-converted-space"> </span><br class="">> Matt<br class="">><span class="Apple-converted-space"> </span><br class="">><span class="Apple-converted-space"> </span><br class="">> On Mar 25, 2020, at 1:11 PM, Yangxin He <<a href="mailto:y67he@uwaterloo.ca" class="">y67he@uwaterloo.ca</a><<a href="mailto:y67he@uwaterloo.ca" class="">mailto:y67he@uwaterloo.ca</a>>> wrote:<br class="">><span class="Apple-converted-space"> </span><br class="">> Hi Matt,<br class="">><span class="Apple-converted-space"> </span><br class="">> This would be really confusing.<br class="">> My file seems to be the right size, and the run died after running fine for 34 tidal periods. 
If the size of the boundary files were the problem, wouldn't the run have died at the beginning?<br class="">> Another thing is, I have been using this setup for about a year now, and it was running fine until recently.<br class="">><span class="Apple-converted-space"> </span><br class="">> Yangxin<br class="">> ________________________________<br class="">> From: MITgcm-support <<a href="mailto:mitgcm-support-bounces@mitgcm.org" class="">mitgcm-support-bounces@mitgcm.org</a><<a href="mailto:mitgcm-support-bounces@mitgcm.org" class="">mailto:mitgcm-support-bounces@mitgcm.org</a>>> on behalf of Matthew Mazloff <<a href="mailto:mmazloff@ucsd.edu" class="">mmazloff@ucsd.edu</a><<a href="mailto:mmazloff@ucsd.edu" class="">mailto:mmazloff@ucsd.edu</a>>><br class="">> Sent: Wednesday, March 25, 2020 4:06:38 PM<br class="">> To: <a href="mailto:mitgcm-support@mitgcm.org" class="">mitgcm-support@mitgcm.org</a><<a href="mailto:mitgcm-support@mitgcm.org" class="">mailto:mitgcm-support@mitgcm.org</a>><br class="">> Subject: Re: [MITgcm-support] jobs died suddenly<br class="">><span class="Apple-converted-space"> </span><br class="">> Hello<br class="">><span class="Apple-converted-space"> </span><br class="">> The code crashed trying to read a file. The file is size NY*NZ*NT so I suspect it is an eastern or western boundary condition file. Make sure your files are long enough.<br class="">><span class="Apple-converted-space"> </span><br class="">> -Matt<br class="">><span class="Apple-converted-space"> </span><br class="">><span class="Apple-converted-space"> </span><br class="">> On Mar 25, 2020, at 12:24 PM, Yangxin He <<a href="mailto:y67he@uwaterloo.ca" class="">y67he@uwaterloo.ca</a><<a href="mailto:y67he@uwaterloo.ca" class="">mailto:y67he@uwaterloo.ca</a>>> wrote:<br class="">><span class="Apple-converted-space"> </span><br class="">> Hello there,<br class="">><span class="Apple-converted-space"> </span><br class="">> Recently several of my jobs died for no apparent reason. 
The error message is<br class="">> [y67he@gra-login1 b6]$ more sim-29315632.err<br class="">> ABNORMAL END: S/R MDS_READ_SEC_YZ<br class="">> srun: Job step aborted: Waiting up to 62 seconds for job step to finish.<br class="">> slurmstepd: error: *** JOB 29315632 ON gra228 CANCELLED AT 2020-03-25T08:30:08 DUE TO TIME LIMIT ***<br class="">><span class="Apple-converted-space"> </span><br class="">> slurmstepd: error: *** STEP 29315632.0 ON gra228 CANCELLED AT 2020-03-25T08:30:08 DUE TO TIME LIMIT ***<br class="">> The time limit was not the problem. The code simply stopped producing any new results; however, it was still running.<br class="">> This is confusing, because I have been using the same setup for a while and this only started to happen in the past few weeks.<br class="">><span class="Apple-converted-space"> </span><br class="">> I ran my code on graham at Compute Canada, and the people there suggested the problem may be in the code.<br class="">> Can anyone shed any light on this?<br class="">><span class="Apple-converted-space"> </span><br class="">> Thanks<br class="">><span class="Apple-converted-space"> </span><br class="">> Yangxin<br class="">> _______________________________________________<br class="">> MITgcm-support mailing list<br class="">> <a href="mailto:MITgcm-support@mitgcm.org" class="">MITgcm-support@mitgcm.org</a><<a href="mailto:MITgcm-support@mitgcm.org" class="">mailto:MITgcm-support@mitgcm.org</a>><br class="">><span class="Apple-converted-space"> </span><a href="http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support" class="">http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support</a><br class="">><span class="Apple-converted-space"> </span><br class="">> _______________________________________________<br class="">> MITgcm-support mailing list<br class="">> <a href="mailto:MITgcm-support@mitgcm.org" class="">MITgcm-support@mitgcm.org</a><<a href="mailto:MITgcm-support@mitgcm.org" 
class="">mailto:MITgcm-support@mitgcm.org</a>><br class="">><span class="Apple-converted-space"> </span><a href="http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support" class="">http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support</a><br class="">><span class="Apple-converted-space"> </span><br class="">> _______________________________________________<br class="">> MITgcm-support mailing list<br class="">> <a href="mailto:MITgcm-support@mitgcm.org" class="">MITgcm-support@mitgcm.org</a><<a href="mailto:MITgcm-support@mitgcm.org" class="">mailto:MITgcm-support@mitgcm.org</a>><br class="">><span class="Apple-converted-space"> </span><a href="http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support" class="">http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support</a><br class="">><span class="Apple-converted-space"> </span><br class=""><br class="">> _______________________________________________<br class="">> MITgcm-support mailing list<br class="">> <a href="mailto:MITgcm-support@mitgcm.org" class="">MITgcm-support@mitgcm.org</a><br class="">><span class="Apple-converted-space"> </span><a href="http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support" class="">http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support</a><br class=""><br class="">_______________________________________________<br class="">MITgcm-support mailing list<br class=""><a href="mailto:MITgcm-support@mitgcm.org" class="">MITgcm-support@mitgcm.org</a><br class=""><a href="http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support" class="">http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support</a><br class=""></div></span></font><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline 
!important;" class="">_______________________________________________</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">MITgcm-support mailing list</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><a href="mailto:MITgcm-support@mitgcm.org" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">MITgcm-support@mitgcm.org</a><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><a href="http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support" style="font-family: 
Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px;" class="">http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support</a></div></blockquote></div><br class=""></div></body></html>