<div dir="ltr"><div>Hi Kunal, <br></div><div><br></div><div>In addition to what Jody already said:</div><div><br></div><div>1) Take a look at the lines of the code files (.f) named in the backtrace (listed in your last email).<br></div><div><br></div><div>The floating point exception seems to have happened in the exf package <br></div><div>(which handles external forcings).</div><div>Note that the .f files should be in your build directory, not in the original code tree (where .F90 files live).</div><div>They have been preprocessed to insert all the ".h" files and expand the preprocessor directives, so the line numbers differ</div><div>between the .f and the .F90 files.</div><div><br></div><div>Sometimes it is hard to nail down which forcing file is responsible for the failure,</div><div>but you should at least have a data.exf namelist file that lists the files you are using.</div><div>Inspect your forcing files, make sure they are consistent with the grid,</div><div>and that there aren't NaNs, missing values, or zeroes where they are not supposed to be.</div><div>Line 4598 of exf_bulkformulae.f (the file named in your backtrace) may give a hint why you get the floating point exception, and hopefully <br></div><div>suggest which data in your forcing files is causing trouble.</div><div><br></div><div>2) Pay attention to the warning messages in STDERR.</div><div><br></div><div>They are mostly related to the MNC (netCDF IO) package, and suggest some changes you</div><div>may consider adopting.<br></div><div>My recollection is that MNC doesn't support useSingleCpuIO,</div><div>which means that it will output one file per MPI subdomain,</div><div>which you will have to combine after the model runs.</div><div>[There are some Matlab scripts for that, maybe Python too.]<br></div><div>I completely gave up using netCDF in MITgcm experiments because of this.</div><div>Output through all processors stresses out any computer or cluster,</div><div>especially when the number of subdomains is large, and if other users are</div><div>also pounding on 
IO.<br></div><div><br></div><div>The MITgcm still works better with binary IO (MDSIO),</div><div>where the binaries have a metadata text file (.meta) and the binary part itself (.data).<br></div><div>That is unfortunate: it causes much frustration, and errors that don't happen in models that have <br></div><div>a robust netCDF IO. Binary files are error prone, and producing and inspecting them is a pain, <br></div><div>let alone doing data analysis on them. But that is the way it is.</div><div>If you don't have a parallel file system, MDSIO with useSingleCpuIO=.true. is probably <br></div><div>the most stable way to run the model.</div><div>However, you will have to either read your binary output directly for QC and data analysis,</div><div>or write scripts to convert it to netCDF for further processing.</div><div><br></div><div>Beware that the MITgcm has controls for the binary precision, readBinaryPrec</div><div>and writeBinaryPrec, in the data namelist &PARM01.</div><div>Besides, the norm is to use "big endian" floating point format (which is commonly part</div><div>of the compilation flags), so that even the visualization of binaries requires swapping bytes</div><div>(because virtually all computers today are "little endian").<br></div><div><br></div><div>People (including myself) can spend a lot of time writing and <br></div><div>debugging these binary-to-netcdf and netcdf-to-binary scripts, <br></div><div>something that would be obviated if the model had a complete and robust netCDF IO package.<br></div><div><br></div><div>There are some tools for that in the MITgcm code itself, and I suggest that you start by looking at <br></div><div>the existing Matlab scripts:</div><div><a href="http://wwwcvs.mitgcm.org/viewvc/MITgcm/MITgcm/utils/matlab/" target="_blank">http://wwwcvs.mitgcm.org/viewvc/MITgcm/MITgcm/utils/matlab/</a></div><div>This is a more organized set of scripts created by Martin Losch (I think):<br></div><div><a 
href="http://wwwcvs.mitgcm.org/viewvc/MITgcm/MITgcm/verification/tutorial_global_oce_latlon/diags_matlab/" target="_blank">http://wwwcvs.mitgcm.org/viewvc/MITgcm/MITgcm/verification/tutorial_global_oce_latlon/diags_matlab/</a></div><div><br></div><div>My recollection is that Jody wrote an extension to the MNC package, <br></div><div>not yet part of the mainstream code, though.</div><div><br></div><div>3) You can increase the frequency of monitor and data output, to achieve what Jody suggested,</div><div>and also inspect STDOUT.XXXX.<br></div><div><br></div><div>If the model fails right at the start, then set them equal to one time step.</div><div>The output will be huge, but if it fails at the beginning of the run,</div><div>this is no big deal:</div><div><br></div><div><div>monitorFreq and dumpFreq in data &PARM03 <br></div></div><div><br></div><div>I hope this helps,</div><div>Gus Correa<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Sep 13, 2020 at 11:23 AM Jody Klymak <<a href="mailto:jklymak@uvic.ca" target="_blank">jklymak@uvic.ca</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div>Hi Kunal.<div><br></div><div>Check STDOUT.0000 as that is more relevant. Does the CFL criterion blow up? If so, you will simply have to reduce the time step. It's possible you have also set the model up incorrectly and it is convecting initially, which will easily violate the CFL criterion. It is also hard to tell from what you are giving us when the model blows up. Time step 1? Time step 100? Finally, it's often useful to plot the fields to see where the instability is happening. That may require you to save quite a bit of data, but it's hard to debug in the absence of information. 
</div><div><br></div><div>Best of luck! Jody<br><div><br><blockquote type="cite"><div>On 13 Sep 2020, at 07:22, kunal madkaiker <<a href="mailto:kunal.madkaiker02@gmail.com" target="_blank">kunal.madkaiker02@gmail.com</a>> wrote:</div><br><div><div dir="ltr"><div>Hi Gus,</div><div><br></div><div>As per your suggestion, I made the respective changes and tried to run the executable again. Below is the log generated</div><div><br></div><div>$ mpirun -np 60 ./mitgcmuv <br>forrtl: error (72): floating overflow<br>Image PC Routine Line Source <br>libifcoremt.so.5 00002AD443246555 for__signal_handl Unknown Unknown<br>libpthread-2.17.s 00002AD442DB35F0 Unknown Unknown Unknown<br>libnetcdf.so.15.2 00002AD44121C4B3 __libm_exp_e7 Unknown Unknown<br>mitgcmuv 0000000000AC0FF7 exf_bulkformulae_ 4598 exf_bulkformulae.f<br>mitgcmuv 0000000000B02334 exf_getforcing_ 4430 exf_getforcing.f<br>mitgcmuv 000000000128726E load_fields_drive 2141 load_fields_driver.f<br>mitgcmuv 0000000000C45A25 forward_step_ 2340 forward_step.f<br>mitgcmuv 0000000001290200 main_do_loop_ 2078 main_do_loop.f<br>mitgcmuv 0000000001C283F6 the_main_loop_ 2097 the_main_loop.f<br>mitgcmuv 0000000001C28955 the_model_main_ 2421 the_model_main.f<br>mitgcmuv 0000000001290615 MAIN__ 4286 main.f<br>mitgcmuv 0000000000403412 Unknown Unknown Unknown<br><a href="http://libc-2.17.so/" target="_blank">libc-2.17.so</a> 00002AD445C2A505 __libc_start_main Unknown Unknown<br>mitgcmuv 0000000000403319 Unknown Unknown Unknown</div><div>----------------------------------------------------------------------------------------------------------------------------------------------</div><div><br></div><div>The STDERR file reads:</div><div>(PID.TID 0030.0001) ** WARNING ** MNC_READPARMS: incomplete MNC pickup files implementation<br>(PID.TID 0030.0001) ** WARNING ** MNC_READPARMS: => pickup_write_mnc=T not recommanded<br>(PID.TID 0030.0001) ** WARNING ** MNC_READPARMS: => pickup_read_mnc=T not working for some set-up<br>(PID.TID 
0030.0001) ** WARNING ** INI_MODEL_IO: globalFiles=TRUE is not safe in Multi-processors (MPI) run<br>(PID.TID 0030.0001) ** WARNING ** INI_MODEL_IO: use instead "useSingleCpuIO=.TRUE."<br>(PID.TID 0030.0001) ** WARNING ** INI_MODEL_IO: use tiled-files to write sections (for OBCS)<br>(PID.TID 0030.0001) ** WARNING ** EXF_CHECK: wind-stress position irrelevant</div><div><br></div><div>Attaching data, data.obcs, data.exf for your reference. I have set deltaTmom=120.0,</div><div>What I am understanding is that the model is blowing up due to overestimation of few values and not because of any error. Am I right?</div><div><br></div><div>Regards</div><div>Kunal<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Sep 13, 2020 at 6:47 AM Gus Correa <<a href="mailto:gus@ldeo.columbia.edu" target="_blank">gus@ldeo.columbia.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Hi Kunal</div><div><br></div><div>To try to nail down where, when, why it fails you could compile in debugging mode,</div><div>ie. 
start fresh ('make CLEAN' in the build directory or just wipe that directory off) <br></div><div>and run genmake2 with the -devel flag (keep the other flags).</div><div>Then, to increase verbosity add:</div><div>debugLevel = 4,</div><div>to the "data" namelist &PARM01,</div><div>and reduce the <br></div><div>monitorFreq <br></div><div>interval in &PARM03</div><div>to one or a few time steps.</div><div>The STDOUT.XXXX and STDERR.XXXX files <br></div><div>may give a hint of what is going on (when, where, why it fails).</div><div><br></div><div>I hope this helps,</div><div>Gus Correa<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, Sep 12, 2020 at 7:38 PM kunal madkaiker <<a href="mailto:kunal.madkaiker02@gmail.com" target="_blank">kunal.madkaiker02@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>Dear All,</div><div><br></div><div>I am trying to simulate U,V currents circulation along the West Coast of India.</div><div>I have a grid of 720 x 1560 with a high resolution of 1.45km x 1.45km, with 25 levels in the vertical from 0 to 2150m. I have set hFacMin=0.3 and hFacMinDz=10</div><div><br></div><div>But the model blows up at the initial stage and I get the error: <br></div><div>Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO IEEE_UNDERFLOW_FLAG<br>STOP MOM_IMPLICIT_R: error when solving 3-Diag problem.</div><div><br></div><div>I have tried changing viscAh from 1 to 1000 m2/s and viscAz from 0.02 to 0.001 m2/s. Also tried with viscAhgrid=0.1.</div><div></div><div>I have defined the vertical levels keeping delZ(k+1)/delZ(k) < 1.4 ratio in mind. But the issue persists. Kindly advise. <br></div><div>Please let me know if any additional information is required from my side.</div><div><br></div><div>Regards</div><div>Kunal<br></div><div><br></div></div>
_______________________________________________<br>
MITgcm-support mailing list<br>
<a href="mailto:MITgcm-support@mitgcm.org" target="_blank">MITgcm-support@mitgcm.org</a><br>
<a href="http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support" rel="noreferrer" target="_blank">http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support</a><br>
</blockquote></div>
</blockquote></div>
<span id="gmail-m_-2582315382732661100gmail-m_-4779576667501712223gmail-m_-6875495784263515161cid:f_kf16sty10"><data></span><span id="gmail-m_-2582315382732661100gmail-m_-4779576667501712223gmail-m_-6875495784263515161cid:f_kf16stys2"><data.obcs></span><span id="gmail-m_-2582315382732661100gmail-m_-4779576667501712223gmail-m_-6875495784263515161cid:f_kf16styo1"><data.exf></span><span id="gmail-m_-2582315382732661100gmail-m_-4779576667501712223gmail-m_-6875495784263515161cid:f_kf16styv3"><STDOUT.0025></span></div></blockquote></div><br><div>
<div style="color:rgb(0,0,0);letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="border-collapse:separate;border-spacing:0px;color:rgb(0,0,0);font-family:"Lucida Sans Typewriter";font-size:12px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div><span style="border-collapse:separate;border-spacing:0px;color:rgb(0,0,0);font-family:"Lucida Sans Typewriter";font-size:12px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-variant-east-asian:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div>--</div><div>Jody Klymak </div><div><a href="http://ocean-physics.seos.uvic.ca/~jklymak/" target="_blank">http://ocean-physics.seos.uvic.ca/~jklymak/</a></div><div><br></div><div><br></div><br></span></div></span></div><br><br>
</div>
<br></div></div>
</blockquote></div>
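<div><br></div>
<div>A postscript on point 1 of the top reply (inspecting the forcing files): the check for NaNs, zeroes and grid consistency could be sketched in Python roughly as below. This is only a minimal sketch under stated assumptions, not an MITgcm tool: the function name scan_forcing_file is hypothetical, the file is assumed to be a flat sequence of nx*ny records of big-endian floats, and prec has to match the precision the file was actually written with (compare readBinaryPrec). It uses only the standard library; on a 720 x 1560 grid a NumPy version would likely be much faster.</div>
<pre>
```python
import math
import os
import struct
import sys
import tempfile
from array import array


def scan_forcing_file(path, nx, ny, prec=32):
    """Report NaN, Inf and zero counts for each nx*ny record of a flat binary file.

    Assumes big-endian IEEE floats (the usual MITgcm convention) and that
    prec matches the precision the file was written with (32 or 64 bits).
    """
    values = array("f" if prec == 32 else "d")
    with open(path, "rb") as fh:
        raw = fh.read()
    npts = nx * ny
    rec_bytes = npts * values.itemsize
    # A size that is not a whole number of records hints at a grid or precision mismatch.
    if rec_bytes == 0 or len(raw) % rec_bytes != 0:
        raise ValueError("file size is not a multiple of nx*ny*bytes_per_value")
    values.frombytes(raw)
    if sys.byteorder == "little":
        values.byteswap()  # file is big-endian, host is little-endian
    report = []
    for rec in range(len(values) // npts):
        chunk = values[rec * npts:(rec + 1) * npts]
        finite = [v for v in chunk if math.isfinite(v)]
        report.append({
            "record": rec,
            "nan": sum(1 for v in chunk if math.isnan(v)),
            "inf": sum(1 for v in chunk if math.isinf(v)),
            "zero": sum(1 for v in chunk if v == 0.0),
            "min": min(finite) if finite else None,
            "max": max(finite) if finite else None,
        })
    return report


# Tiny self-check on a synthetic 3 x 2 record (not real forcing data).
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(struct.pack(">6f", 1.0, 0.0, float("nan"), 2.5, -3.0, 4.0))
report = scan_forcing_file(tmp.name, nx=3, ny=2)
os.remove(tmp.name)
print(report[0])
```
</pre>
<div>For a real run one would call it once per file listed in data.exf, with nx and ny taken from the forcing grid. A ValueError here, or absurd min/max values, usually points to a wrong grid size, precision, or byte order rather than to the physics.</div>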