[MITgcm-support] Missing/inconsistent output and stalled runs with ECCOv4 setup on ARCHER

Dan Jones dcjones.work at gmail.com
Wed Oct 19 08:16:27 EDT 2016


Hi all,

I am using the ECCOv4 setup in adjoint mode for sensitivity analysis.  My
setup is close to the one found here:

    MITgcm_contrib/verification_other/global_oce_llc90/

modified for sensitivity analysis via data.ecco.  I am having a lot of
trouble getting consistent output from this setup on ARCHER
(http://www.archer.ac.uk/).  Here are some of the problems that I have run
into:

   1. The model does not write any STDOUT or STDERR files while it is running,
   which makes it impossible to check on its progress during a run.
   This behavior is different from every other MITgcm setup that I have run on
   ARCHER.  The STDOUT files are *only sometimes* produced, and then only
   *after* the model has finished running.

   2. The model will "crash" (e.g. a package check error during start-up)
   without stopping the ARCHER job.  The ARCHER job will continue until its
   wall time has been exceeded.  At that point, the STDOUT files *may* finally
   be produced.

   3. The model is inconsistent about when it produces output (e.g. MDS
   files, pickup files).  Often it doesn't produce anything.  When it actually
   does produce output, it only does so *after* the model has finished running.

I have checked all of my output frequencies in 'data' multiple times, and I
have had a colleague check as well.  Interestingly, I do not have these
problems on very short test runs (e.g. 8-10 time steps).  They only appear
on longer runs, i.e. more than a couple hundred time steps.

In short, my question is "why doesn't this setup produce regular output
while the model is running, and why does it often produce no output at all?"

One possibility: perhaps ARCHER holds output in a memory buffer that is only
written to disk once it fills up.  Maybe this setup isn't filling up the buffer
very often, so it's not producing output at run-time.  However, this
wouldn't explain why I sometimes get *no* output, even after the model has
finished running.  It also wouldn't explain why this behavior differs from
every other MITgcm setup that I have used on ARCHER.
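
To make the buffering idea above concrete, here is a minimal generic sketch in
Python.  It is not MITgcm or ARCHER code; the file name and the buffer size are
arbitrary assumptions chosen for illustration.  It writes a short message
through a buffered file handle and compares the size of the file on disk
before and after an explicit flush.

```python
import os
import tempfile


def sizes_before_and_after_flush(message: bytes, buffer_size: int = 65536):
    """Write `message` through a buffered file handle and return the
    on-disk file size before and after an explicit flush."""
    # Hypothetical file name, placed in a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "STDOUT.0000")
    f = open(path, "wb", buffering=buffer_size)
    try:
        f.write(message)                    # held in the in-memory buffer
        size_before = os.path.getsize(path)
        f.flush()                           # push the buffer out to the file
        size_after = os.path.getsize(path)
    finally:
        f.close()
    return size_before, size_after


if __name__ == "__main__":
    before, after = sizes_before_and_after_flush(b"time step 1 done\n")
    print("bytes on disk before flush:", before)
    print("bytes on disk after flush: ", after)
```

As long as the message is smaller than the buffer, nothing should be visible on
disk until the flush, which is the kind of delay I am describing.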

Thank you in advance for any help/clarification that you can provide!

-Dan

--------------------------------------------------------------
Dr Dan Jones
Polar Oceans Team
British Antarctic Survey
--------------------------------------------------------------