[MITgcm-support] Missing/inconsistent output and stalled runs with ECCOv4 setup on ARCHER

Wed Oct 19 09:57:26 EDT 2016

Hi Dan,

MITgcm_contrib/verification_other/global_oce_llc90/ is not intended for long runs, and I suspect that your
experience on ARCHER may reflect the model / system trying to access forcing files that are not there.

Please refer to the email I just sent to mitgcm-support ("running ECCO version 4 release 2") and to
http://mitgcm.org/viewvc/*checkout*/MITgcm/MITgcm_contrib/gael/verification/eccov4.pdf
for additional details with regard to the reference bi-decadal run.
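
One way to rule out missing forcing files before submitting a long job is a small pre-flight check in the run directory. The following is only a minimal sketch and is not part of MITgcm: the function name find_missing_files and the file names in the example are hypothetical placeholders, and the list of expected files has to be assembled by hand from whatever your own namelists reference.

```python
from pathlib import Path


def find_missing_files(run_dir, expected_names):
    """Return the names in expected_names that are absent or empty in run_dir."""
    run_dir = Path(run_dir)
    missing = []
    for name in expected_names:
        path = run_dir / name
        # is_file() follows symlinks, so a dangling link counts as missing;
        # a zero-byte file is treated as missing as well.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical file names, for illustration only; substitute the
    # forcing files that your own namelists actually reference.
    expected = ["forcing_2010", "forcing_2011"]
    for name in find_missing_files(".", expected):
        print("missing or empty:", name)
```

Since short test runs may only touch the first forcing records, it is probably worth listing the files for every year that the longer run will cover.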

Cheers,
Gael

On Oct 19, 2016, at 8:16 AM, Dan Jones <dcjones.work at gmail.com> wrote:

> Hi all,
> 
> I am using the ECCOv4 setup in adjoint mode for sensitivity analysis.  My setup is close to the one found here:
> 
>     MITgcm_contrib/verification_other/global_oce_llc90/
> 
> modified for sensitivity analysis via data.ecco.  I am having a lot of trouble getting consistent output from this setup on ARCHER (http://www.archer.ac.uk/).  Here are some of the problems that I have run into:
> 
> - The model does not output any STDOUT or STDERR while it is running, which makes it impossible to "check up on" the model while it's running.  This behavior is different from every other MITgcm setup that I have run on ARCHER.  The STDOUT files are *only sometimes* produced *after* the model has finished running.
> 
> - The model will "crash" (e.g. a package check error during start-up) without stopping the ARCHER job.  The ARCHER job will continue until its wall time has been exceeded.  At that point, the STDOUT files *may* finally be produced.  
> 
> - The model is inconsistent about when it produces output (e.g. MDS files, pickup files).  Often it doesn't produce anything.  When it actually does produce output, it only does so *after* the model has finished running.
> 
> I have checked all of my output frequencies in 'data' multiple times, and I have had a colleague check as well.  Interestingly, I do not have these problems on very short test runs (e.g. 8-10 time steps).  I only have these problems on longer runs, i.e. greater than a couple hundred timesteps.  
> 
> In short, my question is "why doesn't this setup produce regular output while the model is running, and why does it often produce no output at all?"
> 
> One possibility - perhaps ARCHER has a memory buffer that needs to fill up before it is dumped/output.  Maybe this setup isn't filling up the buffer very often, so it's not producing output at run-time.  However, this wouldn't explain why I sometimes get *no* output, even after the model has finished running.  It also wouldn't explain why this behavior differs from every other MITgcm setup that I have used on ARCHER.
> 
> Thank you in advance for any help/clarification that you can provide!
> 
> -Dan
> 
> --------------------------------------------------------------
> Dr Dan Jones
> Polar Oceans Team
> British Antarctic Survey
> --------------------------------------------------------------
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
