[MITgcm-support] Missing/inconsistent output and stalled runs with ECCOv4 setup on ARCHER

Dan Jones dcjones.work at gmail.com
Wed Oct 19 18:05:31 EDT 2016


Hi Gael,

Thanks for your quick reply! Sorry, I should have specified that I
modified the setup to use repeating CORE normal-year forcing, so the model
is not having trouble finding the forcing files. Happily, my output issues
went away when I set useSingleCpuIO=.FALSE.
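
For anyone who hits the same symptoms: useSingleCpuIO is an MITgcm
run-time flag set in the PARM01 namelist of the 'data' file. My
understanding is that when it is .TRUE., global output fields are gathered
and written by a single MPI process. Below is a minimal sketch of the
change, assuming the rest of PARM01 stays as in the ECCOv4 setup (the
comment line is only a placeholder for those other settings):

```fortran
 &PARM01
# ... keep the other PARM01 settings from the ECCOv4 setup unchanged ...
 useSingleCpuIO=.FALSE.,
 &
```

As I understand it, with .FALSE. (and globalFiles left at its default)
each process writes its own per-tile MDS files, which tools such as rdmds
can read back.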

Best,
Dan

On Wed, Oct 19, 2016 at 5:00 PM, <mitgcm-support-request at mitgcm.org> wrote:

>
> Today's Topics:
>
>    1. Re: Missing/inconsistent output and stalled runs with ECCOv4
>       setup on ARCHER (gael forget)
>    2. Re: Missing/inconsistent output and stalled runs with ECCOv4
>       setup on ARCHER (Dan Jones)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 19 Oct 2016 09:57:26 -0400
> From: gael forget <gforget at mit.edu>
> To: mitgcm-support at mitgcm.org
> Subject: Re: [MITgcm-support] Missing/inconsistent output and stalled
>         runs with ECCOv4 setup on ARCHER
> Message-ID: <4E2BC0EA-45DF-48C7-8452-0AFFBAC789A9 at mit.edu>
> Content-Type: text/plain; charset="windows-1252"
>
> Hi Dan,
>
> MITgcm_contrib/verification_other/global_oce_llc90/ is not intended for
> long runs, and I suspect that your experience on ARCHER may reflect the
> model / system trying to access forcing files that are not there.
>
> Please refer to the email I just sent to mitgcm-support ("running ECCO
> version 4 release 2") and to
> http://mitgcm.org/viewvc/*checkout*/MITgcm/MITgcm_contrib/gael/verification/eccov4.pdf
> for additional details regarding the reference bi-decadal run.
>
> Cheers,
> Gael
>
>
> On Oct 19, 2016, at 8:16 AM, Dan Jones <dcjones.work at gmail.com> wrote:
>
> > Hi all,
> >
> > I am using the ECCOv4 setup in adjoint mode for sensitivity analysis.
> > My setup is close to the one found here:
> >
> >     MITgcm_contrib/verification_other/global_oce_llc90/
> >
> > modified for sensitivity analysis via data.ecco. I am having a lot of
> > trouble getting consistent output from this setup on ARCHER
> > (http://www.archer.ac.uk/). Here are some of the problems that I have
> > run into:
> >
> > The model does not write anything to STDOUT or STDERR while it is
> > running, which makes it impossible to "check up on" the model mid-run.
> > This behavior differs from every other MITgcm setup that I have run on
> > ARCHER. The STDOUT files are *only sometimes* produced, and then only
> > *after* the model has finished running.
> >
> > The model will "crash" (e.g., a package check error during start-up)
> > without stopping the ARCHER job, which then continues until its wall
> > time has been exceeded. At that point, the STDOUT files *may* finally
> > be produced.
> >
> > The model is inconsistent about when it produces output (e.g., MDS
> > files, pickup files). Often it doesn't produce anything, and when it
> > does produce output, it only does so *after* the model has finished
> > running.
> >
> > I have checked all of my output frequencies in 'data' multiple times,
> > and I have had a colleague check them as well. Interestingly, I do not
> > have these problems on very short test runs (e.g., 8-10 time steps),
> > only on longer runs of more than a couple hundred time steps.
> >
> > In short, my question is: "Why doesn't this setup produce regular
> > output while the model is running, and why does it often produce no
> > output at all?"
> >
> > One possibility: perhaps ARCHER has a memory buffer that needs to fill
> > up before its contents are written out. Maybe this setup isn't filling
> > the buffer very often, so it's not producing output at run time.
> > However, this wouldn't explain why I sometimes get *no* output, even
> > after the model has finished running, or why this behavior differs
> > from every other MITgcm setup that I have used on ARCHER.
> >
> > Thank you in advance for any help/clarification that you can provide!
> >
> > -Dan
> >
> > --------------------------------------------------------------
> > Dr Dan Jones
> > Polar Oceans Team
> > British Antarctic Survey
> > --------------------------------------------------------------
>
>