[MITgcm-devel] snapshots and divided.ctrl
Matthew Mazloff
mmazloff at ucsd.edu
Mon Jul 2 14:20:43 EDT 2012
Hi Patrick
My question was basically who chose the syntax for the I/O command,
>> open(unit=77,file='snapshot'//filen,status='old',form=
>> $'unformatted',iostat=iers)
>> if (iers .eq. 0) then
>> read(unit=77) adapressure0,adapressure1,adaqh0,adaqh1,adarea,
>> ...
But I don't think I need to know that anymore. What I believe is
happening is a conflict between multiple processors accessing
divided.ctrl. Occasionally, I believe, processor zero is writing
divided.ctrl in adthe_main_loop while another processor tries to read
it in cost_final_restore. This causes the model to crash. Some
processors that were in the middle of writing the snapshot then stop
writing, and this is what I was seeing.
So I believe the problem is with cost_final_restore. There are
numerous ways to remedy this, and I am not sure which is best. Since
cost_final_restore is only needed for packing, I will just guard it
with the EXCLUDE_CTRL_PACK flag, but this isn't the most robust fix, so
I won't check it in. Anyway, I hope this fixes my problem.
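
For illustration only, here is one of the other possible remedies, written as
a small Python sketch rather than MITgcm Fortran: the writer stages the new
control file under a temporary name and then renames it into place, so a
concurrent reader sees either the old file or the new one, never a half-written
one. The function names write_ctrl_atomically and read_ctrl are hypothetical,
and the sketch assumes a filesystem where rename is atomic (true for local
POSIX filesystems; parallel or network filesystems may behave differently).

```python
import os
import tempfile


def write_ctrl_atomically(path, payload):
    """Write payload (bytes) to path so readers never see a partial file.

    Hypothetical helper, not part of MITgcm.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Stage the new contents in a temporary file in the same directory,
    # so that the final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ctrl_tmp_")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in with a single rename.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the staging file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_ctrl(path):
    """Read the whole control file (hypothetical reader side)."""
    with open(path, "rb") as f:
        return f.read()
```

In the model itself, a synchronization barrier between the write in
adthe_main_loop and the read in cost_final_restore would be another option; the
sketch above only shows the write-then-rename idea.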
Thanks
-Matt
On Jul 2, 2012, at 6:10 AM, Patrick Heimbach wrote:
>
> Hi Matt,
>
> not sure I understand your question.
> If DIVA is enabled (via #define ALLOW_DIVIDED_ADJOINT)
> TAF automatically picks the outermost checkpoint level
> (by default ilev_3, in your case ilev_4)
> as the interval with which to checkpoint the adjoint snapshots,
> because it is here that we tell TAF to do so:
> c**************************************
> # ifdef ALLOW_DIVIDED_ADJOINT
> CADJ loop = divided
> # endif
> c**************************************
>
> There is no extra directive specifying where the I/O itself should
> take place; the natural places to do this are:
> * to read the snapshot files right before the ilev_4 loop gets
> incremented, i.e. before
> do ilev_4 = idivbeg, idivend+1, -1
> * to overwrite that snapshot file after that same loop is completed,
> i.e.
> enddo (of the above loop)
>
> So you should know exactly where in S/R the_main_loop.F TAF will
> target the read/write,
> namely just before block:
> # ifdef AUTODIFF_4_LEVEL_CHECKPOINT
> do ilev_4 = 1,nchklev_4
> if(ilev_4.le.max_lev4) then
>
> and just after block:
> # ifdef AUTODIFF_4_LEVEL_CHECKPOINT
> endif
> enddo
> # endif
>
> But I guess you know this, so not sure if this helps.
>
> Cheers
> -Patrick
>
> On Jun 29, 2012, at 8:07 PM, Matthew Mazloff wrote:
>
>> Hello
>>
>> I am having an issue where occasionally the adjoint snapshot files
>> are not properly written out. I wanted to troubleshoot this and
>> perhaps just change the way the I/O for these files is performed,
>> but I am having trouble figuring out where TAF gets the info for
>> snapshots. Is there a place where we tell TAF to write the adjoint
>> state snapshot, or does it do that on its own? I can't seem to
>> locate in any forward code the call that generates the adjoint code:
>>
>> C----------------------------------------------
>> C read snapshot
>> C----------------------------------------------
>> 9813 continue
>> if (idivbeg .lt. nchklev_4) then
>> open(unit=77,file='snapshot'//filen,status='old',form=
>> $'unformatted',iostat=iers)
>> if (iers .eq. 0) then
>> read(unit=77) adapressure0,adapressure1,adaqh0,adaqh1,adarea,
>> ...
>>
>> or the one for the later snapshot write. How is this information
>> provided to TAF?
>>
>>
>> Matt
>>
>>
>>
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>
> ---
> Patrick Heimbach | heimbach at mit.edu | http://www.mit.edu/~heimbach
> MIT | EAPS 54-1420 | 77 Massachusetts Ave | Cambridge MA 02139 USA
> FON +1-617-253-5259 | FAX +1-617-253-4464 | SKYPE patrick.heimbach
>