[MITgcm-devel] (not so) funny things happen in seaice_lsr and pickups
Martin Losch
Martin.Losch at awi.de
Mon Mar 2 05:49:13 EST 2009
Hi Jean-Michel:
On Mar 2, 2009, at 1:54 AM, Jean-Michel Campin wrote:
> Hi Martin,
>
> I am a little bit confused:
so am I.
>
> If cg2d_rhs (sum or max, doesn't matter) is not identical,
> it means the state is different, and I guess if you do
> a diff of the final pickup (as I wrote earlier, this is what I
> consider to be the "true" answer), it will be different too.
> So, it seems to me that there is a more fundamental Pb with
> restart/pickup. Because only 3 or 4 correct digits for
> Sum(rhs) does not look very good.
>
> Could you try to run the "../tools/do_tst_2+2" from
> MITgcm/verification where the last SX8 testreport has run ?
> I made some changes recently for MPI restart test, and put an
> automatic restart test after the aces_ifc_mpi testreport
> (see the changes in tools/example_scripts/ACESgrid/aces_test_ifc_mpi)
> You don't need to recompile anything, so the issue of cross compiler
> should not be a problem.
> And if something in those scripts does not work on this platform,
> I would be happy to try to fix it.
Thanks for the modified do_tst_2+2. (BTW, tst_2+2 does not work on my
Apple/Leopard; I suspect some sed syntax issues, but I have not had the
time to sort them out, as my sed skills are poor. Does the script work
on other non-Linux platforms? I assume that the shell tools differ
between GNU and BSD Unix, etc.)
I ran do_tst_2+2 on the SX8 for lab_sea (I will include these tests
in the weekly routine), and all four tests pass. I repeated the
procedure with grid rotation, and the tests still pass. So
everything that is tested in do_tst_2+2 seems to be perfectly OK. But
does the script test the Sum(rhs) numbers? (I am now running the tests on
all verification experiments, and so far there are no failures except
for fizhi, where the verification tests did not run either.)
Now I have to figure out why I am diagnosing wrong restarts in my
specific configuration. What do I need to do to run your scripts on my
non-verification configuration?
Unrelated to the restart issues:
Over the weekend I solved at least one problem: I now understand why
the loop counters for the copy of u(3)=u(1) matter for me. On the SX8
the default is to have SEAICE_VECTORIZE_LSR defined. Then tLev=3
(otherwise 1), and u(i,j-1,tLev,bi,bj) and u(i,j+1,tLev,bi,bj) are
actually used, and thus the overlap of u(3) is actually used. My
mistake!
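
A minimal 1-D sketch of the overlap issue, not the actual seaice_lsr.F code: the names OLx, copy_level and stencil are hypothetical, and the neighbour sum only stands in for the j-1/j+1 accesses on the copied time level. It shows that a copy restricted to the interior leaves stale halo values, which a stencil reading j-1 and j+1 then picks up at the tile edges.

```python
# Hypothetical illustration; none of these names come from MITgcm.
OLx = 1  # overlap (halo) width on each side

def copy_level(src, include_overlap):
    """Copy src into a fresh array, optionally skipping the overlap."""
    dst = [0.0] * len(src)  # overlap keeps the stale value 0.0 if skipped
    if include_overlap:
        lo, hi = 0, len(src)
    else:
        lo, hi = OLx, len(src) - OLx
    for j in range(lo, hi):
        dst[j] = src[j]
    return dst

def stencil(u):
    """Neighbour sum u[j-1] + u[j+1], evaluated on interior points only."""
    return [u[j - 1] + u[j + 1] for j in range(OLx, len(u) - OLx)]

# First and last entries are valid overlap values from neighbouring tiles.
u1 = [10.0, 1.0, 2.0, 3.0, 4.0, 20.0]

u3_interior_only = copy_level(u1, include_overlap=False)
u3_with_overlap = copy_level(u1, include_overlap=True)

reference = stencil(u1)
result_interior = stencil(u3_interior_only)
result_overlap = stencil(u3_with_overlap)

print("reference        :", reference)
print("interior-only u3 :", result_interior)
print("u3 with overlap  :", result_overlap)
```

Only the first and last interior points differ, which is why the loop bounds of the copy matter only when the neighbouring rows of the copied level are actually read.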
Further, there are some inconsistencies in the discretisation of the
metric terms in seaice_lsr.F. These lead to slight asymmetries in the
solutions when the solutions should be symmetric (e.g. I have put my
funnel/channel at the equator, so that everything should be symmetric
about the equator). I have not yet managed to get everything
symmetric, but one difficulty is that even the grid parameters, such as
rA and fCori, are not quite symmetric (at the truncation level). As a
matter of fact, when SEAICE_VECTORIZE_LSR is undefined, the solutions
are even "more" non-symmetric, so there is something in the LSOR
algorithm itself that is not quite consistent (probably just the
solver accuracy).
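
A minimal sketch of how such a symmetry check about the equator could be done, under stated assumptions: symmetry_defect is a hypothetical helper, the latitudes are built by repeated addition of a spacing (one way a grid might be generated, not necessarily how MITgcm does it), cos(lat) is used as a stand-in for the latitude dependence of rA, and Omega = 7.292e-5 1/s is an assumed rotation rate.

```python
import math

def symmetry_defect(values, antisymmetric=False):
    """Largest |f[j] - s*f[N-1-j]|, with s = -1 for antisymmetric fields."""
    sign = -1.0 if antisymmetric else 1.0
    n = len(values)
    return max(abs(values[j] - sign * values[n - 1 - j]) for j in range(n))

# Cell-centre latitudes in degrees, built by repeatedly adding dy.
# Rounding in the running sum can break exact mirror symmetry.
dy = 0.1
lat = [-1.45]
for _ in range(29):
    lat.append(lat[-1] + dy)

omega = 7.292e-5  # assumed rotation rate in 1/s
f_cori = [2.0 * omega * math.sin(math.radians(y)) for y in lat]
cos_lat = [math.cos(math.radians(y)) for y in lat]  # proxy for rA

defect_lat = symmetry_defect(lat, antisymmetric=True)
defect_f = symmetry_defect(f_cori, antisymmetric=True)
defect_cos = symmetry_defect(cos_lat)

print("latitude defect:", defect_lat)
print("fCori defect   :", defect_f, "relative to", max(abs(f) for f in f_cori))
print("cos(lat) defect:", defect_cos)
```

Any defect reported here is at rounding level; the same helper applied to a model solution would show whether its asymmetry is of that size or much larger.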
However, remember the non-symmetric discretization in the B-grid
code? Once that was removed, Dimitris' problems with CS510 with the
B-grid LSR disappeared, so I would not be surprised if these
asymmetries in the metric terms cause the "spontaneous" explosions
that I was talking about initially. I am testing this now and
will check in my fixes soon.
To summarize:
1. there are no restart problems in the verification experiments, as
diagnosed by do_tst_2+2
2. the restart problem in my specific configuration (and the one that
Olaf Klatt uses, a regular lat-lon grid) remains; the grid rotation
changes the behavior, but in the end the restarts fail here, too
(pickups are different)
3. the overlap problem is solved; as usual, I am the culprit
4. the explosions may be caused by non-symmetric or even wrong
discretizations (by me, again) in the metric terms, but that is not yet
clear.
I am attaching the code directory and more numbers in restarttest.out.
Maybe you have an idea why I am getting the wrong restarts (maybe
it's just in my diagnostics, which are not automated as in your
script; again, how can I use the script on my example?).
Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: restarttest.out
Type: application/octet-stream
Size: 5799 bytes
Desc: not available
URL: <http://mitgcm.org/pipermail/mitgcm-devel/attachments/20090302/64cd00dd/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: code.tgz
Type: application/octet-stream
Size: 12223 bytes
Desc: not available
URL: <http://mitgcm.org/pipermail/mitgcm-devel/attachments/20090302/64cd00dd/attachment-0001.obj>
-------------- next part --------------
> Cheers,
> Jean-Michel
>
> On Fri, Feb 27, 2009 at 05:48:47PM +0100, Martin Losch wrote:
>> Hi Jean-Michel,
>>
>> sorry, I was thrown off by the Sum(rhs); all other values that I checked
>> (dynstat_theta/uvel_min/max/mean/sd) agree perfectly (for both the
>> aggressive and minimal optimization), so for lab_sea the restart
>> seems to be OK. So I need to go to my configuration and do the checks
>> there (where there really are differences and the restart does not
>> work). I'll have to figure out what is different from lab_sea
>> (parameters mostly) and narrow down the problem; more to follow ...
>> Martin
>>
>> On Feb 27, 2009, at 5:31 PM, Martin Losch wrote:
>>
>>> Hi Jean-Michel,
>>>
>>> it's probably a good idea for me to first tackle the restart
>>> problem.
>>> Here's what I get on 1 CPU (two tiles, snx=2) with my aggressive
>>> optimization for lab_sea/input.lsr (output.0-10 is for a total of 10
>>> steps, output.5-10 is starting from a pickup at niter0=5)
>>> sx8::tr_run.lsr> grep cg2d: output.0-10
>>> [...]
>>> cg2d: Sum(rhs),rhsMax = 3.07698311274862E-13 1.19974476101239E+00
>>> cg2d: Sum(rhs),rhsMax = 4.01567668006919E-13 1.19252858573205E+00
>>> cg2d: Sum(rhs),rhsMax = 5.02708985550271E-13 1.18194572452171E+00
>>> cg2d: Sum(rhs),rhsMax = 6.01629857044372E-13 1.16776484963845E+00
>>> cg2d: Sum(rhs),rhsMax = 8.02802269106451E-13 1.15096778602035E+00
>>> sx8::tr_run.lsr> grep cg2d: output.5-10
>>> cg2d: Sum(rhs),rhsMax = 3.07975867031018E-13 1.19974476101239E+00
>>> cg2d: Sum(rhs),rhsMax = 4.01789712611844E-13 1.19252858573205E+00
>>> cg2d: Sum(rhs),rhsMax = 5.03430630516277E-13 1.18194572452171E+00
>>> cg2d: Sum(rhs),rhsMax = 6.03184169278848E-13 1.16776484963844E+00
>>> cg2d: Sum(rhs),rhsMax = 8.05300270911857E-13 1.15096778602035E+00
>>>
>>> and with the lowest possible optimization ("ssafe", i.e. only safe
>>> scalar optimization):
>>> sx8::tr_run.lsr> grep cg2d: output.0-10
>>> [...]
>>> cg2d: Sum(rhs),rhsMax = 3.05866443284231E-13 1.19974475698064E+00
>>> cg2d: Sum(rhs),rhsMax = 4.00179889226138E-13 1.19252857858165E+00
>>> cg2d: Sum(rhs),rhsMax = 5.01432229071952E-13 1.18194571749093E+00
>>> cg2d: Sum(rhs),rhsMax = 6.03017635825154E-13 1.16776484246162E+00
>>> cg2d: Sum(rhs),rhsMax = 8.00970401115819E-13 1.15096777725923E+00
>>> sx8::tr_run.lsr> grep cg2d: output.5-10
>>> cg2d: Sum(rhs),rhsMax = 3.05810932132999E-13 1.19974475698064E+00
>>> cg2d: Sum(rhs),rhsMax = 3.99458244260131E-13 1.19252857858165E+00
>>> cg2d: Sum(rhs),rhsMax = 5.01820807130571E-13 1.18194571749093E+00
>>> cg2d: Sum(rhs),rhsMax = 6.02740080068997E-13 1.16776484246162E+00
>>> cg2d: Sum(rhs),rhsMax = 8.03301869467532E-13 1.15096777725923E+00
>>>
>>> Note that in both cases the rhsMax values are identical after the
>>> pickup, but the Sum(rhs) are not (subtraction of large numbers?);
>>> with aggressive optimization I am losing one digit of precision (3
>>> instead of 4, big deal). On eddy, both numbers are identical.
>>>
>>> Martin
>>>