[MITgcm-devel] (not so) funny things happen in seaice_lsr and pickups

Martin Losch Martin.Losch at awi.de
Fri Feb 27 08:07:04 EST 2009


Hi all, but probably in particular Jean-Michel,

I have now found this on our SX8:

1. Restarts that work elsewhere (e.g. lab_sea on eddy.csail.mit.edu)
do not work. I have no idea why. It is not connected with a particular
package: the restart is broken even for experiments where data.pkg has
no entries. This is clearly an issue related to the SX8, as the
restart behavior is regularly tested. I am still looking for the
precise reason, but at the moment I am clueless. Suggestions are
welcome.

2. "spontaneous" explosions happen in the C-LSR solver, but so far not  
in the B-LSR or C-EVP solver. I am not sure to what extent this is  
just coincidence. Currently this happens in a 1cpu-2deg-lat-lon  
configuration, a 2cpu Arctic configuration with a rotated lat-lon grid  
and .25deg resolution and with OBCS, and regional 0.5deg resolution  
for the Weddell Sea (so far without OBCS). I have run the CS510 for
16 years without problems, and I have also run the above Arctic
configuration with a curvilinear grid (basically the grid is the same,
but the metric terms in the ice model are not there) without any
problems. It "looks" like it's connected to the lat-lon grid (and thus  
metric terms?).

3. C-LSR (and B-LSR) is basically a set of iterations. At the beginning,
the first timelevel velocity is copied to the third:
uice(i,j,3,bi,bj)=uice(i,j,1,bi,bj), then later we compute an innovation like
this:
u(1) = u(3) + .95*(URT-u(3)).
and at the end of each iteration there is an  
exch_uv_3d_rl(uice,vice,.true.,3,mythid).
All of these computations happen within j=1,sNy; i=1,sNx (but
partially in separate loops). u(3) is never used outside of
seaice_lsr.F (lsr.F, except in some obsolete and never used
ice/ocean-stress computation). I have made a change so that uice(3)=uice(1) is
now done for the entire array: j=1-Oly,sNy+Oly; i=1-Olx,sNx+Olx, that  
is including the overlaps. These overlaps of u(3) (and v(3)) are never  
touched elsewhere, except in the exchange routines. After this change  
(copy of u/v(1) to u/v(3), including overlaps), the results should not  
change; they do not change on, say, eddy.csail.mit.edu, but they do
change on our SX8. In some cases the "spontaneous" explosions go away,  
in others they are "delayed" by order(1000) timesteps.
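
A minimal sketch of why the extended copy should be neutral, assuming (as
described above) that u(3) is only ever read pointwise at interior points.
Everything here is a toy stand-in and not the seaice_lsr.F code: the 1-D
periodic tile, the names exchange and relax, the sizes OLX and SNX, and the
formula used for URT are all hypothetical.

```python
OLX = 2   # overlap width (hypothetical value)
SNX = 8   # interior points in the tile (hypothetical value)
INTERIOR = range(OLX, OLX + SNX)


def exchange(field):
    """Fill both overlaps of a 1-D periodic tile from its interior."""
    for i in range(OLX):
        field[i] = field[i + SNX]              # left overlap
        field[OLX + SNX + i] = field[OLX + i]  # right overlap


def relax(u_start, forcing, n_iter, copy_overlaps):
    """Run n_iter under-relaxed iterations and return time level 1."""
    u1 = list(u_start)
    exchange(u1)
    u3 = [float("nan")] * len(u1)  # NaN marks values that were never set
    for _ in range(n_iter):
        # Copy level 1 to level 3, with or without the overlaps.
        points = range(len(u1)) if copy_overlaps else INTERIOR
        for i in points:
            u3[i] = u1[i]
        # Toy stand-in for URT; it reads neighbours of level 1 only.
        urt = {i: 0.25 * (u1[i - 1] + u1[i + 1]) + forcing[i] for i in INTERIOR}
        # Pointwise update from the text: u(1) = u(3) + .95*(URT-u(3)).
        for i in INTERIOR:
            u1[i] = u3[i] + 0.95 * (urt[i] - u3[i])
        # Exchange both levels at the end of the iteration.
        exchange(u1)
        exchange(u3)
    return u1


n = SNX + 2 * OLX
u0 = [float(i % 3) for i in range(n)]
f = [0.1 * i for i in range(n)]
same = relax(u0, f, 50, copy_overlaps=False) == relax(u0, f, 50, copy_overlaps=True)
print("interior-only copy and full copy agree:", same)
```

In this toy setting the two variants perform identical arithmetic on every
point that is read, so any difference seen on a real machine would have to
come from somewhere else, for example from what the exchange leaves in the
overlaps.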

My preliminary conclusion is that the problems with seaice_lsr and
pickups are actually connected. The only thing that can go wrong in
the pickups is that something fishy is happening in the exchanges.
The other option is that it is somehow connected to the metric terms in the
ice model, which I find hard to believe; it would not explain the
restart problem.
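
For reference, the restart check itself can be sketched as follows. This is a
toy model only, under stated assumptions: step is a logistic map standing in
for one model time step, and write_pickup/read_pickup are hypothetical helpers
that store the state as raw 8-byte doubles. None of these names come from
MITgcm.

```python
import struct


def step(state):
    """One time step of a toy chaotic model (logistic map)."""
    return [3.9 * x * (1.0 - x) for x in state]


def run(state, n_steps):
    """Advance the state by n_steps time steps."""
    for _ in range(n_steps):
        state = step(state)
    return state


def write_pickup(state):
    """Serialise the state as raw little-endian doubles (no bits lost)."""
    return struct.pack(f"<{len(state)}d", *state)


def read_pickup(blob):
    """Inverse of write_pickup."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


def restart_is_exact(initial, n_steps):
    """Compare one run of 2*n_steps with n_steps + pickup + n_steps."""
    straight = run(initial, 2 * n_steps)
    first_half = run(initial, n_steps)
    restarted = run(read_pickup(write_pickup(first_half)), n_steps)
    return straight == restarted


print("restart exact:", restart_is_exact([0.1, 0.2, 0.3], 2160))
```

Because the toy model is chaotic, a pickup that lost even the last bit of the
state would generally make the restarted run drift away from the straight one,
which is why a bitwise comparison is a sensitive test of the pickup and
exchange path.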

What should I try next to figure out this problem?

Martin
cc to Olaf Klatt


On Feb 19, 2009, at 8:57 AM, Martin Losch wrote:

> Hi Jinlun and Matt, thanks for your comments,
>
> I did comparison runs with the B-grid code and with EVP and in the  
> particular instances I am interested in, they do not crash. That's a  
> bit discomforting for me, but on the other hand, I do not use the
> B-grid or EVP code too often, so I don't have an appropriate
> statistical sample (again, in nearly all cases the C-LSR code is
> absolutely stable, and Dimitris does all his CS510 runs with C-LSR).
>
> Matt, the original seaice_growth.F has lots of these
>>                 HEFF(I,J,2,bi,bj)  = MAX(0. _d 0, HEFF(I,J,2,bi,bj)  )
>>                 HSNOW(I,J,bi,bj)   = MAX(0. _d 0, HSNOW(I,J,bi,bj)   )
>>                 AREA(I,J,2,bi,bj)  = MAX(0. _d 0, AREA(I,J,2,bi,bj)  )
> as well, but we will try this also. I don't think that the  
> thermodynamic growth is the problem, it's more likely that changing  
> anything in the sea ice model makes the model not crash at a  
> particular point (e.g., interrupting and restarting an integration
> from a pickup rather than doing everything in one go, in this sense  
> changing from C to B-grid is a change, too, and not a small one),  
> but I guess, if we have some funny HEFF etc, the LSR solver might  
> get into trouble, too.
> So I'll try this.
>
> Martin
>
> On Feb 18, 2009, at 5:42 PM, Jinlun Zhang wrote:
>
>> Martin,
>> Have you tried LSR on B-grid with the bug fixed, just for a  
>> comparison?
>> Good luck, Jinlun
>>
>> Martin Losch wrote:
>>> Hi all,
>>>
>>> just to let you know that we are experiencing problems with the  
>>> LSR sea ice solver on the C-grid: At unpredictable points of the  
>>> integration, it appears to become unstable and blows up. I have
>>> not been able to isolate this in all cases, because a small issue  
>>> with pickups hampers this:
>>>
>>> Apparently, starting from pickup is NOT exact. We have tried the  
>>> famous 2+2=4 test with our 8CPU job on our SX8 (cc to Olaf, who's  
>>> been mostly involved in this) and found no difference between the  
>>> cg2d output (and other output). However, when we run an experiment  
>>> for a longer time, the same test fails, e.g., 2160+2160 != 4320  
>>> (we can provide plots if required). I assume that this is  
>>> expected, because double precision is not more than double
>>> precision and in the cg2d output (and other monitor output) there
>>> are always only 15 digits, and we don't know about the 16th one,
>>> correct? Anyway, this tiny pickup issue hinders me from
>>> approaching the point of model crash with pickups, because after
>>> starting from a pickup, the model integrates beyond the problem and
>>> crashes (sometimes) at a much later time. This is to say that the
>>> problem in seaice_lsr (the problem only appears when the C-LSR
>>> solver is used) is very sensitive; the code crashes without any
>>> warning from one time step to the next. A while ago, in a
>>> different case I was able to get close enough to the point of
>>> crashing to do some diagnostics, but it's almost impossible to
>>> identify why the model explodes. I am assuming that for random
>>> pathological cases one or more matrix entries are nearly zero,  
>>> which then prevents the solver from converging.
>>>
>>> Any comments? Any similar experience?
>>>
>>> I run this code in so many different configurations, and I have  
>>> these problems only very seldom/randomly, so I am a little at a  
>>> loss where I should continue looking, so any hint is appreciated.
>>>
>>> Martin
>>>
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>
