[MITgcm-devel] (not so) funny things happen in seaice_lsr and pickups

Fri Feb 27 12:16:36 EST 2009

Hi Martin,
You might want to use just 1 CPU to test all the grids so you would not 
have edge (exchange) problems. Also, if you suspect trouble with metric 
terms, we have a paper that lists all the metric terms: Zhang, J., and 
D.A. Rothrock: Modeling global sea ice with a thickness and enthalpy 
distribution model in generalized curvilinear coordinates 
<http://psc.apl.washington.edu/zhang/Pubs/POIM.pdf>, Mon. Wea. Rev., 
131(5), 681–697, 2003.
Jinlun

Martin Losch wrote:
> Hi all, but probably in particular Jean-Michel,
>
> I have now found this on our SX8:
>
> 1. restarts that work elsewhere (e.g. lab_sea on eddy.csail.mit.edu) 
> do not work. I have no idea why. It is not connected with a particular 
> package: the restart is also broken for experiments where data.pkg has 
> no entries. This is clearly an issue related to the SX8, as the 
> restart behavior is regularly tested. I am still looking for the 
> precise reason, but at the moment I am clueless. Suggestions are welcome.
>
> 2. "spontaneous" explosions happen in the C-LSR solver, but so far not 
> in the B-LSR or C-EVP solver. I am not sure to what extent this is 
> just coincidence. Currently this happens in a 1cpu-2deg-lat-lon 
> configuration, a 2cpu Arctic configuration with a rotated lat-lon grid 
> and .25deg resolution and with OBCS, and regional 0.5deg resolution 
> for the Weddell Sea (so far without OBCS). I have run the CS510 for 
> 16 years without problems, and I have also run the above Arctic 
> configuration with a curvilinear grid (basically the grid is the same, 
> but the metric terms in the ice model are not there) without any 
> problems. It "looks" like it's connected to the lat-lon grid (and thus 
> metric terms?).
>
> 3. C-LSR (and B-LSR) is basically a set of iterations. At the beginning, 
> the first timelevel velocity is copied to the third: 
> uice(i,j,3,bi,bj)=uice(i,j,1,bi,bj), then later we compute an 
> innovation like this:
> u(1) = u(3) + .95*(URT-u(3)).
> and at the end of each iteration there is an 
> exch_uv_3d_rl(uice,vice,.true.,3,mythid).
> All of these computations happen within j=1,sNy; i=1,sNx (but 
> partially in separate loops). u(3) is never used outside of 
> seaice_lsr.F (lsr.F, except in some obsolete and never used 
> ice/ocean-stress computation). I have made a change so that 
> uice(3)=uice(1) is now done for the entire array: j=1-Oly,sNy+Oly; 
> i=1-Olx,sNx+Olx, that is including the overlaps. These overlaps of 
> u(3) (and v(3)) are never touched elsewhere, except in the exchange 
> routines. After this change (copy of u/v(1) to u/v(3), including 
> overlaps), the results should not change; they do not change on, say, 
> eddy.csail.mit.edu, but they do change on our SX8. In some cases the 
> "spontaneous" explosions go away, in others they are "delayed" by 
> order(1000) timesteps.
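
The expectation in point 3 (copying u(1) to u(3) with or without the overlaps should give identical results, as long as the overlaps of u(3) are never read) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the seaice_lsr.F code: the 1-D periodic tile, the names `exchange` and `sweeps`, the sizes `OLX` and `SNX`, and the placeholder used in place of the line-relaxation solve are all hypothetical, and the copy is repeated in every sweep for simplicity.

```python
OLX, SNX = 2, 6                       # hypothetical overlap width and interior size
INNER = range(OLX, OLX + SNX)         # interior indices of the tile


def exchange(a):
    """Fill both overlaps of a periodic 1-D tile from its own interior."""
    a[:OLX] = a[SNX:SNX + OLX]        # left overlap <- last interior points
    a[SNX + OLX:] = a[OLX:2 * OLX]    # right overlap <- first interior points
    return a


def sweeps(u_start, copy_overlaps, n_iter=5, w=0.95):
    """Run a few under-relaxed sweeps, u1 = u3 + w*(urt - u3), on the interior."""
    u1 = exchange(list(u_start))
    u3 = [1.0e30] * len(u1)           # stale values standing in for untouched overlaps
    for _ in range(n_iter):
        if copy_overlaps:
            u3[:] = u1                # copy the whole array, overlaps included
        else:
            for i in INNER:           # copy the interior only
                u3[i] = u1[i]
        for i in INNER:
            urt = 0.5 * u3[i] + 1.0   # placeholder for the solver result (assumption)
            u1[i] = u3[i] + w * (urt - u3[i])
        exchange(u1)                  # overlaps of u1 are refreshed after each sweep
    return u1


u0 = [0.0] * (SNX + 2 * OLX)
for i in INNER:
    u0[i] = 0.1 * i

interior_only = sweeps(u0, copy_overlaps=False)
with_overlaps = sweeps(u0, copy_overlaps=True)
print(interior_only == with_overlaps)
```

In this toy model the two variants agree exactly because the update reads u3 only at interior points. If the results differ on a real machine, that would suggest that the overlaps of u(3) are being read somewhere, or that the exchange itself behaves differently there.
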
>
> My preliminary conclusion is that the problems with seaice_lsr and 
> pickups are actually connected. The only thing that can go wrong in 
> the pickups is that something fishy is happening in the exchanges. 
> The other option is that it is somehow connected to metric terms in the 
> ice model, which I find hard to believe; it would not explain the 
> restart problem.
>
> What should I try next to figure out this problem?
>
> Martin
> cc to Olaf Klatt
>
>
> On Feb 19, 2009, at 8:57 AM, Martin Losch wrote:
>
>> Hi Jinlun and Matt, thanks for your comments,
>>
>> I did comparison runs with the B-grid code and with EVP and in the 
>> particular instances I am interested in, they do not crash. That's a 
>> bit discomforting for me, but on the other hand, I do not use the 
>> B-grid or EVP code too often, so I don't have an appropriate 
>> statistical sample (again, in nearly all cases the C-LSR code is 
>> absolutely stable, and Dimitris does all his CS510 runs with C-LSR).
>>
>> Matt, the original seaice_growth.F has lots of these
>>> HEFF(I,J,2,bi,bj) = MAX(0. _d 0, HEFF(I,J,2,bi,bj) )
>>> HSNOW(I,J,bi,bj) = MAX(0. _d 0, HSNOW(I,J,bi,bj) )
>>> AREA(I,J,2,bi,bj) = MAX(0. _d 0, AREA(I,J,2,bi,bj) )
>> as well, but we will try this also. I don't think that the 
>> thermodynamic growth is the problem, it's more likely that changing 
>> anything in the sea ice model makes the model not crash at a 
>> particular point (e.g., interrupting and restarting an integration 
>> from a pickup rather than doing everything in one go; in this sense 
>> changing from C to B-grid is a change, too, and not a small one), but 
>> I guess, if we have some funny HEFF etc, the LSR solver might get 
>> into trouble, too.
>> So I'll try this.
>>
>> Martin
>>
>> On Feb 18, 2009, at 5:42 PM, Jinlun Zhang wrote:
>>
>>> Martin,
>>> Have you tried LSR on B-grid with the bug fixed, just for a comparison?
>>> Good luck, Jinlun
>>>
>>> Martin Losch wrote:
>>>> Hi all,
>>>>
>>>> just to let you know that we are experiencing problems with the LSR 
>>>> sea ice solver on the C-grid: At unpredictable points of the 
>>>> integration, it appears to become unstable and blows up. I have not 
>>>> been able to isolate this in all cases, because a small issue with 
>>>> pickups hampers this:
>>>>
>>>> Apparently, starting from pickup is NOT exact. We have tried the 
>>>> famous 2+2=4 test with our 8CPU job on our SX8 (cc to Olaf, who's 
>>>> been mostly involved in this) and found no difference between the 
>>>> cg2d output (and other output). However, when we run an experiment 
>>>> for a longer time, the same test fails, e.g., 2160+2160 != 4320 (we 
>>>> can provide plots if required). I assume that this is expected, 
>>>> because double precision is not more than double precision and in 
>>>> the cg2d output (and other monitor output) there are always only 15 
>>>> digits, and we don't know about the 16th one, correct? Anyway, this 
>>>> tiny pickup issue hinders me from approaching the point of model 
>>>> crash with pickups, because after starting from a pickup, the model 
>>>> integrates beyond the problem and crashes (sometimes) at a much 
>>>> later time. That is to say, the problem in seaice_lsr (the 
>>>> problem only appears when the C-LSR solver is used) is very sensitive; 
>>>> the code crashes without any warning from one time step to the 
>>>> other. A while ago, in a different case, I was able to get close 
>>>> enough to the point of crashing to do some diagnostics, but it's almost 
>>>> impossible to identify why the model explodes. I am assuming that 
>>>> for random pathological cases one or more matrix entries are nearly 
>>>> zero, which then prevents the solver from converging.
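
The remark about 15 printed digits can be checked with a short sketch. It only shows that two double-precision values can differ in their last bits and still print identically at 15 significant digits, so matching monitor output does not prove a bitwise-identical restart. The values are arbitrary and have nothing to do with the actual cg2d output.

```python
# Two doubles that differ only in the last bits of the mantissa.
x = 0.1 + 0.2
y = 0.3

# Printed with 15 significant digits (as in the monitor output described
# above), the two values are indistinguishable.
same_text = format(x, ".15g") == format(y, ".15g")

# Compared exactly, they are not the same number.
bitwise_equal = x.hex() == y.hex()

print(same_text, bitwise_equal)
```

A stricter version of the 2+2=4 test would therefore compare the binary output or pickup files byte for byte rather than the printed monitor values.
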
>>>>
>>>> Any comments? Any similar experience?
>>>>
>>>> I run this code in so many different configurations, and I have 
>>>> these problems only very seldom/randomly, so I am a little at a 
>>>> loss where I should continue looking, so any hint is appreciated.
>>>>
>>>> Martin
>>>>
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel

-- 

Jinlun Zhang
Polar Science Center, Applied Physics Laboratory
University of Washington, 1013 NE 40th St, Seattle, WA 98105-6698

Phone: (206)-543-5569; Fax: (206)-616-3142
zhang at apl.washington.edu
http://psc.apl.washington.edu/pscweb2002/Staff/zhang/zhang.html