[MITgcm-devel] SIZE.h matters on Columbia for CS510?
Dimitris Menemenlis
dmenemenlis at gmail.com
Fri Jun 27 12:47:33 EDT 2008
The w2_e2setup.F and W2_EXCH2_TOPOLOGY.h that we use have been around
for a long time and we have successfully used them on many occasions
before.
We have determined that the problem is most likely a compiler
optimization issue. In addition to running successfully on
270 CPUs (but failing on 54, 216, and 450), the code also runs
successfully when compiled with -O0 (no optimization). With -O0 we have
run successfully on both 54 and 450 CPUs.
When the model fails, randomly distributed spikes of up to +/- 200 m
appear in Eta during the second time step:
http://ecco2.jpl.nasa.gov/data1/cube/cube81/run_test/e2.pdf
Initial Eta does not contain these spikes:
http://ecco2.jpl.nasa.gov/data1/cube/cube81/run_test/e1.pdf
At first, the spikes appear only in ETAN. All other model prognostic
variables seem OK (we have looked at THETA and SALT and monitored
UVEL/VVEL).
The spikes are randomly distributed everywhere in the domain, i.e.,
they do not appear to be associated with edge effects of any sort.
Has anyone ever seen a similar problem? Could it be trouble with
global_sum or something like that? Does anyone have a suggestion as to
which individual files we could try compiling with -O0 to proceed?
Hong and Dimitris
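The global_sum suspicion can be illustrated with a small sketch. Floating-point addition is not associative, so a reduction whose grouping depends on the number of processes (or that a compiler is allowed to reassociate at higher optimization levels) can give different answers for different CPU counts. The function `global_sum` below is hypothetical and is not MITgcm's GLOBAL_SUM code; the input values are contrived to make the effect visible. In a real model such differences are normally at round-off level, so on their own they would not explain 200 m spikes.

```python
# Hypothetical illustration only: a "global sum" done as per-process partial
# sums followed by a sum over processes. This is NOT MITgcm's GLOBAL_SUM.

def global_sum(values, nprocs):
    """Split `values` into `nprocs` equal contiguous chunks, sum each chunk
    locally, then add the partial sums in process order."""
    n = len(values)
    if nprocs < 1 or n % nprocs != 0:
        raise ValueError("len(values) must be a positive multiple of nprocs")
    chunk = n // nprocs
    partials = []
    for p in range(nprocs):
        local = 0.0
        for v in values[p * chunk:(p + 1) * chunk]:
            local += v          # per-process accumulation
        partials.append(local)
    total = 0.0
    for s in partials:
        total += s              # reduction across processes
    return total


values = [1.0, 1e16, -1e16, 1.0]    # exact sum is 2.0
for nprocs in (1, 2, 4):
    print(nprocs, global_sum(values, nprocs))
# prints: 1 1.0 / 2 0.0 / 4 1.0
```

The same numbers summed with a different grouping give different results, which is why a bug that is sensitive to summation order can appear on some CPU counts and not others.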
On Jun 24, 2008, at 7:57 AM, Patrick Heimbach wrote:
>
> Hi Hong,
>
> afaik,
> the files w2_e2setup.F and W2_EXCH2_TOPOLOGY.h
> are dependent on your domain decomposition on the cubed sphere,
> i.e. if you change that decomposition in SIZE.h
> (which seems to be what you did), you need to regenerate
> these two files so that all tile and face neighbor information
> on the cube remains correct.
> The matlab script to do that is in
> utils/exch2/matlab-topology-generator/driver.m
>
> At least from your mail it sounds like you didn't do that,
> which would mean your problem is not a code version problem.
>
> Hope this helps
> -Patrick
>
>
>
> On Jun 23, 2008, at 7:13 PM, Hong Zhang wrote:
>
>> Dear all,
>> last time we reported a problem (quoted below):
>> ---------
>> Something has happened to the code between checkpoint59l and the current
>> head branch that makes it impossible to restart the CS510 code. Any clues
>> as to where we should look and which checkpoints to test?
>> The job crashes on the third time step with:
>>
>>> WARNING: r*FacC < hFacInf at 3 pts : bi,bj,Thid,Iter= 1 1 1 218
>>> e.g. at i,j= 65 85 ; rStarFac,H,eta = -1.237739 4.755480E+03 -1.064152E+04
>>> STOP in CALC_R_STAR : too SMALL rStarFacC !
>>>
>> ---------
>> We found that this problem is related to the configuration of SIZE.h
>> and w2_e2setup.F.
>> We tested s216t_85x85/SIZE.h_216, s1800t_17x51/SIZE.h_450, and
>> s216t_85x85/SIZE.h_54.
>> They all failed with the same error as above.
>> However, the s1350t_34x34/SIZE_270.h configuration works.
>> For s216t_85x85/SIZE.h_54 we also switched off optimization
>> (by setting FOPTIM = in the Makefile), but the problem remained.
>> We checked the output at the second time step
>> but did not find any obvious overlap problem.
>> Does anyone have any clue?
>>
>> hong
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>
> ---
> Patrick Heimbach | heimbach at mit.edu | http://www.mit.edu/~heimbach
> MIT | EAPS 54-1518 | 77 Massachusetts Ave | Cambridge MA 02139 USA
> FON +1-617-253-5259 | FAX +1-617-253-4464 | SKYPE patrick.heimbach
>
>
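The tile counts implied by the configuration names in this thread can be checked with a few lines of arithmetic. This sketch assumes that CS510 means six cube faces of 510 x 510 points and reads a name like s216t_85x85 as 216 tiles of 85 x 85 points (the tile size set by sNx and sNy in SIZE.h); the helper `tiles_for` is hypothetical. It only checks counts; as noted in the thread, the exch2 topology files themselves still have to be regenerated with utils/exch2/matlab-topology-generator/driver.m whenever the decomposition changes.

```python
# Tile-count arithmetic for the CS510 decompositions mentioned in the thread.
# Assumptions: CS510 = 6 cube faces of 510 x 510 points; "s<N>t_<nx>x<ny>"
# means N tiles of nx x ny points. tiles_for() is a hypothetical helper.

FACE_SIZE = 510   # assumed points along each face edge
N_FACES = 6


def tiles_for(snx, sny, face=FACE_SIZE):
    """Number of snx x sny tiles needed to cover all six cube faces."""
    if face % snx or face % sny:
        raise ValueError("tile size must divide the face size evenly")
    return N_FACES * (face // snx) * (face // sny)


configs = [
    # (directory name, sNx, sNy, number of processes)
    ("s216t_85x85", 85, 85, 54),
    ("s216t_85x85", 85, 85, 216),
    ("s1800t_17x51", 17, 51, 450),
    ("s1350t_34x34", 34, 34, 270),
]

for name, snx, sny, nprocs in configs:
    ntiles = tiles_for(snx, sny)
    if ntiles % nprocs:
        raise ValueError(f"{name}: {ntiles} tiles do not divide over {nprocs} procs")
    print(f"{name}: {ntiles} tiles, {nprocs} procs, {ntiles // nprocs} tiles/proc")
```

Each computed tile count matches the number in the directory name, which is consistent with the 510 x 510 face assumption.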