[MITgcm-devel] Identical executable producing diverging results

An T Nguyen An.T.Nguyen at jpl.nasa.gov
Tue Dec 29 17:48:59 EST 2009


Hi David,

Yes, the forcing files are unchanged.  I uploaded them on May 7th.  The original runs were on June 
18, and the repeats in December.  Also, all initial and boundary condition files have the same date as the 
first output files.
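
File dates only show when a file was last written, so a byte-level comparison is a more direct check that the inputs are unchanged. Here is a minimal sketch in Python, assuming the two sets of files sit in two separate directories; the function names are made up for illustration and are not part of MITgcm:

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(dir_a, dir_b):
    """Names of regular files in dir_a that are missing from, or differ in, dir_b."""
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    changed = []
    for path_a in sorted(p for p in dir_a.iterdir() if p.is_file()):
        path_b = dir_b / path_a.name
        if not path_b.is_file() or file_sha256(path_a) != file_sha256(path_b):
            changed.append(path_a.name)
    return changed
```

An empty list from differing_files(original_dir, repeat_dir) would mean that every file in the first directory has a byte-identical copy in the second.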

It seems that the divergence only happens when we start from iteration 0 (Jan 1, 1992).  When I 
pick up from Jan 1, 1993 (the first available pickup file, produced on June 19th), I am able to 
reproduce the results exactly.
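
To pin down where two runs first part ways, the monitor statistics in the two STDOUT files can be compared entry by entry. The sketch below assumes the monitor lines have the form "%MON name = value"; the regular expression and the function names are my own illustration, not an MITgcm tool:

```python
import re

# Assumed line format: "... %MON exf_hflux_max = 1.2345E+02"
MON_LINE = re.compile(r"%MON\s+(\S+)\s*=\s*(\S+)")


def monitor_values(lines):
    """Extract (name, value) pairs from monitor lines, in order of appearance."""
    pairs = []
    for line in lines:
        match = MON_LINE.search(line)
        if match:
            try:
                pairs.append((match.group(1), float(match.group(2))))
            except ValueError:
                pass  # skip entries whose value is not a number
    return pairs


def first_divergence(lines_a, lines_b, rel_tol=0.0):
    """Return (index, name, value_a, value_b) for the first monitor entry
    whose name differs or whose relative difference exceeds rel_tol,
    or None if no such entry is found."""
    pairs = zip(monitor_values(lines_a), monitor_values(lines_b))
    for i, ((name_a, a), (name_b, b)) in enumerate(pairs):
        scale = max(abs(a), abs(b))
        if name_a != name_b or (scale > 0 and abs(a - b) / scale > rel_tol):
            return i, name_a, a, b
    return None
```

With open file objects this would be called as first_divergence(open(file_a), open(file_b)). A small non-zero rel_tol skips last-digit differences while still flagging a change of several percent.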

An

David Ferreira wrote:
> An,
> One possibility is that you are not using the same modules as when the 
> first run was made.
> It is possible to run an executable with modules other than those used 
> to compile; there
> will be no error message. The mpt modules are the problem: they do change 
> the results.
> (the ifort modules don't affect the results, which according to JMC 
> makes sense
> because ifort uses static linking by default).
> 
> That said, it is a bit weird that the precipitation is changed: this 
> field is read from a file and
> there is no computation, is there? Maybe just an interpolation? Anyway, it 
> changes by as much
> as 10%. A change in library would only change a digit here and there.
> Are you sure the forcing files haven't been modified?
> 
> david
> 
> 
> 
> Nguyen, An T (3248-Affiliate) wrote:
>> Hi Dimitris,
>>
>> Yes I still have the original Makefile, although it's in your directory and I don't have permission to read:
>> cfe2:/nobackup2a/menemenl/arctic/arctic_9km/MITgcm/bin/Makefile
>>
>> I have done a little bit more testing.  It seems that if we start from niter=0 the solutions diverge.  However, if I start from a pick-up file (I've done this for 5 available pick-up files in cfe2:/nobackup2a/menemenl/arctic/arctic_9km/MITgcm/exe_firstrun_1992_2009/ including the one that starts on 01-Jan-1993), I can reproduce the results exactly.
>>
>> Can you send a message to Art Lazanoff, because I don't know which flags we used to compile the executable (permission issue)?
>>
>> An
>>
>> ________________________________________
>> From: Dimitris Menemenlis [menemenlis at jpl.nasa.gov]
>> Sent: Saturday, December 26, 2009 3:40 AM
>> To: MITgcm-devel at mitgcm.org
>> Subject: Re: [MITgcm-devel] Identical executable producing diverging results
>>
>> An, do you remember what flags you used to compile the executable?
>> That is, do you still have access to the Makefile?
>> We could send these flags to NAS support along with your question.
>>
>> I don't think that -O3 guarantees that the order of operations will always
>> be the same under different machine load conditions or
>> configurations (but some modifiers, e.g., -mp, might guarantee
>> repeatability).
>>
>> If you do send your question to NAS support, could you cc me on it?
>> Or, if you want, you or I can send it directly to Art Lazanoff, since he is
>> already familiar with MITgcm and has been helping Hong and
>> me with compiler flags, etc.
>>
>> The crash is worrisome.  It may be happening for the same reason
>> that the 4-km integration crashes, but I have yet to diagnose that.
>>
>> D.
>>
>> On Dec 25, 2009, at 1:36 PM, Nguyen, An T (3248-Affiliate) wrote:
>>
>>   
>>> hi Patrick,
>>>
>>> The 2 STDOUT files I sent as attachments both start at the same niter0 = 0.  The STDOUT.0000 in /nobackup2a/menemenl/arctic/arctic_9km/MITGCM/exe_firstrun_1992_2009/ is from a pick-up time and is not what I'm using for comparison here.  I used stdout.216096 in that directory to compare with my current run (i.e., head -4000 stdout.216096 > STDOUT.0000_Jun09a).
>>>
>>> An
>>> ________________________________________
>>> From: Patrick Heimbach [heimbach at MIT.EDU]
>>> Sent: Friday, December 25, 2009 2:22 AM
>>> To: MITgcm-devel at mitgcm.org
>>> Subject: Re: [MITgcm-devel] Identical executable producing diverging results
>>>
>>> Hi An,
>>>
>>> I diff-ed the two STDOUT.0000
>>> and it appears that one setup starts from
>>>   niter0=0,
>>> the other one from
>>>   niter0=601824,
>>>
>>> That probably explains your problem.
>>>
>>> -Patrick
>>>
>>>
>>>
>>> On Dec 23, 2009, at 9:35 PM, An T Nguyen wrote:
>>>
>>>     
>>>> Hi all,
>>>>
>>>> I wasn't sure where I should send this email, but I'm hoping
>>>> someone could help me with this issue:
>>>>
>>>> I ran 2 experiments (one @ 18km and one @ 9km resolution) back in
>>>> June 2009, and am now repeating
>>>> exactly the same experiments.  The results are diverging, and in 1 case
>>>> the model crashes in 2002 while the
>>>> original experiment ran to 2009 without any problem.  The
>>>> divergence occurs in both the 18km and 9km
>>>> runs, and the first place it appears is in exf_hflux_max/min/mean/
>>>> sd/del2 and
>>>> exf_sflux_max/min/mean/sd/del2 at time-step 0 in the STDOUT file.
>>>>
>>>> I include here the beginning of two 9-km STDOUT files, one from the
>>>> original run back in June, and one
>>>> from this month's run.  All input files are identical.  Forcing files
>>>> and executables (mitgcmuv) are also identical.  The locations of
>>>> these 2 runs are:
>>>>
>>>> Original:
>>>> cfe2:/nobackup2a/menemenl/arctic/arctic_9km/MITgcm/
>>>> exe_firstrun_1992_2009/
>>>>
>>>> Repeat:
>>>> cfe2:/nobackup1/atnguyen/arctic/MITgcm/exe_firstrun_1992_2009
>>>>
>>>> The 18km runs also differ at the same place.  Could this be due to
>>>> some changes on the Columbia side?
>>>>
>>>> I also notice that if I repeat the same experiments from pick-up
>>>> files that were generated after Jun 27, 2009, then I get exactly the same
>>>> results.  So it seems that results generated prior to approximately
>>>> this date cannot be repeated.
>>>>
>>>> Thanks,
>>>> An
>>>>
>>>>       
>>> ---
>>> Patrick Heimbach | heimbach at mit.edu | http://www.mit.edu/~heimbach
>>> MIT | EAPS 54-1518 | 77 Massachusetts Ave | Cambridge MA 02139 USA
>>> FON +1-617-253-5259 | FAX +1-617-253-4464 | SKYPE patrick.heimbach
>>>
>>>
>>>
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>>>
>>
> 
> 


-- 
An T. Nguyen, Ph.D. <atn at jpl.nasa.gov>
Jet Propulsion Lab, California Institute of Technology
MS 300-323, 4800 Oak Grove Dr, Pasadena CA 91109-8099
tel: 818-354-4122; fax: 818-393-6720
