[MITgcm-devel] Re: [TT#: 5042674] possible compiler optimization problem

Hong Zhang hong.zhang at caltech.edu
Fri Jun 27 20:36:11 EDT 2008


Hi Art,
The code fails quickly, in fact at the second time step.
We ran experiments (with both the new code and the old code)
with optimization turned off (FOPTIM = -O0),
and they ran OK, though quite slowly.
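
For reference, the debugging build Art suggests below could be set up
roughly as follows. This is only a sketch: the switches are the Intel
ones named in this thread, but the helper function intel_flags and its
mode names are hypothetical, and putting the debugging switches into
FOPTIM next to -O0 is an assumption about how the build options file is
organized.

```shell
#!/bin/sh
# Hypothetical helper: print the compiler switches for a build mode.
# The switches are the ones named in this thread; the function name and
# the mode names "noopt" and "debug" are made up for illustration.
intel_flags() {
    case "$1" in
        noopt) echo "-O0" ;;
        debug) echo "-O0 -g -traceback -check -gen-interfaces" ;;
        *)     echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Assumption: the debugging switches go into FOPTIM alongside -O0.
FOPTIM=$(intel_flags debug)
echo "FOPTIM='$FOPTIM'"
```

Keeping the two modes side by side makes it easy to rebuild first with
plain -O0 and then with the extra checks, so that a failure seen only in
the second build can be traced to the run-time checks themselves.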

hong
NAS Support wrote:
> Hi Dimitris, Hi Hong,
>
> Is the code failing the same way in both circumstances? Does it fail quickly 
> or after a longer run time?
>
> Might be time for a "sanity check", especially with a "new code". Try -O0 and
> -check (no space) just in case a subscript error crept into the code. You can
> also add -g -traceback and -gen-interfaces. This last switch can help find
> mismatches in argument lists.
>
> Repeatability is hard to do when more than one processor is involved. Events 
> do not happen at the same time nor in the same order from one run to the 
> next, so the whole world of parallel programming is non-deterministic. For 
> example, an app might be allocated CPUs at different locations and distances 
> within the system. Contention for access to /nobackup2a between users of 7-8 
> compute nodes, cfe2 and lou2 also occurs and this contention introduces an 
> even larger measure of non-repeatability.
>
> Art
>
>
> Hong Zhang <hong.zhang at caltech.edu> wrote:
>
>   
>> Hi, Art,
>> We did two more tests. Unfortunately both failed with the same
>> error.
>> In the first, we used the old compiler (intel-comp.10.0.026)
>> to compile the new code (cube81, see
>> /nobackup2a/menemenl/cube81/MITgcm/build_test/ and
>> /nobackup2a/menemenl/cube81/MITgcm/run_test).
>> In the second, we used the same old compiler (intel-comp.10.0.026)
>> to compile an old version of the code that previously ran OK (cube79,
>> /nobackup2a/menemenl/cube79/MITgcm/build/ and
>> /nobackup2a/menemenl/cube79/MITgcm/run).
>> But this time it failed (see
>> /nobackup2a/menemenl/cube79/MITgcm/build_test/ and
>> /nobackup2a/menemenl/cube79/MITgcm/run_test).
>> It's very strange. Why is it not repeatable?
>>
>> Do you have any idea?
>> thanks,
>>
>>
>>
>>
>>
>> NAS Support wrote:
>>     
>>> Hi Dimitris, Hi Hong,
>>>
>>> yes, you can use the module intel-comp.10.0.026 instead of the v10.1 module.
>>> You can switch from one to the other:
>>>
>>> module switch intel-comp.10.1.013 intel-comp.10.0.026
>>>
>>> or
>>> module purge
>>> and then
>>> module load intel-comp.10.0.026
>>>
>>> and also the MPT and SCSL modules.
>>>
>>> Any chance I can get a test case to submit a bug report to Intel?
>>>
>>> Art
>>>
>>>
>>> Hong Zhang <hong.zhang at caltech.edu> wrote:
>>>
>>>   
>>>       
>>>> Hi, thanks for your suggestion.
>>>> We tried the new compiler, i.e., version
>>>>
>>>> intel-comp.10.1.015
>>>>
>>>> but it has the same problem.
>>>>
>>>> Our previous runs show that the compiler
>>>>
>>>> intel-comp.10.0.026
>>>>
>>>> works.
>>>> Is this older version still available?
>>>> Can you help us get access to this compiler?
>>>>
>>>> thanks,
>>>> hong
>>>>
>>>>
>>>> NAS Support wrote:
>>>>     
>>>>         
>>>>> Hi Dimitris,
>>>>>
>>>>> right... no change to the other modules to do this.
>>>>>
>>>>> Art
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Dimitris Menemenlis <dmenemenlis at gmail.com> wrote:
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>>>> Hi, we are having a possible compiler problem with the ECCO2 code. A
>>>>>> configuration that used to run without trouble a few months ago now
>>>>>> fails on the second time step unless we 1) integrate with 270 CPUs (216
>>>>>> and 450 fail), or 2) compile with the -O0 option.
>>>>>>
>>>>>> The compiler, etc., that we think we are using is
>>>>>> module load modules scsl.1.6.1.0 intel-comp.10.1.013 mpt.1.16.0.0 pd-netcdf.3.6.0-p1
>>>>>>
>>>>>> Any known bugs with above version?  Should we try a different one?
>>>>>>
>>>>>> Dimitris
>>>>>>
>>>>>> Dimitris Menemenlis <menemenlis at jpl.nasa.gov>
>>>>>> Jet Propulsion Lab, California Institute of Technology
>>>>>> MS 300-323, 4800 Oak Grove Dr, Pasadena CA 91109-8099, USA
>>>>>> tel: 818-354-1656;  cell: 818-625-6498;  fax: 818-393-6720
>>>>>>
>>>>>>
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> The current ticket state is: "open"

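On the repeatability question raised in the thread, one simple check is
to compare the text output of two runs and see where they first
diverge. The sketch below uses only the standard cmp and diff tools; the
helper name compare_runs and the two sample logs are made up for
illustration and are not real model output.

```shell
#!/bin/sh
# Hypothetical helper (not part of MITgcm): compare the text output of
# two runs and report where they first diverge.
compare_runs() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        # First hunk header of diff, e.g. "2c2" means line 2 was changed
        hunk=$(diff "$1" "$2" | head -n 1) || true
        echo "differ at $hunk"
    fi
}

# Demo with two made-up logs standing in for real run output
tmp=$(mktemp -d)
printf 'step 1 ok\nstep 2 ok\n'  > "$tmp/run_a.log"
printf 'step 1 ok\nstep 2 NaN\n' > "$tmp/run_b.log"
compare_runs "$tmp/run_a.log" "$tmp/run_a.log"   # prints "identical"
compare_runs "$tmp/run_a.log" "$tmp/run_b.log"   # prints "differ at 2c2"
rm -rf "$tmp"
```

If two runs of the same executable on the same number of CPUs already
differ before the failing time step, that points toward run-to-run
variation rather than toward the compiler version.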


