[MITgcm-devel] Beaufort experiment on mac os x

Thu Mar 15 11:00:42 EDT 2012

OK.

It'd be good to hear from Torge whether
 NOOPTFILES='ini_masks_etc.F'

fixes his segmentation-fault and out-of-memory problems.
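
For reference, a minimal sketch of how that entry might sit in a genmake2 build options file (optfile). The optimisation flag values here are assumptions chosen for illustration, not the contents of any particular optfile:

```shell
# Sketch of the relevant optfile lines; the flag values are assumed.
FOPTIM='-O3'                   # optimisation flags for most source files
NOOPTFLAGS='-O0'               # flags applied to the files in NOOPTFILES
NOOPTFILES='ini_masks_etc.F'   # build this file with NOOPTFLAGS, not FOPTIM
```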

Cheers

Dimitris Menemenlis

On Mar 15, 2012, at 7:54 AM, Jean-Michel Campin wrote:

> Hi,
> 
> I still disagree. There are probably other places where similar
> problems can appear, and leaving -O3 with an empty NOOPTFILES list
> just gives the impression that this optimisation level
> is safe and can be used without caution.
> Since ini_masks_etc.F is not an issue for CPU time, it
> is a perfect candidate for the NOOPTFILES list.
> This is my view. Let's see what others think.
> 
> Now, there is another alternative (one that I like even better):
> switch to -O2 for all subroutines whenever -O3 is not safe.
> 
> Cheers,
> Jean-Michel
> 
> On Thu, Mar 15, 2012 at 07:28:15AM -0700, Menemenlis, Dimitris (3248) wrote:
>> Jean-Michel, I kind of agree with Martin's suggestion.
>> We should strive to order the loops k-j-i whenever possible,
>> even if in this case it does not impact performance because ini_masks_etc is only called once.
>> One reason is that folks (e.g., me) tend to copy and re-use bits of code elsewhere,
>> so it's possible this j-i-k loop will one day end up somewhere else, where it does affect performance.
>> 
>> Cheers
>> 
>> Dimitris Menemenlis
>> 
>> On Mar 15, 2012, at 7:12 AM, Martin Losch wrote:
>> 
>>> Hi Jean-Michel et al.
>>> 
>>> I agree, this is a compiler bug, but on the other hand, moving the k-loop out of the i/j-loops would be preferable from a vectorization point of view (again, performance is not an issue in this routine, still ...). All forward tests pass with this change. The only backward test affected, by the same change applied additionally to update_masks_etc.F, did not pass even before I made the changes (NaNs in grdchk output).
>>> 
>>> Martin
>>> 
>>> On Mar 15, 2012, at 2:51 PM, Jean-Michel Campin wrote:
>>> 
>>>> Hi Martin,
>>>> 
>>>> Thanks for looking at this.
>>>> It's interesting to know where the optimisation breaks.
>>>> But I would prefer not to change this routine (if the compiler does
>>>> a wrong optimisation, this is the compiler's problem after all,
>>>> and I prefer to see this in the NOOPTFILES list).
>>>> 
>>>> Cheers,
>>>> Jean-Michel
>>>> 
>>>> On Thu, Mar 15, 2012 at 10:21:32AM +0100, Martin Losch wrote:
>>>>> Alternatively, we could replace
>>>>>      DO j=1-Oly,sNy+Oly
>>>>>       DO i=1-Olx,sNx+Olx
>>>>>        R_low(i,j,bi,bj) = rF(1)
>>>>>        DO k=Nr,1,-1
>>>>>         R_low(i,j,bi,bj) = R_low(i,j,bi,bj)
>>>>>   &                      - drF(k)*hFacC(i,j,k,bi,bj)
>>>>>        ENDDO
>>>>>       ENDDO
>>>>>      ENDDO
>>>>> 
>>>>> with
>>>>>      DO j=1-Oly,sNy+Oly
>>>>>       DO i=1-Olx,sNx+Olx
>>>>>        R_low(i,j,bi,bj) = rF(1)
>>>>>       ENDDO
>>>>>      ENDDO
>>>>>      DO k=Nr,1,-1
>>>>>       DO j=1-Oly,sNy+Oly
>>>>>        DO i=1-Olx,sNx+Olx
>>>>>         R_low(i,j,bi,bj) = R_low(i,j,bi,bj)
>>>>>   &                      - drF(k)*hFacC(i,j,k,bi,bj)
>>>>>        ENDDO
>>>>>       ENDDO
>>>>>      ENDDO
>>>>> That works too (and makes sense if you care about vectorization, although in this initialisation routine performance is not an issue).
>>>>> 
>>>>> M.