[MITgcm-devel] Beaufort experiment on mac os x

Thu Mar 15 10:28:15 EDT 2012

Jean-Michel, I kind of agree with Martin's suggestion.
We should strive to order the loops k-j-i whenever possible,
even if in this case it does not impact performance because ini_masks_etc is only called once.
One reason is that folks (e.g., me) tend to copy and re-use bits of code elsewhere,
so it's possible this j-i-k loop will one day end up somewhere else, where it does affect performance.
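
To illustrate the ordering point, here is a minimal standalone sketch. It is not the code from ini_masks_etc.F: the array sizes, the fill values, and the names rLowJIK, rLowKJI and rF1 are made up for the example, and the overlaps and the bi,bj tile indices are dropped. Fortran stores arrays column-major, so with k innermost each access to hFacC(i,j,k) jumps sNx*sNy elements, whereas with i innermost the accesses are contiguous. Both orderings subtract the same terms in the same k order for each (i,j), so the results should match.

```fortran
! Standalone sketch (free-form Fortran). Sizes and values are hypothetical.
program loop_order
  implicit none
  integer, parameter :: sNx = 8, sNy = 6, Nr = 4
  integer :: i, j, k
  double precision :: drF(Nr), rF1
  double precision :: hFacC(sNx,sNy,Nr)
  double precision :: rLowJIK(sNx,sNy), rLowKJI(sNx,sNy)

  ! Made-up vertical grid: surface at 0, layer thicknesses 10, 20, ...
  rF1 = 0.0d0
  do k = 1, Nr
    drF(k) = 10.0d0 * k
  end do

  ! Made-up cell fractions taking the values 0, 0.5 and 1
  do k = 1, Nr
    do j = 1, sNy
      do i = 1, sNx
        hFacC(i,j,k) = dble(mod(i+j+k, 3)) / 2.0d0
      end do
    end do
  end do

  ! j-i-k order: the innermost k loop strides through hFacC
  ! by sNx*sNy elements per iteration
  do j = 1, sNy
    do i = 1, sNx
      rLowJIK(i,j) = rF1
      do k = Nr, 1, -1
        rLowJIK(i,j) = rLowJIK(i,j) - drF(k)*hFacC(i,j,k)
      end do
    end do
  end do

  ! k-j-i order: the innermost i loop walks contiguous memory
  rLowKJI = rF1
  do k = Nr, 1, -1
    do j = 1, sNy
      do i = 1, sNx
        rLowKJI(i,j) = rLowKJI(i,j) - drF(k)*hFacC(i,j,k)
      end do
    end do
  end do

  ! Same terms, same order in k for every (i,j): results should agree
  if (maxval(abs(rLowJIK - rLowKJI)) > 0.0d0) then
    print *, 'loop orders disagree'
    stop 1
  end if
  print *, 'loop orders agree'
end program loop_order
```

With arrays this small the two orderings will not differ measurably in speed; the sketch only shows that the reordering leaves the result unchanged.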

Cheers

Dimitris Menemenlis

On Mar 15, 2012, at 7:12 AM, Martin Losch wrote:

> Hi Jean-Michel et al.
> 
> I agree, this is a compiler bug, but on the other hand, moving the k-loop out of the i/j-loops would be preferable from a vectorization point of view (again, in this routine performance is not an issue, still ...). All forward tests pass with this change, and the only backward test that is affected by an additional (the same) change in update_masks_etc.F does not even pass before I make the changes (NaNs in grdck output).
> 
> Martin
> 
> On Mar 15, 2012, at 2:51 PM, Jean-Michel Campin wrote:
> 
>> Hi Martin,
>> 
>> Thanks for looking at this.
>> It's interesting to know where the optimisation breaks.
>> But I would prefer not to change this routine (if the compiler does
>> a wrong optimisation, this is the compiler's problem after all,
>> and I prefer to see this in the NOOPTFILES list).
>> 
>> Cheers,
>> Jean-Michel
>> 
>> On Thu, Mar 15, 2012 at 10:21:32AM +0100, Martin Losch wrote:
>>> Alternatively, we could replace
>>>       DO j=1-Oly,sNy+Oly
>>>        DO i=1-Olx,sNx+Olx
>>>         R_low(i,j,bi,bj) = rF(1)
>>>         DO k=Nr,1,-1
>>>          R_low(i,j,bi,bj) = R_low(i,j,bi,bj)
>>>    &                      - drF(k)*hFacC(i,j,k,bi,bj)
>>>         ENDDO
>>>        ENDDO
>>>       ENDDO
>>> 
>>> with
>>>       DO j=1-Oly,sNy+Oly
>>>        DO i=1-Olx,sNx+Olx
>>>         R_low(i,j,bi,bj) = rF(1)
>>>        ENDDO
>>>       ENDDO
>>>       DO k=Nr,1,-1
>>>        DO j=1-Oly,sNy+Oly
>>>         DO i=1-Olx,sNx+Olx
>>>          R_low(i,j,bi,bj) = R_low(i,j,bi,bj)
>>>    &                      - drF(k)*hFacC(i,j,k,bi,bj)
>>>         ENDDO
>>>        ENDDO
>>>       ENDDO
>>> That works too (and makes sense if you care about vectorization, although in this initialisation routine performance is not an issue).
>>> 
>>> M.
>>> 
>>> On Mar 15, 2012, at 10:15 AM, Martin Losch wrote:
>>> 
>>>> Hi there, I compiled with "-g -ftraceback" (and -O3), ran it under gdb, and found that ini_masks_etc.F (line 120) is the problematic routine; I don't see why. Anyway, I put this file into the list of NOOPTFILES and removed (had to!!!) the -ftree-vectorize option, and it works with
>>>>> top
>>>> PID    COMMAND      %CPU  TIME     #TH  #WQ  #POR #MREG RPRVT  RSHRD  RSIZE
>>>> 75875  mitgcmuv     98.7  00:13.07 1/1  0    14   34    98M    240K   101M
>>>> so roughly the 100 MB of core memory that Dimitris was talking about.
>>>> 
>>>> I can check in this change, but incidentally, do we need the two gad_*.F routines in the NOOPTFILES list? If not, I'll remove them.
>>>> 
>>>> Martin
>>>> 
>>>> 
>>>> On Mar 14, 2012, at 7:37 PM, Torge Martin wrote:
>>>> 
>>>>> Hi Dimitris,
>>>>> 
>>>>> just updated MITgcm and beaufort. I also don't use the -ieee option with genmake2 anymore.
>>>>> With -O3 I get the segmentation fault, with -O2 it runs just fine.
>>>>> 
>>>>> Torge
>>>>> 
>>>>> On Wed, Mar 14, 2012 at 10:15 AM, Menemenlis, Dimitris (3248) <Dimitris.Menemenlis at jpl.nasa.gov> wrote:
>>>>> Torge, maybe try "-O2".  It will be a bit faster.
>>>>> 
>>>>> Martin, since "-O3" in darwin_amd64_gfortran is problematic,
>>>>> should we downgrade to "-O2" in the CVS repository
>>>>> until the compiler bug is fixed ... or until someone takes the time to
>>>>> locate the particular subroutine that causes the optimization crash?
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> Dimitris Menemenlis
>>>>> 
>>>>> On Mar 14, 2012, at 10:08 AM, Torge Martin wrote:
>>>>> 
>>>>>> Hi Dimitris, Martin,
>>>>>> 
>>>>>> looks like Martin is right. I just found that using the -ieee option with genmake2 sets FOPTIM=-O0. This helps to get past the segmentation fault.
>>>>>> 
>>>>>> Now the Beaufort setup is running with this option on my MacPro, OS X 10.5.8 (Snow Leopard), 2 x 3 GHz Dual-Core, 4 GB Memory, using gcc version 4.0.1 (Apple Inc. build 5465) and MITgcm/tools/build_options/darwin_ia32_gfortran.
>>>>>> 
>>>>>> Torge
>>>>>> 
>>>>>> P.S. Haven't tried running with MPI using both processors, yet.
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> MITgcm-devel mailing list
>>>>> MITgcm-devel at mitgcm.org
>>>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>>>> 
>>> 
>> 
> 