[MITgcm-support] results quite different depending on number of procs used

Camille Mazoyer mazoyer at univ-tln.fr
Fri Mar 18 09:58:08 EDT 2016


Hi Jean-Michel,

Thank you very much for your reply, and sorry for my late answer.
I have checked each of your points below:

On 07/03/2016 16:52, Jean-Michel Campin wrote:
> Hi Camille,
>
> A few comments here:
> 1) With tile size reduced to sNx=20, sNy=10 (120 procs) it's likely that
>    it will not scale as well (in part due to the increased number of points
>    when including the overlap). But it should work as well as the 10 procs case.
> 2) One thing you can check would be to compare, let's say,
>    an 80 procs case (sNx=20, sNy=15, nPx=8, nPy=10) with
>    a 10 procs case with the same tile size (sNx=20, sNy=15) but with more
>    tiles per proc (e.g., nSx=8, nSy=1, nPx=1, nPy=10).
>    These two cases should give identical results with a recent version of the
>    code (#define GLOBAL_SUM_ORDER_TILES, added on Aug 25, 2015).
You're right! I have exactly the same results.
> 3) With different tile sizes, we expect small differences, but in your
>    case, the differences seem quite large:
>    a) it could be that the flow regime is unstable, or that the model
>    parameters are close to unstable, and then a small difference grows with time.
>    b) or there is something not right with one of the 2 tile sizes. I would
>    suggest repeating 2 short runs (one for each case) but turning off
>    compiler optimisation (e.g., -O0).
I ran short simulations (time=1 hour) with -O0 and -O2. There are
differences (e.g., in temperature) between -O0 and -O2 for each configuration.
My flags are:
- debug compilation:
  mpiifort -w95 -W0 -WB -convert big_endian -assume byterecl -fPIC -O0 -noalign -xW -ip -mp
- standard compilation:
  mpiifort -w95 -W0 -WB -convert big_endian -assume byterecl -fPIC -O2 -align -xW -ip
Configurations tested:
- 10x1 procs
- 10x1 procs (same tile size as the 80 procs case)
- 8x10 procs (80 procs)
- 8x15 procs (120 procs)
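
As a quick sanity check on these configurations, here is a minimal sketch that verifies each decomposition covers the full grid (Nx = sNx*nSx*nPx, Ny = sNy*nSy*nPy) and estimates how much extra work the overlap adds per tile. The grid and tile sizes come from this thread. The overlap width OL=3 and the nSx=nSy=1 values for the cases where they were not stated are assumptions, and the helper names are hypothetical, not part of MITgcm.

```python
# Grid size from the original message (Nx=160, Ny=150).
NX, NY = 160, 150
OL = 3  # ASSUMED overlap width (OLx = OLy); the thread does not give it.

# (label, sNx, sNy, nSx, nSy, nPx, nPy)
# nSx = nSy = 1 is assumed wherever the thread does not state it.
CONFIGS = [
    ("10 procs, one tile per proc", 160, 15, 1, 1, 1, 10),
    ("10 procs, 80-proc tile size", 20, 15, 8, 1, 1, 10),
    ("80 procs (8x10)", 20, 15, 1, 1, 8, 10),
    ("120 procs (8x15)", 20, 10, 1, 1, 8, 15),
]


def covers_grid(sNx, sNy, nSx, nSy, nPx, nPy):
    """True if the tiles exactly cover the NX x NY grid."""
    return sNx * nSx * nPx == NX and sNy * nSy * nPy == NY


def halo_overhead(sNx, sNy, ol=OL):
    """Ratio of points computed per tile (with overlap) to interior points."""
    return (sNx + 2 * ol) * (sNy + 2 * ol) / (sNx * sNy)


for label, sNx, sNy, nSx, nSy, nPx, nPy in CONFIGS:
    ok = covers_grid(sNx, sNy, nSx, nSy, nPx, nPy)
    print(f"{label}: covers grid={ok}, "
          f"overlap overhead={halo_overhead(sNx, sNy):.2f}")
```

Under the assumed overlap, the smaller the tile, the larger the overhead, which is consistent with the remark that the 120 procs case may not scale as well.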

When I compare two simulations (both compiled without compiler
optimisation), I still find differences between them. The only
exception is 10x1 procs versus 10x1 procs with the 80 procs tile size,
which give the same results.
I am attaching plots which show the differences in surface temperature
after t=1 hour. As you can see in the attached plots, the differences
are larger between 10x1 and 8x15 than between 10x1 and 8x10.
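
One reason why results can depend on the tile size but not on how tiles are assigned to procs is that floating-point addition is not associative: a global sum built from per-tile partial sums depends on how the values are grouped into tiles. The sketch below illustrates this in plain Python. It is not MITgcm code, the function name is hypothetical, and the magnitudes are deliberately exaggerated so that the effect is visible; in a model run the initial difference would be at round-off level.

```python
def tiled_sum(values, tile_size):
    """Sum each tile left to right, then add the tile partial sums in tile order.

    Explicit loops are used instead of the built-in sum(), which may apply
    compensated summation to floats in recent Python versions.
    """
    total = 0.0
    for start in range(0, len(values), tile_size):
        partial = 0.0
        for v in values[start:start + tile_size]:
            partial += v
        total += partial
    return total


# Same data, two different tile sizes (the exact sum is 2.0).
values = [1e16, 1.0, -1e16, 1.0]
small_tiles = tiled_sum(values, 2)  # 0.0: each 1.0 is lost inside its tile
large_tile = tiled_sum(values, 4)   # 1.0: only the first 1.0 is lost
print(small_tiles, large_tile)
```

With a fixed tile size the grouping is the same whichever proc owns each tile, provided the partial sums are always added in tile order. That would be consistent with the 10x1 run using the 80 procs tile size reproducing the 80 procs results.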

For the simulations 10x1 vs 8x15 (compiled with -O0), an interesting
point is that after only 5 minutes, differences in surface temperature
appear in the south. They are shaped more or less like lines: I
checked, and these lines lie exactly on the borders between two tiles
(file: diff_10x1_8x15_5min_k130.gif).
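
To make the comparison with tile borders reproducible, here is a small sketch that lists where the tile edges fall for the two decompositions, using the tile sizes given in this thread (sNy=15 for the 10 procs run; sNx=20, sNy=10 for the 120 procs run). The 0-based indexing and the function name are assumptions for illustration; the plot itself is not reproduced here.

```python
def tile_edges(n_global, tile_size):
    """0-based index of the first cell of every tile except the first one."""
    return list(range(tile_size, n_global, tile_size))


NX, NY = 160, 150

# Edges in y for the two runs being compared.
edges_y_10procs = tile_edges(NY, 15)    # sNy=15
edges_y_120procs = tile_edges(NY, 10)   # sNy=10

# Edges in x: the 10 procs run has a single tile in x (sNx=160), so none.
edges_x_10procs = tile_edges(NX, 160)
edges_x_120procs = tile_edges(NX, 20)   # sNx=20

# Rows that are a tile edge only in the 120 procs run.
only_120 = sorted(set(edges_y_120procs) - set(edges_y_10procs))

print("y edges, 10 procs :", edges_y_10procs)
print("y edges, 120 procs:", edges_y_120procs)
print("x edges, 120 procs:", edges_x_120procs)
print("y edges only in the 120 procs run:", only_120)
```

If the rows or columns where the difference field is largest coincide with these indices, that supports the observation that the differences start at tile borders.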

I see in CPP_EEOPTIONS.h that there are CPP keys for the MPI sum. Could
another CPP key from this file be useful for my problem?
In fact, the main problem for me is that I don't know which run is the
safest. I am afraid that, for the moment, the safest runs are the least
parallelized ones?

Thank you,
Camille

>    There have been reports of compiler optimisation problems that only show up
>    for some tile sizes but not for others.
>
> Cheers,
> Jean-Michel
>
> On Mon, Mar 07, 2016 at 11:29:11AM +0100, Camille Mazoyer wrote:
>> Dear all,
>>
>> I ran two simulations of a configuration of the Mediterranean coast,
>> near Toulon, France.
>> The simulations are exactly the same except for the number of procs (10
>> procs for one run, 120 procs for the other run). I only changed the
>> file SIZE.h to change the number of procs.
>> I know we can't expect to have exactly the same results but I was
>> very surprised to see the differences. After 5 days, for example,
>> the maximum difference between the temperature fields is around 0.034.
>> Have you ever seen such differences when changing the number of procs?
>> Is this OK in your opinion? If not, do you know where I might have made a
>> mistake?
>>
>> In attached files, you can see different plots, to compare a run
>> with 10 procs, and a run with 120 procs:
>> - the difference of temperature at the surface (k=kmax) :
>> diff_temp_kmax_5days.gif
>> - the difference of u field at the surface (k=kmax) : diff_u_kmax_5days.gif
>> - the difference of v field at the surface (k=kmax) : diff_v_kmax_5days.gif
>> - I calculate the mean of differences in the domain Nx*Ny*Nz, and I
>> plot it for each time : mean_diff_temp.gif (temperature ),
>> mean_diff_u.gif (u zonal), mean_diff_v.gif (v meridional).
>> =>>>> Differences increase with time.
>>
>>
>> Number of points on the domain: Nx=160, Ny=150, Nz=130.
>> Subdomains for 120 procs: sNx=20, sNy=10 points  => Is this too small
>> for a subdomain?
>> Subdomains for  10 procs: sNx=160, sNy=15 points
>>
>>
>> Thank you for your advice and ideas,
>> Camille
>>
>>
>>
>> -- 
>> ------------------------------------------
>> Camille Mazoyer
>> Phd Student
>> Mediterranean Institute of Oceanography (MIO)
>> Institut de Mathématiques de Toulon (IMATH)
>> Université de TOULON
>> Bat X - CS 60584
>> 83041 TOULON cedex 9
>> France
>> http://mio.pytheas.univ-amu.fr/
>> http://imath.fr/
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-support


-------------- next part --------------
Non-text attachments (image/gif) scrubbed by the list archive:
- diff_O2_10x1_vs_10x1tilesize80procs_l13_1h.gif (12286 bytes)
  <http://mitgcm.org/pipermail/mitgcm-support/attachments/20160318/157c5aed/attachment-0005.gif>
- diff_O0_O2_10x1_tilesize80procs_l13_1h.gif (12436 bytes)
  <http://mitgcm.org/pipermail/mitgcm-support/attachments/20160318/157c5aed/attachment-0006.gif>
- diff_O0_O2_10x1_l13_1h.gif (12242 bytes)
  <http://mitgcm.org/pipermail/mitgcm-support/attachments/20160318/157c5aed/attachment-0007.gif>
- diff_O0_10x1_vs_10x1tilesize80procs_l13_1h.gif (11886 bytes)
  <http://mitgcm.org/pipermail/mitgcm-support/attachments/20160318/157c5aed/attachment-0008.gif>
- diff_10x1_8x15_5min_k130.gif (12450 bytes)
  <http://mitgcm.org/pipermail/mitgcm-support/attachments/20160318/157c5aed/attachment-0009.gif>
