[MITgcm-support] results quite differents depending on number of procs used

Jean-Michel Campin jmc at ocean.mit.edu
Mon Mar 7 10:52:25 EST 2016


Hi Camille,

A few comments here:
1) With the tile size reduced to sNx=20, sNy=10 (120 procs), it will likely
  not scale as well (in part because the number of points increases
  once the overlap is included). But it should work just as well as the
  10-proc case.
2) One thing you can check is to compare, say,
  an 80-proc case (sNx=20, sNy=15, nPx=8, nPy=10) with
  a 10-proc case with the same tile size (sNx=20, sNy=15) but with more
  tiles per proc (e.g., nSx=8, nSy=1, nPx=1, nPy=10).
  These two cases should give identical results with a recent version of the
  code (#define GLOBAL_SUM_ORDER_TILES, added on Aug 25, 2015).
3) With different tile sizes, we expect small differences, but in your
  case the differences seem quite large:
  a) it could be that the flow regime is unstable, or that the model
   parameters are close to unstable, so that a small difference grows
   with time;
  b) or something is not right with one of the two tile sizes. I would
   suggest repeating two short runs (one for each case) with compiler
   optimisation turned off (e.g., -O0).
   There have been reports of compiler optimisation problems that only show
   up for some tile sizes while others work fine.
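
The tile arithmetic in points 1) and 2) can be checked with a short Python
sketch. It is only an illustration: the overlap width OLx=OLy=3 is an assumed
value (the real one is set in SIZE.h), the two original cases are assumed to
use one tile per proc (nSx=nSy=1, hence nPx=8, nPy=15 for 120 procs), and the
function name "layout" is made up here. It confirms that each layout covers
the 160x150 domain and reports the ratio of points including overlap to
interior points per tile.

```python
# Sketch only. Parameter names mirror MITgcm's SIZE.h, but OLx = OLy = 3 is an
# assumed overlap width and layout() is a hypothetical helper, not MITgcm code.

def layout(sNx, sNy, nSx, nSy, nPx, nPy, OLx=3, OLy=3):
    """Return global size, proc/tile counts and overlap overhead for a tiling."""
    interior = sNx * sNy                           # points owned by one tile
    with_halo = (sNx + 2 * OLx) * (sNy + 2 * OLy)  # owned points plus overlap
    return {
        "Nx": sNx * nSx * nPx,                     # global points in x
        "Ny": sNy * nSy * nPy,                     # global points in y
        "procs": nPx * nPy,
        "tiles": nSx * nSy * nPx * nPy,
        "overhead": with_halo / interior,
    }

cases = {
    # nSx = nSy = 1 is assumed for the two original runs.
    "10 procs (original)": dict(sNx=160, sNy=15, nSx=1, nSy=1, nPx=1, nPy=10),
    "120 procs (original)": dict(sNx=20, sNy=10, nSx=1, nSy=1, nPx=8, nPy=15),
    # The two layouts suggested for the comparison in point 2).
    "80 procs (test)": dict(sNx=20, sNy=15, nSx=1, nSy=1, nPx=8, nPy=10),
    "10 procs, 8 tiles each (test)": dict(sNx=20, sNy=15, nSx=8, nSy=1, nPx=1, nPy=10),
}

for name, params in cases.items():
    info = layout(**params)
    # Every layout must cover the full Nx=160, Ny=150 domain.
    assert (info["Nx"], info["Ny"]) == (160, 150), name
    print(f"{name}: procs={info['procs']} tiles={info['tiles']} "
          f"overhead={info['overhead']:.2f}")
```

The ratio grows as the tiles shrink, which is the overlap cost mentioned in
1); the two test layouts in 2) use the same tile size and therefore the same
ratio and the same number of tiles.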

Cheers,
Jean-Michel

On Mon, Mar 07, 2016 at 11:29:11AM +0100, Camille Mazoyer wrote:
> Dear all,
> 
> I ran two simulations of a configuration of the Mediterranean coast,
> near Toulon, France.
> The simulations are exactly the same except the number of procs (10
> procs for one run, 120 procs for the other run). I only change the
> file SIZE.h to change the number of procs.
> I know we can't expect to have exactly the same results but I was
> very surprised to see the differences. After 5 days, for example,
> the max of differences between temperature fields is around 0.034.
> Have you ever seen such differences when changing the number of procs?
> Is this ok for you? If not, do you know where I might have made a
> mistake?
> 
> In attached files, you can see different plots, to compare a run
> with 10 procs, and a run with 120 procs:
> - the difference of temperature at the surface (k=kmax) :
> diff_temp_kmax_5days.gif
> - the difference of u field at the surface (k=kmax) : diff_u_kmax_5days.gif
> - the difference of v field at the surface (k=kmax) : diff_v_kmax_5days.gif
> - I calculate the mean of differences in the domain Nx*Ny*Nz, and I
> plot it for each time : mean_diff_temp.gif (temperature ),
> mean_diff_u.gif (u zonal), mean_diff_v.gif (v meridional).
> =>>>> Differences increase with time.
> 
> 
> Number of points on the domain: Nx=160, Ny=150, Nz=130.
> Subdomains for 120 procs: sNx=20, sNy=10 points  => Is this too small
> for a subdomain?
> Subdomains for  10 procs: sNx=160, sNy=15 points
> 
> 
> Thank you for your advice and ideas,
> Camille
> 
> 
> 
> -- 
> ------------------------------------------
> Camille Mazoyer
> Phd Student
> Mediterranean Institute of Oceanography (MIO)
> Institut de Mathématiques de Toulon (IMATH)
> Université de TOULON
> Bat X - CS 60584
> 83041 TOULON cedex 9
> France
> http://mio.pytheas.univ-amu.fr/
> http://imath.fr/
