[MITgcm-support] (no subject)

Thu Jul 14 11:45:08 EDT 2016

Hi,

I'm trying to help researchers from the University of Maine to run MITgcm.
The model runs, but they think it should run much faster.

I have run or helped run many models while working for the Ocean Modeling
Group; however, this is the first time I have encountered MITgcm.

With Rutgers ROMS there is a method of running a number of tiles per
sub-domain and it seems that MITgcm can do that too. The reason for doing
so with ROMS was (I believe) to try to get the tiles to fit in cache to
increase performance. Is that the reason for doing so with MITgcm? We have
tried a number of combinations with not much luck.

For testing, the full domain we have is 600 x 520 x 21. Using 64 processes,
we are getting only 30 time steps per minute. I wondered if the domain was
too small for that many processes so I reduced the number of processes but that
didn't help. The plan is to triple the resolution in each horizontal
direction and double in the vertical.
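
To put a number on that plan, here is a rough sketch of what it does to the
grid size. It only counts grid points and assumes nothing about how the time
step might have to change at the finer resolution. The function name is just
for illustration.

```python
def grid_points(nx, ny, nz):
    """Total number of grid cells in an nx x ny x nz domain."""
    return nx * ny * nz

# Current test domain, as described above.
current = (600, 520, 21)

# Planned: triple each horizontal dimension, double the vertical.
planned = (current[0] * 3, current[1] * 3, current[2] * 2)

# Growth factor in total grid points (3 * 3 * 2).
ratio = grid_points(*planned) / grid_points(*current)
print(planned, ratio)
```

So the planned domain would be 1800 x 1560 x 42, with 18 times as many grid
points as the test case.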

Our cluster has nodes with Intel E5-2600v3 processors totaling 24 cores per
node with FDR-10 Infiniband. The way the jobs were specified, some compute
nodes had many processes (like 20) on them and some had only 1 or 2. I
experimented and found that by using only 4 cores per node and only 48
cores, it ran close to twice as fast as with 64 cores and a mix of the
numbers of cores per node. To me this indicates that the
inter-process-communication is high and it is saturating the memory
bandwidth of the nodes with large process counts. That might point to the
subdomains being too small (the halo region being a significant proportion
of the subdomain), but in that case I would have expected a substantial
improvement when I decreased the run to 16 cores. I
haven't profiled the code yet. I thought it might be quicker to write to
you to get some information first.
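
To make the halo concern concrete, here is a minimal sketch of the estimate I
have in mind. The halo width of 3 points and the process grids (8 x 8, 6 x 8,
4 x 4) are assumptions for illustration, not values taken from our actual
configuration, and the function name is hypothetical.

```python
def halo_fraction(nx, ny, px, py, halo):
    """Fraction of a subdomain's stored horizontal points that are halo points.

    nx, ny : full horizontal domain size
    px, py : number of processes in x and y (must divide nx and ny evenly)
    halo   : halo width in grid points on each side (assumed value)
    """
    if nx % px or ny % py:
        raise ValueError("process grid must divide the domain evenly")
    sx, sy = nx // px, ny // py                 # interior size per subdomain
    total = (sx + 2 * halo) * (sy + 2 * halo)   # interior plus halo
    return 1.0 - (sx * sy) / total

HALO = 3  # assumption, not taken from our configuration

# Hypothetical process grids for 64, 48 and 16 processes on the 600 x 520 domain.
for px, py in [(8, 8), (6, 8), (4, 4)]:
    frac = halo_fraction(600, 520, px, py, HALO)
    print(px * py, 600 // px, 520 // py, round(frac, 3))
```

With these assumed values the halo share is roughly 15% of the stored points
at 64 processes (75 x 65 subdomains) and roughly 8% at 16 processes
(150 x 130 subdomains), which is why I expected the 16-core run to behave
differently.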

Can you please explain what the optimal layout is for performance? Is there
an optimal subdomain size that you know of for these processors? Optimal
number of tiles per subdomain? Also can you explain at a somewhat high
level any other factors to consider when running the model to get better
performance? Also, are there Intel Haswell CPU-specific compiler flags
(we're using the Intel compilers with MVAPICH2) that you can recommend to
us? Finally, is there a benchmark case where we can verify that we are
getting the expected performance?

Thanks very much,

Steve
-- 
________________________________________________________________
 Steve Cousins             Supercomputer Engineer/Administrator
 Advanced Computing Group            University of Maine System
 244 Neville Hall (UMS Data Center)              (207) 561-3574
 Orono ME 04469                      steve.cousins at maine.edu