[MITgcm-support] MPI speed issues

Martin Losch Martin.Losch at awi.de
Fri Jun 14 11:08:39 EDT 2013


Hi Neil,

I would try to solve the first issue first (it looks like there are no initial fields for temperature and salinity). On a clean copy of the MITgcm (where you didn't change anything but a build_options file):
cd verification
./testreport -t tutorial_global_oce_latlon -of "your build options file"
That should work. If it does (i.e. testreport returns some numbers summarizing the agreement with the reference result), continue with:
cd tutorial_global_oce_latlon/build
make CLEAN
../../../tools/genmake2 -of "your build options file with the ieee-flags" -mods ../code
make depend
make
cd ../run
../build/mitgcmuv >| output.txt.new
which should give you results that differ from output.txt, because with testreport the default optimization of your compiler is usually turned off.
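
If you want a quick way to compare the two runs, you can diff the monitor statistics, e.g. (just a sketch; I'm assuming the monitor output is switched on, as it is in the tutorial, and that the testreport run left its output.txt in the same run directory):

grep dynstat_theta_mean output.txt     > theta_mean.old
grep dynstat_theta_mean output.txt.new > theta_mean.new
diff theta_mean.old theta_mean.new

With different optimization the values will typically agree only in the leading digits.
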
Then you can start to change the configuration, and from then on any further error will probably be due to changes that you made.
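
For reference, the INI_THETA/INI_SALT warnings in your log usually mean that the initial temperature and salinity are zero at ocean points, e.g. because the hydrography files are not set or do not match the bathymetry. In the tutorial they are specified in PARM05 of the "data" namelist, along these lines (the file names below are only placeholders, use whatever the experiment ships with):

 &PARM05
 bathyFile       = 'bathymetry.bin',
 hydrogThetaFile = 'lev_t.bin',
 hydrogSaltFile  = 'lev_s.bin',
 &

It is worth checking that hydrogThetaFile and hydrogSaltFile are set and that the binary files are actually present in the run directory.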

About MPI: this configuration is very small (90x40 horizontal grid points), so don't expect it to scale very well. You will probably see a speed-up when going from 1 to 2 subdomains (tiles), but the model usually doesn't scale once the tiles get smaller than about 30x30 grid points.
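
To put numbers on it: with 25 processes on the 90x40 grid (say nPx=nPy=5 in SIZE.h, I'm only guessing that this is your decomposition), each tile is only 90/5 x 40/5 = 18x8 interior points, plus an overlap halo of a few points on each side. At that size the halo exchanges and the global sums in the pressure solver cost about as much as the actual computation, which would explain why the CPU percentage in your timings grows with the process count while the wall-clock time hardly improves.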

Martin

On Jun 14, 2013, at 4:53 PM, Neil Patel <nigellius at gmail.com> wrote:

> I managed to get MPI running on my machine (I sent a message about it last week). I got it to work on tutorial_global_oce_latlon; I was previously using the exp4 example. Weirdly, I couldn't get it to run in a one-processor configuration and got this error:
> 
> terra tutorial_global_oce_latlon_test/input> ../build/mitgcmuv > output.txt
> ** WARNINGS ** INI_THETA: found      5040 wet grid-pts with theta=0 identically.
> ** WARNINGS ** INI_SALT: found      5040 wet grid-pts with salt=0 identically.
> SOLUTION IS HEADING OUT OF BOUNDS: tMin,tMax= -1.627E+03  2.554E+03
> 
> Any clue? Stranger and more problematic: using more processors with MPI got no speed boost; it just made more processors work harder. Here's some sample output with the time command.
> 
> 19.177u 3.160s 0:02.91 767.3%	0+0k 0+171168io 0pf+0w — 25 processors, 30 time steps
> 27.481u 5.020s 0:03.96 820.7%	0+0k 0+302768io 0pf+0w — 25 processors, 60 time steps
> 51.815u 9.144s 0:06.24 976.7%	0+0k 0+578568io 0pf+0w — 25 processors, 120 time steps
> 
> 1.972u 0.336s 0:01.94 118.5%	        0+0k 0+80232io   0pf+0w — 4 processors, 30 time steps
> 18.889u 2.172s 0:07.12 295.6%	0+0k 0+675784io 0pf+0w — 4 processors, 120 time steps
> 15.688u 1.564s 0:10.67 161.5%	0+0k 0+599248io 0pf+0w — 2 processors, 300 time steps
> 15.664u 1.664s 0:10.77 160.8%	0+0k 0+664520io 0pf+0w — 2 processors, 300 time steps
> 
> Are there some MPI settings that might affect this?
> 
> Thanks,
> 
> Neil
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support



