[MITgcm-devel] cs510 on IBM p690
chris hill
cnh at mit.edu
Wed Feb 14 11:59:53 EST 2007
D/M,
Turning on W2_SAFE... should catch a buffer issue, but I think the
buffering should be dependent on tile size.
One thing to note is that some of the j exchanges will not even be
using MPI, because you have 8 tiles per CPU and there are 6 tiles per
row per face. So some are straight copies - but they use the same
buffers.
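For what it's worth, here is a quick throwaway check of that claim.
The tile numbering and the tile-to-process dealing here are my guesses
(consecutive row-major tiles, dealt out 8 per process in order), not
the real exch2 mapping:

      program tilemap
c     Hedged sketch, NOT the actual exch2 topology: assume 216 tiles
c     (6 faces x 6x6 tiles), numbered row-major within each face,
c     dealt out 8 per process in order, so proc = (tile-1)/8.
      implicit none
      integer nTiles, perProc, perRow, t, nbr, same, total
      parameter (nTiles=216, perProc=8, perRow=6)
      same  = 0
      total = 0
      do t = 1, nTiles
c       northern j-neighbour within the same face (skip face edges)
        if (mod((t-1)/perRow, 6) .lt. 5) then
          nbr = t + perRow
          total = total + 1
          if ((t-1)/perProc .eq. (nbr-1)/perProc) same = same + 1
        endif
      enddo
      print *, same, ' of ', total, ' within-face j-exchanges'
      print *, 'are local copies rather than MPI'
      end

Under those assumptions a fair fraction of the j-exchanges come out as
local copies, so I would not trust an "it only breaks over MPI"
argument here.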
My two thoughts are:
1 - a configuration screw-up, or a rebuild that wasn't clean
2 - some initialization screw-up due to a divide by 27 getting rounded
down in a way I overlooked (toy illustration below). We did test with
odd CPU counts early on, but we don't run those very often and have no
regression tests that use odd numbers. Running c32 with 24 tiles on 3
procs might be instructive for that.
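The failure mode I have in mind for (2) is plain integer truncation.
The 510 and 216 below are just the cs510 face size and tile count,
picked to illustrate, not lifted from any actual line of the code:

      program divcheck
c     Hedged sketch of the suspected failure mode: Fortran integer
c     division truncates toward zero, so a size derived this way
c     silently drops the remainder when things do not divide evenly.
      implicit none
      print *, '216 tiles / 27 procs = ', 216/27,
     &         '  remainder ', mod(216,27)
      print *, '510 points / 27      = ', 510/27,
     &         '  remainder ', mod(510,27)
      end

216 over 27 comes out exact, but anything derived from 510 (or some
other dimension) divided by 27 loses its remainder without a peep.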
Got to dash - keep me posted.
Chris
Dimitris Menemenlis wrote:
>> can you do a 27-CPU test on Columbia?
>
> A 27-CPU run will not work on Columbia because of limited memory.
>
> A 54-CPU run works fine.
>
> Could Martin's problem be an MPI buffer overflow that is not caught
> by the MPI implementation? Or maybe a bug in the particular MPI
> library that he is using? When running on Columbia we had to play
> around with a bunch of MPI variables. For a 216-CPU configuration,
> we are using the following environment variables:
>
> limit descriptors unlimited
> limit stacksize 2000m
> limit coredumpsize 1
> module load modules scsl.1.6.1.0 intel-comp.9.1.039 mpt.1.12.0.nas
> setenv MPI_DSM_DISTRIBUTE
> setenv MPI_BUFS_PER_PROC 512
> setenv MPI_BUFS_PER_HOST 512
> setenv MPI_MSGS_PER_HOST 2048
> setenv MPI_MSGS_PER_PROC 1024
> setenv MPI_MSG_RETRIES 5000
> setenv KMP_STACKSIZE 1000m
> setenv KMP_LIBRARY turnaround
>
> I am not sure in detail what all of the above do, but they are needed
> both to speed up the code *and* to avoid MPI buffer overflows.
>
> D.