[MITgcm-devel] [MITgcm-support] MPI problem on Archer (CRAY XC30)

Jean-Michel Campin jmc at ocean.mit.edu
Tue Oct 21 10:42:01 EDT 2014


Hi Chris,

Switched to devel for now.

I am not sure that just iarg1+iarg2 will be enough (ignoring carg),
since one component (e.g., ATM) does multiple non-blocking sends
one after the other, using the same iarg1 & iarg2 but different carg
(in pkg/atm_compon_interf/cpl_export_my_data.F).
There are BARRIER calls in atm_export_fld.F, but just for threads.
And the fact that there is a check on HEADER content makes it safer,
but does not avoid the need to have the right tag, I think.
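
To illustrate the concern, here is a minimal Python sketch (not the MITgcm
Fortran code). It assumes a hypothetical tag built only from iarg1+iarg2;
the function name, field names and argument values are made up for
illustration. Two sends that differ only in carg end up with the same tag:

```python
def tag_from_sum(iarg1, iarg2, carg):
    """Hypothetical simplified tag that ignores the character argument."""
    return iarg1 + iarg2


# Two exports from the same component: same integer arguments,
# different field names (names are illustrative only).
sends = [(1, 2, "fieldA"), (1, 2, "fieldB")]
tags = [tag_from_sum(*s) for s in sends]

# Both messages carry the same tag, so the tag alone cannot tell them apart.
print(tags)
```

With such a tag, telling the two messages apart would rest on something
other than the tag, e.g. the HEADER check.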

But it's probably true that the sum iarg1+iarg2 could be used as part
of a simpler tag expression, since all the comp{send/rec}_r8tiles calls
we use have the same iarg1, and the other send/rec calls either don't use
iarg2 or come from an early initialisation call that would not interfere
with the others.

Maybe we could try to get this MPI max tag value (using MPI_Get_attr?)
and make up a tag number that fits.
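
A rough Python sketch of that idea, under stated assumptions: make_tag and
its mixing formula are hypothetical and are not what generate_tag.F does,
and the field names are made up. In an MPI program the upper bound is the
MPI_TAG_UB attribute of the communicator (MPI_Comm_get_attr in the C and
Fortran bindings); query_tag_ub shows the mpi4py equivalent and is only
defined here, not called, so the rest runs without MPI installed.

```python
MPI_MIN_TAG_UB = 32767   # smallest upper bound the MPI standard guarantees
XC30_TAG_UB = 4194303    # limit reported for Cray XC30 in this thread (2**22 - 1)


def query_tag_ub():
    """Return the tag upper bound of MPI_COMM_WORLD (requires mpi4py)."""
    from mpi4py import MPI  # lazy import: only needed if this is called
    return MPI.COMM_WORLD.Get_attr(MPI.TAG_UB)


def make_tag(iarg1, iarg2, carg, tag_ub=MPI_MIN_TAG_UB):
    """Hypothetical tag mixing both integers and the string, folded into [0, tag_ub]."""
    modulus = tag_ub + 1
    h = 0
    for ch in carg:
        # Simple polynomial hash of the characters, kept below the modulus.
        h = (h * 31 + ord(ch)) % modulus
    return (h + iarg1 + iarg2) % modulus


bad_tag = 9862928  # value rejected by MPI_Recv in the error report
assert bad_tag > XC30_TAG_UB

tag_a = make_tag(1, 2, "fieldA", XC30_TAG_UB)
tag_b = make_tag(1, 2, "fieldB", XC30_TAG_UB)
assert 0 <= tag_a <= XC30_TAG_UB and 0 <= tag_b <= XC30_TAG_UB
assert tag_a != tag_b  # these two names differ only in the last character
```

Folding with a modulus keeps every tag legal, but two different carg
strings can still map to the same tag, so a check like the HEADER one
remains useful.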

Cheers,
Jean-Michel

On Mon, Oct 20, 2014 at 10:44:30PM -0400, Chris Hill wrote:
> Hi David and Jean-Michel,
> 
>  I am unsure why generate_tag doesn't just do "iarg1+iarg2", but
> let's try Jean-Michel's fix. Not sure who introduced the funky
> hash stuff in the first place. My recollection is that "iarg1+iarg2"
> should be all that is needed, but that code was a long time ago!
> 
> Chris
> 
> On Mon, Oct 20, 2014 at 10:29 PM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
> > Hi David,
> >
> > Could you try to replace the original
> >  pkg/compon_communic/generate_tag.F
> > with the modified "generate_tag.F" routine (attached to this email)
> > and check that it works?
> >
> > Cheers,
> > Jean-Michel
> >
> > On Mon, Oct 20, 2014 at 03:14:51PM +0100, David Ferreira wrote:
> >> Hi all,
> >> I'm having a problem running the coupled model on Archer.
> >> The Archer support team discovered the following problem:
> >>
> >> ##########
> >> ...
> >> ---
> >> Rank 0 [Tue Oct  7 22:44:10 2014] [c7-0c1s12n2] Fatal error in MPI_Recv: Invalid tag, error stack:
> >> MPI_Recv(192): MPI_Recv(buf=0x7fffffff8000, count=1024, MPI_INTEGER, src=1, tag=9862928, comm=0x84000004, status=0x7fffffff7df0) failed
> >> MPI_Recv(113): Invalid tag, value is 9862928
> >> ---
> >>
> >> The maximum allowed value for the "tag" in MPI messages on Cray XC30 systems
> >> is 4194303, which explains the error message
> >>
> >> MPI_Recv(113): Invalid tag, value is 9862928
> >>
> >> The MPI standard only requires that the maximum tag value is not less than
> >> 32767 (so Cray MPI is compliant). The maximum can be larger, and can be found
> >> using the MPI_Get_attr enquiry function. Your program should retrieve this
> >> value and ensure that any tags it specifies are no larger.
> >>
> >> ##########
> >>
> >> I have absolutely no clue how to tell the model to choose MPI tags
> >> which are below 4194303.
> >> A bit of googling on MPI_Get_attr led me to some very obscure pages.
> >>
> >> Any help is welcome.
> >> Cheers,
> >> david
> >>
> >> _______________________________________________
> >> MITgcm-support mailing list
> >> MITgcm-support at mitgcm.org
> >> http://mitgcm.org/mailman/listinfo/mitgcm-support


