[MITgcm-devel] [MITgcm-support] MPI problem on Archer (CRAY XC30)
Jean-Michel Campin
jmc at ocean.mit.edu
Tue Oct 21 15:05:26 EDT 2014
Hi Chris,
you are right: the way the MPI-tag is currently generated only
accounts for the first 3 characters of the name we pass,
and they happen to be identical (=ATM or =OCN) for all fields that
come from / go to the same component. And apparently it works!
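
To make the point concrete, here is a minimal Python sketch, not the
actual generate_tag.F (whose hash is not reproduced here). It builds a tag
from only the first 3 characters of the name plus iarg1 and iarg2, then
folds it into the allowed range. The function name make_tag, the polynomial
hash and the field names are all hypothetical; in the model the upper bound
would presumably be queried from the MPI_TAG_UB attribute rather than
hard-coded.

```python
MPI_MIN_TAG_UB = 32767  # smallest tag upper bound the MPI standard guarantees


def make_tag(name, iarg1, iarg2, tag_ub=MPI_MIN_TAG_UB):
    """Hypothetical tag builder, not the MITgcm routine.

    Hashes the first 3 characters of `name` together with iarg1 and iarg2,
    then folds the result into the valid range 0..tag_ub.
    """
    if tag_ub < MPI_MIN_TAG_UB:
        raise ValueError("tag_ub is below the minimum the MPI standard allows")
    h = 0
    for ch in name[:3]:  # only the first 3 characters contribute
        h = h * 31 + ord(ch)
    h = (h * 31 + iarg1) * 31 + iarg2
    return h % (tag_ub + 1)  # sender and receiver must fold the same way


CRAY_XC30_TAG_UB = 4194303  # limit reported by Archer support

# Hypothetical field names: two from the same component, one from the other.
tag_a = make_tag("ATM_field_one", 1, 2, CRAY_XC30_TAG_UB)
tag_b = make_tag("ATM_field_two", 1, 2, CRAY_XC30_TAG_UB)
tag_c = make_tag("OCN_field_one", 1, 2, CRAY_XC30_TAG_UB)
print(tag_a, tag_b, tag_c)
```

The two ATM names get the same tag because they share their first 3
characters. One plausible reason this is harmless: MPI matches a message on
source, tag and communicator, and messages with the same source, tag and
communicator are matched in the order they were sent.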
Cheers,
Jean-Michel
On Tue, Oct 21, 2014 at 10:46:09AM -0400, Chris Hill wrote:
> Hi J-M,
>
> Think this is even below devel!
>
> I have a feeling that messages are uniquely id’d through
> tag and rank, not just tag alone? Anyhow, we can definitely tidy
> up - just need to remember what on earth we were thinking
> when we did this :-).
>
> Will catch you later in the week.
>
> Chris
>
> On Tue, Oct 21, 2014 at 10:42 AM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
> > Hi Chris,
> >
> > Switched to devel for now.
> >
> > I am not sure that just iarg1+iarg2 will be enough (ignoring carg),
> > since 1 component (e.g., ATM) is doing multiple non-blocking sends
> > one after the other, using the same iarg1 & iarg2 but a different carg
> > (in pkg/atm_compon_interf/cpl_export_my_data.F).
> > There are BARRIER calls in atm_export_fld.F, but just for threads.
> > And the fact that there is a check on HEADER content makes it safer,
> > but does not avoid the need to have the right tag, I think.
> >
> > But it's probably true that the sum iarg1+iarg2 could be used as part
> > of a simpler tag expression, since all comp{send/rec}_r8tiles we use
> > have the same iarg1, and the other send/rec either don't use iarg2 or
> > come from an early initialisation call that would not interfere with
> > the others.
> >
> > Maybe we could try to get this MPI max tag value (using MPI_Get_attr?)
> > and make up a tag number that fits.
> >
> > Cheers,
> > Jean-Michel
> >
> > On Mon, Oct 20, 2014 at 10:44:30PM -0400, Chris Hill wrote:
> >> Hi David and Jean-Michel,
> >>
> >> I am unsure why generate_tag doesn’t just do "iarg1+iarg2", but
> >> let’s try Jean-Michel’s fix. Not sure who introduced the funky
> >> hash stuff in the first place. My recollection is that "iarg1+iarg2"
> >> should be all that is needed, but that code was a long time ago!
> >>
> >> Chris
> >>
> >> On Mon, Oct 20, 2014 at 10:29 PM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
> >> > Hi David,
> >> >
> >> > Could you try to replace the original
> >> > pkg/compon_communic/generate_tag.F
> >> > with the modified "generate_tag.F" routine (attached to this email)
> >> > and check that it works?
> >> >
> >> > Cheers,
> >> > Jean-Michel
> >> >
> >> > On Mon, Oct 20, 2014 at 03:14:51PM +0100, David Ferreira wrote:
> >> >> Hi all,
> >> >> I'm having a problem running the coupled model on Archer.
> >> >> The Archer support team discovered the following problem:
> >> >>
> >> >> ##########
> >> >> ...
> >> >> ---
> >> >> Rank 0 [Tue Oct 7 22:44:10 2014] [c7-0c1s12n2] Fatal error in MPI_Recv: Invalid tag, error stack:
> >> >> MPI_Recv(192): MPI_Recv(buf=0x7fffffff8000, count=1024, MPI_INTEGER, src=1, tag=9862928, comm=0x84000004, status=0x7fffffff7df0) failed
> >> >> MPI_Recv(113): Invalid tag, value is 9862928
> >> >> ---
> >> >>
> >> >> The maximum allowed value for the "tag" in MPI messages on Cray XC30 systems
> >> >> is 4194303, which explains the error message
> >> >>
> >> >> MPI_Recv(113): Invalid tag, value is 9862928
> >> >>
> >> >> The MPI standard only requires that the maximum tag value is not less than
> >> >> 32767 (so Cray MPI is compliant). The maximum can be larger, and can be
> >> >> found using the MPI_Get_attr enquiry function. Your program should retrieve
> >> >> this value and ensure that any tags it specifies are no larger.
> >> >>
> >> >> ##########
> >> >>
> >> >> I have absolutely no clue how to tell the model to choose MPI tags
> >> >> which are below 4194303.
> >> >> A bit of googling on MPI_Get_attr led me to some very obscure pages.
> >> >>
> >> >> Any help is welcome.
> >> >> Cheers,
> >> >> david
> >> >>
> >> >> _______________________________________________
> >> >> MITgcm-support mailing list
> >> >> MITgcm-support at mitgcm.org
> >> >> http://mitgcm.org/mailman/listinfo/mitgcm-support
> >
> > _______________________________________________
> > MITgcm-devel mailing list
> > MITgcm-devel at mitgcm.org
> > http://mitgcm.org/mailman/listinfo/mitgcm-devel