[MITgcm-devel] [MITgcm-support] MPI problem on Archer (CRAY XC30)
Jean-Michel Campin
jmc at ocean.mit.edu
Wed Oct 22 08:50:55 EDT 2014
Hi David,
Not yet ready for checking in this change.
Cheers,
Jean-Michel
On Wed, Oct 22, 2014 at 07:58:58AM +0100, David Ferreira wrote:
>
> BTW, should I/you check in the modified version of generate_tag.F ?
> cheers,
> david
>
> On 10/21/14 8:05 PM, Jean-Michel Campin wrote:
> >Hi Chris,
> >
> >you are right: the way the MPI-tag is currently generated only
> >accounts for the first 3 characters of the name we passed,
> >and they happen to be identical (=ATM or =OCN) for all fields that
> >come from / go to the same component. And apparently it works!
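
To make the point above concrete, here is a minimal sketch of a tag that looks only at the first 3 characters of the name. The function prefix_tag, its formula, and the field names are made up for illustration; this is not the code in generate_tag.F. MPI matches a receive on source, tag and communicator, and messages between the same pair of ranks with the same tag are delivered in the order they were sent, which may be why identical tags still appear to work.

```python
def prefix_tag(iarg1, iarg2, carg, nchars=3):
    """Toy tag built from two integers and the first `nchars` characters of carg.

    Hypothetical formula for illustration only; not the one in generate_tag.F.
    """
    h = 0
    for ch in carg[:nchars]:
        h = h * 128 + ord(ch)  # pack the leading characters into one integer
    return h + iarg1 + iarg2


# Made-up field names: two from the ATM component, one from OCN.
names = ["ATM HeatFlux", "ATM TauX", "OCN SST"]
tags = {name: prefix_tag(1, 2, name) for name in names}

for name, tag in tags.items():
    print(f"{name:14s} -> {tag}")

# Fields that share a 3-character prefix end up with the same tag.
same_component_collides = tags["ATM HeatFlux"] == tags["ATM TauX"]
print("ATM fields share a tag:", same_component_collides)
```
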
> >
> >Cheers,
> >Jean-Michel
> >
> >On Tue, Oct 21, 2014 at 10:46:09AM -0400, Chris Hill wrote:
> >>Hi J-M,
> >>
> >> Think this is even below devel!
> >>
> >> I have a feeling that messages are uniquely id’d through
> >> tag and rank, not just tag alone? Anyhow, we can definitely tidy
> >> up - just need to remember what on earth we were thinking
> >> when we did this :-).
> >>
> >> Will catch you later in the week.
> >>
> >>Chris
> >>
> >>On Tue, Oct 21, 2014 at 10:42 AM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
> >>>Hi Chris,
> >>>
> >>>Switched to devel for now.
> >>>
> >>>I am not sure that just iarg1+iarg2 will be enough (ignoring carg),
> >>>since one component (e.g., ATM) is doing multiple non-blocking sends
> >>>one after the other, using the same iarg1 & iarg2 but different carg
> >>>(in pkg/atm_compon_interf/cpl_export_my_data.F).
> >>>There are BARRIER calls in atm_export_fld.F, but just for threads.
> >>>And the fact that there is a check on HEADER content makes it safer,
> >>>but does not avoid the need to have the right tag, I think.
> >>>
> >>>But it's probably true that the sum iarg1+iarg2 could be used as part
> >>>of a simpler tag expression, since all the comp{send/rec}_r8tiles calls we use
> >>>have the same iarg1, and the other send/rec calls either don't use iarg2 or
> >>>come from an early initialisation call that would not interfere with
> >>>the others.
> >>>
> >>>Maybe we could try to get this MPI max tag value (using MPI_Get_attr?)
> >>>and make up a tag number that fits.
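
One way to "make up a tag number that fits" is to fold the tag into the range [0, tag_ub] with a modulo. The sketch below is only an illustration under assumptions: bounded_tag, its hash, and the field name are hypothetical and not the fix that was attached to the thread. In a real MPI program tag_ub would come from the attribute query the thread calls MPI_Get_attr (in the MPI standard the call is MPI_Comm_get_attr with the key MPI_TAG_UB); here it is passed in as a plain argument so the example runs without MPI.

```python
MPI_MIN_TAG_UB = 32767  # smallest tag upper bound the MPI standard allows


def bounded_tag(iarg1, iarg2, carg, tag_ub=MPI_MIN_TAG_UB):
    """Hypothetical tag that always lies in [0, tag_ub].

    Hashes the whole name (not just a prefix) and folds the result with a
    modulo. Folding can map two different names to the same tag, so this
    trades uniqueness for staying inside the allowed range.
    """
    modulus = tag_ub + 1
    h = 0
    for ch in carg:
        h = (h * 31 + ord(ch)) % modulus  # keep the running hash bounded
    return (h + iarg1 + iarg2) % modulus


# Made-up field name; 4194303 is the Cray XC30 limit quoted in the thread.
tag_default = bounded_tag(1, 2, "ATM HeatFlux")
tag_cray = bounded_tag(1, 2, "ATM HeatFlux", tag_ub=4194303)
print(tag_default, tag_cray)
```

Using the standard's minimum of 32767 as the default keeps the tag portable even when the real upper bound has not been queried.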
> >>>
> >>>Cheers,
> >>>Jean-Michel
> >>>
> >>>On Mon, Oct 20, 2014 at 10:44:30PM -0400, Chris Hill wrote:
> >>>>Hi David and Jean-Michel,
> >>>>
> >>>> I am unsure why generate_tag doesn’t just do "iarg1+iarg2", but
> >>>>let’s try Jean-Michel’s fix. Not sure who introduced the funky
> >>>>hash stuff in the first place. My recollection is that “iarg1+iarg2”
> >>>>should be all that is needed, but that code was a long time ago!
> >>>>
> >>>>Chris
> >>>>
> >>>>On Mon, Oct 20, 2014 at 10:29 PM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
> >>>>>Hi David,
> >>>>>
> >>>>>Could you try to replace the original
> >>>>> pkg/compon_communic/generate_tag.F
> >>>>>with the modified "generate_tag.F" routine (attached to this email)
> >>>>>and check that it works ?
> >>>>>
> >>>>>Cheers,
> >>>>>Jean-Michel
> >>>>>
> >>>>>On Mon, Oct 20, 2014 at 03:14:51PM +0100, David Ferreira wrote:
> >>>>>>Hi all,
> >>>>>>I'm having a problem running the coupled model on Archer.
> >>>>>>The Archer support team discovered the following problem:
> >>>>>>
> >>>>>>##########
> >>>>>>...
> >>>>>>---
> >>>>>>Rank 0 [Tue Oct 7 22:44:10 2014] [c7-0c1s12n2] Fatal error in MPI_Recv: Invalid tag, error stack:
> >>>>>>MPI_Recv(192): MPI_Recv(buf=0x7fffffff8000, count=1024, MPI_INTEGER, src=1, tag=9862928, comm=0x84000004, status=0x7fffffff7df0) failed
> >>>>>>MPI_Recv(113): Invalid tag, value is 9862928
> >>>>>>---
> >>>>>>
> >>>>>>The maximum allowed value for the "tag" in MPI messages on Cray XC30 systems
> >>>>>>is 4194303, which explains the error message
> >>>>>>
> >>>>>>MPI_Recv(113): Invalid tag, value is 9862928
> >>>>>>
> >>>>>>The MPI standard only requires that the maximum tag value is not less than
> >>>>>>32767 (so Cray MPI is compliant). The maximum can be larger, and can be found
> >>>>>>using the MPI_Get_attr enquiry function. Your program should retrieve this
> >>>>>>value and ensure that any tags it specifies are no larger.
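
The arithmetic behind the support team's diagnosis can be checked in a few lines. This is a minimal sketch: the helper tag_is_valid is hypothetical, and the two limits are the values quoted in the message above rather than values queried from a running MPI library.

```python
MPI_MIN_TAG_UB = 32767      # minimum upper bound required by the MPI standard
CRAY_XC30_TAG_UB = 4194303  # limit reported for Cray XC30 in the thread
FAILING_TAG = 9862928       # tag shown in the MPI_Recv error message


def tag_is_valid(tag, tag_ub):
    """Return True if `tag` lies in the range 0..tag_ub accepted for MPI tags."""
    return 0 <= tag <= tag_ub


# Both limits are of the form 2**n - 1.
print(CRAY_XC30_TAG_UB == 2**22 - 1, MPI_MIN_TAG_UB == 2**15 - 1)

# The failing tag exceeds the Cray limit, hence "Invalid tag".
print(tag_is_valid(FAILING_TAG, CRAY_XC30_TAG_UB))
```
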
> >>>>>>
> >>>>>>##########
> >>>>>>
> >>>>>>I have absolutely no clue how to tell the model to choose MPI tags
> >>>>>>which are below 4194303.
> >>>>>>A bit of googling on MPI_Get_attr led me to some very obscure pages.
> >>>>>>
> >>>>>>Any help is welcome.
> >>>>>>Cheers,
> >>>>>>david
> >>>>>>
> >>>>>>_______________________________________________
> >>>>>>MITgcm-support mailing list
> >>>>>>MITgcm-support at mitgcm.org
> >>>>>>http://mitgcm.org/mailman/listinfo/mitgcm-support
> >>>_______________________________________________
> >>>MITgcm-devel mailing list
> >>>MITgcm-devel at mitgcm.org
> >>>http://mitgcm.org/mailman/listinfo/mitgcm-devel