[MITgcm-devel] [MITgcm-support] MPI problem on Archer (CRAY XC30)
Chris Hill
cnh at mit.edu
Tue Oct 21 10:46:09 EDT 2014
Hi J-M,
Think this is even below devel!
I have a feeling that messages are uniquely id’d through
tag and rank, not just tag alone? Anyhow, we can definitely tidy
up - just need to remember what on earth we were thinking
when we did this :-).
Will catch you later in the week.
Chris
On Tue, Oct 21, 2014 at 10:42 AM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
> Hi Chris,
>
> Switched to devel for now.
>
> I am not sure that just iarg1+iarg2 will be enough (ignoring carg),
> since one component (e.g., ATM) does multiple non-blocking sends
> one after the other, using the same iarg1 & iarg2 but a different carg
> (in pkg/atm_compon_interf/cpl_export_my_data.F).
> There are BARRIER calls in atm_export_fld.F, but just for threads.
> And the fact that there is a check on HEADER content makes it safer,
> but does not avoid the need to have the right tag, I think.
>
> But it's probably true that the sum iarg1+iarg2 could be used as part
> of a simpler tag expression, since all the comp{send/rec}_r8tiles calls
> we use have the same iarg1, and the other send/rec calls either don't
> use iarg2 or come from an early initialisation call that would not
> interfere with the others.
>
> Maybe we could try to get this MPI max tag value (using MPI_Get_attr?)
> and make up a tag number that fits.
>
> Cheers,
> Jean-Michel
>
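A minimal C sketch of the kind of tag expression discussed above, assuming each carg value can be mapped to a small field index. The function make_tag and its field_index and nfields parameters are hypothetical and do not reproduce pkg/compon_communic/generate_tag.F. It keeps the sum iarg1+iarg2 as part of the tag, so consecutive sends that share iarg1 and iarg2 but differ in carg still get distinct tags, and it refuses any tag above the MPI upper bound instead of wrapping it:

```c
#include <assert.h>

/* Hypothetical tag scheme, not MITgcm's generate_tag.F.
 * field_index: assumed small integer standing in for carg (0 .. nfields-1).
 * tag_ub:      largest tag the MPI library accepts (the MPI_TAG_UB attribute).
 * Returns a tag in [0, tag_ub], or -1 if the combination does not fit. */
int make_tag(int iarg1, int iarg2, int field_index, int nfields, int tag_ub)
{
    assert(nfields > 0);
    assert(field_index >= 0 && field_index < nfields);

    /* Widen before the arithmetic so large arguments cannot overflow int. */
    long long base = (long long)iarg1 + (long long)iarg2;
    long long tag = base * nfields + field_index;

    /* Refuse rather than wrap: a fold such as tag % (tag_ub + 1) would stay
     * legal but could silently give two different messages the same tag. */
    if (tag < 0 || tag > tag_ub) {
        return -1;
    }
    return (int)tag;
}
```

For example, with the Cray XC30 bound of 4194303 and nfields = 8, every tag fits as long as iarg1+iarg2 stays at or below 524287. Returning -1 makes an overflow visible at the call site instead of producing a tag that MPI_Recv rejects at run time.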
> On Mon, Oct 20, 2014 at 10:44:30PM -0400, Chris Hill wrote:
>> Hi David and Jean-Michel,
>>
>> I am unsure why generate_tag doesn’t just do "iarg1+iarg2", but
>> let’s try Jean-Michel’s fix. Not sure who introduced the funky
>> hash stuff in the first place. My recollection is that “iarg1+iarg2”
>> should be all that is needed, but that code was written a long time ago!
>>
>> Chris
>>
>> On Mon, Oct 20, 2014 at 10:29 PM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
>> > Hi David,
>> >
>> > Could you try to replace the original
>> > pkg/compon_communic/generate_tag.F
>> > with the modified "generate_tag.F" routine (attached to this email)
>> > and check that it works ?
>> >
>> > Cheers,
>> > Jean-Michel
>> >
>> > On Mon, Oct 20, 2014 at 03:14:51PM +0100, David Ferreira wrote:
>> >> Hi all,
>> >> I'm having a problem running the coupled model on Archer.
>> >> The Archer support team discovered the following problem:
>> >>
>> >> ##########
>> >> ...
>> >> ---
>> >> Rank 0 [Tue Oct 7 22:44:10 2014] [c7-0c1s12n2] Fatal error in MPI_Recv: Invalid tag, error stack:
>> >> MPI_Recv(192): MPI_Recv(buf=0x7fffffff8000, count=1024, MPI_INTEGER, src=1, tag=9862928, comm=0x84000004, status=0x7fffffff7df0) failed
>> >> MPI_Recv(113): Invalid tag, value is 9862928
>> >> ---
>> >>
>> >> The maximum allowed value for the "tag" in MPI messages on Cray XC30 systems
>> >> is 4194303, which explains the error message
>> >>
>> >> MPI_Recv(113): Invalid tag, value is 9862928
>> >>
>> >> The MPI standard only requires that the maximum tag value is not less than
>> >> 32767 (so Cray MPI is compliant). The maximum can be larger, and can be
>> >> found using the MPI_Get_attr enquiry function. Your program should retrieve
>> >> this value and ensure that any tags it specifies are no larger.
>> >>
>> >> ##########
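For reference, the standard query is MPI_Comm_get_attr with the predefined key MPI_TAG_UB (MPI_Attr_get is the deprecated MPI-1 form). A minimal C sketch, assuming MPI_Init has already been called; HAVE_MPI is a hypothetical build flag that lets the fallback path compile without an MPI library, and query_tag_ub and tag_is_valid are illustrative names:

```c
#include <assert.h>
#ifdef HAVE_MPI
#include <mpi.h>
#endif

/* Smallest tag upper bound any conforming MPI implementation must provide. */
#define GUARANTEED_TAG_UB 32767

/* Largest usable tag on MPI_COMM_WORLD. When HAVE_MPI is defined,
 * this must be called after MPI_Init. */
int query_tag_ub(void)
{
#ifdef HAVE_MPI
    int *ub_ptr = NULL; /* for MPI_TAG_UB, MPI returns a pointer to an int */
    int flag = 0;
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &ub_ptr, &flag);
    if (flag && ub_ptr != NULL) {
        assert(*ub_ptr >= GUARANTEED_TAG_UB);
        return *ub_ptr;
    }
#endif
    /* Without MPI, or if the attribute is unset, use the guaranteed minimum. */
    return GUARANTEED_TAG_UB;
}

/* Non-zero if tag is a legal user tag for the given upper bound. */
int tag_is_valid(long long tag, int tag_ub)
{
    return tag >= 0 && tag <= tag_ub;
}
```

Applied to the log above, tag_is_valid(9862928, 4194303) is false, matching the tag that Cray MPI rejected; any tag generator can check its result this way before calling MPI_Send or MPI_Recv.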
>> >>
>> >> I have absolutely no clue how to tell the model to choose MPI tags
>> >> which are below 4194303.
>> >> A bit of googling on MPI_Get_attr led me to some very obscure pages.
>> >>
>> >> Any help is welcome.
>> >> Cheers,
>> >> david
>> >>
>> >> _______________________________________________
>> >> MITgcm-support mailing list
>> >> MITgcm-support at mitgcm.org
>> >> http://mitgcm.org/mailman/listinfo/mitgcm-support
>> >
>>
>
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel