[MITgcm-devel] [MITgcm-support] MPI problem on Archer (CRAY XC30)

David Ferreira dfer at mit.edu
Wed Oct 22 02:58:58 EDT 2014


BTW, should I/you check in the modified version of generate_tag.F ?
cheers,
david

On 10/21/14 8:05 PM, Jean-Michel Campin wrote:
> Hi Chris,
>
> you are right: the way the MPI tag is currently generated only
> accounts for the first 3 characters of the name we pass,
> and they happen to be identical (=ATM or =OCN) for all fields that
> come from / go to the same component. And apparently it works!
>
> Cheers,
> Jean-Michel
>
> On Tue, Oct 21, 2014 at 10:46:09AM -0400, Chris Hill wrote:
>> Hi J-M,
>>
>>   Think this is even below devel!
>>
>>   I have a feeling that messages are uniquely id’d through
>>   tag and rank, not just tag alone? Anyhow, we can definitely tidy
>>   up - just need to remember what on earth we were thinking
>>   when we did this :-).
>>
>>   Will catch you later in the week.
>>
>> Chris
>>
>> On Tue, Oct 21, 2014 at 10:42 AM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
>>> Hi Chris,
>>>
>>> Switched to devel for now.
>>>
>>> I am not sure that just iarg1+iarg2 will be enough (ignoring carg),
>>> since one component (e.g., ATM) is doing multiple non-blocking sends
>>> one after the other, using the same iarg1 & iarg2 but a different carg
>>> (in pkg/atm_compon_interf/cpl_export_my_data.F).
>>> There are BARRIER calls in atm_export_fld.F, but just for threads.
>>> And the fact that there is a check on the HEADER content makes it safer,
>>> but does not avoid the need to have the right tag, I think.
>>>
>>> But it's probably true that the sum iarg1+iarg2 could be used as part
>>> of a simpler tag expression, since all the comp{send/rec}_r8tiles calls we use
>>> have the same iarg1, and the other send/rec calls either don't use iarg2 or
>>> come from an early initialisation call that would not interfere with
>>> the others.
>>>
>>> Maybe we could try to get this MPI max tag value (using MPI_Get_attr ?)
>>> and make up a tag number that fits.
>>>
>>> Cheers,
>>> Jean-Michel
>>>
>>> On Mon, Oct 20, 2014 at 10:44:30PM -0400, Chris Hill wrote:
>>>> Hi David and Jean-Michel,
>>>>
>>>>   I am unsure why generate_tag doesn’t just do "iarg1+iarg2", but
>>>> let’s try Jean-Michel’s fix. Not sure who introduced the funky
>>>> hash stuff in the first place. My recollection is that “iarg1+iarg2”
>>>> should be all that is needed, but that code was a long time ago!
>>>>
>>>> Chris
>>>>
>>>> On Mon, Oct 20, 2014 at 10:29 PM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
>>>>> Hi David,
>>>>>
>>>>> Could you try to replace the original
>>>>>   pkg/compon_communic/generate_tag.F
>>>>> with the modified "generate_tag.F" routine (attached to this email)
>>>>> and check that it works ?
>>>>>
>>>>> Cheers,
>>>>> Jean-Michel
>>>>>
>>>>> On Mon, Oct 20, 2014 at 03:14:51PM +0100, David Ferreira wrote:
>>>>>> Hi all,
>>>>>> I'm having a problem running the coupled model on Archer.
>>>>>> The Archer support team discovered the following problem:
>>>>>>
>>>>>> ##########
>>>>>> ...
>>>>>> ---
>>>>>> Rank 0 [Tue Oct  7 22:44:10 2014] [c7-0c1s12n2] Fatal error in MPI_Recv: Invalid tag, error stack:
>>>>>> MPI_Recv(192): MPI_Recv(buf=0x7fffffff8000, count=1024, MPI_INTEGER, src=1, tag=9862928,
>>>>>>   comm=0x84000004, status=0x7fffffff7df0) failed
>>>>>> MPI_Recv(113): Invalid tag, value is 9862928
>>>>>> ---
>>>>>>
>>>>>> The maximum allowed value for the "tag" in MPI messages on Cray XC30 systems
>>>>>> is 4194303, which explains the error message
>>>>>>
>>>>>> MPI_Recv(113): Invalid tag, value is 9862928
>>>>>>
>>>>>> The MPI standard only requires that the maximum tag value is not less than
>>>>>> 32767 (so Cray MPI is compliant). The maximum can be larger, and can be found
>>>>>> using the MPI_Get_attr enquiry function. Your program should retrieve this
>>>>>> value and ensure that any tags it specifies are no larger.
>>>>>>
>>>>>> ##########
>>>>>>
>>>>>> I have absolutely no clue how to tell the model to choose MPI tags
>>>>>> which are below 4194303.
>>>>>> A bit of googling on MPI_Get_attr led me to some very obscure pages.
>>>>>>
>>>>>> Any help is welcome.
>>>>>> Cheers,
>>>>>> david
>>>>>>
>>>>>> _______________________________________________
>>>>>> MITgcm-support mailing list
>>>>>> MITgcm-support at mitgcm.org
>>>>>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel

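The idea raised in the thread (take the MPI maximum tag value and make up a tag number that fits) can be sketched as below. This is only an illustration in Python, not the contents of the modified generate_tag.F, which is not shown in the thread. The function name make_tag, the constant MPI_MIN_TAG_UB, and the multiplier 31 are assumptions made for the example; iarg1, iarg2 and carg are the argument names used in the discussion. In a real MPI program the upper bound would be read from the MPI_TAG_UB attribute of the communicator (via MPI_Comm_get_attr) rather than hard-coded.

```python
# Hypothetical sketch: these names do not come from generate_tag.F.
MPI_MIN_TAG_UB = 32767  # smallest tag upper bound the MPI standard guarantees


def make_tag(iarg1, iarg2, carg, tag_ub=MPI_MIN_TAG_UB):
    """Fold two integers and a field name into a tag in [0, tag_ub]."""
    if tag_ub < 1:
        raise ValueError("tag_ub must be positive")
    modulus = tag_ub + 1
    h = 0
    # Use every character of the name, not only the first three, so that
    # different field names tend to produce different tags.
    for ch in carg.strip():
        h = (h * 31 + ord(ch)) % modulus
    # Mix in the integer arguments and fold back into the valid range.
    return (h * 31 * 31 + iarg1 * 31 + iarg2) % modulus


if __name__ == "__main__":
    # Default bound (32767) and the Cray XC30 bound quoted in the thread.
    print(make_tag(1, 2, "ATM_field"))
    print(make_tag(1, 2, "ATM_field", tag_ub=4194303))
```

Sender and receiver must compute the tag from the same inputs with the same bound, which is why the sketch uses an explicit character loop instead of Python's built-in hash() (that one can differ between processes). Folding into a small range means two different names can still collide, so a check on the message header, as mentioned in the thread, remains useful.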