[MITgcm-support] CRAY XD1
Martin Losch
mlosch at awi-bremerhaven.de
Thu May 12 17:20:55 EDT 2005
Chris,
there has been some progress on the CRAY XD1 problem. To recap: on
our XD1 the model would not run with MPI (segmentation fault), but
after removing the -r8 compiler flag I could run two different
configurations on up to 30 and 32 CPUs (linux_amd64_pgf77+mpi_xd1). The
next step would have been 50 and 64 CPUs, but with these I got
segmentation faults again.
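For reference, the workaround so far amounts to dropping the default-real
promotion flag from the Fortran flags in the optfile; a minimal sketch,
assuming the usual FC/FFLAGS layout of a genmake2 optfile (everything
except -r8 is a placeholder, not the literal linux_amd64_pgf77+mpi_xd1
contents):

   # illustrative optfile fragment only -- not the literal
   # linux_amd64_pgf77+mpi_xd1 file; flags other than -r8 are placeholders
   FC='pgf77'
   FFLAGS="$FFLAGS -byteswapio"
   # FFLAGS="$FFLAGS -r8"   # removed: with -r8 the MPI runs segfaulted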
The Cray people (to whom I am CC'ing this email) found out that adding a
write statement in mon_out.F makes the 50-CPU run go through, which
points towards a stacksize problem. This is what they did:
> diff -c mon_out.F_orig mon_out.F
> *** mon_out.F_orig Tue May 10 16:45:34 2005
> --- mon_out.F Tue May 10 16:46:03 2005
> ***************
> *** 194,199 ****
> --- 194,200 ----
> #endif /* ALLOW_USE_MPI */
>
>       IF (mon_write_stdout) THEN
> +       write(0,*) 'itype =',itype
>         IF (itype .EQ. 1)
>      &   WRITE(msgBuf(36:57),'(1X,I21)') ival
>
>         IF (itype .EQ. 2)
I can reproduce that. This is again close to a "CALL PRINT_MESSAGE"
statement (like the original segmentation fault, see my initial email
quoted below), so I was wondering whether it has something to do with
that routine. I commented out the entire body of PRINT_MESSAGE, but that
did not do the trick. Do you have any idea what may be going on?
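Following your earlier ulimit suggestion (quoted below), I will also try
raising the stack limit before the MPI launch; a minimal sketch, assuming
a bash-like shell on the nodes (csh/tcsh syntax differs):

   # raise the per-process stack limit for this shell and its children
   ulimit -s unlimited           # sh/bash/ksh
   # limit stacksize unlimited   # csh/tcsh equivalent
   ulimit -s                     # verify: should now print "unlimited"

Whether the limit actually reaches the MPI ranks depends on how the
launcher starts processes on the compute nodes, so it may also have to go
into the shell startup file that gets sourced there.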
Martin
PS. Still no test account for you, but I'll keep trying.
On Apr 5, 2005, at 1:13 PM, Chris Hill wrote:
> Hi Martin,
>
> If you can get us an account that would be great.
>
> Thanks,
>
> Chris
> Martin Losch wrote:
>> Chris,
>> I had help from CRAY support, and they could get rid of the
>> problem by removing the -r8 option for the pgf77 compiler. Now I only
>> get segmentation faults if the number of CPUs is too large
>> (currently 50 is a problem). But the CRAY people can reproduce this
>> and have promised to find the problem. I assume that there's a bug in
>> the MPI implementation (MPICH). In the meantime, I have asked for test
>> accounts (again), in case you are interested, but no success so far
>> (never-ending story). I'll keep you posted.
>> Martin
>> On Apr 1, 2005, at 3:17 PM, Chris Hill wrote:
>>> Hi Martin,
>>>
>>> Did you get anywhere on this? (I can't see a reply, but I have been
>>> away.)
>>> One thing that can make this symptom happen is stacksize problems in
>>> the shell. Can you set the stacksize to "unlimited" in your shell and
>>> see what happens? I can't remember the command for this and I'm not
>>> properly online - it's something like "ulimit", I think.
>>>
>>> Chris
>>> Martin Losch wrote:
>>>
>>>> Hi,
>>>> we have a new platform, a CRAY XD1 (basically Opteron CPUs with a
>>>> very fast network), and I am having a hard time on this machine:
>>>> I was able to run a global_ocean experiment (180x80x23, the ecco
>>>> 2deg configuration, with KPP, GMRedi, etc.) with up to 12 CPUs -
>>>> it scales and everything. But I have a different configuration
>>>> (1500x1x100, or 256x256x100, Cartesian coordinates,
>>>> non-hydrostatic, no other extra packages, no netcdf, runs fine
>>>> on many platforms with up to 20 CPUs so far) which ends with a
>>>> segmentation fault in the routine packages_boot.F (actually, while
>>>> printing a message with print_message). It works only with MPI
>>>> turned off.
>>>> For these two cases I use the same build_options file
>>>> (linux_amd64_pgf77+mpi_xd1).
>>>> Any idea what might be going on?
>>>> Martin