[MITgcm-devel] archer CRAY_XC30

David Ferreira dfer at mit.edu
Thu Apr 30 09:38:26 EDT 2015


Hi Martin,
OK, if it is not a small thing to do, I won't do it :-)
But this will probably be done in the coming months. I'll keep in touch.
cheers,
david

On 4/30/15 8:33 AM, Martin Losch wrote:
> Hi David,
>
> for a scaling analysis you’d need an application that is large enough to have a chance of scaling (something like a 1024^2 or 2048^2 domain) and then run it on different numbers of processors. Don’t take that too lightly: I have now spent most of the past 3 weeks trying to figure out the scaling for “my” configuration (JPL’s arctic_4km set-up), and I have come to the conclusion (with a little help from the sysadmins) that the linear scalability limit is reached earlier than expected, due to a somewhat slow mpi_allreduce. The pat_build/pat_report/app2 tools are actually quite useful (although there was a steep learning curve for me, too).
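
The latency of a single-scalar reduction (the pattern behind the model's global_sum and global_max) can be checked in isolation by timing MPI_Allreduce at the same processor counts as the scaling runs. A minimal sketch in C, assuming a standard MPI installation (this is not MITgcm code):

    /* Time one scalar MPI_Allreduce, averaged over many repetitions.
     * Run at several processor counts and compare the per-call cost. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const int nrep = 1000;        /* repetitions to average over  */
        double local = 1.0, global;   /* one scalar, as in global_sum */
        int rank, nprocs;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < nrep; i++)
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("%6d ranks: %.3e s per allreduce\n",
                   nprocs, (t1 - t0) / nrep);

        MPI_Finalize();
        return 0;
    }

If the per-call time grows with the rank count, it puts a ceiling on any routine that calls a global reduction every iteration, which is consistent with the mpi_allreduce bottleneck described here.
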
>
> So maybe there is no need to repeat that on Archer now, if you’re going for an optimization of the code anyway.
>
> Martin
>
>> On 30 Apr 2015, at 09:06, David Ferreira <dfer at mit.edu> wrote:
>>
>> Hi Martin,
>> Sorry for the lapse. And thanks for the tip about TARGET_CRAYXT, it helps (and indeed produces a lot of scratch* files).
>>
>> I don't have any profiling of the model on Archer (and I'm not sure about the pat_* tools). Profiling will be part of the Archer optimization, but that will take a while.
>>
>> Meanwhile, I need to set up a configuration on Archer. Would it be useful to just do a quick test with different numbers of processors, or do you need something more extensive?
>>
>> cheers,
>> david
>>
>>
>> On 4/27/15 4:57 PM, Martin Losch wrote:
>>> Hi David,
>>>
>>> thanks for your answer. BTW I don’t know if the model is slow or not, it just does not scale as expected with an increasing number of processors.
>>>
>>> I needed to define TARGET_CRAYXT to run with OpenMP because the different threads somehow tried to open the same “scratch” files for reading the namelists. With this option I didn’t have any trouble at all (except that I now have many useless scratch*-files in the run directory).
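
The collision here is generic: several OpenMP threads opening one and the same temporary file name. A hedged C illustration of the usual fix, a per-thread suffix on the name (the real MITgcm namelist reader is Fortran, and TARGET_CRAYXT selects its own code path; this only shows the idea):

    /* Each thread writes to its own scratch file, e.g. scratch1.0003,
     * instead of all threads clobbering a single "scratch1".
     * Compile with OpenMP enabled (e.g. -fopenmp with GCC).           */
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        #pragma omp parallel
        {
            char name[64];
            snprintf(name, sizeof(name), "scratch1.%04d",
                     omp_get_thread_num());
            FILE *f = fopen(name, "w");
            if (f) {
                fprintf(f, "thread %d namelist copy\n",
                        omp_get_thread_num());
                fclose(f);
            }
        }
        return 0;
    }
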
>>>
>>> Did you do any profiling of the code? From running pat_build and pat_report (I don’t know if that’s available on Archer) I find that MPI_ALLREDUCE seems to be the problem.
>>>
>>> M.
>>>> On 27 Apr 2015, at 17:45, David Ferreira <dfer at mit.edu> wrote:
>>>>
>>>> Hi Martin,
>>>> I'm doing good! I'm almost fully British by now.
>>>>
>>>> So to answer your question:
>>>> the /home/n02/n02/dfer/linux_Archer_cray file is indeed the checked-in one. I should fix this to avoid confusion.
>>>>
>>>> I did not tweak the Cray environment. In fact I went for the bare minimum: getting the testreports (normal+restart) running and giving decent results. I failed to get mixed mode (MPI+OpenMP) running. Whatever I tried, I got an error with the synchronization of the threads (did you run into such an issue?).
>>>> That said, I did not find the model to be slow; it seems comparable to or faster than on the NASA Pleiades.
>>>>
>>>> Not very useful for you, sorry.
>>>>
>>>> One reason I did not do much is that we got a small technology grant to install and optimize the MITgcm and its adjoint on Archer (an engineer from the Archer team will carry out the work). This includes improving the speed of the forward model (optimization, obviously, plus, for example, using the PETSc library in place of cg2d).
>>>> I'm happy to pass on whatever good comes out of this testing.
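
For a sense of what using PETSc in place of cg2d could look like: the pressure solve becomes a KSP solve on an assembled operator, with the Krylov method and preconditioner selectable at run time. A minimal sketch, using a toy 1-D Laplacian as a stand-in for the real 2-D pressure operator (assumes a PETSc installation; not MITgcm code):

    /* Solve A x = b with PETSc's conjugate-gradient KSP.
     * A is a toy 1-D Laplacian here, only to make the sketch runnable. */
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
        Mat A;  Vec x, b;  KSP ksp;
        PetscInt i, n = 100, Istart, Iend;

        PetscInitialize(&argc, &argv, NULL, NULL);

        MatCreate(PETSC_COMM_WORLD, &A);
        MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
        MatSetFromOptions(A);
        MatSetUp(A);
        MatGetOwnershipRange(A, &Istart, &Iend);
        for (i = Istart; i < Iend; i++) {
            if (i > 0)   MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);
            if (i < n-1) MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);
            MatSetValue(A, i, i, 2.0, INSERT_VALUES);
        }
        MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
        MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

        MatCreateVecs(A, &x, &b);
        VecSet(b, 1.0);                 /* right-hand side             */

        KSPCreate(PETSC_COMM_WORLD, &ksp);
        KSPSetOperators(ksp, A, A);
        KSPSetType(ksp, KSPCG);         /* conjugate gradients         */
        KSPSetFromOptions(ksp);         /* honour -ksp_type, -pc_type  */
        KSPSolve(ksp, b, x);

        KSPDestroy(&ksp); VecDestroy(&x); VecDestroy(&b); MatDestroy(&A);
        PetscFinalize();
        return 0;
    }

The practical appeal is KSPSetFromOptions: solver and preconditioner can be switched from the command line (-ksp_type, -pc_type) without recompiling, which makes it easy to compare against the existing cg2d.
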
>>>>
>>>> cheers,
>>>> david
>>>>
>>>> On 4/27/15 8:47 AM, Martin Losch wrote:
>>>>> Hi David,
>>>>>
>>>>> how are you doing?
>>>>>
>>>>> I am running the MITgcm on ECMWF’s Cray XC30, which is probably similar to archer. I have a few scaling issues that may be related to environment variables, compile options, etc.
>>>>>
>>>>> From testing.html I saw that you don’t use the checked-in build options file for linux_ia64_cray_archer, but /home/n02/n02/dfer/linux_Archer_cray. Are there any significant differences between these files? What else do you do to tweak the Cray environment? Do you link with the dmapp libraries, hugepages, etc.? I really don’t know what these things mean, but I am seeing only very little improvement with these options (solve_for_pressure and seaice_dynamics, with their multiple calls to mpi_allreduce in global_sum or global_max, are the problem). I also use the “single reduction” cg2d version (define ALLOW_SRCG and use it), which also helps; mixing MPI with OpenMP reduces the overhead in solve_for_pressure a little (currently I use only 4 threads).
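
The point of the single-reduction cg2d is latency: plain conjugate gradients needs two separate global sums per iteration, each one an mpi_allreduce, while the ALLOW_SRCG formulation rearranges the recurrences so both can be batched into a single reduction. The batching itself looks roughly like this C sketch (illustrative only, not the MITgcm solver):

    /* Two local dot products reduced with one MPI_Allreduce instead of
     * two: half the number of latency-bound global operations.        */
    #include <mpi.h>
    #include <stdio.h>

    static void dots2(const double *a, const double *b, const double *c,
                      int n, double out[2], MPI_Comm comm)
    {
        double loc[2] = {0.0, 0.0};
        for (int i = 0; i < n; i++) {
            loc[0] += a[i] * b[i];     /* first inner product          */
            loc[1] += a[i] * c[i];     /* second inner product         */
        }
        MPI_Allreduce(loc, out, 2, MPI_DOUBLE, MPI_SUM, comm);
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        double a[4] = {1, 2, 3, 4}, b[4] = {1, 1, 1, 1},
               c[4] = {2, 2, 2, 2};
        double out[2];
        dots2(a, b, c, 4, out, MPI_COMM_WORLD);
        int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) printf("global sums: %g %g\n", out[0], out[1]);
        MPI_Finalize();
        return 0;
    }
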
>>>>>
>>>>> What’s your experience?
>>>>>
>>>>> Martin
>>>>>
>>>>>