[MITgcm-devel] I/O

Mon Aug 14 03:32:33 EDT 2017

Hi Dimitris,
thanks for the clarification. To me this looks pretty much like what I saw in the talk that I was referring to, although the details may differ.

Maybe it’s not really that interesting to have this for other packages like diagnostics as well. But in principle, one should be able to replace any WRITE_REC_LEV_RL in diagnostics_out.F with this, right? It would be more involved, because you’d have to deal with the CALL beginNewEpoch(icounter,myIter,0) earlier in the code (in do_the_model_io?) and pass the counter around, etc., but it should be possible, shouldn’t it?

Martin

> On 11. Aug 2017, at 21:47, Menemenlis, Dimitris (329C) <Dimitris.Menemenlis at jpl.nasa.gov> wrote:
> 
> Hi Martin, I agree that asyncio is invasive and configuration-specific and that
> what you suggest would be an improvement in terms of usability and portability.
> Bron Nelson has cleaned up his asyncio code somewhat compared to what
> is checked in to MITgcm_contrib, but I have not had time to test it, and the
> code remains invasive and configuration-specific.
> 
> The useSingleCpuIO flag definitely becomes inefficient as the core count increases.
> Way back (http://ecco2.org/manuscripts/2007/Hill_etal_07_SciProg.pdf)
> Chris and I used the capability of MITgcm to run in mixed memory model
> to force the model to do I/O from 1 core per shared-memory set of processors.
> 
> For asyncio we reserve extra CPU cores that just do I/O.
> So for example let’s say we run an MITgcm configuration that
> requires 19023 cores and submit jobs with "mpiexec -n 20400 mitgcmuv".
> This would set aside 1377 cores just for doing I/O.  During initialization,
> asyncio spreads these 1377 cores across all the available nodes that
> are being used by MITgcm for computations.  If the 20400 cores come
> from 1020 nodes with 20 cores each, the 1377 I/O cores will be
> distributed 1 core per node for 663 nodes and 2 cores per node
> for 357 nodes.
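
The arithmetic in this example can be checked with a short sketch. It is
written in Python purely for illustration; the function and variable names
are hypothetical and are not part of MITgcm or the asyncio code, and the
"spread as evenly as possible" rule is an assumption that happens to
reproduce the numbers quoted above, not a description of how asyncio
actually assigns ranks.

```python
def distribute_io_cores(total_ranks, compute_ranks, cores_per_node):
    """Return (io_ranks, nodes, {io_cores_per_node: number_of_nodes}).

    Assumes every node is fully populated and that the I/O cores are
    spread as evenly as possible over all nodes (illustrative only).
    """
    io_ranks = total_ranks - compute_ranks
    nodes, leftover = divmod(total_ranks, cores_per_node)
    if leftover:
        raise ValueError("total_ranks must fill whole nodes")
    # Every node gets `base` I/O cores; `extra` nodes get one more.
    base, extra = divmod(io_ranks, nodes)
    return io_ranks, nodes, {base: nodes - extra, base + 1: extra}


# Numbers from the example: 20400 MPI ranks, 19023 of them for compute,
# 20 cores per node.
result = distribute_io_cores(20400, 19023, 20)
print(result)  # (1377, 1020, {1: 663, 2: 357})
```

That is, 663 * 1 + 357 * 2 = 1377 I/O cores on 663 + 357 = 1020 nodes.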
> 
> Dimitris
> 
>> On Aug 11, 2017, at 12:26 AM, Martin Losch <Martin.Losch at awi.de> wrote:
>> 
>> Hi Dimitris,
>> 
>> one of the reasons why I suggested this is that the stuff in code-async seems so invasive and configuration-specific to me, whereas what I suggest should work without too many changes in the code (but I am not so sure about that).
>> But, honestly, I don’t really understand how "code-async" works. Do you reserve extra node(s) for this, or do you reserve extra CPUs on nodes that are already used by the model run? In the latter case, it is almost exactly what I had in mind, and I should probably stay away from it because it is too involved (given my limited understanding of this)?
>> 
>> Martin
> 