[MITgcm-devel] I/O
Martin Losch
Martin.Losch at awi.de
Fri Aug 11 03:26:42 EDT 2017
Hi Dimitris,
one of the reasons why I suggested this is that the code in code-async seems very invasive and configuration-specific to me, whereas what I suggest should work without too many changes to the code (but I am not so sure about that).
But, honestly, I don't really understand how the code-async approach works. Do you reserve extra node(s) for this, or do you reserve extra CPUs on nodes that are already used by the model run? In the latter case, it is almost exactly what I had in mind, and should I then rather stay away from it because it is too involved (given my limited understanding of this)?
Martin
> On 11. Aug 2017, at 02:49, Dimitris Menemenlis <dmenemenlis at gmail.com> wrote:
>
> Hi Martin, another alternative to useSingleCpuIO is the code written by
> NASA Ames folks for the 1/48 global ocean simulation and checked in here:
> http://wwwcvs.mitgcm.org/viewvc/MITgcm/MITgcm_contrib/llc_hires/llc_4320/code-async/
>
> As you suggest, it's an n-on-m strategy, except that you request additional CPUs, typically 2 to 5%,
> that do nothing but I/O. The code takes care of distributing these CPUs judiciously among the compute nodes.
> The asyncio code essentially makes the cost of I/O disappear, as long as the disk can keep up with the dump frequency.
>
> Dimitris Menemenlis
>
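For comparison, here is a minimal MPI sketch of that dedicated-I/O-rank idea (this is not the actual code-async implementation; the rank layout, tile size, and all names are invented for illustration): compute ranks post a non-blocking send of their tile to a reserved I/O rank and go straight back to time stepping, while the I/O rank receives the tiles and writes them out.

    program async_io_sketch
      use mpi
      implicit none
      integer, parameter :: tileSize = 1000
      integer :: ierr, myRank, nProcs, req, src
      real(8) :: tile(tileSize)

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, myRank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nProcs, ierr)

      if (myRank == nProcs-1) then
        ! the last rank is reserved for I/O: it receives one tile from
        ! every compute rank and would write each one as it arrives
        do src = 0, nProcs-2
          call MPI_Recv(tile, tileSize, MPI_DOUBLE_PRECISION, src, 0, &
                        MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
          ! ... write the tile of rank src to the output file ...
        end do
      else
        ! compute ranks: post a non-blocking send and keep computing;
        ! the send is only waited on before the buffer would be reused
        tile = real(myRank, 8)
        call MPI_Isend(tile, tileSize, MPI_DOUBLE_PRECISION, nProcs-1, 0, &
                       MPI_COMM_WORLD, req, ierr)
        ! ... continue time stepping here ...
        call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)
      end if

      call MPI_Finalize(ierr)
    end program async_io_sketch
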
>> On Aug 10, 2017, at 9:14 AM, Martin Losch <Martin.Losch at awi.de> wrote:
>>
>> Hi all,
>>
>> I just heard an interesting talk about optimizing I/O, which we could also try for the MITgcm (mdsio).
>> The idea is to use more than one CPU for I/O, but not all, to write individual output streams. For example, instead of one CPU, as for useSingleCpuIO, one could have a list of cores, e.g. one per compute node, that can do the I/O (on top of the usual computation). Each new output stream (e.g. from the diagnostics package, but also the "regular" variables like T, S, Eta, etc.) is then written by the next CPU in the list, in a round-robin way: this CPU gathers the field and writes it. In a second step one could do the writing asynchronously; apparently the writing does not interfere too much with the ongoing computations.
>>
>> In the talk this output method was much faster than anything else, also faster than moving the output to extra, dedicated I/O CPUs (because of the extra network load). This applies to large simulations where the individual fields still fit into the part of the memory of one node that is not used by the computations (probably not the llc4320 size).
>>
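To spell out the round-robin selection, a minimal sketch (not MITgcm code; the module round_robin_io, the function pickIoProc, and the example rank numbers are made up): every process keeps the same list of candidate I/O ranks and advances through it in the same order, so all processes agree on which rank handles the next output stream without any extra communication.

    module round_robin_io
      implicit none
      ! candidate I/O ranks, e.g. one per compute node (numbers made up)
      integer, parameter :: nIoProcs = 4
      integer, parameter :: ioProcList(nIoProcs) = (/ 0, 36, 72, 108 /)
      ! index of the rank that handles the next output stream
      integer :: nextIo = 1
    contains
      integer function pickIoProc()
        ! return the rank for the next output stream and advance the
        ! list pointer; called once per stream by every process
        pickIoProc = ioProcList(nextIo)
        nextIo = mod(nextIo, nIoProcs) + 1
      end function pickIoProc
    end module round_robin_io
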
>> As far as I can see, to do this one needs to replace MASTER_CPU_IO in routines like MDS_WRITE_FIELD by something like MASTER_CPU_OUT (which can default to MASTER_CPU_IO if the method is not used) that checks whether myProcId equals the "next in the list". Then the "next in the list" needs to be passed to the gather routines, so that a process other than the master process (= 0) can do the gather. But I'd have to try it out (I have help for this) ...
>>
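Building on the sketch above (again hypothetical: MDS_WRITE_FIELD, the gather routines, and the master-process convention are the real MITgcm pieces; write_field_roundrobin and gather_to are invented placeholders), the write routine would ask for the "next in the list", hand that rank to the gather, and let only that rank write the file instead of hard-wiring process 0:

    subroutine write_field_roundrobin(fName, localFld, myProcId)
      ! sketch of the check that could replace the master-CPU test in a
      ! routine like MDS_WRITE_FIELD; the gather itself is only indicated,
      ! since the existing gather routines assume process 0 as destination
      use round_robin_io, only: pickIoProc
      implicit none
      character(len=*), intent(in) :: fName
      real(8), intent(in) :: localFld(:,:)
      integer, intent(in) :: myProcId
      integer :: ioProc

      ! every process makes the same call, so all agree on the writer
      ioProc = pickIoProc()

      ! the global field would be gathered onto ioProc here, i.e. the
      ! gather routine needs ioProc as an extra argument:
      ! call gather_to( ioProc, localFld, ... )

      if ( myProcId .eq. ioProc ) then
        ! only the designated process opens and writes the file
      end if
    end subroutine write_field_roundrobin
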
>> I am asking for your opinion because, if there are good reasons for not doing this, I will not spend any further time on it. Does this seem like something worth the effort?
>>
>> Martin
>>