[MITgcm-devel] local (tiled) MDSIO

Tue Jun 2 10:15:11 EDT 2009

Hi Martin and others,

Thanks for the test. (For the numbers I reported, I changed
dumpFreq to 1 to get much more IO, so by comparison the timings
you are getting are not bad.)
But if, later on, you have a chance to pick up some numbers from
a "real" simulation (one you generally run) on this SX8,
that could be interesting.
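
For reference, the relative changes behind the timings quoted below can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the two quoted messages, and the helper name relative_reduction is made up for illustration.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# (before, after) pairs in seconds, copied from the quoted messages.
timings = {
    "Martin, SX8, wall clock": (15.95677185058594, 15.20528006553650),
    "Martin, SX8, system": (1.430000022053719, 1.395000047981739),
    "Jean-Michel, wall clock": (22.7398989, 14.110111),
    "Jean-Michel, system": (6.53900592, 0.490924996),
}

if __name__ == "__main__":
    for label, (before, after) in timings.items():
        print(f"{label:26s} {relative_reduction(before, after):5.1f} % less")
```

The wall-clock gain comes out at roughly 5% on the SX8 against roughly 38% in my dumpFreq=1 test, which is why a number from a production-like run would be a more useful comparison.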

Otherwise, I could push this tiled IO further and read/write all levels
at a time, but I need bigger buffers (with sizes that are not known
at compile time), so it requires more changes. And if we choose to
go this way, I would prefer to combine this change (all levels)
with modifications that make those IO routines able to read/write
non-shared arrays (not in a common block) for multi-threaded runs.
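
To make the trade-off concrete, here is a rough sketch (plain Python, not MITgcm code) that counts the read/write calls and the buffer size needed per tile and per field for the three chunking schemes: one line of length sNx, one level of the tile, and all levels at once. The names ChunkScheme and chunk_schemes, and the tile dimensions in the example, are hypothetical; nr stands for the number of levels of the field being written.

```python
from dataclasses import dataclass


@dataclass
class ChunkScheme:
    name: str
    calls: int         # read/write calls per field, per tile
    buffer_words: int  # number of elements the IO buffer must hold


def chunk_schemes(snx, sny, nr):
    """Compare chunking schemes for one tile of snx * sny points and nr levels."""
    return [
        # Old behaviour: one call per line of length sNx.
        ChunkScheme("one line", sny * nr, snx),
        # Current behaviour: one call per level of the tile.
        ChunkScheme("one level", nr, snx * sny),
        # Possible extension: the whole tile in a single call.
        ChunkScheme("all levels", 1, snx * sny * nr),
    ]


if __name__ == "__main__":
    # Hypothetical tile size, for illustration only.
    for s in chunk_schemes(snx=30, sny=20, nr=15):
        print(f"{s.name:10s} calls={s.calls:4d} buffer={s.buffer_words:6d} words")
```

The number of calls drops by a factor of sNy going from lines to levels, and by a further factor of nr going to all levels, while the buffer grows by the same factors. Since the buffer size then depends on nr, it can no longer be fixed at compile time.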

Cheers,
Jean-Michel

On Tue, Jun 02, 2009 at 12:36:53PM +0200, Martin Losch wrote:
> Hi Jean-Michel,
>
> here's what I find for verification/deep_anelastic (no modifications of 
> data files, 4 tiles)
> on 2009-05-31 04:20 (so before your changes):
> (PID.TID 0000.0001)   Seconds in section "ALL                     [THE_MODEL_MAIN]":
> (PID.TID 0000.0001)           User time:  13.26000034343451
> (PID.TID 0000.0001)         System time:  1.430000022053719
> (PID.TID 0000.0001)     Wall clock time:  15.95677185058594
> today (after your changes):
> (PID.TID 0000.0001)   Seconds in section "ALL                     [THE_MODEL_MAIN]":
> (PID.TID 0000.0001)           User time:  12.71000026725233
> (PID.TID 0000.0001)         System time:  1.395000047981739
> (PID.TID 0000.0001)     Wall clock time:  15.20528006553650
>
> So it is faster, but not significantly. The reason is probably that, for
> the GSFS of the SX8, the batches of IO are still very small. The system
> considers basically everything below 1GB as small (o:
>
> Martin
>
>
>
> On Jun 1, 2009, at 4:34 PM, Jean-Michel Campin wrote:
>
>> Hi Martin,
>>
>> I've checked in a modification to the MDSIO pkg so that tiled IO is
>> now done in chunks of one tile level (instead of one line of length sNx).
>> I remember you reported that non-SingleCpuIO was slower than
>> SingleCpuIO because of the many small read/write pieces.
>> This modification should improve the speed of that IO,
>> and it would be interesting to see if it really does (because it's
>> still a matter of platform/disk system ...).
>>
>> I did some short tests with lots of IO, and in the most favorable one
>> (verification/deep_anelastic), without any optimisation, I get:
>> std_outp.new   User: 13.3799661 System: 0.490924996 Wall clock: 14.110111
>> std_outp.ref   User: 15.7935989 System: 6.53900592 Wall clock: 22.7398989
>> In other cases, I've also seen a reduction of the System time,
>> but the wall-clock time improvement was not as big.
>>
>> Cheers,
>> Jean-Michel
>> _______________________________________________
>> MITgcm-devel mailing list
>> MITgcm-devel at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-devel