[MITgcm-support] file size issue with mpi

Patrick Heimbach heimbach at MIT.EDU
Mon Aug 23 18:34:43 EDT 2004


Tom,

from my experience the option
globalFiles = .TRUE.
should be used with caution, or not used at all.
From what I know, parallel writes to a single global file are
not a well-defined operation on some/many(?) parallel platforms
and are not strictly supported by MPI.
I ran into a similar problem as recently as last week on an SGI Altix.
The file size there was actually OK, but the content was wrong.

So the bottom line is (I think Chris/Alistair would agree):
it's implemented in the model and works on some platforms,
but by no means on all of them.
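
As a first sanity check on global output, the reported file sizes can be
compared against the expected size. The following is only a minimal sketch:
the sizes and the expected value are copied from Tom's message quoted below,
and the helper name shortfalls is made up for illustration.

```python
# Sizes in bytes as reported by `ls` in Tom's message, keyed by node name.
reported = {
    "node2": 96545664,
    "node5": 96212736,
    "node14": 96544512,
    "node15": 96546816,
    "node16": 96213888,
}
EXPECTED = 96546816  # the correct size according to Tom's message


def shortfalls(sizes, expected):
    """Return {node: missing_bytes} for every file smaller than expected."""
    return {node: expected - size
            for node, size in sizes.items() if size < expected}


missing = shortfalls(reported, EXPECTED)
for node, nbytes in missing.items():
    print(f"{node}: {nbytes} bytes short")
```

A check like this would only catch the truncated files in Tom's case. It
would not have caught the Altix case, where the size was right and the
content was wrong, so it is a first filter at best.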

-Patrick



Quoting THOMAS HAINE <thomas.haine at jhu.edu>:

> Hi Folks, 
> 
> I've hit a problem that is confusing me. If I run an MPI job (on my Opteron
> Suse cluster with g77 and mpich-1.2.5) I get incorrect file dumps according
> to how I set up mpi.
> 
> For example, if I write to local disk:
> mpirun -np 9 -nolocal -machinefile node_list run/mitgcmuv -p4wd /tmp/twnh/scratch
> 
> I get files written on each scratch disk which are the wrong size:
> 
> for node in `cat node_list`; do echo $node; ssh $node 'ls -alt /tmp/twnh/scratch/U.*.data'; done
> 
> gives:
> 
> node2
> -rw-r--r--    1 twnh     users    96545664 Aug 23 06:30 /tmp/twnh/scratch/U.0000000000.data
> node5
> -rw-r--r--    1 twnh     users    96212736 Aug 23 06:33 /tmp/twnh/scratch/U.0000000000.data
> node14
> -rw-r--r--    1 twnh     users    96544512 Aug 23 02:21 /tmp/twnh/scratch/U.0000000000.data
> node15
> -rw-r--r--    1 twnh     users    96546816 Aug 23 06:29 /tmp/twnh/scratch/U.0000000000.data
> node16
> -rw-r--r--    1 twnh     users    96213888 Aug 23 06:31 /tmp/twnh/scratch/U.0000000000.data
> 
> The correct size is 96546816 bytes, so only the file on node15 is the right
> size; the others are too small (node15 ran processes 3 and 8 of 9). Similar
> problems occur with 2D files. This also happens if I write to disk on the
> master node (although it seems a bit better).
> 
> These cases are with globalFiles=.true. If I set it false, to get tiled
> output, everything looks the right size when I write to local scratch disk.
> 
> Any ideas what's going on here and how I should fix it?  Have I missed
> something?
> 
> Thanks, Tom.
> 


--------------------------------------------------------
Patrick Heimbach   Massachusetts Institute of Technology
FON: +1/617/253-5259                  EAPS, Room 54-1518
FAX: +1/617/253-4464             77 Massachusetts Avenue
mailto:heimbach at mit.edu               Cambridge MA 02139
http://www.mit.edu/~heimbach/                        USA
