[MITgcm-support] file size issue with mpi
Patrick Heimbach
heimbach at MIT.EDU
Mon Aug 23 18:34:43 EDT 2004
Tom,
in my experience the option
globalFiles = .TRUE.
should be used with caution, or not used at all.
From what I know, a parallel write to a global file is
not a well-defined operation on some/many(?) parallel platforms
and is not strictly supported by MPI.
I ran into a similar problem as recently as last week on an SGI Altix.
The file size there was actually OK, but the content was wrong.
So the bottom line is (I think Chris/Alistair would agree):
it's implemented in the model and works on some platforms,
but far from all of them.
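
A quick way to catch the kind of truncation Tom describes is to compare
each dump's size on disk with the size you expect. What follows is only a
minimal Python sketch, not part of MITgcm: the function name
find_wrong_size is hypothetical, and the caller has to supply the expected
byte count. It detects wrong sizes only; as on the Altix, a file of the
right size can still have wrong content.

```python
import os


def find_wrong_size(paths, expected_bytes):
    """Return {path: actual_size} for each file whose size differs.

    Hypothetical helper for illustration; expected_bytes is whatever
    size the caller knows a complete dump should have.
    """
    wrong = {}
    for path in paths:
        actual = os.path.getsize(path)  # size on disk, in bytes
        if actual != expected_bytes:
            wrong[path] = actual
    return wrong
```

For the case reported here, this could be run on each node as
find_wrong_size(["/tmp/twnh/scratch/U.0000000000.data"], 96546816),
using the path and the correct size from Tom's message; an empty
result means the size matches.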
-Patrick
Quoting THOMAS HAINE <thomas.haine at jhu.edu>:
> Hi Folks,
>
> I've hit a problem that is confusing me. If I run an MPI job (on my Opteron
> Suse cluster with g77 and mpich-1.2.5) I get incorrect file dumps according
> to how I set up mpi.
>
> For example, if I write to local disk:
> mpirun -np 9 -nolocal -machinefile node_list run/mitgcmuv -p4wd /tmp/twnh/scratch
>
> I get files written on each scratch disk which are the wrong size:
>
> for node in `cat node_list `; do echo $node; ssh $node 'ls -alt /tmp/twnh/scratch/U.*.data'; done
>
> gives:
>
> node2
> -rw-r--r-- 1 twnh users 96545664 Aug 23 06:30 /tmp/twnh/scratch/U.0000000000.data
> node5
> -rw-r--r-- 1 twnh users 96212736 Aug 23 06:33 /tmp/twnh/scratch/U.0000000000.data
> node14
> -rw-r--r-- 1 twnh users 96544512 Aug 23 02:21 /tmp/twnh/scratch/U.0000000000.data
> node15
> -rw-r--r-- 1 twnh users 96546816 Aug 23 06:29 /tmp/twnh/scratch/U.0000000000.data
> node16
> -rw-r--r-- 1 twnh users 96213888 Aug 23 06:31 /tmp/twnh/scratch/U.0000000000.data
>
> The correct size is 96546816 bytes, so node15 is the right size and the
> others are too small (node15 ran processes 3 and 8 of 9). Similar problems
> occur with 2D files. This also happens if I write to disk on the master
> node (although it seems a bit better).
>
> These cases are with globalFiles=.true. If I set it false, to get tiled
> output, everything looks the right size when I write to local scratch disk.
>
> Any ideas what's going on here and how I should fix it? Have I missed
> something?
>
> Thanks, Tom.
>
--------------------------------------------------------
Patrick Heimbach Massachusetts Institute of Technology
FON: +1/617/253-5259 EAPS, Room 54-1518
FAX: +1/617/253-4464 77 Massachusetts Avenue
mailto:heimbach at mit.edu Cambridge MA 02139
http://www.mit.edu/~heimbach/ USA