[MITgcm-support] file size issue with mpi

THOMAS HAINE thomas.haine at jhu.edu
Mon Aug 23 18:14:25 EDT 2004


Hi Folks, 

I've hit a problem that is confusing me. When I run an MPI job (on my Opteron SuSE cluster with g77 and mpich-1.2.5), I get incorrectly sized file dumps, depending on how I set up MPI.

For example, if I write to local disk:
mpirun -np 9 -nolocal -machinefile node_list run/mitgcmuv -p4wd /tmp/twnh/scratch
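For reference, the scratch path given to -p4wd exists on every node before the run; I check it with something like this (just a sketch, same node_list and path as above):

for node in `cat node_list`; do
  ssh $node 'mkdir -p /tmp/twnh/scratch && ls -ld /tmp/twnh/scratch'
done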

The files written on each node's scratch disk are the wrong size:

for node in `cat node_list `; do echo $node; ssh $node 'ls -alt /tmp/twnh/scratch/U.*.data'; done

gives:

node2
-rw-r--r--    1 twnh     users    96545664 Aug 23 06:30 /tmp/twnh/scratch/U.0000000000.data
node5
-rw-r--r--    1 twnh     users    96212736 Aug 23 06:33 /tmp/twnh/scratch/U.0000000000.data
node14
-rw-r--r--    1 twnh     users    96544512 Aug 23 02:21 /tmp/twnh/scratch/U.0000000000.data
node15
-rw-r--r--    1 twnh     users    96546816 Aug 23 06:29 /tmp/twnh/scratch/U.0000000000.data
node16
-rw-r--r--    1 twnh     users    96213888 Aug 23 06:31 /tmp/twnh/scratch/U.0000000000.data

The correct size is 96546816 bytes, so node15's file is the right size and the others are too small (node15 ran processes 3 and 8 of 9). Similar problems occur with 2D files. The same thing happens if I write to disk on the master node, although it looks a bit better there.
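For what it's worth, a quick way to flag the short copies is to compare each node's file against the expected 96546816 bytes (a rough check, same paths as the loop above):

expected=96546816
for node in `cat node_list`; do
  size=`ssh $node 'stat -c %s /tmp/twnh/scratch/U.0000000000.data'`
  [ "$size" -ne "$expected" ] && echo "$node: $size (short by $((expected - size)) bytes)"
done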

These cases are with globalFiles=.true. If I set it to .false., to get tiled output, all the files look the right size when I write to the local scratch disks.
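In the tiled case the sizes do add up; a rough check, assuming the usual tile-suffix naming of the per-tile files:

for node in `cat node_list`; do
  ssh $node 'ls -l /tmp/twnh/scratch/U.0000000000.*.data'
done | awk '{ sum += $5 } END { print sum, "bytes total across tiles" }'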

Any ideas what's going on here and how I should fix it?  Have I missed something?

Thanks, Tom.
