[MITgcm-support] Clean exit from errors during MPI runs
Christopher L. Wolfe
clwolfe at ucsd.edu
Mon Oct 1 15:26:04 EDT 2007
Hi modelers,
I recently had a run stop during initialization because of a missing
pickup file. The run executed the standard error code:
      write(msgbuf,'(a)')
     &  ' MDSREADFIELD: Files do not exist'
      call PRINT_MESSAGE( msgbuf, standardmessageunit,
     &  SQUEEZE_RIGHT , mythid)
      call PRINT_ERROR( msgbuf, mythid )
      stop 'ABNORMAL END: S/R MDSREADFIELD'
(from mdsio_readfield.F) and stopped. However, the job (running on
SDSC's BlueGene) hung in the running state until it exceeded its
walltime 12 hours later. When I asked the people at SDSC why this
happened and how I could prevent it in the future, they said "A
'stop' statement won't stop the process. You need an MPI finalization
to finish the process, otherwise the process will still be running."
I am far from an MPI expert and know even less about how the WRAPPER
works "under the hood," so I have no idea if this is true, though
I've had jobs stop without hanging in the running state before. I
guess what I'm asking is whether the explanation I got from SDSC is
reasonable and, if so, whether I will have to go through the MITgcm
source sprinkling "MPI_Finalize" calls before every "stop" statement.
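To make the question concrete, the change SDSC seems to be suggesting
might look something like the sketch below. The routine name
STOP_AFTER_MPI_FINALIZE is hypothetical and is not part of MITgcm;
only mpif.h, MPI_INITIALIZED and MPI_FINALIZE are standard MPI. It
assumes the code is built with MPI, and it does not settle whether
finalizing on a single process is enough to end the whole job.

```fortran
C     Hypothetical helper (not part of MITgcm): print a message,
C     shut down MPI on this process if it was started, then stop.
      SUBROUTINE STOP_AFTER_MPI_FINALIZE( msg )
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      CHARACTER*(*) msg
      LOGICAL mpiIsUp
      INTEGER ierr

      WRITE(*,'(A)') msg

C     Only finalize if MPI_INIT has already been called.
      CALL MPI_INITIALIZED( mpiIsUp, ierr )
      IF ( mpiIsUp ) CALL MPI_FINALIZE( ierr )

      STOP 'ABNORMAL END'
      END
```

Under that assumption, the last line of the error code above would
become

      call STOP_AFTER_MPI_FINALIZE( 'ABNORMAL END: S/R MDSREADFIELD' )

instead of the bare "stop".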
Thanks in advance,
Christopher
-----------------------------------------------------------
Dr. Christopher L. Wolfe 858-534-4560
Physical Oceanography Research Division OAR 357
Scripps Institution of Oceanography, UCSD clwolfe at ucsd.edu
-----------------------------------------------------------