[MITgcm-support] Clean exit from errors during MPI runs

Constantinos Evangelinos ce107 at ocean.mit.edu
Mon Oct 1 16:39:57 EDT 2007


On Mon 01 Oct 2007 15:26, Christopher L. Wolfe wrote:

> Hi modelers,
>
> I recently had a run stop within initialization due to a missing
> pickup file. The run executed the standard error code
>
>            write(msgbuf,'(a)')
>       &      ' MDSREADFIELD: Files do not exist'
>            call PRINT_MESSAGE( msgbuf, standardmessageunit,
>       &                        SQUEEZE_RIGHT , mythid)
>            call PRINT_ERROR( msgbuf, mythid )
>            stop 'ABNORMAL END: S/R MDSREADFIELD'
>
> (from mdsio_readfield.F) and stopped. However, the job (running on
> SDSC's BlueGene) hung in the running state until it exceeded its
> walltime 12 hours later. When I asked the people at SDSC why this
> happened and how I could prevent it in the future, they said "A
> 'stop' statement won't stop the process. You need an MPI finalization
> to finish the process, otherwise the process will still be running."

This is correct: depending on the MPI runtime, a STOP may or may not bring
down the process(es). However, the most generic way to abort execution cannot
be MPI_Finalize, as that is a collective call and would require
synchronization among the processes (in this case they will all miss the
pickup file, but in other cases only one may stop, and the others would never
reach the call). MPI_Abort is supposed to make a best-effort attempt to shut
everything down cleanly.
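
As a rough illustration of the difference (an untested sketch, not code from
MITgcm: the program name, the file name 'pickup.data' and the error code 1
are placeholders picked for the example), a rank that hits a fatal error can
call MPI_Abort on its own, while MPI_Finalize is kept for the normal exit
path that every rank reaches:

```fortran
      PROGRAM ABORT_DEMO
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER ierr, myRank
      LOGICAL exst

      CALL MPI_INIT( ierr )
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, myRank, ierr )

C     'pickup.data' is a placeholder file name for this sketch
      INQUIRE( FILE='pickup.data', EXIST=exst )
      IF ( .NOT. exst ) THEN
        WRITE(*,'(A,I6)') ' ABNORMAL END: no pickup, rank', myRank
C       Not collective: any single rank may call it on its own
        CALL MPI_ABORT( MPI_COMM_WORLD, 1, ierr )
      ENDIF

C     Normal exit: collective, every rank must reach this call
      CALL MPI_FINALIZE( ierr )
      END
```

How the error code is reported back to the batch system depends on the MPI
implementation and the launcher.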

> I am far from an MPI expert and know even less about how the WRAPPER
> works "under the hood," so I have no idea if this is true, though
> I've had jobs stop without hanging in the running state before. I
> guess what I'm asking is if the explanation I got from SDSC is
> reasonable and, if so, am I going to have to go through the MITgcm
> sprinkling "MPI_Finalize" statements before every "stop" command?

You can go ahead and do it with MPI_Abort instead. We could also define a 
macro _STOP (like _BARRIER) that in serial mode translates to STOP and in 
parallel mode translates to a call to MPI_Abort.
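
Such a macro might look roughly like the following (an untested sketch under
several assumptions: ALLOW_USE_MPI is taken to be the CPP flag defined for MPI
builds, mpiRC and CHECK_FILE are made-up names, and the file has to go
through the C preprocessor, i.e. be a .F file):

```fortran
#ifdef ALLOW_USE_MPI
#define _STOP CALL MPI_ABORT( MPI_COMM_WORLD, 1, mpiRC )
#else
#define _STOP STOP 'ABNORMAL END'
#endif

C     Hypothetical routine, only here to show the macro in use
      SUBROUTINE CHECK_FILE( fName )
      IMPLICIT NONE
#ifdef ALLOW_USE_MPI
      INCLUDE 'mpif.h'
C     mpiRC receives the return code of MPI_ABORT
      INTEGER mpiRC
#endif
      CHARACTER*(*) fName
      LOGICAL exst

      INQUIRE( FILE=fName, EXIST=exst )
      IF ( .NOT. exst ) THEN
C       Print the message first; the macro itself carries no text
        WRITE(*,'(2A)') ' CHECK_FILE: missing file ', fName
        _STOP
      ENDIF
      RETURN
      END
```

The catch is that every routine using _STOP in an MPI build needs mpif.h and
a declared return-code variable in scope, so wrapping the abort in a small
subroutine may turn out to be less intrusive than a bare macro.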

Constantinos
-- 
Dr. Constantinos Evangelinos
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology



