[MITgcm-support] Clean exit from errors during MPI runs

Christopher L. Wolfe clwolfe at ucsd.edu
Mon Oct 1 15:26:04 EDT 2007


Hi modelers,

I recently had a run stop during initialization because of a missing
pickup file. The run executed the standard error-handling code

           write(msgbuf,'(a)')
      &      ' MDSREADFIELD: Files do not exist'
           call PRINT_MESSAGE( msgbuf, standardmessageunit,
      &                        SQUEEZE_RIGHT , mythid)
           call PRINT_ERROR( msgbuf, mythid )
           stop 'ABNORMAL END: S/R MDSREADFIELD'

(from mdsio_readfield.F) and stopped. However, the job (running on  
SDSC's BlueGene) hung in the running state until it exceeded its  
walltime 12 hours later. When I asked the people at SDSC why this  
happened and how I could prevent it in the future, they said "A
'stop' statement won't stop the process. You need an MPI finalization
to finish the process; otherwise the process will still be running."

I am far from an MPI expert and know even less about how the WRAPPER
works "under the hood," so I have no idea whether this is true,
although I have had jobs stop before without hanging in the running
state. What I'm asking is whether SDSC's explanation is reasonable
and, if so, whether I will have to go through the MITgcm sprinkling
"MPI_Finalize" calls before every "stop" statement.
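One alternative to adding MPI_Finalize before each "stop" is a single helper that every error path calls. The sketch below is hypothetical: ABORT_ALL_PROCS is not an existing MITgcm routine, and it assumes an MPI build in which `mpif.h` is available. It calls MPI_ABORT rather than MPI_FINALIZE because the MPI standard makes MPI_FINALIZE collective over all connected processes, so if only the failing rank calls it while the other ranks are still computing or blocked in communication, the job can hang anyway. MPI_ABORT instead makes a best attempt to terminate every process in the group of the given communicator.

```fortran
C     Hypothetical helper (not an MITgcm routine): print an error
C     message and terminate every rank of the MPI job, instead of
C     stopping only the calling process with a bare STOP.
      SUBROUTINE ABORT_ALL_PROCS( msg, errcode )
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      CHARACTER*(*) msg
      INTEGER errcode
      INTEGER ierr
C     Report the reason; unit 0 is standard error on most
C     Unix Fortran compilers (assumption, not guaranteed by F77)
      WRITE(0,'(A)') msg
C     Ask the MPI runtime to kill all processes in MPI_COMM_WORLD,
C     so the batch job ends instead of idling until walltime
      CALL MPI_ABORT( MPI_COMM_WORLD, errcode, ierr )
C     Normally not reached; fall back to STOP just in case
      STOP 'ABNORMAL END: ABORT_ALL_PROCS'
      END
```

With such a helper, the final `stop 'ABNORMAL END: S/R MDSREADFIELD'` line in the snippet above would become `call ABORT_ALL_PROCS( 'ABNORMAL END: S/R MDSREADFIELD', 1 )`. Each "stop" site still needs a one-line change, but the MPI-specific logic lives in one place.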

Thanks in advance,
Christopher

-----------------------------------------------------------
Dr. Christopher L. Wolfe                   858-534-4560
Physical Oceanography Research Division    OAR 357
Scripps Institution of Oceanography, UCSD  clwolfe at ucsd.edu
-----------------------------------------------------------
