[MITgcm-devel] mpi_finalize
Jean-Michel Campin
jmc at ocean.mit.edu
Thu Mar 29 13:20:56 EDT 2012
Hi Martin,
I am currently looking at this "termination" problem (with MPI + OpenMP),
since, with some (old) mpich versions, the run sometimes hangs, or finishes
but leaves some processes behind (that need to be killed afterward).
Now regarding your question, we already have 2 S/R to end cleanly:
1) ALL_PROC_DIE : needs to be called just before the "stop",
but it only works if all the MPI procs call it.
2) And for the case where a few (but not all) MPI procs detect an error,
there is another S/R: STOP_IF_ERROR, which collects the errors
and then decides whether to stop. But this 2nd one is not used currently
(and I don't know what TAF will do with this), and its global_sum
can slow down the run if used too often.
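
As a rough illustration of case 2), here is a minimal sketch in plain MPI
Fortran. It is not the actual MITgcm code: the program name, the variable
names and the fake error condition are all made up for this example, and the
real STOP_IF_ERROR may be organised differently. The idea is only that a
global sum of the local error flags lets every process take the same decision,
so that all of them reach MPI_Finalize before the STOP.

```fortran
program clean_stop_sketch
  ! Sketch only: not MITgcm code. All names here are hypothetical.
  use mpi
  implicit none
  integer :: ierr, myRank, localErr, globalErr

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myRank, ierr)

  ! Pretend that only rank 0 detects a problem (e.g. a bad parameter).
  localErr = 0
  if (myRank == 0) localErr = 1

  ! Global sum of the error flags: afterwards every rank holds the same
  ! total, so all ranks take the same branch below.
  call MPI_Allreduce(localErr, globalErr, 1, MPI_INTEGER, MPI_SUM, &
                     MPI_COMM_WORLD, ierr)

  if (globalErr > 0) then
    if (myRank == 0) write(*,*) 'error detected on', globalErr, 'rank(s)'
    ! Every rank reaches this point, so the collective finalize is safe.
    call MPI_Finalize(ierr)
    stop 'ABNORMAL END: clean_stop_sketch'
  end if

  call MPI_Finalize(ierr)
end program clean_stop_sketch
```

The MPI_Allreduce is the costly part: it synchronises all processes, which is
why such a check should not be done too often inside the time-stepping loop.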
The advantage of ALL_PROC_DIE + a STOP compared to a S/R like STOP_THE_MODEL
(which would contain the stop) is that TAF can see the stop and we provide
some flow directives for ALL_PROC_DIE (eesupp.flow).
And ALL_PROC_DIE is used (e.g., in pkg/monitor/mon_solution.F),
but there are many places where the call is missing.
Cheers,
Jean-Michel
On Thu, Mar 29, 2012 at 09:49:57AM +0200, Martin Losch wrote:
> Hi there,
>
> as you know, my mpi-skills are not very good, which should explain the level of this question:
>
> I often get this type of error message when the model encounters a Fortran "STOP" statement, because some parameters are not set properly, or netcdf files are not overwritten.
>
> MPI: MPI_COMM_WORLD rank 0 has terminated without calling MPI_Finalize()
> MPI: aborting job
>
> Some systems then complain about not being able to terminate the job properly and ask for manual intervention (e.g. run a LAM command or whatever), and sometimes some instances of mitgcmuv do remain and are difficult to delete without root privileges.
>
> Would it be useful to replace all "STOP" statements with a S/R STOP_THE_MODEL, or some other fancy name (maybe we even have this routine and I just don't know about it?), where the system is then shut down "cleanly" (by calling MPI_Finalize, if necessary)? Is that difficult to do (all sorts of different possibilities, w/ MPI, w/out MPI, etc.)? Is it worth it?
>
> Martin