[MITgcm-support] MPI issues

Martin Losch Martin.Losch at awi.de
Thu Oct 1 03:29:15 EDT 2015


Hi Renske,

The error messages come from the MPI library, not from MITgcm itself. Searching the web for them will probably give you a few clues about where things go wrong.

I am assuming that MPI works properly on this new platform. If that’s the case, this looks like something very simple to fix, i.e. some fundamental mistake such as forgetting the “-mpi” option in the genmake2 step, or a problem with the mpirun/mpiexec flags. I suggest that you carefully retrace your build and run steps.
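
As a rough way to check the first of these points, the sketch below tests whether the Makefile written by genmake2 was configured with MPI. It assumes that "genmake2 -mpi" adds -DALLOW_USE_MPI to the generated Makefile; the function name built_with_mpi is a hypothetical helper, not part of MITgcm.

```shell
#!/bin/sh
# Hypothetical helper (not part of MITgcm): report whether a
# genmake2-generated Makefile was configured with MPI enabled.
# Assumption: "genmake2 -mpi" adds -DALLOW_USE_MPI to the Makefile.
#
# Exit status: 0 = MPI define found, 1 = not found, 2 = file missing.
built_with_mpi() {
    makefile="${1:-Makefile}"
    if [ ! -f "$makefile" ]; then
        echo "no such file: $makefile" >&2
        return 2
    fi
    # -e keeps grep from reading the leading "-D" as an option.
    grep -q -e '-DALLOW_USE_MPI' "$makefile"
}

# Usage, from the build directory:
#   if built_with_mpi Makefile; then
#       echo "MPI enabled"
#   else
#       echo "MPI not enabled: rerun genmake2 with -mpi and rebuild"
#   fi
```

If the check fails, rerun genmake2 with -mpi and rebuild. On the run side, the process count passed to mpirun/mpiexec should match the nPx*nPy decomposition set in SIZE.h.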

M.

> On 30 Sep 2015, at 17:30, Renske Gelderloos <rgelder2 at jhu.edu> wrote:
> 
> Hi,
> 
> I've just downloaded the MITgcm onto a (for me) new platform. Compilation did not give any big issues, but at runtime it immediately crashes with the following error:
> 
> [ln154:28821] *** An error occurred in MPI_Comm_rank
> [ln154:28821] *** on communicator MPI_COMM_WORLD
> [ln154:28821] *** MPI_ERR_COMM: invalid communicator
> [ln154:28821] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 101 with PID 28826 on
> node ln154 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [ln318:28272] 38 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
> [ln318:28272] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> 
> 
> I have not seen this before, nor do I really know where to start debugging. Any ideas?
> 
> Renske
> 



