[MITgcm-support] Coupled model running!

Jean-Michel Campin jmc at ocean.mit.edu
Tue Oct 9 10:51:23 EDT 2012


Hi Taimaz,

The coupled set-up is used by several users on different
platforms, so we should find a way for you to run it.
The script "run_cpl_test" in verification/cpl_aim+ocn/, however,
has not been used as much (plus it pre-dates some changes
in genmake2) and could have been better written.

So we will need to check each step to see where the problem is.

1) Have you tried to run a simple (i.e., not coupled) verification
  experiment using MPI? This would confirm that the MPI libraries and
  mpirun are working well on your platform (see the sketch below).
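
   A minimal sketch of such a test (the experiment name, optfile, and
   process count below are placeholders; SIZE.h must match the "-np"
   value, and many experiments ship a code/SIZE.h_mpi for exactly this
   purpose):

     cd verification/exp2/build
     cp ../code/SIZE.h_mpi SIZE.h
     ../../../tools/genmake2 -mpi -mods=../code \
         -of=../../../tools/build_options/YOUR_OPTFILE
     make depend && make
     cd ../run && ln -s ../input/* .
     mpirun -np 2 ../build/mitgcmuv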

2) We need to check which optfile is being used (run_cpl_test is not
  well written regarding optfile selection, and it expects an
  optfile "*+mpi" in the verification directory!).
  The "run_cpl_test 2" command should report it as:
  >  Using optfile: OPTFILE_NAME (compiler=COMPILER_NAME)
  It might also be useful to send the first 100 lines of build_atm/Makefile,
  just to check (a quick way to look is sketched below).
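
   A sketch of that check, assuming (as genmake2 normally does) that the
   configuration is recorded in the comment header near the top of the
   generated Makefile:

     head -100 build_atm/Makefile | grep -i optfile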

3) We need to check whether run_cpl_test recognizes an OpenMPI build and
  proceeds with the right command (a quick way to confirm which MPI you
  have is sketched below, after the sample output).
  Could you send all of the output that "run_cpl_test 3"
  produces? The command should be printed as:
  > execute 'mpirun ... 
  In my case, a successful run using OpenMPI gives me:
> execute 'mpirun -np 1 ./build_cpl/mitgcmuv : -np 1 ./build_ocn/mitgcmuv : -np 1 ./build_atm/mitgcmuv' :
>  MITCPLR_init1:            2  UV-Atmos MPI_Comm_create MPI_COMM_compcplr=           6  ierr=           0
>  MITCPLR_init1:            2  UV-Atmos component num=           2  MPI_COMM=           5           6
>  MITCPLR_init1:            2  UV-Atmos Rank/Size =            1  /           2
>  MITCPLR_init1:            1  UV-Ocean Rank/Size =            1  /           2
>  CPL_READ_PARAMS: nCouplingSteps=           5
>  runoffmapFile =>>runOff_cs32_3644.bin<<= , nROmap=  3644
>  ROmap:    1  599  598 0.100280
>  ROmap: 3644 4402 4403 0.169626
>   Exporting (pid=    0 ) atmospheric fluxes at iter.         0
>   Importing (pid=    0 ) oceanic fields at iteration         0
>   Exporting (pid=    0 ) atmospheric fluxes at iter.         8
>   Importing (pid=    0 ) oceanic fields at iteration         8
>   Exporting (pid=    0 ) atmospheric fluxes at iter.        16
>   Importing (pid=    0 ) oceanic fields at iteration        16
>   Exporting (pid=    0 ) atmospheric fluxes at iter.        24
>   Importing (pid=    0 ) oceanic fields at iteration        24
>   Exporting (pid=    0 ) atmospheric fluxes at iter.        32
>   Importing (pid=    0 ) oceanic fields at iteration        32
> STOP NORMAL END
> STOP NORMAL END
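
A quick sketch for confirming which MPI implementation your mpirun belongs
to (both the version banner and ompi_info ship with OpenMPI):

  mpirun --version     # OpenMPI answers "mpirun (Open MPI) x.y.z"
  which mpirun mpif77  # both should come from the same installation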

Once all these steps are checked and are OK, we can start to dig into the
coupling log files.
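
As an aside (a standard OpenMPI knob, not anything specific to
run_cpl_test): the "OpenFabrics (openib)" warnings in your stdout below
can be silenced by excluding that transport, in case it is interfering
with the run:

  mpirun --mca btl ^openib -np 1 ./build_cpl/mitgcmuv : \
      -np 1 ./build_ocn/mitgcmuv : -np 1 ./build_atm/mitgcmuv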

Cheers,
Jean-Michel

On Wed, Oct 03, 2012 at 04:21:49PM -0230, taimaz.bahadory wrote:
> There is a "stdout" file generated in the main run directory, with the
> following contents:
> 
> ***********************************************************************************************************************************
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> librdmacm: couldn't read ABI version.
> librdmacm: assuming: 4
> CMA: unable to get RDMA device list
> --------------------------------------------------------------------------
> [[9900,1],2]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>   Host: glacdyn
> 
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
>  CPL_READ_PARAMS: nCouplingSteps=           5
>  runoffmapFile =>>runOff_cs32_3644.bin<<= , nROmap=  3644
>  ROmap:    1  599  598 0.100280
>  ROmap: 3644 4402 4403 0.169626
> [glacdyn:04864] 2 more processes have sent help message
> help-mpi-btl-base.txt / btl:no-nics
> [glacdyn:04864] Set MCA parameter "orte_base_help_aggregate" to 0 to see
> all help / error messages
> ***********************************************************************************************************************************
> 
> Maybe there is some relation between these error-like messages and the
> run getting stuck!
> 
> On Wed, Oct 3, 2012 at 2:13 PM, taimaz.bahadory <taimaz.bahadory at mun.ca> wrote:
> 
> > Re-Hi;
> >
> > Yes; as I said, it got stuck again. I checked the CPU. It is fully loaded,
> > but the output file is not being updated! It is only a few seconds younger
> > than the run initiation.
> >
> >
> >
> > On Wed, Oct 3, 2012 at 1:32 PM, taimaz.bahadory <taimaz.bahadory at mun.ca> wrote:
> >
> >> Hi;
> >>
> >> I think I have tried that too, but the same problem occurred (I will try
> >> it again right now to double-check).
> >> I will report soon.
> >> Thanks
> >>
> >>
> >>
> >> On Wed, Oct 3, 2012 at 1:29 PM, Jean-Michel Campin <jmc at ocean.mit.edu> wrote:
> >>
> >>> Hi Taimaz,
> >>>
> >>> Can you try without MNC? The current set-up (cpl_aim+ocn) does not
> >>> use MNC (useMNC=.TRUE. is commented out in both input_atm/data.pkg
> >>> and input_ocn/data.pkg), so if there were a problem in the coupled
> >>> set-up code with NetCDF output, I might not have seen it (since I
> >>> have not tried with it recently).
> >>>
> >>> Cheers,
> >>> Jean-Michel
> >>>
> >>> On Tue, Sep 25, 2012 at 11:54:59AM -0230, taimaz.bahadory wrote:
> >>> > Hi everybody;
> >>> >
> >>> > I'm trying to run the coupled model example (cpl_aim+ocn) in the
> >>> > verification directory. The first three steps (cleaning; compiling
> >>> > and making; copying input files) all passed with no errors, but when
> >>> > I run the coupler, it starts and initially creates the netCDF output
> >>> > files, then stops updating them (and the other output files as well),
> >>> > although the three "mitgcmuv" executables are still running! It's
> >>> > like the program is frozen.
> >>> > Has anybody been stuck in such a situation?
> >>>

> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support



