[MITgcm-support] Coupled tutorial cpl_aim+ocn questions

Jean-Michel Campin jmc at mit.edu
Thu Mar 16 16:13:58 EDT 2017


Hi Xing,

This is strange.
Normally, writing to screen/std_outp happens very early, from S/R MITCPLR_init1
(which is called by each of the 3 executables), and even before doing any real 
communication.

Just to check, could you send me (off the list) the 3 Makefiles (gzipped)?

Otherwise, you may want to try to add:

         STOP 'MITCPLR_init1: STOP before any MPI_Bcast'

in pkg/compon_communic/mitcplr_init1.F, after line 43, and recompile the 3 executables.
This should terminate the run immediately with something in std_outp like:

>  MITCPLR_init1:            0  Coupler Rank/Size =            0  /           3
> STOP MITCPLR_init1: STOP before any MPI_Bcast
>  MITCPLR_init1:            1  UV-Ocean Rank/Size =            1  /           3
> STOP MITCPLR_init1: STOP before any MPI_Bcast
>  MITCPLR_init1:            2  UV-Atmos Rank/Size =            2  /           3
> STOP MITCPLR_init1: STOP before any MPI_Bcast
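
To check this quickly after the test run, something like the sketch below could
be used. It assumes the output was redirected to std_outp (with 2>&1, so that
the STOP messages end up in the same file) and that 3 processes were started;
the function names are only for illustration and are not part of MITgcm.

```shell
#!/bin/sh
# Count how many processes reached the temporary STOP in MITCPLR_init1.
# Usage: count_init_stops std_outp
count_init_stops() {
    # An unreadable or missing file counts as "no process got that far".
    if [ ! -r "$1" ]; then
        echo 0
        return 0
    fi
    # grep -c prints the number of matching lines; it exits with status 1
    # when the count is 0, hence the "|| true".
    grep -c 'STOP MITCPLR_init1' "$1" || true
}

# Compare the count with the expected number of processes (default: 3).
# Usage: check_init_stops std_outp 3
check_init_stops() {
    file=$1
    expected=${2:-3}
    found=$(count_init_stops "$file")
    if [ "$found" -eq "$expected" ]; then
        echo "OK: all $expected processes reached MITCPLR_init1"
    else
        echo "Only $found of $expected processes reached MITCPLR_init1"
        return 1
    fi
}
```

If fewer than 3 STOP lines show up, the missing rank(s) did not get as far as
MITCPLR_init1, which points to the launch command rather than to the coupling itself.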

Cheers,
Jean-Michel

On Thu, Mar 16, 2017 at 04:24:35PM +0000, Lu, Xing wrote:
> Hi Jean-Michel,
> 
> Thank you for helping!
> 
> The debugMode=.TRUE. doesn't help. Everything stays the same.
> 
> There is no .clog file in verification/cpl_aim+ocn.
> 
> I redirected the output to std_outp, and nothing is printed on the screen. The only thing on the screen is the command itself, similar to "mpirun &RunOpt > std_outp 2>&1".
> 
> And std_outp is also empty. 
> 
> Do you have any other suggestions that I can try? Thanks!
> 
> Xing
> 
> 
> 
> > On Mar 15, 2017, at 7:08 PM, Jean-Michel Campin <jmc at mit.edu> wrote:
> > 
> > Hi Xing,
> > 
> > 1) You may want to try to uncomment:
> >   debugMode=.TRUE.,
> >   in the 2 eedata files: input_atm/eedata & input_ocn/eedata
> >  This forces the 2 components to flush the I/O buffer for STDOUT & STDERR
> >   (+ write much more information to STDOUT)
> > 
> > 2) If debugMode=T does not help, there are a few things that can be checked:
> >   The coupling interface is writing some log files:
> >   in verification/cpl_aim+ocn, 
> >> ls -l rank_?/*.clog
> > -rw-rw-r--. 1 jmc 2909 03-15 18:39 rank_0/Coupler.0000.clog
> > -rw-rw-r--. 1 jmc  735 03-15 18:38 rank_1/UV-Ocean.0001.clog
> > -rw-rw-r--. 1 jmc  735 03-15 18:38 rank_2/UV-Atmos.0001.clog
> >  and in addition, some information is written directly to the screen,
> >  unless you redirect the output as in "run_cpl_test", e.g. line 278:
> >    mpirun $RunOpt  > std_outp 2>&1
> > It might be useful to know the content of these 4 files (std_outp & *.clog)
> > to check whether it starts correctly (and possibly where it gets stuck).
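
One possible way to gather the content of these 4 files in one go is sketched
below. It assumes that std_outp sits in the same directory as the rank_?
sub-directories; the function name report_cpl_logs is made up for this example.

```shell
#!/bin/sh
# Print the size and the first lines of std_outp and of each rank_?/*.clog
# found under DIR (default: current directory).
# Usage: report_cpl_logs verification/cpl_aim+ocn
report_cpl_logs() {
    dir=${1:-.}
    for f in "$dir"/std_outp "$dir"/rank_?/*.clog; do
        if [ -f "$f" ]; then
            # wc -c gives the size in bytes; tr removes padding spaces
            # that some wc implementations add.
            echo "== $f ($(wc -c < "$f" | tr -d ' ') bytes)"
            head -n 20 "$f"
        else
            # Also reached when the *.clog pattern matches nothing.
            echo "== $f (missing)"
        fi
    done
}
```

An empty std_outp together with missing *.clog files would suggest that the
executables never got past their very first initialisation steps.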
> > 
> > Cheers,
> > Jean-Michel
> > 
> > On Mon, Mar 13, 2017 at 04:14:23PM +0000, Lu, Xing wrote:
> >> Hi David,
> >> 
> >> Thank you very much for replying! I found the correct command to run the 3 executables together, but I'm stuck at a new place. The program seems to freeze after I run the command. It produced two STDERR and two STDOUT files, but they are empty. However, the model is still running and consuming computer resources. I tried setting ntimesteps to 0 in both data files and ran the model again, but it still doesn't work. Do you know why it gets stuck like this?
> >> 
> >> Thanks!
> >> Xing
> >> 
> >> 
> >>> On Mar 9, 2017, at 4:26, David Ferreira <dfer at mit.edu> wrote:
> >>> 
> >>> Hi Xing,
> >>> In case you are still stuck: the next step for you is to find out which command to use to run the 3 executables together. You might want to do this outside of run_cpl_test.
> >>> 
> >>> Unfortunately, this command very much depends on which system you are running.
> >>> Start from the mpi command you use to run single-executable jobs and work from there. run_cpl_test contains a few examples with mpirun, but maybe you need to use another command. For example, on one system (Cray) I use this:
> >>> 
> >>> aprun -n 1 -cc 0 ./executables/$exC : -n 12 ./executables/$exO : -n 12 ./executables/$exA >& OECpl$period
> >>> 
> >>> and on Pleiades I use this:
> >>> mpiexec_mpt -np  1 ./executables/$exC : -np $NpO ./executables/$exO : -np $NpA ./executables/$exA > OECpl$period 2>&1
> >>> 
> >>> (exC, exA, and exO are the executables for the coupler/atm/ocean stored in a directory "executables")
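
Both commands follow the same pattern: one launcher, then one "count flag,
number of processes, executable" group per component, separated by " : ". A
small sketch that only prints such a line (it does not run anything) is given
below; the function name mpmd_cmd and its argument order are made up for this
example, and whether a given launcher accepts this syntax depends on the system.

```shell
#!/bin/sh
# Print (not run) a colon-separated multi-executable launch line:
#   LAUNCHER FLAG 1 CPL : FLAG NPO OCN : FLAG NPA ATM
# The coupler always gets 1 process, as in the two commands quoted above.
# Usage: mpmd_cmd LAUNCHER FLAG CPL OCN NPO ATM NPA
mpmd_cmd() {
    launcher=$1   # e.g. mpirun, mpiexec_mpt, aprun
    flag=$2       # process-count option of that launcher, e.g. -np or -n
    cpl=$3        # coupler executable
    ocn=$4        # ocean executable
    npo=$5        # number of ocean processes
    atm=$6        # atmosphere executable
    npa=$7        # number of atmosphere processes
    echo "$launcher $flag 1 $cpl : $flag $npo $ocn : $flag $npa $atm"
}
```

For example, mpmd_cmd mpiexec_mpt -np ./executables/exC ./executables/exO 12 ./executables/exA 12
prints a line of the same form as the Pleiades command, which can be inspected
before it is run with the output redirected (e.g. > std_outp 2>&1).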
> >>> 
> >>> Sometimes the best way is to ask your IT service to give you the magical combination of options to get multiple executables running.
> >>> 
> >>> cheers,
> >>> david
> >>> 
> >>> 
> >>> 
> >>> ________________________________________
> >>> From: Lu, Xing [xlu at rsmas.miami.edu]
> >>> Sent: Wednesday, March 01, 2017 9:26 PM
> >>> To: mitgcm-support at mitgcm.org
> >>> Subject: [MITgcm-support] Coupled tutorial cpl_aim+ocn questions
> >>> 
> >>> Hi all,
> >>> 
> >>> I have some questions about running the cpl_aim+ocn tutorial. I can create mitgcmuv executables in build_atm, build_ocn and build_cpl with no problem.
> >>> 
> >>> So what is the next step to run the tutorial? I have 3 executables in 3 different directories and I'm not sure how to call them. I tried run_cpl_test but it does not work, and I don't really understand it. Does anyone know how to get the coupled model running?
> >>> 
> >>> Thanks a lot!
> >>> 
> >>> Xing
> >>> _______________________________________________
> >>> MITgcm-support mailing list
> >>> MITgcm-support at mitgcm.org
> >>> http://mitgcm.org/mailman/listinfo/mitgcm-support


