[MITgcm-support] Coupled tutorial cpl_aim+ocn questions

Jean-Michel Campin jmc at mit.edu
Mon Mar 20 18:51:50 EDT 2017


Hi Xing,

I took a look at the Makefile you sent me:

1) It seems that you have 3 different copies (in code_cpl, code_ocn & code_atm)
  of the pair of files: ATMIDS.h & OCNIDS.h
  whereas I expect to find only 1 pair in dir "shared_code".
  This pair of files needs to be identical for the 3 builds (the reason
  why they are in "shared_code"). I don't know what could happen if they are
  different.
  In a standard build process, this "shared_code" dir is "magically" included
  by the 3 build_???/genmake_local files.
  Similarly, you don't need to specify "-mods=../code" when generating
  the Makefile (genmake2 command); the 3 genmake_local files take care of this.

2) You may want to try to compile the 3 executables using the genmake2 "-devel"
  option (as is done in the run_cpl_test script), which is supported with the
  optfile you are using, i.e. linux_amd64_gfortran. This might help to get
  more consistent error messages.
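
  A quick way to check point 1 could look like the sketch below. It is only
  a sketch under stated assumptions: the helper name find_stray_ids is made up
  for illustration (it is not part of MITgcm), and it assumes that code_cpl,
  code_ocn, code_atm and shared_code sit side by side in the experiment dir.

```shell
#!/bin/sh
# Sketch: list copies of ATMIDS.h / OCNIDS.h that live outside shared_code.
# find_stray_ids is a hypothetical helper name, not part of MITgcm.
# Usage: find_stray_ids <experiment_dir>
find_stray_ids() {
    exp_dir=$1
    for d in code_cpl code_ocn code_atm; do
        for f in ATMIDS.h OCNIDS.h; do
            # Print every copy found in a per-component code directory.
            if [ -f "$exp_dir/$d/$f" ]; then
                echo "$d/$f"
            fi
        done
    done
}

# Example call (assumed location: run from the experiment directory):
# find_stray_ids .
```

  Any file it prints can be compared with the shared_code version (e.g. with
  cmp) before it is removed, so that only the shared pair is left.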
  
Cheers,
Jean-Michel

On Mon, Mar 20, 2017 at 05:16:53PM +0000, Lu, Xing wrote:
> Hi Jean-Michel,
> 
> I added STOP 'MITCPLR_init1: STOP before any MPI_Bcast' to mitcplr_init1.F, and this time there is output in std_outp, but it looks a little different than yours. It didn't write anything about UV-Ocean:
> 
> STOP MITCPLR_init1: STOP before any MPI_Bcast
>  MITCPLR_init1:            0  Coupler Rank/Size =            0  /           3
> STOP MITCPLR_init1: STOP before any MPI_Bcast
>  MITCPLR_init1:            2  UV-Atmos Rank/Size =            2  /           3
> 
> Maybe that is why the model is stuck?
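
To confirm which component never reports, a small check on std_outp may help.
This is only a sketch: the helper name missing_components is made up for
illustration (it is not part of MITgcm), and it assumes each component prints
one "<name> Rank/Size" line, as in the output quoted in this thread.

```shell
#!/bin/sh
# Sketch: report which of the 3 components did not write a "Rank/Size" line.
# missing_components is a hypothetical helper name, not part of MITgcm.
# Usage: missing_components <log_file>
missing_components() {
    log=$1
    for name in Coupler UV-Ocean UV-Atmos; do
        # Each component is assumed to print "<name> Rank/Size = ..." once.
        if ! grep -q -- "$name Rank/Size" "$log"; then
            echo "$name"
        fi
    done
}

# Example call (assumed file name, as used in this thread):
# missing_components std_outp
```

An empty result means all 3 components reached MITCPLR_init1; any name printed
is a component that produced no such line in the log.
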
> 
> I also sent the 3 Makefiles to you off the list.
> 
> Thanks!
> Xing
> 
> 
> 
> On 2017-03-16, at 4:13, Jean-Michel Campin <jmc at mit.edu> wrote:
> 
> Hi Xing,
> 
> This is strange.
> Normally, writing to screen/std_outp happens very early, from S/R MITCPLR_init1
> (which is called by each of the 3 executables), and even before doing any real
> communication.
> 
> Just to check, could you send me (off the list) the 3 Makefiles (gzipped)?
> 
> Otherwise, you may want to try to add:
> 
>         STOP 'MITCPLR_init1: STOP before any MPI_Bcast'
> 
> in pkg/compon_communic/mitcplr_init1.F, after line 43, and recompile the 3 executables.
> This should terminate the run immediately with something in std_outp like:
> 
> MITCPLR_init1:            0  Coupler Rank/Size =            0  /           3
> STOP MITCPLR_init1: STOP before any MPI_Bcast
> MITCPLR_init1:            1  UV-Ocean Rank/Size =            1  /           3
> STOP MITCPLR_init1: STOP before any MPI_Bcast
> MITCPLR_init1:            2  UV-Atmos Rank/Size =            2  /           3
> STOP MITCPLR_init1: STOP before any MPI_Bcast
> 
> Cheers,
> Jean-Michel
> 
> On Thu, Mar 16, 2017 at 04:24:35PM +0000, Lu, Xing wrote:
> Hi Jean-Michel,
> 
> Thank you for helping!
> 
> The debugMode=.TRUE. doesn't help. Everything stays the same.
> 
> There is no .clog file in verification/cpl_aim+ocn.
> 
> I directed the output to std_outp, and there is nothing printed on the screen. The only thing on the screen is a command similar to "mpirun $RunOpt > std_outp 2>&1".
> 
> And std_outp is also empty.
> 
> Do you have any other suggestions that I can try? Thanks!
> 
> Xing
> 
> 
> 
> On 2017-03-15, at 7:08, Jean-Michel Campin <jmc at mit.edu> wrote:
> 
> Hi Xing,
> 
> 1) You may want to try to uncomment:
>  debugMode=.TRUE.,
>  in the 2 eedata files: input_atm/eedata & input_ocn/eedata
> This forces the 2 components to flush the I/O buffer for STDOUT & STDERR
>  (+ writes much more information to STDOUT)
> 
> 2) If debugMode=T does not help, there are a few things that can be checked:
>  The coupling interface is writing some log files:
>  in verification/cpl_aim+ocn,
> ls -l rank_?/*.clog
> -rw-rw-r--. 1 jmc 2909 03-15 18:39 rank_0/Coupler.0000.clog
> -rw-rw-r--. 1 jmc  735 03-15 18:38 rank_1/UV-Ocean.0001.clog
> -rw-rw-r--. 1 jmc  735 03-15 18:38 rank_2/UV-Atmos.0001.clog
> and in addition, some information is written directly to the screen,
> unless you redirect the output as in "run_cpl_test", e.g. line 278:
>   mpirun $RunOpt  > std_outp 2>&1
> It might be useful to know the contents of these 4 files (std_outp & *.clog)
> to check whether it starts correctly (and possibly where it gets stuck).
> 
> Cheers,
> Jean-Michel
> 
> On Mon, Mar 13, 2017 at 04:14:23PM +0000, Lu, Xing wrote:
> Hi David,
> 
> Thank you very much for replying! I found the correct command to run the 3 executables together, but I'm stuck at a new place. The program seems to freeze after I run the command. It produced two STDERR and two STDOUT files, but they are empty. However, the model is still running and consuming computer resources. I tried setting ntimesteps to 0 in both data files and running the model again, but it still doesn't work. Do you know why it gets stuck in that situation?
> 
> Thanks!
> Xing
> 
> 
> On 2017-03-09, at 4:26, David Ferreira <dfer at mit.edu> wrote:
> 
> Hi Xing,
> In case you are still stuck: the next step for you is to find out which command to use to run the 3 executables together. You might want to do this outside of run_cpl_test.
> 
> Unfortunately, this command very much depends on which system you are running on.
> Start from the mpi command you use to run single-executable jobs and work from there. run_cpl_test contains a few examples with mpirun, but maybe you need to use another command. For example, on one system (Cray) I use this:
> 
> aprun -n 1 -cc 0 ./executables/$exC : -n 12 ./executables/$exO : -n 12 ./executables/$exA >& OECpl$period
> 
> and on Pleiades I use this:
> mpiexec_mpt -np  1 ./executables/$exC : -np $NpO ./executables/$exO : -np $NpA ./executables/$exA > OECpl$period 2>&1
> 
> (exC, exA, and exO are the executables for the coupler/atm/ocean stored in a directory "executables")
> 
> Sometimes the best way is to ask your IT service to give you the magical combination of options to get the multiple executables running.
> 
> cheers,
> david
> 
> 
> 
> ________________________________________
> From: Lu, Xing [xlu at rsmas.miami.edu]
> Sent: Wednesday, March 01, 2017 9:26 PM
> To: mitgcm-support at mitgcm.org
> Subject: [MITgcm-support] Coupled tutorial cpl_aim+ocn questions
> 
> Hi all,
> 
> I have some questions about running the cpl_aim+ocn tutorial. I can create mitgcmuv executables in build_atm, build_ocn and build_cpl with no problem.
> 
> So what is the next step to run the tutorial? I have 3 executables in 3 different directories and I'm not sure how to call them. I tried run_cpl_test but it does not work, and I don't really understand it. Does anyone know how to get the coupled model running?
> 
> Thanks a lot!
> 
> Xing
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org<mailto:MITgcm-support at mitgcm.org>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__mitgcm.org_mailman_listinfo_mitgcm-2Dsupport&d=DwIFEA&c=y2w-uYmhgFWijp_IQN0DhA&r=DnSHG_zP2bDt5JdodbR-S6ABxS0tQOiReyQ2-3zDX6M&m=39gfeYSsIHRXpEGZm714NopqjGeC4SfuqrSemfaZuFA&s=yKcb5G4RuQtS6upDDmIzIcX3dE6UB2COrkkEjWF8_r4&e=
> 