[MITgcm-support] Coupled tutorial cpl_aim+ocn questions

Lu, Xing xlu at rsmas.miami.edu
Wed Mar 22 18:36:36 EDT 2017


Hi Jean-Michel,

I removed ATMIDS.h and OCNIDS.h from code_cpl, code_ocn and code_atm, and also removed “-mods=../code” from the command, but this time it gives some error messages when doing the “make depend” step for cpl:

accept_component_registrations.F:30: error: ATMIDS.h: No such file or directory
accept_component_registrations.F:31: error: OCNIDS.h: No such file or directory
cpl_recv_atm_atmconfig.F:26: error: ATMIDS.h: No such file or directory
cpl_recv_atm_fields.F:28: error: ATMIDS.h: No such file or directory
cpl_recv_ocn_fields.F:29: error: OCNIDS.h: No such file or directory
cpl_recv_ocn_ocnconfig.F:26: error: OCNIDS.h: No such file or directory
cpl_register_atm.F:20: error: ATMIDS.h: No such file or directory
cpl_register_ocn.F:21: error: OCNIDS.h: No such file or directory
cpl_send_atm_cplparms.F:24: error: ATMIDS.h: No such file or directory
cpl_send_atm_fields.F:29: error: ATMIDS.h: No such file or directory
cpl_send_atm_ocnconfig.F:25: error: ATMIDS.h: No such file or directory
cpl_send_ocn_atmconfig.F:25: error: ATMIDS.h: No such file or directory
cpl_send_ocn_atmconfig.F:28: error: OCNIDS.h: No such file or directory
cpl_send_ocn_cplparms.F:24: error: OCNIDS.h: No such file or directory
cpl_send_ocn_fields.F:30: error: OCNIDS.h: No such file or directory

Similar errors appeared during the “make depend” step for ocn and atm, and the same thing happened when using “-devel”. Can these errors be ignored?

Thanks!
Xing




On Mar 20, 2017, at 6:51 PM, Jean-Michel Campin <jmc at mit.edu> wrote:

Hi Xing,

I took a look at the Makefile you sent me:

1) It seems that you have 3 different copies (in code_cpl, code_ocn & code_atm)
 of the pair of files: ATMIDS.h & OCNIDS.h
 whereas I expect to find only 1 pair in dir "shared_code".
 This pair of files needs to be identical for the 3 builds (the reason
 why they are in "shared_code"). I don't know what could happen if they are
 different.
 In a standard build process, this "shared_code" dir is "magically" included
 by the 3 build_???/genmake_local files.
 Similarly, you don't need to specify "-mods=../code" when generating
 the Makefile (genmake2 command); the 3 genmake_local files take care of this.

2) you may want to try to compile the 3 executables using genmake2 "-devel"
 option (as it's done in run_cpl_test script) which is supported with the
 optfile you are using, i.e. linux_amd64_gfortran. This might help to get
 more consistent error messages.

Cheers,
Jean-Michel
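
The layout described in point 1 can be checked mechanically. What follows is a minimal sketch, assuming the experiment directory holds the directories named above (shared_code, code_cpl, code_ocn, code_atm); the function name check_ids_headers is hypothetical and not part of MITgcm.

```shell
# check_ids_headers EXP_DIR   (hypothetical helper, not an MITgcm tool)
# Reports where ATMIDS.h and OCNIDS.h are found under EXP_DIR.
# Returns 0 only if both are in shared_code and no code_* dir holds a copy.
check_ids_headers() {
    exp_dir=${1:-.}
    status=0
    for hdr in ATMIDS.h OCNIDS.h; do
        if [ -f "$exp_dir/shared_code/$hdr" ]; then
            echo "ok: shared_code/$hdr"
        else
            echo "missing: shared_code/$hdr"
            status=1
        fi
        # Any private copy could drift out of sync with the shared pair.
        for dir in code_cpl code_ocn code_atm; do
            if [ -f "$exp_dir/$dir/$hdr" ]; then
                echo "extra copy: $dir/$hdr"
                status=1
            fi
        done
    done
    return $status
}
```

Calling check_ids_headers verification/cpl_aim+ocn from the top of the source tree would list any missing or duplicated header; the path is only an example of where the experiment might live.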

On Mon, Mar 20, 2017 at 05:16:53PM +0000, Lu, Xing wrote:
Hi Jean-Michel,

I added STOP 'MITCPLR_init1: STOP before any MPI_Bcast' to mitcplr_init1.F, and this time there is output in std_outp, but it looks a little different than yours. It didn't write anything about UV-Ocean:

STOP MITCPLR_init1: STOP before any MPI_Bcast
MITCPLR_init1:            0  Coupler Rank/Size =            0  /           3
STOP MITCPLR_init1: STOP before any MPI_Bcast
MITCPLR_init1:            2  UV-Atmos Rank/Size =            2  /           3

Maybe this is why the model is stuck?

I also sent the 3 Makefiles to you off the list.

Thanks!
Xing



On Mar 16, 2017, at 4:13, Jean-Michel Campin <jmc at mit.edu> wrote:

Hi Xing,

This is strange.
Normally, writing to screen/std_outp happens very early, from S/R MITCPLR_init1
(which is called by each of the 3 executables), and even before doing any real
communication.

Just to check, could you send me (off the list) the 3 Makefiles (gzipped)?

Otherwise, you may want to try to add:

       STOP 'MITCPLR_init1: STOP before any MPI_Bcast'

in pkg/compon_communic/mitcplr_init1.F, after line 43, and recompile the 3 executables.
This should terminate the run immediately with something in std_outp like:

MITCPLR_init1:            0  Coupler Rank/Size =            0  /           3
STOP MITCPLR_init1: STOP before any MPI_Bcast
MITCPLR_init1:            1  UV-Ocean Rank/Size =            1  /           3
STOP MITCPLR_init1: STOP before any MPI_Bcast
MITCPLR_init1:            2  UV-Atmos Rank/Size =            2  /           3
STOP MITCPLR_init1: STOP before any MPI_Bcast

Cheers,
Jean-Michel
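
Since the expected output has one "MITCPLR_init1: ... Rank/Size" line per component, a short check can flag a component that never reports. This is only a sketch: the function name report_components is hypothetical, and the component labels (Coupler, UV-Ocean, UV-Atmos) are taken from the output shown above.

```shell
# report_components FILE   (hypothetical helper, not an MITgcm tool)
# Looks for the "MITCPLR_init1: ... Rank/Size" line of each expected
# component and prints "found" or "MISSING" for it.
# Returns 1 if at least one component is missing.
report_components() {
    log_file=$1
    status=0
    for comp in Coupler UV-Ocean UV-Atmos; do
        if grep -q "MITCPLR_init1:.*$comp Rank/Size" "$log_file"; then
            echo "$comp: found"
        else
            echo "$comp: MISSING"
            status=1
        fi
    done
    return $status
}
```

Running report_components std_outp in the run directory prints one line per component, so a missing one stands out at a glance.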

On Thu, Mar 16, 2017 at 04:24:35PM +0000, Lu, Xing wrote:
Hi Jean-Michel,

Thank you for helping!

The debugMode=.TRUE. doesn't help. Everything stays the same.

There is no .clog file in verification/cpl_aim+ocn.

I redirected the output to std_outp, and nothing is printed on the screen. The only thing on the screen is a command similar to "mpirun $RunOpt > std_outp 2>&1".

And std_outp is also empty.

Do you have any other suggestions that I can try? Thanks!

Xing



On Mar 15, 2017, at 7:08, Jean-Michel Campin <jmc at mit.edu> wrote:

Hi Xing,

1) You may want to try to uncomment:
debugMode=.TRUE.,
in the 2 eedata files: input_atm/eedata & input_ocn/eedata
This forces the 2 components to flush the I/O buffer for STDOUT & STDERR
(and to write much more information to STDOUT)

2) If debugMode=T does not help, there are a few things that can be checked:
The coupling interface is writing some log files:
in verification/cpl_aim+ocn,
ls -l rank_?/*.clog
-rw-rw-r--. 1 jmc 2909 03-15 18:39 rank_0/Coupler.0000.clog
-rw-rw-r--. 1 jmc  735 03-15 18:38 rank_1/UV-Ocean.0001.clog
-rw-rw-r--. 1 jmc  735 03-15 18:38 rank_2/UV-Atmos.0001.clog
and in addition, some information is written directly to the screen,
unless you re-direct the output as in "run_cpl_test", e.g. line 278:
 mpirun $RunOpt  > std_outp 2>&1
It might be useful to know the content of these 4 files (std_outp & *.clog)
to check whether it starts correctly (and possibly where it gets stuck)

Cheers,
Jean-Michel
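
Gathering the state of these 4 files in one step makes it easier to report back. The sketch below assumes the file layout shown in point 2 (std_outp plus rank_?/*.clog inside the run directory); the function name show_coupler_logs is hypothetical.

```shell
# show_coupler_logs RUN_DIR   (hypothetical helper, not an MITgcm tool)
# Prints the size of std_outp and of every rank_?/*.clog file under
# RUN_DIR, and says so when a file is empty or absent.
show_coupler_logs() {
    run_dir=${1:-.}
    clog_count=0
    if [ -s "$run_dir/std_outp" ]; then
        size=$(wc -c < "$run_dir/std_outp" | tr -d ' ')
        echo "std_outp: $size bytes"
    elif [ -f "$run_dir/std_outp" ]; then
        echo "std_outp: empty"
    else
        echo "std_outp: not found"
    fi
    for f in "$run_dir"/rank_?/*.clog; do
        [ -f "$f" ] || continue   # an unmatched glob stays literal; skip it
        clog_count=$((clog_count + 1))
        name=${f#"$run_dir"/}
        size=$(wc -c < "$f" | tr -d ' ')
        echo "$name: $size bytes"
    done
    [ "$clog_count" -gt 0 ] || echo "no rank_?/*.clog files found"
}
```

An empty std_outp together with no .clog files would suggest that the run stops before the coupling interface writes anything.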

On Mon, Mar 13, 2017 at 04:14:23PM +0000, Lu, Xing wrote:
Hi David,

Thank you very much for replying! I found the correct command to run the 3 executables together, but I'm stuck at a new place. The program seems to freeze after I run the command. It produced two STDERR and two STDOUT files, but they are empty. However, the model is still running and consuming computer resources. I tried setting ntimesteps to 0 in both data files and ran the model again, but it still doesn't work. Do you know why it gets stuck in that situation?

Thanks!
Xing


On Mar 9, 2017, at 4:26, David Ferreira <dfer at mit.edu> wrote:

Hi Xing,
In case you are still stuck: the next step for you is to find out which command to use to run the 3 executables together. You might want to do this outside of run_cpl_test.

Unfortunately, this command very much depends on which system you are running on.
Start from the MPI command you use to run single-executable jobs and work from there. run_cpl_test contains a few examples with mpirun, but maybe you need to use another command. For example, on one system (Cray) I use this:

aprun -n 1 -cc 0 ./executables/$exC : -n 12 ./executables/$exO : -n 12 ./executables/$exA >& OECpl$period

and on Pleiades I use this:
mpiexec_mpt -np  1 ./executables/$exC : -np $NpO ./executables/$exO : -np $NpA ./executables/$exA > OECpl$period 2>&1

(exC, exA, and exO are the executables for the coupler/atm/ocean stored in a directory "executables")

Sometimes the best way is to ask your IT service to give you the magical combination of options to get the multiple executables running.

cheers,
david
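
The two launch lines above share one pattern: a launcher, then colon-separated groups of "process-count flag, count, executable", with 1 process for the coupler. A minimal sketch that only prints such a line (it does not run anything) is given below; the function name build_mpmd_cmd and its argument order are hypothetical, and whether a given launcher accepts the colon syntax depends on the system.

```shell
# build_mpmd_cmd LAUNCHER NPROC_FLAG CPL_EXE NP_OCN OCN_EXE NP_ATM ATM_EXE
# (hypothetical helper) Prints, without running it, a colon-separated
# multiple-executable launch line: 1 process for the coupler, then the
# ocean group, then the atmosphere group.
build_mpmd_cmd() {
    launcher=$1
    nflag=$2
    cpl_exe=$3
    np_ocn=$4
    ocn_exe=$5
    np_atm=$6
    atm_exe=$7
    printf '%s\n' "$launcher $nflag 1 $cpl_exe : $nflag $np_ocn $ocn_exe : $nflag $np_atm $atm_exe"
}
```

For example, build_mpmd_cmd mpiexec_mpt -np ./executables/cpl 12 ./executables/ocn 12 ./executables/atm prints a line of the same shape as the Pleiades command; the executable names here are placeholders. Printing first and inspecting the line before running it is a deliberate choice, since a wrong launch line can hang silently.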



________________________________________
From: Lu, Xing [xlu at rsmas.miami.edu]
Sent: Wednesday, March 01, 2017 9:26 PM
To: mitgcm-support at mitgcm.org
Subject: [MITgcm-support] Coupled tutorial cpl_aim+ocn questions

Hi all,

I have some questions about running the cpl_aim+ocn tutorial. I can create mitgcmuv executables in build_atm, build_ocn and build_cpl with no problem.

So what is the next step to run the tutorial? I have 3 executables in 3 different directories and I'm not sure how to call them. I tried run_cpl_test but it does not work, and I don't really understand it. Does anyone know how to get the coupled model running?

Thanks a lot!

Xing
_______________________________________________
MITgcm-support mailing list
MITgcm-support at mitgcm.org
http://mitgcm.org/mailman/listinfo/mitgcm-support

