[MITgcm-support] (no subject)
Stephen Cousins
steve.cousins at maine.edu
Thu Jul 14 15:48:37 EDT 2016
Hi Malte,
Thanks very much. That does help a lot. For one of the runs the total time
was 285000 seconds. SOLVE_FOR_PRESSURE is 238445 seconds and CG3D is 222765
seconds. Those must overlap, i.e. CG3D is part of SOLVE_FOR_PRESSURE? If so,
CG3D is taking up by far the most time, and I think what Jody was saying
about dialing down the CG3D iterations may help a lot.
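
A quick check of those shares, using only the three figures quoted above. This is a minimal sketch: the helper name `share` is made up for illustration, and it assumes the CG3D timer is nested inside SOLVE_FOR_PRESSURE.

```python
# Figures quoted above, in seconds, for one run.
total = 285000.0
solve_for_pressure = 238445.0
cg3d = 222765.0


def share(part, whole):
    """Return part/whole as a fraction between 0 and 1."""
    return part / whole


print(f"SOLVE_FOR_PRESSURE / total: {share(solve_for_pressure, total):.1%}")
print(f"CG3D / total:               {share(cg3d, total):.1%}")
# Only meaningful if CG3D really is timed inside SOLVE_FOR_PRESSURE.
print(f"CG3D / SOLVE_FOR_PRESSURE:  {share(cg3d, solve_for_pressure):.1%}")
```

That works out to roughly 84% of the run in SOLVE_FOR_PRESSURE and 78% in CG3D, so nearly all of the pressure solve would be the 3D solver.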
Steve
On Thu, Jul 14, 2016 at 2:26 PM, Malte Jansen <mfj at uchicago.edu> wrote:
> Steve,
>
> At the end of the run the model should produce a summary of how much time
> was spent doing what, which should be written in the STDOUT file. It should
> look something like what I pasted below. It might be helpful to look at
> that. (In addition to all the things Jody already pointed out.)
>
> Cheers,
> Malte
>
> ------------------------------------------------------
> Malte F Jansen
> Assistant Professor
> Department of the Geophysical Sciences
> The University of Chicago
> 5734 South Ellis Avenue
> Chicago, IL 60637 USA
>
>
> (PID.TID 0000.0001) %CHECKPOINT 21900000 0021900000
> (PID.TID 0000.0001) Seconds in section "ALL [THE_MODEL_MAIN]":
> (PID.TID 0000.0001) User time: 81553.5515705012
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83194.7516629696
> (PID.TID 0000.0001) No. starts: 1
> (PID.TID 0000.0001) No. stops: 1
> (PID.TID 0000.0001) Seconds in section "INITIALISE_FIXED [THE_MODEL_MAIN]":
> (PID.TID 0000.0001) User time: 0.133980002254248
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 1.00463199615479
> (PID.TID 0000.0001) No. starts: 1
> (PID.TID 0000.0001) No. stops: 1
> (PID.TID 0000.0001) Seconds in section "THE_MAIN_LOOP [THE_MODEL_MAIN]":
> (PID.TID 0000.0001) User time: 81553.4175904989
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83193.7469999790
> (PID.TID 0000.0001) No. starts: 1
> (PID.TID 0000.0001) No. stops: 1
> (PID.TID 0000.0001) Seconds in section "INITIALISE_VARIA [THE_MAIN_LOOP]":
> (PID.TID 0000.0001) User time: 7.298800349235535E-002
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 0.855633974075317
> (PID.TID 0000.0001) No. starts: 1
> (PID.TID 0000.0001) No. stops: 1
> (PID.TID 0000.0001) Seconds in section "MAIN LOOP [THE_MAIN_LOOP]":
> (PID.TID 0000.0001) User time: 81553.3446024954
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83192.8913462162
> (PID.TID 0000.0001) No. starts: 1
> (PID.TID 0000.0001) No. stops: 1
> (PID.TID 0000.0001) Seconds in section "MAIN_DO_LOOP [THE_MAIN_LOOP]":
> (PID.TID 0000.0001) User time: 81545.1413792670
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83145.3943319321
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "FORWARD_STEP [MAIN_DO_LOOP]":
> (PID.TID 0000.0001) User time: 81528.1345684826
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83048.1237185001
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_STATEVARS_DIAGS [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 1078.10327297449
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 2827.88697195053
> (PID.TID 0000.0001) No. starts: 14600000
> (PID.TID 0000.0001) No. stops: 14600000
> (PID.TID 0000.0001) Seconds in section "LOAD_FIELDS_DRIVER [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 876.897960990667
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 1301.42414379120
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "EXTERNAL_FLDS_LOAD [LOAD_FLDS_DRIVER]":
> (PID.TID 0000.0001) User time: 28.5411767959595
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 49.7934198379517
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "RBCS_FIELDS_LOAD [I/O]":
> (PID.TID 0000.0001) User time: 754.574070543051
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 1100.32427453995
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_ATMOSPHERIC_PHYS [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 35.8292319774628
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 49.0405611991882
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_OCEANIC_PHYS [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 8746.01205241680
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 9023.84966444969
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "THERMODYNAMICS [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 11385.7259706557
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 11436.7402439117
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DYNAMICS [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 14802.8334793448
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 14765.4700200558
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "UPDATE_SURF_DR [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 261.719356000423
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 286.261898517609
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "SOLVE_FOR_PRESSURE [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 34763.6573372781
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 36091.8884937763
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "MOM_CORRECTION_STEP [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 501.247010856867
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 506.126838922501
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "INTEGR_CONTINUITY [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 765.209014028311
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 747.568499565125
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "TRC_CORRECTION_STEP [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 47.1817142963409
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 49.4994082450867
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "BLOCKING_EXCHANGES [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 7571.84065386653
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 4694.35633111000
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_STATEVARS_TAVE [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 6.13304901123047
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 49.2227129936218
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "MONITOR [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 112.063203334808
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 214.197835445404
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_THE_MODEL_IO [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 9.37701416015625
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 62.9506850242615
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_WRITE_PICKUP [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 8.42355561256409
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 53.2940809726715
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) // ======================================================
> (PID.TID 0000.0001) // Tile <-> Tile communication statistics
> (PID.TID 0000.0001) // ======================================================
> (PID.TID 0000.0001) // o Tile number: 000001
> (PID.TID 0000.0001) // No. X exchanges = 0
> (PID.TID 0000.0001) // Max. X spins = 0
> (PID.TID 0000.0001) // Min. X spins = 1000000000
> (PID.TID 0000.0001) // Total. X spins = 0
> (PID.TID 0000.0001) // Avg. X spins = 0.00E+00
> (PID.TID 0000.0001) // No. Y exchanges = 0
> (PID.TID 0000.0001) // Max. Y spins = 0
> (PID.TID 0000.0001) // Min. Y spins = 1000000000
> (PID.TID 0000.0001) // Total. Y spins = 0
> (PID.TID 0000.0001) // Avg. Y spins = 0.00E+00
> (PID.TID 0000.0001) // o Thread number: 000001
> (PID.TID 0000.0001) // No. barriers = 2103458542
> (PID.TID 0000.0001) // Max. barrier spins = 1
> (PID.TID 0000.0001) // Min. barrier spins = 1
> (PID.TID 0000.0001) // Total barrier spins = 2103458542
> (PID.TID 0000.0001) // Avg. barrier spins = 1.00E+00
> PROGRAM MAIN: Execution ended Normally
>
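
A summary like the one above is easier to read once it is ranked by share of the total. The following is only a sketch of one way to do that: the function name `parse_timers` is hypothetical, the sample is a short excerpt of the output pasted above, and the parser assumes each section header is followed by a "Wall clock time:" line. It tolerates mail-quote markers and section names wrapped across two lines.

```python
import re


def parse_timers(text):
    """Map each timer section name to its wall clock time in seconds."""
    cleaned = []
    for line in text.splitlines():
        # Drop mail-quote markers, then the "(PID.TID nnnn.nnnn)" prefix.
        line = line.lstrip("> \t")
        line = re.sub(r"^\(PID\.TID \d+\.\d+\)\s*", "", line)
        cleaned.append(line.strip())
    # Join into one string so wrapped section names match as a unit.
    flat = " ".join(cleaned)

    pattern = re.compile(
        r'Seconds in section "([^"]+)":.*?Wall clock time:\s*([0-9.Ee+-]+)'
    )
    timers = {}
    for name, seconds in pattern.findall(flat):
        # Collapse the padding between the section and its parent.
        timers[" ".join(name.split())] = float(seconds)
    return timers


sample = '''\
> (PID.TID 0000.0001) Seconds in section "ALL
> [THE_MODEL_MAIN]":
> (PID.TID 0000.0001) User time: 81553.5515705012
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83194.7516629696
> (PID.TID 0000.0001) Seconds in section "SOLVE_FOR_PRESSURE
> [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 34763.6573372781
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 36091.8884937763
'''

timers = parse_timers(sample)
total = timers["ALL [THE_MODEL_MAIN]"]
# Largest sections first, with each one's share of the whole run.
for name, seconds in sorted(timers.items(), key=lambda kv: -kv[1]):
    print(f"{seconds:12.1f} s  {seconds / total:6.1%}  {name}")
```

To use it on a real run, read the STDOUT file into a string and pass that to `parse_timers` in place of `sample`.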
>
>
> On Jul 14, 2016, at 10:45 AM, Stephen Cousins <steve.cousins at maine.edu>
> wrote:
>
> Hi,
>
> I'm trying to help researchers from the University of Maine to run MITgcm.
> The model runs, but they think it should run much faster.
>
> I have run or helped run many models while working for the Ocean Modeling
> Group; however, this is the first time I have encountered MITgcm.
>
> With Rutgers ROMS there is a method of running a number of tiles per
> sub-domain and it seems that MITgcm can do that too. The reason for doing
> so with ROMS was (I believe) to try to get the tiles to fit in cache to
> increase performance. Is that the reason for doing so with MITgcm? We have
> tried a number of combinations with not much luck.
>
> For testing, the full domain we have is 600 x 520 x 21 using 64 processes
> and getting only 30 time steps per minute. I wondered if the domain was too
> small for that many processes so I reduced the number of processes but that
> didn't help. The plan is to triple the resolution in each horizontal
> direction and double in the vertical.
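
For a sense of scale, the planned refinement can be worked out from the figures above. This is a back-of-the-envelope sketch: the grid sizes and the 30 steps per minute come from the message, while the linear cost scaling is an assumption, not a measurement.

```python
# Current test grid, as stated above.
nx, ny, nz = 600, 520, 21

# Planned refinement: 3x in each horizontal direction, 2x in the vertical.
nx_new, ny_new, nz_new = 3 * nx, 3 * ny, 2 * nz

points_now = nx * ny * nz
points_new = nx_new * ny_new * nz_new
growth = points_new / points_now

print(f"current grid: {nx} x {ny} x {nz} = {points_now:,} points")
print(f"planned grid: {nx_new} x {ny_new} x {nz_new} = {points_new:,} points")
print(f"growth in grid points: {growth:.0f}x")

# Assumption (not from the thread): cost per step scales linearly with the
# number of grid points and the process count stays the same.
steps_per_minute_now = 30
print(f"estimated steps per minute: {steps_per_minute_now / growth:.1f}")
```

The estimate ignores that a finer grid usually also needs a shorter time step, so the real cost per simulated day would likely grow by more than the grid-point ratio.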
>
> Our cluster has nodes with Intel E5-2600v3 processors totaling 24 cores
> per node with FDR-10 Infiniband. The way the jobs were specified, some
> compute nodes had many processes (like 20) on them and some had only 1 or
> 2. I experimented and found that by using only 4 cores per node and only 48
> cores, it ran close to twice as fast as with 64 cores and a mix of the
> numbers of cores per node. To me this indicates that the
> inter-process-communication is high and it is saturating the memory
> bandwidth of the nodes with large process counts. That might point to the
> subdomains being too small (the halo region being a significant proportion
> of the subdomain), but in that case when I decreased the run to 16 cores I
> would have thought that it would have improved things quite a bit. I
> haven't profiled the code yet. I thought it might be quicker to write to
> you to get some information first.
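
The halo argument above can be put into rough numbers. This is a sketch under stated assumptions: the thread does not say how the processes were laid out, so the 8 x 8 and 4 x 4 decompositions are hypothetical, and the halo width of 3 is a placeholder for whatever OLx and OLy are set to in SIZE.h. It also assumes one tile per process.

```python
def halo_overhead(nx, ny, npx, npy, olx=3, oly=3):
    """Return (sNx, sNy, halo cells / interior cells) for one tile per process.

    olx and oly are the overlap (halo) widths. The default of 3 is an
    assumed value; the real one comes from OLx/OLy in SIZE.h.
    """
    if nx % npx or ny % npy:
        raise ValueError("grid must divide evenly among processes")
    snx, sny = nx // npx, ny // npy
    interior = snx * sny
    with_halo = (snx + 2 * olx) * (sny + 2 * oly)
    return snx, sny, (with_halo - interior) / interior


# Hypothetical layouts of the 600 x 520 grid on 64 and 16 processes.
for npx, npy in [(8, 8), (4, 4)]:
    snx, sny, overhead = halo_overhead(600, 520, npx, npy)
    print(f"{npx * npy:3d} processes: tile {snx} x {sny}, "
          f"halo/interior = {overhead:.1%}")
```

Under these assumptions the halo is about 18% of the interior at 64 processes and about 9% at 16, which is noticeable but probably not enough on its own to explain the slowdown.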
>
> Can you please explain what the optimal layout is for performance? Is
> there an optimal size subdomain that you know of for these processors?
> Optimal number of tiles per subdomain? Also can you explain at a somewhat
> high level any other factors to consider when running the model to get
> better performance? Also, are there Intel Haswell CPU-specific compiler
> flags (we're using the Intel compilers with MVAPICH2) that you can
> recommend to us? Finally, is there a benchmark case where we can verify
> that we are getting the expected performance?
>
> Thanks very much,
>
> Steve
> --
> ________________________________________________________________
> Steve Cousins Supercomputer Engineer/Administrator
> Advanced Computing Group University of Maine System
> 244 Neville Hall (UMS Data Center) (207) 561-3574
> Orono ME 04469 steve.cousins at maine.edu
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
>
--
________________________________________________________________
Steve Cousins Supercomputer Engineer/Administrator
Advanced Computing Group University of Maine System
244 Neville Hall (UMS Data Center) (207) 561-3574
Orono ME 04469 steve.cousins at maine.edu