[MITgcm-support] (no subject)

Stephen Cousins steve.cousins at maine.edu
Thu Jul 14 15:48:37 EDT 2016


Hi Malte,

Thanks very much. That does help a lot. For one of the runs the total time
was 285000 seconds. SOLVE_FOR_PRESSURE is 238445 and CG3D is 222765. Those
must overlap, i.e. CG3D is part of SOLVE_FOR_PRESSURE? If so, CG3D is
taking up by far the most time (roughly 78% of the total). So I think what
Jody was saying about dialing down the CG3D iterations may help a lot.
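
As a quick check of those numbers, here is a minimal Python sketch of the
arithmetic. It assumes the timers are nested, so that the CG3D time is
counted inside SOLVE_FOR_PRESSURE (this is the question asked above, not
something confirmed in the thread). The helper name share() is made up for
illustration.

```python
# Wall-clock seconds quoted in the message above.
total = 285000.0
solve_for_pressure = 238445.0
cg3d = 222765.0


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: CG3D is timed inside SOLVE_FOR_PRESSURE, so the two
# numbers overlap and must not be added together.
print(f"SOLVE_FOR_PRESSURE / total: {share(solve_for_pressure, total):.1f}%")
print(f"CG3D / total:               {share(cg3d, total):.1f}%")
print(f"CG3D / SOLVE_FOR_PRESSURE:  {share(cg3d, solve_for_pressure):.1f}%")
print(f"Outside SOLVE_FOR_PRESSURE: {total - solve_for_pressure:.0f} s")
```

If the nesting assumption holds, nearly all of the pressure solve is the 3-D
solver, which is why reducing its iteration count looks like the first thing
to try.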

Steve

On Thu, Jul 14, 2016 at 2:26 PM, Malte Jansen <mfj at uchicago.edu> wrote:

> Steve,
>
> At the end of the run the model should produce a summary of how much time
> was spent doing what, which should be written in the STDOUT file. It should
> look something like what I pasted below. It might be helpful to look at
> that. (In addition to all the things Jody already pointed out.)
>
> Cheers,
> Malte
>
> ------------------------------------------------------
> Malte F Jansen
> Assistant Professor
> Department of the Geophysical Sciences
> The University of Chicago
> 5734 South Ellis Avenue
> Chicago, IL 60637 USA
>
>
> (PID.TID 0000.0001) %CHECKPOINT  21900000 0021900000
> (PID.TID 0000.0001)   Seconds in section "ALL [THE_MODEL_MAIN]":
> (PID.TID 0000.0001)           User time:   81553.5515705012
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   83194.7516629696
> (PID.TID 0000.0001)          No. starts:           1
> (PID.TID 0000.0001)           No. stops:           1
> (PID.TID 0000.0001)   Seconds in section "INITIALISE_FIXED [THE_MODEL_MAIN]":
> (PID.TID 0000.0001)           User time:  0.133980002254248
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   1.00463199615479
> (PID.TID 0000.0001)          No. starts:           1
> (PID.TID 0000.0001)           No. stops:           1
> (PID.TID 0000.0001)   Seconds in section "THE_MAIN_LOOP [THE_MODEL_MAIN]":
> (PID.TID 0000.0001)           User time:   81553.4175904989
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   83193.7469999790
> (PID.TID 0000.0001)          No. starts:           1
> (PID.TID 0000.0001)           No. stops:           1
> (PID.TID 0000.0001)   Seconds in section "INITIALISE_VARIA [THE_MAIN_LOOP]":
> (PID.TID 0000.0001)           User time:  7.298800349235535E-002
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:  0.855633974075317
> (PID.TID 0000.0001)          No. starts:           1
> (PID.TID 0000.0001)           No. stops:           1
> (PID.TID 0000.0001)   Seconds in section "MAIN LOOP [THE_MAIN_LOOP]":
> (PID.TID 0000.0001)           User time:   81553.3446024954
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   83192.8913462162
> (PID.TID 0000.0001)          No. starts:           1
> (PID.TID 0000.0001)           No. stops:           1
> (PID.TID 0000.0001)   Seconds in section "MAIN_DO_LOOP [THE_MAIN_LOOP]":
> (PID.TID 0000.0001)           User time:   81545.1413792670
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   83145.3943319321
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "FORWARD_STEP [MAIN_DO_LOOP]":
> (PID.TID 0000.0001)           User time:   81528.1345684826
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   83048.1237185001
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "DO_STATEVARS_DIAGS [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   1078.10327297449
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   2827.88697195053
> (PID.TID 0000.0001)          No. starts:    14600000
> (PID.TID 0000.0001)           No. stops:    14600000
> (PID.TID 0000.0001)   Seconds in section "LOAD_FIELDS_DRIVER [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   876.897960990667
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   1301.42414379120
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "EXTERNAL_FLDS_LOAD [LOAD_FLDS_DRIVER]":
> (PID.TID 0000.0001)           User time:   28.5411767959595
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   49.7934198379517
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "RBCS_FIELDS_LOAD      [I/O]":
> (PID.TID 0000.0001)           User time:   754.574070543051
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   1100.32427453995
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "DO_ATMOSPHERIC_PHYS [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   35.8292319774628
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   49.0405611991882
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "DO_OCEANIC_PHYS [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   8746.01205241680
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   9023.84966444969
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "THERMODYNAMICS [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   11385.7259706557
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   11436.7402439117
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "DYNAMICS [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   14802.8334793448
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   14765.4700200558
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "UPDATE_SURF_DR [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   261.719356000423
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   286.261898517609
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "SOLVE_FOR_PRESSURE [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   34763.6573372781
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   36091.8884937763
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "MOM_CORRECTION_STEP [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   501.247010856867
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   506.126838922501
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "INTEGR_CONTINUITY [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   765.209014028311
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   747.568499565125
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "TRC_CORRECTION_STEP [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   47.1817142963409
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   49.4994082450867
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "BLOCKING_EXCHANGES [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   7571.84065386653
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   4694.35633111000
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "DO_STATEVARS_TAVE [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   6.13304901123047
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   49.2227129936218
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "MONITOR [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   112.063203334808
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   214.197835445404
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "DO_THE_MODEL_IO [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   9.37701416015625
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   62.9506850242615
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001)   Seconds in section "DO_WRITE_PICKUP [FORWARD_STEP]":
> (PID.TID 0000.0001)           User time:   8.42355561256409
> (PID.TID 0000.0001)         System time:  0.000000000000000E+000
> (PID.TID 0000.0001)     Wall clock time:   53.2940809726715
> (PID.TID 0000.0001)          No. starts:     7300000
> (PID.TID 0000.0001)           No. stops:     7300000
> (PID.TID 0000.0001) // ======================================================
> (PID.TID 0000.0001) // Tile <-> Tile communication statistics
> (PID.TID 0000.0001) // ======================================================
> (PID.TID 0000.0001) // o Tile number: 000001
> (PID.TID 0000.0001) //         No. X exchanges =              0
> (PID.TID 0000.0001) //            Max. X spins =              0
> (PID.TID 0000.0001) //            Min. X spins =     1000000000
> (PID.TID 0000.0001) //          Total. X spins =              0
> (PID.TID 0000.0001) //            Avg. X spins =       0.00E+00
> (PID.TID 0000.0001) //         No. Y exchanges =              0
> (PID.TID 0000.0001) //            Max. Y spins =              0
> (PID.TID 0000.0001) //            Min. Y spins =     1000000000
> (PID.TID 0000.0001) //          Total. Y spins =              0
> (PID.TID 0000.0001) //            Avg. Y spins =       0.00E+00
> (PID.TID 0000.0001) // o Thread number: 000001
> (PID.TID 0000.0001) //            No. barriers =     2103458542
> (PID.TID 0000.0001) //      Max. barrier spins =              1
> (PID.TID 0000.0001) //      Min. barrier spins =              1
> (PID.TID 0000.0001) //     Total barrier spins =     2103458542
> (PID.TID 0000.0001) //      Avg. barrier spins =       1.00E+00
> PROGRAM MAIN: Execution ended Normally
>
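
The timing summary quoted above can be ranked automatically rather than read
by eye. The following is a rough Python sketch, not an MITgcm tool: it
assumes each timer label and its bracketed caller sit on one line, as in a
raw STDOUT file (a copy wrapped by a mail client would need the label lines
rejoined first). The function names parse_timers() and shares() are made up
for illustration, and the sample lines reuse wall-clock values from the
summary above.

```python
import re

# Label line, e.g.: Seconds in section "DYNAMICS   [FORWARD_STEP]":
SECTION_RE = re.compile(r'Seconds in section "\s*(.+?)\s+\[(.+?)\]\s*":')
WALL_RE = re.compile(r"Wall clock time:\s*(\S+)")


def parse_timers(lines):
    """Return (section, caller, wall_seconds) tuples from STDOUT lines."""
    timers = []
    current = None
    for line in lines:
        match = SECTION_RE.search(line)
        if match:
            current = (match.group(1), match.group(2))
            continue
        match = WALL_RE.search(line)
        if match and current is not None:
            timers.append((current[0], current[1], float(match.group(1))))
            current = None
    return timers


def shares(timers, parent):
    """Rank the sections called from `parent` by share of its wall time."""
    total = next(wall for name, _, wall in timers if name == parent)
    children = [(name, wall) for name, caller, wall in timers if caller == parent]
    children.sort(key=lambda item: item[1], reverse=True)
    return [(name, wall, 100.0 * wall / total) for name, wall in children]


SAMPLE = """\
(PID.TID 0000.0001)   Seconds in section "FORWARD_STEP [MAIN_DO_LOOP]":
(PID.TID 0000.0001)     Wall clock time:   83048.1237185001
(PID.TID 0000.0001)   Seconds in section "DYNAMICS [FORWARD_STEP]":
(PID.TID 0000.0001)     Wall clock time:   14765.4700200558
(PID.TID 0000.0001)   Seconds in section "SOLVE_FOR_PRESSURE [FORWARD_STEP]":
(PID.TID 0000.0001)     Wall clock time:   36091.8884937763
"""

for name, wall, percent in shares(parse_timers(SAMPLE.splitlines()), "FORWARD_STEP"):
    print(f"{name:<20s} {wall:10.1f} s  {percent:5.1f}%")
```

To use it on a real run, pass the lines of the STDOUT file to parse_timers()
in place of SAMPLE.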
>
>
> On Jul 14, 2016, at 10:45 AM, Stephen Cousins <steve.cousins at maine.edu>
> wrote:
>
> Hi,
>
> I'm trying to help researchers from the University of Maine to run MITgcm.
> The model runs, but they think it should run much faster.
>
> I have run or helped run many models while working for the Ocean Modeling
> Group; however, this is the first time I have encountered MITgcm.
>
> With Rutgers ROMS there is a method of running a number of tiles per
> sub-domain and it seems that MITgcm can do that too. The reason for doing
> so with ROMS was (I believe) to try to get the tiles to fit in cache to
> increase performance. Is that the reason for doing so with MITgcm? We have
> tried a number of combinations with not much luck.
>
> For testing, the full domain we have is 600 x 520 x 21 using 64 processes
> and getting only 30 time steps per minute. I wondered if the domain was too
> small for that many processes so I reduced the number of processes but that
> didn't help. The plan is to triple the resolution in each horizontal
> direction and double in the vertical.
>
> Our cluster has nodes with Intel E5-2600v3 processors totaling 24 cores
> per node with FDR-10 Infiniband. The way the jobs were specified, some
> compute nodes had many processes (like 20) on them and some had only 1 or
> 2. I experimented and found that by using only 4 cores per node and only 48
> cores, it ran close to twice as fast as with 64 cores and a mix of the
> numbers of cores per node. To me this indicates that the
> inter-process-communication is high and it is saturating the memory
> bandwidth of the nodes with large process counts. That might point to the
> subdomains being too small (the halo region being a significant proportion of
> the subdomain) but in that case when I decreased the run to 16 cores I
> would have thought that it would have improved things quite a bit. I
> haven't profiled the code yet. I thought it might be quicker to write to
> you to get some information first.
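
The halo argument above can be made concrete with a small Python sketch. It
is an illustration under stated assumptions only: the thread does not say how
the processes were arranged, so the 4x4, 12x4 and 8x8 layouts are guesses
for 16, 48 and 64 processes, and the overlap width of 3 cells is likewise
assumed. The arguments correspond loosely to Nx, Ny, nPx, nPy and OLx/OLy in
MITgcm's SIZE.h, with one tile per process; halo_fraction() is a made-up
helper name.

```python
def halo_fraction(nx, ny, px, py, overlap):
    """Return (tile_nx, tile_ny, halo cells / interior cells) for one tile.

    Assumes one tile per process and the same overlap width on all sides.
    """
    tile_nx, rem_x = divmod(nx, px)
    tile_ny, rem_y = divmod(ny, py)
    if rem_x or rem_y:
        raise ValueError("grid does not divide evenly among processes")
    interior = tile_nx * tile_ny
    padded = (tile_nx + 2 * overlap) * (tile_ny + 2 * overlap)
    return tile_nx, tile_ny, (padded - interior) / interior


# 600 x 520 horizontal grid from the message; layouts and overlap are assumed.
for px, py in [(4, 4), (12, 4), (8, 8)]:
    tile_nx, tile_ny, ratio = halo_fraction(600, 520, px, py, overlap=3)
    print(f"{px * py:3d} processes: tile {tile_nx} x {tile_ny}, "
          f"halo/interior = {ratio:.3f}")
```

Under these assumptions the halo stays well below the size of the tile
interior even at 64 processes, which fits the observation that dropping to
16 cores did not help much.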
>
> Can you please explain what the optimal layout is for performance? Is
> there an optimal size subdomain that you know of for these processors?
> Optimal number of tiles per subdomain? Also can you explain at a somewhat
> high level any other factors to consider when running the model to get
> better performance? Also, are there Intel Haswell CPU-specific compiler
> flags (we're using the Intel compilers with MVAPICH2) that you can
> recommend to us? Finally, is there a benchmark case where we can verify
> that we are getting the expected performance?
>
> Thanks very much,
>
> Steve
> --
> ________________________________________________________________
>  Steve Cousins             Supercomputer Engineer/Administrator
>  Advanced Computing Group            University of Maine System
>  244 Neville Hall (UMS Data Center)              (207) 561-3574
>  Orono ME 04469                      steve.cousins at maine.edu
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support


-- 
________________________________________________________________
 Steve Cousins             Supercomputer Engineer/Administrator
 Advanced Computing Group            University of Maine System
 244 Neville Hall (UMS Data Center)              (207) 561-3574
 Orono ME 04469                      steve.cousins at maine.edu