[MITgcm-support] (no subject)
Jody Klymak
jklymak at uvic.ca
Thu Jul 14 16:42:48 EDT 2016
> On 14 Jul 2016, at 12:48 PM, Stephen Cousins <steve.cousins at maine.edu> wrote:
>
> Thanks very much. That does help a lot. For one of the runs the total time was 285000 seconds, SOLVE_FOR_PRESSURE is 238445, and CG3D is 222765. Those must overlap, i.e. CG3D is part of SOLVE_FOR_PRESSURE? So CG3D is taking up by far the most time, and I think what Jody was saying about dialing down the CG3D iterations may help a lot.
… and making sure it's not trying to calculate non-hydrostatic pressure for a velocity field with a lot of noise in it.
Yes, cg3d is called inside solve_for_pressure. Sounds like you found the issue. I checked some DNS-like runs I did, and I had:
cg2dMaxIters=400,
cg2dTargetResidual=1.E-13,
cg3dMaxIters=100,
cg3dTargetResidual=1.E-13,
So presumably that would speed things up by a factor of about four (with a loss in fidelity). But my grid was truly isotropic, in that dz = dx. If dx >> dz then, again, I'm not sure nonHydrostatic makes any sense.
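For reference, these are runtime parameters in the MITgcm "data" namelist file: nonHydrostatic sits in PARM01 and the elliptic-solver settings in PARM02. With Steve's numbers (cg3d at roughly 223,000 s of the 285,000 s total), removing most of the cg3d cost would leave something like 60,000-70,000 s, which is where the factor of four comes from. A minimal, untested sketch of the relevant blocks using the values above (loosening cg3dMaxIters and/or cg3dTargetResidual is what "dialing down" means in practice; the '#' comment lines assume MITgcm's usual convention of ignoring '#' lines in the data file):

 &PARM01
# the 3-D (non-hydrostatic) pressure equation is only solved when this is true
 nonHydrostatic=.TRUE.,
 &

 &PARM02
# 2-D (surface-pressure) conjugate-gradient solver
 cg2dMaxIters=400,
 cg2dTargetResidual=1.E-13,
# 3-D (non-hydrostatic) conjugate-gradient solver: cap the iterations or relax
# the residual here to limit the cg3d cost reported under SOLVE_FOR_PRESSURE
 cg3dMaxIters=100,
 cg3dTargetResidual=1.E-13,
 &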
Cheers, Jody
>
> Steve
>
> On Thu, Jul 14, 2016 at 2:26 PM, Malte Jansen <mfj at uchicago.edu> wrote:
> Steve,
>
> At the end of the run the model should produce a summary of how much time was spent doing what, which should be written in the STDOUT file. It should look something like what I pasted below. It might be helpful to look at that. (In addition to all the things Jody already pointed out.)
>
> Cheers,
> Malte
>
> ------------------------------------------------------
> Malte F Jansen
> Assistant Professor
> Department of the Geophysical Sciences
> The University of Chicago
> 5734 South Ellis Avenue
> Chicago, IL 60637 USA
>
>
> (PID.TID 0000.0001) %CHECKPOINT 21900000 0021900000
> (PID.TID 0000.0001) Seconds in section "ALL [THE_MODEL_MAIN]":
> (PID.TID 0000.0001) User time: 81553.5515705012
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83194.7516629696
> (PID.TID 0000.0001) No. starts: 1
> (PID.TID 0000.0001) No. stops: 1
> (PID.TID 0000.0001) Seconds in section "INITIALISE_FIXED [THE_MODEL_MAIN]":
> (PID.TID 0000.0001) User time: 0.133980002254248
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 1.00463199615479
> (PID.TID 0000.0001) No. starts: 1
> (PID.TID 0000.0001) No. stops: 1
> (PID.TID 0000.0001) Seconds in section "THE_MAIN_LOOP [THE_MODEL_MAIN]":
> (PID.TID 0000.0001) User time: 81553.4175904989
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83193.7469999790
> (PID.TID 0000.0001) No. starts: 1
> (PID.TID 0000.0001) No. stops: 1
> (PID.TID 0000.0001) Seconds in section "INITIALISE_VARIA [THE_MAIN_LOOP]":
> (PID.TID 0000.0001) User time: 7.298800349235535E-002
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 0.855633974075317
> (PID.TID 0000.0001) No. starts: 1
> (PID.TID 0000.0001) No. stops: 1
> (PID.TID 0000.0001) Seconds in section "MAIN LOOP [THE_MAIN_LOOP]":
> (PID.TID 0000.0001) User time: 81553.3446024954
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83192.8913462162
> (PID.TID 0000.0001) No. starts: 1
> (PID.TID 0000.0001) No. stops: 1
> (PID.TID 0000.0001) Seconds in section "MAIN_DO_LOOP [THE_MAIN_LOOP]":
> (PID.TID 0000.0001) User time: 81545.1413792670
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83145.3943319321
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "FORWARD_STEP [MAIN_DO_LOOP]":
> (PID.TID 0000.0001) User time: 81528.1345684826
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 83048.1237185001
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_STATEVARS_DIAGS [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 1078.10327297449
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 2827.88697195053
> (PID.TID 0000.0001) No. starts: 14600000
> (PID.TID 0000.0001) No. stops: 14600000
> (PID.TID 0000.0001) Seconds in section "LOAD_FIELDS_DRIVER [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 876.897960990667
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 1301.42414379120
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "EXTERNAL_FLDS_LOAD [LOAD_FLDS_DRIVER]":
> (PID.TID 0000.0001) User time: 28.5411767959595
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 49.7934198379517
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "RBCS_FIELDS_LOAD [I/O]":
> (PID.TID 0000.0001) User time: 754.574070543051
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 1100.32427453995
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_ATMOSPHERIC_PHYS [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 35.8292319774628
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 49.0405611991882
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_OCEANIC_PHYS [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 8746.01205241680
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 9023.84966444969
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "THERMODYNAMICS [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 11385.7259706557
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 11436.7402439117
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DYNAMICS [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 14802.8334793448
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 14765.4700200558
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "UPDATE_SURF_DR [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 261.719356000423
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 286.261898517609
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "SOLVE_FOR_PRESSURE [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 34763.6573372781
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 36091.8884937763
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "MOM_CORRECTION_STEP [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 501.247010856867
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 506.126838922501
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "INTEGR_CONTINUITY [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 765.209014028311
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 747.568499565125
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "TRC_CORRECTION_STEP [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 47.1817142963409
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 49.4994082450867
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "BLOCKING_EXCHANGES [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 7571.84065386653
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 4694.35633111000
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_STATEVARS_TAVE [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 6.13304901123047
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 49.2227129936218
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "MONITOR [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 112.063203334808
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 214.197835445404
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_THE_MODEL_IO [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 9.37701416015625
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 62.9506850242615
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) Seconds in section "DO_WRITE_PICKUP [FORWARD_STEP]":
> (PID.TID 0000.0001) User time: 8.42355561256409
> (PID.TID 0000.0001) System time: 0.000000000000000E+000
> (PID.TID 0000.0001) Wall clock time: 53.2940809726715
> (PID.TID 0000.0001) No. starts: 7300000
> (PID.TID 0000.0001) No. stops: 7300000
> (PID.TID 0000.0001) // ======================================================
> (PID.TID 0000.0001) // Tile <-> Tile communication statistics
> (PID.TID 0000.0001) // ======================================================
> (PID.TID 0000.0001) // o Tile number: 000001
> (PID.TID 0000.0001) // No. X exchanges = 0
> (PID.TID 0000.0001) // Max. X spins = 0
> (PID.TID 0000.0001) // Min. X spins = 1000000000
> (PID.TID 0000.0001) // Total. X spins = 0
> (PID.TID 0000.0001) // Avg. X spins = 0.00E+00
> (PID.TID 0000.0001) // No. Y exchanges = 0
> (PID.TID 0000.0001) // Max. Y spins = 0
> (PID.TID 0000.0001) // Min. Y spins = 1000000000
> (PID.TID 0000.0001) // Total. Y spins = 0
> (PID.TID 0000.0001) // Avg. Y spins = 0.00E+00
> (PID.TID 0000.0001) // o Thread number: 000001
> (PID.TID 0000.0001) // No. barriers = 2103458542
> (PID.TID 0000.0001) // Max. barrier spins = 1
> (PID.TID 0000.0001) // Min. barrier spins = 1
> (PID.TID 0000.0001) // Total barrier spins = 2103458542
> (PID.TID 0000.0001) // Avg. barrier spins = 1.00E+00
> PROGRAM MAIN: Execution ended Normally
>
>
>
>> On Jul 14, 2016, at 10:45 AM, Stephen Cousins <steve.cousins at maine.edu> wrote:
>>
>> Hi,
>>
>> I'm trying to help researchers from the University of Maine run MITgcm. The model runs, but they think it should run much faster.
>>
>> I have run or helped run many models while working for the Ocean Modeling Group; however, this is the first time I have encountered MITgcm.
>>
>> With Rutgers ROMS there is a method of running a number of tiles per sub-domain and it seems that MITgcm can do that too. The reason for doing so with ROMS was (I believe) to try to get the tiles to fit in cache to increase performance. Is that the reason for doing so with MITgcm? We have tried a number of combinations with not much luck.
>>
>> For testing, the full domain we have is 600 x 520 x 21, using 64 processes and getting only 30 time steps per minute. I wondered if the domain was too small for that many processes, so I reduced the number of processes, but that didn't help. The plan is to triple the resolution in each horizontal direction and double it in the vertical.
>>
>> Our cluster has nodes with Intel E5-2600v3 processors, totaling 24 cores per node, with FDR-10 InfiniBand. The way the jobs were specified, some compute nodes had many processes (like 20) on them and some had only 1 or 2. I experimented and found that by using only 4 cores per node and only 48 cores in total, it ran close to twice as fast as with 64 cores and a mix of core counts per node. To me this indicates that the inter-process communication is high and is saturating the memory bandwidth of nodes with large process counts. That might point to the subdomains being too small (the halo region being a significant proportion of the subdomain), but in that case I would have thought that decreasing the run to 16 cores would have improved things quite a bit. I haven't profiled the code yet; I thought it might be quicker to write to you to get some information first.
>>
>> Can you please explain what the optimal layout is for performance? Is there an optimal subdomain size that you know of for these processors? An optimal number of tiles per subdomain? Also, can you explain at a somewhat high level any other factors to consider when running the model to get better performance? Are there Intel Haswell-specific compiler flags (we're using the Intel compilers with MVAPICH2) that you can recommend to us? Finally, is there a benchmark case where we can verify that we are getting the expected performance?
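For concreteness on layout: the decomposition is fixed at compile time in code/SIZE.h, and several tiles per process (the ROMS-style cache trick) is expressed by setting nSx, nSy greater than 1 with correspondingly smaller sNx, sNy, since the global size is Nx = sNx*nSx*nPx (and likewise in y). One possible, untested layout for the 600 x 520 x 21 test domain on 64 MPI processes, assuming one tile per process and an overlap of 3 (the overlap has to be wide enough for the chosen advection scheme), would be the following fragment of the PARAMETER statement in SIZE.h:

      PARAMETER (
     &           sNx =  75,
     &           sNy =  65,
     &           OLx =   3,
     &           OLy =   3,
     &           nSx =   1,
     &           nSy =   1,
     &           nPx =   8,
     &           nPy =   8,
     &           Nx  = sNx*nSx*nPx,
     &           Ny  = sNy*nSy*nPy,
     &           Nr  =  21)

Whether 75 x 65 tiles are small enough to sit mostly in cache on these nodes is the kind of thing worth testing by halving sNx/sNy and doubling nSx/nSy while keeping the process count fixed.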
>>
>> Thanks very much,
>>
>> Steve
>> --
>> ________________________________________________________________
>> Steve Cousins Supercomputer Engineer/Administrator
>> Advanced Computing Group University of Maine System
>> 244 Neville Hall (UMS Data Center) (207) 561-3574
>> Orono ME 04469 steve.cousins at maine.edu
>>
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
>
>
>
>
> --
> ________________________________________________________________
> Steve Cousins Supercomputer Engineer/Administrator
> Advanced Computing Group University of Maine System
> 244 Neville Hall (UMS Data Center) (207) 561-3574
> Orono ME 04469 steve.cousins at maine.edu
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
--
Jody Klymak
http://web.uvic.ca/~jklymak/