[MITgcm-devel] advection routines: strange results in flow trace analysis
Martin Losch
Martin.Losch at awi.de
Thu Apr 10 11:04:21 EDT 2008
Hi Jean-Michel,
Does that mean you do not recommend using MULTIDIM_OLD_VERSION?
How severe is this non-conservation?
It does speed up my short (216-timestep) run by over 10%, mostly
because the time spent in BLOCKING_EXCHANGES drops from 15% to 2%.
Where exactly is the figure you are talking about? I can't find it ...
Martin
On 10 Apr 2008, at 15:37, Jean-Michel Campin wrote:
> Hi Martin,
>
> On Thu, Apr 10, 2008 at 09:13:27AM +0200, Martin Losch wrote:
>> Me again,
>>
>> Can I still use #define MULTIDIM_OLD_VERSION in gad_advection.F?
>> Will that fix my performance problem? And at what cost?
>
> The MULTIDIM_OLD_VERSION does not conserve the total tracer
> amount.
>
>> I guess I never realized (and probably never will) how many
>> complications arise with the cubed-sphere configuration.
>
> I added a figure in the manual, but the description & legend
> are still missing!
>
> And regarding the other suggestion (3 calls for every tile,
> even if 1 call is not needed at all), you will get more flops,
> but it is unlikely to really "speed up" your run much. And it
> will definitely slow down some other setups we have.
>
> Jean-Michel
>
>>
>> Martin
>>
>> On 9 Apr 2008, at 18:51, Martin.Losch at awi.de wrote:
>>> OK, I can see now where this comes from:
>>> C-    CubedSphere : pass 3 times, with partial update of local tracer field
>>>       IF (ipass.EQ.1) THEN
>>>         overlapOnly   = MOD(nCFace,3).EQ.0
>>>         interiorOnly  = MOD(nCFace,3).NE.0
>>>         calc_fluxes_X = nCFace.EQ.6 .OR. nCFace.EQ.1 .OR. nCFace.EQ.2
>>>         calc_fluxes_Y = nCFace.EQ.3 .OR. nCFace.EQ.4 .OR. nCFace.EQ.5
>>>       ELSEIF (ipass.EQ.2) THEN
>>>         overlapOnly   = MOD(nCFace,3).EQ.2
>>>         interiorOnly  = MOD(nCFace,3).EQ.1
>>>         calc_fluxes_X = nCFace.EQ.2 .OR. nCFace.EQ.3 .OR. nCFace.EQ.4
>>>         calc_fluxes_Y = nCFace.EQ.5 .OR. nCFace.EQ.6 .OR. nCFace.EQ.1
>>>       ELSE
>>>         interiorOnly  = .TRUE.
>>>         calc_fluxes_X = nCFace.EQ.5 .OR. nCFace.EQ.6
>>>         calc_fluxes_Y = nCFace.EQ.2 .OR. nCFace.EQ.3
>>>       ENDIF
>>>
>>>
>>> I assume that this is the minimum possible number of calls to
>>> gad_${advscheme}_adv_x/y? Why is it not symmetric for all faces?
>>> I wonder if the load imbalance on the CPUs (because of waiting in
>>> the exchange routines) costs more than calling
>>> gad_${advscheme}_adv_x/y for two more faces would, so that the
>>> load is nearly the same for all faces. Currently the four
>>> exch2_send/recv_rl1/2 routines take up over 20% of the total time
>>> (mostly because they wait).
>>>
>>> Martin
>>>
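A minimal sketch, not part of the original exchange, that tallies how many times the x and y advection routines would be invoked per cube face under the three-pass logic quoted above. It transcribes only the calc_fluxes_X / calc_fluxes_Y conditions into Python; the helper name `calls_per_face` is hypothetical, and the overlapOnly / interiorOnly flags are ignored.

```python
from collections import Counter

# Faces for which calc_fluxes_X / calc_fluxes_Y is .TRUE. in each pass,
# transcribed from the quoted IF (ipass.EQ.1) ... ENDIF block.
FLUX_X = {1: {6, 1, 2}, 2: {2, 3, 4}, 3: {5, 6}}
FLUX_Y = {1: {3, 4, 5}, 2: {5, 6, 1}, 3: {2, 3}}


def calls_per_face(flux_table):
    """Count, for each face 1..6, in how many passes the flux flag is set."""
    counts = Counter({face: 0 for face in range(1, 7)})
    for faces in flux_table.values():
        counts.update(faces)
    return dict(sorted(counts.items()))


x_calls = calls_per_face(FLUX_X)
y_calls = calls_per_face(FLUX_Y)
total = {face: x_calls[face] + y_calls[face] for face in x_calls}

print("x calls per face:", x_calls)
print("y calls per face:", y_calls)
print("x + y per face:  ", total)
```

Under this reading, faces 2 and 6 get two x calls and faces 3 and 5 get two y calls, so four faces make three calls in total while faces 1 and 4 make two. That is consistent with the "2 or 3 calls to advection S/R" mentioned in the message quoted below.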
>>> ----- Original Message -----
>>> From: Jean-Michel Campin <jmc at ocean.mit.edu>
>>> Date: Wednesday, April 9, 2008 6:30 pm
>>> Subject: Re: [MITgcm-devel] advection routines: strange results in
>>> flow trace analysis
>>>
>>>> Hi Martin,
>>>>
>>>> MultiDim advection on CS-grid has special stuff depending on
>>>> which face is computed (2 or 3 calls to advection S/R, npass=3);
>>>> it's very likely that it comes from there.
>>>>
>>>> Jean-Michel
>>>>
>>>> On Wed, Apr 09, 2008 at 04:29:44PM +0200, Martin Losch wrote:
>>>>> Hi there,
>>>>>
>>>>> I have an unexpected result in flow trace analysis (see below). I am
>>>>> running the high-resolution cubed-sphere configuration (CS510) with 16
>>>>> passive tracers on 24 CPUs of an SX8-R; my advection scheme is 7 (os7mp)
>>>>> for the tracers and 33 (dst3fl) for the seaice variables. As expected,
>>>>> the advection routines use most of the time (18 tracers). The flow
>>>>> trace analysis below gives the cumulative/average values in the first
>>>>> line, and then the values for the individual processes in the following
>>>>> 24 lines. However, if you look closely, you'll see that on some (8)
>>>>> CPUs the advection routine is called twice as often as on the
>>>>> remaining 16 CPUs. This is true both for gad_os7mp_adv_x/y (which is
>>>>> called from gad_advection in this case) and gad_dst3fl_adv_x/y (which
>>>>> is called from seaice_advection in this case, not shown). We (Jens-Olaf
>>>>> and I) suspect that this imbalance is responsible for the
>>>>> terrible performance of the exch2 routines in this run that Chris and
>>>>> I talked about in February, because 16 CPUs have to wait for 8 all
>>>>> the time in the exchange routines.
>>>>>
>>>>> All other routines seem to be called with the same frequency on all
>>>>> CPUs.
>>>>>
>>>>> What is the explanation for this?
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>>>> *--------------------------*
>>>>>> FLOW TRACE ANALYSIS LIST
>>>>>> *--------------------------*
>>>>>>
>>>>>> Execution : Wed Apr 9 10:54:36 2008
>>>>>> Total CPU : 13:40'31"663
>>>>>>
>>>>>> FREQUENCY  EXCLUSIVE        AVER.TIME  MOPS     MFLOPS  V.OP   AVER.  VECTOR    I-CACHE  O-CACHE  BANK     PROG.UNIT
>>>>>>            TIME[sec]( % )   [msec]                      RATIO  V.LEN  TIME      MISS     MISS     CONF
>>>>>>
>>>>>> 6220800    6776.032( 13.8)  1.089      25809.1  7541.0  99.88  256.0  6774.327  0.1500   0.1642   10.0670  gad_os7mp_adv_x
>>>>>> 194400     211.705          1.089      25814.8  7542.6  99.88  256.0  211.646   0.0084   0.0104   0.2688   0.0
>>>>>> 194400     211.692          1.089      25816.3  7543.1  99.88  256.0  211.641   0.0013   0.0027   0.2656   0.1
>>>>>> 194400     211.890          1.090      25792.1  7536.0  99.88  256.0  211.838   0.0014   0.0021   0.4195   0.10
>>>>>> 194400     211.907          1.090      25790.2  7535.4  99.88  256.0  211.852   0.0020   0.0024   0.4203   0.11
>>>>>> 194400     211.706          1.089      25814.6  7542.5  99.88  256.0  211.648   0.0059   0.0064   0.2785   0.12
>>>>>> 194400     211.698          1.089      25815.6  7542.8  99.88  256.0  211.644   0.0011   0.0019   0.2743   0.13
>>>>>> 194400     211.720          1.089      25812.8  7542.0  99.88  256.0  211.654   0.0171   0.0173   0.2838   0.14
>>>>>> 194400     211.713          1.089      25813.8  7542.3  99.88  256.0  211.658   0.0034   0.0041   0.2903   0.15
>>>>>> 194400     211.673          1.089      25818.7  7543.8  99.88  256.0  211.615   0.0096   0.0076   0.2493   0.16
>>>>>> 194400     211.681          1.089      25817.7  7543.5  99.88  256.0  211.613   0.0181   0.0169   0.2498   0.17
>>>>>> 194400     211.645          1.089      25822.0  7544.7  99.88  256.0  211.590   0.0041   0.0044   0.2292   0.18
>>>>>> 194400     211.650          1.089      25821.4  7544.6  99.88  256.0  211.596   0.0042   0.0045   0.2281   0.19
>>>>>> 194400     211.684          1.089      25817.3  7543.4  99.88  256.0  211.628   0.0050   0.0063   0.2656   0.2
>>>>>> 388800     423.306          1.089      25821.1  7544.4  99.88  256.0  423.206   0.0061   0.0076   0.4798   0.20
>>>>>> 388800     423.311          1.089      25820.8  7544.4  99.88  256.0  423.208   0.0057   0.0064   0.4736   0.21
>>>>>> 388800     423.306          1.089      25821.1  7544.4  99.88  256.0  423.204   0.0024   0.0031   0.4841   0.22
>>>>>> 388800     423.321          1.089      25820.2  7544.2  99.88  256.0  423.218   0.0017   0.0024   0.4834   0.23
>>>>>> 194400     211.678          1.089      25818.0  7543.5  99.88  256.0  211.625   0.0101   0.0112   0.2526   0.3
>>>>>> 388800     423.756          1.090      25793.7  7536.4  99.88  256.0  423.648   0.0025   0.0048   0.8570   0.4
>>>>>> 388800     423.705          1.090      25796.7  7537.3  99.88  256.0  423.595   0.0079   0.0095   0.8142   0.5
>>>>>> 388800     423.733          1.090      25795.1  7536.8  99.88  256.0  423.660   0.0159   0.0174   0.8413   0.6
>>>>>> 388800     423.742          1.090      25794.5  7536.7  99.88  256.0  423.647   0.0024   0.0039   0.8203   0.7
>>>>>> 194400     211.906          1.090      25790.2  7535.4  99.88  256.0  211.845   0.0124   0.0089   0.4195   0.8
>>>>>> 194400     211.904          1.090      25790.4  7535.5  99.88  256.0  211.849   0.0012   0.0019   0.4183   0.9
>>>>>> 6220800    6482.742( 13.2)  1.042      27041.7  7882.1  99.88  256.0  6480.721  0.5066   0.1471   7.7018   gad_os7mp_adv_y
>>>>>> 194400     202.452          1.041      27059.5  7887.3  99.88  256.0  202.387   0.0137   0.0075   0.2022   0.0
>>>>>> 194400     202.439          1.041      27061.3  7887.8  99.88  256.0  202.380   0.0073   0.0014   0.1965   0.1
>>>>>> 388800     405.687          1.043      27007.3  7872.1  99.88  256.0  405.582   0.0103   0.0025   0.6622   0.10
>>>>>> 388800     405.711          1.043      27005.7  7871.6  99.88  256.0  405.568   0.0446   0.0029   0.6483   0.11
>>>>>> 194400     202.487          1.042      27054.9  7886.0  99.88  256.0  202.422   0.0128   0.0065   0.2192   0.12
>>>>>> 194400     202.461          1.041      27058.3  7887.0  99.88  256.0  202.401   0.0084   0.0013   0.2061   0.13
>>>>>> 194400     202.497          1.042      27053.5  7885.6  99.88  256.0  202.425   0.0237   0.0147   0.2160   0.14
>>>>>> 194400     202.519          1.042      27050.6  7884.7  99.88  256.0  202.423   0.0424   0.0033   0.2155   0.15
>>>>>> 388800     404.770          1.041      27068.5  7889.9  99.88  256.0  404.664   0.0198   0.0146   0.3562   0.16
>>>>>> 388800     404.766          1.041      27068.7  7890.0  99.88  256.0  404.653   0.0310   0.0243   0.3585   0.17
>>>>>> 388800     404.796          1.041      27066.7  7889.4  99.88  256.0  404.692   0.0125   0.0057   0.3540   0.18
>>>>>> 388800     404.830          1.041      27064.4  7888.8  99.88  256.0  404.696   0.0435   0.0060   0.3553   0.19
>>>>>> 194400     202.459          1.041      27058.6  7887.1  99.88  256.0  202.396   0.0111   0.0046   0.1934   0.2
>>>>>> 194400     202.395          1.041      27067.1  7889.5  99.88  256.0  202.338   0.0094   0.0032   0.1811   0.20
>>>>>> 194400     202.400          1.041      27066.4  7889.3  99.88  256.0  202.342   0.0092   0.0030   0.1827   0.21
>>>>>> 194400     202.406          1.041      27065.6  7889.1  99.88  256.0  202.347   0.0074   0.0012   0.1768   0.22
>>>>>> 194400     202.438          1.041      27061.4  7887.9  99.88  256.0  202.347   0.0376   0.0008   0.1773   0.23
>>>>>> 194400     202.473          1.042      27056.8  7886.5  99.88  256.0  202.392   0.0463   0.0091   0.1846   0.3
>>>>>> 194400     202.854          1.043      27005.8  7871.7  99.88  256.0  202.791   0.0090   0.0011   0.3488   0.4
>>>>>> 194400     202.871          1.044      27003.6  7871.0  99.88  256.0  202.806   0.0125   0.0041   0.3407   0.5
>>>>>> 194400     202.796          1.043      27013.6  7873.9  99.88  256.0  202.748   0.0174   0.0081   0.3077   0.6
>>>>>> 194400     202.880          1.044      27002.5  7870.7  99.88  256.0  202.788   0.0424   0.0012   0.3204   0.7
>>>>>> 388800     405.695          1.043      27006.7  7871.9  99.88  256.0  405.581   0.0246   0.0181   0.6572   0.8
>>>>>> 388800     405.660          1.043      27009.1  7872.6  99.88  256.0  405.551   0.0097   0.0021   0.6412   0.9
>>>>>> 4572288    5253.314( 10.7)  1.149      26038.1  7946.7  99.88  255.2  5251.591  0.1115   0.1259   8.4362   gad_os7mp_adv_r
>>>>>> 190512     218.247          1.146      26114.6  7970.0  99.88  255.2  218.173   0.0070   0.0075   0.2710   0.0
>>>>>> 190512     218.203          1.145      26119.9  7971.6  99.88  255.2  218.134   0.0012   0.0023   0.2445   0.1
>>>>>> 190512     219.379          1.152      25979.8  7928.9  99.88  255.2  219.309   0.0017   0.0023   0.4233   0.10
>>>>>> 190512     219.377          1.152      25980.0  7929.0  99.88  255.2  219.305   0.0019   0.0021   0.4169   0.11
>>>>>> 190512     218.427          1.147      26093.0  7963.4  99.88  255.2  218.354   0.0051   0.0057   0.2925   0.12
>>>>>> 190512     218.376          1.146      26099.1  7965.3  99.88  255.2  218.305   0.0010   0.0017   0.2655   0.13
>>>>>> 190512     218.430          1.147      26092.7  7963.4  99.88  255.2  218.352   0.0144   0.0143   0.2949   0.14
>>>>>> 190512     218.424          1.147      26093.4  7963.6  99.88  255.2  218.353   0.0029   0.0035   0.2948   0.15
>>>>>> 190512     218.846          1.149      26043.1  7948.2  99.88  255.2  218.771   0.0073   0.0077   0.3413   0.16
>>>>>> 190512     218.903          1.149      26036.3  7946.1  99.88  255.2  218.826   0.0131   0.0131   0.3785   0.17
>>>>>> 190512     218.789          1.148      26049.9  7950.3  99.88  255.2  218.716   0.0035   0.0039   0.3222   0.18
>>>>>> 190512     218.737          1.148      26056.1  7952.2  99.88  255.2  218.665   0.0036   0.0037   0.3009   0.19
>>>>>> 190512     218.213          1.145      26118.6  7971.3  99.88  255.2  218.141   0.0043   0.0052   0.2531   0.2
>>>>>> 190512     219.118          1.150      26010.8  7938.3  99.88  255.2  219.046   0.0034   0.0044   0.3932   0.20
>>>>>> 190512     219.104          1.150      26012.3  7938.8  99.88  255.2  219.032   0.0031   0.0039   0.3778   0.21
>>>>>> 190512     219.107          1.150      26012.0  7938.7  99.88  255.2  219.034   0.0018   0.0023   0.3809   0.22
>>>>>> 190512     219.113          1.150      26011.3  7938.5  99.88  255.2  219.040   0.0011   0.0014   0.3814   0.23
>>>>>> 190512     218.109          1.145      26131.1  7975.1  99.88  255.2  218.042   0.0086   0.0092   0.2149   0.3
>>>>>> 190512     219.504          1.152      25965.0  7924.4  99.88  255.2  219.429   0.0017   0.0031   0.4977   0.4
>>>>>> 190512     219.464          1.152      25969.7  7925.8  99.88  255.2  219.390   0.0043   0.0052   0.4517   0.5
>>>>>> 190512     219.381          1.152      25979.6  7928.8  99.88  255.2  219.328   0.0083   0.0088   0.4260   0.6
>>>>>> 190512     219.394          1.152      25978.0  7928.4  99.88  255.2  219.327   0.0017   0.0027   0.4132   0.7
>>>>>> 190512     219.319          1.151      25987.0  7931.1  99.88  255.2  219.243   0.0091   0.0097   0.3975   0.8
>>>>>> 190512     219.351          1.151      25983.1  7929.9  99.88  255.2  219.277   0.0013   0.0022   0.4027   0.9
>>>>>>
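As a rough cross-check, not part of the original message, the sketch below maps the ranks whose FREQUENCY is doubled in the listing above onto cube faces. The rank sets are read off the listing (PROG.UNIT labels 0.0 to 0.23). The layout of four consecutive ranks per face (face = rank // 4 + 1) is an assumption about this 24-process run, not something stated in the thread, and the helper names are hypothetical.

```python
# Ranks whose FREQUENCY is 388800 (twice the 194400 of the others),
# read off the flow trace listing.
DOUBLED_X = {4, 5, 6, 7, 20, 21, 22, 23}      # gad_os7mp_adv_x
DOUBLED_Y = {8, 9, 10, 11, 16, 17, 18, 19}    # gad_os7mp_adv_y

N_RANKS = 24
RANKS_PER_FACE = N_RANKS // 6  # assumption: 4 consecutive ranks per face


def faces_of(ranks):
    """Map ranks to cube faces 1..6 under the assumed layout."""
    return sorted({rank // RANKS_PER_FACE + 1 for rank in ranks})


def total_calls(doubled, base=194400):
    """Sum per-rank call counts when `doubled` ranks call twice as often."""
    return sum(2 * base if rank in doubled else base for rank in range(N_RANKS))


print("faces with doubled x calls:", faces_of(DOUBLED_X))
print("faces with doubled y calls:", faces_of(DOUBLED_Y))
print("total x calls:", total_calls(DOUBLED_X))
```

Under that assumed layout the doubled ranks fall on faces 2 and 6 for x and on faces 3 and 5 for y. These are the same faces for which the three-pass logic quoted earlier in the thread sets calc_fluxes_X or calc_fluxes_Y in two of the passes. The per-rank counts also sum to the 6220800 reported in the summary rows.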
>>>>> _______________________________________________
>>>>> MITgcm-devel mailing list
>>>>> MITgcm-devel at mitgcm.org
>>>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel