[MITgcm-devel] exch_ and adjoint
Martin Losch
Martin.Losch at awi.de
Wed Dec 2 11:31:55 EST 2009
Hi there,
I am struggling with a performance issue: I am trying to increase my
throughput, but when I go from 1 to 2 (vector) CPUs I divide the
domain in the y-direction. In both cases I run with MPI. With this
change the time required per CPU for exch_rl_*_y increases (second
column, EXCLUSIVE TIME) from
FREQUENCY  EXCLUSIVE     AVER.TIME   MOPS  MFLOPS  V.OP   AVER.  VECTOR  I-CACHE  O-CACHE  BANK    PROG.UNIT
           TIME[sec](%)  [msec]                    RATIO  V.LEN  TIME    MISS     MISS     CONF
[...]
    83874  0.263( 0.6)   0.003     1606.6     0.0  77.88   45.5   0.108   0.1018   0.0086  0.0001  exch_rl_send_put_y
    83874  0.259( 0.6)   0.003     1684.4    62.9  77.33  146.9   0.068   0.0917   0.0341  0.0001  exch_rl_recv_get_y
to
   167748  5.867( 8.9)   0.035      381.0     5.7  68.74   56.7   1.641   2.1979   0.7529  0.0269  exch_rl_recv_get_y
    83874  3.386         0.040      346.5     4.9  69.09   43.0   1.203   1.0984   0.3761  0.0159  0.0
    83874  2.481         0.030      428.1     6.8  68.35   87.9   0.438   1.0995   0.3768  0.0110  0.1
   167748  2.371( 3.6)   0.014      701.7     0.3  73.86   66.9   0.459   1.1025   0.2787  0.0314  exch_rl_send_put_y
    83874  1.193         0.014      697.1     0.3  73.86   66.9   0.229   0.5535   0.1433  0.0157  0.0
    83874  1.177         0.014      706.4     0.3  73.86   66.9   0.229   0.5490   0.1354  0.0157  0.1
So over a factor of ten for recv_get_y and a factor of 5 for
send_put_y. For the corresponding x-routines, nothing much changes.
When I divide the domain in x instead, the relative roles of the
y- and x-exchanges switch. Is there a simple explanation for this?
Further, exch_* seems to get called many times for the adjoint (also
just for the corners, which makes these routines very inefficient on
our vector computer). But still, why does it get so much worse for 2
CPUs? Is there a way around that?
Martin