[MITgcm-devel] global_sum_ad.F

chris hill cnh at mit.edu
Wed Dec 22 12:58:12 EST 2004


Could it be that there are two processors with the same max value?

BTW - what I usually do is write some uniproc code that emulates the MPI
as closely as possible and then get TAF to tell me what the adjoint should
be - that's what it's for.
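
A minimal uniproc sketch of what such an emulation could look like for the global max adjoint Martin describes below. It is in Python rather than Fortran, a plain list stands in for one value per MPI process, and the adjoint routes the incoming sensitivity to the single rank that holds the maximum. The function names and the lowest-rank tie rule are assumptions for illustration; this is neither TAF output nor the MITgcm routine.

```python
def global_max(values):
    """Forward: emulate a global max over one value per process."""
    return max(values)


def global_max_ad(values, ad_result):
    """Adjoint: pass ad_result to the one rank that owns the max.

    Assumed tie rule (hypothetical): the lowest rank holding the
    maximum is the owner; every other rank gets zero.
    """
    gmax = max(values)
    owner = values.index(gmax)  # index() returns the first (lowest) match
    ad_values = [0.0] * len(values)
    ad_values[owner] = ad_result
    return ad_values


# Unique maximum on rank 1
print(global_max_ad([1.0, 3.0, 2.0], 1.0))  # [0.0, 1.0, 0.0]

# Three ranks with the same max value: exactly one still gets the derivative
print(global_max_ad([2.0, 2.0, 2.0], 1.0))  # [1.0, 0.0, 0.0]
```

With ties the choice of owner is arbitrary, but in this sketch it is a single, deterministic choice, so the sensitivity is passed on exactly once.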

Chris
On Wed, 2004-12-22 at 12:48, mlosch at awi-bremerhaven.de wrote:
> OK, I didn't really think that this was wrong. We were a little confused about myThid in the MPI_Bcast argument, and I felt I needed to understand everything before I start modifying global_max_ad.F (global_admax_r8/4), because that's really not correct. Patrick and I see problems in the adjoint at tile corners when this routine is used. global_admax is an identical copy of global_max (except for the order of the arguments) and that can't be right.
> Chris, the adjoint of max(x,y) is 0.5+sign(0.5,x-y) and you are right that it is not defined for x=y, but if sign(.5,0.) returns something (either 0.5 or -0.5) that should not be a serious problem (but it's not exactly correct). For the adjoint of global_max, one probably has to compute the global maximum of the input parameter and then find out which processor has this maximum value. For this processor the derivative is one, for the others it is zero. That's the theory; in practice I haven't got it to work yet, and I probably won't make it this year ...
> 
> M.
> 
> Martin Losch
> Alfred Wegener Institute 
> Postfach 120161, 27515 Bremerhaven, Germany; 
> Tel./Fax: ++49(0471)4831-1872/1797
> 
> 
> 
> ----- Original Message -----
> From: chris hill <cnh at mit.edu>
> Date: Wednesday, December 22, 2004 3:16 pm
> Subject: Re: [MITgcm-devel] global_sum_ad.F
> 
> > Martin,
> > 
> > The parallelism in max looks fine to me as is. The reason why it's in a
> > MASTER section is to ensure thread 0 does the BCAST. As Constantinos
> > points out, that is important.
> > 
> > If we do switch to use OpenMP SINGLE we will change it.
> > 
> > There is a subtle adjoint issue that I have never fully reconciled.
> > Since MAX(2,2,2) is somewhat arbitrary in which 2 it returns, in the
> > adjoint form it could notionally return a "different" 2 to the forward
> > run. As far as I know there is no code that would care which 2 it
> > gets, but in theory it's possible for this to cause a problem in
> > reverse computations (I think).
> > 
> > Chris
> > On Wed, 2004-12-22 at 10:06, Constantinos Evangelinos wrote:
> > > On Wednesday 22 December 2004 06:00, Martin Losch wrote:
> > > 
> > > > Hi,
> > > > while looking at a different problem (adjoint of global_max is broken),
> > > > I had a closer look at pkg/autodiff/global_sum_ad.F together with "our"
> > > > MPI specialist. We have the feeling (and Patrick has confirmed this
> > > > feeling) that the argument in MPI_Bcast shouldn't be myThid, but
> > > > something like myProcessorId (myMPIid,myPid??). Otherwise threads and
> > > > processors will be mixed. What's your opinion?
> > > 
> > > The issue may be a completely moot point as the threaded code is broken (at
> > > least my tests show that on an SGI IRIX box, both with SGI directives and
> > > with OpenMP (derived from an earlier OpenMP version of MITgcm that never made
> > > it into the main tree)). I've promised Chris to finish debugging the threaded
> > > code (the problems I've seen so far lie in that some of the I/O has been
> > > written outside of MASTER sections and write conflicts arise).
> > >
> > > However the argument to MPI_Bcast in question should be a fixed integer and
> > > not a variable such as the process ID. This argument is the root processor
> > > argument and should be the same on all processes calling MPI_Bcast. As the
> > > code is run with one thread per process, myThid=0 always and the code works
> > > fine. A parallel-threaded version of it would also work as the call to
> > > MPI_Bcast is done within a MASTER section and thus myThid=0 once again on all
> > > nodes. But this is just plain luck... If instead of a MASTER section we were
> > > using an OpenMP SINGLE directive (which would accomplish the same thing in
> > > this case) different threads would call MPI_Bcast on each process and the
> > > whole code would break down (the call would not even complete). Thus I
> > > suggest myThid is replaced by "0" and all should be fine.
> > > 
> > > Constantinos
> > 
> > _______________________________________________
> > MITgcm-devel mailing list
> > MITgcm-devel at mitgcm.org
> > http://dev.mitgcm.org/mailman/listinfo/mitgcm-devel
> > 
> 



