[MITgcm-devel] [altMITgcm/MITgcm66h] Bugfix/scratch files (#11)

Martin Losch Martin.Losch at awi.de
Mon Aug 7 11:03:55 EDT 2017


Hi Jean-Michel,

I guess you are right, I overlooked that myProcId is a global variable in a common block.

The current eeset_parms.F fixes the declaration problem for USE_FORTRAN_SCRATCH_FILES,
but I still did not revert to using myProcId.
As far as I can see (and I am sure that you will show me places where I can’t see), eeset_parms.F is called from eeboot with “myProcId” as an argument and from eeboot_minimal with “mpiMyWId”. I guess that mpiMyWId can be different from 0, but in this case doReport = .FALSE. and the model stops. In the other case (called from eeboot) there shouldn’t be a difference between myProcId and procId, should there?

Do you want to revert to using myProcId, so that it is undefined on purpose when eeset_parms is called from eeboot_minimal?
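
For illustration only, here is a minimal sketch of the two call paths described above. It is not the actual eeset_parms.F: the names CALLDEMO, DEMO_PARMS and the common block /DEMO_PROC/ are hypothetical, and the rank value 3 is an arbitrary assumption. It only shows that the dummy argument procId equals the common-block myProcId when the caller passes myProcId itself, while in the other path only the argument can be relied on.

```fortran
      PROGRAM CALLDEMO
C     Hypothetical stand-in for the two call paths; NOT MITgcm code.
      IMPLICIT NONE
      INTEGER myProcId
      COMMON /DEMO_PROC/ myProcId
      INTEGER mpiMyWId
C     Path A (like eeboot_minimal): myProcId is not set yet, so a
C     separately obtained rank (assumed value 3 here) is passed.
      mpiMyWId = 3
      CALL DEMO_PARMS( mpiMyWId, .FALSE. )
C     Path B (like eeboot): the global is set and passed itself.
      myProcId = 0
      CALL DEMO_PARMS( myProcId, .TRUE. )
      END

      SUBROUTINE DEMO_PARMS( procId, doReport )
C     procId is a dummy argument; myProcId is the common-block copy.
      IMPLICIT NONE
      INTEGER procId
      LOGICAL doReport
      INTEGER myProcId
      COMMON /DEMO_PROC/ myProcId
      IF ( doReport ) THEN
C       Path B: both names refer to the same value.
        WRITE(*,*) 'procId =', procId, ' myProcId =', myProcId
      ELSE
C       Path A: only the argument is safe to read.
        WRITE(*,*) 'procId =', procId, ' (myProcId unset)'
      ENDIF
      RETURN
      END
```

In this sketch, using procId inside the routine works in both paths, whereas using myProcId would only be meaningful in path B.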

Should we test the single_disk_io code anywhere in verification?

Martin

> On 5. Aug 2017, at 16:32, Jean-Michel Campin <jmc at mit.edu> wrote:
> 
> Hi Martin,
> 
> Sorry to insist, but I've just tried to compile with:
> #define SINGLE_DISK_IO
> and the 3 versions of eeset_parms.F, 1.40, 1.41 and the latest 1.42
> compile fine (no problem with myProcId).
> 
> The only one that does not compile is the latest (1.42) when I also set
> #define USE_FORTRAN_SCRATCH_FILES
> but the problem is not myProcId but missing declaration of 
> scratchFile1 & scratchFile2
> 
> Cheers,
> Jean-Michel
> 
> On Sat, Aug 05, 2017 at 12:09:22PM +0200, Martin Losch wrote:
>> Hi Jean-Michel,
>> 
>> I hope that my messy checkin sequence produced something that you can live with. I think that it is pretty much in line with your last email, except that I changed one myProcId into procId, so that the code will compile with SINGLE_DISK_IO defined.
>> 
>> Will add the test for the old default on Monday 
>> 
>> M.
>> 
>>> On 4. Aug 2017, at 17:40, Jean-Michel Campin <jmc at mit.edu> wrote:
>>> 
>>> Hi Martin,
>>> 
>>>> On Fri, Aug 04, 2017 at 05:14:15PM +0200, Martin Losch wrote:
>>>> OK,
>>>> I didn't realize that we don't need that anymore, will remove it with the next version.
>>>> 
>>>> About the single_disk_io: current code will not compile: myProcID was renamed into procID and I forgot to change,
>>> No, I did it on purpose, and it's fine & safe, there is a stop.
>>> 
>>>> Also, we can have USE_FORTRAN_SCRATCH_FILES and SINGLE_DISK_IO defined at the same time (not sure if anyone would do that); in this case scratchFile1 and scratchFile2 are not defined. I suggest replacing lines 142-147:
>>>>     WRITE(scratchFile1,'(A)') 'scratch1'
>>>>     WRITE(scratchFile2,'(A)') 'scratch2'
>>>>     IF( procId .EQ. 0 ) THEN
>>>>        OPEN(UNIT=scrUnit1, FILE=scratchFile1, STATUS='UNKNOWN')
>>>>        OPEN(UNIT=scrUnit2, FILE=scratchFile2, STATUS='UNKNOWN')
>>>>     ENDIF
>>>> with
>>>>     IF( procId .EQ. 0 ) THEN
>>>>        OPEN(UNIT=scrUnit1, FILE='scratch1', STATUS='UNKNOWN')
>>>>        OPEN(UNIT=scrUnit2, FILE='scratch2', STATUS='UNKNOWN')
>>>>     ENDIF
>>> 
>>> Apart from the missing declaration of scratchFile1 & scratchFile2 in the case:
>>> #define SINGLE_DISK_IO with #define USE_FORTRAN_SCRATCH_FILES
>>> which needs to be fixed, I would not change anything in SINGLE_DISK_IO blocks
>>> (as I wrote earlier).
>>> 
>>>> Also I suggest to define SINGLE_DISK_IO in ideal_2D_oce/code/CPP_EEOPTIONS.h to test this code
>>> I am not very much in favor of this, at least not now.
>>> 
>>>> and USE_FORTRAN_SCRATCH_FILES in lab_sea/code_ad/CPP_EEOPTIONS.h
>>>> This would avoid having to check in another version of CPP_EEOPTIONS.h (all others use the default)
>>> This sounds good.
>>> 
>>> Cheers,
>>> Jean-Michel
>>> 
>>>> 
>>>> Martin
>>>> 
>>>>> On 4. Aug 2017, at 17:01, Jean-Michel Campin <jmc at mit.edu> wrote:
>>>>> 
>>>>> Hi Martin,
>>>>> 
>>>>> The changes you made seem complicated:
>>>>> This part: line 155-160
>>>>>    IF ( .NOT.doReport ) THEN
>>>>> C     called from eeboot_minimal.F before myProcId is set, so we have to
>>>>> C     use scratch files and keep our fingers crossed
>>>>>     OPEN(UNIT=scrUnit1,STATUS='SCRATCH')
>>>>>     OPEN(UNIT=scrUnit2,STATUS='SCRATCH')
>>>>>    ELSE
>>>>> is not needed, and it relies on opening a unit with STATUS='SCRATCH', which we would
>>>>> like to avoid when USE_FORTRAN_SCRATCH_FILES is undefined (and with this
>>>>> IF ( .NOT.doReport ) THEN .. the procId argument that I added a few days ago is
>>>>> of no use).
>>>>> 
>>>>> But I would not change anything regarding the SINGLE_DISK_IO block (there is a
>>>>> stop there, for good reasons, and it already opens scrUnit 1 & 2
>>>>> as real files, i.e., STATUS='UNKNOWN').
>>>>> 
>>>>> Cheers,
>>>>> Jean-Michel
>>>>> 
>>>>>> On Fri, Aug 04, 2017 at 04:03:38PM +0200, Martin Losch wrote:
>>>>>> Hi Jean-Michel,
>>>>>> I checked in a new eeset_parms.F. While I think that this version will not break any tests, it is probably not very good in terms of some special cases (e.g. it will break SINGLE_DISK_IO, because I forgot to add a proper flag for the declaration of scratchFile1 and 2).
>>>>>> It's Friday afternoon and my brain seems to be in weekend mode already, that's why I am reluctant to check in anything without consulting with you. Here's what I think I should do:
>>>>>> (1) remove the SINGLE_DISK_IO block, because now you always pass something meaningful in "procID" to eeboot_minimal.
>>>>>> (2) replace it with a
>>>>>> #ifdef SINGLE_DISK_IO
>>>>>>     IF ( procID .EQ. 0 ) THEN
>>>>>> #else
>>>>>>     IF ( .TRUE. ) THEN
>>>>>> #endif
>>>>>>     ELSE
>>>>>> ...
>>>>>>     ENDIF
>>>>>> 
>>>>>> at the beginning of the default (if !defined USE_FORTRAN_SCRATCH_FILES) block.
>>>>>> I think that should work, what do you think?
>>>>>> 
>>>>>> Martin
>>>>>> 
>>>>>>> On 3. Aug 2017, at 15:10, Jean-Michel Campin <jmc at mit.edu> wrote:
>>>>>>> 
>>>>>>> Hi Martin,
>>>>>>> 
>>>>>>> Yes, last changes are good, and you can proceed with next step
>>>>>>> when you want.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> Jean-Michel
>>>>>>> 
>>>>>>>> On Thu, Aug 03, 2017 at 12:54:56PM +0200, Martin Losch wrote:
>>>>>>>> Hi Jean-Michel,
>>>>>>>> 
>>>>>>>> I know you have been busy with other stuff, but it does not look like there are any problems with my changes to eeset_parms.F
>>>>>>>> Should I now do the second step and change the default as suggested (just to eeset_parms.F, if it works, I can add the stuff to all namelists)?
>>>>>>>> 
>>>>>>>> Martin
>>>>>>>> 
>>>>>>>>> On 28. Jul 2017, at 14:57, Martin Losch <Martin.Losch at awi.de> wrote:
>>>>>>>>> 
>>>>>>>>> OK, then let's wait until Monday,
>>>>>>>>> 
>>>>>>>>> Martin
>>>>>>>>> 
>>>>>>>>>> On 28. Jul 2017, at 14:50, Jean-Michel Campin <jmc at mit.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Martin,
>>>>>>>>>> 
>>>>>>>>>> These experiments were already failing before, in the same way,
>>>>>>>>>> so I am not worried too much. 
>>>>>>>>>> Now some tests are not running everyday (I alternate -fast and -devel), 
>>>>>>>>>> so it might be good to wait at least another day (to pass more -devel tests).
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> Jean-Michel
>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jul 28, 2017 at 09:58:35AM +0200, Martin Losch wrote:
>>>>>>>>>>> Hi Jean-Michel,
>>>>>>>>>>> 
>>>>>>>>>>> it looks like some forward tests actually do fail since my change to eeset_parms.F, e.g. here:
>>>>>>>>>>> svante linux_amd64_pgf77+mth.fast ( the corresponding linux_amd64_pgf77+mth.dvlp looks OK)
>>>>>>>>>>> 
>>>>>>>>>>> Y Y Y N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . N/O   aim.5l_cs
>>>>>>>>>>> Y Y Y N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . N/O   aim.5l_cs.thSI
>>>>>>>>>>> Y Y Y N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . N/O   aim.5l_Equatorial_Channel
>>>>>>>>>>> Y Y Y N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . N/O   aim.5l_LatLon
>>>>>>>>>>> 
>>>>>>>>>>> Y Y N N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . N/O   hs94.cs-32x32x5
>>>>>>>>>>> Y Y N N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . N/O   hs94.cs-32x32x5.impIGW
>>>>>>>>>>> 
>>>>>>>>>>> Y Y N N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . N/O   short_surf_wave
>>>>>>>>>>> 
>>>>>>>>>>> The compile-time error (hs94.cs-32x32x5, short_surf_wave) does not look related to me:
>>>>>>>>>>> 
>>>>>>>>>>> pgf77 -byteswapio -Ktrap=fp -mp -tp k8-64 -pc=64 -O2 -Mvect=sse  -c ini_dynvars.f
>>>>>>>>>>> PGFTN-F-0007-Subprogram too large to compile at this optimization level  (ini_dynvars.f)
>>>>>>>>>>> PGFTN/x86-64 Linux 16.9-0: compilation aborted
>>>>>>>>>>> Makefile:1653: recipe for target 'ini_dynvars.o' failed
>>>>>>>>>>> make[1]: *** [ini_dynvars.o] Error 2
>>>>>>>>>>> make[1]: Leaving directory '/net/fs09/d0/jm_c/test_svante/MITgcm_pgiMth/verification/hs94.cs-32x32x5/build'
>>>>>>>>>>> Makefile:1561: recipe for target 'fwd_exe_target' failed
>>>>>>>>>>> make: *** [fwd_exe_target] Error 2
>>>>>>>>>>> 
>>>>>>>>>>> but the aim.* experiments lose their threads.
>>>>>>>>>>>>>> Error: _mp_pcpu_reset: lost thread
>>>>>>>>>>> Can that be related to closing some files?
>>>>>>>>>>> 
>>>>>>>>>>> Martin
>>>>>>>>>>> 
>>>>>>>>>>>> On 27. Jul 2017, at 00:22, Jean-Michel Campin <jmc at mit.edu> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Martin,
>>>>>>>>>>>> 
>>>>>>>>>>>> two things:
>>>>>>>>>>>> 1) I've checked that MPI_COMM_RANK is not blocking (can be called
>>>>>>>>>>>> by only a subset of procs) so I added this call in the OASIS block
>>>>>>>>>>>> and added the argument "procId" to EESET_PARMS as suggested before.
>>>>>>>>>>>> This should make your coming set of changes simpler.
>>>>>>>>>>>> 2) the set of changes you propose seems good to me. And for now,
>>>>>>>>>>>> I would set this USE_FORTRAN_SCRATCH_FILES in CPP_EEOPTIONS.h 
>>>>>>>>>>>> and not worry about genmake_local.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Jean-Michel
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Jul 26, 2017 at 10:16:45AM +0200, Martin Losch wrote:
>>>>>>>>>>>>> Hi Jean-Michel,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I suggest testing this now as you say, i.e. check in an eeset_parms.F where only the appropriate close statements are amended with STATUS='DELETE' (which in my opinion should always work, since this option is F77 standard, but you never know ...), but also have (at least) one testreport verification experiment use the USE_FORTRAN_SCRATCH_FILES flag, so that it is always tested (that's a bit annoying, since it would be the only experiment with its own CPP_EEOPTIONS.h file, or can this be put into some genmake_local?)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Martin
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 25. Jul 2017, at 18:17, Jean-Michel Campin <jmc at mit.edu> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Another thing:
>>>>>>>>>>>>>> Are we 100% sure that closing a scratch unit file with status "delete"
>>>>>>>>>>>>>> is completely standard on all platforms & compilers? If not, we could
>>>>>>>>>>>>>> test just this independently (i.e., check in and see how the daily tests run).
>>>>>>>>>>>>>> The reason is that when someone chooses to use USE_FORTRAN_SCRATCH_FILES
>>>>>>>>>>>>>> (which is not going to be the default and therefore not tested), we need to be
>>>>>>>>>>>>>> sure that the close instruction is OK.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> MITgcm-devel mailing list
>>>>>>>>>>>>> MITgcm-devel at mitgcm.org
>>>>>>>>>>>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel