[MITgcm-devel] [altMITgcm/MITgcm66h] Bugfix/scratch files (#11)
Martin Losch
Martin.Losch at awi.de
Fri Aug 4 11:14:15 EDT 2017
OK,
I didn’t realize that we don’t need that anymore, will remove it with the next version.
About SINGLE_DISK_IO: the current code will not compile. myProcId was renamed to procId and I forgot to change it in that block. Also, USE_FORTRAN_SCRATCH_FILES and SINGLE_DISK_IO can be defined at the same time (not sure if anyone would do that), and in that case scratchFile1 and scratchFile2 are not declared. I suggest replacing lines 142-147:
      WRITE(scratchFile1,'(A)') 'scratch1'
      WRITE(scratchFile2,'(A)') 'scratch2'
      IF( procId .EQ. 0 ) THEN
        OPEN(UNIT=scrUnit1, FILE=scratchFile1, STATUS='UNKNOWN')
        OPEN(UNIT=scrUnit2, FILE=scratchFile2, STATUS='UNKNOWN')
      ENDIF
with
      IF( procId .EQ. 0 ) THEN
        OPEN(UNIT=scrUnit1, FILE='scratch1', STATUS='UNKNOWN')
        OPEN(UNIT=scrUnit2, FILE='scratch2', STATUS='UNKNOWN')
      ENDIF
Also, I suggest defining SINGLE_DISK_IO in ideal_2D_oce/code/CPP_EEOPTIONS.h to test this code,
and USE_FORTRAN_SCRATCH_FILES in lab_sea/code_ad/CPP_EEOPTIONS.h.
This would avoid having to check in another version of CPP_EEOPTIONS.h (all others use the default).
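As a minimal sketch of that suggestion (an assumption, not the checked-in change: it presumes each file is an experiment-local copy of the default CPP_EEOPTIONS.h in which the flag currently appears as #undef, and it leaves out the rest of the header), the two overrides would look roughly like this:

```fortran
C     ideal_2D_oce/code/CPP_EEOPTIONS.h (fragment only)
C     assumption: switches on the SINGLE_DISK_IO block of eeset_parms.F
#define SINGLE_DISK_IO

C     lab_sea/code_ad/CPP_EEOPTIONS.h (fragment only)
C     assumption: switches on the Fortran scratch-file code path
#define USE_FORTRAN_SCRATCH_FILES
```

Keeping the two flags in different experiments means each code path gets tested without ever building the combination where both are defined at once.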
Martin
> On 4. Aug 2017, at 17:01, Jean-Michel Campin <jmc at mit.edu> wrote:
>
> Hi Martin,
>
> The changes you made seem complicated:
> This part, lines 155-160:
>       IF ( .NOT.doReport ) THEN
> C       called from eeboot_minimal.F before myProcId is set, so we have to
> C       use scratch files and keep our fingers crossed
>         OPEN(UNIT=scrUnit1,STATUS='SCRATCH')
>         OPEN(UNIT=scrUnit2,STATUS='SCRATCH')
>       ELSE
> is not needed, and it relies on opening units with STATUS='SCRATCH', which we would
> like to avoid when USE_FORTRAN_SCRATCH_FILES is undefined (and with this
> IF ( .NOT.doReport ) THEN ... the procId argument that I added a few days ago is
> of no use).
>
> But I would not change anything regarding the SINGLE_DISK_IO block (there is a
> stop there, for good reasons, and it already opens scrUnit1 & 2
> as real files, i.e., STATUS='UNKNOWN').
>
> Cheers,
> Jean-Michel
>
> On Fri, Aug 04, 2017 at 04:03:38PM +0200, Martin Losch wrote:
>> Hi Jean-Michel,
>> I checked in a new eeset_parms.F. While I think that this version will not break any tests, it is probably not very good in terms of some special cases (e.g. it will break SINGLE_DISK_IO, because I forgot to add a proper flag for the declaration of scratchFile1 and 2).
>> It's Friday afternoon and my brain seems to be in weekend mode already, that's why I am reluctant to check in anything without consulting with you. Here's what I think I should do:
>> (1) remove the SINGLE_DISK_IO block, because now you always pass something meaningful in "procID" to eeboot_minimal.
>> (2) replace it with a
>> #ifdef SINGLE_DISK_IO
>>       IF ( procID .EQ. 0 ) THEN
>> #else
>>       IF ( .TRUE. ) THEN
>> #endif
>>       ELSE
>>         ...
>>       ENDIF
>>
>> at the beginning of the default (if !defined USE_FORTRAN_SCRATCH_FILES) block.
>> I think that should work, what do you think?
>>
>> Martin
>>
>>> On 3. Aug 2017, at 15:10, Jean-Michel Campin <jmc at mit.edu> wrote:
>>>
>>> Hi Martin,
>>>
>>> Yes, the last changes are good, and you can proceed with the next step
>>> whenever you want.
>>>
>>> Cheers,
>>> Jean-Michel
>>>
>>> On Thu, Aug 03, 2017 at 12:54:56PM +0200, Martin Losch wrote:
>>>> Hi Jean-Michel,
>>>>
>>>> I know you have been busy with other stuff, but it does not look like there are any problems with my changes to eeset_parms.F.
>>>> Should I now do the second step and change the default as suggested (just in eeset_parms.F; if it works, I can add the stuff to all namelists)?
>>>>
>>>> Martin
>>>>
>>>>> On 28. Jul 2017, at 14:57, Martin Losch <Martin.Losch at awi.de> wrote:
>>>>>
>>>>> OK, then let's wait until Monday.
>>>>>
>>>>> Martin
>>>>>
>>>>>> On 28. Jul 2017, at 14:50, Jean-Michel Campin <jmc at mit.edu> wrote:
>>>>>>
>>>>>> Hi Martin,
>>>>>>
>>>>>> These experiments were already failing before, in the same way,
>>>>>> so I am not too worried.
>>>>>> Now, some tests are not run every day (I alternate -fast and -devel),
>>>>>> so it might be good to wait at least another day (to pass more -devel tests).
>>>>>>
>>>>>> Cheers,
>>>>>> Jean-Michel
>>>>>>
>>>>>> On Fri, Jul 28, 2017 at 09:58:35AM +0200, Martin Losch wrote:
>>>>>>> Hi Jean-Michel,
>>>>>>>
>>>>>>> it looks like some forward tests actually do fail since my change to eeset_parms.F, e.g. here:
>>>>>>> svante linux_amd64_pgf77+mth.fast ( the corresponding linux_amd64_pgf77+mth.dvlp looks OK)
>>>>>>>
>>>>>>> Y Y Y N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . . . . . . . . . . . . . N/O aim.5l_cs
>>>>>>> Y Y Y N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . . . . . . . . . . . . . N/O aim.5l_cs.thSI
>>>>>>> Y Y Y N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . . . . . . . . . . . . . N/O aim.5l_Equatorial_Channel
>>>>>>> Y Y Y N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . . . . . . . . . . . . . N/O aim.5l_LatLon
>>>>>>>
>>>>>>> Y Y N N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . . . . . . . . . . . . . N/O hs94.cs-32x32x5
>>>>>>> Y Y N N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . . . . . . . . . . . . . N/O hs94.cs-32x32x5.impIGW
>>>>>>>
>>>>>>> Y Y N N .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. . . . . . . . . . . . . . . . . . . . . N/O short_surf_wave
>>>>>>>
>>>>>>> The compile-time error (hs94.cs-32x32x5, short_surf_wave) does not look related to me:
>>>>>>>
>>>>>>> pgf77 -byteswapio -Ktrap=fp -mp -tp k8-64 -pc=64 -O2 -Mvect=sse -c ini_dynvars.f
>>>>>>> PGFTN-F-0007-Subprogram too large to compile at this optimization level (ini_dynvars.f)
>>>>>>> PGFTN/x86-64 Linux 16.9-0: compilation aborted
>>>>>>> Makefile:1653: recipe for target 'ini_dynvars.o' failed
>>>>>>> make[1]: *** [ini_dynvars.o] Error 2
>>>>>>> make[1]: Leaving directory '/net/fs09/d0/jm_c/test_svante/MITgcm_pgiMth/verification/hs94.cs-32x32x5/build'
>>>>>>> Makefile:1561: recipe for target 'fwd_exe_target' failed
>>>>>>> make: *** [fwd_exe_target] Error 2
>>>>>>>
>>>>>>> but the aim.* experiments lose their threads.
>>>>>>>>>> Error: _mp_pcpu_reset: lost thread
>>>>>>> Can that be related to closing some files?
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>>> On 27. Jul 2017, at 00:22, Jean-Michel Campin <jmc at mit.edu> wrote:
>>>>>>>>
>>>>>>>> Hi Martin,
>>>>>>>>
>>>>>>>> two things:
>>>>>>>> 1) I've checked that MPI_COMM_RANK is not blocking (it can be called
>>>>>>>> by only a subset of procs), so I added this call in the OASIS block
>>>>>>>> and added the argument "procId" to EESET_PARMS as suggested before.
>>>>>>>> This should make your coming set of changes simpler.
>>>>>>>> 2) the set of changes you propose seems good to me. And for now,
>>>>>>>> I would set this USE_FORTRAN_SCRATCH_FILES in CPP_EEOPTIONS.h
>>>>>>>> and not worry about genmake_local.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Jean-Michel
>>>>>>>>
>>>>>>>> On Wed, Jul 26, 2017 at 10:16:45AM +0200, Martin Losch wrote:
>>>>>>>>> Hi Jean-Michel,
>>>>>>>>>
>>>>>>>>> I suggest testing this now as you say, i.e. check in an eeset_parms.F where only the appropriate close statements are amended with STATUS='DELETE' (which in my opinion should always work, since this option is F77 standard, but you never know ...), but also have (at least) one testreport verification experiment use the USE_FORTRAN_SCRATCH_FILES flag, so that it is always tested (that's a bit annoying, since it would be the only experiment with its own CPP_EEOPTIONS.h file, or can this be put into some genmake_local?)
>>>>>>>>>
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>>> On 25. Jul 2017, at 18:17, Jean-Michel Campin <jmc at mit.edu> wrote:
>>>>>>>>>>
>>>>>>>>>> Another thing:
>>>>>>>>>> Are we 100% sure that closing a scratch unit file with status "delete"
>>>>>>>>>> is completely standard on all platforms & compilers? If not, we could
>>>>>>>>>> test just this independently (i.e., check it in and see how the daily tests run).
>>>>>>>>>> The reason is that when someone chooses to use USE_FORTRAN_SCRATCH_FILES
>>>>>>>>>> (which is not going to be the default and therefore not tested), we need to be
>>>>>>>>>> sure that the close instruction is OK.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> MITgcm-devel mailing list
>>>>>>>>> MITgcm-devel at mitgcm.org
>>>>>>>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel