[MITgcm-devel] [altMITgcm/MITgcm66h] Bugfix/scratch files (#11)

Jean-Michel Campin jmc at mit.edu
Tue Jul 25 09:40:54 EDT 2017


Hi Martin,

Just to clarify:
 when EESET_PARMS is called from eeboot_minimal.F, myProcId is not yet set
 (it's just equal to 0 for all processes).
This feature is used for atm+oce coupled set-up.
Now I agree that there is currently a problem if one want to use
either TARGET_BGL or TARGET_CRAYXT in coupled set-up, but since it's
not the default, it's less a problem.
Now if the default is changed, the coupled set-up will no longer run
(or worst, sometime it will and sometimes not depending on a race 
 between the different procs of the same component).
In short, before changing the default, this issue need to be addressed.

Regarding point #2, I agree that closing both scrUnit1 & scrUnit2 
units with "delete" is a better test than just changing one of them.

Cheers,
Jean-Michel

On Tue, Jul 25, 2017 at 03:10:11PM +0200, Martin Losch wrote:
> Hi Jean-Michel,
> 
> I am confused, I am afraid, and I didn???t/don???t understand your suggestions completely.
> 
> (1) issue #1, as you call it, is something that I do not understand and therefore don???t want to touch. You are refering to this line:
>       IF ( .NOT.doReport )
>      &     STOP 'ABNORMAL END: S/R EESET_PARMS: myProcId unset???
> which is exclusive to the SINGLE_DISK_IO code. I have no idea what would happen, if we do this for all cases and would like to leave it to people who know better. Can we leave this aside and deal with it later?
> 
> (2) close one unit, i.e. just close, say scrUnit1 for a test? Fine with me, but what do we gain? I expect that it will only lead to scratchFile2.???? in the run directories. Do we want that? I am attaching, what I had planned to commit for your inspection and scrutiny ??? I tested it on 1CPU linux and Apple computers (with gfortran)
> 
> Martin
> 
> PS. if you are uncomfortable with this entire change, we can leave it as it is.


> 
> > On 25. Jul 2017, at 15:00, Jean-Michel Campin <jmc at mit.edu> wrote:
> > 
> > Hi Martin,
> > 
> > 1) What about issue #1
> >>> I think this need to be addressed.
> > 
> > 2) Why not stating to just change one "close" statatement as suggested before ?
> > 
> > Jean-Michel
> > 
> > On Tue, Jul 25, 2017 at 09:18:09AM +0200, Martin Losch wrote:
> >> Hi again,
> >> 
> >> if you agree, I will check in a new version of eeset_parms.F where I change the default to what is now done for TARGET_CRAYXT and close the ???scratchFile1/2??? with STATUS=???DELETE??? and we???ll see how this goes.
> >> Everything else (open_copy_data_file with 60+ $pkg_readme.F, and format macro for I9.9 or similar) will follow, if the checks are OK next Monday. 
> >> Does this sound like a plan?
> >> 
> >> Martin
> >> 
> >>> On 24. Jul 2017, at 18:15, Jean-Michel Campin <jmc at mit.edu> wrote:
> >>> 
> >>> Hi Martin,
> >>> 
> >>> I am not concerned about SINGLE_DISK_IO but, as written in the comment,
> >>>>>> C Stop if called from eeboot_minimal.F before myProcId is set
> >>> when EESET_PARMS is called from eeboot_minimal.F
> >>> I think this need to be addressed.
> >>> 
> >>> Otherwise, we could try to change (in one place only) the close statement,
> >>> to see how this goes. And eeset_parms.F would be a good candidate.
> >>> 
> >>> Cheers,
> >>> Jean-Michel
> >>> 
> >>> On Mon, Jul 24, 2017 at 04:48:06PM +0200, Martin Losch wrote:
> >>>> Hi Jean-Michel,
> >>>> 
> >>>> maybe better to move this to the devel list, because it is really something that we can now only test in the ???real repository???:
> >>>> 
> >>>> - Previously the STOP statement was exclusive to SINGLE_DISK_IO, I did not change that, see the CVS-repository. Do we have a single regular test anywhere, where we check this SINGLE_DISK_IO code?
> >>>> - yes, it's exactly like the TARGET_BGL or TARGET_CRAYXT code, except that now, the associated close statements have a CLOSE(iUnit,STATUS='DELETE'), so that the files are deleted. This STATUS='DELETE' is default for files that are opened with STATUS='SCRATCH???. Now i just do it implicitly.
> >>>> 
> >>>> Maybe I can check-in all the close statements for now (they should not change anything, because for the ???SCRATCH??? status files, it???s the default anyway), have everything tested overnight and then do the other two changes in eeset_parms and open_copy_data_file
> >>>> OR
> >>>> (which may be easier), just do this for eeset_parms.F because that???s simple and requires only changing one file.
> >>>> 
> >>>> Martin
> >>>> 
> >>>>> On 24. Jul 2017, at 16:37, jm-c <notifications at github.com> wrote:
> >>>>> 
> >>>>> Hi,
> >>>>> 
> >>>>> It requires more tests. For instance, I read in "eeset_parms.F", line 137:
> >>>>>> C Stop if called from eeboot_minimal.F before myProcId is set
> >>>>> this will also apply to scratchFile1 & 2 settings.
> >>>>> 
> >>>>> Also, it seems that the new way is similar to what was done with
> >>>>> TARGET_BGL or TARGET_CRAYXT defined (or may be I missed something).
> >>>>> May be keeping the old way (i.e., OPEN(UNIT=scrUnit1,STATUS='SCRATCH'))
> >>>>> but changing the default would allow -- in case some problems occur on/with
> >>>>> some disk systems -- to continue to use what currently works.
> >>>>> 
> >>>>> Cheers,
> >>>>> Jean-Michel
> >>>>> 
> >>>>> On Mon, Jul 24, 2017 at 10:41:53AM +0000, christophernhill wrote:
> >>>>>> @mjlosch I think the change look good generally. I need to talk to @jm-c to see if there are catches. At present we don't have testreport pulling from github for different machines, but we should start that soon I think. That means its hard to check if this breaks something.
> >>>>>> 
> >>>>>> Best way to check this is to put in CVS fairly soon, lets see what @jm-c says first though. For a real commit though it would be good to use I9.9 for myProcId (we are running on 100,000+ cores these days). An alternative would be to use a funky internal write and macros for the format string e.g
> >>>>>> in a header file (EEMACORS.h ?)
> >>>>>> ```
> >>>>>> #define FMT_MYPROC_ID I9.9
> >>>>>> ```
> >>>>>> and then
> >>>>>> ```
> >>>>>> WRITE(FMT, ......
> >>>>>> ```
> >>>>>> which is a bit annoying, but more flexible going forward. 
> >>>>>> 
> >>>>>> 
> >>>>>> -- 
> >>>>>> You are receiving this because you were mentioned.
> >>>>>> Reply to this email directly or view it on GitHub:
> >>>>>> https://github.com/altMITgcm/MITgcm66h/pull/11#issuecomment-317385174
> >>>>> ???
> >>>>> You are receiving this because you were mentioned.
> >>>>> Reply to this email directly, view it on GitHub, or mute the thread.
> >>>>> 
> >>>> 
> >>>> 
> >>>> _______________________________________________
> >>>> MITgcm-devel mailing list
> >>>> MITgcm-devel at mitgcm.org
> >>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
> >>> 
> >>> _______________________________________________
> >>> MITgcm-devel mailing list
> >>> MITgcm-devel at mitgcm.org
> >>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
> >> 
> >> 
> >> _______________________________________________
> >> MITgcm-devel mailing list
> >> MITgcm-devel at mitgcm.org
> >> http://mitgcm.org/mailman/listinfo/mitgcm-devel
> > 
> > _______________________________________________
> > MITgcm-devel mailing list
> > MITgcm-devel at mitgcm.org
> > http://mitgcm.org/mailman/listinfo/mitgcm-devel
> 

> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel




More information about the MITgcm-devel mailing list