[MITgcm-devel] Problems with yesterday changes
Jean-Michel Campin
jmc at ocean.mit.edu
Mon Dec 10 14:47:47 EST 2012
Hi Gael,
A few remarks:
1) I did not answer your question about nc_open because I have no
idea of how nc_open works.
2) For non-NetCDF files, it seems to me that using MDSFINDUNIT
is the right way to go; it seems to work (despite your remark about
the file unit limit of 99), so the question should be, in my view, whether
we can extend the same approach to NetCDF files (rather than returning to
hard-coded file IDs).
3) If you are searching for a solution, we can talk when you come to MIT.
4) I am not sure that the problem is limited to the g77 compiler, because
when NetCDF is not available, pkg/profiles is turned off, so global_with_exf
might show more failures if NetCDF were available (with the old pgi compiler, for instance);
also, fewer compilers are used for AD testing (AD global_ocean.cs32x15).
5) I would like to keep FWD global_with_exf tested with g77. What do you
propose?
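
To make point 2 concrete, here is a minimal sketch of the "search for a free
unit" idea, written in Python only as a model of the logic. It is not the
MDSFINDUNIT code: the function name, the starting unit of 9, and the
`reserved` argument (standing in for IDs already handed out elsewhere, e.g.
by the NetCDF layer) are all assumptions for illustration. The upper bound of
99 is the g77 limit mentioned in this thread.

```python
from typing import Iterable, Optional, Set

# Upper bound taken from the g77 limit discussed in this thread.
G77_MAX_UNIT = 99


def find_free_unit(opened: Set[int],
                   first: int = 9,            # hypothetical lower bound
                   last: int = G77_MAX_UNIT,
                   reserved: Iterable[int] = ()) -> Optional[int]:
    """Return the lowest unit in [first, last] that is neither open nor reserved.

    `opened` models what a Fortran INQUIRE(UNIT=..., OPENED=...) loop would
    report; `reserved` models IDs claimed by another I/O layer (assumption).
    Returns None when the range is exhausted.
    """
    blocked = set(opened) | set(reserved)
    for unit in range(first, last + 1):
        if unit not in blocked:
            return unit
    return None


if __name__ == "__main__":
    # Three files kept open, two IDs claimed elsewhere.
    print(find_free_unit({9, 10, 11}, reserved=[12, 13]))
```

The point of searching rather than hard-coding is that the same routine works
whatever the compiler's unit limit is, as long as `last` respects it, and it
reports exhaustion explicitly instead of aborting inside OPEN.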
Cheers,
Jean-Michel
On Mon, Dec 10, 2012 at 12:53:27PM -0500, Gael Forget wrote:
> Hi Jean Michel,
>
> the g77 crashes would indeed be due to my check-in.
>
> That check-in, at least with modern compilers, allows for a more thorough
> adjoint test of pkg/profiles, which also uses ALLOW_AUTODIFF_WHTAPEIO.
> In this configuration, we keep binary files open throughout the run (both for profiles
> and AD tapes) and also do some NetCDF I/O (profiles input and/or output). Keeping the
> files open greatly improves the performance of I/O-intensive runs (adjoint, ECCO) on
> certain file systems (e.g. Lustre on Pleiades). The profiles package has become
> widely used in state estimation, and is about to be extended. So we needed
> a proper test for this. The experiment I used to this end is
> MITgcm_contrib/gael/verification/global_oce_cs32
> which is easily set up with MITgcm_contrib/gael/verification/setup_these_exps.csh
> as explained in detail in MITgcm_contrib/gael/verification/README
>
> I struggled mostly to make the serial run work, where 16 tiles are handled by one processor,
> which requires more file units ('fid'). NF_OPEN was picking file units that were already
> associated with the 'kept open files', leading to conflicts and crashes. I asked about that
> issue in my previous MITgcm-devel email. The workaround I found was to hard-code (as
> is otherwise done in mdsfindunit.F) the range of file units for the 'kept open files'. I set them to
> large values in the 1000-6000 range, and hope nc_open does not end up using those file
> unit numbers. While I reckon this may not be the most elegant approach, it works
> with gfortran and most tested compilers, but not with g77.
>
> I did not account for the old compiler issue:
> http://gcc.gnu.org/onlinedocs/gcc-3.4.6/g77/Large-File-Unit-Numbers.html
> As a side note, mdsfindunit.F is not foolproof either, since its hard-coded
> range has been widened to 9,999 while g77 is limited to unit numbers up to 99. Anyway,
> I am not too sure what the better approach would be. Is there a cpp switch
> I can use that tells me whether g77 is in use? In that case I could
> comment out my hard-coded ranges and fall back to mdsfindunit.
> Do you have a better idea? I will give you a ring at the office later.
>
> Cheers,
> Gael
>
>
>
>
> On Dec 10, 2012, at 10:16 AM, Jean-Michel Campin wrote:
>
> > Hi Gael,
> >
> > Seems that this is breaking some experiments:
> > on baudelaire with g77, global_ocean.cs32x15, 4 AD tests:
> >> open: illegal unit number
> >> apparent state: internal I/O
> >> lately writing sequential formatted external IO
> >> ./testreport: line 791: 20090 Aborted (core dumped)
> > The same error also occurs on the old aces cluster (geo, 32-bit) with g77,
> > for the same 4 AD tests.
> >
> > Now there is also another failure, but in a forward test this time,
> > with experiment global_with_exf (on geo with g77),
> > which I suspect is related to the other check-in you made
> > yesterday in pkg/profiles, since this experiment uses pkg/profiles:
> >> open: illegal unit number
> >> apparent state: internal I/O
> >> lately writing sequential formatted external IO
> >> open: illegal unit number
> >> apparent state: internal I/O
> >> lately writing sequential formatted external IO
> >> /usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1: 27659 Aborted /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run/./mitgcmuv -p4pg /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run/PI27579 -p4wd /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run
> >
> > Cheers,
> > Jean-Michel
> >
> > On Sun, Dec 09, 2012 at 07:01:59PM -0500, Gael Forget wrote:
> >> Update of /u/gcmpack/MITgcm/pkg/autodiff
> >> In directory baudelaire:/srv/scratch/gforget/MITgcm/pkg/autodiff
> >>
> >> Modified Files:
> >> autodiff_whtapeio_sync.F
> >> Log Message:
> >>
> >> - hard code file units to avoid conflict with netcdf file units (temporary fix?).
> >>
> >>
> >>
> >> _______________________________________________
> >> MITgcm-cvs mailing list
> >> MITgcm-cvs at mitgcm.org
> >> http://mitgcm.org/mailman/listinfo/mitgcm-cvs
> >
> > _______________________________________________
> > MITgcm-devel mailing list
> > MITgcm-devel at mitgcm.org
> > http://mitgcm.org/mailman/listinfo/mitgcm-devel