[MITgcm-devel] Problems with yesterday changes
Gael Forget
gforget at MIT.EDU
Mon Dec 10 15:10:58 EST 2012
Jean Michel,
> few remarks:
> 1) I did not answer your question about nc_open because I have no
> idea of how nc_open works.
> 2) for non-netcdf files, it seems to me that the use of MDSFINDUNIT
> is the right way to go; it seems to work (despite your remark about
> file range limit of 99) and so the question should be, in my view, whether
> we can extend the same approach to NetCDF files (rather than return to
> hard-coded file ids)?
I agree. I just don't know how to do it at present. Can you
explain to me how pkg/mnc chooses its netcdf file units?
Maybe that would hint at a solution for pkg/profiles.
> 3) if you are searching for a solution, we can talk when you come to MIT.
> 4) I am not sure that the problem is limited to the g77 compiler, because
> when NetCDF is not available, pkg/profiles is turned off, so global_with_exf
> might show more failures if NetCDF were available (old pgi compiler for instance);
> and not as many compilers are used for AD testing (AD global_ocean.cs32x15).
I believe that this is the reason why g77 on baudelaire did not
fail the global_with_exf forward test. Right? Unfortunately I don't
have access to another machine to test g77, so I am not
sure how to proceed with regard to pkg/profiles vs g77. The
global_ocean.cs32x15 AD tape issue I was able to reproduce.
> 5) I would like to keep FWD global_with_exf tested with g77. What do you
> propose?
For now, we may simply switch pkg/profiles off in data.pkg. Shall I do this?
For the global_ocean.cs32x15 adjoint, a quick although ugly fix would
be to put ALLOW_PROFILES brackets in autodiff_whtapeio_sync.F,
reverting to MDSFINDUNIT when profiles is not used.
Shall I do this as well?
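
The fallback described above can be sketched as follows. This is a minimal illustration in Python rather than MITgcm's Fortran, assuming that MDSFINDUNIT-style allocation amounts to scanning a fixed range for a unit that is not yet open. The names find_free_unit and is_open and the example unit numbers are hypothetical, and the default 9-99 range is chosen only to stay within the g77 limit mentioned in this thread.

```python
def find_free_unit(is_open, first=9, last=99):
    """Return the lowest unit number in [first, last] that is not open.

    `is_open` is a stand-in for a Fortran INQUIRE(UNIT=n, OPENED=flag)
    query; it is a hypothetical callback, not an MITgcm routine.
    """
    for unit in range(first, last + 1):
        if not is_open(unit):
            return unit
    raise RuntimeError(f"no free unit in {first}..{last}")


# Hypothetical state: units 9-24 hold 16 kept-open tile files.
kept_open = set(range(9, 25))

unit = find_free_unit(lambda n: n in kept_open)
kept_open.add(unit)  # the caller records the unit as taken
print(unit)  # 25
```

Because the search only returns units inside the given range, it cannot hand out a unit number that an old compiler rejects, which a hard-coded value in the 1000-6000 range can.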
> Cheers,
> Jean-Michel
Cheers,
Gael
>
> On Mon, Dec 10, 2012 at 12:53:27PM -0500, Gael Forget wrote:
>> Hi Jean Michel,
>>
>> the g77 crashes would indeed be due to my check-in.
>>
>> That check-in, at least with modern compilers, allows for a more thorough
>> adjoint test of pkg/profiles, which also uses ALLOW_AUTODIFF_WHTAPEIO.
>> In this configuration, we keep binary files open throughout the run (both for profiles
>> and ad tapes) and also do some netcdf I/O (profiles input and/or output). Keeping the
>> files open largely improves performance of I/O-intensive runs (adjoint, ecco) with
>> certain file systems (e.g. lustre on pleiades). The profiles package has become
>> widely used in the state estimation context, and is about to get extended. So we needed
>> a proper test for this. The experiment I used to this end is
>> MITgcm_contrib/gael/verification/global_oce_cs32
>> that is easily set up with MITgcm_contrib/gael/verification/setup_these_exps.csh
>> as explained in detail in MITgcm_contrib/gael/verification/README
>>
>> I struggled mostly to make the serial run work, where 16 tiles are treated by one processor,
>> which requires more file units (fid). NF_OPEN was picking file units ('fid') that were already
>> associated with the 'kept open files', leading to conflicts and crashes. I asked about that
>> issue in my previous MITgcm-devel email. The workaround I found was to hard code (as
>> otherwise done in mdsfindunit.F) the range of file units for the 'kept open files'. I set them to
>> large values in the 1000-6000 range, and hope nc_open does not end up using those file
>> unit numbers. While I reckon this may not be the most elegant approach, it works
>> with gfortran and most tested compilers, but not with g77.
>>
>> I did not account for the old compiler issue:
>> http://gcc.gnu.org/onlinedocs/gcc-3.4.6/g77/Large-File-Unit-Numbers.html
>> As a side note, mdsfindunit.F is not foolproof either since its hardcoded
>> range has been widened to 9-999, and g77 is limited to 1-99. Anyway
>> I am not too sure what the best approach would be. Is there a cpp switch
>> I can use that tells me whether g77 is in use? In this case I could
>> comment out my hard coded ranges and fall back to mdsfindunit?
>> Do you have a better idea? I will give you a ring at the office later.
>>
>> Cheers,
>> Gael
>>
>>
>>
>>
>> On Dec 10, 2012, at 10:16 AM, Jean-Michel Campin wrote:
>>
>>> Hi Gael,
>>>
>>> Seems that this is breaking some experiments:
>>> on baudelaire with g77, global_ocean.cs32x15, 4 AD tests:
>>>> open: illegal unit number
>>>> apparent state: internal I/O
>>>> lately writing sequential formatted external IO
>>>> ./testreport: line 791: 20090 Aborted (core dumped)
>>> Also same error on old aces cluster (geo, 32-bits) with g77,
>>> same 4 AD tests.
>>>
>>> Now there is also another failure, but a forward test this time,
>>> with experiment global_with_exf (on geo with g77),
>>> which I suspect is related to the other check-in you made
>>> yesterday, in pkg/profiles, since this experiment uses pkg/profiles:
>>>> open: illegal unit number
>>>> apparent state: internal I/O
>>>> lately writing sequential formatted external IO
>>>> open: illegal unit number
>>>> apparent state: internal I/O
>>>> lately writing sequential formatted external IO
>>>> /usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1: 27659 Aborted /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run/./mitgcmuv -p4pg /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run/PI27579 -p4wd /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run
>>>
>>> Cheers,
>>> Jean-Michel
>>>
>>> On Sun, Dec 09, 2012 at 07:01:59PM -0500, Gael Forget wrote:
>>>> Update of /u/gcmpack/MITgcm/pkg/autodiff
>>>> In directory baudelaire:/srv/scratch/gforget/MITgcm/pkg/autodiff
>>>>
>>>> Modified Files:
>>>> autodiff_whtapeio_sync.F
>>>> Log Message:
>>>>
>>>> - hard code file units to avoid conflict with netcdf file units (temporary fix?).
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> MITgcm-cvs mailing list
>>>> MITgcm-cvs at mitgcm.org
>>>> http://mitgcm.org/mailman/listinfo/mitgcm-cvs
>>>
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel