[MITgcm-devel] Problems with yesterday changes

Gael Forget gforget at MIT.EDU
Mon Dec 10 15:10:58 EST 2012


Jean-Michel,

> A few remarks:
> 1) I did not answer your question about nc_open because I have no
>  idea how nc_open works.
> 2) For non-NetCDF files, it seems to me that the use of MDSFINDUNIT
> is the right way to go; it seems to work (despite your remark about
> the file unit range limit of 99), so the question should be, in my view,
> how to extend the same approach to NetCDF files (rather than returning
> to hard-coded file IDs).
I agree. I just don't know how to do it at present. Can you 
explain to me how pkg/mnc chooses its NetCDF file units? 
Maybe that would hint at a solution for pkg/profiles.
> 3) If you are looking for a solution, we can talk when you come to MIT.
> 4) I am not sure that the problem is limited to the g77 compiler, because
>  when NetCDF is not available, pkg/profiles is turned off, so global_with_exf
>  might show more failures if NetCDF were available (old pgi compiler for instance);
>  and not as many compilers are used for AD testing (AD global_ocean.cs32x15).
I believe that this is the reason why g77 on baudelaire did not 
fail the global_with_exf forward test. Right? Unfortunately, I don't
have access to another machine to test g77, so I am not 
sure how to proceed with regard to pkg/profiles vs g77. I was,
however, able to reproduce the global_ocean.cs32x15 AD tape issue.
> 5) I would like to keep FWD global_with_exf tested with g77. What do you 
>  propose?
For now, we could simply switch pkg/profiles off in data.pkg. Shall I do this?

For the global_ocean.cs32x15 adjoint, a quick, although ugly, fix would 
be to put ALLOW_PROFILES brackets in autodiff_whtapeio_sync.F, 
reverting to MDSFINDUNIT when pkg/profiles is not used.
Shall I do this as well? 
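
To make the fallback idea concrete, here is a minimal sketch of the unit-selection logic, written in Python purely for illustration (the actual code lives in Fortran, in mdsfindunit.F and autodiff_whtapeio_sync.F). It assumes a hypothetical helper, find_free_unit, that scans a candidate range for the first unit not already in use; the in_use set stands in for Fortran's INQUIRE(UNIT=..., OPENED=...) check. The range is clipped to the compiler's largest accepted unit (99 for g77), and when the preferred high range (the hard-coded 1000-6000 block) is unusable it falls back to a low-range scan. The lower bound of 9 for that scan is an assumption, not taken from mdsfindunit.F.

```python
def find_free_unit(in_use, lo, hi, max_unit=None):
    """Return the first unit number in [lo, hi] that is not in in_use.

    in_use   -- set of unit numbers already attached to open files
                (hypothetical stand-in for INQUIRE(UNIT=..., OPENED=...))
    lo, hi   -- search range, e.g. the hard-coded 1000-6000 block
    max_unit -- largest unit the compiler accepts (e.g. 99 for g77);
                None means no extra limit
    Raises RuntimeError if the (clipped) range has no free unit.
    """
    if max_unit is not None:
        hi = min(hi, max_unit)
    for unit in range(lo, hi + 1):
        if unit not in in_use:
            return unit
    raise RuntimeError(f"no free file unit in range [{lo}, {hi}]")


def pick_tape_unit(in_use, max_unit=None):
    """Prefer the high, hard-coded range; if the compiler cannot address
    it, fall back to an MDSFINDUNIT-style scan of low unit numbers.
    (The 9..99 fallback range is an illustrative assumption.)"""
    try:
        return find_free_unit(in_use, 1000, 6000, max_unit)
    except RuntimeError:
        return find_free_unit(in_use, 9, 99, max_unit)


if __name__ == "__main__":
    busy = {9, 10, 1000}
    print(pick_tape_unit(busy))               # modern compiler -> 1001
    print(pick_tape_unit(busy, max_unit=99))  # g77-like limit  -> 11
```

Note that a scan like this only avoids units that are already open when it is called; if another library later grabs a unit on its own, the kind of conflict described below can still occur.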
> Cheers,
> Jean-Michel
Cheers,
Gael

> 
> On Mon, Dec 10, 2012 at 12:53:27PM -0500, Gael Forget wrote:
>> Hi Jean Michel,
>> 
>> the g77 crashes would indeed be due to my check-in.
>> 
>> That check-in, at least with modern compilers, allows for a more thorough
>> adjoint test of pkg/profiles, which also uses ALLOW_AUTODIFF_WHTAPEIO. 
>> In this configuration, we keep binary files open throughout the run (both for profiles 
>> and AD tapes) and also do some NetCDF I/O (profiles input and/or output). Keeping the
>> files open greatly improves the performance of I/O-intensive runs (adjoint, ECCO) on 
>> certain file systems (e.g. Lustre on Pleiades). The profiles package has become widely 
>> used in the state estimation context and is about to be extended, so we needed 
>> a proper test for it. The experiment I used to this end is
>>     MITgcm_contrib/gael/verification/global_oce_cs32
>> which is easily set up with
>>     MITgcm_contrib/gael/verification/setup_these_exps.csh
>> as explained in detail in
>>     MITgcm_contrib/gael/verification/README
>> 
>> I struggled mostly to make the serial run work, where 16 tiles are handled by one processor, 
>> which requires more file units (fid). NF_OPEN was picking file units ('fid') that were already 
>> associated with the 'kept open files', leading to conflicts and crashes. I asked about that 
>> issue in my previous MITgcm-devel email. The workaround I found was to hard-code (as 
>> otherwise done in mdsfindunit.F) the range of file units for the 'kept open files'. I set them to 
>> large values in the 1000-6000 range, and hope nc_open does not end up using those file 
>> unit numbers. While I recognize this may not be the most elegant approach, it works 
>> with gfortran and most of the tested compilers, but not with g77. 
>> 
>> I did not account for the old compiler issue:
>> http://gcc.gnu.org/onlinedocs/gcc-3.4.6/g77/Large-File-Unit-Numbers.html
>> As a side note, mdsfindunit.F is not foolproof either, since its hard-coded
>> range has been widened to 9,999, whereas g77 only accepts unit numbers up to 99.
>> Anyway, I am not sure what the better approach would be. Is there a CPP switch
>> I can use that tells me whether g77 is in use? If so, I could
>> comment out my hard-coded ranges and fall back to mdsfindunit.
>> Do you have a better idea? I will give you a ring at the office later.
>> 
>> Cheers,
>> Gael
>> 
>> 
>> 
>> 
>> On Dec 10, 2012, at 10:16 AM, Jean-Michel Campin wrote:
>> 
>>> Hi Gael,
>>> 
>>> It seems that this is breaking some experiments:
>>> on baudelaire with g77, global_ocean.cs32x15, 4 AD tests:
>>>> open: illegal unit number
>>>> apparent state: internal I/O
>>>> lately writing sequential formatted external IO
>>>> ./testreport: line 791: 20090 Aborted                 (core dumped)
>>> Also same error on old aces cluster (geo, 32-bits) with g77,
>>> same 4 AD tests.
>>> 
>>> Now there is also another "fail", but a forward test this time,
>>> with experiment global_with_exf (on geo with g77),
>>> that I suspect is related to the other check-in you made
>>> yesterday in pkg/profiles, since this experiment uses pkg/profiles:
>>>> open: illegal unit number
>>>> apparent state: internal I/O
>>>> lately writing sequential formatted external IO
>>>> open: illegal unit number
>>>> apparent state: internal I/O
>>>> lately writing sequential formatted external IO
>>>> /usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1: 27659 Aborted                 /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run/./mitgcmuv -p4pg /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run/PI27579 -p4wd /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run
>>> 
>>> Cheers,
>>> Jean-Michel
>>> 
>>> On Sun, Dec 09, 2012 at 07:01:59PM -0500, Gael Forget wrote:
>>>> Update of /u/gcmpack/MITgcm/pkg/autodiff
>>>> In directory baudelaire:/srv/scratch/gforget/MITgcm/pkg/autodiff
>>>> 
>>>> Modified Files:
>>>> 	autodiff_whtapeio_sync.F 
>>>> Log Message:
>>>> 
>>>> - hard code file units to avoid conflict with netcdf file units (temporary fix?).
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> MITgcm-cvs mailing list
>>>> MITgcm-cvs at mitgcm.org
>>>> http://mitgcm.org/mailman/listinfo/mitgcm-cvs
>>> 
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>> 