[MITgcm-devel] Problems with yesterday changes

Gael Forget gforget at MIT.EDU
Mon Dec 10 12:53:27 EST 2012


Hi Jean Michel,

the g77 crashes would indeed be due to my check-in.

That check-in, at least with modern compilers, allows for a more thorough
adjoint test of pkg/profiles, which also uses ALLOW_AUTODIFF_WHTAPEIO.
In this configuration, we keep binary files open throughout the run (both for profiles
and AD tapes) and also do some netcdf I/O (profiles input and/or output). Keeping the
files open greatly improves the performance of I/O-intensive runs (adjoint, ECCO) on
certain file systems (e.g. Lustre on Pleiades). The profiles package has become widely
used in the state estimation context and is about to be extended, so we needed
a proper test for this. The experiment I used to this end is
    MITgcm_contrib/gael/verification/global_oce_cs32
which is easily set up with
    MITgcm_contrib/gael/verification/setup_these_exps.csh
as explained in detail in
    MITgcm_contrib/gael/verification/README

I struggled mostly to make the serial run work, where 16 tiles are treated by one processor,
which requires more file units ('fid'). NF_OPEN was picking file units that were already
associated with the 'kept open files', leading to conflicts and crashes. I asked about that
issue in my previous MITgcm-devel email. The workaround I found was to hard-code (as
is otherwise done in mdsfindunit.F) the range of file units for the 'kept open files'. I set them to
large values in the 1000-6000 range, and hope nc_open does not end up using those file
unit numbers. While I reckon this may not be the most elegant approach, it works
with gfortran and most tested compilers, but not with g77.
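
The workaround above can be pictured with a small model. This is only a sketch in Python, not MITgcm code: `UnitTable`, `open_kept_files`, and the numeric ranges are hypothetical stand-ins for the Fortran unit bookkeeping. It assumes short-lived files take the first free unit in a shared range, while the kept-open files sit at fixed units above that range.

```python
class UnitTable:
    """Toy model of Fortran-style file units (hypothetical, not MITgcm code)."""

    def __init__(self):
        self.in_use = set()

    def find_free_unit(self, lo, hi):
        """Return the first unit in [lo, hi] that is not currently open."""
        for unit in range(lo, hi + 1):
            if unit not in self.in_use:
                return unit
        raise RuntimeError("no free unit in %d-%d" % (lo, hi))

    def open(self, unit):
        """Mark a unit as open; opening the same unit twice is the conflict."""
        if unit in self.in_use:
            raise RuntimeError("unit %d already open" % unit)
        self.in_use.add(unit)
        return unit

    def close(self, unit):
        self.in_use.discard(unit)


# Assumed values, for illustration only.
SHARED_RANGE = (9, 999)   # range scanned for short-lived files
KEPT_OPEN_BASE = 1000     # start of the hard-coded range for kept-open files


def open_kept_files(table, n_tiles):
    """Open one kept-open file per tile at fixed units above the shared range."""
    return [table.open(KEPT_OPEN_BASE + tile) for tile in range(n_tiles)]


table = UnitTable()
kept = open_kept_files(table, 16)  # 16 tiles on one process, as in the serial run
temp = table.open(table.find_free_unit(*SHARED_RANGE))
```

Because the two ranges do not overlap, a scan of the shared range can never return a unit held by a kept-open file. The price is that the fixed units are large numbers, which only works if the runtime accepts them.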

I did not account for the old compiler issue:
http://gcc.gnu.org/onlinedocs/gcc-3.4.6/g77/Large-File-Unit-Numbers.html
As a side note, mdsfindunit.F is not foolproof either, since its hard-coded
range has been widened to 9-999 while g77 is limited to 1-99. Anyway,
I am not too sure what the better approach would be. Is there a cpp switch
I can use that tells me whether g77 is in use? In that case I could
comment out my hard-coded ranges and fall back to mdsfindunit.
Do you have a better idea? I will give you a ring at the office later.
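
One way to picture the fallback asked about here: use the hard-coded unit only when the runtime accepts numbers that large, and otherwise scan for a free unit within the limit. The following is a sketch in Python under stated assumptions, not an MITgcm routine: `pick_unit` is a hypothetical helper, `max_unit` stands for whatever limit the compiler's runtime imposes (99 is used for the g77 case, following the page linked above), and the default values are illustrative.

```python
def pick_unit(in_use, index, max_unit, fixed_base=1000, scan_lo=9):
    """Return a unit number for kept-open file number `index`.

    in_use     : set of unit numbers that are already open
    max_unit   : largest unit number the runtime accepts (assumed known)
    fixed_base : start of the hard-coded range (illustrative value)
    scan_lo    : lowest unit considered when scanning (illustrative value)
    """
    fixed = fixed_base + index
    if fixed <= max_unit and fixed not in in_use:
        return fixed  # the hard-coded unit is legal for this runtime
    # Otherwise fall back to the first free unit within the limit.
    for unit in range(scan_lo, max_unit + 1):
        if unit not in in_use:
            return unit
    raise RuntimeError("no free unit at or below %d" % max_unit)
```

With `in_use = {9, 10}`, calling `pick_unit(in_use, 0, max_unit=99)` takes the scan branch, while `pick_unit(in_use, 3, max_unit=9999)` keeps the hard-coded unit. The scan branch gives up the separation between ranges, so it would not by itself prevent the clash with NF_OPEN.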

Cheers,
Gael




On Dec 10, 2012, at 10:16 AM, Jean-Michel Campin wrote:

> Hi Gael,
> 
> Seems that this is breaking some experiments:
> on baudelaire with g77, global_ocean.cs32x15, 4 AD tests:
>> open: illegal unit number
>> apparent state: internal I/O
>> lately writing sequential formatted external IO
>> ./testreport: line 791: 20090 Aborted                 (core dumped)
> Also same error on old aces cluster (geo, 32-bits) with g77,
> same 4 AD tests.
> 
> Now there is also another failure, but in a forward test this time,
> with experiment global_with_exf (on geo with g77),
> which I suspect is related to the other check-in you made
> yesterday in pkg/profiles, since this experiment uses pkg/profiles:
>> open: illegal unit number
>> apparent state: internal I/O
>> lately writing sequential formatted external IO
>> open: illegal unit number
>> apparent state: internal I/O
>> lately writing sequential formatted external IO
>> /usr/local/pkg/mpich/mpich-gcc/bin/mpirun: line 1: 27659 Aborted                 /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run/./mitgcmuv -p4pg /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run/PI27579 -p4wd /home/jmc/test_aces/MITgcm_gnu/verification/global_with_exf/run
> 
> Cheers,
> Jean-Michel
> 
> On Sun, Dec 09, 2012 at 07:01:59PM -0500, Gael Forget wrote:
>> Update of /u/gcmpack/MITgcm/pkg/autodiff
>> In directory baudelaire:/srv/scratch/gforget/MITgcm/pkg/autodiff
>> 
>> Modified Files:
>> 	autodiff_whtapeio_sync.F 
>> Log Message:
>> 
>> - hard code file units to avoid conflict with nectdf file units (temporary fix?).
>> 
>> 
>> 
>> _______________________________________________
>> MITgcm-cvs mailing list
>> MITgcm-cvs at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-cvs
> 
> _______________________________________________
> MITgcm-devel mailing list
> MITgcm-devel at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-devel
