[MITgcm-support] scratch1.00000#### in run directory

Thu Sep 27 04:28:34 EDT 2018

Hi Ivana,

This is the normal behavior for reading namelist files: they are first read line by line as character variables and then written to scratch files, where scratch2 is the same as scratch1 with the comment lines (#) removed (or vice versa). The Fortran READ then reads scratch2 (i.e. the one without the comment lines) and afterwards closes both files with status="DELETE" (i.e. delete after closing). This happens very quickly, so you normally don’t see these files. If the program stops before this CLOSE statement, you are left with the scratch files. Every MPI process opens its own pair of scratch files.
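
To make the comment-stripping step concrete, here is a minimal Python sketch. It is not MITgcm's Fortran code, just an illustration of the idea; the function name strip_comment_lines is made up, the namelist content is only an example, and I assume that a comment line is one that starts with # in the first column.

```python
def strip_comment_lines(lines, comment_char="#"):
    """Return only the lines that do not start with the comment character.

    This mimics the "scratch1 -> scratch2" copy described above
    (assumption: comments are marked in the first column).
    """
    return [line for line in lines if not line.startswith(comment_char)]


# Illustrative namelist content, not taken from a real configuration.
example = [
    "# EXF settings (a comment line)",
    " &EXF_NML_01",
    " exf_iprec = 32,",
    "# another comment",
    " &",
]

for line in strip_comment_lines(example):
    print(line)
```

Only the three lines without a leading # are printed, and that is roughly what the Fortran READ gets to see.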

In checkpoint66j (2017/08/15), we changed the default behavior of namelist reading. You can recover the old behavior by defining USE_FORTRAN_SCRATCH_FILES in CPP_EEOPTIONS.h. Then the scratch files may not appear in your working directory, but on some tmp filesystem where you never see them.

I would not define SINGLE_DISK_IO during debugging, because you will then only get output from process 0.

Debugging namelist files is a pain, and Matt’s suggestion is the way to go: strip data.exf down and then add lines back. If your compiler actually gives you information about the line where the error appears, all the better; in that case, having the scratch files without comment lines available is certainly an advantage.
I would use the same namelist files in a small 1-CPU configuration on your laptop or desktop computer, where you don’t have to queue up to get hundreds of CPUs. For example you could just use one tile of your config. The model will probably crash very quickly, but most likely only in the first time step after you’ve read all of your namelist files.
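
The "strip and add back" procedure can be sketched in a few lines of Python. This is only an illustration under assumptions: first_failing_line and toy_check are hypothetical helpers, the file names are made up, and toy_check is a stand-in for the real test (running the small configuration and seeing whether the namelist is read). In a real trial each candidate file would also need the namelist group header and terminator.

```python
def first_failing_line(lines, parses_ok):
    """Add lines back one at a time.

    Returns the 1-based index of the first line whose addition makes
    parses_ok() fail, or None if every prefix passes.
    """
    for n in range(1, len(lines) + 1):
        if not parses_ok(lines[:n]):
            return n
    return None


def toy_check(lines):
    # Stand-in for "run the model and check that the namelist is read":
    # here a line counts as bad if it has an odd number of single quotes.
    return all(line.count("'") % 2 == 0 for line in lines)


# Illustrative entries; the second one is missing its closing quote.
candidate = [
    " uwindfile = 'u10.bin',",
    " vwindfile = 'v10.bin,",
    " atempfile = 'tair.bin',",
]

print(first_failing_line(candidate, toy_check))  # prints 2
```

Adding one line at a time is slow for long files; bisecting the list of lines would need fewer trial runs, at the cost of slightly more bookkeeping.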

Martin

> On 27. Sep 2018, at 00:54, Matthew Mazloff <mmazloff at ucsd.edu> wrote:
> 
> Hi Ivana
> 
> I think setting
> #define SINGLE_DISK_IO
> in CPP_EEOPTIONS will reduce it to just one set of files 
> 
> defining/undefining 
>  TARGET_CRAYXT
> in CPP_EEOPTIONS will also change how scratch files are written
> 
> Usually when I have a namelist error (which it sounds like you have) there is a job log output file that tells me which line in scratch is the problem
> 
> If not you could try stripping down data.exf and adding lines back in one at a time….
> 
> Matt
> 
> 
> 
> 
>> On Sep 26, 2018, at 1:20 PM, Ivana Escobar <ivana at utexas.edu> wrote:
>> 
>> Hi from Texas,
>> 
>> I’m working on debugging my data.* files to get a problem to run. When it crashes, I get two series of files in my run directory: 
>> scratch1.00000#### (I get one for each processor I call for, 698 in my case)
>> scratch2.00000#### (numbering goes up to 698 in my case, but file count is way less)
>> 
>> The contents of the files vary from being completely empty to containing data.exf content, which is where I think my bug is. I am not sure how I turned on this level of debugging, but can anyone suggest which series of flags controls printing this large quantity of files?
>> 
>> Thank you,
>> 
>> Ivana Escobar
>> Graduate Student
>> The University of Texas at Austin
>> The Institute for Computational Engineering and Sciences
>> POB 3SEi4D
>> ivana at utexas.edu
>> (210) 788-1499
>> 
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
> 