[MITgcm-devel] Fwd: job crashing with message forrtl: Text file busy

An T Nguyen antnguyen13 at gmail.com
Tue Jul 30 15:19:28 EDT 2013


They all crashed during model initialization, before the first time step. The first 7 crashes were all due to an error reading the same grid file, the first one that was linked in. Once I copied the grid files in, it crashed reading the next linked file, which happened to be the diffusivity one.

I don't think it's a model issue, even though NAS is saying so. I think something went wrong with their filesystem beginning Saturday evening, when my original run crashed with an error reading the runoff file (which was linked and was read just fine during the first 20 hours of the run). Since then, the linked files have caused read errors within 20 seconds of model initialization. But they say it's "our model issue"...

An


On Jul 30, 2013, at 2:48 PM, Matthew Mazloff wrote:

> Is it crashing at a different place/time step each time? If so, that is a big hint that the problem is your use of the filesystem and not a systematic model issue.
> 
> Is the run-time directory a local directory (off the NFS), or is it still on the NFS?
> 
> I still think your problem is the same as mine :o)
> 
> Matt
> 
> 
> On Jul 30, 2013, at 10:31 AM, An T Nguyen <antnguyen13 at gmail.com> wrote:
> 
>> Hi Matt,
>> 
>> The first crash was in the middle of the run, with an error reading the runoff climatology. Then, over the following 16 hours, I resubmitted 8 times, and every job crashed immediately during model initialization with problems reading the linked grid files. Next, I manually copied those grid files into the run-time directory; the model read the grid info OK, then crashed again with an error reading a linked 3-D diffusivity file. So the last test I did was copying all binary input files into the run-time directory. With that, the model ran without a problem... (still running, fingers crossed it doesn't crash...)
>> 
>> But yes, let's chat later today. I still think my problem is not quite the same as yours. I was surprised that the NAS people also didn't know why things were crashing; they kept suggesting that I just resubmit the job, even though the failure seems very systematic.
>> An
>> 
>> 
>> On Jul 30, 2013, at 12:19 PM, Matthew Mazloff wrote:
>> 
>>> Hi An,
>>> 
>>> I've had this problem many times. It is an issue with overwhelming the NFS. It's not the link but the fact that you are opening and closing so many files at once. There are many possible fixes, so it depends on your specific problem. Is it happening only during model initialization?
>>> 
>>> Perhaps we can chat today about it?
>>> 
>>> Matt
>>> 
>>> 
>>> 
>>> On Jul 28, 2013, at 8:34 PM, An T Nguyen <antnguyen13 at gmail.com> wrote:
>>> 
>>>> Hello mitgcm-devel gurus,
>>>> 
>>>> Just thought I'd put this here. Since yesterday, MITgcm has kept crashing on the Pleiades system with a series of error messages:
>>>> 
>>>> forrtl: Text file busy
>>>> forrtl: severe: open failure, unit 9, file filename   <-- where filename can be any of the grid, bathymetry, runoff, obcs, etc. files, which are LINKED from another directory.
>>>> 
>>>> For the last few years I've run fine by linking files instead of copying them into the run-time directory, without any issue until yesterday. After 2 days of talking to NAS, it seems they want us to fix this on our end instead of taking any action themselves...
>>>> 
>>>> Just checking whether anyone else has this issue or if it's just me. I had a job running for the last 2 days; then it suddenly crashed yesterday and wouldn't run again until I manually copied all input binary files into my run-time directory. The atmospheric forcing files seem to be the exception: I'm still linking them and the run hasn't crashed.
>>>> 
>>>> An
>>>> 
>>>> Begin forwarded message:
>>>> 
>>>>> From: Johnny Chang <Johnny.Chang at nasa.gov>
>>>>> Date: July 28, 2013 10:39:03 PM EDT
>>>>> To: An T Nguyen <antnguyen13 at gmail.com>
>>>>> Subject: Re: INC000000053332 : APP : Low : job crashing with message forrtl: Text file busy
>>>>> 
>>>>> On 7/28/13 7:22 PM, An T Nguyen wrote:
>>>>>> i think it's related to files that are linked.
>>>>> 
>>>>> This makes more sense now.  When your program calls OPEN, it will
>>>>> open it for both read and write.  It's possible that you can't
>>>>> really write to those linked files, and hence the fortran runtime
>>>>> library (forrtl) complains "text file busy".  If you are only
>>>>> reading those linked files and not writing to them, then you
>>>>> can make the file read-only:
>>>>> 
>>>>> chmod u-wx some_file
>>>>> 
>>>>> removes the (u)ser permission to (w)rite and e(x)ecute some_file.
>>>>> 
>>>>> Alternatively, you would need to modify the OPEN statement and add
>>>>> action='read' to tell the Fortran runtime library to OPEN the file
>>>>> read-only.
>>>>> 
>>>>> Johnny
>>>>> 
>>>>> -- 
>>>>> Johnny Chang
>>>>> 650-604-4356
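
The workaround that finally let the job run (copying the binary input files into the run-time directory instead of linking them) can be scripted. Below is a minimal POSIX shell sketch, assuming the links sit directly in the run-time directory; the function name copy_links_in is hypothetical and is not part of MITgcm or of any NAS tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of MITgcm): replace each symbolic link in a
# run-time directory with a regular copy of the file it points to, mirroring
# the workaround of copying input binaries instead of linking them.
set -eu

copy_links_in() {
  rundir=$1
  for f in "$rundir"/*; do
    # Only act on symbolic links; regular files are left untouched.
    if [ -L "$f" ]; then
      # cp -L dereferences the link, so this copies the target's contents.
      cp -L "$f" "$f.tmp.$$"
      # rename(2) replaces the link itself, not the file it points to.
      mv -f "$f.tmp.$$" "$f"
    fi
  done
}
```

Calling copy_links_in on the run-time directory before submitting would leave regular files in place of the links, while the original files in the source directory stay unchanged. A dangling link makes cp fail, which under set -e stops the script rather than leaving a half-converted directory unnoticed.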


