[MITgcm-support] opteron problems, cont'd

Samar Khatiwala spk at ldeo.columbia.edu
Fri Jun 1 11:29:06 EDT 2007


Hi Matt

I'm just running on 8 CPUs.

I'll try your suggestion. I know there is a "TARGET_BGL" cpp option now,
but switching it on doesn't do exactly the same thing as what you suggest.
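
For reference, a minimal standalone sketch of the per-process naming you
suggest might look like the following. It is not MITgcm code: the program
name, the hard-coded myProcId, the unit numbers, and the zero-padded form
of myProcessStr are assumptions made only for illustration.

```fortran
PROGRAM scratch_names
  ! Hypothetical standalone demo, not taken from MITgcm.
  IMPLICIT NONE
  INTEGER, PARAMETER :: scrUnit1 = 11, scrUnit2 = 12   ! assumed unit numbers
  INTEGER :: myProcId
  CHARACTER(LEN=10) :: myProcessStr
  CHARACTER(LEN=16) :: scrname1, scrname2

  ! Assumption: in a parallel run this would be the MPI rank.
  myProcId = 3
  ! Assumption: myProcessStr holds the rank as a zero-padded 4-digit string.
  WRITE(myProcessStr,'(I4.4)') myProcId

  ! Build one scratch file name per process so ranks do not share a file.
  WRITE(scrname1,'(3A)') 'scratch', myProcessStr(1:4), '_1'
  WRITE(scrname2,'(3A)') 'scratch', myProcessStr(1:4), '_2'

  OPEN(UNIT=scrUnit1, FILE=scrname1, STATUS='UNKNOWN')
  OPEN(UNIT=scrUnit2, FILE=scrname2, STATUS='UNKNOWN')

  PRINT '(2A)', 'scratch file 1: ', TRIM(scrname1)
  PRINT '(2A)', 'scratch file 2: ', TRIM(scrname2)

  ! Delete on close so the named files do not pile up in the run directory.
  CLOSE(scrUnit1, STATUS='DELETE')
  CLOSE(scrUnit2, STATUS='DELETE')
END PROGRAM scratch_names
```

With myProcId = 3 this produces the names scratch0003_1 and scratch0003_2,
so each process would read and write its own pair of files rather than a
shared scratch1/scratch2.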

This is just a "standard" Opteron cluster with PGI compilers, so it's
all a bit odd.

Thanks, Samar

On Jun 1, 2007, at 4:59 PM, Matthew Mazloff wrote:

> Hi Samar,
>
> No idea how many procs you are using or if this is at all related,
> but on Blue Gene machines I have come across a problem with scratch
> files when using many procs, and fixed it by adding
>
>       WRITE(scrname1,'(3a)') 'scratch',myProcessStr(1:4),'_1'
>       WRITE(scrname2,'(3a)') 'scratch',myProcessStr(1:4),'_2'
>
> before
>
>       OPEN(UNIT=scrUnit1,FILE=scrname1,STATUS='UNKNOWN')
>       OPEN(UNIT=scrUnit2,FILE=scrname2,STATUS='UNKNOWN')
>
> -Matt
>
>
>
> On Jun 1, 2007, at 7:45 AM, Samar Khatiwala wrote:
>
>> Hi
>>
>> I think I figured out part of the problem. If I replace
>>
>>      OPEN(UNIT=scrUnit1,STATUS='SCRATCH')
>>      OPEN(UNIT=scrUnit2,STATUS='SCRATCH')
>>
>> with
>>
>>       OPEN(UNIT=scrUnit1,FILE='scratch1',STATUS='UNKNOWN')
>>       OPEN(UNIT=scrUnit2,FILE='scratch2',STATUS='UNKNOWN')
>>
>> in ini_parms.F, open_copy_data_file.F, and eeset_parms.F, the model
>> seems to go much further along. This may have something to do with
>> inadequate write permissions in /tmp (or wherever the scratch files
>> are created) on the nodes.
>>
>> However, the model still hangs and now gives the following error:
>>
>> PGFIO-F-228/namelist read/unit=11/end of file reached without finding group.
>> File name = scratch1    formatted, sequential access   record = 10
>> In source file packages_boot.f, at line number 1704
>> PGFIO-F-228/namelist read/unit=11/end of file reached without finding group.
>> File name = scratch1    formatted, sequential access   record = 1
>> In source file gmredi_readparms.f, at line number 3554
>> Terminated
>>
>> The above lines correspond to:
>>
>> READ(UNIT=iUnit,NML=PACKAGES) in packages_boot.F, and
>> READ(UNIT=iUnit,NML=GM_PARM01) in gmredi_readparms.F
>>
>> Any ideas on fixing this would be appreciated. The code runs fine
>> on another cluster (32-bit Athlon).
>>
>> Thanks, Samar
>>
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mitgcm.org/mailman/listinfo/mitgcm-support
>
