[MITgcm-support] Problem with MPI execution

Abbas Dorostkar abbas.dorostkar at ce.queensu.ca
Thu May 22 10:41:59 EDT 2008


Hi Martin and Dimitris,
Thanks so much!

I did use the "-mpi" option when I generated the Makefile with genmake2.
The non-MPI executable also runs without any problem.
Martin, you were right! I just noticed that the command "ln
-s ../input/* ." doesn't link eedata into my run folder properly.
Anyway, I copied eedata into the run folder and now I get a new error.
What do you mean by "maybe the read permissions are different for
mpirun"? How can I change them? Your helpful comments are much
appreciated!
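
One way to see what state eedata is in after the "ln -s ../input/* ." step is to test the file in the run folder directly. The sketch below is only an illustration: check_input is a hypothetical helper, not part of MITgcm, and it assumes it is run from inside the run folder.

```shell
#!/bin/sh
# Hypothetical helper (not part of MITgcm): report the state of one input
# file in the current directory.
check_input() {
    f="$1"
    if [ -L "$f" ] && [ ! -e "$f" ]; then
        # The link itself exists, but the file it points to does not.
        echo "$f: dangling symlink"
    elif [ ! -e "$f" ]; then
        echo "$f: missing"
    elif [ ! -r "$f" ]; then
        echo "$f: not readable"
    else
        echo "$f: ok"
    fi
}

check_input eedata
```

A "dangling symlink" result means the link was created but its target cannot be found from the run folder. Copying the file, or recreating the link with an absolute path, avoids that case.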

The new error is as follows:

STOP ABNORMAL END: S/R EESET_PARMS
STOP ABNORMAL END: S/R EESET_PARMS

(PID.TID 0001.0001) *** ERROR *** S/R EESET_PARMS
(PID.TID 0001.0001) *** ERROR *** Error reading execution environment
(PID.TID 0001.0001) *** ERROR *** parameter file "eedata"

 
(PID.TID 0000.0001) // ======================================================
(PID.TID 0000.0001) //                      MITgcm UV
(PID.TID 0000.0001) //                      =========
(PID.TID 0000.0001) // ======================================================
(PID.TID 0000.0001) // execution environment starting up...
(PID.TID 0000.0001)
(PID.TID 0000.0001) // MITgcmUV version:  checkpoint59j
(PID.TID 0000.0001) // Build user:        Abbas
(PID.TID 0000.0001) // Build host:        LEO.CiVil.QueensU.Ca
(PID.TID 0000.0001) // Build date:        Thu May 22 09:09:53 EDT 2008
(PID.TID 0000.0001)
(PID.TID 0000.0001) // =======================================================
(PID.TID 0000.0001) // Execution Environment parameter file "eedata"
(PID.TID 0000.0001) // =======================================================
(PID.TID 0000.0001) ># Example "eedata" file
(PID.TID 0000.0001) ># Lines beginning "#" are comments
(PID.TID 0000.0001) ># nTx - No. threads per process in X
(PID.TID 0000.0001) ># nTy - No. threads per process in Y
(PID.TID 0000.0001) > &EEPARMS
(PID.TID 0000.0001) > nTx=1,
(PID.TID 0000.0001) > nTy=1,
(PID.TID 0000.0001) > usingMPI=.TRUE.
(PID.TID 0000.0001) > &
(PID.TID 0000.0001) ># Note: Some systems use & as the
(PID.TID 0000.0001) ># namelist terminator. Other systems
(PID.TID 0000.0001) ># use a / character (as shown here).
(PID.TID 0000.0001)
(PID.TID 0000.0001) // Shown below is an example "eedata" file.
(PID.TID 0000.0001) // To use this example copy and paste the
(PID.TID 0000.0001) // ">" lines. Then remove the text up to
(PID.TID 0000.0001) // and including the ">".
(PID.TID 0000.0001) ># Example "eedata" file
(PID.TID 0000.0001) ># Lines beginning "#" are comments
(PID.TID 0000.0001) ># nTx - No. threads per process in X
(PID.TID 0000.0001) ># nTy - No. threads per process in Y
(PID.TID 0000.0001) >&EEPARMS
(PID.TID 0000.0001) >nTx=1,nTy=1
(PID.TID 0000.0001) >/
(PID.TID 0000.0001) ># Note: Some systems use & as the
(PID.TID 0000.0001) ># namelist terminator. Other systems
(PID.TID 0000.0001) ># use a / character (as shown here).
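
The echoed file above closes the namelist with "&", while the built-in example closes it with "/". Whether that difference is what triggers the "Error reading" message here is not established in this thread. The sketch below only shows a way to see which terminator a given file uses. namelist_terminator is a hypothetical helper, not an MITgcm tool, and it assumes the terminator is the last line that is neither blank nor a "#" comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of MITgcm): print the last line of a
# parameter file that is neither blank nor a "#" comment, with all
# whitespace removed. For a file laid out like the eedata shown above,
# that line is the namelist terminator.
namelist_terminator() {
    grep -v '^[[:space:]]*#' "$1" \
        | grep -v '^[[:space:]]*$' \
        | tail -n 1 \
        | tr -d '[:space:]'
}

# Only report when an eedata file is present in the current directory.
if [ -f eedata ]; then
    echo "eedata terminator: $(namelist_terminator eedata)"
fi
```

If the result is "&", editing a copy of the file to end with "/" and rerunning is a cheap experiment, following the note printed by the model itself.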



Thanks again 
Abbas



-----Original Message-----
From: mitgcm-support-bounces at mitgcm.org
[mailto:mitgcm-support-bounces at mitgcm.org] On Behalf Of Martin Losch
Sent: May 22, 2008 9:25 AM
To: mitgcm-support at mitgcm.org
Subject: Re: [MITgcm-support] Problem with MPI execution

I don't think that the MPI within the MITgcm is the problem. Try to  
generate a non-mpi executable and see if the problem goes away.

The error message is clear: eedata cannot be opened. Is it in the
correct directory? What about read permissions? Maybe the read
permissions are different for mpirun? This is the code where it happens:

>       OPEN(UNIT=eeDataUnit,FILE='eedata',STATUS='OLD',
>      &     err=1,IOSTAT=errIO)
>       IF ( errIO .GE. 0 ) GOTO 2
>     1 CONTINUE
>        WRITE(msgBuf,'(A)')
>      &  'S/R EESET_PARMS'
>        CALL PRINT_ERROR( msgBuf , 1)
>        WRITE(msgBuf,'(A)')
>      &  'Unable to open execution environment'
>        CALL PRINT_ERROR( msgBuf , 1)
>        WRITE(msgBuf,'(A)')
>      &  'parameter file "eedata"'
>        CALL PRINT_ERROR( msgBuf , 1)
>        CALL EEDATA_EXAMPLE
>        STOP 'ABNORMAL END: S/R EESET_PARMS'
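
Since the OPEN above uses STATUS='OLD', it fails whenever the process cannot find or read eedata in its own working directory. A rough way to check what each launched process actually sees is to start a small script the same way as the model. This is only a sketch: report_env and the file name report_env.sh are hypothetical, and it assumes that mpirun will launch a plain shell command just as it launches ./mitgcmuv.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of MITgcm): print the host, the working
# directory and whether "eedata" is readable from where this process runs.
report_env() {
    if [ -r eedata ]; then
        state="readable"
    else
        state="NOT readable"
    fi
    echo "host=$(uname -n) cwd=$(pwd) eedata=$state"
}

report_env
```

Saved as report_env.sh in the run folder, it could be started with "mpirun -np 2 sh ./report_env.sh". If a process reports a different cwd than expected, or "NOT readable", the problem lies with the launch environment or file permissions, not with the model.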

no other ideas on this side of the Atlantic ...

Martin

On 22 May 2008, at 15:06, Dimitris Menemenlis wrote:

> Abbas, you probably have already done this but to be sure:
> did you use option "-mpi" when you generated the Makefile with  
> genmake2 ?
>
> Dimitris Menemenlis <menemenlis at sbcglobal.net>
> 5056 Oakwood Ave, La Canada, CA 91011-2450
> tel/fax: 818-790-6735;  cell: 818-625-6498
>
> On May 22, 2008, at 6:02 AM, Abbas Dorostkar wrote:
>
>> Hi Martin,
>> Thanks for the quick reply.
>> It is strange, because I have this file in the folder where I run
>> mitgcmuv. I have attached it:
>>
>>> # Example "eedata" file
>>> # Lines beginning "#" are comments
>>> # nTx - No. threads per process in X
>>> # nTy - No. threads per process in Y
>>> &EEPARMS
>>> nTx=1,
>>> nTy=1,
>>> usingMPI=.TRUE.
>>> &
>>> # Note: Some systems use & as the
>>> # namelist terminator. Other systems
>>> # use a / character (as shown
>>
>> Do you have any other ideas? I have been trying to fix this problem
>> for a while.
>> Abbas
>>
>> -----Original Message-----
>> From: mitgcm-support-bounces at mitgcm.org
>> [mailto:mitgcm-support-bounces at mitgcm.org] On Behalf Of Martin Losch
>> Sent: May 22, 2008 3:42 AM
>> To: mitgcm-support at mitgcm.org
>> Subject: Re: [MITgcm-support] Problem with MPI execution
>>
>> Abbas,
>>
>> you are missing the file "eedata", as the error message clearly tells
>> you. I won't tell you how often I made this mistake!!!
>>
>> Martin
>>
>> On 21 May 2008, at 22:46, Abbas Dorostkar wrote:
>>
>>> Dear all,
>>>
>>>
>>>
>>> I have been trying to run exp1 with MPI on my desktop
>>> (ia32_linux equipped with one dual 1.5 processor) before running my
>>> own model on a node with 72 dual-core processors and 570 GB RAM. I
>>> did not get any errors during compilation. However, when I run
>>> mitgcmuv with the command "mpirun -np 2 ./mitgcmuv", I get the
>>> following error:
>>>
>>>
>>>
>>> STOP ABNOSTOP ABNORMAL END: S/R EESET_PARMS
>>> RMAL rank 0 in job 1  LEO.CiVil.QueensU.Ca_55941   caused
>>> collective abort of all ranks
>>>  exit status of rank 0: return code 0
>>>
>>> (PID.TID 0000.0001) *** ERROR *** S/R EESET_PARMS
>>> (PID.TID 0000.0001) *** ERROR *** Unable to open execution environment
>>> (PID.TID 0000.0001) *** ERROR *** parameter file "eedata"
>>>
>>>
>>>
>>> I successfully ran some simple "Hello World"-type MPI programs,
>>> which shows that my MPI (MPICH2) installation is working correctly.
>>> I don't know what I am missing. Could someone suggest a solution?
>>> Below are my SIZE.h, eedata and optfile:
>>>
>>>
>>>      PARAMETER (
>>>     &           sNx =  60,
>>>     &           sNy =  60,
>>>     &           OLx =   2,
>>>     &           OLy =   2,
>>>     &           nSx =   1,
>>>     &           nSy =   1,
>>>     &           nPx =   2,
>>>     &           nPy =   1,
>>>     &           Nx  = sNx*nSx*nPx,
>>>     &           Ny  = sNy*nSy*nPy,
>>>     &           Nr  =   4)
>>> ---------------------------------------
>>> # Example "eedata" file
>>> # Lines beginning "#" are comments
>>> # nTx - No. threads per process in X
>>> # nTy - No. threads per process in Y
>>> &EEPARMS
>>> nTx=1,
>>> nTy=1,
>>> usingMPI=.TRUE.
>>> &
>>> # Note: Some systems use & as the
>>> # namelist terminator. Other systems
>>> # use a / character (as shown here).
>>> ---------------------------------------
>>> #!/bin/bash
>>> #
>>> #  $Header: /u/gcmpack/MITgcm/tools/build_options/linux_ia32_g77+mpi_cg01,v 1.6 2006/03/24 22:34:43 edhill Exp $
>>> #  $Name:  $
>>>
>>> FC='/usr/local/bin/mpif77'
>>> CC='/usr/local/bin/mpicc'
>>> DEFINES='-DALLOW_USE_MPI -DALWAYS_USE_MPI -D_BYTESWAPIO -DWORDLENGTH=4'
>>> INCLUDEDIRS='/usr/local/include'
>>> INCLUDES='-I/usr/local/include'
>>> CPP='/lib/cpp  -traditional -P'
>>> NOOPTFLAGS='-O0'
>>>
>>> if test "x$IEEE" = x ; then
>>>    #  No need for IEEE-754
>>>    FFLAGS='-Wimplicit -Wunused -Wuninitialized'
>>>    FOPTIM='-O3 -malign-double -funroll-loops'
>>> else
>>>    #  Try to follow IEEE-754
>>>    FFLAGS='-Wimplicit -Wunused -ffloat-store'
>>>    FOPTIM='-O0 -malign-double'
>>> fi
>>>
>>> # netcdf
>>> #LIBS="-lnetcdf"
>>>
>>> Your help would be much appreciated.
>>>
>>> Thanks a lot
>>> Abbas
>>>
>>> _______________________________________________
>>> MITgcm-support mailing list
>>> MITgcm-support at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-support
