[MITgcm-support] segmentation fault problem

Jean-Michel Campin jmc at ocean.mit.edu
Wed Aug 11 14:32:27 EDT 2010


Hi Q Li,

I think this call to MDS_READ_FIELD does not come from the standard MITgcm
code, but rather from a modified/customized piece of code.
I was not aware of this when I read your first email reporting
this "segmentation fault problem".

It might be easier and simpler to use a subroutine from pkg/rw to read
array "dampAlpha" from file "dampCoeffFile": those routines take fewer
arguments, so there is less chance of a mistake in the argument list,
which is probably why you are getting a seg-fault.
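
To see how an argument-list mistake can slip through, here is a minimal
Python sketch (a hypothetical checker, not part of MITgcm) that joins the
fixed-form continuation lines of the call quoted below, drops blanks the
way a fixed-form compiler does, and counts the top-level arguments. The
expected count of 11 is an assumption based on the MDS_READ_FIELD argument
list in pkg/mdsio (fName, filePrec, useCurrentDir, arrType, kSize, kLo,
kHi, fldRL, fldRS, irecord, myThid) and should be checked against the
source actually compiled.

```python
# Hypothetical checker: count the arguments of a fixed-form Fortran CALL.
# EXPECTED_NARGS is an assumption taken from the MDS_READ_FIELD argument
# list in pkg/mdsio; verify it against the source you actually compile.
EXPECTED_NARGS = 11

# The call exactly as pasted in the email (column 6 '&' = continuation).
CALL_LINES = [
    "      CALL MDS_READ_FIELD(",
    "     &     dampCoeffFile, readBinaryPrec, .TRUE.,",
    "     &     'RL', 1, 1, 1,",
    "     &      dampAlpha,dummyRS",
    "     &      1, myThid)",
]


def join_fixed_form(lines):
    """Join a statement and its continuation lines.

    Columns 1-6 (label and continuation field) are dropped; the text from
    column 7 on is concatenated. Column-72 truncation is not modelled.
    """
    return "".join(line[6:] for line in lines)


def strip_blanks(text):
    """Remove blanks outside quoted strings, as a fixed-form compiler ignores them."""
    out, quote = [], None
    for ch in text:
        if quote:
            out.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            out.append(ch)
        elif ch != " ":
            out.append(ch)
    return "".join(out)


def count_call_args(lines):
    """Return (number of top-level arguments, normalised statement text)."""
    stmt = strip_blanks(join_fixed_form(lines))
    depth, nargs, quote = 0, 1, None
    for ch in stmt[stmt.index("("):]:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                break
        elif ch == "," and depth == 1:
            nargs += 1
    return nargs, stmt


if __name__ == "__main__":
    nargs, stmt = count_call_args(CALL_LINES)
    print(stmt)  # ...dampAlpha,dummyRS1,myThid)
    print(f"{nargs} arguments, expected {EXPECTED_NARGS}")  # 10 arguments, expected 11
```

As pasted, the call has no comma between dummyRS and the 1 on the next
line. Since blanks are not significant in fixed form, that would be read
as a single name dummyRS1, giving 10 arguments instead of 11, with myThid
landing in the record-number slot. That would fit a crash inside
MASTER_CPU_IO, which takes myThid, although with IMPLICIT NONE the
compiler should reject an undeclared dummyRS1, so the code actually
compiled may differ from the email copy.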

Thanks,
Jean-Michel

On Wed, Aug 11, 2010 at 05:52:41AM -0700, q li wrote:
> Does anyone have any thoughts on this?
> 
> I moved my code to my own laptop and also reduced the array to 512x20, but I
> still get the segmentation fault. Tracing back the crash, I suspect my
> MDS_READ_FIELD call is wrong. Am I right? The problem is still not solved.
> 
> Here is the MDS_READ_FIELD call:
> 
>       CALL MDS_READ_FIELD(
>      &     dampCoeffFile, readBinaryPrec, .TRUE.,
>      &     'RL', 1, 1, 1,
>      &      dampAlpha,dummyRS
>      &      1, myThid)
> 
> Here is some debugging output. (I don't know why it says AMD x86-64 when I am
> using an Intel CPU.)
> 
> (PID.TID 0000.0001) // =======================================================
> (PID.TID 0000.0001) // Parameter file "data.relaxbt"
> (PID.TID 0000.0001) // =======================================================
> (PID.TID 0000.0001) ># Open-boundaries
> (PID.TID 0000.0001) > &RELAXBT_PARM
> (PID.TID 0000.0001) > dampCoeffFile = 'dampAlpha.bin',
> (PID.TID 0000.0001) > &
> (PID.TID 0000.0001) 
> Segmentation fault (core dumped)
> [hiphop at localhost run1]$ file core.8708 
> core.8708: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, 
> from 'mitgcmuv'
> [hiphop at localhost run1]$ gdb ./mitgcmuv core.8708 
> GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /home/hiphop/MITgcmStudy/Iwamae/run1/mitgcmuv...(no 
> debugging symbols found)...done.
> Reading symbols from /usr/lib64/libg2c.so.0...(no debugging symbols 
> found)...done.
> Loaded symbols for /usr/lib64/libg2c.so.0
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libgcc_s.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libgcc_s.so.1
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols 
> found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Core was generated by `./mitgcmuv'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00000000004a32d6 in master_cpu_io__ ()
> (gdb) backtrace
> #0  0x00000000004a32d6 in master_cpu_io__ ()
> #1  0x000000000041f241 in mds_read_field__ ()
> #2  0x0000000000401cf8 in ini_relaxbt__ ()
> #3  0x00000000004023b1 in initialise_varia__ ()
> #4  0x000000000050f803 in the_main_loop__ ()
> #5  0x000000000050fb07 in the_model_main__ ()
> #6  0x00000000004a3229 in MAIN__ ()
> #7  0x0000000000517fb2 in main ()
> (gdb) 
> 
> Any help?
> 
> Li
> 
> 
> 
> 
> ________________________________
> From: q li <qliuri at yahoo.com>
> To: MITgcm-support at mitgcm.org
> Sent: Tue, August 10, 2010 11:40:53 AM
> Subject: [MITgcm-support] segmentation fault problem
> 
> 
> Hi users,
> 
> I am having a segmentation fault problem on an AMD64 cluster (ifort, mpich2). I
> got a warning (see below) when compiling, and then a segmentation fault when
> running. I thought it was a stack problem, but the same error still occurs even
> after setting "ulimit -s unlimited". Has anyone had this problem before?
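
A quick way to check whether the "ulimit -s unlimited" setting actually
reaches the model process is to print the limit from inside a process
started the same way as mitgcmuv. Below is a minimal Python sketch using
the standard resource module (Unix only). Launching it through mpirun
rather than directly from the shell is the point: as an assumption worth
checking, ranks started by an MPI process manager may inherit the
daemon's limits rather than those of the interactive shell.

```python
import resource


def describe_stack_limit():
    """Return the (soft, hard) stack-size limits of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value):
        # RLIM_INFINITY is the value that "ulimit -s unlimited" sets.
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value // 1024} KiB"

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_stack_limit()
    print(f"stack limit seen by this process: soft={soft}, hard={hard}")
```

For example, saving it as check_stack.py (a hypothetical file name) and
running "mpirun -np 1 python check_stack.py" should report a soft limit
of "unlimited" if the setting propagated to the MPI ranks.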
> 
> Li
> 
> Here is the warning and error:
> 
> [hiphop at node4 build]$ make > makeoutput.txt
> sigreg.c(46): warning #556: a value of type "void *" cannot be assigned to an 
> entity of type "void (*)(int, siginfo_t *, void *)"
>       s.sa_sigaction = (void *)killhandler;
>                               ^
> [hiphop at node4 run]$ !mpi
> mpirun -np 1 ./mitgcmuv 
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
> rank 0 in job 30  node4_46608   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9 
> 
> 
>       
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support



