[MITgcm-support] A very strange problem

Jean-Michel Campin jmc at ocean.mit.edu
Tue Jun 1 13:57:08 EDT 2010


Hi Dwight,

I found:
[node7:03388] Signal: Floating point exception (8)
[node7:03388] Signal code: Floating point divide-by-zero (3)
in file "my_job.o1562"
and this is why the run stopped.

It could be that the model just blew up, or that you bumped into a specific
problem in the code (which might be related to OBCS + NonHydrostatic).

Useful things to look at in the STDOUT files would be:
a) the cg2d & cg3d solver output
b) all the monitor output (in particular, checking the CFL numbers)
c) maybe one of the STDOUT files is longer and contains more
 information (e.g., the processor that found this division
 by zero might have flushed its buffer, whereas the others
 might not); if that is the case, the end of this particular
 STDOUT might tell more.
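
As a rough aid for points b) and c), here is a minimal Python sketch (not part of MITgcm) that picks the largest STDOUT file in a run directory and lists the last reported value of every monitor variable whose name contains "cfl". It assumes the "%MON name = value" layout seen in the STDOUT excerpt quoted below; the function names, the "STDOUT.*" file pattern and the "cfl" keyword are illustrative assumptions.

```python
import re
from pathlib import Path

# Matches monitor lines such as:
# (PID.TID 0000.0001) %MON time_tsnumber                =                   317
MON_LINE = re.compile(r"%MON\s+(\S+)\s*=\s*(\S+)")


def parse_monitor(text):
    """Return a list of (name, value) pairs for every complete %MON line."""
    records = []
    for line in text.splitlines():
        match = MON_LINE.search(line)
        if not match:
            continue
        name, raw = match.groups()
        try:
            records.append((name, float(raw)))
        except ValueError:
            # Non-numeric or truncated value (e.g. a partially flushed buffer).
            continue
    return records


def longest_stdout(run_dir, pattern="STDOUT.*"):
    """Return the path of the largest file matching pattern, or None."""
    files = sorted(Path(run_dir).glob(pattern))
    return max(files, key=lambda p: p.stat().st_size) if files else None


def summarize(path, keyword="cfl"):
    """Print and return the last value of each monitor variable
    whose name contains keyword (case-insensitive)."""
    last = {}
    text = Path(path).read_text(errors="replace")
    for name, value in parse_monitor(text):
        if keyword in name.lower():
            last[name] = value
    for name, value in sorted(last.items()):
        print(f"{name:30s} {value: .6e}")
    return last
```

For example, `summarize(longest_stdout("."))` would be run from the run directory, provided at least one STDOUT file is present there; the tail of that same file is also the place to look for any message printed just before the crash.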

I also noticed that cg3dMaxIters=20 is much less than the number
of levels (Nr). To get good convergence of the 3-D solver you need
many more iterations, but that slows down the run.
As a compromise, I generally try to keep
cg3dMaxIters > a couple of times Nr.
cg2dMaxIters=300 might also be relatively small since your domain (Nx,Ny,Nr)
is quite big, but it could also be OK; this is worth checking.
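
The rule of thumb above can be written as a small check. This is only a sketch: the factor of 2 is one reading of "a couple of times Nr", and the value Nr=100 in the usage lines is hypothetical, since the actual number of levels is not stated in this thread.

```python
def check_cg3d_iters(nr, cg3d_max_iters, factor=2):
    """Compare cg3dMaxIters with the rule of thumb cg3dMaxIters > factor * Nr.

    Returns (ok, threshold), where threshold = factor * nr and ok is True
    only if cg3d_max_iters is strictly greater than that threshold.
    """
    threshold = factor * nr
    return cg3d_max_iters > threshold, threshold


# Hypothetical Nr=100 with the cg3dMaxIters=20 mentioned above.
ok, threshold = check_cg3d_iters(nr=100, cg3d_max_iters=20)
print(f"cg3dMaxIters OK: {ok} (rule of thumb: more than {threshold})")
```

A larger iteration cap makes each time step more expensive only when the solver actually needs the extra iterations, so it is worth checking how many iterations are reported in STDOUT before settling on a value.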

Cheers,
Jean-Michel

To get a better idea of why this happens, more insight is needed.
On Sun, May 30, 2010 at 11:24:10AM -0400, Jean-Michel Campin wrote:
> Hi Dwight,
> 
> There is no particular restriction on the files you can attach to emails
> sent to MITgcm-support, apart from size (they should not be too big,
> so sending a movie or a large image file is not a good idea). It's better
> to use a standard format that everybody can read (otherwise you tend to
> limit the number of people who are likely to answer your question).
> 
> I have not yet looked at the attached file.
> 
> Thanks,
> Jean-Michel
> 
> On Sun, May 30, 2010 at 09:17:01AM +0800, ouc.edu.cn wrote:
> > Hi Jean-Michel,
> >   Thank you very much for your reply. Since I did not know whether files can be attached to emails sent to 'mitgcm-support at mitgcm.org', I thought it better to write to you directly. I hope you do not mind.
> >  Attached are several files from my model configuration. There is no error message at all in 'STDERR.xxxx', and the lines below are the last few from 'STDOUT.0000':
> > (PID.TID 0000.0001) // =======================================================
> > (PID.TID 0000.0001) // Begin MONITOR dynamic field statistics
> > (PID.TID 0000.0001) // =======================================================
> > (PID.TID 0000.0001) %MON time_tsnumber                =                   317
> > (PID.TID 0000.0001) %MON time_secondsf                =   3.9625000000000E+03
> > (PID.TID 0000.0001) %MON dynstat_eta_max              =   4.2644526849910E-01
> > (P
> >  
> > There are several files attached to this email; among them, 'my_job.o1562' is the system output file showing why the run was terminated. In the data file, I changed only 'f0' and 'vVelInitFile', and the model just stopped without overflowing (please see my last email).
> > Thanks again for your help.
> > Best Regards,
> > Dwight
> 


> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support



