[MITgcm-support] running error!

Martin Losch mlosch at awi-bremerhaven.de
Mon Aug 7 05:56:16 EDT 2006


Hi Van Thinh,

Since the model completed 46640 timesteps before it crashed, I can
think of two possibilities:

1. Some spontaneous problem with the hardware (unlikely with a
floating point exception). I sometimes have the model crash without
error messages, and when I rerun the same job it runs fine, so it's
worth retrying.

2. If the error is reproducible, then I assume that the model simply
"runs out of bounds", that is, something in the forcing or elsewhere
causes the model to explode or diverge. In that case you should be
able to see this happening in the monitor output (if monitorFreq is
small enough) or at least in the cg2d output that you get at every
timestep (if you did not set debugLevel < 0). These numbers (CFL
numbers or cg2d residuals) will probably diverge exponentially until
you get numbers that the computer cannot handle. I had this happen
after hundreds of years of integration (order 100000 timesteps) with
a coarse model configuration when the timestep was too large.
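
The check described in point 2 can be sketched in Python. This is only a
rough illustration, not part of MITgcm: it assumes the standard output
contains lines of the form "cg2d_init_res = <number>" (or, for the monitor
package, "%MON advcfl_uvel_max = <number>"), and the function names, the
growth factor and the window length are hypothetical choices.

```python
import math
import re
import statistics


def extract_series(text, key):
    """Return, in order, every number that follows 'key =' in the log text.

    Assumed line shape (not taken from the message above), e.g.
        (PID.TID 0000.0001)  cg2d_init_res =   1.59E+00
    Tokens that cannot be parsed as a float are recorded as NaN; that is a
    choice of this sketch, so that garbled output is not silently dropped.
    """
    pattern = re.compile(re.escape(key) + r"\s*=\s*(\S+)")
    values = []
    for match in pattern.finditer(text):
        try:
            values.append(float(match.group(1)))
        except ValueError:
            values.append(float("nan"))
    return values


def first_divergence(values, factor=1e3, window=50):
    """Return the index of the first suspicious value, or None.

    A value is suspicious if it is NaN/Inf, or if its magnitude exceeds
    'factor' times the median magnitude of the preceding 'window' values.
    """
    for i, value in enumerate(values):
        if not math.isfinite(value):
            return i
        previous = [abs(v) for v in values[max(0, i - window):i]]
        if previous:
            baseline = statistics.median(previous)
            if baseline > 0 and abs(value) > factor * baseline:
                return i
    return None
```

To use it, read the model's standard output file into a string, call
extract_series(text, "cg2d_init_res"), and pass the result to
first_divergence. If the returned index is well before the crash, the run
was already diverging and a smaller timestep or a look at the forcing is
the next thing to try; if it is None, a hardware hiccup becomes more
plausible.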

Hope that helps,

Martin


On Aug 3, 2006, at 8:53 PM, Van Thinh Nguyen wrote:

> Hi all,
>
> I have compiled & run MITgcm on a cluster (HP Linux XC 3.0). At
> time 23320 s (time step = 0.5 s), the program was terminated with
> this error:
>
> ------
> srun: error: req256: task[0-1]: Floating point exception (core dumped)
> srun: Terminating job
> ------
>
> Does anyone have any idea?
>
> Thanks a lot!
>
> Van Thinh
> -----------------------------------------------
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support



