[MITgcm-support] memory on optim.x => Exit code -5

Constantinos Evangelinos ce107 at ocean.mit.edu
Tue Jan 6 13:16:58 EST 2009


On Tuesday 06 January 2009 12:57:56 am Matthew Mazloff wrote:

> I run this job for my smaller setup (much less memory needs to be
> allocated), and everything goes through smoothly.
> I run this again with everything the same but for my larger setup and
> it crashes almost immediately with the error:
> " Exit code -5 signaled from i142-401.ranger.tacc.utexas.edu Killing
> remote processes..."

If you do
size optim.x
what does it give you?
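
(For reference -- the numbers below are invented -- a binary whose static
arrays are too big for a single node would show something like

   $ size optim.x
      text    data          bss          dec        hex  filename
   2104330  894512  38654705664  38657704506  9002dc23a  optim.x

i.e. a bss segment in the tens of GB, well past the 32GB on a Ranger node.)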

> Does anyone know what this error means?  If this means I am asking
> for too much memory (which seems the likely case), does anyone know
> if there is a way to use 2 nodes (and thus reserve 64GB) for one
> serial job on ranger?
> (http://www.tacc.utexas.edu/services/userguides/ranger/)

No - there is no way for a single serial job to see the memory of two 
different nodes as one address space without rewriting optim.x in a 
distributed-memory fashion.
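
To make that concrete: in a distributed-memory rewrite each MPI rank 
allocates and works on only its own slice of the problem, so the per-node 
footprint shrinks as you add nodes. A minimal sketch in C (the array, its 
size and the reduction are invented for illustration -- this is not how 
optim.x is actually structured):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Hypothetical control-vector length; assume it divides evenly. */
    long nglobal = 1000000000L;        /* ~8 GB of doubles in total    */
    long nlocal  = nglobal / nprocs;   /* each rank holds only a slice */

    double *x = malloc(nlocal * sizeof(double));
    if (x == NULL) {
        fprintf(stderr, "rank %d: allocation failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Work only on the local slice ... */
    double local = 0.0, global;
    for (long i = 0; i < nlocal; i++) {
        x[i] = 1.0;
        local += x[i];
    }

    /* ... and replace whole-vector operations (dot products, norms)
       with global reductions. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %g\n", global);

    free(x);
    MPI_Finalize();
    return 0;
}

The line search in optim.x would need all of its dot products and norms 
recast as reductions of that kind, which is why it is a rewrite and not a 
quick fix.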

> Or does anyone have a better idea?  For example, would it be
> ridiculous to bring the 4GB ecco_c* files over to stommel (or some
> other local machine) and run the linesearch there?

Well, ross/weddell do not have more than 32GB of RAM either, so stommel 
would be the only machine that could do it locally. If you decide to use 
stommel, make sure to do the file transfers through ao, geo or itrda and 
not via ross/weddell.
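
Something along these lines would do it (usernames and paths here are 
made up -- adjust to your accounts):

   # from ranger, stage the files on one of the gateways first
   scp ecco_c* you@ao:/scratch/you/
   # then, from the gateway, push them on to stommel
   scp /scratch/you/ecco_c* you@stommel:/data/you/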

Unfortunately, TACC does not have a system with more than 32GB of shared 
memory. If you cannot fit within those 32GB, then for future growth we 
will have to rewrite optim.x.

Constantinos
-- 
Dr. Constantinos Evangelinos
Department of Earth, Atmospheric and Planetary Sciences
Massachusetts Institute of Technology



