[MITgcm-support] Segmentation Fault in MITgcm Adjoint Simulations

Yohei Takano - BAS yokano at bas.ac.uk
Tue May 7 07:24:23 EDT 2024


Hello all,

   My colleague and I have been working on an adjoint sensitivity setup and simulations
based on the global coarse-resolution biogeochemistry model (https://github.com/MITgcm/MITgcm/tree/master/verification/tutorial_global_oce_biogeo).
Our configuration is similar, but with our own customizations.

    We managed to compile the model and run adjoint test simulations. However, the model crashes
(with a segmentation fault) towards the end of the adjoint simulations, and we would like your advice to
figure out why this is happening. The strange part is that the run succeeded at one point,
but when we try to reproduce it with the exact same settings (on the same HPC system) it crashes again...
Roughly speaking, 1 out of 5 runs succeeds, but most of the time it crashes at the same point.
We are puzzled because we use the exact same settings and executable every time, and we wonder what causes
this unstable behavior. The environment should be the same every time we run the model.

    Has anyone had a similar experience? Here is the code/configuration I have been working on with my colleague Dani Jones:
https://github.com/ytakano3/MITgcm_BGC_Model_Config

   In "global_ocn3deg_bgcv0" you will find "code_ad" and "input_ad/input_ad_kpp_atmco2pv0", with which you can compile
(with TAF) and run the model. It is a 2-year test run. When I try this, it fails (i.e. segmentation fault) most of the time, but
again, roughly 1 in 5 times the model runs successfully... We would like to figure out why this is happening and why it is unstable, so
please let me know if you have any thoughts on this.

   Sorry about the long e-mail, but please let me know if anything is unclear. Thank you in advance.

Regards,

Yohei


