<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Christoph et al,<div class=""><br class=""></div><div class="">In case this is of interest to you, there is an example of running a 96-core, 20-year, global-ocean MITgcm simulation on the <a href="https://aws.amazon.com" class="">Amazon cloud</a> using <a href="https://cfncluster.readthedocs.io/en/latest/" class="">cfncluster</a> at:</div><div class=""><br class=""></div><div class=""><a href="https://eccov4.readthedocs.io/en/latest/runs.html#" class="">https://eccov4.readthedocs.io/en/latest/runs.html#</a> (on-premise instructions)</div><div class=""><a href="https://github.com/gaelforget/ECCOv4/tree/master/example_scripts/" class="">https://github.com/gaelforget/ECCOv4/tree/master/example_scripts/</a> (see README.md)</div><div class=""><a href="https://www.doi.org/10.13140/RG.2.2.13115.46889" class="">https://www.doi.org/10.13140/RG.2.2.13115.46889</a> (see 2nd part of presentation) </div><div class=""><br class=""></div><div class="">This dates back to 2017, so it’s possible that the quoted cost and performance ($40, 36 h), as well as the workflow, need updating. I have not tried Google, Microsoft, or other cloud services, but it would be great if we could have a whole set of these recipes. Don’t hesitate to PR fixes, additions, etc. at the <a href="https://github.com/gaelforget/ECCOv4" class="">ECCOv4 repo</a> if you want.</div><div class=""><br class=""></div><div class="">Cheers,</div><div class="">Gael</div><div class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Apr 10, 2019, at 10:27 AM, Ali Ramadhan <<a href="mailto:alir@mit.edu" class="">alir@mit.edu</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class=""><div dir="ltr" class=""><div class="">+1 for Google Cloud. 
Trying things out on the cloud is a great option and might be more cost-effective than buying a fancy new rig.</div><div class=""><br class=""></div><div class="">I just wanted to point out that the &lt;$1 / hour for 96 CPUs and 360 GB RAM is for a <i class="">preemptible virtual machine</i> (VM), which, from my understanding, uses spare capacity on Google Cloud; your VM can therefore be interrupted at any time (a 5-15% chance per day, apparently) and can run for at most 24 hours.
See: <a href="https://cloud.google.com/compute/docs/instances/preemptible" class="">https://cloud.google.com/compute/docs/instances/preemptible</a>
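To put rough numbers on the tradeoff, here is a quick back-of-envelope comparison using the hourly rates quoted in this thread (April 2019 prices; the 36-hour run length is just an illustrative assumption, borrowed from the ECCOv4 example above):

```python
# Back-of-envelope cost comparison for a 96-vCPU node on Google Cloud,
# using the hourly rates quoted in this thread (April 2019 prices).
# The 36-hour run length is an illustrative assumption.
hours = 36

preemptible = 1.00 * hours   # < $1/hr, but can be interrupted at any time
on_demand   = 4.56 * hours   # regular VM rate
discounted  = 3.19 * hours   # with sustained-use discount

print(f"preemptible ~ ${preemptible:.2f}")
print(f"on-demand     ${on_demand:.2f}")
print(f"discounted    ${discounted:.2f}")
```

Of course the preemptible figure only holds if the run survives (or is checkpointed across) interruptions and the 24-hour limit.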
</div><div class=""><br class=""></div><div class="">A regular VM seems much more suitable for MITgcm runs but then the node with 96 CPUs and 360 GB RAM costs $4.56 / hour, or $3.19 / hour if you run long enough to make use of the <a href="https://cloud.google.com/compute/docs/sustained-use-discounts" class="">sustained use discounts</a>, which is still pretty good I think. You also don't have to worry about setting up the machine and maintaining it.</div><div class=""><br class=""></div><div class="">Cheers,</div><div class="">Ali<br class=""></div></div></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 10, 2019 at 10:15 AM Ryan Abernathey <<a href="mailto:ryan.abernathey@gmail.com" class="">ryan.abernathey@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class="">You can rent an Intel Skylake node with 96 CPUs and 360 GB RAM for < $1 / hour on Google Cloud:<div class=""><a href="https://cloud.google.com/compute/pricing" target="_blank" class="">https://cloud.google.com/compute/pricing</a><br class=""></div><div class="">For low resolution simulations, this would be more than sufficient.</div><div class=""><br class=""></div><div class="">You could use this to experiment before buying any hardware. Or maybe you would decide you don't actually need to buy at all.</div><div class=""><br class=""></div><div class="">-Ryan</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Apr 10, 2019 at 1:31 AM Matthew Mazloff <<a href="mailto:mmazloff@ucsd.edu" target="_blank" class="">mmazloff@ucsd.edu</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">Hi Christoph<div class=""><br class=""></div><div class="">Some answers to your questions. 
But there are more knowledgeable people out there!</div><div class=""><br class=""></div><div class="">The MITgcm scales well and is routinely run on thousands of cores. </div><div class="">example:</div><div class=""><a href="https://people.nas.nasa.gov/~chenze/ECCO/SC05/ecco_sc05.pdf" target="_blank" class="">https://people.nas.nasa.gov/~chenze/ECCO/SC05/ecco_sc05.pdf</a></div><div class=""><br class=""></div><div class="">(Obviously if you try to run a small model domain on many cores it will be inefficient.)</div><div class=""><br class=""></div><div class=""><div class="">In my experience with forward model runs <font face="Verdana" class="">memory isn’t a bottleneck. </font></div><div class=""><br class=""></div><div class="">I am not sure what size runs you are talking about, but for runs with greater than a few hundred cores I think the bottleneck is primarily with the interconnects and I/O to the NFS. Hopefully people will correct me if I am wrong. </div><div class=""><br class=""></div><div class="">Matt</div><div class=""><br class=""></div><div class=""><blockquote type="cite" class=""><div class="">On Apr 9, 2019, at 6:13 AM, Christoph Stappert <<a href="mailto:cstappert@gmx.de" target="_blank" class="">cstappert@gmx.de</a>> wrote:</div><br class="gmail-m_-321471543395447274gmail-m_3248916622251863747Apple-interchange-newline"><div class=""><div class=""><div style="font-family:Verdana;font-size:12px" class=""><div class="">Hello everyone,</div>
<div class=""> </div>
<div class="">I am currently building a workstation to run some MITgcm simulations, and I am wondering which of the different CPU models I am considering would be best suited for the task:</div>
<div class=""> </div>
<div class="">Ryzen 7 1700 (8x 3.0 GHz, dual-channel RAM): A consumer-grade CPU and siginificantly cheaper than the others. However, while it does have ECC, the ECC feature is not officially supported by AMD, so I am reluctant to use this CPU in scientific computing.</div>
<div class=""> </div>
<div class="">Xeon E-2146G (6x 3.5 GHz, dual-channel RAM): This is the option I am leaning towards at the moment.</div>
<div class=""> </div>
<div class="">Ryzen Threadripper 1950X (16x 3.4 GHz, quad-channel RAM): More CPU cores than the other two options, but also more expesive. I am wondering, how big would the performance gain actually be in practice?</div>
<div class=""> </div>
<div class="">I have read in some messages on this list that MITgcm does not scale well with an increasing number of CPU cores and that memory bandwidth is an issue. However, these messages were more than 10 years old, so I am not sure if this still applies to the latest generation of CPUs and to the latest version of the software. I was not able to find any newer messages on hardware recommendations, performance and such.</div>
<div class=""> </div>
<div class="">My specific questions are:</div>
<div class="">- How well does MITgcm scale with an increasing number of CPU cores (4, 8, 16, 32...)? At which point would I stop seeing a significant increase in performance?</div>
<div class="">- Is there a bottleneck with memory bandwidth in today's CPUs? Does a higher number of RAM channels significantly increase performance?</div>
<div class="">- Are L2 cache and L3 cache a major bottleneck?</div>
<div class="">- Does MITgcm benefit from using AVX-512 or other Intel-specific features (since AMD hasn't really been a factor in scientific computing in the last couple of years)?</div>
<div class=""> </div>
<div class="">Of course, I could just get all the CPU models under consideration and do my own benchmarks, but unforunately, I do not currently have the budget or the time for this. So I was hoping that someone here might have some insights based on their knowledge of the MITgcm code or some personal experience using different kinds of hardware.</div>
<div class=""> </div>
<div class="">Thank you and kind regards,</div>
<div class=""> </div>
<div class="">Christoph</div>
<div class=""> </div></div></div>
_______________________________________________<br class="">MITgcm-support mailing list<br class=""><a href="mailto:MITgcm-support@mitgcm.org" target="_blank" class="">MITgcm-support@mitgcm.org</a><br class=""><a href="http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support" target="_blank" class="">http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support</a><br class=""></div></blockquote></div><br class=""></div></div>
</blockquote></div>
</blockquote></div>
</div></blockquote></div><br class=""></div></body></html>