[MITgcm-support] Choice of workstation CPU for running MITgcm

Gael Forget gforget at mit.edu
Thu Apr 11 09:03:13 EDT 2019


Hi Christoph et al,

In case this is of interest to you, there is an example of running a 96-core, 20-year, global-ocean MITgcm simulation on the Amazon cloud <https://aws.amazon.com/> using cfncluster <https://cfncluster.readthedocs.io/en/latest/> at:

https://eccov4.readthedocs.io/en/latest/runs.html# (on-premise instructions)
https://github.com/gaelforget/ECCOv4/tree/master/example_scripts/ (see README.md)
https://www.doi.org/10.13140/RG.2.2.13115.46889 (see 2nd part of presentation)

This dates back to 2017, so it’s possible that the quoted cost and performance ($40, 36 h), as well as the workflow, need updating. I have not tried Google, Microsoft, or other cloud services, but it would be great if we could have a whole set of these recipes. Don’t hesitate to submit PRs with fixes, additions, etc. to the ECCOv4 repo <https://github.com/gaelforget/ECCOv4> if you want.

Cheers,
Gael

> On Apr 10, 2019, at 10:27 AM, Ali Ramadhan <alir at mit.edu> wrote:
> 
> +1 for Google Cloud. Trying things out on the cloud is a great option and might be more cost-effective than buying a fancy new rig.
> 
> I just wanted to point out that the <$1 / hour for 96 CPUs and 360 GB RAM is for a preemptible virtual machine (VM), which, from my understanding, uses spare resources on Google Cloud; your VM can therefore be interrupted at any time (apparently a 5-15% chance per day) and can only run for a maximum of 24 hours. See: https://cloud.google.com/compute/docs/instances/preemptible
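Because a preemptible VM can run for at most 24 hours and may be stopped early, a longer MITgcm integration would have to be split into several jobs, each restarting from the pickup (checkpoint) files written by the previous one. The Python sketch below (function names are hypothetical) counts how many jobs a run needs and converts the quoted per-day interruption chance into a per-job chance. It assumes a constant interruption rate, which is a simplification; Google does not promise any particular rate.

```python
import math

# Limit quoted in this thread for Google Cloud preemptible VMs (2019).
MAX_PREEMPTIBLE_HOURS = 24.0


def segments_needed(run_hours: float, max_hours: float = MAX_PREEMPTIBLE_HOURS) -> int:
    """Number of back-to-back jobs needed if each job may last at most max_hours."""
    if run_hours <= 0:
        return 0
    return math.ceil(run_hours / max_hours)


def interruption_probability(hours: float, daily_prob: float) -> float:
    """Chance that a job lasting `hours` is preempted.

    ASSUMPTION: interruptions follow a constant hazard rate, calibrated so that
    a 24 h window is interrupted with probability `daily_prob`.
    """
    return 1.0 - (1.0 - daily_prob) ** (hours / 24.0)


if __name__ == "__main__":
    run = 36.0  # hypothetical wall-clock length of a run, in hours
    print("jobs needed:", segments_needed(run))
    for p in (0.05, 0.15):  # the 5-15 % per day range quoted above
        print(f"p/day={p:.2f}: P(12 h job interrupted) = {interruption_probability(12.0, p):.3f}")
```

Shorter jobs lose less work when they are interrupted, at the cost of more restarts, so the checkpoint frequency becomes a tuning knob on preemptible machines.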
> 
> A regular VM seems much more suitable for MITgcm runs but then the node with 96 CPUs and 360 GB RAM costs $4.56 / hour, or $3.19 / hour if you run long enough to make use of the sustained use discounts <https://cloud.google.com/compute/docs/sustained-use-discounts>, which is still pretty good I think. You also don't have to worry about setting up the machine and maintaining it.
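To put these rates side by side, the Python sketch below multiplies an hourly rate by a run length and also expresses it per core-hour. The two rates are the ones quoted above for the 96-CPU node; the 36-hour run length is borrowed from the AWS example only for comparison, and storage, network, and billing granularity are ignored. Function and dictionary names are hypothetical.

```python
# Hourly rates quoted in this thread for a 96-CPU, 360 GB Google Cloud VM (April 2019).
# Prices change; check the current pricing page before relying on them.
RATES_USD_PER_HOUR = {
    "on-demand": 4.56,
    "sustained-use": 3.19,
}


def run_cost(hours: float, rate_usd_per_hour: float) -> float:
    """Cost of keeping one VM up for `hours` (compute only, no storage or egress)."""
    return hours * rate_usd_per_hour


def cost_per_core_hour(rate_usd_per_hour: float, cores: int = 96) -> float:
    """Hourly VM rate spread evenly over its cores."""
    return rate_usd_per_hour / cores


if __name__ == "__main__":
    hours = 36.0  # hypothetical run length, borrowed from the AWS example for comparison
    for name, rate in RATES_USD_PER_HOUR.items():
        print(f"{name:>13}: ${run_cost(hours, rate):7.2f} total, "
              f"${cost_per_core_hour(rate):.4f} per core-hour")
```

With these rates a 36-hour run comes to $164.16 on demand and $114.84 at the sustained-use rate; the second figure is probably optimistic, since a run that short is unlikely to qualify for the full discount.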
> 
> Cheers,
> Ali
> 
> On Wed, Apr 10, 2019 at 10:15 AM Ryan Abernathey <ryan.abernathey at gmail.com> wrote:
> You can rent an Intel Skylake node with 96 CPUs and 360 GB RAM for < $1 / hour on Google Cloud:
> https://cloud.google.com/compute/pricing
> For low-resolution simulations, this would be more than sufficient.
> 
> You could use this to experiment before buying any hardware. Or maybe you would decide you don't actually need to buy at all.
> 
> -Ryan
> 
> On Wed, Apr 10, 2019 at 1:31 AM Matthew Mazloff <mmazloff at ucsd.edu> wrote:
> Hi Christoph
> 
> Some answers to your questions. But there are more knowledgeable people out there!
> 
> The MITgcm scales well and is routinely run on thousands of cores. 
> example:
> https://people.nas.nasa.gov/~chenze/ECCO/SC05/ecco_sc05.pdf
> 
> (Obviously, if you try to run a small model domain on many cores, it will be inefficient.)
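One way to see why a small domain on many cores is inefficient: each tile carries a halo (overlap) region that must be exchanged with its neighbours and partly recomputed, and as tiles shrink the halo grows relative to the interior. The Python sketch below computes that ratio, loosely mirroring the tile size (sNx, sNy) and overlap width (OLx, OLy) set in MITgcm's SIZE.h. The function name, the one-tile-per-core assumption, and the default overlap of 3 are illustrative assumptions; the overlap actually required depends on the numerical schemes chosen.

```python
def halo_overhead(nx: int, ny: int, npx: int, npy: int, olx: int = 3, oly: int = 3) -> float:
    """Ratio of halo (overlap) points to interior points for one tile.

    nx, ny   : global horizontal grid size (must divide evenly by npx, npy)
    npx, npy : number of tiles in x and y (one tile per core assumed)
    olx, oly : overlap widths; 3 is a hypothetical default
    """
    if nx % npx or ny % npy:
        raise ValueError("grid must divide evenly into tiles")
    snx, sny = nx // npx, ny // npy                 # interior size of each tile
    interior = snx * sny
    padded = (snx + 2 * olx) * (sny + 2 * oly)      # tile plus its overlap region
    return (padded - interior) / interior


if __name__ == "__main__":
    # Hypothetical 360 x 180 grid split over increasing numbers of tiles.
    for npx, npy in ((2, 2), (6, 6), (20, 20)):
        ratio = halo_overhead(360, 180, npx, npy)
        print(f"{npx * npy:4d} tiles: halo/interior = {ratio:.2f}")
```

For this grid the ratio is about 0.10 on 4 tiles but about 1.22 on 400 tiles (18 x 9 interior points each), i.e. each tile then holds more halo than interior, which is a rough proxy for communication and redundant work outgrowing useful work.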
> 
> In my experience with forward model runs memory isn’t a bottleneck. 
> 
> I am not sure what size runs you are talking about, but for runs with more than a few hundred cores I think the bottleneck is primarily the interconnects and I/O to the NFS. Hopefully people will correct me if I am wrong.
> 
> Matt
> 
>> On Apr 9, 2019, at 6:13 AM, Christoph Stappert <cstappert at gmx.de> wrote:
>> 
>> Hello everyone,
>>  
>> I am currently building a workstation to run some MITgcm simulations, and I am wondering which of the different CPU models I am considering would be best suited for the task:
>>  
>> Ryzen 7 1700 (8x 3.0 GHz, dual-channel RAM): A consumer-grade CPU and significantly cheaper than the others. However, while it does have ECC, the ECC feature is not officially supported by AMD, so I am reluctant to use this CPU in scientific computing.
>>  
>> Xeon E-2146G (6x 3.5 GHz, dual-channel RAM): This is the option I am leaning towards at the moment.
>>  
>> Ryzen Threadripper 1950X (16x 3.4 GHz, quad-channel RAM): More CPU cores than the other two options, but also more expensive. I am wondering: how big would the performance gain actually be in practice?
>>  
>> I have read in some messages on this list that MITgcm does not scale well with an increasing number of CPU cores and that memory bandwidth is an issue. However, these messages were more than 10 years old, so I am not sure if this still applies to the latest generation of CPUs and to the latest version of the software. I was not able to find any newer messages on hardware recommendations, performance and such.
>>  
>> My specific questions are:
>> - How well does MITgcm scale with an increasing number of CPU cores (4, 8, 16, 32...)? At which point would I stop seeing a significant increase in performance?
>> - Is there a bottleneck with memory bandwidth in today's CPUs? Does a higher number of RAM channels significantly increase performance?
>> - Are L2 cache and L3 cache a major bottleneck?
>> - Does MITgcm benefit from using AVX-512 or other Intel-specific features (since AMD hasn't really been a factor in scientific computing in the last couple of years)?
>>  
>> Of course, I could just get all the CPU models under consideration and do my own benchmarks, but unfortunately I do not currently have the budget or the time for this. So I was hoping that someone here might have some insights based on their knowledge of the MITgcm code or some personal experience using different kinds of hardware.
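If such benchmarks do become possible (for example on a rented cloud node, as suggested earlier in the thread), the scaling question can be answered by running the same configuration at several core counts and comparing wall-clock times. The Python sketch below computes strong-scaling speedup and parallel efficiency relative to the smallest run and reports the largest core count that stays above an efficiency threshold. The helper names and the 0.7 cut-off are hypothetical choices, not an MITgcm recommendation.

```python
from __future__ import annotations


def scaling_table(timings: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (cores, speedup, efficiency) rows relative to the smallest core count.

    `timings` maps core count -> wall-clock seconds for the SAME model setup
    (strong scaling). Efficiency = speedup / (cores / base_cores).
    """
    base = min(timings)
    t_base = timings[base]
    rows = []
    for n in sorted(timings):
        speedup = t_base / timings[n]
        efficiency = speedup / (n / base)
        rows.append((n, speedup, efficiency))
    return rows


def last_efficient(timings: dict[int, float], threshold: float = 0.7) -> int:
    """Largest core count whose parallel efficiency is still >= threshold.

    The smallest run always has efficiency 1.0, so a result exists for threshold <= 1.
    """
    return max(n for n, _, eff in scaling_table(timings) if eff >= threshold)
```

Fill `timings` with your own measurements (e.g. the elapsed time reported for each run); beyond the core count returned by `last_efficient`, adding cores buys comparatively little.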
>>  
>> Thank you and kind regards,
>>  
>> Christoph
>>  
>> _______________________________________________
>> MITgcm-support mailing list
>> MITgcm-support at mitgcm.org
>> http://mailman.mitgcm.org/mailman/listinfo/mitgcm-support
> 
