[MITgcm-support] mpi runs, throughput, latency, InfiniBand, Myrinet

Ed Hill ed at eh3.com
Wed Feb 28 16:39:05 EST 2007


On Wed, 28 Feb 2007 14:42:34 -0500 Nicolas Wienders
<wienders at ocean.fsu.edu> wrote:

> 
> Dear MITgcmers,
> 
> We are running a 12-processor job on three boxes, each with four AMD
> processors.
> 
> First, we have noticed that the network communication between nodes
> is only about 1 to 2 MB/s bi-directional in total (measured with the
> utility 'bwm'). This is not much data, which makes us wonder whether
> the slowdown is due to latency rather than throughput. 1 Gb Ethernet
> is a high-latency networking technology in contrast to InfiniBand
> and Myrinet.

Hi Nicolas,

I've been using OpenMPI over InfiniBand recently so I'll try to
respond.


> 1. What is the typical node-to-node I/O for MITgcm processes? Is 1
>    to 2 MB/s bidirectional normal, high, or low?

Sorry, I've no idea what is "normal" here.


> 2. Bandwidth vs. latency: if 1 to 2 MB/s is normal, would the jobs
>    run faster using Myrinet or InfiniBand?

MITgcm is rather sensitive to latency since, for example, there is an
MPI global sum within every iteration of the cg2d linear solver.  And
gigabit Ethernet has awful latency -- especially if you aren't using
specialized gigE drivers such as GAMMA.
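To make that concrete, here is a tiny standalone example (plain MPI in
C; just a sketch of the pattern, not MITgcm's actual Fortran code) that
performs one MPI_Allreduce per solver-style iteration.  Each of those
collectives costs at least a full network round trip no matter how
small the message, so the interconnect latency gets paid on every
iteration:

    /* Minimal sketch of why a per-iteration global sum makes a solver
     * latency-bound: every MPI_Allreduce blocks all ranks for at least
     * one network round trip, regardless of how little data (here a
     * single double) is exchanged.  Not MITgcm code. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n_iters = 200;   /* hypothetical number of CG iterations */
        double local_dot  = 1.0;   /* stand-in for a rank's partial dot product */

        double t0 = MPI_Wtime();
        for (int it = 0; it < n_iters; it++) {
            double global_dot = 0.0;
            /* The global sum every iteration must wait for. */
            MPI_Allreduce(&local_dot, &global_dot, 1, MPI_DOUBLE,
                          MPI_SUM, MPI_COMM_WORLD);
            local_dot = global_dot / (it + 2.0);  /* dummy update */
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("avg time per global sum: %.1f microseconds\n",
                   1e6 * (t1 - t0) / n_iters);

        MPI_Finalize();
        return 0;
    }

Run it with one rank per node over gigE and again over IB, and the
difference in the reported per-iteration cost is essentially the
latency gap this thread is about.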


> 3. Does MITgcm support Myrinet or InfiniBand, and which is recommended?

I'm sure others can better answer this question, but I have a vague
memory that gigabit Ethernet only scales to roughly 6--10 machines
with MPI before the MITgcm communication overhead kills any advantage
you get from the additional computational horsepower.

With IB and Myrinet you can see consistent speedups over much larger
numbers of nodes.
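For a rough feel for where that break-even sits, here is a
back-of-the-envelope model (every constant in it is an assumed,
illustrative number, not a measurement): take the time per solver
iteration to be the serial compute time divided by the number of
nodes, plus the interconnect latency times log2(nodes) for the global
sum.

    /* Back-of-the-envelope scaling model, not a measurement.
     * time per iteration = serial compute / N nodes
     *                      + interconnect latency * log2(N)
     * All constants below are illustrative assumptions. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double t_compute = 2000e-6;  /* assumed serial compute per iteration: 2000 us */
        const double lat_gige  = 50e-6;    /* assumed gigE MPI latency: ~50 us */
        const double lat_ib    = 5e-6;     /* assumed InfiniBand MPI latency: ~5 us */

        printf("nodes  speedup(gigE)  speedup(IB)\n");
        for (int n = 2; n <= 64; n *= 2) {
            double t_gige = t_compute / n + lat_gige * log2((double)n);
            double t_ib   = t_compute / n + lat_ib   * log2((double)n);
            printf("%5d  %13.1f  %11.1f\n",
                   n, t_compute / t_gige, t_compute / t_ib);
        }
        return 0;
    }

With those made-up numbers the gigE speedup flattens out around 6
while the IB curve keeps climbing past 30, which is at least
consistent with the 6--10 machine rule of thumb above.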


> 4. Is 10 Gb Ethernet an alternative to Myrinet or InfiniBand, given
>    that 10 Gb Ethernet uses a TCP offload engine on the network
>    adapters? I mention this because 10 Gb Ethernet is an emerging
>    technology. To overcome the latency and contention issues found
>    in 10/100/1000 Mb Ethernet, 10 Gb Ethernet introduces some new,
>    latency-improving features. It seems that 10 Gb Ethernet will be
>    a great technology, but whether it will be great for cluster
>    computing is the next question.
> 
> 
> Note: It seems that InfiniBand is a better choice than Myrinet due
>       to its adoption of open standards.
> 
> Note: It appears that InfiniBand technology is slightly cheaper than
>       10 GbE at this time. Low-density switches are about $8K to
>       $10K and network cards for the nodes are about $1K each.
> 
> 
> I would appreciate any comments or experience you may have on this
> matter.


I'm currently using OpenMPI with IB hardware and it works very nicely.
Specifically, I'm using Silverstorm HCAs with a Silverstorm 24-port
switch and Fedora Core 6 for x86_64.  Fedora Core 6 comes with pre-
packaged software (kernel, libmthca, libibverbs, etc.) that "just
works" [little or no fiddling required] with most IB hardware.
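If you want to double-check that your jobs are really going over the
IB fabric rather than silently falling back to TCP over Ethernet, the
quickest tell is a two-rank ping-pong latency test like the sketch
below (the thresholds in the comments are rough assumptions: low
single-digit microseconds is what I'd expect from IB verbs, tens of
microseconds from gigE TCP):

    /* Simple MPI ping-pong latency check.  Run with exactly 2 ranks,
     * one per node.  Single-digit-microsecond results suggest the IB
     * verbs path is in use; tens of microseconds look like TCP over
     * Ethernet.  Rough thresholds only. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size != 2) {
            if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        const int reps = 1000;
        char byte = 0;
        MPI_Status status;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("one-way latency: %.2f microseconds\n",
                   1e6 * (t1 - t0) / (2.0 * reps));

        MPI_Finalize();
        return 0;
    }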

Further, the IB switches (especially the 24-port and larger ones) are
substantially cheaper than the available 10G Ethernet switches.  This
may change over time, but at the moment it appears that IB has an
appreciable cost advantage and only a minuscule performance difference
relative to 10G Ethernet.

Ed

ps - The new 10G Myrinet actually *is* 10G Ethernet.


-- 
Edward H. Hill III, PhD  |  ed at eh3.com  |  http://eh3.com/

