ORIGINALLY POSTED 28th June 2011
22,983 views on developerworks
Greetings this week from the Storage Buddhist as I do a guest spot on this blog for Barry. I’m wondering if he’s a little sensitive about Nigel Poulton’s recent Infosmack jibe about how often this blog gets updated, so I’m stepping in while Bazza contemplates possible topics for his next blog entry. Personally I’d be interested in Barry’s thoughts on data compression…
Anyway, the thing that caught my interest this month is InfiniBand. After a ten-year gestation it seems to be cropping up more and more.
InfiniBand began to take shape back in 1999 after a truce was declared between two technology camps. IBM, HP and Compaq/Tandem were working on “Future I/O” while Intel, Microsoft & Sun were working on “Next Gen I/O”. InfiniBand was the merger of those two projects and was designed to be flexible enough to be used in a whole variety of applications.
The last ten years have seen a variety of pundits alternately predicting complete victory or the total demise of InfiniBand. In theory InfiniBand could take over the world and be used instead of PCI, FC and Ethernet. You can for example run IP or SCSI over InfiniBand, and it also supports remote direct memory access (RDMA) across nodes, which means very low CPU overhead. InfiniBand’s real strength seems to be in scale-out systems where low-latency, robust, high-bandwidth switched serial connections between nodes are required.
As a comparison, InfiniBand latency is typically one tenth of 10GigE latency. It also has higher bandwidth and very low CPU overhead, not to mention much better reliability under heavy load. To some extent it does depend on what you run over it of course, since some protocols like TCP have inherently higher latency and CPU demands. InfiniBand supports an almost unlimited number of nodes and typical cable lengths of up to 10 metres. Interestingly Converged Enhanced Ethernet starts to get a bit closer to InfiniBand’s features, including plans to share the OpenFabrics Enterprise Distribution (OFED) API, but CEE is still pretty immature – most folks don’t realise that the IEEE 802.1 Working Group has still not completed a version 1 standard for CEE.
An example of a product using InfiniBand interconnect is IBM’s SONAS (Scale-Out Network-Attached Storage). SONAS uses IB as the cluster connect for up to 60 nodes, while presenting only Ethernet/10GigE on the outside.
IB was an option on IBM’s DCS9900 (Deep Computing Storage) as an alternative to the FC host-side fabric, but it’s not available on the new DCS3700 (which has effectively replaced the DCS9900) so that suggests to me that IBM doesn’t see IB as having a lot of advantage over FC as a host-side connect. FC host-side connect is actually pretty hard to beat.
When it comes to internal interconnect however, not only SONAS, but also Exadata makes use of IB, as do Isilon and Ibrix for scale-out NAS. That starts to paint a picture of IB as the default choice for the internal node connections in true scale-out designs.
I guess one of the reasons that neither VMAX nor 3PAR uses IB is that they don’t scale beyond 8 nodes. I see that HDS also recently announced a roadmap that takes them away from monolithic storage and into a more 3PAR/VMAX/V7000 style product, possibly in a 2013 timeframe. Their roadmap shows a 4-node scale-out system “w/Interconnect”. Maybe that will be based on IB, but with only 4 nodes maybe they will use something less scalable instead.
Anyway, it seems likely that IBM will continue to expand its use of InfiniBand in its scale-out product sets and I expect other vendors to do likewise, especially as we all move forward into a more scale-out world and leave the monoliths behind. Most of today’s implementations can be expected to use 20 Gbps DDR or 40 Gbps QDR but much higher speeds are available so there is plenty of headroom in the technology should that be required.
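For anyone wondering where the 20 and 40 Gbps figures come from, here is a rough back-of-envelope sketch in Python. It assumes the common 4x link width and the 8b/10b line encoding used by SDR, DDR and QDR links, so the usable data rate comes out at about 80% of the headline signalling rate. The function and constant names are my own, purely for illustration.

```python
# Per-lane signalling rates in Gbps for the SDR, DDR and QDR generations.
LANE_SIGNAL_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}

# Assumption: 8b/10b line encoding, i.e. 8 data bits per 10 bits on the wire.
ENCODING_EFFICIENCY = 8 / 10


def link_rates(generation: str, lanes: int = 4) -> tuple[float, float]:
    """Return (signalling rate, usable data rate) in Gbps for one link.

    `lanes` defaults to 4 because 4x is assumed to be the typical link width.
    """
    signal = LANE_SIGNAL_GBPS[generation] * lanes
    return signal, signal * ENCODING_EFFICIENCY


if __name__ == "__main__":
    for gen in ("DDR", "QDR"):
        signal, data = link_rates(gen)
        print(f"4x {gen}: {signal:.0f} Gbps signalling, {data:.0f} Gbps data")
```

So a "40 Gbps" QDR link carries roughly 32 Gbps of actual data under these assumptions, which is worth keeping in mind when comparing headline numbers against FC or 10GigE.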