Why did you do it like that?

ORIGINALLY POSTED 6th October 2007

11,046 views on developerworks

This morning one of my colleagues in my office bay started a discussion about the fine line between divulging useful technical information on this blog-o-sphere and maintaining the confidentiality of IBM’s technical information, product plans and general intellectual property. I explained to him that I think of this as I would a customer forum. That is, when I wonder whether I should be talking about something, I consider whether I’d discuss it freely when visiting customers, or with visiting customers, who are not under NDA. If the answer is yes, then I consider it fair game. My colleague commented that in his previous department there was very little customer interaction and it was something he wanted to address in his new role.

This moved the discussion on to how extremely useful I find it to understand how and why customers are interested in SVC and how they plan to use it. Sometimes this opens up a whole new area of thought, or an innovative way of using virtualization that we have never even considered.

Like Chuck, I enjoy this kind of meeting, where you get a chance to discuss the technical aspects of the product, how we solved common problems and so on – especially with like-minded technical folks that have a deep understanding of their business needs and environment.

So here’s a starter for ten, with a couple of the most commonly asked questions.

Why did you choose a commodity Intel-based platform over either custom hardware or a pSeries type box?

There are several very good reasons for this decision, most of which have already proved it to be a very wise one.

Cost. SVC was designed to have a low entry cost and scale out to larger solutions. As soon as you start designing custom hardware you are at the start of a very costly development cycle that never ends. You have to design the boards, possibly custom ASICs, and by the time you get it out the door it’s already on last year’s technology. This is a common problem that afflicts almost all custom hardware products. Using an already developed, low-cost server platform means the costly work has already been done, debugged, revised and tested. UNIX-based servers are also usually more expensive.
Bandwidth. As I’m sure everyone is aware, SVC has to act as a ‘big pipe’. Since it sits in the middle of the SAN with all data flowing through it, you need to be able to shovel the data in and out as fast as you can, and as much as you can. Intel-based servers are led primarily by the requirements of the gaming market, in particular peripheral bus, memory bus and CPU bandwidth. Exactly what we need too. As if by magic, the latest 8G4 nodes have an 8-lane PCIe bus, 1.6GB/s, which equals 4x 4Gbit FC ports. The 1.3GHz FSB and DDR2 memory mean the bandwidth into CPU and memory is great as well. UNIX-based servers tend to lag one or two years behind, certainly on peripheral bandwidth.
Hardware refresh rates. Every 12-18 months the next generation of Intel server is available, with the latest and greatest CPUs, memory speeds and peripheral bus architectures. For example, PCIe Gen2 is not far away with double the bandwidth. One of the code requirements was to isolate and minimise what we call the Platform code, that is, the parts of the code that actually need to talk to the hardware. For example, supporting the 8G4 x3550 hardware base over the 8F4 x336 hardware involved only a few thousand lines of extra / modified SVC code.
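
The port arithmetic in the bandwidth point can be checked with a quick sketch. It assumes the commonly quoted nominal rate of about 400MB/s per direction for a single 4Gbit FC port, which is not a figure stated in this post, and the constant and function names are purely illustrative.

```python
# Assumed nominal payload rate of one 4Gbit FC port, per direction, in MB/s.
FC_4G_PORT_MB_S = 400
# Each node has 4 FC ports (8 ports per IO group of 2 nodes).
PORTS_PER_NODE = 4


def node_port_bandwidth_gb_s(ports=PORTS_PER_NODE, port_mb_s=FC_4G_PORT_MB_S):
    """Aggregate one-direction bandwidth of a node's FC ports, in GB/s."""
    return ports * port_mb_s / 1000


if __name__ == "__main__":
    print(f"{node_port_bandwidth_gb_s():.1f} GB/s per node")
```

Under that assumption the four ports add up to roughly the 1.6GB/s the bus behind them has to sustain.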

SVC only has 8 ports per IO group, 32 in an 8 node cluster – my storage controller has many more and I’ve been told I need that many?

This all boils down to how well you make use of what you have. Dare I bring up the recent HDS USP-V benchmark? One point that hasn’t been covered elsewhere is the number of ports used. In this case HDS used 64 ports to achieve their ~200K benchmark. Compare that to SVC’s ~270K with half the number of ports and you see my point.
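
To put a rough number on that comparison, here is a small sketch that divides each result by the ports used. The inputs are the approximate figures quoted above (~200K on 64 ports, ~270K on 32 ports); the helper name is made up for illustration and this is not an official calculation.

```python
def per_port(result, ports):
    """Benchmark result divided evenly across the ports used."""
    return result / ports


# Approximate figures as quoted in the text above.
hds_per_port = per_port(200_000, 64)
svc_per_port = per_port(270_000, 32)

# How many times more each SVC port delivers than each HDS port.
advantage = svc_per_port / hds_per_port

if __name__ == "__main__":
    print(f"HDS: {hds_per_port:.0f} per port")
    print(f"SVC: {svc_per_port:.0f} per port")
    print(f"SVC per-port advantage: {advantage:.1f}x")
```

On those rounded inputs each SVC port is doing a bit under three times the work.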

Because we have the back-end bus capability to drive the 4 ports to their maximum, SVC can drive its ports almost to saturation (not that this would generally be recommended unless your fabric can cope). Other products need many more ports to cover up for the lack of bandwidth behind them.

Many products require more ports to enable host fan out – for example, I understand that DMX can only share a port between multiple hosts of the same type (folks, please correct me if I am wrong). SVC only talks to the switch, so you get the fan out from the switch ports, and it caters for shared access (we only recommend that you zone like hosts together, for host non-interop reasons). You therefore don’t need the extra ports for fan out or to work around limitations.

As an example, the table below shows the maximum theoretical bandwidth vs what I measured under various test conditions (all random workloads using a 2 node SVC):

	Theoretical	Measured #
Read Hit	3.2GB/s	3.0GB/s
Write Hit	1.6GB/s*	1.4GB/s
Read Miss	3.2GB/s	2.6GB/s**
Write Miss	1.6GB/s*	1.2GB/s**

Notes: # These are provided as an example of the max rates I have measured under lab stress conditions. * Half the read bandwidth, due to mirroring of write cache data. ** I have included the miss measurements; however, these are purely dependent on the back-end read and write data rate capability.
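
For anyone who prefers ratios to raw rates, the sketch below expresses each measured figure from the table as a fraction of its theoretical maximum. The numbers are copied straight from the table; the dictionary and function names are just illustrative.

```python
# (theoretical GB/s, measured GB/s) for a 2 node SVC, from the table above.
RATES = {
    "Read Hit": (3.2, 3.0),
    "Write Hit": (1.6, 1.4),
    "Read Miss": (3.2, 2.6),
    "Write Miss": (1.6, 1.2),
}


def efficiency(theoretical, measured):
    """Measured rate as a fraction of the theoretical maximum."""
    return measured / theoretical


if __name__ == "__main__":
    for workload, (theoretical, measured) in RATES.items():
        print(f"{workload:<10} {efficiency(theoretical, measured):.2f}")
```

Bear in mind that the two miss rows say as much about the back-end storage as they do about SVC.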

I’ll cover some more interesting FAQ-type questions in later posts. For now, enjoy your weekend.