Storage Virtualization – Part 5 – Upgrades – Barry Whyte and Andrew Martin : IBM Storage

ORIGINALLY POSTED 7th October 2007

6,043 views on developerworks

I’ve been meaning to post this entry in my Storage Virtualization series for some time now. Day-to-day work, testing some new functions and customer commitments have left little, if any, time to dump my thoughts. This weekend has proved a good respite, and allowed me not only time to catch up on what’s being talked about out there, but also to spend some time with the family.

Balancing time between work and play is always a struggle. Before I got married it wasn’t so difficult: if I wanted to spend a Wednesday evening continuing my day job, or working on something new and exciting, the only person I had to answer to was myself! One thing is for sure, with every passing day we are all getting older, and hopefully wiser. Times change, priorities shift and suddenly, before you know it, you are saying all those phrases you vowed you’d never say when you had kids of your own, and the grandparents are watching on with a knowing smile. The ‘Santa’ threats have already started here, as our 5-year-old continues to push the boundaries and see just how far Mummy and Daddy will bend before we snap.

These days I find it more difficult to ‘get away with that Wednesday night call’; it’s not just me anymore, it’s the family… and the evil looks I get from my better half when I comment that ‘hopefully the call won’t go on too long’. For sure I am not alone, and in our modern 24×7 society it’s more and more of an issue. Who’s going to be there to sort out a system when it goes wrong at 3am? How can we do this needed update, or cut over to a new system, when we only have a 30 minute maintenance window at 5am on the third Sunday of each month?

For us as vendors, this means our products must not only be concurrently upgradeable, but fully hot-swappable. For customers, it means system administrators are asked to put even more strain on their family life by performing upgrades or hardware replacements at ungodly hours of the morning.

Storage as we know it has not really changed in the 50+ years since the first IBM 350 disk storage unit, used in the 1956 IBM 305 RAMAC, with a whopping 5MB storage capacity spread across fifty 24″ platters! If you look closely enough it’s still a spinning platter, consisting of base metals and electric currents storing many millions of 1s and 0s. When you compare this with the rest of the computing industry, it really is quite an archaic medium. All that said, the hardware and software surrounding the spinning platters get ever more advanced in an attempt to hide and compensate for the downsides of the magnetic medium. Rotational speeds seem to have peaked at 15K RPM for now, and seek and latency times have improved slightly thanks to the smaller platters used in 2.5″ drives, but ultimately the technology needs caching for performance and RAID for reliability, and has a strangely accepted replacement life-cycle of about three years.

When that three-year life-cycle is up, requests are sent out to the subset of ‘trusted’ suppliers and the quotes come back. Some vendors may be ruled out because of current vendor lock-in, the fear of changing more than one thing at a time, and the strategic allegiances between high-level executive management teams (vendor and customer). Choices may be limited, and decisions made that suit the economics of the business. Storage Virtualization can remove that lock-in and open the tender to a larger set of vendors, so the buying decision can be based on who provides the cheapest storage this quarter (assuming it meets the customer requirements).

Now the real headache begins: migration. Most vendors provide services to migrate data, usually software-based, using network- or host-based backup/restore technology. This is where the Storage Virtualization ‘killer app’ of online data migration makes a massive difference. In a virtualized SAN / storage environment, data can be migrated during the working day, and doesn’t necessitate late-night working for the storage admins. I’ve covered this in part 2 so won’t go into it again, but it does bring up another key question: how do you upgrade the virtualizer?

As your storage infrastructure grows and needs more performance, you may need to add to, or upgrade, the base virtualization hardware itself. There will always be software upgrades too, and because the virtualizer sits in the middle of the SAN, these have to be concurrent. Depending on the approach taken, these upgrades may or may not be possible.

Let’s start with SVC: all software upgrades are concurrent. With the dual-node nature of an IO Group within the cluster, we can take down one node, upgrade the software and bring the node back up running the previous version of cluster state code. Half the nodes are done at a time (odd, then even), the only impact being the cache flush and the subsequent write-through operation while only one node in an IO group is running. It is obviously important to ensure there are no issues prior to the upgrade and that all vdisk paths are active. There is a pause between the two sets of node upgrades to ensure all multipathing software has time to recover the paths once the first set of nodes comes back into the cluster. Only after the second half of the nodes has upgraded is the upgrade to the new code level ‘committed’, and any cluster state changes synchronised across the cluster.
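The ordering described above can be simulated in a few lines of Python. This is a minimal sketch, not SVC’s actual upgrade code: the `Node` and `IOGroup` classes, `rolling_upgrade`, the `pause_seconds` parameter and the level labels are all hypothetical, and the odd/even halves are modelled simply as the first and second node of each IO group.

```python
import time
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    code_level: str        # installed software level (hypothetical label)
    online: bool = True


@dataclass
class IOGroup:
    nodes: list            # the two Node objects that form this IO group

    def cache_mode(self):
        # With only one node online, writes cannot be mirrored into the
        # partner's cache, so the IO group falls back to write-through.
        online = sum(1 for n in self.nodes if n.online)
        return "write-back" if online == 2 else "write-through"


def upgrade_half(io_groups, index, new_level, log):
    """Take down node `index` (0 or 1) of every IO group, upgrade, restore."""
    for group in io_groups:
        if not group.nodes[1 - index].online:
            raise RuntimeError(f"partner of {group.nodes[index].name} is offline")
    for group in io_groups:                  # this half goes down together
        group.nodes[index].online = False
        log.append((group.nodes[index].name, "down", group.cache_mode()))
    for group in io_groups:                  # upgrade and rejoin the cluster
        group.nodes[index].code_level = new_level
        group.nodes[index].online = True
        log.append((group.nodes[index].name, "up", group.cache_mode()))


def rolling_upgrade(io_groups, new_level, pause_seconds=0.0):
    """Two-phase concurrent upgrade sketch; returns an event log."""
    log = []
    upgrade_half(io_groups, 0, new_level, log)   # first half of the nodes
    # Give host multipathing time to recover paths to the upgraded nodes.
    time.sleep(pause_seconds)
    upgrade_half(io_groups, 1, new_level, log)   # second half of the nodes
    # Commit only once every node in the cluster runs the new level.
    if all(n.code_level == new_level for g in io_groups for n in g.nodes):
        log.append(("cluster", "committed", None))
    return log


groups = [IOGroup([Node("node1", "old"), Node("node2", "old")]),
          IOGroup([Node("node3", "old"), Node("node4", "old")])]
for event in rolling_upgrade(groups, "new"):
    print(event)
```

Refusing to start a half while any partner node is offline is the sketch’s stand-in for checking that there are no issues and all vdisk paths are active before you begin.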

The story is similar for hardware upgrades. For example, if you are using the original 4F2 node hardware you may consider upgrading to this year’s 8G4 hardware (about 4x the performance capability per node). As above, the IO group can run with just one node, so you can remove and power down a node, power up the replacement hardware and add it to the cluster, replacing the nodes themselves without loss of access to data. A cluster can contain any combination of node hardware, and this is extensively tested. The final thing is of course the zoning, but no, you don’t need to change it. The SVC node itself controls its WWPNs: the unique part of the WWNN comes from VPD data held in the front panel unit, and this can be changed via the front panel menus. The node upgrade procedure guides you through setting the new node to the old node’s WWNN.
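Why the zoning survives can be shown with a small model: if the WWPNs a node presents are derived from its WWNN, then carrying the WWNN over carries the WWPNs over too. The sketch below assumes a deliberately simplified, hypothetical derivation (WWNN plus a port suffix) and four ports per node; real SVC WWPN formats differ, and the `NodeHW` class, `replace_node` helper and WWNN strings are illustrative only.

```python
from dataclasses import dataclass

PORTS_PER_NODE = 4  # assumption: four Fibre Channel ports per node


@dataclass
class NodeHW:
    model: str   # hardware type, e.g. "4F2" or "8G4"
    wwnn: str    # hypothetical WWNN value, held in front-panel VPD

    def wwpns(self):
        # Hypothetical derivation: the WWPNs are a pure function of the WWNN,
        # so whichever box carries the WWNN presents the same WWPNs.
        return [f"{self.wwnn}/port{p}" for p in range(1, PORTS_PER_NODE + 1)]


def replace_node(io_group, index, new_model, factory_wwnn):
    """Swap node `index` of a two-node IO group for new hardware (sketch).

    `None` in the list stands for a node that is offline or removed.
    """
    old, partner = io_group[index], io_group[1 - index]
    if old is None or partner is None:
        raise RuntimeError("both nodes must be online before a replacement")
    zoned = set(old.wwpns())       # what the existing fabric zoning expects
    io_group[index] = None         # old node removed; partner serves all IO
    new = NodeHW(new_model, factory_wwnn)
    new.wwnn = old.wwnn            # set the new node to the old node's WWNN
    io_group[index] = new          # new node added back into the IO group
    if set(new.wwpns()) != zoned:
        raise RuntimeError("WWPNs changed; zoning would need updating")
    return new


group = [NodeHW("4F2", "wwnn-a"), NodeHW("4F2", "wwnn-b")]
replace_node(group, 0, "8G4", "wwnn-factory")
print([n.model for n in group])   # ['8G4', '4F2']: mixed hardware in one IO group
```

Repeating the call with `index=1` would complete the swap of the second node in the same way.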

Moving on to USP and USP-V: as I understand it, Hitachi does not provide a sleek method of migrating your virtualized external controllers from, say, a USP to a USP-V. Presumably because the USP main unit itself contains the LUN details, you have to use traditional techniques to migrate the raw data from one box to another. I also presume this means you need 2x the external controllers during this time. Hu Yoshida has just posted a good discussion about the virtues of virtualized storage when it comes to migration, but he has omitted this rather fundamental issue. Perhaps Hitachi will address it, or already provides a service to customers? It kind of defeats the purpose of online data migration when you have to revert to old-school techniques for the base box itself.

Finally, Invista, about which I must confess I have no information. I understand you can add additional SSMs, but I can’t find any references to upgradability statements. Can two different SSMs co-exist in the same instance while an upgrade is performed? Can the software on the SSMs be upgraded and run two different versions of code for a period of time? I’m not stating that it can’t; I’m just asking the questions because I don’t know, and it seems Google doesn’t know either 😉 Maybe Mark or BarryB can enlighten me?

Concurrent code upgrades in a clustered or multi-node SAN environment are non-trivial. They have to be designed into the code from the start. I know with SVC we spent a lot of time and effort on this, especially on the algorithms that allow us to run parts of the new code (the non-cluster state code) alongside parts of the old code (the cluster state code). Trying to retrofit such code execution models would not be fun!
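One common way to structure that split is to pin the cluster-state protocol to a committed level while node-local code runs whatever level is installed, and to raise the committed level only when every node agrees. The Python sketch below illustrates the idea under stated assumptions; it is not how SVC implements it, and the `Cluster` and `ClusterNode` classes, the `try_commit` method and the integer levels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterNode:
    name: str
    installed: int   # software level installed on this node (hypothetical)


class Cluster:
    def __init__(self, nodes, level):
        self.nodes = nodes
        self.committed = level   # level spoken by the cluster-state code

    def local_level(self, node):
        # Node-local (non-cluster-state) code can run the installed level.
        return node.installed

    def cluster_state_level(self):
        # Cluster-state code must be understood by every node, so it stays
        # on the committed (old) level while versions are mixed.
        return self.committed

    def try_commit(self):
        # Commit only when every node runs the same, newer level.
        levels = {n.installed for n in self.nodes}
        if len(levels) == 1 and min(levels) > self.committed:
            self.committed = levels.pop()
            return True
        return False


nodes = [ClusterNode("node1", 1), ClusterNode("node2", 1)]
cluster = Cluster(nodes, level=1)
nodes[0].installed = 2                 # first node upgraded
print(cluster.local_level(nodes[0]), cluster.cluster_state_level())  # 2 1
print(cluster.try_commit())            # False: versions still mixed
nodes[1].installed = 2                 # second node upgraded
print(cluster.try_commit(), cluster.cluster_state_level())  # True 2
```

The hard part in a real product is everything this sketch glosses over: every cluster-state message and data structure must remain readable by the old level until the commit point.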