So just what do I mean by Storage Virtualization?

ORIGINALLY POSTED August 10th 2007

13,146 views on developerworks

Less than one week into my blogging career, why not open a huge can of worms and let the ‘fishermen’ fight over the big fat juicy ones…

I changed the title of this post several times before settling on the phrase “what do I mean” – this way I’m not dictating anything to anyone other than what I picture when talking about Storage virtualization and what this blog is intended to explore.

Some months ago, while meandering through Wikipedia, I found a rather stubby Storage virtualization topic with a two-line description and far too many vendor links to specific products. In the discussion pages user “Plowden” had outlined a set of topic headers. The overriding aim was to keep it vendor and product neutral so that any poor soul sailing the ‘Sea of Storage’ could get an insight into the different topics, including the major pros and cons associated with the different implementation approaches. I spent a day or two tackling this. It was difficult to keep it vendor neutral, but hopefully the end result was worth it. As with all things Wikipedia, feel free to chip in and enhance the page.

Storage virtualization is an overused term. More often than not it is used by vendors as a buzzword to claim their product is a virtualizer. I guess almost every storage hardware and software product could technically claim to provide some form of virtualization. So where do we stop and draw a line in the sand? Does the fact that my ThinkPad has 3 logical volumes carved from a single physical drive mean it’s virtual? Not in my eyes. RAID has been around for years and provides a way to glue unreliable JBODs into something a bit more reliable, but again it’s not what I’m interested in here.

When I think of Storage virtualization, I see a system that must provide what I’ve chosen to call the ‘Cornerstones of Virtualization’. Quite simply, these are the set of core advantages a Virtual SAN can provide over traditional direct-attached SAN storage, namely:

Online volume migration while applications are running. This is possibly the ‘killer app’ for Storage virtualization. It enables tiered storage for information life-cycle management (ILM), balancing I/O across controllers, upgrading and retiring storage, etc.
Simplification of storage management by providing a single image for multiple controllers and a consistent user interface for provisioning heterogeneous storage. (After some initial array setup of course)
Enterprise level copy services for existing storage. The customer can license a function once and use it everywhere. New storage can be purchased as low-cost RAID ‘bricks’. (The source and target of a copy relationship can be on different controllers.)
Increased storage utilization by pooling storage across the SAN.
The potential to increase system performance by reducing hot spots, striping disks across many arrays and controllers – and, in some implementations, providing additional caching.
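
To make the first cornerstone a little more concrete, here is a minimal Python sketch of the idea behind online volume migration: the host only ever sees virtual block addresses, and a mapping table decides which back-end controller actually holds each extent. Everything in it (`BackendController`, `VirtualVolume`, `migrate_extent`, the tiny extent size) is a name I have made up for illustration, and it is not how any particular product implements it. A real virtualizer also has to cope with writes arriving while an extent is being copied, which this toy version ignores.

```python
from dataclasses import dataclass, field

EXTENT_SIZE = 4  # blocks per extent; deliberately tiny for the example


@dataclass
class BackendController:
    """Hypothetical back-end array, modelled as a dict of extents."""
    name: str
    extents: dict = field(default_factory=dict)  # extent id -> list of blocks
    next_id: int = 0

    def allocate(self):
        """Reserve one empty extent and return its id."""
        extent_id = self.next_id
        self.next_id += 1
        self.extents[extent_id] = [None] * EXTENT_SIZE
        return extent_id


class VirtualVolume:
    """What the host sees: one volume, wherever the data really lives."""

    def __init__(self, size_in_extents, controller):
        # Mapping table: virtual extent index -> (controller, physical extent id)
        self.table = [(controller, controller.allocate())
                      for _ in range(size_in_extents)]

    def _locate(self, lba):
        controller, extent_id = self.table[lba // EXTENT_SIZE]
        return controller.extents[extent_id], lba % EXTENT_SIZE

    def write(self, lba, data):
        blocks, offset = self._locate(lba)
        blocks[offset] = data

    def read(self, lba):
        blocks, offset = self._locate(lba)
        return blocks[offset]

    def migrate_extent(self, index, target):
        """Copy one extent to another controller, then switch the mapping."""
        source, old_id = self.table[index]
        new_id = target.allocate()
        target.extents[new_id] = list(source.extents[old_id])  # copy the data
        self.table[index] = (target, new_id)                   # repoint the map
        del source.extents[old_id]                             # free old space


old_array = BackendController("old_array")
new_array = BackendController("new_array")
volume = VirtualVolume(size_in_extents=2, controller=old_array)
volume.write(5, b"payroll")

# Retire the old array one extent at a time; the host address never changes.
for i in range(len(volume.table)):
    volume.migrate_extent(i, new_array)

print(volume.read(5))
```

The host keeps reading block 5 throughout; only the mapping table knows the data has moved. The same table is what makes pooling and striping across controllers possible, since neighbouring virtual extents are free to live on different arrays.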

So yes, I’m talking mainly about the big vendor products SVC, USP and Invista, and the many smaller companies like DataCore, FalconStor and Incipient. (This is not meant to be an exhaustive list and there are lots of startups – if I missed you I’m sorry – you never know, you may be ‘lucky’ and EMC will gobble you up!)

There are many approaches to implementing Storage virtualization, and interestingly the big three players have all chosen a unique approach. The ‘Storage virtualization war’ has raged, and will continue to rage, between the vendors as to which approach is ‘best’. I’d like to wave a “whyte” flag for the purposes of this blog – and welcome comments and thoughts on all approaches. Personally of course I believe that we (IBM) selected the most flexible and long-term viable solution that can provide all of the promised benefits of Storage virtualization. Over this series of posts I aim to explain why I think this is the case. I plan to compare and contrast the various approaches in an unbiased manner – well, as much as possible!

This is bound to be a contentious issue and if (maybe that should be when) you dispute anything I say I’m sure you will all shout 🙂

Storage Virtualization – part 1 – ‘The approach war’

The whole concept of virtualizing your storage network is a disruptive one. It doesn’t matter which approach you choose, it radically changes not only the way your storage administrator thinks about his job, but also the very fundamentals of how and where things are done. Once you have an abstraction device sitting in the middle of the SAN between your hosts and your storage, or even your storage and storage, the rules have changed.

I’m sure most readers are aware of the three main approaches in use today; for more detailed explanations, Google is great. However, while all these approaches provide in essence the same basic ‘Cornerstones of Virtualization’, there are some interesting side effects with some or all approaches.

Network Based – Appliance
The device is a SAN appliance that sits in the data-path and all I/O flows through the device. The device is both target and initiator. It is the target of I/O requests from the host perspective and the initiator of I/O requests from the storage perspective. The redirection is performed by issuing new I/O requests to the storage.
Switch Based – Split-path
The device is usually an intelligent switch that intercepts I/O requests on the fabric and redirects the frames to the correct storage location. The actual I/O requests are themselves redirected.
Controller Based
The device is a storage controller that provides an internal switch for external storage attachment. Here the storage controller intercepts and redirects I/O requests to the external storage as it would for internal storage. (I’m not sure if USP actually re-generates new I/O requests or simply forwards the original I/O – maybe someone can enlighten me)
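
The difference between the first two approaches is easier to see in a few lines of code than in words. The sketch below is a deliberately simplified Python model with made-up names throughout (`IORequest`, `MAPPING`, `appliance_in_band`, `switch_split_path`, and the host and array names); it is not real Fibre Channel or SCSI handling and not any vendor’s implementation. The only point it makes is who ends up as the initiator of the request the back-end storage receives. I have left the controller based approach out, since as noted above I’m not sure which of the two behaviours it follows.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IORequest:
    """Toy stand-in for an I/O request travelling across the SAN."""
    initiator: str
    target: str
    lba: int
    op: str = "read"


# Hypothetical virtual-to-physical map: (virtual disk, virtual LBA)
# -> (back-end array, physical LBA)
MAPPING = {
    ("vdisk0", 0): ("array_A", 100),
    ("vdisk0", 1): ("array_B", 7),
}


def appliance_in_band(request):
    """Appliance: terminate the host's request and issue a brand new one.

    The appliance is the target as far as the host is concerned and the
    initiator as far as the storage is concerned.
    """
    array, physical_lba = MAPPING[(request.target, request.lba)]
    return IORequest(initiator="appliance", target=array,
                     lba=physical_lba, op=request.op)


def switch_split_path(request):
    """Split-path switch: redirect the original request.

    Only the addressing is rewritten; the host remains the initiator.
    """
    array, physical_lba = MAPPING[(request.target, request.lba)]
    return replace(request, target=array, lba=physical_lba)


host_request = IORequest(initiator="host1", target="vdisk0", lba=1)
via_appliance = appliance_in_band(host_request)
via_switch = switch_split_path(host_request)

print(via_appliance)
print(via_switch)
```

Both requests land on the same physical block, but the storage sees a different initiator in each case. That small difference is behind a lot of the side effects around caching, copy services and interoperability.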

It is the implications and side effects of each approach that I plan to discuss in the next few posts, namely:

Copy Services – data migration, controller copy services, virtual copy services
Performance – striping, adding latency, caching
Interoperability – multi-pathing, device co-existence
Addressability – scalability
RAS – upgradability, storage upgrades
And anything else that comes to me while I write (or from comments received)

My only request is that we keep to topic. I’m sure the urge to jump in and talk about additional latency right now is almost too much for some readers to handle (Mark), but as a wise man (or woman) once said… there’s a time and a place… and there will be… till next time… everyone keep smiling

Disclaimer: Added 13th August 07

A few of my peers have pointed out that here I’m really talking about SAN based block virtualization. The official IBM stance on ‘Storage Virtualization’ is:

Technology that makes one set of resources look and feel like another set of resources, preferably with more desirable characteristics…
A logical representation of resources not constrained by physical limitations
- Hides some of the complexity
- Adds or integrates new function with existing services
- Can be nested or applied to multiple layers of a system

This covers a much wider range of products and technologies, like Host-based software, HBA firmware, Solid-state and CD/DVD posing as Disk, Disk posing as Tape (VTS etc.), NAS and iSCSI gateways, DR550, GMAS, etc.