ORIGINALLY POSTED 18th June 2008
16,511 views on developerworks
I see that Hu thought he’d better make up some mumbo-jumbo about their ‘chubby-provisioning’ solution. I did post a comment but, as usual, his moderating means he ignores comments he doesn’t like. I’d point you back to my discussion about random 4K writes scattered across a large volume, say one hitting every 50 to 100MB. Very quickly in HDS’s solution you will have fully provisioned the disk… Can anyone enlighten me on what he’s talking about with respect to de-frag? When we allocate a ‘grain’ of say 32K, it’s a contiguous 32K chunk on the virtual disk – for example, from LBA 64 to LBA 127 (512-byte blocks). If you then allocate something at LBA 256, there is a 64K gap on the virtual disk, but at the managed disk level it’s two contiguous 32K chunks. No de-frag needed… maybe I’m missing something with HDS’s implementation, but his waffle about this getting more complicated with smaller grain sizes – i.e. true thin provisioning – is not at all relevant to our implementation.

Meanwhile Chuck has been saying virtualization has failed to deliver on its promise (curious that he comes out of the woodwork to bang this old-school stance when we make a big splash), and that if you buy lots of expensive EMC software you can get out of the mess that they let you get into in the first place.
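To put the grain point in concrete terms, here’s a toy model of the allocation I described above. The 32K grain and the LBA numbers come from my example; everything else is just an illustration of the idea, not SVC’s internals:

```python
# Toy model of grain-based space-efficient allocation (illustrative only;
# grain size and the LBAs match the example above, not SVC internals).

BLOCK = 512          # bytes per LBA
GRAIN = 32 * 1024    # 32K grain = 64 blocks

class ThinVdisk:
    def __init__(self):
        self.map = {}            # virtual grain number -> physical grain number
        self.next_physical = 0   # next free grain on the managed disk

    def write(self, lba):
        vgrain = (lba * BLOCK) // GRAIN
        if vgrain not in self.map:
            # Allocate the next contiguous grain on the managed disk,
            # no matter where the write lands in the virtual address space.
            self.map[vgrain] = self.next_physical
            self.next_physical += 1
        return self.map[vgrain]

v = ThinVdisk()
print(v.write(64))    # virtual grain 1 -> physical grain 0
print(v.write(256))   # virtual grain 4 -> physical grain 1, right next door
# A 64K hole exists on the *virtual* disk, but the managed disk holds two
# back-to-back 32K grains -- nothing to defragment.
```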
Anyway, I wanted to talk about Virtual Disk Mirroring (VDM) today. This was one feature that people maybe didn’t see or hear was coming, but it is something many of our enterprise customers have been asking for, and it actually has some nifty uses that may not at first be apparent.
Virtual Disk Mirroring (VDM)
Mirroring itself isn’t anything new or ground breaking, but mirroring across controllers is something that until now only host software has been able to provide. In a host-based solution you need to provision two LUNs from your storage, then use something like LVM mirroring to manage them. We’ve taken this capability out of the host and embedded the functionality into SVC. Building on my ‘flexible stacked I/O component’ post from last week, we once again slipped the mirroring component into the stack. This time we inserted the new component below FlashCopy and above space-efficient, for good reasons. For those of you who know, or knew, the SVC component stack, here is the updated picture:
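If you prefer code to pictures, the part of the ordering that matters for this post looks roughly like the sketch below. Only the relative placement – cache above, mirroring below FlashCopy and above space-efficient – is taken from what I’ve described; the function names are placeholders of mine and the real stack has more layers than this:

```python
# Sketch of where the new mirroring component sits in the I/O stack
# (placeholder names; only the relative ordering is taken from the post).

def cache(io):                  return f"cache({io})"
def flashcopy(io):              return f"flashcopy({io})"
def space_efficient(io, copy):  return f"{copy}:space_efficient({io})"

def submit_write(io):
    io = cache(io)       # the write cache sits above VDM and hides some of the cost
    io = flashcopy(io)   # FlashCopy sees a single vdisk, unaware of the copies
    # The mirroring layer fans the write out to both copies; each copy then
    # runs its own space-efficient (or fully allocated) layer underneath.
    return [space_efficient(io, c) for c in ("copy0", "copy1")]

print(submit_write("write@lba64"))
```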
Before diving in with why we did this and what it means, let’s step back and look at what VDM does for you. Basically, VDM means you can now have two copies of a virtual disk. The host still sees a single vdisk, but behind the scenes two copies of the vdisk are stored on two sets of managed disks. The primary use case is keeping a virtual disk on two physical storage systems, usually in two different managed disk groups. This lets you protect not only against single array failures (multiple physical disk failures) but also against the loss of an entire disk storage system. It also allows you to greatly improve the availability of low-end storage systems – bringing the mirrored vdisk up to the expected availability of an enterprise volume. Virtual Disk Mirroring is designed to protect critical business data against storage system failures. In that sense it is a high availability function comparable in scope with HACMP. Virtual Disk Mirroring is not a disaster recovery solution, because both copies are accessed by the same node pair and are only addressable by a single cluster in one site.
Existing customers can add a new ‘copy’ to an existing vdisk, which will require an initial synchronisation process; the rate of synchronisation can be controlled. New vdisks can be created as mirrored vdisks (two copies), either marked as already synchronised or as needing synchronisation (that is, once created, the sync operation begins – the vdisk is online during this operation). One copy has to be marked as the primary copy. This primary designation can be changed at any time without disruption to applications. (Care should be taken with the ‘create synchronised’ option, as it assumes the primary copy is valid and the other copy is already in sync.)

Once a mirrored vdisk is created, SVC will maintain the two copies in sync. Writes to the vdisk are sent to both copies. VDM does not load balance reads; it will always read from the primary copy unless the primary copy is offline. You can, however, spread the read workload yourself. Assume you have two storage controllers in use, each one storing a single copy of each vdisk. There is nothing to stop you alternating the primary copy across the two controllers, spreading the read workload across all the spindles and getting the best performance by keeping as many spindles busy at once.
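As a rough illustration of the write/read behaviour and the alternating-primary trick – the names and structure here are made up for the sketch, not SVC code:

```python
# Toy mirrored-vdisk model: writes go to both copies, reads go to the
# primary copy only (illustrative names, not SVC code).

class MirroredVdisk:
    def __init__(self, name, copies, primary=0):
        self.name = name
        self.copies = copies          # e.g. ["controllerA", "controllerB"]
        self.primary = primary        # index of the primary copy

    def write(self, data):
        return [f"write {data} -> {c}" for c in self.copies]     # both copies

    def read(self, lba):
        return f"read lba {lba} <- {self.copies[self.primary]}"  # primary only

# Alternate the primary designation across the two back-end controllers so
# the read workload is spread over both sets of spindles.
vdisks = [
    MirroredVdisk(f"vdisk{i}", ["controllerA", "controllerB"], primary=i % 2)
    for i in range(4)
]
for v in vdisks:
    print(v.name, "reads from", v.copies[v.primary])
```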
Looking at availability, the whole idea of VDM is really to protect against the loss of an mdisk that takes out one or more vdisks. If such an event occurs, the mirrored vdisk will switch to using the one remaining online copy. From that point onwards a bitmap of changed grains is maintained; the grain size for VDM is fixed at 256K. When you bring the offline copy back online – by fixing the missing mdisk or controller – only the grains changed since the copy went offline are re-synced. Should you need to take a complete controller offline for some disruptive maintenance action, you could add new copies to the vdisks and thus maintain access during the maintenance period. We provide a new command line view listing the ‘vdisks dependent on a controller’ to aid just such operations. On a different note, if one copy hits a medium error, the other copy will be used and an attempt will be made to fix the medium error.
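Here’s a toy sketch of the changed-grain tracking – the 256K grain size is the real number, everything else is purely illustrative:

```python
# While one copy is offline, writes mark 256K grains dirty; only those
# grains are re-copied when the copy comes back. (Illustrative only.)

GRAIN = 256 * 1024   # VDM grain size is fixed at 256K

class ChangeBitmap:
    def __init__(self):
        self.dirty = set()   # grain numbers written while a copy is offline

    def record_write(self, offset, length):
        first = offset // GRAIN
        last = (offset + length - 1) // GRAIN
        self.dirty.update(range(first, last + 1))

    def resync(self):
        # Only the grains written while the copy was offline are re-copied.
        copied = sorted(self.dirty)
        self.dirty.clear()
        return copied

bm = ChangeBitmap()
bm.record_write(offset=5 * 1024**2, length=4096)         # a 4K write at 5MB
bm.record_write(offset=700 * 1024**2, length=64 * 1024)  # a 64K write at 700MB
print("grains to re-copy:", bm.resync())   # two grains, not the whole vdisk
```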
A comment on my previous post asked if VDM essentially looks like RAID-10 when used in conjunction with SVC’s striped vdisk option. In a sense, yes, you could think of this as a form of RAID-10; however, you still need to use some form of RAID below SVC. VDM does not provide any hot-spare ability, nor does it provide automatic data scrubbing – however, it does provide repair and validate. These two options allow you to first validate that the two copies are in sync and, if not, perform a repair to re-synchronise them. VDM does not limit any of the existing SVC features or functions: the vdisk itself can be any of the currently supported vdisk types, and you can specify whether each copy is fully allocated or space-efficient.
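Conceptually, validate and repair boil down to a copy-by-copy comparison with the primary as the reference. Here’s a trivial sketch of that idea – tiny 4-byte ‘grains’ and made-up data, obviously nothing like the real mechanism:

```python
# Sketch of 'validate' (find out-of-sync grains) and 'repair' (resync them
# from the primary). Purely illustrative.

def validate(primary, secondary, grain=4):
    """Return the offsets of grains where the two copies differ."""
    return [i for i in range(0, len(primary), grain)
            if primary[i:i + grain] != secondary[i:i + grain]]

def repair(primary, secondary, grain=4):
    """Overwrite differing grains on the secondary from the primary copy."""
    fixed = bytearray(secondary)
    for i in validate(primary, secondary, grain):
        fixed[i:i + grain] = primary[i:i + grain]
    return bytes(fixed)

copy0 = b"AAAABBBBCCCC"
copy1 = b"AAAAXXXXCCCC"
print(validate(copy0, copy1))          # [4] -- the middle grain is out of sync
print(repair(copy0, copy1) == copy0)   # True after the repair pass
```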
VDM therefore provides a way to migrate a vdisk from space-efficient to fully allocated. Suppose you have a space-efficient vdisk that has become almost fully allocated, or you have decided for other reasons to convert it to a fully allocated vdisk. You can ‘add a copy’ to the vdisk that is fully allocated and synchronise the two copies. You now have a fully allocated second copy of the disk. Now switch the primary copy to be the fully allocated copy and ‘split’ them. Split is a function that allows you to separate the two copies, with the primary copy remaining the online active copy. So you can migrate from space-efficient to fully allocated without any disruption to the host application, then return the second copy’s space to its managed disk group. You could of course make the switch the other way around; however, the synchronisation would leave you with a space-efficient copy that is fully allocated anyway. There is nothing to stop you making both copies space-efficient. Again, this is another advantage of the stacked I/O component model used in SVC, and the thinking behind inserting VDM above space-efficient.
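Written out as steps against a toy vdisk object, the migration looks like this – the method names are hypothetical stand-ins for the administrative actions, not SVC CLI syntax:

```python
# Space-efficient -> fully allocated migration, step by step (toy model).

class Copy:
    def __init__(self, space_efficient):
        self.space_efficient = space_efficient
        self.synced = False

class Vdisk:
    def __init__(self):
        self.copies = [Copy(space_efficient=True)]   # the existing SE copy
        self.primary = 0

    def add_copy(self, space_efficient):
        self.copies.append(Copy(space_efficient))
        return len(self.copies) - 1

    def sync(self, idx):
        self.copies[idx].synced = True               # background synchronisation

    def split(self, keep):
        removed = [c for i, c in enumerate(self.copies) if i != keep]
        self.copies = [self.copies[keep]]            # primary copy stays online
        self.primary = 0
        return removed                               # space handed back to the group

v = Vdisk()
new = v.add_copy(space_efficient=False)   # 1. add a fully allocated copy
v.sync(new)                               # 2. let it synchronise, host stays online
v.primary = new                           # 3. make it the primary copy
v.split(keep=new)                         # 4. split off and delete the SE copy
print([c.space_efficient for c in v.copies])   # [False] -- migration done
```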
SVC currently has a restriction on migration: you can only migrate vdisks between managed disk groups that have the same extent size. VDM gets around this by allowing you to create mirrored copies between managed disk groups with different extent sizes. Simply add a second copy – in the managed disk group you wish to migrate onto – to the vdisk you wish to migrate. Synchronise the copies, then split them and delete the original copy. Some customers I spoke to in the run-up to 4.3.0, coming from a Symmetrix view of the world, saw the split function as something they recognise from their systems today. This may be a way to migrate away from a Symm view of the world while getting to grips with the much more efficient FlashCopy features – especially incremental, space-efficient FlashCopy.
Performance-wise, you are now making two copies of the data, and on writes we won’t complete the write until both copies have completed. The slowest mdisk in the VDM will dictate the performance, so you should try to mirror across similar-performing storage systems.
With the cache sitting above VDM, it should hide some of the impact of writing to two copies. Reads are sent to the primary, so to get the best read performance, as I mentioned, round-robin the primary copy across both managed disk groups/storage systems. The synchronisation operation is comparable to a FlashCopy or migration operation in terms of performance impact.
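Back-of-envelope, the write cost works out like this (the numbers are made up for illustration):

```python
# The mirrored write isn't complete until both copies have completed,
# so the slower back end sets the pace. (Made-up latencies.)

def mirrored_write_ms(copy_a_ms, copy_b_ms):
    return max(copy_a_ms, copy_b_ms)

print(mirrored_write_ms(5, 5))    # similar arrays: ~5 ms
print(mirrored_write_ms(5, 25))   # mismatched arrays: the 25 ms box dictates
```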
Additional Features and Support
The other key feature added in SVC 4.3.0 is a doubling of the supported vdisk count. You can now map 2048 virtual disks from a single node pair, and thus 8192 in total for an eight-node cluster. I’ve already covered the 256 FlashCopy targets per single source vdisk.
IPv6 support has been added to the SVC cluster – including DHCP support and the front panel interactions (which have also seen some re-work based on user forums) – and the SSPC console also supports IPv6 addresses.
The 4.3.0 release also adds to our very extensive support matrix:
- Microsoft Windows 2008 Enterprise x64 Edition, SAN boot, 32-bit support, and clustering
- Microsoft Windows 2008 Enterprise Edition for Itanium-based systems, including SAN boot
- HP-UX 11i V3 for PA-RISC and Itanium-based systems including clustering with HP ServiceGuard
- Apple Mac OS X Server 10.5.2 with ATTO Celerity FC-42ES HBA
- Pillar Axiom 300 and 500 controller support
- For the full support list see: http://www-03.ibm.com/systems/storage/software/virtualization/svc/interop.html
Licensing Changes
We are also changing the licensing for FlashCopy with SVC 4.3. In the past, you licensed FlashCopy for the sum of source and target vdisk capacity that you wanted to copy. So if you wanted to make a single copy of 2TB of data, you would license 4TB of FlashCopy (2TB source+2TB target). Note that this is the maximum amount of data being copied at any time. With the ability to make up to 256 copies and the fact that space-efficient copies occupy very little physical space, we figured that charging for the full virtual capacity of the targets didn’t make much sense. So, with immediate effect, FlashCopy is now licensed by the source capacity only. You can make as many copies as you like. In the example above, you would need to license only 2TB of FlashCopy capacity. This change is also retroactive for our existing SVC customers. An existing customer licensed for 4TB of FlashCopy can now make as many copies as they like of up to 4TB of virtual capacity; that’s at least double the replication ability for no extra charge. If you are using cascaded FlashCopy, note that intermediate copies are both sources and targets. If you copy A->B->C, virtual disks ‘A’ and ‘B’ are sources and so count towards the FlashCopy entitlement (‘B’ and ‘C’ are targets).
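Putting some numbers on it, using the 2TB example above (the 256-copy figure under the old scheme is my extrapolation, just to show the scale of the change):

```python
# Worked example of the FlashCopy licensing change.
# Old scheme: license source + target capacity. New scheme: source only.

def old_license_tb(source_tb, copies):
    return source_tb + source_tb * copies   # every target counted

def new_license_tb(source_tb, copies):
    return source_tb                        # targets are free; make as many copies as you like

print(old_license_tb(2, 1), new_license_tb(2, 1))      # 4 vs 2 TB for one copy
print(old_license_tb(2, 256), new_license_tb(2, 256))  # 514 vs 2 TB for 256 copies

# Cascaded A -> B -> C: A and B act as sources, so both count; C is only a target.
cascade = {"A": 2, "B": 2, "C": 2}   # TB per vdisk
sources = {"A", "B"}
print(sum(tb for name, tb in cascade.items() if name in sources))   # 4 TB licensed
```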
That’s my roundup of the more detailed explanations behind the new SVC 4.3.0 release, which will be available for download from the SVC support pages on June 27th and will be pre-installed for new customers on hardware shipped from manufacturing in early July.