ORIGINALLY POSTED 13th June 2008
13,095 views on developerworks
As I promised yesterday, this post is devoted to the technical details behind the new Space-Efficient Virtual Disk (SEV) and Space-Efficient FlashCopy (SEFC) functions available for no extra charge upon upgrade to the 4.3.0 software release of SVC. This post covers:
- SEV Implementation Details
- SEV Meta-data Management
- Configuration & Application Implications
- SEV with FlashCopy – SEFC
SEV Implementation details
In previous posts I have described how SVC is implemented as a stack of I/O components. Each of these performs a specific encapsulated function, and the interface between each component is the same. This allows new components to be inserted into the stack in the most appropriate place. The key point for us in development is how few changes need to be made to existing components, which allows fast development of new functions with limited chance of regression in existing code. Anyway, the SEV I/O component is inserted in the stack above the Virtualization layer (that performs the mapping between vdisks and mdisks) and below the FlashCopy layer. So FlashCopy does not need to know if a source or target vdisk is actually Space-Efficient, and SEFC is implemented in SVC with almost no additional development effort (over and above SEV that is!). Of course there are components that do need to be altered to make use of the new SEV nature, like the configuration interfaces, but generally the new function is encapsulated in the one new component.
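As a rough illustration of the stacking idea, here is a minimal sketch in which every component exposes the same `submit` call and simply hands the request to the component below it. The `IoRequest` type, the `Layer` class, the `submit` signature and the layer names are all invented for this example and are not SVC internals; only the ordering of the layers is taken from this post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IoRequest:
    op: str      # "read" or "write"
    vdisk: str   # hypothetical vdisk name
    lba: int
    blocks: int


class Layer:
    """A component in the stack; every layer exposes the same submit() call."""

    def __init__(self, name: str, below: Optional["Layer"] = None) -> None:
        self.name = name
        self.below = below

    def submit(self, io: IoRequest, trace: List[str]) -> None:
        # A real layer would do its own work here; the sketch only records the visit.
        trace.append(self.name)
        if self.below is not None:
            self.below.submit(io, trace)


def build_stack(names: List[str]) -> Layer:
    """Build a stack from a top-to-bottom list of names and return the top layer."""
    below: Optional[Layer] = None
    for name in reversed(names):
        below = Layer(name, below)
    if below is None:
        raise ValueError("a stack needs at least one layer")
    return below


# Layer order as described in this post, top to bottom.
STACK_ORDER = ["remote-copy", "cache", "flashcopy", "sev", "virtualization"]

top = build_stack(STACK_ORDER)
trace: List[str] = []
top.submit(IoRequest("write", "vdisk0", lba=0, blocks=32), trace)
print(" -> ".join(trace))
```

Because every layer has the same interface, adding the "sev" entry to the list is the only change needed to place a new component between FlashCopy and Virtualization in this toy model.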
SVC SEV is implemented as a fine-grained thin provisioning solution. By fine-grained I mean a relatively small allocation size is used. For each vdisk you can specify the grain size: 32K, 64K, 128K or 256K. In general we expect people to use the smallest 32K grain size, as this means a 16K write is only going to allocate 32K on disk, with the free 16K available for the next write. In SVC this means we now have vdisk/mdisk extents, and within an extent we have grains. I’m assuming you have some knowledge of how SVC works, and understand what an extent is. The first write will allocate a new grain at the start of an extent belonging to the vdisk and the write will be inserted into this grain. This may result in only a partial grain being allocated. Each write I/O will result in the directory structure being updated to specify which extent, which grain, and the offset into the grain.
If we step back a bit and look at the SVC I/O flow, I/O requests now flow down through the ‘Remote Copy’ (MetroMirror and GlobalMirror) layer, into Cache, into FlashCopy and then down to SEV. When SEV receives the I/O request it has to perform a lookup in the directory structure for the relevant vdisk (the directory is implemented as a b-tree). I’ve described what happens on a new write above, but if this is a write to a grain that has already been allocated, then a lookup in the directory is needed to find where to write this I/O. This is still an LBA in terms of the virtual disk.
If the I/O is a read then the directory should contain a tree node that points to the required location (grain) within the virtual disk extent. In both cases (read/write) we now have the correct LBA to reference within the virtual extent and the I/O is passed down to the Virtualization layer. This behaves as it always has, i.e. it performs the virtual extent to physical extent lookup in the virtualization table, and the I/O can be submitted to disk. (There could be the strange case that a read is coming into a grain that has not yet been allocated. If so, SEV will simply return the I/O with a bunch of zeros – we get read miss performance that is on a par with read hit.)
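To make the lookup behaviour concrete, here is a toy sketch of a thin-provisioned disk. It is not the SVC implementation: a plain Python dict stands in for the b-tree directory, grains are handed out in order of first write, each I/O is assumed to fit within one grain, and the `ThinDisk` class and its method names are made up for the example.

```python
from typing import Dict, Tuple

GRAIN_BYTES = 32 * 1024  # 32K grain, the smallest size mentioned above


class ThinDisk:
    """Toy thin-provisioned disk; a dict stands in for the b-tree directory."""

    def __init__(self, grain_bytes: int = GRAIN_BYTES) -> None:
        self.grain_bytes = grain_bytes
        self.directory: Dict[int, int] = {}     # virtual grain -> allocated grain slot
        self.store: Dict[int, bytearray] = {}   # allocated grain slot -> grain contents

    def _locate(self, offset: int, length: int) -> Tuple[int, int]:
        vgrain, within = divmod(offset, self.grain_bytes)
        if within + length > self.grain_bytes:
            raise ValueError("this sketch only handles I/O within a single grain")
        return vgrain, within

    def write(self, offset: int, data: bytes) -> None:
        vgrain, within = self._locate(offset, len(data))
        slot = self.directory.get(vgrain)
        if slot is None:
            # New write: allocate the next free grain and record it in the directory.
            slot = len(self.directory)
            self.directory[vgrain] = slot
            self.store[slot] = bytearray(self.grain_bytes)
        self.store[slot][within:within + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        vgrain, within = self._locate(offset, length)
        slot = self.directory.get(vgrain)
        if slot is None:
            # Grain never allocated: return zeros without touching any storage.
            return bytes(length)
        return bytes(self.store[slot][within:within + length])

    def allocated_bytes(self) -> int:
        return len(self.directory) * self.grain_bytes


disk = ThinDisk()
disk.write(10 * GRAIN_BYTES, b"hello")
print(disk.read(10 * GRAIN_BYTES, 5))  # data from the allocated grain
print(disk.read(0, 4))                 # unallocated grain reads back as zeros
print(disk.allocated_bytes())          # one grain allocated so far
```

A second write into the same grain finds the existing directory entry and allocates nothing new, which is the "write to a grain that has already been allocated" case described above.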
Still with me I hope! The key to SEV is really the directory.
SEV Meta-data management
We could have gone for a coarse-grained solution as HDS have (42MB allocation unit) but that wouldn’t be very Space-Efficient. That is, we could have stuck with our extent size being the allocation unit, 16MB to 2GB (in powers of 2), but with some applications that write all over the disk it would mean we’d be fully provisioned very quickly!
With the ability to fine-grain provision the SEVdisks we are pretty close to only using the space you use to store your data. If you are doing sequential 4K writes over an SEVdisk, with a grain size of 32K, you can do 8 writes into a single grain before we go ahead and allocate a new grain. Even an application that writes small blocks randomly across the whole disk range will end up consuming at most (number of writes x grain size). Compare this with a 42MB allocation unit à la HDS!
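The arithmetic is easy to check. The sketch below works through the 4K-into-32K case and the worst case where every small write lands in a different allocation unit. The figure of 1,000 random writes is just an illustrative number I picked, and the comparison assumes the disk is large enough for each write to hit a distinct unit.

```python
KIB = 1024
MIB = 1024 * KIB


def writes_per_grain(write_bytes: int, grain_bytes: int) -> int:
    """How many back-to-back sequential writes fit in one grain."""
    return grain_bytes // write_bytes


def worst_case_allocated(num_writes: int, unit_bytes: int) -> int:
    """Upper bound: every small write lands in a different allocation unit."""
    return num_writes * unit_bytes


sequential = writes_per_grain(4 * KIB, 32 * KIB)
fine = worst_case_allocated(1000, 32 * KIB)
coarse = worst_case_allocated(1000, 42 * MIB)

print(sequential)     # 8 sequential 4K writes per 32K grain
print(fine / MIB)     # 31.25 MiB consumed with 32K grains
print(coarse / MIB)   # 42000.0 MiB consumed with 42MB units
```

So for the same 1,000 scattered small writes, the coarse-grained unit reserves over a thousand times more capacity than the 32K grain.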
We now have more meta-data to manage and store. As with all fine-grained solutions, the directory structure is potentially too large to hold in memory, so we cache it, staging, prestaging and destaging it from the same disks that are providing capacity for the SEVdisk. The additional capacity we need if you end up allocating the entire SEVdisk is less than 1%. (But if you’ve fully allocated an SEVdisk you may as well convert it to a normal, i.e. fully provisioned, vdisk as you then won’t need to do the directory lookup. We even provide the means to do that with the new Virtual Disk Mirroring function – more of that in a later post.) In general, although we need <1% additional capacity to store the meta-data, this is nothing when compared to the potential savings SEV can provide – if for example you provision a 2TB SEVdisk, but only use 100MB of real space…
Storing the directory data with the real data means that we can also easily re-create the SEVdisk directory in the event of a disaster taking out the cluster and its live state information. We checkpoint this data and other meta-data to the quorum disks for use in the event of such a disaster. If you wanted to extract an SEVdisk from SVC, then again a combination of Virtual Disk Mirroring and image mode migrates can turn the SEVdisk back into a directly accessible fully provisioned LUN.
Configuration & Application Implications
With SVC we have provided two main ‘types’ of SEVdisk. You can choose to let SEVdisks automatically expand when they need more space, or you can do it manually. Doing it manually may suit you at first as you get to know the function. At any time, you can dynamically switch between the two types without any disruption to the application.
When you create an SEVdisk it’s just like creating a normal vdisk, except you now specify the ‘type’ and a ‘real size’. The real size specifies how much capacity is ‘reserved’ upfront for the SEVdisk. SVC allocates this space from the managed disk group. As has always been the case, a Vdisk uses space from just one managed disk group. This helps to limit the risks of an application or user going crazy and using lots of space – thus reducing the potential impact. Unlike other vendors’ implementations (EMC, HDS) you do not need to create a special pool of storage for SEV to use. You can use managed disk groups you have today, you can mix Vdisks and SEVdisks in the same group, or you can keep them separate if you wish. The choice is yours.
If an SEVdisk is set not to expand automatically it will not be able to grow beyond the specified ‘real size’. If it needs more space than that, it will go offline until you increase the real size or convert it to automatically expand. You can also specify a per-vdisk warning level. This will raise an event in the cluster error log, and can send an SNMP trap or email notification to the storage administrator. The vdisk warning level is provided to alert you when an SEVdisk is getting close to its defined real size.
If an SEVdisk is set to expand automatically it will be allowed to grow until it reaches its defined virtual size (the size as reported to the host system). You can also specify a contingency value for auto-expanding SEVdisks. This specifies the amount that the ‘real size’ grows ahead of the allocated size. This can be used to prioritise which SEVdisks can grow and by how much. If you have ‘over provisioned’, that is, you have created more virtual capacity than you have physical disk capacity in the managed disk group, it is possible that you may run out of available physical space. If this happens, SEVdisks requiring additional capacity go offline. (It’s worth noting that in this case only SEVdisks that need more capacity will go offline; any fully allocated vdisks, or SEVdisks that do not require additional capacity at this time, will remain online.) The vdisk warning level can be used with auto-expanding SEVdisks to catch ‘rogue’ disks that are growing fast. A second warning level is also provided at the managed disk group level. This notifies you when the group itself is over the threshold that you have set. It may be that none of the SEVdisks are over their limit, but overall you are consuming space in the managed disk group. Of course even in the non-SEV case, the new managed disk group warning level may be a useful warning when you need to think about buying more storage, or using a different group. Both limits can be configured in terms of capacity or percentage.
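The interaction between real size, contingency, warning level and group free space can be sketched as a small accounting model. This is only my simplified reading of the rules above, not SVC code: the `Group` and `SevDiskPolicy` classes, the `allocate` function and the event strings are invented for the example, the warning level is modelled as a simple capacity threshold on used space, and a disk that cannot get the space it needs just goes offline.

```python
from dataclasses import dataclass
from typing import List

GIB = 1024 ** 3


@dataclass
class Group:
    """Hypothetical managed disk group; only tracks free capacity."""
    free: int


@dataclass
class SevDiskPolicy:
    virtual_size: int         # size reported to the host
    real_size: int            # capacity reserved from the group so far
    used: int = 0             # capacity actually allocated to grains
    autoexpand: bool = False
    contingency: int = 0      # how far real_size tries to stay ahead of used
    warning: int = 0          # warn when used reaches this capacity (0 = off)
    online: bool = True


def allocate(disk: SevDiskPolicy, group: Group, nbytes: int) -> List[str]:
    """Account for nbytes of new grain allocation and return any events raised."""
    events: List[str] = []
    needed = min(disk.used + nbytes, disk.virtual_size)
    if needed > disk.real_size:
        if not disk.autoexpand:
            disk.online = False
            return ["offline: real size exhausted"]
        # Grow to cover the request plus contingency, capped by virtual size and free space.
        target = min(needed + disk.contingency, disk.virtual_size)
        grow = min(target - disk.real_size, group.free)
        if disk.real_size + grow < needed:
            disk.online = False
            return ["offline: managed disk group out of space"]
        group.free -= grow
        disk.real_size += grow
    disk.used = needed
    if disk.warning and disk.used >= disk.warning:
        events.append("warning: vdisk threshold reached")
    return events


group = Group(free=20 * GIB)
auto = SevDiskPolicy(virtual_size=100 * GIB, real_size=10 * GIB, autoexpand=True,
                     contingency=2 * GIB, warning=15 * GIB)
print(allocate(auto, group, 12 * GIB), auto.real_size // GIB)  # grows to 14 GiB, no events
print(allocate(auto, group, 4 * GIB), auto.real_size // GIB)   # grows to 18 GiB, warning
print(allocate(auto, group, 40 * GIB), auto.online)            # group too small, offline
```

Other disks in the same group are untouched by the last call, which matches the point above that only the SEVdisks needing more capacity go offline.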
When to use SEV is a good question. It really depends on the application, and how it behaves. Many open system applications typically run at no more than 50% capacity utilisation. That’s a lot of spindles and space being wasted. With SEV you can now specify a base size and let the application grow to the size it needs. Again, most applications generally then grow at a slow and steady rate. The traditional technique is to provision as large a disk as may be needed next year, or the year after. Now you can really use just what you need. Again, it comes down to understanding your needs, your application’s needs and the corresponding performance requirements of the application. IBM can help you with all aspects of planning, implementing, testing and through into production.
SEV with FlashCopy – SEFC
I mentioned above that one of the great things about the SVC ‘I/O stack’ design is that we can pick and choose where to place new functionality while encapsulating the impact on other components. For example, FlashCopy itself needs to sit below the cache component so that the cache helps to hide the ‘copy on write’ nature of FlashCopy. In this case the write simply completes to cache and later we do the copy on write operation. It’s even better when we get something for free. With SEV sitting below the cache and FlashCopy, not only does the cache help to hide the SEV directory lookups, but FlashCopy also doesn’t need to know that its target vdisk is actually an SEVdisk. It simply issues a read from the source vdisk and a write to the target vdisk.
SEFC really does save you space. Now that we support up to 256 copies of a single source disk, the innovative design of the SVC FlashCopy feature really starts to make a difference. SVC is one of the few products to support cascaded copies (copies of copies at later points in time). SVC’s “Cascaded, Incremental, Multi-Target, Space-Efficient FlashCopy” is a true example of IBM’s power at innovating. Let’s look at a real life example covering an application and its associated test environments.
In many cases, there can be multiple copies (“clones”) of production systems being used for various purposes such as development, test, QA and training. With traditional replication techniques, each copy occupies as much disk space as the original.
Using SEFC could significantly reduce the amount of space required because capacity would be used only for differences between the copies and the original data. Assume you use a fully-provisioned test master version of the data to create a copy that is physically isolated from the production data. From this test master copy, you create four test versions of the data. After a test run, you could “re-trigger” the copy from the test master to the test copy to reset its contents back to the original state for another test.
With traditional replication, the configuration would have required a total of five times the capacity of the production data (original data plus four copies) or even six times if you used a ‘test master’ copy as well. Using SEFC, that requirement would fall to a little over twice the production data (original data, test master copy, and some space for changes).
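Here is that comparison as a quick calculation, expressed in multiples of the production data size. The function names are made up, and the 5% change rate per test copy is purely an assumed figure to stand in for "some space for changes"; your own change rate will determine the real number.

```python
def traditional_capacity(production: float, test_copies: int, test_master: bool) -> float:
    """Every full copy occupies as much space as the original."""
    copies = test_copies + (1 if test_master else 0)
    return production * (1 + copies)


def sefc_capacity(production: float, test_copies: int, change_fraction: float) -> float:
    """Production data, a fully-provisioned test master, plus only the changed
    data for each space-efficient test copy."""
    return production * (2 + test_copies * change_fraction)


PRODUCTION = 1.0         # work in multiples of the production data size
TEST_COPIES = 4
CHANGE_FRACTION = 0.05   # assumed: each test copy changes 5% of the data

without_master = traditional_capacity(PRODUCTION, TEST_COPIES, test_master=False)
with_master = traditional_capacity(PRODUCTION, TEST_COPIES, test_master=True)
space_efficient = sefc_capacity(PRODUCTION, TEST_COPIES, CHANGE_FRACTION)

print(f"traditional, no test master:   {without_master:.1f}x")
print(f"traditional, with test master: {with_master:.1f}x")
print(f"SEFC with test master:         {space_efficient:.1f}x")
```

With these assumed numbers the traditional approach needs five or six times the production capacity, while the SEFC configuration needs a little over twice.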
This example takes advantage of combining together the richness of the SVC FlashCopy function: space-efficient, cascaded, and multi-target FlashCopy are all being used. Of course you can add incremental to update the copies from the production data, where only the changed data from the previous full copy needs to be re-copied.
Anyway, here are a few of my quick thoughts. Make the grain size for target SEVdisks the same as the grain size for SEFC. That is, if you use 64K or 256K FC grain sizes, make the target SEVdisk use the same allocation size. If you plan to copy a large amount of data then 256K is the most ‘efficient’. But if you plan to make a full copy of the source there is probably not much point in using an SEVdisk as the target. Because of SVC’s layered architecture, you can use any combination of fully-provisioned and SEVdisks with all of the FlashCopy functions.
That’s my first part of what’s new when you upgrade to 4.3.0, with respect to SEV and SEFC. In my next post I’ll be covering Virtual Disk Mirroring (VDM) and the other enhancements we’ve made to SVC with the new release. I’ve tried to cover the answers without knowing your questions, so ask away and I’ll answer as best I can. Rest assured, if I don’t know the answer, I know someone that does.