ORIGINALLY POSTED 20th February 2018

It’s that Tuesday in 1Q where IBM announces a load of new products (SpectrumNAS) and enhancements to the existing products. There is so much being announced today that I will refer you to Tony’s post that covers them all, and I will concentrate on Spectrum Virtualize as usual. It doesn’t look like much in the marketing material, but in reality it is a HUGE change to some fundamentals of how Spectrum Virtualize operates.

Since the first introduction of Spectrum Virtualize, when it was just known as SVC back in 2003, we haven’t really changed the storage pool concept or fundamentals. This is partly because the virtualization layer is a pretty simple layer, i.e. a lookup between virtual and physical extents. However, in recent years, with the introduction of data reduction technology, in particular Compression and Deduplication, we need a way to be able to easily stay ‘thin’. Enter the Data Reduction Pool.

Traditional, or ‘legacy’, storage pools have a fixed allocation unit of an extent, and that itself won’t be changing with Data Reduction Pools. However, features like Thin Provisioning and Compression use smaller allocation units and manage this allocation via their own meta-data structures – typically B-trees or Log Structured Arrays (LSA). In order to ‘stay thin’ you need to be able to reclaim capacity that is no longer used or, in the case of an LSA (where all writes go to new capacity), garbage collect the old overwritten data blocks. This also needs to be done at the smaller allocation unit size, i.e. in units of KB, not GB as per extents.

Data Reduction Pools

For these reasons, Data Reduction Pools (DRP) have been designed from the ground up with 4 main tenets :

Fine-grained allocation of data blocks
The ability to free back unused (unmapped, or overwritten) capacity at a fine grain
Consistent, predictable performance
Optimised performance for Solid State storage, such as Flash

A DRP at its heart uses an LSA to allocate capacity. Therefore, the volume you create from the pool to present to a host application consists of a directory that stores the allocation of blocks within the capacity of the pool.
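To make the directory idea concrete, here is a minimal Python sketch of a log-structured volume. It is only an illustration and not how Spectrum Virtualize implements it: the class name LsaVolume, its methods and the in-memory list standing in for pool capacity are all assumptions made for this example.

```python
BLOCK_SIZE = 8 * 1024  # 8KB allocation unit, as described in the post


class LsaVolume:
    """Toy log-structured volume (hypothetical, for illustration only)."""

    def __init__(self):
        self.log = []         # append-only store of block payloads
        self.directory = {}   # virtual block number -> slot in self.log
        self.garbage = set()  # slots whose data has since been overwritten

    def write(self, vblock, data):
        if len(data) > BLOCK_SIZE:
            raise ValueError("payload larger than one block")
        old_slot = self.directory.get(vblock)
        if old_slot is not None:
            self.garbage.add(old_slot)  # never overwrite in place
        self.log.append(data)
        self.directory[vblock] = len(self.log) - 1

    def read(self, vblock):
        slot = self.directory.get(vblock)
        return None if slot is None else self.log[slot]


vol = LsaVolume()
vol.write(0, b"first")
vol.write(1, b"other")
vol.write(0, b"second")  # overwrite of virtual block 0 lands in a new slot
```

After the three writes, the directory points virtual block 0 at the newest slot and the original slot is left behind as garbage to be collected later.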

The allocation size of these blocks is now 8KB – previously thin provisioned volumes used 32KB and RACE Compression wrote 32KB of compressed data. This new fine-grained 8KB allocation and addressability gives us the ability to meet all 4 requirements above. There are many reasons why this 8KB allocation is a good thing; here are just a few :

Unmap requests as small as 8KB can be catered for
The addressability of data in the pool is at an 8KB (uncompressed) boundary – compared to 32KB compressed with old RACE compression.
All random read requests are of 8KB size (or less if compressed) which is ideal for Flash Storage
With a common access size, performance is much more consistent

A DRP is a new type of pool, and there is no in-place migration of an existing pool. You do need some slack space, or new capacity, to create a new DRP, and you then use volume mirroring to migrate existing volumes into it, just as you would to convert a fully allocated volume to a compressed, thin or encrypted volume today.

Data Reduction Pools and SCSI Unmap

DRPs support end to end unmap functionality. That is, a host can issue a small file unmap (or unmap a large chunk of space if, for example, you are deleting a volume that is part of a data store on VMware) and this will result in the freeing of all the capacity allocated within that unmap. Similarly, deleting a volume at the DRP level will free all the capacity back to the pool.

DRPs have built-in services to enable garbage collection of unused blocks. This means that lots of smaller unmaps will end up allowing a much larger chunk (extent) to be freed back to the pool, and if the storage behind SVC supports unmap, we will pass an unmap command to the backend storage. This is equally important with today’s Flash backend systems, especially when they themselves implement some form of data reduction.

This gives true end to end unmap from the host all the way to the disk.
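As a rough sketch of how many small unmaps can add up to a whole extent being freed, the Python below tracks live blocks per extent and hands the extent back once nothing in it is live. The Pool class, its method names and the tiny extent size are hypothetical, and real garbage collection would also relocate live data to empty out extents, which is left out here.

```python
from collections import defaultdict

BLOCKS_PER_EXTENT = 4  # hypothetical and tiny; real extents hold far more 8KB blocks


class Pool:
    """Toy pool that frees an extent once every block in it is unmapped."""

    def __init__(self):
        self.live = defaultdict(set)  # extent id -> offsets of live blocks
        self.freed_extents = []       # extents returned to the pool

    def allocate(self, extent, offset):
        self.live[extent].add(offset)

    def unmap(self, extent, offset):
        blocks = self.live.get(extent)
        if blocks is None:
            return  # nothing allocated there
        blocks.discard(offset)
        if not blocks:
            # Last live block gone: the whole extent can go back to the
            # pool (and, in a real system, be unmapped on the backend).
            del self.live[extent]
            self.freed_extents.append(extent)


pool = Pool()
for offset in range(BLOCKS_PER_EXTENT):
    pool.allocate(0, offset)
pool.allocate(1, 0)

for offset in range(BLOCKS_PER_EXTENT):
    pool.unmap(0, offset)  # four small unmaps empty extent 0
```

Extent 0 ends up on the freed list, while extent 1 stays allocated because it still holds a live block.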

Data Reduction Pools and Easy Tier

A DRP uses an LSA, as mentioned above. RACE Compression has used a form of LSA since its introduction in 2011, and this means that there is normal garbage collection that needs to be done all the time. An LSA will always append new writes to the end of the allocated space. Even if data already exists and the write is an over-write, the new data will not be written in place. Instead the new write is appended at the end and the old data is marked as needing garbage collection. This itself allows for a couple of nice things.

Writes to a DRP volume are always sequential – so we can build all the 8KB chunks into a larger 256KB chunk and destage the writes from cache, either as full stride writes, or as a large 256KB sequential stream of writes.
This should give the best performance not only in terms of RAID on backend systems, but also on Flash, where it’s easier for the Flash device to also garbage collect on a larger boundary.
We can start to record meta-data about how frequently certain areas of a volume are over-written.
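The first point, building 8KB blocks into 256KB chunks before destaging, can be sketched in a few lines of Python. This is an illustration under assumptions rather than the product's cache logic: the WriteCoalescer class and its attributes are made up, and the destage is modelled as appending to a list.

```python
BLOCK_SIZE = 8 * 1024
CHUNK_SIZE = 256 * 1024
BLOCKS_PER_CHUNK = CHUNK_SIZE // BLOCK_SIZE  # 32 blocks of 8KB per 256KB chunk


class WriteCoalescer:
    """Toy cache stage that destages only full 256KB sequential chunks."""

    def __init__(self):
        self.pending = []   # 8KB blocks waiting to fill a chunk
        self.destaged = []  # 256KB chunks written out, in order

    def write(self, block):
        if len(block) != BLOCK_SIZE:
            raise ValueError("expected exactly one 8KB block")
        self.pending.append(block)
        if len(self.pending) == BLOCKS_PER_CHUNK:
            # One large sequential write instead of 32 small ones.
            self.destaged.append(b"".join(self.pending))
            self.pending = []


coalescer = WriteCoalescer()
for _ in range(33):
    coalescer.write(bytes(BLOCK_SIZE))
```

Thirty-three 8KB writes produce one full 256KB destage, with one block left waiting for the next chunk.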

We can then bin-sort the chunks into a heat map in terms of re-write activity and then group commonly re-written data onto a single extent. Why? Well, then EasyTier will operate correctly not only for read data, but also for write data when data reduction is in use. Previously, writes to compressed volumes held lower value to the EasyTier algorithms, because writes were always to a new extent, so the previous heat was lost. Now we can maintain the heat over time and ensure that frequently re-written data gets grouped together. This also aids garbage collection, both at the Virtualize level and on Flash, where it’s likely that large contiguous areas will end up being garbage collected together.
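A minimal sketch of the bin-sort idea follows, assuming we have a simple log of which chunk was over-written on each write. The thresholds in HEAT_BINS, the function names and the three-bin split are invented for this example; the real heat tracking is not described in this post.

```python
from collections import Counter

HEAT_BINS = (4, 16)  # hypothetical re-write thresholds: cold < 4 <= warm < 16 <= hot


def heat_bin(rewrites):
    """Return 0 (cold), 1 (warm) or 2 (hot) for a re-write count."""
    bin_index = 0
    for i, threshold in enumerate(HEAT_BINS):
        if rewrites >= threshold:
            bin_index = i + 1
    return bin_index


def group_by_heat(overwrite_log):
    """Group chunk ids by heat bin so similar chunks can share an extent."""
    counts = Counter(overwrite_log)
    groups = {}
    for chunk, rewrites in counts.items():
        groups.setdefault(heat_bin(rewrites), []).append(chunk)
    return {b: sorted(chunks) for b, chunks in groups.items()}


# Chunks "a" and "d" are re-written often, "b" occasionally, "c" once.
log = ["a"] * 20 + ["b"] * 5 + ["c"] + ["d"] * 17
groups = group_by_heat(log)
```

Chunks that land in the same bin are the candidates for being placed on the same extent, so that the extent as a whole carries a meaningful heat.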

Data Reduction Pools and Compression (2.0)

RACE Compression, which we have had in the products since 2011, was one of the first real-time compression solutions, i.e. in-band compression and decompression of data as the write/read requests come into the system. Most other vendors run a background task to compress, which means you need slack space to store the data fully allocated before then compressing it. So real-time has its advantages. We also relied on the concept of ‘temporal locality’, where there is a time-based access characteristic of data blocks. When this works it works extremely well, and we have many thousands of clients happily compressing their data with RACE today. However, some application workloads have no temporal nature to their access pattern, and there it works less well. It’s also virtually impossible to benchmark easily in PoCs etc. without using real application workloads.

With DRP, we are introducing a new style of real-time compression. (I’ve been calling this Compression 2.0 – but that is in no way official!) The first difference is obviously the 8KB blocks. We gather or split I/O into 8KB chunks and then compress those independently. So for small block workloads, we only have to read in the compressed data (say 4KB), decompress it to 8KB and serve the I/O from there. Previously, with RACE 1.0, we would build at least 64KB of data, compress that to 32KB and then save that. So the smallest I/O from disk was 32KB, which was decompressed to say 64KB before pulling out the data you need. This could result in small block, truly random workloads turning into much larger MB/s streams, adding additional load onto the compression engines / hardware offload.
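To put rough numbers on that, the Python below works through the arithmetic using the "say 4KB" and "say 64KB" figures from the paragraph above. The 10,000 IOPS rate is a made-up workload chosen for illustration, and the helper name is mine.

```python
def mib_per_second(iops, kb_per_io):
    """Convert an I/O rate and a per-I/O size in KB into MiB/s."""
    return iops * kb_per_io / 1024


IOPS = 10_000  # hypothetical small-block random read rate

# DRP: read about 4KB of compressed data, decompress to one 8KB block.
drp = {"disk": mib_per_second(IOPS, 4), "decompressed": mib_per_second(IOPS, 8)}

# RACE: read a 32KB compressed unit, decompress to about 64KB.
race = {"disk": mib_per_second(IOPS, 32), "decompressed": mib_per_second(IOPS, 64)}

amplification = race["disk"] / drp["disk"]
```

With these illustrative figures the same host workload drives roughly 39 MiB/s of disk reads with the 8KB scheme against about 312 MiB/s with the older one, an 8x difference, and the decompression output differs by the same factor.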

The main benefit here is therefore predictable and consistent performance, no matter the workload. As more and more customers turn to all Flash, or predominantly Flash storage for their active data, being able to keep the latency lower, and predict the performance is critical. For these reasons we see much more consistent low latency from Compression 2.0 and Flash storage.

At the same time, we have done the integration work to allocate the compression internal resources from the main system resources. What does this mean to you? Much higher throughput, and access to all the system memory and CPU. The RACE 1.0 software engines used to run in their own processes, and required their own CPU and memory. Now that Compression 2.0 is integrated into the main Spectrum Virtualize process, it gains all the benefits of our multi-threaded nature and custom memory management, and you don’t have to dedicate hardware just for compression. Once you migrate your last volume from Compression 1.0 to a DRP running Compression 2.0, all the old compression CPU and memory will be freed back into the system.

As a result of all of this not only is performance more consistent and predictable, but in terms of GB/s we see between 2x and 3x the bandwidth – on existing hardware!

Oh and before I forget, this all also means you can now have all 10,000 volumes on the system compressed!

Data Reduction Pools and Licences

Corrected 21st Feb 2018

There is no additional, or new license to use a Data Reduction Pool. It is included in the base capacity license in the product.

For SVC and V9000 there is no additional license to use Compression 2.0. When you have migrated your last Compression 1.0 volume into a DRP you will no longer need to maintain your old compression license. Note I say ‘maintain’ i.e. from your next renewal onwards!

However, if you are using Compression 2.0 on V5030 or V7000 there is still an additional license required for the Compression functionality.

Spectrum Virtualize v8.1.2 Software

I’ve spent a lot of time discussing DRPs and they are at the heart of the new v8.1.2 software. But in addition, v8.1.2 includes :

25GbE iSCSI Host Support – A new dual port NIC with native 25GbE ports for iSCSI. Available only for the current generation Virtualize products.
NVMe – Hardware readiness statements. The new 25GbE and existing 16Gb FC adapters are both NVMe-oF capable, and Statements of Direction were made that this will be available in a subsequent software update.

I will expand on some of the internals of how DRPs work behind the scenes, but for now, I hope you are as excited about this major update as we all are!