ORIGINALLY POSTED 13th January 2017
TOP 10 POST – 50,563 views on developerworks
Here’s another topic I often get asked about. Things used to be quite simple and I covered this for many years in my Configuring for Optimal Performance series of technical university presentations (also here on the blog – parts 1, 2, 3) – and the basics are still the same when configuring RAID arrays, but now with DRAID, people are asking if this still applies. In general the same rules and concepts apply, but you may have to adjust some of the thinking, particularly when you are configuring large DRAID arrays behind an SVC system.
Configuring on a Stand-Alone Storwize or FlashSystem
Things are fairly simple when configuring a stand-alone Storwize system, i.e. one that isn’t being virtualised behind an SVC. The GUI RAID configuration wizard does a pretty good job of recommending what you should configure based on your installed drives. You generally can’t go far wrong accepting its recommendations and configuring the system that way. But if you do want more manual control of what’s happening, you can use the CLI, or adjust the wizard accordingly.
RAID vs DRAID
[BDW Feb2020 – Author’s note – clearly we have moved on from the guidance, to ‘always use DRAID-6’ messaging!]
Most people are now buying Flash drives (mainly Tier 1 flash) and large Nearline SAS drives. Given the size of all of these, we’d recommend RAID-6, and now that DRAID is available, DRAID-6 should be used in general for the best performance and redundancy protection.
RAID 5, 6 or 10
I’ve not seen anyone use RAID-10 for a while now, not since Flash drives became the performance tier, and even with the Enterprise Tier-0 Flash drives, RAID-5 had become the norm. 10K and 15K drives are becoming the exception, and here too DRAID-6 should be used; with the larger 10K drives in particular, DRAID-6 is the norm. With the recent performance code enhancements in 7.7.1, testing shows that DRAID-6 outperforms RAID-5 in all workloads, and so our general recommendation from 7.7.1 onwards is to always deploy DRAID-6 no matter what the drive technology or speed.
DRAID Component Counts
As for the number of HDD drives, the best benefit for rebuild times typically comes at around 48 HDD drives in a single DRAID. Here you get the faster rebuild times, down to just a couple of hours, with the least impact during rebuild, because the load is spread over a large number of drives. However, it’s unusual, unless you are deploying an All Flash system, that you will have that many Flash drives. But again, DRAID should be used, and whatever number of Flash drives you have should be added. For example, with 12 Flash drives in a single DRAID, you don’t need to designate a specific drive as a spare. You still have spare capacity in the array, distributed across the 12 drives, but you get the performance from all 12 drives, rather than wasting a whole drive’s worth of performance on a spare.
For a V7000F, where you either buy the F model or build your own with all Flash drives in the system, the performance of a single V7000 control enclosure is going to be saturated (in theory, assuming you drive all the Flash drives to their maximum) at somewhere between 16 and 24 Tier1 Flash drives, and slightly fewer if you are using Tier0. Here you should configure all the drives in the V7000F in a single DRAID array. Since the cost of Flash drives is more or less linear with capacity, if you wanted, say, 40TB of Flash, you would be better going with 20x 2TB rather than 10x 4TB: you get more drives’ worth of performance, and the overhead for sparing and parity is smaller.
Since Spectrum Virtualize version 7.7.1 the DRAID code has been enhanced to make full use of the multi-core environment, and so the number of DRAID arrays doesn’t matter. That is, you can get maximum system performance from a single DRAID array. The old rules and restrictions no longer apply. As stated above, with 7.7.1 DRAID will have better performance than RAID in all use cases. DRAID-6 will also give the same or better performance than RAID-5 with much better data protection, which is particularly important as we get to 15TB and even larger Flash drives!
Note: I created some confusion with the old ‘at least 4 disks (RAID arrays) per system’ comments, which you will see in the Optimizing series of posts. The point is that old RAID (Traditional RAID) is single threaded per array, so any given RAID array’s I/O is handled by a single core/thread. This meant that in an All Flash solution you wanted to create at least as many arrays as you had threads (4 in V7000G1, 7 in V7000G2, 8 in V7000G2+ etc). However, this didn’t really apply to non-Flash drives, since a single array of HDD drives couldn’t come close to saturating a single core. This rule still applies to Traditional RAID, but only when deploying All Flash solutions. However, we’d recommend using DRAID with 7.7.1 onwards, and then you can create just 1 DRAID if you wish and get the same performance, since it will use all the threads/cores in the system.
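As a rough illustration of that note, here is a minimal Python sketch of the array-count reasoning. The thread counts are the ones quoted above; the function and dictionary names are hypothetical and only there for illustration, and the result is a lower bound for avoiding a single-thread bottleneck, not a sizing tool.

```python
# Threads per system as quoted in the note above.
IO_THREADS = {"V7000G1": 4, "V7000G2": 7, "V7000G2+": 8}


def min_arrays_for_threads(model: str, all_flash: bool, draid: bool) -> int:
    """Minimum number of arrays so that no single thread is the bottleneck.

    Assumptions (from the note above):
    - DRAID on 7.7.1 onwards uses all threads, so 1 array is enough.
    - An HDD array cannot saturate a single core, so 1 array is enough.
    - Traditional RAID on all-flash needs one array per thread.
    """
    if draid or not all_flash:
        return 1
    return IO_THREADS[model]


print(min_arrays_for_threads("V7000G2", all_flash=True, draid=False))
print(min_arrays_for_threads("V7000G2", all_flash=True, draid=True))
```

The first call is the Traditional RAID all-flash case, where the thread count drives the answer; the second shows why the rule stops mattering once you move to DRAID.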
Configuring Storwize or FlashSystem for use behind SVC
SVC Mdisk Handling
SVC acts like a host when it comes to backend storage. That is, we control the I/O down to the configured mdisks, and just like a host we have to have a queue depth value. In general, this queue depth per disk is set to approx 60. This does change based on various factors, and I’ll cover this in another post soon. For those that can’t wait, check out the Best Practice Performance Guidelines Redbook. Note that we talk about queue depths all the time, and really what we mean is a maximum concurrency level before we start queuing. So per disk, SVC will generally send up to 60 concurrent I/O before queuing.
The HDD Rule of 8
This gives us our first rule. Since HDDs like approx 8-10 concurrent I/O outstanding at the device to perform at their maximum, and this doesn’t change with drive speed, we want to make sure that in a highly loaded system any given mdisk can queue up around 8 I/O per drive. Thus, with a queue depth of 60ish per mdisk from SVC, an mdisk with 8 physical drives is optimal (60/8 gives roughly 8 I/O per drive).
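To see where the 8 comes from, here is a small Python sketch of the arithmetic. The figure of 60 is the approximate per-mdisk queue depth mentioned earlier, and the function name is just something I made up for illustration.

```python
SVC_QUEUE_DEPTH = 60  # approximate per-mdisk value quoted above; it can vary


def per_drive_concurrency(mdisk_queue_depth: int, drives_behind_mdisk: int) -> float:
    """Average concurrent I/O each drive can see when the mdisk queue is full."""
    if drives_behind_mdisk <= 0:
        raise ValueError("need at least one drive behind the mdisk")
    return mdisk_queue_depth / drives_behind_mdisk


# Compare a few drives-per-mdisk ratios against the 8-10 I/O that HDDs like.
for drives in (4, 8, 16, 48):
    print(drives, per_drive_concurrency(SVC_QUEUE_DEPTH, drives))
```

With 8 drives per mdisk each drive can see about 7.5 I/O, close to the sweet spot. With 48 drives behind a single mdisk it drops to about 1.25, which is why a large DRAID should be presented as several volumes.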
The Flash drive Rule of 8
Well now, with Flash, for simplicity, we also have the same queue depth concepts, but remember it’s very unlikely that a Flash drive will get close to the maximum, because latency is so low (compared to HDD) that the queues never build up that much. Since 7.7.1, when running on V7000Gen2 or better, the Flash drive queue depth has been increased to 32 (from the old 10).
If you only have a few Flash drives, say a single DRAID array, then the same rule of 8 applies.
The All-Flash Controller Rule of 16
However, for All-Flash controllers the considerations are more about I/O distribution across SVC ports and threads. Since most All-flash arrays that are put behind SVC have very high I/O capabilities, we want to give SVC the best chance to spread the load and make even use of SVC resources, so queue depths are less of a concern here (because of the lower latency per I/O).
In testing with IBM and non-IBM All-flash solutions, 16 mdisks from the available capacity does just what we want – keeps the queue depths high enough and spreads the work across the SVC resources.
Stripe on Stripe / Pool Considerations
Things have got simpler here now too. Since most DRAID will use a large number of drives and you will end up with just a small number of DRAID even in a large configuration, you can put them either in their own pool at the Storwize level, or add multiple similar DRAID into one pool at the Storwize. Then create the required number of volumes from the pool to present to SVC as mdisks.
At the SVC level you then group these back into pools adding all the disks from the given Storwize pool, and of course maybe other pools if you are creating a Hybrid Easy Tier pool. Remember to check and set the tier type when adding the disks at the SVC level, as it doesn’t know what the tiers are until you tell it.
So let’s say you have 12 Tier1 7.6TB flash drives and 64 NL-SAS 8TB drives, and want to run Easy Tier.
From the 12x Tier1 Flash, create a single DRAID-6 array at Storwize and put in Pool0.
The Flash rule of 8 tells us we want 12/8 = 1.5 mdisks, so here we round up to 2 and create 2 volumes to present to SVC from Pool0. Now, here is where you have some flexibility. If you want to really push the Tier1 storage hard, you could create more than 2 volumes, e.g. 4, which will essentially double the load that SVC can drive to the devices. This really depends on how hard you want to push the system. With 2 it will perform just fine, but if latency does increase for some reason (say you are pushing large blocks > 64KB) then it may queue at SVC with just 2 volumes. So going up to 4 is fine. It’s not a hard and fast rule; there is some flexibility and you can use it as a guideline.
From the 64x NL-SAS, create a single DRAID-6 array at Storwize and put in Pool1.
The HDD rule of 8 tells us we want 64/8 = 8 mdisks, so create 8 volumes to present to SVC from Pool1.
At SVC, we now discover 10 mdisks and put them all into one pool, making sure we set the 2 Tier1 Flash mdisks to the tier1 class and the other 8 to the tier3 class.
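The arithmetic for this first example can be sketched in a few lines of Python. The helper name is hypothetical; it just applies the rule of 8 and always rounds up.

```python
import math


def mdisks_for(drive_count: int, drives_per_mdisk: int = 8) -> int:
    """Rule of 8: one mdisk (volume) per 8 drives, always rounding up."""
    return math.ceil(drive_count / drives_per_mdisk)


flash_volumes = mdisks_for(12)  # 12/8 = 1.5, rounded up
nl_volumes = mdisks_for(64)     # 64/8 = 8
print(flash_volumes, nl_volumes, flash_volumes + nl_volumes)  # prints: 2 8 10
```

Treat the flash result as a starting point rather than a limit; as noted, going to 4 volumes is fine if you want SVC to push the Tier1 drives harder.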
You have 4 Tier0 1.6TB flash drives, 24 10K SAS 1.8TB drives, and 36 NL-SAS 10TB drives, and want to run Easy Tier.
From the 4x Tier0 Flash, create a single DRAID-5 array at Storwize and put in Pool0. (DRAID-5 here because we only have a small number and they are the fastest devices)
The Flash rule of 8 tells us we want 4/8 = 0.5 mdisks, which rounds up to only 1 volume. I’d suggest at least 2 here, just to get more concurrency. As before, it’s not a concrete rule: use common sense along with the guidelines here and tailor to suit your requirements.
From the 24x 10K, create a single DRAID-6 array at Storwize and put in Pool1.
Use the HDD rule of 8, so 24/8 = 3, and create 3 volumes to present to SVC from Pool1.
From the 36x NL-SAS, create a single DRAID-6 array at Storwize and put in Pool2.
Use the HDD rule of 8, so 36/8 = 4.5, which rounds up to 5 (always round up), and create 5 volumes to present to SVC from Pool2.
At SVC, discover the 10 volumes and add each to the correct tier inside a single pool. A useful tip here is to create only one set of volumes at a time: create the two Tier0 volumes first, map them to SVC and discover them at SVC, so you know these 2 are Tier0, then move on to the next tier and repeat. Otherwise you need to tally the UUID numbers and work out which tier is which!
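Here is the same calculation for this three-tier example as a Python sketch. The `minimum` parameter is my own hypothetical way of expressing the "at least 2 for a small Flash array" suggestion; it is not a product setting.

```python
import math


def volumes_for(drive_count: int, rule: int = 8, minimum: int = 1) -> int:
    """Rule of 8 with round-up, plus an optional floor on the volume count."""
    return max(minimum, math.ceil(drive_count / rule))


# (pool, description, drive count, minimum volumes)
plan = [
    ("Pool0", "Tier0 flash", 4, 2),  # 4/8 rounds up to 1, floor of 2 applied
    ("Pool1", "10K SAS", 24, 1),
    ("Pool2", "NL-SAS", 36, 1),
]

result = {pool: volumes_for(drives, minimum=floor) for pool, _, drives, floor in plan}
print(result)                # {'Pool0': 2, 'Pool1': 3, 'Pool2': 5}
print(sum(result.values()))  # 10
```

The total of 10 matches the number of volumes you should expect to discover at SVC, which is a handy sanity check when mapping one tier at a time.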
You have an All Flash V7000F with 16 Tier1 7.6TB drives, and a FlashSystem 900. You want two classes of storage at SVC.
From the 16x Tier1 on V7000F, create a single DRAID-6 array and put in Pool0.
Use the two rules to give you an idea of what’s best. Based on the Flash rule of 8 you would create just 2 volumes, but the All-flash rule of 16 suggests 16 for maximum performance. So here I’d compromise, go in the middle and create 8 volumes. There isn’t a huge number of Flash drives in this V7000F, but they are going to be able to drive the V7000 pretty hard, so I’d want to make sure SVC is pushing it harder than it would with just 2 volumes, hence picking 8. Again, there is no concrete rule to follow, just guidelines to get you thinking about what’s going on under the covers!
From the FS900, create the default RAID-5 array which automatically goes into Pool0.
Since this is an all Flash Tier0 solution, jump straight to the All-flash rule of 16, and create 16 volumes.
On SVC, discover the 24 mdisks and put them into two pools, one for each controller. When you think at this level, it makes sense: we have a Tier0 and a Tier1 pool, and the Tier0 pool has twice as many mdisks as the Tier1 pool, which means SVC would be capable of driving it roughly twice as hard.
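The "twice as hard" point is simple to check with a Python sketch. It assumes the same approximate queue depth of 60 per mdisk for both controllers, which is a simplification since the real value varies, and the function name is hypothetical.

```python
import math

QUEUE_DEPTH_PER_MDISK = 60  # approximate; assumed equal for both controllers


def max_outstanding_io(mdisk_count: int, queue_depth: int = QUEUE_DEPTH_PER_MDISK) -> int:
    """Upper bound on concurrent I/O SVC will send to a pool before queuing."""
    return mdisk_count * queue_depth


# V7000F: the two rules give a range, and 8 is a judgment call inside it.
v7000f_range = (math.ceil(16 / 8), 16)  # (rule of 8, rule of 16)
v7000f_mdisks = 8
fs900_mdisks = 16

print(v7000f_range)
print(max_outstanding_io(v7000f_mdisks), max_outstanding_io(fs900_mdisks))
print(max_outstanding_io(fs900_mdisks) / max_outstanding_io(v7000f_mdisks))
```

Note that the code does not derive the 8; that stays a choice between the two bounds. It only shows what the choice means for how much concurrent I/O SVC can send to each pool.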
Hopefully this has answered more questions than it asks, but remember the rules of 8 and 16 are there as guidelines.
The HDD side is simpler: stick to the rule of 8 and round up.
For Flash, things are a little more complicated, and it’s more about load distribution and how hard you want SVC to drive the underlying systems. Use the rules of 8 and 16, and take a more flexible approach to deciding the number of volumes/mdisks to create, so that you push enough load into the systems to get the throughput you need.
PS – Don’t forget the host volume queue depths, you may need to increase these from the defaults if you have just a few host volumes!