All Flash Storage – Why RAID Matters

Feb 26, 2020

Barry Whyte

GUEST POST by

Distinguished Engineer – Dr Bill Scales

Today we are lucky to have the first in a series of guest posts from Dr Bill Scales. Bill is one of the lead product architects in the storage team in Hursley. Bill's depth and breadth of knowledge of storage systems is second to none, which means there is always a queue of people waiting to pick his brains in the office! Enjoy.

All Flash Storage – Why RAID Matters – by Dr Bill Scales

The high speed of Flash storage has changed where the performance bottlenecks are in storage systems. In the magnetic world users had to make difficult trade-offs between performance, usable capacity and reliability by choosing whether to use mirroring, single parity or dual parity RAID. With all Flash storage systems the choice is simple – use dual parity RAID-6!

Magnetic Performance
Storage systems built with hard discs are usually limited by the performance of the individual drives – it is easy to build a storage controller with enough compute power to handle the maximum I/O rate of hundreds or even thousands of hard discs. The clever stuff in these systems is how you manage the I/O to each drive: for example, by using distributed RAID to stripe the data and rebuilds evenly across all the drives, and by using a write cache to buffer peaks in I/O so that the workload generated at busy moments can be written to the devices when there is a slight lull.

In these systems the choice of which type of RAID to use has a big impact on performance. RAID-1, or mirroring, provides the best performance while providing redundancy against device failures, but because all your data is stored twice you need twice as much raw capacity as the amount of data you want to store. RAID-5 stripes data across a number of devices and uses a single parity to provide redundancy. The ratio of data to parity can be varied, and with ratios like 10:1 RAID-5 needs far less raw capacity than mirroring while still providing protection. However, while a write update to RAID-1 requires two write I/Os to mirror the data, an update to a RAID-5 array requires two reads and two writes: the old data and old parity are read so that the new parity can be calculated, and then the new data and new parity are written. RAID-6 provides even more redundancy by having dual parity, and hence can protect against two simultaneous device failures, but the penalty is that write updates require three reads and three writes because both parities have to be updated.
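The two reads and two writes of a RAID-5 update come from the read-modify-write sequence on the parity. The following is a minimal sketch, assuming plain XOR parity and using in-memory byte strings in place of real devices. The names `xor_bytes`, `full_parity` and `raid5_small_write` are illustrative only and do not come from any product.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(strips):
    """Parity of a whole stripe: the XOR of all of its data strips."""
    return reduce(xor_bytes, strips)


def raid5_small_write(strips, parity, index, new_data):
    """Update one data strip and the parity of a stripe.

    Hypothetical helper: a list of byte blocks stands in for the devices.
    Returns the new strips, the new parity and the device I/O counts.
    """
    io = {"reads": 0, "writes": 0}

    old_data = strips[index]   # read 1: old data
    io["reads"] += 1
    old_parity = parity        # read 2: old parity
    io["reads"] += 1

    # New parity = old parity XOR old data XOR new data.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)

    new_strips = list(strips)
    new_strips[index] = new_data   # write 1: new data
    io["writes"] += 1
    io["writes"] += 1              # write 2: new parity
    return new_strips, new_parity, io


strips = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = full_parity(strips)
new_strips, new_parity, io = raid5_small_write(strips, parity, 1, b"\xaa\xbb")
print(io)
```

The untouched strips are never read, which is why the cost stays at two reads and two writes however wide the stripe is. RAID-6 follows the same pattern with a second, differently calculated parity, which this sketch does not attempt to model.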

RAID type	Redundancy	Raw capacity available for use	Device I/O to perform an update
RAID-1	Survives single failure	50%	2 writes
RAID-5	Survives single failure	91% (with 10:1 ratio)	2 reads + 2 writes
RAID-6	Survives double failure	83% (with 10:2 ratio)	3 reads + 3 writes
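The capacity percentages in the table follow directly from the ratio of data strips to parity strips. As a small sketch of that arithmetic (the `LAYOUTS` dictionary and `usable_fraction` function are names made up for this example, and RAID-1 is treated as one data copy plus one mirror copy):

```python
def usable_fraction(data_strips: int, redundancy_strips: int) -> float:
    """Share of raw capacity left for user data."""
    return data_strips / (data_strips + redundancy_strips)


# name: (data strips, redundancy strips, reads per update, writes per update)
LAYOUTS = {
    "RAID-1": (1, 1, 0, 2),
    "RAID-5": (10, 1, 2, 2),
    "RAID-6": (10, 2, 3, 3),
}

for name, (data, redundancy, reads, writes) in LAYOUTS.items():
    share = usable_fraction(data, redundancy)
    print(f"{name}: {share:.0%} usable, {reads + writes} device I/Os per update")
```

Changing the ratio, for example to 8:2 for RAID-6, shows how quickly narrower stripes eat into usable capacity.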

Note that only write I/Os require more device I/Os; the cost of a read is the same for all types of RAID. Even so, for a mixed I/O workload, using RAID-6 will reduce performance by about 25% compared to using RAID-5. So with these systems users end up balancing the requirement for high performance with the economics of providing enough storage capacity, and often end up using a mixture of different RAID types, tiering their data by putting the more performance-critical data on the more expensive RAID-1 storage.
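A rough way to see where a figure of about 25% can come from is to count device I/Os per host I/O when the drives are the bottleneck. The sketch below is a simplified model, not a measurement: it assumes a workload of 70% reads and 30% writes (the post does not state the mix behind its figure), ignores caching and full-stripe writes, and uses the per-update costs from the table above. The function names are illustrative.

```python
# Device I/Os needed for one host write, taken from the table above.
WRITE_COST = {"RAID-1": 2, "RAID-5": 4, "RAID-6": 6}


def device_ios_per_host_io(raid: str, read_fraction: float) -> float:
    """Average device I/Os per host I/O; a host read costs one device read."""
    write_fraction = 1.0 - read_fraction
    return read_fraction * 1 + write_fraction * WRITE_COST[raid]


def relative_throughput(raid: str, baseline: str, read_fraction: float) -> float:
    """Host throughput of 'raid' relative to 'baseline' when devices are the limit."""
    return (device_ios_per_host_io(baseline, read_fraction)
            / device_ios_per_host_io(raid, read_fraction))


READ_FRACTION = 0.7  # assumed mix, chosen only for illustration
ratio = relative_throughput("RAID-6", "RAID-5", READ_FRACTION)
print(f"RAID-6 delivers {ratio:.0%} of RAID-5 throughput in this model")
```

Under these assumptions RAID-6 needs 2.5 device I/Os per host I/O against 1.9 for RAID-5, so it delivers roughly three quarters of the throughput. A more write-heavy mix widens the gap, and a more read-heavy mix narrows it.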

Flash Performance
With an all Flash storage system the performance is unlikely to be limited by the Flash devices – today's Flash SSDs can easily sustain 30,000 write I/Os and in excess of 200,000 read I/Os per second. With this kind of performance as few as ten devices running at full speed can challenge the compute power of the storage controller, and hence the performance bottleneck moves: it is no longer the speed of the individual devices but the overall performance of the storage controller.

But why is this important? The answer is that it changes the performance cost of RAID. The quantity of device I/Os is no longer as important in determining overall performance; instead it's all about how much compute resource is required. This significantly narrows the performance difference between different types of RAID; for example, the performance gap between RAID-5 and RAID-6 is reduced to about 8%.

RAID-6 also brings increased redundancy, and it minimises the performance impact of rebuilds, because after a single device failure there is less urgency to restore full redundancy. Together this makes using just RAID-6 for all Flash storage an attractive proposition. A single RAID-6 storage pool eliminates the complexity of selecting which volumes to store on which type of RAID, eliminates performance problems during rebuilds, and protects your data with the redundancy it needs for 24×365 operation while maximising the amount of capacity available for storing data.
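The move of the bottleneck from the devices to the controller can be illustrated with a toy comparison. The sketch below uses the 200,000 read I/Os per second per SSD quoted above. The controller limit of 1,000,000 I/Os per second and the 200 I/Os per second per hard disc are assumptions invented purely for illustration, not figures from this post or from any product.

```python
def bottleneck(n_devices: int, iops_per_device: int, controller_iops: int):
    """Return which component limits throughput, and the resulting limit."""
    device_limit = n_devices * iops_per_device
    if device_limit < controller_iops:
        return "devices", device_limit
    return "controller", controller_iops


CONTROLLER_IOPS = 1_000_000  # hypothetical controller limit
HDD_IOPS = 200               # assumed ballpark for one hard disc
SSD_READ_IOPS = 200_000      # read rate per SSD quoted in the post

print(bottleneck(1000, HDD_IOPS, CONTROLLER_IOPS))     # a thousand hard discs
print(bottleneck(10, SSD_READ_IOPS, CONTROLLER_IOPS))  # ten Flash SSDs
```

With these illustrative numbers a thousand hard discs still leave the controller with headroom, while ten SSDs already exceed what it can drive. That is the situation in which compute cost per I/O, rather than device I/O count, decides how the RAID types compare.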

So in summary, if you’re using all Flash then use RAID-6! In my next post I’ll talk more about advances in Distributed RAID.