DRAID Expansion. The Important Details

When I introduced the updates in Spectrum Virtualize 8.3.1, I promised a dedicated post discussing the details of the new DRAID dynamic expansion feature. Before we dive in, there are some basic DRAID terms that you need to understand.

DRAID Basics

Check out the DRAID – How it Works post for more details, but in summary a DRAID array is made up of a given number of component drives (often called a pack or row), which is usually larger than the RAID geometry. The geometry is the normal n+P+Q layout. Each drive contributes a strip, and a set of strips makes up a stripe. The stripe width is the n+P+Q count, which will always be lower than the component count. Additionally, between 1 and 4 spare areas can be configured per DRAID array.

In this example slide the component count is 24, the geometry is 7+P, and there is 1 spare area. Please note that the actual algorithmic layout is not represented; this is just a simple illustration of the terms used. In reality, the 7+P stripe would appear to be dispersed randomly across 8 of the 24 drives for each stripe!

So in summary, the terms we will use are:

component count – the number of drives in the DRAID array
strip – a chunk (256KB) of capacity on a single drive
stripe – a set of strips that make up a single instance of the chosen geometry
stripe width – the count of strips that make up a stripe
geometry – the chosen protection and stripe width, i.e. 8+P+Q
spare areas – the number of distributed spare strips that are available for each stripe
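To make these terms concrete, here is a minimal Python sketch of the back-of-envelope capacity maths for the example above. The formula is my own approximation for illustration only; the real product calculation also accounts for metadata, rounding and the exact spare area placement.

```python
def draid_usable_capacity(component_count, data_strips, parity_strips,
                          spare_areas, drive_capacity_tb):
    """Rough usable capacity of a DRAID array (illustrative only).

    Assumption: usable space is the raw space minus the spare areas,
    scaled by the data fraction of the geometry.
    """
    stripe_width = data_strips + parity_strips      # e.g. 7+P -> width 8
    raw_tb = (component_count - spare_areas) * drive_capacity_tb
    data_fraction = data_strips / stripe_width      # 7/8 for 7+P
    return raw_tb * data_fraction

# The example slide: 24 components, 7+P geometry, 1 spare area
print(draid_usable_capacity(24, 7, 1, 1, drive_capacity_tb=7.68))
```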

What is DRAID Expansion?

DRAID Expansion provides the ability for you to dynamically add one or more drives and increase the component count of an existing, in-use DRAID array. The expansion process is dynamic and non-disruptive, i.e. you can do this and continue to access the data on the DRAID array.

New drives are integrated and data is re-striped to maintain the algorithmic placement of strips across the existing and new components.

Each stripe is handled in turn; that is, the data in the existing stripe is re-distributed to ensure the DRAID protection across the new, larger set of component drives.
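As a toy illustration of that stripe-by-stripe re-distribution, the Python sketch below re-places each stripe's strips across the larger component set. Note that random.sample is only a stand-in: the real DRAID placement is deterministic and algorithmic, not random.

```python
import random

def restripe_layouts(stripe_count, stripe_width, old_count, new_count):
    """Toy model of re-striping during expansion (not the real algorithm).

    Each stripe is handled in turn: its strips are re-placed across the
    expanded set of component drives.
    """
    rng = random.Random(0)
    for stripe in range(stripe_count):
        before = sorted(rng.sample(range(old_count), stripe_width))
        after = sorted(rng.sample(range(new_count), stripe_width))
        print(f"stripe {stripe}: drives {before} -> {after}")

# 7+P (width 8) stripes, expanding from 24 to 26 components
restripe_layouts(stripe_count=3, stripe_width=8, old_count=24, new_count=26)
```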

All pools and volumes remain online and available during the expansion.

What can be added?

Between 1 and 12 drives can be added in a single operation or task. That is, you can expand in batches of up to 12 drives. If you need to add more than 12 drives to a DRAID array, this would involve multiple serialised tasks.

Only one expansion task can be in progress on a given DRAID array at any point in time. In addition, only one expansion is permitted per storage pool at any point in time. Up to four expansions can run in parallel on a single system, assuming you have four or more pools, i.e. one per pool.

New drives can be used to increase the number of spare areas. Stick to a rough rule of 1 spare per 24 drives.
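That rule of thumb is easy to encode; here is a tiny helper (my own, not a product formula), remembering that a DRAID array supports between 1 and 4 spare areas:

```python
import math

def recommended_spare_areas(component_count):
    """Rule of thumb from the post: roughly 1 spare area per 24 drives,
    clamped to the DRAID range of 1 to 4 spare areas."""
    return min(4, max(1, math.ceil(component_count / 24)))

for drives in (8, 24, 48, 96, 120):
    print(drives, "drives ->", recommended_spare_areas(drives), "spare areas")
```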

What should I know?

It kind of goes without saying, but I will say it anyway – only add drives that match the capability of those already in the DRAID array, both in terms of performance and capacity.

Provision has been made to override this in extreme exception cases, but always try to add the same class or tier of drive. You can override and add a superior drive class, but you won’t see the benefit and are wasting performance. The same goes for capacity: larger capacity drives can be added, but the usable capacity will match the smaller existing drives and the additional capacity will be unusable.

Only the component count or the number of spare areas can be increased. You cannot change the geometry: a 10+P+Q will remain a 10+P+Q. (I recommend going for the largest stripe width you can – so 14+P+Q ideally on day 1.)
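The reason for going wide on day 1 is parity overhead: with two parity strips per stripe, a 4+P+Q geometry loses 2/6 of its capacity to parity, while 14+P+Q loses only 2/16. A minimal sketch of that arithmetic:

```python
def parity_overhead(data_strips, parity_strips=2):
    """Fraction of each stripe consumed by parity for an n+P+Q geometry."""
    return parity_strips / (data_strips + parity_strips)

for n in (4, 8, 10, 14):
    print(f"{n}+P+Q loses {parity_overhead(n):.1%} of capacity to parity")
```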

What is the impact during the expansion? And how long will it take?

The idea is to minimise the impact to the end user application. The expansion is therefore a background task that will take many hours, days, or even weeks to complete. The exact time will be dictated by how busy the DRAID array is during the expansion, and by the performance and capacity of the component drives. Think days… the GUI and CLI will show an ‘estimated completion date/time’.

As the expansion will take some time to complete, the new free space is drip-fed into the mdisk/pool. As each complete row is expanded to make use of the new component count, some more capacity becomes available to be used. Therefore, during the expansion you will see the pool capacity gradually increase until the expansion is completed, i.e. you don’t have to wait until the entire expansion completes before using some of the new free space.
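A rough model of that drip-feed, purely as an illustration (my assumption here is that new capacity becomes usable in proportion to the rows already re-striped; the real rate depends on system load):

```python
def pool_capacity_during_expansion(rows_completed, total_rows,
                                   old_capacity_tb, added_capacity_tb):
    """Approximate usable pool capacity part-way through an expansion.

    Assumption: new capacity is released in proportion to the number
    of rows that have been fully re-striped so far.
    """
    fraction_done = rows_completed / total_rows
    return old_capacity_tb + added_capacity_tb * fraction_done

# e.g. a quarter of the way through adding 20TB to a 100TB pool
print(pool_capacity_during_expansion(250, 1000, 100.0, 20.0))  # -> 105.0
```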

What happens if a drive fails during expansion?

The rebuild takes priority. Wherever the expansion has got to, it pauses, the distributed rebuild of the missing strips onto one of the spare areas begins, and the rebuild runs until it completes.

Only when the rebuild is finished and the DRAID array is online (not degraded) will the expansion process resume.

Note that the expansion does take priority over the build-back. Build-back is the final stage that a DRAID array requires when you replace the failed drive, where it copies the data back from the distributed spare areas onto the replacement drive to re-establish the original algorithmic layout.

What cannot be done?

As mentioned before, expansion does not allow you to change the stripe width or geometry. If you want to migrate from one geometry to another, you would need to use migration functions to vacate the existing DRAID array of volume data, then destroy and rebuild it. It’s worth noting that if you have the spare capacity to do this, and you are expanding a DRAID array with a large number of drives, this may actually still be quicker.

Finally, no, you cannot shrink the component count – the function is called expansion after all 😉

One at a time, or all at once? (Added 28/5/2021)

The process of expanding the RAID array requires all existing data in the array to be moved around so that it is striped across the new set of drives. Therefore the amount of work required to complete the expansion is basically the same whether you add one drive or 12 drives.

So if you have multiple drives to add to an array – add them in one CLI command (or in batches of 12). It will take much less time to do it this way than to do it one at a time.
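If you are scripting this, the batching is simple: split the new drives into serialised tasks of at most 12, and run each task only after the previous one completes. A hypothetical helper (the drive IDs and the print stand in for the real CLI calls):

```python
def expansion_batches(new_drive_ids, max_per_task=12):
    """Split new drives into serialised expansion tasks of at most
    12 drives each (the per-task limit described above)."""
    return [new_drive_ids[i:i + max_per_task]
            for i in range(0, len(new_drive_ids), max_per_task)]

new_drives = list(range(100, 130))        # e.g. 30 new drive IDs
for batch in expansion_batches(new_drives):
    # run one expansion task per batch, waiting for each to complete
    print(f"expand the array with drives {batch}")
```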


What else?

I think I’ve answered most of the frequently asked questions I’ve had so far about DRAID expansion, but if I’ve missed something, or you have a question that I didn’t answer here – ask in the comments and I will answer there and update the post.

13 responses to “DRAID Expansion. The Important Details”

  1. Hi Sir, thank you for this, very informative.

    I just want to clarify the following:

    - Is the recommended number of drives to expand an array within an existing pool 1-12 drives only?
    - Let’s say I have an existing DRAID 6 pool with 9 drives and I want to add 15 more drives (to max out 24 drive slots). Should I just put them in a new array (same pool), also using DRAID 6, or expand the existing array?

    Thank you! 🙂


    1. Hi, so you can do either. Expand the array by adding 12 drives, then when that completes, add the other 3. Probably best to do this because you gain the performance of the new drives in the old array and its volumes, and will get better rebuild performance.


  2. Rasmus Teglgaard

    Hi, so you say one can expand by 1 to 12 disks. So if we have a SAN with 16 disks using an array size of 8, can we expand by one disk only? And if so, how does that work exactly?
    /Rasmus


    1. Yes, you can add only 1 disk if you want. The array now has 17 disks, but each stripe width is still 8. The actual 8 disk strips that make up a stripe are distributed by an algorithm across 8 of the 17. Each stripe uses a different set of 8 disks in such a way that we can still suffer any 2 drives failing and still reconstruct the missing strips. It’s really the distributed spare area algorithm that’s the masterpiece, the secret sauce so to speak.


  3. Hello, thanks for the information. I have a question: are there any considerations for growing an encrypted DRAID array?


    1. No extra concerns. The encryption is on the DRAID array, so adding more drives just keeps the same encryption.


      1. Thank you very much for your answer!


  4. Hey. Help me understand the stripe width parameter. When creating an array inside a pool of 18 disks (everything that is in the array), the system offers to select a stripe width of 12, but it also allows you to expand it to 16, and the maximum pool size grows. What does this depend on, and how does increasing the stripe width increase the pool capacity? Thank you!


    1. Hi, so the stripe width dictates how many strips make up the stripe. If you have a small stripe width, say 4+P+Q, then for every 4 data strips you have 2 parity strips, so 2/6 or 1/3 of the total capacity is lost to parity. If you have 14+P+Q then 2/16 or 1/8 of the total capacity is lost to parity, etc. So the larger the stripe width, the less proportional overhead is taken away by parity.

      Of course, in real life the spare areas reduce this further.

      To maximise capacity, use a larger stripe width, but be aware it could take longer to rebuild a drive, as each parity strip requires more data strips to be read to rebuild it.


      1. Thank you so much for your answer! I always thought the stripe width was equal to the number of disks in the array (not including parity); this is what is written in the tutorials. Are there any possible problems if I increase the stripe width from 12 to 16 – what else can this affect? And also, please tell me where I can read about the rebuild time, which depends on the stripe width. I would like to understand how DRAID works in more detail. Thanks again!


  5. Vineet Sharma

    Hi Barry,
    A nice and detailed one, as always.

    Two queries:
    Does the drive part number have to be the same during expansion (from the same vendor only)?
    And
    Does different firmware on drives prevent expansion?
    It’s for an FS7200 with 8.3.1.1, but the FCM drives’ firmware versions are 2.0.9 and 2.1.3.


  6. You noted that the expansion is a background activity. I would like to give the expansion more priority. In my case I am adding 6 drives to an existing 66 x 6TB drive DRAID. It says it is going to take another 378 hours, and it has already been running for over 2 weeks. 🙂 This array is used to store backups, so speed is not the main function, but space is getting a bit tight. I am hoping there is a setting similar to the Mirror sync rate on a volume.


    1. Hi Bill, how long did the expansion take? The article says the pool space gradually increases – is this true? Thank you

