First off, I wrote about this seven years ago… but it seems the usual suspects and some newcomers are back on the FUD-spreading bandwagon, trying to say that ALUA must mean active/passive… it doesn’t!
Secondly, let’s get the main fact on the table straight away:
Spectrum Virtualize, and hence SVC, Storwize and FlashSystem, are all active/active controllers, both in terms of pathing and I/O processing.
IBM has never implemented an active/passive solution. Most vendors have moved away from them too (EMC, for example), except Pure, who implement active/active paths but active/passive I/O handling, and try to claim it as a benefit…
Really, that’s all that needs to be said, but this comes up again and again, particularly if you have been drinking too much of that orange Kool-Aid.
But ALUA…
So why the confusion and/or FUD? Well, it all comes down to the part of the SCSI spec that implements a path discovery mechanism known as Asymmetric Logical Unit Access (ALUA). If you have access to the T10 SCSI documents you can go and read all about it, but for those with neither the will nor the desire to go that far, ALUA was originally intended to create target port group assignments.
The basic idea is to provide a way to advertise that some access routes to a given LUN, or group of LUNs, may not be as efficient as others.
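To make that concrete, here is a minimal Python sketch of the asymmetric access states an ALUA target port group can report (state values per SPC-3; the helper function is purely illustrative). The point to note is that active/non-optimised is still an active state:

```python
from enum import IntEnum

# Asymmetric access states a target port group can report (per SPC-3).
# ACTIVE_NON_OPTIMIZED is still an *active* state: the port accepts and
# services normal I/O, it is just hinted as potentially less efficient.
class AluaState(IntEnum):
    ACTIVE_OPTIMIZED = 0x0      # full, most efficient access
    ACTIVE_NON_OPTIMIZED = 0x1  # full access, possibly less efficient
    STANDBY = 0x2               # limited command set, no normal I/O
    UNAVAILABLE = 0x3           # no access via this port group
    TRANSITIONING = 0xF         # state change in progress

def accepts_normal_io(state: AluaState) -> bool:
    """Only the two ACTIVE states service regular reads and writes."""
    return state in (AluaState.ACTIVE_OPTIMIZED, AluaState.ACTIVE_NON_OPTIMIZED)

# An active/passive box parks one controller's ports in STANDBY; an
# active/active box reports both port groups as ACTIVE.
assert accepts_normal_io(AluaState.ACTIVE_NON_OPTIMIZED)
assert not accepts_normal_io(AluaState.STANDBY)
```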
Some early implementations of SCSI-based storage controllers (going back 20+ years) took a lazy approach and built 2-way controller systems, but with one controller being a passive device. In such a system (DG anyone?) all data actually flowed through just one controller, and only if that failed would the painful act of failover occur, bringing the passive controller online and moving all I/O flow to that now-active device. But at any point in time, only one half of the system was actively processing I/O. It makes for simple coding and much simpler error handling, and generally can be seen as a poor man’s approach to implementing a 2-way system. (I find it amusing that a certain orange vendor attempts to turn this into a positive in their marketing – again, avoid the sugar rush from the Kool-Aid.)
So ALUA has been tainted with the active/passive mantra, when in reality ALUA was just how they butchered the SCSI spec to implement such an active/passive system.
ALUA – optimised/non-optimised
The optimised and non-optimised flags on a path can, however, be used to provide an end user performance benefit.
Spectrum Virtualize based systems use the optimised/non-optimised flags (by default) to create what we call preferred and non-preferred paths. Each LUN is aligned with a node, and paths to that LUN via that node will be marked as preferred; paths via the other node as non-preferred. Straight off, we have a nice bit of load balancing. But remember, don’t confuse this with active/passive.
Both preferred and non-preferred paths are active.
That is, you can send I/O to preferred and non-preferred paths and the system doesn’t care. It will handle them, and there is zero performance impact for sending an I/O via a non-preferred path.
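As a rough sketch of the controller side of this (hypothetical names, not the actual Spectrum Virtualize code): each LUN has an owning node, every path is advertised as active, and only the flavour differs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    lun: int
    node: str   # node presenting this path
    state: str  # ALUA state advertised to the host

def advertise_paths(lun: int, preferred_node: str, nodes: list[str]) -> list[Path]:
    """Advertise every path as active; only the optimised flag differs."""
    return [
        Path(lun, n,
             "active/optimized" if n == preferred_node
             else "active/non-optimized")
        for n in nodes
    ]

# LUN 7 is aligned with node2: the host sees one optimised and one
# non-optimised path, and may issue I/O down either of them.
for p in advertise_paths(7, "node2", ["node1", "node2"]):
    print(p)
```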
So why use this…
The nitty gritty details…
If you’ve made it this far, then we are now into the weeds, as they say.
Read I/O
When a read request is received by a controller, the first thing it will do is look to see if it has that data in cache. If yes, great: we can return it without doing any backend storage (disk/flash) I/O, and all is good.
How does data get into the cache? Well, if we read data from the backend, we may as well hang onto it for a while (if there is space in the cache) – the application may read it again soon.
If you’ve written data, and we’ve written it to the backend, we can turn that into potential read data now (since a read of what has just been written == what has just been written!).
Finally, a cache will often look out for sequential workloads and, when detected, will start to guess what you are going to ask for next (read-ahead or pre-fetch).
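As a toy illustration of that last point, here is a minimal sequential-detection and read-ahead sketch in Python, with made-up thresholds, nothing like the real cache algorithm:

```python
class SequentialDetector:
    """Detect an ascending run of block addresses and suggest a read-ahead."""

    def __init__(self, run_threshold: int = 3, readahead_blocks: int = 64):
        self.last_lba = None
        self.run = 0
        self.run_threshold = run_threshold
        self.readahead_blocks = readahead_blocks

    def observe(self, lba: int, nblocks: int) -> range | None:
        """Feed each read; returns a block range to pre-fetch, or None."""
        if self.last_lba is not None and lba == self.last_lba:
            self.run += 1   # this read starts exactly where the last ended
        else:
            self.run = 0    # gap or jump: not sequential
        self.last_lba = lba + nblocks
        if self.run >= self.run_threshold:
            return range(self.last_lba, self.last_lba + self.readahead_blocks)
        return None

det = SequentialDetector()
for lba in range(0, 64, 8):   # a stream of 8-block sequential reads
    hint = det.observe(lba, 8)
print(hint)                   # range(64, 128): start reading ahead
```

If the stream alternates between nodes, each node’s detector sees only every other read, the addresses never line up, and the read-ahead never triggers.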
Read data is not usually mirrored; it stays only in the cache on the node that processed the I/O.
If you always send I/O for a given LUN to the same node, you increase the probability that a re-read will hit data already in cache.
For example, if you read the same bit of data twice, but send the reads to two different nodes, then each one will have to read from the backend, because the read cache isn’t mirrored. The same goes for pre-fetch algorithms: if the sequential stream stays on one node, we can easily detect it and start reading ahead. If you jump around between nodes, we don’t actually see sequential I/O, more like a jumping square wave pattern…
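Here is that two-reads example as a runnable toy (two nodes, each with its own unmirrored read cache; all names invented for illustration):

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.read_cache: dict[int, bytes] = {}  # per-node, NOT mirrored

    def read(self, lba: int) -> str:
        if lba in self.read_cache:
            return f"{self.name}: cache hit"
        self.read_cache[lba] = b"..."           # fetch from backend, keep it
        return f"{self.name}: backend read"

node1, node2 = Node("node1"), Node("node2")

# Same block read twice via the preferred node: the second read is a hit.
print(node1.read(100))   # node1: backend read
print(node1.read(100))   # node1: cache hit

# Same block read twice but bounced between nodes: two backend reads,
# because read cache lives only on the node that serviced the I/O.
print(node1.read(200))   # node1: backend read
print(node2.read(200))   # node2: backend read
```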
So for read workloads, by honouring the preferred path, we can improve potential read performance (reduce latency by getting more cache hits).
Write I/O
For write I/O, it’s simple. There is no actual performance difference no matter where you write to. Since a write has to be mirrored to both nodes in a caching controller, writing to the preferred or non-preferred node has the same end result – both nodes get a copy before the write is acknowledged back to the host.
However, think about what the controller has to do with the write. As far as the application is concerned it’s done, but all that has actually happened is we’ve made two copies of the data, one in each node’s cache memory. At some point later the controller has to write that out to the disk/flash media itself to permanently store it (i.e. destage the write). The last thing you want to do is let both nodes do that: not only is it wasteful, but you also run the risk of creating inconsistent data at the backend. So one node is nominated to do the destage. For this we use whichever node is the preferred node, i.e. the one that advertised the preferred paths. It’s responsible for the destage, and again this has the nice effect of load balancing the backend destage work.
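A grossly simplified sketch of that write path (hypothetical structure and names, not the real cache code):

```python
class CachingPair:
    """Two-node write cache: mirror on write, destage from one owner."""

    def __init__(self):
        self.cache = {"node1": {}, "node2": {}}
        self.backend: dict[int, bytes] = {}

    def write(self, lba: int, data: bytes) -> str:
        # Copy into BOTH nodes' caches before acknowledging the host,
        # so either node can recover the write if its partner fails.
        self.cache["node1"][lba] = data
        self.cache["node2"][lba] = data
        return "ack"  # the host sees the write as complete here

    def destage(self, lba: int, preferred_node: str) -> None:
        # Only the preferred (owning) node writes to the backend media;
        # letting both nodes destage would be wasteful and would risk
        # inconsistent data at the backend.
        self.backend[lba] = self.cache[preferred_node].pop(lba)
        # The partner can now drop its copy too.
        other = "node2" if preferred_node == "node1" else "node1"
        self.cache[other].pop(lba, None)

pair = CachingPair()
pair.write(42, b"hello")     # mirrored to both nodes, then acknowledged
pair.destage(42, "node1")    # later: one node writes it to disk/flash
```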
So there are the details; that’s why we have a preferred and non-preferred, optimised and non-optimised, ALUA-based implementation. But remember, don’t confuse path optimisation with path activation – ALL paths are active no matter their optimisation state.
Disabling optimised/non-optimised
Hopefully after reading this you will see that there are real end user benefits to having the optimised and non-optimised pathing rules. But if you really have been brainwashed and want to reduce your potential cache hit rates, you can modify your system to ignore the ALUA path settings.
In most multipath configuration tools (scripts, config files etc.) you can tell them to use a different pathing model. By default they will recognise Spectrum Virtualize LUNs and set up the pathing as described here – but if you set them to “fixed” or “round-robin” and remove the “weight” or “priority” settings, then the system will use a different pathing model and you can send I/O to all active paths, preferred or not.
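To make the contrast concrete, here is a host-side sketch (invented policy logic, loosely mirroring the config options above) of honouring ALUA priorities versus round-robining across every active path:

```python
from itertools import cycle

paths = [
    {"name": "path-A", "state": "active/optimized"},      # via preferred node
    {"name": "path-B", "state": "active/non-optimized"},  # via partner node
]

def select_alua(paths):
    """Default-style policy: prefer optimised paths, fall back to the rest."""
    optimised = [p for p in paths if p["state"] == "active/optimized"]
    return optimised[0] if optimised else paths[0]

# Round-robin with the priorities stripped: every active path in turn,
# preferred or not. Perfectly legal, but it hurts your re-read cache hits.
rr = cycle(paths)

print(select_alua(paths)["name"])          # path-A every time
print(next(rr)["name"], next(rr)["name"])  # path-A, path-B alternating
```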
But as I say, really, don’t get sucked in by the FUD spreading. Ask your vendor directly, and if they say that ALUA is bad, ask them why. If they start down the active/passive argument, take them to task and ask them about active/active with ALUA… surely that *could* be a benefit if you implemented it like we have.