Why doesn’t SVC support/need backend NVMe-FC connections?

This question has come up a few times in recent conversations, and was asked at the recent German virtual Storage Strategy Days event. Evelyn Perez covered this in the Q&A session, but I thought it was worthy of a post to help explain.

The question comes in various guises, either “When will you support”, “Does it support” or “Why doesn’t it support”. Hopefully this will answer them all.

Polling vs Interrupt

We need to do a little bit of a technical level set first around how a hardware device driver works.

There are two main types of device driver. The most common and more typical is an interrupt-driven driver. When a piece of hardware wants to do something, in our case copy some data from the network into the storage system, it has to be given access to copy that data into memory. The normal way to do this is to raise an interrupt line/signal to tell the kernel that some piece of hardware wants to interrupt whatever else is going on and take control, so the data can be copied from the hardware buffers into system memory.

Typically both this memory copy and the interrupt itself have to be processed by the kernel. When the kernel runs, all other functional code running in non-kernel space (user mode) halts: the CPU context switches all its execution lines, branch prediction state is discarded, caches are flushed, and so on, and only then is the kernel given control. The kernel can then handle the requests from the hardware. This wastes time and CPU resources, as the context switch is costly in itself, both in CPU cycles and in essentially no-op processing. (NB: branch prediction accounts for a HUGE amount of processor speed/power these days.) Then you need to decide just how long the kernel side is allowed to run before it is kicked out to let the user-mode processes run again… less than ideal for a real-time system.

For these and other reasons, when we designed the original SVC device drivers we chose the alternative, a polling loop.

Here, instead of the hardware telling the software it has work to do via an interrupt, the software regularly polls the hardware, asking if there is anything to do. We run this in a loop that constantly does useful work, polls to find more, and repeats. By doing this, and of course by writing our own memory manager, we can keep the device driver function running in user space and don't need to context switch to the kernel to process the memory copy / hardware buffers.

So clearly a polling loop is much more efficient: there are no wasted kernel timeslices and no costly CPU context switches just to process an interrupt, because essentially there are no interrupts.

NVMe-FC Drivers

The biggest advantage that NVMe device drivers provide is that they also use a polling loop. Traditional SCSI-based Fibre Channel drivers are interrupt driven (SCSI-MQ is the exception, but let's leave that for another day).

So an NVMe initiator (host) device driver can remove all the same interrupt overheads from processing at the server side. This is why you can see potentially big CPU savings on the server (initiator) when moving to an NVMe-FC driver.

Likewise, if you are a storage system that used an interrupt-based SCSI driver (target), you will see a big difference by moving to a polling-based driver. You can tell who had pretty shoddy device drivers by looking at those who claim big latency savings on the storage side when moving to NVMe-FC… no names…

SVC is an initiator and target

Since SVC – and hence all FlashSystem products too – uses a polling loop, we didn't see any gains in front-end latency at the storage. Yes, the server itself will save CPU and resources, but SVC was already interrupt-less and so gains no savings internally from NVMe-FC to the host.

You may then think: OK, so SVC is a target device (server -> SVC) and an initiator device (SVC -> storage controller), so why don't you support NVMe-FC on the backend initiator connections (the original question)?

Well, in this case SVC IS the host (initiator), and since the device driver is already a polling-loop driver, we don't see any CPU savings here. Therefore there is no advantage to providing NVMe-FC on the SVC side.

Now you could argue that if you have one of those shoddy storage controllers, there would be savings there if we could talk to it via NVMe-FC – and I guess that is true. But most storage systems have worked around the context switch / interrupt issues by various means, and so there aren't many systems that would benefit.

And of course, because FlashSystem, and Storwize before it, all use the same polling-loop drivers, if you have a FlashSystem or Storwize controller behind SVC the whole thing becomes a non-issue.

3 responses to “Why doesn’t SVC support/need backend NVMe-FC connections?”

  1. Hello Barry

    I heard about NVMe 2.0. Is NVMe 2.0 support software based or hardware based?
    For example, we sell FlashSystem now. As you know, FlashSystem currently works with NVMe 1.3/1.4. After NVMe 2.0 is supported by IBM, will current NVMe devices also support NVMe 2.0?


    1. Assuming there are no physical changes, then yes, a software update at some point would pick up the new functions, just as with SCSI-2 to SCSI-3 etc.


  2. Hello Barry,

    When will Hyperswap and NVMe be used together? I read the 8.4 release notes but couldn’t find anything about it. Thank you.

