• Part 3c (this post) covers DR-like configurations.

Part3c – Comparing Availability – Active/Standby – DR Like

Normal Running

The main difference between Active/Passive and DR-like configurations is that, in a DR-like configuration, the application generally runs at one location only: the AMS location. Any access from the non-AMS location requires an application failover task, although the storage itself is generally available at both locations. The active location should be aligned with the AMS, and the cross-site public-ISL links are not configured in any form.

From a host/application perspective, this configuration looks more like a standard synchronous mirroring setup.

The hosts at Location2 are in a pseudo-offline mode: by default they can see the storage at the non-AMS location, but no active workloads should be running on those hosts (unless measures have been taken as outlined in the notes above).

Pathing, in terms of optimised/non-optimised (preferred/non-preferred), follows the usual AMS preferred rules. All paths to the storage at the AMS from hosts marked in the AMS location will be preferred, and no non-preferred paths will be visible, as there is no public-ISL cross-site pathing.

The IP quorum device must still be deployed, and would keep the AMS active in the event of a link loss (split-brain) where both locations are still available.

Storage Failure

A storage failure could affect the system as a whole (a power issue, for example) or involve multiple drive failures that take a pool offline. In either case, access to the storage at the failed location is lost. If this is the non-AMS location, you have lost your DR copy. If this is the AMS location, we are now in the situation where we must invoke a DR “application-failover”.

Once the application has been started at Location2, the AMS should be switched to make Location2 the new AMS. This ensures the application location matches the AMS and the system continues to operate from the “DR” location once the storage system has been fixed.
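The two storage-failure outcomes described above can be summarised as a small decision function. This is only an illustrative sketch: `Site`, `other_site` and `storage_failure_actions` are hypothetical names invented for this example, not part of any product CLI or API, and the returned strings are reminders for an operator rather than real commands.

```python
from enum import Enum
from typing import List


class Site(Enum):
    LOCATION1 = "Location1"
    LOCATION2 = "Location2"


def other_site(site: Site) -> Site:
    """Return the partner location in a two-site pair."""
    return Site.LOCATION2 if site is Site.LOCATION1 else Site.LOCATION1


def storage_failure_actions(failed: Site, ams: Site) -> List[str]:
    """List the operator actions implied by a storage failure (illustrative only)."""
    if failed is not ams:
        # The non-AMS storage failed: the DR copy is lost, but the
        # application is not running there, so no failover is needed.
        return [f"DR copy at {failed.value} lost; no application failover required"]
    survivor = other_site(failed)
    # The AMS storage failed: fail the application over first,
    # then align the AMS with the new application location.
    return [
        f"Invoke DR application failover to {survivor.value}",
        f"Switch the AMS to {survivor.value}",
    ]
```

For example, `storage_failure_actions(Site.LOCATION1, Site.LOCATION1)` returns the failover step followed by the AMS switch, in that order.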

When the storage issue is resolved, the system will automatically begin re-synchronisation of the data from the current active location (DR) back to the old production site (Location1).

Once the production site (Location1) has been recovered, and the Storage Partition has returned to an “in-sync” state, the failback to the production site requires the following steps:

  • Stop the application at Location2 (DR)
  • Fail back the application and bring it up at Location1 (Production)
  • Switch the AMS back to Location1 (Production)
  • Resume the application I/O at Location1
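The ordering of the failback steps matters, so it can help to write the runbook down as a tiny state model that refuses to start until the partition is back in sync. The sketch below is a minimal illustration under that assumption. `PartitionState` and `fail_back` are hypothetical names made up for this example, and nothing here talks to a real storage system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PartitionState:
    ams: str                  # location currently acting as the AMS
    app_site: Optional[str]   # where the application is started (None = stopped)
    io_active: bool           # whether application I/O is flowing
    in_sync: bool             # whether the Storage Partition is "in-sync"


def fail_back(state: PartitionState, production: str = "Location1") -> List[str]:
    """Apply the four failback steps in order and return a log of what was done."""
    if not state.in_sync:
        raise RuntimeError("partition is not in-sync; wait for re-synchronisation")
    log = []
    # Step 1: stop the application at the DR location.
    state.io_active = False
    state.app_site = None
    log.append("application stopped at DR")
    # Step 2: fail back the application and bring it up at production.
    state.app_site = production
    log.append(f"application brought up at {production}")
    # Step 3: switch the AMS so it matches the application location again.
    state.ams = production
    log.append(f"AMS switched to {production}")
    # Step 4: only now resume application I/O.
    state.io_active = True
    log.append(f"application I/O resumed at {production}")
    return log
```

Holding I/O until after the AMS switch keeps the application location and the AMS aligned at the point where I/O restarts.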

ISL Failure / Split brain

If the two storage systems lose contact with each other, the IP Quorum is used to validate that both systems are still online. The AMS will be instructed to continue, and the other system will take the paths to the volumes in the partition offline. Quorum is assessed on a per HA-partition-pair basis, always following which location is the AMS for a given pair. (If one of the storage systems has actually failed, the remaining system will continue as described in the previous section.)

This is why it is imperative to ensure the AMS is allocated to the same location as the applications that use volumes in this partition. If you get this wrong, the AMS (assuming it is available) will still win the quorum tie-break, and you could lose access because the application is actually running at the non-AMS location.

In normal running, there should be no impact to the applications, as the AMS location will win, and the DR location will be taken offline.
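The tie-break rule and its consequence for a misaligned application can be expressed in a few lines. This is a simplified model of the behaviour described above, assuming both storage systems are still online and only the inter-site link is lost. `split_brain_outcome` is a hypothetical helper invented for this example.

```python
from typing import Dict, Union


def split_brain_outcome(ams: str, app_site: str) -> Dict[str, Union[str, bool]]:
    """Model an ISL loss where both systems are up: the AMS wins the tie-break."""
    return {
        # The AMS is told to continue; the partner takes its paths offline.
        "online_site": ams,
        # The application only keeps access if it runs where the AMS is.
        "application_keeps_access": app_site == ams,
    }


# Correctly aligned: application and AMS both at Location1.
aligned = split_brain_outcome(ams="Location1", app_site="Location1")

# Misaligned: application running at the non-AMS location loses access.
misaligned = split_brain_outcome(ams="Location1", app_site="Location2")
```

Because quorum is assessed per HA-partition pair, a check like this would need to be applied to each pair separately.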

Part 3 Summary

In Part 3c we have looked at how the system handles DR-like configurations, with a few considerations if you plan to use PBHA to provide a form of single-direction, synchronous mirroring.

Across this whole series, we wanted to make sure people are armed with the possibilities that PBHA affords, as well as things to consider as you plan and build your next-gen High Availability solutions. Hopefully this information has proved useful, and we will link this in the “How it works” section for future ease of search!
