Windows Server 2012 – Failover Clustering Explained

February 26, 2014

What is Failover Clustering?

  • High Availability (HA) is a service level for a given app or service (aka workload).  This is usually based on SLAs with customers.
  • Failover clustering (FC) is an HA method that protects a service or app from downtime by failing it over to another node when its current node fails, and failing it back once that node recovers.  You can set this up as an Active/Passive configuration or an Active/Active configuration.
  • Microsoft added a large number of features to failover clustering with Server 2012.
  • If you are clustering SQL, VMs, web sites, etc., it is ideal to have your storage shared.  Each node can use a secondary NIC or Fibre Channel connection to reach the shared storage.
  • It is also recommended to have a separate network for heartbeats between servers.

Failover Clustering improvements in Server 2012:

  • More scalability: each cluster can have up to 64 physical nodes and 8,000 VMs.  This is up from 16 nodes and 1,000 VMs in Server 2008 R2.
  • CSVs (Cluster Shared Volumes) now support SMB 3.0 and SoFS (Scale-Out File Server) in addition to Hyper-V.
  • Cluster-Aware Updating (CAU).
  • Faster and more comprehensive cluster validation tests.
  • Cluster.exe is deprecated and no longer installed by default; cluster management is now done with PowerShell v3.
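
Since the Server 2012 tooling is PowerShell-first, a quick way to see what took over from Cluster.exe is to list the cmdlets in the FailoverClusters module.  This is a minimal sketch, assuming the Failover Clustering management tools are installed on the machine you run it from; "CLUSTER01" is a hypothetical cluster name.

```powershell
# Load the failover clustering module (installed with the Failover Clustering tools).
Import-Module FailoverClusters

# List every cmdlet the module exposes, sorted by verb and noun.
Get-Command -Module FailoverClusters | Sort-Object Verb, Noun | Format-Table Verb, Noun -AutoSize

# Show basic node state for a cluster ("CLUSTER01" is an assumed name).
Get-ClusterNode -Cluster "CLUSTER01" | Format-Table Name, State -AutoSize
```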

Cluster Shared Volumes (CSVs):

  • Historically, a cluster node could not see shared storage unless it became the active node for a resource.  Thus, failing over a VM resource meant failing over the entire LUN to the other host.  CSVs enable multiple cluster nodes to access a single LUN concurrently.  There is no longer any need to unmount LUNs for failover to occur, which allows faster failover since both nodes are already connected to the same LUN at the same time.
  • Local Mount Points are used – C:\ClusterStorage\Volume1\ is how a LUN volume would show up on a host.
  • We can now fail over individual VMs independently of others.
  • Thanks to SMB 3.0, Active/Active file share(s) can be used.
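
As a rough illustration of how a clustered disk becomes a CSV, the sketch below promotes a disk from Available Storage.  It assumes it runs on a cluster node, and "Cluster Disk 1" is an assumed resource name; use whatever name your cluster reports.

```powershell
Import-Module FailoverClusters

# Disks that belong to the cluster but are not yet used by a role sit in "Available Storage".
Get-ClusterGroup -Name "Available Storage" | Get-ClusterResource

# Promote one of those disks to a Cluster Shared Volume ("Cluster Disk 1" is an assumed name).
Add-ClusterSharedVolume -Name "Cluster Disk 1"

# List the CSVs; each one appears under C:\ClusterStorage\ on every node.
Get-ClusterSharedVolume
```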

Quorum and how it works:

  • Quorum is the number of voting elements that must be online to keep the cluster running.
  • Each element (node, disk, file share) gets a vote in the Quorum to decide if the cluster has enough resources.
  • Quorum type can be left to the OS to decide or manually assigned.  Microsoft recommends letting this be decided by the OS.
  • The different Quorum types work as follows:
  • Node Majority:  Only nodes have a vote; more than half must be online.  This is recommended for an odd number of nodes.
  • Node and Disk Majority:  The disk witness also gets a vote.  This is recommended for an even number of nodes.
  • Node and File Share Majority:  A file share witness also gets a vote.  This is recommended for an even number of nodes.
  • No Majority, Disk Only:  Last node standing, only requires connectivity to the disk.  As long as you have disk connectivity the cluster will still continue even if 4 out of 5 nodes are down.
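
To make the voting concrete: a four-node cluster with a witness has five votes, so it stays online while any three are reachable.  The sketch below inspects and changes the quorum type using the Server 2012 parameter names.  "CLUSTER01" and the witness share "\\FS01\ClusterWitness" are hypothetical, and the share is assumed to be writable by the cluster's computer account.

```powershell
Import-Module FailoverClusters

# Show the current quorum type and witness resource, if any.
Get-ClusterQuorum -Cluster "CLUSTER01"

# Show each node's vote: NodeWeight is 1 when the node votes, 0 when it does not.
Get-ClusterNode -Cluster "CLUSTER01" | Format-Table Name, State, NodeWeight -AutoSize

# Switch an even-node cluster to Node and File Share Majority (share path is an assumption).
Set-ClusterQuorum -Cluster "CLUSTER01" -NodeAndFileShareMajority "\\FS01\ClusterWitness"
```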

Networking and Storage Options:

  • Network requirements:  Public, Private (UDP unicast, port 3343), and Storage connections are needed for cluster connections.  The private network carries heartbeat messages, the storage network carries storage traffic, and the public network carries the client-facing connections that the cluster is serving.
  • Storage Options:  Shared SAS disks (Storage Spaces), the iSCSI Target Server (the Windows SAN storage feature), NAS, iSCSI (the cheaper SAN option), or Fibre Channel (expensive and legacy compared to iSCSI) can all be used for shared storage.
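
The split between heartbeat, storage, and client traffic maps onto the Role property of each cluster network.  A minimal sketch, assuming a cluster named "CLUSTER01" and that "Cluster Network 2" is the private heartbeat network (both names are hypothetical):

```powershell
Import-Module FailoverClusters

# List the cluster networks with their subnet and current role.
Get-ClusterNetwork -Cluster "CLUSTER01" | Format-Table Name, Address, AddressMask, Role -AutoSize

# Role values: 0 = not used by the cluster, 1 = cluster (internal) traffic only,
# 3 = cluster and client traffic.
# Restrict the assumed heartbeat network to cluster-only traffic.
(Get-ClusterNetwork -Cluster "CLUSTER01" -Name "Cluster Network 2").Role = 1
```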

Requirements for Building a Cluster:

  • Be sure to use certified hardware.
  • Nodes should be identical hardware.
  • The NICs need to use identical settings.
  • Nodes should have matching server roles.
  • Both the Standard and Datacenter editions of Server 2012 will work, with any GUI layer (Server Core or the full GUI).
  • Be sure to run any and all validation tests.
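
Validation and cluster creation can both be driven from PowerShell.  The following is a sketch only; the node names, cluster name, and IP address are placeholders for illustration.

```powershell
Import-Module FailoverClusters

# Run the full validation suite against the prospective nodes and produce a validation report.
Test-Cluster -Node "NODE1", "NODE2"

# Once validation is clean, create the cluster with a static management address.
New-Cluster -Name "CLUSTER01" -Node "NODE1", "NODE2" -StaticAddress "192.168.1.50"
```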

Monitoring Failover Clusters in Server 2012:

  • Event Logs
  • Performance and Reliability Monitor
  • Tracerpt.exe event tracing
  • MHT reports from the Cluster Validation Tool
  • VM Monitoring:  Detects workload failures inside VM HA roles.  You can also specify the services to monitor on a per-VM basis.  Lastly, it is a cheaper alternative to SCOM (System Center Operations Manager).  VM Monitoring is all PowerShell based and lets you collect specific information.
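
A small sketch of VM Monitoring and log collection from PowerShell follows.  It assumes it runs on a cluster node, that "VM01" is a clustered VM (hypothetical name) whose guest allows Virtual Machine Monitoring through its firewall, and that C:\Temp exists.

```powershell
Import-Module FailoverClusters

# Monitor the Print Spooler service inside the clustered VM "VM01".
Add-ClusterVMMonitoredItem -VirtualMachine "VM01" -Service "Spooler"

# Review what is currently monitored for that VM.
Get-ClusterVMMonitoredItem -VirtualMachine "VM01"

# Write the cluster log from each node, covering the last 60 minutes, to C:\Temp.
Get-ClusterLog -Destination "C:\Temp" -TimeSpan 60
```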

Cluster-Aware Updating (CAU):

  • Lets you handle updating without having to manually fail over resources.
  • CAU works through one node at a time: it drains the roles off the node, applies updates, reboots it if needed, and resumes the hosted roles, then moves the updating on to the next node transparently.
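
An on-demand updating run can be sketched as below.  "CLUSTER01" is a hypothetical cluster name, and the failure and retry limits are arbitrary example values, not recommendations.

```powershell
Import-Module ClusterAwareUpdating

# Preview which updates would be applied to each node, without installing anything.
Invoke-CauScan -ClusterName "CLUSTER01" -CauPluginName "Microsoft.WindowsUpdatePlugin"

# Run one updating pass: nodes are drained, patched, rebooted, and resumed one at a time.
Invoke-CauRun -ClusterName "CLUSTER01" -CauPluginName "Microsoft.WindowsUpdatePlugin" -MaxFailedNodes 1 -MaxRetriesPerNode 3 -RequireAllNodesOnline -Force
```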

Notable PowerShell Commands

If you want to install a role such as Failover Clustering or file services but don't know the name of the WindowsFeature you need, you can search with a wildcard:

Get-WindowsFeature *FILE* – this shows the features tied to file services that are available to install.
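
The same wildcard approach works for clustering itself.  A minimal sketch, run from an elevated PowerShell session on the server that will become a node:

```powershell
# Find the clustering-related features and see whether they are installed.
Get-WindowsFeature *Cluster*

# Install Failover Clustering together with its management tools
# (Failover Cluster Manager and the FailoverClusters PowerShell module).
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools
```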