Having built and rebuilt several demo environments with Failover Clustering using the Microsoft iSCSI Software Target 3.3, I have found one gotcha that you have to be careful of: make sure that you leave at least one domain controller un-clustered, and not stored on the software SAN.
Here’s the deal: Microsoft Failover Clustering requires Active Directory for authentication. If one of your clustered servers goes down it won’t matter, because the virtualized domain controller will just fail over to another node. However, if all of your nodes go down, then when the servers come back up there will be no domain controller available for them to authenticate against. The Failover Cluster cannot come back up without a DC, and the clustered DC cannot come up without the cluster. In other words, your network is toast.
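To make the chicken-and-egg problem concrete, here is a minimal Python sketch of a cold boot. It is only a toy model, not anything Failover Clustering actually runs: the service names (iscsi_target, cluster, dc_clustered, dc_standalone) and the cold_boot function are made up for illustration. Each service lists groups of alternatives, and it can start once at least one member of every group is up.

```python
def cold_boot(needs):
    """Return the set of services that manage to start after a full power loss.

    needs maps a service name to a list of groups; each group is a tuple of
    alternatives, and at least one alternative per group must already be running.
    """
    started = set()
    progress = True
    while progress:
        progress = False
        for service, groups in needs.items():
            if service in started:
                continue
            # A service with no groups has no prerequisites and starts immediately.
            if all(any(alt in started for alt in group) for group in groups):
                started.add(service)
                progress = True
    return started


# Hypothetical layout 1: the only DC is a clustered VM.
all_clustered = {
    "iscsi_target": [],
    "cluster": [("iscsi_target",), ("dc_clustered",)],
    "dc_clustered": [("cluster",)],
}

# Hypothetical layout 2: one extra DC on direct-attached storage, outside the cluster.
with_standalone = {
    "iscsi_target": [],
    "dc_standalone": [],
    "cluster": [("iscsi_target",), ("dc_clustered", "dc_standalone")],
    "dc_clustered": [("cluster",)],
}

print(sorted(cold_boot(all_clustered)))
print(sorted(cold_boot(with_standalone)))
```

In the first layout only the storage comes up, because the cluster and the DC are each waiting for the other. In the second layout the standalone DC breaks the loop and everything starts.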
How to prevent this from happening:
They say that an ounce of prevention is worth a pound of cure, and in this case they are right: there is a very simple solution that will prevent this issue, which is to build a non-clustered DC on direct-attached storage as your second (or third?) domain controller. The server that runs my iSCSI Software Target has far more CPU and RAM than the software SAN needs, and since the OS was already Windows Server 2008 R2 SP1, I simply used it as a Hyper-V host (I had already installed the Hyper-V role on it because my System Center Essentials VM is also not clustered). I created a VM with the Server Core installation of Windows Server 2008 R2 SP1, which performs great as a domain controller while taking up few resources (RAM, CPU, storage). Two days ago, when my electricity was off for an hour, it took me an hour to recover. This morning, when the same thing happened, the recovery happened automatically.
How to recover if your cluster does go down:
If you find yourself in this situation, it is easy to want to jump off a bridge. Don’t! Firstly most bridges high enough to hurt yourself from have fences that are harder to scale than they look, and secondly someone will have to clean up your body. So here’s what you do:
- On your iSCSI Software Target, disable the Target.
- From the same console, mount the LUN VHD locally: right-click on the device and, under Disk Access, click Mount Read/Write. (See screen shot.)
- Copy the VHD files onto another Hyper-V host.
- Create a new virtual machine on that host called DC-Temp and, instead of creating a new VHD, point to the one you just copied. Make sure that the new VM is connected to a network that is accessible to your cluster nodes.
- In the newly created VM you will have to assign a static IP address to your NIC. You will likely have the best chance of success if you assign the same IP address that the original virtual DC had. Remember, you have created a new virtual machine; even though it looks the same, the hardware is new, so there is no IP address.
At this point your cluster nodes might have to be rebooted so they can authenticate to a domain controller; once they have done so, you will be able to manage your cluster. You will then notice that all of the cluster storage has failed. On each node, rerun the iSCSI Initiator and rediscover the Target, and then from Failover Cluster Manager bring all of the shared storage back on-line.
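Before rebooting nodes or rediscovering the Target, it can save some guesswork to confirm that the temporary DC and the iSCSI Target are actually answering on the network. The following Python sketch is one way to do that, under some assumptions: it only tests TCP reachability on the standard LDAP port (389) and the standard iSCSI port (3260), it says nothing about whether authentication will succeed, and the function names and the addresses in the usage comment are hypothetical.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_prerequisites(dc_host, target_host, ldap_port=389, iscsi_port=3260):
    """Report whether the DC and the iSCSI Target are reachable over TCP."""
    return {
        "domain controller (LDAP)": port_open(dc_host, ldap_port),
        "iSCSI target": port_open(target_host, iscsi_port),
    }


# Example usage with made-up addresses; substitute your own:
# for name, ok in check_prerequisites("192.168.1.10", "192.168.1.20").items():
#     print(name, "reachable" if ok else "NOT reachable")
```

If either check reports that the service is not reachable, sort that out first; the nodes cannot authenticate or reconnect their storage until both answer.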
Your cluster should now be healthy again; you will have to bring each of your virtual machines back on-line (from the Services and Applications tab in FCM). Remember to take the temporary DC down around the time your original DC is about to come on-line, so you don’t get clashing IP addresses and SIDs.
When you are back up, please see the paragraph entitled: How to prevent this from happening. You’ll avoid having to do this next time.
Good luck, and happy virtualizing!