How would you prepare for a complete datacenter power outage?
I had the opportunity to consider this recently: last weekend, some of the VMware labs had to be shut down completely for widespread power maintenance. It was a minor inconvenience, and fortunately these occasions are few and far between. At least I fared better than Scott Lowe, who recently powered off all of his gear for nothing. Ouch!
In this particular lab the primary hardware is a couple dozen HP servers and two CLARiiON storage arrays. Naturally, all of the supporting infrastructure is virtualized, so I looked at this as an opportunity to see how I could best manage such an event. Oh, by the way, this lab is two states away, so everything is done remotely.
When I returned to the office after the weekend, I was pleased to find things up and running again. These were my main considerations:
Make vCenter Server easy to find
While some will still debate the merits of running management servers on physical machines, I say virtualize them. However, with the wonders of VMware DRS automatically balancing workloads across a cluster, it may not be entirely clear where the vCenter Server management VM is running. This is an important piece of information because if something goes wrong and vCenter Server is not running, it may be necessary to connect directly to a host for manual startup.
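To illustrate why a known location matters: if vCenter Server is down, the fallback is to check hosts one at a time. The sketch below assumes a hypothetical snapshot of which VMs are registered on each host. The host names, VM names, and the `find_vm_host` helper are all made up for illustration, and nothing here calls a VMware API.

```python
from typing import Dict, List, Optional

# Hypothetical inventory snapshot: host name -> VMs registered on that host.
# In practice this would be gathered by connecting to each host directly.
inventory: Dict[str, List[str]] = {
    "esx01.lab.example": ["dc01", "web01"],
    "esx02.lab.example": ["vcenter01", "dc02"],
    "esx03.lab.example": ["db01"],
}


def find_vm_host(inventory: Dict[str, List[str]], vm_name: str) -> Optional[str]:
    """Return the first host whose VM list contains vm_name, or None."""
    for host, vms in inventory.items():
        if vm_name in vms:
            return host
    return None


print(find_vm_host(inventory, "vcenter01"))
```

With a couple dozen hosts, that search is exactly the step you want to avoid during an outage, which is the argument for pinning the VM to a predictable host.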
I used one of the newer vSphere features, DRS groups, to pair one VM with one host so that my vCenter Server VM would stay on a desired host:
Another option would have been to simply disable DRS for the vCenter VM, but then I could not have controlled where that VM initially started when all the servers suddenly powered on. With the DRS groups technique, the VM can start wherever it is able to and then migrate to the preferred host. If a problem prevented that host from booting up, vCenter would at least start on another host in the cluster.
The DRS rules looked like this:
Also note that I have an anti-affinity rule that strives to keep the two domain controllers on different hosts to better tolerate an outage. Overkill for a lab, perhaps, but it’s easy in vSphere — why not take advantage of it?
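The intent of these two rules can be sketched in a few lines of Python. This is only a model of the logic, not how DRS evaluates rules internally: the `ShouldRunOn` and `AntiAffinity` classes, the `check_placement` helper, and every host and VM name are assumptions made for the example. The vCenter preference is modeled as soft, which matches the behavior described above, where the VM may start elsewhere and migrate later.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ShouldRunOn:
    """Soft preference: the VM should run on one of the listed hosts."""
    vm: str
    hosts: frozenset


@dataclass(frozen=True)
class AntiAffinity:
    """The listed VMs should each run on a different host."""
    vms: frozenset


def check_placement(placement: Dict[str, str], rules) -> List[str]:
    """Return readable findings for a VM -> host placement."""
    findings = []
    for rule in rules:
        if isinstance(rule, ShouldRunOn):
            host = placement.get(rule.vm)
            if host is not None and host not in rule.hosts:
                findings.append(f"preference not met: {rule.vm} is on {host}")
        elif isinstance(rule, AntiAffinity):
            by_host: Dict[str, List[str]] = {}
            for vm in sorted(rule.vms):
                host = placement.get(vm)
                if host is not None:
                    by_host.setdefault(host, []).append(vm)
            for host, vms in sorted(by_host.items()):
                if len(vms) > 1:
                    findings.append(
                        f"anti-affinity violated on {host}: {', '.join(vms)}"
                    )
    return findings


# Hypothetical rules mirroring the ones described in the text.
rules = [
    ShouldRunOn(vm="vcenter01", hosts=frozenset({"esx02"})),
    AntiAffinity(vms=frozenset({"dc01", "dc02"})),
]

good = {"vcenter01": "esx02", "dc01": "esx01", "dc02": "esx03"}
bad = {"vcenter01": "esx03", "dc01": "esx01", "dc02": "esx01"}

print(check_placement(good, rules))
print(check_placement(bad, rules))
```

A check like this could be run against whatever placement you observe after power returns, to confirm the cluster settled where you expected.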
Start your engines!
The only other thing that I configured before shutting everything down was automatic startup for the domain controllers and the vCenter Server VMs, as seen here:
The VMs started as requested when power was restored to the datacenter. Everything came up, so I consider this a success. It’ll probably be years before something like this happens again, but it’s good to know that there are some features in vSphere to accommodate whatever IT challenges are thrown your way.
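For planning purposes, an automatic startup sequence on a single host can be sketched as an ordered list with a delay after each VM. Everything specific here is an assumption rather than a record of my settings: the VM names, the 120-second delay, the choice to start a domain controller before vCenter Server, and the `startup_offsets` helper itself.

```python
from typing import List, Tuple

# Hypothetical plan for one host: (VM name, seconds to wait before the next VM).
startup_plan: List[Tuple[str, int]] = [
    ("dc01", 120),      # assumed: directory services come up first
    ("vcenter01", 0),   # last in the list, so no delay follows it
]


def startup_offsets(plan: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (vm, seconds after the sequence begins) in list order."""
    offsets = []
    elapsed = 0
    for vm, delay_after in plan:
        offsets.append((vm, elapsed))
        elapsed += delay_after
    return offsets


for vm, seconds in startup_offsets(startup_plan):
    print(f"{vm}: starts at +{seconds}s")
```

Writing the sequence down this way makes it easy to estimate how long after power-on the management VMs should be reachable.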
Is your datacenter ready?
What measures have you put in place to recover from a complete datacenter power event? Have you deployed a small portion of key infrastructure on physical machines or virtualized them on a separate management cluster? Will you trust the automatic startup feature, and if you use VMware DRS, do you have rules in place to steer things in a predictable direction?