Snapshots are one of the great features of virtualization, but forget about one on an active VM for too long and you may be calling for cleanup on aisle one. Recently there was a fair amount of discussion among VM bloggers about snapshots. Jason Boche had a nice article that summarized some monitoring strategies and Rich Brambley also weighed in.
You may be wondering what makes snapshots so great. At the top of the list is the ability to have an easy roll-back plan for failed upgrades. For example, an administrator can take a VM snapshot before applying patches to a guest OS. After rebooting and testing, if all is well, simply delete the snapshot and move on — the changes are merged while the virtual machine is running. If for some reason a patch caused problems, the previous state can be quickly restored.
That’s how VMware ESX works. Is Microsoft Hyper-V the same?
It turns out that Hyper-V snapshots (or SCVMM checkpoints, depending on which single pane of glass you are using) use a slightly different design. The Microsoft interfaces allow administrators to remove snapshots from powered-on VMs, and after doing so, they appear to be gone:
But, upon closer inspection, the changes are not actually merged until the VM is powered off:
What happens if an administrator is not aware of this behavior? Just as if the snapshot had never been deleted, it will grow and grow until the LUN on which it lives runs out of space:
And when that happens, the VM will go into a Paused-Critical state:
Cleanup on aisle one!
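One way to avoid that call is to watch for lingering snapshot files before the LUN fills up. Here is a minimal sketch in Python of what such a check might look like: it walks a storage path, lists the snapshot differencing disks (.avhd files, or .avhdx on newer Hyper-V releases) from largest to smallest, and adds a warning when free space on the volume runs low. The function names and the 20% threshold are illustrative assumptions, not part of any Microsoft tooling.

```python
import os
import shutil

# Extensions Hyper-V uses for snapshot differencing disks
# (.avhd on Windows Server 2008/2008 R2, .avhdx on later releases).
SNAPSHOT_EXTENSIONS = (".avhd", ".avhdx")


def find_snapshot_disks(root):
    """Return (path, size_in_bytes) for each differencing disk under root, largest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(SNAPSHOT_EXTENSIONS):
                path = os.path.join(dirpath, name)
                found.append((path, os.path.getsize(path)))
    return sorted(found, key=lambda item: item[1], reverse=True)


def free_space_fraction(root):
    """Return the fraction of the volume holding root that is still free."""
    usage = shutil.disk_usage(root)
    return usage.free / usage.total


def report(root, min_free_fraction=0.20):
    """Build report lines; the 20% default threshold is an arbitrary, illustrative choice."""
    lines = []
    for path, size in find_snapshot_disks(root):
        lines.append(f"{path}: {size / 2**30:.2f} GiB")
    # Only warn when differencing disks exist and the volume is getting tight.
    if lines and free_space_fraction(root) < min_free_fraction:
        lines.append(
            f"WARNING: less than {min_free_fraction:.0%} free on the volume holding {root}"
        )
    return lines
```

Pointing `report()` at the folder that holds the virtual hard disks and running it on a schedule would flag a differencing disk that is still growing after its snapshot was supposedly deleted. The check only reads file sizes, so it is safe to run against powered-on VMs.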
What is the best way to recover gracefully from such a problem, if there really is no free space on the LUN? Gabes Virtual World published a great tip that could help next time. Maybe all that ISO file copying isn’t so bad after all.
I know this is a bit of an old post, but it is just as relevant today, so I have a question. Do you know if this functionality will be improved in Hyper-V 2012 so that we don’t have to turn the VMs off to merge the deleted snapshots? Can you link me to an article, if you know of one, that explains the improvements, if any? Thanks!