After another customer recently had an issue with old VMware snapshots, I thought I would put together some pointers regarding VMware snapshots on production servers and some items I discussed with the VMware engineer at the time.
- Snapshots are not a backup and should not be treated as such. Use a snapshot as a point-in-time recovery point while installing updates, new software and so on a server. Once you are happy with the result, remove the snapshot immediately.
- Only keep snapshots live for the shortest amount of time possible.
- As a rule of thumb, ensure your snapshots are deleted within 24 hours at the latest. If you have a requirement to keep them longer, consider taking a backup instead.
- Don’t remove snapshots during your server’s busiest times.
- If you have a very old snapshot, consider either cloning the VM to a new VM (this will consolidate the snapshots and keep the original for fail back) or powering off the VM and then removing the snapshot. With the VM powered off, no changes are written to the delta whilst it is being consolidated.
- When removing large snapshots from the VI client, the client will time out after around 15 minutes. This doesn’t mean the removal has failed! Be patient and check the progress via the service console by looking at the datestamp on the VMDK.
- Check regularly to ensure there are no outstanding snapshots. A tool such as the Virtualisation Eco Shell will help you with this.
- If you wish to check for snapshots from the service console, use the following command. It will show you the location and size of the delta files:
find /vmfs/volumes/ -name "*delta*" -type f -print0 | xargs -0 du --human-readable --total
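If you run this check often, the same idea can be wrapped in a couple of small shell functions. The following is only a rough sketch: the function names `list_deltas` and `delta_timestamps` are my own and purely illustrative, and the optional directory argument is an assumption added so the functions can be pointed at a single datastore (it defaults to /vmfs/volumes). The first lists the delta files largest first, and the second shows their modification times, which is the datestamp to watch while a large snapshot is being removed.

```shell
#!/bin/sh
# Illustrative helpers only; the function names are hypothetical.

# List snapshot delta files under a datastore root, largest first.
# Output is one line per file: size in KiB, then the path.
list_deltas() {
    root="${1:-/vmfs/volumes}"
    # -exec ... {} + runs du only when at least one file matched
    find "$root" -type f -name '*delta*' -exec du -k {} + | sort -rn
}

# Show a long listing of the delta files, including modification time.
# A changing datestamp suggests a snapshot removal is still progressing.
delta_timestamps() {
    root="${1:-/vmfs/volumes}"
    find "$root" -type f -name '*delta*' -exec ls -l {} +
}
```

For example, `list_deltas /vmfs/volumes/datastore1` would limit the search to one datastore (the datastore name here is just a placeholder), and running `delta_timestamps` a few minutes apart lets you compare the datestamps.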
Please feel free to comment with other suggestions; I will add to this over time.
What are your thoughts when it comes to wanting to keep various versions/snaps of a VM for testing? For example, if I’m testing features in one version of a product compared to another and want to use snapshots for keeping those two versions.
You could make use of clones and clone the VM at the different stages, although if the server is non-production and only used for testing, the change may not be big. I would recommend keeping an eye on the delta sizes and the available free space on the relevant LUNs.