[Warning: pretty technical article, but if you’re interested in virtualization and backup, hopefully an illuminating one. Also, I’ve changed the name of the other vendor, who is a little reluctant to have its name included in our blog.]
A few weeks ago, a potential buyer tested Unitrends backup ingest rates against a virtualization-only vendor’s backup ingest rates in a virtual environment. I was surprised when this person reported that the virtualization-only vendor was much faster, because in these situations we’re typically told that we’re slightly faster. In fact, I was so surprised that a few colleagues and I got involved to understand what was going on. I thought that what we found would make an interesting blog post.
First, the basics. Virtualization-only vendors and Unitrends both support an HOS (Host Operating System)-based backup. This means that both companies support a method of working underneath the virtual machines using the Microsoft VSS (Volume Shadow Copy Service), which handles the quiescence of the virtual machines. (Note: for you VMware aficionados, VSS is the approximate counterpart at the HOS level to VMware’s VADP (vStorage APIs for Data Protection).) Both HOS-level VSS and VADP take advantage of Windows-based virtual machines performing VSS at the GOS (Guest Operating System) level.
Now, there can be differences in how HOS-based backup is implemented. In VMware, these differences are typically not major, since the software uses VMware’s native CBT (Changed Block Tracking) and pulls data off as quickly as it can. In Hyper-V, they can be major, since vendors have to implement CBT themselves (or go without it). However, our CBT algorithms are pretty quick, and the comparison was being done on master backups, which by definition contain no changed blocks, so CBT could not explain the gap. We were confused.
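To make the CBT point concrete, here is a minimal sketch of the idea, not a description of how Unitrends, VMware, or any Hyper-V vendor actually implements it. It assumes a hypothetical fixed block size and detects changes by comparing per-block digests between two backups; real CBT implementations generally track writes as they happen rather than re-hashing the disk. The function names are made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_digests(image: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-256 digest per fixed-size block of a disk image."""
    return [
        hashlib.sha256(image[offset:offset + block_size]).digest()
        for offset in range(0, len(image), block_size)
    ]


def changed_blocks(previous: list[bytes], current: list[bytes]) -> list[int]:
    """Return indices of blocks that differ from the previous backup.

    Blocks with no counterpart in the previous backup count as changed.
    """
    return [
        index
        for index, digest in enumerate(current)
        if index >= len(previous) or previous[index] != digest
    ]


# A master backup has no previous digests, so every block must be read.
master = bytes(BLOCK_SIZE * 4)
master_changes = changed_blocks([], block_digests(master))
print(master_changes)

# After one block is modified, only that block needs to be read again.
modified = bytearray(master)
modified[BLOCK_SIZE * 2] = 0xFF
incremental_changes = changed_blocks(
    block_digests(master), block_digests(bytes(modified))
)
print(incremental_changes)
```

The first call reports every block and the second reports only block 2, which is why CBT quality matters for incrementals but cannot explain a speed difference between two master backups.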
The major differences in HOS-level backup performance typically come down to compression (which algorithm is used) and deduplication (whether it is inline or post-processing, whether it is local or global, and the like). What confused us was that in our labs Unitrends, because it uses inline compression but post-processing global deduplication, is typically faster than products that do inline deduplication at the backup job level.
Now, there’s one more wrinkle in the story. In addition to HOS-level backup, Unitrends also supports what is called GOS-level backup. GOS-level backup in a virtual system simply treats the virtual machine as if it were a physical machine. In fact, the GOS-level software typically doesn’t have a clue as to whether it’s running on a physical or a virtual machine. GOS-level backup can be done at an image level (bare metal is an example) or at a file level, and the choice affects ingest rates.
What’s typically the critical path with GOS-level backup? Reading files through the Windows file system (NTFS) is slower than reading the entire virtual machine (in an HOS-level backup) or the disk within that virtual machine (in a GOS-level image backup) as a series of blocks (or an image). This applies to the first master. After that, techniques such as incremental forever drastically shrink all successive backups.
What that means is that for the same set of files, an HOS-level backup is faster than a GOS-level file-level backup. The advantage of GOS-level file-level backup is that you can be a lot more selective about which files you choose to back up. If you’re backing up 10X fewer files (for example), the slowness of NTFS is offset by dramatically fewer total read operations (and of course deduplication has much less work to do).
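The tradeoff can be put into rough numbers. The sketch below is a back-of-the-envelope model only: the VM size and both read rates are hypothetical values picked to show the shape of the argument, not measurements from Unitrends or any other product, and it assumes that "10X fewer files" also means 10X less data.

```python
def backup_seconds(total_gb: float, throughput_mb_s: float) -> float:
    """Time to read total_gb of data at a sustained throughput in MB/s."""
    return total_gb * 1024 / throughput_mb_s


def break_even_fraction(file_mb_s: float, block_mb_s: float) -> float:
    """Fraction of the data below which a file-level backup finishes first."""
    return file_mb_s / block_mb_s


# All numbers below are hypothetical, for illustration only.
VM_SIZE_GB = 500.0   # full virtual machine image
BLOCK_MB_S = 200.0   # assumed block-level (HOS) read rate
FILE_MB_S = 50.0     # assumed file-level (GOS, through NTFS) read rate

hos_full = backup_seconds(VM_SIZE_GB, BLOCK_MB_S)
gos_full = backup_seconds(VM_SIZE_GB, FILE_MB_S)
gos_selective = backup_seconds(VM_SIZE_GB / 10, FILE_MB_S)

print(f"HOS-level, whole VM:        {hos_full:.0f} s")
print(f"GOS file-level, same files: {gos_full:.0f} s")
print(f"GOS file-level, 10X fewer:  {gos_selective:.0f} s")
print(f"Break-even fraction:        {break_even_fraction(FILE_MB_S, BLOCK_MB_S):.2f}")
```

With these made-up rates, the file-level backup is four times slower on the same data but finishes first once it selects less than a quarter of it. A file-for-file comparison removes that selectivity advantage and leaves only the slower read path.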
What did we discover in the end? That the comparison was being made between our GOS-level file-level backup and the other vendor’s HOS-level backup on a file-for-file basis.