Image- Versus File-Based Backup
George Crump does a good job of giving an overview of image- versus file-based backup and the trade-offs of each in a blog post at InformationWeek (link to the blog post). Since I work for a company that does both image-level and file-level backup, I think I understand the trade-offs.
I understand the tremendous advantage that image-based backup has when you are dealing with a slow file system (such as NTFS) and performing a master (full) backup of millions of relatively small files. If you are in that situation and want to use file-based backup, your choices are simple: either parallelize the backup or wait a very, very long time.
But it’s also important to note that there are other operating systems on which performing a master of millions of files isn’t painful, because their file systems don’t carry the overhead of NTFS. Linux is a prime example. If your backup software aggregates the files and then sends them downstream to your backup appliance’s ingest function, the practical network overhead is the same as streaming a large image over that same network.
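To make the aggregation point concrete, here is a minimal sketch of packing many small files into one sequential stream before it goes over the wire. It is an illustration only, not how any particular backup product works: the function name aggregate_to_stream is hypothetical, and the tar format is simply a convenient stand-in for whatever container a real agent would use.

```python
import os
import tarfile


def aggregate_to_stream(root_dir, out_stream):
    """Pack every file under root_dir into one uncompressed tar stream.

    out_stream is any writable binary file object (a socket file, a pipe,
    or an in-memory buffer). Returns the number of files packed.
    """
    count = 0
    # Mode "w|" writes a forward-only stream, so the receiver sees one
    # large sequential transfer instead of one request per small file.
    with tarfile.open(fileobj=out_stream, mode="w|") as tar:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                # Store paths relative to the backup root.
                tar.add(path, arcname=os.path.relpath(path, root_dir))
                count += 1
    return count
```

The cost of walking the file system still depends on the operating system and file system underneath, which is the point above; the sketch only shows that the network sees a single stream either way.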
At the same time, I understand the disadvantages that image (which we call BareMetal) has with respect to file recovery and deduplication. On the same hardware, deduplication that begins at the block level is less efficient than deduplication that starts at the file level and then works on a subfile basis. Yes, there’s been some work on “cracking” the image and being able to restore files, but I’d note that most of this work applies to Windows. If you’re trying to do heterogeneous backup across many operating systems, you’re going to be “cracking” a lot of images, and that’s going to drive your complexity up as well as get you into versioning problems across dozens of operating systems and applications.
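As a rough illustration of why starting at the file level can save work, the sketch below checks a whole-file digest first and only falls back to subfile chunking when the file is new. Everything here is an assumption made for the example: the DedupStore class, the fixed 4 KiB chunk size, and the use of SHA-256 are hypothetical choices, not a description of any shipping product.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen for illustration


class DedupStore:
    """Toy two-stage deduplicating store: whole file first, then chunks."""

    def __init__(self):
        self.files = {}   # file digest -> ordered list of chunk digests
        self.chunks = {}  # chunk digest -> chunk bytes

    def put(self, data: bytes) -> str:
        file_digest = hashlib.sha256(data).hexdigest()
        if file_digest in self.files:
            # Whole-file duplicate: one lookup, no per-chunk work at all.
            return file_digest
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            block = data[offset:offset + CHUNK_SIZE]
            chunk_digest = hashlib.sha256(block).hexdigest()
            # Store the chunk only if it has not been seen before.
            self.chunks.setdefault(chunk_digest, block)
            recipe.append(chunk_digest)
        self.files[file_digest] = recipe
        return file_digest

    def get(self, file_digest: str) -> bytes:
        # Rebuild the file from its chunk recipe.
        return b"".join(self.chunks[c] for c in self.files[file_digest])
```

An unchanged file costs a single index lookup in this scheme, whereas a purely block-level approach has to look up every block of that file. Because each stored file keeps its own recipe, restoring a single file does not require interpreting a disk image.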
Regarding the vStorage API set, image-based backup, and differentials: the company at which I work has a new release coming out in a few months, called Release 5, which uses the vStorage API set in precisely the way George describes. My one comment, which I touch on above with respect to deduplication, is to pay special attention to how much hardware (cores and memory) you need when trying to eliminate redundant blocks. Either it gets expensive quickly, or it is slow and painful.
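A back-of-the-envelope calculation shows how quickly the memory side of that adds up. The figures below are assumptions picked for illustration (4 KiB blocks, 40 bytes of index per block for a digest plus a location), not measurements from any product, and the helper name dedup_index_bytes is hypothetical.

```python
def dedup_index_bytes(data_bytes, block_size=4096, entry_bytes=40):
    """Estimate the in-memory index size for block-level deduplication.

    Assumes one index entry per block, with every block unique (the
    worst case for index size). block_size and entry_bytes are
    illustrative defaults.
    """
    blocks = -(-data_bytes // block_size)  # ceiling division
    return blocks * entry_bytes


TIB = 2 ** 40
GIB = 2 ** 30

# 10 TiB of protected data at 4 KiB blocks and 40 bytes per entry:
print(dedup_index_bytes(10 * TIB) / GIB)  # 100.0
```

Under these assumptions, keeping the whole index in RAM for 10 TiB of data takes about 100 GiB. Larger blocks shrink the index but find fewer duplicates, and pushing the index to disk trades the memory cost for slower lookups.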