This is the second article in the series “Everybody Lies: Backup.”  The introductory article may be found here.

The other day a potential customer asked me about backup ingest rates (i.e., how fast a backup could potentially be for one of our appliances).  I told her that the fastest possible backup would be limited by the network connecting our backup appliance to the system being backed up.  I ran through some numbers with her – and then was surprised when she told me that another backup vendor claimed it could back up much faster than that.  She sent me a link to the information.
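The numbers I ran through with her were of the back-of-the-envelope kind: a network link has a hard physical ceiling, and no backup can ingest faster than that. A minimal sketch (the link speed and the ~80% efficiency figure are illustrative assumptions, not measurements):

```python
# Back-of-the-envelope ceiling for backup ingest over a network link.
# Link speed and protocol efficiency below are hypothetical examples.

def max_ingest_tb_per_hour(link_gbps: float, efficiency: float = 0.8) -> float:
    """Upper bound on backup ingest, in TB/hour, for a given link speed.

    efficiency is a rough allowance for TCP/IP and backup-protocol
    overhead; ~80% is a rule of thumb, not a measured value.
    """
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
    return bytes_per_sec * 3600 / 1e12

# A 10 GbE link tops out around 3.6 TB/hour of real data moved:
print(round(max_ingest_tb_per_hour(10), 1))
```

Any quoted ingest rate meaningfully above that ceiling for the stated network should raise an eyebrow.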

The backup rate the vendor quoted was huge – far higher than the physical network was capable of supporting!  I started digging through the information – and found that the ingest rate was calculated from the THEORETICAL performance of doing a master backup every day, and then credited their THEORETICAL ingest rate with the improvement of doing incremental forever backups.  They then assumed the change rate was relatively low.  Thus they were able to claim amazing backup ingest rates far surpassing anything I had ever seen.
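The arithmetic behind that trick is worth seeing spelled out. All of the numbers below are illustrative assumptions; the point is the ratio, not the specific figures:

```python
# Sketch of the marketing math behind an inflated "ingest rate":
# take credit for the full dataset every day while actually ingesting
# only the changed blocks. All figures are hypothetical.

dataset_tb = 100.0         # size of a full (master) backup
change_rate = 0.02         # 2% daily change rate
backup_window_hours = 8.0

actual_ingested_tb = dataset_tb * change_rate            # incremental forever
actual_rate = actual_ingested_tb / backup_window_hours   # TB/hour really moved
claimed_rate = dataset_tb / backup_window_hours          # TB/hour "protected"

print(f"actual:  {actual_rate:.2f} TB/hour")   # 0.25 TB/hour
print(f"claimed: {claimed_rate:.2f} TB/hour")  # 12.50 TB/hour – a 50x inflation
```

With a 2% change rate, the "ingest rate" is inflated fifty-fold over what the appliance actually moves.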

In other words, the backup ingest performance was being specified for data that the backup appliance didn’t actually ingest. It’s a neat trick. Once I explained to the potential customer what had occurred and how the number was being computed…well, I think that the credibility of the other backup vendor took a pretty big hit.

Sigh.  I was used to the trick of deduplication vendors specifying ingest rates that didn’t include the backup server and backup software – now I also have to get used to the trick of publishing backup ingest rates for backup data that you don’t ingest.

One of the things that marketing people are taught is that numbers are good – that by using numbers, marketing copy sounds more authoritative – and thus the reader tends to believe writing in which numbers are cited more than qualitative copy.  I have no doubt it’s true.  At the same time, it’s important to dig in beyond the numbers – and understand how the numbers were actually calculated.  Otherwise – you’re going to be subject to backup buyer’s remorse.

Here’s the truth.  Your backup ingest rate is typically governed by a combination of a few major factors:

  • How fast can you read the data, and how much parallelism (how many concurrent reads of that data) can you achieve?
  • What’s the effective usable bandwidth between the client being backed up and the backup appliance (or server)?
  • How fast can the backup appliance (or backup software on a backup server) process that incoming data?
  • How fast can the backup appliance (or backup storage) store that incoming data?

Got any favorite backup lies that you’ve found?