An Overview of Local Versus Global Deduplication with Respect to Backup

Earlier it was noted that one way deduplication device vendors worked around some of the shortcomings of their approach was to sell multiple deduplication devices. Buying multiple devices is an expensive way to address the ingest-rate limits of both inline and post-processing deduplication, as well as the processor, memory, and other resource demands of block-level deduplication performed at the target.

If you require multiple deduplication devices, either now or in the future, you want to ensure that these devices appear as federated storage. This simply means that when you add a second physical 1TB device to the first physical 1TB device, the pair appears as a single device with the aggregate capacity (2TB) of both.

In this situation, you need to understand the difference between local and global deduplication. Local deduplication in this environment means that each device deduplicates only the data written to that device; global deduplication means that deduplication applies across all of the devices. Global deduplication tends to achieve a higher data reduction rate because a single deduplication index (or cache of actual blocks) spans the whole pool, so a block already stored on one device is not stored again on another. The disadvantage is typically performance, since every write must be checked against that shared, pool-wide index.
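The difference in data reduction can be illustrated with a minimal sketch. It assumes a hypothetical model in which fixed-size blocks are identified by a SHA-256 fingerprint and the incoming backup stream is spread across devices round-robin; real products use their own chunking, routing, and index designs, so the function names and routing here are illustrative only.

```python
"""Sketch: local versus global deduplication across a federated pool.

Hypothetical model (not any vendor's design): fixed-size blocks are
identified by SHA-256 fingerprint, and incoming blocks are routed to
devices round-robin.
"""
import hashlib
from itertools import cycle


def fingerprint(block: bytes) -> str:
    """Identify a block by the SHA-256 hash of its contents."""
    return hashlib.sha256(block).hexdigest()


def stored_blocks_local(blocks: list[bytes], n_devices: int) -> int:
    """Local deduplication: each device keeps its own index, so it only
    recognizes duplicates of blocks previously written to that device."""
    indexes = [set() for _ in range(n_devices)]
    for block, device in zip(blocks, cycle(range(n_devices))):
        indexes[device].add(fingerprint(block))
    # Total unique blocks stored, summed over every device.
    return sum(len(index) for index in indexes)


def stored_blocks_global(blocks: list[bytes]) -> int:
    """Global deduplication: one index spans the whole pool, so each
    unique block is stored once regardless of which device receives it.
    (Every write consults this shared index, which is where the
    performance cost of global deduplication typically comes from.)"""
    index = set()
    for block in blocks:
        index.add(fingerprint(block))
    return len(index)


if __name__ == "__main__":
    # Two nightly backups of the same three 4KB blocks.
    night = [b"A" * 4096, b"B" * 4096, b"C" * 4096]
    stream = night + night
    print(stored_blocks_local(stream, 2))  # 6
    print(stored_blocks_global(stream))    # 3
```

With two devices, round-robin routing sends copies of the same block to different devices, so each local index stores its own copy and the pool holds six blocks; the single global index recognizes the repeats and holds three.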

Federated Storage, Deduplication, and Resiliency

If you use global deduplication with federated storage, pay particular attention to the resiliency and availability of the network topology. The aircraft industry offers a useful lesson here: two engines increase the probability of an engine failure, but decrease the probability of a crash due to engine failure. Make sure that you can lose any one device in the federated pool without losing your deduplicated data.
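To put rough numbers on the engine analogy, the sketch below assumes (hypothetically) that devices fail independently, each with the same probability p over the period of interest. With global deduplication and no redundancy, unique blocks are spread across the pool, so losing any one device loses data; if every unique block is stored on two devices, data is lost only when both fail. The helper names are illustrative, not part of any product.

```python
"""Rough resiliency arithmetic for a federated deduplication pool.

Assumption (hypothetical): devices fail independently, each with the
same probability p over the period of interest.
"""


def prob_any_device_fails(n_devices: int, p: float) -> float:
    """Probability that at least one of n independent devices fails."""
    return 1.0 - (1.0 - p) ** n_devices


def prob_all_devices_fail(n_devices: int, p: float) -> float:
    """Probability that every one of n independent devices fails."""
    return p ** n_devices


def blocks_lost(placement: dict[str, set[str]], failed_device: str) -> set[str]:
    """Return the unique blocks whose every copy lives on the failed device."""
    return {block for block, devices in placement.items()
            if devices <= {failed_device}}


if __name__ == "__main__":
    p = 0.01
    # Two devices, no redundancy: losing either one loses part of the
    # shared, globally deduplicated block store.
    print(round(prob_any_device_fails(2, p), 4))   # 0.0199
    # Two devices, each unique block stored on both: data is lost only
    # if both devices fail.
    print(round(prob_all_devices_fail(2, p), 6))   # 0.0001

    # Checking a (hypothetical) placement map for single-device loss.
    unprotected = {"b1": {"dev1"}, "b2": {"dev2"}}
    mirrored = {"b1": {"dev1", "dev2"}, "b2": {"dev1", "dev2"}}
    print(blocks_lost(unprotected, "dev1"))  # {'b1'}
    print(blocks_lost(mirrored, "dev1"))     # set()
```

As with the second engine, adding a device without redundancy roughly doubles the chance of some failure; only storing each unique block on more than one device turns that extra hardware into protection.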