RTO/RTA: How to measure RTO compliance of your backups using PowerShell

A few weeks ago we reviewed common acronyms used in Business Continuity and explained how to find out if your backups are in compliance with your RPO policy. You can read the first article here: RPO, RPA: How to measure RPO compliance of your backups using Powershell

Today we are going to do the same exercise for RTO.

Unitrends Enterprise Plus edition reports RPO/RTO compliance and actual values for both virtual and physical systems. If you don't have an Enterprise Plus license, this post explains how you can do it yourself using PowerShell and the Unitrends REST API.

Recovery Time Objective

RTO is the maximum length of time that a system or application can be down in case of a disaster or major IT outage.

RTO is a business specification that dictates the BC/DR strategy IT has to implement to meet that objective. Depending on your business objectives you can use technologies such as active-active metroclusters, storage replication, backup to disk, backup to tape, backup to cloud, etc.

Once or twice a year, companies run a DR exercise in which they simulate a disaster, try to recover their systems and measure the time to recover. If there is a significant gap between the RTO (the goal) and the measured recovery time, they have to realign the DR process and look at other technologies that allow faster recovery.

With the growing number of changes made to IT infrastructure every day, the time to recover a system can vary a lot from one week to the next. An application upgrade or a system patch alone may make recovery take longer. And if the last backup you are trying to recover from is corrupted, you have to discard it and start again from a previous one, which increases the recovery time.

Doing this exercise once a year is not enough to be sure you will be able to recover your systems in time during the next few months.

So how can you be sure that you are in compliance at any time? By monitoring your RTA daily and reporting any deviation between actual and target values.

Recovery Time Actual

RTA is a metric that represents the actual amount of time an organization needs to recover an application.

To calculate your RTA you have to run a DR exercise; there is no way to predict it. You have to restore your systems from a secondary copy to an isolated environment where you can test and validate your applications without affecting production systems. This time may vary due to daily changes.

To be in compliance with your RTO at any time, your RTA must always be lower than your RTO.
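
The comparison itself is easy to automate. Here is a minimal sketch, assuming you keep the RTO target and each measured RTA as TimeSpan values. The function name Test-RtoCompliance, the VM names and the figures are hypothetical and are not part of the Unitrends toolkit:

```powershell
# Hypothetical helper: compares a measured RTA against an RTO target.
function Test-RtoCompliance {
    param(
        [Parameter(Mandatory = $true)][string]$Name,
        [Parameter(Mandatory = $true)][TimeSpan]$Rta,
        [Parameter(Mandatory = $true)][TimeSpan]$Rto
    )

    # A positive margin means the recovery finished inside the target.
    $margin = $Rto - $Rta

    [PSCustomObject]@{
        Name      = $Name
        Rta       = $Rta
        Rto       = $Rto
        Margin    = $margin
        Compliant = ($Rta -lt $Rto)
    }
}

# Made-up values: a 15 minute RTO and two measured recoveries.
$rto = New-TimeSpan -Minutes 15
$results = @(
    Test-RtoCompliance -Name 'SQL01' -Rta (New-TimeSpan -Minutes 2) -Rto $rto
    Test-RtoCompliance -Name 'APP01' -Rta (New-TimeSpan -Minutes 22) -Rto $rto
)

# Report only the systems that missed the target.
$results | Where-Object { -not $_.Compliant } | ForEach-Object {
    Write-Warning "$($_.Name) is out of compliance: RTA $($_.Rta) exceeds RTO $($_.Rto)"
}
```

Returning objects rather than printing text makes it possible to pipe the results into a report or an alert later on.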

Unitrends Instant Recovery provides a fast way to recover and run your virtual machines directly from backup. It takes just a few minutes to have your systems online while, in the background, the virtual machines are moved from backup storage to production storage using the hypervisor's online migration capabilities.

Let's have a look at how you can monitor RTA daily from your last backup using PowerShell and the Unitrends REST API.

How to use PowerShell to get RTA from your backups

To calculate RTA we will use the Start-UebIR PowerShell cmdlet from the Unitrends PowerShell Toolkit to spin up a virtual machine directly from backup storage, and then use VMware PowerCLI to connect to vCenter and monitor the VM power-on process until the VMware Tools heartbeat is OK.

Once we get the VMware Tools heartbeat we can say that the VM has booted successfully. If VMware Tools is not running after 5 minutes (the timeout is configurable), we can assume the virtual machine didn't boot successfully from the last available backup.
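
The wait-with-timeout logic can be looked at in isolation. The following is a minimal sketch of that pattern, assuming a generic condition passed as a script block. The function name Wait-Condition and its parameters are hypothetical, and the condition used here is a stand-in that needs no vCenter connection:

```powershell
# Hypothetical helper: polls a condition until it is true or the timeout expires.
function Wait-Condition {
    param(
        [Parameter(Mandatory = $true)][scriptblock]$Condition,
        [TimeSpan]$Timeout = (New-TimeSpan -Minutes 5),
        [int]$PollSeconds = 5
    )

    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    while ($sw.Elapsed -lt $Timeout) {
        if (& $Condition) { return $true }
        Start-Sleep -Seconds $PollSeconds
    }

    # One last check, so a condition that became true during the final sleep is not missed.
    return [bool](& $Condition)
}

# Stand-in condition for demonstration: becomes true on the third poll.
$script:polls = 0
$ok = Wait-Condition -Timeout (New-TimeSpan -Seconds 10) -PollSeconds 1 -Condition {
    $script:polls++
    $script:polls -ge 3
}
Write-Host "Condition met: $ok"
```

In a connected PowerCLI session the condition would instead check the VM's tools status, for example by comparing the Guest.ToolsStatus value to "toolsOk", as the script does.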

The script content:

param(
        [Parameter(Mandatory=$true)]$Name,
        [Parameter(Mandatory=$true)]$BackupId
    )

    #Change these settings to match your environment
    $ESXHost = "38373035-3436-5a43-4a39-343430343745"
    $Datastore = "TINTRI"
    $UebIp = "192.168.11.20"
    $timeout = 5
    #End of settings


    CheckConnection

    New-VIProperty -Name ToolsStatus -ObjectType VirtualMachine -ValueFromExtensionProperty 'Guest.ToolsStatus' -Force|Out-Null

    $start_date = Get-Date
	
    Write-Host " [*] Starting Unitrends Instant Recovery..."
    Start-UebIr -Host $ESXHost -Name $Name -Datastore $Datastore -BackupId $BackupId -Address $UebIp

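    # Track only the Instant Recovery session that belongs to this VM
    $ir = Get-UebIr|Where-Object {$_.vm_name -eq $Name}

    while($ir.status -eq "running") {
        Sleep 5
        $ir = Get-UebIr|Where-Object {$_.vm_name -eq $Name}
    }

    # Write-Error does not stop the script, so the cleanup at the end still runs
    if($ir.status -ne "available" ) {
        Write-Error "Unitrends Instant Recovery Failed"
    }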

    Write-Host " [*] Waiting for VMware tools to be ready..."
    

    $timeout = new-timespan -Minutes $timeout
    $sw = [diagnostics.stopwatch]::StartNew()
    
    $vm = Get-VM $Name
    while($vm.ToolsStatus -ne "toolsOk" -and $sw.elapsed -lt $timeout)
    {
        Sleep 5
        $vm = Get-VM $Name
    }

    if($vm.ToolsStatus -eq "toolsOk")
    {
        Write-Host "    $Name VMtools heartbeat is successful!"

        $end_date = Get-Date
        $rta = New-TimeSpan -Start $start_date -End $end_date
        Write-Host " [*] $Name RTA is $($rta.days)d $($rta.hours)h $($rta.minutes)m"
    } else {
        Write-Warning "    $Name VMtools heartbeat not OK: $($vm.ToolsStatus)"
        Write-Host " [*] $Name RTA is unavailable (VMware Tools didn't start)"
    }

    Write-Host " [*] Stopping Unitrends Instant Recovery..."
    $stopir = Get-UebIr|Where-Object {$_.vm_name -eq $Name}
    Stop-UebIr -Id $stopir.virtual_id

Running our script

Open a VMware PowerCLI console and connect to vCenter and the Unitrends appliance:

PowerCLI c:\unitrends-pstoolkit> Connect-VIServer -Server vcsa01 -User root -Password xxxx
PowerCLI c:\unitrends-pstoolkit> .\Init.ps1
PowerCLI c:\unitrends-pstoolkit> connect-uebserver -Server ueb08 -User root -Password xxxx
PowerCLI c:\unitrends-pstoolkit> cd Scripts

PowerCLI C:\unitrends-pstoolkit\Scripts> .\Get-UebBackupRto.ps1 -Name SQL01_restore01 -BackupId 1609
 [*] Starting Unitrends Instant Recovery...
 [*] Waiting for VMware tools to be ready...
    SQL01_restore01 VMtools heartbeat is successful!
 [*] SQL01_restore01 RTA is 0d 0h 2m
 [*] Stopping Unitrends Instant Recovery...

Your RTA for this SQL service was 2 minutes! But this is just a small test. In a real DR scenario you will be recovering multiple machines with dependencies between them, so you will have to boot domain controllers first, then databases and finally front ends, which adds extra time to the total recovery process.
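
To get a feel for how boot order affects the total, here is a rough sketch that estimates an end-to-end RTA from per-VM measurements. It assumes tiers are recovered one after another and that VMs inside a tier are recovered in parallel, so each tier costs as much as its slowest VM. The plan, the VM names and the timings are all made up for illustration:

```powershell
# Hypothetical recovery plan: tiers boot in order, VMs within a tier boot in parallel.
$plan = @(
    [PSCustomObject]@{ Tier = 1; Name = 'DC01';  RtaMinutes = 3 }
    [PSCustomObject]@{ Tier = 2; Name = 'SQL01'; RtaMinutes = 2 }
    [PSCustomObject]@{ Tier = 2; Name = 'SQL02'; RtaMinutes = 4 }
    [PSCustomObject]@{ Tier = 3; Name = 'WEB01'; RtaMinutes = 1 }
)

# Each tier takes as long as its slowest VM.
$tierTimes = $plan | Group-Object -Property Tier | Sort-Object { [int]$_.Name } | ForEach-Object {
    $slowest = ($_.Group | Measure-Object -Property RtaMinutes -Maximum).Maximum
    [PSCustomObject]@{ Tier = [int]$_.Name; Minutes = $slowest }
}

# Tiers run sequentially, so the total is the sum of the tier times.
$totalMinutes = ($tierTimes | Measure-Object -Property Minutes -Sum).Sum
Write-Host "Estimated end-to-end RTA: $totalMinutes minutes"
```

With these made-up numbers the estimate is 8 minutes (3 + 4 + 1), compared with 2 minutes for SQL01 on its own. A real plan would also have to account for application validation time between tiers.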

Unitrends Enterprise Plus

ReliableDR, a feature of Unitrends Enterprise Plus, solves this problem by providing continuous RPO/RTO compliance monitoring, one-click failover recovery plans and Recovery Assurance, so you can be sure your backups are tested and that you will be able to recover from them within your RPO/RTO, without having to maintain any scripts.
