Trillium Insights

Thoughts and Insights from Trillium's Practice Leaders

Time to Recover

OK, if I asked, everybody would tell me that they are backing up their data. Some are even testing their ability to restore data from those backups, but even among the companies that go through that exercise, few are aware of how long the recovery process actually takes. The old adage that recovery is what’s important in the data protection process is only partially true. The amount of time it takes to bring an application back into production can now significantly impact the users and customers of even a small or mid-sized company. Backup is not about backup, and it’s not just about recovery; it is about time to recover.

Restoration is more than just copying data back from a disk backup appliance; it covers everything required to bring all of a company’s users or customers back online. All restoration must start with a perfect and secure backup, which means a backup that’s sent to a location that can survive a disaster, not just stored locally. This leads many organizations to consider cloud-based backup solutions, since these by their nature move data off-site automatically.

Of course, the reason that IT administrators are tasked with backing up data is that someday something might go wrong. The possibilities range from accidental user deletion of key data, to application corruption, to server hardware failure, to server hard disk failure, or even a total site loss. While the loss of an entire facility captures the headlines, more common sources of failure are user errors (accidental deletion) or an application corrupting its own data. Hard drive and server failures, while rarer, are so severe that specific protection must be provided against them, regardless of how improbable they are.

Each of these server-specific failures is a disaster in its own right, and the time to recover from it becomes more critical as the business grows. Each directly impacts user productivity and, potentially, the customer experience. In most cases they also impact the company’s ability to generate revenue.

Even though a site disaster is the most severe, most mid-market companies will find a patient user and customer community while they work to recover from one of these more dramatic disasters. As long as the data is still accessible and can be recovered in a few days, most businesses will survive a total site disaster. Again, a cloud-based copy of data is ideal for this.

There are three key phases to bringing an application back online after one of the server-specific disasters. The first is the time it takes to replace the failed component if there was a server or hard drive failure. This means obtaining the replacement, connecting it to the network and, more than likely, loading the core operating system onto the new server or hard drive. These steps can take days in some cases and can mean a significant loss in productivity.

The second phase is the time it takes to copy the data from the backup target to the primary hard drive. If the company chose to use a cloud-only backup solution, this can also take days as data is transported across the internet to the replacement server. While some cloud backup providers can copy data to a hard drive and ship it to the customer, it can still take several days to receive the hard drive and copy the data to the new server.
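To see why the copy phase can stretch into days, here is a rough back-of-the-envelope sketch in Python. The data set size, link speeds and the 80% efficiency factor are illustrative assumptions, not measurements, and the function name is made up for this example.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to copy data_gb gigabytes over a link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved
    (protocol overhead, contention); 0.8 is a guess, not a measured value.
    """
    megabits = data_gb * 8 * 1000          # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical 2 TB data set restored two ways.
internet_hours = transfer_hours(2000, 50)    # 50 Mbps internet connection
lan_hours = transfer_hours(2000, 1000)       # 1 Gbps local network

print(f"Cloud-only restore: {internet_hours:.1f} h ({internet_hours / 24:.1f} days)")
print(f"Local appliance restore: {lan_hours:.1f} h")
```

With these assumed numbers the internet restore comes out at roughly four and a half days, while the same copy from a local source takes a few hours. Real results depend on the actual link, the backup software and the speed of the target disks.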

As a result, a business with servers to protect needs to consider a cloud solution where, at a minimum, the most recent copy of data is stored locally. This is typically an appliance that acts as the backup disk target and a gateway to the cloud. However, the cloud appliance approach still has the challenge of copying data back to the new server, which, depending on the size of the data set, can still take hours. The copy process consumes more than just the time it takes to move a large data set across the network. There is also the time it takes to write data to the local hard drive or the SAN/NAS attached to the server.

The final phase is to actually bring the application back online. This can be complex depending on the state of the data when it was backed up. If the application was not shut down correctly, recovery may take hours. There may also be steps required to reconfigure the application to work on the new server or correctly access the new storage device.

The net result is that recovery takes time, often much longer than expected. It’s not uncommon to hear IT personnel complain that the recovery of a failed application server took days longer than they expected.
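The three phases above can be added up to get a rough time-to-recover figure. The sketch below does that for two hypothetical setups; every number in it is an assumed placeholder to be replaced with figures from your own recovery tests, and the class and field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    name: str
    replace_hardware_hours: float  # phase 1: obtain and set up the replacement
    copy_data_hours: float         # phase 2: copy data back from the backup target
    restart_app_hours: float       # phase 3: bring the application back online

    def time_to_recover(self) -> float:
        """Total hours until users and customers are back online."""
        return (self.replace_hardware_hours
                + self.copy_data_hours
                + self.restart_app_hours)


# Assumed example values only; measure your own environment.
plans = [
    RecoveryPlan("cloud-only backup", 48.0, 111.0, 4.0),
    RecoveryPlan("local appliance + cloud", 48.0, 6.0, 4.0),
]

for plan in plans:
    print(f"{plan.name}: {plan.time_to_recover():.0f} h")
```

Even with the faster local copy, the hardware replacement phase dominates in this made-up example, which is why shortening the copy alone does not guarantee a short time to recover.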

The solution to the recovery challenge is to always be in an “instant recovery” state by leveraging more powerful local appliances that can actually run virtual versions of a server when the original server fails. These recovery appliances allow the small to medium-sized company to bring an application back online in a matter of minutes by leveraging on-site hardware and virtualization.

The biggest cause of problems in the recovery process is not testing that process before a real failure occurs. Testing has to be more than just occasionally copying a few files back and forth; it should mean bringing the whole application online. But again, that’s expensive and time-consuming. The value of a virtual recovery solution is that complete application recovery can be tested without the need to purchase standby hardware or software, or to copy data. Testing can become a regular event that takes only a few minutes. Regular testing also improves the skill level of IT personnel, reducing response time in an actual recovery scenario.

Most small to medium-sized businesses tend to focus on the backup task and assume the data protection job is done when their backups are successfully copied somewhere else. That “somewhere else” used to be a tape drive. Today it’s commonly a disk backup appliance, and it is more and more often some form of cloud backup. A growing number of mid-sized companies understand that recovery of that data and the associated applications is just as important, and they occasionally test their ability to restore data. Very few, though, appreciate the value of time to recover. Virtual recovery appliances can help solve the time-to-recover problem that most small to medium-sized businesses are ill-prepared to face.