Macs in Chemistry

Insanely great science

Backing Up Data Offsite

Whilst I have an external hard drive for backups, I used to use my dotMac account as my offsite backup, but with its closure I had to look for alternatives. I thought it might be useful to summarise my findings.

Backups are one of those things that seem pretty mundane until you really need the backup, and it always seems to occur at the most inconvenient time.

There are a couple of reasons why you may want a copy of a particular file: to restore a file that has become lost or damaged, or to keep a copy for potential regulatory inspection at some later date. In the latter case it may well be very important to ensure that all the appropriate metadata is also captured and stored. Fortunately there is an application you can use to check the fidelity of the backup. Backup Bouncer is a command-line test suite that makes it easy to find out how good or bad your backup software is. It aims to be a comprehensive test for preservation of all OS X file metadata. The results of testing a selection of popular backup software options, carried out by Haystack Software, are shown below.

Arq Passes all tests

Backblaze Failed 19 of 20 tests

Carbonite Failed 20 of 20 tests

CrashPlan Failed 1 of 20 tests

Dropbox Failed 19 of 20 tests

Jungle Disk Passes all tests

Mozy Failed 16 of 20 tests
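To give a feel for what preserving metadata means in practice, here is a minimal Python sketch that compares a few basic attributes (size, permissions and modification time) of an original file and a restored copy. It is only an illustration and is not how Backup Bouncer itself works; the real suite aims to cover all OS X file metadata, which is much more than the three fields checked here. The function name metadata_differences and the file names are my own inventions.

```python
import os
import shutil
import stat
import tempfile


def metadata_differences(original, restored):
    """List the basic metadata fields that differ between two files."""
    a, b = os.stat(original), os.stat(restored)
    checks = {
        "size": (a.st_size, b.st_size),
        "permissions": (stat.S_IMODE(a.st_mode), stat.S_IMODE(b.st_mode)),
        # Compare whole seconds, since some file systems store coarser timestamps.
        "modification time": (int(a.st_mtime), int(b.st_mtime)),
    }
    return [name for name, (x, y) in checks.items() if x != y]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        source = os.path.join(folder, "notebook.txt")
        with open(source, "w") as handle:
            handle.write("sample data")
        # Pretend the file was last modified back in 2001.
        os.utime(source, (1_000_000_000, 1_000_000_000))

        contents_only = os.path.join(folder, "contents_only.txt")
        with_metadata = os.path.join(folder, "with_metadata.txt")
        shutil.copyfile(source, contents_only)  # copies the data only
        shutil.copy2(source, with_metadata)     # also copies times and permissions

        print(metadata_differences(source, contents_only))
        print(metadata_differences(source, with_metadata))
```

A copy made with shutil.copyfile (contents only) should be flagged as differing in modification time, whereas one made with shutil.copy2 should match on all three fields. A backup tool that fails a metadata test is behaving like the first kind of copy.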

Arq

Arq is a true Mac application written in Objective-C, so it has the familiar look and feel. It uses Amazon Web Services for storage and can back up to either Amazon S3 (Simple Storage Service) or Amazon Glacier. Amazon Glacier is very low-priced storage, but it is optimised for data that is infrequently accessed. Initiating retrieval from Glacier typically takes 3-5 hours, and Amazon charges for retrieving large amounts of data from Glacier. Files are encrypted with AES-256 in CBC mode using your own key, and all encryption is done before your data leaves your computer (it does not rely on Amazon's "server-side encryption").

Interestingly, to avoid any potential concerns that you might feel “locked in” to a proprietary software package, Haystack Software has provided an open source restore tool, arq_restore.

This command-line utility is a key part of giving you full control of your backups. You control your backup data (it’s in your own S3 account) and you have the means to easily restore from it in the future without depending on Haystack Software.

Backblaze

Backblaze looks to be a very simple system to manage in that it basically backs up everything except the operating system and applications folder. However, Backblaze is not an archival system, since it simply mirrors your hard drive.

Backblaze will keep previous versions of a file that changes for up to 30 days. However, it is not designed as additional storage for when you run out of space. Because Backblaze mirrors your drive, data you delete locally will be deleted from Backblaze after 30 days.

This also applies to external drives: if a drive is not plugged in for 30 days, its backup is deleted.
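The practical consequence of a 30-day window is easy to work out. The short Python sketch below computes the date on which a deleted file (or an unplugged drive) would drop out of the backup, taking the 30-day figure quoted above at face value. The function names are hypothetical, and exactly how the provider counts the days at the boundary is an assumption on my part.

```python
from datetime import date, timedelta

# Figure quoted in the text; check the provider's current policy before relying on it.
RETENTION_DAYS = 30


def purge_date(last_seen, retention_days=RETENTION_DAYS):
    """Date on which data last seen on `last_seen` would be removed from the backup."""
    return last_seen + timedelta(days=retention_days)


def still_restorable(last_seen, today, retention_days=RETENTION_DAYS):
    """True while `today` is still inside the retention window (assumed exclusive)."""
    return today < purge_date(last_seen, retention_days)


if __name__ == "__main__":
    unplugged = date(2013, 1, 15)
    print(purge_date(unplugged))                          # 2013-02-14
    print(still_restorable(unplugged, date(2013, 2, 1)))  # True
    print(still_restorable(unplugged, date(2013, 3, 1)))  # False
```

So an external drive left in a drawer from mid-January would need to be reconnected before mid-February to stay in the backup.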

CrashPlan

CrashPlan offer a variety of backup plans, using both local hardware and cloud-based recovery. They have public cloud support: PROe Cloud backup provides a secure, scalable offsite backup solution with little IT management overhead or hardware investment. Information is secured before transmission with 448-bit encryption. On the Mac the software is Java based.

Dropbox for Teams

I suspect many people use Dropbox to share personal files, but Dropbox for Teams is intended for business users. It uses the Amazon S3 service to store the Dropbox data. You can apply additional encryption with third-party applications before placing files in Dropbox, giving you added control over security.

Jungle Disk

Jungle Disk provides backup, sync, and data access for teams of 2-100. It is powered by cloud storage options from Rackspace and Amazon. Jungle Disk encrypts user data with AES-256 encryption.

Mozy

Mozy provide a variety of backup schemes, from Personal to Enterprise, with support for desktop and mobile devices. They offer both Mozy's own and custom encryption. You can also order a Data Shuttle device from Mozy: they overnight it to you, you do the initial backup to the shuttle device, then you put it back in the box and ship it to their data center, and you've skipped the initial upload over the wire. This sounds very useful if you have a large initial backup.
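To see why skipping the initial upload matters, it helps to estimate how long a first backup would take over a typical connection. The Python sketch below is a rough back-of-the-envelope calculation; the 500 GB backup size, the uplink speeds and the 80% efficiency factor are all illustrative assumptions of mine, not figures from Mozy.

```python
def upload_days(size_gb, uplink_mbps, efficiency=0.8):
    """Estimate the days needed to upload `size_gb` gigabytes.

    `uplink_mbps` is the upload speed in megabits per second, and
    `efficiency` is an assumed fraction of that speed actually achieved.
    """
    bits = size_gb * 8 * 1000**3                     # decimal gigabytes to bits
    usable_bits_per_second = uplink_mbps * 1_000_000 * efficiency
    seconds = bits / usable_bits_per_second
    return seconds / 86_400                          # seconds in a day


if __name__ == "__main__":
    # Hypothetical 500 GB initial backup over two different uplinks.
    print(f"{upload_days(500, 1):.1f} days at 1 Mbit/s")    # 57.9 days
    print(f"{upload_days(500, 10):.1f} days at 10 Mbit/s")  # 5.8 days
```

On those assumptions a 500 GB first backup over a 1 Mbit/s uplink would run for nearly two months, which is where an overnighted disk starts to look attractive.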

Where to Store?

The next question is where to store the backup. An external hard drive or Time Machine capsule may be convenient, but they only work when you are in the office, and in the event of a fire or burglary it is likely you would lose both the computer and the backup. Using multiple external hard drives, with one stored offsite, is an option, but the logistics can be a bit of a pain.

The ideal solution is an offsite backup that can be accessed anywhere at any time. Whilst MobileMe offered some storage, it is only with the advent of cloud storage that multiple options have become available.

Amazon Web Services

As part of Amazon Web Services (AWS), Amazon offers the Simple Storage Service (Amazon S3), a simple web service that can be used to securely store data and then retrieve it. It is designed with a minimal feature set to ensure ease of use and reliability. Data is stored within a specific geographic region, and the user does not have to worry about data being transferred to another geographic region.

Amazon S3’s standard storage is backed with the Amazon S3 Service Level Agreement and is designed for 99.999999999% durability and 99.99% availability of objects over a given year. It is also designed to sustain the concurrent loss of data in two facilities.
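Percentages with that many nines are hard to picture, so here is a small Python sketch that turns the quoted design targets into more tangible numbers. It simply does the arithmetic on the figures above; the 10,000-object example and the function names are my own, and real-world behaviour depends on Amazon's actual service rather than on this calculation.

```python
from decimal import Decimal

HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent):
    """Hours per year a service may be unavailable at the given availability."""
    return (1 - Decimal(availability_percent) / 100) * HOURS_PER_YEAR


def expected_annual_object_loss(object_count, durability_percent):
    """Average number of objects expected to be lost per year."""
    return object_count * (1 - Decimal(durability_percent) / 100)


if __name__ == "__main__":
    hours = downtime_hours_per_year("99.99")
    print(f"{hours * 60:.2f} minutes of downtime per year")  # 52.56 minutes

    loss = expected_annual_object_loss(10_000, "99.999999999")
    print(f"one object lost every {1 / loss:,.0f} years")    # 10,000,000 years
```

Percentages are passed as strings so that Decimal keeps all eleven nines exactly. In other words, 99.99% availability allows for a little under an hour of downtime a year, and at the stated durability someone storing 10,000 objects could expect to lose one about every ten million years on average.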

There is also a free usage option for new users, allowing up to 5GB of standard storage, 20,000 Get Requests, 2,000 Put Requests, and 15GB of data transfer out each month for one year.

Not all data needs to be instantly accessible and Amazon Glacier’s extremely low-cost storage service would be an alternative as a storage option for data archival. Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. Examples include digital media archives, financial and healthcare records, raw genomic sequence data, long-term database backups, and data that must be retained for regulatory compliance.

Full details of the costings can be found here.

Google Drive for Business

Installing Google Drive allows you to save and access files across multiple devices. All your data is automatically backed up on Google's secure servers, so when accidents happen you can be up and running again in seconds. The SLA guarantees 99.9% availability with zero scheduled downtime. They also claim robust disaster recovery but don’t give details. If you are planning to use Google Apps for Business, this might be worth considering.

Full details of the costings can be found here

CrashPlan Pro for Enterprise

CrashPlan offer a variety of backup plans, using both local hardware and cloud-based recovery. They have public cloud support: PROe Cloud backup provides a secure, scalable offsite backup solution with little IT management overhead or hardware investment. They have data centres distributed throughout the world. Alternatively, you can use their software with your own hardware. Information is secured before transmission with 448-bit encryption.

Rackspace

Rackspace offers options for dedicated and cloud platforms. It is partnered with Akamai Technologies, Inc., whose highly distributed content distribution network has over 84,000 servers in 72 countries. Secure encryption (Advanced Encryption Standard, 256-bit key) is available.

Full details of the costings can be found here

Last Updated 27 February 2013