Macs in Chemistry

Insanely great science

 

Backing Up Data

Not the most exciting of topics, but if you have ever had a drive failure you will rapidly realise how important it is. I've had a couple of drives fail (they all do eventually), and one difference I've noticed is that whilst spinning disc drives often give errors and warnings prior to failure, solid-state drives seem to fail abruptly without warning. So a good backup strategy is now even more important. We now have so much of our digital life on a hard drive, including photos, music, and emails from friends and family. In addition, it is always worth having a current backup prior to a major OS upgrade, and with macOS Catalina on the horizon now would be a good time to review your strategy.

Backups are one of those things that seem pretty mundane until you really need the backup, and it always seems to occur at the most inconvenient time.

For Mac users the simplest way to back up is to use Time Machine. To create backups with Time Machine, all you need is an external storage device. Simply connect the device and select it as your backup disk. Time Machine then automatically makes hourly backups for the past 24 hours, daily backups for the past month, and weekly backups for all previous months. The oldest backups are deleted when your backup disk is full. You can also select folders to exclude from the backup. For example, I have a couple of chemical structure databases that are over a TB in size; I keep them on an external server and copy them to my Mac only when I'm using them, so they don't need to be part of the Mac backup. This all works fine as long as the machine is connected to the external storage device periodically.
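To make the hourly/daily/weekly schedule concrete, here is a minimal Python sketch of that kind of thinning. It is only an illustration of the schedule described above, not Apple's actual algorithm: the function name `thin_backups` is hypothetical, "the past month" is assumed to mean 30 days, and keeping the newest backup in each day or week is my own assumption.

```python
from datetime import datetime, timedelta

def thin_backups(timestamps, now):
    """Return the backups kept under an hourly/daily/weekly schedule.

    Assumptions (not taken from Apple): a month is 30 days, and the
    newest backup in each day or ISO week is the one that is kept.
    """
    kept = []
    seen_days = set()
    seen_weeks = set()
    for ts in sorted(timestamps, reverse=True):    # newest first
        age = now - ts
        if age <= timedelta(hours=24):
            kept.append(ts)                        # keep every backup from the last 24 hours
        elif age <= timedelta(days=30):
            day = ts.date()
            if day not in seen_days:               # one backup per calendar day
                seen_days.add(day)
                kept.append(ts)
        else:
            week = tuple(ts.isocalendar()[:2])     # (ISO year, ISO week number)
            if week not in seen_weeks:             # one backup per week
                seen_weeks.add(week)
                kept.append(ts)
    return sorted(kept)

# Example: 48 hourly backups ending at midday on 20 September 2019
now = datetime(2019, 9, 20, 12, 0)
hourly = [now - timedelta(hours=h) for h in range(48)]
kept = thin_backups(hourly, now)
```

The point of the sketch is simply that the older a backup is, the coarser the granularity at which you can go back to it.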

Using Time Machine you can also designate another volume as a second backup destination. Open System Preferences and then select Time Machine. Click the Select Disk button and you should see not only the volume you’re currently using for your backup but also any volumes available to you. Select an alternative volume and click Use Disk and a sheet will appear asking if you’d like to replace your original backup destination or use both volumes. Click on Use Both. Time Machine will then set about creating a backup archive on the second volume and then back up your Mac to it. From then on it will alternate between the two destinations. If one of the volumes is unavailable, Time Machine will continue backing up to the one it can access. Since external hard drives can also fail this is a simple way to have a "duplicate" backup.

[Image: Time Machine]

There are now many options for external storage, from portable SSDs, for example the Samsung Portable SSD T5 1 TB (Amazon link), to larger desktop disc drives such as the Seagate 6 TB Backup Plus Hub. If you have multiple machines it is probably worth looking at network drives; I use a Synology DS418play 4 Bay Desktop NAS Enclosure. Of course the reliability of the internal disc drives can vary. Fortunately Backblaze, who obviously use a vast number of drives, publish a regular report on hard drive reliability by manufacturer and by disk model and size. It is well worth looking up the latest report before you decide on a purchase.

[Image: hard drive failure rates, 2019]

There are several reasons why you may want a copy of a particular file. You may need to restore a file that has become lost or damaged, or you might need to keep a copy for potential regulatory inspection at some later date. In the latter case it may well be very important to ensure all the appropriate metadata is also captured and stored appropriately. Or you may want to reinstall everything onto a new computer after a drive failure, accidental damage or theft.

However, remember that external hard drives will also fail, so you may need an additional strategy to keep multiple copies.

In the past people have suggested using multiple media for storing backups (DVDs, Zip drives, etc.). Whilst this might seem a good idea, you will also need to keep the device and connectors needed to read the media, and keep the software and drivers up to date.

Nowadays I suspect using an offsite backup is probably the best approach.

Offsite Backups

Clearly an external hard drive may suffer the same fate as the computer in the case of fire, house burglary etc. so it is important to have a backup in another physical location. This could be a hard drive stored in a safety deposit box but nowadays more people are making use of cloud storage options. This could be

Whilst all these backup systems will store most items from your user folder, they may not back up system folders or /usr/local. So if you have a lot of custom Python files installed in /usr/local/bin, you should check whether they are included in the backup. In addition, many do not back up attached external or network drives.
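One quick way to think about this check is to compare the files you care about against the folders your backup tool says it covers. The sketch below is purely illustrative: the function name `uncovered`, the example paths and the list of backed-up folders are all hypothetical, and each backup application has its own include/exclude settings that you would need to look up yourself.

```python
from pathlib import PurePosixPath

def uncovered(paths, backup_roots):
    """Return the paths that do not sit inside any backed-up folder."""
    roots = [PurePosixPath(r) for r in backup_roots]
    missing = []
    for p in paths:
        path = PurePosixPath(p)
        # A path is covered if it is a backup root or lies beneath one.
        if not any(path == r or r in path.parents for r in roots):
            missing.append(p)
    return missing

# Hypothetical example: only the user folder is backed up
important = [
    "/Users/chris/Documents/notes.txt",
    "/usr/local/bin/myscript.py",
]
not_backed_up = uncovered(important, ["/Users/chris"])
```

Anything that ends up in `not_backed_up` needs either adding to the backup selection or copying somewhere that is covered.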

Arq

Arq is a true Mac software application written in Objective-C and so has the familiar look and feel. It supports a variety of cloud storage options, including Amazon Web Services, where Arq can back up to either Amazon S3 (Simple Storage Service) or Amazon Glacier. Amazon Glacier is very low priced storage but it is optimised for data that is infrequently accessed. Initiating retrieval from Glacier typically takes 3-5 hours, and Amazon charges for retrieving large amounts of data from Glacier. Files are encrypted with AES256/CBC using your own key before they leave your computer (Arq does not use Amazon's "server-side encryption"). It also uses built-in compression and de-duplication to reduce upload times.
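To show why compression and de-duplication reduce upload times, here is a minimal sketch of the general idea: identify each file by a hash of its contents, and only compress and upload content that has not been seen before. This is not Arq's actual storage format; the function name `plan_upload` and the example file names are hypothetical, and the encryption step is left out entirely.

```python
import hashlib
import zlib

def plan_upload(files, already_stored=()):
    """Work out which blobs need uploading.

    files: mapping of file name -> file contents (bytes).
    already_stored: hashes of content already held in the backup.
    Returns (blobs_to_upload, index), where index maps name -> hash.
    """
    stored = set(already_stored)
    to_upload = {}
    index = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()   # identifies the content
        index[name] = digest
        if digest not in stored:                    # upload new content only
            to_upload[digest] = zlib.compress(data)
            stored.add(digest)
    return to_upload, index

# Two identical files and one different file (hypothetical names)
files = {
    "a.sdf": b"C" * 1000,
    "copy_of_a.sdf": b"C" * 1000,
    "b.sdf": b"N" * 1000,
}
blobs, index = plan_upload(files)
```

Here three files result in only two uploads, and each upload is smaller than the original data.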

Interestingly, to avoid any potential concerns that you might feel “locked in” to a proprietary software package they have provided an open source arq restore tool, arq_restore.

This command-line utility is a key part of giving you full control of your backups. You control your backup data (it’s in your own S3 account) and you have the means to easily restore from it in the future without depending on Haystack Software.

Backblaze

Backblaze looks to be a very simple system to manage in that it basically backs up everything, except the operating system and applications folder.

Backblaze does not want to waste your bandwidth or Backblaze datacenter disk space. Thus, we do not backup your operating system, application folder, or temporary internet files that are transient and would not be useful in the future. Backblaze also excludes podcasts in iTunes.

However Backblaze is not an archival system since it simply mirrors your hard drive.

Backblaze will keep versions of a file that changes for up to 30 days. However, Backblaze is not designed as an additional storage system when you run out of space. Backblaze mirrors your drive. If you delete your data, it will be deleted from Backblaze after 30 days.

This also applies to external drives: if they are not plugged in for 30 days then the backup is deleted.
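The practical consequence is a deadline, which is easy to work out. A minimal sketch, using the 30-day figure quoted above; the function names are hypothetical and how the service treats the exact boundary day is my assumption.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=30)   # retention period quoted above

def purge_date(last_seen):
    """Date after which a deleted file or unplugged drive is dropped."""
    return last_seen + RETENTION

def still_recoverable(last_seen, today):
    # Assumption: data is still available on the purge date itself.
    return today <= purge_date(last_seen)

# Hypothetical example: an external drive last connected on 1 August 2019
deadline = purge_date(date(2019, 8, 1))
```

So a drive that only gets plugged in every couple of months would not stay in the backup.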

CrashPlan

CrashPlan offer a variety of backup plans, using both local hardware and cloud-based recovery. They have public cloud support, and their PROe Cloud backup is described as a secure, scalable offsite backup solution with low IT management overhead or hardware investment. Information is secured before transmission with 448-bit encryption. On the Mac the software seems to be Java based.

Dropbox for Teams

I suspect many people use Dropbox to share personal files, but Dropbox for Teams is intended for business users. Dropbox stores its data using the Amazon S3 service. You can apply additional encryption with third-party applications before placing files in Dropbox, giving you added control over the security.

Jungle Disk

Jungle Disk provides backup, sync, and data access for teams of 2-100 people. It is powered by cloud storage from Rackspace and Amazon, and encrypts user data with AES-256 encryption.

Mozy

Mozy provide a variety of different backup schemes, from Personal to Enterprise, with support for desktop and mobile devices. They offer both Mozy's own and custom encryption. You can also order a Data Shuttle device from Mozy: they overnight it to you, and you do the initial backup to the shuttle device. Put it back in the box and ship it to their data center and you've skipped the initial upload over the wire. This sounds very useful if you have a large initial backup.
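A rough calculation shows why skipping the initial upload can matter. The sketch below estimates how long an upload would take over a home connection; the 1 TB size and 10 Mbit/s upload speed are hypothetical figures, and it ignores protocol overhead, compression and any throttling.

```python
def upload_days(size_tb, upload_mbit_per_s):
    """Rough time in days to upload size_tb (decimal terabytes)."""
    bits = size_tb * 1e12 * 8                     # terabytes -> bits
    seconds = bits / (upload_mbit_per_s * 1e6)    # Mbit/s -> bit/s
    return seconds / 86400                        # seconds -> days

# Hypothetical example: 1 TB over a 10 Mbit/s upload link
days = upload_days(1, 10)
print(f"{days:.1f} days")
```

With those assumed figures the first backup would take over nine days of continuous uploading.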

Update

A reader suggested using Carbon Copy Cloner to create backups. I have used this in the past to create bootable backups but not as a scheduled archival system.

Where to Store?

Amazon Web Services

Amazon offers a Simple Storage Service (Amazon S3) as part of Amazon Web Services (AWS). It is a simple web service that can be used to securely store data and then retrieve it. It is designed with a minimal feature set to ensure ease of use and reliability. Data is stored within a specific geographic region, and the user does not have to worry about data being transferred to another geographic region.

Amazon S3’s standard storage is backed with the Amazon S3 Service Level Agreement and is designed for 99.999999999% durability and 99.99% availability of objects over a given year. It is also designed to sustain the concurrent loss of data in two facilities.
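It can help to turn those percentages into more tangible numbers. The following is a simple arithmetic sketch, not anything from Amazon's SLA itself: the function names are hypothetical, a year is taken as 365 days, and the count of ten million stored objects is just an example.

```python
HOURS_PER_YEAR = 365 * 24   # 8760, ignoring leap years

def downtime_hours_per_year(availability_percent):
    """Maximum unavailable time per year implied by an availability figure."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR

def expected_objects_lost_per_year(n_objects, durability_percent):
    """Average number of objects lost per year at a given durability."""
    return n_objects * (1 - durability_percent / 100)

# 99.99% availability expressed in minutes per year
minutes_down = downtime_hours_per_year(99.99) * 60

# 99.999999999% durability for a hypothetical ten million objects
lost = expected_objects_lost_per_year(10_000_000, 99.999999999)
```

On these figures 99.99% availability allows a little under an hour of unavailability per year, and with ten million objects you would expect to lose roughly one every 10,000 years.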

There is also a free usage option for new users allowing up to 5 GB of standard storage, 20,000 Get Requests, 2,000 Put Requests, and 15 GB of data transfer out each month for one year.

Not all data needs to be instantly accessible and Amazon Glacier’s extremely low-cost storage service would be an alternative as a storage option for data archival. Amazon Glacier is optimized for data that is infrequently accessed and for which retrieval times of several hours are suitable. Examples include digital media archives, financial and healthcare records, raw genomic sequence data, long-term database backups, and data that must be retained for regulatory compliance.

Full details of the costings can be found here.

Google Drive for Business

Installing Google Drive allows you to save and access files across multiple devices. All your data is automatically backed up on Google's secure servers, so when accidents happen you can be up and running again in seconds. The SLA guarantees 99.9% availability with zero scheduled downtime. They also claim robust disaster recovery but don’t give details. If you are planning to use Google Apps for business this might be worth considering.

Full details of the costings can be found here

CrashPlan Pro for Enterprise

CrashPlan offer a variety of backup plans, using both local hardware and cloud-based recovery. They have public cloud support, and their PROe Cloud backup is described as a secure, scalable offsite backup solution with low IT management overhead or hardware investment. They have data centres distributed throughout the world. Alternatively you can use their software with your own hardware. Information is secured before transmission with 448-bit encryption.

Rackspace

Rackspace offers options for dedicated and cloud platforms. It is partnered with Akamai Technologies, Inc., whose highly distributed content distribution network has over 84,000 servers in 72 countries. Secure encryption (Advanced Encryption Standard, 256-bit key) is available.

Full details of the costings can be found here

Mobile Devices

Most of the services described above are designed for desktop or laptop machines, so mobile devices (iPhone, iPad) are not really covered. If you lose or change your device you can restore from an iCloud backup. Most documents created using iOS apps (Numbers, Pages, Keynote) will be automatically copied to iCloud Drive; however, this is not a backup in the sense of a system that also stores older versions. Photos and music can be stored in the cloud, and anything that is deleted can be restored within a limited time period. If you have lots of third-party apps it might be worth checking whether it is possible to restore them from a backup, or investing in something like Tenorshare.

Last Updated 20 September 2019