My entry in the “nightly snapshot backups of EBS volumes” meme

By jik | August 7, 2012

UPDATED 9/4/2012: I accidentally had a hard-coded AWS_VOLUME_IDS setting in the script, which I inserted while debugging my own copy of the script and forgot to remove before posting the script here. I’ve removed it. D’oh!

UPDATED 8/19/2012: The logic in my original script for determining which backups to preserve was incorrect. It is updated below.

The easiest way to robustly back up an Amazon EBS volume is to take a snapshot. An EBS volume is stored in only a single availability zone, so a catastrophic failure in that zone could destroy any backups you keep on EBS along with the volume itself. Snapshots, by contrast, are stored in S3 and replicated across all availability zones in a region, resulting in a ridiculously low likelihood of data loss. (Nevertheless, if your disaster recovery plans need to account for the possibility that an entire EC2 region could kick the bucket, you need to back up your data some other way in addition to the mechanism outlined here.)

Many people have posted their solutions to the “automatically backing up an EBS volume on a regular basis” problem. Here’s mine.

To use it, create /etc/sysconfig/aws with settings for AWS_ACCESS_KEY, AWS_SECRET_KEY, and AWS_VOLUME_IDS in it. The last of these should contain one or more whitespace-separated volume IDs to be backed up.
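
For illustration, a minimal /etc/sysconfig/aws might look something like the following. The key values and volume IDs are placeholders, not real credentials. Because the script sources this file with `.`, it has to be valid shell syntax.

```shell
# /etc/sysconfig/aws (sourced by the backup script, so plain shell assignments)
# All values below are placeholders; substitute your own.
AWS_ACCESS_KEY='AKIAEXAMPLEEXAMPLE'
AWS_SECRET_KEY='example-secret-key-replace-me'
# One or more whitespace-separated volume IDs to back up
AWS_VOLUME_IDS='vol-11111111 vol-22222222'
```

Since the file holds credentials, it is probably wise to make it readable only by root, e.g., with chmod 600 /etc/sysconfig/aws.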

The account with the access and secret keys you specify must have at least ec2:CreateSnapshot, ec2:DescribeSnapshots, and ec2:DeleteSnapshot permissions.
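
As a rough sketch, an IAM policy granting just those three actions could look like the document printed by the function below. The function name print_snapshot_policy is made up for this example, and "Resource": "*" is an assumption made for simplicity. How you attach the policy to the account (IAM console or other tooling) is up to you.

```shell
# Print a minimal IAM policy document allowing only the three
# snapshot actions the backup script needs.
# (print_snapshot_policy is a hypothetical helper name.)
print_snapshot_policy() {
    cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSnapshot",
        "ec2:DescribeSnapshots",
        "ec2:DeleteSnapshot"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
```

To save the policy to a file for pasting or uploading, run something like print_snapshot_policy > ebs-snapshot-policy.json (the file name is arbitrary).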

Every time the script runs, it creates a new snapshot for each specified volume, then prunes previous backup snapshots of the same volume as follows:

  • Save daily backups for the past week.
  • Save weekly backups for the past month.
  • Save monthly backups for the past year.
  • Prune everything else.
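
To make the retention scheme concrete, here is a small sketch that prints the cutoff dates it implies, newest first. It assumes GNU date (for the -d option). The function name retention_cutoffs and its optional reference-date argument are additions for illustration only.

```shell
# Print the cutoff dates implied by the retention scheme, newest first.
# Requires GNU date. The optional argument is a reference date
# (YYYY-MM-DD); it defaults to today and exists only for illustration.
retention_cutoffs() {
    local ref="${1:-$(date '+%F')}"
    local n
    # Daily: the reference day and the six days before it
    date -d "$ref" '+%F'
    for n in 1 2 3 4 5 6; do
        date -d "$ref $n days ago" '+%F'
    done
    # Weekly: two, three, and four weeks back
    for n in 2 3 4; do
        date -d "$ref $n weeks ago" '+%F'
    done
    # Monthly: two through twelve months back
    for n in 2 3 4 5 6 7 8 9 10 11 12; do
        date -d "$ref $n months ago" '+%F'
    done
}

retention_cutoffs 2012-08-07
```

For the reference date 2012-08-07 this prints 21 dates, starting with 2012-08-07 and ending with 2011-08-07.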

You can save the script in /etc/cron.daily to run your backups automatically on a daily basis.
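
Installing it might look like the sketch below. The helper name install_backup_script and the file name ebs-snapshot-backup are made up for this example. Some implementations of run-parts (Debian's, for instance) skip files whose names contain a dot, so leaving off a .sh extension is the safer choice.

```shell
# Copy the backup script into a cron.daily directory, with read, write,
# and execute permission for the owner only (it runs as root under cron).
# Usage: install_backup_script SOURCE [DEST_DIR]
# (helper name and installed file name are hypothetical)
install_backup_script() {
    local src="$1"
    local dest_dir="${2:-/etc/cron.daily}"
    install -m 700 "$src" "$dest_dir/ebs-snapshot-backup"
}
```

Run as root, install_backup_script ./ebs-snapshot-backup would put the script in /etc/cron.daily.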

#!/bin/bash -le
# -l above so /etc/profile.d/aws-apitools-common.sh is loaded
# Daily snapshot, preserving weekly and monthly for a year.

# AWS_ACCESS_KEY, AWS_SECRET_KEY, and AWS_VOLUME_IDS are set in
# /etc/sysconfig/aws, which is sourced below.
export AWS_ACCESS_KEY AWS_SECRET_KEY

. /etc/sysconfig/aws

volume_ids="$AWS_VOLUME_IDS"
set -- $volume_ids

if [ $# = 0 ]; then
    echo "Can't determine volume IDs" 1>&2
    exit 1
fi

for volume_id in $volume_ids; do
    ec2-create-snapshot $volume_id --description 'Automated volume backup'

    # Prune old snapshots

    cutoffs="$(date '+%F')"
    for days in 1 2 3 4 5 6; do
        cutoffs="$cutoffs $(date -d "$days days ago" '+%F')"
    done
    for weeks in 2 3 4; do
        cutoffs="$cutoffs $(date -d "$weeks weeks ago" '+%F')"
    done
    for months in 2 3 4 5 6 7 8 9 10 11 12; do
        cutoffs="$cutoffs $(date -d "$months months ago" '+%F')"
    done

    # Load the cutoff dates (newest first) into the positional parameters
    set -- $cutoffs

    ec2-describe-snapshots |
        grep "$volume_id.*Automated volume backup" |
        sort -k 5r |  # sort on the timestamp field, newest first
        while read type id volume status timestamp rest; do
            if [ "$volume" != $volume_id ]; then
                continue
            fi
            if [ $# = 0 ]; then
                ec2-delete-snapshot $id
                continue
            fi
            date=$(expr "$timestamp" : '\(....-..-..\)')
            if [[ $1 < $date ]]; then
                if [ -n "$last" ]; then
                    ec2-delete-snapshot $last
                fi
                last="$id"
                continue
            fi
            last=""
            shift
        done
done
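
To see what the pruning loop does without touching any real snapshots, here is a dry-run restatement of it. This is a sketch, not part of the script above: the function name prune_plan, the simplified "id date" input format, and the sample snapshot IDs are all made up for illustration. It takes cutoff dates (newest first) as arguments, reads snapshots (newest first) on stdin, and prints the IDs that would be deleted.

```shell
# Dry run of the pruning pass: print the snapshot IDs that would be deleted.
# Arguments: cutoff dates, newest first.
# Stdin: one "snapshot-id YYYY-MM-DD" pair per line, newest first.
prune_plan() {
    local id date last=""
    while read -r id date; do
        # No cutoffs left: everything older gets pruned
        if [ $# -eq 0 ]; then
            echo "$id"
            continue
        fi
        # Snapshot is newer than the current cutoff: only the oldest
        # such snapshot survives, so drop the previously remembered one
        if [[ $1 < $date ]]; then
            if [ -n "$last" ]; then
                echo "$last"
            fi
            last="$id"
            continue
        fi
        # Snapshot is at or before the cutoff: keep it, along with the
        # remembered one, and move on to the next cutoff
        last=""
        shift
    done
}

prune_plan 2012-08-07 2012-07-31 <<'EOF'
snap-a 2012-08-07
snap-b 2012-08-06
snap-c 2012-08-05
snap-d 2012-08-04
snap-e 2012-07-31
EOF
```

With these two cutoffs, the sample run prints snap-b and snap-c. snap-a and snap-e each fall on a cutoff date, and snap-d is kept as the oldest snapshot in between.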

4 thoughts on “My entry in the “nightly snapshot backups of EBS volumes” meme”

  1. Mike

    “Whereas EBS volumes are stored in only a single region, such that a catastrophic failure in that region could destroy your backups along with your EBS volume, snapshots are stored in S3 and replicated across all regions, resulting in a ridiculously low likelihood of data loss.”

    If you are relying on this technique to offer redundancy for your backups, you may want to review the S3 documentation: http://docs.amazonwebservices.com/AmazonS3/2006-03-01/UG/Introduction.html

    Specifically, “Objects stored in a Region never leave the Region unless you explicitly transfer them to another Region.” Since snapshots are stored in S3, they are just as susceptible as anything else should that region become unavailable for some reason. They are replicated across zones within a region, but NOT to other regions.

    1. jik Post author

      You’re correct, I misspoke. EBS snapshots aren’t replicated across regions. They are, however, replicated across availability zones, giving them a significantly higher level of redundancy than EBS volumes. Nevertheless, you’re correct that if your disaster recovery plans need to account for the possibility of an entire Amazon region kicking the bucket, you need to back up your data using some other mechanism in addition to snapshots. I’ve updated the text above to reflect this.

  2. dave

    Nice but I think it has a couple of bugs, possibly from copy pasting?

    1. jik Post author

      Yeah, looks like <& and < got translated into HTML entities incorrectly. I've fixed it. Thanks for pointing that out!
