The last time I looked, there were no options I was happy with for backing up my Linux PC in the cloud for a reasonable price. That may have changed, and perhaps if I were less stubborn I’d pay CrashPlan a few bucks per month to back up my system, but I feel compelled to build my own backup infrastructure for three reasons: (1) I want complete control over it; (2) I worry about a company backing up my data going belly-up and leaving me high and dry; and (3) I’m a cheap bastard.
Until recently I was backing up my data into a ReiserFS filesystem being stored in the Amazon S3 cloud via s3backer. That was costing me on average a little under $5 per month in storage and transaction costs.
Then I got an offer from AppSumo for 10GB of storage for life from LetsCrate for only $25. That got me wondering… There are a bunch of cloud storage / file sharing services on the Internet right now, and just about every one of them offers some amount of storage for free. Could I find a way to take advantage of all that free storage to reduce my backup costs almost to nothing?
What’s out there
Here are the cloud storage services that offer free Linux-accessible storage that I know about (if you know of others, please post a comment or email me!):
|Service|Free space|Referral link|
|SugarSync|5GB, plus bonuses for doing various things like installing clients, plus 500MB per referral|Thanks!|
|IDrive|5GB, plus 10GB if you let IDrive send a promotional email to all of your contacts, plus 1GB per referral up to a limit of 50GB|Thanks!|
|Dropbox|2GB, plus 250MB just for setting up the client successfully, plus up to 8GB more in 250MB increments for referrals|Thanks!|
|Syncplicity|2GB plus referrals (1GB each up to 3GB)|Thanks!|
|TeamDrive|2GB, plus 250MB per referral up to 8GB|Thanks!|
|SpiderOak|2GB, plus 1GB per referral up to 50GB|Thanks!|
 IDrive isn’t directly accessible from Linux, and it doesn’t seem to play well with WINE, at least as of WINE 1.3.29, so you’ll have to run Windows inside of VirtualBox or something if you want to use it. Also, under Windows, you have to use msconfig to prevent it from starting up when you log in.
 If you use an email address ending in “.edu” and use that to register for your Dropbox account, and then validate it at http://dropbox.com/edu, you get 500MB per referral instead of 250MB. You can even change your Dropbox email address to a “.edu” one and then use the link above to get the extra referral space retroactively!
 Syncplicity isn’t directly accessible from Linux, and I can’t confirm whether it runs under WINE because winetricks won’t install .NET 3.0 properly on an x86_64 system like mine. Under Windows, you have to use msconfig to prevent it from starting up when you log in.
 The Linux client provided by ZumoDrive doesn’t work on Fedora 16 Linux, and I couldn’t figure out how to get it to work. The Windows client installs, but I couldn’t find anywhere in the client or on the ZumoDrive Web site where I could actually sign up for a new account! Maybe they’re not accepting new users anymore or something?
 So far, I am unimpressed with LetsCrate. Once you actually get a file into their system, it seems to be safe, but getting files in can be a challenge, since the web app is flaky and unreliable. Perhaps they will improve over time. (I wrote this in October 2011.)
That’s 50GB of free space in the cloud (43GB if you don’t have a Windows VM), if you can figure out how to use it effectively. So, how do you use it effectively?
The backup script: crateify.pl
My answer is crateify.pl, a simple Perl script I wrote for this purpose.
Without further ado, here is its embedded Perl “POD” documentation:
crateify.pl – Package up files for backing up in the cloud
This script packages files within a directory tree into compressed, encrypted tar “crates” that can be easily uploaded to free cloud storage accounts, providing a sort of poor-man’s cloud backup solution.
The files are packaged in chronological order, i.e., oldest files first, to minimize the frequency with which you have to rebuild crates. Files that are updated between runs of the script are repackaged in new crates.
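To make the packing order concrete, here is a rough shell sketch of the idea. It is not the script’s actual code: the function name plan_crates and its output format are made up for illustration, and it assumes GNU find (for -printf) and file names without tabs or newlines. It lists files oldest first and starts a new crate once the running total reaches the size limit, so a crate can overshoot the limit when the last file added to it is large.

```shell
# Hypothetical sketch: print "<crate number><TAB><relative path>" for every
# regular file under directory $1, oldest first, starting a new crate once
# the running total of file sizes reaches $2 bytes.
# The body runs in a subshell, so the cd does not affect the caller.
plan_crates() (
    cd "$1" || exit 1
    # %T@ = modification time in seconds, %s = size in bytes,
    # %P = path relative to the starting directory (GNU find only).
    find . -type f -printf '%T@\t%s\t%P\n' |
    LC_ALL=C sort -n |
    awk -F'\t' -v limit="$2" '
        BEGIN { crate = 1; total = 0 }
        {
            print crate "\t" $3
            total += $2
            # The file that crosses the limit stays in the current crate.
            if (total >= limit) { crate++; total = 0 }
        }'
)
```

Calling plan_crates /path/to/mirror 100000000 would print one line per file, with the crate number in the first column.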
The following variables can and should be edited in the script before you use it:
- The directory whose contents should be crated.
Note that files with newlines in their names will not be crated.
- The directory in which crates and associated metadata files should be stored.
- The directory in which the keyring containing your GPG key (used to encrypt the crates for safe storage online) is stored.
- The identifier of the GPG key that should be used to encrypt the crates.
NOTE: Make sure you have copies of your public and private GPG keys backed up somewhere safe, not inside a crate. If your computer crashes and you need to restore from your backup, it won’t do any good if you can’t decrypt it!
- The (pre-compression, pre-encryption) size of each crate, in bytes. A crate can end up being much bigger than this if the last file inserted into it is large.
- Regular expressions (relative to the root of $backup_dir) of directories and files to be excluded from crating.
Here’s a trick I use to find out what’s taking up space in my crates:
cd $backup_dir
sed -e 's/ [0-9]*$//' $data_dir/crate-##### | xargs -d '\n' ls -lSr
This lists the files in the specific crate, in size order, so you can see what’s taking up a lot of space. I do this whenever my nightly backup report email tells me that a larger than expected crate was built.
Note that I personally do not back up my “live” hard drive, but rather a mirror hard drive maintained with rdiff-backup. Therefore, most of the files I would not want/need to crate are already excluded from my $backup_dir, which is why my @exclude list is so short.
If you do back up your live hard drive, then make sure you exclude cloud storage directories, e.g., ~/Dropbox, especially if you store crates in them! Otherwise, you’ll create a loop where each time you create new crates in a backup, your old crates will be included in them, which would obviously be Very Bad.
If you specify both @exclude and @include, then @include is applied first and @exclude is applied to what’s left.
- Regular expressions (relative to the root of $backup_dir) of directories and files to be included in crating.
If you specify both @exclude and @include, then @include is applied first and @exclude is applied to what’s left.
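The order in which the two lists are applied can be illustrated with a small shell filter. This is only a sketch: filter_paths is a made-up name, it takes a single pattern for each list, and it uses grep’s extended regular expressions, which are not identical to the Perl regular expressions the script itself uses.

```shell
# Hypothetical sketch: read paths on stdin, keep those matching the include
# pattern ($1), then drop those matching the exclude pattern ($2).
# Both patterns must be non-empty; an empty exclude pattern would match
# every line and drop everything.
filter_paths() {
    grep -E -- "$1" | grep -Ev -- "$2"
}
```

For example, filter_paths '^home/jik/docs/' '/cache/' keeps everything under home/jik/docs except paths that contain a cache directory.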
- Produce (at most) the specified number of crates, rather than just one new crate, which is the default.
This is faster when you want to produce multiple crates, since it won’t have to rescan the entire backup directory for each one.
- Create enough crates to hold everything that currently needs to be crated.
- Update meta-data files (see below) without building any new crates.
- Don’t print warnings about updated or deleted files in existing crates.
The early crates you build will probably be relatively static, assuming that you have a lot of old data that isn’t likely to change anymore.
However, over time your crates will accumulate files that are obsolete because they’ve been deleted or updated versions have been packed into newer crates. Each time you run it, the script prints warnings about such files.
You will probably want to occasionally “compact” your crates to remove such obsolete files. To do this, simply remove the corresponding crate-##### files from $data_dir, and the corresponding compressed, encrypted tar files from wherever you put them, and the script will repack the files that were in those crates the next time you run it.
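For crates kept on a local or mounted drive, the removal step might look something like the sketch below. The function name forget_crate is made up, and the crate file name (crate-#####.tar.bz2.gnupg) is an assumption based on the split example elsewhere in this documentation. Crates that live only in a cloud account have to be deleted through that service instead.

```shell
# Hypothetical sketch: forget crate number $3 so that its files get repacked
# on the next run. $1 is $data_dir (where the crate-##### listings live) and
# $2 is the directory holding the encrypted crates.
forget_crate() {
    data_dir=$1
    crate_dir=$2
    number=$3
    # Remove the listing, the crate itself, and any split pieces of it.
    rm -f -- "$data_dir/crate-$number" \
             "$crate_dir/crate-$number.tar.bz2.gnupg" \
             "$crate_dir/crate-$number.tar.bz2.gnupg".[0-9]*
}
```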
The script creates the following meta-data files:
- Listings of the files in each crate. The script needs these to work, so you should leave them in $data_dir even if you move the crates themselves into the cloud.
- A list of the crated files that have been deleted since they were crated.
- A list of the crated files that have been updated since they were crated, i.e., files that have obsolete versions in one or more crates, and will also, if your crates are up-to-date, have a current version in one crate.
- Temporary file created and used while packing crates. It should not exist between successful runs of the script, but you shouldn’t create a file with this name in $data_dir or it’ll get overwritten.
- A list of all the files in all of the crates, intended to be used to exclude those files from some other backup system.
Suppose you want to use this script to back up your old, static files that never change, but you’d rather use some other backup system to back up frequently changing files. To do that, you would tell the other backup system to exclude the files listed in $data_dir/excludes.
For example, if you use rsync to back up frequently changing files to a remote filesystem, then you can tell it to “--exclude-from $data_dir/excludes”.
The crates you build with this script obviously don’t do much good as a backup if they sit on the same drive as the files being backed up. Here are some examples of what you can do with them to turn them into a real backup.
- Stick an extra hard drive (internal or external) into your system and put your crates on it. This won’t do you much good if your house burns down or somebody steals your computer, but it’ll at least protect you against drive failure.
- Make a deal with a friend — he lets you use unused space on his hard disk to scp your crates to every night when you back up, and vice versa.
- Free cloud storage! See http://blog.kamens.us/ for a list of cloud storage platforms that will give you a total of 50GB of free storage just for asking. You can store a lot of crates in 50GB!
- Upload them to Amazon S3 or some other commercial cloud storage service.
Personally, I have uploaded most of my crates, the ones containing older files that change rarely if ever, by hand to free accounts on SkyDrive and LetsCrate. Then, my nightly backup puts new crates in my Dropbox folder, so they get synchronized to the cloud automatically. Occasionally, I compact the Dropbox crates as described above and move some of the compacted crates to SkyDrive or LetsCrate as needed.
You probably have some really huge files (home videos, anyone?) that you want to back up. Since this script doesn’t split files between crates, any crate containing a really huge file is going to be really huge itself.
Depending on where you store your crates, this may present a problem, since some cloud storage services limit the size of uploaded files.
The easiest solution is to split big crates before uploading them. For example:
split -b 50000000 -d crate-#####.tar.bz2.gnupg crate-#####.tar.bz2.gnupg. && \
    rm crate-#####.tar.bz2.gnupg
The name of the crate is specified to the “split” command a second time with a period at the end of it as the file-name prefix for the split files that are produced.
If you ever need to restore from a split crate, you can cat all of the split files directly into gpg, something like this:
cat crate-#####.tar.bz2.gnupg.* | gpg | tar xj
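If you are nervous about deleting the original crate right after splitting it, a slightly more cautious variant is sketched below. The function name split_crate is made up; it uses the same split options as above, then checks that the pieces concatenate back into a byte-for-byte copy of the original before removing it.

```shell
# Hypothetical sketch: split crate file $1 into pieces of $2 bytes named
# "$1.00", "$1.01", ..., and delete the original only if the pieces
# reassemble into an identical file.
split_crate() {
    crate=$1
    piece_size=$2
    split -b "$piece_size" -d -- "$crate" "$crate." &&
    # "cmp -s - FILE" silently compares standard input against FILE.
    cat -- "$crate".[0-9]* | cmp -s - "$crate" &&
    rm -- "$crate"
}
```

If stale pieces from an earlier split are lying around, the comparison fails and the original crate is left in place.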
If you can’t figure out on your own how to restore from the crates produced by this script, then you probably shouldn’t use it. CrashPlan is a pretty nice service, and it’s very inexpensive. 🙂
Having said that…
To restore from a set of crates, you decrypt and untar all the crates in order (preferably as root, so that read-only, updated files can be overwritten) and then remove the ones listed in the “deleted” file.
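In shell terms, a full restore might look roughly like the sketch below. Treat it as an outline under several assumptions: the function names are made up, the crates are assumed to be unsplit files named crate-#####.tar.bz2.gnupg with zero-padded numbers, and the “deleted” file is assumed to hold one path per line relative to the restore root. If your “deleted” file carries extra fields, strip them first.

```shell
# Hypothetical sketch: remove every file listed in $2 (one relative path per
# line, an assumed format) from the restored tree rooted at $1.
remove_deleted() {
    restore_root=$1
    deleted_list=$2
    while IFS= read -r path; do
        if [ -n "$path" ]; then
            rm -f -- "$restore_root/$path"
        fi
    done < "$deleted_list"
}

# Hypothetical sketch: decrypt and unpack every crate in $1 into $2, then
# apply the "deleted" list in $3. Requires gpg with your private key.
restore_crates() {
    crate_dir=$1
    restore_root=$2
    deleted_list=$3
    # Glob expansion sorts names, so zero-padded crate numbers come out in
    # order and newer copies of a file overwrite older ones.
    for crate in "$crate_dir"/crate-*.tar.bz2.gnupg; do
        gpg --decrypt "$crate" | tar -xjf - -C "$restore_root" || return 1
    done
    remove_deleted "$restore_root" "$deleted_list"
}
```

Split crates need to be reassembled first, since their pieces do not match the crate-*.tar.bz2.gnupg pattern.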
Alternatively, if you just need to restore a specific file, you can look through the crate-##### files in reverse order to find the file you want, and then extract it from the corresponding crate.
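The search through the listings can be automated along these lines. This is a sketch with a made-up function name, find_crate, and it assumes each line of a crate-##### listing is a path followed by a space and a number, which is what the sed command shown earlier strips off. It also assumes nothing else in $data_dir has a name starting with “crate-”.

```shell
# Hypothetical sketch: print the name of the newest crate listing in
# directory $1 that contains the exact path $2; return 1 if none does.
find_crate() {
    data_dir=$1
    wanted=$2
    found=
    # Listings are visited in ascending order, so the last match is the
    # newest crate containing the file.
    for listing in "$data_dir"/crate-*; do
        if sed -e 's/ [0-9]*$//' "$listing" | grep -Fxq -- "$wanted"; then
            found=$listing
        fi
    done
    if [ -n "$found" ]; then
        basename "$found"
    else
        return 1
    fi
}
```

Once you know the crate, you can decrypt it and pass the path to tar to extract just that one file.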
This script isn’t really intended to preserve historical versions of files or to allow you to recover files that were deleted long ago. It sort of does that if you never compact your crates, but that’ll eat up a lot of extra storage space for files that change regularly.
Therefore, if you want access to a historical record of your files, as opposed to an emergency recovery snapshot of what you’ve got on disk right now, this probably isn’t the right tool for you.
This script was written and is maintained by Jonathan Kamens <[email protected]>.
Please let me know if you have questions, comments or suggestions!
I won’t lie to you… It takes work to set up and use this script for backups. If you’re the kind of do-it-yourselfer who likes stuff like this, great, but if not, you might be asking yourself, “Are there other options for backing up my Linux box for free?”
There are probably quite a few of them, and if you have a favorite, please feel free to email me and I’ll add it here. Here’s the one I like…
CrashPlan (http://crashplan.com/), which I’ve mentioned elsewhere in this document, will let you back up an unlimited amount of data to their servers for $3.00 per month. This is neat, but they’ll also let you use their easy-to-use software for free to back up your data to your own server instead of theirs.
“How is that free?” you’re asking? Well, if you can find a friend with an Internet connection (who doesn’t?) and some extra hard drive space (hard drives are really cheap nowadays!), you can back up your system on his hard drive, and vice versa. Both of you need to install the CrashPlan software on your systems and open up your firewalls to allow access to it, and that’s it. You can configure CrashPlan to limit the amount of bandwidth it uses so it won’t max out your Internet connection (in fact, it comes configured that way by default). The one caveat is that if you ever do need to do a restore, it’ll probably take longer from your friend’s computer than it would from something in the cloud, since most home Internet connections have a slower uplink speed than downlink.
This script is and always will be free for you to use or modify as you see fit. Having said that, it took me time to write the script, and it takes me time to support the people using it. So if you do use it and save yourself some money, please consider showing your appreciation by sending me a donation at http://blog.kamens.us/support-my-blog/. Any donation, large or small, is appreciated!
Copyright (c) 2011 Jonathan Kamens.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
$Id: crateify.pl,v 1.35 2012/01/04 13:07:30 jik Exp $
The current version of this script should always be available from http://stuff.mit.edu/~jik/software/crateify.pl.txt.