Long-time readers of my blog will know that I am obsessed with backups and with keeping control over data that belongs to me. For example, in addition to a comprehensive backup system I built myself, I have an archive of (legally obtained) music files and movies because I don’t trust any of the cloud music and video apps to preserve my access to them, and my family’s photo / video archive is stored on a Synology NAS in my basement rather than in Google Photos or any other cloud photo app.
When it comes to social media, however, this starts to break down. Most of the for-profit social media sites make it somewhere between difficult and impossible to regularly, automatically export your data and save it in a format you’ll be able to use later. They usually have a process for requesting an archive of all the data they have about you, but it’s usually cumbersome and slow and can’t be automated, and there are frequently limits on how often you can use it.
But there is good news! When I migrated recently from Twitter (ugh!) to Mastodon, I was delighted to discover that Mastodon has a public API that anyone can use, that said API exposes nearly all of my data, and that someone else had already written a tool for exporting it called mastodon-archive.
In addition to being useful for maintaining control over your data, this is useful in two other ways:
- If you have your Mastodon account set up to auto-delete old posts, which many sites encourage their users to do, you can retain copies of those posts in your archive for future reference.
- If you’re having trouble finding something you posted at some point in the past because full-text search in Mastodon isn’t great, you can load the archive file into a text editor and search through it to your heart’s content.
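As a rough sketch of that second use, the search can also be done from the command line instead of a text editor. The function name search_archive and the file name in the usage line are placeholders of mine, not part of mastodon-archive. The sketch assumes only that the archive is a plain-text (JSON) file and that your grep supports the -o option.

```shell
# Hypothetical helper: print each case-insensitive match of a pattern in an
# archive file, with up to 40 characters of context on either side.
# $1 is the pattern (treated as an extended regular expression),
# $2 is the path to the archive file.
search_archive() {
    grep -i -o -E -e ".{0,40}$1.{0,40}" -- "$2"
}

# Usage (the file name is a placeholder for your own archive file):
# search_archive 'synology' example.social.user.me.json
```

Printing only a window of text around each match keeps the output readable even if the archive stores a lot of data on a single long line.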
After spending a bit of time playing with mastodon-archive, I found it lacking a few features that were important to me, so I modified the tool to add them. The maintainer was kind enough to accept my patches, so the code I added will be present in the next release. Here’s what I added:
- I added a --quiet command-line option which tells the script not to generate any “normal” (non-error) informational output while it is archiving, so that it can be run from a cron job without generating emails when everything is working fine.
- Similarly, I added a --suppress-errors command-line option when archiving media files which tells the script not to complain when it can’t download a particular media file because it’s just not there.
- I taught the script how to download and archive the list of users I’ve muted, the list of users I’ve blocked, and the private notes I’ve added to the profiles of my follows, followers, muted users, and blocked users (it currently isn’t possible to download all private notes, a deficiency which I hope will be fixed in a future release).
Given all that background information, here’s what I’m doing to regularly archive the content from my Mastodon account. If you care about maintaining control of your data as much as I do, you might want to consider doing something like this yourself.
(These instructions assume you know how to install Python packages and set up cron jobs. Explaining how to do these things is outside the scope of this article.)
First, install the mastodon-archive utility referenced above. Once the next version after 1.4.2 is posted on PyPI, you can install from there. Until then, you should install from the GitHub repository to get my changes, e.g., “pip install https://github.com/kensanata/mastodon-archive/archive/main.zip”.
In the directory on your local machine where you want to download and archive your data regularly, run “mastodon-archive login user@host”, where “user@host” is your Mastodon username and hostname.
Create a shell script which looks something like this (you need to fill in “[user@host]” and “[archive-directory]” with your Mastodon username and hostname and the directory in which you just ran the mastodon-archive login command):
#!/usr/bin/env -S bash -e
# -*- mode: sh; sh-shell: bash; -*-

# Run quietly by default so that cron only sends mail when something fails.
quiet=--quiet
suppress=--suppress-errors

# --verbose / -v re-enables informational output and media download errors.
while [ "x$1" != "x" ]; do
    case "$1" in
        --verbose|-v)
            shift
            quiet=
            suppress=
            ;;
        *)
            echo "Unrecognized argument \"$1\"" 1>&2
            exit 1
            ;;
    esac
done

# Space-separated list of accounts to archive.
accounts="[user@host]"
cd [archive-directory]

for account in $accounts; do
    mastodon-archive $quiet archive --with-followers --with-following \
        --with-mutes --with-blocks --with-notes $account
    # Download the media attached to each collection.
    for coll in statuses favourites bookmarks; do
        mastodon-archive $quiet media --collection $coll $suppress $account
    done
done
(Note that there are additional archiving options you may wish to specify; see the documentation for the script on its home page for additional information.)
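One detail of the script above that is easy to miss: $quiet and $suppress are left unquoted on purpose. Here is a small sketch of why, using a throwaway count_args function of my own (it is not part of the script or of mastodon-archive):

```shell
# Hypothetical helper that just reports how many arguments it received.
count_args() {
    echo "$#"
}

quiet=--quiet
count_args $quiet archive      # two arguments: --quiet and archive

quiet=
count_args $quiet archive      # one argument: the empty variable vanishes
count_args "$quiet" archive    # two arguments: quoting passes an empty string
```

If the variables were quoted, mastodon-archive would receive an empty string as an extra argument in verbose mode, which it would probably reject or misinterpret.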
Create a cron job to run the shell script regularly. For example, I have this cron job set up to run it at midnight every night:
0 0 * * * PATH=/usr/local/bin:$PATH $HOME/src/scripts/mastodon-archive-cron
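The PATH=/usr/local/bin:$PATH prefix in that entry matters because cron typically runs jobs with a much shorter PATH than an interactive shell, so a tool installed under /usr/local/bin may not be found otherwise. If you want to check whether you need it, the sketch below tests whether a command can be found under a given PATH. The function name found_with_path and the /usr/bin:/bin value are my assumptions; check your own cron's documentation for its actual default.

```shell
# Hypothetical helper: succeed if command $1 can be found when PATH is $2.
# The lookup runs in a fresh shell with an otherwise empty environment.
found_with_path() {
    env -i PATH="$2" /bin/sh -c 'command -v "$1" >/dev/null 2>&1' sh "$1"
}

# Usage: does a minimal PATH (an assumed value) find the tool?
if found_with_path mastodon-archive /usr/bin:/bin; then
    echo "found without extending PATH"
else
    echo "not found; the cron entry needs to extend PATH"
fi
```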
Make sure the directory you’re archiving into is being backed up by whatever you have set up to back up your data in general.
Et voilà! If you ever lose access to your Mastodon account, whether because your server goes down, because you get banned, or for any other reason, you will have a recent copy of your data to help you start over without losing everything.