Notwithstanding their recent boneheaded announcement (reported on by Ars Technica) about restricting which drives can be used in their NASes, #Synology gets most things right, but every once in a while their apps just… lose data, and it’s not clear that they care.
I’ve written before about a Synology Drive Client bug on Linux they’ve known about for years and haven’t bothered to fix. And then there’s the time one of their NASes had a gradually manifesting hardware bug: they could have notified customers and proactively issued a recall, but instead they just let customers’ NASes fail, at which point those customers were forced to shell out money for new ones.
Today I’m here to tell you about another data-loss bug in Synology Drive, and the workaround I’ve been forced to implement to avoid having it bite me (again).
Simply put, sometimes Synology Drive Client stops pulling files down from the server. When this happens, it claims that everything is fine and that it’s synchronizing successfully, and it will happily upload to the server any files you modify locally. But files modified on other computers and synchronized by them to the server don’t get pulled down to the computer that is in this broken state.
Let me say this again: it claims everything is working properly but it isn’t. That’s generally considered Really Bad.
You can get the client to start synchronizing again by restarting it, but (a) it’s not clear to me that files which weren’t synchronized in the interim get synchronized after the restart, and (b) various data-loss and data-conflict scenarios can occur when you modify files on multiple computers while one or more of them aren’t synchronizing properly.
I don’t know the root cause of this, so I don’t know of any way to prevent the problem from happening. Instead, I now run a script every minute on all of my computers that sends and receives “pings” to/from the other computers in the group via temporary directories and files created within my Synology Drive directory. The script emails me when it doesn’t receive a “response” to a ping it sent to one of the other computers in the group. This means I’ll get some spurious emails when one of my computers is asleep or not on the network, but these are a small price to pay compared to the price of losing data because Synology Drive is failing again.
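To make the idea concrete, here is a minimal sketch of the handshake on its own, run against a scratch directory that stands in for the synced folder. The host names `alpha` and `beta` and the helper names `send_ping`, `answer_pings`, and `ping_state` are made up for illustration, and the sketch leaves out the timing rules (letting files settle, flagging late pings) that the full script below applies.

```bash
#!/bin/bash
# Sketch of the syn/ack handshake. A scratch directory stands in for the
# synced folder; "alpha" and "beta" are hypothetical host names.

# send_ping DIR FROM TO: create a ping directory containing a "syn" file
# and print its path.
send_ping() {
    local dir="$1" from="$2" to="$3" ping
    ping="$dir/ping.$from-$to.$(date +%s)"
    mkdir -p "$ping"
    date > "$ping/syn"
    echo "$ping"
}

# answer_pings DIR ME: add an "ack" file to every unanswered ping sent to ME.
answer_pings() {
    local dir="$1" me="$2" syn
    for syn in "$dir"/ping.*-"$me".*/syn; do
        [ -e "$syn" ] || continue            # the glob matched nothing
        [ -e "${syn%/syn}/ack" ] || date > "${syn%/syn}/ack"
    done
}

# ping_state PING: print "answered" if the ping has an ack, else "pending".
ping_state() {
    if [ -e "$1/ack" ]; then echo answered; else echo pending; fi
}

scratch=$(mktemp -d)
ping=$(send_ping "$scratch" alpha beta)
before=$(ping_state "$ping")     # beta has not run yet
answer_pings "$scratch" beta     # in real use this happens on the other machine
after=$(ping_state "$ping")
echo "$before -> $after"         # prints "pending -> answered"
rm -rf "$scratch"
```

In real use the two halves run on different machines, so the `ack` file only shows up on the sender if synchronization is working in both directions. That is exactly what the check relies on.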
I haven’t reported this issue to Synology because it’s intermittent and I have no idea how to reproduce it, so I’m certain they’ll blow me off.
Here’s the script, for those of you who are curious.
#!/bin/bash
set -e
shopt -s nullglob
PINGDIR=~jik/CloudStation/tmp/syno-pings
ME=$(hostname --short)
DEBUG=false
INTERVAL=60
while [ -n "$1" ]; do
    case "$1" in
        -d|--debug) DEBUG=true; shift ;;
        -i|--interval) shift; INTERVAL="$1"; shift ;;
        -*) echo "Unrecognized option: $1" 1>&2; exit 1 ;;
        *) break ;;
    esac
done
if [ -z "$1" ]; then
    echo "No remote host(s) specified" 1>&2
    exit 1
fi
debug() {
    if ! $DEBUG; then
        return
    fi
    echo "$@"
}
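# Print the age of a path in seconds, or "missing" if it does not exist.
file_age() {
    local path="$1"; shift
    local now mtime
    now=$(date +%s)
    if mtime=$(stat -c %Y "$path" 2>/dev/null); then
        echo $((now-mtime))
    else
        echo missing
    fi
}
# Print "missing", "waiting" (path is younger than delay seconds), or "finished".
wait_for() {
    local delay="$1"; shift
    local path="$1"; shift
    local age
    age=$(file_age "$path")
    if [ "$age" = missing ]; then
        echo missing
    elif ((age < delay)); then
        echo waiting
    else
        echo finished
    fi
}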
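# Was the path modified within the last half interval? Prints yes, no, or missing.
settling() {
    local path="$1"; shift
    case $(wait_for $((INTERVAL/2)) "$path") in
        missing) echo missing ;;
        waiting) echo yes ;;
        finished) echo no ;;
    esac
}
# Is the path at least two intervals old? Prints yes, no, or missing.
late() {
    local path="$1"; shift
    case $(wait_for $((INTERVAL*2)) "$path") in
        missing) echo missing ;;
        waiting) echo no ;;
        finished) echo yes ;;
    esac
}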
dohost() {
    local them="$1"; shift
    local was_broken is_broken ping dir newpingdir
    debug "Working on pings from $ME to $them"
    # Note if we were previously broken.
    set -- "$PINGDIR"/ping."$ME-$them".*/broken
    if [ -n "$1" ]; then
        was_broken=true
    else
        was_broken=false
    fi
    debug "was_broken=$was_broken"
    # Clear any pings that have been answered
    for ping in "$PINGDIR"/ping."$ME-$them".*/ack; do
        dir=$(dirname "$ping")
        if [ "$(settling "$dir")" = yes ]; then
            debug "Ignoring recently acknowledged ping $dir"
            continue
        fi
        debug "Clearing acknowledged ping $dir"
        rm -rf "$dir"
    done
    # Check for old pings that have not been answered yet.
    is_broken=false
    for ping in "$PINGDIR"/ping."$ME-$them".*/syn; do
        dir=$(dirname "$ping")
        if [ -f "$dir/broken" ]; then
            debug "$dir remains broken"
            continue
        fi
        if [ "$(late "$dir")" = no ]; then
            debug "Ignoring recently generated ping $dir"
            continue
        fi
        is_broken=true
        date > "$dir/broken"
        debug "$dir is newly broken"
    done
    # Report only when the state changes, so each outage produces one message.
    if $was_broken && ! $is_broken; then
        echo "Pings from $ME to $them have recovered"
    elif ! $was_broken && $is_broken; then
        echo "Pings from $ME to $them are failing, one of us is not syncing" 1>&2
    fi
    # Create a new ping.
    newpingdir="$PINGDIR/ping.$ME-$them.$(date +%s)"
    mkdir "$newpingdir"
    date > "$newpingdir/syn"
    debug "Created $newpingdir/syn"
}
# Respond to pings sent to me.
for ping in "$PINGDIR"/ping.*-"$ME".*/syn; do
    dir=$(dirname "$ping")
    if [ -f "$dir/ack" ]; then
        debug "Ignoring already acknowledged ping $dir"
        continue
    fi
    result=$(settling "$dir")
    if [ "$result" = missing ]; then
        # Other end deleted it
        debug "Ignoring $dir after it disappeared"
        continue
    elif [ "$result" = yes ]; then
        debug "Ignoring recently received ping $dir"
        continue
    fi
    debug "Responding to $dir"
    date > "$dir/ack"
done
# Send pings to, and check on, each remote host named on the command line.
for them; do
    case "$them" in
        *\ *) echo no spaces allowed in host names 1>&2; exit 1 ;;
    esac
    dohost "$them"
done
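Note that the script itself only prints messages; something else has to turn that output into email. Cron mailing a job’s output is one common arrangement, but that is an assumption here, not something shown above. As a rough sketch of doing the same thing explicitly, the hypothetical helper `run_and_notify` below runs a command and pipes its output to a notifier only when there is something to report. In the demo, `cat` stands in for the notifier and `echo` for the checker.

```bash
#!/bin/bash
# run_and_notify NOTIFIER COMMAND [ARGS...]
# Run COMMAND, capture stdout and stderr together, and pipe the result to
# NOTIFIER only if the command printed anything. NOTIFIER is a single
# command or function name. (Hypothetical helper, not part of the script.)
run_and_notify() {
    local notifier="$1"; shift
    local output
    output=$("$@" 2>&1) || true    # a failing command is still reported via its output
    if [ -n "$output" ]; then
        printf '%s\n' "$output" | "$notifier"
    fi
}

# Demo with stand-ins: "cat" plays the notifier, "echo" plays the checker.
run_and_notify cat echo "Pings from alpha to beta are failing"
run_and_notify cat true    # a silent run, so the notifier is never invoked
```

In practice the notifier would be a small function wrapping whatever mail command is available on the machine, and the command would be the ping script with its list of remote hosts.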