Another Synology Drive data loss bug

Notwithstanding their recent boneheaded announcement (reported on by Ars Technica) about restricting which drives can be used in their NASes, #Synology gets most things right, but every once in a while their apps just… lose data, and it’s not clear that they care.

I’ve written before about a Synology Drive Client bug on Linux that they’ve known about for years and haven’t bothered to fix. And then there’s the time one of their NASes had a gradually manifesting hardware bug. They could have notified customers and proactively issued a recall, but instead they just let customers’ NASes fail, at which point those customers were forced to shell out money for new ones.

Today I’m here to tell you about another data-loss bug in Synology Drive, and the workaround I’ve been forced to implement to avoid having it bite me (again).

Simply put, sometimes Synology Drive Client stops pulling files down from the server. When this happens, it claims that everything is fine and that it’s synchronizing successfully, and it will happily upload to the server any files you modify locally. However, files modified on other computers and synchronized by them to the server don’t get pulled down to the computer that is in this broken state.

Let me say this again: it claims everything is working properly but it isn’t. That’s generally considered Really Bad.

You can get the client to start synchronizing again by restarting the client, but (a) it’s not clear to me that files which weren’t synchronized in the interim get synchronized when you restart, and (b) there are various data-loss and data-conflict scenarios which occur when you modify files on multiple computers when one or more of them aren’t synchronizing properly.

I don’t know the root cause of this, so I don’t know of any way to prevent the problem from happening. Instead, I am now running a script every minute on all of my computers that sends and receives “pings” to/from the other computers in the group via temporary directories and files created within my Synology Drive directory. The script emails me when it doesn’t receive a “response” to a ping it sent to one of the other computers in the group. This means I’ll get some spurious emails when one of my computers is asleep or not on the network, but these are a small price to pay compared to the price of losing data because Synology Drive is failing again.
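To make the ping idea concrete, here is a minimal sketch of the handshake on its own. It is a simplification under stated assumptions, not the full workaround: it runs both sides in one process against a scratch directory created with `mktemp` rather than a real Synology Drive folder, the host names `alpha` and `beta` are hypothetical, and the helper names `send_ping`, `answer_pings`, and `ping_status` are made up for illustration. It also leaves out the timing checks entirely.

```shell
#!/bin/bash
# Sketch of the syn/ack handshake, simulated locally in a scratch directory.
# In real use the two sides run on different computers and the files travel
# between them via Synology Drive; "alpha" and "beta" are hypothetical hosts.
shopt -s nullglob

PINGDIR=$(mktemp -d)

# Sender: create ping.<from>-<to>.<epoch>/syn and print the directory name.
send_ping() {
    local from="$1" to="$2"
    local dir="$PINGDIR/ping.$from-$to.$(date +%s)"
    mkdir -p "$dir"
    date > "$dir/syn"
    echo "$dir"
}

# Receiver: acknowledge every ping addressed to us that has no ack yet.
answer_pings() {
    local me="$1" syn dir
    for syn in "$PINGDIR"/ping.*-"$me".*/syn; do
        dir=$(dirname "$syn")
        [ -f "$dir/ack" ] || date > "$dir/ack"
    done
}

# Sender: report whether a given ping has been answered.
ping_status() {
    local dir="$1"
    if [ -f "$dir/ack" ]; then
        echo answered
    else
        echo unanswered
    fi
}

dir=$(send_ping alpha beta)
before=$(ping_status "$dir")
answer_pings beta
after=$(ping_status "$dir")
echo "before=$before after=$after"
```

If the `ack` file never shows up on the sender’s side, then either the `syn` never reached the other computer or the `ack` never came back, and in both cases somebody has stopped synchronizing.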

I haven’t reported this issue to Synology because it’s intermittent and I have no idea how to reproduce it, so I’m certain they’ll blow me off.

Here’s the script, for those of you who are curious.

#!/bin/bash

set -e
shopt -s nullglob

PINGDIR=~jik/CloudStation/tmp/syno-pings
ME=$(hostname --short)
DEBUG=false
INTERVAL=60

while [ -n "$1" ]; do
    case "$1" in
        -d|--debug) DEBUG=true; shift ;;
        -i|--interval) shift; INTERVAL="$1"; shift ;;
        -*) echo "Unrecognized option: $1" 1>&2; exit 1 ;;
        *) break ;;
    esac
done
           
if [ -z "$1" ]; then
    echo "No remote host(s) specified" 1>&2
    exit 1
fi

debug() {
    if ! $DEBUG; then
        return
    fi
    echo "$@"
}

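# Print the age in seconds of $1 (based on its mtime), or "missing".
file_age() {
    local path="$1"; shift
    local now then
    now=$(date +%s)
    if then=$(stat -c %Y "$path" 2>/dev/null); then
        echo $((now-then))
    else
        echo missing
    fi
}

# Print "missing", "waiting" (younger than $1 seconds), or "finished".
wait_for() {
    local delay="$1"; shift
    local path="$1"; shift
    local age
    age=$(file_age "$path")
    if [ "$age" = missing ]; then
        echo missing
    elif ((age < delay)); then
        echo waiting
    else
        echo finished
    fi
}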
    
settling() {
    local path="$1"; shift
    case $(wait_for $((INTERVAL/2)) "$path") in
        missing) echo missing ;;
        waiting) echo yes ;;
        finished) echo no ;;
    esac
}

late() {
    local path="$1"; shift
    case $(wait_for $((INTERVAL*2)) "$path") in
        missing) echo missing ;;
        waiting) echo no ;;
        finished) echo yes ;;
    esac
}

dohost() {
    local them="$1"; shift

    debug Working on pings from $ME to $them

    # Note if we were previously broken.
    set -- $PINGDIR/ping.$ME-$them.*/broken
    if [ -n "$1" ]; then
        was_broken=true
    else
        was_broken=false
    fi

    debug was_broken=$was_broken

    # Clear any pings that have been answered
    for ping in $PINGDIR/ping.$ME-$them.*/ack; do
        dir=$(dirname $ping)
        if [ $(settling $dir) = yes ]; then
            debug Ignoring recently acknowledged ping $dir
            continue
        fi
        debug Clearing acknowledged ping $dir
        rm -rf $dir
    done

    # Check for old pings that have not been answered yet.
    is_broken=false
    for ping in $PINGDIR/ping.$ME-$them.*/syn; do
        dir=$(dirname $ping)
        if [ -f $dir/broken ]; then
            debug $dir remains broken
            continue
        fi
        if [ $(late $dir) = no ]; then
            debug Ignoring recently generated ping $dir
            continue
        fi
        is_broken=true
        echo $(date) > $dir/broken
        debug $dir is newly broken
    done

    if $was_broken && ! $is_broken; then
        echo Pings from $ME to $them have recovered
    elif ! $was_broken && $is_broken; then
        echo Pings from $ME to $them are failing, one of us is not syncing 1>&2
    fi

    # Create a new ping.
    newpingdir=$PINGDIR/ping.$ME-$them.$(date +%s)
    mkdir $newpingdir
    echo $(date) > $newpingdir/syn
    debug Created $newpingdir/syn
}

# Respond to pings sent to me.
for ping in $PINGDIR/ping.*-$ME.*/syn; do
    dir=$(dirname $ping)
    if [ -f $dir/ack ]; then
        debug Ignoring already acknowledged ping $dir
        continue
    fi
    result=$(settling $dir)
    if [ $result = missing ]; then
        # Other end deleted it
        debug Ignoring $dir after it disappeared
        continue
    elif [ $result = yes ]; then
        debug Ignoring recently received ping $dir
        continue
    fi
    debug Responding to $dir
    echo $(date) > $dir/ack
done
    
for them; do
    case "$them" in
        *\ *) echo no spaces allowed in host names 1>&2; exit 1 ;;
    esac

    dohost $them
done
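
The script only prints its “recovered” and “failing” messages, so something else has to run it every minute and turn that output into email. One plausible way to do that, shown here as an assumption rather than a description of the actual setup, is a crontab entry that relies on cron mailing any output to `MAILTO`. The address, the script path, and the host names below are all placeholders.

```crontab
# Hypothetical crontab entry: address, path, and host names are placeholders.
# cron mails whatever the job prints, and without --debug the script prints
# only when pings start failing or recover.
MAILTO=you@example.com
* * * * * /path/to/syno-ping.sh --interval 60 otherhost1 otherhost2
```

The host names passed on the command line need to match what `hostname --short` prints on the other computers, since that is how each side recognizes pings addressed to it.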
