Synology Drive Client for Linux has a data-loss bug Synology refuses to fix; here’s a workaround

March 17, 2021

[For the record, as of September 3, 2025, this bug still isn’t fixed in the current version of Synology Drive Client for Linux, nearly five years after I reported it to Synology. The workaround described below still works.]

I use GnuCash to track my finances. I run GnuCash on three different computers: two Linux and one Mac. For a long time I was using a shell-script wrapper to sync my GnuCash data file between the computers when launching GnuCash, but I recently decided to store the file on my Synology NAS and synchronize it between computers using Synology Drive Client.

Unfortunately, I quickly noticed a significant problem: when I edited my GnuCash data on the Mac, the saved file was synchronized to the NAS as soon as I saved it, but when I edited on Linux, it wasn’t. Then, the next time I edited and saved on the Mac, the Linux client decided there was a conflict between the never-uploaded edited version it had locally and the updated version coming over from the Mac, so it uploaded its conflicting version to the NAS, and suddenly I was faced with two divergent versions of my GnuCash data file. I then had to merge these by hand, figuring out all the changes in both files from their common ancestor and combining them into one file to avoid losing data. Even worse, if I edited on Linux #1, then on Linux #2, then on the Mac, I ended up with three conflicting versions of the data file, each with a different set of changes. Oy!

The root cause of this is actually quite straightforward: on Linux, when a hard link is created within a Drive Client folder, the client does not notice the hard link or upload the file to the NAS. When GnuCash saves a modified data file on Linux, it first saves the file under a temporary file name, then deletes the older version of the file with its “real” file name, then creates a hard link from that name to the temporary file, then deletes the temporary file.
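
To make the trigger concrete, here is a minimal sketch of that save pattern (the file names and helper function are made up for illustration; this is not GnuCash’s actual code). Run something like this against a file inside a Drive sync folder on Linux and the new contents never reach the NAS, while the same sequence on macOS syncs fine:

#!/usr/bin/env python3
# Sketch of the "write a temp file, swap it in with a hard link" save
# pattern that Synology Drive Client on Linux fails to notice.

import os


def save_like_gnucash(path, data):
    """Re-save an existing file the way GnuCash does."""
    tmp = path + '.tmp'    # hypothetical temporary name
    with open(tmp, 'w') as f:
        f.write(data)      # 1. write the new contents to a temp file
    os.unlink(path)        # 2. delete the old file at its "real" name
    os.link(tmp, path)     # 3. hard-link the real name to the temp file
    os.unlink(tmp)         # 4. delete the temporary name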

The macOS version of Drive Client does not have this bug. The Linux version of the Dropbox Client does not have this bug.

I reported this problem to Synology Support. Even after I explained exactly what the problem was and how to reproduce it easily, they refused to acknowledge that the behavior is incorrect or to commit to fixing it.

To work around this issue, I wrote a Python script which scrapes the list of sync directories from the Drive Client SQLite database, sets up watchers for files created within those directories, and every time it detects that a file has been created, it updates the timestamp on the file, which tricks Drive Client into noticing the file and synchronizing it to the NAS.

The script is below.

For what it’s worth, in January 2023 I had a conversation with a helpful and competent Synology support engineer in which I believe I successfully convinced Synology that there is a bug here that they should fix, and they claim they’ve put it in the queue to be fixed as resources permit. So maybe we’ll get a fix at some point, but as of September 2025 we don’t have one yet.

#!/usr/bin/env python3

import argparse
import inotify.adapters
import logging
import logging.handlers
import os
import requests
import signal
import sqlite3
import stat
import sys
import threading
import time

sys_db_path = os.path.expanduser('~/.SynologyDrive/data/db/sys.sqlite')
# The file should be modified at least this often (seconds) or something is
# wrong and we shouldn't update the canary.
sys_db_max_idle = 120

logger = None
resetting = False
last_crash = None
watchers = {}
watcher_tests = {}


class TaskWatcher(object):
    def __init__(self, task):
        self.task = task
        path = task
        if path.endswith(os.path.sep):
            path = os.path.dirname(path)
        logger.info(f'Starting watcher {id(self):x} for {path}')
        self.path = path
        self.synology_dir = os.path.join(self.path,
                                         '.SynologyWorkingDirectory')
        self.obsolete = False
        self.thread = threading.Thread(target=self.watch, daemon=True)
        self.thread.start()

    def clean_tree(self):
        base = os.path.join(self.path, '.SynologyWorkingDirectory')
        prefix = base + os.path.sep
        # This is naughty because we are accessing private attributes inside
        # the Inotify object. Hopefully they won't change the internal
        # structure of the code!
        for watch in list(self.inotify.inotify._Inotify__watches.keys()):
            if watch == base or watch.startswith(prefix):
                self.inotify.inotify.remove_watch(watch)
                logger.debug(f'Removed watch {watch}')

    def watch(self):
        global watcher_tests

        self.inotify = inotify.adapters.InotifyTree(self.path)
        self.clean_tree()

        while not self.obsolete:
            for event in self.inotify.event_gen(
                    yield_nones=False, timeout_s=1):
                self.clean_tree()
                (_, type_names, path, filename) = event
                logger.debug(f'event: type_names={type_names}, path={path}, '
                             f'filename={filename}')
                if path == self.synology_dir:
                    continue
                if 'IN_CREATE' not in type_names:
                    continue
                full_path = os.path.join(path, filename)
                try:
                    stat_obj = os.stat(full_path, follow_symlinks=False)
                except Exception:
                    continue
                if not stat.S_ISREG(stat_obj.st_mode):
                    continue
                if self.task in watcher_tests and \
                   watcher_tests[self.task][0] == filename:
                    logger.info(f'Got event for test file {full_path}')
                    os.unlink(full_path)
                    watcher_tests.pop(self.task)
                    continue
                logger.info(f'Touching {full_path}')
                try:
                    os.utime(full_path, times=(stat_obj.st_mtime,
                                               stat_obj.st_mtime))
                except Exception as e:
                    logger.info(f'Failed to touch {full_path} ({e}), '
                                f'continuing')
        logger.info(f'Exiting obsolete watcher {id(self):x} for {self.path}')

    def wait(self):
        self.thread.join()

    def is_alive(self):
        return self.thread.is_alive()


def find_tasks():
    conn = sqlite3.connect(sys_db_path)
    cursor = conn.cursor()
    cursor.execute('SELECT sync_folder from session_table')
    return list(r[0] for r in cursor)


def start_watchers():
    global resetting, watchers

    db_warned = False
    while True:
        if not db_warned:
            logger.info(f'{"Resetting" if resetting else "Scanning"} tasks.')
        try:
            tasks = find_tasks()
        except Exception as e:
            if not db_warned:
                logger.error(f'Failed to open {sys_db_path} ({e}), '
                             f'sleeping and retrying until success')
                db_warned = True
            time.sleep(5)
            continue
        if db_warned:
            logger.info(f'Successfully opened {sys_db_path}')
            db_warned = False
        new_watchers = {}
        for task in tasks:
            if not resetting and task in watchers:
                new_watchers[task] = watchers.pop(task)
            else:
                new_watchers[task] = TaskWatcher(task)
        for task, watcher in watchers.items():
            logger.info(f'Telling watcher {id(watcher):x} for {task} to exit')
            watcher.obsolete = True
        resetting = False
        watchers = new_watchers
        return


def resurrect_watchers():
    global resetting, watchers, last_crash

    crash_time = None

    for p, w in watchers.items():
        if not w.is_alive():
            logger.error(f'Watcher {id(w):x} for {p} crashed')
            crash_time = time.time()
    if crash_time:
        if last_crash and crash_time - last_crash < 5:
            raise Exception('Watchers are crashing too quickly, aborting')
        last_crash = crash_time
        logger.error('Resetting all watchers because of crashed threads')
        resetting = True
        start_watchers()


def watch_tasks():
    global resetting, watcher_tests

    start_watchers()
    start_time = 0
    db_warned = False
    while True:
        # I use 62 seconds here because in my experience Synology Drive updates
        # the file every 60 seconds, perhaps 61 at the outside, so waiting 62
        # seconds should be long enough for it to have done so. If not, no harm
        # done; we just create a new watcher for it.
        if time.time() - start_time >= 62:
            # If we didn't do a rescan during the previous pass through the
            # loop, then perhaps something is wrong with the inotify watcher?
            # Let's throw our old one away and start over just to be safe.
            # One way this could happen: if the user deletes and recreates
            # their ~/.SynologyDrive directory!
            if not db_warned:
                logger.info(f'Creating watcher for {sys_db_path}')
            i = inotify.adapters.Inotify()
            try:
                i.add_watch(sys_db_path)
            except Exception as e:
                if not db_warned:
                    logger.info(f'Failed to watch {sys_db_path} ({e}), '
                                f'will delay and keep retrying')
                    db_warned = True
                time.sleep(5)
                continue
            start_time = time.time()
            if db_warned:
                logger.info(f'Successfully watched {sys_db_path}')
                db_warned = False

        if resetting:
            start_watchers()
        else:
            resurrect_watchers()

        for task, test in [(task, test) for task, test in watcher_tests.items()
                           if time.time() - test[1] > 2]:
            watcher_tests.pop(task)
            full_path = os.path.join(task, test[0])
            logger.error(f'No event for test file {full_path} after 2 seconds')
            os.unlink(full_path)

        for event in i.event_gen(yield_nones=False, timeout_s=1):
            (_, type_names, path, filename) = event
            if 'IN_MODIFY' not in type_names:
                continue
            start_watchers()
            start_time = time.time()


def maintain_canary(url, stable_interval):
    unstable_interval = 1
    last_problem = ''
    interval = 0
    while True:
        time.sleep(interval)
        interval = stable_interval
        stat_obj = os.stat(sys_db_path)
        delta = int(time.time() - stat_obj.st_mtime)
        if delta > sys_db_max_idle:
            interval = unstable_interval
            if last_problem != 'idle':
                last_problem = 'idle'
                logger.error(f'{sys_db_path} unmodified in {delta}s; '
                             f'not triggering canary')
            continue
        elif last_problem == 'idle':
            last_problem = ''
            logger.info(f'{sys_db_path} modifications resumed; '
                        f'triggering canary')
        try:
            response = requests.get(url, timeout=5)
            response.raise_for_status()
            logger.debug(f'Successfully fetched {url}')
            last_problem = ''
        except Exception as e:
            new_problem = False
            # It's gross to test for this using a string operation like this,
            # but the root cause of the failure is buried so deep in a stack of
            # nested exceptions that doing it this way is less gross than any
            # of the alternatives.
            if 'Temporary failure in name resolution' in str(e) or \
               'Name or service not known' in str(e):
                if last_problem != 'dns':
                    last_problem = 'dns'
                    new_problem = True
                    logger.error(f'DNS failure fetching {url}')
            else:
                if last_problem != 'fetch':
                    last_problem = 'fetch'
                    new_problem = True
                    logger.exception(f'Failed to fetch {url}')
            interval = unstable_interval
            if new_problem:
                logger.error('Sleeping briefly and retrying until success')


def parse_args():
    parser = argparse.ArgumentParser(
        description='Work around Synology Drive data loss bug')
    parser.add_argument('--canary-url', action='store', help='URL to fetch '
                        'periodically as proof of life')
    parser.add_argument('--canary-interval', type=int, action='store',
                        default=300, help='How frequently (seconds) to fetch '
                        'canary URL (default 300)')
    return parser.parse_args()


def toggle_debug(signum, frame):
    debugging = logger.level == logging.DEBUG
    logger.info(f'Changing log level to {"INFO" if debugging else "DEBUG"} '
                f'in response to signal')
    logger.setLevel(logging.INFO if debugging else logging.DEBUG)


def reset_watchers(signum, frame):
    global resetting
    logger.info('Queueing watcher reset in response to signal')
    resetting = True


def test_watchers(signum, frame):
    global watcher_tests

    filename1 = f'testfile1.{os.getpid()}'
    filename2 = f'testfile2.{os.getpid()}'
    for task in watchers.keys():
        watcher_tests[task] = (filename2, time.time())
        path1 = os.path.join(task, filename1)
        path2 = os.path.join(task, filename2)
        with open(path1, 'w') as f:
            print("foo", file=f)
        os.link(path1, path2)
        os.unlink(path1)
        logger.info(f'Waiting for event for test file {path2}')


def main():
    global logger
    logger = logging.getLogger(os.path.basename(sys.argv[0]))
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address='/dev/log')
    logger.addHandler(handler)
    signal.signal(signal.SIGUSR1, toggle_debug)
    signal.signal(signal.SIGUSR2, reset_watchers)
    signal.signal(signal.SIGPWR, test_watchers)

    args = parse_args()
    if args.canary_url:
        canary_thread = threading.Thread(
            target=maintain_canary, daemon=True,
            args=(args.canary_url, args.canary_interval))
        canary_thread.start()
    watch_tasks()


if __name__ == '__main__':
    main()

Note that the script depends on a couple of modules that aren’t in the Python standard library (inotify and requests), which you’ll have to install from your OS package manager or PyPI.
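
If you go the PyPI route, something along these lines should do it (adjust as needed for your environment, e.g., if you use a virtualenv):

pip3 install --user inotify requests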

Here’s the trivial systemd unit file I use to run the script as a systemd user service when I log in to my Linux computers (obviously, you’ll have to change the path to wherever you put the script). If you don’t understand what that means, perhaps you shouldn’t be trying to run this script with systemd 😉 :

[Unit]
Description=Force hard-linked files to sync to Synology Drive

[Service]
Type=exec
ExecStart=/home/jik/bin/synology-inotify.py

[Install]
WantedBy=default.target
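
To set it up, put the unit file in your systemd user unit directory and enable it; something like the following should work, assuming you call the unit synology-inotify.service (that’s just my choice of name, not anything the script requires):

mkdir -p ~/.config/systemd/user
cp synology-inotify.service ~/.config/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now synology-inotify.service

After that it will start automatically whenever you log in.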

Perhaps this will be useful to someone other than me! If so, post a comment or email me and let me know.
