Stop services while creating snapshots during backup? - eviltoast

It’s fairly obvious why stopping a service while backing it up makes sense. Imagine backing up Immich while it’s running. You start the backup, the database is backed up, and now the image assets are being copied. That could take an hour. While the assets are being backed up, a new image is uploaded. The live database knows about it, but the one you’ve backed up doesn’t. Then your backup process reaches the new image asset and copies it. If you restore this backup, Immich will contain an asset that isn’t known to the database. To avoid scenarios like this, you’d stop Immich while the backup is running.
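The race above can be sketched as a toy model. This is a minimal illustration under made-up assumptions: the database and the asset store are plain Python sets of file names, the asset copy is collapsed into a single step that happens after the upload, and the function names (naive_backup, orphaned_assets) are hypothetical. It is not how Immich actually stores data.

```python
from __future__ import annotations


def naive_backup(
    live_db: set[str], live_assets: set[str], upload: str | None
) -> tuple[set[str], set[str]]:
    """Back up the database first, then the assets, with an optional upload in between."""
    # Step 1: the database dump is taken; it is frozen from here on.
    db_copy = set(live_db)
    # Step 2: during the long asset copy, a new image is uploaded and lands
    # in both the live database and the live asset store.
    if upload is not None:
        live_db.add(upload)
        live_assets.add(upload)
    # Step 3: the asset copy reaches the new file and includes it.
    assets_copy = set(live_assets)
    return db_copy, assets_copy


def orphaned_assets(db_copy: set[str], assets_copy: set[str]) -> set[str]:
    """Assets in the backup that the backed-up database does not know about."""
    return assets_copy - db_copy


if __name__ == "__main__":
    db, assets = naive_backup({"img1.jpg", "img2.jpg"}, {"img1.jpg", "img2.jpg"}, upload="img3.jpg")
    print(orphaned_assets(db, assets))  # {'img3.jpg'}
```

Without an upload during the copy, orphaned_assets returns an empty set; the inconsistency only appears when writes land between the two phases of the backup.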

Now consider a system that can take instant snapshots, like ZFS or LVM. Immich is running; you stop it, take a snapshot, then restart it. Then you back up Immich from the snapshot while Immich keeps running. This should cut the required downtime to the time it takes to create the snapshot. The state of the Immich data in the snapshot should be equivalent to backing up a stopped Immich instance.
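A minimal sketch of the stop, snapshot, restart, back-up-from-snapshot sequence, assuming Immich runs under Docker Compose, its data lives on a single ZFS dataset, and rsync copies the snapshot contents to the backup destination. The compose file path, dataset name, destination and the backup_via_snapshot helper are all hypothetical; on LVM you would swap in lvcreate --snapshot plus a mount step. The runner parameter lets you dry-run the sequence without touching the system.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Execute a command for real, raising if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def backup_via_snapshot(
    compose_file: str,
    dataset: str,
    mountpoint: str,
    snap_name: str,
    dest: str,
    runner: Runner = run,
) -> None:
    compose = ["docker", "compose", "-f", compose_file]
    snapshot = f"{dataset}@{snap_name}"
    runner(compose + ["stop"])
    try:
        # Downtime lasts only as long as this (near-instant) snapshot.
        runner(["zfs", "snapshot", snapshot])
    finally:
        # Restart even if the snapshot failed, so the service is never left down.
        runner(compose + ["start"])
    try:
        # ZFS exposes read-only snapshot contents under <mountpoint>/.zfs/snapshot/.
        runner(["rsync", "-a", f"{mountpoint}/.zfs/snapshot/{snap_name}/", dest])
    finally:
        # Drop the snapshot once the copy is done (or has failed).
        runner(["zfs", "destroy", snapshot])


if __name__ == "__main__":
    # Dry run: record the commands instead of executing them.
    planned: list[list[str]] = []
    backup_via_snapshot(
        compose_file="/srv/immich/docker-compose.yml",  # hypothetical path
        dataset="tank/immich",                          # hypothetical dataset
        mountpoint="/tank/immich",
        snap_name="backup-nightly",
        dest="/mnt/backup/immich/",
        runner=lambda cmd: planned.append(list(cmd)),
    )
    for cmd in planned:
        print(" ".join(cmd))
```

A single zfs snapshot is atomic for one dataset. If the database and the assets live on separate datasets, zfs snapshot -r on a common parent snapshots all of them at the same moment.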

Now consider the same setup, but without stopping Immich while taking the snapshot. In theory, the data you’re backing up should still represent the complete state of Immich at a single point in time, eliminating the possibility of the database and the assets diverging. It would, however, represent the state of a live Immich instance, e.g. with lock files and other runtime state left in place. Wouldn’t restoring from such a backup be equivalent to a kill -9, or to pulling the power cable and restarting the service? If a service can recover from a cable pull, is it reasonable to expect it to recover when restored from a snapshot taken while it was live? If so, is there much point in stopping services during snapshots?
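The lock file point is where a service's crash recovery matters. A common pattern (not necessarily what Immich does) is a PID lock file that is treated as stale when the recorded process is no longer running, which is exactly the state a kill -9 or a restored live snapshot leaves behind. This sketch assumes a POSIX system; acquire_lock and pid_is_alive are hypothetical names.

```python
from __future__ import annotations

import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """POSIX check: signal 0 tests whether a process exists without signalling it."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def acquire_lock(lock_path: Path) -> bool:
    """Take a PID lock file, clearing it first if the recorded process is gone."""
    if lock_path.exists():
        try:
            old_pid: int | None = int(lock_path.read_text().strip())
        except ValueError:
            old_pid = None  # unreadable lock: treat as stale
        if old_pid is not None and pid_is_alive(old_pid):
            return False  # another instance really is running
        # Stale lock left behind by kill -9, a power cut, or a live snapshot restore.
        lock_path.unlink(missing_ok=True)
    try:
        # O_EXCL makes creation fail if someone else grabbed the lock meanwhile.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        lock = Path(d) / "service.lock"
        print(acquire_lock(lock))  # True: no lock existed
        print(acquire_lock(lock))  # False: the PID in the lock (ours) is alive
```

PID reuse makes this a heuristic: after a restore, an unrelated process could happen to hold the old PID. PostgreSQL, which Immich uses, goes further and replays its write-ahead log on startup, which is what lets it come back from a crash-consistent snapshot.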

  • Avid Amoeba@lemmy.ca (OP)
    4 months ago

    And this implies you have tested such backups, right?

    Side Q, how long do those LVM snapshots take? How long does it take to merge them afterwards?

    • butitsnotme@lemmy.world
      4 months ago

      Yes, I have. I should probably test them again, though, as it’s been a while, and Immich at least has had many potentially significant changes.

      LVM snapshots are virtually instant, and there is no merge operation, so deleting a snapshot is also virtually instant. It works by allocating separate space to hold the differences from the main volume: the first time the application writes to a given block of the main volume after the snapshot is taken, the old contents of that block are copied to the snapshot. This does mean disk performance will be somewhat lower than without snapshots, but I haven’t noticed any practical impact. (I believe LVM typically places my snapshots on a different physical disk from the one holding the main volume, though.)
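A toy model of the copy-on-write mechanism described above, assuming a volume is just a list of blocks. Real LVM works in larger chunks, and a classic snapshot becomes invalid if its copy-on-write area fills up; the Volume class here is purely illustrative. It shows why creating and deleting the snapshot is cheap, and where the write penalty comes from.

```python
from __future__ import annotations


class Volume:
    """Toy block device with at most one copy-on-write snapshot (illustrative only)."""

    def __init__(self, blocks: list[bytes]) -> None:
        self.blocks = list(blocks)
        # Original contents of blocks overwritten since the snapshot was taken.
        self.snapshot_store: dict[int, bytes] | None = None
        # Extra block copies caused by the snapshot: the write-performance cost.
        self.extra_copies = 0

    def take_snapshot(self) -> None:
        # Instant: nothing is copied yet, we just start an empty store.
        self.snapshot_store = {}

    def write(self, index: int, data: bytes) -> None:
        # Only the first write to a block after the snapshot copies the old data.
        if self.snapshot_store is not None and index not in self.snapshot_store:
            self.snapshot_store[index] = self.blocks[index]
            self.extra_copies += 1
        self.blocks[index] = data

    def read_snapshot(self, index: int) -> bytes:
        if self.snapshot_store is None:
            raise RuntimeError("no snapshot exists")
        # Unchanged blocks are read straight from the origin volume.
        return self.snapshot_store.get(index, self.blocks[index])

    def delete_snapshot(self) -> None:
        # Also instant: the origin already holds current data, nothing to merge.
        self.snapshot_store = None


if __name__ == "__main__":
    vol = Volume([b"A", b"B", b"C"])
    vol.take_snapshot()
    vol.write(1, b"X")
    vol.write(1, b"Y")  # second write to block 1: no extra copy
    print(vol.blocks)                                # [b'A', b'Y', b'C']
    print([vol.read_snapshot(i) for i in range(3)])  # [b'A', b'B', b'C']
    print(vol.extra_copies)                          # 1
    vol.delete_snapshot()
```

Deleting the snapshot only discards the saved blocks, because the origin volume already holds the current data. A merge (lvconvert --merge) only comes into play if you want to roll the origin back to the snapshot's state.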

      You can see my backup script here.

      • Avid Amoeba@lemmy.ca (OP)
        4 months ago

        Oh, interesting. I was under the impression that deleting an LVM snapshot actually involved merging, which took some time, but I guess not. Thanks for the info!