You just finished setting up all your services and it works fine - how do you now prepare for eventual drive failure?

Kaldo@kbin.social · 1 year ago

You just finished setting up all your services and it works fine - how do you now prepare for eventual drive failure?

dr_robot@kbin.social · 1 year ago

My configuration and deployment is managed entirely via an Ansible playbook repository. In case of absolute disaster, I just have to redeploy the playbook. I do run all my stuff on top of mirrored drives so a single failure isn’t disastrous if I replace the drive quickly enough.

For when that’s not enough, the data itself is backed up hourly (via ZFS snapshots) to a spare pair of drives and nightly to S3 buckets in the cloud (via restic). Everything automated with systemd timers and some scripts. The configuration for these backups is part of the playbooks of course. I test the backups every 6 months by trying to reproduce all the services in a test VM. This has identified issues with my restoration procedure (mostly due to potential UID mismatches).

And yes, I have once been forced to reinstall from scratch and I managed to do that rather quickly through a combination of playbooks and well tested backups.

subtext@lemmy.world · 1 year ago

Dang I really like your idea of testing the backup in a VM… I was worried about how I’d test mine since I only have the one machine, but a VM on my desktop or something should do just fine.