My homelab had the stupidest outage ever

This morning I woke up to my phone using mobile data and my Home Assistant automations not working. Initially I thought the power was out, but I could turn on the lights just fine. I checked my UniFi app and saw that the server was not connected to the network at all. That meant one of three things: the cable got unplugged, the switch wasn't working, or the server wasn't working. The app showed the switch as connected, with another device connected to it, so that narrowed it down to just two cases.

So I opened my server closet in the basement and immediately felt that something was wrong, though I couldn't figure out what. Everything was plugged in, the network switch lights were blinking like normal, my Raspberry Pi was running just fine, and even the server indicator lights were on. My main server is an old gaming PC with a glass side panel, so I looked inside and could see the fan spinning, but I could not hear it. Usually I have it set to full speed, and full speed is very audible. I tried rebooting the server with the power button and the fans still didn't go to full speed.

As a last resort, I brought down a keyboard and monitor. As soon as I plugged in the monitor, I saw that there was a prompt to set the time in the BIOS!

[Picture of the prompt]

In my opinion, this was the stupidest reason for an outage.

Further investigations

I dug a little deeper and discovered that the BIOS had been reset during a power outage right before all of this happened. So far I have consulted the motherboard manual and found absolutely nothing about this. After a bit of research, I think the CMOS battery has probably died; that battery is what keeps the BIOS settings and clock alive while the machine has no power. This is a really simple fix, but I don't have a replacement battery right now. That means I will have the exact same issue after the next power outage unless I replace the battery.

Preventing this in the future

From what I can see, I just need to replace the CMOS battery. But this computer has been running for over 4 years, so what is stopping this from happening again around 2028? The most effective solution is going to be keeping the server from losing power in the first place. This can be done using a battery backup or a standby generator. Standby generators will last longer during a power outage but are typically more expensive and harder to set up than a simple battery backup.
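For a rough sense of what a simple battery backup would buy, here is a back-of-the-envelope runtime estimate. It is only a sketch: the function name, the 108 Wh battery, the 100 W load, and the efficiency and usable-capacity factors are all assumed numbers for illustration, not measurements from my setup, and real UPS batteries tend to deliver less than this at high discharge rates and as they age.

```python
def ups_runtime_minutes(battery_wh, load_w,
                        inverter_efficiency=0.85, usable_fraction=0.8):
    """Rough UPS runtime estimate in minutes.

    Assumed simplifications: constant load, a fixed inverter efficiency,
    and a fixed usable fraction of the rated battery capacity. Battery
    aging and discharge-rate effects are ignored.
    """
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    usable_wh = battery_wh * usable_fraction * inverter_efficiency
    return usable_wh / load_w * 60


if __name__ == "__main__":
    # Hypothetical example: one 12 V, 9 Ah battery (108 Wh) and a 100 W server.
    minutes = ups_runtime_minutes(12 * 9, 100)
    print(f"Estimated runtime: {minutes:.0f} minutes")
```

With those assumed numbers the estimate comes out to about 44 minutes, which should be treated as an optimistic upper bound. It does suggest that a small unit is a fit for brief outages rather than long ones.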

  • Dave.@aussie.zone
    3 months ago (edited) · 157 up, 1 down
    1. Replace CMOS battery.
    2. Get small UPS.
    3. Discover that small UPSes fail regularly, usually with cooked batteries.
    4. Add maintenance routine for UPS battery.
    5. Begin to wonder if this is really worth it when the rest of the house has no power during an outage.
    6. Get small generator.
    7. Discover that small generators also need maintenance and exercise.
    8. Decide to get a whole-house battery backup à la Tesla Powerwall, topped off by solar and a dedicated generator.
    9. Spend 15 years paying this off while wondering if the payback was really worth it, because you can count on one hand the number of extended power outages in that time.
    10. In the end times a roving band of thugs comes around and kills you and strips your house of valuable technology, leaving your homelab setup behind and - sadly - without power. Your dream of unlimited availability has all been for nought.

    Conclusion: just replace the CMOS battery on a yearly basis during planned system downtime.

    • catloaf@lemm.ee
      3 months ago · 30 up, 1 down

      CMOS batteries last a lot longer than a year. Unless the system has been unplugged for a long time, they should be good for several years. I’m sure there’s actual data out there somewhere.

      But yeah, a lot of people think “oh I’ll just put a UPS on it”. They don’t consider that unless you get a really big UPS, they’re only good for very short outages, seconds or minutes, to bridge the gap between the outage and your generator coming on (or the mains power coming back if it was just a flicker).

      Also, the batteries in them need to be replaced every 3-5 years.

      • Dave.@aussie.zone
        3 months ago · 10 up

        Yeah, it’s really a little strange in OP’s case. I can’t recall changing a CMOS battery in ages, like decades of computer use.

      • yonder@sh.itjust.works
        3 months ago · 5 up

        I bought a small APC UPS about a year ago and am glad I did. In my area, very brief outages are somewhat common so a small UPS will work for the majority of outages.

    • rambos@lemm.ee
      3 months ago · 11 up

      What about UPS just for CMOS battery? And a tiny diesel generator with 60 ml tank

      • poVoq@slrpnk.net
        3 months ago · 5 up, 2 down

        Currently at step 9. Waiting for the roving bands of thugs to arrive 😅

    • umbrella@lemmy.ml
      3 months ago (edited) · 6 up

      hey, AI defense turrets sound like a cool and totally not dangerous project. hope you have a decent GPU in your setup, good luck.

    • sugar_in_your_tea@sh.itjust.works
      3 months ago · 3 up

      Yup. I don’t have a UPS (probably should; have a good surge protector though), and replacing CMOS batteries is way easier than dealing with the rest. Thanks for the reminder, I’ll go pick some up and swap it out every so often.