We finally know what caused the global tech outage - and how much it cost
  • whatwhatwhatwhat@lemmy.world · 2 months ago

    The fact that they weren’t already doing staggered releases is mind-boggling. I work for a company with a minuscule fraction of CrowdStrike’s user base / value, and even we do staggered releases.
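    For context, a staggered (ring/canary) rollout just means a new build goes to a small cohort of hosts first and only widens once it proves stable. Below is a minimal sketch of one common way to do it; the stage names, percentages, and host-ID hashing are illustrative assumptions, not CrowdStrike's actual mechanism.

        import hashlib

        # Illustrative rollout rings; the percentages are assumptions, not any vendor's real policy.
        ROLLOUT_STAGES = [
            ("canary", 1),      # 1% of hosts get the update first
            ("early", 10),      # then 10%
            ("broad", 50),      # then half the fleet
            ("general", 100),   # finally everyone
        ]

        def host_bucket(host_id: str) -> int:
            """Map a host ID to a stable bucket in [0, 100) so each host always lands in the same cohort."""
            digest = hashlib.sha256(host_id.encode("utf-8")).hexdigest()
            return int(digest, 16) % 100

        def should_receive_update(host_id: str, current_stage: str) -> bool:
            """Return True if this host falls inside the cumulative percentage for the active stage."""
            threshold = dict(ROLLOUT_STAGES)[current_stage]
            return host_bucket(host_id) < threshold

        # Example: while the release sits at the "canary" stage, only ~1% of hosts pull it.
        print(should_receive_update("host-0042", "canary"))

    If the canary cohort starts blue-screening, you halt the rollout there instead of taking down the whole fleet at once.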

    • foggenbooty@lemmy.world · 2 months ago

      They do have staggered releases, but it’s a bit more complicated. The client that you run is versioned and you can choose to lag behind the current build, but this was a bad definition update. Most people want the latest definitions to protect themselves from zero-days. The whole thing is complicated and a bit wonky, but the real issue here is CrowdStrike’s kernel driver not validating the content of the definition file before loading it.
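      To make the validation point concrete, here’s a minimal sketch of the kind of sanity checks a loader could run before trusting a definition file. The file layout, field names, and specific checks are invented for illustration; the real definition file format isn’t public.

          import struct

          # Hypothetical definition-file layout, purely for illustration:
          #   4-byte magic, 4-byte version, 4-byte entry count, then fixed-size entries.
          MAGIC = b"DEFS"
          HEADER = struct.Struct("<4sII")   # magic, version, entry_count
          ENTRY_SIZE = 64

          def validate_definition(blob: bytes) -> bool:
              """Reject a definition blob before it ever reaches the parsing/loading code path."""
              if len(blob) < HEADER.size:
                  return False                      # truncated file
              magic, version, entry_count = HEADER.unpack_from(blob)
              if magic != MAGIC or version == 0:
                  return False                      # wrong or corrupt header
              expected_len = HEADER.size + entry_count * ENTRY_SIZE
              if len(blob) != expected_len:
                  return False                      # count field doesn't match the actual payload
              return True

          def load_definition(blob: bytes) -> None:
              if not validate_definition(blob):
                  # Fail safe: keep running on the previous known-good definitions
                  # instead of crashing on malformed input.
                  raise ValueError("malformed definition update, keeping previous version")
              # ... parse entries and hand them to the detection engine ...

      The point is less the specific checks than the failure mode: a malformed update should be rejected and logged, not parsed blindly in kernel space where a bad read takes the whole machine down.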

      • whatwhatwhatwhat@lemmy.world · 2 months ago

        Makes sense that it was a definitions update that caused this, and I get why that’s not something you’d want to lag behind on like you could with the agent. (Putting aside that one of the selling points of next-gen AV/EDR tools is that they’re less reliant on definitions updates compared to traditional AV.) It’s just a bit wild that there isn’t more testing in place.

        It’s like we’re always walking this fine line between “security at all costs” vs “stability, convenience, etc”. By pushing definitions as quickly as possible, you improve security, but you’re taking on some level of risk too. In some alternate universe, CS didn’t push definitions quickly enough, and a bunch of companies got hit with a zero-day. I’d say it’s an impossible situation sometimes, but if I had to choose between an outage and a data breach, I’m choosing the outage every time.