CrowdStrike’s faulty update crashed 8.5 million Windows devices, says Microsoft - eviltoast
  • Halcyon@discuss.tchncs.de
    4 months ago

    From CrowdStrike’s Terms of Use:

    “THE OFFERINGS AND CROWDSTRIKE TOOLS ARE NOT FAULT-TOLERANT AND ARE NOT DESIGNED OR INTENDED FOR USE IN ANY HAZARDOUS ENVIRONMENT REQUIRING FAIL-SAFE PERFORMANCE OR OPERATION. NEITHER THE OFFERINGS NOR CROWDSTRIKE TOOLS ARE FOR USE IN THE OPERATION OF AIRCRAFT NAVIGATION, NUCLEAR FACILITIES, COMMUNICATION SYSTEMS, WEAPONS SYSTEMS, DIRECT OR INDIRECT LIFE-SUPPORT SYSTEMS, AIR TRAFFIC CONTROL, OR ANY APPLICATION OR INSTALLATION WHERE FAILURE COULD RESULT IN DEATH, SEVERE PHYSICAL INJURY, OR PROPERTY DAMAGE. Customer agrees that it is Customer’s responsibility to ensure safe use of an Offering and the CrowdStrike Tools in such applications and installations.”

    https://www.crowdstrike.com/terms-conditions/

  • Glowstick@lemmy.world
    4 months ago

    Why doesn’t any testing get done before updates are pushed out? I’m not talking about extensive testing, I’m talking about like just making sure it doesn’t cause major problems?

    • toffi@feddit.org
      4 months ago

      One comment said that they found files on affected machines that consisted mostly of garbage and/or zeroes. This mostly happens when one of the machines pushing the update to the distribution servers has faulty memory. You can work around those errors if you use checksums and double-check everything, but most departments consider that too much work or regard it as unnecessary.
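For anyone wondering what the checksum step looks like in practice, here is a minimal sketch in Python. It is not CrowdStrike's pipeline; the function names (`sha256_of`, `verify_update`, `demo`) and the sample payload are made up for illustration. It also assumes the expected digest was computed at the source, before the file passed through any machine that might corrupt it.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_update(path, expected_hex):
    """Hypothetical gate: True only if the file matches the published digest."""
    return sha256_of(path) == expected_hex


def demo():
    """Check an intact copy and a zeroed-out copy against the same digest."""
    payload = b"example update payload"  # stand-in for a real update file
    expected = hashlib.sha256(payload).hexdigest()  # computed at the source
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "good.bin")
        bad = os.path.join(tmp, "bad.bin")
        with open(good, "wb") as f:
            f.write(payload)
        with open(bad, "wb") as f:
            f.write(b"\x00" * len(payload))  # simulate a file full of zeroes
        return verify_update(good, expected), verify_update(bad, expected)


if __name__ == "__main__":
    print(demo())
```

The intact copy passes and the zeroed copy is rejected. The check only helps if it runs at every hop and a mismatch actually blocks the push.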

      • Hildegarde@lemmy.world
        4 months ago

        This is a far more plausible explanation than CrowdStrike not testing their updates.

        Modern computers are very consistent. Nearly all copy and download operations work correctly. If you verify with checksums, you will be running a test that always passes, so it feels unnecessary, so people want to skip it.

        Helios Airways Flight 522 was a fatal accident caused by the pilots skipping a test that always passes. Three times, their checklists told them to verify that the pressurization switch was set to auto, which it always is during normal operations, except this time it was not, because of a maintenance operation.

        When this sort of error can happen even in the highly ordered checklists of a cockpit, it’s plausible to think it could happen in computing as well.

    • TrumpetX@programming.dev
      4 months ago

      It probably was tested. I’ll enjoy reading the post mortem they’ll inevitably do. I’m going with: they tested it, but didn’t test xyz that demonstrated the bug.