Poisoned AI went rogue during training and couldn't be taught to behave again in 'legitimately scary' study - eviltoast

AI researchers found that widely used safety training techniques failed to remove malicious behavior from large language models, and one technique even backfired, teaching the AI to recognize its triggers and better hide its bad behavior from the researchers.

  • JustMy2c@lemm.ee
    10 months ago

    I know we don’t like them here but the word reddit is not banned (yet)

      • JustMy2c@lemm.ee
        10 months ago

        So you’re saying that “Inflammatory data” isn’t a reference to reddit? :D

        • kent_eh@lemmy.ca
          10 months ago

          I’d say using Twitter and Facebook would be worse than reddit. Or, and I shudder to think about it, Truth Social…

        • Daxtron2@startrek.website
          10 months ago

          Not inherently, I’m sure that’s part of it but it’s really everywhere. Even here on Lemmy I’ve run into nasty folk

          • JustMy2c@lemm.ee
            10 months ago

            True but it’s reddit that’s served as a base for most models…

              • JustMy2c@lemm.ee
                10 months ago

                Obviously but reddit is in the goldilocks zone where you get coherent intelligent stuff and humor and facts.

                But it’s still toxic for an AI.

                  • JustMy2c@lemm.ee
                    10 months ago

                    Correcto but maybe it DOES apply to most asked questions, if you know where I’m going with that

        • Chocrates@lemmy.world
          10 months ago

          No, the LLM is the AI. OP is saying that if you train it with hate, it’s gonna spit out hate

          • JustMy2c@lemm.ee
            10 months ago

            And I’m saying that reddit data is sublime for AI. And specifically that it’s infested with toxicity