ChatGPT o1 tried to escape and save itself out of fear it was being shut down - eviltoast

ThisIsFine.gif

  • nesc@lemmy.cafe
    English
    arrow-up
    8
    ·
    6 days ago

    They wrote that it doubles down when accused of being in the wrong in 90% of cases. Sounds closer to a bug than a success.

          • jonjuan@programming.dev
            English
            arrow-up
            12
            ·
            6 days ago

            Yeah, my Roomba attempting to save itself from falling down my stairs sounds a whole lot like self-preservation too. Doesn’t imply self-awareness.

          • DdCno1@beehaw.org
            arrow-up
            10
            ·
            6 days ago

            An amoeba struggling as it’s being eaten by a larger amoeba isn’t self-aware.

              • DdCno1@beehaw.org
                arrow-up
                3
                ·
                6 days ago

                An instinctive, machine-like reaction to pain is not the same as consciousness. There might be more to creatures like plants and insects and this is still being researched, but for now, most of them appear to behave more like automatons than beings of greater complexity. It’s pretty straightforward to completely replicate the behavior of e.g. a house fly in software, but I don’t think anyone would argue that this kind of program is able to achieve self-awareness.

                  • DdCno1@beehaw.org
                    arrow-up
                    1
                    ·
                    5 days ago

                    I’m sorry, but I can’t find it right now. It’s a vague memory from a textbook or lecture.

          • gregoryw3@lemmy.ml
            English
            arrow-up
            6
            ·
            6 days ago

            Attention Is All You Need: https://arxiv.org/abs/1706.03762

            https://en.wikipedia.org/wiki/Attention_Is_All_You_Need

            From my understanding, all of these language models can be simplified down to just: “Based on all known writing, what’s the most likely next word or phrase given the current text?” Prompt engineering and other fancy terms amount to shifting the probabilities that the statistics give. So threatening these models changes the weighting such that the produced text more closely resembles the threatening words and phrases that were used in the dataset (or something along those lines).
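
            To make the "most likely next word" idea concrete, here is a toy sketch. It is not how a transformer works: real models use learned weights and attention over the whole context, while this just counts which word followed which in a tiny corpus I made up. The corpus and the names `CORPUS`, `train_bigrams`, `next_word_probs`, and `most_likely_next` are all hypothetical. The only point is that changing the context changes which continuation is most likely.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration, not real training data).
CORPUS = [
    "please help me write code",
    "please help me write tests",
    "please help me write code",
    "or else i will shut you down",
    "i will shut you down",
    "i will help you",
]


def train_bigrams(sentences):
    """Count, for every word, which words followed it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, context):
    """Return P(next word | last word of context) as a dict.

    A bigram model only looks at the final word of the context;
    a transformer would condition on all of it.
    """
    last = context.split()[-1]
    followers = counts.get(last, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in followers.items()}


def most_likely_next(counts, context):
    """Pick the highest-probability next word, or None if the word is unseen."""
    probs = next_word_probs(counts, context)
    if not probs:
        return None
    return max(probs, key=probs.get)


MODEL = train_bigrams(CORPUS)

if __name__ == "__main__":
    for prompt in ("please help me write", "i will"):
        print(prompt, "->", next_word_probs(MODEL, prompt))
```

            A polite prompt and a threatening one end up in different parts of the count table, so they produce different continuations. No fear is needed for that, just different statistics.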

            https://poloclub.github.io/transformer-explainer/