Anthropic and Apollo astounded to find that a chatbot will lie to you if you tell it to lie to you - eviltoast
  • Architeuthis@awful.systems
    link
    fedilink
    English
    arrow-up
    15
    ·
    edit-2
    3 days ago

    Slate Scott just wrote about a billion words of extra rigorous prompt-anthropomorphizing fanfiction on the subject of the paper, he called the article When Claude Fights Back.

    Can’t help but wonder if he’s just a critihype enabling useful idiot who refuses to know better or if he’s being purposefully dishonest to proselytize people into his brand of AI doomerism and EA, or if the difference is meaningful.

    edit: The claude syllogistic scratchpad also makes an appearance, it’s that thing where we pretend that they have a module that gives you access to the LLM’s inner monologue complete with privacy settings, instead of just recording the result of someone prompting a variation of “So what were you thinking when you wrote so and so, remember no one can read what you reply here”. Que a bunch of people in the comments moving straight into wondering if Claude has qualia.

    • istewart@awful.systems
      link
      fedilink
      English
      arrow-up
      10
      ·
      3 days ago

      I feel like “qualia” is both an interesting concept, and a buzzword that has rapidly grown to indicate people who need to be aggressively ignored.

    • o7___o7@awful.systems
      link
      fedilink
      English
      arrow-up
      17
      ·
      edit-2
      3 days ago

      I used to think that comparing LLMs to people was dumb, because LLMs are just feed-forward networks–basically seven bipartite graphs in a trench coat–that are incapable of introspection.

      However, I’m coming around to the notion that some of our drive-by visitors have a brain that’s seven cells deep.

      • Soyweiser@awful.systems
        link
        fedilink
        English
        arrow-up
        3
        ·
        2 days ago

        We just are very good at anthropomorphizing. We created pet rocks for example (also showing that capitalism is more than happy to jump into this)

      • dustycups@aussie.zone
        link
        fedilink
        English
        arrow-up
        5
        ·
        edit-2
        3 days ago

        I feel attacked.
        Seriously I hate the idea that my comments are replies to engagement bots. I’m sure some are but my seven cells are too busy to work out which ones.

        edit: cell

      • leftzero@lemmynsfw.com
        link
        fedilink
        English
        arrow-up
        7
        ·
        3 days ago

        Yeah, general artificial intelligence LLMs are definitely not. Human level intelligence, though… yeah, that depends on what particular human you’re talking about.

        (Though, to be fair, this isn’t limited to LLMs… it also applies to Eliza, for instance, or your average lump of granite.)

  • Soyweiser@awful.systems
    link
    fedilink
    English
    arrow-up
    3
    ·
    2 days ago

    I dont even get where they are going with this, It is a bit like asking a suspected troll if they are a troll, if they answer yes they are a troll, if they answer no you still suspect they are a troll.

    (This is assuming they are not doing critihype, lets ask them. Oh no).

  • PhilipTheBucket@ponder.cat
    link
    fedilink
    English
    arrow-up
    5
    arrow-down
    15
    ·
    3 days ago

    AI developers need to generate criti-hype — “criticism” that says the AI is way too cool and powerful and will take over the world, so you should give them more funding to control it.

    This isn’t quite accurate. The criticism is that if new AI abilities run ahead of the ability to make the AI behave sensibly, we will reach an inflection point where the AI will be in charge of the humans, not vice versa, before we make sure that it won’t do horrifying things.

    AI chat bots that do bizarre and pointless things, but are clearly capable of some kind of sophistication, are exactly the warning sign that as it gains new capabilities this is a danger we need to be aware of. Of course, that’s a separate question from the question of whether funding any particular organization will lead to any increase in safety, or whether asking a chatbot about some imaginary scenario has anything to do with any of this.

    • nightsky@awful.systems
      link
      fedilink
      English
      arrow-up
      21
      ·
      3 days ago

      With your choice of words you are anthropomorphizing LLMs. No valid reasoning can occur when starting from a false point of origin.

      Or to put it differently: to me this is similarly ridiculous as if you were arguing that bubble sort may somehow “gain new abilites” and do “horrifying things”.

      • self@awful.systems
        link
        fedilink
        English
        arrow-up
        17
        ·
        3 days ago

        I had assumed the golden age of people coming here to critihype LLMs was over because most people outside of Silicon Valley (including a lot of nontechnical people) have realized the technology’s garbage but nope! we’ve got a rush of posters trying the same shit that didn’t work a year ago, as if we’ve never seen critihype before. maybe bitcoin hitting $100,000 makes them think their new grift is gonna make it? maybe their favorite fuckheads entering office is making all their e/acc dreams come true? who can say.

        • David Gerard@awful.systemsOPM
          link
          fedilink
          English
          arrow-up
          15
          ·
          edit-2
          3 days ago

          in crypto, these guys run on a six to eighteen month cycle - at get in, evangelise, get rekt and disappear in embarrassment. What this means is that the only people who actually remember the history of crypto are the critics.

          i once had a coiner demand in outrage that i prooove my claim that bitcoin was started by libertarians.

          anyway. dunno if the same will hold in AI grift, but yeah recycling refuted claims as if nothing happened is standard in other areas of pseudoscience.

      • PhilipTheBucket@ponder.cat
        link
        fedilink
        English
        arrow-up
        3
        arrow-down
        14
        ·
        3 days ago

        Ah yes, if there’s one lesson to be gained from the last few years, it is that AI technology never changes, and people never connect it to anything in the real world. If only I’d used a Pokémon metaphor, I would have realized that earlier.

        • Architeuthis@awful.systems
          link
          fedilink
          English
          arrow-up
          18
          ·
          edit-2
          3 days ago

          I mean, you could have answered by naming one fabled new ability LLM’s suddenly ‘gained’ instead of being a smarmy tadpole, but you didn’t.

          • PhilipTheBucket@ponder.cat
            link
            fedilink
            English
            arrow-up
            2
            arrow-down
            12
            ·
            3 days ago

            I wasn’t limiting it to LLMs specifically. I don’t think it is up for debate that as years go by, new “AI” stuff periodically starts existing that didn’t exist before. That’s still true even though people tend to overhype the capabilities of LLMs specifically and conflate LLMs with “AI” just because they are good at appearing more capabale than they are.

            If you wanted to limit to to LLM and get some specifics about capabilities that start to emerge as the model size grows and how, here’s a good intro: https://arxiv.org/abs/2206.04615

            • David Gerard@awful.systemsOPM
              link
              fedilink
              English
              arrow-up
              17
              ·
              3 days ago

              lol, there has literally never been a gain of function claim that checked out

              you’re posting like an evangelist, this way to the egress

    • self@awful.systems
      link
      fedilink
      English
      arrow-up
      14
      ·
      3 days ago

      AI chat bots that do bizarre and pointless things, but are clearly capable of some kind of sophistication, are exactly the warning sign that as it gains new capabilities this is a danger we need to be aware of.

      hahahaha nope

      • PhilipTheBucket@ponder.cat
        link
        fedilink
        English
        arrow-up
        3
        arrow-down
        11
        ·
        3 days ago

        Here’s a video of an expert in the field saying it more coherently and at more length than I did:

        https://youtu.be/zkbPdEHEyEI

        You’re free to decide that you are right and we are wrong, but I feel like that’s more likely to be from the Dunning-Kruger effect than from your having achieved a deeper understanding of the issues than he has.

        • YourNetworkIsHaunted@awful.systems
          link
          fedilink
          English
          arrow-up
          6
          ·
          2 days ago

          Okay apparently it was my turn to subject myself to this nonsense and it’s pretty obvious what the problem is. As far as citations go I’m gonna go ahead and fall back to “watching how a human toddler learns about the world” which is something I’m sure most AI researchers probably don’t have experience with as it does usually involve interacting with a woman at some point.

          In the real examples that he provides, the system isn’t “picking up the wrong goal” as an agent somehow. Instead it’s seeing the wrong pattern. Learning “I get a pat on the head for getting to the bottom-right-est corner of the level” rather than “I get a pat on the head when I touch the coin.” These are totally equivalent in the training data, so it’s not surprising that it’s going with the simpler option that doesn’t require recognizing “coin” as anything relevant. This failure state is entirely within the realms of existing machine learning techniques and models because identifying patterns in large amounts of data is the kind of thing they’re known to be very good at. But there isn’t any kind of instrumental goal establishing happening here as much as the system is recognizing that it should reproduce games where it moves in certain ways.

          This is also a failure state that’s common in humans learning about the world, so it’s easy to see why people think we’re on the right track. We had to teach my little on the difference between “Daddy doesn’t like music” and “Daddy doesn’t like having the Blaze and the Monster Machines theme song shout/sang at him when I’m trying to talk to Mama.” The difference comes in the fact that even as a toddler there’s enough metacognition and actual thought going on that you can help guide them in the right direction, rather than needing to feed them a whole mess of additional examples and rebuild the underlying pattern.

          And the extension of this kind of pattern misrecognition into sci-fi end of the world nonsense is still unwarranted anthropomorphism. Like, we’re trying to use evidence that it’s too dumb to learn the rules of a video game as evidence that it’s going to start engaging in advanced metacognition and secrecy.

          • Soyweiser@awful.systems
            link
            fedilink
            English
            arrow-up
            4
            ·
            2 days ago

            “watching how a human toddler learns about the world”

            I have several family members with kids now and this is quite funny. A toddler learns how to crawl by looking at a lot of adults crawling or something.

        • Amoeba_Girl@awful.systems
          link
          fedilink
          English
          arrow-up
          14
          ·
          edit-2
          3 days ago

          For anyone who rightfully can’t be arsed to click that link, the expert is “Robert Miles AI Safety”, who I assume is an expert (a youtuber) in the madeup field of “AI safety”.

          Not to be confused with the late and great dream trance producer Robert Miles whom we all love dearly.

          • Soyweiser@awful.systems
            link
            fedilink
            English
            arrow-up
            3
            ·
            2 days ago

            I think he is a cs guy, who also is into EA, so basically it is the same source as the research OP posted about.

        • self@awful.systems
          link
          fedilink
          English
          arrow-up
          18
          ·
          3 days ago

          who the fuck is “we”? you’re some asshole who bought the critihype so hard you think that when the chatbot does dumb computer shit that only proves it’s more human and more dangerous. you’re not in on this grift, you’re a mark.