LLMs can’t reason — they just crib reasoning-like steps from their training data - eviltoast
  • Optional@lemmy.world
    English
    arrow-up
    62
    ·
    1 month ago

    Did someone not know this like, pretty much from day one?

    Not the idiot executives that blew all their budget on AI and made up for it with mass layoffs - the people interested in it. Was that not clear that there was no “reasoning” going on?

    • khalid_salad@awful.systems
      English
      arrow-up
      37
      ·
      edit-2
      1 month ago

      Well, two responses I have seen to the claim that LLMs are not reasoning are:

      1. we are all just stochastic parrots lmao
      2. maybe intelligence is an emergent ability that will show up eventually (disregard the inability to falsify this and the categorical nonsense that is our definition of “emergent”).

      So I think this research is useful as a response to these, although I think “fuck off, promptfondler” is pretty good too.

      • LainTrain@lemmy.dbzer0.com
        English
        arrow-up
        5
        arrow-down
        18
        ·
        1 month ago

        Well are we not stochastic parrots then? Isn’t this a philosophical, rhetorical and equally unfalsifiable question to answer also?

        • FermiEstimate@lemmy.dbzer0.com
          English
          arrow-up
          24
          ·
          1 month ago

          No, there’s an actual paper where that term originated that goes into a great deal of detail explaining what it means and what it applies to. It answers those questions and addresses potential objections people might respond with.

          There’s no need for–and, frankly, nothing interesting about–“but, what is truth, really?” vibes-based takes on the term.

        • V0ldek@awful.systems
          English
          arrow-up
          12
          ·
          edit-2
          1 month ago

          Only in the philosophical sense of all of physics being a giant stochastic system.

          But that’s equally useful as saying that we’re Turing machines? Yes, if you draw a broad category of “all things that compute in our universe” then you can make a reasonable (but disputable!) argument that both me and a Python interpreter are in the same category of things. That doesn’t mean that a Python interpreter is smart/sentient/will solve climate change/whatever Sammy Boi wants to claim this week.

          Or, to use a different analogy, it’s like saying “we’re all just cosmic energy, bro”. Yes we are, pass the joint already and stop trying to raise billions of dollars for your energy woodchipper.

    • froztbyte@awful.systems
      English
      arrow-up
      28
      ·
      1 month ago

      there’s a lot of people (especially here, but not only here) who have had the insight to see this being the case, but there’s also been a lot of boosters and promptfondlers (ie. people with a vested interest) putting out claims that their precious word vomit machines are actually thinking

      so while this may confirm a known doubt, rigorous scientific testing (and disproving) of the claims is nonetheless a good thing

      • Soyweiser@awful.systems
        English
        arrow-up
        12
        ·
        1 month ago

        No, they do not, I’m afraid. Hell, I didn’t even know that ELIZA had caused people to think it could reason (and that this worried its creator) until a few years ago.

    • DarkThoughts@fedia.io
      arrow-up
      17
      arrow-down
      1
      ·
      1 month ago

      A lot of people still don’t, from what I can gather from some of the comments on “AI” topics. Especially the comments that skew the other way: the “AI” hysteria often comes from people who know fuck all about how the tech works. “Nudifier” or otherwise generative images, or explicit chats with bots that portray real or underage people, are the most common topics that attract emotionally loaded but highly uninformed demands and outrage. Frankly, the whole “AI” topic in the media is so massively overblown on both fronts, but I guess it is good for traffic and nuance is dead anyway.

      • Optional@lemmy.world
        English
        arrow-up
        9
        ·
        1 month ago

        Indeed, although every one of us who has seen a tech hype train once or twice expected nothing less.

        PDAs? Quantum computing. Touch screens. Siri. Cortana. Micropayments. Apps. Synergy of desktop and mobile.

        From the outset this went from “hey that’s kind of neat” to quite possibly toppling some giants of tech in a flash. Now all we have to do is wait for the boards to give huge payouts to the pinheads that drove this shitwagon in here and we can get back to doing cool things without some imaginary fantasy stapled on to it at the explicit instruction of marketing and channel sales.

        • Soyweiser@awful.systems
          English
          arrow-up
          15
          ·
          edit-2
          1 month ago

          Xml also used to be a tech hype for a bit.

          And I still remember how media outlets hyped up Second Life, forgot about it, and a few months later discovered it again and more hype started. It was fun.

          • bitofhope@awful.systems
            English
            arrow-up
            14
            ·
            1 month ago

            Oh man, XML is such a funny hype. What if we took S-expressions and made them less human readable, harder to parse programmatically and with multiple ways to do the same thing! Do I encode something as an element with the key as a tag and the value as the content, or do I make it an attribute of a tag? Just look at the schema, which is yet more XML! Include this magic URL at the top of your document. Want to query something from the document? Here you go! No, that’s not a base64-encoded private key nor a transcript of someone’s editing session in vim, that’s an XPath.
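
The element-versus-attribute question above can be made concrete with a small sketch using Python's standard `xml.etree.ElementTree`. The two documents and the helper names are hypothetical, made up only to show that the same fact needs two different readers depending on which encoding the producer picked.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Two hypothetical documents carrying the same fact: the user's name is "ada".
AS_ELEMENT = "<user><name>ada</name></user>"
AS_ATTRIBUTE = '<user name="ada"/>'


def name_from_element(xml_text: str) -> Optional[str]:
    """Read the name when it is stored as the text of a child element."""
    return ET.fromstring(xml_text).findtext("name")


def name_from_attribute(xml_text: str) -> Optional[str]:
    """Read the name when it is stored as an attribute on the root element."""
    return ET.fromstring(xml_text).get("name")


if __name__ == "__main__":
    print(name_from_element(AS_ELEMENT))
    print(name_from_attribute(AS_ATTRIBUTE))
    # Each reader comes back empty-handed on the other encoding, so a
    # consumer has to know (or look up in a schema) which one was chosen.
    print(name_from_element(AS_ATTRIBUTE))
    print(name_from_attribute(AS_ELEMENT))
```

Neither encoding is wrong as far as XML is concerned, which is the complaint: the format leaves the choice to each producer.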

            JSON has its issues but at least it’s only the worst of some worlds. Want to make JSON unparsable anyway, for a laugh? Try YAML, the serialization format recommended by four out of five Nordic countries!
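
The Nordic joke presumably points at what is often called the "Norway problem": under YAML 1.1 rules, an unquoted `NO` is read as a boolean, so a list of country codes loses Norway. The sketch below re-implements only that one implicit-boolean rule with the standard library, leaving out the single-letter `y`/`n` forms. It is an illustration under that assumption, not a YAML parser, and `resolve_plain_scalar` is a made-up name.

```python
import re

# Assumption: this mirrors only the implicit-boolean rule that YAML 1.1
# loaders commonly apply to unquoted scalars. Everything else about YAML
# is deliberately ignored.
_YAML11_BOOL = re.compile(
    r"^(?:yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
_TRUTHY = {"yes", "true", "on"}


def resolve_plain_scalar(text: str):
    """Return a bool if the unquoted scalar matches the bool rule, else the string."""
    if _YAML11_BOOL.match(text):
        return text.lower() in _TRUTHY
    return text


if __name__ == "__main__":
    # Country codes for the five Nordic countries; one of them is special.
    for code in ["SE", "DK", "FI", "IS", "NO"]:
        print(code, "->", repr(resolve_plain_scalar(code)))
```

Quoting the value (`"NO"`) is the usual way to keep it a string.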

            • Soyweiser@awful.systems
              English
              arrow-up
              10
              ·
              1 month ago

              No, that’s not a base64-encoded private key nor a transcript of someone’s editing session in vim, that’s an XPath.

              lol

            • self@awful.systems
              English
              arrow-up
              9
              ·
              1 month ago

              JSON has its issues but at least it’s only the worst of some worlds. Want to make JSON unparsable anyway, for a laugh? Try YAML, the serialization format recommended by four out of five Nordic countries!

              fucking

              this take is so dangerously real I’m pretty sure uttering it at work will earn you a PIP and a fistfight in the parking lot with the lead data architect

              you know, normal startup shit

            • froztbyte@awful.systems
              English
              arrow-up
              7
              ·
              1 month ago

              Try YAML, the serialization format recommended by four out of five Nordic countries

              yeah there are so many fucking crazy footguns in yaml

              another I quite like:

               ipython -c 'import yaml; d = dict(); d["d"] = d; print(yaml.safe_dump(d))'
              &id001
              d: *id001
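
For contrast, here is a sketch of what the standard-library `json` module does with the same self-referential dict as in the command above. JSON has no anchors or aliases, so the encoder refuses the object instead of emitting a reference. `try_json_dump` is a hypothetical helper name.

```python
import json


def try_json_dump(obj) -> str:
    """Return the JSON text, or the error message if json refuses the object."""
    try:
        return json.dumps(obj)
    except ValueError as exc:
        # json.dumps checks for circular references by default.
        return f"ValueError: {exc}"


if __name__ == "__main__":
    d = dict()
    d["d"] = d  # a dict that contains itself
    print(try_json_dump(d))
```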
              
            • JFranek@awful.systems
              English
              arrow-up
              4
              ·
              1 month ago

              YAML is great if you need to make simple configuration files

              … which is why no one uses it for things like Kubernetes /s

              • zogwarg@awful.systems
                English
                arrow-up
                4
                ·
                1 month ago

                To be “fair”, the Kubernetes API only supports strongly validated/typed YAML-ish input…, it won’t let you put non-string values in string locations. And in reality, at the HTTP API layer (at least for kubectl), JSON is used. (Which also means you can’t do the weirder occult YAML things that JSON wouldn’t let you do.)

                You have to blame the deep-nestedness of k8s resources for unreadability…

                • froztbyte@awful.systems
                  English
                  arrow-up
                  3
                  ·
                  1 month ago

                  You have to blame the deep-nestedness of k8s resources for unreadability

                  this shit happens because FUCKING GO is a piece of shit (cf that post (from iirc fasterthanlime?) about how the go apis infect everything)

                  which should not be read as me supporting k8s, fwiw. fuck that noise too.

            • self@awful.systems
              English
              arrow-up
              11
              ·
              1 month ago

              Sarvega, Inc., the leading provider of high-performance XML networking solutions, today announced the Sarvega XML Context™ Router, the first product to enable loosely coupled multi-point XML Web Services across wide area networks (WANs). The Sarvega XML Context Router is the first XML appliance to route XML content at wire speed based on deep content inspection, supporting publish-subscribe (pub-sub) models while simultaneously providing secure and reliable delivery guarantees.

              it’s fucking delicious how thick the buzzwords are for an incredibly simple device:

              • it parses XPath quickly (for 2004 (and honestly I never knew XPath and XQuery were a bottleneck… maybe this XML thing isn’t working out))
              • it decides which web app gets what traffic, but only if the web app speaks XML, for some reason
              • it implements an event queue, maybe?
              • it’s probably a thin proprietary layer with a Cisco-esque management CLI built on appropriated open source software, all running on a BSD but in a shiny rackmount case
              • the executive class at the time really had rediscovered cocaine, and that’s why we were all forced to put up with this bullshit
              • this shit still exists but it does the same thing with a semi-proprietary YAML and too much JSON as this thing does with XML, and now it’s in the cloud, cause the executive class never undiscovered cocaine
                • froztbyte@awful.systems
                  English
                  arrow-up
                  10
                  ·
                  1 month ago

                  and now of course instead of people handcrafting xml documents by string-cating angle brackets and tags together in bad php files, we have people manually dash-cating yaml together in bad jinja and go template files! progress!
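
A minimal sketch of why gluing angle brackets together by hand goes wrong, using Python's standard `xml.etree.ElementTree`. The `book`/`title` tags and the sample value are made up for illustration. The same class of problem applies to templated YAML, where indentation and quoting play the role of escaping.

```python
import xml.etree.ElementTree as ET


def xml_by_concatenation(title: str) -> str:
    """Naive approach: glue angle brackets around the raw value."""
    return "<book><title>" + title + "</title></book>"


def xml_by_builder(title: str) -> str:
    """Let the library build the tree, so special characters get escaped."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    return ET.tostring(book, encoding="unicode")


def is_well_formed(xml_text: str) -> bool:
    """Report whether the text parses as XML at all."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    tricky = "Q&A <draft>"  # hypothetical value containing markup characters
    print(is_well_formed(xml_by_concatenation(tricky)))
    print(is_well_formed(xml_by_builder(tricky)))
```

The concatenated version only works until a value contains `&` or `<`; the builder version escapes them and survives a round trip.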

            • V0ldek@awful.systems
              English
              arrow-up
              8
              ·
              1 month ago

              (see this for some others)

              This article appears to contain a large number of buzzwords. (July 2011)

              WP:LOL. WP:LMAO even

          • V0ldek@awful.systems
            English
            arrow-up
            8
            ·
            1 month ago

            Xml also used to be a tech hype for a bit.

            Wha… What?

            I’m trying to imagine a news anchor hyping about XM-fucking-L and I’m drawing a complete blank, is this a zen riddle

            • Soyweiser@awful.systems
              English
              arrow-up
              8
              ·
              edit-2
              1 month ago

              It didn’t jump out of tech media containment, so it wasn’t a mainstream hype thing, more a techworker hype thing. It was the data serialization standard which would save the web! Second Life, otoh, did massively jump containment.

              • self@awful.systems
                English
                arrow-up
                8
                ·
                1 month ago

                I’ve always seen XML as much more of a tech executive thing — here’s the language that’ll run your entire business but is also incredibly easy to create proprietary semantics with, ensuring you can’t be ousted without taking the company down with you! it looks like absolute shit and it’s painful to type! buy in now!

                • froztbyte@awful.systems
                  English
                  arrow-up
                  9
                  ·
                  1 month ago

                  I know someone who was hired (around turn of the century) because they knew how to xml with a certain kind of then-important big systems api

                  the stories I’ve heard from there are hilarious

                  but is also incredibly easy to create proprietary semantics with

                  christ the shit I’ve seen with network vendors…. shibboleth NETCONF/YANG. advance warning; abyss grade 6+

                  • bitofhope@awful.systems
                    English
                    arrow-up
                    6
                    ·
                    1 month ago

                    And yet there are some tasks I wish I could do in NETCONF instead of the thing we’re actually using, but apparently the documentation for this interface is difficult and expensive for the company to get my hands on, for reasons.

                • David Gerard@awful.systemsOPM
                  English
                  arrow-up
                  8
                  ·
                  1 month ago

                  XML works fine for what it is, it’s just a bit verbose. Not sure it’d be my first choice for a new thing, but it’s not a toxic waste dump if you’re allowed to do it properly.

            • rook@awful.systems
              English
              arrow-up
              9
              ·
              1 month ago

              The trackpad and trackpoint of my aging linux laptop stop working if the thing gets its lid shut. The touchscreen continues to work just fine, however. It turns out that while two stupid things can’t make a good thing, they can sometimes cancel each other out.

              • Optional@lemmy.world
                English
                arrow-up
                7
                ·
                1 month ago

                A handy benefit no doubt, but not quite the earth-shaking revolution the touchscreen hype-train promised at the time.

              • Optional@lemmy.world
                English
                arrow-up
                9
                ·
                1 month ago

                Of course, of course. At the time though, it was expected that this would change the face of computing - no more keyboards! No more mice! No, this is more like Star Trek where you glance down at some geometric assemblage of colored shapes and tap several in random succession to immediately bring up the data you were looking for.

                That, uh, did not happen.

          • V0ldek@awful.systems
            English
            arrow-up
            6
            ·
            1 month ago

            Aren’t touch screens literally everywhere? What was the hype?

            It’s always so baffling to me to learn about those things because I was way too young to actually experience any of the “hype” around most of those technologies. Touch screens are cool and they penetrated society so much that they are at my grocery shop, what the fuck were they supposed to do if that’s not living up to the hype?

            • o7___o7@awful.systems
              English
              arrow-up
              7
              ·
              edit-2
              1 month ago

              To add to the others’ comments, they were much less impressive before we had capacitive touch screens. Older resistive screens needed a good deal of mechanical force to register a press (great for longevity!) and required frequent re-calibration. They just weren’t very satisfying to use compared to any modern smart phone or tablet.

              • froztbyte@awful.systems
                English
                arrow-up
                7
                ·
                1 month ago

                yeah partly this

                and also the other kinds of issues: touchscreens are (even now still) a vastly more complicated engineering item to add than simple toggle switches, and in many places they don’t make sense or are a bad solution to pick

                but in the hype of then, touchscreens everywhere! turning your lights on? touchscreen. starting your shower water running? touchscreen. opening your window? touchscreen. calling a flight attendant? touchscreen. running your microwave? touchscreen. configuring your fridge temperature? touchscreen.

                so, y’know, the usual “this new technology will save us, on everything” bullshit that industries seem so prone to. same reason as why we’re seeing so much llm-everywhere bullshit

    • astrsk@fedia.io
      arrow-up
      13
      ·
      1 month ago

      Isn’t OpenAI saying that o1 has reasoning as a specific selling point?

        • astrsk@fedia.io
          arrow-up
          8
          ·
          1 month ago

          Which is my point, and forgive me, but I believe is the point of the research publication.

      • DarkThoughts@fedia.io
        arrow-up
        6
        ·
        1 month ago

        My best guess is it generates several possible replies and then does some sort of token match to determine which one may potentially be the most accurate. Not sure if I’d call that “reasoning” but I guess it could potentially improve results in some cases. With OpenAI not being so open it is hard to tell though. They’ve been overpromising a lot already so it may as well be just complete bullshit.
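
The kind of scheme guessed at here can be sketched as simple majority voting over several candidate replies. This is only an illustration of the guess, with made-up candidates and function names; nothing in it is a claim about how o1 actually works.

```python
from collections import Counter
from typing import List


def normalize(answer: str) -> str:
    """Collapse case and whitespace so trivially different replies compare equal."""
    return " ".join(answer.lower().split())


def pick_by_majority(candidates: List[str]) -> str:
    """Return the first candidate whose normalized form is the most common one."""
    if not candidates:
        raise ValueError("need at least one candidate")
    counts = Counter(normalize(c) for c in candidates)
    winner, _ = counts.most_common(1)[0]
    return next(c for c in candidates if normalize(c) == winner)


if __name__ == "__main__":
    # Hypothetical replies sampled for the same question.
    replies = ["42", " 42 ", "41"]
    print(pick_by_majority(replies))
```

Agreement between samples says nothing about whether the agreed answer is correct, which fits the doubt about calling this “reasoning”.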

        • lunarul@lemmy.world
          English
          arrow-up
          4
          ·
          1 month ago

          My best guess is it generates several possible replies and then does some sort of token match to determine which one may potentially be the most accurate.

          Didn’t the previous models already do this?

    • conciselyverbose@sh.itjust.works
      English
      arrow-up
      11
      ·
      edit-2
      1 month ago

      Yes.

      But the lies around them are so excessive that it’s a lot easier for executives of a publicly traded company to make reasonable decisions if they have concrete support for it.

    • A_Very_Big_Fan@lemmy.world
      English
      arrow-up
      1
      arrow-down
      6
      ·
      1 month ago

      Seriously, I’ve seen 100x more headlines like this than people claiming LLMs can reason. Either they don’t understand, or think we don’t understand what “artificial” means.