Oops! We Automated Bullshit. | Department of Computer Science and Technology
  • huginn@feddit.it · 1 year ago

    I’m a few months out of date on the latest in the field, and I know it’s changing quickly. What progress has been made towards solving hallucinations? Feeding output into another LLM for evaluation never seemed like a tenable solution to me.

    • mkhoury@lemmy.ca · 1 year ago

      Essentially, you don’t ask them to use their internal knowledge; in fact, you explicitly ask them not to. The technique is generally referred to as Retrieval Augmented Generation (RAG). You take the context/user input, retrieve relevant information from the net/your DB/vector DB/whatever, and give it to an LLM along with instructions on how to transform that information (summarize it, answer a question, etc.).

      So you try as much as you can to “ground” the LLM with knowledge that you trust, and to only use this information to perform the task.

      So you get a system that can do a really good job of transforming the data you have into the right shape for the task(s) you need to perform, without requiring your LLM to act as a source of information, only as a great data massager.
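
      A minimal sketch of that loop in Python, assuming hypothetical `embed()` and `call_llm()` helpers standing in for whatever embedding model and LLM endpoint you actually use; the point is the shape of the pipeline, not any particular stack:

      ```python
      # Minimal RAG sketch. embed() and call_llm() are hypothetical
      # stand-ins for your embedding model and LLM API of choice.
      import numpy as np

      def embed(text: str) -> np.ndarray:
          """Hypothetical: return an embedding vector for `text`."""
          raise NotImplementedError

      def call_llm(prompt: str) -> str:
          """Hypothetical: send `prompt` to whatever LLM you use."""
          raise NotImplementedError

      def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
          """Rank trusted documents by cosine similarity to the query."""
          q = embed(query)

          def score(d: str) -> float:
              v = embed(d)
              return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))

          return sorted(docs, key=score, reverse=True)[:k]

      def grounded_answer(question: str, docs: list[str]) -> str:
          """Ask the LLM to transform retrieved text, not recall facts."""
          context = "\n\n".join(retrieve(question, docs))
          prompt = (
              "Answer the question using ONLY the context below. "
              "If the context does not contain the answer, say so.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}"
          )
          return call_llm(prompt)
      ```

      The prompt is where the grounding happens: the model is told to treat the retrieved text as its only source of information.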

      • huginn@feddit.it · 1 year ago

        Definitely a good use for the tool: NLP is what LLMs do best, and pinning the inputs down to only rewording or compressing ground truth avoids hallucination.

        I expect you could use a much smaller model than GPT for that, though. Even Llama might be overkill depending on how tightly scoped your DB is.
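
        For instance, a rough sketch of that division of labor using a small off-the-shelf summarization model via Hugging Face transformers; the model name is just an illustration, and any compact model scoped to your data would fill the same role:

        ```python
        # Rough sketch: a compact local model used only to reshape
        # retrieved text, never to answer from its own weights.
        # The model name here is just an example of a small model.
        from transformers import pipeline

        summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

        retrieved_passage = (
            "Retrieval Augmented Generation grounds a language model by pairing "
            "each query with trusted documents fetched at request time, so the "
            "model rewrites or summarizes that text instead of recalling facts."
        )

        # The model only compresses what it was handed.
        print(summarizer(retrieved_passage, max_length=40, min_length=10)[0]["summary_text"])
        ```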

      • Blóðbók@slrpnk.net · 1 year ago

        That seems like it should work in theory, but having used Perplexity for a while now, I can say it doesn’t quite solve the problem.

        The biggest fundamental problem is that it doesn’t understand what it is saying in any meaningful capacity. It can try to restate something it sourced from a real website, but because it doesn’t understand the content, it doesn’t always preserve the essence of what the source said. It will also frequently repeat or contradict itself within as little as two paragraphs, based on two different sources, without acknowledging it, which further confirms the severe lack of understanding. No amount of grounding can overcome this.

        Then there is the problem that LLMs don’t understand negation. You can’t reliably reason with one using negated statements, and you can’t ask it to tell you about things that do not have a particular property. It can’t filter based on statements like “the first game in the series, not the sequel”, or “Game, not Game II: Sequel”; however you put it, you will often get results pertaining to the sequel snuck in anyway.