Reasoning failures highlighted by Apple research on LLMs - eviltoast
  • gr3q@lemmy.ml
    17 days ago

    I tested ChatGPT. It needed some nagging, but it could do it. It needed the "size", "blank" and "white" keywords.

    Obviously a lot harder than it should be, but not impossible.