Reasoning failures highlighted by Apple research on LLMs - eviltoast