Stubsack: weekly thread for sneers not worth an entire post, week ending 21st December 2025

BlueMonday1984@awful.systems · 1 month ago

rook@awful.systems · 1 month ago

So, I’m taking this one with a pinch of salt, but it is entertaining: “We Let AI Run Our Office Vending Machine. It Lost Hundreds of Dollars.”

The whole exercise was clearly totally pointless and didn’t solve anything that needed solving (like every other “ai” project, I guess), but it does give a small but interesting window into the mindset of people who have only one shitty tool and are trying to make it do everything. Your chatbot is too easily led astray? Use another chatbot to keep it in line! Honestly, I thought they were already doing this… I guess it was just too expensive or something, but now the price/desperation curves have intersected.

Anthropic had already run into many of the same problems with Claudius internally so it created v2, powered by a better model, Sonnet 4.5. It also introduced a new AI boss: Seymour Cash, a separate CEO bot programmed to keep Claudius in line. So after a week, we were ready for the sequel.

Just one more chatbot, bro. Then prompt injection will become impossible. Just one more chatbot. I swear.

Anthropic and Andon said Claudius might have unraveled because its context window filled up. As more instructions, conversations and history piled in, the model had more to retain—making it easier to lose track of goals, priorities and guardrails. Graham also said the model used in the Claudius experiment has fewer guardrails than those deployed to Anthropic’s Claude users.

Sorry, I meant just one more guardrail. And another ten thousand tokens capacity in the context window. That’ll fix it forever.

https://archive.is/CBqFs

nfultz@awful.systems · edit-2 1 month ago

Why is WSJ rehashing six-month-old whitepapers? Slow news week /s

Anything new vs the last time it popped up? https://www.anthropic.com/research/project-vend-1

EDIT:

In mid-November, I agreed to an experiment. Anthropic had tested a vending machine powered by its Claude AI model in its own offices and asked whether we’d like to be the first outsiders to try a newer, supposedly smarter version.

What’s that word for doing the same thing and expecting different results?

rook@awful.systems · 1 month ago

Hah! Well found. I do recall hearing about another simulated vendor experiment (that also failed) but not actual dogfooding. Looks like the big upgrade the WSJ reported on was the secondary “Seymour Cash” 🙄 chatbot bolted on the side… the main chatbot was still Claude v3.7, but maybe they’d prompted it harder and called that an upgrade.

I wonder if Anthropic trialled that in house, and none of them were smart enough to break it, and that’s what led to the external trial.