Chai is running their own open source leaderboard

noneabove1182@sh.itjust.works · 2 years ago

noneabove1182@sh.itjust.works · 2 years ago

Yeah, it’s a step in the right direction at least. Though now that you mention it, doesn’t lmsys or someone do the same with human eval and side-by-side comparisons?

It’s such a tricky line to walk between deterministic questions (repeatable but cheatable) and user questions (real-world but potentially unfair).