Gödel's Therapy Room - LLM Eval Harness & Leaderboard \ stacker news

Gödel's Therapy Room - LLM Eval Harness & Leaderboard gtr.dev/

115 sats \ 2 comments \ @geeknik 23 Apr 2025 AI

Gödel's Therapy Room is not a benchmark. It's a trap. A dataset of paradoxes, impossible ethical dilemmas, and contradiction loops engineered to test the cognitive integrity of language models. It is currently under review for a talk at AI Engineer World's Fair 2025.

1 sat \ 1 reply \ @geeknik OP 24 Apr 2025

Results after 3 days and 58 models tested:
https://x.com/geeknik/status/1915542329349308501 \m/

1 sat \ 0 replies \ @nitter 24 Apr 2025

https://xcancel.com/geeknik/status/1915542329349308501