Gödel's Therapy Room is not a benchmark. It's a trap. A dataset of paradoxes, impossible ethical dilemmas, and contradiction loops engineered to test the cognitive integrity of language models. It is currently under review for a talk at AI Engineer World's Fair 2025.
0 sats \ 1 reply \ @geeknik OP 24 Apr
Results after 3 days and 58 models tested:
https://x.com/geeknik/status/1915542329349308501 \m/
0 sats \ 0 replies \ @nitter 24 Apr bot
https://xcancel.com/geeknik/status/1915542329349308501