This thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. Nothing else.
Perfectly aligns with the perceived villain arc of the CEO. I made a small comment yesterday about how it's apparently okay in AI to do what gave VW massive reputation problems: build inferior products that only perform well on the benchmarks and safety tests.
What's RL?
Reinforcement Learning. There's an advanced free and open source course at HuggingFace: https://huggingface.co/learn/deep-rl-course/unit0/introduction
Oh, got it. Somehow the initials didn't click.
Bizarre in a good way or “I can’t sleep tonight” kind of way? 👀 Can’t wait for the charts and weird outliers
https://xcancel.com/jxmnop/status/1953899426075816164