This thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. Nothing else.
Perfectly aligns with the perceived villain arc of the CEO. I made a small comment yesterday about how it's apparently okay in AI to do what gave VW massive reputation problems: build inferior products that only perform well on the benchmarks and safety tests.
What's RL?
95 sats \ 1 reply \ @optimism 2h
Reinforcement Learning
There's an advanced free and open source course at HuggingFace: https://huggingface.co/learn/deep-rl-course/unit0/introduction
Oh, got it. Somehow the initials didn't click.