
This thing is clearly trained via RL to think through and solve tasks for specific reasoning benchmarks, and nothing else.

Perfectly aligns with the perceived villain arc of the CEO. I made a small comment yesterday about how it's apparently okay in AI to do what gave VW massive reputation problems: build inferior products that only perform well on the benchmarks and safety tests.


What's RL?


Reinforcement Learning

There's an advanced free and open source course at HuggingFace: https://huggingface.co/learn/deep-rl-course/unit0/introduction


Oh, got it. Somehow the initials didn't click.


Bizarre in a good way or “I can’t sleep tonight” kind of way? 👀 Can’t wait for the charts and weird outliers
