`Reinforcement Learning`

There's an advanced, free, open-source course from Hugging Face: https://huggingface.co/learn/deep-rl-course/unit0/introduction

south_korea_ln

this thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. nothing else.


Deep dive into OpenAI's GPT-OSS outputs 🧵

zuspotirko

> this thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. nothing else.

Perfectly aligns with the perceived villain arc of the CEO. I made a small comment yesterday about how it's apparently okay in AI to do what gave VW massive reputation problems: build inferior products that only perform well on the benchmarks and safety tests.