We talk smack about benchmarks but conclude they may finally be worth our time.

We do a dramatic reading of OpenAI's blog post, then feed it to OpenAgents, which sets up a new repo as a benchmark workspace.

We're going for the high score!

Watch on X: https://x.com/OpenAgentsInc/status/1823454256596213969
