https://lmarena.ai


![](https://m.stacker.news/122876)

----

### RESULTS

`lambda-1201-2`:

![](https://m.stacker.news/122877)

VS

`gemini-3-pro`:


![](https://m.stacker.news/122878)


Someone could start a pretty awesome website that only has agents perform practical tasks like this and compares them. It's the Techcrunch of tomorrow. Benchmarks leave me wanting.

k00b

optimism

Agent 1: Mistral Vibe

https://cdn.arstechnica.net/wp-content/uploads/2025/12/aimine1.png

Agent 2: OpenAI Codex

https://cdn.arstechnica.net/wp-content/uploads/2025/12/aimine2.png

Agent 3: Anthropic Claude Code

https://cdn.arstechnica.net/wp-content/uploads/2025/12/aimine3.png

Agent 4: Google Gemini CLI

https://cdn.arstechnica.net/wp-content/uploads/2025/12/aimine4.png



RESULTSRESULTS