Agent 1: Mistral Vibe
https://cdn.arstechnica.net/wp-content/uploads/2025/12/aimine1.png
Agent 2: OpenAI Codex
https://cdn.arstechnica.net/wp-content/uploads/2025/12/aimine2.png
Agent 3: Anthropic Claude Code
https://cdn.arstechnica.net/wp-content/uploads/2025/12/aimine3.png
Agent 4: Google Gemini CLI
https://cdn.arstechnica.net/wp-content/uploads/2025/12/aimine4.png
Someone could start a pretty awesome website that only has agents perform practical tasks like this and compares them. It's the Techcrunch of tomorrow. Benchmarks leave me wanting.
https://lmarena.ai
RESULTSRESULTS
lambda-1201-2:VS
gemini-3-pro:That's better than the anecdotes I imagined but less entertaining
Updated with results... both work. lol
Source: https://github.com/nickarocho/minesweeper
Ahahah
Haha!
lol.