pull down to refresh
Someone could start a pretty awesome website that only has agents perform practical tasks like this and compares them. It's the Techcrunch of tomorrow. Benchmarks leave me wanting.
https://lmarena.ai
lambda-1201-2:
lambda-1201-2
VS
gemini-3-pro:
gemini-3-pro
That's better than the anecdotes I imagined but less entertaining
Updated with results... both work. lol
Source: https://github.com/nickarocho/minesweeper
Ahahah
Haha!
Someone could start a pretty awesome website that only has agents perform practical tasks like this and compares them. It's the Techcrunch of tomorrow. Benchmarks leave me wanting.