Someone could start a pretty awesome website that only has agents perform practical tasks like this and compares them. It's the TechCrunch of tomorrow. Benchmarks leave me wanting.

https://lmarena.ai


RESULTS

lambda-1201-2:

VS

gemini-3-pro:

That's better than the anecdotes I imagined but less entertaining

Updated with results... both work. lol
