> At this point every week there’s a new “insane benchmark” headline

True. A year ago, o3 was the best model on the market. Progress is fast.

> Real test is still, can it actually help without hallucinating halfway through the task?

Have you used a SOTA model in opencode yet? Chatbots still do that, but the progress of agents is on another level tho.

At this point every week there’s a new “insane benchmark” headline.
Real test is still, can it actually help without hallucinating halfway through the task? But yeah, the pace at which AI models are moving right now is honestly wild.

3a0991ac06

Hmmm, looks like Artificial Analysis (artificialanalysis.ai) has a different takeaway.

![](https://pbs.twimg.com/media/HIs33S0XsAAG9Jj?format=jpg&name=orig)

zuspotirko

https://twiiit.com/Google/status/2056788266872140232

nitter

![](https://pbs.twimg.com/media/HIsuCHxaUAAwn8y?format=jpg&name=orig)

https://x.com/Google/status/2056788266872140232