At this point every week there’s a new “insane benchmark” headline
Real test is still, can it actually help without hallucinating halfway through the task? But yeah, the pace AI models are moving right now is honestly wild.
At this point every week there’s a new “insane benchmark” headline
True. A year ago o3 was the best model on the market. Progress is fast.
Real test is still, can it actually help without hallucinating halfway through the task?
Have you used a SOTA model in opencode yet? Chatbots still do that - the progress of agents is on another level tho.
Hmmm, looks like artificial analysis ai has a different takeaway
https://twiiit.com/Google/status/2056788266872140232