At this point every week there’s a new “insane benchmark” headline
True. A year ago o3 was the best model on the market. Progress is fast.
The real test is still: can it actually help without hallucinating halfway through the task?
Have you used a SOTA model in opencode yet? Chatbots still do that, but the progress of agents is on another level tho.
At this point every week there’s a new “insane benchmark” headline
The real test is still: can it actually help without hallucinating halfway through the task? But yeah, the pace AI models are moving at right now is honestly wild.