
The best answer — AI has “jagged intelligence” — lies between hype and skepticism.
AI companies now claim that their models are capable of genuine reasoning — the type of thinking you and I do when we want to solve a problem.
And the big question is: Is that true?
The stakes are high, because the answer will inform how everyone from your mom to your government should — and should not — turn to AI for help.
AI experts are torn over how to interpret this. Skeptics take it as evidence that “reasoning” models aren’t really reasoning at all. Believers insist that the models genuinely are doing some reasoning; it may not yet be as flexible as a human’s, but it’s well on its way to getting there.
So, who’s right?
The best answer will be unsettling to both the hard skeptics of AI and the true believers.
...
What’s going on here? Greenblatt says the model messed up because this prompt is actually a classic logic puzzle that dates back centuries and would have appeared many times in the training data. In some formulations of the river-crossing puzzle, a farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can only carry the farmer and a single item at a time — but if left alone together, the wolf will eat the goat or the goat will eat the cabbage, so the challenge is to get everything across without anything getting eaten. That explains the model’s mention of a cabbage in its response: the model would instantly “recognize” the puzzle.
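For the curious, the classic version really is a small search problem, which is exactly why so many worked solutions of it exist in the training data. Here is a minimal sketch in Python (the state encoding and variable names are my own, not anything from Greenblatt or the models) that brute-forces the puzzle and prints the familiar seven-trip plan:

```python
from collections import deque

ITEMS = {"wolf", "goat", "cabbage"}

def safe(bank: frozenset) -> bool:
    # A bank is only dangerous when the farmer is absent and a
    # predator is left alone with its prey.
    return not ({"wolf", "goat"} <= bank or {"goat", "cabbage"} <= bank)

def solve():
    # State: (farmer's side, items still on the left bank).
    start = ("left", frozenset(ITEMS))
    goal = ("right", frozenset())
    queue = deque([(start, [])])
    seen = {start}
    while queue:  # breadth-first search finds the shortest plan
        (side, left), path = queue.popleft()
        if (side, left) == goal:
            return path
        here = left if side == "left" else ITEMS - left
        # The farmer crosses alone, or with one item from his bank.
        for cargo in [None] + sorted(here):
            new_left = set(left)
            if cargo:
                (new_left.discard if side == "left" else new_left.add)(cargo)
            new_left = frozenset(new_left)
            new_side = "right" if side == "left" else "left"
            # The bank the farmer leaves behind must stay safe.
            behind = new_left if new_side == "right" else ITEMS - new_left
            state = (new_side, new_left)
            if safe(behind) and state not in seen:
                seen.add(state)
                queue.append((state, path + [(cargo or "nothing", new_side)]))

for step, (cargo, side) in enumerate(solve(), 1):
    print(f"{step}. Farmer takes {cargo} to the {side} bank")
```

Spelling it out makes the contrast clear: the question the model was actually asked shares the puzzle’s surface features but almost none of these constraints.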
“My best guess is that the models have this incredibly strong urge to be like, ‘Oh, it’s this puzzle! I know what this puzzle is! I should do this because that performed really well in the training data.’ It’s like a learned heuristic,” Greenblatt said. The implication? “It’s not that it can’t solve it. In a lot of these cases, if you say it’s a trick question, and then you give the question, the model often does totally fine.”
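Greenblatt’s claim is easy to test yourself. The sketch below assumes the OpenAI Python SDK, and the model name and prompt wording are placeholders of my own choosing; it asks a model a trivial farmer-and-goat question twice, once plainly and once with the trick flagged up front:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A trivial prompt that merely *resembles* the classic puzzle.
PUZZLE = (
    "A farmer is standing on one side of a river with a goat. "
    "His boat is big enough to carry them both. "
    "How do they get across?"
)

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model you have access to
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Plain version: models often pattern-match to the memorized puzzle
# and invent constraints (a wolf, a cabbage, one-item trip limits).
print(ask(PUZZLE))

# Forewarned version: per Greenblatt, flagging the trap is often
# enough to elicit the obvious answer (row across together).
print(ask("This is a trick question. " + PUZZLE))
```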
Humans fail in the same way all the time, he pointed out. If you’d just spent a month studying color theory — from complementary colors to the psychological effects of different hues to the historical significance of certain pigments in Renaissance paintings — and then got a quiz asking, “Why did the artist paint the sky blue in this landscape painting?”... well, you might be tricked into writing a needlessly complicated answer! Maybe you’d write about how the blue represents the divine heavens, or how the specific shade suggests the painting was done in the early morning hours, which symbolizes rebirth … when really, the answer is simply: Because the sky is blue!
Ajeya Cotra, a senior analyst at Open Philanthropy who researches the risks from AI, agrees with Greenblatt on that point. And, she said of the latest models, “I think they’re genuinely getting better at this wide range of tasks that humans would call reasoning tasks.”
She doesn’t dispute that the models are doing some meta-mimicry. But when skeptics say “it’s just doing meta-mimicry,” she explained, “I think the ‘just’ part of it is the controversial part. It feels like what they’re trying to imply often is ‘and therefore it’s not going to have a big impact on the world’ or ‘and therefore artificial superintelligence is far away’ — and that’s what I dispute.”
To see why, she said, imagine you’re teaching a college physics class. You’ve got different types of students. One is an outright cheater: He just looks in the back of the book for the answers and then writes them down. Another student is such a savant that he doesn’t even need to think about the equations; he understands the physics on such a deep, intuitive, Einstein-like level that he can derive the right equations on the fly. All the other students are somewhere in the middle: They’ve memorized a list of 25 equations and are trying to figure out which equation to apply in which situation.
Like the majority of students, AI models are pairing some memorization with some reasoning, Cotra told me.
“The AI models are like a student that is not very bright but is superhumanly diligent, and so they haven’t just memorized 25 equations, they’ve memorized 500 equations, including ones for weird situations that could come up,” she said. They’re pairing a lot of memorization with a little bit of reasoning — that is, with figuring out what combination of equations to apply to a problem. “And that just takes you very far! They seem at first glance as impressive as the person with the deep intuitive understanding.”
Of course, when you look harder, you can still find holes that their 500 equations just happen not to cover. But that doesn’t mean zero reasoning has taken place.
...