
To me, this is still a suspect claim.
I've been reading any decent blog I can find on DeepSeek over the last few days. Much of the AI community believes the claims and has seemingly reasoned its way to them being accurate (I can't really verify this myself). It also makes sense to me that these processes have a lot of room to be optimized.1
I found this blog very approachable.
I've come across a few bearish takes (which I must not have bookmarked) where researchers found it lacking but were still surprised at how good it was for the cost.
I also came across this bear case for Nvidia today that I haven't read yet.

Footnotes

  1. I'm planning to write about this more, but I couldn't help mentioning it here: using their own data, and some unchecked AI math, it took about 3 terajoules of energy to produce their sub-human intelligence over 2.8m H800 GPU hours. A human brain consumes about 56 gigajoules by the time it's 40yo (using more unchecked AI math). If we take the energy consumed by the human brain as the limit of what's minimally required to make a human intelligence, we have a lot more leaps in efficiency still on the table. Obviously this ignores the mammalian kernel that we're born with and the energy required by the environment we're trained on, but I still think it's directionally true. (A rough sanity check of the arithmetic is sketched below.)
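Here's that back-of-the-envelope check in Python. The per-GPU draw and the brain wattage are my assumptions, not figures from DeepSeek or the blogs: roughly 300 W average per H800 reproduces the ~3 TJ figure, while the commonly cited ~20 W brain comes out nearer 25 GJ than 56 GJ over 40 years, so treat all of it as directional at best.

```python
# Back-of-the-envelope energy comparison. All constants below are assumptions
# for illustration, not official DeepSeek numbers.

GPU_HOURS = 2.8e6        # reported H800 GPU-hours for training
WATTS_PER_GPU = 300      # assumed average draw per H800 (TDP is ~350 W)

# GPU-hours * watts gives watt-hours; multiply by 3600 s/h to get joules.
training_joules = GPU_HOURS * WATTS_PER_GPU * 3600
print(f"Training energy: {training_joules / 1e12:.1f} TJ")   # ~3.0 TJ

BRAIN_WATTS = 20         # commonly cited brain power; the 56 GJ figure implies ~44 W
YEARS = 40
brain_joules = BRAIN_WATTS * YEARS * 365.25 * 24 * 3600
print(f"Brain energy by age {YEARS}: {brain_joules / 1e9:.0f} GJ")  # ~25 GJ

print(f"Training used ~{training_joules / brain_joules:.0f}x a 40yo brain's lifetime energy")
```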
Damn, I still have a lot to learn. But that blog also confirms what I've been learning: the Reinforcement Learning part is the real big innovation.
As to the cost, the blog says this:
Some say DeepSeek simply digested OAI’s models and have replicated their intelligence, and the benefit of all that human-in-the-loop reinforcement, at a fraction of the cost.
You could argue OAI scraped the internet to make ChatGPT and now DeepSeek have scraped ChatGPT.
All is fair in love and AI, right?
So, the cost figure is definitely a lot more believable if you consider that it was built on top of previously trained models and weights. The way some people were talking about it made it seem like they did it entirely from scratch.
@k00b 29 Jan
The ChatGPT scraping is still unverified, afaik. They might've used it to build the RL training data, I guess.
In this blog, which gets a lot more technical, they note that the optimizations DeepSeek made were, in retrospect, obviously good targets for optimization:
None of these improvements seem like they were found as a result of some brute-force search through possible ideas. Instead, they look like they were carefully devised by researchers who understood how a Transformer works and how its various architectural deficiencies can be addressed.
Fascinating