
I found this to be the most useful video on DeepSeek that I've seen so far. I wanted to understand what the real innovation of DeepSeek was.
Some takeaways:
  • DeepSeek claims to have trained a Llama-sized model for only $5 million. <-- To me, this is still a suspect claim.
  • DeepSeek uses a Mixture of Experts architecture and distillation. These are essentially strategies for shrinking the effective model: MoE activates only a fraction of the parameters for each token, and distillation compresses a large model's behavior into a smaller one.
    • I'm not sure how much of an innovation this really is; both methods were already known.
  • DeepSeek has made some mathematical and engineering optimizations that speed up computation, reportedly including lower-precision arithmetic and a more memory-efficient attention mechanism.
  • DeepSeek R1 is a chain-of-thought model trained with reinforcement learning.
    • The advantage here is that chains of thought are not needed as part of the training data, only questions and answers. DeepSeek optimizes its chains of thought simply to maximize the rewards (and minimize the penalties) from reinforcement learning: the model is rewarded for getting the right answer, with a small additional reward for laying out its chain of thought (see the sketch after this list). This makes training more accessible for people without huge data centers and huge training datasets.
    • This seems to be one of the main innovations, because previously only OpenAI's o1 had demonstrated this kind of chain-of-thought reasoning. The claim is that DeepSeek makes training and inference of CoT models cheaper and more accessible.
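To make the RL part concrete, here's a minimal sketch of the kind of rule-based reward that setup implies: score the final answer against a reference, plus a small bonus for showing the reasoning. The <think>/<answer> tag format and the exact weights here are my own illustrative assumptions, not DeepSeek's published recipe (the R1 paper describes rule-based accuracy and format rewards optimized with GRPO).

```
import re

# Minimal sketch of an outcome-based RL reward for chain-of-thought training.
# The weights (1.0 for a correct answer, 0.1 for showing reasoning) and the
# <think>/<answer> tag format are illustrative assumptions, not DeepSeek's
# exact recipe.
def reward(completion: str, reference_answer: str) -> float:
    score = 0.0

    # Small reward for exposing a chain of thought in the expected format.
    if re.search(r"<think>.*?</think>", completion, re.DOTALL):
        score += 0.1

    # Main reward: did the final answer match the reference?
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        score += 1.0

    return score


if __name__ == "__main__":
    good = "<think>2 + 2 is 4 because ...</think><answer>4</answer>"
    bad = "<answer>5</answer>"
    print(reward(good, "4"))  # 1.1 -> correct answer plus format bonus
    print(reward(bad, "4"))   # 0.0 -> wrong answer, no reasoning shown
```

Note that the only training data needed is the question and the reference answer; no hand-written chains of thought appear anywhere.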
I feel like I understand more why people are hyped about DeepSeek, but I'm not fully convinced I should believe the hype. There are dozens of videos out there showing DeepSeek's supposed capabilities, but YouTubers' incentive is to overhype stuff. They show all the cool things, but how many failed DeepSeek responses did they not show? How much of the training cost is not being properly reported?
In any case, the fact that DeepSeek released their code as open source is a good sign. It means that even if they're overhyped, people can quickly learn where the pros and cons are.
Is this bad for Nvidia? I don't think so. See Jevons Paradox.
Is this bad for OpenAI? Let's just say Sam Altman is probably wetting his pants
Is this bad for Nvidia? I don't think so. See Jevons Paradox.
Good observation.
DeepSeek is easier to run on a home PC, which may create demand for self-hosting and, consequently, Nvidia hardware.
reply
200 sats \ 3 replies \ @k00b 29 Jan
To me, this is still a suspect claim.
I've been reading any decent blog I can find on DeepSeek over the last few days, and much of the AI community believes the claims and has seemingly reasoned its way to them being accurate (I can't really verify it myself). It also makes sense to me that these processes have a lot of room to be optimized.[1]
I found this blog very approachable.
I've come across a few bearish takes that I must not have bookmarked, where researchers were finding it lacking but still surprised at how good it was for the cost.
I also came across this bear case for Nvidia today that I haven't read yet.

Footnotes

  1. I'm planning to write about this more, but I couldn't help mentioning it here: Using their own data, and some unchecked AI math, it took 3 terajoules of energy to produce their sub-human intelligence using 2.8M H800 GPU hours. A human brain consumes about 56 gigajoules by the time it's 40 years old (using more unchecked AI math). If we take the energy consumed by the human brain as the limit of what's minimally required to make a human intelligence, we have a lot more leaps in efficiency still on the table. Obviously this is ignoring the mammalian kernel that we're born with and the energy required by the environment we're trained on, but I still think it's directionally true.
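Here's a rough back-of-the-envelope version of that arithmetic. The per-GPU power draw (~350 W for an H800) and the ~20 W brain figure are my own assumptions, so the brain total comes out a bit lower than the footnote's 56 GJ, but the directional conclusion, a gap of roughly two orders of magnitude, is the same.

```
# Back-of-the-envelope check of the footnote's energy comparison.
# Assumptions (mine, not from the thread): an H800 draws roughly 350 W under
# load, and a human brain runs at roughly 20 W. Both are ballpark figures.

GPU_HOURS = 2.8e6          # reported H800 GPU hours for training
GPU_WATTS = 350            # assumed average draw per H800, in watts
BRAIN_WATTS = 20           # assumed human brain power draw, in watts
YEARS = 40
SECONDS_PER_YEAR = 365.25 * 24 * 3600

training_joules = GPU_HOURS * 3600 * GPU_WATTS          # ~3.5e12 J (a few TJ)
brain_joules = BRAIN_WATTS * YEARS * SECONDS_PER_YEAR   # ~2.5e10 J (tens of GJ)

print(f"training: {training_joules:.2e} J")
print(f"brain over {YEARS} years: {brain_joules:.2e} J")
print(f"ratio: {training_joules / brain_joules:.0f}x")
```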
reply
Damn, I still have a lot to learn. But that blog also confirms what I've been learning: it's the reinforcement learning part that's the real big innovation.
As to the cost, the blog says this:
Some say DeepSeek simply digested OAI’s models and have replicated their intelligence, and the benefit of all that human-in-the-loop reinforcement, at a fraction of the cost.
You could argue OAI scraped the internet to make ChatGPT and now DeepSeek have scraped ChatGPT.
All is fair in love and AI, right?
So, the cost figure is definitely a lot more believable if you consider that it was built on top of previously trained models and weights. I think the way some people were talking about it made it seem like they did it from scratch.
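For what it's worth, "scraping ChatGPT" in this sense just means collecting a stronger model's answers and using them as supervised fine-tuning targets for your own model. Here's a toy, self-contained sketch of that data-collection step, where query_teacher is a hypothetical stand-in for calling someone else's API; nothing here is DeepSeek's actual pipeline.

```
# Toy sketch of distillation-by-scraping: collect a stronger model's answers
# and use them as supervised fine-tuning targets for a smaller model.
# query_teacher is a stand-in for an API call to an already-trained model.

def query_teacher(prompt: str) -> str:
    # Canned answers stand in for the teacher model's real responses.
    canned = {
        "What is 2 + 2?": "2 + 2 = 4.",
        "Name a prime number greater than 10.": "11 is a prime greater than 10.",
    }
    return canned[prompt]

prompts = ["What is 2 + 2?", "Name a prime number greater than 10."]

# The "scraped" dataset: (prompt, teacher answer) pairs. Fine-tuning a smaller
# model on pairs like these is how it inherits the teacher's behavior without
# paying for the teacher's original training run.
sft_dataset = [(p, query_teacher(p)) for p in prompts]

for prompt, target in sft_dataset:
    print(f"prompt: {prompt!r} -> target: {target!r}")
```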
reply
30 sats \ 1 reply \ @k00b 29 Jan
I think the ChatGPT scraping is still unverified afaik. They might've used it to build the RL training data I guess.
In this blog, which gets a lot more technical, they note that the optimizations DeepSeek made were, in retrospect, obviously good targets:
None of these improvements seem like they were found as a result of some brute-force search through possible ideas. Instead, they look like they were carefully devised by researchers who understood how a Transformer works and how its various architectural deficiencies can be addressed.
reply
Fascinating
reply
DeepSeek CEO presenting the HQ office.
For those who don't know, this is a scene from the series "Silicon Valley". It's exactly the same with the DeepSeek crap.
reply
The thing this has highlighted most for me: AI has unpredictable "halving" events for GPU revenue, and it will become hyper-deflationary quickly as companies undercut each other and open-source new developments.
Between innovation in hardware and software stacks, shit is gonna get wild, and there are gonna be tons of unknown unknowns / black swan developments.
reply
Hot take: LLMs are commodities... all the money thrown at developing them ends up being furnace fodder when someone else does it faster/better/smarter on cheaper hardware.
reply
I'm far from an expert in AI stuff. I understand it about as well as you'd expect someone with our econometrics training to.
One of the claims I heard, which made a lot of sense to me, is that they radically reduced the computational load by cutting way back on the precision of the calculations (there's a rough sketch of the idea below).
Like k00b said, there's probably a lot of room for optimization of these processes.
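On the precision point: DeepSeek-V3 reportedly did much of its training in FP8. Here's a toy numpy illustration of the general trade-off, the same matrix multiply at full and half precision. float16 is just a stand-in (numpy has no native FP8), and nothing here reflects DeepSeek's actual mixed-precision setup.

```
import numpy as np

# Toy illustration of low-precision arithmetic: the same matrix multiply in
# float32 vs float16. Half the bytes per value, and on real accelerator
# hardware roughly proportionally more throughput, at the cost of some
# accuracy.

rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512), dtype=np.float32)
b = rng.standard_normal((512, 512), dtype=np.float32)

full = a @ b                                                    # float32 reference
half = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

norm_error = np.abs(full - half).max() / np.abs(full).max()
print(f"bytes per value: {np.float32().nbytes} -> {np.float16().nbytes}")
print(f"max error relative to the largest entry: {norm_error:.4f}")
```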
reply
It would have been nice to have AI in my grad school days. It's actually a great tool for lit review and for explaining basic ideas that you have a general sense of but want elaborated on.
reply