14 sats \ 8 replies \ @freetx 27 Jan \ on: DeepSeek is super censored (My Experience) tech
Imagine you ran a factory that required 100MW of power to churn out 1,000 widgets per day, but then a breakthrough came out that let your factory run on only 1MW. You would have two choices: (a) keep churning out 1,000 widgets and use only 1MW of power, or (b) churn out 100,000 widgets using the same 100MW power budget.
My point is: yes, DeepSeek requires only a fraction of the computing power that traditional AI used; however, those sitting on all those "over-provisioned" datacenters will now be able to scale up even bigger models, with more parameters.
I suspect this will dawn on the market over the next few weeks.
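Back-of-envelope arithmetic for the two choices in the analogy (a minimal sketch; the 100x efficiency factor is just the number from the example above, not a measured figure):

```python
# Illustrative numbers only, taken from the factory analogy.
POWER_BUDGET_MW = 100            # power already provisioned
OLD_MW_PER_1000_WIDGETS = 100    # old cost: 100 MW per 1,000 widgets/day
NEW_MW_PER_1000_WIDGETS = 1      # after the breakthrough: 1 MW per 1,000 widgets/day

# Choice (a): keep output fixed, pocket the power savings.
power_for_same_output_mw = NEW_MW_PER_1000_WIDGETS

# Choice (b): keep the power budget fixed, scale output instead.
widgets_at_full_budget = 1000 * POWER_BUDGET_MW / NEW_MW_PER_1000_WIDGETS

print(f"(a) same 1,000 widgets/day using {power_for_same_output_mw} MW")
print(f"(b) same {POWER_BUDGET_MW} MW budget producing {widgets_at_full_budget:,.0f} widgets/day")
```

The argument is that datacenter owners pick (b): efficiency gains get spent on bigger models rather than saved.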
reply
I actually didn't think only huge tech companies could play with it; look at venice.ai or unleashed.chat. Or can you not compare them, since they mostly use models that were already developed by the big players?
reply
The question is: what is the source / proof that DeepSeek in fact used only a fraction of what OpenAI used?
reply
From the arXiv paper that they released. I was skeptical of this, but evidently it's completely open-sourced and thus verifiable.
People on Twitter have said most of this "breakthrough" was common-sense tuning optimizations that reduced memory use with a slight uptick in error rate, tuned so that the error rate didn't spike significantly (a toy illustration of that trade-off is below).
Basically: Anyone with a constrained hardware budget would've eventually taken this approach.
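Not what DeepSeek literally did (their paper describes more involved techniques), but a minimal NumPy sketch of the general memory-for-accuracy trade being described: store the same weights in lower precision, save memory, accept a small rounding error.

```python
import numpy as np

# Toy illustration: half-precision storage halves memory at the cost of
# a small relative rounding error on the weights.
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(1_000_000).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

memory_saving = 1 - weights_fp16.nbytes / weights_fp32.nbytes
rel_error = np.mean(
    np.abs(weights_fp16.astype(np.float32) - weights_fp32)
    / (np.abs(weights_fp32) + 1e-12)
)

print(f"memory saved: {memory_saving:.0%}")              # ~50%
print(f"mean relative rounding error: {rel_error:.2e}")  # small but nonzero
```

Scale that kind of decision across an entire training pipeline and you get the "less memory, slightly more error, tuned so it doesn't blow up" picture people are describing.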
reply