Here's an article you might have come across in the last couple of days. It's about how a group came up with a way to train quite large LLMs (70b params) on equipment that a normal person could have in her basement. It turns out there was roughly a 10x price/performance gain available, but nobody was incentivized to pursue it:
Solving this problem is hard. It requires understanding many separate libraries (e.g., bitsandbytes, PEFT, Transformers, Accelerate, and PyTorch), and computer science and math concepts (e.g., discretization, distributed computing, GPU programming, linear algebra, SGD concepts such as gradient checkpointing), and how they all interact.
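One of those concepts, gradient checkpointing, can be made concrete in a few lines of PyTorch. This is a toy sketch only (the tensor shapes and the `block` function are invented for illustration, not taken from the article): activations inside the checkpointed function are discarded during the forward pass and recomputed during the backward pass, trading extra compute for lower memory use.

```python
import torch
from torch.utils.checkpoint import checkpoint

# A small layer whose intermediate activations we choose not to store.
w = torch.randn(8, 8, requires_grad=True)

def block(x):
    # Inside a checkpointed region, this activation is recomputed on backward
    # instead of being kept in memory after the forward pass.
    return torch.relu(x @ w)

x = torch.randn(4, 8)
y = checkpoint(block, x, use_reentrant=False).sum()
y.backward()  # gradients flow through the recomputed activations
```

At 70b-parameter scale this trick, combined with quantization and sharding across the other libraries listed above, is part of what makes consumer-GPU training feasible at all.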
So an eclectic set of skills requiring deep education and high intelligence. But the world is full of those problems and those kinds of people! And yet something about this was different:
Academia is full of brilliant people that solve hard problems. But academia hasn’t solved this particular problem. That’s because it’s difficult for university researchers to justify spending time on this kind of work. Combining existing tools and techniques together isn’t generally considered “novel” enough to result in publication in a high impact journal, but that’s the currency that academics need. Furthermore, academics are generally expected to become highly specialized within their field, making it challenging to bring together so many pieces into a single solution.
And, of course, big tech companies are also full of brilliant people that solve hard problems. But this particular problem, training models with consumer GPUs, isn’t a problem they need to solve – they’ve already bought the big expensive GPUs!
Many startups are also full of brilliant people that solve hard problems! But, as Eric Ries explains, “today’s financial market forces businesses to prioritize short-term gains over everything else”. It’s extremely hard for a startup to justify to investors why they’re spending their funds on open source software and public research.
So essentially: there was a problem, it was worth solving, and solving it would have been useful to lots of people. But the nature of the problem, the coordination required to solve it, and the system of incentives operating across the various groups that would have to coordinate did not align. The problem dropped between the outfielders of the existing structure.
This happens all the time, of course. It's just interesting to see it happen on this topic, right now, and for these particular reasons. It makes you ask the obvious question of what other important yet solvable problems are embedded in an ecosystem that can't solve them, even though it would be good for everyone if it did.
And the next obvious question, of how one might introduce a meta-system atop the existing system that could do better.