(I'm mainly using "token" instead of "compute" to trigger you all, but also because it's how gpu inference is billed)
assumptions: various AI models are becoming very useful, and with the era of agents on our doorstep there are a few things on the horizon
main thesis:
- the capital allocation game will change significantly: instead of writing preseed/seed checks to hundreds of startups, you just buy gpus and allocate compute time to ideas
- anyone with access to cheap power and the capital to build infrastructure now will be in a very advantageous position, since compute is a lot more reusable than money paid out to developers
- it's gonna be interesting to see how much of an edge a large frontier model has vs a good-enough model that can run on somewhat commodity hardware (let's say a $10k budget). I guess a small good-enough model with lots of unified memory for context could get decently far
- this has the potential to be an extremely centralizing force
- a lot of big bitcoin miners might be underpriced, given that they are perfectly set up for this (tho to be fair many are already running hybrid loads)
- it could lead to interesting business models where people/companies pitch in spare compute as an investment in new ventures
thoughts?
I've been comparing venice (because they let me pay one-off with sats) vs small models on a macbook for agents. Large models are better at task breakdown and code generation, but even qwen3:4b locally can do pretty amazing things if you instruct it correctly. For task graph decomposition, a small model pulled it off locally (took forever, mostly because of a modeling inefficiency in the json output I instructed it to give me). Then devstral or codellama for actual operations.
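To make the task-graph idea concrete, here's a minimal sketch of the plumbing around the model: assume the small model emits a json task graph (the schema here is a made-up example, not any tool's actual format), and the agent harness topologically sorts it before handing tasks to a coding model.

```python
import json
from graphlib import TopologicalSorter  # stdlib, python 3.9+

# hypothetical json a small local model might be instructed to emit
raw = json.dumps({
    "tasks": [
        {"id": "write_tests", "deps": ["design_api"]},
        {"id": "design_api", "deps": []},
        {"id": "implement", "deps": ["design_api", "write_tests"]},
    ]
})

def execution_order(raw_json: str) -> list[str]:
    """Parse the model's json task graph and return a valid execution order."""
    graph = {t["id"]: set(t["deps"]) for t in json.loads(raw_json)["tasks"]}
    return list(TopologicalSorter(graph).static_order())

print(execution_order(raw))  # design_api before write_tests before implement
```

The point being: the expensive frontier model (or a human) only has to get the decomposition right once; the per-task execution can then go to whatever cheap local model is good enough.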