
I have experimental pipelines with fast-agent running locally (though I may switch frameworks or write something myself) that can do these things too. You simply pre-program the optimal model per agent, the prompting, and so on.
For example, in my current experimental setup I use a large qwen3 on leased compute for analysis, or for walking a pre-collected code graph through MCP, and then use mistral 32b locally to code a prototype, call pylint through MCP, fix the issues, and so on.
It works okay-ish if you define small enough actions and then just loop.
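Roughly, the loop looks like this. This is a hypothetical sketch, not my exact setup: `chat()` stands in for whatever client your framework gives you, and the model tags, prompts, and `proto.py` file name are placeholders. Only the pylint call is real.

```python
# Sketch of the "small actions, then loop" pattern described above.
# chat() is a stand-in for the framework client (fast-agent or otherwise);
# model names, prompts, and proto.py are illustrative placeholders.
import subprocess

def chat(model: str, system: str, prompt: str) -> str:
    """Placeholder: send prompt to a local/leased model, return the completion."""
    raise NotImplementedError  # wire up to your own endpoint

ANALYST = "qwen3-large"      # big model on leased compute (placeholder tag)
CODER = "mistral-small-32b"  # local 32b coder (placeholder tag)

def build_prototype(task: str, max_rounds: int = 5) -> str:
    # One small action per step: analyse, code, lint, fix, repeat.
    plan = chat(ANALYST, "You analyse code graphs.", f"Plan a prototype for: {task}")
    code = chat(CODER, "You write Python.", f"Implement this plan:\n{plan}")
    for _ in range(max_rounds):
        with open("proto.py", "w") as f:
            f.write(code)
        lint = subprocess.run(["pylint", "proto.py"], capture_output=True, text=True)
        if lint.returncode == 0:  # pylint exits 0 only when it found no issues
            break
        code = chat(CODER, "You fix lint findings.",
                    f"Fix these pylint findings:\n{lint.stdout}\n\nCurrent code:\n{code}")
    return code
```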
110 sats \ 1 reply \ @freetx 15h
There is so much to learn.
I actually think eventually we are going to be able to self-host most of this. I think coding models will eventually top out, at the point where their incremental usefulness starts slowing down and commodity hardware catches up (I've been watching the AMD AI MAX+ 395 setups).
Sure, the top-end models will keep being impressive, but eventually everything becomes a commodity... I mean, in the early days of smartphones it was practically a necessity to upgrade from iPhone 1 to 2 to 3, as each change was huge. Now a person could reasonably use an iPhone 10 even though it's going on a decade... these eventually become "solved problems".
21 sats \ 0 replies \ @optimism 15h
Agreed.
On principle, I don't use any LLM subscription plan and have no middlemen in my setup. They shall not steal my data, they shall not mess with the output, and they shall definitely not know what I'm coding. Because fuck these guys; they aren't players with your best interest at heart.
So yeah: everything sovereign. I wish there were a larger version of llama3.2 or a distilled version of llama4, because the small models, despite nice and clean instruction following, still hallucinate too much to do analysis, and I can't run the big ones on an Apple M4.
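For illustration, a minimal sketch of what no-middleman inference looks like in practice, assuming ollama is serving a locally pulled llama3.2 (the model tag and prompt are just placeholders):

```python
# Minimal sketch of a sovereign setup: everything runs on localhost,
# nothing leaves the machine. Assumes `ollama serve` is running and a
# llama3.2 model has been pulled; the prompt is illustrative.
import ollama

response = ollama.chat(
    model="llama3.2",  # small instruct model; swap for whatever fits in RAM
    messages=[{"role": "user", "content": "Explain what a code graph is."}],
)
print(response["message"]["content"])
```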