You should be getting great performance there. I would switch from ollama to llama.cpp (this will make a huge difference) and use the model I mentioned (Qwen 3.6 35b A3B). I would expect you to get over 100 TPS.
What limitations did you see?
Slow responses, and it can't search the web unless you tell it to.
I just tried ollama with qwen code on my gaming PC with an RTX 4090 (24 GB) and was underwhelmed by the sluggishness and limitations. A local laptop has no chance. But I am spoiled by Claude, which just keeps evolving.