You should be getting great performance there. I would switch from ollama to llama.cpp (this will make a huge difference) and use the model I mentioned (Qwen 3.6 35b A3B). I would expect you to get over 100 TPS.
What limitations did you see?
Slow responses and inability to search the web unless you tell it to