
You should be getting great performance there. I would switch from ollama to llama.cpp (this will make a huge difference) and use the model I mentioned (Qwen 3.6 35b A3B). I would expect you to get over 100 TPS.

What limitations did you see?

Slow responses, and it can't search the web unless you explicitly tell it to.
