Slow responses and inability to search the web unless you tell it to

SwapMarket

You should be getting great performance there. I would switch to llama.cpp instead of ollama (this will make a huge difference), and use the model I mentioned (Qwen 3.6 35b A3B). I would expect you get over 100TPS.

A Sovereign Brain on a Laptop: Local LLM + Pi Agent + Markdown

rolznz

You should be getting great performance there. I would switch to llama.cpp instead of ollama (this will make a huge difference), and use the model I mentioned (Qwen 3.6 35b A3B). I would expect you get over 100TPS.

What limitations did you see?