I tried it out. It's extremely slow. I sent a simple "Hello" prompt, and it took about 10 seconds to write a response back. My PC is fairly decent (i7 and RTX 2000 with 16 GB RAM).
Interesting.
I think there's no GPU acceleration at all by default, at least on this setup, so the speed depends entirely on the CPU and the instruction sets it supports. The more modern the CPU, the better.
I've read that you can configure llama to use the GPU and get much better results.
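For what it's worth, here is a rough sketch of what that could look like, assuming the tool in question is llama.cpp (or a wrapper around it) and that it was built with GPU support. In llama.cpp the `-ngl` / `--n-gpu-layers` option sets how many model layers are offloaded to the GPU. The binary name `./llama-cli`, the model path, and the layer count of 20 are placeholders I made up for illustration, not values from this thread; the helper only builds the command, it doesn't run it.

```python
import shlex


def build_llama_command(model_path, prompt, gpu_layers=0, binary="./llama-cli"):
    """Build an argument list for a llama.cpp-style CLI.

    The binary name is an assumption; older builds call it ./main.
    """
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be >= 0")
    cmd = [binary, "-m", model_path, "-p", prompt]
    if gpu_layers > 0:
        # -ngl / --n-gpu-layers: number of model layers to offload to the GPU.
        # Only has an effect if the binary was compiled with GPU support.
        cmd += ["-ngl", str(gpu_layers)]
    return cmd


if __name__ == "__main__":
    # Hypothetical model path and layer count, purely for illustration.
    cmd = build_llama_command("models/model.gguf", "Hello", gpu_layers=20)
    print(shlex.join(cmd))
```

With a limited amount of VRAM you would probably start with a small layer count and raise it until the model no longer fits, but I haven't measured that on this setup.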