Interesting.

I think there's no GPU acceleration at all by default, at least on this setup, so the speed depends entirely on the CPU and the instruction sets it supports. The more modern the CPU, the better.
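If you want to see which vector instruction sets your CPU actually reports, here's a rough sketch. It assumes Linux on x86 (it reads `/proc/cpuinfo`), and the list of flags is just my pick of ones that CPU inference code commonly takes advantage of, not something taken from this setup's docs.

```python
from pathlib import Path

# Illustrative selection of x86 SIMD flags as they are named in /proc/cpuinfo.
# Which of these a given inference backend really uses is an assumption here.
INTERESTING = ("ssse3", "sse4_2", "avx", "avx2", "fma", "f16c", "avx512f")


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. non-x86 or non-Linux)


def simd_report(flags: set[str]) -> dict[str, bool]:
    """Map each flag of interest to whether the CPU reports it."""
    return {name: name in flags for name in INTERESTING}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        report = simd_report(parse_cpu_flags(path.read_text()))
        for name, present in report.items():
            print(f"{name:8s} {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch only works on Linux")
```

An older CPU that shows "no" for `avx2` would be one plausible reason for slow generation, though this alone doesn't prove it.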

I've read that you can configure [llama](https://github.com/ggerganov/llama.cpp/pull/1827) to use the GPU and get much better results.
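For anyone who wants to try it, here's a minimal sketch of what that could look like when calling llama.cpp directly. As far as I know, llama.cpp has an `--n-gpu-layers` (`-ngl`) option that offloads that many layers to the GPU, but only in a build compiled with a GPU backend. The binary name `./main`, the model path, and the layer count below are placeholders I made up, and I don't know whether the setup from the post exposes this option at all.

```python
import shlex


def build_llama_command(model_path, prompt, n_gpu_layers=0,
                        binary="./main", n_predict=64):
    """Build an argument list for a llama.cpp command-line run.

    binary and model_path are hypothetical placeholders; adjust to your install.
    n_gpu_layers > 0 adds --n-gpu-layers, which only helps on a GPU-enabled build.
    """
    if n_gpu_layers < 0:
        raise ValueError("n_gpu_layers must be >= 0")
    cmd = [binary, "-m", model_path, "-p", prompt, "-n", str(n_predict)]
    if n_gpu_layers > 0:
        cmd += ["--n-gpu-layers", str(n_gpu_layers)]
    return cmd


if __name__ == "__main__":
    # 32 layers is an arbitrary example; how many fit depends on the model and VRAM.
    cmd = build_llama_command("models/model.bin", "Hello", n_gpu_layers=32)
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd) to actually launch it
```

Starting with a small layer count and raising it until VRAM runs out is probably the safest way to find a value that works.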

nullama

tech

Set up your own private ChatGPT

I tried it out. It's extremely slow. I sent a simple "Hello" prompt, and it took about 10 seconds to write a response back. My PC is fairly decent (i7 and an RTX 2000 with 16 GB RAM).