reply on: How Is LLaMa.cpp Possible? \ stacker news

I'm getting around 5 tokens/sec with an i7 + 16GB RAM + RTX 2000 running LLaMa 7B.
It's not fast enough for me to consider it usable.