I'm getting around 5 tokens/sec running LLaMA 7B on an i7 with 16GB RAM and an RTX 2000. That's not fast enough for me to consider it usable.