I've used GPT-4 as a subscriber, and I run some models from Hugging Face locally on Linux with llama.cpp through the oobabooga interface. GPT-4 feels smarter and can definitely hold a long conversation better than anything you can run locally. I've chatted with it about all sorts of topics on a 2-hour commute to work several times. The models I can run locally (30B) are fun for a while, but eventually they run out of context window, drift off topic, or start repeating themselves. That said, conversations with a local uncensored model can feel more human at times with the right prompt and character, at least for a while.. maybe it's the lack of guardrails or something. For me, OpenAI's GPT-4 is for real consultation and inspiration; local models are for fun.

You can run a 30B GGUF model at reasonable speeds locally off an SSD with an old Nvidia Tesla P40 from eBay if you don't want to spend a lot on VRAM/GPU power.. that's what I've been doing.. just read up on how to get one working in a consumer PC (BIOS settings, power adapter, and cooling).
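If you'd rather poke at a local GGUF model from Python instead of the oobabooga UI, here's a rough sketch using the llama-cpp-python bindings. The model path and numbers are just placeholders, not my actual setup; tune n_gpu_layers to whatever fits in the P40's 24 GB:

```python
# Rough sketch with llama-cpp-python (pip install llama-cpp-python,
# built with CUDA support so layers can be offloaded to the P40).
# Paths and values below are placeholders -- adjust for your own setup.
from llama_cpp import Llama

llm = Llama(
    model_path="models/my-30b-model.Q4_K_M.gguf",  # hypothetical path to a 30B GGUF quant
    n_gpu_layers=60,   # offload as many layers as fit in VRAM
    n_ctx=4096,        # context window; models tend to drift/repeat once this fills up
)

out = llm(
    "You are a helpful assistant.\nUser: Explain what a GGUF quant is.\nAssistant:",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```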
10 sats \ 0 replies \ @o OP 15 Feb
This is precisely the kind of feedback I was looking to get. Thank you!