21 sats \ 3 replies \ @mrsu 15 Jun 2024 \ parent \ on: Making my local LLM voice assistant faster and more scalable with RAG tech
Probably much sooner than you think. AI is moving pretty quickly.
Even for self-hosting like this guy's doing?
reply
Yes. It's getting much easier. You can spin up a backend now without much technical knowledge. You don't even need a GPU (although one speeds things up significantly).
If you're interested, look into Ollama or LM Studio. They expose APIs for interfacing with local LLMs. Then there are a bunch of clients you can install and point at that endpoint.
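To give a feel for how simple the endpoint is, here's a rough sketch of hitting a local Ollama server from Python. It assumes Ollama is running on its default port (11434) and that you've already pulled a model; `llama3` here is just an example name.

```python
import json
import urllib.request

# Minimal sketch: one-shot prompt against a local Ollama server.
# Assumes `ollama pull llama3` has been run already (model name is an example).
payload = {
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": False,  # return one complete response instead of streaming tokens
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

Any client that can speak to that endpoint (or LM Studio's, which is OpenAI-compatible) works the same way: point it at the URL and go.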
reply
Thanks
I’ll have a look
reply