I've been using Chroma for this.
Ah, I hadn't heard of it, which shows how unfamiliar I am with these tools.
It's probably not as fast as the one you linked, but the vector DB isn't the bottleneck for me; most of the text embedding models run super slow on my machine, and I'm not entirely sure why yet.
This morning I used Chroma for embedding audio, with a cheap, old tokenizer, just to see if it actually works (it does, because apparently maffs don't give a shit whether a float comes from text, pictures, or audio).
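For the record, this is roughly what that looks like: a minimal sketch, assuming you've already turned the audio into fixed-size float vectors with whatever embedding model you like (the collection name, ids, and numbers here are made up).

```python
# Minimal sketch: Chroma doesn't care where the floats come from.
# The vectors below stand in for real audio embeddings.
import chromadb

client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path=...) to keep data
collection = client.create_collection(name="audio_clips")

# Any list of floats works; the dimensionality is fixed by the first add.
collection.add(
    ids=["clip-1", "clip-2"],
    embeddings=[[0.1, 0.3, 0.5], [0.2, 0.1, 0.9]],
    metadatas=[{"file": "a.wav"}, {"file": "b.wav"}],
)

# Query with another embedding of the same dimensionality.
results = collection.query(query_embeddings=[[0.1, 0.3, 0.4]], n_results=1)
print(results["metadatas"])
```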
afaik if you're running the embedding model on a GPU, or quantized on a CPU, it shouldn't be super slow. But I also haven't run much of this stuff locally yet.
I've been running it on Apple Metal; torch says it's using the NPU, but the Apple part is probably why it's such a mess.
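AFAIK torch's "mps" backend drives the GPU through Metal rather than the Neural Engine, so it's worth printing what device you actually landed on and timing the embedding step by itself. A minimal sketch, assuming sentence-transformers (the model name is just an example; swap in whatever you're using):

```python
# Sanity check: which device torch is actually using, and how long the
# embedding step itself takes, separate from any vector DB work.
import time

import torch
from sentence_transformers import SentenceTransformer

device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"using device: {device}")

# "all-MiniLM-L6-v2" is a placeholder; any sentence-transformers model works.
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

texts = ["hello world"] * 256
start = time.perf_counter()
embeddings = model.encode(texts, batch_size=64)
print(f"embedded {len(texts)} texts in {time.perf_counter() - start:.2f}s")
```

If the encode step dominates, that lines up with the vector DB not being the bottleneck.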
We were only scratching the surface when I was in college, but everyone imagined inference would be much cheaper/more efficient than it ended up being.
If bigger=smarter forever, edge inference will always be relatively slow/dumb.
Like with all things, that extrapolation of the upslope fails to consider that the fun isn't infinite (I hate this fact of life). So there's a time when bigger=smarter, then a time when the returns on bigger start diminishing, and at that equilibrium, suddenly smaller=smarter.
We'll get there.