
Ah, I hadn't heard of it, which shows how unfamiliar I am with these tools.

147 sats \ 4 replies \ @optimism 2h

It's probably not as fast as the one you linked, but the vector db isn't the bottleneck for me: most text embedding models run super slow on my machine, and I'm not entirely sure why yet.
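If you want to confirm where the time goes, timing the two stages separately makes it obvious. A minimal sketch (the model name is just a common example; assumes sentence-transformers and chromadb are installed):

```python
import time

import chromadb
from sentence_transformers import SentenceTransformer

# Any sentence-transformers checkpoint works the same way; this one is
# just a small, common example.
model = SentenceTransformer("all-MiniLM-L6-v2")
texts = ["some document text to embed"] * 256

t0 = time.perf_counter()
vecs = model.encode(texts)  # the embedding pass, usually the slow part
t1 = time.perf_counter()

collection = chromadb.Client().create_collection("bench")
collection.add(ids=[str(i) for i in range(len(texts))],
               embeddings=vecs.tolist())
t2 = time.perf_counter()

print(f"embed: {t1 - t0:.2f}s  insert: {t2 - t1:.2f}s")
```

On CPU the embed step typically dwarfs the insert, which is consistent with the vector db not being the bottleneck.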

This morning I used Chroma for audio embeddings, with a cheap, old tokenizer, just to see if that actually works (it does, because apparently the maffs don't give a shit whether a float comes from text, pictures, or audio).
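Roughly what that looks like; the embedding function here is a stand-in (random vectors), since Chroma only ever sees floats:

```python
import chromadb
import numpy as np

# Stand-in for a real audio embedding model. Chroma just stores float
# vectors, so it genuinely doesn't care what modality produced them.
def embed_audio(path: str) -> list[float]:
    rng = np.random.default_rng(abs(hash(path)) % 2**32)
    return rng.standard_normal(512).tolist()  # hypothetical 512-dim vector

client = chromadb.Client()
collection = client.create_collection("audio")

files = ["clip1.wav", "clip2.wav", "clip3.wav"]
collection.add(ids=files, embeddings=[embed_audio(f) for f in files])

# Nearest neighbors for a new clip, same as a text query.
hits = collection.query(query_embeddings=[embed_audio("query.wav")],
                        n_results=2)
print(hits["ids"])
```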

147 sats \ 3 replies \ @k00b OP 2h

afaik if you're running the embedding model on a GPU, or quantized on a CPU, it shouldn't be super slow. But I also haven't run much of this stuff locally yet.
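The quantized-on-CPU route usually means llama.cpp with a GGUF model. A sketch assuming llama-cpp-python; the model path is a placeholder for whatever GGUF embedding model you have locally:

```python
from llama_cpp import Llama

# Placeholder path; any local GGUF embedding model works here.
llm = Llama(model_path="./nomic-embed-text-v1.5.Q4_K_M.gguf",
            embedding=True, verbose=False)

out = llm.create_embedding("some document text")
vec = out["data"][0]["embedding"]
print(len(vec))  # embedding dimension
```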

147 sats \ 2 replies \ @optimism 2h

I've been running it on Apple Metal; torch says it's using the NPU, but the Apple part is probably why it's such a mess.
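For what it's worth, torch's `mps` backend runs on the GPU through Metal Performance Shaders, not the Neural Engine; PyTorch doesn't target the ANE at all. Quick way to check what you're actually on:

```python
import torch

print(torch.backends.mps.is_available())  # Metal (GPU) backend present?
print(torch.backends.mps.is_built())      # torch compiled with MPS support?

x = torch.randn(1024, 1024, device="mps")
y = x @ x  # executes on the GPU via Metal, not the NPU
```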

147 sats \ 1 reply \ @k00b OP 1h

We were only scratching the surface when I was in college, but everyone imagined inference would be much cheaper/more efficient than it ended up being.

If bigger=smarter forever, edge inference will always be relatively slow/dumb.

147 sats \ 0 replies \ @optimism 1h

Like with all things, that extrapolation of the upslope fails to consider that fun isn't infinite (I hate this fact of life). So there's a time when bigger=smarter, and a time when the returns diminish on how much smarter you get for your bigger; at that equilibrium, suddenly smarter=smarter.
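The published scaling-law fits say the same thing in math: loss falls as a power law in parameters and data, so each doubling buys less. One published example is the Chinchilla-style form (Hoffmann et al., 2022):

$$
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

where $N$ is parameter count, $D$ is training tokens, $E$ is irreducible loss, and the fitted exponents $\alpha, \beta$ are well under 1, which is exactly the diminishing-returns curve.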

We'll get there.
