
(in case you just want the github link without reading the hallucinations of my neural network - here you go)

I hate voice memos. Mostly I hate receiving them. You can't quickly glance over one to recall the content of the conversation; the key piece of information you need is hidden in a 3-minute voice note. But I recently learned that talking to machines is quite practical.

I even tweeted (it's forever gonna be tweeting) about talking to an LLM while pooping. Not that I hadn't used the feature a few times before, but I never got the hang of it, because I like to brain dump but then I want to read the reply (my brain works funny).

I even created a bot a while back that would transcribe Signal voice memos that I braindumped to it, but it didn't stick. Until I came across hyprwhspr about a week or two ago. I immediately liked the idea. I've done several whisper projects in the past, whisper-rs-cli just this week, so I dug into it. I wanted it as lean as possible, so I first created a stripped-down fork of it and fitted it to my Fedora/Ubuntu setup. But it was somewhat slowish in Python, and I had just come out of doing whisper-rs-cli, so the decision was obvious: I needed a Rust version.

Given that I had a working Python version that I liked, I thought it was gonna be a slightly more straightforward path. It wasn't. Especially when I went down the "let's add real-time typing into any app just for giggles" path. Since I've done some real(ish)-time transcription API experiments before, I thought that would also be just an extra half an hour.

Narrator: it wasn't.

In the end I realized that I don't really WANT that functionality; I just thought it would look cooler in the README. So saner thoughts prevailed, I pushed the code to a branch in case I want to procrastinate at some later time, and voilà, we're back at an actually working version of the thing that I wanted. Which is whisper-talk.

Pure simplicity:

  • press SUPER+ALT+d to start recording
  • braindump everything you want
  • press SUPER+ALT+d to stop recording and transcribe
  • almost instantly (if you're using a GPU) the text is in your clipboard, so you can paste it wherever you want
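
For the curious, the toggle flow above boils down to a tiny two-state machine. Here is a rough Rust sketch of that idea. It is not the actual whisper-talk code: the `Transcriber` and `Clipboard` traits, the `Dictation` struct, and the two fake implementations are all names I made up for illustration, and a real version would plug in a whisper backend, an audio capture loop, and a system clipboard behind them.

```rust
/// Hypothetical stand-in for a speech-to-text backend (an assumption, not a real API).
trait Transcriber {
    fn transcribe(&self, samples: &[f32]) -> String;
}

/// Hypothetical stand-in for the system clipboard (an assumption, not a real API).
trait Clipboard {
    fn set_text(&mut self, text: String);
}

/// The hotkey flips between these two states.
enum State {
    Idle,
    Recording(Vec<f32>),
}

struct Dictation<T: Transcriber, C: Clipboard> {
    state: State,
    transcriber: T,
    clipboard: C,
}

impl<T: Transcriber, C: Clipboard> Dictation<T, C> {
    fn new(transcriber: T, clipboard: C) -> Self {
        Self { state: State::Idle, transcriber, clipboard }
    }

    /// Called by the audio capture loop; chunks are dropped while idle.
    fn push_audio(&mut self, chunk: &[f32]) {
        if let State::Recording(buf) = &mut self.state {
            buf.extend_from_slice(chunk);
        }
    }

    /// Called on every hotkey press. Returns true if recording is now active.
    fn toggle(&mut self) -> bool {
        // Take ownership of the current state, leaving Idle behind.
        match std::mem::replace(&mut self.state, State::Idle) {
            State::Idle => {
                self.state = State::Recording(Vec::new());
                true
            }
            State::Recording(samples) => {
                // Second press: transcribe what was captured and copy it.
                let text = self.transcriber.transcribe(&samples);
                self.clipboard.set_text(text);
                false
            }
        }
    }
}

/// Fake backend for trying the flow without a model: reports the sample count.
struct CountingTranscriber;

impl Transcriber for CountingTranscriber {
    fn transcribe(&self, samples: &[f32]) -> String {
        format!("{} samples", samples.len())
    }
}

/// Fake clipboard that just remembers the last text it was given.
#[derive(Default)]
struct MemoryClipboard {
    text: Option<String>,
}

impl Clipboard for MemoryClipboard {
    fn set_text(&mut self, text: String) {
        self.text = Some(text);
    }
}
```

Keeping the transcriber and clipboard behind traits means the hotkey logic can be exercised with the fakes, without loading a model or touching the desktop session.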

I've been using the Python version of it for a couple of days and it has already become an almost natural part of my workflow. I mostly use it for writing prompts, and given how many of them I write, my fingers are very happy about it.

originally posted on disobey.dev

21 sats \ 0 replies \ @Scoresby 2h

Why do you prefer talking to typing? Is it because of multitasking?

Do you find yourself reading over these transcriptions very much? I'd think I'd braindump and then rarely go back around to read the dump.

Finally, do you find that it is changing how you speak? Especially when speaking with the intention of talking to whisper-talk?
