I've just spent ten minutes looking for software that does voice anonymization and have not found anything that is obviously good or convenient. My desired use case is for recording a podcast.
Option 1: the acceptable but non-ideal solution would be to isolate the audio channels, feed the recording from an isolated channel into the software, and get a transformed voice out, which I could then mix. This would preclude real-time privacy, but would allow me to release a podcast without doxxing the guests.
Option 2: an ideal use case would be something that worked in real time -- you apply something akin to a filter, so the voice is transformed as the person talks, and it appears in the podcast transformed, so that's what the mixing board 'hears.'
It seems like at least option 1 should exist in a reasonably convenient form. All I've found so far are toy apps (you can transform your voice so it sounds like you've just inhaled a bunch of helium, for instance, or into a chipmunk voice) or super complicated research things that I'd have to figure out how to compile and run. In a pinch I can do this, but surely the world has advanced to a place where a better solution exists?
Also: it's important that whatever the anonymized voice is, it doesn't sound like you're a psychopath hostage-taker from the 1970s. I want a voice that sounds like a real voice, just not the actual person. Anyone have any pointers?
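For what it's worth, here is a rough sketch of the Option 1 plumbing in Python, using only the standard library. It assumes a 16-bit stereo WAV with the host on one channel and the guest on the other. `transform` is a placeholder for whatever voice-conversion tool ends up doing the real work, and the function names are made up for illustration. It only shows the isolate, transform, and remix steps, not the anonymization itself.

```python
import array
import sys
import wave


def split_stereo(frames):
    """Split interleaved 16-bit stereo PCM bytes into (left, right) sample arrays."""
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return samples[0::2], samples[1::2]


def merge_stereo(left, right):
    """Re-interleave two equal-length channels into 16-bit stereo PCM bytes."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    out = array.array("h")
    for left_sample, right_sample in zip(left, right):
        out.append(left_sample)
        out.append(right_sample)
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()


def process_guest_channel(frames, transform, guest="right"):
    """Run `transform` on the guest's channel only; the other channel is untouched.

    `transform` is a placeholder (assumption): it takes a sequence of samples
    and must return the same number of samples.
    """
    left, right = split_stereo(frames)
    if guest == "right":
        right = transform(right)
    else:
        left = transform(left)
    return merge_stereo(left, right)


def anonymize_wav(src_path, dst_path, transform, guest="right"):
    """Read a 16-bit stereo WAV, transform the guest channel, write the result."""
    with wave.open(src_path, "rb") as reader:
        if reader.getnchannels() != 2 or reader.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        params = reader.getparams()
        frames = reader.readframes(reader.getnframes())
    with wave.open(dst_path, "wb") as writer:
        writer.setparams(params)
        writer.writeframes(process_guest_channel(frames, transform, guest))
```

Usage would be something like `anonymize_wav("episode.wav", "episode_anon.wav", my_voice_model)`, where both file names and `my_voice_model` are hypothetical. A real voice-conversion tool may want its own file format or sample rate, so treat this as the glue around it.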
You can use an ML solution like voice-changer audio-webui, or go with simpler effects with lyrebird.
I've seen lyrebird for this use case in the past, but I've never used it. I'd like to hear from someone who has.
219 sats \ 2 replies \ @k00b 30 Mar
Not being able to find this pissed me off.
It looks like there are a few companies like ElevenLabs that offer the best off-the-shelf option 2, and they recommend the non-convenient RVC for option 1.
Thanks for looking into it! We can be pissed off together.
It's better to be pissed off than pissed on. Trust me, I know.
@thebitcoinbugle's podcast hosts seem to have voice mods in their first episode (unless that's just how world class journalists sound from smoking their RDA of cigarettes).
That's a rude way to say we sound funny.
What's a nice way to say you sound funny?
I'm not British
Neither am I
100 sats \ 2 replies \ @OT 30 Mar
In the music world there are vocoders that place your singing back into key. This could make a pretty colorful podcast. Imagine a bot having a conversation with another bot. The listener might not last very long.
I bet there would be some plebs happy to read out a script of the real pod for a few sats. Might lack some of the "real" feel of the conversation though.
That's a fun idea, but as you guessed, that feel is important.
27 sats \ 0 replies \ @OT 30 Mar
Pleb actors?
Might be fun to try. Possibly overly dramatic
100 sats \ 4 replies \ @k00b 30 Mar
Did you see VoiceMod?
deleted by author
I'm not sure if they have normal voices though. Two year old video of many of the voices:
Yeah, I saw that, and it would be awesome, except all the voices are stupid. Can you imagine listening to those things for an hour? It's like wacky morning zoo DJs. Ugh.
They are definitely targeting gamers with the presets, but I'd guess their software can produce pretty good real-sounding voices too.
Just keep a supply of helium balloons on hand.
I feel like Batman solved this problem decades ago.
You're right, but a pre-req was that I didn't want the psychopath voice.
Although it does paint a suggestive picture of an all-Batman podcast future.
deleted by author
Wow thanks for sharing, I’m super bullish on this technology. Many great use cases!
redacted? Seems pricey and odd that they want to train data on your machine but their voices do sound more real.
Seems other people have identified it as a scam.
47 sats \ 1 reply \ @k00b 30 Mar
Okay, I'm surprised it's this hard to find something that isn't targeted at gamers and skewed toward the funny side, but it makes sense, as the artificial voice consumer market probably isn't booming outside of gamers.
Right?
It's one of those weird things. You're right that market demand is probably not super high, but even still. Just a giant hole for something where the tech is just sitting there.
Would a workaround for Option 1 be to use transcription software on the original interview, then run the transcript through a good text-to-speech program? I'm not sure if there are ones that will hit all the nuanced beats, but at least the voice itself would sound mostly human.
I think that would work to communicate stuff, and w/ the progress of the deep learning models the synthesis would probably be good; but it would be hard to synchronize with multiple people, I would think. I really want it to capture the warmth and interaction of real conversation, but now I'm kind of curious how your idea would feel. Will update if I try it.
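To make the synchronization worry concrete, here is a small Python sketch of one way the transcript-then-TTS idea could be laid out on a timeline. Everything here is an assumption for illustration: the `Segment` and `Clip` types, the `schedule_clips` name, and the `tts_duration` callback, which stands in for asking a text-to-speech engine how long a line takes to speak. Each synthesized line goes at its original start time and gets pushed later if the previous clip is still playing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    speaker: str
    start: float  # seconds into the original recording
    text: str


@dataclass
class Clip:
    speaker: str
    start: float     # seconds into the re-voiced episode
    duration: float  # length of the synthesized audio, in seconds
    text: str


def schedule_clips(
    segments: List[Segment],
    tts_duration: Callable[[str, str], float],
    gap: float = 0.25,
) -> List[Clip]:
    """Lay synthesized lines on a timeline without letting them overlap.

    `tts_duration(speaker, text)` is a hypothetical callback returning how
    long the synthesized line lasts; `gap` is the pause left between clips.
    """
    clips: List[Clip] = []
    cursor = 0.0  # earliest time the next clip is allowed to start
    for seg in sorted(segments, key=lambda s: s.start):
        duration = tts_duration(seg.speaker, seg.text)
        start = max(seg.start, cursor)  # keep original timing when possible
        clips.append(Clip(seg.speaker, start, duration, seg.text))
        cursor = start + duration + gap
    return clips
```

Note that this forces strict turn-taking, so crosstalk, interruptions, and laughter over each other all get flattened out, which is exactly the warmth problem.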
Yeah, the warmth/flow is big. I've seen some shows use voice actors (especially with translations), but that obviously is costly and also not fast.
This seems like one of those things where one of the million generative AI projects should be able to step in, but it doesn't seem like that's an option.