I've just spent ten minutes looking for software that does voice anonymization and have not found anything that is obviously good or convenient. My desired use case is for recording a podcast.
Option 1: the acceptable but non-ideal solution would be to isolate the audio channels, feed the recording from an isolated channel into the software, and get a transformed voice out, which I could then mix. This would preclude real-time privacy, but would allow me to release a podcast without doxxing the guests.
Option 2: an ideal use case would be something that worked in real time -- you apply something akin to a filter, so the voice is transformed as the person talks, and it appears in the podcast transformed, so that's what the mixing board 'hears.'
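For what it's worth, the plumbing around Option 1 (split the channels, transform one, mix it back) is the easy part and can be sketched with the Python standard library alone. This is only a rough sketch under some assumptions: the recording is a 16-bit PCM WAV with one speaker per channel, and `anonymize` is a hypothetical placeholder (it returns the audio unchanged) standing in for whatever voice-conversion tool ends up doing the real work. The function names are made up for illustration.

```python
import array
import sys
import wave


def split_channels(path):
    """Read a 16-bit PCM WAV and return (params, list of per-channel sample arrays)."""
    with wave.open(path, "rb") as wav:
        params = wav.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h")
        samples.frombytes(wav.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # Samples are interleaved (ch0, ch1, ch0, ch1, ...), so stride by channel count.
    channels = [samples[i::params.nchannels] for i in range(params.nchannels)]
    return params, channels


def anonymize(channel, sample_rate):
    """HYPOTHETICAL placeholder: a real voice-conversion step would go here.

    Assumed to return the same number of samples it was given.
    """
    return channel


def remix(params, channels, out_path):
    """Interleave the per-channel arrays again and write them to a WAV file."""
    count = len(channels)
    interleaved = array.array("h", [0]) * (len(channels[0]) * count)
    for i, channel in enumerate(channels):
        interleaved[i::count] = array.array("h", channel)
    if sys.byteorder == "big":
        interleaved.byteswap()
    with wave.open(out_path, "wb") as wav:
        wav.setparams(params)
        wav.writeframes(interleaved.tobytes())


def process(in_path, out_path, guest_channel=1):
    """Transform only the guest's channel and leave the other channels untouched."""
    params, channels = split_channels(in_path)
    channels[guest_channel] = anonymize(channels[guest_channel], params.framerate)
    remix(params, channels, out_path)
```

Calling `process("episode.wav", "episode_anon.wav", guest_channel=1)` (file names are just examples) would rewrite only the guest's channel. The hard part, a transform that sounds like a real person, is exactly the piece left as a stub here.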
It seems like at least option 1 should exist in a reasonably convenient form. All I've found so far are toy apps (you can transform your voice so it sounds like you've just inhaled a bunch of helium, for instance, or into a chipmunk voice) or super complicated research things that I'd have to figure out how to compile and run. In a pinch I can do this, but surely the world has advanced to a place where a better solution exists?
Also: it's important that whatever the anonymized voice is, it doesn't sound like you're a psychopath hostage-taker from the 1970s. I want a voice that sounds like a real voice, just not the actual person. Anyone have any pointers?
Not being able to find this pissed me off.
It looks like there are a few companies like ElevenLabs that offer the best off-the-shelf option 2, and they recommend the non-convenient RVC for option 1.
Thanks for looking into it! We can be pissed off together.
It's better to be pissed off than pissed on. Trust me, I know.
You can use an ML solution like voice-changer or audio-webui, or go with simpler effects with lyrebird.
I've seen lyrebird for this use case in the past, but I've never used it. I'd like to hear from someone who has.
Did you see VoiceMod?
I'm not sure if they have normal voices though. Two year old video of many of the voices:
Yeah, I saw that, and it would be awesome, except all the voices are stupid. Can you imagine listening to those things for an hour? It's like wacky morning zoo DJs. Ugh.
They are definitely targeting gamers with the presets, but I'd guess their software can produce pretty good real-sounding voices too.
In the music world there are vocoders that place your singing back into key. This could make a pretty colorful podcast. Imagine a bot having a conversation with another bot. The listener might not last very long.
I bet there would be some plebs happy to read out a script of the real pod for a few sats. Might lack some of the "real" feel of the conversation though.
That's a fun idea, but as you guessed, that feel is important.
Pleb actors?
Might be fun to try. Possibly overly dramatic
@thebitcoinbugle's podcast hosts seem to have voice mods in their first episode (unless that's just how world class journalists sound from smoking their RDA of cigarettes).
That's a rude way to say we sound funny.
What's a nice way to say you sound funny?
I'm not British
Neither am I
Would a workaround for Option 1 be to use transcription software on the original interview, then run the transcript through a good text-to-speech program? I'm not sure if there are ones that will hit all the nuanced beats, but at least the voice itself would sound mostly human.
I think that would work to communicate stuff, and w/ the progress of the deep learning models the synthesis would probably be good; but it would be hard to synchronize with multiple people, I would think. I really want it to capture the warmth and interaction of real conversation, but now I'm kind of curious how your idea would feel. Will update if I try it.
Yeah, the warmth/flow is big. I've seen some shows use voice actors (especially with translations), but that obviously is costly and also not fast.
This seems like one of those things where one of the million generative AI projects should be able to step in, but it doesn't seem like that's an option.
redacted? Seems pricey, and it's odd that they want to train data on your machine, but their voices do sound more real. Seems other people have identified it as a scam.
I feel like Batman solved this problem decades ago.
You're right, but a pre-req was that I didn't want the psychopath voice.
Although it does paint a suggestive picture of an all-Batman podcast future.
Just keep a supply of helium balloons on hand.
Wow thanks for sharing, I’m super bullish on this technology. Many great use cases!