200 sats \ 4 replies \ @south_korea_ln OP 4 Jul \ on: Hibiki Samples - real-time speech-to-speech AI
I just asked a friend. He recommended this one.
# clone the repo
git clone https://github.com/kyutai-labs/hibiki.git
# use the Rust version
cd hibiki/hibiki-rs
# do questionable things to fetch the video, for science - don't try this at home
yt-dlp -t mp4 "https://www.youtube.com/watch?v=6ZWf4Jfd1sM" -o 6ZWf4Jfd1sM.mp4
# extract the audio track, re-encoded as MP3 (-vn drops the video stream)
ffmpeg -i 6ZWf4Jfd1sM.mp4 -vn -c:a libmp3lame 6ZWf4Jfd1sM.mp3
# do the magic translation
# note: I ran this on a Mac (metal); use --features cuda instead to run on an NVIDIA GPU
cargo run --features metal -r -- gen 6ZWf4Jfd1sM.mp3 out_en.wav
# remux: copy the original video stream and add the English audio, encoded as AAC
ffmpeg -i 6ZWf4Jfd1sM.mp4 -i out_en.wav -map 0:v -map 1:a -c:v copy -c:a aac output.mp4
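If you want to rerun the download, extract, translate and remux steps on other videos, they can be wrapped in a small shell function. This is only a sketch: `run`, `translate_video` and the `DRY_RUN` switch are hypothetical helpers, not part of hibiki. It assumes you call it from inside `hibiki/hibiki-rs` (so `cargo run` builds the Rust crate) with `yt-dlp`, `ffmpeg` and `cargo` installed. The intermediate and final files are named after the video ID (`<id>_en.wav`, `<id>_en.mp4` instead of `out_en.wav`, `output.mp4`) so several runs don't overwrite each other.

```shell
#!/bin/sh
# Sketch: wraps the download / extract / translate / remux steps.
# run, translate_video and DRY_RUN are hypothetical helpers, not part of hibiki.
# Assumes the current directory is hibiki/hibiki-rs and that yt-dlp, ffmpeg
# and cargo are on PATH.
set -eu

# Print each command; only execute it when DRY_RUN is not set to 1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    "$@"
  fi
}

# translate_video VIDEO_ID [FEATURES]
# FEATURES is passed to cargo: metal (Mac, the default here) or cuda (NVIDIA GPU).
# Variables are plain globals to stay POSIX sh compatible.
translate_video() {
  id="${1:?usage: translate_video VIDEO_ID [metal|cuda]}"
  features="${2:-metal}"
  # fetch the video
  run yt-dlp -t mp4 "https://www.youtube.com/watch?v=${id}" -o "${id}.mp4"
  # extract the audio track as MP3, dropping the video stream
  run ffmpeg -i "${id}.mp4" -vn -c:a libmp3lame "${id}.mp3"
  # translate the speech with hibiki
  run cargo run --features "$features" -r -- gen "${id}.mp3" "${id}_en.wav"
  # keep the original video, swap in the English audio as AAC
  run ffmpeg -i "${id}.mp4" -i "${id}_en.wav" -map 0:v -map 1:a -c:v copy -c:a aac "${id}_en.mp4"
}
```

Usage: `DRY_RUN=1 translate_video 6ZWf4Jfd1sM` only prints the four commands so you can check them first; `translate_video 6ZWf4Jfd1sM cuda` actually runs them on an NVIDIA machine.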
Thanks for the tutorial.
It didn't come out too well. Humor can be very subtle; human translators still have some edge.
Pretty cool though.