reply on: Hibiki Samples - real-time speech-to-speech \ stacker news ~AI

200 sats \ 4 replies \ @south_korea_ln OP 4 Jul \ parent \ on: Hibiki Samples - real-time speech-to-speech AI

I just asked a friend. He recommended this one.

# clone the repo
git clone https://github.com/kyutai-labs/hibiki.git

# use the rust version
cd hibiki/hibiki-rs

# do questionable things to fetch the video, for science - don't try this at home
yt-dlp -t mp4 "https://www.youtube.com/watch?v=6ZWf4Jfd1sM" -o 6ZWf4Jfd1sM.mp4

# extract the audio (re-encoded as mp3, in an mp3 container); -vn drops the video stream
ffmpeg -i 6ZWf4Jfd1sM.mp4 -vn -c:a libmp3lame 6ZWf4Jfd1sM.mp3

# do the magic translation
# note: i used this on a mac, use --features cuda to run on an nvidia gpu instead
cargo run --features metal -r -- gen 6ZWf4Jfd1sM.mp3 out_en.wav

# remux the english audio in (as aac encoded)
ffmpeg -i 6ZWf4Jfd1sM.mp4 -i out_en.wav -c:v copy -c:a aac -map 0:v -map 1:a output.mp4
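
The note about `--features metal` versus `--features cuda` could be scripted roughly as follows. This is only a sketch: `pick_feature` and `missing_tools` are hypothetical helpers, not part of hibiki, and the non-mac branch assumes an NVIDIA GPU with CUDA installed.

```shell
#!/bin/sh
# Hypothetical helper: map the OS name (as printed by `uname -s`) to a cargo feature.
pick_feature() {
    case "$1" in
        Darwin) echo metal ;;  # mac: Metal backend, as in the steps above
        *)      echo cuda ;;   # assumption: an NVIDIA GPU with CUDA is available
    esac
}

# Hypothetical helper: print each named tool that is not on PATH
# (prints nothing when all of them are found).
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Show the translation command that would be run on this machine.
echo "cargo run --features $(pick_feature "$(uname -s)") -r -- gen 6ZWf4Jfd1sM.mp3 out_en.wav"
```

Running `missing_tools git cargo yt-dlp ffmpeg` before starting lists anything that still needs to be installed.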

97 sats \ 1 reply \ @south_korea_ln OP 4 Jul

Thanks for the tutorial.

Didn't come out too well... Humor can be very subtle, and human translators still have an edge.

Pretty cool though.

0 sats \ 0 replies \ @optimism 4 Jul

We can only learn what to improve through finding what doesn't work :-)

0 sats \ 0 replies \ @optimism 4 Jul

Note: the last line should read

ffmpeg -i 6ZWf4Jfd1sM.mp4 -i out_en.wav -c:v copy -c:a aac -map 0:v -map 1:a output.mp4

not sure how i messed that up, but i did.
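
One way to catch a slip like this is to check the remuxed file afterwards. A minimal sketch, assuming `ffprobe` (shipped with ffmpeg) is installed; `check_output` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: confirm that a file exists and its first audio stream is AAC.
check_output() {
    file="$1"
    [ -f "$file" ] || { echo "missing: $file" >&2; return 1; }
    # Print only the codec name of the first audio stream, e.g. "aac".
    codec=$(ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$file") || return 1
    [ "$codec" = "aac" ] || { echo "unexpected audio codec: $codec" >&2; return 1; }
    echo "ok: $file has aac audio"
}
```

After the remux step, `check_output output.mp4` should report that the file has aac audio. A missing file or a different codec makes it return a non-zero status.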
