TL;DR: single portrait photo + speech audio = hyper-realistic talking face video with precise lip-audio sync, lifelike facial behavior, and naturalistic head movements, generated in real time.

> Microsoft Research Asia

![g345.jpg](https://m.stacker.news/27243)

### [Read more](https://www.microsoft.com/en-us/research/project/vasa-1/)