A model that supports multi-shot video generation from both text and image inputs. It achieves breakthroughs in semantic understanding and prompt following, and can create 1080p videos with smooth motion, rich detail, and cinematic aesthetics.