pull down to refresh

A model that supports multi-shot video generation from both text and image. It achieves breakthroughs in semantic understanding and prompt following, and can create 1080p videos with smooth motion, rich details, and cinematic aesthetics.