Video generation has become a highly popular task. However, common video generation models rely solely on large-scale data training to simulate motion implicitly, which can result in inaccurate motion. To address this problem, we propose using explicit 3D guidance to steer the video generation process. Given a single image and the user's motion demands, we model the object and its motion in 3D, allowing rendering from various viewpoints. We then inject the motion prior from these rendered views into the video generation model to produce high-quality videos with the desired motion.
Zoom link: https://cam-ac-uk.zoom.us/j/89792055791?pwd=OTYyQXdnUXZMcnU4VmZWL2p4dUhZdz09