For centuries, unraveling the mysteries of nature through the lens of physics has captivated countless scientists. Today, generative AI models excel at reproducing visual worlds in pixels, but still struggle with basic physical concepts such as 3D shape, motion, material, and lighting: key elements that connect computer vision to a wide range of real-world engineering applications, including interactive VR, robotics, biology, and medical analysis. The main challenge arises from the difficulty of collecting large-scale physical measurements for training machine learning models.
In this talk, I will discuss an alternative unsupervised approach based on inverse rendering, which enables machine learning models to learn explicit physical representations from raw, unstructured image data, such as Internet photos and videos. This approach circumvents the need for any direct supervision, allowing us to model a wide variety of 3D objects in nature, including diverse wildlife, using only casually recorded imagery. The resulting model can instantly generate physically grounded 3D assets with controllable animations, ready for downstream rendering and analysis. The papers presented can be found at: https://elliottwu.com/.
Link to join virtually: https://cam-ac-uk.zoom.us/j/87421957265
This talk is being recorded. If you do not wish to be seen in the recording, please avoid sitting in the front three rows of seats in the lecture theatre. Any questions asked will also be included in the recording. The recording will be made available on the Department’s webpage.