Professor David W. Jacobs, Chair
Professor Behtash Babadi
Professor Thomas Goldstein
Professor Abhinav Shrivastava
Dr. Carlos Castillo
Professor Rama Chellappa, Dean's Representative
Inverse Rendering deals with recovering the underlying intrinsic components of an image, i.e., geometry, reflectance, illumination, and the camera with which the image was captured. Inferring these intrinsic components is a fundamental problem in Computer Vision, and solving it unlocks a host of real-world applications in Augmented and Virtual Reality, Robotics, Computational Photography, and gaming. Researchers have made significant progress on Inverse Rendering from a large number of images of an object or a scene under relatively constrained settings. However, most real-life applications rely on a single image, or a small number of images, captured in an unconstrained environment.
In this thesis, we consider two different approaches to solving Inverse Rendering under limited observations from unconstrained images. First, we consider learning data-driven priors that can be used for Inverse Rendering from a single image. Second, we consider enforcing low-rank multi-view constraints in an optimization framework to enable Inverse Rendering from a few images. In this talk, we focus on the first approach, i.e., learning data-driven priors for Inverse Rendering from a single image. We specifically describe our recent works on Inverse Rendering of faces and scenes from a single image.
Our goal is to jointly learn all intrinsic components of an image, such that we can recombine them and train on unlabeled real data using a self-supervised reconstruction loss. A key component that enables self-supervision is a differentiable rendering module that can combine the intrinsic components to accurately regenerate the image. We show how such a self-supervised reconstruction loss can be used for Inverse Rendering of faces. While this is relatively straightforward for faces, complex appearance effects (e.g., inter-reflections, cast shadows, and near-field lighting) present in a scene cannot be captured with a purely physics-based differentiable rendering module. Thus we also propose a deep CNN-based differentiable rendering module, the Residual Appearance Renderer, that can capture these complex appearance effects and enable self-supervised learning. Another contribution is a novel Inverse Rendering architecture, SfSNet, that performs Inverse Rendering for faces and scenes. We also introduce a large-scale labeled synthetic dataset of scenes and faces created with physically based rendering. Experimental results show that our approach outperforms state-of-the-art methods for faces and scenes, especially on real images.
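The recombination idea above can be illustrated with a minimal sketch: a Lambertian image-formation model with first-order spherical-harmonics lighting, plus an additive residual term standing in for the learned Residual Appearance Renderer. The function names, the 4-coefficient lighting parameterization, and the L1 photometric loss here are illustrative assumptions, not the exact formulation used in the thesis.

```python
import numpy as np

def sh_shading(normals, light):
    """First-order spherical-harmonics shading.

    normals: (H, W, 3) unit surface normals.
    light:   4-vector [ambient, x, y, z] (hypothetical parameterization).
    """
    H, W, _ = normals.shape
    # Constant basis function plus the three normal components.
    basis = np.concatenate([np.ones((H, W, 1)), normals], axis=-1)  # (H, W, 4)
    return basis @ light  # (H, W)

def reconstruct(albedo, normals, light, residual):
    """Recombine intrinsic components: Lambertian term plus a residual
    capturing non-Lambertian effects (inter-reflections, cast shadows, ...)."""
    shading = sh_shading(normals, light)[..., None]  # (H, W, 1)
    return albedo * shading + residual

def reconstruction_loss(image, albedo, normals, light, residual):
    """Self-supervised L1 photometric loss against the observed image."""
    recon = reconstruct(albedo, normals, light, residual)
    return np.abs(image - recon).mean()
```

In training, the intrinsic components would be predicted by a network and the loss backpropagated through this differentiable recombination, which is what lets the model learn from unlabeled real images.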