Event
Ph.D. Dissertation Defense: Nitesh Shroff
Monday, July 30, 2012
10:30 a.m.
Room 3450, AVW Bldg.
Maria Hoo
301 405 3681
mch@umd.edu
ANNOUNCEMENT: Ph.D. Dissertation Defense
Committee:
Professor Rama Chellappa, Chair/Advisor
Professor Larry Davis
Professor Min Wu
Professor Pavan Turaga
Professor David Jacobs, Dean's Representative
Title: Efficient Sensing, Summarization and Classification of Videos
Abstract:
In this dissertation, we address the role of motion in various aspects of video consumption: (a) sensing, (b) summarization, and (c) classification. We start by developing efficient sensing techniques, particularly for cases where computational vision is used for measurement, i.e., inferring the depth map and the pose of scene elements. To this end, we propose an architecture and algorithm that sense the video by varying the focal settings between consecutive frames. By extending the depth-from-defocus (DFD) paradigm to dynamic scenes, we reconstruct both a depth video and an all-focus video from the captured video. We then devise a technique that, under constrained scenarios, goes a step further and estimates the precise location and orientation of objects in the scene: by capturing a sequence of images while moving the illumination source between consecutive frames, we can extract specular features on high-curvature metallic objects.
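The focal-stack idea above can be illustrated with a simplified, shape-from-focus-style sketch (a stand-in for the dissertation's DFD formulation, not the actual method): each pixel is assigned the focal slice where a discrete Laplacian sharpness measure peaks, yielding a coarse depth index map and an all-focus composite. The function name and the choice of sharpness measure are illustrative assumptions.

```python
import numpy as np

def focal_stack_depth(stack):
    """Per-pixel depth index and all-focus composite from a focal stack.

    `stack` has shape (n_focal, H, W). A discrete Laplacian serves as the
    sharpness measure; each pixel takes the focal slice where it is sharpest.
    """
    # |4*I - up - down - left - right|: high where the image is in focus
    lap = np.abs(
        4 * stack
        - np.roll(stack, 1, axis=1) - np.roll(stack, -1, axis=1)
        - np.roll(stack, 1, axis=2) - np.roll(stack, -1, axis=2)
    )
    depth = lap.argmax(axis=0)  # (H, W) index of the sharpest focal slice
    h, w = depth.shape
    # Gather each pixel from its sharpest slice to form the all-focus image
    all_focus = stack[depth, np.arange(h)[:, None], np.arange(w)[None, :]]
    return depth, all_focus
```

The dissertation's contribution extends this static-scene reasoning to dynamic scenes, where the focal setting changes between consecutive video frames rather than across a stack of a fixed scene.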
Next, we address the problem of concisely representing (summarizing) a large video, where the goal is to gain a quick overview of the video while minimizing the loss of detail. We argue that this can be achieved by optimizing two conflicting criteria: (a) coverage and (b) diversity. The objective is formulated as a subset-selection problem, first in Euclidean space and then generalized to non-Euclidean manifolds. The generic manifold formulation allows the algorithm to handle a variety of computer-vision datasets, such as shapes, textures, and linear dynamical systems.
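The coverage/diversity trade-off can be sketched in the Euclidean case as a greedy subset selection (a minimal illustration, not the dissertation's algorithm; the greedy strategy, scoring terms, and weight `lam` are assumptions):

```python
import numpy as np

def summarize(frames, k, lam=0.5):
    """Greedily pick k exemplar frames balancing coverage (every frame is
    close to some exemplar) against diversity (exemplars are spread out).

    `frames` is an (n, d) array of frame features; returns selected indices.
    """
    n = len(frames)
    # Pairwise Euclidean distances, shape (n, n)
    dist = np.linalg.norm(frames[:, None] - frames[None, :], axis=2)
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            trial = selected + [i]
            # Coverage: negative mean distance from each frame to its
            # nearest exemplar (higher = better covered)
            coverage = -dist[:, trial].min(axis=1).mean()
            # Diversity: total pairwise distance among chosen exemplars
            diversity = dist[np.ix_(trial, trial)].sum() / 2
            score = coverage + lam * diversity
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

The manifold generalization in the dissertation replaces these Euclidean distances with geodesic distances, which is what lets the same objective summarize shapes, textures, and linear dynamical systems.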
Finally, we turn our attention to the classification of videos. We begin by devising exact and approximate nearest-neighbor techniques for fast retrieval of videos lying on non-Euclidean manifolds. We present a geodesic hashing technique that employs geodesic-based functions to hash the data, enabling fast, approximate nearest-neighbor retrieval. We then present a classification technique that generates content-based, particularly scene-based, annotations of videos. By characterizing the motion of scene elements, we show that motion not only provides a fine-grained description of videos but also improves classification accuracy.
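The retrieval idea can be illustrated with the Euclidean analogue of such hashing, random-hyperplane LSH (an analogue only; the dissertation's geodesic hashing replaces the hyperplane tests with geodesic-based functions on the manifold, and all names here are illustrative):

```python
import numpy as np

def lsh_index(data, n_bits=8, seed=0):
    """Random-hyperplane LSH: each hash bit is the sign of a projection."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, data.shape[1]))
    codes = data @ planes.T > 0  # (n, n_bits) boolean hash codes
    return planes, codes

def lsh_query(q, planes, codes, data):
    """Return the index of the nearest candidate sharing q's hash code."""
    qcode = planes @ q > 0
    mask = (codes == qcode).all(axis=1)
    cand = np.where(mask)[0]
    if cand.size == 0:           # empty bucket: fall back to a linear scan
        cand = np.arange(len(data))
    d = np.linalg.norm(data[cand] - q, axis=1)
    return cand[np.argmin(d)]
```

Because only the points in the matching hash bucket are compared exactly, the search is approximate but much faster than a full linear scan; the geodesic version achieves the same effect for data on non-Euclidean manifolds.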