PhD Dissertation Defense: Xianzhi Du

Wednesday, December 13, 2017
11:00 a.m.
3450 AVW
Melanie Prange
301 405 3686
mprange@umd.edu

Committee members:
Dr. Larry Davis, Chair/Advisor
Dr. Rama Chellappa, Dean's representative
Dr. Min Wu
Dr. David Jacobs
Dr. Ramani Duraiswami

Title: Computer vision and deep learning with applications to object detection, segmentation, and document analysis

Abstract:
I’ve been working on computer vision and deep learning with applications to object detection, segmentation, and document analysis.

Three works address deep learning for object detection and segmentation.

In the first work, we propose a deep neural network fusion architecture for fast and robust pedestrian detection. The proposed architecture allows multiple networks to be processed in parallel for speed. A single-shot deep convolutional network is trained as an object detector to generate all possible pedestrian candidates of different sizes and occlusions. This network outputs a large variety of pedestrian candidates to cover the majority of ground-truth pedestrians, while also introducing a large number of false positives. Next, multiple deep neural networks are used in parallel to further refine these pedestrian candidates. We introduce a soft-rejection-based network fusion method that fuses the soft metrics from all networks to generate the final confidence scores. Our method performs better than existing state-of-the-art methods, especially when detecting small and occluded pedestrians. Furthermore, we propose a method for integrating a pixel-wise semantic segmentation network into the network fusion architecture as a reinforcement to the pedestrian detector. The approach outperforms state-of-the-art methods on most protocols of the Caltech Pedestrian dataset, with significant boosts on several protocols. It is also faster than all other methods.

The second work extends the first: a fusion network is trained to fuse the multiple classification networks. Furthermore, a novel soft-label method is devised to assign floating-point labels to the pedestrian candidates. The label for each candidate detection is derived from the percentage of overlap of its bounding box with those of other ground-truth classes. This work is evaluated on two additional popular pedestrian detection datasets and achieves the best performance.

In the third work, we propose a boundary-sensitive deep neural network architecture for portrait segmentation. A framework based on a residual network and atrous convolution is trained as the base portrait segmentation network. To improve segmentation along boundaries, three techniques are introduced. First, an individual boundary-sensitive kernel is introduced by labeling the boundary pixels as a separate class and using the soft-label strategy to assign floating-point label vectors to pixels in the boundary class. Each pixel contributes to multiple classes when the loss is updated, based on its position relative to the contour. Second, a global boundary-sensitive kernel is applied when the loss function is updated, assigning different weights to pixel locations in an image to constrain the global shape of the resulting segmentation map. Third, we add multiple binary classifiers to classify boundary-sensitive portrait attributes, so as to refine the learning process of our model.
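As a rough illustration of the soft-rejection fusion idea from the first detection work, the sketch below scales a candidate's detector score by one factor per refinement network instead of discarding the candidate outright. The abstract does not give the fusion formula, so the function name `soft_rejection_fuse`, the multiplicative form, and the `accept_at` and `floor` values are all assumptions made for illustration only.

```python
def soft_rejection_fuse(detector_score, classifier_probs, accept_at=0.7, floor=0.1):
    """Fuse one candidate's detector score with several classifier probabilities.

    Hypothetical sketch: `accept_at` and `floor` are illustrative values,
    not parameters taken from the dissertation.
    """
    fused = detector_score
    for p in classifier_probs:
        # A classifier at or above `accept_at` keeps or boosts the score;
        # a less confident one shrinks it, but by no more than the factor
        # `floor`, so no single network can veto a candidate outright.
        fused *= max(p / accept_at, floor)
    return fused


# Example: rank three candidates by their fused confidence.
candidates = {
    "a": soft_rejection_fuse(0.9, [0.95, 0.80]),
    "b": soft_rejection_fuse(0.9, [0.05, 0.80]),
    "c": soft_rejection_fuse(0.4, [0.90, 0.90]),
}
ranking = sorted(candidates, key=candidates.get, reverse=True)
```

Under these assumptions, a candidate that one network doubts is demoted rather than removed, which is the sense in which the rejection is "soft".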

Three works address signature matching for document analysis.

In the first work, we propose a large-scale signature matching method based on locality-sensitive hashing (LSH). Shape context features are used to describe the structure of signatures. Two stages of hashing are performed to find the nearest neighbors for query signatures. In the first stage, we use M randomly generated hyperplanes to separate shape context feature points into different bins, and compute a term frequency histogram that represents the feature point distribution as a feature vector. In the second stage, we again use LSH to categorize the high-level features into different classes. We show that our algorithm achieves high accuracy even when few signatures are collected from the same person, and performs fast matching on a large dataset.

In the second work, we present a novel signature matching method based on supervised topic models. Shape context features, which capture local variations in signature properties, are extracted from signature shape contours. We then use the concept of topic models to learn the shape context features that correspond to individual authors. The approach consists of three primary steps. First, K-means is used to cluster shape context features to form term frequency histograms, which correspond to a vocabulary for the set of signatures in the gallery. Second, a supervised topic model is used to construct an observation/author correspondence. Finally, the correspondence is used to classify query signatures and return the corresponding author. We demonstrate considerable improvement over state-of-the-art methods.

In the third work, we present a partial signature matching method using graphical models. Extending the second work, modified shape context features are extracted from the contour of signatures to describe both full and partial signatures. Hierarchical Dirichlet processes are used to infer the number of salient regions needed. The results show the effectiveness of the approach for both partial and full signature matching.
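The first hashing stage of the LSH-based method can be sketched roughly as follows: each feature point is assigned to a bin by the side of each of M random hyperplanes it falls on, and the bin counts form a term frequency histogram. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names, the use of Gaussian-distributed hyperplanes through the origin, and the normalization of the histogram are all hypothetical choices.

```python
import random


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw `num_planes` random hyperplane normals (assumed Gaussian, through the origin)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def bin_index(point, hyperplanes):
    """Pack the side-of-hyperplane bits for one feature point into a bin index."""
    index = 0
    for plane in hyperplanes:
        side = sum(w * x for w, x in zip(plane, point)) >= 0.0
        index = (index << 1) | int(side)
    return index


def term_frequency_histogram(points, hyperplanes):
    """Count feature points per bin and normalize the counts to frequencies."""
    counts = [0] * (2 ** len(hyperplanes))
    for point in points:
        counts[bin_index(point, hyperplanes)] += 1
    total = len(points)
    return [c / total for c in counts] if total else counts


# Example: 50 synthetic 8-dimensional feature points hashed with M = 4 hyperplanes.
rng = random.Random(1)
points = [[rng.random() for _ in range(8)] for _ in range(50)]
planes = make_hyperplanes(4, 8)
histogram = term_frequency_histogram(points, planes)
```

With M hyperplanes the histogram has 2^M entries, so M trades off bin resolution against vector length. A second hashing stage could, under the same assumptions, apply `bin_index` to the histogram itself using a separate set of hyperplanes.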

Audience: Public, Graduate, Faculty
