Ph.D. Defense: Jun Wang

Monday, November 28, 2022
9:30 a.m.
3137 IRB
Maria Hoo
301 405 3681


NAME: Jun Wang

Committee:
Professor Joseph F. JaJa (Chair)
Professor Larry S. Davis (Co-Chair)
Professor Min Wu
Professor Furong Huang
Professor Yang Tao (Dean's Representative)

Date/time: Monday, November 28, 2022, 9:30am-11:30am EST

Location: 3137 IRB

Join Zoom Meeting:

Title: Deep Learning for Scene Perception and Understanding

The ability to accurately perceive objects and capture motion information from the environment is crucial in many real-world applications, including autonomous driving, augmented reality, and robotics. This dissertation presents an overview of our recent work on scene perception and understanding.

Point cloud data is widely used in scene perception tasks. We propose three approaches that improve the efficiency and accuracy of point cloud perception from different perspectives. First, to address the varying density of 3D point clouds, we introduce InfoFocus, which improves the accuracy of 3D object detection with little overhead by forcing the network to attend to the most informative part of the point cloud. Second, to narrow the gap between different feature representations, we introduce M3DETR, which models the point cloud by using transformers to fuse multi-representation, multi-scale, and mutual-relation features. Third, to understand dynamic 3D environments and identify the motion of objects, we propose PointMotionNet, which handles 3D motion learning with a novel point-based spatiotemporal convolution module.

Beyond accurately classifying and locating objects and predicting their behaviors, a perception system can draw on the text that appears in many scenes, which provides useful contextual information and can further help perception. For example, to navigate safely through complex traffic scenarios, an autonomous system needs to understand the rules of the road, which includes spotting traffic signals and temporary road signs. We introduce TAG, which exploits underexplored scene text information and enhances the scene understanding of Text-VQA models by producing meaningful and accurate question-answer (QA) samples using a multimodal transformer. TAG has the potential to be applied to identifying challenging traffic situations that autonomous vehicles will encounter on the road.


Audience: Graduate, Faculty
