Ph.D. Dissertation Defense: Trevine Oorloff
Tuesday, December 3, 2024
9:00 a.m.-11:00 a.m.
AVW 1146
Maria Hoo
301 405 3681
mch@umd.edu
ANNOUNCEMENT: Ph.D. Dissertation Defense
Name: Trevine Oorloff
Committee:
Prof. Abhinav Shrivastava, Chair/Advisor
Dr. Yaser Yacoob, Co-Chair/Advisor
Prof. Dinesh Manocha
Prof. Sanghamitra Dutta
Prof. Nikhil Chopra, Dean's Representative
Date/Time: Tuesday, December 3, 2024, 9:00 AM - 11:00 AM
Location: AVW 1146
Title: Latent Space Explorations for Generative AI
Abstract:
Generative AI has revolutionized content creation through models such as Generative Adversarial Networks (GANs) and diffusion models, which produce high-quality, realistic outputs across various domains. These advancements rely on the ability of generative models to learn and encode complex patterns and semantic relationships within high-dimensional latent spaces, which serve as a foundation for their capacity to generate coherent and diverse outputs. Beyond image generation, these latent spaces hold immense potential for adaptation to a variety of downstream applications, making them a critical focus for research.
This thesis systematically explores latent spaces for generative AI along three key dimensions. The first and primary dimension investigates how latent spaces can be harnessed to extend generative models beyond traditional image synthesis. By leveraging the structured latent spaces of StyleGAN2 and Stable Diffusion, this work introduces novel methodologies for expressive face video encoding, robust one-shot face reenactment, and training-free visual in-context learning. Key contributions include methods for encoding fine-grained facial expressions and motions for video generation, decomposing identity and motion for seamless reenactment within StyleGAN's latent space, and reformulating self-attention in Stable Diffusion for multi-task visual in-context learning.
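As a toy illustration of the identity/motion decomposition idea, the sketch below composes a reenacted face code by adding a motion offset, derived from a driving frame, to the identity latent of a source subject in a StyleGAN2-style extended W+ space. The shapes, the simple additive composition, and all names here are illustrative assumptions, not the dissertation's actual encoders or pipeline.

```python
# Minimal sketch (illustrative only): compose a reenactment latent as
# identity latent + motion offset in a StyleGAN2-style W+ space.
import numpy as np

NUM_LAYERS, DIM = 18, 512  # typical W+ shape for a 1024x1024 StyleGAN2 generator

def compose_reenactment_latent(w_identity: np.ndarray, w_motion_delta: np.ndarray) -> np.ndarray:
    """Combine a source identity latent with a motion offset from a driving frame.

    w_identity:     (NUM_LAYERS, DIM) latent encoding the source subject's identity.
    w_motion_delta: (NUM_LAYERS, DIM) offset encoding the driving frame's expression/pose.
    Returns the W+ code that would be passed to the generator.
    """
    assert w_identity.shape == w_motion_delta.shape == (NUM_LAYERS, DIM)
    return w_identity + w_motion_delta

# Toy usage with random latents standing in for real encoder outputs.
rng = np.random.default_rng(0)
w_id = rng.standard_normal((NUM_LAYERS, DIM))
w_delta = 0.1 * rng.standard_normal((NUM_LAYERS, DIM))
w_reenacted = compose_reenactment_latent(w_id, w_delta)
print(w_reenacted.shape)  # (18, 512)
```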
The second dimension addresses a critical limitation of generative models: hallucinations in diffusion models. A novel framework, Adaptive Attention Modulation (AAM), is proposed to dynamically modulate self-attention distributions during early denoising stages. By introducing temperature scaling and a masked perturbation strategy, AAM mitigates the emergence of unrealistic artifacts, significantly improving the fidelity and reliability of diffusion-generated content.
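A minimal sketch of the attention-modulation mechanism described above, assuming standard softmax self-attention: a temperature divides the logits to flatten the distribution, and a perturbation mask suppresses selected logits, both applied only during an early fraction of the denoising steps. The specific temperature value, schedule, and mask used here are illustrative placeholders, not AAM's actual parameters.

```python
# Sketch: temperature scaling + masked perturbation of self-attention logits,
# applied only on early denoising steps (illustrative, not AAM's implementation).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def modulated_self_attention(q, k, v, step, total_steps,
                             temperature=1.5, early_fraction=0.3,
                             perturb_mask=None, perturb_value=-1e4):
    """Self-attention with temperature scaling and masked perturbation on early steps."""
    d = q.shape[-1]
    logits = q @ k.swapaxes(-1, -2) / np.sqrt(d)               # (tokens, tokens)
    if step < early_fraction * total_steps:                    # only early denoising steps
        logits = logits / temperature                          # flatten the attention distribution
        if perturb_mask is not None:
            logits = np.where(perturb_mask, perturb_value, logits)  # suppress masked pairs
    attn = softmax(logits, axis=-1)
    return attn @ v

# Toy usage: 16 tokens of dimension 64, suppressing attention to the last 4 tokens.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 64)) for _ in range(3))
mask = np.zeros((16, 16), dtype=bool)
mask[:, -4:] = True
out = modulated_self_attention(q, k, v, step=5, total_steps=50, perturb_mask=mask)
print(out.shape)  # (16, 64)
```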
The third dimension focuses on mitigating societal risks posed by generative AI, particularly the proliferation of deepfakes. Through a multi-modal framework called Audio-Visual Feature Fusion (AVFF), this thesis develops a robust deepfake detection method that explicitly captures audio-visual correspondences. Combining self-supervised representation learning with a novel complementary masking and cross-modal fusion strategy, AVFF achieves state-of-the-art performance in identifying manipulated multimedia content, addressing a pressing ethical challenge in generative AI.
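A minimal sketch of the complementary-masking idea, under the assumption that the audio and visual streams are tokenized along a shared time axis: time steps masked in one modality remain visible in the other, so reconstructing them pushes a model to exploit audio-visual correspondences. The shapes, mask ratio, and the simple concatenation used here as a stand-in for cross-modal fusion are assumptions, not AVFF's actual architecture.

```python
# Sketch: complementary masking of audio and visual tokens over a shared timeline
# (illustrative only; the fusion step is a placeholder concatenation).
import numpy as np

def complementary_masks(num_steps: int, mask_ratio: float = 0.5, seed: int = 0):
    """Return boolean masks (True = masked) for audio and video over a shared timeline."""
    rng = np.random.default_rng(seed)
    audio_masked = rng.random(num_steps) < mask_ratio
    video_masked = ~audio_masked          # complementary: visible where audio is masked
    return audio_masked, video_masked

def fuse_visible_tokens(audio_tokens, video_tokens, audio_masked, video_masked):
    """Placeholder cross-modal 'fusion': concatenate the visible tokens of both streams."""
    visible_audio = audio_tokens[~audio_masked]
    visible_video = video_tokens[~video_masked]
    return np.concatenate([visible_audio, visible_video], axis=0)

# Toy usage: 10 time steps, 128-dim tokens per modality.
T, D = 10, 128
rng = np.random.default_rng(1)
audio = rng.standard_normal((T, D))
video = rng.standard_normal((T, D))
a_mask, v_mask = complementary_masks(T)
fused = fuse_visible_tokens(audio, video, a_mask, v_mask)
print(fused.shape)  # (10, 128): every time step is visible in exactly one modality
```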