Ph.D. Dissertation Defense: Trevine Oorloff
Tuesday, December 3, 2024
9:00 a.m.-11:00 a.m.
AVW 1146
Maria Hoo
301 405 3681
mch@umd.edu
ANNOUNCEMENT: Ph.D. Dissertation Defense
Name: Trevine Oorloff
Committee:
Prof. Abhinav Shrivastava, Chair/Advisor
Dr. Yaser Yacoob, Co-Chair/Advisor
Prof. Dinesh Manocha
Prof. Sanghamitra Dutta
Prof. Nikhil Chopra, Dean's Representative
Date/Time: Tuesday, December 3, 2024, 9:00 AM - 11:00 AM
Location: AVW 1146
Title: Latent Space Explorations for Generative AI
Abstract:
Generative AI has revolutionized content creation through models such as Generative Adversarial Networks (GANs) and diffusion models, which produce high-quality, realistic outputs across various domains. These advancements rely on the ability of generative models to learn and encode complex patterns and semantic relationships within high-dimensional latent spaces, which serve as a foundation for their capacity to generate coherent and diverse outputs. Beyond image generation, these latent spaces hold immense potential for adaptation to a variety of downstream applications, making them a critical focus for research.
This thesis systematically explores latent spaces for generative AI along three key dimensions. The first and primary dimension investigates how latent spaces can be harnessed to extend generative models beyond traditional image synthesis. By leveraging the structured latent spaces of StyleGAN2 and Stable Diffusion, this work introduces novel methodologies for expressive face video encoding, robust one-shot face reenactment, and training-free visual in-context learning. Key contributions include methods for encoding fine-grained facial expressions and motions for video generation, decomposing identity and motion for seamless reenactment within StyleGAN's latent space, and reformulating self-attention in Stable Diffusion for multi-task visual in-context learning.
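As a toy illustration of the identity/motion decomposition idea, the sketch below composes a reenacted face code by adding a motion offset, derived from a driving frame, to the identity latent of a source subject in a StyleGAN2-style extended W+ space. The shapes, the simple additive composition, and all names here are illustrative assumptions, not the dissertation's actual encoders or pipeline.

```python
# Minimal sketch (illustrative only): compose a reenactment latent as
# identity latent + motion offset in a StyleGAN2-style W+ space.
import numpy as np

NUM_LAYERS, DIM = 18, 512  # typical W+ shape for a 1024x1024 StyleGAN2 generator

def compose_reenactment_latent(w_identity: np.ndarray, w_motion_delta: np.ndarray) -> np.ndarray:
    """Combine a source identity latent with a motion offset from a driving frame.

    w_identity:     (NUM_LAYERS, DIM) latent encoding the source subject's identity.
    w_motion_delta: (NUM_LAYERS, DIM) offset encoding the driving frame's expression/pose.
    Returns the W+ code that would be passed to the generator.
    """
    assert w_identity.shape == w_motion_delta.shape == (NUM_LAYERS, DIM)
    return w_identity + w_motion_delta

# Toy usage with random latents standing in for real encoder outputs.
rng = np.random.default_rng(0)
w_id = rng.standard_normal((NUM_LAYERS, DIM))
w_delta = 0.1 * rng.standard_normal((NUM_LAYERS, DIM))
w_reenacted = compose_reenactment_latent(w_id, w_delta)
print(w_reenacted.shape)  # (18, 512)
```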
The second dimension addresses a critical limitation of generative models: hallucinations in diffusion models. A novel framework, Adaptive Attention Modulation (AAM), is proposed to dynamically modulate self-attention distributions during early denoising stages. By introducing temperature scaling and a masked perturbation strategy, AAM mitigates the emergence of unrealistic artifacts, significantly improving the fidelity and reliability of diffusion-generated content.
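A minimal sketch of the attention-modulation mechanism described above, assuming standard softmax self-attention: a temperature divides the logits to flatten the distribution, and a perturbation mask suppresses selected logits, both applied only during an early fraction of the denoising steps. The specific temperature value, schedule, and mask used here are illustrative placeholders, not AAM's actual parameters.

```python
# Sketch: temperature scaling + masked perturbation of self-attention logits,
# applied only on early denoising steps (illustrative, not AAM's implementation).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def modulated_self_attention(q, k, v, step, total_steps,
                             temperature=1.5, early_fraction=0.3,
                             perturb_mask=None, perturb_value=-1e4):
    """Self-attention with temperature scaling and masked perturbation on early steps."""
    d = q.shape[-1]
    logits = q @ k.swapaxes(-1, -2) / np.sqrt(d)               # (tokens, tokens)
    if step < early_fraction * total_steps:                    # only early denoising steps
        logits = logits / temperature                          # flatten the attention distribution
        if perturb_mask is not None:
            logits = np.where(perturb_mask, perturb_value, logits)  # suppress masked pairs
    attn = softmax(logits, axis=-1)
    return attn @ v

# Toy usage: 16 tokens of dimension 64, suppressing attention to the last 4 tokens.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 64)) for _ in range(3))
mask = np.zeros((16, 16), dtype=bool)
mask[:, -4:] = True
out = modulated_self_attention(q, k, v, step=5, total_steps=50, perturb_mask=mask)
print(out.shape)  # (16, 64)
```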
The third dimension focuses on mitigating societal risks posed by generative AI, particularly the proliferation of deepfakes. Through a multi-modal framework called Audio-Visual Feature Fusion (AVFF), this thesis develops a robust deepfake detection method that explicitly captures audio-visual correspondences. Combining self-supervised representation learning with a novel complementary masking and cross-modal fusion strategy, AVFF achieves state-of-the-art performance in identifying manipulated multimedia content, addressing a pressing ethical challenge in generative AI.
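A minimal sketch of the complementary-masking idea, under the assumption that the audio and visual streams are tokenized along a shared time axis: time steps masked in one modality remain visible in the other, so reconstructing them pushes a model to exploit audio-visual correspondences. The shapes, mask ratio, and the simple concatenation used here as a stand-in for cross-modal fusion are assumptions, not AVFF's actual architecture.

```python
# Sketch: complementary masking of audio and visual tokens over a shared timeline
# (illustrative only; the fusion step is a placeholder concatenation).
import numpy as np

def complementary_masks(num_steps: int, mask_ratio: float = 0.5, seed: int = 0):
    """Return boolean masks (True = masked) for audio and video over a shared timeline."""
    rng = np.random.default_rng(seed)
    audio_masked = rng.random(num_steps) < mask_ratio
    video_masked = ~audio_masked          # complementary: visible where audio is masked
    return audio_masked, video_masked

def fuse_visible_tokens(audio_tokens, video_tokens, audio_masked, video_masked):
    """Placeholder cross-modal 'fusion': concatenate the visible tokens of both streams."""
    visible_audio = audio_tokens[~audio_masked]
    visible_video = video_tokens[~video_masked]
    return np.concatenate([visible_audio, visible_video], axis=0)

# Toy usage: 10 time steps, 128-dim tokens per modality.
T, D = 10, 128
rng = np.random.default_rng(1)
audio = rng.standard_normal((T, D))
video = rng.standard_normal((T, D))
a_mask, v_mask = complementary_masks(T)
fused = fuse_visible_tokens(audio, video, a_mask, v_mask)
print(fused.shape)  # (10, 128): every time step is visible in exactly one modality
```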