Event
Ph.D. Research Proposal Exam: Gowtham Premananth
Friday, May 9, 2025
8:00 a.m.-10:00 a.m.
AVW 2460
Maria Hoo
301 405 3681
mch@umd.edu
ANNOUNCEMENT: Ph.D. Research Proposal Exam
Name: Gowtham Premananth
Committee:
Professor Carol Espy-Wilson (Chair)
Professor Jonathan Simon
Professor Shihab Shamma
Date/Time: Friday, May 9, 2025, 8:00 a.m.-10:00 a.m.
Location: AVW 2460
Title: Multimodal Behavioral Biomarkers for Schizophrenia: Towards Explainable Detection and Severity Estimation
Abstract: Schizophrenia is a complex and heterogeneous mental disorder that presents significant challenges in both diagnosis and symptom severity assessment. Traditional diagnostic approaches rely on subjective clinical evaluations, which can be inconsistent, time-consuming, and dependent on clinician expertise. These limitations highlight the need for objective, scalable, and automated assessment tools. In this work, we explore multimodal and speech-based machine learning frameworks to enhance schizophrenia classification, symptom recognition, and severity estimation, leveraging advancements in artificial intelligence (AI) and deep learning.
Our approach integrates audio, video, and text modalities to distinguish individuals with schizophrenia spectrum disorders from healthy controls while also providing a detailed assessment of symptom severity. To extract meaningful features, we analyze the coordination of Vocal Tract Variables (TVs) from speech and Facial Action Units (FAUs) from video recordings, transforming them into high-level coordination features that capture the motor and articulatory impairments commonly associated with schizophrenia. Additionally, we employ context-independent text embeddings derived from speech transcriptions to incorporate linguistic information, offering a more holistic representation of symptomatology. To combine these diverse modalities effectively, we use a Convolutional Neural Network (CNN)-based multimodal model with cross-modal attention, allowing the model to focus on the most relevant features across data streams for robust detection and symptom-class-based classification of schizophrenia subjects. Beyond traditional classification models, we explore novel fusion strategies to improve model performance and generalizability. One such strategy integrates Minimal Gated Multimodal Unit (mGMU) fusion, which facilitates dynamic interactions between modalities before the final decision and yields more refined feature representations. Additionally, we implement a Vector Quantized Variational Auto-Encoder (VQ-VAE)-based Multimodal Representation Learning (MRL) framework to learn task-agnostic multimodal speech representations. These representations are then leveraged within a Multi-Task Learning (MTL) framework to jointly predict schizophrenia symptom classes and overall severity, improving both classification accuracy and severity estimation.
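As an illustration of the gated fusion idea, the following is a minimal PyTorch sketch of a gated bimodal fusion unit in the spirit of the mGMU mentioned above; the module name, feature dimensions, and two-modality restriction are assumptions for illustration, not the proposal's actual implementation.

```python
import torch
import torch.nn as nn

class GatedBimodalUnit(nn.Module):
    """Illustrative gated fusion of two modality embeddings (assumed dimensions)."""
    def __init__(self, dim_a: int, dim_b: int, dim_out: int):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, dim_out)          # project modality A (e.g., audio features)
        self.proj_b = nn.Linear(dim_b, dim_out)          # project modality B (e.g., video features)
        self.gate = nn.Linear(dim_a + dim_b, dim_out)    # gate computed from both inputs

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        h_a = torch.tanh(self.proj_a(x_a))
        h_b = torch.tanh(self.proj_b(x_b))
        z = torch.sigmoid(self.gate(torch.cat([x_a, x_b], dim=-1)))
        return z * h_a + (1 - z) * h_b                   # gated convex combination of the two modalities

# Usage sketch with made-up feature sizes
fusion = GatedBimodalUnit(dim_a=128, dim_b=64, dim_out=96)
fused = fusion(torch.randn(8, 128), torch.randn(8, 64))  # shape: (batch, dim_out)
```

The gate lets the network weight each modality per example, which is the property that motivates this style of fusion over simple concatenation.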
Speech-based assessment plays a particularly critical role in our study because it enables remote, automated, and scalable mental health monitoring without the privacy concerns associated with video recordings. To enhance the precision of speech-based severity estimation, we develop self-supervised learning techniques that utilize articulatory coordination features alongside speech representations from pre-trained audio models. To optimize feature fusion and improve learning efficiency, we incorporate Multi-Head Attention (MHA), which enhances the model's ability to detect subtle speech anomalies indicative of schizophrenia. Furthermore, to provide a more structured approach to prioritizing subjects for intervention, we introduce a Bradley-Terry pairwise comparison model, which refines severity ranking and yields a more robust architecture that performs well in data-scarce scenarios.
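To make the pairwise ranking idea concrete, here is a minimal sketch of a Bradley-Terry style loss over scalar severity scores; the scoring network, feature dimension, and labels are hypothetical, and the proposal's actual formulation may differ.

```python
import torch
import torch.nn as nn

# Hypothetical scorer mapping a speech embedding to a scalar severity score.
scorer = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))

def bradley_terry_loss(feat_i: torch.Tensor, feat_j: torch.Tensor,
                       i_more_severe: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry model: P(subject i ranked above subject j) = sigmoid(s_i - s_j)."""
    s_i = scorer(feat_i).squeeze(-1)
    s_j = scorer(feat_j).squeeze(-1)
    # Binary cross-entropy on the score difference trains the scorer from pairwise labels.
    return nn.functional.binary_cross_entropy_with_logits(s_i - s_j, i_more_severe)

# Usage sketch with random embeddings and random pairwise labels
loss = bradley_terry_loss(torch.randn(16, 256), torch.randn(16, 256),
                          torch.randint(0, 2, (16,)).float())
loss.backward()
```

Because the loss needs only relative comparisons between pairs of subjects rather than absolute severity labels, this formulation is one reason pairwise ranking can remain usable when labeled data are scarce.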
Despite these promising advancements, several challenges persist. Data scarcity remains a major hurdle, limiting the generalizability of our models across diverse populations and clinical conditions. Future work will focus on techniques such as the pairwise comparison model and on integrating data from multiple sources to overcome data scarcity and develop robust, adaptable, and generalizable models applicable across different settings. Additionally, while our current framework provides an overall schizophrenia severity estimate, future efforts will aim to assess individual symptom severity at a more granular level, enabling precise, symptom-specific interventions tailored to individual patients. Another critical challenge is interpretability, which is essential for real-world clinical adoption. To bridge the gap between AI and clinical utility, we plan to conduct in-depth interpretability studies examining how the learned representations capture schizophrenia-related speech and behavioral patterns. Understanding the relationship between these patterns and clinical symptoms will improve model transparency, foster trust among healthcare professionals, and facilitate seamless integration into clinical workflows.
By bridging computational intelligence with real-world clinical needs, this research lays the foundation for scalable, objective, and accessible mental health assessment tools. Future advancements in multimodal AI approaches will continue to refine schizophrenia diagnosis and symptom monitoring, making mental health care more practical, data-driven, and widely available. Through continued innovation, we aim to empower clinicians with reliable, AI-driven tools that enhance early detection, track disease progression, and ultimately improve patient outcomes in schizophrenia care.