Event
Ph.D. Dissertation Defense: Gowtham Premananth
Friday, March 27, 2026
2:00 p.m.
AVW 2168
ANNOUNCEMENT: Ph.D. Dissertation Defense
Name: Gowtham Premananth
Committee:
Professor Carol Espy-Wilson (Chair)
Professor Jonathan Simon
Professor Shihab Shamma
Professor Dinesh Manocha
Professor William Idsardi (Dean's Representative)
Date/Time: Friday, March 27, 2026, 2:00 p.m. to 4:00 p.m.
Location: AVW 2168
Title: Multimodal Behavioral Biomarkers for Schizophrenia: Towards Explainable Detection and Severity Estimation
Abstract: Schizophrenia is a complex and heterogeneous mental disorder that presents significant challenges in both diagnosis and symptom severity assessment. Traditional diagnostic approaches rely on subjective clinical evaluations, which can be inconsistent, time-consuming, and dependent on clinician expertise. Furthermore, many regions face limited access to mental healthcare services, and there is a worldwide shortage of trained professionals, further restricting timely and accurate diagnosis. This shortage also creates a large clinical whitespace: the intervals before and between clinical appointments, during which people affected by schizophrenia may remain undiagnosed or experience worsening symptoms, posing potential harm to themselves and those around them. These drawbacks emphasize the necessity of automated, scalable, and objective evaluation instruments. In this work, we leverage advances in artificial intelligence (AI) and deep learning to enhance schizophrenia classification, symptom recognition, and severity estimation.
Our approach integrates audio, video, and text modalities to distinguish individuals with schizophrenia spectrum disorders from healthy controls while also providing a detailed assessment of symptom severity. These modalities are chosen because the data can be collected easily and non-invasively, and because they have demonstrated effectiveness in capturing the behavioral and cognitive changes associated with alterations in brain function across mental health disorders. To extract meaningful features, we analyze the coordination of Vocal Tract Variables (TVs) from speech and Facial Action Units (FAUs) from video recordings, transforming them into high-level coordination features that capture motor and articulatory impairments commonly associated with schizophrenia. Additionally, we employ context-independent text embeddings derived from speech transcriptions to incorporate linguistic information, offering a more holistic representation of symptomatology. To effectively combine these diverse modalities, we utilize a Convolutional Neural Network (CNN)-based multimodal model with cross-modal attention, allowing the model to focus on the most relevant features across different data streams for robust detection and symptom-class-based classification of schizophrenia subjects.
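The cross-modal attention idea above can be illustrated with a minimal single-head sketch, where features from one modality (e.g., audio) attend over features from another (e.g., video). All shapes and variable names here are illustrative assumptions, not the dissertation's actual architecture:

```python
import numpy as np

def cross_modal_attention(query_feats, context_feats):
    """Scaled dot-product attention in which one modality (the query)
    attends over another (the context).
    query_feats: (T_q, d), context_feats: (T_c, d)."""
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)   # (T_q, T_c) similarities
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over context frames
    return weights @ context_feats                        # (T_q, d) fused features

# Hypothetical example: 5 audio frames attend over 8 video frames, d=16
rng = np.random.default_rng(0)
audio = rng.standard_normal((5, 16))
video = rng.standard_normal((8, 16))
fused = cross_modal_attention(audio, video)
print(fused.shape)  # (5, 16)
```

In a full model, learned query/key/value projections and multiple heads would wrap this core operation; the sketch keeps only the attention step itself.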
Beyond traditional classification models, we implement a Vector Quantized Variational Auto-Encoder (VQ-VAE)-based Multimodal Representation Learning (MRL) framework, enabling the development of task-agnostic multimodal speech representations. These representations are then leveraged within a Multi-Task Learning (MTL) framework to jointly predict schizophrenia symptom classes and overall severity, enhancing both classification accuracy and symptom severity estimation. Furthermore, extending beyond overall severity assessment, we design a unified multimodal model capable of estimating multiple individual schizophrenia symptoms within a single framework, providing a more comprehensive and clinically meaningful evaluation.
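The defining step of a VQ-VAE is the quantization of continuous encoder outputs against a learned codebook. The following is a minimal sketch of just that lookup step, with hypothetical codebook and latent sizes (the dissertation's actual encoder, decoder, and training losses are not shown):

```python
import numpy as np

def vector_quantize(latents, codebook):
    """Core VQ-VAE step: snap each encoder latent to its nearest
    codebook entry (squared Euclidean distance), returning the
    quantized vectors and the chosen discrete code indices."""
    # (N, 1, d) - (1, K, d) -> (N, K) squared distances
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = dists.argmin(axis=1)        # one discrete token per latent
    return codebook[codes], codes

# Illustrative sizes: K=32 codebook entries of dimension d=8, 10 latents
rng = np.random.default_rng(1)
codebook = rng.standard_normal((32, 8))
latents = rng.standard_normal((10, 8))
quantized, codes = vector_quantize(latents, codebook)
print(quantized.shape, codes.shape)  # (10, 8) (10,)
```

During training, a straight-through estimator passes gradients around this non-differentiable argmin, and commitment/codebook losses keep latents and codes aligned.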
Speech-based assessment plays a particularly critical role in our studies due to its potential for remote, automated, and scalable mental health monitoring without the privacy concerns associated with video recordings. To enhance the precision of speech-based symptom severity estimation, we develop feature fusion models that utilize articulatory coordination features alongside acoustic speech representations extracted from pre-trained speech models. In addition to the traditional articulatory coordination features, we propose Concise Articulatory (C-Art) representations, generated through a representation learning framework to produce more information-dense embeddings from the sparse coordination matrices obtained from TVs. The proposed C-Art representations outperformed traditional coordination features in the feature fusion models for symptom severity estimation, demonstrating the utility of the proposed representations. Furthermore, to provide a more structured approach to subject intervention prioritization based on symptom severity, we introduce a Bradley-Terry pairwise comparison model, refining severity ranking and ensuring a more robust model architecture that works well in data-scarce scenarios.
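The Bradley-Terry model turns pairwise "who is more severe" comparisons into a global ranking: it posits P(i ranked above j) = s_i / (s_i + s_j) and fits the scores s from observed comparison counts. A minimal sketch using the standard minorization-maximization (MM) update, with an entirely hypothetical win matrix (not data from the dissertation):

```python
import numpy as np

def bradley_terry(wins, n_iters=200):
    """Fit Bradley-Terry scores via the classic MM update:
    s_i <- W_i / sum_j n_ij / (s_i + s_j),
    where wins[i, j] counts comparisons in which subject i was
    ranked as more severe than subject j."""
    n = wins.shape[0]
    s = np.ones(n)
    comparisons = wins + wins.T                  # total pairings per pair
    for _ in range(n_iters):
        totals = s[:, None] + s[None, :]         # s_i + s_j for every pair
        denom = (comparisons / totals).sum(axis=1)
        s = wins.sum(axis=1) / denom             # W_i / denom_i
        s /= s.sum()                             # fix the arbitrary scale
    return s

# Hypothetical comparisons among 3 subjects: subject 0 is ranked
# above 1 and 2 most of the time, subject 1 above 2 more often than not.
wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]], dtype=float)
scores = bradley_terry(wins)
print(scores.argsort()[::-1])  # [0 1 2]: severity ranking, most severe first
```

Because each comparison contributes information about two subjects at once, the pairwise formulation extracts more signal per labeled example than direct regression, which is one motivation for its use in data-scarce settings.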
Interpretability of the results obtained from these AI-based models is essential for real-world clinical adoption. We address this by distilling the articulatory coordination patterns extracted from speech into a quantitative biomarker that distinguishes more complex from simpler speech coordination relative to healthy controls. The clinical relevance of this biomarker is validated through systematic analyses showing that specific schizophrenia symptom subtypes correlate with the biomarker, both in their presence and in individual symptom severity. These associations suggest that the biomarker not only differentiates diagnostic groups but also meaningfully reflects underlying symptom dimensions, supporting its potential use for objective symptom characterization and clinical monitoring.
Another challenge for clinical adoption is the generalizability of the models and biomarkers developed. Data scarcity, a common problem with behavioral data such as video and speech (especially in the medical domain, where privacy concerns limit collection), typically hinders the generalizability of such frameworks. We mitigate this effectively through our feature extraction, feature selection, and model design process, and we evaluate the trained models on multiple independent datasets collected in different settings to verify their generalizability.
In essence, the work in this dissertation presents steps taken towards integrating computational intelligence with clinical practice for schizophrenia assessment, and highlights the importance of developing scalable, interpretable, and accessible AI-driven tools that can support symptom monitoring, early detection, and intervention prioritization in real-world mental health care.
