Event
Ph.D. Dissertation Defense: Faisal Hamman
Monday, November 3, 2025
12:30 p.m.
Room IRB-4105 (Brendan Iribe Building)
Sarah Pham
301 473 2449
spham124@umd.edu
Professor Furong Huang
Professor Tudor Dumitras
Abstract: Machine learning is increasingly deployed in high-stakes domains such as finance, healthcare, and education, in ways that profoundly impact people's lives. However, the decision-making processes of these complex black-box models are difficult for human stakeholders such as auditors, institutions, and end-users to understand, raising questions about adoption, accountability, and trust. Regulatory and ethical standards increasingly call for reliable and trustworthy explanations of automated decision-making, particularly in the case of adverse actions and denials. Trustworthy adoption of machine learning requires more systematic and mathematically rigorous approaches to explainable machine learning, as many existing methods rely on heuristics that may lack consistency.
This thesis addresses emerging challenges in explainable and trustworthy machine learning by developing novel mathematical frameworks grounded in information-theoretic methods.
An emerging challenge in explainability is generating robust counterfactual explanations. Counterfactual explanations (CFEs) guide users toward changing a model's outcome with minimal input perturbation, e.g., "increase your income by 10K to qualify for a loan." However, such CFEs can become invalid if the machine learning model is updated even slightly. Models are in fact updated quite frequently, leading to the phenomenon of model multiplicity (also known as the Rashomon effect), where multiple models with comparable performance make conflicting predictions on the same input. Such variability can cause previously issued CFEs to become invalid, undermining user trust and the reliability of algorithmic recourse. To address this challenge, we propose a measure called Stability that captures the robustness of CFEs under natural model updates. We develop practical algorithms, along with theoretical guarantees, to generate robust CFEs for neural networks, providing a principled foundation for reliable algorithmic recourse in evolving machine learning systems.
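To make the idea concrete, here is a minimal, purely illustrative Python sketch (not the thesis's algorithm): it estimates a stability-style score for a candidate CFE by checking how often the desired outcome survives small random perturbations of a toy linear model's parameters. The model, feature values, and perturbation scale below are all hypothetical.

    import numpy as np

    def stability_score(weights, bias, x_cf, target=1, n_models=200, sigma=0.05, seed=0):
        """Fraction of randomly perturbed linear models that still give the
        desired outcome for the counterfactual x_cf (illustrative proxy only)."""
        rng = np.random.default_rng(seed)
        hits = 0
        for _ in range(n_models):
            w = weights + rng.normal(0.0, sigma, size=weights.shape)   # simulate a model update
            b = bias + rng.normal(0.0, sigma)
            pred = int(1.0 / (1.0 + np.exp(-(w @ x_cf + b))) >= 0.5)   # sigmoid + threshold
            hits += int(pred == target)
        return hits / n_models   # close to 1.0 suggests the CFE survives model updates

    # Hypothetical loan example: features [income, debt], a CFE that raised income
    w0, b0 = np.array([0.8, -1.2]), -0.3
    x_counterfactual = np.array([1.5, 0.4])
    print(stability_score(w0, b0, x_counterfactual))

A CFE with a score near 1.0 under such sampling would be preferred over one whose outcome flips under small model changes.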
Going beyond neural networks, we observe that Tabular large language models (LLMs) are also affected by model multiplicity after fine-tuning. Such variability in predictions raises concerns about the reliability of Tabular LLMs even as they generate interest in critical domains such as finance for classification with limited labeled data. Interestingly, our Stability measure helps quantify the consistency of individual predictions for Tabular LLMs without expensive model retraining or ensembling. The measure quantifies a prediction's consistency by sampling the model's local behavior around the input in its embedding space. We provide probabilistic guarantees on prediction consistency across a broad class of fine-tuned models, along with experiments on Tabular LLMs.
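A similarly hedged sketch of the embedding-space sampling idea, with a toy classification head standing in for a fine-tuned Tabular LLM (the embedding dimension, perturbation scale, and predict function are assumptions for illustration):

    import numpy as np

    def local_consistency(predict_fn, z, n_samples=100, radius=0.1, seed=0):
        """Share of predictions in a small neighborhood of the embedding z that
        agree with the prediction at z itself (illustrative proxy only)."""
        rng = np.random.default_rng(seed)
        base = predict_fn(z)
        agree = sum(predict_fn(z + rng.normal(0.0, radius, size=z.shape)) == base
                    for _ in range(n_samples))
        return agree / n_samples

    # Toy stand-in for a fine-tuned model's classification head
    rng = np.random.default_rng(1)
    W = rng.normal(size=16)
    predict = lambda z: int(W @ z > 0)
    z_input = rng.normal(size=16)
    print(local_consistency(predict, z_input))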
Expanding our investigation of LLM consistency, we move beyond the classification setting and explore generative LLMs. In retrieval-augmented generation (RAG) systems, users expect that paraphrased or reworded queries will yield outputs that convey the same underlying information. However, existing RAG pipelines often exhibit variability in both the retriever and the generator, undermining reliability in high-stakes applications. To address this challenge, we propose a reinforcement-learning-based approach that improves consistency through group similarity rewards computed over sets of paraphrased inputs. Our training strategy yields Con-RAG, a reliable RAG system that improves both consistency and accuracy across several QA benchmarks.
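As a rough illustration of a group-similarity reward (using a simple token-overlap similarity in place of the learned or semantic similarity a real system would use; the function names and example answers are hypothetical):

    from itertools import combinations

    def token_overlap(a, b):
        """Jaccard similarity over tokens: a crude stand-in for a learned metric."""
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0

    def group_similarity_reward(answers):
        """Average pairwise similarity over answers generated for paraphrases of
        the same question; higher means the generator was more consistent."""
        pairs = list(combinations(answers, 2))
        return sum(token_overlap(a, b) for a, b in pairs) / len(pairs)

    answers = [
        "The loan was denied due to insufficient income.",
        "Insufficient income led to the loan being denied.",
        "The application was rejected because income was too low.",
    ]
    print(group_similarity_reward(answers))   # reward signal fed back during RL training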
In addition to improving inference-time consistency and robustness, we explore how CFEs can enhance training-time efficiency in LLMs. We propose CFE-infused Distillation (CoD), a new framework for distilling large teacher models into smaller students using few-shot, task-specific data by systematically infusing the training data with CFE examples. We provide both statistical and geometric guarantees motivating this approach, and we show empirically that CoD significantly outperforms standard distillation baselines in few-shot regimes, thus connecting explainability with model compression.
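A minimal sketch of the data-infusion step, with a purely hypothetical counterfactual generator standing in for the actual CFE machinery:

    def infuse_with_counterfactuals(examples, make_cfe):
        """Augment a few-shot dataset with counterfactual counterparts.
        make_cfe is a hypothetical routine returning a minimally edited input
        together with its flipped label."""
        augmented = []
        for x, y in examples:
            augmented.append((x, y))
            x_cf, y_cf = make_cfe(x, y)      # e.g., a teacher-guided minimal edit
            augmented.append((x_cf, y_cf))
        return augmented                      # the student is then distilled on this set

    # Toy usage with a trivial stand-in counterfactual generator
    fewshot = [("income=40K, debt=10K", 0)]
    toy_cfe = lambda x, y: (x.replace("40K", "50K"), 1 - y)
    print(infuse_with_counterfactuals(fewshot, toy_cfe))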
Finally, we turn to another dimension of trust, algorithmic fairness, where we explain fairness trade-offs using information theory. Using a new tool in information theory called Partial Information Decomposition (PID), we reveal how local fairness (within each client) and global fairness (across clients) interact under data heterogeneity in distributed and federated learning. We further unify classical group fairness notions, e.g., statistical parity, equalized odds, and predictive parity, under the same PID framework, offering a granular understanding of their overlaps and impossibilities.
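For readers unfamiliar with PID, the standard Williams-Beer decomposition splits the information that two sources carry about a target into unique, redundant, and synergistic parts. With an illustrative choice of variables (not necessarily the one used in the thesis), taking the sensitive attribute A as the target and the prediction \hat{Y} and client identity S as the sources:

    I(A; \hat{Y}, S) = \mathrm{Uni}(A : \hat{Y} \setminus S) + \mathrm{Uni}(A : S \setminus \hat{Y}) + \mathrm{Red}(A : \hat{Y}; S) + \mathrm{Syn}(A : \hat{Y}; S)

Loosely speaking, different local and global fairness gaps can then be attributed to different terms of such a decomposition.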
This thesis lays out foundational guiding principles for data scientists and policymakers toward trustworthy AI adoption, fostering user acceptance and reducing reputational and regulatory risks.