Event
Ph.D. Research Proposal: Sachindra Pasan Dissanayake
Monday, May 11, 2026
11:00 a.m.
AVW 1146
ANNOUNCEMENT: Ph.D. Research Proposal Exam
Name: Sachindra Pasan Dissanayake
Committee:
Prof. Sanghamitra Dutta (Chair/Advisor)
Prof. Behtash Babadi
Prof. Sennur Ulukus
Date/time: May 11, 2026 from 11:00 AM to 1:00 PM
Location: AVW 1146
Title: Efficient and Sustainable Artificial Intelligence Using Explainability Methods
Abstract:
AI is driving remarkable advancements across industries, from finance and healthcare to drug discovery, materials science, and astronomy. However, these advancements come with ever-increasing model size and complexity, leading to an unprecedented demand for energy. For instance, a general-purpose large language model (LLM) such as ChatGPT has trillions of parameters and consumes 1.059 billion kWh on average every year, which is equivalent to charging 223.4 million iPhones every day for a year. This trend raises a fundamental question: how can we develop sustainable and energy-efficient AI frameworks that achieve high performance under constrained resources? To this end, we explore novel strategies for reconstructing simpler, more efficient models, extracting task-relevant information, and improving computational and data efficiency by systematically integrating explainability methods.
Counterfactual explanations are a burgeoning explainability technique, traditionally used for algorithmic recourse. Counterfactuals are the closest points on the accepted side of the decision boundary; e.g., if a loan is denied, a counterfactual might be to increase income by 10K to get approved. Going beyond algorithmic recourse, we demonstrate that counterfactuals can also be leveraged for efficient model reconstruction, i.e., obtaining a surrogate model that mimics the original model using very few data points. Accordingly, we propose an algorithm that enables high-fidelity model reconstruction (about 90% agreement between the surrogate and original models) on benchmark datasets with as few as 100 to 400 counterfactual samples, alongside theoretical guarantees rooted in random polytope theory and learning theory. Our findings advance more sustainable learning paradigms in which one can extract the maximum information using a minimum number of samples.
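The idea can be illustrated with a minimal sketch (not the proposed algorithm itself, which comes with polytope-theoretic guarantees): query a hypothetical black-box linear classifier, collect each rejected query's counterfactual as an extra accepted-labeled point near the boundary, fit a surrogate, and measure fidelity as agreement on fresh points. The black-box model, the closed-form counterfactual, and all dimensions here are illustrative assumptions; a real recourse system would return the counterfactual itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box linear classifier: accept iff the score is positive.
w_true = np.array([1.5, -2.0])
def black_box(X):
    return (X @ w_true > 0).astype(int)

def counterfactual(x):
    """Closest point on the accepted side of the boundary.
    Uses oracle access to w_true purely to simulate the recourse system,
    which in practice returns this point without revealing the model."""
    score = x @ w_true
    return x + (-score / (w_true @ w_true) + 1e-3) * w_true

# Query a handful of points; keep the model's labels, and for each rejected
# query also keep its counterfactual, labeled as accepted.
X_q = rng.normal(size=(100, 2))
y_q = black_box(X_q)
X_cf = np.array([counterfactual(x) for x in X_q[y_q == 0]])
X_train = np.vstack([X_q, X_cf])
y_train = np.concatenate([y_q, np.ones(len(X_cf))])

# Fit a logistic-regression surrogate by plain gradient descent (with bias).
Xb = np.hstack([X_train, np.ones((len(X_train), 1))])
w = np.zeros(3)
for _ in range(2000):
    p = 1 / (1 + np.exp(-Xb @ w))
    w -= 0.1 * Xb.T @ (p - y_train) / len(y_train)

# Fidelity: agreement between surrogate and original model on fresh points.
X_test = rng.normal(size=(2000, 2))
pred = (np.hstack([X_test, np.ones((2000, 1))]) @ w > 0).astype(int)
fidelity = (pred == black_box(X_test)).mean()
print(f"fidelity: {fidelity:.2f}")
```

Because counterfactuals sit essentially on the decision boundary, they are far more informative per sample than random queries, which is what makes reconstruction from so few points possible.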
Extending this perspective, we harness counterfactual explanations to improve the efficiency of AI-powered sustainability applications. Biopolymer nanocomposites offer a promising class of sustainable plastic alternatives. However, candidate nanocomposites must often meet multiple performance criteria simultaneously, such as mechanical strength, biodegradability, and optical transparency. Given the vast search space, brute-force search for new materials that satisfy multiple properties is challenging. We propose a new unified framework for molecular design that caters to multiple target properties using a novel iterative counterfactual generation mechanism based on the entropic risk measure. The algorithm finds entirely new compositions with desirable properties within seconds, surpassing existing genetic and Bayesian optimization methods in speed, and was further validated on a lab-generated dataset of biopolymer nanocomposites. This work pushes the boundaries of inverse design, enabling the discovery of sustainable plastic alternatives.
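A toy sketch of the underlying mechanics, under strong simplifying assumptions: the entropic risk measure, (1/γ) log E[exp(γ·loss)], acts as a smooth stand-in for the worst-case property error, so no single target is sacrificed for the others. Here the two "property predictors" are made-up linear functions of a 3-dimensional composition vector, and the counterfactual search is plain finite-difference descent; the actual framework operates on learned predictors over molecular design spaces.

```python
import numpy as np

def entropic_risk(losses, gamma=5.0):
    # (1/gamma) * log E[exp(gamma * loss)]: smoothly emphasizes the worst loss.
    return np.log(np.mean(np.exp(gamma * np.asarray(losses)))) / gamma

# Stand-in property predictors (assumed linear for illustration): each maps a
# 3-dimensional composition vector to one scalar property.
predictors = [
    lambda x: x @ np.array([1.0, 0.5, -0.3]),   # e.g., mechanical strength
    lambda x: x @ np.array([-0.4, 1.2, 0.8]),   # e.g., biodegradability
]
targets = np.array([1.0, 0.8])

def objective(x):
    props = np.array([f(x) for f in predictors])
    return entropic_risk((props - targets) ** 2)

# Iterative counterfactual generation: descend on the aggregated objective
# from a seed composition, via central finite differences.
x, lr, eps = np.zeros(3), 0.05, 1e-5
for _ in range(500):
    grad = np.array([
        (objective(x + eps * e) - objective(x - eps * e)) / (2 * eps)
        for e in np.eye(3)
    ])
    x -= lr * grad
print(objective(x))  # near zero: all target properties met simultaneously
```

As γ grows, the entropic risk approaches the maximum of the per-property losses; as γ shrinks, it approaches their mean, giving a tunable trade-off between worst-case and average satisfaction of the targets.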
Another facet of sustainable AI is knowledge distillation, which aims to transfer the intelligence of a large, complex teacher model into a smaller student model for resource-constrained environments such as mobile phones, Internet of Things devices, and edge hardware. Existing distillation approaches often train the student model to exactly mimic the teacher by naively aligning the intermediate representations of the two models. However, the teacher's intermediate representations may encode substantial information that is irrelevant to the downstream task, resulting in a sub-optimal student model. We propose a novel quantification of the task-relevant knowledge available in the teacher using emerging information-theoretic measures from Partial Information Decomposition (PID). Going beyond classical information theory, PID decomposes the total information about a task contained in a teacher and a student into fine-grained atoms: the amount of information unique to each, redundant/common to both, or synergistic. We provide a novel alternating-optimization algorithm that efficiently distills only the task-relevant information from teacher to student, ignoring task-irrelevant information. This work formalizes the information-theoretic limits of knowledge transfer, paving the way for distilling energy-efficient, task-aware student models.
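Why classical mutual information alone is not enough can be seen in a standard toy example (illustrative only; the proposal's PID estimators are more general): if a task label is the XOR of a teacher feature and a student feature, each feature alone carries zero bits about the label, yet the pair determines it completely. PID attributes that full bit to the synergistic atom, a distinction invisible to pairwise mutual information.

```python
import numpy as np

def mutual_information(joint):
    """I(A; B) in bits for a discrete joint pmf given as a 2-D array."""
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / np.outer(pa, pb)[mask])))

# Toy channel: teacher feature T and student feature S are fair coin flips,
# and the task label is Y = T XOR S. Encode (T, S) as one 4-state variable.
p_ts_y = np.zeros((4, 2))
for t in (0, 1):
    for s in (0, 1):
        p_ts_y[2 * t + s, t ^ s] = 0.25

# Marginal joints p(T, Y) and p(S, Y).
p_t_y = np.array([p_ts_y[0] + p_ts_y[1], p_ts_y[2] + p_ts_y[3]])
p_s_y = np.array([p_ts_y[0] + p_ts_y[2], p_ts_y[1] + p_ts_y[3]])

mi_t = mutual_information(p_t_y)      # 0.0: T alone says nothing about Y
mi_s = mutual_information(p_s_y)      # 0.0: S alone says nothing about Y
mi_joint = mutual_information(p_ts_y) # 1.0: together they determine Y
print(mi_t, mi_s, mi_joint)
```

Since I(Y;T) = unique(T) + redundant and I(Y;S) = unique(S) + redundant are both zero while I(Y;T,S) is one bit, every atom except synergy vanishes here; it is exactly this kind of fine-grained accounting that lets the distillation objective target task-relevant atoms rather than raw representational similarity.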
Finally, we explore extreme distillation strategies, where the goal is to extract light-weight neural networks from large pre-trained models. LLMs demonstrate powerful few-shot learning capabilities that enable them to adapt to new tasks with minimal labeled data, thanks to their vast pretrained knowledge. Labeled data can be scarce due to factors such as the rarity of natural or medical phenomena and the high cost and subjectivity of expert annotation. However, LLMs come with massive energy consumption, high latency, and demanding hardware requirements, often preventing deployment in resource-constrained environments such as phones and specialized hardware. This research studies an extreme paradigm of few-shot distillation of large-model intelligence into light-weight architectures: directly extracting simpler neural networks (hyponets) from large pre-trained models to achieve the best of both worlds, namely being parameter-efficient while also performing well with limited training data. Our approach bridges the gap between large-model performance and efficient deployment, further advancing the frontiers of efficient and sustainable AI.
