Ph.D. Research Proposal Exam
Name: Erfaun Noorani
Professor John S. Baras (Chair)
Professor Eyad H. Abed
Professor Michael C. Fu
Date/time: Friday, December 3rd, 2021 at 8 am
Location: AVW 1146
Title: Robust Reinforcement Learning via Risk-sensitivity
The recent impressive performance of Reinforcement Learning (RL) algorithms in video games, starting with the DQN algorithm, as well as promising applications of RL systems in domains such as protein folding, robotics, traffic control, resource management, finance, interactive education, and health care, have brought RL to the forefront of current research. Current RL methods, however, have several well-known weaknesses, most notably low generalizability and brittleness (i.e., non-robustness), which have hindered the adoption of RL systems for critical applications, especially high-stakes and safety-critical real-world applications.
The robustness properties of risk-sensitive RL algorithms, coupled with their improved generalizability, are a strong indication that risk-sensitizing RL algorithms can pave the way to so-called “real-world” RL. We propose to develop single-agent Reinforcement Learning (RL), Multi-Agent Reinforcement Learning (MARL), and Human Multi-Agent Reinforcement Learning (H-MARL) systems that are generic, provide performance guarantees, and can generalize, reason, and improve in complex and unknown task environments.
In our preliminary work, we established the connection between risk-sensitive RL, (distributionally) robust RL, and regularized RL objectives (such as entropy- and KL-regularized RL), and hence a host of well-known RL algorithms, such as Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), and Maximum A Posteriori Policy Optimization (MPO). Such equivalences (I) allow us to understand several well-known RL algorithms from a risk-minimization perspective, and hence offer a taxonomy of RL algorithms based on this perspective, and (II) analytically establish the robustness and generalizability properties of risk-sensitive RL, which in turn provides a theoretical justification for the robust performance of a host of well-known RL algorithms. These results further motivate risk-sensitizing current risk-neutral RL algorithms. We also derived a Policy Gradient Theorem for the risk-sensitive “exponential of integral” control criterion and proposed a risk-sensitive Monte Carlo policy gradient algorithm as a risk-sensitive generalization of the REINFORCE policy gradient algorithm. Our simulations, together with our theoretical analysis, show that risk-sensitive RL with an appropriately chosen risk parameter not only yields a risk-sensitive policy but also reduces variance during the learning process and accelerates learning, which in turn results in a policy with a higher expected return; that is to say, risk-sensitivity leads to sample efficiency and improved performance. We also explored the use of such risk-sensitive policy gradient algorithms in independent multi-agent environments. Our simulation results show that the agents’ risk attitudes influence coordination and collaboration by shaping the agents’ learning dynamics and, if appropriately chosen, can lead to efficient learning of Hicks-optimal policies.
This suggests that risk-sensitive agents could coordinate and collaborate better, resulting in improved performance in multi-agent task environments.
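To make the risk-sensitive policy gradient concrete, the following is a minimal sketch of a Monte Carlo estimator for the exponential-of-integral criterion J(theta) = (1/beta) log E[exp(beta G)], where G is the episode return. This is an illustrative sketch only, not the proposed algorithm itself: the two-armed bandit environment, the helper names (pull, train), and all step sizes and batch sizes are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-armed bandit: both arms have mean reward 1.0, but
# arm 1 is risky (reward 2.0 or 0.0 with equal probability).
def pull(arm):
    if arm == 0:
        return 1.0
    return 2.0 if rng.random() < 0.5 else 0.0

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def train(beta, iters=500, batch=32, lr=0.5):
    """Risk-sensitive Monte Carlo policy gradient (REINFORCE-style) for
    J(theta) = (1/beta) * log E[exp(beta * G)].
    beta < 0 is risk-averse; beta -> 0 recovers risk-neutral REINFORCE."""
    theta = np.zeros(2)
    for _ in range(iters):
        probs = softmax(theta)
        weights, score_grads = [], []
        for _ in range(batch):
            a = rng.choice(2, p=probs)
            g = pull(a)                  # episode return (one step here)
            score = -probs.copy()
            score[a] += 1.0              # grad_theta log pi(a | theta)
            weights.append(np.exp(beta * g))
            score_grads.append(score)
        w = np.array(weights)
        # Self-normalized estimate of
        # grad J = E[exp(beta G) grad log pi] / (beta * E[exp(beta G)])
        grad = (w[:, None] * np.array(score_grads)).sum(axis=0) / (beta * w.sum())
        theta += lr * grad               # gradient ascent on J
    return softmax(theta)

# A risk-averse agent (beta < 0) concentrates on the deterministic arm,
# since J penalizes return variance (J ~ E[G] + (beta/2) Var[G] for small beta).
policy = train(beta=-1.0)
```

The exponential weighting exp(beta * G) is what distinguishes this update from standard REINFORCE: for beta < 0, low-return trajectories receive larger weights, and the 1/beta factor turns this into a penalty on variability.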
We propose to further extend such risk-sensitive approaches to RL algorithms with Temporal Logic constraints, and to develop risk-sensitive algorithms for (human) multi-agent environments with Temporal Logic constraints. Such development is a step toward enabling the adoption of RL systems for safety-critical, high-impact real-world applications.