Safe reinforcement learning from single agent to multiagent ‒ UPKAMGARPOUR ‐ EPFL

Reinforcement Learning (RL) studies sequential decision-making problems in which an agent aims to maximize an expected cumulative reward by interacting with an unknown environment. While RL has achieved impressive success in domains like video games and board games, safety concerns arise when applying RL to real-world problems, such as autonomous driving, robotics, power systems, and cyber-security. This project proposal aims to explore the field of safe RL, addressing the challenges of sample complexity, stability, and theoretical guarantees.

Outline: This project contains the following potential directions:

Designing Safe RL Algorithms: Explore and propose approaches for safe reinforcement learning that address the challenges of sample complexity, stability, and safety guarantees.
Theoretical Advancements: Advance the theoretical foundations of safe RL algorithms, focusing on a specific class of RL methods, such as policy-based methods, actor-critic methods, or other relevant approaches. Develop provable guarantees for the safety and stability of these algorithms.
Implementation and Evaluation: Implement the proposed safe RL algorithms in a realistic simulation environment. Evaluate their performance and compare them against existing methods using safety benchmarks.
Multi-Agent Extension: Extend the approaches above from the single-agent setting to the multi-agent setting.
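
For orientation, safe RL is often formalized as a constrained Markov decision process. The sketch below is one common formulation and is not necessarily the one this project will adopt. The discount factor \gamma, the reward r, the safety cost c, the threshold b, the multiplier \lambda, and the step size \eta are illustrative symbols that do not appear in the description above.

```latex
% Constrained MDP: maximize expected discounted reward subject to a bound on expected discounted cost.
\begin{align}
  \max_{\pi} \quad & J_r(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \right] \\
  \text{s.t.} \quad & J_c(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t) \right] \le b .
\end{align}

% Lagrangian relaxation with multiplier \lambda \ge 0.
\begin{equation}
  L(\pi, \lambda) = J_r(\pi) - \lambda \left( J_c(\pi) - b \right)
\end{equation}

% One possible primal-dual scheme: the policy ascends L while the multiplier
% grows whenever the constraint is violated; [x]_+ = \max(x, 0).
\begin{equation}
  \lambda_{k+1} = \left[ \lambda_k + \eta \left( J_c(\pi_k) - b \right) \right]_{+}
\end{equation}
```

In this view, the policy update is a gradient ascent step on L with respect to the policy parameters. Questions such as whether the constraint also holds during learning, and how many samples are needed, are the kind of sample complexity and safety guarantees the directions above refer to.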

Requirement: We seek motivated students with a strong background in mathematics or computer science. We have some concrete ideas on how to tackle the above challenges, but we are always open to other suggestions. If you are interested, please send an email containing 1. one paragraph on your background and fit for the project, 2. your BS and MS transcripts to [email protected]. Students with a suitable track record will be contacted.

This project will be supervised by Prof. Maryam Kamgarpour and Tingting Ni ([email protected]).

References:

Ying, D., Zhang, Y., Ding, Y., Koppel, A., & Lavaei, J. (2023). Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities. arXiv preprint arXiv:2305.17568. https://arxiv.org/pdf/2305.17568.pdf
Canese, L., Cardarilli, G. C., Di Nunzio, L., Fazzolari, R., Giardino, D., Re, M., & Spanò, S. (2021). Multi-agent reinforcement learning: A review of challenges and applications. Applied Sciences, 11(11), 4948. https://www.mdpi.com/2076-3417/11/11/4948
Agarwal, Alekh, et al. “On the theory of policy gradient methods: Optimality, approximation, and distribution shift.” The Journal of Machine Learning Research 22.1 (2021): 4431-4506.
Boob, Digvijay, Qi Deng, and Guanghui Lan. “Stochastic first-order methods for convex and nonconvex functional constrained optimization.” Mathematical Programming 197.1 (2023): 215-279.
Muehlebach, Michael, and Michael I. Jordan. “On constraints in first-order optimization: A view from non-smooth dynamical systems.” The Journal of Machine Learning Research 23.1 (2022): 11681-11727.