Prof. Semih Cayci, RWTH Aachen

Title: Partially Observable Reinforcement Learning: Memory, Approximation, and Guarantees

Abstract: Many control and multi-agent systems involve decision-making under partial information. Even in the single-agent case, partial observability fundamentally changes the structure of optimal control and learning: optimal policies depend on action-observation histories and therefore require internal memory. This tutorial provides a theory-driven overview of reinforcement learning for POMDPs, with a focus on structured memory mechanisms (finite-state controllers) and recurrent policies (e.g., RNN-based) for history compression. Using tools from learning theory and filter stability, I will present non-asymptotic performance guarantees for natural policy gradient and temporal-difference learning methods with internal memory, and show how finite-memory approximation, inference, and statistical errors can be explicitly quantified and controlled. Finally, I will connect these insights from partially observable reinforcement learning to learning in dynamic games with partial or asymmetric information, and conclude with concrete open problems at the interface of control, learning, and game theory.

Brief bio: Semih Cayci is a tenure-track Assistant Professor in the Department of Mathematics at RWTH Aachen University, Germany. Previously, he was an NSF TRIPODS Postdoctoral Fellow at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. His research focuses on the theoretical and algorithmic foundations of reinforcement learning, deep learning theory, and optimization.