Prof. Lacra Pavel, University of Toronto

Title: How can Passivity inform (Reinforcement) Learning in Multi-Agent Games?

Abstract: We consider a set of agents playing a game whose goal is to reach a collective configuration described by an unknown Nash equilibrium. The behaviour of different learning algorithms depends strongly on the game setting. Many recent applications based on game theory can benefit from relaxed assumptions on the players' informational requirements as well as on the structural properties of the game. Bandit feedback represents one of the weakest informational settings for which convergence can still be shown. In this talk, we show how passivity theory can inform the design of such learning algorithms. We focus on mirror descent (MD) and its variants: discounted, second-order, and a regularized bandit variant. We show that, viewed from a system-theoretic standpoint, convergence can be established by balancing passivity properties. Furthermore, such a perspective offers insights into devising novel second-order and bandit versions of MD, with convergence properties that are unachievable by unmodified MD dynamics.
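To make the discounted-MD idea concrete, below is a minimal sketch, not the speaker's exact formulation: mirror descent with an entropic mirror map (exponential weights) on an illustrative two-player zero-sum game. The payoff matrix, step size eta, and discount rate gamma are all assumptions chosen for demonstration; the discount term is the "leak" that, in the passivity view, adds dissipation to the dynamics.

```python
import numpy as np

def softmax(z):
    """Entropic mirror map: maps dual-space scores to the probability simplex."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative two-player zero-sum game (matching pennies); payoffs are assumptions.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])  # row player's payoff; column player receives -A

eta = 0.1     # step size (assumed)
gamma = 0.02  # discount rate (assumed); gamma = 0 recovers plain mirror descent

# Dual-space scores accumulated by each player.
z1 = np.zeros(2)
z2 = np.zeros(2)

for t in range(5000):
    x1, x2 = softmax(z1), softmax(z2)
    # Payoff-gradient feedback for each player, given the opponent's strategy.
    g1 = A @ x2        # row player ascends its payoff
    g2 = -A.T @ x1     # column player's payoff gradient
    # Discounted mirror descent: the -gamma*z leak damps the scores,
    # injecting dissipation that (in the passivity view) aids convergence.
    z1 += eta * (g1 - gamma * z1)
    z2 += eta * (g2 - gamma * z2)

print("row strategy:", softmax(z1))  # approaches [0.5, 0.5], the Nash equilibrium
print("col strategy:", softmax(z2))
```

With gamma = 0, the same iteration cycles around the equilibrium of this game rather than converging, which illustrates the kind of behaviour the modified variants are designed to overcome.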