EE-568 – Slides 2025 – LIONS – EPFL

Outline

The 2025 course consists of the following topics:

Lecture 01

Introduction to Reinforcement Learning + Dynamic Programming I
Definition of Markov Decision Processes, policies, and performance criteria.
Dynamic programming with known transition dynamics: Value Iteration, Policy Iteration.

Lecture 02

Dynamic Programming II
Dynamic programming with unknown transition dynamics: Q-Learning.

Lecture 03

Linear Programming
Algorithms based on the primal and dual Linear Programming formulations of RL: constraint
sampling, REPS, and DICE methods.

Lecture 04

Policy Gradient I
Policy parameterization, REINFORCE, and techniques for computing unbiased estimators of
the policy gradient.

Lecture 05

Policy Gradient II
Non-concavity of the policy gradient objective, global convergence of projected gradient
descent, global convergence of natural policy gradient, TRPO and PPO.

Lecture 06

Deep and Robust Reinforcement Learning
Importance of robustness in RL; robust RL as a zero-sum Markov game.

Lecture 07

Imitation Learning
Motivation, setting, maximum causal entropy IRL, GAIL, and LP approaches.

Lecture 08

Alignment and Reasoning with Reinforcement Learning
Brief introduction to language models, alignment, RLHF, and reasoning, including reasoning
in modern models (OpenAI o1, DeepSeek-R1).

Final lecture

Project Presentations