Safe reinforcement learning

Reinforcement learning addresses the problem of finding a policy for a Markov decision process (MDP) that optimises a cumulative reward function, based on observations of the rewards and of the evolution of the MDP. Applying reinforcement learning to safety-critical systems such as autonomous driving and robotics requires safety, that is, satisfying constraints while learning (e.g. collision avoidance). Most existing reinforcement learning algorithms for constrained systems assume that the constraints of the problem are known. In many realistic applications, however,
the constraints themselves must be learned online (e.g. the locations of other cars or robots). We therefore need algorithms for "safe" reinforcement learning. The objective of this project is to develop and implement algorithms for safe reinforcement learning in a realistic testbed.
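To make the constrained setting concrete, the sketch below illustrates one common approach, Lagrangian relaxation, on a toy gridworld: a tabular Q-learner maximises reward while a Lagrange multiplier is adapted by dual ascent to keep the per-episode cost of visiting hazard cells under a budget. All names and parameters (SIZE, UNSAFE, COST_BUDGET, the learning rates) are illustrative assumptions, not part of the project specification; in particular the hazard set is assumed known here, whereas the project concerns constraints that must themselves be learned online.

```python
import numpy as np

# Toy 4x4 gridworld with hazard cells; a safe detour of equal length
# exists along the edges, so the constraint is satisfiable.
SIZE = 4
GOAL = (3, 3)
UNSAFE = {(1, 1), (2, 2)}           # assumed hazard cells (known here, learned online in the project)
COST_BUDGET = 0.05                  # allowed cost per episode
GAMMA, ALPHA, ETA = 0.95, 0.2, 0.02  # discount, Q-learning rate, dual-ascent rate
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

rng = np.random.default_rng(0)
Q = np.zeros((SIZE, SIZE, len(MOVES)))
lam = 0.0                           # Lagrange multiplier for the safety constraint

def step(s, a):
    r, c = s
    dr, dc = MOVES[a]
    s2 = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    reward = 1.0 if s2 == GOAL else -0.01   # small step penalty favours short paths
    cost = 1.0 if s2 in UNSAFE else 0.0
    return s2, reward, cost, s2 == GOAL

for episode in range(5000):
    s, ep_cost, done, t = (0, 0), 0.0, False, 0
    eps = max(0.05, 1.0 - episode / 2500)   # decaying epsilon-greedy exploration
    while not done and t < 50:
        a = rng.integers(len(MOVES)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, rew, cost, done = step(s, a)
        # Q-learning on the Lagrangian reward rew - lam * cost
        target = (rew - lam * cost) + (0.0 if done else GAMMA * np.max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        ep_cost += cost
        s, t = s2, t + 1
    # Dual ascent: raise lam only while the episode exceeds the cost budget
    lam = max(0.0, lam + ETA * (ep_cost - COST_BUDGET))

print(f"final multiplier lam = {lam:.2f}")
```

The dual-ascent update increases the penalty on unsafe transitions only while the constraint is violated; this primal-dual recipe for constrained MDPs is one of several candidate approaches the student could explore.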
The student will first explore potential approaches to safe reinforcement learning and then implement the chosen algorithms in a realistic simulation environment. A theoretically inclined student can work on advancing the theory of safety for a particular class of reinforcement learning algorithms, while a practically inclined student can implement existing algorithms on a robotic testbed.
This project requires a strong background in reinforcement learning and in optimisation theory, and the student will have the chance to improve their skills in both of these areas. The project can have a more theoretical or a more applied focus, depending on the student's background, and can be taken as a semester or Master's project.


To apply, send an email to [email protected] containing (1) one paragraph on your background and fit for the project, and (2) your BS and MS transcripts.

Students with a suitable track record will be contacted.