Reward learning from human feedback

Motivation Reinforcement Learning (RL) is a promising approach to many complex decision-making problems. However, one of the key challenges in applying RL to real-world scenarios is determining an appropriate reward function for the problem at hand. To alleviate this, RL from human preferences [1] aims to use human feedback to learn a reward function that is aligned with human preferences. To this end, the algorithm iteratively asks human experts for their preferences between pairs of demonstrations that are optimal under different candidate reward functions, and then uses this feedback to update its estimate of the reward function. RL from human preferences has proven successful in simulated environments and has recently also been applied to fine-tuning large language models [2].
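For concreteness, the approach in [1] models the expert's preference between two trajectory segments with a Bradley-Terry (logistic) model in the estimated returns, and fits the reward estimate by minimizing a cross-entropy loss over the collected preference labels. A sketch of these two equations, with notation loosely following [1]:

```latex
% Preference model from [1]: probability that segment \sigma^1 is preferred over \sigma^2,
% where \hat{r} is the current reward estimate and (s_t, a_t) are the segment's state-action pairs.
\[
  \hat{P}\bigl[\sigma^1 \succ \sigma^2\bigr]
    = \frac{\exp \sum_t \hat{r}(s^1_t, a^1_t)}
           {\exp \sum_t \hat{r}(s^1_t, a^1_t) + \exp \sum_t \hat{r}(s^2_t, a^2_t)}
\]
% \hat{r} is fit by minimizing the cross-entropy loss over the preference dataset D,
% where \mu distributes the human's preference label over the two segments.
\[
  \mathcal{L}(\hat{r})
    = -\sum_{(\sigma^1, \sigma^2, \mu) \in D}
        \mu(1) \log \hat{P}\bigl[\sigma^1 \succ \sigma^2\bigr]
      + \mu(2) \log \hat{P}\bigl[\sigma^2 \succ \sigma^1\bigr]
\]
```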


In a previous master's project, RL was used to teach robots to build a self-supporting structure connecting two points, for example a bridge. Determining a reward function for this task turns out to be difficult. However, human feedback may be valuable for judging whether a given demonstration is close to achieving the goal of building a stable bridge.


Outline As a first step, you will familiarize yourself with the literature on RL and reward learning from human preferences. Next, you will explore novel strategies for maximizing the information gained from expert queries in order to explore the reward space efficiently. Depending on your own interests and ideas, you will then either apply your algorithm to the bridge application or analyze its theoretical convergence and sample complexity. Another interesting and novel direction would be to add constraints to the RL problem.
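To give a flavour of what maximizing the information gained from expert queries could look like in practice, here is a minimal, purely illustrative Python sketch (not part of the project): it scores candidate pairs of trajectory segments by the disagreement of an ensemble of reward models, a common proxy for the expected information gain of a query. All names (the reward models as callables, segment_return, select_query) are hypothetical.

```python
import itertools
import numpy as np

def segment_return(reward_model, segment):
    """Sum the predicted reward of one (hypothetical) reward model over a trajectory segment."""
    return sum(reward_model(s, a) for s, a in segment)

def preference_prob(reward_model, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b under one reward model."""
    ra, rb = segment_return(reward_model, seg_a), segment_return(reward_model, seg_b)
    return 1.0 / (1.0 + np.exp(rb - ra))

def select_query(ensemble, segments):
    """Pick the pair of segments on which an ensemble of reward models disagrees the most.

    The variance of the predicted preference probabilities across the ensemble is used
    as a simple stand-in for the expected information gain of asking the expert about that pair.
    """
    best_pair, best_score = None, -np.inf
    for seg_a, seg_b in itertools.combinations(segments, 2):
        probs = np.array([preference_prob(m, seg_a, seg_b) for m in ensemble])
        score = probs.var()  # high variance = the models disagree = informative query
        if score > best_score:
            best_pair, best_score = (seg_a, seg_b), score
    return best_pair
```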


We are looking for motivated students with a strong mathematical or computer science background. We have some concrete ideas on how to tackle the above challenges, but we are always open to different suggestions. If you are interested, please send an email to [email protected] containing 1. one paragraph on your background and fit for the project, and 2. your BS and MS transcripts.

This project will be supervised by Prof. Maryam Kamgarpour, Anna Maddux ([email protected]), and Andreas Schlaginhaufen ([email protected]).

References:

  1. Christiano, Paul F., et al. “Deep reinforcement learning from human preferences.” Advances
    in Neural Information Processing Systems 30 (2017).
  2. “ChatGPT: Optimizing Language Models for Dialogue.” OpenAI, 30 Nov. 2022,
    https://openai.com/blog/chatgpt/.
  3. Cao, Haoyang, Samuel Cohen, and Lukasz Szpruch. “Identifiability in inverse reinforcement
    learning.” Advances in Neural Information Processing Systems 34 (2021): 12362-12373.