| Type | Semester project |
| Split | 30% theory, 60% implementation, 10% experimentation |
| Knowledge | Machine learning, Model predictive control; Programming skills: Python |
| Subjects | Machine Learning, Optimal Control |
| Supervision | Baiyu Peng |
| Published | 12.01.2026 |

Unlike traditional Learning from Demonstration (LfD) methods that directly mimic expert trajectories, Constraint Learning from Demonstrations (CLfD) focuses on identifying from demonstrations the underlying constraints on what is and is not allowed, and then learning a controller that explicitly adheres to the learned constraints.
Previously, we developed the Positive-Unlabeled Constraint Learning (PUCL) algorithm, which recovers the constraint function through a two-step process (a sketch of the second step follows this list):
- Generate potentially unsafe trajectories using a reinforcement learning (RL) policy.
- Identify unsafe states within these trajectories by distinguishing them from the safe expert demonstrations, treating the generated data as unlabeled.
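To make the second step concrete, below is a minimal sketch of how such a positive-unlabeled classifier could look, using a standard non-negative PU risk estimator from the PU-learning literature: expert states serve as positive (safe) examples and states from generated rollouts as unlabeled data. The architecture, the class prior `prior`, and all names (`ConstraintNet`, `nnpu_loss`) are illustrative assumptions, not PUCL's actual implementation.

```python
import torch
import torch.nn as nn

class ConstraintNet(nn.Module):
    """Maps a state to the probability that it is safe (hypothetical architecture)."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states).squeeze(-1)

def nnpu_loss(p_expert: torch.Tensor, p_unlabeled: torch.Tensor,
              prior: float = 0.5) -> torch.Tensor:
    """Non-negative PU risk: expert states are labeled safe (positive);
    generated rollout states are unlabeled. `prior` is the assumed fraction
    of safe states in the unlabeled data (a hyperparameter)."""
    eps = 1e-8
    risk_pos = -torch.log(p_expert + eps).mean()               # expert treated as safe
    risk_unl_as_neg = -torch.log(1 - p_unlabeled + eps).mean() # unlabeled treated as unsafe
    risk_pos_as_neg = -torch.log(1 - p_expert + eps).mean()    # expert treated as unsafe
    # Subtract the prior-weighted positive contribution from the "all unlabeled
    # is unsafe" risk, clamping at zero so the negative risk cannot go negative.
    neg_risk = torch.clamp(risk_unl_as_neg - prior * risk_pos_as_neg, min=0.0)
    return prior * risk_pos + neg_risk
```

Training would alternate this loss with fresh rollouts from the policy; `prior` would have to be tuned or estimated, and a thresholded output of `ConstraintNet` then marks states as unsafe.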
Our current framework employs an RL policy. Despite its flexibility, RL often requires substantial training time and struggles to obtain a feasible and high-performing policy. In contrast, Model Predictive Control (MPC) is naturally suited for constraint handling, requires no training, and is therefore a promising alternative for trajectory generation in constraint learning.
In this project, the student will modify our existing code and implement an MPC policy to replace the RL policy for constraint learning (a rough sketch of such a planner follows). The learned constraints will then be used for constraint-aware motion planning in the original task and transferred to similar tasks. The performance of the MPC-based approach will be evaluated against the RL-based and Dynamical System (DS)-based methods we have studied previously.
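As a rough illustration of where the learned constraint would enter an MPC loop, the random-shooting planner below discards rollouts containing states the classifier flags as unsafe. The dynamics model `f`, stage cost, and threshold are placeholder assumptions, and `safe_prob` stands in for a learned classifier such as the one sketched above; the project's actual planner would likely use a more structured optimizer.

```python
import numpy as np

def shooting_mpc(x0, f, stage_cost, safe_prob, action_dim,
                 horizon=20, n_samples=256, u_low=-1.0, u_high=1.0,
                 threshold=0.5, rng=None):
    """Random-shooting MPC: sample open-loop action sequences, roll out the
    dynamics model f(x, u) -> x_next, discard rollouts with states the
    learned classifier deems unsafe, and return the first action of the
    cheapest surviving rollout (applied in receding-horizon fashion)."""
    rng = np.random.default_rng() if rng is None else rng
    best_cost, best_action = np.inf, np.zeros(action_dim)
    for _ in range(n_samples):
        u_seq = rng.uniform(u_low, u_high, size=(horizon, action_dim))
        x, cost, safe = x0, 0.0, True
        for u in u_seq:
            x = f(x, u)
            if safe_prob(x) < threshold:  # learned constraint check
                safe = False
                break
            cost += stage_cost(x, u)
        if safe and cost < best_cost:
            best_cost, best_action = cost, u_seq[0]
    return best_action  # falls back to a zero action if no safe rollout is found
```

The sampling version is only meant to show how a learned constraint can replace training-time constraint enforcement: safety is checked at planning time on each candidate rollout, so no RL training is needed.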
Expectations
- Proficiency in Python coding and common libraries (e.g., numpy, torch, matplotlib).
- Understanding of optimal control and model predictive control.
- Understanding of classification with neural networks.