Learning to maximize social welfare from human preferences

Motivation: Shared decision-making is at the heart of any multi-agent system. Each agent independently aims to maximize its own reward, which depends on the decisions of all agents. The outcome of such independent decision-making, however, is often inefficient in terms of social welfare and may violate other central objectives such as fairness. In traffic routing, for example, each driver independently aims to minimize their own travel time, which can result in traffic jams. Routing under a central authority, by contrast, can reduce the aggregate travel time of all drivers. The field of mechanism design studies how a central authority can incentivize independent decision-makers towards an outcome aligned with social welfare, possibly by imposing a cost on each agent. A key challenge is that the social welfare function can be complex, taking several possibly conflicting objectives into account, and may thus be a priori unknown. To address this, this master's project aims to incorporate feedback from the central authority in order to learn a social welfare function that is aligned with the authority's preferences.

Outline: In a first step, you will familiarize yourself with the relevant literature on mechanism design (see e.g. [1,2]) and preference-based optimization (see e.g. [3,4,5]). Next, the goal is to design an algorithm that iteratively learns the social welfare function from the preferences of the central authority (preference-based optimization). The parameters of each agent's utility are then adapted based on the updated social welfare function such that the resulting Nash equilibrium maximizes it (mechanism design). Depending on your interests and background, you can then either focus on practical applications, such as the traffic routing problem, and evaluate how your algorithm performs given actual human preferences, or analyze the theoretical convergence and sample complexity of your algorithm.
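To make the two steps concrete, here is a minimal sketch on a toy two-route routing game. Everything in it is an illustrative assumption rather than the project's prescribed method: the network, the latency functions, the feature-based welfare model, the Bradley-Terry preference update, and the grid search over tolls are all placeholder choices.

```python
import numpy as np

# Toy two-route congestion game: a fraction x of drivers takes route 1 with
# latency l1(x) = x; the rest take route 2 with constant latency l2 = 1.
def equilibrium_flow(toll):
    # With a toll on route 1, drivers equalize perceived costs:
    # x + toll = 1  =>  x = 1 - toll (clipped to the feasible range).
    return float(np.clip(1.0 - toll, 0.0, 1.0))

def features(x):
    # Two candidate welfare criteria (an assumed parameterization):
    # negative total travel time (efficiency) and negative latency gap
    # between the two routes (a crude fairness proxy).
    total_time = x * x + (1.0 - x) * 1.0
    gap = abs(x - 1.0)
    return np.array([-total_time, -gap])

# Hypothetical "true" preference weights of the central authority,
# unknown to the learner; welfare(x) = THETA_TRUE . features(x).
THETA_TRUE = np.array([1.0, 0.3])

def true_welfare(x):
    return float(THETA_TRUE @ features(x))

# Step 1 (preference-based optimization): learn the welfare weights from
# pairwise comparisons under a Bradley-Terry model,
# P(x_a preferred over x_b) = sigmoid(theta . (phi(x_a) - phi(x_b))).
rng = np.random.default_rng(0)
theta = np.zeros(2)
lr = 0.5
for _ in range(2000):
    xa, xb = rng.uniform(0.0, 1.0, size=2)   # two candidate outcomes
    d = features(xa) - features(xb)
    label = 1.0 if true_welfare(xa) > true_welfare(xb) else 0.0  # feedback
    p = 1.0 / (1.0 + np.exp(-theta @ d))
    theta += lr * (label - p) * d            # logistic-regression SGD step

# Step 2 (mechanism design): pick the toll whose induced Nash equilibrium
# maximizes the *learned* welfare, here by simple grid search.
tolls = np.linspace(0.0, 1.0, 101)
best_toll = max(tolls, key=lambda t: theta @ features(equilibrium_flow(t)))
x_star = equilibrium_flow(best_toll)
```

In this toy model the untolled equilibrium sends all drivers to route 1 (x = 1), while the learned toll shifts the equilibrium towards the flow that maximizes the learned welfare; with efficiency-only preferences this recovers classical marginal-cost pricing. A real instantiation would replace the grid search with a principled incentive-design step and account for noisy human feedback.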

Requirements: We seek motivated students with a strong mathematical or computer science background. We have some concrete ideas on how to tackle the above challenges, but we are always open to different suggestions. If you are interested, please send an email containing (1) one paragraph on your background and fit for the project and (2) your BS and MS transcripts to [email protected] and [email protected].

This project will be supervised by Prof. Maryam Kamgarpour ([email protected]), Anna Maddux ([email protected]), and Andreas Schlaginhaufen ([email protected]).

References:

  1. Marden, Jason R., and Adam Wierman. “Distributed welfare games with applications to sensor coverage.” 2008 47th IEEE Conference on Decision and Control. IEEE, 2008.
  2. Sessa, Pier Giuseppe, et al. “Online Submodular Resource Allocation with Applications to Rebalancing Shared Mobility Systems.” International Conference on Machine Learning. PMLR, 2021.
  3. Gonzalez, Javier, et al. “Preferential Bayesian Optimization.” International Conference on Machine Learning. PMLR, 2017.
  4. Yue, Yisong, et al. “The k-armed dueling bandits problem.” Journal of Computer and System Sciences 78.5 (2012): 1538-1556.
  5. The Many Facets of Preference-Based Learning. Workshop at the International Conference on Machine Learning (ICML), 2023.