Active Preference-Based Learning of Reward Functions
Top 1% of 2017 papers
Abstract
Our goal is to efficiently learn reward functions encoding a human's preferences for how a dynamical system should act. There are two challenges with this. First, in many problems it is difficult for people to provide demonstrations of the desired system trajectory (like a high-DOF robot arm motion or an aggressive driving maneuver), or to even assign how much numerical reward an action or trajectory should get. We build on work in label ranking and propose to learn from preferences (or comparisons) instead: the person provides the system a relative preference between two trajectories. Second, the learned reward function strongly depends on what environments and trajectories were experienced during the training phase. We thus take an active learning approach, in which the system decides on what preference queries to make. A novel aspect of our work is the complexity and continuous nature of the queries: continuous trajectories of a dynamical system in environments with other moving agents (humans or robots). We contribute a method for actively synthesizing queries that satisfy the dynamics of the system. Further, we learn the reward function from a continuous hypothesis space by maximizing the volume removed from the hypothesis space by each query. We assign weights to the hypothesis space in the form of a log-concave distribution and provide a bound on the number of iterations required to converge. In an autonomous driving domain, we show that our algorithm converges faster to the desired reward than approaches that are not active or that do not synthesize queries. We then run a user study to put our method to the test with real people.
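The core idea of the abstract can be illustrated with a toy sketch: represent the reward as a linear function of trajectory features, maintain a sampled hypothesis set over the weight vector, and at each step pick the comparison query whose worst-case answer removes the most hypotheses (a sample-based stand-in for the paper's volume-removal criterion over a log-concave distribution). Everything here — the feature dimension, the candidate-query pool, the sample-based hypothesis set, and the hard consistency update — is a simplifying assumption for illustration, not the authors' actual algorithm, which synthesizes dynamically feasible trajectory queries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: reward(trajectory) = w . phi(trajectory), where phi is
# a trajectory feature vector and the true weight vector w is unknown.
d = 3
w_true = np.array([0.8, -0.5, 0.3])
w_true /= np.linalg.norm(w_true)

# Sample-based hypothesis space over unit weight vectors (a crude stand-in
# for the paper's continuous, log-concave-weighted hypothesis space).
W = rng.normal(size=(5000, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)

def preference(w, phi_a, phi_b):
    """Simulated human answer: +1 if A is preferred over B, else -1."""
    return 1 if w @ (phi_a - phi_b) > 0 else -1

# A fixed pool of candidate queries, each a pair of feature vectors
# (the paper instead synthesizes queries respecting system dynamics).
candidates = [tuple(rng.normal(size=(2, d))) for _ in range(200)]

for _ in range(8):
    if len(W) <= 10:          # stop before the sample set degenerates
        break

    def min_removed(query):
        # Worst-case number of hypotheses removed over the two answers:
        # the max-min criterion behind volume-removal query selection.
        phi_a, phi_b = query
        n_prefer_a = int(np.sum(W @ (phi_a - phi_b) > 0))
        return min(n_prefer_a, len(W) - n_prefer_a)

    phi_a, phi_b = max(candidates, key=min_removed)
    answer = preference(w_true, phi_a, phi_b)
    # Hard update: keep only hypotheses consistent with the answer.
    W = W[np.sign(W @ (phi_a - phi_b)) * answer > 0]

# Point estimate: mean of the surviving hypotheses, renormalized.
w_hat = W.mean(axis=0)
w_hat /= np.linalg.norm(w_hat)
print("cosine similarity to true w:", float(w_hat @ w_true))
```

Because each near-balanced query eliminates roughly half of the surviving hypotheses, the estimate aligns with the true weights after only a handful of comparisons — the intuition behind the convergence bound the abstract mentions.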