
Invited Speaker
Workshop: Complex feedback in online learning

Learning from Preference Feedback in Combinatorial Action Spaces

Thorsten Joachims


The feedback that users provide through their choices (e.g., clicks, purchases, streams) is one of the most common types of data readily available for training autonomous systems. However, naively training systems on choice data has many pitfalls, since the observed choices are boundedly rational at best, which leads to biased training data and consequently to biased learning results. Instead of correcting for these biases post hoc, this talk explores how specifically designed information-acquisition interventions can eliminate biases during data collection and provide reliable feedback in the form of pairwise preferences. I will discuss how to use this preference feedback in the Dueling Bandits framework and the Coactive Learning framework, highlighting how these two frameworks differ in their reliance on human-driven exploration versus algorithm-driven exploration.
