Invited Speaker in Workshop: Complex feedback in online learning
Learning from Preference Feedback in Combinatorial Action Spaces
Thorsten Joachims
The feedback that users provide through their choices (e.g., clicks, purchases, streams) is one of the most common and readily available types of data for training autonomous systems. However, naively training systems on choice data has many pitfalls, since observed choices are at best boundedly rational. This leads to biased training data and, consequently, biased learning results. Instead of correcting for these biases post hoc, this talk explores how specifically designed information-acquisition interventions can eliminate biases during data collection and provide reliable feedback in the form of pairwise preferences. I will discuss how to use this preference feedback in the Dueling Bandits framework and the Coactive Learning framework, highlighting how the two frameworks differ in their reliance on human-driven versus algorithm-driven exploration.
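To make the human-driven side of this contrast concrete, here is a minimal Python sketch of a preference-perceptron-style update of the kind used in Coactive Learning. Everything in it is a hypothetical toy setup, not material from the talk: the combinatorial action space is the set of k-item subsets of a small item pool, utility is assumed linear in the summed item features, and a simulated user (`simulated_user_improvement`) answers each presented subset by swapping its worst item for the best unshown one. The learner always presents its current best guess, so any exploration comes from the user's improvement. A dueling-bandit learner would instead choose both candidates of each comparison itself.

```python
import numpy as np


def present(w, item_feats, k):
    """Argmax over k-subsets: with phi(y) = sum of item features, the
    highest-utility subset under w is simply the k top-scoring items."""
    scores = item_feats @ w
    # Stable sort so ties (e.g. w = 0 at the start) are broken by item index.
    return np.argsort(-scores, kind="stable")[:k]


def joint_feature(subset, item_feats):
    """phi(y): sum of the feature vectors of the items in the subset."""
    return item_feats[subset].sum(axis=0)


def simulated_user_improvement(shown, item_feats, w_true):
    """Hypothetical user model: swap the worst shown item (under the true
    utility) for the best unshown item, if that swap improves the subset.
    Returns the subset unchanged when no swap helps."""
    true_scores = item_feats @ w_true
    improved = list(shown)
    unshown = [i for i in range(len(item_feats)) if i not in improved]
    if not unshown:
        return np.array(improved)
    worst = min(improved, key=lambda i: true_scores[i])
    best = max(unshown, key=lambda i: true_scores[i])
    if true_scores[best] > true_scores[worst]:
        improved[improved.index(worst)] = best
    return np.array(improved)


def preference_perceptron(item_feats, w_true, k, rounds):
    """Coactive-style loop: present the current argmax y, observe the user's
    improved subset y_bar (a pairwise preference y_bar > y), and update
    w <- w + phi(y_bar) - phi(y). Returns final weights and per-round regret."""
    w = np.zeros(item_feats.shape[1])
    optimal_utility = joint_feature(present(w_true, item_feats, k), item_feats) @ w_true
    regrets = []
    for _ in range(rounds):
        y = present(w, item_feats, k)  # no explicit exploration by the learner
        y_bar = simulated_user_improvement(y, item_feats, w_true)
        w = w + joint_feature(y_bar, item_feats) - joint_feature(y, item_feats)
        regrets.append(float(optimal_utility - joint_feature(y, item_feats) @ w_true))
    return w, regrets


if __name__ == "__main__":
    feats = np.eye(4)  # toy data: one indicator feature per item
    w_true = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical true user utility
    w, regrets = preference_perceptron(feats, w_true, k=2, rounds=5)
    print(regrets)  # [4.0, 1.0, 0.0, 0.0, 0.0]
```

On this toy instance the user corrects the learner twice, after which it presents the optimal pair (items 2 and 3). From then on the user has no improvement to offer, so the update is zero.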