Invited talk in Workshop: Interactive Learning with Implicit Human Feedback

Paul Mineiro: Contextual Bandits without Rewards


Abstract:

Contextual bandits are highly practical, but the need to specify a scalar reward limits their adoption. This motivates the study of contextual bandits in which a latent reward must be inferred from post-decision observables, also known as Interactive Grounded Learning (IGL). An information-theoretic argument indicates that additional assumptions are needed to succeed, and I review some sufficient conditions from the recent literature. I conclude with speculation about composing IGL with active learning.
