Invited talk in Workshop: Interactive Learning with Implicit Human Feedback
Paul Mineiro: Contextual Bandits without Rewards
Abstract:
Contextual bandits are highly practical, but the need to specify a scalar reward limits their adoption. This motivates the study of contextual bandits in which a latent reward must be inferred from post-decision observables, also known as Interaction-Grounded Learning (IGL). An information-theoretic argument indicates that additional assumptions are needed to succeed, and I review some sufficient conditions from the recent literature. I conclude with speculation about composing IGL with active learning.