Poster
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
Chicheng Zhang · Alekh Agarwal · Hal Daumé III · John Langford · Sahand Negahban
We investigate the feasibility of learning from both fully-labeled supervised data and contextual bandit data. We specifically consider settings in which the underlying learning signal may differ between these two data sources. Theoretically, we present no-regret learning algorithms, with proofs, that are robust to divergences between the two sources. Empirically, we evaluate some of these algorithms on a large selection of datasets, showing that our approaches are feasible and helpful in practice.
Author Information
Chicheng Zhang (Microsoft Research)
Alekh Agarwal (Microsoft Research)
Hal Daumé III (Microsoft Research)
John Langford (Microsoft Research)
Sahand Negahban (Yale)
Related Events (a corresponding poster, oral, or spotlight)
-
2019 Oral: Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback »
Thu. Jun 13th 11:20 -- 11:25 PM Room Hall B
More from the Same Authors
-
2021 : Provable RL with Exogenous Distractors via Multistep Inverse Dynamics »
Yonathan Efroni · Dipendra Misra · Akshay Krishnamurthy · Alekh Agarwal · John Langford -
2022 : Interaction-Grounded Learning with Action-inclusive Feedback »
Tengyang Xie · Akanksha Saran · Dylan Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford -
2023 Workshop: Interactive Learning with Implicit Human Feedback »
Andi Peng · Akanksha Saran · Andreea Bobu · Tengyang Xie · Pierre-Yves Oudeyer · Anca Dragan · John Langford -
2023 Tutorial: Discovering Agent-Centric Latent States in Theory and in Practice »
John Langford · Alex Lamb -
2023 Expo Talk Panel: Vowpal Wabbit: year in review and looking ahead in an LLM world »
John Langford · Byron Xu · Cheng Tan · Jack Gerrits · Lili Wu · Mark Rucker · Olga Vrousgou -
2022 Poster: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 Poster: Contextual Bandits with Large Action Spaces: Made Practical »
Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro -
2022 Spotlight: Contextual Bandits with Large Action Spaces: Made Practical »
Yinglun Zhu · Dylan Foster · John Langford · Paul Mineiro -
2022 Spotlight: Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning »
Alberto Bietti · Chen-Yu Wei · Miroslav Dudik · John Langford · Steven Wu -
2022 : Introduction »
John Langford -
2021 : RL + Recommender Systems Panel »
Alekh Agarwal · Ed Chi · Maria Dimakopoulou · Georgios Theocharous · Minmin Chen · Lihong Li -
2021 : RL Foundation Panel »
Matthew Botvinick · Thomas Dietterich · Leslie Kaelbling · John Langford · Warren B Powell · Csaba Szepesvari · Lihong Li · Yuxi Li -
2021 Poster: Interaction-Grounded Learning »
Tengyang Xie · John Langford · Paul Mineiro · Ida Momennejad -
2021 Spotlight: Interaction-Grounded Learning »
Tengyang Xie · John Langford · Paul Mineiro · Ida Momennejad -
2021 Poster: ChaCha for Online AutoML »
Qingyun Wu · Chi Wang · John Langford · Paul Mineiro · Marco Rossi -
2021 Spotlight: ChaCha for Online AutoML »
Qingyun Wu · Chi Wang · John Langford · Paul Mineiro · Marco Rossi -
2021 Town Hall: Town Hall »
John Langford · Marina Meila · Tong Zhang · Le Song · Stefanie Jegelka · Csaba Szepesvari -
2021 : Conclusions »
Kate Crawford · Hal Daumé III -
2021 : Political and Legal Implications »
Hal Daumé III · Kate Crawford -
2021 : Environmental Implications »
Kate Crawford · Hal Daumé III -
2021 : Social Aspects »
Kate Crawford · Hal Daumé III -
2021 : Economic Implications »
Hal Daumé III · Kate Crawford -
2021 Tutorial: Social Implications of Large Language Models »
Hal Daumé III · Kate Crawford -
2021 : Introduction »
Kate Crawford · Hal Daumé III -
2021 Expo Workshop: Real World RL: Azure Personalizer & Vowpal Wabbit »
Sheetal Lahabar · Etienne Kintzler · Mark Rucker · Bogdan Mazoure · Qingyun Wu · Pavithra Srinath · Jack Gerrits · Olga Vrousgou · John Langford · Eduardo Salinas -
2020 : Discussion Panel »
Krzysztof Dembczynski · Prateek Jain · Alina Beygelzimer · Inderjit Dhillon · Anna Choromanska · Maryam Majzoubi · Yashoteja Prabhu · John Langford -
2020 Workshop: Workshop on eXtreme Classification: Theory and Applications »
Anna Choromanska · John Langford · Maryam Majzoubi · Yashoteja Prabhu -
2020 Poster: Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning »
Dipendra Kumar Misra · Mikael Henaff · Akshay Krishnamurthy · John Langford -
2020 Poster: Feature Selection using Stochastic Gates »
Yutaro Yamada · Ofir Lindenbaum · Sahand Negahban · Yuval Kluger -
2019 : panel discussion with Craig Boutilier (Google Research), Emma Brunskill (Stanford), Chelsea Finn (Google Brain, Stanford, UC Berkeley), Mohammad Ghavamzadeh (Facebook AI), John Langford (Microsoft Research) and David Silver (DeepMind) »
Peter Stone · Craig Boutilier · Emma Brunskill · Chelsea Finn · John Langford · David Silver · Mohammad Ghavamzadeh -
2019 : Poster Session 1 (all papers) »
Matilde Gargiani · Yochai Zur · Chaim Baskin · Evgenii Zheltonozhskii · Liam Li · Ameet Talwalkar · Xuedong Shang · Harkirat Singh Behl · Atilim Gunes Baydin · Ivo Couckuyt · Tom Dhaene · Chieh Lin · Wei Wei · Min Sun · Orchid Majumder · Michele Donini · Yoshihiko Ozaki · Ryan P. Adams · Christian Geißler · Ping Luo · zhanglin peng · Ruimao Zhang · John Langford · Rich Caruana · Debadeepta Dey · Charles Weill · Xavi Gonzalvo · Scott Yang · Scott Yak · Eugen Hotaj · Vladimir Macko · Mehryar Mohri · Corinna Cortes · Stefan Webb · Jonathan Chen · Martin Jankowiak · Noah Goodman · Aaron Klein · Frank Hutter · Mojan Javaheripi · Mohammad Samragh · Sungbin Lim · Taesup Kim · SUNGWOONG KIM · Michael Volpp · Iddo Drori · Yamuna Krishnamurthy · Kyunghyun Cho · Stanislaw Jastrzebski · Quentin de Laroussilhe · Mingxing Tan · Xiao Ma · Neil Houlsby · Andrea Gesmundo · Zalán Borsos · Krzysztof Maziarz · Felipe Petroski Such · Joel Lehman · Kenneth Stanley · Jeff Clune · Pieter Gijsbers · Joaquin Vanschoren · Felix Mohr · Eyke Hüllermeier · Zheng Xiong · Wenpeng Zhang · Wenwu Zhu · Weijia Shao · Aleksandra Faust · Michal Valko · Michael Y Li · Hugo Jair Escalante · Marcel Wever · Andrey Khorlin · Tara Javidi · Anthony Francis · Saurajit Mukherjee · Jungtaek Kim · Michael McCourt · Saehoon Kim · Tackgeun You · Seungjin Choi · Nicolas Knudde · Alexander Tornede · Ghassen Jerfel -
2019 : invited talk by John Langford (Microsoft Research): How do we make Real World Reinforcement Learning revolution? »
John Langford -
2019 Poster: Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case »
Alina Beygelzimer · David Pal · Balazs Szorenyi · Devanathan Thiruvenkatachari · Chen-Yu Wei · Chicheng Zhang -
2019 Poster: Non-Monotonic Sequential Text Generation »
Sean Welleck · Kiante Brantley · Hal Daumé III · Kyunghyun Cho -
2019 Poster: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms »
Alekh Agarwal · Miroslav Dudik · Steven Wu -
2019 Oral: Non-Monotonic Sequential Text Generation »
Sean Welleck · Kiante Brantley · Hal Daumé III · Kyunghyun Cho -
2019 Oral: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms »
Alekh Agarwal · Miroslav Dudik · Steven Wu -
2019 Oral: Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case »
Alina Beygelzimer · David Pal · Balazs Szorenyi · Devanathan Thiruvenkatachari · Chen-Yu Wei · Chicheng Zhang -
2019 Poster: Provably efficient RL with Rich Observations via Latent State Decoding »
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford -
2019 Poster: Contextual Memory Trees »
Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro -
2019 Oral: Provably efficient RL with Rich Observations via Latent State Decoding »
Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford -
2019 Oral: Contextual Memory Trees »
Wen Sun · Alina Beygelzimer · Hal Daumé III · John Langford · Paul Mineiro -
2018 Poster: Hierarchical Imitation and Reinforcement Learning »
Hoang Le · Nan Jiang · Alekh Agarwal · Miroslav Dudik · Yisong Yue · Hal Daumé III -
2018 Poster: A Reductions Approach to Fair Classification »
Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach -
2018 Oral: Hierarchical Imitation and Reinforcement Learning »
Hoang Le · Nan Jiang · Alekh Agarwal · Miroslav Dudik · Yisong Yue · Hal Daumé III -
2018 Oral: A Reductions Approach to Fair Classification »
Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach -
2018 Poster: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Oral: Practical Contextual Bandits with Regression Oracles »
Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire -
2018 Poster: Learning Deep ResNet Blocks Sequentially using Boosting Theory »
Furong Huang · Jordan Ash · John Langford · Robert Schapire -
2018 Oral: Learning Deep ResNet Blocks Sequentially using Boosting Theory »
Furong Huang · Jordan Ash · John Langford · Robert Schapire -
2017 : Corralling a Band of Bandit Algorithms »
Alekh Agarwal -
2017 Poster: On Approximation Guarantees for Greedy Low Rank Optimization »
RAJIV KHANNA · Ethan R. Elenberg · Alexandros Dimakis · Joydeep Ghosh · Sahand Negahban -
2017 Talk: On Approximation Guarantees for Greedy Low Rank Optimization »
RAJIV KHANNA · Ethan R. Elenberg · Alexandros Dimakis · Joydeep Ghosh · Sahand Negahban -
2017 Poster: Contextual Decision Processes with low Bellman rank are PAC-Learnable »
Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire -
2017 Poster: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik -
2017 Talk: Contextual Decision Processes with low Bellman rank are PAC-Learnable »
Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire -
2017 Talk: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits »
Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik -
2017 Poster: Logarithmic Time One-Against-Some »
Hal Daumé III · Nikos Karampatziakis · John Langford · Paul Mineiro -
2017 Poster: Active Learning for Cost-Sensitive Classification »
Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford -
2017 Talk: Active Learning for Cost-Sensitive Classification »
Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford -
2017 Talk: Logarithmic Time One-Against-Some »
Hal Daumé III · Nikos Karampatziakis · John Langford · Paul Mineiro -
2017 Tutorial: Real World Interactive Learning »
Alekh Agarwal · John Langford