Oral
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
Chicheng Zhang · Alekh Agarwal · Hal Daume · John Langford · Sahand Negahban
We investigate the feasibility of learning from both fully labeled supervised data and contextual bandit data. We specifically consider settings in which the underlying learning signal may differ between these two data sources. Theoretically, we present no-regret algorithms, with proofs, that are robust to divergences between the two sources. Empirically, we evaluate some of these algorithms on a large selection of datasets, showing that our approaches are feasible and helpful in practice.
Author Information
Chicheng Zhang (Microsoft Research)
Alekh Agarwal (Microsoft Research)
Hal Daume (Microsoft Research)
John Langford (Microsoft Research)
Sahand Negahban (Yale)
Related Events (a corresponding poster, oral, or spotlight)
- 2019 Poster: Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
  Fri Jun 14th 01:30 -- 04:00 AM Room Pacific Ballroom
More from the Same Authors
- 2020 Workshop: Workshop on eXtreme Classification: Theory and Applications
  Anna Choromanska · John Langford · Maryam Majzoubi · Yashoteja Prabhu
- 2020 Poster: Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning
  Dipendra Misra · Mikael Henaff · Akshay Krishnamurthy · John Langford
- 2020 Poster: Feature Selection using Stochastic Gates
  Yutaro Yamada · Ofir Lindenbaum · Sahand Negahban · Yuval Kluger
- 2019 Poster: Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
  Alina Beygelzimer · David Pal · Balazs Szorenyi · Devanathan Thiruvenkatachari · Chen-Yu Wei · Chicheng Zhang
- 2019 Poster: Non-Monotonic Sequential Text Generation
  Sean Welleck · Kiante Brantley · Hal Daume · Kyunghyun Cho
- 2019 Poster: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms
  Alekh Agarwal · Miroslav Dudik · Steven Wu
- 2019 Oral: Non-Monotonic Sequential Text Generation
  Sean Welleck · Kiante Brantley · Hal Daume · Kyunghyun Cho
- 2019 Oral: Fair Regression: Quantitative Definitions and Reduction-Based Algorithms
  Alekh Agarwal · Miroslav Dudik · Steven Wu
- 2019 Oral: Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
  Alina Beygelzimer · David Pal · Balazs Szorenyi · Devanathan Thiruvenkatachari · Chen-Yu Wei · Chicheng Zhang
- 2019 Poster: Provably efficient RL with Rich Observations via Latent State Decoding
  Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
- 2019 Poster: Contextual Memory Trees
  Wen Sun · Alina Beygelzimer · Hal Daume · John Langford · Paul Mineiro
- 2019 Oral: Provably efficient RL with Rich Observations via Latent State Decoding
  Simon Du · Akshay Krishnamurthy · Nan Jiang · Alekh Agarwal · Miroslav Dudik · John Langford
- 2019 Oral: Contextual Memory Trees
  Wen Sun · Alina Beygelzimer · Hal Daume · John Langford · Paul Mineiro
- 2018 Poster: Hierarchical Imitation and Reinforcement Learning
  Hoang Le · Nan Jiang · Alekh Agarwal · Miroslav Dudik · Yisong Yue · Hal Daume
- 2018 Poster: A Reductions Approach to Fair Classification
  Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach
- 2018 Oral: Hierarchical Imitation and Reinforcement Learning
  Hoang Le · Nan Jiang · Alekh Agarwal · Miroslav Dudik · Yisong Yue · Hal Daume
- 2018 Oral: A Reductions Approach to Fair Classification
  Alekh Agarwal · Alina Beygelzimer · Miroslav Dudik · John Langford · Hanna Wallach
- 2018 Poster: Practical Contextual Bandits with Regression Oracles
  Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire
- 2018 Oral: Practical Contextual Bandits with Regression Oracles
  Dylan Foster · Alekh Agarwal · Miroslav Dudik · Haipeng Luo · Robert Schapire
- 2018 Poster: Learning Deep ResNet Blocks Sequentially using Boosting Theory
  Furong Huang · Jordan Ash · John Langford · Robert Schapire
- 2018 Oral: Learning Deep ResNet Blocks Sequentially using Boosting Theory
  Furong Huang · Jordan Ash · John Langford · Robert Schapire
- 2017 Poster: On Approximation Guarantees for Greedy Low Rank Optimization
  Rajiv Khanna · Ethan R. Elenberg · Alexandros Dimakis · Joydeep Ghosh · Sahand Negahban
- 2017 Talk: On Approximation Guarantees for Greedy Low Rank Optimization
  Rajiv Khanna · Ethan R. Elenberg · Alexandros Dimakis · Joydeep Ghosh · Sahand Negahban
- 2017 Poster: Contextual Decision Processes with low Bellman rank are PAC-Learnable
  Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2017 Poster: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
  Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik
- 2017 Talk: Contextual Decision Processes with low Bellman rank are PAC-Learnable
  Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
- 2017 Talk: Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
  Yu-Xiang Wang · Alekh Agarwal · Miroslav Dudik
- 2017 Poster: Logarithmic Time One-Against-Some
  Hal Daumé · Nikos Karampatziakis · John Langford · Paul Mineiro
- 2017 Poster: Active Learning for Cost-Sensitive Classification
  Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford
- 2017 Talk: Active Learning for Cost-Sensitive Classification
  Akshay Krishnamurthy · Alekh Agarwal · Tzu-Kuo Huang · Hal Daumé III · John Langford
- 2017 Talk: Logarithmic Time One-Against-Some
  Hal Daumé · Nikos Karampatziakis · John Langford · Paul Mineiro
- 2017 Tutorial: Real World Interactive Learning
  Alekh Agarwal · John Langford