Stochastic linear bandits are a natural and well-studied model for structured exploration/exploitation problems and are widely used in applications such as online marketing and recommendation. One of the main challenges faced by practitioners hoping to apply existing algorithms is that feedback is usually randomly delayed and the delays are only partially observable. For example, while a purchase is usually observable some time after the display, the decision not to buy is never explicitly sent to the system. In other words, the learner only observes delayed positive events. We formalize this problem as a novel stochastic delayed linear bandit and propose OTFLinUCB and OTFLinTS, two computationally efficient algorithms able to integrate new information as it becomes available and to deal with the permanently censored feedback. We prove optimal O(d\sqrt{T}) bounds on the regret of the first algorithm and study the dependency on delay-dependent parameters. Our model, assumptions, and results are validated by experiments on simulated and real data.
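To make the feedback model concrete, the following is a minimal Python sketch of what the learner can and cannot see at a given round. It is an illustration only, not the authors' implementation and not OTFLinUCB or OTFLinTS. The names `Event`, `visible_conversions`, and `ambiguous_rounds` are hypothetical, and measuring delays as whole numbers of rounds is an assumption made for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """One round of interaction (hypothetical structure, for illustration)."""
    round_played: int  # round at which the action was taken
    converted: bool    # True if the action eventually yields a positive event
    delay: int         # rounds until the positive event is reported (assumed integer)


def visible_conversions(events: List[Event], now: int) -> List[int]:
    """Rounds whose positive feedback has already arrived by round `now`."""
    return [
        e.round_played
        for e in events
        if e.converted and e.round_played + e.delay <= now
    ]


def ambiguous_rounds(events: List[Event], now: int) -> List[int]:
    """Rounds played so far with no feedback yet.

    The learner cannot tell whether these are non-conversions (which are
    never reported) or conversions that are still delayed.
    """
    seen = set(visible_conversions(events, now))
    return [
        e.round_played
        for e in events
        if e.round_played <= now and e.round_played not in seen
    ]


if __name__ == "__main__":
    history = [
        Event(round_played=1, converted=True, delay=2),
        Event(round_played=2, converted=False, delay=0),
        Event(round_played=3, converted=True, delay=5),
    ]
    print(visible_conversions(history, now=4))
    print(ambiguous_rounds(history, now=4))
```

In this sketch, round 2 (a non-conversion) stays in the ambiguous set forever, while round 3 leaves it only once its delay has elapsed. That ambiguity is what the abstract refers to as permanently censored feedback.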
Author Information
Claire Vernade (DeepMind)
Alexandra Carpentier (Otto-von-Guericke University)
Tor Lattimore (DeepMind)
Giovanni Zappella (Amazon)
Beyza Ermis (Amazon Research)
Michael Brueckner (Amazon Research Berlin)
More from the Same Authors
- 2021 : A resource-efficient method for repeated HPO and NAS problems
  Giovanni Zappella · David Salinas · Cedric Archambeau
- 2022 : Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms
  MohammadJavad Azizi · Thang Duong · Yasin Abbasi-Yadkori · Claire Vernade · András György · Mohammad Ghavamzadeh
- 2022 Poster: Contextual Information-Directed Sampling
  Botao Hao · Tor Lattimore · Chao Qin
- 2022 Spotlight: Contextual Information-Directed Sampling
  Botao Hao · Tor Lattimore · Chao Qin
- 2021 Poster: Problem Dependent View on Structured Thresholding Bandit Problems
  James Cheshire · Pierre Menard · Alexandra Carpentier
- 2021 Spotlight: Problem Dependent View on Structured Thresholding Bandit Problems
  James Cheshire · Pierre Menard · Alexandra Carpentier
- 2021 Poster: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
  Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang
- 2021 Spotlight: Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient
  Botao Hao · Yaqi Duan · Tor Lattimore · Csaba Szepesvari · Mengdi Wang
- 2021 Poster: On the Optimality of Batch Policy Optimization Algorithms
  Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2021 Spotlight: On the Optimality of Batch Policy Optimization Algorithms
  Chenjun Xiao · Yifan Wu · Jincheng Mei · Bo Dai · Tor Lattimore · Lihong Li · Csaba Szepesvari · Dale Schuurmans
- 2020 Poster: Stochastic bandits with arm-dependent delays
  Anne Gael Manegueu · Claire Vernade · Alexandra Carpentier · Michal Valko
- 2020 Poster: Non-Stationary Delayed Bandits with Intermediate Observations
  Claire Vernade · András György · Timothy Mann
- 2020 Poster: Learning with Good Feature Representations in Bandits and in RL with a Generative Model
  Tor Lattimore · Csaba Szepesvari · Gellért Weisz
- 2019 Poster: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
  Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh
- 2019 Poster: Online Learning to Rank with Features
  Shuai Li · Tor Lattimore · Csaba Szepesvari
- 2019 Oral: Online Learning to Rank with Features
  Shuai Li · Tor Lattimore · Csaba Szepesvari
- 2019 Oral: Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
  Branislav Kveton · Csaba Szepesvari · Sharan Vaswani · Zheng Wen · Tor Lattimore · Mohammad Ghavamzadeh
- 2017 Poster: On Context-Dependent Clustering of Bandits
  Claudio Gentile · Shuai Li · Purushottam Kar · Alexandros Karatzoglou · Giovanni Zappella · Evans Etrue Howard
- 2017 Talk: On Context-Dependent Clustering of Bandits
  Claudio Gentile · Shuai Li · Purushottam Kar · Alexandros Karatzoglou · Giovanni Zappella · Evans Etrue Howard