In many real-world sequential decision-making problems, an action may not be immediately reflected in the feedback and may spread its effects over a long time horizon. For instance, in online advertising, investing in a platform produces an increase in awareness, but the actual reward (e.g., a conversion) might occur far in the future. Furthermore, whether a conversion takes place depends on several factors: how fast the awareness grows, possible awareness-vanishing effects, and synergy or interference with other advertising platforms. Previous work has investigated the Multi-Armed Bandit framework with the possibility of delayed and aggregated feedback, without a particular structure on how an action propagates into the future, disregarding possible hidden dynamical effects. In this paper, we introduce a novel setting, Dynamical Linear Bandits (DLB), an extension of linear bandits characterized by a hidden state. When an action is performed, the learner observes a noisy reward whose mean is a linear function of the hidden state and the action, and the hidden state evolves according to linear dynamics. In this way, the effects of each action are delayed by the system evolution, persist over time, and the interplay between the action components is taken into account. After introducing the setting and discussing the notion of optimal policy, we provide an any-time optimistic regret minimization algorithm, Dynamical Linear Upper Confidence Bound (DynLin-UCB), that suffers a regret of order O(c d sqrt(T)), where c is a constant dependent on the properties of the linear dynamical evolution. Finally, we conduct a numerical validation on a synthetic environment to show the effectiveness of DynLin-UCB in comparison with bandit baselines.
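The dynamics described above can be sketched in code. The snippet below is a minimal, hypothetical simulation of a DLB-style environment: the specific parameterization (state update x_{t+1} = A x_t + B u_t, reward y_t = ⟨theta, x_t⟩ + ⟨omega, u_t⟩ + noise, and all dimensions and matrices) is an illustrative assumption; only the qualitative structure — a hidden state with linear dynamics and a noisy reward linear in state and action — comes from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: hidden state in R^n, action in R^d.
n, d = 3, 2

# Assumed parameterization (not the paper's exact model):
#   x_{t+1} = A x_t + B u_t          (hidden linear dynamics)
#   y_t     = <theta, x_t> + <omega, u_t> + noise   (observed reward)
A = 0.8 * np.eye(n)              # stable dynamics: effects persist, then vanish
B = rng.standard_normal((n, d))  # how action components inject into the state
theta = rng.standard_normal(n)   # reward weights on the hidden state
omega = rng.standard_normal(d)   # instantaneous reward weights on the action

def step(x, u, noise_std=0.1):
    """One environment transition: returns (next hidden state, noisy reward)."""
    y = theta @ x + omega @ u + noise_std * rng.standard_normal()
    x_next = A @ x + B @ u
    return x_next, y

# Playing a fixed action repeatedly: its effect on the reward is delayed
# (the first reward sees x = 0) and persists across subsequent rounds.
x = np.zeros(n)
u = np.ones(d) / np.sqrt(d)      # unit-norm action
rewards = []
for _ in range(5):
    x, y = step(x, u)
    rewards.append(y)
```

This illustrates why standard bandit algorithms struggle here: the reward observed at round t mixes the contributions of all past actions through the hidden state, which is what DynLin-UCB's optimistic estimation is designed to handle.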
Author Information
Marco Mussi (Politecnico di Milano)
Alberto Maria Metelli (Politecnico di Milano)
Marcello Restelli (Politecnico di Milano)
More from the Same Authors
- 2021 : Meta Learning the Step Size in Policy Gradient Methods »
  Luca Sabbioni · Francesco Corda · Marcello Restelli
- 2021 : Subgaussian Importance Sampling for Off-Policy Evaluation and Learning »
  Alberto Maria Metelli · Alessio Russo · Marcello Restelli
- 2021 : The Importance of Non-Markovianity in Maximum State Entropy Exploration »
  Mirco Mutti · Riccardo De Santi · Marcello Restelli
- 2021 : Efficient Inverse Reinforcement Learning of Transferable Rewards »
  Giorgia Ramponi · Alberto Maria Metelli · Marcello Restelli
- 2021 : Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection »
  Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta
- 2021 : Reward-Free Policy Space Compression for Reinforcement Learning »
  Mirco Mutti · Stefano Del Col · Marcello Restelli
- 2021 : Learning to Explore Multiple Environments without Rewards »
  Mirco Mutti · Mattia Mancassola · Marcello Restelli
- 2022 : Challenging Common Assumptions in Convex Reinforcement Learning »
  Mirco Mutti · Riccardo De Santi · Piersilvio De Bartolomeis · Marcello Restelli
- 2022 : Stochastic Rising Bandits for Online Model Selection »
  Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli
- 2022 : Invariance Discovery for Systematic Generalization in Reinforcement Learning »
  Mirco Mutti · Riccardo De Santi · Emanuele Rossi · Juan Calderon · Michael Bronstein · Marcello Restelli
- 2022 : Recursive History Representations for Unsupervised Reinforcement Learning in Multiple-Environments »
  Mirco Mutti · Pietro Maldini · Riccardo De Santi · Marcello Restelli
- 2022 : Directed Exploration via Uncertainty-Aware Critics »
  Amarildo Likmeta · Matteo Sacco · Alberto Maria Metelli · Marcello Restelli
- 2022 : Non-Markovian Policies for Unsupervised Reinforcement Learning in Multiple Environments »
  Pietro Maldini · Mirco Mutti · Riccardo De Santi · Marcello Restelli
- 2023 : A Best Arm Identification Approach for Stochastic Rising Bandits »
  Alessandro Montenegro · Marco Mussi · Francesco Trovò · Marcello Restelli · Alberto Maria Metelli
- 2023 : Parameterized projected Bellman operator »
  Théo Vincent · Alberto Maria Metelli · Jan Peters · Marcello Restelli · Carlo D'Eramo
- 2023 Poster: Dynamical Linear Bandits »
  Marco Mussi · Alberto Maria Metelli · Marcello Restelli
- 2023 Oral: Towards Theoretical Understanding of Inverse Reinforcement Learning »
  Alberto Maria Metelli · Filippo Lazzati · Marcello Restelli
- 2023 Poster: Towards Theoretical Understanding of Inverse Reinforcement Learning »
  Alberto Maria Metelli · Filippo Lazzati · Marcello Restelli
- 2023 Poster: Truncating Trajectories in Monte Carlo Reinforcement Learning »
  Riccardo Poiani · Alberto Maria Metelli · Marcello Restelli
- 2022 Poster: The Importance of Non-Markovianity in Maximum State Entropy Exploration »
  Mirco Mutti · Riccardo De Santi · Marcello Restelli
- 2022 Poster: Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning »
  Angelo Damiani · Giorgio Manganini · Alberto Maria Metelli · Marcello Restelli
- 2022 Spotlight: Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning »
  Angelo Damiani · Giorgio Manganini · Alberto Maria Metelli · Marcello Restelli
- 2022 Oral: The Importance of Non-Markovianity in Maximum State Entropy Exploration »
  Mirco Mutti · Riccardo De Santi · Marcello Restelli
- 2022 Poster: Stochastic Rising Bandits »
  Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli
- 2022 Poster: Delayed Reinforcement Learning by Imitation »
  Pierre Liotet · Davide Maran · Lorenzo Bisi · Marcello Restelli
- 2022 Spotlight: Delayed Reinforcement Learning by Imitation »
  Pierre Liotet · Davide Maran · Lorenzo Bisi · Marcello Restelli
- 2022 Spotlight: Stochastic Rising Bandits »
  Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli
- 2021 Poster: Leveraging Good Representations in Linear Contextual Bandits »
  Matteo Papini · Andrea Tirinzoni · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta
- 2021 Spotlight: Leveraging Good Representations in Linear Contextual Bandits »
  Matteo Papini · Andrea Tirinzoni · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta
- 2021 Poster: Provably Efficient Learning of Transferable Rewards »
  Alberto Maria Metelli · Giorgia Ramponi · Alessandro Concetti · Marcello Restelli
- 2021 Spotlight: Provably Efficient Learning of Transferable Rewards »
  Alberto Maria Metelli · Giorgia Ramponi · Alessandro Concetti · Marcello Restelli
- 2020 Poster: Control Frequency Adaptation via Action Persistence in Batch Reinforcement Learning »
  Alberto Maria Metelli · Flavio Mazzolini · Lorenzo Bisi · Luca Sabbioni · Marcello Restelli
- 2020 Poster: Sequential Transfer in Reinforcement Learning with a Generative Model »
  Andrea Tirinzoni · Riccardo Poiani · Marcello Restelli
- 2019 Poster: Reinforcement Learning in Configurable Continuous Environments »
  Alberto Maria Metelli · Emanuele Ghelfi · Marcello Restelli
- 2019 Oral: Reinforcement Learning in Configurable Continuous Environments »
  Alberto Maria Metelli · Emanuele Ghelfi · Marcello Restelli
- 2019 Poster: Transfer of Samples in Policy Search via Multiple Importance Sampling »
  Andrea Tirinzoni · Mattia Salvini · Marcello Restelli
- 2019 Oral: Transfer of Samples in Policy Search via Multiple Importance Sampling »
  Andrea Tirinzoni · Mattia Salvini · Marcello Restelli
- 2019 Poster: Optimistic Policy Optimization via Multiple Importance Sampling »
  Matteo Papini · Alberto Maria Metelli · Lorenzo Lupo · Marcello Restelli
- 2019 Oral: Optimistic Policy Optimization via Multiple Importance Sampling »
  Matteo Papini · Alberto Maria Metelli · Lorenzo Lupo · Marcello Restelli
- 2018 Poster: Importance Weighted Transfer of Samples in Reinforcement Learning »
  Andrea Tirinzoni · Andrea Sessa · Matteo Pirotta · Marcello Restelli
- 2018 Poster: Stochastic Variance-Reduced Policy Gradient »
  Matteo Papini · Damiano Binaghi · Giuseppe Canonaco · Matteo Pirotta · Marcello Restelli
- 2018 Poster: Configurable Markov Decision Processes »
  Alberto Maria Metelli · Mirco Mutti · Marcello Restelli
- 2018 Oral: Importance Weighted Transfer of Samples in Reinforcement Learning »
  Andrea Tirinzoni · Andrea Sessa · Matteo Pirotta · Marcello Restelli
- 2018 Oral: Configurable Markov Decision Processes »
  Alberto Maria Metelli · Mirco Mutti · Marcello Restelli
- 2018 Oral: Stochastic Variance-Reduced Policy Gradient »
  Matteo Papini · Damiano Binaghi · Giuseppe Canonaco · Matteo Pirotta · Marcello Restelli
- 2017 Poster: Boosted Fitted Q-Iteration »
  Samuele Tosatto · Matteo Pirotta · Carlo D'Eramo · Marcello Restelli
- 2017 Talk: Boosted Fitted Q-Iteration »
  Samuele Tosatto · Matteo Pirotta · Carlo D'Eramo · Marcello Restelli