Timezone: »
We study decision making problems in which an agent sequentially interacts with a stochastic environment defined by means of a tree structure. The agent repeatedly faces the environment over time, and, after each round, it perceives a utility and a cost, which are both stochastic. The goal of the agent is to learn an optimal strategy in an online fashion, while, at the same time, keeping costs below a given safety threshold. Our model naturally fits many real-world scenarios, such as, e.g., opponent exploitation in games and web link selection. We study the hard-threshold problem of achieving sublinear regret while guaranteeing that the threshold constraint is satisfied at every iteration with high probability. First, we show that, in general, any algorithm with such a guarantee incurs in a linear regret. This motivates the introduction of a relaxed problem, namely the soft-threshold problem, in which we only require that the cumulative violation of the threshold constraint grows sublinearly, and, thus, we can provide an algorithm with sublinear regret. Next, we show how, in the hard-threshold problem, a sublinear regret algorithm can be designed under the additional assumption that there exists a known strategy strictly satisfying the threshold constraint. We also show that our regret bounds are tight. Finally, we cast the opponent exploitation problem to our model, and we experimentally evaluate our algorithms on a standard testbed of games.
Author Information
Martino Bernasconi (Politecnico di Milano)
Federico Cacciamani (Politecnico di Milano)
Matteo Castiglioni (Politecnico di Milano)
Alberto Marchesi (Politecnico di Milano)
Nicola Gatti (Politecnico di Milano)
Francesco Trovò (Politecnico di Milano)
Related Events (a corresponding poster, oral, or spotlight)
-
2022 Spotlight: Safe Learning in Tree-Form Sequential Decision Making: Handling Hard and Soft Constraints »
Thu. Jul 21st 05:50 -- 05:55 PM Room Ballroom 3 & 4
More from the Same Authors
-
2022 : Stochastic Rising Bandits for Online Model Selection »
Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli -
2023 Poster: Constrained Phi-Equilibria »
Matteo Castiglioni · Martino Bernasconi · Alberto Marchesi · Francesco Trovò · Nicola Gatti -
2023 Poster: Online Mechanism Design for Information Acquisition »
Federico Cacciamani · Matteo Castiglioni · Nicola Gatti -
2023 Poster: Optimal Rates and Efficient Algorithms for Online Bayesian Persuasion »
Martino Bernasconi · Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Francesco Trovò · Nicola Gatti -
2022 Poster: Online Learning with Knapsacks: the Best of Both Worlds »
Matteo Castiglioni · Andrea Celli · Christian Kroer -
2022 Poster: A Marriage between Adversarial Team Games and 2-player Games: Enabling Abstractions, No-regret Learning, and Subgame Solving »
Luca Carminati · Federico Cacciamani · Marco Ciccone · Nicola Gatti -
2022 Spotlight: Online Learning with Knapsacks: the Best of Both Worlds »
Matteo Castiglioni · Andrea Celli · Christian Kroer -
2022 Spotlight: A Marriage between Adversarial Team Games and 2-player Games: Enabling Abstractions, No-regret Learning, and Subgame Solving »
Luca Carminati · Federico Cacciamani · Marco Ciccone · Nicola Gatti -
2022 Poster: Stochastic Rising Bandits »
Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli -
2022 Spotlight: Stochastic Rising Bandits »
Alberto Maria Metelli · Francesco Trovò · Matteo Pirola · Marcello Restelli -
2021 Poster: Multi-Receiver Online Bayesian Persuasion »
Matteo Castiglioni · Alberto Marchesi · Andrea Celli · Nicola Gatti -
2021 Spotlight: Multi-Receiver Online Bayesian Persuasion »
Matteo Castiglioni · Alberto Marchesi · Andrea Celli · Nicola Gatti -
2021 Poster: Connecting Optimal Ex-Ante Collusion in Teams to Extensive-Form Correlation: Faster Algorithms and Positive Complexity Results »
Gabriele Farina · Andrea Celli · Nicola Gatti · Tuomas Sandholm -
2021 Spotlight: Connecting Optimal Ex-Ante Collusion in Teams to Extensive-Form Correlation: Faster Algorithms and Positive Complexity Results »
Gabriele Farina · Andrea Celli · Nicola Gatti · Tuomas Sandholm