In the past decade, contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static, in the sense that they do not change across environments. In many real-world systems, however, the mechanisms are subject to shifts across environments, which may invalidate the static environment assumption. In this paper, we tackle the problem of environmental shifts under the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved confounders are present and show that, in that case, an optimal invariant policy is guaranteed to generalize across environments under suitable assumptions. Our results not only provide a solution to the environmental shift problem but also establish concrete connections among causality, invariance, and contextual bandits.
Author Information
Sorawit Saengkyongam (University of Copenhagen)
Nikolaj Thams (University of Copenhagen)
Jonas Peters (University of Copenhagen)
Niklas Pfister (University of Copenhagen)
More from the Same Authors
- 2022: Evaluating Robustness to Dataset Shift via Parametric Robustness Sets
  Michael Oberst · Nikolaj Thams · David Sontag
- 2022: Evaluating Robustness to Dataset Shift via Parametric Robustness Sets
  Nikolaj Thams · Michael Oberst · David Sontag
- 2022 Poster: Invariant Ancestry Search
  Phillip Bredahl Mogensen · Nikolaj Thams · Jonas Peters
- 2022 Spotlight: Invariant Ancestry Search
  Phillip Bredahl Mogensen · Nikolaj Thams · Jonas Peters
- 2022 Poster: Exploiting Independent Instruments: Identification and Distribution Generalization
  Sorawit Saengkyongam · Leonard Henckel · Niklas Pfister · Jonas Peters
- 2022 Spotlight: Exploiting Independent Instruments: Identification and Distribution Generalization
  Sorawit Saengkyongam · Leonard Henckel · Niklas Pfister · Jonas Peters
- 2021 Poster: Regularizing towards Causal Invariance: Linear Models with Proxies
  Michael Oberst · Nikolaj Thams · Jonas Peters · David Sontag
- 2021 Spotlight: Regularizing towards Causal Invariance: Linear Models with Proxies
  Michael Oberst · Nikolaj Thams · Jonas Peters · David Sontag