Poster
Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality
Dhruv Malik · Conor Igoe · Yuanzhi Li · Aarti Singh
In human-interactive applications of online learning, a human's preferences or abilities are often a function of the algorithm's recent actions. Motivated by this, a significant line of work has formalized settings where an action's loss is a function of the number of times it was played in the prior $m$ timesteps, where $m$ corresponds to a bound on human memory capacity. To more faithfully capture the decay of human memory with time, we introduce the Weighted Tallying Bandit (WTB), which generalizes this setting by requiring that an action's loss is a function of a *weighted* summation of the number of times it was played in the last $m$ timesteps. WTB is intractable without further assumptions, so we study it under Repeated Exposure Optimality (REO), a condition requiring the existence of an action that, when played repetitively, eventually yields smaller loss than any other action sequence. We study the minimization of complete policy regret (CPR), the strongest notion of regret, in WTB under REO. Since $m$ is often unknown, we assume access only to an upper bound $M$ on $m$. We show that for problems with $K$ actions and horizon $T$, a simple modification of the successive elimination algorithm has $\mathcal{O} \left( \sqrt{KT} + (m+M)K \right)$ CPR. Up to an additive (rather than multiplicative) factor of $(m+M)K$, this recovers the classical guarantee for the far simpler stochastic multi-armed bandit with traditional regret. We additionally show that in our setting, any algorithm suffers additive CPR of $\Omega \left( mK + M \right)$, demonstrating that our result is near-optimal. Our method is computationally efficient, and we experimentally demonstrate its practicality and superiority over various baselines.
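For intuition, below is a minimal sketch of the blocked successive-elimination idea the abstract describes. Under WTB the loss of an action $a$ at time $t$ depends on a weighted tally of its plays in the last $m$ steps, e.g. something of the form $f_a\big(\sum_{j=1}^{m} w_j \, \mathbb{1}\{a_{t-j} = a\}\big)$, so playing one arm for $M \geq m$ consecutive steps saturates the memory window; the first $M$ plays of each block can then be discarded as burn-in, and the remaining plays estimate the arm's repeated-exposure loss, which REO makes the right quantity to compare. The `bandit` callable, the doubling block schedule, and the confidence radius here are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def blocked_successive_elimination(bandit, K, T, M, delta=0.05):
    """Sketch: successive elimination with blocked plays and burn-in.

    bandit(arm) -> loss in [0, 1] of playing `arm` at the current
    timestep (hypothetical interface). Each surviving arm is played in
    a contiguous block; the first M plays saturate the length-m memory
    window and are discarded, and the rest estimate the arm's
    repeated-exposure (steady-state) loss.
    """
    active = list(range(K))
    means = np.zeros(K)
    counts = np.zeros(K)   # post-burn-in samples per arm
    t, block = 0, 1        # block = post-burn-in plays per arm per phase

    while t < T and len(active) > 1:
        for arm in active:
            for _ in range(M):        # burn-in: fill the memory window
                if t >= T:
                    return min(active, key=lambda a: means[a])
                bandit(arm)
                t += 1
            for _ in range(block):    # collect steady-state estimates
                if t >= T:
                    return min(active, key=lambda a: means[a])
                loss = bandit(arm)
                counts[arm] += 1
                means[arm] += (loss - means[arm]) / counts[arm]
                t += 1
        # Drop arms whose lower confidence bound exceeds the best upper bound.
        idx = np.array(active)
        rad = np.sqrt(np.log(2 * K * T / delta) / (2 * counts[idx]))
        best_ucb = (means[idx] + rad).min()
        active = [a for a, lo in zip(active, means[idx] - rad) if lo <= best_ucb]
        block *= 2                    # longer blocks as surviving arms get closer

    return min(active, key=lambda a: means[a])
```

The burn-in plays are what generate the additive overhead in the $\mathcal{O}(\sqrt{KT} + (m+M)K)$ bound; the doubling schedule above is one simple way to amortize that cost across phases, not necessarily the schedule used in the paper's analysis.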
Author Information
Dhruv Malik (Carnegie Mellon University)
Conor Igoe (CMU Machine Learning Department)

I work primarily on Reinforcement Learning and Design of Experiments, although I am broadly interested in Uncertainty in Machine Learning.
Yuanzhi Li (CMU)
Aarti Singh (Carnegie Mellon University)
More from the Same Authors
- 2020 : Contributed Talk: Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment
  Ivan Stelmakh · Nihar Shah · Aarti Singh
- 2021 : When Is Generalizable Reinforcement Learning Tractable?
  Dhruv Malik · Yuanzhi Li · Pradeep Ravikumar
- 2021 : Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
  Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li
- 2021 : Towards understanding how momentum improves generalization in deep learning
  Samy Jelassi · Yuanzhi Li
- 2022 : Threshold Bandit Problem with Link Assumption between Pulls and Duels
  Keshav Narayan · Aarti Singh
- 2023 : How Does Adaptive Optimization Impact Local Neural Network Geometry?
  Kaiqi Jiang · Dhruv Malik · Yuanzhi Li
- 2023 : Plan, Eliminate, and Track: Language Models are Good Teachers for Embodied Agents
  Yue Wu · So Yeon Min · Yonatan Bisk · Ruslan Salakhutdinov · Amos Azaria · Yuanzhi Li · Tom Mitchell · Shrimai Prabhumoye
- 2023 : SPRING: Studying Papers and Reasoning to play Games
  Yue Wu · Shrimai Prabhumoye · So Yeon Min · Yonatan Bisk · Ruslan Salakhutdinov · Amos Azaria · Tom Mitchell · Yuanzhi Li
- 2023 : How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
  Yuchen Li · Yuanzhi Li · Andrej Risteski
- 2023 Poster: How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
  Yuchen Li · Yuanzhi Li · Andrej Risteski
- 2023 Poster: The Benefits of Mixup for Feature Learning
  Difan Zou · Yuan Cao · Yuanzhi Li · Quanquan Gu
- 2023 Poster: The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms
  Anirudh Vemula · Yuda Song · Aarti Singh · J. Bagnell · Sanjiban Choudhury
- 2022 Poster: Towards understanding how momentum improves generalization in deep learning
  Samy Jelassi · Yuanzhi Li
- 2022 Spotlight: Towards understanding how momentum improves generalization in deep learning
  Samy Jelassi · Yuanzhi Li
- 2021 Poster: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
  Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li
- 2021 Spotlight: Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
  Dhruv Malik · Aldo Pacchiano · Vishwak Srinivasan · Yuanzhi Li
- 2021 Poster: Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
  Zixin Wen · Yuanzhi Li
- 2021 Spotlight: Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
  Zixin Wen · Yuanzhi Li
- 2018 Poster: Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information
  Yichong Xu · Hariank Muthakana · Sivaraman Balakrishnan · Aarti Singh · Artur Dubrawski
- 2018 Oral: Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information
  Yichong Xu · Hariank Muthakana · Sivaraman Balakrishnan · Aarti Singh · Artur Dubrawski
- 2017 Poster: Near-Optimal Design of Experiments via Regret Minimization
  Zeyuan Allen-Zhu · Yuanzhi Li · Aarti Singh · Yining Wang
- 2017 Talk: Near-Optimal Design of Experiments via Regret Minimization
  Zeyuan Allen-Zhu · Yuanzhi Li · Aarti Singh · Yining Wang
- 2017 Poster: Uncorrelation and Evenness: a New Diversity-Promoting Regularizer
  Pengtao Xie · Aarti Singh · Eric Xing
- 2017 Talk: Uncorrelation and Evenness: a New Diversity-Promoting Regularizer
  Pengtao Xie · Aarti Singh · Eric Xing