Timezone: »
Spotlight
Improved Regret Bounds of Bilinear Bandits using Action Space Analysis
Kyoungseok Jang · Kwang-Sung Jun · Se-Young Yun · Wanmo Kang
We consider the bilinear bandit problem where the learner chooses a pair of arms, each from two different action spaces of dimension $d_1$ and $d_2$, respectively. The learner then receives a reward whose expectation is a bilinear function of the two chosen arms with an unknown matrix parameter $\Theta^*\in\mathbb{R}^{d_1 \times d_2}$ with rank $r$. Despite abundant applications such as drug discovery, the optimal regret rate is unknown for this problem, though it was conjectured to be $\tilde O(\sqrt{d_1d_2(d_1+d_2)r T})$ by Jun et al. (2019) where $\tilde O$ ignores polylogarithmic factors in $T$. In this paper, we make progress towards closing the gap between the upper and lower bound on the optimal regret. First, we reject the conjecture above by proposing algorithms that achieve the regret $\tilde O(\sqrt{d_1 d_2 (d_1+d_2) T})$ using the fact that the action space dimension $O(d_1+d_2)$ is significantly lower than the matrix parameter dimension $O(d_1 d_2)$. Second, we additionally devise an algorithm with better empirical performance than previous algorithms.
Author Information
Kyoungseok Jang (KAIST)
Kwang-Sung Jun (University of Arizona)
Se-Young Yun (KAIST)
Wanmo Kang (KAIST)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Improved Regret Bounds of Bilinear Bandits using Action Space Analysis »
Thu. Jul 22nd 04:00 -- 06:00 AM Room Virtual
More from the Same Authors
-
2022 Poster: Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation »
Dongjun Kim · Seungjae Shin · Kyungwoo Song · Wanmo Kang · IL CHUL MOON -
2022 Spotlight: Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation »
Dongjun Kim · Seungjae Shin · Kyungwoo Song · Wanmo Kang · IL CHUL MOON -
2022 Poster: Rotting infinitely many-armed bandits »
Jung-hun Kim · Milan Vojnovic · Se-Young Yun -
2022 Spotlight: Rotting infinitely many-armed bandits »
Jung-hun Kim · Milan Vojnovic · Se-Young Yun -
2021 Poster: Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits »
Kwang-Sung Jun · Lalit Jain · Blake Mason · Houssam Nassif -
2021 Spotlight: Improved Confidence Bounds for the Linear Logistic Model and Applications to Bandits »
Kwang-Sung Jun · Lalit Jain · Blake Mason · Houssam Nassif -
2020 : Short Talk 1 - Crush Optimism with Pessimism: Structured Bandits Beyond Asymptotic Optimality »
Kwang-Sung Jun -
2019 Poster: Spectral Approximate Inference »
Sejun Park · Eunho Yang · Se-Young Yun · Jinwoo Shin -
2019 Oral: Spectral Approximate Inference »
Sejun Park · Eunho Yang · Se-Young Yun · Jinwoo Shin