Workshop
Sat Jul 23 05:45 AM -- 03:00 PM (PDT) @ Room 314 - 315
Complex feedback in online learning
While online learning has become one of the most successful and widely studied approaches in machine learning, particularly in reinforcement learning, online learning algorithms still interact with their environments in a very simple way. The complexity and diversity of the feedback coming from the environment in real applications is often reduced to the observation of a scalar reward. More and more researchers now seek to fully exploit the available feedback to allow faster and more human-like learning. This workshop aims to present a broad overview of the feedback types being actively researched, highlight recent advances, and provide a networking forum for researchers and practitioners.
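To make the contrast concrete, below is a minimal, purely illustrative sketch (not taken from any workshop paper) of the two interaction protocols: classic scalar-reward bandit feedback versus preference (dueling) feedback, one of the richer feedback types on the program. All names and the toy environment are assumptions for illustration.

```python
import random

def scalar_reward_round(pull_arm, env_reward):
    """Classic bandit feedback: the learner picks one arm and
    observes a single scalar reward for it."""
    arm = pull_arm()
    return env_reward(arm)  # one number per round

def preference_feedback_round(pick_pair, env_prefers):
    """Dueling/preference feedback: the learner proposes two arms
    and only observes which of the two the environment prefers."""
    a, b = pick_pair()
    return env_prefers(a, b)  # a binary comparison, not a reward

# Hypothetical toy environment with hidden arm means; the preference
# signal reveals strictly less per round than the scalar reward
# it is derived from.
means = [0.3, 0.7]
reward = lambda arm: means[arm] + random.gauss(0, 0.1)
prefers = lambda a, b: reward(a) > reward(b)

print(scalar_reward_round(lambda: 1, reward))            # e.g. 0.68
print(preference_feedback_round(lambda: (0, 1), prefers))  # e.g. False
```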
Opening remarks (Remarks)
Learning from Preference Feedback in Combinatorial Action Spaces (Invited Speaker)
Delayed Feedback in Generalised Linear Bandits Revisited (Invited Speaker)
Break
Online learning in digital markets (Invited Speaker)
Beyond Learning from Demonstrations (Invited Speaker)
Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback (Oral)
Contextual Inverse Optimization: Offline and Online Learning (Oral)
Lunch Break (Break)
Decentralized Learning in Online Queuing Systems (Invited Speaker)
Giving Complex Feedback in Online Student Learning with Meta-Exploration (Oral)
Threshold Bandit Problem with Link Assumption between Pulls and Duels (Oral)
Meta-learning from Learning Curves Challenge: Lessons learned from the First Round and Design of the Second Round (Oral)
Break
Poster session (Poster Session)
Prescriptive solutions in games: from theory to scale (Invited Speaker)
ActiveHedge: Hedge meets Active Learning (Oral)
Closing remarks (Remarks)
Interaction-Grounded Learning with Action-inclusive Feedback (Poster)
On the Importance of Critical Period in Multi-stage Reinforcement Learning (Poster)
Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback (Poster)
Giving Complex Feedback in Online Student Learning with Meta-Exploration (Poster)
Threshold Bandit Problem with Link Assumption between Pulls and Duels (Poster)
Contextual Inverse Optimization: Offline and Online Learning (Poster)
Meta-learning from Learning Curves Challenge: Lessons learned from the First Round and Design of the Second Round (Poster)
ActiveHedge: Hedge meets Active Learning (Poster)
Optimal Parameter-free Online Learning with Switching Cost (Poster)
Challenging Common Assumptions in Convex Reinforcement Learning (Poster)
Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms (Poster)
Provably Correct SGD-based Exploration for Linear Bandit (Poster)
You Only Live Once: Single-Life Reinforcement Learning via Learned Reward Shaping (Poster)
Stochastic Rising Bandits for Online Model Selection (Poster)
Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP (Poster)
Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk (Poster)
Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning (Poster)
Dynamical Linear Bandits for Long-Lasting Vanishing Rewards (Poster)
Online Learning with Off-Policy Feedback (Poster)
On Adaptivity and Confounding in Contextual Bandit Experiments (Poster)
Unimodal Mono-Partite Matching in a Bandit Setting (Poster)
Beyond IID: data-driven decision-making in heterogeneous environments (Poster)
Big Control Actions Help Multitask Learning of Unstable Linear Systems (Poster)
Adversarial Attacks Against Imitation and Inverse Reinforcement Learning (Poster)
Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits (Poster)