Workshop
Sat Jul 23 05:45 AM -- 03:00 PM (PDT) @ Room 314 - 315
Complex feedback in online learning
Rémy Degenne · Pierre Gaillard · Wouter Koolen · Aadirupa Saha






While online learning has become one of the most successful and studied approaches in machine learning, in particular with reinforcement learning, online learning algorithms still interact with their environments in a very simple way. The complexity and diversity of the feedback coming from the environment in real applications is often reduced to the observation of a scalar reward. More and more researchers now seek to fully exploit the available feedback to allow faster and more human-like learning. This workshop aims to present a broad overview of the feedback types being actively researched, highlight recent advances, and provide a networking forum for researchers and practitioners.

Opening remarks (Remarks)
Learning from Preference Feedback in Combinatorial Action Spaces (Invited Speaker)
Delayed Feedback in Generalised Linear Bandits Revisited (Invited Speaker)
Break
Online learning in digital markets (Invited Speaker)
Beyond Learning from Demonstrations (Invited Speaker)
Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback (Oral)
Contextual Inverse Optimization: Offline and Online Learning (Oral)
Lunch Break (Break)
Decentralized Learning in Online Queuing Systems (Invited Speaker)
Giving Complex Feedback in Online Student Learning with Meta-Exploration (Oral)
Threshold Bandit Problem with Link Assumption between Pulls and Duels (Oral)
Meta-learning from Learning Curves Challenge: Lessons learned from the First Round and Design of the Second Round (Oral)
Break
Poster session (Poster Session)
Prescriptive solutions in games: from theory to scale (Invited Speaker)
ActiveHedge: Hedge meets Active Learning (Oral)
Closing remarks (Remarks)
Online Learning with Off-Policy Feedback (Poster)
Dynamical Linear Bandits for Long-Lasting Vanishing Rewards (Poster)
Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk (Poster)
Stochastic Rising Bandits for Online Model Selection (Poster)
You Only Live Once: Single-Life Reinforcement Learning via Learned Reward Shaping (Poster)
Provably Correct SGD-based Exploration for Linear Bandit (Poster)
Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms (Poster)
Optimal Parameter-free Online Learning with Switching Cost (Poster)
Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning (Poster)
On Adaptivity and Confounding in Contextual Bandit Experiments (Poster)
Meta-learning from Learning Curves Challenge: Lessons learned from the First Round and Design of the Second Round (Poster)
Big Control Actions Help Multitask Learning of Unstable Linear Systems (Poster)
ActiveHedge: Hedge meets Active Learning (Poster)
Choosing Answers in Epsilon-Best-Answer Identification for Linear Bandits (Poster)
Contextual Inverse Optimization: Offline and Online Learning (Poster)
Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback (Poster)
On the Importance of Critical Period in Multi-stage Reinforcement Learning (Poster)
Interaction-Grounded Learning with Action-inclusive Feedback (Poster)
Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP (Poster)
Challenging Common Assumptions in Convex Reinforcement Learning (Poster)
Giving Complex Feedback in Online Student Learning with Meta-Exploration (Poster)
Threshold Bandit Problem with Link Assumption between Pulls and Duels (Poster)
Adversarial Attacks Against Imitation and Inverse Reinforcement Learning (Poster)
Beyond IID: data-driven decision-making in heterogeneous environments (Poster)
Unimodal Mono-Partite Matching in a Bandit Setting (Poster)