Poster Wed, Jul 16, 2025 • 11:00 AM – 1:30 PM PDT

Near-optimal Regret Using Policy Optimization in Online MDPs with Aggregate Bandit Feedback

Tal Lancewicki · Yishay Mansour
